WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.486 [MUSIC PLAYING] 00:01:01.280 --> 00:01:02.510 DAVID MALAN: All right. 00:01:02.510 --> 00:01:04.580 This is CS50. 00:01:04.580 --> 00:01:08.390 This is week 2 wherein we will ultimately learn how to use memory, 00:01:08.390 --> 00:01:11.900 but we thought we'd first begin with a bit of story time. 00:01:11.900 --> 00:01:14.570 And in fact, allow me to walk over to our brave volunteers who 00:01:14.570 --> 00:01:15.650 have joined us already. 00:01:15.650 --> 00:01:18.080 First here on my left, we have who? 00:01:18.080 --> 00:01:19.730 AKSHAYA: Hi, I'm Akshaya. 00:01:19.730 --> 00:01:22.520 I'm a first year in Matthews, and I'm planning 00:01:22.520 --> 00:01:25.747 on concentrating in chemical and physical biology and CS. 00:01:25.747 --> 00:01:27.080 DAVID MALAN: Wonderful, welcome. 00:01:27.080 --> 00:01:28.955 And let me have you hang on to the microphone 00:01:28.955 --> 00:01:31.437 first because we've asked Akshaya to tell us a short story. 00:01:31.437 --> 00:01:33.770 So in your envelope, you have the beginnings of a story. 00:01:33.770 --> 00:01:35.353 If you wouldn't mind reading it aloud. 00:01:35.353 --> 00:01:38.630 And as she reads this, allow us to give some thought as to what 00:01:38.630 --> 00:01:41.922 level Akshaya reads at, so to speak. 00:01:41.922 --> 00:01:43.880 AKSHAYA: All right, it's a long one, get ready. 00:01:43.880 --> 00:01:48.405 One fish, two fish, red fish, blue fish. 00:01:48.405 --> 00:01:50.030 DAVID MALAN: All right, very well done. 00:01:50.030 --> 00:01:53.302 What grade level would you say she reads at if you think back 00:01:53.302 --> 00:01:55.010 to your middle school, grade school, when 00:01:55.010 --> 00:01:59.490 maybe a teacher said you read at this level or maybe this level or this one 00:01:59.490 --> 00:02:01.530 here? 00:02:01.530 --> 00:02:03.817 So OK, no offense taken yet. 00:02:03.817 --> 00:02:05.010 AUDIENCE: 1st grade.
00:02:05.010 --> 00:02:05.550 DAVID MALAN: I'm sorry? 00:02:05.550 --> 00:02:06.060 AUDIENCE: 1st grade. 00:02:06.060 --> 00:02:07.018 DAVID MALAN: 1st grade. 00:02:07.018 --> 00:02:08.850 OK, so 1st grade is just about right. 00:02:08.850 --> 00:02:12.432 And in fact, according to one algorithm, this text here, 00:02:12.432 --> 00:02:14.640 one fish, two fish, red fish, blue fish, would indeed 00:02:14.640 --> 00:02:17.830 be considered to actually be 1st grade or just before 1st grade. 00:02:17.830 --> 00:02:19.530 So let's-- and why is that, though? 00:02:19.530 --> 00:02:21.977 Why did you say 1st grade? 00:02:21.977 --> 00:02:23.060 AUDIENCE: It's very basic. 00:02:23.060 --> 00:02:23.990 DAVID MALAN: It's very basic. 00:02:23.990 --> 00:02:26.198 But what is it about these words that is very basic? 00:02:26.198 --> 00:02:27.687 Do you want to identify yourself? 00:02:27.687 --> 00:02:28.270 AKSHAYA: Sure. 00:02:28.270 --> 00:02:31.570 They're all one syllable, and they're very simple, like colors and stuff 00:02:31.570 --> 00:02:32.070 like that. 00:02:32.070 --> 00:02:32.945 DAVID MALAN: Spot-on. 00:02:32.945 --> 00:02:35.620 So they're very short words, they're very short sentences. 00:02:35.620 --> 00:02:38.050 And you would expect that of a younger person. 00:02:38.050 --> 00:02:40.822 All right, let's go ahead and hand the mic to your next volunteer 00:02:40.822 --> 00:02:42.530 here if you'd like to introduce yourself. 00:02:42.530 --> 00:02:43.030 ETHAN: Yes. 00:02:43.030 --> 00:02:43.810 Hi, I'm Ethan. 00:02:43.810 --> 00:02:46.752 I'm a first year in Canaday, and I'll be concentrating in economics. 00:02:46.752 --> 00:02:47.710 DAVID MALAN: Wonderful. 00:02:47.710 --> 00:02:50.860 And in your folder, we have another story to share. 00:02:50.860 --> 00:02:52.480 ETHAN: Congratulations. 00:02:52.480 --> 00:02:53.740 Today is your day. 00:02:53.740 --> 00:02:55.480 You're off to great places. 00:02:55.480 --> 00:02:56.730 You're off and away.
00:02:56.730 --> 00:02:59.230 DAVID MALAN: So this text might sound familiar, particularly 00:02:59.230 --> 00:03:00.880 on the heels of high school, perhaps. 00:03:00.880 --> 00:03:05.310 What grade level might he be reading at? 00:03:05.310 --> 00:03:06.450 So maybe 5th grade. 00:03:06.450 --> 00:03:07.620 And why 5th grade? 00:03:07.620 --> 00:03:09.740 AUDIENCE: [INAUDIBLE] 00:03:09.740 --> 00:03:11.030 DAVID MALAN: OK. 00:03:11.030 --> 00:03:11.540 Yeah. 00:03:11.540 --> 00:03:13.040 So a little more complicated. 00:03:13.040 --> 00:03:16.850 Like the words-- we've got some more punctuation, we have an apostrophe, 00:03:16.850 --> 00:03:17.892 we have longer sentences. 00:03:17.892 --> 00:03:20.392 And indeed, according to one algorithm, not quite 5th grade, 00:03:20.392 --> 00:03:22.640 but we would adjudicate your reading level to be 3rd. 00:03:22.640 --> 00:03:25.280 But let's see if we can't do one final flourish here 00:03:25.280 --> 00:03:28.190 if you'd like to introduce yourself and your story. 00:03:28.190 --> 00:03:29.840 MIKE: Hi, I'm Mike. 00:03:29.840 --> 00:03:30.920 I'm also a first year. 00:03:30.920 --> 00:03:33.020 I'm in Weld, and I'm planning on concentrating 00:03:33.020 --> 00:03:34.185 in biomedical engineering. 00:03:34.185 --> 00:03:35.060 DAVID MALAN: Welcome. 00:03:35.060 --> 00:03:36.980 And your tale? 00:03:36.980 --> 00:03:41.750 MIKE: It was a bright, cold day in April, and the clocks were striking 13. 00:03:41.750 --> 00:03:45.440 Winston Smith, his chin nuzzled into his breast in an effort 00:03:45.440 --> 00:03:49.130 to escape the vile wind, slipped quickly through the glass doors 00:03:49.130 --> 00:03:51.710 of Victory Mansions, though not quickly enough 00:03:51.710 --> 00:03:55.445 to prevent a swirl of gritty dust from entering along with him. 00:03:55.445 --> 00:03:57.320 DAVID MALAN: All right, so that escalated quickly. 00:03:57.320 --> 00:03:59.960 And someone's guess at this reading level?
00:03:59.960 --> 00:04:01.083 AUDIENCE: 1984. 00:04:01.083 --> 00:04:02.125 DAVID MALAN: What's that? 00:04:02.125 --> 00:04:05.320 Oh, OK, 1984 is indeed the text in question, and in what 00:04:05.320 --> 00:04:08.050 grade did you perhaps read that book? 00:04:08.050 --> 00:04:09.670 So I'm hearing 8th, I'm hearing 10th. 00:04:09.670 --> 00:04:12.490 So indeed, 10th grade is what a certain algorithm would actually 00:04:12.490 --> 00:04:14.260 adjudicate that reading level to be. 00:04:14.260 --> 00:04:15.610 And consider now the heuristics. 00:04:15.610 --> 00:04:19.158 So we started with very small words, very small sentences, very easy words, 00:04:19.158 --> 00:04:21.700 and then things sort of escalated into more interesting, more 00:04:21.700 --> 00:04:25.460 sophisticated English, more interesting sentence construction and the like. 00:04:25.460 --> 00:04:30.640 So I bet if we could somehow capture those characteristics of text, 00:04:30.640 --> 00:04:33.250 the length of the words and the lengths of the sentences 00:04:33.250 --> 00:04:35.680 and the position of the punctuation, I daresay, 00:04:35.680 --> 00:04:38.878 even using week 1 material and, today, week 2 material, 00:04:38.878 --> 00:04:41.920 we'll be able to actually write code and implement an algorithm like that, one that 00:04:41.920 --> 00:04:44.380 can take these spoken words, put them to paper, 00:04:44.380 --> 00:04:47.590 and actually analyze roughly what that reading level might be. 00:04:47.590 --> 00:04:49.390 So that's just a teaser of what lies ahead. 00:04:49.390 --> 00:04:52.300 For now, allow us to thank our volunteers, each of whom 00:04:52.300 --> 00:04:55.930 gets a wonderful parting gift here to read at home. 00:04:55.930 --> 00:04:58.410 [APPLAUSE] 00:04:58.410 --> 00:04:58.910 All right. 00:04:58.910 --> 00:05:01.110 And thank you all so much.
00:05:01.110 --> 00:05:05.730 So with that said, there's another domain that we'll explore this week, 00:05:05.730 --> 00:05:07.730 and indeed, what you'll find in the coming weeks 00:05:07.730 --> 00:05:11.150 is that beyond just focusing on some of the fundamentals and the basics 00:05:11.150 --> 00:05:14.330 like we've really done in the past couple of weeks talking about loops 00:05:14.330 --> 00:05:16.340 and conditionals and Boolean expressions, 00:05:16.340 --> 00:05:19.400 really building blocks or puzzle pieces that we can assemble together, 00:05:19.400 --> 00:05:22.070 we're going to increasingly start talking about applications 00:05:22.070 --> 00:05:25.250 of these ideas, which, after all, are why any field is perhaps 00:05:25.250 --> 00:05:26.460 important and applicable. 00:05:26.460 --> 00:05:29.510 So here, for instance, we'll consider not only reading levels today, 00:05:29.510 --> 00:05:33.630 and in turn, in problem set 2 this week, but also the world of cryptography, 00:05:33.630 --> 00:05:36.860 which is the art, the science of scrambling, encrypting 00:05:36.860 --> 00:05:39.230 information, and ciphering it in such a way 00:05:39.230 --> 00:05:43.530 that you can send a message securely through the internet, through the air, 00:05:43.530 --> 00:05:46.700 through any medium, even though someone might intercept it. 00:05:46.700 --> 00:05:49.100 Ideally, thanks to cryptography, they shouldn't 00:05:49.100 --> 00:05:53.240 be able to decrypt it or actually determine what it says. 00:05:53.240 --> 00:05:57.560 So for instance, if you were to receive a message like this, at first glance, 00:05:57.560 --> 00:05:59.460 it's indeed a bit cryptic. 00:05:59.460 --> 00:06:02.400 Three words maybe, but by day's end, we'll 00:06:02.400 --> 00:06:04.830 have decrypted even this message for you. 00:06:04.830 --> 00:06:08.550 So up until now, though, we've had some sort of conceptual training wheels on.
00:06:08.550 --> 00:06:12.480 And I gave us this picture last week when we introduced the tool make via 00:06:12.480 --> 00:06:15.870 which you can make programs out of your source code because you need to turn 00:06:15.870 --> 00:06:18.450 that source code into machine code, the 0's and 1's. 00:06:18.450 --> 00:06:20.970 And in the middle here was this thing called a compiler. 00:06:20.970 --> 00:06:23.790 But it really has been kind of an abstraction for us, 00:06:23.790 --> 00:06:27.690 and we've sort of had these metaphorical and physical training 00:06:27.690 --> 00:06:30.450 wheels here in the sense that we haven't really 00:06:30.450 --> 00:06:34.420 needed to care what the compiler is doing, how it works, and so forth. 00:06:34.420 --> 00:06:38.400 But today, what we thought we'd do is peel back a bit of that layer so 00:06:38.400 --> 00:06:40.410 that even though after today you'll continue 00:06:40.410 --> 00:06:43.380 to be able to use commands like make and sort of return 00:06:43.380 --> 00:06:46.275 to the beautiful abstraction that is not caring about some 00:06:46.275 --> 00:06:48.150 of these lower-level details, we'll offer you 00:06:48.150 --> 00:06:49.980 a glimpse of how some of these things work. 00:06:49.980 --> 00:06:52.350 That way, inevitably, when something goes wrong, 00:06:52.350 --> 00:06:54.540 you've got some bug, you're having some problem, 00:06:54.540 --> 00:06:58.620 you'll have a bottom-up understanding of what it could actually be. 00:06:58.620 --> 00:07:01.620 And indeed, these basics, you'll find, will very often 00:07:01.620 --> 00:07:05.230 help you troubleshoot problems and really solve problems more generally. 00:07:05.230 --> 00:07:07.920 So here, for instance, is the code that we keep coming back to. 00:07:07.920 --> 00:07:12.750 And this code here is the simplest of C programs that just says "hello, world." 00:07:12.750 --> 00:07:13.960 This is the source code.
00:07:13.960 --> 00:07:16.260 This, we claimed, was the corresponding machine code. 00:07:16.260 --> 00:07:18.810 And it was that program called a compiler that 00:07:18.810 --> 00:07:20.800 converted one into the other. 00:07:20.800 --> 00:07:23.100 But let's dive a little more deeply this week 00:07:23.100 --> 00:07:25.920 into what we mean by compiling code. 00:07:25.920 --> 00:07:28.410 Like, what is happening, so that by day's end, 00:07:28.410 --> 00:07:30.910 nothing really feels like magic anymore. 00:07:30.910 --> 00:07:33.540 It's not just that it goes from source code to machine code 00:07:33.540 --> 00:07:37.020 and that's that; you'll understand what's actually being done for you, 00:07:37.020 --> 00:07:40.694 and frankly, what other humans have done over the decades to make 00:07:40.694 --> 00:07:45.597 make as beautifully abstract and as simple as it now might seem to be. 00:07:45.597 --> 00:07:47.430 So here are a couple of commands that you've 00:07:47.430 --> 00:07:50.305 been in the habit of running when you want to first compile your code 00:07:50.305 --> 00:07:51.930 and then execute your code. 00:07:51.930 --> 00:07:56.280 But it turns out that make is actually running another command for you. 00:07:56.280 --> 00:07:59.190 The first of several white lies we'll tell in the course 00:07:59.190 --> 00:08:02.040 is that make itself is not a compiler, per se. 00:08:02.040 --> 00:08:06.580 It's actually a program that automatically runs a compiler for you. 00:08:06.580 --> 00:08:07.770 And by that, I mean this. 00:08:07.770 --> 00:08:13.650 Let me go over to VS Code here and let me create our familiar hello.c program. 00:08:13.650 --> 00:08:20.310 And I'm going to go ahead and do include stdio.h, int main void, and inside 00:08:20.310 --> 00:08:25.027 of the curly braces, printf "hello," comma, "world," backslash n semicolon. 00:08:25.027 --> 00:08:27.360 So that's the code that we keep writing again and again.
00:08:27.360 --> 00:08:31.932 And up until now, if I wanted to compile that, I would do make hello, 00:08:31.932 --> 00:08:35.820 then dot slash hello, and voila, now my program is made 00:08:35.820 --> 00:08:37.980 and it actually executes. 00:08:37.980 --> 00:08:40.289 But what's actually going on underneath the hood 00:08:40.289 --> 00:08:43.799 there is that make is running an actual compiler for you, 00:08:43.799 --> 00:08:46.980 and the reveal today is that the compiler we have been using 00:08:46.980 --> 00:08:49.170 is something called Clang, for C language. 00:08:49.170 --> 00:08:51.540 And this is just another program whose purpose in life 00:08:51.540 --> 00:08:54.510 is actually to do the conversion of source code to machine code. 00:08:54.510 --> 00:08:57.360 But it turns out that Clang by itself can 00:08:57.360 --> 00:09:00.770 be used very simply like you see here, clang hello.c, 00:09:00.770 --> 00:09:04.563 but it isn't nearly as user-friendly as you might like. 00:09:04.563 --> 00:09:06.480 So in particular, let me go ahead and do this. 00:09:06.480 --> 00:09:08.960 I'm going to go ahead and remove my compiled program 00:09:08.960 --> 00:09:12.830 by running rm for remove, which I alluded to briefly last time. 00:09:12.830 --> 00:09:16.260 And then I'm going to say y for yes, remove that regular file. 00:09:16.260 --> 00:09:21.800 And if I go ahead now and run just clang of hello.c and hit Enter, 00:09:21.800 --> 00:09:25.140 it seems to be successful, at least insofar as there are no error messages. 00:09:25.140 --> 00:09:27.530 But if I try to do dot slash hello, Enter, 00:09:27.530 --> 00:09:31.070 there is no such file or directory called hello. 00:09:31.070 --> 00:09:34.940 That is because by default, Clang somewhat goofily just 00:09:34.940 --> 00:09:37.670 outputs a file named a dot out. 00:09:37.670 --> 00:09:38.480 Like why a? 00:09:38.480 --> 00:09:42.200 Well, it's sort of a simple name.
a dot out, technically for assembler output, 00:09:42.200 --> 00:09:44.270 but this just means this is the default file 00:09:44.270 --> 00:09:45.770 name that Clang is going to give us. 00:09:45.770 --> 00:09:49.790 So OK, it turns out I can do dot slash a dot out Enter, and voila, 00:09:49.790 --> 00:09:53.723 that now is my program, but that's just a stupid name for a program. 00:09:53.723 --> 00:09:54.890 It's not very user-friendly. 00:09:54.890 --> 00:09:56.598 It's certainly not an icon you would want 00:09:56.598 --> 00:09:58.680 to put on people's desktops or phones. 00:09:58.680 --> 00:10:00.070 So how can we do better? 00:10:00.070 --> 00:10:03.600 Well, it turns out, with Clang, we can configure it using 00:10:03.600 --> 00:10:05.983 what we'll call command line arguments. 00:10:05.983 --> 00:10:09.150 And command line arguments are actually something we've been using thus far; 00:10:09.150 --> 00:10:12.390 we just didn't slap this term on it. Command line arguments 00:10:12.390 --> 00:10:15.600 are additional words or shorthand notation 00:10:15.600 --> 00:10:18.660 that you type at your command prompt that somehow 00:10:18.660 --> 00:10:21.270 modify the behavior of a program. 00:10:21.270 --> 00:10:23.310 And you can perhaps guess where this is going. 00:10:23.310 --> 00:10:28.140 It turns out that if I actually want to create a program called hello-- 00:10:28.140 --> 00:10:31.200 not a.out, which is the default-- I can actually 00:10:31.200 --> 00:10:36.420 do this: clang, space, dash lowercase o, space, hello, 00:10:36.420 --> 00:10:40.260 or whatever I want to call the thing, space, hello.c. 00:10:40.260 --> 00:10:42.630 And now if I hit Enter, nothing seems to happen, 00:10:42.630 --> 00:10:48.490 but now if I do ./hello and Enter, now I've actually got that program. 00:10:48.490 --> 00:10:49.737 So why is make useful?
00:10:49.737 --> 00:10:51.570 Well, it just saves us the trouble of having 00:10:51.570 --> 00:10:55.230 to type out this longer command any time 00:10:55.230 --> 00:10:56.940 we actually want to compile the code. 00:10:56.940 --> 00:10:59.430 But in fact, it gets even worse than that 00:10:59.430 --> 00:11:01.860 with commands like clang or compilers in general. 00:11:01.860 --> 00:11:04.470 Consider this code here. 00:11:04.470 --> 00:11:08.010 Not just the version of "hello, world," but maybe the second version wherein 00:11:08.010 --> 00:11:11.700 last week, I started to get user input by adding the CS50 Library using 00:11:11.700 --> 00:11:14.370 get_string and then saying, "hello," comma, "David." 00:11:14.370 --> 00:11:18.210 Well, if I go back to VS Code and I modify this program 00:11:18.210 --> 00:11:19.810 to be that same one-- 00:11:19.810 --> 00:11:23.490 so let me go ahead and include cs50.h at the top. 00:11:23.490 --> 00:11:27.000 Let me get rid of this simple print line and instead give myself 00:11:27.000 --> 00:11:33.510 a string called name equals get_string, "What's your name?" 00:11:33.510 --> 00:11:35.610 Question mark, just like we did in Scratch. 00:11:35.610 --> 00:11:39.510 Then I can do printf, quote-unquote, "hello," comma. 00:11:39.510 --> 00:11:41.532 And previously I typed "world." 00:11:41.532 --> 00:11:44.490 I obviously don't want to type "David" because I want it to be dynamic. 00:11:44.490 --> 00:11:47.430 What did I type last week as a placeholder? 00:11:47.430 --> 00:11:50.980 So yeah, just-- not Command-S, but %s. So %s in this case, 00:11:50.980 --> 00:11:53.070 which is a placeholder for any such string. 00:11:53.070 --> 00:11:56.550 Then I can still do my new line, close quote, comma, and then 00:11:56.550 --> 00:12:00.630 I can substitute in something like the value of the name variable.
00:12:00.630 --> 00:12:03.430 All right, so if I go ahead now and compile this, 00:12:03.430 --> 00:12:06.300 last week, I could just do make hello and be on my way; 00:12:06.300 --> 00:12:07.570 it worked just fine. 00:12:07.570 --> 00:12:10.440 But if I instead do clang manually, it turns out 00:12:10.440 --> 00:12:16.650 that this is not going to be sufficient now: clang -o hello, space, hello.c. 00:12:16.650 --> 00:12:19.200 Exact same thing I typed a moment ago, but I 00:12:19.200 --> 00:12:21.940 think I'm going to see some errors. 00:12:21.940 --> 00:12:24.580 So what's this error hinting at here? 00:12:24.580 --> 00:12:27.120 Well, at the very bottom, it's a bit arcane with its output, 00:12:27.120 --> 00:12:30.400 and much of this you can ignore, but there are certain keywords. 00:12:30.400 --> 00:12:33.240 What's the first keyword you might recognize in these three 00:12:33.240 --> 00:12:36.130 lines of erroneous output? 00:12:36.130 --> 00:12:37.273 So it mentions main. 00:12:37.273 --> 00:12:40.440 That's not that much of a clue because that's the only thing I wrote so far. 00:12:40.440 --> 00:12:42.060 Second line, though, get_string. 00:12:42.060 --> 00:12:46.050 There's some issue with an undefined reference to get_string. 00:12:46.050 --> 00:12:47.590 Now why might that be? 00:12:47.590 --> 00:12:51.820 I did include cs50.h, but that's apparently not 00:12:51.820 --> 00:12:54.520 enough to teach the compiler about get_string. 00:12:54.520 --> 00:12:58.630 Well, it turns out that if you're using a third-party library, one 00:12:58.630 --> 00:13:02.740 that doesn't necessarily come with the C language, something like CS50's, 00:13:02.740 --> 00:13:05.860 you additionally have to tell the compiler that you 00:13:05.860 --> 00:13:07.060 want to use that library. 00:13:07.060 --> 00:13:08.890 And not just by including the header file, 00:13:08.890 --> 00:13:11.450 but by an additional command as well.
00:13:11.450 --> 00:13:15.010 So when you run Clang, you want to provide an additional 00:13:15.010 --> 00:13:16.900 command line argument, rather. 00:13:16.900 --> 00:13:21.580 Literally -l, for library, which is a term I used last week, followed by cs50. 00:13:21.580 --> 00:13:23.620 A library is just code that someone else wrote 00:13:23.620 --> 00:13:25.640 that you want to use in your project. 00:13:25.640 --> 00:13:29.380 So if I really want to compile this version that uses the CS50 Library, 00:13:29.380 --> 00:13:34.660 I can still do clang -o hello hello.c, but before I finish my thought, 00:13:34.660 --> 00:13:40.450 I need to tell the compiler to link, so to speak, in the library CS50. 00:13:40.450 --> 00:13:44.350 And now when I hit Enter, the error message goes away, I can do ./hello, 00:13:44.350 --> 00:13:47.410 I can type in my name, and voila, we're back to week 1. 00:13:47.410 --> 00:13:49.987 And this is why, suffice it to say, we introduced make, 00:13:49.987 --> 00:13:51.070 which is not a CS50 thing. 00:13:51.070 --> 00:13:54.070 This is a popular tool that real people in the real world 00:13:54.070 --> 00:13:56.480 use to automate these kinds of processes. 00:13:56.480 --> 00:13:59.050 So unbeknownst to you, make has been using 00:13:59.050 --> 00:14:03.670 the -o for you. make, unbeknownst to you, has been using -lcs50 for you 00:14:03.670 --> 00:14:06.650 just because it makes our lives easier. 00:14:06.650 --> 00:14:08.560 But today, we thought we would deliberately 00:14:08.560 --> 00:14:11.440 peel back this layer so we at least understand 00:14:11.440 --> 00:14:16.300 what's going on behind this abstraction that is make itself 00:14:16.300 --> 00:14:17.750 and compiling more generally. 00:14:17.750 --> 00:14:21.880 So let me propose that compiling itself is not quite what 00:14:21.880 --> 00:14:22.960 we've described it to be.
00:14:22.960 --> 00:14:25.840 Compiling is this catch-all phrase that, I've claimed, 00:14:25.840 --> 00:14:27.650 goes from source code to machine code. 00:14:27.650 --> 00:14:30.710 But if we really want to get pedantic, which we'll do briefly, 00:14:30.710 --> 00:14:33.640 but this is not a sign of things to come because this, too, 00:14:33.640 --> 00:14:39.250 will be abstracted away, compiling is just one of four steps that are involved 00:14:39.250 --> 00:14:43.010 in turning source code that you and I write into those 0's and 1's. 00:14:43.010 --> 00:14:45.010 But through an understanding of these four steps 00:14:45.010 --> 00:14:46.780 today, you'll hopefully better understand 00:14:46.780 --> 00:14:49.480 how to troubleshoot issues like that and just know 00:14:49.480 --> 00:14:51.680 what's happening because it's not, in fact, magic. 00:14:51.680 --> 00:14:55.850 It's just the result of years of humans developing these four steps here. 00:14:55.850 --> 00:14:58.870 So when you run make, what's happening? 00:14:58.870 --> 00:15:02.450 Or, in turn, when you run clang, four different things are happening. 00:15:02.450 --> 00:15:04.360 And the first one is called pre-processing. 00:15:04.360 --> 00:15:05.720 So what is this all about? 00:15:05.720 --> 00:15:07.270 Well, let's consider this code here. 00:15:07.270 --> 00:15:09.730 And this code is a little bit interesting 00:15:09.730 --> 00:15:13.850 insofar as it's one of the more complicated examples from last week. 00:15:13.850 --> 00:15:18.550 And you'll notice, for instance, that I had include stdio.h at the top 00:15:18.550 --> 00:15:19.900 so I could use printf. 00:15:19.900 --> 00:15:24.340 I had main down here, whose purpose in life was just to meow three times. 00:15:24.340 --> 00:15:27.880 And then recall we made our own meow function, just like we did in week 0 00:15:27.880 --> 00:15:31.630 with Scratch, that just printed out, quote-unquote, "meow."
00:15:31.630 --> 00:15:37.210 But I also included this line here, which we called what? 00:15:37.210 --> 00:15:39.760 This was a prototype. 00:15:39.760 --> 00:15:41.470 And why did I have to include it there? 00:15:41.470 --> 00:15:45.070 Or equivalently, what would happen if I didn't include a prototype up 00:15:45.070 --> 00:15:45.790 at the top there? 00:15:45.790 --> 00:15:46.693 Yeah? 00:15:46.693 --> 00:15:51.255 AUDIENCE: [INAUDIBLE] 00:15:51.255 --> 00:15:52.130 DAVID MALAN: Exactly. 00:15:52.130 --> 00:15:55.820 If I didn't include it up here, the compiler, when trying to compile main, 00:15:55.820 --> 00:15:59.370 would not know what meow is because it's not defined until later. 00:15:59.370 --> 00:16:02.210 So this is kind of like a little hint of what is to come. 00:16:02.210 --> 00:16:05.750 Alternatively, we could just move this whole thing up to the top of the file, 00:16:05.750 --> 00:16:08.120 but I claim that just devolves into a big mess 00:16:08.120 --> 00:16:10.250 eventually once you have many different functions. 00:16:10.250 --> 00:16:13.590 Like, you can't realistically put them all at the top to solve this problem. 00:16:13.590 --> 00:16:15.870 So these prototypes solve that problem. 00:16:15.870 --> 00:16:16.760 So nothing new here. 00:16:16.760 --> 00:16:20.750 Just a reminder of what motivated this one line of prototype. 00:16:20.750 --> 00:16:24.290 Now let's consider this simpler program, which 00:16:24.290 --> 00:16:26.945 is just the one we wrote most recently in VS Code. 00:16:26.945 --> 00:16:28.820 This program prompts the human for their name 00:16:28.820 --> 00:16:30.590 and then says hello to that person. 00:16:30.590 --> 00:16:33.710 But it has two includes at the top of the file. 00:16:33.710 --> 00:16:37.070 And in fact, any line of C that starts with this hash symbol 00:16:37.070 --> 00:16:40.220 is what we'll now call a preprocessor directive.
00:16:40.220 --> 00:16:42.950 It's not really a term you need to remember in your vocabulary, 00:16:42.950 --> 00:16:46.310 but it is a little bit different from almost every other line 00:16:46.310 --> 00:16:47.900 because it starts with that hash. 00:16:47.900 --> 00:16:50.420 That's a special symbol in C. 00:16:50.420 --> 00:16:52.750 And what this means is the following. 00:16:52.750 --> 00:16:57.570 This very first line, cs50.h, is indeed a file that CS50's staff and I 00:16:57.570 --> 00:17:02.400 wrote and installed somewhere in VS Code for you, somewhere in the cloud. 00:17:02.400 --> 00:17:07.859 And I've claimed you need to use this header file in order to use get_string. 00:17:07.859 --> 00:17:12.290 So just logically, what is probably inside of cs50.h? 00:17:15.089 --> 00:17:16.170 Yeah? 00:17:16.170 --> 00:17:17.610 AUDIENCE: Function [INAUDIBLE]. 00:17:23.628 --> 00:17:24.670 DAVID MALAN: Super close. 00:17:24.670 --> 00:17:27.589 So the function called get_string that does the getting of a string, 00:17:27.589 --> 00:17:30.038 but it's not quite as much as the function itself. 00:17:30.038 --> 00:17:33.080 It's actually a little bit less than that, but you're on the right track. 00:17:33.080 --> 00:17:37.940 What is inside of cs50.h, presumably? 00:17:37.940 --> 00:17:40.560 Just a what? 00:17:40.560 --> 00:17:43.770 Just a prototype for? 00:17:43.770 --> 00:17:44.820 Which function? 00:17:44.820 --> 00:17:45.750 get_string. 00:17:45.750 --> 00:17:48.390 So admittedly, there's some other stuff in there, too, 00:17:48.390 --> 00:17:51.930 but the important line for today's discussion is that inside of cs50.h 00:17:51.930 --> 00:17:55.740 is indeed one line of code that declares what the return value is, what 00:17:55.740 --> 00:17:59.610 the name is, and what the arguments, if any, are to get_string, 00:17:59.610 --> 00:18:00.880 and some other stuff.
00:18:00.880 --> 00:18:05.130 And so, effectively, when you compile your code, 00:18:05.130 --> 00:18:07.080 step 1 is this pre-processing step. 00:18:07.080 --> 00:18:09.960 And essentially, there is some code that someone else wrote inside 00:18:09.960 --> 00:18:13.710 of the clang compiler that looks for a line that starts with hash include, 00:18:13.710 --> 00:18:17.580 and when it sees that, it goes and finds this file and effectively copies 00:18:17.580 --> 00:18:21.240 and pastes the contents of that file right there into your code 00:18:21.240 --> 00:18:23.130 so that you don't have to go find the file, 00:18:23.130 --> 00:18:25.840 copy and paste it, and make a mess of your own code. 00:18:25.840 --> 00:18:29.550 So in particular, it's effectively as though you're copying and pasting 00:18:29.550 --> 00:18:32.910 the prototype of get_string to the very top of your file, 00:18:32.910 --> 00:18:35.550 thereby teaching the compiler that it exists. 00:18:35.550 --> 00:18:38.550 By that same logic, what is probably in stdio.h? 00:18:41.740 --> 00:18:43.690 The prototype for? 00:18:43.690 --> 00:18:44.710 For printf. 00:18:44.710 --> 00:18:46.280 And indeed, exactly that. 00:18:46.280 --> 00:18:49.450 So this line effectively gets replaced with the equivalent 00:18:49.450 --> 00:18:52.150 of the prototype for printf, which, for today's purposes, 00:18:52.150 --> 00:18:55.210 is a bit more complicated, so let me wave my hand at the dot-dot-dot, 00:18:55.210 --> 00:18:57.850 which is there because printf takes a variable number of arguments 00:18:57.850 --> 00:19:00.760 depending on how many placeholders or format codes you have. 00:19:00.760 --> 00:19:03.290 But effectively, that, too, is what's happening. 00:19:03.290 --> 00:19:06.190 So the preprocessor step, step 1 of 4, just 00:19:06.190 --> 00:19:08.097 does that find and replace, if you will.
00:19:08.097 --> 00:19:10.430 Now there's some-- again, some other stuff in that file, 00:19:10.430 --> 00:19:12.580 and this, too, is kind of a white lie. printf 00:19:12.580 --> 00:19:15.790 probably has its own file because that's a really big library, 00:19:15.790 --> 00:19:17.930 but the essence of it is exactly this. 00:19:17.930 --> 00:19:21.010 So preprocessing converts all of those hash 00:19:21.010 --> 00:19:24.520 include lines to whatever the underlying prototypes are 00:19:24.520 --> 00:19:26.650 within the file, plus some other stuff. 00:19:26.650 --> 00:19:29.920 Now, we use compiling as this catch-all phrase, but it turns out 00:19:29.920 --> 00:19:32.100 it has a very specific meaning that's worth 00:19:32.100 --> 00:19:33.850 knowing about, even though after today, you 00:19:33.850 --> 00:19:37.120 can go back to using compiling as the sort of catch-all phrase. 00:19:37.120 --> 00:19:41.390 So suppose you've got this same code here after the pre-processing step 00:19:41.390 --> 00:19:42.420 has happened. 00:19:42.420 --> 00:19:44.900 So this is essentially happening in the computer's memory. 00:19:44.900 --> 00:19:49.400 It's not changing your hello.c file permanently or anything like that. 00:19:49.400 --> 00:19:54.890 This code gets, quote-unquote, "compiled" into something 00:19:54.890 --> 00:19:57.120 that looks more like this. 00:19:57.120 --> 00:19:59.660 And this is a scarier language that we won't spend time 00:19:59.660 --> 00:20:00.860 on in this particular class. 00:20:00.860 --> 00:20:02.690 This is what's known as assembly language. 00:20:02.690 --> 00:20:06.710 And back in the day, before there was C, humans 00:20:06.710 --> 00:20:09.110 wrote this to program their computers. 00:20:09.110 --> 00:20:12.440 Similarly, before there was assembly code back in the day, 00:20:12.440 --> 00:20:15.163 humans very initially used what instead? 00:20:15.163 --> 00:20:16.080 AUDIENCE: 0's and 1's.
00:20:16.080 --> 00:20:19.430 DAVID MALAN: So 0's and 1's-- like they actually wrote the machine code 00:20:19.430 --> 00:20:23.360 painfully, be it in code or be it in punch cards like physical objects 00:20:23.360 --> 00:20:24.000 or the like. 00:20:24.000 --> 00:20:25.730 So again, these are sort of abstractions, 00:20:25.730 --> 00:20:27.660 but for today we're rewinding in time. 00:20:27.660 --> 00:20:30.860 But what this compiler for C is doing is converting C 00:20:30.860 --> 00:20:33.380 into this other language called assembly language. 00:20:33.380 --> 00:20:35.630 And even though this looks very esoteric, 00:20:35.630 --> 00:20:37.940 there are at least some juicy things in here. 00:20:37.940 --> 00:20:40.580 If I highlight get_string, it's mentioned in this code. 00:20:40.580 --> 00:20:42.560 printf is mentioned in this code. 00:20:42.560 --> 00:20:44.540 And even some of these keywords here that 00:20:44.540 --> 00:20:48.320 are spelled a bit weirdly, this relates to subtracting and moving 00:20:48.320 --> 00:20:51.480 something in memory and calling a function, calling a function. 00:20:51.480 --> 00:20:53.450 So there's some semantics that are probably 00:20:53.450 --> 00:20:56.690 somewhat familiar even though this is not code we ourselves will write. 00:20:56.690 --> 00:20:59.670 But unfortunately, this is not yet machine code, 00:20:59.670 --> 00:21:02.370 and that's where step 3 comes in. 00:21:02.370 --> 00:21:06.470 So step 3 of this four-step process is technically called assembling. 00:21:06.470 --> 00:21:12.320 And assembling just takes that assembly code and converts it, thankfully, 00:21:12.320 --> 00:21:15.650 to the thing we do care about, the 0's and 1's. 00:21:15.650 --> 00:21:18.830 So assembling takes assembly code and converts it to 0's and 1's.
00:21:18.830 --> 00:21:21.020 As an aside, and I alluded to this earlier, 00:21:21.020 --> 00:21:26.810 the reason that Clang names its files a.out by default, assembler output, 00:21:26.810 --> 00:21:30.740 is a side effect of that being one of the steps in this process, 00:21:30.740 --> 00:21:33.740 dealing with assembly language and its subsequent output. 00:21:33.740 --> 00:21:36.680 All right, so here are some 0's and 1's, but unfortunately, there's 00:21:36.680 --> 00:21:41.340 still that fourth and final step, which is a word that I also used earlier, 00:21:41.340 --> 00:21:42.620 namely linking. 00:21:42.620 --> 00:21:45.420 So let me take a step back and look at this code here. 00:21:45.420 --> 00:21:50.090 And even though this code is exactly as I wrote in VS Code in hello.c-- 00:21:50.090 --> 00:21:52.310 so no copying and pasting, no prototypes have 00:21:52.310 --> 00:21:55.610 been plugged in here, this is my code, technically, there's 00:21:55.610 --> 00:21:59.270 three different files involved in compiling even something relatively 00:21:59.270 --> 00:22:00.170 simple like this. 00:22:00.170 --> 00:22:03.560 There's obviously this thing itself, hello.c, which I wrote. 00:22:03.560 --> 00:22:08.600 There's apparently cs50.h, and there's apparently stdio.h. 00:22:08.600 --> 00:22:12.650 But technically-- and you don't have to know this file name, per se, somewhere 00:22:12.650 --> 00:22:15.540 else on the computer's hard drive, so to speak, 00:22:15.540 --> 00:22:19.520 is a cs50.c file, which actually contains 00:22:19.520 --> 00:22:22.910 the staff's implementation of get_string and get_int and get_float 00:22:22.910 --> 00:22:24.320 and all of those other functions. 00:22:24.320 --> 00:22:28.460 Somewhere on the server's hard drive is stdio.c 00:22:28.460 --> 00:22:31.890 that implements printf and all of these other functions as well. 00:22:31.890 --> 00:22:34.940 So the dot c is just inferred from the dot h here. 
00:22:34.940 --> 00:22:38.450 You don't ever mention the dot c file, but someone else wrote those files, 00:22:38.450 --> 00:22:41.570 someone else stored them on the server for you-- 00:22:41.570 --> 00:22:43.220 CS50 staff in this case. 00:22:43.220 --> 00:22:47.270 So technically, even when compiling a relatively short program like this, 00:22:47.270 --> 00:22:51.920 you're really combining three files at least at the end of the day. 00:22:51.920 --> 00:22:54.020 And I'll write them from left to right. hello.c, 00:22:54.020 --> 00:23:01.920 which I wrote, cs50.c, which the staff wrote, and then stdio.c as well. 00:23:01.920 --> 00:23:04.010 So somewhere there are these three files. 00:23:04.010 --> 00:23:08.540 And Clang, our compiler, needs to compile each of these 00:23:08.540 --> 00:23:12.500 into the corresponding 0's and 1's. 00:23:12.500 --> 00:23:17.300 Lastly, this is not yet sufficient because these 0's and 1's haven't 00:23:17.300 --> 00:23:18.333 been linked together. 00:23:18.333 --> 00:23:20.750 I mean, I deliberately left a gap here to imply that these 00:23:20.750 --> 00:23:22.880 are three separately compiled files. 00:23:22.880 --> 00:23:25.760 So that fourth and final step called linking 00:23:25.760 --> 00:23:28.430 takes all of these 0's and 1's and in an intelligent way 00:23:28.430 --> 00:23:35.300 combines them into just one final file named hello, named a.out, 00:23:35.300 --> 00:23:37.680 whatever the file name of choice is. 00:23:37.680 --> 00:23:40.820 So what you and I for the past week have just been calling compiling-- 00:23:40.820 --> 00:23:43.550 and that's what a normal person will use henceforth 00:23:43.550 --> 00:23:46.490 to describe this whole process, technically, there are 00:23:46.490 --> 00:23:49.250 these four different steps underneath the hood, each of which 00:23:49.250 --> 00:23:55.067 is sort of a representative of an evolution of technology over the years.
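To make the "separately compiled, then linked" idea concrete, here is a minimal sketch that squeezes three hypothetical files into one block so it compiles on its own. `mymath.h`, `mymath.c`, `square`, and `twice_square` are illustrative names, standing in for the cs50.h/cs50.c pairing. Split into real files, you could compile each part on its own with `clang -c hello.c` and `clang -c mymath.c`, then link the two object files with `clang hello.o mymath.o -o hello` (with a `main` added to hello.c).

```c
/* ---- mymath.h (hypothetical): only the prototype, like cs50.h ---- */
int square(int n);

/* ---- hello.c (hypothetical): would #include "mymath.h". It needs only
        the prototype to compile; the actual call to square is left as
        a reference for the linker to resolve. ---- */
int twice_square(int n)
{
    return 2 * square(n);
}

/* ---- mymath.c (hypothetical): the implementation, like cs50.c ---- */
int square(int n)
{
    return n * n;
}
```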
00:23:55.067 --> 00:23:56.900 And nowadays, if we fast forward a few weeks 00:23:56.900 --> 00:23:59.780 in class, when we start talking about Python, which 00:23:59.780 --> 00:24:03.710 is another more modern language, that, too, is going to be conceptually even 00:24:03.710 --> 00:24:06.090 higher level, even though underneath the hood, 00:24:06.090 --> 00:24:09.330 there's going to be some lower-level principles at work. 00:24:09.330 --> 00:24:16.010 So any questions on just terminology or these processes known as compiling? 00:24:16.010 --> 00:24:17.462 Yeah? 00:24:17.462 --> 00:24:19.879 AUDIENCE: I didn't really understand what compiling means. 00:24:19.879 --> 00:24:21.360 [INAUDIBLE] 00:24:21.360 --> 00:24:22.110 DAVID MALAN: Sure. 00:24:22.110 --> 00:24:29.400 Compiling, if I rewind, is the process of taking your source code, which 00:24:29.400 --> 00:24:35.260 looks like this, recall-- whoops, this, and converting it into assembly code. 00:24:35.260 --> 00:24:38.640 So preprocessing just converts all of those hash 00:24:38.640 --> 00:24:41.470 include lines and a few others to their equivalents. 00:24:41.470 --> 00:24:42.210 So that's step 1. 00:24:42.210 --> 00:24:46.920 Compiling converts the C code into the underlying assembly code. 00:24:46.920 --> 00:24:51.750 The assembling step, step 3, converts the assembly code to 0's and 1's. 00:24:51.750 --> 00:24:54.480 And then the fourth step, linking, combines 00:24:54.480 --> 00:24:57.960 all of the 0's and 1's from the one, the two, the three or more files 00:24:57.960 --> 00:25:00.510 that are involved in your project and links them 00:25:00.510 --> 00:25:02.310 all together for you magically. 00:25:02.310 --> 00:25:06.060 But at the end of the day, all of this is happening automatically for you. 
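The four steps can be sketched as comments on a small file. The `clang` flags named here are real: `-E` stops after preprocessing and prints the result, `-S` stops after compiling and writes assembly to a `.s` file, and `-c` stops after assembling and writes a `.o` object file; a plain `clang` invocation goes on to link. The function `print_bricks` and the `HEIGHT` macro are hypothetical, chosen to echo this week's bricks, and the function returns a count only so that its behavior is easy to check.

```c
/* Step 1, preprocessing (clang -E): this line is replaced by the
   contents of stdio.h, including printf's prototype. */
#include <stdio.h>

/* Also step 1: every HEIGHT below is replaced by the literal 3. */
#define HEIGHT 3

/* Step 2, compiling (clang -S): this function becomes assembly, where
   the call to printf typically still appears by name.
   Step 3, assembling (clang -c): that assembly becomes machine code in
   a .o file, with printf left as an unresolved reference.
   Step 4, linking: the .o file is combined with the C library's
   compiled printf into one executable. */
int print_bricks(void)
{
    int chars = 0;
    for (int i = 0; i < HEIGHT; i++)
    {
        chars += printf("#\n");   /* printf returns characters printed */
    }
    return chars;
}
```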
00:25:06.060 --> 00:25:10.530 If I jump now to the end here, whereby just by running 00:25:10.530 --> 00:25:14.310 make, which, in turn, runs clang for you, all of this 00:25:14.310 --> 00:25:15.900 is abstracted away. 00:25:15.900 --> 00:25:19.620 But the key here is that even with these commands that we've been running, 00:25:19.620 --> 00:25:22.510 be it the make command or the clang command, 00:25:22.510 --> 00:25:28.570 everything you are typing at the prompt should 00:25:28.570 --> 00:25:29.410 ultimately be explainable. 00:25:29.410 --> 00:25:31.300 Each of those things has a purpose. 00:25:31.300 --> 00:25:33.850 So any questions, then, on what we've just 00:25:33.850 --> 00:25:38.018 now called compiling even though it's only when you take another CS 00:25:38.018 --> 00:25:40.060 course that you might spend more time on assembly 00:25:40.060 --> 00:25:42.940 language or these lower-level details? 00:25:42.940 --> 00:25:43.480 Yeah? 00:25:43.480 --> 00:25:47.264 AUDIENCE: [INAUDIBLE] 00:25:49.092 --> 00:25:50.300 DAVID MALAN: A good question. 00:25:50.300 --> 00:25:51.740 Are there other types of compilers? 00:25:51.740 --> 00:25:52.240 Yes. 00:25:52.240 --> 00:25:57.320 Back when I took CS50, I used a popular compiler called GCC, the GNU Compiler 00:25:57.320 --> 00:26:00.650 Collection, which still exists actually in the codespace 00:26:00.650 --> 00:26:02.120 that you're using for CS50. 00:26:02.120 --> 00:26:04.110 Clang is somewhat more recent. 00:26:04.110 --> 00:26:05.153 It's gaining popularity. 00:26:05.153 --> 00:26:07.820 And frankly, we use it in large part because its error messages 00:26:07.820 --> 00:26:09.320 are slightly more user-friendly. 00:26:09.320 --> 00:26:12.570 You might not believe us because if you encountered some errors with your code 00:26:12.570 --> 00:26:16.370 this past week, they were probably just as arcane as the error messages I saw, 00:26:16.370 --> 00:26:18.598 but it's better than it was some years ago.
00:26:18.598 --> 00:26:20.390 And there's alternatives to compiling, too, 00:26:20.390 --> 00:26:24.100 but more on that when we get to Python as well. 00:26:24.100 --> 00:26:26.080 Other questions? 00:26:26.080 --> 00:26:26.580 No? 00:26:26.580 --> 00:26:27.080 All right. 00:26:27.080 --> 00:26:31.020 Well, what are the implications of the fact that we're going from source code 00:26:31.020 --> 00:26:32.190 to machine code? 00:26:32.190 --> 00:26:35.010 Well, it stands to reason that if you can compile code, 00:26:35.010 --> 00:26:38.970 maybe you can decompile it-- that is, go in the reverse direction. 00:26:38.970 --> 00:26:42.010 Go from 0's and 1's to actual source code. 00:26:42.010 --> 00:26:45.477 Now that would be handy if you want to go in as a programmer and change 00:26:45.477 --> 00:26:48.060 something in a program that you or someone else already wrote. 00:26:48.060 --> 00:26:51.330 It's maybe not ideal for your intellectual property, 00:26:51.330 --> 00:26:54.780 though, if you are the person who wrote that program in the first place. 00:26:54.780 --> 00:26:57.810 If you are Microsoft and you wrote Microsoft Word or Excel 00:26:57.810 --> 00:27:01.290 that people with Macs and PCs and phones have installed on their devices, 00:27:01.290 --> 00:27:04.440 it doesn't actually sound very appealing if any old customer 00:27:04.440 --> 00:27:08.830 can take those 0's and 1's and reverse them, reverse engineer them, 00:27:08.830 --> 00:27:11.157 so to speak, into the original source code 00:27:11.157 --> 00:27:13.740 because then they can have their own version of Microsoft Word 00:27:13.740 --> 00:27:17.100 and make changes to it without really having put in all of the R&D 00:27:17.100 --> 00:27:19.980 that it might have taken to build the first version thereof. 
00:27:19.980 --> 00:27:22.720 But it turns out that reverse engineering-- 00:27:22.720 --> 00:27:26.050 so doing things in the opposite direction-- is easier 00:27:26.050 --> 00:27:29.740 said than done because there are multiple ways, as you've seen already, 00:27:29.740 --> 00:27:31.300 to implement programs. 00:27:31.300 --> 00:27:35.440 Take loops alone: you can use for loops, while loops, even do-while loops. 00:27:35.440 --> 00:27:37.540 And so there are other ways-- there are multiple ways 00:27:37.540 --> 00:27:38.960 to solve the same problem. 00:27:38.960 --> 00:27:41.590 So even if you try to reverse engineer a program 00:27:41.590 --> 00:27:44.440 and convert machine code back to source code, 00:27:44.440 --> 00:27:48.170 there's not necessarily going to be an obvious way to do so. 00:27:48.170 --> 00:27:50.620 And the reality is that it ends up being such a mess, 00:27:50.620 --> 00:27:53.350 because you typically lose the variable names 00:27:53.350 --> 00:27:57.070 and typically lose the function names, that what you end up looking at 00:27:57.070 --> 00:28:01.300 might very well be C code, but it's very difficult for you, 00:28:01.300 --> 00:28:03.040 even a good programmer, to read. 00:28:03.040 --> 00:28:06.520 And generally, the mindset is, if you're really good enough 00:28:06.520 --> 00:28:09.782 to decompile code in that way and read it subsequently 00:28:09.782 --> 00:28:11.740 even without good variable names, good function 00:28:11.740 --> 00:28:14.950 names, good documentation and the like, you could probably have just implemented 00:28:14.950 --> 00:28:18.340 the program in the first place yourself without jumping through those hoops. 00:28:18.340 --> 00:28:20.440 So there's some practicality pushing back 00:28:20.440 --> 00:28:25.420 on what are otherwise potential threats to, say, your intellectual property.
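As a small illustration of "multiple ways to solve the same problem," here are two hypothetical functions that compute the same thing, one with a `for` loop and one with a `while` loop. An optimizing compiler may well produce similar or even identical machine code for both. Local names like `count` and `i` are not stored in the machine code at all (absent debugging information). So a decompiler generally cannot tell which version, or which names, the programmer originally wrote.

```c
/* Version 1: count iterations with a for loop. */
int count_with_for(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
    {
        count++;
    }
    return count;
}

/* Version 2: the same logic written as a while loop. */
int count_with_while(int n)
{
    int count = 0;
    int i = 0;
    while (i < n)
    {
        count++;
        i++;
    }
    return count;
}
```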
00:28:25.420 --> 00:28:28.150 But that's not going to be the case later on in the term when 00:28:28.150 --> 00:28:31.270 we do get to languages like Python to some extent, other languages 00:28:31.270 --> 00:28:32.200 like JavaScript. 00:28:32.200 --> 00:28:34.870 Some of those are actually going to be readable by anyone. 00:28:34.870 --> 00:28:36.790 Any of your customers, any of your friends, 00:28:36.790 --> 00:28:39.950 and your family that actually use your programs. 00:28:39.950 --> 00:28:43.540 So with that said, let's introduce now another tool to our toolkit 00:28:43.540 --> 00:28:45.580 that will hopefully make some of the pain 00:28:45.580 --> 00:28:47.470 from this past week when you did encounter 00:28:47.470 --> 00:28:49.210 bugs a little more manageable. 00:28:49.210 --> 00:28:52.330 And indeed, part of the process of writing code to this day 00:28:52.330 --> 00:28:53.680 is debugging it. 00:28:53.680 --> 00:28:56.560 And it is a rare thing to write a program, 00:28:56.560 --> 00:29:01.450 be it in C or any other language, and get it 100% right the first time. 00:29:01.450 --> 00:29:05.360 I mean, to this day, I still, 20-plus years later, still write buggy code. 00:29:05.360 --> 00:29:08.695 Hopefully a little bit less of it, but any time you're adding a new feature, 00:29:08.695 --> 00:29:10.820 any time you're doing something for the first time, 00:29:10.820 --> 00:29:14.380 you're not necessarily going to see all of the possible mistakes. 00:29:14.380 --> 00:29:18.910 So even in industry, bugs are omnipresent, which is really to say, 00:29:18.910 --> 00:29:22.360 having techniques to debug code-- that is, eliminate bugs, 00:29:22.360 --> 00:29:23.740 is super compelling. 
00:29:23.740 --> 00:29:26.920 Now just for a bit of history, here is Admiral Grace Hopper, 00:29:26.920 --> 00:29:30.230 who was actually not only in the military, 00:29:30.230 --> 00:29:33.070 but also on the faculty of Harvard years ago 00:29:33.070 --> 00:29:35.860 and worked on a Harvard computer called the Harvard Mark 00:29:35.860 --> 00:29:39.250 I, which is actually on display at the School of Engineering and Applied 00:29:39.250 --> 00:29:41.260 Sciences if you take a tour over there sometime. 00:29:41.260 --> 00:29:44.230 But also when working on the Harvard Mark II, 00:29:44.230 --> 00:29:50.170 she is known for having at least popularized the term "bug" to mean 00:29:50.170 --> 00:29:53.350 a mistake in a computer's program-- 00:29:53.350 --> 00:29:55.240 a mistake in a computer's code. 00:29:55.240 --> 00:29:58.510 And the etymology of this supposedly is this here logbook 00:29:58.510 --> 00:30:02.320 wherein she and her colleagues were documenting processes being computed 00:30:02.320 --> 00:30:04.960 on computers, recording that a moth actually got stuck 00:30:04.960 --> 00:30:09.250 in one of the relays, one of the mechanical-- the electric relays inside 00:30:09.250 --> 00:30:13.450 of the now very old computer, and someone very cleverly 00:30:13.450 --> 00:30:16.657 wrote, "First actual case of bug being found." 00:30:16.657 --> 00:30:18.490 So it wasn't she who actually discovered it, 00:30:18.490 --> 00:30:22.450 but this was a story she was thereafter fond of telling as a famed computer 00:30:22.450 --> 00:30:23.860 scientist.
00:30:34.400 --> 00:30:37.270 So let me go back over to VS Code here and let 00:30:37.270 --> 00:30:44.290 me propose that I do something somewhat simplistic just like this to print out 00:30:44.290 --> 00:30:47.140 a column of bricks of height 3. 00:30:47.140 --> 00:30:50.440 So I'm going into VS Code and I'm going to deliberately call this program 00:30:50.440 --> 00:30:53.230 buggy.c because I intend to do this poorly. 00:30:53.230 --> 00:30:58.760 I'm going to include stdio.h as before, int main void as before. 00:30:58.760 --> 00:31:01.630 And in here, if I want to print a column of height 3, 00:31:01.630 --> 00:31:04.720 I'm going to do for int i gets-- 00:31:04.720 --> 00:31:06.910 all right, I'm still new to programming in my mind 00:31:06.910 --> 00:31:09.820 here, so I know I'm supposed to start counting at 0, OK. 00:31:09.820 --> 00:31:13.480 And I want to do this until I count up to 3, so I'm going to do i is less than or equal to 3. 00:31:13.480 --> 00:31:16.700 And then i++, as I remember from class. 00:31:16.700 --> 00:31:20.500 And now I might go ahead and print out just a hash mark, backslash n, 00:31:20.500 --> 00:31:23.710 which I do want because I want to move this cursor to the next line 00:31:23.710 --> 00:31:24.790 to make this vertical. 00:31:24.790 --> 00:31:29.730 But of course, if you've noticed with your eye already, when I do make buggy, 00:31:29.730 --> 00:31:30.960 it compiles OK. 00:31:30.960 --> 00:31:33.640 So no typos, no syntactical errors. 00:31:33.640 --> 00:31:37.620 But when I run this, I'm going to see how many bricks? 00:31:37.620 --> 00:31:39.510 So four in this case. 00:31:39.510 --> 00:31:41.650 Now this is meant to be a simplistic example 00:31:41.650 --> 00:31:44.910 so that we don't spend time trying to figure out what the bug is, but rather, 00:31:44.910 --> 00:31:48.210 focus on techniques for actually identifying the bug. 00:31:48.210 --> 00:31:50.010 So-- finding, rather, the bug.
00:31:50.010 --> 00:31:52.170 So what's one of the first tools in your toolkit? 00:31:52.170 --> 00:31:55.470 Literally one you have already. printf is your friend. 00:31:55.470 --> 00:31:59.730 And it is a very quick and dirty tool for just seeing 00:31:59.730 --> 00:32:02.520 what's going on inside of the computer when 00:32:02.520 --> 00:32:06.550 you don't have more sophisticated tools or even the time to use them. 00:32:06.550 --> 00:32:09.750 And so in this case, for instance, what I'd propose is that-- 00:32:09.750 --> 00:32:11.610 all right, I'm obviously seeing four hashes. 00:32:11.610 --> 00:32:13.710 And let me play a little slow here. 00:32:13.710 --> 00:32:18.090 It'd be helpful for me to understand why logically I'm ending up with four, even 00:32:18.090 --> 00:32:21.360 though I'm starting at 0 like I remember from class and I'm going up to 3 00:32:21.360 --> 00:32:25.870 as we did in class, like I'm just not seeing it in this particular story. 00:32:25.870 --> 00:32:30.180 So what I would commonly do is go into my code and just help me see 00:32:30.180 --> 00:32:35.400 what's going on, and I might literally write a printf line like, i is %i, 00:32:35.400 --> 00:32:39.490 backslash n, comma, and then just print out the value of i. 00:32:39.490 --> 00:32:41.620 I just want to see on every iteration, what 00:32:41.620 --> 00:32:45.530 is i, what is i, what is i just to help me see what the computer already knows. 00:32:45.530 --> 00:32:49.900 So let me go ahead and recompile buggy, let me rerun buggy, 00:32:49.900 --> 00:32:51.910 and then let me make my terminal window bigger 00:32:51.910 --> 00:32:53.410 just to make clear what's going on. 00:32:53.410 --> 00:32:56.080 And now it's a little more pedantic. 00:32:56.080 --> 00:33:01.150 Now i is 0, I get a hash. i is 1, I get a hash. i is 2, I get a hash. 00:33:01.150 --> 00:33:04.310 Wait a minute. i is 3, I get a hash. 
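Here is a sketch of buggy.c's loop with that temporary `printf` added, wrapped in a hypothetical function `buggy_column` instead of `main`. The off-by-one is kept on purpose, since it is the bug being hunted, and the function returns the number of bricks only so the result can be checked. Calling it prints `i is 0` through `i is 3`, each followed by a `#`, which is how the fourth brick gives itself away.

```c
#include <stdio.h>

/* buggy.c's loop plus a temporary debugging printf. The <= is the
   deliberate off-by-one: the loop also runs when i is 3. */
int buggy_column(void)
{
    int bricks = 0;
    for (int i = 0; i <= 3; i++)
    {
        printf("i is %i\n", i);   /* temporary: delete once fixed */
        printf("#\n");
        bricks++;
    }
    return bricks;
}
```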
00:33:04.310 --> 00:33:07.250 So clearly now, it should be maybe more obvious to you, 00:33:07.250 --> 00:33:09.430 especially if the syntax itself is unfamiliar, 00:33:09.430 --> 00:33:11.680 I certainly don't want this last one printing, 00:33:11.680 --> 00:33:14.810 or maybe equivalently, I don't want the first one printing. 00:33:14.810 --> 00:33:17.830 So I can fix this in a couple of ways, but the solution, 00:33:17.830 --> 00:33:22.810 the most canonical solution is probably to do what with my code? 00:33:22.810 --> 00:33:24.430 To change to what to what? 00:33:24.430 --> 00:33:25.402 Yeah? 00:33:25.402 --> 00:33:26.590 AUDIENCE: [INAUDIBLE] 00:33:26.590 --> 00:33:27.340 DAVID MALAN: Yeah. 00:33:27.340 --> 00:33:31.000 So change the less than or equal sign to just a less than sign. 00:33:31.000 --> 00:33:36.580 So even though this is like counting from 0 up to, but not including, 3 instead of 1 through 3, 00:33:36.580 --> 00:33:39.890 it's the more typical programmatic way to write code like this. 00:33:39.890 --> 00:33:43.600 And now, of course, if I do make buggy-- 00:33:43.600 --> 00:33:46.840 and I'll increase my terminal window again, ./buggy, 00:33:46.840 --> 00:33:49.360 now I see what's going on inside of the code. 00:33:49.360 --> 00:33:53.080 Now it matches my expectations, and so now the bug is gone. 00:33:53.080 --> 00:33:55.330 Now of course, if I'm submitting this or shipping it, 00:33:55.330 --> 00:33:57.190 I should delete the temporary printf. 00:33:57.190 --> 00:34:00.610 And let me disclaim that using printf in this way just to help you 00:34:00.610 --> 00:34:03.100 see what's going on is generally a good thing, 00:34:03.100 --> 00:34:06.370 but generally adding a printf and a printf and a printf and a printf-- 00:34:06.370 --> 00:34:10.665 like it starts to devolve into just trial and error and you 00:34:10.665 --> 00:34:13.540 have no idea what's going on, so you're just printing out everything.
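Both fixes hinted at above can be sketched side by side, again as hypothetical functions that return how many bricks they printed, with the temporary `printf` removed. The first uses the canonical `<` and visits i = 0, 1, 2. The second keeps `<=` but starts at 1 and visits i = 1, 2, 3. Either way, exactly three bricks come out.

```c
#include <stdio.h>

/* Canonical fix: start at 0 and use <, so i takes the values 0, 1, 2. */
int fixed_column(void)
{
    int bricks = 0;
    for (int i = 0; i < 3; i++)
    {
        printf("#\n");
        bricks++;
    }
    return bricks;
}

/* Alternative fix: start at 1 and keep <=, so i takes the values 1, 2, 3. */
int fixed_column_from_one(void)
{
    int bricks = 0;
    for (int i = 1; i <= 3; i++)
    {
        printf("#\n");
        bricks++;
    }
    return bricks;
}
```

Counting from 0 with `<` is the more common C idiom, largely because it lines up with how arrays are indexed.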
00:34:13.540 --> 00:34:17.230 Let me propose that if you ever find yourself slipping down 00:34:17.230 --> 00:34:20.260 that hill into just trying this, trying this, trying this, 00:34:20.260 --> 00:34:22.659 you need a better tool, not just doing printf. 00:34:22.659 --> 00:34:26.199 And frankly, it's annoying to use printf because every time you add a printf, 00:34:26.199 --> 00:34:28.699 you have to recompile the code, rerun the code. 00:34:28.699 --> 00:34:31.230 It's just adding to the number of steps. 00:34:31.230 --> 00:34:34.550 So let me propose instead that we do this. 00:34:34.550 --> 00:34:37.070 I'm going to go back into VS Code here and I'm 00:34:37.070 --> 00:34:39.980 going to write a different program that actually 00:34:39.980 --> 00:34:42.110 has a helper function, so to speak. 00:34:42.110 --> 00:34:44.840 A second function whose purpose in life is maybe just 00:34:44.840 --> 00:34:46.940 to print that column for me. 00:34:46.940 --> 00:34:50.685 So I'm going to say this-- void print_column, 00:34:50.685 --> 00:34:53.060 though I could call it anything I want, and this function 00:34:53.060 --> 00:34:56.570 is going to take an argument or a parameter called 00:34:56.570 --> 00:34:59.300 height, which will tell it how many bricks to print, 00:34:59.300 --> 00:35:01.070 how many vertical bricks. 00:35:01.070 --> 00:35:05.900 I'm going to do the same kind of logic. for int i equals 0. 00:35:05.900 --> 00:35:06.830 i is less than-- 00:35:06.830 --> 00:35:09.830 I'm going to make the same mistake again-- less than or equal to height, 00:35:09.830 --> 00:35:10.850 i++. 00:35:10.850 --> 00:35:14.922 And then inside of this for loop, let me go ahead and print out the hash mark.
00:35:14.922 --> 00:35:16.880 So I've made the same mistake, but I've made it 00:35:16.880 --> 00:35:20.900 in the context now of a helper function only because in main, 00:35:20.900 --> 00:35:24.980 what I'd like to do now, just to be a little more sophisticated, is get_int 00:35:24.980 --> 00:35:27.300 from the user for the height. 00:35:27.300 --> 00:35:31.190 And when I do get that int, I want to store it in a variable called n, 00:35:31.190 --> 00:35:34.980 but I do need to give that variable a type like last week. 00:35:34.980 --> 00:35:36.440 So I'll say that it's an integer. 00:35:36.440 --> 00:35:40.940 And now, lastly, I can print_column, passing in-- actually, I'll 00:35:40.940 --> 00:35:43.100 call it h just because height is h. 00:35:43.100 --> 00:35:46.730 print_column(h), semicolon. 00:35:46.730 --> 00:35:49.790 OK, so it's the exact same program except I'm getting user input now. 00:35:49.790 --> 00:35:53.030 So it's not just going to be 3, it's going to be a variable height, 00:35:53.030 --> 00:35:55.108 but I've done something stupid. 00:35:55.108 --> 00:35:56.940 AUDIENCE: [INAUDIBLE] 00:35:56.940 --> 00:35:58.690 DAVID MALAN: I've done two stupid things. 00:35:58.690 --> 00:36:02.310 So this, of course, is not supposed to be there, so I'll fix that. 00:36:02.310 --> 00:36:03.390 And someone else? 00:36:03.390 --> 00:36:05.265 What else have I done? 00:36:05.265 --> 00:36:08.990 AUDIENCE: [INAUDIBLE] 00:36:08.990 --> 00:36:09.740 DAVID MALAN: Yeah. 00:36:09.740 --> 00:36:11.070 I'm missing the prototype. 00:36:11.070 --> 00:36:16.040 And this is, let me reiterate, probably the only time where copy-paste is OK. 00:36:16.040 --> 00:36:17.960 Once you've implemented the function, you 00:36:17.960 --> 00:36:21.690 can copy and paste its first line and add a semicolon 00:36:21.690 --> 00:36:25.265 so that it teaches the compiler that this function will exist.
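A minimal sketch of the prototype fix follows, under a few stated assumptions. `draw` is a hypothetical stand-in for `main`, so the example needs no user input. `print_column` returns the number of bricks instead of being `void` as in lecture, and it uses the corrected `<` rather than the deliberately buggy `<=`. The key point is the prototype near the top, which is just the function's first line plus a semicolon, so the call in `draw` compiles even though the definition comes later in the file.

```c
#include <stdio.h>

/* Prototype: print_column's first line, copied, plus a semicolon. */
int print_column(int height);

/* Hypothetical stand-in for main: calls print_column before the
   compiler has seen its definition, which is fine thanks to the
   prototype above. */
int draw(int h)
{
    return print_column(h);
}

/* Prints a column of height bricks and returns how many it printed. */
int print_column(int height)
{
    int bricks = 0;
    for (int i = 0; i < height; i++)
    {
        printf("#\n");
        bricks++;
    }
    return bricks;
}
```

Deleting the prototype line would make the call in `draw` an implicit declaration, which modern C compilers reject or warn about, and the conflicting later definition would then be an error.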
00:36:25.265 --> 00:36:26.635 AUDIENCE: [INAUDIBLE] 00:36:26.635 --> 00:36:28.010 DAVID MALAN: Three stupid things. 00:36:28.010 --> 00:36:28.510 OK. 00:36:28.510 --> 00:36:29.150 Thank you. 00:36:29.150 --> 00:36:31.520 So, good. 00:36:31.520 --> 00:36:33.620 Include cs50.h. 00:36:33.620 --> 00:36:36.860 And now, anyone want to go for four? 00:36:36.860 --> 00:36:38.040 No? 00:36:38.040 --> 00:36:38.540 All right. 00:36:38.540 --> 00:36:39.582 Slightly unintended here. 00:36:39.582 --> 00:36:42.020 So let's see. make buggy. 00:36:42.020 --> 00:36:44.160 OK, no syntax errors thanks to you all. 00:36:44.160 --> 00:36:47.090 So the code compiles, but of course, when I run buggy 00:36:47.090 --> 00:36:52.130 and I type in something like 3 manually, I'm still going to get 1, 2, 3, 4 out. 00:36:52.130 --> 00:36:54.500 So let me now introduce a more powerful tool 00:36:54.500 --> 00:36:56.450 that's generally known as a debugger. 00:36:56.450 --> 00:36:58.927 And within the VS Code environment that you're using, 00:36:58.927 --> 00:37:02.010 we actually have a command that makes it a little easier to use this tool, 00:37:02.010 --> 00:37:03.510 but we didn't write the tool itself. 00:37:03.510 --> 00:37:07.040 You are about to see a very graphical, a very popular industry-standard 00:37:07.040 --> 00:37:11.510 tool called a debugger, but we'll start the debugger using a CS50-specific 00:37:11.510 --> 00:37:15.080 command called debug50, which just makes it easier with a single command 00:37:15.080 --> 00:37:17.655 to start the debugger without having to configure a text 00:37:17.655 --> 00:37:20.030 file with all of your preferred settings and all of that. 00:37:20.030 --> 00:37:22.710 It's just an annoying hoop otherwise to jump through.
00:37:25.100 --> 00:37:27.900 I have already compiled it, but just for good measure, 00:37:27.900 --> 00:37:31.140 I'll make buggy again because the debugger needs your code 00:37:31.140 --> 00:37:31.862 to be compiled. 00:37:31.862 --> 00:37:33.570 It's not going to help with syntax errors 00:37:33.570 --> 00:37:36.270 like the stupid mistakes I just made unintentionally, 00:37:36.270 --> 00:37:40.530 but it will help you with programmatic errors, logical errors 00:37:40.530 --> 00:37:42.870 in your code once your code is running. 00:37:42.870 --> 00:37:47.130 So to run debug50, I'm going to do this. debug50, space, and then 00:37:47.130 --> 00:37:51.840 the exact same command I would normally run to just run the program itself. 00:37:51.840 --> 00:37:53.190 So ./buggy. 00:37:53.190 --> 00:37:57.150 So exact same thing, ./buggy, but I prefix it now with debug50. 00:37:57.150 --> 00:37:59.172 When I hit Enter, a whole bunch of-- 00:37:59.172 --> 00:38:01.380 another error is going to pop up on the screen, which 00:38:01.380 --> 00:38:04.213 is a good reminder because this will happen to you, too, invariably. 00:38:04.213 --> 00:38:07.560 It's reminding me that I have to set what's called a breakpoint. 00:38:07.560 --> 00:38:10.140 And as that word suggests, it is the point 00:38:10.140 --> 00:38:12.060 at which you want your code to break. 00:38:12.060 --> 00:38:15.420 Not break in the make-the-situation-worse sense, but rather, 00:38:15.420 --> 00:38:16.920 where do you want to pause execution? 00:38:16.920 --> 00:38:20.590 Break execution-- like hitting the brakes on a car 00:38:20.590 --> 00:38:22.710 so the program doesn't run all at once. 00:38:22.710 --> 00:38:24.600 And you can put this in any number of places, 00:38:24.600 --> 00:38:26.308 and you might have done this accidentally 00:38:26.308 --> 00:38:29.040 if you've ever hovered over the gutter of VS Code, 00:38:29.040 --> 00:38:32.010 the left-hand side next to your line numbers.
00:38:32.010 --> 00:38:34.180 See the little red dot that appears? 00:38:34.180 --> 00:38:38.560 If I click on any of these lines, that's going to set a breakpoint, so to speak. 00:38:38.560 --> 00:38:41.310 And I want to break execution at main. 00:38:41.310 --> 00:38:44.040 So I'm just going to click to the left of line 6 in this case. 00:38:44.040 --> 00:38:47.430 That makes it a darker red circle, a stop sign 00:38:47.430 --> 00:38:51.030 of sorts that tells the debugger to pause execution on that line, 00:38:51.030 --> 00:38:53.580 though I could put it elsewhere if I so choose. 00:38:53.580 --> 00:38:57.990 Let me go ahead and rerun debug50 ./buggy, Enter, 00:38:57.990 --> 00:39:00.652 and now a bunch of things are going to happen on the screen. 00:39:00.652 --> 00:39:03.360 It's going to look a little overwhelming perhaps at first glance, 00:39:03.360 --> 00:39:05.950 but there's some useful stuff that just happened. 00:39:05.950 --> 00:39:12.450 So one, my code is still here, but the line that I set the breakpoint on is-- 00:39:12.450 --> 00:39:16.080 rather, the first line of actual executable 00:39:16.080 --> 00:39:20.970 code at or below the breakpoint I set is highlighted in this yellowish green 00:39:20.970 --> 00:39:25.120 here, which says, this line of code has not yet been executed. 00:39:25.120 --> 00:39:28.590 We broke at this point, but if I click a button, this line of code 00:39:28.590 --> 00:39:30.030 will be executed. 00:39:30.030 --> 00:39:33.750 Because up until now, every C program you write runs as fast as that. 00:39:33.750 --> 00:39:36.550 I want to pump the brakes and pause here. 00:39:36.550 --> 00:39:39.190 But notice a few other aspects of the window here. 00:39:39.190 --> 00:39:41.310 So notice that up here some weirdness. 00:39:41.310 --> 00:39:43.890 There's mentions of variables and we're familiar with these. 00:39:43.890 --> 00:39:45.990 Local is a term we'll use this week. 
00:39:45.990 --> 00:39:48.210 But there's this variable h, and weirdly, 00:39:48.210 --> 00:39:51.300 where did the value 21912 come from? 00:39:51.300 --> 00:39:57.750 So it turns out, in C, before you initialize a variable with a value 00:39:57.750 --> 00:40:01.890 by literally typing the number 3, or by using a function like get_int, 00:40:01.890 --> 00:40:04.662 it often contains what's called a garbage value. 00:40:04.662 --> 00:40:06.120 More on those in a couple of weeks. 00:40:06.120 --> 00:40:07.950 But a garbage value, you can think of it 00:40:07.950 --> 00:40:10.680 as like remnants of whatever was in the computer's memory 00:40:10.680 --> 00:40:12.280 before you ran your program. 00:40:12.280 --> 00:40:14.040 And that's a bit of an oversimplification, 00:40:14.040 --> 00:40:18.150 but you cannot trust that a variable will have a certain value in this case 00:40:18.150 --> 00:40:21.490 if you did not put one there yourself. 00:40:21.490 --> 00:40:23.857 So for now, h is nonsensical. 00:40:23.857 --> 00:40:25.440 It's a garbage value; it means nothing. 00:40:25.440 --> 00:40:29.230 But once I execute this line, it should contain whatever the human types in. 00:40:29.230 --> 00:40:29.730 All right. 00:40:29.730 --> 00:40:32.990 Down here, there's a watch section, which is a more sophisticated feature. 00:40:32.990 --> 00:40:34.740 Down here is what's called the call stack. 00:40:34.740 --> 00:40:35.890 More on that in the future. 00:40:35.890 --> 00:40:39.240 But what this means for now is that I'm executing the main function, not, 00:40:39.240 --> 00:40:40.870 for instance, print_column. 00:40:40.870 --> 00:40:44.790 So notice up here, these are the most useful controls within the interface. 00:40:44.790 --> 00:40:46.740 If I hit this Play button, it's just going 00:40:46.740 --> 00:40:50.640 to actually run my program to the end of it without bothering me further.
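The practical takeaway about garbage values is to give every variable a value before you read it. The hypothetical function below adds up 1 + 2 + ... + n. Its running total is explicitly initialized to 0; if the `= 0` were left off, `total` would start out as a garbage value, and reading it would be undefined behavior in C, so the result could be anything. Counting from 1 with `<=` is correct here, because the goal really is 1 through n.

```c
/* Returns 1 + 2 + ... + n (0 when n is 0 or negative). */
int sum_to(int n)
{
    int total = 0;   /* without "= 0", total would hold a garbage value */
    for (int i = 1; i <= n; i++)
    {
        total += i;
    }
    return total;
}
```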
00:40:50.640 --> 00:40:54.990 However, I can actually step over this line of code and execute it, 00:40:54.990 --> 00:40:57.870 or I can step into this line of code and actually 00:40:57.870 --> 00:41:01.480 poke around the contents of get_int if it's available on the system. 00:41:01.480 --> 00:41:03.870 So conceptually you can either execute this line 00:41:03.870 --> 00:41:08.745 or you can dive down conceptually deeper and see what's inside of that function. 00:41:08.745 --> 00:41:10.620 Lastly, this will let you step out, this will 00:41:10.620 --> 00:41:13.828 allow you to restart the whole process, and this will just stop the debugger. 00:41:13.828 --> 00:41:15.960 So these buttons are going to be our friends. 00:41:15.960 --> 00:41:19.840 And the one I'll click first is the first one I described, 00:41:19.840 --> 00:41:21.690 which is step over. 00:41:21.690 --> 00:41:26.180 So step over doesn't mean, skip this step, it just means execute it, 00:41:26.180 --> 00:41:30.000 but don't bother me by going into the weeds of what is on the specific line, 00:41:30.000 --> 00:41:30.740 namely get_int. 00:41:30.740 --> 00:41:32.990 So when I click this button in a moment, you'll 00:41:32.990 --> 00:41:36.830 see that my terminal, which is still at the bottom, prompts me for a height. 00:41:36.830 --> 00:41:38.600 I'm going to go ahead and type 3. 00:41:38.600 --> 00:41:41.240 As soon as I hit Enter, what part of the screen 00:41:41.240 --> 00:41:44.285 probably will change based on what I've said? 00:41:47.280 --> 00:41:50.760 So h, the variable h should hopefully take on the number 3. 00:41:50.760 --> 00:41:53.340 And I'll probably see a different line of code 00:41:53.340 --> 00:41:57.990 highlighted, probably line 9 next once I'm done executing line 8. 00:41:57.990 --> 00:42:01.170 So let me go ahead and hit Enter and watch the top-left of the screen. 
00:42:01.170 --> 00:42:08.580 And voila, h now has the value 3, and execution has now paused on line 9 00:42:08.580 --> 00:42:12.900 because the debugger is allowing me to step through my code line by line. 00:42:12.900 --> 00:42:16.998 Now let me go ahead and print out-- let me go ahead and just say, all right, 00:42:16.998 --> 00:42:17.790 I'm done with this. 00:42:17.790 --> 00:42:19.950 Let's go ahead and run the rest of the program. 00:42:19.950 --> 00:42:21.660 It clearly got the value 3. 00:42:21.660 --> 00:42:22.658 But wait a minute-- 00:42:22.658 --> 00:42:24.450 oh, and at this point, it closed the window 00:42:24.450 --> 00:42:28.530 in which I would have seen the output, I would have still seen four hashes. 00:42:28.530 --> 00:42:29.950 So let me actually do this again. 00:42:29.950 --> 00:42:34.392 Let me go back into debug50 by running the exact same command again. 00:42:34.392 --> 00:42:37.350 It's going to think for a moment, it's going to reconfigure the screen. 00:42:37.350 --> 00:42:38.892 I'm going to do the exact same thing. 00:42:38.892 --> 00:42:41.100 I'm going to step over this line, but I'd 00:42:41.100 --> 00:42:45.490 like to actually see what's going on inside of my print_column function. 00:42:45.490 --> 00:42:48.580 So this time, instead of just saying run to the end 00:42:48.580 --> 00:42:51.100 and close all the windows on me, let me go ahead 00:42:51.100 --> 00:42:54.460 and step into my print_column function. 00:42:54.460 --> 00:42:57.070 So don't step over, step into. 00:42:57.070 --> 00:42:58.525 Because if I step over-- 00:42:58.525 --> 00:43:00.400 and now this is what I meant to show earlier, 00:43:00.400 --> 00:43:02.710 you can see that it's still printing out 4. 00:43:02.710 --> 00:43:05.930 So in fact, let me undo this, let me just stop the whole thing. 00:43:05.930 --> 00:43:08.320 Let me rerun the command a final time. 00:43:08.320 --> 00:43:10.690 So it goes back to where we began before. 
00:43:10.690 --> 00:43:15.520 It's going to prompt me again once I step over line 8 for a number like 3. 00:43:15.520 --> 00:43:19.930 But this time, instead of stepping over line 9, let's poke around. 00:43:19.930 --> 00:43:23.770 I wrote print_column, so let's look at print_column step by step, 00:43:23.770 --> 00:43:26.800 step into it, and watch what happens to the yellow highlight. 00:43:26.800 --> 00:43:30.220 It now jumps logically to the inside of print_column, 00:43:30.220 --> 00:43:32.510 thereby letting me walk through this code. 00:43:32.510 --> 00:43:35.720 And now I can just step over each of these lines one at a time. 00:43:35.720 --> 00:43:37.180 So stepping over. 00:43:37.180 --> 00:43:38.440 OK, so what did it do? 00:43:38.440 --> 00:43:41.200 It did that whole narrative that I did verbally last week 00:43:41.200 --> 00:43:43.720 where it compared i against height. 00:43:43.720 --> 00:43:45.520 It then went inside of the loop. 00:43:45.520 --> 00:43:48.940 When I click Step Over, watch what happens in my terminal-- one hash 00:43:48.940 --> 00:43:49.660 prints out. 00:43:49.660 --> 00:43:51.460 Now line 14 is highlighted again. 00:43:51.460 --> 00:43:54.220 It's comparing per the Boolean expression, i, 00:43:54.220 --> 00:43:55.900 is it less than or equal to height? 00:43:55.900 --> 00:43:59.770 If so, it's going to go ahead and print out the hash. 00:43:59.770 --> 00:44:02.080 It's going to do this again, print out the hash. 00:44:02.080 --> 00:44:05.020 But notice at the top-left of the screen, height 00:44:05.020 --> 00:44:10.180 is still the same, it's still 3, but what has been changing, apparently? 00:44:10.180 --> 00:44:11.960 i on each iteration. 00:44:11.960 --> 00:44:16.240 So the debugger is letting me see what's going on slowly inside of this loop 00:44:16.240 --> 00:44:18.070 because i keeps getting incremented. 00:44:18.070 --> 00:44:21.580 So if I step over this line now, notice that I've now printed 3. 
00:44:21.580 --> 00:44:25.690 So ideally I want this loop to end, but if I click Step Over once more, 00:44:25.690 --> 00:44:29.710 notice that the value of i at top-left is 3, 00:44:29.710 --> 00:44:35.600 but 3 is less than or equal to height-- oh, now I get it, if I play along here. 00:44:35.600 --> 00:44:40.540 Now I see why less than or equals to, mathematically, is clearly incorrect. 00:44:40.540 --> 00:44:43.090 And as soon as that light bulb goes off, you can just sort of 00:44:43.090 --> 00:44:46.570 bail out, click the red Stop button to turn the debugger off, 00:44:46.570 --> 00:44:50.560 go back in, fix your code, and voila, recompile, run it, 00:44:50.560 --> 00:44:51.950 and you're back in business. 00:44:51.950 --> 00:44:55.480 So the takeaways here really are just what tools now exist? 00:44:55.480 --> 00:44:59.590 Printf is your friend, but only for quick-and-dirty debugging techniques. 00:44:59.590 --> 00:45:04.930 Get into the habit now of using debug50, and in turn, VS Code's debugger. 00:45:04.930 --> 00:45:08.800 You will invariably not take this advice, say, 00:45:08.800 --> 00:45:11.710 for problem set 2 as you first begin because it's 00:45:11.710 --> 00:45:15.340 going to feel easier and quicker just to use printf, just to use printf, 00:45:15.340 --> 00:45:16.300 just to use printf. 
00:45:16.300 --> 00:45:17.710 And the problem with that logic is that you 00:45:17.710 --> 00:45:20.000 begin to build up like technical debt, so to speak, 00:45:20.000 --> 00:45:21.760 where you really should have learned it earlier, 00:45:21.760 --> 00:45:23.510 you really should have learned it earlier, 00:45:23.510 --> 00:45:26.000 you really should have learned it earlier, at which point, 00:45:26.000 --> 00:45:29.350 you end up spending more time wasted using printf 00:45:29.350 --> 00:45:32.720 and doing things manually than if you had just spent 10 minutes, 00:45:32.720 --> 00:45:35.170 30 minutes just learning the user interface 00:45:35.170 --> 00:45:37.510 and the buttons of a proper debugger. 00:45:37.510 --> 00:45:40.390 So please take that advice because it will save you 00:45:40.390 --> 00:45:45.480 significant amounts of time over time. 00:45:45.480 --> 00:45:48.900 Questions on printf or debugging in this way? 00:45:52.260 --> 00:45:54.790 Any questions on this? 00:45:54.790 --> 00:45:55.290 No? 00:45:55.290 --> 00:45:55.800 OK. 00:45:55.800 --> 00:45:59.880 So let me give you a third and final technique for debugging, which has been 00:45:59.880 --> 00:46:01.840 looming over us here for some time. 00:46:01.840 --> 00:46:05.400 So there is actually this technique known as rubber duck debugging. 00:46:05.400 --> 00:46:09.570 And in the absence of a roommate who is taking CS50 or who has taken CS50 00:46:09.570 --> 00:46:13.140 or knows how to program, in the absence of having a TF or TA or CA 00:46:13.140 --> 00:46:16.920 sitting next to you, in the absence of having a family member available to ask 00:46:16.920 --> 00:46:22.020 questions of, if you have simply an inanimate object on your desk, 00:46:22.020 --> 00:46:25.440 goes the tradition, just talk to that inanimate object. 00:46:25.440 --> 00:46:27.970 Better yet, if it's an adorable rubber duck in this way. 
00:46:27.970 --> 00:46:31.560 And the idea of rubber duck debugging is that simply 00:46:31.560 --> 00:46:34.930 by verbalizing literally out loud to this inanimate object-- 00:46:34.930 --> 00:46:36.930 probably with the door closed and no one knowing 00:46:36.930 --> 00:46:39.930 that you're talking to this rubber duck, you invariably 00:46:39.930 --> 00:46:44.070 end up hearing any illogic in your own thoughts, at which point 00:46:44.070 --> 00:46:47.340 the proverbial light bulb tends to go off and you're like, oh, I'm an idiot. 00:46:47.340 --> 00:46:50.310 It's supposed to be less than, not less than or equal to. 00:46:50.310 --> 00:46:54.670 So literally just explaining to a duck or any inanimate object what's 00:46:54.670 --> 00:46:57.790 going on in your code will quite frequently just 00:46:57.790 --> 00:47:02.260 help you see in your mind's eye what it is you've been doing wrong. 00:47:02.260 --> 00:47:05.590 So rubber duck debugging is indeed a very effective technique 00:47:05.590 --> 00:47:09.550 even if you don't happen to have a small or large rubber duck. 00:47:09.550 --> 00:47:12.370 Of course, you're also welcome to use the CS50 Duck who 00:47:12.370 --> 00:47:17.710 lives at cs50.ai, and also within a pane in VS Code at cs50.dev. 00:47:17.710 --> 00:47:20.830 You can ask the CS50 Duck about concepts you don't understand, 00:47:20.830 --> 00:47:23.170 or you can even copy paste certain lines of code 00:47:23.170 --> 00:47:27.460 with which you might be having trouble and ask the duck for its own advice. 00:47:27.460 --> 00:47:28.180 All right. 00:47:28.180 --> 00:47:33.730 So, with those tools in our toolkit, let me propose now that we do-- 00:47:33.730 --> 00:47:37.390 that we introduce now a few lower-level features of C 00:47:37.390 --> 00:47:40.720 itself and better understand how we can start solving some of those problems 00:47:40.720 --> 00:47:44.860 like the readability of text or the encryption of data. 
00:47:44.860 --> 00:47:47.080 These were our so-called types last week when 00:47:47.080 --> 00:47:51.490 we introduced at least a subset of them or used them just to store data 00:47:51.490 --> 00:47:53.328 in a certain format, so to speak. 00:47:53.328 --> 00:47:55.870 Like in week 0, we said that everything at the end of the day 00:47:55.870 --> 00:47:57.490 is just 0's and 1's, binary. 00:47:57.490 --> 00:48:03.130 And I claimed conceptually that how a computer knows if a set of bits 00:48:03.130 --> 00:48:08.230 is a number versus a letter versus a color or a sound or an image or a video 00:48:08.230 --> 00:48:11.048 is just context-dependent, like you're using Photoshop 00:48:11.048 --> 00:48:13.090 or you're using Microsoft Word or something else. 00:48:13.090 --> 00:48:16.420 But last week, we saw a little more precisely that it's 00:48:16.420 --> 00:48:18.490 not quite as broad strokes as that. 00:48:18.490 --> 00:48:23.680 It's more about what the programmer has told the software is 00:48:23.680 --> 00:48:25.690 being stored in a given variable. 00:48:25.690 --> 00:48:26.590 Is it an integer? 00:48:26.590 --> 00:48:28.180 Is it a char, a character? 00:48:28.180 --> 00:48:29.350 Is it a whole string? 00:48:29.350 --> 00:48:31.610 Is it a longer integer or the like? 00:48:31.610 --> 00:48:33.460 So you now have this control. 00:48:33.460 --> 00:48:36.340 The catch, though, recall, though, is that each of these types 00:48:36.340 --> 00:48:39.710 has only a finite amount of space allocated to it. 00:48:39.710 --> 00:48:43.060 So for instance, an integer is typically 4 bytes, 00:48:43.060 --> 00:48:46.780 and 4 bytes is 32 bits because it's 8 times 4. 00:48:46.780 --> 00:48:49.390 32 bits, we claimed, is roughly 4 billion, 00:48:49.390 --> 00:48:52.120 but if you want to represent negative and positive numbers, 00:48:52.120 --> 00:48:55.330 the biggest integer you can store is like 2 billion. 
00:48:55.330 --> 00:48:57.650 Now that's really big for a lot of applications, 00:48:57.650 --> 00:48:59.950 but years ago, Facebook, for instance, was 00:48:59.950 --> 00:49:04.100 rumored to be using integers when they had fewer users. 00:49:04.100 --> 00:49:06.790 But now that they have billions of users-- 00:49:06.790 --> 00:49:12.100 3-plus billion users, an integer is no longer big enough for the Facebooks, 00:49:12.100 --> 00:49:15.620 the Googles, the Microsofts and so forth of the world. 00:49:15.620 --> 00:49:21.520 So we also have longs, which use twice as many bytes but have an exponentially 00:49:21.520 --> 00:49:23.080 bigger range of values. 00:49:23.080 --> 00:49:26.260 Meanwhile, a bool, interestingly, is a byte, which 00:49:26.260 --> 00:49:29.550 is kind of bad design in what sense? 00:49:29.550 --> 00:49:31.780 Why might that be bad design? 00:49:31.780 --> 00:49:33.590 It's only-- it should only be 2-- 00:49:33.590 --> 00:49:36.170 1 bit, rather, because a 0 or 1 should suffice. 00:49:36.170 --> 00:49:38.440 Turns out, it's just easier to use a whole byte 00:49:38.440 --> 00:49:40.900 even though we're wasting seven of those bits, 00:49:40.900 --> 00:49:43.750 but bools are represented nonetheless with 1 byte. 00:49:43.750 --> 00:49:45.400 Chars are going to be 1 byte. 00:49:45.400 --> 00:49:47.890 Floats tend to be 4 bytes. 00:49:47.890 --> 00:49:49.390 Doubles tend to be 8 bytes. 00:49:49.390 --> 00:49:52.510 Some of this is system-dependent, but nowadays on modern computers, 00:49:52.510 --> 00:49:54.250 this tends to be a useful rule of thumb. 00:49:54.250 --> 00:49:56.710 The only one I can't commit to here is a string 00:49:56.710 --> 00:49:58.900 because a string, recall, is a sequence of text. 00:49:58.900 --> 00:50:02.800 And maybe it has no characters, one character, two, 10, 100. 00:50:02.800 --> 00:50:05.410 So it's a variable number of bytes presumably 00:50:05.410 --> 00:50:08.590 where each byte represents a given character.
00:50:08.590 --> 00:50:12.370 So with that said, how do we get from an actual computer 00:50:12.370 --> 00:50:16.060 to information being represented therein? 00:50:16.060 --> 00:50:19.270 Well, let me remind us that this is what's inside of our Macs, PCs, phones. 00:50:19.270 --> 00:50:22.220 Even though this isn't to scale and it might not be the same shape, 00:50:22.220 --> 00:50:24.520 this is memory, random access memory. 00:50:24.520 --> 00:50:26.890 And on these black chips, on the circuit board 00:50:26.890 --> 00:50:29.360 here, are the bytes that we keep talking about. 00:50:29.360 --> 00:50:31.940 In fact, let's go ahead and zoom in on one of these chips, 00:50:31.940 --> 00:50:33.110 fill the screen here. 00:50:33.110 --> 00:50:35.820 And just for an artist's depiction's sake, 00:50:35.820 --> 00:50:38.480 let me propose that if you've got, I don't know, 00:50:38.480 --> 00:50:43.340 a megabyte, a gigabyte-- like a lot of bytes packed into this chip nowadays, 00:50:43.340 --> 00:50:46.100 it stands to reason that no matter how many of them you have, 00:50:46.100 --> 00:50:48.398 we could just number them from top to bottom 00:50:48.398 --> 00:50:50.690 and we could say that this is byte 1, or you know what? 00:50:50.690 --> 00:50:55.950 This is byte 0, 1, 2, 3, and this is maybe byte 1 billion or whatever it is. 00:50:55.950 --> 00:50:58.370 So you can think of memory as having addresses 00:50:58.370 --> 00:51:03.020 or just locations, numeric indices that identify each of those bytes 00:51:03.020 --> 00:51:03.710 individually. 00:51:03.710 --> 00:51:04.550 Why a byte? 00:51:04.550 --> 00:51:08.300 Individual bits are not that useful, so 8, again, 1 byte 00:51:08.300 --> 00:51:10.400 tends to be the de facto standard.
00:51:10.400 --> 00:51:14.360 Let me-- so, for instance, if you're storing just a single character, 00:51:14.360 --> 00:51:18.570 a char, it might be stored literally in this top-left corner, so to speak, 00:51:18.570 --> 00:51:20.600 of the chip of memory. 00:51:20.600 --> 00:51:23.060 If you're storing maybe an integer, 4 bytes, 00:51:23.060 --> 00:51:24.830 it might take up that many bytes. 00:51:24.830 --> 00:51:28.760 If you're storing a long, it might take up that many bytes instead. 00:51:28.760 --> 00:51:31.520 Now we don't have to dwell on the particulars of the circuit board 00:51:31.520 --> 00:51:34.580 and these traces and all the connections, so let me just abstract 00:51:34.580 --> 00:51:37.550 this away and claim that what your computer's memory really 00:51:37.550 --> 00:51:41.060 is is just kind of this canvas, I mean kind of in the Photoshop sense. 00:51:41.060 --> 00:51:43.040 If you've ever made pictures, it's just a grid 00:51:43.040 --> 00:51:46.220 of pixels, up, down, left, right, that's really all your memory is. 00:51:46.220 --> 00:51:51.110 It's this canvas that you can manipulate the bits on to store numbers anywhere 00:51:51.110 --> 00:51:53.190 you want in the computer's memory. 00:51:53.190 --> 00:51:55.400 So in fact, let's zoom in here and let's consider 00:51:55.400 --> 00:52:01.640 how your computer is actually storing information using just these bytes. 00:52:01.640 --> 00:52:04.190 At the end of the day, no matter how sophisticated 00:52:04.190 --> 00:52:07.280 your Mac, your PC, your phone is, like this is all 00:52:07.280 --> 00:52:10.310 it has access to for storing information. 00:52:10.310 --> 00:52:13.010 It's a canvas of bytes, and what you do with this 00:52:13.010 --> 00:52:15.720 now really invites design decisions. 00:52:15.720 --> 00:52:17.000 So let's consider this. 00:52:17.000 --> 00:52:20.060 Here is an excerpt from a program wherein maybe I'm 00:52:20.060 --> 00:52:22.160 prompting the user for three scores. 
00:52:22.160 --> 00:52:24.950 Like three test scores, exam scores, something like that. 00:52:24.950 --> 00:52:27.035 And the purpose in life of this program is maybe 00:52:27.035 --> 00:52:28.910 to average those three scores together if you 00:52:28.910 --> 00:52:31.118 want to get a sense of where you stand in some class. 00:52:31.118 --> 00:52:33.290 So we can certainly whip up some code like this. 00:52:33.290 --> 00:52:37.370 And in just a moment, let me go ahead and flip over to VS Code here. 00:52:37.370 --> 00:52:41.420 And I'll write up a new program called scores.c. 00:52:41.420 --> 00:52:46.460 And in this, let me go ahead and first include stdio.h, 00:52:46.460 --> 00:52:48.710 int main void at the top. 00:52:48.710 --> 00:52:51.750 And in here, let me go ahead and assume that, eh, 00:52:51.750 --> 00:52:53.250 it's not been the greatest semester. 00:52:53.250 --> 00:52:56.930 So my first score, which I'll call score1, was a 72, 00:52:56.930 --> 00:53:03.050 my second score was a 73, but my third score, score3, was like a 33. 00:53:03.050 --> 00:53:05.832 Now you might remember these numbers in another context, 00:53:05.832 --> 00:53:08.540 they might spell a message, but in this case, it's just integers. 00:53:08.540 --> 00:53:12.320 It's just numbers because I'm telling the computer to treat these as ints. 00:53:12.320 --> 00:53:15.750 Now if I want to figure out what my average is, I can do a bit of math. 00:53:15.750 --> 00:53:18.770 So let me just print out that my average is-- 00:53:18.770 --> 00:53:20.600 and I don't want to shortchange myself. 00:53:20.600 --> 00:53:23.910 I'm not going to use %i because I don't want to lose even anything after 00:53:23.910 --> 00:53:24.660 the decimal point. 00:53:24.660 --> 00:53:26.540 So we're going to use a float instead. 00:53:26.540 --> 00:53:33.230 And my average, I claim, will be score1 plus score2 plus score3 00:53:33.230 --> 00:53:36.200 divided by 3, semicolon.
00:53:36.200 --> 00:53:38.840 With parentheses, because just like grade school math, 00:53:38.840 --> 00:53:41.580 like order of operations, I parenthesize the numerator, 00:53:41.580 --> 00:53:43.670 so I can divide the whole thing by 3. 00:53:43.670 --> 00:53:45.350 But I have screwed up already. 00:53:45.350 --> 00:53:49.370 I am going to shortchange myself and not give myself as high a grade 00:53:49.370 --> 00:53:51.977 as I deserve, but this one's subtle. 00:53:51.977 --> 00:53:52.935 What have I done wrong? 00:53:56.230 --> 00:53:59.740 Yeah, I might want to cast these scores to floats 00:53:59.740 --> 00:54:05.290 because if you do integral math, divide an integer or the sum of an integers-- 00:54:05.290 --> 00:54:09.710 some integers by an integer, it's going to be an integer as the result, 00:54:09.710 --> 00:54:12.730 so it's going to throw away anything after the decimal point. 00:54:12.730 --> 00:54:15.970 Even if it's something-point-1, something-point-5, something-point-9, 00:54:15.970 --> 00:54:18.010 that fraction is going to be thrown away. 00:54:18.010 --> 00:54:19.750 There's a bunch of ways to fix this. 00:54:19.750 --> 00:54:22.810 I could just use floats or doubles for all of these. 00:54:22.810 --> 00:54:26.140 I could cast score1, score2, or score3 as you propose. 00:54:26.140 --> 00:54:28.780 Frankly, the simplest way is just change the denominator 00:54:28.780 --> 00:54:31.840 because so long as I've got one float involved in the math, 00:54:31.840 --> 00:54:35.950 this will promote the whole arithmetic expression to being floating point 00:54:35.950 --> 00:54:37.690 math instead of integer math. 00:54:37.690 --> 00:54:41.110 So let me go ahead now and do make scores, Enter. 00:54:41.110 --> 00:54:45.100 So far, so good. ./scores, and my average seems to be not great, 00:54:45.100 --> 00:54:47.140 but 59.33333-- 00:54:47.140 --> 00:54:47.950 so in the third. 
00:54:47.950 --> 00:54:50.200 But I would have lost that third if I hadn't 00:54:50.200 --> 00:54:52.940 used a float in this particular way. 00:54:52.940 --> 00:54:56.570 Well, let's consider now what's actually going on inside of the computer 00:54:56.570 --> 00:54:58.650 when I store these three variables. 00:54:58.650 --> 00:55:01.175 So, back to the grid here, just my canvas of memory. 00:55:01.175 --> 00:55:03.050 It doesn't really matter where things end up. 00:55:03.050 --> 00:55:04.820 I might put it here, I might put it there, 00:55:04.820 --> 00:55:06.510 the computer makes these decisions. 00:55:06.510 --> 00:55:10.500 But for the artist's sake, I'm going to put it at the top left-hand corner 00:55:10.500 --> 00:55:11.000 here. 00:55:11.000 --> 00:55:15.710 So, score1 is containing the integer 72. 00:55:15.710 --> 00:55:20.580 Why is it taking up four squares, though? 00:55:20.580 --> 00:55:22.040 Because? 00:55:22.040 --> 00:55:23.030 It's an integer. 00:55:23.030 --> 00:55:25.500 And on this system, an integer is 4 bytes. 00:55:25.500 --> 00:55:30.170 So I've drawn it to scale, if you will. score2 is the number 73, 00:55:30.170 --> 00:55:32.150 it also takes 4 bytes. 00:55:32.150 --> 00:55:34.850 By coincidence, but also by convention, it 00:55:34.850 --> 00:55:38.180 will likely end up next to the first integer 00:55:38.180 --> 00:55:40.970 in memory because I've only got three variables going on anyway, 00:55:40.970 --> 00:55:44.360 so the computer quite likely will store them back to back to back. 00:55:44.360 --> 00:55:48.110 And indeed, by that logic, score3, containing the number 33, 00:55:48.110 --> 00:55:50.060 is going to fill in this space here. 
00:55:50.060 --> 00:55:51.917 We'll consider down the road what happens 00:55:51.917 --> 00:55:53.750 if things get fragmented-- something's here, 00:55:53.750 --> 00:55:55.875 something's here, something's here, but for now, we 00:55:55.875 --> 00:55:59.507 can assume that this is probably contiguous, though not necessarily so. 00:55:59.507 --> 00:56:01.340 All right, so that's pretty straightforward, 00:56:01.340 --> 00:56:02.750 but what's really going on? 00:56:02.750 --> 00:56:04.940 Well, these are just bytes of memory-- 00:56:04.940 --> 00:56:07.850 that is, bits of memory times 8. 00:56:07.850 --> 00:56:10.460 And so what's really going on is this pattern 00:56:10.460 --> 00:56:14.150 of 0's and 1's is being stored to represent 72. 00:56:14.150 --> 00:56:16.280 This pattern of 0's and 1's is being stored 00:56:16.280 --> 00:56:19.220 to represent 73, and similarly, 33. 00:56:19.220 --> 00:56:22.750 But that's a very low-level detail that we don't really care about, 00:56:22.750 --> 00:56:27.550 so we'll generally just think about these as numbers like 72, 73, 33. 00:56:27.550 --> 00:56:28.050 All right. 00:56:28.050 --> 00:56:32.280 So if we go back to the actual code, though, here, I 00:56:32.280 --> 00:56:35.250 wonder if this is the best idea. 00:56:35.250 --> 00:56:38.280 These three lines of code are correct. 00:56:38.280 --> 00:56:41.670 I got my 59 and 1/3 for my average, which I claim 00:56:41.670 --> 00:56:46.740 is correct, but code-wise, this should maybe rub you the wrong way. 00:56:46.740 --> 00:56:49.890 Even if you hadn't programmed before CS50, 00:56:49.890 --> 00:56:53.250 why might this not be the best approach to storing things 00:56:53.250 --> 00:56:57.170 like scores in a program? 00:56:57.170 --> 00:56:58.670 How might this get us in trouble? 00:56:58.670 --> 00:56:59.240 Yeah? 00:56:59.240 --> 00:57:03.890 AUDIENCE: [INAUDIBLE] 00:57:03.890 --> 00:57:04.670 DAVID MALAN: Yeah.
00:57:04.670 --> 00:57:06.410 It's not the best because you have to use a whole bunch 00:57:06.410 --> 00:57:08.180 of different variables for each score. 00:57:08.180 --> 00:57:11.330 They're almost identically named, though, but just imagine 00:57:11.330 --> 00:57:15.620 in almost any question involving the design of your code, what happens as n, 00:57:15.620 --> 00:57:18.170 the number of things involved, gets larger? 00:57:18.170 --> 00:57:21.950 Am I really going to start writing code that has score4, score5, score6, 00:57:21.950 --> 00:57:23.270 score10, score20? 00:57:23.270 --> 00:57:27.560 I mean, your code is just going to look like this mess of mostly copy-paste 00:57:27.560 --> 00:57:30.227 except that the number at the end of the variable is changing. 00:57:30.227 --> 00:57:32.810 Like that should make you cringe a little bit because it's not 00:57:32.810 --> 00:57:34.610 going to end well eventually. 00:57:34.610 --> 00:57:37.280 And typographical errors are going to get in the way most likely 00:57:37.280 --> 00:57:38.447 because we'll make mistakes. 00:57:38.447 --> 00:57:41.240 So how can we do a little bit better than that? 00:57:41.240 --> 00:57:45.750 Well, let me propose that we introduce what we're going to now call an array. 00:57:45.750 --> 00:57:52.950 An array is a sequence of values back to back to back in memory. 00:57:52.950 --> 00:57:57.870 So an array is just a chunk of memory storing values back to back to back. 00:57:57.870 --> 00:57:59.810 So no gaps, no fragmentation. 00:57:59.810 --> 00:58:02.870 From left to right, top to bottom, just as I already drew. 00:58:02.870 --> 00:58:05.550 But these arrays in C, at least, are going 00:58:05.550 --> 00:58:09.070 to give us a slightly new syntax that addresses exactly your concern.
00:58:09.070 --> 00:58:14.580 So here instead is I would propose how you define a one variable-- 00:58:14.580 --> 00:58:19.890 not three, one variable called scores, plural, each of whose values 00:58:19.890 --> 00:58:24.150 is going to be an int, and you want three integers tucked away 00:58:24.150 --> 00:58:25.420 in that variable. 00:58:25.420 --> 00:58:28.440 So now I can pluralize the name of my variable 00:58:28.440 --> 00:58:32.010 because by using square brackets and the number 3, I'm telling the compiler, 00:58:32.010 --> 00:58:36.510 give me enough room for not one, not two, but three integers in total. 00:58:36.510 --> 00:58:39.240 And the computer is going to do me a favor by storing them back 00:58:39.240 --> 00:58:41.790 to back to back in the computer's memory. 00:58:41.790 --> 00:58:45.810 Now assigning values to these variables is almost the same, 00:58:45.810 --> 00:58:47.460 but the syntax looks like this. 00:58:47.460 --> 00:58:53.370 To assign the first value, I do scores, bracket, 0 equals whatever, 72. 00:58:53.370 --> 00:58:58.560 scores, bracket, 1 equals 73; scores, bracket, 2 equals 33. 00:58:58.560 --> 00:59:00.360 And it's square brackets consistently. 00:59:00.360 --> 00:59:02.220 And notice, this is a feature-- 00:59:02.220 --> 00:59:04.080 or a downside of C. 00:59:04.080 --> 00:59:07.980 We very frequently use the same syntax for slightly different ideas. 00:59:07.980 --> 00:59:12.180 This first line tells the computer, give me an array of size 3. 00:59:12.180 --> 00:59:16.830 These next three lines mean, go into this array at location 0 00:59:16.830 --> 00:59:18.060 and put this value there. 00:59:18.060 --> 00:59:21.280 Location 1, put this value there; location 2, put this value there. 00:59:21.280 --> 00:59:24.690 So same syntax, but different meaning depending on the context here. 
00:59:24.690 --> 00:59:28.470 But the equal sign indeed means that this is assignment from right 00:59:28.470 --> 00:59:30.340 to left just like last week. 00:59:30.340 --> 00:59:33.750 So what does this mean in the computer's memory? 00:59:33.750 --> 00:59:38.192 Well, in this case here, we now have a slightly different way of doing this. 00:59:38.192 --> 00:59:39.900 And actually, let me do it first in code. 00:59:39.900 --> 00:59:43.440 Let me go back to VS Code here, and let me 00:59:43.440 --> 00:59:48.100 propose that instead of having these three separate variables, 00:59:48.100 --> 00:59:52.590 let me give myself an int, scores variable of size 3, 00:59:52.590 --> 00:59:58.590 and then do scores, bracket, 0 equals 72; scores, bracket, 1 equals 73; 00:59:58.590 --> 01:00:02.100 scores, bracket, 2 equals 33. 01:00:02.100 --> 01:00:05.730 And now I have to change this syntax slightly, but same idea. 01:00:05.730 --> 01:00:12.660 scores, bracket, 0; scores, bracket, 1; and lastly, scores, bracket, 2. 01:00:12.660 --> 01:00:14.640 So a couple of key details. 01:00:14.640 --> 01:00:16.000 I started counting at 0. 01:00:16.000 --> 01:00:16.500 Why? 01:00:16.500 --> 01:00:18.210 That's just the way it is with arrays. 01:00:18.210 --> 01:00:21.818 You must start counting at 0 unless you want to waste one of those spaces. 01:00:21.818 --> 01:00:23.610 And what you definitely don't want to do is 01:00:23.610 --> 01:00:27.030 go into scores, bracket, 3 because I only 01:00:27.030 --> 01:00:29.190 asked the computer for three integers. 01:00:29.190 --> 01:00:32.190 If I blindly do something like this, you're going too far. 01:00:32.190 --> 01:00:34.830 You're going beyond the end of the chunk of memory 01:00:34.830 --> 01:00:37.080 and bad things will often happen. 01:00:37.080 --> 01:00:38.770 So we won't do that just yet. 01:00:38.770 --> 01:00:43.030 But for now, 0, 1, and 2 are the first, second, and third locations.
01:00:43.030 --> 01:00:48.030 So if I recompile this code-- so make scores seems OK. ./scores, 01:00:48.030 --> 01:00:50.607 and I get the exact same answer there. 01:00:50.607 --> 01:00:52.440 But let me make it more dynamic because this 01:00:52.440 --> 01:00:56.670 is a little stupid that I'm compiling a program with my scores hardcoded. 01:00:56.670 --> 01:00:59.380 What if I have a fourth exam tomorrow or something like that? 01:00:59.380 --> 01:01:01.110 So let's make it more dynamic and I think 01:01:01.110 --> 01:01:03.460 the syntax will start to make a little more sense. 01:01:03.460 --> 01:01:07.270 Let's go ahead and use get_int and ask the user for a score. 01:01:07.270 --> 01:01:10.270 Let's go ahead and get_int and ask the user for another score. 01:01:10.270 --> 01:01:15.090 Let's go ahead and get_int and ask the user for a third score, 01:01:15.090 --> 01:01:18.720 now storing the return values in each of those variables. 01:01:18.720 --> 01:01:20.970 If I now do make scores-- 01:01:20.970 --> 01:01:22.530 oh, darn it. 01:01:22.530 --> 01:01:24.830 A mistake. 01:01:24.830 --> 01:01:28.130 Similar to one I've made before, but we didn't see the error message last time. 01:01:28.130 --> 01:01:28.880 What'd I do wrong? 01:01:28.880 --> 01:01:30.165 Yeah? 01:01:30.165 --> 01:01:31.040 AUDIENCE: [INAUDIBLE] 01:01:31.040 --> 01:01:31.770 DAVID MALAN: OK. 01:01:31.770 --> 01:01:33.915 What did I do wrong-- how about over here? 01:01:33.915 --> 01:01:34.790 AUDIENCE: [INAUDIBLE] 01:01:34.790 --> 01:01:35.210 DAVID MALAN: Yeah. 01:01:35.210 --> 01:01:36.900 So I'm missing the CS50 header file. 01:01:36.900 --> 01:01:38.060 So how do you know that? 01:01:38.060 --> 01:01:40.550 Well, implicit declaration of function get_int. 01:01:40.550 --> 01:01:42.350 So it just doesn't know what get_int is. 01:01:42.350 --> 01:01:44.660 Well, who does know what get_int is? 01:01:44.660 --> 01:01:47.010 The CS50 Library; that should be your first instinct.
01:01:47.010 --> 01:01:47.510 All right. 01:01:47.510 --> 01:01:51.620 Let me go to the top here and let me go ahead and squeeze in the CS50 Library 01:01:51.620 --> 01:01:52.460 like this. 01:01:52.460 --> 01:01:54.140 Now let me clear my terminal. 01:01:54.140 --> 01:01:55.312 make scores again. 01:01:55.312 --> 01:01:56.270 We're back in business. 01:01:56.270 --> 01:02:00.320 And notice, I don't need to do -l cs50. 01:02:00.320 --> 01:02:05.490 make is doing that for me for clang, but we don't even see clang being executed, 01:02:05.490 --> 01:02:09.120 but it is being executed underneath the hood, so to speak. 01:02:09.120 --> 01:02:10.970 All right, so ./scores, here we go. 01:02:10.970 --> 01:02:13.340 72, 73, 33. 01:02:13.340 --> 01:02:17.630 Math is still the same, but now the program is more interactive. 01:02:17.630 --> 01:02:20.520 Now this, too, hopefully should rub you the wrong way. 01:02:20.520 --> 01:02:25.790 This is correct, I would claim, but bad design still. 01:02:25.790 --> 01:02:28.460 Reeks of week 0 inefficiencies. 01:02:28.460 --> 01:02:29.030 Yeah? 01:02:29.030 --> 01:02:33.793 AUDIENCE: [INAUDIBLE] 01:02:33.793 --> 01:02:34.460 DAVID MALAN: OK. 01:02:34.460 --> 01:02:37.160 So I could ask the human how many scores do you want to input? 01:02:37.160 --> 01:02:38.310 Let's come back to that. 01:02:38.310 --> 01:02:42.550 But I think even in this construct, what better could I do? 01:02:42.550 --> 01:02:43.510 Use a loop, right? 01:02:43.510 --> 01:02:46.060 Because I'm literally doing the same thing again and again. 01:02:46.060 --> 01:02:48.530 And notice, this number is just changing slightly. 01:02:48.530 --> 01:02:51.490 I would think that a little plus-plus could help there. get_int Score, 01:02:51.490 --> 01:02:53.960 get_int Score, get_int Score-- that's the exact same thing. 01:02:53.960 --> 01:02:56.120 So a loop is a perfect solution here. 
01:02:56.120 --> 01:02:59.980 So let me go over into this code here, and I can still for now 01:02:59.980 --> 01:03:02.440 declare it to be of size 3, but I think I 01:03:02.440 --> 01:03:07.340 could do something like this-- for int i gets 0, i is less than 3, 01:03:07.340 --> 01:03:10.090 so I'm not going to make the same buggy mistake as I made earlier. 01:03:10.090 --> 01:03:11.260 i++. 01:03:11.260 --> 01:03:15.850 Inside of the loop now, I can do scores, bracket, i, and now 01:03:15.850 --> 01:03:18.010 arrays are getting really interesting because you 01:03:18.010 --> 01:03:22.570 can use and reuse them, but dynamically go to a specific location. 01:03:22.570 --> 01:03:25.510 Equals get_int, quote-unquote, "Score." 01:03:25.510 --> 01:03:29.110 Now I can type that phrase just once and this loop ultimately 01:03:29.110 --> 01:03:31.330 will do the same thing, but it's getting better. 01:03:31.330 --> 01:03:34.720 The code is getting better designed because it's more compact 01:03:34.720 --> 01:03:36.250 and I'm not repeating myself. 01:03:36.250 --> 01:03:38.020 72, 73, 33. 01:03:38.020 --> 01:03:42.530 Still works the same, but we're iteratively improving the code here. 01:03:42.530 --> 01:03:48.500 Now how else-- there's one design flaw here that I still don't love. 01:03:48.500 --> 01:03:49.710 It's a little more subtle. 01:03:49.710 --> 01:03:51.160 Any observations? 01:03:51.160 --> 01:03:57.462 AUDIENCE: [INAUDIBLE] 01:03:57.462 --> 01:03:58.670 DAVID MALAN: Ah, interesting. 01:03:58.670 --> 01:04:01.460 So instead of dividing by 3.0, maybe I should divide it 01:04:01.460 --> 01:04:05.480 by the array size, which at the moment is technically still 3, 01:04:05.480 --> 01:04:10.670 but I do concur that that is worrisome because they could get out of sync. 01:04:10.670 --> 01:04:13.550 But there's something else that still isn't quite right. 01:04:13.550 --> 01:04:14.965 Yeah?
01:04:14.965 --> 01:04:19.090 AUDIENCE: [INAUDIBLE] 01:04:19.090 --> 01:04:22.292 DAVID MALAN: I'm OK moving to this zero-indexed model. 01:04:22.292 --> 01:04:23.500 So this is a new term of art. 01:04:23.500 --> 01:04:27.470 To index into an array means to go to a specific location. 01:04:27.470 --> 01:04:31.120 So here, I'm indexing into location i, but i is going 01:04:31.120 --> 01:04:33.250 to start at 0 and then 1 and then 2. 01:04:33.250 --> 01:04:34.390 I'm actually OK with that. 01:04:34.390 --> 01:04:37.600 Even though in everyday life we would say score1, score2, score3, 01:04:37.600 --> 01:04:39.730 as a programmer, I just have to get into the habit 01:04:39.730 --> 01:04:43.450 of saying score0, score1, score2 now. 01:04:43.450 --> 01:04:44.350 But something else. 01:04:44.350 --> 01:04:45.306 Yeah? 01:04:45.306 --> 01:04:47.540 AUDIENCE: I could compute the average. 01:04:47.540 --> 01:04:49.850 DAVID MALAN: I could also compute the average in a loop 01:04:49.850 --> 01:04:54.290 because indeed, this is only going-- it's only solving the problem halfway. 01:04:54.290 --> 01:04:56.240 I'm gathering the information in the loop, 01:04:56.240 --> 01:04:58.200 but then I'm manually writing it all out. 01:04:58.200 --> 01:05:01.730 So it does feel like there should be a better solution here. 01:05:01.730 --> 01:05:05.540 But let me also identify one other issue I really don't like, 01:05:05.540 --> 01:05:06.710 and this is, indeed, subtle. 01:05:06.710 --> 01:05:11.180 I've got 3 here, I've got 3 here, and I essentially have 3 here, 01:05:11.180 --> 01:05:12.750 albeit a floating point version. 01:05:12.750 --> 01:05:16.550 This is just ripe for me making a mistake eventually and changing one 01:05:16.550 --> 01:05:18.840 of those values, but not the other two. 01:05:18.840 --> 01:05:20.090 So how might I fix this? 01:05:20.090 --> 01:05:22.200 I might at least do something like this.
01:05:22.200 --> 01:05:28.010 I could say an integer, maybe n for scores, and I'll set that equal to 3. 01:05:28.010 --> 01:05:31.430 I could then use n here, I could use n here. 01:05:31.430 --> 01:05:33.742 I could use n here, but that's a step backwards 01:05:33.742 --> 01:05:36.950 because I don't want an int because I'm going to run into the same math issue 01:05:36.950 --> 01:05:40.250 as before, but I could convert it-- that is, cast it to a float, 01:05:40.250 --> 01:05:42.920 and we did that briefly last week. 01:05:42.920 --> 01:05:47.730 But there's one other thing I could do here that we did introduce last week. 01:05:47.730 --> 01:05:51.150 This is better because I don't have a magic number floating around 01:05:51.150 --> 01:05:53.490 in multiple places. 01:05:53.490 --> 01:05:56.160 Yeah, if I really want to be proper, I should probably 01:05:56.160 --> 01:05:58.440 say this should be a constant integer. 01:05:58.440 --> 01:05:58.950 Why? 01:05:58.950 --> 01:06:01.200 Because I don't want to accidentally change it myself. 01:06:01.200 --> 01:06:03.242 I don't want to be collaborating with a colleague 01:06:03.242 --> 01:06:04.800 and they foolishly change it on me. 01:06:04.800 --> 01:06:09.060 This just sends a stronger signal to the compiler: do not let the humans change 01:06:09.060 --> 01:06:10.000 this value. 01:06:10.000 --> 01:06:12.960 And now just to point out one other feature of C, 01:06:12.960 --> 01:06:16.650 if you have a number like this, like the number 3, 01:06:16.650 --> 01:06:18.990 I've deliberately capitalized this variable name really 01:06:18.990 --> 01:06:19.915 for the first time. 01:06:19.915 --> 01:06:22.290 Any time you have a constant, it tends to be a convention 01:06:22.290 --> 01:06:25.000 to capitalize it just to draw your attention to it. 01:06:25.000 --> 01:06:26.580 It doesn't mean anything technically.
01:06:26.580 --> 01:06:28.950 Capitalizing a variable does nothing to it, 01:06:28.950 --> 01:06:31.660 but it draws attention visually to it to the human. 01:06:31.660 --> 01:06:33.930 So if you declare something as a constant, 01:06:33.930 --> 01:06:37.050 it's commonplace to capitalize it just because. 01:06:37.050 --> 01:06:41.790 Moreover, if you have a constant that you might want to occasionally modify-- 01:06:41.790 --> 01:06:45.660 maybe next semester when there's four exams or five exams instead of three, 01:06:45.660 --> 01:06:48.900 it actually is OK sometimes to define what 01:06:48.900 --> 01:06:52.080 might be called a global variable, a variable that is not 01:06:52.080 --> 01:06:57.280 inside of curly braces, it's literally at the top of the file outside of main, 01:06:57.280 --> 01:06:59.890 and despite what I said about scope last week, 01:06:59.890 --> 01:07:05.170 a global variable like this on line 4 will be in scope 01:07:05.170 --> 01:07:07.550 to every function in this file. 01:07:07.550 --> 01:07:09.880 So it's actually a way of sharing a variable 01:07:09.880 --> 01:07:13.090 across multiple functions, which is generally fine if you're 01:07:13.090 --> 01:07:14.230 using a constant. 01:07:14.230 --> 01:07:18.010 If you intend to change it, there's probably a better way 01:07:18.010 --> 01:07:21.310 than actually using a global variable, but this is just 01:07:21.310 --> 01:07:23.620 in contrast to what I previously did, which I would 01:07:23.620 --> 01:07:26.810 call, by contrast, a local variable. 01:07:26.810 --> 01:07:30.563 But again, I'm just trying to reduce the probability of making mistakes 01:07:30.563 --> 01:07:31.480 somewhere in the code. 01:07:31.480 --> 01:07:32.170 And I do agree. 01:07:32.170 --> 01:07:35.560 I don't like that I'm still adding all of these scores 01:07:35.560 --> 01:07:39.130 manually even though clearly I had a loop a moment ago. 
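A sketch of the global-constant pattern just described, assuming a constant named `N` (capitalized by convention) declared above every function so that all of them can see it. The helper `total` is hypothetical and exists only to show a second function reading the same shared constant.

```c
#include <stdio.h>

// Global constant: declared outside of main (or any function), so it is in
// scope for every function below. const makes the compiler reject any
// attempt to reassign it, e.g. N = 4; would not compile.
const int N = 3;

// Hypothetical helper: adds up the first N scores using the shared constant
int total(int scores[])
{
    int sum = 0;
    for (int i = 0; i < N; i++)
    {
        sum += scores[i];
    }
    return sum;
}
```

With three scores 72, 73, and 33 stored in an array, `total` returns 178; adding a fourth exam next semester would then mean editing only the line that defines `N`.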
01:07:39.130 --> 01:07:40.990 But for now, let's at least consider what's 01:07:40.990 --> 01:07:43.130 been going on inside of the computer's memory. 01:07:43.130 --> 01:07:48.880 So with this array, I now have not three variables, score1, score2, score3. 01:07:48.880 --> 01:07:53.530 I have one variable, an array variable, called scores, plural. 01:07:53.530 --> 01:07:57.700 And if I want to access the first element, it's scores, bracket, 0. 01:07:57.700 --> 01:08:00.400 If I want to access the second element, it's scores, bracket, 1. 01:08:00.400 --> 01:08:03.100 If I want to access the third element, it's scores, bracket, 2. 01:08:03.100 --> 01:08:07.480 If I were to make a mistake and do scores, bracket, 3, 01:08:07.480 --> 01:08:11.380 which is the fourth element, I'd end up in no man's land here, 01:08:11.380 --> 01:08:15.307 and worst case, your program could crash or something weird could happen, 01:08:15.307 --> 01:08:17.140 spinning beach balls, those kinds of things. 01:08:17.140 --> 01:08:18.910 Just don't make those mistakes. 01:08:18.910 --> 01:08:21.310 And C makes it easy to make those mistakes, 01:08:21.310 --> 01:08:25.300 so the onus is really on you programmatically. 01:08:25.300 --> 01:08:31.960 Questions on this use of arrays? 01:08:31.960 --> 01:08:33.580 Questions on this use of arrays? 01:08:33.580 --> 01:08:34.359 Yeah, in back. 01:08:34.359 --> 01:08:36.283 AUDIENCE: Is there any way [INAUDIBLE]? 01:08:43.870 --> 01:08:45.370 DAVID MALAN: A really good question. 01:08:45.370 --> 01:08:48.279 Is there any way to create an array just by using syntax alone 01:08:48.279 --> 01:08:49.899 without prompting the human for it? 01:08:49.899 --> 01:08:51.490 Short answer, yes. 01:08:51.490 --> 01:08:56.529 If you want to have an array of integers called, for instance, array, 01:08:56.529 --> 01:09:01.090 you could actually do like 13, 42, 50, and something like this 01:09:01.090 --> 01:09:04.300 would give you an array if you use this syntax.
01:09:04.300 --> 01:09:08.680 This would give you an array of size 3 where the three values by default 01:09:08.680 --> 01:09:10.600 are 13, 42, and 50. 01:09:10.600 --> 01:09:13.370 It's not syntax we'll use for now, but there is syntax like that. 01:09:13.370 --> 01:09:15.970 It's not quite as user-friendly, though, as other languages 01:09:15.970 --> 01:09:19.060 if you've indeed programmed before. 01:09:19.060 --> 01:09:24.439 Other questions on this use of arrays? 01:09:24.439 --> 01:09:26.550 Yeah, in front. 01:09:26.550 --> 01:09:29.050 AUDIENCE: [INAUDIBLE] 01:09:29.050 --> 01:09:30.924 DAVID MALAN: Is there a way to copy what? 01:09:30.924 --> 01:09:33.399 AUDIENCE: [INAUDIBLE] 01:09:33.399 --> 01:09:36.310 DAVID MALAN: Oh, is there a way to calculate the length of an array? 01:09:36.310 --> 01:09:39.910 Short answer, no, and I'm about to show you one demonstration of this. 01:09:39.910 --> 01:09:43.899 Those of you who have programmed before in Java, in JavaScript, 01:09:43.899 --> 01:09:47.270 in certain other languages, it's very easy to get the length of an array. 01:09:47.270 --> 01:09:49.720 You essentially just ask the array, what's its length? 01:09:49.720 --> 01:09:51.880 C does not give you that capability. 01:09:51.880 --> 01:09:56.560 The onus is entirely on you and me to remember, as with another variable 01:09:56.560 --> 01:09:59.300 like n, how long the array is. 01:09:59.300 --> 01:10:01.760 And so in fact, let me go ahead and do this. 01:10:01.760 --> 01:10:06.430 I'm going to go ahead and open up a baking style, a program 01:10:06.430 --> 01:10:09.940 that I wrote in advance here which kind of escalates quickly, 01:10:09.940 --> 01:10:13.990 but there's not really too many new ideas here except for the array 01:10:13.990 --> 01:10:14.800 specifics. 01:10:14.800 --> 01:10:19.450 So this is scores.c premade this time. 01:10:19.450 --> 01:10:20.650 And notice what I have.
01:10:20.650 --> 01:10:25.750 One, I've included cs50.h and stdio.h at the top, so that's the same. 01:10:25.750 --> 01:10:28.630 I have declared a constant called N, set it equal to 3. 01:10:28.630 --> 01:10:31.270 That is now the same as of my most recent change. 01:10:31.270 --> 01:10:36.380 I did introduce an average function, which was one of the remaining concerns 01:10:36.380 --> 01:10:40.220 that I could compute the average with some kind of loop, too. 01:10:40.220 --> 01:10:42.980 That average function is going to return a float, which is what 01:10:42.980 --> 01:10:46.100 I want: my average should be a float with the fraction. 01:10:46.100 --> 01:10:47.180 But notice this. 01:10:47.180 --> 01:10:50.360 In answer to your question, if I want a function called 01:10:50.360 --> 01:10:55.100 average to do something like iterate over an array step by step by step, 01:10:55.100 --> 01:10:58.430 add up all the numbers, and divide by the total number of numbers, 01:10:58.430 --> 01:11:03.350 I need to give it the array of numbers, and I need to tell it how many of those 01:11:03.350 --> 01:11:03.950 numbers there are. 01:11:03.950 --> 01:11:06.230 So I literally have to pass in two values. 01:11:06.230 --> 01:11:09.890 Meanwhile, this code is the same as before inside of main. 01:11:09.890 --> 01:11:13.430 I'm declaring a variable called scores of size N. 01:11:13.430 --> 01:11:16.430 I'm iterating from 0 to N. 01:11:16.430 --> 01:11:17.990 And actually-- yep. 01:11:17.990 --> 01:11:22.520 And then in this loop, I'm assigning each of the scores a return 01:11:22.520 --> 01:11:23.750 value of get_int. 01:11:23.750 --> 01:11:27.350 The last line of main is this-- print out the average with %f, 01:11:27.350 --> 01:11:31.280 but don't just do it manually by adding and dividing with parentheses.
01:11:31.280 --> 01:11:36.080 Call the average function, pass in the length of the array and the array 01:11:36.080 --> 01:11:41.810 itself, and hope that it returns a float that then gets plugged into %f. 01:11:41.810 --> 01:11:45.260 So I would claim that pretty much all of this, even though it's a lot, 01:11:45.260 --> 01:11:46.550 should be familiar. 01:11:46.550 --> 01:11:50.780 There are no real new ideas except for this use of the global variable now 01:11:50.780 --> 01:11:52.590 and this average function. 01:11:52.590 --> 01:11:54.740 So let me scroll down to the average function 01:11:54.740 --> 01:11:57.530 because this is the takeaway from this final example. 01:11:57.530 --> 01:11:59.570 In this example here-- 01:11:59.570 --> 01:12:01.640 let me scroll up to the average function, 01:12:01.640 --> 01:12:04.790 I copy-pasted the prototype for the very first line. 01:12:04.790 --> 01:12:06.980 And here's how I'm computing the average. 01:12:06.980 --> 01:12:11.240 There are different ways of doing this, but here's an accumulator way. 01:12:11.240 --> 01:12:15.260 On line 28, I'm declaring a variable inside of the average function called 01:12:15.260 --> 01:12:17.540 sum, and I'm just initializing it to 0. 01:12:17.540 --> 01:12:18.050 Why? 01:12:18.050 --> 01:12:20.630 Mentally I want to add up all of the person's scores 01:12:20.630 --> 01:12:24.480 and then I want to divide by the total and that's my mathematical average. 01:12:24.480 --> 01:12:28.970 So here's my loop where I'm iterating from 0 up to, but not 01:12:28.970 --> 01:12:32.060 through the length-- so that should be three times. 01:12:32.060 --> 01:12:37.950 I am adding to the sum variable whatever is at the i-th location, so to speak, 01:12:37.950 --> 01:12:38.850 of the array. 01:12:38.850 --> 01:12:42.050 So this is array, bracket, 0; array, bracket, 1; array, bracket, 01:12:42.050 --> 01:12:43.860 2 on each iteration.
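A sketch of an `average` function along the lines just described, reconstructed from the narration rather than copied from the premade scores.c. It takes the length and the array as two separate parameters, because a C array does not carry its own length, accumulates into `sum`, and casts the length to `float` so the final division is not integer division.

```c
#include <stdio.h>

// Returns the mean of the first `length` ints in `array`.
// The caller must pass the length: C gives no way to ask an array for it.
float average(int length, int array[])
{
    int sum = 0;  // accumulator, starts at 0
    for (int i = 0; i < length; i++)
    {
        sum += array[i];  // array[0], then array[1], then array[2], ...
    }
    // Cast length to float so this is floating-point division, not integer division
    return sum / (float) length;
}
```

Given an array holding 72, 73, and 33, `average(3, scores)` returns a value very close to 59.33; the exact trailing digits printed by `%f` depend on `float` precision.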
01:12:43.860 --> 01:12:46.670 And then the last thing I'm doing is a nice one-liner. 01:12:46.670 --> 01:12:51.470 I'm dividing the sum, which is an int, which is the sum of 72, 73, 33, 01:12:51.470 --> 01:12:56.550 divided by the length, which is 3, but 3 is not a float, so I cast it to a float 01:12:56.550 --> 01:13:03.060 so that the end value, hopefully, is going to be 59.33333 and so forth. 01:13:03.060 --> 01:13:06.380 So the only thing that's weird syntactically is this, though. 01:13:06.380 --> 01:13:10.430 When you define a function in C that takes an argument that isn't just 01:13:10.430 --> 01:13:14.640 a simple char, isn't just a simple integer, it's actually an array, 01:13:14.640 --> 01:13:17.090 you don't have to know the array's length in advance. 01:13:17.090 --> 01:13:19.820 You can just put square brackets after the name you give it. 01:13:19.820 --> 01:13:21.237 And I don't have to call it array. 01:13:21.237 --> 01:13:23.930 I could call it x or y or z or anything else. 01:13:23.930 --> 01:13:26.390 I called it array just to make clear that it's an array, 01:13:26.390 --> 01:13:30.620 but you do need to know the length somehow. 01:13:30.620 --> 01:13:31.120 OK. 01:13:31.120 --> 01:13:37.820 Questions on combining those ideas in that there way? 01:13:41.170 --> 01:13:42.980 Any questions? 01:13:42.980 --> 01:13:43.850 No? 01:13:43.850 --> 01:13:44.420 All right. 01:13:44.420 --> 01:13:46.790 Well, we've only dealt with numbers thus far. 01:13:46.790 --> 01:13:50.340 It would be nice to actually deal with letters and words and paragraphs 01:13:50.340 --> 01:13:52.340 and the like, much like our readability example, 01:13:52.340 --> 01:13:56.150 but I think first, some snacks and some fruit are served in the transept. 01:13:56.150 --> 01:13:57.140 So we'll see you in 10. 01:13:57.140 --> 01:13:59.480 See you in 10. 01:13:59.480 --> 01:14:00.320 All right. 01:14:00.320 --> 01:14:01.190 So we're back. 
01:14:01.190 --> 01:14:02.960 And up until now, we've been representing 01:14:02.960 --> 01:14:05.060 just numbers underneath the hood, but we've 01:14:05.060 --> 01:14:07.760 introduced arrays, which gave us this ability, recall, 01:14:07.760 --> 01:14:10.260 to store numbers back to back to back. 01:14:10.260 --> 01:14:13.310 So it turns out, you actually had this capability for the past 01:14:13.310 --> 01:14:15.620 week even though you might not have realized it. 01:14:15.620 --> 01:14:19.100 And let me propose that we first consider a very simple example of three 01:14:19.100 --> 01:14:20.750 chars instead of three integers. 01:14:20.750 --> 01:14:23.390 And for simplicity, I'm going to call them c1, c2, 01:14:23.390 --> 01:14:25.400 and c3 just for the sake of discussion. 01:14:25.400 --> 01:14:29.090 But I'm going to put our familiar characters, "HI!" 01:14:29.090 --> 01:14:32.330 in those variables using single quotes because, again, 01:14:32.330 --> 01:14:35.900 that's what you do when using individual chars 01:14:35.900 --> 01:14:40.282 to make the point that I can store three chars in three separate variables. 01:14:40.282 --> 01:14:41.990 So let me go ahead and go over to VS Code 01:14:41.990 --> 01:14:45.180 here and let me create something called hi.c. 01:14:45.180 --> 01:14:50.970 And in this program, I'll first include stdio.h, int main void as before. 01:14:50.970 --> 01:14:53.430 And then inside of main, let's just do exactly that. 01:14:53.430 --> 01:14:57.540 char c1 equals, quote-unquote, capital H. char c2 equals, 01:14:57.540 --> 01:15:00.420 quote-unquote, capital I. char c3 equals, 01:15:00.420 --> 01:15:02.550 quote-unquote, exclamation point. 01:15:02.550 --> 01:15:06.450 So clearly not the best approach, but just for demonstration's sake.
01:15:06.450 --> 01:15:09.780 And here, now that you understand hopefully 01:15:09.780 --> 01:15:12.780 from week 1 that really numbers-- and really, from week 0, 01:15:12.780 --> 01:15:16.020 that numbers are just letters, which can be something more, too, 01:15:16.020 --> 01:15:18.570 we can really just use our basic understanding of C 01:15:18.570 --> 01:15:21.180 to tinker with these ideas now and see them such 01:15:21.180 --> 01:15:24.900 that there is indeed going to be no magic happening for us ultimately. 01:15:24.900 --> 01:15:31.800 So let me go ahead and print out three characters-- %c, %c, %c, backslash n. 01:15:31.800 --> 01:15:34.800 And then print out c1, c2, c3. 01:15:34.800 --> 01:15:36.690 So I've got three separate placeholders. 01:15:36.690 --> 01:15:40.560 And we haven't really had occasion to use %c, but it means put a char here, 01:15:40.560 --> 01:15:44.760 unlike %s, which is put a whole string here, or %i, put an integer. 01:15:44.760 --> 01:15:49.290 Let me go ahead and make hi, no syntax errors, ./hi, 01:15:49.290 --> 01:15:51.330 and it should print out "HI!" 01:15:51.330 --> 01:15:53.400 with an exclamation point because I'm printing out 01:15:53.400 --> 01:15:54.870 just three simple characters. 01:15:54.870 --> 01:15:57.850 But per our discussion as far back as week 0, 01:15:57.850 --> 01:16:01.440 letters are just numbers and numbers are just letters, 01:16:01.440 --> 01:16:03.840 it just depends on the context in which we use them. 01:16:03.840 --> 01:16:05.792 So let me change this %c to %i. 01:16:05.792 --> 01:16:08.250 And I'm going to add a space just so that you can obviously 01:16:08.250 --> 01:16:10.050 separate one number from another. 01:16:10.050 --> 01:16:14.850 Change this to %i, change this to %i, but still print out c1, c2, c3. 01:16:14.850 --> 01:16:16.650 So no integers, per se. 01:16:16.650 --> 01:16:19.500 Let me just print out those chars.
01:16:19.500 --> 01:16:26.670 Let me do make hi, no errors, ./hi, and now I see 72, 73, 33. 01:16:26.670 --> 01:16:31.270 So in the case of chars and ints, you can actually treat one as the other 01:16:31.270 --> 01:16:33.850 so long as you have enough bits to fit one in the other. 01:16:33.850 --> 01:16:36.450 You don't even have to cast or do anything explicitly. 01:16:36.450 --> 01:16:38.340 You do have to cast one of-- 01:16:38.340 --> 01:16:41.910 converting an integer to a float to make clear to the compiler 01:16:41.910 --> 01:16:44.160 that you really intend to do this because that 01:16:44.160 --> 01:16:47.400 could be destructive if it can't quite represent the number as you intend. 01:16:47.400 --> 01:16:50.880 But in this case here, I think we're OK just poking around and seeing 01:16:50.880 --> 01:16:52.750 what's going on underneath the hood. 01:16:52.750 --> 01:16:55.202 Well, what is going on underneath the hood memory-wise? 01:16:55.202 --> 01:16:56.410 Well, something very similar. 01:16:56.410 --> 01:16:57.780 Here's that canvas of memory. 01:16:57.780 --> 01:17:00.570 And maybe we got lucky and it's in the top left-hand corner 01:17:00.570 --> 01:17:03.270 like this-- c1, c2, c3. 01:17:03.270 --> 01:17:05.790 But these are just three individual characters, 01:17:05.790 --> 01:17:08.970 but we're getting awfully close to what we last week called 01:17:08.970 --> 01:17:12.270 a string, which is just characters, a sequence of characters 01:17:12.270 --> 01:17:13.500 from left to right. 01:17:13.500 --> 01:17:19.530 And in fact, I think if we combine this revelation that these are just 01:17:19.530 --> 01:17:22.410 numbers underneath the hood back to back to back 01:17:22.410 --> 01:17:25.620 with the idea of an array from earlier, we can 01:17:25.620 --> 01:17:27.690 start to see what's really going on. 01:17:27.690 --> 01:17:31.920 Because indeed, underneath the hood, these are just numbers, 72, 73, 33.
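A small sketch of the chars-as-numbers idea, assuming an ASCII character set: the same three `char` values print as `HI!` with `%c` and as `72 73 33` with `%i`. Both helper names are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

// Hypothetical helper: returns the integer code stored in a char.
// No cast is needed; a char converts to int implicitly.
int code_of(char c)
{
    return c;
}

// Hypothetical helper: prints the same three chars as characters, then as integers
void print_both_ways(char c1, char c2, char c3)
{
    printf("%c%c%c\n", c1, c2, c3);    // with 'H', 'I', '!': HI!
    printf("%i %i %i\n", c1, c2, c3);  // with 'H', 'I', '!': 72 73 33
}
```

Only the format code changes between the two `printf` calls; the bits stored in `c1`, `c2`, and `c3` are identical both times.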
01:17:31.920 --> 01:17:34.290 And really, if we go lower level than that, 01:17:34.290 --> 01:17:36.870 it's these three patterns of 0's and 1's. 01:17:36.870 --> 01:17:39.270 That's all that's going on inside of the computer, 01:17:39.270 --> 01:17:43.380 but it's our use of int that shows it to us as an integer. 01:17:43.380 --> 01:17:47.250 It's our use of char that makes it clear that it's a char, or equivalently, 01:17:47.250 --> 01:17:50.260 %i and %c respectively. 01:17:50.260 --> 01:17:52.180 But what exactly is a string? 01:17:52.180 --> 01:17:54.540 Well, it's really just a sequence of characters, 01:17:54.540 --> 01:17:56.530 and so why don't we go there? 01:17:56.530 --> 01:17:59.400 Let me propose that we actually give ourselves an actual string, 01:17:59.400 --> 01:18:02.260 call it s-- we'll use double quotes this time. 01:18:02.260 --> 01:18:05.760 So if I go back to VS Code here, let me shorten this program 01:18:05.760 --> 01:18:10.440 and just give myself a single string s, set it equal to "HI!" 01:18:10.440 --> 01:18:11.310 in double quotes. 01:18:11.310 --> 01:18:16.740 And then below that, let's go ahead and print out %s, backslash n, 01:18:16.740 --> 01:18:18.180 and then s itself. 01:18:18.180 --> 01:18:21.120 And then, turns out, for reasons we'll soon 01:18:21.120 --> 01:18:23.670 see, I do need to include the CS50 Library so as 01:18:23.670 --> 01:18:27.750 to use the actual keyword string here even though I'm not using get_string, 01:18:27.750 --> 01:18:29.490 but more on that another time. 01:18:29.490 --> 01:18:34.950 But if I now do make hi, it does compile ./hi and it still prints out the exact 01:18:34.950 --> 01:18:35.920 same thing. 
01:18:35.920 --> 01:18:38.370 But what's going on inside of the computer's memory 01:18:38.370 --> 01:18:42.550 when I use a string called s instead of three chars, well, 01:18:42.550 --> 01:18:46.300 you can think of the string as taking up at least three bytes, H, 01:18:46.300 --> 01:18:47.690 I, exclamation point. 01:18:47.690 --> 01:18:50.440 But it's not three separate variables, it's one variable. 01:18:50.440 --> 01:18:53.620 But what does this really look like now, especially 01:18:53.620 --> 01:18:56.020 if I add back the yellow lines? 01:18:56.020 --> 01:19:00.970 s is really just an array of characters. 01:19:00.970 --> 01:19:04.170 So we called it a string last week, and I claim today 01:19:04.170 --> 01:19:10.320 that this is an abstraction in the CS50 library that's giving us this string, 01:19:10.320 --> 01:19:13.560 but it's really just an array of size at least 3 01:19:13.560 --> 01:19:16.560 here where s, bracket, 0 presumably gives me the H, s, bracket, 01:19:16.560 --> 01:19:19.720 1 is the I, s, bracket, 2 is the exclamation point. 01:19:19.720 --> 01:19:22.410 But just by saying string, all of that happens automatically. 01:19:22.410 --> 01:19:25.320 I don't even need to tell the computer how many chars are 01:19:25.320 --> 01:19:27.880 going to be in this string all at once. 01:19:27.880 --> 01:19:31.680 So in fact, let me go over to maybe a variant of this program 01:19:31.680 --> 01:19:33.570 and we can see this syntactically. 01:19:33.570 --> 01:19:37.480 So instead of printing out the whole string with %s, 01:19:37.480 --> 01:19:43.320 let me actually be a little curious and print out %c, %c, %c, 01:19:43.320 --> 01:19:47.490 and then change s to s, bracket, 0, s, bracket, 1, s, bracket, 2. 01:19:47.490 --> 01:19:49.140 Which is not better in any sense. 
01:20:49.140 --> 01:20:51.000 This is way more tedious now, but it does 01:20:51.000 --> 01:20:54.840 demonstrate that I can treat s here in week 2 01:20:54.840 --> 01:20:57.870 as though it's an array, which means even in week 1 it was an array, 01:20:57.870 --> 01:20:58.840 we just didn't know it. 01:20:58.840 --> 01:21:01.730 We didn't have the syntax with which to express that. 01:21:01.730 --> 01:21:05.740 So if I now do make hi, it still compiles, ./hi. 01:21:05.740 --> 01:21:09.370 Same exact output, but I'm now just kind of manipulating 01:21:09.370 --> 01:21:11.380 the string in these different ways because 01:21:11.380 --> 01:21:13.720 a string is just an array of characters, so I can 01:21:13.720 --> 01:21:16.450 treat it with the square bracket notation. 01:21:16.450 --> 01:21:21.100 But how do I know-- how does the computer know where hi ends? 01:21:21.100 --> 01:21:23.890 And this is where strings get a little dangerous. 01:21:23.890 --> 01:21:26.050 Like a char is 1 byte no matter what. 01:21:26.050 --> 01:21:28.480 1 char, 1 character, that's it. 01:21:28.480 --> 01:21:31.180 But a string, recall my question mark from earlier, 01:21:31.180 --> 01:21:33.415 could be null bytes if it's-- 01:21:33.415 --> 01:21:37.610 you would think it could be 0 bytes if you have nothing inside the quotes. 01:21:37.610 --> 01:21:40.930 It could be one character, two, 10, 100 like I claimed, 01:21:40.930 --> 01:21:44.140 but how does the computer know where strings end? 01:21:44.140 --> 01:21:47.560 Like how does the computer know that the string is not 01:21:47.560 --> 01:21:49.510 the whole row of memory here? 01:21:49.510 --> 01:21:51.350 How does it know that it ends here?
01:21:51.350 --> 01:21:54.880 Well, it turns out, all this time, when we've been using, quote-unquote, 01:21:54.880 --> 01:21:58.120 string and using get_string from the CS50 library, 01:21:58.120 --> 01:22:00.640 there's actually a special sentinel value 01:22:00.640 --> 01:22:03.580 at the end of every string in a computer's memory 01:22:03.580 --> 01:22:06.700 that tells the computer the string stops here. 01:22:06.700 --> 01:22:08.890 And the sentinel value-- and by sentinel, I 01:22:08.890 --> 01:22:13.820 just mean a special value that the world decided on decades ago, is all 0 bits. 01:22:13.820 --> 01:22:20.210 If you have a byte with all 0 bits in it, that means the string ends here. 01:22:20.210 --> 01:22:23.920 So the implication is that the computer now, using a loop or something, 01:22:23.920 --> 01:22:26.650 can print out char, char, char-- oh, done, 01:22:26.650 --> 01:22:28.750 because it sees this special value. 01:22:28.750 --> 01:22:32.800 If it didn't have that, it might blindly go char, char, char, char, char-- 01:22:32.800 --> 01:22:37.450 printing out values of memory that don't belong to that given string. 01:22:37.450 --> 01:22:39.490 So I was correcting myself verbally a moment ago 01:22:39.490 --> 01:22:44.140 because I said that this string is of length 3, it's 3 bytes, but it's not. 01:22:44.140 --> 01:22:47.290 Every string in the world, both last week and now, 01:22:47.290 --> 01:22:51.430 is actually n plus 1 bytes where n is the actual human length 01:22:51.430 --> 01:22:54.220 that you care about, H-I, exclamation point, or 3, 01:22:54.220 --> 01:22:59.110 but it's always going to use one extra byte for this so-called zero value 01:22:59.110 --> 01:22:59.770 at the end. 01:22:59.770 --> 01:23:03.220 And this 0 value is very tedious to write out-- 01:23:03.220 --> 01:23:04.630 as 8 0 bits. 01:23:04.630 --> 01:23:07.240 So we would actually typically just write it as a 0.
01:23:07.240 --> 01:23:10.420 But you don't want to confuse it with a 0 on the screen, 01:23:10.420 --> 01:23:12.290 like the number 0 on the keyboard. 01:23:12.290 --> 01:23:16.000 And so we would actually typically write this symbol as backslash 0. 01:23:16.000 --> 01:23:19.960 So this is the char-based representation of 0. 01:23:19.960 --> 01:23:21.970 So it means the exact same thing; this is just 01:23:21.970 --> 01:23:26.470 C notation that indicates that this is 8 0 bits, 01:23:26.470 --> 01:23:29.380 but just makes clear that it's not literally the number 01:23:29.380 --> 01:23:32.320 0 that you want to see on the screen, it's a sentinel value 01:23:32.320 --> 01:23:34.880 that is terminating this here string. 01:23:34.880 --> 01:23:38.480 So now what can I do once I know this information? 01:23:38.480 --> 01:23:41.740 Well, I can actually even see this. Let me go back to this code 01:23:41.740 --> 01:23:42.790 here in VS Code. 01:23:42.790 --> 01:23:46.190 Let me change these %c's to %i's just like before. 01:23:46.190 --> 01:23:50.290 And now, we'll see again those same numbers, make hi, ./hi, 01:23:50.290 --> 01:23:51.730 there are the three. 01:23:51.730 --> 01:23:56.410 I can technically poke around a little bit further, %i one more, 01:23:56.410 --> 01:23:58.210 and let's look at s, bracket, 3. 01:23:58.210 --> 01:24:02.060 I was not exaggerating earlier when I said, 01:24:02.060 --> 01:24:06.260 in general, if you go past the end of an array, bad things can happen. 01:24:06.260 --> 01:24:10.030 But in this case, I know that there is one more thing at the end of this array 01:24:10.030 --> 01:24:13.210 because this is how strings are built. This is not a CS50 thing, 01:24:13.210 --> 01:24:17.440 this is a thing in C. Every string in the world in double quotes 01:24:17.440 --> 01:24:20.780 ends with a backslash 0-- that is 8 0 bits.
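To see the terminator without the CS50 Library, a sketch can declare the string as a plain `char` array, an assumption made here only so the example compiles on its own. The compiler then reserves 4 bytes for "HI!", and index 3 holds the NUL, so reading `s[3]` stays in bounds. The function name `show_terminator` is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: prints every byte of "HI!" as an integer, including
// the terminating NUL, and returns how many bytes the array occupies.
int show_terminator(void)
{
    char s[] = "HI!";  // stored as 'H', 'I', '!', '\0'
    printf("%i %i %i %i\n", s[0], s[1], s[2], s[3]);  // 72 73 33 0
    return (int) sizeof(s);  // 4: three letters plus one byte for '\0'
}
```

The fourth number is the sentinel itself: 8 0 bits, written `'\0'` in C.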
01:23:20.780 --> 01:23:24.400 So if I really want, I can see this by printing out s, bracket, 3, 01:23:24.400 --> 01:23:26.290 which is the fourth and final location. 01:23:26.290 --> 01:23:34.120 If I recompile my code now, make hi, ./hi, I should see 72, 73, 33, and 0. 01:23:34.120 --> 01:23:35.470 That's always been there. 01:23:35.470 --> 01:23:40.900 So I'm always using 4 bytes, somewhat wastefully, but somewhat necessarily 01:23:40.900 --> 01:23:45.080 so that the computer actually knows where that string ends. 01:23:45.080 --> 01:23:48.070 So if we go back to the memory representation of this here, 01:23:48.070 --> 01:23:52.990 it's just as though you have an array of integers being stored contiguously back 01:23:52.990 --> 01:23:56.800 to back to back, the last one of which means this is the end of the array 01:23:56.800 --> 01:24:00.120 of characters, but because I'm using, quote-unquote, "string," 01:24:00.120 --> 01:24:04.080 because I'm using %s and %c, I'm not seeing these numbers by default. 01:24:04.080 --> 01:24:08.950 I'm seeing H-I, exclamation point unless I explicitly tell printf, no, no, no, 01:24:08.950 --> 01:24:13.470 no, show me with %i these actual integers. 01:24:13.470 --> 01:24:15.840 This, then, is how you can think about the string. 01:24:15.840 --> 01:24:17.100 You don't really need to think about 01:24:17.100 --> 01:24:18.540 it as being individual characters. 01:24:18.540 --> 01:24:21.600 This is just s, and it has some length here, 01:24:21.600 --> 01:24:26.760 but it's not an array that you yourself have to create; 01:24:26.760 --> 01:24:30.820 you get it automatically just by using a string. 01:24:30.820 --> 01:24:32.910 Now there's just-- not to add on to the jargon. 01:24:32.910 --> 01:24:35.760 This backslash 0, these 8 0 bits, there's 01:24:35.760 --> 01:24:37.290 actually a technical term for them. 01:24:37.290 --> 01:24:38.430 You can call them NUL.
01:24:38.430 --> 01:24:41.430 It's typically written in all caps like this, confusingly. 01:24:41.430 --> 01:24:44.580 In a couple of weeks, we're going to see another word pronounced null, 01:24:44.580 --> 01:24:48.720 but spelled N-U-L-L. Left hand wasn't talking to right hand years ago, 01:24:48.720 --> 01:24:54.000 but N-U-L means this is the 0 byte that terminates strings, 01:24:54.000 --> 01:24:56.520 that indicates the end of a string. 01:24:56.520 --> 01:25:00.070 And fun fact, you've actually seen this before even though we glossed over it. 01:25:00.070 --> 01:25:02.490 Here's that ASCII chart from last time. 01:25:02.490 --> 01:25:08.850 If I focus on the leftmost column, guess what the 0 ASCII character is? 01:25:08.850 --> 01:25:09.480 NUL. 01:25:09.480 --> 01:25:14.005 You never see NUL on the screen; it's just how you pronounce 8 0 bits. 01:25:14.005 --> 01:25:14.810 Whew! 01:25:14.810 --> 01:25:17.360 Questions on this representation of strings? 01:25:17.360 --> 01:25:18.250 Yeah? 01:25:18.250 --> 01:25:20.090 AUDIENCE: Are strings [INAUDIBLE]? 01:25:20.090 --> 01:25:22.380 DAVID MALAN: Are strings structured differently in other languages? 01:25:22.380 --> 01:25:22.760 Yes. 01:25:22.760 --> 01:25:24.590 They are more powerful in other languages. 01:25:24.590 --> 01:25:28.070 In C, you have to build them yourself in this way. 01:25:28.070 --> 01:25:29.900 More on that when we get to Python. 01:25:29.900 --> 01:25:30.710 Other questions? 01:25:30.710 --> 01:25:31.593 Yeah? 01:25:31.593 --> 01:25:41.170 AUDIENCE: [INAUDIBLE] 01:25:41.170 --> 01:25:42.670 DAVID MALAN: A really good question. 01:25:42.670 --> 01:25:45.840 Does that mean we don't have a function to get the length of a string? 01:25:45.840 --> 01:25:47.700 Do we have to create it? 01:25:47.700 --> 01:25:51.360 Short answer, there is a function, but you have to-- someone 01:25:51.360 --> 01:25:52.540 had to write code for it.
01:25:52.540 --> 01:25:56.010 You can't just ask the string itself like you can in JavaScript or Java. 01:25:56.010 --> 01:25:57.150 What is the-- 01:25:57.150 --> 01:25:59.355 AUDIENCE: [INAUDIBLE] 01:25:59.355 --> 01:26:00.480 DAVID MALAN: Yeah, you can. 01:26:00.480 --> 01:26:04.230 It's actually more similar to Python than it is to JavaScript or Java, 01:26:04.230 --> 01:26:07.110 but we'll see that in just a few minutes, in fact. 01:26:07.110 --> 01:26:09.340 So let's introduce maybe a couple of strings. 01:26:09.340 --> 01:26:12.360 So here are two strings in the abstract called s and t, 01:26:12.360 --> 01:26:15.210 and I've initialized them arbitrarily to "HI!" and "BYE!" 01:26:15.210 --> 01:26:18.840 just so we can explore what's going to actually happen underneath the hood. 01:26:18.840 --> 01:26:20.640 So let me go back to VS Code. 01:26:20.640 --> 01:26:23.680 Let me just completely change this program to be that instead. 01:26:23.680 --> 01:26:26.280 So string s equals, quote-unquote, "HI!" 01:26:26.280 --> 01:26:28.860 string t equals, quote-unquote, "BYE!" 01:26:28.860 --> 01:26:29.860 in all caps. 01:26:29.860 --> 01:26:34.620 And then let's print them both out very simply. %s backslash n, s. 01:26:34.620 --> 01:26:39.570 Print out %s backslash n, t just so we can see what's going on. 01:26:39.570 --> 01:26:44.183 If I do make hi, ./hi, I should, of course, see these two strings. 01:26:44.183 --> 01:26:46.350 But what's going on inside of the computer's memory? 01:26:46.350 --> 01:26:48.868 Well, in this computer's memory, assuming 01:26:48.868 --> 01:26:51.660 these are the only two variables involved and assuming the computer 01:26:51.660 --> 01:26:55.170 is just doing things top to bottom, "HI!" 01:26:55.170 --> 01:26:58.260 is probably going to be stored somewhere like this on my canvas of memory, 01:26:58.260 --> 01:26:58.950 "BYE!" 01:26:58.950 --> 01:27:00.290 is probably going to be stored there.
01:27:00.290 --> 01:27:03.165 And it's wrapping around, but that's just an artist's representation. 01:27:03.165 --> 01:27:05.380 But notice that it is now really important 01:27:05.380 --> 01:27:08.890 that there is this NUL byte at the end of each string 01:27:08.890 --> 01:27:11.650 because that's how the computer is going to know where "HI!" 01:27:11.650 --> 01:27:13.630 ends and where "BYE!" 01:27:13.630 --> 01:27:15.670 begins; otherwise you might see "HI!" 01:27:15.670 --> 01:27:16.360 "BYE!" 01:27:16.360 --> 01:27:20.380 all on the screen at once if there weren't the sentinel value indicating 01:27:20.380 --> 01:27:23.860 to printf, stop at this character. 01:27:23.860 --> 01:27:26.290 But that's all that's going on in your program 01:27:26.290 --> 01:27:29.080 when you have two variables in this way. 01:27:29.080 --> 01:27:32.290 And in fact, what's really going on and things get a little more interesting 01:27:32.290 --> 01:27:37.310 here, if I were to want two of these things, 01:27:37.310 --> 01:27:40.630 notice that I could refer to them, too, as arrays. 01:27:40.630 --> 01:27:43.990 So s, bracket, 0, 1, 2, and even 3. 01:27:43.990 --> 01:27:47.110 t, bracket, 0, 1, 2, and even 3 and 4. 01:27:47.110 --> 01:27:51.460 But if I want to actually really blend some ideas, 01:27:51.460 --> 01:27:54.190 just playing around with these basic principles now, 01:27:54.190 --> 01:27:56.140 notice what I can do in this version. 01:27:56.140 --> 01:27:59.200 If I know I've got two arrays in VS Code, 01:27:59.200 --> 01:28:02.950 I don't strictly need to do string s and t and u 01:28:02.950 --> 01:28:08.260 and v. That's devolving back into the scores1, scores2, scores3 mantra where 01:28:08.260 --> 01:28:10.277 I had multiple variables with almost the same name 01:28:10.277 --> 01:28:12.610 even though I'm using different letters of the alphabet. 01:28:12.610 --> 01:28:13.840 What if I want-- 01:28:13.840 --> 01:28:15.280 what if I do this?
01:28:15.280 --> 01:28:19.660 string words, and if I want to store two words in the computer's memory, fine. 01:28:19.660 --> 01:28:22.700 Create an array of two strings. 01:28:22.700 --> 01:28:23.620 But what is a string? 01:28:23.620 --> 01:28:28.870 A string is an array of characters, so it's getting a little bit trippy here, 01:28:28.870 --> 01:28:32.290 but the ideas are still going to be the same. words, bracket, 01:28:32.290 --> 01:28:34.540 0 could certainly equal "HI!" 01:28:34.540 --> 01:28:39.280 words, bracket, 1 can certainly equal "BYE!" just like the scores example. 01:28:39.280 --> 01:28:42.910 And then if I want to print these things with %s, I can print out words, 01:28:42.910 --> 01:28:44.080 bracket, 0. 01:28:44.080 --> 01:28:48.820 And then I can print out %s backslash n words bracket 1. 01:28:48.820 --> 01:28:52.520 And the example is not going to be any different in terms of its output, 01:28:52.520 --> 01:28:58.240 but I've now avoided s and t, I now just have one variable called words 01:28:58.240 --> 01:29:00.710 containing both of these here things. 01:29:00.710 --> 01:29:02.800 And if I really want to poke around, here's 01:29:02.800 --> 01:29:06.490 where things get even more visually overwhelming, 01:29:06.490 --> 01:29:09.640 but just the logical extension of these same ideas. 01:29:09.640 --> 01:29:13.300 Right now is the previous version where I had two variables, s and t. 01:29:13.300 --> 01:29:17.290 If I now use this new version where I have one variable called words, 01:29:17.290 --> 01:29:22.060 just like this here, the picture should follow logically like this. 01:29:22.060 --> 01:29:26.320 words, bracket, 0 is this string; words, bracket, 1 is this string; 01:29:26.320 --> 01:29:27.940 but what is each string? 01:29:27.940 --> 01:29:29.840 It's an array of characters. 01:29:29.840 --> 01:29:36.520 And so you can also think of it like this, where this H is words, bracket, 01:29:36.520 --> 01:29:37.930 0, bracket, 0. 
01:29:37.930 --> 01:29:41.440 So the 0-th character of the 0-th word. 01:29:41.440 --> 01:29:45.580 And this is words, bracket, 0, 1; words, bracket, 0, 2; words, bracket, 0, 3. 01:29:45.580 --> 01:29:49.180 And then words, bracket, 1, 0. 01:29:49.180 --> 01:29:52.382 So it's kind of like a two-dimensional array, almost. 01:29:52.382 --> 01:29:54.340 And you can think about it that way if helpful. 01:29:54.340 --> 01:29:58.400 But for now, it's just applying the same principles to the code. 01:29:58.400 --> 01:30:01.930 So if I go to my code here and I've got my "HI!" and my "BYE!"-- 01:30:01.930 --> 01:30:07.000 this is going to look a little stupid, but let me change this %s to %c, %c, 01:30:07.000 --> 01:30:09.640 %c, and print out words, bracket, 0, bracket, 0; 01:30:09.640 --> 01:30:11.620 words, bracket, 0, bracket, 1; 01:30:11.620 --> 01:30:16.900 words, bracket, 0, bracket, 2 to print out that three-letter word. 01:30:16.900 --> 01:30:21.550 And now down here, let me print out %c, %c, %c, 01:30:21.550 --> 01:30:24.340 %c because it's four characters in BYE, exclamation point. 01:30:24.340 --> 01:30:28.570 This is words, bracket, 1, but the first character; words, bracket, 1, 01:30:28.570 --> 01:30:32.920 the second character; words, bracket, 1, the third character; 01:30:32.920 --> 01:30:34.948 and words, bracket, 1, the fourth character. 01:30:34.948 --> 01:30:37.240 It's hard to say when you're typing a different number, 01:30:37.240 --> 01:30:40.810 but that's what we get by using zero indexing, so to speak. 01:30:40.810 --> 01:30:41.720 make hi. 01:30:41.720 --> 01:30:42.220 Whew! 01:30:42.220 --> 01:30:42.940 No mistakes. 01:30:42.940 --> 01:30:43.780 "HI!" 01:30:43.780 --> 01:30:45.440 It says the same thing. 01:30:45.440 --> 01:30:46.840 So again, there's no magic. 01:30:46.840 --> 01:30:49.630 Like you are fully in control over what's going 01:30:49.630 --> 01:30:51.560 on inside of the computer's memory.
01:30:51.560 --> 01:30:54.250 And now that we have this array syntax with square brackets, 01:30:54.250 --> 01:30:58.740 you can both create these things and then manipulate them or access them 01:30:58.740 --> 01:31:01.310 however you so choose. 01:31:01.310 --> 01:31:01.810 Whew! 01:31:01.810 --> 01:31:08.540 Questions on arrays or strings in this way? 01:31:08.540 --> 01:31:10.032 Yeah, over here. 01:31:10.032 --> 01:31:13.340 AUDIENCE: Can you have an array that has multiple data types in it? 01:31:13.340 --> 01:31:13.880 DAVID MALAN: Good question. 01:31:13.880 --> 01:31:16.255 Can you have an array with multiple different data types? 01:31:16.255 --> 01:31:19.310 Short answer, no; longer answer, sort of, 01:31:19.310 --> 01:31:22.670 but not in nearly the same user-friendly way as with languages 01:31:22.670 --> 01:31:25.220 like Python or JavaScript or others. 01:31:25.220 --> 01:31:30.580 So assume for now that an array's elements should all be the same type in C. Other questions? 01:31:30.580 --> 01:31:31.997 Yeah, over here. 01:31:31.997 --> 01:31:34.432 AUDIENCE: When you talk about [INAUDIBLE]? 01:31:47.113 --> 01:31:48.780 DAVID MALAN: Oh, a really good question. 01:31:48.780 --> 01:31:51.500 It will-- so for those who couldn't hear, 01:31:51.500 --> 01:31:54.425 if you were to look past the end of one array, 01:31:54.425 --> 01:31:56.550 would you start to see the beginning of the second? 01:31:56.550 --> 01:31:58.008 In this case, maybe the word "BYE!" 01:31:58.008 --> 01:32:01.070 It could depend on the particulars of your code and the computer. 01:32:01.070 --> 01:32:02.250 Let's try this. 01:32:02.250 --> 01:32:07.310 So let's get a little greedy here and go one past H-I, exclamation point, 01:32:07.310 --> 01:32:11.450 to the null character by looking at words, bracket, 0, 3, 01:32:11.450 --> 01:32:16.220 which should actually be our null character, so that's going to be there. 01:32:16.220 --> 01:32:18.350 And actually, let's see.
01:32:18.350 --> 01:32:19.490 Let's go ahead and do this. 01:32:19.490 --> 01:32:21.530 make hi, ./hi. 01:32:21.530 --> 01:32:25.310 Still works as expected, but let me change this to integer, 01:32:25.310 --> 01:32:27.770 integer so we can actually see what's going on. 01:32:27.770 --> 01:32:28.610 Integer. 01:32:28.610 --> 01:32:32.840 And now, if I recompile with make hi, I should see the same thing, 01:32:32.840 --> 01:32:34.100 but numerically. 01:32:34.100 --> 01:32:37.430 And now what I think you're proposing is let's get a little crazy 01:32:37.430 --> 01:32:41.000 and go even past that to what could be location 4, 01:32:41.000 --> 01:32:45.740 but we know semantically doesn't exist, but maybe is bumping up against "BYE!" 01:32:45.740 --> 01:32:49.140 So make hi, ./hi. 01:32:49.140 --> 01:32:52.440 And guess what 66 is. 01:32:52.440 --> 01:32:54.360 Well, just the B, but yes. 01:32:54.360 --> 01:32:59.600 66, recall, is capital B because in week 0, capital A was 65. 01:32:59.600 --> 01:33:01.350 So indeed, now we're really poking around. 01:33:01.350 --> 01:33:02.267 And you can get crazy. 01:33:02.267 --> 01:33:05.520 Like, look at what's 400 characters away and see what's going on there. 01:33:05.520 --> 01:33:07.870 Eventually your program will probably crash, 01:33:07.870 --> 01:33:12.300 and so don't poke around too much, but more on that in the coming days, too. 01:33:12.300 --> 01:33:16.298 All right, well how about some other revelations and problem-solving? 01:33:16.298 --> 01:33:18.840 Now coming back to the question about string length earlier, 01:33:18.840 --> 01:33:21.465 and we'll see if we can then tie this all together to something 01:33:21.465 --> 01:33:24.390 like cryptography in the end and manipulating strings 01:33:24.390 --> 01:33:26.580 for the purpose of sending them securely. 01:33:26.580 --> 01:33:30.430 So let me propose that we go into VS Code here again in a moment.
01:33:30.430 --> 01:33:32.430 And I'm going to create a program called length. 01:33:32.430 --> 01:33:36.490 Let's actually figure out ourselves the length of a string initially. 01:33:36.490 --> 01:33:39.750 So I'm going to go ahead and code length.c. 01:33:39.750 --> 01:33:42.450 I'm going to go ahead and include cs50.h. 01:33:42.450 --> 01:33:46.170 I'm going to include stdio.h, int main void. 01:33:46.170 --> 01:33:49.620 And then inside of main, I'm going to prompt the user for their name. 01:33:49.620 --> 01:33:51.930 get_string, quote-unquote, "Name." 01:33:51.930 --> 01:33:55.152 And then I'm going to go ahead and I want 01:33:55.152 --> 01:33:56.610 to count the length of this string. 01:33:56.610 --> 01:33:57.943 But I know what a string is now. 01:33:57.943 --> 01:34:01.720 It's char, char, char, char, and then eventually the null character. 01:34:01.720 --> 01:34:02.685 So I can look for that. 01:34:02.685 --> 01:34:04.560 And I can write this in a few different ways. 01:34:04.560 --> 01:34:06.518 I know a bunch of different types of loops now, 01:34:06.518 --> 01:34:10.440 but I'm going to go with a while loop by first declaring a variable n, 01:34:10.440 --> 01:34:12.618 for number of characters, set it equal to 0. 01:34:12.618 --> 01:34:14.910 It's like starting to count with your fingers all down, 01:34:14.910 --> 01:34:17.910 and I want to do the equivalent of this, counting each of the letters 01:34:17.910 --> 01:34:18.810 that I type in. 01:34:18.810 --> 01:34:20.490 So I can do that as follows. 01:34:20.490 --> 01:34:29.160 While the name variable at location n does not equal, 01:34:29.160 --> 01:34:32.910 quote-unquote, backslash 0, which looks weird, 01:34:32.910 --> 01:34:35.850 but it's just asking the question, is the character 01:34:35.850 --> 01:34:39.850 at that location equal to the so-called null character? 01:34:39.850 --> 01:34:43.560 Which is written with single quotes and backslash 0 by convention. 
01:34:43.560 --> 01:34:48.300 And what I want to do, while that is true, is just add 1 to n. 01:34:48.300 --> 01:34:52.440 And then at the very bottom here, let's just go ahead and print out with %i 01:34:52.440 --> 01:34:57.540 the value of n because presumably if I type in HI, exclamation point, 01:34:57.540 --> 01:35:01.860 I'm starting at 0 and I'm going to have H, I, exclamation point, 01:35:01.860 --> 01:35:05.800 null character so I don't increment n a fourth time. 01:35:05.800 --> 01:35:08.460 So let's go ahead and run down here. 01:35:08.460 --> 01:35:12.735 make length ./length, Enter. 01:35:12.735 --> 01:35:15.360 Well, I guess I'm asking for name, so I'll do my name for real. 01:35:15.360 --> 01:35:18.840 David, five characters, and I indeed get 5. 01:35:18.840 --> 01:35:22.750 If I used a for loop, I could do something similar, 01:35:22.750 --> 01:35:26.070 but I think this while loop approach, much like our counter from the past, 01:35:26.070 --> 01:35:27.330 is fairly straightforward. 01:35:27.330 --> 01:35:28.600 But what if I want to do this? 01:35:28.600 --> 01:35:30.780 What if I want to make another function for this? 01:35:30.780 --> 01:35:32.100 Well, I could do that. 01:35:32.100 --> 01:35:32.888 Let me-- 01:35:32.888 --> 01:35:33.930 All right, let's do this. 01:35:33.930 --> 01:35:36.840 Let's write a quick function called string_length. 01:35:36.840 --> 01:35:40.172 It's going to take a string called s or whatever as input. 01:35:40.172 --> 01:35:41.130 And then you know what? 01:35:41.130 --> 01:35:43.590 Let's just do this in that function. 01:35:43.590 --> 01:35:45.810 I'm going to borrow my code from a moment ago. 01:35:45.810 --> 01:35:47.720 I'm going to paste it into this function. 01:35:47.720 --> 01:35:49.470 But I'm not going to print out the length, 01:35:49.470 --> 01:35:51.060 I'm going to return the length n. 
01:35:51.060 --> 01:35:53.280 So I have a helper function of sorts that's 01:35:53.280 --> 01:35:55.590 going to hand me back the length of the string, 01:35:55.590 --> 01:36:00.780 and that's why this returns an int, but takes a string as its argument. 01:36:00.780 --> 01:36:01.860 How do I use this? 01:36:01.860 --> 01:36:04.120 Well, first, I do need to copy the prototype 01:36:04.120 --> 01:36:06.090 so I don't get into trouble as before. 01:36:06.090 --> 01:36:07.020 Semicolon. 01:36:07.020 --> 01:36:10.020 And then in my main function, what I think I can do now 01:36:10.020 --> 01:36:11.380 is something like this. 01:36:11.380 --> 01:36:17.942 I can do int length equals string_length of the name variable 01:36:17.942 --> 01:36:18.900 that was just typed in. 01:36:18.900 --> 01:36:23.940 And now using printf %i, print out length, semicolon. 01:36:23.940 --> 01:36:25.440 So exact same logic. 01:36:25.440 --> 01:36:28.050 The only thing I've done that's different this time is I've 01:36:28.050 --> 01:36:30.210 added a helper function just to demonstrate 01:36:30.210 --> 01:36:32.610 how I can take some pretty basic functionality, 01:36:32.610 --> 01:36:35.010 find the length of a string, and modularize it 01:36:35.010 --> 01:36:38.040 into a function, abstract it away, so I never again have 01:36:38.040 --> 01:36:39.270 to copy-paste that while loop. 01:36:39.270 --> 01:36:41.020 I now have a function called string_length 01:36:41.020 --> 01:36:43.695 that will solve this problem for me. 01:36:43.695 --> 01:36:46.600 Whoops, wrong program. make length. 01:36:46.600 --> 01:36:47.100 Huh. 01:36:47.100 --> 01:36:51.590 Use of undeclared identifier 'name'. 01:36:51.590 --> 01:36:53.090 What did I do wrong? 01:36:53.090 --> 01:36:59.350 Apparently on line 16 of length.c, what did I do wrong here? 01:36:59.350 --> 01:37:00.639 Yeah, in front. 01:37:00.639 --> 01:37:06.210 AUDIENCE: [INAUDIBLE] 01:37:06.210 --> 01:37:06.960 DAVID MALAN: Good.
01:37:06.960 --> 01:37:09.190 AUDIENCE: [INAUDIBLE] 01:37:09.190 --> 01:37:09.940 DAVID MALAN: Good. 01:37:09.940 --> 01:37:10.840 Perfect terminology. 01:37:10.840 --> 01:37:12.850 So name is local to main. 01:37:12.850 --> 01:37:16.930 The scope of name is main-- a similar idea, but different words. 01:37:16.930 --> 01:37:19.720 And so I actually should be calling this 01:37:19.720 --> 01:37:24.970 s because s is the name of the parameter being passed in, even though it 01:37:24.970 --> 01:37:29.410 happens to be one and the same as name because on line 9, 01:37:29.410 --> 01:37:32.060 I'm indeed passing in name as the argument. 01:37:32.060 --> 01:37:32.560 All right. 01:37:32.560 --> 01:37:35.450 So this is where, again, copy-paste can sometimes get you into trouble. 01:37:35.450 --> 01:37:36.760 Let's try to make length again. 01:37:36.760 --> 01:37:42.520 Now it works. ./length, D-A-V-I-D, and now we have a function that seems to be 01:37:42.520 --> 01:37:43.090 working. 01:37:43.090 --> 01:37:45.490 But this is such commodity functionality. 01:37:45.490 --> 01:37:47.770 Like my God, surely someone before us 01:37:47.770 --> 01:37:51.070 has written a function to get the length of a string, 01:37:51.070 --> 01:37:53.080 and indeed, other people have. 01:37:53.080 --> 01:37:56.560 So it turns out that in C, just as you have the stdio library, 01:37:56.560 --> 01:38:00.580 you also have a string library whose header file is called, appropriately, 01:38:00.580 --> 01:38:01.600 string.h. 01:38:01.600 --> 01:38:05.200 In fact, CS50 has documentation for it in its own manual pages, 01:38:05.200 --> 01:38:08.020 so to speak, along with some sample usage thereof.
01:38:08.020 --> 01:38:10.580 But it turns out, in the string library, there 01:38:10.580 --> 01:38:13.850 is a very popular function analogous to the Python one 01:38:13.850 --> 01:38:16.370 that you asked about earlier, called strlen, 01:38:16.370 --> 01:38:19.250 where strlen, one word, no underscores, just 01:38:19.250 --> 01:38:20.875 figures out the length of a string. 01:38:20.875 --> 01:38:23.000 And honestly, I've never looked at its source code, 01:38:23.000 --> 01:38:26.030 but it probably uses a while loop, maybe it uses a for loop, 01:38:26.030 --> 01:38:30.320 but it certainly uses the same idea of just iterating-- that is, 01:38:30.320 --> 01:38:33.380 walking from left to right over a string 01:38:33.380 --> 01:38:36.860 in order to figure out what the length of a given string is. 01:38:36.860 --> 01:38:38.040 So how do we use this? 01:38:38.040 --> 01:38:42.410 Well, if I go back to VS Code here, I can throw away 01:38:42.410 --> 01:38:44.810 the entirety of my string_length function, 01:38:44.810 --> 01:38:47.870 I can throw away the prototype, therefore, 01:38:47.870 --> 01:38:52.640 and I can include a third header file, string.h, inside 01:38:52.640 --> 01:38:55.460 of which I claim now is this function called strlen 01:38:55.460 --> 01:38:58.370 that I can just now use out of the box for free 01:38:58.370 --> 01:39:00.560 because someone else wrote this function for me. 01:39:00.560 --> 01:39:03.870 And string.h will teach the compiler that it exists. 01:39:03.870 --> 01:39:10.700 So if I now do make length and ./length, now I have a similarly working program 01:39:10.700 --> 01:39:14.720 that doesn't bother having me write unnecessary code. 01:39:14.720 --> 01:39:16.880 So this is another example of a library. 01:39:16.880 --> 01:39:22.060 The string library is just going to make our lives easier 01:39:22.060 --> 01:39:25.082 by saving us from having to reinvent some wheel.
01:39:25.082 --> 01:39:27.290 All right, well where else does this get interesting? 01:39:27.290 --> 01:39:29.730 How about something like this? 01:39:29.730 --> 01:39:31.970 Let me go back into VS Code here. 01:39:31.970 --> 01:39:35.138 Let's create a program called string.c-- 01:39:35.138 --> 01:39:38.180 we'll play around with our own strings-- that's going to start similarly. 01:39:38.180 --> 01:39:44.030 So let's include cs50.h, let's include stdio.h, 01:39:44.030 --> 01:39:48.230 let's include string.h so we can use that same strlen function. 01:39:48.230 --> 01:39:50.030 int main void. 01:39:50.030 --> 01:39:51.810 And inside of this, let's do this. 01:39:51.810 --> 01:39:57.410 Let's get a string s and prompt the user for any old string as input. 01:39:57.410 --> 01:39:57.910 All right. 01:39:57.910 --> 01:40:04.140 And then let's go ahead and maybe print out, quote-unquote, "Output." 01:40:04.140 --> 01:40:07.238 And I'm just going to line up my spaces just right because these words are 01:40:07.238 --> 01:40:09.780 slightly different lengths, but we'll see why I'm doing this. 01:40:09.780 --> 01:40:11.860 It's just for aesthetics' sake in a moment. 01:40:11.860 --> 01:40:13.380 And let's go ahead now and do this. 01:40:13.380 --> 01:40:17.348 If I want to print out every character in a string, how can I now do this? 01:40:17.348 --> 01:40:19.140 Well, this is actually a pretty common task 01:40:19.140 --> 01:40:23.580 even though this version, thereof, will seem pointless. for int i gets 0, 01:40:23.580 --> 01:40:26.460 i is less than the length of s. 01:40:26.460 --> 01:40:31.800 i++ is just the conventional way to start a loop that iterates from left 01:40:31.800 --> 01:40:34.260 to right over a string of that length. 01:40:34.260 --> 01:40:38.190 And then let's go ahead and print out each character, %c, 01:40:38.190 --> 01:40:43.800 printing out the string at location i using our fancy new array syntax. 
01:40:43.800 --> 01:40:45.780 And at the very end of this program, let's just 01:40:45.780 --> 01:40:48.870 print out a new line character just to move the cursor to the bottom 01:40:48.870 --> 01:40:50.200 like we've done in the past. 01:40:50.200 --> 01:40:54.030 So this is kind of a stupid program like I am reinventing the wheel that is 01:40:54.030 --> 01:40:56.130 the %s format code. 01:40:56.130 --> 01:40:58.600 I already know that printf can print out a whole string. 01:40:58.600 --> 01:40:59.650 Suppose it didn't. 01:40:59.650 --> 01:41:03.100 Suppose I forgot about %s and I only knew about %c, 01:41:03.100 --> 01:41:09.100 these lines of code here collectively will print out the entirety of a string 01:41:09.100 --> 01:41:12.050 character by character based on its length. 01:41:12.050 --> 01:41:17.770 So if I compile this program, make string ./string and type in my name-- 01:41:17.770 --> 01:41:20.870 for instance, David, the output is D-A-V-I-D, 01:41:20.870 --> 01:41:22.870 and here's why I hit the spacebar an extra time, 01:41:22.870 --> 01:41:26.230 because I wanted input and output to line up nicely so we could see that 01:41:26.230 --> 01:41:27.680 they're, in fact, the same length. 01:41:27.680 --> 01:41:28.930 So let me just stipulate. 01:41:28.930 --> 01:41:35.390 This code is correct, but there is an inefficiency with this line of code. 01:41:35.390 --> 01:41:38.020 Let's talk about design instinctively. 01:41:38.020 --> 01:41:42.550 What is maybe bad about this line of code 9-- 01:41:42.550 --> 01:41:44.650 line 9 that I've highlighted? 01:41:44.650 --> 01:41:47.020 This one is subtle. 01:41:47.020 --> 01:41:47.950 Let's go over here. 01:41:47.950 --> 01:41:51.900 AUDIENCE: [INAUDIBLE] 01:41:51.900 --> 01:41:54.120 DAVID MALAN: Yeah. 01:41:54.120 --> 01:41:58.930 I'm calling strlen inside of the loop again and again and again. 01:41:58.930 --> 01:41:59.430 Why? 01:41:59.430 --> 01:42:00.847 Well, recall how for loops worked. 
01:42:00.847 --> 01:42:03.870 When we walked through it last week, that middle part of for loop 01:42:03.870 --> 01:42:07.230 in between the semicolons keeps getting checked, keeps getting checked, 01:42:07.230 --> 01:42:08.320 keeps getting checked. 01:42:08.320 --> 01:42:12.030 And so if you put a function call there, which is totally fine syntactically, 01:42:12.030 --> 01:42:14.970 you're asking the same damn question again and again and again. 01:42:14.970 --> 01:42:17.790 And the length of David, D-A-V-I-D, is never changing. 01:42:17.790 --> 01:42:21.330 So strlen, implemented decades ago by some other human, 01:42:21.330 --> 01:42:23.700 has some kind of loop in it, and you're literally 01:42:23.700 --> 01:42:26.580 making that code run again and again and again just 01:42:26.580 --> 01:42:29.123 to get the same answer 5 again and again. 01:42:29.123 --> 01:42:30.540 So I think your instinct is right. 01:42:30.540 --> 01:42:33.928 I could come up with another variable outside of the loop. 01:42:33.928 --> 01:42:35.220 I could do something like this. 01:42:35.220 --> 01:42:40.830 int length equals strlen of s, and then I could just plug that in. 01:42:40.830 --> 01:42:42.658 But there's a slightly more elegant way. 01:42:42.658 --> 01:42:44.700 If you like doing things with slightly less code, 01:42:44.700 --> 01:42:46.440 this is correct as I've now written it. 01:42:46.440 --> 01:42:50.400 It's less efficient-- it's more efficient because I'm only 01:42:50.400 --> 01:42:53.440 calling strlen once now on this new line 9, 01:42:53.440 --> 01:42:56.560 but a more common way to write this would typically 01:42:56.560 --> 01:42:58.360 be to do something like this. 01:42:58.360 --> 01:43:02.860 After initializing i, you can also initialize something else like length. 01:43:02.860 --> 01:43:07.580 And you can set length equal to strlen of s, then your semicolon, 01:43:07.580 --> 01:43:10.755 and now you can say while i is less than that length. 
01:43:10.755 --> 01:43:12.130 Or I can tighten this up further. 01:43:12.130 --> 01:43:15.580 If it's just a number and it's a super short loop, might as well just call it 01:43:15.580 --> 01:43:16.120 n. 01:43:16.120 --> 01:43:20.920 So this now would be a canonical way of implementing the exact same idea, 01:43:20.920 --> 01:43:23.770 but without the inefficiency because now you're 01:43:23.770 --> 01:43:28.150 calling strlen in the initialization part of the for loop, 01:43:28.150 --> 01:43:32.650 not inside of the Boolean expression that gets checked and executed 01:43:32.650 --> 01:43:34.000 again and again. 01:43:34.000 --> 01:43:34.510 Yeah? 01:43:34.510 --> 01:43:38.965 AUDIENCE: [INAUDIBLE] 01:43:38.965 --> 01:43:39.840 DAVID MALAN: Correct. 01:43:39.840 --> 01:43:43.200 Well, I'm declaring i as an int, but by way of the comma, 01:43:43.200 --> 01:43:45.570 I am also declaring n as an int. 01:43:45.570 --> 01:43:49.110 So they've got to be the same type for this trick to work. 01:43:49.110 --> 01:43:50.370 Good observation. 01:43:50.370 --> 01:43:54.470 Other questions on this one here? 01:43:54.470 --> 01:43:54.970 No? 01:43:54.970 --> 01:43:55.540 All right. 01:43:55.540 --> 01:43:58.900 Well, let's play around further here. 01:43:58.900 --> 01:44:01.862 Let me propose that there are other libraries and header files 01:44:01.862 --> 01:44:03.320 as well that you might find useful. 01:44:03.320 --> 01:44:05.800 There's also something called ctype, which relates to character types, 01:44:05.800 --> 01:44:08.500 and it's got a bunch of useful functions 01:44:08.500 --> 01:44:12.200 that we can actually see if we visit the documentation here. 01:44:12.200 --> 01:44:14.200 But before we get there, let me actually whip up 01:44:14.200 --> 01:44:17.680 a program that maybe does something a little bit fun, albeit low level, 01:44:17.680 --> 01:44:21.590 like forcing some string to uppercase if the human types it in lowercase.
01:44:21.590 --> 01:44:25.120 So let me go ahead and write a program called uppercase.c. 01:44:25.120 --> 01:44:27.940 Let me go ahead and give myself the same header files. 01:44:27.940 --> 01:44:31.840 Include cs50.h, include stdio.h. 01:44:31.840 --> 01:44:34.960 And for now, let's include string.h for the length. 01:44:34.960 --> 01:44:38.570 And let's go ahead and have int main void as before. 01:44:38.570 --> 01:44:40.840 And inside of main, let's give myself a string 01:44:40.840 --> 01:44:46.780 s equaling get_string "Before," just so I know what the string is initially. 01:44:46.780 --> 01:44:50.560 Now I'm going to print out proactively "After" with two spaces 01:44:50.560 --> 01:44:53.740 just so that things line up aesthetically on the screen 01:44:53.740 --> 01:44:55.580 because "After" is one character shorter. 01:44:55.580 --> 01:44:57.920 And now I'm going to do the same technique as before. 01:44:57.920 --> 01:45:07.340 for int i equals 0, n equals the string length of s, i is less than n, i++. 01:45:07.340 --> 01:45:10.940 And then inside of this loop, what do I want to do logically? 01:45:10.940 --> 01:45:15.710 I want to force these characters to uppercase if they are, in fact, 01:45:15.710 --> 01:45:16.670 lowercase. 01:45:16.670 --> 01:45:18.083 And so how might I do this? 01:45:18.083 --> 01:45:20.000 Well, there's a bunch of ways to express this, 01:45:20.000 --> 01:45:22.640 but I'm going to do it maybe the most straightforward way 01:45:22.640 --> 01:45:24.260 even if you've not seen this before. 01:45:24.260 --> 01:45:28.760 If the current letter in the string at location i, 01:45:28.760 --> 01:45:31.970 because I'm in a loop starting from 0 all the way up to, but not 01:45:31.970 --> 01:45:34.400 through the string length, is greater than 01:45:34.400 --> 01:45:42.110 or equal to a lowercase a, in single quotes, and that letter is less than 01:45:42.110 --> 01:45:43.970 or equal to a lowercase z. 
01:45:43.970 --> 01:45:45.440 What does this mean in English? 01:45:45.440 --> 01:45:48.740 Well, this essentially means if lowercase-- 01:45:48.740 --> 01:45:52.280 logically, if it's greater than or equal to little a and less than 01:45:52.280 --> 01:45:55.760 or equal to little z, it's somewhere between a and z in lowercase. 01:45:55.760 --> 01:45:57.060 What do I want to do? 01:45:57.060 --> 01:45:58.670 Well, I want to force it to uppercase. 01:45:58.670 --> 01:46:03.260 So I want to print out a character without a new line yet 01:46:03.260 --> 01:46:07.880 that prints out the current character, but force it to uppercase. 01:46:07.880 --> 01:46:09.120 Well, how can I do this? 01:46:09.120 --> 01:46:12.560 Well, this is where this gets into some low-level hacking, 01:46:12.560 --> 01:46:14.480 but notice the same ASCII chart. 01:46:14.480 --> 01:46:17.640 Here's our uppercase letters from last time. 01:46:17.640 --> 01:46:20.900 Here's our lowercase characters, and let me highlight those. 01:46:20.900 --> 01:46:25.370 Does anyone notice a relationship between capital A and lowercase a 01:46:25.370 --> 01:46:29.540 that happens to be the same for capital B and lowercase b? 01:46:29.540 --> 01:46:33.000 AUDIENCE: Capital A [INAUDIBLE]. 01:46:33.000 --> 01:46:33.750 DAVID MALAN: Yeah. 01:46:33.750 --> 01:46:35.170 Like this pattern is true. 01:46:35.170 --> 01:46:40.140 So 97 minus 65 is 32, and that's true for every lowercase and uppercase 01:46:40.140 --> 01:46:41.170 letter respectively. 01:46:41.170 --> 01:46:42.420 So I can leverage that. 01:46:42.420 --> 01:46:43.950 And this is not a CS50 thing. 01:46:43.950 --> 01:46:44.850 Like this is ASCII. 01:46:44.850 --> 01:46:45.990 This is, in turn, Unicode. 01:46:45.990 --> 01:46:47.533 This is how modern computers work. 01:46:47.533 --> 01:46:49.950 So if I go back to VS Code here, you know what I could do? 01:46:49.950 --> 01:46:52.350 Let's just literally subtract 32.
01:46:52.350 --> 01:46:55.440 But because I'm displaying this as a char, not as an int, 01:46:55.440 --> 01:47:01.080 I'm going to see the lowercase letter seemingly become an uppercase instead. 01:47:01.080 --> 01:47:05.310 Else, if it's not lowercase-- maybe it's already uppercase, 01:47:05.310 --> 01:47:09.420 maybe it is punctuation, let's just go ahead and print out with %c 01:47:09.420 --> 01:47:11.462 the original character unaltered. 01:47:11.462 --> 01:47:13.170 And then at the very end of this program, 01:47:13.170 --> 01:47:17.670 let's print a new line just to move the cursor to the next line. 01:47:17.670 --> 01:47:19.950 All right, so let's do make uppercase. 01:47:19.950 --> 01:47:22.500 And let me type ./uppercase. 01:47:22.500 --> 01:47:26.100 And I'll type in D-A-V-I-D, all lowercase, and now, 01:47:26.100 --> 01:47:27.750 you'll see it's in all caps. 01:47:27.750 --> 01:47:31.920 If, though, I type in maybe my last name but capitalized M, that's OK, 01:47:31.920 --> 01:47:34.930 the rest of it will still be capitalized for me. 01:47:34.930 --> 01:47:36.710 Now I don't love this technique. 01:47:36.710 --> 01:47:40.090 It's a little bit fragile because I had to do some math. 01:47:40.090 --> 01:47:43.220 I had to check my reference sheet and then incorporate it into my program. 01:47:43.220 --> 01:47:45.940 Even though it will be correct, I could be a little more clever. 01:47:45.940 --> 01:47:47.607 I could actually do something like this. 01:47:47.607 --> 01:47:49.720 Well, whatever the value of lowercase is-- 01:47:49.720 --> 01:47:53.650 lowercase a is minus whatever the value of capital A is, 01:47:53.650 --> 01:47:56.378 and I could actually do it arithmetically even though that, too, 01:47:56.378 --> 01:47:59.170 is somewhat inefficient in that it's asking the same question again 01:47:59.170 --> 01:48:02.320 and again, but the compiler is probably smart enough to optimize that. 
01:48:02.320 --> 01:48:05.830 And frankly, for those more comfortable, a good compiler 01:48:05.830 --> 01:48:07.930 will also notice, no, no, no, no, you don't 01:48:07.930 --> 01:48:09.910 want to call strlen again and again. 01:48:09.910 --> 01:48:13.330 The compiler can do some of these optimizations for you, 01:48:13.330 --> 01:48:15.610 but it's still good practice to get into yourself. 01:48:15.610 --> 01:48:17.080 But there's probably a better way. 01:48:17.080 --> 01:48:19.630 Instead of rolling this solution ourselves 01:48:19.630 --> 01:48:22.810 and subtracting 32 or doing any arithmetic, 01:48:22.810 --> 01:48:24.730 let's use that ctype library. 01:48:24.730 --> 01:48:27.280 Let me go back up to my header files. 01:48:27.280 --> 01:48:29.890 Let's additionally include ctype.h. 01:48:29.890 --> 01:48:33.100 Let's pretend like I read the documentation in advance, which I did, 01:48:33.100 --> 01:48:33.940 in fact. 01:48:33.940 --> 01:48:37.570 And let's instead of doing any math here, 01:48:37.570 --> 01:48:41.590 let's use a function that exists in that library called toupper 01:48:41.590 --> 01:48:47.740 and pass to it whatever the current character is in s at location i. 01:48:47.740 --> 01:48:50.860 Otherwise, I still print out the unchanged character. 01:48:50.860 --> 01:48:54.880 And let me go ahead and do make uppercase ./uppercase. 01:48:54.880 --> 01:49:00.190 And now without any math, no subtracting 32, that, too, also works. 01:49:00.190 --> 01:49:01.240 But it gets better. 01:49:01.240 --> 01:49:03.430 If you read the documentation for toupper, 01:49:03.430 --> 01:49:07.570 it turns out its documentation tells you, if C is already uppercase, 01:49:07.570 --> 01:49:09.950 it just passes it through for you. 01:49:09.950 --> 01:49:12.550 So you don't even need to ask this conditional question. 
01:49:12.550 --> 01:49:17.710 I can actually cut this to my clipboard, get rid of all of this, 01:49:17.710 --> 01:49:21.430 and just replace that one line only and just 01:49:21.430 --> 01:49:25.600 let toupper handle the situation for me because again, its documentation 01:49:25.600 --> 01:49:28.120 has assured me that if it's already uppercase, 01:49:28.120 --> 01:49:30.890 it's just going to return the original value. 01:49:30.890 --> 01:49:33.670 So if I make uppercase, this time, ./uppercase, 01:49:33.670 --> 01:49:36.640 now it works and now things are getting kind of fun. 01:49:36.640 --> 01:49:38.740 I mean, these are mundane tasks, admittedly, 01:49:38.740 --> 01:49:41.410 but at least I'm standing on the shoulders of smart people 01:49:41.410 --> 01:49:45.040 who came before me who implemented the string library, the ctype library-- 01:49:45.040 --> 01:49:51.760 heck, even the CS50 Library so I don't need to reinvent any of those wheels. 01:49:51.760 --> 01:49:57.750 Questions on any of these library techniques? 01:49:57.750 --> 01:50:00.240 It's all still arrays, it's all still strings and chars, 01:50:00.240 --> 01:50:05.110 but now we're leveraging libraries to solve some of our problems for us. 01:50:05.110 --> 01:50:05.610 All right. 01:50:05.610 --> 01:50:07.890 So let's come full circle to where we began, 01:50:07.890 --> 01:50:10.950 where I mentioned that some programs include 01:50:10.950 --> 01:50:12.630 support for command line arguments. 01:50:12.630 --> 01:50:18.210 Like Clang takes command line arguments, words after the word clang. 01:50:18.210 --> 01:50:21.270 cd, which you've used in Linux, takes command line arguments. 01:50:21.270 --> 01:50:24.510 If you type cd, space, pset1 or cd, space, 01:50:24.510 --> 01:50:28.200 mario in order to change directories into another folder.
01:50:28.200 --> 01:50:31.140 If you do rm like I did earlier, you can remove a file 01:50:31.140 --> 01:50:33.510 by using a command line argument, a second word that 01:50:33.510 --> 01:50:35.730 tells the computer what to remove. 01:50:35.730 --> 01:50:38.520 Well, it turns out that you, too, can write 01:50:38.520 --> 01:50:43.230 code that takes words at the command prompt and uses them as input. 01:50:43.230 --> 01:50:47.040 Up until now, you and I have only gotten user input via get_string, get_int, 01:50:47.040 --> 01:50:48.810 get_float, and functions like that. 01:50:48.810 --> 01:50:52.230 You, too, can write code that takes command line arguments which, 01:50:52.230 --> 01:50:54.240 frankly, just save the human time. 01:50:54.240 --> 01:50:57.790 They can type their entire thought at the command line, hit Enter, and boom, 01:50:57.790 --> 01:51:01.240 the program can complete without prompting them and re-prompting them 01:51:01.240 --> 01:51:02.020 again. 01:51:02.020 --> 01:51:05.680 So here's where we can now start to take off some more training wheels. 01:51:05.680 --> 01:51:10.000 Up until now, we've just put void inside of the parentheses here any time 01:51:10.000 --> 01:51:11.620 we implement main. 01:51:11.620 --> 01:51:15.130 It turns out that you can put something else in parentheses 01:51:15.130 --> 01:51:18.820 when using C. It's a mouthful, but you can replace void 01:51:18.820 --> 01:51:23.800 with this bigger expression. 01:51:23.800 --> 01:51:25.240 But it's two things. 01:51:25.240 --> 01:51:28.960 An int, called argc by convention, and a string, 01:51:28.960 --> 01:51:32.920 but not a string, actually an array of strings called argv. 01:51:32.920 --> 01:51:35.320 And these terms are a little arcane, but argc means 01:51:35.320 --> 01:51:38.770 argument count-- how many words did the human type at the prompt?
01:51:38.770 --> 01:51:41.410 Argv stands for argument vector, which is generally 01:51:41.410 --> 01:51:42.762 another term for an array-- 01:51:42.762 --> 01:51:44.470 you've heard it perhaps from mathematics. 01:51:44.470 --> 01:51:48.440 It's like a list of values, or in this case, a list of command line arguments. 01:51:48.440 --> 01:51:49.790 So C is special. 01:51:49.790 --> 01:51:54.370 If you declare main as not taking void inside of parentheses, but rather, 01:51:54.370 --> 01:51:58.270 an int and an array of strings, C will figure out 01:51:58.270 --> 01:52:00.880 whatever the human typed at the prompt and hand it to you 01:52:00.880 --> 01:52:03.620 as an array and the length thereof. 01:52:03.620 --> 01:52:05.830 So if I want to leverage this, I can start 01:52:05.830 --> 01:52:10.940 to implement some programs of my own that actually incorporate command line 01:52:10.940 --> 01:52:11.440 arguments. 01:52:11.440 --> 01:52:14.980 For instance, let me go back in a moment here to VS Code. 01:52:14.980 --> 01:52:19.090 Let me create a program, for instance, called greet.c 01:52:19.090 --> 01:52:21.590 that's just going to greet the user in a few different ways. 01:52:21.590 --> 01:52:24.580 So let me first do it the old way. cs50.h. 01:52:24.580 --> 01:52:27.430 Let me include stdio.h. 01:52:27.430 --> 01:52:29.740 Let me do int main void still. 01:52:29.740 --> 01:52:30.950 So the old way. 01:52:30.950 --> 01:52:34.420 And if I want to greet myself or Carter or Yulie or anyone else, 01:52:34.420 --> 01:52:39.850 I could do, old fashioned now, get the answer from the user, get_string. 01:52:39.850 --> 01:52:42.670 Let's prompt for "What's your name?" question mark, 01:52:42.670 --> 01:52:44.200 just like we did in Scratch. 01:52:44.200 --> 01:52:49.940 And then do printf, "Hello," comma, %s backslash n, answer. 01:52:49.940 --> 01:52:53.320 So we've done this many times now this week and last. 
01:52:53.320 --> 01:52:56.290 This is the old school way now of getting command line-- 01:52:56.290 --> 01:52:59.360 of getting user input by prompting them for it. 01:52:59.360 --> 01:53:04.570 So if I do make greet ./greet, there's no command line arguments at the prompt, 01:53:04.570 --> 01:53:06.610 I'm literally just running the program's name. 01:53:06.610 --> 01:53:10.690 If I hit Enter, though, now get_string kicks in, asks me for my name, 01:53:10.690 --> 01:53:12.370 and the program then greets me. 01:53:12.370 --> 01:53:13.510 But I can do-- 01:53:13.510 --> 01:53:17.530 otherwise, I could do something like this instead. 01:53:17.530 --> 01:53:20.290 First, answer's a little generic, so let's first change 01:53:20.290 --> 01:53:23.980 this back to name and back to name, but that's a minor improvement there 01:53:23.980 --> 01:53:25.480 just stylistically. 01:53:25.480 --> 01:53:28.760 Let's, though, introduce now a command line argument 01:53:28.760 --> 01:53:31.750 so that I can just greet myself by running the program, hitting Enter, 01:53:31.750 --> 01:53:33.820 and being done, no more get_string. 01:53:33.820 --> 01:53:39.520 So I'm going to go ahead and change void to int argc, string 01:53:39.520 --> 01:53:42.070 argv with square brackets. 01:53:42.070 --> 01:53:45.520 string means-- the square brackets mean it's an array; 01:53:45.520 --> 01:53:49.010 string means it's an array of strings; and argc, again, 01:53:49.010 --> 01:53:51.898 is just an integer of the number of words typed. 01:53:51.898 --> 01:53:54.190 Now I'm somewhat dangerously going to do this. 01:53:54.190 --> 01:53:56.770 I'm going to get rid of my use of get_string altogether, 01:53:56.770 --> 01:54:01.060 and I'm going to change this line to be not name, which no longer exists, 01:54:01.060 --> 01:54:03.820 but I'm going to go into this array called argv 01:54:03.820 --> 01:54:08.050 and I'm going to go into location 1.
01:54:08.050 --> 01:54:10.180 So I'm doing this on faith. 01:54:10.180 --> 01:54:15.070 I haven't explained what I'm doing yet, but I'm going to do make greet ./greet, 01:54:15.070 --> 01:54:19.310 and now I'm going to type my name at the command line just like with rm, 01:54:19.310 --> 01:54:20.740 with clang, with cd. 01:54:20.740 --> 01:54:23.440 With any of the commands you've written with multiple words, 01:54:23.440 --> 01:54:25.090 I'm going to greet literally David. 01:54:25.090 --> 01:54:29.110 So I hit Enter, and voila, I've somehow gotten access 01:54:29.110 --> 01:54:34.930 to what I typed at the prompt by accessing this special parameter called 01:54:34.930 --> 01:54:35.590 argv. 01:54:35.590 --> 01:54:38.507 Technically you could call it anything you want, but the convention is 01:54:38.507 --> 01:54:41.020 argv and argc from right to left here. 01:54:41.020 --> 01:54:42.280 Just a guess, then. 01:54:42.280 --> 01:54:47.230 What if I change this to print out bracket 0 and recompile the code? 01:54:47.230 --> 01:54:49.570 And I run ./greet David? 01:54:49.570 --> 01:54:51.790 What might it say instinctively? 01:54:54.490 --> 01:54:56.710 Any hunches? 01:54:56.710 --> 01:54:57.250 Yeah. 01:54:57.250 --> 01:54:59.860 So it's going to say hello, ./greet. 01:54:59.860 --> 01:55:01.880 So it turns out, you get one for free. 01:55:01.880 --> 01:55:04.450 Whatever the name of your program is always 01:55:04.450 --> 01:55:07.420 accessible in argv at location 0. 01:55:07.420 --> 01:55:08.380 That's just because. 01:55:08.380 --> 01:55:09.340 It's a handy feature. 01:55:09.340 --> 01:55:12.548 In case there's an error or you need to tell the user how to use the program, 01:55:12.548 --> 01:55:15.970 you know what the command is that they ran, but at location 1, 01:55:15.970 --> 01:55:18.610 maybe 2, maybe 3 are the additional words 01:55:18.610 --> 01:55:20.590 that the human might have typed in. 
01:55:20.590 --> 01:55:23.140 Well, let's do something a little smarter than this. 01:55:23.140 --> 01:55:25.420 Let me go back to version 1. 01:55:25.420 --> 01:55:27.610 Let me recompile it, make greet. 01:55:27.610 --> 01:55:31.930 Let me rerun ./greet David, and this seems to work fine. 01:55:31.930 --> 01:55:35.080 What if I get a little curious and print out location 2? 01:55:35.080 --> 01:55:41.530 Let me recompile the code, make greet ./greet David, Enter, OK, there's null. 01:55:41.530 --> 01:55:45.580 And I mentioned we'd see N-U-L-L, and here's one incarnation thereof, 01:55:45.580 --> 01:55:47.270 but this is clearly wrong. 01:55:47.270 --> 01:55:49.990 So I probably don't want to even let the user do this because I 01:55:49.990 --> 01:55:51.490 don't want them to see bogus output. 01:55:51.490 --> 01:55:53.680 Like this is arguably a bug in the code 01:55:53.680 --> 01:55:58.420 that it even bothered to show this by default. So what could I do instead? 01:55:58.420 --> 01:55:59.420 Well, what if I do this? 01:55:59.420 --> 01:56:07.490 If argc equals equals 2, then go ahead and comfortably 01:56:07.490 --> 01:56:11.120 say printf "hello," argv, bracket, 1. 01:56:11.120 --> 01:56:15.620 Else, if the human did not give exactly two arguments at the prompt, 01:56:15.620 --> 01:56:18.590 let's just print out some default value like "hello, world" 01:56:18.590 --> 01:56:20.040 like from last week. 01:56:20.040 --> 01:56:23.540 In other words, now I'm doing this error checking with a conditional, 01:56:23.540 --> 01:56:25.790 making sure with this Boolean expression that only 01:56:25.790 --> 01:56:29.990 if argc equals equals 2, and therefore there are two words in argv, 01:56:29.990 --> 01:56:31.410 do you want to proceed. 01:56:31.410 --> 01:56:35.700 And so now if I do make greet again, ./greet David, this now works. 01:56:35.700 --> 01:56:40.460 But if I don't cooperate and I just run greet, what should it say? 01:56:40.460 --> 01:56:41.690 Just hello, world.
01:56:41.690 --> 01:56:46.280 If I run David Malan as two words, what should it say? 01:56:46.280 --> 01:56:49.880 hello, world, because that's not exactly equal to 2. 01:56:49.880 --> 01:56:52.910 Again, the first word in argv is always the program's name. 01:56:52.910 --> 01:56:56.480 The second word is whatever the human, then, has typed. 01:56:56.480 --> 01:56:59.750 Now if we don't even know in advance how many words there are going to be, 01:56:59.750 --> 01:57:01.190 we can combine today's ideas. 01:57:01.190 --> 01:57:04.190 This is going to look a little weird, but it's the same thing as before. 01:57:04.190 --> 01:57:09.920 for int i gets 0, i is less than-- 01:57:09.920 --> 01:57:13.010 how about argc, i++? 01:57:13.010 --> 01:57:19.430 And then inside of this loop, I can print out %s, maybe backslash n, comma, 01:57:19.430 --> 01:57:23.660 and then print out argv, bracket, i. 01:57:23.660 --> 01:57:27.840 So I can have a loop that iterates argc number of times, 01:57:27.840 --> 01:57:29.660 once for every word at the prompt. 01:57:29.660 --> 01:57:34.700 I can print out argv, bracket, i, which is the i-th word in that array 01:57:34.700 --> 01:57:35.730 from left to right. 01:57:35.730 --> 01:57:40.700 And so if I now run make greet and I do ./greet alone, 01:57:40.700 --> 01:57:42.080 I just see the program's name. 01:57:42.080 --> 01:57:47.010 If I do ./greet David, I see those two, one after the other. 01:57:47.010 --> 01:57:50.350 If I do David Malan, I get those three words. 01:57:50.350 --> 01:57:52.540 If I keep going, I'll get more and more words. 01:57:52.540 --> 01:57:56.040 So using just the length of the array and the name of the array, 01:57:56.040 --> 01:57:58.493 I can actually do quite a bit there.
01:57:58.493 --> 01:58:00.910 Now there's actually some fun things you can do with this, 01:58:00.910 --> 01:58:02.340 and this is sort of beside the point, but there's 01:58:02.340 --> 01:58:04.298 this thing in the world called ASCII art, which 01:58:04.298 --> 01:58:07.290 is making pictures and beautiful things just using ASCII or maybe 01:58:07.290 --> 01:58:09.990 nowadays Unicode characters, but without using emoji. 01:58:09.990 --> 01:58:12.300 Like emoji kind of make this a little too easy. 01:58:12.300 --> 01:58:15.480 But if all you have are traditional largely English letters 01:58:15.480 --> 01:58:18.540 and punctuation, you can actually do some interesting things. 01:58:18.540 --> 01:58:21.910 On Linux systems-- for instance, if I go back to VS Code here, 01:58:21.910 --> 01:58:25.835 let me increase the size of my terminal window here. 01:58:25.835 --> 01:58:27.960 And it turns out that we've pre-installed-- really, 01:58:27.960 --> 01:58:32.010 for no compelling reason, but just for fun, a program called cowsay, 01:58:32.010 --> 01:58:34.000 which has a cow say something. 01:58:34.000 --> 01:58:37.920 So if I want to have a cow say "moo" in ASCII art, I can do this, 01:58:37.920 --> 01:58:41.310 and you get an adorable cow saying something like "moo" on the screen. 01:58:41.310 --> 01:58:43.680 But moo is a command line argument that is clearly 01:58:43.680 --> 01:58:46.590 modifying the output of this program because I could also 01:58:46.590 --> 01:58:49.350 change it to say hello, comma, world, and now the cow 01:58:49.350 --> 01:58:50.980 is going to say that instead. 01:58:50.980 --> 01:58:53.460 So it takes multiple command line arguments, if you will. 
01:58:53.460 --> 01:58:58.350 But it also takes what are called flags or switches whereby any command line 01:58:58.350 --> 01:59:01.740 argument that starts with a dash is usually like a special configuration 01:59:01.740 --> 01:59:04.860 option that you would only know exists by reading the documentation 01:59:04.860 --> 01:59:06.300 or seeing a demonstration. 01:59:06.300 --> 01:59:12.780 And if I have my syntax right, if I do cowsay -f, and maybe I'll do-- 01:59:12.780 --> 01:59:13.620 let's see. 01:59:13.620 --> 01:59:18.660 Instead of this cow say, how about I'll do -f for file, 01:59:18.660 --> 01:59:20.460 and I'm going to change it into duck mode. 01:59:20.460 --> 01:59:23.730 And I'm going to have this version of the ASCII art say quack. 01:59:23.730 --> 01:59:26.255 So it's a tiny little duck there, but it's saying quack. 01:59:26.255 --> 01:59:28.380 And you can kind of waste a lot of time doing this. 01:59:28.380 --> 01:59:33.690 I can do cowsay -f dragon and say something like, RAWR, 01:59:33.690 --> 01:59:36.420 and this is just amazing. 01:59:36.420 --> 01:59:38.440 Again, not really academically compelling, 01:59:38.440 --> 01:59:41.880 but it does demonstrate, again, command line arguments, which are everywhere, 01:59:41.880 --> 01:59:44.220 and you've indeed been using them already. 01:59:44.220 --> 01:59:46.830 But there's one other feature we wanted to introduce you 01:59:46.830 --> 01:59:50.610 to today, which will be a useful building block, which will also 01:59:50.610 --> 01:59:54.090 reveal one other thing about the code that we've been writing. 01:59:54.090 --> 01:59:58.110 It turns out that all of the programs we've been writing thus far, eventually 01:59:58.110 --> 02:00:00.210 obviously exit because you see your prompt again 02:00:00.210 --> 02:00:02.680 unless you have an infinite loop such that it never ends. 02:00:02.680 --> 02:00:03.870 But eventually they exit. 
02:00:03.870 --> 02:00:07.560 And secretly, every program we've written thus far actually 02:00:07.560 --> 02:00:09.240 has what's called an exit status. 02:00:09.240 --> 02:00:11.730 It's like a special return value from the program 02:00:11.730 --> 02:00:14.310 itself that by default is always 0. 02:00:14.310 --> 02:00:17.590 0 as a number in the world generally means everything's OK. 02:00:17.590 --> 02:00:21.240 The flip side of that is because the world tends to use integers 02:00:21.240 --> 02:00:23.460 and you've got four billion possibilities, 02:00:23.460 --> 02:00:27.000 like every other number in the world when it comes to our program's exit 02:00:27.000 --> 02:00:29.070 status is bad. 02:00:29.070 --> 02:00:30.750 If it's 1, it's probably bad. 02:00:30.750 --> 02:00:32.095 If it's negative 1, it's bad. 02:00:32.095 --> 02:00:34.470 And in fact, you've probably seen this in the real world. 02:00:34.470 --> 02:00:37.580 If you've ever had like a random error message on the screen-- 02:00:37.580 --> 02:00:39.330 here's a screenshot of Zoom, for instance. 02:00:39.330 --> 02:00:43.920 And that screenshot, somewhat confusingly or unknowingly, 02:00:43.920 --> 02:00:47.730 has an error code like 1132, that probably 02:00:47.730 --> 02:00:52.500 means that the Zoom software that some other humans wrote incorrectly somehow 02:00:52.500 --> 02:00:58.410 had an error and it did not exit with status 0, it exited with status 1132. 02:00:58.410 --> 02:01:00.480 And somewhere at Zoom, there's probably a file 02:01:00.480 --> 02:01:04.283 or a book that tells the programmers what this error code actually means. 02:01:04.283 --> 02:01:05.700 This is not useful for you and me. 02:01:05.700 --> 02:01:08.158 There's some programmer at Zoom who would probably be like, 02:01:08.158 --> 02:01:10.950 oh, I know what I did or my colleague did wrong in this case. 
02:01:10.950 --> 02:01:13.950 You've seen this elsewhere even though this is not quite the same thing, 02:01:13.950 --> 02:01:15.658 but we'll talk about this in a few weeks. 02:01:15.658 --> 02:01:19.380 If you've ever seen 404, like numbers are everywhere, and on the web, 02:01:19.380 --> 02:01:23.070 404 means like file not found. 02:01:23.070 --> 02:01:26.830 It means you made a typo, the web server deleted a file, or something like that, 02:01:26.830 --> 02:01:30.850 but this is just to say numbers are so often used to signify or represent 02:01:30.850 --> 02:01:31.350 errors. 02:01:31.350 --> 02:01:33.600 Even though that's not an exit status, per se, 02:01:33.600 --> 02:01:36.750 that's an HTTP status code, which we'll soon see. 02:01:36.750 --> 02:01:40.590 But you have access to exit statuses as it relates 02:01:40.590 --> 02:01:42.630 to command line software already. 02:01:42.630 --> 02:01:46.250 Up until now, this is how we've been writing main, now 02:01:46.250 --> 02:01:48.740 with command line arguments, but we've also 02:01:48.740 --> 02:01:51.770 been writing main with an int return value. 02:01:51.770 --> 02:01:54.620 And you've never used this-- we didn't talk about this last week. 02:01:54.620 --> 02:01:57.740 I just ask that you trust me and just keep copying and pasting this. 02:01:57.740 --> 02:02:00.590 But that int means that even your programs 02:02:00.590 --> 02:02:05.660 can return values which can be useful even if you don't use command line 02:02:05.660 --> 02:02:08.870 arguments and we just go back to the original version like void. 02:02:08.870 --> 02:02:15.320 So for instance, if I go ahead and open up, for instance, VS Code again, 02:02:15.320 --> 02:02:16.670 I'll get rid of the dragon. 02:02:16.670 --> 02:02:19.460 And let's do one other program here called status just 02:02:19.460 --> 02:02:23.450 to play around with the idea of these so-called exit statuses. 
02:02:23.450 --> 02:02:28.370 Let me just demonstrate the idea with an include cs50.h, include 02:02:28.370 --> 02:02:36.440 stdio.h, int main, and here I'll do int argc, string argv. 02:02:36.440 --> 02:02:39.080 And then inside of main, let's do a similar program 02:02:39.080 --> 02:02:40.430 to before like the hello, world. 02:02:40.430 --> 02:02:44.540 So printf "hello," comma, %s backslash n. 02:02:44.540 --> 02:02:47.010 Then let's print out argv 1. 02:02:47.010 --> 02:02:52.300 But I only want to execute that line if the human gave me a command line 02:02:52.300 --> 02:02:52.800 argument. 02:02:52.800 --> 02:02:55.550 Otherwise I don't want to even say some default like hello, world. 02:02:55.550 --> 02:03:00.250 I just want to abort early and just exit the program, no output whatsoever. 02:03:00.250 --> 02:03:01.350 So I could do this. 02:03:01.350 --> 02:03:05.523 If argc does not equal 2-- 02:03:05.523 --> 02:03:08.190 and it's a single equals, but it's a bang, an exclamation point, 02:03:08.190 --> 02:03:09.370 means not equal. 02:03:09.370 --> 02:03:11.580 So this is the opposite of equals equals. 02:03:11.580 --> 02:03:14.730 Then previously I would have just printed hello, world, 02:03:14.730 --> 02:03:16.830 but now I want to print out an error message 02:03:16.830 --> 02:03:21.210 like, "Missing command-line argument" just to explain to the user 02:03:21.210 --> 02:03:26.520 why the program is about to terminate, and then I can return 1. 02:03:26.520 --> 02:03:27.750 It's kind of arbitrary. 02:03:27.750 --> 02:03:30.700 I could also return 1132, but why start there? 02:03:30.700 --> 02:03:34.180 This is the only possible error that could go wrong in my program. 02:03:34.180 --> 02:03:35.490 So I'm going to start at 1. 
02:03:35.490 --> 02:03:39.150 Zoom clearly has 1,000-plus possible things that can go wrong 02:03:39.150 --> 02:03:42.660 in their source code, which is why the number got as big as 1132, 02:03:42.660 --> 02:03:45.990 but I'm just going to arbitrarily, but conventionally return 1. 02:03:45.990 --> 02:03:52.110 But if everything is OK and I do-- it is not the case that argc does not equal 2 02:03:52.110 --> 02:03:57.360 and I actually get to line 11, I'm going to return 0 because 0, again, I claim, 02:03:57.360 --> 02:03:59.190 signifies success. 02:03:59.190 --> 02:04:03.120 And all of this time, every program we've written-- you've written 02:04:03.120 --> 02:04:07.558 has secretly exited with 0 by default. But now 02:04:07.558 --> 02:04:09.600 that our programs are getting more sophisticated, 02:04:09.600 --> 02:04:11.700 when something goes wrong, it turns out it's 02:04:11.700 --> 02:04:15.085 useful to have the power to just return some other value even 02:04:15.085 --> 02:04:16.710 though the user is not going to see it. 02:04:16.710 --> 02:04:19.620 Even though the Zoom user shouldn't see it, it's still there. 02:04:19.620 --> 02:04:22.380 It's diagnostically useful to you, or in the case of a class, 02:04:22.380 --> 02:04:24.660 to your TF or TA or CA. 02:04:24.660 --> 02:04:30.930 So if I do make status now to compile this program and run ./status and type 02:04:30.930 --> 02:04:33.340 my first name I think this is a success. 02:04:33.340 --> 02:04:37.290 It should say hello, David and secretly exit with 0. 02:04:37.290 --> 02:04:41.820 If you really want to see the 0, there's this arcane command you can type. 02:04:41.820 --> 02:04:45.780 You can literally type at your prompt echo $?. 02:04:45.780 --> 02:04:48.810 It's weird symbology, but it's what the humans chose decades ago. 02:04:48.810 --> 02:04:53.460 This will just show you what did the most recently-run program secretly exit 02:04:53.460 --> 02:04:54.010 with. 
02:04:54.010 --> 02:04:58.560 So if I do this in VS Code, I can type echo $?, Enter, 02:04:58.560 --> 02:04:59.982 and there's that secret 0. 02:04:59.982 --> 02:05:02.190 I could have been doing this this week and last week; it's 02:05:02.190 --> 02:05:03.330 just not that interesting. 02:05:03.330 --> 02:05:08.340 But it is interesting, or at least marginally so, if I rerun status 02:05:08.340 --> 02:05:12.060 and maybe I don't provide a command-line argument or I provide too many, 02:05:12.060 --> 02:05:14.340 so argc does not equal 2. 02:05:14.340 --> 02:05:17.520 When I hit Enter, I get yelled at with the error message, 02:05:17.520 --> 02:05:21.300 but I can see the secret status code, which is, indeed, 1. 02:05:21.300 --> 02:05:24.340 And so now, if you're ever in the habit, in either a class like this 02:05:24.340 --> 02:05:27.090 or in the real world, of automatically testing your code, 02:05:27.090 --> 02:05:29.340 be it with check50 or, in the real world, things called 02:05:29.340 --> 02:05:31.590 unit tests and other third-party software, 02:05:31.590 --> 02:05:36.150 those tests can actually detect these exit statuses 02:05:36.150 --> 02:05:39.943 and know whether your code succeeded or failed, 0 or 1. 02:05:39.943 --> 02:05:42.360 And if there are different types of failures, they can detect those too-- 02:05:42.360 --> 02:05:48.630 status 2, status 3, status 1132. It's just one other tool in your toolkit. 02:05:48.630 --> 02:05:51.240 But all of that is terribly low level, and really, 02:05:51.240 --> 02:05:54.900 the goal of this week-- and really, today, and really, code more generally-- 02:05:54.900 --> 02:05:55.990 is to solve problems. 02:05:55.990 --> 02:05:58.380 So let's consider an increasingly important one, which 02:05:58.380 --> 02:06:01.650 is the ability to send information securely, 02:06:01.650 --> 02:06:04.980 whether it is in file format, wirelessly, or any other way.
02:06:04.980 --> 02:06:08.640 Cryptography is the art and the science of encrypting, 02:06:08.640 --> 02:06:09.930 scrambling information, 02:06:09.930 --> 02:06:12.510 so that even if I write a secret message to you 02:06:12.510 --> 02:06:16.350 and I send it through this open audience with so many nosy eyes 02:06:16.350 --> 02:06:19.890 who could look at the message, if I've encrypted this message, none of them 02:06:19.890 --> 02:06:22.800 should be able to read it, only you, whoever you are, 02:06:22.800 --> 02:06:24.900 to whom I intended that message. 02:06:24.900 --> 02:06:27.030 In the world of cryptography, then, encryption 02:06:27.030 --> 02:06:30.210 means scrambling the information so that only you and the recipient 02:06:30.210 --> 02:06:31.060 can read it. 02:06:31.060 --> 02:06:34.380 So if we consider our black box like in weeks 0 and 1, 02:06:34.380 --> 02:06:36.030 here is the problem to be solved. 02:06:36.030 --> 02:06:38.910 And let me propose a couple of pieces of vocabulary. 02:06:38.910 --> 02:06:42.420 Plaintext is any message written in English or any human language 02:06:42.420 --> 02:06:45.090 that you write yourself and want to send. 02:06:45.090 --> 02:06:47.150 Ciphertext is what you want to convert it 02:06:47.150 --> 02:06:49.850 to before you hand it off to a bunch of random strangers 02:06:49.850 --> 02:06:52.220 in the audience or a bunch of servers on the internet, 02:06:52.220 --> 02:06:54.432 any one of whom could look at your message. 02:06:54.432 --> 02:06:56.390 So in the black box is what we're going to call 02:06:56.390 --> 02:07:02.000 a cipher, an algorithm for encrypting or scrambling information 02:07:02.000 --> 02:07:03.268 in a reversible way. 02:07:03.268 --> 02:07:05.810 It doesn't suffice to just scramble the information randomly; 02:07:05.810 --> 02:07:07.980 otherwise, the recipient can't do anything with it.
02:07:07.980 --> 02:07:11.660 It's an algorithm, a cipher, that encrypts it in such a way 02:07:11.660 --> 02:07:13.280 that someone else can decrypt it. 02:07:13.280 --> 02:07:14.750 And here's a common way. 02:07:14.750 --> 02:07:20.540 Most ciphers take as input not only the plaintext message in English 02:07:20.540 --> 02:07:22.700 or whatever else, but also a key. 02:07:22.700 --> 02:07:25.400 And it's metaphorically like a key to open a lock, 02:07:25.400 --> 02:07:29.300 but technically it's generally a number, like a really big number made up 02:07:29.300 --> 02:07:30.170 of lots of bits. 02:07:30.170 --> 02:07:35.330 And not just 32, not just 64, but sometimes 1,024 bits, which is a crazy, 02:07:35.330 --> 02:07:37.610 unpronounceably large number, but the probability 02:07:37.610 --> 02:07:40.880 that someone is going to guess your key is just so, so small 02:07:40.880 --> 02:07:43.850 that for all intents and purposes, you are, in fact, secure. 02:07:43.850 --> 02:07:46.020 So what's an example of this, for instance? 02:07:46.020 --> 02:07:50.165 Suppose the secret message I want to send is innocuously just "HI!" 02:07:50.165 --> 02:07:52.790 Well, it'd be pretty stupid to write "HI!" on a piece of paper, 02:07:52.790 --> 02:07:54.707 hand it to someone in the audience, and expect 02:07:54.707 --> 02:07:57.770 it to get all the way to the back without someone glancing at it 02:07:57.770 --> 02:08:00.510 and obviously seeing and reading the plaintext. 02:08:00.510 --> 02:08:03.650 So what if I, though, agree with someone in back, for instance, 02:08:03.650 --> 02:08:05.570 that our secret is going to be 1? 02:08:05.570 --> 02:08:07.790 We have to agree upon that secret in advance, 02:08:07.790 --> 02:08:10.160 but 1 just means that is my key.
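The claim that a 1,024-bit key is practically unguessable can be made concrete with a little arithmetic. This sketch assumes the key is an n-bit number chosen uniformly at random and that the attacker simply guesses; for contrast, it also shows the one-in-25 odds of guessing a shift cipher's key (the 25 keys from 1 to 25 that actually change the message).

```latex
\Pr[\text{one guess is right}] = \frac{1}{2^{n}}, \qquad
2^{1024} \approx 1.8 \times 10^{308}
\;\Rightarrow\;
\Pr[\text{one guess is right}] \approx 5.6 \times 10^{-309}

\text{versus a shift cipher:}\quad \Pr[\text{one guess is right}] = \frac{1}{25} = 0.04
```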
02:08:10.160 --> 02:08:13.340 And let me propose that according to one popular cipher, 02:08:13.340 --> 02:08:19.730 if I want to send "HI!", I change the H to an I and the I to a J-- that is, 02:08:19.730 --> 02:08:22.740 effectively increment every letter of the alphabet by one, 02:08:22.740 --> 02:08:25.830 and if you get to a Z, wrap back around to A. 02:08:25.830 --> 02:08:28.790 So shift the alphabet by one place in this case 02:08:28.790 --> 02:08:31.200 and send this message instead. 02:08:31.200 --> 02:08:32.510 So is that secure? 02:08:32.510 --> 02:08:35.240 Well, if one of you nosily looks at this sheet of paper, 02:08:35.240 --> 02:08:36.440 you won't see "HI!" 02:08:36.440 --> 02:08:39.240 You will, though, see some information leak from this algorithm. 02:08:39.240 --> 02:08:42.500 You'll see an exclamation point, so I'm enthusiastically saying something, 02:08:42.500 --> 02:08:46.710 but you won't know what the message is unless you decrypt it. 02:08:46.710 --> 02:08:50.720 Now that said, is this very secure, really, in practice? 02:08:50.720 --> 02:08:51.950 I mean, not really. 02:08:51.950 --> 02:08:55.520 If you know I'm just using a key and I'm using the English alphabet, 02:08:55.520 --> 02:08:58.220 you could probably brute force your way to a solution 02:08:58.220 --> 02:09:01.520 by just trying 1, trying 2, trying 3, all the way up to trying 25, 02:09:01.520 --> 02:09:03.740 going through all the possibilities tediously, 02:09:03.740 --> 02:09:05.660 and eventually the message is probably going to pop out. 02:09:05.660 --> 02:09:08.090 This is actually known as the Caesar cipher. 02:09:08.090 --> 02:09:12.080 And back in the day, before anyone else knew about or had invented encryption, 02:09:12.080 --> 02:09:15.260 Caesar, Julius Caesar, was known to use a cipher like this 02:09:15.260 --> 02:09:17.360 using a key of three, literally.
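The shift-and-wrap rule described here can be sketched in C. This version assumes ASCII letters, shifts uppercase and lowercase separately, and copies everything else (spaces, digits, the "!") through unchanged, which is exactly the information leak mentioned above. The name `caesar_encrypt` is hypothetical, and it modifies the string in place.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper: encrypts text in place with a Caesar cipher.
// Each letter moves key places forward, wrapping from Z back to A;
// non-letters such as spaces, digits, and punctuation are left as-is.
void caesar_encrypt(char *text, int key)
{
    // Shifting by 26 is a full lap of the alphabet, so only key mod 26 matters;
    // normalize negative keys into 0..25 as well
    key %= 26;
    if (key < 0)
    {
        key += 26;
    }

    for (size_t i = 0, n = strlen(text); i < n; i++)
    {
        unsigned char c = (unsigned char) text[i];
        if (isupper(c))
        {
            // Map 'A'..'Z' to 0..25, shift with wrap-around, then map back
            text[i] = (char) ('A' + (c - 'A' + key) % 26);
        }
        else if (islower(c))
        {
            text[i] = (char) ('a' + (c - 'a' + key) % 26);
        }
    }
}
```

With a key of 1, "HI!" becomes "IJ!"; with Julius Caesar's key of 3, it becomes "KL!". The exclamation point survives either way.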
02:09:17.360 --> 02:09:20.780 And I guess it works OK if you're literally the first human in the world, 02:09:20.780 --> 02:09:25.370 by lore, to have thought of this idea, but of course, anyone who intercepts it 02:09:25.370 --> 02:09:29.330 could attack it nonetheless and figure things out a bit mathematically. 02:09:29.330 --> 02:09:31.140 13 is more common. 02:09:31.140 --> 02:09:35.180 This is called ROT13 on the internet, for "rotate the letters of the alphabet 13 places." 02:09:35.180 --> 02:09:38.240 That changes "HI!" to "UV!" 02:09:38.240 --> 02:09:39.937 You might think, what's better than 13? 02:09:39.937 --> 02:09:41.270 Well, let's double the security. 02:09:41.270 --> 02:09:42.590 ROT26. 02:09:42.590 --> 02:09:45.140 Why is this stupid? 02:09:45.140 --> 02:09:48.140 I mean, there's like 26 letters in the alphabet, so like A becomes A. So 02:09:48.140 --> 02:09:49.730 that doesn't really help-- oh, wait. 02:09:49.730 --> 02:09:53.090 Oh, I'm pointing at something that's not on the screen, dammit. 02:09:53.090 --> 02:09:58.190 Suppose the message is, more lovingly, "I LOVE YOU," instead of just "HI!" 02:09:58.190 --> 02:10:01.490 Same exact approach, whether or not there's punctuation: "I LOVE YOU," 02:10:01.490 --> 02:10:03.980 with a key of 13 might now become this. 02:10:03.980 --> 02:10:07.130 And now it's getting a little less obvious what the ciphertext actually 02:10:07.130 --> 02:10:07.970 represents. 02:10:07.970 --> 02:10:10.550 And now, what's twice as secure as 13? 02:10:10.550 --> 02:10:15.260 Well, 26 is surely better, but of course, if you rotate 26 places, 02:10:15.260 --> 02:10:17.460 that just gives you the same thing. 02:10:17.460 --> 02:10:19.460 So there's a limit to this, but again, that just 02:10:19.460 --> 02:10:22.770 speaks to the cipher being used, which is very simple. 02:10:22.770 --> 02:10:26.417 There are much better, more sophisticated mathematical ciphers 02:10:26.417 --> 02:10:27.000 in use.
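ROT13 is the Caesar cipher with a key of 13, and because 13 is exactly half of 26, applying it twice brings every letter back where it started, so one function both encrypts and decrypts. A sketch in C, assuming ASCII letters; `rot13` is a hypothetical name, and non-letters pass through unchanged.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper: applies ROT13 to text in place.
// Since 13 + 13 == 26, a full lap of the alphabet, calling rot13 twice
// restores the original; by the same logic, ROT26 would change nothing at all.
void rot13(char *text)
{
    for (size_t i = 0, n = strlen(text); i < n; i++)
    {
        unsigned char c = (unsigned char) text[i];
        if (isupper(c))
        {
            text[i] = (char) ('A' + (c - 'A' + 13) % 26);
        }
        else if (islower(c))
        {
            text[i] = (char) ('a' + (c - 'a' + 13) % 26);
        }
    }
}
```

Here "HI!" becomes "UV!", matching the lecture, and "I LOVE YOU" becomes "V YBIR LBH"; calling `rot13` a second time turns it back into "I LOVE YOU".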
02:10:27.000 --> 02:10:29.660 We're just starting with something simple here. 02:10:29.660 --> 02:10:34.910 As for decryption, if I'm using a key of 1, how do I reverse the process? 02:10:34.910 --> 02:10:36.290 Yeah, so I just subtract 1. 02:10:36.290 --> 02:10:41.510 So B becomes A, C becomes B, A becomes Z. And if it's 13, 02:10:41.510 --> 02:10:45.390 I subtract 13 instead, or whatever the key is, so long as the sender 02:10:45.390 --> 02:10:46.780 and receiver actually know it. 02:10:46.780 --> 02:10:50.280 So in this case here, this is actually the message with which we began class. 02:10:50.280 --> 02:10:53.730 If we have this message here and I used a key of 1 to encrypt it, 02:10:53.730 --> 02:10:57.220 well, decrypting it might involve doing something like this. 02:10:57.220 --> 02:11:00.278 Here are those same letters on the screen, and I think in a moment 02:11:00.278 --> 02:11:02.070 before we adjourn, I'll mention too that we 02:11:02.070 --> 02:11:04.230 might have encrypted a message in eight characters 02:11:04.230 --> 02:11:06.360 this whole day, so if any of you took the time 02:11:06.360 --> 02:11:08.660 and procrastinated and figured out what the light bulbs spelled 02:11:08.660 --> 02:11:10.743 and they didn't seem to spell anything in English, 02:11:10.743 --> 02:11:13.530 well, here now is the solution for cracking it. 02:11:13.530 --> 02:11:16.500 This, if I subtract 1, becomes what? 02:11:16.500 --> 02:11:22.007 U becomes T. And this is obviously-- see where we're going with this? 02:11:22.007 --> 02:11:25.090 And if we keep going, subtracting 1-- so indeed, we're at the end of class 02:11:25.090 --> 02:11:26.930 now because this was CS50. 02:11:26.930 --> 02:11:30.180 And the last thing we have to say is we have hundreds of ducks waiting for you 02:11:30.180 --> 02:11:30.790 outside. 02:11:30.790 --> 02:11:33.120 So on the way out, grab your own rubber duck. 02:11:33.120 --> 02:11:34.320 [APPLAUSE] 02:11:34.320 --> 02:11:37.970 [MUSIC PLAYING]
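Decryption as described, subtracting the key and wrapping from A back around to Z, can be sketched in C. Adding 26 before taking the remainder keeps the arithmetic non-negative, since C's `%` can yield a negative result when its left operand is negative. The name `caesar_decrypt` is hypothetical, the key is assumed to be between 0 and 25, and the sample ciphertext below is simply "THIS WAS CS50" encrypted with a key of 1, chosen for illustration.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper: decrypts Caesar ciphertext in place by shifting each
// letter key places backward (with key 1: B -> A, C -> B, A -> Z).
// Assumes 0 <= key <= 25; non-letters are left untouched.
void caesar_decrypt(char *text, int key)
{
    for (size_t i = 0, n = strlen(text); i < n; i++)
    {
        unsigned char c = (unsigned char) text[i];
        if (isupper(c))
        {
            // + 26 keeps the value non-negative before %, so A - 1 wraps to Z
            text[i] = (char) ('A' + (c - 'A' - key + 26) % 26);
        }
        else if (islower(c))
        {
            text[i] = (char) ('a' + (c - 'a' - key + 26) % 26);
        }
    }
}
```

For example, decrypting "UIJT XBT DT50" with a key of 1 gives "THIS WAS CS50": U becomes T, I becomes H, and so on, while the space and the digits stay put.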