WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:01:22.668 --> 00:01:24.080 DAVID MALAN: All right. 00:01:24.080 --> 00:01:28.193 This is CS50, and today we look all the more underneath the hood, so to speak, 00:01:28.193 --> 00:01:30.860 of programming, which we've been doing the past couple of weeks, 00:01:30.860 --> 00:01:32.015 and of C in particular. 00:01:32.015 --> 00:01:33.890 And indeed, we're going to try to focus today 00:01:33.890 --> 00:01:37.410 in addition on some new programming techniques, really on first principles, 00:01:37.410 --> 00:01:40.100 so that what you've been seeing over the past couple of weeks 00:01:40.100 --> 00:01:42.720 no longer feels quite as much like magic. 00:01:42.720 --> 00:01:44.920 If you're sort of typing these magical incantations 00:01:44.920 --> 00:01:46.670 and you're not quite sure why things work, 00:01:46.670 --> 00:01:49.250 know that you will understand and appreciate 00:01:49.250 --> 00:01:52.788 all the more with practice and with application of these ideas, what 00:01:52.788 --> 00:01:53.580 it is you're doing. 00:01:53.580 --> 00:01:56.750 But today, we're going to go back to first principles, sort of week 0 00:01:56.750 --> 00:01:59.300 material, to make sure that you understand 00:01:59.300 --> 00:02:02.960 that what we're doing now in week 2 is little different from what we did back 00:02:02.960 --> 00:02:04.040 in week 0. 00:02:04.040 --> 00:02:07.164 So in fact, let's take a look at one of the first programs we saw in C, 00:02:07.164 --> 00:02:08.789 which was a little something like this. 00:02:08.789 --> 00:02:10.430 This is our source code, so to speak. 00:02:10.430 --> 00:02:13.130 There were a few salient characteristics from last week 00:02:13.130 --> 00:02:15.830 that dovetailed with the first week, week 0. 00:02:15.830 --> 00:02:18.740 And that was this thing called main, which is just the main function. 00:02:18.740 --> 00:02:20.490 It's the main entry point to your program. 
00:02:20.490 --> 00:02:23.210 It's the equivalent of Scratch's "when green flag clicked." 00:02:23.210 --> 00:02:25.880 This of course is an example of another function, 00:02:25.880 --> 00:02:28.730 one that comes with C that allows you to print on the screen. 00:02:28.730 --> 00:02:31.340 It can take inputs, at least one input here, 00:02:31.340 --> 00:02:35.720 which is typically a string in double quotes, like the message "hello world." 00:02:35.720 --> 00:02:38.900 But of course, in order to use printf in the first place, 00:02:38.900 --> 00:02:40.910 you needed this thing up here. 00:02:40.910 --> 00:02:45.650 And stdio.h represents what, as you understand it now? 00:02:45.650 --> 00:02:50.320 Any thoughts on what stdio.h is? 00:02:50.320 --> 00:02:50.820 Yeah? 00:02:50.820 --> 00:02:54.648 AUDIENCE: A library on how [INAUDIBLE]. 00:02:54.648 --> 00:02:56.690 DAVID MALAN: Yeah, it's a manifestation of what's 00:02:56.690 --> 00:02:59.910 called a library, code that someone else wrote years ago. 00:02:59.910 --> 00:03:02.780 Specifically, stdio.h is a header file. 00:03:02.780 --> 00:03:05.090 It's a file written in C but with a file extension 00:03:05.090 --> 00:03:10.070 ending in dot h that among other things declares that it has the prototype, 00:03:10.070 --> 00:03:13.290 so to speak, for printf so that Clang, when you're compiling your code, 00:03:13.290 --> 00:03:15.230 knows what printf actually is. 00:03:15.230 --> 00:03:17.100 And of course this little thing back here, 00:03:17.100 --> 00:03:20.460 you've probably now gotten in the habit of using. This \n is a new line. 00:03:20.460 --> 00:03:22.460 And it forces the cursor to go on the next line. 00:03:22.460 --> 00:03:26.300 So those were some of the uglier characteristics of code last week, 00:03:26.300 --> 00:03:29.000 and we'll tease apart int and void and a few other things 00:03:29.000 --> 00:03:31.140 over the course of today and beyond. 
00:03:31.140 --> 00:03:35.030 So when you compile your code with Clang, hello.c, 00:03:35.030 --> 00:03:38.780 and then run that program, ./a.out, which you probably haven't done 00:03:38.780 --> 00:03:42.080 on your own since, because we gave you a simpler way to do this, 00:03:42.080 --> 00:03:45.650 that process was all about creating a file containing zeros and ones that 00:03:45.650 --> 00:03:48.410 the computer understands, called a.out that you can run. 00:03:48.410 --> 00:03:50.780 Of course, a.out is a pretty stupid name for a program. 00:03:50.780 --> 00:03:53.030 It's hardly descriptive, even though it's the default. 00:03:53.030 --> 00:03:55.880 So the next program we wrote and compiled, 00:03:55.880 --> 00:04:00.290 we used -o hello, which is a so-called command line argument to Clang. 00:04:00.290 --> 00:04:02.480 It's like an option it comes with that just lets you 00:04:02.480 --> 00:04:04.752 specify the name of the file to output. 00:04:04.752 --> 00:04:06.710 So you did this past week with the problem set, 00:04:06.710 --> 00:04:09.020 with a couple of programs you yourself wrote. 00:04:09.020 --> 00:04:13.376 But what is actually going on when you compile your code via that process? 00:04:13.376 --> 00:04:16.459 Well, it turns out that if we make this program a little more interesting, 00:04:16.459 --> 00:04:19.190 this becomes even more important with code like this. 00:04:19.190 --> 00:04:20.899 Now I've added a couple of lines of code. 00:04:20.899 --> 00:04:24.440 cs50.h, which is representative of the CS50 library. 00:04:24.440 --> 00:04:28.310 Again, code that other people wrote, in this case the staff some years ago, 00:04:28.310 --> 00:04:33.020 that declares that it has prototypes for the one liners for functions 00:04:33.020 --> 00:04:36.590 like get_string so that you can use more features than came with C by default. 00:04:36.590 --> 00:04:39.840 And it has things like string itself, a data type. 
00:04:39.840 --> 00:04:41.840 So get_string is declared in that file. 00:04:41.840 --> 00:04:45.260 name is, of course, a variable in which we stored my name last week. 00:04:45.260 --> 00:04:47.990 string is the type of variable in which we stored a name. 00:04:47.990 --> 00:04:51.680 And all of that is then outputted as hello, comma, something, 00:04:51.680 --> 00:04:53.990 where the %s, recall, was a placeholder, 00:04:53.990 --> 00:04:58.130 name is the variable we plugged in to that format code, and then all of that 00:04:58.130 --> 00:05:01.700 is possible because of cs50.h, which declares string and also 00:05:01.700 --> 00:05:03.120 gives us get_string. 00:05:03.120 --> 00:05:05.453 So that's a paradigm that's at the moment CS50 specific, 00:05:05.453 --> 00:05:07.787 but it's representative of any number of other functions 00:05:07.787 --> 00:05:10.220 we're going to start using today and in the weeks to come. 00:05:10.220 --> 00:05:12.780 The process now is going to be the same. 00:05:12.780 --> 00:05:17.327 However, when you compiled that program that used the CS50 library, 00:05:17.327 --> 00:05:20.160 you might recall and you might have gotten hung up on this past week 00:05:20.160 --> 00:05:24.365 if you used Clang and not another program, you need this -lcs50, 00:05:24.365 --> 00:05:26.510 and you need it at the end just because. 00:05:26.510 --> 00:05:27.905 That's the way Clang expects it. 00:05:27.905 --> 00:05:29.780 This is a special flag that we'll tease apart 00:05:29.780 --> 00:05:34.290 in just a couple of minutes, an argument to Clang that tells it to link in, 00:05:34.290 --> 00:05:38.180 so to speak, link in all of the zeros and ones from CS50's library. 00:05:38.180 --> 00:05:40.108 But we'll see that in just a moment. 00:05:40.108 --> 00:05:41.900 This, of course, is how you should probably 00:05:41.900 --> 00:05:43.480 be compiling your code here on out. 
00:05:43.480 --> 00:05:46.700 It's just super simple, but it automates everything we just 00:05:46.700 --> 00:05:49.310 saw more pedantically, step by step. 00:05:49.310 --> 00:05:51.870 So we've been compiling our code for the past week now, 00:05:51.870 --> 00:05:54.620 and we're going to keep doing that for next several weeks, until-- 00:05:54.620 --> 00:05:56.330 spoiler-- we get to Python, and you're not 00:05:56.330 --> 00:05:57.860 going to have to compile anything anymore. 00:05:57.860 --> 00:05:59.860 It's just going to happen automatically for you. 00:05:59.860 --> 00:06:03.710 But until then, compilation is actually kind of an oversimplification 00:06:03.710 --> 00:06:05.630 of what's been happening the past week. 00:06:05.630 --> 00:06:09.290 Turns out there's like actually four distinct steps that you all 00:06:09.290 --> 00:06:12.410 had been inducing by running Make or even by running 00:06:12.410 --> 00:06:14.180 Clang manually at the command prompt. 00:06:14.180 --> 00:06:16.460 And just so that, again, we can sort of understand 00:06:16.460 --> 00:06:18.930 what it is you are doing when you run these commands, 00:06:18.930 --> 00:06:21.380 let's go to first principles, understand these four steps, 00:06:21.380 --> 00:06:25.170 but then we'll move on just like in week 0 and stipulate, OK, I got that. 00:06:25.170 --> 00:06:27.690 I don't need to think at this low level after today. 00:06:27.690 --> 00:06:30.925 But hopefully you'll understand from the bottom up these four steps. 00:06:30.925 --> 00:06:32.550 So let's take a look at pre-processing. 00:06:32.550 --> 00:06:35.580 This is a term of art in programming that refers to the following. 00:06:35.580 --> 00:06:38.260 When you have source code that looks like this, 00:06:38.260 --> 00:06:40.280 you have a couple of lines at the top that 00:06:40.280 --> 00:06:43.470 say hash include two files, two library files. 
00:06:43.470 --> 00:06:47.040 Well, when you actually run Clang or you induce 00:06:47.040 --> 00:06:50.610 Clang to run by using Make, what happens is those lines 00:06:50.610 --> 00:06:54.330 that start with the hash symbol are actually sort of replaced 00:06:54.330 --> 00:06:56.920 with the actual contents of that file. 00:06:56.920 --> 00:07:00.240 So instead of this code remaining include CS50.h, 00:07:00.240 --> 00:07:03.600 literally what Clang does is go into CS50.h, 00:07:03.600 --> 00:07:06.990 grab the relevant lines of code, and essentially copy-paste them 00:07:06.990 --> 00:07:10.380 into your file, hello.c or whatever it's called. 00:07:10.380 --> 00:07:12.870 The next line here, standard io.h similarly 00:07:12.870 --> 00:07:17.460 gets replaced with whatever the lines of code are in that file, standard io.h. 00:07:17.460 --> 00:07:21.030 Doesn't matter to us what they are, but they look a little something like this, 00:07:21.030 --> 00:07:22.740 though I've simplified on the slide here. 00:07:22.740 --> 00:07:25.573 And there's a whole bunch of other stuff above and below those lines 00:07:25.573 --> 00:07:27.330 certainly in those files. 00:07:27.330 --> 00:07:28.830 What then happens after that? 00:07:28.830 --> 00:07:31.110 Well, compiling, even though this is the word 00:07:31.110 --> 00:07:33.630 we use and we'll continue using to describe 00:07:33.630 --> 00:07:38.070 taking source code to machine code, it's actually a more precise step than that. 00:07:38.070 --> 00:07:40.980 When a computer-- when a program is compiled, 00:07:40.980 --> 00:07:44.340 it technically starts like this after having been pre-processed-- again, 00:07:44.340 --> 00:07:45.510 that was step 1. 00:07:45.510 --> 00:07:47.730 This code is then converted by a compiler, 00:07:47.730 --> 00:07:51.950 like Clang, to something that looks even scarier than C. 
This is something 00:07:51.950 --> 00:07:53.700 called assembly code, and you can actually 00:07:53.700 --> 00:07:55.530 take entire courses on assembly code. 00:07:55.530 --> 00:07:58.740 And it wasn't all that many decades ago that humans were manually 00:07:58.740 --> 00:08:02.400 programming code that looked like this, so it wasn't quite zeros and ones. 00:08:02.400 --> 00:08:04.560 But my god, C is looking pretty good now, 00:08:04.560 --> 00:08:06.810 if this is the alternative language back in the day. 00:08:06.810 --> 00:08:08.730 So this is an example of assembly language. 00:08:08.730 --> 00:08:11.460 But even though it's pretty arcane looking, 00:08:11.460 --> 00:08:13.680 if I highlight in yellow a few characteristics, 00:08:13.680 --> 00:08:15.390 there's some things that are familiar. 00:08:15.390 --> 00:08:16.520 Main is up here. 00:08:16.520 --> 00:08:17.730 Get string is down here. 00:08:17.730 --> 00:08:19.290 Printf is down here. 00:08:19.290 --> 00:08:23.760 So when your code is compiled by Clang, it goes from your source code in C 00:08:23.760 --> 00:08:27.480 to this intermediate step assembly code, and that's just 00:08:27.480 --> 00:08:30.960 a little closer to what the CPU, the brain of your computer, 00:08:30.960 --> 00:08:31.980 actually understands. 00:08:31.980 --> 00:08:35.070 In fact, now highlighted in yellow are what are called instructions. 00:08:35.070 --> 00:08:38.070 So if you've ever heard of Intel or AMD or a bunch of companies 00:08:38.070 --> 00:08:40.500 that make CPUs, central processing units, 00:08:40.500 --> 00:08:43.679 the brains of a computer, what those CPUs understand 00:08:43.679 --> 00:08:47.310 is these very, very low level operations like this. 00:08:47.310 --> 00:08:50.520 And these relate to moving things around in memory and copying things 00:08:50.520 --> 00:08:52.770 and reading things and putting things onto the screen. 00:08:52.770 --> 00:08:55.227 But much more arcanely than C is. 
00:08:55.227 --> 00:08:57.060 But again, we don't have to care about this, 00:08:57.060 --> 00:08:59.710 because Clang does all of this for us. 00:08:59.710 --> 00:09:02.040 But once you're at that point of having assembly code, 00:09:02.040 --> 00:09:04.582 you need to get it to machine code the actual zeros and ones. 00:09:04.582 --> 00:09:07.350 And that's where Clang does what's called assembling. 00:09:07.350 --> 00:09:10.950 There's another part of Clang, like some built-in functionality, that 00:09:10.950 --> 00:09:13.470 takes as input that assembly code and converts it 00:09:13.470 --> 00:09:17.530 from this to the zeros and ones that we talked about in week 0. 00:09:17.530 --> 00:09:21.510 But for a program like hello.c, which involved a few different files. 00:09:21.510 --> 00:09:25.470 For instance, this code again involved my code that we wrote last week. 00:09:25.470 --> 00:09:28.650 It involves the CS50 library, which the staff wrote years ago. 00:09:28.650 --> 00:09:30.390 And it involves standard io.h. 00:09:30.390 --> 00:09:31.650 That's yet another file. 00:09:31.650 --> 00:09:36.000 That's like three different files that Clang frankly has to compile for you. 00:09:36.000 --> 00:09:39.630 Now it would be super tedious if we had to run Clang like three times 00:09:39.630 --> 00:09:41.310 to do all this compilation. 00:09:41.310 --> 00:09:42.300 Thankfully we don't. 00:09:42.300 --> 00:09:44.020 It all happens automatically. 00:09:44.020 --> 00:09:48.420 So the last step in compiling a program after it's been pre-processed, 00:09:48.420 --> 00:09:51.300 after it's been compiled, after it's been assembled, 00:09:51.300 --> 00:09:54.960 is to combine all of the zeros and ones from the files involved 00:09:54.960 --> 00:09:58.590 into one big file, like Hello or a.out. 
00:09:58.590 --> 00:10:03.360 So if hello.c started as source code, as did cs50.c, somewhere on the computer's 00:10:03.360 --> 00:10:07.410 hard drive, as did stdio.c, somewhere on the computer's hard drive, 00:10:07.410 --> 00:10:12.480 turns out that printf is actually in its own file within stdio, 00:10:12.480 --> 00:10:13.380 the library. 00:10:13.380 --> 00:10:16.780 But these are the three files involved for the program I just described. 00:10:16.780 --> 00:10:19.020 So once we actually go ahead and assemble this one, 00:10:19.020 --> 00:10:20.812 it becomes a whole bunch of zeros and ones. 00:10:20.812 --> 00:10:23.062 We assemble this one, a whole bunch of zeros and ones. 00:10:23.062 --> 00:10:24.840 This one, a whole bunch of zeros and ones. 00:10:24.840 --> 00:10:26.880 That's like three separate files that then 00:10:26.880 --> 00:10:32.640 get linked together, sort of commingled, into one big file called Hello, 00:10:32.640 --> 00:10:34.480 or called a.out. 00:10:34.480 --> 00:10:36.512 And my god, like that's a lot of complexity. 00:10:36.512 --> 00:10:38.220 But that's what humans have been building 00:10:38.220 --> 00:10:41.090 and developing for the past many decades when it comes to writing software. 00:10:41.090 --> 00:10:43.320 Back in the day, it started off as zeros and ones. 00:10:43.320 --> 00:10:44.310 That was no fun. 00:10:44.310 --> 00:10:46.530 Assembly language, scary though it looks, 00:10:46.530 --> 00:10:50.010 was actually a little easier, a little more accessible for humans to write. 00:10:50.010 --> 00:10:51.810 But eventually we humans got tired of that, 00:10:51.810 --> 00:10:56.940 and thus were born languages like C and C++ and Python and PHP and Ruby 00:10:56.940 --> 00:10:57.790 and others. 00:10:57.790 --> 00:11:00.640 It's been an evolution of languages along the way. 00:11:00.640 --> 00:11:04.140 So this now we can just abstract away into compiling. 
00:11:04.140 --> 00:11:07.337 When you compile your code, all of that stuff happens. 00:11:07.337 --> 00:11:09.420 But all we really care about at the end of the day 00:11:09.420 --> 00:11:12.360 is the input, your source code, the output as machine code. 00:11:12.360 --> 00:11:14.400 But those are the various steps happening. 00:11:14.400 --> 00:11:16.858 And if you ever see cryptic-looking commands on the screen, 00:11:16.858 --> 00:11:20.510 it might relate indeed to some of those intermediate steps. 00:11:20.510 --> 00:11:24.920 All right, any questions then on what compiling is or pre-processing, 00:11:24.920 --> 00:11:28.310 compiling, assembling, or linking? 00:11:28.310 --> 00:11:31.630 Anything at all? 00:11:31.630 --> 00:11:32.660 All right. 00:11:32.660 --> 00:11:36.440 So beyond that, I'm sure you've encountered now, after just one 00:11:36.440 --> 00:11:38.310 week, bugs in your software. 00:11:38.310 --> 00:11:41.560 And in fact, one of the greatest skills you can acquire from programming class 00:11:41.560 --> 00:11:45.860 is not only how to write code, but how to debug code, most likely your own. 00:11:45.860 --> 00:11:48.560 And if you've ever wondered where this phrase comes from, 00:11:48.560 --> 00:11:52.240 this notion of debugging, so this is actually part of the mythology. 00:11:52.240 --> 00:11:55.550 So this is actually a notebook kept by Grace Hopper, 00:11:55.550 --> 00:11:59.600 a very famous computer scientist, working years ago with some colleagues 00:11:59.600 --> 00:12:01.142 on what was called the Mark 2 system. 00:12:01.142 --> 00:12:03.350 If you've ever walked through Harvard Science Center, 00:12:03.350 --> 00:12:06.470 there's a big part of a machine in the ground floor of the Science Center. 00:12:06.470 --> 00:12:08.230 That's the Mark 1, the precursor. 00:12:08.230 --> 00:12:10.490 Well, the Mark 2 at some point was discovered 00:12:10.490 --> 00:12:14.180 as having literally a bug inside of it, which was causing a problem. 
00:12:14.180 --> 00:12:15.320 A moth of sorts. 00:12:15.320 --> 00:12:18.140 And Grace Hopper actually made this record here, if we zoom in, 00:12:18.140 --> 00:12:20.817 the first actual case of bug being found. 00:12:20.817 --> 00:12:23.150 And even though other people had used the expression bug 00:12:23.150 --> 00:12:26.030 before to refer to mistakes or problems in systems, 00:12:26.030 --> 00:12:30.070 this is really sort of the lore that folks in computer science look back on. 00:12:30.070 --> 00:12:34.730 So bugs are just mistakes in programs, things that you surely did not intend. 00:12:34.730 --> 00:12:37.430 And we'll consider today now how we can empower you, 00:12:37.430 --> 00:12:41.420 much more so than this past week, to solve your own problems 00:12:41.420 --> 00:12:44.083 and actually debug your software. 00:12:44.083 --> 00:12:46.250 So what are the mechanisms via which we can do this? 00:12:46.250 --> 00:12:49.682 So Help 50 is one of the tools that CS50 itself provides you with. 00:12:49.682 --> 00:12:51.890 And let's go ahead and take a look at a quick example 00:12:51.890 --> 00:12:54.570 that allows us to use this tool. 00:12:54.570 --> 00:12:57.198 I'm going to go ahead and open up my CS50 Sandbox here. 00:12:57.198 --> 00:12:59.240 I'm going to go ahead and create a program called 00:12:59.240 --> 00:13:03.360 Buggy 0.C, knowing in advance that I'm going to make a mistake here. 00:13:03.360 --> 00:13:07.560 And I'm going to go ahead and do main void, as do all of my programs begin. 00:13:07.560 --> 00:13:12.740 And I'm going to go ahead and do printf hello world backslash n semicolon. 00:13:12.740 --> 00:13:14.720 All right, so that's buggy 0.c. 00:13:14.720 --> 00:13:16.970 And again, even though I could run the Clang commands, 00:13:16.970 --> 00:13:19.220 henceforth I'm just going to run things like Make. 00:13:19.220 --> 00:13:21.170 So make buggy 0 Enter. 00:13:21.170 --> 00:13:23.190 And all right, here's the first of my errors. 
00:13:23.190 --> 00:13:25.520 Let me just increase the size of my terminal window, 00:13:25.520 --> 00:13:29.750 focusing as always, always on the first error, which is the one in red here. 00:13:29.750 --> 00:13:34.290 Implicitly declaring library function printf with type int (const char *, ...), -W 00:13:34.290 --> 00:13:34.790 error-- 00:13:34.790 --> 00:13:35.957 I mean, there's a lot there. 00:13:35.957 --> 00:13:39.560 There's a lot to digest, even though by now, you might recognize at least some 00:13:39.560 --> 00:13:40.370 of these symbols. 00:13:40.370 --> 00:13:43.700 But suppose you don't, and you want help understanding this message. 00:13:43.700 --> 00:13:46.700 Short of asking a human for help, someone who's more familiar, 00:13:46.700 --> 00:13:48.080 you can instead do this. 00:13:48.080 --> 00:13:52.760 Rerun the same command as before, but prefix it with help50 and hit Enter. 00:13:52.760 --> 00:13:55.730 And what will happen is we will run make for you again. 00:13:55.730 --> 00:13:59.340 We will look at the output of make, cryptic though it might be to you, 00:13:59.340 --> 00:14:03.500 run it through our own help50 software and look for messages we understand. 00:14:03.500 --> 00:14:07.157 And if we recognize one of the error messages in your output, 00:14:07.157 --> 00:14:08.990 we're going to highlight in yellow a message 00:14:08.990 --> 00:14:12.620 like this-- buggy0.c:3:5: error: 00:14:12.620 --> 00:14:16.730 implicitly declaring library function printf with type, dot, dot, dot. 00:14:16.730 --> 00:14:18.803 Did you forget to include stdio.h, 00:14:18.803 --> 00:14:20.970 in which printf is declared, at the top of your file? 00:14:20.970 --> 00:14:23.540 So that's, in this case, the exact answer. 00:14:23.540 --> 00:14:25.788 And so now, you'll just see that not only 00:14:25.788 --> 00:14:28.580 are we still showing you the error, we're highlighting where it is. 
00:14:28.580 --> 00:14:33.280 And in fact, buggy0.c, line 3, character 5, or column 5, 00:14:33.280 --> 00:14:37.010 is just one way of now homing in on what the issue is. 00:14:37.010 --> 00:14:43.300 Let me go ahead and open up another file here, or enhance this as buggy1.c, 00:14:43.300 --> 00:14:47.520 and make a similar mistake, but one that triggers a different error 00:14:47.520 --> 00:14:48.020 message. 00:14:48.020 --> 00:14:50.728 In this case, I'm going to go ahead and get this right this time, 00:14:50.728 --> 00:14:53.420 include stdio.h. 00:14:53.420 --> 00:14:56.925 And then I'm going to go ahead and do int main void, and then just as before, 00:14:56.925 --> 00:14:58.550 I'm going to do this canonical program. 00:14:58.550 --> 00:15:00.620 string name gets get_string. 00:15:00.620 --> 00:15:03.890 And ask the user, what's your name-- 00:15:03.890 --> 00:15:05.150 backslash, n. 00:15:05.150 --> 00:15:10.850 And then I'm going to go ahead and say hello to them with a %s comma name. 00:15:10.850 --> 00:15:12.480 So that too looks good. 00:15:12.480 --> 00:15:16.760 I'm going to go ahead and scroll back up here, do make buggy1 this time. 00:15:16.760 --> 00:15:20.030 But of course, it looks like, my god, as before, I have two lines of code, 00:15:20.030 --> 00:15:21.740 yet somehow, five or six errors. 00:15:21.740 --> 00:15:23.120 Always focus on the top. 00:15:23.120 --> 00:15:27.180 So it probably relates to something like this, but this one's more confusing. 00:15:27.180 --> 00:15:29.960 The undeclared identifier string-- did you mean stdin? 00:15:29.960 --> 00:15:31.040 Well, no. 00:15:31.040 --> 00:15:34.310 So if you don't quite grok that, go ahead and run the same command, 00:15:34.310 --> 00:15:36.360 help50 make buggy1. 
00:15:36.360 --> 00:15:38.960 And this time, we'll see the output of this command, 00:15:38.960 --> 00:15:42.470 hopefully, after asking for help, a clue as to what 00:15:42.470 --> 00:15:44.840 it is that we're actually looking for. 00:15:44.840 --> 00:15:47.780 And indeed, now we notice that oh, by undeclared identifier, 00:15:47.780 --> 00:15:50.495 clang means you've used a name string on line five of buggy one 00:15:50.495 --> 00:15:52.040 dot c, which hasn't been defined. 00:15:52.040 --> 00:15:55.400 Did you forget to include cs50 dot h, at this point. 00:15:55.400 --> 00:15:58.550 So in short, anytime you're having a problem running a command 00:15:58.550 --> 00:16:02.270 and you're seeing cryptic messages, reach for help 50 as a command 00:16:02.270 --> 00:16:04.100 for actually explaining it to you. 00:16:04.100 --> 00:16:08.122 And thereafter, probably you won't have to run that same command again. 00:16:08.122 --> 00:16:09.080 But what about another? 00:16:09.080 --> 00:16:12.950 Let me go ahead and open up a program I wrote in advance here, 00:16:12.950 --> 00:16:17.250 and go ahead and open this one. 00:16:17.250 --> 00:16:17.750 Yeah? 00:16:17.750 --> 00:16:18.565 Sure. 00:16:18.565 --> 00:16:23.925 AUDIENCE: [INAUDIBLE] just press more buttons. 00:16:23.925 --> 00:16:25.550 DAVID MALAN: To rerun the same command? 00:16:25.550 --> 00:16:27.890 AUDIENCE: Not to delete that, but to [INAUDIBLE] 00:16:27.890 --> 00:16:30.560 DAVID MALAN: Oh, yes, so just to keep things neat in class, 00:16:30.560 --> 00:16:32.600 I'm in the habit of hitting Control l a lot, 00:16:32.600 --> 00:16:34.460 which just clears my terminal window. 00:16:34.460 --> 00:16:35.630 It has no functional impact. 00:16:35.630 --> 00:16:37.490 It just gets the clutter off of the screen. 00:16:37.490 --> 00:16:40.515 You can also literally type, for instance, clear, Enter. 00:16:40.515 --> 00:16:42.890 That's just a little more verbose than hitting Control l. 
00:16:42.890 --> 00:16:45.932 So there's a lot of little keyboard shortcuts, and interrupt at any point 00:16:45.932 --> 00:16:47.520 if you have questions about those. 00:16:47.520 --> 00:16:49.670 So here's a program that also is buggy. 00:16:49.670 --> 00:16:52.160 I wrote it in advance, and it's called buggy two dot c. 00:16:52.160 --> 00:16:53.180 It's got a for loop. 00:16:53.180 --> 00:16:54.590 It's printing some hashes. 00:16:54.590 --> 00:16:58.010 And the goal of this program is to print something 10 times. 00:16:58.010 --> 00:17:00.560 So I've got my for loop from zero on up to 10. 00:17:00.560 --> 00:17:02.870 I'm printing a hash with a backslash n. 00:17:02.870 --> 00:17:06.450 So let's go ahead and run this, make buggy two. 00:17:06.450 --> 00:17:06.950 Oops. 00:17:06.950 --> 00:17:08.033 I'm not in this directory. 00:17:08.033 --> 00:17:10.609 Let me go ahead and make buggy two-- 00:17:10.609 --> 00:17:11.510 seems to compile. 00:17:11.510 --> 00:17:14.089 So this is not a problem for help 50 yet, 00:17:14.089 --> 00:17:17.030 because that would be when the command itself isn't working. 00:17:17.030 --> 00:17:19.880 Buggy two-- all right, it looks good, but let's 00:17:19.880 --> 00:17:23.798 just be super sure-- one, two, three, four, five, six, seven, eight, nine, 00:17:23.798 --> 00:17:26.000 10, 11. 00:17:26.000 --> 00:17:28.550 So it is flawed, if my goal is to print just 10 hashes. 00:17:28.550 --> 00:17:30.230 And obviously, this is very contrived. 00:17:30.230 --> 00:17:32.960 Odds are, you can just reason through what the problem here is, 00:17:32.960 --> 00:17:36.320 but this is representative of another type of problem 00:17:36.320 --> 00:17:41.870 that's not a bug syntactically, whereby you typed some wrong symbol or Command. 00:17:41.870 --> 00:17:43.370 This is more of a logical error. 00:17:43.370 --> 00:17:45.500 My goal is to print something 10 times. 00:17:45.500 --> 00:17:46.370 It's obviously not. 
00:17:46.370 --> 00:17:47.787 It's printing something 11 times. 00:17:47.787 --> 00:17:50.370 And suppose that the goal at hand is to wrap your mind around, 00:17:50.370 --> 00:17:51.960 why is that happening? 00:17:51.960 --> 00:17:55.280 Well, the next debugging tool that we'll propose that you consider, 00:17:55.280 --> 00:17:57.170 is actually quite simply printf. 00:17:57.170 --> 00:18:00.980 It's perhaps the simplest tool you can use to actually understand 00:18:00.980 --> 00:18:04.190 what's going on inside of your program, and we might use it in this case 00:18:04.190 --> 00:18:05.240 as follows. 00:18:05.240 --> 00:18:08.023 I'm obviously printing out already the hash symbol, 00:18:08.023 --> 00:18:10.940 but let me go ahead and say something more deliberate, just to myself, 00:18:10.940 --> 00:18:18.830 something like i is now, %i, and then let's go ahead and just put a space, 00:18:18.830 --> 00:18:21.690 and then in there, output i semicolon. 00:18:21.690 --> 00:18:23.460 So this is not the goal of the program. 00:18:23.460 --> 00:18:25.400 It's just a temporary diagnostic message, 00:18:25.400 --> 00:18:28.850 so that now, if I go ahead and increase my terminal window, 00:18:28.850 --> 00:18:33.620 recompile buggy two, and rerun dot slash buggy two-- 00:18:33.620 --> 00:18:35.930 [LAUGHS] buffy two-- 00:18:35.930 --> 00:18:41.000 buggy two-- I'll now see, oh, a little more interesting information. 00:18:41.000 --> 00:18:44.600 Not only am I still seeing the hashes, I'm now seeing, in real time, 00:18:44.600 --> 00:18:45.617 the value of i. 00:18:45.617 --> 00:18:47.450 And now, it should probably jump out at you, 00:18:47.450 --> 00:18:50.210 if it didn't already in the for loop alone, what's 00:18:50.210 --> 00:18:53.256 the mistake I've made in my code? 00:18:53.256 --> 00:18:54.613 AUDIENCE: [INAUDIBLE] 00:18:54.613 --> 00:18:55.571 DAVID MALAN: Say again. 
00:18:55.571 --> 00:18:58.220 AUDIENCE: [INAUDIBLE] 00:18:58.220 --> 00:19:02.100 DAVID MALAN: Yeah, my first value for i was zero, and that's normally OK. 00:19:02.100 --> 00:19:04.100 Programmers do tend to start counting from zero, 00:19:04.100 --> 00:19:07.730 but if you do that, you can't keep counting through 10. 00:19:07.730 --> 00:19:09.860 You have to make a couple of tweaks here. 00:19:09.860 --> 00:19:11.462 So what can we do to fix? 00:19:11.462 --> 00:19:15.457 AUDIENCE: [INAUDIBLE] 00:19:15.457 --> 00:19:18.290 DAVID MALAN: Yeah, so this would be the canonical way of doing this. 00:19:18.290 --> 00:19:20.480 It's not the only way, but generally start at zero 00:19:20.480 --> 00:19:23.330 and go up to less than the value you care about. 00:19:23.330 --> 00:19:27.290 So now if I rerun this, I can go ahead and run make buggy2 again, 00:19:27.290 --> 00:19:30.860 clear my screen, ./buggy2, Enter. 00:19:30.860 --> 00:19:33.557 And now I indeed have 10, even though it never says 10, 00:19:33.557 --> 00:19:35.390 but that's OK, because I'm starting at zero, 00:19:35.390 --> 00:19:38.330 and now that I found my logical error, where it's just not 00:19:38.330 --> 00:19:41.810 working as I intended, now I can go ahead and delete that line. 00:19:41.810 --> 00:19:47.000 I can go ahead and make buggy2 once more, ./buggy2, Enter. 00:19:47.000 --> 00:19:51.140 And voila, I can now submit my program, or ship it out to my actual user. 00:19:51.140 --> 00:19:53.652 So printf is sort of a very old-school way 00:19:53.652 --> 00:19:56.360 of just wrapping your mind around what's going on in your program 00:19:56.360 --> 00:19:57.530 by just poking around. 00:19:57.530 --> 00:20:00.870 Use printf to see what's going on inside of your program, 00:20:00.870 --> 00:20:03.740 so you're not just staring at a screen trying to reason through 00:20:03.740 --> 00:20:05.810 without the help of the computer. 
00:20:05.810 --> 00:20:09.405 But of course, that's about as versatile as CS50 sandbox 00:20:09.405 --> 00:20:11.030 gets when it comes to solving problems. 00:20:11.030 --> 00:20:12.500 You can write code up here. 00:20:12.500 --> 00:20:14.630 You can compile and run code down here. 00:20:14.630 --> 00:20:16.850 And there are commands like help 50 and a few others 00:20:16.850 --> 00:20:19.400 we'll see that you can run to improve your code, 00:20:19.400 --> 00:20:22.215 but the sandbox itself is actually pretty limited. 00:20:22.215 --> 00:20:25.340 And so today, we're going to introduce another programming environment that 00:20:25.340 --> 00:20:29.720 fundamentally is the same thing, it just has additional features, particularly 00:20:29.720 --> 00:20:31.880 ones related to debugging. 00:20:31.880 --> 00:20:36.200 So here now, is what is called CS50 IDE. 00:20:36.200 --> 00:20:39.113 IDE is a term of art for integrated development environment. 00:20:39.113 --> 00:20:41.030 You might have used, if you programmed before 00:20:41.030 --> 00:20:44.270 in high school, things like Eclipse or Visual Studio or NetBeans 00:20:44.270 --> 00:20:45.830 or a bunch of other tools as well. 00:20:45.830 --> 00:20:47.420 If you've never used any of these tools, that's fine. 00:20:47.420 --> 00:20:48.590 Most students have not. 00:20:48.590 --> 00:20:52.940 But CS50 IDE is just sort of a fancier version of CS50 sandbox 00:20:52.940 --> 00:20:56.420 that adds some additional tools, like debugging tools. 00:20:56.420 --> 00:21:00.140 And so here I've gone ahead and logged in, in advance, to CS50 IDE, 00:21:00.140 --> 00:21:01.970 and it's pretty much the same layout. 00:21:01.970 --> 00:21:05.660 On the top of the window is where my tabs with my code will go. 00:21:05.660 --> 00:21:07.220 On the bottom is my terminal window. 00:21:07.220 --> 00:21:10.530 It happens to be blue instead of black, but that's just an aesthetic detail.
00:21:10.530 --> 00:21:13.108 But you'll see a teaser over here of other features, 00:21:13.108 --> 00:21:16.400 including what's called the debugger, a program that's going to let me actually 00:21:16.400 --> 00:21:19.470 step through my code, step by step. 00:21:19.470 --> 00:21:21.710 So let's go ahead and do this after introducing 00:21:21.710 --> 00:21:25.850 one other command that exists in the IDE, and that's called debug 50. 00:21:25.850 --> 00:21:28.790 Suffice it to say, that any command this semester that ends in 50 00:21:28.790 --> 00:21:30.958 is a training wheel of sorts that's CS50 specific. 00:21:30.958 --> 00:21:32.750 But by term's end, we'll have essentially 00:21:32.750 --> 00:21:36.980 taken away all of those CS50 specific tools so that everything you're using 00:21:36.980 --> 00:21:39.660 is industry standard, so to speak. 00:21:39.660 --> 00:21:46.110 So if we look now at CS50 IDE, let's go ahead and maybe run that same program. 00:21:46.110 --> 00:21:49.520 So if I click this folder icon up here, you'll see a whole bunch of files, 00:21:49.520 --> 00:21:50.810 just like in the sandbox. 00:21:50.810 --> 00:21:53.810 And I've pre-downloaded all of today's source code from CS50's website 00:21:53.810 --> 00:21:56.960 and just uploaded it to the IDE, just like you can in the sandbox. 00:21:56.960 --> 00:22:00.780 And we'll do this in section or in super section, manually, if you'd like. 00:22:00.780 --> 00:22:03.950 I'm going to go ahead and open up that same program buggy two, that's 00:22:03.950 --> 00:22:06.053 now in the IDE instead of the sandbox, and you'll 00:22:06.053 --> 00:22:07.470 see it looks pretty much the same. 00:22:07.470 --> 00:22:09.060 The color coding might be a little different, 00:22:09.060 --> 00:22:10.648 but that's just an aesthetic detail. 00:22:10.648 --> 00:22:11.690 And I can still run this. 00:22:11.690 --> 00:22:14.540 Make buggy two down here.
00:22:14.540 --> 00:22:19.340 But notice here, this error, I could use help 50 on this, but notice in advance, 00:22:19.340 --> 00:22:22.490 I've downloaded all of my code into a folder called source two. 00:22:22.490 --> 00:22:25.100 That's what's in the zip file, on the course's website. 00:22:25.100 --> 00:22:29.960 So again, just like we did briefly last week, if you know your code is not just 00:22:29.960 --> 00:22:32.420 in the default location, but is in another directory, 00:22:32.420 --> 00:22:34.005 what does cd stand for? 00:22:34.005 --> 00:22:35.250 AUDIENCE: Change directory. 00:22:35.250 --> 00:22:35.600 DAVID MALAN: OK. 00:22:35.600 --> 00:22:37.225 So change directory-- so not that hard. 00:22:37.225 --> 00:22:38.210 It changes directory. 00:22:38.210 --> 00:22:39.980 And now notice what the IDE does. 00:22:39.980 --> 00:22:42.950 It's a little more powerful, even though it's a little more cryptic. 00:22:42.950 --> 00:22:45.200 It always puts a constant reminder of where 00:22:45.200 --> 00:22:48.800 you are in the folders in your IDE, whereas the sandbox hid 00:22:48.800 --> 00:22:49.880 this detail altogether. 00:22:49.880 --> 00:22:52.547 So again, we're removing a training wheel by just reminding you, 00:22:52.547 --> 00:22:55.730 you are in source two and the tilde is just a computer convention, 00:22:55.730 --> 00:22:57.560 meaning that is your home directory, that 00:22:57.560 --> 00:23:02.450 is your personal folder with your CS50 files, demarcated with just a tilde. 00:23:02.450 --> 00:23:05.150 So now I'm going to go ahead and do make buggy two. 00:23:05.150 --> 00:23:08.130 It does compile, because again, this is not a syntax error. 00:23:08.130 --> 00:23:09.660 This is a logical problem. 00:23:09.660 --> 00:23:12.480 I'm going to go ahead now and dot slash buggy two. 00:23:12.480 --> 00:23:16.550 And if I count these up, I've still got 11 hashes on the screen.
00:23:16.550 --> 00:23:18.800 So I could go in and add printf, but that's not really 00:23:18.800 --> 00:23:20.790 taking advantage of any new tools. 00:23:20.790 --> 00:23:22.610 But watch what I can instead do. 00:23:22.610 --> 00:23:26.310 Let me scroll this down just a little bit so I can see all of my code. 00:23:26.310 --> 00:23:31.700 Let me go ahead and click to the left of the line numbers in the IDE, 00:23:31.700 --> 00:23:35.360 like in main, and it puts a red dot, like a stop sign that says stop here. 00:23:35.360 --> 00:23:37.040 This is what's called a breakpoint. 00:23:37.040 --> 00:23:39.950 This is a feature of a lot of integrated development environments, 00:23:39.950 --> 00:23:42.830 like CS50 IDE that's telling the computer in advance, 00:23:42.830 --> 00:23:45.590 when I run this program, don't just run it like usual, 00:23:45.590 --> 00:23:50.450 stop there, and allow me, the human, to step through my code, step 00:23:50.450 --> 00:23:52.220 by step by step. 00:23:52.220 --> 00:23:55.880 So to do this, you do not just run buggy two again. 00:23:55.880 --> 00:23:58.340 You instead run debug 50. 00:23:58.340 --> 00:24:02.310 So just like help 50 helps you understand error messages, debug 50 00:24:02.310 --> 00:24:05.930 lets you walk through your program step by step by step. 00:24:05.930 --> 00:24:07.550 So let me go ahead and hit Enter. 00:24:07.550 --> 00:24:10.760 You'll notice now on the right-hand side a new window 00:24:10.760 --> 00:24:12.523 that the sandbox did not have opened up. 00:24:12.523 --> 00:24:15.690 And there's a lot going on there, but we'll soon see the pieces that matter. 00:24:15.690 --> 00:24:16.980 That is the debugger. 00:24:16.980 --> 00:24:19.560 And you'll see that this line here, line seven, 00:24:19.560 --> 00:24:23.192 is highlighted, because that's the first real piece of code inside of main 00:24:23.192 --> 00:24:24.900 that's potentially going to get executed. 
00:24:24.900 --> 00:24:26.775 Nothing really happens with the curly braces. 00:24:26.775 --> 00:24:28.630 Seven is the first real line of code. 00:24:28.630 --> 00:24:30.450 So what this yellow or greenish bar means 00:24:30.450 --> 00:24:34.500 is that the debugger has paused your program at that moment in time, 00:24:34.500 --> 00:24:38.460 has not run all the way through, so we can start to poke around. 00:24:38.460 --> 00:24:41.790 And in fact, if I zoom in on the right, let's focus today 00:24:41.790 --> 00:24:46.890 pretty much on variables, you'll notice a nice little visual clue 00:24:46.890 --> 00:24:48.810 that you have a variable called i. 00:24:48.810 --> 00:24:50.430 At the moment, its value is zero. 00:24:50.430 --> 00:24:51.570 What is its type? 00:24:51.570 --> 00:24:52.740 Integer. 00:24:52.740 --> 00:24:56.400 So watch what happens now when I take advantage of some of the icons 00:24:56.400 --> 00:24:57.688 that are slightly higher up. 00:24:57.688 --> 00:25:00.480 I'm just going to scroll up on the debugger, and most of this we'll 00:25:00.480 --> 00:25:03.010 ignore for today, but there's some icons here. 00:25:03.010 --> 00:25:05.730 So if I were to hit Play, that will just resume my program 00:25:05.730 --> 00:25:07.950 and run it all the way to the end-- not very useful 00:25:07.950 --> 00:25:09.610 if my goal was to step through it. 00:25:09.610 --> 00:25:13.530 But if you hover over these other icons instead, step over, 00:25:13.530 --> 00:25:17.320 this will step over one line of code at a time, 00:25:17.320 --> 00:25:19.740 and execute it one by one by one, so literally 00:25:19.740 --> 00:25:21.960 allowing you to walk through your own code. 00:25:21.960 --> 00:25:23.010 And so let's try this. 00:25:23.010 --> 00:25:27.570 When I go ahead and click Step Over, notice that the color moves. 00:25:27.570 --> 00:25:30.600 Watch my terminal window now, the big blue window at the bottom. 00:25:30.600 --> 00:25:31.980 I'm going to see hash. 
00:25:31.980 --> 00:25:33.990 Now notice that line seven is highlighted again, 00:25:33.990 --> 00:25:35.400 because just with a for loop, something's 00:25:35.400 --> 00:25:37.020 going to happen again and again. 00:25:37.020 --> 00:25:41.274 So what should we see happen though when I click step over once more? 00:25:41.274 --> 00:25:42.215 AUDIENCE: [INAUDIBLE] 00:25:42.215 --> 00:25:43.590 DAVID MALAN: i should become one. 00:25:43.590 --> 00:25:46.757 So it's a little small, but watch the right-hand side of the screen where it 00:25:46.757 --> 00:25:49.650 says variable i, and I click Step Over-- 00:25:49.650 --> 00:25:51.630 voila, now we see one. 00:25:51.630 --> 00:25:54.840 And if I continue doing this, not much of interest really happens. 00:25:54.840 --> 00:25:57.990 I've just really slowed down the same program. 00:25:57.990 --> 00:26:01.590 But you'll notice that i is incrementing again and again and again. 00:26:01.590 --> 00:26:03.690 But what's interesting here is I didn't have 00:26:03.690 --> 00:26:07.238 to go in and change my code by adding a bunch of messy printf statements 00:26:07.238 --> 00:26:09.780 that I'm going to have to delete later just to submit my code 00:26:09.780 --> 00:26:11.100 or ship it on the internet. 00:26:11.100 --> 00:26:15.150 Instead, I can kind of watch what's going on inside of my computer's memory 00:26:15.150 --> 00:26:17.160 while I'm executing this program. 00:26:17.160 --> 00:26:21.750 And the fact now that the value of i is 10, 00:26:21.750 --> 00:26:26.830 and yet I'm about to print another hash, therein lies the same logical error. 00:26:26.830 --> 00:26:30.790 So we're seeing just graphically the same problem as before. 00:26:30.790 --> 00:26:33.040 So now at this point, the program is pretty much done. 00:26:33.040 --> 00:26:35.750 If I keep clicking Step Over, it's just going to terminate. 
00:26:35.750 --> 00:26:37.500 If at this point, I'm like, oh my god, now 00:26:37.500 --> 00:26:41.240 I know it's wrong, you can exit out of most any program in the IDE 00:26:41.240 --> 00:26:43.847 or in sandbox by hitting Control c, for cancel, 00:26:43.847 --> 00:26:45.930 and that will kill the debugger, close the window, 00:26:45.930 --> 00:26:47.860 and get you back to your terminal window. 00:26:47.860 --> 00:26:51.660 And I can't emphasize this enough, moving forward even this week, 00:26:51.660 --> 00:26:56.247 use help 50 when you have a bug compiling your code, some error message 00:26:56.247 --> 00:26:57.330 that you don't understand. 00:26:57.330 --> 00:26:59.160 It will just help you like a member of the staff could. 00:26:59.160 --> 00:27:01.827 And then certainly reach out to us if you don't understand that. 00:27:01.827 --> 00:27:04.800 But debug 50 should, moving forward, be your first instinct. 00:27:04.800 --> 00:27:06.960 If you have a bug where something's not working, 00:27:06.960 --> 00:27:08.670 the amount of change you're computing is wrong, 00:27:08.670 --> 00:27:10.503 the credit card numbers you're analyzing are 00:27:10.503 --> 00:27:14.790 wrong, use debug 50, starting this week, not two weeks from now, 00:27:14.790 --> 00:27:16.890 to develop that muscle memory of using a debugger. 00:27:16.890 --> 00:27:21.630 And it is truly a lifelong skill, not just for C, but for other languages 00:27:21.630 --> 00:27:23.310 as well. 00:27:23.310 --> 00:27:26.400 Any questions on that? 00:27:26.400 --> 00:27:29.910 You'll see more of it in section and beyond. 00:27:29.910 --> 00:27:33.480 So what else do we have in the way of tools in our toolkit here? 00:27:33.480 --> 00:27:35.888 Let's go ahead and introduce one other now. 00:27:35.888 --> 00:27:38.430 That one you've probably used this past week called check 50. 00:27:38.430 --> 00:27:41.740 This is a tool that allows you to analyze the correctness of your code.
00:27:41.740 --> 00:27:45.090 And you might recall with check 50, you did a little something like this. 00:27:45.090 --> 00:27:50.520 If I went ahead and whipped up a program, like my typical hello dot c-- 00:27:50.520 --> 00:27:54.160 so I've gone ahead and clicked Save, saving this file as hello dot c. 00:27:54.160 --> 00:27:57.900 Let me go ahead and include standard Io dot h, int main void. 00:27:57.900 --> 00:27:59.940 Let me go ahead now and printf. 00:27:59.940 --> 00:28:03.720 Hello comma world backslash n semicolon. 00:28:03.720 --> 00:28:06.870 And I know from the problem sets, that the way 00:28:06.870 --> 00:28:09.660 to check the correctness of this code with CS50-- 00:28:09.660 --> 00:28:12.570 check 50 and then a slug, a unique identifier. 00:28:12.570 --> 00:28:17.280 I'm using a shorter one just for lecture today called CS50 problems hello. 00:28:17.280 --> 00:28:20.760 That is just the unique set of tests that I want to run on my code 00:28:20.760 --> 00:28:22.095 called hello dot c. 00:28:22.095 --> 00:28:24.720 So what's happening here is I'm being prompted to authenticate. 00:28:24.720 --> 00:28:27.120 GitHub is what this uses, as you've seen. 00:28:27.120 --> 00:28:29.220 I'm going to go ahead and use my student account. 00:28:29.220 --> 00:28:32.020 I'm going to go ahead and log in. 00:28:32.020 --> 00:28:34.170 You'll notice a star represents your password, 00:28:34.170 --> 00:28:37.140 so it kind of sort of masks it, even though everyone in the world now 00:28:37.140 --> 00:28:39.007 knows how long my password is. 00:28:39.007 --> 00:28:41.340 And now we're preparing, we're uploading the submission, 00:28:41.340 --> 00:28:43.423 and in just a few seconds, we'll get some feedback 00:28:43.423 --> 00:28:46.650 from CS50's server that tells us, hopefully, 00:28:46.650 --> 00:28:49.110 that my code is perfectly correct-- 00:28:49.110 --> 00:28:50.520 perfectly correct. 00:28:50.520 --> 00:28:52.740 But no, it's not in this case. 
00:28:52.740 --> 00:28:54.570 And if you recall from problem set one, you 00:28:54.570 --> 00:28:56.362 weren't supposed to just print hello world. 00:28:56.362 --> 00:28:59.550 You were supposed to print hello so and so, whatever the human's name is. 00:28:59.550 --> 00:29:03.020 So you'll see two green smileys here saying hello dot c exists. 00:29:03.020 --> 00:29:04.140 So I got that one right. 00:29:04.140 --> 00:29:05.580 I named the file correctly. 00:29:05.580 --> 00:29:08.400 Step two, it compiled, so there were no error messages 00:29:08.400 --> 00:29:10.140 when we ran make on your code. 00:29:10.140 --> 00:29:12.660 But we did get unhappy twice. 00:29:12.660 --> 00:29:16.230 We expected when passing in the name Emma, for you to say hello Emma. 00:29:16.230 --> 00:29:19.590 And when we expected to pass in Rodrigo, we expected hello Rodrigo, 00:29:19.590 --> 00:29:22.510 so you did not pass these two tests. 00:29:22.510 --> 00:29:26.970 So check 50 happens to be CS50 specific, that the TF's and I use to grade 00:29:26.970 --> 00:29:29.460 and provide automated feedback on code, but it's 00:29:29.460 --> 00:29:33.120 representative of what in the real world are just quite simply called tests. 00:29:33.120 --> 00:29:36.300 Whenever you work for a company or write software, part of that process 00:29:36.300 --> 00:29:39.150 is typically not just to write the code that solves your problem, 00:29:39.150 --> 00:29:43.650 but to write tests that make sure that your own code is correct, especially 00:29:43.650 --> 00:29:47.160 so that if you add features to your programs down the road or someone else 00:29:47.160 --> 00:29:50.490 tries to add features to your code, they and you don't break it-- 00:29:50.490 --> 00:29:54.690 you constantly have a capability to make sure your code is still 00:29:54.690 --> 00:29:56.140 working as expected.
00:29:56.140 --> 00:29:59.460 So while we do use it in an academic context to score problem sets, 00:29:59.460 --> 00:30:02.610 it's fundamentally representative of a real-world process 00:30:02.610 --> 00:30:06.270 of testing one's own code repeatedly. 00:30:06.270 --> 00:30:08.760 And then lastly, there's this thing-- style 50. 00:30:08.760 --> 00:30:11.490 So it's not uncommon when learning how to program, especially 00:30:11.490 --> 00:30:13.830 in a language like C, to be a little sloppy when 00:30:13.830 --> 00:30:15.150 it comes to writing your code. 00:30:15.150 --> 00:30:18.300 Technically speaking, this same program here, 00:30:18.300 --> 00:30:19.800 I could just make it look like this. 00:30:19.800 --> 00:30:23.058 And frankly, if I really wanted to, I can make it look like this, 00:30:23.058 --> 00:30:24.600 and the computer's not going to care. 00:30:24.600 --> 00:30:27.142 It's smart enough to be able to distinguish the various curly 00:30:27.142 --> 00:30:29.280 braces from parentheses and semicolons. 00:30:29.280 --> 00:30:32.100 But my god, this is not very pleasant to look at. 00:30:32.100 --> 00:30:34.470 Or if it is right now, break that mindset. 00:30:34.470 --> 00:30:36.610 This is not very pleasant to look at. 00:30:36.610 --> 00:30:40.440 You should be writing code that's easier for you to read, for other people 00:30:40.440 --> 00:30:42.940 to read, and honestly, easier for you to maintain. 00:30:42.940 --> 00:30:46.530 There is nothing worse than writing really bad code, coming back 00:30:46.530 --> 00:30:49.140 to it weeks or months later to fix something, add something, 00:30:49.140 --> 00:30:52.590 and you don't even know what you're looking at because it's your own code. 00:30:52.590 --> 00:30:56.130 So style 50 is a tool that just helps you develop muscle 00:30:56.130 --> 00:30:58.410 memory for writing prettier code. 00:30:58.410 --> 00:31:00.870 Style has nothing to do with your code's correctness.
00:31:00.870 --> 00:31:04.500 It's more of the nitpicky aesthetics that just makes it pleasant to look at. 00:31:04.500 --> 00:31:08.100 And reasonable people will disagree as to what constitutes pretty code. 00:31:08.100 --> 00:31:11.160 With style 50, we, like a company, have standardized 00:31:11.160 --> 00:31:14.010 on what we would propose your C code looks like, 00:31:14.010 --> 00:31:17.470 so that we can have an objective measure of how clean it is. 00:31:17.470 --> 00:31:22.170 So if I go ahead and run, after saving my file, style 50 on hello dot c, 00:31:22.170 --> 00:31:24.660 Enter, you'll see some output like this. 00:31:24.660 --> 00:31:27.510 You'll see your same code in black and white at the bottom, 00:31:27.510 --> 00:31:30.665 but you'll see green text telling you where you should add space. 00:31:30.665 --> 00:31:32.790 So you should literally hit the spacebar four times 00:31:32.790 --> 00:31:35.130 and that will make style 50 happy. 00:31:35.130 --> 00:31:39.150 By contrast, if I instead do something like this, let me go ahead 00:31:39.150 --> 00:31:41.580 and correct it incorrectly. 00:31:41.580 --> 00:31:45.190 There are people in the world that write code that looks like this. 00:31:45.190 --> 00:31:46.950 This is frowned upon. 00:31:46.950 --> 00:31:50.310 But if I go ahead and run style 50 now on this file-- 00:31:50.310 --> 00:31:52.140 Enter-- you'll see the opposite. 00:31:52.140 --> 00:31:54.390 And it gets a little scarier with this syntax, 00:31:54.390 --> 00:31:57.780 because we're doing our best to explain what it is we want you to do. 00:31:57.780 --> 00:32:02.100 But we want you to delete the new line, the Enter key that you hit here, 00:32:02.100 --> 00:32:04.380 and we want you to pull it up to the top here, 00:32:04.380 --> 00:32:06.090 and we want you to delete that red here.
00:32:06.090 --> 00:32:08.430 So admittedly, it's sometimes hard for the computer 00:32:08.430 --> 00:32:11.950 to give you very straightforward advice as to what's going on. 00:32:11.950 --> 00:32:14.850 So you'll see over time, certain patterns. 00:32:14.850 --> 00:32:17.850 So in fact, if I go to CS50's own website here, 00:32:17.850 --> 00:32:20.980 let me go ahead and pull up what's called a style guide. 00:32:20.980 --> 00:32:22.740 And this is the authoritative answer when 00:32:22.740 --> 00:32:25.995 it comes to what your code should look like in a class or in a company. 00:32:25.995 --> 00:32:27.870 You'll see throughout this style guide that's 00:32:27.870 --> 00:32:31.590 online a lot of examples of what good code, pretty code, 00:32:31.590 --> 00:32:33.600 readable code should look like. 00:32:33.600 --> 00:32:35.670 And there, too, reasonable people will disagree, 00:32:35.670 --> 00:32:39.930 but it's part of the programming process to have good style for your code 00:32:39.930 --> 00:32:44.610 as well, and style 50 allows you to develop that muscle memory, as well. 00:32:44.610 --> 00:32:48.840 And one aside, whereas the sandbox tool used to auto save your file, 00:32:48.840 --> 00:32:50.350 the IDE does not do that. 00:32:50.350 --> 00:32:53.160 So notice I just hit Enter a couple of times in this file, 00:32:53.160 --> 00:32:57.390 or suppose I said something like Goodbye World more explicitly, and suppose I 00:32:57.390 --> 00:32:59.740 now move my cursor to the terminal window, 00:32:59.740 --> 00:33:02.840 you'll see a big red alert saying, hey, you did not save your file. 00:33:02.840 --> 00:33:06.090 That's because the IDE is meant to be a little more powerful and a little more 00:33:06.090 --> 00:33:10.020 of the onus now is on you to actually know OK, red dot up there 00:33:10.020 --> 00:33:11.280 means I should save. 00:33:11.280 --> 00:33:14.760 So file, Save, or you can hit Control s or Command s.
00:33:14.760 --> 00:33:18.900 So just realize that is now unto you. 00:33:18.900 --> 00:33:23.790 And lastly, a summary of what all these tools really figure into. 00:33:23.790 --> 00:33:25.620 Pretty much, the first four of these tools 00:33:25.620 --> 00:33:28.710 all relate to writing correct code, code 00:33:28.710 --> 00:33:32.280 that works the way you want it to, code the way we want it to, 00:33:32.280 --> 00:33:36.120 code the way that some problem to be solved wants you to implement it. 00:33:36.120 --> 00:33:40.080 Style is the last of those, and that's really the best categorization thereof. 00:33:40.080 --> 00:33:43.027 Of course, not always do these tools solve all of your problems. 00:33:43.027 --> 00:33:44.985 And undoubtedly, if you didn't experience this 00:33:44.985 --> 00:33:47.520 this past week already, you will get frustrated. 00:33:47.520 --> 00:33:51.450 You will get incredibly frustrated sometimes by some bug in your code 00:33:51.450 --> 00:33:52.950 and you might be staring at it. 00:33:52.950 --> 00:33:53.940 You might be thinking it through. 00:33:53.940 --> 00:33:56.790 You might try all of these darn tools, go to office hours, tutorials, 00:33:56.790 --> 00:33:59.400 and it's still not working out for you. 00:33:59.400 --> 00:34:01.920 Frankly, the solution there is to take a step back. 00:34:01.920 --> 00:34:06.580 And I can't emphasize enough the value of going for a jog, taking a break, 00:34:06.580 --> 00:34:08.580 doing something else, changing your mental model 00:34:08.580 --> 00:34:10.020 and coming back to it later. 00:34:10.020 --> 00:34:14.580 I have literally, and I'm sure many of the TF's and TA's have, solved code 00:34:14.580 --> 00:34:17.670 while falling asleep, because there, you're sort of thoughtfully 00:34:17.670 --> 00:34:20.639 thinking through what it is you did, what it is you're trying to do. 00:34:20.639 --> 00:34:23.738 But undoubtedly, it helps to talk through your problems sometimes.
00:34:23.738 --> 00:34:26.280 And there's this other term of art in computer science called 00:34:26.280 --> 00:34:27.570 rubber duck debugging. 00:34:27.570 --> 00:34:30.690 The idea being that if you don't have a TF at your side 00:34:30.690 --> 00:34:34.080 or CA at your side or roommate who has any idea what you're talking about when 00:34:34.080 --> 00:34:37.472 it comes to programming, you can have one of these little things on your desk 00:34:37.472 --> 00:34:39.389 that you can literally, probably with the door 00:34:39.389 --> 00:34:43.320 closed, start talking to, to explain to the duck, just like you would 00:34:43.320 --> 00:34:47.580 a teaching fellow, what it is you think your code is doing, walking through 00:34:47.580 --> 00:34:49.710 it line-by-line verbally, until hopefully, you 00:34:49.710 --> 00:34:52.710 have that self-induced aha moment, like oh, wait a minute, 00:34:52.710 --> 00:34:55.718 it's supposed to be 10 not 11, at which point, 00:34:55.718 --> 00:34:58.260 you discreetly put the duck back down and go about your work. 00:34:58.260 --> 00:35:01.320 But it is meant to be this proxy for just 00:35:01.320 --> 00:35:04.595 a very deliberate thoughtful process to which everyone is welcome. 00:35:04.595 --> 00:35:06.720 You're welcome to take a duck today on your way out 00:35:06.720 --> 00:35:08.490 and we have lots more tutorials and office hours, 00:35:08.490 --> 00:35:10.260 because this is not enough here today. 00:35:10.260 --> 00:35:12.900 This is just because it exists. 00:35:12.900 --> 00:35:18.390 But the goal with rubber duck debugging is just that additional human mechanism 00:35:18.390 --> 00:35:22.260 for solving problems by taking the emphasis off of tools 00:35:22.260 --> 00:35:24.150 and putting it really back on the human. 00:35:24.150 --> 00:35:27.000 So if a little socially awkwardly, consider 00:35:27.000 --> 00:35:30.870 deploying that tool as needed as well.
00:35:30.870 --> 00:35:34.110 So that's all focusing on correctness and style, 00:35:34.110 --> 00:35:36.323 and that's indeed what every problem set here on out 00:35:36.323 --> 00:35:37.740 is going to have as one component. 00:35:37.740 --> 00:35:40.020 Does it work correctly and is it well styled? 00:35:40.020 --> 00:35:42.090 But the third axis of quality, when it comes 00:35:42.090 --> 00:35:45.360 to writing software, not just for CS50 but really in general 00:35:45.360 --> 00:35:49.050 with programming in the real world, is this notion of design. 00:35:49.050 --> 00:35:53.070 And design isn't quite something that we can assess yet with software, 00:35:53.070 --> 00:35:55.020 and say you designed that well or you did not 00:35:55.020 --> 00:35:57.450 design that well, it's more of a subjective measure. 00:35:57.450 --> 00:35:59.610 And here, too, reasonable people can disagree. 00:35:59.610 --> 00:36:02.760 So what we'll focus on, not only today, but in the weeks to come, 00:36:02.760 --> 00:36:06.540 is also the process of writing well-designed software 00:36:06.540 --> 00:36:10.230 and making more intelligent decisions to not just get the problem solved, 00:36:10.230 --> 00:36:11.640 but to get it solved well. 00:36:11.640 --> 00:36:14.732 And this is what full-time software engineers at the Facebooks and Googles 00:36:14.732 --> 00:36:16.440 and Microsofts and others of the world do 00:36:16.440 --> 00:36:19.380 every day, especially when they have huge amounts of data 00:36:19.380 --> 00:36:20.730 and many, many users. 00:36:20.730 --> 00:36:25.560 Every design decision they make matters and might cost money or CPU cycles 00:36:25.560 --> 00:36:26.710 or memory. 00:36:26.710 --> 00:36:28.620 And indeed, think back to week zero, finding 00:36:28.620 --> 00:36:31.200 Mike Smith was possible in three different ways, 00:36:31.200 --> 00:36:33.270 but that third way, the divide and conquer, 00:36:33.270 --> 00:36:35.400 was hands down the most efficient. 
00:36:35.400 --> 00:36:37.830 That was better designed than the first couple. 00:36:37.830 --> 00:36:41.220 So let's now consider this in the context of programming 00:36:41.220 --> 00:36:45.900 and how we can use a few new features today in C to solve problems better 00:36:45.900 --> 00:36:48.540 and to write better designed code. 00:36:48.540 --> 00:36:52.080 And we'll do that first by way of something that is called an array. 00:36:52.080 --> 00:36:56.520 So an array is something that allows us to solve a problem, 00:36:56.520 --> 00:36:59.890 in perhaps, the following way. 00:36:59.890 --> 00:37:02.640 So in our computers-- 00:37:02.640 --> 00:37:06.390 in our programs in C, we have choices of bunches of data types. 00:37:06.390 --> 00:37:09.630 We've seen that there's chars, there's ints, there's floats, there's longs, 00:37:09.630 --> 00:37:12.510 there's doubles, there's bool, there's now string, 00:37:12.510 --> 00:37:14.550 and there's actually a few others as well. 00:37:14.550 --> 00:37:18.180 And each of those, depending on the computer system you're using, 00:37:18.180 --> 00:37:22.050 does take up a specific amount of space, on CS50, IDE, on the sandbox, 00:37:22.050 --> 00:37:24.750 and most likely on your own personal Macs and PCs. 00:37:24.750 --> 00:37:27.120 These days, each one of these data types, 00:37:27.120 --> 00:37:30.180 if you're writing a program in C, takes up this much space, 00:37:30.180 --> 00:37:33.630 where one byte is 8 bits, 4 bytes is 32 bits, 00:37:33.630 --> 00:37:37.450 8 bytes is 64 bits, to tie it back to week zero. 00:37:37.450 --> 00:37:39.930 So these are data types that we have at our disposal 00:37:39.930 --> 00:37:42.660 for any variables in our computer's memory. 00:37:42.660 --> 00:37:44.400 So why is that germane here? 00:37:44.400 --> 00:37:46.530 Well, this is that thing I showed a couple of weeks 00:37:46.530 --> 00:37:49.440 ago too, which is representative of RAM, random access memory. 
00:37:49.440 --> 00:37:52.880 It's one of the pieces of hardware in your Mac or PC or even phone these days. 00:37:52.880 --> 00:37:56.730 And each of these black chips represents some number of bytes. 00:37:56.730 --> 00:37:58.680 Odds are, small although it is in reality, 00:37:58.680 --> 00:38:02.730 it might represent a billion bytes if you have one gigabyte of memory, 00:38:02.730 --> 00:38:04.560 or maybe even more than that these days. 00:38:04.560 --> 00:38:07.860 But this little black chip, inside of your Mac, PC, or phone, 00:38:07.860 --> 00:38:11.190 is where information is stored when you're running software, 00:38:11.190 --> 00:38:13.860 whether it's on a desktop, or laptop, or mobile device. 00:38:13.860 --> 00:38:16.440 And we can actually think of this chip as just 00:38:16.440 --> 00:38:19.770 being divided into a bunch of different individual bytes. 00:38:19.770 --> 00:38:21.900 In fact, let's just arbitrarily zoom in on it 00:38:21.900 --> 00:38:24.030 and sort of divide it into rows and columns, 00:38:24.030 --> 00:38:27.870 and just claim that the top left here is going to be the first byte. 00:38:27.870 --> 00:38:30.240 This is the second byte, the third byte, and way down 00:38:30.240 --> 00:38:32.880 here is like the billionth byte of memory in my computer, 00:38:32.880 --> 00:38:36.420 obviously not drawn to scale, which is to say we can just number these bytes. 00:38:36.420 --> 00:38:38.830 So one, two, three, four, five, six, seven, eight, 00:38:38.830 --> 00:38:43.030 or to be really computer science like zero, one, two, three, four, five, six, 00:38:43.030 --> 00:38:45.010 seven, and so forth. 00:38:45.010 --> 00:38:46.860 So we don't have to know anything about how 00:38:46.860 --> 00:38:50.172 RAM works, electrically or physically, but let's 00:38:50.172 --> 00:38:52.380 just stipulate that if you've got some amount of RAM, 00:38:52.380 --> 00:38:56.220 we can surely think of each byte as having a number.
00:38:56.220 --> 00:38:57.510 So what does that do for us? 00:38:57.510 --> 00:39:01.560 Well if you write a program that has a char in it, a character, 00:39:01.560 --> 00:39:05.310 how big was a char according to the chart a moment ago? 00:39:05.310 --> 00:39:06.210 So just one byte. 00:39:06.210 --> 00:39:11.380 So if you allocate a char, called c, or called anything in your program, 00:39:11.380 --> 00:39:14.977 you will be asking the computer to use just one of these tiny little squares 00:39:14.977 --> 00:39:16.810 physically inside of your computer's memory. 00:39:16.810 --> 00:39:19.990 By contrast, how about an int-- how big was an int? 00:39:19.990 --> 00:39:20.590 Four bytes. 00:39:20.590 --> 00:39:23.130 So if you want to store a number as an integer, 00:39:23.130 --> 00:39:26.380 you're actually going to consume four of these bytes in your computer's memory 00:39:26.380 --> 00:39:26.880 instead. 00:39:26.880 --> 00:39:30.970 And if you're using a double or long, you might use as many as eight of them. 00:39:30.970 --> 00:39:32.903 So what is inside each of these boxes? 00:39:32.903 --> 00:39:35.320 There's eight bits here, eight bits here, eight bits here, 00:39:35.320 --> 00:39:37.960 or maybe it's eight little transistors, or even eight little light bulbs. 00:39:37.960 --> 00:39:41.050 Whatever they are, they're some way of representing zeros and ones. 00:39:41.050 --> 00:39:43.480 And that's what each of those boxes represents. 00:39:43.480 --> 00:39:45.410 So what can we do with this information? 00:39:45.410 --> 00:39:47.410 Well, let's go ahead and get rid of the hardware 00:39:47.410 --> 00:39:49.660 and abstract away, so to speak, as we keep doing, 00:39:49.660 --> 00:39:54.690 and consider if we zoom in here, how the computer, last week and this week 00:39:54.690 --> 00:39:58.990 and forever from here on out, is storing the information in the programs 00:39:58.990 --> 00:40:00.280 that you write.
00:40:00.280 --> 00:40:04.330 Suppose for instance, that we've got a program like this, 00:40:04.330 --> 00:40:05.770 with just three characters in it. 00:40:05.770 --> 00:40:11.860 I'm going to go ahead and whip this up in a file called, let's say, hi dot c. 00:40:11.860 --> 00:40:15.880 And I'm going to go ahead and do include standard Io dot h, int main void-- 00:40:18.550 --> 00:40:19.533 learning. 00:40:19.533 --> 00:40:22.450 Now in here, I'm going to go ahead and have those three lines of code. 00:40:22.450 --> 00:40:25.090 So give me one char called c1 arbitrarily 00:40:25.090 --> 00:40:27.820 and set it equal to a capital H. Give me another one called 00:40:27.820 --> 00:40:31.870 c2, set it equal to capital I. Give me a third called c3, 00:40:31.870 --> 00:40:34.630 and set that equal to the exclamation point. 00:40:34.630 --> 00:40:40.180 Now you'll notice one detail that I've not emphasized before, I don't think. 00:40:40.180 --> 00:40:44.350 What types of punctuation am I clearly using here? 00:40:44.350 --> 00:40:46.510 So single quotes or apostrophes here. 00:40:46.510 --> 00:40:49.840 Single quotes in C are necessary for chars. 00:40:49.840 --> 00:40:52.110 Chars or single characters, just one byte. 00:40:52.110 --> 00:40:54.610 Whenever you want to hardcode them into a program like this, 00:40:54.610 --> 00:40:56.450 like I've done here, use single quotes. 00:40:56.450 --> 00:40:59.540 Of course for strings we used double quotes. 00:40:59.540 --> 00:41:00.040 Why? 00:41:00.040 --> 00:41:00.820 Just because. 00:41:00.820 --> 00:41:03.160 Like C requires that we distinguish those two. 00:41:03.160 --> 00:41:05.518 So let me just do something a little silly here. 00:41:05.518 --> 00:41:07.810 Now that I've got three variables, let me just go ahead 00:41:07.810 --> 00:41:08.770 and print them all out. 00:41:08.770 --> 00:41:10.480 What is the format code I can print-- 00:41:10.480 --> 00:41:12.480 I can use to print a char? 
00:41:12.480 --> 00:41:13.960 Yeah, a percent-- 00:41:13.960 --> 00:41:15.340 AUDIENCE: [INAUDIBLE] 00:41:15.340 --> 00:41:18.475 DAVID MALAN: Percent c for char, so percent c, and I want three of them. 00:41:18.475 --> 00:41:21.380 So I'm going to print all three at once, followed by a new line. 00:41:21.380 --> 00:41:23.560 And then if I want to print c1 first, c2, 00:41:23.560 --> 00:41:28.270 c3, that's the syntax with printf for just plugging in three place 00:41:28.270 --> 00:41:31.390 holders followed by three values, respectively left to right, 00:41:31.390 --> 00:41:34.600 and hopefully it's going to print presumably hi 00:41:34.600 --> 00:41:36.500 on the screen followed by a new line. 00:41:36.500 --> 00:41:37.970 So let me save the file. 00:41:37.970 --> 00:41:39.860 Let me do make hi. 00:41:39.860 --> 00:41:41.200 OK, no errors, which is good. 00:41:41.200 --> 00:41:46.330 Let me do dot slash hi, and indeed I see hi exclamation point, however 00:41:46.330 --> 00:41:49.330 with a space in between each character. 00:41:49.330 --> 00:41:50.350 But you know what? 00:41:50.350 --> 00:41:56.200 hi exclamation point are indeed chars, but what is a char, or a character? 00:41:56.200 --> 00:41:58.740 What is an Ascii character underneath the hood? 00:41:58.740 --> 00:41:59.680 AUDIENCE: [INAUDIBLE] 00:41:59.680 --> 00:42:00.700 DAVID MALAN: It's ultimately binary. 00:42:00.700 --> 00:42:01.630 Everything is binary. 00:42:01.630 --> 00:42:04.025 And what's one step in between there, in some sense? 00:42:04.025 --> 00:42:04.900 AUDIENCE: [INAUDIBLE] 00:42:04.900 --> 00:42:06.733 DAVID MALAN: It's just a number, an integer. 00:42:06.733 --> 00:42:09.220 Thanks to Ascii and Unicode in week zero, 00:42:09.220 --> 00:42:12.007 there's just a mapping from characters to numbers. 00:42:12.007 --> 00:42:13.090 So how do I print numbers? 00:42:13.090 --> 00:42:15.154 What format code do I use for printf? 
00:42:15.154 --> 00:42:16.810 AUDIENCE: [INAUDIBLE] 00:42:16.810 --> 00:42:19.390 DAVID MALAN: Percent i, for integer. 00:42:19.390 --> 00:42:22.360 So suppose I want to actually see those values? 00:42:22.360 --> 00:42:23.710 Notice what I can do. 00:42:23.710 --> 00:42:26.020 I can tell the computer, you know what? 00:42:26.020 --> 00:42:30.190 Even though c1 is a char, please go ahead and treat it as an integer. 00:42:30.190 --> 00:42:33.740 And I can literally write int in parentheses before the variable, 00:42:33.740 --> 00:42:36.790 which is what's known as casting, C-A-S-T, 00:42:36.790 --> 00:42:41.410 which is just a verb describing the act of converting one data type to another 00:42:41.410 --> 00:42:43.240 so that I can actually see those numbers. 00:42:43.240 --> 00:42:45.240 So let me go ahead and save the file. 00:42:45.240 --> 00:42:50.530 Let me go ahead now and do make hi again. 00:42:50.530 --> 00:42:51.790 That seems to work fine. 00:42:51.790 --> 00:42:57.880 Dot slash hi, and now this old familiar 72, 73, 33. 00:42:57.880 --> 00:43:00.040 And frankly, I don't need to be so pedantic here. 00:43:00.040 --> 00:43:04.300 Frankly, clang is smart enough to just know that if I pass it a char, 00:43:04.300 --> 00:43:06.340 but I ask it to format it as an int, it's 00:43:06.340 --> 00:43:09.980 going to implicitly, not explicitly, cast it for me. 00:43:09.980 --> 00:43:13.720 So if I go ahead and run make hi again, and do dot slash hi, 00:43:13.720 --> 00:43:15.390 I'm going to see the exact same thing. 00:43:15.390 --> 00:43:17.890 So this understanding of what's going on underneath the hood 00:43:17.890 --> 00:43:20.260 can allow me to kind of tinker now and play around 00:43:20.260 --> 00:43:22.990 with what's going on inside of my computer's memory. 00:43:22.990 --> 00:43:25.120 But let's now see this more visually.
00:43:25.120 --> 00:43:27.700 If this is my computer's memory really magnified, 00:43:27.700 --> 00:43:31.270 such that there's like a billion squares somewhere available to me 00:43:31.270 --> 00:43:33.490 and this is zero, this is one, this is two. 00:43:33.490 --> 00:43:37.300 Suppose I have a program with three variables-- c1, c2, and c3-- 00:43:37.300 --> 00:43:39.100 what the computer is going to do is going 00:43:39.100 --> 00:43:41.180 to put the h in one of those boxes. 00:43:41.180 --> 00:43:43.180 It's going to put the i in another box, and it's 00:43:43.180 --> 00:43:45.190 going to put the exclamation point in a third box, 00:43:45.190 --> 00:43:48.490 and somehow or other it's going to label those with the names of the variables. 00:43:48.490 --> 00:43:52.420 It's going to sort of jot down as with a virtual pencil, this is c1, this is c2, 00:43:52.420 --> 00:43:53.500 this is c3. 00:43:53.500 --> 00:43:56.170 But it's the H-I exclamation point that's 00:43:56.170 --> 00:43:58.660 actually stored at that location. 00:43:58.660 --> 00:44:00.190 But of course, it's not just a char. 00:44:00.190 --> 00:44:01.697 It's really technically a number. 00:44:01.697 --> 00:44:04.030 So really what's going on inside of my computer's memory 00:44:04.030 --> 00:44:06.740 is that 72, 73, and 33 is stored. 00:44:06.740 --> 00:44:09.430 But someone called out earlier it's actually binary. 00:44:09.430 --> 00:44:13.030 So what's really underneath the hood is this. 00:44:13.030 --> 00:44:15.280 Those zeros and ones are somehow implemented 00:44:15.280 --> 00:44:18.190 with transistors or light bulbs or whatever the technology is, 00:44:18.190 --> 00:44:20.830 but it's just storing a pattern of zeros and ones. 00:44:20.830 --> 00:44:22.360 And I did out the math before class. 00:44:22.360 --> 00:44:26.650 This indeed represents 72 in decimal, 73, and 33. 
00:44:26.650 --> 00:44:30.310 But here, too, we're getting to a low-level implementation detail 00:44:30.310 --> 00:44:32.380 that we generally don't need to care about. 00:44:32.380 --> 00:44:35.260 Abstraction, per week zero, is this beautiful thing 00:44:35.260 --> 00:44:38.170 because we could just, meh, tune all that out and just think 00:44:38.170 --> 00:44:41.650 of it at any higher level that we want, whether it's decimal 00:44:41.650 --> 00:44:44.230 or whether it's actual Ascii characters. 00:44:44.230 --> 00:44:46.640 But that's all that's going on underneath the hood. 00:44:46.640 --> 00:44:47.140 Yeah? 00:44:47.140 --> 00:44:51.028 AUDIENCE: [INAUDIBLE] 00:44:55.383 --> 00:44:56.800 DAVID MALAN: Really good question. 00:44:56.800 --> 00:45:02.200 If you declared three variables as integers and stored 72, 73, 33 in them 00:45:02.200 --> 00:45:04.840 and tried to print them then with percent c, 00:45:04.840 --> 00:45:08.260 yes, you could coerce that behavior as well, and literally do the opposite. 00:45:08.260 --> 00:45:11.410 At that point, you need to know what the Ascii codes are-- 00:45:11.410 --> 00:45:12.850 72, 73, 33. 00:45:12.850 --> 00:45:15.460 And mostly, programmers don't care about that. 00:45:15.460 --> 00:45:18.340 All they do is know that there is some mapping underneath the hood, 00:45:18.340 --> 00:45:19.510 but absolutely. 00:45:19.510 --> 00:45:22.090 Well let's consider another example now, this time involving 00:45:22.090 --> 00:45:26.290 three scores, so three integers, instead of something like three characters. 00:45:26.290 --> 00:45:29.000 What might I actually do with values like this? 00:45:29.000 --> 00:45:32.110 Well, let me go ahead and write some code, this time in a file 00:45:32.110 --> 00:45:35.977 called scores dot c. 00:45:35.977 --> 00:45:38.560 I'm going to go ahead and clean up my terminal here and create 00:45:38.560 --> 00:45:42.220 a new file called scores dot c.
00:45:42.220 --> 00:45:45.550 And let's go ahead and do a few similar lines here. 00:45:45.550 --> 00:45:50.830 Let me go ahead and include say, CS50 dot h, include standard Io dot h, 00:45:50.830 --> 00:45:55.360 int main void, and now go ahead and start declaring some variables. 00:45:55.360 --> 00:45:56.790 Give me int score one. 00:45:56.790 --> 00:45:59.560 And I'm going to declare my score on some assignment 00:45:59.560 --> 00:46:03.900 to be 72, another score on an assignment to be about the same, 73, 00:46:03.900 --> 00:46:06.970 and another regrettable assignment to be, say, 33. 00:46:06.970 --> 00:46:09.910 So now I have three variables called integers, and suppose I just want 00:46:09.910 --> 00:46:11.770 to do something like print the average. 00:46:11.770 --> 00:46:14.170 I can certainly do this with printf and some math. 00:46:14.170 --> 00:46:18.537 So I might go ahead and say the average is % i, 00:46:18.537 --> 00:46:20.870 where that's going to be a placeholder, then a new line. 00:46:20.870 --> 00:46:23.912 And then the average, of course, is going to be something like score one, 00:46:23.912 --> 00:46:28.990 plus score two, plus score three, divided by three total, and then 00:46:28.990 --> 00:46:29.810 semicolon. 00:46:29.810 --> 00:46:30.760 So again, that's just the average. 00:46:30.760 --> 00:46:33.510 Add three numbers together, divide by the total number, and voila, 00:46:33.510 --> 00:46:35.020 we should get an average. 00:46:35.020 --> 00:46:40.120 Let me go ahead and save the file, compile this with make scores, Enter. 00:46:40.120 --> 00:46:42.370 Seems to compile OK-- dot slash scores. 00:46:42.370 --> 00:46:46.420 And I should get an average of 59 for those three quiz scores, or assignment 00:46:46.420 --> 00:46:48.260 scores, in this context. 00:46:48.260 --> 00:46:50.350 But this isn't the best design now. 
00:46:50.350 --> 00:46:52.600 Now that we're dealing with numbers and scores, 00:46:52.600 --> 00:46:55.100 especially in the context of like a class where maybe you're 00:46:55.100 --> 00:46:58.300 going to have four scores or five scores or more scores, ultimately, 00:46:58.300 --> 00:46:59.320 week to week. 00:46:59.320 --> 00:47:03.132 What rubs you perhaps the wrong way about this design so far? 00:47:03.132 --> 00:47:04.392 AUDIENCE: [INAUDIBLE] 00:47:04.392 --> 00:47:05.350 DAVID MALAN: Say again. 00:47:05.350 --> 00:47:06.752 AUDIENCE: I 00:47:06.752 --> 00:47:08.210 DAVID MALAN: Yeah, it's very fixed. 00:47:08.210 --> 00:47:10.310 This is like writing a program at the beginning of the semester 00:47:10.310 --> 00:47:13.102 and deciding in advance there's only going to be three assignments, 00:47:13.102 --> 00:47:15.140 and if you want to have a fourth, too bad. 00:47:15.140 --> 00:47:17.010 The software does not support it. 00:47:17.010 --> 00:47:18.260 So that's not the best design. 00:47:18.260 --> 00:47:21.650 And what else might you critique about this code, simple as it is. 00:47:21.650 --> 00:47:22.396 Yeah? 00:47:22.396 --> 00:47:26.204 AUDIENCE: [INAUDIBLE] 00:47:28.110 --> 00:47:30.600 DAVID MALAN: Yeah, I'm potentially cheating students out 00:47:30.600 --> 00:47:34.320 of a partial score, especially if their average was like 59.5. 00:47:34.320 --> 00:47:36.470 I would like to be rounded up to 60, for instance. 00:47:36.470 --> 00:47:38.520 So we're also having some imprecision issues. 00:47:38.520 --> 00:47:40.020 And we'll come back to that as well. 00:47:40.020 --> 00:47:40.740 Any other critiques? 00:47:40.740 --> 00:47:41.240 Yeah? 00:47:41.240 --> 00:47:44.720 AUDIENCE: [INAUDIBLE] 00:47:44.720 --> 00:47:47.930 DAVID MALAN: Yeah, even though I typed it out manually, 00:47:47.930 --> 00:47:51.137 this is dangerously close to just copying and pasting the same code again 00:47:51.137 --> 00:47:51.970 and again and again. 
00:47:51.970 --> 00:47:55.570 So just as with the hi example, as with this one, as with our cough example 00:47:55.570 --> 00:47:59.470 last week and the week before, just doing this thing again and again 00:47:59.470 --> 00:48:02.260 and again is really an opportunity for a better design. 00:48:02.260 --> 00:48:05.020 So it turns out, there is that opportunity. 00:48:05.020 --> 00:48:10.490 And in C, if you know that you want to have more than just one value, 00:48:10.490 --> 00:48:12.610 but they're all kind of related, what might 00:48:12.610 --> 00:48:16.720 be a nice name for a variable containing multiple scores? 00:48:16.720 --> 00:48:17.620 AUDIENCE: [INAUDIBLE] 00:48:17.620 --> 00:48:19.420 DAVID MALAN: Scores plural in English. 00:48:19.420 --> 00:48:20.990 So how can we do that? 00:48:20.990 --> 00:48:23.650 Well unfortunately, if I just say int scores, 00:48:23.650 --> 00:48:25.650 I need to decide which score it gets as a value. 00:48:25.650 --> 00:48:27.942 Now those of you who have prior programming experience, 00:48:27.942 --> 00:48:31.030 might know where we're going with this, and we're about to get there. 00:48:31.030 --> 00:48:34.990 It turns out in C, if you want to have one variable that 00:48:34.990 --> 00:48:39.430 can store multiple values, you use what's called an array. 00:48:39.430 --> 00:48:44.680 An array is a list of values that can be all the same type 00:48:44.680 --> 00:48:46.790 in a variable of the same name. 00:48:46.790 --> 00:48:50.140 So if you want three scores, each of which is an int in C, 00:48:50.140 --> 00:48:53.620 you literally use square brackets, the number of scores you want, 00:48:53.620 --> 00:48:54.640 and then a semicolon. 00:48:54.640 --> 00:48:58.750 That will say to the computer, give me enough memory for three integers. 00:48:58.750 --> 00:49:01.340 Down here now, I get to change my syntax. 00:49:01.340 --> 00:49:03.640 I don't want score one, score two, score three.
00:49:03.640 --> 00:49:10.240 I want to put these scores inside of the array by simply saying its name, 00:49:10.240 --> 00:49:14.530 using square brackets, albeit a little differently this time, 00:49:14.530 --> 00:49:17.420 and put them at locations one, two, three, 00:49:17.420 --> 00:49:19.150 but that's actually my first mistake. 00:49:19.150 --> 00:49:22.320 Computer scientists typically start counting at one-- 00:49:22.320 --> 00:49:26.170 no-- computer scientists typically start counting at zero, 00:49:26.170 --> 00:49:29.590 so I need to zero index my array. 00:49:29.590 --> 00:49:34.360 Arrays are zero indexed, which just means the first location is zero, 00:49:34.360 --> 00:49:36.730 the second is one, the third is two. 00:49:36.730 --> 00:49:39.960 So this now, is equivalent code to giving me three variables, 00:49:39.960 --> 00:49:42.460 but now I've gotten rid of the messiness that you identified 00:49:42.460 --> 00:49:44.560 by copying and pasting the name again and again, 00:49:44.560 --> 00:49:46.374 and I can store them all together. 00:49:46.374 --> 00:49:50.573 AUDIENCE: On the scores, the number three stands for three variables, 00:49:50.573 --> 00:49:51.073 right? 00:49:51.073 --> 00:49:53.440 It doesn't stand for four? 00:49:53.440 --> 00:49:56.350 DAVID MALAN: Does the three stand for three variables? 00:49:56.350 --> 00:50:02.110 It stands for enough space for three values in one variable. 00:50:02.110 --> 00:50:03.140 Good question. 00:50:03.140 --> 00:50:05.035 Others, questions? 00:50:05.035 --> 00:50:05.535 Yeah? 00:50:05.535 --> 00:50:10.063 AUDIENCE: [INAUDIBLE] bringing equals and then [INAUDIBLE] 00:50:10.063 --> 00:50:11.480 DAVID MALAN: Really good question. 00:50:11.480 --> 00:50:13.070 Can you do this all in one line? 00:50:13.070 --> 00:50:15.710 Yes, but let me just tease you by saying something 00:50:15.710 --> 00:50:18.530 like this involving curly braces, but we won't go there today. 
00:50:18.530 --> 00:50:20.750 But yes, there are ways to get around this. 00:50:20.750 --> 00:50:22.337 So let me go ahead and fix this now. 00:50:22.337 --> 00:50:24.170 If I want to compute the average now, I need 00:50:24.170 --> 00:50:30.470 to add these three values in this array, scores zero, scores one, and scores two. 00:50:30.470 --> 00:50:32.510 But arithmetically, the answer-- 00:50:32.510 --> 00:50:37.280 the code is still the same, so if I now make scores and do dot slash scores, 00:50:37.280 --> 00:50:38.680 my average is still 59. 00:50:38.680 --> 00:50:41.180 And I do disclaim, there's still probably a mathematical bug 00:50:41.180 --> 00:50:43.580 because we're using integers, as was noted, 00:50:43.580 --> 00:50:46.110 but we'll come back to that in just a little bit. 00:50:46.110 --> 00:50:47.360 So let's push a little harder. 00:50:47.360 --> 00:50:50.720 Even if you've never programmed before, what might still 00:50:50.720 --> 00:50:52.970 be a little bad about the design. 00:50:52.970 --> 00:50:55.250 The program works, but we can do it better. 00:50:55.250 --> 00:50:57.170 AUDIENCE: Still only stores three. 00:50:57.170 --> 00:50:58.190 DAVID MALAN: Still only stores three. 00:50:58.190 --> 00:51:00.232 So we haven't even solved the very first problem. 00:51:00.232 --> 00:51:01.010 Other critiques? 00:51:01.010 --> 00:51:02.845 AUDIENCE: [INAUDIBLE] 00:51:02.845 --> 00:51:04.970 DAVID MALAN: I have too much code in the last line. 00:51:04.970 --> 00:51:06.710 Yeah, it's getting a little wordy, so it's 00:51:06.710 --> 00:51:08.752 going to be a little harder to read-- quite fair. 00:51:08.752 --> 00:51:09.570 Yeah? 00:51:09.570 --> 00:51:10.154 AUDIENCE: I 00:51:10.154 --> 00:51:11.946 DAVID MALAN: Sorry, say it a little louder. 00:51:11.946 --> 00:51:14.080 AUDIENCE: The scores are hardcoded into the program.
00:51:14.080 --> 00:51:16.320 DAVID MALAN: Yeah, the scores are hardcoded into the program, 00:51:16.320 --> 00:51:18.110 which means it doesn't matter what you get on your assignments, 00:51:18.110 --> 00:51:19.670 we're all getting 59's. 00:51:19.670 --> 00:51:21.400 So that's another problem as well. 00:51:21.400 --> 00:51:22.400 And any other critiques? 00:51:22.400 --> 00:51:23.060 Yeah? 00:51:23.060 --> 00:51:25.770 AUDIENCE: If it could read the input data, it might be better. 00:51:25.770 --> 00:51:27.770 DAVID MALAN: If it could read input data-- yeah, 00:51:27.770 --> 00:51:29.270 so let me combine those suggestions. 00:51:29.270 --> 00:51:31.930 It'd be great if, eventually, this program is dynamic. 00:51:31.930 --> 00:51:32.850 And anything else? 00:51:32.850 --> 00:51:33.513 Yeah? 00:51:33.513 --> 00:51:35.260 AUDIENCE: [INAUDIBLE] 00:51:35.260 --> 00:51:36.260 DAVID MALAN: Definitely. 00:51:36.260 --> 00:51:38.360 We can pull a loop into the situation and actually 00:51:38.360 --> 00:51:40.595 get multiple values from the user. 00:51:40.595 --> 00:51:44.470 AUDIENCE: Always dividing by three, so [INAUDIBLE] 00:51:44.470 --> 00:51:46.720 DAVID MALAN: Yeah, it's also always dividing by three. 00:51:46.720 --> 00:51:49.780 And this is subtle, and it's not a huge problem yet, 00:51:49.780 --> 00:51:53.290 but there is this principle I'm kind of violating here known 00:51:53.290 --> 00:51:54.940 as don't repeat yourself. 00:51:54.940 --> 00:51:58.090 And I have repeated myself in at least two locations. 00:51:58.090 --> 00:52:01.310 What values appear in two locations? 00:52:01.310 --> 00:52:04.750 So three up here, and then also three down here.
00:52:04.750 --> 00:52:10.240 And minor though this detail seems, this is the source of so many common bugs 00:52:10.240 --> 00:52:12.360 because if you just kind of decide by yourself, 00:52:12.360 --> 00:52:13.630 well, I'm going to hard code three up here, 00:52:13.630 --> 00:52:15.970 I'm going to hard code three down here, odds are, 00:52:15.970 --> 00:52:18.650 tomorrow morning, next week, next month, next year, 00:52:18.650 --> 00:52:20.920 let alone a colleague of yours, is never going 00:52:20.920 --> 00:52:24.580 to notice the subtlety that this three just by social contract 00:52:24.580 --> 00:52:26.690 has to be the same as this three. 00:52:26.690 --> 00:52:27.940 That is not a code constraint. 00:52:27.940 --> 00:52:31.390 That's just sort of a little thing you knew and decided at the time. 00:52:31.390 --> 00:52:33.350 So let me fix this in the following way. 00:52:33.350 --> 00:52:38.260 It turns out that in C we can have variables that just have numbers 00:52:38.260 --> 00:52:41.560 like this, so maybe int n gets three. 00:52:41.560 --> 00:52:45.670 I can now just use my variable here and here. 00:52:45.670 --> 00:52:46.725 That's a little better. 00:52:46.725 --> 00:52:47.600 It's a little better. 00:52:47.600 --> 00:52:50.475 But there's this other feature in C, as with other languages too, 00:52:50.475 --> 00:52:52.600 where if you know you want to hard code some value, 00:52:52.600 --> 00:52:56.140 at least for now, but you don't want it to change, you will not change it 00:52:56.140 --> 00:52:58.780 and you want to make sure you don't accidentally change it, 00:52:58.780 --> 00:53:02.410 you can actually do something like this and even make it global if we want, 00:53:02.410 --> 00:53:09.192 at the top of the file, I can say not just int n, but const int n, 00:53:09.192 --> 00:53:10.900 and just because of human convention, I'm 00:53:10.900 --> 00:53:13.940 also going to now capitalize the variable, just because. 
00:53:13.940 --> 00:53:17.290 And now I'm going to change this n to capital, this n to capital. 00:53:17.290 --> 00:53:20.702 The reason being, I have just created for myself what's called a constant. 00:53:20.702 --> 00:53:23.410 A constant is exactly what the word implies, even though you just 00:53:23.410 --> 00:53:26.680 say const, and then the type of the variable, the compiler, clang, 00:53:26.680 --> 00:53:29.560 will make sure that neither you nor some friend or colleague 00:53:29.560 --> 00:53:31.960 accidentally change the value of n. 00:53:31.960 --> 00:53:35.690 So now you can use n here, here, and any number of other places. 00:53:35.690 --> 00:53:37.697 It will always be the same. 00:53:37.697 --> 00:53:40.780 And what I'm using at the moment is what's called a global variable, which 00:53:40.780 --> 00:53:43.570 are often frowned upon, even though you can put variables outside 00:53:43.570 --> 00:53:45.760 of your functions, as we may eventually see, 00:53:45.760 --> 00:53:49.030 it tends to be sloppy, except with constants. 00:53:49.030 --> 00:53:53.765 When a constant is a value that you want to set and then forget about, 00:53:53.765 --> 00:53:56.890 if you come back to this program weeks or months later, and you're like oh, 00:53:56.890 --> 00:53:59.200 this semester we have four assignments, or five, 00:53:59.200 --> 00:54:02.080 it's just handy to put the values you might 00:54:02.080 --> 00:54:05.680 want to change before recompiling your code at the very top 00:54:05.680 --> 00:54:09.185 so you don't have to go fishing for them visually lower in your code. 00:54:09.185 --> 00:54:10.060 So just a convention. 00:54:10.060 --> 00:54:13.210 It goes at the top of the file, quite often, and you declare it as const, 00:54:13.210 --> 00:54:19.280 and you capitalize it, and then you can use that value, n, throughout the code.
00:54:19.280 --> 00:54:22.480 But now let's tie together those other suggestions and make this program 00:54:22.480 --> 00:54:24.430 even better, such that it's not just hard 00:54:24.430 --> 00:54:27.940 coding this one value, n, everywhere. 00:54:27.940 --> 00:54:30.170 Let me go ahead and get rid of this. 00:54:30.170 --> 00:54:33.460 Let me go ahead now and take your suggestion that we do this dynamically, 00:54:33.460 --> 00:54:35.590 and we can use arrays for this too. 00:54:35.590 --> 00:54:39.700 If I know in advance that I want to ask the user for how many assignments there 00:54:39.700 --> 00:54:42.280 are this semester, well I can do something like this. 00:54:42.280 --> 00:54:48.800 Int n gets get int, and I'll say number of scores, 00:54:48.800 --> 00:54:50.950 and then prompt them for their input. 00:54:50.950 --> 00:54:54.550 And then what I'm going to do after that is give myself an array 00:54:54.550 --> 00:54:57.910 called scores of size n as step two. 00:54:57.910 --> 00:55:00.130 And then what I might do is something like this. 00:55:00.130 --> 00:55:05.140 For int i gets zero, i less than n, i plus plus, 00:55:05.140 --> 00:55:08.140 which even though I'm typing it fast, is exactly the same paradigm we've 00:55:08.140 --> 00:55:10.360 used before, for, for loops. 00:55:10.360 --> 00:55:12.610 And here, I could do something like scores 00:55:12.610 --> 00:55:21.490 bracket i gets get int score semicolon, prompting the user again and again 00:55:21.490 --> 00:55:24.730 and again in a loop for the i-th score, so to speak. 00:55:24.730 --> 00:55:29.230 And because I start counting at zero, and on up to, but not through n, 00:55:29.230 --> 00:55:34.030 I will end up filling this with exactly as many scores as the human requested. 00:55:34.030 --> 00:55:37.568 Let's go ahead now and leave this as a to do for a moment.
00:55:37.568 --> 00:55:39.610 Let me just because the math's about to change-- 00:55:39.610 --> 00:55:42.485 let me go ahead and delete that and we'll just not do the average yet 00:55:42.485 --> 00:55:44.380 just so I can compile this first. 00:55:44.380 --> 00:55:46.540 I'm going to go ahead and make scores again-- 00:55:46.540 --> 00:55:47.610 seems to compile. 00:55:47.610 --> 00:55:55.460 Dot slash scores, number of scores-- let's do three, so 72, 73, 33, Enter, 00:55:55.460 --> 00:55:56.710 and my average is still to do. 00:55:56.710 --> 00:55:58.030 So we'll come back to that. 00:55:58.030 --> 00:55:58.400 But you know what? 00:55:58.400 --> 00:56:00.430 It would be nice to make this a little prettier. 00:56:00.430 --> 00:56:03.850 Why don't I tell the human what score I want from them, so I can say, give me 00:56:03.850 --> 00:56:05.910 score number such and such, i. 00:56:05.910 --> 00:56:09.730 So let me just use get int, like this. 00:56:09.730 --> 00:56:13.810 Now let me go ahead and make scores, dot slash scores. 00:56:13.810 --> 00:56:15.110 Give me three scores again. 00:56:15.110 --> 00:56:18.633 Score zero, 72, 73, 33. 00:56:18.633 --> 00:56:20.050 Now this is kind of stupid, right? 00:56:20.050 --> 00:56:23.230 At least for normal people who might use my program, what is score zero? 00:56:23.230 --> 00:56:24.190 What is score one? 00:56:24.190 --> 00:56:27.998 We can fix this for normal people, and just do that. 00:56:27.998 --> 00:56:30.040 We're not changing where we're putting the value, 00:56:30.040 --> 00:56:32.665 but we can certainly change the aesthetics of what we're doing. 00:56:32.665 --> 00:56:34.230 So let's remake scores. 00:56:34.230 --> 00:56:37.050 Dot slash scores, and now it's more human friendly-- 00:56:37.050 --> 00:56:40.300 72, 73, 33. 00:56:40.300 --> 00:56:41.620 So one piece remains.
00:56:41.620 --> 00:56:43.870 How do I now compute the average in a way 00:56:43.870 --> 00:56:46.620 that's dynamic and I'm not hard coding score one, score two, score 00:56:46.620 --> 00:56:48.660 three again, or even the array version? 00:56:48.660 --> 00:56:49.410 And you know what? 00:56:49.410 --> 00:56:51.202 This is a nice opportunity to maybe come up 00:56:51.202 --> 00:56:55.160 with a helper function that also solves the int issue from before. 00:56:55.160 --> 00:56:56.910 So let me go ahead and say, you know what? 00:56:56.910 --> 00:56:59.280 The average could perhaps have a fraction. 00:56:59.280 --> 00:57:02.940 So what data type do I want to use if my average might have a fraction? 00:57:02.940 --> 00:57:03.860 So a double or float. 00:57:03.860 --> 00:57:04.860 So we'll go with either. 00:57:04.860 --> 00:57:08.190 I'll keep it simple because the scores aren't going to be crazy big or precise. 00:57:08.190 --> 00:57:10.540 I'm going to create a function called average. 00:57:10.540 --> 00:57:14.820 And if I want to average all of the numbers that the human has typed in, 00:57:14.820 --> 00:57:16.710 turns out I need to know two things. 00:57:16.710 --> 00:57:20.790 I need to know the length of the array that they've been accumulating 00:57:20.790 --> 00:57:23.825 and I need to have the array itself, so I'm 00:57:23.825 --> 00:57:25.950 going to denote it with these square brackets here. 00:57:25.950 --> 00:57:28.830 I don't have to know, at this point, how big it is. 00:57:28.830 --> 00:57:30.990 The compiler will figure that out for me. 00:57:30.990 --> 00:57:34.720 But I can now declare a function like this. 00:57:34.720 --> 00:57:38.318 Well how do you go about averaging some number of values, 00:57:38.318 --> 00:57:40.860 if you're handed them in a list, otherwise known as an array, 00:57:40.860 --> 00:57:45.390 but I'm telling you the length of that list, what's this sort of intuition 00:57:45.390 --> 00:57:48.390 for taking an average here? 
00:57:48.390 --> 00:57:49.068 Yeah? 00:57:49.068 --> 00:57:53.280 AUDIENCE: You could take the sum and then divide it by [INAUDIBLE] number. 00:57:53.280 --> 00:57:54.470 DAVID MALAN: Yeah. 00:57:54.470 --> 00:57:56.220 Yeah, the average of a bunch of numbers is 00:57:56.220 --> 00:57:58.262 just add all the numbers together and then divide 00:57:58.262 --> 00:57:59.640 by the total number of numbers. 00:57:59.640 --> 00:58:01.140 And I have all of those ingredients. 00:58:01.140 --> 00:58:03.030 I have the length of the array, apparently, 00:58:03.030 --> 00:58:05.505 and I have the array of numbers itself, as follows. 00:58:05.505 --> 00:58:07.380 So let me go ahead and say something like sum 00:58:07.380 --> 00:58:09.880 is zero, because I'm just going to start counting from zero, 00:58:09.880 --> 00:58:14.580 and then I'm going to do for int i get zero, i less than length, i plus plus. 00:58:14.580 --> 00:58:17.850 So again, I typed it fast, but it's identical to my for loop from before. 00:58:17.850 --> 00:58:20.310 I'm just using the length as the condition. 00:58:20.310 --> 00:58:21.840 And now what do I want to do here? 00:58:21.840 --> 00:58:26.800 On each iteration, what do I want to add to the sum? 00:58:26.800 --> 00:58:28.460 Sum equals sum plus what? 00:58:28.460 --> 00:58:29.883 AUDIENCE: [INAUDIBLE] 00:58:29.883 --> 00:58:31.550 DAVID MALAN: The next item in the array. 00:58:31.550 --> 00:58:33.380 And I can express that, it turns out, just 00:58:33.380 --> 00:58:37.070 like before the name of the array, which happens to be literally array, just 00:58:37.070 --> 00:58:38.160 for convenience. 00:58:38.160 --> 00:58:41.520 And then how do I get the appropriate value from it? 00:58:41.520 --> 00:58:45.040 Bracket i, because i is going to start in this loop at zero, 00:58:45.040 --> 00:58:47.630 going to go up to, but not through its length. 
00:58:47.630 --> 00:58:50.630 So this is just a way of getting bracket zero, bracket one, bracket two, 00:58:50.630 --> 00:58:53.510 and just adding it to sum on each iteration. 00:58:53.510 --> 00:58:55.820 Now this is unnecessarily wordy. 00:58:55.820 --> 00:58:59.210 Recall, that this is shorthand notation for that. 00:58:59.210 --> 00:59:01.940 I can't just use plus, plus here though, because I want 00:59:01.940 --> 00:59:04.070 to add the actual scores not just one. 00:59:04.070 --> 00:59:07.280 So I can use either this syntax or the more verbose syntax, 00:59:07.280 --> 00:59:08.540 but I'll go with this one. 00:59:08.540 --> 00:59:11.480 And now at the end of this function, notice I have to make a decision. 00:59:11.480 --> 00:59:14.120 And we haven't seen terribly many functions of our own, 00:59:14.120 --> 00:59:17.420 but if this is what my function looks like, its name is average, 00:59:17.420 --> 00:59:22.100 it takes two inputs, one of which is an int called length, the other of which 00:59:22.100 --> 00:59:26.150 is an array of integers, and I know it's an array not by its name, which 00:59:26.150 --> 00:59:30.290 I could have called anything, but I know it because of these new square brackets 00:59:30.290 --> 00:59:31.790 today. 00:59:31.790 --> 00:59:36.405 However, what does this mention of float mean on the left-hand side of line 18? 00:59:36.405 --> 00:59:37.280 AUDIENCE: [INAUDIBLE] 00:59:37.280 --> 00:59:38.780 DAVID MALAN: That's what it returns. 00:59:38.780 --> 00:59:42.740 The return value of a function is what it hands back to whoever is using it. 00:59:42.740 --> 00:59:44.690 So get string, returns a string. 00:59:44.690 --> 00:59:46.160 Get int, returns an int. 00:59:46.160 --> 00:59:48.860 Average I want to return a float. 00:59:48.860 --> 00:59:51.000 And so how do I return this value? 00:59:51.000 --> 00:59:55.120 Well, let me go ahead and return the sum divided by the length, 00:59:55.120 --> 00:59:56.990 as I think you proposed? 
00:59:56.990 --> 01:00:00.748 Now there's actually one bug here, but we'll come back to that in a moment. 01:00:00.748 --> 01:00:02.790 Now let me just go ahead and plug in the average. 01:00:02.790 --> 01:00:06.240 What's the format code for a floating point value? 01:00:06.240 --> 01:00:07.400 Percent f, yeah. 01:00:07.400 --> 01:00:09.710 And then if I want to plug in the average, 01:00:09.710 --> 01:00:12.600 I can call my function called average. 01:00:12.600 --> 01:00:15.440 And what two inputs do I need to give it? 01:00:15.440 --> 01:00:20.760 n, which is the length of the array, and scores, which is the name of the array. 01:00:20.760 --> 01:00:22.970 So again, even though arrays are new, this is not. 01:00:22.970 --> 01:00:27.230 We have last week called functions that take one or more arguments 01:00:27.230 --> 01:00:28.830 and it's certainly fine to nest them. 01:00:28.830 --> 01:00:30.913 However, if you don't like that, you can certainly 01:00:30.913 --> 01:00:32.720 do something like this-- float average gets 01:00:32.720 --> 01:00:34.890 that, and then you can plug in average. 01:00:34.890 --> 01:00:36.920 But again, in the spirit of good design, you're 01:00:36.920 --> 01:00:39.260 just doubling the number of lines unnecessarily. 01:00:39.260 --> 01:00:42.710 So I'm going to go ahead and nest it just like this. 01:00:42.710 --> 01:00:44.430 All right, let me save that. 01:00:44.430 --> 01:00:46.527 And I feel really good about this so far. 01:00:46.527 --> 01:00:48.110 I feel like everything's making sense. 01:00:48.110 --> 01:00:49.220 So make scores. 01:00:49.220 --> 01:00:50.690 And oh, my god. 01:00:53.450 --> 01:00:58.042 Line 15 seems to be at fault. So we can certainly use help 50, 01:00:58.042 --> 01:00:59.750 but let's see if we can't reason through. 01:00:59.750 --> 01:01:00.792 What mistake have I made? 01:01:04.130 --> 01:01:06.920 It's highlighted here, even though it's very non obvious. 01:01:06.920 --> 01:01:07.862 Yeah? 
01:01:07.862 --> 01:01:12.101 AUDIENCE: [INAUDIBLE] 01:01:12.685 --> 01:01:13.560 DAVID MALAN: Exactly. 01:01:13.560 --> 01:01:16.230 My function is at the bottom of my file and C is kind of dumb. 01:01:16.230 --> 01:01:18.730 It only does what it's told, top to bottom, left to right. 01:01:18.730 --> 01:01:20.670 And if your function average is at the bottom, 01:01:20.670 --> 01:01:23.400 but you're trying to use it in main, that's too late. 01:01:23.400 --> 01:01:26.380 So we can fix this in a couple of ways, just as we did last week. 01:01:26.380 --> 01:01:28.380 I can kind of sloppily just say, all right, well 01:01:28.380 --> 01:01:29.630 let's just move it to the top. 01:01:29.630 --> 01:01:31.350 That will solve that problem. 01:01:31.350 --> 01:01:33.727 But frankly, that moves main farther down 01:01:33.727 --> 01:01:36.060 and it's a good human convention to keep main at the top 01:01:36.060 --> 01:01:38.200 so you can see the main part of your program. 01:01:38.200 --> 01:01:41.700 This is why, last week, we introduced the notion of a prototype, 01:01:41.700 --> 01:01:44.940 where you literally-- and this is the only time where the copy-paste is OK-- 01:01:44.940 --> 01:01:48.810 you copy-paste the first line of your function and end it with a semicolon 01:01:48.810 --> 01:01:50.430 without any more curly braces. 01:01:50.430 --> 01:01:52.200 That's now a clue to solve that problem. 01:01:52.200 --> 01:01:53.770 Hey clang, here's a function. 01:01:53.770 --> 01:01:55.895 I'm not going to get around to implementing it yet, 01:01:55.895 --> 01:01:57.570 but you at least know what it's called. 01:01:57.570 --> 01:02:00.090 Now there's still a slight logical bug in here. 01:02:00.090 --> 01:02:04.270 Let me try re-saving and recompiling scores. 01:02:04.270 --> 01:02:05.880 It compiled this time-- nice. 01:02:05.880 --> 01:02:07.410 Let me go ahead and run scores. 01:02:07.410 --> 01:02:13.320 Number of scores will be three, 72, 73, 33.
01:02:13.320 --> 01:02:14.560 OK, that's pretty good. 01:02:14.560 --> 01:02:15.730 Let me try another one. 01:02:15.730 --> 01:02:17.190 How about two scores. 01:02:17.190 --> 01:02:21.570 100 and suppose you get a 99 on the other, 01:02:21.570 --> 01:02:24.620 you probably want your grade to be what? 01:02:24.620 --> 01:02:25.380 100, right. 01:02:25.380 --> 01:02:28.240 If it's 99.5, you'd prefer we round up. 01:02:28.240 --> 01:02:30.360 So where is that bug? 01:02:30.360 --> 01:02:32.490 Well let me scroll down here, and this is 01:02:32.490 --> 01:02:35.370 what you were alluding to earlier when you identified this early on. 01:02:35.370 --> 01:02:37.650 So I'm doing a couple of things incorrectly here. 01:02:37.650 --> 01:02:40.950 One, I'm adding the sum here. 01:02:40.950 --> 01:02:44.280 I'm using an int and initializing sum to zero, 01:02:44.280 --> 01:02:46.600 and then I'm dividing an integer by an integer. 01:02:46.600 --> 01:02:50.190 And this is subtle, but in C, if you divide an integer by an integer, 01:02:50.190 --> 01:02:52.920 just take a guess-- what do you get as the answer? 01:02:52.920 --> 01:02:53.800 AUDIENCE: An integer. 01:02:53.800 --> 01:02:54.420 DAVID MALAN: An integer. 01:02:54.420 --> 01:02:56.170 Integers can't store decimal points. 01:02:56.170 --> 01:03:01.398 So even if your score is 99.900000 ad nauseam, 01:03:01.398 --> 01:03:04.440 what's going to get thrown away is literally everything after the decimal 01:03:04.440 --> 01:03:05.140 point. 01:03:05.140 --> 01:03:07.290 So your grade is actually a 99. 01:03:07.290 --> 01:03:11.550 So there's a couple of ways we can fix this, but perhaps the simplest is this. 01:03:11.550 --> 01:03:14.220 I can use that casting feature from before. 01:03:14.220 --> 01:03:16.363 I can tell the computer, don't treat length 01:03:16.363 --> 01:03:19.530 as an int, actually treat it as a float, and you know, just for good measure, 01:03:19.530 --> 01:03:20.820 also treat sum as a float.
01:03:20.820 --> 01:03:23.670 And there's different ways to do this, but now, I'm 01:03:23.670 --> 01:03:26.670 telling the computer divide a float by a float, which 01:03:26.670 --> 01:03:30.090 will allow me to return a float, and let's see what happens now. 01:03:30.090 --> 01:03:31.590 Let me save that. 01:03:31.590 --> 01:03:33.570 Make scores. 01:03:33.570 --> 01:03:34.350 It compiled. 01:03:34.350 --> 01:03:35.840 Dot slash scores. 01:03:35.840 --> 01:03:36.850 Number of scores is two. 01:03:36.850 --> 01:03:38.040 100 is the first. 01:03:38.040 --> 01:03:39.570 99 is the second. 01:03:39.570 --> 01:03:41.572 Nice, now I've gotten the grade I deserved. 01:03:41.572 --> 01:03:44.030 Heck, we could even bring in the round function if we want, 01:03:44.030 --> 01:03:46.863 which you might have used for p-set one, but we'll leave it as this. 01:03:46.863 --> 01:03:49.917 But I am going to go ahead and just do a 0.1 there. 01:03:49.917 --> 01:03:51.750 Recall that with format codes you can really 01:03:51.750 --> 01:03:54.700 start to get precise and say only show me one digit. 01:03:54.700 --> 01:03:59.430 So if I recompile this now, make scores, and do dot slash scores-- 01:03:59.430 --> 01:04:02.100 two scores-- 100, 99. 01:04:02.100 --> 01:04:09.730 There's my 99.5%. Any questions then on these arrays and the use thereof? 01:04:09.730 --> 01:04:10.327 Yeah? 01:04:10.327 --> 01:04:14.143 AUDIENCE: [INAUDIBLE] the average [INAUDIBLE] income scores by 01:04:14.143 --> 01:04:15.097 [INAUDIBLE] 01:04:15.097 --> 01:04:17.180 DAVID MALAN: Explain the average-- this part here? 01:04:17.180 --> 01:04:17.960 AUDIENCE: Yeah. 01:04:17.960 --> 01:04:19.140 DAVID MALAN: Sure, can I explain this? 01:04:19.140 --> 01:04:20.480 So, let me just show more of the code. 01:04:20.480 --> 01:04:22.438 The last line of this program's purpose in life 01:04:22.438 --> 01:04:25.610 is just to print the average of all of my scores.
01:04:25.610 --> 01:04:28.160 And I decided, partly for design purposes, 01:04:28.160 --> 01:04:32.780 but also today to illustrate a point, to relegate the computation of an average 01:04:32.780 --> 01:04:33.980 to a custom function. 01:04:33.980 --> 01:04:35.330 This is handy, because now if I ever work 01:04:35.330 --> 01:04:37.070 on another problem that needs to average, 01:04:37.070 --> 01:04:39.620 I've got a function I can use in that code too. 01:04:39.620 --> 01:04:43.340 But in this case, average takes two arguments, apparently 01:04:43.340 --> 01:04:45.970 the length of the array and the array itself, 01:04:45.970 --> 01:04:47.720 but I could call these two things anything 01:04:47.720 --> 01:04:51.350 I want-- x and y, length and array, anything else, 01:04:51.350 --> 01:04:53.120 but I chose this for clarity. 01:04:53.120 --> 01:04:54.860 But up here, I want to use that function. 01:04:54.860 --> 01:04:57.200 So just like in Scratch, recall that you can nest blocks 01:04:57.200 --> 01:04:59.430 and you can join something and then say it. 01:04:59.430 --> 01:05:02.120 So can we call the average function, passing 01:05:02.120 --> 01:05:04.610 in the length of the array and the array itself, 01:05:04.610 --> 01:05:08.000 that gives me back my average 99.5, and then I'm 01:05:08.000 --> 01:05:11.480 plugging that in to this format code in printf. 01:05:11.480 --> 01:05:13.888 So just like in math, when you have lots of parentheses, 01:05:13.888 --> 01:05:14.930 work from the inside out. 01:05:14.930 --> 01:05:17.388 Look at the innermost parentheses, figure out what that is, 01:05:17.388 --> 01:05:19.420 then work your way outward. 01:05:19.420 --> 01:05:22.880 And if you've programmed in Java, or Python, or other languages, 01:05:22.880 --> 01:05:25.040 you might be wondering why we need to tell 01:05:25.040 --> 01:05:27.110 the function the length of an array. 01:05:27.110 --> 01:05:30.440 In C, the arrays do not remember their own length. 
01:05:30.440 --> 01:05:33.200 So if you have programmed before, this is necessary. 01:05:33.200 --> 01:05:36.920 You do not get that feature for free in C. Yeah? 01:05:36.920 --> 01:05:41.580 AUDIENCE: [INAUDIBLE] 01:05:41.580 --> 01:05:43.420 DAVID MALAN: Correct, if you do percent 0.1 01:05:43.420 --> 01:05:45.520 you get one decimal point, so 99.5%. 01:05:45.520 --> 01:05:49.843 AUDIENCE: Suppose that the answer was 99.49 [INAUDIBLE] 01:05:49.843 --> 01:05:51.260 DAVID MALAN: Really good question. 01:05:51.260 --> 01:05:57.170 If the answer is mathematically 99.49, but you do 0.1 here, 01:05:57.170 --> 01:05:58.940 it will round up for you. 01:05:58.940 --> 01:06:00.950 It will-- good question as well. 01:06:00.950 --> 01:06:01.650 Yeah? 01:06:01.650 --> 01:06:05.003 AUDIENCE: What happens [INAUDIBLE]? 01:06:05.003 --> 01:06:06.420 DAVID MALAN: Really good question. 01:06:06.420 --> 01:06:09.830 What happens if you divide an int by a float or something else? 01:06:09.830 --> 01:06:13.520 You will typically up cast it to whatever the more powerful type is. 01:06:13.520 --> 01:06:16.760 So if you divide an int by a float, you will actually get back a float. 01:06:16.760 --> 01:06:20.060 So strictly speaking, I did not need to cast both the numerator 01:06:20.060 --> 01:06:21.710 and the denominator to a float. 01:06:21.710 --> 01:06:25.160 I just did it for consistency and demonstration's sake. 
01:06:25.160 --> 01:06:28.700 So it turns out, while we've been looking at numbers here alone 01:06:28.700 --> 01:06:31.250 and scores, it turns out that there's actually 01:06:31.250 --> 01:06:35.533 an intricate relationship with all of the h's and the i's and the exclamation 01:06:35.533 --> 01:06:37.700 points we've been looking at, and all of the strings 01:06:37.700 --> 01:06:40.670 we've been typing in too, however this was a mouthful, 01:06:40.670 --> 01:06:42.450 and frankly I feel like a brownie as well, 01:06:42.450 --> 01:06:45.283 so why don't we take our five minute break here and we'll come back. 01:06:47.570 --> 01:06:51.610 We are back. 01:06:51.610 --> 01:06:55.930 So thus far, we've introduced arrays as an opportunity 01:06:55.930 --> 01:06:58.048 to improve the design of our code. 01:06:58.048 --> 01:07:00.340 So we're going to hear a lot of squeaking now, I think. 01:07:00.340 --> 01:07:05.890 So thus far, we've introduced arrays as the-- 01:07:05.890 --> 01:07:08.240 I'm going to do my best to keep a straight face. 01:07:08.240 --> 01:07:11.500 Thus far, we have introduced arrays as a solution to a design problem 01:07:11.500 --> 01:07:14.260 so that we can actually store multiple values, 01:07:14.260 --> 01:07:18.970 but in the guise of one variable so as to avoid the copy-paste tendency 01:07:18.970 --> 01:07:20.410 that we might otherwise have. 01:07:20.410 --> 01:07:24.250 And those arrays ultimately started from trying to clean this kind of code up. 01:07:24.250 --> 01:07:27.530 But what is it that was ultimately going on inside of the computer's memory 01:07:27.530 --> 01:07:30.520 we can still consider, because it's actually not all that different. 01:07:30.520 --> 01:07:34.960 However, when we have three integers, score one, score two, score three, 01:07:34.960 --> 01:07:38.530 how many bytes is each of those-- it's going to take up?
01:07:38.530 --> 01:07:41.770 So four, if you think back to the chart from before, char is one, 01:07:41.770 --> 01:07:45.160 an int is four, at least on most systems, and so the number 01:07:45.160 --> 01:07:49.180 72 in the variable called score one, we can draw in our computer's 01:07:49.180 --> 01:07:50.950 memory as taking up four of these boxes. 01:07:50.950 --> 01:07:54.010 Because again, each box represents one byte, therefore four bytes 01:07:54.010 --> 01:07:55.300 requires four boxes. 01:07:55.300 --> 01:07:57.340 Score two and score three would similarly 01:07:57.340 --> 01:07:58.990 be laid out in my computer's memory. 01:07:58.990 --> 01:08:02.800 If I had three variables, score one, two, and three, as follows, like this. 01:08:02.800 --> 01:08:05.890 Of course what's underneath the hood is actually bits, 01:08:05.890 --> 01:08:09.500 but again, we don't need to worry about that level of abstraction anymore. 01:08:09.500 --> 01:08:11.920 But that's indeed all that's going on there. 01:08:11.920 --> 01:08:13.330 But we can clean this up. 01:08:13.330 --> 01:08:16.270 We can instead get rid of this copy-paste approach to variable names 01:08:16.270 --> 01:08:18.460 and just introduce an array called scores, 01:08:18.460 --> 01:08:22.689 plural, and then initialize those three values, as in the program I wrote here. 01:08:22.689 --> 01:08:27.490 And then, this picture is similar in spirit, but the names of these boxes, 01:08:27.490 --> 01:08:31.840 so to speak, become scores zero, scores one, and scores two. 01:08:31.840 --> 01:08:36.640 So the array is now independent of the number of bytes being consumed. 01:08:36.640 --> 01:08:38.979 Just because an int is four bytes, doesn't 01:08:38.979 --> 01:08:43.359 mean you do scores zero, scores four, scores eight, and so forth. 01:08:43.359 --> 01:08:44.979 It's still zero, one, two.
01:08:44.979 --> 01:08:49.990 The computer will figure out exactly how much space to give each of those values 01:08:49.990 --> 01:08:52.354 based on its type, which is an int. 01:08:52.354 --> 01:08:54.729 But it turns out that there's actually a relationship now 01:08:54.729 --> 01:08:58.330 to where we began this story when we looked at characters. 01:08:58.330 --> 01:09:01.660 H-I exclamation point was implemented with three lines of code 01:09:01.660 --> 01:09:03.760 using c1, c2, and c3. 01:09:03.760 --> 01:09:06.850 But last week, we already saw the notion of a string, 01:09:06.850 --> 01:09:11.710 and it turns out strings and chars are fundamentally interrelated in ways 01:09:11.710 --> 01:09:13.630 that we can now literally see. 01:09:13.630 --> 01:09:16.779 If we had a string called s, for instance, 01:09:16.779 --> 01:09:20.680 and that string contains three characters, H-I and an exclamation 01:09:20.680 --> 01:09:23.109 point, well it turns out you can actually 01:09:23.109 --> 01:09:25.479 get at the individual letters in a string 01:09:25.479 --> 01:09:29.950 by doing the name of the string, bracket, zero, close bracket, 01:09:29.950 --> 01:09:31.950 or s bracket one, or s bracket two. 01:09:31.950 --> 01:09:35.260 If the name of my variable is s, and s is a string, 01:09:35.260 --> 01:09:38.529 I can actually access the individual characters there in just 01:09:38.529 --> 01:09:41.830 like an array, which is to say then, what 01:09:41.830 --> 01:09:48.029 is a string as of this week versus last? 01:09:48.029 --> 01:09:49.850 It's just an array of chars. 01:09:49.850 --> 01:09:51.340 It's just an array of characters. 
01:09:51.340 --> 01:09:54.848 So even though it's a data type, thanks to CS50's library and CS50 dot h, 01:09:54.848 --> 01:09:57.640 and we're going to take this training wheel off within a few weeks, 01:09:57.640 --> 01:09:59.890 we've essentially just created a string to be 01:09:59.890 --> 01:10:02.720 for now, at this point in the story, just an array of characters. 01:10:02.720 --> 01:10:03.220 Why? 01:10:03.220 --> 01:10:05.327 Because being able to have multiple characters 01:10:05.327 --> 01:10:07.660 is certainly way more useful than having to spell things 01:10:07.660 --> 01:10:11.320 out one variable at a time with one char at a time. 01:10:11.320 --> 01:10:14.470 So string is a data type in the CS50 library 01:10:14.470 --> 01:10:17.763 that for today's purposes indeed, just an array of characters. 01:10:17.763 --> 01:10:19.930 And we'll see before long that, that too is actually 01:10:19.930 --> 01:10:24.290 kind of a bit of a white lie, but we'll see why before long as well. 01:10:24.290 --> 01:10:27.040 So if I declare a string in C, I can actually 01:10:27.040 --> 01:10:28.540 literally do something like this. 01:10:28.540 --> 01:10:32.620 String s equals quote unquote hi, this time using double quotes, and not 01:10:32.620 --> 01:10:36.182 single quotes, because it's three characters and not just a single char. 01:10:36.182 --> 01:10:38.890 So in memory, that's actually going to look pretty much the same. 01:10:38.890 --> 01:10:42.910 If the variable's called s, it's going to have h i and an exclamation point. 01:10:42.910 --> 01:10:46.630 And just for simplicity, I'll label the first box as s 01:10:46.630 --> 01:10:49.210 and just assume that we can get everywhere else. 
01:10:49.210 --> 01:10:53.170 But it turns out that strings are a little special, because 01:10:53.170 --> 01:10:56.530 unlike a char, which is one byte, unlike an int, which 01:10:56.530 --> 01:10:59.590 is four bytes, unlike a long, which is eight bytes, 01:10:59.590 --> 01:11:01.672 how long should a string be? 01:11:01.672 --> 01:11:03.245 AUDIENCE: [INAUDIBLE] 01:11:03.245 --> 01:11:05.620 DAVID MALAN: Yeah, I mean as many characters as you need, 01:11:05.620 --> 01:11:08.050 because if I want to store H-I I need-- 01:11:08.050 --> 01:11:11.170 H-I exclamation point, I need strings to be at least three bytes, 01:11:11.170 --> 01:11:12.020 it would seem-- 01:11:12.020 --> 01:11:15.100 for my name David, at least five bytes, for D-A-V-I-D-- 01:11:15.100 --> 01:11:18.130 Brian, as well, and much longer names in the room, too. 01:11:18.130 --> 01:11:21.413 So strings can't really have a preordained length associated 01:11:21.413 --> 01:11:23.830 with them, which is why I put a question mark on the board 01:11:23.830 --> 01:11:27.040 before when I first summarized the sizes of these types. 01:11:27.040 --> 01:11:31.810 But the catch is that if a variable only has a name, like s, or name, or any 01:11:31.810 --> 01:11:34.820 of the variables you use for p-set one's problems, 01:11:34.820 --> 01:11:38.380 it turns out we all need to decide as human programmers 01:11:38.380 --> 01:11:41.020 how do we know where the string ends? 01:11:41.020 --> 01:11:43.330 The name of the variable, suffice it to say, 01:11:43.330 --> 01:11:46.330 lets us know where the variable begins, just as I've drawn here. 01:11:46.330 --> 01:11:48.960 If you reference a variable in a program and call it s, 01:11:48.960 --> 01:11:52.390 the computer will just know to go to the first character in that string. 01:11:52.390 --> 01:11:55.210 But there needs to be a little clue to the computer as to where 01:11:55.210 --> 01:11:59.230 the string ends, and that clue is what's called a null character. 
01:11:59.230 --> 01:12:01.300 It's a little funky to look at, but it's just 01:12:01.300 --> 01:12:04.347 a backslash zero, which might remind you of backslash n, which 01:12:04.347 --> 01:12:06.430 too is a little funky, and that's a special symbol 01:12:06.430 --> 01:12:09.790 that says move the cursor to the next line, give a new line. 01:12:09.790 --> 01:12:12.550 Backslash zero is the so-called null character 01:12:12.550 --> 01:12:14.890 or the null terminating character. 01:12:14.890 --> 01:12:19.930 And all that is special syntax for eight zero bits. 01:12:19.930 --> 01:12:22.310 So each of these boxes represents eight bits. 01:12:22.310 --> 01:12:23.530 This is number 72. 01:12:23.530 --> 01:12:25.030 This is the number 73. 01:12:25.030 --> 01:12:26.470 This is the number 33. 01:12:26.470 --> 01:12:32.530 This backslash zero is just the way of drawing all eight bits as zeros. 01:12:32.530 --> 01:12:36.670 So that's what a computer uses in C to demarcate the end of a string. 01:12:36.670 --> 01:12:40.025 It just wastes one byte as all zero bits. 01:12:40.025 --> 01:12:41.650 And I say waste, because you know what? 01:12:41.650 --> 01:12:47.740 How much space does H-I exclamation point actually take up accordingly? 01:12:47.740 --> 01:12:50.620 How many bytes do you need to store hi? 01:12:50.620 --> 01:12:52.000 AUDIENCE: [INAUDIBLE] 01:12:52.000 --> 01:12:56.380 DAVID MALAN: Three, well, four, because you need to know where the string ends, 01:12:56.380 --> 01:12:58.780 otherwise you won't be able to distinguish 01:12:58.780 --> 01:13:02.450 the beginnings of other variables, potentially, in your computer's memory. 01:13:02.450 --> 01:13:04.130 And we'll see this in just a moment. 01:13:04.130 --> 01:13:06.400 So if my string is called s, it turns out 01:13:06.400 --> 01:13:08.410 that at s bracket zero is the first character. 01:13:08.410 --> 01:13:12.010 S bracket one is the second character. s bracket two is the third.
01:13:12.010 --> 01:13:16.210 And that null character, so to speak, the invisible backslash zero 01:13:16.210 --> 01:13:18.950 or eight zero bits happens to be at the end. 01:13:18.950 --> 01:13:24.250 So a string that's of length three, actually takes up four bytes. 01:13:24.250 --> 01:13:27.940 Any string you have typed into a computer yet, whether it's hi, 01:13:27.940 --> 01:13:30.970 or David, or Brian, or Emma, or Rodrigo, takes up 01:13:30.970 --> 01:13:33.940 as many characters as are in those names, 01:13:33.940 --> 01:13:37.370 plus one byte for this special null terminating character. 01:13:37.370 --> 01:13:38.122 So let's see that. 01:13:38.122 --> 01:13:40.330 If we were to write a program using these four names, 01:13:40.330 --> 01:13:42.860 let me go ahead and whip that up really quickly here. 01:13:42.860 --> 01:13:46.450 I'm going to create a file called names dot c, 01:13:46.450 --> 01:13:50.050 and I'm going to go ahead and do include standard io dot h. 01:13:50.050 --> 01:13:53.260 Then I'm going to go ahead and do int main void. 01:13:53.260 --> 01:13:57.790 Inside of here, I'm going to give myself four strings, using my new array 01:13:57.790 --> 01:13:59.080 syntax, as before. 01:13:59.080 --> 01:14:01.870 So I could call this name one, name two, name three, name four, 01:14:01.870 --> 01:14:03.760 but I'm not going to repeat that bad habit. 01:14:03.760 --> 01:14:05.440 I'm going to give myself a name-- 01:14:05.440 --> 01:14:10.150 a variable called names, plural, and store four strings in it, as follows. 01:14:10.150 --> 01:14:12.430 Let's give Emma the first spot there. 01:14:12.430 --> 01:14:16.180 Let's give Rodrigo the second spot there. 01:14:16.180 --> 01:14:19.780 I'm using all caps just because we've seen some of those Ascii codes before, 01:14:19.780 --> 01:14:21.610 but I could use lowercase as well. 01:14:21.610 --> 01:14:22.690 Let's add Brian. 01:14:22.690 --> 01:14:25.120 And then I'll go ahead and add myself lastly.
01:14:25.120 --> 01:14:29.680 So the array is of size four, but I count from zero on up through three. 01:14:29.680 --> 01:14:32.080 And now just for demonstration's sake, let's go ahead 01:14:32.080 --> 01:14:33.790 and print out, say, Emma's name. 01:14:33.790 --> 01:14:37.360 So if I want to print out Emma's name, the type of variable in which she 01:14:37.360 --> 01:14:39.121 is stored, is what? 01:14:39.121 --> 01:14:41.174 What is the type that I want to print? 01:14:41.174 --> 01:14:41.990 String. 01:14:41.990 --> 01:14:43.837 So that's percent s, just like last week. 01:14:43.837 --> 01:14:45.670 And I'm going to go ahead and put a backslash n. 01:14:45.670 --> 01:14:49.720 And if I want to print Emma's name, what do I type here 01:14:49.720 --> 01:14:52.052 to plug into that placeholder? 01:14:52.052 --> 01:14:53.095 AUDIENCE: [INAUDIBLE] 01:14:53.095 --> 01:14:54.470 DAVID MALAN: Names brackets zero. 01:14:54.470 --> 01:14:56.540 It's a little bad that I'm hard coding it here, 01:14:56.540 --> 01:14:59.560 but again, I'm just demonstrating how this all works for now. 01:14:59.560 --> 01:15:01.090 Let me go ahead and save that. 01:15:01.090 --> 01:15:03.400 Let me do make names. 01:15:03.400 --> 01:15:04.570 Bit of an error here. 01:15:04.570 --> 01:15:05.500 What did I do wrong? 01:15:05.500 --> 01:15:08.170 Oh my god, all of this is wrong. 01:15:08.170 --> 01:15:11.274 Does anyone see it yet? 01:15:11.274 --> 01:15:12.150 AUDIENCE: [INAUDIBLE] 01:15:12.150 --> 01:15:14.025 DAVID MALAN: Yeah, I forgot the CS50 library. 01:15:14.025 --> 01:15:16.920 So even though I'm not using get string, I am using string, 01:15:16.920 --> 01:15:19.630 so I do need the CS50 library up here. 01:15:19.630 --> 01:15:21.150 So let me go ahead and clear that. 01:15:21.150 --> 01:15:22.500 Make names. 01:15:22.500 --> 01:15:23.236 OK better. 01:15:23.236 --> 01:15:25.980 Dot slash names, and I should just see Emma's name.
01:15:25.980 --> 01:15:28.080 But watch this, what I can do too. 01:15:28.080 --> 01:15:31.830 I know that Emma's name is a string, and I now 01:15:31.830 --> 01:15:36.550 know that a string is an array of characters, so I can also do this. 01:15:36.550 --> 01:15:41.370 Let me go ahead and print out one, two, three, four characters, 01:15:41.370 --> 01:15:42.700 and then a new line. 01:15:42.700 --> 01:15:44.640 And the characters I'm going to print out 01:15:44.640 --> 01:15:48.660 are going to be Emma's names, first character, 01:15:48.660 --> 01:15:54.600 Emma's names, second character, Emma's names, third character, 01:15:54.600 --> 01:15:58.500 and Emma's names, fourth character. 01:15:58.500 --> 01:16:01.980 So you can have what's essentially a two-dimensional array, where 01:16:01.980 --> 01:16:03.570 you have two sets of square brackets. 01:16:03.570 --> 01:16:06.930 The first one indexes me into the array of names. 01:16:06.930 --> 01:16:10.510 And to index into an array means go to a certain location in an array. 01:16:10.510 --> 01:16:13.530 So names, bracket zero, so to speak. 01:16:13.530 --> 01:16:18.930 This part here means go get Emma's name from the array of four names. 01:16:18.930 --> 01:16:23.260 This square bracket after says within that string, 01:16:23.260 --> 01:16:25.230 treat it as an array of characters and get 01:16:25.230 --> 01:16:28.980 the zeroth character, the first character, which is hopefully e 01:16:28.980 --> 01:16:31.940 and an m and an m and then a. 01:16:31.940 --> 01:16:34.290 So I'm going to go ahead and save this file now. 01:16:34.290 --> 01:16:35.870 Make names again. 01:16:35.870 --> 01:16:41.190 It compiled, dot slash names, and voila, Emma, Emma, I see twice. 01:16:41.190 --> 01:16:44.730 Now, I'm never again going to print any string like this. 01:16:44.730 --> 01:16:48.060 This is just ridiculous, plus I had to know in advance how long her name is. 
01:16:48.060 --> 01:16:51.360 However, it is equivalent to printing the string itself. 01:16:51.360 --> 01:16:54.240 It's just C and printf knows when you use 01:16:54.240 --> 01:16:56.550 percent s and you pass on the name of a variable, 01:16:56.550 --> 01:17:00.090 all printf is probably doing under the hood is some kind of loop 01:17:00.090 --> 01:17:03.450 and it's iterating over your string from the first character and it's checking, 01:17:03.450 --> 01:17:04.680 is this the null character? 01:17:04.680 --> 01:17:05.760 If not, print it. 01:17:05.760 --> 01:17:06.930 Is this the null character? 01:17:06.930 --> 01:17:07.672 If not, print it. 01:17:07.672 --> 01:17:10.130 If this is the null character-- is this the null character? 01:17:10.130 --> 01:17:10.980 If not, print it. 01:17:10.980 --> 01:17:18.300 And that's how we get, E-M-M-A stop, because printf, in this line 12, 01:17:18.300 --> 01:17:24.510 presumably noticed, oh, wait a minute, the fifth byte in Emma's names zero 01:17:24.510 --> 01:17:29.143 array is backslash zero, or all eight bits as zero. 01:17:29.143 --> 01:17:29.643 Yeah? 01:17:29.643 --> 01:17:32.160 AUDIENCE: That's just part of [INAUDIBLE] 01:17:32.160 --> 01:17:35.160 DAVID MALAN: That is all part of the underneath the hood stuff of printf 01:17:35.160 --> 01:17:38.740 and it's what humans decided decades ago with C how strings would work. 01:17:38.740 --> 01:17:40.740 They could have come up with a different system, 01:17:40.740 --> 01:17:44.250 but this is the system that they decided to use. 01:17:44.250 --> 01:17:45.210 Other questions? 01:17:45.210 --> 01:17:45.941 Yeah? 01:17:45.941 --> 01:17:49.869 AUDIENCE: [INAUDIBLE] 01:17:58.710 --> 01:18:00.130 DAVID MALAN: I didn't go further. 01:18:00.130 --> 01:18:05.400 So I deliberately did not touch bracket four, even though it's there. 01:18:05.400 --> 01:18:06.570 But I can try to print this. 01:18:06.570 --> 01:18:07.130 Let's see. 
01:18:07.130 --> 01:18:09.700 So let me go ahead and change this program real quick. 01:18:09.700 --> 01:18:12.720 I'm going to go ahead and print out percent C a fifth time. 01:18:12.720 --> 01:18:18.030 And let's go ahead and see if we can see Emma's null terminating character 01:18:18.030 --> 01:18:22.500 at location four, which is her fifth location, so after the E-M-M-A. 01:18:22.500 --> 01:18:23.820 Let me save that. 01:18:23.820 --> 01:18:27.920 Make names, dot slash names, Emma Emma. 01:18:27.920 --> 01:18:28.920 So I don't see it there. 01:18:28.920 --> 01:18:29.670 But you know what? 01:18:29.670 --> 01:18:32.482 Let me try changing this last one just for kicks to percent i. 01:18:32.482 --> 01:18:34.440 And again, this is where printf is your friend. 01:18:34.440 --> 01:18:36.600 You can use it powerfully to see what's going on. 01:18:36.600 --> 01:18:38.490 Or we could whip out debug 50. 01:18:38.490 --> 01:18:42.150 Let me go ahead and make names, dot slash names. 01:18:42.150 --> 01:18:43.835 And voila, there's the zero. 01:18:43.835 --> 01:18:45.960 I'm printing it literally as an int just to see it. 01:18:45.960 --> 01:18:47.627 I would never do this in the real world. 01:18:47.627 --> 01:18:49.260 But it's indeed there. 01:18:49.260 --> 01:18:51.800 And now, this doesn't often work, but just for kicks-- 01:18:51.800 --> 01:18:53.040 I'm getting a little crazy-- 01:18:53.040 --> 01:18:55.650 suppose that I want to look well past Emma's name 01:18:55.650 --> 01:18:59.310 to like location 400, like let's start poking around in the computer's memory, 01:18:59.310 --> 01:19:00.510 one of those other boxes. 01:19:00.510 --> 01:19:03.360 Make names, dot slash names. 01:19:03.360 --> 01:19:06.420 OK, there's a negative three down there as well, or technically 01:19:06.420 --> 01:19:07.530 a hyphen and then a three. 01:19:07.530 --> 01:19:09.780 So we'll come back to this in a couple of weeks' time. 
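A small sketch of the percent i trick, assuming a hypothetical helper called `byte_at` that returns a character's numeric value. It only looks at index four of a four-letter string, where the terminator is guaranteed to be; reading far past the end, as in the location 400 experiment, is undefined behavior in C and is not reproduced here.

```c
#include <stdio.h>

// Returns the numeric value of the byte at index i of s.
// The caller must keep i between 0 and the string's length, inclusive.
int byte_at(const char *s, int i)
{
    return s[i];
}

// Prints the terminator of "EMMA" as an integer rather than as a character.
void show_terminator(void)
{
    printf("%i\n", byte_at("EMMA", 4));
}
```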
01:19:09.780 --> 01:19:13.560 We can actually start hacking around and looking around my computer's memory 01:19:13.560 --> 01:19:18.930 at any location, because it's just numbers of boxes on the screen. 01:19:18.930 --> 01:19:19.671 Yeah? 01:19:19.671 --> 01:19:22.500 AUDIENCE: Is there any limit to the length of the string? 01:19:22.500 --> 01:19:25.020 DAVID MALAN: Is there any limit to the length of the string? 01:19:25.020 --> 01:19:30.440 Short answer-- yes, the amount of memory that the computer has. 01:19:30.440 --> 01:19:32.140 So like 2 billion, 4 billion-- 01:19:32.140 --> 01:19:33.915 it's long. 01:19:33.915 --> 01:19:37.153 AUDIENCE: What happens if you try to type in [INAUDIBLE] 01:19:37.153 --> 01:19:38.570 DAVID MALAN: Really good question. 01:19:38.570 --> 01:19:40.240 What happens if you try to type that in hypothetically? 01:19:40.240 --> 01:19:41.500 It depends on the function you use. 01:19:41.500 --> 01:19:43.500 Let me come back to that in like two weeks' time. 01:19:43.500 --> 01:19:45.250 Get string will not crash. 01:19:45.250 --> 01:19:48.640 Other C functions will crash, if you give them more input than they expect, 01:19:48.640 --> 01:19:50.950 and we'll come back to the reasons why. 01:19:50.950 --> 01:19:53.642 So what's actually going on underneath this hood, then, 01:19:53.642 --> 01:19:54.850 if we have these four names-- 01:19:54.850 --> 01:19:56.370 Emma, Rodrigo, Brian, and David. 01:19:56.370 --> 01:19:59.530 Well, if we consider our memory again, we know that Emma's up at this first 01:19:59.530 --> 01:20:03.340 location, E-M-M-A, followed by this null terminating character. 01:20:03.340 --> 01:20:06.190 But if the second name we stored in a variable was Rodrigo, 01:20:06.190 --> 01:20:09.610 turns out he's going to end up sort of back to back with that memory as well. 01:20:09.610 --> 01:20:12.220 And again, it's wrapping only because this is an artist's rendition of what 01:20:12.220 --> 01:20:12.970 memory looks like.
01:20:12.970 --> 01:20:15.400 There's no notion of left, right, up, or down in RAM. 01:20:15.400 --> 01:20:20.380 But he is R-O-D-R-I-G-O, and his null terminating character there. 01:20:20.380 --> 01:20:21.460 Brian might end up there. 01:20:21.460 --> 01:20:22.750 I might end up after it. 01:20:22.750 --> 01:20:25.780 And this is what's really going on underneath the hood of your computer. 01:20:25.780 --> 01:20:28.030 Each of these values isn't technically a character. 01:20:28.030 --> 01:20:29.290 It's technically a number. 01:20:29.290 --> 01:20:30.790 And frankly, it's not even a number. 01:20:30.790 --> 01:20:32.890 It's eight bits at a time. 01:20:32.890 --> 01:20:35.800 But again, we don't have to worry about that level of detail now 01:20:35.800 --> 01:20:38.660 that we're operating at this level of abstraction. 01:20:38.660 --> 01:20:40.917 And I put up the wrong code a moment ago. 01:20:40.917 --> 01:20:43.750 This is the code that I actually implemented using an array from the 01:20:43.750 --> 01:20:47.470 get go, as opposed to an actual-- 01:20:47.470 --> 01:20:49.660 as opposed to four separate variables. 01:20:49.660 --> 01:20:52.540 So just to highlight, then, what's going on, per the example I just 01:20:52.540 --> 01:20:56.590 did with printing out Emma's characters, if this is a variable called names, 01:20:56.590 --> 01:21:01.360 and there's four names in it, zero, one, two, three, 01:21:01.360 --> 01:21:05.800 you can think of every character as being kind of addressable 01:21:05.800 --> 01:21:07.580 using square bracket notation. 01:21:07.580 --> 01:21:10.870 The first set of square brackets picks the name in question. 01:21:10.870 --> 01:21:14.230 The second set of square brackets picks the character within the name. 01:21:14.230 --> 01:21:17.740 So e is the first character, so that's zero. m is the next one, so that's one. 01:21:17.740 --> 01:21:21.850 m is the third, so that's two. a Is the fourth, and so that's three. 
01:21:21.850 --> 01:21:26.183 And then with Rodrigo, he's at names one, and his r is in bracket zero. 01:21:26.183 --> 01:21:28.100 So again, we're really getting into the weeds. 01:21:28.100 --> 01:21:31.100 And this is not what programming ultimately is, but this is just to say, 01:21:31.100 --> 01:21:34.630 there's no magic when you use printf and get string and get int, and so forth. 01:21:34.630 --> 01:21:40.390 All that's going on underneath the hood is manipulation of values like these. 01:21:40.390 --> 01:21:44.230 So let's now see what a string really is and we'll ultimately conclude today 01:21:44.230 --> 01:21:46.030 with some domain-specific problems. 01:21:46.030 --> 01:21:48.130 Indeed, with problem set two you will be exploring 01:21:48.130 --> 01:21:50.830 a number of real-world problems, like assessing just how 01:21:50.830 --> 01:21:54.607 readable some text is, what grade level might a certain book or another be, 01:21:54.607 --> 01:21:56.690 and two, implementing some notion of cryptography, 01:21:56.690 --> 01:21:58.330 the art of scrambling information. 01:21:58.330 --> 01:22:00.610 And suffice it to say, in both of those domains, 01:22:00.610 --> 01:22:03.490 reading texts and also cryptography, strings 01:22:03.490 --> 01:22:05.840 are going to be the ingredient that we need. 01:22:05.840 --> 01:22:09.550 So let's take a look now at a few examples involving 01:22:09.550 --> 01:22:11.330 more and more strings. 01:22:11.330 --> 01:22:16.210 I'm going to go ahead and create a program here called string dot c, 01:22:16.210 --> 01:22:17.710 just so I can play with this notion. 01:22:17.710 --> 01:22:20.470 I'm going to go ahead and include CS50 dot h. 01:22:20.470 --> 01:22:24.550 I'm going to go ahead and include standard io dot h. 01:22:24.550 --> 01:22:25.990 I'll fix this up here-- 01:22:25.990 --> 01:22:26.993 int main void. 01:22:26.993 --> 01:22:30.160 And now let me go ahead and just play around with some strings for a moment.
01:22:30.160 --> 01:22:32.720 Let me go ahead and get myself a string from the user. 01:22:32.720 --> 01:22:36.940 So get string and ask for their input. 01:22:36.940 --> 01:22:38.890 Trying to type too fast now. 01:22:38.890 --> 01:22:41.860 So let me go ahead and ask the user for their input via get string, 01:22:41.860 --> 01:22:44.260 and store the answer in a variable called s. 01:22:44.260 --> 01:22:46.030 Then let me go ahead and preemptively say 01:22:46.030 --> 01:22:48.610 that their output is going to be the following. 01:22:48.610 --> 01:22:51.760 And what I want to do is just print out the individual characters 01:22:51.760 --> 01:22:53.060 in that string. 01:22:53.060 --> 01:22:57.400 So for int i gets zero, I don't know what my condition is yet, 01:22:57.400 --> 01:23:00.180 so I'll come back to that-- i plus plus. 01:23:00.180 --> 01:23:03.730 I'm going to go ahead and print out the individual character 01:23:03.730 --> 01:23:06.680 at the i-th location in that string, and I'm 01:23:06.680 --> 01:23:08.680 going to end this whole program with a new line. 01:23:08.680 --> 01:23:12.490 So I still have a blank to fill in, these question marks, but I ultimately 01:23:12.490 --> 01:23:15.490 just want to take as input a string, and then print it out as output, 01:23:15.490 --> 01:23:18.370 but not using percent s. 01:23:18.370 --> 01:23:21.670 I'm going to use percent c, one character at a time. 01:23:21.670 --> 01:23:26.080 So my question mark here is what question could I ask on every iteration 01:23:26.080 --> 01:23:30.660 before deciding whether or not I've printed every character in the string? 01:23:30.660 --> 01:23:31.160 Yeah? 01:23:31.160 --> 01:23:32.480 AUDIENCE: Length of the string. 01:23:32.480 --> 01:23:33.730 DAVID MALAN: Length of string. 01:23:33.730 --> 01:23:36.105 So I could say while i is less than the length of string. 01:23:36.105 --> 01:23:36.913 What else? 01:23:36.913 --> 01:23:37.720 AUDIENCE: The null character.
01:23:37.720 --> 01:23:39.070 DAVID MALAN: Or if it's equal to the null character. 01:23:39.070 --> 01:23:40.070 Let's try both of these. 01:23:40.070 --> 01:23:42.790 So if I know how strings are represented, 01:23:42.790 --> 01:23:47.718 I can just say while s bracket i does not equal backslash zero. 01:23:47.718 --> 01:23:50.260 Now this is a bit of a funky syntax, because even though it's 01:23:50.260 --> 01:23:53.290 two characters, I still have to use single quotes, 01:23:53.290 --> 01:23:55.990 because those two characters, just like backslash n, 01:23:55.990 --> 01:23:58.960 represent one idea, not two literal characters. 01:23:58.960 --> 01:24:01.780 But this is a literal translation of what we just discussed. 01:24:01.780 --> 01:24:05.050 Initialize i to zero, increment it on every iteration, 01:24:05.050 --> 01:24:09.220 but every time you do that, check: does the i-th character in the string 01:24:09.220 --> 01:24:13.300 equal the special null character, and if so, that's it for the loop. 01:24:13.300 --> 01:24:15.320 We only want to iterate through this for loop 01:24:15.320 --> 01:24:18.520 so long as it's not that special backslash zero. 01:24:18.520 --> 01:24:22.810 So if I go ahead now and save this file and make string and run 01:24:22.810 --> 01:24:27.220 dot slash string and my input for instance is Emma, Enter, 01:24:27.220 --> 01:24:29.260 I'm going to see literally her name back. 01:24:29.260 --> 01:24:33.400 So this is kind of my way of reimplementing the idea of percent s, 01:24:33.400 --> 01:24:34.813 but using only percent c. 01:24:34.813 --> 01:24:35.980 But I liked your suggestion. 01:24:35.980 --> 01:24:37.230 Why don't we use the string-- 01:24:37.230 --> 01:24:40.988 the length of the string, rather than this low-level implementation detail? 01:24:40.988 --> 01:24:42.780 It would be really nice if I could just say 01:24:42.780 --> 01:24:48.690 while i is less than the length of s-- 01:24:48.690 --> 01:24:50.440 so how do I express this?
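The null-character loop just walked through can be sketched roughly like this. The function name `print_chars` and its return value (the number of characters printed) are assumptions added so the loop can be called and checked without get string.

```c
#include <stdio.h>

// Prints s one character at a time with %c, stopping at the terminating
// '\0', then prints a newline. Returns how many characters were printed.
int print_chars(const char *s)
{
    int i;
    for (i = 0; s[i] != '\0'; i++)
    {
        printf("%c", s[i]);
    }
    printf("\n");
    return i;
}
```

Note the single quotes around `'\0'`: the backslash and the zero together form one character constant, not a two-character string.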
01:24:50.440 --> 01:24:55.560 Well, it turns out there's another file called 01:24:55.560 --> 01:24:59.610 string dot h inside of which are a bunch of string-related functions 01:24:59.610 --> 01:25:00.900 that I might like to use. 01:25:00.900 --> 01:25:04.950 One of those is a function called strlen, for short, 01:25:04.950 --> 01:25:07.150 which means the length of a string. 01:25:07.150 --> 01:25:09.130 So I can take your suggestion and just say, 01:25:09.130 --> 01:25:10.838 I don't care how a string is implemented. 01:25:10.838 --> 01:25:12.755 I mean, my god, the whole point of programming 01:25:12.755 --> 01:25:15.840 ultimately is to abstract away those lower-level implementation details. 01:25:15.840 --> 01:25:18.600 Let me just ask the computer what is your length, so 01:25:18.600 --> 01:25:20.070 that I don't count past it. 01:25:20.070 --> 01:25:24.540 Let me go ahead now and make string, dot slash string. 01:25:24.540 --> 01:25:26.010 Let's type in Emma again. 01:25:26.010 --> 01:25:28.030 And the output is the same. 01:25:28.030 --> 01:25:33.090 But now, this is correct perhaps, but I argue it's not very well-designed. 01:25:33.090 --> 01:25:36.300 I'm being a little inefficient and I bet I can do this better. 01:25:36.300 --> 01:25:37.280 What do you see? 01:25:37.280 --> 01:25:38.423 AUDIENCE: [INAUDIBLE] 01:25:38.423 --> 01:25:39.340 DAVID MALAN: Go ahead. 01:25:39.340 --> 01:25:43.290 AUDIENCE: [INAUDIBLE] 01:25:43.290 --> 01:25:45.300 DAVID MALAN: Yeah, exactly. 01:25:45.300 --> 01:25:47.910 Remember in a for loop that the condition in the middle, 01:25:47.910 --> 01:25:50.790 in between the semicolons, is a question, a Boolean expression, 01:25:50.790 --> 01:25:53.620 that you ask again and again and again. 01:25:53.620 --> 01:25:56.940 And it turns out that calling a function is not without cost.
01:25:56.940 --> 01:25:59.770 It might take a split second, because computers are super fast, 01:25:59.770 --> 01:26:04.140 but why are you asking the same question again and again and again and again? 01:26:04.140 --> 01:26:07.080 The answer is never going to change, because Emma's name is not 01:26:07.080 --> 01:26:09.630 growing or shrinking, it's just Emma. 01:26:09.630 --> 01:26:11.350 So I can solve this in a couple of ways. 01:26:11.350 --> 01:26:12.642 I could do something like this. 01:26:12.642 --> 01:26:17.130 int n gets strlen of s, and then I could just plug in n. 01:26:17.130 --> 01:26:20.460 My program is just as correct, but it's a little better designed 01:26:20.460 --> 01:26:23.250 now because I'm asking the question of string length 01:26:23.250 --> 01:26:27.333 once, remembering the answer, and then using that answer again and again. 01:26:27.333 --> 01:26:30.000 Now, yes, technically, now I'm wasting some space, because I now 01:26:30.000 --> 01:26:31.440 have another variable called n. 01:26:31.440 --> 01:26:32.530 So something's gotta give. 01:26:32.530 --> 01:26:35.220 I'm going to use more space or maybe more time, 01:26:35.220 --> 01:26:37.802 but that's a theme we'll come back to next week especially. 01:26:37.802 --> 01:26:40.260 But it turns out there's some special syntax for this, too. 01:26:40.260 --> 01:26:43.980 If you know in a loop that you want to ask a question once and remember 01:26:43.980 --> 01:26:48.210 the answer, you can actually just say this and do this all in one line. 01:26:48.210 --> 01:26:51.900 It's no better or worse, it's just a little more succinct, stylistically. 01:26:51.900 --> 01:26:55.740 This has the same effect of initializing i to zero, and n 01:26:55.740 --> 01:27:00.190 to the length of string, and then never again asking that question. 01:27:00.190 --> 01:27:00.990 So I can save this. 01:27:00.990 --> 01:27:03.120 I can make string.
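A sketch of the one-line form just described, in which `strlen` is asked once in the loop's initializer and the answer is remembered in `n`. The function name `print_chars_once` and the returned count are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

// Prints s one character at a time. strlen is called a single time, in
// the initializer, so the condition i < n never calls a function again.
// Both i and n are declared as int in the same initializer.
int print_chars_once(const char *s)
{
    int printed = 0;
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c", s[i]);
        printed++;
    }
    printf("\n");
    return printed;
}
```

The trade-off mentioned in the lecture shows up directly: one extra variable, `n`, in exchange for not repeating the length question on every iteration.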
01:27:03.120 --> 01:27:05.100 I can then do dot slash string, and I'm going 01:27:05.100 --> 01:27:07.560 to see hopefully, Emma, Emma again. 01:27:07.560 --> 01:27:10.960 So a third and final version of this idea, but a little better designed. 01:27:10.960 --> 01:27:11.996 Yeah? 01:27:11.996 --> 01:27:15.670 AUDIENCE: [INAUDIBLE] 01:27:15.672 --> 01:27:17.130 DAVID MALAN: In this case, it's OK. 01:27:17.130 --> 01:27:18.420 This would be a common convention. 01:27:18.420 --> 01:27:20.280 When you are doing something especially to minimize 01:27:20.280 --> 01:27:22.710 the number of questions you're asking, this is OK, so long 01:27:22.710 --> 01:27:23.910 as it's still pretty tight. 01:27:23.910 --> 01:27:26.590 But there, too, reasonable people might disagree. 01:27:26.590 --> 01:27:27.230 Yeah? 01:27:27.230 --> 01:27:29.703 AUDIENCE: Is the prototype string in library [INAUDIBLE]? 01:27:29.703 --> 01:27:31.120 DAVID MALAN: Really good question. 01:27:31.120 --> 01:27:34.902 The prototype for strlen, its declaration, is in string dot h. 01:27:34.902 --> 01:27:36.860 I would get one of those cryptic error messages 01:27:36.860 --> 01:27:40.400 if I forgot to include string dot h, because clang would not 01:27:40.400 --> 01:27:43.910 know that strlen actually exists. 01:27:43.910 --> 01:27:46.490 Let me try another example here and see what kind of power 01:27:46.490 --> 01:27:51.140 we have now that we actually are controlling-- 01:27:51.140 --> 01:27:53.690 now that we actually understand what a string actually is. 01:27:53.690 --> 01:27:55.482 Let me go ahead and whip this up real fast. 01:27:55.482 --> 01:27:58.190 So up here in my program, called uppercase dot c, 01:27:58.190 --> 01:28:00.350 let me give myself the CS50 library. 01:28:00.350 --> 01:28:02.270 Let me give myself standard io dot h. 01:28:02.270 --> 01:28:06.140 And now let me give myself string dot h, just so I can use strlen.
01:28:06.140 --> 01:28:09.440 Let me give myself the name of a function main. 01:28:09.440 --> 01:28:11.630 And then in here, let's do the same thing. 01:28:11.630 --> 01:28:13.868 String s gets get string. 01:28:13.868 --> 01:28:16.160 But this time, let me just ask the human for the string 01:28:16.160 --> 01:28:18.290 before I'm going to do something to it. 01:28:18.290 --> 01:28:24.230 Then I'm going to go ahead and say after I want the following to happen. 01:28:24.230 --> 01:28:25.550 And I'm going to do this-- 01:28:25.550 --> 01:28:31.610 for int i gets zero, n equals strlen s as before. 01:28:31.610 --> 01:28:35.390 Do this so long as i is less than n, and on each iteration, i plus plus. 01:28:35.390 --> 01:28:36.740 So copy-paste from before. 01:28:36.740 --> 01:28:38.930 I just retyped out the same thing. 01:28:38.930 --> 01:28:42.170 Now let me go ahead and in this for loop, let me change 01:28:42.170 --> 01:28:45.410 this string, whatever it is, all to uppercase. 01:28:45.410 --> 01:28:46.590 So how might I do this? 01:28:46.590 --> 01:28:52.280 So let me go ahead and say, well, if the current character at s bracket i 01:28:52.280 --> 01:28:58.700 is greater than or equal to lowercase a, and that same character is less than 01:28:58.700 --> 01:29:00.187 or equal to lowercase z. 01:29:00.187 --> 01:29:03.020 So I'm using some week one style stuff, even though we didn't really 01:29:03.020 --> 01:29:04.850 use this much syntax last week. 01:29:04.850 --> 01:29:06.440 I'm just asking a simple question. 01:29:06.440 --> 01:29:11.480 Is the i-th character in s greater than or equal to lowercase a and-- 01:29:11.480 --> 01:29:13.370 double ampersand means and-- 01:29:13.370 --> 01:29:16.570 logically, is that character less than or equal to z? 01:29:16.570 --> 01:29:19.940 So is it a, b, c, all the way through z-- is it a lowercase letter? 01:29:19.940 --> 01:29:24.120 If so, I want to do something like convert to uppercase.
01:29:24.120 --> 01:29:26.180 But we'll come back to that in just a moment. 01:29:26.180 --> 01:29:30.590 Else what do I want to do if the character is not lowercase 01:29:30.590 --> 01:29:33.096 and my goal is to uppercase the whole input? 01:29:33.096 --> 01:29:34.250 AUDIENCE: [INAUDIBLE] 01:29:34.250 --> 01:29:35.300 DAVID MALAN: Yeah, just leave it alone. 01:29:35.300 --> 01:29:35.690 So you know what? 01:29:35.690 --> 01:29:37.773 I'm just-- fine, I'm just going to leave it alone. 01:29:37.773 --> 01:29:40.842 I'm going to print it back out, just as I would with printf like that. 01:29:40.842 --> 01:29:42.800 So now even though this is not obvious from the 01:29:42.800 --> 01:29:45.830 get go how I'm going to solve this, I've now left myself 01:29:45.830 --> 01:29:47.540 a placeholder, pseudocode if you will. 01:29:47.540 --> 01:29:49.830 I just now need to answer this question. 01:29:49.830 --> 01:29:56.602 Well, it turns out a popular place to go for this answer would be AsciiChart.com. 01:29:56.602 --> 01:29:58.310 And there's different ways to solve this, 01:29:58.310 --> 01:30:00.060 but this is just a free website that shows 01:30:00.060 --> 01:30:02.480 us all of the decimal numbers that correspond to letters. 01:30:02.480 --> 01:30:08.460 And recall from week zero, 65 is A, 66 is B, and so forth. 01:30:08.460 --> 01:30:11.090 Notice that 65 is-- capital A is 65. 01:30:11.090 --> 01:30:12.736 What is lowercase a? 01:30:12.736 --> 01:30:14.640 AUDIENCE: [INAUDIBLE] 01:30:14.640 --> 01:30:16.180 DAVID MALAN: 97. 01:30:16.180 --> 01:30:22.330 And then look-- 66 to 98, 67 to 99, 68 to 100-- 01:30:22.330 --> 01:30:24.690 what's the difference between these? 01:30:24.690 --> 01:30:25.720 Yeah, it's 32. 01:30:25.720 --> 01:30:28.510 If you add 32 to 65, you get 97. 01:30:28.510 --> 01:30:31.640 If you add 32 to 66, you get 98, and so forth.
01:30:31.640 --> 01:30:34.870 So it seems that the lowercase letters, wonderfully conveniently, 01:30:34.870 --> 01:30:39.880 are all 32 values away from the uppercase letters. 01:30:39.880 --> 01:30:42.460 Or conversely, if I have a lowercase letter, 01:30:42.460 --> 01:30:45.460 logically, what could I do to it in order 01:30:45.460 --> 01:30:49.960 to convert it from uppercase to lowercase-- 01:30:49.960 --> 01:30:53.180 Sorry-- from lowercase to uppercase? 01:30:53.180 --> 01:30:54.240 Subtract, right? 01:30:54.240 --> 01:30:58.300 So why don't I try printing out printf, percent c, 01:30:58.300 --> 01:31:01.450 then go ahead and print out not the actual character, 01:31:01.450 --> 01:31:03.193 but just subtract 32 from it. 01:31:03.193 --> 01:31:05.110 I know these are integers underneath the hood. 01:31:05.110 --> 01:31:07.000 And frankly, if I want to be really explicit, 01:31:07.000 --> 01:31:10.960 I can convert it to an integer, the Ascii code, and then subtract 32, 01:31:10.960 --> 01:31:14.120 but that can be done implicitly-- we saw earlier. 01:31:14.120 --> 01:31:17.740 So let me go ahead and save this file and run uppercase, 01:31:17.740 --> 01:31:20.290 make uppercase, dot slash uppercase. 01:31:20.290 --> 01:31:24.280 And this time, let me write Emma's name in all lowercase, and voila, 01:31:24.280 --> 01:31:24.940 I see it here. 01:31:24.940 --> 01:31:25.898 Now it's a little ugly. 01:31:25.898 --> 01:31:26.705 What did I forget? 01:31:26.705 --> 01:31:27.580 AUDIENCE: [INAUDIBLE] 01:31:27.580 --> 01:31:28.310 DAVID MALAN: A new line. 01:31:28.310 --> 01:31:31.143 So I'm going to go ahead and do that at the very end of the program, 01:31:31.143 --> 01:31:33.100 so I get it only once at the very end. 01:31:33.100 --> 01:31:38.140 Let me rerun-- make uppercase, dot slash uppercase, Emma in lowercase. 01:31:38.140 --> 01:31:39.980 Voila, I've got it uppercase. 
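The subtract-32 idea can be sketched as a small helper. The name `to_upper_ascii` is hypothetical, and the arithmetic assumes ASCII, where the lowercase letters sit exactly 32 above their uppercase counterparts.

```c
// Returns the uppercase form of c if c is a lowercase ASCII letter;
// any other character is returned unchanged.
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        // 'a' is 97 and 'A' is 65, so subtracting 32 moves a lowercase
        // letter onto its uppercase counterpart.
        return (char) (c - 32);
    }
    return c;
}
```

Inside the lecture's loop, this would amount to printing `to_upper_ascii(s[i])` with percent c for each character, with one newline printed after the loop ends.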
01:31:39.980 --> 01:31:42.993 So this is like a very low-level implementation 01:31:42.993 --> 01:31:44.660 of the notion of upper casing something. 01:31:44.660 --> 01:31:46.780 So if you've ever done this in Google Docs or Microsoft Word-- 01:31:46.780 --> 01:31:48.370 convert this all to uppercase for whatever 01:31:48.370 --> 01:31:51.010 reason, that's all the computer is doing underneath the hood-- 01:31:51.010 --> 01:31:54.850 iterating over the characters and presumably subtracting off of that. 01:31:54.850 --> 01:31:57.970 But this, too, is at a low-level detail that we probably 01:31:57.970 --> 01:31:59.830 don't want to have to think about too much, 01:31:59.830 --> 01:32:02.872 and so it turns out there's functions that can solve this problem for us. 01:32:02.872 --> 01:32:06.320 And you might have discovered these last week or used them yourself. 01:32:06.320 --> 01:32:10.210 But on CS50's website is an example of what are called manual pages. 01:32:10.210 --> 01:32:13.840 And if I go ahead and pull this up on the course's website, 01:32:13.840 --> 01:32:17.560 we'll see a tool that adds the following. 01:32:17.560 --> 01:32:25.070 If I go to the course's web page and click on manual pages, 01:32:25.070 --> 01:32:27.190 you'll see the CS50 programmers manual, which 01:32:27.190 --> 01:32:29.560 is a simplified version of a very popular tool that's 01:32:29.560 --> 01:32:32.770 available on most computer systems that support programming. 01:32:32.770 --> 01:32:36.430 And suppose I want to do something like convert something to uppercase, 01:32:36.430 --> 01:32:37.730 I can search up there. 01:32:37.730 --> 01:32:39.820 And notice, there's a few functions available in C 01:32:39.820 --> 01:32:41.020 that relate to uppercase. 01:32:41.020 --> 01:32:44.470 Is upper, which asks a question, to lower and to upper. 01:32:44.470 --> 01:32:46.200 I'm going to go ahead and use to upper. 01:32:46.200 --> 01:32:47.860 I'm going to go ahead and use to upper. 
01:32:47.860 --> 01:32:51.300 And if I click on this, I'll see essentially the documentation for it. 01:32:51.300 --> 01:32:53.050 And it's a little cryptic at first glance. 01:32:53.050 --> 01:32:55.060 But what you're seeing in the documentation 01:32:55.060 --> 01:32:58.900 is its required header file and its prototype. 01:32:58.900 --> 01:33:02.441 What file do I apparently need to include to use to upper? 01:33:02.441 --> 01:33:03.316 AUDIENCE: [INAUDIBLE] 01:33:03.316 --> 01:33:04.649 DAVID MALAN: Yeah, ctype dot h. 01:33:04.649 --> 01:33:06.399 I don't really know what else is in there, 01:33:06.399 --> 01:33:08.410 but this is my hint that I should use that file. 01:33:08.410 --> 01:33:11.080 And what kind of input does to upper take? 01:33:11.080 --> 01:33:13.270 Well technically, it takes an int, for reasons that 01:33:13.270 --> 01:33:14.800 are explained in the documentation. 01:33:14.800 --> 01:33:17.110 But even if the documentation is not obvious, 01:33:17.110 --> 01:33:19.880 it turns out it's actually pretty easy to use. 01:33:19.880 --> 01:33:23.470 I'm going to go ahead and rip out most of this logic, 01:33:23.470 --> 01:33:28.480 and I'm just going to do this-- printf, percent c, to upper, 01:33:28.480 --> 01:33:31.180 s bracket i, semicolon. 01:33:31.180 --> 01:33:35.650 And up here, I'm going to go ahead and include ctype dot h, 01:33:35.650 --> 01:33:37.660 because in reading the documentation, I realize 01:33:37.660 --> 01:33:41.320 that oh, I can pass in any character to to upper, and if it's lowercase, 01:33:41.320 --> 01:33:44.800 it's going to return it in uppercase, and if it's not a lowercase letter, 01:33:44.800 --> 01:33:47.450 it's just going to return it unchanged.
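A sketch of the same conversion using `toupper` from `ctype.h` in place of the hand-written arithmetic. The helper `uppercase_copy` and its output buffer are assumptions added so the result can be inspected; the lecture's version simply prints each converted character.

```c
#include <ctype.h>
#include <string.h>

// Writes an uppercased copy of s into out, which must have room for
// strlen(s) + 1 bytes. Characters that are not lowercase letters are
// copied unchanged, because toupper returns them as-is.
void uppercase_copy(const char *s, char *out)
{
    int n = strlen(s);
    for (int i = 0; i < n; i++)
    {
        // toupper takes an int; the cast to unsigned char keeps the
        // argument in the range the function is defined for.
        out[i] = (char) toupper((unsigned char) s[i]);
    }
    out[n] = '\0';
}
```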
01:33:47.450 --> 01:33:51.400 So if I save this file now, make uppercase, and then rerun 01:33:51.400 --> 01:33:56.140 this program, this time typing in Emma's name again in lowercase, voila, 01:33:56.140 --> 01:33:59.230 I've now used another helper function, something someone else wrote. 01:33:59.230 --> 01:34:02.050 But you can imagine that all the person did 01:34:02.050 --> 01:34:04.150 who wrote this function for us is what? 01:34:04.150 --> 01:34:08.260 Like an if else, checking the Ascii mathematics to see 01:34:08.260 --> 01:34:11.575 if the character is indeed lowercase. 01:34:11.575 --> 01:34:14.270 Any questions then on this? 01:34:14.270 --> 01:34:18.610 Again, now the goal is to move away from caring about 32 or the Ascii codes 01:34:18.610 --> 01:34:21.520 and just using helper functions someone else wrote. 01:34:21.520 --> 01:34:22.202 Yeah? 01:34:22.202 --> 01:34:24.628 AUDIENCE: Why [INAUDIBLE] 01:34:24.628 --> 01:34:26.170 DAVID MALAN: Why do you not need to-- 01:34:26.170 --> 01:34:29.052 AUDIENCE: [INAUDIBLE] 01:34:29.052 --> 01:34:30.010 DAVID MALAN: The type-- 01:34:30.010 --> 01:34:31.900 Ah, why do you not need to declare the type of int. 01:34:31.900 --> 01:34:32.440 I am. 01:34:32.440 --> 01:34:35.440 This only works if it's the same type as i. 01:34:35.440 --> 01:34:36.577 Good question. 01:34:36.577 --> 01:34:39.410 So I get away with it because both i and n are meant to be integers. 01:34:39.410 --> 01:34:40.090 Yeah? 01:34:40.090 --> 01:34:44.010 AUDIENCE: [INAUDIBLE] 01:34:44.785 --> 01:34:46.410 DAVID MALAN: Are there any limitations? 01:34:46.410 --> 01:34:51.163 No, you may use any functions you want on CS50 problem sets, 01:34:51.163 --> 01:34:52.830 whether or not we've used them in class. 01:34:52.830 --> 01:34:54.913 That's certainly fine, unless otherwise specified, 01:34:54.913 --> 01:34:56.710 which will rarely be the case. 01:34:56.710 --> 01:34:58.023 So what else then can we do? 
01:34:58.023 --> 01:34:59.940 Well turns out, we've just empowered ourselves 01:34:59.940 --> 01:35:01.890 with a couple of new features, one of which 01:35:01.890 --> 01:35:04.140 is, again, called command line arguments. 01:35:04.140 --> 01:35:05.850 We've seen these before. 01:35:05.850 --> 01:35:09.420 What did I describe previously today and last week as a command line argument? 01:35:09.420 --> 01:35:11.700 What was an example? 01:35:11.700 --> 01:35:13.290 Anyone-- I heard here. 01:35:13.290 --> 01:35:14.010 AUDIENCE: Dash o. 01:35:14.010 --> 01:35:14.843 DAVID MALAN: Dash o. 01:35:14.843 --> 01:35:17.490 Remember that clang can have its default behavior, which 01:35:17.490 --> 01:35:20.520 was a little annoying, whereby it outputs a file called a dot out, 01:35:20.520 --> 01:35:25.170 overridden by saying dash o hello, or dash o anything, 01:35:25.170 --> 01:35:28.920 to change the output to a file of your choice. 01:35:28.920 --> 01:35:31.080 That was an example of a command line argument. 01:35:31.080 --> 01:35:33.917 You literally typed it after the command, on a line, 01:35:33.917 --> 01:35:36.750 and it's an argument in the sense that it's an input to the program. 01:35:36.750 --> 01:35:38.640 So a command line argument, more generally, 01:35:38.640 --> 01:35:43.110 is just one or more words that you type at the prompt after the program you 01:35:43.110 --> 01:35:44.370 care about running. 01:35:44.370 --> 01:35:46.110 So where are these germane here? 01:35:46.110 --> 01:35:51.300 Well finally, we can now explain a little more of what this canonical program 01:35:51.300 --> 01:35:52.110 is about. 01:35:52.110 --> 01:35:55.470 We already discussed earlier today that include standard io dot h. 01:35:55.470 --> 01:35:57.900 It just contains your prototypes for things like printf, 01:35:57.900 --> 01:36:01.110 and that gets copied and pasted during preprocessing into the file, 01:36:01.110 --> 01:36:02.290 and so forth.
01:36:02.290 --> 01:36:05.110 But what we've not explained yet, what void is here, 01:36:05.110 --> 01:36:06.330 let alone what int is here. 01:36:06.330 --> 01:36:10.080 We've just been copying and pasting this now for just over a week. 01:36:10.080 --> 01:36:15.570 Well it turns out, that in C, you do not need to write only the word void inside 01:36:15.570 --> 01:36:16.830 of those parentheses. 01:36:16.830 --> 01:36:21.000 You can also write, wonderfully, int arg c, string arg v, open bracket, 01:36:21.000 --> 01:36:22.110 close bracket. 01:36:22.110 --> 01:36:23.620 Now why is that compelling? 01:36:23.620 --> 01:36:25.470 Well notice there's a pattern here, and it's 01:36:25.470 --> 01:36:28.530 quite similar to my average function a moment ago. 01:36:28.530 --> 01:36:30.750 It takes two arguments main, apparently. 01:36:30.750 --> 01:36:34.360 One is an int, and one is what? 01:36:34.360 --> 01:36:35.680 It's not a string, per se. 01:36:35.680 --> 01:36:36.430 It's-- 01:36:36.430 --> 01:36:36.910 AUDIENCE: [INAUDIBLE] 01:36:36.910 --> 01:36:38.420 DAVID MALAN: --an array of strings. 01:36:38.420 --> 01:36:40.033 Now arg v is a human convention. 01:36:40.033 --> 01:36:41.950 It means argument vector, which is a fancy way 01:36:41.950 --> 01:36:44.590 of saying an array of arguments. 01:36:44.590 --> 01:36:47.920 And the way you know this is an array is by the fact that you have open bracket 01:36:47.920 --> 01:36:48.820 closed bracket. 01:36:48.820 --> 01:36:51.760 And it's an array of strings because to the left is the word string. 01:36:51.760 --> 01:36:54.430 This is just an old-school integer called int arg 01:36:54.430 --> 01:36:57.850 c, which stands for by convention, argument count. 01:36:57.850 --> 01:37:00.700 However, we could call these arguments anything we want. 
01:37:00.700 --> 01:37:03.550 Humans for decades have just called them arg c and arg v, 01:37:03.550 --> 01:37:06.670 just like my average function took in the length of an array 01:37:06.670 --> 01:37:10.690 and the number of scores inside of it. 01:37:10.690 --> 01:37:13.450 So what-- the actual scores inside of it. 01:37:13.450 --> 01:37:15.380 So what can we do with this information? 01:37:15.380 --> 01:37:17.560 Well it turns out, we can now write programs 01:37:17.560 --> 01:37:21.620 that take words from the human, not via get string, but at the actual command 01:37:21.620 --> 01:37:22.120 prompt. 01:37:22.120 --> 01:37:24.380 We can implement features, like clang has. 01:37:24.380 --> 01:37:27.430 So let me go ahead and write a program called arg v in a file 01:37:27.430 --> 01:37:28.900 called arg v dot c. 01:37:28.900 --> 01:37:33.070 Let me go ahead include the CS50 library. 01:37:33.070 --> 01:37:37.980 Let me go ahead and include standard Io dot h. 01:37:37.980 --> 01:37:38.830 Voila. 01:37:38.830 --> 01:37:43.000 Now let me go ahead and do int main not void, int arg 01:37:43.000 --> 01:37:47.140 c, string arg v, open brackets. 01:37:47.140 --> 01:37:50.350 So it's actually worse than it has been, but now it's useful. 01:37:50.350 --> 01:37:51.130 We'll see. 01:37:51.130 --> 01:37:53.180 And now I'm going to go ahead and do this. 01:37:53.180 --> 01:37:59.050 Let me go ahead and say if arg c equals two, 01:37:59.050 --> 01:38:02.630 that's going to mean that the human has typed two words at their prompt. 01:38:02.630 --> 01:38:07.630 And I'm going to go ahead and say this, hello percent s, new line, 01:38:07.630 --> 01:38:11.230 and then I'm going to plug in arg v bracket one, 01:38:11.230 --> 01:38:15.160 for reasons we'll soon see, else if arg c does not equal two, 01:38:15.160 --> 01:38:20.050 I'm just going to hard code this and say hello, world, backslash n. 01:38:20.050 --> 01:38:21.118 So what am I doing? 
01:38:21.118 --> 01:38:23.410 I'm trying to write a program that allows the human now 01:38:23.410 --> 01:38:26.200 to write their name at the command prompt, 01:38:26.200 --> 01:38:29.590 instead of waiting for the program to run and use get string [INAUDIBLE] 01:38:29.590 --> 01:38:31.120 like a blinking prompt. 01:38:31.120 --> 01:38:34.990 So what I can do now is this, make arg v. It compiles. 01:38:34.990 --> 01:38:37.570 Dot slash arg v, Enter. 01:38:37.570 --> 01:38:38.950 Hello, world. 01:38:38.950 --> 01:38:45.480 So presumably, what does arg c equal when I run it in that way? 01:38:45.480 --> 01:38:46.480 DAVID MALAN: Maybe one-- 01:38:46.480 --> 01:38:48.550 I mean, not two, at least, it stands to reason. 01:38:48.550 --> 01:38:50.800 It's not two, because I didn't see my own name. 01:38:50.800 --> 01:38:53.253 So if I go ahead and rerun it now, it would say David. 01:38:53.253 --> 01:38:54.670 What's it going to say, hopefully? 01:38:54.670 --> 01:38:56.950 Like, hello comma David? 01:38:56.950 --> 01:38:57.850 And indeed, it does. 01:38:57.850 --> 01:38:58.630 Why? 01:38:58.630 --> 01:39:01.300 Well when you run a program that you have written in C 01:39:01.300 --> 01:39:05.320 and you specify one or more words after your program's name, 01:39:05.320 --> 01:39:09.400 you are handed those words in an array, called arg v, 01:39:09.400 --> 01:39:13.700 and you are told how many words the human typed in arg c. 01:39:13.700 --> 01:39:19.000 So the clang program, the make program, help 50, style 50, check 50, 01:39:19.000 --> 01:39:21.610 all of the programs we've seen thus far that take words 01:39:21.610 --> 01:39:24.570 after the program's names, literally are implemented with code 01:39:24.570 --> 01:39:26.290 that's similar in spirit to this. 01:39:26.290 --> 01:39:28.990 Some programmer checked oh, did the human type any words? 01:39:28.990 --> 01:39:31.930 If so, maybe I want to output a different name than a dot out. 
01:39:31.930 --> 01:39:33.670 Maybe I want to output the name hello. 01:39:33.670 --> 01:39:36.190 When you run make something, well what do you want to make? 01:39:36.190 --> 01:39:40.120 That's a command line argument that the human programmer checked arg v for 01:39:40.120 --> 01:39:43.270 to know what program it is you want to make. 01:39:43.270 --> 01:39:47.090 So it's a simple idea, even though the syntax is admittedly pretty ugly. 01:39:47.090 --> 01:39:48.490 But it's the same idea. 01:39:48.490 --> 01:39:51.790 And the only two forms then, for main moving 01:39:51.790 --> 01:39:55.150 forward are either this new one, which lets you accept command line arguments, 01:39:55.150 --> 01:39:57.677 or the old one, which is when you know in advance I 01:39:57.677 --> 01:39:59.260 don't need any command line arguments. 01:39:59.260 --> 01:40:02.290 It's entirely up to you which to use, if you actually 01:40:02.290 --> 01:40:05.510 want to accept command line arguments. 01:40:05.510 --> 01:40:08.410 Now there's one last detail that we've not explained yet 01:40:08.410 --> 01:40:10.180 and that's this one here. 01:40:10.180 --> 01:40:13.030 Why the heck does main have a return value? 01:40:13.030 --> 01:40:15.280 And there's not really a super compelling reason here, 01:40:15.280 --> 01:40:18.160 but we can see that there's a low-level reason that this is useful, 01:40:18.160 --> 01:40:20.290 but it's not something to stress over much. 01:40:20.290 --> 01:40:25.030 It turns out that main by default in C does have a return value. 01:40:25.030 --> 01:40:29.110 And even though we have never returned anything from main yet, by default, 01:40:29.110 --> 01:40:30.990 main returns zero. 01:40:30.990 --> 01:40:34.090 Zero in computers typically means all is well. 01:40:34.090 --> 01:40:37.120 It's a little paradoxical, because you would think zero-- false-- bad. 01:40:37.120 --> 01:40:39.410 But no, zero tends to be good. 
01:40:39.410 --> 01:40:44.260 The reason for this is that main can return non-zero values, 01:40:44.260 --> 01:40:47.893 like one, or negative one, or 2 billion, or negative 2 billion. 01:40:47.893 --> 01:40:50.560 In fact, if you've ever seen an error message on your Mac or PC, 01:40:50.560 --> 01:40:52.477 sometimes there's a little window that pops up 01:40:52.477 --> 01:40:55.750 and it's a cryptic looking code, like an error has happened, negative 42, 01:40:55.750 --> 01:40:56.740 or whatever. 01:40:56.740 --> 01:40:59.680 That number is just an arbitrary number some human 01:40:59.680 --> 01:41:04.420 decided that their main program will return if something went wrong. 01:41:04.420 --> 01:41:07.220 And we can do this as follows. 01:41:07.220 --> 01:41:14.320 I can write a program like this in a file called exit dot c that has, 01:41:14.320 --> 01:41:20.735 say, the CS50 library, that includes standard Io dot h, int main void-- 01:41:20.735 --> 01:41:22.610 I'm going to go back to void, because I'm not 01:41:22.610 --> 01:41:26.290 going to take any-- or actually, no, I'm going to do int arg c, 01:41:26.290 --> 01:41:30.377 and then string arg v brackets, so I can take a command line argument, 01:41:30.377 --> 01:41:31.960 and I'm going to start to error check. 01:41:31.960 --> 01:41:34.370 Suppose this is a program for which the human is supposed 01:41:34.370 --> 01:41:36.160 to provide a command line argument. 01:41:36.160 --> 01:41:37.190 I'm going to do this. 01:41:37.190 --> 01:41:40.190 If arg c does not equal two, you know what I'm going to do? 01:41:40.190 --> 01:41:45.860 I'm going to yell at the user, say missing command line argument backslash 01:41:45.860 --> 01:41:48.320 n, but now I want to quit from the program. 01:41:48.320 --> 01:41:49.970 I want to do the equivalent of exit. 01:41:49.970 --> 01:41:51.770 So how do you do that in C? 01:41:51.770 --> 01:41:54.080 You actually return a value.
01:41:54.080 --> 01:41:57.150 And if all was well, you would return zero. 01:41:57.150 --> 01:42:00.470 However, if something went wrong, the sky's the limit, up to 2 billion 01:42:00.470 --> 01:42:01.500 or negative 2 billion. 01:42:01.500 --> 01:42:05.330 However, we'll keep it simple, and just return one, if something went wrong. 01:42:05.330 --> 01:42:11.210 Meanwhile, I might then say printf, hello, percent s. 01:42:11.210 --> 01:42:14.040 Type in arg v one, just as before. 01:42:14.040 --> 01:42:17.270 And then, if all is well, return zero. 01:42:17.270 --> 01:42:19.252 So not much new is happening here. 01:42:19.252 --> 01:42:20.960 This program is very similar to the last, 01:42:20.960 --> 01:42:24.680 except instead of saying hello world by default, I'm going to yell at the user 01:42:24.680 --> 01:42:26.540 with this, missing command line argument, 01:42:26.540 --> 01:42:31.040 and then return one to signal to the computer, this program did not succeed. 01:42:31.040 --> 01:42:34.670 And I'm going to return zero, if and only if, it did. 01:42:34.670 --> 01:42:35.405 Yeah? 01:42:35.405 --> 01:42:38.660 AUDIENCE: Why is arg c unequal to zero? 01:42:38.660 --> 01:42:42.990 DAVID MALAN: Why is arg c not equal-- really good question. 01:42:42.990 --> 01:42:46.070 So let me go ahead and change this. 01:42:46.070 --> 01:42:50.810 What is in arg v zero that makes it have two things instead of one, 01:42:50.810 --> 01:42:52.230 if I run David-- 01:42:52.230 --> 01:42:53.600 if I run my name, David. 01:42:53.600 --> 01:42:56.030 Well, hello-- let me recompile. 01:42:56.030 --> 01:43:01.630 Make arg v one, or make arg v, dot slash, arg v, hello-- 01:43:01.630 --> 01:43:02.780 no, wrong program. 01:43:02.780 --> 01:43:03.890 Make exit. 01:43:03.890 --> 01:43:05.600 Sorry. 01:43:05.600 --> 01:43:07.490 There's no program to detect that mistake. 01:43:07.490 --> 01:43:10.190 Dot slash exit, missing command line argument. 
01:43:10.190 --> 01:43:15.530 However, if I do exit David, now I see-- oh, did I run arg v before? 01:43:15.530 --> 01:43:16.310 Check the tape. 01:43:16.310 --> 01:43:17.450 Hello dot exit. 01:43:17.450 --> 01:43:21.110 So in arg v, the first word you type, the program's name, 01:43:21.110 --> 01:43:22.910 is stored at arg v zero. 01:43:22.910 --> 01:43:26.450 The second word you type, the first argument you care about, 01:43:26.450 --> 01:43:28.370 is in arg v one. 01:43:28.370 --> 01:43:29.690 And that's why arg c is two. 01:43:29.690 --> 01:43:32.648 I literally typed two words at the prompt, even though only one of them 01:43:32.648 --> 01:43:36.050 is technically an argument I care about. 01:43:36.050 --> 01:43:39.448 So where can we go from this? 01:43:39.448 --> 01:43:41.990 So we're going to use this now to solve a number of problems, 01:43:41.990 --> 01:43:43.407 that of readability, for instance. 01:43:43.407 --> 01:43:44.960 You might recall this paragraph here. 01:43:44.960 --> 01:43:45.860 Mr. And Mr. Durst-- 01:43:45.860 --> 01:43:47.660 "Mr. and Mrs. Dursley of number 4 Privet Drive 01:43:47.660 --> 01:43:50.490 were proud to say that they were perfectly normal, thank you very much. 01:43:50.490 --> 01:43:52.490 They were the last people you'd expect to be involved in anything 01:43:52.490 --> 01:43:55.790 strange or mysterious, because they just didn't hold with such nonsense," 01:43:55.790 --> 01:43:56.495 and so forth. 01:43:56.495 --> 01:43:59.120 So from the very first Harry Potter and the Philosopher's Stone, 01:43:59.120 --> 01:44:01.160 if you were to run the entirety of that book 01:44:01.160 --> 01:44:05.487 through a program written in C, that analyzes its readability, 01:44:05.487 --> 01:44:07.820 you would be informed that the grade level for that book 01:44:07.820 --> 01:44:09.260 is estimated at grade 7. 01:44:09.260 --> 01:44:13.760 So you can read it well and comfortably if you're a human in grade 7.
01:44:13.760 --> 01:44:15.090 Why is that the case? 01:44:15.090 --> 01:44:18.710 Well, the program, as is conventional in software, 01:44:18.710 --> 01:44:21.498 would analyze like the number of words in the sentence, 01:44:21.498 --> 01:44:24.290 the lengths of your words, how big the words are that you're using. 01:44:24.290 --> 01:44:26.082 There's a number of heuristics that are not 01:44:26.082 --> 01:44:31.272 perfectly correlated with readability, but they are-- 01:44:31.272 --> 01:44:33.230 they're not perfectly aligned with readability, 01:44:33.230 --> 01:44:35.220 but they do correlate with readability. 01:44:35.220 --> 01:44:37.303 So the bigger the words, the bigger the sentences, 01:44:37.303 --> 01:44:41.090 and more likely the older you should be to actually read that text effectively. 01:44:41.090 --> 01:44:42.670 Now something like this. 01:44:42.670 --> 01:44:45.080 "In computational linguistics, authorship attribution 01:44:45.080 --> 01:44:47.540 is the task of predicting the author of a document of unknown authorship. 01:44:47.540 --> 01:44:50.720 This task is generally performed by the analysis of stylometric features, 01:44:50.720 --> 01:44:52.280 particular characteristics of an author's writing 01:44:52.280 --> 01:44:54.655 that can be used to identify his or her works in contrast 01:44:54.655 --> 01:44:56.055 with the works of other authors." 01:44:56.055 --> 01:44:58.430 If you were to run that through the same program in C, 01:44:58.430 --> 01:45:00.138 otherwise known as Brian's senior thesis, 01:45:00.138 --> 01:45:04.610 you would get grade 16, because he uses a lot bigger words, longer sentences, 01:45:04.610 --> 01:45:06.110 more elegant prose.
01:45:06.110 --> 01:45:10.293 It turns out that this program in C to which I allude, will exist in a week, 01:45:10.293 --> 01:45:12.210 because for the first problem on the problem-- 01:45:12.210 --> 01:45:14.168 one of the problems on the problem set will you 01:45:14.168 --> 01:45:16.010 implement a readability analysis. 01:45:16.010 --> 01:45:19.040 But it all boils down to taking in text as inputs, such as Harry 01:45:19.040 --> 01:45:22.250 Potter or Brian's text, analyzing the lengths of the words, 01:45:22.250 --> 01:45:26.295 looking for the spaces, and so forth, and deciding how advanced that text is. 01:45:26.295 --> 01:45:28.295 But we're also going to challenge you with this, 01:45:28.295 --> 01:45:31.310 this notion of cryptography, the art of scrambling information 01:45:31.310 --> 01:45:32.570 to keep it private. 01:45:32.570 --> 01:45:35.450 And cryptography might work, just like in week zero, 01:45:35.450 --> 01:45:38.330 as having inputs and outputs, where the input is the message you 01:45:38.330 --> 01:45:40.410 want to send safely to someone else. 01:45:40.410 --> 01:45:43.452 The output is some kind of scrambled version thereof, the equivalent of, 01:45:43.452 --> 01:45:46.160 like in grade school, maybe writing a little love note to someone 01:45:46.160 --> 01:45:48.243 and passing it through the class to the recipient. 01:45:48.243 --> 01:45:50.452 And you don't want the teacher, if they intercept it, 01:45:50.452 --> 01:45:53.780 to be able to understand the message, so it's somehow scrambled or encrypted, 01:45:53.780 --> 01:45:54.710 so to speak. 01:45:54.710 --> 01:45:56.748 In cryptography, the input is called plaintext, 01:45:56.748 --> 01:45:58.290 and the output is called cipher text. 
01:45:58.290 --> 01:46:02.180 So if we were, for instance, to say something like hi exclamation point, 01:46:02.180 --> 01:46:05.540 recall that, that of course can be represented in Ascii as three numbers-- 01:46:05.540 --> 01:46:07.670 72, 73, and 33. 01:46:07.670 --> 01:46:10.130 Well, it turns out, if we want to send a fancier message, 01:46:10.130 --> 01:46:13.580 a longer one, we can just look at all of those numeric equivalents, 01:46:13.580 --> 01:46:16.520 do some mathematics on them, and effectively scramble them. 01:46:16.520 --> 01:46:17.600 But we need a key. 01:46:17.600 --> 01:46:21.140 You and I need to decide in advance, sender and recipient, what 01:46:21.140 --> 01:46:24.290 is the secret we're going to use to kind of jumble the letters up 01:46:24.290 --> 01:46:27.230 so as to encrypt it without a teacher or a classmate 01:46:27.230 --> 01:46:28.820 intercepting and decrypting it. 01:46:28.820 --> 01:46:32.750 Suppose, very simply and probably foolishly, our secret number is one. 01:46:32.750 --> 01:46:36.490 You and I both agree one is our secret and we're going to use one to scramble 01:46:36.490 --> 01:46:38.630 the information as follows.
01:46:38.630 --> 01:46:42.490 If I want to say, I love you, and send this across an insecure medium, 01:46:42.490 --> 01:46:44.620 like a roomful of people, well I might first 01:46:44.620 --> 01:46:47.500 convert each of these letters to their Ascii equivalents 01:46:47.500 --> 01:46:50.800 just by looking them up on AsciiChart.com or doing it in code, 01:46:50.800 --> 01:46:54.010 then I might go ahead and start adding one to each of those letters, 01:46:54.010 --> 01:46:57.040 because that is the secret on which you and I have agreed, 01:46:57.040 --> 01:46:59.020 and then I'll convert it back to the characters 01:46:59.020 --> 01:47:03.280 as by casting it from an int to a char so that the message I actually 01:47:03.280 --> 01:47:06.760 write on my piece of paper, or send in my program, looks like this. 01:47:06.760 --> 01:47:10.210 So that if a teacher or a classmate intercepts it, they see this, 01:47:10.210 --> 01:47:12.070 but you know, I love you. 01:47:12.070 --> 01:47:16.160 And so, with that said, will you be doing your readability and cryptography 01:47:16.160 --> 01:47:16.660 and more? 01:47:16.660 --> 01:47:20.010 That's it for week two, and we'll see you next time.