1 00:00:00,000 --> 00:01:22,668 2 00:01:22,668 --> 00:01:24,080 DAVID MALAN: All right. 3 00:01:24,080 --> 00:01:28,193 This is CS50, and today we look all the more underneath the hood, so to speak, 4 00:01:28,193 --> 00:01:30,860 of programming, which we've been doing the past couple of weeks, 5 00:01:30,860 --> 00:01:32,015 and of C in particular. 6 00:01:32,015 --> 00:01:33,890 And indeed, we're going to try to focus today 7 00:01:33,890 --> 00:01:37,410 in addition on some new programming techniques, really on first principles, 8 00:01:37,410 --> 00:01:40,100 so that what you've been seeing over the past couple of weeks 9 00:01:40,100 --> 00:01:42,720 no longer feels quite as much like magic. 10 00:01:42,720 --> 00:01:44,920 If you're sort of typing these magical incantations 11 00:01:44,920 --> 00:01:46,670 and you're not quite sure why things work, 12 00:01:46,670 --> 00:01:49,250 know that you will understand and appreciate 13 00:01:49,250 --> 00:01:52,788 all the more with practice and with application of these ideas, what 14 00:01:52,788 --> 00:01:53,580 it is you're doing. 15 00:01:53,580 --> 00:01:56,750 But today, we're going to go back to first principles, sort of week 0 16 00:01:56,750 --> 00:01:59,300 material, to make sure that you understand 17 00:01:59,300 --> 00:02:02,960 that what we're doing now in week 2 is little different from what we did back 18 00:02:02,960 --> 00:02:04,040 in week 0. 19 00:02:04,040 --> 00:02:07,164 So in fact, let's take a look at one of the first programs we saw in C, 20 00:02:07,164 --> 00:02:08,789 which was a little something like this. 21 00:02:08,789 --> 00:02:10,430 This is our source code, so to speak. 22 00:02:10,430 --> 00:02:13,130 There were a few salient characteristics from last week 23 00:02:13,130 --> 00:02:15,830 that dovetailed with the first week, week 0. 24 00:02:15,830 --> 00:02:18,740 And that was this thing called main, which is just the main function. 
25 00:02:18,740 --> 00:02:20,490 It's the main entry point to your program. 26 00:02:20,490 --> 00:02:23,210 It's the equivalent of Scratch's "when green flag clicked." 27 00:02:23,210 --> 00:02:25,880 This of course is an example of another function, 28 00:02:25,880 --> 00:02:28,730 one that comes with C that allows you to print on the screen. 29 00:02:28,730 --> 00:02:31,340 It can take inputs, at least one input here, 30 00:02:31,340 --> 00:02:35,720 which is typically a string in double quotes, like the message "hello world." 31 00:02:35,720 --> 00:02:38,900 But of course, in order to use printf in the first place, 32 00:02:38,900 --> 00:02:40,910 you needed this thing up here. 33 00:02:40,910 --> 00:02:45,650 And stdio.h represents what, as you understand it now? 34 00:02:45,650 --> 00:02:50,320 Any thoughts on what stdio.h is? 35 00:02:50,320 --> 00:02:50,820 Yeah? 36 00:02:50,820 --> 00:02:54,648 AUDIENCE: A library on how [INAUDIBLE]. 37 00:02:54,648 --> 00:02:56,690 DAVID MALAN: Yeah, it's a manifestation of what's 38 00:02:56,690 --> 00:02:59,910 called a library, code that someone else wrote years ago. 39 00:02:59,910 --> 00:03:02,780 Specifically, stdio.h is a header file. 40 00:03:02,780 --> 00:03:05,090 It's a file written in C but with a file extension 41 00:03:05,090 --> 00:03:10,070 ending in .h that, among other things, declares the prototype, 42 00:03:10,070 --> 00:03:13,290 so to speak, for printf so that Clang, when you're compiling your code, 43 00:03:13,290 --> 00:03:15,230 knows what printf actually is. 44 00:03:15,230 --> 00:03:17,100 And of course this little thing back here, 45 00:03:17,100 --> 00:03:20,460 you've probably now gotten in the habit of using: this \n is a new line. 46 00:03:20,460 --> 00:03:22,460 And it forces the cursor to go on the next line.
47 00:03:22,460 --> 00:03:26,300 So those were some of the uglier characteristics of code last week, 48 00:03:26,300 --> 00:03:29,000 and we'll tease apart int and void and a few other things 49 00:03:29,000 --> 00:03:31,140 over the course of today and beyond. 50 00:03:31,140 --> 00:03:35,030 So when you compile your code with clang hello.c, 51 00:03:35,030 --> 00:03:38,780 and then run that program, ./a.out, which you probably haven't done 52 00:03:38,780 --> 00:03:42,080 on your own since, because we gave you a simpler way to do this, 53 00:03:42,080 --> 00:03:45,650 that process was all about creating a file containing zeros and ones that 54 00:03:45,650 --> 00:03:48,410 the computer understands, called a.out, that you can run. 55 00:03:48,410 --> 00:03:50,780 Of course, a.out is a pretty stupid name for a program. 56 00:03:50,780 --> 00:03:53,030 It's hardly descriptive, even though it's the default. 57 00:03:53,030 --> 00:03:55,880 So for the next program we wrote and compiled, 58 00:03:55,880 --> 00:04:00,290 we used -o hello, which is a so-called command line argument to Clang. 59 00:04:00,290 --> 00:04:02,480 It's like an option it comes with that just lets you 60 00:04:02,480 --> 00:04:04,752 specify the name of the file to output. 61 00:04:04,752 --> 00:04:06,710 So you did this past week with the problem set, 62 00:04:06,710 --> 00:04:09,020 with a couple of programs you yourself wrote. 63 00:04:09,020 --> 00:04:13,376 But what is actually going on when you compile your code via that process? 64 00:04:13,376 --> 00:04:16,459 Well, it turns out that if we make this program a little more interesting, 65 00:04:16,459 --> 00:04:19,190 this becomes even more important with code like this. 66 00:04:19,190 --> 00:04:20,899 Now I've added a couple of lines of code. 67 00:04:20,899 --> 00:04:24,440 cs50.h, which is representative of the CS50 library.
68 00:04:24,440 --> 00:04:28,310 Again, code that other people wrote, in this case the staff some years ago, 69 00:04:28,310 --> 00:04:33,020 that declares prototypes, the one-liners, for functions 70 00:04:33,020 --> 00:04:36,590 like get_string so that you can use more features than came with C by default. 71 00:04:36,590 --> 00:04:39,840 And it has things like string itself, a data type. 72 00:04:39,840 --> 00:04:41,840 So get_string is declared in that file. 73 00:04:41,840 --> 00:04:45,260 name is, of course, a variable in which we stored my name last week. 74 00:04:45,260 --> 00:04:47,990 string is the type of variable in which we stored a name. 75 00:04:47,990 --> 00:04:51,680 And all of that is then outputted as hello, comma, something, 76 00:04:51,680 --> 00:04:53,990 where the %s, recall, was a placeholder, 77 00:04:53,990 --> 00:04:58,130 name is the variable we plugged in to that format code, and then all of that 78 00:04:58,130 --> 00:05:01,700 is possible because of cs50.h, which declares string and also 79 00:05:01,700 --> 00:05:03,120 gives us get_string. 80 00:05:03,120 --> 00:05:05,453 So that's a paradigm that's at the moment CS50 specific, 81 00:05:05,453 --> 00:05:07,787 but it's representative of any number of other functions 82 00:05:07,787 --> 00:05:10,220 we're going to start using today and in the weeks to come. 83 00:05:10,220 --> 00:05:12,780 The process now is going to be the same. 84 00:05:12,780 --> 00:05:17,327 However, when you compiled that program that used the CS50 library, 85 00:05:17,327 --> 00:05:20,160 you might recall, and you might have gotten hung up on this past week 86 00:05:20,160 --> 00:05:24,365 if you used Clang and not another program, you need this -lcs50, 87 00:05:24,365 --> 00:05:26,510 and you need it at the end just because. 88 00:05:26,510 --> 00:05:27,905 That's the way Clang expects it.
89 00:05:27,905 --> 00:05:29,780 This is a special flag that we'll tease apart 90 00:05:29,780 --> 00:05:34,290 in just a couple of minutes, an argument to Clang that tells it to link in, 91 00:05:34,290 --> 00:05:38,180 so to speak, link in all of the zeros and ones from CS50's library. 92 00:05:38,180 --> 00:05:40,108 But we'll see that in just a moment. 93 00:05:40,108 --> 00:05:41,900 This, of course, is how you should probably 94 00:05:41,900 --> 00:05:43,480 be compiling your code here on out. 95 00:05:43,480 --> 00:05:46,700 It's just super simple, but it automates everything we just 96 00:05:46,700 --> 00:05:49,310 saw more pedantically, step by step. 97 00:05:49,310 --> 00:05:51,870 So we've been compiling our code for the past week now, 98 00:05:51,870 --> 00:05:54,620 and we're going to keep doing that for next several weeks, until-- 99 00:05:54,620 --> 00:05:56,330 spoiler-- we get to Python, and you're not 100 00:05:56,330 --> 00:05:57,860 going to have to compile anything anymore. 101 00:05:57,860 --> 00:05:59,860 It's just going to happen automatically for you. 102 00:05:59,860 --> 00:06:03,710 But until then, compilation is actually kind of an oversimplification 103 00:06:03,710 --> 00:06:05,630 of what's been happening the past week. 104 00:06:05,630 --> 00:06:09,290 Turns out there's like actually four distinct steps that you all 105 00:06:09,290 --> 00:06:12,410 had been inducing by running Make or even by running 106 00:06:12,410 --> 00:06:14,180 Clang manually at the command prompt. 107 00:06:14,180 --> 00:06:16,460 And just so that, again, we can sort of understand 108 00:06:16,460 --> 00:06:18,930 what it is you are doing when you run these commands, 109 00:06:18,930 --> 00:06:21,380 let's go to first principles, understand these four steps, 110 00:06:21,380 --> 00:06:25,170 but then we'll move on just like in week 0 and stipulate, OK, I got that. 111 00:06:25,170 --> 00:06:27,690 I don't need to think at this low level after today. 
112 00:06:27,690 --> 00:06:30,925 But hopefully you'll understand from the bottom up these four steps. 113 00:06:30,925 --> 00:06:32,550 So let's take a look at pre-processing. 114 00:06:32,550 --> 00:06:35,580 This is a term of art in programming that refers to the following. 115 00:06:35,580 --> 00:06:38,260 When you have source code that looks like this, 116 00:06:38,260 --> 00:06:40,280 you have a couple of lines at the top that 117 00:06:40,280 --> 00:06:43,470 #include two files, two library files. 118 00:06:43,470 --> 00:06:47,040 Well, when you actually run Clang or you induce 119 00:06:47,040 --> 00:06:50,610 Clang to run by using make, what happens is those lines 120 00:06:50,610 --> 00:06:54,330 that start with the hash symbol are actually sort of replaced 121 00:06:54,330 --> 00:06:56,920 with the actual contents of that file. 122 00:06:56,920 --> 00:07:00,240 So instead of this code remaining #include cs50.h, 123 00:07:00,240 --> 00:07:03,600 literally what Clang does is go into cs50.h, 124 00:07:03,600 --> 00:07:06,990 grab the relevant lines of code, and essentially copy-paste them 125 00:07:06,990 --> 00:07:10,380 into your file, hello.c or whatever it's called. 126 00:07:10,380 --> 00:07:12,870 The next line here, stdio.h, similarly 127 00:07:12,870 --> 00:07:17,460 gets replaced with whatever the lines of code are in that file, stdio.h. 128 00:07:17,460 --> 00:07:21,030 Doesn't matter to us what they are, but they look a little something like this, 129 00:07:21,030 --> 00:07:22,740 though I've simplified on the slide here. 130 00:07:22,740 --> 00:07:25,573 And there's a whole bunch of other stuff above and below those lines 131 00:07:25,573 --> 00:07:27,330 certainly in those files. 132 00:07:27,330 --> 00:07:28,830 What then happens after that?
133 00:07:28,830 --> 00:07:31,110 Well, compiling, even though this is the word 134 00:07:31,110 --> 00:07:33,630 we use and we'll continue using to describe 135 00:07:33,630 --> 00:07:38,070 taking source code to machine code, it's actually a more precise step than that. 136 00:07:38,070 --> 00:07:40,980 When a computer-- when a program is compiled, 137 00:07:40,980 --> 00:07:44,340 it technically starts like this after having been pre-processed-- again, 138 00:07:44,340 --> 00:07:45,510 that was step 1. 139 00:07:45,510 --> 00:07:47,730 This code is then converted by a compiler, 140 00:07:47,730 --> 00:07:51,950 like Clang, to something that looks even scarier than C. This is something 141 00:07:51,950 --> 00:07:53,700 called assembly code, and you can actually 142 00:07:53,700 --> 00:07:55,530 take entire courses on assembly code. 143 00:07:55,530 --> 00:07:58,740 And it wasn't all that many decades ago that humans were manually 144 00:07:58,740 --> 00:08:02,400 programming code that looked like this, so it wasn't quite zeros and ones. 145 00:08:02,400 --> 00:08:04,560 But my god, C is looking pretty good now, 146 00:08:04,560 --> 00:08:06,810 if this is the alternative language back in the day. 147 00:08:06,810 --> 00:08:08,730 So this is an example of assembly language. 148 00:08:08,730 --> 00:08:11,460 But even though it's pretty arcane looking, 149 00:08:11,460 --> 00:08:13,680 if I highlight in yellow a few characteristics, 150 00:08:13,680 --> 00:08:15,390 there's some things that are familiar. 151 00:08:15,390 --> 00:08:16,520 Main is up here. 152 00:08:16,520 --> 00:08:17,730 Get string is down here. 153 00:08:17,730 --> 00:08:19,290 Printf is down here. 
154 00:08:19,290 --> 00:08:23,760 So when your code is compiled by Clang, it goes from your source code in C 155 00:08:23,760 --> 00:08:27,480 to this intermediate step assembly code, and that's just 156 00:08:27,480 --> 00:08:30,960 a little closer to what the CPU, the brain of your computer, 157 00:08:30,960 --> 00:08:31,980 actually understands. 158 00:08:31,980 --> 00:08:35,070 In fact, now highlighted in yellow are what are called instructions. 159 00:08:35,070 --> 00:08:38,070 So if you've ever heard of Intel or AMD or a bunch of companies 160 00:08:38,070 --> 00:08:40,500 that make CPUs, central processing units, 161 00:08:40,500 --> 00:08:43,679 the brains of a computer, what those CPUs understand 162 00:08:43,679 --> 00:08:47,310 is these very, very low level operations like this. 163 00:08:47,310 --> 00:08:50,520 And these relate to moving things around in memory and copying things 164 00:08:50,520 --> 00:08:52,770 and reading things and putting things onto the screen. 165 00:08:52,770 --> 00:08:55,227 But much more arcanely than C is. 166 00:08:55,227 --> 00:08:57,060 But again, we don't have to care about this, 167 00:08:57,060 --> 00:08:59,710 because Clang does all of this for us. 168 00:08:59,710 --> 00:09:02,040 But once you're at that point of having assembly code, 169 00:09:02,040 --> 00:09:04,582 you need to get it to machine code the actual zeros and ones. 170 00:09:04,582 --> 00:09:07,350 And that's where Clang does what's called assembling. 171 00:09:07,350 --> 00:09:10,950 There's another part of Clang, like some built-in functionality, that 172 00:09:10,950 --> 00:09:13,470 takes as input that assembly code and converts it 173 00:09:13,470 --> 00:09:17,530 from this to the zeros and ones that we talked about in week 0. 174 00:09:17,530 --> 00:09:21,510 But for a program like hello.c, which involved a few different files. 175 00:09:21,510 --> 00:09:25,470 For instance, this code again involved my code that we wrote last week. 
176 00:09:25,470 --> 00:09:28,650 It involves the CS50 library, which the staff wrote years ago. 177 00:09:28,650 --> 00:09:30,390 And it involves stdio.h. 178 00:09:30,390 --> 00:09:31,650 That's yet another file. 179 00:09:31,650 --> 00:09:36,000 That's like three different files that Clang frankly has to compile for you. 180 00:09:36,000 --> 00:09:39,630 Now it would be super tedious if we had to run Clang like three times 181 00:09:39,630 --> 00:09:41,310 to do all this compilation. 182 00:09:41,310 --> 00:09:42,300 Thankfully we don't. 183 00:09:42,300 --> 00:09:44,020 It all happens automatically. 184 00:09:44,020 --> 00:09:48,420 So the last step in compiling a program after it's been pre-processed, 185 00:09:48,420 --> 00:09:51,300 after it's been compiled, after it's been assembled, 186 00:09:51,300 --> 00:09:54,960 is to combine all of the zeros and ones from the files involved 187 00:09:54,960 --> 00:09:58,590 into one big file, like hello or a.out. 188 00:09:58,590 --> 00:10:03,360 So if hello.c started as source code, as did cs50.c, somewhere on the computer's 189 00:10:03,360 --> 00:10:07,410 hard drive, as did stdio.c, somewhere on the computer's hard drive, 190 00:10:07,410 --> 00:10:12,480 it turns out that printf is actually in its own file within stdio, 191 00:10:12,480 --> 00:10:13,380 the library. 192 00:10:13,380 --> 00:10:16,780 But these are the three files involved for the program I just described. 193 00:10:16,780 --> 00:10:19,020 So once we actually go ahead and assemble this one, 194 00:10:19,020 --> 00:10:20,812 it becomes a whole bunch of zeros and ones. 195 00:10:20,812 --> 00:10:23,062 We assemble this one, a whole bunch of zeros and ones. 196 00:10:23,062 --> 00:10:24,840 This one, a whole bunch of zeros and ones.
197 00:10:24,840 --> 00:10:26,880 That's like three separate files that then 198 00:10:26,880 --> 00:10:32,640 get linked together, sort of commingled, into one big file called Hello, 199 00:10:32,640 --> 00:10:34,480 or called a.out. 200 00:10:34,480 --> 00:10:36,512 And my god, like that's a lot of complexity. 201 00:10:36,512 --> 00:10:38,220 But that's what humans have been building 202 00:10:38,220 --> 00:10:41,090 and developing for the past many decades when it comes to writing software. 203 00:10:41,090 --> 00:10:43,320 Back in the day, it started off as zeros and ones. 204 00:10:43,320 --> 00:10:44,310 That was no fun. 205 00:10:44,310 --> 00:10:46,530 Assembly language, scary though it looks, 206 00:10:46,530 --> 00:10:50,010 was actually a little easier, a little more accessible for humans to write. 207 00:10:50,010 --> 00:10:51,810 But eventually we humans got tired of that, 208 00:10:51,810 --> 00:10:56,940 and thus were born languages like C and C++ and Python and PHP and Ruby 209 00:10:56,940 --> 00:10:57,790 and others. 210 00:10:57,790 --> 00:11:00,640 It's been an evolution of languages along the way. 211 00:11:00,640 --> 00:11:04,140 So this now we can just abstract away into compiling. 212 00:11:04,140 --> 00:11:07,337 When you compile your code, all of that stuff happens. 213 00:11:07,337 --> 00:11:09,420 But all we really care about at the end of the day 214 00:11:09,420 --> 00:11:12,360 is the input, your source code, the output as machine code. 215 00:11:12,360 --> 00:11:14,400 But those are the various steps happening. 216 00:11:14,400 --> 00:11:16,858 And if you ever see cryptic-looking commands on the screen, 217 00:11:16,858 --> 00:11:20,510 it might relate indeed to some of those intermediate steps. 218 00:11:20,510 --> 00:11:24,920 All right, any questions then on what compiling is or pre-processing, 219 00:11:24,920 --> 00:11:28,310 compiling, assembling, or linking? 220 00:11:28,310 --> 00:11:31,630 Anything at all? 
221 00:11:31,630 --> 00:11:32,660 All right. 222 00:11:32,660 --> 00:11:36,440 So beyond that, I'm sure you've encountered now, after just one 223 00:11:36,440 --> 00:11:38,310 week, bugs in your software. 224 00:11:38,310 --> 00:11:41,560 And in fact, one of the greatest skills you can acquire from programming class 225 00:11:41,560 --> 00:11:45,860 is not only how to write code, but how to debug code, most likely your own. 226 00:11:45,860 --> 00:11:48,560 And if you've ever wondered where this phrase comes from, 227 00:11:48,560 --> 00:11:52,240 this notion of debugging, so this is actually part of the mythology. 228 00:11:52,240 --> 00:11:55,550 So this is actually a notebook kept by Grace Hopper, 229 00:11:55,550 --> 00:11:59,600 a very famous computer scientist, working years ago with some colleagues 230 00:11:59,600 --> 00:12:01,142 on what was called the Mark 2 system. 231 00:12:01,142 --> 00:12:03,350 If you've ever walked through Harvard Science Center, 232 00:12:03,350 --> 00:12:06,470 there's a big part of a machine in the ground floor of the Science Center. 233 00:12:06,470 --> 00:12:08,230 That's the Mark 1, the precursor. 234 00:12:08,230 --> 00:12:10,490 Well, the Mark 2 at some point was discovered 235 00:12:10,490 --> 00:12:14,180 as having literally a bug inside of it, which was causing a problem. 236 00:12:14,180 --> 00:12:15,320 A moth of sorts. 237 00:12:15,320 --> 00:12:18,140 And Grace Hopper actually made this record here, if we zoom in, 238 00:12:18,140 --> 00:12:20,817 the first actual case of bug being found. 239 00:12:20,817 --> 00:12:23,150 And even though other people had used the expression bug 240 00:12:23,150 --> 00:12:26,030 before to refer to mistakes or problems in systems, 241 00:12:26,030 --> 00:12:30,070 this is really sort of the lore that folks in computer science look back on. 242 00:12:30,070 --> 00:12:34,730 So bugs are just mistakes in programs, things that you surely did not intend. 
243 00:12:34,730 --> 00:12:37,430 And we'll consider today now how we can empower you, 244 00:12:37,430 --> 00:12:41,420 much more so than this past week, to solve your own problems 245 00:12:41,420 --> 00:12:44,083 and actually debug your software. 246 00:12:44,083 --> 00:12:46,250 So what are the mechanisms via which we can do this? 247 00:12:46,250 --> 00:12:49,682 So help50 is one of the tools that CS50 itself provides you with. 248 00:12:49,682 --> 00:12:51,890 And let's go ahead and take a look at a quick example 249 00:12:51,890 --> 00:12:54,570 that allows us to use this tool. 250 00:12:54,570 --> 00:12:57,198 I'm going to go ahead and open up my CS50 Sandbox here. 251 00:12:57,198 --> 00:12:59,240 I'm going to go ahead and create a program called 252 00:12:59,240 --> 00:13:03,360 buggy0.c, knowing in advance that I'm going to make a mistake here. 253 00:13:03,360 --> 00:13:07,560 And I'm going to go ahead and do int main(void), as all of my programs begin. 254 00:13:07,560 --> 00:13:12,740 And I'm going to go ahead and do printf hello world backslash n semicolon. 255 00:13:12,740 --> 00:13:14,720 All right, so that's buggy0.c. 256 00:13:14,720 --> 00:13:16,970 And again, even though I could run the Clang commands, 257 00:13:16,970 --> 00:13:19,220 henceforth I'm just going to run things like make. 258 00:13:19,220 --> 00:13:21,170 So make buggy0, Enter. 259 00:13:21,170 --> 00:13:23,190 And all right, here's the first of my errors. 260 00:13:23,190 --> 00:13:25,520 Let me just increase the size of my terminal window, 261 00:13:25,520 --> 00:13:29,750 focusing as always, always on the first error, which is the one in red here. 262 00:13:29,750 --> 00:13:34,290 Implicitly declaring library function printf with type int const char *w, 263 00:13:34,290 --> 00:13:34,790 error-- 264 00:13:34,790 --> 00:13:35,957 I mean, there's a lot there.
265 00:13:35,957 --> 00:13:39,560 There's a lot to digest, even though by now, you might recognize at least some 266 00:13:39,560 --> 00:13:40,370 of these symbols. 267 00:13:40,370 --> 00:13:43,700 But suppose you don't, and you want help understanding this message. 268 00:13:43,700 --> 00:13:46,700 Short of asking a human for help, someone who's more familiar, 269 00:13:46,700 --> 00:13:48,080 you can instead do this. 270 00:13:48,080 --> 00:13:52,760 Rerun the same command as before, but prefix it with help50 and hit Enter. 271 00:13:52,760 --> 00:13:55,730 And what will happen is we will run make for you again. 272 00:13:55,730 --> 00:13:59,340 We will look at the output of make, cryptic though it might be to you, 273 00:13:59,340 --> 00:14:03,500 run it through our own help50 software and look for messages we understand. 274 00:14:03,500 --> 00:14:07,157 And if we recognize one of the error messages in your output, 275 00:14:07,157 --> 00:14:08,990 we're going to highlight in yellow a message 276 00:14:08,990 --> 00:14:12,620 like this-- buggy0.c:3:5: error: 277 00:14:12,620 --> 00:14:16,730 implicitly declaring library function printf with type, dot, dot, dot. 278 00:14:16,730 --> 00:14:18,803 Did you forget to include stdio.h, 279 00:14:18,803 --> 00:14:20,970 in which printf is declared, at the top of your file? 280 00:14:20,970 --> 00:14:23,540 So that's, in this case, the exact answer. 281 00:14:23,540 --> 00:14:25,788 And so now, you'll just see that not only 282 00:14:25,788 --> 00:14:28,580 are we still showing you the error, we're highlighting where it is. 283 00:14:28,580 --> 00:14:33,280 And in fact, buggy0.c, line 3, character 5, or column 5, 284 00:14:33,280 --> 00:14:37,010 is just one way of now homing in on what the issue is.
285 00:14:37,010 --> 00:14:43,300 Let me go ahead and open up another file here, or enhance this as 286 00:14:43,300 --> 00:14:47,520 buggy1.c, and make a similar mistake, but one that triggers a different error 287 00:14:47,520 --> 00:14:48,020 message. 288 00:14:48,020 --> 00:14:50,728 In this case, I'm going to go ahead and get this right this time, 289 00:14:50,728 --> 00:14:53,420 include stdio.h. 290 00:14:53,420 --> 00:14:56,925 And then I'm going to go ahead and do int main void, and then just as before, 291 00:14:56,925 --> 00:14:58,550 I'm going to do this canonical program. 292 00:14:58,550 --> 00:15:00,620 string name gets get_string. 293 00:15:00,620 --> 00:15:03,890 And ask the user, what's your name-- 294 00:15:03,890 --> 00:15:05,150 backslash, n. 295 00:15:05,150 --> 00:15:10,850 And then I'm going to go ahead and say hello to them with a %s comma name. 296 00:15:10,850 --> 00:15:12,480 So that too looks good. 297 00:15:12,480 --> 00:15:16,760 I'm going to go ahead and scroll back up here, do make buggy1 this time. 298 00:15:16,760 --> 00:15:20,030 But of course, it looks like, my god, as before, I have two lines of code, 299 00:15:20,030 --> 00:15:21,740 yet somehow, five or six errors. 300 00:15:21,740 --> 00:15:23,120 Always focus on the top. 301 00:15:23,120 --> 00:15:27,180 So it probably relates to something like this, but this one's more confusing. 302 00:15:27,180 --> 00:15:29,960 Use of undeclared identifier string-- did you mean stdin? 303 00:15:29,960 --> 00:15:31,040 Well, no. 304 00:15:31,040 --> 00:15:34,310 So if you don't quite grok that, go ahead and run the same command, 305 00:15:34,310 --> 00:15:36,360 help50 make buggy1. 306 00:15:36,360 --> 00:15:38,960 And this time, we'll see in the output of this command, 307 00:15:38,960 --> 00:15:42,470 hopefully, after asking for help, a clue as to what 308 00:15:42,470 --> 00:15:44,840 it is that we're actually looking for.
309 00:15:44,840 --> 00:15:47,780 And indeed, now we notice that oh, by undeclared identifier, 310 00:15:47,780 --> 00:15:50,495 clang means you've used a name string on line five of 311 00:15:50,495 --> 00:15:52,040 buggy1.c, which hasn't been defined. 312 00:15:52,040 --> 00:15:55,400 Did you forget to include cs50.h, at this point? 313 00:15:55,400 --> 00:15:58,550 So in short, anytime you're having a problem running a command 314 00:15:58,550 --> 00:16:02,270 and you're seeing cryptic messages, reach for help50 as a command 315 00:16:02,270 --> 00:16:04,100 for actually explaining it to you. 316 00:16:04,100 --> 00:16:08,122 And thereafter, probably you won't have to run that same command again. 317 00:16:08,122 --> 00:16:09,080 But what about another? 318 00:16:09,080 --> 00:16:12,950 Let me go ahead and open up a program I wrote in advance here, 319 00:16:12,950 --> 00:16:17,250 and go ahead and open this one. 320 00:16:17,250 --> 00:16:17,750 Yeah? 321 00:16:17,750 --> 00:16:18,565 Sure. 322 00:16:18,565 --> 00:16:23,925 AUDIENCE: [INAUDIBLE] just press more buttons. 323 00:16:23,925 --> 00:16:25,550 DAVID MALAN: To rerun the same command? 324 00:16:25,550 --> 00:16:27,890 AUDIENCE: Not to delete that, but to [INAUDIBLE] 325 00:16:27,890 --> 00:16:30,560 DAVID MALAN: Oh, yes, so just to keep things neat in class, 326 00:16:30,560 --> 00:16:32,600 I'm in the habit of hitting Control-L a lot, 327 00:16:32,600 --> 00:16:34,460 which just clears my terminal window. 328 00:16:34,460 --> 00:16:35,630 It has no functional impact. 329 00:16:35,630 --> 00:16:37,490 It just gets the clutter off of the screen. 330 00:16:37,490 --> 00:16:40,515 You can also literally type, for instance, clear, Enter. 331 00:16:40,515 --> 00:16:42,890 That's just a little more verbose than hitting Control-L.
332 00:16:42,890 --> 00:16:45,932 So there's a lot of little keyboard shortcuts, and interrupt at any point 333 00:16:45,932 --> 00:16:47,520 if you have questions about those. 334 00:16:47,520 --> 00:16:49,670 So here's a program that also is buggy. 335 00:16:49,670 --> 00:16:52,160 I wrote it in advance, and it's called buggy2.c. 336 00:16:52,160 --> 00:16:53,180 It's got a for loop. 337 00:16:53,180 --> 00:16:54,590 It's printing some hashes. 338 00:16:54,590 --> 00:16:58,010 And the goal of this program is to print something 10 times. 339 00:16:58,010 --> 00:17:00,560 So I've got my for loop from zero on up to 10. 340 00:17:00,560 --> 00:17:02,870 I'm printing a hash with a backslash n. 341 00:17:02,870 --> 00:17:06,450 So let's go ahead and run this, make buggy2. 342 00:17:06,450 --> 00:17:06,950 Oops. 343 00:17:06,950 --> 00:17:08,033 I'm not in this directory. 344 00:17:08,033 --> 00:17:10,609 Let me go ahead and make buggy2-- 345 00:17:10,609 --> 00:17:11,510 seems to compile. 346 00:17:11,510 --> 00:17:14,089 So this is not a problem for help50 yet, 347 00:17:14,089 --> 00:17:17,030 because that would be when the command itself isn't working. 348 00:17:17,030 --> 00:17:19,880 ./buggy2-- all right, it looks good, but let's 349 00:17:19,880 --> 00:17:23,798 just be super sure-- one, two, three, four, five, six, seven, eight, nine, 350 00:17:23,798 --> 00:17:26,000 10, 11. 351 00:17:26,000 --> 00:17:28,550 So it is flawed, if my goal is to print just 10 hashes. 352 00:17:28,550 --> 00:17:30,230 And obviously, this is very contrived. 353 00:17:30,230 --> 00:17:32,960 Odds are, you can just reason through what the problem here is, 354 00:17:32,960 --> 00:17:36,320 but this is representative of another type of problem 355 00:17:36,320 --> 00:17:41,870 that's not a bug syntactically, whereby you typed some wrong symbol or command. 356 00:17:41,870 --> 00:17:43,370 This is more of a logical error.
357 00:17:43,370 --> 00:17:45,500 My goal is to print something 10 times. 358 00:17:45,500 --> 00:17:46,370 It's obviously not. 359 00:17:46,370 --> 00:17:47,787 It's printing something 11 times. 360 00:17:47,787 --> 00:17:50,370 And suppose that the goal at hand is to wrap your mind around, 361 00:17:50,370 --> 00:17:51,960 why is that happening? 362 00:17:51,960 --> 00:17:55,280 Well, the next debugging tool that we'll propose that you consider 363 00:17:55,280 --> 00:17:57,170 is actually quite simply printf. 364 00:17:57,170 --> 00:18:00,980 It's perhaps the simplest tool you can use to actually understand 365 00:18:00,980 --> 00:18:04,190 what's going on inside of your program, and we might use it in this case 366 00:18:04,190 --> 00:18:05,240 as follows. 367 00:18:05,240 --> 00:18:08,023 I'm obviously printing out already the hash symbol, 368 00:18:08,023 --> 00:18:10,940 but let me go ahead and say something more deliberate, just to myself, 369 00:18:10,940 --> 00:18:18,830 something like i is now, %i, and then let's go ahead and just put a space, 370 00:18:18,830 --> 00:18:21,690 and then in there, output i semicolon. 371 00:18:21,690 --> 00:18:23,460 So this is not the goal of the program. 372 00:18:23,460 --> 00:18:25,400 It's just a temporary diagnostic message, 373 00:18:25,400 --> 00:18:28,850 so that now, if I go ahead and increase my terminal window, 374 00:18:28,850 --> 00:18:33,620 recompile buggy2, and rerun ./buggy2-- 375 00:18:33,620 --> 00:18:35,930 [LAUGHS] buffy two-- 376 00:18:35,930 --> 00:18:41,000 buggy2-- I'll now see, oh, a little more interesting information. 377 00:18:41,000 --> 00:18:44,600 Not only am I still seeing the hashes, I'm now seeing, in real time, 378 00:18:44,600 --> 00:18:45,617 the value of i.
379 00:18:45,617 --> 00:18:47,450 And now, it should probably jump out at you, 380 00:18:47,450 --> 00:18:50,210 if it didn't already in the for loop alone, what's 381 00:18:50,210 --> 00:18:53,256 the mistake I've made in my code? 382 00:18:53,256 --> 00:18:54,613 AUDIENCE: [INAUDIBLE] 383 00:18:54,613 --> 00:18:55,571 DAVID MALAN: Say again. 384 00:18:55,571 --> 00:18:58,220 AUDIENCE: [INAUDIBLE] 385 00:18:58,220 --> 00:19:02,100 DAVID MALAN: Yeah, my first value for i was zero, and that's normally OK. 386 00:19:02,100 --> 00:19:04,100 Programmers do tend to start counting from zero, 387 00:19:04,100 --> 00:19:07,730 but if you do that, you can't keep counting through 10. 388 00:19:07,730 --> 00:19:09,860 You have to make a couple of tweaks here. 389 00:19:09,860 --> 00:19:11,462 So what can we do to fix it? 390 00:19:11,462 --> 00:19:15,457 AUDIENCE: [INAUDIBLE] 391 00:19:15,457 --> 00:19:18,290 DAVID MALAN: Yeah, so this would be the canonical way of doing this. 392 00:19:18,290 --> 00:19:20,480 It's not the only way, but generally start at zero 393 00:19:20,480 --> 00:19:23,330 and go up to less than the value you care about. 394 00:19:23,330 --> 00:19:27,290 So now if I rerun this, I can go ahead and run make buggy two again, 395 00:19:27,290 --> 00:19:30,860 clear my screen, dot slash buggy two, Enter. 396 00:19:30,860 --> 00:19:33,557 And now I indeed have 10, even though it never says 10, 397 00:19:33,557 --> 00:19:35,390 but that's OK, because I'm starting at zero, 398 00:19:35,390 --> 00:19:38,330 and now that I found my logical error, where it's just not 399 00:19:38,330 --> 00:19:41,810 working as I intended, now I can go ahead and delete that line. 400 00:19:41,810 --> 00:19:47,000 I can go ahead and make buggy two once more, dot slash buggy two, Enter. 401 00:19:47,000 --> 00:19:51,140 And voila, I can now submit my program, or ship it out to my actual user.
402 00:19:51,140 --> 00:19:53,652 So printf is sort of a very old-school way 403 00:19:53,652 --> 00:19:56,360 of just wrapping your mind around what's going on in your program 404 00:19:56,360 --> 00:19:57,530 by just poking around. 405 00:19:57,530 --> 00:20:00,870 Use printf to see what's going on inside of your program, 406 00:20:00,870 --> 00:20:03,740 so you're not just staring at a screen trying to reason through 407 00:20:03,740 --> 00:20:05,810 without the help of the computer. 408 00:20:05,810 --> 00:20:09,405 But of course, that's about as versatile as CS50 sandbox 409 00:20:09,405 --> 00:20:11,030 gets when it comes to solving problems. 410 00:20:11,030 --> 00:20:12,500 You can write code up here. 411 00:20:12,500 --> 00:20:14,630 You can compile and run code down here. 412 00:20:14,630 --> 00:20:16,850 And there are commands like help 50 and a few others 413 00:20:16,850 --> 00:20:19,400 we'll see that you can run to improve your code, 414 00:20:19,400 --> 00:20:22,215 but the sandbox itself is actually pretty limited. 415 00:20:22,215 --> 00:20:25,340 And so today, we're going to introduce another programming environment that 416 00:20:25,340 --> 00:20:29,720 fundamentally is the same thing, it just has additional features, particularly 417 00:20:29,720 --> 00:20:31,880 ones related to debugging. 418 00:20:31,880 --> 00:20:36,200 So here now, is what is called CS50 IDE. 419 00:20:36,200 --> 00:20:39,113 IDE is a term of art for integrated development environment. 420 00:20:39,113 --> 00:20:41,030 You might have used one if you programmed before 421 00:20:41,030 --> 00:20:44,270 in high school, things like Eclipse or Visual Studio or NetBeans 422 00:20:44,270 --> 00:20:45,830 or a bunch of other tools as well. 423 00:20:45,830 --> 00:20:47,420 If you've never used any of these tools, that's fine. 424 00:20:47,420 --> 00:20:48,590 Most students have not.
425 00:20:48,590 --> 00:20:52,940 But CS50 IDE is just sort of a fancier version of CS50 sandbox 426 00:20:52,940 --> 00:20:56,420 that adds some additional tools, like debugging tools. 427 00:20:56,420 --> 00:21:00,140 And so here I've gone ahead and logged in advance to CS50 IDE, 428 00:21:00,140 --> 00:21:01,970 and it's pretty much the same layout. 429 00:21:01,970 --> 00:21:05,660 On the top of the window is where my tabs with my code will go. 430 00:21:05,660 --> 00:21:07,220 On the bottom is my terminal window. 431 00:21:07,220 --> 00:21:10,530 It happens to be blue instead of black, but that's just an aesthetic detail. 432 00:21:10,530 --> 00:21:13,108 But you'll see a teaser over here of other features, 433 00:21:13,108 --> 00:21:16,400 including what's called the debugger, a program that's going to let me actually 434 00:21:16,400 --> 00:21:19,470 step through my code, step by step. 435 00:21:19,470 --> 00:21:21,710 So let's go ahead and do this after introducing 436 00:21:21,710 --> 00:21:25,850 one other command that exists in the IDE, and that's called debug 50. 437 00:21:25,850 --> 00:21:28,790 Suffice it to say, that any command this semester that ends in 50 438 00:21:28,790 --> 00:21:30,958 is a training wheel of sorts that's CS50 specific. 439 00:21:30,958 --> 00:21:32,750 But by term's end, we'll have essentially 440 00:21:32,750 --> 00:21:36,980 taken away all of those CS50 specific tools so that everything you're using 441 00:21:36,980 --> 00:21:39,660 is industry standard, so to speak. 442 00:21:39,660 --> 00:21:46,110 So if we look now at CS50 IDE, let's go ahead and maybe run that same program. 443 00:21:46,110 --> 00:21:49,520 So if I click this folder icon up here, you'll see a whole bunch of files, 444 00:21:49,520 --> 00:21:50,810 just like in the sandbox.
445 00:21:50,810 --> 00:21:53,810 And I've pre-downloaded all of today's source code from CS50's website 446 00:21:53,810 --> 00:21:56,960 and just uploaded it to the IDE, just like you can in the sandbox. 447 00:21:56,960 --> 00:22:00,780 And we'll do this in section or in super section, manually, if you'd like. 448 00:22:00,780 --> 00:22:03,950 I'm going to go ahead and open up that same program buggy two, that's 449 00:22:03,950 --> 00:22:06,053 now in the IDE instead of the sandbox, and you'll 450 00:22:06,053 --> 00:22:07,470 see it looks pretty much the same. 451 00:22:07,470 --> 00:22:09,060 The color coding might be a little different, 452 00:22:09,060 --> 00:22:10,648 but that's just an aesthetic detail. 453 00:22:10,648 --> 00:22:11,690 And I can still run this. 454 00:22:11,690 --> 00:22:14,540 Make buggy two down here. 455 00:22:14,540 --> 00:22:19,340 But notice here, this error, I could use help 50 on this, but notice in advance, 456 00:22:19,340 --> 00:22:22,490 I've downloaded all of my code into a folder called source two. 457 00:22:22,490 --> 00:22:25,100 That's what's in the zip file, on the course's website. 458 00:22:25,100 --> 00:22:29,960 So again, just like we did briefly last week, if you know your code is not just 459 00:22:29,960 --> 00:22:32,420 in the default location, but is in another directory, 460 00:22:32,420 --> 00:22:34,005 what does cd stand for? 461 00:22:34,005 --> 00:22:35,250 AUDIENCE: Change directory. 462 00:22:35,250 --> 00:22:35,600 DAVID MALAN: OK. 463 00:22:35,600 --> 00:22:37,225 So change directory-- so not that hard. 464 00:22:37,225 --> 00:22:38,210 It changes directory. 465 00:22:38,210 --> 00:22:39,980 And now notice what the IDE does. 466 00:22:39,980 --> 00:22:42,950 It's a little more powerful, even though it's a little more cryptic.
467 00:22:42,950 --> 00:22:45,200 It always puts a constant reminder of where 468 00:22:45,200 --> 00:22:48,800 you are in the folders in your IDE, whereas the sandbox hid 469 00:22:48,800 --> 00:22:49,880 this detail altogether. 470 00:22:49,880 --> 00:22:52,547 So again, we're removing a training wheel by just reminding you, 471 00:22:52,547 --> 00:22:55,730 you are in source two and the tilde is just a computer convention, 472 00:22:55,730 --> 00:22:57,560 meaning that is your home directory, that 473 00:22:57,560 --> 00:23:02,450 is your personal folder with your CS50 files, demarcated with just a tilde. 474 00:23:02,450 --> 00:23:05,150 So now I'm going to go ahead and do make buggy two. 475 00:23:05,150 --> 00:23:08,130 It does compile, because again, this is not a syntax error. 476 00:23:08,130 --> 00:23:09,660 This is a logical problem. 477 00:23:09,660 --> 00:23:12,480 I'm going to go ahead now and dot slash buggy two. 478 00:23:12,480 --> 00:23:16,550 And if I count these up, I've still got 11 hashes on the screen. 479 00:23:16,550 --> 00:23:18,800 So I could go in and add printf, but that's not really 480 00:23:18,800 --> 00:23:20,790 taking advantage of any new tools. 481 00:23:20,790 --> 00:23:22,610 But watch what I can instead do. 482 00:23:22,610 --> 00:23:26,310 Let me scroll this down just a little bit so I can see all of my code. 483 00:23:26,310 --> 00:23:31,700 Let me go ahead and click to the left of the line numbers in the IDE, 484 00:23:31,700 --> 00:23:35,360 like in main, and it puts a red dot, like a stop sign that says stop here. 485 00:23:35,360 --> 00:23:37,040 This is what's called a breakpoint.
486 00:23:37,040 --> 00:23:39,950 This is a feature of a lot of integrated development environments, 487 00:23:39,950 --> 00:23:42,830 like CS50 IDE that's telling the computer in advance, 488 00:23:42,830 --> 00:23:45,590 when I run this program, don't just run it like usual, 489 00:23:45,590 --> 00:23:50,450 stop there, and allow me, the human, to step through my code, step 490 00:23:50,450 --> 00:23:52,220 by step by step. 491 00:23:52,220 --> 00:23:55,880 So to do this, you do not just run buggy two again. 492 00:23:55,880 --> 00:23:58,340 You instead run debug 50. 493 00:23:58,340 --> 00:24:02,310 So just like help 50 helps you understand error messages, debug 50 494 00:24:02,310 --> 00:24:05,930 lets you walk through your program step by step by step. 495 00:24:05,930 --> 00:24:07,550 So let me go ahead and hit Enter. 496 00:24:07,550 --> 00:24:10,760 You'll notice now on the right-hand side a new window 497 00:24:10,760 --> 00:24:12,523 that the sandbox did not have opened up. 498 00:24:12,523 --> 00:24:15,690 And there's a lot going on there, but we'll soon see the pieces that matter. 499 00:24:15,690 --> 00:24:16,980 That is the debugger. 500 00:24:16,980 --> 00:24:19,560 And you'll see that this line here, line seven, 501 00:24:19,560 --> 00:24:23,192 is highlighted, because that's the first real piece of code inside of main 502 00:24:23,192 --> 00:24:24,900 that's potentially going to get executed. 503 00:24:24,900 --> 00:24:26,775 Nothing really happens with the curly braces. 504 00:24:26,775 --> 00:24:28,630 Seven is the first real line of code. 505 00:24:28,630 --> 00:24:30,450 So what this yellow or greenish bar means 506 00:24:30,450 --> 00:24:34,500 is that the debugger has paused your program at that moment in time, 507 00:24:34,500 --> 00:24:38,460 has not run all the way through, so we can start to poke around. 
508 00:24:38,460 --> 00:24:41,790 And in fact, if I zoom in on the right, let's focus today 509 00:24:41,790 --> 00:24:46,890 pretty much on variables, you'll notice a nice little visual clue 510 00:24:46,890 --> 00:24:48,810 that you have a variable called i. 511 00:24:48,810 --> 00:24:50,430 At the moment, its value is zero. 512 00:24:50,430 --> 00:24:51,570 What is its type? 513 00:24:51,570 --> 00:24:52,740 Integer. 514 00:24:52,740 --> 00:24:56,400 So watch what happens now when I take advantage of some of the icons 515 00:24:56,400 --> 00:24:57,688 that are slightly higher up. 516 00:24:57,688 --> 00:25:00,480 I'm just going to scroll up on the debugger, and most of this we'll 517 00:25:00,480 --> 00:25:03,010 ignore for today, but there's some icons here. 518 00:25:03,010 --> 00:25:05,730 So if I were to hit Play, that will just resume my program 519 00:25:05,730 --> 00:25:07,950 and run it all the way to the end-- not very useful 520 00:25:07,950 --> 00:25:09,610 if my goal was to step through it. 521 00:25:09,610 --> 00:25:13,530 But if you hover over these other icons instead, step over, 522 00:25:13,530 --> 00:25:17,320 this will step over one line of code at a time, 523 00:25:17,320 --> 00:25:19,740 and execute it one by one by one, so literally 524 00:25:19,740 --> 00:25:21,960 allowing you to walk through your own code. 525 00:25:21,960 --> 00:25:23,010 And so let's try this. 526 00:25:23,010 --> 00:25:27,570 When I go ahead and click Step Over, notice that the color moves. 527 00:25:27,570 --> 00:25:30,600 Watch my terminal window now, the big blue window at the bottom. 528 00:25:30,600 --> 00:25:31,980 I'm going to see hash. 529 00:25:31,980 --> 00:25:33,990 Now notice that line seven is highlighted again, 530 00:25:33,990 --> 00:25:35,400 because just with a for loop, something's 531 00:25:35,400 --> 00:25:37,020 going to happen again and again. 532 00:25:37,020 --> 00:25:41,274 So what should we see happen though when I click step over once more? 
533 00:25:41,274 --> 00:25:42,215 AUDIENCE: [INAUDIBLE] 534 00:25:42,215 --> 00:25:43,590 DAVID MALAN: i should become one. 535 00:25:43,590 --> 00:25:46,757 So it's a little small, but watch the right-hand side of the screen where it 536 00:25:46,757 --> 00:25:49,650 says variable i, and I click Step Over-- 537 00:25:49,650 --> 00:25:51,630 voila, now we see one. 538 00:25:51,630 --> 00:25:54,840 And if I continue doing this, not much of interest really happens. 539 00:25:54,840 --> 00:25:57,990 I've just really slowed down the same program. 540 00:25:57,990 --> 00:26:01,590 But you'll notice that i is incrementing again and again and again. 541 00:26:01,590 --> 00:26:03,690 But what's interesting here is I didn't have 542 00:26:03,690 --> 00:26:07,238 to go in and change my code by adding a bunch of messy printf statements 543 00:26:07,238 --> 00:26:09,780 that I'm going to have to delete later just to submit my code 544 00:26:09,780 --> 00:26:11,100 or ship it on the internet. 545 00:26:11,100 --> 00:26:15,150 Instead, I can kind of watch what's going on inside of my computer's memory 546 00:26:15,150 --> 00:26:17,160 while I'm executing this program. 547 00:26:17,160 --> 00:26:21,750 And the fact now that the value of i is 10, 548 00:26:21,750 --> 00:26:26,830 and yet I'm about to print another hash, therein lies the same logical error. 549 00:26:26,830 --> 00:26:30,790 So we're seeing just graphically the same problem as before. 550 00:26:30,790 --> 00:26:33,040 So now at this point, the program is pretty much done. 551 00:26:33,040 --> 00:26:35,750 If I keep clicking Step Over, it's just going to terminate. 
552 00:26:35,750 --> 00:26:37,500 If at this point, I'm like, oh my god, now 553 00:26:37,500 --> 00:26:41,240 I know it's wrong, you can exit out of most any program in the IDE 554 00:26:41,240 --> 00:26:43,847 or in sandbox by hitting Control c, for cancel, 555 00:26:43,847 --> 00:26:45,930 and that will kill the debugger, close the window, 556 00:26:45,930 --> 00:26:47,860 and get you back to your terminal window. 557 00:26:47,860 --> 00:26:51,660 And I can't emphasize this enough, moving forward even this week, 558 00:26:51,660 --> 00:26:56,247 use help 50 when you have a bug compiling your code, some error message 559 00:26:56,247 --> 00:26:57,330 that you don't understand. 560 00:26:57,330 --> 00:26:59,160 It will just help you like a member of the staff could. 561 00:26:59,160 --> 00:27:01,827 And then certainly reach out to us if you don't understand that. 562 00:27:01,827 --> 00:27:04,800 But debug 50 should, moving forward, be your first instinct. 563 00:27:04,800 --> 00:27:06,960 If you have a bug where something's not working, 564 00:27:06,960 --> 00:27:08,670 the amount of change you're computing is wrong, 565 00:27:08,670 --> 00:27:10,503 the credit card numbers you're analyzing are 566 00:27:10,503 --> 00:27:14,790 wrong, use debug 50, starting this week, not two weeks from now, 567 00:27:14,790 --> 00:27:16,890 to develop that muscle memory of using a debugger. 568 00:27:16,890 --> 00:27:21,630 And it is truly a lifelong skill, not just for C, but for other languages 569 00:27:21,630 --> 00:27:23,310 as well. 570 00:27:23,310 --> 00:27:26,400 Any questions on that? 571 00:27:26,400 --> 00:27:29,910 You'll see more of it in section and beyond. 572 00:27:29,910 --> 00:27:33,480 So what else do we have in the way of tools in our toolkit here? 573 00:27:33,480 --> 00:27:35,888 Let's go ahead and introduce one other now. 574 00:27:35,888 --> 00:27:38,430 One you've probably used this past week, called check 50.
575 00:27:38,430 --> 00:27:41,740 This is a tool that allows you to analyze the correctness of your code. 576 00:27:41,740 --> 00:27:45,090 And you might recall with check 50, you did a little something like this. 577 00:27:45,090 --> 00:27:50,520 If I went ahead and whipped up a program, like my typical hello dot c-- 578 00:27:50,520 --> 00:27:54,160 so I've gone ahead and clicked Save, saving this file as hello dot c. 579 00:27:54,160 --> 00:27:57,900 Let me go ahead and include standard Io dot h, int main void. 580 00:27:57,900 --> 00:27:59,940 Let me go ahead now and printf. 581 00:27:59,940 --> 00:28:03,720 Hello comma world backslash n semicolon. 582 00:28:03,720 --> 00:28:06,870 And I know from the problem sets, that the way 583 00:28:06,870 --> 00:28:09,660 to check the correctness of this code with CS50-- 584 00:28:09,660 --> 00:28:12,570 check 50 and then a slug, a unique identifier. 585 00:28:12,570 --> 00:28:17,280 I'm using a shorter one just for lecture today called CS50 problems hello. 586 00:28:17,280 --> 00:28:20,760 That is just the unique set of tests that I want to run on my code 587 00:28:20,760 --> 00:28:22,095 called hello dot c. 588 00:28:22,095 --> 00:28:24,720 So what's happening here is I'm being prompted to authenticate. 589 00:28:24,720 --> 00:28:27,120 GitHub is what this uses, as you've seen. 590 00:28:27,120 --> 00:28:29,220 I'm going to go ahead and use my student account. 591 00:28:29,220 --> 00:28:32,020 I'm going to go ahead and log in. 592 00:28:32,020 --> 00:28:34,170 You'll notice a star represents your password, 593 00:28:34,170 --> 00:28:37,140 so it kind of sort of masks it, even though everyone in the world now 594 00:28:37,140 --> 00:28:39,007 knows how long my password is. 
595 00:28:39,007 --> 00:28:41,340 And now we're preparing, we're uploading the submission, 596 00:28:41,340 --> 00:28:43,423 and in just a few seconds, we'll get some feedback 597 00:28:43,423 --> 00:28:46,650 from CS50's server that tells us, hopefully, 598 00:28:46,650 --> 00:28:49,110 that my code is perfectly correct-- 599 00:28:49,110 --> 00:28:50,520 perfectly correct. 600 00:28:50,520 --> 00:28:52,740 But no, it's not in this case. 601 00:28:52,740 --> 00:28:54,570 And if you recall from problem set one, you 602 00:28:54,570 --> 00:28:56,362 weren't supposed to just print hello world. 603 00:28:56,362 --> 00:28:59,550 You were supposed to print hello so and so, whatever the human's name is. 604 00:28:59,550 --> 00:29:03,020 So you'll see two green smileys here saying hello dot c exists. 605 00:29:03,020 --> 00:29:04,140 So I got that one right. 606 00:29:04,140 --> 00:29:05,580 I named the file correctly. 607 00:29:05,580 --> 00:29:08,400 Step two, it compiled, so there were no error messages 608 00:29:08,400 --> 00:29:10,140 when we ran make on your code. 609 00:29:10,140 --> 00:29:12,660 But we did get unhappy twice. 610 00:29:12,660 --> 00:29:16,230 We expected when passing in the name Emma, for you to say hello Emma. 611 00:29:16,230 --> 00:29:19,590 And when we expected to pass in Rodrigo, we expected hello Rodrigo, 612 00:29:19,590 --> 00:29:22,510 so you did not pass these two tests. 613 00:29:22,510 --> 00:29:26,970 So check 50 happens to be a CS50-specific tool that the TFs and I use to grade 614 00:29:26,970 --> 00:29:29,460 and provide automated feedback on code, but it's 615 00:29:29,460 --> 00:29:33,120 representative of what in the real world are just quite simply called tests.
616 00:29:33,120 --> 00:29:36,300 Whenever you work for a company or write software, part of that process 617 00:29:36,300 --> 00:29:39,150 is typically not just to write the code that solves your problem, 618 00:29:39,150 --> 00:29:43,650 but to write tests that make sure that your own code is correct, especially 619 00:29:43,650 --> 00:29:47,160 so that if you add features to your programs down the road or someone else 620 00:29:47,160 --> 00:29:50,490 tries to add features to your code, they and you don't break it-- 621 00:29:50,490 --> 00:29:54,690 you constantly have a capability to make sure your code is still 622 00:29:54,690 --> 00:29:56,140 working as expected. 623 00:29:56,140 --> 00:29:59,460 So while we do use it in an academic context to score problem sets, 624 00:29:59,460 --> 00:30:02,610 it's fundamentally representative of a real-world process 625 00:30:02,610 --> 00:30:06,270 of testing one's own code repeatedly. 626 00:30:06,270 --> 00:30:08,760 And then lastly, there's this thing-- style 50. 627 00:30:08,760 --> 00:30:11,490 So it's not uncommon when learning how to program, especially 628 00:30:11,490 --> 00:30:13,830 in a language like C, to be a little sloppy when 629 00:30:13,830 --> 00:30:15,150 it comes to writing your code. 630 00:30:15,150 --> 00:30:18,300 Technically speaking, this same program here, 631 00:30:18,300 --> 00:30:19,800 I could just make it look like this. 632 00:30:19,800 --> 00:30:23,058 And frankly, if I really wanted to, I can make it look like this, 633 00:30:23,058 --> 00:30:24,600 and the computer's not going to care. 634 00:30:24,600 --> 00:30:27,142 It's smart enough to be able to distinguish the various curly 635 00:30:27,142 --> 00:30:29,280 braces from parentheses and semicolons. 636 00:30:29,280 --> 00:30:32,100 But my god, this is not very pleasant to look at. 637 00:30:32,100 --> 00:30:34,470 Or if it is right now, break that mindset. 638 00:30:34,470 --> 00:30:36,610 This is not very pleasant to look at.
639 00:30:36,610 --> 00:30:40,440 You should be writing code that's easier for you to read, for other people 640 00:30:40,440 --> 00:30:42,940 to read, and honestly, easier for you to maintain. 641 00:30:42,940 --> 00:30:46,530 There is nothing worse than writing really bad code, coming back 642 00:30:46,530 --> 00:30:49,140 to it weeks or months later to fix something, add something, 643 00:30:49,140 --> 00:30:52,590 and you don't even know what you're looking at because it's your own code. 644 00:30:52,590 --> 00:30:56,130 So style 50 is a tool that just helps you develop muscle 645 00:30:56,130 --> 00:30:58,410 memory for writing prettier code. 646 00:30:58,410 --> 00:31:00,870 Style has nothing to do with your code's correctness. 647 00:31:00,870 --> 00:31:04,500 It's more of the nitpicky aesthetics that just makes it pleasant to look at. 648 00:31:04,500 --> 00:31:08,100 And reasonable people will disagree as to what constitutes pretty code. 649 00:31:08,100 --> 00:31:11,160 With style 50, we, like a company, have standardized 650 00:31:11,160 --> 00:31:14,010 on what we would propose your C code looks like, 651 00:31:14,010 --> 00:31:17,470 so that we can have an objective measure of how clean it is. 652 00:31:17,470 --> 00:31:22,170 So if I go ahead and run, after saving my file, style 50 on hello dot c, 653 00:31:22,170 --> 00:31:24,660 Enter, you'll see some output like this. 654 00:31:24,660 --> 00:31:27,510 You'll see your same code in black and white at the bottom, 655 00:31:27,510 --> 00:31:30,665 but you'll see green text telling you where you should add space. 656 00:31:30,665 --> 00:31:32,790 So you should literally hit the spacebar four times 657 00:31:32,790 --> 00:31:35,130 and that will make style 50 happy. 658 00:31:35,130 --> 00:31:39,150 By contrast, if I instead do something like this, let me go ahead 659 00:31:39,150 --> 00:31:41,580 and correct it incorrectly.
660 00:31:41,580 --> 00:31:45,190 There are people in the world that write code that looks like this. 661 00:31:45,190 --> 00:31:46,950 This is frowned upon. 662 00:31:46,950 --> 00:31:50,310 But if I go ahead and run style 50 now on this file-- 663 00:31:50,310 --> 00:31:52,140 Enter-- you'll see the opposite. 664 00:31:52,140 --> 00:31:54,390 And it gets a little scarier with this syntax, 665 00:31:54,390 --> 00:31:57,780 because we're doing our best to explain what it is we want you to do. 666 00:31:57,780 --> 00:32:02,100 But we want you to delete the new line, the Enter key that you hit here, 667 00:32:02,100 --> 00:32:04,380 and we want you to pull it up to the top here, 668 00:32:04,380 --> 00:32:06,090 and we want you to delete that red here. 669 00:32:06,090 --> 00:32:08,430 So admittedly, it's sometimes hard for the computer 670 00:32:08,430 --> 00:32:11,950 to give you very straightforward advice as to what's going on. 671 00:32:11,950 --> 00:32:14,850 So you'll see over time, certain patterns. 672 00:32:14,850 --> 00:32:17,850 So in fact, if I go to CS50's own website here, 673 00:32:17,850 --> 00:32:20,980 let me go ahead and pull up what's called a style guide. 674 00:32:20,980 --> 00:32:22,740 And this is the authoritative answer when 675 00:32:22,740 --> 00:32:25,995 it comes to what your code should look like in a class or in a company. 676 00:32:25,995 --> 00:32:27,870 You'll see throughout this style guide that's 677 00:32:27,870 --> 00:32:31,590 online a lot of examples of what good code, pretty code, 678 00:32:31,590 --> 00:32:33,600 readable code should look like. 679 00:32:33,600 --> 00:32:35,670 And there, too, reasonable people will disagree, 680 00:32:35,670 --> 00:32:39,930 but it's part of the programming process to have good style for your code, 681 00:32:39,930 --> 00:32:44,610 and style 50 allows you to develop that muscle memory, as well.
682 00:32:44,610 --> 00:32:48,840 And one aside, whereas the sandbox tool used to auto save your file, 683 00:32:48,840 --> 00:32:50,350 the IDE does not do that. 684 00:32:50,350 --> 00:32:53,160 So notice I just hit Enter a couple of times in this file, 685 00:32:53,160 --> 00:32:57,390 or suppose I said something like Goodbye World more explicitly, and suppose I 686 00:32:57,390 --> 00:32:59,740 now move my cursor to the terminal window, 687 00:32:59,740 --> 00:33:02,840 you'll see a big red alert saying, hey, you did not save your file. 688 00:33:02,840 --> 00:33:06,090 That's because the IDE is meant to be a little more powerful and a little more 689 00:33:06,090 --> 00:33:10,020 of the onus now is on you to actually know OK, red dot up there 690 00:33:10,020 --> 00:33:11,280 means I should save. 691 00:33:11,280 --> 00:33:14,760 So file, Save, or you can hit Control s or Command s. 692 00:33:14,760 --> 00:33:18,900 So just realize that is now up to you. 693 00:33:18,900 --> 00:33:23,790 And lastly, a summary of what all these tools really figure into. 694 00:33:23,790 --> 00:33:25,620 Pretty much, the first four of these tools 695 00:33:25,620 --> 00:33:28,710 all relate to writing correct code, code 696 00:33:28,710 --> 00:33:32,280 that works the way you want it to, code that works the way we want it to, 697 00:33:32,280 --> 00:33:36,120 code that works the way that some problem to be solved wants you to implement it. 698 00:33:36,120 --> 00:33:40,080 Style is the last of those, and that's really the best categorization thereof. 699 00:33:40,080 --> 00:33:43,027 Of course, not always do these tools solve all of your problems. 700 00:33:43,027 --> 00:33:44,985 And undoubtedly, if you didn't experience this, 701 00:33:44,985 --> 00:33:47,520 this past week already, you will get frustrated. 702 00:33:47,520 --> 00:33:51,450 You will get incredibly frustrated sometimes by some bug in your code 703 00:33:51,450 --> 00:33:52,950 and you might be staring at it.
704 00:33:52,950 --> 00:33:53,940 You might be thinking it through. 705 00:33:53,940 --> 00:33:56,790 You might try all of these darn tools, go to office hours, tutorials, 706 00:33:56,790 --> 00:33:59,400 and it's still not working out for you. 707 00:33:59,400 --> 00:34:01,920 Frankly, the solution there is to take a step back. 708 00:34:01,920 --> 00:34:06,580 And I can't emphasize enough the value of going for a jog, taking a break, 709 00:34:06,580 --> 00:34:08,580 doing something else, changing your mental model 710 00:34:08,580 --> 00:34:10,020 and coming back to it later. 711 00:34:10,020 --> 00:34:14,580 I have literally, and I'm sure many of the TFs and TAs have, solved code 712 00:34:14,580 --> 00:34:17,670 while falling asleep, because there, you're sort of thoughtfully 713 00:34:17,670 --> 00:34:20,639 thinking through what it is you did, what it is you're trying to do. 714 00:34:20,639 --> 00:34:23,738 But undoubtedly, it helps to talk through your problems sometimes. 715 00:34:23,738 --> 00:34:26,280 And there's this other term of art in computer science called 716 00:34:26,280 --> 00:34:27,570 rubber duck debugging.
717 00:34:27,570 --> 00:34:30,690 The idea being that if you don't have a TF at your side 718 00:34:30,690 --> 00:34:34,080 or CA at your side or roommate who has any idea what you're talking about when 719 00:34:34,080 --> 00:34:37,472 it comes to programming, you can have one of these little things on your desk 720 00:34:37,472 --> 00:34:39,389 that you can literally, probably with the door 721 00:34:39,389 --> 00:34:43,320 closed, start talking to, to explain to the duck, just like you would 722 00:34:43,320 --> 00:34:47,580 a teaching fellow, what it is you think your code is doing, walking through 723 00:34:47,580 --> 00:34:49,710 it line-by-line verbally, until hopefully, you 724 00:34:49,710 --> 00:34:52,710 have that self-induced aha moment, like oh, wait a minute, 725 00:34:52,710 --> 00:34:55,718 it's supposed to be 10 not 11, at which point, 726 00:34:55,718 --> 00:34:58,260 you discreetly put the duck back down and go about your work. 727 00:34:58,260 --> 00:35:01,320 But it is meant to be this proxy for just 728 00:35:01,320 --> 00:35:04,595 a very deliberate thoughtful process to which everyone is welcome. 729 00:35:04,595 --> 00:35:06,720 You're welcome to take a duck today on your way out 730 00:35:06,720 --> 00:35:08,490 and we have lots more tutorials and office hours, 731 00:35:08,490 --> 00:35:10,260 because this is not enough here today. 732 00:35:10,260 --> 00:35:12,900 This is just because it exists. 733 00:35:12,900 --> 00:35:18,390 But the goal with rubber duck debugging is just that additional human mechanism 734 00:35:18,390 --> 00:35:22,260 for solving problems by taking the emphasis off of tools 735 00:35:22,260 --> 00:35:24,150 and putting it really back on the human. 736 00:35:24,150 --> 00:35:27,000 So if a little socially awkwardly, consider 737 00:35:27,000 --> 00:35:30,870 deploying that tool as needed as well.
738 00:35:30,870 --> 00:35:34,110 So that's all focusing on correctness and style, 739 00:35:34,110 --> 00:35:36,323 and that's indeed what every problem set here on out 740 00:35:36,323 --> 00:35:37,740 is going to have as one component. 741 00:35:37,740 --> 00:35:40,020 Does it work correctly and is it well styled? 742 00:35:40,020 --> 00:35:42,090 But the third axis of quality, when it comes 743 00:35:42,090 --> 00:35:45,360 to writing software, not just for CS50 but really in general 744 00:35:45,360 --> 00:35:49,050 with programming in the real world, is this notion of design. 745 00:35:49,050 --> 00:35:53,070 And design isn't quite something that we can assess yet with software, 746 00:35:53,070 --> 00:35:55,020 and say you designed that well or you did not 747 00:35:55,020 --> 00:35:57,450 design that well, it's more of a subjective measure. 748 00:35:57,450 --> 00:35:59,610 And here, too, reasonable people can disagree. 749 00:35:59,610 --> 00:36:02,760 So what we'll focus on, not only today, but in the weeks to come, 750 00:36:02,760 --> 00:36:06,540 is also the process of writing well-designed software 751 00:36:06,540 --> 00:36:10,230 and making more intelligent decisions to not just get the problem solved, 752 00:36:10,230 --> 00:36:11,640 but to get it solved well. 753 00:36:11,640 --> 00:36:14,732 And this is what full-time software engineers at the Facebooks and Googles 754 00:36:14,732 --> 00:36:16,440 and Microsofts and others of the world do 755 00:36:16,440 --> 00:36:19,380 every day, especially when they have huge amounts of data 756 00:36:19,380 --> 00:36:20,730 and many, many users. 757 00:36:20,730 --> 00:36:25,560 Every design decision they make matters and might cost money or CPU cycles 758 00:36:25,560 --> 00:36:26,710 or memory. 
759 00:36:26,710 --> 00:36:28,620 And indeed, think back to week zero, finding 760 00:36:28,620 --> 00:36:31,200 Mike Smith was possible in three different ways, 761 00:36:31,200 --> 00:36:33,270 but that third way, the divide and conquer, 762 00:36:33,270 --> 00:36:35,400 was hands down the most efficient. 763 00:36:35,400 --> 00:36:37,830 That was better designed than the first couple. 764 00:36:37,830 --> 00:36:41,220 So let's now consider this in the context of programming 765 00:36:41,220 --> 00:36:45,900 and how we can use a few new features today in C to solve problems better 766 00:36:45,900 --> 00:36:48,540 and to write better designed code. 767 00:36:48,540 --> 00:36:52,080 And we'll do that first by way of something that is called an array. 768 00:36:52,080 --> 00:36:56,520 So an array is something that allows us to solve a problem 769 00:36:56,520 --> 00:36:59,890 in, perhaps, the following way. 770 00:36:59,890 --> 00:37:02,640 So in our computers-- 771 00:37:02,640 --> 00:37:06,390 in our programs in C, we have choices of bunches of data types. 772 00:37:06,390 --> 00:37:09,630 We've seen that there's chars, there's ints, there's floats, there's longs, 773 00:37:09,630 --> 00:37:12,510 there's doubles, there's bool, there's now string, 774 00:37:12,510 --> 00:37:14,550 and there's actually a few others as well. 775 00:37:14,550 --> 00:37:18,180 And each of those, depending on the computer system you're using, 776 00:37:18,180 --> 00:37:22,050 does take up a specific amount of space, on CS50 IDE, on the sandbox, 777 00:37:22,050 --> 00:37:24,750 and most likely on your own personal Macs and PCs. 778 00:37:24,750 --> 00:37:27,120 These days, each one of these data types, 779 00:37:27,120 --> 00:37:30,180 if you're writing a program in C, takes up this much space, 780 00:37:30,180 --> 00:37:33,630 where one byte is 8 bits, 4 bytes is 32 bits, 781 00:37:33,630 --> 00:37:37,450 8 bytes is 64 bits, to tie it back to week zero.
782 00:37:37,450 --> 00:37:39,930 So these are data types that we have at our disposal 783 00:37:39,930 --> 00:37:42,660 for any variables in our computer's memory. 784 00:37:42,660 --> 00:37:44,400 So why is that germane here? 785 00:37:44,400 --> 00:37:46,530 Well, this is that thing I showed a couple of weeks 786 00:37:46,530 --> 00:37:49,440 ago too, which is representative of RAM, random access memory. 787 00:37:49,440 --> 00:37:52,880 It's one of the pieces of hardware in your Mac or PC or even phone these days. 788 00:37:52,880 --> 00:37:56,730 And each of these black chips represents some number of bytes. 789 00:37:56,730 --> 00:37:58,680 Odds are, small although it is in reality, 790 00:37:58,680 --> 00:38:02,730 it might represent a billion bytes if you have one gigabyte of memory, 791 00:38:02,730 --> 00:38:04,560 or maybe even more than that these days. 792 00:38:04,560 --> 00:38:07,860 But this little black chip, inside of your Mac, PC, or phone, 793 00:38:07,860 --> 00:38:11,190 is where information is stored when you're running software, 794 00:38:11,190 --> 00:38:13,860 whether it's on a desktop, or laptop, or mobile device. 795 00:38:13,860 --> 00:38:16,440 And we can actually think of this chip as just 796 00:38:16,440 --> 00:38:19,770 being divided into a bunch of different individual bytes. 797 00:38:19,770 --> 00:38:21,900 In fact, let's just arbitrarily zoom in on it 798 00:38:21,900 --> 00:38:24,030 and sort of divide it into rows and columns, 799 00:38:24,030 --> 00:38:27,870 and just claim that the top left here is going to be the first byte. 800 00:38:27,870 --> 00:38:30,240 This is the second byte, the third byte, and way down 801 00:38:30,240 --> 00:38:32,880 here is like the billionth byte of memory in my computer, 802 00:38:32,880 --> 00:38:36,420 obviously not drawn to scale, which is to say we can just number these bytes.
803 00:38:36,420 --> 00:38:38,830 So one, two, three, four, five, six, seven, eight, 804 00:38:38,830 --> 00:38:43,030 or to be really computer-science-like, zero, one, two, three, four, five, six, 805 00:38:43,030 --> 00:38:45,010 seven, and so forth. 806 00:38:45,010 --> 00:38:46,860 So we don't have to know anything about how 807 00:38:46,860 --> 00:38:50,172 RAM works, electrically or physically, but let's 808 00:38:50,172 --> 00:38:52,380 just stipulate that if you've got some amount of RAM, 809 00:38:52,380 --> 00:38:56,220 we can surely think of each byte as having a number. 810 00:38:56,220 --> 00:38:57,510 So what does that do for us? 811 00:38:57,510 --> 00:39:01,560 Well if you write a program that has a char in it, a character, 812 00:39:01,560 --> 00:39:05,310 how big was a char according to the chart a moment ago? 813 00:39:05,310 --> 00:39:06,210 So just one byte. 814 00:39:06,210 --> 00:39:11,380 So if you allocate a char, called c, or called anything in your program, 815 00:39:11,380 --> 00:39:14,977 you will be asking the computer to use just one of these tiny little squares 816 00:39:14,977 --> 00:39:16,810 physically inside of your computer's memory. 817 00:39:16,810 --> 00:39:19,990 By contrast, how about an int-- how big was an int? 818 00:39:19,990 --> 00:39:20,590 Four bytes. 819 00:39:20,590 --> 00:39:23,130 So if you want to store a number as an integer, 820 00:39:23,130 --> 00:39:26,380 you're actually going to consume four of these bytes in your computer's memory 821 00:39:26,380 --> 00:39:26,880 instead. 822 00:39:26,880 --> 00:39:30,970 And if you're using a double or long, you might use as many as eight of them. 823 00:39:30,970 --> 00:39:32,903 So what is inside each of these boxes? 824 00:39:32,903 --> 00:39:35,320 There's eight bits here, eight bits here, eight bits here, 825 00:39:35,320 --> 00:39:37,960 or maybe it's eight little transistors, or even eight little light bulbs.
826 00:39:37,960 --> 00:39:41,050 Whatever they are, they're some way of representing zeros and ones. 827 00:39:41,050 --> 00:39:43,480 And that's what each of those boxes represents. 828 00:39:43,480 --> 00:39:45,410 So what can we do with this information? 829 00:39:45,410 --> 00:39:47,410 Well, let's go ahead and get rid of the hardware 830 00:39:47,410 --> 00:39:49,660 and abstract away, so to speak, as we keep doing, 831 00:39:49,660 --> 00:39:54,690 and consider if we zoom in here, how the computer, last week and this week 832 00:39:54,690 --> 00:39:58,990 and forever from here on out, is storing the information in the programs 833 00:39:58,990 --> 00:40:00,280 that you write. 834 00:40:00,280 --> 00:40:04,330 Suppose for instance, that we've got a program like this, 835 00:40:04,330 --> 00:40:05,770 with just three characters in it. 836 00:40:05,770 --> 00:40:11,860 I'm going to go ahead and whip this up in a file called, let's say, hi dot c. 837 00:40:11,860 --> 00:40:15,880 And I'm going to go ahead and do include standard io dot h, int main void-- 838 00:40:15,880 --> 00:40:18,550 839 00:40:18,550 --> 00:40:19,533 learning. 840 00:40:19,533 --> 00:40:22,450 Now in here, I'm going to go ahead and have those three lines of code. 841 00:40:22,450 --> 00:40:25,090 So give me one char called c1 arbitrarily 842 00:40:25,090 --> 00:40:27,820 and set it equal to a capital H. Give me another one called 843 00:40:27,820 --> 00:40:31,870 c2, set it equal to capital I. Give me a third called c3, 844 00:40:31,870 --> 00:40:34,630 and set that equal to the exclamation point. 845 00:40:34,630 --> 00:40:40,180 Now you'll notice one detail that I've not emphasized before, I don't think. 846 00:40:40,180 --> 00:40:44,350 What types of punctuation am I clearly using here? 847 00:40:44,350 --> 00:40:46,510 So single quotes or apostrophes here. 848 00:40:46,510 --> 00:40:49,840 Single quotes in C are necessary for chars.
849 00:40:49,840 --> 00:40:52,110 Chars are single characters, just one byte. 850 00:40:52,110 --> 00:40:54,610 Whenever you want to hardcode them into a program like this, 851 00:40:54,610 --> 00:40:56,450 like I've done here, use single quotes. 852 00:40:56,450 --> 00:40:59,540 Of course for strings we used double quotes. 853 00:40:59,540 --> 00:41:00,040 Why? 854 00:41:00,040 --> 00:41:00,820 Just because. 855 00:41:00,820 --> 00:41:03,160 Like C requires that we distinguish those two. 856 00:41:03,160 --> 00:41:05,518 So let me just do something a little silly here. 857 00:41:05,518 --> 00:41:07,810 Now that I've got three variables, let me just go ahead 858 00:41:07,810 --> 00:41:08,770 and print them all out. 859 00:41:08,770 --> 00:41:10,480 What is the format code I can print-- 860 00:41:10,480 --> 00:41:12,480 I can use to print a char? 861 00:41:12,480 --> 00:41:13,960 Yeah, a percent-- 862 00:41:13,960 --> 00:41:15,340 AUDIENCE: [INAUDIBLE] 863 00:41:15,340 --> 00:41:18,475 DAVID MALAN: Percent c for char, so percent c, and I want three of them. 864 00:41:18,475 --> 00:41:21,380 So I'm going to print all three at once, followed by a new line. 865 00:41:21,380 --> 00:41:23,560 And then if I want to print c1 first, c2, 866 00:41:23,560 --> 00:41:28,270 c3, that's the syntax with printf for just plugging in three 867 00:41:28,270 --> 00:41:31,390 placeholders followed by three values, respectively left to right, 868 00:41:31,390 --> 00:41:34,600 and hopefully it's going to print presumably hi 869 00:41:34,600 --> 00:41:36,500 on the screen followed by a new line. 870 00:41:36,500 --> 00:41:37,970 So let me save the file. 871 00:41:37,970 --> 00:41:39,860 Let me do make hi. 872 00:41:39,860 --> 00:41:41,200 OK, no errors, which is good. 873 00:41:41,200 --> 00:41:46,330 Let me do dot slash hi, and indeed I see hi exclamation point, however 874 00:41:46,330 --> 00:41:49,330 with a space in between each character.
875 00:41:49,330 --> 00:41:50,350 But you know what? 876 00:41:50,350 --> 00:41:56,200 hi exclamation point are indeed chars, but what is a char, or a character? 877 00:41:56,200 --> 00:41:58,740 What is an Ascii character underneath the hood? 878 00:41:58,740 --> 00:41:59,680 AUDIENCE: [INAUDIBLE] 879 00:41:59,680 --> 00:42:00,700 DAVID MALAN: It's ultimately binary. 880 00:42:00,700 --> 00:42:01,630 Everything is binary. 881 00:42:01,630 --> 00:42:04,025 And what's one step in between there, in some sense? 882 00:42:04,025 --> 00:42:04,900 AUDIENCE: [INAUDIBLE] 883 00:42:04,900 --> 00:42:06,733 DAVID MALAN: It's just a number, an integer. 884 00:42:06,733 --> 00:42:09,220 Thanks to Ascii and Unicode in week zero, 885 00:42:09,220 --> 00:42:12,007 there's just a mapping from characters to numbers. 886 00:42:12,007 --> 00:42:13,090 So how do I print numbers? 887 00:42:13,090 --> 00:42:15,154 What format code do I use for printf? 888 00:42:15,154 --> 00:42:16,810 AUDIENCE: [INAUDIBLE] 889 00:42:16,810 --> 00:42:19,390 DAVID MALAN: Percent i, for integer. 890 00:42:19,390 --> 00:42:22,360 So suppose I want to actually see those values? 891 00:42:22,360 --> 00:42:23,710 Notice what I can do. 892 00:42:23,710 --> 00:42:26,020 I can tell the computer, you know what? 893 00:42:26,020 --> 00:42:30,190 Even though c1 is a char, please go ahead and treat it as an integer. 894 00:42:30,190 --> 00:42:33,740 And I can literally write int in parentheses before the variable, 895 00:42:33,740 --> 00:42:36,790 which is what's known as casting, C-A-S-T, 896 00:42:36,790 --> 00:42:41,410 which is just a verb describing the act of converting one data type to another 897 00:42:41,410 --> 00:42:43,240 so that I can actually see those numbers. 898 00:42:43,240 --> 00:42:45,240 So let me go ahead and save the file. 899 00:42:45,240 --> 00:42:50,530 Let me go ahead now and do make hi again. 900 00:42:50,530 --> 00:42:51,790 That seems to work fine. 
901 00:42:51,790 --> 00:42:57,880 Dot slash hi, and now this old familiar 72, 73, 33. 902 00:42:57,880 --> 00:43:00,040 And frankly, I don't need to be so pedantic here. 903 00:43:00,040 --> 00:43:04,300 Frankly, Clang is smart enough to just know that if I pass it a char, 904 00:43:04,300 --> 00:43:06,340 but I ask it to format it as an int, it's 905 00:43:06,340 --> 00:43:09,980 going to implicitly, not explicitly, cast it for me. 906 00:43:09,980 --> 00:43:13,720 So if I go ahead and run make hi again, and do dot slash hi, 907 00:43:13,720 --> 00:43:15,390 I'm going to see the exact same thing. 908 00:43:15,390 --> 00:43:17,890 So this understanding of what's going on underneath the hood 909 00:43:17,890 --> 00:43:20,260 can allow me to kind of tinker now and play around 910 00:43:20,260 --> 00:43:22,990 with what's going on inside of my computer's memory. 911 00:43:22,990 --> 00:43:25,120 But let's now see this more visually. 912 00:43:25,120 --> 00:43:27,700 If this is my computer's memory really magnified, 913 00:43:27,700 --> 00:43:31,270 such that there's like a billion squares somewhere available to me 914 00:43:31,270 --> 00:43:33,490 and this is zero, this is one, this is two. 915 00:43:33,490 --> 00:43:37,300 Suppose I have a program with three variables-- c1, c2, and c3-- 916 00:43:37,300 --> 00:43:39,100 what the computer is going to do is going 917 00:43:39,100 --> 00:43:41,180 to put the h in one of those boxes. 918 00:43:41,180 --> 00:43:43,180 It's going to put the i in another box, and it's 919 00:43:43,180 --> 00:43:45,190 going to put the exclamation point in a third box, 920 00:43:45,190 --> 00:43:48,490 and somehow or other it's going to label those with the names of the variables. 921 00:43:48,490 --> 00:43:52,420 It's going to sort of jot down as with a virtual pencil, this is c1, this is c2, 922 00:43:52,420 --> 00:43:53,500 this is c3.
923 00:43:53,500 --> 00:43:56,170 But it's the H-I exclamation point that's 924 00:43:56,170 --> 00:43:58,660 actually stored at that location. 925 00:43:58,660 --> 00:44:00,190 But of course, it's not just a char. 926 00:44:00,190 --> 00:44:01,697 It's really technically a number. 927 00:44:01,697 --> 00:44:04,030 So really what's going on inside of my computer's memory 928 00:44:04,030 --> 00:44:06,740 is that 72, 73, and 33 is stored. 929 00:44:06,740 --> 00:44:09,430 But someone called out earlier it's actually binary. 930 00:44:09,430 --> 00:44:13,030 So what's really underneath the hood is this. 931 00:44:13,030 --> 00:44:15,280 Those zeros and ones are somehow implemented 932 00:44:15,280 --> 00:44:18,190 with transistors or light bulbs or whatever the technology is, 933 00:44:18,190 --> 00:44:20,830 but it's just storing a pattern of zeros and ones. 934 00:44:20,830 --> 00:44:22,360 And I did out the math before class. 935 00:44:22,360 --> 00:44:26,650 This indeed represents 72 in decimal, 73, and 33. 936 00:44:26,650 --> 00:44:30,310 But here, too, we're getting to a low-level implementation detail 937 00:44:30,310 --> 00:44:32,380 that we generally don't need to care about. 938 00:44:32,380 --> 00:44:35,260 Abstraction, per week zero, is this beautiful thing 939 00:44:35,260 --> 00:44:38,170 because we could just, meh, tune all that out and just think 940 00:44:38,170 --> 00:44:41,650 of it at any higher level that we want, whether it's decimal 941 00:44:41,650 --> 00:44:44,230 or whether it's actual Ascii characters. 942 00:44:44,230 --> 00:44:46,640 But that's all that's going on underneath the hood. 943 00:44:46,640 --> 00:44:47,140 Yeah? 944 00:44:47,140 --> 00:44:51,028 AUDIENCE: [INAUDIBLE] 945 00:44:51,028 --> 00:44:55,383 946 00:44:55,383 --> 00:44:56,800 DAVID MALAN: Really good question. 
947 00:44:56,800 --> 00:45:02,200 If you declared three variables as integers and stored 72, 73, 33 in them 948 00:45:02,200 --> 00:45:04,840 and tried to print them then with percent c, 949 00:45:04,840 --> 00:45:08,260 yes, you could coerce that behavior as well, and literally do the opposite. 950 00:45:08,260 --> 00:45:11,410 At that point, you need to know what the ASCII codes are-- 951 00:45:11,410 --> 00:45:12,850 72, 73, 33. 952 00:45:12,850 --> 00:45:15,460 And mostly, programmers don't care about that. 953 00:45:15,460 --> 00:45:18,340 All they do is know that there is some mapping underneath the hood, 954 00:45:18,340 --> 00:45:19,510 but absolutely. 955 00:45:19,510 --> 00:45:22,090 Well let's consider another example now, this time involving 956 00:45:22,090 --> 00:45:26,290 three scores, so three integers, instead of something like three characters. 957 00:45:26,290 --> 00:45:29,000 What might I actually do with values like this? 958 00:45:29,000 --> 00:45:32,110 Well, let me go ahead and write some code, this time in a file 959 00:45:32,110 --> 00:45:35,977 called scores dot c. 960 00:45:35,977 --> 00:45:38,560 I'm going to go ahead and clean up my terminal here and create 961 00:45:38,560 --> 00:45:42,220 a new file called scores dot c. 962 00:45:42,220 --> 00:45:45,550 And let's go ahead and do a few similar lines here. 963 00:45:45,550 --> 00:45:50,830 Let me go ahead and include say, CS50 dot h, include standard io dot h, 964 00:45:50,830 --> 00:45:55,360 int main void, and now go ahead and start declaring some variables. 965 00:45:55,360 --> 00:45:56,790 Give me int score one. 966 00:45:56,790 --> 00:45:59,560 And I'm going to declare my score on some assignment 967 00:45:59,560 --> 00:46:03,900 to be 72, another score on an assignment to be about the same, 73, 968 00:46:03,900 --> 00:46:06,970 and another regrettable assignment to be, say, 33.
969 00:46:06,970 --> 00:46:09,910 So now I have three variables called integers, and suppose I just want 970 00:46:09,910 --> 00:46:11,770 to do something like print the average. 971 00:46:11,770 --> 00:46:14,170 I can certainly do this with printf and some math. 972 00:46:14,170 --> 00:46:18,537 So I might go ahead and say the average is % i, 973 00:46:18,537 --> 00:46:20,870 where that's going to be a placeholder, then a new line. 974 00:46:20,870 --> 00:46:23,912 And then the average, of course, is going to be something like score one, 975 00:46:23,912 --> 00:46:28,990 plus score two, plus score three, divided by three total, and then 976 00:46:28,990 --> 00:46:29,810 semicolon. 977 00:46:29,810 --> 00:46:30,760 So again, that's just the average. 978 00:46:30,760 --> 00:46:33,510 Add three numbers together, divide by the total number, and voila, 979 00:46:33,510 --> 00:46:35,020 we should get an average. 980 00:46:35,020 --> 00:46:40,120 Let me go ahead and save the file, compile this with make scores, Enter. 981 00:46:40,120 --> 00:46:42,370 Seems to compile OK-- dot slash scores. 982 00:46:42,370 --> 00:46:46,420 And I should get an average of 59 for those three quiz scores, or assignment 983 00:46:46,420 --> 00:46:48,260 scores, in this context. 984 00:46:48,260 --> 00:46:50,350 But this isn't the best design now. 985 00:46:50,350 --> 00:46:52,600 Now that we're dealing with numbers and scores, 986 00:46:52,600 --> 00:46:55,100 especially in the context of like a class where maybe you're 987 00:46:55,100 --> 00:46:58,300 going to have four scores or five scores or more scores, ultimately, 988 00:46:58,300 --> 00:46:59,320 week to week. 989 00:46:59,320 --> 00:47:03,132 What rubs you perhaps the wrong way about this design so far? 990 00:47:03,132 --> 00:47:04,392 AUDIENCE: [INAUDIBLE] 991 00:47:04,392 --> 00:47:05,350 DAVID MALAN: Say again. 992 00:47:05,350 --> 00:47:06,752 AUDIENCE: I 993 00:47:06,752 --> 00:47:08,210 DAVID MALAN: Yeah, it's very fixed. 
994 00:47:08,210 --> 00:47:10,310 This is like writing a program at the beginning of the semester 995 00:47:10,310 --> 00:47:13,102 and deciding in advance there's only going to be three assignments, 996 00:47:13,102 --> 00:47:15,140 and if you want to have a fourth, too bad. 997 00:47:15,140 --> 00:47:17,010 The software does not support it. 998 00:47:17,010 --> 00:47:18,260 So that's not the best design. 999 00:47:18,260 --> 00:47:21,650 And what else might you critique about this code, simple as it is. 1000 00:47:21,650 --> 00:47:22,396 Yeah? 1001 00:47:22,396 --> 00:47:26,204 AUDIENCE: [INAUDIBLE] 1002 00:47:26,204 --> 00:47:28,110 1003 00:47:28,110 --> 00:47:30,600 DAVID MALAN: Yeah, I'm potentially cheating students out 1004 00:47:30,600 --> 00:47:34,320 of a partial score, especially if their average was like 59.5. 1005 00:47:34,320 --> 00:47:36,470 I would like to be rounded up to 60, for instance. 1006 00:47:36,470 --> 00:47:38,520 So we're also having some imprecision issues. 1007 00:47:38,520 --> 00:47:40,020 And we'll come back to that as well. 1008 00:47:40,020 --> 00:47:40,740 Any other critiques? 1009 00:47:40,740 --> 00:47:41,240 Yeah? 1010 00:47:41,240 --> 00:47:44,720 AUDIENCE: [INAUDIBLE] 1011 00:47:44,720 --> 00:47:47,930 DAVID MALAN: Yeah, even though I typed it out manually, 1012 00:47:47,930 --> 00:47:51,137 this is dangerously close to just copying and pasting the same code again 1013 00:47:51,137 --> 00:47:51,970 and again and again. 1014 00:47:51,970 --> 00:47:55,570 So just with the hi example, as with this one, as with our cough example 1015 00:47:55,570 --> 00:47:59,470 last week and the week before, just doing this thing again and again 1016 00:47:59,470 --> 00:48:02,260 and again is really an opportunity for a better design. 1017 00:48:02,260 --> 00:48:05,020 So it turns out, there is that opportunity. 
1018 00:48:05,020 --> 00:48:10,490 And in C, if you know that you want to have more than just one value, 1019 00:48:10,490 --> 00:48:12,610 but they're all kind of related, what might 1020 00:48:12,610 --> 00:48:16,720 be a nice name for a variable containing multiple scores? 1021 00:48:16,720 --> 00:48:17,620 AUDIENCE: [INAUDIBLE] 1022 00:48:17,620 --> 00:48:19,420 DAVID MALAN: Scores plural in English. 1023 00:48:19,420 --> 00:48:20,990 So how can we do that? 1024 00:48:20,990 --> 00:48:23,650 Well unfortunately, if I just say int scores, 1025 00:48:23,650 --> 00:48:25,650 I need to decide which score it gets as a value. 1026 00:48:25,650 --> 00:48:27,942 Now those of you who have prior programming experience, 1027 00:48:27,942 --> 00:48:31,030 might know where we're going with this, and we're about to get there. 1028 00:48:31,030 --> 00:48:34,990 It turns out in C, if you want to have one variable that 1029 00:48:34,990 --> 00:48:39,430 can store multiple values, you use what's called an array. 1030 00:48:39,430 --> 00:48:44,680 An array is a list of values that can be all the same type 1031 00:48:44,680 --> 00:48:46,790 in a variable of the same name. 1032 00:48:46,790 --> 00:48:50,140 So if you want three scores, each of which is an int in C, 1033 00:48:50,140 --> 00:48:53,620 you literally use square brackets, the number of scores you want, 1034 00:48:53,620 --> 00:48:54,640 and then a semicolon. 1035 00:48:54,640 --> 00:48:58,750 That will say to the computer, give me enough memory for three integers. 1036 00:48:58,750 --> 00:49:01,340 Down here now, I get to change my syntax. 1037 00:49:01,340 --> 00:49:03,640 I don't want score one, score two, score three. 
1038 00:49:03,640 --> 00:49:10,240 I want to put these scores inside of the array by simply saying its name, 1039 00:49:10,240 --> 00:49:14,530 using square brackets, albeit a little differently this time, 1040 00:49:14,530 --> 00:49:17,420 and put them at locations one, two, three, 1041 00:49:17,420 --> 00:49:19,150 but that's actually my first mistake. 1042 00:49:19,150 --> 00:49:22,320 Computer scientists typically start counting at one-- 1043 00:49:22,320 --> 00:49:26,170 no-- computer scientists typically start counting at zero, 1044 00:49:26,170 --> 00:49:29,590 so I need to zero index my array. 1045 00:49:29,590 --> 00:49:34,360 Arrays are zero indexed, which just means the first location is zero, 1046 00:49:34,360 --> 00:49:36,730 the second is one, the third is two. 1047 00:49:36,730 --> 00:49:39,960 So this now, is equivalent code to giving me three variables, 1048 00:49:39,960 --> 00:49:42,460 but now I've gotten rid of the messiness that you identified 1049 00:49:42,460 --> 00:49:44,560 by copying and pasting the name again and again, 1050 00:49:44,560 --> 00:49:46,374 and I can store them all together. 1051 00:49:46,374 --> 00:49:50,573 AUDIENCE: On the scores, the number three stands for three variables, 1052 00:49:50,573 --> 00:49:51,073 right? 1053 00:49:51,073 --> 00:49:53,440 It doesn't stand for four? 1054 00:49:53,440 --> 00:49:56,350 DAVID MALAN: Does the three stand for three variables? 1055 00:49:56,350 --> 00:50:02,110 It stands for enough space for three values in one variable. 1056 00:50:02,110 --> 00:50:03,140 Good question. 1057 00:50:03,140 --> 00:50:05,035 Others, questions? 1058 00:50:05,035 --> 00:50:05,535 Yeah? 1059 00:50:05,535 --> 00:50:10,063 AUDIENCE: [INAUDIBLE] bringing equals and then [INAUDIBLE] 1060 00:50:10,063 --> 00:50:11,480 DAVID MALAN: Really good question. 1061 00:50:11,480 --> 00:50:13,070 Can you do this all in one line? 
1062 00:50:13,070 --> 00:50:15,710 Yes, but let me just tease you by saying something 1063 00:50:15,710 --> 00:50:18,530 like this involving curly braces, but we won't go there today. 1064 00:50:18,530 --> 00:50:20,750 But yes, there are ways to get around this. 1065 00:50:20,750 --> 00:50:22,337 So let me go ahead and fix this now. 1066 00:50:22,337 --> 00:50:24,170 If I want to compute the average now, I need 1067 00:50:24,170 --> 00:50:30,470 to add these three values in this array, scores zero, scores one, and scores two. 1068 00:50:30,470 --> 00:50:32,510 But arithmetically, the answer-- 1069 00:50:32,510 --> 00:50:37,280 the code is still the same, so if I now make scores and do dot slash scores, 1070 00:50:37,280 --> 00:50:38,680 my average is still 59. 1071 00:50:38,680 --> 00:50:41,180 And I do disclaim, there's still probably a mathematical bug 1072 00:50:41,180 --> 00:50:43,580 because we're using integers, as was noted, 1073 00:50:43,580 --> 00:50:46,110 but we'll come back to that in just a little bit. 1074 00:50:46,110 --> 00:50:47,360 So let's push a little harder. 1075 00:50:47,360 --> 00:50:50,720 Even if you've never programmed before, what might still 1076 00:50:50,720 --> 00:50:52,970 be a little bad about the design? 1077 00:50:52,970 --> 00:50:55,250 The program works, but we can do it better. 1078 00:50:55,250 --> 00:50:57,170 AUDIENCE: Still only stores three. 1079 00:50:57,170 --> 00:50:58,190 DAVID MALAN: Still only stores three. 1080 00:50:58,190 --> 00:51:00,232 So we haven't even solved the very first problem. 1081 00:51:00,232 --> 00:51:01,010 Other critiques? 1082 00:51:01,010 --> 00:51:02,845 AUDIENCE: [INAUDIBLE] 1083 00:51:02,845 --> 00:51:04,970 DAVID MALAN: I have too much code in the last line. 1084 00:51:04,970 --> 00:51:06,710 Yeah, it's getting a little wordy, so it's 1085 00:51:06,710 --> 00:51:08,752 going to be a little harder to read-- quite fair. 1086 00:51:08,752 --> 00:51:09,570 Yeah?
1087 00:51:09,570 --> 00:51:10,154 AUDIENCE: I 1088 00:51:10,154 --> 00:51:11,946 DAVID MALAN: Sorry, say it a little louder. 1089 00:51:11,946 --> 00:51:14,080 AUDIENCE: The scores are hardcoded into the program. 1090 00:51:14,080 --> 00:51:16,320 DAVID MALAN: Yeah, the scores are hardcoded into the program, 1091 00:51:16,320 --> 00:51:18,110 which means it doesn't matter what you get on your assignments, 1092 00:51:18,110 --> 00:51:19,670 we're all getting 59's. 1093 00:51:19,670 --> 00:51:21,400 So that's another problem as well. 1094 00:51:21,400 --> 00:51:22,400 And any other critiques? 1095 00:51:22,400 --> 00:51:23,060 Yeah? 1096 00:51:23,060 --> 00:51:25,770 AUDIENCE: If it could read the input data, it might be better. 1097 00:51:25,770 --> 00:51:27,770 DAVID MALAN: If it could read input data-- yeah, 1098 00:51:27,770 --> 00:51:29,270 so let me combine those suggestions. 1099 00:51:29,270 --> 00:51:31,930 It'd be great if, eventually, this program is dynamic. 1100 00:51:31,930 --> 00:51:32,850 And anything else? 1101 00:51:32,850 --> 00:51:33,513 Yeah? 1102 00:51:33,513 --> 00:51:35,260 AUDIENCE: [INAUDIBLE] 1103 00:51:35,260 --> 00:51:36,260 DAVID MALAN: Definitely. 1104 00:51:36,260 --> 00:51:38,360 We can pull loop into the situation and actually 1105 00:51:38,360 --> 00:51:40,595 get multiple values from the user. 1106 00:51:40,595 --> 00:51:44,470 AUDIENCE: Always dividing by three, so [INAUDIBLE] 1107 00:51:44,470 --> 00:51:46,720 DAVID MALAN: Yeah, it's also always dividing by three. 1108 00:51:46,720 --> 00:51:49,780 And this is subtle, and it's not a huge problem yet, 1109 00:51:49,780 --> 00:51:53,290 but there is this principle I'm kind of violating here known 1110 00:51:53,290 --> 00:51:54,940 as don't repeat yourself. 1111 00:51:54,940 --> 00:51:58,090 And I have repeated myself in at least two locations. 1112 00:51:58,090 --> 00:52:01,310 What values appear in two locations? 
1113 00:52:01,310 --> 00:52:04,750 So three up here, and then also three down here. 1114 00:52:04,750 --> 00:52:10,240 And minor though this detail seems, this is the source of so many common bugs 1115 00:52:10,240 --> 00:52:12,360 because if you just kind of decide by yourself, 1116 00:52:12,360 --> 00:52:13,630 well, I'm going to hard code three up here, 1117 00:52:13,630 --> 00:52:15,970 I'm going to hard code three down here, odds are, 1118 00:52:15,970 --> 00:52:18,650 tomorrow morning, next week, next month, next year, 1119 00:52:18,650 --> 00:52:20,920 let alone a colleague of yours, is never going 1120 00:52:20,920 --> 00:52:24,580 to notice the subtlety that this three just by social contract 1121 00:52:24,580 --> 00:52:26,690 has to be the same as this three. 1122 00:52:26,690 --> 00:52:27,940 That is not a code constraint. 1123 00:52:27,940 --> 00:52:31,390 That's just sort of a little thing you knew and decided at the time. 1124 00:52:31,390 --> 00:52:33,350 So let me fix this in the following way. 1125 00:52:33,350 --> 00:52:38,260 It turns out that in C we can have variables that just have numbers 1126 00:52:38,260 --> 00:52:41,560 like this, so maybe int n gets three. 1127 00:52:41,560 --> 00:52:45,670 I can now just use my variable here and here. 1128 00:52:45,670 --> 00:52:46,725 That's a little better. 1129 00:52:46,725 --> 00:52:47,600 It's a little better. 
1130 00:52:47,600 --> 00:52:50,475 But there's this other feature in C, as with other languages too, 1131 00:52:50,475 --> 00:52:52,600 where if you know you want to hard code some value, 1132 00:52:52,600 --> 00:52:56,140 at least for now, but you don't want it to change, you will not change it 1133 00:52:56,140 --> 00:52:58,780 and you want to make sure you don't accidentally change it, 1134 00:52:58,780 --> 00:53:02,410 you can actually do something like this and even make it global if we want, 1135 00:53:02,410 --> 00:53:09,192 at the top of the file, I can say not just int n, but const int n, 1136 00:53:09,192 --> 00:53:10,900 and just because of human convention, I'm 1137 00:53:10,900 --> 00:53:13,940 also going to now capitalize the variable, just because. 1138 00:53:13,940 --> 00:53:17,290 And now I'm going to change this n to capital, this n to capital. 1139 00:53:17,290 --> 00:53:20,702 The reason being, I have just created for myself what's called a constant. 1140 00:53:20,702 --> 00:53:23,410 A constant is exactly what the word implies, even though you just 1141 00:53:23,410 --> 00:53:26,680 say const, and then the type of the variable. The compiler, Clang, 1142 00:53:26,680 --> 00:53:29,560 will make sure that neither you nor some friend or colleague 1143 00:53:29,560 --> 00:53:31,960 accidentally changes the value of n. 1144 00:53:31,960 --> 00:53:35,690 So now you can use n here, here, and any number of other places. 1145 00:53:35,690 --> 00:53:37,697 It will always be the same. 1146 00:53:37,697 --> 00:53:40,780 And what I'm using at the moment is what's called a global variable, which 1147 00:53:40,780 --> 00:53:43,570 are often frowned upon, even though you can put variables outside 1148 00:53:43,570 --> 00:53:45,760 of your functions, as we may eventually see, 1149 00:53:45,760 --> 00:53:49,030 it tends to be sloppy, except with constants.
1150 00:53:49,030 --> 00:53:53,765 When a constant is a value that you want to set and then forget about, 1151 00:53:53,765 --> 00:53:56,890 if you come back to this program weeks or months later, and you're like oh, 1152 00:53:56,890 --> 00:53:59,200 this semester we have four assignments, or five, 1153 00:53:59,200 --> 00:54:02,080 it's just handy to put the values you might 1154 00:54:02,080 --> 00:54:05,680 want to change before recompiling your code at the very top 1155 00:54:05,680 --> 00:54:09,185 so you don't have to go fishing for them visually lower in your code. 1156 00:54:09,185 --> 00:54:10,060 So just a convention. 1157 00:54:10,060 --> 00:54:13,210 It goes at the top of the file, quite often, and you declare it as const, 1158 00:54:13,210 --> 00:54:19,280 and you capitalize it, and then you can use that value, n, throughout the code. 1159 00:54:19,280 --> 00:54:22,480 But now let's tie together those other suggestions and make this program 1160 00:54:22,480 --> 00:54:24,430 even better, such that it's not just hard 1161 00:54:24,430 --> 00:54:27,940 coding this one value, n, everywhere. 1162 00:54:27,940 --> 00:54:30,170 Let me go ahead and get rid of this. 1163 00:54:30,170 --> 00:54:33,460 Let me go ahead now and take your suggestion that we do this dynamically, 1164 00:54:33,460 --> 00:54:35,590 and we can use arrays for this too. 1165 00:54:35,590 --> 00:54:39,700 If I know in advance that I want to ask the user for how many assignments there 1166 00:54:39,700 --> 00:54:42,280 are this semester, well I can do something like this. 1167 00:54:42,280 --> 00:54:48,800 Int n gets get int, and I'll say number of scores, 1168 00:54:48,800 --> 00:54:50,950 and then prompt them for their input. 1169 00:54:50,950 --> 00:54:54,550 And then what I'm going to do after that is give myself an array 1170 00:54:54,550 --> 00:54:57,910 called scores of size n as step two. 1171 00:54:57,910 --> 00:55:00,130 And then what I might do is something like this.
1172 00:55:00,130 --> 00:55:05,140 For int i gets zero, i less than n, i plus plus, 1173 00:55:05,140 --> 00:55:08,140 which even though I'm typing it fast, is exactly the same paradigm we've 1174 00:55:08,140 --> 00:55:10,360 used before, for, for loops. 1175 00:55:10,360 --> 00:55:12,610 And here, I could do something like scores 1176 00:55:12,610 --> 00:55:21,490 bracket i gets get int score semicolon, prompting the user again and again 1177 00:55:21,490 --> 00:55:24,730 and again in a loop for the i-th score, so to speak. 1178 00:55:24,730 --> 00:55:29,230 And because I start counting at zero, and on up to, but not through n, 1179 00:55:29,230 --> 00:55:34,030 I will end up filling this with exactly as many scores as the human requested. 1180 00:55:34,030 --> 00:55:37,568 Let's go ahead now and leave this as a to do for a moment. 1181 00:55:37,568 --> 00:55:39,610 Let me just because the math's about to change-- 1182 00:55:39,610 --> 00:55:42,485 let me go ahead and delete that and we'll just not do the average yet 1183 00:55:42,485 --> 00:55:44,380 just so I can compile this first. 1184 00:55:44,380 --> 00:55:46,540 I'm going to go ahead and make scores again-- 1185 00:55:46,540 --> 00:55:47,610 seems to compile. 1186 00:55:47,610 --> 00:55:55,460 Dot slash scores, number of scores-- let's do three, so 72, 73, 33, Enter, 1187 00:55:55,460 --> 00:55:56,710 and my average is still to do. 1188 00:55:56,710 --> 00:55:58,030 So we'll come back to that. 1189 00:55:58,030 --> 00:55:58,400 But you know what? 1190 00:55:58,400 --> 00:56:00,430 It would be nice to make this a little prettier. 1191 00:56:00,430 --> 00:56:03,850 Why don't I tell the human what score I want from them, so I can say, give me 1192 00:56:03,850 --> 00:56:05,910 score number such and such, i. 1193 00:56:05,910 --> 00:56:09,730 So let me just use get int, like this. 1194 00:56:09,730 --> 00:56:13,810 Now let me go ahead and make scores, dot slash scores.
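The loop described here might look like the following sketch. CS50's get_int is not part of standard C, so this version reads with fscanf from a FILE pointer instead; the function name read_scores and its return value are assumptions made for illustration.

```c
#include <stdio.h>

// Hypothetical stand-in for the get_int loop: reads up to n whole numbers
// from in, storing them in scores[0] through scores[n - 1].
// Returns how many scores were actually read.
int read_scores(FILE *in, int scores[], int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
    {
        printf("Score %i: ", i);   // prompt labeled with the array index
        if (fscanf(in, "%d", &scores[i]) != 1)
        {
            break;                 // stop on end of input or non-numeric text
        }
        count++;
    }
    return count;
}
```

Passing stdin as the first argument would prompt at the keyboard. Because i starts at zero and stops before n, exactly n slots are filled when enough input is available.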
1195 00:56:13,810 --> 00:56:15,110 Give me three scores again. 1196 00:56:15,110 --> 00:56:18,633 Score zero, 72, 73, 33. 1197 00:56:18,633 --> 00:56:20,050 Now this is kind of stupid, right? 1198 00:56:20,050 --> 00:56:23,230 At least for normal people who might use my program, what is score zero? 1199 00:56:23,230 --> 00:56:24,190 What is score one? 1200 00:56:24,190 --> 00:56:27,998 We can fix this for normal people, and just do that. 1201 00:56:27,998 --> 00:56:30,040 We're not changing where we're putting the value, 1202 00:56:30,040 --> 00:56:32,665 but we can certainly change the aesthetics of what we're doing. 1203 00:56:32,665 --> 00:56:34,230 So let's remake scores. 1204 00:56:34,230 --> 00:56:37,050 Dot slash scores, and now it's more human friendly-- 1205 00:56:37,050 --> 00:56:40,300 72, 73, 33. 1206 00:56:40,300 --> 00:56:41,620 So one piece remains. 1207 00:56:41,620 --> 00:56:43,870 How do I now compute the average in a way 1208 00:56:43,870 --> 00:56:46,620 that's dynamic and I'm not hard coding score one, score two, score 1209 00:56:46,620 --> 00:56:48,660 three again, or even the array version? 1210 00:56:48,660 --> 00:56:49,410 And you know what? 1211 00:56:49,410 --> 00:56:51,202 This is a nice opportunity to maybe come up 1212 00:56:51,202 --> 00:56:55,160 with a helper function that also solves the int issue from before. 1213 00:56:55,160 --> 00:56:56,910 So let me go ahead and say, you know what? 1214 00:56:56,910 --> 00:56:59,280 The average could perhaps have a fraction. 1215 00:56:59,280 --> 00:57:02,940 So what data type do I want to use if my average might have a fraction? 1216 00:57:02,940 --> 00:57:03,860 So a double or float. 1217 00:57:03,860 --> 00:57:04,860 So we'll go with either. 1218 00:57:04,860 --> 00:57:08,190 I'll keep it simple because the scores aren't going to be crazy big or precise. 1219 00:57:08,190 --> 00:57:10,540 I'm going to create a function called average.
1220 00:57:10,540 --> 00:57:14,820 And if I want to average all of the numbers that the human has typed in, 1221 00:57:14,820 --> 00:57:16,710 turns out I need to know two things. 1222 00:57:16,710 --> 00:57:20,790 I need to know the length of the array that they've been accumulating 1223 00:57:20,790 --> 00:57:23,825 and I need to have the array itself, so I'm 1224 00:57:23,825 --> 00:57:25,950 going to denote it with these square brackets here. 1225 00:57:25,950 --> 00:57:28,830 I don't have to know, at this point, how big it is. 1226 00:57:28,830 --> 00:57:30,990 The compiler will figure that out for me. 1227 00:57:30,990 --> 00:57:34,720 But I can now declare a function like this. 1228 00:57:34,720 --> 00:57:38,318 Well how do you go about averaging some number of values, 1229 00:57:38,318 --> 00:57:40,860 if you're handed them in a list, otherwise known as an array, 1230 00:57:40,860 --> 00:57:45,390 but I'm telling you the length of that list, what's this sort of intuition 1231 00:57:45,390 --> 00:57:48,390 for taking an average here? 1232 00:57:48,390 --> 00:57:49,068 Yeah? 1233 00:57:49,068 --> 00:57:53,280 AUDIENCE: You could take the sum and then divide it by [INAUDIBLE] number. 1234 00:57:53,280 --> 00:57:54,470 DAVID MALAN: Yeah. 1235 00:57:54,470 --> 00:57:56,220 Yeah, the average of a bunch of numbers is 1236 00:57:56,220 --> 00:57:58,262 just add all the numbers together and then divide 1237 00:57:58,262 --> 00:57:59,640 by the total number of numbers. 1238 00:57:59,640 --> 00:58:01,140 And I have all of those ingredients. 1239 00:58:01,140 --> 00:58:03,030 I have the length of the array, apparently, 1240 00:58:03,030 --> 00:58:05,505 and I have the array of numbers itself, as follows. 
1241 00:58:05,505 --> 00:58:07,380 So let me go ahead and say something like sum 1242 00:58:07,380 --> 00:58:09,880 is zero, because I'm just going to start counting from zero, 1243 00:58:09,880 --> 00:58:14,580 and then I'm going to do for int i get zero, i less than length, i plus plus. 1244 00:58:14,580 --> 00:58:17,850 So again, I typed it fast, but it's identical to my for loop from before. 1245 00:58:17,850 --> 00:58:20,310 I'm just using the length as the condition. 1246 00:58:20,310 --> 00:58:21,840 And now what do I want to do here? 1247 00:58:21,840 --> 00:58:26,800 On each iteration, what do I want to add to the sum? 1248 00:58:26,800 --> 00:58:28,460 Sum equals sum plus what? 1249 00:58:28,460 --> 00:58:29,883 AUDIENCE: [INAUDIBLE] 1250 00:58:29,883 --> 00:58:31,550 DAVID MALAN: The next item in the array. 1251 00:58:31,550 --> 00:58:33,380 And I can express that, it turns out, just 1252 00:58:33,380 --> 00:58:37,070 like before the name of the array, which happens to be literally array, just 1253 00:58:37,070 --> 00:58:38,160 for convenience. 1254 00:58:38,160 --> 00:58:41,520 And then how do I get the appropriate value from it? 1255 00:58:41,520 --> 00:58:45,040 Bracket i, because i is going to start in this loop at zero, 1256 00:58:45,040 --> 00:58:47,630 going to go up to, but not through its length. 1257 00:58:47,630 --> 00:58:50,630 So this is just a way of getting bracket zero, bracket one, bracket two, 1258 00:58:50,630 --> 00:58:53,510 and just adding it to sum on each iteration. 1259 00:58:53,510 --> 00:58:55,820 Now this is unnecessarily wordy. 1260 00:58:55,820 --> 00:58:59,210 Recall, that this is shorthand notation for that. 1261 00:58:59,210 --> 00:59:01,940 I can't just use plus, plus here though, because I want 1262 00:59:01,940 --> 00:59:04,070 to add the actual scores not just one. 
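A sketch of the summing loop as described, using the shorthand form. The function name sum_of is an assumption; in the lecture this loop sits inside average.

```c
// Adds up the first length elements of array.
int sum_of(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];   // shorthand for sum = sum + array[i];
    }
    return sum;
}
```

Writing sum++ here would add 1 on each pass rather than the score itself, which is why the += form is needed.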
1263 00:59:04,070 --> 00:59:07,280 So I can use either this syntax or the more verbose syntax, 1264 00:59:07,280 --> 00:59:08,540 but I'll go with this one. 1265 00:59:08,540 --> 00:59:11,480 And now at the end of this function, notice I have to make a decision. 1266 00:59:11,480 --> 00:59:14,120 And we haven't seen terribly many functions of our own, 1267 00:59:14,120 --> 00:59:17,420 but if this is what my function looks like, its name is average, 1268 00:59:17,420 --> 00:59:22,100 it takes two inputs, one of which is an int called length, the other of which 1269 00:59:22,100 --> 00:59:26,150 is an array of integers, and I know it's an array not by its name, which 1270 00:59:26,150 --> 00:59:30,290 I could have called anything, but I know it because of these new square brackets 1271 00:59:30,290 --> 00:59:31,790 today. 1272 00:59:31,790 --> 00:59:36,405 However, what does this mention of float mean on the left-hand side of line 18? 1273 00:59:36,405 --> 00:59:37,280 AUDIENCE: [INAUDIBLE] 1274 00:59:37,280 --> 00:59:38,780 DAVID MALAN: That's what it returns. 1275 00:59:38,780 --> 00:59:42,740 The return value of a function is what it hands back to whoever is using it. 1276 00:59:42,740 --> 00:59:44,690 So get string, returns a string. 1277 00:59:44,690 --> 00:59:46,160 Get int, returns an int. 1278 00:59:46,160 --> 00:59:48,860 Average I want to return a float. 1279 00:59:48,860 --> 00:59:51,000 And so how do I return this value? 1280 00:59:51,000 --> 00:59:55,120 Well, let me go ahead and return the sum divided by the length, 1281 00:59:55,120 --> 00:59:56,990 as I think you proposed? 1282 00:59:56,990 --> 01:00:00,748 Now there's actually one bug here, but we'll come back to that in a moment. 1283 01:00:00,748 --> 01:00:02,790 Now let me just go ahead and plug in the average. 1284 01:00:02,790 --> 01:00:06,240 What's the format code for a floating point value? 1285 01:00:06,240 --> 01:00:07,400 Percent f, yeah. 
1286 01:00:07,400 --> 01:00:09,710 And then if I want to plug in the average, 1287 01:00:09,710 --> 01:00:12,600 I can call my function called average. 1288 01:00:12,600 --> 01:00:15,440 And what two inputs do I need to give it? 1289 01:00:15,440 --> 01:00:20,760 n, which is the length of the array, and scores, which is the name of the array. 1290 01:00:20,760 --> 01:00:22,970 So again, even though arrays are new, this is not. 1291 01:00:22,970 --> 01:00:27,230 We have last week called functions that take one or more arguments 1292 01:00:27,230 --> 01:00:28,830 and it's certainly fine to nest them. 1293 01:00:28,830 --> 01:00:30,913 However, if you don't like that, you can certainly 1294 01:00:30,913 --> 01:00:32,720 do something like this-- float average gets 1295 01:00:32,720 --> 01:00:34,890 that, and then you can plug in average. 1296 01:00:34,890 --> 01:00:36,920 But again, in the spirit of good design, you're 1297 01:00:36,920 --> 01:00:39,260 just doubling the number of lines unnecessarily. 1298 01:00:39,260 --> 01:00:42,710 So I'm going to go ahead and nest it just like this. 1299 01:00:42,710 --> 01:00:44,430 All right, let me save that. 1300 01:00:44,430 --> 01:00:46,527 And I feel really good about this so far. 1301 01:00:46,527 --> 01:00:48,110 I feel like everything's making sense. 1302 01:00:48,110 --> 01:00:49,220 So make scores. 1303 01:00:49,220 --> 01:00:50,690 And oh, my god. 1304 01:00:50,690 --> 01:00:53,450 1305 01:00:53,450 --> 01:00:58,042 Line 15 seems to be at fault. So we can certainly use help 50, 1306 01:00:58,042 --> 01:00:59,750 but let's see if we can't reason through. 1307 01:00:59,750 --> 01:01:00,792 What mistake have I made? 1308 01:01:00,792 --> 01:01:04,130 1309 01:01:04,130 --> 01:01:06,920 It's highlighted here, even though it's very non obvious. 1310 01:01:06,920 --> 01:01:07,862 Yeah? 
1311 01:01:07,862 --> 01:01:12,101 AUDIENCE: [INAUDIBLE] 1312 01:01:12,101 --> 01:01:12,685 1313 01:01:12,685 --> 01:01:13,560 DAVID MALAN: Exactly. 1314 01:01:13,560 --> 01:01:16,230 My function is at the bottom of my file and C is kind of dumb. 1315 01:01:16,230 --> 01:01:18,730 It only does what it's told, top to bottom, left to right. 1316 01:01:18,730 --> 01:01:20,670 And if your function average is at the bottom, 1317 01:01:20,670 --> 01:01:23,400 but you're trying to use it in main, that's too late. 1318 01:01:23,400 --> 01:01:26,380 So we can fix this in a couple of ways, just as we did last week. 1319 01:01:26,380 --> 01:01:28,380 I can kind of sloppily just say, all right, well 1320 01:01:28,380 --> 01:01:29,630 let's just move it to the top. 1321 01:01:29,630 --> 01:01:31,350 That will solve that problem. 1322 01:01:31,350 --> 01:01:33,727 But frankly, that moves main farther down 1323 01:01:33,727 --> 01:01:36,060 and it's a good human convention to keep main at the top 1324 01:01:36,060 --> 01:01:38,200 so you can see the main part of your program. 1325 01:01:38,200 --> 01:01:41,700 This is why, last week, we introduced the notion of a prototype, 1326 01:01:41,700 --> 01:01:44,940 where you literally-- and this is the only time where the copy-paste is OK-- 1327 01:01:44,940 --> 01:01:48,810 you copy-paste the first line of your function and end it with a semicolon 1328 01:01:48,810 --> 01:01:50,430 without any more curly braces. 1329 01:01:50,430 --> 01:01:52,200 That's now a clue to solve that problem. 1330 01:01:52,200 --> 01:01:53,770 Hey clang, here's a function. 1331 01:01:53,770 --> 01:01:55,895 I'm not going to get around to implementing it yet, 1332 01:01:55,895 --> 01:01:57,570 but you at least know what it's called. 1333 01:01:57,570 --> 01:02:00,090 Now there's still a slight logical bug in here. 1334 01:02:00,090 --> 01:02:04,270 Let me try re-saving and recompiling scores. 1335 01:02:04,270 --> 01:02:05,880 It compiled this time-- nice.
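A sketch of the prototype fix. The caller class_average is a hypothetical stand-in for main, used only so the fragment can show a call that appears above the function's definition, and the body of average already uses a floating-point division, which is an assumption about where the code ends up rather than what is on screen at this moment.

```c
// Prototype: the first line of the function, ended with a semicolon.
// It tells the compiler the name, inputs, and return type in advance.
float average(int length, int array[]);

// Hypothetical caller, playing the role of main: it appears before
// the definition of average, which is fine thanks to the prototype.
float class_average(void)
{
    int scores[] = {72, 73, 33};
    return average(3, scores);
}

// The definition can now live at the bottom of the file.
float average(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }
    return sum / (float) length;
}
```

Deleting the prototype line should bring back a compile error along the lines of the one shown, since the call would then come before any mention of average.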
1336 01:02:05,880 --> 01:02:07,410 Let me go ahead and run scores. 1337 01:02:07,410 --> 01:02:13,320 Number of scores will be three, 72, 73, 33. 1338 01:02:13,320 --> 01:02:14,560 OK, that's pretty good. 1339 01:02:14,560 --> 01:02:15,730 Let me try another one. 1340 01:02:15,730 --> 01:02:17,190 How about two scores. 1341 01:02:17,190 --> 01:02:21,570 100 and suppose you get a 99 on the other, 1342 01:02:21,570 --> 01:02:24,620 you probably want your grade to be what? 1343 01:02:24,620 --> 01:02:25,380 100, right. 1344 01:02:25,380 --> 01:02:28,240 If it's 99.5, you'd prefer we round up. 1345 01:02:28,240 --> 01:02:30,360 So where is that bug? 1346 01:02:30,360 --> 01:02:32,490 Well let me scroll down here, and this is 1347 01:02:32,490 --> 01:02:35,370 what you were alluding to earlier when you identified this early on. 1348 01:02:35,370 --> 01:02:37,650 So I'm doing a couple of things incorrectly here. 1349 01:02:37,650 --> 01:02:40,950 One, I'm adding the sum here. 1350 01:02:40,950 --> 01:02:44,280 I'm using an int and initializing sum to zero, 1351 01:02:44,280 --> 01:02:46,600 and then I'm dividing an integer by an integer. 1352 01:02:46,600 --> 01:02:50,190 And this is subtle, but in C, if you divide an integer by an integer, 1353 01:02:50,190 --> 01:02:52,920 just take a guess-- what do you get as the answer? 1354 01:02:52,920 --> 01:02:53,800 AUDIENCE: An integer. 1355 01:02:53,800 --> 01:02:54,420 DAVID MALAN: An integer. 1356 01:02:54,420 --> 01:02:56,170 Integers can't store decimal points. 1357 01:02:56,170 --> 01:03:01,398 So even if your score is 99.900000 ad nauseum, 1358 01:03:01,398 --> 01:03:04,440 what's going to get thrown away is literally everything after the decimal 1359 01:03:04,440 --> 01:03:05,140 point. 1360 01:03:05,140 --> 01:03:07,290 So your grade is actually a 99. 1361 01:03:07,290 --> 01:03:11,550 So there's a couple of ways we can fix this, but perhaps the simplest is this. 
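The truncation just described can be seen side by side in a small sketch. Both function names are assumptions chosen for illustration.

```c
// Integer divided by integer: everything after the decimal point is
// thrown away, so a sum of 199 over 2 scores comes back as 99.
int average_truncated(int sum, int length)
{
    return sum / length;
}

// Casting an operand to float first makes the division happen in
// floating point, so the same inputs come back as 99.5.
float average_exact(int sum, int length)
{
    return (float) sum / length;
}
```

Casting both operands, as in (float) sum / (float) length, gives the same result and is equally valid.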
1362 01:03:11,550 --> 01:03:14,220 I can use that casting feature from before. 1363 01:03:14,220 --> 01:03:16,363 I can tell the computer, don't treat length 1364 01:03:16,363 --> 01:03:19,530 as an int, actually treat it as a float, and you know, just for good measure, 1365 01:03:19,530 --> 01:03:20,820 also treat sum as a float. 1366 01:03:20,820 --> 01:03:23,670 And there's different ways to do this, but now, I'm 1367 01:03:23,670 --> 01:03:26,670 telling the computer divide a float by a float, which 1368 01:03:26,670 --> 01:03:30,090 will allow me to return a float, and let's see what happens now. 1369 01:03:30,090 --> 01:03:31,590 Let me save that. 1370 01:03:31,590 --> 01:03:33,570 Make scores. 1371 01:03:33,570 --> 01:03:34,350 It compiled. 1372 01:03:34,350 --> 01:03:35,840 Dot slash scores. 1373 01:03:35,840 --> 01:03:36,850 Number of scores is two. 1374 01:03:36,850 --> 01:03:38,040 100 is the first. 1375 01:03:38,040 --> 01:03:39,570 99 is the second. 1376 01:03:39,570 --> 01:03:41,572 Nice, now I've gotten the grade I deserved. 1377 01:03:41,572 --> 01:03:44,030 Heck, we could even bring in the round function if we want, 1378 01:03:44,030 --> 01:03:46,863 which you might have used for p-set one, but we'll leave it as this. 1379 01:03:46,863 --> 01:03:49,917 But I am going to go ahead and just do a .1 there. 1380 01:03:49,917 --> 01:03:51,750 Recall that with format codes you can really 1381 01:03:51,750 --> 01:03:54,700 start to get precise and say only show me one digit. 1382 01:03:54,700 --> 01:03:59,430 So if I recompile this now, make scores, and do dot slash scores-- 1383 01:03:59,430 --> 01:04:02,100 two scores-- 100, 99. 1384 01:04:02,100 --> 01:04:09,730 There's my 99.5%. Any questions then on these arrays and the use thereof? 1385 01:04:09,730 --> 01:04:10,327 Yeah?
1386 01:04:10,327 --> 01:04:14,143 AUDIENCE: [INAUDIBLE] the average [INAUDIBLE] income scores by 1387 01:04:14,143 --> 01:04:15,097 [INAUDIBLE] 1388 01:04:15,097 --> 01:04:17,180 DAVID MALAN: Explain the average-- this part here? 1389 01:04:17,180 --> 01:04:17,960 AUDIENCE: Yeah. 1390 01:04:17,960 --> 01:04:19,140 DAVID MALAN: Sure, can I explain this? 1391 01:04:19,140 --> 01:04:20,480 So, let me just show more of the code. 1392 01:04:20,480 --> 01:04:22,438 The last line of this program's purpose in life 1393 01:04:22,438 --> 01:04:25,610 is just to print the average of all of my scores. 1394 01:04:25,610 --> 01:04:28,160 And I decided, partly for design purposes, 1395 01:04:28,160 --> 01:04:32,780 but also today to illustrate a point, to relegate the computation of an average 1396 01:04:32,780 --> 01:04:33,980 to a custom function. 1397 01:04:33,980 --> 01:04:35,330 This is handy, because now if I ever work 1398 01:04:35,330 --> 01:04:37,070 on another problem that needs to average, 1399 01:04:37,070 --> 01:04:39,620 I've got a function I can use in that code too. 1400 01:04:39,620 --> 01:04:43,340 But in this case, average takes two arguments, apparently 1401 01:04:43,340 --> 01:04:45,970 the length of the array and the array itself, 1402 01:04:45,970 --> 01:04:47,720 but I could call these two things anything 1403 01:04:47,720 --> 01:04:51,350 I want-- x and y, length and array, anything else, 1404 01:04:51,350 --> 01:04:53,120 but I chose this for clarity. 1405 01:04:53,120 --> 01:04:54,860 But up here, I want to use that function. 1406 01:04:54,860 --> 01:04:57,200 So just like in Scratch, recall that you can nest blocks 1407 01:04:57,200 --> 01:04:59,430 and you can join something and then say it. 
1408 01:04:59,430 --> 01:05:02,120 So can we call the average function, passing 1409 01:05:02,120 --> 01:05:04,610 in the length of the array and the array itself, 1410 01:05:04,610 --> 01:05:08,000 that gives me back my average 99.5, and then I'm 1411 01:05:08,000 --> 01:05:11,480 plugging that in to this format code in printf. 1412 01:05:11,480 --> 01:05:13,888 So just like in math, when you have lots of parentheses, 1413 01:05:13,888 --> 01:05:14,930 work from the inside out. 1414 01:05:14,930 --> 01:05:17,388 Look at the innermost parentheses, figure out what that is, 1415 01:05:17,388 --> 01:05:19,420 then work your way outward. 1416 01:05:19,420 --> 01:05:22,880 And if you've programmed in Java, or Python, or other languages, 1417 01:05:22,880 --> 01:05:25,040 you might be wondering why we need to tell 1418 01:05:25,040 --> 01:05:27,110 the function the length of an array. 1419 01:05:27,110 --> 01:05:30,440 In C, the arrays do not remember their own length. 1420 01:05:30,440 --> 01:05:33,200 So if you have programmed before, this is necessary. 1421 01:05:33,200 --> 01:05:36,920 You do not get that feature for free in C. Yeah? 1422 01:05:36,920 --> 01:05:41,580 AUDIENCE: [INAUDIBLE] 1423 01:05:41,580 --> 01:05:43,420 DAVID MALAN: Correct, if you do percent 0.1 1424 01:05:43,420 --> 01:05:45,520 you get one decimal point, so 99.5%. 1425 01:05:45,520 --> 01:05:49,843 AUDIENCE: Suppose that the answer was 99.49 [INAUDIBLE] 1426 01:05:49,843 --> 01:05:51,260 DAVID MALAN: Really good question. 1427 01:05:51,260 --> 01:05:57,170 If the answer is mathematically 99.49, but you do 0.1 here, 1428 01:05:57,170 --> 01:05:58,940 it will round up for you. 1429 01:05:58,940 --> 01:06:00,950 It will-- good question as well. 1430 01:06:00,950 --> 01:06:01,650 Yeah? 1431 01:06:01,650 --> 01:06:05,003 AUDIENCE: What happens [INAUDIBLE]? 1432 01:06:05,003 --> 01:06:06,420 DAVID MALAN: Really good question. 
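On the rounding question above, a small sketch can make the behavior of the one-decimal format code concrete. The helper name format_one_decimal is an assumption; it uses the standard snprintf function, which accepts the same format codes as printf but writes into a character buffer.

```c
#include <stdio.h>

// Writes value into buffer with one digit after the decimal point,
// using the same "%.1f" format code that printf accepts.
void format_one_decimal(char buffer[], size_t size, double value)
{
    snprintf(buffer, size, "%.1f", value);
}
```

With this helper, 99.49 is written as 99.5 and 99.44 as 99.4, so the format code rounds to the nearest tenth rather than simply cutting digits off.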
1433 01:06:06,420 --> 01:06:09,830 What happens if you divide an int by a float or something else? 1434 01:06:09,830 --> 01:06:13,520 You will typically up cast it to whatever the more powerful type is. 1435 01:06:13,520 --> 01:06:16,760 So if you divide an int by a float, you will actually get back a float. 1436 01:06:16,760 --> 01:06:20,060 So strictly speaking, I did not need to cast both the numerator 1437 01:06:20,060 --> 01:06:21,710 and the denominator to a float. 1438 01:06:21,710 --> 01:06:25,160 I just did it for consistency and demonstration's sake. 1439 01:06:25,160 --> 01:06:28,700 So it turns out, while we've been looking at numbers here alone 1440 01:06:28,700 --> 01:06:31,250 and scores, it turns out that there's actually 1441 01:06:31,250 --> 01:06:35,533 an intricate relationship with all of the h's and the i's and the exclamation 1442 01:06:35,533 --> 01:06:37,700 points we've been looking at, and all of the strings 1443 01:06:37,700 --> 01:06:40,670 we've been typing in too, however this was a mouthful, 1444 01:06:40,670 --> 01:06:42,450 and frankly I feel like a brownie as well, 1445 01:06:42,450 --> 01:06:45,283 so why don't we take our five minute break here and we'll come back. 1446 01:06:45,283 --> 01:06:47,570 1447 01:06:47,570 --> 01:06:51,610 We are back. 1448 01:06:51,610 --> 01:06:55,930 So thus far, we've introduced arrays as an opportunity 1449 01:06:55,930 --> 01:06:58,048 to improve the design of our code. 1450 01:06:58,048 --> 01:07:00,340 So we're going to hear a lot of squeaking now, I think. 1451 01:07:00,340 --> 01:07:05,890 So thus far, we've introduced arrays as the-- 1452 01:07:05,890 --> 01:07:08,240 I'm going to do my best to keep a straight face.
1453 01:07:08,240 --> 01:07:11,500 Thus far, we have introduced arrays as a solution to a design problem 1454 01:07:11,500 --> 01:07:14,260 so that we can actually store multiple values, 1455 01:07:14,260 --> 01:07:18,970 but in the guise of one variable so as to avoid the copy-paste tendency 1456 01:07:18,970 --> 01:07:20,410 that we might otherwise have. 1457 01:07:20,410 --> 01:07:24,250 And those arrays ultimately started from trying to clean this kind of code up. 1458 01:07:24,250 --> 01:07:27,530 But what is it that was ultimately going on inside of the computer's memory 1459 01:07:27,530 --> 01:07:30,520 we can still consider, because it's actually not all that different. 1460 01:07:30,520 --> 01:07:34,960 However, when we have three integers, score one, score two, score three, 1461 01:07:34,960 --> 01:07:38,530 how many bytes is each of those-- it's going to take up? 1462 01:07:38,530 --> 01:07:41,770 So four, if you think back to the chart from before, char is one, 1463 01:07:41,770 --> 01:07:45,160 an int is four, at least on most systems, and so the number 1464 01:07:45,160 --> 01:07:49,180 72 in the variable called score one, we can draw in our computer's 1465 01:07:49,180 --> 01:07:50,950 memory as taking up four of these boxes. 1466 01:07:50,950 --> 01:07:54,010 Because again, each box represents one byte, therefore four bytes 1467 01:07:54,010 --> 01:07:55,300 requires four boxes. 1468 01:07:55,300 --> 01:07:57,340 Score two and score three would similarly 1469 01:07:57,340 --> 01:07:58,990 be laid out in my computer's memory. 1470 01:07:58,990 --> 01:08:02,800 If I had three variables, score one, two, and three, as follows, like this. 1471 01:08:02,800 --> 01:08:05,890 Of course what's underneath the hood is actually bits, 1472 01:08:05,890 --> 01:08:09,500 but again, we don't need to worry about that level of abstraction anymore. 1473 01:08:09,500 --> 01:08:11,920 But that's indeed all that's going on there.
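The byte counts in this picture can be checked with C's sizeof operator, as in this small sketch. The function name bytes_for_scores is an assumption for illustration.

```c
#include <stddef.h>

// Reports how many bytes an array of three ints occupies.
// On systems where an int is 4 bytes this is 12, but the C standard
// does not fix the size of an int, so the value can differ.
size_t bytes_for_scores(void)
{
    int scores[3] = {72, 73, 33};
    return sizeof(scores);
}
```

The result always equals 3 * sizeof(int), which matches the drawing of three values taking four boxes each on a typical machine.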
1474 01:08:11,920 --> 01:08:13,330 But we can clean this up. 1475 01:08:13,330 --> 01:08:16,270 We can instead get rid of this copy-paste approach to variable names 1476 01:08:16,270 --> 01:08:18,460 and just introduce an array called scores, 1477 01:08:18,460 --> 01:08:22,689 plural, and then initialize those three values, as in the program I wrote here. 1478 01:08:22,689 --> 01:08:27,490 And then, this picture is similar in spirit, but the names of these boxes, 1479 01:08:27,490 --> 01:08:31,840 so to speak, become scores zero, scores one, and scores two. 1480 01:08:31,840 --> 01:08:36,640 So the array is now independent of the number of bytes being consumed. 1481 01:08:36,640 --> 01:08:38,979 Just because an int is four bytes, doesn't 1482 01:08:38,979 --> 01:08:43,359 mean you do scores zero, scores four, scores eight, and so forth. 1483 01:08:43,359 --> 01:08:44,979 It's still zero, one, two. 1484 01:08:44,979 --> 01:08:49,990 The computer will figure out exactly how much space to give each of those values 1485 01:08:49,990 --> 01:08:52,354 based on its type, which is an int. 1486 01:08:52,354 --> 01:08:54,729 But it turns out that there's actually a relationship now 1487 01:08:54,729 --> 01:08:58,330 to where we began this story when we looked at characters. 1488 01:08:58,330 --> 01:09:01,660 H-I exclamation point was implemented with three lines of code 1489 01:09:01,660 --> 01:09:03,760 using c1, c2, and c3. 1490 01:09:03,760 --> 01:09:06,850 But last week, we already saw the notion of a string, 1491 01:09:06,850 --> 01:09:11,710 and it turns out strings and chars are fundamentally interrelated in ways 1492 01:09:11,710 --> 01:09:13,630 that we can now literally see.
1493 01:09:13,630 --> 01:09:16,779 If we had a string called s, for instance, 1494 01:09:16,779 --> 01:09:20,680 and that string contains three characters, H-I and an exclamation 1495 01:09:20,680 --> 01:09:23,109 point, well it turns out you can actually 1496 01:09:23,109 --> 01:09:25,479 get at the individual letters in a string 1497 01:09:25,479 --> 01:09:29,950 by doing the name of the string, bracket, zero, close bracket, 1498 01:09:29,950 --> 01:09:31,950 or s bracket one, or s bracket two. 1499 01:09:31,950 --> 01:09:35,260 If the name of my variable is s, and s is a string, 1500 01:09:35,260 --> 01:09:38,529 I can actually access the individual characters there in just 1501 01:09:38,529 --> 01:09:41,830 like an array, which is to say then, what 1502 01:09:41,830 --> 01:09:48,029 is a string as of this week versus last? 1503 01:09:48,029 --> 01:09:49,850 It's just an array of chars. 1504 01:09:49,850 --> 01:09:51,340 It's just an array of characters. 1505 01:09:51,340 --> 01:09:54,848 So even though it's a data type, thanks to CS50's library and CS50 dot h, 1506 01:09:54,848 --> 01:09:57,640 and we're going to take this training wheel off within a few weeks, 1507 01:09:57,640 --> 01:09:59,890 we've essentially just created a string to be 1508 01:09:59,890 --> 01:10:02,720 for now, at this point in the story, just an array of characters. 1509 01:10:02,720 --> 01:10:03,220 Why? 1510 01:10:03,220 --> 01:10:05,327 Because being able to have multiple characters 1511 01:10:05,327 --> 01:10:07,660 is certainly way more useful than having to spell things 1512 01:10:07,660 --> 01:10:11,320 out one variable at a time with one char at a time. 1513 01:10:11,320 --> 01:10:14,470 So string is a data type in the CS50 library 1514 01:10:14,470 --> 01:10:17,763 that is, for today's purposes, indeed just an array of characters.
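Here is a minimal sketch of indexing into a string with square brackets. It uses a plain C character array in place of the CS50 string type so that it needs no extra library, and the names char_at and print_three are assumptions for illustration.

```c
#include <stdio.h>

// Returns the character stored at index i of s.
// No bounds checking is done; i must be a valid index.
char char_at(const char s[], int i)
{
    return s[i];
}

// Prints the first three characters of s, one index at a time.
// Assumes s holds at least three characters, such as "HI!".
void print_three(const char s[])
{
    printf("%c %c %c\n", s[0], s[1], s[2]);
}
```

Calling print_three("HI!") visits s[0], s[1], and s[2] in turn, just as with an array of ints.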
1515 01:10:17,763 --> 01:10:19,930 And we'll see before long that, that too is actually 1516 01:10:19,930 --> 01:10:24,290 kind of a bit of a white lie, but we'll see why before long as well. 1517 01:10:24,290 --> 01:10:27,040 So if I declare a string in C, I can actually 1518 01:10:27,040 --> 01:10:28,540 literally do something like this. 1519 01:10:28,540 --> 01:10:32,620 String s equals quote unquote hi, this time using double quotes, and not 1520 01:10:32,620 --> 01:10:36,182 single quotes, because it's three characters and not just a single char. 1521 01:10:36,182 --> 01:10:38,890 So in memory, that's actually going to look pretty much the same. 1522 01:10:38,890 --> 01:10:42,910 If the variable's called s, it's going to have h i and an exclamation point. 1523 01:10:42,910 --> 01:10:46,630 And just for simplicity, I'll label the first box as s 1524 01:10:46,630 --> 01:10:49,210 and just assume that we can get everywhere else. 1525 01:10:49,210 --> 01:10:53,170 But it turns out that strings are a little special, because 1526 01:10:53,170 --> 01:10:56,530 unlike a char, which is one byte, unlike an int, which 1527 01:10:56,530 --> 01:10:59,590 is four bytes, unlike a long, which is eight bytes, 1528 01:10:59,590 --> 01:11:01,672 how long should a string be? 1529 01:11:01,672 --> 01:11:03,245 AUDIENCE: [INAUDIBLE] 1530 01:11:03,245 --> 01:11:05,620 DAVID MALAN: Yeah, I mean as many characters as you need, 1531 01:11:05,620 --> 01:11:08,050 because if I want to store H-I I need-- 1532 01:11:08,050 --> 01:11:11,170 H-I exclamation point, I need strings to be at least three bytes, 1533 01:11:11,170 --> 01:11:12,020 it would seem-- 1534 01:11:12,020 --> 01:11:15,100 for my name David, at least five bytes, for D-A-V-I-D-- 1535 01:11:15,100 --> 01:11:18,130 Brian, as well, and much longer names in the room, too. 
1536 01:11:18,130 --> 01:11:21,413 So strings can't really have a preordained length associated 1537 01:11:21,413 --> 01:11:23,830 with them, which is why I put a question mark on the board 1538 01:11:23,830 --> 01:11:27,040 before when I first summarized the sizes of these types. 1539 01:11:27,040 --> 01:11:31,810 But the catch is that if a variable only has a name, like s, or name, or any 1540 01:11:31,810 --> 01:11:34,820 of the variables you use for p-set one's problems, 1541 01:11:34,820 --> 01:11:38,380 it turns out we all need to decide as human programmers 1542 01:11:38,380 --> 01:11:41,020 how do we know where the string ends? 1543 01:11:41,020 --> 01:11:43,330 The name of the variable, suffice it to say, 1544 01:11:43,330 --> 01:11:46,330 lets us know where the variable begins, just as I've drawn here. 1545 01:11:46,330 --> 01:11:48,960 If you reference a variable in a program and call it s, 1546 01:11:48,960 --> 01:11:52,390 the computer will just know to go to the first character in that string. 1547 01:11:52,390 --> 01:11:55,210 But there needs to be a little clue to the computer as to where 1548 01:11:55,210 --> 01:11:59,230 the string ends, and that clue is what's called a null character. 1549 01:11:59,230 --> 01:12:01,300 It's a little funky to look at, but it's just 1550 01:12:01,300 --> 01:12:04,347 a backslash zero, which might remind you of backslash n, which 1551 01:12:04,347 --> 01:12:06,430 too is a little funky, and that's a special symbol 1552 01:12:06,430 --> 01:12:09,790 that says move the cursor to the next line, give a new line. 1553 01:12:09,790 --> 01:12:12,550 Backslash zero is the so-called null character 1554 01:12:12,550 --> 01:12:14,890 or the null terminating character. 1555 01:12:14,890 --> 01:12:19,930 And all that is special syntax for eight zero bits. 1556 01:12:19,930 --> 01:12:22,310 So each of these boxes represents eight bits. 1557 01:12:22,310 --> 01:12:23,530 This is number 72.
1558 01:12:23,530 --> 01:12:25,030 This is the number 73. 1559 01:12:25,030 --> 01:12:26,470 This is the number 33. 1560 01:12:26,470 --> 01:12:32,530 This backslash zero is just the way of drawing all eight bits as zeros. 1561 01:12:32,530 --> 01:12:36,670 So that's what a computer uses in C to demarcate the end of a string. 1562 01:12:36,670 --> 01:12:40,025 It just wastes one byte as all zero bits. 1563 01:12:40,025 --> 01:12:41,650 And I say waste, because you know what? 1564 01:12:41,650 --> 01:12:47,740 How much space does H-I exclamation point actually take up accordingly? 1565 01:12:47,740 --> 01:12:50,620 How many bytes do you need to store hi? 1566 01:12:50,620 --> 01:12:52,000 AUDIENCE: [INAUDIBLE] 1567 01:12:52,000 --> 01:12:56,380 DAVID MALAN: Three, well, four, because you need to know where the string ends, 1568 01:12:56,380 --> 01:12:58,780 otherwise you won't be able to distinguish 1569 01:12:58,780 --> 01:13:02,450 the beginnings of other variables, potentially, in your computer's memory. 1570 01:13:02,450 --> 01:13:04,130 And we'll see this in just a moment. 1571 01:13:04,130 --> 01:13:06,400 So if my string is called s, it turns out 1572 01:13:06,400 --> 01:13:08,410 that at s bracket zero is the first character. 1573 01:13:08,410 --> 01:13:12,010 S bracket one is the second character. s bracket two is the third. 1574 01:13:12,010 --> 01:13:16,210 And that null character, so to speak, the invisible backslash zero 1575 01:13:16,210 --> 01:13:18,950 or eight zero bits happens to be at the end. 1576 01:13:18,950 --> 01:13:24,250 So a string that's of length three, actually takes up four bytes. 1577 01:13:24,250 --> 01:13:27,940 Any string you have typed into a computer yet, whether it's hi, 1578 01:13:27,940 --> 01:13:30,970 or David, or Brian, or Emma, or Rodrigo, takes up 1579 01:13:30,970 --> 01:13:33,940 as many characters as are in those names, 1580 01:13:33,940 --> 01:13:37,370 plus one byte for this special null terminating character. 
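The role of the terminating character can be sketched as a loop that walks the string until it reaches backslash zero. The function name length_of is an assumption; the standard library's strlen function does the same job.

```c
#include <stddef.h>

// Counts the characters in s up to, but not including, the '\0'
// that marks where the string ends.
size_t length_of(const char s[])
{
    size_t n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

For "HI!" this count is 3, while sizeof("HI!") is 4, because the string literal is stored with the extra terminating byte.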
1581 01:13:37,370 --> 01:13:38,122 So let's see that. 1582 01:13:38,122 --> 01:13:40,330 If we were to write a program using these four names, 1583 01:13:40,330 --> 01:13:42,860 let me go ahead and whip that up really quickly here. 1584 01:13:42,860 --> 01:13:46,450 I'm going to create a file called names dot c, 1585 01:13:46,450 --> 01:13:50,050 and I'm going to go ahead and do include standard io dot h. 1586 01:13:50,050 --> 01:13:53,260 Then I'm going to go ahead and do int main void. 1587 01:13:53,260 --> 01:13:57,790 Inside of here, I'm going to give myself four strings, using my new array 1588 01:13:57,790 --> 01:13:59,080 syntax, as before. 1589 01:13:59,080 --> 01:14:01,870 So I could call this name one, name two, name three, name four, 1590 01:14:01,870 --> 01:14:03,760 but I'm not going to repeat that bad habit. 1591 01:14:03,760 --> 01:14:05,440 I'm going to give myself a name-- 1592 01:14:05,440 --> 01:14:10,150 a variable called names, plural, and store four strings in it, as follows. 1593 01:14:10,150 --> 01:14:12,430 Let's give Emma the first spot there. 1594 01:14:12,430 --> 01:14:16,180 Let's give Rodrigo the second spot there. 1595 01:14:16,180 --> 01:14:19,780 I'm using all caps just because we've seen some of those ASCII codes before, 1596 01:14:19,780 --> 01:14:21,610 but I could use lowercase as well. 1597 01:14:21,610 --> 01:14:22,690 Let's add Brian. 1598 01:14:22,690 --> 01:14:25,120 And then I'll go ahead and add myself lastly. 1599 01:14:25,120 --> 01:14:29,680 So the array is of size four, but I count from zero on up through three. 1600 01:14:29,680 --> 01:14:32,080 And now just for demonstration's sake, let's go ahead 1601 01:14:32,080 --> 01:14:33,790 and print out, say, Emma's name. 1602 01:14:33,790 --> 01:14:37,360 So if I want to print out Emma's name, the type of variable in which she 1603 01:14:37,360 --> 01:14:39,121 is stored, is what? 1604 01:14:39,121 --> 01:14:41,174 What is the type that I want to print?
1605 01:14:41,174 --> 01:14:41,990 String. 1606 01:14:41,990 --> 01:14:43,837 So that's percent s, just like last week. 1607 01:14:43,837 --> 01:14:45,670 And I'm going to go ahead and put a backslash n. 1608 01:14:45,670 --> 01:14:49,720 And if I want to print Emma's name, what do I type here 1609 01:14:49,720 --> 01:14:52,052 to plug into that placeholder? 1610 01:14:52,052 --> 01:14:53,095 AUDIENCE: [INAUDIBLE] 1611 01:14:53,095 --> 01:14:54,470 DAVID MALAN: Names bracket zero. 1612 01:14:54,470 --> 01:14:56,540 It's a little bad that I'm hard coding it here, 1613 01:14:56,540 --> 01:14:59,560 but again, I'm just demonstrating how this all works for now. 1614 01:14:59,560 --> 01:15:01,090 Let me go ahead and save that. 1615 01:15:01,090 --> 01:15:03,400 Let me do make names. 1616 01:15:03,400 --> 01:15:04,570 Bit of an error here. 1617 01:15:04,570 --> 01:15:05,500 What did I do wrong? 1618 01:15:05,500 --> 01:15:08,170 Oh my god, all of this is wrong. 1619 01:15:08,170 --> 01:15:11,274 Does anyone see it yet? 1620 01:15:11,274 --> 01:15:12,150 AUDIENCE: [INAUDIBLE] 1621 01:15:12,150 --> 01:15:14,025 DAVID MALAN: Yeah, I forgot the CS50 library. 1622 01:15:14,025 --> 01:15:16,920 So even though I'm not using get string, I am using string, 1623 01:15:16,920 --> 01:15:19,630 so I do need the CS50 library up here. 1624 01:15:19,630 --> 01:15:21,150 So let me go ahead and clear that. 1625 01:15:21,150 --> 01:15:22,500 Make names. 1626 01:15:22,500 --> 01:15:23,236 OK, better. 1627 01:15:23,236 --> 01:15:25,980 Dot slash names, and I should just see Emma's name. 1628 01:15:25,980 --> 01:15:28,080 But watch this, what I can do too. 1629 01:15:28,080 --> 01:15:31,830 I know that Emma's name is a string, and I now 1630 01:15:31,830 --> 01:15:36,550 know that a string is an array of characters, so I can also do this. 1631 01:15:36,550 --> 01:15:41,370 Let me go ahead and print out one, two, three, four characters, 1632 01:15:41,370 --> 01:15:42,700 and then a new line.
1633 01:15:42,700 --> 01:15:44,640 And the characters I'm going to print out 1634 01:15:44,640 --> 01:15:48,660 are going to be Emma's names, first character, 1635 01:15:48,660 --> 01:15:54,600 Emma's names, second character, Emma's names, third character, 1636 01:15:54,600 --> 01:15:58,500 and Emma's names, fourth character. 1637 01:15:58,500 --> 01:16:01,980 So you can have what's essentially a two-dimensional array, where 1638 01:16:01,980 --> 01:16:03,570 you have two sets of square brackets. 1639 01:16:03,570 --> 01:16:06,930 The first one indexes me into the array of names. 1640 01:16:06,930 --> 01:16:10,510 And to index into an array means go to a certain location in an array. 1641 01:16:10,510 --> 01:16:13,530 So names, bracket zero, so to speak. 1642 01:16:13,530 --> 01:16:18,930 This part here means go get Emma's name from the array of four names. 1643 01:16:18,930 --> 01:16:23,260 This square bracket after says within that string, 1644 01:16:23,260 --> 01:16:25,230 treat it as an array of characters and get 1645 01:16:25,230 --> 01:16:28,980 the zeroth character, the first character, which is hopefully e 1646 01:16:28,980 --> 01:16:31,940 and an m and an m and then a. 1647 01:16:31,940 --> 01:16:34,290 So I'm going to go ahead and save this file now. 1648 01:16:34,290 --> 01:16:35,870 Make names again. 1649 01:16:35,870 --> 01:16:41,190 It compiled, dot slash names, and voila, Emma, Emma, I see twice. 1650 01:16:41,190 --> 01:16:44,730 Now, I'm never again going to print any string like this. 1651 01:16:44,730 --> 01:16:48,060 This is just ridiculous, plus I had to know in advance how long her name is. 1652 01:16:48,060 --> 01:16:51,360 However, it is equivalent to printing the string itself. 
1653 01:16:51,360 --> 01:16:54,240 It's just C and printf knows when you use 1654 01:16:54,240 --> 01:16:56,550 percent s and you pass on the name of a variable, 1655 01:16:56,550 --> 01:17:00,090 all printf is probably doing under the hood is some kind of loop 1656 01:17:00,090 --> 01:17:03,450 and it's iterating over your string from the first character and it's checking, 1657 01:17:03,450 --> 01:17:04,680 is this the null character? 1658 01:17:04,680 --> 01:17:05,760 If not, print it. 1659 01:17:05,760 --> 01:17:06,930 Is this the null character? 1660 01:17:06,930 --> 01:17:07,672 If not, print it. 1661 01:17:07,672 --> 01:17:10,130 If this is the null character-- is this the null character? 1662 01:17:10,130 --> 01:17:10,980 If not, print it. 1663 01:17:10,980 --> 01:17:18,300 And that's how we get, E-M-M-A stop, because printf, in this line 12, 1664 01:17:18,300 --> 01:17:24,510 presumably noticed, oh, wait a minute, the fifth byte in Emma's names zero 1665 01:17:24,510 --> 01:17:29,143 array is backslash zero, or all eight bits as zero. 1666 01:17:29,143 --> 01:17:29,643 Yeah? 1667 01:17:29,643 --> 01:17:32,160 AUDIENCE: That's just part of [INAUDIBLE] 1668 01:17:32,160 --> 01:17:35,160 DAVID MALAN: That is all part of the underneath the hood stuff of printf 1669 01:17:35,160 --> 01:17:38,740 and it's what humans decided decades ago with C how strings would work. 1670 01:17:38,740 --> 01:17:40,740 They could have come up with a different system, 1671 01:17:40,740 --> 01:17:44,250 but this is the system that they decided to use. 1672 01:17:44,250 --> 01:17:45,210 Other questions? 1673 01:17:45,210 --> 01:17:45,941 Yeah? 1674 01:17:45,941 --> 01:17:49,869 AUDIENCE: [INAUDIBLE] 1675 01:17:49,869 --> 01:17:58,710 1676 01:17:58,710 --> 01:18:00,130 DAVID MALAN: I didn't go further. 1677 01:18:00,130 --> 01:18:05,400 So I deliberately did not touch bracket four, even though it's there. 1678 01:18:05,400 --> 01:18:06,570 But I can try to print this. 
1679 01:18:06,570 --> 01:18:07,130 Let's see. 1680 01:18:07,130 --> 01:18:09,700 So let me go ahead and change this program real quick. 1681 01:18:09,700 --> 01:18:12,720 I'm going to go ahead and print out percent C a fifth time. 1682 01:18:12,720 --> 01:18:18,030 And let's go ahead and see if we can see Emma's null terminating character 1683 01:18:18,030 --> 01:18:22,500 at location four, which is her fifth location, so after the E-M-M-A. 1684 01:18:22,500 --> 01:18:23,820 Let me save that. 1685 01:18:23,820 --> 01:18:27,920 Make names, dot slash names, Emma Emma. 1686 01:18:27,920 --> 01:18:28,920 So I don't see it there. 1687 01:18:28,920 --> 01:18:29,670 But you know what? 1688 01:18:29,670 --> 01:18:32,482 Let me try changing this last one just for kicks to percent i. 1689 01:18:32,482 --> 01:18:34,440 And again, this is where printf is your friend. 1690 01:18:34,440 --> 01:18:36,600 You can use it powerfully to see what's going on. 1691 01:18:36,600 --> 01:18:38,490 Or we could whip out debug 50. 1692 01:18:38,490 --> 01:18:42,150 Let me go ahead and make names, dot slash names. 1693 01:18:42,150 --> 01:18:43,835 And voila, there's the zero. 1694 01:18:43,835 --> 01:18:45,960 I'm printing it literally as an int just to see it. 1695 01:18:45,960 --> 01:18:47,627 I would never do this in the real world. 1696 01:18:47,627 --> 01:18:49,260 But it's indeed there. 1697 01:18:49,260 --> 01:18:51,800 And now, this doesn't often work, but just for kicks-- 1698 01:18:51,800 --> 01:18:53,040 I'm getting a little crazy-- 1699 01:18:53,040 --> 01:18:55,650 suppose that I want to look well past Emma's name 1700 01:18:55,650 --> 01:18:59,310 to like location 400, like let's start poking around in the computer's memory, 1701 01:18:59,310 --> 01:19:00,510 one of those other boxes. 1702 01:19:00,510 --> 01:19:03,360 Make names, dot slash names. 
1703 01:19:03,360 --> 01:19:06,420 OK, there's a negative three down there as well, or technically 1704 01:19:06,420 --> 01:19:07,530 a hyphen and then a three. 1705 01:19:07,530 --> 01:19:09,780 So we'll come back to this in a couple of weeks' time. 1706 01:19:09,780 --> 01:19:13,560 We can actually start hacking around and looking around my computer's memory 1707 01:19:13,560 --> 01:19:18,930 at any location, because it's just numbers of boxes on the screen. 1708 01:19:18,930 --> 01:19:19,671 Yeah? 1709 01:19:19,671 --> 01:19:22,500 AUDIENCE: Is there any limit to the length of the string? 1710 01:19:22,500 --> 01:19:25,020 DAVID MALAN: Is there any limit to the length of the string? 1711 01:19:25,020 --> 01:19:30,440 Short answer-- yes, the amount of memory that the computer has. 1712 01:19:30,440 --> 01:19:32,140 So like 2 billion, 4 billion-- 1713 01:19:32,140 --> 01:19:33,915 it's long. 1714 01:19:33,915 --> 01:19:37,153 AUDIENCE: What happens if you try to type in [INAUDIBLE] 1715 01:19:37,153 --> 01:19:38,570 DAVID MALAN: Really good question. 1716 01:19:38,570 --> 01:19:40,240 What happens if you try to type that in hypothetically? 1717 01:19:40,240 --> 01:19:41,500 It depends on the function you use. 1718 01:19:41,500 --> 01:19:43,500 Let me come back to that in like two weeks' time. 1719 01:19:43,500 --> 01:19:45,250 Get string will not crash. 1720 01:19:45,250 --> 01:19:48,640 Other C functions will crash, if you give them more input than they expect, 1721 01:19:48,640 --> 01:19:50,950 and we'll come back to the reasons why. 1722 01:19:50,950 --> 01:19:53,642 So what's actually going on underneath this hood, then, 1723 01:19:53,642 --> 01:19:54,850 if we have these four names-- 1724 01:19:54,850 --> 01:19:56,370 Emma, Rodrigo, Brian, and David. 1725 01:19:56,370 --> 01:19:59,530 Well, if we consider our memory again, we know that Emma's up at this first 1726 01:19:59,530 --> 01:20:03,340 location, E-M-M-A, followed by this null terminating character.
1727 01:20:03,340 --> 01:20:06,190 But if the second name we stored in a variable was Rodrigo, 1728 01:20:06,190 --> 01:20:09,610 turns out he's going to end up sort of back to back with that memory as well. 1729 01:20:09,610 --> 01:20:12,220 And again, it's wrapping only because this is an artist's rendition of what 1730 01:20:12,220 --> 01:20:12,970 memory looks like. 1731 01:20:12,970 --> 01:20:15,400 There's no notion of left, right, up, or down in RAM. 1732 01:20:15,400 --> 01:20:20,380 But he is R-O-D-R-I-G-O, and his null terminating character there. 1733 01:20:20,380 --> 01:20:21,460 Brian might end up there. 1734 01:20:21,460 --> 01:20:22,750 I might end up after it. 1735 01:20:22,750 --> 01:20:25,780 And this is what's really going on underneath the hood of your computer. 1736 01:20:25,780 --> 01:20:28,030 Each of these values isn't technically a character. 1737 01:20:28,030 --> 01:20:29,290 It's technically a number. 1738 01:20:29,290 --> 01:20:30,790 And frankly, it's not even a number. 1739 01:20:30,790 --> 01:20:32,890 It's eight bits at a time. 1740 01:20:32,890 --> 01:20:35,800 But again, we don't have to worry about that level of detail now 1741 01:20:35,800 --> 01:20:38,660 that we're operating at this level of abstraction. 1742 01:20:38,660 --> 01:20:40,917 And I put up the wrong code a moment ago. 1743 01:20:40,917 --> 01:20:43,750 This is the code that I actually implemented using an array from the 1744 01:20:43,750 --> 01:20:47,470 get go, as opposed to an actual-- 1745 01:20:47,470 --> 01:20:49,660 as opposed to four separate variables. 
1746 01:20:49,660 --> 01:20:52,540 So just to highlight, then, what's going on, per the example I just 1747 01:20:52,540 --> 01:20:56,590 did with printing out Emma's characters, if this is a variable called names, 1748 01:20:56,590 --> 01:21:01,360 and there's four names in it, zero, one, two, three, 1749 01:21:01,360 --> 01:21:05,800 you can think of every character as being kind of addressable 1750 01:21:05,800 --> 01:21:07,580 using square bracket notation. 1751 01:21:07,580 --> 01:21:10,870 The first set of square brackets picks the name in question. 1752 01:21:10,870 --> 01:21:14,230 The second set of square brackets picks the character within the name. 1753 01:21:14,230 --> 01:21:17,740 So e is the first character, so that's zero. m is the next one, so that's one. 1754 01:21:17,740 --> 01:21:21,850 m is the third, so that's two. a Is the fourth, and so that's three. 1755 01:21:21,850 --> 01:21:26,183 And then with Rodrigo, he's at names one, and his r is in brackets zero. 1756 01:21:26,183 --> 01:21:28,100 So again, we're really getting into the weeds. 1757 01:21:28,100 --> 01:21:31,100 And this is not what programming ultimately is, but this is just to say, 1758 01:21:31,100 --> 01:21:34,630 there's no magic when you use printf and get string and get int, and so forth. 1759 01:21:34,630 --> 01:21:40,390 All that's going on underneath the hood is manipulation of values like these. 1760 01:21:40,390 --> 01:21:44,230 So let's now see what a string really is and we'll ultimately conclude today 1761 01:21:44,230 --> 01:21:46,030 with some domain specific problems. 
1762 01:21:46,030 --> 01:21:48,130 Indeed with problem set two will you be exploring 1763 01:21:48,130 --> 01:21:50,830 a number of real-world problems, like assessing just how 1764 01:21:50,830 --> 01:21:54,607 readable some text is, what grade level might a certain book or another be, 1765 01:21:54,607 --> 01:21:56,690 and two, implementing some notion of cryptography, 1766 01:21:56,690 --> 01:21:58,330 the art of scrambling information. 1767 01:21:58,330 --> 01:22:00,610 And suffice it to say, in both of those domains, 1768 01:22:00,610 --> 01:22:03,490 reading texts and also cryptography, strings 1769 01:22:03,490 --> 01:22:05,840 are going to be the ingredient that we need. 1770 01:22:05,840 --> 01:22:09,550 So let's take a look now at a few examples involving 1771 01:22:09,550 --> 01:22:11,330 more and more strings. 1772 01:22:11,330 --> 01:22:16,210 I'm going to go ahead and create a program here called string dot c, 1773 01:22:16,210 --> 01:22:17,710 just so I can play with this notion. 1774 01:22:17,710 --> 01:22:20,470 I'm going to go ahead and include CS50 dot h. 1775 01:22:20,470 --> 01:22:24,550 I'm going to go ahead and include standard Io dot h. 1776 01:22:24,550 --> 01:22:25,990 I'll fix this up here-- 1777 01:22:25,990 --> 01:22:26,993 int main void. 1778 01:22:26,993 --> 01:22:30,160 And now let me go ahead and just play around with some strings for a moment. 1779 01:22:30,160 --> 01:22:32,720 Let me go ahead and get myself a string from the user. 1780 01:22:32,720 --> 01:22:36,940 So get string and ask for their input. 1781 01:22:36,940 --> 01:22:38,890 Trying to type too fast now. 1782 01:22:38,890 --> 01:22:41,860 So let me go ahead and ask the user for their input via get string, 1783 01:22:41,860 --> 01:22:44,260 and store the answer in a variable called s. 1784 01:22:44,260 --> 01:22:46,030 Then let me go ahead and preemptively say 1785 01:22:46,030 --> 01:22:48,610 that their output is going to be the following. 
1786 01:22:48,610 --> 01:22:51,760 And what I want to do is just print out the individual characters 1787 01:22:51,760 --> 01:22:53,060 in that string. 1788 01:22:53,060 --> 01:22:57,400 So for int i gets zero, I don't know what my condition is yet, 1789 01:22:57,400 --> 01:23:00,180 so I'll come back to that-- i plus plus. 1790 01:23:00,180 --> 01:23:03,730 I'm going to go ahead and print out the individual character 1791 01:23:03,730 --> 01:23:06,680 at the i-th location in that string, and I'm 1792 01:23:06,680 --> 01:23:08,680 going to end this whole program with a new line. 1793 01:23:08,680 --> 01:23:12,490 So I still have a blank to fill in, these question marks, but I ultimately 1794 01:23:12,490 --> 01:23:15,490 just want to take as input a string, and then print it out as output, 1795 01:23:15,490 --> 01:23:18,370 but not using percent s. 1796 01:23:18,370 --> 01:23:21,670 I'm going to use percent c, one character at a time. 1797 01:23:21,670 --> 01:23:26,080 So my question mark here is what question could I ask on every iteration 1798 01:23:26,080 --> 01:23:30,660 before deciding whether or not I've printed every character in the string? 1799 01:23:30,660 --> 01:23:31,160 Yeah? 1800 01:23:31,160 --> 01:23:32,480 AUDIENCE: Length of the string. 1801 01:23:32,480 --> 01:23:33,730 DAVID MALAN: Length of string. 1802 01:23:33,730 --> 01:23:36,105 So I could say while i is less than the length of string. 1803 01:23:36,105 --> 01:23:36,913 What else? 1804 01:23:36,913 --> 01:23:37,720 AUDIENCE: The null character. 1805 01:23:37,720 --> 01:23:39,070 DAVID MALAN: Or if it's equal to the null character. 1806 01:23:39,070 --> 01:23:40,070 Let's try both of these. 1807 01:23:40,070 --> 01:23:42,790 So if I know how strings are represented, 1808 01:23:42,790 --> 01:23:47,718 I can just say while s bracket i does not equal backslash zero.
1809 01:23:47,718 --> 01:23:50,260 Now this is a bit of a funky syntax, because even though it's 1810 01:23:50,260 --> 01:23:53,290 two characters, I still have to use single quotes, 1811 01:23:53,290 --> 01:23:55,990 because those two characters, just like backslash n, 1812 01:23:55,990 --> 01:23:58,960 represent one idea, not two literal characters. 1813 01:23:58,960 --> 01:24:01,780 But this is a literal translation of what we just discussed. 1814 01:24:01,780 --> 01:24:05,050 Initialize i to zero, increment it on every iteration, 1815 01:24:05,050 --> 01:24:09,220 but every time you do that, check: does the i-th character in the string 1816 01:24:09,220 --> 01:24:13,300 equal the special null character, and if so, that's it for the loop. 1817 01:24:13,300 --> 01:24:15,320 We only want to iterate through this for loop 1818 01:24:15,320 --> 01:24:18,520 so long as it's not that special backslash zero. 1819 01:24:18,520 --> 01:24:22,810 So if I go ahead now and save this file and make string and run 1820 01:24:22,810 --> 01:24:27,220 dot slash string and my input for instance is Emma, Enter, 1821 01:24:27,220 --> 01:24:29,260 I'm going to see literally her name back. 1822 01:24:29,260 --> 01:24:33,400 So this is kind of my way of reimplementing the idea of percent s, 1823 01:24:33,400 --> 01:24:34,813 but using only percent c. 1824 01:24:34,813 --> 01:24:35,980 But I liked your suggestion. 1825 01:24:35,980 --> 01:24:37,230 Why don't we use the string-- 1826 01:24:37,230 --> 01:24:40,988 the length of the string, rather than this low-level implementation detail? 1827 01:24:40,988 --> 01:24:42,780 It would be really nice if I could just say 1828 01:24:42,780 --> 01:24:48,690 while i is less than the length of s-- 1829 01:24:48,690 --> 01:24:50,440 so how do I express this?
1830 01:24:50,440 --> 01:24:55,560 Well, it turns out there's another file called 1831 01:24:55,560 --> 01:24:59,610 string dot h inside of which are a bunch of string-related functions 1832 01:24:59,610 --> 01:25:00,900 that I might like to use. 1833 01:25:00,900 --> 01:25:04,950 One of those is a function called strlen, for short, 1834 01:25:04,950 --> 01:25:07,150 which means the length of a string. 1835 01:25:07,150 --> 01:25:09,130 So I can take your suggestion and just say, 1836 01:25:09,130 --> 01:25:10,838 I don't care how a string is implemented. 1837 01:25:10,838 --> 01:25:12,755 I mean, my god, the whole point of programming 1838 01:25:12,755 --> 01:25:15,840 ultimately is to abstract away those lower level implementation details. 1839 01:25:15,840 --> 01:25:18,600 Let me just ask the computer what is your length, so 1840 01:25:18,600 --> 01:25:20,070 that I don't count past it. 1841 01:25:20,070 --> 01:25:24,540 Let me go ahead now and make string, dot slash string. 1842 01:25:24,540 --> 01:25:26,010 Let's type in Emma again. 1843 01:25:26,010 --> 01:25:28,030 And the output is the same. 1844 01:25:28,030 --> 01:25:33,090 But now, this is correct perhaps, but I argue it's not very well-designed. 1845 01:25:33,090 --> 01:25:36,300 I'm being a little inefficient and I bet I can do this better. 1846 01:25:36,300 --> 01:25:37,280 What do you see? 1847 01:25:37,280 --> 01:25:38,423 AUDIENCE: [INAUDIBLE] 1848 01:25:38,423 --> 01:25:39,340 DAVID MALAN: Go ahead. 1849 01:25:39,340 --> 01:25:43,290 AUDIENCE: [INAUDIBLE] 1850 01:25:43,290 --> 01:25:45,300 DAVID MALAN: Yeah, exactly. 1851 01:25:45,300 --> 01:25:47,910 Remember in a for loop that the condition in the middle, 1852 01:25:47,910 --> 01:25:50,790 in between the semicolons, is a question, a Boolean expression, 1853 01:25:50,790 --> 01:25:53,620 that you ask again and again and again. 1854 01:25:53,620 --> 01:25:56,940 And it turns out that calling a function is not without cost.
1855 01:25:56,940 --> 01:25:59,770 It might take a split second, because computers are super fast, 1856 01:25:59,770 --> 01:26:04,140 but why are you asking the same question again and again and again and again? 1857 01:26:04,140 --> 01:26:07,080 The answer is never going to change, because Emma's name is not 1858 01:26:07,080 --> 01:26:09,630 growing or shrinking, it's just Emma. 1859 01:26:09,630 --> 01:26:11,350 So I can solve this in a couple of ways. 1860 01:26:11,350 --> 01:26:12,642 I could do something like this. 1861 01:26:12,642 --> 01:26:17,130 int n gets strlen of s, and then I could just plug in n. 1862 01:26:17,130 --> 01:26:20,460 My program is just as correct, but it's a little better designed 1863 01:26:20,460 --> 01:26:23,250 now because I'm asking the question of string length 1864 01:26:23,250 --> 01:26:27,333 once, remembering the answer, and then using that answer again and again. 1865 01:26:27,333 --> 01:26:30,000 Now, yes, technically, now I'm wasting some space, because I now 1866 01:26:30,000 --> 01:26:31,440 have another variable called n. 1867 01:26:31,440 --> 01:26:32,530 So something's gotta give. 1868 01:26:32,530 --> 01:26:35,220 I'm going to use more space or maybe more time, 1869 01:26:35,220 --> 01:26:37,802 but that's a theme we'll come back to next week especially. 1870 01:26:37,802 --> 01:26:40,260 But it turns out there's some special syntax for this, too. 1871 01:26:40,260 --> 01:26:43,980 If you know in a loop that you want to ask a question once and remember 1872 01:26:43,980 --> 01:26:48,210 the answer, you can actually just say this and do this all in one line. 1873 01:26:48,210 --> 01:26:51,900 It's no better or worse, it's just a little more succinct, stylistically. 1874 01:26:51,900 --> 01:26:55,740 This has the same effect of initializing i to zero, and n 1875 01:26:55,740 --> 01:27:00,190 to the length of string, and then never again asking that question. 1876 01:27:00,190 --> 01:27:00,990 So I can save this.
1877 01:27:00,990 --> 01:27:03,120 I can make string. 1878 01:27:03,120 --> 01:27:05,100 I can then do dot slash string, and I'm going 1879 01:27:05,100 --> 01:27:07,560 to see hopefully, Emma, Emma again. 1880 01:27:07,560 --> 01:27:10,960 So a third and final version of this idea, but a little better designed. 1881 01:27:10,960 --> 01:27:11,996 Yeah? 1882 01:27:11,996 --> 01:27:15,670 AUDIENCE: [INAUDIBLE] 1883 01:27:15,670 --> 01:27:15,672 1884 01:27:15,672 --> 01:27:17,130 DAVID MALAN: In this case, it's OK. 1885 01:27:17,130 --> 01:27:18,420 This would be a common convention. 1886 01:27:18,420 --> 01:27:20,280 When you are doing something especially to minimize 1887 01:27:20,280 --> 01:27:22,710 the number of questions you're asking, this is OK, so long 1888 01:27:22,710 --> 01:27:23,910 as it's still pretty tight. 1889 01:27:23,910 --> 01:27:26,590 But there, too, reasonable people might disagree. 1890 01:27:26,590 --> 01:27:27,230 Yeah? 1891 01:27:27,230 --> 01:27:29,703 AUDIENCE: Is the prototype string in library [INAUDIBLE]? 1892 01:27:29,703 --> 01:27:31,120 DAVID MALAN: Really good question. 1893 01:27:31,120 --> 01:27:34,902 The prototype for strlen, its declaration, is in string dot h. 1894 01:27:34,902 --> 01:27:36,860 I would get one of those cryptic error messages 1895 01:27:36,860 --> 01:27:40,400 if I forgot to include string dot h, because clang would not 1896 01:27:40,400 --> 01:27:43,910 know that strlen actually exists. 1897 01:27:43,910 --> 01:27:46,490 Let me try another example here and see what kind of power 1898 01:27:46,490 --> 01:27:51,140 we have now that we actually are controlling-- 1899 01:27:51,140 --> 01:27:53,690 now that we actually understand what a string actually is. 1900 01:27:53,690 --> 01:27:55,482 Let me go ahead and whip this up real fast. 1901 01:27:55,482 --> 01:27:58,190 So up here in my program, called uppercase dot c, 1902 01:27:58,190 --> 01:28:00,350 let me give myself the CS50 library.
1903 01:28:00,350 --> 01:28:02,270 Let me give myself standard io dot h. 1904 01:28:02,270 --> 01:28:06,140 And now let me give myself string dot h, just so I can use strlen. 1905 01:28:06,140 --> 01:28:09,440 Let me give myself the name of a function main. 1906 01:28:09,440 --> 01:28:11,630 And then in here, let's do the same thing. 1907 01:28:11,630 --> 01:28:13,868 String s gets get string. 1908 01:28:13,868 --> 01:28:16,160 But this time, let me just ask the human for the string 1909 01:28:16,160 --> 01:28:18,290 before I'm going to do something to it. 1910 01:28:18,290 --> 01:28:24,230 Then I'm going to go ahead and say after I want the following to happen. 1911 01:28:24,230 --> 01:28:25,550 And I'm going to do this-- 1912 01:28:25,550 --> 01:28:31,610 for int i gets zero, n equals strlen of s, as before. 1913 01:28:31,610 --> 01:28:35,390 Do this so long as i is less than n, and on each iteration, i plus plus. 1914 01:28:35,390 --> 01:28:36,740 So copy-paste from before. 1915 01:28:36,740 --> 01:28:38,930 I just retyped out the same thing. 1916 01:28:38,930 --> 01:28:42,170 Now let me go ahead and in this for loop, let me change 1917 01:28:42,170 --> 01:28:45,410 this string, whatever it is, all to uppercase. 1918 01:28:45,410 --> 01:28:46,590 So how might I do this? 1919 01:28:46,590 --> 01:28:52,280 So let me go ahead and say, well, if the current character at s bracket i 1920 01:28:52,280 --> 01:28:58,700 is greater than or equal to lowercase a, and that same character is less than 1921 01:28:58,700 --> 01:29:00,187 or equal to lowercase z. 1922 01:29:00,187 --> 01:29:03,020 So I'm using some week one style stuff, even though we didn't really 1923 01:29:03,020 --> 01:29:04,850 use this much syntax last week. 1924 01:29:04,850 --> 01:29:06,440 I'm just asking a simple question.
1925 01:29:06,440 --> 01:29:11,480 Is the i-th character in s greater than or equal to lowercase a and-- 1926 01:29:11,480 --> 01:29:13,370 double ampersand means and-- 1927 01:29:13,370 --> 01:29:16,570 logically, is that character less than or equal to z? 1928 01:29:16,570 --> 01:29:19,940 So is it a, b, c, all the way through z-- is it a lowercase letter? 1929 01:29:19,940 --> 01:29:24,120 If so, I want to do something like convert to uppercase. 1930 01:29:24,120 --> 01:29:26,180 But we'll come back to that in just a moment. 1931 01:29:26,180 --> 01:29:30,590 Else what do I want to do if the character is not lowercase 1932 01:29:30,590 --> 01:29:33,096 and my goal is to uppercase the whole input? 1933 01:29:33,096 --> 01:29:34,250 AUDIENCE: [INAUDIBLE] 1934 01:29:34,250 --> 01:29:35,300 DAVID MALAN: Yeah, just leave it alone. 1935 01:29:35,300 --> 01:29:35,690 So you know what? 1936 01:29:35,690 --> 01:29:37,773 I'm just-- fine, I'm just going to leave it alone. 1937 01:29:37,773 --> 01:29:40,842 I'm going to print it back out, just as I would with printf like that. 1938 01:29:40,842 --> 01:29:42,800 So now even though this is not obvious from the 1939 01:29:42,800 --> 01:29:45,830 get go how I'm going to solve this, I've now left myself 1940 01:29:45,830 --> 01:29:47,540 a placeholder, pseudocode if you will. 1941 01:29:47,540 --> 01:29:49,830 I just now need to answer this question. 1942 01:29:49,830 --> 01:29:56,602 Well, it turns out a popular place to go for this answer would be AsciiChart.com 1943 01:29:56,602 --> 01:29:58,310 And there's different ways to solve this, 1944 01:29:58,310 --> 01:30:00,060 but this is just a free website that shows 1945 01:30:00,060 --> 01:30:02,480 us all of the decimal numbers that correspond to letters. 1946 01:30:02,480 --> 01:30:08,460 And recall from week zero, 65 is a, 66 is b, and so forth. 1947 01:30:08,460 --> 01:30:11,090 Notice that 65 is-- capital A is 65. 1948 01:30:11,090 --> 01:30:12,736 What is lowercase a? 
1949 01:30:12,736 --> 01:30:14,640 AUDIENCE: [INAUDIBLE] 1950 01:30:14,640 --> 01:30:16,180 DAVID MALAN: 97. 1951 01:30:16,180 --> 01:30:22,330 And then look-- 66 to 98, 67 to 99, 68 to 100-- 1952 01:30:22,330 --> 01:30:24,690 what's the difference between these? 1953 01:30:24,690 --> 01:30:25,720 Yeah, it's 32. 1954 01:30:25,720 --> 01:30:28,510 If you add 32 to 65, you get 97. 1955 01:30:28,510 --> 01:30:31,640 If you add 32 to 66, you get 98, and so forth. 1956 01:30:31,640 --> 01:30:34,870 So it seems that the lowercase letters, wonderfully conveniently, 1957 01:30:34,870 --> 01:30:39,880 are all 32 values away from the uppercase letters. 1958 01:30:39,880 --> 01:30:42,460 Or conversely, if I have a lowercase letter, 1959 01:30:42,460 --> 01:30:45,460 logically, what could I do to it in order 1960 01:30:45,460 --> 01:30:49,960 to convert it from uppercase to lowercase-- 1961 01:30:49,960 --> 01:30:53,180 Sorry-- from lowercase to uppercase? 1962 01:30:53,180 --> 01:30:54,240 Subtract, right? 1963 01:30:54,240 --> 01:30:58,300 So why don't I try printing out printf, percent c, 1964 01:30:58,300 --> 01:31:01,450 then go ahead and print out not the actual character, 1965 01:31:01,450 --> 01:31:03,193 but just subtract 32 from it. 1966 01:31:03,193 --> 01:31:05,110 I know these are integers underneath the hood. 1967 01:31:05,110 --> 01:31:07,000 And frankly, if I want to be really explicit, 1968 01:31:07,000 --> 01:31:10,960 I can convert it to an integer, the Ascii code, and then subtract 32, 1969 01:31:10,960 --> 01:31:14,120 but that can be done implicitly-- we saw earlier. 1970 01:31:14,120 --> 01:31:17,740 So let me go ahead and save this file and run uppercase, 1971 01:31:17,740 --> 01:31:20,290 make uppercase, dot slash uppercase. 1972 01:31:20,290 --> 01:31:24,280 And this time, let me write Emma's name in all lowercase, and voila, 1973 01:31:24,280 --> 01:31:24,940 I see it here. 1974 01:31:24,940 --> 01:31:25,898 Now it's a little ugly. 
1975 01:31:25,898 --> 01:31:26,705 What did I forget? 1976 01:31:26,705 --> 01:31:27,580 AUDIENCE: [INAUDIBLE] 1977 01:31:27,580 --> 01:31:28,310 DAVID MALAN: A new line. 1978 01:31:28,310 --> 01:31:31,143 So I'm going to go ahead and do that at the very end of the program, 1979 01:31:31,143 --> 01:31:33,100 so I get it only once at the very end. 1980 01:31:33,100 --> 01:31:38,140 Let me rerun-- make uppercase, dot slash uppercase, Emma in lowercase. 1981 01:31:38,140 --> 01:31:39,980 Voila, I've got it uppercase. 1982 01:31:39,980 --> 01:31:42,993 So this is like a very low-level implementation 1983 01:31:42,993 --> 01:31:44,660 of the notion of upper casing something. 1984 01:31:44,660 --> 01:31:46,780 So if you've ever done this in Google Docs or Microsoft Word-- 1985 01:31:46,780 --> 01:31:48,370 convert this all to uppercase for whatever 1986 01:31:48,370 --> 01:31:51,010 reason, that's all the computer is doing underneath the hood-- 1987 01:31:51,010 --> 01:31:54,850 iterating over the characters and presumably subtracting off of that. 1988 01:31:54,850 --> 01:31:57,970 But this, too, is at a low-level detail that we probably 1989 01:31:57,970 --> 01:31:59,830 don't want to have to think about too much, 1990 01:31:59,830 --> 01:32:02,872 and so it turns out there's functions that can solve this problem for us. 1991 01:32:02,872 --> 01:32:06,320 And you might have discovered these last week or used them yourself. 1992 01:32:06,320 --> 01:32:10,210 But on CS50's website is an example of what are called manual pages. 1993 01:32:10,210 --> 01:32:13,840 And if I go ahead and pull this up on the course's website, 1994 01:32:13,840 --> 01:32:17,560 we'll see a tool that adds the following. 
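The by-hand uppercasing walked through above can be sketched in C roughly as follows. This is a minimal sketch and not the exact code typed in lecture: the function names upper_by_hand and print_upper are made up here for illustration, and a plain string parameter stands in for the input the lecture gets from get_string.

```c
#include <stdio.h>
#include <string.h>

// Uppercase one character by hand, using the ASCII offset from the lecture:
// each lowercase letter sits exactly 32 above its uppercase counterpart.
// (upper_by_hand is a hypothetical name, not from the lecture.)
char upper_by_hand(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        return c - 32;
    }
    // Not a lowercase letter: leave it alone
    return c;
}

// Print s in uppercase one character at a time, then a single newline
// at the very end, as in the lecture's fix.
void print_upper(const char *s)
{
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c", upper_by_hand(s[i]));
    }
    printf("\n");
}
```

Calling print_upper("emma") from main would print the name in capitals followed by one newline.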
1995 01:32:17,560 --> 01:32:25,070 If I go to the course's web page and click on manual pages, 1996 01:32:25,070 --> 01:32:27,190 you'll see the CS50 programmer's manual, which 1997 01:32:27,190 --> 01:32:29,560 is a simplified version of a very popular tool that's 1998 01:32:29,560 --> 01:32:32,770 available on most computer systems that support programming. 1999 01:32:32,770 --> 01:32:36,430 And suppose I want to do something like convert something to uppercase, 2000 01:32:36,430 --> 01:32:37,730 I can search up there. 2001 01:32:37,730 --> 01:32:39,820 And notice, there's a few functions available in C 2002 01:32:39,820 --> 01:32:41,020 that relate to uppercase. 2003 01:32:41,020 --> 01:32:44,470 Is upper, which asks a question, to lower and to upper. 2004 01:32:44,470 --> 01:32:46,200 I'm going to go ahead and use to upper. 2005 01:32:46,200 --> 01:32:47,860 I'm going to go ahead and use to upper. 2006 01:32:47,860 --> 01:32:51,300 And if I click on this, I'll see essentially its documentation for it. 2007 01:32:51,300 --> 01:32:53,050 And it's a little cryptic at first glance. 2008 01:32:53,050 --> 01:32:55,060 But what you're seeing in the documentation 2009 01:32:55,060 --> 01:32:58,900 is its required header file and its prototype. 2010 01:32:58,900 --> 01:33:02,441 What file do I apparently need to include to use to upper? 2011 01:33:02,441 --> 01:33:03,316 AUDIENCE: [INAUDIBLE] 2012 01:33:03,316 --> 01:33:04,649 DAVID MALAN: Yeah, c type dot h. 2013 01:33:04,649 --> 01:33:06,399 I don't really know what else is in there, 2014 01:33:06,399 --> 01:33:08,410 but this is my hint that I should use that file. 2015 01:33:08,410 --> 01:33:11,080 And what kind of input does to upper take? 2016 01:33:11,080 --> 01:33:13,270 Well technically, it takes an int, for reasons that 2017 01:33:13,270 --> 01:33:14,800 are explained in the documentation.
2018 01:33:14,800 --> 01:33:17,110 But even if the documentation is not obvious, 2019 01:33:17,110 --> 01:33:19,880 it turns out it's actually pretty easy to use. 2020 01:33:19,880 --> 01:33:23,470 I'm going to go ahead and rip out most of this logic, 2021 01:33:23,470 --> 01:33:28,480 and I'm just going to do this-- printf, percent c, to upper, 2022 01:33:28,480 --> 01:33:31,180 s bracket i, semicolon. 2023 01:33:31,180 --> 01:33:35,650 And up here, I'm going to go ahead and include c type dot h, 2024 01:33:35,650 --> 01:33:37,660 because in reading the documentation, I realize 2025 01:33:37,660 --> 01:33:41,320 that oh, I can pass in any character to to upper, and if it's lowercase, 2026 01:33:41,320 --> 01:33:44,800 it's going to return in uppercase, and if it's not a lowercase letter, 2027 01:33:44,800 --> 01:33:47,450 it's just going to return it unchanged. 2028 01:33:47,450 --> 01:33:51,400 So if I save this file now, make uppercase, and then rerun 2029 01:33:51,400 --> 01:33:56,140 this program, this time typing in Emma's name again in lowercase, voila, 2030 01:33:56,140 --> 01:33:59,230 I've now used another helper function, something someone else wrote. 2031 01:33:59,230 --> 01:34:02,050 But you can imagine that all the person did 2032 01:34:02,050 --> 01:34:04,150 who wrote this function for us is what? 2033 01:34:04,150 --> 01:34:08,260 Like an if else, checking the Ascii mathematics to see 2034 01:34:08,260 --> 01:34:11,575 if the character is indeed lowercase. 2035 01:34:11,575 --> 01:34:14,270 Any questions then on this? 2036 01:34:14,270 --> 01:34:18,610 Again, now the goal is to move away from caring about 32 or the Ascii codes 2037 01:34:18,610 --> 01:34:21,520 and just using helper functions someone else wrote. 2038 01:34:21,520 --> 01:34:22,202 Yeah? 
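The library-based version described above might look something like this. It is a sketch under a few assumptions: the helper name uppercase is invented here, it changes the string in place rather than printing each character, and it casts to unsigned char before calling toupper, which is a common precaution and not something the lecture mentions.

```c
#include <ctype.h>
#include <string.h>

// Uppercase a string in place with toupper from ctype.h.
// toupper returns the uppercase version of a lowercase letter
// and returns any other character unchanged, so no if/else is needed.
// (uppercase is a hypothetical helper name, not from the lecture.)
void uppercase(char *s)
{
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        // The cast keeps the argument within the range toupper expects
        s[i] = (char) toupper((unsigned char) s[i]);
    }
}
```

Compared with the by-hand version, the 32 and the range check on 'a' through 'z' no longer appear in our own code.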
2039 01:34:22,202 --> 01:34:24,628 AUDIENCE: Why [INAUDIBLE] 2040 01:34:24,628 --> 01:34:26,170 DAVID MALAN: Why do you not need to-- 2041 01:34:26,170 --> 01:34:29,052 AUDIENCE: [INAUDIBLE] 2042 01:34:29,052 --> 01:34:30,010 DAVID MALAN: The type-- 2043 01:34:30,010 --> 01:34:31,900 Ah, why do you not need to declare the type of int. 2044 01:34:31,900 --> 01:34:32,440 I am. 2045 01:34:32,440 --> 01:34:35,440 This only works if it's the same type as i. 2046 01:34:35,440 --> 01:34:36,577 Good question. 2047 01:34:36,577 --> 01:34:39,410 So I get away with it because both i and n are meant to be integers. 2048 01:34:39,410 --> 01:34:40,090 Yeah? 2049 01:34:40,090 --> 01:34:44,010 AUDIENCE: [INAUDIBLE] 2050 01:34:44,010 --> 01:34:44,785 2051 01:34:44,785 --> 01:34:46,410 DAVID MALAN: Are there any limitations? 2052 01:34:46,410 --> 01:34:51,163 No, you may use any functions you want on CS50 problem sets, 2053 01:34:51,163 --> 01:34:52,830 whether or not we've used them in class. 2054 01:34:52,830 --> 01:34:54,913 That's certainly fine, unless otherwise specified, 2055 01:34:54,913 --> 01:34:56,710 which will rarely be the case. 2056 01:34:56,710 --> 01:34:58,023 So what else then can we do? 2057 01:34:58,023 --> 01:34:59,940 Well turns out, we've just empowered ourselves 2058 01:34:59,940 --> 01:35:01,890 with a couple of new features, one of which 2059 01:35:01,890 --> 01:35:04,140 is, again, called command line arguments. 2060 01:35:04,140 --> 01:35:05,850 We've seen these before. 2061 01:35:05,850 --> 01:35:09,420 What did I describe previously today and last week as a command line argument? 2062 01:35:09,420 --> 01:35:11,700 What was an example? 2063 01:35:11,700 --> 01:35:13,290 Anyone-- I heard here. 2064 01:35:13,290 --> 01:35:14,010 AUDIENCE: Dash o. 2065 01:35:14,010 --> 01:35:14,843 DAVID MALAN: Dash o. 
2066 01:35:14,843 --> 01:35:17,490 Remember that clang can have its default behavior, which 2067 01:35:17,490 --> 01:35:20,520 was a little annoying, whereby it outputs a file called a dot out, 2068 01:35:20,520 --> 01:35:25,170 overridden by saying dash o hello, or dash o anything, 2069 01:35:25,170 --> 01:35:28,920 to change the output to a file of your choice. 2070 01:35:28,920 --> 01:35:31,080 That was an example of a command line argument. 2071 01:35:31,080 --> 01:35:33,917 You literally typed it after the command, on a line, 2072 01:35:33,917 --> 01:35:36,750 and it's an argument in the sense that it's an input to the program. 2073 01:35:36,750 --> 01:35:38,640 So a command line argument, more generally, 2074 01:35:38,640 --> 01:35:43,110 is just one or more words that you type at the prompt after the program you 2075 01:35:43,110 --> 01:35:44,370 care about running. 2076 01:35:44,370 --> 01:35:46,110 So where are these germane here? 2077 01:35:46,110 --> 01:35:51,300 Well finally, can we now explain what a little more of this canonical program 2078 01:35:51,300 --> 01:35:52,110 is about. 2079 01:35:52,110 --> 01:35:55,470 We already discussed earlier today that includes standard Io dot h. 2080 01:35:55,470 --> 01:35:57,900 It just contains your prototypes for things like printf, 2081 01:35:57,900 --> 01:36:01,110 and that gets copied and pasted during pre processing into the file, 2082 01:36:01,110 --> 01:36:02,290 and so forth. 2083 01:36:02,290 --> 01:36:05,110 But what we've not explained yet, what void is here, 2084 01:36:05,110 --> 01:36:06,330 let alone what int is here. 2085 01:36:06,330 --> 01:36:10,080 We've just been copying and pasting this now for just over a week. 2086 01:36:10,080 --> 01:36:15,570 Well it turns out, that in C, you do not need to write only the word void inside 2087 01:36:15,570 --> 01:36:16,830 of those parentheses. 
2088 01:36:16,830 --> 01:36:21,000 You can also write, wonderfully, int arg c, string arg v, open bracket, 2089 01:36:21,000 --> 01:36:22,110 close bracket. 2090 01:36:22,110 --> 01:36:23,620 Now why is that compelling? 2091 01:36:23,620 --> 01:36:25,470 Well notice there's a pattern here, and it's 2092 01:36:25,470 --> 01:36:28,530 quite similar to my average function a moment ago. 2093 01:36:28,530 --> 01:36:30,750 It takes two arguments main, apparently. 2094 01:36:30,750 --> 01:36:34,360 One is an int, and one is what? 2095 01:36:34,360 --> 01:36:35,680 It's not a string, per se. 2096 01:36:35,680 --> 01:36:36,430 It's-- 2097 01:36:36,430 --> 01:36:36,910 AUDIENCE: [INAUDIBLE] 2098 01:36:36,910 --> 01:36:38,420 DAVID MALAN: --an array of strings. 2099 01:36:38,420 --> 01:36:40,033 Now arg v is a human convention. 2100 01:36:40,033 --> 01:36:41,950 It means argument vector, which is a fancy way 2101 01:36:41,950 --> 01:36:44,590 of saying an array of arguments. 2102 01:36:44,590 --> 01:36:47,920 And the way you know this is an array is by the fact that you have open bracket 2103 01:36:47,920 --> 01:36:48,820 closed bracket. 2104 01:36:48,820 --> 01:36:51,760 And it's an array of strings because to the left is the word string. 2105 01:36:51,760 --> 01:36:54,430 This is just an old-school integer called int arg 2106 01:36:54,430 --> 01:36:57,850 c, which stands for by convention, argument count. 2107 01:36:57,850 --> 01:37:00,700 However, we could call these arguments anything we want. 2108 01:37:00,700 --> 01:37:03,550 Humans for decades have just called them arg c and arg v, 2109 01:37:03,550 --> 01:37:06,670 just like my average function took in the length of an array 2110 01:37:06,670 --> 01:37:10,690 and the number of scores inside of it. 2111 01:37:10,690 --> 01:37:13,450 So what-- the actual scores inside of it. 2112 01:37:13,450 --> 01:37:15,380 So what can we do with this information? 
2113 01:37:15,380 --> 01:37:17,560 Well it turns out, we can now write programs 2114 01:37:17,560 --> 01:37:21,620 that take words from the human, not via get string, but at the actual command 2115 01:37:21,620 --> 01:37:22,120 prompt. 2116 01:37:22,120 --> 01:37:24,380 We can implement features, like clang has. 2117 01:37:24,380 --> 01:37:27,430 So let me go ahead and write a program called arg v in a file 2118 01:37:27,430 --> 01:37:28,900 called arg v dot c. 2119 01:37:28,900 --> 01:37:33,070 Let me go ahead include the CS50 library. 2120 01:37:33,070 --> 01:37:37,980 Let me go ahead and include standard Io dot h. 2121 01:37:37,980 --> 01:37:38,830 Voila. 2122 01:37:38,830 --> 01:37:43,000 Now let me go ahead and do int main not void, int arg 2123 01:37:43,000 --> 01:37:47,140 c, string arg v, open brackets. 2124 01:37:47,140 --> 01:37:50,350 So it's actually worse than it has been, but now it's useful. 2125 01:37:50,350 --> 01:37:51,130 We'll see. 2126 01:37:51,130 --> 01:37:53,180 And now I'm going to go ahead and do this. 2127 01:37:53,180 --> 01:37:59,050 Let me go ahead and say if arg c equals two, 2128 01:37:59,050 --> 01:38:02,630 that's going to mean that the human has typed two words at their prompt. 2129 01:38:02,630 --> 01:38:07,630 And I'm going to go ahead and say this, hello percent s, new line, 2130 01:38:07,630 --> 01:38:11,230 and then I'm going to plug in arg v bracket one, 2131 01:38:11,230 --> 01:38:15,160 for reasons we'll soon see, else if arg c does not equal two, 2132 01:38:15,160 --> 01:38:20,050 I'm just going to hard code this and say hello, world, backslash n. 2133 01:38:20,050 --> 01:38:21,118 So what am I doing? 
2134 01:38:21,118 --> 01:38:23,410 I'm trying to write a program that allows the human now 2135 01:38:23,410 --> 01:38:26,200 to write their name at the command prompt, 2136 01:38:26,200 --> 01:38:29,590 instead of waiting for the program to run and use get string [INAUDIBLE] 2137 01:38:29,590 --> 01:38:31,120 like a blinking prompt. 2138 01:38:31,120 --> 01:38:34,990 So what I can do now is this, make arg v. It compiles. 2139 01:38:34,990 --> 01:38:37,570 Dot slash arg v, Enter. 2140 01:38:37,570 --> 01:38:38,950 Hello, world. 2141 01:38:38,950 --> 01:38:45,480 So presumably, what does arg c equal when I run it in that way? 2142 01:38:45,480 --> 01:38:46,480 DAVID MALAN: Maybe one-- 2143 01:38:46,480 --> 01:38:48,550 I mean, not two, at least, it stands to reason. 2144 01:38:48,550 --> 01:38:50,800 It's not two, because I didn't see my own name. 2145 01:38:50,800 --> 01:38:53,253 So if I go ahead and rerun it now with, say, David. 2146 01:38:53,253 --> 01:38:54,670 What's it going to say, hopefully? 2147 01:38:54,670 --> 01:38:56,950 Like, hello comma David? 2148 01:38:56,950 --> 01:38:57,850 And indeed, it does. 2149 01:38:57,850 --> 01:38:58,630 Why? 2150 01:38:58,630 --> 01:39:01,300 Well when you run a program that you have written in C 2151 01:39:01,300 --> 01:39:05,320 and you specify one or more words after your program's name, 2152 01:39:05,320 --> 01:39:09,400 you are handed those words in an array, called arg v, 2153 01:39:09,400 --> 01:39:13,700 and you are told how many words the human typed in arg c. 2154 01:39:13,700 --> 01:39:19,000 So the clang program, the make program, help 50, style 50, check 50, 2155 01:39:19,000 --> 01:39:21,610 all of the programs we've seen thus far that take words 2156 01:39:21,610 --> 01:39:24,570 after the program's names, literally are implemented with code 2157 01:39:24,570 --> 01:39:26,290 that's similar in spirit to this. 2158 01:39:26,290 --> 01:39:28,990 Some programmer checked oh, did the human type any words?
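The argv program demonstrated above can be sketched as follows. Some assumptions to flag: the logic is put in a helper called greet (a made-up name) that writes the greeting into a buffer, so the behavior for each case is easy to check, and plain char * stands in for the CS50 library's string type.

```c
#include <stdio.h>

// Build the greeting the lecture's argv program prints.
// argc counts every word typed at the prompt, including the program's
// own name, so "./argv David" gives argc == 2 and argv[1] == "David".
// (greet, out, and size are hypothetical names for this sketch.)
void greet(int argc, char *argv[], char *out, size_t size)
{
    if (argc == 2)
    {
        // Exactly one word was typed after the program's name
        snprintf(out, size, "hello, %s\n", argv[1]);
    }
    else
    {
        // No argument (or too many): fall back to the hard-coded default
        snprintf(out, size, "hello, world\n");
    }
}
```

A main declared as int main(int argc, char *argv[]) could pass its two parameters straight to greet and then printf the buffer, which would reproduce the two runs shown above.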
2159 01:39:28,990 --> 01:39:31,930 If so, maybe I want to output a different name than a dot out. 2160 01:39:31,930 --> 01:39:33,670 Maybe I want to output the name hello. 2161 01:39:33,670 --> 01:39:36,190 When you run make something, well what do you want to make? 2162 01:39:36,190 --> 01:39:40,120 That's a command line argument that the human programmer checked arg v for 2163 01:39:40,120 --> 01:39:43,270 to know what program it is you want to make. 2164 01:39:43,270 --> 01:39:47,090 So it's a simple idea, even though the syntax is admittedly pretty ugly. 2165 01:39:47,090 --> 01:39:48,490 But it's the same idea. 2166 01:39:48,490 --> 01:39:51,790 And the only two forms then, for main moving 2167 01:39:51,790 --> 01:39:55,150 forward are either this new one, which lets you accept command line arguments, 2168 01:39:55,150 --> 01:39:57,677 or the old one, which is when you know in advance I 2169 01:39:57,677 --> 01:39:59,260 don't need any command line arguments. 2170 01:39:59,260 --> 01:40:02,290 It's entirely up to you which to use, if you actually 2171 01:40:02,290 --> 01:40:05,510 want to accept command line arguments. 2172 01:40:05,510 --> 01:40:08,410 Now there's one last detail that we've not explained yet 2173 01:40:08,410 --> 01:40:10,180 and that's this one here. 2174 01:40:10,180 --> 01:40:13,030 Why the heck does main have a return value? 2175 01:40:13,030 --> 01:40:15,280 And there's not really a super compelling reason here, 2176 01:40:15,280 --> 01:40:18,160 but we can see that there's a low-level reason that this is useful, 2177 01:40:18,160 --> 01:40:20,290 but it's not something to stress over much. 2178 01:40:20,290 --> 01:40:25,030 It turns out that main by default in C does have a return value. 2179 01:40:25,030 --> 01:40:29,110 And even though we have never returned anything from main yet, by default, 2180 01:40:29,110 --> 01:40:30,990 main returns zero. 2181 01:40:30,990 --> 01:40:34,090 Zero in computers typically means all is well. 
2182 01:40:34,090 --> 01:40:37,120 It's a little paradoxical, because you would think zero-- false-- bad. 2183 01:40:37,120 --> 01:40:39,410 But no, zero tends to be good. 2184 01:40:39,410 --> 01:40:44,260 The reason for this is that main can return non-zero values, 2185 01:40:44,260 --> 01:40:47,893 like one, or negative one, or 2 billion, or negative 2 billion. 2186 01:40:47,893 --> 01:40:50,560 In fact, if you've ever seen an error message on your Mac or PC, 2187 01:40:50,560 --> 01:40:52,477 sometimes there's a little window that pops up 2188 01:40:52,477 --> 01:40:55,750 and it's a cryptic looking code, like an error has happened, negative 42, 2189 01:40:55,750 --> 01:40:56,740 or whatever. 2190 01:40:56,740 --> 01:40:59,680 That number is just an arbitrary number some human 2191 01:40:59,680 --> 01:41:04,420 decided that their main program will return if something went wrong. 2192 01:41:04,420 --> 01:41:07,220 And we can do this as follows. 2193 01:41:07,220 --> 01:41:14,320 I can write a program like this in a file called exit dot c that has, 2194 01:41:14,320 --> 01:41:20,735 say, the CS50 library, that includes standard Io dot h, int main void-- 2195 01:41:20,735 --> 01:41:22,610 I'm going to go back to void, because I'm not 2196 01:41:22,610 --> 01:41:26,290 going to take any-- or actually, no, I'm going to do int arg c, 2197 01:41:26,290 --> 01:41:30,377 and then string arg v brackets, so I can take a command line argument, 2198 01:41:30,377 --> 01:41:31,960 and I'm going to start to error check. 2199 01:41:31,960 --> 01:41:34,370 Suppose this is a program that the human is supposed 2200 01:41:34,370 --> 01:41:36,160 to provide a command line argument. 2201 01:41:36,160 --> 01:41:37,190 I'm going to do this. 2202 01:41:37,190 --> 01:41:40,190 If arg c does not equal two, you know what I'm going to do?
2203 01:41:40,190 --> 01:41:45,860 I'm going to yell at the user, say missing command line argument backslash 2204 01:41:45,860 --> 01:41:48,320 n, but now I want to quit from the program. 2205 01:41:48,320 --> 01:41:49,970 I want to do the equivalent of exit. 2206 01:41:49,970 --> 01:41:51,770 So how do you do that in C? 2207 01:41:51,770 --> 01:41:54,080 You actually return a value. 2208 01:41:54,080 --> 01:41:57,150 And if all was well, you would return zero. 2209 01:41:57,150 --> 01:42:00,470 However, if something went wrong, the sky's the limit, up to 2 billion 2210 01:42:00,470 --> 01:42:01,500 or negative 2 billion. 2211 01:42:01,500 --> 01:42:05,330 However, we'll keep it simple, and just return one, if something went wrong. 2212 01:42:05,330 --> 01:42:11,210 Meanwhile, I might then say printf, hello, percent s. 2213 01:42:11,210 --> 01:42:14,040 Type in arg v one, just as before. 2214 01:42:14,040 --> 01:42:17,270 And then, if all is well, return zero. 2215 01:42:17,270 --> 01:42:19,252 So not much new is happening here. 2216 01:42:19,252 --> 01:42:20,960 This program is very similar to the last, 2217 01:42:20,960 --> 01:42:24,680 except instead of saying hello world by default, I'm going to yell at the user 2218 01:42:24,680 --> 01:42:26,540 with this, missing command line argument, 2219 01:42:26,540 --> 01:42:31,040 and then return one to signal to the computer, this program did not succeed. 2220 01:42:31,040 --> 01:42:34,670 And I'm going to return zero, if and only if, it did. 2221 01:42:34,670 --> 01:42:35,405 Yeah? 2222 01:42:35,405 --> 01:42:38,660 AUDIENCE: Why is arg c unequal to zero? 2223 01:42:38,660 --> 01:42:42,990 DAVID MALAN: Why is arg c not equal-- really good question. 2224 01:42:42,990 --> 01:42:46,070 So let me go ahead and change this. 
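The error-checking logic just described for exit dot c can be sketched like this. The helper name run is an assumption made for this sketch: it takes the same two parameters as main and returns the value main would return, and char * again stands in for CS50's string.

```c
#include <stdio.h>

// Same logic as the lecture's exit.c, written as a helper that returns
// the status main would return. (run is a hypothetical name.)
int run(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("missing command-line argument\n");
        // Any non-zero value signals that something went wrong
        return 1;
    }
    printf("hello, %s\n", argv[1]);
    // Zero signals that all is well
    return 0;
}
```

With a main that simply does return run(argc, argv);, running the program with no argument would exit with status 1 and running it with a name would exit with status 0. In most shells, typing echo $? right after the program finishes shows that status.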
2225 01:42:46,070 --> 01:42:50,810 What is in arg v zero that makes it have two things instead of one, 2226 01:42:50,810 --> 01:42:52,230 if I run David-- 2227 01:42:52,230 --> 01:42:53,600 if I run my name, David. 2228 01:42:53,600 --> 01:42:56,030 Well, hello-- let me recompile. 2229 01:42:56,030 --> 01:43:01,630 Make arg v one, or make arg v, dot slash, arg v, hello-- 2230 01:43:01,630 --> 01:43:02,780 no, wrong program. 2231 01:43:02,780 --> 01:43:03,890 Make exit. 2232 01:43:03,890 --> 01:43:05,600 Sorry. 2233 01:43:05,600 --> 01:43:07,490 There's no program to detect that mistake. 2234 01:43:07,490 --> 01:43:10,190 Dot slash exit, missing command line argument. 2235 01:43:10,190 --> 01:43:15,530 However, if I do exit David, now I see-- oh, did I run arg v before? 2236 01:43:15,530 --> 01:43:16,310 Check the tape. 2237 01:43:16,310 --> 01:43:17,450 Hello, dot slash exit. 2238 01:43:17,450 --> 01:43:21,110 So in arg v, the first word you type, the program's name, 2239 01:43:21,110 --> 01:43:22,910 is stored at arg v zero. 2240 01:43:22,910 --> 01:43:26,450 The second word you type, the first argument you care about, 2241 01:43:26,450 --> 01:43:28,370 is in arg v one. 2242 01:43:28,370 --> 01:43:29,690 And that's why arg c is two. 2243 01:43:29,690 --> 01:43:32,648 I literally typed two words at the prompt, even though only one of them 2244 01:43:32,648 --> 01:43:36,050 is technically an argument I care about. 2245 01:43:36,050 --> 01:43:39,448 So where can we go from this? 2246 01:43:39,448 --> 01:43:41,990 So we're going to use this now to solve a number of problems, 2247 01:43:41,990 --> 01:43:43,407 that of readability, for instance. 2248 01:43:43,407 --> 01:43:44,960 You might recall this paragraph here. 2249 01:43:44,960 --> 01:43:45,860 Mr. and Mr. Durst-- 2250 01:43:45,860 --> 01:43:47,660 "Mr. and Mrs. Dursley of number 4 Privet Drive 2251 01:43:47,660 --> 01:43:50,490 were proud to say that they were perfectly normal, thank you very much.
2252 01:43:50,490 --> 01:43:52,490 They were the last people you'd expect to be involved in anything 2253 01:43:52,490 --> 01:43:55,790 strange or mysterious, because they just didn't hold with such nonsense," 2254 01:43:55,790 --> 01:43:56,495 and so forth. 2255 01:43:56,495 --> 01:43:59,120 So from the very first Harry Potter and the Philosopher's Stone, 2256 01:43:59,120 --> 01:44:01,160 if you were to run the entirety of that book 2257 01:44:01,160 --> 01:44:05,487 through a program written in C, that analyzes its readability, 2258 01:44:05,487 --> 01:44:07,820 you would be informed that the grade level for that book 2259 01:44:07,820 --> 01:44:09,260 is estimated at grade 7. 2260 01:44:09,260 --> 01:44:13,760 So you can read it well and comfortably if you're a human in grade 7. 2261 01:44:13,760 --> 01:44:15,090 Why is that the case? 2262 01:44:15,090 --> 01:44:18,710 Well, the program, as is conventional in software, 2263 01:44:18,710 --> 01:44:21,498 would analyze like the number of words in the sentence, 2264 01:44:21,498 --> 01:44:24,290 the lengths of your words, how big the words are that you're using. 2265 01:44:24,290 --> 01:44:26,082 There's a number of heuristics that are not 2266 01:44:26,082 --> 01:44:31,272 perfectly correlated with readability, but they are-- 2267 01:44:31,272 --> 01:44:33,230 they're not perfectly aligned with readability, 2268 01:44:33,230 --> 01:44:35,220 but they do correlate with readability. 2269 01:44:35,220 --> 01:44:37,303 So the bigger the words, the bigger the sentences, 2270 01:44:37,303 --> 01:44:41,090 and more likely the older you should be to actually read that text effectively. 2271 01:44:41,090 --> 01:44:42,670 Now something like this. 2272 01:44:42,670 --> 01:44:45,080 "In computational linguistics, authorship attribution 2273 01:44:45,080 --> 01:44:47,540 is the task of predicting the author of a document of unknown authorship.
2274 01:44:47,540 --> 01:44:50,720 This task is generally performed by the analysis of stylometric features, 2275 01:44:50,720 --> 01:44:52,280 particular characteristics of an author's writing 2276 01:44:52,280 --> 01:44:54,655 that can be used to identify his or her works in contrast 2277 01:44:54,655 --> 01:44:56,055 with the works of other authors." 2278 01:44:56,055 --> 01:44:58,430 If you were to run that through the same program in C, 2279 01:44:58,430 --> 01:45:00,138 otherwise known as Brian's senior thesis, 2280 01:45:00,138 --> 01:45:04,610 you would get grade 16, because he uses a lot bigger words, longer sentences, 2281 01:45:04,610 --> 01:45:06,110 more elegant prose. 2282 01:45:06,110 --> 01:45:10,293 It turns out that this program in C to which I allude, will exist in a week, 2283 01:45:10,293 --> 01:45:12,210 because for the first problem on the problem-- 2284 01:45:12,210 --> 01:45:14,168 one of the problems on the problem set will you 2285 01:45:14,168 --> 01:45:16,010 implement a readability analysis. 2286 01:45:16,010 --> 01:45:19,040 But it all boils down to taking in text as inputs, such as Harry 2287 01:45:19,040 --> 01:45:22,250 Potter or Brian's text, analyzing the lengths of the words, 2288 01:45:22,250 --> 01:45:26,295 looking for the spaces, and so forth, and deciding how advanced that text is. 2289 01:45:26,295 --> 01:45:28,295 But we're also going to challenge you with this, 2290 01:45:28,295 --> 01:45:31,310 this notion of cryptography, the art of scrambling information 2291 01:45:31,310 --> 01:45:32,570 to keep it private. 2292 01:45:32,570 --> 01:45:35,450 And cryptography might work, just like in week zero, 2293 01:45:35,450 --> 01:45:38,330 as having inputs and outputs, where the input is the message you 2294 01:45:38,330 --> 01:45:40,410 want to send safely to someone else.
2295 01:45:40,410 --> 01:45:43,452 The output is some kind of scrambled version thereof, the equivalent of, 2296 01:45:43,452 --> 01:45:46,160 like in grade school, maybe writing a little love note to someone 2297 01:45:46,160 --> 01:45:48,243 and passing it through the class to the recipient. 2298 01:45:48,243 --> 01:45:50,452 And you don't want the teacher, if they intercept it, 2299 01:45:50,452 --> 01:45:53,780 to be able to understand the message, so it's somehow scrambled or encrypted, 2300 01:45:53,780 --> 01:45:54,710 so to speak. 2301 01:45:54,710 --> 01:45:56,748 In cryptography, the input is called plaintext, 2302 01:45:56,748 --> 01:45:58,290 and the output is called ciphertext. 2303 01:45:58,290 --> 01:46:02,180 So if we were, for instance, to say something like hi exclamation point, 2304 01:46:02,180 --> 01:46:05,540 recall that, that of course can be represented in Ascii as three numbers-- 2305 01:46:05,540 --> 01:46:07,670 72, 73, and 33. 2306 01:46:07,670 --> 01:46:10,130 Well, it turns out, if we want to send a fancier message, 2307 01:46:10,130 --> 01:46:13,580 a longer one, we can just look at all of those numeric equivalents, 2308 01:46:13,580 --> 01:46:16,520 do some mathematics on them, and effectively scramble them. 2309 01:46:16,520 --> 01:46:17,600 But we need a key. 2310 01:46:17,600 --> 01:46:21,140 You and I need to decide in advance, sender and recipient, what 2311 01:46:21,140 --> 01:46:24,290 is the secret we're going to use to kind of jumble the letters up 2312 01:46:24,290 --> 01:46:27,230 so as to encrypt it without a teacher or a classmate 2313 01:46:27,230 --> 01:46:28,820 intercepting and decrypting it. 2314 01:46:28,820 --> 01:46:32,750 Suppose, very simply and probably foolishly, our secret number is one. 2315 01:46:32,750 --> 01:46:36,490 You and I both agree one is our secret and we're going to use one to scramble 2316 01:46:36,490 --> 01:46:38,630 the information as follows.
2317 01:46:38,630 --> 01:46:42,490 If I want to say, I love you, and send this across an insecure medium, 2318 01:46:42,490 --> 01:46:44,620 like a roomful of people, well I might first 2319 01:46:44,620 --> 01:46:47,500 convert each of these letters to their Ascii equivalents 2320 01:46:47,500 --> 01:46:50,800 just by looking them up on AsciiChart.com or doing it in code, 2321 01:46:50,800 --> 01:46:54,010 then I might go ahead and start adding one to each of those letters, 2322 01:46:54,010 --> 01:46:57,040 because that is the secret on which you and I have agreed, 2323 01:46:57,040 --> 01:46:59,020 and then I'll convert it back to the characters 2324 01:46:59,020 --> 01:47:03,280 as by casting it from an int to a char so that the message I actually 2325 01:47:03,280 --> 01:47:06,760 write on my piece of paper, or send in my program, looks like this. 2326 01:47:06,760 --> 01:47:10,210 So that if a teacher or a classmate intercepts it, they see this, 2327 01:47:10,210 --> 01:47:12,070 but you know, I love you. 2328 01:47:12,070 --> 01:47:16,160 And so, with that said, will you be doing your readability and cryptography 2329 01:47:16,160 --> 01:47:16,660 and more? 2330 01:47:16,660 --> 01:47:20,010 That's it for week two, and we'll see you next time. 2331 01:47:20,010 --> 01:47:20,926
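The add-one scrambling described above can be sketched in C as follows. This is a simplified illustration and not the problem set's specification: the function name shift is made up here, only letters are changed while spaces and punctuation pass through untouched, and wrapping around at the end of the alphabet is deliberately left out, so a Z shifted by one lands on the next ASCII character and not on A.

```c
#include <ctype.h>
#include <string.h>

// Scramble text in place by adding key to the ASCII code of each letter.
// Non-letters are left alone. No wrap-around at 'Z' or 'z' is handled.
// (shift is a hypothetical name for this sketch.)
void shift(char *text, int key)
{
    for (int i = 0, n = strlen(text); i < n; i++)
    {
        if (isalpha((unsigned char) text[i]))
        {
            // char is treated as an int for the addition,
            // then cast back to a char to store it
            text[i] = (char) (text[i] + key);
        }
    }
}
```

With a key of 1, the message I LOVE YOU becomes J MPWF ZPV, and calling shift on that ciphertext with a key of negative 1 recovers the original, which is what the recipient who knows the secret would do.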