1 00:00:00,000 --> 00:00:03,493 [MUSIC PLAYING] 2 00:00:03,493 --> 00:00:49,420 3 00:00:49,420 --> 00:00:51,760 DAVID MALAN: All right, so this is CS50. 4 00:00:51,760 --> 00:00:55,450 And this is week 2, wherein we're going to dive in a little more deeply 5 00:00:55,450 --> 00:00:56,720 to see this new language. 6 00:00:56,720 --> 00:00:58,720 And we're also going to take a look back at some 7 00:00:58,720 --> 00:01:02,350 of the concepts we looked at last week so that you can better understand some 8 00:01:02,350 --> 00:01:04,750 of the features of C and some of the steps 9 00:01:04,750 --> 00:01:06,830 you've been taking to make your code work. 10 00:01:06,830 --> 00:01:09,880 So we'll peel back some of the layers of abstraction from last week 11 00:01:09,880 --> 00:01:11,950 so that you better understand really what's going 12 00:01:11,950 --> 00:01:14,540 on underneath the hood of the computer. 13 00:01:14,540 --> 00:01:18,907 So, of course, last week, we began with perhaps the most canonical of programs 14 00:01:18,907 --> 00:01:20,740 in C, the most canonical of programs you can 15 00:01:20,740 --> 00:01:23,140 write pretty much in any language, which is that which 16 00:01:23,140 --> 00:01:25,030 says, quite simply, "hello, world." 17 00:01:25,030 --> 00:01:28,600 But recall that before actually running this program, 18 00:01:28,600 --> 00:01:31,600 we have to convert it into the language that computers themselves speak, 19 00:01:31,600 --> 00:01:35,080 which we defined last week as binary, 0's and 1's, otherwise known 20 00:01:35,080 --> 00:01:37,640 as machine language in this context. 21 00:01:37,640 --> 00:01:40,120 So we have to go somehow from this source code to something 22 00:01:40,120 --> 00:01:44,380 more like this machine code, the 0's and 1's that the computer actually 23 00:01:44,380 --> 00:01:45,250 understands. 24 00:01:45,250 --> 00:01:48,160 Now, you may recall too that we introduced a command for this. 
25 00:01:48,160 --> 00:01:49,840 And that command was called make. 26 00:01:49,840 --> 00:01:53,770 And literally via this command, "make hello," could we make a program 27 00:01:53,770 --> 00:01:54,490 called hello. 28 00:01:54,490 --> 00:01:55,840 And make was a little fancy. 29 00:01:55,840 --> 00:01:59,140 It assumed that if you want to make a program called hello, 30 00:01:59,140 --> 00:02:01,720 it would look for a file called hello.c. 31 00:02:01,720 --> 00:02:03,850 That just happens automatically for you. 32 00:02:03,850 --> 00:02:07,240 And the end result, of course, was an additional file called hello 33 00:02:07,240 --> 00:02:11,390 that would end up getting put into your current directory. 34 00:02:11,390 --> 00:02:14,360 So you could then do ./hello and be on your way. 35 00:02:14,360 --> 00:02:17,620 But it turns out that make is actually automating 36 00:02:17,620 --> 00:02:20,650 a more specific set of steps for us that we'll 37 00:02:20,650 --> 00:02:22,510 see a little more closely now instead. 38 00:02:22,510 --> 00:02:24,640 So on the screen here is exactly the same code 39 00:02:24,640 --> 00:02:27,370 that we wrote last week to say, quite simply, "hello, world." 40 00:02:27,370 --> 00:02:31,810 And recall that any time you run "make hello" or "make mario" 41 00:02:31,810 --> 00:02:34,120 or "make cash" or "make credit," any of the problems 42 00:02:34,120 --> 00:02:35,890 that you might have tackled more recently, 43 00:02:35,890 --> 00:02:38,260 you see some cryptic output on the screen. 44 00:02:38,260 --> 00:02:42,040 Hopefully, no red or yellow error messages, but even when all is well, 45 00:02:42,040 --> 00:02:45,770 you see this white text which is indicative of all having been well. 46 00:02:45,770 --> 00:02:49,240 And last week, we just kind of ignored this and moved on and immediately did 47 00:02:49,240 --> 00:02:51,400 something like ./hello. 
48 00:02:51,400 --> 00:02:53,528 But today, let's actually better understand 49 00:02:53,528 --> 00:02:55,570 what it is that we've been turning a blind eye to 50 00:02:55,570 --> 00:03:00,820 so that each week, as it passes, there's less and less that you don't understand 51 00:03:00,820 --> 00:03:04,220 the entirety of with respect to what's going on your screen. 52 00:03:04,220 --> 00:03:08,020 So again, if I do ls here, we'll see not only hello.c, but also 53 00:03:08,020 --> 00:03:12,910 the executable program called hello that I actually created via make. 54 00:03:12,910 --> 00:03:14,450 But look at this output. 55 00:03:14,450 --> 00:03:16,930 There's some mention of something called Clang here. 56 00:03:16,930 --> 00:03:21,730 And then there's a lot of other words or cryptic phrases, something in computer 57 00:03:21,730 --> 00:03:24,500 speak here that has all of these hyphens in front of them. 58 00:03:24,500 --> 00:03:26,770 And it turns out that what make is doing for us 59 00:03:26,770 --> 00:03:31,870 is it's automating execution of a command more specifically called clang. 60 00:03:31,870 --> 00:03:35,860 Clang is actually the compiler that we alluded to last week, a compiler 61 00:03:35,860 --> 00:03:39,010 being a program that converts source code to machine code. 62 00:03:39,010 --> 00:03:41,590 We've actually been using Clang this whole time. 63 00:03:41,590 --> 00:03:44,860 But notice that Clang requires a bit more sophistication. 64 00:03:44,860 --> 00:03:48,200 You have to understand a bit more about what's going on in order to use it. 65 00:03:48,200 --> 00:03:51,190 So let me go ahead and remove the program called hello. 66 00:03:51,190 --> 00:03:54,160 I'm going to use the rm command that we saw briefly last time. 67 00:03:54,160 --> 00:03:55,900 I'm going to confirm by hitting y. 68 00:03:55,900 --> 00:04:00,010 And if I type ls again now, hello.c is the only file that remains. 
69 00:04:00,010 --> 00:04:04,600 Well, temporarily, let me take away the ability to use make. 70 00:04:04,600 --> 00:04:07,000 And let's now use Clang directly. 71 00:04:07,000 --> 00:04:10,210 Clang is another program installed in CS50 IDE. 72 00:04:10,210 --> 00:04:13,540 It's a very popular compiler that you can download onto your own Macs and PCs 73 00:04:13,540 --> 00:04:14,320 as well. 74 00:04:14,320 --> 00:04:16,550 But to run it is a little different. 75 00:04:16,550 --> 00:04:19,930 I'm going to go ahead and say clang and then the name of the file 76 00:04:19,930 --> 00:04:23,380 that I want to compile, hello.c being this one. 77 00:04:23,380 --> 00:04:24,970 I'm going to go ahead and hit Enter. 78 00:04:24,970 --> 00:04:27,340 And now nothing happens, seemingly. 79 00:04:27,340 --> 00:04:29,410 But frankly, as you've probably gleaned already, 80 00:04:29,410 --> 00:04:31,698 when nothing bad seems to happen, that implicitly 81 00:04:31,698 --> 00:04:33,490 tends to mean that something good happened. 82 00:04:33,490 --> 00:04:35,710 Your program compiled successfully. 83 00:04:35,710 --> 00:04:39,790 But curiously, if I type ls now, you don't see the program, hello. 84 00:04:39,790 --> 00:04:42,700 You see this weird file name called a.out. 85 00:04:42,700 --> 00:04:44,620 And this is actually a historical remnant. 86 00:04:44,620 --> 00:04:48,220 Years ago, when humans would use a compiler to compile their code, 87 00:04:48,220 --> 00:04:51,520 the default file name that every program was given 88 00:04:51,520 --> 00:04:54,460 was a.out for assembler output. 89 00:04:54,460 --> 00:04:55,670 More on that in a moment. 90 00:04:55,670 --> 00:04:57,670 But this is kind of a stupid name for a program. 91 00:04:57,670 --> 00:04:59,590 It's not at all descriptive of what it does. 92 00:04:59,590 --> 00:05:05,140 So it turns out that programs like Clang can be configured at the command line.
93 00:05:05,140 --> 00:05:08,140 The command line, again, refers to the blinking prompt where you can 94 00:05:08,140 --> 00:05:09,280 type commands. 95 00:05:09,280 --> 00:05:14,200 So indeed, I'm going to go ahead and remove this file now-- rm space a.out, 96 00:05:14,200 --> 00:05:15,550 and then confirm with y. 97 00:05:15,550 --> 00:05:18,520 And now I'm back to where I began with just hello.c. 98 00:05:18,520 --> 00:05:21,140 And let me go ahead now and do something a little different. 99 00:05:21,140 --> 00:05:27,700 I'm going to do "clang -o hello" and then the word "hello.c." 100 00:05:27,700 --> 00:05:29,950 And what I'm doing here is actually providing 101 00:05:29,950 --> 00:05:33,080 what we're going to start calling a command-line argument. 102 00:05:33,080 --> 00:05:37,330 So these commands, like make and rm, sometimes 103 00:05:37,330 --> 00:05:39,460 can just be run all by themselves. 104 00:05:39,460 --> 00:05:41,500 You just type a single word and hit Enter. 105 00:05:41,500 --> 00:05:44,980 But very often, we've seen that they take inputs in some sense. 106 00:05:44,980 --> 00:05:46,660 You type, "make hello." 107 00:05:46,660 --> 00:05:48,870 You type, "rm hello." 108 00:05:48,870 --> 00:05:51,030 And the second word, "hello," in those cases, 109 00:05:51,030 --> 00:05:53,910 is kind of an input to the command, otherwise 110 00:05:53,910 --> 00:05:56,980 now known as a command-line argument. 111 00:05:56,980 --> 00:05:58,480 It's an input to the command. 112 00:05:58,480 --> 00:06:01,710 So here, we have more command-line arguments. 113 00:06:01,710 --> 00:06:06,300 We've got the word "clang," which is the compiler we're about to run, "-o," 114 00:06:06,300 --> 00:06:09,230 which it turns out is shorthand notation for "output," 115 00:06:09,230 --> 00:06:10,875 so please output the following. 116 00:06:10,875 --> 00:06:12,000 What do you want to output? 117 00:06:12,000 --> 00:06:13,830 Well, the next word is "hello." 
118 00:06:13,830 --> 00:06:16,210 And then the final word is "hello.c." 119 00:06:16,210 --> 00:06:19,530 So long story short, this command now more verbose 120 00:06:19,530 --> 00:06:24,210 though it is, is saying, run Clang, output a file called hello, 121 00:06:24,210 --> 00:06:27,020 and take as input file called hello.c. 122 00:06:27,020 --> 00:06:30,270 So when I run this command after hitting Enter, nothing again seems to happen. 123 00:06:30,270 --> 00:06:34,560 But if I type ls, I don't see that stupid default file name of a.out. 124 00:06:34,560 --> 00:06:37,590 Now I see the file name, hello. 125 00:06:37,590 --> 00:06:41,310 So this is how ultimately Clang is helping me compile my code. 126 00:06:41,310 --> 00:06:43,770 It's kind of automating all of those processes. 127 00:06:43,770 --> 00:06:48,210 But recall that that's not the only type of program we ran last week 128 00:06:48,210 --> 00:06:49,290 or wrote last week. 129 00:06:49,290 --> 00:06:52,425 We rather took code like this and began to enhance it 130 00:06:52,425 --> 00:06:53,550 with some additional lines. 131 00:06:53,550 --> 00:06:56,040 So version 2 of Hello, World actually involved 132 00:06:56,040 --> 00:06:59,730 prompting the user for input using CS50's get_string function, 133 00:06:59,730 --> 00:07:02,940 storing the output in a variable called name. 134 00:07:02,940 --> 00:07:07,028 But recall that we also had to add cs50.h at the top of the file. 135 00:07:07,028 --> 00:07:08,320 So let me go ahead and do that. 136 00:07:08,320 --> 00:07:12,060 Let me go ahead and remove hello because that's now the old version. 
137 00:07:12,060 --> 00:07:18,510 Let me go in now and start updating my code here and go into my hello.c file, 138 00:07:18,510 --> 00:07:23,010 include cs50.h, now get myself a string called name, 139 00:07:23,010 --> 00:07:25,890 but we could call it anything, call the function get_string, 140 00:07:25,890 --> 00:07:30,990 and ask, "What's your name," question mark with a space at the very end 141 00:07:30,990 --> 00:07:32,340 just to create a gap. 142 00:07:32,340 --> 00:07:36,090 And then down here, instead of printing out "hello, world" always, 143 00:07:36,090 --> 00:07:40,110 let me print out "Hello, %s," which is a placeholder recall, 144 00:07:40,110 --> 00:07:41,910 and output the person's name. 145 00:07:41,910 --> 00:07:44,760 So last week, the way we compiled this program was just 146 00:07:44,760 --> 00:07:47,140 "make hello," no different from now. 147 00:07:47,140 --> 00:07:52,200 But this week, suppose I were to instead get rid of make, only 148 00:07:52,200 --> 00:07:54,570 because it's sort of automating steps for me that I now 149 00:07:54,570 --> 00:07:56,250 want to understand in more detail. 150 00:07:56,250 --> 00:08:01,380 I could compile this program again with clang -o hello hello.c, so just 151 00:08:01,380 --> 00:08:06,600 a reapplication of that same idea of passing in three arguments, -o, hello, 152 00:08:06,600 --> 00:08:08,130 and hello.c. 153 00:08:08,130 --> 00:08:11,070 But the catch now is that I'm actually going 154 00:08:11,070 --> 00:08:12,880 to see one of these red error messages. 155 00:08:12,880 --> 00:08:14,880 And let's consider what this is actually saying. 156 00:08:14,880 --> 00:08:17,440 There's still going to be a bunch of cryptic stuff here. 157 00:08:17,440 --> 00:08:19,992 But notice, as always, we're going to see, hopefully, 158 00:08:19,992 --> 00:08:21,450 something that's a little familiar. 159 00:08:21,450 --> 00:08:23,520 So "undefined reference to get_string." 
160 00:08:23,520 --> 00:08:26,790 I don't yet know what an undefined reference is, necessarily. 161 00:08:26,790 --> 00:08:28,440 I don't know what a linker command is. 162 00:08:28,440 --> 00:08:31,680 But I at least recognize there's something going on with get_string. 163 00:08:31,680 --> 00:08:33,100 And there's a reason for this. 164 00:08:33,100 --> 00:08:37,140 It turns out that when using a library, whether it's CS50's library or others' 165 00:08:37,140 --> 00:08:41,159 as well, it's sometimes not sufficient only to include the header 166 00:08:41,159 --> 00:08:43,650 file at the top of your own code. 167 00:08:43,650 --> 00:08:46,230 Sometimes, you additionally have to tell the computer 168 00:08:46,230 --> 00:08:52,350 where to find the 0's and 1's that someone has written to implement 169 00:08:52,350 --> 00:08:54,290 a function like get_string. 170 00:08:54,290 --> 00:08:58,860 So the header file, like cs50.h, just tells the compiler 171 00:08:58,860 --> 00:09:00,540 that the function exists. 172 00:09:00,540 --> 00:09:02,910 But there's a second mechanism that, up until now, 173 00:09:02,910 --> 00:09:05,730 has been automated for us, that tells the computer where 174 00:09:05,730 --> 00:09:10,560 to find the actual 0's and 1's that implements the functions in that header 175 00:09:10,560 --> 00:09:11,470 file. 176 00:09:11,470 --> 00:09:15,180 So with that said, I'm going to need to actually add another command line 177 00:09:15,180 --> 00:09:16,710 argument to this command. 178 00:09:16,710 --> 00:09:22,140 And instead of doing clang -o hello hello.c, I'm going to additionally, 179 00:09:22,140 --> 00:09:26,250 and admittedly, cryptically, do -lcs50 at the end 180 00:09:26,250 --> 00:09:31,440 of this command, which quite simply refers to link in the CS50 library. 181 00:09:31,440 --> 00:09:33,750 So "link" is a term of art that we'll see what it 182 00:09:33,750 --> 00:09:35,560 means in more detail in just a moment. 
183 00:09:35,560 --> 00:09:39,330 But this additional final command-line argument tells Clang, 184 00:09:39,330 --> 00:09:42,660 you already know that a function like get_string exists. 185 00:09:42,660 --> 00:09:47,100 -lcs50 means when compiling hello.c, make 186 00:09:47,100 --> 00:09:51,570 sure to incorporate all of the machine code from CS50's library 187 00:09:51,570 --> 00:09:53,190 into your program as well. 188 00:09:53,190 --> 00:09:56,880 In short, it's something you have to do when you use certain libraries. 189 00:09:56,880 --> 00:10:00,610 So now when I hit Enter, all seems to be well because nothing bad got printed. 190 00:10:00,610 --> 00:10:02,520 If I type ls, I see hello. 191 00:10:02,520 --> 00:10:06,300 And voila, I can do ./hello, type in my name, David. 192 00:10:06,300 --> 00:10:08,520 And voila, "hello, David." 193 00:10:08,520 --> 00:10:10,740 So why didn't we do all of this last week? 194 00:10:10,740 --> 00:10:13,020 And frankly, we've made no fundamental progress. 195 00:10:13,020 --> 00:10:15,960 All we've done is reveal what's going on underneath the hood. 196 00:10:15,960 --> 00:10:19,650 But I'll claim that, frankly, compiling your code by typing out 197 00:10:19,650 --> 00:10:24,450 all of these verbose command-line arguments just gets tedious quickly. 198 00:10:24,450 --> 00:10:27,330 And so computer scientists and programmers, more specifically, 199 00:10:27,330 --> 00:10:29,370 tend to automate monotonous steps. 200 00:10:29,370 --> 00:10:33,360 So what's happening ultimately with make is that all of this 201 00:10:33,360 --> 00:10:34,800 is being automated for us. 202 00:10:34,800 --> 00:10:37,590 So when you typed "make hello" last week-- and henceforth, 203 00:10:37,590 --> 00:10:40,030 you're welcome to continue using make as well-- 204 00:10:40,030 --> 00:10:43,710 notice that it generates this extra long command, some of which 205 00:10:43,710 --> 00:10:45,000 we haven't even talked about. 
206 00:10:45,000 --> 00:10:47,400 But I do recognize clang at the beginning. 207 00:10:47,400 --> 00:10:51,170 I recognize hello.c here. 208 00:10:51,170 --> 00:10:54,330 I recognize -lcs50 here. 209 00:10:54,330 --> 00:10:56,490 But notice there's a bunch of other stuff as well, 210 00:10:56,490 --> 00:11:00,470 not only the -o hello, but also -lm, which 211 00:11:00,470 --> 00:11:03,500 refers to a math library, -lcrypt, which refers 212 00:11:03,500 --> 00:11:05,900 to a cryptography or an encryption library. 213 00:11:05,900 --> 00:11:08,810 In short, we the staff have preconfigured 214 00:11:08,810 --> 00:11:11,450 make to just make sure that when you compile your code, 215 00:11:11,450 --> 00:11:15,410 all of the requisite dependencies, libraries, and so forth, 216 00:11:15,410 --> 00:11:18,650 are available to you without having to worry about all 217 00:11:18,650 --> 00:11:20,100 of these command-line arguments. 218 00:11:20,100 --> 00:11:22,700 So henceforth, you can certainly compile your code 219 00:11:22,700 --> 00:11:24,650 in this way using Clang directly. 220 00:11:24,650 --> 00:11:27,740 Or you can come back full circle to where we were last week 221 00:11:27,740 --> 00:11:29,240 and just run "make hello." 222 00:11:29,240 --> 00:11:33,170 But there's a reason we run make hello, because executing all of those steps 223 00:11:33,170 --> 00:11:35,900 manually tends to just get tedious quickly. 224 00:11:35,900 --> 00:11:38,992 And so indeed, what we've done here is compile our code. 225 00:11:38,992 --> 00:11:41,450 And compiling means going from source code to machine code. 226 00:11:41,450 --> 00:11:44,660 But today, we revealed that there's a little more, indeed, 227 00:11:44,660 --> 00:11:46,610 going on underneath the hood, this "linking" 228 00:11:46,610 --> 00:11:49,800 that I referred to and a couple of other steps as well.
229 00:11:49,800 --> 00:11:53,900 So it turns out when you compile your code from source code to machine code, 230 00:11:53,900 --> 00:11:56,900 there's a few more steps that are ultimately involved. 231 00:11:56,900 --> 00:11:59,960 And when we say "compiling," we actually mean these four steps. 232 00:11:59,960 --> 00:12:03,350 And we're not going to dwell on these kinds of low-level details. 233 00:12:03,350 --> 00:12:05,690 But it's perhaps enlightening just to see 234 00:12:05,690 --> 00:12:09,830 a brief tour of what's going on when you start with your source code 235 00:12:09,830 --> 00:12:11,680 and end up trying to produce machine code. 236 00:12:11,680 --> 00:12:12,638 So let's consider this. 237 00:12:12,638 --> 00:12:14,660 This is step 1 that the computer is doing 238 00:12:14,660 --> 00:12:17,150 for you when you compile your code. 239 00:12:17,150 --> 00:12:19,640 So step 1 takes your own source code that 240 00:12:19,640 --> 00:12:21,260 looks a little something like this. 241 00:12:21,260 --> 00:12:24,650 And it preprocesses your code, top to bottom, left to right. 242 00:12:24,650 --> 00:12:27,170 And to preprocess your code essentially means 243 00:12:27,170 --> 00:12:30,500 that it looks for any lines that start with a hash symbol, so 244 00:12:30,500 --> 00:12:35,300 #include cs50.h, #include stdio.h. 245 00:12:35,300 --> 00:12:39,230 And what the preprocessing step does is it's kind of like a find and replace. 246 00:12:39,230 --> 00:12:42,330 It notices, oh, here's a #include line. 247 00:12:42,330 --> 00:12:49,790 Let me go ahead and copy the contents of that file, cs50.h, into your own code. 248 00:12:49,790 --> 00:12:54,290 Similarly, when I encounter #include stdio.h, let me, 249 00:12:54,290 --> 00:12:58,760 the so-called preprocessor, open that file, stdio.h, and copy/paste 250 00:12:58,760 --> 00:13:04,650 the contents of that file so that what's in the file now looks more like this. 
251 00:13:04,650 --> 00:13:06,290 So this is happening automatically. 252 00:13:06,290 --> 00:13:08,240 You never have to do this manually. 253 00:13:08,240 --> 00:13:12,290 But why is there this preprocessing step? 254 00:13:12,290 --> 00:13:16,880 If you recall our discussion last week of these lines of code that 255 00:13:16,880 --> 00:13:19,910 tend to go at the top of your file, does anyone 256 00:13:19,910 --> 00:13:24,580 perceive what the preprocessor is doing for me and why? 257 00:13:24,580 --> 00:13:29,720 Why do I write code that has these hash symbols, like #include cs50.h 258 00:13:29,720 --> 00:13:33,350 and #include stdio.h, but this preprocessor apparently 259 00:13:33,350 --> 00:13:37,415 is automatically replacing those lines with the actual contents 260 00:13:37,415 --> 00:13:38,040 of those files? 261 00:13:38,040 --> 00:13:42,740 What are these things here in yellow now? 262 00:13:42,740 --> 00:13:44,168 Yeah, Jack, what do you think? 263 00:13:44,168 --> 00:13:46,960 JACK: Is it defining all the functions for you to use in your code, 264 00:13:46,960 --> 00:13:48,740 otherwise the computer wouldn't know what to do? 265 00:13:48,740 --> 00:13:49,340 DAVID MALAN: Exactly. 266 00:13:49,340 --> 00:13:51,065 It's defining all of the functions in my code 267 00:13:51,065 --> 00:13:52,648 so that the computer knows what to do. 268 00:13:52,648 --> 00:13:56,113 Because remember that we ran into that sort of annoying bug last week, 269 00:13:56,113 --> 00:13:58,280 whereby I was trying to implement a function called, 270 00:13:58,280 --> 00:13:59,960 I think, get_positive_int. 271 00:13:59,960 --> 00:14:04,350 And recall that when I implemented that function at the bottom of my file, 272 00:14:04,350 --> 00:14:07,610 the compiler was kind of dumb in that it didn't realize 273 00:14:07,610 --> 00:14:09,440 that it existed because it was implemented 274 00:14:09,440 --> 00:14:11,220 all the way at the bottom of my file. 
275 00:14:11,220 --> 00:14:16,040 So to Jack's point, by putting a mention of this function, a hint, if you will, 276 00:14:16,040 --> 00:14:18,950 at the very top, it's like training the compiler 277 00:14:18,950 --> 00:14:22,160 to know in advance that I don't know how it's implemented yet, 278 00:14:22,160 --> 00:14:23,850 but I know get_string is going to exist. 279 00:14:23,850 --> 00:14:27,540 I don't know how it's implemented yet, but I know printf is going to exist. 280 00:14:27,540 --> 00:14:31,400 So these header files that we've been including for the past week essentially 281 00:14:31,400 --> 00:14:34,190 contain all of the prototypes-- 282 00:14:34,190 --> 00:14:38,240 that is, all of the hints for all the functions that exist in the library-- 283 00:14:38,240 --> 00:14:42,710 so that your code, when compiled, knows from the top down 284 00:14:42,710 --> 00:14:45,690 that those functions will indeed exist. 285 00:14:45,690 --> 00:14:47,690 So the preprocessor just saves us the trouble 286 00:14:47,690 --> 00:14:50,480 of having to copy and paste all of these prototypes, if you will, 287 00:14:50,480 --> 00:14:52,830 all of these hints, ourselves. 288 00:14:52,830 --> 00:14:54,950 So what happens after that step there? 289 00:14:54,950 --> 00:14:55,777 What comes next? 290 00:14:55,777 --> 00:14:57,860 Well, there might very well be other header files. 291 00:14:57,860 --> 00:15:00,152 There might very well be other contents in those files. 292 00:15:00,152 --> 00:15:03,800 But for now, let's just assume that only in there is the prototype. 293 00:15:03,800 --> 00:15:06,770 So now compiling actually has a more precise meaning 294 00:15:06,770 --> 00:15:08,000 that we'll define today. 295 00:15:08,000 --> 00:15:11,690 To compile your code now means to take this C code 296 00:15:11,690 --> 00:15:17,215 and to convert it from source code here to another type of source code here.
297 00:15:17,215 --> 00:15:20,090 Now, this is probably going to be the most cryptic stuff we ever see. 298 00:15:20,090 --> 00:15:22,190 And this is not code you need to understand. 299 00:15:22,190 --> 00:15:25,460 But what's on the screen here is what's called assembly code. 300 00:15:25,460 --> 00:15:28,550 So long story short, there's a lot of different computers in the world. 301 00:15:28,550 --> 00:15:30,650 And specifically, there's a lot of different types 302 00:15:30,650 --> 00:15:35,730 of CPUs in the world, Central Processing Units, the brains of a computer. 303 00:15:35,730 --> 00:15:39,680 And a CPU understands certain commands. 304 00:15:39,680 --> 00:15:43,880 And those commands tend to be expressed in this language called assembly code. 305 00:15:43,880 --> 00:15:46,597 Now, I honestly don't really understand most of this myself. 306 00:15:46,597 --> 00:15:49,680 It's certainly been a while even since I thought hard about assembly code. 307 00:15:49,680 --> 00:15:53,460 But if I highlight a few operative characters here, 308 00:15:53,460 --> 00:15:56,570 notice that there's mention of main, get_string, printf. 309 00:15:56,570 --> 00:16:00,170 So this is kind of like a lower-level implementation of main, 310 00:16:00,170 --> 00:16:03,420 of get_string and printf, in a different language called assembly. 311 00:16:03,420 --> 00:16:04,820 So you write the C code. 312 00:16:04,820 --> 00:16:08,630 The computer, though, converts it to a more computer-friendly language 313 00:16:08,630 --> 00:16:09,960 called assembly code. 314 00:16:09,960 --> 00:16:12,320 And decades ago, humans wrote this stuff. 315 00:16:12,320 --> 00:16:14,210 Humans wrote assembly code. 316 00:16:14,210 --> 00:16:17,585 But nowadays, we have C. And nowadays, we have languages like Python-- 317 00:16:17,585 --> 00:16:20,210 more on that in a few weeks-- that are just more user friendly, 318 00:16:20,210 --> 00:16:22,310 even if it didn't feel like that this past week.
319 00:16:22,310 --> 00:16:26,180 Assembly code is a little closer to what the computer itself understands. 320 00:16:26,180 --> 00:16:27,740 But there's still another step. 321 00:16:27,740 --> 00:16:29,240 There's this step called assembling. 322 00:16:29,240 --> 00:16:31,910 And again, all of this is happening when you simply run 323 00:16:31,910 --> 00:16:34,580 make and, in turn, this command, clang. 324 00:16:34,580 --> 00:16:39,350 To assemble your code means to take this assembly code and finally convert it 325 00:16:39,350 --> 00:16:41,720 to machine code, 0's and 1's. 326 00:16:41,720 --> 00:16:43,460 So you write the source code. 327 00:16:43,460 --> 00:16:46,700 The compiler preprocesses it. 328 00:16:46,700 --> 00:16:49,550 Then it compiles it into assembly code. 329 00:16:49,550 --> 00:16:54,650 Then it assembles it into machine code until we have the actual 0's and 1's. 330 00:16:54,650 --> 00:16:56,610 But there's actually one final step. 331 00:16:56,610 --> 00:17:00,380 Just because your code that you wrote has been converted into 0's and 1's, it 332 00:17:00,380 --> 00:17:04,369 still needs to be linked in with the 0's and 1's that CS50 wrote 333 00:17:04,369 --> 00:17:07,280 and that the designers of the C language wrote years ago 334 00:17:07,280 --> 00:17:09,680 when implementing the CS50 library in our case, 335 00:17:09,680 --> 00:17:12,470 and the printf function in their case. 336 00:17:12,470 --> 00:17:15,950 So this is to say that when you have code like this that's not only 337 00:17:15,950 --> 00:17:20,270 including the prototypes for functions like get_string and printf at the very 338 00:17:20,270 --> 00:17:24,440 top, these lines here in yellow are what are ultimately converted 339 00:17:24,440 --> 00:17:27,440 into 0's and 1's.
340 00:17:27,440 --> 00:17:32,270 We now have to combine those 0's and 1's with the 0's and 1's from cs50.c, 341 00:17:32,270 --> 00:17:35,030 which the staff wrote some time ago, and even a file 342 00:17:35,030 --> 00:17:38,588 called stdio.c, which the designers of C wrote years ago. 343 00:17:38,588 --> 00:17:40,880 And technically, it might be called something different 344 00:17:40,880 --> 00:17:41,802 underneath the hood. 345 00:17:41,802 --> 00:17:43,760 But there's really three files that are getting 346 00:17:43,760 --> 00:17:45,530 combined when you write your program. 347 00:17:45,530 --> 00:17:51,920 The first, I just claimed, once it's preprocessed and compiled 348 00:17:51,920 --> 00:17:55,760 and assembled, it's then in this form of all 0's and 1's. 349 00:17:55,760 --> 00:17:58,130 Somewhere on the CS50 IDE, there's a whole bunch 350 00:17:58,130 --> 00:18:00,800 of 0's and 1's representing cs50.c. 351 00:18:00,800 --> 00:18:03,410 Somewhere in CS50 IDE, there's another file 352 00:18:03,410 --> 00:18:08,840 representing the 0's and 1's for stdio.c So this final fourth step, a.k.a. 353 00:18:08,840 --> 00:18:13,280 linking, just takes all of my 0's and 1's, all of CS50 0's and 1's, all 354 00:18:13,280 --> 00:18:18,800 of printf's 0's and 1's, and links them all together into one big blob, 355 00:18:18,800 --> 00:18:23,870 if you will, that collectively represent your program, hello. 356 00:18:23,870 --> 00:18:26,960 So, my god, like, that's quite a mouthful and so many steps. 
357 00:18:26,960 --> 00:18:31,250 And none of the steps I have described are really germane to you implementing 358 00:18:31,250 --> 00:18:35,090 Mario's pyramid or cash or credit, because what we've really 359 00:18:35,090 --> 00:18:37,340 been doing over the past week is taking all four 360 00:18:37,340 --> 00:18:40,880 of these fairly low-level, sophisticated concepts and, if you will, 361 00:18:40,880 --> 00:18:44,720 abstracting them away so that we just refer to this whole process 362 00:18:44,720 --> 00:18:46,310 as compiling. 363 00:18:46,310 --> 00:18:48,380 So even though, yes, technically, compiling 364 00:18:48,380 --> 00:18:51,320 is just one of the four steps, what a programmer typically 365 00:18:51,320 --> 00:18:54,470 does when saying compiling is they're, just with a wave of the hand, 366 00:18:54,470 --> 00:18:58,400 referring to all of those lower-level details. 367 00:18:58,400 --> 00:19:01,700 But it is the case that there's multiple steps happening underneath the hood. 368 00:19:01,700 --> 00:19:04,610 And this is what make and, in turn, Clang are doing for you, 369 00:19:04,610 --> 00:19:08,810 automating this process of going from source code to assembly code 370 00:19:08,810 --> 00:19:13,153 to machine code and then linking it all together with any libraries you 371 00:19:13,153 --> 00:19:13,820 might have used. 372 00:19:13,820 --> 00:19:15,800 So no longer take for granted what's happening. 373 00:19:15,800 --> 00:19:17,990 Hopefully, that offers you a glimpse a bit more 374 00:19:17,990 --> 00:19:21,860 of what's actually happening when you compile your own code. 375 00:19:21,860 --> 00:19:24,800 Well, let me pause there, because that's quite a mouthful, 376 00:19:24,800 --> 00:19:29,660 and see if there's any questions on preprocessing, compiling, 377 00:19:29,660 --> 00:19:33,050 or assembling, or linking, a.k.a. 378 00:19:33,050 --> 00:19:35,120 compiling.
379 00:19:35,120 --> 00:19:37,550 And again, we won't dwell at this low level. 380 00:19:37,550 --> 00:19:40,640 We'll tend to now just abstract this all away if we can sort of agree 381 00:19:40,640 --> 00:19:42,540 that, OK, yes, there's those steps. 382 00:19:42,540 --> 00:19:45,290 But what's really important is the whole process, not the minutia. 383 00:19:45,290 --> 00:19:46,260 Sophia? 384 00:19:46,260 --> 00:19:50,060 SOPHIA: I had a question about with the first step, when 385 00:19:50,060 --> 00:19:53,720 we're replacing all the information at the top, 386 00:19:53,720 --> 00:19:56,790 is that information contained within the IDE? 387 00:19:56,790 --> 00:19:58,010 Or where do we-- 388 00:19:58,010 --> 00:20:00,375 are there files saved somewhere in that IDE, like, where 389 00:20:00,375 --> 00:20:02,000 it's getting all this information from? 390 00:20:02,000 --> 00:20:03,020 DAVID MALAN: Yeah, really good question. 391 00:20:03,020 --> 00:20:04,603 Where are all these files coming from? 392 00:20:04,603 --> 00:20:07,320 So yes, when you are using CS50 IDE, or frankly, 393 00:20:07,320 --> 00:20:09,830 if you're using your own Mac or your own PC, 394 00:20:09,830 --> 00:20:13,810 and you have preinstalled a compiler into your Mac or PC just like we have 395 00:20:13,810 --> 00:20:18,500 to CS50 IDE, what you get is a whole bunch of .h files somewhere 396 00:20:18,500 --> 00:20:19,700 on the computer system. 397 00:20:19,700 --> 00:20:23,950 You might also have a whole bunch of .c files, or compiled versions thereof, 398 00:20:23,950 --> 00:20:24,950 somewhere on the system. 399 00:20:24,950 --> 00:20:28,370 So yes, when you download and install a compiler, 400 00:20:28,370 --> 00:20:31,280 you are getting all of these libraries added for you. 401 00:20:31,280 --> 00:20:35,720 And we preinstalled an additional library called CS50's library that 402 00:20:35,720 --> 00:20:40,180 additionally comes with its own .h file and its own machine code as well. 
403 00:20:40,180 --> 00:20:43,250 So all of those files are somewhere in CS50 IDE, 404 00:20:43,250 --> 00:20:46,460 or equivalently, in your own Mac or PC if you're working locally. 405 00:20:46,460 --> 00:20:48,620 And the compiler, Clang, in this case, just 406 00:20:48,620 --> 00:20:52,370 knows how to find that because one of the steps involved in installing 407 00:20:52,370 --> 00:20:55,130 your own compiler is making sure it's configured to know, 408 00:20:55,130 --> 00:20:58,010 per Sophia's question, where all those files are. 409 00:20:58,010 --> 00:21:00,770 410 00:21:00,770 --> 00:21:02,990 [? Basili? ?] I'm sorry if I'm mispronouncing it. 411 00:21:02,990 --> 00:21:04,010 [? Basili? ?] 412 00:21:04,010 --> 00:21:06,800 [? BASILI: ?] So whenever we're compiling hello, 413 00:21:06,800 --> 00:21:11,960 for example, is the compiler also compiling, for example, CS50? 414 00:21:11,960 --> 00:21:16,387 Or does CS50 already exist in machine code somewhere beneath? 415 00:21:16,387 --> 00:21:18,220 DAVID MALAN: Yeah, really good question too. 416 00:21:18,220 --> 00:21:20,570 So I was kind of skirting this part of Sophia's question 417 00:21:20,570 --> 00:21:25,640 because technically speaking, probably cs50.c is not installed on the system. 418 00:21:25,640 --> 00:21:29,550 And technically, stdio.c is probably not installed in the system. 419 00:21:29,550 --> 00:21:30,050 Why? 420 00:21:30,050 --> 00:21:31,160 It just doesn't need to be. 421 00:21:31,160 --> 00:21:32,868 It would be kind of inefficient, that is, 422 00:21:32,868 --> 00:21:35,600 slow, if every time you compiled your own program, 423 00:21:35,600 --> 00:21:39,050 you had to additionally compile CS50's program, and stdio's program, 424 00:21:39,050 --> 00:21:40,020 and so forth. 
425 00:21:40,020 --> 00:21:42,740 So it actually stands to reason that what computers typically do 426 00:21:42,740 --> 00:21:46,490 is they precompile all of those library files for you 427 00:21:46,490 --> 00:21:48,823 so that more efficiently they can just be linked in. 428 00:21:48,823 --> 00:21:50,990 And you don't have to keep preprocessing, compiling, 429 00:21:50,990 --> 00:21:53,330 and assembling third-party code. 430 00:21:53,330 --> 00:21:57,560 You only perform those steps on your own code and then link everything together. 431 00:21:57,560 --> 00:21:59,270 And indeed, that's the case. 432 00:21:59,270 --> 00:22:01,490 It's all done in advance. 433 00:22:01,490 --> 00:22:03,800 Iris, question from you. 434 00:22:03,800 --> 00:22:07,070 IRIS: When we replace the header files with prototypes, 435 00:22:07,070 --> 00:22:10,440 are we only replacing it with the prototypes that get used? 436 00:22:10,440 --> 00:22:12,777 Or are all the prototypes technically substituted? 437 00:22:12,777 --> 00:22:15,110 DAVID MALAN: Yeah, so I was kind of sweeping that detail 438 00:22:15,110 --> 00:22:16,535 under the rug with my dot, dot, dot. 439 00:22:16,535 --> 00:22:18,618 There's a whole lot of other stuff in those files. 440 00:22:18,618 --> 00:22:21,110 You're getting the entire contents of those files, 441 00:22:21,110 --> 00:22:24,710 even if the only thing you need is the prototype. 442 00:22:24,710 --> 00:22:27,710 But, and this is why I alluded to the fact too that technically, 443 00:22:27,710 --> 00:22:30,860 there probably isn't a stdio.c file, because there 444 00:22:30,860 --> 00:22:32,630 would be so much stuff in it. 445 00:22:32,630 --> 00:22:36,140 There's probably not just one stdio.h file with everything in it. 446 00:22:36,140 --> 00:22:40,070 There's probably some smaller files that get magically included as well. 447 00:22:40,070 --> 00:22:44,300 But yes, there are many more lines of code in those files. 
448 00:22:44,300 --> 00:22:47,330 But that's OK. 449 00:22:47,330 --> 00:22:51,920 Your compiler is only going to use the lines that it actually cares about. 450 00:22:51,920 --> 00:22:53,120 Good question. 451 00:22:53,120 --> 00:22:56,450 All right, so with that said, this past week 452 00:22:56,450 --> 00:22:58,850 undoubtedly was a bit frustrating in some ways 453 00:22:58,850 --> 00:23:00,980 because you probably ran into problems. 454 00:23:00,980 --> 00:23:03,560 You ran into bugs, mistakes in your own code. 455 00:23:03,560 --> 00:23:06,165 You probably saw one or more yellow or red error messages. 456 00:23:06,165 --> 00:23:09,290 And you might have struggled a little bit just to get your code to compile. 457 00:23:09,290 --> 00:23:10,670 And again, that's normal. 458 00:23:10,670 --> 00:23:12,390 That will go away over time. 459 00:23:12,390 --> 00:23:16,320 But honestly, whenever I write C, let's say 20% of the time, 460 00:23:16,320 --> 00:23:20,400 I still have a compilation error, let alone logical errors, in my own code. 461 00:23:20,400 --> 00:23:23,240 So this is just part of the experience of writing code. 462 00:23:23,240 --> 00:23:25,370 Humans make mistakes in all forms of life. 463 00:23:25,370 --> 00:23:28,130 And that's ever more true in the context of code, where again, 464 00:23:28,130 --> 00:23:32,180 per our first two weeks precision is important as is correctness. 465 00:23:32,180 --> 00:23:35,520 And it's hard sometimes to achieve both of those goals. 466 00:23:35,520 --> 00:23:38,060 So let's consider now how you might be more 467 00:23:38,060 --> 00:23:42,590 empowered to debug your own code-- that is, find problems in your own code. 468 00:23:42,590 --> 00:23:44,750 And this word actually has some etymology. 469 00:23:44,750 --> 00:23:46,670 This isn't necessarily the first bug. 
470 00:23:46,670 --> 00:23:49,130 But perhaps the most famous bug is this one pictured 471 00:23:49,130 --> 00:23:53,060 here from the research notebook of Grace Hopper, 472 00:23:53,060 --> 00:23:56,090 a famous computer scientist, who had discovered 473 00:23:56,090 --> 00:23:59,810 that there were some problems with the Harvard Mark II computer, a very 474 00:23:59,810 --> 00:24:03,440 famous computer nowadays that actually lives over soon 475 00:24:03,440 --> 00:24:05,240 in the new engineering school on campus-- 476 00:24:05,240 --> 00:24:06,830 used to live in the Science Center. 477 00:24:06,830 --> 00:24:08,330 The computer was having problems. 478 00:24:08,330 --> 00:24:12,770 And sure enough, when the engineers took a look inside of this big mainframe 479 00:24:12,770 --> 00:24:15,770 computer, there was actually a bug, pictured here 480 00:24:15,770 --> 00:24:17,900 and taped to Grace Hopper's notebook. 481 00:24:17,900 --> 00:24:20,840 So this wasn't necessarily the first use of the term "bug," 482 00:24:20,840 --> 00:24:25,110 but it is a very well-known example of an actual bug in an actual computer. 483 00:24:25,110 --> 00:24:27,860 Nowadays, we speak a little more metaphorically that a bug is just 484 00:24:27,860 --> 00:24:29,760 a mistake in one's program. 485 00:24:29,760 --> 00:24:33,020 And we did give you a few tools last week for troubleshooting bugs. 486 00:24:33,020 --> 00:24:37,135 Help50 allows you to better understand some of the cryptic error messages. 487 00:24:37,135 --> 00:24:39,510 And that's just because the staff wrote this program that 488 00:24:39,510 --> 00:24:41,610 analyzed the problem you're having, and we try 489 00:24:41,610 --> 00:24:44,250 to translate it to just more human-friendly speak.
490 00:24:44,250 --> 00:24:47,400 We saw a tool called style50, which helps you not with your correctness, 491 00:24:47,400 --> 00:24:49,470 but just with the aesthetics of your code, 492 00:24:49,470 --> 00:24:52,020 helping you better indent things and add white space-- that 493 00:24:52,020 --> 00:24:55,050 is, blank lines or space characters-- so it's a little more user 494 00:24:55,050 --> 00:24:56,760 friendly to the human to read. 495 00:24:56,760 --> 00:24:59,130 And then check50, which, of course, the staff 496 00:24:59,130 --> 00:25:01,560 write so that we can give you immediate feedback on 497 00:25:01,560 --> 00:25:05,230 whether or not your code is correct per the problem sets or the lab 498 00:25:05,230 --> 00:25:06,450 specification. 499 00:25:06,450 --> 00:25:09,323 But there's some other tools that you should have in your toolkit. 500 00:25:09,323 --> 00:25:10,740 And we'll give those to you today. 501 00:25:10,740 --> 00:25:14,790 And one, frankly, is this universal debugging tool just called, 502 00:25:14,790 --> 00:25:16,928 in the context of C, printf. 503 00:25:16,928 --> 00:25:18,720 So printf, of course, is just this function 504 00:25:18,720 --> 00:25:20,470 that prints stuff out onto the screen. 505 00:25:20,470 --> 00:25:24,270 But that in and of itself is a wonderfully powerful tool 506 00:25:24,270 --> 00:25:26,820 via which you can chase down problems in your code. 507 00:25:26,820 --> 00:25:29,940 And even after we leave C in a few weeks and introduce 508 00:25:29,940 --> 00:25:33,690 Python and other languages, almost every programming language out there 509 00:25:33,690 --> 00:25:35,460 has some form of printf. 510 00:25:35,460 --> 00:25:36,480 Maybe it's called print. 511 00:25:36,480 --> 00:25:38,940 Maybe it's called say, as it was in Scratch, 512 00:25:38,940 --> 00:25:43,780 but some ability to display information or present information to a human. 
513 00:25:43,780 --> 00:25:47,700 So let's try to use this primitive, this notion of printf, 514 00:25:47,700 --> 00:25:49,760 to chase down a bug in one's code. 515 00:25:49,760 --> 00:25:52,950 So let me go ahead and deliberately write a buggy program. 516 00:25:52,950 --> 00:25:56,570 I'm going to even call the file buggy0.c. 517 00:25:56,570 --> 00:26:01,230 And at the top of this file, I'm going to go ahead and #include stdio.h. 518 00:26:01,230 --> 00:26:03,810 No need for the CS50 library for this one. 519 00:26:03,810 --> 00:26:06,960 And then I'm going to do int main(void), which we saw last week, 520 00:26:06,960 --> 00:26:08,700 and we'll explain in more detail today. 521 00:26:08,700 --> 00:26:10,260 And then I'm going to give myself a quick loop. 522 00:26:10,260 --> 00:26:12,990 I just want to go ahead and print out, oh, I don't know, like, 523 00:26:12,990 --> 00:26:14,580 10 hashes on the screen. 524 00:26:14,580 --> 00:26:17,430 So I want to print a vertical column, kind of like one 525 00:26:17,430 --> 00:26:20,190 of those screenshots from Super Mario Bros., not a pyramid, 526 00:26:20,190 --> 00:26:23,020 just a single column of hashes, and 10 of them. 527 00:26:23,020 --> 00:26:25,830 So I'm going to do something like, int i = 0, 528 00:26:25,830 --> 00:26:28,140 because I feel like I learned in class that I generally 529 00:26:28,140 --> 00:26:29,562 should start counting from 0. 530 00:26:29,562 --> 00:26:31,770 Then I'm going to have my condition in this for loop. 531 00:26:31,770 --> 00:26:33,190 And I want to do this 10 times. 532 00:26:33,190 --> 00:26:35,200 I'm going to do it less than or equal to 10. 533 00:26:35,200 --> 00:26:37,242 Then I'm going to go ahead and have my increment, 534 00:26:37,242 --> 00:26:39,478 which quite simply can be expressed as i++. 535 00:26:39,478 --> 00:26:42,270 And then inside this loop, I'm just going to go ahead and print out 536 00:26:42,270 --> 00:26:44,970 a single hash followed by a new line.
537 00:26:44,970 --> 00:26:46,680 I'm going to save the program. 538 00:26:46,680 --> 00:26:51,870 I'm going to compile it with clang -o buggy0 buggy0-- 539 00:26:51,870 --> 00:26:52,768 I mean, no. 540 00:26:52,768 --> 00:26:54,810 You don't have to use Clang manually in this way. 541 00:26:54,810 --> 00:26:58,453 It's a lot simpler to just abstract that away-- 542 00:26:58,453 --> 00:26:59,370 that's not a command-- 543 00:26:59,370 --> 00:27:03,330 to abstract that away and run make buggy0. 544 00:27:03,330 --> 00:27:07,475 And make will take care of the process of invoking Clang for you. 545 00:27:07,475 --> 00:27:08,850 I'm going to go ahead and run it. 546 00:27:08,850 --> 00:27:12,290 Seems to be compiling successfully, so no need for help50. 547 00:27:12,290 --> 00:27:13,890 It's already pretty well styled. 548 00:27:13,890 --> 00:27:18,420 In fact, if I run style50 on this buggy0, I don't have any comments yet. 549 00:27:18,420 --> 00:27:20,490 But at least it looks very nicely indented. 550 00:27:20,490 --> 00:27:22,000 So I think I'm OK with that. 551 00:27:22,000 --> 00:27:25,530 But let me add that comment and do "Print 10 hashes" just 552 00:27:25,530 --> 00:27:27,120 to remind myself of my goal. 553 00:27:27,120 --> 00:27:31,290 And now let me go ahead and run this, ./buggy0, Enter. 554 00:27:31,290 --> 00:27:32,670 And I see, OK, good. 555 00:27:32,670 --> 00:27:38,147 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, I think. 556 00:27:38,147 --> 00:27:39,480 All right, so it's a stupid bug. 557 00:27:39,480 --> 00:27:41,970 And maybe it's jumped out obviously to some of you. 558 00:27:41,970 --> 00:27:44,220 But maybe it's a little more subtle to others of you. 559 00:27:44,220 --> 00:27:45,250 But where do you begin? 560 00:27:45,250 --> 00:27:46,830 Suppose I were to run check50. 561 00:27:46,830 --> 00:27:51,570 And check50 were to say, nope, you printed out 11 hashes instead of 10. 
562 00:27:51,570 --> 00:27:54,210 But my code looks right to me, at least at first glance. 563 00:27:54,210 --> 00:27:57,220 Well, how can I go about debugging this or solving this? 564 00:27:57,220 --> 00:27:58,780 Well, again, printf is your friend. 565 00:27:58,780 --> 00:28:01,560 If you want to understand more about your own program, 566 00:28:01,560 --> 00:28:05,460 use printf to temporarily print more information to the screen, 567 00:28:05,460 --> 00:28:08,790 not that we want in the final version, not that your TF wants to see, 568 00:28:08,790 --> 00:28:11,340 but that you, the programmer, can temporarily see. 569 00:28:11,340 --> 00:28:14,040 So before I print this hash, let me print something 570 00:28:14,040 --> 00:28:15,900 a little more pedantic like this-- 571 00:28:15,900 --> 00:28:19,950 "i is now %i backslash n." 572 00:28:19,950 --> 00:28:25,440 So I literally want to know, just for my own mental math, what is the value of i 573 00:28:25,440 --> 00:28:28,230 at this point before I print that hash? 574 00:28:28,230 --> 00:28:30,480 Now I'm going to go ahead and paste in the value of i. 575 00:28:30,480 --> 00:28:32,640 So I'm using %i as a placeholder. 576 00:28:32,640 --> 00:28:35,100 I'm plugging in the value of the variable i. 577 00:28:35,100 --> 00:28:36,870 I'm going to save my code now. 578 00:28:36,870 --> 00:28:39,750 I'm going to recompile it with make buggy0. 579 00:28:39,750 --> 00:28:41,380 And I'm going to rerun it now. 580 00:28:41,380 --> 00:28:43,680 And let me go ahead and increase the size of my window 581 00:28:43,680 --> 00:28:46,390 just so we can focus now on the output. 582 00:28:46,390 --> 00:28:50,440 And I'm going to go ahead and ./buggy0, Enter. 583 00:28:50,440 --> 00:28:53,670 OK, so now I see not only my output, but also 584 00:28:53,670 --> 00:28:56,670 commingled with that output, some diagnostic output, if you will, 585 00:28:56,670 --> 00:28:58,080 some debugging output. 
586 00:28:58,080 --> 00:29:02,430 And it's just more pedantically telling me, "i is now 0," "i is now 1," 587 00:29:02,430 --> 00:29:08,490 "i is now 2," dot, dot, dot, "i is now 9," "i is now 10." 588 00:29:08,490 --> 00:29:11,040 OK, I don't hate the fact that i is 10. 589 00:29:11,040 --> 00:29:15,630 But I'm not loving the fact that if I started at 0 and printed a hash, 590 00:29:15,630 --> 00:29:19,140 and I'm hitting 10 and printing another hash, well, obviously, 591 00:29:19,140 --> 00:29:20,180 there's my problem. 592 00:29:20,180 --> 00:29:22,620 So it might not have been all that much more obvious 593 00:29:22,620 --> 00:29:24,030 than looking at the code itself. 594 00:29:24,030 --> 00:29:27,090 But by using printf, you can just be a lot more clear 595 00:29:27,090 --> 00:29:28,900 to yourself what's going on. 596 00:29:28,900 --> 00:29:32,490 So if now I see, OK, well, if I start at 0, I have to go up to 10. 597 00:29:32,490 --> 00:29:35,100 I could change my code to do this to be less than 10. 598 00:29:35,100 --> 00:29:38,040 I could leave that alone and go from 1 through 10. 599 00:29:38,040 --> 00:29:41,920 But again, programmer convention would be to go from 0 up to 10. 600 00:29:41,920 --> 00:29:43,140 So I think I'm good now. 601 00:29:43,140 --> 00:29:46,662 And in fact, now I'll go ahead and recompile this, make buggy0. 602 00:29:46,662 --> 00:29:49,620 Let me go ahead and increase the size of the window again just so I can 603 00:29:49,620 --> 00:29:53,980 temporarily see this and ./buggy0. 604 00:29:53,980 --> 00:29:57,460 OK, I start now at 0, 1, 2, dot, dot, dot. 605 00:29:57,460 --> 00:29:59,160 Now I stop at 9. 606 00:29:59,160 --> 00:30:01,080 And that, of course, gives me 10 hashes. 607 00:30:01,080 --> 00:30:03,343 So again, I don't need this in the final output. 608 00:30:03,343 --> 00:30:05,010 And I'm going to go ahead and delete this now. 609 00:30:05,010 --> 00:30:06,510 It's temporary output.
610 00:30:06,510 --> 00:30:08,760 But again, having those instincts-- if you don't quite 611 00:30:08,760 --> 00:30:12,120 understand why your code is compiling but not running properly, 612 00:30:12,120 --> 00:30:15,360 and you want to better see what the computer is clearly seeing, 613 00:30:15,360 --> 00:30:18,930 its mind's eye, use printf to just tell yourself 614 00:30:18,930 --> 00:30:23,790 what the value of some variable or variables are anywhere in your code 615 00:30:23,790 --> 00:30:26,732 that you want to see a little more detail. 616 00:30:26,732 --> 00:30:28,440 All right, let me pause for just a moment 617 00:30:28,440 --> 00:30:32,220 to see if there's any questions on this technique of just using printf 618 00:30:32,220 --> 00:30:37,830 to begin to debug your code and to see the values of variables 619 00:30:37,830 --> 00:30:40,560 in a way that's a little more explicit. 620 00:30:40,560 --> 00:30:43,980 621 00:30:43,980 --> 00:30:44,580 No? 622 00:30:44,580 --> 00:30:45,670 All right. 623 00:30:45,670 --> 00:30:50,130 Well, let me propose an even more powerful tool that admittedly 624 00:30:50,130 --> 00:30:51,480 takes a little getting used to. 625 00:30:51,480 --> 00:30:54,000 But this is kind of one of those lessons, 626 00:30:54,000 --> 00:30:58,350 trust me, if you will, that if you spend a few more minutes, maybe even 627 00:30:58,350 --> 00:31:01,320 an hour or so this week, learning the following tool, 628 00:31:01,320 --> 00:31:04,500 you will save yourself hours, plural, maybe even 629 00:31:04,500 --> 00:31:07,440 tens of hours over the course of the next many weeks 630 00:31:07,440 --> 00:31:12,520 because this tool can help you truly see what's going on inside of your code. 631 00:31:12,520 --> 00:31:15,870 So this tool we're going to add to the list today is called debug50.
632 00:31:15,870 --> 00:31:20,130 And while this one does end with 50, implying that it's a CS50 tool, 633 00:31:20,130 --> 00:31:24,450 it's built on top of an industry standard tool known as GDB, the GNU 634 00:31:24,450 --> 00:31:27,960 DeBugger, that's a standard tool that a lot of different computer systems 635 00:31:27,960 --> 00:31:32,520 use to provide you with the ability to debug your code in a more sophisticated 636 00:31:32,520 --> 00:31:35,530 way than just using printf alone. 637 00:31:35,530 --> 00:31:36,780 So let's go ahead and do this. 638 00:31:36,780 --> 00:31:39,360 Let me go back to the buggy version of this program 639 00:31:39,360 --> 00:31:43,620 which, recall, had me going from 0 through 10, which was too many steps. 640 00:31:43,620 --> 00:31:47,850 A moment ago, I proposed that we just use printf to see the value of i. 641 00:31:47,850 --> 00:31:50,640 But frankly, the bigger our programs get, the more complicated 642 00:31:50,640 --> 00:31:53,730 they get, the more output they need to have on the screen. 643 00:31:53,730 --> 00:31:56,250 It's just going to get very messy quickly 644 00:31:56,250 --> 00:31:58,800 if you're printing out stuff that shouldn't be there, right? 645 00:31:58,800 --> 00:31:59,910 Think back to Mario. 646 00:31:59,910 --> 00:32:03,060 Mario's pyramid is this sort of graphical output. 647 00:32:03,060 --> 00:32:07,860 And it would very quickly get ugly and kind of hard to understand your pyramid 648 00:32:07,860 --> 00:32:11,520 if you're comingling that pyramid with actual textual output from printf 649 00:32:11,520 --> 00:32:12,430 as well. 650 00:32:12,430 --> 00:32:16,560 So debug50, and in turn a debugger in any language, 651 00:32:16,560 --> 00:32:20,580 is a tool that allows you to run your code step by step 652 00:32:20,580 --> 00:32:26,550 and look inside of variables and other pieces of memory inside of the computer 653 00:32:26,550 --> 00:32:28,080 while your program is running. 
654 00:32:28,080 --> 00:32:31,800 Right now, pretty much every program we run takes a split second to run. 655 00:32:31,800 --> 00:32:34,170 That's way too fast for me, the human, to wrap my mind 656 00:32:34,170 --> 00:32:36,330 around what's going on step by step. 657 00:32:36,330 --> 00:32:38,550 A debugger allows you to run your program, 658 00:32:38,550 --> 00:32:42,970 but much more slowly, step by step, so you can see what's going on. 659 00:32:42,970 --> 00:32:48,030 So I'm going to go ahead now and run debug50 ./hello. 660 00:32:48,030 --> 00:32:52,380 No, sorry, debug50 ./buggy0. 661 00:32:52,380 --> 00:32:54,900 So I write debug50 first, a space, and then 662 00:32:54,900 --> 00:32:56,910 dot slash and the name of the program that's 663 00:32:56,910 --> 00:32:59,785 already compiled that I want to debug. 664 00:32:59,785 --> 00:33:01,410 So I'm going to go ahead and hit Enter. 665 00:33:01,410 --> 00:33:03,240 And notice that, oh, it was smart. 666 00:33:03,240 --> 00:33:05,100 It noticed that I changed my code. 667 00:33:05,100 --> 00:33:06,060 And I did a moment ago. 668 00:33:06,060 --> 00:33:07,740 I reverted it back to the buggy version. 669 00:33:07,740 --> 00:33:10,380 So let me fix this-- make buggy0. 670 00:33:10,380 --> 00:33:11,620 All right, no errors. 671 00:33:11,620 --> 00:33:13,500 Now let me go ahead and run debug50 again. 672 00:33:13,500 --> 00:33:17,280 And if you haven't noticed this already, sometimes I seem to type crazy fast. 673 00:33:17,280 --> 00:33:19,180 I'm not necessarily typing that fast. 674 00:33:19,180 --> 00:33:21,960 I'm going through my history in CS50 IDE. 675 00:33:21,960 --> 00:33:25,470 Using your arrow keys, Up and Down, you can scroll back 676 00:33:25,470 --> 00:33:29,070 in time for all of the commands you've typed over the past few minutes 677 00:33:29,070 --> 00:33:30,430 or hours or even days. 678 00:33:30,430 --> 00:33:32,430 And this will just start to save you keystrokes. 
679 00:33:32,430 --> 00:33:33,870 So I'm going to go ahead and hit Up. 680 00:33:33,870 --> 00:33:36,495 And now I don't have to bother typing this whole command again. 681 00:33:36,495 --> 00:33:38,320 It's a helpful way to just save time. 682 00:33:38,320 --> 00:33:40,800 I'm going to go ahead now and hit Enter. 683 00:33:40,800 --> 00:33:43,650 And now notice this error message-- 684 00:33:43,650 --> 00:33:45,050 I haven't set any breakpoints. 685 00:33:45,050 --> 00:33:48,300 "Set at least one breakpoint by clicking to the left of a line number and then 686 00:33:48,300 --> 00:33:49,500 re-run debug50!" 687 00:33:49,500 --> 00:33:51,420 Well, what's going on here? 688 00:33:51,420 --> 00:33:55,620 Well, debug50 needs me to tell the computer in advance at what line 689 00:33:55,620 --> 00:33:59,910 I want to break into and step through step by step. 690 00:33:59,910 --> 00:34:01,020 So, I can do that. 691 00:34:01,020 --> 00:34:03,780 I'm going to go over to the side of the file here, as it says. 692 00:34:03,780 --> 00:34:04,530 And you know what? 693 00:34:04,530 --> 00:34:08,460 The first interesting line is this one here, line 6. 694 00:34:08,460 --> 00:34:12,060 So I clicked in the so-called gutter, the left-hand side of the screen, 695 00:34:12,060 --> 00:34:13,170 on line 6. 696 00:34:13,170 --> 00:34:16,139 And that automatically put a red dot there, like a stop sign. 697 00:34:16,139 --> 00:34:21,420 Now, one last time, I'm going to go ahead and run debug50 ./buggy0 and hit 698 00:34:21,420 --> 00:34:21,960 Enter. 699 00:34:21,960 --> 00:34:25,887 And now notice this fancy new panel opens up on the right-hand side. 700 00:34:25,887 --> 00:34:27,929 And it's going to look a little cryptic at first. 701 00:34:27,929 --> 00:34:30,219 But let's consider what has changed on the screen. 702 00:34:30,219 --> 00:34:34,440 Notice now that highlighted in this sort of off-yellow color is line 6.
703 00:34:34,440 --> 00:34:37,949 And that's because what debug50 is doing is it's running my program, 704 00:34:37,949 --> 00:34:41,610 but it has paused execution on line 6. 705 00:34:41,610 --> 00:34:44,100 So it's done everything from line 1 through 5, 706 00:34:44,100 --> 00:34:46,860 but now it's waiting for me on line 6. 707 00:34:46,860 --> 00:34:49,620 And what's interesting over here is this-- let 708 00:34:49,620 --> 00:34:51,929 me zoom in on this window over here. 709 00:34:51,929 --> 00:34:54,150 And there's a lot going on here, admittedly. 710 00:34:54,150 --> 00:34:59,190 But let's focus for just a moment not on Watch Expressions, not on Call Stack, 711 00:34:59,190 --> 00:35:00,850 but only on Local Variables. 712 00:35:00,850 --> 00:35:04,380 And notice, I have a variable called i whose initial value is 0, 713 00:35:04,380 --> 00:35:05,820 and it's of type int. 714 00:35:05,820 --> 00:35:09,150 Now, this is kind of interesting because watch what I can do via these icons 715 00:35:09,150 --> 00:35:09,930 up here. 716 00:35:09,930 --> 00:35:15,360 I can click on this Step Over line and start to step through my code line 717 00:35:15,360 --> 00:35:16,007 by line. 718 00:35:16,007 --> 00:35:17,340 So let me go ahead and zoom out. 719 00:35:17,340 --> 00:35:18,870 Let me go ahead and click Step Over. 720 00:35:18,870 --> 00:35:21,180 And watch what happens to the yellow highlighting. 721 00:35:21,180 --> 00:35:23,140 It moves down to the next line. 722 00:35:23,140 --> 00:35:27,090 But notice, if I zoom in again up here, the value of i has not changed. 723 00:35:27,090 --> 00:35:29,460 Now let me go ahead and step over again. 724 00:35:29,460 --> 00:35:31,740 And notice the yellow highlighting doubles back. 725 00:35:31,740 --> 00:35:33,790 That makes sense because I'm in a loop. 726 00:35:33,790 --> 00:35:36,760 So it should be going back and forth, back and forth. 727 00:35:36,760 --> 00:35:38,123 But what next happens in a loop? 
728 00:35:38,123 --> 00:35:40,290 Every time you go back to the beginning of the loop, 729 00:35:40,290 --> 00:35:43,770 remember that your incrementation happens, like the i++. 730 00:35:43,770 --> 00:35:46,530 So watch now closely in the top right-hand corner, 731 00:35:46,530 --> 00:35:52,110 when I Step Over now, notice that the value of i in my debugger 732 00:35:52,110 --> 00:35:54,058 has just been changed to 1. 733 00:35:54,058 --> 00:35:55,350 So I didn't have to use printf. 734 00:35:55,350 --> 00:35:57,400 I didn't have to mess up the output of my screen. 735 00:35:57,400 --> 00:35:59,850 I can literally see in this GUI, this Graphical User 736 00:35:59,850 --> 00:36:02,790 Interface on the right-hand side, what the value of i is. 737 00:36:02,790 --> 00:36:05,310 Now if I just start clicking a little more quickly, 738 00:36:05,310 --> 00:36:09,900 notice that as the loop is executing, again and again, the value of i 739 00:36:09,900 --> 00:36:11,070 keeps getting updated. 740 00:36:11,070 --> 00:36:11,820 And you know what? 741 00:36:11,820 --> 00:36:15,930 I bet, even though we started at 0, if I do this enough times, 742 00:36:15,930 --> 00:36:18,990 I will see that the value is 10 now, thereby 743 00:36:18,990 --> 00:36:25,110 giving me another printf at the bottom, thereby explaining the 11 total hashes 744 00:36:25,110 --> 00:36:25,950 that I saw. 745 00:36:25,950 --> 00:36:28,450 So I haven't gotten any new information here. 746 00:36:28,450 --> 00:36:30,960 But notice I've gotten unperturbed information. 747 00:36:30,960 --> 00:36:35,370 I've not messily and sloppily printed out all of these printf statements 748 00:36:35,370 --> 00:36:36,100 on the screen. 749 00:36:36,100 --> 00:36:38,430 I'm just kind of watching a little more methodically 750 00:36:38,430 --> 00:36:43,230 what's happening to the state of my variable over on the top right there. 
751 00:36:43,230 --> 00:36:47,700 All right, let me pause here too to see if there's any questions on what 752 00:36:47,700 --> 00:36:49,230 this debugger does. 753 00:36:49,230 --> 00:36:51,150 Again, you compile your code. 754 00:36:51,150 --> 00:36:56,340 You run debug50 on your code, but only after setting a so-called breakpoint, 755 00:36:56,340 --> 00:37:00,575 where you decide in advance where do you want to pause execution of your code. 756 00:37:00,575 --> 00:37:03,450 Even though here I did it pretty much at the beginning of my program, 757 00:37:03,450 --> 00:37:05,242 for bigger programs, it's going to be super 758 00:37:05,242 --> 00:37:07,718 convenient to be able to pause halfway through your code 759 00:37:07,718 --> 00:37:09,510 and not have to go through the whole thing. 760 00:37:09,510 --> 00:37:11,430 Peter, question. 761 00:37:11,430 --> 00:37:16,350 PETER: About the debugger, what's the difference between Step Over 762 00:37:16,350 --> 00:37:18,813 and Step Into and Step Out and-- 763 00:37:18,813 --> 00:37:20,230 DAVID MALAN: Really good question. 764 00:37:20,230 --> 00:37:21,980 Let me come back to that in just a moment, 765 00:37:21,980 --> 00:37:25,800 because we'll do one other example where Step Into and Step Out actually 766 00:37:25,800 --> 00:37:27,520 are germane. 767 00:37:27,520 --> 00:37:28,490 But before we do that. 768 00:37:28,490 --> 00:37:33,520 Any other questions about debug50 before we reveal what Step Into and Step 769 00:37:33,520 --> 00:37:35,335 Over do for us as well? 770 00:37:35,335 --> 00:37:38,940 771 00:37:38,940 --> 00:37:39,910 Oh, all right. 772 00:37:39,910 --> 00:37:42,310 Well, let's take Peter's question right there. 773 00:37:42,310 --> 00:37:44,705 Let me go ahead now and get out of the debugger. 774 00:37:44,705 --> 00:37:46,830 And honestly, I don't see an obvious way to get out 775 00:37:46,830 --> 00:37:48,490 of the debugger at the moment. 
776 00:37:48,490 --> 00:37:51,240 But Control-C is your new friend today too. 777 00:37:51,240 --> 00:37:53,700 Pretty much any time you lose control of a program 778 00:37:53,700 --> 00:37:56,880 because the debugger's running, and you've lost interest in it, 779 00:37:56,880 --> 00:37:58,770 or maybe last week, you wrote a program that 780 00:37:58,770 --> 00:38:01,800 has an infinite loop that just keeps going and going and going, 781 00:38:01,800 --> 00:38:04,110 Control-C will break out of that program. 782 00:38:04,110 --> 00:38:07,290 But let's now write quickly another program that, this time, 783 00:38:07,290 --> 00:38:08,430 has a second function. 784 00:38:08,430 --> 00:38:10,800 And we'll see one other feature of the debugger today. 785 00:38:10,800 --> 00:38:14,520 I'm going to go ahead and create a new file now called buggy1.c. 786 00:38:14,520 --> 00:38:16,470 Again, it's going to be deliberately flawed. 787 00:38:16,470 --> 00:38:20,280 But I'm first going to go ahead and #include cs50.h this time. 788 00:38:20,280 --> 00:38:22,830 And I'm going to #include stdio.h. 789 00:38:22,830 --> 00:38:24,590 I'm going to do int main void. 790 00:38:24,590 --> 00:38:27,090 And I'm going to go ahead and do the following-- give myself 791 00:38:27,090 --> 00:38:28,380 a variable called i. 792 00:38:28,380 --> 00:38:31,290 And I'm going to try to get a negative int by calling 793 00:38:31,290 --> 00:38:33,180 a function called get_negative_int. 794 00:38:33,180 --> 00:38:37,740 And then quite simply, I'm going to print out this value, "%i backslash n", 795 00:38:37,740 --> 00:38:39,210 i, semicolon. 796 00:38:39,210 --> 00:38:40,860 Now, there's only one problem-- 797 00:38:40,860 --> 00:38:43,210 get_negative_int does not exist. 798 00:38:43,210 --> 00:38:45,870 So like last week, where we implemented get_positive_int, 799 00:38:45,870 --> 00:38:47,790 this week, I'll implement get_negative_int.
800 00:38:47,790 --> 00:38:49,890 But I'm going to do it incorrectly at first. 801 00:38:49,890 --> 00:38:54,300 Now, get_negative_int, as the name implies, needs to return an integer. 802 00:38:54,300 --> 00:38:57,330 And even though we only spent brief time on this last week, 803 00:38:57,330 --> 00:39:00,210 recall that you can specify the output of a function, 804 00:39:00,210 --> 00:39:03,720 a custom function that you wrote, by putting its so-called return 805 00:39:03,720 --> 00:39:05,555 value first on this line. 806 00:39:05,555 --> 00:39:08,430 And then you can put the name of the function, like get_negative_int, 807 00:39:08,430 --> 00:39:11,940 and then in parentheses, you can put the input to the function. 808 00:39:11,940 --> 00:39:15,030 But if it takes no input, you can literally write the word "void," 809 00:39:15,030 --> 00:39:17,965 which is a term of art that just means, nothing goes here. 810 00:39:17,965 --> 00:39:20,340 I'm going to go ahead now and implement get_negative_int. 811 00:39:20,340 --> 00:39:22,920 And frankly, I think it's going to be pretty similar to last week. 812 00:39:22,920 --> 00:39:24,212 But my memory is a little hazy. 813 00:39:24,212 --> 00:39:26,310 So again, it will be deliberately flawed. 814 00:39:26,310 --> 00:39:29,130 But I'm going to go ahead and declare a variable called n. 815 00:39:29,130 --> 00:39:31,420 Then I'm going to do the following-- 816 00:39:31,420 --> 00:39:34,170 I'm going to set n equal to get_int. 817 00:39:34,170 --> 00:39:39,000 And I'm just going to explicitly ask the user for "Negative integer" followed 818 00:39:39,000 --> 00:39:39,900 by a space. 819 00:39:39,900 --> 00:39:44,220 And then I'm going to keep doing this while n is less than 0. 820 00:39:44,220 --> 00:39:48,540 And then at the very last line, I'm going to return n. 821 00:39:48,540 --> 00:39:51,120 So again, I claim that this function will 822 00:39:51,120 --> 00:39:53,340 get me a negative int from the user. 
823 00:39:53,340 --> 00:39:57,810 And it's going to keep doing it again and again until the user cooperates. 824 00:39:57,810 --> 00:40:00,720 However, there is a bug. 825 00:40:00,720 --> 00:40:02,730 And there's a couple of bugs, in fact. 826 00:40:02,730 --> 00:40:06,720 Right now, let me go ahead and make a deliberate mistake-- make buggy1, 827 00:40:06,720 --> 00:40:07,740 Enter. 828 00:40:07,740 --> 00:40:10,020 And I see a whole bunch of errors here. 829 00:40:10,020 --> 00:40:12,300 I could use help50 on this. 830 00:40:12,300 --> 00:40:16,290 But based on last week, does anyone recall what the error here might be? 831 00:40:16,290 --> 00:40:20,100 "Error-- implicit declaration of function 'get_negative_int' 832 00:40:20,100 --> 00:40:21,930 is invalid in C99." 833 00:40:21,930 --> 00:40:24,992 So I don't know all of that, but implicit declaration of function 834 00:40:24,992 --> 00:40:26,700 is something you're going to start to see 835 00:40:26,700 --> 00:40:28,770 more often if you make this mistake. 836 00:40:28,770 --> 00:40:35,030 Anyone recall what this means and what the fix is without resorting to help50? 837 00:40:35,030 --> 00:40:37,760 Yeah, Jasmine, what do you think? 838 00:40:37,760 --> 00:40:40,370 JASMINE: So basically, since you declared it 839 00:40:40,370 --> 00:40:42,830 after you already used it in your code, it 840 00:40:42,830 --> 00:40:46,050 doesn't know what to read that as when it's processing it. 841 00:40:46,050 --> 00:40:49,825 So you have to move the first line above when you actually start the code. 842 00:40:49,825 --> 00:40:50,700 DAVID MALAN: Perfect. 843 00:40:50,700 --> 00:40:53,690 And this is the only time I will claim that copy/paste 844 00:40:53,690 --> 00:40:55,730 is acceptable and encouraged. 845 00:40:55,730 --> 00:40:59,180 I'm going to copy the very first line only of that function. 
846 00:40:59,180 --> 00:41:02,840 And as Jasmine proposed, I'm going to paste it at the very top of the file, 847 00:41:02,840 --> 00:41:05,990 thereby giving myself a hint otherwise known as a prototype. 848 00:41:05,990 --> 00:41:09,290 So I'll even label it as such to remind myself why it's there-- 849 00:41:09,290 --> 00:41:11,720 prototype of that function. 850 00:41:11,720 --> 00:41:16,790 And here, I'm going to go ahead and "Get negative integer from user." 851 00:41:16,790 --> 00:41:20,720 And then this function is as left as written. 852 00:41:20,720 --> 00:41:23,340 So I now have this prototype at the very top 853 00:41:23,340 --> 00:41:25,840 of my file, which I think will indeed get rid of this error. 854 00:41:25,840 --> 00:41:27,950 Let me go to make buggy1 again. 855 00:41:27,950 --> 00:41:29,960 Now I see that indeed compiled OK. 856 00:41:29,960 --> 00:41:33,110 But when I run it now, ./buggy1-- 857 00:41:33,110 --> 00:41:36,270 let me go ahead and input a negative integer, negative 1. 858 00:41:36,270 --> 00:41:36,770 Hm. 859 00:41:36,770 --> 00:41:38,685 Negative 2, negative 3-- 860 00:41:38,685 --> 00:41:41,810 I feel like the function should be happy with this, and it's obviously not. 861 00:41:41,810 --> 00:41:42,650 So there's a bug. 862 00:41:42,650 --> 00:41:45,470 I'm going to go ahead and hit Control-C to get out of my program 863 00:41:45,470 --> 00:41:47,810 because otherwise, it would run potentially forever. 864 00:41:47,810 --> 00:41:49,610 And now I'm going to use debug50. 865 00:41:49,610 --> 00:41:53,090 But debug50 just got really interesting, to Peter's question 866 00:41:53,090 --> 00:41:56,180 earlier, because now I have things I can step into. 867 00:41:56,180 --> 00:41:58,070 I'm not writing all of my code in main. 868 00:41:58,070 --> 00:42:00,570 There's this other function now called get_negative_int. 869 00:42:00,570 --> 00:42:02,390 So let's see what happens now. 
870 00:42:02,390 --> 00:42:05,930 Let me go ahead and set a breakpoint on the first interesting line of code, 871 00:42:05,930 --> 00:42:06,532 line 10. 872 00:42:06,532 --> 00:42:08,990 And it's interesting only in the sense that everything else 873 00:42:08,990 --> 00:42:10,670 is kind of boilerplate at this point. 874 00:42:10,670 --> 00:42:13,460 You just have to do it to get your program started. 875 00:42:13,460 --> 00:42:15,020 I'm going to now go down here. 876 00:42:15,020 --> 00:42:18,770 And I'm going to do debug50 ./buggy1. 877 00:42:18,770 --> 00:42:22,220 And in a moment, it's going to open up that sidebar. 878 00:42:22,220 --> 00:42:25,640 And I'm going to focus now not only on local variables-- 879 00:42:25,640 --> 00:42:29,810 like I did before, notice that i is again equal to 0 here by default. 880 00:42:29,810 --> 00:42:33,560 But I'm also going to reveal this option here, Call Stack. 881 00:42:33,560 --> 00:42:38,120 So Call Stack is a fancy way of referring to all of the functions 882 00:42:38,120 --> 00:42:43,560 that your program at this point in time has executed and not yet returned from. 883 00:42:43,560 --> 00:42:45,890 So right now, there's only one thing on the call stack 884 00:42:45,890 --> 00:42:49,700 because the only function that is currently executing is, of course, 885 00:42:49,700 --> 00:42:50,930 main, because why? 886 00:42:50,930 --> 00:42:55,040 I set a breakpoint at line 10, which is, by definition, inside of main. 887 00:42:55,040 --> 00:42:59,720 But to Peter's question earlier, I feel like lines 10 and 11-- 888 00:42:59,720 --> 00:43:01,550 frankly, they look pretty correct, right? 889 00:43:01,550 --> 00:43:03,470 It's hard at this point to have screwed up 890 00:43:03,470 --> 00:43:07,400 lines 10 and 11 except syntactically, because I'm getting a negative int. 891 00:43:07,400 --> 00:43:10,730 I'm storing it in i, and then I'm printing out the value of i 892 00:43:10,730 --> 00:43:12,230 on those two lines. 
893 00:43:12,230 --> 00:43:16,370 But what if instead, I'm curious about get_negative_int? 894 00:43:16,370 --> 00:43:18,350 I feel like the bug-- logically, it's got 895 00:43:18,350 --> 00:43:21,170 to be in there because that's the harder code that I wrote. 896 00:43:21,170 --> 00:43:24,530 Notice this time, instead of clicking Step Over, 897 00:43:24,530 --> 00:43:28,640 let me go ahead and click on Step Into, which is one of the buttons Peter 898 00:43:28,640 --> 00:43:29,240 alluded to. 899 00:43:29,240 --> 00:43:33,440 And when I click Step Into, notice that you sort of go down the rabbit hole. 900 00:43:33,440 --> 00:43:38,460 And debug50 jumps into the function get_negative_int, 901 00:43:38,460 --> 00:43:41,460 and it focuses on the first interesting line of code. 902 00:43:41,460 --> 00:43:44,070 So do, in and of itself, really isn't that interesting. 903 00:43:44,070 --> 00:43:46,160 Int n isn't that interesting because it's not 904 00:43:46,160 --> 00:43:48,020 assigning a value to it even yet. 905 00:43:48,020 --> 00:43:50,930 The first juicy line of code seems to be line 19. 906 00:43:50,930 --> 00:43:53,150 And that's why the debugger has jumped to that line. 907 00:43:53,150 --> 00:43:57,350 Now, n = get_int feels pretty correct. 908 00:43:57,350 --> 00:43:59,300 It's hard to misuse get_int. 909 00:43:59,300 --> 00:44:02,420 But notice now on the right-hand side what has happened. 910 00:44:02,420 --> 00:44:06,500 Under Call Stack, you now see two things, not only main, 911 00:44:06,500 --> 00:44:08,930 but also get_negative_int in a stack. 912 00:44:08,930 --> 00:44:11,030 It's like a stack of trays in a cafeteria. 913 00:44:11,030 --> 00:44:13,250 The first tray at the bottom is like main. 914 00:44:13,250 --> 00:44:17,750 The second tray on the stack in the cafeteria is now get_negative_int. 
915 00:44:17,750 --> 00:44:21,680 And what's cool about this is that notice that right now, I 916 00:44:21,680 --> 00:44:23,630 can see my local variables, n. 917 00:44:23,630 --> 00:44:25,380 And that's indeed the variable I used. 918 00:44:25,380 --> 00:44:26,750 So I no longer see i. 919 00:44:26,750 --> 00:44:30,780 I see n because I'm into the get_negative_int function. 920 00:44:30,780 --> 00:44:35,030 And now if I keep clicking Step Over again and again 921 00:44:35,030 --> 00:44:36,140 after typing in a number. 922 00:44:36,140 --> 00:44:38,210 Let me type in negative 1 here. 923 00:44:38,210 --> 00:44:41,540 Now notice on the top right of the screen, you can see in the debugger 924 00:44:41,540 --> 00:44:43,280 that n equals negative 1. 925 00:44:43,280 --> 00:44:45,830 I'm going to now go ahead and click Step Over. 926 00:44:45,830 --> 00:44:48,680 And I think I'm going to end up in line 22. 927 00:44:48,680 --> 00:44:51,920 If the human has typed in a negative integer like negative 1, 928 00:44:51,920 --> 00:44:53,480 obviously, that's negative. 929 00:44:53,480 --> 00:44:55,160 Let's proceed to line 22. 930 00:44:55,160 --> 00:44:58,310 But watch what happens when I click Step Over. 931 00:44:58,310 --> 00:45:03,740 It actually seems to be going back to the do loop again and again 932 00:45:03,740 --> 00:45:06,750 and again, as it will if I keep providing negative integers. 933 00:45:06,750 --> 00:45:10,670 So my logic then should be, well, OK, if n is negative 1, 934 00:45:10,670 --> 00:45:17,030 but my loop is still running, what should your logical takeaway here be? 935 00:45:17,030 --> 00:45:20,710 If n is negative 1, and that is by definition a negative integer, 936 00:45:20,710 --> 00:45:25,720 but my loop is still running, what could be your diagnostic conclusion 937 00:45:25,720 --> 00:45:29,860 if the debugger is essentially revealing this hint to you? n is negative 1, 938 00:45:29,860 --> 00:45:31,420 but the loop is still going.
939 00:45:31,420 --> 00:45:33,730 Omar, what would you conclude? 940 00:45:33,730 --> 00:45:36,850 OMAR: Either the condition is wrong, or maybe some sort of Boolean logic 941 00:45:36,850 --> 00:45:37,755 could be flawed. 942 00:45:37,755 --> 00:45:38,630 DAVID MALAN: Perfect. 943 00:45:38,630 --> 00:45:40,463 So obviously, either the condition is wrong, 944 00:45:40,463 --> 00:45:42,505 or there's something wrong with my Boolean logic. 945 00:45:42,505 --> 00:45:44,540 And Boolean logic just refers to true or false. 946 00:45:44,540 --> 00:45:46,930 So somewhere, I'm saying true instead of false, 947 00:45:46,930 --> 00:45:48,850 or I'm saying false instead of true. 948 00:45:48,850 --> 00:45:52,060 And frankly, the only place where I have code 949 00:45:52,060 --> 00:45:56,050 that's going to make this loop go again and again must logically be on line 21. 950 00:45:56,050 --> 00:45:59,350 So even if you're not quite sure how to fix it yet, just by deduction, 951 00:45:59,350 --> 00:46:02,215 you should realize that, OK, negative 1 is what's in the variable. 952 00:46:02,215 --> 00:46:03,340 But that's not good enough. 953 00:46:03,340 --> 00:46:04,340 The loop is still going. 954 00:46:04,340 --> 00:46:05,680 I must have screwed up the loop. 955 00:46:05,680 --> 00:46:08,080 And indeed, let me just now call it out. 956 00:46:08,080 --> 00:46:11,290 Line 21 is indeed the source of the bug. 957 00:46:11,290 --> 00:46:12,520 So we've isolated it. 958 00:46:12,520 --> 00:46:15,160 Out of 23 lines, we've at least found the one line 959 00:46:15,160 --> 00:46:18,520 where I know the solution has to be. 960 00:46:18,520 --> 00:46:19,610 What's the solution? 961 00:46:19,610 --> 00:46:26,020 How do I fix the logic now thanks to the debugger having led me down this road? 962 00:46:26,020 --> 00:46:29,230 How do I fix line 21 here? 963 00:46:29,230 --> 00:46:31,350 What's the fix? 964 00:46:31,350 --> 00:46:33,960 What do you propose? 
965 00:46:33,960 --> 00:46:35,220 Yeah, Jacob? 966 00:46:35,220 --> 00:46:38,700 JACOB: You would have to change it from while n is less than 0 967 00:46:38,700 --> 00:46:40,345 to while n is greater than 0. 968 00:46:40,345 --> 00:46:41,220 DAVID MALAN: Exactly. 969 00:46:41,220 --> 00:46:44,640 So instead of n less than 0, I want to say n greater than 0. 970 00:46:44,640 --> 00:46:46,860 And I think-- slight clarification, I think 971 00:46:46,860 --> 00:46:50,328 I want to include 0 here because 0 is not negative. 972 00:46:50,328 --> 00:46:52,620 And if I want a negative int, I think what I'm probably 973 00:46:52,620 --> 00:46:56,070 going to want to say is while n is greater than or equal to 0, 974 00:46:56,070 --> 00:46:57,120 keep doing the loop. 975 00:46:57,120 --> 00:46:59,970 So I very understandably sort of just inverted the logic. 976 00:46:59,970 --> 00:47:00,490 No big deal. 977 00:47:00,490 --> 00:47:02,323 I'm thinking negatives, and I did less than. 978 00:47:02,323 --> 00:47:03,670 But the fix is easy. 979 00:47:03,670 --> 00:47:06,300 The point is the debugger led you to this point. 980 00:47:06,300 --> 00:47:08,730 Now, those of you who have programmed before probably 981 00:47:08,730 --> 00:47:10,290 saw the bug jumping out at you. 982 00:47:10,290 --> 00:47:13,123 Those of you who haven't programmed before, probably with some time, 983 00:47:13,123 --> 00:47:15,940 would have figured out what the bug was, because out of 23 lines, 984 00:47:15,940 --> 00:47:17,580 it's got to be one of those. 985 00:47:17,580 --> 00:47:19,830 But as our programs get more sophisticated, 986 00:47:19,830 --> 00:47:25,020 and we start writing more lines of code, debug50 and debuggers in general 987 00:47:25,020 --> 00:47:26,020 will be your friend. 
988 00:47:26,020 --> 00:47:29,865 And I realize that this is easier said than done because at first, 989 00:47:29,865 --> 00:47:32,240 when using a debugger, you're going to feel like, ah, I'm 990 00:47:32,240 --> 00:47:33,282 just going to use printf. 991 00:47:33,282 --> 00:47:35,700 Ah, I'm just going to fight through this. 992 00:47:35,700 --> 00:47:37,500 Because there's a bit of a learning curve, 993 00:47:37,500 --> 00:47:41,640 you will gain back that time and more by just 994 00:47:41,640 --> 00:47:47,490 using a debugger as your first instinct when chasing down problems like this. 995 00:47:47,490 --> 00:47:51,660 All right, so that's it for debug50, a new tool in your toolkit in addition 996 00:47:51,660 --> 00:47:52,800 to printf. 997 00:47:52,800 --> 00:47:55,730 But debug50 is hands down the more powerful of the two. 998 00:47:55,730 --> 00:47:58,230 Now, some of you have wondered over the past couple of weeks 999 00:47:58,230 --> 00:48:00,207 why there's this little rubber duck here. 1000 00:48:00,207 --> 00:48:02,040 And there actually is a reason for this too. 1001 00:48:02,040 --> 00:48:05,280 And there's one final debugging technique that, in all seriousness, 1002 00:48:05,280 --> 00:48:08,390 we'll introduce you today to known as rubber duck debugging. 1003 00:48:08,390 --> 00:48:09,390 And you can google this. 1004 00:48:09,390 --> 00:48:11,288 There's a whole Wikipedia article about it. 1005 00:48:11,288 --> 00:48:14,580 And this is kind of a thing in computer science circles for computer scientists 1006 00:48:14,580 --> 00:48:16,860 or programmers to have rubber ducks on their desk. 1007 00:48:16,860 --> 00:48:19,290 And the point here is that sometimes, when 1008 00:48:19,290 --> 00:48:22,710 trying to understand what is wrong in your code, 1009 00:48:22,710 --> 00:48:24,420 it helps to just talk it through. 
1010 00:48:24,420 --> 00:48:28,620 And in an ideal world, we would just talk to our colleague or our partner 1011 00:48:28,620 --> 00:48:29,610 on some project. 1012 00:48:29,610 --> 00:48:33,060 And just in hearing yourself vocalize what it is your code 1013 00:48:33,060 --> 00:48:36,810 is supposed to do, very often, that proverbial light bulb goes off. 1014 00:48:36,810 --> 00:48:39,330 And you're like, oh, wait a minute, never mind, I got it, 1015 00:48:39,330 --> 00:48:42,600 just because you heard yourself speaking illogically when 1016 00:48:42,600 --> 00:48:44,910 you intended something actually logical. 1017 00:48:44,910 --> 00:48:49,410 Now, we don't often all have colleagues or partners or friends with whom we're 1018 00:48:49,410 --> 00:48:50,698 working on a project. 1019 00:48:50,698 --> 00:48:52,740 And we don't often have family members or friends 1020 00:48:52,740 --> 00:48:55,270 who want to hear about our code of all things. 1021 00:48:55,270 --> 00:48:58,590 And so a wonderful proxy for that conversant partner 1022 00:48:58,590 --> 00:49:00,300 would be literally a rubber duck. 1023 00:49:00,300 --> 00:49:03,900 And so here in healthier times, we would be giving all of you rubber ducks. 1024 00:49:03,900 --> 00:49:07,080 Here on stage, we brought a larger one for us all to share. 1025 00:49:07,080 --> 00:49:09,900 If you've noticed in some of the wide shots on camera, 1026 00:49:09,900 --> 00:49:12,100 there's a duck who's been watching this whole time. 1027 00:49:12,100 --> 00:49:13,975 So that any time I screw up, I literally have 1028 00:49:13,975 --> 00:49:17,850 someone I can sort of talk to nonverbally, in this case. 1029 00:49:17,850 --> 00:49:20,880 But we can't emphasize enough that in addition to printf, in addition to 1030 00:49:20,880 --> 00:49:25,230 the more sophisticated debug50, talking through your problems with code 1031 00:49:25,230 --> 00:49:26,940 is a wonderfully valuable thing.
1032 00:49:26,940 --> 00:49:29,010 And if your friends or family are willing to hear 1033 00:49:29,010 --> 00:49:31,650 about some low-level code you're writing and some bug you're 1034 00:49:31,650 --> 00:49:33,000 trying to solve, great. 1035 00:49:33,000 --> 00:49:36,210 But in the absence of that, talk to a stuffed animal in your room. 1036 00:49:36,210 --> 00:49:38,400 Talk to an actual rubber duck if you have one. 1037 00:49:38,400 --> 00:49:39,960 Talk even aloud or think aloud. 1038 00:49:39,960 --> 00:49:42,120 It's just a wonderful compelling habit to get 1039 00:49:42,120 --> 00:49:46,440 into because just in hearing yourself vocalize what you think is logical 1040 00:49:46,440 --> 00:49:51,750 will the illogical very often jump out at you instead. 1041 00:49:51,750 --> 00:49:55,620 All right, so with that said, that's been a lot. 1042 00:49:55,620 --> 00:49:57,690 Let's go ahead here and take a five-minute break, 1043 00:49:57,690 --> 00:49:59,130 give everyone a bit of a breather. 1044 00:49:59,130 --> 00:50:01,050 And when we come back, we'll take a look now 1045 00:50:01,050 --> 00:50:02,880 at some of the more powerful features of C 1046 00:50:02,880 --> 00:50:05,850 now that we can trust that we can solve any problems with all 1047 00:50:05,850 --> 00:50:06,730 of these new tools. 1048 00:50:06,730 --> 00:50:08,700 So we'll be back in five. 1049 00:50:08,700 --> 00:50:10,440 All right, we are back. 1050 00:50:10,440 --> 00:50:13,320 So let's take a look underneath the hood, so to speak, 1051 00:50:13,320 --> 00:50:15,480 of a computer, because as fancy as these devices 1052 00:50:15,480 --> 00:50:17,460 are and as powerful as they seem, they're 1053 00:50:17,460 --> 00:50:21,630 relatively simple in their capabilities and what they can actually do. 1054 00:50:21,630 --> 00:50:24,570 And let's reveal as much by way of last week's discussion of type. 1055 00:50:24,570 --> 00:50:27,700 So recall that C supports different data types. 
1056 00:50:27,700 --> 00:50:31,060 So we saw char, and string, and int, and so forth. 1057 00:50:31,060 --> 00:50:32,730 So to recap, we had all of these. 1058 00:50:32,730 --> 00:50:35,310 Well, it turns out that each of these data types 1059 00:50:35,310 --> 00:50:40,800 is defined on a typical computer system as taking up a fixed amount of space. 1060 00:50:40,800 --> 00:50:44,280 And it depends on the computer, whether it's Mac or PC, or old or new, 1061 00:50:44,280 --> 00:50:47,400 just how much space is used typically by these data types. 1062 00:50:47,400 --> 00:50:51,400 But on CS50 IDE, the sizes of all of these types are as follows-- 1063 00:50:51,400 --> 00:50:54,510 a bool, true or false, uses just 1 byte. 1064 00:50:54,510 --> 00:50:58,320 Now, that's actually a little wasteful because 1 byte is 8 bits, and gosh, 1065 00:50:58,320 --> 00:51:00,090 for a bool, you should only need 1 bit. 1066 00:51:00,090 --> 00:51:04,200 You can't work at the single-bit level easily in C. 1067 00:51:04,200 --> 00:51:07,440 And so we just typically spend 1 whole byte on a bool. 1068 00:51:07,440 --> 00:51:09,640 Char is going to be 1 byte as well. 1069 00:51:09,640 --> 00:51:13,110 And that might sound familiar, because last week when we talked about ASCII, 1070 00:51:13,110 --> 00:51:15,450 we proposed that the total number of possible characters 1071 00:51:15,450 --> 00:51:20,190 you can represent with a char was 256 because of 8 bits and 2 1072 00:51:20,190 --> 00:51:21,340 to the eighth power. 1073 00:51:21,340 --> 00:51:23,620 So one char is 1 byte. 1074 00:51:23,620 --> 00:51:25,383 And that's fixed in C, no matter what. 1075 00:51:25,383 --> 00:51:27,300 Then there were all of these other data types. 1076 00:51:27,300 --> 00:51:29,910 There was float, which is a real number with a decimal point. 1077 00:51:29,910 --> 00:51:31,410 That happens to use 4 bytes. 
1078 00:51:31,410 --> 00:51:34,000 A double is also a real number with a decimal point, 1079 00:51:34,000 --> 00:51:36,820 but it uses 8 bytes, which gives you even more precision. 1080 00:51:36,820 --> 00:51:40,450 You can have more significant digits after the decimal point, for instance. 1081 00:51:40,450 --> 00:51:41,830 Ints, we've used a bunch. 1082 00:51:41,830 --> 00:51:43,540 Those are 4 bytes, typically. 1083 00:51:43,540 --> 00:51:45,490 A long is twice as big, and that just allows 1084 00:51:45,490 --> 00:51:47,115 you to represent an even bigger number. 1085 00:51:47,115 --> 00:51:49,780 And some of you might have done that exactly on credit when 1086 00:51:49,780 --> 00:51:51,400 storing a whole credit card number. 1087 00:51:51,400 --> 00:51:54,310 Strings, for now, are a variable number of bytes. 1088 00:51:54,310 --> 00:51:57,780 It could be a short string of text, a long string of text, a whole paragraph. 1089 00:51:57,780 --> 00:51:58,780 So that's going to vary. 1090 00:51:58,780 --> 00:52:01,870 So we'll come back to this notion of string next time. 1091 00:52:01,870 --> 00:52:05,380 But today, focus on just these primitive types, if you will. 1092 00:52:05,380 --> 00:52:08,960 And here is a picture of what is inside of your computer. 1093 00:52:08,960 --> 00:52:12,285 So this is a piece of memory or RAM, Random Access Memory. 1094 00:52:12,285 --> 00:52:13,660 And it might be a little smaller. 1095 00:52:13,660 --> 00:52:15,820 It might be a little bigger depending on whether it's a laptop, 1096 00:52:15,820 --> 00:52:17,450 or desktop, or phone, or the like. 1097 00:52:17,450 --> 00:52:20,710 But it's in memory, or RAM, that programs 1098 00:52:20,710 --> 00:52:22,780 are stored while they're running. 1099 00:52:22,780 --> 00:52:25,840 And it's where files are stored when they are open. 
1100 00:52:25,840 --> 00:52:29,440 So typically, if you save, install programs, or save files, 1101 00:52:29,440 --> 00:52:32,740 those are saved on what's generally called your hard drive, or hard disk, 1102 00:52:32,740 --> 00:52:37,090 or solid-state disk, or CD, or some other physical medium. 1103 00:52:37,090 --> 00:52:40,450 And that, the [INAUDIBLE] of which is that they don't require electricity 1104 00:52:40,450 --> 00:52:42,160 to store your data long term. 1105 00:52:42,160 --> 00:52:43,180 RAM is different. 1106 00:52:43,180 --> 00:52:44,830 It's volatile, so to speak. 1107 00:52:44,830 --> 00:52:48,500 But it's much faster than a hard disk or a solid-state disk, even. 1108 00:52:48,500 --> 00:52:50,500 It's much faster because it's purely electronic. 1109 00:52:50,500 --> 00:52:52,300 And indeed, there are no moving parts. 1110 00:52:52,300 --> 00:52:54,640 It's purely electronic, as pictured here. 1111 00:52:54,640 --> 00:52:59,410 And so with RAM, you have the ability to open files and run programs 1112 00:52:59,410 --> 00:53:02,290 more quickly because when you double-click a program to run it, 1113 00:53:02,290 --> 00:53:04,960 or you open a file in order to view or edit it, 1114 00:53:04,960 --> 00:53:06,820 it's stored temporarily in RAM. 1115 00:53:06,820 --> 00:53:11,290 And long story short, if your laptop battery has ever died, 1116 00:53:11,290 --> 00:53:13,990 or your computer's gotten unplugged, or your phone dies, 1117 00:53:13,990 --> 00:53:17,770 the reason that you and I tend to lose data, the paragraph that you just 1118 00:53:17,770 --> 00:53:19,960 wrote in the essay that you hadn't yet saved, 1119 00:53:19,960 --> 00:53:23,020 is because RAM, memory, is volatile. 1120 00:53:23,020 --> 00:53:26,360 That is, it requires electricity to continue powering it. 
1121 00:53:26,360 --> 00:53:30,100 But for our purposes, we're only going to focus on RAM, 1122 00:53:30,100 --> 00:53:33,340 not so much long-term disk space yet, because when 1123 00:53:33,340 --> 00:53:36,820 you're running a program in C, it is indeed, by definition, 1124 00:53:36,820 --> 00:53:38,680 running in your computer's memory. 1125 00:53:38,680 --> 00:53:41,530 But the funny thing about something as simple as this picture 1126 00:53:41,530 --> 00:53:44,290 is that each of these black rectangles is kind of a chip. 1127 00:53:44,290 --> 00:53:47,710 And in those chips are stored all of the 0's and 1's, the little 1128 00:53:47,710 --> 00:53:49,690 switches that we alluded to in week 0. 1129 00:53:49,690 --> 00:53:53,410 So let's focus on and just zoom in on just one of these chips. 1130 00:53:53,410 --> 00:53:57,130 Now, it stands to reason that I don't know how big this stick of RAM is. 1131 00:53:57,130 --> 00:53:59,440 Maybe it's 1 gigabyte, a billion bytes. 1132 00:53:59,440 --> 00:54:01,000 Maybe it's 4 gigabytes. 1133 00:54:01,000 --> 00:54:02,800 Maybe it's even smaller or bigger. 1134 00:54:02,800 --> 00:54:07,250 There's some number of bytes represented physically by this hardware. 1135 00:54:07,250 --> 00:54:10,228 So if we zoom in further, let me propose that, all right, 1136 00:54:10,228 --> 00:54:11,770 I don't know how many bytes are here. 1137 00:54:11,770 --> 00:54:15,130 But if there's some number of bytes, whether it's a billion or 2 billion, 1138 00:54:15,130 --> 00:54:17,440 or fewer or more, it stands to reason that we 1139 00:54:17,440 --> 00:54:19,240 could just number all of these bytes. 1140 00:54:19,240 --> 00:54:22,670 We could sort of think of this physical device, this memory, 1141 00:54:22,670 --> 00:54:24,970 as just being a grid, top to bottom, left to right. 
1142 00:54:24,970 --> 00:54:28,270 And each of the squares I've just overlaid on this physical device 1143 00:54:28,270 --> 00:54:30,057 might represent an individual byte. 1144 00:54:30,057 --> 00:54:32,140 And again, in reality, maybe there's more of them. 1145 00:54:32,140 --> 00:54:33,410 Maybe there's fewer of them. 1146 00:54:33,410 --> 00:54:35,920 But it stands to reason, no matter how many there are, 1147 00:54:35,920 --> 00:54:38,650 we can think of each of these as having a location. 1148 00:54:38,650 --> 00:54:42,200 Like, this is the first byte, second byte, third byte, and so forth. 1149 00:54:42,200 --> 00:54:45,940 Well, what does it mean, then, for a char to take up 1 byte? 1150 00:54:45,940 --> 00:54:49,750 That means that if your computer's memory is running a program maybe 1151 00:54:49,750 --> 00:54:53,890 that you wrote or I wrote that's using a char variable somewhere in it, 1152 00:54:53,890 --> 00:54:56,500 the char you're storing in that variable may very well 1153 00:54:56,500 --> 00:55:00,940 be stored in the top left-hand corner physically of this piece of RAM. 1154 00:55:00,940 --> 00:55:01,660 Maybe it's there. 1155 00:55:01,660 --> 00:55:02,535 Maybe it's elsewhere. 1156 00:55:02,535 --> 00:55:04,840 But it's just one physical square. 1157 00:55:04,840 --> 00:55:08,110 If you're storing something like an int, which takes up 4 bytes, 1158 00:55:08,110 --> 00:55:11,830 well, that frankly might take up all four squares along the top there 1159 00:55:11,830 --> 00:55:12,670 or somewhere else. 1160 00:55:12,670 --> 00:55:15,610 If you're using a long, that's going to take up twice as much space. 1161 00:55:15,610 --> 00:55:18,250 So representing an even bigger number in your computer's memory 1162 00:55:18,250 --> 00:55:21,340 is going to require that you use all of the 0's and 1's 1163 00:55:21,340 --> 00:55:25,030 comprising these 8 bytes instead. 1164 00:55:25,030 --> 00:55:27,170 But let's now move away from physical hardware.
1165 00:55:27,170 --> 00:55:30,530 Let's abstract it away, if you will, and just now start to think of our memory 1166 00:55:30,530 --> 00:55:31,390 as just this grid. 1167 00:55:31,390 --> 00:55:33,280 And technically, it's not a two-dimensional structure. 1168 00:55:33,280 --> 00:55:35,860 I could just as easily draw all of these bytes from left to right. 1169 00:55:35,860 --> 00:55:37,790 I could just fit fewer of them on the screen. 1170 00:55:37,790 --> 00:55:39,832 So we'll take the physical metaphor a bit further 1171 00:55:39,832 --> 00:55:44,020 and just think of our computer's memory as this grid, this grid of bytes. 1172 00:55:44,020 --> 00:55:46,510 And those bytes are each 8 bits. 1173 00:55:46,510 --> 00:55:48,340 Those bits are just 0's and 1's. 1174 00:55:48,340 --> 00:55:52,840 So what we've really done is zoom in metaphorically on our computer's memory 1175 00:55:52,840 --> 00:55:57,460 to start thinking about where things are going to end up in memory when you 1176 00:55:57,460 --> 00:56:01,030 double-click on a program on your Mac or PC or, in CS50 IDE, 1177 00:56:01,030 --> 00:56:04,720 when you do ./hello or ./buggy0 or ./buggy1, 1178 00:56:04,720 --> 00:56:07,990 it's these bytes in your computer's memory that are filled with all 1179 00:56:07,990 --> 00:56:09,440 of your variables' values. 1180 00:56:09,440 --> 00:56:10,940 So let's consider an example here. 1181 00:56:10,940 --> 00:56:14,800 Suppose I had written some code that involved declaring three scores. 1182 00:56:14,800 --> 00:56:17,680 Maybe it's a class that's got, like, three tests. 1183 00:56:17,680 --> 00:56:23,343 And you want to average the student's grade across all three of those tests. 1184 00:56:23,343 --> 00:56:26,260 Well, let's go ahead and write a quick program that does exactly this. 1185 00:56:26,260 --> 00:56:30,640 In CS50 IDE, I'm going to create a program called scores.c. 
1186 00:56:30,640 --> 00:56:35,920 And in scores.c, I'm going to go ahead and #include stdio.h. 1187 00:56:35,920 --> 00:56:38,703 I'm going to then do my int main(void) as usual. 1188 00:56:38,703 --> 00:56:41,120 And then inside of here, I'm going to keep it very simple. 1189 00:56:41,120 --> 00:56:43,750 I'm going to give myself one int called score1. 1190 00:56:43,750 --> 00:56:46,180 And just to be a little playful, I'm going 1191 00:56:46,180 --> 00:56:48,242 to set it equal to 72, like last week. 1192 00:56:48,242 --> 00:56:50,200 I'm going to give myself a second score and set 1193 00:56:50,200 --> 00:56:55,120 it equal to 73, and then a third score whose value is going to be 33. 1194 00:56:55,120 --> 00:56:59,770 And then let me go ahead and print out the average of those three values 1195 00:56:59,770 --> 00:57:03,100 by plugging in a placeholder for floating point value, right? 1196 00:57:03,100 --> 00:57:07,330 If you add three integers together and divide them by 3, 1197 00:57:07,330 --> 00:57:10,730 I may very well get a fraction or a real number with a decimal point. 1198 00:57:10,730 --> 00:57:14,110 So I'm going to use %f instead of %i because I don't want to truncate 1199 00:57:14,110 --> 00:57:15,010 someone's grade. 1200 00:57:15,010 --> 00:57:19,210 Otherwise, if they have, like, a 99.9%, they're not being rounded up to 100%. 1201 00:57:19,210 --> 00:57:22,880 They're going to get the 99% because of truncation, as we discussed last week. 1202 00:57:22,880 --> 00:57:25,030 So how do I do now the math of an average? 1203 00:57:25,030 --> 00:57:27,220 Well, it's pretty straightforward-- score1 1204 00:57:27,220 --> 00:57:30,160 plus score2 plus score3 in parentheses, just 1205 00:57:30,160 --> 00:57:33,610 like in math, divided by 3, semicolon. 1206 00:57:33,610 --> 00:57:35,000 Let me save that file. 1207 00:57:35,000 --> 00:57:36,655 Let me do make scores at the bottom. 
1208 00:57:36,655 --> 00:57:38,530 Again, we're not going to use Clang manually. 1209 00:57:38,530 --> 00:57:41,230 No need to, because it's a lot easier to run make. 1210 00:57:41,230 --> 00:57:42,670 But I did mess up here. 1211 00:57:42,670 --> 00:57:46,420 "Format specifies type 'double', but the argument has type 'int'." 1212 00:57:46,420 --> 00:57:48,500 So I don't quite understand that. 1213 00:57:48,500 --> 00:57:52,540 But it's drawing my attention to the %f and the fact that my math looks like 1214 00:57:52,540 --> 00:57:53,930 this. 1215 00:57:53,930 --> 00:57:56,110 So any thoughts here? 1216 00:57:56,110 --> 00:57:59,650 I don't think printf is going to help me here because the bug is 1217 00:57:59,650 --> 00:58:01,810 within the printf line. 1218 00:58:01,810 --> 00:58:06,160 I don't think that debug50 is going to really help me here because I already 1219 00:58:06,160 --> 00:58:09,130 know what line of code the bug is in. 1220 00:58:09,130 --> 00:58:13,570 This feels like an opportunity to talk to the physical duck 1221 00:58:13,570 --> 00:58:15,280 or some other inanimate object. 1222 00:58:15,280 --> 00:58:21,070 Or we can perhaps think about what errors we ran into even last week. 1223 00:58:21,070 --> 00:58:23,410 [? Arpan, ?] what do you think? 1224 00:58:23,410 --> 00:58:27,520 [? ARPAN: ?] I think it's telling you this because it's 1225 00:58:27,520 --> 00:58:31,390 receiving in all the values are integer type, 1226 00:58:31,390 --> 00:58:33,557 but you are telling it to be in float. 1227 00:58:33,557 --> 00:58:34,390 DAVID MALAN: Indeed. 1228 00:58:34,390 --> 00:58:37,450 So score1, score2, score3 are all integers, 1229 00:58:37,450 --> 00:58:40,570 and the number 3 is literally an integer. 
1230 00:58:40,570 --> 00:58:43,720 And so this time, the compiler is smart enough to realize, wait a minute, 1231 00:58:43,720 --> 00:58:48,850 you're trying to coerce an integer result into a floating point value, 1232 00:58:48,850 --> 00:58:51,592 but you haven't done any floating point arithmetic, if you will. 1233 00:58:51,592 --> 00:58:52,300 So you know what? 1234 00:58:52,300 --> 00:58:53,592 There's a few ways to fix this. 1235 00:58:53,592 --> 00:58:56,680 Last week, recall we proposed that you could use a cast, 1236 00:58:56,680 --> 00:59:00,880 and you could explicitly cast one or more of those values to a float. 1237 00:59:00,880 --> 00:59:02,830 So I could do this, for instance. 1238 00:59:02,830 --> 00:59:06,728 Or I could cast all of these to floats or one of these to floats. 1239 00:59:06,728 --> 00:59:08,270 There's many different possibilities. 1240 00:59:08,270 --> 00:59:12,250 But frankly, the simplest fix is just to divide, for instance, by 3.0. 1241 00:59:12,250 --> 00:59:16,263 I can avoid some of the headaches of casting from one to another by just 1242 00:59:16,263 --> 00:59:18,430 making sure that there's at least one floating point 1243 00:59:18,430 --> 00:59:20,680 value involved in this arithmetic. 1244 00:59:20,680 --> 00:59:23,020 So now let me recompile scores. 1245 00:59:23,020 --> 00:59:24,460 This time, it compiles OK. 1246 00:59:24,460 --> 00:59:31,810 Let me do ./scores, and voila, my average isn't so high, 59.333333. 1247 00:59:31,810 --> 00:59:34,750 All right, so what is actually going on inside 1248 00:59:34,750 --> 00:59:38,980 of the computer irrespective of the floating point arithmetic, which was, 1249 00:59:38,980 --> 00:59:40,760 again, a topic of last week? 1250 00:59:40,760 --> 00:59:44,470 Well, let's consider these three variables, score1, score2, score3-- 1251 00:59:44,470 --> 00:59:47,500 where are they actually being stored in the computer's memory? 
1252 00:59:47,500 --> 00:59:49,120 Well, let's consider that grid again. 1253 00:59:49,120 --> 00:59:51,550 And again, I'm going to start at top left for convenience. 1254 00:59:51,550 --> 00:59:53,925 But technically speaking-- we'll see this down the road-- 1255 00:59:53,925 --> 00:59:56,410 your computer's memory is just like this big canvas. 1256 00:59:56,410 --> 00:59:59,150 And values can end up in all different places. 1257 00:59:59,150 --> 01:00:00,700 But for today, we'll keep it clean. 1258 01:00:00,700 --> 01:00:03,830 The first variable, score1, I claim is going to be here, 1259 01:00:03,830 --> 01:00:05,420 top left, for simplicity. 1260 01:00:05,420 --> 01:00:08,290 But what's important about where score1-- 1261 01:00:08,290 --> 01:00:13,120 that is, 72-- is being stored, is it's taking up four of these boxes. 1262 01:00:13,120 --> 01:00:15,670 Each of these boxes, recall, represents 1 byte. 1263 01:00:15,670 --> 01:00:19,600 And an integer, recall, in CS50 IDE is 4 bytes. 1264 01:00:19,600 --> 01:00:24,250 Therefore, I have used 4 bytes of space to represent the number 72. 1265 01:00:24,250 --> 01:00:27,430 The number 73 in score2 similarly is going 1266 01:00:27,430 --> 01:00:32,150 to take up four boxes, as is score3 going to take up four boxes as well. 1267 01:00:32,150 --> 01:00:34,750 But what's really going on underneath the hood here? 1268 01:00:34,750 --> 01:00:37,450 Well, if each of these squares represents a byte, 1269 01:00:37,450 --> 01:00:42,070 and each of those bytes is 8 bits, and a bit is just a 0 or 1, 1270 01:00:42,070 --> 01:00:45,130 what's really going on underneath the hood is something like this. 1271 01:00:45,130 --> 01:00:47,980 Somehow, this electronic memory is storing 1272 01:00:47,980 --> 01:00:50,920 electricity in just the right way so that it's storing 1273 01:00:50,920 --> 01:00:53,650 this pattern of 0's and 1's, a.k.a. 1274 01:00:53,650 --> 01:00:57,370 72 in decimal, this pattern of 0's and 1's, a.k.a. 
1275 01:00:57,370 --> 01:01:01,180 73 in decimal, this pattern of 0's and 1's, a.k.a. 1276 01:01:01,180 --> 01:01:02,682 33 in decimal. 1277 01:01:02,682 --> 01:01:05,140 But again, we don't have to keep thinking about or dwelling 1278 01:01:05,140 --> 01:01:06,190 on the binary level. 1279 01:01:06,190 --> 01:01:09,100 But this is only to say that everything we've discussed thus far 1280 01:01:09,100 --> 01:01:11,560 is coming together now in this one picture 1281 01:01:11,560 --> 01:01:14,320 because a computer is just storing these patterns for us, 1282 01:01:14,320 --> 01:01:16,930 and we are allocating space now thanks to our programming 1283 01:01:16,930 --> 01:01:20,050 language via code like this. 1284 01:01:20,050 --> 01:01:26,770 But this code, correct though it may be, indeed 59.333333 and so forth 1285 01:01:26,770 --> 01:01:31,240 was my average if my test scores were 72, 73, and 33. 1286 01:01:31,240 --> 01:01:34,330 But I feel like there's an opportunity for better design here. 1287 01:01:34,330 --> 01:01:37,570 So not just correctness, not just style, recall that design 1288 01:01:37,570 --> 01:01:40,000 is this other metric of code quality. 1289 01:01:40,000 --> 01:01:42,100 And it's a little more subjective, and it's 1290 01:01:42,100 --> 01:01:45,490 a little more subject to debate among reasonable people. 1291 01:01:45,490 --> 01:01:50,258 But I don't really love what I was doing with this naming scheme. 1292 01:01:50,258 --> 01:01:52,300 And in fact, if we look at the code, there really 1293 01:01:52,300 --> 01:01:55,240 wasn't much more to my program than these three lines. 1294 01:01:55,240 --> 01:01:58,840 I worry this program isn't particularly well designed. 1295 01:01:58,840 --> 01:02:04,360 What rubs you the wrong way, perhaps, about those three lines of code? 1296 01:02:04,360 --> 01:02:06,255 What could be better? 
1297 01:02:06,255 --> 01:02:08,380 And even if you don't know the solution, especially 1298 01:02:08,380 --> 01:02:13,688 if you've never programmed before, what kind of smells about those three lines? 1299 01:02:13,688 --> 01:02:14,980 This is actually a term of art. 1300 01:02:14,980 --> 01:02:18,820 "Code smell" is like something-- not loving that for some reason. 1301 01:02:18,820 --> 01:02:22,270 If you can't put your finger on it, it's not the best design. 1302 01:02:22,270 --> 01:02:23,380 The code smells. 1303 01:02:23,380 --> 01:02:26,950 What's smelly, if you will, about score1, score2, score3? 1304 01:02:26,950 --> 01:02:28,405 Ryan, what do you think? 1305 01:02:28,405 --> 01:02:30,280 RYAN: If you're doing an average calculation, 1306 01:02:30,280 --> 01:02:33,170 you don't need to add them up all together in the code. 1307 01:02:33,170 --> 01:02:35,970 You can add them up beforehand and store it as one variable. 1308 01:02:35,970 --> 01:02:36,970 DAVID MALAN: Absolutely. 1309 01:02:36,970 --> 01:02:39,803 If I'm computing the average, I don't need to keep all three around. 1310 01:02:39,803 --> 01:02:42,940 I can just keep a sum and then divide the whole sum by the total number. 1311 01:02:42,940 --> 01:02:45,070 I like that, that instinct. 1312 01:02:45,070 --> 01:02:49,480 What else might you not like about the design of this code now? 1313 01:02:49,480 --> 01:02:51,340 Score1, score2, score3. 1314 01:02:51,340 --> 01:02:54,110 1315 01:02:54,110 --> 01:02:56,150 Score1, score2, score3. 1316 01:02:56,150 --> 01:02:59,030 Might there be opportunity still for improvement? 1317 01:02:59,030 --> 01:03:02,090 I feel like any time you start to see this repetition, maybe. 1318 01:03:02,090 --> 01:03:03,600 Andrew, your thoughts? 1319 01:03:03,600 --> 01:03:06,715 ANDREW: Not hard code the three scores together? 1320 01:03:06,715 --> 01:03:08,840 DAVID MALAN: OK, so not hard code the three scores. 
1321 01:03:08,840 --> 01:03:10,500 And what would you do instead? 1322 01:03:10,500 --> 01:03:14,060 ANDREW: Maybe take an input, or I would-- 1323 01:03:14,060 --> 01:03:16,562 yeah, I wouldn't write out the scores themselves. 1324 01:03:16,562 --> 01:03:18,270 DAVID MALAN: Yeah, another good instinct. 1325 01:03:18,270 --> 01:03:21,470 It's kind of stupid that I've written a program, compiled a program, 1326 01:03:21,470 --> 01:03:25,847 that only computes the average for some student who literally got those three 1327 01:03:25,847 --> 01:03:26,930 test scores and no others. 1328 01:03:26,930 --> 01:03:28,490 Like, there's no dynamism here. 1329 01:03:28,490 --> 01:03:31,550 Moreover, it's a little lazy too that I called 1330 01:03:31,550 --> 01:03:33,890 my variables score1, score2, score3. 1331 01:03:33,890 --> 01:03:35,450 I mean, where does it end after that? 1332 01:03:35,450 --> 01:03:37,790 If I want to have a fourth test next semester, now 1333 01:03:37,790 --> 01:03:39,140 I have to go and have score4. 1334 01:03:39,140 --> 01:03:40,760 If I've got a fifth, score5. 1335 01:03:40,760 --> 01:03:44,270 That starts to be reminiscent of last week's copy/paste, which 1336 01:03:44,270 --> 01:03:45,990 really wasn't the best practice. 1337 01:03:45,990 --> 01:03:48,590 And so let me propose that we clean this up. 1338 01:03:48,590 --> 01:03:50,900 And it turns out we can clean this up by way 1339 01:03:50,900 --> 01:03:53,210 of another topic, another feature of C that's 1340 01:03:53,210 --> 01:03:56,100 also present in other languages, known as arrays. 1341 01:03:56,100 --> 01:03:58,880 And if you happened to use something called a list in Scratch, 1342 01:03:58,880 --> 01:04:01,760 very similar in spirit to Scratch's lists. 1343 01:04:01,760 --> 01:04:05,060 But we didn't see those in lecture that first week. 
1344 01:04:05,060 --> 01:04:11,030 An array in C, as in other languages, is a sequence 1345 01:04:11,030 --> 01:04:14,870 of values stored in memory back to back to back, 1346 01:04:14,870 --> 01:04:19,020 a sequence of contiguous values, so to speak, back to back to back. 1347 01:04:19,020 --> 01:04:22,185 So in that sense, it's like a list of values from left to right 1348 01:04:22,185 --> 01:04:24,560 if we use the metaphor of the picture we've been drawing. 1349 01:04:24,560 --> 01:04:27,480 So how might this be germane here? 1350 01:04:27,480 --> 01:04:30,710 Well, it turns out that if you want to store a whole bunch of values, 1351 01:04:30,710 --> 01:04:33,320 but they're all kind of interrelated, like they're all scores, 1352 01:04:33,320 --> 01:04:37,340 you don't have to resort to this sort of lazy, score1, score2, score3, score4, 1353 01:04:37,340 --> 01:04:40,850 score5, up to score99, depending on how many scores there are. 1354 01:04:40,850 --> 01:04:44,630 Why don't you just call all of those numbers scores, 1355 01:04:44,630 --> 01:04:46,520 but use a slightly different syntax? 1356 01:04:46,520 --> 01:04:49,550 And that syntax gives you access to what are called arrays. 1357 01:04:49,550 --> 01:04:52,730 So the syntax here on the screen is an example 1358 01:04:52,730 --> 01:04:56,930 of declaring space for three integers all at once 1359 01:04:56,930 --> 01:05:00,980 and collectively referring to all of them as the word "scores." 1360 01:05:00,980 --> 01:05:03,320 So there's no more scores 1, 2, and 3. 1361 01:05:03,320 --> 01:05:06,260 All three of those scores are in a variable called "scores." 1362 01:05:06,260 --> 01:05:09,830 And what's new here is the square brackets, inside of which 1363 01:05:09,830 --> 01:05:14,180 is a number that literally connotes how many integers do you 1364 01:05:14,180 --> 01:05:18,150 want to store under the name "scores." 1365 01:05:18,150 --> 01:05:19,940 So what does this allow me to do? 
1366 01:05:19,940 --> 01:05:24,240 It allows me still to define three integers in that array. 1367 01:05:24,240 --> 01:05:26,330 So this array is going to be a chunk of memory 1368 01:05:26,330 --> 01:05:29,030 back to back to back that I can put values in. 1369 01:05:29,030 --> 01:05:32,240 And the way I put those values is going to look syntactically like this. 1370 01:05:32,240 --> 01:05:36,050 I still use numbers, but now I'm using a new notation. 1371 01:05:36,050 --> 01:05:39,170 And it's similar to what I resorted to before, 1372 01:05:39,170 --> 01:05:41,390 but it's a little more generalized now and dynamic. 1373 01:05:41,390 --> 01:05:44,960 Now if I want to update the very first score in that array, 1374 01:05:44,960 --> 01:05:47,540 I literally write the name of the variable scores, 1375 01:05:47,540 --> 01:05:51,020 bracket[0] and then assign it the value. 1376 01:05:51,020 --> 01:05:54,200 If I want to get at the second score, I do scores[1]. 1377 01:05:54,200 --> 01:05:56,570 If I want the third score, it's scores[2]. 1378 01:05:56,570 --> 01:06:00,050 And the only thing that's a little weird and takes some getting used to is 1379 01:06:00,050 --> 01:06:04,100 the fact that we are "zero-indexing" our arrays. 1380 01:06:04,100 --> 01:06:06,980 So in past examples, like for loops and while loops, 1381 01:06:06,980 --> 01:06:09,680 I've sort of said, eh, it's a convention in programming 1382 01:06:09,680 --> 01:06:11,180 to start counting from 0. 1383 01:06:11,180 --> 01:06:15,140 When it comes to arrays, which are contiguous 1384 01:06:15,140 --> 01:06:20,292 sequences of values in a computer's memory, they have to start at 0. 1385 01:06:20,292 --> 01:06:22,250 So otherwise, if you don't start counting at 0, 1386 01:06:22,250 --> 01:06:26,280 you're literally going to be wasting space by overlooking one value. 
1387 01:06:26,280 --> 01:06:29,100 So now if we were to rename things on the screen, 1388 01:06:29,100 --> 01:06:34,340 instead of calling these three rectangles score1, score2, score3, 1389 01:06:34,340 --> 01:06:35,810 they're all called scores. 1390 01:06:35,810 --> 01:06:38,270 But if you want to refer specifically to the first one, 1391 01:06:38,270 --> 01:06:42,140 you use this fancy bracket notation, and the second one, this bracket notation, 1392 01:06:42,140 --> 01:06:44,150 and the third one, this bracket notation. 1393 01:06:44,150 --> 01:06:45,740 But notice the dichotomy. 1394 01:06:45,740 --> 01:06:51,770 When declaring the array, when creating the array, saying, give me three ints, 1395 01:06:51,770 --> 01:06:56,270 you use [3] where [3] is the total number of values. 1396 01:06:56,270 --> 01:06:59,300 When you index into the array-- 1397 01:06:59,300 --> 01:07:03,260 that is, when you go to a specific location in that chunk of memory-- 1398 01:07:03,260 --> 01:07:05,130 you similarly use numbers. 1399 01:07:05,130 --> 01:07:08,510 But now those are referring to their relative positions, position 0, 1400 01:07:08,510 --> 01:07:10,460 position 1, position 2. 1401 01:07:10,460 --> 01:07:13,160 This is the total number of spaces. 1402 01:07:13,160 --> 01:07:17,480 This is the specific space first, second, and third. 1403 01:07:17,480 --> 01:07:20,000 All right, so pictorially, nothing has changed, 1404 01:07:20,000 --> 01:07:21,872 just our nomenclature really has. 1405 01:07:21,872 --> 01:07:24,080 So let me go ahead and start to improve this program, 1406 01:07:24,080 --> 01:07:28,220 taking in the advice that was offered too on how we can improve the design 1407 01:07:28,220 --> 01:07:30,290 and get rid of the smelliness of it. 
1408 01:07:30,290 --> 01:07:32,060 Let me take the first-- 1409 01:07:32,060 --> 01:07:34,650 let me take the easiest of these approaches 1410 01:07:34,650 --> 01:07:37,340 first by just getting rid of these three separate variables 1411 01:07:37,340 --> 01:07:42,590 and instead giving me one variable called scores, an array of size 3. 1412 01:07:42,590 --> 01:07:45,710 And then I don't need to declare score1, score2. 1413 01:07:45,710 --> 01:07:47,420 Again, that's all going away. 1414 01:07:47,420 --> 01:07:48,453 That's all going away. 1415 01:07:48,453 --> 01:07:49,370 That's all going away. 1416 01:07:49,370 --> 01:07:52,520 Now if I want to initialize that array with these three values, 1417 01:07:52,520 --> 01:07:54,440 I say scores[0]. 1418 01:07:54,440 --> 01:07:56,510 And down here, I say scores[1]. 1419 01:07:56,510 --> 01:07:59,060 And down here, I say scores[2]. 1420 01:07:59,060 --> 01:08:01,200 So I've added one line of code. 1421 01:08:01,200 --> 01:08:02,810 But notice the dynamism now. 1422 01:08:02,810 --> 01:08:05,720 If I want to have a fourth one, I can just allocate here and then 1423 01:08:05,720 --> 01:08:09,742 put in the value with another line of code, or 5, or 6, or 7, or 8. 1424 01:08:09,742 --> 01:08:11,450 I don't have to start copying and pasting 1425 01:08:11,450 --> 01:08:13,930 all of these different variable names by convention. 1426 01:08:13,930 --> 01:08:16,930 But I think if we take some of the advice that was offered a moment ago, 1427 01:08:16,930 --> 01:08:20,370 we can also clean this up by way of a loop or such as well. 1428 01:08:20,370 --> 01:08:21,380 So let's do that. 1429 01:08:21,380 --> 01:08:26,140 Let me go ahead and give myself, actually, first the CS50 library so 1430 01:08:26,140 --> 01:08:27,609 that I can use get_int. 1431 01:08:27,609 --> 01:08:30,100 And let's take this first piece of advice, which is, 1432 01:08:30,100 --> 01:08:33,370 let's start asking for a score using get_int. 
1433 01:08:33,370 --> 01:08:35,950 And I'm going to do this three times. 1434 01:08:35,950 --> 01:08:37,750 And yeah, I'm getting a little lazy. 1435 01:08:37,750 --> 01:08:38,859 I'm getting a little bored already. 1436 01:08:38,859 --> 01:08:40,029 So I'm going to copy/paste. 1437 01:08:40,029 --> 01:08:41,946 And again, that does not bode well in general. 1438 01:08:41,946 --> 01:08:44,649 When copying and pasting, we can probably do better still. 1439 01:08:44,649 --> 01:08:47,770 But now I think I need to change just one more thing here. 1440 01:08:47,770 --> 01:08:54,010 When doing the math, I want scores[0] plus scores[1] plus scores[2]. 1441 01:08:54,010 --> 01:08:57,580 But before I solve this problem here-- the logic is still the same, 1442 01:08:57,580 --> 01:09:00,310 but I'm now taking in dynamically three integers-- 1443 01:09:00,310 --> 01:09:02,740 there's still a smell to it as well. 1444 01:09:02,740 --> 01:09:04,550 It's still not as well designed. 1445 01:09:04,550 --> 01:09:10,640 And so just to make clear, what could I be doing better now? 1446 01:09:10,640 --> 01:09:14,359 How could I clean up this code and make it not just correct, not just 1447 01:09:14,359 --> 01:09:17,310 well styled, but better designed? 1448 01:09:17,310 --> 01:09:18,240 What remains here? 1449 01:09:18,240 --> 01:09:18,740 Nina? 1450 01:09:18,740 --> 01:09:20,450 What do you think? 1451 01:09:20,450 --> 01:09:24,170 NINA: The code is specific for only three scores. 1452 01:09:24,170 --> 01:09:27,859 So you could, as an input, [INAUDIBLE] how many scores 1453 01:09:27,859 --> 01:09:30,109 it wants at the very beginning. 1454 01:09:30,109 --> 01:09:33,735 And then instead of having scores[0], scores[1], 1455 01:09:33,735 --> 01:09:40,560 you could use a for loop that goes through from 0 to n minus 1 or less 1456 01:09:40,560 --> 01:09:45,078 than n that will ask, and it should be one line of code instead.
1457 01:09:45,078 --> 01:09:46,370 DAVID MALAN: Yeah, really good. 1458 01:09:46,370 --> 01:09:48,649 It's the fact that we have get_int, get_int, get_int. 1459 01:09:48,649 --> 01:09:51,649 That's the first sign that you're probably doing something suboptimally. 1460 01:09:51,649 --> 01:09:53,510 It might be correct, but it's probably not well designed 1461 01:09:53,510 --> 01:09:55,682 because I did literally resort to copy/paste. 1462 01:09:55,682 --> 01:09:57,890 There's sort of a pattern here that I could certainly 1463 01:09:57,890 --> 01:09:59,580 integrate into something like a loop. 1464 01:09:59,580 --> 01:10:00,470 So let me do that. 1465 01:10:00,470 --> 01:10:03,830 Let me actually get rid of two of these lines of code. 1466 01:10:03,830 --> 01:10:08,870 Let me go up here and do something like for int i gets 0, i less than 3 for now, 1467 01:10:08,870 --> 01:10:10,100 i++. 1468 01:10:10,100 --> 01:10:11,600 Let me open up this for loop. 1469 01:10:11,600 --> 01:10:14,120 Let me indent that remaining line of code. 1470 01:10:14,120 --> 01:10:16,520 And instead of scores[0]-- 1471 01:10:16,520 --> 01:10:18,930 this is where arrays get really powerful-- 1472 01:10:18,930 --> 01:10:22,760 you can use a variable to index into an array-- 1473 01:10:22,760 --> 01:10:24,740 that is, to go to a specific location. 1474 01:10:24,740 --> 01:10:26,780 What do I want to use for my variable? 1475 01:10:26,780 --> 01:10:29,400 Well, I would think i here. 1476 01:10:29,400 --> 01:10:33,980 So now I've whittled my lines of code down from all three triplicate, three 1477 01:10:33,980 --> 01:10:36,140 nearly identical lines, into just one really 1478 01:10:36,140 --> 01:10:39,290 inside of a loop that's going to do the same thing for me again and again. 1479 01:10:39,290 --> 01:10:42,230 And as Nina proposed too, I don't have to hard code 1480 01:10:42,230 --> 01:10:43,860 these 3's all over the place.
1481 01:10:43,860 --> 01:10:46,050 Maybe I could do something like this. 1482 01:10:46,050 --> 01:10:50,240 I could say something like, int total gets get_int. 1483 01:10:50,240 --> 01:10:53,690 And I might ask, "Total number of scores." 1484 01:10:53,690 --> 01:10:56,660 And I could literally ask the human from the get-go 1485 01:10:56,660 --> 01:10:58,490 how many total scores are there. 1486 01:10:58,490 --> 01:11:04,680 Then I can even more powerfully use this variable, total, in multiple places 1487 01:11:04,680 --> 01:11:07,820 so that now, I'm doing my math much more dynamically. 1488 01:11:07,820 --> 01:11:11,300 This, though-- I'm afraid, Nina, this broke a bit. 1489 01:11:11,300 --> 01:11:13,280 I'm going to be a little more-- 1490 01:11:13,280 --> 01:11:16,280 I need to exert a little more effort here on line 14 because now I 1491 01:11:16,280 --> 01:11:21,500 can't hard code scores[0], [1], and [2] because if the total number of scores 1492 01:11:21,500 --> 01:11:23,820 is more than that, I need to do more addition. 1493 01:11:23,820 --> 01:11:26,010 If it's fewer than that, I need to do less addition. 1494 01:11:26,010 --> 01:11:28,698 So I think we've introduced a bug, but we can fix that. 1495 01:11:28,698 --> 01:11:30,240 But let me propose for just a moment. 1496 01:11:30,240 --> 01:11:33,470 Let's not make it dynamic because I worry that's just made my life harder. 1497 01:11:33,470 --> 01:11:36,620 Let's at least introduce one other feature here first. 1498 01:11:36,620 --> 01:11:41,090 I'm going to go ahead up here and define a new feature of C today, which 1499 01:11:41,090 --> 01:11:42,290 is known as a constant. 
1500 01:11:42,290 --> 01:11:46,070 If I know in advance that I want to declare a number that I want 1501 01:11:46,070 --> 01:11:49,670 to use again and again and again without copying and pasting 1502 01:11:49,670 --> 01:11:53,180 literally that number 3, I can give myself a constant int 1503 01:11:53,180 --> 01:11:58,160 by a const int total = 3. 1504 01:11:58,160 --> 01:12:01,320 This declares what's called a constant in programming, 1505 01:12:01,320 --> 01:12:05,600 which is a feature of many languages whereby you declare a variable of sorts 1506 01:12:05,600 --> 01:12:07,370 whose value can never change. 1507 01:12:07,370 --> 01:12:09,870 Once you set it, you cannot change it. 1508 01:12:09,870 --> 01:12:12,020 And that's a good thing because, one, it shouldn't 1509 01:12:12,020 --> 01:12:13,603 change in the context of this program. 1510 01:12:13,603 --> 01:12:16,460 And two, just in case you, the human, are fallible, 1511 01:12:16,460 --> 01:12:19,103 you don't want to accidentally change it when you don't intend. 1512 01:12:19,103 --> 01:12:21,020 So this is a feature of a programming language 1513 01:12:21,020 --> 01:12:23,850 that sort of protects you from yourself. 1514 01:12:23,850 --> 01:12:28,280 So now I can sort of take an amalgam of my instincts and Nina's and use 1515 01:12:28,280 --> 01:12:29,780 this variable, total. 1516 01:12:29,780 --> 01:12:32,570 And actually, another convention when declaring constants 1517 01:12:32,570 --> 01:12:35,960 is to capitalize them just to make visually clear that there's 1518 01:12:35,960 --> 01:12:38,820 something different or special about this variable. 1519 01:12:38,820 --> 01:12:42,320 So I'm going to change this to TOTAL, and I'm going to use that value here 1520 01:12:42,320 --> 01:12:45,260 and here and also down here. 
1521 01:12:45,260 --> 01:12:49,550 But I'm afraid both Nina and I have a little bit of cleanup here to do 1522 01:12:49,550 --> 01:12:54,680 in that I still have hard coded scores[0], scores[1], and scores[2]. 1523 01:12:54,680 --> 01:12:58,552 And I want to add a changing number of values together. 1524 01:12:58,552 --> 01:12:59,260 So you know what? 1525 01:12:59,260 --> 01:13:00,270 I've got an idea. 1526 01:13:00,270 --> 01:13:03,260 Let me go ahead and create a function that's 1527 01:13:03,260 --> 01:13:05,070 going to compute an average for me. 1528 01:13:05,070 --> 01:13:08,780 So if I want to create my own function that computes an average, 1529 01:13:08,780 --> 01:13:10,650 I want it to return a floating point value, 1530 01:13:10,650 --> 01:13:13,400 just so that we don't truncate any math. 1531 01:13:13,400 --> 01:13:15,020 I'm going to call this average. 1532 01:13:15,020 --> 01:13:18,560 And the input to this function is going to be the length 1533 01:13:18,560 --> 01:13:21,710 of an array and the actual array. 1534 01:13:21,710 --> 01:13:24,320 And this is the last piece of funky syntax for now. 1535 01:13:24,320 --> 01:13:31,700 It turns out that when you want to pass an array as input to a custom function, 1536 01:13:31,700 --> 01:13:35,360 you literally use those square brackets again, but you don't specify the size. 1537 01:13:35,360 --> 01:13:37,940 And the upside of this is that your function then 1538 01:13:37,940 --> 01:13:42,050 can support an array that's got one space in it, two spaces, three, 1539 01:13:42,050 --> 01:13:42,800 a hundred. 1540 01:13:42,800 --> 01:13:45,030 It's more dynamic this way. 1541 01:13:45,030 --> 01:13:47,090 So how do I compute an average here? 1542 01:13:47,090 --> 01:13:48,590 I can do this a few different ways. 1543 01:13:48,590 --> 01:13:50,975 But I think what was suggested earlier makes 1544 01:13:50,975 --> 01:13:52,850 sense, where I can do some kind of summation. 
1545 01:13:52,850 --> 01:13:54,898 So let me do int sum = 0. 1546 01:13:54,898 --> 01:13:57,440 Because how do you compute the average of a bunch of numbers? 1547 01:13:57,440 --> 01:14:00,170 Well, you add them all together, and you divide by the total. 1548 01:14:00,170 --> 01:14:01,670 Well, let's see how I might do that. 1549 01:14:01,670 --> 01:14:06,050 Let me do for int i gets 0, i less than-- 1550 01:14:06,050 --> 01:14:06,920 what should this be? 1551 01:14:06,920 --> 01:14:11,990 Well, if I'm being passed as this custom function the length of the array 1552 01:14:11,990 --> 01:14:16,280 and the actual array, I think I can iterate from i up to length, 1553 01:14:16,280 --> 01:14:18,870 and then i++ on each iteration. 1554 01:14:18,870 --> 01:14:20,720 And then on each iteration, I think I want 1555 01:14:20,720 --> 01:14:27,450 to do sum plus whatever is in the array's i-th location, so to speak. 1556 01:14:27,450 --> 01:14:31,160 So again, this is shorthand notation per last week for this. 1557 01:14:31,160 --> 01:14:38,330 Sum equals whatever sum is plus whatever is in location i of the array. 1558 01:14:38,330 --> 01:14:40,670 And once I've done all of that, I think what 1559 01:14:40,670 --> 01:14:47,400 I can do is return the total sum divided by the length of the array. 1560 01:14:47,400 --> 01:14:50,510 And what I like about this whole approach-- assuming my code's correct, 1561 01:14:50,510 --> 01:14:54,440 and I don't think it is just yet-- notice what I can do back up in main. 1562 01:14:54,440 --> 01:14:58,400 Now I can abstract away the notion of calculating an average 1563 01:14:58,400 --> 01:15:04,863 and just do something like this with this line of code here. 1564 01:15:04,863 --> 01:15:05,780 So what did I just do? 1565 01:15:05,780 --> 01:15:09,080 A lot's going on, but let's focus for a moment on line 14 here. 
1566 01:15:09,080 --> 01:15:12,230 On line 14, I'm still just printing the average of some floating point 1567 01:15:12,230 --> 01:15:13,230 placeholder. 1568 01:15:13,230 --> 01:15:17,570 But what I'm passing as input now is this function, average, 1569 01:15:17,570 --> 01:15:20,210 whose inputs are going to be TOTAL, which again is just 1570 01:15:20,210 --> 01:15:23,210 this constant at the very top-- oh, sorry, I goofed. 1571 01:15:23,210 --> 01:15:26,420 I should have capitalized it, which is just that constant at the very top. 1572 01:15:26,420 --> 01:15:29,840 And I'm passing in scores, which again, is just 1573 01:15:29,840 --> 01:15:32,750 this array of all of those scores. 1574 01:15:32,750 --> 01:15:36,180 Meanwhile, in the function, in the context of the function, 1575 01:15:36,180 --> 01:15:39,140 notice that the names of the inputs to a function 1576 01:15:39,140 --> 01:15:41,870 do not need to match the names of the variables being 1577 01:15:41,870 --> 01:15:43,410 passed into that function. 1578 01:15:43,410 --> 01:15:46,820 So even though in main, they're called TOTAL and scores, 1579 01:15:46,820 --> 01:15:48,890 in the context of my function, average, I 1580 01:15:48,890 --> 01:15:54,140 can call them x and y, a and b, or more generically, length and array. 1581 01:15:54,140 --> 01:15:56,660 I don't know what the array is, but it's an array of ints. 1582 01:15:56,660 --> 01:16:01,280 And I don't know how long it is, but that answer is going to be in length. 1583 01:16:01,280 --> 01:16:03,560 But there's still a bug here. 1584 01:16:03,560 --> 01:16:04,640 There's still a bug. 1585 01:16:04,640 --> 01:16:07,940 And if we ignore main for a moment, this is a subtle one. 1586 01:16:07,940 --> 01:16:11,840 Does anyone see a mistake that I've made probably for the third time 1587 01:16:11,840 --> 01:16:14,330 now over the past two weeks? 
1588 01:16:14,330 --> 01:16:18,490 What subtle mistake have I made here with my code 1589 01:16:18,490 --> 01:16:21,850 only in this average function? 1590 01:16:21,850 --> 01:16:23,740 This one's a little more subtle. 1591 01:16:23,740 --> 01:16:27,250 But the goal is to compute the average of a whole bunch of integers 1592 01:16:27,250 --> 01:16:28,460 and return the answer. 1593 01:16:28,460 --> 01:16:29,770 Nicholas? 1594 01:16:29,770 --> 01:16:33,940 NICHOLAS: You've declared the variable within the function. 1595 01:16:33,940 --> 01:16:37,420 DAVID MALAN: I've declared the variable within the function. 1596 01:16:37,420 --> 01:16:42,460 That's OK because I've declared my variable sum here, I think you mean. 1597 01:16:42,460 --> 01:16:45,430 But that's inside of the average function. 1598 01:16:45,430 --> 01:16:49,870 And I'm using sum inside of the outermost curly braces 1599 01:16:49,870 --> 01:16:50,890 that was defined. 1600 01:16:50,890 --> 01:16:52,300 And so that's OK. 1601 01:16:52,300 --> 01:16:53,590 That's OK. 1602 01:16:53,590 --> 01:16:56,650 Let's take another thought here. 1603 01:16:56,650 --> 01:17:00,085 Olivia, where might the bug still be? 1604 01:17:00,085 --> 01:17:01,960 OLIVIA: The return type's a float, but you're 1605 01:17:01,960 --> 01:17:03,425 returning an int divided by an int. 1606 01:17:03,425 --> 01:17:04,300 DAVID MALAN: Perfect. 1607 01:17:04,300 --> 01:17:06,400 So I again made that same stupid mistake that's 1608 01:17:06,400 --> 01:17:08,710 just going to get more obvious as time goes on 1609 01:17:08,710 --> 01:17:12,760 that if I want to do floating point arithmetic, just like the Ariane rocket 1610 01:17:12,760 --> 01:17:15,550 discussion, the Patriot missile-- like, these kinds of details 1611 01:17:15,550 --> 01:17:16,870 matter in a program.
1612 01:17:16,870 --> 01:17:18,790 Now it's correct because I'm actually going 1613 01:17:18,790 --> 01:17:22,000 to ensure that even though the context here 1614 01:17:22,000 --> 01:17:24,740 is much less important than those real-world contexts, 1615 01:17:24,740 --> 01:17:28,960 just computing some average of scores, I'm not going to accidentally truncate 1616 01:17:28,960 --> 01:17:30,212 any of my values. 1617 01:17:30,212 --> 01:17:32,170 So again, in the context here of this function, 1618 01:17:32,170 --> 01:17:34,540 average is just applying some of last week's principles. 1619 01:17:34,540 --> 01:17:35,500 I've got a variable. 1620 01:17:35,500 --> 01:17:36,310 I've got a loop. 1621 01:17:36,310 --> 01:17:39,070 And I'm doing some floating point arithmetic, ultimately. 1622 01:17:39,070 --> 01:17:42,790 And I'm now creating a function that takes two inputs. 1623 01:17:42,790 --> 01:17:44,890 One is length, and one is the length-- 1624 01:17:44,890 --> 01:17:48,100 one is the array itself, and the return type, as Olivia notes, 1625 01:17:48,100 --> 01:17:51,790 is a float so that my output is also well defined. 1626 01:17:51,790 --> 01:17:53,590 But what's nice about this is, again, you 1627 01:17:53,590 --> 01:17:55,660 can think of these functions as abstractions. 1628 01:17:55,660 --> 01:18:00,760 Now I don't need to worry about how I calculate an average because I now 1629 01:18:00,760 --> 01:18:03,400 have this helper function, a custom function 1630 01:18:03,400 --> 01:18:05,930 I wrote that can help me answer that question. 1631 01:18:05,930 --> 01:18:09,010 And here, notice that the output of this average function 1632 01:18:09,010 --> 01:18:12,842 will become an input into printf. 
1633 01:18:12,842 --> 01:18:15,050 And the only other feature I've added to the mix here 1634 01:18:15,050 --> 01:18:18,380 now are not only arrays, which allow us to create 1635 01:18:18,380 --> 01:18:21,650 multiple variables, a variable number of variables, if you will, 1636 01:18:21,650 --> 01:18:23,420 but also this notion of a constant. 1637 01:18:23,420 --> 01:18:26,960 If I find myself using the same number again and again and again, 1638 01:18:26,960 --> 01:18:29,570 this constant can help me keep my code clean. 1639 01:18:29,570 --> 01:18:30,440 And notice this. 1640 01:18:30,440 --> 01:18:33,710 If next year, maybe another semester, there's four scores or four tests, 1641 01:18:33,710 --> 01:18:34,940 I change it in one place. 1642 01:18:34,940 --> 01:18:35,870 I recompile. 1643 01:18:35,870 --> 01:18:37,610 Boom, I'm done. 1644 01:18:37,610 --> 01:18:39,980 A well-designed program does not require that you 1645 01:18:39,980 --> 01:18:43,130 go reading through the entirety of it, fixing numbers here, numbers there. 1646 01:18:43,130 --> 01:18:46,010 Changing it in one place can allow me to improve this program, 1647 01:18:46,010 --> 01:18:49,520 make it support four tests next year instead of just the three. 1648 01:18:49,520 --> 01:18:52,760 But better still would be to take, I think, 1649 01:18:52,760 --> 01:18:56,900 Nina's advice before, which was to maybe just use get_int and ask the human 1650 01:18:56,900 --> 01:18:58,910 for how many tests they actually have. 1651 01:18:58,910 --> 01:19:00,562 That too would work. 1652 01:19:00,562 --> 01:19:02,270 Well, let me pause here to see if there's 1653 01:19:02,270 --> 01:19:07,460 any questions then about arrays or about constants 1654 01:19:07,460 --> 01:19:13,770 or passing them around as inputs and outputs in this way. 1655 01:19:13,770 --> 01:19:16,740 Yeah, over to Sophia. 
1656 01:19:16,740 --> 01:19:21,570 SOPHIA: I had question about the use of float and why the use of one float 1657 01:19:21,570 --> 01:19:23,790 causes the whole output to be a float. 1658 01:19:23,790 --> 01:19:24,870 Why does that occur? 1659 01:19:24,870 --> 01:19:25,920 DAVID MALAN: Yeah, really good question. 1660 01:19:25,920 --> 01:19:27,340 That's just how C behaves. 1661 01:19:27,340 --> 01:19:30,840 So long as there is one or more floating point values involved 1662 01:19:30,840 --> 01:19:35,820 in a mathematical formula, it is going to use that data type, which 1663 01:19:35,820 --> 01:19:39,610 is the more powerful one, if you will, rather than risk truncating anything. 1664 01:19:39,610 --> 01:19:41,970 So you just need one float to be participating 1665 01:19:41,970 --> 01:19:44,490 in the formula in question. 1666 01:19:44,490 --> 01:19:45,900 Good question. 1667 01:19:45,900 --> 01:19:53,550 Other questions on arrays or constants or this passing around of them? 1668 01:19:53,550 --> 01:19:57,150 Yeah, over to Alexandra. 1669 01:19:57,150 --> 01:20:03,240 ALEXANDRA: I have a question about the declaring of the array, scores. 1670 01:20:03,240 --> 01:20:08,370 When you declared it in main, you said int scores. 1671 01:20:08,370 --> 01:20:11,890 And in the brackets, you have TOTAL. 1672 01:20:11,890 --> 01:20:16,313 Can you declare it without the TOTAL-- 1673 01:20:16,313 --> 01:20:17,730 DAVID MALAN: Really good question. 1674 01:20:17,730 --> 01:20:18,600 ALEXANDRA: --only the brackets? 1675 01:20:18,600 --> 01:20:19,530 DAVID MALAN: Short answer, no. 1676 01:20:19,530 --> 01:20:21,940 So the way I did it is the way you do have to do it. 1677 01:20:21,940 --> 01:20:25,810 And in fact, if I highlight what I did here, now it currently says TOTAL. 
1678 01:20:25,810 --> 01:20:29,400 If I get rid of that, and I go back to our first version where I said 1679 01:20:29,400 --> 01:20:36,360 something like 3 and 3 and 3 over here, you cannot do this, which I think, 1680 01:20:36,360 --> 01:20:38,010 Alexandra, is what you were proposing. 1681 01:20:38,010 --> 01:20:41,640 The computer needs to know how big the array is when you are creating it. 1682 01:20:41,640 --> 01:20:44,160 The exception to that is that when you're 1683 01:20:44,160 --> 01:20:47,070 passing an array from one function to another, 1684 01:20:47,070 --> 01:20:49,350 you do not need to tell that custom function 1685 01:20:49,350 --> 01:20:51,990 how big the array is because, again, you don't know in advance. 1686 01:20:51,990 --> 01:20:55,410 You're writing a fairly generic, dynamic function whose purpose in life 1687 01:20:55,410 --> 01:21:00,750 is to take any array as input of integers and any length 1688 01:21:00,750 --> 01:21:05,640 and respond accordingly with an average that matches the size of that thing. 1689 01:21:05,640 --> 01:21:09,870 And those of you, as an aside, who have programmed before, especially in Java, 1690 01:21:09,870 --> 01:21:13,890 unlike in Java and certain other languages, the length of an array 1691 01:21:13,890 --> 01:21:16,320 is not built into the array itself. 1692 01:21:16,320 --> 01:21:20,590 If you do not pass in the length of an array to another function, 1693 01:21:20,590 --> 01:21:24,280 there is no way to determine how big the array is. 1694 01:21:24,280 --> 01:21:26,850 This is different from Java and other languages, 1695 01:21:26,850 --> 01:21:29,880 where you can ask the array, in some sense, what is its length. 1696 01:21:29,880 --> 01:21:32,490 In C, you have to pass both the array itself 1697 01:21:32,490 --> 01:21:35,610 and its length around separately. [? Sina? ?] 1698 01:21:35,610 --> 01:21:38,880 [? SINA: ?] 
I just-- I'm still a little bit confused about how, 1699 01:21:38,880 --> 01:21:44,740 when we write that second command, when is it void in the parentheses? 1700 01:21:44,740 --> 01:21:47,520 And when do we define the int? 1701 01:21:47,520 --> 01:21:50,918 Because as I remember when we did the-- 1702 01:21:50,918 --> 01:21:53,460 get a negative number, we get a positive number, it was void, 1703 01:21:53,460 --> 01:21:55,500 but we still kind of gave it an input. 1704 01:21:55,500 --> 01:21:57,705 I'm just not completely sold on that. 1705 01:21:57,705 --> 01:21:59,080 DAVID MALAN: Sure, good question. 1706 01:21:59,080 --> 01:22:01,570 Let me go ahead and open up that previous example, 1707 01:22:01,570 --> 01:22:04,960 which was a little buggy, but it has the right syntax here. 1708 01:22:04,960 --> 01:22:07,620 So here was the get_negative_int function from before. 1709 01:22:07,620 --> 01:22:10,620 And, [? Sina, ?] you know it was void as input. 1710 01:22:10,620 --> 01:22:13,170 So there was one comment you made where it still took input. 1711 01:22:13,170 --> 01:22:14,010 That was not so. 1712 01:22:14,010 --> 01:22:17,070 So get_negative_int did not take any input. 1713 01:22:17,070 --> 01:22:19,620 And case in point, if we scroll up to main, 1714 01:22:19,620 --> 01:22:22,530 notice that when I called it on line 10, I 1715 01:22:22,530 --> 01:22:25,920 said get_negative_int, open parenthesis, close parenthesis, 1716 01:22:25,920 --> 01:22:29,040 with no inputs inside of those parentheses. 
1717 01:22:29,040 --> 01:22:32,220 This keyword "void," which we've seen a few times now last week 1718 01:22:32,220 --> 01:22:35,880 and this week, is just an explicit keyword in C that says, 1719 01:22:35,880 --> 01:22:41,340 do not put anything here, which is to say, it would be incorrect for me up 1720 01:22:41,340 --> 01:22:44,970 here to do something like this, like to pass in a number, 1721 01:22:44,970 --> 01:22:48,990 or to pass in a prompt, or anything inside of those parentheses. 1722 01:22:48,990 --> 01:22:51,630 The fact that this function, get_negative_int 1723 01:22:51,630 --> 01:22:56,340 takes void as its input means it does not take any inputs whatsoever. 1724 01:22:56,340 --> 01:22:56,942 That's fine. 1725 01:22:56,942 --> 01:22:59,400 For get_negative_int, the name of the function says it all. 1726 01:22:59,400 --> 01:23:02,367 Like, there's no need to parameterize or customize 1727 01:23:02,367 --> 01:23:04,200 the behavior of getting negative int itself. 1728 01:23:04,200 --> 01:23:06,180 You just want to get a negative int. 1729 01:23:06,180 --> 01:23:09,300 By contrast, though, with the function we just wrote, 1730 01:23:09,300 --> 01:23:14,940 average, this function does make conceptual sense to take inputs, 1731 01:23:14,940 --> 01:23:17,490 because you can't just say, give me the average. 1732 01:23:17,490 --> 01:23:18,930 Like, average of what? 1733 01:23:18,930 --> 01:23:22,110 Like, it needs to take input so as to answer that question for you. 1734 01:23:22,110 --> 01:23:24,840 And the input, in this case, is the array itself of numbers 1735 01:23:24,840 --> 01:23:28,425 and the length of that array so you can do the arithmetic. 1736 01:23:28,425 --> 01:23:31,050 And so, [? Sina, ?] hopefully, that helps make the distinction. 1737 01:23:31,050 --> 01:23:33,930 You use void when you don't want to take input. 
1738 01:23:33,930 --> 01:23:38,340 And you actually specify a comma-separated list of arguments 1739 01:23:38,340 --> 01:23:42,000 when you do want to take input. 1740 01:23:42,000 --> 01:23:46,170 All right, so we focused up until now on integers, really. 1741 01:23:46,170 --> 01:23:49,020 But let's simplify a little bit because it turns out 1742 01:23:49,020 --> 01:23:52,020 that arrays and memory actually intersect 1743 01:23:52,020 --> 01:23:55,740 to create some very familiar features of most any computer program, namely 1744 01:23:55,740 --> 01:23:57,970 text or strings more generally. 1745 01:23:57,970 --> 01:24:03,010 So suppose we simplify further, no more integers, no more arrays of integers. 1746 01:24:03,010 --> 01:24:05,490 Let's just start for a moment with a single character 1747 01:24:05,490 --> 01:24:09,840 and write a program that just creates a single brick from that Mario game. 1748 01:24:09,840 --> 01:24:13,540 Let me go ahead and create a program here called brick.c. 1749 01:24:13,540 --> 01:24:15,900 And in brick.c, I'm just going to #include 1750 01:24:15,900 --> 01:24:21,570 stdio.h, int main(void). And more on this void a little later today. 1751 01:24:21,570 --> 01:24:25,170 Char c gets, quote unquote, '#'. 1752 01:24:25,170 --> 01:24:29,730 And then down here, let me just go ahead and print very simply a placeholder, 1753 01:24:29,730 --> 01:24:32,800 %c, backslash n, and then output c. 1754 01:24:32,800 --> 01:24:34,380 So this is a pretty stupid program. 1755 01:24:34,380 --> 01:24:37,530 Its sole purpose in life is to print a single hash 1756 01:24:37,530 --> 01:24:41,940 as you might have in a Mario pyramid of height 1, so very simple. 1757 01:24:41,940 --> 01:24:44,040 Let me go ahead and make brick. 1758 01:24:44,040 --> 01:24:45,480 It seems to compile OK. 1759 01:24:45,480 --> 01:24:47,040 Let me run it with ./brick. 1760 01:24:47,040 --> 01:24:48,750 And voila, we get a single brick.
1761 01:24:48,750 --> 01:24:54,150 But let's consider for just a moment exactly what just happened here 1762 01:24:54,150 --> 01:24:58,237 and what actually was going on underneath the hood. 1763 01:24:58,237 --> 01:24:59,070 Well, you know what? 1764 01:24:59,070 --> 01:25:00,030 I'm kind of curious. 1765 01:25:00,030 --> 01:25:03,990 I remember from last week, we could cast values from one thing to another. 1766 01:25:03,990 --> 01:25:07,290 What if I got a little curious, and I didn't print out c, 1767 01:25:07,290 --> 01:25:12,480 which is this hash character, as %c, which is a placeholder for a character? 1768 01:25:12,480 --> 01:25:15,250 What if I got a little crazy and said %i? 1769 01:25:15,250 --> 01:25:21,370 I think I could probably coerce this char by casting it to an int 1770 01:25:21,370 --> 01:25:23,830 so I can see its decimal equivalent. 1771 01:25:23,830 --> 01:25:25,960 I could see its actual ASCII code. 1772 01:25:25,960 --> 01:25:28,350 So let me rebuild this with make brick. 1773 01:25:28,350 --> 01:25:30,330 Now let me do ./brick. 1774 01:25:30,330 --> 01:25:32,430 And what number might we see? 1775 01:25:32,430 --> 01:25:36,840 Last week, we saw 72 a lot, 73, and 33 for "HI!" 1776 01:25:36,840 --> 01:25:39,000 This week, you can see 35. 1777 01:25:39,000 --> 01:25:43,140 It turns out it's the code for an ASCII hash. 1778 01:25:43,140 --> 01:25:47,730 And you can see this, for instance, if I go to a website like-- 1779 01:25:47,730 --> 01:25:52,020 let's go to asciichart.com. 1780 01:25:52,020 --> 01:25:55,170 And sure enough, if I go to the same chart from last week, 1781 01:25:55,170 --> 01:25:58,560 and I look for the hash symbol here, its ASCII code is 35.
1782 01:25:58,560 --> 01:26:02,340 And it turns out, in C, if it's pretty straightforward to the computer 1783 01:26:02,340 --> 01:26:05,390 that, yes, if this is a character, I know I can convert it to an int, 1784 01:26:05,390 --> 01:26:07,440 you don't have to explicitly cast it. 1785 01:26:07,440 --> 01:26:12,990 You can instead implicitly cast one data type to another just from context here. 1786 01:26:12,990 --> 01:26:16,950 So printf and C are smart enough here to know, OK, you're giving me 1787 01:26:16,950 --> 01:26:19,050 a character in the form of variable c. 1788 01:26:19,050 --> 01:26:23,083 But you want to display it as a %i, an integer. 1789 01:26:23,083 --> 01:26:24,000 That's going to be OK. 1790 01:26:24,000 --> 01:26:25,990 And indeed, I still see the number 35. 1791 01:26:25,990 --> 01:26:27,392 So that's just simple casting. 1792 01:26:27,392 --> 01:26:29,850 But let's now put this into the context of today's picture. 1793 01:26:29,850 --> 01:26:31,372 How is that character laid out? 1794 01:26:31,372 --> 01:26:33,330 Well, quite simply, if this is my memory again, 1795 01:26:33,330 --> 01:26:36,150 and we've gotten rid of all of the numbers, c, 1796 01:26:36,150 --> 01:26:41,250 otherwise storing this hash, is just being stored in one of these bytes. 1797 01:26:41,250 --> 01:26:47,370 It only requires one square because, again, a char is a single byte. 1798 01:26:47,370 --> 01:26:52,240 But equivalently, 35 is the number that's actually being stored there. 1799 01:26:52,240 --> 01:26:53,790 But I wonder, I wonder. 1800 01:26:53,790 --> 01:26:55,890 Last week, we spent quite a bit of time storing 1801 01:26:55,890 --> 01:27:01,060 not just single characters, but actual words like "hi" and other expressions. 1802 01:27:01,060 --> 01:27:03,490 And so what if I were to do something like this? 1803 01:27:03,490 --> 01:27:04,960 Let me go back to my code. 1804 01:27:04,960 --> 01:27:07,530 And let me not quite yet practice what I just preached. 
1805 01:27:07,530 --> 01:27:11,910 And let me give myself three variables this time-- c1, c2, and c3. 1806 01:27:11,910 --> 01:27:16,980 And let me deliberately store in those three variables H, I, in all caps, 1807 01:27:16,980 --> 01:27:18,720 followed by an exclamation point. 1808 01:27:18,720 --> 01:27:22,170 And per last week, when you're dealing with individual characters, 1809 01:27:22,170 --> 01:27:24,630 you must, in C, use single quotes. 1810 01:27:24,630 --> 01:27:26,520 When you're dealing with multiple characters, 1811 01:27:26,520 --> 01:27:29,080 otherwise known last week as strings, use double quotes. 1812 01:27:29,080 --> 01:27:31,830 But that's why I'm using single quotes, because we're only playing 1813 01:27:31,830 --> 01:27:34,060 at the moment with single characters. 1814 01:27:34,060 --> 01:27:37,080 Now let me go ahead and print these values out. 1815 01:27:37,080 --> 01:27:43,320 Let me print out %c, %c, %c, and output c1, c2, c3. 1816 01:27:43,320 --> 01:27:49,590 So this is perhaps the stupidest way you could print out a full word like "HI!" 1817 01:27:49,590 --> 01:27:54,360 in C by storing every single character in its own variable, but so be it. 1818 01:27:54,360 --> 01:27:57,090 I'm just using these first principles here. 1819 01:27:57,090 --> 01:27:58,493 I'm using %c as my placeholder. 1820 01:27:58,493 --> 01:27:59,910 I'm printing out these characters. 1821 01:27:59,910 --> 01:28:01,950 So let me do make brick now. 1822 01:28:01,950 --> 01:28:03,000 Compiles OK. 1823 01:28:03,000 --> 01:28:04,678 And if I do a dot slash-- 1824 01:28:04,678 --> 01:28:06,720 you know, I really should have renamed this file, 1825 01:28:06,720 --> 01:28:08,095 but we'll rename it in a moment-- 1826 01:28:08,095 --> 01:28:09,630 ./brick, "HI!" 1827 01:28:09,630 --> 01:28:11,190 And let me go ahead and do this. 1828 01:28:11,190 --> 01:28:14,490 Let me go ahead now and actually close the file. 
1829 01:28:14,490 --> 01:28:18,820 And recall from last week, if I want to rename my file from brick.c, 1830 01:28:18,820 --> 01:28:22,620 let's say, to hi.c, I can use the move command, mv. 1831 01:28:22,620 --> 01:28:26,730 And now if I open up this file, sure enough, there's hi.c. 1832 01:28:26,730 --> 01:28:29,850 And I've fixed my renaming mistake. 1833 01:28:29,850 --> 01:28:35,040 All right, so again, if I now do make hi, and I do ./hi, voila, 1834 01:28:35,040 --> 01:28:36,000 I see the "HI!" 1835 01:28:36,000 --> 01:28:40,052 But again, this is kind of a stupid way of implementing a string. 1836 01:28:40,052 --> 01:28:41,760 But let's still look underneath the hood. 1837 01:28:41,760 --> 01:28:43,093 Let me go ahead and get curious. 1838 01:28:43,093 --> 01:28:46,311 Let me print out %i, %i, and %i. 1839 01:28:46,311 --> 01:28:48,480 And let me include spaces this time just so I 1840 01:28:48,480 --> 01:28:51,700 can see separation between the numbers. 1841 01:28:51,700 --> 01:28:54,750 Let me make hi again, ./hi. 1842 01:28:54,750 --> 01:28:56,760 OK, there's that 72. 1843 01:28:56,760 --> 01:28:57,900 There's that 73. 1844 01:28:57,900 --> 01:29:00,340 And there's that 33 from last week. 1845 01:29:00,340 --> 01:29:01,653 So that's interesting too. 1846 01:29:01,653 --> 01:29:04,320 So what's going on underneath the hood in the computer's memory? 1847 01:29:04,320 --> 01:29:06,237 Well, when I'm storing these three characters, 1848 01:29:06,237 --> 01:29:11,040 now I'm just storing them in three different boxes, so c1, c2, c3. 1849 01:29:11,040 --> 01:29:14,970 And when you look at it collectively, it kind of looks like a whole word 1850 01:29:14,970 --> 01:29:17,610 even though it's, of course, just these individual characters. 1851 01:29:17,610 --> 01:29:20,850 So what's underneath the hood, of course, though, is 72, 73, 33. 1852 01:29:20,850 --> 01:29:23,160 Or equivalently, in binary, just this.
1853 01:29:23,160 --> 01:29:25,410 So the story is the same even though we're now talking 1854 01:29:25,410 --> 01:29:28,540 about chars instead of integers. 1855 01:29:28,540 --> 01:29:31,110 But what happens when I do this? 1856 01:29:31,110 --> 01:29:35,040 What happens when I do string s gets, quote unquote, "HI!" 1857 01:29:35,040 --> 01:29:36,450 using double quotes? 1858 01:29:36,450 --> 01:29:38,850 Well, let's change this program accordingly. 1859 01:29:38,850 --> 01:29:42,390 Let me go ahead and do what we would have done last week, string-- 1860 01:29:42,390 --> 01:29:44,760 I'll call it s just for s for string-- 1861 01:29:44,760 --> 01:29:45,300 "HI!" 1862 01:29:45,300 --> 01:29:46,410 in all caps. 1863 01:29:46,410 --> 01:29:47,925 I can simplify this next line. 1864 01:29:47,925 --> 01:29:52,170 I'm going to use %s as a placeholder for string s. 1865 01:29:52,170 --> 01:29:54,300 But let's, for now, reveal what a string really 1866 01:29:54,300 --> 01:29:55,800 is, because string is a term of art. 1867 01:29:55,800 --> 01:29:59,370 Every programming language has "strings" even if it doesn't technically 1868 01:29:59,370 --> 01:30:01,260 have a data type called string. 1869 01:30:01,260 --> 01:30:04,560 C does not technically have a data type called string. 1870 01:30:04,560 --> 01:30:08,850 We have added this type to C by way of CS50's library. 1871 01:30:08,850 --> 01:30:12,720 But now if I do make hi, notice that my code compiles OK. 1872 01:30:12,720 --> 01:30:17,230 And if I do ./hi Enter, voila, I still see "HI!", 1873 01:30:17,230 --> 01:30:19,570 which is what I would have seen last week as well. 1874 01:30:19,570 --> 01:30:23,700 And if we depict this in the computer's memory, because "HI!" is three letters, 1875 01:30:23,700 --> 01:30:26,040 it's kind of like saying, well, give me three boxes, 1876 01:30:26,040 --> 01:30:27,930 and let me call this string s. 
1877 01:30:27,930 --> 01:30:30,510 So this feels like a reasonable artist's rendition 1878 01:30:30,510 --> 01:30:35,070 of what s is if it's storing a three-letter word like "HI!" 1879 01:30:35,070 --> 01:30:39,840 But any time we have sequences of characters like this, 1880 01:30:39,840 --> 01:30:44,190 I feel like we're now seeing the capability of a proper programming 1881 01:30:44,190 --> 01:30:44,760 language. 1882 01:30:44,760 --> 01:30:48,250 We introduced a little bit ago the notion of a string. 1883 01:30:48,250 --> 01:30:52,190 So maybe could someone redefine string as we've 1884 01:30:52,190 --> 01:30:56,360 been using it in terms of some of today's nomenclature? 1885 01:30:56,360 --> 01:30:57,860 Like, what is a string? 1886 01:30:57,860 --> 01:31:02,730 There's an example of one, "HI!", taking up three boxes. 1887 01:31:02,730 --> 01:31:06,720 But how did we, CS50 maybe implement string underneath the hood, 1888 01:31:06,720 --> 01:31:09,140 would you say? 1889 01:31:09,140 --> 01:31:09,650 What is it? 1890 01:31:09,650 --> 01:31:11,270 Tucker? 1891 01:31:11,270 --> 01:31:14,848 TUCKER: Well, it's an array of characters and integers. 1892 01:31:14,848 --> 01:31:16,640 Well, it's integers are used in the string, 1893 01:31:16,640 --> 01:31:19,575 but it's an array of basically single characters. 1894 01:31:19,575 --> 01:31:20,450 DAVID MALAN: Perfect. 1895 01:31:20,450 --> 01:31:22,640 If we now have the ability to express-- 1896 01:31:22,640 --> 01:31:23,810 very nicely done, Tucker. 1897 01:31:23,810 --> 01:31:27,560 If we now have the ability to represent sequences of things, integers, 1898 01:31:27,560 --> 01:31:29,360 for instance, like scores, well, it stands 1899 01:31:29,360 --> 01:31:33,410 to reason that we can take another primitive, a very basic data type 1900 01:31:33,410 --> 01:31:34,340 like a char. 
1901 01:31:34,340 --> 01:31:38,030 And if we want to spell things with those chars, like English words, 1902 01:31:38,030 --> 01:31:41,180 well, let's just think of a string really as an array 1903 01:31:41,180 --> 01:31:43,820 of characters, an array of chars. 1904 01:31:43,820 --> 01:31:47,850 And indeed, that's exactly what string actually is. 1905 01:31:47,850 --> 01:31:54,180 So this thing here, "HI!", technically speaking is an array called s. 1906 01:31:54,180 --> 01:31:57,080 And this is s[0]. This is s[1]. 1907 01:31:57,080 --> 01:31:58,310 This is s[2]. 1908 01:31:58,310 --> 01:31:59,878 It's just an array called s. 1909 01:31:59,878 --> 01:32:01,670 Now, we didn't use the word array last week 1910 01:32:01,670 --> 01:32:04,610 because it's not as familiar as the notion of a "string of text," 1911 01:32:04,610 --> 01:32:05,570 for instance. 1912 01:32:05,570 --> 01:32:08,720 But a string is apparently just an array. 1913 01:32:08,720 --> 01:32:12,380 And if it's an array, that means we can access, if we want to, 1914 01:32:12,380 --> 01:32:16,610 the individual characters of that array by way of the square bracket 1915 01:32:16,610 --> 01:32:18,170 notation from today. 1916 01:32:18,170 --> 01:32:23,180 But it turns out there's something a little special about strings 1917 01:32:23,180 --> 01:32:24,440 as they're implemented. 1918 01:32:24,440 --> 01:32:28,190 Recall in our example involving scores, the only way 1919 01:32:28,190 --> 01:32:32,930 we knew how long that array was was because I 1920 01:32:32,930 --> 01:32:36,740 had a second variable called length or TOTAL 1921 01:32:36,740 --> 01:32:41,900 that stored the total number of integers in that array. 1922 01:32:41,900 --> 01:32:44,480 That is to say in our scores example, not only did we 1923 01:32:44,480 --> 01:32:45,860 allocate the array itself. 1924 01:32:45,860 --> 01:32:51,390 We also kept track of how many things were in that array with two variables.
1925 01:32:51,390 --> 01:32:56,810 However, up until now, every time you and I have used the printf function, 1926 01:32:56,810 --> 01:33:01,040 and we have passed to that printf function a string like s, 1927 01:33:01,040 --> 01:33:05,420 we have only provided printf with the string itself. 1928 01:33:05,420 --> 01:33:08,030 Or logically, we have only provided printf 1929 01:33:08,030 --> 01:33:11,670 with the array of characters itself. 1930 01:33:11,670 --> 01:33:17,870 And yet somehow, printf is magically figuring out how long the string is. 1931 01:33:17,870 --> 01:33:20,660 After all, when printf prints the value of s, 1932 01:33:20,660 --> 01:33:23,780 it is printing H, I, exclamation point, and that's it. 1933 01:33:23,780 --> 01:33:27,643 It's not going and printing 4 characters or 5 or 20, right? 1934 01:33:27,643 --> 01:33:30,560 It stands to reason that there's other stuff in your computer's memory 1935 01:33:30,560 --> 01:33:32,960 if you've got other variables or other programs running. 1936 01:33:32,960 --> 01:33:35,480 Yet printf seems to be smart enough to know, 1937 01:33:35,480 --> 01:33:39,320 given an array, how long the array is because, quite simply, it 1938 01:33:39,320 --> 01:33:42,480 only prints out that single word. 1939 01:33:42,480 --> 01:33:48,440 So how then does a computer know where a string ends in memory if all a string 1940 01:33:48,440 --> 01:33:49,910 is is a sequence of characters? 1941 01:33:49,910 --> 01:33:54,500 Well, it turns out that if your string is length 3, as is this one, H, I, 1942 01:33:54,500 --> 01:34:00,680 exclamation point, technically a string, implemented underneath the hood, 1943 01:34:00,680 --> 01:34:02,390 uses 4 bytes. 1944 01:34:02,390 --> 01:34:04,280 It uses 4 bytes. 1945 01:34:04,280 --> 01:34:07,760 It uses a fourth byte to be initialized to what 1946 01:34:07,760 --> 01:34:11,850 we would describe as backslash 0, which is a weird way of describing it. 
1947 01:34:11,850 --> 01:34:14,870 But this just represents a special character, otherwise known 1948 01:34:14,870 --> 01:34:18,890 as the null character, which is just a special value that 1949 01:34:18,890 --> 01:34:20,880 represents the end of a string. 1950 01:34:20,880 --> 01:34:23,960 So that is to say when you create a string, quote 1951 01:34:23,960 --> 01:34:26,750 unquote with double quotes, "HI!"-- 1952 01:34:26,750 --> 01:34:28,400 yes, the string is length 3. 1953 01:34:28,400 --> 01:34:31,580 But you're wasting or spending 4 total bytes on it. 1954 01:34:31,580 --> 01:34:32,240 Why? 1955 01:34:32,240 --> 01:34:36,380 Because this is a clue to the computer as to where "HI!" 1956 01:34:36,380 --> 01:34:39,800 ends and where the next string maybe begins. 1957 01:34:39,800 --> 01:34:43,010 It is not sufficient to just start printing characters inside 1958 01:34:43,010 --> 01:34:45,117 of printf one at a time, left to right. 1959 01:34:45,117 --> 01:34:47,450 There needs to be this sort of equivalent of a stop sign 1960 01:34:47,450 --> 01:34:50,150 at the end of the string, saying, that's it for this string. 1961 01:34:50,150 --> 01:34:51,540 Well, what are these values? 1962 01:34:51,540 --> 01:34:53,290 Well, let's convert them back to decimal-- 1963 01:34:53,290 --> 01:34:54,800 72, 73, 33. 1964 01:34:54,800 --> 01:35:00,560 That fancy backslash 0 was just a way of saying, in character form, it's 0. 1965 01:35:00,560 --> 01:35:06,740 More specifically, it is eight 0 bits inside of that square. 1966 01:35:06,740 --> 01:35:09,470 So to store a string, the computer, unbeknownst to you, 1967 01:35:09,470 --> 01:35:15,260 has been using one extra byte, all 0 bits, otherwise written as backslash 0, 1968 01:35:15,260 --> 01:35:19,340 but otherwise known as literally the value 0. 1969 01:35:19,340 --> 01:35:23,180 So this thing, otherwise colloquially known as null, 1970 01:35:23,180 --> 01:35:24,685 is just a special character.
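The four bytes described above can be inspected directly. The following is only a minimal sketch: the function name `show_hi_bytes` is made up for illustration, and a plain `char` array stands in for CS50's `string` type so that the code compiles without `cs50.h`.

```c
#include <stdio.h>

// Hypothetical helper: prints the bytes that make up the string "HI!".
void show_hi_bytes(void)
{
    // The double quotes reserve 4 bytes: 'H', 'I', '!' and the NUL byte '\0'.
    char s[] = "HI!";

    // The three visible characters, accessed with square bracket notation.
    printf("%c %c %c\n", s[0], s[1], s[2]);

    // The same bytes as decimal ASCII codes, plus the fourth (NUL) byte.
    printf("%i %i %i %i\n", s[0], s[1], s[2], s[3]);

    // The total number of bytes the array occupies, terminator included.
    printf("%zu\n", sizeof(s));
}
```

Called from `main`, this should print `H I !`, then `72 73 33 0`, then `4`, matching the values read off the ASCII chart.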
1971 01:35:24,685 --> 01:35:26,060 And we can actually see it again. 1972 01:35:26,060 --> 01:35:30,260 If I go back to my asciichart.com from before, 1973 01:35:30,260 --> 01:35:35,480 notice number 0 is known as NUL, N-U-L in all caps. 1974 01:35:35,480 --> 01:35:40,580 All right, so with that said, what is powerful then about strings 1975 01:35:40,580 --> 01:35:42,060 once we have this capability? 1976 01:35:42,060 --> 01:35:43,640 Well, let me go ahead and do this. 1977 01:35:43,640 --> 01:35:46,130 Let me go back into my code from a moment ago. 1978 01:35:46,130 --> 01:35:48,830 And let me go ahead and enhance this program a little bit 1979 01:35:48,830 --> 01:35:51,965 just to get a little curious as to what's going on. 1980 01:35:51,965 --> 01:35:53,250 You know what I can do? 1981 01:35:53,250 --> 01:35:57,200 I bet what I can do here in this version here is this. 1982 01:35:57,200 --> 01:35:57,800 You know what? 1983 01:35:57,800 --> 01:36:00,440 If I want to print out all of these characters of s, 1984 01:36:00,440 --> 01:36:06,590 I can get a little curious again and print out %c, %c, %c. 1985 01:36:06,590 --> 01:36:11,340 And if s is an array, per today's syntax, I can technically do s[0], 1986 01:36:11,340 --> 01:36:14,940 s[1], s[2]. 1987 01:36:14,940 --> 01:36:21,720 And then if I save this, recompile my code with make hi, OK, ./hi, 1988 01:36:21,720 --> 01:36:23,070 I still see "HI!" 1989 01:36:23,070 --> 01:36:23,820 But you know what? 1990 01:36:23,820 --> 01:36:25,195 Let me get a little more curious. 1991 01:36:25,195 --> 01:36:28,740 Let me use %i so I can actually see those ASCII codes. 1992 01:36:28,740 --> 01:36:31,950 Let me go ahead and recompile with make hi, ./hi. 1993 01:36:31,950 --> 01:36:35,190 There's the 72, 73, 33. 1994 01:36:35,190 --> 01:36:37,090 Now let me get even more curious. 
1995 01:36:37,090 --> 01:36:42,270 Let me print a fourth value like this here, s[3], 1996 01:36:42,270 --> 01:36:44,430 which is the fourth location, mind you. 1997 01:36:44,430 --> 01:36:50,850 So if I now do make hi and ./hi, voila, now you see 0. 1998 01:36:50,850 --> 01:36:55,110 And what this hints at is actually a very dangerous feature of C. You know, 1999 01:36:55,110 --> 01:36:57,750 suppose I'm curious at seeing what's beyond that. 2000 01:36:57,750 --> 01:37:01,290 I could technically do s[4], the fifth location, 2001 01:37:01,290 --> 01:37:04,830 even though according to my picture, there really shouldn't be anything 2002 01:37:04,830 --> 01:37:08,010 at the fifth location, at least not that I know about just yet. 2003 01:37:08,010 --> 01:37:10,980 But I can do it in C. Nothing's stopping me. 2004 01:37:10,980 --> 01:37:13,710 So let me do make hi, ./hi. 2005 01:37:13,710 --> 01:37:15,490 And that's interesting. 2006 01:37:15,490 --> 01:37:17,560 Apparently there's the number 37. 2007 01:37:17,560 --> 01:37:19,110 What is the number 37? 2008 01:37:19,110 --> 01:37:21,150 Well, let me go back to my ASCII chart. 2009 01:37:21,150 --> 01:37:25,102 And let me conclude that number 37 is a percent sign. 2010 01:37:25,102 --> 01:37:28,060 So that's kind of weird because I didn't print out an explicit percent. 2011 01:37:28,060 --> 01:37:31,290 Now I'm kind of poking around the computer's memory in places 2012 01:37:31,290 --> 01:37:33,370 I shouldn't be looking, in some sense. 2013 01:37:33,370 --> 01:37:36,510 In fact, if I get really curious, let's look not at location 4. 2014 01:37:36,510 --> 01:37:40,140 How about location 40, like way off into that picture? 2015 01:37:40,140 --> 01:37:44,400 Make hi, ./hi, 24, whatever that is. 2016 01:37:44,400 --> 01:37:52,470 I can look at location 400, recompile my code, make hi, ./hi. 2017 01:37:52,470 --> 01:37:54,090 And now it's 0 again. 
2018 01:37:54,090 --> 01:37:57,060 So this is what's both powerful and also dangerous about C. 2019 01:37:57,060 --> 01:38:01,088 You can touch, look at, change any memory you want. 2020 01:38:01,088 --> 01:38:02,880 You're essentially just on the honor system 2021 01:38:02,880 --> 01:38:04,838 not to touch memory that doesn't belong to you. 2022 01:38:04,838 --> 01:38:06,960 And invariably, especially next week, are 2023 01:38:06,960 --> 01:38:10,290 we going to start accidentally touching memory that doesn't belong to you. 2024 01:38:10,290 --> 01:38:13,380 And you'll see that it actually can cause computer programs to crash, 2025 01:38:13,380 --> 01:38:18,330 including programs on your own Mac and PC, yet another source of common bugs. 2026 01:38:18,330 --> 01:38:22,350 But now that we have this ability to store different strings 2027 01:38:22,350 --> 01:38:24,362 or to think about strings as arrays, well, 2028 01:38:24,362 --> 01:38:26,070 let's go ahead and consider how you might 2029 01:38:26,070 --> 01:38:27,670 have multiple strings in a program. 2030 01:38:27,670 --> 01:38:30,900 So for instance, if you were to store two strings in a program-- let's call 2031 01:38:30,900 --> 01:38:32,677 them s and t respectively. 2032 01:38:32,677 --> 01:38:35,010 Another programmer convention-- if you need two strings, 2033 01:38:35,010 --> 01:38:37,110 call the first one s then the second one t. 2034 01:38:37,110 --> 01:38:38,400 Maybe I'm storing "HI!" 2035 01:38:38,400 --> 01:38:39,280 then "BYE!" 2036 01:38:39,280 --> 01:38:41,530 Well, what's the computer's memory going to look like? 2037 01:38:41,530 --> 01:38:43,950 Well, let's do some digging. 2038 01:38:43,950 --> 01:38:46,202 "HI!", as before, is going to be stored here.
2039 01:38:46,202 --> 01:38:47,910 So this whole thing refers to s, and it's 2040 01:38:47,910 --> 01:38:52,080 taking 4 bytes because the last one is that special null character that 2041 01:38:52,080 --> 01:38:55,440 just is the stop sign that demarcates the end of the string. 2042 01:38:55,440 --> 01:38:59,760 "BYE!", meanwhile, is going to take up another B, Y, E, exclamation point, 2043 01:38:59,760 --> 01:39:04,650 five bytes because I need a fifth byte to represent another null character. 2044 01:39:04,650 --> 01:39:06,600 And this one deliberately wraps around. 2045 01:39:06,600 --> 01:39:08,820 Though again, this is just an artist's rendition. 2046 01:39:08,820 --> 01:39:11,580 There's not necessarily a grid in reality. 2047 01:39:11,580 --> 01:39:16,770 B, Y, E, exclamation point, backslash 0 now represents t. 2048 01:39:16,770 --> 01:39:21,690 So this is to say, if I had a program like this, where I had "HI!" 2049 01:39:21,690 --> 01:39:25,200 and then "BYE!", and I started poking around the computer's memory 2050 01:39:25,200 --> 01:39:27,360 just using the square bracket notation, I 2051 01:39:27,360 --> 01:39:31,020 bet I could start accessing the value of B or Y 2052 01:39:31,020 --> 01:39:34,710 or E just by looking a little past the string s. 2053 01:39:34,710 --> 01:39:37,380 So again, as complicated as our programs get, 2054 01:39:37,380 --> 01:39:40,320 all that's going on underneath the hood is you just plop things down 2055 01:39:40,320 --> 01:39:44,070 in memory in locations like these. 2056 01:39:44,070 --> 01:39:47,310 And so now that we have this ability or maybe this mental model 2057 01:39:47,310 --> 01:39:49,710 for what's going on inside of a computer, 2058 01:39:49,710 --> 01:39:53,490 we can consider some of the features that you might want 2059 01:39:53,490 --> 01:39:55,740 to now use in programs that you write. 
2060 01:39:55,740 --> 01:39:59,190 So let me go ahead here and whip up a quick program, 2061 01:39:59,190 --> 01:40:05,400 for instance, that goes ahead and, let's say, 2062 01:40:05,400 --> 01:40:09,310 prints out the total length of a string. 2063 01:40:09,310 --> 01:40:10,540 Let me go ahead and do this. 2064 01:40:10,540 --> 01:40:14,730 I'm going to go ahead and create a new program here in CS50's IDE. 2065 01:40:14,730 --> 01:40:17,870 And I'm going to call this one string.c. 2066 01:40:17,870 --> 01:40:22,080 And I'm going to very quickly at the top include as usual cs50.h. 2067 01:40:22,080 --> 01:40:24,735 And I'm going to go ahead and #include stdio.h. 2068 01:40:24,735 --> 01:40:27,185 And I'm going to give myself int main(void). 2069 01:40:27,185 --> 01:40:29,310 And then in here, I'm going to get myself a string. 2070 01:40:29,310 --> 01:40:32,280 So string s equals get_string. 2071 01:40:32,280 --> 01:40:35,220 Let me just ask the human for some input, whatever it is. 2072 01:40:35,220 --> 01:40:39,270 Then let me go ahead and print out literally the word "Output" 2073 01:40:39,270 --> 01:40:41,730 just so that I can actually see the result. 2074 01:40:41,730 --> 01:40:47,250 And then down here, let me go ahead and print out that string, for int i get 0, 2075 01:40:47,250 --> 01:40:49,792 i is less than-- 2076 01:40:49,792 --> 01:40:52,240 huh, I don't know what the length of the string is yet. 2077 01:40:52,240 --> 01:40:54,990 So let me just put a question mark there, which is not valid code, 2078 01:40:54,990 --> 01:40:57,068 but we'll come back to this-- i++. 2079 01:40:57,068 --> 01:40:59,610 And then inside of the loop, I want to go ahead and print out 2080 01:40:59,610 --> 01:41:03,432 every character one at a time by using my new array notation. 
2081 01:41:03,432 --> 01:41:05,140 And then at the very end of this program, 2082 01:41:05,140 --> 01:41:06,890 I'm going to print a new line just to make 2083 01:41:06,890 --> 01:41:08,460 sure the cursor is on its own line. 2084 01:41:08,460 --> 01:41:11,000 So this is a complete program that is now, 2085 01:41:11,000 --> 01:41:15,950 as of this week, going to treat a string as an array, ergo, my syntax in line 10 2086 01:41:15,950 --> 01:41:18,830 that's using my new fancy square bracket notation. 2087 01:41:18,830 --> 01:41:21,920 But the only question I haven't answered yet is this-- 2088 01:41:21,920 --> 01:41:25,100 how do I know when to stop printing the string? 2089 01:41:25,100 --> 01:41:26,390 How do I know when to stop? 2090 01:41:26,390 --> 01:41:28,850 Well, it turns out, thus far, when we're using for loops, 2091 01:41:28,850 --> 01:41:34,040 we've typically done something like just count from 0 on up to some number. 2092 01:41:34,040 --> 01:41:36,620 This condition, though, is any Boolean expression. 2093 01:41:36,620 --> 01:41:39,300 I just need to have a yes/no or a true/false answer. 2094 01:41:39,300 --> 01:41:40,850 So you know what I could do? 2095 01:41:40,850 --> 01:41:45,620 Keep looping so long as character at location i 2096 01:41:45,620 --> 01:41:50,030 in s does not equal backslash 0. 2097 01:41:50,030 --> 01:41:52,170 So this is now definitely some new syntax. 2098 01:41:52,170 --> 01:41:53,510 Let me zoom in here. 2099 01:41:53,510 --> 01:41:58,700 But s[i] just means the i-th character in s, or more specifically, 2100 01:41:58,700 --> 01:42:01,820 the character at position i in s. 2101 01:42:01,820 --> 01:42:05,000 Bang equals-- so bang is how a programmer pronounces 2102 01:42:05,000 --> 01:42:08,150 exclamation point because it's a little faster-- bang equals 2103 01:42:08,150 --> 01:42:09,597 means does not equal. 2104 01:42:09,597 --> 01:42:12,680 So this is how you would do an equal sign with a slash through it in math.
2105 01:42:12,680 --> 01:42:15,920 It's, in code, exclamation point, equals sign. 2106 01:42:15,920 --> 01:42:18,230 And then notice this funkiness-- backslash 2107 01:42:18,230 --> 01:42:22,100 0 is again, the "null character," but it's in single quotes 2108 01:42:22,100 --> 01:42:24,500 because, again, it is by definition a character. 2109 01:42:24,500 --> 01:42:26,480 And for reasons we'll get into another time, 2110 01:42:26,480 --> 01:42:28,760 backslash 0 is how you express it. 2111 01:42:28,760 --> 01:42:32,600 Just like backslash n is kind of a weird escape character for the new line, 2112 01:42:32,600 --> 01:42:36,710 backslash 0 is the character that is all 0's. 2113 01:42:36,710 --> 01:42:38,570 So this is kind of a different for loop. 2114 01:42:38,570 --> 01:42:41,870 I'm still starting at 0 for i. 2115 01:42:41,870 --> 01:42:43,880 I'm still incrementing i as always. 2116 01:42:43,880 --> 01:42:46,400 But I'm now not checking for some preordained length 2117 01:42:46,400 --> 01:42:50,990 because just like a computer, I do not know a priori where these strings end. 2118 01:42:50,990 --> 01:42:55,580 I only know that they end once I see backslash 0. 2119 01:42:55,580 --> 01:42:59,150 So when I now go down here and do make string-- 2120 01:42:59,150 --> 01:43:05,570 it compiles OK-- ./string, let me type in something like "HELLO" in all caps. 2121 01:43:05,570 --> 01:43:07,460 Voila, the output is "HELLO" again. 2122 01:43:07,460 --> 01:43:08,450 Let me do it again-- 2123 01:43:08,450 --> 01:43:11,030 "BYE" in all caps, and the output is "BYE." 2124 01:43:11,030 --> 01:43:13,580 So it's kind of a useless program in that it's just printing 2125 01:43:13,580 --> 01:43:15,350 the same thing that I typed in. 2126 01:43:15,350 --> 01:43:19,490 But I'm conditionally using this Boolean expression 2127 01:43:19,490 --> 01:43:22,170 to decide whether or not to keep printing characters. 
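The loop described above, which keeps going until it sees backslash 0, can be sketched roughly as follows. This is an illustration under assumptions rather than the exact program typed in lecture: the function name `print_chars` is invented, the string is passed in as a `const char *` instead of coming from CS50's `get_string`, and the function returns a count only so that the result can be checked.

```c
#include <stdio.h>

// Hypothetical helper: prints s one character at a time and
// returns how many characters were printed.
int print_chars(const char *s)
{
    int i;

    // No preordained length here: the loop stops as soon as the
    // character at position i is the NUL character, written '\0'.
    for (i = 0; s[i] != '\0'; i++)
    {
        printf("%c", s[i]);
    }

    // Move the cursor onto its own line, as in the lecture's program.
    printf("\n");
    return i;
}
```

Calling `print_chars("HELLO")` prints `HELLO` and returns 5. The single quotes around `'\0'` matter because the comparison is against one character, not a string.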
2128 01:43:22,170 --> 01:43:25,280 Now thankfully, C comes with a function that can answer this for me. 2129 01:43:25,280 --> 01:43:29,210 It turns out there is a function called strlen 2130 01:43:29,210 --> 01:43:31,850 so I can literally just say, well, figure out 2131 01:43:31,850 --> 01:43:33,500 what the length of the string is. 2132 01:43:33,500 --> 01:43:36,110 The function is called strlen for string length. 2133 01:43:36,110 --> 01:43:40,730 And it exists in a file called, not surprisingly, perhaps, 2134 01:43:40,730 --> 01:43:43,610 string.h, string.h. 2135 01:43:43,610 --> 01:43:47,660 So now let me go ahead down here and do make string-- 2136 01:43:47,660 --> 01:43:50,300 compiles OK-- ./string. 2137 01:43:50,300 --> 01:43:52,950 Type in "HELLO," and it still works. 2138 01:43:52,950 --> 01:43:58,400 So this function strlen that does exist in a library via the header file 2139 01:43:58,400 --> 01:43:59,523 string.h already exists. 2140 01:43:59,523 --> 01:44:00,440 Someone else wrote it. 2141 01:44:00,440 --> 01:44:01,710 But how did they write it? 2142 01:44:01,710 --> 01:44:04,040 Odds are they wrote the first version that I 2143 01:44:04,040 --> 01:44:06,980 did by checking for that backslash 0. 2144 01:44:06,980 --> 01:44:09,235 But let me ask a subtle question here. 2145 01:44:09,235 --> 01:44:10,235 This program is correct. 2146 01:44:10,235 --> 01:44:12,235 It iterates over the whole length of the string, 2147 01:44:12,235 --> 01:44:14,870 and it prints out every character therein. 2148 01:44:14,870 --> 01:44:20,510 Can anyone observe a poor design decision in this function? 2149 01:44:20,510 --> 01:44:24,200 This one's subtle, but there's something I don't 2150 01:44:24,200 --> 01:44:26,660 like about my for loop in particular. 2151 01:44:26,660 --> 01:44:28,640 And I'll isolate it to line 9. 2152 01:44:28,640 --> 01:44:31,230 I've not done something optimally on line 9. 
2153 01:44:31,230 --> 01:44:34,700 There's an opportunity for better design. 2154 01:44:34,700 --> 01:44:40,830 Any thoughts here on what I might do better? 2155 01:44:40,830 --> 01:44:42,426 Yeah, Jonathan? 2156 01:44:42,426 --> 01:44:46,770 JONATHAN: Yeah, to create basically another variable for the string length 2157 01:44:46,770 --> 01:44:48,455 and to remember it. 2158 01:44:48,455 --> 01:44:50,580 DAVID MALAN: Yeah, and why are you suggesting that? 2159 01:44:50,580 --> 01:44:53,670 JONATHAN: If you want to use a different value for the string length, 2160 01:44:53,670 --> 01:44:55,710 or if it might fluctuate or change, you want 2161 01:44:55,710 --> 01:44:59,370 to just have a different variable as a sort of placeholder value for it. 2162 01:44:59,370 --> 01:45:00,670 DAVID MALAN: OK, potentially. 2163 01:45:00,670 --> 01:45:03,210 But I will claim in this case that because the human has 2164 01:45:03,210 --> 01:45:07,090 typed in the word, once you type in the word, it's not going to change. 2165 01:45:07,090 --> 01:45:11,520 But I think you're going down the right direction because 2166 01:45:11,520 --> 01:45:15,570 in this Boolean expression here, i less than the string length of s, 2167 01:45:15,570 --> 01:45:19,350 recall that this expression gets evaluated again and again and again. 2168 01:45:19,350 --> 01:45:22,050 Every time through a for loop, recall that you're constantly 2169 01:45:22,050 --> 01:45:23,290 checking the condition. 2170 01:45:23,290 --> 01:45:26,460 The condition in this case is i less than the length of s. 2171 01:45:26,460 --> 01:45:30,382 The problem is that strlen in this case is a function, which 2172 01:45:30,382 --> 01:45:32,340 means there's some piece of code someone wrote, 2173 01:45:32,340 --> 01:45:35,593 probably similar to what I wrote a few minutes ago, that you're constantly 2174 01:45:35,593 --> 01:45:37,260 asking, what's the length of the string? 
2175 01:45:37,260 --> 01:45:38,593 What's the length of the string? 2176 01:45:38,593 --> 01:45:41,880 And recall from our picture, the way you figure out the length of a string 2177 01:45:41,880 --> 01:45:44,070 is you start at the beginning of the string, and you keep checking, 2178 01:45:44,070 --> 01:45:45,300 am I at backslash 0? 2179 01:45:45,300 --> 01:45:46,020 OK. 2180 01:45:46,020 --> 01:45:47,700 Am I at backslash 0? 2181 01:45:47,700 --> 01:45:48,540 OK. 2182 01:45:48,540 --> 01:45:52,600 So to figure out the length of "HI!", it's going to take me 1, 2, 3, 4 steps, 2183 01:45:52,600 --> 01:45:54,600 right, because I have to start at the beginning. 2184 01:45:54,600 --> 01:45:57,267 And I iterate from location 0 on to the end. 2185 01:45:57,267 --> 01:45:59,100 To find out the length of "BYE!", it's going 2186 01:45:59,100 --> 01:46:01,350 to take me five steps because that's how long it's 2187 01:46:01,350 --> 01:46:04,740 going to take me from left to right to find that backslash 0. 2188 01:46:04,740 --> 01:46:07,080 So what I don't like about this line of code is, 2189 01:46:07,080 --> 01:46:10,680 why are you asking for the string length of s again and again 2190 01:46:10,680 --> 01:46:11,790 and again and again? 2191 01:46:11,790 --> 01:46:14,230 It's not going to change in this context. 2192 01:46:14,230 --> 01:46:17,887 So Jonathan's point is taken if we keep asking the user for more input. 2193 01:46:17,887 --> 01:46:19,970 But in this case, we've only asked the human once. 2194 01:46:19,970 --> 01:46:20,920 So you know what? 2195 01:46:20,920 --> 01:46:26,700 Let's take Jonathan's advice and do int n equals the string length of s. 2196 01:46:26,700 --> 01:46:28,950 And then maybe you know what we could do? 2197 01:46:28,950 --> 01:46:32,170 Put n in this condition instead. 
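The left-to-right counting described above is presumably close to what a function like strlen does internally. The sketch below is a guess for illustration only, not the library's actual source, and `my_strlen` is a made-up name chosen to avoid clashing with the real function.

```c
#include <stddef.h>

// Hypothetical stand-in for strlen: walks the string from position 0
// and counts characters until it reaches the NUL terminator.
size_t my_strlen(const char *s)
{
    size_t n = 0;

    // Each pass through this loop is one "am I at backslash 0?" check.
    while (s[n] != '\0')
    {
        n++;
    }

    // n is the number of characters before the terminator.
    return n;
}
```

For "HI!" the loop makes four checks (H, I, !, and the terminator) and returns 3. That work is repeated on every call, which is why calling it inside a loop condition is wasteful when the string never changes.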
2198 01:46:32,170 --> 01:46:35,520 So now I'm asking the same question, but I'm not foolishly, 2199 01:46:35,520 --> 01:46:39,030 inefficiently asking the same question again and again, 2200 01:46:39,030 --> 01:46:42,720 whereby the same question requires a good amount of work 2201 01:46:42,720 --> 01:46:45,940 to find the backslash 0 again and again and again. 2202 01:46:45,940 --> 01:46:48,470 Now, there's some cleaning up we can do here too. 2203 01:46:48,470 --> 01:46:50,970 It turns out there's this other subtle feature of for loops. 2204 01:46:50,970 --> 01:46:54,660 If you want to initialize another variable to a value, 2205 01:46:54,660 --> 01:46:56,370 you can actually do this all at once. 2206 01:46:56,370 --> 01:46:59,130 And you can do so before the semicolon. 2207 01:46:59,130 --> 01:47:04,530 You can do comma n equals strlen of s. 2208 01:47:04,530 --> 01:47:07,150 And then you can use n, just as I have here. 2209 01:47:07,150 --> 01:47:09,210 So it's not all that much better, but it's 2210 01:47:09,210 --> 01:47:11,790 a little cleaner in that now I've taken two lines of code 2211 01:47:11,790 --> 01:47:13,710 and collapsed them into one. 2212 01:47:13,710 --> 01:47:15,750 They both have to be of the same data types, 2213 01:47:15,750 --> 01:47:19,380 but that's OK here because both i and n are. 2214 01:47:19,380 --> 01:47:21,750 So again, the inefficiency here is that it was foolish 2215 01:47:21,750 --> 01:47:26,100 before that I kept asking the same question again and again and again. 2216 01:47:26,100 --> 01:47:30,810 But now I'm asking the question once, remembering it in a variable called n, 2217 01:47:30,810 --> 01:47:36,720 and only comparing i against that integer which does not actually change. 2218 01:47:36,720 --> 01:47:38,370 All right, I know that too was a lot. 2219 01:47:38,370 --> 01:47:41,910 Let's go ahead here and take a 3-minute break just to stretch legs and whatnot. 
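The improved loop from just before the break, with the length computed once in the loop's initializer, might look like the sketch below. The function name `print_once_measured` is invented for illustration, and a `const char *` parameter stands in for the string that the lecture's program gets from `get_string`.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: prints s character by character, asking for
// its length only once. Returns the number of characters printed.
int print_once_measured(const char *s)
{
    int printed = 0;

    // i and n are both declared as int in the initializer, separated by a
    // comma. strlen(s) runs a single time, and the condition then only
    // compares i against the remembered value n. (strlen returns size_t,
    // so this assumes the length fits in an int.)
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c", s[i]);
        printed++;
    }

    printf("\n");
    return printed;
}
```

Compared with writing `i < strlen(s)` in the condition, the output is the same. The only difference is that the string is scanned for its terminator once instead of once per iteration.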
2220 01:47:41,910 --> 01:47:44,880 In 3 minutes, we'll come back and start to see applications 2221 01:47:44,880 --> 01:47:48,030 now of all of these features ultimately to some problems that 2222 01:47:48,030 --> 01:47:51,030 are going to lie ahead this week on the readability of language 2223 01:47:51,030 --> 01:47:52,510 and also on cryptography. 2224 01:47:52,510 --> 01:47:54,750 So we'll see you in 3 minutes. 2225 01:47:54,750 --> 01:47:57,240 All right, so we're back. 2226 01:47:57,240 --> 01:48:00,885 And this has been a whole bunch of low-level details, admittedly. 2227 01:48:00,885 --> 01:48:03,510 And where we're going with this ultimately this week and beyond 2228 01:48:03,510 --> 01:48:05,562 is applications of some of these building blocks. 2229 01:48:05,562 --> 01:48:08,520 And one of those applications this coming week and the next problem set 2230 01:48:08,520 --> 01:48:11,580 is going to be that of cryptography, the art of scrambling or encrypting 2231 01:48:11,580 --> 01:48:12,597 information. 2232 01:48:12,597 --> 01:48:14,430 And if you're trying to encrypt information, 2233 01:48:14,430 --> 01:48:16,830 like messages, well, those messages might very well 2234 01:48:16,830 --> 01:48:19,260 be written in English or in ASCII, if you will. 2235 01:48:19,260 --> 01:48:23,250 And you might want to convert some of those ASCII characters from one thing 2236 01:48:23,250 --> 01:48:27,480 to another so that if your message is intercepted by some third party, 2237 01:48:27,480 --> 01:48:30,990 they can't actually decipher or figure out what it is that you've sent. 2238 01:48:30,990 --> 01:48:33,360 So I feel like we're almost toward-- 2239 01:48:33,360 --> 01:48:35,550 we're almost at the ability where, in code, we 2240 01:48:35,550 --> 01:48:39,270 can start to convert one word to another or to scramble our text. 2241 01:48:39,270 --> 01:48:41,490 But we do need a couple of more building blocks. 
2242 01:48:41,490 --> 01:48:44,040 So recall that we left off with this picture 2243 01:48:44,040 --> 01:48:47,160 here, where we had two words in the computer's memory, "HI!" and "BYE!", 2244 01:48:47,160 --> 01:48:50,610 both with exclamation points, but also both with these backslash 0's 2245 01:48:50,610 --> 01:48:52,800 that you and I do not put there explicitly. 2246 01:48:52,800 --> 01:48:56,370 They just happen for you any time you use the double quotes and any time 2247 01:48:56,370 --> 01:48:58,990 you use the get_string function. 2248 01:48:58,990 --> 01:49:03,720 So once we have those in memory, you can think of them as s and t respectively. 2249 01:49:03,720 --> 01:49:06,480 But a string, s or t, is just an array. 2250 01:49:06,480 --> 01:49:11,040 So again, you can also refer to all of these individual characters or chars 2251 01:49:11,040 --> 01:49:15,420 via the new square bracket notation of today, s[0], s[1], s[2], s[3], 2252 01:49:15,420 --> 01:49:21,210 and then t[0], t[1], [2], [3], and [4], and then whatever else is 2253 01:49:21,210 --> 01:49:22,470 in the computer's memory. 2254 01:49:22,470 --> 01:49:26,880 But you know what you can even do is this-- suppose that instead we 2255 01:49:26,880 --> 01:49:28,980 wanted to have an array of words. 2256 01:49:28,980 --> 01:49:32,650 So before, we had an array of scores, an array of integers. 2257 01:49:32,650 --> 01:49:35,370 But now suppose we wanted in the context of some other program 2258 01:49:35,370 --> 01:49:36,780 to have an array of words. 2259 01:49:36,780 --> 01:49:37,800 You can totally do that. 2260 01:49:37,800 --> 01:49:40,560 There's nothing stopping you from having an array of words. 2261 01:49:40,560 --> 01:49:42,240 And the syntax is going to be identical. 2262 01:49:42,240 --> 01:49:48,150 Notice, if I want an array called words that has room for two strings, 2263 01:49:48,150 --> 01:49:51,180 I literally just say, string words[2]. 
2264 01:49:51,180 --> 01:49:56,540 This means, hey, computer, give me an array of size 2, each of whose members 2265 01:49:56,540 --> 01:49:57,540 is going to be a string. 2266 01:49:57,540 --> 01:49:58,920 How do I populate that array? 2267 01:49:58,920 --> 01:50:00,510 Same as before with the scores-- 2268 01:50:00,510 --> 01:50:02,790 words[0] gets, quote unquote, "HI!" 2269 01:50:02,790 --> 01:50:05,280 Words[1] gets, quote unquote, "BYE!" 2270 01:50:05,280 --> 01:50:09,540 So that is to say with this code, could we create a picture similar to the one 2271 01:50:09,540 --> 01:50:10,380 previously? 2272 01:50:10,380 --> 01:50:12,540 But I'm not calling these strings s and t. 2273 01:50:12,540 --> 01:50:16,890 Now I'm calling them both "words" at two different locations, 0 and 1 2274 01:50:16,890 --> 01:50:17,830 respectively. 2275 01:50:17,830 --> 01:50:20,040 So we could redraw that same picture like this. 2276 01:50:20,040 --> 01:50:23,790 Now this word is technically named words[0]. 2277 01:50:23,790 --> 01:50:26,640 And this one is referred to by words[1]. 2278 01:50:26,640 --> 01:50:29,310 But again, what is a string? 2279 01:50:29,310 --> 01:50:30,990 A string is an array. 2280 01:50:30,990 --> 01:50:34,360 And yet, here we have an array of strings. 2281 01:50:34,360 --> 01:50:37,510 So we kind of sort of have an array of arrays. 2282 01:50:37,510 --> 01:50:40,440 So we've got an array of words, but a word is just a string. 2283 01:50:40,440 --> 01:50:42,850 And a string is an array of characters. 2284 01:50:42,850 --> 01:50:47,430 So what I really have on the board is an array of arrays. 2285 01:50:47,430 --> 01:50:51,190 And so here-- and this will be the last weird syntax for today-- 2286 01:50:51,190 --> 01:50:55,050 you can actually have multiple square brackets back to back. 
2287 01:50:55,050 --> 01:50:58,650 So if your variable's called words, and that variable's an array, 2288 01:50:58,650 --> 01:51:03,240 if you want to get the first word in the array, you do words[0]. 2289 01:51:03,240 --> 01:51:06,090 Once you're at that word, "HI!", and you want 2290 01:51:06,090 --> 01:51:10,860 to get the first character in that word, you can similarly do [0]. 2291 01:51:10,860 --> 01:51:14,230 So the first bracket refers to what word do you want in the array. 2292 01:51:14,230 --> 01:51:18,060 The second bracket refers to what character do you want in that word. 2293 01:51:18,060 --> 01:51:22,320 So now the I is at words[0][1]. 2294 01:51:22,320 --> 01:51:25,500 The exclamation point is at words[0][2]. 2295 01:51:25,500 --> 01:51:28,810 And the null character's at words[0][3]. 2296 01:51:28,810 --> 01:51:37,508 Meanwhile, the B is at words[1][0], [1][1], [1][2], [1][3], [1][4]. 2297 01:51:37,508 --> 01:51:40,050 So it's almost kind of like a coordinate system, if you will. 2298 01:51:40,050 --> 01:51:43,200 It's a two-dimensional array, or an array of arrays. 2299 01:51:43,200 --> 01:51:49,080 So this is only to say that if we wanted to think of arrays of strings 2300 01:51:49,080 --> 01:51:53,280 as individual characters, we can. 2301 01:51:53,280 --> 01:51:56,680 We have that expressiveness now in code. 2302 01:51:56,680 --> 01:52:00,460 So what more can I do now that I can manipulate things at this level? 2303 01:52:00,460 --> 01:52:03,263 Let me do a program that'll be pretty applicable, 2304 01:52:03,263 --> 01:52:05,430 I think, with some of our upcoming programs as well. 2305 01:52:05,430 --> 01:52:06,960 Let me call this one uppercase. 2306 01:52:06,960 --> 01:52:09,240 Let me quickly write a program whose purpose in life 2307 01:52:09,240 --> 01:52:12,120 is just to convert an input word to uppercase. 2308 01:52:12,120 --> 01:52:13,540 And let's see how we can do this.
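The array of words with back-to-back square brackets described above can be sketched like this. It is a minimal illustration under assumptions: the function name `letter_at` is made up, and `const char *` stands in for CS50's `string` type so that the code compiles without `cs50.h`.

```c
#include <stdio.h>

// Hypothetical helper: returns the character at position pos
// of word number word, where word is 0 for "HI!" and 1 for "BYE!".
char letter_at(int word, int pos)
{
    // An array of size 2, each of whose members is a string.
    const char *words[2];
    words[0] = "HI!";
    words[1] = "BYE!";

    // First bracket: which word. Second bracket: which character in it.
    return words[word][pos];
}
```

Here `letter_at(0, 1)` is `'I'`, `letter_at(1, 0)` is `'B'`, and `letter_at(0, 3)` is the NUL character that ends "HI!". Nothing checks the two indices, so the caller has to stay inside each word, terminator included.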
2309 01:52:13,540 --> 01:52:16,380 So let me go ahead and #include cs50.h. 2310 01:52:16,380 --> 01:52:20,050 Let me go ahead and #include stdio.h. 2311 01:52:20,050 --> 01:52:23,160 Let me also include this time string.h, which is 2312 01:52:23,160 --> 01:52:24,990 going to give us functions like strlen. 2313 01:52:24,990 --> 01:52:27,670 And then let me do int main(void). 2314 01:52:27,670 --> 01:52:31,280 And then let me go ahead here and get a string from the user like before. 2315 01:52:31,280 --> 01:52:34,030 So I'm just going to ask the user for a string. 2316 01:52:34,030 --> 01:52:36,370 And I want them to give me whatever the string should 2317 01:52:36,370 --> 01:52:38,950 be before I uppercase everything. 2318 01:52:38,950 --> 01:52:41,890 Then I'm just going to go ahead and print out literally "After," 2319 01:52:41,890 --> 01:52:46,330 just so I can see what happens after I capitalize everything in the string. 2320 01:52:46,330 --> 01:52:49,690 And now let me go ahead and do this-- for int i get 0, 2321 01:52:49,690 --> 01:52:53,110 i less than string length of s, i++. 2322 01:52:53,110 --> 01:52:55,180 Wait a minute, I made that mistake before. 2323 01:52:55,180 --> 01:52:57,200 Let's not repeat this question. 2324 01:52:57,200 --> 01:53:02,740 Let's give myself a second variable-- n gets string length of s, i less than n, 2325 01:53:02,740 --> 01:53:04,090 i++. 2326 01:53:04,090 --> 01:53:06,340 So again, this is now becoming boilerplate. 2327 01:53:06,340 --> 01:53:09,400 Any time you want to iterate over all of the characters in the string, 2328 01:53:09,400 --> 01:53:11,913 this probably is a reasonable place to start. 2329 01:53:11,913 --> 01:53:13,330 And then let me ask the question-- 2330 01:53:13,330 --> 01:53:15,673 I want to iterate over every character in the string 2331 01:53:15,673 --> 01:53:16,840 that the human has typed in. 2332 01:53:16,840 --> 01:53:20,470 And I want to ask myself a question, just as we've done with any algorithm. 
2333 01:53:20,470 --> 01:53:23,980 Specifically, I want to ask if the current letter is lowercase, 2334 01:53:23,980 --> 01:53:26,080 let me somehow convert it to uppercase. 2335 01:53:26,080 --> 01:53:28,260 Else, let me just print it out unchanged. 2336 01:53:28,260 --> 01:53:31,540 So how can I express that using last week and this week's building blocks? 2337 01:53:31,540 --> 01:53:33,280 Well, let me say something like this-- 2338 01:53:33,280 --> 01:53:39,670 if the character at location i in s, or if the i-th character in s 2339 01:53:39,670 --> 01:53:47,710 is greater than or equal to a lowercase a, and the i-th character in s 2340 01:53:47,710 --> 01:53:52,270 is less than or equal to a lower case z, what do I want to do? 2341 01:53:52,270 --> 01:53:55,750 Let me go ahead and print out a character. 2342 01:53:55,750 --> 01:53:59,320 But that character should be what? s bracket i, 2343 01:53:59,320 --> 01:54:01,960 but I'm not sure what to do here yet. 2344 01:54:01,960 --> 01:54:03,460 But let me come back to that. 2345 01:54:03,460 --> 01:54:09,250 Else, let me go ahead and just print out that character unchanged, s[i]. 2346 01:54:09,250 --> 01:54:14,270 So minus the placeholder, the question marks I've put, I'm kind of all the way 2347 01:54:14,270 --> 01:54:14,770 there. 2348 01:54:14,770 --> 01:54:16,872 Line 10 initializes i to 0. 2349 01:54:16,872 --> 01:54:20,080 It's going to count all the way up to n, where n is the length of the string. 2350 01:54:20,080 --> 01:54:21,310 And it's going to keep incrementing i. 2351 01:54:21,310 --> 01:54:22,393 So we've seen that before. 2352 01:54:22,393 --> 01:54:25,330 And again, that's going to become muscle memory before long. 2353 01:54:25,330 --> 01:54:28,480 Line 12 is a little new, but it uses building blocks 2354 01:54:28,480 --> 01:54:29,532 from last week and this. 
2355 01:54:29,532 --> 01:54:31,240 This week, we have the new square bracket 2356 01:54:31,240 --> 01:54:34,810 notation to get the i-th character in the string s. 2357 01:54:34,810 --> 01:54:37,870 Greater than or equal to, less than or equal to-- we saw at least one 2358 01:54:37,870 --> 01:54:38,770 of those last week. 2359 01:54:38,770 --> 01:54:41,860 That just means greater than or equal to, less than or equal to. 2360 01:54:41,860 --> 01:54:46,370 I mentioned && last week, which is the logical AND operator, 2361 01:54:46,370 --> 01:54:49,150 which means you can check one condition and another. 2362 01:54:49,150 --> 01:54:52,540 And the whole thing is true if both of those are true. 2363 01:54:52,540 --> 01:54:54,440 This is a bit weird today. 2364 01:54:54,440 --> 01:54:57,100 But if you want to express, is the current character 2365 01:54:57,100 --> 01:55:01,930 between lowercase a and lowercase z, totally fine 2366 01:55:01,930 --> 01:55:07,750 to implicitly treat a and z as numbers, which they really are. 2367 01:55:07,750 --> 01:55:11,180 Because again, if we come back to our favorite ASCII chart, 2368 01:55:11,180 --> 01:55:16,600 you'll see again that lowercase a has a number associated with it, 97. 2369 01:55:16,600 --> 01:55:20,410 Lowercase z has a number associated with it, 122. 2370 01:55:20,410 --> 01:55:25,000 So if I really wanted to be pedantic, I could go back into my code 2371 01:55:25,000 --> 01:55:28,540 and do something like, well, if this is greater than or equal to 97, 2372 01:55:28,540 --> 01:55:32,320 and it's less than or equal to 122, but bad design. 2373 01:55:32,320 --> 01:55:35,272 Like, I'm never going to remember that lowercase z is 122. 2374 01:55:35,272 --> 01:55:36,730 Like, no one is going to know that. 2375 01:55:36,730 --> 01:55:38,320 It makes the code less obvious. 2376 01:55:38,320 --> 01:55:41,080 Go ahead and write it in a way that's a little more 2377 01:55:41,080 --> 01:55:43,730 friendly to humans like this. 
2378 01:55:43,730 --> 01:55:45,070 But notice this question mark. 2379 01:55:45,070 --> 01:55:46,720 How do I fill in this blank? 2380 01:55:46,720 --> 01:55:48,970 Well, let me go back to the ASCII chart. 2381 01:55:48,970 --> 01:55:51,520 This is subtle, but this is kind of cool. 2382 01:55:51,520 --> 01:55:53,560 And humans were definitely thinking ahead. 2383 01:55:53,560 --> 01:55:56,590 Notice that lowercase a is 97. 2384 01:55:56,590 --> 01:55:58,900 Capital A is 65. 2385 01:55:58,900 --> 01:56:01,000 Lowercase b is 98. 2386 01:56:01,000 --> 01:56:03,430 Capital B is 66. 2387 01:56:03,430 --> 01:56:05,965 And notice these two numbers-- 2388 01:56:05,965 --> 01:56:13,330 65 to 97, 66 to 98, 67 to 99. 2389 01:56:13,330 --> 01:56:17,320 It would seem that no matter what letters we compare, lowercase 2390 01:56:17,320 --> 01:56:20,543 and uppercase, they're always 32 apart. 2391 01:56:20,543 --> 01:56:21,460 And that's consistent. 2392 01:56:21,460 --> 01:56:24,290 We could do it for all 26 English letters. 2393 01:56:24,290 --> 01:56:27,520 So if they're always 32 apart, you know what I could do-- 2394 01:56:27,520 --> 01:56:30,730 if I want to take a lowercase letter, which 2395 01:56:30,730 --> 01:56:33,790 is what I'm thinking about in line 14, I could just 2396 01:56:33,790 --> 01:56:36,102 subtract off 32 in this case. 2397 01:56:36,102 --> 01:56:37,810 It's not the cleanest, because again, I'm 2398 01:56:37,810 --> 01:56:39,640 probably going to forget that math at some point. 2399 01:56:39,640 --> 01:56:41,598 But at least mathematically, I think that'll do 2400 01:56:41,598 --> 01:56:44,050 the trick because 97 will become 65. 2401 01:56:44,050 --> 01:56:47,922 98 will become 66, which is forcing those characters to uppercase. 2402 01:56:47,922 --> 01:56:49,630 But they're not being printed as numbers. 2403 01:56:49,630 --> 01:56:52,870 I'm still using %c to coerce it to be a char.
2404 01:56:52,870 --> 01:56:56,780 So if I didn't mess any syntax up here, let me make uppercase. 2405 01:56:56,780 --> 01:56:59,260 OK, ./uppercase. 2406 01:56:59,260 --> 01:57:03,580 And let me go ahead and type in, for instance, my name in all lowercase. 2407 01:57:03,580 --> 01:57:05,490 And voila, uppercase. 2408 01:57:05,490 --> 01:57:06,490 Now, it's a little ugly. 2409 01:57:06,490 --> 01:57:08,530 I forgot my backslash n, so let me go ahead 2410 01:57:08,530 --> 01:57:11,620 and add one of those real quick just to fix the cursor. 2411 01:57:11,620 --> 01:57:14,590 Let me recompile the code with make uppercase. 2412 01:57:14,590 --> 01:57:17,650 Let me rerun the program with ./uppercase and now type in my name, 2413 01:57:17,650 --> 01:57:18,400 David. 2414 01:57:18,400 --> 01:57:20,050 Let me do it again with Brian. 2415 01:57:20,050 --> 01:57:23,770 And notice that it's capitalizing everything character by character 2416 01:57:23,770 --> 01:57:26,470 using only today's building blocks. 2417 01:57:26,470 --> 01:57:27,530 This is correct. 2418 01:57:27,530 --> 01:57:30,350 It's pretty well styled because everything's nicely indented. 2419 01:57:30,350 --> 01:57:33,890 It's very readable even though it might look a little cryptic at first glance. 2420 01:57:33,890 --> 01:57:35,430 But I think I can do better. 2421 01:57:35,430 --> 01:57:37,940 And I can do better by using yet another library. 2422 01:57:37,940 --> 01:57:41,270 And here's where C, and really programming in general, gets powerful. 2423 01:57:41,270 --> 01:57:43,340 The whole point of using popular languages 2424 01:57:43,340 --> 01:57:46,742 is because so many other people before you have solved problems 2425 01:57:46,742 --> 01:57:48,200 that you don't need to solve again. 2426 01:57:48,200 --> 01:57:51,230 And I'm sure over the past, like, 50 years, someone has probably 2427 01:57:51,230 --> 01:57:54,770 written a function that capitalizes letters for me. 
2428 01:57:54,770 --> 01:57:56,690 I don't have to do this myself. 2429 01:57:56,690 --> 01:58:00,770 And indeed, there is another library that I'm going 2430 01:58:00,770 --> 01:58:02,540 to include by way of its header file. 2431 01:58:02,540 --> 01:58:07,050 It's ctype.h, as in the language C and a bunch of type-related things. 2432 01:58:07,050 --> 01:58:11,270 And in ctype.h, it turns out there's a function call-- 2433 01:58:11,270 --> 01:58:12,650 there's a couple of functions. 2434 01:58:12,650 --> 01:58:15,990 Specifically, let me get rid of all of this code. 2435 01:58:15,990 --> 01:58:21,980 And let me call a function called islower and pass to islower s[i]. 2436 01:58:21,980 --> 01:58:24,560 And islower, as you might guess, its purpose in life 2437 01:58:24,560 --> 01:58:27,230 is to return essentially a Boolean value, true or false, 2438 01:58:27,230 --> 01:58:28,770 depending on whether that character is lowercase. 2439 01:58:28,770 --> 01:58:31,610 And if so, well, let me go ahead and print out a placeholder 2440 01:58:31,610 --> 01:58:34,280 followed by the capitalization of that letter. 2441 01:58:34,280 --> 01:58:37,670 Now, whereas before I had to do that annoying math with minus 32 and figure it out, 2442 01:58:37,670 --> 01:58:44,120 uh-uh, now it's just toupper of s[i] in parentheses. 2443 01:58:44,120 --> 01:58:48,110 And now I can otherwise just print out that character unchanged, 2444 01:58:48,110 --> 01:58:50,990 just as before, s[i]. 2445 01:58:50,990 --> 01:58:52,400 But now notice my program-- 2446 01:58:52,400 --> 01:58:54,540 honestly, it's definitely a little shorter. 2447 01:58:54,540 --> 01:58:56,900 It's a little simpler in that there's just less code. 2448 01:58:56,900 --> 01:59:00,800 And hopefully, if the person that wrote islower and toupper did a good job, 2449 01:59:00,800 --> 01:59:01,898 I know it's correct. 2450 01:59:01,898 --> 01:59:03,440 I'm just standing on their shoulders.
2451 01:59:03,440 --> 01:59:07,010 And frankly, my code's more readable because I understand what islower 2452 01:59:07,010 --> 01:59:11,450 means, whereas that crazy && syntax and all of the additional code-- 2453 01:59:11,450 --> 01:59:14,360 that was just a lot harder to wrap your mind around, arguably. 2454 01:59:14,360 --> 01:59:19,510 So now if I go ahead and compile this-- make uppercase. 2455 01:59:19,510 --> 01:59:21,460 OK, that seemed to work well. 2456 01:59:21,460 --> 01:59:24,870 And now I'm going to go ahead and do ./uppercase and type in my name in all 2457 01:59:24,870 --> 01:59:25,800 lowercase again. 2458 01:59:25,800 --> 01:59:26,810 David seems to work. 2459 01:59:26,810 --> 01:59:27,690 Brian seems to work. 2460 01:59:27,690 --> 01:59:29,065 And I could do this all day long. 2461 01:59:29,065 --> 01:59:30,390 It seems to still work. 2462 01:59:30,390 --> 01:59:31,350 But you know what? 2463 01:59:31,350 --> 01:59:33,407 I don't think I have to be even this explicit. 2464 01:59:33,407 --> 01:59:33,990 You know what? 2465 01:59:33,990 --> 01:59:36,660 I bet if the human who wrote toupper was smart, 2466 01:59:36,660 --> 01:59:41,700 I bet I can just blindly pass in any character to toupper, 2467 01:59:41,700 --> 01:59:46,948 and it's only going to uppercase it if it can be converted to uppercase. 2468 01:59:46,948 --> 01:59:48,740 Otherwise, it'll pass it through unchanged. 2469 01:59:48,740 --> 01:59:49,448 So you know what? 2470 01:59:49,448 --> 01:59:53,340 Let me get rid of all of this stuff and really tighten this program up 2471 01:59:53,340 --> 01:59:59,760 and print out a placeholder for c and then toupper of s[i]. 2472 01:59:59,760 --> 02:00:02,760 And sure enough, if you read the documentation for this function, 2473 02:00:02,760 --> 02:00:07,380 it will handle the case where it's either lowercase or not lowercase. 2474 02:00:07,380 --> 02:00:09,270 And it will do the right thing. 
2475 02:00:09,270 --> 02:00:14,070 So now if I recompile my code, make uppercase, so far so good. 2476 02:00:14,070 --> 02:00:15,780 ./uppercase, David again. 2477 02:00:15,780 --> 02:00:17,370 Voila, it still works. 2478 02:00:17,370 --> 02:00:21,300 And notice truly just how much tighter, how much cleaner, 2479 02:00:21,300 --> 02:00:23,100 how much shorter my code is. 2480 02:00:23,100 --> 02:00:26,790 And it's more readable in the sense that this function is pretty well named. 2481 02:00:26,790 --> 02:00:29,070 Toupper is what it's indeed called. 2482 02:00:29,070 --> 02:00:31,140 But there is an important detail here. 2483 02:00:31,140 --> 02:00:34,140 Toupper expects as input a character. 2484 02:00:34,140 --> 02:00:36,090 You cannot pass a whole word to it. 2485 02:00:36,090 --> 02:00:39,480 It is still necessary at this point for me to be using this loop 2486 02:00:39,480 --> 02:00:41,542 and doing it character by character. 2487 02:00:41,542 --> 02:00:42,750 Now, how would you know this? 2488 02:00:42,750 --> 02:00:45,910 Well, you'll see multiple examples of this over the weeks to come. 2489 02:00:45,910 --> 02:00:50,000 But if I go to what's called the manual pages for the language C, 2490 02:00:50,000 --> 02:00:51,750 we have our own web-based version of them. 2491 02:00:51,750 --> 02:00:54,210 And we'll link this for you in the course's labs 2492 02:00:54,210 --> 02:00:55,690 and problem sets as needed. 2493 02:00:55,690 --> 02:00:58,680 You can see a list of all of the available functions in C 2494 02:00:58,680 --> 02:01:00,570 at least that are frequently used in CS50. 2495 02:01:00,570 --> 02:01:03,510 And if we uncheck a box at the top, we can see even more functions. 2496 02:01:03,510 --> 02:01:06,660 There's dozens, maybe hundreds of functions, most of which 2497 02:01:06,660 --> 02:01:08,715 we will not need or use in CS50. 2498 02:01:08,715 --> 02:01:10,590 But this is going to be true in any language. 
2499 02:01:10,590 --> 02:01:13,623 You sort of pick up the building blocks that you need over time. 2500 02:01:13,623 --> 02:01:15,540 So we'll refer you to these kinds of resources 2501 02:01:15,540 --> 02:01:18,930 so that you don't rely only on what we show in section and lecture, 2502 02:01:18,930 --> 02:01:24,010 but you have at your disposal these other functions and toolkits as well. 2503 02:01:24,010 --> 02:01:28,120 And we'll do the same with Python and SQL and other languages as well. 2504 02:01:28,120 --> 02:01:32,040 So those are what we call, again, manual pages. 2505 02:01:32,040 --> 02:01:34,440 All right, a final feature before we even 2506 02:01:34,440 --> 02:01:38,880 think about cryptography and scrambling information as for problem set 2. 2507 02:01:38,880 --> 02:01:41,520 So a command-line argument I mentioned by name before-- 2508 02:01:41,520 --> 02:01:44,460 it's like a word you can type after a program's name 2509 02:01:44,460 --> 02:01:46,960 in order to provide it input at the command line. 2510 02:01:46,960 --> 02:01:52,140 So make hello-- hello is a command-line argument to the program, hello. 2511 02:01:52,140 --> 02:01:58,470 Rm space a.out-- a.out was an argument, a command-line argument to the program 2512 02:01:58,470 --> 02:02:00,130 rm when I wanted to remove it. 2513 02:02:00,130 --> 02:02:02,790 So we've already seen command-line arguments in action. 2514 02:02:02,790 --> 02:02:05,520 But we haven't actually written any programs 2515 02:02:05,520 --> 02:02:11,460 that allow you to accept words or other inputs from the so-called command line. 2516 02:02:11,460 --> 02:02:14,430 Up until now, all of the input you and I have gotten in our programs 2517 02:02:14,430 --> 02:02:16,440 comes from get_string, get_int, and so forth. 2518 02:02:16,440 --> 02:02:20,490 We have never been able to look at words that the human might very well have 2519 02:02:20,490 --> 02:02:23,610 typed at the prompt when running your program. 
2520 02:02:23,610 --> 02:02:25,350 But that's all about to change now. 2521 02:02:25,350 --> 02:02:28,350 Let me go ahead and create a program called argv.c, 2522 02:02:28,350 --> 02:02:31,140 and it'll become clear why in just a moment. 2523 02:02:31,140 --> 02:02:36,270 I'm going to go ahead and include, shall we say, stdio.h. 2524 02:02:36,270 --> 02:02:39,120 And then I'm going to give myself int main(void). 2525 02:02:39,120 --> 02:02:43,650 And then I'm just going to very simply go back and change the void. 2526 02:02:43,650 --> 02:02:47,160 So just as our own custom functions can take inputs-- 2527 02:02:47,160 --> 02:02:49,290 and we saw that with get_negative_int. 2528 02:02:49,290 --> 02:02:52,020 We saw that with average today-- 2529 02:02:52,020 --> 02:02:54,600 so does main potentially take inputs. 2530 02:02:54,600 --> 02:02:57,120 Up till now though, we've been saying void. 2531 02:02:57,120 --> 02:02:58,770 And we told you to say void last week. 2532 02:02:58,770 --> 02:03:01,380 And we told you to say void in problem set 1. 2533 02:03:01,380 --> 02:03:06,780 But now it turns out that C does allow you to put other inputs into main. 2534 02:03:06,780 --> 02:03:10,720 You can either say, nope, main does not take any command-line arguments. 2535 02:03:10,720 --> 02:03:15,270 But if it does, you can say literally, int argc 2536 02:03:15,270 --> 02:03:19,150 and string argv with square brackets. 2537 02:03:19,150 --> 02:03:20,220 So it's a little cryptic. 2538 02:03:20,220 --> 02:03:22,803 And technically, you don't have to type it precisely this way. 2539 02:03:22,803 --> 02:03:26,220 But human convention would have you do it, at least for now, in this way. 2540 02:03:26,220 --> 02:03:29,010 This says that main, your function, main, 2541 02:03:29,010 --> 02:03:33,360 takes an integer as one input and not a string 2542 02:03:33,360 --> 02:03:36,570 but an array of strings as input. 
2543 02:03:36,570 --> 02:03:40,480 And argc is shorthand notation for argument count. 2544 02:03:40,480 --> 02:03:43,860 Argument count is an integer that's going to represent the number of words 2545 02:03:43,860 --> 02:03:45,720 that your users type at the prompt. 2546 02:03:45,720 --> 02:03:48,330 Argv is short for argument vector. 2547 02:03:48,330 --> 02:03:50,430 Vector is a fancy way of saying list. 2548 02:03:50,430 --> 02:03:55,470 It is a variable that's going to store in an array all of the strings 2549 02:03:55,470 --> 02:03:59,940 that a human types at the prompt after your own program's name. 2550 02:03:59,940 --> 02:04:02,710 So we can use this, for instance, as follows. 2551 02:04:02,710 --> 02:04:06,330 Suppose that I want to let the user type their own name at the command prompt. 2552 02:04:06,330 --> 02:04:06,960 I don't want to use get_string. 2553 02:04:06,960 --> 02:04:09,583 I don't want to have to prompt the human later for their name. 2554 02:04:09,583 --> 02:04:12,750 I want them to be able to run my program and give me their name all at once, 2555 02:04:12,750 --> 02:04:17,080 just like make, just like rm, and Clang, and other programs we've seen. 2556 02:04:17,080 --> 02:04:20,850 So I'm going to do this-- if argc == 2-- 2557 02:04:20,850 --> 02:04:24,390 so if the number of arguments to my program is 2-- 2558 02:04:24,390 --> 02:04:31,420 go ahead and print out, "hello, %s", and plug in whatever is that argv[1]. 2559 02:04:31,420 --> 02:04:33,450 So more on this in just a moment. 2560 02:04:33,450 --> 02:04:37,770 Else, if argc is not equal to 2, let's just go with last week's default, 2561 02:04:37,770 --> 02:04:39,190 "hello, world." 2562 02:04:39,190 --> 02:04:41,250 So what is this program's purpose in life? 2563 02:04:41,250 --> 02:04:43,680 If the human types two words at the prompt, 2564 02:04:43,680 --> 02:04:47,310 I want to say, "hello, David," "hello, Brian," "hello, so-and-so." 
2565 02:04:47,310 --> 02:04:50,310 Otherwise, if they don't type two words at the prompt, 2566 02:04:50,310 --> 02:04:52,630 I'm just going to say the default "hello, world." 2567 02:04:52,630 --> 02:04:55,780 So let me compile this, make argv. 2568 02:04:55,780 --> 02:05:00,350 And, hm, I didn't get it right here-- unknown type string, unknown type 2569 02:05:00,350 --> 02:05:00,850 string. 2570 02:05:00,850 --> 02:05:01,820 All right, I goofed. 2571 02:05:01,820 --> 02:05:07,150 If I'm using string, recall that now I need to start using the CS50 library. 2572 02:05:07,150 --> 02:05:09,730 And again, we'll see all the more why in the coming weeks as 2573 02:05:09,730 --> 02:05:11,440 we take those training wheels off. 2574 02:05:11,440 --> 02:05:13,870 But now I'm going to do this again, make argv. 2575 02:05:13,870 --> 02:05:14,440 There we go. 2576 02:05:14,440 --> 02:05:18,280 Now it works-- ./argv, Enter, "hello, world." 2577 02:05:18,280 --> 02:05:20,710 That's pretty much equivalent to what we did last week. 2578 02:05:20,710 --> 02:05:26,030 But notice if I type in, for instance, ./argv David, Enter, it says, "hello, 2579 02:05:26,030 --> 02:05:26,530 David." 2580 02:05:26,530 --> 02:05:29,500 If I type in ./argv Brian, it says, "hello, Brian." 2581 02:05:29,500 --> 02:05:33,710 If I type in ./argv Brian Yu, it says "hello, world." 2582 02:05:33,710 --> 02:05:35,200 So what's going on? 2583 02:05:35,200 --> 02:05:40,990 Well, the way you write programs in C that accept zero or more command-line 2584 02:05:40,990 --> 02:05:44,620 arguments-- that is, words at the prompt after your program's name-- 2585 02:05:44,620 --> 02:05:48,910 is you change what we have been doing all this time from void 2586 02:05:48,910 --> 02:05:52,748 to be this int argc, string argv with square brackets.
2587 02:05:52,748 --> 02:05:55,165 And what the computer is going to do for you automatically 2588 02:05:55,165 --> 02:05:59,170 is it's going to store in argc the total number of words 2589 02:05:59,170 --> 02:06:01,690 that the human typed in, not just the arguments, technically 2590 02:06:01,690 --> 02:06:04,420 all of the words, including your own program's name. 2591 02:06:04,420 --> 02:06:08,650 It's then going to fill this array of strings, a.k.a. argv, 2592 02:06:08,650 --> 02:06:11,890 with all of the words the human typed at the prompt, so not just 2593 02:06:11,890 --> 02:06:16,340 the arguments like Brian or David, but also the name of your program. 2594 02:06:16,340 --> 02:06:20,560 So if the human typed in two total words, which they did, ./argv Brian, 2595 02:06:20,560 --> 02:06:24,160 ./argv David, then I want to print out, "hello" 2596 02:06:24,160 --> 02:06:27,790 followed by a placeholder and then whatever value is at argv[1]. 2597 02:06:27,790 --> 02:06:29,770 And I'm deliberately not doing 0. 2598 02:06:29,770 --> 02:06:33,340 If I did 0, based on the verbal definition I just gave, 2599 02:06:33,340 --> 02:06:38,260 and recompiled this program, I'd see this, "hello, ./argv," which I don't want. 2600 02:06:38,260 --> 02:06:43,030 So the program's own name is automatically always stored for you 2601 02:06:43,030 --> 02:06:45,190 at the first location in that array. 2602 02:06:45,190 --> 02:06:48,070 But if you want the first useful piece of information, 2603 02:06:48,070 --> 02:06:53,860 you actually would, after recompiling the code here, access it at [1]. 2604 02:06:53,860 --> 02:06:58,030 And so in this way do we see in argv that we can actually 2605 02:06:58,030 --> 02:06:59,350 access individual words. 2606 02:06:59,350 --> 02:07:00,520 But notice this too-- 2607 02:07:00,520 --> 02:07:05,410 suppose I want to print out all of the individual characters in someone's 2608 02:07:05,410 --> 02:07:06,010 input.
2609 02:07:06,010 --> 02:07:06,593 You know what? 2610 02:07:06,593 --> 02:07:08,083 I bet I could even do this. 2611 02:07:08,083 --> 02:07:09,250 Let me go ahead and do this. 2612 02:07:09,250 --> 02:07:13,330 Instead of just printing out "hello," let me do for int i get 0, 2613 02:07:13,330 --> 02:07:17,620 n equals the string length of argv[1]. 2614 02:07:17,620 --> 02:07:20,270 2615 02:07:20,270 --> 02:07:24,800 And then over here, I'm going to do i is less than n, i++. 2616 02:07:24,800 --> 02:07:27,770 All right, so I'm going to iterate over all of the characters 2617 02:07:27,770 --> 02:07:30,930 in the first real word in argv. 2618 02:07:30,930 --> 02:07:32,160 And what am I going to do? 2619 02:07:32,160 --> 02:07:37,310 Well, let me go ahead and print out a character that's at argv[1] 2620 02:07:37,310 --> 02:07:38,900 but at location i. 2621 02:07:38,900 --> 02:07:41,300 So I said a moment ago with our picture that we 2622 02:07:41,300 --> 02:07:47,090 could think of an array of strings as really just being an array of arrays. 2623 02:07:47,090 --> 02:07:53,570 And so I can employ that syntax here by going into argv[1] to get me the word 2624 02:07:53,570 --> 02:07:57,440 like "David" or "Brian" or so forth, and then further index into it with more 2625 02:07:57,440 --> 02:08:02,100 square brackets that get me the D, the A, the V, the I, the D, and so forth. 2626 02:08:02,100 --> 02:08:05,300 And just to be super clear, let me put a new line character there 2627 02:08:05,300 --> 02:08:08,070 just so we can see explicitly what's going on. 2628 02:08:08,070 --> 02:08:10,580 And let me go ahead now and just delete this "hello, world" 2629 02:08:10,580 --> 02:08:12,205 because I don't want to see any hellos. 2630 02:08:12,205 --> 02:08:14,240 I just want to see the word the human typed in. 2631 02:08:14,240 --> 02:08:19,500 Make argv-- whoops, what did I do wrong? 
2632 02:08:19,500 --> 02:08:25,260 Oh, I used strlen when I shouldn't have because I haven't included string.h 2633 02:08:25,260 --> 02:08:26,550 at the top. 2634 02:08:26,550 --> 02:08:31,230 OK, now if I recompile this code and recompile make argv-- 2635 02:08:31,230 --> 02:08:36,010 there we go-- ./argv David, you'll see one character per line. 2636 02:08:36,010 --> 02:08:38,940 And if I do the same with Brian's name or anyone's name 2637 02:08:38,940 --> 02:08:42,352 and change it to Brian, I'm printing one character at a time. 2638 02:08:42,352 --> 02:08:44,560 So again, I'm not sure why you would want to do that. 2639 02:08:44,560 --> 02:08:47,760 But in this case, my goal simply was to not only iterate over 2640 02:08:47,760 --> 02:08:51,970 the characters in that first word, but print them out. 2641 02:08:51,970 --> 02:08:56,520 So again, just by applying twice over this time this principle, 2642 02:08:56,520 --> 02:09:00,570 can we actually see that a program has access 2643 02:09:00,570 --> 02:09:03,600 to the individual characters in each of these strings. 2644 02:09:03,600 --> 02:09:06,090 All right, and one last explanation before we 2645 02:09:06,090 --> 02:09:08,880 introduce the crypto and application thereof. 2646 02:09:08,880 --> 02:09:11,790 This thing here, this thing here-- does anyone 2647 02:09:11,790 --> 02:09:15,660 have any idea as to why main, last week and this week, 2648 02:09:15,660 --> 02:09:19,320 seems to return an int even though it's not an average function? 2649 02:09:19,320 --> 02:09:21,000 It's not a get_positive_int function. 2650 02:09:21,000 --> 02:09:22,470 It's not get_negative_int. 2651 02:09:22,470 --> 02:09:26,040 Somehow, for some reason, main keeps returning an int even though we 2652 02:09:26,040 --> 02:09:29,410 have never seen this int in action. 2653 02:09:29,410 --> 02:09:31,040 What might this mean? 
2654 02:09:31,040 --> 02:09:33,340 This is the one last piece that we promised 2655 02:09:33,340 --> 02:09:37,090 last week we would eventually explain. 2656 02:09:37,090 --> 02:09:38,800 What might this mean? 2657 02:09:38,800 --> 02:09:41,420 And this one's a tough one. 2658 02:09:41,420 --> 02:09:43,870 Brian, who do we have? 2659 02:09:43,870 --> 02:09:47,230 How about [? Gred, ?] is it? 2660 02:09:47,230 --> 02:09:51,810 [? GRED: ?] Usually, the functions in the end have returned 0. 2661 02:09:51,810 --> 02:09:54,060 And that means that the function stops. 2662 02:09:54,060 --> 02:10:00,270 And the 0 is the integer that pops out of the main function. 2663 02:10:00,270 --> 02:10:03,810 DAVID MALAN: Yeah, and this one's subtle in that if you had programmed before, 2664 02:10:03,810 --> 02:10:06,390 odds are-- and I'm guessing you have, [? Gred-- ?] you've seen this in use 2665 02:10:06,390 --> 02:10:07,020 before. 2666 02:10:07,020 --> 02:10:10,350 We humans, though, in the real world of using Macs and PCs-- 2667 02:10:10,350 --> 02:10:13,320 you've actually seen numbers, integers in weird places. 2668 02:10:13,320 --> 02:10:17,220 Frankly, almost any time your computer freezes or you see an error message, 2669 02:10:17,220 --> 02:10:21,280 odds are you see English or some other spoken language in the error message. 2670 02:10:21,280 --> 02:10:23,307 But you very often see a numeric code. 2671 02:10:23,307 --> 02:10:25,140 For instance, if you're having Zoom trouble, 2672 02:10:25,140 --> 02:10:29,700 you'll often see the number 5 in the error window in Zoom's program. 2673 02:10:29,700 --> 02:10:31,710 And 5 just means you're having network issues. 2674 02:10:31,710 --> 02:10:34,710 So programmers often associate integers with things 2675 02:10:34,710 --> 02:10:36,540 that can go wrong in a program. 2676 02:10:36,540 --> 02:10:42,210 And as [? Gred ?] notes, they use 0 to connote that nothing has gone wrong, 2677 02:10:42,210 --> 02:10:43,660 that all is well.
2678 02:10:43,660 --> 02:10:48,285 So let me write one final program here just called exit.c 2679 02:10:48,285 --> 02:10:49,950 that puts this to the test. 2680 02:10:49,950 --> 02:10:54,640 Let me go ahead and write a program in a file called exit.c 2681 02:10:54,640 --> 02:10:57,870 that's going to introduce what we're going to call an exit status. 2682 02:10:57,870 --> 02:11:01,230 This is a subtlety that will be useful as our programs get 2683 02:11:01,230 --> 02:11:02,580 a little more complicated. 2684 02:11:02,580 --> 02:11:06,360 I'm going to go in here and do #include cs50.h. 2685 02:11:06,360 --> 02:11:09,360 And I'm going to go ahead and #include stdio.h. 2686 02:11:09,360 --> 02:11:14,970 And I'm going to give myself the longer version of main, so int argc, string 2687 02:11:14,970 --> 02:11:17,140 argv with the square brackets. 2688 02:11:17,140 --> 02:11:21,690 And in here, I'm going to say, if argc does not equal 2, 2689 02:11:21,690 --> 02:11:24,290 uh-uh, the human is not doing what I want them to, 2690 02:11:24,290 --> 02:11:26,040 and I'm going to yell at them in some way. 2691 02:11:26,040 --> 02:11:28,580 I'm going to say missing command-line arguments. 2692 02:11:28,580 --> 02:11:31,960 So any kind of error message that I want the human to see on the screen, 2693 02:11:31,960 --> 02:11:33,900 I'm just going to tell them with that message. 2694 02:11:33,900 --> 02:11:37,650 But I'm going to very subtly return the number 1. 2695 02:11:37,650 --> 02:11:39,090 I'm going to return an error code. 2696 02:11:39,090 --> 02:11:41,830 And the human is not necessarily going to see this code. 2697 02:11:41,830 --> 02:11:45,150 But if we were to have a graphical user interface or some other feature 2698 02:11:45,150 --> 02:11:47,130 to this program, that would be the number 2699 02:11:47,130 --> 02:11:49,110 they see in the error window that pops up, 2700 02:11:49,110 --> 02:11:52,320 just like Zoom might show you the number 5 if something has gone wrong. 
2701 02:11:52,320 --> 02:11:54,870 Similarly, if you've ever visited a page, frankly, 2702 02:11:54,870 --> 02:11:59,130 and the web page doesn't exist, you see the integer 404. 2703 02:11:59,130 --> 02:12:01,890 That's not technically the exact same incarnation of this, 2704 02:12:01,890 --> 02:12:05,440 but it is representative of programmers using numbers to represent errors. 2705 02:12:05,440 --> 02:12:07,230 So that one, you probably have seen. 2706 02:12:07,230 --> 02:12:11,160 Here, I'm going to go ahead, though, and by default, say, "hello, %s," 2707 02:12:11,160 --> 02:12:14,250 just like before, passing in whatever's in argv[1]. 2708 02:12:14,250 --> 02:12:17,940 So same program as before, but I'm not going to do any of this lame, "hello, 2709 02:12:17,940 --> 02:12:21,580 world" if the human doesn't type in their name as I expect. 2710 02:12:21,580 --> 02:12:25,110 Instead, I am going to check, did the human 2711 02:12:25,110 --> 02:12:27,180 give me two words at the command line? 2712 02:12:27,180 --> 02:12:30,210 If not, I'm going to print, "missing command-line argument," 2713 02:12:30,210 --> 02:12:32,220 and then return this exit code. 2714 02:12:32,220 --> 02:12:36,750 Otherwise, if all is well, I'm going to go ahead and return explicitly 0. 2715 02:12:36,750 --> 02:12:40,200 This is another number that the human, you and I, are never going to see, 2716 02:12:40,200 --> 02:12:42,060 but we could have access to it. 2717 02:12:42,060 --> 02:12:46,200 And frankly, for course purposes, check50 can have access to this. 2718 02:12:46,200 --> 02:12:48,570 And graphical user interfaces, when we get to those, 2719 02:12:48,570 --> 02:12:50,980 can have access to these values. 2720 02:12:50,980 --> 02:12:54,160 So 0, as [? Gred ?] notes, just means all is well. 2721 02:12:54,160 --> 02:12:56,235 But 1 would mean that something has gone wrong.
2722 02:12:56,235 --> 02:12:58,860 So let me go ahead and make exit, which is kind of appropriate, 2723 02:12:58,860 --> 02:13:00,270 as we're wrapping up here. 2724 02:13:00,270 --> 02:13:02,760 And let me go ahead and do ./exit. 2725 02:13:02,760 --> 02:13:05,700 "Missing command-line argument" is what's displayed. 2726 02:13:05,700 --> 02:13:09,120 If I go ahead and say, exit David, now I see "hello, David." 2727 02:13:09,120 --> 02:13:12,570 Or exit Brian, I'll see "hello, Brian." 2728 02:13:12,570 --> 02:13:15,120 Now, this is not a technique you'll need to use often, 2729 02:13:15,120 --> 02:13:19,110 but you can actually see these return values if you want. 2730 02:13:19,110 --> 02:13:23,970 If I run exit, and I see this error message, I can very weirdly say, 2731 02:13:23,970 --> 02:13:28,260 echo $?, which is an admittedly very cryptic way of saying, 2732 02:13:28,260 --> 02:13:30,120 what was my exit status? 2733 02:13:30,120 --> 02:13:32,640 And if you hit Enter, you'll see 1. 2734 02:13:32,640 --> 02:13:35,370 By contrast, if I run exit of David, and I actually 2735 02:13:35,370 --> 02:13:42,060 see "hello, David," and I do echo $?, now I will see 0. 2736 02:13:42,060 --> 02:13:45,030 So again, this is not a technique you and I will use very frequently. 2737 02:13:45,030 --> 02:13:48,480 But it's a capability of a program, and it's a capability of C, 2738 02:13:48,480 --> 02:13:49,920 that you do now have access to. 2739 02:13:49,920 --> 02:13:52,140 And so in writing programs moving forward, 2740 02:13:52,140 --> 02:13:55,380 what we will often do in labs and in problem sets and the like 2741 02:13:55,380 --> 02:14:02,430 is ask you to return from main either 0 or 1 or maybe 2 or 3 or 4 2742 02:14:02,430 --> 02:14:06,060 based on the problems that might have gone wrong in your program 2743 02:14:06,060 --> 02:14:09,420 that you have detected and responded to appropriately.
2744 02:14:09,420 --> 02:14:13,530 So it's a very effective way of handling errors in a standard way 2745 02:14:13,530 --> 02:14:18,180 so that you know that you are being proactive about detecting mistakes. 2746 02:14:18,180 --> 02:14:20,540 So what kinds of mistakes might we handle this week? 2747 02:14:20,540 --> 02:14:22,290 And what kinds of problems might we solve? 2748 02:14:22,290 --> 02:14:26,100 Well, today was entirely about deconstructing what a string is. 2749 02:14:26,100 --> 02:14:29,220 Last week, it was just a sequence of text, a chunk of text. 2750 02:14:29,220 --> 02:14:31,740 Today, it's now an array of characters. 2751 02:14:31,740 --> 02:14:34,950 And we have new syntax in C for accessing those characters. 2752 02:14:34,950 --> 02:14:38,370 We also today have access to more libraries, more header files, 2753 02:14:38,370 --> 02:14:41,460 the documentation, therefore, so that we can actually solve problems 2754 02:14:41,460 --> 02:14:43,290 without writing as much code ourselves. 2755 02:14:43,290 --> 02:14:46,630 We can use other people's code in the form of these libraries. 2756 02:14:46,630 --> 02:14:49,890 So one problem we will solve this coming week by way of problem set 2 2757 02:14:49,890 --> 02:14:51,120 is that of readability. 2758 02:14:51,120 --> 02:14:54,150 Like, when you're reading a book or an essay or a paper or anything, 2759 02:14:54,150 --> 02:14:56,370 what is it that makes it like a 3rd-grade reading 2760 02:14:56,370 --> 02:14:59,917 level or a 12th-grade reading level or university reading level? 2761 02:14:59,917 --> 02:15:02,250 Well, all of us probably have an intuitive sense, right? 2762 02:15:02,250 --> 02:15:05,940 Like, if it's big font and short words, it's probably for younger kids. 2763 02:15:05,940 --> 02:15:09,000 And if it's really complicated words with big vocabulary and things 2764 02:15:09,000 --> 02:15:12,460 we don't know, maybe it's meant for university audiences.
2765 02:15:12,460 --> 02:15:16,440 But we can quantify this a little more formulaically, 2766 02:15:16,440 --> 02:15:19,328 not necessarily the only way, but we'll give you a few definitions. 2767 02:15:19,328 --> 02:15:21,120 So for instance, here's a famous sentence-- 2768 02:15:21,120 --> 02:15:23,370 "Mr. and Mrs. Dursley, of number four, Privet Drive, 2769 02:15:23,370 --> 02:15:26,412 were proud to say that they were perfectly normal, thank you very much," 2770 02:15:26,412 --> 02:15:27,420 and so forth. 2771 02:15:27,420 --> 02:15:32,070 Well, what is it about this text that puts Harry Potter at grade seven 2772 02:15:32,070 --> 02:15:32,940 reading level? 2773 02:15:32,940 --> 02:15:35,520 Well, it probably has to do with the vocabulary words. 2774 02:15:35,520 --> 02:15:38,760 But it probably also has to do with the lengths of the sentences, the amount 2775 02:15:38,760 --> 02:15:44,550 of punctuation perhaps, the total number of characters that you might count up. 2776 02:15:44,550 --> 02:15:48,518 You can imagine quantifying it just based generically on the look 2777 02:15:48,518 --> 02:15:49,810 and the aesthetics of the text. 2778 02:15:49,810 --> 02:15:50,670 What about this? 2779 02:15:50,670 --> 02:15:53,010 "In computational linguistics, authorship attribution 2780 02:15:53,010 --> 02:15:55,590 is the task of predicting the author of a document of unknown authorship. 2781 02:15:55,590 --> 02:15:58,673 This task is generally performed by the analysis of stylometric features-- 2782 02:15:58,673 --> 02:16:00,750 particular"-- this is Brian's senior thesis. 2783 02:16:00,750 --> 02:16:02,650 So this is not a seventh-grade reading level. 2784 02:16:02,650 --> 02:16:04,860 This was actually rated at grade 16. 2785 02:16:04,860 --> 02:16:08,130 So Brian's pretty sophisticated when it comes to writing theses.
2786 02:16:08,130 --> 02:16:11,160 But there too, you could perhaps glean from the sophistication 2787 02:16:11,160 --> 02:16:14,010 of the sentences, the length thereof, and the words therein-- 2788 02:16:14,010 --> 02:16:17,010 there's something we could perhaps quantify so as to apply numbers. 2789 02:16:17,010 --> 02:16:21,720 And indeed, that's one way you could assess the readability of a text 2790 02:16:21,720 --> 02:16:24,480 even if you don't have access to a dictionary with which 2791 02:16:24,480 --> 02:16:27,360 to figure out which are the actual big or small words. 2792 02:16:27,360 --> 02:16:28,820 And what about cryptography? 2793 02:16:28,820 --> 02:16:32,160 So it's incredibly common these days and so important 2794 02:16:32,160 --> 02:16:37,020 these days for you and I to use cryptography, not necessarily using 2795 02:16:37,020 --> 02:16:39,389 algorithms we ourselves come up with, but rather using 2796 02:16:39,389 --> 02:16:43,469 software, like WhatsApp and Signal and Telegram and Messenger and others, 2797 02:16:43,469 --> 02:16:48,340 that support encryption between you and the third party or friend or family, 2798 02:16:48,340 --> 02:16:51,090 or at least minimally the website with which you're interacting. 2799 02:16:51,090 --> 02:16:55,590 So cryptography is the art of scrambling information, or hiding information. 2800 02:16:55,590 --> 02:16:59,430 And if that information is text, well, frankly, as of this third week of CS50, 2801 02:16:59,430 --> 02:17:03,059 we already have the requisite building blocks for not only representing text, 2802 02:17:03,059 --> 02:17:05,040 but we saw today manipulating it. 2803 02:17:05,040 --> 02:17:09,330 Even just uppercasing characters allows us to start mutating text. 2804 02:17:09,330 --> 02:17:11,459 Well, what does it mean to encrypt information? 2805 02:17:11,459 --> 02:17:13,650 Well, it's like our black box from last week. 2806 02:17:13,650 --> 02:17:14,520 You have some input. 
2807 02:17:14,520 --> 02:17:15,395 You want some output. 2808 02:17:15,395 --> 02:17:18,040 The input, we're going to start calling plaintext. 2809 02:17:18,040 --> 02:17:20,969 The message, you want to send from yourself to someone else. 2810 02:17:20,969 --> 02:17:22,976 Ciphertext is the output that you want. 2811 02:17:22,976 --> 02:17:24,809 And so in between there, there's going to be 2812 02:17:24,809 --> 02:17:26,226 what we're going to call a cipher. 2813 02:17:26,226 --> 02:17:30,270 A cipher is an algorithm that encrypts or scrambles 2814 02:17:30,270 --> 02:17:34,177 its input so as to produce output that a third party can't understand. 2815 02:17:34,177 --> 02:17:35,969 And hopefully, that cipher, that algorithm, 2816 02:17:35,969 --> 02:17:40,020 is a reversible process so that when you receive the scrambled ciphertext, 2817 02:17:40,020 --> 02:17:44,830 you can figure out what it was that the person sent to you. 2818 02:17:44,830 --> 02:17:48,030 But the key to using cryptography-- pun intended-- 2819 02:17:48,030 --> 02:17:49,282 is to also have a secret key. 2820 02:17:49,282 --> 02:17:51,240 So if you think back to grade school, maybe you 2821 02:17:51,240 --> 02:17:53,549 were flirting with someone in class, and you sent them 2822 02:17:53,549 --> 02:17:55,082 a note on a piece of paper. 2823 02:17:55,082 --> 02:17:58,290 Well, hopefully, you didn't just say, like, I love you, on the piece of paper 2824 02:17:58,290 --> 02:18:00,165 and then pass it through all of your friends, 2825 02:18:00,165 --> 02:18:02,910 or let alone the teacher, to the ultimate recipient. 2826 02:18:02,910 --> 02:18:05,340 Maybe you did something like, an A becomes 2827 02:18:05,340 --> 02:18:08,459 a B. A B becomes a C. A C becomes a D. 
Like, 2828 02:18:08,459 --> 02:18:11,740 you kind of apply an algorithm to add 1 to all of the letters 2829 02:18:11,740 --> 02:18:14,219 so that if the teacher does intercept it and look at it, 2830 02:18:14,219 --> 02:18:17,070 they probably don't have enough care in the world to figure out what this is. 2831 02:18:17,070 --> 02:18:18,690 It's just going to look like nonsense. 2832 02:18:18,690 --> 02:18:21,840 But if your friend knows that you changed A to B, B 2833 02:18:21,840 --> 02:18:26,010 to C by adding 1 to every letter, they could reverse that process 2834 02:18:26,010 --> 02:18:27,610 and decrypt it. 2835 02:18:27,610 --> 02:18:30,270 So the key, for instance, might be literally the number 1. 2836 02:18:30,270 --> 02:18:32,610 The message literally might be, "I LOVE YOU." 2837 02:18:32,610 --> 02:18:35,080 But what would the ciphertext be, or the output? 2838 02:18:35,080 --> 02:18:38,610 Well, let's consider "I LOVE YOU" is a string which, as of today, 2839 02:18:38,610 --> 02:18:40,240 is an array of characters. 2840 02:18:40,240 --> 02:18:42,000 So what use is that? 2841 02:18:42,000 --> 02:18:45,123 Well, let's consider exactly that phrase as though it's an array. 2842 02:18:45,123 --> 02:18:46,290 It's an array of characters. 2843 02:18:46,290 --> 02:18:50,969 We know from last week, characters are just integers, decimal integers, 2844 02:18:50,969 --> 02:18:53,190 thanks to ASCII, and in turn, Unicode. 2845 02:18:53,190 --> 02:18:55,770 So it turns out I, we already know, is 73. 2846 02:18:55,770 --> 02:19:04,920 And if we looked up all the others on a chart, L is 76, 79, 86, 69, 89, 79, 85. 
2847 02:19:04,920 --> 02:19:08,400 So we could relatively easily in C-- you might have to check your notes 2848 02:19:08,400 --> 02:19:10,240 and check my sample code and so forth-- 2849 02:19:10,240 --> 02:19:15,750 but relatively easily in C convert "I LOVE YOU" to the corresponding integers 2850 02:19:15,750 --> 02:19:19,290 by just casting, so to speak, chars to integers. 2851 02:19:19,290 --> 02:19:23,340 I could very easily mathematically, using the plus operator in C, 2852 02:19:23,340 --> 02:19:26,910 start to add 1 to every one of these characters, 2853 02:19:26,910 --> 02:19:29,309 thereby encrypting my message. 2854 02:19:29,309 --> 02:19:31,398 But I could send my friend these numbers. 2855 02:19:31,398 --> 02:19:33,690 But I might as well make it a little more user friendly 2856 02:19:33,690 --> 02:19:36,209 and cast it back from integers to chars. 2857 02:19:36,209 --> 02:19:42,930 So now it would seem that the ciphertext for "I LOVE YOU," if using a key of 1-- 2858 02:19:42,930 --> 02:19:47,910 and 1 just means change A to B, not A to C, just move it by one place-- 2859 02:19:47,910 --> 02:19:52,740 this is the ciphertext for an encrypted message of, "I LOVE YOU." 2860 02:19:52,740 --> 02:19:55,840 And so the whole process becomes 1 is the input as the key. 2861 02:19:55,840 --> 02:19:57,810 "I LOVE YOU" is the input as the plaintext. 2862 02:19:57,810 --> 02:20:00,990 And the output ultimately is this unpronounceable phrase 2863 02:20:00,990 --> 02:20:03,630 that, again, if the teacher or some friend intercepts, 2864 02:20:03,630 --> 02:20:06,060 they probably don't know what's going on. 2865 02:20:06,060 --> 02:20:08,520 And indeed, this is the essence of cryptography. 2866 02:20:08,520 --> 02:20:12,027 The algorithms that protect our emails and texts and financial information 2867 02:20:12,027 --> 02:20:13,860 and health information are hopefully way more 2868 02:20:13,860 --> 02:20:17,160 sophisticated than that particular algorithm as it is.
2869 02:20:17,160 --> 02:20:19,350 But it reduces to the same process-- 2870 02:20:19,350 --> 02:20:23,640 an input key and an input text followed by some output, 2871 02:20:23,640 --> 02:20:25,050 the so-called ciphertext. 2872 02:20:25,050 --> 02:20:28,500 And this has been with us for decades now in some form, sometimes even 2873 02:20:28,500 --> 02:20:29,400 mechanical form. 2874 02:20:29,400 --> 02:20:32,760 Back in the day, you could actually get these little circular devices 2875 02:20:32,760 --> 02:20:35,343 that have letters of the alphabet on one side, other letters 2876 02:20:35,343 --> 02:20:36,760 of the alphabet on the other side. 2877 02:20:36,760 --> 02:20:39,720 And if you rotate one or the other, A might line up 2878 02:20:39,720 --> 02:20:41,310 with B, B might line up with C. 2879 02:20:41,310 --> 02:20:44,760 So you can have even a physical incarnation of cryptography, 2880 02:20:44,760 --> 02:20:49,920 just as was popular in a movie that seems to play endlessly on TV, 2881 02:20:49,920 --> 02:20:52,930 at least here in the US around Christmas time. 2882 02:20:52,930 --> 02:20:56,980 And you might recognize if you've seen A Christmas Story one such look. 2883 02:20:56,980 --> 02:20:59,460 So we'll use just a couple of minutes of our final moments 2884 02:20:59,460 --> 02:21:02,910 together to take a look at this real-world incarnation of cryptography 2885 02:21:02,910 --> 02:21:06,533 that undoubtedly you can probably see on TV this fall. 2886 02:21:06,533 --> 02:21:07,200 [VIDEO PLAYBACK] 2887 02:21:07,200 --> 02:21:09,810 - "Be it known to all and sundry that Ralph Parker is hereby 2888 02:21:09,810 --> 02:21:12,720 appointed a member of the Little Orphan Annie secret circle 2889 02:21:12,720 --> 02:21:16,300 and is entitled to all the honors and benefits occurring thereto." 2890 02:21:16,300 --> 02:21:18,810 - "Signed, Little Orphan Annie." 2891 02:21:18,810 --> 02:21:22,920 "Countersigned, Pierre Andre," in ink.
2892 02:21:22,920 --> 02:21:25,620 Honors and benefits already at the age of nine. 2893 02:21:25,620 --> 02:21:27,942 [RADIO CHATTER] 2894 02:21:27,942 --> 02:21:28,900 - (ON RADIO) Attention! 2895 02:21:28,900 --> 02:21:29,710 [INAUDIBLE] overboard! 2896 02:21:29,710 --> 02:21:30,165 [CLANGING] 2897 02:21:30,165 --> 02:21:31,530 - (ON RADIO) Come [INAUDIBLE] Gone overboard! 2898 02:21:31,530 --> 02:21:32,440 - (ON RADIO) [INAUDIBLE] 2899 02:21:32,440 --> 02:21:33,808 - Come on, let's get on with it. 2900 02:21:33,808 --> 02:21:36,100 I don't need all that jazz about smugglers and pirates. 2901 02:21:36,100 --> 02:21:36,976 [BARKING] 2902 02:21:36,976 --> 02:21:37,645 2903 02:21:37,645 --> 02:21:40,270 - (ON RADIO) Listen tomorrow night for the concluding adventure 2904 02:21:40,270 --> 02:21:42,450 of the Black Pirate Ship. 2905 02:21:42,450 --> 02:21:48,430 Now it's time for Annie's secret message for you members of the secret circle. 2906 02:21:48,430 --> 02:21:52,150 Remember kids, only members of Annie's secret circle 2907 02:21:52,150 --> 02:21:54,730 can decode Annie's secret message. 2908 02:21:54,730 --> 02:21:58,900 Remember, Annie is depending on you. 2909 02:21:58,900 --> 02:22:01,480 Set your pins to B-2. 2910 02:22:01,480 --> 02:22:03,760 Here is the message. 2911 02:22:03,760 --> 02:22:05,710 12, 11, 2, 8-- 2912 02:22:05,710 --> 02:22:07,540 - I am in my first secret meeting. 2913 02:22:07,540 --> 02:22:12,250 - (ON RADIO) --25, 14, 11, 18, 16, 23-- 2914 02:22:12,250 --> 02:22:14,110 - Old Pierre was in great voice tonight. 2915 02:22:14,110 --> 02:22:14,350 - (ON RADIO) --12, 23-- 2916 02:22:14,350 --> 02:22:16,767 - I could tell that tonight's message was really important 2917 02:22:16,767 --> 02:22:19,660 - (ON RADIO) --21, 3, 25. 2918 02:22:19,660 --> 02:22:21,400 That's a message from Annie herself. 2919 02:22:21,400 --> 02:22:22,620 Remember, don't tell anyone.
2920 02:22:22,620 --> 02:22:25,584 [FOOTSTEPS AND PANTING] 2921 02:22:25,584 --> 02:22:27,560 2922 02:22:27,560 --> 02:22:31,550 - 90 seconds later, I'm in the only room in the house where a boy of nine 2923 02:22:31,550 --> 02:22:33,635 could sit in privacy and decode. 2924 02:22:33,635 --> 02:22:43,070 [CHUCKLES] Aha, B. [CHUCKLES] I went to the next, E. The first word is "be." 2925 02:22:43,070 --> 02:22:45,680 S, it was coming easier now. 2926 02:22:45,680 --> 02:22:47,747 U. [CHUCKLES] 25, that's R. 2927 02:22:47,747 --> 02:22:50,192 - Aw, come on, Ralphie, I gotta go. 2928 02:22:50,192 --> 02:22:51,170 - Come on. 2929 02:22:51,170 --> 02:22:53,126 - I'll be right down, Ma! 2930 02:22:53,126 --> 02:22:54,104 - Gee whiz. 2931 02:22:54,104 --> 02:22:57,040 2932 02:22:57,040 --> 02:23:01,120 - T, O. "Be sure to." 2933 02:23:01,120 --> 02:23:02,380 Be sure to what? 2934 02:23:02,380 --> 02:23:04,513 What was Little Orphan Annie trying to say? 2935 02:23:04,513 --> 02:23:05,180 Be sure to what? 2936 02:23:05,180 --> 02:23:06,970 - Ralphie, Randy has got to go. 2937 02:23:06,970 --> 02:23:08,350 Will you please come out? 2938 02:23:08,350 --> 02:23:09,580 - All right, Ma! 2939 02:23:09,580 --> 02:23:11,470 I'll be right out! 2940 02:23:11,470 --> 02:23:13,370 - I was getting closer now. 2941 02:23:13,370 --> 02:23:15,300 The tension was terrible. 2942 02:23:15,300 --> 02:23:16,310 What was it? 2943 02:23:16,310 --> 02:23:18,776 The fate of the planet may hang in the balance. 2944 02:23:18,776 --> 02:23:19,276 [KNOCKING] 2945 02:23:19,276 --> 02:23:19,776 - Ralphie! 2946 02:23:19,776 --> 02:23:21,666 Randy's got to go! 2947 02:23:21,666 --> 02:23:25,012 - I'll be right out, for crying out loud! 2948 02:23:25,012 --> 02:23:26,860 - [CHUCKLES] Almost there. 2949 02:23:26,860 --> 02:23:27,930 My fingers flew. 2950 02:23:27,930 --> 02:23:31,560 My mind was a steel trap, every pore vibrated. 2951 02:23:31,560 --> 02:23:33,524 It was almost clear. 
2952 02:23:33,524 --> 02:23:35,894 Yes, yes, yes, yes. 2953 02:23:35,894 --> 02:23:41,700 - "Be sure to drink your Ovaltine." 2954 02:23:41,700 --> 02:23:42,630 Ovaltine? 2955 02:23:42,630 --> 02:23:46,510 2956 02:23:46,510 --> 02:23:47,750 A crummy commercial? 2957 02:23:47,750 --> 02:23:50,890 [MUSIC PLAYING] 2958 02:23:50,890 --> 02:23:52,317 Son of a bitch. 2959 02:23:52,317 --> 02:23:52,900 [END PLAYBACK] 2960 02:23:52,900 --> 02:23:55,030 DAVID MALAN: All right, that's it for CS50. 2961 02:23:55,030 --> 02:23:57,310 We will see you next time. 2962 02:23:57,310 --> 02:24:00,660 [MUSIC PLAYING] 2963 02:24:00,660 --> 02:24:58,000