1 00:00:00,000 --> 00:00:02,982 2 00:00:02,982 --> 00:00:06,461 [MUSIC PLAYING] 3 00:00:06,461 --> 00:01:12,600 4 00:01:12,600 --> 00:01:13,590 DAVID MALAN: All right. 5 00:01:13,590 --> 00:01:17,130 This is CS50, and this is week 2 wherein we're 6 00:01:17,130 --> 00:01:20,610 going to take a look at a lower level at how things work, 7 00:01:20,610 --> 00:01:24,120 and indeed, among the goals of the course is this bottom-up understanding 8 00:01:24,120 --> 00:01:26,670 so that in a couple of weeks' time, even a few years' time, 9 00:01:26,670 --> 00:01:29,920 when you encounter some new technology, you'll be able to think back hopefully 10 00:01:29,920 --> 00:01:33,180 on some of this week's basic building blocks and primitives 11 00:01:33,180 --> 00:01:36,060 and really just deduce how tomorrow's technologies work. 12 00:01:36,060 --> 00:01:37,685 But along the way, it's going to seem-- 13 00:01:37,685 --> 00:01:40,727 it's going to be a little hard, perhaps, to see the forest for the trees, 14 00:01:40,727 --> 00:01:41,380 so to speak. 15 00:01:41,380 --> 00:01:44,783 And so the goal at the end of the day still is going to be problem-solving. 16 00:01:44,783 --> 00:01:47,700 And so we thought we'd begin today with a look at some of the problems 17 00:01:47,700 --> 00:01:50,405 we'll talk about or solve this coming week, 18 00:01:50,405 --> 00:01:53,280 and for that, we have some brave volunteers who have already come up. 19 00:01:53,280 --> 00:01:58,320 If we could turn on some dramatic lighting and meet today's volunteers. 20 00:01:58,320 --> 00:02:00,430 So on my left here, we have-- 21 00:02:00,430 --> 00:02:00,930 ALEX: Hi. 22 00:02:00,930 --> 00:02:01,960 My name is Alex. 23 00:02:01,960 --> 00:02:05,340 I'm a first-year at the college and I'm from Chapel Hill, North Carolina. 24 00:02:05,340 --> 00:02:07,080 DAVID MALAN: Welcome to Alex. 25 00:02:07,080 --> 00:02:09,180 And to Alex's right. 26 00:02:09,180 --> 00:02:10,050 SARAH: I'm Sarah.
27 00:02:10,050 --> 00:02:13,230 I'm from Toronto, Canada, and I'm also a first-year student at the college. 28 00:02:13,230 --> 00:02:14,188 DAVID MALAN: Wonderful. 29 00:02:14,188 --> 00:02:15,869 Well, welcome to both Alex and Sarah. 30 00:02:15,869 --> 00:02:18,577 So one of the problems you'll perhaps solve this week for problem 31 00:02:18,577 --> 00:02:22,442 set 2 is to analyze the reading level of a body of text, 32 00:02:22,442 --> 00:02:25,650 whether someone reads at a first grade level, second grade level, third grade 33 00:02:25,650 --> 00:02:28,570 level, all the way up to 12 or 13 or beyond. 34 00:02:28,570 --> 00:02:32,250 You've perhaps never quite thought about, certainly in terms of code, 35 00:02:32,250 --> 00:02:35,310 how you would analyze some text, some book, and figure 36 00:02:35,310 --> 00:02:36,750 out what reading level it is at. 37 00:02:36,750 --> 00:02:40,330 And yet, surely our teachers growing up knew or had an intuitive sense of this. 38 00:02:40,330 --> 00:02:42,450 So let's consider some sample text. 39 00:02:42,450 --> 00:02:45,960 For instance, Alex, what have you been reading lately? 40 00:02:45,960 --> 00:02:52,502 ALEX: One fish, two fish, red fish, blue fish. 41 00:02:52,502 --> 00:02:53,460 DAVID MALAN: Wonderful. 42 00:02:53,460 --> 00:02:58,890 So given that, what grade level would you say Alex is currently reading at? 43 00:02:58,890 --> 00:03:01,500 Feel free to just shout it out. 44 00:03:01,500 --> 00:03:02,730 First, first? 45 00:03:02,730 --> 00:03:07,200 So indeed, you'll see this week, if you run your code on Alex's text, 46 00:03:07,200 --> 00:03:10,410 it actually turns out he reads below a first grade reading level. 47 00:03:10,410 --> 00:03:12,400 But why might that be? 48 00:03:12,400 --> 00:03:16,410 What might your intuition be for why we've 49 00:03:16,410 --> 00:03:19,020 accused Alex of reading at this level? 50 00:03:19,020 --> 00:03:20,990 Feel free to shout out.
51 00:03:20,990 --> 00:03:21,490 Yeah. 52 00:03:21,490 --> 00:03:24,520 So very few syllables, short words, short sentences. 53 00:03:24,520 --> 00:03:27,828 And so there's some heuristics, perhaps, we can infer from that short text, 54 00:03:27,828 --> 00:03:30,370 that that probably means that it's best for younger children. 55 00:03:30,370 --> 00:03:33,370 Now Sarah, by contrast, what have you been reading? 56 00:03:33,370 --> 00:03:35,470 SARAH: Mr. and Mrs. Dursley, of number 57 00:03:35,470 --> 00:03:38,890 four, Privet Drive, were proud to say that they were 58 00:03:38,890 --> 00:03:41,050 perfectly normal, thank you very much. 59 00:03:41,050 --> 00:03:43,480 They were the last people you'd expect to be involved 60 00:03:43,480 --> 00:03:46,390 in anything strange or mysterious because they just 61 00:03:46,390 --> 00:03:47,952 didn't hold with such nonsense. 62 00:03:47,952 --> 00:03:48,910 DAVID MALAN: All right. 63 00:03:48,910 --> 00:03:50,950 Now irrespective of what grade you were in when 64 00:03:50,950 --> 00:03:53,283 you might have read that text, what grade level does Sarah 65 00:03:53,283 --> 00:03:55,230 seem to be reading at? 66 00:03:55,230 --> 00:03:57,570 So eighth grade, second grade. 67 00:03:57,570 --> 00:03:58,080 OK. 68 00:03:58,080 --> 00:04:01,125 So hearing a bit of everything, so with that, at least according to code, 69 00:04:01,125 --> 00:04:03,240 it would actually be seventh grade. 70 00:04:03,240 --> 00:04:05,130 And what might the intuition there be? 71 00:04:05,130 --> 00:04:07,620 Why is that a higher grade level even though we might 72 00:04:07,620 --> 00:04:09,917 disagree exactly which grade it is? 73 00:04:09,917 --> 00:04:11,250 AUDIENCE: Complicated sentences. 74 00:04:11,250 --> 00:04:12,000 DAVID MALAN: Yeah. 75 00:04:12,000 --> 00:04:14,218 So complicated sentences, longer sentences.
76 00:04:14,218 --> 00:04:17,010 So indeed a lot more words were being spoken by Sarah because there 77 00:04:17,010 --> 00:04:18,519 was so much more there on the page. 78 00:04:18,519 --> 00:04:22,079 So we'll translate these ideas this coming week in problem set 2, 79 00:04:22,079 --> 00:04:25,170 if you tackle this one, through code so that you can ultimately 80 00:04:25,170 --> 00:04:26,910 infer things like these quantitatively. 81 00:04:26,910 --> 00:04:29,190 But to do so, we're going to have to understand text. 82 00:04:29,190 --> 00:04:32,610 So let's first thank our volunteers and then we'll dive in to that lower level. 83 00:04:32,610 --> 00:04:35,337 [APPLAUSE] 84 00:04:35,337 --> 00:04:39,910 85 00:04:39,910 --> 00:04:40,600 Sorry. 86 00:04:40,600 --> 00:04:41,490 You can keep those. 87 00:04:41,490 --> 00:04:42,222 SARAH: Oh, OK. 88 00:04:42,222 --> 00:04:43,180 DAVID MALAN: All right. 89 00:04:43,180 --> 00:04:45,970 So besides that, let's consider one other body of text 90 00:04:45,970 --> 00:04:48,010 perhaps that you might see this week, which 91 00:04:48,010 --> 00:04:50,210 is namely a little something like this. 92 00:04:50,210 --> 00:04:53,860 What I have here on the screen is what we'll start calling today ciphertext. 93 00:04:53,860 --> 00:04:56,530 It's the result of encrypting some piece of information. 94 00:04:56,530 --> 00:05:00,190 And encryption, or more generally, the art and science of cryptography 95 00:05:00,190 --> 00:05:00,908 is all around us. 96 00:05:00,908 --> 00:05:03,700 It's what you're using on the web, on your phones, with your banks. 97 00:05:03,700 --> 00:05:07,000 And anything that tries to keep data secure is using encryption. 98 00:05:07,000 --> 00:05:10,390 But there's going to be different levels of encryption-- strong encryption, 99 00:05:10,390 --> 00:05:11,140 weak encryption.
100 00:05:11,140 --> 00:05:14,590 And what you see here on the screen isn't all that strong, 101 00:05:14,590 --> 00:05:18,190 but we'll see later today how we might decrypt this and actually reveal 102 00:05:18,190 --> 00:05:22,030 what the plaintext is that corresponds to that ciphertext. 103 00:05:22,030 --> 00:05:25,670 But in order to do so, we have to start taking off some training wheels, 104 00:05:25,670 --> 00:05:26,197 so to speak. 105 00:05:26,197 --> 00:05:28,030 And believe it or not, even though your time 106 00:05:28,030 --> 00:05:30,100 with C this past week for the first time, 107 00:05:30,100 --> 00:05:32,230 probably, might have been rather in the weeds 108 00:05:32,230 --> 00:05:36,072 and much more complicated seemingly than Scratch, it turns out that along the way, 109 00:05:36,072 --> 00:05:37,780 we have been providing and we'll continue 110 00:05:37,780 --> 00:05:39,760 to provide certain training wheels. 111 00:05:39,760 --> 00:05:42,190 For instance, the CS50 Library is one of them, 112 00:05:42,190 --> 00:05:46,240 and even some of the explanations we give of topics for now 113 00:05:46,240 --> 00:05:49,120 in these early weeks will be somewhat simplified-- abstracted away, 114 00:05:49,120 --> 00:05:49,730 if you will. 115 00:05:49,730 --> 00:05:51,730 But the goal ultimately is for you to understand 116 00:05:51,730 --> 00:05:55,060 each and every one of those details so that after CS50, you really 117 00:05:55,060 --> 00:05:58,210 can stand on your own and understand and wrap your mind 118 00:05:58,210 --> 00:06:01,040 around any future technologies as well. 119 00:06:01,040 --> 00:06:05,318 So let's consider first the very first program with which we began last week, 120 00:06:05,318 --> 00:06:06,110 which was this one. 121 00:06:06,110 --> 00:06:09,215 So "hello, world" in C.
At the end of the day, it was really the printf 122 00:06:09,215 --> 00:06:11,590 function that was doing the interesting part of the work, 123 00:06:11,590 --> 00:06:14,890 but there was a lot of technical stuff above and below it. 124 00:06:14,890 --> 00:06:19,900 The curly braces, the parentheses, words like void and include, and then 125 00:06:19,900 --> 00:06:21,730 of course, the angle brackets and more. 126 00:06:21,730 --> 00:06:25,870 But at the end of the day, we needed to convert that source code in C 127 00:06:25,870 --> 00:06:30,190 to machine code, the 0's and 1's in binary that the computer understood. 128 00:06:30,190 --> 00:06:32,500 And to do that, of course, we ran-- 129 00:06:32,500 --> 00:06:33,700 we compiled the code. 130 00:06:33,700 --> 00:06:37,400 We ran make and then we were able to actually run that code there. 131 00:06:37,400 --> 00:06:39,370 So let me actually go over here to VS Code 132 00:06:39,370 --> 00:06:44,510 and really quickly recreate that hello.c pretty much by transcribing the same. 133 00:06:44,510 --> 00:06:51,970 So I might have here include stdio.h, int main void. 134 00:06:51,970 --> 00:06:54,460 And then in here, I had quite simply, hello, 135 00:06:54,460 --> 00:06:57,430 comma, world with my backslash n, end quotes, and more. 136 00:06:57,430 --> 00:07:01,693 Now last time, to compile this, I indeed ran make hello, followed by Enter. 137 00:07:01,693 --> 00:07:03,860 Hopefully you see no errors and that's a good thing. 138 00:07:03,860 --> 00:07:05,980 And if you do dot, slash, hello, you see, 139 00:07:05,980 --> 00:07:07,840 in fact, the results of that program. 140 00:07:07,840 --> 00:07:11,470 But it turns out that make is not actually a compiler 141 00:07:11,470 --> 00:07:12,950 as I alluded to last week. 142 00:07:12,950 --> 00:07:15,520 It's a program that clearly makes your program, 143 00:07:15,520 --> 00:07:19,030 but it itself just automates the process of using an actual compiler.
144 00:07:19,030 --> 00:07:21,290 And there's lots of different compilers out there, 145 00:07:21,290 --> 00:07:24,190 and the one that it's actually using underneath the hood 146 00:07:24,190 --> 00:07:27,640 is a little something called Clang for C Language. 147 00:07:27,640 --> 00:07:30,190 And Clang is a pretty popular compiler nowadays. 148 00:07:30,190 --> 00:07:33,520 There's another one that's been around for ages called GCC, 149 00:07:33,520 --> 00:07:36,330 but these are just specific names for types of compilers 150 00:07:36,330 --> 00:07:38,830 that different people, different companies, different groups 151 00:07:38,830 --> 00:07:40,310 have actually created. 152 00:07:40,310 --> 00:07:44,800 But if you use in week 1 a compiler yourself manually, 153 00:07:44,800 --> 00:07:47,170 you have to understand a little more about what's 154 00:07:47,170 --> 00:07:50,703 going on because it's even more cryptic than just make alone. 155 00:07:50,703 --> 00:07:53,620 So in fact, let me go back to my terminal window here, let me go ahead 156 00:07:53,620 --> 00:07:58,690 and clear the screen a little bit and just run really the raw compiler 157 00:07:58,690 --> 00:07:59,360 command. 158 00:07:59,360 --> 00:08:01,450 So what make is automating for me, let me 159 00:08:01,450 --> 00:08:03,620 actually do this manually for just a moment. 160 00:08:03,620 --> 00:08:10,450 So if I want to compile hello.c into an executable program I can run, 161 00:08:10,450 --> 00:08:12,220 I can do this. 162 00:08:12,220 --> 00:08:17,110 clang, space, hello.c, and then Enter. 163 00:08:17,110 --> 00:08:20,980 And now there's no output, which is a good thing in this case, no errors, 164 00:08:20,980 --> 00:08:22,010 but notice this. 165 00:08:22,010 --> 00:08:25,450 If I go ahead and type ls, it turns out there's 166 00:08:25,450 --> 00:08:32,140 a file that's been created suddenly in my current folder weirdly called a.out.
167 00:08:32,140 --> 00:08:33,580 That stands for Assembler Output. 168 00:08:33,580 --> 00:08:35,980 And long story short, that's actually the default name 169 00:08:35,980 --> 00:08:39,440 of a program that's created when you just run Clang by itself. 170 00:08:39,440 --> 00:08:41,830 Now that's a pretty bad name for a program 171 00:08:41,830 --> 00:08:44,000 because it doesn't describe what it does. 172 00:08:44,000 --> 00:08:49,870 So better would be here to perhaps do, well, instead of a.out, which, yes, 173 00:08:49,870 --> 00:08:53,950 still prints hello, world, but isn't really a clearly-named program, 174 00:08:53,950 --> 00:08:55,420 it'd be nice to name this hello. 175 00:08:55,420 --> 00:08:56,240 So what could I do? 176 00:08:56,240 --> 00:08:59,740 I could do like we learned last week-- well, I could rename a.out to hello 177 00:08:59,740 --> 00:09:01,820 by using Linux's mv command. 178 00:09:01,820 --> 00:09:04,480 So I'm going to move a.out to become hello. 179 00:09:04,480 --> 00:09:06,370 But that, too, seems kind of tedious. 180 00:09:06,370 --> 00:09:07,720 Now I have three steps. 181 00:09:07,720 --> 00:09:10,750 Like write my code, compile my code, and then rename it 182 00:09:10,750 --> 00:09:12,190 before I can even run it. 183 00:09:12,190 --> 00:09:13,580 We can do better than that. 184 00:09:13,580 --> 00:09:15,580 And so it turns out that certain commands 185 00:09:15,580 --> 00:09:18,220 like clang support what we're going to start today 186 00:09:18,220 --> 00:09:20,380 calling command line arguments. 187 00:09:20,380 --> 00:09:24,010 A command line argument, unlike an argument to a function, 188 00:09:24,010 --> 00:09:27,040 is just an additional word or key phrase that you 189 00:09:27,040 --> 00:09:30,400 type after a command at your prompt in your terminal 190 00:09:30,400 --> 00:09:33,440 window that just modifies the behavior of that command. 191 00:09:33,440 --> 00:09:35,600 It configures it a little more specifically.
192 00:09:35,600 --> 00:09:39,220 So what you're seeing here on the screen is somewhat of a better command with which 193 00:09:39,220 --> 00:09:45,220 to run clang so that now I can specify the output of this command per this -o. 194 00:09:45,220 --> 00:09:46,610 So what do I mean by that? 195 00:09:46,610 --> 00:09:48,943 Well, let me go ahead and clear my terminal window again 196 00:09:48,943 --> 00:09:54,955 and more explicitly type clang -o hello hello.c and then Enter. 197 00:09:54,955 --> 00:09:57,580 Nothing, again, appears to happen, but that's a good thing when 198 00:09:57,580 --> 00:10:02,860 you see no errors and now the program I just created is indeed called hello. 199 00:10:02,860 --> 00:10:07,280 So it achieves really the same exact effect as make did, but what 200 00:10:07,280 --> 00:10:09,820 I don't have to do with make is type and remember something 201 00:10:09,820 --> 00:10:11,075 as long as this command. 202 00:10:11,075 --> 00:10:12,700 And this, too, is a bit of a white lie. 203 00:10:12,700 --> 00:10:16,420 It turns out, we have preconfigured VS Code in the cloud for you 204 00:10:16,420 --> 00:10:21,310 to also use some other features of Clang that would be even more 205 00:10:21,310 --> 00:10:22,840 tedious for you to write yourselves. 206 00:10:22,840 --> 00:10:28,130 And so really, this is why we distill this as ultimately just running make. 207 00:10:28,130 --> 00:10:31,900 So let me pause here to see first if there's any questions on what I've 208 00:10:31,900 --> 00:10:34,540 done by taking my very first program in C 209 00:10:34,540 --> 00:10:37,720 and just now compiling it first with make, but then starting over 210 00:10:37,720 --> 00:10:40,780 and now manually compiling it with clang with what 211 00:10:40,780 --> 00:10:44,500 we'll call command line arguments. -o, space, hello, 212 00:10:44,500 --> 00:10:46,820 and then the name of the file. 213 00:10:46,820 --> 00:10:47,320 Yeah?
214 00:10:47,320 --> 00:10:48,780 AUDIENCE: What is a.out? 215 00:10:48,780 --> 00:10:49,530 DAVID MALAN: Yeah. 216 00:10:49,530 --> 00:10:51,870 So a.out is a historical name. 217 00:10:51,870 --> 00:10:55,240 It refers to assembler output-- more on that soon. 218 00:10:55,240 --> 00:10:58,080 And it's just the default file name that you get automatically 219 00:10:58,080 --> 00:11:01,350 if you just run the compiler on any file so that you 220 00:11:01,350 --> 00:11:02,970 have just a standard name for it. 221 00:11:02,970 --> 00:11:05,213 But it's not a very well-named program. 222 00:11:05,213 --> 00:11:07,380 Instead of running Microsoft Word on your Mac or PC, 223 00:11:07,380 --> 00:11:09,880 it would be like double-clicking on a.out. 224 00:11:09,880 --> 00:11:11,880 So instead with these command line arguments, 225 00:11:11,880 --> 00:11:17,370 you can customize the output of Clang and call it hello or anything you want. 226 00:11:17,370 --> 00:11:23,020 Other questions on what I've done here with Clang itself, the compiler? 227 00:11:23,020 --> 00:11:23,520 Yeah? 228 00:11:23,520 --> 00:11:25,510 AUDIENCE: What is -o? 229 00:11:25,510 --> 00:11:26,565 DAVID MALAN: So -o-- 230 00:11:26,565 --> 00:11:29,440 and you would only know this from reading the manual, taking a class, 231 00:11:29,440 --> 00:11:30,500 means output. 232 00:11:30,500 --> 00:11:35,890 So -o means change Clang's output to be a file called hello 233 00:11:35,890 --> 00:11:38,680 instead of the default, which is a.out. 234 00:11:38,680 --> 00:11:42,400 And this, too, is, again, a detail you would have to look up on a web page, 235 00:11:42,400 --> 00:11:44,810 read the manual, hear someone like me tell you about it. 236 00:11:44,810 --> 00:11:46,893 And in fact, there's even more than these options, 237 00:11:46,893 --> 00:11:48,890 but we'll just scratch the surface here. 238 00:11:48,890 --> 00:11:49,390 All right. 
239 00:11:49,390 --> 00:11:53,530 So if we now know this, what more is actually happening underneath the hood? 240 00:11:53,530 --> 00:11:57,250 Well, let's take a closer look at not just this version of my code, 241 00:11:57,250 --> 00:12:01,190 but my slightly more complicated version last week, 242 00:12:01,190 --> 00:12:03,430 which looked a little something like this, wherein 243 00:12:03,430 --> 00:12:07,330 I added in some dynamic input from the user so I could say not hello, world 244 00:12:07,330 --> 00:12:11,810 to everyone, but hello, David or hello to whoever actually runs this program. 245 00:12:11,810 --> 00:12:15,880 So in fact, let me go ahead and change my code here in VS Code just 246 00:12:15,880 --> 00:12:17,770 to match that same code from last week. 247 00:12:17,770 --> 00:12:19,190 So no new code yet. 248 00:12:19,190 --> 00:12:22,820 I'm just going to, in a moment, compile it in a slightly different way. 249 00:12:22,820 --> 00:12:29,020 So I did last week string, I think, answer equals get_string, quote-unquote, 250 00:12:29,020 --> 00:12:30,100 "What's your name?" 251 00:12:30,100 --> 00:12:31,540 Just like in Scratch. 252 00:12:31,540 --> 00:12:35,920 And then down here, instead of doing world, I initially wrote answer, 253 00:12:35,920 --> 00:12:37,450 but that didn't go well. 254 00:12:37,450 --> 00:12:41,530 What did I ultimately do instead to print out hello, David or hello, 255 00:12:41,530 --> 00:12:42,940 so-and-so? 256 00:12:42,940 --> 00:12:44,722 Yeah? 257 00:12:44,722 --> 00:12:45,680 Sorry, a little louder? 258 00:12:45,680 --> 00:12:46,430 AUDIENCE: %s? 259 00:12:46,430 --> 00:12:50,478 DAVID MALAN: Yeah, so %s, the so-called format code that printf just knows how 260 00:12:50,478 --> 00:12:51,020 to deal with. 261 00:12:51,020 --> 00:12:52,470 And I had to add one other thing. 262 00:12:52,470 --> 00:12:54,350 Someone else besides %s-- 263 00:12:54,350 --> 00:12:54,850 yeah?
264 00:12:54,850 --> 00:12:56,050 AUDIENCE: The name of the variable. 265 00:12:56,050 --> 00:12:58,870 DAVID MALAN: The name of the variable that I want to plug into that 266 00:12:58,870 --> 00:13:00,190 placeholder %s. 267 00:13:00,190 --> 00:13:01,630 And in this case, it's answer. 268 00:13:01,630 --> 00:13:04,363 Now let me make one refinement only because now we're in week 2 269 00:13:04,363 --> 00:13:06,530 and we're going to start writing more lines of code, 270 00:13:06,530 --> 00:13:10,360 even though Scratch called the return value of the ask puzzle piece 271 00:13:10,360 --> 00:13:11,560 answer always. 272 00:13:11,560 --> 00:13:14,480 In C, we have full control over what our variables are called. 273 00:13:14,480 --> 00:13:17,410 And now it's probably good not to just generically always call 274 00:13:17,410 --> 00:13:19,870 my variable answer if I'm using get_string. 275 00:13:19,870 --> 00:13:21,050 Let's call it what it is. 276 00:13:21,050 --> 00:13:23,680 So this is now just a matter of style, if you will. 277 00:13:23,680 --> 00:13:26,620 Let me change the variable to be name just so 278 00:13:26,620 --> 00:13:29,980 that it's a little clearer to me, to you, to a TF or TA 279 00:13:29,980 --> 00:13:34,000 exactly what that variable represents instead of more generically answer. 280 00:13:34,000 --> 00:13:37,030 All right, so that said, let me go down to my terminal window, 281 00:13:37,030 --> 00:13:41,050 and last week again, I ran make to compile this exact same program. 282 00:13:41,050 --> 00:13:43,270 Now, though, let me go ahead and just use clang. 283 00:13:43,270 --> 00:13:45,490 So clang -o-- 284 00:13:45,490 --> 00:13:47,500 I'll still call this version hello-- 285 00:13:47,500 --> 00:13:49,330 space, hello.c. 286 00:13:49,330 --> 00:13:51,080 So exact same command as before.
287 00:13:51,080 --> 00:13:54,640 The only thing that's different is I've added a couple of more lines of code 288 00:13:54,640 --> 00:13:56,330 to get the user's input. 289 00:13:56,330 --> 00:13:59,960 Let me hit Enter, and now, darn it, our first error. 290 00:13:59,960 --> 00:14:02,750 So output from clang and make is not a good thing, 291 00:14:02,750 --> 00:14:05,420 and here, we're seeing something particularly cryptic. 292 00:14:05,420 --> 00:14:09,010 So something in function 'main': undefined reference 293 00:14:09,010 --> 00:14:13,480 to 'get_string', and then linker command failed with exit code 1. 294 00:14:13,480 --> 00:14:16,540 So there's actually a lot of jargon in there that we'll tease apart today, 295 00:14:16,540 --> 00:14:20,338 but my hint is that clearly my problem's in main, although that's not surprising 296 00:14:20,338 --> 00:14:22,130 because there's nothing else going on here. 297 00:14:22,130 --> 00:14:26,830 get_string is an issue, and the issue is that it's an undefined reference. 298 00:14:26,830 --> 00:14:28,990 And yet, notice, I was pretty good. 299 00:14:28,990 --> 00:14:32,920 I added the CS50 header file and I said last week that that's 300 00:14:32,920 --> 00:14:35,920 enough to teach the compiler that functions exist, 301 00:14:35,920 --> 00:14:39,070 but the problem is that even though this does, in fact, 302 00:14:39,070 --> 00:14:43,090 teach Clang that get_string exists, it is not 303 00:14:43,090 --> 00:14:47,530 sufficient information for Clang to go find on the hard drive of the computer 304 00:14:47,530 --> 00:14:51,860 the 0's and 1's that actually implement get_string itself. 305 00:14:51,860 --> 00:14:54,250 So in other words, this include line, per last week, 306 00:14:54,250 --> 00:14:55,333 is a little bit of a hint. 307 00:14:55,333 --> 00:14:59,560 It's a teaser to Clang that you're about to see and use this function somewhere.
308 00:14:59,560 --> 00:15:05,710 But if you actually want to use the 0's and 1's that CS50 wrote some time ago 309 00:15:05,710 --> 00:15:08,740 and bake those into your program so your program actually 310 00:15:08,740 --> 00:15:11,470 knows how to get input from the user, well then, 311 00:15:11,470 --> 00:15:15,440 I'm going to have to go ahead and run a slightly different command. 312 00:15:15,440 --> 00:15:16,250 So let me do this. 313 00:15:16,250 --> 00:15:18,917 Let me clear my terminal window just to get rid of that distraction 314 00:15:18,917 --> 00:15:23,020 and let me propose now that we run this command instead. 315 00:15:23,020 --> 00:15:28,510 Almost the same as before, clang -o, space, hello, then hello.c, 316 00:15:28,510 --> 00:15:34,210 but with one additional command line argument at the end, and this is a -l-- 317 00:15:34,210 --> 00:15:35,050 not a number 1. 318 00:15:35,050 --> 00:15:39,370 So -lcs50 with no space in between those two. 319 00:15:39,370 --> 00:15:43,540 Now the l is going to result in all of those 0's and 1's that actually 320 00:15:43,540 --> 00:15:48,350 were written by CS50 being linked into your code, your few lines of code or mine 321 00:15:48,350 --> 00:15:48,850 here. 322 00:15:48,850 --> 00:15:53,530 But that's the second step that the compiler requires in order to know how 323 00:15:53,530 --> 00:15:58,537 to actually execute, or rather compile, your code and CS50's. 324 00:15:58,537 --> 00:16:00,370 And CS50 is not the only one that does this. 325 00:16:00,370 --> 00:16:04,750 If you use any third party library in C that doesn't come with the language, 326 00:16:04,750 --> 00:16:08,333 you would do -l such and such where whoever-- 327 00:16:08,333 --> 00:16:10,000 however they've named their own library. 328 00:16:10,000 --> 00:16:14,298 But you don't have to do it for built in things like we've been using thus far. 329 00:16:14,298 --> 00:16:16,090 All right, so let me go ahead and try this.
330 00:16:16,090 --> 00:16:19,000 I'll go back to VS Code here, and let me go ahead now 331 00:16:19,000 --> 00:16:23,620 and run clang -o hello, then hello.c. 332 00:16:23,620 --> 00:16:26,560 And now instead of just hitting Enter, -lcs50 333 00:16:26,560 --> 00:16:29,590 with no space between the l and the cs50, Enter. 334 00:16:29,590 --> 00:16:33,310 Now nothing bad happens, and now I can do ./hello. 335 00:16:33,310 --> 00:16:34,180 What's your name? 336 00:16:34,180 --> 00:16:37,633 I'll type in David, Enter, and now we see hello, David. 337 00:16:37,633 --> 00:16:40,300 Now honestly, this is where we're really getting into the weeds, 338 00:16:40,300 --> 00:16:42,130 and now this is taking-- 339 00:16:42,130 --> 00:16:45,730 this is really just adding nuisance to the process of compiling and running 340 00:16:45,730 --> 00:16:46,460 your code. 341 00:16:46,460 --> 00:16:49,960 And so the reality is, even though this is indeed what is happening, 342 00:16:49,960 --> 00:16:51,880 this is why we used last week and we're going 343 00:16:51,880 --> 00:16:55,240 to continue using from this week onward make because it just 344 00:16:55,240 --> 00:16:57,130 automates that whole process for you. 345 00:16:57,130 --> 00:17:00,130 But it's ideal to understand what's going wrong because any of the error 346 00:17:00,130 --> 00:17:02,770 messages you saw for problem set 1, any of the error messages 347 00:17:02,770 --> 00:17:05,859 you see for the next few weeks probably aren't coming from make, 348 00:17:05,859 --> 00:17:08,560 they're coming from Clang underneath the hood 349 00:17:08,560 --> 00:17:10,780 because make is just automating the process. 350 00:17:10,780 --> 00:17:14,060 But with make, you literally just write make and then the name of the program, 351 00:17:14,060 --> 00:17:17,560 you don't have to worry about any of those command line arguments. 352 00:17:17,560 --> 00:17:22,240 Questions, then, on compiling with -lcs50 or anything else?
353 00:17:22,240 --> 00:17:23,043 Yeah? 354 00:17:23,043 --> 00:17:24,960 AUDIENCE: What is the benefit of [INAUDIBLE]? 355 00:17:24,960 --> 00:17:26,220 DAVID MALAN: Sorry, what is the benefit of-- 356 00:17:26,220 --> 00:17:27,512 AUDIENCE: Using Clang manually. 357 00:17:27,512 --> 00:17:30,000 DAVID MALAN: What is the benefit of using Clang manually? 358 00:17:30,000 --> 00:17:30,870 None, really. 359 00:17:30,870 --> 00:17:33,450 In fact, all main is doing is just say-- make is doing 360 00:17:33,450 --> 00:17:35,055 is saving us some keystrokes. 361 00:17:35,055 --> 00:17:37,680 If you prefer, though, and you just like to be more in control, 362 00:17:37,680 --> 00:17:41,130 you can totally run Clang manually if you remember the various command line 363 00:17:41,130 --> 00:17:42,090 arguments. 364 00:17:42,090 --> 00:17:42,660 Yeah? 365 00:17:42,660 --> 00:17:47,335 AUDIENCE: So why did you have to explain [INAUDIBLE] 366 00:17:47,335 --> 00:17:48,210 DAVID MALAN: Exactly. 367 00:17:48,210 --> 00:17:49,560 Why did I have to explain-- 368 00:17:49,560 --> 00:17:53,220 that is, provide a hint to CS50 with the cs50.h header file, 369 00:17:53,220 --> 00:17:55,470 but I didn't have to do that with stdio.h? 370 00:17:55,470 --> 00:17:56,400 Just because. 371 00:17:56,400 --> 00:18:00,990 stdio.h comes with C, just like a few other libraries come 372 00:18:00,990 --> 00:18:03,060 with C that we'll start seeing today. 373 00:18:03,060 --> 00:18:05,410 CS50, though, is not built into C everywhere, 374 00:18:05,410 --> 00:18:07,890 and so you do have to explicitly add that one there. 375 00:18:07,890 --> 00:18:08,767 Yeah? 376 00:18:08,767 --> 00:18:11,970 AUDIENCE: Can you define what command line argument [INAUDIBLE]?
377 00:18:11,970 --> 00:18:15,210 DAVID MALAN: A command line argument is a word or phrase 378 00:18:15,210 --> 00:18:17,740 that you type at the command line-- 379 00:18:17,740 --> 00:18:22,200 a.k.a., your terminal-- in order to influence the behavior of a program. 380 00:18:22,200 --> 00:18:22,742 AUDIENCE: OK. 381 00:18:22,742 --> 00:18:24,430 So it's a term for whatever you're giving it. 382 00:18:24,430 --> 00:18:24,565 DAVID MALAN: Yeah. 383 00:18:24,565 --> 00:18:25,660 It changes the defaults. 384 00:18:25,660 --> 00:18:27,790 In our GUI world, Graphical User Interface, 385 00:18:27,790 --> 00:18:29,680 you and I would probably click some boxes, 386 00:18:29,680 --> 00:18:32,350 we would select some menu options to configure a program 387 00:18:32,350 --> 00:18:33,460 to behave in the same way. 388 00:18:33,460 --> 00:18:36,850 At a command line interface, you have to just say everything all at once, 389 00:18:36,850 --> 00:18:39,600 and that's why we have command line arguments. 390 00:18:39,600 --> 00:18:40,605 Yeah? 391 00:18:40,605 --> 00:18:43,243 AUDIENCE: Is make [INAUDIBLE] 392 00:18:43,243 --> 00:18:43,910 DAVID MALAN: No. 393 00:18:43,910 --> 00:18:45,470 Make is not just for CS50. 394 00:18:45,470 --> 00:18:50,480 It's used globally in any project really nowadays using C, C++, 395 00:18:50,480 --> 00:18:52,020 even other languages as well. 396 00:18:52,020 --> 00:18:54,140 In fact, most every command you see in this class, 397 00:18:54,140 --> 00:18:57,530 unless it has 5-0 at the end of it, is globally used. 398 00:18:57,530 --> 00:19:00,758 Only those suffixed with 50 are, indeed, course-specific. 399 00:19:00,758 --> 00:19:03,050 And even those we'll gradually take training wheels off 400 00:19:03,050 --> 00:19:06,890 of so that you know exactly what those commands are doing as well. 401 00:19:06,890 --> 00:19:09,053 All right, so what is it that we've just done?
402 00:19:09,053 --> 00:19:11,720 Everything we've just done, of course, I keep calling compiling, 403 00:19:11,720 --> 00:19:13,580 but let's just go down one rabbit hole so 404 00:19:13,580 --> 00:19:15,967 that you understand that when you compile code, 405 00:19:15,967 --> 00:19:18,050 there's actually a whole bunch of steps happening, 406 00:19:18,050 --> 00:19:21,800 and this is going to enable a lot of features, like companies can 407 00:19:21,800 --> 00:19:26,060 write code and then convert it to run it on Macs and PCs alike 408 00:19:26,060 --> 00:19:27,240 or phones or the like. 409 00:19:27,240 --> 00:19:30,320 So it's not just a matter of converting source code to machine code, 410 00:19:30,320 --> 00:19:34,610 there's actually four steps involved in what you and I, as of last week, 411 00:19:34,610 --> 00:19:35,840 know as compiling. 412 00:19:35,840 --> 00:19:39,033 And these aren't terms that you'll have to keep in mind constantly 413 00:19:39,033 --> 00:19:41,450 because again, we're going to abstract a lot of this away. 414 00:19:41,450 --> 00:19:43,492 But just so we've gone down the rabbit hole once, 415 00:19:43,492 --> 00:19:45,890 let's consider each of these four steps that 416 00:19:45,890 --> 00:19:49,850 have been happening for you for a week automatically, the first of which 417 00:19:49,850 --> 00:19:51,080 is called preprocessing. 418 00:19:51,080 --> 00:19:52,260 So what does this mean? 419 00:19:52,260 --> 00:19:54,450 Well, let's consider that same program as before. 420 00:19:54,450 --> 00:19:57,830 So notice that two of the lines of code start with a hash mark. 421 00:19:57,830 --> 00:20:02,338 That is a special symbol in C, and it's a so-called preprocessor directive. 422 00:20:02,338 --> 00:20:04,130 You don't need to memorize terms like that, 423 00:20:04,130 --> 00:20:07,005 but it just means that it's a little different from every other line.
424 00:20:07,005 --> 00:20:08,960 And anything with a hash symbol here should 425 00:20:08,960 --> 00:20:13,315 be preprocessed-- that is, analyzed initially before anything else happens. 426 00:20:13,315 --> 00:20:17,100 So let's consider these two lines up top, what exactly is happening. 427 00:20:17,100 --> 00:20:19,220 Well, it turns out with these two lines, you 428 00:20:19,220 --> 00:20:23,390 have two header files, of course, cs50.h and stdio.h. 429 00:20:23,390 --> 00:20:27,980 Where are those files, because they've never been in VS Code for you, 430 00:20:27,980 --> 00:20:28,550 seemingly. 431 00:20:28,550 --> 00:20:31,940 If you type ls-- if you open up the File Explorer in the GUI, 432 00:20:31,940 --> 00:20:35,900 you have never seen, probably, cs50.h or stdio.h. 433 00:20:35,900 --> 00:20:39,620 They just work, but that's because there's a folder somewhere 434 00:20:39,620 --> 00:20:43,340 on the hard drive that you're using on your Mac or PC 435 00:20:43,340 --> 00:20:45,690 or somewhere in the cloud, as in our case. 436 00:20:45,690 --> 00:20:50,210 And this folder is traditionally called /usr/include. 437 00:20:50,210 --> 00:20:51,857 And user is deliberately misspelled. 438 00:20:51,857 --> 00:20:54,440 It's just slightly more succinct, although it's a little weird 439 00:20:54,440 --> 00:20:55,760 why we drop that one letter. 440 00:20:55,760 --> 00:21:01,760 But /usr/include is just a folder on the server that contains cs50.h, stdio.h, 441 00:21:01,760 --> 00:21:03,990 and a bunch of other things as well. 442 00:21:03,990 --> 00:21:08,030 So in fact, if you type in VS Code, in your terminal window, 443 00:21:08,030 --> 00:21:13,310 when you're using Codespaces in the cloud and type ls space /usr/include, 444 00:21:13,310 --> 00:21:15,470 you can see all of the files in that folder. 445 00:21:15,470 --> 00:21:17,580 But we've preinstalled all of that stuff for you.
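To make the idea of a preprocessor directive concrete, here is a minimal sketch that is not taken from the lecture. The names GREETING and greeting_length are made up for illustration. The only point is that every line starting with a hash symbol is handled before the ordinary C around it is compiled.

```c
// Lines starting with # are preprocessor directives: they are handled
// before any of the ordinary C below is compiled.
#include <string.h>  // a header that comes with C, typically found in a
                     // system folder such as /usr/include

// GREETING is a hypothetical name chosen for this sketch. The preprocessor
// replaces every later use of it with the text on the right.
#define GREETING "hello, world"

// Returns the length of the greeting. After preprocessing, the compiler
// effectively sees strlen("hello, world") here.
size_t greeting_length(void)
{
    return strlen(GREETING);
}
```

Calling greeting_length() from a program's main function would report the number of characters in the greeting, with no trace of the name GREETING left in the compiled result.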
446 00:21:17,580 --> 00:21:20,390 So let's consider what's actually in those files here. 447 00:21:20,390 --> 00:21:25,370 If I highlight these two lines up top that start with hash include, well, 448 00:21:25,370 --> 00:21:30,530 I kind of hinted last week that what's in that first file is a hint as to what 449 00:21:30,530 --> 00:21:32,660 functions CS50 wrote for you. 450 00:21:32,660 --> 00:21:35,540 So you can kind of think of these include lines 451 00:21:35,540 --> 00:21:38,300 as being temporary placeholders for what's 452 00:21:38,300 --> 00:21:41,000 going to become like a global find and replace. 453 00:21:41,000 --> 00:21:44,270 That is, the first thing clang is going to do is to preprocess this file. 454 00:21:44,270 --> 00:21:47,300 It's going to look for any line that starts with hash include. 455 00:21:47,300 --> 00:21:50,960 And if it sees that, it's going to essentially go into that file, 456 00:21:50,960 --> 00:21:55,190 like cs50.h, and then just copy and paste the contents of that file 457 00:21:55,190 --> 00:21:56,443 magically there for you. 458 00:21:56,443 --> 00:21:58,110 You don't see it visually on the screen. 459 00:21:58,110 --> 00:22:00,060 But it's happening behind the scenes. 460 00:22:00,060 --> 00:22:03,230 And so really, what's happening with this first line 461 00:22:03,230 --> 00:22:09,380 is that somewhere in cs50.h is the declaration of get_string 462 00:22:09,380 --> 00:22:11,690 like we talked about last week, and it probably 463 00:22:11,690 --> 00:22:13,215 looks a little something like this. 464 00:22:13,215 --> 00:22:15,590 And we didn't spend much time on this yet this past week, 465 00:22:15,590 --> 00:22:17,030 but we will more in time. 466 00:22:17,030 --> 00:22:21,470 Notice that this is how a function is declared. 467 00:22:21,470 --> 00:22:23,677 That is, it is decreed to exist. 468 00:22:23,677 --> 00:22:25,760 The name of the function, of course, is get_string.
469 00:22:25,760 --> 00:22:28,310 Inside of the parentheses are its arguments. 470 00:22:28,310 --> 00:22:31,580 In this case, there's one argument to get_string, I claim today, 471 00:22:31,580 --> 00:22:33,080 but you've known this implicitly. 472 00:22:33,080 --> 00:22:34,160 And it's a prompt. 473 00:22:34,160 --> 00:22:36,860 It's the prompt that the human sees when you use get_string. 474 00:22:36,860 --> 00:22:37,790 What is that prompt? 475 00:22:37,790 --> 00:22:41,060 Well, it's a string of text, like quote unquote, "what's your name?" 476 00:22:41,060 --> 00:22:43,080 or anything else that I asked last week. 477 00:22:43,080 --> 00:22:46,610 Meanwhile, get_string, as we know from last week, has a return value. 478 00:22:46,610 --> 00:22:48,140 It returns something to you. 479 00:22:48,140 --> 00:22:49,610 And that, too, is a string. 480 00:22:49,610 --> 00:22:52,120 So again, this is also called a function's prototype. 481 00:22:52,120 --> 00:22:53,870 It's the thing toward the end of last week 482 00:22:53,870 --> 00:22:57,560 that I just copied and pasted from the bottom of my file to the top, 483 00:22:57,560 --> 00:23:02,030 just so that it was like this teaser for clang as to what would exist later. 484 00:23:02,030 --> 00:23:07,670 So you can think, then, of these include lines as just kind of combining all 485 00:23:07,670 --> 00:23:11,360 of those function declarations in some separate file called cs50.h, 486 00:23:11,360 --> 00:23:14,780 so that you yourself don't have to type them every time you use the library-- 487 00:23:14,780 --> 00:23:18,470 or worse, so that you, yourself, don't have to copy and paste those lines. 488 00:23:18,470 --> 00:23:22,520 This is what clang is doing for you in its first step of preprocessing. 489 00:23:22,520 --> 00:23:27,470 Second, and last in this example, what happens when clang preprocesses 490 00:23:27,470 --> 00:23:29,175 this second include line?
491 00:23:29,175 --> 00:23:31,550 Well, the only other function we care about in this story 492 00:23:31,550 --> 00:23:33,650 is printf, of course, which comes with C. 493 00:23:33,650 --> 00:23:39,440 So essentially, you can think of printf's prototype or declaration 494 00:23:39,440 --> 00:23:40,820 as just being this. 495 00:23:40,820 --> 00:23:42,870 Printf is the name of the function. 496 00:23:42,870 --> 00:23:47,370 It takes a string that you want to format like, Hello comma world, 497 00:23:47,370 --> 00:23:49,110 or Hello comma %s. 498 00:23:49,110 --> 00:23:52,120 And then with dot, dot, dot, this actually has technical meaning. 499 00:23:52,120 --> 00:23:55,770 It means, of course, that you can plug in 0 variables, 1 variable, 2 500 00:23:55,770 --> 00:23:56,340 or 10. 501 00:23:56,340 --> 00:23:58,530 So dot, dot, dot means some number of variables. 502 00:23:58,530 --> 00:24:00,072 Now we haven't talked about this yet. 503 00:24:00,072 --> 00:24:01,410 And we won't really, in general. 504 00:24:01,410 --> 00:24:05,490 printf actually returns a value, a number, that is an integer. 505 00:24:05,490 --> 00:24:07,420 But more on that perhaps another time. 506 00:24:07,420 --> 00:24:10,920 It's generally not something the programmer tends to look at. 507 00:24:10,920 --> 00:24:14,250 But that's all we mean by preprocessing, so that at the end of this process, 508 00:24:14,250 --> 00:24:18,030 even though there's more lines of code in cs50.h and stdio.h, 509 00:24:18,030 --> 00:24:21,330 what's really just happening is that clang, in preprocessing 510 00:24:21,330 --> 00:24:25,380 the file, copies and pastes the contents of those files into your code 511 00:24:25,380 --> 00:24:29,160 so that now your code knows about everything-- get_string, printf, 512 00:24:29,160 --> 00:24:31,060 and anything else. 513 00:24:31,060 --> 00:24:35,230 Any questions, then, on that first step, preprocessing? 514 00:24:35,230 --> 00:24:35,920 Yes?
515 00:24:35,920 --> 00:24:49,195 AUDIENCE: [INAUDIBLE] 516 00:24:49,195 --> 00:24:50,320 DAVID MALAN: Good question. 517 00:24:50,320 --> 00:24:52,720 When you include a file, does it only include what 518 00:24:52,720 --> 00:24:54,880 you need or does it include everything? 519 00:24:54,880 --> 00:24:56,420 Think of it as including everything. 520 00:24:56,420 --> 00:24:59,020 So if it's a big file, that's a lot of code at the very top. 521 00:24:59,020 --> 00:25:01,880 And that's why, if you think back to all of the zeros and ones 522 00:25:01,880 --> 00:25:03,880 I showed a little bit ago, as well as last week, 523 00:25:03,880 --> 00:25:06,130 there's a lot of zeros and ones that end up 524 00:25:06,130 --> 00:25:08,892 on the screen as a result of just writing, Hello, world. 525 00:25:08,892 --> 00:25:10,600 A lot of those zeros and ones are perhaps 526 00:25:10,600 --> 00:25:13,390 coming from code that you didn't actually, necessarily need. 527 00:25:13,390 --> 00:25:15,340 But some of it is perhaps there, but there 528 00:25:15,340 --> 00:25:17,740 are ways to optimize that as well. 529 00:25:17,740 --> 00:25:22,395 All right, so step two of compiling is, confusingly, called compiling. 530 00:25:22,395 --> 00:25:24,520 It's just, this is the term that most everyone uses 531 00:25:24,520 --> 00:25:27,940 to describe the whole process, instead of just this one step. 532 00:25:27,940 --> 00:25:32,140 But once a program has been preprocessed behind the scenes 533 00:25:32,140 --> 00:25:35,865 by the compiler for you, it looks now a little something like this. 534 00:25:35,865 --> 00:25:38,740 And I've put dot, dot, dot just to imply that, yes, to your question, 535 00:25:38,740 --> 00:25:39,820 there's more stuff above it. 536 00:25:39,820 --> 00:25:40,987 There's more stuff below it. 537 00:25:40,987 --> 00:25:43,070 It's just not interesting right now for us. 538 00:25:43,070 --> 00:25:44,860 So now we have just C code. 
539 00:25:44,860 --> 00:25:46,960 There's no more preprocessor directives. 540 00:25:46,960 --> 00:25:49,840 At this point, all of the hash symbols and those lines of code 541 00:25:49,840 --> 00:25:52,670 have been preprocessed and converted to something else. 542 00:25:52,670 --> 00:25:56,380 And so now-- and this is where things get a little spooky looking. 543 00:25:56,380 --> 00:26:00,370 Here now is what happens when clang, or any compiler, 544 00:26:00,370 --> 00:26:03,310 literally compiles code like this. 545 00:26:03,310 --> 00:26:08,720 It converts it from this in C to this in assembly code. 546 00:26:08,720 --> 00:26:10,720 So this is among the scarier languages. 547 00:26:10,720 --> 00:26:12,580 I, myself, don't really have fond memories. 548 00:26:12,580 --> 00:26:14,805 This is not a language that many people program in. 549 00:26:14,805 --> 00:26:16,930 If you take a subsequent class in computer science, 550 00:26:16,930 --> 00:26:19,600 in systems, a higher level class, you might actually 551 00:26:19,600 --> 00:26:21,430 learn this or some variant thereof. 552 00:26:21,430 --> 00:26:23,232 But there's at least a few people out there 553 00:26:23,232 --> 00:26:24,940 that need to know this stuff because this 554 00:26:24,940 --> 00:26:29,320 is closer to what the computers themselves, nowadays, understand. 555 00:26:29,320 --> 00:26:34,600 The Intel CPUs or the AMD CPUs, the brains of today's computers and phones 556 00:26:34,600 --> 00:26:37,960 understand stuff that looks more like this and less like C. 557 00:26:37,960 --> 00:26:42,430 Now it's completely esoteric, but let me just highlight a few phrases. 558 00:26:42,430 --> 00:26:44,630 There's some stuff that's a little familiar. 559 00:26:44,630 --> 00:26:47,620 There is mention of main at the top there in yellow. 560 00:26:47,620 --> 00:26:49,750 There is mention of get_string toward the bottom. 561 00:26:49,750 --> 00:26:52,070 There is mention of printf down below.
562 00:26:52,070 --> 00:26:55,600 So this is just another programming language called assembly language, 563 00:26:55,600 --> 00:26:57,010 that decades ago, humans-- 564 00:26:57,010 --> 00:26:58,450 myself included in school-- 565 00:26:58,450 --> 00:27:00,130 did write code in. 566 00:27:00,130 --> 00:27:02,630 And absolutely, some people still write this code, 567 00:27:02,630 --> 00:27:06,070 especially since you can write very, very efficient code. 568 00:27:06,070 --> 00:27:08,590 But it's a lot more arcane. 569 00:27:08,590 --> 00:27:11,380 It's a lot less user friendly. 570 00:27:11,380 --> 00:27:14,650 So you'll see in yellow now, these are the so-called instructions 571 00:27:14,650 --> 00:27:18,460 that a computer's brain or CPU understands, pushing values 572 00:27:18,460 --> 00:27:23,630 around, moving them, subtracting values, calling functions, and move, move, 573 00:27:23,630 --> 00:27:24,130 move. 574 00:27:24,130 --> 00:27:27,400 So really, the low-level operations that computers understand 575 00:27:27,400 --> 00:27:31,030 tend to be arithmetic operations-- subtraction, addition, 576 00:27:31,030 --> 00:27:34,120 and the like-- moving things in and out of memory. 577 00:27:34,120 --> 00:27:37,510 It's just a lot more tedious for folks like us to write code like this. 578 00:27:37,510 --> 00:27:40,450 This is why you and I tend to write stuff like this. 579 00:27:40,450 --> 00:27:44,080 And ideally, still, people like you and I tend to drag and drop puzzle pieces 580 00:27:44,080 --> 00:27:46,520 that sort of abstract all of that away further. 581 00:27:46,520 --> 00:27:49,420 But for now, this is, again, called assembly language. 582 00:27:49,420 --> 00:27:54,310 It is what happens when the compiler literally compiles your code. 583 00:27:54,310 --> 00:27:57,010 But of course, this is still not zeros and ones. 584 00:27:57,010 --> 00:27:58,580 So we got two steps to go.
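As a rough illustration of how C maps onto those low-level instructions, here is a small sketch. The function name difference is made up for this example, and the instruction names in the comments are only indicative: the exact assembly depends on the compiler, its settings, and the CPU.

```c
// A tiny function whose body corresponds to the kinds of operations a
// CPU understands: moving values around and doing arithmetic.
int difference(int a, int b)
{
    // Typically a couple of "move" instructions to fetch a and b,
    // followed by one "subtract" instruction.
    int result = a - b;

    // Typically a "move" of result into the place where return values
    // are expected, followed by a "return" instruction.
    return result;
}
```

Running clang with its -S option on a file containing this function should stop after the compiling step and leave the assembly in a .s file, which can then be compared against the comments above.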
585 00:27:58,580 --> 00:28:02,270 So when a compiler proceeds to step three, 586 00:28:02,270 --> 00:28:05,530 this is where things get converted to machine code. 587 00:28:05,530 --> 00:28:08,500 And when a compiler assembles your code for you, 588 00:28:08,500 --> 00:28:14,260 it converts what we just saw on the screen here to actual zeros and ones-- 589 00:28:14,260 --> 00:28:18,550 the so-called machine code that your phone or your computer understands. 590 00:28:18,550 --> 00:28:22,120 But it's worth noting that these are not necessarily all 591 00:28:22,120 --> 00:28:24,280 of the zeros and ones of your program. 592 00:28:24,280 --> 00:28:29,980 Yes, they are the zeros and ones that correspond to your Hello program 593 00:28:29,980 --> 00:28:33,250 or printf and get_string and the like, but notice 594 00:28:33,250 --> 00:28:36,940 that here, we need one final step. 595 00:28:36,940 --> 00:28:40,100 In those zeros and ones are only your lines of code. 596 00:28:40,100 --> 00:28:43,540 But what about CS50's lines of code that we wrote to implement get_string? 597 00:28:43,540 --> 00:28:46,990 What about the lines of code that humans wrote decades ago to implement printf? 598 00:28:46,990 --> 00:28:50,020 Those are somewhere on this hard drive, like on my Mac, my PC, 599 00:28:50,020 --> 00:28:54,460 or somewhere in the cloud, but we need to combine all of those zeros and ones 600 00:28:54,460 --> 00:29:01,390 together and link my code with CS50's code with standard I/O's code, 601 00:29:01,390 --> 00:29:02,420 all together.
602 00:29:02,420 --> 00:29:05,110 And so what happens in the last step, ultimately, 603 00:29:05,110 --> 00:29:07,960 is that if we have my code here in yellow, 604 00:29:07,960 --> 00:29:11,440 and then the code that CS50 wrote, and the code that the authors of C 605 00:29:11,440 --> 00:29:15,940 itself wrote, what really is happening is that somewhere, we have not only 606 00:29:15,940 --> 00:29:19,960 hello.c, which, obviously, I wrote, and wrote with us live here, 607 00:29:19,960 --> 00:29:24,550 there's also, let's assume, somewhere on the computer, a cs50.c file 608 00:29:24,550 --> 00:29:28,210 that, coincidentally, I and CS50 staff wrote years ago. 609 00:29:28,210 --> 00:29:30,790 And also, somewhere on the computer, there's another file. 610 00:29:30,790 --> 00:29:34,120 Let me oversimplify by just calling it stdio.c. 611 00:29:34,120 --> 00:29:36,850 In practice, it's probably specifically called printf.c. 612 00:29:36,850 --> 00:29:39,460 But they're somewhere, these two other files. 613 00:29:39,460 --> 00:29:44,110 And so this last step called linking takes my zeros and ones 614 00:29:44,110 --> 00:29:48,100 from the code I just wrote, namely this code on the screen here. 615 00:29:48,100 --> 00:29:50,810 It then grabs the zeros and ones that CS50 wrote. 616 00:29:50,810 --> 00:29:53,480 And it grabs the zeros and ones that the authors of C wrote, 617 00:29:53,480 --> 00:29:56,240 in order to implement the standard I/O library. 618 00:29:56,240 --> 00:30:00,750 And lastly, voila, links them all together. 619 00:30:00,750 --> 00:30:03,980 And this is the same blob of zeros and ones that we saw earlier. 
620 00:30:03,980 --> 00:30:08,090 It's just now the result of preprocessing your code, 621 00:30:08,090 --> 00:30:12,620 compiling your code, assembling your code, linking your code, and my God, 622 00:30:12,620 --> 00:30:15,830 at this point, like if there were any fun in programming for you yet, 623 00:30:15,830 --> 00:30:19,620 we've just taken it all away, we just call this whole process compiling. 624 00:30:19,620 --> 00:30:20,120 Why? 625 00:30:20,120 --> 00:30:22,490 Because now that we know those steps exist-- 626 00:30:22,490 --> 00:30:25,370 and smart people solve that problem for us-- 627 00:30:25,370 --> 00:30:27,890 you and I can kind of operate at this level of abstraction 628 00:30:27,890 --> 00:30:32,420 and just assume that compiling converts source code to machine code. 629 00:30:32,420 --> 00:30:36,350 Questions, though, on any of these intermediate steps? 630 00:30:36,350 --> 00:30:37,360 Yeah? 631 00:30:37,360 --> 00:30:41,958 AUDIENCE: For linking, are different parts, like [INAUDIBLE]? 632 00:30:41,958 --> 00:30:50,072 633 00:30:50,072 --> 00:30:51,280 DAVID MALAN: A good question. 634 00:30:51,280 --> 00:30:53,238 So where are all of these zeros and ones stored? 635 00:30:53,238 --> 00:30:56,400 Because you and I, we've been using a browser, right? code.cs50.io, 636 00:30:56,400 --> 00:30:58,330 of course, is this web-based user interface. 637 00:30:58,330 --> 00:31:00,497 But again, recall from last week, even though you're 638 00:31:00,497 --> 00:31:05,640 using a web browser to access VS Code, that web-based version of VS Code 639 00:31:05,640 --> 00:31:09,000 is connected to an actual server somewhere in the cloud. 640 00:31:09,000 --> 00:31:13,170 And on that server, you have your own account and your own files, and really, 641 00:31:13,170 --> 00:31:15,360 your own hard drive, virtually in the cloud.
642 00:31:15,360 --> 00:31:18,872 Think of it a little like Dropbox or Box or Google Drive or OneDrive 643 00:31:18,872 --> 00:31:19,830 or something like that. 644 00:31:19,830 --> 00:31:23,310 So you have a hard drive somewhere out there that we've provisioned for you. 645 00:31:23,310 --> 00:31:27,930 And it's on that hard drive that you have your code that you just wrote, 646 00:31:27,930 --> 00:31:32,700 or I just wrote, cs50.c, stdio.c, and all of the other code 647 00:31:32,700 --> 00:31:36,967 that implements the math functions and everything else that C supports. 648 00:31:36,967 --> 00:31:37,550 Good question. 649 00:31:37,550 --> 00:31:38,964 Yeah? 650 00:31:38,964 --> 00:31:45,425 AUDIENCE: So, say in the CS50 library, the line [INAUDIBLE] 651 00:31:45,425 --> 00:31:49,401 do we do the same exact thing [INAUDIBLE] 652 00:31:49,401 --> 00:31:51,935 copy paste them all the way over? 653 00:31:51,935 --> 00:31:53,060 DAVID MALAN: Good question. 654 00:31:53,060 --> 00:31:57,110 That hash include cs50.h line at the top of my code. 655 00:31:57,110 --> 00:32:01,310 If I just replace that with the contents of cs50.c, would that work? 656 00:32:01,310 --> 00:32:03,590 Short answer, yes, that would work. 657 00:32:03,590 --> 00:32:05,400 You could copy all of the code there. 658 00:32:05,400 --> 00:32:08,577 However, there's some order of operations that might come into play. 659 00:32:08,577 --> 00:32:10,910 And so it's probably not quite as simple as copy, paste. 660 00:32:10,910 --> 00:32:13,190 But conceptually, yes, that's what's happening. 661 00:32:13,190 --> 00:32:19,370 Now with that said, in cs50.h, are only the prototypes of the functions, 662 00:32:19,370 --> 00:32:23,628 the hints as to how the functions look, what their return type is, 663 00:32:23,628 --> 00:32:25,670 what their name is, and what their arguments are. 664 00:32:25,670 --> 00:32:29,867 It's in the dot c file that actual code tends to be written.
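The split between a dot h file and a dot c file can be sketched in a single file, under the assumption that the three parts below would normally live in separate files. The functions square and sum_of_squares are hypothetical and are not part of CS50's library.

```c
// --- What a header (.h) file would hold: prototypes only ---
// The prototype states the return type, the name, and the arguments,
// without saying how the function works.
int square(int n);

// --- Your own code: allowed to call square because of the prototype ---
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

// --- What the matching .c file would hold: the actual implementation ---
int square(int n)
{
    return n * n;
}
```

If the implementation at the bottom really were in another file, the compiler would still accept sum_of_squares thanks to the prototype, and it would fall to the linking step to connect the call to the implementation's zeros and ones.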
665 00:32:29,867 --> 00:32:32,450 And this is a little confusing now because you and I have only 666 00:32:32,450 --> 00:32:33,920 written code in dot c files. 667 00:32:33,920 --> 00:32:35,690 But in the next few weeks, you'll actually 668 00:32:35,690 --> 00:32:37,940 start writing some of your own dot h files 669 00:32:37,940 --> 00:32:40,460 as well, just like CS50, just like standard I/O. 670 00:32:40,460 --> 00:32:44,150 But in essence, that line of code just makes it easier to use and reuse 671 00:32:44,150 --> 00:32:46,020 code that's already been written. 672 00:32:46,020 --> 00:32:47,750 And that's the whole point of a library. 673 00:32:47,750 --> 00:32:50,327 AUDIENCE: Does linking them [INAUDIBLE]? 674 00:32:50,327 --> 00:32:51,910 DAVID MALAN: Say that a little louder. 675 00:32:51,910 --> 00:32:54,472 AUDIENCE: Does linking happen when you use the compiler? 676 00:32:54,472 --> 00:32:55,180 DAVID MALAN: Yes. 677 00:32:55,180 --> 00:32:56,980 Does linking happen when you compile your code? 678 00:32:56,980 --> 00:32:57,480 Yes. 679 00:32:57,480 --> 00:33:02,320 When you run make, as we have been doing the past week now, 680 00:33:02,320 --> 00:33:04,570 all four of these steps are happening. 681 00:33:04,570 --> 00:33:07,780 Preprocessing converts the hash include lines to something else. 682 00:33:07,780 --> 00:33:10,600 Compiling technically converts it to assembly 683 00:33:10,600 --> 00:33:14,290 code, which the Mac, the PC, the server more closely understands. 684 00:33:14,290 --> 00:33:18,850 Assembling converts that language to binary machine code that this computer 685 00:33:18,850 --> 00:33:20,080 actually understands. 686 00:33:20,080 --> 00:33:22,540 And then linking combines everything together.
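For anyone who wants to watch the four steps one at a time, the comments in the sketch below list clang's standard options for stopping after each stage, applied to the lecture's hello.c. The function make_greeting is a made-up stand-in that gives the stages something small to work on; it is not from the lecture.

```c
// One way to stop clang after each stage, using hello.c as the input:
//   clang -E hello.c                 preprocess only and print the result
//   clang -S hello.c                 stop after compiling, leaving hello.s
//   clang -c hello.c                 stop after assembling, leaving hello.o
//   clang hello.o -lcs50 -o hello    link with CS50's library into hello
#include <stdio.h>  // preprocessing replaces this line with stdio.h's contents

// Builds "hello, <name>" plus a newline into buffer and returns the
// character count that snprintf reports.
int make_greeting(char *buffer, size_t size, const char *name)
{
    return snprintf(buffer, size, "hello, %s\n", name);
}
```

Running make hello performs all four stages in one go, which is why none of these intermediate files normally show up.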
687 00:33:22,540 --> 00:33:27,550 And in fact, if you think back a few minutes ago to when I did this -lcs50, 688 00:33:27,550 --> 00:33:30,070 the reason I had to add that, and the reason 689 00:33:30,070 --> 00:33:32,860 my code did not compile at first, was because I 690 00:33:32,860 --> 00:33:38,650 forgot to tell clang to link in CS50's zeros and ones per that last step. 691 00:33:38,650 --> 00:33:42,147 I don't need to do -lstdio because it comes with C, 692 00:33:42,147 --> 00:33:44,480 so that would just be tedious for everyone in the world. 693 00:33:44,480 --> 00:33:47,140 But CS50 does not come with C, so we link that in. 694 00:33:47,140 --> 00:33:49,780 And to be clear, too, we won't always use CS50's library. 695 00:33:49,780 --> 00:33:53,072 That'll be yet another pair of training wheels we take off in the coming weeks. 696 00:33:53,072 --> 00:33:55,000 But for now, it makes a few things simpler. 697 00:33:55,000 --> 00:33:57,284 Yeah? 698 00:33:57,284 --> 00:33:59,750 AUDIENCE: What is the [INAUDIBLE]? 699 00:33:59,750 --> 00:34:08,878 700 00:34:08,878 --> 00:34:10,170 DAVID MALAN: Short answer, yes. 701 00:34:10,170 --> 00:34:12,870 So what do the zeros and ones, the machine code, translate to? 702 00:34:12,870 --> 00:34:15,690 Yes, there is a one-to-one relationship between the machine 703 00:34:15,690 --> 00:34:17,340 code and the assembly code. 704 00:34:17,340 --> 00:34:21,510 Assembly code, it's not really English, but at least it's symbols I recognize. 705 00:34:21,510 --> 00:34:22,800 It's not zeros and ones. 706 00:34:22,800 --> 00:34:24,810 Machine code, of course, is just zeros and ones. 707 00:34:24,810 --> 00:34:27,960 So back in the day, before C existed, people 708 00:34:27,960 --> 00:34:30,630 were programming only in assembly code. 709 00:34:30,630 --> 00:34:34,469 Before assembly code existed, people were coding in zeros and ones. 
710 00:34:34,469 --> 00:34:36,719 And you can imagine just how painful that was, 711 00:34:36,719 --> 00:34:39,027 and so each of these languages makes life, for us, 712 00:34:39,027 --> 00:34:40,110 sort of easier and easier. 713 00:34:40,110 --> 00:34:42,330 In a few weeks, we'll transition to Python, which 714 00:34:42,330 --> 00:34:45,300 will, in turn, make C even simpler-- 715 00:34:45,300 --> 00:34:48,090 or coding, in general, simpler to do too. 716 00:34:48,090 --> 00:34:53,346 All right, so with that said, what now can we-- 717 00:34:53,346 --> 00:34:55,060 what could go wrong with this? 718 00:34:55,060 --> 00:34:58,140 Well, it turns out that besides compiling, technically speaking, 719 00:34:58,140 --> 00:34:59,233 there's decompiling. 720 00:34:59,233 --> 00:35:01,150 And we've not done this, and we won't do this. 721 00:35:01,150 --> 00:35:04,080 But it's worth considering for just a moment. 722 00:35:04,080 --> 00:35:07,560 If you were to not compile your code, but decompile it-- 723 00:35:07,560 --> 00:35:11,340 as the word suggests, this just means reversing the process, converting it, 724 00:35:11,340 --> 00:35:14,580 ideally, from machine code-- zeros and ones-- 725 00:35:14,580 --> 00:35:19,870 maybe back to C. Now this would be cool, perhaps, if all you have is a program, 726 00:35:19,870 --> 00:35:22,080 you can convert it and see the actual source code. 727 00:35:22,080 --> 00:35:25,320 What might a downside be, if anyone on the internet 728 00:35:25,320 --> 00:35:28,650 is able to decompile code on their machine? 729 00:35:28,650 --> 00:35:29,160 Yeah? 730 00:35:29,160 --> 00:35:30,270 AUDIENCE: [INAUDIBLE] 731 00:35:30,270 --> 00:35:34,130 DAVID MALAN: OK, so it's easier to find bugs in the code that-- 732 00:35:34,130 --> 00:35:35,430 oh, to exploit. 
733 00:35:35,430 --> 00:35:38,417 So it might be easier to hack into the software 734 00:35:38,417 --> 00:35:41,000 by finding mistakes you and I made because, literally, they're 735 00:35:41,000 --> 00:35:43,370 staring at you in code, whereas the zeros and ones make 736 00:35:43,370 --> 00:35:45,080 it way less obvious. 737 00:35:45,080 --> 00:35:48,140 Other downsides of what I called decompiling? 738 00:35:48,140 --> 00:35:49,970 Yeah? 739 00:35:49,970 --> 00:35:53,690 AUDIENCE: If stuff is copyrighted or you don't even know how to get it-- 740 00:35:53,690 --> 00:35:54,440 DAVID MALAN: Yeah. 741 00:35:54,440 --> 00:35:55,948 AUDIENCE: [INAUDIBLE] 742 00:35:55,948 --> 00:35:57,740 DAVID MALAN: Yeah, if your code, your work, 743 00:35:57,740 --> 00:36:00,950 is your intellectual property, copyrighted or otherwise, that's 744 00:36:00,950 --> 00:36:03,660 kind of obnoxious that someone can just run a command, and boom, 745 00:36:03,660 --> 00:36:05,577 they can see the original code that you wrote. 746 00:36:05,577 --> 00:36:08,490 Now, it turns out it's not quite as simple as that. 747 00:36:08,490 --> 00:36:11,720 And so even though, yes, you could take a program like Hello, 748 00:36:11,720 --> 00:36:15,080 or even Microsoft Word, and convert it from zeros and ones 749 00:36:15,080 --> 00:36:19,400 back to some form of source code-- be it in C or Java 750 00:36:19,400 --> 00:36:22,820 or Python or something else, whatever it was originally written in-- odds 751 00:36:22,820 --> 00:36:25,800 are it's going to be an utter mess to look at. 752 00:36:25,800 --> 00:36:26,300 Why? 753 00:36:26,300 --> 00:36:30,390 Because things like variable names are not retained in the zeros and ones, 754 00:36:30,390 --> 00:36:30,890 typically. 755 00:36:30,890 --> 00:36:33,980 Function names might not be retained in the zeros and ones.
756 00:36:33,980 --> 00:36:36,350 The code is, the logic is, but the computer 757 00:36:36,350 --> 00:36:38,510 doesn't care what pretty variables you chose 758 00:36:38,510 --> 00:36:41,060 and how nicely named your functions were, it just 759 00:36:41,060 --> 00:36:42,890 needs to know them as zeros and ones. 760 00:36:42,890 --> 00:36:46,370 Moreover, if you think about last week, we introduced things like loops in C. 761 00:36:46,370 --> 00:36:49,745 And besides for loops, there's what other kind of loop, for instance? 762 00:36:49,745 --> 00:36:50,620 AUDIENCE: [INAUDIBLE] 763 00:36:50,620 --> 00:36:53,412 DAVID MALAN: So, a while loop-- and even though they look different 764 00:36:53,412 --> 00:36:55,920 and you have to write different code, they achieve exactly 765 00:36:55,920 --> 00:36:59,910 the same functionality, which is to say, when you compile a for loop 766 00:36:59,910 --> 00:37:04,140 or you compile a while loop, if they logically do the same thing, 767 00:37:04,140 --> 00:37:07,420 they might end up looking identical as zeros and ones. 768 00:37:07,420 --> 00:37:09,780 And so, therefore, it's not necessarily predictable 769 00:37:09,780 --> 00:37:11,820 that you'll get back the original code, why? 770 00:37:11,820 --> 00:37:15,110 Because the zeros and ones might not know, so to speak, 771 00:37:15,110 --> 00:37:16,860 whether it was a for loop or a while loop, 772 00:37:16,860 --> 00:37:19,350 so maybe decompiling will show you one or the other. 773 00:37:19,350 --> 00:37:21,870 And honestly, decompiling, while possible-- and it's 774 00:37:21,870 --> 00:37:24,570 one way of reverse engineering someone's product.
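On the point about for loops and while loops, here is a minimal sketch with two made-up functions, sum_with_for and sum_with_while. They are written differently but carry out the same steps, so a compiler is free to turn them into the same machine code. Whether it actually does depends on the compiler and its settings.

```c
// Adds the integers 1 through n using a for loop.
int sum_with_for(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
    {
        total += i;
    }
    return total;
}

// The same logic written as a while loop.
int sum_with_while(int n)
{
    int total = 0;
    int i = 1;
    while (i <= n)
    {
        total += i;
        i++;
    }
    return total;
}
```

Both functions return the same answer for every n, so nothing in their behavior records which kind of loop the programmer originally typed.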
775 00:37:24,570 --> 00:37:28,662 Odds are, if you're good enough to start reading code that's been decompiled 776 00:37:28,662 --> 00:37:30,870 and reading through the messiness of it, odds are you 777 00:37:30,870 --> 00:37:34,020 have the talent probably to just write that same program from scratch 778 00:37:34,020 --> 00:37:34,650 yourself. 779 00:37:34,650 --> 00:37:36,870 Now, that's an overstatement, perhaps, but it's not 780 00:37:36,870 --> 00:37:40,410 quite as easy or threatening as you might first think. 781 00:37:40,410 --> 00:37:43,290 So in general, once code is compiled, it's 782 00:37:43,290 --> 00:37:48,290 pretty challenging, time consuming, costly to reverse engineer it, much 783 00:37:48,290 --> 00:37:50,040 like it would be in the real world, right? 784 00:37:50,040 --> 00:37:52,860 Like all of us have some kind of phone, probably, nowadays in our pocket. 785 00:37:52,860 --> 00:37:55,193 There's nothing stopping you from opening it up somehow, 786 00:37:55,193 --> 00:37:57,060 poking around, recreating what's there. 787 00:37:57,060 --> 00:37:59,130 That's a huge amount of effort, most likely. 788 00:37:59,130 --> 00:38:01,880 And at that point, maybe you should just invent the phone, instead 789 00:38:01,880 --> 00:38:03,310 of trying to reverse engineer it. 790 00:38:03,310 --> 00:38:06,330 So same kind of idea in the physical world. 791 00:38:06,330 --> 00:38:13,050 Any questions, then, on compiling, or even decompiling in these forms? 792 00:38:13,050 --> 00:38:17,160 All right, so odds are, at this point, not only I, but you have made mistakes. 793 00:38:17,160 --> 00:38:19,050 And you've written buggy code-- 794 00:38:19,050 --> 00:38:22,350 a bug in a code is just a mistake, a logical error 795 00:38:22,350 --> 00:38:26,490 or otherwise, where the code just does not behave correctly as you intend. 
796 00:38:26,490 --> 00:38:29,880 And up until now, odds are, your debugging techniques 797 00:38:29,880 --> 00:38:32,910 have been to maybe look back at what I did in class, maybe 798 00:38:32,910 --> 00:38:35,320 ask a question online or in-person. 799 00:38:35,320 --> 00:38:38,190 But ultimately, it'd be nice if you had some tools of your own 800 00:38:38,190 --> 00:38:39,570 with which to debug code. 801 00:38:39,570 --> 00:38:41,587 And this, honestly, is a lifelong skill. 802 00:38:41,587 --> 00:38:43,170 You're not going to emerge from CS50-- 803 00:38:43,170 --> 00:38:44,490 and even 20 years from now, you're not going 804 00:38:44,490 --> 00:38:47,910 to be writing-- if you're writing code at all-- correct code all of the time. 805 00:38:47,910 --> 00:38:50,820 Like, all of us on the staff continue to write bugs. 806 00:38:50,820 --> 00:38:54,120 Hopefully, they get a little more sophisticated, and not sort of like, 807 00:38:54,120 --> 00:38:55,540 oops, I missed a semicolon. 808 00:38:55,540 --> 00:38:57,660 But even those kinds of mistakes, we make too. 809 00:38:57,660 --> 00:39:00,150 But there's tools out there and techniques 810 00:39:00,150 --> 00:39:03,550 that can make your life easier when it comes to solving those problems. 811 00:39:03,550 --> 00:39:06,360 Now, the term bug has actually been around for decades. 812 00:39:06,360 --> 00:39:11,790 But a fun story to tell is that the first documented actual bug was 813 00:39:11,790 --> 00:39:13,650 actually somehow connected to Harvard. 814 00:39:13,650 --> 00:39:18,870 In fact, this is the logbook relating to the Harvard Mark II computer 815 00:39:18,870 --> 00:39:22,890 from 1947, whereby if you read the notes here-- and I'll Zoom in-- this 816 00:39:22,890 --> 00:39:27,630 was an actual moth discovered inside of this big mainframe computer that 817 00:39:27,630 --> 00:39:29,160 was causing some kind of problems. 
818 00:39:29,160 --> 00:39:30,450 And the engineers there at the time actually 819 00:39:30,450 --> 00:39:33,610 thought it was funny that, wow, physical bug actually explains the issue. 820 00:39:33,610 --> 00:39:36,450 And it's been forever taped to this sheet of paper, which I believe 821 00:39:36,450 --> 00:39:39,090 now is on display in the Smithsonian. 822 00:39:39,090 --> 00:39:43,260 With that said, this is just representative, too, of a logical bug. 823 00:39:43,260 --> 00:39:45,390 And that story is actually-- 824 00:39:45,390 --> 00:39:49,170 that story was often retold by a famous mathematician, then computer scientist 825 00:39:49,170 --> 00:39:53,640 really, Dr. Grace Hopper, who actually worked not only on the Harvard Mark II 826 00:39:53,640 --> 00:39:57,210 computer, but its predecessor, the Harvard Mark I. 827 00:39:57,210 --> 00:40:01,020 And if you ever spent time, yet, in the engineering building across the river 828 00:40:01,020 --> 00:40:04,103 here, you can actually see much of this computer, which 829 00:40:04,103 --> 00:40:07,020 is along the wall when you first walk into the Science and Engineering 830 00:40:07,020 --> 00:40:07,530 Complex. 831 00:40:07,530 --> 00:40:09,530 And indeed, as you've probably heard growing up, 832 00:40:09,530 --> 00:40:11,070 this is a mainframe computer. 833 00:40:11,070 --> 00:40:15,210 This is what Macs and PCs, so to speak, looked like back in the day, 834 00:40:15,210 --> 00:40:18,240 with very physical things that essentially implemented the zeros 835 00:40:18,240 --> 00:40:21,900 and ones that you and I take for granted now being miniaturized in our laptops 836 00:40:21,900 --> 00:40:22,410 and phones. 837 00:40:22,410 --> 00:40:23,910 So there's a piece of history there. 838 00:40:23,910 --> 00:40:27,390 If you visit that side of campus sometime, do take a look. 
839 00:40:27,390 --> 00:40:30,480 But let's consider, then, how we solve not, of course, physical bugs, 840 00:40:30,480 --> 00:40:31,350 but logical bugs. 841 00:40:31,350 --> 00:40:33,600 And let's consider something like this from last week, 842 00:40:33,600 --> 00:40:38,820 whereby, we were trying very simply to print like this column of three bricks 843 00:40:38,820 --> 00:40:40,320 using hashtags of sorts. 844 00:40:40,320 --> 00:40:44,400 So let me go over here in just a moment to VS Code. 845 00:40:44,400 --> 00:40:47,080 And I'm going to go ahead and open a program I wrote in advance. 846 00:40:47,080 --> 00:40:49,455 And I'm bringing it to class because there's a bug in it, 847 00:40:49,455 --> 00:40:51,510 and I'd like to figure out how to solve this bug. 848 00:40:51,510 --> 00:40:56,160 So let me open up a buggy0.c, which is version 0 of my code. 849 00:40:56,160 --> 00:40:58,200 And let's just take a quick peek at what's here. 850 00:40:58,200 --> 00:40:58,950 It's pretty short. 851 00:40:58,950 --> 00:41:03,750 It includes only stdio.h, it uses printf, it uses a for loop, 852 00:41:03,750 --> 00:41:07,797 and the goal, quite simply, is to print out that column of three bricks. 853 00:41:07,797 --> 00:41:11,130 Now, it's short enough that some of you, if you're getting comfy already with C, 854 00:41:11,130 --> 00:41:13,360 you might already see the logical bug. 855 00:41:13,360 --> 00:41:16,200 It's not a syntax error, like it will compile and run. 856 00:41:16,200 --> 00:41:17,280 But there's a bug there. 857 00:41:17,280 --> 00:41:22,320 And suppose that I'm very new to C, I'm very uncomfortable with C, it's 2:00 AM 858 00:41:22,320 --> 00:41:26,130 and I just can't see the bug, what are my recourses here for actually 859 00:41:26,130 --> 00:41:27,745 finding a mistake like this? 860 00:41:27,745 --> 00:41:29,370 Well, first, let's look at the symptom. 861 00:41:29,370 --> 00:41:31,740 Let me go down to my terminal window. 
862 00:41:31,740 --> 00:41:36,120 I'm going to use make buggy0 because, again, the file is called buggy0.c. 863 00:41:36,120 --> 00:41:37,260 I'm not going to use clang. 864 00:41:37,260 --> 00:41:39,880 In fact, I'm never really going to use clang manually here on out. 865 00:41:39,880 --> 00:41:42,430 I'm just going to use make because it makes our lives easier. 866 00:41:42,430 --> 00:41:43,560 It does compile. 867 00:41:43,560 --> 00:41:45,390 No errors, so it's not syntax. 868 00:41:45,390 --> 00:41:47,670 It's not something silly like a missing semicolon. 869 00:41:47,670 --> 00:41:53,190 But when I run ./buggy0, I, of course, see one, two, three, four-- 870 00:41:53,190 --> 00:41:57,990 and this, of course, does not match the one, two, three bricks that I actually 871 00:41:57,990 --> 00:41:59,610 intended for that column. 872 00:41:59,610 --> 00:42:02,970 And yet, I'm starting counting at 0, as I usually do. 873 00:42:02,970 --> 00:42:03,930 I've got three. 874 00:42:03,930 --> 00:42:05,280 I'm going up to three. 875 00:42:05,280 --> 00:42:06,780 So where is my logical error? 876 00:42:06,780 --> 00:42:10,150 If it hasn't obviously jumped out at you already, well, how can I solve this? 877 00:42:10,150 --> 00:42:13,080 Well, first and foremost, perhaps the best technique 878 00:42:13,080 --> 00:42:16,080 for solving bugs, at least early on, is just to use printf. 879 00:42:16,080 --> 00:42:20,020 Like thus far, we've used printf to say, Hello, and other things on the screen. 880 00:42:20,020 --> 00:42:22,530 But printf is just a function for printing anything. 881 00:42:22,530 --> 00:42:24,570 And there's no reason you can't temporarily 882 00:42:24,570 --> 00:42:27,900 use printf to print out the contents of variables, 883 00:42:27,900 --> 00:42:29,850 what's going on inside of your program, just 884 00:42:29,850 --> 00:42:31,350 to figure out where your mistake is. 885 00:42:31,350 --> 00:42:32,940 And then you can delete that line of code later. 
886 00:42:32,940 --> 00:42:34,600 It doesn't have to stay there forever. 887 00:42:34,600 --> 00:42:35,740 So let me do this. 888 00:42:35,740 --> 00:42:39,450 Instead of just printing out in VS Code the hash symbol, 889 00:42:39,450 --> 00:42:45,690 let me do a little safety check here and print out the value of i. 890 00:42:45,690 --> 00:42:49,170 So let me go ahead and say something like, i is-- 891 00:42:49,170 --> 00:42:51,610 now I want to say i is this. 892 00:42:51,610 --> 00:42:54,540 But, of course, this is not how I print out the value of i. 893 00:42:54,540 --> 00:42:58,930 If I want to print out the value of i, what should I put here? 894 00:42:58,930 --> 00:43:02,160 So %i for integer, instead of %s for string. 895 00:43:02,160 --> 00:43:03,410 So they're still placeholders. 896 00:43:03,410 --> 00:43:04,930 But we use %i for integers. 897 00:43:04,930 --> 00:43:08,450 And now if I want to print out i, I just need the comma as the second argument, 898 00:43:08,450 --> 00:43:09,250 and then i. 899 00:43:09,250 --> 00:43:13,000 All right, let me go ahead and back to my terminal window. 900 00:43:13,000 --> 00:43:15,760 Let me recompile the program because I've changed it. 901 00:43:15,760 --> 00:43:18,880 That still works fine, ./buggy0. 902 00:43:18,880 --> 00:43:22,540 And now, let me increase the size of my terminal window here. 903 00:43:22,540 --> 00:43:25,510 You just see some diagnostic information, if you will. 904 00:43:25,510 --> 00:43:26,560 This is not the goal. 905 00:43:26,560 --> 00:43:29,393 This is not what you should be submitting for this homework problem, 906 00:43:29,393 --> 00:43:30,070 were it one. 907 00:43:30,070 --> 00:43:33,730 But it is helping us diagnostically know that, OK, when i is zero, 908 00:43:33,730 --> 00:43:34,450 here's a hash. 909 00:43:34,450 --> 00:43:36,182 When i is 1, here's a hash. 910 00:43:36,182 --> 00:43:37,390 When i is two, here's a hash. 911 00:43:37,390 --> 00:43:39,017 When i is 3, here's a hash. 
912 00:43:39,017 --> 00:43:39,850 Well, wait a minute. 913 00:43:39,850 --> 00:43:41,530 That's one, two, three, four. 914 00:43:41,530 --> 00:43:44,360 So clearly, I'm printing it one too many times. 915 00:43:44,360 --> 00:43:48,130 So let me look back at the code here by shrinking my terminal window. 916 00:43:48,130 --> 00:43:53,080 And let me just ask the group, where is, in fact, the mistake? 917 00:43:53,080 --> 00:43:56,080 Or what, equivalently, would be the solution? 918 00:43:56,080 --> 00:43:57,561 Yeah, in the middle. 919 00:43:57,561 --> 00:44:00,020 AUDIENCE: [INAUDIBLE] 920 00:44:00,020 --> 00:44:03,550 DAVID MALAN: Yeah, instead of less than or equal to, use just less than. 921 00:44:03,550 --> 00:44:05,300 So you've got to kind of pick a lane here. 922 00:44:05,300 --> 00:44:08,630 If you're going to start counting from 0, you generally use less than, 923 00:44:08,630 --> 00:44:10,880 and go up to, but not through the value. 924 00:44:10,880 --> 00:44:13,970 Or if you prefer, like in the human world, counting from 1 on up, 925 00:44:13,970 --> 00:44:17,300 you can use less than or equal to, but you have to be consistent. 926 00:44:17,300 --> 00:44:19,790 And in general, as a programmer, just always start 927 00:44:19,790 --> 00:44:22,610 counting from 0 if you're doing something canonical like this. 928 00:44:22,610 --> 00:44:25,160 But the solution is, indeed, just to change this 929 00:44:25,160 --> 00:44:27,860 by changing the less than or equal to to just less than. 930 00:44:27,860 --> 00:44:34,340 If I recompile this program with make buggy0, and then do ./buggy0 again-- 931 00:44:34,340 --> 00:44:36,500 and let me increase the size of my terminal window. 932 00:44:36,500 --> 00:44:39,050 Now, you see, OK, almost the same output. 933 00:44:39,050 --> 00:44:44,330 But indeed, i starts at 0 and goes up to, but not through, three. 934 00:44:44,330 --> 00:44:48,920 All right, so printf, in short, can be your first diagnostic tool. 
935 00:44:48,920 --> 00:44:51,500 Instead of just staring at the screen or raising your hand-- 936 00:44:51,500 --> 00:44:55,490 I mean, use printf to see, literally, what's going on inside of your program 937 00:44:55,490 --> 00:44:57,287 by just printing out things of interest. 938 00:44:57,287 --> 00:44:59,120 And then once you've solved the problem, you 939 00:44:59,120 --> 00:45:02,840 can go back into your code, as I'll do here, by shrinking my terminal window. 940 00:45:02,840 --> 00:45:04,610 I'll delete the printf line. 941 00:45:04,610 --> 00:45:07,100 And now I'm ready to share this program with the world 942 00:45:07,100 --> 00:45:08,870 or submit it as homework or the like. 943 00:45:08,870 --> 00:45:11,390 It's just meant there to be temporary. 944 00:45:11,390 --> 00:45:15,440 Any questions on printf as a debugging tool? 945 00:45:15,440 --> 00:45:18,010 946 00:45:18,010 --> 00:45:18,510 No? 947 00:45:18,510 --> 00:45:20,970 All right, well, that only gets us so far. 948 00:45:20,970 --> 00:45:23,430 And honestly, as your programs grow and grow and grow, 949 00:45:23,430 --> 00:45:25,180 it's going to actually get really annoying 950 00:45:25,180 --> 00:45:28,860 to start going in and adding printf's, then removing them, and figuring out, 951 00:45:28,860 --> 00:45:31,860 if you've got multiple printf's, well, which one printed what? 952 00:45:31,860 --> 00:45:34,560 It just gets messy, eventually, to rely on printf alone. 953 00:45:34,560 --> 00:45:37,740 So being a computer scientist, computer scientists 954 00:45:37,740 --> 00:45:41,040 have written software to make it easier to debug code. 955 00:45:41,040 --> 00:45:44,040 That software is what we would generally call a debugger, which 956 00:45:44,040 --> 00:45:47,040 would be the second tool of the trade that you can use to actually solve 957 00:45:47,040 --> 00:45:48,610 problems in your code. 958 00:45:48,610 --> 00:45:52,690 Now, in the world of VS Code, there's actually a debugger built in. 
959 00:45:52,690 --> 00:45:54,840 So the graphical user interface you're about to see 960 00:45:54,840 --> 00:45:58,260 in VS Code isn't specific to CS50, it actually comes with VS Code. 961 00:45:58,260 --> 00:46:01,230 And it supports C, and C++, and Java, and Python, 962 00:46:01,230 --> 00:46:03,030 and lots of other languages too. 963 00:46:03,030 --> 00:46:05,640 But it's, admittedly, a little complicated 964 00:46:05,640 --> 00:46:07,650 to just start using the debugger. 965 00:46:07,650 --> 00:46:10,200 You have to create a configuration file and do 966 00:46:10,200 --> 00:46:13,480 some annoying steps that just get in the way of solving real problems. 967 00:46:13,480 --> 00:46:17,070 So we have automated the process for you of just starting the debugger. 968 00:46:17,070 --> 00:46:19,680 And thereafter, it's sort of industry standard how you use it. 969 00:46:19,680 --> 00:46:23,380 But we save you the headache of having to create those configuration files. 970 00:46:23,380 --> 00:46:25,330 So, suppose I want to do this. 971 00:46:25,330 --> 00:46:27,600 Suppose I want to try to debug this program 972 00:46:27,600 --> 00:46:30,330 step by step using special software. 973 00:46:30,330 --> 00:46:31,810 Well, how can I do that? 974 00:46:31,810 --> 00:46:36,240 Well, let me propose that if I revert this back to the original version 975 00:46:36,240 --> 00:46:40,530 where i was less than or equal to 3, I'm pretty sure that I 976 00:46:40,530 --> 00:46:41,790 was printing too many hashes. 977 00:46:41,790 --> 00:46:43,350 So I'm going to do this-- and you might have done this 978 00:46:43,350 --> 00:46:45,160 accidentally or never at all. 979 00:46:45,160 --> 00:46:49,500 But notice if you hover over the gutter, so to speak, in VS Code, the part of it 980 00:46:49,500 --> 00:46:52,590 all the way to the left of the editor, you see this sort of grayed 981 00:46:52,590 --> 00:46:54,390 out red dot. 
982 00:46:54,390 --> 00:46:57,240 If you click there, it becomes a brighter red dot. 983 00:46:57,240 --> 00:46:59,670 And this represents what we're going to call a breakpoint. 984 00:46:59,670 --> 00:47:03,090 And this is just a visual indicator that you've put like a stop sign equivalent 985 00:47:03,090 --> 00:47:06,270 there, and you're telling the debugger in a moment, stop 986 00:47:06,270 --> 00:47:07,350 running my code there. 987 00:47:07,350 --> 00:47:07,920 Why? 988 00:47:07,920 --> 00:47:11,610 Because I prefer to step through my code at sort of a human speed, 989 00:47:11,610 --> 00:47:14,380 and not as computer speed where it runs all at once. 990 00:47:14,380 --> 00:47:16,750 So I've set my breakpoint, which is step one. 991 00:47:16,750 --> 00:47:18,580 And then step two is quite simply this. 992 00:47:18,580 --> 00:47:23,190 Instead of running the program itself, run the command called debug50, 993 00:47:23,190 --> 00:47:26,010 and then ./buggy0. 994 00:47:26,010 --> 00:47:29,220 And now this will start your program, but inside 995 00:47:29,220 --> 00:47:31,200 of the debugger, which is a special program 996 00:47:31,200 --> 00:47:33,060 that smart people wrote that will empower 997 00:47:33,060 --> 00:47:38,190 you to now step through your code line by line, and again, at your own comfort 998 00:47:38,190 --> 00:47:38,970 pace. 999 00:47:38,970 --> 00:47:43,080 I'm going to hit Enter, some stuff's going to happen on the screen-- whoops. 1000 00:47:43,080 --> 00:47:45,767 Notice, this is a common mistake that I made accidentally here. 1001 00:47:45,767 --> 00:47:47,100 Looks like I've changed my code. 1002 00:47:47,100 --> 00:47:49,892 I did because I went in and changed the less than or equal to sign. 1003 00:47:49,892 --> 00:47:52,860 So let me go ahead and rerun make buggy0-- 1004 00:47:52,860 --> 00:47:53,520 Enter. 1005 00:47:53,520 --> 00:47:55,590 Good, now let me rerun debug50-- 1006 00:47:55,590 --> 00:47:57,810 Enter. 
1007 00:47:57,810 --> 00:47:59,760 And now some stuff just happened on the screen, 1008 00:47:59,760 --> 00:48:03,270 and it takes a moment to get started. But once it's started, you'll 1009 00:48:03,270 --> 00:48:06,010 see this-- you'll still see your code. 1010 00:48:06,010 --> 00:48:09,410 But you'll see this yellow highlight, which you've probably not seen before. 1011 00:48:09,410 --> 00:48:11,910 And notice that it's specifically highlighting the same line 1012 00:48:11,910 --> 00:48:13,440 that I set a breakpoint on. 1013 00:48:13,440 --> 00:48:13,950 Why? 1014 00:48:13,950 --> 00:48:18,870 That just means the debugger has executed all of these lines, 1015 00:48:18,870 --> 00:48:20,670 except for line 7. 1016 00:48:20,670 --> 00:48:23,340 It has broken at-- not in a bad way. 1017 00:48:23,340 --> 00:48:27,580 But it has paused execution on line 7, so it hasn't yet printed any hashes. 1018 00:48:27,580 --> 00:48:30,450 And you can see that-- no hashes in the terminal window yet. 1019 00:48:30,450 --> 00:48:31,980 It's paused execution. 1020 00:48:31,980 --> 00:48:35,190 But what's interesting with the debugger is the stuff 1021 00:48:35,190 --> 00:48:37,410 over here on the left-hand side. 1022 00:48:37,410 --> 00:48:39,960 In the debugger here, you'll see, under variables, 1023 00:48:39,960 --> 00:48:41,910 all of your so-called local variables. 1024 00:48:41,910 --> 00:48:44,160 And we haven't really made a distinction between local 1025 00:48:44,160 --> 00:48:45,327 and something called global. 1026 00:48:45,327 --> 00:48:48,000 But for now, local variables just means all of the variables 1027 00:48:48,000 --> 00:48:49,390 that exist in your function. 1028 00:48:49,390 --> 00:48:52,110 So i currently has a value of 0. 1029 00:48:52,110 --> 00:48:53,410 OK, and that makes sense. 1030 00:48:53,410 --> 00:48:57,360 So now, how do I step through my code and see what it's doing? 
1031 00:48:57,360 --> 00:48:59,610 Well, at the top of the screen here, you'll 1032 00:48:59,610 --> 00:49:02,250 see some playback icons, kind of like a video player, 1033 00:49:02,250 --> 00:49:03,630 but they have special meaning. 1034 00:49:03,630 --> 00:49:07,892 This first one will just play the rest of your program all the way to the end. 1035 00:49:07,892 --> 00:49:10,350 So you only click that if you've sort of solved the problem 1036 00:49:10,350 --> 00:49:13,110 and you just want to run it to completion like before. 1037 00:49:13,110 --> 00:49:14,370 But the next three-- 1038 00:49:14,370 --> 00:49:16,920 or next two, really, are really the juiciest. 1039 00:49:16,920 --> 00:49:19,710 The second one here, if you hover over it, eventually, 1040 00:49:19,710 --> 00:49:21,930 you'll see that it's called Step Over. 1041 00:49:21,930 --> 00:49:25,170 Step Over means that the debugger will run 1042 00:49:25,170 --> 00:49:28,630 this currently highlighted line of code, but it's not going to dive into it. 1043 00:49:28,630 --> 00:49:30,660 So if it's a function like printf, it's not 1044 00:49:30,660 --> 00:49:32,827 going to start stepping through printf line by line. 1045 00:49:32,827 --> 00:49:33,327 Why? 1046 00:49:33,327 --> 00:49:36,420 Because I can pretty much assume printf, written decades ago, is correct. 1047 00:49:36,420 --> 00:49:38,050 Problem's probably with me. 1048 00:49:38,050 --> 00:49:42,690 But this next line, if I did really want to step into the printf code 1049 00:49:42,690 --> 00:49:46,110 to figure out how it works or find some problem in it all these years later, 1050 00:49:46,110 --> 00:49:48,810 you can step into printf, and then the screen would change, 1051 00:49:48,810 --> 00:49:50,910 and you'd see each of the lines for printf, 1052 00:49:50,910 --> 00:49:54,250 line by line-- at least if you have the source code for printf installed. 1053 00:49:54,250 --> 00:49:56,490 All right, I'm going to use the first one, Step Over. 
1054 00:49:56,490 --> 00:49:59,130 And watch as the yellow highlight moves. 1055 00:49:59,130 --> 00:50:03,060 And watch as, in the terminal window, there's a hash symbol. 1056 00:50:03,060 --> 00:50:03,780 Here we go. 1057 00:50:03,780 --> 00:50:05,130 There's one hash. 1058 00:50:05,130 --> 00:50:07,230 Now, notice line 5 is highlighted. 1059 00:50:07,230 --> 00:50:09,480 That means it has paused on line 5. 1060 00:50:09,480 --> 00:50:11,350 Line 5 has not yet been executed. 1061 00:50:11,350 --> 00:50:12,600 So what does that mean? 1062 00:50:12,600 --> 00:50:16,320 The value of i, per the top left-hand corner, is still 0. 1063 00:50:16,320 --> 00:50:18,920 But as soon as I click Step Over again, watch 1064 00:50:18,920 --> 00:50:24,470 what happens at the top left, where i is a variable on the screen. 1065 00:50:24,470 --> 00:50:26,420 Now i-- and it flashed briefly-- 1066 00:50:26,420 --> 00:50:27,920 has a value of 1. 1067 00:50:27,920 --> 00:50:30,650 And now if I step over again, watch the terminal window. 1068 00:50:30,650 --> 00:50:32,120 There's my second hash. 1069 00:50:32,120 --> 00:50:36,380 Now, let me click Step Over on for loop, watch the variable at top left. 1070 00:50:36,380 --> 00:50:38,567 Now 1 goes to 2. 1071 00:50:38,567 --> 00:50:39,650 Now let me click it again. 1072 00:50:39,650 --> 00:50:43,220 Third hash-- and here's where the logical error is perhaps revealed. 1073 00:50:43,220 --> 00:50:45,210 Let me go ahead and step over the loop. 1074 00:50:45,210 --> 00:50:46,520 Now i is 3. 1075 00:50:46,520 --> 00:50:49,280 Wait a minute, I'm still going to print out a hash. 1076 00:50:49,280 --> 00:50:49,810 There it is. 1077 00:50:49,810 --> 00:50:50,810 There's the fourth hash. 1078 00:50:50,810 --> 00:50:53,852 And at this point, hopefully, the light bulb, proverbially, has gone off. 1079 00:50:53,852 --> 00:50:55,020 I realize, oh, I screwed up. 
1080 00:50:55,020 --> 00:50:58,580 I can either stop the program altogether with the red square, 1081 00:50:58,580 --> 00:51:01,100 or I can just let it run all the way to the end, which 1082 00:51:01,100 --> 00:51:02,493 just terminates everything. 1083 00:51:02,493 --> 00:51:05,660 At this point, I just want to get back into my code and start fixing things. 1084 00:51:05,660 --> 00:51:07,700 And you can close, for instance, as I will here, 1085 00:51:07,700 --> 00:51:10,670 the File Explorer, just to hide the panel that opened. 1086 00:51:10,670 --> 00:51:12,320 So that's debug50. 1087 00:51:12,320 --> 00:51:15,920 But it's not a CS50 thing, that just starts the debugger for you, which 1088 00:51:15,920 --> 00:51:19,520 is something you'd find in most any programming environment nowadays. 1089 00:51:19,520 --> 00:51:23,670 Questions on debugging? 1090 00:51:23,670 --> 00:51:24,170 Questions? 1091 00:51:24,170 --> 00:51:24,670 Yeah? 1092 00:51:24,670 --> 00:51:27,295 AUDIENCE: Where does it tell you where it went wrong? 1093 00:51:27,295 --> 00:51:28,420 DAVID MALAN: Good question. 1094 00:51:28,420 --> 00:51:30,310 Where does it tell you where it went wrong? 1095 00:51:30,310 --> 00:51:33,190 So, sadly, it does not tell you any of that. 1096 00:51:33,190 --> 00:51:37,570 The onus is still on you, the human, to use this tool productively to walk 1097 00:51:37,570 --> 00:51:39,580 through your code at a saner pace. 1098 00:51:39,580 --> 00:51:42,070 But your brain is the one that still needs to solve it. 1099 00:51:42,070 --> 00:51:45,190 And I don't doubt, down the line, with artificial intelligence and more, 1100 00:51:45,190 --> 00:51:47,350 programs like this will get all the more helpful, 1101 00:51:47,350 --> 00:51:49,160 and start answering questions like that for us. 1102 00:51:49,160 --> 00:51:51,340 And there are other tools we'll introduce you this semester 1103 00:51:51,340 --> 00:51:52,990 that are even more powerful than this. 
1104 00:51:52,990 --> 00:51:56,770 But for now, it's just a tool, really, to slow things down and not 1105 00:51:56,770 --> 00:51:57,820 have to change your code. 1106 00:51:57,820 --> 00:52:01,420 The fact that I had that panel on the left that just showed me i's changing 1107 00:52:01,420 --> 00:52:04,150 value is just an alternative to printf, and I can 1108 00:52:04,150 --> 00:52:06,820 step through it a little more slowly. 1109 00:52:06,820 --> 00:52:10,580 Other questions on debugging? 1110 00:52:10,580 --> 00:52:11,080 No? 1111 00:52:11,080 --> 00:52:14,950 Let me show you one final example with this debugger here. 1112 00:52:14,950 --> 00:52:16,750 And this one, too, I wrote in advance. 1113 00:52:16,750 --> 00:52:18,730 Let me close buggy0.c. 1114 00:52:18,730 --> 00:52:22,327 And let me open up buggy1.c, my second version thereof. 1115 00:52:22,327 --> 00:52:24,160 Let me close my terminal window for a second 1116 00:52:24,160 --> 00:52:26,350 and give you a quick tour of this program, which 1117 00:52:26,350 --> 00:52:28,030 similarly, has a mistake. 1118 00:52:28,030 --> 00:52:32,830 Now, at the top of this program, some familiar includes, cs50.h and stdio.h. 1119 00:52:32,830 --> 00:52:34,730 This is not something we've seen before. 1120 00:52:34,730 --> 00:52:36,190 It's specific to this example-- 1121 00:52:36,190 --> 00:52:38,830 a function called getNegativeInt. 1122 00:52:38,830 --> 00:52:41,043 Takes no arguments, and it returns an integer. 1123 00:52:41,043 --> 00:52:41,710 What does it do? 1124 00:52:41,710 --> 00:52:45,040 It literally gets a negative integer, ideally, from the user. 1125 00:52:45,040 --> 00:52:47,200 Fun fact, though, it doesn't correctly. 1126 00:52:47,200 --> 00:52:50,090 That's the bug. getNegativeInt is broken at the moment. 1127 00:52:50,090 --> 00:52:51,470 So what does main do? 
1128 00:52:51,470 --> 00:52:54,130 Well, main just calls this function, passing in nothing 1129 00:52:54,130 --> 00:52:55,690 in parentheses, no inputs. 1130 00:52:55,690 --> 00:52:58,240 And it stores the return value in i. 1131 00:52:58,240 --> 00:53:00,260 And then it just prints out i on the screen. 1132 00:53:00,260 --> 00:53:03,910 So honestly, just by eyeballing this, I feel comfortable enough 1133 00:53:03,910 --> 00:53:06,365 with programming in C, I think main is correct. 1134 00:53:06,365 --> 00:53:07,990 Let me just stipulate, main is correct. 1135 00:53:07,990 --> 00:53:09,698 But there is going to be a bug down here. 1136 00:53:09,698 --> 00:53:11,210 Now, what's the bug down here? 1137 00:53:11,210 --> 00:53:14,830 Well, let me look at getNegativeInt's implementation. 1138 00:53:14,830 --> 00:53:18,970 Notice, this first line, 12, is identical to the prototype up here. 1139 00:53:18,970 --> 00:53:22,690 The prototype is sort of stupidly required up here 1140 00:53:22,690 --> 00:53:25,300 because C reads things top to bottom, left to right-- 1141 00:53:25,300 --> 00:53:26,690 the compiler technically does. 1142 00:53:26,690 --> 00:53:29,680 So if you reference getNegativeInt here, but you 1143 00:53:29,680 --> 00:53:33,490 don't implement it until down here, and you haven't told C in advance 1144 00:53:33,490 --> 00:53:36,820 that it will exist, again, you get the error we saw last week. 1145 00:53:36,820 --> 00:53:39,010 All right, so how does getNegativeInt work? 1146 00:53:39,010 --> 00:53:40,960 We declare a variable called n. 1147 00:53:40,960 --> 00:53:43,540 We've got a do while loop that does what? 1148 00:53:43,540 --> 00:53:47,110 It uses getInt, which comes with the cs50 library, per last week. 1149 00:53:47,110 --> 00:53:49,480 It prompts the user for a negative integer, quote unquote, 1150 00:53:49,480 --> 00:53:51,670 and stores the value in n. 1151 00:53:51,670 --> 00:53:56,800 I then do all of this while n is less than 0, right? 
1152 00:53:56,800 --> 00:54:00,400 Remember, we used a do while loop last week to make sure the human cooperates 1153 00:54:00,400 --> 00:54:03,970 and doesn't give us the wrong type of value, be it positive or negative 1154 00:54:03,970 --> 00:54:04,970 or something else. 1155 00:54:04,970 --> 00:54:06,400 And then we return n. 1156 00:54:06,400 --> 00:54:07,570 And there's some subtleties. 1157 00:54:07,570 --> 00:54:12,970 Anyone recall-- or have an intuition for why I've declared n on line 14, 1158 00:54:12,970 --> 00:54:15,790 instead of line 17? 1159 00:54:15,790 --> 00:54:17,620 This is a C specific thing. 1160 00:54:17,620 --> 00:54:23,465 AUDIENCE: [INAUDIBLE] 1161 00:54:23,465 --> 00:54:24,340 DAVID MALAN: Exactly. 1162 00:54:24,340 --> 00:54:27,610 There's this notion of scope in C. And we'll continue to see this over time, 1163 00:54:27,610 --> 00:54:32,590 whereby, a variable only exists inside of the most recent curly braces 1164 00:54:32,590 --> 00:54:33,560 that you've opened. 1165 00:54:33,560 --> 00:54:36,910 So if I've declared n here on line 14, I can use it 1166 00:54:36,910 --> 00:54:40,900 anywhere between lines 13 and 21 because those are the nearest curly braces. 1167 00:54:40,900 --> 00:54:43,540 If by contrast, as you note, if I instead said this, 1168 00:54:43,540 --> 00:54:49,180 int n equals getInt and so forth, and didn't have the current line 14, 1169 00:54:49,180 --> 00:54:53,470 well, n would exist inside of these curly braces, but not here, which 1170 00:54:53,470 --> 00:54:55,340 is too late, and definitely not here. 1171 00:54:55,340 --> 00:54:59,480 So you just have to declare it first, and then use and reuse it as such. 1172 00:54:59,480 --> 00:55:01,545 Now, let me just show you how I can debug this. 1173 00:55:01,545 --> 00:55:03,170 But let me show you the symptoms first. 1174 00:55:03,170 --> 00:55:04,930 Let me open my terminal window. 1175 00:55:04,930 --> 00:55:06,970 Let me run make buggy1. 
1176 00:55:06,970 --> 00:55:11,710 Compiles OK, so it's not something silly like a semicolon. ./buggy1, 1177 00:55:11,710 --> 00:55:13,660 and I'm asked for a negative integer. 1178 00:55:13,660 --> 00:55:15,280 All right, let me give it negative 1-- 1179 00:55:15,280 --> 00:55:16,710 Enter. 1180 00:55:16,710 --> 00:55:19,920 Well, the main function is supposed to print out what I typed, 1181 00:55:19,920 --> 00:55:20,880 but it clearly didn't. 1182 00:55:20,880 --> 00:55:21,880 It's prompting me again. 1183 00:55:21,880 --> 00:55:23,830 All right, so maybe it'll like negative 2. 1184 00:55:23,830 --> 00:55:24,330 No? 1185 00:55:24,330 --> 00:55:26,380 Maybe negative 3. 1186 00:55:26,380 --> 00:55:27,570 50? 1187 00:55:27,570 --> 00:55:29,160 OK, so it's definitely broken, right? 1188 00:55:29,160 --> 00:55:31,528 It kind of seems logically to be doing the opposite. 1189 00:55:31,528 --> 00:55:33,820 Now, you can perhaps see why this is happening already. 1190 00:55:33,820 --> 00:55:37,170 These are deliberately simple programs for demonstration's sake. 1191 00:55:37,170 --> 00:55:38,470 But let's do this. 1192 00:55:38,470 --> 00:55:41,037 Let me go ahead and set a breakpoint in main, 1193 00:55:41,037 --> 00:55:42,870 even though I'm pretty sure main is correct. 1194 00:55:42,870 --> 00:55:45,810 But it just helps me start my thought process-- start with main, 1195 00:55:45,810 --> 00:55:47,010 and then take it from there. 1196 00:55:47,010 --> 00:55:51,840 Let me run now, debug50 ./buggy1-- 1197 00:55:51,840 --> 00:55:52,920 Enter. 1198 00:55:52,920 --> 00:55:53,700 And let's see. 1199 00:55:53,700 --> 00:55:56,880 With that breakpoint now, the GUI is going to reconfigure itself. 1200 00:55:56,880 --> 00:56:00,360 It's going to pause on line 8 because that's the first interesting line 1201 00:56:00,360 --> 00:56:01,260 inside of main. 1202 00:56:01,260 --> 00:56:03,780 So I could have just put the breakpoint on line 8 too. 
1203 00:56:03,780 --> 00:56:06,480 It's smart enough to know that if I set it on 6, 1204 00:56:06,480 --> 00:56:09,570 I really mean line 8 because that's the first actual line of code. 1205 00:56:09,570 --> 00:56:11,280 And watch, now, what happens. 1206 00:56:11,280 --> 00:56:15,780 If I step over this line, notice that i, which at the moment 1207 00:56:15,780 --> 00:56:18,090 seems to have a default value of 0-- 1208 00:56:18,090 --> 00:56:19,470 more on that another time. 1209 00:56:19,470 --> 00:56:24,750 But if I click Step Over like before, I'm prompted for a negative integer. 1210 00:56:24,750 --> 00:56:25,750 Let me type negative 1-- 1211 00:56:25,750 --> 00:56:27,300 Enter. 1212 00:56:27,300 --> 00:56:32,470 And now, notice, there's no additional yellow highlight. 1213 00:56:32,470 --> 00:56:32,970 Why? 1214 00:56:32,970 --> 00:56:35,160 Where am I currently stuck, logically? 1215 00:56:35,160 --> 00:56:37,937 AUDIENCE: [INAUDIBLE] 1216 00:56:37,937 --> 00:56:40,770 DAVID MALAN: Yeah, just logically, I must be in that do while loop. 1217 00:56:40,770 --> 00:56:43,560 And even if you don't understand it, like that's the only explanation. 1218 00:56:43,560 --> 00:56:46,143 If you keep getting prompted, surely, there's a loop going on. 1219 00:56:46,143 --> 00:56:49,270 There's only one loop in my code, so there's probably a problem there. 1220 00:56:49,270 --> 00:56:52,900 So I can't just set a breakpoint in main, and then wait for this to work. 1221 00:56:52,900 --> 00:56:53,610 So let me just-- 1222 00:56:53,610 --> 00:56:56,280 let me stop this with the red square. 1223 00:56:56,280 --> 00:56:58,860 And let me think, all right, instead of-- 1224 00:56:58,860 --> 00:57:02,770 I can still set my breakpoint in main, but let me rerun the debugger instead. 1225 00:57:02,770 --> 00:57:05,470 And this time, not step over that line of code, 1226 00:57:05,470 --> 00:57:07,930 let me step into that line of code.
1227 00:57:07,930 --> 00:57:09,270 So watch what happens now. 1228 00:57:09,270 --> 00:57:11,430 Instead of clicking the second icon here, 1229 00:57:11,430 --> 00:57:14,610 let me click the third, whose name is, indeed, Step Into. 1230 00:57:14,610 --> 00:57:17,880 And watch as the yellow highlight does not move to line 9. 1231 00:57:17,880 --> 00:57:21,930 It dives into line 8-- the function on line 8, 1232 00:57:21,930 --> 00:57:25,170 thereby, bringing me down to line 17. 1233 00:57:25,170 --> 00:57:28,270 It's kind of going down into that next function. 1234 00:57:28,270 --> 00:57:31,422 Now, it didn't bother pausing on line 12 or 13 or 14 1235 00:57:31,422 --> 00:57:34,380 because there's nothing intellectually interesting there happening yet. 1236 00:57:34,380 --> 00:57:37,080 The juicy part really starts, it would seem, in line 17. 1237 00:57:37,080 --> 00:57:40,980 So, now notice, n is my variable at the top left. 1238 00:57:40,980 --> 00:57:42,270 If I click-- 1239 00:57:42,270 --> 00:57:45,420 I don't want to click Step Into now, though. 1240 00:57:45,420 --> 00:57:48,090 What would go wrong if I click on Step Into-- 1241 00:57:48,090 --> 00:57:52,480 or what would it do that I don't think I want to do? 1242 00:57:52,480 --> 00:57:52,990 Yeah? 1243 00:57:52,990 --> 00:57:54,755 AUDIENCE: [INAUDIBLE] 1244 00:57:54,755 --> 00:57:56,630 DAVID MALAN: Yeah, it would step into get_int. 1245 00:57:56,630 --> 00:57:59,620 But I'd like to think that the staff's version of get_int is correct, 1246 00:57:59,620 --> 00:58:02,120 and that's not our problem today, so I want to step over it. 1247 00:58:02,120 --> 00:58:06,710 And watch now at top left that nothing happens yet to the value of n 1248 00:58:06,710 --> 00:58:09,530 until I go to the terminal window now, and I type in something 1249 00:58:09,530 --> 00:58:10,670 like negative 1. 1250 00:58:10,670 --> 00:58:14,600 Now notice, it jumps to line 19, which is the next interesting line.
1251 00:58:14,600 --> 00:58:17,240 Top left, n, indeed, is negative 1. 1252 00:58:17,240 --> 00:58:19,160 And here's where I can now pause as a human 1253 00:58:19,160 --> 00:58:22,760 and think, all right, so while n is less than 0. 1254 00:58:22,760 --> 00:58:25,280 All right, n, per the top left corner, is negative 1. 1255 00:58:25,280 --> 00:58:27,830 So all right, while negative 1 is less than 0, 1256 00:58:27,830 --> 00:58:29,780 well, obviously that's true mathematically. 1257 00:58:29,780 --> 00:58:30,930 So what's going to happen? 1258 00:58:30,930 --> 00:58:32,130 It's a do while loop. 1259 00:58:32,130 --> 00:58:37,285 So when I click on Step Over again, it's going to go to this line 1260 00:58:37,285 --> 00:58:39,410 because it's at the end of the inside of that loop. 1261 00:58:39,410 --> 00:58:42,710 And now here, it's looping through again and again. 1262 00:58:42,710 --> 00:58:44,240 All right, let me do this once more. 1263 00:58:44,240 --> 00:58:45,980 I'm going to step over, all right? 1264 00:58:45,980 --> 00:58:48,777 I'm going to type in negative 2, and it's the exact same thing. 1265 00:58:48,777 --> 00:58:50,360 Now is my chance, on the yellow line-- 1266 00:58:50,360 --> 00:58:51,260 OK, wait a minute. 1267 00:58:51,260 --> 00:58:53,450 Negative 2 is obviously less than 0. 1268 00:58:53,450 --> 00:58:56,080 Let me try this one more time. 1269 00:58:56,080 --> 00:58:57,570 Click it once here. 1270 00:58:57,570 --> 00:58:59,040 All right, let me give it 50. 1271 00:58:59,040 --> 00:59:05,020 And now, OK, while 50 is less than 0, that's not true, 1272 00:59:05,020 --> 00:59:08,970 so the loop is over because it's not going to do it while 50 is less than 0. 1273 00:59:08,970 --> 00:59:09,730 That's not true. 1274 00:59:09,730 --> 00:59:12,240 So now watch, when I click Step Over once more, 1275 00:59:12,240 --> 00:59:15,810 it then finishes the loop, even though there's nothing more to do. 
1276 00:59:15,810 --> 00:59:17,610 It's now about to return n. 1277 00:59:17,610 --> 00:59:21,360 It jumps back up to main, where I left off on line 9. 1278 00:59:21,360 --> 00:59:23,778 It now prints, in my terminal window, the number 50. 1279 00:59:23,778 --> 00:59:26,070 And hopefully, at this point, to your question earlier, 1280 00:59:26,070 --> 00:59:30,700 my human brain has realized, oh, I'm an idiot, like I flipped my sign there. 1281 00:59:30,700 --> 00:59:32,460 So I probably-- let me stop this. 1282 00:59:32,460 --> 00:59:34,780 I probably want to do something like this. 1283 00:59:34,780 --> 00:59:38,860 If the goal is to get a negative integer, I probably want to say, 1284 00:59:38,860 --> 00:59:45,070 while n is, for instance, greater than or equal to 0 would work. 1285 00:59:45,070 --> 00:59:48,630 So while n is greater than or equal to 0, keep doing this. 1286 00:59:48,630 --> 00:59:50,430 And that's the logic I wanted to express. 1287 00:59:50,430 --> 00:59:53,733 So the debugger just saves me from staring at the screen, raising a hand, 1288 00:59:53,733 --> 00:59:54,900 sort of asking someone else. 1289 00:59:54,900 --> 00:59:58,650 At least in this case, it allows me to go through it at a healthier pace. 1290 00:59:58,650 --> 01:00:03,000 Questions now on debug50, which should be your new friend, even if it's not 1291 01:00:03,000 --> 01:00:04,940 your first instinct after printf? 1292 01:00:04,940 --> 01:00:07,690 1293 01:00:07,690 --> 01:00:09,190 Any questions on debug50? 1294 01:00:09,190 --> 01:00:09,730 No? 1295 01:00:09,730 --> 01:00:13,960 All right, well, there's one last technique we can equip you with here. 1296 01:00:13,960 --> 01:00:17,470 And that is, in addition to printf and a debugger, no joke, 1297 01:00:17,470 --> 01:00:21,400 a rubber duck is actually a reasonably recommended solution 1298 01:00:21,400 --> 01:00:22,720 to finding bugs in your code. 
1299 01:00:22,720 --> 01:00:24,640 To your question earlier, the duck, too, is not 1300 01:00:24,640 --> 01:00:26,390 going to solve the problem for you. 1301 01:00:26,390 --> 01:00:29,710 But if you've wondered why this little guy has been here for so long, 1302 01:00:29,710 --> 01:00:32,080 there's this technique, which has its own Wikipedia article, 1303 01:00:32,080 --> 01:00:33,760 called rubber duck debugging. 1304 01:00:33,760 --> 01:00:37,390 The idea of which is that if you're home in your dorm room, 1305 01:00:37,390 --> 01:00:39,520 wrestling with some bug in your code, printf 1306 01:00:39,520 --> 01:00:42,820 didn't quite reveal the source to you, debugger isn't really helping, 1307 01:00:42,820 --> 01:00:46,960 honestly, maybe it would help to just sound out what problem you're having. 1308 01:00:46,960 --> 01:00:50,260 Similar to going to office hours, talking to a TA or a professor, 1309 01:00:50,260 --> 01:00:52,030 just walking through your problems because 1310 01:00:52,030 --> 01:00:54,730 in sort of talking to the duck about the fact 1311 01:00:54,730 --> 01:01:00,550 that you're doing this while n is less than 0, and then if it is-- 1312 01:01:00,550 --> 01:01:01,180 wait a minute. 1313 01:01:01,180 --> 01:01:03,820 I'm an idiot, not just for talking to the rubber duck. 1314 01:01:03,820 --> 01:01:05,980 You realize, hopefully, in expressing yourself, 1315 01:01:05,980 --> 01:01:09,910 literally verbally, you probably will hear with non-zero probability, 1316 01:01:09,910 --> 01:01:11,860 like some illogic in your statement. 1317 01:01:11,860 --> 01:01:16,430 And just by sounding things out, you'll realize like, oh, that's my problem. 1318 01:01:16,430 --> 01:01:19,720 And so, frankly, if you have roommates, you can also use a roommate for this.
1319 01:01:19,720 --> 01:01:21,700 But the rubber duck is just sort of a go-to 1320 01:01:21,700 --> 01:01:24,700 when your roommates have no interest in your C problem set, 1321 01:01:24,700 --> 01:01:28,150 talking something through as such. 1322 01:01:28,150 --> 01:01:29,933 And this is an invaluable technique. 1323 01:01:29,933 --> 01:01:32,350 I admittedly tend not to do it so much with a rubber duck, 1324 01:01:32,350 --> 01:01:34,510 but ideally with colleagues, human colleagues. 1325 01:01:34,510 --> 01:01:38,260 But just talking through things often will help you just realize, 1326 01:01:38,260 --> 01:01:40,360 oh, I said something illogical. 1327 01:01:40,360 --> 01:01:41,860 Now I can go back to the code. 1328 01:01:41,860 --> 01:01:44,650 So don't solve problems by staring at your screen 1329 01:01:44,650 --> 01:01:46,240 endlessly for minutes, for hours. 1330 01:01:46,240 --> 01:01:48,100 At that point, it's time for a break, time 1331 01:01:48,100 --> 01:01:50,475 to walk away, time to talk to the duck, if you've already 1332 01:01:50,475 --> 01:01:52,900 exhausted some of those other tools. 1333 01:01:52,900 --> 01:01:55,330 As an aside, on your way out today at the end of class, 1334 01:01:55,330 --> 01:01:59,020 we have, clearly, plenty of rubber ducks for you. 1335 01:01:59,020 --> 01:02:01,600 And it's become a thing over the years, at least 1336 01:02:01,600 --> 01:02:05,770 among some, to bring the duck with them when they travel and send us photos. 1337 01:02:05,770 --> 01:02:10,480 Here, for instance, is CS50's rubber duck debugger, A.K.A. DDB, 1338 01:02:10,480 --> 01:02:15,940 for Duck Debugger, which is a pun on a geekier program called GDB, the GNU 1339 01:02:15,940 --> 01:02:18,740 Debugger, which is an actual piece of software for debugging. 1340 01:02:18,740 --> 01:02:25,270 This is CS50's debugger in the hills of Puerto Rico, also, here on the sea. 1341 01:02:25,270 --> 01:02:28,310 It made its way to San Francisco here.
1342 01:02:28,310 --> 01:02:30,640 Also, down by Fisherman's Wharf by the sea lions. 1343 01:02:30,640 --> 01:02:31,660 Familiar? 1344 01:02:31,660 --> 01:02:34,570 Here at Stanford, where there's a William Gates Computer Science 1345 01:02:34,570 --> 01:02:38,950 building for computer science, down the road in SF at Google. 1346 01:02:38,950 --> 01:02:41,650 And this is the Trevi Fountain in Rome. 1347 01:02:41,650 --> 01:02:43,810 And lastly, the Colosseum. 1348 01:02:43,810 --> 01:02:46,990 So we'll be curious to see in the coming years where your duck, too, travels. 1349 01:02:46,990 --> 01:02:49,120 So that, then, was quite a bit. 1350 01:02:49,120 --> 01:02:51,850 Why don't we go ahead here and take a short 5-minute break? 1351 01:02:51,850 --> 01:02:52,760 No snacks yet. 1352 01:02:52,760 --> 01:02:54,400 You're welcome to get up or sit down. 1353 01:02:54,400 --> 01:02:56,620 We'll return in about five. 1354 01:02:56,620 --> 01:03:00,020 All right, so we are back. 1355 01:03:00,020 --> 01:03:04,000 And if the goal, ultimately, today is to have a better understanding of things 1356 01:03:04,000 --> 01:03:06,940 like strings so that we can solve problems with text, 1357 01:03:06,940 --> 01:03:09,190 let's consider some simpler types of data 1358 01:03:09,190 --> 01:03:11,290 first, how we might represent those, and then 1359 01:03:11,290 --> 01:03:14,290 see if that doesn't lead us to a discovery as to how strings, 1360 01:03:14,290 --> 01:03:17,330 and just today's modern software is using things like that. 1361 01:03:17,330 --> 01:03:21,850 So when we talked on week zero about representation of data, 1362 01:03:21,850 --> 01:03:25,930 we had different ways of doing it, in terms of binary and decimal, 1363 01:03:25,930 --> 01:03:27,640 and unary even. 1364 01:03:27,640 --> 01:03:30,520 When we started talking about the same last week in code, 1365 01:03:30,520 --> 01:03:33,980 we started talking about data types instead.
1366 01:03:33,980 --> 01:03:36,820 And these data types were a way of telling 1367 01:03:36,820 --> 01:03:40,000 the computer, like do you want an integer, do you want a character, 1368 01:03:40,000 --> 01:03:44,260 do you want a floating point value, like a real number, or even a string, 1369 01:03:44,260 --> 01:03:45,070 as we've seen? 1370 01:03:45,070 --> 01:03:47,350 But it turns out that computers, of course, 1371 01:03:47,350 --> 01:03:49,930 only have finite amounts of resources. 1372 01:03:49,930 --> 01:03:53,740 Your computer only has a fixed amount of memory or RAM. 1373 01:03:53,740 --> 01:03:55,910 And that actually has very real world implications. 1374 01:03:55,910 --> 01:03:59,630 So for instance, here are some of the data types we've seen thus far. 1375 01:03:59,630 --> 01:04:04,090 And it turns out that each of these in C has a specific number 1376 01:04:04,090 --> 01:04:05,650 of bits allocated to it. 1377 01:04:05,650 --> 01:04:08,350 Now, admittedly, this can vary by system. 1378 01:04:08,350 --> 01:04:10,850 It's not so much the case nowadays, but for many years, 1379 01:04:10,850 --> 01:04:13,100 for decades, computers were getting better and better. 1380 01:04:13,100 --> 01:04:15,392 The earliest computers might have used fewer bits 1381 01:04:15,392 --> 01:04:16,600 for some of these data types. 1382 01:04:16,600 --> 01:04:18,663 More modern computers might use more bits. 1383 01:04:18,663 --> 01:04:21,830 So the numbers you're about to see are pretty much where we are present day. 1384 01:04:21,830 --> 01:04:25,030 So when it comes to these data types, a bool, 1385 01:04:25,030 --> 01:04:29,020 which is true or false, somewhat curiously, uses a whole byte, 1386 01:04:29,020 --> 01:04:32,380 even though that's way overkill because for a bool, true or false, 1387 01:04:32,380 --> 01:04:33,940 you, of course, only need one bit. 
1388 01:04:33,940 --> 01:04:36,520 But it turns out, even though it's wasteful to use 1389 01:04:36,520 --> 01:04:39,938 eight bits, or one byte, just to represent true or false, 1390 01:04:39,938 --> 01:04:41,230 it's just easier for computers. 1391 01:04:41,230 --> 01:04:42,820 So a bool tends to be one byte. 1392 01:04:42,820 --> 01:04:47,590 An int, which we've been using a lot, uses 4 bytes, typically, or 32 bits. 1393 01:04:47,590 --> 01:04:50,590 And if I do some quick math from week zero, with 32 bits, 1394 01:04:50,590 --> 01:04:54,040 you have 4 billion possible values, roughly. 1395 01:04:54,040 --> 01:04:56,290 But if you want to represent positive and negative, 1396 01:04:56,290 --> 01:04:59,710 that means you can represent roughly negative 2 billion, all the way up 1397 01:04:59,710 --> 01:05:01,020 to positive 2 billion. 1398 01:05:01,020 --> 01:05:02,770 So that's the range, typically, with ints. 1399 01:05:02,770 --> 01:05:06,820 If that's too few numbers for you, turns out there's things called longs. 1400 01:05:06,820 --> 01:05:10,120 And longs use 64 bits, which allow you to have 1401 01:05:10,120 --> 01:05:13,220 like a quintillion number of possibilities, 1402 01:05:13,220 --> 01:05:15,730 which is a lot, certainly, a lot more than 4 billion. 1403 01:05:15,730 --> 01:05:17,410 So sometimes you might use a long. 1404 01:05:17,410 --> 01:05:18,670 But even that's finite. 1405 01:05:18,670 --> 01:05:21,640 And so as we discussed at the end of last week, 1406 01:05:21,640 --> 01:05:23,980 bad things can happen if you make certain assumptions 1407 01:05:23,980 --> 01:05:27,220 as to the data because of things like integer overflow or the like, 1408 01:05:27,220 --> 01:05:28,330 where things wrap around. 1409 01:05:28,330 --> 01:05:31,538 Then there's a float, which is a real number, something with a decimal point. 
1410 01:05:31,538 --> 01:05:36,040 By convention, it's 4 bytes or 32 bits, which gives you, in short, 1411 01:05:36,040 --> 01:05:37,810 only a specific amount of precision. 1412 01:05:37,810 --> 01:05:41,620 It doesn't necessarily dictate how many numbers to the left or to the right. 1413 01:05:41,620 --> 01:05:45,250 In the aggregate, ultimately, you have though, 1414 01:05:45,250 --> 01:05:47,650 4 billion possible permutations still. 1415 01:05:47,650 --> 01:05:50,110 If you need more precision for scientific, for medical, 1416 01:05:50,110 --> 01:05:54,790 for financial applications, you might use 8 bytes, A.K.A. a double, 1417 01:05:54,790 --> 01:05:57,700 which just gives you more digits of precision. 1418 01:05:57,700 --> 01:06:01,360 They eventually get imprecise per the example we looked at last week, 1419 01:06:01,360 --> 01:06:03,610 but it at least gets you further down the line. 1420 01:06:03,610 --> 01:06:07,930 As an aside, in really, really important applications, in finance, 1421 01:06:07,930 --> 01:06:10,030 in medicine, in military operations, and the 1422 01:06:10,030 --> 01:06:12,640 like where you really can't have rounding errors-- 1423 01:06:12,640 --> 01:06:17,470 long story short, humans have developed libraries in C and other languages 1424 01:06:17,470 --> 01:06:19,317 that use more, even, than 8 bytes. 1425 01:06:19,317 --> 01:06:22,150 So there are solutions to these problems, but they're always finite. 1426 01:06:22,150 --> 01:06:24,070 You have to pick an upper bound. 1427 01:06:24,070 --> 01:06:27,070 Then there's char, which we saw briefly last week when I asked 1428 01:06:27,070 --> 01:06:29,470 the user for y or n, for yes or no. 1429 01:06:29,470 --> 01:06:32,470 And then there's a string, which I'm going to propose as a question mark 1430 01:06:32,470 --> 01:06:34,360 because a string totally depends. 1431 01:06:34,360 --> 01:06:35,380 Like, Hi! 
1432 01:06:35,380 --> 01:06:38,890 H-I, exclamation point, would seem to be three bytes. 1433 01:06:38,890 --> 01:06:41,140 D-A-V-I-D, would seem to be five. 1434 01:06:41,140 --> 01:06:45,400 So the strings, clearly, are variable based on what you or the human type in. 1435 01:06:45,400 --> 01:06:48,140 So we'll see what this means, though, in just a bit. 1436 01:06:48,140 --> 01:06:51,580 This though, is the thing inside of your Mac, your PC, your phone. 1437 01:06:51,580 --> 01:06:53,680 It might not look exactly like this, but this is 1438 01:06:53,680 --> 01:06:56,187 a memory module for a modern computer. 1439 01:06:56,187 --> 01:06:57,520 And let's go ahead and use this. 1440 01:06:57,520 --> 01:06:59,920 Really, it's just representative of the finite amount of memory 1441 01:06:59,920 --> 01:07:01,360 that any computer, indeed, has. 1442 01:07:01,360 --> 01:07:06,160 Let's zoom in on one of these little black chips on the circuit board here. 1443 01:07:06,160 --> 01:07:10,180 Zoom in, and let me propose that this rectangle really represents 1444 01:07:10,180 --> 01:07:14,380 some number of bytes, like tucked inside of this little black circuit 1445 01:07:14,380 --> 01:07:16,750 on the board is maybe, I don't know, a gigabyte, 1446 01:07:16,750 --> 01:07:19,300 a billion bytes, maybe it's 100 bytes-- some number of bytes. 1447 01:07:19,300 --> 01:07:21,258 It totally depends on the computer and how much 1448 01:07:21,258 --> 01:07:22,700 you paid for the stick of memory. 1449 01:07:22,700 --> 01:07:27,850 But if there's a finite number of bytes physically implemented somehow 1450 01:07:27,850 --> 01:07:30,327 digitally inside of this hardware, well, then it 1451 01:07:30,327 --> 01:07:32,410 stands to reason that we could number those bytes. 1452 01:07:32,410 --> 01:07:36,940 We can just arbitrarily decide that the top left corner is byte number 1453 01:07:36,940 --> 01:07:38,800 one, or really byte number zero. 
1454 01:07:38,800 --> 01:07:41,170 The one next to it is number one, then number two, 1455 01:07:41,170 --> 01:07:43,450 number 3, dot, dot, dot, number 2 billion 1456 01:07:43,450 --> 01:07:46,090 or whatever it is, however big this memory is. 1457 01:07:46,090 --> 01:07:50,530 So if you use a variable in a C program, that's only one byte. 1458 01:07:50,530 --> 01:07:54,190 Like a char, it might literally be stored in that top left-hand corner 1459 01:07:54,190 --> 01:07:55,120 of the memory. 1460 01:07:55,120 --> 01:07:57,760 In practice, you don't care where, physically, it is. 1461 01:07:57,760 --> 01:07:59,830 But really, the artist's rendition would be 1462 01:07:59,830 --> 01:08:02,872 this-- a char might use one of those single bytes 1463 01:08:02,872 --> 01:08:04,330 somewhere in the computer's memory. 1464 01:08:04,330 --> 01:08:07,450 If you use an int, which is 4 bytes, it would give you 1465 01:08:07,450 --> 01:08:10,840 4 bytes, contiguous-- that is left to right, top to bottom. 1466 01:08:10,840 --> 01:08:13,274 But all 32 bits would be next to each other 1467 01:08:13,274 --> 01:08:16,149 so the computer knows that those, indeed, all belong to the same int. 1468 01:08:16,149 --> 01:08:18,680 If you need a long, or a double for that matter, 1469 01:08:18,680 --> 01:08:21,140 then you might use a full 8 bytes in this case. 1470 01:08:21,140 --> 01:08:23,439 And you just keep using and using this memory, 1471 01:08:23,439 --> 01:08:26,170 kind of like a canvas, almost in Photoshop 1472 01:08:26,170 --> 01:08:29,845 or a spreadsheet where you can just move pixels or you can move data around, 1473 01:08:29,845 --> 01:08:31,720 that's really what your computer's memory is, 1474 01:08:31,720 --> 01:08:36,702 a canvas for storing information in units of bytes or 8 bits. 1475 01:08:36,702 --> 01:08:39,160 Now, we don't need to keep looking at these circuit boards. 1476 01:08:39,160 --> 01:08:41,287 We can abstract it away, as we often do. 
1477 01:08:41,287 --> 01:08:43,120 And let's go ahead and zoom in on this grid, 1478 01:08:43,120 --> 01:08:45,740 just to consider some very specific variables. 1479 01:08:45,740 --> 01:08:49,180 So let me zoom in, and now I see fewer, but larger boxes 1480 01:08:49,180 --> 01:08:51,580 on the screen, each of which, again, represents a byte. 1481 01:08:51,580 --> 01:08:55,130 And now let me propose that we play with some actual code. 1482 01:08:55,130 --> 01:08:58,029 So here in C, albeit without a full program, 1483 01:08:58,029 --> 01:09:01,060 are three ints-- score1, score2, score3. 1484 01:09:01,060 --> 01:09:07,359 I have, coincidentally, given myself two scores around 72 and 73, 1485 01:09:07,359 --> 01:09:09,040 and then a pretty low score at 33. 1486 01:09:09,040 --> 01:09:12,048 Of course, last week or two weeks ago, this would have been high. 1487 01:09:12,048 --> 01:09:13,840 But now we're dealing with actual integers. 1488 01:09:13,840 --> 01:09:17,750 So these are three so-so scores on my quizzes or tests or the like. 1489 01:09:17,750 --> 01:09:19,250 So let me go to VS Code here. 1490 01:09:19,250 --> 01:09:22,210 And let's make a program called scores.c. 1491 01:09:22,210 --> 01:09:24,399 So I'm going to write, code scores.c. 1492 01:09:24,399 --> 01:09:26,149 That's going to give me my new file. 1493 01:09:26,149 --> 01:09:28,420 And let me go ahead and implement something like this. 1494 01:09:28,420 --> 01:09:34,149 Include stdio.h, int main(void), and then inside of here, 1495 01:09:34,149 --> 01:09:37,689 let me do int score1 will be 72. 1496 01:09:37,689 --> 01:09:40,029 Int score2 will be 73. 1497 01:09:40,029 --> 01:09:43,149 And int score3 will be 33. 1498 01:09:43,149 --> 01:09:45,460 And then let me just do something like write a program 1499 01:09:45,460 --> 01:09:48,043 to average my three test scores together, something like that. 
1500 01:09:48,043 --> 01:09:52,240 So let me do printf, quote unquote, my average is-- 1501 01:09:52,240 --> 01:09:56,470 and I'm going to go ahead and do, say, %i, \n. 1502 01:09:56,470 --> 01:09:58,290 And now, let me plug in the results. 1503 01:09:58,290 --> 01:10:00,040 And this is kind of grade school math now. 1504 01:10:00,040 --> 01:10:02,210 How do I compute the average of three values? 1505 01:10:02,210 --> 01:10:09,110 Well, just like on paper, I can do score1 plus score2 plus score3 1506 01:10:09,110 --> 01:10:12,830 in parentheses, because of order of operations, divided by 3, 1507 01:10:12,830 --> 01:10:14,457 since there's three total scores. 1508 01:10:14,457 --> 01:10:16,040 All right, so I think this checks out. 1509 01:10:16,040 --> 01:10:19,040 And indeed, you can use parentheses and operators like plus in your code 1510 01:10:19,040 --> 01:10:23,180 like this in C. Let me go ahead now and do make scores. 1511 01:10:23,180 --> 01:10:24,327 No syntax error. 1512 01:10:24,327 --> 01:10:25,910 So that's good, nothing missing there. 1513 01:10:25,910 --> 01:10:28,850 And now let me do ./scores and see what my test average is. 1514 01:10:28,850 --> 01:10:32,270 All right, it's not great, but I think I still passed. 1515 01:10:32,270 --> 01:10:36,050 And indeed, my average here is 59. 1516 01:10:36,050 --> 01:10:38,360 Is it precisely 59 though? 1517 01:10:38,360 --> 01:10:39,140 Well, let's see. 1518 01:10:39,140 --> 01:10:42,110 Let's actually, instead of using an int, how about we go ahead 1519 01:10:42,110 --> 01:10:44,870 and use something like a floating point value here? 1520 01:10:44,870 --> 01:10:46,250 And let me go ahead and do this. 1521 01:10:46,250 --> 01:10:48,710 So let me recompile my code, make scores. 1522 01:10:48,710 --> 01:10:50,600 Huh, all right, I've got an issue. 1523 01:10:50,600 --> 01:10:52,340 Let me zoom in on my terminal window. 1524 01:10:52,340 --> 01:10:54,710 We've not seen this one, necessarily, before.
1525 01:10:54,710 --> 01:10:56,510 But error on line 9. 1526 01:10:56,510 --> 01:11:00,410 Format specifies type double, which is a lot of precision, 1527 01:11:00,410 --> 01:11:02,180 but the argument has type int. 1528 01:11:02,180 --> 01:11:03,300 So what does this mean? 1529 01:11:03,300 --> 01:11:06,508 Well, it's showing me with these green squiggles that something's bad between 1530 01:11:06,508 --> 01:11:09,060 the %f and this thing over here. 1531 01:11:09,060 --> 01:11:13,020 Well, on the left, I'm implying a float, or a double for that matter. 1532 01:11:13,020 --> 01:11:16,835 On the right, though, what data type are score1, score2, score3? 1533 01:11:16,835 --> 01:11:17,960 All right, so they're ints. 1534 01:11:17,960 --> 01:11:19,583 So clang does not like this. 1535 01:11:19,583 --> 01:11:22,250 The compiler just doesn't like that I'm using ints on the right, 1536 01:11:22,250 --> 01:11:24,170 but I want floats on the left. 1537 01:11:24,170 --> 01:11:26,670 So there's going to be different ways of solving this. 1538 01:11:26,670 --> 01:11:29,870 One way would be to just ignore the problem like I originally did, 1539 01:11:29,870 --> 01:11:32,450 and just go back to %i. 1540 01:11:32,450 --> 01:11:38,330 Or as an aside, %d is often an alternative to %i for a decimal number. 1541 01:11:38,330 --> 01:11:42,358 But we use %i because it sounds like int, so %i is fine here too. 1542 01:11:42,358 --> 01:11:44,150 But I don't want to just avoid the problem. 1543 01:11:44,150 --> 01:11:46,500 I want to actually display a floating point value. 1544 01:11:46,500 --> 01:11:47,730 So how can I fix this? 1545 01:11:47,730 --> 01:11:50,272 Well, it turns out, I can solve this in a few different ways. 1546 01:11:50,272 --> 01:11:53,990 The simplest is just to make sure that at least one number on the right 1547 01:11:53,990 --> 01:11:59,330 is a floating point value, like 3.0 instead of just 3. 1548 01:11:59,330 --> 01:12:01,700 Now I think clang will be happier. 
1549 01:12:01,700 --> 01:12:03,320 Let me do make scores-- 1550 01:12:03,320 --> 01:12:04,400 Enter. 1551 01:12:04,400 --> 01:12:05,330 And indeed, it's OK. 1552 01:12:05,330 --> 01:12:05,930 Why? 1553 01:12:05,930 --> 01:12:10,050 As soon as you have at least one more precise data type on the right, 1554 01:12:10,050 --> 01:12:13,170 it just treats everything, at that point, as floating point value 1555 01:12:13,170 --> 01:12:14,330 so that the math works out. 1556 01:12:14,330 --> 01:12:17,720 So ./scores, Enter-- and now, there we go, right? 1557 01:12:17,720 --> 01:12:20,390 Some of us might really want that 1/3 of a point. 1558 01:12:20,390 --> 01:12:21,980 Our average was not 59. 1559 01:12:21,980 --> 01:12:25,010 It's 59 1/3, as in this case here. 1560 01:12:25,010 --> 01:12:26,750 All right, so we've solved that there. 1561 01:12:26,750 --> 01:12:30,890 As an aside, though, there's one other technique to show here. 1562 01:12:30,890 --> 01:12:33,320 If you didn't want to change it to 3.0 because that's 1563 01:12:33,320 --> 01:12:36,410 a little weird, because there were literally three scores, 1564 01:12:36,410 --> 01:12:38,760 it's not like that needs to have a decimal point, 1565 01:12:38,760 --> 01:12:43,970 you could also explicitly convert the 3 to a float 1566 01:12:43,970 --> 01:12:46,230 by saying, in parentheses, float. 1567 01:12:46,230 --> 01:12:48,050 This is what's called typecasting. 1568 01:12:48,050 --> 01:12:51,840 And this will just convert the thing right after it to that data type, 1569 01:12:51,840 --> 01:12:52,560 if it's possible. 1570 01:12:52,560 --> 01:12:56,970 So if I do this again, make scores, no errors now. ./scores, and I get, 1571 01:12:56,970 --> 01:12:59,960 in fact, the same result. There's a bit of a rounding issue here, 1572 01:12:59,960 --> 01:13:03,650 but we know the rounding relates to the imprecision from last week. 1573 01:13:03,650 --> 01:13:06,980 For now, let me just be happy with my 59.3 something. 
1574 01:13:06,980 --> 01:13:08,360 I'll take that for now. 1575 01:13:08,360 --> 01:13:14,660 But this is as close to a good enough correct answer for me now. 1576 01:13:14,660 --> 01:13:15,942 But how do I-- 1577 01:13:15,942 --> 01:13:18,650 think about now, what's going on inside of the computer's memory? 1578 01:13:18,650 --> 01:13:19,310 Well, let's consider. 1579 01:13:19,310 --> 01:13:20,643 Here's that same grid of memory. 1580 01:13:20,643 --> 01:13:22,490 Each box represents a byte. 1581 01:13:22,490 --> 01:13:25,790 Where are score1, score2, and score3 in my memory? 1582 01:13:25,790 --> 01:13:28,790 Well, score1, let me just propose, is at the top left. 1583 01:13:28,790 --> 01:13:32,060 But it's taking up four boxes for 4 bytes. 1584 01:13:32,060 --> 01:13:34,842 Score2 probably ends up right next to it in memory, 1585 01:13:34,842 --> 01:13:36,800 though, this isn't always going to be the case, 1586 01:13:36,800 --> 01:13:38,180 but I've chosen simple examples. 1587 01:13:38,180 --> 01:13:40,910 73 is next to it, also taking up 4 bytes. 1588 01:13:40,910 --> 01:13:45,320 And then lastly, 33 is in score3, down there underneath. 1589 01:13:45,320 --> 01:13:48,343 Now, if we really look at the computer's memory, 1590 01:13:48,343 --> 01:13:50,510 look at it with some kind of microscope or the like, 1591 01:13:50,510 --> 01:13:54,110 there's actually 32 bits, 32 bits, 32 bits 1592 01:13:54,110 --> 01:13:59,308 in each of those four groups of four bytes representing those values. 1593 01:13:59,308 --> 01:14:01,100 But again, for today's purposes onwards, we 1594 01:14:01,100 --> 01:14:03,308 don't really need to think again and again in binary. 1595 01:14:03,308 --> 01:14:05,940 It's just, indeed, these decimal numbers being stored there. 1596 01:14:05,940 --> 01:14:08,240 But I claim now, this isn't the best design. 
1597 01:14:08,240 --> 01:14:11,300 Even if you have never programmed before CS50, 1598 01:14:11,300 --> 01:14:13,220 what you're looking at here on the screen, 1599 01:14:13,220 --> 01:14:16,970 as an excerpt, in what sense is this perhaps bad design, even though it's 1600 01:14:16,970 --> 01:14:19,960 a correct way of storing three test scores? 1601 01:14:19,960 --> 01:14:20,960 What's kind of bad here? 1602 01:14:20,960 --> 01:14:21,882 Yeah? 1603 01:14:21,882 --> 01:14:26,220 AUDIENCE: The more scores you have, the more you [INAUDIBLE].. 1604 01:14:26,220 --> 01:14:28,950 DAVID MALAN: Yeah, always do exactly what you did-- extrapolate 1605 01:14:28,950 --> 01:14:31,740 to 4 scores, 5 scores 50 scores. 1606 01:14:31,740 --> 01:14:34,020 This can't be that well-designed because now you're 1607 01:14:34,020 --> 01:14:36,300 going to have 4 lines of code, 5 lines of code, 1608 01:14:36,300 --> 01:14:38,550 50 lines of code that are almost identical, 1609 01:14:38,550 --> 01:14:40,770 except for this like arbitrary number that we're 1610 01:14:40,770 --> 01:14:42,430 updating at the end of the variable. 1611 01:14:42,430 --> 01:14:44,940 So indeed, there's probably going to be a better 1612 01:14:44,940 --> 01:14:48,690 way, even though, at least in C, we haven't yet seen that technique. 1613 01:14:48,690 --> 01:14:52,440 But the solution, today onward, is going to be something called an array. 1614 01:14:52,440 --> 01:14:57,180 An array is a way of storing your data back 1615 01:14:57,180 --> 01:15:00,630 to back to back in the computer's memory in such a way 1616 01:15:00,630 --> 01:15:03,960 that you can access each individual member easily. 1617 01:15:03,960 --> 01:15:08,530 Put another way, with an array, you can instead do something like this. 
1618 01:15:08,530 --> 01:15:12,300 Instead of saying int score1, int score2, int score3, 1619 01:15:12,300 --> 01:15:15,790 giving each a value, you can first tell the computer, 1620 01:15:15,790 --> 01:15:18,330 please give me a variable called scores-- 1621 01:15:18,330 --> 01:15:20,700 plural, though you can call it anything you want-- 1622 01:15:20,700 --> 01:15:24,090 of size three, each of which will be an integer. 1623 01:15:24,090 --> 01:15:28,680 That is to say, this is how you declare an array in C that will have 1624 01:15:28,680 --> 01:15:30,930 enough room to store three integers. 1625 01:15:30,930 --> 01:15:34,540 Put another way, this is the technical way of telling the computer, 1626 01:15:34,540 --> 01:15:38,880 please give me 12 bytes in total-- 1627 01:15:38,880 --> 01:15:42,660 3 times 4 each for an int, so give me 12 bytes in total. 1628 01:15:42,660 --> 01:15:44,640 And what the computer will do is guarantee 1629 01:15:44,640 --> 01:15:47,350 that they're back to back to back in the computer's memory. 1630 01:15:47,350 --> 01:15:49,360 And that'll be useful in just a moment. 1631 01:15:49,360 --> 01:15:51,820 So let me go ahead and do something useful with this. 1632 01:15:51,820 --> 01:15:53,640 Let me store three actual scores. 1633 01:15:53,640 --> 01:15:58,500 Here's how I could now store those same numeric scores in this array. 1634 01:15:58,500 --> 01:16:03,040 Syntax is a little different, but there's one variable called scores. 1635 01:16:03,040 --> 01:16:05,010 But if you want to go to its first location, 1636 01:16:05,010 --> 01:16:08,520 starting today, you use square brackets and go to location 0 1637 01:16:08,520 --> 01:16:13,080 first, which because things in C are 0 indexed, so to speak, 1638 01:16:13,080 --> 01:16:14,280 you start counting at 0. 1639 01:16:14,280 --> 01:16:16,410 The first int is at [0]. 1640 01:16:16,410 --> 01:16:18,030 Second int is at [1]. 1641 01:16:18,030 --> 01:16:19,530 Third int is at [2]. 
1642 01:16:19,530 --> 01:16:20,730 So it's not one, two, three. 1643 01:16:20,730 --> 01:16:22,090 It's literally 0, 1, 2. 1644 01:16:22,090 --> 01:16:24,090 And this is not something you have control over. 1645 01:16:24,090 --> 01:16:26,250 You must start at 0. 1646 01:16:26,250 --> 01:16:29,940 So these lines now create an array of size three, 1647 01:16:29,940 --> 01:16:33,510 and then insert one, two, three values into that array. 1648 01:16:33,510 --> 01:16:37,770 But the upside now is that you only have one name of the variable to remember. 1649 01:16:37,770 --> 01:16:39,240 It's just called scores. 1650 01:16:39,240 --> 01:16:43,380 Yes, you need to go into the array to get individual values. 1651 01:16:43,380 --> 01:16:46,618 You need to index into it using those square brackets. 1652 01:16:46,618 --> 01:16:48,660 But at least you don't have this hackish approach 1653 01:16:48,660 --> 01:16:53,050 of declaring a separate variable for each and every one of these values. 1654 01:16:53,050 --> 01:16:56,070 So let me go back to scores.c here. 1655 01:16:56,070 --> 01:16:57,580 And let me propose that I do this. 1656 01:16:57,580 --> 01:17:00,580 Let me just use that same idea to do the following. 1657 01:17:00,580 --> 01:17:02,580 Let me get rid of these three separate integers. 1658 01:17:02,580 --> 01:17:06,210 Let me give myself an int scores array of size 3. 1659 01:17:06,210 --> 01:17:10,470 And then scores[0] will, as before, be 72. 1660 01:17:10,470 --> 01:17:14,070 Scores[1] will be 73. 1661 01:17:14,070 --> 01:17:16,830 And scores[2] will be 33. 1662 01:17:16,830 --> 01:17:18,780 And let me get rid of the little dot there. 1663 01:17:18,780 --> 01:17:23,490 All right, so now, if I go ahead and run this again with make scores-- 1664 01:17:23,490 --> 01:17:24,642 Enter. 1665 01:17:24,642 --> 01:17:29,060 Huh, what did I do wrong here? 1666 01:17:29,060 --> 01:17:31,680 I think I got a little too ahead of myself. 
1667 01:17:31,680 --> 01:17:36,100 Let me increase my terminal window. 1668 01:17:36,100 --> 01:17:38,830 Let's focus on line 10 here, first. 1669 01:17:38,830 --> 01:17:42,310 Error, use of undeclared identifier, score1. 1670 01:17:42,310 --> 01:17:44,170 What did I do here that was dumb? 1671 01:17:44,170 --> 01:17:45,430 Yeah? 1672 01:17:45,430 --> 01:17:47,440 AUDIENCE: You didn't declare it a variable. 1673 01:17:47,440 --> 01:17:49,420 DAVID MALAN: Right, so I didn't declare score1. 1674 01:17:49,420 --> 01:17:50,530 I've got old code. 1675 01:17:50,530 --> 01:17:53,798 So I just kind of, honestly, got ahead of myself here, not even intentionally. 1676 01:17:53,798 --> 01:17:56,090 So let me go ahead and shrink my terminal window again. 1677 01:17:56,090 --> 01:17:57,740 I need to finish my thought here. 1678 01:17:57,740 --> 01:17:58,960 So let me clear my terminal. 1679 01:17:58,960 --> 01:18:04,960 And let me change this now to be scores[0] plus scores[1] plus 1680 01:18:04,960 --> 01:18:05,610 scores[2]. 1681 01:18:05,610 --> 01:18:07,360 So it's a little more verbose because I've 1682 01:18:07,360 --> 01:18:10,040 got these square brackets, so to speak. 1683 01:18:10,040 --> 01:18:12,220 But I think now my code is consistent. 1684 01:18:12,220 --> 01:18:13,870 So let me make scores now. 1685 01:18:13,870 --> 01:18:14,950 It now compiles. 1686 01:18:14,950 --> 01:18:19,870 ./scores gives me, indeed, the same rough average with those same values. 1687 01:18:19,870 --> 01:18:24,280 All right, so let me go ahead and maybe enhance this a little bit. 1688 01:18:24,280 --> 01:18:26,920 It's a little silly to have to write a special program just 1689 01:18:26,920 --> 01:18:31,610 to check your average of three test scores like 72, 73, 33. 1690 01:18:31,610 --> 01:18:33,550 Why don't I actually make the program dynamic 1691 01:18:33,550 --> 01:18:37,250 and ask the human for those scores? 1692 01:18:37,250 --> 01:18:39,140 So instead, let me do this. 
1693 01:18:39,140 --> 01:18:43,480 How about we get rid of the 72, and change this to get_int. 1694 01:18:43,480 --> 01:18:46,300 And I'll just prompt the user for a score. 1695 01:18:46,300 --> 01:18:52,510 Let me get rid of the 73 and get this to be get_int score, quote unquote. 1696 01:18:52,510 --> 01:18:56,560 And then lastly, get rid of the 33, and replace it with get_int, quote unquote, 1697 01:18:56,560 --> 01:18:57,670 score. 1698 01:18:57,670 --> 01:19:03,680 get_int is a CS50 thing for now, so I need to include cs50.h, as always. 1699 01:19:03,680 --> 01:19:05,650 But I think now, it's sort of a better program 1700 01:19:05,650 --> 01:19:08,680 because now I can compile it once, I can even share it with my friends. 1701 01:19:08,680 --> 01:19:12,490 And now any of us can average three scores on some class's test. 1702 01:19:12,490 --> 01:19:15,190 They don't need to know the code or rewrite the code just 1703 01:19:15,190 --> 01:19:16,910 to type in their scores. 1704 01:19:16,910 --> 01:19:19,150 So make scores worked. 1705 01:19:19,150 --> 01:19:25,120 ./scores, now I can type anything I want-- maybe it's a 72, 73, 33, 1706 01:19:25,120 --> 01:19:26,320 still get the same answer. 1707 01:19:26,320 --> 01:19:31,210 Or maybe I'm having a better semester, 100, 100, maybe 99, 1708 01:19:31,210 --> 01:19:33,520 and now we get still a pretty high score there. 1709 01:19:33,520 --> 01:19:34,600 But now it's dynamic. 1710 01:19:34,600 --> 01:19:36,080 Now you don't need the source code. 1711 01:19:36,080 --> 01:19:37,747 You don't need to recompile the program. 1712 01:19:37,747 --> 01:19:39,670 It's just going to work again and again. 1713 01:19:39,670 --> 01:19:41,090 But this, too. 1714 01:19:41,090 --> 01:19:43,660 Let me propose that this code is correct if I 1715 01:19:43,660 --> 01:19:45,910 want to get three scores from the user.
1716 01:19:45,910 --> 01:19:50,950 But these highlighted lines now, 6 through 9, are they well-designed, 1717 01:19:50,950 --> 01:19:53,170 would you say? 1718 01:19:53,170 --> 01:19:53,680 Yeah? 1719 01:19:53,680 --> 01:19:54,898 AUDIENCE: Can you loop? 1720 01:19:54,898 --> 01:19:55,940 DAVID MALAN: Yeah, right? 1721 01:19:55,940 --> 01:19:58,220 This is-- we can use a loop, is the spoiler here. 1722 01:19:58,220 --> 01:19:58,820 Why? 1723 01:19:58,820 --> 01:20:01,590 I mean, my God, it's like the same code again and again and again. 1724 01:20:01,590 --> 01:20:03,465 The only thing that's changing is the number. 1725 01:20:03,465 --> 01:20:06,170 And this should have kind of had some code smell again, 1726 01:20:06,170 --> 01:20:09,080 because if I keep typing the same thing again and again, 1727 01:20:09,080 --> 01:20:11,810 that's clearly an opportunity to better design something. 1728 01:20:11,810 --> 01:20:13,650 So let me do this. 1729 01:20:13,650 --> 01:20:18,590 Let me go ahead and still create my array of size three. 1730 01:20:18,590 --> 01:20:23,270 But let me use our old friend, the for loop, for int i equals 0, 1731 01:20:23,270 --> 01:20:26,610 i less than 3, i++. 1732 01:20:26,610 --> 01:20:29,510 And then in here, let me do scores bracket-- 1733 01:20:29,510 --> 01:20:32,920 we haven't seen this before, but any intuition? 1734 01:20:32,920 --> 01:20:34,220 Scores bracket-- 1735 01:20:34,220 --> 01:20:34,720 AUDIENCE: i. 1736 01:20:34,720 --> 01:20:39,730 DAVID MALAN: i, because that will use whatever i is, be it 0 or 1 or 2 1737 01:20:39,730 --> 01:20:40,720 in iteration. 1738 01:20:40,720 --> 01:20:43,780 And then I can get an int, asking the user for score, 1739 01:20:43,780 --> 01:20:47,000 without having to repeat myself again and again. 1740 01:20:47,000 --> 01:20:50,560 So hopefully, if I didn't make any typos, make scores, all good. 1741 01:20:50,560 --> 01:20:54,665 ./scores, 72, 73, 33, and we're back in business. 
1742 01:20:54,665 --> 01:20:56,540 But the code is arguably now better designed, 1743 01:20:56,540 --> 01:21:01,240 because now, I haven't actually hardcoded the scores, 1744 01:21:01,240 --> 01:21:04,940 and I haven't actually copied and pasted any of that code. 1745 01:21:04,940 --> 01:21:08,230 Well, if we consider now what's going on inside of the computer's memory, 1746 01:21:08,230 --> 01:21:10,510 it's pretty much the same in terms of the values. 1747 01:21:10,510 --> 01:21:15,490 But instead of the variables being, literally, score1, score2, score3, 1748 01:21:15,490 --> 01:21:17,210 there's just one variable. 1749 01:21:17,210 --> 01:21:19,030 It's an array called scores. 1750 01:21:19,030 --> 01:21:24,550 But you can index into its three locations by using scores[0] to get 1751 01:21:24,550 --> 01:21:28,810 the first, scores[1] to get the second, scores[2] to get the third. 1752 01:21:28,810 --> 01:21:29,990 But this is key. 1753 01:21:29,990 --> 01:21:33,040 The memory is contiguous. 1754 01:21:33,040 --> 01:21:35,380 The screen is only so large, so it wraps around. 1755 01:21:35,380 --> 01:21:38,950 But physically, digitally, the memory is contiguous-- top 1756 01:21:38,950 --> 01:21:40,270 to bottom, left to right. 1757 01:21:40,270 --> 01:21:41,530 And that's important, why? 1758 01:21:41,530 --> 01:21:46,060 Because the brackets indicate 0, 1, 2, that each of these integers 1759 01:21:46,060 --> 01:21:48,790 is just one integer away from the next. 1760 01:21:48,790 --> 01:21:51,220 It can't be randomly down here all of a sudden. 1761 01:21:51,220 --> 01:21:54,070 It's got to be back to back to back. 1762 01:21:54,070 --> 01:21:57,130 All right, now equipped with that paradigm, 1763 01:21:57,130 --> 01:22:00,710 what more could we actually do here? 1764 01:22:00,710 --> 01:22:04,270 Well, it turns out, it's worth knowing that it's possible in code 1765 01:22:04,270 --> 01:22:06,850 to even pass arrays around as arguments. 
1766 01:22:06,850 --> 01:22:09,100 And let me just whip this program up somewhat quickly, 1767 01:22:09,100 --> 01:22:11,320 just so you've seen it before long. 1768 01:22:11,320 --> 01:22:13,190 But let me go ahead and do this. 1769 01:22:13,190 --> 01:22:18,130 Let me propose that I create a function that does this averaging for me. 1770 01:22:18,130 --> 01:22:22,510 So I'm going to create a function called average that returns a float. 1771 01:22:22,510 --> 01:22:26,860 And the arguments this thing is going to take-- 1772 01:22:26,860 --> 01:22:28,640 let's see, it's going to be the array. 1773 01:22:28,640 --> 01:22:31,480 So it turns out, if you want to take in an array of numbers-- 1774 01:22:31,480 --> 01:22:33,050 you can call it anything you want. 1775 01:22:33,050 --> 01:22:36,970 This is how you tell C that a function takes, not 1776 01:22:36,970 --> 01:22:39,790 an integer, but an array of integers. 1777 01:22:39,790 --> 01:22:41,290 And you don't have to call it array. 1778 01:22:41,290 --> 01:22:42,790 I'm doing that just for the sake of discussion. 1779 01:22:42,790 --> 01:22:43,660 It can be called x. 1780 01:22:43,660 --> 01:22:44,490 It can be numbers. 1781 01:22:44,490 --> 01:22:45,490 It can be anything else. 1782 01:22:45,490 --> 01:22:49,060 I'm just calling it array to be super explicit as to what it is there. 1783 01:22:49,060 --> 01:22:51,730 Now, how do I change my code down here? 1784 01:22:51,730 --> 01:22:55,130 What I think I'm going to do for the moment is just this. 1785 01:22:55,130 --> 01:22:59,110 I'm going to get rid of this code here, where I manually computed the average. 1786 01:22:59,110 --> 01:23:01,480 And let me just call the average function here 1787 01:23:01,480 --> 01:23:05,000 by passing in the whole array of scores. 1788 01:23:05,000 --> 01:23:07,030 So this is just an example of abstraction, 1789 01:23:07,030 --> 01:23:08,890 like now I have a function called average. 1790 01:23:08,890 --> 01:23:09,670 I don't care.
1791 01:23:09,670 --> 01:23:12,490 I don't have to remember how it works once I implement it. 1792 01:23:12,490 --> 01:23:15,010 It just kind of tightens up my main code a little bit. 1793 01:23:15,010 --> 01:23:17,030 But I do still have to implement this. 1794 01:23:17,030 --> 01:23:19,360 So later in my file-- let me repeat myself before, 1795 01:23:19,360 --> 01:23:22,270 the only time it's OK in C to repeat yourself again and again, 1796 01:23:22,270 --> 01:23:27,010 by typing out again, average, and then int array open bracket-- 1797 01:23:27,010 --> 01:23:28,580 but now not a semicolon. 1798 01:23:28,580 --> 01:23:30,250 Now I have to implement this thing. 1799 01:23:30,250 --> 01:23:33,400 And I can implement this in a bunch of different ways, 1800 01:23:33,400 --> 01:23:37,630 but I don't know in advance-- 1801 01:23:37,630 --> 01:23:39,040 I can't just do this. 1802 01:23:39,040 --> 01:23:48,400 I can't just do array[0] plus array[1] plus array[2], 1803 01:23:48,400 --> 01:23:52,130 unless this program's only ever going to work on three numbers. 1804 01:23:52,130 --> 01:23:55,460 So let me go ahead and do this. 1805 01:23:55,460 --> 01:23:58,570 Let me first propose that there's a poor design here. 1806 01:23:58,570 --> 01:24:01,930 In my main function, what value have I repeated twice? 1807 01:24:01,930 --> 01:24:05,050 1808 01:24:05,050 --> 01:24:07,550 Among the highlighted lines, what jumps out at you as twice? 1809 01:24:07,550 --> 01:24:09,020 AUDIENCE: The length of the array? 1810 01:24:09,020 --> 01:24:11,520 DAVID MALAN: Yeah, the length of the array, it's just three. 1811 01:24:11,520 --> 01:24:14,720 Now it's not a huge deal that I typed the number three on line 8 and line 9, 1812 01:24:14,720 --> 01:24:17,120 but this is exactly the kind of like shortcut 1813 01:24:17,120 --> 01:24:18,440 that's going to get you in trouble eventually. 1814 01:24:18,440 --> 01:24:18,860 Why? 
1815 01:24:18,860 --> 01:24:20,240 Because, eventually, you or someone else is 1816 01:24:20,240 --> 01:24:22,407 going to go in and make the array bigger or smaller, 1817 01:24:22,407 --> 01:24:24,410 and you're not going to realize that magically, 1818 01:24:24,410 --> 01:24:26,270 that same number is in two places. 1819 01:24:26,270 --> 01:24:29,270 And indeed, this is what a programmer would often call a magic number. 1820 01:24:29,270 --> 01:24:31,940 A magic number is one that just kind of appears magically. 1821 01:24:31,940 --> 01:24:35,210 And you're on the honor system to change it here, if you change it here, 1822 01:24:35,210 --> 01:24:36,688 and then you change it over here. 1823 01:24:36,688 --> 01:24:39,230 That's not going to end well if the onus is on the programmer 1824 01:24:39,230 --> 01:24:43,190 to remember where they hardcoded-- that is, wrote out three explicitly. 1825 01:24:43,190 --> 01:24:46,250 So any time you reuse a value like this, you know what? 1826 01:24:46,250 --> 01:24:50,690 We should probably do what we did last week, which was to declare a variable, 1827 01:24:50,690 --> 01:24:53,510 perhaps at the very top of my program, so it's super obvious 1828 01:24:53,510 --> 01:24:56,990 what it is, called, maybe n, and set that equal to 3. 1829 01:24:56,990 --> 01:24:59,030 Better yet, what did I do last week to make sure 1830 01:24:59,030 --> 01:25:02,390 that I can't screw up and accidentally change that value? 1831 01:25:02,390 --> 01:25:03,440 Yeah, constant. 1832 01:25:03,440 --> 01:25:05,810 And the keyword there was just const for short. 1833 01:25:05,810 --> 01:25:09,110 And now I have a global variable-- global in the sense that I can 1834 01:25:09,110 --> 01:25:11,870 access it anywhere-- that is called n. 1835 01:25:11,870 --> 01:25:12,680 It's an int. 1836 01:25:12,680 --> 01:25:14,450 And it's always going to be 3. 
1837 01:25:14,450 --> 01:25:18,500 And now I can improve my main function a little bit by just changing 1838 01:25:18,500 --> 01:25:22,662 the 3's to n, so now if I, if a colleague realized, oh, wait a minute, 1839 01:25:22,662 --> 01:25:23,870 there's four tests this year. 1840 01:25:23,870 --> 01:25:25,610 You change n to four, recompile the code, 1841 01:25:25,610 --> 01:25:31,190 and it just works everywhere else, except in my average function. 1842 01:25:31,190 --> 01:25:33,830 Let me change it back to 3, just for consistency. 1843 01:25:33,830 --> 01:25:39,770 This is not going to fly now, to just sum up things like this, for instance, 1844 01:25:39,770 --> 01:25:43,610 and then return this divided by 3. 1845 01:25:43,610 --> 01:25:51,130 Why will this not work now as I've defined it? 1846 01:25:51,130 --> 01:25:52,159 Yeah? 1847 01:25:52,159 --> 01:25:58,030 AUDIENCE: [INAUDIBLE] 1848 01:25:58,030 --> 01:26:00,980 DAVID MALAN: OK, I might be returning an integer value when 1849 01:26:00,980 --> 01:26:02,870 I intend to return a float per this. 1850 01:26:02,870 --> 01:26:05,870 But I think I'm OK because I used that little trick where I made sure 1851 01:26:05,870 --> 01:26:08,810 that at least one of the numbers in my arithmetic expression 1852 01:26:08,810 --> 01:26:11,010 is, in fact, a floating point value. 1853 01:26:11,010 --> 01:26:14,180 And just by adding the point 0, make sure that everything 1854 01:26:14,180 --> 01:26:15,650 gets treated as a float. 1855 01:26:15,650 --> 01:26:17,864 So I think that's OK. 1856 01:26:17,864 --> 01:26:19,034 AUDIENCE: [INAUDIBLE] 1857 01:26:19,034 --> 01:26:20,701 DAVID MALAN: I'm sorry, a little louder. 1858 01:26:20,701 --> 01:26:24,385 AUDIENCE: It just seems like you're [INAUDIBLE].. 1859 01:26:24,385 --> 01:26:25,260 DAVID MALAN: Exactly. 
1860 01:26:25,260 --> 01:26:27,093 So left hand's not talking to the right hand 1861 01:26:27,093 --> 01:26:30,210 here, in that my current implementation of average 1862 01:26:30,210 --> 01:26:33,510 is still assuming that there's only going to be three tests or whatever. 1863 01:26:33,510 --> 01:26:35,670 But wait a minute, I just went through the trouble 1864 01:26:35,670 --> 01:26:39,480 of modifying this to be n, generically. 1865 01:26:39,480 --> 01:26:43,205 And if I change this to 4, I'm not going to be happy, perhaps, 1866 01:26:43,205 --> 01:26:46,080 with my average because now I'm going to ignore one of my test scores 1867 01:26:46,080 --> 01:26:46,690 altogether. 1868 01:26:46,690 --> 01:26:48,450 So let me change this back to 3. 1869 01:26:48,450 --> 01:26:51,180 And unfortunately, if it's a variable now, 1870 01:26:51,180 --> 01:26:55,500 n, and therefore, I have literally a variable number of scores, 1871 01:26:55,500 --> 01:27:00,920 how do I take the average of a variable number of things? 1872 01:27:00,920 --> 01:27:02,630 I mean, what's my building block there? 1873 01:27:02,630 --> 01:27:03,170 Yeah? 1874 01:27:03,170 --> 01:27:10,100 AUDIENCE: [INAUDIBLE] 1875 01:27:10,100 --> 01:27:10,850 DAVID MALAN: Yeah. 1876 01:27:10,850 --> 01:27:14,880 Why don't I use a loop that goes through the array and adds things up as you go? 1877 01:27:14,880 --> 01:27:17,360 I mean, kind of like grade school, as you take the average on your calculator 1878 01:27:17,360 --> 01:27:19,730 or paper and pencil, you just keep adding the numbers together, 1879 01:27:19,730 --> 01:27:22,380 and then you divide at the end by the total number of things. 1880 01:27:22,380 --> 01:27:23,520 So how can I do this? 1881 01:27:23,520 --> 01:27:25,730 Well, let me change my implementation of average 1882 01:27:25,730 --> 01:27:30,515 to first declare a variable called sum, or whatever, set it equal to 0. 
1883 01:27:30,515 --> 01:27:33,140 So this is like me on my piece of paper getting ready to count, 1884 01:27:33,140 --> 01:27:36,590 or my calculator, of course, when you turn it on, typically defaults to zero. 1885 01:27:36,590 --> 01:27:41,570 And now, let me do for, int i equals 0. i is less than a-- 1886 01:27:41,570 --> 01:27:43,700 well, no, I didn't do that. 1887 01:27:43,700 --> 01:27:46,730 i is less than n, i++. 1888 01:27:46,730 --> 01:27:52,640 And now in here, let me go ahead and add to the current sum, whatever 1889 01:27:52,640 --> 01:27:55,910 is in the array's location, i. 1890 01:27:55,910 --> 01:28:00,740 And then down here, I think I can just return sum divided by 3.0-- 1891 01:28:00,740 --> 01:28:04,560 not 3.0, n, perhaps here. 1892 01:28:04,560 --> 01:28:08,492 And actually, I think I'm going to get-- let's make sure it's a float. 1893 01:28:08,492 --> 01:28:11,450 Let's use the type casting trick just to make sure I don't accidentally 1894 01:28:11,450 --> 01:28:15,540 shortchange someone and throw away everything after the decimal point. 1895 01:28:15,540 --> 01:28:17,300 So it just escalated quickly, right? 1896 01:28:17,300 --> 01:28:18,990 Average just got a lot more involved. 1897 01:28:18,990 --> 01:28:22,130 It's not just a single one line of code, but now it's dynamic. 1898 01:28:22,130 --> 01:28:25,070 I initialize a variable called sum to 0. 1899 01:28:25,070 --> 01:28:30,920 In this loop, I go through and just keep adding to sum, which is initially 0, 1900 01:28:30,920 --> 01:28:33,200 whatever's in array[i]-- 1901 01:28:33,200 --> 01:28:36,740 or specifically array[0], array[1], array[2]. 1902 01:28:36,740 --> 01:28:40,970 That gives me a total sum that I return, divided by the total number of things. 1903 01:28:40,970 --> 01:28:42,560 Now, this I can tighten slightly. 1904 01:28:42,560 --> 01:28:45,650 Recall that this is syntactic sugar for just adding things.
1905 01:28:45,650 --> 01:28:48,620 I can't use plus plus because that only literally adds one. 1906 01:28:48,620 --> 01:28:52,630 But I can use here, plus equals. 1907 01:28:52,630 --> 01:28:54,880 Questions on this implementation here? 1908 01:28:54,880 --> 01:28:58,000 Really the only takeaway-- or the most important takeaway 1909 01:28:58,000 --> 01:29:00,730 is that this is the syntax for how you tell 1910 01:29:00,730 --> 01:29:04,210 a function that it expects a whole array, not 1911 01:29:04,210 --> 01:29:06,450 a single variable like an int or the like. 1912 01:29:06,450 --> 01:29:08,200 You literally use square brackets, but you 1913 01:29:08,200 --> 01:29:11,530 don't specify the length inside there. 1914 01:29:11,530 --> 01:29:12,748 Yeah? 1915 01:29:12,748 --> 01:29:16,410 AUDIENCE: What variable [INAUDIBLE] at the top? 1916 01:29:16,410 --> 01:29:18,410 DAVID MALAN: What about the variable at the top? 1917 01:29:18,410 --> 01:29:22,205 AUDIENCE: [INAUDIBLE] 1918 01:29:22,205 --> 01:29:23,330 DAVID MALAN: Good question. 1919 01:29:23,330 --> 01:29:25,220 What do I have it defined as at the top? 1920 01:29:25,220 --> 01:29:31,280 This variable, N, it must be an integer if you're going to use it inside 1921 01:29:31,280 --> 01:29:33,840 of an array's square brackets here. 1922 01:29:33,840 --> 01:29:38,360 So this line 10, notice, no longer says 3, it says N. 1923 01:29:38,360 --> 01:29:42,350 And so whatever N is, 3 or 4 or something else, that's how many 1924 01:29:42,350 --> 01:29:43,970 integers I will get in that array. 1925 01:29:43,970 --> 01:29:47,070 And it must be, by definition of an array, an integer that 1926 01:29:47,070 --> 01:29:48,320 goes in those square brackets. 1927 01:29:48,320 --> 01:29:50,000 And here's a common source of confusion.
1928 01:29:50,000 --> 01:29:52,350 When you create the array, that is declare it, 1929 01:29:52,350 --> 01:29:54,350 you use square brackets like this, where you put 1930 01:29:54,350 --> 01:29:56,210 the total number of elements you want. 1931 01:29:56,210 --> 01:29:59,820 When you subsequently use the array, like I'm doing here, 1932 01:29:59,820 --> 01:30:02,690 you don't mention int again-- just like you don't mention int 1933 01:30:02,690 --> 01:30:04,610 again and again once a variable exists. 1934 01:30:04,610 --> 01:30:10,220 You use the square brackets still, but you don't use N. You use 0 or 1 or 2 1935 01:30:10,220 --> 01:30:11,990 or, generically here, i. 1936 01:30:11,990 --> 01:30:14,810 So when C was designed, they sometimes used the same syntax 1937 01:30:14,810 --> 01:30:17,060 for two different ideas or contexts. 1938 01:30:17,060 --> 01:30:17,984 Yeah? 1939 01:30:17,984 --> 01:30:22,645 AUDIENCE: Do you have to include line 6 [INAUDIBLE]? 1940 01:30:22,645 --> 01:30:23,770 DAVID MALAN: Good question. 1941 01:30:23,770 --> 01:30:25,900 Do I have to include line 6? 1942 01:30:25,900 --> 01:30:29,290 Short answer, yes, because of the reason we ran into last week. 1943 01:30:29,290 --> 01:30:32,750 C, or clang really, reads your code top to bottom, left to right. 1944 01:30:32,750 --> 01:30:38,890 And so if the compiler sees some mention of this function average on line 16, 1945 01:30:38,890 --> 01:30:41,800 but you haven't told the compiler that average exists, 1946 01:30:41,800 --> 01:30:43,610 you're going to get an error on the screen. 1947 01:30:43,610 --> 01:30:45,490 So the conventional way to do that is you 1948 01:30:45,490 --> 01:30:48,670 just copy paste the first line of code from the function, 1949 01:30:48,670 --> 01:30:51,260 its so-called prototype or declaration. 1950 01:30:51,260 --> 01:30:51,760 Yeah? 1951 01:30:51,760 --> 01:30:55,662 AUDIENCE: Is there a library if you don't know the size of the array?
1952 01:30:55,662 --> 01:30:58,120 DAVID MALAN: Really good question, and a perfect segue. 1953 01:30:58,120 --> 01:31:01,078 Is there a library you can use if you don't know the size of the array? 1954 01:31:01,078 --> 01:31:01,720 No. 1955 01:31:01,720 --> 01:31:07,660 And so if any of you have programmed in Java or Python or other languages, 1956 01:31:07,660 --> 01:31:11,020 you can actually just ask the array, how big is it? 1957 01:31:11,020 --> 01:31:13,778 In C, you and I, the programmers, have to remember it. 1958 01:31:13,778 --> 01:31:15,820 And so short answer, no, there's no function that 1959 01:31:15,820 --> 01:31:17,445 will just automatically do this for us. 1960 01:31:17,445 --> 01:31:20,230 And in fact, let me make a more subtle claim 1961 01:31:20,230 --> 01:31:23,950 that it's fine to use global variables like this if they're really 1962 01:31:23,950 --> 01:31:25,160 for configuration options. 1963 01:31:25,160 --> 01:31:25,660 Why? 1964 01:31:25,660 --> 01:31:28,160 It's just convenient to put them at the very top of the file 1965 01:31:28,160 --> 01:31:30,565 because everyone, you, your colleagues, your TAs 1966 01:31:30,565 --> 01:31:32,440 are going to see them at the top of the code. 1967 01:31:32,440 --> 01:31:36,130 But you really shouldn't be using them everywhere throughout your code. 1968 01:31:36,130 --> 01:31:38,380 It'd be better if the average function, itself, were 1969 01:31:38,380 --> 01:31:40,610 independent of that special variable. 1970 01:31:40,610 --> 01:31:42,025 So by that, I mean this. 1971 01:31:42,025 --> 01:31:46,240 You know what I should really do, if I really want to be well-designed? 1972 01:31:46,240 --> 01:31:51,400 I should pass in the length of the array to the average function. 1973 01:31:51,400 --> 01:31:54,310 I should give the average function a second argument-- 1974 01:31:54,310 --> 01:31:57,800 I'll call it length, for instance, but I could call it anything I want.
1975 01:31:57,800 --> 01:32:02,500 And so rather than putting N all the way down here at the bottom of my file, 1976 01:32:02,500 --> 01:32:05,745 let me just dynamically say length instead. 1977 01:32:05,745 --> 01:32:08,620 And this is a subtlety-- and no need to get too tripped up over this. 1978 01:32:08,620 --> 01:32:11,830 But this, now, is just an example of how the same function can 1979 01:32:11,830 --> 01:32:13,690 take not one, but two arguments. 1980 01:32:13,690 --> 01:32:19,400 But indeed, in C, you must remember, yourself, what the length of an array 1981 01:32:19,400 --> 01:32:19,900 is. 1982 01:32:19,900 --> 01:32:22,810 You can't just ask the array via some syntax 1983 01:32:22,810 --> 01:32:26,560 like you can, those of you who've programmed before in Java or Python. 1984 01:32:26,560 --> 01:32:27,070 Yeah? 1985 01:32:27,070 --> 01:32:35,115 AUDIENCE: [INAUDIBLE] 1986 01:32:35,115 --> 01:32:36,240 DAVID MALAN: Good question. 1987 01:32:36,240 --> 01:32:39,198 Would it be better designed to write a function that computes the size? 1988 01:32:39,198 --> 01:32:42,570 Short answer, can't do that in C. As soon as you pass an array 1989 01:32:42,570 --> 01:32:47,263 into a function in C, you cannot figure out its size if it's a generic array 1990 01:32:47,263 --> 01:32:48,180 like that of integers. 1991 01:32:48,180 --> 01:32:51,040 There are special cases that you can do that. 1992 01:32:51,040 --> 01:32:53,283 But in general, no, it's just not possible in C. 1993 01:32:53,283 --> 01:32:55,200 And if that's some frustration, honestly, this 1994 01:32:55,200 --> 01:32:57,180 is why more modern languages add that feature. 1995 01:32:57,180 --> 01:32:57,680 Why? 1996 01:32:57,680 --> 01:32:59,910 Because it was really annoying, as I'm alluding here 1997 01:32:59,910 --> 01:33:01,560 to not having that information. 
1998 01:33:01,560 --> 01:33:03,643 Now, just to make sure I didn't screw up anywhere, 1999 01:33:03,643 --> 01:33:07,540 let me compile this final version of scores. 2000 01:33:07,540 --> 01:33:08,620 Suspense. 2001 01:33:08,620 --> 01:33:14,030 All good. ./scores, 72, 73, 33, and we're still back in business. 2002 01:33:14,030 --> 01:33:15,530 So this version is more complicated. 2003 01:33:15,530 --> 01:33:18,738 And as always, we'll have this version on the course's website for reference. 2004 01:33:18,738 --> 01:33:20,740 But the point, really, is that arrays, not only 2005 01:33:20,740 --> 01:33:23,290 can be used as containers to store multiple values-- 2006 01:33:23,290 --> 01:33:25,490 three or more in this case-- 2007 01:33:25,490 --> 01:33:30,440 you can also even pass them around as arguments, as such. 2008 01:33:30,440 --> 01:33:34,300 All right, now besides that, let's simplify for just a moment, 2009 01:33:34,300 --> 01:33:36,100 and consider now the world of chars. 2010 01:33:36,100 --> 01:33:39,200 If we've just got single bytes, where does this lead us? 2011 01:33:39,200 --> 01:33:41,200 And how does this get us, ultimately, to strings 2012 01:33:41,200 --> 01:33:44,170 to solve problems like readability and cryptography and the like? 2013 01:33:44,170 --> 01:33:46,390 Well here, for instance, are three lines of code, 2014 01:33:46,390 --> 01:33:48,967 out of context, that simply store three chars. 2015 01:33:48,967 --> 01:33:50,800 And you can already see where this is going. 2016 01:33:50,800 --> 01:33:53,920 Having three variables called c1, c2, c3 is clearly 2017 01:33:53,920 --> 01:33:57,470 going to end up being bad design because of all the silly redundancy here. 2018 01:33:57,470 --> 01:33:59,650 But notice, I'm using single quotes like last week 2019 01:33:59,650 --> 01:34:01,330 because these are single chars. 2020 01:34:01,330 --> 01:34:03,647 What does this look like in the computer's memory? 
2021 01:34:03,647 --> 01:34:05,480 Well, it looks a little something like this. 2022 01:34:05,480 --> 01:34:09,730 If we clear out the old memory, c1, c2, c3 probably 2023 01:34:09,730 --> 01:34:12,562 will end up here, maybe not literally in the top left-hand corner. 2024 01:34:12,562 --> 01:34:14,020 This is just an artist's rendition. 2025 01:34:14,020 --> 01:34:18,440 But c1, c2, c3 will probably end up like that. 2026 01:34:18,440 --> 01:34:20,020 Now, what's really there? 2027 01:34:20,020 --> 01:34:21,730 It's really those same three numbers-- 2028 01:34:21,730 --> 01:34:23,350 72, 73, 33. 2029 01:34:23,350 --> 01:34:27,920 But how many bits does a byte have? 2030 01:34:27,920 --> 01:34:28,880 Just eight. 2031 01:34:28,880 --> 01:34:33,830 So if we were to look at the binary representation of these characters, 2032 01:34:33,830 --> 01:34:35,330 it would only be eight bits each. 2033 01:34:35,330 --> 01:34:39,140 That's enough to store small numbers like 72, 73, 33. 2034 01:34:39,140 --> 01:34:41,580 We're not dealing with Unicode and emoji and the like. 2035 01:34:41,580 --> 01:34:42,837 But the point is the same. 2036 01:34:42,837 --> 01:34:45,170 You don't have to use four bytes to store these numbers. 2037 01:34:45,170 --> 01:34:48,087 You can use a different data type like chars, and underneath the hood, 2038 01:34:48,087 --> 01:34:51,420 it's, indeed, going to use just single bytes for each. 2039 01:34:51,420 --> 01:34:55,850 But this is sort of like a-- this isn't really how we implement strings, right? 2040 01:34:55,850 --> 01:34:59,270 When you wanted to say, hi, last week, or this, we used double quotes. 2041 01:34:59,270 --> 01:35:02,400 And we wrote all of the things together and used one variable, not three, 2042 01:35:02,400 --> 01:35:02,900 right? 2043 01:35:02,900 --> 01:35:06,260 When I typed in David, I didn't have a variable for D-A-V-I-D. 2044 01:35:06,260 --> 01:35:09,750 I had one variable called name that stored the whole thing. 
2045 01:35:09,750 --> 01:35:13,310 So in C, we keep talking about these things called strings. 2046 01:35:13,310 --> 01:35:17,427 We'll see, eventually, that strings are not necessarily what they seem to be. 2047 01:35:17,427 --> 01:35:19,760 But for now, the key thing about strings is that they're 2048 01:35:19,760 --> 01:35:22,070 variable length, so to speak, right? 2049 01:35:22,070 --> 01:35:25,250 They might be three characters, Hi, or five characters, David, 2050 01:35:25,250 --> 01:35:28,250 or anything smaller or larger. 2051 01:35:28,250 --> 01:35:30,980 So how do we go about implementing strings, 2052 01:35:30,980 --> 01:35:33,110 if all we have at the end of the day is my memory? 2053 01:35:33,110 --> 01:35:36,290 Well, here is an example of just creating, declaring, 2054 01:35:36,290 --> 01:35:39,650 and defining a string called s. s because it's just a simple string, 2055 01:35:39,650 --> 01:35:41,900 and quote unquote, HI!, in double quotes. 2056 01:35:41,900 --> 01:35:44,090 What does this look like in the computer's memory? 2057 01:35:44,090 --> 01:35:45,230 Well, let's clear it again. 2058 01:35:45,230 --> 01:35:48,110 And here, now, because it's technically stored in one variable, 2059 01:35:48,110 --> 01:35:50,960 s, here is how I might draw it as an artist. 2060 01:35:50,960 --> 01:35:52,520 It's three bytes in total-- 2061 01:35:52,520 --> 01:35:53,990 H-I exclamation point. 2062 01:35:53,990 --> 01:35:59,630 But there's no c1, c2, c3, it's just, the whole thing is s. 2063 01:35:59,630 --> 01:36:03,800 But it turns out that a string, fun fact, 2064 01:36:03,800 --> 01:36:06,990 is really just what underneath the hood? 2065 01:36:06,990 --> 01:36:09,610 Kind of leading up to this-- 2066 01:36:09,610 --> 01:36:12,090 what is a string, if this is how it's laid out in memory? 2067 01:36:12,090 --> 01:36:13,190 AUDIENCE: An array. 2068 01:36:13,190 --> 01:36:15,830 DAVID MALAN: Literally, it's just an array of characters. 
2069 01:36:15,830 --> 01:36:18,590 And we didn't have to know about arrays last week to use strings. 2070 01:36:18,590 --> 01:36:21,382 This is where, again, the training wheels are starting to come off. 2071 01:36:21,382 --> 01:36:23,730 But a string is just an array of characters. 2072 01:36:23,730 --> 01:36:26,040 H-I exclamation point, for instance. 2073 01:36:26,040 --> 01:36:28,370 So technically, an array-- 2074 01:36:28,370 --> 01:36:33,890 or a string called s is really a variable called s that allows you 2075 01:36:33,890 --> 01:36:38,150 to get at the first character with s[0], if you want-- s[1], s[2]. 2076 01:36:38,150 --> 01:36:40,340 You can literally get individual characters 2077 01:36:40,340 --> 01:36:43,820 just by treating s as though it's an array, which it really 2078 01:36:43,820 --> 01:36:47,000 is underneath the hood, in this case. 2079 01:36:47,000 --> 01:36:48,560 But there's a catch. 2080 01:36:48,560 --> 01:36:51,500 How do you know where strings end? 2081 01:36:51,500 --> 01:36:54,560 In the past, when I drew some integers on the screen, 2082 01:36:54,560 --> 01:36:57,080 I know, I claim they always take up 4 bytes. 2083 01:36:57,080 --> 01:37:00,200 If I had drawn a long, it always takes up 8 bytes. 2084 01:37:00,200 --> 01:37:03,530 If I had drawn a character, it always takes up 1 byte. 2085 01:37:03,530 --> 01:37:06,533 But how many bytes does a string take up? 2086 01:37:06,533 --> 01:37:08,450 Yeah, I mean, that's kind of the right answer. 2087 01:37:08,450 --> 01:37:10,490 In this case, three, it would seem. 2088 01:37:10,490 --> 01:37:13,490 But if it's David, that's a good five characters. 2089 01:37:13,490 --> 01:37:16,173 But where do we put the number three? 2090 01:37:16,173 --> 01:37:17,840 Where do you put the number five, right? 2091 01:37:17,840 --> 01:37:20,190 This is literally all that's inside your computer. 2092 01:37:20,190 --> 01:37:23,430 This is all our building blocks in front of us. 
2093 01:37:23,430 --> 01:37:25,490 So how can we-- where does the three go? 2094 01:37:25,490 --> 01:37:26,540 Where does the five go? 2095 01:37:26,540 --> 01:37:29,420 Well, it turns out you can solve this in a couple of different ways. 2096 01:37:29,420 --> 01:37:34,160 But the way humans decided to implement strings years ago is, indeed, an array, 2097 01:37:34,160 --> 01:37:38,960 but they added one extra byte at the end of every such string array, 2098 01:37:38,960 --> 01:37:41,840 just to make clear, with a so-called sentinel value, 2099 01:37:41,840 --> 01:37:44,480 that the string ends here. 2100 01:37:44,480 --> 01:37:45,050 Why? 2101 01:37:45,050 --> 01:37:47,930 So that if you have two strings in the computer's memory like, HI! 2102 01:37:47,930 --> 01:37:52,760 and bye, you know where the barrier is between the exclamation point of one 2103 01:37:52,760 --> 01:37:54,590 and the letter B in the next, right? 2104 01:37:54,590 --> 01:37:56,000 You need some kind of delimiter. 2105 01:37:56,000 --> 01:38:00,110 And so what really is underneath the hood is this. 2106 01:38:00,110 --> 01:38:04,460 When you store a string in memory, when you type in a string-- as the user, 2107 01:38:04,460 --> 01:38:07,040 if you type in 3 characters, it's going to use 2108 01:38:07,040 --> 01:38:10,280 3 plus 1 equals 4 bytes in total. 2109 01:38:10,280 --> 01:38:14,130 If you type in David, it's going to use 5 plus 1 equals 6 bytes in total. 2110 01:38:14,130 --> 01:38:14,630 Why? 2111 01:38:14,630 --> 01:38:20,210 Because C automatically adds this special 0 at the end of the string. 2112 01:38:20,210 --> 01:38:24,710 I've drawn it with backslash 0 because this is how you represent 0 as a char, 2113 01:38:24,710 --> 01:38:25,710 as a character. 2114 01:38:25,710 --> 01:38:28,230 But this is literally just 0, as we'll soon see. 
2115 01:38:28,230 --> 01:38:31,100 So any time there's a string in memory, it always takes up 2116 01:38:31,100 --> 01:38:36,197 one more byte than you, yourself, as the programmer or human typed in. 2117 01:38:36,197 --> 01:38:38,780 In fact, if we convert this again, just for discussion's sake, 2118 01:38:38,780 --> 01:38:41,572 to those integers, what's literally stored in the computer's memory 2119 01:38:41,572 --> 01:38:45,170 is going to be 72, 73, 33, and now a 0. 2120 01:38:45,170 --> 01:38:48,240 And the computer, because of C and how it was invented, 2121 01:38:48,240 --> 01:38:51,350 it's just smart enough to know that when you print out a string, 2122 01:38:51,350 --> 01:38:54,530 it prints out every character until it sees a 0, 2123 01:38:54,530 --> 01:38:56,150 and then it just stops printing. 2124 01:38:56,150 --> 01:38:58,470 In particular, printf knows how this works. 2125 01:38:58,470 --> 01:39:02,050 And this is why printf knows when to stop printing. 2126 01:39:02,050 --> 01:39:03,800 Decimal numbers are not that enlightening. 2127 01:39:03,800 --> 01:39:05,940 We'll generally write the characters like this. 2128 01:39:05,940 --> 01:39:09,350 And again, backslash 0 is just special symbology. 2129 01:39:09,350 --> 01:39:13,190 It's what the programmer types to make clear that you're not saying, HI!, 0. 2130 01:39:13,190 --> 01:39:15,980 You're saying HI!, and then it's a special 0. 2131 01:39:15,980 --> 01:39:20,887 Specifically, it is eight 0 bits that indicate 2132 01:39:20,887 --> 01:39:22,220 that it's the end of the string. 2133 01:39:22,220 --> 01:39:26,330 Technically, that backslash zero, if you want to be fancy, it's called NUL, 2134 01:39:26,330 --> 01:39:27,320 N-U-L. 2135 01:39:27,320 --> 01:39:30,320 And it turns out, you've seen this before, though we didn't call it out. 2136 01:39:30,320 --> 01:39:33,230 Here's that same ASCII chart from the past couple of weeks.
2137 01:39:33,230 --> 01:39:39,080 If I highlight this, what is decimal number 0 mapping to? 2138 01:39:39,080 --> 01:39:42,830 NUL, which is just programmer speak for the special null character. 2139 01:39:42,830 --> 01:39:46,550 All 0 bits that means the string ends here. 2140 01:39:46,550 --> 01:39:48,510 This all happens automatically for you. 2141 01:39:48,510 --> 01:39:53,420 You do not need to create these null characters or these zeros. 2142 01:39:53,420 --> 01:40:00,030 Any questions then, on this implementation thus far? 2143 01:40:00,030 --> 01:40:01,820 Any questions here? 2144 01:40:01,820 --> 01:40:02,320 No? 2145 01:40:02,320 --> 01:40:03,195 Well, let me do this. 2146 01:40:03,195 --> 01:40:05,310 Let me go back to VS Code in a second. 2147 01:40:05,310 --> 01:40:07,770 And let's actually corroborate this with some code. 2148 01:40:07,770 --> 01:40:10,830 Let me go ahead and create a small program called hi.c. 2149 01:40:10,830 --> 01:40:12,070 And how about we do this? 2150 01:40:12,070 --> 01:40:14,550 Let me include stdio.h. 2151 01:40:14,550 --> 01:40:18,670 Let me include-- let me type out int main void, as always. 2152 01:40:18,670 --> 01:40:20,910 And now let me do something simple and kind of bad, 2153 01:40:20,910 --> 01:40:24,960 but char c1 equals quote unquote, h, in single quotes. 2154 01:40:24,960 --> 01:40:28,590 Char c2 equals quote unquote, I, in single quotes. 2155 01:40:28,590 --> 01:40:32,830 And lastly, char c3 equals exclamation point, in single quotes. 2156 01:40:32,830 --> 01:40:34,500 And now, let me just print this out. 2157 01:40:34,500 --> 01:40:36,960 I can't use %s because that is not a string. 2158 01:40:36,960 --> 01:40:40,290 That's literally three chars, because that's the design decision I made. 2159 01:40:40,290 --> 01:40:41,430 But I could do this-- 2160 01:40:41,430 --> 01:40:48,600 %c, %c, %c, which we haven't seen before, but %s is string, %i is int, 2161 01:40:48,600 --> 01:40:51,060 %c is, indeed, char. 
2162 01:40:51,060 --> 01:40:54,150 So let me put a backslash n at the end for cleanliness, 2163 01:40:54,150 --> 01:40:56,280 and now do, c1, c2, c3. 2164 01:40:56,280 --> 01:41:00,430 So this is like a char-based version of printing a string. 2165 01:41:00,430 --> 01:41:01,650 So let me make hi. 2166 01:41:01,650 --> 01:41:05,880 And then let me do ./hi, and it looks like I used printf with %s. 2167 01:41:05,880 --> 01:41:09,750 But I did things very manually by printing out each individual character. 2168 01:41:09,750 --> 01:41:11,700 What's cool now, though, is that once you 2169 01:41:11,700 --> 01:41:15,270 know that characters are just numbers and strings are just characters, 2170 01:41:15,270 --> 01:41:16,560 you can kind of poke around. 2171 01:41:16,560 --> 01:41:21,970 Let me change all three placeholders to %i instead. 2172 01:41:21,970 --> 01:41:23,860 And this is totally fine, too. 2173 01:41:23,860 --> 01:41:26,310 Let me rerun this, make hi. 2174 01:41:26,310 --> 01:41:31,570 Actually, let me make one change, just so we can see this. 2175 01:41:31,570 --> 01:41:37,710 Let me add spaces, just for aesthetics' sake, let me do make hi, ./hi, Enter, 2176 01:41:37,710 --> 01:41:40,350 and voila, like now, you can actually see the numbers, 2177 01:41:40,350 --> 01:41:44,085 that I claimed back in week zero, were in fact happening underneath the hood. 2178 01:41:44,085 --> 01:41:45,960 Well, this is not how you would make strings. 2179 01:41:45,960 --> 01:41:49,457 It'd be incredibly tedious to have three variables for three letter words, five 2180 01:41:49,457 --> 01:41:50,790 variables for five letter words. 2181 01:41:50,790 --> 01:41:52,998 We've been using, of course, strings since last week, 2182 01:41:52,998 --> 01:41:54,450 so let's do that instead. 2183 01:41:54,450 --> 01:41:59,370 String s equals quote unquote, double quotes "HI!"
2184 01:41:59,370 --> 01:42:02,520 For this, no, because of these training wheels, 2185 01:42:02,520 --> 01:42:04,560 I need to include the CS50 library. 2186 01:42:04,560 --> 01:42:06,580 But we'll come back to that in the coming weeks. 2187 01:42:06,580 --> 01:42:10,530 But for now, I'm going to go ahead and create a string s called quote unquote, 2188 01:42:10,530 --> 01:42:11,580 "HI!" 2189 01:42:11,580 --> 01:42:14,760 And now I'm going to change this to be my familiar %s, 2190 01:42:14,760 --> 01:42:17,610 and now just print out s itself. 2191 01:42:17,610 --> 01:42:20,430 This, of course, is the same thing as last week, ./hi, 2192 01:42:20,430 --> 01:42:24,750 gives me the exact same thing, but now, we're dealing, of course, with strings. 2193 01:42:24,750 --> 01:42:27,610 But how can we see a little beyond that? 2194 01:42:27,610 --> 01:42:28,810 Well, how about this? 2195 01:42:28,810 --> 01:42:31,530 Let's poke around further with today's primitives. 2196 01:42:31,530 --> 01:42:35,580 Even though s is a string, I could technically print out its first 2197 01:42:35,580 --> 01:42:39,000 character with %c by doing s[0]. 2198 01:42:39,000 --> 01:42:43,110 I could technically print out its second character with %c by doing s[1]. 2199 01:42:43,110 --> 01:42:47,820 I could print out its third character with %c and printing out s[2]. 2200 01:42:47,820 --> 01:42:50,430 So again, this just derives logically from my understanding 2201 01:42:50,430 --> 01:42:52,770 now that strings are arrays, as you note. 2202 01:42:52,770 --> 01:42:54,540 Let me do make-- 2203 01:42:54,540 --> 01:42:57,300 let me do make hi, ./hi. 2204 01:42:57,300 --> 01:43:00,760 And no visual change, but I'm just kind of now tinkering around. 2205 01:43:00,760 --> 01:43:03,400 And in fact, if you're really curious, let me do this. 2206 01:43:03,400 --> 01:43:06,870 Let me change these back to i, back to i-- 2207 01:43:06,870 --> 01:43:08,250 oops, back to i. 
2208 01:43:08,250 --> 01:43:11,310 And let me add a fourth one because if I'm really curious now, 2209 01:43:11,310 --> 01:43:14,490 let's see what's in s[3]. 2210 01:43:14,490 --> 01:43:16,020 This is the fourth byte. 2211 01:43:16,020 --> 01:43:18,990 And even though the string itself is H-I, 2212 01:43:18,990 --> 01:43:21,840 I think we can corroborate this whole null thing. 2213 01:43:21,840 --> 01:43:26,248 Make hi, ./hi, Enter, and there it is. 2214 01:43:26,248 --> 01:43:28,290 You could have done this last week, if you really 2215 01:43:28,290 --> 01:43:29,580 wanted to geek out on strings. 2216 01:43:29,580 --> 01:43:33,060 But for now, it's just revealing what's going on underneath the hood. 2217 01:43:33,060 --> 01:43:36,480 Questions then, on what these strings are? 2218 01:43:36,480 --> 01:43:37,498 Yeah? 2219 01:43:37,498 --> 01:43:41,293 AUDIENCE: [INAUDIBLE] 2220 01:43:41,293 --> 01:43:42,960 DAVID MALAN: Why do we need the bracket? 2221 01:43:42,960 --> 01:43:45,430 AUDIENCE: [INAUDIBLE] 2222 01:43:45,430 --> 01:43:47,180 DAVID MALAN: Why do you not need brackets? 2223 01:43:47,180 --> 01:43:47,780 Good question. 2224 01:43:47,780 --> 01:43:51,620 Why do I not need brackets on line 6? 2225 01:43:51,620 --> 01:43:53,300 Because s is a string. 2226 01:43:53,300 --> 01:43:56,930 We'll see in a couple of weeks that s is, essentially, 2227 01:43:56,930 --> 01:44:00,200 implemented underneath the hood, indeed, as an array, 2228 01:44:00,200 --> 01:44:02,240 but that happens automatically for you. 2229 01:44:02,240 --> 01:44:06,800 You can treat s as just a variable name without square brackets. 2230 01:44:06,800 --> 01:44:09,500 You will use square brackets when you have arrays of ints 2231 01:44:09,500 --> 01:44:13,730 or you manually create arrays of chars or doubles or floats or anything else. 2232 01:44:13,730 --> 01:44:14,900 But strings are special. 2233 01:44:14,900 --> 01:44:15,440 Why? 
2234 01:44:15,440 --> 01:44:19,190 I mean, every program you write seems to use strings, text in some form. 2235 01:44:19,190 --> 01:44:21,930 We're humans we like text, not just numbers and such. 2236 01:44:21,930 --> 01:44:25,910 So this is just treated a little specially in C and many other languages 2237 01:44:25,910 --> 01:44:28,580 as well. 2238 01:44:28,580 --> 01:44:31,170 Other questions on this here? 2239 01:44:31,170 --> 01:44:31,670 No? 2240 01:44:31,670 --> 01:44:33,530 Let's add then, one other string to the mix. 2241 01:44:33,530 --> 01:44:36,290 So instead of just saying, HI!, why don't we consider a version 2242 01:44:36,290 --> 01:44:38,660 of the program that says both, HI! and BYE!. 2243 01:44:38,660 --> 01:44:41,420 And I claim now that that backslash zero, 2244 01:44:41,420 --> 01:44:44,270 that null character is going to be ever more important now 2245 01:44:44,270 --> 01:44:46,820 if we've got two strings in memory, so that C knows 2246 01:44:46,820 --> 01:44:48,570 how to distinguish one from the other. 2247 01:44:48,570 --> 01:44:51,487 So let me go ahead and just get rid of these two lines for the moment. 2248 01:44:51,487 --> 01:44:55,430 Let me recreate string s equals, quote unquote double quotes, "HI!" 2249 01:44:55,430 --> 01:44:56,780 Let me give myself another one. 2250 01:44:56,780 --> 01:44:59,905 And because I'm just playing around, I'll choose very short variable names. 2251 01:44:59,905 --> 01:45:04,410 String t equals quote unquote, "BYE!" 2252 01:45:04,410 --> 01:45:06,470 And then let me just print them both out. 2253 01:45:06,470 --> 01:45:11,300 Let me go ahead and print out %s, backslash n, comma s, 2254 01:45:11,300 --> 01:45:16,910 and then printf %s backslash n, and then t. 2255 01:45:16,910 --> 01:45:19,970 So very simple demonstration of just these two variables. 2256 01:45:19,970 --> 01:45:26,090 Make hi, ./hi, and of course, it prints out two lines, one after the other. 
2257 01:45:26,090 --> 01:45:27,980 What's actually going on underneath the hood? 2258 01:45:27,980 --> 01:45:29,510 Well, let's go back to the computer's memory. 2259 01:45:29,510 --> 01:45:32,160 HI!, I think, is going to be, I claim, pretty much the same. 2260 01:45:32,160 --> 01:45:36,170 So s, I'll claim, is in the top left, followed by the backslash zero. 2261 01:45:36,170 --> 01:45:40,035 And that's important now because BYE! probably is going to end up there. 2262 01:45:40,035 --> 01:45:43,160 And visually, it wraps just by nature of how I've drawn this grid of bytes, 2263 01:45:43,160 --> 01:45:44,330 but it's contiguous. 2264 01:45:44,330 --> 01:45:46,340 B-Y-E-! 2265 01:45:46,340 --> 01:45:51,470 null, A.K.A. backslash zero, this is now helpful to printf 2266 01:45:51,470 --> 01:45:55,550 because now printf knows where one begins and ends 2267 01:45:55,550 --> 01:45:58,580 by way of that special null character. 2268 01:45:58,580 --> 01:46:00,230 But we can poke around now, too. 2269 01:46:00,230 --> 01:46:01,620 What else can I do here? 2270 01:46:01,620 --> 01:46:02,840 How about this? 2271 01:46:02,840 --> 01:46:08,870 How about I go into my code here, back to VS code, and let me go ahead 2272 01:46:08,870 --> 01:46:13,790 and say something like, well, if I've got two of these strings, 2273 01:46:13,790 --> 01:46:15,410 you know, let's put them in an array. 2274 01:46:15,410 --> 01:46:20,520 Let's kind of do this sort of arrays in arrays, sort of inception-style here. 2275 01:46:20,520 --> 01:46:23,060 So string words[2]. 2276 01:46:23,060 --> 01:46:25,100 So give me an array of two strings is what 2277 01:46:25,100 --> 01:46:28,100 I'm saying here in code, even though we've not done it with strings yet. 2278 01:46:28,100 --> 01:46:29,270 We only did it with ints. 2279 01:46:29,270 --> 01:46:30,770 And now let me do this. 2280 01:46:30,770 --> 01:46:35,480 The first word A.K.A. words[0] will equal, as before, HI! 
2281 01:46:35,480 --> 01:46:40,940 And now words[1] will equal quote unquote, "BYE!" 2282 01:46:40,940 --> 01:46:43,760 And now I've done the exact same thing, but again, I'm 2283 01:46:43,760 --> 01:46:48,650 just avoiding having s, t, q, r, and all these different variables in my code. 2284 01:46:48,650 --> 01:46:52,790 I just now am treating them as one single array of strings. 2285 01:46:52,790 --> 01:46:54,750 How do I change my code down here? 2286 01:46:54,750 --> 01:46:57,380 Well, if I want to print the first word, I do words[0]. 2287 01:46:57,380 --> 01:46:59,900 And if I want to print the second word, I do words[1]. 2288 01:46:59,900 --> 01:47:02,088 This is not a useful exercise at the moment 2289 01:47:02,088 --> 01:47:04,130 because I'm just making my code more complicated. 2290 01:47:04,130 --> 01:47:06,830 But again, it allows us to poke around and see what's 2291 01:47:06,830 --> 01:47:08,690 going on because there is that HI! 2292 01:47:08,690 --> 01:47:09,530 and BYE!. 2293 01:47:09,530 --> 01:47:10,700 But watch this. 2294 01:47:10,700 --> 01:47:14,670 If I really want to be cool, I can do this. 2295 01:47:14,670 --> 01:47:24,380 Let's print out %c, %c, %c, backslash n, and then here, %c, %c, %c, %c, 2296 01:47:24,380 --> 01:47:25,700 so four of those. 2297 01:47:25,700 --> 01:47:28,430 And now here's where things get interesting. 2298 01:47:28,430 --> 01:47:30,620 Words is an array of strings. 2299 01:47:30,620 --> 01:47:33,400 Again, if I may, what's a string? 2300 01:47:33,400 --> 01:47:35,060 An array of characters. 2301 01:47:35,060 --> 01:47:36,790 So just use the same logic. 2302 01:47:36,790 --> 01:47:41,110 If words is an array of strings, you get at the first string with words[0]. 2303 01:47:41,110 --> 01:47:44,530 How do you get at the first character in the first string? 2304 01:47:44,530 --> 01:47:52,150 With another bracket 0: words[0][0], then words[0][1], and lastly, words[0][2].
2305 01:47:52,150 --> 01:47:57,460 And now down here, words[1], but the first character is there. 2306 01:47:57,460 --> 01:48:00,400 Words[1], the second character is here. 2307 01:48:00,400 --> 01:48:03,190 Words[1], the third character is here-- 2308 01:48:03,190 --> 01:48:04,720 whoops-- third character's here. 2309 01:48:04,720 --> 01:48:07,898 And words[1], the fourth character is here. 2310 01:48:07,898 --> 01:48:09,190 This is not how people program. 2311 01:48:09,190 --> 01:48:10,840 This is only for demonstration's sake. 2312 01:48:10,840 --> 01:48:13,060 My God, it's so tedious and verbose already. 2313 01:48:13,060 --> 01:48:20,410 But if I make hi now, ./hi, now, I'm manually reinventing %s, 2314 01:48:20,410 --> 01:48:22,990 if I forgot it existed, using %c alone. 2315 01:48:22,990 --> 01:48:25,900 But you can indeed manipulate arrays in this way. 2316 01:48:25,900 --> 01:48:28,300 But because strings are arrays of characters, 2317 01:48:28,300 --> 01:48:32,200 you can manipulate strings in this way too. 2318 01:48:32,200 --> 01:48:34,675 Any question now on this syntax? 2319 01:48:34,675 --> 01:48:37,210 2320 01:48:37,210 --> 01:48:38,800 Any questions here? 2321 01:48:38,800 --> 01:48:39,460 No? 2322 01:48:39,460 --> 01:48:39,970 No? 2323 01:48:39,970 --> 01:48:42,070 All right, well, let's go ahead and propose 2324 01:48:42,070 --> 01:48:45,830 that we solve a couple of other problems we might not have as before. 2325 01:48:45,830 --> 01:48:49,150 But first, a quick visual of what's been going on underneath the hood here. 2326 01:48:49,150 --> 01:48:52,420 If here, again, is where we left off on the screen, HI! and BYE! 2327 01:48:52,420 --> 01:48:56,470 back to back, here is really how I just treated these things. 2328 01:48:56,470 --> 01:49:00,880 s bracket 0, 1, 2, 3 and then t 0, 1, 2, 3, 4. 2329 01:49:00,880 --> 01:49:04,840 But really, once I put them in an array, the picture becomes this.
2330 01:49:04,840 --> 01:49:07,030 Words[0] is the whole HI!. 2331 01:49:07,030 --> 01:49:08,680 Words[1] is the whole BYE!. 2332 01:49:08,680 --> 01:49:11,470 But if I really get into the weeds and start indexing 2333 01:49:11,470 --> 01:49:14,980 into individual characters in those strings, all I'm using 2334 01:49:14,980 --> 01:49:20,710 is new syntax in order to represent these same values here. 2335 01:49:20,710 --> 01:49:28,710 Questions then, on these representations before we forge ahead? 2336 01:49:28,710 --> 01:49:29,430 No? 2337 01:49:29,430 --> 01:49:30,030 Yeah? 2338 01:49:30,030 --> 01:49:33,390 AUDIENCE: Does the new line character not [INAUDIBLE]?? 2339 01:49:33,390 --> 01:49:36,030 DAVID MALAN: Does the new line character-- say that once more? 2340 01:49:36,030 --> 01:49:38,597 AUDIENCE: Does the new line character take up any space? 2341 01:49:38,597 --> 01:49:40,180 DAVID MALAN: Ah, really good question. 2342 01:49:40,180 --> 01:49:42,730 Does the new line character take up any space? 2343 01:49:42,730 --> 01:49:45,340 It does, so far as printf is concerned. 2344 01:49:45,340 --> 01:49:48,790 But I'm not storing the backslash n in my strings, 2345 01:49:48,790 --> 01:49:53,460 printf is being manually handed that thing instead. 2346 01:49:53,460 --> 01:49:55,520 All right, so let's go ahead then and consider 2347 01:49:55,520 --> 01:49:58,970 how we might solve some problems that have arisen now with these strings, 2348 01:49:58,970 --> 01:50:00,680 as follows here. 2349 01:50:00,680 --> 01:50:02,760 Suppose I-- let's do this. 2350 01:50:02,760 --> 01:50:04,400 Let me go back to VS Code here. 2351 01:50:04,400 --> 01:50:09,980 And let me go ahead and open up a new file called, how about, length.c. 2352 01:50:09,980 --> 01:50:12,680 And let's consider for a moment how I might actually figure out 2353 01:50:12,680 --> 01:50:16,130 what the length of a string is, which is distinct from the length of an array. 
2354 01:50:16,130 --> 01:50:19,680 I claimed earlier, you cannot figure out dynamically what the length of an array 2355 01:50:19,680 --> 01:50:20,180 is. 2356 01:50:20,180 --> 01:50:24,020 But I can figure out the length of a string, specifically, because 2357 01:50:24,020 --> 01:50:26,960 of this implementation detail of that null character. 2358 01:50:26,960 --> 01:50:28,500 So let me go ahead and do this. 2359 01:50:28,500 --> 01:50:31,940 Let me include cs50.h in this second program here. 2360 01:50:31,940 --> 01:50:35,090 Let me include stdio.h, as before. 2361 01:50:35,090 --> 01:50:38,120 And let me do this, int main void-- 2362 01:50:38,120 --> 01:50:40,970 and the first thing I'll do is just get a string from the user. 2363 01:50:40,970 --> 01:50:43,250 I'll ask the user, as always, for their name. 2364 01:50:43,250 --> 01:50:48,170 So I'll call get_string, and say, what's your name, question mark, as always. 2365 01:50:48,170 --> 01:50:51,620 And then down here, if I want to figure out the length of this string 2366 01:50:51,620 --> 01:50:56,210 and print the length out on the screen, well, I 2367 01:50:56,210 --> 01:50:58,465 can kind of do this similar in spirit to the average, 2368 01:50:58,465 --> 01:50:59,840 where I'm accumulating something. 2369 01:50:59,840 --> 01:51:02,600 Let me go ahead and initialize n to 0. 2370 01:51:02,600 --> 01:51:05,120 Let me give myself-- 2371 01:51:05,120 --> 01:51:07,035 it's not a for loop because I don't have a-- 2372 01:51:07,035 --> 01:51:08,660 I don't know in advance how long it is. 2373 01:51:08,660 --> 01:51:09,980 But what if I do this? 2374 01:51:09,980 --> 01:51:20,600 While the value at name[n] does not equal '\0'-- 2375 01:51:20,600 --> 01:51:23,390 crazy syntax at the moment, but it's just the culmination 2376 01:51:23,390 --> 01:51:25,590 of these various building blocks. 2377 01:51:25,590 --> 01:51:28,970 Let me just finish the thought here, n++.
2378 01:51:28,970 --> 01:51:33,656 And then down here, let's just print out, with printf and %i, 2379 01:51:33,656 --> 01:51:38,930 that value of n. So I claim this is going to show me the length of any 2380 01:51:38,930 --> 01:51:43,220 string I type in, whether it's hi or bye or David or anything else. 2381 01:51:43,220 --> 01:51:45,410 I initialize a variable to zero, and that's good 2382 01:51:45,410 --> 01:51:47,535 because that's where you start counting in general. 2383 01:51:47,535 --> 01:51:50,990 While name[n] does not equal backslash zero. 2384 01:51:50,990 --> 01:51:51,930 What is this saying? 2385 01:51:51,930 --> 01:51:55,580 Well, if name is the string the user typed in-- and name is just an array, 2386 01:51:55,580 --> 01:51:56,460 as you noted-- 2387 01:51:56,460 --> 01:51:59,390 then name[0] is going to be the first character. 2388 01:51:59,390 --> 01:52:02,750 And I'm asking the question, well, does the first character not equal 2389 01:52:02,750 --> 01:52:03,680 backslash zero? 2390 01:52:03,680 --> 01:52:08,750 And if I type in David, D, it's not, so I keep going and I add 1 to n. 2391 01:52:08,750 --> 01:52:10,750 Then I'm going to check name[1]. 2392 01:52:10,750 --> 01:52:13,895 Well, if I typed in David, name[1] is going to be A. 2393 01:52:13,895 --> 01:52:18,020 A does not equal backslash zero, and so it's going to go again and again 2394 01:52:18,020 --> 01:52:18,740 and again. 2395 01:52:18,740 --> 01:52:23,090 But five steps in total later, it's going to get to the byte after 2396 01:52:23,090 --> 01:52:26,480 D-A-V-I-D, realize, wait a minute, that is a backslash zero. 2397 01:52:26,480 --> 01:52:29,750 The loop finishes, and I print out the total length. 2398 01:52:29,750 --> 01:52:33,050 Arrays, in general, do not have this null character. 2399 01:52:33,050 --> 01:52:34,910 However, strings do.
2400 01:52:34,910 --> 01:52:38,150 Again, strings are special versus all of the other data types 2401 01:52:38,150 --> 01:52:39,590 we've talked about thus far. 2402 01:52:39,590 --> 01:52:43,220 But how could I, for instance, do this differently? 2403 01:52:43,220 --> 01:52:47,220 Well, let's actually factor this out as a function, as I've commonly done. 2404 01:52:47,220 --> 01:52:50,540 But rather than implement it myself, you know what? 2405 01:52:50,540 --> 01:52:54,140 It turns out what's nice about strings being so common, 2406 01:52:54,140 --> 01:52:57,260 there are many other people who have solved these problems before. 2407 01:52:57,260 --> 01:53:00,290 And in fact, there's a whole string library in C. 2408 01:53:00,290 --> 01:53:04,190 It is used by way of a header file called string.h. 2409 01:53:04,190 --> 01:53:08,400 And what string.h is, is a library of string-related functions. 2410 01:53:08,400 --> 01:53:10,760 In fact, you can see in CS50's manual pages 2411 01:53:10,760 --> 01:53:16,217 for C, the string.h functions, at least those that we recommend as most useful, 2412 01:53:16,217 --> 01:53:18,050 and in particular, if you poke around there, 2413 01:53:18,050 --> 01:53:20,290 you'll see that there's a function called strlen. 2414 01:53:20,290 --> 01:53:22,055 It means string length. 2415 01:53:22,055 --> 01:53:24,680 It was named very succinctly, just because it's a little easier 2416 01:53:24,680 --> 01:53:25,850 to type than string length. 2417 01:53:25,850 --> 01:53:28,800 But strlen tells you the length of a string. 2418 01:53:28,800 --> 01:53:30,990 So how might I use this in my code here? 2419 01:53:30,990 --> 01:53:34,020 Well, it turns out, I can simplify this quite a bit. 2420 01:53:34,020 --> 01:53:37,700 Let me get rid of my loop, get rid of my counting 2421 01:53:37,700 --> 01:53:40,880 manually, and do something like this-- int n 2422 01:53:40,880 --> 01:53:45,630 equals strlen of the human's name, name.
2423 01:53:45,630 --> 01:53:49,430 And now I'll just use printf, as before, with %i backslash n, 2424 01:53:49,430 --> 01:53:51,290 and output the value of n. 2425 01:53:51,290 --> 01:53:54,380 But there's a bug at the moment. 2426 01:53:54,380 --> 01:53:58,480 What have I forgotten to do? 2427 01:53:58,480 --> 01:54:01,670 Yeah, I have to include the header file at the top of the screen, 2428 01:54:01,670 --> 01:54:03,260 so let me-- at the top of the code. 2429 01:54:03,260 --> 01:54:07,640 So let me also include string.h at the top of my file, 2430 01:54:07,640 --> 01:54:10,970 so that C knows that, in fact, strlen exists. 2431 01:54:10,970 --> 01:54:14,170 Let me go ahead and make length, as before. 2432 01:54:14,170 --> 01:54:18,670 ./length-- or actually, really for the first time, what's your name? 2433 01:54:18,670 --> 01:54:22,360 D-A-V-I-D. And hopefully, I'm going to see, in fact, 5. 2434 01:54:22,360 --> 01:54:26,950 By contrast, if I run it again and type in HI!, now I see three. 2435 01:54:26,950 --> 01:54:29,785 So strlen is just one of the functions in that library. 2436 01:54:29,785 --> 01:54:30,910 And there are so many more. 2437 01:54:30,910 --> 01:54:33,700 In fact, yet another library that might be useful moving forward 2438 01:54:33,700 --> 01:54:37,570 is this one, ctype, which relates to C data 2439 01:54:37,570 --> 01:54:40,580 types and lots of functions therein that can be useful. 2440 01:54:40,580 --> 01:54:43,690 For instance, if you review its documentation in the manual pages 2441 01:54:43,690 --> 01:54:46,930 online, you'll see that there are functions via which 2442 01:54:46,930 --> 01:54:49,460 we can solve problems like this. 2443 01:54:49,460 --> 01:54:52,480 Let me go ahead and propose here-- 2444 01:54:52,480 --> 01:54:53,680 let me see. 
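For comparison with the hand-written loop, here is a minimal sketch of the strlen version of the length program. The helper name `print_length` is an assumption for illustration; the lecture does this work directly in main after reading the name from the user.

```c
#include <stdio.h>
#include <string.h>   // declares strlen; omitting this is the bug noted above

// Print the length of name on its own line and return it.
// The name print_length is hypothetical, chosen for this sketch.
int print_length(const char *name)
{
    // strlen returns a size_t; the cast mirrors the lecture's `int n = strlen(name)`.
    int n = (int) strlen(name);
    printf("%i\n", n);
    return n;
}
```

The only library pieces used are `strlen` from string.h and `printf` from stdio.h, the same two the lecture relies on.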
2445 01:54:53,680 --> 01:54:59,080 Let's do an example here involving-- 2446 01:54:59,080 --> 01:55:03,250 how about checking if something is uppercase or lowercase, 2447 01:55:03,250 --> 01:55:06,700 and converting it to uppercase only. 2448 01:55:06,700 --> 01:55:10,810 Let me go back to VS Code, and code a program called uppercase.c. 2449 01:55:10,810 --> 01:55:15,220 In this file, I'm going to start by including now, as always, cs50.h. 2450 01:55:15,220 --> 01:55:17,710 I'm going to include stdio.h. 2451 01:55:17,710 --> 01:55:21,670 And I'm going to add one other to the mix, which 2452 01:55:21,670 --> 01:55:26,230 is string.h now too, so I can access the length of things as needed. 2453 01:55:26,230 --> 01:55:28,570 Int main void comes next. 2454 01:55:28,570 --> 01:55:30,460 And then within my main function, I'm going 2455 01:55:30,460 --> 01:55:32,230 to go ahead and declare a string called s. 2456 01:55:32,230 --> 01:55:34,240 I'm going to call getString, as before. 2457 01:55:34,240 --> 01:55:38,170 And I'm going to go ahead and just ask the user for a string called before. 2458 01:55:38,170 --> 01:55:39,670 I want to do a before and after. 2459 01:55:39,670 --> 01:55:41,350 Whatever the user types in is before. 2460 01:55:41,350 --> 01:55:44,770 But I want to force everything to uppercase, thereafter. 2461 01:55:44,770 --> 01:55:48,740 Let me now, in this loop here, do this. 2462 01:55:48,740 --> 01:55:53,800 Let me printf quote unquote, "After," just so we can see this on the screen. 2463 01:55:53,800 --> 01:56:02,440 And let me do for int i gets 0, i is less than strlen of s, i++. 2464 01:56:02,440 --> 01:56:03,610 What am I about to do? 2465 01:56:03,610 --> 01:56:06,190 I'm about to iterate over every character in the string 2466 01:56:06,190 --> 01:56:11,230 from left to right, from 0 on up to, but not through, the length of s.
2467 01:56:11,230 --> 01:56:13,990 And how do I check if something is lowercase, 2468 01:56:13,990 --> 01:56:16,990 so that I can actually force it to uppercase? 2469 01:56:16,990 --> 01:56:19,630 Well, it turns out, I could do this literally. 2470 01:56:19,630 --> 01:56:27,436 If the character in s at location i is greater than or equal to little a, 2471 01:56:27,436 --> 01:56:31,780 ampersand, ampersand, which means and instead of or, which we saw 2472 01:56:31,780 --> 01:56:37,930 in the past, s[i] is less than or equal to little z, that means, 2473 01:56:37,930 --> 01:56:41,800 logically in English, that this is indeed lowercase. 2474 01:56:41,800 --> 01:56:44,830 How do I now convert it to uppercase, this character? 2475 01:56:44,830 --> 01:56:48,160 Well, I could just literally print out the same character. 2476 01:56:48,160 --> 01:56:52,280 But that would not be the answer here because that's not changing the value. 2477 01:56:52,280 --> 01:56:54,470 But what could I do instead? 2478 01:56:54,470 --> 01:56:59,890 Well, let me actually pull up here real fast the ASCII chart as before, 2479 01:56:59,890 --> 01:57:03,220 and let's see if we can't glean some insight. 2480 01:57:03,220 --> 01:57:05,710 If I pull up the same ASCII chart, and suppose 2481 01:57:05,710 --> 01:57:09,790 the human has typed in a lowercase a, that's 97. 2482 01:57:09,790 --> 01:57:13,240 What letter-- I want to convert it to uppercase 2483 01:57:13,240 --> 01:57:18,660 A, so what number do I want to convert the 97 to, per week zero? 2484 01:57:18,660 --> 01:57:21,000 So 65, we keep coming back to that one. 2485 01:57:21,000 --> 01:57:23,010 What if the user types in lowercase b? 2486 01:57:23,010 --> 01:57:27,550 I want to change the 98 value to 66, and so forth. 2487 01:57:27,550 --> 01:57:30,130 And any quick math, how far apart are those?
2488 01:57:30,130 --> 01:57:33,120 So it's always 32, like uppercase to lowercase 2489 01:57:33,120 --> 01:57:37,990 is always, wonderfully, good design, 32 away, one from the other. 2490 01:57:37,990 --> 01:57:39,100 So what does this mean? 2491 01:57:39,100 --> 01:57:41,350 Well, I think we saw earlier that underneath the hood, 2492 01:57:41,350 --> 01:57:42,600 a char is just a number. 2493 01:57:42,600 --> 01:57:44,340 You can certainly do arithmetic on it. 2494 01:57:44,340 --> 01:57:46,507 And here, again, if you understand these lower level 2495 01:57:46,507 --> 01:57:48,180 primitives, what if I do this? 2496 01:57:48,180 --> 01:57:53,940 Whatever s[i] is, if I know on line 13 that it's lowercase, 2497 01:57:53,940 --> 01:57:57,048 do I want to add or subtract 32? 2498 01:57:57,048 --> 01:57:57,840 AUDIENCE: Subtract. 2499 01:57:57,840 --> 01:58:01,910 DAVID MALAN: So I want to subtract because I want to go from like 97 to 65 2500 01:58:01,910 --> 01:58:06,560 or 98 to 66, so indeed, if you do some quick math, that gives you 32. 2501 01:58:06,560 --> 01:58:10,970 So it suffices to just treat chars as numbers, subtract the 32, 2502 01:58:10,970 --> 01:58:16,370 and printing it with %c, I think, will just convert lowercase to uppercase. 2503 01:58:16,370 --> 01:58:19,795 If you now fast forward to the real world, Microsoft Word or Google Docs, 2504 01:58:19,795 --> 01:58:22,670 if you've ever chosen the menu option that forces things to uppercase 2505 01:58:22,670 --> 01:58:24,980 or lowercase on occasion, literally, that's 2506 01:58:24,980 --> 01:58:26,480 what Microsoft and Google have done. 2507 01:58:26,480 --> 01:58:29,605 They iterate over every character in the document, check if it's lowercase, 2508 01:58:29,605 --> 01:58:33,810 and if so, they subtract 32 from it and show you the new value. 2509 01:58:33,810 --> 01:58:36,650 What if, though, it is not a lowercase letter?
2510 01:58:36,650 --> 01:58:40,520 I think I can keep it easy and just print out the current letter unchanged, 2511 01:58:40,520 --> 01:58:44,850 if my goal is to simply force things to all uppercase, and that letter, 2512 01:58:44,850 --> 01:58:46,490 then would be s[i]. 2513 01:58:46,490 --> 01:58:50,750 So let me go ahead now and make uppercase, hopefully, no errors. 2514 01:58:50,750 --> 01:58:55,670 ./uppercase, and I'll now type in David with an uppercase D, 2515 01:58:55,670 --> 01:58:57,120 but lowercase everything else. 2516 01:58:57,120 --> 01:59:00,020 But now the after version is DAVID-- 2517 01:59:00,020 --> 01:59:01,190 an aesthetic bug. 2518 01:59:01,190 --> 01:59:04,400 Notice here, I forgot to include, just for prettiness sake, 2519 01:59:04,400 --> 01:59:05,930 a backslash n at the end. 2520 01:59:05,930 --> 01:59:07,640 No problem, I'll add that. 2521 01:59:07,640 --> 01:59:08,870 Let me fix my mistake. 2522 01:59:08,870 --> 01:59:12,050 Make uppercase, ./uppercase, Enter. 2523 01:59:12,050 --> 01:59:14,240 D-A-V-I-D, Enter, and voila. 2524 01:59:14,240 --> 01:59:16,820 And I deliberately added another space after, 2525 01:59:16,820 --> 01:59:19,130 just so they would line up pretty, even though before 2526 01:59:19,130 --> 01:59:22,070 and after have different numbers of letters. 2527 01:59:22,070 --> 01:59:25,630 Questions then, on this implementation of forcing something 2528 01:59:25,630 --> 01:59:28,380 to uppercase, which in and of itself is not all that enlightening, 2529 01:59:28,380 --> 01:59:33,990 but is representative now of how you can leverage these low level primitives. 2530 01:59:33,990 --> 01:59:35,880 Question? 2531 01:59:35,880 --> 01:59:36,380 No? 2532 01:59:36,380 --> 01:59:38,633 All right, well, this honestly is tedious. 2533 01:59:38,633 --> 01:59:40,550 My God, like does Microsoft, Google, everyone, 2534 01:59:40,550 --> 01:59:43,550 you have to literally write out this code just to do something simple? 
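The manual approach walked through above (check for lowercase, subtract 32, otherwise leave the character alone) can be sketched as follows. This is a sketch under stated assumptions: the lecture prints each character with %c inside main, whereas this version writes into a caller-supplied buffer so the result can be inspected, and the name `force_upper` and the `out` parameter are hypothetical.

```c
#include <string.h>

// Write an uppercased copy of s into out.
// Assumption: out has room for strlen(s) + 1 bytes.
void force_upper(const char *s, char *out)
{
    size_t i;
    // strlen is called in the condition here, as in the lecture at this point.
    for (i = 0; i < strlen(s); i++)
    {
        if (s[i] >= 'a' && s[i] <= 'z')
        {
            // In ASCII, 'a' is 97 and 'A' is 65, so lowercase is 32 above uppercase.
            out[i] = (char) (s[i] - 32);
        }
        else
        {
            out[i] = s[i];   // not lowercase: copy unchanged
        }
    }
    out[i] = '\0';           // terminate the copy so it is a valid string
}
```

For example, with `char after[16];`, calling `force_upper("David", after)` leaves the five uppercase letters followed by the null byte in `after`.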
2535 01:59:43,550 --> 01:59:46,310 Well, no, that's, again, why we have things like libraries. 2536 01:59:46,310 --> 01:59:49,220 And increasingly now, for problem sets, projects, and beyond, 2537 01:59:49,220 --> 01:59:52,040 well, you just use libraries more often off-the-shelf 2538 01:59:52,040 --> 01:59:55,940 so as to solve problems that, surely, other people have had before you. 2539 01:59:55,940 --> 01:59:59,570 So how can I now use this library, ctype.h? 2540 01:59:59,570 --> 02:00:01,320 Well, let me go back into my code. 2541 02:00:01,320 --> 02:00:05,090 Let me include this among my header files here. 2542 02:00:05,090 --> 02:00:08,030 Just so I can skim things easily, I tend to alphabetize my headers. 2543 02:00:08,030 --> 02:00:11,238 But that's not strictly necessary, but it allows me, at a glance, to realize, 2544 02:00:11,238 --> 02:00:13,400 did I or did I not include something I need? 2545 02:00:13,400 --> 02:00:15,570 Now, let me go ahead and do this. 2546 02:00:15,570 --> 02:00:20,390 It turns out if you read the documentation for the ctype library, 2547 02:00:20,390 --> 02:00:24,710 there's a function, wonderfully called, if islower, 2548 02:00:24,710 --> 02:00:28,910 that takes in a character as its argument, essentially, so s[i]. 2549 02:00:28,910 --> 02:00:32,182 And if that returns true, a Boolean value, if you will, 2550 02:00:32,182 --> 02:00:33,890 well, I'm going to force it to uppercase. 2551 02:00:33,890 --> 02:00:36,560 But I don't have to do this math anymore. 2552 02:00:36,560 --> 02:00:40,610 Turns out, in the ctype library, there's also a function called toupper 2553 02:00:40,610 --> 02:00:43,130 that takes a character as input, like s[i], 2554 02:00:43,130 --> 02:00:45,060 and it just does the math for you. 2555 02:00:45,060 --> 02:00:47,270 So that you can abstract away the 32 thing, 2556 02:00:47,270 --> 02:00:50,400 and just know that someone else has solved that problem for you.
2557 02:00:50,400 --> 02:00:53,030 Otherwise, I can leave my code unchanged down below 2558 02:00:53,030 --> 02:00:55,200 because I'm not changing anything else. 2559 02:00:55,200 --> 02:01:00,410 So if I do make uppercase now, and then ./uppercase, D-a-v-i-d, 2560 02:01:00,410 --> 02:01:03,710 with just a capital D, and now it still works. 2561 02:01:03,710 --> 02:01:06,890 But if you read the documentation further, it turns out that toupper 2562 02:01:06,890 --> 02:01:07,520 is smart. 2563 02:01:07,520 --> 02:01:10,220 If you pass in a character to toupper that's lowercase, 2564 02:01:10,220 --> 02:01:13,040 it obviously converts it to uppercase by doing that math. 2565 02:01:13,040 --> 02:01:17,240 But if you pass in a character to toupper that's already uppercase, 2566 02:01:17,240 --> 02:01:21,540 the documentation you would see tells you that it leaves it unchanged. 2567 02:01:21,540 --> 02:01:23,910 So I can tighten all of this up. 2568 02:01:23,910 --> 02:01:25,880 I can get rid of the whole else. 2569 02:01:25,880 --> 02:01:29,150 I can get rid of the whole if, and arguably now, 2570 02:01:29,150 --> 02:01:33,620 implement a program that's just as correct, but better designed. 2571 02:01:33,620 --> 02:01:34,250 Why? 2572 02:01:34,250 --> 02:01:38,000 Fewer lines of code, easier to read, lower probability of mistakes, 2573 02:01:38,000 --> 02:01:39,740 assuming the library is correct. 2574 02:01:39,740 --> 02:01:43,160 It just makes it easier and faster for me, now, to write code. 2575 02:01:43,160 --> 02:01:47,960 So if I now do, one last time, make uppercase, Enter, ./uppercase, 2576 02:01:47,960 --> 02:01:50,190 and type in my name, still working. 2577 02:01:50,190 --> 02:01:53,810 But now notice, we've whittled this down to far fewer lines of code, 2578 02:01:53,810 --> 02:01:57,740 albeit, using now this additional library. 2579 02:01:57,740 --> 02:02:00,140 Questions then on how we did this?
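The tightened-up version, with the if and else removed, might look like the sketch below. As before, it writes to a buffer instead of printing, and the names `force_upper_ctype` and `out` are assumptions for illustration. The cast to `unsigned char` before calling `toupper` is a common defensive habit in C and was not part of the lecture's code.

```c
#include <ctype.h>    // declares toupper
#include <string.h>   // declares strlen

// Write an uppercased copy of s into out using the ctype library.
// Assumption: out has room for strlen(s) + 1 bytes.
void force_upper_ctype(const char *s, char *out)
{
    size_t i;
    for (i = 0; i < strlen(s); i++)
    {
        // toupper converts lowercase letters and returns anything else
        // unchanged, so no islower check or else branch is needed.
        out[i] = (char) toupper((unsigned char) s[i]);
    }
    out[i] = '\0';
}
```

Because `toupper` already handles characters that are not lowercase, the loop body shrinks to a single statement.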
2580 02:02:00,140 --> 02:02:03,930 2581 02:02:03,930 --> 02:02:06,230 Well, even though this code, I daresay, is correct, 2582 02:02:06,230 --> 02:02:09,120 it's not necessarily well-designed just yet. 2583 02:02:09,120 --> 02:02:12,590 In fact, there's one line of code, one function 2584 02:02:12,590 --> 02:02:14,690 call in this current implementation that's 2585 02:02:14,690 --> 02:02:17,900 more inefficient than it needs to be. 2586 02:02:17,900 --> 02:02:20,630 And allow me to draw your attention to this here, 2587 02:02:20,630 --> 02:02:24,320 line 10, wherein we're calling strlen. 2588 02:02:24,320 --> 02:02:27,350 But we're calling it inside of this for loop, specifically, 2589 02:02:27,350 --> 02:02:29,000 inside of the condition. 2590 02:02:29,000 --> 02:02:33,720 And why might that not necessarily be the best idea? 2591 02:02:33,720 --> 02:02:36,810 Well, is the length of the string changing, ever? 2592 02:02:36,810 --> 02:02:38,950 I mean, certainly not within the span of this loop. 2593 02:02:38,950 --> 02:02:42,840 And so here we are within our for loop on line 10, 11, 12, and 13, 2594 02:02:42,840 --> 02:02:45,242 asking on every iteration that same question. 2595 02:02:45,242 --> 02:02:46,200 What's the length of s? 2596 02:02:46,200 --> 02:02:47,190 What's the length of s? 2597 02:02:47,190 --> 02:02:48,330 What's the length of s? 2598 02:02:48,330 --> 02:02:50,702 And in turn, we're calling strlen every time, 2599 02:02:50,702 --> 02:02:52,660 even though we're getting back the same answer. 2600 02:02:52,660 --> 02:02:54,960 So I daresay a better solution here would 2601 02:02:54,960 --> 02:02:58,230 be to maybe figure out the length of s earlier on in my code, 2602 02:02:58,230 --> 02:02:59,490 and maybe declare a variable.
2603 02:02:59,490 --> 02:03:02,580 Or perhaps do something that's syntactically a little more elegant, 2604 02:03:02,580 --> 02:03:05,070 and in fact, a very common design in a loop like this, 2605 02:03:05,070 --> 02:03:07,860 would be to declare not just one variable like i, 2606 02:03:07,860 --> 02:03:12,060 but to actually declare a second variable called n, for instance, where 2607 02:03:12,060 --> 02:03:16,530 n is just some number, set n equal to the length of s. 2608 02:03:16,530 --> 02:03:18,900 But thereafter, inside of this condition, 2609 02:03:18,900 --> 02:03:24,540 instead of calling strlen of s again and again and again, what might I now do? 2610 02:03:24,540 --> 02:03:28,110 I could instead just compare i against n itself, 2611 02:03:28,110 --> 02:03:31,080 because n now will only be calculated once when it's initialized, 2612 02:03:31,080 --> 02:03:32,730 just as i is initialized to zero. 2613 02:03:32,730 --> 02:03:36,000 And thereafter, we're going to be comparing i, which is changing, 2614 02:03:36,000 --> 02:03:37,350 against n, which will not be. 2615 02:03:37,350 --> 02:03:40,330 So it's going to be marginally more efficient by design. 2616 02:03:40,330 --> 02:03:42,900 Now with that said, a good compiler could also 2617 02:03:42,900 --> 02:03:46,080 recognize that there is this optimization possibility, 2618 02:03:46,080 --> 02:03:47,100 and maybe do it for us. 2619 02:03:47,100 --> 02:03:49,080 But for now, best to get into the habit, best 2620 02:03:49,080 --> 02:03:52,260 to develop the muscle memory for making those better design decisions 2621 02:03:52,260 --> 02:03:54,010 yourselves. 2622 02:03:54,010 --> 02:03:56,380 Questions, then, on how we did this? 2623 02:03:56,380 --> 02:03:58,900 2624 02:03:58,900 --> 02:03:59,650 No? 2625 02:03:59,650 --> 02:04:03,050 All right, a few final building blocks for the day.
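The loop design just described, declaring n alongside i so that strlen runs only once, can be sketched like this. The function name `upper_in_place` is an assumption for illustration, and unlike the lecture's program it modifies the string in place rather than printing each character.

```c
#include <ctype.h>
#include <string.h>

// Uppercase the string s in place.
// Assumption: s points to writable memory (not a string literal).
void upper_in_place(char *s)
{
    // n is computed once, when the loop's variables are initialized;
    // the condition then compares i against n instead of calling strlen again.
    for (int i = 0, n = (int) strlen(s); i < n; i++)
    {
        s[i] = (char) toupper((unsigned char) s[i]);
    }
}
```

Both variables share the single `int` declaration in the loop's initializer, which is why they are separated by a comma rather than a semicolon.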
2626 02:04:03,050 --> 02:04:07,870 So we started by talking about those command line arguments that clang uses, 2627 02:04:07,870 --> 02:04:13,090 whereby, anything after the command that you type at a prompt, be it make 2628 02:04:13,090 --> 02:04:18,160 or clang or even CD in Linux, any word thereafter, or something 2629 02:04:18,160 --> 02:04:21,350 cryptic like -o is a command line argument. 2630 02:04:21,350 --> 02:04:22,840 It's an input to the command. 2631 02:04:22,840 --> 02:04:26,132 It's different from a function argument because a function argument, of course, 2632 02:04:26,132 --> 02:04:27,280 is an input to a function. 2633 02:04:27,280 --> 02:04:28,345 But it's the same idea. 2634 02:04:28,345 --> 02:04:30,970 It's just different syntax after the dollar sign at the prompt. 2635 02:04:30,970 --> 02:04:33,880 Well, it turns out that command line arguments 2636 02:04:33,880 --> 02:04:37,660 are something you can now use in your own programs 2637 02:04:37,660 --> 02:04:41,800 by accessing words after the prompt. 2638 02:04:41,800 --> 02:04:45,410 And let me propose that we invent this as follows. 2639 02:04:45,410 --> 02:04:49,540 Let me propose that we switch back to VS Code here, 2640 02:04:49,540 --> 02:04:53,560 and I'll open a new file here called greet.c. 2641 02:04:53,560 --> 02:04:56,410 So in greet.c, it's going to be a program that very simply greets 2642 02:04:56,410 --> 02:04:57,070 the user. 2643 02:04:57,070 --> 02:04:59,440 Had we written this last week, we would have done this. 2644 02:04:59,440 --> 02:05:08,200 Include cs50.h, and then include stdio.h, and then int main void, 2645 02:05:08,200 --> 02:05:13,060 and then we might do something simple like string name equals getString, 2646 02:05:13,060 --> 02:05:15,980 quote unquote, "What's your name?" 2647 02:05:15,980 --> 02:05:20,020 And then we would have printed out, as always, Hello, %s, 2648 02:05:20,020 --> 02:05:21,490 and then plugging in that name. 
2649 02:05:21,490 --> 02:05:25,300 So this is the same program we've implemented many times, just 2650 02:05:25,300 --> 02:05:26,590 to make sure it works-- 2651 02:05:26,590 --> 02:05:29,140 although, nope, that's not quite the same program. 2652 02:05:29,140 --> 02:05:30,940 Semicolon's in the wrong place. 2653 02:05:30,940 --> 02:05:32,960 This now is the same program. 2654 02:05:32,960 --> 02:05:37,610 So make greet, ./greet, and I'll type in my own name. hello, David. 2655 02:05:37,610 --> 02:05:38,770 So we're back there. 2656 02:05:38,770 --> 02:05:41,770 Now, what's arguably a little annoying about this program, 2657 02:05:41,770 --> 02:05:44,110 if I type in something else like, Carter, 2658 02:05:44,110 --> 02:05:48,130 Enter, I have to run the program, wait for the prompt, type in my name, 2659 02:05:48,130 --> 02:05:48,910 hit Enter. 2660 02:05:48,910 --> 02:05:52,360 And that's fine, but imagine if every program worked like this. 2661 02:05:52,360 --> 02:05:55,415 Like make, suppose you could only type make, then you wait for a prompt, 2662 02:05:55,415 --> 02:05:58,540 then you type the name of the program you want to make, then you hit Enter. 2663 02:05:58,540 --> 02:06:01,720 Or worse, in Linux when you have to change directories, 2664 02:06:01,720 --> 02:06:05,263 as you might have for problem set one, what if you had to type CD, Enter, 2665 02:06:05,263 --> 02:06:07,930 now type the name of the folder you want to change into, Enter-- 2666 02:06:07,930 --> 02:06:09,710 I mean, it just slows life down. 2667 02:06:09,710 --> 02:06:11,470 And so it just gets annoying quickly. 2668 02:06:11,470 --> 02:06:16,070 So command line arguments just let you express your whole thought all at once. 2669 02:06:16,070 --> 02:06:18,200 So how can I do this? 2670 02:06:18,200 --> 02:06:22,450 Well, if I want to express the notion of command line arguments in my code, 2671 02:06:22,450 --> 02:06:25,640 I could do something like this.
2672 02:06:25,640 --> 02:06:28,750 I could, for the very first time, go up and get 2673 02:06:28,750 --> 02:06:33,730 rid of this void, which as of today means, this program takes no command 2674 02:06:33,730 --> 02:06:34,780 line arguments. 2675 02:06:34,780 --> 02:06:37,540 And I can change it to exactly this. 2676 02:06:37,540 --> 02:06:43,490 Int argc, string argv, with brackets. 2677 02:06:43,490 --> 02:06:44,950 Now it's cryptic, admittedly. 2678 02:06:44,950 --> 02:06:46,150 And let me zoom in. 2679 02:06:46,150 --> 02:06:49,300 But I think we can perhaps infer now, what's going on. 2680 02:06:49,300 --> 02:06:52,750 If main now does not have void as its input, which 2681 02:06:52,750 --> 02:06:55,600 means it takes no arguments, surely, the spoiler 2682 02:06:55,600 --> 02:06:59,230 here is that now main will take command line arguments somehow. 2683 02:06:59,230 --> 02:07:05,180 Any guesses as to what argv is or will be? 2684 02:07:05,180 --> 02:07:08,330 What might this represent? 2685 02:07:08,330 --> 02:07:11,390 It's an array of strings, right, by way of the syntax. 2686 02:07:11,390 --> 02:07:13,223 Yeah? 2687 02:07:13,223 --> 02:07:15,480 AUDIENCE: All the characters will be typed out. 2688 02:07:15,480 --> 02:07:16,050 DAVID MALAN: Exactly. 2689 02:07:16,050 --> 02:07:18,550 It will be all of the characters, or really all of the words 2690 02:07:18,550 --> 02:07:19,830 that you type at the prompt. 2691 02:07:19,830 --> 02:07:21,765 Argc, as an int, any guess? 2692 02:07:21,765 --> 02:07:24,360 2693 02:07:24,360 --> 02:07:28,700 Argument count is what it generally stands for, though technically, 2694 02:07:28,700 --> 02:07:30,290 you could call these things anything. 2695 02:07:30,290 --> 02:07:31,520 But this is the convention. 
2696 02:07:31,520 --> 02:07:35,780 Because I claimed earlier that arrays don't keep track of their own length, 2697 02:07:35,780 --> 02:07:38,930 if you want to know how many words the human typed at the prompt 2698 02:07:38,930 --> 02:07:41,420 after your program's name, you have to be told, 2699 02:07:41,420 --> 02:07:45,650 not just the array of the words, but the length of that array. 2700 02:07:45,650 --> 02:07:48,530 The strings, you can figure out the length of using strlen, 2701 02:07:48,530 --> 02:07:53,360 but you can't figure out the length of the array of strings, the collection 2702 02:07:53,360 --> 02:07:55,020 of words that the human typed in. 2703 02:07:55,020 --> 02:07:56,760 So how can I now use this? 2704 02:07:56,760 --> 02:07:59,190 Well, let me go ahead and do this. 2705 02:07:59,190 --> 02:08:04,190 Let me go ahead and change this program now just to be printf, quote unquote, 2706 02:08:04,190 --> 02:08:11,630 "hello, %s\n", then argv[1]. 2707 02:08:11,630 --> 02:08:14,780 So this is not the best version of my code yet, but it's my first. 2708 02:08:14,780 --> 02:08:21,020 Make greet, and now let me do ./greet, David all at once. 2709 02:08:21,020 --> 02:08:23,210 Enter, hello, David. 2710 02:08:23,210 --> 02:08:25,820 Now let me run it again, ./greet, Carter. 2711 02:08:25,820 --> 02:08:27,620 Enter, hello, Carter. 2712 02:08:27,620 --> 02:08:29,840 It's a marginal improvement, but I don't have 2713 02:08:29,840 --> 02:08:32,330 to wait for getString to prompt me to hit Enter. 2714 02:08:32,330 --> 02:08:34,370 It's just speeding things up, twice as fast. 2715 02:08:34,370 --> 02:08:36,890 One less command to type in. 2716 02:08:36,890 --> 02:08:41,390 But I deliberately did [1], but what's the beginning of argv? 2717 02:08:41,390 --> 02:08:42,170 It would be [0]. 2718 02:08:42,170 --> 02:08:44,730 2719 02:08:44,730 --> 02:08:45,780 Well, what's that? 2720 02:08:45,780 --> 02:08:48,840 This is sometimes useful, though for now, it's not.
2721 02:08:48,840 --> 02:08:54,110 Suppose I recompile my code and run this program now, greet David. 2722 02:08:54,110 --> 02:08:58,598 Anyone want to guess what's in argv[0]? 2723 02:08:58,598 --> 02:08:59,530 AUDIENCE: [INAUDIBLE] 2724 02:08:59,530 --> 02:09:00,220 DAVID MALAN: Say again? 2725 02:09:00,220 --> 02:09:01,230 AUDIENCE: Greet, hello. 2726 02:09:01,230 --> 02:09:04,530 DAVID MALAN: Greet, Enter, hello, ./greet. 2727 02:09:04,530 --> 02:09:08,280 So if you want to sort of inception style your program to figure out what 2728 02:09:08,280 --> 02:09:11,910 its own name is, or at least how it was executed at the command line, 2729 02:09:11,910 --> 02:09:14,460 at the terminal, you can look at argv[0]. 2730 02:09:14,460 --> 02:09:17,160 In general, probably not that useful, probably better 2731 02:09:17,160 --> 02:09:21,900 to start looking at [1], which was the first word after the program name. 2732 02:09:21,900 --> 02:09:25,320 And if there were more, I could do this-- how about argv[2], 2733 02:09:25,320 --> 02:09:27,690 let me add in a second %s. 2734 02:09:27,690 --> 02:09:29,550 Let me recompile greet. 2735 02:09:29,550 --> 02:09:35,490 Let me do ./greet David Malan, Enter, and that, too, now works, 2736 02:09:35,490 --> 02:09:37,112 taking in two words at the prompt. 2737 02:09:37,112 --> 02:09:38,820 If I really want to be smart at this now, 2738 02:09:38,820 --> 02:09:40,445 I could do something like this, though. 2739 02:09:40,445 --> 02:09:44,700 How about if the count of arguments, A.K.A. argc, 2740 02:09:44,700 --> 02:09:49,890 equals equals 2, then assume that the human typed in only their first name, 2741 02:09:49,890 --> 02:09:58,440 and do printf hello comma %s\n, and then argv[1].
2742 02:09:58,440 --> 02:10:01,470 Else, if the human did not provide exactly two 2743 02:10:01,470 --> 02:10:04,920 arguments, the name of the program and their own name, 2744 02:10:04,920 --> 02:10:07,890 let's just print out a default value, lest they forgot their name 2745 02:10:07,890 --> 02:10:09,990 or they typed in two names or three names. 2746 02:10:09,990 --> 02:10:13,110 Let's just do, hello comma world as a default. 2747 02:10:13,110 --> 02:10:15,270 And we'll just ignore what the human typed in. 2748 02:10:15,270 --> 02:10:20,850 If I recompile this, make greet, I can do ./greet and David again, Enter. 2749 02:10:20,850 --> 02:10:24,840 Oops-- sorry, what am I missing? 2750 02:10:24,840 --> 02:10:26,640 Yeah, so newbie mistake. 2751 02:10:26,640 --> 02:10:30,090 Else, all right, make greet again. 2752 02:10:30,090 --> 02:10:34,050 ./greet, David, Enter, there's my hello, David. 2753 02:10:34,050 --> 02:10:37,870 But if I omit my name, I just get the generic, like a default value. 2754 02:10:37,870 --> 02:10:41,590 And if I get a little curious and I type in both names, then I get ignored too. 2755 02:10:41,590 --> 02:10:42,090 Why? 2756 02:10:42,090 --> 02:10:44,880 Because I just haven't built in support for argc of three. 2757 02:10:44,880 --> 02:10:47,610 I could do anything I want, but now we have access 2758 02:10:47,610 --> 02:10:50,730 to these kinds of building blocks. 2759 02:10:50,730 --> 02:10:52,780 All right, what else might I do here? 2760 02:10:52,780 --> 02:10:57,660 Well, it turns out there might be some final features for us to now execute. 
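The argc check described above can be sketched as a pair of helpers. This is a sketch under assumptions, not the lecture's file: in greet.c this logic sits directly inside `int main(int argc, string argv[])`, while here it is split into the hypothetical functions `name_to_greet` and `greet` so the decision can be exercised on its own. It also spells the parameter as `char *argv[]`, which is what CS50's `string` type stands for.

```c
#include <stdio.h>

// Decide whom to greet: with exactly two arguments (the program's name
// plus one word), use that word; otherwise fall back to "world".
const char *name_to_greet(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];   // argv[0] is how the program was run, e.g. ./greet
    }
    else
    {
        return "world";   // no name, or too many words: use the default
    }
}

// Print the greeting the way the lecture's printf does.
void greet(int argc, char *argv[])
{
    printf("hello, %s\n", name_to_greet(argc, argv));
}
```

A main that takes `int argc, char *argv[]` could simply call `greet(argc, argv)`; running it with a single name then greets that name, and any other number of words gets the default.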
2761 02:10:57,660 --> 02:11:00,090 Notice, though, that in C, despite what you 2762 02:11:00,090 --> 02:11:02,820 might see in books or online tutorials, nowadays, 2763 02:11:02,820 --> 02:11:06,180 the two official formats for defining a main function 2764 02:11:06,180 --> 02:11:11,130 are either this, which we've been using now for two plus weeks or now this, 2765 02:11:11,130 --> 02:11:14,250 whereby, you change the void to int argc, 2766 02:11:14,250 --> 02:11:17,880 and then for now, string argv, and then empty brackets. 2767 02:11:17,880 --> 02:11:20,608 And we'll see that this, too, is a simplification, some training 2768 02:11:20,608 --> 02:11:21,400 wheels if you will. 2769 02:11:21,400 --> 02:11:23,550 But for now, those are the two forms, even 2770 02:11:23,550 --> 02:11:26,550 though you will see in online tutorials and even books, some people 2771 02:11:26,550 --> 02:11:27,840 use main in different ways. 2772 02:11:27,840 --> 02:11:30,142 These are the two now to keep in mind. 2773 02:11:30,142 --> 02:11:32,100 And I'll note that these command line arguments 2774 02:11:32,100 --> 02:11:33,360 are kind of all over the place. 2775 02:11:33,360 --> 02:11:35,590 Didn't probably expect to see this word on the screen here. 2776 02:11:35,590 --> 02:11:36,490 And what does it mean? 2777 02:11:36,490 --> 02:11:37,920 Well, it turns out that for decades-- there's 2778 02:11:37,920 --> 02:11:40,080 actually this program that comes with Linux systems 2779 02:11:40,080 --> 02:11:41,880 in particular called cowsay. 2780 02:11:41,880 --> 02:11:42,510 Why? 2781 02:11:42,510 --> 02:11:45,300 Probably because someone had too much free time once and decided 2782 02:11:45,300 --> 02:11:49,920 to write a program that creates ASCII art out of a cow saying something 2783 02:11:49,920 --> 02:11:51,520 textually on the screen. 2784 02:11:51,520 --> 02:11:55,780 But you use cowsay, just for fun, by way of command line arguments. 
2785 02:11:55,780 --> 02:12:00,660 So for instance, let me propose that I go back to VS Code 2786 02:12:00,660 --> 02:12:03,020 here, not because I want to write any code, 2787 02:12:03,020 --> 02:12:04,770 but I just want to use my terminal window. 2788 02:12:04,770 --> 02:12:07,320 And let me maximize my terminal window here. 2789 02:12:07,320 --> 02:12:11,880 And let me go ahead and type in something like, how about cowsay, 2790 02:12:11,880 --> 02:12:13,170 space moo? 2791 02:12:13,170 --> 02:12:14,822 So cowsay is not a program I wrote. 2792 02:12:14,822 --> 02:12:16,030 It's been around for decades. 2793 02:12:16,030 --> 02:12:18,870 But we installed it in VS Code for you in the cloud. 2794 02:12:18,870 --> 02:12:21,330 It takes at least one command line argument. 2795 02:12:21,330 --> 02:12:23,070 What do you want the cow to say? 2796 02:12:23,070 --> 02:12:26,190 I can say, cowsay moo, and hit Enter, and voila, there 2797 02:12:26,190 --> 02:12:29,490 is my ASCII art of a cow saying moo on the screen. 2798 02:12:29,490 --> 02:12:31,090 It can say multiple words. 2799 02:12:31,090 --> 02:12:33,960 So I can say, Hello, world, Enter. 2800 02:12:33,960 --> 02:12:35,800 And now it says, Hello, world. 2801 02:12:35,800 --> 02:12:38,730 So this is just an example of a silly program that uses command line 2802 02:12:38,730 --> 02:12:40,470 arguments, but it takes others too. 2803 02:12:40,470 --> 02:12:43,650 Just like clang, use this convention of hyphens 2804 02:12:43,650 --> 02:12:45,750 to change the output of the program. 2805 02:12:45,750 --> 02:12:49,350 Dash something is just a super common convention with command line arguments 2806 02:12:49,350 --> 02:12:53,520 when you want a very terse notation for some option like output. 
2807 02:12:53,520 --> 02:12:56,460 In cowsay, I read the documentation, and it turns out 2808 02:12:56,460 --> 02:12:59,040 there's a dash f command line argument that 2809 02:12:59,040 --> 02:13:03,460 allows you to change the appearance of the cow, if you will. 2810 02:13:03,460 --> 02:13:10,170 So if I do cowsay dash f, duck, and then some other word like quack, 2811 02:13:10,170 --> 02:13:11,640 it's no longer a cow. 2812 02:13:11,640 --> 02:13:15,850 That command line argument turns it into a tiny, adorable duck instead. 2813 02:13:15,850 --> 02:13:19,020 And then lastly, just for fun, because I spent way too much time 2814 02:13:19,020 --> 02:13:20,790 playing with command line arguments. 2815 02:13:20,790 --> 02:13:25,260 Cowsay dash f, dragon, and then how about, rawr, Enter, 2816 02:13:25,260 --> 02:13:27,910 you can even get this on the screen here. 2817 02:13:27,910 --> 02:13:30,150 So this, too, is just an example of what you 2818 02:13:30,150 --> 02:13:34,230 can do with these command line arguments now that we have this building block. 2819 02:13:34,230 --> 02:13:36,960 And there's one final thing we can now do with code. 2820 02:13:36,960 --> 02:13:39,150 There's one last feature today that we'll 2821 02:13:39,150 --> 02:13:41,610 introduce before we now connect all of these dots 2822 02:13:41,610 --> 02:13:47,520 to readability and encryption by talking, lastly, about something called 2823 02:13:47,520 --> 02:13:48,450 exit status. 2824 02:13:48,450 --> 02:13:52,380 It turns out that whenever your main function exits, 2825 02:13:52,380 --> 02:13:55,590 it returns a secret integer that you can figure out, 2826 02:13:55,590 --> 02:13:58,260 as the programmer or an advanced user, what it was. 2827 02:13:58,260 --> 02:14:02,398 And these exit codes, exit statuses, are typically used to indicate errors. 
2828 02:14:02,398 --> 02:14:05,190 So for instance, over the past couple of years, if you've used Zoom 2829 02:14:05,190 --> 02:14:08,560 and you ever got some kind of error, you might have seen a screen like this. 2830 02:14:08,560 --> 02:14:11,040 It's usually not that helpful, maybe tells you to click 2831 02:14:11,040 --> 02:14:13,050 Report Problem or Contact Support. 2832 02:14:13,050 --> 02:14:16,980 But very often in our human world on Macs, PCs, and phones, 2833 02:14:16,980 --> 02:14:20,010 you see cryptic error codes, like literally numbers 2834 02:14:20,010 --> 02:14:23,640 that probably only Zoom knows, or Microsoft or Google or whatever company 2835 02:14:23,640 --> 02:14:25,050 wrote the software you're using. 2836 02:14:25,050 --> 02:14:28,260 But that number corresponds to a specific error 2837 02:14:28,260 --> 02:14:32,070 that some human somewhere knows might very well happen. 2838 02:14:32,070 --> 02:14:34,950 These are used similarly, although under a different name 2839 02:14:34,950 --> 02:14:38,260 that we'll talk about later in the term, on the web as well. 2840 02:14:38,260 --> 02:14:41,350 Have you ever seen this-- maybe not character, but number? 2841 02:14:41,350 --> 02:14:43,485 So, 404 means what? 2842 02:14:43,485 --> 02:14:44,880 AUDIENCE: Error. 2843 02:14:44,880 --> 02:14:47,790 DAVID MALAN: So error, yes, but really, not found. 2844 02:14:47,790 --> 02:14:48,410 So, why? 2845 02:14:48,410 --> 02:14:49,993 I mean, this is the most arcane thing. 2846 02:14:49,993 --> 02:14:53,000 And we'll talk in a few weeks about what this and other numbers mean, 2847 02:14:53,000 --> 02:14:54,917 but numbers are all around us in technology, 2848 02:14:54,917 --> 02:14:57,500 and they very often mean something to the technical people who 2849 02:14:57,500 --> 02:15:00,270 wrote the software, less so to humans like you and me.
2850 02:15:00,270 --> 02:15:03,230 Why so many of us recognize 404 is kind of weird; 2851 02:15:03,230 --> 02:15:05,900 it's just been around long enough that we all know it. 2852 02:15:05,900 --> 02:15:10,250 But it really is just a special number that represents an error of some sort. 2853 02:15:10,250 --> 02:15:13,100 So it turns out, the last thing we'll reveal today 2854 02:15:13,100 --> 02:15:15,530 about what we've been taking for granted for two weeks, 2855 02:15:15,530 --> 02:15:18,200 is what the int is in main. 2856 02:15:18,200 --> 02:15:21,650 We've seen, just a moment ago, the thing in the parentheses, which 2857 02:15:21,650 --> 02:15:24,680 up until now has been void, which means no command line arguments. 2858 02:15:24,680 --> 02:15:29,690 Now, int argc, string argv[] just means, yes, command line arguments. 2859 02:15:29,690 --> 02:15:31,290 And we've seen how to access them. 2860 02:15:31,290 --> 02:15:33,620 So the last piece of the puzzle, honestly, 2861 02:15:33,620 --> 02:15:37,460 of all the cryptic syntax the past two weeks, is just what int means. 2862 02:15:37,460 --> 02:15:40,610 Int is always there for main, and it indicates 2863 02:15:40,610 --> 02:15:44,300 that main will always return an integer, even though you and I have never 2864 02:15:44,300 --> 02:15:46,010 done so explicitly. 2865 02:15:46,010 --> 02:15:50,450 Usually, main returns 0, by default. But it 2866 02:15:50,450 --> 02:15:53,928 would be weird if you saw an error message saying 0, so 0 is just hidden. 2867 02:15:53,928 --> 02:15:55,470 You would never see it on the screen. 2868 02:15:55,470 --> 02:15:58,670 But it's happening automatically by way of how C is designed. 2869 02:15:58,670 --> 02:16:01,550 So let me write one final program here. 2870 02:16:01,550 --> 02:16:05,750 I'll call it, for instance, status.c to show you these exit statuses.
2871 02:16:05,750 --> 02:16:10,790 Code of status.c, and then up here, let me do something simple like include 2872 02:16:10,790 --> 02:16:18,020 cs50.h, then include stdio.h, and then int main-- 2873 02:16:18,020 --> 02:16:21,350 actually, let's use a command line argument. int argc, string argv[], 2874 02:16:21,350 --> 02:16:23,180 so that's copy, paste. 2875 02:16:23,180 --> 02:16:26,000 But now let's do this. 2876 02:16:26,000 --> 02:16:29,280 If argc does not equal 2-- 2877 02:16:29,280 --> 02:16:30,780 why don't we do something like this? 2878 02:16:30,780 --> 02:16:33,740 Let's not just default to hello, world like last time. 2879 02:16:33,740 --> 02:16:34,770 Let's yell at the user. 2880 02:16:34,770 --> 02:16:38,802 So let's say something like printf missing command line argument, 2881 02:16:38,802 --> 02:16:40,760 so that they know they screwed up and they need 2882 02:16:40,760 --> 02:16:43,160 to run the program again correctly. 2883 02:16:43,160 --> 02:16:51,320 Else, let's go ahead and print out, as before, Hello, comma %s, 2884 02:16:51,320 --> 02:16:56,730 and then plug in argv[1], so the human's name from the prompt. 2885 02:16:56,730 --> 02:17:01,910 Now at this point, let me go ahead and run status, ./status, 2886 02:17:01,910 --> 02:17:03,590 and I'll type nothing first. 2887 02:17:03,590 --> 02:17:04,700 I get yelled at. 2888 02:17:04,700 --> 02:17:10,170 This time, I'll type it again. ./status David, and it works properly. 2889 02:17:10,170 --> 02:17:14,090 But now let me show you a somewhat secret, cryptic command. 2890 02:17:14,090 --> 02:17:17,330 You can type this at your prompt, and it's just a coincidence 2891 02:17:17,330 --> 02:17:18,740 that there's another dollar sign. 2892 02:17:18,740 --> 02:17:22,400 Echo $?, totally arcane, but it allows you 2893 02:17:22,400 --> 02:17:25,490 to see what exit status your program has ended with. 2894 02:17:25,490 --> 02:17:27,559 So let me run this again the wrong way.
2895 02:17:27,559 --> 02:17:31,040 ./status, I get the error message. 2896 02:17:31,040 --> 02:17:32,780 What was secretly returned? 2897 02:17:32,780 --> 02:17:33,440 I can't see it. 2898 02:17:33,440 --> 02:17:37,280 There's obviously no error screen, but by typing echo $?, 2899 02:17:37,280 --> 02:17:41,420 I can see that, oh, my program automatically, by default, returns 2900 02:17:41,420 --> 02:17:42,170 zero. 2901 02:17:42,170 --> 02:17:46,879 However, if I run it again correctly, ./status David, Enter, 2902 02:17:46,879 --> 02:17:48,690 this is the correct version. 2903 02:17:48,690 --> 02:17:50,629 But if I run echo $? 2904 02:17:50,629 --> 02:17:52,879 again, it's still exited with 0. 2905 02:17:52,879 --> 02:17:55,879 And long story short, this is just a missed opportunity. 2906 02:17:55,879 --> 02:17:59,570 When something goes wrong, why don't I return a value other than 0? 2907 02:17:59,570 --> 02:18:01,070 0, by default, means success. 2908 02:18:01,070 --> 02:18:02,690 And it's always there automatically. 2909 02:18:02,690 --> 02:18:04,940 But you can control this. 2910 02:18:04,940 --> 02:18:11,160 I can go into my code here and return 1, else, if something works fine, 2911 02:18:11,160 --> 02:18:14,870 I can return 0, by default. And honestly, if I omit the return zero, 2912 02:18:14,870 --> 02:18:17,129 again, zero automatically is returned. 2913 02:18:17,129 --> 02:18:20,719 So let me go ahead and be explicit, just so I know what's going on. 2914 02:18:20,719 --> 02:18:26,360 Make status again, ./status, and let's do this correctly with David. 2915 02:18:26,360 --> 02:18:28,520 Enter, hello, David. 2916 02:18:28,520 --> 02:18:32,059 Echo $?, zero. 2917 02:18:32,059 --> 02:18:33,270 So all is well. 2918 02:18:33,270 --> 02:18:38,240 But now if I do ./status and nothing, or multiple things, but not just David, 2919 02:18:38,240 --> 02:18:40,530 Enter, I get the error message.
2920 02:18:40,530 --> 02:18:45,230 But now if I do echo $?, voila, there now is the 1. 2921 02:18:45,230 --> 02:18:47,330 So what does this now mean? 2922 02:18:47,330 --> 02:18:49,490 In the graphical world, we would just 2923 02:18:49,490 --> 02:18:51,020 show something like this on the screen, which is 2924 02:18:51,020 --> 02:18:52,459 a little more informative to the user. 2925 02:18:52,459 --> 02:18:54,469 But even in the Linux world where you don't have a GUI, 2926 02:18:54,469 --> 02:18:56,690 necessarily, even for the programs we've written, 2927 02:18:56,690 --> 02:18:58,549 you can check these exit statuses. 2928 02:18:58,549 --> 02:19:01,070 And in fact, more comfortable, more advanced programmers, 2929 02:19:01,070 --> 02:19:03,889 when they write code that calls programs, 2930 02:19:03,889 --> 02:19:07,340 be it cowsay or anything else, can, in code, 2931 02:19:07,340 --> 02:19:11,030 check what the exit status is of a program, and then decide, 2932 02:19:11,030 --> 02:19:13,170 did my program work or did it not? 2933 02:19:13,170 --> 02:19:16,219 And now let's connect the final dots before we 2934 02:19:16,219 --> 02:19:19,070 adjourn for some fruit snacks. 2935 02:19:19,070 --> 02:19:22,100 Cryptography, namely one of the applications this week 2936 02:19:22,100 --> 02:19:24,770 via which you'll be able to send, if you will, 2937 02:19:24,770 --> 02:19:27,650 secret messages, and better yet, decrypt secret messages. 2938 02:19:27,650 --> 02:19:29,780 This will be in addition to perhaps analyzing 2939 02:19:29,780 --> 02:19:32,120 the readability of text using heuristics, like we 2940 02:19:32,120 --> 02:19:34,040 identified at the start of class, too.
2941 02:19:34,040 --> 02:19:38,299 So cryptography is just the art, the science of encrypting information, 2942 02:19:38,299 --> 02:19:41,330 scrambling information so that if you have a secret message 2943 02:19:41,330 --> 02:19:45,980 to send in so-called plaintext, you can run it through some algorithm 2944 02:19:45,980 --> 02:19:49,910 and turn it into what's called ciphertext, thereby, encrypting it. 2945 02:19:49,910 --> 02:19:53,150 And only someone who knows what algorithm you've used 2946 02:19:53,150 --> 02:19:55,880 and what input you've used to the algorithm, theoretically, 2947 02:19:55,880 --> 02:19:59,880 can decrypt that process and convert it back to the original message. 2948 02:19:59,880 --> 02:20:03,030 So if we use our mental model from last week, here is a problem. 2949 02:20:03,030 --> 02:20:04,910 Here is an input and output. 2950 02:20:04,910 --> 02:20:08,120 The goal I claim here is to take some plain text, like the message 2951 02:20:08,120 --> 02:20:10,250 you want to send, think back to grade school 2952 02:20:10,250 --> 02:20:13,640 if you ever passed a note to a friend or to your crush saying, I love you, 2953 02:20:13,640 --> 02:20:16,910 it's a little awkward if the teacher or someone else intercepts the paper. 2954 02:20:16,910 --> 02:20:19,490 And in English, it just says, I love you, or whatever it is. 2955 02:20:19,490 --> 02:20:22,350 It'd be nice if you had at least encrypted it in some way. 2956 02:20:22,350 --> 02:20:25,220 But the other person needs to know what algorithm you used 2957 02:20:25,220 --> 02:20:27,230 and what inputs you use to that algorithm 2958 02:20:27,230 --> 02:20:31,100 so that, ultimately, they can decode the so-called ciphertext, which 2959 02:20:31,100 --> 02:20:32,040 is the output. 2960 02:20:32,040 --> 02:20:34,190 So what goes inside of the box today? 2961 02:20:34,190 --> 02:20:37,970 Well, an algorithm, as it relates to cryptography, is called a cipher. 
2962 02:20:37,970 --> 02:20:41,390 And a cipher is a fancy name for an algorithm that encrypts text 2963 02:20:41,390 --> 02:20:43,250 from plaintext to ciphertext. 2964 02:20:43,250 --> 02:20:46,760 The catch is, there needs to be not just the algorithm, 2965 02:20:46,760 --> 02:20:48,750 there needs to be an input to it. 2966 02:20:48,750 --> 02:20:52,590 And so, for instance, you might draw the picture like this for the first time 2967 02:20:52,590 --> 02:20:53,090 today. 2968 02:20:53,090 --> 02:20:54,257 And we've seen this in code. 2969 02:20:54,257 --> 02:20:57,180 You can give multiple inputs or arguments to functions. 2970 02:20:57,180 --> 02:20:59,960 So in this black box, can you imagine passing in the message 2971 02:20:59,960 --> 02:21:02,510 you want to send, and then some secret. 2972 02:21:02,510 --> 02:21:05,300 So for instance, suppose that, the simplest 2973 02:21:05,300 --> 02:21:08,750 thing I could think of as a kid was instead of sending the letter A, 2974 02:21:08,750 --> 02:21:10,310 why don't I write the letter B? 2975 02:21:10,310 --> 02:21:13,070 Instead of the letter B, why don't I write the letter C? 2976 02:21:13,070 --> 02:21:16,280 So I can kind of shift the English alphabet by one space. 2977 02:21:16,280 --> 02:21:18,740 So A becomes B, B becomes C, dot, dot, dot, 2978 02:21:18,740 --> 02:21:21,690 Z becomes A. You can wrap around at the end. 2979 02:21:21,690 --> 02:21:24,120 And let's assume no punctuation in this part of the story. 2980 02:21:24,120 --> 02:21:29,420 So that's a very simple algorithm-- add a value to each letter 2981 02:21:29,420 --> 02:21:32,090 and send the value as the ciphertext. 2982 02:21:32,090 --> 02:21:35,540 And now the teacher, the classmate, they have to know that you use, 2983 02:21:35,540 --> 02:21:39,410 not only this rotational algorithm, also known as a Caesar cipher, 2984 02:21:39,410 --> 02:21:41,300 they also need to know what number you use. 
2985 02:21:41,300 --> 02:21:45,200 Did you add 1 to every letter, 2 to every letter, 25 to every letter? 2986 02:21:45,200 --> 02:21:49,310 Now if they're super smart and probably not the young age in this story, 2987 02:21:49,310 --> 02:21:51,165 they could also just try all possibilities. 2988 02:21:51,165 --> 02:21:53,040 And that would be an attack on the algorithm. 2989 02:21:53,040 --> 02:21:55,310 This is not a sophisticated algorithm, but it's 2990 02:21:55,310 --> 02:21:56,970 enough to send a message in class. 2991 02:21:56,970 --> 02:21:58,940 So if the two inputs now are HI! 2992 02:21:58,940 --> 02:22:04,280 as the plain text message, and 1 as the so-called key, the secret number 2993 02:22:04,280 --> 02:22:06,950 that only you and the other person know, you 2994 02:22:06,950 --> 02:22:11,040 might be able to encrypt a message from one way to the other. 2995 02:22:11,040 --> 02:22:13,400 And so in this case, for instance, HI! 2996 02:22:13,400 --> 02:22:16,198 would become I-J-!. 2997 02:22:16,198 --> 02:22:17,990 In this version of the algorithm, we're not 2998 02:22:17,990 --> 02:22:19,823 going to bother with numbers or punctuation. 2999 02:22:19,823 --> 02:22:23,090 We'll only operate on A through Z, be it uppercase or lowercase. 3000 02:22:23,090 --> 02:22:28,250 So now if you were to receive a slip of paper in class with I-J on it, 3001 02:22:28,250 --> 02:22:31,290 you, the recipient, would know what it is 3002 02:22:31,290 --> 02:22:33,440 so long as you know that the sender used one, 3003 02:22:33,440 --> 02:22:36,500 because you just reverse the algorithm and you subtract one instead. 3004 02:22:36,500 --> 02:22:39,110 The teacher, they probably don't know what this means, 3005 02:22:39,110 --> 02:22:41,443 and they're not going to spend time hacking the message, 3006 02:22:41,443 --> 02:22:42,975 so it just looks scrambled to them. 3007 02:22:42,975 --> 02:22:44,600 And that's what we get from encryption. 
3008 02:22:44,600 --> 02:22:47,430 Someone who intercepts it, be it in class or in the real world, 3009 02:22:47,430 --> 02:22:51,080 on the internet or anywhere else, can't actually figure out, ideally, 3010 02:22:51,080 --> 02:22:52,700 what it is you have sent. 3011 02:22:52,700 --> 02:22:55,130 The opposite, of course, is indeed called decryption, 3012 02:22:55,130 --> 02:22:56,300 but the process is the same. 3013 02:22:56,300 --> 02:22:58,370 We now pass in negative 1. 3014 02:22:58,370 --> 02:23:00,300 And so how about this? 3015 02:23:00,300 --> 02:23:02,840 Why don't we end with a demonstration here? 3016 02:23:02,840 --> 02:23:08,360 UIJT XBT DT50-- there's a bit of a tell there. 3017 02:23:08,360 --> 02:23:11,060 If we pass that in and do negative 1, well, 3018 02:23:11,060 --> 02:23:14,180 how do we get out the plaintext originally? 3019 02:23:14,180 --> 02:23:18,200 Well, if this is the ciphertext, and we subtract 1 from each letter, 3020 02:23:18,200 --> 02:23:28,010 I think U becomes T, I becomes H, J becomes I, T becomes S, X becomes W, 3021 02:23:28,010 --> 02:23:37,580 B becomes A, T becomes S, D becomes C, T becomes S, and this was, indeed, CS50. 3022 02:23:37,580 --> 02:23:40,250 Have a duck on your way out, and some snacks in the lobby. 3023 02:23:40,250 --> 02:23:42,350 [APPLAUSE] 3024 02:23:42,350 --> 02:23:43,850 [FILM ROLLING] 3025 02:23:43,850 --> 02:23:47,500 [MUSIC PLAYING] 3026 02:23:47,500 --> 02:24:19,000