1 00:00:00,000 --> 00:00:03,486 [MUSIC PLAYING] 2 00:00:03,486 --> 00:01:01,280 3 00:01:01,280 --> 00:01:02,510 DAVID MALAN: All right. 4 00:01:02,510 --> 00:01:04,580 This is CS50. 5 00:01:04,580 --> 00:01:08,390 This is week 2 wherein we will ultimately learn how to use memory, 6 00:01:08,390 --> 00:01:11,900 but we thought we'd first begin with a bit of story time. 7 00:01:11,900 --> 00:01:14,570 And in fact, allow me to walk over to our brave volunteers who 8 00:01:14,570 --> 00:01:15,650 have joined us already. 9 00:01:15,650 --> 00:01:18,080 First here on my left, we have who? 10 00:01:18,080 --> 00:01:19,730 AKSHAYA: Hi, I'm Akshaya. 11 00:01:19,730 --> 00:01:22,520 I'm a first year in Matthews, and I'm planning 12 00:01:22,520 --> 00:01:25,747 on concentrating in chemical and physical biology and CS. 13 00:01:25,747 --> 00:01:27,080 DAVID MALAN: Wonderful, welcome. 14 00:01:27,080 --> 00:01:28,955 And let me have you hang on to the microphone 15 00:01:28,955 --> 00:01:31,437 first because we've asked Akshaya to tell us a short story. 16 00:01:31,437 --> 00:01:33,770 So in your envelope, you have the beginnings of a story. 17 00:01:33,770 --> 00:01:35,353 If you wouldn't mind reading it aloud. 18 00:01:35,353 --> 00:01:38,630 And as she reads this, allow us to give some thought as to what 19 00:01:38,630 --> 00:01:41,922 level Akshaya reads at, so to speak. 20 00:01:41,922 --> 00:01:43,880 AKSHAYA: All right, it's a long one, get ready. 21 00:01:43,880 --> 00:01:48,405 One fish, two fish, red fish, blue fish. 22 00:01:48,405 --> 00:01:50,030 DAVID MALAN: All right, very well done. 23 00:01:50,030 --> 00:01:53,302 What grade level would you say she reads at if you think back 24 00:01:53,302 --> 00:01:55,010 to your middle school, grade school, when 25 00:01:55,010 --> 00:01:59,490 maybe a teacher said you read at this level or maybe this level or this one 26 00:01:59,490 --> 00:02:01,530 here? 27 00:02:01,530 --> 00:02:03,817 So OK, no offense taken yet.
28 00:02:03,817 --> 00:02:05,010 AUDIENCE: 1st grade. 29 00:02:05,010 --> 00:02:05,550 DAVID MALAN: I'm sorry? 30 00:02:05,550 --> 00:02:06,060 AUDIENCE: 1st grade. 31 00:02:06,060 --> 00:02:07,018 DAVID MALAN: 1st grade. 32 00:02:07,018 --> 00:02:08,850 OK, so 1st grade is just about right. 33 00:02:08,850 --> 00:02:12,432 And in fact, according to one algorithm, this text here, 34 00:02:12,432 --> 00:02:14,640 one fish, two fish, red fish, blue fish, would indeed 35 00:02:14,640 --> 00:02:17,830 be considered to actually be 1st grade or just before 1st grade. 36 00:02:17,830 --> 00:02:19,530 So let's-- and why is that, though? 37 00:02:19,530 --> 00:02:21,977 Why did you say 1st grade? 38 00:02:21,977 --> 00:02:23,060 AUDIENCE: It's very basic. 39 00:02:23,060 --> 00:02:23,990 DAVID MALAN: It's very basic. 40 00:02:23,990 --> 00:02:26,198 But what is it about these words that is very basic? 41 00:02:26,198 --> 00:02:27,687 Do you want to identify yourself? 42 00:02:27,687 --> 00:02:28,270 AKSHAYA: Sure. 43 00:02:28,270 --> 00:02:31,570 They're all one syllable and they're very simple like colors and stuff 44 00:02:31,570 --> 00:02:32,070 like that. 45 00:02:32,070 --> 00:02:32,945 DAVID MALAN: Spot-on. 46 00:02:32,945 --> 00:02:35,620 So like they're very short words, they're very short sentences. 47 00:02:35,620 --> 00:02:38,050 And you would expect that of a younger person. 48 00:02:38,050 --> 00:02:40,822 All right, let's go ahead and hand the mic to our next volunteer 49 00:02:40,822 --> 00:02:42,530 here if you'd like to introduce yourself. 50 00:02:42,530 --> 00:02:43,030 ETHAN: Yes. 51 00:02:43,030 --> 00:02:43,810 Hi, I'm Ethan. 52 00:02:43,810 --> 00:02:46,752 I'm a first year in Canaday, and I'll be concentrating in economics. 53 00:02:46,752 --> 00:02:47,710 DAVID MALAN: Wonderful. 54 00:02:47,710 --> 00:02:50,860 And in your folder, we have another story to share. 55 00:02:50,860 --> 00:02:52,480 ETHAN: Congratulations.
56 00:02:52,480 --> 00:02:53,740 Today is your day. 57 00:02:53,740 --> 00:02:55,480 You're off to great places. 58 00:02:55,480 --> 00:02:56,730 You're off and away. 59 00:02:56,730 --> 00:02:59,230 DAVID MALAN: So this text might sound familiar, particularly 60 00:02:59,230 --> 00:03:00,880 on the heels of high school, perhaps. 61 00:03:00,880 --> 00:03:05,310 What grade level might he be reading at? 62 00:03:05,310 --> 00:03:06,450 So maybe 5th grade. 63 00:03:06,450 --> 00:03:07,620 And why 5th grade? 64 00:03:07,620 --> 00:03:09,740 AUDIENCE: [INAUDIBLE] 65 00:03:09,740 --> 00:03:11,030 DAVID MALAN: OK. 66 00:03:11,030 --> 00:03:11,540 Yeah. 67 00:03:11,540 --> 00:03:13,040 So a little more complicated. 68 00:03:13,040 --> 00:03:16,850 Like the words-- we've got some more punctuation, we have an apostrophe, 69 00:03:16,850 --> 00:03:17,892 we have longer sentences. 70 00:03:17,892 --> 00:03:20,392 And indeed, according to one algorithm, not quite 5th grade, 71 00:03:20,392 --> 00:03:22,640 but we would adjudicate your reading level to be 3rd. 72 00:03:22,640 --> 00:03:25,280 But let's see if we can't do one final flourish here 73 00:03:25,280 --> 00:03:28,190 if you'd like to introduce yourself and your story. 74 00:03:28,190 --> 00:03:29,840 MIKE: Hi, I'm Mike. 75 00:03:29,840 --> 00:03:30,920 I'm also a first year. 76 00:03:30,920 --> 00:03:33,020 I'm in Weld, and I'm planning on concentrating 77 00:03:33,020 --> 00:03:34,185 in biomedical engineering. 78 00:03:34,185 --> 00:03:35,060 DAVID MALAN: Welcome. 79 00:03:35,060 --> 00:03:36,980 And your tale? 80 00:03:36,980 --> 00:03:41,750 MIKE: It was a bright, cold day in April and the clocks were striking 13. 
81 00:03:41,750 --> 00:03:45,440 Winston Smith, his chin nuzzled into his breast in an effort 82 00:03:45,440 --> 00:03:49,130 to escape the vile wind, slipped quickly through the glass doors 83 00:03:49,130 --> 00:03:51,710 of Victory Mansions, though not quickly enough 84 00:03:51,710 --> 00:03:55,445 to prevent a swirl of gritty dust from entering along with him. 85 00:03:55,445 --> 00:03:57,320 DAVID MALAN: All right, so that escalated quickly. 86 00:03:57,320 --> 00:03:59,960 And someone's guess at this reading level? 87 00:03:59,960 --> 00:04:01,083 AUDIENCE: 1984. 88 00:04:01,083 --> 00:04:02,125 DAVID MALAN: What's that? 89 00:04:02,125 --> 00:04:05,320 Oh, OK, 1984 is indeed the text in question, and in what 90 00:04:05,320 --> 00:04:08,050 grade did you perhaps read that book? 91 00:04:08,050 --> 00:04:09,670 So I'm hearing 8th, I'm hearing 10th. 92 00:04:09,670 --> 00:04:12,490 So indeed, 10th grade is what a certain algorithm would actually 93 00:04:12,490 --> 00:04:14,260 adjudicate that reading level to be. 94 00:04:14,260 --> 00:04:15,610 And consider now the heuristics. 95 00:04:15,610 --> 00:04:19,158 So we started with very small words, very small sentences, very easy words, 96 00:04:19,158 --> 00:04:21,700 and then things sort of escalated into more interesting, more 97 00:04:21,700 --> 00:04:25,460 sophisticated English, more interesting sentence construction and the like.
98 00:04:25,460 --> 00:04:30,640 So I bet if we could somehow capture those characteristics of text, 99 00:04:30,640 --> 00:04:33,250 the length of the words and the lengths of the sentences 100 00:04:33,250 --> 00:04:35,680 and the position of the punctuation, I daresay, 101 00:04:35,680 --> 00:04:38,878 even using week 1 material and, today, week 2 material, 102 00:04:38,878 --> 00:04:41,920 we'll be able to actually write code and implement an algorithm that 103 00:04:41,920 --> 00:04:44,380 can take these spoken words, put them to paper, 104 00:04:44,380 --> 00:04:47,590 and actually analyze roughly what that reading level might be. 105 00:04:47,590 --> 00:04:49,390 So that's just a teaser of what lies ahead. 106 00:04:49,390 --> 00:04:52,300 For now, allow us to thank our volunteers, each of whom 107 00:04:52,300 --> 00:04:55,930 gets a wonderful parting gift here to read at home. 108 00:04:55,930 --> 00:04:58,410 [APPLAUSE] 109 00:04:58,410 --> 00:04:58,910 All right. 110 00:04:58,910 --> 00:05:01,110 And thank you all so much. 111 00:05:01,110 --> 00:05:05,730 So with that said, there's another domain that we'll explore this week, 112 00:05:05,730 --> 00:05:07,730 and indeed, what you'll find in the coming weeks 113 00:05:07,730 --> 00:05:11,150 is that beyond just focusing on some of the fundamentals and the basics 114 00:05:11,150 --> 00:05:14,330 like we've really done in the past couple of weeks talking about loops 115 00:05:14,330 --> 00:05:16,340 and conditionals and Boolean expressions, 116 00:05:16,340 --> 00:05:19,400 really building blocks or puzzle pieces that we can assemble together, 117 00:05:19,400 --> 00:05:22,070 we're going to increasingly start talking about applications 118 00:05:22,070 --> 00:05:25,250 of these ideas which, after all, is why any field is perhaps 119 00:05:25,250 --> 00:05:26,460 important and applicable.
120 00:05:26,460 --> 00:05:29,510 So here, for instance, we'll consider not only reading levels today, 121 00:05:29,510 --> 00:05:33,630 and in turn, in problem set 2 this week, but also the world of cryptography, 122 00:05:33,630 --> 00:05:36,860 which is the art, the science of scrambling, encrypting 123 00:05:36,860 --> 00:05:39,230 information, and ciphering it in such a way 124 00:05:39,230 --> 00:05:43,530 that you can send a message securely through the internet, through the air, 125 00:05:43,530 --> 00:05:46,700 through any medium even though someone might intercept it. 126 00:05:46,700 --> 00:05:49,100 Ideally, thanks to cryptography, they shouldn't 127 00:05:49,100 --> 00:05:53,240 be able to decrypt it or actually determine what it says. 128 00:05:53,240 --> 00:05:57,560 So for instance, if you were to receive a message like this, at first glance, 129 00:05:57,560 --> 00:05:59,460 it's indeed a bit cryptic. 130 00:05:59,460 --> 00:06:02,400 Three words maybe, but by day's end, we'll 131 00:06:02,400 --> 00:06:04,830 have decrypted even this message for you. 132 00:06:04,830 --> 00:06:08,550 So up until now, though, we've had some sort of conceptual training wheels on. 133 00:06:08,550 --> 00:06:12,480 And I gave us this picture last week when we introduced the tool make via 134 00:06:12,480 --> 00:06:15,870 which you can make programs out of your source code because you need to turn 135 00:06:15,870 --> 00:06:18,450 that source code into machine code, the 0's and 1's. 136 00:06:18,450 --> 00:06:20,970 And in the middle here was this thing called a compiler. 137 00:06:20,970 --> 00:06:23,790 But it really has been kind of an abstraction for us, 138 00:06:23,790 --> 00:06:27,690 and we've sort of had these metaphorical and physical training 139 00:06:27,690 --> 00:06:30,450 wheels here in the sense that we haven't really 140 00:06:30,450 --> 00:06:34,420 needed to care like what the compiler is doing, how it works and so forth.
141 00:06:34,420 --> 00:06:38,400 But today, what we thought we'd do is peel back a bit of that layer so 142 00:06:38,400 --> 00:06:40,410 that even though after today you'll continue 143 00:06:40,410 --> 00:06:43,380 to be able to use commands like make and sort of return 144 00:06:43,380 --> 00:06:46,275 to the beautiful abstraction that is not caring about some 145 00:06:46,275 --> 00:06:48,150 of these lower-level details, we'll offer you 146 00:06:48,150 --> 00:06:49,980 a glimpse of how some of these things work. 147 00:06:49,980 --> 00:06:52,350 So that inevitably when something goes wrong, 148 00:06:52,350 --> 00:06:54,540 you've got some bug, you're having some problem, 149 00:06:54,540 --> 00:06:58,620 you'll have a bottom-up understanding of what it could actually be. 150 00:06:58,620 --> 00:07:01,620 And indeed, these basics, you'll find, will very often 151 00:07:01,620 --> 00:07:05,230 help you troubleshoot problems and really solve problems more generally. 152 00:07:05,230 --> 00:07:07,920 So here, for instance, is the code that we keep coming back to. 153 00:07:07,920 --> 00:07:12,750 And this code here is the simplest of C programs that just says "hello, world." 154 00:07:12,750 --> 00:07:13,960 This is the source code. 155 00:07:13,960 --> 00:07:16,260 This, we claimed, was the corresponding machine code. 156 00:07:16,260 --> 00:07:18,810 And it was that program called a compiler that 157 00:07:18,810 --> 00:07:20,800 converted one into the other. 158 00:07:20,800 --> 00:07:23,100 But let's dive a little more deeply this week 159 00:07:23,100 --> 00:07:25,920 into what we mean by compiling code. 160 00:07:25,920 --> 00:07:28,410 Like what is happening so that by day's end, 161 00:07:28,410 --> 00:07:30,910 nothing really feels like magic anymore.
162 00:07:30,910 --> 00:07:33,540 It's not just that it goes from source code to machine code 163 00:07:33,540 --> 00:07:37,020 and that's that, you understand what's actually being done for you, 164 00:07:37,020 --> 00:07:40,694 and frankly, what other humans have done over the decades to make 165 00:07:40,694 --> 00:07:45,597 make as beautifully abstract and as simple as it now might seem to be. 166 00:07:45,597 --> 00:07:47,430 So here are a couple of commands that you've 167 00:07:47,430 --> 00:07:50,305 been in the habit of running when you want to first compile your code 168 00:07:50,305 --> 00:07:51,930 and then execute your code. 169 00:07:51,930 --> 00:07:56,280 But it turns out that make is actually running another command for you. 170 00:07:56,280 --> 00:07:59,190 The first of several white lies we'll tell in the course 171 00:07:59,190 --> 00:08:02,040 is that make itself is not a compiler, per se. 172 00:08:02,040 --> 00:08:06,580 It's actually a program that automatically runs a compiler for you. 173 00:08:06,580 --> 00:08:07,770 And by that, I mean this. 174 00:08:07,770 --> 00:08:13,650 Let me go over to VS Code here and let me create our familiar hello.c program. 175 00:08:13,650 --> 00:08:20,310 And I'm going to go ahead and do include stdio.h, int main void, and inside 176 00:08:20,310 --> 00:08:25,027 of the curly braces, printf "hello," comma, "world," backslash n semicolon. 177 00:08:25,027 --> 00:08:27,360 So that's the code that we keep writing again and again. 178 00:08:27,360 --> 00:08:31,932 And up until now, if I wanted to compile that, I would do make hello 179 00:08:31,932 --> 00:08:35,820 dot slash hello, and voila, now my program is made 180 00:08:35,820 --> 00:08:37,980 and it actually executes. 
181 00:08:37,980 --> 00:08:40,289 But what's actually going on underneath the hood 182 00:08:40,289 --> 00:08:43,799 there is that make is running an actual compiler for you, 183 00:08:43,799 --> 00:08:46,980 and the reveal today is that the compiler we have been using 184 00:08:46,980 --> 00:08:49,170 is something called Clang for C language. 185 00:08:49,170 --> 00:08:51,540 And this is just another program whose purpose in life 186 00:08:51,540 --> 00:08:54,510 is actually to do the conversion of source code to machine code. 187 00:08:54,510 --> 00:08:57,360 But it turns out that Clang by itself can 188 00:08:57,360 --> 00:09:00,770 be used very simply like you see here, clang hello.c, 189 00:09:00,770 --> 00:09:04,563 but it doesn't behave nearly as user-friendly as you might like. 190 00:09:04,563 --> 00:09:06,480 So in particular, let me go ahead and do this. 191 00:09:06,480 --> 00:09:08,960 I'm going to go ahead and remove my compiled program 192 00:09:08,960 --> 00:09:12,830 by running rm for remove, which I alluded to briefly last time. 193 00:09:12,830 --> 00:09:16,260 And then I'm going to say y for yes, remove that regular file. 194 00:09:16,260 --> 00:09:21,800 And if I go ahead now and run just clang of hello.c and hit Enter, 195 00:09:21,800 --> 00:09:25,140 it seems to be successful, at least insofar as there's no error messages. 196 00:09:25,140 --> 00:09:27,530 But if I try to do dot slash hello, Enter, 197 00:09:27,530 --> 00:09:31,070 there is no such file or directory called hello. 198 00:09:31,070 --> 00:09:34,940 That is because by default, Clang somewhat goofily like just 199 00:09:34,940 --> 00:09:37,670 outputs a file name called a dot out. 200 00:09:37,670 --> 00:09:38,480 Like why a? 201 00:09:38,480 --> 00:09:42,200 Well, it's sort of a simple name. 
a dot out, technically for assembler output, 202 00:09:42,200 --> 00:09:44,270 but this just means this is the default file 203 00:09:44,270 --> 00:09:45,770 name that Clang is going to give us. 204 00:09:45,770 --> 00:09:49,790 So OK, it turns out I can do dot slash a dot out Enter, and voila, 205 00:09:49,790 --> 00:09:53,723 that now is my program, but that's just a stupid name for a program. 206 00:09:53,723 --> 00:09:54,890 It's not very user-friendly. 207 00:09:54,890 --> 00:09:56,598 It's certainly not an icon you would want 208 00:09:56,598 --> 00:09:58,680 to put on people's desktops or phones. 209 00:09:58,680 --> 00:10:00,070 So how can we do better? 210 00:10:00,070 --> 00:10:03,600 Well, it turns out, with Clang, we can configure it using 211 00:10:03,600 --> 00:10:05,983 what we'll call command line arguments. 212 00:10:05,983 --> 00:10:09,150 And command line arguments are actually something we've been using thus far, 213 00:10:09,150 --> 00:10:12,390 we just didn't slap this word on it, but command line arguments 214 00:10:12,390 --> 00:10:15,600 are additional words or shorthand notation 215 00:10:15,600 --> 00:10:18,660 that you typed at your command prompt that somehow 216 00:10:18,660 --> 00:10:21,270 modify the behavior of a program. 217 00:10:21,270 --> 00:10:23,310 And you can perhaps guess where this is going. 218 00:10:23,310 --> 00:10:28,140 It turns out that if I actually want to create a program called hello-- 219 00:10:28,140 --> 00:10:31,200 not a.out, which is the default, I can actually 220 00:10:31,200 --> 00:10:36,420 do this-- clang, space, dash lowercase o, space, hello, 221 00:10:36,420 --> 00:10:40,260 or whatever I want to call the thing, space, hello.c. 222 00:10:40,260 --> 00:10:42,630 And now if I hit Enter, nothing seems to happen, 223 00:10:42,630 --> 00:10:48,490 but now if I do ./hello and Enter, now I've actually got that program. 224 00:10:48,490 --> 00:10:49,737 So why is make useful? 
225 00:10:49,737 --> 00:10:51,570 Well, it just saves us the trouble of having 226 00:10:51,570 --> 00:10:55,230 to type out this longer command any time 227 00:10:55,230 --> 00:10:56,940 we actually want to compile the code. 228 00:10:56,940 --> 00:10:59,430 But in fact, it gets even worse than that 229 00:10:59,430 --> 00:11:01,860 with commands like clang or compilers in general 230 00:11:01,860 --> 00:11:04,470 because consider this code here. 231 00:11:04,470 --> 00:11:08,010 Not just the version of "hello, world," but maybe the second version wherein 232 00:11:08,010 --> 00:11:11,700 last week, I started to get user input by adding the CS50 Library using 233 00:11:11,700 --> 00:11:14,370 get_string and then saying, "hello," comma, "David." 234 00:11:14,370 --> 00:11:18,210 Well, if I go back to VS Code and I modify this program 235 00:11:18,210 --> 00:11:19,810 to be that same one-- 236 00:11:19,810 --> 00:11:23,490 so let me go ahead and include cs50.h at the top. 237 00:11:23,490 --> 00:11:27,000 Let me get rid of this simple print line and instead give myself 238 00:11:27,000 --> 00:11:33,510 a string called name equals get_string, "What's your name?" 239 00:11:33,510 --> 00:11:35,610 Question mark, just like we did in Scratch. 240 00:11:35,610 --> 00:11:39,510 Then I can do printf, quote-unquote, "hello," comma. 241 00:11:39,510 --> 00:11:41,532 And previously I typed "world." 242 00:11:41,532 --> 00:11:44,490 I obviously don't want to type "David" because I want it to be dynamic. 243 00:11:44,490 --> 00:11:47,430 What did I type last week as a placeholder? 244 00:11:47,430 --> 00:11:50,980 So yeah, just-- not Command-S, but %s. So %s in this case, 245 00:11:50,980 --> 00:11:53,070 which is a placeholder for any such string. 246 00:11:53,070 --> 00:11:56,550 Then I can still do my new line, close, quote, comma, and then 247 00:11:56,550 --> 00:12:00,630 I can substitute in something like the value of the name variable.
248 00:12:00,630 --> 00:12:03,430 All right, so if I go ahead now and compile this, 249 00:12:03,430 --> 00:12:06,300 now last week, I could just do make hello and I'm on my way, 250 00:12:06,300 --> 00:12:07,570 it worked just fine. 251 00:12:07,570 --> 00:12:10,440 But if I instead do clang manually, it turns out 252 00:12:10,440 --> 00:12:16,650 that this is not going to be sufficient now. clang -o hello, space, hello.c. 253 00:12:16,650 --> 00:12:19,200 Exact same thing I typed a moment ago, but I 254 00:12:19,200 --> 00:12:21,940 think I'm going to see some errors. 255 00:12:21,940 --> 00:12:24,580 So what's this error hinting at here? 256 00:12:24,580 --> 00:12:27,120 Well, at the very bottom, it's a bit arcane with its output, 257 00:12:27,120 --> 00:12:30,400 and much of this you can ignore, but there are some certain key words. 258 00:12:30,400 --> 00:12:33,240 What's the first maybe keyword you recognize in these three 259 00:12:33,240 --> 00:12:36,130 lines of erroneous output? 260 00:12:36,130 --> 00:12:37,273 So it mentions main. 261 00:12:37,273 --> 00:12:40,440 That's not that much of a clue because that's the only thing I wrote so far. 262 00:12:40,440 --> 00:12:42,060 Second line, though, get_string. 263 00:12:42,060 --> 00:12:46,050 There's some issue with an undefined reference to get_string. 264 00:12:46,050 --> 00:12:47,590 Now why might that be? 265 00:12:47,590 --> 00:12:51,820 I did include cs50.h, but that's apparently not 266 00:12:51,820 --> 00:12:54,520 enough to teach the compiler about get_string. 267 00:12:54,520 --> 00:12:58,630 Well, it turns out that if you're using a third-party library, one 268 00:12:58,630 --> 00:13:02,740 that doesn't necessarily come with C the language, something like CS50's, it 269 00:13:02,740 --> 00:13:05,860 turns out that you additionally have to tell the compiler that you 270 00:13:05,860 --> 00:13:07,060 want to use that library. 
271 00:13:07,060 --> 00:13:08,890 And not just by including the header file, 272 00:13:08,890 --> 00:13:11,450 but by an additional command as well. 273 00:13:11,450 --> 00:13:15,010 So when you run Clang, you want to provide an additional 274 00:13:15,010 --> 00:13:16,900 command line argument. 275 00:13:16,900 --> 00:13:21,580 Literally -l for library, which is a term I used last week, cs50. 276 00:13:21,580 --> 00:13:23,620 A library is just code that someone else wrote 277 00:13:23,620 --> 00:13:25,640 that you want to use in your project. 278 00:13:25,640 --> 00:13:29,380 So if I really want to compile this version that uses the CS50 Library, 279 00:13:29,380 --> 00:13:34,660 I can still do clang -o hello hello.c, but before I finish my thought, 280 00:13:34,660 --> 00:13:40,450 I need to tell the compiler to link, so to speak, in the library CS50. 281 00:13:40,450 --> 00:13:44,350 And now I hit Enter, the error message goes away, I can do ./hello, 282 00:13:44,350 --> 00:13:47,410 I can type in my name, and voila, we're back to week 1. 283 00:13:47,410 --> 00:13:49,987 And this is why, suffice it to say, we introduce make, 284 00:13:49,987 --> 00:13:51,070 which is not a CS50 thing. 285 00:13:51,070 --> 00:13:54,070 This is a popular tool that real people in the real world 286 00:13:54,070 --> 00:13:56,480 use to automate these kinds of processes. 287 00:13:56,480 --> 00:13:59,050 So unbeknownst to you, make has been using 288 00:13:59,050 --> 00:14:03,670 the -o for you. make, unbeknownst to you, has been using -l cs50 for you 289 00:14:03,670 --> 00:14:06,650 just because it makes our lives easier. 290 00:14:06,650 --> 00:14:08,560 But today, we thought we would deliberately 291 00:14:08,560 --> 00:14:11,440 peel back this layer so we at least understand 292 00:14:11,440 --> 00:14:16,300 what's going on behind this abstraction that is make itself 293 00:14:16,300 --> 00:14:17,750 and compiling more generally.
294 00:14:17,750 --> 00:14:21,880 So let me propose that compiling itself is not quite what 295 00:14:21,880 --> 00:14:22,960 we've described it to be. 296 00:14:22,960 --> 00:14:25,840 Compiling is like this catch-all phrase that apparently I claim 297 00:14:25,840 --> 00:14:27,650 goes from source code to machine code. 298 00:14:27,650 --> 00:14:30,710 But if we really want to get pedantic, which we'll do briefly, 299 00:14:30,710 --> 00:14:33,640 but this is not a sign of things to come because this, too, 300 00:14:33,640 --> 00:14:39,250 will be abstracted away, compiling is just one of four steps that are involved 301 00:14:39,250 --> 00:14:43,010 in turning source code that you and I write into those 0's and 1's. 302 00:14:43,010 --> 00:14:45,010 But through an understanding of these four steps 303 00:14:45,010 --> 00:14:46,780 today, you'll hopefully better understand 304 00:14:46,780 --> 00:14:49,480 how to troubleshoot issues like that and just know 305 00:14:49,480 --> 00:14:51,680 what's happening because it's not, in fact, magic. 306 00:14:51,680 --> 00:14:55,850 It's just the result of years of humans developing these four steps here. 307 00:14:55,850 --> 00:14:58,870 So when you run make, what's happening? 308 00:14:58,870 --> 00:15:02,450 Or in turn, when you run clang, four different things are happening. 309 00:15:02,450 --> 00:15:04,360 And the first one is called pre-processing. 310 00:15:04,360 --> 00:15:05,720 So what is this all about? 311 00:15:05,720 --> 00:15:07,270 Well, let's consider this code here. 312 00:15:07,270 --> 00:15:09,730 And this code is a little bit interesting 313 00:15:09,730 --> 00:15:13,850 insofar as it's one of the more complicated examples from last week. 314 00:15:13,850 --> 00:15:18,550 And you'll notice, for instance, that I had include stdio at the top 315 00:15:18,550 --> 00:15:19,900 so I could use printf. 316 00:15:19,900 --> 00:15:24,340 I had main down here, whose purpose in life was just to meow three times.
317 00:15:24,340 --> 00:15:27,880 And then recall we made our own meow function just like we did in week 0 318 00:15:27,880 --> 00:15:31,630 with Scratch that just printed out, quote-unquote, "meow." 319 00:15:31,630 --> 00:15:37,210 But I also included this line here, which we called what? 320 00:15:37,210 --> 00:15:39,760 This was a prototype. 321 00:15:39,760 --> 00:15:41,470 And why did I have to include it there? 322 00:15:41,470 --> 00:15:45,070 Or equivalently, what would happen if I didn't include a prototype up 323 00:15:45,070 --> 00:15:45,790 at the top there? 324 00:15:45,790 --> 00:15:46,693 Yeah? 325 00:15:46,693 --> 00:15:51,255 AUDIENCE: [INAUDIBLE] 326 00:15:51,255 --> 00:15:52,130 DAVID MALAN: Exactly. 327 00:15:52,130 --> 00:15:55,820 If I didn't include it up here, the program, when trying to compile main, 328 00:15:55,820 --> 00:15:59,370 would not know what meow is because it's not defined until later. 329 00:15:59,370 --> 00:16:02,210 So this is kind of like a little hint of what is to come. 330 00:16:02,210 --> 00:16:05,750 Alternatively, we could just move this whole thing up at the top of the file, 331 00:16:05,750 --> 00:16:08,120 but I claim that just devolves into a big mess 332 00:16:08,120 --> 00:16:10,250 eventually once you have many different functions. 333 00:16:10,250 --> 00:16:13,590 Like you can't realistically put them all at the top to solve this problem. 334 00:16:13,590 --> 00:16:15,870 So these prototypes solve that problem. 335 00:16:15,870 --> 00:16:16,760 So nothing new here. 336 00:16:16,760 --> 00:16:20,750 Just a reminder of what motivated this one line of prototype. 337 00:16:20,750 --> 00:16:24,290 Now let's consider this simpler program, which 338 00:16:24,290 --> 00:16:26,945 is just the one we wrote most recently in VS Code. 339 00:16:26,945 --> 00:16:28,820 This program prompts the human for their name 340 00:16:28,820 --> 00:16:30,590 and then says hello to that person. 
341 00:16:30,590 --> 00:16:33,710 But it has two includes at the top of the file. 342 00:16:33,710 --> 00:16:37,070 And in fact, any line of C that starts with this hash symbol 343 00:16:37,070 --> 00:16:40,220 is what we'll call now a preprocessor directive. 344 00:16:40,220 --> 00:16:42,950 It's not really a word you need to remember in your vocabulary, 345 00:16:42,950 --> 00:16:46,310 but it is a little bit different from most every other line 346 00:16:46,310 --> 00:16:47,900 because it starts with that hash. 347 00:16:47,900 --> 00:16:50,420 That's a special symbol in C. 348 00:16:50,420 --> 00:16:52,750 And what this means is the following. 349 00:16:52,750 --> 00:16:57,570 This very first line, cs50.h, is indeed a file that I and CS50 staff 350 00:16:57,570 --> 00:17:02,400 wrote and we installed somewhere in VS Code for you, somewhere in the cloud. 351 00:17:02,400 --> 00:17:07,859 And I've claimed you need to use this header file in order to use get_string. 352 00:17:07,859 --> 00:17:12,290 So just logically, what is probably inside of cs50.h? 353 00:17:12,290 --> 00:17:15,089 354 00:17:15,089 --> 00:17:16,170 Yeah? 355 00:17:16,170 --> 00:17:17,610 AUDIENCE: Function [INAUDIBLE]. 356 00:17:17,610 --> 00:17:23,628 357 00:17:23,628 --> 00:17:24,670 DAVID MALAN: Super close. 358 00:17:24,670 --> 00:17:27,589 So the function called get_string that does the getting of a string, 359 00:17:27,589 --> 00:17:30,038 but it's not quite as much as the function itself. 360 00:17:30,038 --> 00:17:33,080 It's actually a little bit less than that, but you're on the right track. 361 00:17:33,080 --> 00:17:37,940 What is inside of cs50.h, presumably? 362 00:17:37,940 --> 00:17:40,560 Just a what? 363 00:17:40,560 --> 00:17:43,770 Just a prototype for? 364 00:17:43,770 --> 00:17:44,820 Which function? 365 00:17:44,820 --> 00:17:45,750 get_string. 
366 00:17:45,750 --> 00:17:48,390 So admittedly, there's some other stuff in there, too, 367 00:17:48,390 --> 00:17:51,930 but the important line for today's discussion is that inside of cs50.h 368 00:17:51,930 --> 00:17:55,740 is indeed one line of code that defines what the return value, what 369 00:17:55,740 --> 00:17:59,610 the name is, and what the arguments, if any, are to get_string, 370 00:17:59,610 --> 00:18:00,880 and some other stuff. 371 00:18:00,880 --> 00:18:05,130 And so what happens effectively when you compile your code, 372 00:18:05,130 --> 00:18:07,080 step 1 is this pre-processing line. 373 00:18:07,080 --> 00:18:09,960 And essentially, there is some code that someone else wrote inside 374 00:18:09,960 --> 00:18:13,710 of the clang compiler that looks for a line that starts with hash include, 375 00:18:13,710 --> 00:18:17,580 and when it sees that, it goes and finds this file and effectively copies 376 00:18:17,580 --> 00:18:21,240 and pastes the contents of that file right there into your code 377 00:18:21,240 --> 00:18:23,130 so that you don't have to go find the file, 378 00:18:23,130 --> 00:18:25,840 copy and paste it, and make a mess of your own code. 379 00:18:25,840 --> 00:18:29,550 So in particular, it's effectively as though you're copying and pasting 380 00:18:29,550 --> 00:18:32,910 the prototype of get_string to the very top of your file, 381 00:18:32,910 --> 00:18:35,550 thereby teaching the compiler that it exists. 382 00:18:35,550 --> 00:18:38,550 By that same logic, what is probably in stdio.h? 383 00:18:38,550 --> 00:18:41,740 384 00:18:41,740 --> 00:18:43,690 The prototype for? 385 00:18:43,690 --> 00:18:44,710 For printf. 386 00:18:44,710 --> 00:18:46,280 And indeed, exactly that. 
387 00:18:46,280 --> 00:18:49,450 So this line effectively gets replaced with the equivalent 388 00:18:49,450 --> 00:18:52,150 of the prototype for printf, which, for today's purposes, 389 00:18:52,150 --> 00:18:55,210 is a bit more complicated, so let me wave my hand at the dot-dot-dot 390 00:18:55,210 --> 00:18:57,850 just because it takes a variable number of arguments 391 00:18:57,850 --> 00:19:00,760 depending on how many placeholders or format codes you have. 392 00:19:00,760 --> 00:19:03,290 But effectively, that, too, is what's happening. 393 00:19:03,290 --> 00:19:06,190 So the preprocessor step, step 1 of 4, just 394 00:19:06,190 --> 00:19:08,097 does that find and replace, if you will. 395 00:19:08,097 --> 00:19:10,430 Now there's some-- again, some other stuff in that file, 396 00:19:10,430 --> 00:19:12,580 and this, too, is kind of a white lie. printf 397 00:19:12,580 --> 00:19:15,790 probably has its own file because that's a really big library, 398 00:19:15,790 --> 00:19:17,930 but the essence of it is exactly this. 399 00:19:17,930 --> 00:19:21,010 So preprocessing converts all of those hash 400 00:19:21,010 --> 00:19:24,520 include lines to whatever the underlying prototypes are 401 00:19:24,520 --> 00:19:26,650 within the file plus some other stuff. 402 00:19:26,650 --> 00:19:29,920 Now compiling we use it as this catch-all phrase, but it turns out, 403 00:19:29,920 --> 00:19:32,100 it has a very specific meaning that's worth 404 00:19:32,100 --> 00:19:33,850 knowing about even though after today, you 405 00:19:33,850 --> 00:19:37,120 can go back to using compiling as the sort of catch-all phrase. 406 00:19:37,120 --> 00:19:41,390 So when you've got this same code here after the pre-processing step 407 00:19:41,390 --> 00:19:42,420 has happened. 408 00:19:42,420 --> 00:19:44,900 So this is essentially happening in the computer's memory. 409 00:19:44,900 --> 00:19:49,400 It's not changing your hello.c file permanently or anything like that. 
410 00:19:49,400 --> 00:19:54,890 This code gets, quote-unquote, "compiled" into something 411 00:19:54,890 --> 00:19:57,120 that looks more like this. 412 00:19:57,120 --> 00:19:59,660 And this is a scarier language that we won't spend time 413 00:19:59,660 --> 00:20:00,860 on in this particular class. 414 00:20:00,860 --> 00:20:02,690 This is what's known as assembly language. 415 00:20:02,690 --> 00:20:06,710 And back in the day, before there was C, humans 416 00:20:06,710 --> 00:20:09,110 wrote this to program their computers. 417 00:20:09,110 --> 00:20:12,440 Similarly, before there was assembly code back in the day, 418 00:20:12,440 --> 00:20:15,163 humans very initially used what instead? 419 00:20:15,163 --> 00:20:16,080 AUDIENCE: 0's and 1's. 420 00:20:16,080 --> 00:20:19,430 DAVID MALAN: So 0's and 1's-- like they actually wrote the machine code 421 00:20:19,430 --> 00:20:23,360 painfully, be it in code or be it in punch cards like physical objects 422 00:20:23,360 --> 00:20:24,000 or the like. 423 00:20:24,000 --> 00:20:25,730 So again, these are sort of abstractions, 424 00:20:25,730 --> 00:20:27,660 but we're rewinding for today in time. 425 00:20:27,660 --> 00:20:30,860 But what this compiler for C is doing is converting C 426 00:20:30,860 --> 00:20:33,380 into this other language called assembly language. 427 00:20:33,380 --> 00:20:35,630 And even though this looks very esoteric, 428 00:20:35,630 --> 00:20:37,940 there's at least some juicy things in here. 429 00:20:37,940 --> 00:20:40,580 If I highlight get_string, it's mentioned in this code. 430 00:20:40,580 --> 00:20:42,560 printf is mentioned in this code. 431 00:20:42,560 --> 00:20:44,540 And even some of these keywords here that 432 00:20:44,540 --> 00:20:48,320 are spelled a bit weirdly, this relates to subtracting and moving 433 00:20:48,320 --> 00:20:51,480 something in memory and calling a function, calling a function. 
434 00:20:51,480 --> 00:20:53,450 So there's some semantics that are probably 435 00:20:53,450 --> 00:20:56,690 somewhat familiar even though this is not code we ourselves will write. 436 00:20:56,690 --> 00:20:59,670 But unfortunately, this is not yet machine code, 437 00:20:59,670 --> 00:21:02,370 and that's where step 3 comes in. 438 00:21:02,370 --> 00:21:06,470 So step 3 of this four-step process is technically called assembling. 439 00:21:06,470 --> 00:21:12,320 And assembling just takes that assembly code and converts it, thankfully, 440 00:21:12,320 --> 00:21:15,650 to the thing we do care about, the 0's and 1's. 441 00:21:15,650 --> 00:21:18,830 So assembling takes assembly code converts it to 0's and 1's. 442 00:21:18,830 --> 00:21:21,020 As an aside, and I alluded to this earlier, 443 00:21:21,020 --> 00:21:26,810 the reason that Clang names its files a.out by default, assembler output, 444 00:21:26,810 --> 00:21:30,740 is a side effect of that being one of the steps in this process, 445 00:21:30,740 --> 00:21:33,740 dealing with assembly language and its subsequent output. 446 00:21:33,740 --> 00:21:36,680 All right, so here are some 0's and 1's, but unfortunately, there's 447 00:21:36,680 --> 00:21:41,340 still that fourth and final step, which is a word that I also used earlier, 448 00:21:41,340 --> 00:21:42,620 namely linking. 449 00:21:42,620 --> 00:21:45,420 So let me take a step back and look at this code here. 450 00:21:45,420 --> 00:21:50,090 And even though this code is exactly as I wrote in VS Code in hello.c-- 451 00:21:50,090 --> 00:21:52,310 so no copying and pasting, no prototypes have 452 00:21:52,310 --> 00:21:55,610 been plugged in here, this is my code, technically, there's 453 00:21:55,610 --> 00:21:59,270 three different files involved in compiling even something relatively 454 00:21:59,270 --> 00:22:00,170 simple like this. 455 00:22:00,170 --> 00:22:03,560 There's obviously this thing itself, hello.c, which I wrote. 
456 00:22:03,560 --> 00:22:08,600 There's apparently cs50.h, and there's apparently stdio.h. 457 00:22:08,600 --> 00:22:12,650 But technically-- and you don't have to know this file name, per se, somewhere 458 00:22:12,650 --> 00:22:15,540 else on the computer's hard drive, so to speak, 459 00:22:15,540 --> 00:22:19,520 is a cs50.c file, which actually contains 460 00:22:19,520 --> 00:22:22,910 the staff's implementation of get_string and get_int and get_float 461 00:22:22,910 --> 00:22:24,320 and all of those other functions. 462 00:22:24,320 --> 00:22:28,460 Somewhere on the server's hard drive is stdio.c 463 00:22:28,460 --> 00:22:31,890 that implements printf and all of these other functions as well. 464 00:22:31,890 --> 00:22:34,940 So the dot c is just inferred from the dot h here. 465 00:22:34,940 --> 00:22:38,450 You don't ever mention the dot c file, but someone else wrote those files, 466 00:22:38,450 --> 00:22:41,570 someone else stored them in the server for you-- 467 00:22:41,570 --> 00:22:43,220 CS50 staff in this case. 468 00:22:43,220 --> 00:22:47,270 So technically, even when compiling a relatively short program like this, 469 00:22:47,270 --> 00:22:51,920 you're really combining three files at least at the end of the day. 470 00:22:51,920 --> 00:22:54,020 And I'll write them from left to right. hello.c, 471 00:22:54,020 --> 00:23:01,920 which I wrote, cs50.c, which the staff wrote, and then stdio.c as well. 472 00:23:01,920 --> 00:23:04,010 So somewhere there's these three files. 473 00:23:04,010 --> 00:23:08,540 And Clang, our compiler, needs to compile each of these 474 00:23:08,540 --> 00:23:12,500 into the corresponding 0's and 1's. 475 00:23:12,500 --> 00:23:17,300 Lastly, this is not yet sufficient because these 0's and 1's haven't 476 00:23:17,300 --> 00:23:18,333 been linked together. 
477 00:23:18,333 --> 00:23:20,750 I mean, I deliberately left a gap here to imply that these 478 00:23:20,750 --> 00:23:22,880 are three separately-compiled files. 479 00:23:22,880 --> 00:23:25,760 So that fourth and final step called linking 480 00:23:25,760 --> 00:23:28,430 takes all of these 0's and 1's and in an intelligent way 481 00:23:28,430 --> 00:23:35,300 combines them into just one final file named hello, named a.out, 482 00:23:35,300 --> 00:23:37,680 whatever the file name is of choice. 483 00:23:37,680 --> 00:23:40,820 So what you and I for the past week have just been calling compiling-- 484 00:23:40,820 --> 00:23:43,550 and that's what a normal person will use henceforth 485 00:23:43,550 --> 00:23:46,490 to describe this whole process, technically, there's 486 00:23:46,490 --> 00:23:49,250 these four different steps underneath the hood, each of which 487 00:23:49,250 --> 00:23:55,067 is sort of a representative of an evolution of technology over the years. 488 00:23:55,067 --> 00:23:56,900 And nowadays, if we fast forward a few weeks 489 00:23:56,900 --> 00:23:59,780 in class, when we start talking about Python, which 490 00:23:59,780 --> 00:24:03,710 is another more modern language, that, too, is going to be conceptually even 491 00:24:03,710 --> 00:24:06,090 higher level, even though underneath the hood, 492 00:24:06,090 --> 00:24:09,330 there's going to be some lower-level principles at work. 493 00:24:09,330 --> 00:24:16,010 So any questions on just terminology or these processes known as compiling? 494 00:24:16,010 --> 00:24:17,462 Yeah? 495 00:24:17,462 --> 00:24:19,879 AUDIENCE: I didn't really understand what compiling means. 496 00:24:19,879 --> 00:24:21,360 [INAUDIBLE] 497 00:24:21,360 --> 00:24:22,110 DAVID MALAN: Sure. 498 00:24:22,110 --> 00:24:29,400 Compiling, if I rewind, is the process of taking your source code, which 499 00:24:29,400 --> 00:24:35,260 looks like this, recall-- whoops, this, and converting it into assembly code.
500 00:24:35,260 --> 00:24:38,640 So preprocessing just converts all of those hash 501 00:24:38,640 --> 00:24:41,470 include lines and a few others to their equivalents. 502 00:24:41,470 --> 00:24:42,210 So that's step 1. 503 00:24:42,210 --> 00:24:46,920 Compiling converts the C code into the underlying assembly code. 504 00:24:46,920 --> 00:24:51,750 The assembling step, step 3, converts the assembly code to 0's and 1's. 505 00:24:51,750 --> 00:24:54,480 And then the fourth step, linking, combines 506 00:24:54,480 --> 00:24:57,960 all of the 0's and 1's from the one, the two, the three or more files 507 00:24:57,960 --> 00:25:00,510 that are involved in your project and links them 508 00:25:00,510 --> 00:25:02,310 all together for you magically. 509 00:25:02,310 --> 00:25:06,060 But at the end of the day, all of this is happening automatically for you. 510 00:25:06,060 --> 00:25:10,530 If I jump now to the end here, whereby just by running 511 00:25:10,530 --> 00:25:14,310 make, which, in turn, runs clang for you, like all of this 512 00:25:14,310 --> 00:25:15,900 is abstracted away. 513 00:25:15,900 --> 00:25:19,620 But the key here is that even with these commands that we've been running, 514 00:25:19,620 --> 00:25:22,510 be it the make command or the clang command, 515 00:25:22,510 --> 00:25:28,570 everything should be explainable what you are typing at the prompt 516 00:25:28,570 --> 00:25:29,410 ultimately. 517 00:25:29,410 --> 00:25:31,300 Each of those things has a purpose. 518 00:25:31,300 --> 00:25:33,850 So any questions, then, on what we've just 519 00:25:33,850 --> 00:25:38,018 now called compiling even though it's only when you take another CS 520 00:25:38,018 --> 00:25:40,060 course that you might spend more time on assembly 521 00:25:40,060 --> 00:25:42,940 language or these lower-level details? 522 00:25:42,940 --> 00:25:43,480 Yeah? 
523 00:25:43,480 --> 00:25:47,264 AUDIENCE: [INAUDIBLE] 524 00:25:47,264 --> 00:25:49,092 525 00:25:49,092 --> 00:25:50,300 DAVID MALAN: A good question. 526 00:25:50,300 --> 00:25:51,740 Are there other types of compilers? 527 00:25:51,740 --> 00:25:52,240 Yes. 528 00:25:52,240 --> 00:25:57,320 Back when I took CS50, I used a popular compiler called GCC, the GNU Compiler 529 00:25:57,320 --> 00:26:00,650 Collection, which still exists actually in the code space 530 00:26:00,650 --> 00:26:02,120 that you're using for CS50. 531 00:26:02,120 --> 00:26:04,110 Clang is somewhat more recent. 532 00:26:04,110 --> 00:26:05,153 It's gaining popularity. 533 00:26:05,153 --> 00:26:07,820 And frankly, we use it in large part because its error messages 534 00:26:07,820 --> 00:26:09,320 are slightly more user-friendly. 535 00:26:09,320 --> 00:26:12,570 You might not believe us because if you encountered some errors with your code 536 00:26:12,570 --> 00:26:16,370 this past week, they were probably just as arcane as the error messages I saw, 537 00:26:16,370 --> 00:26:18,598 but it's better than it was some years ago. 538 00:26:18,598 --> 00:26:20,390 And there's alternatives to compiling, too, 539 00:26:20,390 --> 00:26:24,100 but more on that when we get to Python as well. 540 00:26:24,100 --> 00:26:26,080 Other questions? 541 00:26:26,080 --> 00:26:26,580 No? 542 00:26:26,580 --> 00:26:27,080 All right. 543 00:26:27,080 --> 00:26:31,020 Well, what are the implications of the fact that we're going from source code 544 00:26:31,020 --> 00:26:32,190 to machine code? 545 00:26:32,190 --> 00:26:35,010 Well, it stands to reason that if you can compile code, 546 00:26:35,010 --> 00:26:38,970 maybe you can decompile it-- that is, go in the reverse direction. 547 00:26:38,970 --> 00:26:42,010 Go from 0's and 1's to actual source code.
548 00:26:42,010 --> 00:26:45,477 Now that would be handy if you want to go in as a programmer and change 549 00:26:45,477 --> 00:26:48,060 something in a program that you or someone else already wrote. 550 00:26:48,060 --> 00:26:51,330 It's maybe not ideal for your intellectual property, 551 00:26:51,330 --> 00:26:54,780 though, if you are the person who wrote that program in the first place. 552 00:26:54,780 --> 00:26:57,810 If you are Microsoft and you wrote Microsoft Word or Excel 553 00:26:57,810 --> 00:27:01,290 that people with Macs and PCs and phones have installed on their devices, 554 00:27:01,290 --> 00:27:04,440 it doesn't actually sound very appealing if any old customer 555 00:27:04,440 --> 00:27:08,830 can take those 0's and 1's and reverse them, reverse engineer them, 556 00:27:08,830 --> 00:27:11,157 so to speak, into the original source code 557 00:27:11,157 --> 00:27:13,740 because then they can have their own version of Microsoft Word 558 00:27:13,740 --> 00:27:17,100 and make changes to it without really having put in all of the R&D 559 00:27:17,100 --> 00:27:19,980 that it might have taken to build the first version thereof. 560 00:27:19,980 --> 00:27:22,720 But it turns out that reverse engineering-- 561 00:27:22,720 --> 00:27:26,050 so doing things in the opposite direction-- is easier 562 00:27:26,050 --> 00:27:29,740 said than done because there are multiple ways, as you've seen already, 563 00:27:29,740 --> 00:27:31,300 to implement programs. 564 00:27:31,300 --> 00:27:35,440 Like loops alone, you can use for loops, while loops, even do-while loops. 565 00:27:35,440 --> 00:27:37,540 And so there's other ways-- there's multiple ways 566 00:27:37,540 --> 00:27:38,960 to solve the same problem. 
567 00:27:38,960 --> 00:27:41,590 So even if you try to reverse engineer a program 568 00:27:41,590 --> 00:27:44,440 and convert machine code back to source code, 569 00:27:44,440 --> 00:27:48,170 there's not necessarily going to be an obvious way to do so. 570 00:27:48,170 --> 00:27:50,620 And the reality is, that it ends up being such a mess 571 00:27:50,620 --> 00:27:53,350 because you lose the variable names typically, 572 00:27:53,350 --> 00:27:57,070 you lose the function names typically, that what you end up looking at 573 00:27:57,070 --> 00:28:01,300 might very well be C code, but it's completely difficult for you, 574 00:28:01,300 --> 00:28:03,040 even a good programmer, to read. 575 00:28:03,040 --> 00:28:06,520 And generally, the mindset is, if you're really good enough 576 00:28:06,520 --> 00:28:09,782 to decompile code in that way and read it subsequently 577 00:28:09,782 --> 00:28:11,740 even without good variable names, good function 578 00:28:11,740 --> 00:28:14,950 names, good documentation and the like, could probably have just implemented 579 00:28:14,950 --> 00:28:18,340 the program in the first place yourself without jumping through those hoops. 580 00:28:18,340 --> 00:28:20,440 So there's some practicality pushing back 581 00:28:20,440 --> 00:28:25,420 on what are otherwise potential threats to, say, your intellectual property. 582 00:28:25,420 --> 00:28:28,150 But that's not going to be the case later on in the term when 583 00:28:28,150 --> 00:28:31,270 we do get to languages like Python to some extent, other languages 584 00:28:31,270 --> 00:28:32,200 like JavaScript. 585 00:28:32,200 --> 00:28:34,870 Some of those are actually going to be readable by anyone. 586 00:28:34,870 --> 00:28:36,790 Any of your customers, any of your friends, 587 00:28:36,790 --> 00:28:39,950 and your family that actually use your programs. 
588 00:28:39,950 --> 00:28:43,540 So with that said, let's introduce now another tool to our toolkit 589 00:28:43,540 --> 00:28:45,580 that will hopefully make some of the pain 590 00:28:45,580 --> 00:28:47,470 from this past week when you did encounter 591 00:28:47,470 --> 00:28:49,210 bugs a little more manageable. 592 00:28:49,210 --> 00:28:52,330 And indeed, part of the process of writing code to this day 593 00:28:52,330 --> 00:28:53,680 is debugging it. 594 00:28:53,680 --> 00:28:56,560 And it is a rare thing to write a program, 595 00:28:56,560 --> 00:29:01,450 be it in C or any other language, and get it 100% right the first time. 596 00:29:01,450 --> 00:29:05,360 I mean, to this day, I still, 20-plus years later, still write buggy code. 597 00:29:05,360 --> 00:29:08,695 Hopefully a little bit less of it, but any time you're adding a new feature, 598 00:29:08,695 --> 00:29:10,820 any time you're doing something for the first time, 599 00:29:10,820 --> 00:29:14,380 you're not necessarily going to see all of the possible mistakes. 600 00:29:14,380 --> 00:29:18,910 So even in industry, bugs are omnipresent, which is really to say, 601 00:29:18,910 --> 00:29:22,360 having techniques to debug code-- that is, eliminate bugs, 602 00:29:22,360 --> 00:29:23,740 is super compelling. 603 00:29:23,740 --> 00:29:26,920 Now just for a bit of history, here is Admiral Grace Hopper, 604 00:29:26,920 --> 00:29:30,230 who was actually in not only the military, 605 00:29:30,230 --> 00:29:33,070 but also on the faculty of Harvard years ago 606 00:29:33,070 --> 00:29:35,860 and worked on a Harvard computer called the Harvard Mark 607 00:29:35,860 --> 00:29:39,250 I, which is actually on display at the School of Engineering and Applied 608 00:29:39,250 --> 00:29:41,260 Sciences if you take a tour over there sometime. 
609 00:29:41,260 --> 00:29:44,230 But also when working on the Harvard Mark II, 610 00:29:44,230 --> 00:29:50,170 she is known for having at least popularized the phrase "bug" to mean 611 00:29:50,170 --> 00:29:53,350 a mistake in a computer's program-- 612 00:29:53,350 --> 00:29:55,240 a mistake in a computer's code. 613 00:29:55,240 --> 00:29:58,510 And the etymology of this supposedly is this here logbook 614 00:29:58,510 --> 00:30:02,320 wherein she and her colleagues were documenting processes being computed 615 00:30:02,320 --> 00:30:04,960 on computers, that a moth actually got stuck 616 00:30:04,960 --> 00:30:09,250 in one of the relays, one of the mechanical-- the electric relays inside 617 00:30:09,250 --> 00:30:13,450 of the very old now computer, and someone very cleverly 618 00:30:13,450 --> 00:30:16,657 wrote, "First actual case of bug being found." 619 00:30:16,657 --> 00:30:18,490 So it wasn't she who actually discovered it, 620 00:30:18,490 --> 00:30:22,450 but this was a story she was thereafter fond of telling as a famed computer 621 00:30:22,450 --> 00:30:23,860 scientist thereafter. 622 00:30:23,860 --> 00:30:28,210 We now know bugs to be all too familiar when it comes to writing our own code, 623 00:30:28,210 --> 00:30:31,060 and I thought I would deliberately write some buggy code based 624 00:30:31,060 --> 00:30:34,400 on some of the programs with which we experimented last week. 625 00:30:34,400 --> 00:30:37,270 So let me go back over to VS Code here and let 626 00:30:37,270 --> 00:30:44,290 me propose that I do something somewhat simplistic just like this to print out 627 00:30:44,290 --> 00:30:47,140 a column of bricks of height 3. 628 00:30:47,140 --> 00:30:50,440 So I'm going into VS Code and I'm going to deliberately call this program 629 00:30:50,440 --> 00:30:53,230 buggy.c because I intend to do this poorly. 630 00:30:53,230 --> 00:30:58,760 I'm going to include stdio.h as before, int main void as before. 
631 00:30:58,760 --> 00:31:01,630 And in here, if I want to print a pyramid of height 3, 632 00:31:01,630 --> 00:31:04,720 I'm going to do for int i gets-- 633 00:31:04,720 --> 00:31:06,910 all right, I'm still new to programming in my mind 634 00:31:06,910 --> 00:31:09,820 here, so I know I'm supposed to start counting at 0, OK. 635 00:31:09,820 --> 00:31:13,480 And I want to do this until I count up to 3, so I'm going to do that. 636 00:31:13,480 --> 00:31:16,700 And then i++ I remember from class in this way. 637 00:31:16,700 --> 00:31:20,500 And now I might go ahead and print out just a hash mark, backslash n, 638 00:31:20,500 --> 00:31:23,710 which I do want because I want to move this cursor to the next line 639 00:31:23,710 --> 00:31:24,790 to make this vertical. 640 00:31:24,790 --> 00:31:29,730 But of course, if you've noticed with your eye already, when I do make buggy, 641 00:31:29,730 --> 00:31:30,960 it compiles OK. 642 00:31:30,960 --> 00:31:33,640 So no typos, no syntactical errors. 643 00:31:33,640 --> 00:31:37,620 But when I run this, I'm going to see how many bricks. 644 00:31:37,620 --> 00:31:39,510 So four in this case. 645 00:31:39,510 --> 00:31:41,650 Now this is meant to be a simplistic example 646 00:31:41,650 --> 00:31:44,910 so that we don't spend time trying to figure out what the bug is, but rather, 647 00:31:44,910 --> 00:31:48,210 focus on techniques for actually identifying the bug. 648 00:31:48,210 --> 00:31:50,010 So-- finding, rather, the bug. 649 00:31:50,010 --> 00:31:52,170 So what's one of the first tools in your toolkit? 650 00:31:52,170 --> 00:31:55,470 Literally one you have already. printf is your friend. 651 00:31:55,470 --> 00:31:59,730 And it is a very quick and dirty tool for just seeing 652 00:31:59,730 --> 00:32:02,520 what's going on inside of the computer when 653 00:32:02,520 --> 00:32:06,550 you don't have more sophisticated tools or even the time to use them.
654 00:32:06,550 --> 00:32:09,750 And so in this case, for instance, what I'd propose is that-- 655 00:32:09,750 --> 00:32:11,610 all right, I'm obviously seeing four hashes. 656 00:32:11,610 --> 00:32:13,710 And let me play a little slow here. 657 00:32:13,710 --> 00:32:18,090 It'd be helpful for me to understand why logically I'm ending up with four, even 658 00:32:18,090 --> 00:32:21,360 though I'm starting at 0 like I remember from class and I'm going up to 3 659 00:32:21,360 --> 00:32:25,870 as we did in class, like I'm just not seeing it in this particular story. 660 00:32:25,870 --> 00:32:30,180 So what I would commonly do is go into my code and just help me see 661 00:32:30,180 --> 00:32:35,400 what's going on, and I might literally write a printf line like, i is %i, 662 00:32:35,400 --> 00:32:39,490 backslash n, comma, and then just print out the value of i. 663 00:32:39,490 --> 00:32:41,620 I just want to see on every iteration, what 664 00:32:41,620 --> 00:32:45,530 is i, what is i, what is i just to help me see what the computer already knows. 665 00:32:45,530 --> 00:32:49,900 So let me go ahead and recompile buggy, let me rerun buggy, 666 00:32:49,900 --> 00:32:51,910 and then let me make my terminal window bigger 667 00:32:51,910 --> 00:32:53,410 just to make clear what's going on. 668 00:32:53,410 --> 00:32:56,080 And now it's a little more pedantic. 669 00:32:56,080 --> 00:33:01,150 Now i is 0, I get a hash. i is 1, I get a hash. i is 2, I get a hash. 670 00:33:01,150 --> 00:33:04,310 Wait a minute. i is 3, I get a hash. 671 00:33:04,310 --> 00:33:07,250 So clearly now, it should be maybe more obvious to you, 672 00:33:07,250 --> 00:33:09,430 especially if the syntax itself is unfamiliar, 673 00:33:09,430 --> 00:33:11,680 I certainly don't want this last one printing, 674 00:33:11,680 --> 00:33:14,810 or maybe equivalently, I don't want the first one printing. 
675 00:33:14,810 --> 00:33:17,830 So I can fix this in a couple of ways, but the solution, 676 00:33:17,830 --> 00:33:22,810 the most canonical solution is probably to do what with my code? 677 00:33:22,810 --> 00:33:24,430 To change to what to what? 678 00:33:24,430 --> 00:33:25,402 Yeah? 679 00:33:25,402 --> 00:33:26,590 AUDIENCE: [INAUDIBLE] 680 00:33:26,590 --> 00:33:27,340 DAVID MALAN: Yeah. 681 00:33:27,340 --> 00:33:31,000 So change the less than or equal sign to just a less than sign. 682 00:33:31,000 --> 00:33:36,580 So even though this is like counting from 0 to 3 instead of 1 through 3, 683 00:33:36,580 --> 00:33:39,890 it's the more typical programmatic way to write code like this. 684 00:33:39,890 --> 00:33:43,600 And now, of course, if I do make buggy-- 685 00:33:43,600 --> 00:33:46,840 and I'll increase my terminal window again, ./buggy, 686 00:33:46,840 --> 00:33:49,360 now I see what's going on inside of the code. 687 00:33:49,360 --> 00:33:53,080 Now it matches my expectations, and so now the bug is gone. 688 00:33:53,080 --> 00:33:55,330 Now of course, if I'm submitting this or shipping it, 689 00:33:55,330 --> 00:33:57,190 I should delete the temporary printf. 690 00:33:57,190 --> 00:34:00,610 And let me disclaim that using printf in this way just to help you 691 00:34:00,610 --> 00:34:03,100 see what's going on is generally a good thing, 692 00:34:03,100 --> 00:34:06,370 but generally adding a printf and a printf and a printf and a printf-- 693 00:34:06,370 --> 00:34:10,665 like it starts to devolve into just trial and error and you 694 00:34:10,665 --> 00:34:13,540 have no idea what's going on, so you're just printing out everything. 695 00:34:13,540 --> 00:34:17,230 Let me propose that if you ever find yourself slipping down 696 00:34:17,230 --> 00:34:20,260 that hill into just trying this, trying this, trying this, 697 00:34:20,260 --> 00:34:22,659 you need a better tool, not just doing printf. 
698 00:34:22,659 --> 00:34:26,199 And frankly, it's annoying to use printf because every time you add a printf, 699 00:34:26,199 --> 00:34:28,699 you have to recompile the code, rerun the code. 700 00:34:28,699 --> 00:34:31,230 It's just adding to the number of steps. 701 00:34:31,230 --> 00:34:34,550 So let me propose instead that we do this. 702 00:34:34,550 --> 00:34:37,070 I'm going to go back into VS Code here and I'm 703 00:34:37,070 --> 00:34:39,980 going to write a different program that actually 704 00:34:39,980 --> 00:34:42,110 has a helper function, so to speak. 705 00:34:42,110 --> 00:34:44,840 A second function whose purpose in life is maybe just 706 00:34:44,840 --> 00:34:46,940 to print that column for me. 707 00:34:46,940 --> 00:34:50,685 So I'm going to say this-- void print_column, 708 00:34:50,685 --> 00:34:53,060 though I could call it anything I want, and this function 709 00:34:53,060 --> 00:34:56,570 is going to take a argument or a parameter called 710 00:34:56,570 --> 00:34:59,300 height which will tell it how many bricks to print, 711 00:34:59,300 --> 00:35:01,070 how many vertical bricks. 712 00:35:01,070 --> 00:35:05,900 I'm going to do the same kind of logic. for int i equals 0. 713 00:35:05,900 --> 00:35:06,830 i is less than-- 714 00:35:06,830 --> 00:35:09,830 I'm going to make the same mistake again-- less than or equal to height, 715 00:35:09,830 --> 00:35:10,850 i++. 716 00:35:10,850 --> 00:35:14,922 And then inside of this for loop, let me go ahead and print out the hash mark. 717 00:35:14,922 --> 00:35:16,880 So I've made the same mistake, but I've made it 718 00:35:16,880 --> 00:35:20,900 in the context now of a helper function only because in main, 719 00:35:20,900 --> 00:35:24,980 what I'd like to do now, just to be a little more sophisticated is get int 720 00:35:24,980 --> 00:35:27,300 from the user for the height. 
721 00:35:27,300 --> 00:35:31,190 And when I do get that int, I want to store it in a variable called n, 722 00:35:31,190 --> 00:35:34,980 but I do need to give that variable a type like last week. 723 00:35:34,980 --> 00:35:36,440 So I'll say that it's an integer. 724 00:35:36,440 --> 00:35:40,940 And now, lastly, I can print_column, passing in-- actually, I'll 725 00:35:40,940 --> 00:35:43,100 call it h just because height is h. 726 00:35:43,100 --> 00:35:46,730 Print column h, semicolon. 727 00:35:46,730 --> 00:35:49,790 OK, so it's the exact same program except I'm getting user input now. 728 00:35:49,790 --> 00:35:53,030 So it's not just going to be 3, it's going to be a variable height, 729 00:35:53,030 --> 00:35:55,108 but I've done something stupid. 730 00:35:55,108 --> 00:35:56,940 AUDIENCE: [INAUDIBLE] 731 00:35:56,940 --> 00:35:58,690 DAVID MALAN: I've done two stupid things. 732 00:35:58,690 --> 00:36:02,310 So this, of course, is not supposed to be there, so I'll fix that. 733 00:36:02,310 --> 00:36:03,390 And someone else. 734 00:36:03,390 --> 00:36:05,265 What else have I done? 735 00:36:05,265 --> 00:36:08,990 AUDIENCE: [INAUDIBLE] 736 00:36:08,990 --> 00:36:09,740 DAVID MALAN: Yeah. 737 00:36:09,740 --> 00:36:11,070 I'm missing the prototype. 738 00:36:11,070 --> 00:36:16,040 And this is, let me reiterate, probably the only time where copy-paste is OK. 739 00:36:16,040 --> 00:36:17,960 Once you've implemented the function, you 740 00:36:17,960 --> 00:36:21,690 can copy paste its first line at a semicolon 741 00:36:21,690 --> 00:36:25,265 so that it teaches the compiler that this function will exist. 742 00:36:25,265 --> 00:36:26,635 AUDIENCE: [INAUDIBLE] 743 00:36:26,635 --> 00:36:28,010 DAVID MALAN: Three stupid things. 744 00:36:28,010 --> 00:36:28,510 OK. 745 00:36:28,510 --> 00:36:29,150 Thank you. 746 00:36:29,150 --> 00:36:31,520 So, good. 747 00:36:31,520 --> 00:36:33,620 Include cs50.h. 
748 00:36:33,620 --> 00:36:36,860 And now, anyone want to go for four? 749 00:36:36,860 --> 00:36:38,040 No? 750 00:36:38,040 --> 00:36:38,540 All right. 751 00:36:38,540 --> 00:36:39,582 Slightly unintended here. 752 00:36:39,582 --> 00:36:42,020 So let's see. make buggy. 753 00:36:42,020 --> 00:36:44,160 OK, no syntax errors thanks to you all. 754 00:36:44,160 --> 00:36:47,090 So the code compiles, but of course, when I run buggy 755 00:36:47,090 --> 00:36:52,130 and I type in something like 3 manually, I'm still going to get 1, 2, 3 4 out. 756 00:36:52,130 --> 00:36:54,500 So let me now introduce a more powerful tool 757 00:36:54,500 --> 00:36:56,450 that's generally known as a debugger. 758 00:36:56,450 --> 00:36:58,927 And within the VS Code environment that you're using, 759 00:36:58,927 --> 00:37:02,010 we actually have a command that makes it a little easier to use this tool, 760 00:37:02,010 --> 00:37:03,510 but we didn't write the tool itself. 761 00:37:03,510 --> 00:37:07,040 You are about to see a very graphical, a very popular industry standard 762 00:37:07,040 --> 00:37:11,510 tool called a debugger, but we'll start the debugger using a CS50-specific 763 00:37:11,510 --> 00:37:15,080 command called debug50, which just makes it easier with a single command 764 00:37:15,080 --> 00:37:17,655 to start the debugger without having to configure a text 765 00:37:17,655 --> 00:37:20,030 file with all of your preferred settings and all of that. 766 00:37:20,030 --> 00:37:22,710 It's just an annoying hoop otherwise to jump through. 767 00:37:22,710 --> 00:37:25,100 So what I'm going to do is go back to my code here. 768 00:37:25,100 --> 00:37:27,900 I have already compiled it, but just for good measure, 769 00:37:27,900 --> 00:37:31,140 I'll make buggy again because the debugger needs your code 770 00:37:31,140 --> 00:37:31,862 to be compiled. 
771 00:37:31,862 --> 00:37:33,570 It's not going to help with syntax errors 772 00:37:33,570 --> 00:37:36,270 like the stupid mistakes I just made unintentionally, 773 00:37:36,270 --> 00:37:40,530 it will help you though with programmatic errors, logical errors 774 00:37:40,530 --> 00:37:42,870 in your code once your code is running. 775 00:37:42,870 --> 00:37:47,130 So to run debug50, I'm going to do this. debug50, space, and then 776 00:37:47,130 --> 00:37:51,840 the exact same command I would normally run to just run the program itself. 777 00:37:51,840 --> 00:37:53,190 So ./buggy. 778 00:37:53,190 --> 00:37:57,150 So exact same thing, ./buggy, but I prefix it now with debug50. 779 00:37:57,150 --> 00:37:59,172 When I hit Enter, a whole bunch of-- 780 00:37:59,172 --> 00:38:01,380 another error is going to pop up on the screen, which 781 00:38:01,380 --> 00:38:04,213 is a good reminder because this will happen to you, too, invariably. 782 00:38:04,213 --> 00:38:07,560 It's reminding me that I have to set what's called a breakpoint. 783 00:38:07,560 --> 00:38:10,140 And as that word suggests, it is the point 784 00:38:10,140 --> 00:38:12,060 at which you want your code to break. 785 00:38:12,060 --> 00:38:15,420 Not break in the "make the situation worse" sense, but rather, 786 00:38:15,420 --> 00:38:16,920 where do you want to pause 787 00:38:16,920 --> 00:38:20,590 execution, break execution-- like hitting the brakes on a car 788 00:38:20,590 --> 00:38:22,710 so the program doesn't run all at once. 789 00:38:22,710 --> 00:38:24,600 And you can put this any number of places, 790 00:38:24,600 --> 00:38:26,308 and you might have done this accidentally 791 00:38:26,308 --> 00:38:29,040 if you've ever hovered over the gutter of VS Code, 792 00:38:29,040 --> 00:38:32,010 the left-hand side next to your line numbers. 793 00:38:32,010 --> 00:38:34,180 See the little red dot that appears?
794 00:38:34,180 --> 00:38:38,560 If I click on any of these lines, that's going to set a breakpoint, so to speak. 795 00:38:38,560 --> 00:38:41,310 And I want to break execution at main. 796 00:38:41,310 --> 00:38:44,040 So I'm just going to click to the left of line 6 in this case. 797 00:38:44,040 --> 00:38:47,430 That makes it a darker red circle, a stop sign 798 00:38:47,430 --> 00:38:51,030 of sorts that tells the debugger to pause execution on that line, 799 00:38:51,030 --> 00:38:53,580 though I could put it elsewhere if I so choose. 800 00:38:53,580 --> 00:38:57,990 Let me go ahead and rerun debug50 ./buggy, Enter, 801 00:38:57,990 --> 00:39:00,652 and now a bunch of things are going to happen on the screen. 802 00:39:00,652 --> 00:39:03,360 It's going to look a little overwhelming perhaps at first glance, 803 00:39:03,360 --> 00:39:05,950 but there's some useful stuff that just happened. 804 00:39:05,950 --> 00:39:12,450 So one, my code is still here, but the line that I set the breakpoint on is-- 805 00:39:12,450 --> 00:39:16,080 rather, the first line of actual executable 806 00:39:16,080 --> 00:39:20,970 code at or below the breakpoint I set is highlighted in this yellowish green 807 00:39:20,970 --> 00:39:25,120 here, which says, this line of code has not yet been executed. 808 00:39:25,120 --> 00:39:28,590 We broke at this point, but if I click a button, this line of code 809 00:39:28,590 --> 00:39:30,030 will be executed. 810 00:39:30,030 --> 00:39:33,750 Because up until now, every C program you write runs as fast as that. 811 00:39:33,750 --> 00:39:36,550 I want to pump the brakes and pause here. 812 00:39:36,550 --> 00:39:39,190 But notice a few other aspects of the window here. 813 00:39:39,190 --> 00:39:41,310 So notice that up here some weirdness. 814 00:39:41,310 --> 00:39:43,890 There's mentions of variables and we're familiar with these. 815 00:39:43,890 --> 00:39:45,990 Local is a term we'll use this week. 
816 00:39:45,990 --> 00:39:48,210 But there's this variable h, which weirdly, 817 00:39:48,210 --> 00:39:51,300 where did the value 21912 come from? 818 00:39:51,300 --> 00:39:57,750 So it turns out, in C, before you initialize a variable with a value 819 00:39:57,750 --> 00:40:01,890 by literally typing the number 3, or by using a function like get_int, 820 00:40:01,890 --> 00:40:04,662 it often contains what's called a garbage value. 821 00:40:04,662 --> 00:40:06,120 More on those in a couple of weeks. 822 00:40:06,120 --> 00:40:07,950 But a garbage value, you can think of it 823 00:40:07,950 --> 00:40:10,680 as like remnants of whatever was in the computer's memory 824 00:40:10,680 --> 00:40:12,280 before you ran your program. 825 00:40:12,280 --> 00:40:14,040 And that's a bit of an oversimplification, 826 00:40:14,040 --> 00:40:18,150 but you cannot trust that a variable will have a certain value in this case 827 00:40:18,150 --> 00:40:21,490 if you did not put one there yourself. 828 00:40:21,490 --> 00:40:23,857 So for now, h is nonsensical. 829 00:40:23,857 --> 00:40:25,440 It's a garbage value; it means nothing. 830 00:40:25,440 --> 00:40:29,230 But once I execute this line, it should contain whatever the human types in. 831 00:40:29,230 --> 00:40:29,730 All right. 832 00:40:29,730 --> 00:40:32,990 Down here, there's a watch section, which is a more sophisticated feature. 833 00:40:32,990 --> 00:40:34,740 Down here is what's called the call stack. 834 00:40:34,740 --> 00:40:35,890 More on that in the future. 835 00:40:35,890 --> 00:40:39,240 But what this means for now is that I'm executing the main function, not, 836 00:40:39,240 --> 00:40:40,870 for instance, print_column. 837 00:40:40,870 --> 00:40:44,790 So notice up here, these are the most useful controls within the interface.
838 00:40:44,790 --> 00:40:46,740 If I hit this Play button, it's just going 839 00:40:46,740 --> 00:40:50,640 to actually run my program to the end of it without bothering me further. 840 00:40:50,640 --> 00:40:54,990 However, I can actually step over this line of code and execute it, 841 00:40:54,990 --> 00:40:57,870 or I can step into this line of code and actually 842 00:40:57,870 --> 00:41:01,480 poke around the contents of get_int if it's available on the system. 843 00:41:01,480 --> 00:41:03,870 So conceptually you can either execute this line 844 00:41:03,870 --> 00:41:08,745 or you can dive down conceptually deeper and see what's inside of that function. 845 00:41:08,745 --> 00:41:10,620 Lastly, this will let you step out, this will 846 00:41:10,620 --> 00:41:13,828 allow you to restart the whole process, and this will just stop the debugger. 847 00:41:13,828 --> 00:41:15,960 So these buttons are going to be our friends. 848 00:41:15,960 --> 00:41:19,840 And the one I'll click first is the first one I described, 849 00:41:19,840 --> 00:41:21,690 which is step over. 850 00:41:21,690 --> 00:41:26,180 So step over doesn't mean, skip this step, it just means execute it, 851 00:41:26,180 --> 00:41:30,000 but don't bother me by going into the weeds of what is on the specific line, 852 00:41:30,000 --> 00:41:30,740 namely get_int. 853 00:41:30,740 --> 00:41:32,990 So when I click this button in a moment, you'll 854 00:41:32,990 --> 00:41:36,830 see that my terminal, which is still at the bottom, prompts me for a height. 855 00:41:36,830 --> 00:41:38,600 I'm going to go ahead and type 3. 856 00:41:38,600 --> 00:41:41,240 As soon as I hit Enter, what part of the screen 857 00:41:41,240 --> 00:41:44,285 probably will change based on what I've said? 858 00:41:44,285 --> 00:41:47,280 859 00:41:47,280 --> 00:41:50,760 So h, the variable h should hopefully take on the number 3. 
860 00:41:50,760 --> 00:41:53,340 And I'll probably see a different line of code 861 00:41:53,340 --> 00:41:57,990 highlighted, probably line 9 next once I'm done executing line 8. 862 00:41:57,990 --> 00:42:01,170 So let me go ahead and hit Enter and watch the top-left of the screen. 863 00:42:01,170 --> 00:42:08,580 And voila, h now has the value 3, and execution has now paused on line 9 864 00:42:08,580 --> 00:42:12,900 because the debugger is allowing me to step through my code line by line. 865 00:42:12,900 --> 00:42:16,998 Now let me go ahead and print out-- let me go ahead and just say, all right, 866 00:42:16,998 --> 00:42:17,790 I'm done with this. 867 00:42:17,790 --> 00:42:19,950 Let's go ahead and run the rest of the program. 868 00:42:19,950 --> 00:42:21,660 It clearly got the value 3. 869 00:42:21,660 --> 00:42:22,658 But wait a minute-- 870 00:42:22,658 --> 00:42:24,450 oh, and at this point, it closed the window 871 00:42:24,450 --> 00:42:28,530 in which I would have seen the output, I would have still seen four hashes. 872 00:42:28,530 --> 00:42:29,950 So let me actually do this again. 873 00:42:29,950 --> 00:42:34,392 Let me go back into debug50 by running the exact same command again. 874 00:42:34,392 --> 00:42:37,350 It's going to think for a moment, it's going to reconfigure the screen. 875 00:42:37,350 --> 00:42:38,892 I'm going to do the exact same thing. 876 00:42:38,892 --> 00:42:41,100 I'm going to step over this line, but I'd 877 00:42:41,100 --> 00:42:45,490 like to actually see what's going on inside of my print_column function. 878 00:42:45,490 --> 00:42:48,580 So this time, instead of just saying run to the end 879 00:42:48,580 --> 00:42:51,100 and close all the windows on me, let me go ahead 880 00:42:51,100 --> 00:42:54,460 and step into my print_column function. 881 00:42:54,460 --> 00:42:57,070 So don't step over, step into. 
882 00:42:57,070 --> 00:42:58,525 Because if I step over-- 883 00:42:58,525 --> 00:43:00,400 and now this is what I meant to show earlier, 884 00:43:00,400 --> 00:43:02,710 you can see that it's still printing out 4. 885 00:43:02,710 --> 00:43:05,930 So in fact, let me undo this, let me just stop the whole thing. 886 00:43:05,930 --> 00:43:08,320 Let me rerun the command a final time. 887 00:43:08,320 --> 00:43:10,690 So it goes back to where we began before. 888 00:43:10,690 --> 00:43:15,520 It's going to prompt me again once I step over line 8 for a number like 3. 889 00:43:15,520 --> 00:43:19,930 But this time, instead of stepping over line 9, let's poke around. 890 00:43:19,930 --> 00:43:23,770 I wrote print_column, so let's look at print_column step by step, 891 00:43:23,770 --> 00:43:26,800 step into it, and watch what happens to the yellow highlight. 892 00:43:26,800 --> 00:43:30,220 It now jumps logically to the inside of print_column, 893 00:43:30,220 --> 00:43:32,510 thereby letting me walk through this code. 894 00:43:32,510 --> 00:43:35,720 And now I can just step over each of these lines one at a time. 895 00:43:35,720 --> 00:43:37,180 So stepping over. 896 00:43:37,180 --> 00:43:38,440 OK, so what did it do? 897 00:43:38,440 --> 00:43:41,200 It did that whole narrative that I did verbally last week 898 00:43:41,200 --> 00:43:43,720 where it compared i against height. 899 00:43:43,720 --> 00:43:45,520 It then went inside of the loop. 900 00:43:45,520 --> 00:43:48,940 When I click Step Over, watch what happens in my terminal-- one hash 901 00:43:48,940 --> 00:43:49,660 prints out. 902 00:43:49,660 --> 00:43:51,460 Now line 14 is highlighted again. 903 00:43:51,460 --> 00:43:54,220 It's comparing per the Boolean expression, i, 904 00:43:54,220 --> 00:43:55,900 is it less than or equal to height? 905 00:43:55,900 --> 00:43:59,770 If so, it's going to go ahead and print out the hash. 
906 00:43:59,770 --> 00:44:02,080 It's going to do this again, print out the hash. 907 00:44:02,080 --> 00:44:05,020 But notice at the top-left of the screen, height 908 00:44:05,020 --> 00:44:10,180 is still the same, it's still 3, but what has been changing, apparently? 909 00:44:10,180 --> 00:44:11,960 i on each iteration. 910 00:44:11,960 --> 00:44:16,240 So the debugger is letting me see what's going on slowly inside of this loop 911 00:44:16,240 --> 00:44:18,070 because i keeps getting incremented. 912 00:44:18,070 --> 00:44:21,580 So if I step over this line now, notice that I've now printed 3. 913 00:44:21,580 --> 00:44:25,690 So ideally I want this loop to end, but if I click Step Over once more, 914 00:44:25,690 --> 00:44:29,710 notice that the value of i at top-left is 3, 915 00:44:29,710 --> 00:44:35,600 but 3 is less than or equal to height-- oh, now I get it, if I play along here. 916 00:44:35,600 --> 00:44:40,540 Now I see why less than or equals to, mathematically, is clearly incorrect. 917 00:44:40,540 --> 00:44:43,090 And as soon as that light bulb goes off, you can just sort of 918 00:44:43,090 --> 00:44:46,570 bail out, click the red Stop button to turn the debugger off, 919 00:44:46,570 --> 00:44:50,560 go back in, fix your code, and voila, recompile, run it, 920 00:44:50,560 --> 00:44:51,950 and you're back in business. 921 00:44:51,950 --> 00:44:55,480 So the takeaways here really are just what tools now exist? 922 00:44:55,480 --> 00:44:59,590 Printf is your friend, but only for quick-and-dirty debugging techniques. 923 00:44:59,590 --> 00:45:04,930 Get into the habit now of using debug50, and in turn, VS Code's debugger. 924 00:45:04,930 --> 00:45:08,800 You will invariably not take this advice, say, 925 00:45:08,800 --> 00:45:11,710 for problem set 2 as you first begin because it's 926 00:45:11,710 --> 00:45:15,340 going to feel easier and quicker just to use printf, just to use printf, 927 00:45:15,340 --> 00:45:16,300 just to use printf. 
928 00:45:16,300 --> 00:45:17,710 And the problem with that logic is that you 929 00:45:17,710 --> 00:45:20,000 begin to build up like technical debt, so to speak, 930 00:45:20,000 --> 00:45:21,760 where you really should have learned it earlier, 931 00:45:21,760 --> 00:45:23,510 you really should have learned it earlier, 932 00:45:23,510 --> 00:45:26,000 you really should have learned it earlier, at which point, 933 00:45:26,000 --> 00:45:29,350 you end up spending more time wasted using printf 934 00:45:29,350 --> 00:45:32,720 and doing things manually than if you had just spent 10 minutes, 935 00:45:32,720 --> 00:45:35,170 30 minutes just learning the user interface 936 00:45:35,170 --> 00:45:37,510 and the buttons of a proper debugger. 937 00:45:37,510 --> 00:45:40,390 So please take that advice because it will save you 938 00:45:40,390 --> 00:45:45,480 significant amounts of time over time. 939 00:45:45,480 --> 00:45:48,900 Questions on printf or debugging in this way? 940 00:45:48,900 --> 00:45:52,260 941 00:45:52,260 --> 00:45:54,790 Any questions on this? 942 00:45:54,790 --> 00:45:55,290 No? 943 00:45:55,290 --> 00:45:55,800 OK. 944 00:45:55,800 --> 00:45:59,880 So let me give you a third and final technique for debugging, which has been 945 00:45:59,880 --> 00:46:01,840 looming over us here for some time. 946 00:46:01,840 --> 00:46:05,400 So there is actually this technique known as rubber duck debugging. 947 00:46:05,400 --> 00:46:09,570 And in the absence of a roommate who is taking CS50 or who has taken CS50 948 00:46:09,570 --> 00:46:13,140 or knows how to program, in the absence of having a TF or TA or CA 949 00:46:13,140 --> 00:46:16,920 sitting next to you, in the absence of having a family member available to ask 950 00:46:16,920 --> 00:46:22,020 questions of, if you have simply an inanimate object on your desk, 951 00:46:22,020 --> 00:46:25,440 goes the tradition, just talk to that inanimate object. 
952 00:46:25,440 --> 00:46:27,970 Better yet, if it's an adorable rubber duck in this way. 953 00:46:27,970 --> 00:46:31,560 And the idea of rubber duck debugging is that simply 954 00:46:31,560 --> 00:46:34,930 by verbalizing literally out loud to this inanimate object-- 955 00:46:34,930 --> 00:46:36,930 probably with the door closed and no one knowing 956 00:46:36,930 --> 00:46:39,930 that you're talking to this rubber duck, you invariably 957 00:46:39,930 --> 00:46:44,070 end up hearing any illogic in your own thoughts, at which point 958 00:46:44,070 --> 00:46:47,340 the proverbial light bulb tends to go off and you're like, oh, I'm an idiot. 959 00:46:47,340 --> 00:46:50,310 It's supposed to be less than, not less than or equal to. 960 00:46:50,310 --> 00:46:54,670 So literally just explaining to a duck or any inanimate object what's 961 00:46:54,670 --> 00:46:57,790 going on in your code will quite frequently just 962 00:46:57,790 --> 00:47:02,260 help you see in your mind's eye what it is you've been doing wrong. 963 00:47:02,260 --> 00:47:05,590 So rubber duck debugging is indeed a very effective technique 964 00:47:05,590 --> 00:47:09,550 even if you don't happen to have a small or large rubber duck. 965 00:47:09,550 --> 00:47:12,370 Of course, you're also welcome to use the CS50 Duck who 966 00:47:12,370 --> 00:47:17,710 lives at cs50.ai, and also within a pane in VS Code at cs50.dev. 967 00:47:17,710 --> 00:47:20,830 You can ask the CS50 Duck about concepts you don't understand, 968 00:47:20,830 --> 00:47:23,170 or you can even copy paste certain lines of code 969 00:47:23,170 --> 00:47:27,460 with which you might be having trouble and ask the duck for its own advice. 970 00:47:27,460 --> 00:47:28,180 All right. 
971 00:47:28,180 --> 00:47:33,730 So, with those tools in our toolkit, let me propose now that we do-- 972 00:47:33,730 --> 00:47:37,390 that we introduce now a few lower-level features of C 973 00:47:37,390 --> 00:47:40,720 itself and better understand how we can start solving some of those problems 974 00:47:40,720 --> 00:47:44,860 like the readability of text or the encryption of data. 975 00:47:44,860 --> 00:47:47,080 These were our so-called types last week when 976 00:47:47,080 --> 00:47:51,490 we introduced at least a subset of them or used them just to store data 977 00:47:51,490 --> 00:47:53,328 in a certain format, so to speak. 978 00:47:53,328 --> 00:47:55,870 Like in week 0, we said that everything at the end of the day 979 00:47:55,870 --> 00:47:57,490 is just 0's and 1's, binary. 980 00:47:57,490 --> 00:48:03,130 And I claimed conceptually that how a computer knows if a set of bits 981 00:48:03,130 --> 00:48:08,230 is a number versus a letter versus a color or a sound or an image or a video 982 00:48:08,230 --> 00:48:11,048 is just context-dependent, like you're using Photoshop 983 00:48:11,048 --> 00:48:13,090 or you're using Microsoft Word or something else. 984 00:48:13,090 --> 00:48:16,420 But last week, we saw a little more precisely that it's 985 00:48:16,420 --> 00:48:18,490 not quite as broad strokes as that. 986 00:48:18,490 --> 00:48:23,680 It's more about what the programmer has told the software is 987 00:48:23,680 --> 00:48:25,690 being stored in a given variable. 988 00:48:25,690 --> 00:48:26,590 Is it an integer? 989 00:48:26,590 --> 00:48:28,180 Is it a char, a character? 990 00:48:28,180 --> 00:48:29,350 Is it a whole string? 991 00:48:29,350 --> 00:48:31,610 Is it a longer integer or the like? 992 00:48:31,610 --> 00:48:33,460 So you now have this control. 
993 00:48:33,460 --> 00:48:36,340 The catch, though, recall, though, is that each of these types 994 00:48:36,340 --> 00:48:39,710 has only a finite amount of space allocated to it. 995 00:48:39,710 --> 00:48:43,060 So for instance, an integer is typically 4 bytes, 996 00:48:43,060 --> 00:48:46,780 and 4 bytes is 32 bits because it's 8 times 4. 997 00:48:46,780 --> 00:48:49,390 32 bits, we claimed, is roughly 4 billion, 998 00:48:49,390 --> 00:48:52,120 but if you want to represent negative and positive numbers, 999 00:48:52,120 --> 00:48:55,330 the biggest integer you can store is like 2 billion. 1000 00:48:55,330 --> 00:48:57,650 Now that's really big for a lot of applications, 1001 00:48:57,650 --> 00:48:59,950 but years ago, Facebook, for instance, was 1002 00:48:59,950 --> 00:49:04,100 rumored to be using integers when they had fewer users. 1003 00:49:04,100 --> 00:49:06,790 But now that they have billions of users-- 1004 00:49:06,790 --> 00:49:12,100 3-plus billion users, an integer is no longer big enough for the Facebooks, 1005 00:49:12,100 --> 00:49:15,620 the Googles, the Microsofts and so forth of the world. 1006 00:49:15,620 --> 00:49:21,520 So we also have longs, which use twice as many bytes, but exponentially 1007 00:49:21,520 --> 00:49:23,080 bigger range of values. 1008 00:49:23,080 --> 00:49:26,260 Meanwhile, a bool, interestingly, is a byte, which 1009 00:49:26,260 --> 00:49:29,550 is kind of bad design in what sense? 1010 00:49:29,550 --> 00:49:31,780 Why might that be bad design? 1011 00:49:31,780 --> 00:49:33,590 It's only-- it should only be 2-- 1012 00:49:33,590 --> 00:49:36,170 1 bit, rather, because a 0 or 1 should suffice. 1013 00:49:36,170 --> 00:49:38,440 Turns out, it's just easier to use a whole byte 1014 00:49:38,440 --> 00:49:40,900 even though we're wasting seven of those bits, 1015 00:49:40,900 --> 00:49:43,750 but bools are represented nonetheless with 1 byte. 1016 00:49:43,750 --> 00:49:45,400 Chars are going to be 1 byte. 
1017 00:49:45,400 --> 00:49:47,890 Floats tend to be 4 bytes. 1018 00:49:47,890 --> 00:49:49,390 Doubles tend to be 8 bytes. 1019 00:49:49,390 --> 00:49:52,510 Some of this is system-dependent, but nowadays on modern computers, 1020 00:49:52,510 --> 00:49:54,250 this tends to be a useful rule of thumb. 1021 00:49:54,250 --> 00:49:56,710 The only one I can't commit to here is a string 1022 00:49:56,710 --> 00:49:58,900 because a string, recall, is a sequence of text. 1023 00:49:58,900 --> 00:50:02,800 And maybe it has no characters, one character, two, 10, 100. 1024 00:50:02,800 --> 00:50:05,410 So it's a variable number of bytes presumably 1025 00:50:05,410 --> 00:50:08,590 where each byte represents a given character. 1026 00:50:08,590 --> 00:50:12,370 So with that said, how do we get from an actual computer 1027 00:50:12,370 --> 00:50:16,060 to information being represented therein? 1028 00:50:16,060 --> 00:50:19,270 Well, let me remind us that this is what's inside of our Macs, PCs, phones. 1029 00:50:19,270 --> 00:50:22,220 Even though this isn't to scale and it might not be the same shape, 1030 00:50:22,220 --> 00:50:24,520 this is memory, random access memory. 1031 00:50:24,520 --> 00:50:26,890 And on these black chips, on the circuit board 1032 00:50:26,890 --> 00:50:29,360 here, are the bytes that we keep talking about. 1033 00:50:29,360 --> 00:50:31,940 In fact, let's go ahead and zoom in on one of these chips, 1034 00:50:31,940 --> 00:50:33,110 fill the screen here.
1035 00:50:33,110 --> 00:50:35,820 And just for an artist's depiction's sake, 1036 00:50:35,820 --> 00:50:38,480 let me propose that if you've got, I don't know, 1037 00:50:38,480 --> 00:50:43,340 a megabyte, a gigabyte-- like a lot of bytes packed into this chip nowadays, 1038 00:50:43,340 --> 00:50:46,100 it stands to reason that no matter how many of them you have, 1039 00:50:46,100 --> 00:50:48,398 we could just number them from top to bottom 1040 00:50:48,398 --> 00:50:50,690 and we could say that this is byte 1, or you know what? 1041 00:50:50,690 --> 00:50:55,950 This is byte 0, 1, 2, 3, and this is maybe byte 1 billion or whatever it is. 1042 00:50:55,950 --> 00:50:58,370 So you can think of memory as having addresses 1043 00:50:58,370 --> 00:51:03,020 or just locations, numeric indices that identify each of those bytes 1044 00:51:03,020 --> 00:51:03,710 individually. 1045 00:51:03,710 --> 00:51:04,550 Why a byte? 1046 00:51:04,550 --> 00:51:08,300 Individual bits are not that useful, so 8, again, 1 byte 1047 00:51:08,300 --> 00:51:10,400 tends to be the de facto standard. 1048 00:51:10,400 --> 00:51:14,360 Let me-- so, for instance, if you're storing just a single character, 1049 00:51:14,360 --> 00:51:18,570 a char, it might be stored literally in this top-left corner, so to speak, 1050 00:51:18,570 --> 00:51:20,600 of the chip of memory. 1051 00:51:20,600 --> 00:51:23,060 If you're storing maybe an integer, 4 bytes, 1052 00:51:23,060 --> 00:51:24,830 it might take up that many bytes. 1053 00:51:24,830 --> 00:51:28,760 If you're storing a long, it might take up that many bytes instead. 
1054 00:51:28,760 --> 00:51:31,520 Now we don't have to dwell on the particulars of the circuit board 1055 00:51:31,520 --> 00:51:34,580 and these traces and all the connections, so let me just abstract 1056 00:51:34,580 --> 00:51:37,550 this away and claim that what your computer's memory really 1057 00:51:37,550 --> 00:51:41,060 is is just kind of this canvas, I mean kind of in the Photoshop sense. 1058 00:51:41,060 --> 00:51:43,040 If you've ever made pictures, it's just a grid 1059 00:51:43,040 --> 00:51:46,220 of pixels, up, down, left, right, that's really all your memory is. 1060 00:51:46,220 --> 00:51:51,110 It's this canvas that you can manipulate the bits on to store numbers anywhere 1061 00:51:51,110 --> 00:51:53,190 you want in the computer's memory. 1062 00:51:53,190 --> 00:51:55,400 So in fact, let's zoom in here and let's consider 1063 00:51:55,400 --> 00:52:01,640 how your computer is actually storing information using just these bytes. 1064 00:52:01,640 --> 00:52:04,190 At the end of the day, no matter how sophisticated 1065 00:52:04,190 --> 00:52:07,280 your Mac, your PC, your phone is, like this is all 1066 00:52:07,280 --> 00:52:10,310 it has access to for storing information. 1067 00:52:10,310 --> 00:52:13,010 It's a canvas of bytes, and what you do with this 1068 00:52:13,010 --> 00:52:15,720 now really invites design decisions. 1069 00:52:15,720 --> 00:52:17,000 So let's consider this. 1070 00:52:17,000 --> 00:52:20,060 Here is an excerpt from a program wherein maybe I'm 1071 00:52:20,060 --> 00:52:22,160 prompting the user for three scores. 1072 00:52:22,160 --> 00:52:24,950 Like three test scores, exam scores, something like that. 1073 00:52:24,950 --> 00:52:27,035 And the purpose in life of this program is maybe 1074 00:52:27,035 --> 00:52:28,910 to average those three scores together if you 1075 00:52:28,910 --> 00:52:31,118 want to get a sense of where you stand in some class.
1076 00:52:31,118 --> 00:52:33,290 So we can certainly whip up some code like this. 1077 00:52:33,290 --> 00:52:37,370 And in just a moment, let me go ahead and flip over to VS Code here. 1078 00:52:37,370 --> 00:52:41,420 And I'll write up a new program called scores.c. 1079 00:52:41,420 --> 00:52:46,460 And in this, let me go ahead and first include stdio.h, 1080 00:52:46,460 --> 00:52:48,710 int main void at the top. 1081 00:52:48,710 --> 00:52:51,750 And in here, let me go ahead and assume that, eh, 1082 00:52:51,750 --> 00:52:53,250 it's not been the greatest semester. 1083 00:52:53,250 --> 00:52:56,930 So my first score, which I'll call score1, was a 72, 1084 00:52:56,930 --> 00:53:03,050 my second score was a 73, but my third score, score3, was like a 33. 1085 00:53:03,050 --> 00:53:05,832 Now you might remember these numbers in another context, 1086 00:53:05,832 --> 00:53:08,540 they might spell a message, but in this case, it's just integers. 1087 00:53:08,540 --> 00:53:12,320 It's just numbers because I'm telling the computer to treat these as ints. 1088 00:53:12,320 --> 00:53:15,750 Now if I want to figure out what my average is, I can do a bit of math. 1089 00:53:15,750 --> 00:53:18,770 So let me just print out that my average is-- 1090 00:53:18,770 --> 00:53:20,600 and I don't want to shortchange myself. 1091 00:53:20,600 --> 00:53:23,910 I'm not going to use %i because I don't want to lose even anything after 1092 00:53:23,910 --> 00:53:24,660 the decimal point. 1093 00:53:24,660 --> 00:53:26,540 So we're going to use a float instead. 1094 00:53:26,540 --> 00:53:33,230 And my average, I claim, will be score1 plus score2 plus score3 1095 00:53:33,230 --> 00:53:36,200 divided by 3, semicolon. 1096 00:53:36,200 --> 00:53:38,840 With parentheses, because just like grade school math, 1097 00:53:38,840 --> 00:53:41,580 like order of operations, I parenthesize the numerator, 1098 00:53:41,580 --> 00:53:43,670 so I can divide the whole thing by 3.
1099 00:53:43,670 --> 00:53:45,350 But I have screwed up already. 1100 00:53:45,350 --> 00:53:49,370 I am going to shortchange myself and not give myself as high a grade 1101 00:53:49,370 --> 00:53:51,977 as I deserve, but this one's subtle. 1102 00:53:51,977 --> 00:53:52,935 What have I done wrong? 1103 00:53:52,935 --> 00:53:56,230 1104 00:53:56,230 --> 00:53:59,740 Yeah, I might want to cast these scores to floats 1105 00:53:59,740 --> 00:54:05,290 because if you do integer math, divide an integer or the sum of 1106 00:54:05,290 --> 00:54:09,710 some integers by an integer, it's going to be an integer as the result, 1107 00:54:09,710 --> 00:54:12,730 so it's going to throw away anything after the decimal point. 1108 00:54:12,730 --> 00:54:15,970 Even if it's something-point-1, something-point-5, something-point-9, 1109 00:54:15,970 --> 00:54:18,010 that fraction is going to be thrown away. 1110 00:54:18,010 --> 00:54:19,750 There's a bunch of ways to fix this. 1111 00:54:19,750 --> 00:54:22,810 I could just use floats or doubles for all of these. 1112 00:54:22,810 --> 00:54:26,140 I could cast score1, score2, or score3 as you propose. 1113 00:54:26,140 --> 00:54:28,780 Frankly, the simplest way is just change the denominator 1114 00:54:28,780 --> 00:54:31,840 because so long as I've got one float involved in the math, 1115 00:54:31,840 --> 00:54:35,950 this will promote the whole arithmetic expression to being floating point 1116 00:54:35,950 --> 00:54:37,690 math instead of integer math. 1117 00:54:37,690 --> 00:54:41,110 So let me go ahead now and do make scores, Enter. 1118 00:54:41,110 --> 00:54:45,100 So far, so good. ./scores, and my average seems to be not great, 1119 00:54:45,100 --> 00:54:47,140 but 59.33333-- 1120 00:54:47,140 --> 00:54:47,950 so, and a third. 1121 00:54:47,950 --> 00:54:50,200 But I would have lost that third if I hadn't 1122 00:54:50,200 --> 00:54:52,940 used a float in this particular way.
1123 00:54:52,940 --> 00:54:56,570 Well, let's consider now what's actually going on inside of the computer 1124 00:54:56,570 --> 00:54:58,650 when I store these three variables. 1125 00:54:58,650 --> 00:55:01,175 So, back to the grid here, just my canvas of memory. 1126 00:55:01,175 --> 00:55:03,050 It doesn't really matter where things end up. 1127 00:55:03,050 --> 00:55:04,820 I might put it here, I might put it there, 1128 00:55:04,820 --> 00:55:06,510 the computer makes these decisions. 1129 00:55:06,510 --> 00:55:10,500 But for the artist's sake, I'm going to put it at the top left-hand corner 1130 00:55:10,500 --> 00:55:11,000 here. 1131 00:55:11,000 --> 00:55:15,710 So, score1 is containing the integer 72. 1132 00:55:15,710 --> 00:55:20,580 Why is it taking up four squares, though? 1133 00:55:20,580 --> 00:55:22,040 Because? 1134 00:55:22,040 --> 00:55:23,030 It's an integer. 1135 00:55:23,030 --> 00:55:25,500 And on this system, an integer is 4 bytes. 1136 00:55:25,500 --> 00:55:30,170 So I've drawn it to scale, if you will. score2 is the number 73, 1137 00:55:30,170 --> 00:55:32,150 it also takes 4 bytes. 1138 00:55:32,150 --> 00:55:34,850 By coincidence, but also by convention, it 1139 00:55:34,850 --> 00:55:38,180 will likely end up next to the first integer 1140 00:55:38,180 --> 00:55:40,970 in memory because I've only got three variables going on anyway, 1141 00:55:40,970 --> 00:55:44,360 so the computer quite likely will store them back to back to back. 1142 00:55:44,360 --> 00:55:48,110 And indeed, by that logic, score3, containing the number 33, 1143 00:55:48,110 --> 00:55:50,060 is going to fill in this space here. 
1144 00:55:50,060 --> 00:55:51,917 We'll consider down the road what happens 1145 00:55:51,917 --> 00:55:53,750 if things get fragmented-- something's here, 1146 00:55:53,750 --> 00:55:55,875 something's here, something's here, but for now, we 1147 00:55:55,875 --> 00:55:59,507 can assume that this is probably contiguous, though not necessarily so. 1148 00:55:59,507 --> 00:56:01,340 All right, so that's pretty straightforward, 1149 00:56:01,340 --> 00:56:02,750 but what's really going on? 1150 00:56:02,750 --> 00:56:04,940 Well, these are just bytes of memory-- 1151 00:56:04,940 --> 00:56:07,850 that is, bits of memory times 8. 1152 00:56:07,850 --> 00:56:10,460 And so what's really going on is this pattern 1153 00:56:10,460 --> 00:56:14,150 of 0's and 1's is being stored to represent 72. 1154 00:56:14,150 --> 00:56:16,280 This pattern of 0's and 1's is being stored 1155 00:56:16,280 --> 00:56:19,220 to represent 73, and similarly, 33. 1156 00:56:19,220 --> 00:56:22,750 But that's a very low level detail that we don't really care about, 1157 00:56:22,750 --> 00:56:27,550 so we'll generally just think about these as numbers like 72, 73, 33. 1158 00:56:27,550 --> 00:56:28,050 All right. 1159 00:56:28,050 --> 00:56:32,280 So if we go back to the actual code, though, here, I 1160 00:56:32,280 --> 00:56:35,250 wonder if this is the best idea. 1161 00:56:35,250 --> 00:56:38,280 These three lines of code are correct. 1162 00:56:38,280 --> 00:56:41,670 I got my 59 and 1/3 for my average, which I claim 1163 00:56:41,670 --> 00:56:46,740 is correct, but code-wise, this should maybe rub you the wrong way. 1164 00:56:46,740 --> 00:56:49,890 Even if you hadn't programmed before CS50, 1165 00:56:49,890 --> 00:56:53,250 why might this not be the best approach to storing things 1166 00:56:53,250 --> 00:56:57,170 like scores in a program? 1167 00:56:57,170 --> 00:56:58,670 How might this get us in trouble? 1168 00:56:58,670 --> 00:56:59,240 Yeah? 
1169 00:56:59,240 --> 00:57:03,890 AUDIENCE: [INAUDIBLE] 1170 00:57:03,890 --> 00:57:04,670 DAVID MALAN: Yeah. 1171 00:57:04,670 --> 00:57:06,410 It's not the best because you have to use a whole bunch 1172 00:57:06,410 --> 00:57:08,180 of different variables for each score. 1173 00:57:08,180 --> 00:57:11,330 They're almost identically named, though, but just imagine 1174 00:57:11,330 --> 00:57:15,620 in almost any question involving the design of your code, what happens as n, 1175 00:57:15,620 --> 00:57:18,170 the number of things involved, gets larger? 1176 00:57:18,170 --> 00:57:21,950 Am I really going to start writing code that has score4, score5, score6, 1177 00:57:21,950 --> 00:57:23,270 score10, score20? 1178 00:57:23,270 --> 00:57:27,560 I mean, your code is just going to look like this mess of mostly copy-paste 1179 00:57:27,560 --> 00:57:30,227 except that the number at the end of the variable is changing. 1180 00:57:30,227 --> 00:57:32,810 Like that should make you cringe a little bit because it's not 1181 00:57:32,810 --> 00:57:34,610 going to end well eventually. 1182 00:57:34,610 --> 00:57:37,280 And typographical errors are going to get in the way most likely 1183 00:57:37,280 --> 00:57:38,447 because we'll make mistakes. 1184 00:57:38,447 --> 00:57:41,240 So how can we do a little bit better than that? 1185 00:57:41,240 --> 00:57:45,750 Well, let me propose that we introduce what we're going to now call an array. 1186 00:57:45,750 --> 00:57:52,950 An array is a sequence of values back to back to back in memory. 1187 00:57:52,950 --> 00:57:57,870 So an array is just a chunk of memory storing values back to back to back. 1188 00:57:57,870 --> 00:57:59,810 So no gaps, no fragmentation. 1189 00:57:59,810 --> 00:58:02,870 From left to right, top to bottom, just as I already drew.
1190 00:58:02,870 --> 00:58:05,550 But these arrays in C, at least, are going 1191 00:58:05,550 --> 00:58:09,070 to give us a slightly new syntax that addresses exactly your concern. 1192 00:58:09,070 --> 00:58:14,580 So here instead is how I would propose you define one variable-- 1193 00:58:14,580 --> 00:58:19,890 not three, one variable called scores, plural, each of whose values 1194 00:58:19,890 --> 00:58:24,150 is going to be an int, and you want three integers tucked away 1195 00:58:24,150 --> 00:58:25,420 in that variable. 1196 00:58:25,420 --> 00:58:28,440 So now I can pluralize the name of my variable 1197 00:58:28,440 --> 00:58:32,010 because by using square brackets and the number 3, I'm telling the compiler, 1198 00:58:32,010 --> 00:58:36,510 give me enough room for not one, not two, but three integers in total. 1199 00:58:36,510 --> 00:58:39,240 And the computer is going to do me a favor by storing them back 1200 00:58:39,240 --> 00:58:41,790 to back to back in the computer's memory. 1201 00:58:41,790 --> 00:58:45,810 Now assigning values to these variables is almost the same, 1202 00:58:45,810 --> 00:58:47,460 but the syntax looks like this. 1203 00:58:47,460 --> 00:58:53,370 To assign the first value, I do scores, bracket, 0 equals whatever, 72. 1204 00:58:53,370 --> 00:58:58,560 scores, bracket, 1 equals 73; scores, bracket, 2 equals 33. 1205 00:58:58,560 --> 00:59:00,360 And it's square brackets consistently. 1206 00:59:00,360 --> 00:59:02,220 And notice, this is a feature-- 1207 00:59:02,220 --> 00:59:04,080 or a downside of C. 1208 00:59:04,080 --> 00:59:07,980 We very frequently use the same syntax for slightly different ideas. 1209 00:59:07,980 --> 00:59:12,180 This first line tells the computer, give me an array of size 3. 1210 00:59:12,180 --> 00:59:16,830 These next three lines mean, go into this array at location 0 1211 00:59:16,830 --> 00:59:18,060 and put this value there.
1212 00:59:18,060 --> 00:59:21,280 Location 1, put this value there; location 2, put this value there. 1213 00:59:21,280 --> 00:59:24,690 So same syntax, but different meaning depending on the context here. 1214 00:59:24,690 --> 00:59:28,470 But the equal sign indeed means that this is assignment from right 1215 00:59:28,470 --> 00:59:30,340 to left just like last week. 1216 00:59:30,340 --> 00:59:33,750 So what does this mean in the computer's memory? 1217 00:59:33,750 --> 00:59:38,192 Well, in this case here, we now have a slightly different way of doing this. 1218 00:59:38,192 --> 00:59:39,900 And actually, let me do it first in code. 1219 00:59:39,900 --> 00:59:43,440 Let me go back to VS Code here, and let me 1220 00:59:43,440 --> 00:59:48,100 propose that instead of having these three separate variables, 1221 00:59:48,100 --> 00:59:52,590 let me give myself an int, scores variable of size 3, 1222 00:59:52,590 --> 00:59:58,590 and then do scores, bracket, 0 equals 72; scores, bracket, 1 equals 73; 1223 00:59:58,590 --> 01:00:02,100 scores, bracket, 2 equals 33. 1224 01:00:02,100 --> 01:00:05,730 And now I have to change this syntax slightly, but same idea. 1225 01:00:05,730 --> 01:00:12,660 scores, bracket, 0; scores, bracket, 1; and lastly, scores, bracket, 2. 1226 01:00:12,660 --> 01:00:14,640 So a couple of key details. 1227 01:00:14,640 --> 01:00:16,000 I started counting at 0. 1228 01:00:16,000 --> 01:00:16,500 Why? 1229 01:00:16,500 --> 01:00:18,210 That's just the way it is with arrays. 1230 01:00:18,210 --> 01:00:21,818 You must start counting at 0 unless you want to waste one of those spaces. 1231 01:00:21,818 --> 01:00:23,610 And what you definitely don't want to do is 1232 01:00:23,610 --> 01:00:27,030 go into scores, bracket, 3 because I only 1233 01:00:27,030 --> 01:00:29,190 ask the computer for three integers. 1234 01:00:29,190 --> 01:00:32,190 If I blindly do something like this, you're going too far. 
1235 01:00:32,190 --> 01:00:34,830 You're going beyond the end of the chunk of memory 1236 01:00:34,830 --> 01:00:37,080 and bad things will often happen. 1237 01:00:37,080 --> 01:00:38,770 So we won't do that just yet. 1238 01:00:38,770 --> 01:00:43,030 But for now, 0, 1, and 2 are the first, second, and third locations. 1239 01:00:43,030 --> 01:00:48,030 So if I recompile this code-- so make scores seems OK. ./scores, 1240 01:00:48,030 --> 01:00:50,607 and I get the exact same answer there. 1241 01:00:50,607 --> 01:00:52,440 But let me make it more dynamic because this 1242 01:00:52,440 --> 01:00:56,670 is a little stupid that I'm compiling a program with my scores hardcoded. 1243 01:00:56,670 --> 01:00:59,380 What if I have a fourth exam tomorrow or something like that? 1244 01:00:59,380 --> 01:01:01,110 So let's make it more dynamic and I think 1245 01:01:01,110 --> 01:01:03,460 the syntax will start to make a little more sense. 1246 01:01:03,460 --> 01:01:07,270 Let's go ahead and use get_int and ask the user for a score. 1247 01:01:07,270 --> 01:01:10,270 Let's go ahead and get_int and ask the user for another score. 1248 01:01:10,270 --> 01:01:15,090 Let's go ahead and get_int and ask the user for a third score, 1249 01:01:15,090 --> 01:01:18,720 now storing the return values in each of those variables. 1250 01:01:18,720 --> 01:01:20,970 If I now do make scores-- 1251 01:01:20,970 --> 01:01:22,530 oh, darn it. 1252 01:01:22,530 --> 01:01:24,830 a mistake. 1253 01:01:24,830 --> 01:01:28,130 Similar to one I've made before, but we didn't see the error message last time. 1254 01:01:28,130 --> 01:01:28,880 What'd I do wrong? 1255 01:01:28,880 --> 01:01:30,165 Yeah? 1256 01:01:30,165 --> 01:01:31,040 AUDIENCE: [INAUDIBLE] 1257 01:01:31,040 --> 01:01:31,770 DAVID MALAN: OK. 1258 01:01:31,770 --> 01:01:33,915 What did I do wrong-- how about over here? 1259 01:01:33,915 --> 01:01:34,790 AUDIENCE: [INAUDIBLE] 1260 01:01:34,790 --> 01:01:35,210 DAVID MALAN: Yeah. 
1261 01:01:35,210 --> 01:01:36,900 So I'm missing the CS50 header file. 1262 01:01:36,900 --> 01:01:38,060 So how do you know that? 1263 01:01:38,060 --> 01:01:40,550 Well, implicit declaration of function get_int. 1264 01:01:40,550 --> 01:01:42,350 So it just doesn't know what get_int is. 1265 01:01:42,350 --> 01:01:44,660 Well, who does know what get_int is? 1266 01:01:44,660 --> 01:01:47,010 The CS50 Library, that should be your first instinct. 1267 01:01:47,010 --> 01:01:47,510 All right. 1268 01:01:47,510 --> 01:01:51,620 Let me go to the top here and let me go ahead and squeeze in the CS50 Library 1269 01:01:51,620 --> 01:01:52,460 like this. 1270 01:01:52,460 --> 01:01:54,140 Now let me clear my terminal. 1271 01:01:54,140 --> 01:01:55,312 make scores again. 1272 01:01:55,312 --> 01:01:56,270 We're back in business. 1273 01:01:56,270 --> 01:02:00,320 And notice, I don't need to do -l cs50. 1274 01:02:00,320 --> 01:02:05,490 make is doing that for me for clang, but we don't even see clang being executed, 1275 01:02:05,490 --> 01:02:09,120 but it is being executed underneath the hood, so to speak. 1276 01:02:09,120 --> 01:02:10,970 All right, so ./scores, here we go. 1277 01:02:10,970 --> 01:02:13,340 72, 73, 33. 1278 01:02:13,340 --> 01:02:17,630 Math is still the same, but now the program is more interactive. 1279 01:02:17,630 --> 01:02:20,520 Now this, too, hopefully should rub you the wrong way. 1280 01:02:20,520 --> 01:02:25,790 This is correct, I would claim, but bad design still. 1281 01:02:25,790 --> 01:02:28,460 Reeks of week 0 inefficiencies. 1282 01:02:28,460 --> 01:02:29,030 Yeah? 1283 01:02:29,030 --> 01:02:33,793 AUDIENCE: [INAUDIBLE] 1284 01:02:33,793 --> 01:02:34,460 DAVID MALAN: OK. 1285 01:02:34,460 --> 01:02:37,160 So I could ask the human how many scores do you want to input? 1286 01:02:37,160 --> 01:02:38,310 Let's come back to that. 1287 01:02:38,310 --> 01:02:42,550 But I think even in this construct, what better could I do? 
1288 01:02:42,550 --> 01:02:43,510 Use a loop, right? 1289 01:02:43,510 --> 01:02:46,060 Because I'm literally doing the same thing again and again. 1290 01:02:46,060 --> 01:02:48,530 And notice, this number is just changing slightly. 1291 01:02:48,530 --> 01:02:51,490 I would think that a little plus-plus could help there. get_int Score, 1292 01:02:51,490 --> 01:02:53,960 get_int Score, get_int Score-- that's the exact same thing. 1293 01:02:53,960 --> 01:02:56,120 So a loop is a perfect solution here. 1294 01:02:56,120 --> 01:02:59,980 So let me go over into this code here, and I can still for now 1295 01:02:59,980 --> 01:03:02,440 declare it to be of size 3, but I think I 1296 01:03:02,440 --> 01:03:07,340 could do something like this-- for int i gets 0, i is less than 3, 1297 01:03:07,340 --> 01:03:10,090 so I'm not going to make the same buggy mistake as I made earlier. 1298 01:03:10,090 --> 01:03:11,260 i++. 1299 01:03:11,260 --> 01:03:15,850 Inside of the loop now, I can do scores, bracket, i, and now 1300 01:03:15,850 --> 01:03:18,010 arrays are getting really interesting because you 1301 01:03:18,010 --> 01:03:22,570 can use and reuse them, but dynamically go to a specific location. 1302 01:03:22,570 --> 01:03:25,510 Equals get_int, quote-unquote, "Score." 1303 01:03:25,510 --> 01:03:29,110 Now I can type that phrase just once and this loop ultimately 1304 01:03:29,110 --> 01:03:31,330 will do the same thing, but it's getting better. 1305 01:03:31,330 --> 01:03:34,720 The code is getting better designed because it's more compact 1306 01:03:34,720 --> 01:03:36,250 and I'm not repeating myself. 1307 01:03:36,250 --> 01:03:38,020 72, 73, 33. 1308 01:03:38,020 --> 01:03:42,530 Still works the same, but we're iteratively improving the code here. 1309 01:03:42,530 --> 01:03:48,500 Now how else-- there's one design flaw here that I still don't love. 1310 01:03:48,500 --> 01:03:49,710 It's a little more subtle. 1311 01:03:49,710 --> 01:03:51,160 Any observations?
1312 01:03:51,160 --> 01:03:57,462 AUDIENCE: [INAUDIBLE] 1313 01:03:57,462 --> 01:03:58,670 DAVID MALAN: Ah, interesting. 1314 01:03:58,670 --> 01:04:01,460 So instead of dividing by 3.0, maybe I should divide it 1315 01:04:01,460 --> 01:04:05,480 by the array size, which at the moment is technically still 3, 1316 01:04:05,480 --> 01:04:10,670 but I do concur that that is worrisome because they could get out of sync. 1317 01:04:10,670 --> 01:04:13,550 But there's something else that still isn't quite right. 1318 01:04:13,550 --> 01:04:14,965 Yeah? 1319 01:04:14,965 --> 01:04:19,090 AUDIENCE: [INAUDIBLE] 1320 01:04:19,090 --> 01:04:22,292 DAVID MALAN: I'm OK moving to this zero-indexed model. 1321 01:04:22,292 --> 01:04:23,500 So this is a new term of art. 1322 01:04:23,500 --> 01:04:27,470 To index into an array means to go to a specific location. 1323 01:04:27,470 --> 01:04:31,120 So here, I'm indexing into location i, but i is going 1324 01:04:31,120 --> 01:04:33,250 to start at 0 and then 1 and then 2. 1325 01:04:33,250 --> 01:04:34,390 I'm actually OK with that. 1326 01:04:34,390 --> 01:04:37,600 Even though in common day life we would say score1, score2, score3, 1327 01:04:37,600 --> 01:04:39,730 as a programmer, I just have to get into the habit 1328 01:04:39,730 --> 01:04:43,450 of saying score0, score1, score2 now. 1329 01:04:43,450 --> 01:04:44,350 But something else. 1330 01:04:44,350 --> 01:04:45,306 Yeah? 1331 01:04:45,306 --> 01:04:47,540 AUDIENCE: I could compute the average. 1332 01:04:47,540 --> 01:04:49,850 DAVID MALAN: I could also compute the average in a loop 1333 01:04:49,850 --> 01:04:54,290 because indeed, this is only going-- so solving the problem halfway. 1334 01:04:54,290 --> 01:04:56,240 I'm gathering the information in the loop, 1335 01:04:56,240 --> 01:04:58,200 but then I'm manually writing it all out. 1336 01:04:58,200 --> 01:05:01,730 So it does feel like there should be a better solution here. 
1337 01:05:01,730 --> 01:05:05,540 But let me also identify one other issue I really don't like, 1338 01:05:05,540 --> 01:05:06,710 and this is, indeed, subtle. 1339 01:05:06,710 --> 01:05:11,180 I've got 3 here, I've got 3 here, and I essentially have 3 here, 1340 01:05:11,180 --> 01:05:12,750 albeit a floating point version. 1341 01:05:12,750 --> 01:05:16,550 This is just ripe for me making a mistake eventually and changing one 1342 01:05:16,550 --> 01:05:18,840 of those values, but not the other two. 1343 01:05:18,840 --> 01:05:20,090 So how might I fix this? 1344 01:05:20,090 --> 01:05:22,200 I might at least do something like this. 1345 01:05:22,200 --> 01:05:28,010 I could say integer maybe n for scores, I'll set that equal to 3. 1346 01:05:28,010 --> 01:05:31,430 I could then use n here, I could use n here. 1347 01:05:31,430 --> 01:05:33,742 I could use n here, but that's a step backwards 1348 01:05:33,742 --> 01:05:36,950 because I don't want an int because I'm going to run into the same math issue 1349 01:05:36,950 --> 01:05:40,250 as before, but I could convert it-- that is, cast it to a float, 1350 01:05:40,250 --> 01:05:42,920 and we did that briefly last week. 1351 01:05:42,920 --> 01:05:47,730 But there's one other thing I could do here that we did introduce last week. 1352 01:05:47,730 --> 01:05:51,150 This is better because I don't have a magic number floating around 1353 01:05:51,150 --> 01:05:53,490 in multiple places. 1354 01:05:53,490 --> 01:05:56,160 Yeah, if I really want to be proper, I should probably 1355 01:05:56,160 --> 01:05:58,440 say this should be a constant integer. 1356 01:05:58,440 --> 01:05:58,950 Why? 1357 01:05:58,950 --> 01:06:01,200 Because I don't want to accidentally change it myself. 1358 01:06:01,200 --> 01:06:03,242 I don't want to be collaborating with a colleague 1359 01:06:03,242 --> 01:06:04,800 and they foolishly change it on me.
1360 01:06:04,800 --> 01:06:09,060 This just sends a stronger signal to the compiler, do not let the humans change 1361 01:06:09,060 --> 01:06:10,000 this value. 1362 01:06:10,000 --> 01:06:12,960 And now just to point out one other feature of C, 1363 01:06:12,960 --> 01:06:16,650 if you have a number like this, like the number 3, 1364 01:06:16,650 --> 01:06:18,990 I've deliberately capitalized this variable name really 1365 01:06:18,990 --> 01:06:19,915 for the first time. 1366 01:06:19,915 --> 01:06:22,290 Any time you have a constant, it tends to be a convention 1367 01:06:22,290 --> 01:06:25,000 to capitalize it just to draw your attention to it. 1368 01:06:25,000 --> 01:06:26,580 It doesn't mean anything technically. 1369 01:06:26,580 --> 01:06:28,950 Capitalizing a variable does nothing to it, 1370 01:06:28,950 --> 01:06:31,660 but it draws attention visually to it to the human. 1371 01:06:31,660 --> 01:06:33,930 So if you declare something as a constant, 1372 01:06:33,930 --> 01:06:37,050 it's commonplace to capitalize it just because. 1373 01:06:37,050 --> 01:06:41,790 Moreover, if you have a constant that you might want to occasionally modify-- 1374 01:06:41,790 --> 01:06:45,660 maybe next semester when there's four exams or five exams instead of three, 1375 01:06:45,660 --> 01:06:48,900 it actually is OK sometimes to define what 1376 01:06:48,900 --> 01:06:52,080 might be called a global variable, a variable that is not 1377 01:06:52,080 --> 01:06:57,280 inside of curly braces, it's literally at the top of the file outside of main, 1378 01:06:57,280 --> 01:06:59,890 and despite what I said about scope last week, 1379 01:06:59,890 --> 01:07:05,170 a global variable like this on line 4 will be in scope 1380 01:07:05,170 --> 01:07:07,550 to every function in this file. 
1381 01:07:07,550 --> 01:07:09,880 So it's actually a way of sharing a variable 1382 01:07:09,880 --> 01:07:13,090 across multiple functions, which is generally fine if you're 1383 01:07:13,090 --> 01:07:14,230 using a constant. 1384 01:07:14,230 --> 01:07:18,010 If you intend to change it, there's probably a better way 1385 01:07:18,010 --> 01:07:21,310 than actually using a global variable, but this is just 1386 01:07:21,310 --> 01:07:23,620 in contrast to what I previously did, which I would 1387 01:07:23,620 --> 01:07:26,810 call, by contrast, a local variable. 1388 01:07:26,810 --> 01:07:30,563 But again, I'm just trying to reduce the probability of making mistakes 1389 01:07:30,563 --> 01:07:31,480 somewhere in the code. 1390 01:07:31,480 --> 01:07:32,170 And I do agree. 1391 01:07:32,170 --> 01:07:35,560 I don't like that I'm still adding all of these scores 1392 01:07:35,560 --> 01:07:39,130 manually even though clearly I had a loop a moment ago. 1393 01:07:39,130 --> 01:07:40,990 But for now, let's at least consider what's 1394 01:07:40,990 --> 01:07:43,130 been going on inside of the computer's memory. 1395 01:07:43,130 --> 01:07:48,880 So with this array, I now have not three variables, score1, score2, score3. 1396 01:07:48,880 --> 01:07:53,530 I have one variable, an array variable, called scores, plural. 1397 01:07:53,530 --> 01:07:57,700 And if I want to access the first element, it's scores, bracket, 0. 1398 01:07:57,700 --> 01:08:00,400 If I want to access the second element, it's scores, bracket, 1. 1399 01:08:00,400 --> 01:08:03,100 If I want to access the third element, it's scores, bracket, 2.
1400 01:08:03,100 --> 01:08:07,480 If I were to make a mistake and do scores, bracket, 3, 1401 01:08:07,480 --> 01:08:11,380 which is the fourth element, I'd end up in no man's land here, 1402 01:08:11,380 --> 01:08:15,307 and worst case, your program could crash or something weird will happen, 1403 01:08:15,307 --> 01:08:17,140 spinning beach balls, those kinds of things. 1404 01:08:17,140 --> 01:08:18,910 Just don't make those mistakes. 1405 01:08:18,910 --> 01:08:21,310 And C makes it easy to make those mistakes, 1406 01:08:21,310 --> 01:08:25,300 so the onus is really on you programmatically. 1407 01:08:25,300 --> 01:08:31,960 Questions on this use of arrays? 1408 01:08:31,960 --> 01:08:33,580 Question on this use of arrays? 1409 01:08:33,580 --> 01:08:34,359 Yeah, in back. 1410 01:08:34,359 --> 01:08:36,283 AUDIENCE: Is there any way [INAUDIBLE]? 1411 01:08:36,283 --> 01:08:43,870 1412 01:08:43,870 --> 01:08:45,370 DAVID MALAN: A really good question. 1413 01:08:45,370 --> 01:08:48,279 Is there any way to create an array just by using syntax alone 1414 01:08:48,279 --> 01:08:49,899 without prompting the human for it? 1415 01:08:49,899 --> 01:08:51,490 Short answer, yes. 1416 01:08:51,490 --> 01:08:56,529 If you want to have an array of integers called, for instance, array, 1417 01:08:56,529 --> 01:09:01,090 you could actually do like 13, 42, 50, something like this, 1418 01:09:01,090 --> 01:09:04,300 would give you an array if you use this syntax. 1419 01:09:04,300 --> 01:09:08,680 This would give you an array of size 3 where the three values by default 1420 01:09:08,680 --> 01:09:10,600 are 13, 42 and 50. 1421 01:09:10,600 --> 01:09:13,370 It's not syntax we'll use for now, but there is syntax like that. 1422 01:09:13,370 --> 01:09:15,970 It's not quite as user-friendly, though, as other languages 1423 01:09:15,970 --> 01:09:19,060 if you've indeed programmed before. 1424 01:09:19,060 --> 01:09:24,439 Other questions on this use of arrays? 
1425 01:09:24,439 --> 01:09:26,550 Yeah, in front. 1426 01:09:26,550 --> 01:09:29,050 AUDIENCE: [INAUDIBLE] 1427 01:09:29,050 --> 01:09:30,924 DAVID MALAN: Is there a way to copy what? 1428 01:09:30,924 --> 01:09:33,399 AUDIENCE: [INAUDIBLE] 1429 01:09:33,399 --> 01:09:36,310 DAVID MALAN: Oh, is there a way to calculate the length of an array? 1430 01:09:36,310 --> 01:09:39,910 Short answer, no, and I'm about to show you one demonstration of this. 1431 01:09:39,910 --> 01:09:43,899 Those of you who have programmed before in Java, in JavaScript, 1432 01:09:43,899 --> 01:09:47,270 in certain other languages, it's very easy to get the length of an array. 1433 01:09:47,270 --> 01:09:49,720 You essentially just ask the array, what's its length? 1434 01:09:49,720 --> 01:09:51,880 C does not give you that capability. 1435 01:09:51,880 --> 01:09:56,560 The onus is entirely on you and me to remember, as with another variable, 1436 01:09:56,560 --> 01:09:59,300 like n, how long the array is. 1437 01:09:59,300 --> 01:10:01,760 And so in fact, let me go ahead and do this. 1438 01:10:01,760 --> 01:10:06,430 I'm going to go ahead and open up, baking-show style, a program 1439 01:10:06,430 --> 01:10:09,940 that I wrote in advance here which kind of escalates quickly, 1440 01:10:09,940 --> 01:10:13,990 but there's not really too many new ideas here except for the array 1441 01:10:13,990 --> 01:10:14,800 specifics. 1442 01:10:14,800 --> 01:10:19,450 So this is scores.c premade this time. 1443 01:10:19,450 --> 01:10:20,650 And notice what I have. 1444 01:10:20,650 --> 01:10:25,750 One, I've included cs50.h and stdio.h at the top, so that's the same. 1445 01:10:25,750 --> 01:10:28,630 I have declared a constant called N, set it equal to 3. 1446 01:10:28,630 --> 01:10:31,270 That is now the same as of my most recent change.
1447 01:10:31,270 --> 01:10:36,380 I did introduce an average function, which was one of the remaining concerns 1448 01:10:36,380 --> 01:10:40,220 that I could compute the average with some kind of loop, too. 1449 01:10:40,220 --> 01:10:42,980 That average function is going to return a float, which is what 1450 01:10:42,980 --> 01:10:46,100 I want my average to be, a float with the fraction. 1451 01:10:46,100 --> 01:10:47,180 But notice this. 1452 01:10:47,180 --> 01:10:50,360 In answer to your question, if I want a function called 1453 01:10:50,360 --> 01:10:55,100 average to do something like iterate over an array step by step by step, 1454 01:10:55,100 --> 01:10:58,430 add up all the numbers, and divide by the total number of numbers, 1455 01:10:58,430 --> 01:11:03,350 I need to give it the array of numbers, and I need to tell it how many of those 1456 01:11:03,350 --> 01:11:03,950 numbers there are. 1457 01:11:03,950 --> 01:11:06,230 So I literally have to pass in two values. 1458 01:11:06,230 --> 01:11:09,890 Meanwhile, this code is the same as before inside of main. 1459 01:11:09,890 --> 01:11:13,430 I'm declaring a variable called scores of size N. 1460 01:11:13,430 --> 01:11:16,430 I'm iterating from 0 to N. 1461 01:11:16,430 --> 01:11:17,990 And actually-- yep. 1462 01:11:17,990 --> 01:11:22,520 And then in this loop, I'm assigning each of the scores a return 1463 01:11:22,520 --> 01:11:23,750 value of get_int. 1464 01:11:23,750 --> 01:11:27,350 The last line of main is this-- print out the average with %f, 1465 01:11:27,350 --> 01:11:31,280 but don't just do it manually by adding and dividing with parentheses.
1466 01:11:31,280 --> 01:11:36,080 Call the average function, pass in the length of the array and the array 1467 01:11:36,080 --> 01:11:41,810 itself, and hope that it returns a float that then gets plugged into percent f. 1468 01:11:41,810 --> 01:11:45,260 So I would claim that pretty much all of this, even though it's a lot, 1469 01:11:45,260 --> 01:11:46,550 should be familiar. 1470 01:11:46,550 --> 01:11:50,780 There's no real new ideas except for this use of the global variable now 1471 01:11:50,780 --> 01:11:52,590 and this average function. 1472 01:11:52,590 --> 01:11:54,740 So let me scroll down to the average function 1473 01:11:54,740 --> 01:11:57,530 because this is the takeaway from this final example. 1474 01:11:57,530 --> 01:11:59,570 In this example here-- 1475 01:11:59,570 --> 01:12:01,640 let me scroll up to the average function, 1476 01:12:01,640 --> 01:12:04,790 copy-pasted the prototype for the very first line. 1477 01:12:04,790 --> 01:12:06,980 And here's how I'm computing the average. 1478 01:12:06,980 --> 01:12:11,240 There's different ways of doing this, but here's an accumulator way. 1479 01:12:11,240 --> 01:12:15,260 On line 28, I'm declaring a variable inside of the average function called 1480 01:12:15,260 --> 01:12:17,540 sum, and I'm just initializing it to 0. 1481 01:12:17,540 --> 01:12:18,050 Why? 1482 01:12:18,050 --> 01:12:20,630 Mentally I want to add up all of the person's scores 1483 01:12:20,630 --> 01:12:24,480 and then I want to divide by the total and that's my mathematical average. 1484 01:12:24,480 --> 01:12:28,970 So here's my loop where I'm iterating from 0 up to, but not 1485 01:12:28,970 --> 01:12:32,060 through the length-- so that should be three times. 1486 01:12:32,060 --> 01:12:37,950 I am adding to the sum variable whatever is at the i-th location, so to speak, 1487 01:12:37,950 --> 01:12:38,850 of the array.
1488 01:12:38,850 --> 01:12:42,050 So this is array, bracket 0; array, bracket, 1; array, bracket, 1489 01:12:42,050 --> 01:12:43,860 2 on each iteration. 1490 01:12:43,860 --> 01:12:46,670 And then the last thing I'm doing is a nice one-liner. 1491 01:12:46,670 --> 01:12:51,470 I'm dividing the sum, which is an int, which is the sum of 72, 73, 33, 1492 01:12:51,470 --> 01:12:56,550 divided by the length, which is 3, but 3 is not a float, so I cast it to a float 1493 01:12:56,550 --> 01:13:03,060 so that the end value, hopefully, is going to be 59.33333 and so forth. 1494 01:13:03,060 --> 01:13:06,380 So the only thing that's weird syntactically is this, though. 1495 01:13:06,380 --> 01:13:10,430 When you define a function in C that takes an argument that isn't just 1496 01:13:10,430 --> 01:13:14,640 a simple char, isn't just a simple integer, it's actually an array, 1497 01:13:14,640 --> 01:13:17,090 you don't have to know the array's length in advance. 1498 01:13:17,090 --> 01:13:19,820 You can just put square brackets after the name you give it. 1499 01:13:19,820 --> 01:13:21,237 And I don't have to call it array. 1500 01:13:21,237 --> 01:13:23,930 I could call it x or y or z or anything else. 1501 01:13:23,930 --> 01:13:26,390 I called it array just to make clear that it's an array, 1502 01:13:26,390 --> 01:13:30,620 but you do need to know the length somehow. 1503 01:13:30,620 --> 01:13:31,120 OK. 1504 01:13:31,120 --> 01:13:37,820 Questions on combining those ideas in that there way? 1505 01:13:37,820 --> 01:13:41,170 1506 01:13:41,170 --> 01:13:42,980 Any questions? 1507 01:13:42,980 --> 01:13:43,850 No? 1508 01:13:43,850 --> 01:13:44,420 All right. 1509 01:13:44,420 --> 01:13:46,790 Well, we've only dealt with numbers thus far. 
1510 01:13:46,790 --> 01:13:50,340 It would be nice to actually deal with letters and words and paragraphs 1511 01:13:50,340 --> 01:13:52,340 and the like, much like our readability example, 1512 01:13:52,340 --> 01:13:56,150 but I think first, some snacks and some fruit are served in the transept. 1513 01:13:56,150 --> 01:13:57,140 So we'll see you in 10. 1514 01:13:57,140 --> 01:13:59,480 See you in 10. 1515 01:13:59,480 --> 01:14:00,320 All right. 1516 01:14:00,320 --> 01:14:01,190 So we're back. 1517 01:14:01,190 --> 01:14:02,960 And up until now, we've been representing 1518 01:14:02,960 --> 01:14:05,060 just numbers underneath the hood, but we've 1519 01:14:05,060 --> 01:14:07,760 introduced arrays, which gave us this ability, recall, 1520 01:14:07,760 --> 01:14:10,260 to store numbers back to back to back. 1521 01:14:10,260 --> 01:14:13,310 So it turns out, you actually had this capability for the past 1522 01:14:13,310 --> 01:14:15,620 week even though you might not have realized it. 1523 01:14:15,620 --> 01:14:19,100 And let me propose that we first consider a very simple example of three 1524 01:14:19,100 --> 01:14:20,750 chars instead of three integers. 1525 01:14:20,750 --> 01:14:23,390 And for simplicity, I'm going to call them c1, c2, 1526 01:14:23,390 --> 01:14:25,400 and c3 just for the sake of discussion. 1527 01:14:25,400 --> 01:14:29,090 But I'm going to put our familiar characters, "HI!" 1528 01:14:29,090 --> 01:14:32,330 in those variables using single quotes because, again, 1529 01:14:32,330 --> 01:14:35,900 that's what you do when using individual chars 1530 01:14:35,900 --> 01:14:40,282 to make the point that I can store three chars in three separate variables. 1531 01:14:40,282 --> 01:14:41,990 So let me go ahead and go over to VS Code 1532 01:14:41,990 --> 01:14:45,180 here and let me create something called hi.c. 1533 01:14:45,180 --> 01:14:50,970 And in this program, I'll first include stdio.h, int main void as before.
1534 01:14:50,970 --> 01:14:53,430 And then inside of main, let's just do exactly that. 1535 01:14:53,430 --> 01:14:57,540 Char c1 equals, quote-unquote, capital H. Char c2 equals, 1536 01:14:57,540 --> 01:15:00,420 quote-unquote, capital I. Char c3 equals, 1537 01:15:00,420 --> 01:15:02,550 quote-unquote, exclamation point. 1538 01:15:02,550 --> 01:15:06,450 So clearly not the best approach, but just for demonstration's sake. 1539 01:15:06,450 --> 01:15:09,780 And here now that you understand hopefully 1540 01:15:09,780 --> 01:15:12,780 from week 1 that really number-- and really, from week 0, 1541 01:15:12,780 --> 01:15:16,020 that numbers are just letters, which can be something more, too. 1542 01:15:16,020 --> 01:15:18,570 We can really just use our basic understanding of C 1543 01:15:18,570 --> 01:15:21,180 to tinker with these ideas now and see them such 1544 01:15:21,180 --> 01:15:24,900 that there is indeed going to be no magic happening for us ultimately. 1545 01:15:24,900 --> 01:15:31,800 So let me go ahead and print out three characters-- %c, %c, %c, backslash n. 1546 01:15:31,800 --> 01:15:34,800 And then print out c1, c2, c3. 1547 01:15:34,800 --> 01:15:36,690 So I've got three separate placeholders. 1548 01:15:36,690 --> 01:15:40,560 And we haven't really had occasion to use %c, but it means put char here, 1549 01:15:40,560 --> 01:15:44,760 unlike %s, which is put a whole string here, or %i, put an integer. 1550 01:15:44,760 --> 01:15:49,290 Let me go ahead and make hi, no syntax errors, ./hi, 1551 01:15:49,290 --> 01:15:51,330 and it should print out "HI!" 1552 01:15:51,330 --> 01:15:53,400 with an exclamation point because I'm printing out 1553 01:15:53,400 --> 01:15:54,870 just three simple characters.
1554 01:15:54,870 --> 01:15:57,850 But per our discussion as far back as week 0, 1555 01:15:57,850 --> 01:16:01,440 letters are just numbers and numbers are just letters, 1556 01:16:01,440 --> 01:16:03,840 it just depends on the context in which we use them. 1557 01:16:03,840 --> 01:16:05,792 So let me change this %c to an i. 1558 01:16:05,792 --> 01:16:08,250 And I'm going to add a space just so that you can obviously 1559 01:16:08,250 --> 01:16:10,050 separate one number from another. 1560 01:16:10,050 --> 01:16:14,850 Change this to i, change this to i, but still print out c1, c2, c3. 1561 01:16:14,850 --> 01:16:16,650 So no integers, per se. 1562 01:16:16,650 --> 01:16:19,500 Let me just print out those chars. 1563 01:16:19,500 --> 01:16:26,670 Let me do make hi, no errors, ./hi, and now I see 72, 73, 33. 1564 01:16:26,670 --> 01:16:31,270 So in the case of chars and ints, you can actually treat one as the other 1565 01:16:31,270 --> 01:16:33,850 so long as you have enough bits to fit one in the other. 1566 01:16:33,850 --> 01:16:36,450 You don't have to cast even or do anything explicitly. 1567 01:16:36,450 --> 01:16:38,340 You do have to cast one of-- 1568 01:16:38,340 --> 01:16:41,910 converting an integer to a float to make clear to the compiler 1569 01:16:41,910 --> 01:16:44,160 that you really intend to do this because that 1570 01:16:44,160 --> 01:16:47,400 could be destructive if it can't quite represent the number as you intend. 1571 01:16:47,400 --> 01:16:50,880 But in this case here, I think we're OK just poking around and seeing 1572 01:16:50,880 --> 01:16:52,750 what's going on underneath the hood. 1573 01:16:52,750 --> 01:16:55,202 Well, what is going on underneath the hood memory-wise? 1574 01:16:55,202 --> 01:16:56,410 Well, something very similar. 1575 01:16:56,410 --> 01:16:57,780 Here's that canvas of memory. 
1576 01:16:57,780 --> 01:17:00,570 And maybe we got lucky and it's in the top left-hand corner 1577 01:17:00,570 --> 01:17:03,270 like this-- c1, c2, c3. 1578 01:17:03,270 --> 01:17:05,790 But these are just three individual characters, 1579 01:17:05,790 --> 01:17:08,970 but we're getting awfully close to what we last week called 1580 01:17:08,970 --> 01:17:12,270 a string, which are just characters, a sequence of characters 1581 01:17:12,270 --> 01:17:13,500 from left to right. 1582 01:17:13,500 --> 01:17:19,530 And in fact, I think if we combine this revelation that these are just 1583 01:17:19,530 --> 01:17:22,410 numbers underneath the hood back to back to back combined 1584 01:17:22,410 --> 01:17:25,620 with the idea of an array from earlier, we can 1585 01:17:25,620 --> 01:17:27,690 start to see what's really going on. 1586 01:17:27,690 --> 01:17:31,920 Because indeed, underneath the hood, this is just a number, 72, 73, 33. 1587 01:17:31,920 --> 01:17:34,290 And really, if we go lower level than that, 1588 01:17:34,290 --> 01:17:36,870 it's these three patterns of 0's and 1's. 1589 01:17:36,870 --> 01:17:39,270 That's all that's going on inside of the computer, 1590 01:17:39,270 --> 01:17:43,380 but it's our use of int that shows it to us as an integer. 1591 01:17:43,380 --> 01:17:47,250 It's our use of char that makes it clear that it's a char, or equivalently, 1592 01:17:47,250 --> 01:17:50,260 %i and %c respectively. 1593 01:17:50,260 --> 01:17:52,180 But what exactly is a string? 1594 01:17:52,180 --> 01:17:54,540 Well, it's really just a sequence of characters, 1595 01:17:54,540 --> 01:17:56,530 and so why don't we go there? 1596 01:17:56,530 --> 01:17:59,400 Let me propose that we actually give ourselves an actual string, 1597 01:17:59,400 --> 01:18:02,260 call it s-- we'll use double quotes this time. 
1598 01:18:02,260 --> 01:18:05,760 So if I go back to VS Code here, let me shorten this program 1599 01:18:05,760 --> 01:18:10,440 and just give myself a single string s, set it equal to "HI!" 1600 01:18:10,440 --> 01:18:11,310 in double quotes. 1601 01:18:11,310 --> 01:18:16,740 And then below that, let's go ahead and print out %s, backslash n, 1602 01:18:16,740 --> 01:18:18,180 and then s itself. 1603 01:18:18,180 --> 01:18:21,120 And then, turns out, for reasons we'll soon 1604 01:18:21,120 --> 01:18:23,670 see, I do need to include the CS50 Library so as 1605 01:18:23,670 --> 01:18:27,750 to use the actual keyword string here even though I'm not using get_string, 1606 01:18:27,750 --> 01:18:29,490 but more on that another time. 1607 01:18:29,490 --> 01:18:34,950 But if I now do make hi, it does compile ./hi and it still prints out the exact 1608 01:18:34,950 --> 01:18:35,920 same thing. 1609 01:18:35,920 --> 01:18:38,370 But what's going on inside of the computer's memory 1610 01:18:38,370 --> 01:18:42,550 when I use a string called s instead of three chars, well, 1611 01:18:42,550 --> 01:18:46,300 you can think of the string as taking up at least three bytes, H, 1612 01:18:46,300 --> 01:18:47,690 I, exclamation point. 1613 01:18:47,690 --> 01:18:50,440 But it's not three separate variables, it's one variable. 1614 01:18:50,440 --> 01:18:53,620 But what does this really look like now, especially 1615 01:18:53,620 --> 01:18:56,020 if I add back the yellow lines? 1616 01:18:56,020 --> 01:19:00,970 s is really just an array of characters. 
1617 01:19:00,970 --> 01:19:04,170 So we called it a string last week, and I claim today 1618 01:19:04,170 --> 01:19:10,320 that this is an abstraction in the CS50 library that's giving us this string, 1619 01:19:10,320 --> 01:19:13,560 but it's really just an array of size at least 3 1620 01:19:13,560 --> 01:19:16,560 here where s, bracket, 0 presumably gives me the H, s, bracket, 1621 01:19:16,560 --> 01:19:19,720 1 is the I, s, bracket, 2 is the exclamation point. 1622 01:19:19,720 --> 01:19:22,410 But just by saying string, all of that happens automatically. 1623 01:19:22,410 --> 01:19:25,320 I don't even need to tell the computer how many chars are 1624 01:19:25,320 --> 01:19:27,880 going to be in this string all at once. 1625 01:19:27,880 --> 01:19:31,680 So in fact, let me go over to maybe a variant of this program 1626 01:19:31,680 --> 01:19:33,570 and we can see this syntactically. 1627 01:19:33,570 --> 01:19:37,480 So instead of printing out the whole string with %s, 1628 01:19:37,480 --> 01:19:43,320 let me actually be a little curious and print out %c, %c, %c, 1629 01:19:43,320 --> 01:19:47,490 and then change s to s, bracket, 0, s, bracket, 1, s, bracket, 2. 1630 01:19:47,490 --> 01:19:49,140 Which is not better in any sense. 1631 01:19:49,140 --> 01:19:51,000 This is way more tedious now, but it does 1632 01:19:51,000 --> 01:19:54,840 demonstrate that I can treat s here in week 2 1633 01:19:54,840 --> 01:19:57,870 as though it's an array, which means even in week 1 it was an array, 1634 01:19:57,870 --> 01:19:58,840 we just didn't know it. 1635 01:19:58,840 --> 01:20:01,730 We didn't have the syntax with which to express that. 1636 01:20:01,730 --> 01:20:05,740 So if I now do make hi, still compiles ./hi.
1637 01:20:05,740 --> 01:20:09,370 Same exact output, but I'm now just kind of manipulating 1638 01:20:09,370 --> 01:20:11,380 the string in these different ways because I 1639 01:20:11,380 --> 01:20:13,720 know a string is just an array of characters, so I can 1640 01:20:13,720 --> 01:20:16,450 treat it with the square bracket notation. 1641 01:20:16,450 --> 01:20:21,100 But how do I know-- how does the computer know where hi ends? 1642 01:20:21,100 --> 01:20:23,890 And this is where strings get a little dangerous. 1643 01:20:23,890 --> 01:20:26,050 Like a char is 1 byte no matter what. 1644 01:20:26,050 --> 01:20:28,480 1 char, 1 character, that's it. 1645 01:20:28,480 --> 01:20:31,180 But a string, recall my question mark from earlier, 1646 01:20:31,180 --> 01:20:33,415 could be null bytes if it's-- 1647 01:20:33,415 --> 01:20:37,610 you would think could be 0 bytes if you have nothing in it inside the quotes. 1648 01:20:37,610 --> 01:20:40,930 It could be one character, two, 10, 100 like I claimed, 1649 01:20:40,930 --> 01:20:44,140 but how does the computer know where strings end? 1650 01:20:44,140 --> 01:20:47,560 Like how does the computer know that the string is not 1651 01:20:47,560 --> 01:20:49,510 the whole row of memory here? 1652 01:20:49,510 --> 01:20:51,350 How does it know that it ends here? 1653 01:20:51,350 --> 01:20:54,880 Well, it turns out, all this time, when we've been using, quote-unquote, 1654 01:20:54,880 --> 01:20:58,120 string and using get_string from the CS50 library, 1655 01:20:58,120 --> 01:21:00,640 there's actually a special sentinel value 1656 01:21:00,640 --> 01:21:03,580 at the end of every string in a computer's memory 1657 01:21:03,580 --> 01:21:06,700 that tells the computer, string stops here. 1658 01:21:06,700 --> 01:21:08,890 And the sentinel value-- and by sentinel, I 1659 01:21:08,890 --> 01:21:13,820 just mean special value that the world decided on decades ago, is all 0 bits.
1660 01:21:13,820 --> 01:21:20,210 If you have a byte with all 0 bits in it, that means string ends here. 1661 01:21:20,210 --> 01:21:23,920 So the implication is that the computer now, using a loop or something, 1662 01:21:23,920 --> 01:21:26,650 can print out char, char, char-- oh, done, 1663 01:21:26,650 --> 01:21:28,750 because it sees this special value. 1664 01:21:28,750 --> 01:21:32,800 If it didn't have that, it might blindly go char, char, char, char char-- 1665 01:21:32,800 --> 01:21:37,450 printing out values of memory that don't belong to that given string. 1666 01:21:37,450 --> 01:21:39,490 So I was correcting myself verbally a moment ago 1667 01:21:39,490 --> 01:21:44,140 because I said that this string is of length 3, it's 3 bytes, but it's not. 1668 01:21:44,140 --> 01:21:47,290 Every string in the world, both last week and now, this 1669 01:21:47,290 --> 01:21:51,430 is actually n plus 1 bytes where n is the actual human length 1670 01:21:51,430 --> 01:21:54,220 that you care about, H-I, exclamation point, or 3, 1671 01:21:54,220 --> 01:21:59,110 but it's always going to use one extra byte for this so-called zero value 1672 01:21:59,110 --> 01:21:59,770 at the end. 1673 01:21:59,770 --> 01:22:03,220 And this 0 value is very tedious to write a 0-- 1674 01:22:03,220 --> 01:22:04,630 as 8 0 bits. 1675 01:22:04,630 --> 01:22:07,240 So we would actually typically just write it as a 0. 1676 01:22:07,240 --> 01:22:10,420 But you don't want to confuse a 0 on the screen-- it's actually being 1677 01:22:10,420 --> 01:22:12,290 like the number 0 on the keyboard. 1678 01:22:12,290 --> 01:22:16,000 And so we would actually typically write this symbol with a backslash 0. 1679 01:22:16,000 --> 01:22:19,960 So this is the char-based representation of 0. 
1680 01:22:19,960 --> 01:22:21,970 So it means the exact same thing, this is just 1681 01:22:21,970 --> 01:22:26,470 C notation that indicates that this is 8 0 bits, 1682 01:22:26,470 --> 01:22:29,380 but just makes clear that it's not literally the number 1683 01:22:29,380 --> 01:22:32,320 0 that you want to see on the screen, it's a sentinel value 1684 01:22:32,320 --> 01:22:34,880 that is terminating this here string. 1685 01:22:34,880 --> 01:22:38,480 So now what can I do once I know this information? 1686 01:22:38,480 --> 01:22:41,740 Well, I can actually even see this. Let me go back to this code 1687 01:22:41,740 --> 01:22:42,790 here in VS Code. 1688 01:22:42,790 --> 01:22:46,190 Let me change these %c's to %i's just like before. 1689 01:22:46,190 --> 01:22:50,290 And now, we'll see again those same numbers, make hi, ./hi, 1690 01:22:50,290 --> 01:22:51,730 there are the three. 1691 01:22:51,730 --> 01:22:56,410 I can technically poke around a little bit further, %i one more, 1692 01:22:56,410 --> 01:22:58,210 and let's look at s, bracket, 3. 1693 01:22:58,210 --> 01:23:02,060 I was not exaggerating earlier when I said, 1694 01:23:02,060 --> 01:23:06,260 in general, if you go past the end of an array, bad things can happen. 1695 01:23:06,260 --> 01:23:10,030 But in this case, I know that there is one more thing at the end of this array 1696 01:23:10,030 --> 01:23:13,210 because this is how strings are built. This is not a CS50 thing, 1697 01:23:13,210 --> 01:23:17,440 this is a thing in C. Every string in the world in double quotes 1698 01:23:17,440 --> 01:23:20,780 ends with a backslash 0-- that is 8 0 bits. 1699 01:23:20,780 --> 01:23:24,400 So if I really want, I can see this by printing out s, bracket, 3, 1700 01:23:24,400 --> 01:23:26,290 which is the fourth and final location. 1701 01:23:26,290 --> 01:23:34,120 If I recompile my code now, make hi ./hi, I should see 72, 73, 33, and 0. 1702 01:23:34,120 --> 01:23:35,470 That's always been there.
1703 01:23:35,470 --> 01:23:40,900 So I'm always using 4 bytes, somewhat wastefully, but somewhat necessarily 1704 01:23:40,900 --> 01:23:45,080 so that the computer actually knows where that string ends. 1705 01:23:45,080 --> 01:23:48,070 So if we go back to the memory representation of this here, 1706 01:23:48,070 --> 01:23:52,990 it's just as though you have an array of integers being stored contiguously back 1707 01:23:52,990 --> 01:23:56,800 to back to back, the last one of which means this is the end of the array 1708 01:23:56,800 --> 01:24:00,120 of characters, but because I'm using, quote-unquote, "string," 1709 01:24:00,120 --> 01:24:04,080 because I'm using %s and %c, I'm not seeing these numbers by default, 1710 01:24:04,080 --> 01:24:08,950 I'm seeing H-I, exclamation point unless I explicitly tell printf, no, no, no, 1711 01:24:08,950 --> 01:24:13,470 no, show me with %i these actual integers. 1712 01:24:13,470 --> 01:24:15,840 This, then, is how you can think about the string. 1713 01:24:15,840 --> 01:24:17,100 Like you don't really need to think about 1714 01:24:17,100 --> 01:24:18,540 it as being individual characters. 1715 01:24:18,540 --> 01:24:21,600 This is just s, and it has some length here, 1716 01:24:21,600 --> 01:24:26,760 but it's not necessarily an array that you yourself have to create, 1717 01:24:26,760 --> 01:24:30,820 you get it automatically just by using a string. 1718 01:24:30,820 --> 01:24:32,910 Now there's just-- not to add on to the jargon. 1719 01:24:32,910 --> 01:24:35,760 This backslash 0, these 8 0 bits, there's 1720 01:24:35,760 --> 01:24:37,290 actually a technical term for them. 1721 01:24:37,290 --> 01:24:38,430 You can call them NUL. 1722 01:24:38,430 --> 01:24:41,430 It's typically written in all caps like this, confusingly. 1723 01:24:41,430 --> 01:24:44,580 In a couple of weeks, we're going to see another word pronounced null, 1724 01:24:44,580 --> 01:24:48,720 but spelled N-U-L-L.
Left hand wasn't talking to right hand years ago, 1725 01:24:48,720 --> 01:24:54,000 but N-U-L means this is the 0 byte that terminates strings, 1726 01:24:54,000 --> 01:24:56,520 that indicates the end of a string. 1727 01:24:56,520 --> 01:25:00,070 And fun fact, you've actually seen this before even though we glossed over it. 1728 01:25:00,070 --> 01:25:02,490 Here's that ASCII chart from last time. 1729 01:25:02,490 --> 01:25:08,850 If I focus on the leftmost column, guess what is the 0 ASCII character? 1730 01:25:08,850 --> 01:25:09,480 NUL. 1731 01:25:09,480 --> 01:25:14,005 You never see null on the screen, it's just how you pronounce 8 0 bits. 1732 01:25:14,005 --> 01:25:14,810 Whew! 1733 01:25:14,810 --> 01:25:17,360 Questions on this representation of strings? 1734 01:25:17,360 --> 01:25:18,250 Yeah? 1735 01:25:18,250 --> 01:25:20,090 AUDIENCE: Are strings [INAUDIBLE]? 1736 01:25:20,090 --> 01:25:22,380 DAVID MALAN: Are strings structured differently in other languages? 1737 01:25:22,380 --> 01:25:22,760 Yes. 1738 01:25:22,760 --> 01:25:24,590 They are more powerful in other languages. 1739 01:25:24,590 --> 01:25:28,070 In C, you have to build them yourself in this way. 1740 01:25:28,070 --> 01:25:29,900 More on that when we get to Python. 1741 01:25:29,900 --> 01:25:30,710 Other questions. 1742 01:25:30,710 --> 01:25:31,593 Yeah? 1743 01:25:31,593 --> 01:25:41,170 AUDIENCE: [INAUDIBLE] 1744 01:25:41,170 --> 01:25:42,670 DAVID MALAN: A really good question. 1745 01:25:42,670 --> 01:25:45,840 Does that mean we don't have a function to get the length of a string? 1746 01:25:45,840 --> 01:25:47,700 Do we have to create it? 1747 01:25:47,700 --> 01:25:51,360 Short answer, there is a function, but you have to-- someone 1748 01:25:51,360 --> 01:25:52,540 had to write code for it. 1749 01:25:52,540 --> 01:25:56,010 You can't just ask the string itself like you can in JavaScript or Java.
1750 01:25:56,010 --> 01:25:57,150 What is the-- 1751 01:25:57,150 --> 01:25:59,355 AUDIENCE: [INAUDIBLE] 1752 01:25:59,355 --> 01:26:00,480 DAVID MALAN: Yeah, you can. 1753 01:26:00,480 --> 01:26:04,230 It's actually more similar to Python than it is to JavaScript or Java, 1754 01:26:04,230 --> 01:26:07,110 but we'll see that in just a few minutes, in fact. 1755 01:26:07,110 --> 01:26:09,340 So let's introduce maybe a couple of strings. 1756 01:26:09,340 --> 01:26:12,360 So here's two strings in the abstract called s and t, 1757 01:26:12,360 --> 01:26:15,210 and I've initialized them arbitrarily to "HI!" and "BYE!" 1758 01:26:15,210 --> 01:26:18,840 just so we can explore what's going to actually happen underneath the hood. 1759 01:26:18,840 --> 01:26:20,640 So let me go back to VS Code. 1760 01:26:20,640 --> 01:26:23,680 Let me just completely change this program to be that instead. 1761 01:26:23,680 --> 01:26:26,280 So string equals, quote-unquote, "HI!" 1762 01:26:26,280 --> 01:26:28,860 String t equals, quote-unquote, "BYE!" 1763 01:26:28,860 --> 01:26:29,860 in all caps. 1764 01:26:29,860 --> 01:26:34,620 And then let's print them both out very simply. %s backslash n, s. 1765 01:26:34,620 --> 01:26:39,570 Print out %s backslash n, t just so we can see what's going on. 1766 01:26:39,570 --> 01:26:44,183 If I do make hi ./hi, I should, of course, see these two strings. 1767 01:26:44,183 --> 01:26:46,350 But what's going on inside of the computer's memory? 1768 01:26:46,350 --> 01:26:48,868 Well, in this computer's memory, assuming 1769 01:26:48,868 --> 01:26:51,660 these are the only two variables involved and assuming the computer 1770 01:26:51,660 --> 01:26:55,170 is just doing things top to bottom, "HI!" 1771 01:26:55,170 --> 01:26:58,260 is probably going to be stored somewhere like this on my canvas of memory, 1772 01:26:58,260 --> 01:26:58,950 "BYE!" 1773 01:26:58,950 --> 01:27:00,290 is probably going to be stored there. 
1774 01:27:00,290 --> 01:27:03,165 And it's wrapping around, but that's just an artist's representation. 1775 01:27:03,165 --> 01:27:05,380 But notice that it is now really important 1776 01:27:05,380 --> 01:27:08,890 that there is this NUL byte at the end of each string 1777 01:27:08,890 --> 01:27:11,650 because that's how the computer is going to know where "HI!" 1778 01:27:11,650 --> 01:27:13,630 ends and where "BYE!" 1779 01:27:13,630 --> 01:27:15,670 begins, otherwise you might see "HI!" 1780 01:27:15,670 --> 01:27:16,360 "BYE!" 1781 01:27:16,360 --> 01:27:20,380 all on the screen at once if there weren't the sentinel value indicating 1782 01:27:20,380 --> 01:27:23,860 to printf, stop at this character. 1783 01:27:23,860 --> 01:27:26,290 But that's all that's going on in your program 1784 01:27:26,290 --> 01:27:29,080 when you have two variables in this way. 1785 01:27:29,080 --> 01:27:32,290 And in fact, what's really going on and things get a little more interesting 1786 01:27:32,290 --> 01:27:37,310 here, if I were to want two of these things, 1787 01:27:37,310 --> 01:27:40,630 notice that I could refer to them too as arrays. 1788 01:27:40,630 --> 01:27:43,990 So s, bracket, 0, 1, 2, and even 3. 1789 01:27:43,990 --> 01:27:47,110 t, bracket, 0, 1, 2, and even 3 and 4. 1790 01:27:47,110 --> 01:27:51,460 But if I want to actually really blend some ideas, 1791 01:27:51,460 --> 01:27:54,190 just playing around with these basic principles now, 1792 01:27:54,190 --> 01:27:56,140 notice what I can do in this version. 1793 01:27:56,140 --> 01:27:59,200 If I know I've got two arrays in VS Code, 1794 01:27:59,200 --> 01:28:02,950 I don't strictly need to do string s and t and u 1795 01:28:02,950 --> 01:28:08,260 and v. That's devolving back into the scores1, scores2, scores3 mantra where 1796 01:28:08,260 --> 01:28:10,277 I had multiple variables with almost the same name 1797 01:28:10,277 --> 01:28:12,610 even though I'm using different letters of the alphabet.
1798 01:28:12,610 --> 01:28:13,840 What if I want-- 1799 01:28:13,840 --> 01:28:15,280 what if I do this? 1800 01:28:15,280 --> 01:28:19,660 string words, and if I want to store two words in the computer's memory, fine. 1801 01:28:19,660 --> 01:28:22,700 Create an array of two strings. 1802 01:28:22,700 --> 01:28:23,620 But what is a string? 1803 01:28:23,620 --> 01:28:28,870 A string is an array of characters, so it's getting a little bit trippy here, 1804 01:28:28,870 --> 01:28:32,290 but the ideas are still going to be the same. words, bracket, 1805 01:28:32,290 --> 01:28:34,540 0 could certainly equal "HI!" 1806 01:28:34,540 --> 01:28:39,280 words, bracket, 1 can certainly equal "BYE!" just like the scores example. 1807 01:28:39,280 --> 01:28:42,910 And then if I want to print these things with %s, I can print out words, 1808 01:28:42,910 --> 01:28:44,080 bracket, 0. 1809 01:28:44,080 --> 01:28:48,820 And then I can print out %s backslash n words bracket 1. 1810 01:28:48,820 --> 01:28:52,520 And the example is not going to be any different in terms of its output, 1811 01:28:52,520 --> 01:28:58,240 but I've now avoided s and t, I now just have one variable called words 1812 01:28:58,240 --> 01:29:00,710 containing both of these here things. 1813 01:29:00,710 --> 01:29:02,800 And if I really want to poke around, here's 1814 01:29:02,800 --> 01:29:06,490 where things get even more visually overwhelming, 1815 01:29:06,490 --> 01:29:09,640 but just the logical extension of these same ideas. 1816 01:29:09,640 --> 01:29:13,300 Right now is the previous version where I had two variables, s and t. 1817 01:29:13,300 --> 01:29:17,290 If I now use this new version where I have one variable called words, 1818 01:29:17,290 --> 01:29:22,060 just like this here, the picture should follow logically like this. 1819 01:29:22,060 --> 01:29:26,320 words, bracket, 0 is this string; words, bracket, 1 is this string; 1820 01:29:26,320 --> 01:29:27,940 but what is each string? 
1821 01:29:27,940 --> 01:29:29,840 It's an array of characters. 1822 01:29:29,840 --> 01:29:36,520 And so you can also think of it like this, where this H is words, bracket, 1823 01:29:36,520 --> 01:29:37,930 0, bracket, 0. 1824 01:29:37,930 --> 01:29:41,440 So the 0-th character of the 0-th word. 1825 01:29:41,440 --> 01:29:45,580 And this is words, bracket, 0, 1; words, bracket, 0, 2; words, bracket, 0, 3. 1826 01:29:45,580 --> 01:29:49,180 And then words, bracket, 1, 0. 1827 01:29:49,180 --> 01:29:52,382 So it's kind of like a two-dimensional array, almost. 1828 01:29:52,382 --> 01:29:54,340 And you can think about it that way if helpful. 1829 01:29:54,340 --> 01:29:58,400 But for now, it's just applying the same principles to the code. 1830 01:29:58,400 --> 01:30:01,930 So if I go to my code here and I've got my "HI!" and my "BYE!"-- 1831 01:30:01,930 --> 01:30:07,000 this is going to look a little stupid, but let me change this %s to %c, %c, 1832 01:30:07,000 --> 01:30:09,640 %c, and print out words, bracket, 0. 1833 01:30:09,640 --> 01:30:11,620 words, bracket, 0, bracket 1. 1834 01:30:11,620 --> 01:30:16,900 words, bracket, 0, bracket, 2 to print out that three-letter word. 1835 01:30:16,900 --> 01:30:21,550 And now down here, let me print out %c, %c, %c, 1836 01:30:21,550 --> 01:30:24,340 %c because it's four letters in BYE, exclamation point. 1837 01:30:24,340 --> 01:30:28,570 This is words, bracket, 1, but the first character; words, bracket, 1, 1838 01:30:28,570 --> 01:30:32,920 the second character; words, bracket, 1, the third character; 1839 01:30:32,920 --> 01:30:34,948 and words, bracket, 1, the fourth character. 1840 01:30:34,948 --> 01:30:37,240 It's hard to say when you're typing a different number, 1841 01:30:37,240 --> 01:30:40,810 but that's what we get by using zero indexing, so to speak. 1842 01:30:40,810 --> 01:30:41,720 make hi. 1843 01:30:41,720 --> 01:30:42,220 Whew! 1844 01:30:42,220 --> 01:30:42,940 No mistakes. 
1845 01:30:42,940 --> 01:30:43,780 "HI!" 1846 01:30:43,780 --> 01:30:45,440 Says the same thing. 1847 01:30:45,440 --> 01:30:46,840 So again, there's no magic. 1848 01:30:46,840 --> 01:30:49,630 Like you are fully in control over what's going 1849 01:30:49,630 --> 01:30:51,560 on inside of the computer's memory. 1850 01:30:51,560 --> 01:30:54,250 And now that we have this array syntax with square brackets, 1851 01:30:54,250 --> 01:30:58,740 you can both create these things and then manipulate them or access them 1852 01:30:58,740 --> 01:31:01,310 however you so choose. 1853 01:31:01,310 --> 01:31:01,810 Whew! 1854 01:31:01,810 --> 01:31:08,540 Questions on arrays or strings in this way? 1855 01:31:08,540 --> 01:31:10,032 Yeah, over here. 1856 01:31:10,032 --> 01:31:13,340 AUDIENCE: Can you have any array that has multiple data types in it? 1857 01:31:13,340 --> 01:31:13,880 DAVID MALAN: Good question. 1858 01:31:13,880 --> 01:31:16,255 Can you have an array with multiple different data types? 1859 01:31:16,255 --> 01:31:19,310 Short answer, no; longer answer, sort of, 1860 01:31:19,310 --> 01:31:22,670 but not in nearly the same user-friendly way as with languages 1861 01:31:22,670 --> 01:31:25,220 like Python or JavaScript or others. 1862 01:31:25,220 --> 01:31:30,580 So assume for now arrays should be the same type in C. Other questions? 1863 01:31:30,580 --> 01:31:31,997 Yeah, over here. 1864 01:31:31,997 --> 01:31:34,432 AUDIENCE: When you talk about [INAUDIBLE]?? 1865 01:31:34,432 --> 01:31:47,113 1866 01:31:47,113 --> 01:31:48,780 DAVID MALAN: Oh, a really good question. 1867 01:31:48,780 --> 01:31:51,500 It will-- so for those who couldn't hear, 1868 01:31:51,500 --> 01:31:54,425 if you were to look past the end of one array, 1869 01:31:54,425 --> 01:31:56,550 would you start to see the beginning of the second? 1870 01:31:56,550 --> 01:31:58,008 In this case, maybe the word "BYE!" 
1871 01:31:58,008 --> 01:32:01,070 Could depend on the particulars of your code in the computer. 1872 01:32:01,070 --> 01:32:02,250 Let's try this. 1873 01:32:02,250 --> 01:32:07,310 So let's get a little greedy here and go one past H-I, exclamation point, 1874 01:32:07,310 --> 01:32:11,450 null character by looking at words, bracket, 0, 3, 1875 01:32:11,450 --> 01:32:16,220 which should actually be our null character, so that's going to be there. 1876 01:32:16,220 --> 01:32:18,350 And actually, let's see. 1877 01:32:18,350 --> 01:32:19,490 Let's go ahead and do this. 1878 01:32:19,490 --> 01:32:21,530 Make hi ./hi. 1879 01:32:21,530 --> 01:32:25,310 Still works as expected, but let me change this to integer, 1880 01:32:25,310 --> 01:32:27,770 integer so we can actually see what's going on. 1881 01:32:27,770 --> 01:32:28,610 Integer. 1882 01:32:28,610 --> 01:32:32,840 And now, if I recompile make hi, I should see the same thing, 1883 01:32:32,840 --> 01:32:34,100 but numerically. 1884 01:32:34,100 --> 01:32:37,430 And now what I think you're proposing is let's get a little crazy 1885 01:32:37,430 --> 01:32:41,000 and go even past that to what could be location 4, 1886 01:32:41,000 --> 01:32:45,740 but we know semantically doesn't exist, but maybe is bumping up against "BYE!" 1887 01:32:45,740 --> 01:32:49,140 So make hi ./hi. 1888 01:32:49,140 --> 01:32:52,440 And guess what 66 is. 1889 01:32:52,440 --> 01:32:54,360 Well, just the B, but yes. 1890 01:32:54,360 --> 01:32:59,600 66, recall, is capital B because in week 0, capital A was 65. 1891 01:32:59,600 --> 01:33:01,350 So indeed, now we're really poking around. 1892 01:33:01,350 --> 01:33:02,267 And you can get crazy. 1893 01:33:02,267 --> 01:33:05,520 Like, what's 400 characters away and see what's going on there. 1894 01:33:05,520 --> 01:33:07,870 Eventually your program will probably crash, 1895 01:33:07,870 --> 01:33:12,300 and so don't poke around too much, but more on that in the coming days, too. 
1896 01:33:12,300 --> 01:33:16,298 All right, well how about some other revelations and problem-solving? 1897 01:33:16,298 --> 01:33:18,840 Now coming back to the question about strings length earlier, 1898 01:33:18,840 --> 01:33:21,465 and we'll see if we can then tie this all together to something 1899 01:33:21,465 --> 01:33:24,390 like cryptography in the end and manipulating strings 1900 01:33:24,390 --> 01:33:26,580 for the purpose of sending them securely. 1901 01:33:26,580 --> 01:33:30,430 So let me propose that we go into VS Code here again in a moment. 1902 01:33:30,430 --> 01:33:32,430 And I'm going to create a program called length. 1903 01:33:32,430 --> 01:33:36,490 Let's actually figure out ourselves the length of a string initially. 1904 01:33:36,490 --> 01:33:39,750 So I'm going to go ahead and code length.c. 1905 01:33:39,750 --> 01:33:42,450 I'm going to go ahead and include cs50.h. 1906 01:33:42,450 --> 01:33:46,170 I'm going to include stdio.h, int main void. 1907 01:33:46,170 --> 01:33:49,620 And then inside of main, I'm going to prompt the user for their name. 1908 01:33:49,620 --> 01:33:51,930 get_string, quote-unquote, "Name." 1909 01:33:51,930 --> 01:33:55,152 And then I'm going to go ahead and I want 1910 01:33:55,152 --> 01:33:56,610 to count the length of this string. 1911 01:33:56,610 --> 01:33:57,943 But I know what a string is now. 1912 01:33:57,943 --> 01:34:01,720 It's char, char, char, char, and then eventually the null character. 1913 01:34:01,720 --> 01:34:02,685 So I can look for that. 1914 01:34:02,685 --> 01:34:04,560 And I can write this in a few different ways. 1915 01:34:04,560 --> 01:34:06,518 I know a bunch of different types of loops now, 1916 01:34:06,518 --> 01:34:10,440 but I'm going to go with a while loop by first declaring a variable n, 1917 01:34:10,440 --> 01:34:12,618 for number of characters, set it equal to 0. 
1918 01:34:12,618 --> 01:34:14,910 It's like starting to count with your fingers all down, 1919 01:34:14,910 --> 01:34:17,910 and I want to do the equivalent of this, counting each of the letters 1920 01:34:17,910 --> 01:34:18,810 that I type in. 1921 01:34:18,810 --> 01:34:20,490 So I can do that as follows. 1922 01:34:20,490 --> 01:34:29,160 While the name variable at location n does not equal, 1923 01:34:29,160 --> 01:34:32,910 quote-unquote, backslash 0, which looks weird, 1924 01:34:32,910 --> 01:34:35,850 but it's just asking the question, is the character 1925 01:34:35,850 --> 01:34:39,850 at that location equal to the so-called null character? 1926 01:34:39,850 --> 01:34:43,560 Which is written with single quotes and backslash 0 by convention. 1927 01:34:43,560 --> 01:34:48,300 And what I want to do, while that is true, is just add 1 to n. 1928 01:34:48,300 --> 01:34:52,440 And then at the very bottom here, let's just go ahead and print out with %i 1929 01:34:52,440 --> 01:34:57,540 the value of n because presumably if I type in HI, exclamation point, 1930 01:34:57,540 --> 01:35:01,860 I'm starting at 0 and I'm going to have H, I, exclamation point, 1931 01:35:01,860 --> 01:35:05,800 null character so I don't increment n a fourth time. 1932 01:35:05,800 --> 01:35:08,460 So let's go ahead and run down here. 1933 01:35:08,460 --> 01:35:12,735 make length ./length, Enter. 1934 01:35:12,735 --> 01:35:15,360 Well, I guess I'm asking for name, so I'll do my name for real. 1935 01:35:15,360 --> 01:35:18,840 David, five characters, and I indeed get 5. 1936 01:35:18,840 --> 01:35:22,750 If I used a for loop, I could do something similar, 1937 01:35:22,750 --> 01:35:26,070 but I think this while loop approach, much like our counter from the past, 1938 01:35:26,070 --> 01:35:27,330 is fairly straightforward. 1939 01:35:27,330 --> 01:35:28,600 But what if I want to do this? 1940 01:35:28,600 --> 01:35:30,780 What if I want to make another function for this? 
1941 01:35:30,780 --> 01:35:32,100 Well, I could do that. 1942 01:35:32,100 --> 01:35:32,888 Let me-- 1943 01:35:32,888 --> 01:35:33,930 All right, let's do this. 1944 01:35:33,930 --> 01:35:36,840 Let's write a quick function called string_length. 1945 01:35:36,840 --> 01:35:40,172 It's going to take a string called s or whatever as input. 1946 01:35:40,172 --> 01:35:41,130 And then you know what? 1947 01:35:41,130 --> 01:35:43,590 Let's just do this in that function. 1948 01:35:43,590 --> 01:35:45,810 I'm going to borrow my code from a moment ago. 1949 01:35:45,810 --> 01:35:47,720 I'm going to paste it into this function. 1950 01:35:47,720 --> 01:35:49,470 But I'm not going to print out the length, 1951 01:35:49,470 --> 01:35:51,060 I'm going to return the length n. 1952 01:35:51,060 --> 01:35:53,280 So I have a helper function of sorts that's 1953 01:35:53,280 --> 01:35:55,590 going to hand me back the length of the string, 1954 01:35:55,590 --> 01:36:00,780 and that's why this returns an int, but takes a string as its argument. 1955 01:36:00,780 --> 01:36:01,860 How do I use this? 1956 01:36:01,860 --> 01:36:04,120 Well, first, I do need to copy the prototype 1957 01:36:04,120 --> 01:36:06,090 so I don't get into trouble as before. 1958 01:36:06,090 --> 01:36:07,020 Semicolon. 1959 01:36:07,020 --> 01:36:10,020 And then in my main function, what I think I can do now 1960 01:36:10,020 --> 01:36:11,380 is something like this. 1961 01:36:11,380 --> 01:36:17,942 I can do int length equals the string length of the name variable 1962 01:36:17,942 --> 01:36:18,900 that was just typed in. 1963 01:36:18,900 --> 01:36:23,940 And now using printf %i, print out length, semicolon. 1964 01:36:23,940 --> 01:36:25,440 So exact same logic. 
1965 01:36:25,440 --> 01:36:28,050 The only thing I've done that's different this time is I've 1966 01:36:28,050 --> 01:36:30,210 added a helper function just to demonstrate 1967 01:36:30,210 --> 01:36:32,610 how I can take some pretty basic functionality, 1968 01:36:32,610 --> 01:36:35,010 find the length of a string, and modularize it 1969 01:36:35,010 --> 01:36:38,040 into a function, abstract it away so I never again have 1970 01:36:38,040 --> 01:36:39,270 to copy-paste that while loop. 1971 01:36:39,270 --> 01:36:41,020 I now have a function called string_length 1972 01:36:41,020 --> 01:36:43,695 that will solve this problem for me. 1973 01:36:43,695 --> 01:36:46,600 Whoops, wrong program. make length. 1974 01:36:46,600 --> 01:36:47,100 Huh. 1975 01:36:47,100 --> 01:36:51,590 Use of undeclared identifier 'name.' 1976 01:36:51,590 --> 01:36:53,090 What did I do wrong? 1977 01:36:53,090 --> 01:36:59,350 Apparently on line 16 of length.c, what did I do wrong here? 1978 01:36:59,350 --> 01:37:00,639 Yeah, in front. 1979 01:37:00,639 --> 01:37:06,210 AUDIENCE: [INAUDIBLE] 1980 01:37:06,210 --> 01:37:06,960 DAVID MALAN: Good. 1981 01:37:06,960 --> 01:37:09,190 AUDIENCE: [INAUDIBLE] 1982 01:37:09,190 --> 01:37:09,940 DAVID MALAN: Good. 1983 01:37:09,940 --> 01:37:10,840 Perfect terminology. 1984 01:37:10,840 --> 01:37:12,850 So name is local to main. 1985 01:37:12,850 --> 01:37:16,930 The scope of name is main, though sounds similar, but different words. 1986 01:37:16,930 --> 01:37:19,720 And so I actually should be calling this 1987 01:37:19,720 --> 01:37:24,970 s because s is the name of the local variable being passed in even though it 1988 01:37:24,970 --> 01:37:29,410 happens to be one and the same as name because on line 9, 1989 01:37:29,410 --> 01:37:32,060 I'm indeed passing in name as the argument. 1990 01:37:32,060 --> 01:37:32,560 All right. 1991 01:37:32,560 --> 01:37:35,450 So this is where, again, copy-paste can sometimes get you into trouble.
1992 01:37:35,450 --> 01:37:36,760 Let's try to make length again. 1993 01:37:36,760 --> 01:37:42,520 Now it works. ./length, D-A-V-I-D, and now we have a function that seems to be 1994 01:37:42,520 --> 01:37:43,090 working. 1995 01:37:43,090 --> 01:37:45,490 But this is such like commodity functionality. 1996 01:37:45,490 --> 01:37:47,770 Like my God, like surely someone before us 1997 01:37:47,770 --> 01:37:51,070 has written a function to get the length of a string before, 1998 01:37:51,070 --> 01:37:53,080 and indeed, other people have. 1999 01:37:53,080 --> 01:37:56,560 So it turns out that in C, just as you have the stdio library, 2000 01:37:56,560 --> 01:38:00,580 you also have a string library whose header file is called, appropriately, 2001 01:38:00,580 --> 01:38:01,600 string.h. 2002 01:38:01,600 --> 01:38:05,200 In fact CS50 has documentation, therefore, in its own manual pages, 2003 01:38:05,200 --> 01:38:08,020 so to speak, along with some sample usage thereof. 2004 01:38:08,020 --> 01:38:10,580 But it turns out, in the string library, there 2005 01:38:10,580 --> 01:38:13,850 is a very popular function analogous to the Python one 2006 01:38:13,850 --> 01:38:16,370 that you asked about earlier called strlen 2007 01:38:16,370 --> 01:38:19,250 where strlen, one word, no underscores, just 2008 01:38:19,250 --> 01:38:20,875 figures out the length of a string. 2009 01:38:20,875 --> 01:38:23,000 And honestly, I've never looked at its source code, 2010 01:38:23,000 --> 01:38:26,030 but it probably uses a while loop, maybe it uses a for loop, 2011 01:38:26,030 --> 01:38:30,320 but it certainly uses the same idea of just iterating-- that is, 2012 01:38:30,320 --> 01:38:33,380 walking from left to right over a variable 2013 01:38:33,380 --> 01:38:36,860 in order to figure out what the length of a given string is. 2014 01:38:36,860 --> 01:38:38,040 So how do we use this? 
2015 01:38:38,040 --> 01:38:42,410 Well if I go back to VS Code here, I can throw away 2016 01:38:42,410 --> 01:38:44,810 the entirety of my string length function, 2017 01:38:44,810 --> 01:38:47,870 I can throw away the prototype, therefore, 2018 01:38:47,870 --> 01:38:52,640 and I can include a third header file, string.h, inside 2019 01:38:52,640 --> 01:38:55,460 of which I claim now is this function called strlen 2020 01:38:55,460 --> 01:38:58,370 that I can just now use out of the box for free 2021 01:38:58,370 --> 01:39:00,560 because someone else wrote this function for me. 2022 01:39:00,560 --> 01:39:03,870 And string.h will teach the compiler that it exists. 2023 01:39:03,870 --> 01:39:10,700 So if I now do make length and ./length, now I have a similarly working program 2024 01:39:10,700 --> 01:39:14,720 that doesn't bother having me write unnecessary code. 2025 01:39:14,720 --> 01:39:16,880 So this is another example of a library. 2026 01:39:16,880 --> 01:39:22,060 The string library is just going to make our lives easier by not having to-- 2027 01:39:22,060 --> 01:39:25,082 for us not having to reinvent some wheel. 2028 01:39:25,082 --> 01:39:27,290 All right, well where else does this get interesting? 2029 01:39:27,290 --> 01:39:29,730 How about something like this? 2030 01:39:29,730 --> 01:39:31,970 Let me go back into VS Code here. 2031 01:39:31,970 --> 01:39:35,138 Let's create a program called string.c-- 2032 01:39:35,138 --> 01:39:38,180 we'll play around with our own strings-- that's going to start similarly. 2033 01:39:38,180 --> 01:39:44,030 So let's include cs50.h, let's include stdio.h, 2034 01:39:44,030 --> 01:39:48,230 let's include string.h so we can use that same strlen function. 2035 01:39:48,230 --> 01:39:50,030 int main void. 2036 01:39:50,030 --> 01:39:51,810 And inside of this, let's do this. 2037 01:39:51,810 --> 01:39:57,410 Let's get a string s and prompt the user for any old string as input. 
2038 01:39:57,410 --> 01:39:57,910 All right. 2039 01:39:57,910 --> 01:40:04,140 And then let's go ahead and maybe print out, quote-unquote, "Output." 2040 01:40:04,140 --> 01:40:07,238 And I'm just going to line up my spaces just right because these words are 2041 01:40:07,238 --> 01:40:09,780 slightly different lengths, but we'll see why I'm doing this. 2042 01:40:09,780 --> 01:40:11,860 It's just for aesthetics' sake in a moment. 2043 01:40:11,860 --> 01:40:13,380 And let's go ahead now and do this. 2044 01:40:13,380 --> 01:40:17,348 If I want to print out every character in a string, how can I now do this? 2045 01:40:17,348 --> 01:40:19,140 Well, this is actually a pretty common task 2046 01:40:19,140 --> 01:40:23,580 even though this version, thereof, will seem pointless. for int i gets 0, 2047 01:40:23,580 --> 01:40:26,460 i is less than the length of s. 2048 01:40:26,460 --> 01:40:31,800 i++ is just the conventional way to start a loop that iterates from left 2049 01:40:31,800 --> 01:40:34,260 to right over a string of that length. 2050 01:40:34,260 --> 01:40:38,190 And then let's go ahead and print out each character, %c, 2051 01:40:38,190 --> 01:40:43,800 printing out the string at location i using our fancy new array syntax. 2052 01:40:43,800 --> 01:40:45,780 And at the very end of this program, let's just 2053 01:40:45,780 --> 01:40:48,870 print out a new line character just to move the cursor to the bottom 2054 01:40:48,870 --> 01:40:50,200 like we've done in the past. 2055 01:40:50,200 --> 01:40:54,030 So this is kind of a stupid program like I am reinventing the wheel that is 2056 01:40:54,030 --> 01:40:56,130 the %s format code. 2057 01:40:56,130 --> 01:40:58,600 I already know that printf can print out a whole string. 2058 01:40:58,600 --> 01:40:59,650 Suppose it didn't. 
2059 01:40:59,650 --> 01:41:03,100 Suppose I forgot about %s and I only knew about %c, 2060 01:41:03,100 --> 01:41:09,100 these lines of code here collectively will print out the entirety of a string 2061 01:41:09,100 --> 01:41:12,050 character by character based on its length. 2062 01:41:12,050 --> 01:41:17,770 So if I compile this program, make string ./string and type in my name-- 2063 01:41:17,770 --> 01:41:20,870 for instance, David, the output is D-A-V-I-D, 2064 01:41:20,870 --> 01:41:22,870 and here's why I hit the spacebar an extra time, 2065 01:41:22,870 --> 01:41:26,230 because I wanted input and output to line up nicely so we could see that 2066 01:41:26,230 --> 01:41:27,680 they're, in fact, the same length. 2067 01:41:27,680 --> 01:41:28,930 So let me just stipulate. 2068 01:41:28,930 --> 01:41:35,390 This code is correct, but there is an inefficiency with this line of code. 2069 01:41:35,390 --> 01:41:38,020 Let's talk about design instinctively. 2070 01:41:38,020 --> 01:41:42,550 What is maybe bad about this line of code 9-- 2071 01:41:42,550 --> 01:41:44,650 line 9 that I've highlighted? 2072 01:41:44,650 --> 01:41:47,020 This one is subtle. 2073 01:41:47,020 --> 01:41:47,950 Let's go over here. 2074 01:41:47,950 --> 01:41:51,900 AUDIENCE: [INAUDIBLE] 2075 01:41:51,900 --> 01:41:54,120 DAVID MALAN: Yeah. 2076 01:41:54,120 --> 01:41:58,930 I'm calling strlen inside of the loop again and again and again. 2077 01:41:58,930 --> 01:41:59,430 Why? 2078 01:41:59,430 --> 01:42:00,847 Well, recall how for loops worked. 2079 01:42:00,847 --> 01:42:03,870 When we walked through it last week, that middle part of for loop 2080 01:42:03,870 --> 01:42:07,230 in between the semicolons keeps getting checked, keeps getting checked, 2081 01:42:07,230 --> 01:42:08,320 keeps getting checked. 
2082 01:42:08,320 --> 01:42:12,030 And so if you put a function call there, which is totally fine syntactically, 2083 01:42:12,030 --> 01:42:14,970 you're asking the same damn question again and again and again. 2084 01:42:14,970 --> 01:42:17,790 And the length of David, D-A-V-I-D, is never changing. 2085 01:42:17,790 --> 01:42:21,330 So strlen, implemented decades ago by some other human, 2086 01:42:21,330 --> 01:42:23,700 has some kind of loop in it, and you're literally 2087 01:42:23,700 --> 01:42:26,580 making that code run again and again and again just 2088 01:42:26,580 --> 01:42:29,123 to get the same answer 5 again and again. 2089 01:42:29,123 --> 01:42:30,540 So I think your instinct is right. 2090 01:42:30,540 --> 01:42:33,928 I could come up with another variable outside of the loop. 2091 01:42:33,928 --> 01:42:35,220 I could do something like this. 2092 01:42:35,220 --> 01:42:40,830 int length equals strlen of s, and then I could just plug that in. 2093 01:42:40,830 --> 01:42:42,658 But there's a slightly more elegant way. 2094 01:42:42,658 --> 01:42:44,700 If you like doing things with slightly less code, 2095 01:42:44,700 --> 01:42:46,440 this is correct as I've now written it. 2096 01:42:46,440 --> 01:42:50,400 It's less efficient-- it's more efficient because I'm only 2097 01:42:50,400 --> 01:42:53,440 calling strlen once now on this new line 9, 2098 01:42:53,440 --> 01:42:56,560 but a more common way to write this would typically 2099 01:42:56,560 --> 01:42:58,360 be to do something like this. 2100 01:42:58,360 --> 01:43:02,860 After initializing i, you can also initialize something else like length. 2101 01:43:02,860 --> 01:43:07,580 And you can set length equal to strlen of s, then your semicolon, 2102 01:43:07,580 --> 01:43:10,755 and now you can say while i is less than that length. 2103 01:43:10,755 --> 01:43:12,130 Or I can tighten this up further. 
2104 01:43:12,130 --> 01:43:15,580 If it's just a number and it's a super short loop, might as well just call it 2105 01:43:15,580 --> 01:43:16,120 n. 2106 01:43:16,120 --> 01:43:20,920 So this now would be a canonical way of implementing the exact same idea, 2107 01:43:20,920 --> 01:43:23,770 but without the inefficiency because now you're 2108 01:43:23,770 --> 01:43:28,150 calling strlen in the initialization part of for loop, 2109 01:43:28,150 --> 01:43:32,650 not inside of the Boolean expression that gets checked and executed 2110 01:43:32,650 --> 01:43:34,000 again and again. 2111 01:43:34,000 --> 01:43:34,510 Yeah? 2112 01:43:34,510 --> 01:43:38,965 AUDIENCE: [INAUDIBLE] 2113 01:43:38,965 --> 01:43:39,840 DAVID MALAN: Correct. 2114 01:43:39,840 --> 01:43:43,200 Well, I'm declaring i as an int, but by way of the comma, 2115 01:43:43,200 --> 01:43:45,570 I am also declaring n as an int. 2116 01:43:45,570 --> 01:43:49,110 So they've got to be the same type for this trick to work. 2117 01:43:49,110 --> 01:43:50,370 Good observation. 2118 01:43:50,370 --> 01:43:54,470 Other questions on this one here? 2119 01:43:54,470 --> 01:43:54,970 No? 2120 01:43:54,970 --> 01:43:55,540 All right. 2121 01:43:55,540 --> 01:43:58,900 Well, let's play around further here. 2122 01:43:58,900 --> 01:44:01,862 Let me propose that there's other libraries and header files 2123 01:44:01,862 --> 01:44:03,320 as well that you might find useful. 2124 01:44:03,320 --> 01:44:05,800 There's also something called ctype, which relates to types 2125 01:44:05,800 --> 01:44:08,500 and c's that's got a bunch of useful functions 2126 01:44:08,500 --> 01:44:12,200 that we can actually see if we visit the documentation here. 
2127 01:44:12,200 --> 01:44:14,200 But before we get there, let me actually whip up 2128 01:44:14,200 --> 01:44:17,680 a program that maybe does something a little bit fun, albeit low level, 2129 01:44:17,680 --> 01:44:21,590 like forcing some string to uppercase if the human types it in lowercase. 2130 01:44:21,590 --> 01:44:25,120 So let me go ahead and write a program called uppercase.c. 2131 01:44:25,120 --> 01:44:27,940 Let me go ahead and give myself the same header files. 2132 01:44:27,940 --> 01:44:31,840 Include cs50.h, include stdio.h. 2133 01:44:31,840 --> 01:44:34,960 And for now, let's include string.h for the length. 2134 01:44:34,960 --> 01:44:38,570 And let's go ahead and have int main void as before. 2135 01:44:38,570 --> 01:44:40,840 And inside of main, let's give myself a string 2136 01:44:40,840 --> 01:44:46,780 s equaling get_string "Before," just so I know what the string is initially. 2137 01:44:46,780 --> 01:44:50,560 Now I'm going to print out proactively "After" with two spaces 2138 01:44:50,560 --> 01:44:53,740 just so that things line up aesthetically on the screen 2139 01:44:53,740 --> 01:44:55,580 because "After" is one character shorter. 2140 01:44:55,580 --> 01:44:57,920 And now I'm going to do the same technique as before. 2141 01:44:57,920 --> 01:45:07,340 for int i equals 0, n equals the string length of s, i is less than n, i++. 2142 01:45:07,340 --> 01:45:10,940 And then inside of this loop, what do I want to do logically? 2143 01:45:10,940 --> 01:45:15,710 I want to force these characters to uppercase if they are, in fact, 2144 01:45:15,710 --> 01:45:16,670 lowercase. 2145 01:45:16,670 --> 01:45:18,083 And so how might I do this? 2146 01:45:18,083 --> 01:45:20,000 Well, there's a bunch of ways to express this, 2147 01:45:20,000 --> 01:45:22,640 but I'm going to do it maybe the most straightforward way 2148 01:45:22,640 --> 01:45:24,260 even if you've not seen this before. 
2149 01:45:24,260 --> 01:45:28,760 If the current letter in the string at location i, 2150 01:45:28,760 --> 01:45:31,970 because I'm in a loop starting from 0 all the way up to, but not 2151 01:45:31,970 --> 01:45:34,400 through the string length, is greater than 2152 01:45:34,400 --> 01:45:42,110 or equal to a lowercase a, in single quotes, and that letter is less than 2153 01:45:42,110 --> 01:45:43,970 or equal to a lowercase z. 2154 01:45:43,970 --> 01:45:45,440 What does this mean in English? 2155 01:45:45,440 --> 01:45:48,740 Well, this essentially means if lowercase-- 2156 01:45:48,740 --> 01:45:52,280 logically, if it's greater than or equal to little a and less than 2157 01:45:52,280 --> 01:45:55,760 or equal to little z, it's somewhere between a and z in lowercase. 2158 01:45:55,760 --> 01:45:57,060 What do I want to do? 2159 01:45:57,060 --> 01:45:58,670 Well, I want to force it to uppercase. 2160 01:45:58,670 --> 01:46:03,260 So I want to print out a character without a new line yet 2161 01:46:03,260 --> 01:46:07,880 that prints out the current character, but force it to uppercase. 2162 01:46:07,880 --> 01:46:09,120 Well, how can I do this? 2163 01:46:09,120 --> 01:46:12,560 Well, this is where this gets into some low-level hacking, 2164 01:46:12,560 --> 01:46:14,480 but notice the same ASCII chart. 2165 01:46:14,480 --> 01:46:17,640 Here's our uppercase letters from last time. 2166 01:46:17,640 --> 01:46:20,900 Here's our lowercase characters, and let me highlight those. 2167 01:46:20,900 --> 01:46:25,370 Does anyone notice a relationship between capital A and lowercase a 2168 01:46:25,370 --> 01:46:29,540 that happens to be the same for capital B and lowercase b? 2169 01:46:29,540 --> 01:46:33,000 AUDIENCE: Capital A [INAUDIBLE]. 2170 01:46:33,000 --> 01:46:33,750 DAVID MALAN: Yeah. 2171 01:46:33,750 --> 01:46:35,170 Like this pattern is true.
2172 01:46:35,170 --> 01:46:40,140 So 97 minus 65 is 32, and that's true for every lowercase and uppercase 2173 01:46:40,140 --> 01:46:41,170 letter respectively. 2174 01:46:41,170 --> 01:46:42,420 So I can leverage that. 2175 01:46:42,420 --> 01:46:43,950 And this is not a CS50 thing. 2176 01:46:43,950 --> 01:46:44,850 Like this is ASCII. 2177 01:46:44,850 --> 01:46:45,990 This is, in turn, Unicode. 2178 01:46:45,990 --> 01:46:47,533 This is how modern computers work. 2179 01:46:47,533 --> 01:46:49,950 So if I go back to VS Code here, you know what I could do. 2180 01:46:49,950 --> 01:46:52,350 Let's just literally subtract 32. 2181 01:46:52,350 --> 01:46:55,440 But because I'm displaying this as a char, not as an int, 2182 01:46:55,440 --> 01:47:01,080 I'm going to see the lowercase letter seemingly become an uppercase instead. 2183 01:47:01,080 --> 01:47:05,310 Else, if it's not lowercase-- maybe it's already uppercase, 2184 01:47:05,310 --> 01:47:09,420 maybe it is punctuation, let's just go ahead and print out with %c 2185 01:47:09,420 --> 01:47:11,462 the original character unaltered. 2186 01:47:11,462 --> 01:47:13,170 And then at the very end of this program, 2187 01:47:13,170 --> 01:47:17,670 let's print a new line just to move the cursor to the next line. 2188 01:47:17,670 --> 01:47:19,950 All right, so let's do make uppercase. 2189 01:47:19,950 --> 01:47:22,500 And let me type ./uppercase. 2190 01:47:22,500 --> 01:47:26,100 And I'll type in D-A-V-I-D, all lowercase, and now, 2191 01:47:26,100 --> 01:47:27,750 you'll see it's in all caps. 2192 01:47:27,750 --> 01:47:31,920 If, though, I type in maybe my last name but capitalized M, that's OK, 2193 01:47:31,920 --> 01:47:34,930 the rest of it will still be capitalized for me. 2194 01:47:34,930 --> 01:47:36,710 Now I don't love this technique. 2195 01:47:36,710 --> 01:47:40,090 It's a little bit fragile because I had to do some math. 
2196 01:47:40,090 --> 01:47:43,220 I had to check my reference sheet and then incorporate it into my program. 2197 01:47:43,220 --> 01:47:45,940 Even though it will be correct, I could be a little more clever. 2198 01:47:45,940 --> 01:47:47,607 I could actually do something like this. 2199 01:47:47,607 --> 01:47:49,720 Well, whatever the value of lowercase is-- 2200 01:47:49,720 --> 01:47:53,650 lowercase a is minus whatever the value of capital A is, 2201 01:47:53,650 --> 01:47:56,378 and I could actually do it arithmetically even though that, too, 2202 01:47:56,378 --> 01:47:59,170 is somewhat inefficient in that it's asking the same question again 2203 01:47:59,170 --> 01:48:02,320 and again, but the compiler is probably smart enough to optimize that. 2204 01:48:02,320 --> 01:48:05,830 And frankly, for those more comfortable, a good compiler 2205 01:48:05,830 --> 01:48:07,930 will also notice, no, no, no, no, you don't 2206 01:48:07,930 --> 01:48:09,910 want to call strlen again and again. 2207 01:48:09,910 --> 01:48:13,330 The compiler can do some of these optimizations for you, 2208 01:48:13,330 --> 01:48:15,610 but it's still good practice to get into yourself. 2209 01:48:15,610 --> 01:48:17,080 But there's probably a better way. 2210 01:48:17,080 --> 01:48:19,630 Instead of rolling this solution ourselves 2211 01:48:19,630 --> 01:48:22,810 and subtracting 32 or doing any arithmetic, 2212 01:48:22,810 --> 01:48:24,730 let's use that ctype library. 2213 01:48:24,730 --> 01:48:27,280 Let me go back up to my header files. 2214 01:48:27,280 --> 01:48:29,890 Let's additionally include ctype.h. 2215 01:48:29,890 --> 01:48:33,100 Let's pretend like I read the documentation in advance, which I did, 2216 01:48:33,100 --> 01:48:33,940 in fact. 
2217 01:48:33,940 --> 01:48:37,570 And let's instead of doing any math here, 2218 01:48:37,570 --> 01:48:41,590 let's use a function that exists in that library called toupper 2219 01:48:41,590 --> 01:48:47,740 and pass to it whatever the current character is in s at location i. 2220 01:48:47,740 --> 01:48:50,860 Otherwise, I still print out the unchanged character. 2221 01:48:50,860 --> 01:48:54,880 And let me go ahead and do make uppercase ./uppercase. 2222 01:48:54,880 --> 01:49:00,190 And now without any math, no subtracting 32, that, too, also works. 2223 01:49:00,190 --> 01:49:01,240 But it gets better. 2224 01:49:01,240 --> 01:49:03,430 If you read the documentation for toupper, 2225 01:49:03,430 --> 01:49:07,570 it turns out its documentation tells you, if C is already uppercase, 2226 01:49:07,570 --> 01:49:09,950 it just passes it through for you. 2227 01:49:09,950 --> 01:49:12,550 So you don't even need to ask this conditional question. 2228 01:49:12,550 --> 01:49:17,710 I can actually cut this to my clipboard, get rid of all of this, 2229 01:49:17,710 --> 01:49:21,430 and just replace that one line only and just 2230 01:49:21,430 --> 01:49:25,600 let toupper handle the situation for me because again, its documentation 2231 01:49:25,600 --> 01:49:28,120 has assured me that if it's already uppercase, 2232 01:49:28,120 --> 01:49:30,890 it's just going to return the original value. 2233 01:49:30,890 --> 01:49:33,670 So if I make uppercase, this time, ./uppercase, 2234 01:49:33,670 --> 01:49:36,640 now it works and now things are getting kind of fun. 2235 01:49:36,640 --> 01:49:38,740 I mean, these are mundane tasks, admittedly, 2236 01:49:38,740 --> 01:49:41,410 but at least I'm standing on the shoulders of smart people 2237 01:49:41,410 --> 01:49:45,040 who came before me who implemented the string library, the ctype library-- 2238 01:49:45,040 --> 01:49:51,760 heck, even the CS50 Library so I don't need to reinvent any of those wheels. 
2239 01:49:51,760 --> 01:49:57,750 Questions on any of these library techniques? 2240 01:49:57,750 --> 01:50:00,240 It's all still arrays, it's all still strings and chars, 2241 01:50:00,240 --> 01:50:05,110 but now we're leveraging libraries to solve some of our problems for us. 2242 01:50:05,110 --> 01:50:05,610 All right. 2243 01:50:05,610 --> 01:50:07,890 So let's come full circle to where we began, 2244 01:50:07,890 --> 01:50:10,950 where I mentioned that some programs include 2245 01:50:10,950 --> 01:50:12,630 support for command line arguments. 2246 01:50:12,630 --> 01:50:18,210 Like Clang takes command line arguments, words after the word clang. 2247 01:50:18,210 --> 01:50:21,270 cd, which you've used in Linux, takes command line arguments. 2248 01:50:21,270 --> 01:50:24,510 If you type cd, space, pset1 or cd, space, 2249 01:50:24,510 --> 01:50:28,200 mario in order to change directories into another folder. 2250 01:50:28,200 --> 01:50:31,140 If you do rm like I did earlier, you can remove a file 2251 01:50:31,140 --> 01:50:33,510 by using a command line argument, a second word that 2252 01:50:33,510 --> 01:50:35,730 tells the computer what to remove. 2253 01:50:35,730 --> 01:50:38,520 Well, it turns out that you, too, can write 2254 01:50:38,520 --> 01:50:43,230 code that takes words at the command prompt and uses them as input. 2255 01:50:43,230 --> 01:50:47,040 Up until now, you and I have only gotten user input via get_string, get_int, 2256 01:50:47,040 --> 01:50:48,810 get_float, and functions like that. 2257 01:50:48,810 --> 01:50:52,230 You, too, can write code that takes command line arguments which, 2258 01:50:52,230 --> 01:50:54,240 frankly, just save the human time. 2259 01:50:54,240 --> 01:50:57,790 They can type their entire thought at the command line, hit Enter, and boom, 2260 01:50:57,790 --> 01:51:01,240 the program can complete without prompting them and re-prompting them 2261 01:51:01,240 --> 01:51:02,020 again.
2262 01:51:02,020 --> 01:51:05,680 So here's where we can now start to take off some more training wheels. 2263 01:51:05,680 --> 01:51:10,000 Up until now, we've just put void inside of the parentheses here any time 2264 01:51:10,000 --> 01:51:11,620 we implement main. 2265 01:51:11,620 --> 01:51:15,130 It turns out that you can put something else in parentheses 2266 01:51:15,130 --> 01:51:18,820 when using C. It's a mouthful, but you can replace void 2267 01:51:18,820 --> 01:51:23,800 with this bigger expression. 2268 01:51:23,800 --> 01:51:25,240 But it's two things. 2269 01:51:25,240 --> 01:51:28,960 int, called argc by convention, and a string, 2270 01:51:28,960 --> 01:51:32,920 but not a string, actually an array of strings called argv. 2271 01:51:32,920 --> 01:51:35,320 And these terms are a little arcane, but argc means 2272 01:51:35,320 --> 01:51:38,770 argument count-- how many words did the human type at the prompt? 2273 01:51:38,770 --> 01:51:41,410 Argv stands for argument vector, which is generally 2274 01:51:41,410 --> 01:51:42,762 another term for an array-- 2275 01:51:42,762 --> 01:51:44,470 you've heard it perhaps from mathematics. 2276 01:51:44,470 --> 01:51:48,440 It's like a list of values, or in this case, a list of command line arguments. 2277 01:51:48,440 --> 01:51:49,790 So C is special. 2278 01:51:49,790 --> 01:51:54,370 If you declare main as not taking void inside of parentheses, but rather, 2279 01:51:54,370 --> 01:51:58,270 an int and an array of strings, C will figure out 2280 01:51:58,270 --> 01:52:00,880 whatever the human typed at the prompt and hand it to you 2281 01:52:00,880 --> 01:52:03,620 as an array and the length thereof. 2282 01:52:03,620 --> 01:52:05,830 So if I want to leverage this, I can start 2283 01:52:05,830 --> 01:52:10,940 to implement some programs of my own that actually incorporate command line 2284 01:52:10,940 --> 01:52:11,440 arguments. 
2285 01:52:11,440 --> 01:52:14,980 For instance, let me go back in a moment here to VS Code. 2286 01:52:14,980 --> 01:52:19,090 Let me create a program, for instance, called greet.c 2287 01:52:19,090 --> 01:52:21,590 that's just going to greet the user in a few different ways. 2288 01:52:21,590 --> 01:52:24,580 So let me first do it the old way. Include cs50.h. 2289 01:52:24,580 --> 01:52:27,430 Let me include stdio.h. 2290 01:52:27,430 --> 01:52:29,740 Let me do int main void still. 2291 01:52:29,740 --> 01:52:30,950 So the old way. 2292 01:52:30,950 --> 01:52:34,420 And if I want to greet myself or Carter or Yulie or anyone else, 2293 01:52:34,420 --> 01:52:39,850 I could do, old fashioned now, get the answer from the user, get_string. 2294 01:52:39,850 --> 01:52:42,670 Let's prompt for "What's your name?" question mark, 2295 01:52:42,670 --> 01:52:44,200 just like we did in Scratch. 2296 01:52:44,200 --> 01:52:49,940 And then do printf, "Hello," comma, %s backslash n, answer. 2297 01:52:49,940 --> 01:52:53,320 So we've done this many times now this week and last. 2298 01:52:53,320 --> 01:52:56,290 This is the old school way now of getting command line-- 2299 01:52:56,290 --> 01:52:59,360 of getting user input by prompting them for it. 2300 01:52:59,360 --> 01:53:04,570 So if I do make greet ./greet, there's no command line arguments at the prompt, 2301 01:53:04,570 --> 01:53:06,610 I'm literally just running the program's name. 2302 01:53:06,610 --> 01:53:10,690 If I hit Enter, though, now get_string kicks in, asks me for my name, 2303 01:53:10,690 --> 01:53:12,370 and the program then greets me. 2304 01:53:12,370 --> 01:53:13,510 But I can do-- 2305 01:53:13,510 --> 01:53:17,530 otherwise, I could do something like this instead.
2306 01:53:17,530 --> 01:53:20,290 First, answer's a little generic, so let's first change 2307 01:53:20,290 --> 01:53:23,980 this back to name and back to name, but that's a minor improvement there 2308 01:53:23,980 --> 01:53:25,480 just stylistically. 2309 01:53:25,480 --> 01:53:28,760 Let's, though, introduce now a command line argument 2310 01:53:28,760 --> 01:53:31,750 so that I can just greet myself by running the program, hitting Enter, 2311 01:53:31,750 --> 01:53:33,820 and being done, no more get_string. 2312 01:53:33,820 --> 01:53:39,520 So I'm going to go ahead and change void to int argc, string 2313 01:53:39,520 --> 01:53:42,070 argv with square brackets. 2314 01:53:42,070 --> 01:53:45,520 string means-- the square brackets means it's an array; 2315 01:53:45,520 --> 01:53:49,010 string means it's an array of strings; and argc, again, 2316 01:53:49,010 --> 01:53:51,898 is just an integer of the number of words typed. 2317 01:53:51,898 --> 01:53:54,190 Now I'm going to somewhat dangerously going to do this. 2318 01:53:54,190 --> 01:53:56,770 I'm going to get rid of my use of get_string altogether, 2319 01:53:56,770 --> 01:54:01,060 and I'm going to change this line to be not name, which no longer exists, 2320 01:54:01,060 --> 01:54:03,820 but I'm going to go into this array called argv 2321 01:54:03,820 --> 01:54:08,050 and I'm going to go into location 1. 2322 01:54:08,050 --> 01:54:10,180 So I'm doing this on faith. 2323 01:54:10,180 --> 01:54:15,070 I haven't explained what I'm doing yet, but I'm going to do make greet ./greet, 2324 01:54:15,070 --> 01:54:19,310 and now I'm going to type my name at the command line just like with rm, 2325 01:54:19,310 --> 01:54:20,740 with clang, with cd. 2326 01:54:20,740 --> 01:54:23,440 With any of the commands you've written with multiple words, 2327 01:54:23,440 --> 01:54:25,090 I'm going to greet literally David. 
2328 01:54:25,090 --> 01:54:29,110 So I hit Enter, and voila, I've somehow gotten access 2329 01:54:29,110 --> 01:54:34,930 to what I typed at the prompt by accessing this special parameter called 2330 01:54:34,930 --> 01:54:35,590 argv. 2331 01:54:35,590 --> 01:54:38,507 Technically you could call it anything you want, but the convention is 2332 01:54:38,507 --> 01:54:41,020 argv and argc from right to left here. 2333 01:54:41,020 --> 01:54:42,280 Just a guess, then. 2334 01:54:42,280 --> 01:54:47,230 What if I change this to print out bracket 0 and recompile the code? 2335 01:54:47,230 --> 01:54:49,570 And I run ./greet David? 2336 01:54:49,570 --> 01:54:51,790 What might it say instinctively? 2337 01:54:51,790 --> 01:54:54,490 2338 01:54:54,490 --> 01:54:56,710 Any hunches? 2339 01:54:56,710 --> 01:54:57,250 Yeah. 2340 01:54:57,250 --> 01:54:59,860 So it's going to say hello, ./greet. 2341 01:54:59,860 --> 01:55:01,880 So it turns out, you get one for free. 2342 01:55:01,880 --> 01:55:04,450 Whatever the name of your program is always 2343 01:55:04,450 --> 01:55:07,420 accessible in argv at location 0. 2344 01:55:07,420 --> 01:55:08,380 That's just because. 2345 01:55:08,380 --> 01:55:09,340 It's a handy feature. 2346 01:55:09,340 --> 01:55:12,548 In case there's an error or you need to tell the user how to use the program, 2347 01:55:12,548 --> 01:55:15,970 you know what the command is that they ran, but at location 1, 2348 01:55:15,970 --> 01:55:18,610 maybe 2, maybe 3 are the additional words 2349 01:55:18,610 --> 01:55:20,590 that the human might have typed in. 2350 01:55:20,590 --> 01:55:23,140 Well, let's do something a little smarter than this. 2351 01:55:23,140 --> 01:55:25,420 Let me go back to version 1. 2352 01:55:25,420 --> 01:55:27,610 Let me recompile it, make greet. 2353 01:55:27,610 --> 01:55:31,930 Let me rerun ./greet David, and this seems to work fine. 2354 01:55:31,930 --> 01:55:35,080 What if I get a little curious and print out location 2? 
2355 01:55:35,080 --> 01:55:41,530 Let me recompile the code, make greet ./greet David, Enter, OK, there's null. 2356 01:55:41,530 --> 01:55:45,580 And I mentioned we'd see N-U-L-L, and here's one incarnation thereof, 2357 01:55:45,580 --> 01:55:47,270 but this is clearly wrong. 2358 01:55:47,270 --> 01:55:49,990 So I probably don't want to even let the user do this because I 2359 01:55:49,990 --> 01:55:51,490 don't want them to see bogus output. 2360 01:55:51,490 --> 01:55:53,680 Like this is arguably a bug in the code 2361 01:55:53,680 --> 01:55:58,420 that it even bothered to show this by default. So what could I do instead? 2362 01:55:58,420 --> 01:55:59,420 Well, what if I do this? 2363 01:55:59,420 --> 01:56:07,490 If argc equals equals 2, then go ahead and comfortably 2364 01:56:07,490 --> 01:56:11,120 say printf "hello," argv, bracket, 1. 2365 01:56:11,120 --> 01:56:15,620 Else, if the human did not give exactly two arguments at the prompt, 2366 01:56:15,620 --> 01:56:18,590 let's just print out some default value like "hello, world" 2367 01:56:18,590 --> 01:56:20,040 like from last week. 2368 01:56:20,040 --> 01:56:23,540 In other words now I'm doing this error checking with a conditional, 2369 01:56:23,540 --> 01:56:25,790 making sure with this Boolean expression only 2370 01:56:25,790 --> 01:56:29,990 if argc equals equals 2, and therefore has two words in argv 2371 01:56:29,990 --> 01:56:31,410 do you want to proceed. 2372 01:56:31,410 --> 01:56:35,700 And so now if I do make greet again, ./greet David, this now works. 2373 01:56:35,700 --> 01:56:40,460 But if I don't cooperate and I just run greet, what should it say? 2374 01:56:40,460 --> 01:56:41,690 Just hello, world. 2375 01:56:41,690 --> 01:56:46,280 If I run David Malan as two words, what should it say? 2376 01:56:46,280 --> 01:56:49,880 hello, world, because that's not exactly equal to 2. 2377 01:56:49,880 --> 01:56:52,910 Again, the first word in argv is always the program's name.
2378 01:56:52,910 --> 01:56:56,480 The second word is whatever the human, then, has typed. 2379 01:56:56,480 --> 01:56:59,750 Now if we don't even know in advance how many words they're going to be, 2380 01:56:59,750 --> 01:57:01,190 we can combine today's ideas. 2381 01:57:01,190 --> 01:57:04,190 This is going to look a little weird, but it's the same thing as before. 2382 01:57:04,190 --> 01:57:09,920 for int i gets 0, i is less than-- 2383 01:57:09,920 --> 01:57:13,010 how about argc i++? 2384 01:57:13,010 --> 01:57:19,430 And then inside of this loop, I can print out %s, maybe backslash n, comma, 2385 01:57:19,430 --> 01:57:23,660 and then print out argv, bracket, i. 2386 01:57:23,660 --> 01:57:27,840 So I can have a loop that iterates argc number of times, 2387 01:57:27,840 --> 01:57:29,660 once for every word at the prompt. 2388 01:57:29,660 --> 01:57:34,700 I can print out argv, bracket, i, which is the i-th word in that array 2389 01:57:34,700 --> 01:57:35,730 from left to right. 2390 01:57:35,730 --> 01:57:40,700 And so if I now run make greet and I do ./greet alone, 2391 01:57:40,700 --> 01:57:42,080 I just see the program's name. 2392 01:57:42,080 --> 01:57:47,010 If I do ./greet David, I see, those two, one after the other. 2393 01:57:47,010 --> 01:57:50,350 If I do David Malan, I get those three words. 2394 01:57:50,350 --> 01:57:52,540 If I keep going, I'll get more and more words. 2395 01:57:52,540 --> 01:57:56,040 So using just the length of the array and the name of the array, 2396 01:57:56,040 --> 01:57:58,493 I can actually do quite a bit there. 
2397 01:57:58,493 --> 01:58:00,910 Now there's actually some fun things you can do with this, 2398 01:58:00,910 --> 01:58:02,340 and this is sort of beside the point, but there's 2399 01:58:02,340 --> 01:58:04,298 this thing in the world called ASCII art, which 2400 01:58:04,298 --> 01:58:07,290 is making pictures and beautiful things just using ASCII or maybe 2401 01:58:07,290 --> 01:58:09,990 nowadays Unicode characters, but without using emoji. 2402 01:58:09,990 --> 01:58:12,300 Like emoji kind of make this a little too easy. 2403 01:58:12,300 --> 01:58:15,480 But if all you have are traditional largely English letters 2404 01:58:15,480 --> 01:58:18,540 and punctuation, you can actually do some interesting things. 2405 01:58:18,540 --> 01:58:21,910 On Linux systems-- for instance, if I go back to VS Code here, 2406 01:58:21,910 --> 01:58:25,835 let me increase the size of my terminal window here. 2407 01:58:25,835 --> 01:58:27,960 And it turns out that we've pre-installed-- really, 2408 01:58:27,960 --> 01:58:32,010 for no compelling reason, but just for fun, a program called cowsay, 2409 01:58:32,010 --> 01:58:34,000 which has a cow say something. 2410 01:58:34,000 --> 01:58:37,920 So if I want to have a cow say "moo" in ASCII art, I can do this, 2411 01:58:37,920 --> 01:58:41,310 and you get an adorable cow saying something like "moo" on the screen. 2412 01:58:41,310 --> 01:58:43,680 But moo is a command line argument that is clearly 2413 01:58:43,680 --> 01:58:46,590 modifying the output of this program because I could also 2414 01:58:46,590 --> 01:58:49,350 change it to say hello, comma, world, and now the cow 2415 01:58:49,350 --> 01:58:50,980 is going to say that instead. 2416 01:58:50,980 --> 01:58:53,460 So it takes multiple command line arguments, if you will. 
2417 01:58:53,460 --> 01:58:58,350 But it also takes what are called flags or switches whereby any command line 2418 01:58:58,350 --> 01:59:01,740 argument that starts with a dash is usually like a special configuration 2419 01:59:01,740 --> 01:59:04,860 option that you would only know exists by reading the documentation 2420 01:59:04,860 --> 01:59:06,300 or seeing a demonstration. 2421 01:59:06,300 --> 01:59:12,780 And if I have my syntax right, if I do cowsay -f, and maybe I'll do-- 2422 01:59:12,780 --> 01:59:13,620 let's see. 2423 01:59:13,620 --> 01:59:18,660 Instead of this cow say, how about I'll do -f for file, 2424 01:59:18,660 --> 01:59:20,460 and I'm going to change it into duck mode. 2425 01:59:20,460 --> 01:59:23,730 And I'm going to have this version of the ASCII art say quack. 2426 01:59:23,730 --> 01:59:26,255 So it's a tiny little duck there, but it's saying quack. 2427 01:59:26,255 --> 01:59:28,380 And you can kind of waste a lot of time doing this. 2428 01:59:28,380 --> 01:59:33,690 I can do cowsay -f dragon and say something like, RAWR, 2429 01:59:33,690 --> 01:59:36,420 and this is just amazing. 2430 01:59:36,420 --> 01:59:38,440 Again, not really academically compelling, 2431 01:59:38,440 --> 01:59:41,880 but it does demonstrate, again, command line arguments, which are everywhere, 2432 01:59:41,880 --> 01:59:44,220 and you've indeed been using them already. 2433 01:59:44,220 --> 01:59:46,830 But there's one other feature we wanted to introduce you 2434 01:59:46,830 --> 01:59:50,610 to today, which will be a useful building block, which will also 2435 01:59:50,610 --> 01:59:54,090 reveal one other thing about the code that we've been writing. 2436 01:59:54,090 --> 01:59:58,110 It turns out that all of the programs we've been writing thus far, eventually 2437 01:59:58,110 --> 02:00:00,210 obviously exit because you see your prompt again 2438 02:00:00,210 --> 02:00:02,680 unless you have an infinite loop such that it never ends. 
2439 02:00:02,680 --> 02:00:03,870 But eventually they exit. 2440 02:00:03,870 --> 02:00:07,560 And secretly, every program we've written thus far actually 2441 02:00:07,560 --> 02:00:09,240 has what's called an exit status. 2442 02:00:09,240 --> 02:00:11,730 It's like a special return value from the program 2443 02:00:11,730 --> 02:00:14,310 itself that by default is always 0. 2444 02:00:14,310 --> 02:00:17,590 0 as a number in the world generally means everything's OK. 2445 02:00:17,590 --> 02:00:21,240 The flip side of that is because the world tends to use integers 2446 02:00:21,240 --> 02:00:23,460 and you've got four billion possibilities, 2447 02:00:23,460 --> 02:00:27,000 like every other number in the world when it comes to our program's exit 2448 02:00:27,000 --> 02:00:29,070 status is bad. 2449 02:00:29,070 --> 02:00:30,750 If it's 1, it's probably bad. 2450 02:00:30,750 --> 02:00:32,095 If it's negative 1, it's bad. 2451 02:00:32,095 --> 02:00:34,470 And in fact, you've probably seen this in the real world. 2452 02:00:34,470 --> 02:00:37,580 If you've ever had like a random error message on the screen-- 2453 02:00:37,580 --> 02:00:39,330 here's a screenshot of Zoom, for instance. 2454 02:00:39,330 --> 02:00:43,920 And that screenshot, somewhat confusingly or unknowingly, 2455 02:00:43,920 --> 02:00:47,730 has an error code like 1132, that probably 2456 02:00:47,730 --> 02:00:52,500 means that the Zoom software that some other humans wrote incorrectly somehow 2457 02:00:52,500 --> 02:00:58,410 had an error and it did not exit with status 0, it exited with status 1132. 2458 02:00:58,410 --> 02:01:00,480 And somewhere at Zoom, there's probably a file 2459 02:01:00,480 --> 02:01:04,283 or a book that tells the programmers what this error code actually means. 2460 02:01:04,283 --> 02:01:05,700 This is not useful for you and me. 
2461 02:01:05,700 --> 02:01:08,158 There's some programmer at Zoom who would probably be like, 2462 02:01:08,158 --> 02:01:10,950 oh, I know what I did or my colleague did wrong in this case. 2463 02:01:10,950 --> 02:01:13,950 You've seen this elsewhere even though this is not quite the same thing, 2464 02:01:13,950 --> 02:01:15,658 but we'll talk about this in a few weeks. 2465 02:01:15,658 --> 02:01:19,380 If you've ever seen 404, like numbers are everywhere, and on the web, 2466 02:01:19,380 --> 02:01:23,070 404 means like file not found. 2467 02:01:23,070 --> 02:01:26,830 It means you made a typo, the web server deleted a file, or something like that, 2468 02:01:26,830 --> 02:01:30,850 but this is just to say numbers are so often used to signify or represent 2469 02:01:30,850 --> 02:01:31,350 errors. 2470 02:01:31,350 --> 02:01:33,600 Even though that's not an exit status, per se, 2471 02:01:33,600 --> 02:01:36,750 that's an HTTP status code, which we'll soon see. 2472 02:01:36,750 --> 02:01:40,590 But you have access to exit statuses as it relates 2473 02:01:40,590 --> 02:01:42,630 to command line software already. 2474 02:01:42,630 --> 02:01:46,250 Up until now, this is how we've been writing main, now 2475 02:01:46,250 --> 02:01:48,740 with command line arguments, but we've also 2476 02:01:48,740 --> 02:01:51,770 been writing main with an int return value. 2477 02:01:51,770 --> 02:01:54,620 And you've never used this-- we didn't talk about this last week. 2478 02:01:54,620 --> 02:01:57,740 I just ask that you trust me and just keep copying and pasting this. 2479 02:01:57,740 --> 02:02:00,590 But that int means that even your programs 2480 02:02:00,590 --> 02:02:05,660 can return values which can be useful even if you don't use command line 2481 02:02:05,660 --> 02:02:08,870 arguments and we just go back to the original version like void. 
2482 02:02:08,870 --> 02:02:15,320 So for instance, if I go ahead and open up, for instance, VS Code again, 2483 02:02:15,320 --> 02:02:16,670 I'll get rid of the dragon. 2484 02:02:16,670 --> 02:02:19,460 And let's do one other program here called status just 2485 02:02:19,460 --> 02:02:23,450 to play around with the idea of these so-called exit statuses. 2486 02:02:23,450 --> 02:02:28,370 Let me just demonstrate the idea with an include cs50.h, include 2487 02:02:28,370 --> 02:02:36,440 stdio.h, int main, and here I'll do int argc, string argv. 2488 02:02:36,440 --> 02:02:39,080 And then inside of main, let's do a similar program 2489 02:02:39,080 --> 02:02:40,430 to before like the hello, world. 2490 02:02:40,430 --> 02:02:44,540 So printf "hello," comma, %s backslash n. 2491 02:02:44,540 --> 02:02:47,010 Then let's print out argv 1. 2492 02:02:47,010 --> 02:02:52,300 But I only want to execute that line if the human gave me a command line 2493 02:02:52,300 --> 02:02:52,800 argument. 2494 02:02:52,800 --> 02:02:55,550 Otherwise I don't want to even say some default like hello, world. 2495 02:02:55,550 --> 02:03:00,250 I just want to abort early and just exit the program, no output whatsoever. 2496 02:03:00,250 --> 02:03:01,350 So I could do this. 2497 02:03:01,350 --> 02:03:05,523 If argc does not equal 2-- 2498 02:03:05,523 --> 02:03:08,190 and it's a single equals, but it's a bang, an exclamation point, 2499 02:03:08,190 --> 02:03:09,370 means not equal. 2500 02:03:09,370 --> 02:03:11,580 So this is the opposite of equals equals. 2501 02:03:11,580 --> 02:03:14,730 Then previously I would have just printed hello, world, 2502 02:03:14,730 --> 02:03:16,830 but now I want to print out an error message 2503 02:03:16,830 --> 02:03:21,210 like, "Missing command-line argument" just to explain to the user 2504 02:03:21,210 --> 02:03:26,520 why the program is about to terminate, and then I can return 1. 2505 02:03:26,520 --> 02:03:27,750 It's kind of arbitrary. 
2506 02:03:27,750 --> 02:03:30,700 I could also return 1132, but why start there? 2507 02:03:30,700 --> 02:03:34,180 This is the only possible error that could go wrong in my program. 2508 02:03:34,180 --> 02:03:35,490 So I'm going to start at 1. 2509 02:03:35,490 --> 02:03:39,150 Zoom clearly has 1,000-plus possible things that can go wrong 2510 02:03:39,150 --> 02:03:42,660 in their source code, which is why the number got as big as 1132, 2511 02:03:42,660 --> 02:03:45,990 but I'm just going to arbitrarily, but conventionally return 1. 2512 02:03:45,990 --> 02:03:52,110 But if everything is OK and I do-- it is not the case that argc does not equal 2 2513 02:03:52,110 --> 02:03:57,360 and I actually get to line 11, I'm going to return 0 because 0, again, I claim, 2514 02:03:57,360 --> 02:03:59,190 signifies success. 2515 02:03:59,190 --> 02:04:03,120 And all of this time, every program we've written-- you've written 2516 02:04:03,120 --> 02:04:07,558 has secretly exited with 0 by default. But now 2517 02:04:07,558 --> 02:04:09,600 that our programs are getting more sophisticated, 2518 02:04:09,600 --> 02:04:11,700 when something goes wrong, it turns out it's 2519 02:04:11,700 --> 02:04:15,085 useful to have the power to just return some other value even 2520 02:04:15,085 --> 02:04:16,710 though the user is not going to see it. 2521 02:04:16,710 --> 02:04:19,620 Even though the Zoom user shouldn't see it, it's still there. 2522 02:04:19,620 --> 02:04:22,380 It's diagnostically useful to you, or in the case of a class, 2523 02:04:22,380 --> 02:04:24,660 to your TF or TA or CA. 2524 02:04:24,660 --> 02:04:30,930 So if I do make status now to compile this program and run ./status and type 2525 02:04:30,930 --> 02:04:33,340 my first name I think this is a success. 2526 02:04:33,340 --> 02:04:37,290 It should say hello, David and secretly exit with 0. 2527 02:04:37,290 --> 02:04:41,820 If you really want to see the 0, there's this arcane command you can type. 
2528 02:04:41,820 --> 02:04:45,780 You can literally type at your prompt echo $?. 2529 02:04:45,780 --> 02:04:48,810 It's weird symbology, but it's what the humans chose decades ago. 2530 02:04:48,810 --> 02:04:53,460 This will just show you what the most recently run program secretly exited 2531 02:04:53,460 --> 02:04:54,010 with. 2532 02:04:54,010 --> 02:04:58,560 So if I do this in VS Code, I can do echo $?, Enter, 2533 02:04:58,560 --> 02:04:59,982 and there's that secret 0. 2534 02:04:59,982 --> 02:05:02,190 I could have been doing this this week and last week, it's 2535 02:05:02,190 --> 02:05:03,330 just not that interesting. 2536 02:05:03,330 --> 02:05:08,340 But it is interesting, or at least marginally so, if I rerun status 2537 02:05:08,340 --> 02:05:12,060 and maybe I don't provide a command line argument or I provide too many. 2538 02:05:12,060 --> 02:05:14,340 So argc does not equal 2. 2539 02:05:14,340 --> 02:05:17,520 And I hit Enter, I get yelled at with the error message, 2540 02:05:17,520 --> 02:05:21,300 but I can see the secret status code, which is, indeed, 1. 2541 02:05:21,300 --> 02:05:24,340 And so now if you're ever in the habit in either a class like this 2542 02:05:24,340 --> 02:05:27,090 or in the real world where you're automatically testing your code, 2543 02:05:27,090 --> 02:05:29,340 be it with check50 or in the real world, things called 2544 02:05:29,340 --> 02:05:31,590 unit tests and other third-party software, 2545 02:05:31,590 --> 02:05:36,150 those tests can actually detect these status codes-- exit statuses 2546 02:05:36,150 --> 02:05:39,943 and know that your code succeeded or failed, 0 or 1. 2547 02:05:39,943 --> 02:05:42,360 And if there's different types of failures it can detect-- 2548 02:05:42,360 --> 02:05:48,630 status 2, status 3, status 1132, it's just one other tool in your toolkit.
2549 02:05:48,630 --> 02:05:51,240 But all of that is terribly low level, and really, 2550 02:05:51,240 --> 02:05:54,900 the goal of this week-- and really, today, and really, code more generally, 2551 02:05:54,900 --> 02:05:55,990 is to solve problems. 2552 02:05:55,990 --> 02:05:58,380 So let's consider an increasingly important one, which 2553 02:05:58,380 --> 02:06:01,650 is the ability to send information securely, 2554 02:06:01,650 --> 02:06:04,980 whether it is in file format, wirelessly, or any other. 2555 02:06:04,980 --> 02:06:08,640 Cryptography is the art and the science of encrypting. 2556 02:06:08,640 --> 02:06:09,930 Scrambling information. 2557 02:06:09,930 --> 02:06:12,510 So that even if I write a secret message to you 2558 02:06:12,510 --> 02:06:16,350 and I send it through this open audience with so many nosey eyes 2559 02:06:16,350 --> 02:06:19,890 who could look at the message, if I've encrypted this message, none of them 2560 02:06:19,890 --> 02:06:22,800 should be able to read it, only you, whoever you are, 2561 02:06:22,800 --> 02:06:24,900 to whom I intended that message. 2562 02:06:24,900 --> 02:06:27,030 In the world of cryptography, then encryption 2563 02:06:27,030 --> 02:06:30,210 means scrambling the information so that only you and the recipient 2564 02:06:30,210 --> 02:06:31,060 can receive it. 2565 02:06:31,060 --> 02:06:34,380 So if we consider our black box like in week 0 and 1, 2566 02:06:34,380 --> 02:06:36,030 here is the problem to be solved. 2567 02:06:36,030 --> 02:06:38,910 And let me propose a couple of pieces of vocabulary. 2568 02:06:38,910 --> 02:06:42,420 Plaintext is any message written in English or any human language 2569 02:06:42,420 --> 02:06:45,090 that you want to send and write yourself. 
2570 02:06:45,090 --> 02:06:47,150 Ciphertext is what you want to convert it 2571 02:06:47,150 --> 02:06:49,850 to before you just hand it off to a bunch of random strangers 2572 02:06:49,850 --> 02:06:52,220 in the audience or a bunch of servers on the internet, 2573 02:06:52,220 --> 02:06:54,432 any one of whom could look at your message. 2574 02:06:54,432 --> 02:06:56,390 So in the black box is what we're going to call 2575 02:06:56,390 --> 02:07:02,000 a cipher, an algorithm for encrypting or scrambling information 2576 02:07:02,000 --> 02:07:03,268 in a reversible way. 2577 02:07:03,268 --> 02:07:05,810 It doesn't suffice to just scramble the information randomly, 2578 02:07:05,810 --> 02:07:07,980 otherwise the recipient can't do anything with it. 2579 02:07:07,980 --> 02:07:11,660 It's an algorithm, a cipher that encrypts it in such a way 2580 02:07:11,660 --> 02:07:13,280 that someone else can decrypt it. 2581 02:07:13,280 --> 02:07:14,750 And here's a common way. 2582 02:07:14,750 --> 02:07:20,540 Most ciphers take as input not only the plaintext message in English 2583 02:07:20,540 --> 02:07:22,700 or whatever else, but also a key. 2584 02:07:22,700 --> 02:07:25,400 And it's metaphorically like a key to open a lock, 2585 02:07:25,400 --> 02:07:29,300 but it's technically generally a number, like a really big number made up 2586 02:07:29,300 --> 02:07:30,170 of lots of bits. 2587 02:07:30,170 --> 02:07:35,330 And not even 32, not even 64, sometimes 1,024 bits, which is crazy 2588 02:07:35,330 --> 02:07:37,610 unpronounceable large, but the probability 2589 02:07:37,610 --> 02:07:40,880 that someone is going to guess your key is just so, so small 2590 02:07:40,880 --> 02:07:43,850 that for all intents and purposes, you are, in fact, secure. 2591 02:07:43,850 --> 02:07:46,020 So what's an example of this, for instance? 2592 02:07:46,020 --> 02:07:50,165 Suppose the secret message I want to send is innocuously just "HI!" 
2593 02:07:50,165 --> 02:07:52,790 Well, it'd be pretty stupid to write "HI!" on a piece of paper, 2594 02:07:52,790 --> 02:07:54,707 hand it to someone in the audience, and expect 2595 02:07:54,707 --> 02:07:57,770 it to get all the way to the back without someone like glancing at it 2596 02:07:57,770 --> 02:08:00,510 and obviously seeing and reading the plaintext. 2597 02:08:00,510 --> 02:08:03,650 So what if I, though, agree with someone in back, for instance, 2598 02:08:03,650 --> 02:08:05,570 that our secret is going to be 1? 2599 02:08:05,570 --> 02:08:07,790 And we have to agree upon that secret in advance, 2600 02:08:07,790 --> 02:08:10,160 but 1 just means that is my key. 2601 02:08:10,160 --> 02:08:13,340 And let me propose that according to one popular cipher, 2602 02:08:13,340 --> 02:08:19,730 if I want to send "HI!", change the H to an I and the I to a J-- that is, 2603 02:08:19,730 --> 02:08:22,740 increment effectively every letter of the alphabet by one, 2604 02:08:22,740 --> 02:08:25,830 and if you get to a Z, wrap back around to A, for instance. 2605 02:08:25,830 --> 02:08:28,790 So shift the alphabet by one place in this case 2606 02:08:28,790 --> 02:08:31,200 and send this message now instead. 2607 02:08:31,200 --> 02:08:32,510 So is that secure? 2608 02:08:32,510 --> 02:08:35,240 Well, if one of you kind of nosily looks at this sheet of paper, 2609 02:08:35,240 --> 02:08:36,440 you won't see "HI!" 2610 02:08:36,440 --> 02:08:39,240 You will see some information leak in this algorithm. 2611 02:08:39,240 --> 02:08:42,500 You'll see an exclamation point, so I'm enthusiastically saying something, 2612 02:08:42,500 --> 02:08:46,710 but you won't know what the message is unless you decrypt it. 2613 02:08:46,710 --> 02:08:50,720 Now that said, is this very secure, really, in practice? 2614 02:08:50,720 --> 02:08:51,950 I mean, not really. 
2615 02:08:51,950 --> 02:08:55,520 Like, if you know I'm just using a key and I'm using the English alphabet, 2616 02:08:55,520 --> 02:08:58,220 you could probably brute force your way to a solution 2617 02:08:58,220 --> 02:09:01,520 by just trying 1, trying 2, trying 3, trying 25, 2618 02:09:01,520 --> 02:09:03,740 go through all the possibilities tediously, 2619 02:09:03,740 --> 02:09:05,660 but eventually it's probably going to pop out. 2620 02:09:05,660 --> 02:09:08,090 This is actually known, though, as the Caesar cipher. 2621 02:09:08,090 --> 02:09:12,080 And back in the day, before anyone else knew about or had invented encryption, 2622 02:09:12,080 --> 02:09:15,260 Caesar, Julius Caesar, was known to use a cipher like this 2623 02:09:15,260 --> 02:09:17,360 using a key of three, literally. 2624 02:09:17,360 --> 02:09:20,780 And I guess it works OK if you're literally the first human in the world 2625 02:09:20,780 --> 02:09:25,370 by lore to have thought of this idea, but of course, anyone who intercepts it 2626 02:09:25,370 --> 02:09:29,330 could attack it nonetheless and figure things out a bit mathematically. 2627 02:09:29,330 --> 02:09:31,140 13 is more common. 2628 02:09:31,140 --> 02:09:35,180 This is called ROT13 on the internet for rotate the letters of the alphabet 13. 2629 02:09:35,180 --> 02:09:38,240 That changes "HI!" to "UV!" 2630 02:09:38,240 --> 02:09:39,937 You might think what's better than 13? 2631 02:09:39,937 --> 02:09:41,270 Well, let's double the security. 2632 02:09:41,270 --> 02:09:42,590 ROT26. 2633 02:09:42,590 --> 02:09:45,140 Why is this stupid? 2634 02:09:45,140 --> 02:09:48,140 I mean, there's like 26 letters in the alphabet, so like A becomes A. So 2635 02:09:48,140 --> 02:09:49,730 that doesn't really help-- oh, wait. 2636 02:09:49,730 --> 02:09:53,090 Oh, I'm pointing at something that's not on the screen, dammit. 2637 02:09:53,090 --> 02:09:58,190 Suppose the message is more lovingly, "I LOVE YOU," instead of just "HI!" 
2638 02:09:58,190 --> 02:10:01,490 Same exact approach, whether or not there's punctuation, "I LOVE YOU," 2639 02:10:01,490 --> 02:10:03,980 with an input of 13 might now become this. 2640 02:10:03,980 --> 02:10:07,130 And now it's getting a little less obvious what the ciphertext actually 2641 02:10:07,130 --> 02:10:07,970 represents. 2642 02:10:07,970 --> 02:10:10,550 And now, what's twice as secure as 13? 2643 02:10:10,550 --> 02:10:15,260 Well, 26 is surely better, but of course, if you rotate 26 places, 2644 02:10:15,260 --> 02:10:17,460 that, of course, just gives you the same thing. 2645 02:10:17,460 --> 02:10:19,460 So there's a limit to this, but again, that just 2646 02:10:19,460 --> 02:10:22,770 speaks to the cipher being used, which is very simple. 2647 02:10:22,770 --> 02:10:26,417 There are much, much better, more sophisticated mathematical ciphers 2648 02:10:26,417 --> 02:10:27,000 that are used. 2649 02:10:27,000 --> 02:10:29,660 We're just starting with something simple here. 2650 02:10:29,660 --> 02:10:34,910 As for decryption, if I'm using a key of 1, how do I reverse the process? 2651 02:10:34,910 --> 02:10:36,290 Yeah, so I just minus 1. 2652 02:10:36,290 --> 02:10:41,510 So B becomes A, C becomes B, A becomes Z. And if it's 13, 2653 02:10:41,510 --> 02:10:45,390 I subtract 13 instead or whatever the key is, so long as sender 2654 02:10:45,390 --> 02:10:46,780 and receiver actually know it. 2655 02:10:46,780 --> 02:10:50,280 So in this case here, this is actually the message with which we began class. 2656 02:10:50,280 --> 02:10:53,730 If we have this message here and I used a key of 1 to encrypt it, 2657 02:10:53,730 --> 02:10:57,220 well, decrypting it might involve doing something like this.
2658 02:10:57,220 --> 02:11:00,278 Here's those same letters on the screen, and I think in a moment 2659 02:11:00,278 --> 02:11:02,070 before we adjourn, I'll mention too that we 2660 02:11:02,070 --> 02:11:04,230 might have encrypted a message in eight characters 2661 02:11:04,230 --> 02:11:06,360 this whole day, so if any of you took the time 2662 02:11:06,360 --> 02:11:08,660 and procrastinated and figured out what the light bulb spelled 2663 02:11:08,660 --> 02:11:10,743 and they didn't seem to spell anything in English, 2664 02:11:10,743 --> 02:11:13,530 well, here now is the solution for cracking it. 2665 02:11:13,530 --> 02:11:16,500 This, if I subtract 1, becomes what? 2666 02:11:16,500 --> 02:11:22,007 U becomes T. And this is obviously-- see where we're going with this? 2667 02:11:22,007 --> 02:11:25,090 And if we keep going, subtracting 1-- so indeed, we're at the end of class 2668 02:11:25,090 --> 02:11:26,930 now because this was CS50. 2669 02:11:26,930 --> 02:11:30,180 And the last thing we have to say is we have hundreds of ducks waiting for you 2670 02:11:30,180 --> 02:11:30,790 outside. 2671 02:11:30,790 --> 02:11:33,120 So on the way out, grab your own rubber duck. 2672 02:11:33,120 --> 02:11:34,320 [APPLAUSE] 2673 02:11:34,320 --> 02:11:37,970 [MUSIC PLAYING] 2674 02:11:37,970 --> 02:12:04,000