1 00:00:00,000 --> 00:00:00,994 [MUSIC PLAYING] 2 00:00:49,690 --> 00:00:50,960 DAVID J. MALAN: All right. 3 00:00:50,960 --> 00:00:54,520 This is CS50 and this is the start of week two. 4 00:00:54,520 --> 00:00:56,730 And you'll recall that over the past couple of weeks, 5 00:00:56,730 --> 00:00:57,410 we've been building up. 6 00:00:57,410 --> 00:01:00,110 First initially from Scratch, the graphical programming language 7 00:01:00,110 --> 00:01:04,580 that we then, just last week, translated to the equivalent program in C. 8 00:01:04,580 --> 00:01:07,040 And of course, there's a lot more syntax now. 9 00:01:07,040 --> 00:01:11,090 It's entirely text but the ideas, recall, were fundamentally the same. 10 00:01:11,090 --> 00:01:13,520 The catch is that computers don't understand this. 11 00:01:13,520 --> 00:01:15,260 They only understand what language? 12 00:01:15,260 --> 00:01:16,510 AUDIENCE: [INAUDIBLE] 13 00:01:16,510 --> 00:01:18,610 DAVID J. MALAN: Zeros and ones, or binary. 14 00:01:18,610 --> 00:01:23,650 And so there's a requisite step in order for us to get from this code to binary. 15 00:01:23,650 --> 00:01:26,890 And what was that step or that program or process called? 16 00:01:26,890 --> 00:01:27,770 AUDIENCE: [INAUDIBLE] 17 00:01:27,770 --> 00:01:28,640 DAVID J. MALAN: Yeah, so compiling. 18 00:01:28,640 --> 00:01:30,640 And of course, recall as you've now experimented 19 00:01:30,640 --> 00:01:32,710 with this past week that to compile a program, 20 00:01:32,710 --> 00:01:34,510 you can use clang, for C language. 21 00:01:34,510 --> 00:01:36,410 And you can just say clang and then the name of the file 22 00:01:36,410 --> 00:01:37,450 that you want to compile. 23 00:01:37,450 --> 00:01:40,090 And that outputs by default a pretty oddly named program. 24 00:01:40,090 --> 00:01:41,740 Just a.out. 25 00:01:41,740 --> 00:01:43,180 Which stands for assembler output. 26 00:01:43,180 --> 00:01:44,690 More on that in just a moment. 
27 00:01:44,690 --> 00:01:47,390 But recall too that you can override that default behavior. 28 00:01:47,390 --> 00:01:49,480 And you can actually say, output instead a program 29 00:01:49,480 --> 00:01:52,120 called hello instead of just a.out. 30 00:01:52,120 --> 00:01:55,120 But you can go one step further, and you can actually use Make. 31 00:01:55,120 --> 00:01:58,180 And Make itself is not a compiler, it's a build utility. 32 00:01:58,180 --> 00:02:00,970 But in layman's terms, what does it do for us? 33 00:02:00,970 --> 00:02:02,350 AUDIENCE: [INAUDIBLE] 34 00:02:02,350 --> 00:02:03,580 DAVID J. MALAN: Compiles it. 35 00:02:03,580 --> 00:02:07,000 And it essentially figures out all of those otherwise cryptic 36 00:02:07,000 --> 00:02:08,950 looking command line arguments. 37 00:02:08,950 --> 00:02:10,630 Like -o something, and so forth. 38 00:02:10,630 --> 00:02:12,340 So that the program is built just the way 39 00:02:12,340 --> 00:02:14,200 we want it without our having to remember 40 00:02:14,200 --> 00:02:16,540 those seemingly magical incantations. 41 00:02:16,540 --> 00:02:20,410 And though that only works for programs as simple as this. 42 00:02:20,410 --> 00:02:23,000 In fact, some of you with the most recent problem set 43 00:02:23,000 --> 00:02:25,420 might have encountered compilation errors that we actually 44 00:02:25,420 --> 00:02:29,410 did not encounter deliberately in class because Make was helping us out. 45 00:02:29,410 --> 00:02:31,900 In fact, as soon as you enhance a program 46 00:02:31,900 --> 00:02:36,760 to actually take user input using CS50's library by including cs50.h, 47 00:02:36,760 --> 00:02:39,940 some of you might have realized that all of a sudden the sandbox, 48 00:02:39,940 --> 00:02:43,150 and more generally Clang, didn't know what get_string was. 49 00:02:43,150 --> 00:02:45,950 And frankly, Clang might not even have known what a string was. 
50 00:02:45,950 --> 00:02:49,630 And that's because those two are features of CS50's library 51 00:02:49,630 --> 00:02:51,640 that you have to teach Clang about. 52 00:02:51,640 --> 00:02:57,250 But it's not enough to teach Clang what they look like, as by including cs50.h. 53 00:02:57,250 --> 00:03:01,360 Turns out there's a missing step that Make helps us solve 54 00:03:01,360 --> 00:03:04,610 but that you too can just solve manually if you want. 55 00:03:04,610 --> 00:03:08,650 And by that I mean this, instead of compiling a program with just clang 56 00:03:08,650 --> 00:03:09,670 hello.c. 57 00:03:09,670 --> 00:03:13,560 When you want to use CS50's library, you actually 58 00:03:13,560 --> 00:03:15,690 need to add this additional command line argument. 59 00:03:15,690 --> 00:03:18,820 Specifically at the end, can't go in the beginning like -o. 60 00:03:18,820 --> 00:03:20,410 And -l stands for link. 61 00:03:20,410 --> 00:03:24,250 And this is a way of telling Clang, by the way when compiling my program, 62 00:03:24,250 --> 00:03:28,210 please link in CS50's zeros and ones that we the staff 63 00:03:28,210 --> 00:03:31,590 wrote some weeks ago and installed in the sandbox for you. 64 00:03:31,590 --> 00:03:33,340 So you've got your zeros and ones and then 65 00:03:33,340 --> 00:03:35,560 you've got our zeros and ones so to speak. 66 00:03:35,560 --> 00:03:38,540 And -lcs50 says to link them together. 67 00:03:38,540 --> 00:03:42,340 So if you were getting some kind of undefined reference error to get_string 68 00:03:42,340 --> 00:03:43,270 or you didn't-- 69 00:03:43,270 --> 00:03:46,480 you weren't able to compile a program that just used any of the get functions 70 00:03:46,480 --> 00:03:47,830 from CS50's library. 71 00:03:47,830 --> 00:03:51,550 Odds are, this simple change, -lcs50, would have fixed it. 
72 00:03:51,550 --> 00:03:54,430 But of course, this isn't interesting stuff to remember, let alone 73 00:03:54,430 --> 00:03:57,220 remembering how to use -o as well, at which point 74 00:03:57,220 --> 00:03:59,600 the command gets really tedious to type. 75 00:03:59,600 --> 00:04:00,700 So here comes Make again. 76 00:04:00,700 --> 00:04:02,270 Make automates all of this for us. 77 00:04:02,270 --> 00:04:04,690 And in fact, if you henceforth start running Make and then 78 00:04:04,690 --> 00:04:08,500 pay closer attention to the fairly long line of output that it outputs, 79 00:04:08,500 --> 00:04:11,180 you'll actually see mention of -lcs50, 80 00:04:11,180 --> 00:04:14,200 you'll see mention of even -lm, which stands for math. 81 00:04:14,200 --> 00:04:16,410 So if you're using round, for instance, you 82 00:04:16,410 --> 00:04:18,760 might have discovered that round, too, also 83 00:04:18,760 --> 00:04:21,880 doesn't work out of the box unless you use Make itself 84 00:04:21,880 --> 00:04:25,030 or this more nuanced approach. 85 00:04:25,030 --> 00:04:28,200 So this is all to say that compiling is a bit of a white lie. 86 00:04:28,200 --> 00:04:31,120 Like, yes you've been compiling and you've been going from source code 87 00:04:31,120 --> 00:04:32,530 to machine code. 88 00:04:32,530 --> 00:04:35,380 But it turns out that there's been a number of other steps happening 89 00:04:35,380 --> 00:04:37,720 for you that we're going to just slap some labels on today. 90 00:04:37,720 --> 00:04:40,190 At the end of the day, we're just breaking the abstraction. 91 00:04:40,190 --> 00:04:42,930 So compiling is this abstraction from source code to machine code. 92 00:04:42,930 --> 00:04:45,190 Let's just kind of zoom in briefly to appreciate 93 00:04:45,190 --> 00:04:47,860 what it is that's going on in hopes that it makes the code we're 94 00:04:47,860 --> 00:04:50,590 compiling a little more understandable. 
95 00:04:50,590 --> 00:04:54,610 So step one of four, when it comes to actually compiling a program 96 00:04:54,610 --> 00:04:55,780 is called Pre-processing. 97 00:04:55,780 --> 00:04:58,330 So recall that this program we just looked at had a couple of 98 00:04:58,330 --> 00:05:00,440 includes at the top of the file. 99 00:05:00,440 --> 00:05:02,740 These are generally known as pre-processor directives. 100 00:05:02,740 --> 00:05:05,380 Not a particularly interesting term but they're 101 00:05:05,380 --> 00:05:08,380 demarcated by the hash at the start of these lines. 102 00:05:08,380 --> 00:05:12,230 That's a signal to Clang that these things should be handled first. 103 00:05:12,230 --> 00:05:13,270 Preprocessed. 104 00:05:13,270 --> 00:05:15,250 Processed before everything else. 105 00:05:15,250 --> 00:05:20,260 And in fact, the reason for this we did discuss last week, inside of cs50.h 106 00:05:20,260 --> 00:05:21,790 is what, for instance? 107 00:05:21,790 --> 00:05:24,300 AUDIENCE: [INAUDIBLE] 108 00:05:24,300 --> 00:05:27,460 DAVID J. MALAN: Specifically, the declaration of get_string. 109 00:05:27,460 --> 00:05:30,540 So there's some lines of code, the prototype if you recall, 110 00:05:30,540 --> 00:05:34,770 that one line of code that teaches Clang what the inputs to get_string are 111 00:05:34,770 --> 00:05:35,910 and what the outputs are. 112 00:05:35,910 --> 00:05:38,590 The return type and the arguments, so to speak. 113 00:05:38,590 --> 00:05:42,380 And so when you have include cs50.h at the top of the file, what 114 00:05:42,380 --> 00:05:45,780 is happening when you first run Clang during this so-called pre-processing 115 00:05:45,780 --> 00:05:49,890 step, is Clang looks on the hard drive for the file literally called cs50.h. 116 00:05:49,890 --> 00:05:54,940 It grabs its contents and essentially finds and replaces this line here. 
117 00:05:54,940 --> 00:05:58,800 So somewhere in cs50.h is a line like this yellow one here 118 00:05:58,800 --> 00:06:02,050 that says get_string is a function that returns a string. 119 00:06:02,050 --> 00:06:05,340 And it takes as input, the so-called argument, a string 120 00:06:05,340 --> 00:06:06,930 that we'll call prompt. 121 00:06:06,930 --> 00:06:10,680 Meanwhile, with include stdio.h, what's the point of including that? 122 00:06:10,680 --> 00:06:14,180 What is declared inside of that file presumably? 123 00:06:14,180 --> 00:06:14,680 Yeah? 124 00:06:14,680 --> 00:06:16,110 AUDIENCE: It's the standard inputs and outputs. 125 00:06:16,110 --> 00:06:17,220 DAVID J. MALAN: Standard inputs and outputs. 126 00:06:17,220 --> 00:06:19,020 And more specifically, what's an example thereof? 127 00:06:19,020 --> 00:06:19,600 What function? 128 00:06:19,600 --> 00:06:20,470 AUDIENCE: [INAUDIBLE] 129 00:06:20,470 --> 00:06:21,560 DAVID J. MALAN: So printf. 130 00:06:21,560 --> 00:06:22,950 The other function we keep using. 131 00:06:22,950 --> 00:06:26,860 So inside of stdio.h, somewhere on the sandbox's hard drive 132 00:06:26,860 --> 00:06:29,700 is similarly a line of code that frankly looks a little more cryptic 133 00:06:29,700 --> 00:06:31,400 but we'll come back to this sort of thing 134 00:06:31,400 --> 00:06:33,540 down the road, that says printf is a function. 135 00:06:33,540 --> 00:06:36,120 Happens to return an int, but more on that another time. 136 00:06:36,120 --> 00:06:38,730 Happens to take a char* format. 137 00:06:38,730 --> 00:06:40,150 But more on that another time. 138 00:06:40,150 --> 00:06:41,900 Indeed, this is one of the reasons we hide 139 00:06:41,900 --> 00:06:44,280 this detail early on because there's some syntax that's 140 00:06:44,280 --> 00:06:45,500 just a distraction for now. 141 00:06:45,500 --> 00:06:46,800 But that's all that's going on. 
142 00:06:46,800 --> 00:06:50,640 The sharp include sign is just finding and replacing the contents. 143 00:06:50,640 --> 00:06:54,460 Plus dot, dot, dot, a bunch of other things in those files as well. 144 00:06:54,460 --> 00:06:56,450 So when we say pre-processing, we just mean 145 00:06:56,450 --> 00:06:59,370 that that's getting substituted in so you don't have to copy and paste 146 00:06:59,370 --> 00:07:01,470 this sort of thing manually yourself. 147 00:07:01,470 --> 00:07:04,770 So "compiling" is a word that actually has a well-defined meaning. 148 00:07:04,770 --> 00:07:08,370 Once you've preprocessed your code, and your code looks essentially like this, 149 00:07:08,370 --> 00:07:11,820 unbeknownst to you, then comes the actual compilation step. 150 00:07:11,820 --> 00:07:15,930 And this code here gets turned into this code here. 151 00:07:15,930 --> 00:07:18,840 Now this is scary-looking, and this is the sort of thing 152 00:07:18,840 --> 00:07:21,300 that if you take a class like CS61 at Harvard, 153 00:07:21,300 --> 00:07:23,610 or, more generally, systems programming, so to speak, 154 00:07:23,610 --> 00:07:25,180 you might see something like this. 155 00:07:25,180 --> 00:07:28,550 This is x86 64-bit assembly instructions. 156 00:07:28,550 --> 00:07:31,140 And the only thing interesting about that claim for the moment 157 00:07:31,140 --> 00:07:32,220 is that assembly-- 158 00:07:32,220 --> 00:07:35,540 I kind of alluded to that earlier-- assembler output, a.out. 159 00:07:35,540 --> 00:07:38,250 There's actually a relationship here, but long story short, these 160 00:07:38,250 --> 00:07:41,880 are the lower level instructions that only the CPU, 161 00:07:41,880 --> 00:07:44,700 the brain inside your computer, actually understands. 162 00:07:44,700 --> 00:07:48,150 Your CPU does not understand C. It doesn't understand Python or C++ 163 00:07:48,150 --> 00:07:50,550 or Java or any language with which you might be familiar. 
164 00:07:50,550 --> 00:07:53,160 It only understands this cryptic-looking thing. 165 00:07:53,160 --> 00:07:56,820 But frankly, from the looks of it, you might glean that probably not so much 166 00:07:56,820 --> 00:07:58,170 fun to program in this. 167 00:07:58,170 --> 00:08:00,990 I mean, arguably, it's not that much fun to program yet in C, 168 00:08:00,990 --> 00:08:03,210 So this looks even more cryptic. 169 00:08:03,210 --> 00:08:04,250 But that's OK. 170 00:08:04,250 --> 00:08:07,440 C and lots of languages are just these abstractions 171 00:08:07,440 --> 00:08:10,380 on top of the lower level stuff that the CPUs do actually 172 00:08:10,380 --> 00:08:13,230 understand so that we don't have to worry about it as much. 173 00:08:13,230 --> 00:08:16,470 But if we highlight a few terms, here you'll see some familiar things. 174 00:08:16,470 --> 00:08:19,500 So main is mentioned in this so-called assembly code. 175 00:08:19,500 --> 00:08:21,570 You see mention of get string and printf, 176 00:08:21,570 --> 00:08:23,170 so we're not losing information. 177 00:08:23,170 --> 00:08:27,650 It's just being presented in really a different language, assembly language. 178 00:08:27,650 --> 00:08:31,510 Now you can glean, perhaps, from some of the names of these instructions, 179 00:08:31,510 --> 00:08:33,450 this is what Intel Inside means. 180 00:08:33,450 --> 00:08:37,380 When Intel or any brand of CPU understands instructions, 181 00:08:37,380 --> 00:08:42,210 it means things like pushing and moving and subtracting and calling. 182 00:08:42,210 --> 00:08:44,530 These are all low level verbs, functions, 183 00:08:44,530 --> 00:08:46,680 if you will, but at the level of the CPU. 184 00:08:46,680 --> 00:08:48,960 But for more on that, you can take entire courses. 
185 00:08:48,960 --> 00:08:51,450 But just to take the hood off of this for today, 186 00:08:51,450 --> 00:08:54,930 this is a step that's been happening for us magically unbeknownst 187 00:08:54,930 --> 00:08:57,350 to us, thanks to Clang. 188 00:08:57,350 --> 00:09:00,690 So assembling-- now that you've got this cryptic-looking code that we will never 189 00:09:00,690 --> 00:09:02,910 see again-- we'll never need to output again-- 190 00:09:02,910 --> 00:09:03,880 what do you do with it? 191 00:09:03,880 --> 00:09:07,110 Well, you said earlier that computers only understand zeros and ones, 192 00:09:07,110 --> 00:09:12,630 so the third step is actually to convert this assembly language to actual zeros 193 00:09:12,630 --> 00:09:15,090 and ones that now look like this. 194 00:09:15,090 --> 00:09:17,460 So the assembling step happening, unbeknownst to you, 195 00:09:17,460 --> 00:09:19,170 every time you run Clang or, in turn, run 196 00:09:19,170 --> 00:09:22,120 make, we're getting zeros and ones out of the assembly code, 197 00:09:22,120 --> 00:09:25,500 and we're getting the assembly code out of your C-code. 198 00:09:25,500 --> 00:09:28,830 But here's the fourth and final step. 199 00:09:28,830 --> 00:09:32,670 Recall that we need to link in other people's zeros and ones. 200 00:09:32,670 --> 00:09:34,560 If you're using printf you didn't write that. 201 00:09:34,560 --> 00:09:36,940 Someone else created those zeros and ones, the patterns 202 00:09:36,940 --> 00:09:38,190 that the computer understands. 203 00:09:38,190 --> 00:09:39,390 You didn't create get string. 204 00:09:39,390 --> 00:09:41,820 We did, so you need access to those zeros and ones 205 00:09:41,820 --> 00:09:44,250 so that your program can use them as well. 206 00:09:44,250 --> 00:09:45,930 So linking, essentially, does this. 
207 00:09:45,930 --> 00:09:48,490 If you've written a program-- for instance, hello.c-- 208 00:09:48,490 --> 00:09:51,450 and it happens to use a couple of other libraries, 209 00:09:51,450 --> 00:09:53,970 files that other people wrote of useful code 210 00:09:53,970 --> 00:09:57,970 for you, like cs50.c, which does exist somewhere, 211 00:09:57,970 --> 00:10:00,960 and even stdio.c, which does exist somewhere, 212 00:10:00,960 --> 00:10:03,690 or technically, Standard IO is such a big library, 213 00:10:03,690 --> 00:10:06,970 they actually put printf in a file specifically called printf.c. 214 00:10:06,970 --> 00:10:10,650 But somewhere in the sandbox's hard drive, in all of our Macs and PCs, 215 00:10:10,650 --> 00:10:14,850 if they support compiling, are, for instance, files like these. 216 00:10:14,850 --> 00:10:18,060 But we've got to convert this to zeros and ones, this, and this, 217 00:10:18,060 --> 00:10:19,570 and then somehow combine them. 218 00:10:19,570 --> 00:10:21,830 So pictorially, this just looks a bit like this. 219 00:10:21,830 --> 00:10:23,880 And this is all happening automatically by Clang. 220 00:10:23,880 --> 00:10:25,950 Hello.c, the code you wrote, gets compiled 221 00:10:25,950 --> 00:10:31,740 to assembly, which then gets assembled into zeros and ones, so-called machine 222 00:10:31,740 --> 00:10:32,970 code or object code. 223 00:10:32,970 --> 00:10:36,240 Cs50.c-- we did this for you before the semester started. 224 00:10:36,240 --> 00:10:39,090 Printf was done way before any of us started decades 225 00:10:39,090 --> 00:10:41,550 ago and looks like this. 
226 00:10:41,550 --> 00:10:44,700 These are three separate files, though, so the linking step literally 227 00:10:44,700 --> 00:10:48,570 means, link all of these things together, and combine the zeros 228 00:10:48,570 --> 00:10:51,980 and ones from, like, three, at least, separate files, 229 00:10:51,980 --> 00:10:53,820 and just combine them in such a way that now 230 00:10:53,820 --> 00:10:57,930 the CPU knows how to use not just your code but printf and get string 231 00:10:57,930 --> 00:10:59,380 and so forth. 232 00:10:59,380 --> 00:11:02,110 So last week, we introduced compiling as an abstraction, 233 00:11:02,110 --> 00:11:05,440 if you will, and this is all that we've really meant this whole time. 234 00:11:05,440 --> 00:11:08,230 But now that we've seen what's going on underneath the hood, 235 00:11:08,230 --> 00:11:11,160 and we can stipulate that my CPU that looks physically 236 00:11:11,160 --> 00:11:14,010 like this, albeit smaller in a laptop or desktop, 237 00:11:14,010 --> 00:11:17,290 knows how to deal with all of that. 238 00:11:17,290 --> 00:11:19,850 So any questions on these four steps-- 239 00:11:19,850 --> 00:11:22,710 pre-processing, compiling, assembling, linking? 240 00:11:22,710 --> 00:11:27,340 But generally, now, we can just call them compiling, as most people do. 241 00:11:27,340 --> 00:11:28,100 Any questions? 242 00:11:28,100 --> 00:11:29,090 Yeah. 243 00:11:29,090 --> 00:11:36,510 AUDIENCE: How does the CPU know that [INAUDIBLE] is there? 244 00:11:36,510 --> 00:11:39,780 Is that [INAUDIBLE]? 245 00:11:39,780 --> 00:11:41,740 DAVID J. MALAN: Not in the pre-processing step, 246 00:11:41,740 --> 00:11:43,490 so the question is, how does the computer 247 00:11:43,490 --> 00:11:46,850 know that printf is the only function that's there? 248 00:11:46,850 --> 00:11:49,040 Essentially, when you're linking in code, 249 00:11:49,040 --> 00:11:51,980 only the requisite zeros and ones are typically linked in. 
250 00:11:51,980 --> 00:11:55,280 Sometimes you get more than you actually need, if it's a big library, 251 00:11:55,280 --> 00:11:56,490 but that's OK, too. 252 00:11:56,490 --> 00:11:58,990 Those zeros and ones are just never used by the CPU. 253 00:11:58,990 --> 00:11:59,990 Good question. 254 00:11:59,990 --> 00:12:02,410 Other questions? 255 00:12:02,410 --> 00:12:03,170 OK, all right. 256 00:12:03,170 --> 00:12:06,740 So now that we know this is possible, let's start 257 00:12:06,740 --> 00:12:09,530 to build our way back up, because everyone here 258 00:12:09,530 --> 00:12:11,810 probably knows now that when writing in C, which 259 00:12:11,810 --> 00:12:13,520 is kind of up here conceptually, like, it 260 00:12:13,520 --> 00:12:16,740 is not without its hurdles and problems and bugs and mistakes. 261 00:12:16,740 --> 00:12:19,910 So let's introduce a few techniques and tools with which you can henceforth, 262 00:12:19,910 --> 00:12:23,110 starting this week and beyond, try to troubleshoot those problems yourself 263 00:12:23,110 --> 00:12:26,240 rather than just trying to read through the cryptic-looking error messages 264 00:12:26,240 --> 00:12:28,010 or reach out for help to another human. 265 00:12:28,010 --> 00:12:31,660 Let's see if software can actually answer some of these questions for you. 266 00:12:31,660 --> 00:12:32,960 So let me go ahead and do this. 267 00:12:32,960 --> 00:12:35,450 Let me go ahead and open up a sandbox here, 268 00:12:35,450 --> 00:12:38,340 and I'm going to go ahead and create a new file called 269 00:12:38,340 --> 00:12:43,190 buggy0.c in which I will, this time, deliberately introduce a bug. 270 00:12:43,190 --> 00:12:46,790 I'm going to go ahead and create my function called 271 00:12:46,790 --> 00:12:50,360 main, which, again, is the default, like when green flag is clicked. 272 00:12:50,360 --> 00:12:53,240 And I'm going to go ahead and say, printf, quote, unquote, 273 00:12:53,240 --> 00:12:56,190 "Hello world\n". 
274 00:12:56,190 --> 00:12:56,690 All right. 275 00:12:56,690 --> 00:12:57,750 Looks pretty good. 276 00:12:57,750 --> 00:13:01,300 I'm going to go ahead and compile buggy0, Enter, 277 00:13:01,300 --> 00:13:03,590 and of course, I get a bunch of error messages here. 278 00:13:03,590 --> 00:13:05,020 Let me zoom in on them. 279 00:13:05,020 --> 00:13:07,730 Fortunately, I only have two, but remember, you have to, have to, 280 00:13:07,730 --> 00:13:09,860 have to always scroll up to look at the first, 281 00:13:09,860 --> 00:13:12,320 because there might just be an annoying cascading effect from one earlier 282 00:13:12,320 --> 00:13:13,370 bug to the later. 283 00:13:13,370 --> 00:13:18,890 So buggy0.c, line 5, is what this means, character 5, so like 5 spaces in, 284 00:13:18,890 --> 00:13:22,430 implicitly declaring library function printf with dot, dot, dot. 285 00:13:22,430 --> 00:13:24,980 So you're going to start to see this pretty often if you make 286 00:13:24,980 --> 00:13:27,050 this particular mistake or oversight. 287 00:13:27,050 --> 00:13:29,690 Implicitly declaring something means you forgot 288 00:13:29,690 --> 00:13:31,610 to teach Clang that something exists. 289 00:13:31,610 --> 00:13:36,010 And you probably know from experience, perhaps now, what the solution is. 290 00:13:36,010 --> 00:13:38,550 What's the first mistake I made here? 291 00:13:38,550 --> 00:13:39,720 AUDIENCE: [INAUDIBLE]. 292 00:13:39,720 --> 00:13:42,020 DAVID J. MALAN: Yeah, I didn't include the header file, 293 00:13:42,020 --> 00:13:43,220 so to speak, for the library. 294 00:13:43,220 --> 00:13:47,860 I'm missing, at the top of the file, include stdio.h, 295 00:13:47,860 --> 00:13:49,690 in which printf is declared. 296 00:13:49,690 --> 00:13:53,230 But let's propose that you're not quite sure how to get to that point, 297 00:13:53,230 --> 00:13:55,320 and how can we get, actually, some help with this? 
298 00:13:55,320 --> 00:13:57,430 Let me actually increase the size of my terminal 299 00:13:57,430 --> 00:14:00,490 here, and recall that just a moment ago, I ran make buggy0, 300 00:14:00,490 --> 00:14:02,500 which yielded the errors that I saw. 301 00:14:02,500 --> 00:14:04,840 It turns out that installed in the sandbox 302 00:14:04,840 --> 00:14:07,660 is a command that we, the staff, wrote called help50. 303 00:14:07,660 --> 00:14:11,330 And this is just a program we wrote that takes as input any error 304 00:14:11,330 --> 00:14:14,740 messages that your code or some program has outputted. 305 00:14:14,740 --> 00:14:16,870 We kind of look for familiar words and phrases, 306 00:14:16,870 --> 00:14:20,680 just like a TF would in office hours, and if we recognize some error message, 307 00:14:20,680 --> 00:14:24,160 we're going to try to provide, either rhetorically or explicitly, 308 00:14:24,160 --> 00:14:25,880 some advice on how to handle it. 309 00:14:25,880 --> 00:14:29,470 So if I go ahead and run this command now, notice there's a bit more output. 310 00:14:29,470 --> 00:14:33,550 I see exactly the same output in white and green and red as before, 311 00:14:33,550 --> 00:14:36,790 but down below is some yellow, which comes specifically from help50. 312 00:14:36,790 --> 00:14:38,830 And if I go ahead and zoom in on this, you'll 313 00:14:38,830 --> 00:14:43,570 see that the line of output that we recognized 314 00:14:43,570 --> 00:14:46,150 is this one, that same one I verbally drew attention 315 00:14:46,150 --> 00:14:50,200 to before-- buggy0.c, line 5, error, implicitly declaring library function 316 00:14:50,200 --> 00:14:52,010 printf, and so forth. 317 00:14:52,010 --> 00:14:54,820 So here, without the background highlighting, but still in yellow, 318 00:14:54,820 --> 00:14:58,450 is our advice or a question a TF or CA might ask you in office hours. 
319 00:14:58,450 --> 00:15:02,080 Well, did you forget to include stdio.h in which printf 320 00:15:02,080 --> 00:15:05,290 is declared atop your file? 321 00:15:05,290 --> 00:15:08,710 And hopefully, our questions, rhetorical or otherwise, are correct, 322 00:15:08,710 --> 00:15:10,430 and that will get you further along. 323 00:15:10,430 --> 00:15:12,050 So let's go ahead and try that advice. 324 00:15:12,050 --> 00:15:15,220 So include stdio.h. 325 00:15:15,220 --> 00:15:16,970 Now let me go ahead and go back down here. 326 00:15:16,970 --> 00:15:19,570 And if you don't like clutter, you can type "clear," 327 00:15:19,570 --> 00:15:23,380 or hit Control+L in the terminal window to keep cleaning it like I do. 328 00:15:23,380 --> 00:15:29,120 If you want to go ahead now and run make buggy0, Enter, fewer errors, 329 00:15:29,120 --> 00:15:30,760 so that's progress, and not the same. 330 00:15:30,760 --> 00:15:33,430 So this one's, perhaps, a little easier. 331 00:15:33,430 --> 00:15:36,470 Reading the line, what line of code is buggy here? 332 00:15:36,470 --> 00:15:38,420 AUDIENCE: Forgot the semicolon. 333 00:15:38,420 --> 00:15:41,640 DAVID J. MALAN: Yeah, so this is now still on line 5, it turns out, 334 00:15:41,640 --> 00:15:42,940 but for a different reason. 335 00:15:42,940 --> 00:15:44,360 I seem to be missing a semicolon. 336 00:15:44,360 --> 00:15:47,030 But I could similarly ask help50 for help with that 337 00:15:47,030 --> 00:15:48,570 and hope that it recognizes my error. 338 00:15:48,570 --> 00:15:50,780 So this, too, should start being your first instinct. 
339 00:15:50,780 --> 00:15:52,760 If on first glance, you don't really understand 340 00:15:52,760 --> 00:15:54,290 what an error message is doing, even though you've 341 00:15:54,290 --> 00:15:57,320 scrolled to the very first one, like literally ask this program for help 342 00:15:57,320 --> 00:15:59,870 by rerunning the exact same command you just 343 00:15:59,870 --> 00:16:03,410 ran, but prefix it with help50 and a space, 344 00:16:03,410 --> 00:16:05,360 and that will run help50 for you. 345 00:16:05,360 --> 00:16:08,600 Any questions on that process? 346 00:16:08,600 --> 00:16:10,690 All right, let's take a look at one other program, 347 00:16:10,690 --> 00:16:15,580 for instance, that, this time, has a different error involved in it. 348 00:16:15,580 --> 00:16:19,210 So how about-- let me go ahead and whip up a quick program here. 349 00:16:19,210 --> 00:16:23,530 I'll call this buggy2.c for consistency with some of the samples 350 00:16:23,530 --> 00:16:25,360 we have online for you later. 351 00:16:25,360 --> 00:16:29,930 And in this example, I'm going to go ahead and write the correct thing 352 00:16:29,930 --> 00:16:33,820 at first, stdio.h, and then I'm going to have int main void, which 353 00:16:33,820 --> 00:16:35,350 just gets my whole program started. 354 00:16:35,350 --> 00:16:37,480 And then I'm going to have a loop, and recall for-- 355 00:16:37,480 --> 00:16:40,190 [CLEARS THROAT] excuse me-- Mario or some other program, 356 00:16:40,190 --> 00:16:44,120 you might have done something like int i gets 0, i is less than or equal to-- 357 00:16:44,120 --> 00:16:47,260 let's do this 10 times, and then i++. 358 00:16:47,260 --> 00:16:53,680 And all I want to do in this program is print out that value of i, as I can do, 359 00:16:53,680 --> 00:16:55,840 with the %i placeholder-- so a simple program. 360 00:16:55,840 --> 00:16:59,630 Just want it to count from 0 to 10. 
361 00:16:59,630 --> 00:17:04,720 So let's go ahead and run buggy2, or rather, I want to-- 362 00:17:04,720 --> 00:17:06,490 let's not print up-- 363 00:17:06,490 --> 00:17:07,420 rewind. 364 00:17:07,420 --> 00:17:12,190 Let's go ahead and just print out a hash symbol 365 00:17:12,190 --> 00:17:15,560 and not spoil the solution this way. 366 00:17:15,560 --> 00:17:17,480 So here, I go ahead and print out buggy2. 367 00:17:17,480 --> 00:17:21,920 My goal is now I will stipulate to print out just 10 hash symbols, one per line, 368 00:17:21,920 --> 00:17:23,270 which is what I want to do here. 369 00:17:23,270 --> 00:17:28,640 And now I'm going to go ahead and run ./buggy2, and I should see, hopefully, 370 00:17:28,640 --> 00:17:30,390 10 hashes. 371 00:17:30,390 --> 00:17:33,450 And I kind of spoiled this a little bit, but what do I instead see? 372 00:17:36,860 --> 00:17:39,800 Yeah, I think I see more than I expect. 373 00:17:39,800 --> 00:17:46,380 And we can kind of zoom in here and double check, so 1, 2, 3, 4, 5, 6, 7, 374 00:17:46,380 --> 00:17:48,550 8, 9, 10, ooh, 11. 375 00:17:48,550 --> 00:17:49,220 11. 376 00:17:49,220 --> 00:17:53,090 Now some of your eyes might already be darting to what the solution should be, 377 00:17:53,090 --> 00:17:55,080 but let's just propose that it's not obvious. 378 00:17:55,080 --> 00:17:57,290 And if it is actually not obvious, all the better, so how might 379 00:17:57,290 --> 00:17:59,990 you go about diagnosing this kind of problem, short of just 380 00:17:59,990 --> 00:18:02,010 reaching out and asking a human for help. 381 00:18:02,010 --> 00:18:04,600 This is not a problem that help50 can help with, 382 00:18:04,600 --> 00:18:06,020 because it's not an error message. 383 00:18:06,020 --> 00:18:07,720 Your program is working. 384 00:18:07,720 --> 00:18:09,890 It's just not outputting what you wanted it to work, 385 00:18:09,890 --> 00:18:13,400 but it's not an error message from the compiler with which help50 can help. 
386 00:18:13,400 --> 00:18:17,190 So you want to kind of get eyes into what your program is doing, 387 00:18:17,190 --> 00:18:20,420 and you want to understand, why are you printing 11 when you really 388 00:18:20,420 --> 00:18:22,310 are setting this up from 0 to 10? 389 00:18:22,310 --> 00:18:25,040 Well, one of the most common techniques in C or any language, 390 00:18:25,040 --> 00:18:29,340 honestly, is to use printf for just other purposes-- diagnostic purposes. 391 00:18:29,340 --> 00:18:32,210 For instance, there's not much going on in this program, 392 00:18:32,210 --> 00:18:35,090 but I'd argue that it would be interesting for me to know, 393 00:18:35,090 --> 00:18:37,250 and therefore understand my program, by just, 394 00:18:37,250 --> 00:18:41,690 let's print out this value of i on each iteration, 395 00:18:41,690 --> 00:18:44,150 as by doing the line of code that I earlier did, 396 00:18:44,150 --> 00:18:47,590 and just say something literally like, i is %i. 397 00:18:47,590 --> 00:18:49,640 I'm going to remove this ultimately, because it's 398 00:18:49,640 --> 00:18:52,140 going to make my program look a little silly, 399 00:18:52,140 --> 00:18:54,750 but it's going to help me understand what's going on. 400 00:18:54,750 --> 00:19:01,600 Let me go ahead and recompile buggy2, ./buggy2, and this time, 401 00:19:01,600 --> 00:19:03,110 I see a lot more output. 402 00:19:03,110 --> 00:19:07,260 But if I zoom in, now it's kind of-- 403 00:19:07,260 --> 00:19:10,180 now the computer is essentially helping me understand what's going on. 404 00:19:10,180 --> 00:19:11,950 When i is 0, here's one of them. 405 00:19:11,950 --> 00:19:13,460 When i is 1, here's another. 406 00:19:13,460 --> 00:19:16,510 I is 2, 3, 4, 5, 6, 7, 8, 9, and that looks good. 407 00:19:16,510 --> 00:19:19,960 But if we scroll a little further, it feels a little problematic 408 00:19:19,960 --> 00:19:22,210 that i can also be 10.
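A minimal sketch of the diagnostic printf technique described above. It is a reconstruction rather than the exact file shown on screen: the function name `print_hashes_buggy` and the returned count are assumptions, added so the loop can be called and checked on its own.

```c
#include <stdio.h>

// Runs the buggy loop with a temporary diagnostic printf.
// Returns how many hashes were printed (an addition for checking,
// not part of the lecture's program).
int print_hashes_buggy(void)
{
    int count = 0;
    for (int i = 0; i <= 10; i++) // bug: <= lets i reach 10, so 11 passes
    {
        printf("i is %i\n", i); // diagnostic line, to be removed once fixed
        printf("#\n");
        count++;
    }
    return count;
}
```

Called from main, this labels every hash with the value of i at that moment, which makes the extra pass with i equal to 10 easy to spot.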
409 00:19:22,210 --> 00:19:24,520 So what's logically the bug in this program? 410 00:19:24,520 --> 00:19:25,880 AUDIENCE: [INAUDIBLE]. 411 00:19:25,880 --> 00:19:26,750 DAVID J. MALAN: Yeah. 412 00:19:26,750 --> 00:19:29,750 I use less than or equal to, because I kind of confuse the paradigm. 413 00:19:29,750 --> 00:19:31,790 Like programmers tend to start counting at zero, 414 00:19:31,790 --> 00:19:34,820 apparently, but I want to do this 10 times, and in the human world, 415 00:19:34,820 --> 00:19:38,560 if I want to do something 10 times, I might count up to and including 10. 416 00:19:38,560 --> 00:19:39,900 But you can't have it both ways. 417 00:19:39,900 --> 00:19:41,780 You can't start at zero and end at 10 if you 418 00:19:41,780 --> 00:19:43,700 want to do something exactly 10 times. 419 00:19:43,700 --> 00:19:46,070 So there's a couple of possibilities here. 420 00:19:46,070 --> 00:19:48,310 How might we fix this? 421 00:19:48,310 --> 00:19:50,440 Yeah, so we could certainly change it to less than. 422 00:19:50,440 --> 00:19:53,320 What's another correct approach? 423 00:19:53,320 --> 00:19:56,450 Yeah, so we could leave this alone and just start counting at one, 424 00:19:56,450 --> 00:19:59,540 and if you're not actually printing the values in your actual program, 425 00:19:59,540 --> 00:20:01,910 that might be perfectly reasonable, too. 426 00:20:01,910 --> 00:20:03,100 It's just not conventional. 427 00:20:03,100 --> 00:20:06,230 Get comfortable with, quickly, just counting from zero, because that's just 428 00:20:06,230 --> 00:20:08,810 what most everyone does these days. 429 00:20:08,810 --> 00:20:11,060 But the technique here is just use printf. 430 00:20:11,060 --> 00:20:15,190 Like, when in doubt, literally use printf on this line, on this line, 431 00:20:15,190 --> 00:20:15,740 on this line. 
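The two fixes discussed above can be sketched side by side. The function names and the `n` parameter are assumptions made for illustration; the lecture's program hard-codes 10 inside main.

```c
#include <stdio.h>

// Conventional fix: start at 0 and stop before n.
// Returns the number of hashes printed (added only for checking).
int print_hashes(int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
    {
        printf("#\n");
        count++;
    }
    return count;
}

// Alternative fix: start at 1 and include n.
// Same number of iterations, just a less common convention.
int print_hashes_from_one(int n)
{
    int count = 0;
    for (int i = 1; i <= n; i++)
    {
        printf("#\n");
        count++;
    }
    return count;
}
```

With n set to 10, both versions run the loop body exactly ten times; the first form is the convention the lecture recommends getting used to.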
432 00:20:15,740 --> 00:20:18,800 Anywhere something is interesting maybe going on in your program, 433 00:20:18,800 --> 00:20:21,460 just use it to print out the strings that are in your variables, 434 00:20:21,460 --> 00:20:24,770 print out the integers that are in your variables, or anything else. 435 00:20:24,770 --> 00:20:26,690 And it allows you to kind of see, so to speak, 436 00:20:26,690 --> 00:20:32,630 what's going on inside of your program, printf. 437 00:20:32,630 --> 00:20:36,530 One last tool-- so it's not uncommon, when writing code, 438 00:20:36,530 --> 00:20:39,290 to maybe get a little sloppy early on, especially when you're not 439 00:20:39,290 --> 00:20:40,660 quite familiar with the patterns. 440 00:20:40,660 --> 00:20:43,910 And for instance, if I go ahead and do this 441 00:20:43,910 --> 00:20:47,750 by deleting a whole bunch of whitespace, even after fixing this mistake 442 00:20:47,750 --> 00:20:52,720 by going from zero to 10, is this program now correct, 443 00:20:52,720 --> 00:20:54,140 if the goal is to print 10 hashes? 444 00:20:57,130 --> 00:20:57,880 Yeah, I heard yes. 445 00:20:57,880 --> 00:20:58,820 Why is it correct? 446 00:20:58,820 --> 00:20:59,410 In what sense? 447 00:21:02,680 --> 00:21:03,440 Yeah, exactly. 448 00:21:03,440 --> 00:21:04,170 It still works. 449 00:21:04,170 --> 00:21:06,950 It prints out the 10 hashes, one per line, 450 00:21:06,950 --> 00:21:09,770 but it's poorly written in the sense of style. 451 00:21:09,770 --> 00:21:12,380 So recall that we tend to evaluate, and the world 452 00:21:12,380 --> 00:21:14,440 tends to think about code in at least three ways. 453 00:21:14,440 --> 00:21:16,690 One, the correctness-- does it do what it's supposed to do, 454 00:21:16,690 --> 00:21:17,550 like print 10 hashes? 455 00:21:17,550 --> 00:21:19,960 And yes, it does, because all I did was delete whitespace. 456 00:21:19,960 --> 00:21:22,730 I didn't actually change or break the code after making that fix. 
457 00:21:22,730 --> 00:21:25,900 Two is design, like how thoughtful, how well-written is the code? 458 00:21:25,900 --> 00:21:28,490 And frankly, it's kind of hard to write this in too many ways, 459 00:21:28,490 --> 00:21:29,670 because it's so few lines. 460 00:21:29,670 --> 00:21:31,670 But you'll see over time, as your programs grow, 461 00:21:31,670 --> 00:21:33,740 the teaching fellows and staff can provide you with feedback 462 00:21:33,740 --> 00:21:35,060 on the design of your code. 463 00:21:35,060 --> 00:21:36,950 But style is relatively easy. 464 00:21:36,950 --> 00:21:39,890 And I've been teaching it mostly by way of example, if you will, 465 00:21:39,890 --> 00:21:42,500 because I've been very methodically indenting my code 466 00:21:42,500 --> 00:21:45,440 and making sure everything looks very pretty, or at least pretty 467 00:21:45,440 --> 00:21:47,150 to a trained eye. 468 00:21:47,150 --> 00:21:49,550 But this, let's just stipulate, is not pretty. 469 00:21:49,550 --> 00:21:53,060 Like, left aligning everything still works, not incorrect, 470 00:21:53,060 --> 00:21:54,460 but it's poorly styled. 471 00:21:54,460 --> 00:21:56,740 And what would be an argument for not writing code 472 00:21:56,740 --> 00:21:58,580 like this and, instead, writing code the way 473 00:21:58,580 --> 00:22:02,020 I did a moment ago, albeit after fixing the bug? 474 00:22:02,020 --> 00:22:02,680 Yeah. 475 00:22:02,680 --> 00:22:05,560 AUDIENCE: It'll help you identify each little subroutine that 476 00:22:05,560 --> 00:22:10,040 goes through the thing, so you know this section is here. 477 00:22:10,040 --> 00:22:11,030 DAVID J. MALAN: Yeah. 478 00:22:11,030 --> 00:22:13,340 AUDIENCE: [INAUDIBLE] next one, so you know where everything is. 479 00:22:13,340 --> 00:22:14,180 DAVID J. MALAN: Exactly. 480 00:22:14,180 --> 00:22:15,090 Let me summarize this. 
481 00:22:15,090 --> 00:22:17,300 It allows you to see, more visually, what 482 00:22:17,300 --> 00:22:20,540 are the individual subroutines or blocks of code doing 483 00:22:20,540 --> 00:22:22,070 that are associated with each other? 484 00:22:22,070 --> 00:22:25,280 Scratch is colorful, and it has shapes, like the hugging shape 485 00:22:25,280 --> 00:22:27,200 that a lot of the control blocks make, to make 486 00:22:27,200 --> 00:22:30,800 clear visually to the programmer that this block encompasses others, 487 00:22:30,800 --> 00:22:34,190 and, therefore, this repeats block or this forever block 488 00:22:34,190 --> 00:22:36,330 is doing these things again and again and again. 489 00:22:36,330 --> 00:22:38,930 That's the role that these curly braces serve, and indentation 490 00:22:38,930 --> 00:22:41,240 in this and in other contexts just helps it 491 00:22:41,240 --> 00:22:45,080 become more obvious to the programmer what is inside of what 492 00:22:45,080 --> 00:22:46,480 and what is happening where. 493 00:22:46,480 --> 00:22:49,010 So this is just better written, because you 494 00:22:49,010 --> 00:22:52,820 can see that the code inside of main is everything that's indented here. 495 00:22:52,820 --> 00:22:56,090 The code that's inside the for loop is everything that's indented here. 496 00:22:56,090 --> 00:22:58,370 So it's just for us human readers, teaching fellows 497 00:22:58,370 --> 00:23:01,250 in the case of a course, or colleagues in the case of the real world. 498 00:23:01,250 --> 00:23:05,240 But suppose that you don't quite see these patterns too readily initially. 499 00:23:05,240 --> 00:23:06,710 That, too, is fine. 500 00:23:06,710 --> 00:23:09,320 CS50 has on its website what we call a style guide. 
501 00:23:09,320 --> 00:23:11,690 It's just a summary of what your code should 502 00:23:11,690 --> 00:23:14,210 look like when using certain features of C-- loops, 503 00:23:14,210 --> 00:23:16,230 conditions, variables, functions, and so forth. 504 00:23:16,230 --> 00:23:18,110 And it's linked on the course's website. 505 00:23:18,110 --> 00:23:20,240 But there's also a tool that you can use when 506 00:23:20,240 --> 00:23:23,510 writing your code that'll help you clean it up and make it consistent, 507 00:23:23,510 --> 00:23:26,300 not just for the sake of making it consistent with the style guide, 508 00:23:26,300 --> 00:23:28,770 but just making your own code more readable. 509 00:23:28,770 --> 00:23:31,070 So for instance, if I go ahead and run a command called 510 00:23:31,070 --> 00:23:37,520 style50 on this program, buggy2.c, and then hit Enter, 511 00:23:37,520 --> 00:23:40,490 I'm going to see some output that's colorful. 512 00:23:40,490 --> 00:23:44,240 I see my own code in white, and then I see, anywhere 513 00:23:44,240 --> 00:23:47,210 I should have indented, green spaces that 514 00:23:47,210 --> 00:23:50,060 are sort of encouraging me to put space, space, space, space here. 515 00:23:50,060 --> 00:23:51,560 Put space, space, space, space here. 516 00:23:51,560 --> 00:23:54,440 Put eight spaces here, four spaces here, and so forth, 517 00:23:54,440 --> 00:23:56,940 and then it's reminding me I should add comments as well. 518 00:23:56,940 --> 00:23:58,820 This is a short program-- doesn't necessarily 519 00:23:58,820 --> 00:24:01,250 need a lot of commenting to explain what's going on. 520 00:24:01,250 --> 00:24:04,100 But just one //, like we saw last week to explain, 521 00:24:04,100 --> 00:24:06,830 maybe at the top of the file or the top of the block of code, 522 00:24:06,830 --> 00:24:08,830 would make style50 happy as well. 523 00:24:08,830 --> 00:24:09,540 So let's do that.
524 00:24:09,540 --> 00:24:13,980 Let me go ahead and take its advice and actually indent this with Tab, 525 00:24:13,980 --> 00:24:17,150 this with Tab, this with Tab, this with Tab, and this once more. 526 00:24:17,150 --> 00:24:20,150 And you'll notice that on your keyboard, even though you're hitting Tab, 527 00:24:20,150 --> 00:24:23,330 it's actually converting it for you, which is very common to four spaces, 528 00:24:23,330 --> 00:24:25,700 so you don't have to hit the spacebar four times. 529 00:24:25,700 --> 00:24:27,440 Just get into the habit of using Tab. 530 00:24:27,440 --> 00:24:30,320 And let me go ahead and write a comment here. 531 00:24:30,320 --> 00:24:32,480 "Print 10 hashes." 532 00:24:32,480 --> 00:24:35,050 This way, my colleagues, my teaching fellow, myself in a week 533 00:24:35,050 --> 00:24:37,890 don't have to read my own code again and figure out what it's doing. 534 00:24:37,890 --> 00:24:40,670 I can read the comments alone per the //. 535 00:24:40,670 --> 00:24:44,480 If I run style50 again, now it looks good. 536 00:24:44,480 --> 00:24:47,780 It's in accordance with the style guide, and it's just more prettily written, 537 00:24:47,780 --> 00:24:50,480 so pretty printed would be a term of art in programming 538 00:24:50,480 --> 00:24:53,720 when your code looks good and isn't just correct. 539 00:24:53,720 --> 00:24:55,670 Any questions then? 540 00:24:55,670 --> 00:24:56,620 Yeah. 541 00:24:56,620 --> 00:24:58,990 AUDIENCE: I tried using [INAUDIBLE] this past week 542 00:24:58,990 --> 00:25:00,930 and it said I needed a new program. 543 00:25:00,930 --> 00:25:01,930 DAVID J. MALAN: That's-- 544 00:25:01,930 --> 00:25:04,010 it wasn't enabled for the first week of the class. 545 00:25:04,010 --> 00:25:06,570 It's enabled as of right now and henceforth. 546 00:25:06,570 --> 00:25:08,930 Other questions? 547 00:25:08,930 --> 00:25:09,430 No. 
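To make the style point concrete, here is a sketch of the same loop written twice: once left-aligned, once indented with four spaces and a one-line comment, as described above. The function names and return values are assumptions added for illustration; the lecture's code lives in main, and style50's actual output is not reproduced here.

```c
#include <stdio.h>

// Left-aligned: compiles and behaves the same, but the nesting is hard to see.
int hashes_unstyled(void)
{
int count = 0;
for (int i = 0; i < 10; i++)
{
printf("#\n");
count++;
}
return count;
}

// Print 10 hashes
int hashes_styled(void)
{
    int count = 0;
    for (int i = 0; i < 10; i++)
    {
        printf("#\n");
        count++;
    }
    return count;
}
```

Both functions are equally correct; only the second makes it obvious at a glance which lines belong to the function and which belong to the loop.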
548 00:25:09,430 --> 00:25:13,150 All right, so just to recap then, three tools to have in the proverbial toolbox 549 00:25:13,150 --> 00:25:16,270 now are help50 anytime you see an error message that you don't understand, 550 00:25:16,270 --> 00:25:18,850 whether it's with make or Clang or, perhaps, something else. 551 00:25:18,850 --> 00:25:21,280 Printf-- when you've got a logical program-- 552 00:25:21,280 --> 00:25:24,340 a bug in your program, and it's just not working the way it's supposed to 553 00:25:24,340 --> 00:25:27,970 or the way the problem set tells you it should, and then style50 554 00:25:27,970 --> 00:25:31,960 when you want to make sure that, does my code look right in terms of style, 555 00:25:31,960 --> 00:25:33,370 and is it as readable as possible? 556 00:25:33,370 --> 00:25:35,750 And honestly, you'll find us at office hours and the like 557 00:25:35,750 --> 00:25:38,380 often encouraging you, hey, before we answer this question, 558 00:25:38,380 --> 00:25:40,180 can you please run style50 on your code? 559 00:25:40,180 --> 00:25:43,180 Can you please clean up your code, because it just makes our lives, too, 560 00:25:43,180 --> 00:25:45,940 as other humans so much easier when we can understand what's 561 00:25:45,940 --> 00:25:49,390 going on without having to visually figure out what parentheses and curly 562 00:25:49,390 --> 00:25:50,350 braces line up. 563 00:25:50,350 --> 00:25:52,060 And so do get into that habit, because it 564 00:25:52,060 --> 00:25:57,700 will save you time from having to waste time parsing things visually yourself. 565 00:25:57,700 --> 00:25:58,750 All right. 566 00:25:58,750 --> 00:26:01,570 So there's not just CPUs in computers. 567 00:26:01,570 --> 00:26:03,510 CPUs are the brains, central processing unit, 568 00:26:03,510 --> 00:26:06,760 and that's why we keep emphasizing the instructions that computers understand. 569 00:26:06,760 --> 00:26:09,610 There's also this, which we saw last time, too. 
570 00:26:09,610 --> 00:26:12,350 This is an example of what type of hardware? 571 00:26:12,350 --> 00:26:12,940 AUDIENCE: RAM. 572 00:26:12,940 --> 00:26:15,110 DAVID J. MALAN: RAM, or Random Access Memory. 573 00:26:15,110 --> 00:26:17,620 This is the type of memory that laptops, desktops, servers 574 00:26:17,620 --> 00:26:21,790 have that is used whenever you run a program or open a file. 575 00:26:21,790 --> 00:26:25,060 There's another type of memory called hard drives or solid state drives, 576 00:26:25,060 --> 00:26:27,010 which you're probably familiar as a consumer, 577 00:26:27,010 --> 00:26:29,950 and that's just where your files are stored permanently. 578 00:26:29,950 --> 00:26:30,830 Your battery can die. 579 00:26:30,830 --> 00:26:32,920 You can pull the plug from your laptop or desktop, 580 00:26:32,920 --> 00:26:35,350 and any files saved on a hard drive are persistent. 581 00:26:35,350 --> 00:26:37,420 They stay there because of the technology 582 00:26:37,420 --> 00:26:38,960 being used to implement that. 583 00:26:38,960 --> 00:26:41,500 But RAM is more ephemeral. 584 00:26:41,500 --> 00:26:43,600 RAM is powered only by electricity. 585 00:26:43,600 --> 00:26:46,720 It's only used when the power is on or the battery is charged, 586 00:26:46,720 --> 00:26:49,960 and it's where your files and programs live effectively when 587 00:26:49,960 --> 00:26:52,030 you double click on them and open them. 588 00:26:52,030 --> 00:26:54,460 So when you double click on something like Microsoft Word, 589 00:26:54,460 --> 00:26:59,140 it is copied from your hard drive long term into this type of memory, 590 00:26:59,140 --> 00:27:02,440 because this type of memory, though smaller in capacity-- 591 00:27:02,440 --> 00:27:04,180 you don't have as many bytes of it-- 592 00:27:04,180 --> 00:27:06,760 but it is much, much, much, much faster. 
593 00:27:06,760 --> 00:27:09,550 Similarly, when you open a document, or you go to a web page, 594 00:27:09,550 --> 00:27:13,210 the contents of the file you're seeing are stored in this type of hardware, 595 00:27:13,210 --> 00:27:15,760 because even though you don't have terribly many bytes of it, 596 00:27:15,760 --> 00:27:18,230 it's just much, much, much, much faster. 597 00:27:18,230 --> 00:27:20,950 And so this will be thematic in computer science and in hardware. 598 00:27:20,950 --> 00:27:23,950 You sort of have lots of cheap, slow stuff, 599 00:27:23,950 --> 00:27:27,010 like hard disk space, relatively speaking, and you have a little less 600 00:27:27,010 --> 00:27:30,040 of the more expensive but faster stuff like RAM. 601 00:27:30,040 --> 00:27:33,210 And you have just one, usually, CPU, which is the really fast thing that 602 00:27:33,210 --> 00:27:34,990 can do a billion things per second. 603 00:27:34,990 --> 00:27:37,250 But it, too, is more expensive. 604 00:27:37,250 --> 00:27:39,760 So there's four visible chips on this thing, if you will. 605 00:27:39,760 --> 00:27:42,640 And we won't get into the details of how these things work, but let's 606 00:27:42,640 --> 00:27:46,030 just zoom in on this one black chip here and focus on it 607 00:27:46,030 --> 00:27:48,550 as being representative as some amount of memory. 608 00:27:48,550 --> 00:27:50,830 Maybe it's one megabyte, one million bytes. 609 00:27:50,830 --> 00:27:54,310 Maybe it's even one gigabyte these days, one billion bytes. 610 00:27:54,310 --> 00:27:57,430 But this is to say that this chip can be thought of as just having 611 00:27:57,430 --> 00:27:58,740 a bunch of bytes in it. 612 00:27:58,740 --> 00:27:59,860 This is not to scale. 613 00:27:59,860 --> 00:28:01,790 You have many more bytes than these, but let 614 00:28:01,790 --> 00:28:04,090 me propose that you just think of each of these squares 615 00:28:04,090 --> 00:28:06,100 here as representing one byte. 
616 00:28:06,100 --> 00:28:08,920 So the very first byte of memory I have access to is here. 617 00:28:08,920 --> 00:28:10,670 Next one is here, and so forth. 618 00:28:10,670 --> 00:28:13,420 And the fact that they wrap around is just an artist rendition. 619 00:28:13,420 --> 00:28:15,760 These things you can think of just virtually as going 620 00:28:15,760 --> 00:28:19,580 left to right, not in any kind of grid, but physically, they look like this. 621 00:28:19,580 --> 00:28:23,410 So when you actually create a variable in a program like C, 622 00:28:23,410 --> 00:28:24,630 like you need a char. 623 00:28:24,630 --> 00:28:27,760 A char tends to be one byte or eight bits, 624 00:28:27,760 --> 00:28:32,300 and so that means when you have a variable of type char in a C program, 625 00:28:32,300 --> 00:28:35,920 it goes, literally, physically in one of these boxes, 626 00:28:35,920 --> 00:28:37,420 inside of your computer's RAM. 627 00:28:37,420 --> 00:28:40,520 So for instance, it might take up this much space at top left. 628 00:28:40,520 --> 00:28:42,820 If you have a bigger type of data, so you 629 00:28:42,820 --> 00:28:45,820 have an integer, which tends to be four bytes or 32 bits, 630 00:28:45,820 --> 00:28:48,910 you might need more than one square, so the computer might give you access 631 00:28:48,910 --> 00:28:50,990 to four squares instead. 632 00:28:50,990 --> 00:28:54,010 And you have 32 bits spanning that region of memory. 633 00:28:54,010 --> 00:28:56,070 But honestly, I chose those boxes arbitrarily. 634 00:28:56,070 --> 00:28:58,780 They could be anywhere in that chip or in any of the other chips. 635 00:28:58,780 --> 00:29:01,620 It's up to the computer to just remember where they are for you. 636 00:29:01,620 --> 00:29:04,240 You don't need to remember that, per se. 
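A small sketch for checking these sizes on a given machine. The C standard fixes sizeof(char) at 1; an int is 4 bytes on most current systems, as the lecture says, but that is typical rather than guaranteed. The helper names `bits_in` and `print_sizes` are made up for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

// Converts a size in bytes to a size in bits.
// CHAR_BIT is the number of bits per byte (8 on most systems).
size_t bits_in(size_t bytes)
{
    return bytes * CHAR_BIT;
}

// Prints how much memory a char and an int occupy on this machine.
void print_sizes(void)
{
    printf("char: %zu byte(s), %zu bits\n", sizeof(char), bits_in(sizeof(char)));
    printf("int:  %zu byte(s), %zu bits\n", sizeof(int), bits_in(sizeof(int)));
}
```

Calling print_sizes from main shows how many of the squares in the picture each type would occupy.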
637 00:29:04,240 --> 00:29:07,060 But if we think about this grid, it turns out 638 00:29:07,060 --> 00:29:10,540 this is actually very valuable that we have chunks of memory-- 639 00:29:10,540 --> 00:29:11,780 bytes, if you will-- 640 00:29:11,780 --> 00:29:13,960 that are back to back to back to back. 641 00:29:13,960 --> 00:29:16,250 And in fact, there's a word for this technique. 642 00:29:16,250 --> 00:29:17,680 This is contiguous memory-- 643 00:29:17,680 --> 00:29:19,270 back to back to back to back to back. 644 00:29:19,270 --> 00:29:23,210 And in general, in programming, this is referred to as an array. 645 00:29:23,210 --> 00:29:25,510 You might recall from Scratch, if you use this feature, 646 00:29:25,510 --> 00:29:27,250 it actually has things called lists, which 647 00:29:27,250 --> 00:29:30,250 are exactly that-- lists of values, lists of words, lists of strings. 648 00:29:30,250 --> 00:29:33,020 An array is just a contiguous chunk of memory, such 649 00:29:33,020 --> 00:29:35,770 that you can store something here, something here, something here, 650 00:29:35,770 --> 00:29:37,850 something here, and so forth. 651 00:29:37,850 --> 00:29:41,560 So it turns out an array, this super simple primitive, 652 00:29:41,560 --> 00:29:43,900 is actually incredibly powerful. 653 00:29:43,900 --> 00:29:46,900 Just being able to store things in my computer's memory 654 00:29:46,900 --> 00:29:52,630 back to back to back to back enables so many possibilities, both design-wise, 655 00:29:52,630 --> 00:29:56,710 like how well I can write my code, and also how fast I can make my code run. 656 00:29:56,710 --> 00:29:59,810 So let me go ahead and take out an example. 657 00:29:59,810 --> 00:30:04,690 Let me go ahead and open up, for instance, a new file in a sandbox, 658 00:30:04,690 --> 00:30:06,700 and we'll call this score0. 659 00:30:06,700 --> 00:30:12,150 So let me go ahead and close this one, create a new file called scores0.c. 
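The idea of contiguous memory can be made visible with a short sketch. Array syntax has not been introduced at this point in the lecture, so treat the declaration below as a preview; the name `gap_between_neighbors` and the three sample values are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdio.h>

// Prints where each element of a small int array lives in memory and
// returns the distance in bytes between the first two elements.
size_t gap_between_neighbors(void)
{
    int scores[3] = {100, 50, 75}; // illustrative values only

    for (int i = 0; i < 3; i++)
    {
        printf("scores[%i] = %i at address %p\n", i, scores[i], (void *) &scores[i]);
    }

    // Casting to char * makes the subtraction count bytes, not ints.
    return (size_t) ((char *) &scores[1] - (char *) &scores[0]);
}
```

Because the elements sit back to back, the returned gap equals sizeof(int), and the printed addresses step by that same amount.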
660 00:30:12,150 --> 00:30:16,570 And in this file, let's go ahead and write a relatively simple program. 661 00:30:16,570 --> 00:30:18,790 Let me go ahead and, as usual, give myself access 662 00:30:18,790 --> 00:30:22,690 to some helpful functions-- cs50.h and stdio.h. 663 00:30:22,690 --> 00:30:25,300 And no need to copy all this down verbatim, if you don't like. 664 00:30:25,300 --> 00:30:28,210 Everything will have or is already on the course's website. 665 00:30:28,210 --> 00:30:30,880 Let me start my program as usual with int main void. 666 00:30:30,880 --> 00:30:33,730 And then let me write a program, as this program's name implies, 667 00:30:33,730 --> 00:30:38,470 that, like, asks the user for three scores on recent problem sets, 668 00:30:38,470 --> 00:30:41,470 quizzes, whatever, and then kind of creates a very simple chart of them, 669 00:30:41,470 --> 00:30:44,440 like a bar chart to kind of help me visualize how well 670 00:30:44,440 --> 00:30:46,340 or how poorly I did on something. 671 00:30:46,340 --> 00:30:49,450 So if I want to get an integer, no surprise, 672 00:30:49,450 --> 00:30:51,610 we can use the get int function, and I can just 673 00:30:51,610 --> 00:30:54,280 ask the user for their first score. 674 00:30:54,280 --> 00:30:56,500 But I should probably do something with this score, 675 00:30:56,500 --> 00:31:00,160 and on the left hand side of this, what do I typically put? 676 00:31:00,160 --> 00:31:00,660 Yeah. 677 00:31:00,660 --> 00:31:04,720 So int-- sure, score 1 equals this, and then my semi-colon. 678 00:31:04,720 --> 00:31:07,480 So you might not have had many occasions to use ints just yet, 679 00:31:07,480 --> 00:31:09,370 but get int is in the cs50 library. 680 00:31:09,370 --> 00:31:11,170 This is the so-called prompt that the human 681 00:31:11,170 --> 00:31:13,600 sees, and let me actually fix my space, because I 682 00:31:13,600 --> 00:31:16,630 want the human to see the space after the colon. 
683 00:31:16,630 --> 00:31:18,310 But that's just an aesthetic detail. 684 00:31:18,310 --> 00:31:21,350 And then when I get back this value, its return value-- 685 00:31:21,350 --> 00:31:23,650 just like Aaron, last week, handed me a piece of paper, 686 00:31:23,650 --> 00:31:26,710 so does get int hand me a virtual piece of paper with a number 687 00:31:26,710 --> 00:31:29,500 that I'm going to store in a variable called Score 1. 688 00:31:29,500 --> 00:31:34,690 And now just to be clear, what has just happened effectively is this. 689 00:31:34,690 --> 00:31:39,550 The moment you create a variable of type int, which is four bytes, 690 00:31:39,550 --> 00:31:42,430 literally, this is what Clang or, more generally, 691 00:31:42,430 --> 00:31:44,230 the computer has done for you. 692 00:31:44,230 --> 00:31:47,890 That int that the human typed in is stored literally 693 00:31:47,890 --> 00:31:51,730 in four contiguous bytes back to back to back, maybe here, maybe here, 694 00:31:51,730 --> 00:31:52,480 but together. 695 00:31:52,480 --> 00:31:55,540 So that's all that's going on when you're actually using C. 696 00:31:55,540 --> 00:31:58,240 So let me go back into my code here, and now I 697 00:31:58,240 --> 00:32:00,380 want to-- it's not interesting to plot one score. 698 00:32:00,380 --> 00:32:01,750 So let's go ahead and do another. 699 00:32:01,750 --> 00:32:08,080 So int Score 2, get int, get int, and I'll ask the user for score 2, 700 00:32:08,080 --> 00:32:13,800 semi-colon, and then let's get one more, Score 3, get int, call it Score 3, 701 00:32:13,800 --> 00:32:14,830 semi-colon. 702 00:32:14,830 --> 00:32:17,390 All right, so now let me go ahead and generate a bar, 703 00:32:17,390 --> 00:32:18,490 like a bar chart of this. 704 00:32:18,490 --> 00:32:20,530 I'm going to use what we'll call ASCII art. 705 00:32:20,530 --> 00:32:22,480 ASCII, of course, is just text, recall-- 706 00:32:22,480 --> 00:32:23,780 very simple text in a computer. 
707 00:32:23,780 --> 00:32:27,670 And I can kind of make a bar chart pretty simply by just printing out 708 00:32:27,670 --> 00:32:30,530 like a bunch of hashes horizontally, so a short bar 709 00:32:30,530 --> 00:32:34,090 will represent a small number, and a long bar will represent a big number. 710 00:32:34,090 --> 00:32:38,410 So let me go ahead and say to the user, all right, here's your Score 1. 711 00:32:38,410 --> 00:32:41,320 I'm going to go ahead, then, and say, for int i get 0. 712 00:32:41,320 --> 00:32:46,000 I is less than Score 1, i++. 713 00:32:46,000 --> 00:32:48,970 And now if I scroll down and give myself a bit of room here, 714 00:32:48,970 --> 00:32:53,740 let me go ahead and implement just a simple print. 715 00:32:53,740 --> 00:32:57,580 So go ahead and print out a hash, and then when you're all done with that, 716 00:32:57,580 --> 00:33:01,660 print out a new line at the end of that loop. 717 00:33:01,660 --> 00:33:03,160 And let's just pause there. 718 00:33:03,160 --> 00:33:05,830 Just to recap, I've asked the human for three scores. 719 00:33:05,830 --> 00:33:09,070 I'm only doing something with one of them at the moment, so in fact, 720 00:33:09,070 --> 00:33:13,420 just as a quick check, let me delete those so as to not get ahead of myself. 721 00:33:13,420 --> 00:33:15,060 Let me do make score 0. 722 00:33:15,060 --> 00:33:16,380 Cross my fingers. 723 00:33:16,380 --> 00:33:17,510 OK, no errors. 724 00:33:17,510 --> 00:33:22,480 Now let me go ahead and do ./score0, and your first score on a pset this year 725 00:33:22,480 --> 00:33:24,580 out of 100 has been? 726 00:33:24,580 --> 00:33:26,170 OK, 100. 727 00:33:26,170 --> 00:33:27,090 And good job. 728 00:33:27,090 --> 00:33:29,260 So it's a really long bar, and if we count those up, 729 00:33:29,260 --> 00:33:31,380 hopefully, there's actually 100 bars. 730 00:33:31,380 --> 00:33:33,760 And if we run it again and say, eh, it didn't go so well. 
731 00:33:33,760 --> 00:33:35,170 I got a 50. 732 00:33:35,170 --> 00:33:36,640 That's half as big a bar. 733 00:33:36,640 --> 00:33:39,130 So it seems like we're on our way correctness-wise. 734 00:33:39,130 --> 00:33:41,200 So now let me go ahead and get the other scores. 735 00:33:41,200 --> 00:33:42,860 Well, I had them here a moment ago. 736 00:33:42,860 --> 00:33:46,930 So let me go ahead and just, well, copy, paste, and change this to two, 737 00:33:46,930 --> 00:33:50,020 change this to three, change this to three, this to three. 738 00:33:50,020 --> 00:33:53,020 All right, I know how to print bars clearly, so let me go ahead 739 00:33:53,020 --> 00:33:56,880 and do this, and then do this, and then fix the indentation. 740 00:33:56,880 --> 00:33:58,510 I don't want to say Score 1 everywhere. 741 00:33:58,510 --> 00:34:00,640 I want to say a Score 2, Score 2. 742 00:34:00,640 --> 00:34:03,370 I mean you're probably being rubbed the wrong way that this 743 00:34:03,370 --> 00:34:06,040 is both tedious and sloppy, and why? 744 00:34:06,040 --> 00:34:07,870 What am I doing poorly now design-wise? 745 00:34:07,870 --> 00:34:09,330 AUDIENCE: Copying and pasting code. 746 00:34:09,330 --> 00:34:11,650 DAVID J. MALAN: Like copy-pasting almost always bad, right? 747 00:34:11,650 --> 00:34:13,420 There's redundancy here, but that's fine. 748 00:34:13,420 --> 00:34:15,670 Let's prioritize correctness, at least, for now. 749 00:34:15,670 --> 00:34:18,110 So let me go ahead and make Score 0. 750 00:34:18,110 --> 00:34:21,290 All right, no mistakes-- ./score0. 751 00:34:21,290 --> 00:34:22,210 And then Tab it. 752 00:34:22,210 --> 00:34:24,630 Let me go ahead now and run-- 753 00:34:24,630 --> 00:34:26,170 OK, we got 100 the first time. 754 00:34:26,170 --> 00:34:27,590 We got 50 the-- 755 00:34:27,590 --> 00:34:29,320 oh, that's a bug. 756 00:34:29,320 --> 00:34:31,560 What did I do there? 757 00:34:31,560 --> 00:34:33,810 See, this is what happens when you copy-paste. 
758 00:34:33,810 --> 00:34:34,780 So let's fix this. 759 00:34:34,780 --> 00:34:37,810 That should say Score 2, so Control+C will quit a program. 760 00:34:37,810 --> 00:34:42,340 Make score 0 will recreate it. ./0, Enter-- 761 00:34:42,340 --> 00:34:43,260 all right, here we go. 762 00:34:43,260 --> 00:34:44,850 100, 50. 763 00:34:44,850 --> 00:34:46,020 Let's split the difference-- 764 00:34:46,020 --> 00:34:47,260 75. 765 00:34:47,260 --> 00:34:51,000 All right, so this is a simple bar chart horizontally drawn 766 00:34:51,000 --> 00:34:55,350 of each of my three scores, where this is 100, this is 50, and this is 75. 767 00:34:55,350 --> 00:34:57,820 But there's opportunities for improvement here. 768 00:34:57,820 --> 00:34:59,850 So one, it rubbed some folks the wrong way 769 00:34:59,850 --> 00:35:05,190 already that we were literally copying and pasting code. 770 00:35:05,190 --> 00:35:09,430 So where is one opportunity for improvement here? 771 00:35:09,430 --> 00:35:13,000 What should I do instead of copying and pasting that code again and again? 772 00:35:13,000 --> 00:35:15,640 What ingredient can you bring? 773 00:35:15,640 --> 00:35:19,100 OK, so we can use a loop and actually just do the same thing three times. 774 00:35:19,100 --> 00:35:22,280 So let's try that. 775 00:35:22,280 --> 00:35:25,310 Let me go ahead and do this. 776 00:35:25,310 --> 00:35:28,660 So let's go ahead and delete the copy-paste I did, 777 00:35:28,660 --> 00:35:35,620 and let me go ahead and say, OK, well, for int i get zero, i less than 3, i++. 778 00:35:35,620 --> 00:35:37,660 Let me create a bracket. 779 00:35:37,660 --> 00:35:39,460 I can highlight multiple lines and hit Tab, 780 00:35:39,460 --> 00:35:41,590 and they'll all indent for me, which is convenient. 781 00:35:41,590 --> 00:35:44,270 And can I do this now, for instance? 782 00:35:48,860 --> 00:35:50,240 Say it a little louder. 
783 00:35:50,240 --> 00:35:54,660 AUDIENCE: If you [INAUDIBLE] to a specific [INAUDIBLE].. 784 00:35:54,660 --> 00:35:56,580 DAVID J. MALAN: Yeah, I'm a little worried. 785 00:35:56,580 --> 00:36:01,420 As you're noting here, we're using on line 13 here the same variable, so mm. 786 00:36:01,420 --> 00:36:03,420 So it's good instincts, but I feel like the fact 787 00:36:03,420 --> 00:36:05,160 that this program, unlike last week, we're 788 00:36:05,160 --> 00:36:06,790 now collecting multiple pieces of data. 789 00:36:06,790 --> 00:36:08,090 Loops are breaking down for us. 790 00:36:08,090 --> 00:36:08,590 Yeah. 791 00:36:08,590 --> 00:36:13,020 AUDIENCE: [INAUDIBLE] function [INAUDIBLE] takes in-- 792 00:36:13,020 --> 00:36:16,980 like you can have it [INAUDIBLE]. 793 00:36:16,980 --> 00:36:17,770 DAVID J. MALAN: OK. 794 00:36:17,770 --> 00:36:20,520 AUDIENCE: So like an input of how many scores you wanted to enter. 795 00:36:20,520 --> 00:36:21,410 DAVID J. MALAN: OK. 796 00:36:21,410 --> 00:36:23,690 AUDIENCE: And then [INAUDIBLE]. 797 00:36:23,690 --> 00:36:25,610 DAVID J. MALAN: Yeah, we can implement another 798 00:36:25,610 --> 00:36:27,810 function that factors out some of this functionality. 799 00:36:27,810 --> 00:36:28,650 Any other thoughts? 800 00:36:28,650 --> 00:36:30,790 AUDIENCE: Store your scores in an array. 801 00:36:30,790 --> 00:36:33,540 DAVID J. MALAN: OK, so we could also store our scores in an array. 802 00:36:33,540 --> 00:36:34,910 So let's do these in order then, in fact. 803 00:36:34,910 --> 00:36:37,740 So loops are wonderful when you want to do something again and again 804 00:36:37,740 --> 00:36:39,890 and again, but the whole purpose of a function, 805 00:36:39,890 --> 00:36:43,340 fundamentally, is to factor out common functionality. 
806 00:36:43,340 --> 00:36:45,340 And there might still be a loop in the solution, 807 00:36:45,340 --> 00:36:48,130 but the real fundamental problem with what I was doing a moment ago 808 00:36:48,130 --> 00:36:50,210 was I was copying and pasting functionality-- 809 00:36:50,210 --> 00:36:52,700 shouldn't need to do that, because in both C and Scratch, 810 00:36:52,700 --> 00:36:54,890 we had the ability to make our own functions. 811 00:36:54,890 --> 00:36:55,720 So let's do that. 812 00:36:55,720 --> 00:36:58,010 Let me undo my loop changes here, just to get us 813 00:36:58,010 --> 00:36:59,960 back to where we were a moment ago. 814 00:36:59,960 --> 00:37:02,790 And let me go ahead and, instead, clean this up a little bit. 815 00:37:02,790 --> 00:37:04,910 Let me go ahead and create a new function 816 00:37:04,910 --> 00:37:07,850 down here that I'm going to call, say, Chart, just 817 00:37:07,850 --> 00:37:09,380 to create a chart for myself. 818 00:37:09,380 --> 00:37:12,840 And it's going to take as input a score, but I could call this anything I want. 819 00:37:12,840 --> 00:37:15,960 It's void as its return type, because I don't need it to hand me something 820 00:37:15,960 --> 00:37:16,460 back. 821 00:37:16,460 --> 00:37:18,010 Like I'm not getting a string from the user. 822 00:37:18,010 --> 00:37:19,330 I'm just printing a chart. 823 00:37:19,330 --> 00:37:21,680 It's a so-called side effect or output. 824 00:37:21,680 --> 00:37:25,220 Now I'm going to go ahead and do my loop here for int i get 0. 825 00:37:25,220 --> 00:37:27,500 i is less than-- 826 00:37:27,500 --> 00:37:32,400 how many hashes do I want to print if I'm being passed in the user's score? 827 00:37:32,400 --> 00:37:34,770 Like, is this 3 here? 828 00:37:34,770 --> 00:37:35,860 AUDIENCE: The score. 829 00:37:35,860 --> 00:37:37,610 DAVID J. MALAN: The score, so if I'm being 830 00:37:37,610 --> 00:37:40,540 handed a number that's 0 to 100, that's what I want to iterate over.
831 00:37:40,540 --> 00:37:43,000 If my goal here, ultimately-- 832 00:37:43,000 --> 00:37:48,970 let me finish this thought-- i++-- is to, inside this loop, print out one 833 00:37:48,970 --> 00:37:52,080 hash per point in one's total score. 834 00:37:52,080 --> 00:37:54,580 And just to keep things clean, I'm going to go ahead and put 835 00:37:54,580 --> 00:37:56,500 a new line at the very end of this. 836 00:37:56,500 --> 00:37:59,820 But I think now, I factored out a good amount of the redundancy. 837 00:37:59,820 --> 00:38:01,960 It's not everything, but I've at least now 838 00:38:01,960 --> 00:38:04,030 given myself a function called Chart. 839 00:38:04,030 --> 00:38:08,860 So up here, it looks like I can kind of remove this loop, which 840 00:38:08,860 --> 00:38:10,030 is what I factored out. 841 00:38:10,030 --> 00:38:13,390 That's almost identical, except the variable name was hardcoded. 842 00:38:13,390 --> 00:38:18,250 And I think I could now do chart like this, 843 00:38:18,250 --> 00:38:22,270 and then I maybe could do a little copy-paste, if that's OK, like if maybe 844 00:38:22,270 --> 00:38:28,360 I can get away with just doing this, and then say 2, and then say 3, 845 00:38:28,360 --> 00:38:30,310 and then say 3, and then say 2. 846 00:38:30,310 --> 00:38:32,680 So it's still copy-paste, but it's less. 847 00:38:32,680 --> 00:38:33,520 And it looks better. 848 00:38:33,520 --> 00:38:36,940 It literally fits on the screen, so it's progress-- not perfect, but progress. 849 00:38:36,940 --> 00:38:38,780 Better design, but not perfect. 850 00:38:38,780 --> 00:38:42,200 So is this going to compile? 851 00:38:42,200 --> 00:38:44,050 I'm going to have errors. Why? 852 00:38:44,050 --> 00:38:47,330 AUDIENCE: Essentially, it's [INAUDIBLE] the program [INAUDIBLE]. 853 00:38:47,330 --> 00:38:49,420 DAVID J. MALAN: OK. 854 00:38:49,420 --> 00:38:50,370 Yeah. 855 00:38:50,370 --> 00:38:52,720 AUDIENCE: We need to declare a [INAUDIBLE].
856 00:38:52,720 --> 00:38:53,850 DAVID J. MALAN: OK, good. 857 00:38:53,850 --> 00:38:58,000 So let me induce the actual error, just so we know what problem we're solving. 858 00:38:58,000 --> 00:39:00,460 Let me go ahead and sort of innocently go ahead 859 00:39:00,460 --> 00:39:03,730 and compile Score 0 hoping all is well, but of course, 860 00:39:03,730 --> 00:39:07,690 it's not because of a familiar error up here. 861 00:39:07,690 --> 00:39:12,430 So notice, implicit declaration of function chart is invalid in C99. 862 00:39:12,430 --> 00:39:14,680 So again, implicit declaration of function 863 00:39:14,680 --> 00:39:18,720 just tends to mean Clang does not know what you're talking about. 864 00:39:18,720 --> 00:39:20,680 And you could run help50, and it would probably 865 00:39:20,680 --> 00:39:22,300 provide you with similar advice. 866 00:39:22,300 --> 00:39:25,720 But the gist of this is that chart is not a C function. 867 00:39:25,720 --> 00:39:27,760 It doesn't come with C. I wrote it. 868 00:39:27,760 --> 00:39:29,470 I just wrote it a little too late. 869 00:39:29,470 --> 00:39:32,620 So one solution that we didn't use last week 870 00:39:32,620 --> 00:39:35,290 would be, OK, well, if you don't know what chart is, let me just 871 00:39:35,290 --> 00:39:37,920 go put it where you'll know about it. 872 00:39:37,920 --> 00:39:40,740 And now run make score 0. 873 00:39:40,740 --> 00:39:42,250 OK, problem solved. 874 00:39:42,250 --> 00:39:46,180 So that fixes it, but we fixed it in a different way last week. 875 00:39:46,180 --> 00:39:48,790 And why might we want to stick with last week's approach 876 00:39:48,790 --> 00:39:50,800 and not just copy-paste my function and put it 877 00:39:50,800 --> 00:39:52,180 at the top instead of the bottom? 878 00:39:55,940 --> 00:39:57,640 AUDIENCE: [INAUDIBLE]. 879 00:39:57,640 --> 00:40:00,640 DAVID J.
MALAN: Yeah, I mean it's kind of a minor concern at the moment, 880 00:40:00,640 --> 00:40:02,260 because this is a pretty short program. 881 00:40:02,260 --> 00:40:06,130 But I'm pushing the main part of my program, literally called Main, 882 00:40:06,130 --> 00:40:07,360 farther and farther down. 883 00:40:07,360 --> 00:40:10,570 And the whole point of reading code is to understand what it's doing. 884 00:40:10,570 --> 00:40:13,770 So if I open this file, and I have to scroll, scroll, scroll, scroll, scroll, 885 00:40:13,770 --> 00:40:16,330 just looking for the main function, it's just bad style. 886 00:40:16,330 --> 00:40:18,700 It's just kind of nice, and it's a good human convention. 887 00:40:18,700 --> 00:40:22,480 Put the main code, the main function, when green flag clicks equivalent, 888 00:40:22,480 --> 00:40:23,540 at the very top. 889 00:40:23,540 --> 00:40:25,480 So C does offer us a solution here. 890 00:40:25,480 --> 00:40:27,490 You just have to provide it with a little hint. 891 00:40:27,490 --> 00:40:32,500 Let me go ahead and cut this from here, put it back down at the bottom here, 892 00:40:32,500 --> 00:40:38,630 and then go ahead and copy-paste only or retype only the value-- 893 00:40:38,630 --> 00:40:43,630 whoops-- the value of that first line, which is its so-called prototype. 894 00:40:43,630 --> 00:40:47,440 Give Clang enough information so that it knows what arguments the function 895 00:40:47,440 --> 00:40:50,740 takes, what its return type is, and what its name is, semi-colon, 896 00:40:50,740 --> 00:40:53,680 and that's the so-called declaration or-- 897 00:40:53,680 --> 00:40:58,070 and then implement it with the curly braces and all the logic down below. 898 00:40:58,070 --> 00:40:59,780 So let's go ahead and run this. 899 00:40:59,780 --> 00:41:03,280 And if I scroll up here, we'll see-- whoops. 900 00:41:03,280 --> 00:41:05,840 We'll see make score 0. 901 00:41:05,840 --> 00:41:08,060 All right, now we're on our way, score 0. 
902 00:41:08,060 --> 00:41:08,560 Enter. 903 00:41:08,560 --> 00:41:13,780 Score 1 is 100, 50, 75, and now we seem to have some good functionality. 904 00:41:13,780 --> 00:41:17,280 But there's still an opportunity, I dare say, for improvement. 905 00:41:17,280 --> 00:41:19,630 And I think the fundamental problem is that I'm still 906 00:41:19,630 --> 00:41:21,580 copy-pasting the little stuff, but I think 907 00:41:21,580 --> 00:41:26,290 the fundamental problem is that I don't have the expressiveness to store 908 00:41:26,290 --> 00:41:30,010 multiple values, unless I, in advance, as the programmer, 909 00:41:30,010 --> 00:41:34,720 give them all unique names, because if I use the same variable for everything, 910 00:41:34,720 --> 00:41:37,120 I couldn't collect all three variables at the top, 911 00:41:37,120 --> 00:41:40,600 and then iterate over all three at the bottom, if I only have one variable. 912 00:41:40,600 --> 00:41:43,880 So I do need three variables, but this doesn't scale very well. 913 00:41:43,880 --> 00:41:44,470 And who knows? 914 00:41:44,470 --> 00:41:47,830 If I want to take in five scores, 10 scores, or more scores, 915 00:41:47,830 --> 00:41:51,140 then I'm really copying and pasting excessively. 916 00:41:51,140 --> 00:41:53,660 So it turns out, indeed, the answer is an array. 917 00:41:53,660 --> 00:41:55,910 So an array, at the end of the day, is just 918 00:41:55,910 --> 00:41:59,110 a side effect of storing stuff in memory back to back to back to back. 919 00:41:59,110 --> 00:42:03,550 But what's powerful about this reality of memory is the following. 920 00:42:03,550 --> 00:42:07,960 I can go ahead here and in, say, a new and more improved 921 00:42:07,960 --> 00:42:10,090 version of this program, do this. 922 00:42:10,090 --> 00:42:14,560 Let me go ahead and open this one, which I wrote in advance, called scores2.c. 923 00:42:14,560 --> 00:42:18,940 And in scores2.c, notice we have the following code. 
924 00:42:18,940 --> 00:42:23,140 In my main function, I've got a new feature and a new bit of syntax. 925 00:42:23,140 --> 00:42:26,230 This line here that I've highlighted says, hey, Clang, 926 00:42:26,230 --> 00:42:30,520 give me a variable called Scores of type integer, 927 00:42:30,520 --> 00:42:32,410 but please give me three of them. 928 00:42:32,410 --> 00:42:34,270 So the new syntax are your square brackets, 929 00:42:34,270 --> 00:42:37,880 and inside of which is the number of variables you want of that type. 930 00:42:37,880 --> 00:42:39,760 And you don't have to give them unique names. 931 00:42:39,760 --> 00:42:41,920 You literally call them collectively, Scores, 932 00:42:41,920 --> 00:42:44,660 and in English, I deliberately chose a plural to connote as much. 933 00:42:44,660 --> 00:42:48,160 This is an array of values, not a single value. 934 00:42:48,160 --> 00:42:49,460 What can I do next? 935 00:42:49,460 --> 00:42:53,920 Well, here's my for loop for int i get zero i is less than 3 i++, 936 00:42:53,920 --> 00:42:56,410 and now I've solved that earlier problem that was proposed. 937 00:42:56,410 --> 00:42:57,920 Well, just put it in a loop. 938 00:42:57,920 --> 00:43:02,560 Now I can, because now my variables are not called Score 1, Score 2, Score 3, 939 00:43:02,560 --> 00:43:04,270 which I literally had to hard code. 940 00:43:04,270 --> 00:43:07,420 They're just called Scores, and now that they're called Scores, 941 00:43:07,420 --> 00:43:10,480 and I have this square bracket notation, notice what I can do. 942 00:43:10,480 --> 00:43:15,700 I can get an int, and I can say, give me score%i, and plug in i plus 1. 943 00:43:15,700 --> 00:43:17,500 I didn't want to say "zero," because humans 944 00:43:17,500 --> 00:43:18,950 don't count from zero in general. 945 00:43:18,950 --> 00:43:24,430 So this is counting from one, two, and three, but the computer is doing this. 946 00:43:24,430 --> 00:43:25,980 So Scores is a variable. 
947 00:43:25,980 --> 00:43:32,240 Bracket, i, close bracket says store the i-th value there. 948 00:43:32,240 --> 00:43:33,790 So i-th is just non-English. 949 00:43:33,790 --> 00:43:36,850 That means go to bracket 0, bracket 1, bracket 2. 950 00:43:36,850 --> 00:43:40,120 So what this effectively means is on the first iteration of the loop, when 951 00:43:40,120 --> 00:43:44,080 i equals 0, this looks like this, effectively. 952 00:43:44,080 --> 00:43:48,040 When i then becomes 1 on the next iteration, then you're doing this. 953 00:43:48,040 --> 00:43:51,610 When i becomes 2 on the final iteration, it looks like this. 954 00:43:51,610 --> 00:43:54,910 When i becomes 3, well, 3 is not less than 3, 955 00:43:54,910 --> 00:43:57,050 and so it doesn't execute again. 956 00:43:57,050 --> 00:44:03,980 So by using i inside of these square brackets, I am indexing into an array. 957 00:44:03,980 --> 00:44:06,640 To index into an array means go to a specific location, 958 00:44:06,640 --> 00:44:09,880 the so-called i-th location, but you start counting at zero. 959 00:44:09,880 --> 00:44:12,100 Just to make this more real, then, if you go back 960 00:44:12,100 --> 00:44:14,350 to this picture of your computer's memory, 961 00:44:14,350 --> 00:44:18,700 this might, therefore, be bracket i, bracket 1-- 962 00:44:18,700 --> 00:44:23,710 bracket 0, bracket 1, bracket 2, bracket 3, bracket 4, bracket 50, or wherever. 963 00:44:23,710 --> 00:44:27,910 You can now, using square brackets, get at any of these blocks of memory 964 00:44:27,910 --> 00:44:30,970 to store values for you. 965 00:44:30,970 --> 00:44:34,520 Any questions on what we've just done? 966 00:44:34,520 --> 00:44:37,230 All right, then on the flip side, we can do the exact same thing.
967 00:44:37,230 --> 00:44:42,290 Now when I print my scores, I can similarly iterate from 0 to 3, 968 00:44:42,290 --> 00:44:45,350 and then print out the scores by passing to chart 969 00:44:45,350 --> 00:44:48,170 the same value, the i-th score. 970 00:44:48,170 --> 00:44:51,440 Again, the only new syntax here is variable name, square bracket, 971 00:44:51,440 --> 00:44:54,980 and then a number, like 0, 1, 2, or a variable like i, 972 00:44:54,980 --> 00:44:57,800 and then my chart function down here is exactly the same. 973 00:44:57,800 --> 00:45:00,590 It has no idea an array is even involved, because I'm just 974 00:45:00,590 --> 00:45:04,640 passing in one score at a time. 975 00:45:04,640 --> 00:45:08,960 Now it turns out there's still one bad design decision in this program. 976 00:45:08,960 --> 00:45:12,710 There's still some redundancy, something that I keep typing again 977 00:45:12,710 --> 00:45:14,150 and again and again. 978 00:45:14,150 --> 00:45:15,980 Do any values jump out at you as repeated? 979 00:45:19,540 --> 00:45:20,710 AUDIENCE: The for loop. 980 00:45:20,710 --> 00:45:21,920 DAVID J. MALAN: The for loop. 981 00:45:21,920 --> 00:45:24,810 OK, so I've got the for loop in multiple places. 982 00:45:24,810 --> 00:45:25,580 Sure. 983 00:45:25,580 --> 00:45:29,350 And what other value seems to be in multiple places? 984 00:45:29,350 --> 00:45:30,170 It's subtle. 985 00:45:32,820 --> 00:45:33,390 Total number. 986 00:45:33,390 --> 00:45:34,440 Yeah, 3. 987 00:45:34,440 --> 00:45:35,850 Three is in a few places. 988 00:45:35,850 --> 00:45:36,750 It's up here. 989 00:45:36,750 --> 00:45:41,160 It's when I declare the array and ask myself for three scores. 990 00:45:41,160 --> 00:45:44,200 It's here when I'm iterating. 991 00:45:44,200 --> 00:45:46,410 It's not here, because this is a different iteration. 992 00:45:46,410 --> 00:45:48,160 That's just for the hashes. 
993 00:45:48,160 --> 00:45:51,270 So in, ironically, three places, have I written 3. 994 00:45:51,270 --> 00:45:52,300 So what does this mean? 995 00:45:52,300 --> 00:45:54,630 Well, suppose next year you take more tests or whatever, 996 00:45:54,630 --> 00:45:55,770 and you need more scores. 997 00:45:55,770 --> 00:46:00,320 You open up your program, and all right, now I've got five scores and five-- 998 00:46:00,320 --> 00:46:04,020 whoops, typo already-- five, like this kind of pattern 999 00:46:04,020 --> 00:46:06,180 where you're typing the same thing again and again. 1000 00:46:06,180 --> 00:46:08,040 And now the onus is on me, the programmer, 1001 00:46:08,040 --> 00:46:11,580 to remember to change the same [? damn ?] value in multiple places-- 1002 00:46:11,580 --> 00:46:12,930 bad, bad, bad design. 1003 00:46:12,930 --> 00:46:14,640 You're going to miss one of those values. 1004 00:46:14,640 --> 00:46:15,930 Your program's going to get more complex. 1005 00:46:15,930 --> 00:46:18,390 You're going to leave one at 3 and change the other to 5, 1006 00:46:18,390 --> 00:46:20,540 and logical errors are eventually going to happen. 1007 00:46:20,540 --> 00:46:21,540 So how do we solve this? 1008 00:46:21,540 --> 00:46:24,410 The function's not the solution here, because it's not functionality. 1009 00:46:24,410 --> 00:46:25,560 It's just a value. 1010 00:46:25,560 --> 00:46:28,860 Well, we could use a variable, but a certain type of variable. 1011 00:46:28,860 --> 00:46:32,150 These numbers here-- 5, 5, 5 or 3, 3, 3-- 1012 00:46:32,150 --> 00:46:34,360 are what humans generally refer to as magic numbers. 1013 00:46:34,360 --> 00:46:36,450 Like they're numbers, but they're kind of magical, 1014 00:46:36,450 --> 00:46:39,510 because you just arbitrarily hardcoded them in random places. 1015 00:46:39,510 --> 00:46:44,000 But a better convention would be, often as a global variable, to do this-- 1016 00:46:44,000 --> 00:46:47,790 int, let's call it "count," equals 3. 
1017 00:46:47,790 --> 00:46:51,270 So declare a variable of type int that is the number of things 1018 00:46:51,270 --> 00:46:55,740 you want, and then type that variable name all throughout your code 1019 00:46:55,740 --> 00:46:58,920 so that later on, if you ever want to change this program, 1020 00:46:58,920 --> 00:47:01,500 you change it-- whoops-- in one place, and you're 1021 00:47:01,500 --> 00:47:03,330 done after recompiling the program. 1022 00:47:03,330 --> 00:47:05,500 And actually, I should do a little better than this. 1023 00:47:05,500 --> 00:47:08,790 It turns out that if you know you have a variable that you're never going 1024 00:47:08,790 --> 00:47:10,790 to change, because it's not supposed to change-- 1025 00:47:10,790 --> 00:47:12,450 it's supposed to be a constant value-- 1026 00:47:12,450 --> 00:47:16,470 C also has a special keyword called const, where before the data type, 1027 00:47:16,470 --> 00:47:20,220 you say, const int, and then the name and then the value, and this way, 1028 00:47:20,220 --> 00:47:22,740 the compiler, Clang, will make sure that you, the human, 1029 00:47:22,740 --> 00:47:27,100 don't screw up and accidentally try to change the count anywhere else. 1030 00:47:27,100 --> 00:47:28,440 There's one other thing notable. 1031 00:47:28,440 --> 00:47:30,990 I also capitalize this whole thing for some reason-- 1032 00:47:30,990 --> 00:47:31,870 human convention. 1033 00:47:31,870 --> 00:47:34,830 Anytime you capitalize all of the letters in a variable name, 1034 00:47:34,830 --> 00:47:36,990 the convention is that that means it's global. 1035 00:47:36,990 --> 00:47:40,780 That means it's defined way up top, and you can use it anywhere, therefore, 1036 00:47:40,780 --> 00:47:42,780 because it's outside all curly braces. 1037 00:47:42,780 --> 00:47:46,290 But it's meant to imply and remind you that this is special. 
1038 00:47:46,290 --> 00:47:48,630 It's not just a so-called local variable inside 1039 00:47:48,630 --> 00:47:52,440 of a function or inside of a loop or the like. 1040 00:47:52,440 --> 00:47:54,440 Any questions on that? 1041 00:47:54,440 --> 00:47:55,110 Yeah. 1042 00:47:55,110 --> 00:47:56,470 AUDIENCE: What is [INAUDIBLE]? 1043 00:47:56,470 --> 00:47:57,930 Why do you have i plus 1? 1044 00:47:57,930 --> 00:47:59,730 DAVID J. MALAN: Oh, why do I have i plus 1? 1045 00:47:59,730 --> 00:48:01,930 Let me run this program real quick. 1046 00:48:01,930 --> 00:48:05,160 Why do I have i plus 1 in this line here, is the question. 1047 00:48:05,160 --> 00:48:07,860 So let me go ahead and run make scores 2-- 1048 00:48:07,860 --> 00:48:09,630 whoops-- in my directory. 1049 00:48:09,630 --> 00:48:13,680 Make scores 2 ./scores2, Enter. 1050 00:48:13,680 --> 00:48:18,570 I wanted just the human to see Score 1 and Score 2 and Score 3. 1051 00:48:18,570 --> 00:48:22,620 I didn't want him or her to see Score 0, Score 1, Score 2, because it just looks 1052 00:48:22,620 --> 00:48:23,640 lame to the human. 1053 00:48:23,640 --> 00:48:25,560 The computer needs to think in terms of zeros. 1054 00:48:25,560 --> 00:48:29,190 My humans and my users do not, so just an aesthetic. 1055 00:48:29,190 --> 00:48:29,860 Other questions. 1056 00:48:29,860 --> 00:48:30,630 Yeah. 1057 00:48:30,630 --> 00:48:32,100 AUDIENCE: [INAUDIBLE]. 1058 00:48:37,300 --> 00:48:39,010 DAVID J. MALAN: Ah, really good question. 1059 00:48:39,010 --> 00:48:40,840 And I actually thought about this last night 1060 00:48:40,840 --> 00:48:43,630 when trying to craft this example. 1061 00:48:43,630 --> 00:48:45,850 Why don't I just combine these two for loops, 1062 00:48:45,850 --> 00:48:50,500 because they're clearly iterating an identical number of times? 1063 00:48:50,500 --> 00:48:52,450 Was this a hand or just a stretch? 1064 00:48:52,450 --> 00:48:53,830 No, stretch. 
1065 00:48:53,830 --> 00:48:57,880 So this is actually deliberate. 1066 00:48:57,880 --> 00:49:01,750 If I combine these, what would change logically in my program? 1067 00:49:01,750 --> 00:49:02,520 Yeah. 1068 00:49:02,520 --> 00:49:05,890 AUDIENCE: After every [INAUDIBLE] input, you would [INAUDIBLE].. 1069 00:49:05,890 --> 00:49:08,860 DAVID J. MALAN: Yeah, so after every human input of a score, 1070 00:49:08,860 --> 00:49:11,920 I would see that user's chart, the row of hashes. 1071 00:49:11,920 --> 00:49:13,420 Then I'd ask them for another value. 1072 00:49:13,420 --> 00:49:15,470 They'd see the chart, another value, and they'd see the chart. 1073 00:49:15,470 --> 00:49:17,710 And that's fine, if that is the design you want. 1074 00:49:17,710 --> 00:49:18,580 Totally acceptable. 1075 00:49:18,580 --> 00:49:19,480 Totally correct. 1076 00:49:19,480 --> 00:49:22,310 I wanted mine to look a little more traditional with all of the bars 1077 00:49:22,310 --> 00:49:25,510 together, so I effectively had to postpone printing the hashes. 1078 00:49:25,510 --> 00:49:27,760 And that's why I did have a little bit of redundancy 1079 00:49:27,760 --> 00:49:30,400 by getting the user's input here and then iterating again 1080 00:49:30,400 --> 00:49:34,360 to actually print the user's output as a chart, so just a design decision. 1081 00:49:34,360 --> 00:49:35,060 Good question. 1082 00:49:35,060 --> 00:49:37,300 Other questions? 1083 00:49:37,300 --> 00:49:40,020 All right, so what does this look like? 1084 00:49:40,020 --> 00:49:41,020 Actually, you know what? 1085 00:49:41,020 --> 00:49:42,430 I can probably do a little better. 1086 00:49:42,430 --> 00:49:45,730 Let me open up one final example involving scores and this thing 1087 00:49:45,730 --> 00:49:46,860 called an array. 1088 00:49:46,860 --> 00:49:52,030 In Scores 4 here, let me go ahead and do this. 
1089 00:49:52,030 --> 00:49:55,870 Now I've changed my chart function to do a little bit more, 1090 00:49:55,870 --> 00:49:58,840 and you might recall from week 0 and 1, we had the cough function, 1091 00:49:58,840 --> 00:50:00,910 and we kept enhancing it to do more and more, 1092 00:50:00,910 --> 00:50:02,830 like putting more and more logic into it. 1093 00:50:02,830 --> 00:50:04,210 Notice this. 1094 00:50:04,210 --> 00:50:08,360 Chart function now takes a second argument, which is kind of interesting. 1095 00:50:08,360 --> 00:50:10,720 It takes one argument, which is a number, 1096 00:50:10,720 --> 00:50:13,750 and then the next argument is an array of scores. 1097 00:50:13,750 --> 00:50:16,090 So long story short, if you want to have a function that 1098 00:50:16,090 --> 00:50:18,610 takes as input an array, you don't have to know 1099 00:50:18,610 --> 00:50:20,200 in advance how big that array is. 1100 00:50:20,200 --> 00:50:23,140 You should not, in fact, put a number in between the square brackets 1101 00:50:23,140 --> 00:50:24,730 in this context. 1102 00:50:24,730 --> 00:50:27,700 But the thing is you do need to know, at some point, 1103 00:50:27,700 --> 00:50:29,200 how many items are in the array. 1104 00:50:29,200 --> 00:50:32,830 If you've programmed in Java, took AP CS, Java just gives you .length, 1105 00:50:32,830 --> 00:50:35,200 if you recall that feature of objects. 1106 00:50:35,200 --> 00:50:36,550 C does not have this. 1107 00:50:36,550 --> 00:50:39,850 Arrays do not have an inherent length associated with them. 1108 00:50:39,850 --> 00:50:44,060 You have to tell everyone who uses your array how long it is. 1109 00:50:44,060 --> 00:50:46,780 So even though you don't do that syntactically here, 1110 00:50:46,780 --> 00:50:50,620 you literally just say, I expect an argument called scores that 1111 00:50:50,620 --> 00:50:53,150 is an array per the square brackets.
1112 00:50:53,150 --> 00:50:55,900 You have to pass in, almost always, a second variable 1113 00:50:55,900 --> 00:50:57,760 that is literally called whatever you want, 1114 00:50:57,760 --> 00:50:59,770 but is the number of things in that array, 1115 00:50:59,770 --> 00:51:02,560 because if the goal of this function is just 1116 00:51:02,560 --> 00:51:09,100 to iterate over the number of scores that are passed in, 1117 00:51:09,100 --> 00:51:12,940 and then iterate over the number of points in that score 1118 00:51:12,940 --> 00:51:16,970 in order to print out the hashes, you need to know this count. 1119 00:51:16,970 --> 00:51:19,000 So what does this function do, just to be clear? 1120 00:51:19,000 --> 00:51:22,480 This iterates over the total number of scores from 0 to count, 1121 00:51:22,480 --> 00:51:24,820 which is probably 3 or 5 or whatever. 1122 00:51:24,820 --> 00:51:27,880 This loop here, using j, which is just a convention, 1123 00:51:27,880 --> 00:51:32,060 instead iterates from 0 to whatever that i-th score is. 1124 00:51:32,060 --> 00:51:33,270 So this is what's convenient. 1125 00:51:33,270 --> 00:51:38,140 Now I've passed in the array, and I can still get at individual values 1126 00:51:38,140 --> 00:51:41,840 just by using i, because I'm on my i-th iteration here. 1127 00:51:41,840 --> 00:51:44,800 So you might recall this from Mario, for instance, or any other example 1128 00:51:44,800 --> 00:51:46,300 in which you had nested loops-- 1129 00:51:46,300 --> 00:51:50,050 just very conventional to use i on the outside, j on the inside. 1130 00:51:50,050 --> 00:51:54,370 But again, the only point here is that you can, indeed, pass around arrays, 1131 00:51:54,370 --> 00:51:59,530 even as arguments, which we'll see why that's useful before long. 1132 00:51:59,530 --> 00:52:02,600 Any questions? 1133 00:52:02,600 --> 00:52:05,540 OK, so this was a lot, but we can do so much more still with arrays.
1134 00:52:05,540 --> 00:52:07,370 It gets even more and more cool. 1135 00:52:07,370 --> 00:52:10,250 In fact, we'll see, in just a bit, how arrays have actually 1136 00:52:10,250 --> 00:52:11,530 been with us since last week. 1137 00:52:11,530 --> 00:52:14,280 We just didn't quite realize it under the hood, but let's go ahead 1138 00:52:14,280 --> 00:52:15,690 and take a breather, five minutes. 1139 00:52:15,690 --> 00:52:16,990 We'll come back and dive in. 1140 00:52:16,990 --> 00:52:17,690 All right. 1141 00:52:17,690 --> 00:52:20,360 So I know that was a bit of a cliffhanger. 1142 00:52:20,360 --> 00:52:22,550 Where else could arrays have actually been? 1143 00:52:22,550 --> 00:52:25,010 But, of course, this is how we might depict it pictorially. 1144 00:52:25,010 --> 00:52:27,980 We called it an array, and it turns out that last week, when 1145 00:52:27,980 --> 00:52:31,700 we introduced strings, strings, sequences of characters, 1146 00:52:31,700 --> 00:52:34,520 are literally just an array by another name. 1147 00:52:34,520 --> 00:52:39,330 A string is an array of chars, and chars, of course, is another data type. 1148 00:52:39,330 --> 00:52:41,250 Now what are the actual implications of this, 1149 00:52:41,250 --> 00:52:44,180 both in terms of representation, like how a computer's representing 1150 00:52:44,180 --> 00:52:48,050 information, and then fundamentally, programmatically, 1151 00:52:48,050 --> 00:52:50,330 what can we do when we know all of our data 1152 00:52:50,330 --> 00:52:53,960 is so back to back to back or so proximal to one another? 1153 00:52:53,960 --> 00:52:58,140 Well, it turns out that we can apply this logic in a few different ways. 1154 00:52:58,140 --> 00:53:01,110 Let me go ahead and open up, for instance, 1155 00:53:01,110 --> 00:53:04,700 an example here called String 0. 
1156 00:53:04,700 --> 00:53:08,330 So in our code for today, in our Source 2 folder, 1157 00:53:08,330 --> 00:53:13,230 let me go ahead and open up String 0, and this example looks like this. 1158 00:53:13,230 --> 00:53:17,870 Notice that we first, on line 9, get a string from the user. 1159 00:53:17,870 --> 00:53:19,670 Just say, input, please. 1160 00:53:19,670 --> 00:53:23,880 We store that value in a string, s, and then we say, here comes the output. 1161 00:53:23,880 --> 00:53:26,390 And notice what I'm doing in the following line. 1162 00:53:26,390 --> 00:53:31,130 I'm iterating over i from 0 to strlen, whatever that is. 1163 00:53:31,130 --> 00:53:34,980 And then in line 13, I'm printing a character one at a time. 1164 00:53:34,980 --> 00:53:38,270 But notice the syntax I'm using, which we didn't use last week. 1165 00:53:38,270 --> 00:53:43,400 If you have a string called s, you can index into a string 1166 00:53:43,400 --> 00:53:46,460 just like it's an array, because it, indeed, is underneath the hood. 1167 00:53:46,460 --> 00:53:50,450 So s bracket i, where i starts at 0 and goes 1168 00:53:50,450 --> 00:53:55,450 up to whatever this value is is just a way of getting character 0, then 1169 00:53:55,450 --> 00:53:58,370 character 1, then character 2, then character 3, 1170 00:53:58,370 --> 00:54:01,550 and so the end result is actually going to look like this. 1171 00:54:01,550 --> 00:54:04,210 Let me go ahead and do, make string-- 1172 00:54:04,210 --> 00:54:06,480 whoops-- make string 0. 1173 00:54:06,480 --> 00:54:06,980 Oops. 1174 00:54:06,980 --> 00:54:07,910 Not in the directory. 1175 00:54:07,910 --> 00:54:15,890 Make string 0, ./string0, Enter, and I'll type in, say, Zamyla, 1176 00:54:15,890 --> 00:54:21,570 and the output now is Z-A-M-Y-L-A. 
It's a little messy, 1177 00:54:21,570 --> 00:54:24,870 because I don't have a new line here, so let me actually-- let's clean that up, 1178 00:54:24,870 --> 00:54:27,350 because this is unnecessarily sloppy. 1179 00:54:27,350 --> 00:54:31,010 So let me go ahead and print out a new line. 1180 00:54:31,010 --> 00:54:34,790 Let me recompile with make string 0, dot-- 1181 00:54:34,790 --> 00:54:37,520 whoops-- ./string0. 1182 00:54:37,520 --> 00:54:43,070 Input shall be Zamyla, Enter, and now Z-A-M-Y-L-A. 1183 00:54:43,070 --> 00:54:44,240 So why is that happening? 1184 00:54:44,240 --> 00:54:47,720 Well, if I scroll down on this code, it seems that I am, 1185 00:54:47,720 --> 00:54:52,700 via this printf line here, just getting the i-th character of the name in s, 1186 00:54:52,700 --> 00:54:55,820 and then printing out one character at a time per the %c, 1187 00:54:55,820 --> 00:54:57,270 followed by a new line. 1188 00:54:57,270 --> 00:55:01,970 So you might guess, what is this function here doing? 1189 00:55:01,970 --> 00:55:06,490 Strlen-- slightly abbreviated, but you can, perhaps, glean what it means. 1190 00:55:06,490 --> 00:55:08,610 Yeah, so it's actually string length. 1191 00:55:08,610 --> 00:55:11,220 So it turns out there is a function that comes 1192 00:55:11,220 --> 00:55:13,890 with C called strlen, and humans back in the day 1193 00:55:13,890 --> 00:55:17,670 and to this day like to type as few characters as possible. 1194 00:55:17,670 --> 00:55:21,600 And so strlen is string length, and the way you use it is you 1195 00:55:21,600 --> 00:55:22,940 just need one more header file. 1196 00:55:22,940 --> 00:55:24,940 So there's another library, the so-called string 1197 00:55:24,940 --> 00:55:27,060 library that gives you string-related functions 1198 00:55:27,060 --> 00:55:29,410 beyond what CS50's library provides.
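The loop described here can be sketched roughly as follows. This is not the course's string0.c itself: the name print_chars is hypothetical, the string is passed in as a plain C string rather than read with get_string (so the sketch compiles without the CS50 library), and the counter is a size_t where the lecture uses an int.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: prints each character of s on its own line by
// indexing into the string like an array, and returns how many it printed.
size_t print_chars(const char *s)
{
    size_t printed = 0;

    // strlen(s) is re-evaluated on every pass through the loop
    // (size_t is used for i so it matches strlen's return type)
    for (size_t i = 0; i < strlen(s); i++)
    {
        printf("%c\n", s[i]);
        printed++;
    }
    return printed;
}
```

Calling print_chars("Zamyla") prints the six letters one per line.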
1199 00:55:29,410 --> 00:55:32,130 And so if you include string.h, that gives you access 1200 00:55:32,130 --> 00:55:35,310 to another function called strlen that, if you pass it 1201 00:55:35,310 --> 00:55:38,580 a variable containing a string, will pass you back 1202 00:55:38,580 --> 00:55:40,990 as a return value the total number of characters. 1203 00:55:40,990 --> 00:55:46,440 So I typed in Z-A-M-Y-L-A, and so that should be returning to me six, 1204 00:55:46,440 --> 00:55:49,650 thereby printing out the six characters in Zamyla's name. 1205 00:55:49,650 --> 00:55:50,240 Yeah. 1206 00:55:50,240 --> 00:55:52,590 AUDIENCE: [INAUDIBLE]. 1207 00:55:52,590 --> 00:55:54,000 DAVID J. MALAN: Uh-huh. 1208 00:55:54,000 --> 00:55:56,370 AUDIENCE: [INAUDIBLE] useful to get the individual digits [INAUDIBLE]. 1209 00:55:56,370 --> 00:55:57,160 DAVID J. MALAN: Really good question. 1210 00:55:57,160 --> 00:56:00,480 In the credit problem of the problem set, would this have been useful? 1211 00:56:00,480 --> 00:56:01,740 Yes, absolutely. 1212 00:56:01,740 --> 00:56:04,500 But recall that in the credit pset, we encouraged you to actually 1213 00:56:04,500 --> 00:56:07,890 take in the number as a long, so as an integral value, which 1214 00:56:07,890 --> 00:56:09,510 thereby necessitated arithmetic. 1215 00:56:09,510 --> 00:56:12,690 But yes, if you had, instead, in a problem involving credit card 1216 00:56:12,690 --> 00:56:16,440 numbers, gotten the human's input as a long string of characters 1217 00:56:16,440 --> 00:56:18,620 and not as an actual number like an int or a long, 1218 00:56:18,620 --> 00:56:21,330 then, yes, you could actually get at those individual characters, 1219 00:56:21,330 --> 00:56:26,250 which probably would have made things even easier, but that was deliberate. 1220 00:56:26,250 --> 00:56:27,000 Yeah. 1221 00:56:27,000 --> 00:56:29,290 AUDIENCE: [INAUDIBLE]. 1222 00:56:29,290 --> 00:56:30,840 DAVID J. MALAN: Really good question.
1223 00:56:30,840 --> 00:56:33,840 If we're defining string in CS50, are we redefining it in string.h? 1224 00:56:33,840 --> 00:56:34,380 No. 1225 00:56:34,380 --> 00:56:36,810 So string.h, even though it's named string.h, 1226 00:56:36,810 --> 00:56:39,210 doesn't actually define something called a string. 1227 00:56:39,210 --> 00:56:42,660 It just has string-related functions. 1228 00:56:42,660 --> 00:56:43,410 More on that soon. 1229 00:56:43,410 --> 00:56:43,910 Yeah. 1230 00:56:43,910 --> 00:56:46,770 AUDIENCE: [INAUDIBLE] individual values [INAUDIBLE]? 1231 00:56:46,770 --> 00:56:48,480 DAVID J. MALAN: Ah, really good question. 1232 00:56:48,480 --> 00:56:51,060 Could you edit the individual values? 1233 00:56:51,060 --> 00:56:52,950 So short answer, yes. 1234 00:56:52,950 --> 00:56:57,930 We could absolutely change values, and we'll soon do that in another context. 1235 00:56:57,930 --> 00:56:59,650 Other questions? 1236 00:56:59,650 --> 00:57:03,520 All right, so turns out this is correct, if my goal is 1237 00:57:03,520 --> 00:57:06,050 to print out all of the characters in Zamyla's name, 1238 00:57:06,050 --> 00:57:07,330 but it's not the best design. 1239 00:57:07,330 --> 00:57:09,820 And this one's a little subtle, but this is, again, what we mean by design. 1240 00:57:09,820 --> 00:57:12,160 And to a question that came up during the break, 1241 00:57:12,160 --> 00:57:16,060 did we expect everyone to be writing good style and good design last week? 1242 00:57:16,060 --> 00:57:16,640 No. 1243 00:57:16,640 --> 00:57:19,270 Up until today, like we've introduced the notion of correctness 1244 00:57:19,270 --> 00:57:21,430 in both Scratch and in C last week, but now we're 1245 00:57:21,430 --> 00:57:24,070 introducing these other axes of quality of code 1246 00:57:24,070 --> 00:57:27,320 like design, how well-designed it is, and how pretty 1247 00:57:27,320 --> 00:57:28,870 it looks in the context of style.
1248 00:57:28,870 --> 00:57:33,440 So expectations are, from here on out, meant to be aligned with those characteristics, 1249 00:57:33,440 --> 00:57:35,030 but not in the past. 1250 00:57:35,030 --> 00:57:37,600 So there's a slight inefficiency here. 1251 00:57:37,600 --> 00:57:41,590 So on the first iteration of this loop, I first initialize i to 0, 1252 00:57:41,590 --> 00:57:45,370 and then I check if i is less than the length of the string, which, hopefully, 1253 00:57:45,370 --> 00:57:48,470 it is, if it's Zamyla, which is longer than 0. 1254 00:57:48,470 --> 00:57:50,260 Then I print the i-th character. 1255 00:57:50,260 --> 00:57:51,880 Then I increment i. 1256 00:57:51,880 --> 00:57:53,500 Then I check this condition. 1257 00:57:53,500 --> 00:57:55,390 Then I print the i-th character. 1258 00:57:55,390 --> 00:57:56,650 Then I increment i. 1259 00:57:56,650 --> 00:57:58,730 Then I check this condition and so forth. 1260 00:57:58,730 --> 00:58:01,430 We looked at loops last week, and you've used them, perhaps, 1261 00:58:01,430 --> 00:58:03,350 by now in problems. 1262 00:58:03,350 --> 00:58:08,150 What question am I redundantly asking, seemingly unnecessarily? 1263 00:58:11,750 --> 00:58:13,850 I have to check a condition again and again, 1264 00:58:13,850 --> 00:58:15,230 because i is getting incremented. 1265 00:58:15,230 --> 00:58:18,410 But there's another question that I 1266 00:58:18,410 --> 00:58:21,260 don't need to keep asking again just to get the same answer. 1267 00:58:21,260 --> 00:58:23,530 AUDIENCE: What is the length [? of the string? ?] 1268 00:58:23,530 --> 00:58:25,570 DAVID J. MALAN: Yeah, there's this function call 1269 00:58:25,570 --> 00:58:28,790 in my loop of strlen s, which is fine. 1270 00:58:28,790 --> 00:58:29,660 This is correct. 1271 00:58:29,660 --> 00:58:32,670 I'm checking the length of the string, but once I type in Zamyla, 1272 00:58:32,670 --> 00:58:34,730 her name is not changing in length.
1273 00:58:34,730 --> 00:58:38,240 I'm incrementing i, so I'm moving in the string, if you will. 1274 00:58:38,240 --> 00:58:41,900 But the string itself, Z-A-M-Y-L-A, is not changing. 1275 00:58:41,900 --> 00:58:46,020 So why am I asking the computer, again and again, get me the strlen of s, 1276 00:58:46,020 --> 00:58:48,140 get me the strlen of s, get me the strlen of s. 1277 00:58:48,140 --> 00:58:49,570 So I can actually fix this. 1278 00:58:49,570 --> 00:58:52,540 I can improve the design, because that must take some amount of time. 1279 00:58:52,540 --> 00:58:55,600 Maybe it's fast, but it's still a non-zero amount of time. 1280 00:58:55,600 --> 00:58:56,770 So you know what I could do? 1281 00:58:56,770 --> 00:58:59,950 I could do something like this-- int n get string length of s. 1282 00:58:59,950 --> 00:59:01,390 And now just do this. 1283 00:59:01,390 --> 00:59:05,560 This would be better design, because now I'm only asking the question once 1284 00:59:05,560 --> 00:59:06,460 of the function. 1285 00:59:06,460 --> 00:59:09,610 I'm remembering or caching, if you will, the answer, and then 1286 00:59:09,610 --> 00:59:10,870 I'm just using a variable. 1287 00:59:10,870 --> 00:59:12,790 And just comparing variables is just faster 1288 00:59:12,790 --> 00:59:16,180 than comparing a variable against a function, which has to be called, 1289 00:59:16,180 --> 00:59:18,600 which has to return a value, which you can then compare. 1290 00:59:18,600 --> 00:59:20,650 But honestly, it doesn't have to be this verbose. 1291 00:59:20,650 --> 00:59:22,810 We can actually be a little elegant about this. 1292 00:59:22,810 --> 00:59:25,540 If you're using a loop, a secret feature of loops 1293 00:59:25,540 --> 00:59:28,090 is that you can have commas after declaring variables. 
1294 00:59:28,090 --> 00:59:32,320 And you can actually do this and make this even more elegant, if you will, 1295 00:59:32,320 --> 00:59:35,600 or more confusing-looking, depending on your perspective. 1296 00:59:35,600 --> 00:59:39,050 But this now does the same thing but declares n inside of the loop, 1297 00:59:39,050 --> 00:59:41,470 just like I'm declaring i, and it's just a little tighter. 1298 00:59:41,470 --> 00:59:44,850 It's one fewer line of code. 1299 00:59:44,850 --> 00:59:47,130 Any questions, then? 1300 00:59:47,130 --> 00:59:50,020 AUDIENCE: [INAUDIBLE]. 1301 00:59:50,020 --> 00:59:51,270 DAVID J. MALAN: Good question. 1302 00:59:51,270 --> 00:59:54,830 In the way I've just done it, I cannot reuse this outside of the curly braces. 1303 00:59:54,830 --> 00:59:59,690 i and n exist only in the scope of this loop right now. 1304 00:59:59,690 --> 01:00:00,650 The other way, yes. 1305 01:00:00,650 --> 01:00:03,100 I could have used it elsewhere. 1306 01:00:03,100 --> 01:00:09,950 AUDIENCE: What if you [INAUDIBLE] other loops, and you also had [INAUDIBLE]? 1307 01:00:09,950 --> 01:00:11,440 DAVID J. MALAN: Absolutely. 1308 01:00:11,440 --> 01:00:13,820 AUDIENCE: Using different letters of the alphabet, 1309 01:00:13,820 --> 01:00:17,140 you could just use n and not be [INAUDIBLE]. 1310 01:00:17,140 --> 01:00:18,140 DAVID J. MALAN: Correct. 1311 01:00:18,140 --> 01:00:20,930 If I want to use the length of s again, absolutely. 1312 01:00:20,930 --> 01:00:24,420 I can declare the variable, as I did earlier, outside of the loop, 1313 01:00:24,420 --> 01:00:25,650 so as to reuse it. 1314 01:00:25,650 --> 01:00:26,790 That's totally fine. 1315 01:00:26,790 --> 01:00:27,290 Yes.
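The design improvement discussed above, asking for the length once and remembering it, might look something like this. It is a sketch under assumptions: print_chars_cached is a hypothetical name, and the string is passed in rather than read with get_string so the code does not depend on the CS50 library.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: same output as calling strlen in the loop condition,
// but the length is computed once and cached in n.
int print_chars_cached(const char *s)
{
    int printed = 0;

    // i and n are both declared in the loop's initializer, separated by a
    // comma, so both exist only inside this loop. Declaring
    //     int n = strlen(s);
    // on its own line before the loop would let n be reused afterward.
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c\n", s[i]);
        printed++;
    }
    return printed;
}
```

The comma form saves a line. The separate declaration is the one to pick when the length is needed again after the loop.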
1316 01:00:27,290 --> 01:00:31,250 And even i-- i exists only inside of this loop, so if I have another loop, 1317 01:00:31,250 --> 01:00:34,640 I can reuse i, and it's a different i, because these variables only 1318 01:00:34,640 --> 01:00:37,610 exist inside the for loop in which they're declared. 1319 01:00:37,610 --> 01:00:44,350 So it turns out that these strings don't have anything in them 1320 01:00:44,350 --> 01:00:46,960 other than character after character after character. 1321 01:00:46,960 --> 01:00:49,370 And in fact, let me go ahead here and draw 1322 01:00:49,370 --> 01:00:52,420 a picture of what's actually going on underneath the hood of the computer 1323 01:00:52,420 --> 01:00:52,920 here. 1324 01:00:52,920 --> 01:00:55,180 So when I type in Zamyla's name, I'm, of course, 1325 01:00:55,180 --> 01:01:02,290 doing something like Z-A-M-Y-L-A. But where is that actually going? 1326 01:01:02,290 --> 01:01:04,960 Well, we know now that inside of your computer is RAM or memory, 1327 01:01:04,960 --> 01:01:06,730 and you can think of it like a grid. 1328 01:01:06,730 --> 01:01:08,640 And honestly, I can think of this whole screen 1329 01:01:08,640 --> 01:01:11,260 as just being in a different orientation, a grid of memory. 1330 01:01:11,260 --> 01:01:16,870 So for instance, maybe we can divide it into rows and columns like this, not 1331 01:01:16,870 --> 01:01:20,830 necessarily to scale, and there's more rows and columns. 1332 01:01:20,830 --> 01:01:23,920 So on the screen here, I'm just dividing things 1333 01:01:23,920 --> 01:01:28,190 into the individual bytes of memory that we saw a moment ago. 1334 01:01:28,190 --> 01:01:32,550 And so, indeed, underneath the hood of the computer is this layout of memory. 
1335 01:01:32,550 --> 01:01:35,680 The compiler has somehow figured out or the program has somehow figured out 1336 01:01:35,680 --> 01:01:39,460 where to put the z and where the a and the m and the y and the l and the a, 1337 01:01:39,460 --> 01:01:42,730 but the key is that they're all contiguous, back to back to back. 1338 01:01:42,730 --> 01:01:46,780 But the catch is if I'm typing other words into my program or scores 1339 01:01:46,780 --> 01:01:49,060 into my program or any data into my program, 1340 01:01:49,060 --> 01:01:51,590 it's going to end up elsewhere in the computer's memory. 1341 01:01:51,590 --> 01:01:53,790 So how do you know where Zamyla begins and where 1342 01:01:53,790 --> 01:01:56,590 Zamyla ends, so to speak, in memory? 1343 01:01:56,590 --> 01:02:02,100 Well, the variable, called s, essentially is here. 1344 01:02:02,100 --> 01:02:06,160 There's some remembrance in the computer of where s begins. 1345 01:02:06,160 --> 01:02:10,750 But there's no obvious way to know where Zamyla ends, 1346 01:02:10,750 --> 01:02:12,970 unless we ourselves tell the computer. 1347 01:02:12,970 --> 01:02:16,510 So unbeknownst to us, any time a computer is storing a string like 1348 01:02:16,510 --> 01:02:21,700 Z-A-M-Y-L-A, it turns out that it's not using one, two, three, four, five, 1349 01:02:21,700 --> 01:02:22,930 six characters. 1350 01:02:22,930 --> 01:02:25,960 It's actually using seven secretly. 1351 01:02:25,960 --> 01:02:28,210 It's actually putting a special character 1352 01:02:28,210 --> 01:02:33,510 of all zeros in the very last byte. 1353 01:02:33,510 --> 01:02:37,080 Every byte is eight bits, so it's putting secretly eight zeros there, 1354 01:02:37,080 --> 01:02:40,980 or we can actually draw this more conventionally as \0. 1355 01:02:40,980 --> 01:02:44,460 It's what's called the null character, and it just means all zeros.
1356 01:02:44,460 --> 01:02:46,890 So the length of the string, Zamyla, is six, 1357 01:02:46,890 --> 01:02:50,550 but how many bytes does it apparently take up, just to be clear? 1358 01:02:50,550 --> 01:02:52,030 So it actually takes up seven. 1359 01:02:52,030 --> 01:02:54,360 And this is kind of a secret implementation detail 1360 01:02:54,360 --> 01:02:57,360 that we don't really have to care about, but eventually, we will, 1361 01:02:57,360 --> 01:02:59,250 because if we want to implement certain functionality, 1362 01:02:59,250 --> 01:03:01,390 we're going to need to know what is actually going on. 1363 01:03:01,390 --> 01:03:03,260 So for instance, let me go ahead and do this. 1364 01:03:03,260 --> 01:03:07,200 Let me go ahead and create a program called strlen itself. 1365 01:03:07,200 --> 01:03:10,440 So this is not a function but a program called strlen.c. 1366 01:03:10,440 --> 01:03:13,420 Let me go ahead and include the CS50 library at the top. 1367 01:03:13,420 --> 01:03:15,780 Let me go ahead and include stdio.h. 1368 01:03:15,780 --> 01:03:20,160 Let me go ahead and type out main void, so all this is same as always. 1369 01:03:20,160 --> 01:03:24,480 And then let me go ahead and prompt the user for, say, his or her name, 1370 01:03:24,480 --> 01:03:25,620 like so. 1371 01:03:25,620 --> 01:03:26,580 And then you know what? 1372 01:03:26,580 --> 01:03:28,530 Let me actually, this time, not just print their name out, 1373 01:03:28,530 --> 01:03:29,980 because we've done that ad nauseam. 1374 01:03:29,980 --> 01:03:32,930 Let's just count the number of letters in his or her name. 1375 01:03:32,930 --> 01:03:33,930 So how could we do that? 1376 01:03:33,930 --> 01:03:40,110 Well, we could just do this-- int n get strlen of s, and then say, 1377 01:03:40,110 --> 01:03:45,010 printf "The length of your name is %i." 
1378 01:03:45,010 --> 01:03:48,120 And then we can plug in n, because that's 1379 01:03:48,120 --> 01:03:49,860 the number we stored the length in. 1380 01:03:49,860 --> 01:03:52,560 But to use strlen, I have to include what header file? 1381 01:03:52,560 --> 01:03:56,410 String.h, which is the new one, so string.h. 1382 01:03:56,410 --> 01:04:02,940 And now if I type this all correctly, make strlen, make strlen, good. 1383 01:04:02,940 --> 01:04:05,280 ./strlen-- let's try it-- 1384 01:04:05,280 --> 01:04:06,130 Zamyla. 1385 01:04:06,130 --> 01:04:06,990 Enter. 1386 01:04:06,990 --> 01:04:08,760 OK, the length of her name is six. 1387 01:04:08,760 --> 01:04:10,340 But what is strlen doing? 1388 01:04:10,340 --> 01:04:13,140 Well, strlen is just an abstraction for us that someone else wrote, 1389 01:04:13,140 --> 01:04:16,530 and it's wonderfully convenient, but you know, we don't strictly need it. 1390 01:04:16,530 --> 01:04:18,250 I can actually do this myself. 1391 01:04:18,250 --> 01:04:20,880 If I understand what the computer is doing, 1392 01:04:20,880 --> 01:04:24,040 I can implement this same functionality myself as follows. 1393 01:04:24,040 --> 01:04:26,880 I can declare a variable called n and initialize it to 0, 1394 01:04:26,880 --> 01:04:27,840 and then you know what? 1395 01:04:27,840 --> 01:04:29,250 I'm going to go ahead and do this. 1396 01:04:29,250 --> 01:04:36,490 While s bracket n does not equal all zeros, 1397 01:04:36,490 --> 01:04:38,160 but you don't write all zeros like this. 1398 01:04:38,160 --> 01:04:39,630 You literally do this-- 1399 01:04:39,630 --> 01:04:42,750 that \0 to which I referred earlier, in single quotes. 1400 01:04:42,750 --> 01:04:45,390 That just means all zeros in the byte. 1401 01:04:45,390 --> 01:04:47,990 And now I can go ahead and do n++.
1402 01:04:47,990 --> 01:04:49,950 If you're unfamiliar with what this means, remember 1403 01:04:49,950 --> 01:04:54,390 that this is just n equals n plus 1, but it's just a little more compact to say, 1404 01:04:54,390 --> 01:04:55,710 n++. 1405 01:04:55,710 --> 01:04:57,750 And then I can print out the name of your n-- 1406 01:04:57,750 --> 01:04:59,250 the name of your n-- 1407 01:04:59,250 --> 01:05:03,930 the name of-- the length of your name is %i, plugging in n. 1408 01:05:03,930 --> 01:05:05,040 So why does this work? 1409 01:05:05,040 --> 01:05:07,200 It's a little funky-looking, but this is just 1410 01:05:07,200 --> 01:05:09,330 demonstrating an understanding of what's going on 1411 01:05:09,330 --> 01:05:10,690 underneath the proverbial hood. 1412 01:05:10,690 --> 01:05:14,640 If n is initialized to zero, and I look at s bracket n, 1413 01:05:14,640 --> 01:05:16,470 well, that's like looking at s bracket 0. 1414 01:05:16,470 --> 01:05:21,160 And if the string, s, is Zamyla, what is s bracket 0? 1415 01:05:21,160 --> 01:05:24,250 Z. And then it does not equal \0. 1416 01:05:24,250 --> 01:05:25,780 It equals Z, obviously. 1417 01:05:25,780 --> 01:05:26,710 So we increment n. 1418 01:05:26,710 --> 01:05:28,300 So now n is 1. 1419 01:05:28,300 --> 01:05:29,150 Now n is 1. 1420 01:05:29,150 --> 01:05:32,340 So what is s bracket 1 in Zamyla's name? 1421 01:05:32,340 --> 01:05:38,140 A and so forth, and we get to Z-A-M-Y-L-A, then all zeros, 1422 01:05:38,140 --> 01:05:41,590 the so-called null character, or \0. 1423 01:05:41,590 --> 01:05:44,980 That, of course, does equal \0, so the loop stops, 1424 01:05:44,980 --> 01:05:49,090 thereby leaving the total count or value of n at what it previously was, 1425 01:05:49,090 --> 01:05:51,100 which was 6. 1426 01:05:51,100 --> 01:05:52,210 So that's it.
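The hand-rolled length count walked through above can be sketched as a small function. The name my_strlen is hypothetical, and this illustrates the idea from the lecture rather than how the C library's strlen is actually written.

```c
// Hypothetical helper: counts the characters in s by walking forward until
// it reaches the terminating null character, written '\0' in C.
int my_strlen(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
    {
        n++; // shorthand for n = n + 1
    }
    return n;
}
```

For "Zamyla" the loop stops with n at 6, even though the string occupies seven bytes once the trailing '\0' is counted; sizeof("Zamyla") evaluates to 7 for that reason.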
1427 01:05:52,210 --> 01:05:54,070 Like all underneath the hood, all we have 1428 01:05:54,070 --> 01:05:57,350 is memory laid out like this, top to bottom, left to right, 1429 01:05:57,350 --> 01:06:00,100 and yet all of the functionality we've been using for a week now 1430 01:06:00,100 --> 01:06:03,850 and henceforth just boils down to some relatively simple primitives, 1431 01:06:03,850 --> 01:06:05,650 and if you understand those primitives, you 1432 01:06:05,650 --> 01:06:08,860 can do anything you want using the computer, both computationally 1433 01:06:08,860 --> 01:06:11,350 code-wise, but also memory-wise. 1434 01:06:11,350 --> 01:06:14,830 We can actually see, in fact, some of the stuff we looked at two weeks ago as 1435 01:06:14,830 --> 01:06:15,440 follows. 1436 01:06:15,440 --> 01:06:18,520 Let me go ahead and open up an example called ASCII 0. 1437 01:06:18,520 --> 01:06:22,090 Recall that ASCII is the mapping between letters and numbers in a computer. 1438 01:06:22,090 --> 01:06:23,890 And notice what this program's going to do. 1439 01:06:23,890 --> 01:06:26,050 Make-- let me go into this folder. 1440 01:06:26,050 --> 01:06:30,250 Make ascii0, ./ascii0, Enter. 1441 01:06:30,250 --> 01:06:34,660 The string shall be, let's say, Zamyla, Enter. 1442 01:06:34,660 --> 01:06:38,320 Well, it turns out that if you actually look up 1443 01:06:38,320 --> 01:06:45,700 the ASCII code for Zamyla's name, z is 90, lowercase a is 97, m is 109, 1444 01:06:45,700 --> 01:06:46,610 and so forth. 1445 01:06:46,610 --> 01:06:47,950 There are those characters, and actually, we 1446 01:06:47,950 --> 01:06:49,610 can play the same game we did last week. 1447 01:06:49,610 --> 01:06:53,920 If I do this again on "hi," there's your 72, and there's your 73. 1448 01:06:53,920 --> 01:06:55,170 Where is this coming from? 1449 01:06:55,170 --> 01:06:57,630 Well, now that I know how to manipulate individual strings, 1450 01:06:57,630 --> 01:06:58,810 notice what I can do. 
1451 01:06:58,810 --> 01:07:01,690 I can get a string from the user, just as we always have. 1452 01:07:01,690 --> 01:07:05,020 I can iterate over the length of that string, albeit inefficiently 1453 01:07:05,020 --> 01:07:06,650 using strlen here. 1454 01:07:06,650 --> 01:07:09,130 And then notice this new feature today. 1455 01:07:09,130 --> 01:07:14,830 I can now convert one data type to another, because a char, 1456 01:07:14,830 --> 01:07:20,120 a character, is just eight bits, but presented in the context of characters. 1457 01:07:20,120 --> 01:07:24,300 But it's also just eight bits that you could treat as an integer, a number. 1458 01:07:24,300 --> 01:07:25,600 It's totally context-sensitive. 1459 01:07:25,600 --> 01:07:27,330 If you use Photoshop, it's a graphic. 1460 01:07:27,330 --> 01:07:29,780 If you use a text program, it's a message and so forth. 1461 01:07:29,780 --> 01:07:31,690 So you can encode-- 1462 01:07:31,690 --> 01:07:33,310 change the context. 1463 01:07:33,310 --> 01:07:38,020 So notice here, s bracket i is, of course, the i-th character of Zamyla's 1464 01:07:38,020 --> 01:07:40,570 name, so Z or A or M or whatever. 1465 01:07:40,570 --> 01:07:44,560 But I can convert that i-th character to an integer doing what's called casting. 1466 01:07:44,560 --> 01:07:47,320 You can literally, in parentheses, specify the data type 1467 01:07:47,320 --> 01:07:50,020 you want to convert a value to, and then 1468 01:07:50,020 --> 01:07:52,280 store it in exactly that data type. 1469 01:07:52,280 --> 01:07:54,520 So s bracket i-- convert it to a number. 1470 01:07:54,520 --> 01:07:59,000 Then store it in an actual number variable, so I can print out its value. 1471 01:07:59,000 --> 01:08:01,510 So %c-- this is show me the character. 1472 01:08:01,510 --> 01:08:06,890 Show me the letter by plugging in the character, and then the letter-- 1473 01:08:06,890 --> 01:08:09,730 sorry, the character and the number that I've just converted it to.
1474 01:08:09,730 --> 01:08:11,720 And you don't actually even have to be explicit. 1475 01:08:11,720 --> 01:08:13,770 This is called explicit casting. 1476 01:08:13,770 --> 01:08:17,260 Technically, we can do this implicitly, too. 1477 01:08:17,260 --> 01:08:19,420 And the computer knows that numbers are characters, 1478 01:08:19,420 --> 01:08:20,680 and characters are a number. 1479 01:08:20,680 --> 01:08:22,520 You don't have to be so pedantic and even do 1480 01:08:22,520 --> 01:08:24,130 the explicit casting in parentheses. 1481 01:08:24,130 --> 01:08:27,550 You can just do it implicitly with data types, and honestly, at this point, 1482 01:08:27,550 --> 01:08:29,100 I don't even need the variable. 1483 01:08:29,100 --> 01:08:32,620 I can get rid of this, and down here, I can literally just 1484 01:08:32,620 --> 01:08:36,310 print the same thing twice, but tell printf 1485 01:08:36,310 --> 01:08:39,310 to print the first in the context of a character 1486 01:08:39,310 --> 01:08:43,120 and the second in the context of an int, just treating the exact same bits 1487 01:08:43,120 --> 01:08:44,380 differently. 1488 01:08:44,380 --> 01:08:46,420 That's implicit casting. 1489 01:08:46,420 --> 01:08:48,580 And it just demonstrates what we did in week 0 1490 01:08:48,580 --> 01:08:51,010 when we claimed that letters are numbers, 1491 01:08:51,010 --> 01:08:54,970 and numbers can also be colors, and colors can be images, and so forth. 1492 01:08:54,970 --> 01:08:55,770 Is this a question? 1493 01:08:55,770 --> 01:08:57,200 AUDIENCE: Would've been useful for credit. 1494 01:08:57,200 --> 01:08:57,600 DAVID J. MALAN: Also, yes. 1495 01:08:57,600 --> 01:08:58,930 It all comes back to credit. 1496 01:08:58,930 --> 01:08:59,520 Yeah. 1497 01:08:59,520 --> 01:09:00,460 Indeed. 1498 01:09:00,460 --> 01:09:01,810 Other questions? 1499 01:09:01,810 --> 01:09:02,470 No. 1500 01:09:02,470 --> 01:09:06,100 All right, so what else can we actually do with this appreciation? 
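The two kinds of casting described above can be sketched side by side. The names ascii_code and print_codes are hypothetical, the string is passed in rather than read with get_string, and the numbers mentioned assume an ASCII character set, as the lecture does.

```c
#include <stdio.h>

// Explicit casting: name the target type in parentheses.
int ascii_code(char c)
{
    return (int) c;
}

// Implicit casting: hand printf the same character twice and let the format
// codes decide how the bits are shown, %c as a character, %i as an integer.
void print_codes(const char *s)
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        printf("%c %i\n", s[i], s[i]);
    }
}
```

With ASCII, ascii_code('Z') is 90 and ascii_code('a') is 97, matching the numbers read off the screen for Zamyla's name.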
1501 01:09:06,100 --> 01:09:08,990 So super simple feature that all of us surely take for granted, 1502 01:09:08,990 --> 01:09:10,540 if we even use it anymore these days. 1503 01:09:10,540 --> 01:09:13,420 Google Docs, Microsoft Word, and such can automatically 1504 01:09:13,420 --> 01:09:14,950 capitalize words for you these days. 1505 01:09:14,950 --> 01:09:16,600 I mean your phone can do it nowadays. 1506 01:09:16,600 --> 01:09:18,430 They just sort of AutoCorrect your messages. 1507 01:09:18,430 --> 01:09:20,080 Well, how is that actually working? 1508 01:09:20,080 --> 01:09:22,750 Well, once you know that a string is just a bunch of characters 1509 01:09:22,750 --> 01:09:26,410 back to back to back, and you know that these characters have numbers 1510 01:09:26,410 --> 01:09:32,210 representing them, and like capital A is 65, and lowercase A is 97, apparently, 1511 01:09:32,210 --> 01:09:34,950 and so forth, we can leverage these patterns. 1512 01:09:34,950 --> 01:09:36,790 If I go ahead and open up this other example 1513 01:09:36,790 --> 01:09:40,960 here called Capitalize 0, notice what this program is 1514 01:09:40,960 --> 01:09:43,000 going to do for me first by running it. 1515 01:09:43,000 --> 01:09:47,680 Make capitalize 0 ./capitalize0. 1516 01:09:47,680 --> 01:09:50,680 Let me go ahead and type in Zamyla's name just as before, but now 1517 01:09:50,680 --> 01:09:51,750 it's all capital. 1518 01:09:51,750 --> 01:09:52,930 So this is a little extreme. 1519 01:09:52,930 --> 01:09:55,220 Hopefully, your phone is not capitalizing every letter, 1520 01:09:55,220 --> 01:09:58,310 but you can imagine it capitalizing just the first, if you wanted it. 1521 01:09:58,310 --> 01:09:59,540 So how does this work? 1522 01:09:59,540 --> 01:10:03,220 Well, let me go ahead and open up this example here. 1523 01:10:03,220 --> 01:10:04,930 And so what we did-- 1524 01:10:04,930 --> 01:10:08,490 so here, I'm getting a string from the user, just as we always do. 
1525 01:10:08,490 --> 01:10:11,790 Then I'm saying, after, just to kind of format the output nicely. 1526 01:10:11,790 --> 01:10:15,840 Here, I'm doing a loop pretty efficiently from i equals 0 up 1527 01:10:15,840 --> 01:10:17,500 to the length of the string. 1528 01:10:17,500 --> 01:10:20,250 And now notice this neat application of logic. 1529 01:10:20,250 --> 01:10:22,400 It's a little cryptic, certainly, at first glance. 1530 01:10:22,400 --> 01:10:23,020 But whoops. 1531 01:10:23,020 --> 01:10:23,770 And now it's gone. 1532 01:10:23,770 --> 01:10:27,580 And what am I doing exactly with these lines of code? 1533 01:10:27,580 --> 01:10:31,080 Well, with every iteration of this loop, I'm asking the question, 1534 01:10:31,080 --> 01:10:33,990 is the i-th character of s, so the current character, 1535 01:10:33,990 --> 01:10:37,830 is it greater than or equal to lowercase A, and is it less than 1536 01:10:37,830 --> 01:10:39,480 or equal to lowercase Z? 1537 01:10:39,480 --> 01:10:42,960 Put another way, how do you say that more colloquially in English? 1538 01:10:42,960 --> 01:10:44,340 Is it lowercase, literally. 1539 01:10:44,340 --> 01:10:47,640 But this is the more programmatic way of expressing, is it lowercase? 1540 01:10:47,640 --> 01:10:49,680 All right, if it is, go ahead and do this. 1541 01:10:49,680 --> 01:10:53,460 Now this is a little funky, but print out a character, specifically 1542 01:10:53,460 --> 01:10:58,560 the i-th character, but subtract from that lowercase letter whatever 1543 01:10:58,560 --> 01:11:05,230 the difference is between little A and big A. Now where did that come from? 1544 01:11:05,230 --> 01:11:06,120 So it turns out-- 1545 01:11:06,120 --> 01:11:08,730 OK, capital A is 65. 1546 01:11:08,730 --> 01:11:10,870 Lowercase A is 97. 1547 01:11:10,870 --> 01:11:13,270 So the difference between those is 32. 1548 01:11:13,270 --> 01:11:18,180 And that's true for B, so capital B is 66, and lowercase B is 98. 
1549 01:11:18,180 --> 01:11:20,590 Still 32, and it repeats for the whole alphabet. 1550 01:11:20,590 --> 01:11:22,890 So I could just do this. 1551 01:11:22,890 --> 01:11:27,900 If I know that lowercase letters have bigger numbers, like 97, 98, 1552 01:11:27,900 --> 01:11:32,010 and I know that uppercase letters have lower numbers, like 65, 66, 1553 01:11:32,010 --> 01:11:35,450 I can just literally subtract off 32 from my lowercase letters. 1554 01:11:35,450 --> 01:11:37,200 As you point out, it's a lowercase letter. 1555 01:11:37,200 --> 01:11:40,910 Subtract 32, and that gives us what result? 1556 01:11:40,910 --> 01:11:42,170 The capitalized version. 1557 01:11:42,170 --> 01:11:43,740 It uppercases things for us. 1558 01:11:43,740 --> 01:11:46,550 But honestly, this feels a little hackish that, like, OK, yes, 1559 01:11:46,550 --> 01:11:48,510 I can do the math correctly, but you know what? 1560 01:11:48,510 --> 01:11:50,630 It's better practice, generally, to abstract this away. 1561 01:11:50,630 --> 01:11:53,090 Don't get into the weeds of counting how many characters are away 1562 01:11:53,090 --> 01:11:53,840 from each other. 1563 01:11:53,840 --> 01:11:55,730 Math is cheap and easy in the computer. 1564 01:11:55,730 --> 01:11:58,460 Let it do the math for you by subtracting whatever the value 1565 01:11:58,460 --> 01:12:04,370 of capital A is from the value of lowercase A. Or we could just write 32. 1566 01:12:04,370 --> 01:12:07,370 Otherwise, go ahead and just print the character unchanged. 1567 01:12:07,370 --> 01:12:11,030 So in this case, the A-M-Y-L-A in Zamyla's name got uppercased, 1568 01:12:11,030 --> 01:12:13,930 and everything else, the Z, got left alone, 1569 01:12:13,930 --> 01:12:18,490 just by understanding how the computer represents these characters. 1570 01:12:18,490 --> 01:12:21,110 But honestly, God, I don't want to keep writing code like this. 1571 01:12:21,110 --> 01:12:22,530 Like, I'm never going to get this.
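The arithmetic just described can be sketched as below. Note the swap: the program in lecture prints each converted character, whereas this hypothetical capitalize function rewrites a caller-supplied buffer in place so the result is easy to inspect. It also assumes ASCII, where a through z are contiguous and sit 32 above A through Z.

```c
// Hypothetical helper: uppercases s in place using only comparisons and
// subtraction, leaving every non-lowercase character untouched.
void capitalize(char *s)
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        // Is the i-th character lowercase?
        if (s[i] >= 'a' && s[i] <= 'z')
        {
            // Let the computer work out the gap between 'a' and 'A'
            // (32 in ASCII) rather than hard-coding it.
            s[i] = s[i] - ('a' - 'A');
        }
    }
}
```

Given a writable array such as char name[] = "Zamyla";, calling capitalize(name) leaves ZAMYLA in the array. A string literal should not be passed directly, because modifying a literal is undefined behavior in C.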
1572 01:12:22,530 --> 01:12:23,960 I'm new to programming, perhaps. 1573 01:12:23,960 --> 01:12:26,990 I'm never going to get this sort of sequence of all the cryptic symbols 1574 01:12:26,990 --> 01:12:30,230 together, and that's OK, because we can actually implement this same program 1575 01:12:30,230 --> 01:12:32,960 a little more easily, thanks to functions 1576 01:12:32,960 --> 01:12:35,390 and abstractions that others have written for us. 1577 01:12:35,390 --> 01:12:38,960 So in this program, turns out I can simplify 1578 01:12:38,960 --> 01:12:43,880 the questions I'm asking by literally calling a function called islower. 1579 01:12:43,880 --> 01:12:45,590 And there's another one called isupper, 1580 01:12:45,590 --> 01:12:48,130 and there's bunches of others that just literally are called, 1581 01:12:48,130 --> 01:12:49,280 is something or other. 1582 01:12:49,280 --> 01:12:53,300 So islower takes an argument like the i-th character of s, 1583 01:12:53,300 --> 01:12:55,760 and it just returns a bool-- true or false. 1584 01:12:55,760 --> 01:12:57,110 How is it implemented? 1585 01:12:57,110 --> 01:13:00,530 Well, honestly, if we looked at the code that someone else wrote decades ago 1586 01:13:00,530 --> 01:13:03,680 for isupper, odds are-- or islower-- 1587 01:13:03,680 --> 01:13:07,250 odds are he or she wrote code that looks almost like this. 1588 01:13:07,250 --> 01:13:10,050 But we don't need to worry about that level of detail. 1589 01:13:10,050 --> 01:13:12,590 We can just use his or her function, but how do we do that? 1590 01:13:12,590 --> 01:13:15,050 Turns out that this function-- and you would only know this 1591 01:13:15,050 --> 01:13:17,380 by having been told or Googling or reading a reference-- 1592 01:13:17,380 --> 01:13:20,630 is in a library called ctype.h. 1593 01:13:20,630 --> 01:13:25,090 And you need the header file called ctype.h in order to use it.
1594 01:13:25,090 --> 01:13:27,800 And we'll almost always point you to references and documentation 1595 01:13:27,800 --> 01:13:29,470 to explain that to you. 1596 01:13:29,470 --> 01:13:31,780 Toupper is another feature, right? 1597 01:13:31,780 --> 01:13:33,050 This math-- like, my god. 1598 01:13:33,050 --> 01:13:34,070 I just want to uppercase a letter. 1599 01:13:34,070 --> 01:13:36,830 I don't want to really keep thinking about how far apart uppercase letters 1600 01:13:36,830 --> 01:13:37,880 are from lowercase. 1601 01:13:37,880 --> 01:13:39,980 Turns out that in the ctype library, there's 1602 01:13:39,980 --> 01:13:43,280 another function called toupper that literally does the exact same thing 1603 01:13:43,280 --> 01:13:45,350 as the previous program we wrote. 1604 01:13:45,350 --> 01:13:47,810 And so that, too, is OK. 1605 01:13:47,810 --> 01:13:48,560 But you know what? 1606 01:13:48,560 --> 01:13:50,870 This feels a little verbose. 1607 01:13:50,870 --> 01:13:53,510 It would be nice if I could really tighten this program up. 1608 01:13:53,510 --> 01:13:55,670 So how does toupper work? 1609 01:13:55,670 --> 01:13:58,790 Well, it turns out some of you might be familiar with CS50 Reference 1610 01:13:58,790 --> 01:14:00,890 Online, our web-based app that we have that 1611 01:14:00,890 --> 01:14:03,200 helps you navigate available functions in C. 1612 01:14:03,200 --> 01:14:06,050 Turns out that all of the data for that application 1613 01:14:06,050 --> 01:14:09,830 comes from an older command line program that comes in Linux 1614 01:14:09,830 --> 01:14:12,590 and comes in the sandbox called man, for manual. 1615 01:14:12,590 --> 01:14:16,070 And anytime you type "man" at the command prompt, and then the name 1616 01:14:16,070 --> 01:14:18,600 of a function you're interested in, if it exists, 1617 01:14:18,600 --> 01:14:20,480 it will tell you a little something about it.
1618 01:14:20,480 --> 01:14:27,080 So if I go to toupper, man toupper, I get slightly cryptic documentation 1619 01:14:27,080 --> 01:14:27,860 here. 1620 01:14:27,860 --> 01:14:30,260 But notice, toupper and some other functions 1621 01:14:30,260 --> 01:14:31,940 convert uppercase or lowercase. 1622 01:14:31,940 --> 01:14:33,410 That's the summary. 1623 01:14:33,410 --> 01:14:36,600 Notice that in the synopsis, the man page, so to speak, 1624 01:14:36,600 --> 01:14:39,560 is telling me what header file I have to include. 1625 01:14:39,560 --> 01:14:41,810 Notice that under Synopsis, it's also telling me 1626 01:14:41,810 --> 01:14:44,970 what the signature or prototype is of the function. 1627 01:14:44,970 --> 01:14:48,050 In other words, the documentation in Man, the Linux programmer's manual, 1628 01:14:48,050 --> 01:14:48,850 is very terse. 1629 01:14:48,850 --> 01:14:51,650 So it's not going to hold your hand in this black and white format. 1630 01:14:51,650 --> 01:14:54,020 It's just going to convey, well, implicitly, 1631 01:14:54,020 --> 01:14:55,830 you better put this on top of your file. 1632 01:14:55,830 --> 01:14:57,870 And by the way, this is how you use the function. 1633 01:14:57,870 --> 01:15:03,810 It takes an argument called C, returns a value of type int. 1634 01:15:03,810 --> 01:15:06,120 Why is it int? 1635 01:15:06,120 --> 01:15:07,330 Let me wave my hands at that. 1636 01:15:07,330 --> 01:15:10,550 It effectively returns a character for our purposes today. 1637 01:15:10,550 --> 01:15:12,980 And if we scroll down, OK, description. 1638 01:15:12,980 --> 01:15:16,400 Ugh, I don't really want to read all of this, but OK, here we go. 1639 01:15:16,400 --> 01:15:21,010 If c is a lowercase letter, toupper returns its uppercase equivalent, 1640 01:15:21,010 --> 01:15:23,510 if an uppercase representation exists in the current locale. 1641 01:15:23,510 --> 01:15:26,300 That just means if it's punctuation, it's not going to do anything. 
1642 01:15:26,300 --> 01:15:29,900 Otherwise, it returns c. And that's kind of the key detail. 1643 01:15:29,900 --> 01:15:33,770 If I pass it lowercase A, it's going to give me capital A, 1644 01:15:33,770 --> 01:15:36,650 but if I pass it capital A, what's it going to give me? 1645 01:15:36,650 --> 01:15:37,480 AUDIENCE: Capital A. 1646 01:15:37,480 --> 01:15:40,390 DAVID J. MALAN: Also, capital A. It returns the original character, c. 1647 01:15:40,390 --> 01:15:42,200 That's the only detail I cared about. 1648 01:15:42,200 --> 01:15:43,930 When in doubt, read the manual. 1649 01:15:43,930 --> 01:15:45,680 And it might be a little cryptic, and this 1650 01:15:45,680 --> 01:15:48,320 is why CS50 Reference takes somewhat cryptic documentation 1651 01:15:48,320 --> 01:15:50,860 and tries to simplify it into more human-friendly terms. 1652 01:15:50,860 --> 01:15:53,490 But at the end of the day, these are the authoritative answers. 1653 01:15:53,490 --> 01:15:55,820 And if I or one of the staff don't know, we literally 1654 01:15:55,820 --> 01:15:59,450 pull up the man page or CS50 Reference to answer these kinds of questions. 1655 01:15:59,450 --> 01:16:01,070 Now what's the implication? 1656 01:16:01,070 --> 01:16:02,660 I don't need any of this. 1657 01:16:02,660 --> 01:16:06,560 I can literally get rid of the condition and just let 1658 01:16:06,560 --> 01:16:10,790 toupper do all of the legwork, and now my program 1659 01:16:10,790 --> 01:16:13,880 is so much more compact than the previous versions were, 1660 01:16:13,880 --> 01:16:15,380 because I've read the documentation. 1661 01:16:15,380 --> 01:16:18,860 I know what the function does, and I can let toupper uppercase something 1662 01:16:18,860 --> 01:16:21,000 or just pass it through unchanged. 1663 01:16:21,000 --> 01:16:23,840 This is better design, because we're writing fewer lines of code that 1664 01:16:23,840 --> 01:16:29,600 are just as clear, and so we can now actually tighten things up.
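As a rough sketch of that tightened version: because toupper already returns non-lowercase characters unchanged, the loop needs no condition at all. The function name uppercase_ctype and the output buffer are assumptions for illustration, and the cast to unsigned char is a defensive habit the lecture does not mention.

```c
#include <ctype.h>

// Hypothetical helper (not from the lecture): uppercases s into out using
// toupper from ctype.h. out must have room for s plus the terminating '\0'.
void uppercase_ctype(const char *s, char *out)
{
    int i;
    for (i = 0; s[i] != '\0'; i++)
    {
        // toupper takes and returns an int; lowercase letters come back
        // uppercased, and everything else comes back as it was passed in
        out[i] = (char) toupper((unsigned char) s[i]);
    }
    out[i] = '\0';
}
```

The body of the loop shrinks to one line, which is the payoff of reading the documentation first.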
1665 01:16:29,600 --> 01:16:33,690 Any questions on this particular approach? 1666 01:16:33,690 --> 01:16:34,190 All right. 1667 01:16:34,190 --> 01:16:35,520 So we're getting very low level. 1668 01:16:35,520 --> 01:16:38,510 Now let's make these things more useful, because clearly, other people 1669 01:16:38,510 --> 01:16:40,250 have solved some of these problems for us, 1670 01:16:40,250 --> 01:16:44,180 as by having these functions and the ctype library and the string library. 1671 01:16:44,180 --> 01:16:45,360 What more is there? 1672 01:16:45,360 --> 01:16:49,910 Well, recall that every time we run clang, or even run make, 1673 01:16:49,910 --> 01:16:52,400 we're typing multiple words at the command prompt. 1674 01:16:52,400 --> 01:16:56,030 You're typing make hello or make mario, a second word, 1675 01:16:56,030 --> 01:16:59,240 or you're typing clang -o hello hello.c, 1676 01:16:59,240 --> 01:17:01,700 like lots of words at the prompt. 1677 01:17:01,700 --> 01:17:04,430 Well, it turns out that all this time, you're using, indeed, 1678 01:17:04,430 --> 01:17:05,630 command line arguments. 1679 01:17:05,630 --> 01:17:10,340 But in C, you can write programs that also accept words and numbers when 1680 01:17:10,340 --> 01:17:11,890 the user runs the program. 1681 01:17:11,890 --> 01:17:12,810 Think back, after all. 1682 01:17:12,810 --> 01:17:15,610 When you ran Mario, you did ./mario, Enter. 1683 01:17:15,610 --> 01:17:17,570 You couldn't type any more words at the prompt. 1684 01:17:17,570 --> 01:17:20,360 When you did credit, you did ./credit, Enter. 1685 01:17:20,360 --> 01:17:21,530 No more words at the prompt. 1686 01:17:21,530 --> 01:17:24,830 You used get_string or get_long to get more input, but not 1687 01:17:24,830 --> 01:17:26,100 at the command line. 1688 01:17:26,100 --> 01:17:29,720 And it turns out that we can do that, relatively simply, in C, 1689 01:17:29,720 --> 01:17:31,940 but it's a little cryptic at first glance.
1690 01:17:31,940 --> 01:17:33,980 Let me go ahead and-- 1691 01:17:33,980 --> 01:17:41,370 let me go ahead and, here, pull up this signature here, which looks like this. 1692 01:17:41,370 --> 01:17:45,450 This is the function that we're all used to by now for writing a main function. 1693 01:17:45,450 --> 01:17:47,070 And up until now, we've said void. 1694 01:17:47,070 --> 01:17:50,110 Main doesn't take any inputs, and indeed, it just runs. 1695 01:17:50,110 --> 01:17:54,150 But it turns out if you change your existing programs or future programs, 1696 01:17:54,150 --> 01:17:58,110 not to say void, but to say, int argc, string argv[], 1697 01:17:58,110 --> 01:18:00,030 it's a little cryptic at first glance. 1698 01:18:00,030 --> 01:18:04,350 But what's a recognizable symbol now? 1699 01:18:04,350 --> 01:18:05,780 Yeah, there's brackets here. 1700 01:18:05,780 --> 01:18:08,530 So it turns out that every time you write a program, 1701 01:18:08,530 --> 01:18:11,680 if you don't just say void, you actually enable this feature 1702 01:18:11,680 --> 01:18:13,690 by writing int argc, string argv[]. 1703 01:18:13,690 --> 01:18:15,790 You can actually tell Clang, you know what? 1704 01:18:15,790 --> 01:18:20,350 I want this program to accept one or more words or numbers after the name 1705 01:18:20,350 --> 01:18:23,950 of the program, so I can do ./hello David, or ./hello Zamyla. 1706 01:18:23,950 --> 01:18:27,940 I don't have to wait for the program to be running to use get_string. 1707 01:18:27,940 --> 01:18:34,000 And just as with the earlier example, where you were able to chart an array, 1708 01:18:34,000 --> 01:18:38,170 main is defined as taking an array, called argv for historical reasons-- 1709 01:18:38,170 --> 01:18:39,070 argument vector. 1710 01:18:39,070 --> 01:18:40,630 Vector means array.
1711 01:18:40,630 --> 01:18:43,510 Argument vector, bracket, closed bracket just means this is-- 1712 01:18:43,510 --> 01:18:46,900 this contains one or more words, each of which is a string. 1713 01:18:46,900 --> 01:18:49,600 Argc is argument count, so this is the variable 1714 01:18:49,600 --> 01:18:52,240 that main gets access to that tells it how many arguments, 1715 01:18:52,240 --> 01:18:55,410 how many strings are actually in argv. 1716 01:18:55,410 --> 01:18:58,420 So how can we use this in a useful way? 1717 01:18:58,420 --> 01:19:01,780 Well, let me go ahead here and open up the sandbox. 1718 01:19:01,780 --> 01:19:06,960 And let me go ahead and create a new file called, say, argv0, argv0.c-- 1719 01:19:06,960 --> 01:19:10,590 again, argument vector, just list or array of arguments. 1720 01:19:10,590 --> 01:19:19,210 And let me go ahead and, as usual, include cs50.h, include stdio.h, 1721 01:19:19,210 --> 01:19:26,110 and then int main not void, but int argc, string argv-- 1722 01:19:26,110 --> 01:19:28,810 argv-- open bracket, closed bracket. 1723 01:19:28,810 --> 01:19:31,690 And even if that doesn't come naturally at first, it will eventually. 1724 01:19:31,690 --> 01:19:32,730 And I'm going to do this. 1725 01:19:32,730 --> 01:19:39,040 If the number of arguments passed in equals 2, 1726 01:19:39,040 --> 01:19:45,490 then I'm going to go ahead and do this-- printf, hello %s, comma, 1727 01:19:45,490 --> 01:19:47,740 and here in the past, I've typed a variable name. 1728 01:19:47,740 --> 01:19:49,840 And I now actually have access to a variable. 1729 01:19:49,840 --> 01:19:52,360 Go ahead and do argv bracket 1. 1730 01:19:52,360 --> 01:19:56,200 Else, if the user does not type, apparently, two words, 1731 01:19:56,200 --> 01:20:00,920 let me go ahead and just by default, say, hello world, as we always have. 1732 01:20:00,920 --> 01:20:03,850 Now why-- what is this doing, and how is it doing it? 
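The argv0 program dictated above could be written out roughly like this. It is a sketch under a few assumptions: the decision is pulled into a hypothetical helper, name_to_greet, so it can be exercised without a shell; char * stands in for CS50's string type; and the exact "hello, ..." wording is a guess at what was typed on screen.

```c
#include <stdio.h>

// Hypothetical helper: picks whom to greet from the command-line arguments.
// argv[0] is the program's own name, so a single extra word lands in argv[1].
const char *name_to_greet(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];
    }
    // Zero extra words, or more than one: fall back to the default
    return "world";
}

// Prints the greeting. In a real program, main itself would have the
// signature int main(int argc, char *argv[]) and could simply call
// greet(argc, argv) and then return 0.
void greet(int argc, char *argv[])
{
    printf("hello, %s\n", name_to_greet(argc, argv));
}
```

Checking argc before touching argv[1] matters: indexing past the arguments the user actually typed would read memory the program was never given.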
1733 01:20:03,850 --> 01:20:04,980 Well, let's quickly run it. 1734 01:20:04,980 --> 01:20:07,660 So make-- whoops. 1735 01:20:07,660 --> 01:20:15,100 Make argv0, ./argv0, Enter, Hello World. 1736 01:20:15,100 --> 01:20:17,330 But if I do Hello-- 1737 01:20:17,330 --> 01:20:19,840 or dot-- the program would be better named 1738 01:20:19,840 --> 01:20:23,560 if we called it Hello, but Zamyla, Enter. 1739 01:20:23,560 --> 01:20:24,190 Hello Zamyla. 1740 01:20:24,190 --> 01:20:26,560 If I change it to David, now I have access to David. 1741 01:20:26,560 --> 01:20:29,020 If I had David Malan, no. 1742 01:20:29,020 --> 01:20:30,340 It doesn't support that. 1743 01:20:30,340 --> 01:20:31,330 So what's going on? 1744 01:20:31,830 --> 01:20:34,180 If you change main in any program you write to take 1745 01:20:34,180 --> 01:20:38,410 these two arguments, argc and argv, of type int 1746 01:20:38,410 --> 01:20:41,440 and then an array of strings, argc tells you how many words 1747 01:20:41,440 --> 01:20:42,550 were typed at the prompt. 1748 01:20:42,550 --> 01:20:45,040 So if the human typed two words, I presume 1749 01:20:45,040 --> 01:20:48,700 the first word is the name of the program, dot slash argv0, 1750 01:20:48,700 --> 01:20:51,590 the second word is presumably their name, if he or she is actually 1751 01:20:51,590 --> 01:20:53,050 providing their name at the prompt. 1752 01:20:53,050 --> 01:20:55,690 And so I print out argv bracket 1. 1753 01:20:55,690 --> 01:20:58,630 Not 0 because that's the name of the program, but argv bracket 1. 1754 01:20:58,630 --> 01:21:02,680 Else, down here, if the human doesn't provide just Zamyla, or just David, 1755 01:21:02,680 --> 01:21:07,300 or just one word more generally, I just print the default, "Hello world." 1756 01:21:07,300 --> 01:21:15,400 But what's neat about this now is notice that argv is an array of strings. 1757 01:21:15,400 --> 01:21:18,510 What is a string?
1758 01:21:18,510 --> 01:21:20,340 It's an array of characters. 1759 01:21:20,340 --> 01:21:24,000 And so let's enter just one last piece of syntax that gets kind of powerful 1760 01:21:24,000 --> 01:21:24,510 here. 1761 01:21:24,510 --> 01:21:28,290 Let me go ahead and do this. 1762 01:21:28,290 --> 01:21:33,590 Let me go ahead and, in a new file here, argv1.c. 1763 01:21:33,590 --> 01:21:35,010 Let me go ahead and paste this in. 1764 01:21:35,010 --> 01:21:36,120 Close this. 1765 01:21:36,120 --> 01:21:38,190 Let me go ahead and do this. 1766 01:21:38,190 --> 01:21:43,600 Rather than do this logical checking, let me do this, for-- 1767 01:21:43,600 --> 01:21:48,890 let's say for int i gets 0. 1768 01:21:48,890 --> 01:21:50,330 i is less than argc-- 1769 01:21:50,330 --> 01:21:51,580 i++. 1770 01:21:51,580 --> 01:21:54,910 Let's go ahead and, one per line, print out every word 1771 01:21:54,910 --> 01:21:57,310 that the human just typed, just to reinforce 1772 01:21:57,310 --> 01:21:59,170 that this is indeed what's going on. 1773 01:21:59,170 --> 01:22:01,180 So argv bracket i, save. 1774 01:22:01,180 --> 01:22:05,290 Make argv1, Enter. 1775 01:22:05,290 --> 01:22:07,530 And now let's go ahead and run this program-- 1776 01:22:07,530 --> 01:22:12,590 ./argv1 David Malan. 1777 01:22:12,590 --> 01:22:14,600 OK, you see all three words. 1778 01:22:14,600 --> 01:22:17,570 If we change it to Zamyla, we see just those two words. 1779 01:22:17,570 --> 01:22:20,300 If we change it to Zamyla Chan, we see those three words. 1780 01:22:20,300 --> 01:22:23,210 So we clearly have access to all of the words in the array, 1781 01:22:23,210 --> 01:22:25,230 but let's take this one step further. 1782 01:22:25,230 --> 01:22:28,520 Rather than just print out every word as a string, let's go ahead and do this. 1783 01:22:28,520 --> 01:22:32,480 For int j gets 0.
1784 01:22:32,480 --> 01:22:40,910 n equals the string length of the current argument, like this-- 1785 01:22:40,910 --> 01:22:43,340 j is less than n, j++-- 1786 01:22:43,340 --> 01:22:45,560 oops, oops, oops-- j++. 1787 01:22:45,560 --> 01:22:49,670 Now let me go ahead and print out not the full string, but let me do-- oops, 1788 01:22:49,670 --> 01:22:52,970 oops-- let me go ahead and print out this-- 1789 01:22:52,970 --> 01:23:00,240 not a string, but a character, argv bracket i bracket j, like this. 1790 01:23:00,240 --> 01:23:00,740 All right. 1791 01:23:00,740 --> 01:23:01,700 So what's going on? 1792 01:23:01,700 --> 01:23:07,910 One, this outer loop, and let's comment it, iterate over strings in argv. 1793 01:23:07,910 --> 01:23:13,140 This inner loop, iterate over chars in argv bracket i. 1794 01:23:13,140 --> 01:23:17,240 So the outer loop iterates over all of the strings in argv. 1795 01:23:17,240 --> 01:23:20,990 And the inner loop, using a different variable, starting at 0, 1796 01:23:20,990 --> 01:23:23,930 iterates over all of the characters in the ith 1797 01:23:23,930 --> 01:23:26,630 argument, which itself is a string. 1798 01:23:26,630 --> 01:23:28,660 So we can call string length on it. 1799 01:23:28,660 --> 01:23:31,460 And then we do this up until n, which is the length of that string. 1800 01:23:31,460 --> 01:23:33,210 And then we print out each character. 1801 01:23:33,210 --> 01:23:38,470 So just to be clear-- when I compile argv1 and it complains, at first glance, 1802 01:23:38,470 --> 01:23:42,070 that I'm implicitly declaring library function strlen, what's almost always 1803 01:23:42,070 --> 01:23:44,170 the solution when you do this wrong? 1804 01:23:44,170 --> 01:23:45,050 AUDIENCE: [INAUDIBLE] 1805 01:23:45,050 --> 01:23:45,920 DAVID J. MALAN: Yeah. 1806 01:23:45,920 --> 01:23:49,640 So I forgot this, so include string.h and help50 would 1807 01:23:49,640 --> 01:23:50,600 help with that as well.
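The nested loops described here might look roughly like the following. This is a sketch rather than the exact file from the lecture: the loops are wrapped in a hypothetical function, print_chars, that also returns how many characters it printed, purely so the behavior can be checked; in the lecture the loops sit directly inside main.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical wrapper: prints every character of every argument on its own
// line and returns the total number of characters printed.
int print_chars(int argc, char *argv[])
{
    int count = 0;

    // Iterate over strings in argv
    for (int i = 0; i < argc; i++)
    {
        // Iterate over chars in argv[i]; strlen is computed once per string
        for (int j = 0, n = strlen(argv[i]); j < n; j++)
        {
            // argv[i] is the ith string; argv[i][j] is its jth character
            printf("%c\n", argv[i][j]);
            count++;
        }
    }
    return count;
}
```

Without the include of string.h, the compiler has no declaration for strlen, which is what triggers the "implicitly declaring library function" complaint mentioned above.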
1808 01:23:50,600 --> 01:23:52,830 Let's recompile with make argv1. 1809 01:23:52,830 --> 01:23:53,330 All right. 1810 01:23:53,330 --> 01:24:00,580 When I run argv1, of, say, Zamyla Chan, what am I going to see? 1811 01:24:00,580 --> 01:24:01,980 AUDIENCE: [INAUDIBLE] 1812 01:24:01,980 --> 01:24:03,860 DAVID J. MALAN: Yeah. 1813 01:24:03,860 --> 01:24:05,620 Is that the right intuition? 1814 01:24:05,620 --> 01:24:06,940 AUDIENCE: [INAUDIBLE] 1815 01:24:06,940 --> 01:24:10,370 DAVID J. MALAN: I'm going to see Zamyla Chan, but-- 1816 01:24:10,370 --> 01:24:11,250 AUDIENCE: [INAUDIBLE] 1817 01:24:11,250 --> 01:24:14,140 DAVID J. MALAN: One character on each line, including the program's name. 1818 01:24:14,140 --> 01:24:16,550 So in fact, let me scroll this up so it's a little bigger. 1819 01:24:16,550 --> 01:24:17,160 Enter. 1820 01:24:17,160 --> 01:24:22,650 OK, it's a little stupid, the program, but it does confirm that using arrays 1821 01:24:22,650 --> 01:24:25,140 do I have access not only to the words, but I can kind of 1822 01:24:25,140 --> 01:24:26,310 have the second dimension. 1823 01:24:26,310 --> 01:24:30,450 And within each word, I can get at each character within. 1824 01:24:30,450 --> 01:24:34,740 And we do this, again, just by using not just single square brackets, 1825 01:24:34,740 --> 01:24:35,280 but double. 1826 01:24:35,280 --> 01:24:37,810 And again, just break this down into the first principles. 1827 01:24:37,810 --> 01:24:38,940 What is this first bracket? 1828 01:24:38,940 --> 01:24:41,680 This is the ith argument, the ith string in the array. 1829 01:24:41,680 --> 01:24:43,680 And then if you take it further, with bracket j, 1830 01:24:43,680 --> 01:24:47,880 that gives you the j character inside of this. 1831 01:24:47,880 --> 01:24:51,250 Now, who cares about any of this kind of functionality? 1832 01:24:51,250 --> 01:24:54,610 Well, let me scroll back and propose one application here. 
1833 01:24:54,610 --> 01:24:57,570 So recall that CS is really just problem solving. 1834 01:24:57,570 --> 01:24:59,490 But suppose the problem that you want to solve 1835 01:24:59,490 --> 01:25:02,250 is to actually pass a secret message in class 1836 01:25:02,250 --> 01:25:04,400 or send someone a secret for whatever reason. 1837 01:25:04,400 --> 01:25:06,240 Well, the input to that problem is generally 1838 01:25:06,240 --> 01:25:09,300 called plaintext, a message you want to send to that other person. 1839 01:25:09,300 --> 01:25:12,420 You ideally want ciphertext to emerge from it, 1840 01:25:12,420 --> 01:25:15,570 which is enciphered and scrambled, somehow encrypted information 1841 01:25:15,570 --> 01:25:18,720 so that anyone in the room, like the teacher, can't just grab the note 1842 01:25:18,720 --> 01:25:21,810 and read what you're sending to your secret crush or love across the room, 1843 01:25:21,810 --> 01:25:23,610 or in any other context as well. 1844 01:25:23,610 --> 01:25:26,160 But the problem is that if the message you want to send, say, 1845 01:25:26,160 --> 01:25:29,220 is our old friend Hi!, with an exclamation point, 1846 01:25:29,220 --> 01:25:34,020 you can encode it in certain contexts as just 72, 73, 33. 1847 01:25:34,020 --> 01:25:37,450 And I daresay in most classes on campus if you wrote on a piece of paper 72, 1848 01:25:37,450 --> 01:25:40,910 73, 33, passed it through the room, and whatever professor intercepts it, 1849 01:25:40,910 --> 01:25:43,080 they're not going to know what you're saying anyway. 1850 01:25:43,080 --> 01:25:44,790 But this is not a good system. 1851 01:25:44,790 --> 01:25:46,620 This is not a cryptosystem. 1852 01:25:46,620 --> 01:25:47,460 Why? 1853 01:25:47,460 --> 01:25:48,160 It's not secure. 1854 01:25:52,270 --> 01:25:52,780 [INAUDIBLE] 1855 01:25:52,780 --> 01:25:54,540 [INTERPOSING VOICES] 1856 01:25:54,540 --> 01:25:55,420 DAVID J. MALAN: Yeah.
1857 01:25:55,420 --> 01:25:57,130 Anyone has access to this, right, so long 1858 01:25:57,130 --> 01:26:00,490 as you attend like week 1 or 0 of CS50, or you just 1859 01:26:00,490 --> 01:26:02,320 have general familiarity with Ascii. 1860 01:26:02,320 --> 01:26:04,780 Like this is just a code. 1861 01:26:04,780 --> 01:26:08,260 I mean Ascii is a system that maps letters to numbers. 1862 01:26:08,260 --> 01:26:10,320 And anyone else who knows this code obviously 1863 01:26:10,320 --> 01:26:12,820 knows what your message is, because it's not a unique secret 1864 01:26:12,820 --> 01:26:14,340 to you and the recipient. 1865 01:26:14,340 --> 01:26:16,170 So that's probably not the best idea. 1866 01:26:16,170 --> 01:26:18,160 Well, you can be a little more sophisticated. 1867 01:26:18,160 --> 01:26:19,990 And this is back-- actually, a photograph 1868 01:26:19,990 --> 01:26:23,680 from World War I of a message that was sent from Germany to Mexico 1869 01:26:23,680 --> 01:26:25,780 that was encoded in a very similar way. 1870 01:26:25,780 --> 01:26:26,800 It wasn't using Ascii. 1871 01:26:26,800 --> 01:26:29,080 The numbers, as you can perhaps glean from the photo, 1872 01:26:29,080 --> 01:26:30,430 are actually much larger. 1873 01:26:30,430 --> 01:26:33,850 But in this system, in a militaristic context, there was a code book. 1874 01:26:33,850 --> 01:26:35,740 So similar in spirit to Ascii, where you have 1875 01:26:35,740 --> 01:26:39,650 a column of numbers and a column of letters to which they correspond, 1876 01:26:39,650 --> 01:26:42,970 a codebook more generally has like numbers, and then maybe 1877 01:26:42,970 --> 01:26:45,280 even letters or whole words that they correspond to, 1878 01:26:45,280 --> 01:26:50,230 sometimes thousands of them, like literally a really big book of codes. 
1879 01:26:50,230 --> 01:26:53,800 And so long as only, in this context the Germans and the recipients, 1880 01:26:53,800 --> 01:26:56,500 the Mexicans, had access to that same book, 1881 01:26:56,500 --> 01:27:01,060 only they could encrypt and decrypt, or rather encode and decode information. 1882 01:27:01,060 --> 01:27:03,230 Of course, in this very specific context-- 1883 01:27:03,230 --> 01:27:05,320 you can read more about this in historical texts-- 1884 01:27:05,320 --> 01:27:06,280 this was intercepted. 1885 01:27:06,280 --> 01:27:08,860 This message, seemingly innocuous, though definitely 1886 01:27:08,860 --> 01:27:11,740 suspicious looking with all these numbers, 1887 01:27:11,740 --> 01:27:16,450 so therefore not innocuous, the British, in this case actually, intercepted it. 1888 01:27:16,450 --> 01:27:18,500 And thanks to a lot of efforts and cryptanalysis, 1889 01:27:18,500 --> 01:27:23,740 the Bletchley Park style code breaking, albeit further back, 1890 01:27:23,740 --> 01:27:27,490 were they able to figure out what those numbers represented in words 1891 01:27:27,490 --> 01:27:29,750 and actually decode the message. 1892 01:27:29,750 --> 01:27:31,960 And in fact, here's a photograph of some of the words 1893 01:27:31,960 --> 01:27:34,480 that were translated from one to the other. 1894 01:27:34,480 --> 01:27:38,560 But more on that in any online or textual references. 1895 01:27:38,560 --> 01:27:41,380 Turns out in this poem too there was a similar code, right? 1896 01:27:41,380 --> 01:27:44,860 So apropos of being in Boston here, you might recall this one. 1897 01:27:44,860 --> 01:27:49,090 "Listen my children, and you shall hear of the midnight ride of Paul Revere. 1898 01:27:49,090 --> 01:27:51,880 On the 18th of April in '75, hardly a man 1899 01:27:51,880 --> 01:27:54,910 is now alive who remembers that famous day and year. 
1900 01:27:54,910 --> 01:27:58,360 He said to his friend, if the British march by land or sea 1901 01:27:58,360 --> 01:28:00,490 from the town tonight, hang a lantern 1902 01:28:00,490 --> 01:28:05,020 aloft in the belfry arch of the North Church tower as a signal light, 1903 01:28:05,020 --> 01:28:08,010 one if by land, and two if by sea. 1904 01:28:08,010 --> 01:28:10,510 And I on the opposite shore will be ready to ride and spread 1905 01:28:10,510 --> 01:28:13,630 the alarm through every Middlesex village and farm for the country folk 1906 01:28:13,630 --> 01:28:14,900 to be up and to arm." 1907 01:28:14,900 --> 01:28:17,530 So it turns out some of that is not actually factually correct, 1908 01:28:17,530 --> 01:28:21,760 but the one if by land and the two if by sea code was 1909 01:28:21,760 --> 01:28:23,620 sort of an example of a one-time code. 1910 01:28:23,620 --> 01:28:27,400 Because the revolutionaries in the American Revolution kind of 1911 01:28:27,400 --> 01:28:30,970 decided secretly among themselves literally that-- we will put up one 1912 01:28:30,970 --> 01:28:34,580 light at the top of a church if the British are coming by land. 1913 01:28:34,580 --> 01:28:37,690 And we will instead use two if the British are instead coming by sea. 1914 01:28:37,690 --> 01:28:38,850 Like that is a code. 1915 01:28:38,850 --> 01:28:41,750 And you could write it down in a book, and thus you have a code book. 1916 01:28:41,750 --> 01:28:44,500 But of course, as soon as someone figures out that pattern, 1917 01:28:44,500 --> 01:28:45,670 it's compromised. 1918 01:28:45,670 --> 01:28:49,150 And so code books tend not to be the most robust mechanisms 1919 01:28:49,150 --> 01:28:51,580 for encoding information. 1920 01:28:51,580 --> 01:28:54,730 Instead, it's better to use something more algorithmic.
1921 01:28:54,730 --> 01:28:56,980 And wonderfully, in computer science is this black box 1922 01:28:56,980 --> 01:28:59,800 too-- we keep saying, the home of algorithms. 1923 01:28:59,800 --> 01:29:03,940 And in general, encryption is a problem with inputs and outputs, 1924 01:29:03,940 --> 01:29:05,880 but we just need one more input. 1925 01:29:05,880 --> 01:29:09,080 The input is what's generally called the key, or a secret. 1926 01:29:09,080 --> 01:29:11,450 And a secret might just be a number. 1927 01:29:11,450 --> 01:29:13,680 So for instance, if I wanted my secret to be 1, 1928 01:29:13,680 --> 01:29:16,730 because we'll keep the example simple, but it could really be any number. 1929 01:29:16,730 --> 01:29:18,700 And indeed, we saw with the photograph a moment ago, 1930 01:29:18,700 --> 01:29:21,650 the Germans used numbers much larger than this, albeit in the context of codes. 1931 01:29:21,650 --> 01:29:24,730 Suppose that you now want to send a more private message to someone 1932 01:29:24,730 --> 01:29:26,800 across the room in a class that, I love you. 1933 01:29:26,800 --> 01:29:31,120 How do you go about encoding that in a way that isn't just using Ascii 1934 01:29:31,120 --> 01:29:33,070 and isn't just using some simple code book? 1935 01:29:33,070 --> 01:29:37,490 Well, let me propose that now that we understand how strings are represented, 1936 01:29:37,490 --> 01:29:41,170 right-- we're about to make love really, really lame and geeky-- 1937 01:29:41,170 --> 01:29:44,950 so now that you know how to express strings computationally, 1938 01:29:44,950 --> 01:29:47,710 well, let's just start representing "I love you" in Ascii. 1939 01:29:47,710 --> 01:29:49,150 So I is 73. 1940 01:29:49,150 --> 01:29:50,620 L is 76. 1941 01:29:50,620 --> 01:29:53,320 O-V-E Y-O-U. That's just Ascii. 1942 01:29:53,320 --> 01:29:55,180 Should not send it this way, because anyone 1943 01:29:55,180 --> 01:29:58,240 who knows Ascii is going to know what you're saying.
1944 01:29:58,240 --> 01:30:02,260 But what if I enciphered this message, I performed an algorithm on it? 1945 01:30:02,260 --> 01:30:04,300 And at its simplest, an algorithm can just 1946 01:30:04,300 --> 01:30:06,590 be math-- simple arithmetic, as we've seen. 1947 01:30:06,590 --> 01:30:09,280 So you know, let me just use my secret key of 1. 1948 01:30:09,280 --> 01:30:14,950 And let me make sure that my crush knows that I am using a secret value of 1. 1949 01:30:14,950 --> 01:30:17,680 So he or she also knows to expect that value. 1950 01:30:17,680 --> 01:30:21,640 And before I send my message, I'm going to add 1 to every letter. 1951 01:30:21,640 --> 01:30:23,410 So 73 becomes 74. 1952 01:30:23,410 --> 01:30:24,880 76 becomes 77. 1953 01:30:24,880 --> 01:30:29,560 80, 87, 70, 90, 80, 86. 1954 01:30:29,560 --> 01:30:31,850 Now this could just be sent in the clear. 1955 01:30:31,850 --> 01:30:35,450 But then, I could actually send it as a textual message. 1956 01:30:35,450 --> 01:30:37,090 So let's convert it back to Ascii. 1957 01:30:37,090 --> 01:30:45,640 74 is now J. 77 is now M. 80 is now P. And you can perhaps see the pattern. 1958 01:30:45,640 --> 01:30:48,190 This message was, I love you. 1959 01:30:48,190 --> 01:30:52,740 And now, all of the letters are off by one, I think. 1960 01:30:52,740 --> 01:30:57,700 I became J. L became M. O became P, and so forth. 1961 01:30:57,700 --> 01:31:00,180 So now the claim would be, cryptographically, I'm 1962 01:31:00,180 --> 01:31:02,460 going to send this message across the room. 1963 01:31:02,460 --> 01:31:05,340 And now no one who has a code book is going to be able to solve this. 1964 01:31:05,340 --> 01:31:07,090 I can't just steal the book and decode it, 1965 01:31:07,090 --> 01:31:09,900 because now the key is only up here, so to speak. 
1966 01:31:09,900 --> 01:31:12,030 It's just the number 1 that he or she and I 1967 01:31:12,030 --> 01:31:13,920 had to agree upon in advance that we would 1968 01:31:13,920 --> 01:31:15,670 use for sending our secret messages. 1969 01:31:15,670 --> 01:31:20,460 So if someone captures this message, teacher in the room or whoever, 1970 01:31:20,460 --> 01:31:26,050 how would they even go about decoding this or decrypting it? 1971 01:31:26,050 --> 01:31:29,950 Are there any techniques available to them? 1972 01:31:29,950 --> 01:31:32,590 I daresay we can kind of chip away at this love note. 1973 01:31:32,590 --> 01:31:32,980 AUDIENCE: [INAUDIBLE] 1974 01:31:32,980 --> 01:31:33,760 DAVID J. MALAN: What's that? 1975 01:31:33,760 --> 01:31:34,490 Guess and check. 1976 01:31:34,490 --> 01:31:35,480 OK, we could try all-- 1977 01:31:35,480 --> 01:31:36,860 there's still kind of some spacing. 1978 01:31:36,860 --> 01:31:40,430 So you know honestly, we could do like kind of a cryptanalysis of it, 1979 01:31:40,430 --> 01:31:41,690 a frequency attack. 1980 01:31:41,690 --> 01:31:43,780 Like, I can't think of too many words in English 1981 01:31:43,780 --> 01:31:45,200 that have a single letter in them. 1982 01:31:45,200 --> 01:31:46,590 So what does J probably represent? 1983 01:31:46,590 --> 01:31:47,420 [INTERPOSING VOICES] 1984 01:31:47,420 --> 01:31:48,650 DAVID J. MALAN: I, probably. 1985 01:31:48,650 --> 01:31:52,050 Maybe A, but probably I. And there's not too many other options. 1986 01:31:52,050 --> 01:31:55,070 So we've attacked one part of the message already. 1987 01:31:55,070 --> 01:31:56,390 I see a commonality. 1988 01:31:56,390 --> 01:31:59,170 There's two what in here? 1989 01:31:59,170 --> 01:32:02,800 Two P. And I don't necessarily know that that maps to O, but I do 1990 01:32:02,800 --> 01:32:04,730 know it's the same character.
1991 01:32:04,730 --> 01:32:08,940 So if I kind of continue this thoughtful process or this trial and error, 1992 01:32:08,940 --> 01:32:10,690 and I figure out, oh, what if that's an O? 1993 01:32:10,690 --> 01:32:12,370 And then that's an O. And then wait a minute. 1994 01:32:12,370 --> 01:32:13,930 They're passing from one to another. 1995 01:32:13,930 --> 01:32:15,100 Maybe this says, I love you. 1996 01:32:15,100 --> 01:32:17,680 Like you actually can, with some probability, 1997 01:32:17,680 --> 01:32:20,930 decrypt a message by doing this kind of analysis on it. 1998 01:32:20,930 --> 01:32:22,810 It's at least more secure than the code book, 1999 01:32:22,810 --> 01:32:25,360 because you're not compromised if the book itself is stolen. 2000 01:32:25,360 --> 01:32:28,240 And you can change the key every time, so long as you 2001 01:32:28,240 --> 01:32:30,650 and the recipient actually agree on something. 2002 01:32:30,650 --> 01:32:33,490 But at least we now have this mechanism in place. 2003 01:32:33,490 --> 01:32:36,730 So with just the understanding of what you can do with strings, 2004 01:32:36,730 --> 01:32:39,760 can you actually now do really interesting domain-specific things 2005 01:32:39,760 --> 01:32:40,370 to them? 2006 01:32:40,370 --> 01:32:45,490 And in fact, back in the day, Caesar, back in militaristic times literally 2007 01:32:45,490 --> 01:32:47,250 used a cipher quite like this. 2008 01:32:47,250 --> 01:32:49,750 And frankly, when you're the first one to use these ciphers, 2009 01:32:49,750 --> 01:32:52,580 they actually are kind of secure, even if they're relatively simple. 2010 01:32:52,580 --> 01:32:57,250 But hopefully, not just using a key of 1, maybe 2, or 13, or 25, 2011 01:32:57,250 --> 01:32:58,360 or something larger. 
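The "guess and check" idea from the audience can be sketched in C as well: with only 25 useful keys, someone who captures the message can simply try them all. Nothing below comes from the lecture; `unshift_letters` and `print_all_candidates` are hypothetical names, and the sketch assumes ASCII text with letters wrapping around the alphabet.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: undoes a shift of `key` (0 to 25) on every letter.
// `out` must have room for strlen(cipher) + 1 chars.
void unshift_letters(const char *cipher, int key, char *out)
{
    size_t n = strlen(cipher);
    for (size_t i = 0; i < n; i++)
    {
        char c = cipher[i];
        if (c >= 'A' && c <= 'Z')
        {
            // Adding 26 before taking the remainder keeps the value non-negative
            out[i] = (char) ('A' + (c - 'A' - key + 26) % 26);
        }
        else if (c >= 'a' && c <= 'z')
        {
            out[i] = (char) ('a' + (c - 'a' - key + 26) % 26);
        }
        else
        {
            out[i] = c;
        }
    }
    out[n] = '\0';
}

// Guess and check: print the candidate plaintext for every key from 1 to 25
// and let a human spot the one that reads as English.
void print_all_candidates(const char *cipher)
{
    char guess[256];
    if (strlen(cipher) >= sizeof guess)
    {
        return; // message too long for this sketch's fixed buffer
    }
    for (int key = 1; key <= 25; key++)
    {
        unshift_letters(cipher, key, guess);
        printf("key %2d: %s\n", key, guess);
    }
}
```

This is why a larger key such as 13 or 25 is not meaningfully safer than 1 for this kind of cipher: the loop still finishes after at most 25 guesses.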
2012 01:32:58,360 --> 01:33:01,690 But this is an example of a substitution cipher, 2013 01:33:01,690 --> 01:33:04,540 or a rotational cipher where everything's kind of rotating-- 2014 01:33:04,540 --> 01:33:07,870 A's becoming B, B's becoming C. Or you can kind of 2015 01:33:07,870 --> 01:33:11,060 rotate it even further than that. 2016 01:33:11,060 --> 01:33:14,350 Well, let's take a look at one last example here 2017 01:33:14,350 --> 01:33:17,110 of just one other final primitive of a feature 2018 01:33:17,110 --> 01:33:20,830 today, before we then go back high level to bring everything together. 2019 01:33:20,830 --> 01:33:23,380 It turns out that printing out error messages 2020 01:33:23,380 --> 01:33:27,060 is not the only way to signal that something has gone wrong. 2021 01:33:27,060 --> 01:33:31,890 There's a new keyword, a new use of an old keyword in this example, 2022 01:33:31,890 --> 01:33:33,980 that's actually a convention for signaling errors. 2023 01:33:33,980 --> 01:33:36,550 So this is an example called exit.c. 2024 01:33:36,550 --> 01:33:42,410 It apparently wants the human to do what, if you infer from the code? 2025 01:33:42,410 --> 01:33:43,540 AUDIENCE: Exit [INAUDIBLE]. 2026 01:33:43,540 --> 01:33:44,030 DAVID J. MALAN: Yes. 2027 01:33:44,030 --> 01:33:44,580 Say again? 2028 01:33:44,580 --> 01:33:45,460 AUDIENCE: [INAUDIBLE] 2029 01:33:45,460 --> 01:33:47,410 DAVID J. MALAN: Well, it wants the-- well, what 2030 01:33:47,410 --> 01:33:51,110 does it want the human to do implicitly, based on the printf's here? 2031 01:33:51,110 --> 01:33:53,410 How should I run this program? 2032 01:33:53,410 --> 01:33:53,990 Yeah? 2033 01:33:53,990 --> 01:33:56,770 AUDIENCE: [INAUDIBLE] just apply [INAUDIBLE]. 2034 01:33:56,770 --> 01:33:57,650 DAVID J. MALAN: Yeah. 2035 01:33:57,650 --> 01:34:00,410 So for whatever reason, this program implicitly 2036 01:34:00,410 --> 01:34:03,080 wants me to write exactly two words at the prompt.
2037 01:34:03,080 --> 01:34:06,500 Because if I don't, it's going to yell at me, missing command line argument. 2038 01:34:06,500 --> 01:34:08,630 And then it's going to return 1, whatever that is. 2039 01:34:08,630 --> 01:34:10,850 Otherwise, it's going to say, Hello, such and such. 2040 01:34:10,850 --> 01:34:12,680 So if I actually run this program-- 2041 01:34:12,680 --> 01:34:17,080 let me go back over here and do make exit-- 2042 01:34:17,080 --> 01:34:19,910 oops-- in my directory, make exit. 2043 01:34:19,910 --> 01:34:23,900 OK, dot slash exit, enter, I'm missing a command line argument. 2044 01:34:23,900 --> 01:34:25,400 All right, let me put Zamyla's name. 2045 01:34:25,400 --> 01:34:26,100 Oh, Hello Zamyla. 2046 01:34:26,100 --> 01:34:28,000 Let me put Zamyla Chan. 2047 01:34:28,000 --> 01:34:29,500 Nope, missing command line argument. 2048 01:34:29,500 --> 01:34:33,530 It just wants the one, so in this case here. 2049 01:34:33,530 --> 01:34:36,380 I'm seeing visually the error message, but it turns out 2050 01:34:36,380 --> 01:34:41,510 the computer is also signaling to me what the so-called exit code is. 2051 01:34:41,510 --> 01:34:44,510 So long story short, we've already seen examples last week of how 2052 01:34:44,510 --> 01:34:46,190 you can have a function return a value. 2053 01:34:46,190 --> 01:34:47,990 And we saw how [? Erin ?] came up on stage, 2054 01:34:47,990 --> 01:34:50,480 and she returned to me a piece of paper with a string on it. 2055 01:34:50,480 --> 01:34:52,820 But it turns out that main is a little special. 2056 01:34:52,820 --> 01:34:58,820 If main returns a value like 1 or 0, you can actually see that, 2057 01:34:58,820 --> 01:35:01,790 albeit in a kind of a non-obvious way. 
2058 01:35:01,790 --> 01:35:06,920 If I run exit, and I run it correctly with Zamyla as the name, 2059 01:35:06,920 --> 01:35:10,910 if I then type echo, dollar sign, question mark, of all things, 2060 01:35:10,910 --> 01:35:15,990 enter, I will then see exactly what main returned with, which in this case is 0. 2061 01:35:15,990 --> 01:35:17,540 Now, let me try and be uncooperative. 2062 01:35:17,540 --> 01:35:23,570 If I actually run just dot slash exit, with no word, 2063 01:35:23,570 --> 01:35:25,230 I see, missing command line argument. 2064 01:35:25,230 --> 01:35:29,030 But if I do the same cryptic command, echo, dollar sign, question mark, 2065 01:35:29,030 --> 01:35:30,920 I see that main exited with 1. 2066 01:35:30,920 --> 01:35:32,200 Now, why is this useful? 2067 01:35:32,200 --> 01:35:35,640 Well, as we start to write more complicated programs, 2068 01:35:35,640 --> 01:35:39,320 it's going to be a convention to exit from main by returning 2069 01:35:39,320 --> 01:35:42,020 a non-zero value, if anything goes wrong. 2070 01:35:42,020 --> 01:35:44,350 0 happens to mean everything went well. 2071 01:35:44,350 --> 01:35:46,070 And in fact, in all of the programs we've 2072 01:35:46,070 --> 01:35:49,760 written thus far, if you don't mention return anything, 2073 01:35:49,760 --> 01:35:53,900 main automatically for you returns 0. 2074 01:35:53,900 --> 01:35:55,160 And it has been all this time. 2075 01:35:55,160 --> 01:35:57,990 It's just a feature, so you don't have to bother typing it yourself. 2076 01:35:57,990 --> 01:36:00,730 But what's nice about this, or what's real about this, 2077 01:36:00,730 --> 01:36:04,250 is if on your Mac or PC, if you've ever gotten an annoying error message that 2078 01:36:04,250 --> 01:36:08,780 says, error negative 29, system error has occurred, or something freezes, 2079 01:36:08,780 --> 01:36:11,510 but you very often see numbers on the screen, maybe. 
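The logic described for exit.c can be sketched roughly as follows. The transcript does not show the file itself, so this is a reconstruction under assumptions: the exact message strings are guesses, and the logic is placed in a hypothetical helper, `greet_or_fail`, whose return value is what main would return.

```c
#include <stdio.h>

// Hypothetical helper holding the logic described for exit.c. It returns the
// value main would return: 1 if the program was not run with exactly one word
// after its name, and 0 if everything went well.
int greet_or_fail(int argc, char *argv[])
{
    // argc counts the program's own name too, so "./exit Zamyla" gives argc == 2
    if (argc != 2)
    {
        printf("missing command-line argument\n");
        return 1; // non-zero signals that something went wrong
    }
    printf("hello, %s\n", argv[1]);
    return 0; // zero signals success
}

// In a complete program, main would simply pass its arguments along:
//     int main(int argc, char *argv[]) { return greet_or_fail(argc, argv); }
```

After running such a program, typing `echo $?` at the prompt shows the value that main returned, as demonstrated above.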
2080 01:36:11,510 --> 01:36:15,300 Like those error codes actually tend to map to these kinds of values. 2081 01:36:15,300 --> 01:36:18,290 So when a human is writing software and something goes wrong 2082 01:36:18,290 --> 01:36:21,290 and an error happens, they typically return a value like this. 2083 01:36:21,290 --> 01:36:23,240 And the computer has access to it. 2084 01:36:23,240 --> 01:36:25,950 And this isn't all that useful for the human running the program. 2085 01:36:25,950 --> 01:36:27,950 But as your programs get more complex, we'll 2086 01:36:27,950 --> 01:36:32,030 see that this is actually quite useful as a way of signaling 2087 01:36:32,030 --> 01:36:34,460 that something indeed went wrong. 2088 01:36:34,460 --> 01:36:34,960 Whew. 2089 01:36:34,960 --> 01:36:41,240 OK, that's a lot of syntax wrapped in some loving context. 2090 01:36:41,240 --> 01:36:44,830 Any questions before we look at one final domain? 2091 01:36:44,830 --> 01:36:45,540 No? 2092 01:36:45,540 --> 01:36:46,170 All right. 2093 01:36:46,170 --> 01:36:51,960 So it turns out that we can answer the "who cares" question in yet another way 2094 01:36:51,960 --> 01:36:52,770 too. 2095 01:36:52,770 --> 01:36:59,520 It turns out-- let me go ahead and open up an example of our array again here-- 2096 01:36:59,520 --> 01:37:03,150 that arrays can actually now be used to solve problems more algorithmically. 2097 01:37:03,150 --> 01:37:05,030 And this is where life gets more interesting. 2098 01:37:05,030 --> 01:37:06,840 Like we were so incredibly in the weeds today. 2099 01:37:06,840 --> 01:37:08,590 And as we move forward in the class, we're 2100 01:37:08,590 --> 01:37:10,350 not going to spend so much time on syntax, 2101 01:37:10,350 --> 01:37:13,350 and dollar signs, and question marks, and square brackets, and the like. 2102 01:37:13,350 --> 01:37:14,490 That's not the interesting part. 
2103 01:37:14,490 --> 01:37:17,280 The interesting part is when we now have these fundamental building 2104 01:37:17,280 --> 01:37:20,820 blocks, like an array, with which we can solve problems. 2105 01:37:20,820 --> 01:37:23,640 So it turns out that an array, you know, you 2106 01:37:23,640 --> 01:37:26,250 can kind of think of it as a series of lockers, 2107 01:37:26,250 --> 01:37:29,400 a series of lockers that might look like this, inside of which 2108 01:37:29,400 --> 01:37:32,620 are values-- strings, or numbers, or chars, or whatnot. 2109 01:37:32,620 --> 01:37:36,090 But the lockers is an apt metaphor because a computer, unlike us humans, 2110 01:37:36,090 --> 01:37:38,520 can only see and do one thing at a time. 2111 01:37:38,520 --> 01:37:41,130 It can open one locker and look inside, but it can't kind of 2112 01:37:41,130 --> 01:37:44,760 take a step back, like we humans can, and look at all of the lockers, 2113 01:37:44,760 --> 01:37:46,450 even if all of the doors are open. 2114 01:37:46,450 --> 01:37:49,170 So it has to be a more deliberate act than that. 2115 01:37:49,170 --> 01:37:51,060 So what are the actual implications? 2116 01:37:51,060 --> 01:37:52,710 Well, all this time-- 2117 01:37:52,710 --> 01:37:55,110 we had that phone book example in the first week, 2118 01:37:55,110 --> 01:37:59,370 and the efficiency of that algorithm, of finding Mike Smith in this phone book, 2119 01:37:59,370 --> 01:38:02,120 all assumed what feature of this phone book? 2120 01:38:02,120 --> 01:38:03,920 AUDIENCE: That it's ordered alphabetically. 2121 01:38:03,920 --> 01:38:05,360 DAVID J. MALAN: That it was ordered alphabetically. 2122 01:38:05,360 --> 01:38:08,270 And that was a huge plus, because then I could go to the middle, 2123 01:38:08,270 --> 01:38:10,340 and I could go to the middle of the middle, and so forth. 2124 01:38:10,340 --> 01:38:12,010 And that was an algorithmic possibility. 
2125 01:38:12,010 --> 01:38:13,850 On our phones, if you pull up your contacts, 2126 01:38:13,850 --> 01:38:17,150 you've got a list of first names, or last names, all alphabetically sorted. 2127 01:38:17,150 --> 01:38:20,420 That is because, guess what data structure or layout 2128 01:38:20,420 --> 01:38:24,450 your phone probably uses to store your contacts? 2129 01:38:24,450 --> 01:38:26,380 It's an array of some sort, right? 2130 01:38:26,380 --> 01:38:27,130 It's just a list. 2131 01:38:27,130 --> 01:38:29,500 And it might be displayed vertically, instead of horizontally, 2132 01:38:29,500 --> 01:38:30,550 as I've been drawing it today. 2133 01:38:30,550 --> 01:38:33,500 But it's just values that are back, to back, to back, to back, to back, 2134 01:38:33,500 --> 01:38:34,720 that are actually sorted. 2135 01:38:34,720 --> 01:38:37,210 But how did they actually get into that sorted order? 2136 01:38:37,210 --> 01:38:38,840 And how do you actually find values? 2137 01:38:38,840 --> 01:38:40,930 Well, let's consider what this problem is actually 2138 01:38:40,930 --> 01:38:43,070 like for a computer, as follows. 2139 01:38:43,070 --> 01:38:44,530 Let me go ahead here. 2140 01:38:44,530 --> 01:38:47,840 Would a volunteer mind joining us up here? 2141 01:38:47,840 --> 01:38:49,950 I can throw in a free stress ball. 2142 01:38:49,950 --> 01:38:51,400 OK, someone from the back? 2143 01:38:51,400 --> 01:38:52,240 OK, come on up here. 2144 01:38:52,240 --> 01:38:52,940 Come on. 2145 01:38:52,940 --> 01:38:53,650 What's your name? 2146 01:38:53,650 --> 01:38:54,600 ERIC: Eric. 2147 01:38:54,600 --> 01:38:55,610 DAVID J. MALAN: Aaron. 2148 01:38:55,610 --> 01:38:56,110 All right. 2149 01:38:56,110 --> 01:38:57,670 So Aaron's going to come on up. 2150 01:38:57,670 --> 01:38:58,550 And-- 2151 01:38:58,550 --> 01:38:59,230 ERIC: Eric. 2152 01:38:59,230 --> 01:38:59,680 DAVID J. MALAN: I'm sorry? 2153 01:38:59,680 --> 01:39:00,460 Oh, Eric. 
2154 01:39:00,460 --> 01:39:01,430 Nice to meet you. 2155 01:39:01,430 --> 01:39:01,930 All right. 2156 01:39:01,930 --> 01:39:02,670 Come on over here. 2157 01:39:02,670 --> 01:39:05,440 So Eric, now normally, I would ask you to find the number 23. 2158 01:39:05,440 --> 01:39:08,310 But seeing as that's a little easy, can you go ahead and just find us 2159 01:39:08,310 --> 01:39:11,420 the number 50 behind these doors, or really these yellow lockers? 2160 01:39:11,420 --> 01:39:11,920 8? 2161 01:39:11,920 --> 01:39:12,690 Nope. 2162 01:39:12,690 --> 01:39:13,190 42? 2163 01:39:13,190 --> 01:39:13,690 Nope. 2164 01:39:13,690 --> 01:39:14,300 OK. 2165 01:39:14,300 --> 01:39:14,800 Pretty good. 2166 01:39:14,800 --> 01:39:16,300 That's three, three out of seven. 2167 01:39:16,300 --> 01:39:17,740 How did you get it so quickly? 2168 01:39:17,740 --> 01:39:18,660 ERIC: I guessed. 2169 01:39:18,660 --> 01:39:20,230 DAVID J. MALAN: OK, so he guessed. 2170 01:39:20,230 --> 01:39:24,710 Is that the best algorithm that Eric could have used here? 2171 01:39:24,710 --> 01:39:26,210 ERIC: Probably not. 2172 01:39:26,210 --> 01:39:27,800 DAVID J. MALAN: Well, I don't know. 2173 01:39:27,800 --> 01:39:28,290 Yes? 2174 01:39:28,290 --> 01:39:28,860 No? 2175 01:39:28,860 --> 01:39:29,210 AUDIENCE: Yeah. 2176 01:39:29,210 --> 01:39:30,040 DAVID J. MALAN: Why? 2177 01:39:30,040 --> 01:39:30,790 Why yes? 2178 01:39:30,790 --> 01:39:31,670 AUDIENCE: [INAUDIBLE] 2179 01:39:31,670 --> 01:39:32,890 DAVID J. MALAN: He has no other information. 2180 01:39:32,890 --> 01:39:34,310 So yes, like that was the best you can do. 2181 01:39:34,310 --> 01:39:35,630 But let me give you a little more information. 2182 01:39:35,630 --> 01:39:36,720 You can stay here. 2183 01:39:36,720 --> 01:39:40,560 And let me go ahead and reload the screen here. 2184 01:39:40,560 --> 01:39:43,580 And let me go ahead and pull up a different set of doors.
2185 01:39:43,580 --> 01:39:46,730 And now suppose that, much like the phone book, and much like the phones 2186 01:39:46,730 --> 01:39:49,010 are sorted, now these doors are sorted. 2187 01:39:49,010 --> 01:39:51,460 And find us the number 50. 2188 01:39:54,360 --> 01:39:54,860 All right. 2189 01:39:54,860 --> 01:39:55,760 So good. 2190 01:39:55,760 --> 01:39:57,040 What did you do that time? 2191 01:39:57,040 --> 01:39:59,420 AUDIENCE: Well, [INAUDIBLE]. 2192 01:39:59,420 --> 01:40:00,650 It was 50 is 116. 2193 01:40:00,650 --> 01:40:01,400 So I just-- 2194 01:40:01,400 --> 01:40:02,320 DAVID J. MALAN: Right. 2195 01:40:02,320 --> 01:40:07,570 So you jumped to the middle, initially, and then to the right half. 2196 01:40:07,570 --> 01:40:10,140 And then technically-- so we're technically off by 1, right? 2197 01:40:10,140 --> 01:40:12,850 Because like binary search would have gone to the middle of the-- 2198 01:40:12,850 --> 01:40:14,840 that's OK, but very well done to Eric. 2199 01:40:14,840 --> 01:40:18,970 Here, let me at least reinforce this with a stress ball. 2200 01:40:18,970 --> 01:40:20,320 So thank you. 2201 01:40:20,320 --> 01:40:21,110 Very well done. 2202 01:40:21,110 --> 01:40:23,500 So with that additional information, as you know, 2203 01:40:23,500 --> 01:40:27,490 Eric was able to do better because the information was sorted on the screen. 2204 01:40:27,490 --> 01:40:30,490 But he only had one insight to a locker at a time, 2205 01:40:30,490 --> 01:40:33,710 because only by revealing what's inside can he actually see it. 2206 01:40:33,710 --> 01:40:35,980 So this seems to suggest that once you do 2207 01:40:35,980 --> 01:40:39,070 have this additional information in Eric's example, in your phone example, 2208 01:40:39,070 --> 01:40:44,440 in the phone book example, you open up possibilities for much, much, much more 2209 01:40:44,440 --> 01:40:45,880 efficient algorithms.
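What Eric did with the sorted doors, jumping to the middle and then to the half that could still hold 50, is binary search. A minimal sketch in C, assuming the doors are an array of ints sorted in increasing order; the function name `binary_search` and its parameter names are chosen here for illustration and do not come from the lecture.

```c
// Returns the index of `target` in `doors`, or -1 if it is not there.
// Assumption: `doors` holds n values sorted from smallest to largest.
int binary_search(const int doors[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        // Open the middle door of the range that is still in play
        int mid = low + (high - low) / 2;
        if (doors[mid] == target)
        {
            return mid;
        }
        else if (doors[mid] < target)
        {
            low = mid + 1; // target can only be in the right half
        }
        else
        {
            high = mid - 1; // target can only be in the left half
        }
    }
    return -1;
}
```

Each pass opens exactly one door and discards half of the remaining ones, which is why seven sorted doors need at most three looks.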
2210 01:40:45,880 --> 01:40:50,380 But to get there, we've kind of been deferring this whole time in class 2211 01:40:50,380 --> 01:40:53,390 how you actually sort these elements. 2212 01:40:53,390 --> 01:40:56,920 And if you wouldn't mind-- and this way, we'll hopefully end on a more energized 2213 01:40:56,920 --> 01:40:59,700 note here because I know we've been in the weeds for a while-- 2214 01:40:59,700 --> 01:41:02,270 can we get like eight volunteers? 2215 01:41:02,270 --> 01:41:09,300 OK, so 1, 2, 3, 4-- how about 5, 6, 7, 8, come on down. 2216 01:41:09,300 --> 01:41:09,890 Oh, I'm sorry. 2217 01:41:09,890 --> 01:41:11,900 Did I completely overlook the front row? 2218 01:41:11,900 --> 01:41:12,400 OK. 2219 01:41:12,400 --> 01:41:13,270 All right, next time. 2220 01:41:13,270 --> 01:41:14,170 Next time. 2221 01:41:14,170 --> 01:41:14,800 Come on down. 2222 01:41:20,560 --> 01:41:23,980 Oh, and Colton, do you mind meeting them over there instead? 2223 01:41:23,980 --> 01:41:25,170 All right. 2224 01:41:25,170 --> 01:41:26,000 Come on up. 2225 01:41:26,000 --> 01:41:26,710 What's your name? 2226 01:41:26,710 --> 01:41:27,670 [? CAHMY: ?] [? Cahmy. ?] 2227 01:41:27,670 --> 01:41:28,540 DAVID J. MALAN: [? Cahmy? ?] David. 2228 01:41:28,540 --> 01:41:29,110 Right over there. 2229 01:41:29,110 --> 01:41:29,390 What's your name? 2230 01:41:29,390 --> 01:41:29,790 MATT: Matt. 2231 01:41:29,790 --> 01:41:30,040 DAVID J. MALAN: Matt? 2232 01:41:30,040 --> 01:41:30,880 David. 2233 01:41:30,880 --> 01:41:31,360 [? JUHE: ?] [? Juhe. ?] 2234 01:41:31,360 --> 01:41:32,270 DAVID J. MALAN: [? Juhe? ?] David. 2235 01:41:32,270 --> 01:41:32,620 MAX: Max. 2236 01:41:32,620 --> 01:41:33,900 DAVID J. MALAN: Max, nice to meet you. 2237 01:41:33,900 --> 01:41:34,350 JAMES: James. 2238 01:41:34,350 --> 01:41:35,630 DAVID J. MALAN: James, nice to see you. 2239 01:41:35,630 --> 01:41:36,700 Here, I'll get more chairs. 2240 01:41:36,700 --> 01:41:37,030 What's your name? 
2241 01:41:37,030 --> 01:41:37,600 PEYTON: Peyton. 2242 01:41:37,600 --> 01:41:38,050 DAVID J. MALAN: Peyton? 2243 01:41:38,050 --> 01:41:38,890 David. 2244 01:41:38,890 --> 01:41:40,180 And two more. 2245 01:41:40,180 --> 01:41:42,430 Actually, can one of you come down to this end here? 2246 01:41:42,430 --> 01:41:43,130 What's your name? 2247 01:41:43,130 --> 01:41:43,800 ANDREA: Andrea. 2248 01:41:43,800 --> 01:41:45,650 DAVID J. MALAN: Andrea, nice to see you. 2249 01:41:45,650 --> 01:41:46,240 And your name? 2250 01:41:46,240 --> 01:41:46,940 [? PICCO: ?] [? Picco. ?] 2251 01:41:46,940 --> 01:41:47,870 DAVID J. MALAN: [? Picco, ?] David. 2252 01:41:47,870 --> 01:41:48,530 Nice to see you. 2253 01:41:48,530 --> 01:41:54,430 OK, Colton has a T-shirt for each of you, very Harvard-esque here. 2254 01:41:54,430 --> 01:41:57,830 And each of these shirts, as you're about to see, has a number on it. 2255 01:41:57,830 --> 01:42:00,850 And that number is-- 2256 01:42:00,850 --> 01:42:02,900 well, go ahead put them on, if you wouldn't mind. 2257 01:42:06,420 --> 01:42:07,580 OK, thank you so much. 2258 01:42:07,580 --> 01:42:11,240 So I daresay we've arranged our humans much like the lockers in an array. 2259 01:42:11,240 --> 01:42:13,790 Like we have humans back, to back, to back, to back. 2260 01:42:13,790 --> 01:42:17,210 But this is actually both a blessing and a constraint, 2261 01:42:17,210 --> 01:42:18,920 because we only have eight chairs. 2262 01:42:18,920 --> 01:42:22,730 So there's really not much room here, so we're confined to just this space here. 2263 01:42:22,730 --> 01:42:25,890 And I see we have a 4, 8, 5, 2, 3, 1, 6, 7. 2264 01:42:25,890 --> 01:42:26,840 So this is great. 2265 01:42:26,840 --> 01:42:28,010 Like they are unsorted. 2266 01:42:28,010 --> 01:42:29,630 By definition, it's pretty random. 2267 01:42:29,630 --> 01:42:30,300 So that's great. 2268 01:42:30,300 --> 01:42:31,710 So let's just start off like this.
2269 01:42:31,710 --> 01:42:33,230 Sort yourselves from 1 to 8, please. 2270 01:42:42,280 --> 01:42:42,790 OK. 2271 01:42:42,790 --> 01:42:43,290 All right. 2272 01:42:43,290 --> 01:42:45,500 Well, what algorithm was that? 2273 01:42:45,500 --> 01:42:46,400 [LAUGHTER] 2274 01:42:46,400 --> 01:42:47,950 AUDIENCE: Look around, figure it out. 2275 01:42:47,950 --> 01:42:49,210 DAVID J. MALAN: Look around, figure it out. 2276 01:42:49,210 --> 01:42:49,810 OK, well-- 2277 01:42:49,810 --> 01:42:50,620 MATT: Human ingenuity. 2278 01:42:50,620 --> 01:42:51,500 DAVID J. MALAN: Human ingenuity? 2279 01:42:51,500 --> 01:42:52,330 Very well done. 2280 01:42:52,330 --> 01:42:54,700 So can we-- well, what was like a thought 2281 01:42:54,700 --> 01:42:56,420 going through any of your minds? 2282 01:42:56,420 --> 01:42:57,760 MATT: Find a chair and sit down. 2283 01:42:57,760 --> 01:42:58,720 DAVID J. MALAN: Find the chair-- 2284 01:42:58,720 --> 01:42:59,590 find the right chair. 2285 01:42:59,590 --> 01:43:00,710 So go to a location. 2286 01:43:00,710 --> 01:43:01,210 Good. 2287 01:43:01,210 --> 01:43:02,890 So like an index location, right? 2288 01:43:02,890 --> 01:43:04,750 Arrays have indices, so to speak-- 2289 01:43:04,750 --> 01:43:07,780 0, 1, 2, all the way up to 7. 2290 01:43:07,780 --> 01:43:10,310 And even though our shirts are numbered from 1 to 8, 2291 01:43:10,310 --> 01:43:11,780 you can think in terms of 0 to 7. 2292 01:43:11,780 --> 01:43:12,310 So that was good. 2293 01:43:12,310 --> 01:43:12,810 Anyone else? 2294 01:43:12,810 --> 01:43:14,530 Other thoughts? 2295 01:43:14,530 --> 01:43:17,690 [? CAHMY: ?] I mean, this is something we implicitly think of, 2296 01:43:17,690 --> 01:43:19,310 but no one told us that it was ordered right to left. 2297 01:43:19,310 --> 01:43:20,880 Like we could have done it left to right. 2298 01:43:20,880 --> 01:43:21,250 DAVID J. MALAN: OK. 2299 01:43:21,250 --> 01:43:21,730 Absolutely.
2300 01:43:21,730 --> 01:43:23,770 Could have gone from right to left, instead of left to right. 2301 01:43:23,770 --> 01:43:25,250 But at least we all agreed on this convention 2302 01:43:25,250 --> 01:43:26,590 too, so that was in your mind. 2303 01:43:26,590 --> 01:43:26,860 OK. 2304 01:43:26,860 --> 01:43:27,220 So good. 2305 01:43:27,220 --> 01:43:28,090 So we got this sorted. 2306 01:43:28,090 --> 01:43:30,130 Go ahead and re-randomize yourself, if you could. 2307 01:43:35,220 --> 01:43:37,660 And what algorithm was this? 2308 01:43:37,660 --> 01:43:38,980 Just random awkwardness? 2309 01:43:38,980 --> 01:43:39,790 OK, so that's fine. 2310 01:43:39,790 --> 01:43:41,260 So it looks pretty random. 2311 01:43:41,260 --> 01:43:42,100 That will do. 2312 01:43:42,100 --> 01:43:44,500 Let's see if we can now reduce the process of sorting 2313 01:43:44,500 --> 01:43:47,300 to something a little more algorithmic so that, one, we can be sure 2314 01:43:47,300 --> 01:43:50,500 we're correct and not just kind of get lucky that everyone kind of figured it 2315 01:43:50,500 --> 01:43:52,450 out and no one was left out, and two, then 2316 01:43:52,450 --> 01:43:54,730 start to think about how efficient it is, right? 2317 01:43:54,730 --> 01:43:57,700 Because if we've been gaining so much efficiency for the phone book, 2318 01:43:57,700 --> 01:43:59,730 for our contacts, for Eric coming up, 2319 01:43:59,730 --> 01:44:01,780 we really should have been asking the whole time, 2320 01:44:01,780 --> 01:44:05,080 sure, you save time with binary search and divide and conquer, 2321 01:44:05,080 --> 01:44:08,110 but how much did it cost you to get to a point 2322 01:44:08,110 --> 01:44:10,750 where you can use binary search and divide and conquer? 2323 01:44:10,750 --> 01:44:14,160 Because sorting, if it's super, super, super expensive and time-consuming, 2324 01:44:14,160 --> 01:44:15,250 maybe it's a net negative.
2325 01:44:15,250 --> 01:44:17,290 And you might as well just search the whole list, 2326 01:44:17,290 --> 01:44:18,730 rather than ever sort anything. 2327 01:44:18,730 --> 01:44:19,230 All right. 2328 01:44:19,230 --> 01:44:20,920 So let's see here. 2329 01:44:20,920 --> 01:44:22,630 6 and 5, I don't like this. 2330 01:44:22,630 --> 01:44:24,000 Why? 2331 01:44:24,000 --> 01:44:25,390 AUDIENCE: [INAUDIBLE] 2332 01:44:25,390 --> 01:44:27,310 DAVID J. MALAN: 6 is supposed to come after 5. 2333 01:44:27,310 --> 01:44:29,510 And so, can we fix this, please? 2334 01:44:29,510 --> 01:44:30,010 All right. 2335 01:44:30,010 --> 01:44:30,800 And then let's see. 2336 01:44:30,800 --> 01:44:33,520 OK, 6 and 1-- ugh, don't really like this. 2337 01:44:33,520 --> 01:44:36,050 Yeah, can we fix this? 2338 01:44:36,050 --> 01:44:36,550 Very nice. 2339 01:44:36,550 --> 01:44:39,700 6 and 3, OK, you really got the short end of the stick here. 2340 01:44:39,700 --> 01:44:43,010 So 6 and 3, could we fix this? 2341 01:44:43,010 --> 01:44:44,590 And 6-- yeah, OK. 2342 01:44:44,590 --> 01:44:46,210 Ooh, OK, 6 and 7-- good. 2343 01:44:46,210 --> 01:44:47,590 All right, so that's pretty good. 2344 01:44:47,590 --> 01:44:49,180 7 and 8, nice. 2345 01:44:49,180 --> 01:44:50,000 8 and 4, sorry. 2346 01:44:50,000 --> 01:44:52,640 Could we switch here? 2347 01:44:52,640 --> 01:44:53,140 All right. 2348 01:44:53,140 --> 01:44:54,440 And then 8 and 2? 2349 01:44:54,440 --> 01:44:56,000 OK, could we switch here? 2350 01:44:56,000 --> 01:44:56,500 OK. 2351 01:44:56,500 --> 01:44:58,580 And let me ask you a somewhat rhetorical question. 2352 01:44:58,580 --> 01:45:00,120 OK, am I done? 2353 01:45:00,120 --> 01:45:00,760 OK, no. 2354 01:45:00,760 --> 01:45:03,580 Obviously not, but I did fix some problems, right? 2355 01:45:03,580 --> 01:45:06,380 I fixed some transpositions, numbers being out of order. 2356 01:45:06,380 --> 01:45:07,480 And in fact, I-- what's your name again? 
2357 01:45:07,480 --> 01:45:08,270 [? CAHMY: ?] [? Cahmy. ?] 2358 01:45:08,270 --> 01:45:11,500 DAVID J. MALAN: [? Cahmy, ?] kind of bubbled to the right here, if you will. 2359 01:45:11,500 --> 01:45:14,050 Like you were kind of farther down, and now you're over here. 2360 01:45:14,050 --> 01:45:16,930 And like the smaller numbers, kind of-- yeah 1. 2361 01:45:16,930 --> 01:45:19,600 Like, my god, like he kind of bubbled his way this way. 2362 01:45:19,600 --> 01:45:21,580 So things are percolating, in some sense. 2363 01:45:21,580 --> 01:45:23,240 And that's a good thing. 2364 01:45:23,240 --> 01:45:24,350 And so you know what? 2365 01:45:24,350 --> 01:45:26,170 Let me try to fix some remaining problems. 2366 01:45:26,170 --> 01:45:27,330 So 1 and 5-- good. 2367 01:45:27,330 --> 01:45:29,560 Oh 3 and 5, could you switch? 2368 01:45:29,560 --> 01:45:31,780 5 and 6, OK. 2369 01:45:31,780 --> 01:45:32,590 6 and 7? 2370 01:45:32,590 --> 01:45:34,950 7 and 4, could you switch? 2371 01:45:34,950 --> 01:45:36,040 OK. 2372 01:45:36,040 --> 01:45:40,390 And 7 and 2, could you switch? 2373 01:45:40,390 --> 01:45:42,700 And now, I don't have to speak with [? Cahmy ?] again, 2374 01:45:42,700 --> 01:45:44,450 because we know you're in the right place. 2375 01:45:44,450 --> 01:45:46,490 So I actually don't have to do quite as much work 2376 01:45:46,490 --> 01:45:48,050 this time, which is kind of nice. 2377 01:45:48,050 --> 01:45:49,240 But am I done? 2378 01:45:49,240 --> 01:45:50,560 No, obviously not. 2379 01:45:50,560 --> 01:45:52,360 But what's the pattern now? 2380 01:45:52,360 --> 01:45:53,950 Like what's the fundamental primitive? 2381 01:45:53,950 --> 01:45:57,190 If I just compare pairwise humans and numbers, 2382 01:45:57,190 --> 01:45:59,560 I can slightly improve the situation each time 2383 01:45:59,560 --> 01:46:01,180 by just swapping them, swapping them. 2384 01:46:01,180 --> 01:46:02,560 And each time now-- 2385 01:46:02,560 --> 01:46:04,750 I'm sorry, [?
Picco ?] is in number 7's place. 2386 01:46:04,750 --> 01:46:07,300 I don't have to talk to him anymore, because he's now bubbled 2387 01:46:07,300 --> 01:46:08,640 his way all the way up to the top. 2388 01:46:08,640 --> 01:46:10,970 So even though I'm doing the same thing again and again, 2389 01:46:10,970 --> 01:46:13,420 and looping again and again isn't always the best thing, 2390 01:46:13,420 --> 01:46:16,730 so long as you're looping fewer and fewer times, I will eventually stop, 2391 01:46:16,730 --> 01:46:17,320 it would seem. 2392 01:46:17,320 --> 01:46:20,110 Because 6 is going to eventually go in the right place, and then 5, 2393 01:46:20,110 --> 01:46:21,260 and then 4, and so forth. 2394 01:46:21,260 --> 01:46:22,510 So if we can just finish this algorithm. 2395 01:46:22,510 --> 01:46:24,270 Good. 2396 01:46:24,270 --> 01:46:26,070 Not good. 2397 01:46:26,070 --> 01:46:27,600 OK, 6 and 2, not good. 2398 01:46:27,600 --> 01:46:28,960 If you could swap? 2399 01:46:28,960 --> 01:46:30,400 OK, and what's your name again? 2400 01:46:30,400 --> 01:46:31,030 PEYTON: Peyton. 2401 01:46:31,030 --> 01:46:32,260 DAVID J. MALAN: Peyton is now in the right place. 2402 01:46:32,260 --> 01:46:33,980 I have even less work now ahead of me. 2403 01:46:33,980 --> 01:46:35,650 So if I can just continue this process-- 2404 01:46:35,650 --> 01:46:39,580 1 and 3, 3 and 5, 4 and 5, OK, and then 2 and 5. 2405 01:46:39,580 --> 01:46:40,960 And then, what's your name again? 2406 01:46:40,960 --> 01:46:41,440 MATT: Matt. 2407 01:46:41,440 --> 01:46:42,790 DAVID J. MALAN: Matt is now in the right place. 2408 01:46:42,790 --> 01:46:43,560 Even less work. 2409 01:46:43,560 --> 01:46:44,350 We're almost there. 2410 01:46:44,350 --> 01:46:47,710 1 and 3, 3 and 4, 4 and 2, if you could swap. 2411 01:46:47,710 --> 01:46:48,640 OK, almost done. 2412 01:46:48,640 --> 01:46:51,460 And 1 and 3, 3 and 2, if you could swap. 2413 01:46:51,460 --> 01:46:52,360 Nice. 
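The pairwise compare-and-swap routine just acted out is commonly called bubble sort. A minimal sketch in C follows; the function name `bubble_sort` and the variable names are illustrative rather than taken from the lecture. The outer loop mirrors the observation that after each pass the largest remaining value has bubbled into its final place, so the next pass can stop one position earlier.

```c
// Sorts `values` (length n) from smallest to largest by repeatedly swapping
// adjacent pairs that are out of order.
void bubble_sort(int values[], int n)
{
    // After each pass, everything to the right of `end` is in its final
    // place, so each pass compares one fewer pair than the one before.
    for (int end = n - 1; end > 0; end--)
    {
        for (int i = 0; i < end; i++)
        {
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
            }
        }
    }
}
```

With eight values, the first pass makes seven comparisons, the second six, and so on down to one.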
2414 01:46:52,360 --> 01:46:53,600 So this is interesting. 2415 01:46:53,600 --> 01:46:55,810 It would seem that-- you know, in the first place, 2416 01:46:55,810 --> 01:46:59,080 I kind of compared seven pairs of people. 2417 01:46:59,080 --> 01:47:02,170 And then the next time I went through, I compared how many pairs of people 2418 01:47:02,170 --> 01:47:02,830 maximally? 2419 01:47:02,830 --> 01:47:03,740 AUDIENCE: [INAUDIBLE] 2420 01:47:03,740 --> 01:47:05,080 DAVID J. MALAN: Just six, right? 2421 01:47:05,080 --> 01:47:06,340 Because we were able to leave [? Cahmy ?] out. 2422 01:47:06,340 --> 01:47:09,000 And then we were able to leave [? Picco ?] out, and then Peyton. 2423 01:47:09,000 --> 01:47:12,220 And so the number of comparisons I was doing was getting fewer and fewer. 2424 01:47:12,220 --> 01:47:13,510 So that feels pretty good. 2425 01:47:13,510 --> 01:47:14,260 But you know what? 2426 01:47:14,260 --> 01:47:17,130 Before we even analyze that, can you just randomize yourselves again? 2427 01:47:17,130 --> 01:47:18,590 Any human algorithm is fine. 2428 01:47:18,590 --> 01:47:22,990 Let's try one other approach, because this feels kind of non-obvious, right? 2429 01:47:22,990 --> 01:47:26,020 I was fixing things, but I had to keep fixing things again and again. 2430 01:47:26,020 --> 01:47:28,150 Let me try to take a bigger bite out of the problem 2431 01:47:28,150 --> 01:47:30,260 this time by just selecting the smallest person. 2432 01:47:30,260 --> 01:47:31,350 OK, so your name again is? 2433 01:47:31,350 --> 01:47:32,080 [? JUHE: ?] [? Juhe. ?] 2434 01:47:32,080 --> 01:47:34,070 DAVID J. MALAN: [? Juhe, ?] number 2-- that's a pretty small number, 2435 01:47:34,070 --> 01:47:36,400 so I'm going to remember that in sort of a mental variable. 2436 01:47:36,400 --> 01:47:36,890 4? 2437 01:47:36,890 --> 01:47:37,730 No, you're too big. 2438 01:47:37,730 --> 01:47:39,050 Too big. 2439 01:47:39,050 --> 01:47:40,250 Oh, what was your name again? 
2440 01:47:40,250 --> 01:47:40,840 JAMES: James. 2441 01:47:40,840 --> 01:47:41,750 DAVID J. MALAN: James. 2442 01:47:41,750 --> 01:47:42,290 James is a 1. 2443 01:47:42,290 --> 01:47:43,090 That's pretty nice. 2444 01:47:43,090 --> 01:47:43,960 Let me keep checking. 2445 01:47:43,960 --> 01:47:47,400 OK, James, in my mental variable is the smallest number. 2446 01:47:47,400 --> 01:47:48,860 I know I want him at the beginning. 2447 01:47:48,860 --> 01:47:50,480 So if you wouldn't mind coming with me. 2448 01:47:50,480 --> 01:47:52,090 And I'm sorry, we don't have room for you anymore. 2449 01:47:52,090 --> 01:47:53,670 If you could just-- oh, you know what? 2450 01:47:53,670 --> 01:47:55,500 Could you all just shuffle down? 2451 01:47:55,500 --> 01:47:57,280 Well, hm, I don't know if I like that. 2452 01:47:57,280 --> 01:47:58,470 That's a lot of work, right? 2453 01:47:58,470 --> 01:48:00,260 Moving all these values, let's not do that. 2454 01:48:00,260 --> 01:48:01,460 Let's not do that. 2455 01:48:01,460 --> 01:48:03,260 Number 2, could you mind just going where-- 2456 01:48:03,260 --> 01:48:03,770 where-- 2457 01:48:03,770 --> 01:48:04,520 JAMES: It's James. 2458 01:48:04,520 --> 01:48:06,180 DAVID J. MALAN: --James was? 2459 01:48:06,180 --> 01:48:09,270 OK, so I've kind of made the problem a little worse in that, 2460 01:48:09,270 --> 01:48:11,790 now, number 2 is farther away from the goal. 2461 01:48:11,790 --> 01:48:14,910 But I could have gotten lucky, and maybe she was number 7 or 8. 2462 01:48:14,910 --> 01:48:17,760 And so let me just claim that, on average, just evicting the person 2463 01:48:17,760 --> 01:48:20,070 is going to kind of be a wash and average out. 2464 01:48:20,070 --> 01:48:21,870 But now James is in the right place. 2465 01:48:21,870 --> 01:48:22,590 Done. 2466 01:48:22,590 --> 01:48:24,740 Now I have a problem that's of size 7. 2467 01:48:24,740 --> 01:48:26,490 So let me select the next smallest person. 
2468 01:48:26,490 --> 01:48:29,910 4 is the next smallest, not 8, not 5, not 7-- ooh, 2. 2469 01:48:29,910 --> 01:48:30,990 Not 3, 6. 2470 01:48:30,990 --> 01:48:32,610 OK, so you're back in the game. 2471 01:48:32,610 --> 01:48:33,660 All right, come on back. 2472 01:48:33,660 --> 01:48:35,920 And can we evict number 4? 2473 01:48:35,920 --> 01:48:37,860 And in this algorithm, if you will, I just 2474 01:48:37,860 --> 01:48:40,530 iteratively select the smallest person. 2475 01:48:40,530 --> 01:48:44,070 I'm not comparing everyone in quite the same way and swapping them pairwise, 2476 01:48:44,070 --> 01:48:46,050 I'm doing some more macroscopic swaps. 2477 01:48:46,050 --> 01:48:48,510 So now I'm going to look for the next smallest, which is 3. 2478 01:48:48,510 --> 01:48:50,130 If you wouldn't mind popping around here? 2479 01:48:50,130 --> 01:48:52,210 [? Cahmy, ?] we have to, unfortunately, evict you, 2480 01:48:52,210 --> 01:48:53,540 but that works out in our favor. 2481 01:48:53,540 --> 01:48:55,460 Let me look for the next smallest, which is 4. 2482 01:48:55,460 --> 01:48:56,280 OK, you're back in. 2483 01:48:56,280 --> 01:48:57,330 Come on down. 2484 01:48:57,330 --> 01:48:58,710 Swap with 5. 2485 01:48:58,710 --> 01:49:00,150 OK, now I'm looking for 5. 2486 01:49:00,150 --> 01:49:01,160 Hey, 5, there you are. 2487 01:49:01,160 --> 01:49:01,660 OK. 2488 01:49:01,660 --> 01:49:02,710 So go here. 2489 01:49:02,710 --> 01:49:03,900 OK, looking for 6. 2490 01:49:03,900 --> 01:49:06,510 Oh, 6, a little bit of a shuffle. 2491 01:49:06,510 --> 01:49:07,020 OK. 2492 01:49:07,020 --> 01:49:08,610 And now looking for 7. 2493 01:49:08,610 --> 01:49:10,710 Oh, 7, if you could go here. 2494 01:49:10,710 --> 01:49:12,170 But notice, I'm not going back. 2495 01:49:12,170 --> 01:49:13,380 And this is what's important. 2496 01:49:13,380 --> 01:49:15,360 Like my steps are getting shorter and shorter. 
2497 01:49:15,360 --> 01:49:17,520 My remaining steps are getting shorter and shorter. 2498 01:49:17,520 --> 01:49:21,030 And now we've actually sorted all of these humans. 2499 01:49:21,030 --> 01:49:24,480 So two fundamentally different ways, but they're both comparative in nature, 2500 01:49:24,480 --> 01:49:27,060 because I'm comparing these characters again, 2501 01:49:27,060 --> 01:49:29,880 and again, and again, and swapping them if they're out of order. 2502 01:49:29,880 --> 01:49:34,350 Or at a higher level, going through and swapping them again, 2503 01:49:34,350 --> 01:49:35,970 and again, and again. 2504 01:49:35,970 --> 01:49:38,370 But how many steps am I taking each time? 2505 01:49:38,370 --> 01:49:41,880 Even though I was doing fewer and fewer and I wasn't doubling back, 2506 01:49:41,880 --> 01:49:45,600 the first time, I was doing like n minus 1 comparisons. 2507 01:49:45,600 --> 01:49:46,830 And then I went back here. 2508 01:49:46,830 --> 01:49:50,700 And in the first algorithm, I kind of stopped going as far. 2509 01:49:50,700 --> 01:49:53,540 In the second algorithm, I just didn't go back as far. 2510 01:49:53,540 --> 01:49:56,290 So it was just kind of a different way of thinking of the problem. 2511 01:49:56,290 --> 01:49:57,240 But then I did what? 2512 01:49:57,240 --> 01:49:59,320 Like seven comparisons? 2513 01:49:59,320 --> 01:50:03,240 Then six, then five, then four, then three, then two, then one. 2514 01:50:03,240 --> 01:50:06,340 It's getting smaller, but how many comparisons is that total? 2515 01:50:06,340 --> 01:50:09,100 I've got like n people, n being a number. 2516 01:50:09,100 --> 01:50:10,330 AUDIENCE: [INAUDIBLE] 2517 01:50:10,330 --> 01:50:12,120 DAVID J. MALAN: It's not as bad as factorial. 2518 01:50:12,120 --> 01:50:14,150 We'd be here all day long. 2519 01:50:14,150 --> 01:50:15,060 But it is big. 2520 01:50:15,060 --> 01:50:15,560 It is big. 
2521 01:50:15,560 --> 01:50:18,050 Let's go-- a round of applause, if we could, for our volunteers. 2522 01:50:18,050 --> 01:50:19,910 You can keep the shirts, if you'd like, as a souvenir. 2523 01:50:19,910 --> 01:50:20,450 [APPLAUSE] 2524 01:50:20,450 --> 01:50:22,370 Thank you, very much. 2525 01:50:22,370 --> 01:50:26,570 Let me see if we can't just kind of quantify that-- thank you, so much-- 2526 01:50:26,570 --> 01:50:29,420 and see how we actually got to that point. 2527 01:50:29,420 --> 01:50:34,370 If I go ahead and pull up not our lockers, but our answers here, 2528 01:50:34,370 --> 01:50:38,900 let me propose that what we just did was essentially two algorithms. 2529 01:50:38,900 --> 01:50:39,920 One has the name bubble. 2530 01:50:39,920 --> 01:50:42,790 And I was kind of deliberately shoehorning the word in there. 2531 01:50:42,790 --> 01:50:45,980 Bubble sort is just that comparative sort, pair by pair, 2532 01:50:45,980 --> 01:50:47,810 fixing tiny little mistakes. 2533 01:50:47,810 --> 01:50:50,760 But we needed to do it again, and again, and again. 2534 01:50:50,760 --> 01:50:54,310 So those steps kind of add up, but we can express them as pseudocode. 2535 01:50:54,310 --> 01:50:56,900 So in pseudocode-- and you can write this any number of ways-- 2536 01:50:56,900 --> 01:50:58,140 I might just do the following. 2537 01:50:58,140 --> 01:51:01,160 Just keep doing the following, until there's no remaining swaps-- 2538 01:51:01,160 --> 01:51:06,470 for i from 0 to n-2, where n is just the total number of humans. 2539 01:51:06,470 --> 01:51:10,190 n-2 means go up from that person to this person, 2540 01:51:10,190 --> 01:51:13,070 because I want to compare him or her against the person next to them. 2541 01:51:13,070 --> 01:51:14,730 So I don't want to accidentally do this. 2542 01:51:14,730 --> 01:51:16,860 That's why it's n-2 at the end here. 
2543 01:51:16,860 --> 01:51:19,820 Then I want to go ahead and, if the ith and the (i+1)th elements are out 2544 01:51:19,820 --> 01:51:21,390 of order, swap them. 2545 01:51:21,390 --> 01:51:24,700 So that's why I was asking our human volunteers to exchange places. 2546 01:51:24,700 --> 01:51:27,410 And then just keep doing that, until there's no one left to swap. 2547 01:51:27,410 --> 01:51:29,840 And by definition, everyone is in order. 2548 01:51:29,840 --> 01:51:33,200 Meanwhile, the second algorithm has the conventional name of selection sort. 2549 01:51:33,200 --> 01:51:37,220 Selection sort is literally just that, where you actually 2550 01:51:37,220 --> 01:51:40,900 select the smallest person, or number of interest to you, intuitively, 2551 01:51:40,900 --> 01:51:41,570 again and again. 2552 01:51:41,570 --> 01:51:43,400 And the number keeps getting bigger, but you 2553 01:51:43,400 --> 01:51:45,980 start ignoring the people who you've already put into place. 2554 01:51:45,980 --> 01:51:48,810 So the problem, similarly, is getting smaller and smaller. 2555 01:51:48,810 --> 01:51:52,010 Just like in bubble sort, it was getting more and more sorted. 2556 01:51:52,010 --> 01:51:54,470 The pseudocode for selection sort might look like this. 2557 01:51:54,470 --> 01:51:58,820 For i from 0 to n-1, so that's 0 in an array. 2558 01:51:58,820 --> 01:52:00,080 And this is n-1. 2559 01:52:00,080 --> 01:52:05,060 Just keep looking for the smallest element between those two chairs, 2560 01:52:05,060 --> 01:52:06,830 and then pull that person out. 2561 01:52:06,830 --> 01:52:09,480 And then just evict whoever's there-- swap them, 2562 01:52:09,480 --> 01:52:13,110 but not necessarily adjacently, just as far away as is necessary. 2563 01:52:13,110 --> 01:52:16,610 And in this way, I keep turning my back on more and more people 2564 01:52:16,610 --> 01:52:18,920 because they are then in place. 
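The two pieces of pseudocode described above can be sketched in C. This is only an illustration, not the course's official code: the function names `bubble_sort`, `selection_sort`, and `swap` are assumptions made for this sketch, and the selection sort loop stops at n-2 rather than n-1 because the last element is already in place once every other one is.

```c
#include <stdbool.h>

// Exchange two ints in place (the "could you two switch?" step).
static void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

// Bubble sort: keep sweeping over adjacent pairs until a full pass makes no swaps.
void bubble_sort(int values[], int n)
{
    bool swapped = true;
    while (swapped)
    {
        swapped = false;
        // i stops at n - 2 so that values[i + 1] is always inside the array
        for (int i = 0; i <= n - 2; i++)
        {
            if (values[i] > values[i + 1])
            {
                swap(&values[i], &values[i + 1]);
                swapped = true;
            }
        }
    }
}

// Selection sort: find the smallest remaining value and swap it into position i.
void selection_sort(int values[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        int smallest = i; // index of the smallest value seen so far
        for (int j = i + 1; j < n; j++)
        {
            if (values[j] < values[smallest])
            {
                smallest = j;
            }
        }
        if (smallest != i)
        {
            // evict whoever is at position i, however far away the smallest was
            swap(&values[i], &values[smallest]);
        }
    }
}
```

Unlike the on-stage demonstration, this bubble sort sweeps the whole array on every pass; shrinking the range after each pass would skip the elements already in place.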
2565 01:52:18,920 --> 01:52:20,750 So two different framings of the problem, 2566 01:52:20,750 --> 01:52:24,950 but it turns out they're actually both the same number of steps, give or take. 2567 01:52:24,950 --> 01:52:27,200 It turns out they're roughly the same number of steps, 2568 01:52:27,200 --> 01:52:29,450 even though it's a different way of thinking about it. 2569 01:52:29,450 --> 01:52:32,620 Because if I think about bubble sort, the first iteration, for instance, 2570 01:52:32,620 --> 01:52:35,500 what just-- actually, well, let's consider selection sort even. 2571 01:52:35,500 --> 01:52:39,110 In selection sort, how many comparisons did I have to do? 2572 01:52:39,110 --> 01:52:41,450 Well, to find my smallest element, I 2573 01:52:41,450 --> 01:52:43,530 had to compare them against everyone else. 2574 01:52:43,530 --> 01:52:46,100 So that's n-1 comparisons the first time. 2575 01:52:46,100 --> 01:52:47,600 So n-1 on the board. 2576 01:52:47,600 --> 01:52:50,570 Then I can ignore them, because they're behind me now. 2577 01:52:50,570 --> 01:52:54,050 So now I have how many comparisons left out of n people? 2578 01:52:54,050 --> 01:52:56,210 n-2, because I subtracted one. 2579 01:52:56,210 --> 01:53:00,810 Then again, n-3, then n-4, all the way down to just one person remaining. 2580 01:53:00,810 --> 01:53:03,560 So I'll express that sort of generally, mathematically, like this. 2581 01:53:03,560 --> 01:53:09,590 So n-1 plus n-2 plus whatever plus one final comparison, whatever that is. 
2582 01:53:09,590 --> 01:53:12,130 It turns out that if you actually read the back of the math 2583 01:53:12,130 --> 01:53:14,840 book or your physics textbooks where they have those little cheat 2584 01:53:14,840 --> 01:53:20,930 sheets as to what these recurrences are, turns out that n-1 plus n-2 plus n-3 2585 01:53:20,930 --> 01:53:22,850 and so forth can be expressed more succinctly 2586 01:53:22,850 --> 01:53:26,600 as literally just n times (n-1) divided by 2. 2587 01:53:26,600 --> 01:53:28,920 And if you don't recall that, that's OK. 2588 01:53:28,920 --> 01:53:30,510 I always look these things up as well. 2589 01:53:30,510 --> 01:53:32,180 But that's true-- fact. 2590 01:53:32,180 --> 01:53:33,690 So what does that equal out to? 2591 01:53:33,690 --> 01:53:36,900 Well, it's like n squared minus n, if you just multiply it out. 2592 01:53:36,900 --> 01:53:38,660 And then if you divide by two, then it's 2593 01:53:38,660 --> 01:53:40,760 n squared divided by 2 minus n over 2. 2594 01:53:40,760 --> 01:53:42,260 So that's the total number of steps. 2595 01:53:42,260 --> 01:53:43,250 And I could actually plug this in. 2596 01:53:43,250 --> 01:53:46,610 We could plug in 8, do the math, and get the total number of comparisons 2597 01:53:46,610 --> 01:53:49,580 that I was verbally kind of rattling off. 2598 01:53:49,580 --> 01:53:51,750 So is that a big deal? 2599 01:53:51,750 --> 01:53:54,360 Hm, it feels like it's on the order of n squared. 2600 01:53:54,360 --> 01:53:56,360 And indeed, a computer scientist, when assessing 2601 01:53:56,360 --> 01:53:59,210 the efficiency of an algorithm, tends not to care too much 2602 01:53:59,210 --> 01:54:00,350 about the precise values. 2603 01:54:00,350 --> 01:54:02,800 All we're going to care about is the biggest term. 
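Written out, the sum and its closed form from the passage above look like this. The second line is a worked check that plugs in n = 8, the number of volunteers on stage; it is added here for illustration and does not come from the lecture slides.

```latex
\[
(n-1) + (n-2) + \cdots + 1 \;=\; \frac{n(n-1)}{2} \;=\; \frac{n^2}{2} - \frac{n}{2}
\]
\[
n = 8:\quad 7 + 6 + 5 + 4 + 3 + 2 + 1 \;=\; 28 \;=\; \frac{8 \cdot 7}{2} \;=\; \frac{64}{2} - \frac{8}{2}
\]
```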
2604 01:54:02,800 --> 01:54:05,030 What's the value in the formula that you come up 2605 01:54:05,030 --> 01:54:07,820 with that just dominates the other terms, so to speak, 2606 01:54:07,820 --> 01:54:11,610 that has the biggest effect, especially as n is getting larger and larger? 2607 01:54:11,610 --> 01:54:12,560 Now, why is this? 2608 01:54:12,560 --> 01:54:15,290 Well, let's just do sort of proof by example, if you will. 2609 01:54:15,290 --> 01:54:18,170 If this is the expression, technically, but I 2610 01:54:18,170 --> 01:54:20,270 claim that, ugh, it's close enough to say 2611 01:54:20,270 --> 01:54:24,900 on the order of, big O of n squared, so to speak, let's use an example. 2612 01:54:24,900 --> 01:54:27,620 If there's a million people on stage, and not just eight, 2613 01:54:27,620 --> 01:54:29,630 that math works out to be like a million squared 2614 01:54:29,630 --> 01:54:33,490 divided by 2 steps minus a million divided by 2, total. 2615 01:54:33,490 --> 01:54:35,240 So what does that actually work out to be? 2616 01:54:35,240 --> 01:54:38,920 Well, that's 500 billion minus 500,000. 2617 01:54:38,920 --> 01:54:40,340 And what does that work out to be? 2618 01:54:40,340 --> 01:54:46,220 Well, that's 499 billion, 999 million, 500,000. 2619 01:54:46,220 --> 01:54:49,940 That feels pretty darn close to like n squared. 2620 01:54:49,940 --> 01:54:54,750 I mean, that's a drop in the bucket to subtract 500,000 from 500 billion. 2621 01:54:54,750 --> 01:54:55,460 So you know what? 2622 01:54:55,460 --> 01:54:57,770 Eh, it's on the order of n squared. 2623 01:54:57,770 --> 01:55:01,440 It's not precise, but it's in that general order of magnitude, 2624 01:55:01,440 --> 01:55:02,190 so to speak. 
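The million-person arithmetic above can be checked with a few lines of C. This is a minimal sketch; the function names `exact_comparisons` and `dominant_term` are made up for this example, and `long long` is used because the values exceed what a 32-bit `int` can hold.

```c
// Exact number of comparisons for n items: (n-1) + (n-2) + ... + 1 = n(n-1)/2.
long long exact_comparisons(long long n)
{
    return n * (n - 1) / 2;
}

// Only the biggest term of that formula, n^2 / 2, ignoring the "minus n / 2".
long long dominant_term(long long n)
{
    return n * n / 2;
}
```

For n = 1,000,000 the two results differ by 500,000 out of roughly 500 billion, about one part in a million, which is why the smaller term gets dropped.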
2625 01:55:02,190 --> 01:55:04,700 And so this symbol, this capital O, is literally a symbol 2626 01:55:04,700 --> 01:55:06,860 used in computer science and in programming 2627 01:55:06,860 --> 01:55:09,230 to just kind of describe with a wave of the hand, 2628 01:55:09,230 --> 01:55:13,730 but some good intuition and algorithm, how fast or slow your algorithm is. 2629 01:55:13,730 --> 01:55:16,660 And it turns out there's different ways to evaluate algorithms 2630 01:55:16,660 --> 01:55:18,440 with just different similar formulas. 2631 01:55:18,440 --> 01:55:21,950 n squared happens to be how much time both bubble sort and selection 2632 01:55:21,950 --> 01:55:22,460 sort take. 2633 01:55:22,460 --> 01:55:24,290 If I literally count up all of the work we 2634 01:55:24,290 --> 01:55:26,180 were doing on stage with our volunteers, it 2635 01:55:26,180 --> 01:55:32,300 would be roughly n squared, 8 squared, or 64 steps, give or take, 2636 01:55:32,300 --> 01:55:33,560 for all of those humans. 2637 01:55:33,560 --> 01:55:35,010 And that would be notably off. 2638 01:55:35,010 --> 01:55:36,930 There's a good amount of rounding error there. 2639 01:55:36,930 --> 01:55:39,830 But if we had a million volunteers on stage, 2640 01:55:39,830 --> 01:55:42,450 then the rounding error would be pretty negligible. 2641 01:55:42,450 --> 01:55:45,150 But we've actually seen some of these other orders of magnitude, 2642 01:55:45,150 --> 01:55:46,710 so to speak, before. 2643 01:55:46,710 --> 01:55:49,940 For instance, when we counted someone, or we searched for Mike Smith one page 2644 01:55:49,940 --> 01:55:52,340 at a time, we called that a linear algorithm. 2645 01:55:52,340 --> 01:55:53,750 And that was big O of n. 2646 01:55:53,750 --> 01:55:55,190 So it's on the order of n steps. 2647 01:55:55,190 --> 01:55:55,850 It's 1,000. 2648 01:55:55,850 --> 01:55:56,730 Maybe it's 999. 2649 01:55:56,730 --> 01:55:57,230 Whatever. 
2650 01:55:57,230 --> 01:55:58,880 It's on the order of n steps. 2651 01:55:58,880 --> 01:56:02,840 The [? twosies ?] approach was twice as fast, recall-- two pages at a time. 2652 01:56:02,840 --> 01:56:03,590 But you know what? 2653 01:56:03,590 --> 01:56:05,010 That's still linear, right? 2654 01:56:05,010 --> 01:56:06,100 Like two pages at a time? 2655 01:56:06,100 --> 01:56:08,270 Let me just wait till next year when my CPU is twice 2656 01:56:08,270 --> 01:56:10,930 as fast, because Intel and companies keep speeding up computers. 2657 01:56:10,930 --> 01:56:12,630 The algorithm is fundamentally the same. 2658 01:56:12,630 --> 01:56:15,860 And indeed, if you think back to the picture we drew, 2659 01:56:15,860 --> 01:56:18,890 the shapes of those curves were indeed the same. 2660 01:56:18,890 --> 01:56:22,100 That first algorithm, finding Mike one page at a time looked like this. 2661 01:56:22,100 --> 01:56:24,230 Second algorithm finding him looked like this. 2662 01:56:24,230 --> 01:56:28,070 Only the third algorithm, the divide and conquer, splitting the phone book 2663 01:56:28,070 --> 01:56:29,900 was a fundamentally different shape. 2664 01:56:29,900 --> 01:56:33,140 And so even though we didn't use this fancy phrasing a couple of weeks 2665 01:56:33,140 --> 01:56:37,510 ago, these first algorithms, one page at a time, two pages at a time, eh, 2666 01:56:37,510 --> 01:56:39,680 they're on the order of n. 2667 01:56:39,680 --> 01:56:42,560 Technically, yes, n versus n divided by 2, 2668 01:56:42,560 --> 01:56:46,170 but we only care about the dominating factor, the variable n. 2669 01:56:46,170 --> 01:56:48,170 We can throw away everything in the denominator, 2670 01:56:48,170 --> 01:56:51,350 and we can throw away everything that's smaller than the biggest term, which 2671 01:56:51,350 --> 01:56:52,940 in this case is just n. 2672 01:56:52,940 --> 01:56:54,940 And I alluded to this two weeks ago-- 2673 01:56:54,940 --> 01:56:55,800 logarithmic. 
2674 01:56:55,800 --> 01:56:58,760 Well, it turns out that any time you divide something again, and again, 2675 01:56:58,760 --> 01:57:02,520 and again, you're leveraging a logarithmic type function, 2676 01:57:02,520 --> 01:57:03,890 log base 2 technically. 2677 01:57:03,890 --> 01:57:08,010 But on the order of log n is a common one as well. 2678 01:57:08,010 --> 01:57:10,640 The beautiful algorithms are these-- 2679 01:57:10,640 --> 01:57:14,310 literally, one step, or technically constant number of steps. 2680 01:57:14,310 --> 01:57:17,450 For instance, like what's an algorithm that might be constant time? 2681 01:57:20,270 --> 01:57:21,570 Open phone book. 2682 01:57:21,570 --> 01:57:22,600 OK, one step. 2683 01:57:22,600 --> 01:57:24,560 Doesn't really matter how many pages there are, 2684 01:57:24,560 --> 01:57:25,790 I'm just going to open the phone book. 2685 01:57:25,790 --> 01:57:27,680 And that doesn't vary by number of pages. 2686 01:57:27,680 --> 01:57:30,120 That might be a constant time algorithm, for instance. 2687 01:57:30,120 --> 01:57:32,060 So those are the lowest you can go. 2688 01:57:32,060 --> 01:57:34,580 And then there's somewhere even in between here 2689 01:57:34,580 --> 01:57:37,820 that we might aspire to with certain other algorithms. 2690 01:57:37,820 --> 01:57:41,020 So in fact, let's just see if-- just a moment-- 2691 01:57:41,020 --> 01:57:44,990 let's just see if we can do this a little more succinctly. 2692 01:57:44,990 --> 01:57:50,690 Let's go ahead and use arrays in just one final way, using merge sort. 2693 01:57:50,690 --> 01:57:53,510 So it turns out, using an array, we can actually 2694 01:57:53,510 --> 01:57:56,870 do something pretty powerful, so long as we allow ourselves 2695 01:57:56,870 --> 01:57:58,200 a couple of arrays. 2696 01:57:58,200 --> 01:58:00,510 So again, when we were just sorting with bubble sort and selection sort, 2697 01:58:00,510 --> 01:58:01,420 we had just one array. 
2698 01:58:01,420 --> 01:58:04,860 We had eight chairs for our eight people. 2699 01:58:04,860 --> 01:58:07,910 But if I actually allowed myself like 16 chairs, or even more, 2700 01:58:07,910 --> 01:58:10,070 and I allowed these folks to move a bit more, 2701 01:58:10,070 --> 01:58:12,710 I could actually do even better than that using arrays. 2702 01:58:12,710 --> 01:58:16,040 So here's some random numbers that we'll just do visually, without any humans. 2703 01:58:16,040 --> 01:58:18,410 And they're in an array, back, to back, to back, to back. 2704 01:58:18,410 --> 01:58:20,270 But if I allow myself a second array, I'm 2705 01:58:20,270 --> 01:58:23,360 going to be able to shuffle these things around and not just compare them, 2706 01:58:23,360 --> 01:58:26,450 because it was those comparisons and all of my footsteps in front of them 2707 01:58:26,450 --> 01:58:28,320 that really started to take a lot of time. 2708 01:58:28,320 --> 01:58:29,260 So here's my array. 2709 01:58:29,260 --> 01:58:29,850 You know what? 2710 01:58:29,850 --> 01:58:32,720 Just like the phone book-- that phone book example got us pretty far 2711 01:58:32,720 --> 01:58:33,800 in the first week-- 2712 01:58:33,800 --> 01:58:38,090 let me do half of the problem at a time and then kind of combine my answer. 2713 01:58:38,090 --> 01:58:39,020 So here's an array-- 2714 01:58:39,020 --> 01:58:42,080 4, 2, 7, 5, 6, 8, 3, 1-- randomly sorted. 2715 01:58:42,080 --> 01:58:44,420 Let me go ahead and sort just half of this, 2716 01:58:44,420 --> 01:58:47,760 just like I searched for Mike initially in just half of the phone book. 2717 01:58:47,760 --> 01:58:50,570 So 4, 2, 7, 5-- not sorted. 2718 01:58:50,570 --> 01:58:51,500 But you know what? 2719 01:58:51,500 --> 01:58:53,410 This feels like too big of a problem, still. 2720 01:58:53,410 --> 01:58:56,980 Let me sort just the left half of the left half. 2721 01:58:56,980 --> 01:58:58,280 OK, now it's a smaller problem. 
2722 01:58:58,280 --> 01:58:59,040 You know what? 2723 01:58:59,040 --> 01:59:00,770 4 and 2, still out of order. 2724 01:59:00,770 --> 01:59:05,540 Let me just divide this list of two into two tiny arrays, each of size 1. 2725 01:59:05,540 --> 01:59:08,500 So here's a mini-array of size 1, and then another one of like size 2726 01:59:08,500 --> 01:59:10,970 7, but they're back to back, so whatever. 2727 01:59:10,970 --> 01:59:14,060 But this array of size 1, is it sorted? 2728 01:59:14,060 --> 01:59:15,220 AUDIENCE: No. 2729 01:59:15,220 --> 01:59:16,310 DAVID J. MALAN: I'm sorry? 2730 01:59:16,310 --> 01:59:17,210 AUDIENCE: No. 2731 01:59:17,210 --> 01:59:18,490 DAVID J. MALAN: No? 2732 01:59:18,490 --> 01:59:21,360 If this array has just one element and that element is 4-- 2733 01:59:21,360 --> 01:59:22,180 AUDIENCE: There's only one thing you can do. 2734 01:59:22,180 --> 01:59:24,470 DAVID J. MALAN: Yes, then it is sorted, by definition. 2735 01:59:24,470 --> 01:59:25,600 All right, so done. 2736 01:59:25,600 --> 01:59:26,860 Making some progress. 2737 01:59:26,860 --> 01:59:28,720 Now, let me kind of mentally rewind. 2738 01:59:28,720 --> 01:59:32,300 Let me sort the right half of that array. 2739 01:59:32,300 --> 01:59:34,360 So now I have another array of size 1. 2740 01:59:34,360 --> 01:59:36,550 Is this array sorted? 2741 01:59:36,550 --> 01:59:37,930 Yeah, kind of stupidly. 2742 01:59:37,930 --> 01:59:39,250 We don't really seem to be doing anything. 2743 01:59:39,250 --> 01:59:40,290 We're just making claims. 2744 01:59:40,290 --> 01:59:41,650 But yes, this is sorted. 2745 01:59:41,650 --> 01:59:44,200 But now, this was the original half. 2746 01:59:44,200 --> 01:59:46,030 And this half is sorted. 2747 01:59:46,030 --> 01:59:47,250 This half is sorted. 2748 01:59:47,250 --> 01:59:49,840 What if I now just kind of merge these sorted halves? 2749 01:59:49,840 --> 01:59:52,060 I've got two lists of size 1-- 2750 01:59:52,060 --> 01:59:53,560 4 and 2. 
2751 01:59:53,560 --> 01:59:56,740 And now if I have extra storage space, if I had like extra benches, 2752 01:59:56,740 --> 01:59:58,180 I could do this a little better. 2753 01:59:58,180 --> 02:00:00,650 So why don't I go ahead and merge these two as follows? 2754 02:00:00,650 --> 02:00:02,070 2 will go there. 2755 02:00:02,070 --> 02:00:03,460 4 will go there. 2756 02:00:03,460 --> 02:00:06,910 So now I've taken two sorted lists and made one bigger, more sorted list 2757 02:00:06,910 --> 02:00:10,030 by just merging them together, leveraging some additional space. 2758 02:00:10,030 --> 02:00:11,200 Now, let me mentally rewind. 2759 02:00:11,200 --> 02:00:12,480 How did I get to 4 and 2? 2760 02:00:12,480 --> 02:00:15,880 Well, I started with the left half, then the left half of the left half. 2761 02:00:15,880 --> 02:00:19,330 Let me now do the right half of the left half, if you will. 2762 02:00:19,330 --> 02:00:20,830 All right, let me divide this again. 2763 02:00:20,830 --> 02:00:23,440 7, list of size 1, is it sorted? 2764 02:00:23,440 --> 02:00:24,720 Yes, trivially. 2765 02:00:24,720 --> 02:00:26,440 5, is it sorted? 2766 02:00:26,440 --> 02:00:27,250 Yes. 2767 02:00:27,250 --> 02:00:29,710 7 and 5, let's go ahead and merge them together. 2768 02:00:29,710 --> 02:00:31,480 5 is, of course, going to go here. 2769 02:00:31,480 --> 02:00:33,840 7, of course, is going to go here. 2770 02:00:33,840 --> 02:00:34,420 OK. 2771 02:00:34,420 --> 02:00:35,470 Now where do we go? 2772 02:00:35,470 --> 02:00:37,570 We originally sorted the left half. 2773 02:00:37,570 --> 02:00:39,290 Let's go sort the right-- oh, right. 2774 02:00:39,290 --> 02:00:40,060 Sorry. 2775 02:00:40,060 --> 02:00:41,680 Now, the left half of the left half 2776 02:00:41,680 --> 02:00:45,130 and the right half of the left half are sorted. 2777 02:00:45,130 --> 02:00:46,540 Let's go ahead and merge these. 
2778 02:00:46,540 --> 02:00:48,950 We have two lists now of size 2-- 2779 02:00:48,950 --> 02:00:52,540 2, 4 and 5, 7, both of which are sorted. 2780 02:00:52,540 --> 02:00:56,170 If I now merge 2, 4 and 5, 7, which element should come first 2781 02:00:56,170 --> 02:00:59,260 in the new longer list, obviously? 2782 02:00:59,260 --> 02:01:00,180 2. 2783 02:01:00,180 --> 02:01:01,890 And then 4, then 5, and then 7. 2784 02:01:01,890 --> 02:01:03,100 That wasn't much of anything. 2785 02:01:03,100 --> 02:01:05,870 But OK, we're just using a little more space in our array. 2786 02:01:05,870 --> 02:01:07,350 Now what comes next? 2787 02:01:07,350 --> 02:01:08,560 Now, let's do the right half. 2788 02:01:08,560 --> 02:01:11,410 Again, we started by taking the whole problem, doing the left half, 2789 02:01:11,410 --> 02:01:14,740 the left half of the left half, the left half of the left half of the left half. 2790 02:01:14,740 --> 02:01:17,000 And now we're going back in time, if you will. 2791 02:01:17,000 --> 02:01:20,350 So let's divide this into two halves, now the left half into two 2792 02:01:20,350 --> 02:01:21,000 halves still. 2793 02:01:21,000 --> 02:01:22,350 6 is sorted. 2794 02:01:22,350 --> 02:01:23,380 8 is sorted. 2795 02:01:23,380 --> 02:01:24,640 Now I have to merge them-- 2796 02:01:24,640 --> 02:01:26,170 6, 8. 2797 02:01:26,170 --> 02:01:26,950 What comes next? 2798 02:01:26,950 --> 02:01:29,020 Right half-- 3 and 1. 2799 02:01:29,020 --> 02:01:31,960 Well, left half is sorted, right half is sorted-- 2800 02:01:31,960 --> 02:01:33,760 1 and 3. 2801 02:01:33,760 --> 02:01:35,260 All right, now how do I merge these? 2802 02:01:35,260 --> 02:01:38,560 6, 8, 1, 3, which element should obviously come first? 2803 02:01:38,560 --> 02:01:42,880 1, then 3, then 6, then 8. 2804 02:01:42,880 --> 02:01:45,610 And then lastly, I have two lists of size four. 2805 02:01:45,610 --> 02:01:48,290 Let me give myself a little more space, one more array. 
2806 02:01:48,290 --> 02:01:53,320 Now let me go ahead and put 1, and 2, and 3, and 4, and 5, 2807 02:01:53,320 --> 02:01:56,380 and 6, and 7, and 8. 2808 02:01:56,380 --> 02:01:57,580 What just happened? 2809 02:01:57,580 --> 02:02:00,640 Because it actually happened a lot faster, even though we were doing this 2810 02:02:00,640 --> 02:02:01,630 all verbally. 2811 02:02:01,630 --> 02:02:06,430 Well notice, how many times did each number change locations? 2812 02:02:09,760 --> 02:02:10,810 Literally three, right? 2813 02:02:10,810 --> 02:02:13,330 Like one, two, three, right? 2814 02:02:13,330 --> 02:02:17,230 It moved from the original array, to the secondary array, to the tertiary array, 2815 02:02:17,230 --> 02:02:19,870 to the fourth array, whatever that's called. 2816 02:02:19,870 --> 02:02:21,580 And then it was ultimately in place. 2817 02:02:21,580 --> 02:02:24,670 So each number had to move one, two, three spots. 2818 02:02:24,670 --> 02:02:26,840 And then how many numbers are there? 2819 02:02:26,840 --> 02:02:28,240 AUDIENCE: [INAUDIBLE] 2820 02:02:28,240 --> 02:02:30,830 DAVID J. MALAN: Well, they were already in the original array. 2821 02:02:30,830 --> 02:02:32,450 So how many times do they have to move? 2822 02:02:32,450 --> 02:02:33,890 Just one, two, three. 2823 02:02:33,890 --> 02:02:36,500 So how many total numbers are there, just to be clear? 2824 02:02:36,500 --> 02:02:37,130 There's eight. 2825 02:02:37,130 --> 02:02:38,360 So 8 times 3. 2826 02:02:38,360 --> 02:02:39,470 So let's generalize this. 2827 02:02:39,470 --> 02:02:43,280 If there's n numbers, and each time we moved 2828 02:02:43,280 --> 02:02:46,940 the numbers we did like half of them, then half, then half, well, 2829 02:02:46,940 --> 02:02:50,510 how many times can you divide 8 by 2? 2830 02:02:50,510 --> 02:02:51,090 8 goes to 4. 2831 02:02:51,090 --> 02:02:52,340 4 goes to 2. 2832 02:02:52,340 --> 02:02:53,520 2 goes to 1. 
2833 02:02:53,520 --> 02:02:57,170 And that's why we bottomed out at one element, lists of size 1. 2834 02:02:57,170 --> 02:03:00,300 So it turns out whenever you divide something by half, by half, by half, 2835 02:03:00,300 --> 02:03:05,240 what is that function or formula? 2836 02:03:05,240 --> 02:03:06,320 Not power, that's bad. 2837 02:03:06,320 --> 02:03:07,560 That's the other direction. 2838 02:03:07,560 --> 02:03:08,060 AUDIENCE: [INAUDIBLE] 2839 02:03:08,060 --> 02:03:08,970 DAVID J. MALAN: It's a logarithm. 2840 02:03:08,970 --> 02:03:11,210 So again, logarithm is just a mathematical description 2841 02:03:11,210 --> 02:03:14,210 for any function that you keep dividing something again, and again, and again. 2842 02:03:14,210 --> 02:03:17,200 In half, in half, in half, in third, in third, in third, whatever it is, 2843 02:03:17,200 --> 02:03:20,450 it just means division by the same proportional amounts again, 2844 02:03:20,450 --> 02:03:22,170 and again, and again. 2845 02:03:22,170 --> 02:03:27,920 And so if we move the numbers three times, or more generally log 2846 02:03:27,920 --> 02:03:31,220 of n times, which again just means you divided n things again, 2847 02:03:31,220 --> 02:03:33,710 and again, and again, you just call that log n. 2848 02:03:33,710 --> 02:03:36,890 And there's n numbers, so n numbers moved 2849 02:03:36,890 --> 02:03:40,850 log n times, the total arithmetic here in question 2850 02:03:40,850 --> 02:03:44,270 is one of those other values on our little cheat sheet, which 2851 02:03:44,270 --> 02:03:46,130 looked like this. 2852 02:03:46,130 --> 02:03:51,140 In our other cheat sheet, recall that we had formulas that looked like this, 2853 02:03:51,140 --> 02:03:55,910 not just n squared and n, and log n, and 1, we have this one in the middle-- 2854 02:03:55,910 --> 02:03:57,850 n times log n. 2855 02:03:57,850 --> 02:03:59,690 So again, we're kind of jumping around here. 
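The arithmetic above (8 goes to 4, 4 goes to 2, 2 goes to 1, so three halvings, and 8 numbers times 3 moves each) can be checked with a short C sketch. The function names `halvings` and `total_moves` are hypothetical, chosen only for illustration, and the count is the whole-number base-2 logarithm, which is exact when n is a power of 2.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: how many times can n be divided by 2 before
// reaching 1? For n >= 1 this is floor(log2(n)).
unsigned halvings(size_t n)
{
    assert(n >= 1); // the logarithm is not defined for 0

    unsigned count = 0;
    while (n > 1)
    {
        n /= 2;
        count++;
    }
    return count;
}

// Hypothetical helper: if each of n numbers moves once per halving
// level, the total number of moves is n * log2(n).
size_t total_moves(size_t n)
{
    return n * halvings(n);
}
```

With n equal to 8, `halvings` gives 3 and `total_moves` gives 24, the "8 times 3" from the example; with n equal to 1024, the per-number count only grows to 10.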
2856 02:03:59,690 --> 02:04:02,480 But again, each number moves log n times. 2857 02:04:02,480 --> 02:04:03,950 There's n total numbers. 2858 02:04:03,950 --> 02:04:07,820 So n times log n is just, by definition, n log n. 2859 02:04:07,820 --> 02:04:09,740 But why is this sorted this way? 2860 02:04:09,740 --> 02:04:12,500 Well log n, recall from week 0 with the phone book example, 2861 02:04:12,500 --> 02:04:16,520 the green curve is definitely smaller than n. n was the straight line, 2862 02:04:16,520 --> 02:04:18,350 log n was the green curved one. 2863 02:04:18,350 --> 02:04:21,350 So this indeed belongs in between, because this is n times n. 2864 02:04:21,350 --> 02:04:22,220 This is n. 2865 02:04:22,220 --> 02:04:25,160 This is n times something smaller than n. 2866 02:04:25,160 --> 02:04:26,870 So what's the actual implication? 2867 02:04:26,870 --> 02:04:29,930 Well, if we were to run these algorithms side by side 2868 02:04:29,930 --> 02:04:34,310 and actually compare them with something like this-- 2869 02:04:34,310 --> 02:04:41,070 let me go ahead and compare these algorithms using this demo here-- 2870 02:04:41,070 --> 02:04:44,190 if I go ahead and hit play, we'll see that the bars in this chart 2871 02:04:44,190 --> 02:04:45,510 are actually horizontal. 2872 02:04:45,510 --> 02:04:47,670 And the small bars represent small numbers, 2873 02:04:47,670 --> 02:04:49,200 large bars represent large numbers. 2874 02:04:49,200 --> 02:04:52,200 And then each of these is going to run a different algorithm-- selection 2875 02:04:52,200 --> 02:04:54,390 sort on the left, bubble sort in the middle, 2876 02:04:54,390 --> 02:04:57,450 merge sort, as we'll now call it, on the right. 2877 02:04:57,450 --> 02:05:00,240 And here's how long each of them takes to sort those values. 2878 02:05:04,630 --> 02:05:06,010 Bubble's still going. 2879 02:05:06,010 --> 02:05:07,060 Selection's still going.
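Merge sort, the algorithm on the right of the demo, can be sketched in C along the following lines. This is one possible sketch under stated assumptions, not the lecture's or the demo's implementation: the names `merge_sort` and `sort_range` are hypothetical, the caller is assumed to supply a scratch array `tmp` as large as the input, and the return value counts one move per element per merge so that the n log n tally can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: sort a[lo..hi) using tmp as scratch space.
// Returns how many elements were copied into tmp across all merges.
static size_t sort_range(int a[], int tmp[], size_t lo, size_t hi)
{
    if (hi - lo < 2)
    {
        return 0; // a list of size 0 or 1 is already sorted
    }

    size_t mid = lo + (hi - lo) / 2;
    size_t moves = sort_range(a, tmp, lo, mid); // sort the left half
    moves += sort_range(a, tmp, mid, hi);       // sort the right half

    // Merge the two sorted halves into tmp[lo..hi).
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
    {
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    }
    while (i < mid)
    {
        tmp[k++] = a[i++];
    }
    while (j < hi)
    {
        tmp[k++] = a[j++];
    }

    // Copy the merged run back so the caller sees it in a.
    memcpy(&a[lo], &tmp[lo], (hi - lo) * sizeof(int));
    return moves + (hi - lo);
}

// Hypothetical entry point: sort the n ints in a; tmp must also hold n ints.
size_t merge_sort(int a[], int tmp[], size_t n)
{
    assert(n < 2 || (a != NULL && tmp != NULL));
    return sort_range(a, tmp, 0, n);
}
```

For an 8-element input such as {4, 2, 7, 5, 6, 8, 3, 1} (an assumed starting order, consistent with the halves read out earlier), this sketch performs 24 counted moves, which is the 8 times 3 from the walk-through. Copying back into the original array after each merge is a design choice made here to keep the scratch space to a single extra array; the verbal demo instead used a fresh array at each level.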
2880 02:05:07,060 --> 02:05:09,860 And so that's the appreciable difference, albeit with a small demo, 2881 02:05:09,860 --> 02:05:12,610 between n squared and something like n log n. 2882 02:05:12,610 --> 02:05:13,860 And so what have we done here? 2883 02:05:13,860 --> 02:05:17,060 We've really, really, really got into the weeds of what arrays can actually 2884 02:05:17,060 --> 02:05:20,060 do for us and what the relationships are with strings, because all of it 2885 02:05:20,060 --> 02:05:22,580 kind of reduces to just things being back, to back, to back, to back. 2886 02:05:22,580 --> 02:05:24,410 But now that we kind of come back, and we'll 2887 02:05:24,410 --> 02:05:26,480 continue along this trajectory next time to be 2888 02:05:26,480 --> 02:05:29,810 able to talk at a much higher level about what's actually going on. 2889 02:05:29,810 --> 02:05:32,810 And we can now take this even further, by applying 2890 02:05:32,810 --> 02:05:35,420 other sort of forms of media to these same kinds of questions. 2891 02:05:35,420 --> 02:05:37,370 And we'll conclude it's about 60 seconds long. 2892 02:05:37,370 --> 02:05:39,250 These bars are vertical, instead of horizontal. 2893 02:05:39,250 --> 02:05:41,040 And what you'll see here is a visualization 2894 02:05:41,040 --> 02:05:43,770 of various sorting algorithms, among them selection sort, bubble 2895 02:05:43,770 --> 02:05:46,700 sort, and merge sort, and a whole assortment of others, each of which 2896 02:05:46,700 --> 02:05:50,030 has even a different sound to it because of the speed 2897 02:05:50,030 --> 02:05:53,550 and the pattern by which it actually operates. 2898 02:05:53,550 --> 02:05:54,680 So let's take a quick look. 2899 02:05:54,680 --> 02:05:55,340 [VIDEO PLAYBACK] 2900 02:05:55,340 --> 02:05:56,640 [MUSIC PLAYING] 2901 02:06:05,550 --> 02:06:06,860 This is bubble sort. 2902 02:06:06,860 --> 02:06:10,700 And you can see how the larger elements are indeed bubbling up to the top.
2903 02:06:15,730 --> 02:06:16,230 [? 2904 02:06:16,230 --> 02:06:18,060 And you can kind of hear the ?] periodicity, 2905 02:06:18,060 --> 02:06:20,600 or the cycle that it's going in. 2906 02:06:25,690 --> 02:06:33,470 And there's less, and less, and less, and less work to do, until almost-- 2907 02:06:33,470 --> 02:06:34,910 This is selection sort now. 2908 02:06:34,910 --> 02:06:38,810 So it starts off random, but we keep selecting the smallest human 2909 02:06:38,810 --> 02:06:41,900 or, in this case, the shortest bar. 2910 02:06:41,900 --> 02:06:45,620 And you'll see here the bars correlate with frequency, clearly. 2911 02:06:45,620 --> 02:06:50,210 So it's getting higher and higher and taller and taller. 2912 02:06:50,210 --> 02:06:53,780 This is merge sort now which, recall, does things in halves, 2913 02:06:53,780 --> 02:06:57,410 and then halves of halves, and then merges those halves. 2914 02:06:57,410 --> 02:07:03,210 So we just did all the left work, almost all the right work. 2915 02:07:03,210 --> 02:07:04,340 That one's very gratifying. 2916 02:07:04,340 --> 02:07:06,360 [LAUGHS] 2917 02:07:06,360 --> 02:07:10,550 This is something called gnome sort, which is improving things. 2918 02:07:10,550 --> 02:07:13,490 Not quite perfectly, but it's always making forward progress, 2919 02:07:13,490 --> 02:07:15,830 and then kind of doubling back and cleaning things up. 2920 02:07:24,000 --> 02:07:24,960 [END PLAYBACK] 2921 02:07:24,960 --> 02:07:25,460 Whew. 2922 02:07:25,460 --> 02:07:26,120 That was a lot. 2923 02:07:26,120 --> 02:07:26,960 Let's call it a day. 2924 02:07:26,960 --> 02:07:28,040 I'll stick around for one-on-one questions. 2925 02:07:28,040 --> 02:07:29,030 We'll see you next time. 2926 02:07:29,030 --> 02:07:31,090 [APPLAUSE]