1 00:00:00,000 --> 00:00:00,000 [MUSIC PLAYING] 2 00:01:18,000 --> 00:01:20,825 DAVID MALAN: This is CS50 and this is week 2. 3 00:01:20,825 --> 00:01:23,450 Now that you have some programming experience under your belts, 4 00:01:23,450 --> 00:01:25,910 in this more arcane language called C, 5 00:01:25,910 --> 00:01:28,790 among our goals today is to help you understand exactly what you have 6 00:01:28,790 --> 00:01:30,650 been doing these past several days, 7 00:01:30,650 --> 00:01:33,955 wrestling with your first programs in C, so that you have more of a bottom 8 00:01:33,955 --> 00:01:36,080 up understanding of what some of these commands do. 9 00:01:36,080 --> 00:01:38,580 And, ultimately, what more we can do with this language. 10 00:01:38,580 --> 00:01:41,750 So this, recall, was the very first program you wrote, 11 00:01:41,750 --> 00:01:44,870 I wrote, in this language called C, much more textual, 12 00:01:44,870 --> 00:01:46,970 certainly, than the Scratch equivalent. 13 00:01:46,970 --> 00:01:51,200 But at the end of the day, computers, your Mac, your PC, 14 00:01:51,200 --> 00:01:54,555 VS Code doesn't understand this actual code. 15 00:01:54,555 --> 00:01:57,680 What's the format into which we need to get any program that we write, just 16 00:01:57,680 --> 00:01:58,180 to recap? 17 00:01:58,180 --> 00:01:59,202 AUDIENCE: [INAUDIBLE] 18 00:01:59,202 --> 00:02:01,790 DAVID MALAN: So binary, otherwise known as machine code. 19 00:02:01,790 --> 00:02:02,290 Right? 20 00:02:02,290 --> 00:02:05,870 The 0s and 1s that your computer actually does understand. 21 00:02:05,870 --> 00:02:08,030 So somehow we need to get to this format. 22 00:02:08,030 --> 00:02:10,730 And up until now, we've been using this command called make, 23 00:02:10,730 --> 00:02:13,670 which is aptly named, because it lets you make programs. 24 00:02:13,670 --> 00:02:16,430 And the invocation of that has been pretty simple.
25 00:02:16,430 --> 00:02:20,450 make hello looks in your current directory or folder for a file called 26 00:02:20,450 --> 00:02:25,100 hello.c, implicitly, and then it compiles that into a file called hello, 27 00:02:25,100 --> 00:02:27,650 which itself is executable, which just means runnable, 28 00:02:27,650 --> 00:02:29,900 so that you can then do ./hello. 29 00:02:29,900 --> 00:02:34,190 But it turns out that make is actually not a compiler itself. 30 00:02:34,190 --> 00:02:35,840 It does help you make programs. 31 00:02:35,840 --> 00:02:40,520 But make is this utility that comes on a lot of systems that makes it easier 32 00:02:40,520 --> 00:02:44,060 to actually compile code by using an actual compiler, 33 00:02:44,060 --> 00:02:48,290 the program that converts source code to machine code, on your own Mac, or PC, 34 00:02:48,290 --> 00:02:50,660 or whatever cloud environment you might be using. 35 00:02:50,660 --> 00:02:53,330 In fact, what make is doing for us is, actually, 36 00:02:53,330 --> 00:02:57,230 automatically running a command known as clang, for C language. 37 00:02:57,230 --> 00:03:01,590 And, so here, for instance, in VS Code, is that very first program again, 38 00:03:01,590 --> 00:03:03,470 this time in the context of a text editor, 39 00:03:03,470 --> 00:03:06,680 and I could compile this with make hello. 40 00:03:06,680 --> 00:03:09,567 Let me go ahead and use the compiler itself manually. 41 00:03:09,567 --> 00:03:12,650 And we'll see in a moment why we've been automating the process with make. 42 00:03:12,650 --> 00:03:15,060 I'm going to run clang instead. 43 00:03:15,060 --> 00:03:17,340 And then I'm going to pass it hello.c. 44 00:03:17,340 --> 00:03:19,490 So it's a little different how the compiler's used. 45 00:03:19,490 --> 00:03:22,160 It needs to know, explicitly, what the file is called. 46 00:03:22,160 --> 00:03:25,280 I'll go ahead and run clang hello.c, Enter.
47 00:03:25,280 --> 00:03:28,415 Nothing seems to happen, which, generally speaking, is a good thing. 48 00:03:28,415 --> 00:03:29,790 Because no errors have popped up. 49 00:03:29,790 --> 00:03:36,140 And if I do ls for list, you'll see there is not a file called hello. 50 00:03:36,140 --> 00:03:39,230 But there is a curiously-named file called a.out. 51 00:03:39,230 --> 00:03:42,620 This is a historical convention; it stands for assembler output. 52 00:03:42,620 --> 00:03:45,380 And this is, just, the default file name for a program 53 00:03:45,380 --> 00:03:49,400 that you might compile yourself, manually, using clang itself. 54 00:03:49,400 --> 00:03:51,830 Let me go ahead now and point out that that's 55 00:03:51,830 --> 00:03:53,340 kind of a stupid name for a program. 56 00:03:53,340 --> 00:03:56,435 It does work, though: ./a.out would run it. 57 00:03:56,435 --> 00:03:59,060 But if you actually want to customize the name of your program, 58 00:03:59,060 --> 00:04:02,720 we could just resort to make, or we could do explicitly 59 00:04:02,720 --> 00:04:03,920 what make is doing for us. 60 00:04:03,920 --> 00:04:06,770 It turns out, some programs, among them make, 61 00:04:06,770 --> 00:04:08,990 support what are called command line arguments, 62 00:04:08,990 --> 00:04:10,310 and more on those later today. 63 00:04:10,310 --> 00:04:13,670 But these are literally words or numbers that you type at your prompt 64 00:04:13,670 --> 00:04:17,330 after the name of a program that just influence its behavior in some way. 65 00:04:17,330 --> 00:04:20,040 They modify its behavior. 66 00:04:20,040 --> 00:04:22,940 And it turns out, if you read the documentation for clang, 67 00:04:22,940 --> 00:04:28,040 you can actually pass a -o, for output, command line argument, that 68 00:04:28,040 --> 00:04:30,260 lets you specify, explicitly, what you want 69 00:04:30,260 --> 00:04:31,795 your outputted program to be called.
70 00:04:31,795 --> 00:04:34,670 And then you go ahead and type the name of the file that you actually 71 00:04:34,670 --> 00:04:37,110 want to compile, from source code to machine code. 72 00:04:37,110 --> 00:04:38,720 Let me hit Enter now. 73 00:04:38,720 --> 00:04:41,990 Again, nothing seems to happen, and I type ls and voila. 74 00:04:41,990 --> 00:04:45,010 Now we still have the old a.out, because I didn't delete it yet. 75 00:04:45,010 --> 00:04:46,010 And I do have hello now. 76 00:04:46,010 --> 00:04:50,420 So ./hello, voila, runs hello, world again. 77 00:04:50,420 --> 00:04:52,160 And let me go ahead and remove this file. 78 00:04:52,160 --> 00:04:56,593 I could, of course, resort to using the Explorer on the left-hand side, 79 00:04:56,593 --> 00:04:59,510 which I am in the habit of closing, just to give us more room to see. 80 00:04:59,510 --> 00:05:02,240 But I could go ahead and right-click or control-click on a.out 81 00:05:02,240 --> 00:05:03,365 if I want to get rid of it. 82 00:05:03,365 --> 00:05:06,300 Or again, let me focus on the command line interface. 83 00:05:06,300 --> 00:05:07,250 And I can use-- 84 00:05:07,250 --> 00:05:08,030 anyone recall? 85 00:05:08,030 --> 00:05:11,000 We didn't really use it much, but what command removes a file? 86 00:05:11,000 --> 00:05:12,665 AUDIENCE: rm. 87 00:05:12,665 --> 00:05:16,430 DAVID MALAN: So rm for remove. rm a.out, Enter. 88 00:05:16,430 --> 00:05:20,060 Remove regular file, a.out, y for yes, Enter. 89 00:05:20,060 --> 00:05:22,640 And now, if I do ls again, voila, it's gone. 90 00:05:22,640 --> 00:05:24,650 All right, so, let's now enhance this program 91 00:05:24,650 --> 00:05:30,290 to do the second version we ever did, which was to also include cs50.h, 92 00:05:30,290 --> 00:05:33,149 so that we have access to functions like get string and the like. 93 00:05:33,149 --> 00:05:40,340 Let me do string, name, gets, get string, what's your name, 94 00:05:40,340 --> 00:05:41,550 question mark.
95 00:05:41,550 --> 00:05:46,010 And now, let me go ahead and say hello to that name with our %s placeholder, 96 00:05:46,010 --> 00:05:46,920 comma, name. 97 00:05:46,920 --> 00:05:49,160 So this was version 2 of our program last time, 98 00:05:49,160 --> 00:05:53,300 that very easily compiled with make hello, but notice the difference now. 99 00:05:53,300 --> 00:05:56,360 If I want to compile this thing myself with clang, using 100 00:05:56,360 --> 00:05:58,520 that same lesson learned, all right, let's do it. 101 00:05:58,520 --> 00:06:05,300 clang -o hello, just so I get a better name for the program, hello.c, Enter. 102 00:06:05,300 --> 00:06:09,750 And a new error pops up that some of you might have encountered on your own. 103 00:06:09,750 --> 00:06:13,580 So it's a bit arcane here, and there's this mention of a cryptic-looking path 104 00:06:13,580 --> 00:06:15,330 with temp for temporary there. 105 00:06:15,330 --> 00:06:18,560 But somehow, my issue's in main, as we can see here. 106 00:06:18,560 --> 00:06:20,257 It somehow relates to hello.c. 107 00:06:20,257 --> 00:06:23,090 Even though we might not have seen this language last time in class, 108 00:06:23,090 --> 00:06:25,970 there's an undefined reference to get string. 109 00:06:25,970 --> 00:06:27,800 As though get string doesn't exist. 110 00:06:27,800 --> 00:06:31,340 Now, your first instinct might be, well maybe I forgot cs50.h, but of course, 111 00:06:31,340 --> 00:06:32,180 I didn't. 112 00:06:32,180 --> 00:06:34,310 That's the very first line of my program. 113 00:06:34,310 --> 00:06:37,910 But it turns out, make is doing something else for us, all this time. 114 00:06:37,910 --> 00:06:41,930 Just putting cs50.h, or any header file at the top of your code, 115 00:06:41,930 --> 00:06:46,730 for that matter, just teaches the compiler that a function will exist.
116 00:06:46,730 --> 00:06:49,310 It, sort of, asks the compiler to-- it asks the compiler 117 00:06:49,310 --> 00:06:52,610 to trust that I will, eventually, get around to implementing functions, 118 00:06:52,610 --> 00:06:58,130 like get string in cs50.h, and printf in stdio.h. 119 00:06:58,130 --> 00:07:03,830 But this error here, which mentions a linker command, relates to the fact 120 00:07:03,830 --> 00:07:05,960 that there's a separate process for actually 121 00:07:05,960 --> 00:07:10,280 finding the 0s and 1s that CS50 compiled long ago for you, 122 00:07:10,280 --> 00:07:13,850 and that the authors of this operating system compiled for you, long ago, 123 00:07:13,850 --> 00:07:14,900 in the form of printf. 124 00:07:14,900 --> 00:07:17,840 We need to, somehow, tell the compiler that we 125 00:07:17,840 --> 00:07:20,450 need to link in code that someone else wrote, 126 00:07:20,450 --> 00:07:23,750 the actual machine code that someone else wrote and then compiled. 127 00:07:23,750 --> 00:07:27,497 So to do that, you'd have to type -lcs50, for instance, 128 00:07:27,497 --> 00:07:28,580 at the end of the command. 129 00:07:28,580 --> 00:07:31,548 That additionally tells clang that not only do you want to output 130 00:07:31,548 --> 00:07:34,340 a file called hello, and you want to compile a file called hello.c, 131 00:07:34,340 --> 00:07:39,200 you also want to quote-unquote link in a bunch of 0s and 1s 132 00:07:39,200 --> 00:07:43,010 that collectively implement get string and printf. 133 00:07:43,010 --> 00:07:47,220 So now, if I hit Enter, this time it compiled OK. 134 00:07:47,220 --> 00:07:53,142 And now if I run ./hello, it works as it did last week, just like that. 135 00:07:53,142 --> 00:07:56,100 But honestly, this is just going to get really tedious, really quickly.
136 00:07:56,100 --> 00:07:57,930 Notice, already, just to compile my code, 137 00:07:57,930 --> 00:08:01,417 I have to run clang -o hello hello.c -lcs50, 138 00:08:01,417 --> 00:08:03,500 and you're going to have to type more things, too. 139 00:08:03,500 --> 00:08:06,890 If you wanted to use the math library, like, to use that round function, 140 00:08:06,890 --> 00:08:09,440 you would also have to do -lm, typically, 141 00:08:09,440 --> 00:08:12,890 to specify: give me the math bits that someone else compiled. 142 00:08:12,890 --> 00:08:14,970 And the commands just get longer and longer. 143 00:08:14,970 --> 00:08:19,520 So moving forward, we won't have to resort to running clang itself, 144 00:08:19,520 --> 00:08:21,330 but clang is, indeed, the compiler. 145 00:08:21,330 --> 00:08:24,380 That is the program that converts from source code to machine code. 146 00:08:24,380 --> 00:08:28,438 But we'll continue to use make because it just automates that process. 147 00:08:28,438 --> 00:08:30,230 And the commands are only going to get more 148 00:08:30,230 --> 00:08:34,640 cryptic the more sophisticated and more feature-full your programs get. 149 00:08:34,640 --> 00:08:39,620 And make, again, is just a tool that makes all that happen. 150 00:08:39,620 --> 00:08:44,300 Let me pause there to see if there are any questions before we 151 00:08:44,300 --> 00:08:45,890 take a look further under the hood. 152 00:08:45,890 --> 00:08:47,185 Yeah, in front. 153 00:08:47,185 --> 00:08:50,185 AUDIENCE: Can you explain again what the -lcs50-- just why you put that? 154 00:08:50,185 --> 00:08:52,518 DAVID MALAN: Sure, let me come back to that in a moment. 155 00:08:52,518 --> 00:08:53,750 What does the -lcs50 mean? 156 00:08:53,750 --> 00:08:55,917 We'll come back to that, visually, in just a moment. 157 00:08:55,917 --> 00:08:58,850 But it means to link in the 0s and 1s that collectively 158 00:08:58,850 --> 00:09:00,435 implement get string and printf.
159 00:09:00,435 --> 00:09:02,060 But we'll see that, visually, in a sec. 160 00:09:02,060 --> 00:09:03,341 Yeah, behind you. 161 00:09:03,341 --> 00:09:07,073 AUDIENCE: [INAUDIBLE] 162 00:09:07,073 --> 00:09:08,490 DAVID MALAN: Really good question. 163 00:09:08,490 --> 00:09:10,850 How come I didn't have to link in standard I/O? 164 00:09:10,850 --> 00:09:12,950 Because I used printf in version 1. 165 00:09:12,950 --> 00:09:16,280 Standard I/O is just, literally, so standard that it's built in; 166 00:09:16,280 --> 00:09:17,480 it just works for free. 167 00:09:17,480 --> 00:09:18,800 CS50, of course, is not. 168 00:09:18,800 --> 00:09:21,080 It did not come with the language C or the compiler. 169 00:09:21,080 --> 00:09:22,250 We ourselves wrote it. 170 00:09:22,250 --> 00:09:26,600 And other libraries, even though they might come with the language C, 171 00:09:26,600 --> 00:09:30,600 might not be enabled by default, generally for efficiency purposes. 172 00:09:30,600 --> 00:09:33,470 So you're not loading more 0s and 1s into the computer's memory 173 00:09:33,470 --> 00:09:34,280 than you need to. 174 00:09:34,280 --> 00:09:37,250 So standard I/O is special, if you will. 175 00:09:37,250 --> 00:09:38,510 Other questions? 176 00:09:38,510 --> 00:09:39,500 Yeah? 177 00:09:39,500 --> 00:09:41,420 AUDIENCE: [INAUDIBLE] 178 00:09:41,420 --> 00:09:43,160 DAVID MALAN: Oh, what does the -o mean? 179 00:09:43,160 --> 00:09:46,190 So -o is shorthand for the English word output, 180 00:09:46,190 --> 00:09:51,260 and so -o is telling clang to please output a file called hello, 181 00:09:51,260 --> 00:09:53,850 because the command line, 182 00:09:53,850 --> 00:09:59,929 recall, was clang -o hello, then the name of the file, then -lcs50. 183 00:09:59,929 --> 00:10:03,407 And this is where these commands do get and stay fairly arcane.
184 00:10:03,407 --> 00:10:05,240 It's just through muscle memory and practice 185 00:10:05,240 --> 00:10:07,610 that you'll start to remember, oh what are the other commands that you-- 186 00:10:07,610 --> 00:10:10,277 what are the command line arguments you can provide to programs? 187 00:10:10,277 --> 00:10:11,570 But we've seen this before. 188 00:10:11,570 --> 00:10:14,780 Technically, when you run make hello, the program is called make, 189 00:10:14,780 --> 00:10:16,980 and hello is the command line argument. 190 00:10:16,980 --> 00:10:19,040 It's an input to the make program, albeit 191 00:10:19,040 --> 00:10:22,250 typed at the prompt, that tells make what you want to make. 192 00:10:22,250 --> 00:10:26,180 Even when I used rm a moment ago, and did rm of a.out, 193 00:10:26,180 --> 00:10:28,280 the command line argument there was a.out 194 00:10:28,280 --> 00:10:30,740 and it's telling rm what to delete. 195 00:10:30,740 --> 00:10:35,270 It is entirely up to each program to decide what its conventions are, 196 00:10:35,270 --> 00:10:38,090 whether you use dash this or dash that, but we'll 197 00:10:38,090 --> 00:10:40,805 see over time which ones actually matter in practice. 198 00:10:40,805 --> 00:10:46,220 So to come back to the first question about what actually is happening there, 199 00:10:46,220 --> 00:10:48,562 let's consider the code more closely. 200 00:10:48,562 --> 00:10:50,270 So here is that first version of the code 201 00:10:50,270 --> 00:10:54,590 again, with stdio.h and only printf, so no CS50 stuff yet, 202 00:10:54,590 --> 00:10:56,840 until we add it back in and have the second version, 203 00:10:56,840 --> 00:10:59,630 where we actually get the human's name.
204 00:10:59,630 --> 00:11:02,783 When you run this command, there are a few things 205 00:11:02,783 --> 00:11:04,700 that are happening underneath the hood, and we 206 00:11:04,700 --> 00:11:06,650 won't dwell on these kinds of details; indeed, 207 00:11:06,650 --> 00:11:08,870 we'll abstract them away by using make. 208 00:11:08,870 --> 00:11:10,940 But it's worth understanding from the get-go 209 00:11:10,940 --> 00:11:13,880 how much automation is going on, so that when you run these commands, 210 00:11:13,880 --> 00:11:14,850 it's not magic. 211 00:11:14,850 --> 00:11:17,940 You have this bottom-up understanding of what's going on. 212 00:11:17,940 --> 00:11:21,530 So when we say you've been compiling your code with make, 213 00:11:21,530 --> 00:11:23,600 that's a bit of an oversimplification. 214 00:11:23,600 --> 00:11:26,780 Technically, every time you compile your code, 215 00:11:26,780 --> 00:11:29,570 you're having the computer do four distinct things for you. 216 00:11:29,570 --> 00:11:33,020 And these are not four distinct things that you need to memorize and remember 217 00:11:33,020 --> 00:11:35,180 every time you run your program, 218 00:11:35,180 --> 00:11:37,820 but it helps to break it down into building blocks, 219 00:11:37,820 --> 00:11:42,110 as to how we're getting from source code, like C, into 0s and 1s. 220 00:11:42,110 --> 00:11:46,640 It turns out that when you compile, quote-unquote, "your code," technically 221 00:11:46,640 --> 00:11:50,510 speaking, you're doing four things automatically, and all at once: 222 00:11:50,510 --> 00:11:53,960 preprocessing it, compiling it, assembling it, and linking it. 223 00:11:53,960 --> 00:11:57,350 Humans just decided, let's call the whole process compiling. 224 00:11:57,350 --> 00:12:00,230 But for a moment, let's consider what these steps are. 225 00:12:00,230 --> 00:12:02,690 So preprocessing refers to this.
226 00:12:02,690 --> 00:12:06,710 If we look at our source code, version 2, which uses the CS50 library 227 00:12:06,710 --> 00:12:10,442 and therefore get string, notice that we have these include lines at the top. 228 00:12:10,442 --> 00:12:12,650 And they're kind of special versus all the other code 229 00:12:12,650 --> 00:12:15,710 we've written, because they start with hash symbols, specifically. 230 00:12:15,710 --> 00:12:17,660 And that's sort of a special syntax that means 231 00:12:17,660 --> 00:12:20,600 that these are, technically, called preprocessor directives. 232 00:12:20,600 --> 00:12:25,290 A fancy way of saying they're handled specially versus the rest of your code. 233 00:12:25,290 --> 00:12:29,870 In fact, if we focus on cs50.h, recall from last week 234 00:12:29,870 --> 00:12:35,870 that I provided a hint as to what's actually in cs50.h, among other things. 235 00:12:35,870 --> 00:12:40,580 What was the one salient thing that I said was in cs50.h and, therefore, 236 00:12:40,580 --> 00:12:43,475 why we were including it in the first place? 237 00:12:43,475 --> 00:12:44,350 AUDIENCE: Get string? 238 00:12:44,350 --> 00:12:46,850 DAVID MALAN: So get string, specifically, 239 00:12:46,850 --> 00:12:49,160 the prototype for get string. 240 00:12:49,160 --> 00:12:51,410 We haven't made many of our own functions yet, 241 00:12:51,410 --> 00:12:53,840 but recall that any time we've made our own functions, 242 00:12:53,840 --> 00:12:56,330 and we've written them below main in a file, 243 00:12:56,330 --> 00:12:58,790 we've also had to, somewhat stupidly, copy and paste 244 00:12:58,790 --> 00:13:01,370 the prototype of the function at the top of the file, 245 00:13:01,370 --> 00:13:05,210 just to teach the compiler that this function doesn't exist yet 246 00:13:05,210 --> 00:13:07,430 (it does, down there), but it will exist. 247 00:13:07,430 --> 00:13:08,300 Just trust me. 248 00:13:08,300 --> 00:13:10,980 So again, that's what these prototypes are doing for us.
249 00:13:10,980 --> 00:13:13,340 So therefore, in my code, if I want to use 250 00:13:13,340 --> 00:13:16,760 a function like get string, or printf, for that matter, 251 00:13:16,760 --> 00:13:19,150 they're clearly not implemented in the same file; 252 00:13:19,150 --> 00:13:20,400 they're implemented elsewhere. 253 00:13:20,400 --> 00:13:22,692 So I need to tell the compiler to trust me that they're 254 00:13:22,692 --> 00:13:24,000 implemented somewhere else. 255 00:13:24,000 --> 00:13:26,810 And so technically, inside of cs50.h, which 256 00:13:26,810 --> 00:13:30,410 is installed somewhere in the cloud's hard drive, so to speak, 257 00:13:30,410 --> 00:13:34,820 that you all are accessing via VS Code, there's a line that looks like this: 258 00:13:34,820 --> 00:13:38,870 a prototype for the get string function that says the name of the function is 259 00:13:38,870 --> 00:13:42,830 get string, it takes one input, or argument, called prompt, 260 00:13:42,830 --> 00:13:45,710 and the type of that prompt is a string. 261 00:13:45,710 --> 00:13:51,150 Get string, not surprisingly, has a return value and it returns a string. 262 00:13:51,150 --> 00:13:54,800 So literally, that line, and a bunch of others, are in cs50.h. 263 00:13:54,800 --> 00:13:58,280 So rather than you all having to copy and paste the prototype, 264 00:13:58,280 --> 00:14:01,160 you can just trust that CS50 figured out what it is. 265 00:14:01,160 --> 00:14:04,970 You can include cs50.h and the compiler is going 266 00:14:04,970 --> 00:14:07,420 to go find that prototype for you. 267 00:14:07,420 --> 00:14:09,480 Same thing in standard I/O. Someone else-- what 268 00:14:09,480 --> 00:14:13,620 must clearly be in stdio.h, among other stuff, that 269 00:14:13,620 --> 00:14:17,590 motivates our including stdio.h, too? 270 00:14:17,590 --> 00:14:18,090 Yeah? 271 00:14:18,090 --> 00:14:18,798 AUDIENCE: Printf.
272 00:14:18,798 --> 00:14:21,030 DAVID MALAN: Printf, the prototype for printf, 273 00:14:21,030 --> 00:14:24,010 and I'll just change it here in yellow, to be the same. 274 00:14:24,010 --> 00:14:25,410 And it turns out, the format-- 275 00:14:25,410 --> 00:14:28,590 the prototype for printf is, actually, pretty fancy, 276 00:14:28,590 --> 00:14:31,740 because, as you might have noticed, printf can take one argument (just 277 00:14:31,740 --> 00:14:35,910 something to print), 2 if you want to plug a value into it, or 3 or more. 278 00:14:35,910 --> 00:14:38,620 So the dot dot dot just represents exactly that. 279 00:14:38,620 --> 00:14:42,330 It's not quite as simple a prototype as get string, but more on that 280 00:14:42,330 --> 00:14:43,115 another time. 281 00:14:43,115 --> 00:14:46,050 So what does it mean to preprocess your code? 282 00:14:46,050 --> 00:14:49,860 The very first thing the compiler, clang, in this case, 283 00:14:49,860 --> 00:14:54,270 is doing for you when it reads your code top-to-bottom, left-to-right, is it 284 00:14:54,270 --> 00:14:57,960 notices, oh, here is hash include, oh, here's another hash include. 285 00:14:57,960 --> 00:15:03,090 And it, essentially, finds those files on the hard drive, cs50.h, stdio.h, 286 00:15:03,090 --> 00:15:06,990 and does the equivalent of copying and pasting them automatically 287 00:15:06,990 --> 00:15:09,360 into your code at the very top, 288 00:15:09,360 --> 00:15:12,450 thereby teaching the compiler that get string and printf 289 00:15:12,450 --> 00:15:14,430 will eventually exist somewhere. 290 00:15:14,430 --> 00:15:18,480 So that's the preprocessing step, whereby, again, it's 291 00:15:18,480 --> 00:15:22,080 just doing a find-and-replace of anything that starts with hash include. 292 00:15:22,080 --> 00:15:24,510 It's plugging in the files there so that you, essentially, 293 00:15:24,510 --> 00:15:27,780 get all the prototypes you need automatically. 294 00:15:27,780 --> 00:15:28,830 OK.
295 00:15:28,830 --> 00:15:31,230 What does it mean, then, to compile the results? 296 00:15:31,230 --> 00:15:33,450 Because at this point in the story, your code 297 00:15:33,450 --> 00:15:35,678 now looks like this in the computer's memory. 298 00:15:35,678 --> 00:15:37,470 It doesn't change your file; it's doing all 299 00:15:37,470 --> 00:15:39,990 of this in the computer's memory, or RAM, for you. 300 00:15:39,990 --> 00:15:42,070 But it, essentially, looks like this. 301 00:15:42,070 --> 00:15:45,600 Well, the next step is what's, technically, really compiling, 302 00:15:45,600 --> 00:15:48,420 even though, again, we use compile as an umbrella term. 303 00:15:48,420 --> 00:15:51,510 Compiling code in C means to take code that 304 00:15:51,510 --> 00:15:53,740 now looks like this in the computer's memory 305 00:15:53,740 --> 00:15:56,890 and turn it into something that looks like this, 306 00:15:56,890 --> 00:15:58,350 which is way more cryptic. 307 00:15:58,350 --> 00:16:00,990 But it was just a few decades ago that, if you 308 00:16:00,990 --> 00:16:03,930 were taking a class like CS50 in its earlier form, 309 00:16:03,930 --> 00:16:07,740 we wouldn't be using C, because it didn't exist yet; we would actually be using this, 310 00:16:07,740 --> 00:16:09,690 something called assembly language. 311 00:16:09,690 --> 00:16:13,230 And there are different types of, or flavors of, assembly language. 312 00:16:13,230 --> 00:16:17,010 But this is about as low level as you can get to what a computer really 313 00:16:17,010 --> 00:16:19,410 understands, be it a Mac, or PC, or a phone, 314 00:16:19,410 --> 00:16:22,650 before you start getting into actual 0s and 1s. 315 00:16:22,650 --> 00:16:24,013 And most of this is cryptic.
316 00:16:24,013 --> 00:16:27,180 I couldn't tell you what this is doing unless I thought it through carefully 317 00:16:27,180 --> 00:16:30,300 and rewound mentally, years ago, from having studied it, 318 00:16:30,300 --> 00:16:32,880 but let's highlight a few key words in yellow. 319 00:16:32,880 --> 00:16:37,380 Notice that this assembly language that the computer is outputting 320 00:16:37,380 --> 00:16:40,530 for you automatically still has mention of main, 321 00:16:40,530 --> 00:16:43,290 and it has mention of get string, and it has mention of printf. 322 00:16:43,290 --> 00:16:46,358 So there's some relationship to the C code we saw a moment ago. 323 00:16:46,358 --> 00:16:48,150 And then if I highlight these other things, 324 00:16:48,150 --> 00:16:50,430 these are what are called computer instructions. 325 00:16:50,430 --> 00:16:52,740 At the end of the day, your Mac, your PC, 326 00:16:52,740 --> 00:16:56,340 your phone actually only understands very basic instructions, 327 00:16:56,340 --> 00:17:01,020 like addition, subtraction, division, multiplication, move into memory, 328 00:17:01,020 --> 00:17:06,190 load from memory, print something to the screen, very basic operations. 329 00:17:06,190 --> 00:17:07,755 And that's what you're seeing here. 330 00:17:07,755 --> 00:17:12,750 These assembly instructions are what the computer actually 331 00:17:12,750 --> 00:17:16,870 feeds into the brains of the computer, the CPU, the central processing unit. 332 00:17:16,870 --> 00:17:19,770 And it's that Intel CPU, or whatever you have, 333 00:17:19,770 --> 00:17:23,220 that understands this instruction, and this one, and this one, and this one. 334 00:17:23,220 --> 00:17:25,860 And collectively, long story short, all they do 335 00:17:25,860 --> 00:17:28,620 is print hello, world on the screen, but in a way 336 00:17:28,620 --> 00:17:31,910 that the machine understands how to do. 337 00:17:31,910 --> 00:17:34,500 So let me pause here.
338 00:17:34,500 --> 00:17:37,010 Are there any questions on what we mean by preprocessing, 339 00:17:37,010 --> 00:17:40,850 which finds and replaces the hash include lines, among others, 340 00:17:40,850 --> 00:17:44,450 and compiling, which technically takes your source code, 341 00:17:44,450 --> 00:17:48,170 once preprocessed, and converts it to that stuff called assembly language? 342 00:17:48,170 --> 00:17:50,342 AUDIENCE: [INAUDIBLE] each CPU has-- 343 00:17:50,342 --> 00:17:51,290 DAVID MALAN: Correct. 344 00:17:51,290 --> 00:17:54,710 Each type of CPU has its own instruction set. 345 00:17:54,710 --> 00:17:55,280 Indeed. 346 00:17:55,280 --> 00:17:58,970 And as a teaser, this is why, at least back in the day, when 347 00:17:58,970 --> 00:18:02,900 we used to install software from CD-ROMs, or some other type of media, 348 00:18:02,900 --> 00:18:08,222 this is why you can't take a program that was sold for a Windows computer 349 00:18:08,222 --> 00:18:09,680 and run it on a Mac, or vice-versa. 350 00:18:09,680 --> 00:18:14,420 Because the commands, the instructions that those two products understand, 351 00:18:14,420 --> 00:18:15,500 are actually different. 352 00:18:15,500 --> 00:18:20,150 Now Microsoft, or any company, could generally write code in one language, 353 00:18:20,150 --> 00:18:24,109 like C or another, and they can compile it twice, saving a PC version 354 00:18:24,109 --> 00:18:25,790 and saving a Mac version. 355 00:18:25,790 --> 00:18:30,109 It's twice as much work and sometimes you get into some incompatibilities, 356 00:18:30,109 --> 00:18:33,140 but that's why these steps are somewhat distinct. 357 00:18:33,140 --> 00:18:36,710 You can now use the same code and support even different platforms, 358 00:18:36,710 --> 00:18:37,940 or systems, if you want. 359 00:18:37,940 --> 00:18:38,440 All right. 360 00:18:38,440 --> 00:18:39,650 Assembly, assembling.
361 00:18:39,650 --> 00:18:42,800 Thankfully, this part is fairly straightforward, at least in concept. 362 00:18:42,800 --> 00:18:46,250 When you assemble code, which is step three of four and is just 363 00:18:46,250 --> 00:18:50,360 happening for you every time you run make or, in turn, clang, 364 00:18:50,360 --> 00:18:53,570 this assembly language, which the computer generated automatically 365 00:18:53,570 --> 00:18:57,080 for you from your source code, is turned into 0s and 1s. 366 00:18:57,080 --> 00:19:00,783 So that's the step that, last week, I simplified and said, 367 00:19:00,783 --> 00:19:03,950 when you compile your code, you convert it to source code-- from source code 368 00:19:03,950 --> 00:19:04,970 to machine code. 369 00:19:04,970 --> 00:19:07,685 Technically, that happens when you assemble your code. 370 00:19:07,685 --> 00:19:10,940 But no one in normal conversation says that; they just 371 00:19:10,940 --> 00:19:13,280 say compile for all of these steps. 372 00:19:13,280 --> 00:19:14,310 All right. 373 00:19:14,310 --> 00:19:17,450 So that's assembling. 374 00:19:17,450 --> 00:19:19,070 There's one final step. 375 00:19:19,070 --> 00:19:22,400 Even in this simple program of getting the user's name 376 00:19:22,400 --> 00:19:27,120 and then plugging it into printf, I'm using three different people's code, 377 00:19:27,120 --> 00:19:27,620 if you will. 378 00:19:27,620 --> 00:19:30,200 My own, which is in hello.c. 379 00:19:30,200 --> 00:19:35,600 Some of CS50's, which is in hello.c, sorry-- which 380 00:19:35,600 --> 00:19:39,080 is in cs50.c, which is not a file I've mentioned yet, 381 00:19:39,080 --> 00:19:43,220 but it stands to reason that if there's a cs50.h that has prototypes, 382 00:19:43,220 --> 00:19:45,380 the actual implementations of get string 383 00:19:45,380 --> 00:19:47,600 and other things are in cs50.c.
384 00:19:47,600 --> 00:19:51,290 And there's a third file somewhere on the hard drive 385 00:19:51,290 --> 00:19:54,260 that's involved in compiling even this simple program. 386 00:19:54,260 --> 00:19:59,971 hello.c, cs50.c, and by that logic, what might the other be? 387 00:19:59,971 --> 00:20:00,471 Yeah? 388 00:20:00,471 --> 00:20:02,275 AUDIENCE: stdio? 389 00:20:02,275 --> 00:20:03,600 DAVID MALAN: stdio.c. 390 00:20:03,600 --> 00:20:06,690 And that's a bit of a white lie, because that's such a big, fancy library 391 00:20:06,690 --> 00:20:09,750 that there are actually multiple files that compose it, but the same idea, 392 00:20:09,750 --> 00:20:11,380 and we'll take the simplification. 393 00:20:11,380 --> 00:20:16,200 So when I have this code, and I compile my code, 394 00:20:16,200 --> 00:20:21,300 I get those 0s and 1s that end up taking hello.c and turning it, effectively, 395 00:20:21,300 --> 00:20:26,830 into 0s and 1s that are combined with cs50.c, followed by stdio.c as well. 396 00:20:26,830 --> 00:20:27,840 So let me rewind here. 397 00:20:27,840 --> 00:20:33,300 Here might be the 0s and 1s for my code, the two lines of code that I wrote. 398 00:20:33,300 --> 00:20:37,920 Here might be the 0s and 1s for what CS50 wrote some years ago in cs50.c. 399 00:20:37,920 --> 00:20:42,210 Here might be the 0s and 1s that someone wrote for standard I/O decades ago. 400 00:20:42,210 --> 00:20:45,720 The last and final step is that linking command 401 00:20:45,720 --> 00:20:48,330 that links all of these 0s and 1s together, 402 00:20:48,330 --> 00:20:53,820 essentially stitching them together into one single file called hello, 403 00:20:53,820 --> 00:20:56,385 or called a.out, whatever you name it. 404 00:20:56,385 --> 00:21:01,650 That last step is what combines all of these different programmers' 0s and 1s. 405 00:21:01,650 --> 00:21:04,050 And my God, now we're really in the weeds.
406 00:21:04,050 --> 00:21:07,020 Who wants to even think about running code at this level? 407 00:21:07,020 --> 00:21:08,160 You shouldn't need to. 408 00:21:08,160 --> 00:21:09,180 But it's not magic. 409 00:21:09,180 --> 00:21:11,748 When you're running make, there are some very concrete steps 410 00:21:11,748 --> 00:21:14,290 that are happening that humans have developed over the years, 411 00:21:14,290 --> 00:21:17,700 over the decades, that break down this big problem of source code going 412 00:21:17,700 --> 00:21:22,410 to 0s and 1s, or machine code, into these very specific steps. 413 00:21:22,410 --> 00:21:26,100 But henceforth, you can call all of this compiling. 414 00:21:26,100 --> 00:21:27,120 Questions? 415 00:21:27,120 --> 00:21:27,780 Or confusion? 416 00:21:27,780 --> 00:21:28,596 Yeah? 417 00:21:28,596 --> 00:21:30,804 AUDIENCE: Can you explain again what a.out signifies? 418 00:21:30,804 --> 00:21:31,770 DAVID MALAN: Sure. 419 00:21:31,770 --> 00:21:33,270 What does a.out signify? 420 00:21:33,270 --> 00:21:37,890 a.out is just the conventional, default file name for any program 421 00:21:37,890 --> 00:21:41,280 that you compile directly with a compiler, like clang. 422 00:21:41,280 --> 00:21:43,680 It's a meaningless name, though. 423 00:21:43,680 --> 00:21:47,250 It stands for assembler output, and assembler might now sound familiar 424 00:21:47,250 --> 00:21:48,690 from this assembling process. 425 00:21:48,690 --> 00:21:51,150 It's a lame name for a computer program, and we 426 00:21:51,150 --> 00:21:56,450 can override it by naming the output something like hello instead. 427 00:21:56,450 --> 00:21:57,317 Yeah?
428 00:21:57,317 --> 00:22:03,426 AUDIENCE: [INAUDIBLE] 429 00:22:03,426 --> 00:22:07,860 DAVID MALAN: To recap, there are other prototypes in those files, 430 00:22:07,860 --> 00:22:11,910 cs50.h, stdio.h, technically, they're all included on top of your file, 431 00:22:11,910 --> 00:22:14,460 even though you, strictly speaking, don't need most of them, 432 00:22:14,460 --> 00:22:18,190 but they are there, just in case you might want them. 433 00:22:18,190 --> 00:22:19,660 And finally, any other questions? 434 00:22:19,660 --> 00:22:20,160 Yeah? 435 00:22:20,160 --> 00:22:23,878 AUDIENCE: [INAUDIBLE] 436 00:22:23,878 --> 00:22:26,920 DAVID MALAN: Does it matter what order we're telling the computer to run? 437 00:22:26,920 --> 00:22:29,140 Sometimes with libraries, yes, it matters 438 00:22:29,140 --> 00:22:31,520 what order they are linked in together. 439 00:22:31,520 --> 00:22:34,330 But for our purposes, it's really not going to matter. 440 00:22:34,330 --> 00:22:38,750 It's going to-- make is going to take care of automating that process for us. 441 00:22:38,750 --> 00:22:39,250 All right. 442 00:22:39,250 --> 00:22:41,795 So with that said, henceforth, compiling, technically, 443 00:22:41,795 --> 00:22:42,670 is these four things. 444 00:22:42,670 --> 00:22:46,690 But we'll focus on it as a higher level concept, an abstraction, 445 00:22:46,690 --> 00:22:49,880 known as compiling itself. 446 00:22:49,880 --> 00:22:52,510 So another process that we'll now begin to focus on all the 447 00:22:52,510 --> 00:22:55,690 more this week because, invariably, this past week you ran against-- 448 00:22:55,690 --> 00:22:57,160 ran up against some challenges. 449 00:22:57,160 --> 00:23:00,550 You probably created your very first bugs, or mistakes, in a program 450 00:23:00,550 --> 00:23:03,940 and so let's focus for a moment on actual techniques for debugging. 
451 00:23:03,940 --> 00:23:07,060 As you spend more time this semester, and in the years 452 00:23:07,060 --> 00:23:10,270 to come if you continue to program, you're never, frankly, probably, 453 00:23:10,270 --> 00:23:13,577 going to write bug-free code, ultimately. 454 00:23:13,577 --> 00:23:16,660 Your programs are going to get more featureful, more sophisticated, 455 00:23:16,660 --> 00:23:20,230 and we're all going to start to make more sophisticated mistakes. 456 00:23:20,230 --> 00:23:22,570 And to this day, I write buggy code all the time. 457 00:23:22,570 --> 00:23:24,520 And I'm always horrified when I do it up here. 458 00:23:24,520 --> 00:23:26,620 But hopefully, that won't happen too often. 459 00:23:26,620 --> 00:23:30,100 But when it does, it's a process, now, of debugging, trying 460 00:23:30,100 --> 00:23:32,230 to find the mistakes in your program. 461 00:23:32,230 --> 00:23:35,600 You don't have to stare at your code, or shake your fist at your code. 462 00:23:35,600 --> 00:23:38,590 There are actual tools that real-world programmers 463 00:23:38,590 --> 00:23:41,860 use to help debug their code and find these faults. 464 00:23:41,860 --> 00:23:44,455 So what are some of the techniques and tools that folks use? 465 00:23:44,455 --> 00:23:49,440 Well, as an aside, if you've ever-- 466 00:23:49,440 --> 00:23:52,840 a bug in a program is a mistake, a term that's been around for some time. 467 00:23:52,840 --> 00:23:58,010 If you've ever heard this tale, some 50 plus years ago, in 1947. 468 00:23:58,010 --> 00:24:02,770 This is an entry in a logbook written by a famous computer scientist known 469 00:24:02,770 --> 00:24:05,230 as-- named Grace Hopper, who happened to be the one 470 00:24:05,230 --> 00:24:09,345 to record the very first discovery of a quote-unquote actual bug in a computer.
471 00:24:09,345 --> 00:24:11,860 This was like a moth that had flown into, 472 00:24:11,860 --> 00:24:17,080 at the time, a very sophisticated system known as the Harvard Mark II computer, 473 00:24:17,080 --> 00:24:20,050 very large, refrigerator-sized type systems, 474 00:24:20,050 --> 00:24:24,160 in which an actual bug caused an issue. 475 00:24:24,160 --> 00:24:27,190 The etymology of bug, though, predates this particular instance, 476 00:24:27,190 --> 00:24:30,580 but here you have, as any computer scientist might know, the example 477 00:24:30,580 --> 00:24:32,845 of a first physical bug in a computer. 478 00:24:32,845 --> 00:24:35,322 How, though, do you go about removing such a thing? 479 00:24:35,322 --> 00:24:37,780 Well, let's consider a very simple scenario from last time, 480 00:24:37,780 --> 00:24:40,780 for instance, when we were trying to print out various aspects of Mario, 481 00:24:40,780 --> 00:24:42,970 like this column of 3 bricks. 482 00:24:42,970 --> 00:24:46,660 Let's consider how I might go about implementing a program like this. 483 00:24:46,660 --> 00:24:51,130 Let me switch back over to VS Code here, and I'm going to run-- 484 00:24:51,130 --> 00:24:52,750 write a program. 485 00:24:52,750 --> 00:24:54,640 And I'm not going to trust myself, so I'm 486 00:24:54,640 --> 00:24:56,507 going to call it buggy.c from the get-go, 487 00:24:56,507 --> 00:24:58,340 knowing that I'm going to mess something up. 488 00:24:58,340 --> 00:25:01,150 But I'm going to go ahead and include stdio.h. 489 00:25:01,150 --> 00:25:03,940 And I'm going to define main, as usual. 490 00:25:03,940 --> 00:25:05,950 So hopefully, no mistakes just yet. 491 00:25:05,950 --> 00:25:08,710 And now, I want to print those 3 bricks on the screen using 492 00:25:08,710 --> 00:25:10,270 just hashes for bricks. 493 00:25:10,270 --> 00:25:16,420 So how about for int i get 0, i less than or equal to 3, i plus plus.
494 00:25:16,420 --> 00:25:18,280 Now, inside of my curly braces, I'm going 495 00:25:18,280 --> 00:25:23,960 to go ahead and print out a hash followed by a backslash n, semicolon. 496 00:25:23,960 --> 00:25:27,975 All right, saving the file, doing make buggy, Enter, it compiles. 497 00:25:27,975 --> 00:25:33,340 So there are no syntax errors; my code is syntactically correct. 498 00:25:33,340 --> 00:25:36,640 But some of you have probably seen the logical error already, 499 00:25:36,640 --> 00:25:39,370 because when I run this program I don't get 500 00:25:39,370 --> 00:25:45,430 this picture, which was 3 bricks high; I seem to have 4 bricks instead. 501 00:25:45,430 --> 00:25:47,930 Now, this might be jumping out at you, why it's happening, 502 00:25:47,930 --> 00:25:49,930 but I've kept the program simple just so that we 503 00:25:49,930 --> 00:25:54,010 don't have to find an unknown bug; we can use a tool to find one that we already 504 00:25:54,010 --> 00:25:55,970 know about, in this case. 505 00:25:55,970 --> 00:25:59,050 What might be the first strategy for finding a bug like this, 506 00:25:59,050 --> 00:26:03,292 rather than staring at your code, asking a question, trying to think 507 00:26:03,292 --> 00:26:04,125 through the problem? 508 00:26:04,125 --> 00:26:07,690 Well, let's actually try to diagnose the problem more proactively. 509 00:26:07,690 --> 00:26:10,420 And the simplest way to do this now, and years from now, 510 00:26:10,420 --> 00:26:13,870 is, honestly, going to be to use a function like printf. 511 00:26:13,870 --> 00:26:15,790 Printf is a wonderfully useful function, not just 512 00:26:15,790 --> 00:26:18,550 for formatting-- printing formatted strings and all that, but for 513 00:26:18,550 --> 00:26:21,430 looking inside the values of variables 514 00:26:21,430 --> 00:26:24,352 that you might be curious about to see what's going on. 515 00:26:24,352 --> 00:26:25,060 So you know what? 516 00:26:25,060 --> 00:26:26,320 Let me do this.
517 00:26:26,320 --> 00:26:29,110 I see that there are 4 coming out, but I intended 3. 518 00:26:29,110 --> 00:26:31,740 So clearly, something's wrong with my i variable. 519 00:26:31,740 --> 00:26:34,090 So let me be a little more pedantic. 520 00:26:34,090 --> 00:26:37,300 Let me go inside of this loop and, temporarily, 521 00:26:37,300 --> 00:26:40,480 say something explicit, like, i is-- 522 00:26:40,480 --> 00:26:45,200 %i\n, and then just plug in the value of i. 523 00:26:45,200 --> 00:26:45,700 Right? 524 00:26:45,700 --> 00:26:48,970 This is not the program I want to write; it's the program I'm temporarily 525 00:26:48,970 --> 00:26:54,400 writing, because now I'm going to say make buggy, ./buggy. 526 00:26:54,400 --> 00:26:56,500 And if I look, now, at the output, I have 527 00:26:56,500 --> 00:27:01,090 some helpful diagnostic information. i is 0, and I get a hash, i is 1, 528 00:27:01,090 --> 00:27:03,610 and I get a hash, 2 and I get a hash, 3 and I get a hash. 529 00:27:03,610 --> 00:27:04,527 OK, wait a minute. 530 00:27:04,527 --> 00:27:06,610 I'm clearly going too many steps because, maybe, I 531 00:27:06,610 --> 00:27:09,250 forgot that computers are, essentially, counting from 0, 532 00:27:09,250 --> 00:27:11,450 and now, oh, it's less than or equal to. 533 00:27:11,450 --> 00:27:13,030 Now you see it, right? 534 00:27:13,030 --> 00:27:15,940 Again, trivial example, but just by using printf, 535 00:27:15,940 --> 00:27:18,910 you can see inside of the computer's memory 536 00:27:18,910 --> 00:27:21,130 by just printing stuff out like this. 537 00:27:21,130 --> 00:27:25,770 And now, once you've figured it out, oh, so this should probably be less than 3, 538 00:27:25,770 --> 00:27:28,140 or I should start counting from 1, there are 539 00:27:28,140 --> 00:27:29,640 any number of ways I could fix this. 540 00:27:29,640 --> 00:27:32,655 But the most conventional is probably just to say less than 3.
541 00:27:32,655 --> 00:27:39,180 Now, I can delete my temporary print statement, rerun make buggy, ./buggy. 542 00:27:39,180 --> 00:27:41,790 And, voila, problem solved. 543 00:27:41,790 --> 00:27:43,830 All right, and to this day, I do this. 544 00:27:43,830 --> 00:27:46,860 Whether it's making a command-line application, a web application, 545 00:27:46,860 --> 00:27:49,050 or a mobile application, it's very common to use 546 00:27:49,050 --> 00:27:51,270 printf, or some equivalent in any language, 547 00:27:51,270 --> 00:27:55,350 just to poke around and see what's inside the computer's memory. 548 00:27:55,350 --> 00:27:58,570 Thankfully, there are more sophisticated tools than this. 549 00:27:58,570 --> 00:28:00,930 Let me go ahead and reintroduce the bug here. 550 00:28:00,930 --> 00:28:04,620 And let me reopen my sidebar at left here. 551 00:28:04,620 --> 00:28:08,550 Let me now recompile the code to make sure it's current. 552 00:28:08,550 --> 00:28:11,310 And I'm going to run a command called debug50, 553 00:28:11,310 --> 00:28:15,090 which is a command that's representative of a type of program 554 00:28:15,090 --> 00:28:16,740 known as a debugger. 555 00:28:16,740 --> 00:28:19,680 And this debugger is actually built into VS Code. 556 00:28:19,680 --> 00:28:23,700 And all debug50 is doing for us is automating the process of starting 557 00:28:23,700 --> 00:28:25,650 VS Code's built-in debugger. 558 00:28:25,650 --> 00:28:28,260 So this isn't even a CS50-specific tool; we've 559 00:28:28,260 --> 00:28:31,170 just given you a debug50 command to make it easier 560 00:28:31,170 --> 00:28:32,855 to start it up from the get-go. 561 00:28:32,855 --> 00:28:37,560 And the way you run this debugger is you say debug50, space, and then 562 00:28:37,560 --> 00:28:40,120 the name of the program that you want to debug. 563 00:28:40,120 --> 00:28:42,210 So, in this case, ./buggy. 564 00:28:42,210 --> 00:28:44,010 So you don't mention your .c file.
565 00:28:44,010 --> 00:28:46,650 You mention your already-compiled code. 566 00:28:46,650 --> 00:28:52,230 And what this debugger is going to let me do is, most powerfully, 567 00:28:52,230 --> 00:28:54,930 walk through my code step-by-step. 568 00:28:54,930 --> 00:28:58,930 Because every program we've written thus far runs from start to finish, 569 00:28:58,930 --> 00:29:02,325 even if I'm not done thinking through each step at a time. 570 00:29:02,325 --> 00:29:05,850 With a debugger, I can actually click on a line number 571 00:29:05,850 --> 00:29:09,180 and say pause execution here, and the debugger 572 00:29:09,180 --> 00:29:14,130 will let me walk through my code one step at a time, one second at a time, 573 00:29:14,130 --> 00:29:16,740 one minute at a time, at my own human pace. 574 00:29:16,740 --> 00:29:19,470 Which is super compelling when the programs get more complicated 575 00:29:19,470 --> 00:29:22,600 and they might, otherwise, fly by on the screen. 576 00:29:22,600 --> 00:29:25,860 So I'm going to click to the left of line 5. 577 00:29:25,860 --> 00:29:27,970 And notice that these little red dots appear. 578 00:29:27,970 --> 00:29:31,290 And if I click on one, it stays, and gets even redder. 579 00:29:31,290 --> 00:29:34,230 And I'm going to run debug50 on ./buggy. 580 00:29:34,230 --> 00:29:39,090 And in just a moment, you'll see that a new panel opens on the left-hand side. 581 00:29:39,090 --> 00:29:41,910 It's doing some configuration of the screen. 582 00:29:41,910 --> 00:29:46,690 Let me zoom out a little bit here so we can see more on the screen at once. 583 00:29:46,690 --> 00:29:50,440 And sometimes, you'll see in VS Code that the debug console opens up, 584 00:29:50,440 --> 00:29:54,480 which looks very cryptic; just go back to the terminal window if that happens, 585 00:29:54,480 --> 00:29:57,875 because the terminal window is where you can still interact with your code.
586 00:29:57,875 --> 00:30:00,120 And let's now take a look at what's going on. 587 00:30:00,120 --> 00:30:04,650 If I zoom in on my buggy.c code here, you'll 588 00:30:04,650 --> 00:30:10,890 notice that we have the same program as before, but highlighted in yellow 589 00:30:10,890 --> 00:30:11,820 is line 5. 590 00:30:11,820 --> 00:30:15,660 Not a coincidence; that's the line I set a so-called breakpoint at. 591 00:30:15,660 --> 00:30:20,400 The little red dot means break here, pause execution here. 592 00:30:20,400 --> 00:30:23,716 And the yellow line has not yet been executed. 593 00:30:23,716 --> 00:30:27,600 But now, at the top of my screen, notice these little arrows. 594 00:30:27,600 --> 00:30:28,750 There's one for Play. 595 00:30:28,750 --> 00:30:30,750 There's one for this, which, if I hover over it, 596 00:30:30,750 --> 00:30:34,140 says Step Over; there's another that's going to say Step Into; 597 00:30:34,140 --> 00:30:35,820 and there's a third that says Step Out. 598 00:30:35,820 --> 00:30:38,520 I'm just going to use the first of these, Step Over. 599 00:30:38,520 --> 00:30:41,580 And I'm going to do this, and you'll see that the yellow highlight 600 00:30:41,580 --> 00:30:45,660 moved from line 5 to line 7 because now it's ready, 601 00:30:45,660 --> 00:30:47,955 but hasn't yet printed out that hash. 602 00:30:47,955 --> 00:30:51,817 But the most powerful thing here, notice, is that top left here. 603 00:30:51,817 --> 00:30:54,150 It's a little cryptic, because there's a bunch of things 604 00:30:54,150 --> 00:30:56,910 going on that will make more sense over time, but at the top 605 00:30:56,910 --> 00:30:58,470 there's a section called Variables. 606 00:30:58,470 --> 00:31:00,750 Below that, something called Locals, which means 607 00:31:00,750 --> 00:31:02,820 local to my current function, main. 608 00:31:02,820 --> 00:31:07,410 And notice, there's my variable called i, and its current value is 0.
609 00:31:07,410 --> 00:31:12,810 So now, once I click Step Over again, watch what happens. 610 00:31:12,810 --> 00:31:15,660 We go from line 7 back to line 5. 611 00:31:15,660 --> 00:31:19,455 But look in the terminal window: one of the hashes has printed. 612 00:31:19,455 --> 00:31:22,050 But now, it's printed at my own pace. 613 00:31:22,050 --> 00:31:24,030 I can think through this step-by-step. 614 00:31:24,030 --> 00:31:26,340 Notice that i has not changed yet. 615 00:31:26,340 --> 00:31:29,700 It's still 0 because the yellow highlighted line hasn't yet executed. 616 00:31:29,700 --> 00:31:34,140 But the moment I click Step Over, it's going to execute line 5. 617 00:31:34,140 --> 00:31:41,010 Now, notice at top left, i has become 1, and nothing has printed yet, 618 00:31:41,010 --> 00:31:43,290 because now, highlighted is line 7. 619 00:31:43,290 --> 00:31:48,000 So if I click Step Over again, we'll see the hash. 620 00:31:48,000 --> 00:31:51,930 If I repeat this process at my own human, comfortable pace, 621 00:31:51,930 --> 00:31:57,040 I can see my variables changing, I can see output changing on the screen, 622 00:31:57,040 --> 00:31:59,902 and I can just think about whether that should have just happened. 623 00:31:59,902 --> 00:32:01,860 I can pause and give thought to what's actually 624 00:32:01,860 --> 00:32:06,240 going on without trying to race the computer and figure it all out at once. 625 00:32:06,240 --> 00:32:08,490 I'm going to go ahead and stop here because we already 626 00:32:08,490 --> 00:32:11,430 know what this particular problem is, and that brings me back 627 00:32:11,430 --> 00:32:12,720 to my default terminal window.
628 00:32:12,720 --> 00:32:16,180 But this debugger, let me disable the breakpoint now 629 00:32:16,180 --> 00:32:18,570 so it doesn't keep breaking, this debugger 630 00:32:18,570 --> 00:32:20,760 will be your friend moving forward in order 631 00:32:20,760 --> 00:32:25,290 to step through your code step-by-step, at your own pace to figure out 632 00:32:25,290 --> 00:32:26,820 where something has gone wrong. 633 00:32:26,820 --> 00:32:30,397 Printf is great, but it gets annoying if you have to constantly add print this, 634 00:32:30,397 --> 00:32:33,480 print this, print this, print this, recompile, rerun it, oh wait a minute, 635 00:32:33,480 --> 00:32:34,980 print this, print this. 636 00:32:34,980 --> 00:32:39,780 The debugger lets you do the equivalent, but automatically. 637 00:32:39,780 --> 00:32:45,960 Questions on this debugger, which you'll see all the more hands-on over time? 638 00:32:45,960 --> 00:32:47,430 Questions on debugger? 639 00:32:47,430 --> 00:32:48,554 Yeah? 640 00:32:48,554 --> 00:32:50,560 AUDIENCE: You were using a Step Over feature. 641 00:32:50,560 --> 00:32:53,303 What do the other features in the debugger-- 642 00:32:53,303 --> 00:32:54,720 DAVID MALAN: Really good question. 643 00:32:54,720 --> 00:32:57,720 We'll see this before long, but those other buttons that I glossed over, 644 00:32:57,720 --> 00:33:02,460 step into and step out of, actually let you step into specific functions 645 00:33:02,460 --> 00:33:04,200 if I had any more than main. 646 00:33:04,200 --> 00:33:06,960 So if main called a function called something, 647 00:33:06,960 --> 00:33:10,380 and something called a function called something else, instead of just 648 00:33:10,380 --> 00:33:14,730 stepping over the entire execution of that function, I could step into it 649 00:33:14,730 --> 00:33:17,105 and walk through its lines of code one by one. 
650 00:33:17,105 --> 00:33:19,020 So any time you have a problem set you're 651 00:33:19,020 --> 00:33:22,140 working on that has multiple functions, you can set a breakpoint in main, 652 00:33:22,140 --> 00:33:26,250 if you want, or you can set it inside of one of your additional functions 653 00:33:26,250 --> 00:33:29,130 to focus your attention only on that. 654 00:33:29,130 --> 00:33:32,640 And we'll see examples of that over time. 655 00:33:32,640 --> 00:33:33,780 All right, so what else? 656 00:33:33,780 --> 00:33:38,100 And what's, sort of, the elephant in the room, so to speak, 657 00:33:38,100 --> 00:33:39,750 is actually a duck in this case. 658 00:33:39,750 --> 00:33:42,160 Why is there this duck and all of these ducks here? 659 00:33:42,160 --> 00:33:46,440 Well, it turns out, a third, genuinely recommended debugging technique 660 00:33:46,440 --> 00:33:50,055 is talking through problems, talking through code with someone else. 661 00:33:50,055 --> 00:33:52,620 Now, in the absence of having a family member, or a friend, 662 00:33:52,620 --> 00:33:56,520 or a roommate who actually wants to hear you talk about code, of all things, 663 00:33:56,520 --> 00:34:01,320 generally, programmers turn to a rubber duck, or other inanimate objects 664 00:34:01,320 --> 00:34:03,360 if something animate is not available. 665 00:34:03,360 --> 00:34:06,760 The idea behind rubber duck debugging, so to speak, 666 00:34:06,760 --> 00:34:12,750 is that simply by looking at your code and talking it through, OK, on line 3, 667 00:34:12,750 --> 00:34:17,040 I'm starting a for loop and I'm initializing i to 0. 668 00:34:17,040 --> 00:34:18,990 OK, then, I'm printing out a hash.
669 00:34:18,990 --> 00:34:24,112 Just talking through your code, step-by-step, invariably 670 00:34:24,112 --> 00:34:26,820 finds you having the proverbial light bulb go off over your head, 671 00:34:26,820 --> 00:34:29,040 because you realize, wait a minute, I just said something stupid, 672 00:34:29,040 --> 00:34:30,510 or I just said something wrong. 673 00:34:30,510 --> 00:34:34,500 And this is really just a proxy for any other human, teaching fellow, teacher, 674 00:34:34,500 --> 00:34:36,060 friend, or colleague. 675 00:34:36,060 --> 00:34:38,440 But in the absence of any of those people in the room, 676 00:34:38,440 --> 00:34:40,357 you're welcome to take, on your way out today, 677 00:34:40,357 --> 00:34:44,280 one of these little rubber ducks and consider using it, for real, any time 678 00:34:44,280 --> 00:34:47,820 you want to talk through one of your problems in CS50, 679 00:34:47,820 --> 00:34:49,140 or maybe life more generally. 680 00:34:49,140 --> 00:34:51,480 But having it there on your desk is just a way 681 00:34:51,480 --> 00:34:55,140 to help you hear illogic in what you think 682 00:34:55,140 --> 00:34:57,790 might, otherwise, be logical code. 683 00:34:57,790 --> 00:35:02,400 So printf, the debugger, and rubber duck debugging are just three of the ways, 684 00:35:02,400 --> 00:35:05,207 you'll see over time, to get to the source of mistakes 685 00:35:05,207 --> 00:35:06,790 in code that you will write. 686 00:35:06,790 --> 00:35:08,880 That's going to happen, but these tools will empower you 687 00:35:08,880 --> 00:35:12,000 all the more to solve those mistakes. 688 00:35:12,000 --> 00:35:17,440 All right, any questions on debugging, in general, or these three techniques? 689 00:35:17,440 --> 00:35:17,940 Yeah? 690 00:35:17,940 --> 00:35:19,740 AUDIENCE: [INAUDIBLE] 691 00:35:19,740 --> 00:35:22,650 DAVID MALAN: What's the difference between Step Over and Step Into?
692 00:35:22,650 --> 00:35:25,980 At the moment, the only one that's applicable to the code I just wrote 693 00:35:25,980 --> 00:35:29,340 is Step Over, because it means step over each line of code. 694 00:35:29,340 --> 00:35:34,050 If, though, I had other functions that I had written in this program, 695 00:35:34,050 --> 00:35:39,300 maybe lower down in the file, I could step into those function calls 696 00:35:39,300 --> 00:35:41,469 and walk through them one at a time. 697 00:35:41,469 --> 00:35:43,650 So we'll come back to this with an actual example, 698 00:35:43,650 --> 00:35:46,230 but Step Into will allow me to do exactly that. 699 00:35:46,230 --> 00:35:49,210 In fact, this is a perfect segue to doing a little something like this. 700 00:35:49,210 --> 00:35:51,632 Let me go ahead and open up another file here. 701 00:35:51,632 --> 00:35:53,340 And, actually, we'll use the same file, buggy.c. 702 00:35:53,340 --> 00:35:56,320 And we're going to write one other thing that's buggy, as well. 703 00:35:56,320 --> 00:36:00,000 Let me go up here and include, as before, cs50.h. 704 00:36:00,000 --> 00:36:03,780 Let me include stdio.h. 705 00:36:03,780 --> 00:36:05,520 Let me do int main(void). 706 00:36:05,520 --> 00:36:08,050 So all of this, I think, is correct, so far. 707 00:36:08,050 --> 00:36:11,280 And let's do this: let me give myself an int called i, 708 00:36:11,280 --> 00:36:14,530 and let's ask the user for a negative integer with get negative int. 709 00:36:14,530 --> 00:36:17,300 This is not a function that exists, technically, yet. 710 00:36:17,300 --> 00:36:20,050 But I'm going to assume, for the sake of discussion, that it does. 711 00:36:20,050 --> 00:36:23,700 Then, I'm just going to print out, with %i and a new line, 712 00:36:23,700 --> 00:36:25,360 whatever the human typed in. 713 00:36:25,360 --> 00:36:28,320 So at this point in the story, my program, I think, is correct.
714 00:36:28,320 --> 00:36:30,930 Except for the fact that get negative int is not 715 00:36:30,930 --> 00:36:33,690 a function in the CS50 library or anywhere else. 716 00:36:33,690 --> 00:36:35,460 I'm going to need to invent it myself. 717 00:36:35,460 --> 00:36:41,310 So suppose, in this case, that I declare a function called get negative int. 718 00:36:41,310 --> 00:36:45,630 Its return type, so to speak, should be int, because, as its name suggests, 719 00:36:45,630 --> 00:36:48,360 I want to hand the user back an integer, and it's going 720 00:36:48,360 --> 00:36:50,310 to take no input to keep it simple. 721 00:36:50,310 --> 00:36:51,810 So I'm just going to say void there. 722 00:36:51,810 --> 00:36:54,810 No inputs, no special prompts, nothing like that. 723 00:36:54,810 --> 00:36:57,600 Let me now give myself some curly braces. 724 00:36:57,600 --> 00:37:00,510 And let me do something familiar, perhaps, from problem set 1. 725 00:37:00,510 --> 00:37:05,550 Let me give myself a variable, like n, and let me do the following 726 00:37:05,550 --> 00:37:07,320 within this block of code. 727 00:37:07,320 --> 00:37:13,590 Assign n the value of get int, asking the user for a negative integer using 728 00:37:13,590 --> 00:37:14,850 get int's own prompt. 729 00:37:14,850 --> 00:37:18,750 And I want to do this while n is less than 0, because I 730 00:37:18,750 --> 00:37:20,390 want to get a negative from the user. 731 00:37:20,390 --> 00:37:24,140 And recall, from having used this block in the past, 732 00:37:24,140 --> 00:37:27,770 I can now return n as the very last step to hand back 733 00:37:27,770 --> 00:37:31,790 whatever the user has typed in, so long as they cooperated and gave me 734 00:37:31,790 --> 00:37:33,750 an actual negative integer.
735 00:37:33,750 --> 00:37:36,710 Now, I've deliberately made a mistake here, 736 00:37:36,710 --> 00:37:39,080 and it's a subtle, silly, mathematical one, 737 00:37:39,080 --> 00:37:43,910 but let me compile this program after copying the prototype up to the top, 738 00:37:43,910 --> 00:37:45,380 so I don't make that mistake again. 739 00:37:45,380 --> 00:37:48,470 Let me do make buggy, Enter. 740 00:37:48,470 --> 00:37:50,720 And now, let me do ./buggy. 741 00:37:50,720 --> 00:37:54,020 I'll give it a negative integer, like negative 50. 742 00:37:54,020 --> 00:37:55,370 Uh-huh. 743 00:37:55,370 --> 00:37:59,330 That did not take. 744 00:37:59,330 --> 00:38:00,860 How about negative 5? 745 00:38:00,860 --> 00:38:02,060 No. 746 00:38:02,060 --> 00:38:04,500 How about 0? 747 00:38:04,500 --> 00:38:05,000 All right. 748 00:38:05,000 --> 00:38:09,080 So it's clearly working backwards, or incorrectly here, logically. 749 00:38:09,080 --> 00:38:10,800 So how could I go about debugging this? 750 00:38:10,800 --> 00:38:12,425 Well, I could do what I've done before. 751 00:38:12,425 --> 00:38:18,920 I could use my printf technique and say something explicit like n is %i, 752 00:38:18,920 --> 00:38:25,310 new line, comma n, just to print it out. Let me recompile buggy, 753 00:38:25,310 --> 00:38:28,640 let me rerun buggy, let me type in negative 50. 754 00:38:28,640 --> 00:38:30,630 OK, n is negative 50. 755 00:38:30,630 --> 00:38:33,173 So that didn't really help me at this point, 756 00:38:33,173 --> 00:38:34,590 because that's the same as before. 757 00:38:34,590 --> 00:38:38,030 So let me do this: debug50 ./buggy. 758 00:38:38,030 --> 00:38:39,870 Oh, but I've made a mistake. 759 00:38:39,870 --> 00:38:41,700 I didn't set my breakpoint yet. 760 00:38:41,700 --> 00:38:44,930 So let me do this, and I'll set a breakpoint this time. 761 00:38:44,930 --> 00:38:47,330 I could set it here, on line 8. 762 00:38:47,330 --> 00:38:49,340 Let's do it in main, as before.
763 00:38:49,340 --> 00:38:51,530 Let me rerun debug50 now, 764 00:38:51,530 --> 00:38:52,970 on ./buggy. 765 00:38:52,970 --> 00:38:55,190 That fancy user interface is going to pop up. 766 00:38:55,190 --> 00:38:58,310 It's going to highlight the line that I set the breakpoint on. 767 00:38:58,310 --> 00:39:01,250 Notice that, on the left-hand side of the screen, 768 00:39:01,250 --> 00:39:04,650 i is defaulting, at the moment, to 0, because I haven't typed anything in 769 00:39:04,650 --> 00:39:05,150 yet. 770 00:39:05,150 --> 00:39:10,815 But let me now Step Over this line that's highlighted in yellow, 771 00:39:10,815 --> 00:39:12,440 and you'll see that I'm being prompted. 772 00:39:12,440 --> 00:39:16,220 So let's type in my negative 50, Enter. 773 00:39:16,220 --> 00:39:21,470 Notice now that I'm stuck in that function. 774 00:39:21,470 --> 00:39:22,250 All right. 775 00:39:22,250 --> 00:39:26,520 So clearly, the issue seems to be in my get negative int function. 776 00:39:26,520 --> 00:39:30,120 So, OK, let me stop this execution. 777 00:39:30,120 --> 00:39:33,175 My problem doesn't seem to be in main, per se; maybe it's down here. 778 00:39:33,175 --> 00:39:33,800 So that's fine. 779 00:39:33,800 --> 00:39:35,990 Let me set my same breakpoint at line 8. 780 00:39:35,990 --> 00:39:38,510 Let me rerun debug50 one more time. 781 00:39:38,510 --> 00:39:43,110 But this time, instead of just stepping over that line, let's step into it. 782 00:39:43,110 --> 00:39:45,410 So notice line 8 is, again, highlighted in yellow. 783 00:39:45,410 --> 00:39:47,690 In the past, I've been clicking Step Over. 784 00:39:47,690 --> 00:39:50,180 Let's click Step Into, now. 785 00:39:50,180 --> 00:39:53,480 When I click Step Into, boom, now, the debugger 786 00:39:53,480 --> 00:39:56,390 jumps into that specific function. 787 00:39:56,390 --> 00:39:59,330 Now, I can step through these lines of code, again and again.
788 00:39:59,330 --> 00:40:01,700 I can see what the value of n is as I'm typing it in. 789 00:40:01,700 --> 00:40:03,500 I can think through my logic, and voila. 790 00:40:03,500 --> 00:40:07,640 Hopefully, once I've solved the issue, I can exit the debugger, fix my code, 791 00:40:07,640 --> 00:40:09,180 and move on. 792 00:40:09,180 --> 00:40:12,050 So Step Over just goes over the line, but executes it; 793 00:40:12,050 --> 00:40:17,210 Step Into lets you go into other functions you've written. 794 00:40:17,210 --> 00:40:19,400 So let's go ahead and do this. 795 00:40:19,400 --> 00:40:23,550 We've got a bunch of possible approaches that we 796 00:40:23,550 --> 00:40:25,550 can take to solving some problems. Let's go ahead 797 00:40:25,550 --> 00:40:26,730 and pace ourselves today, though. 798 00:40:26,730 --> 00:40:27,900 Let's take a five-minute break, here. 799 00:40:27,900 --> 00:40:30,688 And when we come back, we'll take a look at the computer's memory 800 00:40:30,688 --> 00:40:31,730 we've been talking about. 801 00:40:31,730 --> 00:40:32,950 See you in five. 802 00:40:32,950 --> 00:40:36,380 All right. 803 00:40:36,380 --> 00:40:41,000 So let's dive back in. 804 00:40:41,000 --> 00:40:46,860 Up until now, both by way of week 1 and problem set 1, for the most part, 805 00:40:46,860 --> 00:40:50,660 we've just translated from Scratch into C all of these basic building blocks, 806 00:40:50,660 --> 00:40:53,700 like loops and conditionals, Boolean expressions, variables. 807 00:40:53,700 --> 00:40:54,950 So, sort of, more of the same. 808 00:40:54,950 --> 00:40:58,430 But there are features in C that we've already stumbled across, 809 00:40:58,430 --> 00:41:02,300 like data types, the types of variables that don't exist in Scratch, 810 00:41:02,300 --> 00:41:04,450 but that, in fact, do exist in other languages. 811 00:41:04,450 --> 00:41:06,200 In fact, a few that we'll see before long.
812 00:41:06,200 --> 00:41:10,670 So to summarize the types we saw last week, recall this little list here. 813 00:41:10,670 --> 00:41:15,050 We had ints, and floats, and longs, and doubles, and chars; 814 00:41:15,050 --> 00:41:18,510 there are also bools and also strings, which we've seen a few times. 815 00:41:18,510 --> 00:41:21,830 But today, let's actually start to formalize what these things are, 816 00:41:21,830 --> 00:41:25,760 and actually what your Mac and PC are doing when you manipulate bits 817 00:41:25,760 --> 00:41:29,170 as an int versus a char, versus a string, versus something else. 818 00:41:29,170 --> 00:41:31,920 And see if we can't put more tools into your toolkit, so to speak, 819 00:41:31,920 --> 00:41:35,630 so we can start quickly writing more featureful, more sophisticated 820 00:41:35,630 --> 00:41:36,800 programs in C. 821 00:41:36,800 --> 00:41:40,640 So it turns out that, on most systems nowadays, 822 00:41:40,640 --> 00:41:43,010 though this can vary by actual computer, this 823 00:41:43,010 --> 00:41:46,040 is how large each of the data types, typically, 824 00:41:46,040 --> 00:41:51,590 is in C. When you store a Boolean value, a 0 or 1, a true or a false, 825 00:41:51,590 --> 00:41:52,850 it actually uses 1 byte. 826 00:41:52,850 --> 00:41:55,100 That's a little excessive, because, strictly speaking, 827 00:41:55,100 --> 00:41:58,580 you only need 1 bit, which is 1/8 of this size. 828 00:41:58,580 --> 00:42:01,190 But for simplicity, computers use a whole byte 829 00:42:01,190 --> 00:42:03,740 to represent a bool, true or false. 830 00:42:03,740 --> 00:42:08,040 A char, we saw last week, is only 1 byte, or 8 bits. 831 00:42:08,040 --> 00:42:12,950 And this is why ASCII, which uses 1 byte, or technically, only 7 bits early 832 00:42:12,950 --> 00:42:17,600 on, was confined to only 256 maximally possible characters. 833 00:42:17,600 --> 00:42:21,940 Notice that an int is 4 bytes, or 32 bits.
834 00:42:21,940 --> 00:42:24,580 A float is also 4 bytes or 32 bits. 835 00:42:24,580 --> 00:42:27,850 But the things that we call long, it's, literally, twice as long, 836 00:42:27,850 --> 00:42:29,710 8 bytes or 64 bits. 837 00:42:29,710 --> 00:42:30,430 So is a double. 838 00:42:30,430 --> 00:42:33,900 A double is 64 bits of precision for floating point values. 839 00:42:33,900 --> 00:42:37,215 And a string, for today, we're going to leave as a question mark. 840 00:42:37,215 --> 00:42:39,340 We'll come back to that, later today and next week, 841 00:42:39,340 --> 00:42:42,520 as to how much space a string takes up, but, suffice it to say, 842 00:42:42,520 --> 00:42:45,488 it's going to take up a variable amount of space, 843 00:42:45,488 --> 00:42:47,530 depending on whether the string is short or long. 844 00:42:47,530 --> 00:42:50,470 But we'll see exactly what that means, before long. 845 00:42:50,470 --> 00:42:55,030 So here's a photograph of a typical piece of memory 846 00:42:55,030 --> 00:42:57,760 inside of your Mac, or PC, or phone. 847 00:42:57,760 --> 00:43:00,160 Odds are, it might be a little smaller in some devices. 848 00:43:00,160 --> 00:43:02,950 This is known as RAM, or random access memory. 849 00:43:02,950 --> 00:43:05,410 Each of these little black chips on this circuit 850 00:43:05,410 --> 00:43:07,720 board, the green thing, these little black chips 851 00:43:07,720 --> 00:43:10,630 are where 0s and 1s are actually stored. 852 00:43:10,630 --> 00:43:12,670 Each of those stores some number of bytes. 853 00:43:12,670 --> 00:43:15,130 Maybe megabytes, maybe even gigabytes, nowadays. 854 00:43:15,130 --> 00:43:21,430 So let's focus on one of those chips, to give us a zoomed in version, thereof. 
855 00:43:21,430 --> 00:43:25,390 Let's consider the fact that, even though we don't have to care, exactly, 856 00:43:25,390 --> 00:43:29,470 how this kind of thing is made, if this is, like, 1 gigabyte of memory, 857 00:43:29,470 --> 00:43:31,930 for the sake of discussion, it stands to reason that, 858 00:43:31,930 --> 00:43:35,830 if this thing is storing 1 billion bytes, 1 gigabyte, 859 00:43:35,830 --> 00:43:38,110 then we can number them, arbitrarily. 860 00:43:38,110 --> 00:43:41,590 Maybe this will be byte 0, 1, 2, 3, 4, 5, 6, 7, 8. 861 00:43:41,590 --> 00:43:45,000 Then, maybe, way down here in the bottom-right corner is byte number 1 billion. 862 00:43:45,000 --> 00:43:48,760 We can just number these things, as might be our convention. 863 00:43:48,760 --> 00:43:50,710 Let's draw that graphically. 864 00:43:50,710 --> 00:43:53,090 Not with a billion squares, but fewer than those. 865 00:43:53,090 --> 00:43:55,410 And let's zoom in further, and consider that. 866 00:43:55,410 --> 00:43:57,160 At this point in the story, let's abstract 867 00:43:57,160 --> 00:43:59,380 away all the hardware, and all the little wires, 868 00:43:59,380 --> 00:44:03,730 and just think of memory as taking up-- or, rather, just think of data 869 00:44:03,730 --> 00:44:06,170 as taking up some number of bytes. 870 00:44:06,170 --> 00:44:09,820 So, for instance, if you were to store a char in a computer's memory, which 871 00:44:09,820 --> 00:44:14,230 is 1 byte, it might be stored at this top left-hand location 872 00:44:14,230 --> 00:44:16,195 of this black chip of memory. 873 00:44:16,195 --> 00:44:20,290 If you were to store something like an integer that uses 4 bytes, well, 874 00:44:20,290 --> 00:44:23,560 it might use four of those bytes, but they're going to be contiguous, 875 00:44:23,560 --> 00:44:25,220 back-to-back-to-back, in this case. 876 00:44:25,220 --> 00:44:29,270 If you were to store a long or a double, you might, actually, need 8 bytes.
877 00:44:29,270 --> 00:44:31,390 So I'm filling in these squares to represent 878 00:44:31,390 --> 00:44:36,160 how much memory a given variable of some data type would take up. 879 00:44:36,160 --> 00:44:39,230 1, or 4, or 8, in this case, here. 880 00:44:39,230 --> 00:44:42,160 Well, from here, let's abstract away from all of the hardware 881 00:44:42,160 --> 00:44:44,320 and really focus on memory as being a grid. 882 00:44:44,320 --> 00:44:47,650 Or, really, like a canvas that we can paint any types of data 883 00:44:47,650 --> 00:44:48,850 onto that we want. 884 00:44:48,850 --> 00:44:52,600 At the end of the day, all of this data is just going to be 0s and 1s. 885 00:44:52,600 --> 00:44:56,500 But it's up to you and me to build abstractions on top of that. 886 00:44:56,500 --> 00:45:00,130 Things like actual numbers, colors, images, movies, and beyond. 887 00:45:00,130 --> 00:45:02,440 But we'll start lower-level, here, first. 888 00:45:02,440 --> 00:45:05,950 Suppose I had a program that needs three integers. 889 00:45:05,950 --> 00:45:08,800 A simple program whose purpose in life is to average your three 890 00:45:08,800 --> 00:45:12,400 scores on an exam, or some such thing. 891 00:45:12,400 --> 00:45:17,020 Suppose that your three scores were these: 72, 73, not too bad, and 33, 892 00:45:17,020 --> 00:45:18,145 which is particularly low. 893 00:45:18,145 --> 00:45:23,030 Let's write a program that does this kind of averaging for us. 894 00:45:23,030 --> 00:45:24,860 Let me go back to VS Code, here. 895 00:45:24,860 --> 00:45:28,270 Let me open up a file called scores.c. 896 00:45:28,270 --> 00:45:30,830 Let me implement this as follows. 897 00:45:30,830 --> 00:45:35,860 Let me include stdio.h at the top, int main(void) as before. 898 00:45:35,860 --> 00:45:41,320 Then, inside of main, let me declare score 1, which is 72. 899 00:45:41,320 --> 00:45:43,990 Give me another score, 73.
900 00:45:43,990 --> 00:45:47,140 Then, a third score, called score 3, which is going to be 33. 901 00:45:47,140 --> 00:45:50,740 Now, I'm going to use printf to print out the average of those things, 902 00:45:50,740 --> 00:45:52,520 and I can do this in a few different ways. 903 00:45:52,520 --> 00:45:57,850 But I'm going to print out %f, and I'm going to do score 1, plus score 2, 904 00:45:57,850 --> 00:46:03,760 plus score 3, divided by 3, close parenthesis, semicolon. 905 00:46:03,760 --> 00:46:07,300 Some relatively simple arithmetic to compute the average of three scores, 906 00:46:07,300 --> 00:46:10,570 if I'm curious what my average grade is in the class with these three 907 00:46:10,570 --> 00:46:11,620 assessments. 908 00:46:11,620 --> 00:46:15,616 Let me, now, do make scores. 909 00:46:15,616 --> 00:46:19,240 All right, so I've somehow made an error already. 910 00:46:19,240 --> 00:46:25,150 But this one is, actually, germane to a problem we, hopefully, 911 00:46:25,150 --> 00:46:26,860 won't encounter too frequently. 912 00:46:26,860 --> 00:46:27,860 What's going on here? 913 00:46:27,860 --> 00:46:31,360 So underlined is score 1, plus score 2, plus score 3, divided by 3. 914 00:46:31,360 --> 00:46:36,250 Format specifies type double, but the argument has type int. Well, 915 00:46:36,250 --> 00:46:38,530 what's going on here? 916 00:46:38,530 --> 00:46:40,430 Because the arithmetic seems to check out. 917 00:46:40,430 --> 00:46:40,930 Yeah? 918 00:46:40,930 --> 00:46:44,560 AUDIENCE: So the computer is doing the math, but they basically [INAUDIBLE] 919 00:46:44,560 --> 00:46:49,260 just gives out a value at the end because, well [INAUDIBLE] 920 00:46:49,260 --> 00:46:50,210 DAVID MALAN: Correct.
921 00:46:50,210 --> 00:46:51,640 And we'll come back to this in more detail, 922 00:46:51,640 --> 00:46:54,522 but, indeed, what's happening here is I'm adding three ints together, 923 00:46:54,522 --> 00:46:56,480 obviously, because I defined them right up here. 924 00:46:56,480 --> 00:46:59,470 And I'm dividing by another int, 3, but the catch 925 00:46:59,470 --> 00:47:03,890 is, recall, that C, when it performs math on ints, treats the result as an int, too. 926 00:47:03,890 --> 00:47:05,810 But integers are not floating point values. 927 00:47:05,810 --> 00:47:08,890 So if you actually want to get a precise average for your scores 928 00:47:08,890 --> 00:47:12,760 without throwing away the remainder, everything after the decimal point, 929 00:47:12,760 --> 00:47:15,430 it turns out, we're going to have to-- 930 00:47:15,430 --> 00:47:17,410 we're going to-- aww-- 931 00:47:17,410 --> 00:47:18,430 we're going to have to-- 932 00:47:18,430 --> 00:47:22,720 [LAUGHTER] we're going to have to convert this whole expression, somehow, 933 00:47:22,720 --> 00:47:23,350 to a float. 934 00:47:23,350 --> 00:47:26,230 And there are a few ways to do this, but the easiest way, 935 00:47:26,230 --> 00:47:28,540 for now, I'm going to go ahead and do this up here: 936 00:47:28,540 --> 00:47:31,360 I'm going to change the divide by 3 to divide by 3.0. 937 00:47:31,360 --> 00:47:35,440 Because it turns out, long story short, in C, so long as one of the values 938 00:47:35,440 --> 00:47:37,300 participating in an arithmetic expression 939 00:47:37,300 --> 00:47:39,730 like this is something like a float, the rest 940 00:47:39,730 --> 00:47:44,210 will be promoted to a floating point value as well. 941 00:47:44,210 --> 00:47:49,495 So let me, now, recompile this code with make scores, Enter. 942 00:47:49,495 --> 00:47:53,500 This time it worked OK, because I'm treating a float as a float. 943 00:47:53,500 --> 00:47:55,600 Let me do ./scores, Enter.
944 00:47:55,600 --> 00:48:00,150 All right, my average is 59.33333 and so forth. 945 00:48:00,150 --> 00:48:00,650 All right. 946 00:48:00,650 --> 00:48:03,340 So the math, presumably, checks out. 947 00:48:03,340 --> 00:48:06,220 Floating point imprecision, per last week, aside. 948 00:48:06,220 --> 00:48:09,280 But let's consider the design of this program. 949 00:48:09,280 --> 00:48:16,680 What is, kind of, bad about it, or if we maintain this program longer term, 950 00:48:16,680 --> 00:48:19,480 are we going to regret the design of this program? 951 00:48:19,480 --> 00:48:20,990 What might not be ideal here? 952 00:48:20,990 --> 00:48:21,490 Yeah? 953 00:48:21,490 --> 00:48:30,364 AUDIENCE: [INAUDIBLE] 954 00:48:30,364 --> 00:48:34,220 DAVID MALAN: Yeah, so in this case, I have hard-coded my three scores. 955 00:48:34,220 --> 00:48:37,140 So, if I'm hearing you correctly, this program 956 00:48:37,140 --> 00:48:39,600 is only ever going to tell me this specific average. 957 00:48:39,600 --> 00:48:41,730 I'm not even using something like get int 958 00:48:41,730 --> 00:48:44,790 or get float to get three different scores, so that's not good. 959 00:48:44,790 --> 00:48:46,942 And suppose that we wait until later in the semester; 960 00:48:46,942 --> 00:48:48,400 I think other problems could arise. 961 00:48:48,400 --> 00:48:48,900 Yeah? 962 00:48:48,900 --> 00:48:51,020 AUDIENCE: Just thinking also somewhat of an issue 963 00:48:51,020 --> 00:48:52,900 that you can't reuse that number. 964 00:48:52,900 --> 00:48:55,450 DAVID MALAN: I can't reuse the number because I 965 00:48:55,450 --> 00:48:59,088 haven't stored the average in some variable, which, in this program, is not 966 00:48:59,088 --> 00:49:01,630 a big deal, but certainly, if I wanted to reuse it elsewhere, 967 00:49:01,630 --> 00:49:02,650 that's a problem.
968 00:49:02,650 --> 00:49:05,025 Let's fast-forward again: a little later in the semester, 969 00:49:05,025 --> 00:49:07,390 I don't just have three test scores or exam scores; 970 00:49:07,390 --> 00:49:09,430 maybe I have 4, or 5, or 6. 971 00:49:09,430 --> 00:49:10,690 Where might this take us? 972 00:49:10,690 --> 00:49:12,301 AUDIENCE: Yeah, if you ever want to have to take 973 00:49:12,301 --> 00:49:14,900 the average of any number of scores other than 3, [INAUDIBLE] 974 00:49:14,900 --> 00:49:18,110 DAVID MALAN: Yeah, I've, sort of, capped this program at 3. 975 00:49:18,110 --> 00:49:20,942 And honestly, this is, kind of, bordering on copy-paste. 976 00:49:20,942 --> 00:49:23,900 Even though the variables, yes, have different names: score 1, score 2, 977 00:49:23,900 --> 00:49:24,800 score 3. 978 00:49:24,800 --> 00:49:27,230 Imagine doing this for a whole gradebook for a class. 979 00:49:27,230 --> 00:49:32,990 Having score 4, 5, 6, 10, 11, 12, 20, 30, that's a lot of variables. 980 00:49:32,990 --> 00:49:35,420 You can imagine just how ugly the code starts 981 00:49:35,420 --> 00:49:38,635 to get if you're just defining variable after variable, after variable. 982 00:49:38,635 --> 00:49:42,740 So it turns out there are better ways, in languages like C, 983 00:49:42,740 --> 00:49:47,240 if you want to have multiple values stored in memory that 984 00:49:47,240 --> 00:49:49,040 happen to be of the same data type. 985 00:49:49,040 --> 00:49:50,420 Let's take a look back at this memory, here, 986 00:49:50,420 --> 00:49:52,545 to see what these things might look like in memory. 987 00:49:52,545 --> 00:49:54,170 Here's that grid of memory. 988 00:49:54,170 --> 00:49:56,450 Each of these, recall, represents a byte. 989 00:49:56,450 --> 00:49:59,690 To be clear, if I store score 1 in memory first, 990 00:49:59,690 --> 00:50:01,130 how many bytes will it take up?
991 00:50:01,130 --> 00:50:02,520 AUDIENCE: [INAUDIBLE] 992 00:50:02,520 --> 00:50:03,650 DAVID MALAN: So 4, a.k.a. 993 00:50:03,650 --> 00:50:04,430 32 bits. 994 00:50:04,430 --> 00:50:08,578 So I might draw a score 1 as filling up this part of the memory. 995 00:50:08,578 --> 00:50:11,870 It's up to the computer as to whether it goes here, or down there, or wherever. 996 00:50:11,870 --> 00:50:15,290 I'm just keeping the pictures clean for today, from the top-left on down. 997 00:50:15,290 --> 00:50:18,080 If I, then, declare another variable, called score 2, 998 00:50:18,080 --> 00:50:20,730 it might end up over there, also taking up 4 bytes. 999 00:50:20,730 --> 00:50:23,330 And then score 3 might end up here. 1000 00:50:23,330 --> 00:50:26,880 So that's just representing what's going on inside of the computer's memory. 1001 00:50:26,880 --> 00:50:30,680 But technically speaking, to be clear, per week 0, what's 1002 00:50:30,680 --> 00:50:34,580 really being stored in the computer's memory, are patterns of 0s and 1s. 1003 00:50:34,580 --> 00:50:39,350 32 total, in this case, because 32 bits is 4 bytes. 1004 00:50:39,350 --> 00:50:43,280 But again, it gets boring quickly to think in and look 1005 00:50:43,280 --> 00:50:44,760 at binary all the time. 1006 00:50:44,760 --> 00:50:47,120 So we'll, generally, abstract this away as just using 1007 00:50:47,120 --> 00:50:49,550 decimal numbers, in this case, instead. 1008 00:50:49,550 --> 00:50:54,170 But there might be a better way to store, not just three of these things, 1009 00:50:54,170 --> 00:50:57,500 but maybe four, maybe, five, maybe 10, maybe, more, 1010 00:50:57,500 --> 00:51:03,110 by declaring one variable to store all of them, instead of 3, or 4, or 5, 1011 00:51:03,110 --> 00:51:05,750 or more individual variables. 1012 00:51:05,750 --> 00:51:10,250 The way to do this is by way of something known as an array. 
1013 00:51:10,250 --> 00:51:18,320 An array is another type of data that allows you to store multiple values 1014 00:51:18,320 --> 00:51:20,980 of the same type back-to-back-to-back. 1015 00:51:20,980 --> 00:51:22,230 That is, to say, contiguously. 1016 00:51:22,230 --> 00:51:29,840 So an array can let you create memory for one int, or two, or three, 1017 00:51:29,840 --> 00:51:32,600 or even more than that, but describe them 1018 00:51:32,600 --> 00:51:36,390 all using the same variable name, the same one name. 1019 00:51:36,390 --> 00:51:40,740 So for instance, if, for one program, I only need three integers, 1020 00:51:40,740 --> 00:51:45,800 but I don't want to messily declare them as score 1, score 2, score 3, 1021 00:51:45,800 --> 00:51:46,960 I can do this, instead. 1022 00:51:46,960 --> 00:51:49,130 This is today's first new piece of syntax, 1023 00:51:49,130 --> 00:51:51,290 the square brackets that we're now seeing. 1024 00:51:51,290 --> 00:51:57,140 This line of code, here, is similar to int score 1 semicolon, 1025 00:51:57,140 --> 00:52:00,360 or int score 1 equals 72 semicolon. 1026 00:52:00,360 --> 00:52:05,780 This line of code is declaring for me, so to speak, an array of size 3. 1027 00:52:05,780 --> 00:52:09,260 And that array is going to store three integers. 1028 00:52:09,260 --> 00:52:09,770 Why? 1029 00:52:09,770 --> 00:52:14,990 Because the type of that array is an int, here. 1030 00:52:14,990 --> 00:52:18,110 The square brackets tell the computer how many ints you want. 1031 00:52:18,110 --> 00:52:18,980 In this case, 3. 1032 00:52:18,980 --> 00:52:21,140 And the name is, of course, scores. 1033 00:52:21,140 --> 00:52:23,540 Which, in English, I've deliberately pluralized 1034 00:52:23,540 --> 00:52:28,100 so that I can describe this array as storing multiple scores, indeed. 1035 00:52:28,100 --> 00:52:32,970 So if I want to now assign values to this variable, called scores, 1036 00:52:32,970 --> 00:52:34,760 I can do code like this. 
1037 00:52:34,760 --> 00:52:40,160 I can say, scores bracket 0 equals 72, scores bracket 1 equals 73, 1038 00:52:40,160 --> 00:52:42,190 and scores bracket 2 equals 33. 1039 00:52:42,190 --> 00:52:43,940 The only thing weird there is, admittedly, 1040 00:52:43,940 --> 00:52:45,830 the square brackets, which are still new. 1041 00:52:45,830 --> 00:52:49,820 But we're also, notice, zero-indexing things. 1042 00:52:49,820 --> 00:52:52,345 To zero-index means to start counting at 0. 1043 00:52:52,345 --> 00:52:54,470 When we've talked about that before, our for loops 1044 00:52:54,470 --> 00:52:56,000 have, generally, been zero-indexed. 1045 00:52:56,000 --> 00:52:59,870 Arrays in C are zero-indexed. 1046 00:52:59,870 --> 00:53:01,430 And you do not have a choice over that. 1047 00:53:01,430 --> 00:53:04,550 You can't start counting at 1 in arrays just because you prefer to; 1048 00:53:04,550 --> 00:53:06,830 you'd be sacrificing one of the elements. 1049 00:53:06,830 --> 00:53:09,620 You have to start in arrays counting from 0. 1050 00:53:09,620 --> 00:53:13,130 So out of context, this doesn't solve a problem, 1051 00:53:13,130 --> 00:53:15,200 but it, definitely, is going to once we have more 1052 00:53:15,200 --> 00:53:16,910 than, even, three scores here. 1053 00:53:16,910 --> 00:53:19,750 In fact, let me change this program a little bit. 1054 00:53:19,750 --> 00:53:21,450 Let me go back to VS Code. 1055 00:53:21,450 --> 00:53:24,020 And delete these three lines, here. 1056 00:53:24,020 --> 00:53:27,080 And replace them with a scores variable that's 1057 00:53:27,080 --> 00:53:30,140 ready to store three total integers. 1058 00:53:30,140 --> 00:53:34,130 And then, initialize them as follows: scores bracket 0 is 72, 1059 00:53:34,130 --> 00:53:38,300 as before, scores bracket 1 is going to be 73, scores bracket 2 1060 00:53:38,300 --> 00:53:39,740 is going to be 33.
1061 00:53:39,740 --> 00:53:44,068 Notice, I do not need to say int before any of these lines, 1062 00:53:44,068 --> 00:53:45,860 because that's been taken care of, already, 1063 00:53:45,860 --> 00:53:50,570 for me on line 5, where I already specified that everything in this array 1064 00:53:50,570 --> 00:53:53,330 is going to be an int. 1065 00:53:53,330 --> 00:53:57,020 Now, down here, this code needs to change because I no longer have 1066 00:53:57,020 --> 00:53:59,300 three variables, score 1, 2, and 3. 1067 00:53:59,300 --> 00:54:03,950 I have one variable, but one that I can index into. 1068 00:54:03,950 --> 00:54:08,750 I'm going to, here, then, do scores bracket 0, plus scores bracket 1, 1069 00:54:08,750 --> 00:54:13,370 plus scores bracket 2, which is equivalent to what I did earlier, 1070 00:54:13,370 --> 00:54:14,900 giving me back those three integers. 1071 00:54:14,900 --> 00:54:17,860 But notice, I'm using the same variable name, every time. 1072 00:54:17,860 --> 00:54:21,070 And again, I'm using this new square bracket notation to, quote-unquote, 1073 00:54:21,070 --> 00:54:26,590 index into the array to get at the first int, the second int, and the third, 1074 00:54:26,590 --> 00:54:28,840 and then, to do it again down here. 1075 00:54:28,840 --> 00:54:31,907 Now, this program is still not really solving all the problems we described; 1076 00:54:31,907 --> 00:54:34,240 I still can only store three scores, but we'll come back 1077 00:54:34,240 --> 00:54:35,930 to something like that before long. 1078 00:54:35,930 --> 00:54:38,950 But for now, we're just introducing a new syntax and a new feature, 1079 00:54:38,950 --> 00:54:44,980 whereby I can now store multiple values in the same variable. 1080 00:54:44,980 --> 00:54:47,110 Well, let's enhance this a bit more. 1081 00:54:47,110 --> 00:54:50,660 Instead of hard-coding these scores, as was identified as a problem, 1082 00:54:50,660 --> 00:54:54,790 let's use get int to ask the user for a score.
1083 00:54:54,790 --> 00:54:58,330 Let's, then, use get int to ask the user for another score. 1084 00:54:58,330 --> 00:55:01,540 Let's use get int to ask the user for a third score, 1085 00:55:01,540 --> 00:55:04,400 storing them in those respective locations. 1086 00:55:04,400 --> 00:55:09,820 And, now, if I go ahead and save this program, recompile scores, huh. 1087 00:55:09,820 --> 00:55:10,900 I've messed up, here. 1088 00:55:10,900 --> 00:55:13,990 Now these errors should be getting a little familiar. 1089 00:55:13,990 --> 00:55:16,750 What mistake did I make? 1090 00:55:16,750 --> 00:55:17,875 Let me give folks a moment. 1091 00:55:17,875 --> 00:55:18,970 AUDIENCE: cs50.h 1092 00:55:18,970 --> 00:55:21,100 DAVID MALAN: cs50.h. 1093 00:55:21,100 --> 00:55:24,220 That was not intentional, so still making mistakes all these years later. 1094 00:55:24,220 --> 00:55:26,320 I need to include cs50.h. 1095 00:55:26,320 --> 00:55:29,570 Now, I'm going to go back to the bottom in the terminal window, make scores. 1096 00:55:29,570 --> 00:55:30,070 OK. 1097 00:55:30,070 --> 00:55:31,670 We're back in business, ./scores. 1098 00:55:31,670 --> 00:55:33,920 Now, the program is getting a little more interesting. 1099 00:55:33,920 --> 00:55:38,020 So maybe, this year was better and I got a 100, and a 99, and a 98, and there, 1100 00:55:38,020 --> 00:55:40,900 my average is 99.0000. 1101 00:55:40,900 --> 00:55:42,370 So now, it's a little more dynamic. 1102 00:55:42,370 --> 00:55:43,270 It's a little more interesting. 1103 00:55:43,270 --> 00:55:45,978 But it's still capping the number of scores at three, admittedly. 1104 00:55:45,978 --> 00:55:50,740 But now, I've introduced another, sort of, symptom of bad programming. 1105 00:55:50,740 --> 00:55:54,108 There's this expression in programming, too, called code smell, where like-- 1106 00:55:54,108 --> 00:55:55,900 [SNIFFS AIR] something smells a little off. 
1107 00:55:55,900 --> 00:56:00,550 And there's something off here in that I could do better with this code. 1108 00:56:00,550 --> 00:56:05,080 Does anyone see an opportunity to improve the design of this code, here, 1109 00:56:05,080 --> 00:56:08,230 if my goal, still, is to get three scores from the user, but [SNIFF SNIFF] 1110 00:56:08,230 --> 00:56:10,430 without it smelling [SNIFF] kind of bad? 1111 00:56:10,430 --> 00:56:10,930 Yeah? 1112 00:56:10,930 --> 00:56:12,940 AUDIENCE: [INAUDIBLE] use a for loop? 1113 00:56:12,940 --> 00:56:15,958 That way you don't have to copy and paste all of those scores. 1114 00:56:15,958 --> 00:56:17,160 DAVID MALAN: Yeah, exactly. 1115 00:56:17,160 --> 00:56:19,022 Those lines of code are almost identical. 1116 00:56:19,022 --> 00:56:21,480 And honestly, the only thing that's changing is the number, 1117 00:56:21,480 --> 00:56:23,100 and it's just incrementing by 1. 1118 00:56:23,100 --> 00:56:25,330 We have all of the building blocks to do this better. 1119 00:56:25,330 --> 00:56:27,130 So let me go ahead and improve this. 1120 00:56:27,130 --> 00:56:29,560 Let me delete that code. 1121 00:56:29,560 --> 00:56:31,720 Let me, now, have a for loop. 1122 00:56:31,720 --> 00:56:36,150 So for int i gets 0, i less than 3, i plus plus. 1123 00:56:36,150 --> 00:56:39,060 Then, inside of this for loop, I can distill all three 1124 00:56:39,060 --> 00:56:40,860 of those lines into something more generic, 1125 00:56:40,860 --> 00:56:46,530 like scores bracket i equals get int, and now, ask the user, just 1126 00:56:46,530 --> 00:56:48,905 once, via get int, for a score. 1127 00:56:48,905 --> 00:56:52,000 So this is where arrays start to get pretty powerful. 1128 00:56:52,000 --> 00:56:54,000 You don't have to hard-code, that is, literally, 1129 00:56:54,000 --> 00:56:56,462 type in all of these magic numbers like 0, 1, and 2.
1130 00:56:56,462 --> 00:56:58,170 You can start to do it, programmatically, 1131 00:56:58,170 --> 00:56:59,770 as you proposed, with a loop. 1132 00:56:59,770 --> 00:57:01,350 So now, I've tightened things up. 1133 00:57:01,350 --> 00:57:04,230 I'm now, dynamically, getting three different scores 1134 00:57:04,230 --> 00:57:06,766 and putting them in three different locations. 1135 00:57:06,766 --> 00:57:10,470 And so this program, ultimately, is going to work, pretty much, the same. 1136 00:57:10,470 --> 00:57:17,520 Make scores, ./scores, and 100, 99, 98, and we're back to the same answer. 1137 00:57:17,520 --> 00:57:19,440 But it's a little better designed, too. 1138 00:57:19,440 --> 00:57:21,360 If I really want to nitpick, there's something 1139 00:57:21,360 --> 00:57:23,100 that still smells, a little bit, here. 1140 00:57:23,100 --> 00:57:27,540 The fact that I have, indeed, this magic number three, that really 1141 00:57:27,540 --> 00:57:29,890 has to be the same as this number here. 1142 00:57:29,890 --> 00:57:32,170 Otherwise, who knows what's going to go wrong? 1143 00:57:32,170 --> 00:57:34,380 So what might be a solution, per last week, 1144 00:57:34,380 --> 00:57:36,960 to cleaning that code up further, too? 1145 00:57:36,960 --> 00:57:39,750 AUDIENCE: [INAUDIBLE] the user's discretion 1146 00:57:39,750 --> 00:57:41,742 how many input scores [INAUDIBLE]. 1147 00:57:41,742 --> 00:57:44,790 DAVID MALAN: OK, so we could leave it up to the user's discretion. 1148 00:57:44,790 --> 00:57:47,500 And so we could, actually, do something like this. 1149 00:57:47,500 --> 00:57:49,200 Let me take this a few steps ahead. 1150 00:57:49,200 --> 00:57:56,230 Let me say something like, int n gets get int, how many scores question mark, 1151 00:57:56,230 --> 00:58:00,600 then I could actually change this to an n, and then this to an n, 1152 00:58:00,600 --> 00:58:02,970 and, indeed, make the whole program dynamic.
1153 00:58:02,970 --> 00:58:05,670 Ask the human how many tests there have been this semester. 1154 00:58:05,670 --> 00:58:07,500 Then, you can type in each of those scores 1155 00:58:07,500 --> 00:58:09,708 because the loop is going to iterate that many times. 1156 00:58:09,708 --> 00:58:13,020 And then you'll get the average of one test, two tests, three-- 1157 00:58:13,020 --> 00:58:17,520 well, lost another-- or however many scores that were actually 1158 00:58:17,520 --> 00:58:20,760 specified by the user. Yeah, question? 1159 00:58:20,760 --> 00:58:25,765 AUDIENCE: How many bits or bytes get used in an array? 1160 00:58:25,765 --> 00:58:28,060 DAVID MALAN: How many bytes are used in an array? 1161 00:58:28,060 --> 00:58:32,524 AUDIENCE: [INAUDIBLE] point of doing this is to save [INAUDIBLE] 1162 00:58:32,524 --> 00:58:35,500 DAVID MALAN: So the purpose of an array is not to save space. 1163 00:58:35,500 --> 00:58:39,010 It's to eliminate having multiple variable names 1164 00:58:39,010 --> 00:58:40,900 because that gets very messy quickly. 1165 00:58:40,900 --> 00:58:44,980 If you have score 1, score 2, score 3, dot, dot, dot, score 99, 1166 00:58:44,980 --> 00:58:48,100 that's, like, 99 different variables, potentially, 1167 00:58:48,100 --> 00:58:54,160 that you could collapse into one variable that has 99 locations, 1168 00:58:54,160 --> 00:58:56,230 at different indices, or indexes. 1169 00:58:56,230 --> 00:58:58,570 As someone would say, the index for an array 1170 00:58:58,570 --> 00:59:00,756 is whatever is in the square brackets. 1171 00:59:00,756 --> 00:59:11,560 AUDIENCE: [INAUDIBLE] 1172 00:59:11,560 --> 00:59:13,280 DAVID MALAN: So it's a good question. 1173 00:59:13,280 --> 00:59:15,370 So if you-- I'm using ints for everything-- 1174 00:59:15,370 --> 00:59:17,560 and honestly, we don't really need ints for scores 1175 00:59:17,560 --> 00:59:21,770 because I'm not likely to get a 2 billion on a test anytime soon.
1176 00:59:21,770 --> 00:59:23,620 And so you could use different data types. 1177 00:59:23,620 --> 00:59:26,287 And that list we had on the screen, earlier, is not all of them. 1178 00:59:26,287 --> 00:59:29,770 There's a data type called short, which is shorter than an int; 1179 00:59:29,770 --> 00:59:34,850 you could, technically, use char in some form, or other data types as well. 1180 00:59:34,850 --> 00:59:36,940 Generally speaking, in the year 2021, these 1181 00:59:36,940 --> 00:59:40,990 tend to be overly optimized decisions. 1182 00:59:40,990 --> 00:59:42,940 Everyone just uses ints, even though no one 1183 00:59:42,940 --> 00:59:46,300 is going to get a test score that's 2 billion, or more, because int is just, 1184 00:59:46,300 --> 00:59:47,260 kind of, the go-to. 1185 00:59:47,260 --> 00:59:50,252 Years ago, memory was expensive. 1186 00:59:50,252 --> 00:59:52,210 And every one of your instincts would have been 1187 00:59:52,210 --> 00:59:54,700 spot on because memory was so tight. 1188 00:59:54,700 --> 00:59:56,930 But, nowadays, we don't worry as much about it. 1189 00:59:56,930 --> 00:59:57,430 Yeah? 1190 00:59:57,430 --> 01:00:02,556 AUDIENCE: I have a question about the error [INAUDIBLE]. 1191 01:00:02,556 --> 01:00:06,605 Could it-- when you're doing the cash problem on the problem set-- 1192 01:00:06,605 --> 01:00:10,010 DAVID MALAN: So what is the difference between dividing two ints 1193 01:00:10,010 --> 01:00:12,380 and not getting an error, as you might have encountered 1194 01:00:12,380 --> 01:00:15,920 in a program like cash, versus dividing two ints 1195 01:00:15,920 --> 01:00:18,150 and getting an error like I did a moment ago? 1196 01:00:18,150 --> 01:00:22,280 The problem with the scenario I created a moment ago was printf was involved. 1197 01:00:22,280 --> 01:00:27,980 And I was telling printf to use a %f, but I was giving printf the result 1198 01:00:27,980 --> 01:00:30,580 of dividing one integer by another integer.
1199 01:00:30,580 --> 01:00:32,930 So it was printf that was yelling at me. 1200 01:00:32,930 --> 01:00:35,930 I'm guessing in the scenario you're describing, for something like cash, 1201 01:00:35,930 --> 01:00:39,180 printf was not involved in that particular line of code. 1202 01:00:39,180 --> 01:00:40,865 So that's the difference, there. 1203 01:00:40,865 --> 01:00:41,660 All right. 1204 01:00:41,660 --> 01:00:45,110 So we, now, have this ability to create an array. 1205 01:00:45,110 --> 01:00:47,510 And an array can store multiple values. 1206 01:00:47,510 --> 01:00:51,450 What, then, might we do that's more interesting than just storing numbers 1207 01:00:51,450 --> 01:00:51,950 in memory? 1208 01:00:51,950 --> 01:00:54,230 Well, let's take this one step further. 1209 01:00:54,230 --> 01:01:01,130 As opposed to just storing 72, 73, 33 or 100, 99, 98, at these given locations, 1210 01:01:01,130 --> 01:01:05,930 because again, an array gives you one variable name, but multiple locations, 1211 01:01:05,930 --> 01:01:08,360 or indices therein, bracket 0, bracket 1, 1212 01:01:08,360 --> 01:01:11,330 bracket 2 on up, if it were even bigger than that. 1213 01:01:11,330 --> 01:01:16,100 Let's, now, start to consider something more modest, like simple chars. 1214 01:01:16,100 --> 01:01:18,830 Chars are 1 byte each, so they're even smaller; 1215 01:01:18,830 --> 01:01:20,090 they take up much less space. 1216 01:01:20,090 --> 01:01:22,048 And, indeed, if I wanted to say a message like, 1217 01:01:22,048 --> 01:01:24,200 hi, I could use three variables. 1218 01:01:24,200 --> 01:01:28,520 If I wanted a program to print, hi, H-I exclamation point, 1219 01:01:28,520 --> 01:01:33,230 I could, of course, store those in three variables, like c1, c2, c3. 1220 01:01:33,230 --> 01:01:36,710 And, for the sake of discussion, let's whip this up real quickly. 1221 01:01:36,710 --> 01:01:39,680 Let me create a new program, now, in VS Code.
1222 01:01:39,680 --> 01:01:42,920 This time, I'm going to call it hi.c. 1223 01:01:42,920 --> 01:01:45,650 And I'm not going to bother with the CS50 library. 1224 01:01:45,650 --> 01:01:47,660 I just need the standard I/O one, for now. 1225 01:01:47,660 --> 01:01:49,220 int main(void). 1226 01:01:49,220 --> 01:01:52,400 And then, inside of main, I'm going to, simply, create three variables. 1227 01:01:52,400 --> 01:01:55,760 And this is already, hopefully, striking you as a bad idea. 1228 01:01:55,760 --> 01:01:58,310 But we'll go down this road, temporarily, 1229 01:01:58,310 --> 01:02:02,300 with c1, and c2, and, finally, c3, 1230 01:02:02,300 --> 01:02:05,660 storing each character in the phrase I want to print, 1231 01:02:05,660 --> 01:02:09,450 and I'm going to print this in a different way than usual. 1232 01:02:09,450 --> 01:02:10,880 Now I'm dealing with chars. 1233 01:02:10,880 --> 01:02:14,480 And we've, generally, dealt with strings, which was easier last week. 1234 01:02:14,480 --> 01:02:21,600 But %c, %c, %c will let me print out three chars, like c1, c2, and c3. 1235 01:02:21,600 --> 01:02:24,420 So, kind of, a stupid way of printing out a string. 1236 01:02:24,420 --> 01:02:26,940 So we already had a solution to this problem last week. 1237 01:02:26,940 --> 01:02:30,540 But let's poke around at what's going on underneath the hood, here. 1238 01:02:30,540 --> 01:02:33,350 So let's make hi, ./hi. 1239 01:02:33,350 --> 01:02:34,475 And, voila, no surprise. 1240 01:02:34,475 --> 01:02:36,350 But we, again, could have done this last week 1241 01:02:36,350 --> 01:02:39,530 with a string and just one variable, or even 0 variables, at that. 1242 01:02:39,530 --> 01:02:43,220 But let's start converting these characters 1243 01:02:43,220 --> 01:02:47,750 to their apparent numeric equivalents like we talked about in week 0, too. 1244 01:02:47,750 --> 01:02:52,310 Let me modify these %c's, just for fun, to be %i's.
1245 01:02:52,310 --> 01:02:56,180 And let me add some spaces so there are gaps between each of them. 1246 01:02:56,180 --> 01:03:00,350 Let me, now, recompile hi, and let me rerun it. 1247 01:03:00,350 --> 01:03:02,900 Just to guess, what should I see on the screen now? 1248 01:03:05,690 --> 01:03:06,200 Any guesses? 1249 01:03:06,200 --> 01:03:06,700 Yeah? 1250 01:03:06,700 --> 01:03:08,036 AUDIENCE: The ASCII values? 1251 01:03:08,036 --> 01:03:09,760 DAVID MALAN: The ASCII values. 1252 01:03:09,760 --> 01:03:12,220 And it's intentional that I keep using the same word, 1253 01:03:12,220 --> 01:03:18,250 hi, because it should be, hopefully, the old friends, 72, 73, and 33. 1254 01:03:18,250 --> 01:03:22,120 Which is to say that C knows about ASCII, or, equivalently, Unicode, 1255 01:03:22,120 --> 01:03:24,320 and can do this conversion for us automatically. 1256 01:03:24,320 --> 01:03:27,670 And it seems to be doing it implicitly for us, so to speak. 1257 01:03:27,670 --> 01:03:31,000 Notice that c1, c2 and c3 are, obviously, chars, 1258 01:03:31,000 --> 01:03:34,420 but printf is able to tolerate printing them as integers. 1259 01:03:34,420 --> 01:03:38,870 If I really wanted to be pedantic, I could use this technique, again, 1260 01:03:38,870 --> 01:03:41,320 known as typecasting, where I can actually 1261 01:03:41,320 --> 01:03:46,610 convert one data type to another, if it makes logical sense to do so. 1262 01:03:46,610 --> 01:03:49,900 And we saw in week 0 that chars, or characters, 1263 01:03:49,900 --> 01:03:53,500 are just numbers, like 72, 73, and 33. 1264 01:03:53,500 --> 01:03:57,680 So I can use this parenthetical expression to convert, incorrectly, 1265 01:03:57,680 --> 01:04:02,623 [LAUGHTER] three chars to three integers, instead. 1266 01:04:02,623 --> 01:04:04,540 So that's what I meant to type the first time. 1267 01:04:04,540 --> 01:04:05,040 There we go. 1268 01:04:05,040 --> 01:04:05,800 Strike two, today.
1269 01:04:05,800 --> 01:04:09,280 So open parenthesis, int, close parenthesis says 1270 01:04:09,280 --> 01:04:14,840 take whatever variable comes after this, c1, c2, or c3, and convert it to an int. 1271 01:04:14,840 --> 01:04:18,640 The effect is going to be no different: make hi, and then rerunning whoops-- 1272 01:04:18,640 --> 01:04:24,910 then running ./hi still works the same, but now I'm explicitly converting chars 1273 01:04:24,910 --> 01:04:25,660 to ints. 1274 01:04:25,660 --> 01:04:29,260 And we can do this all day long, chars to ints, floats to ints, 1275 01:04:29,260 --> 01:04:30,250 ints to floats. 1276 01:04:30,250 --> 01:04:31,888 Sometimes, it's equivalent. 1277 01:04:31,888 --> 01:04:33,805 Other times, you're going to lose information. 1278 01:04:33,805 --> 01:04:37,270 Taking a float to an int, just intuitively, 1279 01:04:37,270 --> 01:04:39,790 is going to throw away everything after the decimal point, 1280 01:04:39,790 --> 01:04:42,680 because an int has no decimal point. 1281 01:04:42,680 --> 01:04:45,100 But, for now, I'm going to rewind to the version of this 1282 01:04:45,100 --> 01:04:49,150 that just did implicit type conversion, or implicit casting, 1283 01:04:49,150 --> 01:04:53,350 just to demonstrate that we can, indeed, see the values underneath the hood. 1284 01:04:53,350 --> 01:04:53,950 All right. 1285 01:04:53,950 --> 01:04:56,370 Let me go ahead and do this, now, the week 1 way. 1286 01:04:56,370 --> 01:04:57,370 This was kind of stupid. 1287 01:04:57,370 --> 01:05:00,205 Let's just do printf, quote-unquote-- 1288 01:05:00,205 --> 01:05:04,630 Actually, let's do this, string s equals quote-unquote hi, 1289 01:05:04,630 --> 01:05:09,680 and then let's do a simple printf with %s, printing out s there.
1290 01:05:09,680 --> 01:05:12,520 So now I've rewound to last week, where we began this story, 1291 01:05:12,520 --> 01:05:16,660 but you'll notice that, if we keep playing around with this-- 1292 01:05:16,660 --> 01:05:18,860 whoops, what did I do here? 1293 01:05:18,860 --> 01:05:23,470 Oh, and let me include the CS50 library here, more on that before long. 1294 01:05:23,470 --> 01:05:26,260 Let me go ahead and recompile and rerun this. 1295 01:05:26,260 --> 01:05:28,268 We seem to be coding in circles, here. 1296 01:05:28,268 --> 01:05:30,810 Like, I've just done the same thing multiple, different ways. 1297 01:05:30,810 --> 01:05:33,400 But there's clearly an equivalence, then, 1298 01:05:33,400 --> 01:05:36,978 between sequences of chars and strings. 1299 01:05:36,978 --> 01:05:38,770 And if you do it the real pedantic way, you 1300 01:05:38,770 --> 01:05:43,390 have three different variables, c1, c2, c3, representing H-I exclamation point, 1301 01:05:43,390 --> 01:05:47,870 or you can just treat them all together like this, H-I exclamation point. 1302 01:05:47,870 --> 01:05:52,030 But it turns out that strings are actually 1303 01:05:52,030 --> 01:05:58,060 implemented by the computer in a now pretty familiar way. 1304 01:05:58,060 --> 01:06:04,382 What might a string actually be as of this point in the story? 1305 01:06:04,382 --> 01:06:05,590 Where are we going with this? 1306 01:06:05,590 --> 01:06:06,923 Let me try to look further back. 1307 01:06:06,923 --> 01:06:07,850 Yeah, way in back? 1308 01:06:07,850 --> 01:06:08,350 Yeah? 1309 01:06:08,350 --> 01:06:10,600 AUDIENCE: Can a string like this be an array of chars? 1310 01:06:10,600 --> 01:06:13,410 DAVID MALAN: Yeah, a string might be, and indeed is, just 1311 01:06:13,410 --> 01:06:14,800 an array of characters. 1312 01:06:14,800 --> 01:06:17,190 So last week we took for granted that strings exist.
1313 01:06:17,190 --> 01:06:19,530 Technically, strings exist, but they're implemented 1314 01:06:19,530 --> 01:06:23,070 as arrays of characters, which actually opens up 1315 01:06:23,070 --> 01:06:25,770 some interesting possibilities for us. 1316 01:06:25,770 --> 01:06:28,300 Because, let me see, let me see if I can do this. 1317 01:06:28,300 --> 01:06:31,560 Let me try to print out, now, three integers again. 1318 01:06:31,560 --> 01:06:37,530 But if string s is but an array, as you propose, maybe I can do s bracket 0, 1319 01:06:37,530 --> 01:06:39,760 s bracket 1, and s bracket 2. 1320 01:06:39,760 --> 01:06:43,650 So maybe I can start poking around inside of strings, 1321 01:06:43,650 --> 01:06:45,630 even though we didn't do this last week, so I 1322 01:06:45,630 --> 01:06:47,260 can get at those individual values. 1323 01:06:47,260 --> 01:06:51,270 So make hi, ./hi and, voila, there we go again. 1324 01:06:51,270 --> 01:06:56,208 It's the same 72, 73, 33, but now, I'm sort of, hopefully, 1325 01:06:56,208 --> 01:06:58,500 like, wrapping my mind around the fact that, all right, 1326 01:06:58,500 --> 01:07:01,650 a string is just an array of characters, and arrays, you 1327 01:07:01,650 --> 01:07:04,960 can index into them using this new square bracket notation. 1328 01:07:04,960 --> 01:07:08,040 So I can get at any one of these individual characters, 1329 01:07:08,040 --> 01:07:14,055 and, heck, convert it to an integer like we did in week 0. 1330 01:07:14,055 --> 01:07:17,010 Let me get a little curious now. 1331 01:07:17,010 --> 01:07:20,020 What else might be in the computer's memory? 1332 01:07:20,020 --> 01:07:23,550 Well, let's-- I'll go back to the depiction of these same things. 1333 01:07:23,550 --> 01:07:25,860 Here might be how we originally implemented hi 1334 01:07:25,860 --> 01:07:28,800 with three variables, c1, c2, c3. 
1335 01:07:28,800 --> 01:07:31,500 Of course, those map to these decimal numbers or, equivalently, 1336 01:07:31,500 --> 01:07:32,880 these binary values. 1337 01:07:32,880 --> 01:07:35,310 But what was this looking like in memory? 1338 01:07:35,310 --> 01:07:38,250 Literally, when you create a string in memory, like this, 1339 01:07:38,250 --> 01:07:41,240 string s equals quote-unquote hi, let's consider what's going on 1340 01:07:41,240 --> 01:07:42,615 underneath the hood, so to speak. 1341 01:07:42,615 --> 01:07:47,490 Well, as an abstraction, a string, it's H-I exclamation point taking up, 1342 01:07:47,490 --> 01:07:48,917 it would seem, 3 bytes, right? 1343 01:07:48,917 --> 01:07:51,000 I've gotten rid of the bars, there, because if you 1344 01:07:51,000 --> 01:07:55,650 think of a string as a type, I'm just going to use one big box of size 3. 1345 01:07:55,650 --> 01:08:00,210 But technically, a string, we've just revealed, is an array, 1346 01:08:00,210 --> 01:08:01,830 and the array is of size 3. 1347 01:08:01,830 --> 01:08:03,750 So technically, if the string is called s, 1348 01:08:03,750 --> 01:08:05,970 s bracket 0 will give you the first character, 1349 01:08:05,970 --> 01:08:09,810 s bracket 1, the second, and s bracket 2, the third. 1350 01:08:09,810 --> 01:08:13,290 But let me ask this question now, if this, at the end of the day, 1351 01:08:13,290 --> 01:08:16,560 is the only thing in your computer's memory, 1352 01:08:16,560 --> 01:08:20,790 like a canvas on which to draw 0s and 1s, or numbers, or characters, 1353 01:08:20,790 --> 01:08:22,620 or whatever, and that's it, like this 1354 01:08:22,620 --> 01:08:25,770 is what your Mac, and PC, and phone ultimately reduce to. 1355 01:08:25,770 --> 01:08:29,730 Suppose that I'm running a piece of software, like a text messenger, 1356 01:08:29,730 --> 01:08:33,000 and now I write down bye exclamation point. 1357 01:08:33,000 --> 01:08:34,860 Well, where might that go in memory?
1358 01:08:34,860 --> 01:08:35,845 Well, it might go here. 1359 01:08:35,845 --> 01:08:39,333 B-Y-E. And then the next thing I type might go here, here, here and so forth. 1360 01:08:39,333 --> 01:08:41,250 My memory just might get filled up, over time, 1361 01:08:41,250 --> 01:08:44,310 with things that you or someone else are typing. 1362 01:08:44,310 --> 01:08:50,580 But then how does the computer know, if, potentially, B-Y-E exclamation point 1363 01:08:50,580 --> 01:08:56,150 is right after H-I exclamation point, where one string ends and the next one 1364 01:08:56,150 --> 01:08:56,650 begins? 1365 01:08:58,930 --> 01:08:59,430 Right? 1366 01:08:59,430 --> 01:09:03,070 All we have are bytes, or 0s and 1s. 1367 01:09:03,070 --> 01:09:05,730 So if you were designing this, how would you 1368 01:09:05,730 --> 01:09:08,280 implement some kind of delimiter between the two? 1369 01:09:08,280 --> 01:09:10,260 Or figure out what the length of a string is? 1370 01:09:10,260 --> 01:09:11,010 What do you think? 1371 01:09:11,010 --> 01:09:12,148 AUDIENCE: A nul character. 1372 01:09:12,148 --> 01:09:15,107 DAVID MALAN: OK, so the right answer is to use a nul character, 1373 01:09:15,107 --> 01:09:17,190 and for those who don't know, what does that mean? 1374 01:09:17,190 --> 01:09:19,492 AUDIENCE: It's special. 1375 01:09:19,492 --> 01:09:21,450 DAVID MALAN: Yeah, so it's a special character. 1376 01:09:21,450 --> 01:09:23,520 Let me describe it as a sentinel character. 1377 01:09:23,520 --> 01:09:25,575 Humans decided some time ago that, you know 1378 01:09:25,575 --> 01:09:28,560 what, if we want to delineate where one string ends 1379 01:09:28,560 --> 01:09:32,010 and where the next one begins, we just need some special symbol. 1380 01:09:32,010 --> 01:09:35,189 And the symbol they chose is generally written as backslash 0. 1381 01:09:35,189 --> 01:09:39,555 This is just shorthand notation for literally eight 0 bits.
1382 01:09:39,555 --> 01:09:42,540 0, 0, 0, 0, 0, 0, 0, 0. 1383 01:09:42,540 --> 01:09:46,140 And the nickname for eight 0 bits, in this context, 1384 01:09:46,140 --> 01:09:48,930 is nul, N-U-L, so to speak. 1385 01:09:48,930 --> 01:09:51,910 And we can actually see this as follows. 1386 01:09:51,910 --> 01:09:53,913 If you look at the corresponding decimal numbers, 1387 01:09:53,913 --> 01:09:56,580 like you could do by doing out the math or doing the conversion, 1388 01:09:56,580 --> 01:10:01,560 like we've done in code, you would see for storing hi, 72, 73, 33, 1389 01:10:01,560 --> 01:10:06,600 but then 1 extra byte that's sort of invisibly there, but that is all 0s. 1390 01:10:06,600 --> 01:10:09,120 And now I've just written it as the decimal number 0. 1391 01:10:09,120 --> 01:10:12,120 The implication of this is that the computer is apparently 1392 01:10:12,120 --> 01:10:16,695 using not 3 bytes to store a word like hi, but 4 bytes. 1393 01:10:16,695 --> 01:10:22,050 Whatever the length of the string is, plus 1 for this special sentinel value 1394 01:10:22,050 --> 01:10:24,640 that demarcates the end of the string. 1395 01:10:24,640 --> 01:10:26,680 So we might draw it like this instead. 1396 01:10:26,680 --> 01:10:31,350 And this character is, again, pronounced nul, or written N-U-L. 1397 01:10:31,350 --> 01:10:32,319 So that's all, right? 1398 01:10:32,319 --> 01:10:35,069 If humans, at the end of the day, just have this canvas of memory, 1399 01:10:35,069 --> 01:10:36,902 they just needed to decide, all right, well, 1400 01:10:36,902 --> 01:10:39,990 how do we distinguish one string from another? 1401 01:10:39,990 --> 01:10:42,660 It's a lot easier with individual chars, it's 1402 01:10:42,660 --> 01:10:45,450 a lot easier with ints, it's even easier with floats. Why? 1403 01:10:45,450 --> 01:10:49,620 Because, per that chart earlier, every character is always 1 byte. 1404 01:10:49,620 --> 01:10:51,810 Every int is always 4 bytes.
1405 01:10:51,810 --> 01:10:54,750 Every long is always 8 bytes. 1406 01:10:54,750 --> 01:10:56,279 How long is a string? 1407 01:10:56,279 --> 01:10:59,760 Well, hi is 1, 2, 3 with an exclamation point. 1408 01:10:59,760 --> 01:11:03,029 Bye is 1, 2, 3, 4 with an exclamation point. 1409 01:11:03,029 --> 01:11:06,450 David is D-A-V-I-D, five without an exclamation point. 1410 01:11:06,450 --> 01:11:10,210 And so a string can be any number of bytes long, 1411 01:11:10,210 --> 01:11:12,700 so you somehow need to draw a line in the sand 1412 01:11:12,700 --> 01:11:16,706 to separate in memory one string from another. 1413 01:11:16,706 --> 01:11:19,412 So what's the implication of this? 1414 01:11:19,412 --> 01:11:20,870 Well, let me go back to code, here. 1415 01:11:20,870 --> 01:11:22,210 Let's actually poke around. 1416 01:11:22,210 --> 01:11:27,130 This is a bit dangerous, but I'm going to start looking at memory locations 1417 01:11:27,130 --> 01:11:29,210 past my string here. 1418 01:11:29,210 --> 01:11:33,250 So let me go ahead and recompile, make hi. 1419 01:11:33,250 --> 01:11:35,110 Whoops, what did I do here? 1420 01:11:35,110 --> 01:11:36,680 I forgot a format code. 1421 01:11:36,680 --> 01:11:38,620 Let me add one more %i. 1422 01:11:38,620 --> 01:11:42,550 Now let me go ahead and rerun make hi, ./hi, Enter. 1423 01:11:42,550 --> 01:11:43,580 There it is. 1424 01:11:43,580 --> 01:11:46,660 So you can actually see in the computer, unbeknownst to you 1425 01:11:46,660 --> 01:11:49,830 previously, that there's indeed something else going on there. 1426 01:11:49,830 --> 01:11:52,880 And if I were to make one other variant of this program-- 1427 01:11:52,880 --> 01:11:55,630 let's get rid of just this one word and let's have two. 1428 01:11:55,630 --> 01:11:57,550 So let me give myself another string called t, 1429 01:11:57,550 --> 01:12:01,810 for instance, just this common convention with bye exclamation point. 
1430 01:12:01,810 --> 01:12:04,900 Let me, then, print out s with %s. 1431 01:12:04,900 --> 01:12:10,785 And let me also print out with %s, whoops, printf, print out t, as well. 1432 01:12:10,785 --> 01:12:14,320 Let me recompile this program, and obviously the out-- 1433 01:12:14,320 --> 01:12:17,470 ugh-- this is what happens when I go too fast. 1434 01:12:17,470 --> 01:12:20,740 All right, third mistake today, close quote, 1435 01:12:20,740 --> 01:12:22,030 which I was missing. 1436 01:12:22,030 --> 01:12:23,590 Make hi. 1437 01:12:23,590 --> 01:12:25,000 Fourth mistake today. 1438 01:12:25,000 --> 01:12:26,200 Make hi. 1439 01:12:26,200 --> 01:12:27,490 Dot slash hi. 1440 01:12:27,490 --> 01:12:28,210 OK, voila. 1441 01:12:28,210 --> 01:12:30,610 Now we have a program that's printing both hi and bye, 1442 01:12:30,610 --> 01:12:34,720 only so that we can consider what's going on in the computer's memory. 1443 01:12:34,720 --> 01:12:40,210 If s is storing hi and apparently one bonus byte that 1444 01:12:40,210 --> 01:12:43,240 demarcates the end of that string, bye is apparently 1445 01:12:43,240 --> 01:12:46,413 going to fit into the location directly after. 1446 01:12:46,413 --> 01:12:49,330 And it's wrapping around, but that's just an artist's rendition, here. 1447 01:12:49,330 --> 01:12:52,000 But bye, B-Y-E exclamation point, is taking up 1448 01:12:52,000 --> 01:12:58,948 1, 2, 3, 4, plus a fifth byte, as well. 1449 01:12:58,948 --> 01:13:03,580 All right, any questions on this underlying representation of strings? 1450 01:13:03,580 --> 01:13:05,560 And we'll contextualize this, before long, 1451 01:13:05,560 --> 01:13:07,840 so that this isn't just like, OK, who really cares? 1452 01:13:07,840 --> 01:13:10,730 This is going to be the source of actually implementing things. 1453 01:13:10,730 --> 01:13:13,510 In fact, for problem set 2, things like cryptography, and encryption, 1454 01:13:13,510 --> 01:13:15,468 and scrambling actual human messages.
1455 01:13:15,468 --> 01:13:16,510 But some questions first. 1456 01:13:16,510 --> 01:13:20,650 AUDIENCE: So normally if you were to not use string, 1457 01:13:20,650 --> 01:13:23,480 you would just make a character array that would declare 1458 01:13:23,480 --> 01:13:26,580 how many characters there are so you know how many characters are 1459 01:13:26,580 --> 01:13:27,330 going to be there. 1460 01:13:27,330 --> 01:13:29,480 DAVID MALAN: A good question, too, and let 1461 01:13:29,480 --> 01:13:32,115 me summarize it as: if we were instead to use chars all the time, 1462 01:13:32,115 --> 01:13:35,240 we would indeed have to know in advance how many chars you want for a given 1463 01:13:35,240 --> 01:13:38,750 string that you're storing. How, then, does something like get string work, 1464 01:13:38,750 --> 01:13:41,000 because when we at CS50 wrote the get string function, 1465 01:13:41,000 --> 01:13:43,190 we obviously didn't know how long the words were 1466 01:13:43,190 --> 01:13:45,020 going to be that you all are typing in? 1467 01:13:45,020 --> 01:13:48,560 It turns out, two weeks from now, we'll see that get string 1468 01:13:48,560 --> 01:13:51,320 uses a technique known as dynamic memory allocation. 1469 01:13:51,320 --> 01:13:55,770 And it's going to grow or shrink the array automatically for you. 1470 01:13:55,770 --> 01:13:57,050 But more on that soon. 1471 01:13:57,050 --> 01:13:57,920 Other questions? 1472 01:13:57,920 --> 01:14:01,450 AUDIENCE: Why are we using a nul value? 1473 01:14:01,450 --> 01:14:02,725 Isn't that wasting a byte? 1474 01:14:02,725 --> 01:14:03,850 DAVID MALAN: Good question. 1475 01:14:03,850 --> 01:14:06,880 Why are we using a nul value? Isn't it wasting a byte? 1476 01:14:06,880 --> 01:14:07,630 Yes. 1477 01:14:07,630 --> 01:14:13,210 But I claim there's really no other way to distinguish the end of one string 1478 01:14:13,210 --> 01:14:19,748 from the start of another, unless we make some sort of notation in memory.
1479 01:14:19,748 --> 01:14:22,540 All we have, at the end of the day, inside of a computer, are bits. 1480 01:14:22,540 --> 01:14:25,900 Therefore, all we can do is spin those bits in some creative way 1481 01:14:25,900 --> 01:14:27,520 to solve this problem. 1482 01:14:27,520 --> 01:14:30,710 So we're minimally going to spend 1 byte to solve this problem. 1483 01:14:30,710 --> 01:14:31,210 Yeah? 1484 01:14:31,210 --> 01:14:35,897 AUDIENCE: How does our memory device know to enter a line when you type 1485 01:14:35,897 --> 01:14:39,270 the \n if we don't have it stored as a char? 1486 01:14:39,270 --> 01:14:40,910 DAVID MALAN: If you don't-- 1487 01:14:40,910 --> 01:14:44,690 how does the computer know to move to the next line when you have a \n? 1488 01:14:44,690 --> 01:14:47,990 So \n, even though it looks like two characters, 1489 01:14:47,990 --> 01:14:51,890 is actually stored as just 1 byte in the computer's memory. 1490 01:14:51,890 --> 01:14:54,357 There's a mapping between it and an actual number. 1491 01:14:54,357 --> 01:14:57,440 And you can see that, for instance, on the ASCII chart from the other day. 1492 01:14:57,440 --> 01:15:01,224 AUDIENCE: So with that being stored would be the [INAUDIBLE]. 1493 01:15:01,224 --> 01:15:02,420 DAVID MALAN: It would be. 1494 01:15:02,420 --> 01:15:08,210 If I had put a \n in my code here, right after the exclamation point here 1495 01:15:08,210 --> 01:15:11,840 and here, that would actually shift everything in memory because we would 1496 01:15:11,840 --> 01:15:16,740 need to make room for a \n here and another one over here. 1497 01:15:16,740 --> 01:15:18,913 So it would take two more bytes, exactly. 1498 01:15:18,913 --> 01:15:19,580 Other questions?
1499 01:15:19,580 --> 01:15:26,050 AUDIENCE: So if hi exclamation point is written in binary and in ASCII 1500 01:15:26,050 --> 01:15:32,630 as 72, 73, 33, if we are to write those numbers in the string, 1501 01:15:32,630 --> 01:15:39,090 and convert them into binary, how would the computer know what's 72 1502 01:15:39,090 --> 01:15:40,390 and what's 8? 1503 01:15:40,390 --> 01:15:42,390 DAVID MALAN: And what's the last thing you said? 1504 01:15:42,390 --> 01:15:43,806 AUDIENCE: 8, for example. 1505 01:15:43,806 --> 01:15:45,700 DAVID MALAN: It's context-sensitive. 1506 01:15:45,700 --> 01:15:48,450 So if, at the end of the day, all we're storing is these numbers, 1507 01:15:48,450 --> 01:15:52,380 like 72, 73, 33, recall that it's up to the program 1508 01:15:52,380 --> 01:15:55,470 to decide, based on context, how to interpret them. 1509 01:15:55,470 --> 01:15:59,310 And I simplified this story in week 0, saying that Photoshop interprets them 1510 01:15:59,310 --> 01:16:02,910 as RGB colors, and iMessage or a text messaging program 1511 01:16:02,910 --> 01:16:07,440 interprets them as letters, and Excel interprets them as numbers. 1512 01:16:07,440 --> 01:16:12,540 How those programs do it is by way of data types like string, and int, 1513 01:16:12,540 --> 01:16:13,080 and float. 1514 01:16:13,080 --> 01:16:14,872 And in fact, later this semester, we'll see 1515 01:16:14,872 --> 01:16:19,500 a data type via which you can represent a color as a triple of numbers, 1516 01:16:19,500 --> 01:16:22,240 a red value, a green value, and a blue value. 1517 01:16:22,240 --> 01:16:24,600 So we'll see other data types as well. 1518 01:16:24,600 --> 01:16:25,100 Yeah?
1522 01:16:35,192 --> 01:16:36,900 DAVID MALAN: Really interesting question. 1523 01:16:36,900 --> 01:16:40,110 Why could we not just make all data types variable in size? 1524 01:16:40,110 --> 01:16:43,560 And some languages, some libraries do exactly this. 1525 01:16:43,560 --> 01:16:47,100 C is an older language, and because memory was expensive 1526 01:16:47,100 --> 01:16:48,300 memory was limited. 1527 01:16:48,300 --> 01:16:50,640 The reality was you gain benefits from just 1528 01:16:50,640 --> 01:16:53,010 standardizing the size of these things. 1529 01:16:53,010 --> 01:16:55,410 You also get performance increases in the sense 1530 01:16:55,410 --> 01:16:59,620 that if you know every int is 4 bytes, you can very quickly, 1531 01:16:59,620 --> 01:17:02,220 and we'll see this next week, jump from integer to another, 1532 01:17:02,220 --> 01:17:06,600 to another in memory just by adding 4 inside of those square brackets. 1533 01:17:06,600 --> 01:17:08,430 You can very quickly poke around. 1534 01:17:08,430 --> 01:17:11,522 Whereas, if you had variable length numbers, you would have to, 1535 01:17:11,522 --> 01:17:13,980 kind of, follow, follow, follow, looking for the end of it. 1536 01:17:13,980 --> 01:17:16,780 Follow, follow-- you would have to look at more locations in memory. 1537 01:17:16,780 --> 01:17:18,322 So that's a topic we'll come back to. 1538 01:17:18,322 --> 01:17:20,700 But it was generally for efficiency. 1539 01:17:20,700 --> 01:17:22,170 And other question, yeah? 1540 01:17:22,170 --> 01:17:27,942 AUDIENCE: Why not store the nul character [INAUDIBLE] 1541 01:17:27,942 --> 01:17:31,520 DAVID MALAN: Good question why not store the-- 1542 01:17:31,520 --> 01:17:35,540 why not store the nul character at the beginning? 1543 01:17:35,540 --> 01:17:41,890 You could-- let's see, why not store it at the beginning? 1544 01:17:41,890 --> 01:17:45,080 You could do that. 1545 01:17:45,080 --> 01:17:48,325 You could absolutely-- well, could you do this? 
1546 01:17:51,580 --> 01:17:56,380 If you were to do that at the beginning-- 1547 01:17:56,380 --> 01:17:57,400 short answer, no. 1548 01:17:57,400 --> 01:17:58,420 OK, now I retract that. 1549 01:17:58,420 --> 01:18:00,628 No, because I finally thought of a problem with this. 1550 01:18:00,628 --> 01:18:02,483 If you store it at the beginning instead, 1551 01:18:02,483 --> 01:18:04,900 we'll see in just a moment how you can actually write code 1552 01:18:04,900 --> 01:18:07,150 to figure out where the end of a string is, 1553 01:18:07,150 --> 01:18:09,550 and the problem there is you wouldn't necessarily 1554 01:18:09,550 --> 01:18:13,000 know, if you eventually hit a 0, whether it's the end of the string, 1555 01:18:13,000 --> 01:18:16,810 or the number 0 in the context of Excel using some memory, 1556 01:18:16,810 --> 01:18:20,180 or some other data type, altogether. 1557 01:18:20,180 --> 01:18:22,600 So the fact that we've standardized-- 1558 01:18:22,600 --> 01:18:26,560 the fact that we've standardized strings as ending with nul 1559 01:18:26,560 --> 01:18:30,655 means that we can reliably distinguish one variable from another in memory. 1560 01:18:30,655 --> 01:18:32,560 And that's actually a perfect segue, now, 1561 01:18:32,560 --> 01:18:35,693 to actually using this primitive to build up 1562 01:18:35,693 --> 01:18:38,360 our own code that manipulates these things that are lower level. 1563 01:18:38,360 --> 01:18:39,560 So let me do this. 1564 01:18:39,560 --> 01:18:41,650 Let me create a new file called length. 1565 01:18:41,650 --> 01:18:46,000 And let's use this basic idea to figure out what the length of a string 1566 01:18:46,000 --> 01:18:50,720 is after it's been stored in a variable. 1567 01:18:50,720 --> 01:18:51,860 So let's do this.
1568 01:18:51,860 --> 01:18:56,530 Let me include both the CS50 header and the standard I/O header, 1569 01:18:56,530 --> 01:19:01,250 give myself int main(void) again here, and inside of main, do this. 1570 01:19:01,250 --> 01:19:04,060 Let me prompt the user for a string s and I'll ask them 1571 01:19:04,060 --> 01:19:08,170 for a string like their name, here. 1572 01:19:08,170 --> 01:19:13,420 And then let me name it, more verbosely, name this time. 1573 01:19:13,420 --> 01:19:15,170 Now let me go ahead and do this. 1574 01:19:15,170 --> 01:19:20,260 Let me iterate over every character in this string 1575 01:19:20,260 --> 01:19:22,180 in order to figure out what its length is. 1576 01:19:22,180 --> 01:19:25,060 So initially, I'm going to go ahead and say this, 1577 01:19:25,060 --> 01:19:28,040 int length equals 0, because I don't know what it is yet. 1578 01:19:28,040 --> 01:19:29,290 So we're going to start at 0. 1579 01:19:29,290 --> 01:19:32,410 And then while the following is true-- 1580 01:19:32,410 --> 01:19:37,370 while-- let me-- do I want to do this? 1581 01:19:37,370 --> 01:19:40,060 Let me change this to i, just for clarity, let me do 1582 01:19:40,060 --> 01:19:45,790 this, while name bracket i does not equal that special nul character. 1583 01:19:45,790 --> 01:19:49,180 So I typed it on the slide as N-U-L, but you don't write N-U-L in code, 1584 01:19:49,180 --> 01:19:53,665 you actually use its numeric equivalent, which is \0 in single quotes. 1585 01:19:53,665 --> 01:19:58,930 While name bracket i does not equal the nul character, I'm going to go ahead 1586 01:19:58,930 --> 01:20:02,470 and increment i with i plus plus. 1587 01:20:02,470 --> 01:20:05,470 And then down here I'm going to print out the value of i 1588 01:20:05,470 --> 01:20:09,270 to see what we actually get, printing out the value of i. 1589 01:20:09,270 --> 01:20:11,020 All right, so what's going to happen here? 1590 01:20:11,020 --> 01:20:13,420 Let me run make length.
1591 01:20:13,420 --> 01:20:14,740 Fortunately, no errors. 1592 01:20:14,740 --> 01:20:19,570 ./length and let me type in something like H-I, exclamation point, Enter. 1593 01:20:19,570 --> 01:20:20,740 And I get 3. 1594 01:20:20,740 --> 01:20:23,950 Let me try bye, exclamation point, Enter. 1595 01:20:23,950 --> 01:20:25,870 And I get 4. 1596 01:20:25,870 --> 01:20:28,510 Let me try my own name, David, Enter. 1597 01:20:28,510 --> 01:20:29,970 5, and so forth. 1598 01:20:29,970 --> 01:20:31,880 So what's actually going on here? 1599 01:20:31,880 --> 01:20:34,490 Well, it seems that by way of this while loop, 1600 01:20:34,490 --> 01:20:36,622 we are specifying a local variable called 1601 01:20:36,622 --> 01:20:39,580 i initialized to 0, because we're figuring out the length of the string 1602 01:20:39,580 --> 01:20:40,580 as we go. 1603 01:20:40,580 --> 01:20:44,050 I'm then asking the question, does location 0, 1604 01:20:44,050 --> 01:20:49,300 that is i in the name string, which we now know is an array, 1605 01:20:49,300 --> 01:20:51,700 does it not equal \0? 1606 01:20:51,700 --> 01:20:55,645 Because if it doesn't, that means it's an actual character like H, or B, or D. 1607 01:20:55,645 --> 01:20:57,640 So let's increment i. 1608 01:20:57,640 --> 01:21:00,910 Then, let's come back around to line 9 and let's ask the question again. 1609 01:21:00,910 --> 01:21:02,590 Now i equals 1. 1610 01:21:02,590 --> 01:21:06,420 So does name bracket 1 not equal \0? 1611 01:21:06,420 --> 01:21:12,070 Well, if it doesn't, and it won't if it's an i, or a y, or an a, 1612 01:21:12,070 --> 01:21:15,490 based on what I typed in, we're going to increment i once more. 1613 01:21:15,490 --> 01:21:18,940 Fast-forward to the end of the story, once I get to the end of the string, 1614 01:21:18,940 --> 01:21:22,420 technically, one space past the end of the string, 1615 01:21:22,420 --> 01:21:25,510 name bracket i will equal \0.
1616 01:21:25,510 --> 01:21:29,960 So I don't increment i anymore; I end up just printing the result. 1617 01:21:29,960 --> 01:21:34,510 So what we seem to have here with some low-level C code, just this while loop, 1618 01:21:34,510 --> 01:21:39,070 is a program that figures out the length of a given string that's been typed in. 1619 01:21:39,070 --> 01:21:41,860 Let's practice our abstraction and decompose this into, 1620 01:21:41,860 --> 01:21:43,270 maybe, a helper function here. 1621 01:21:43,270 --> 01:21:47,110 Let me grab all of this code here, and assume, 1622 01:21:47,110 --> 01:21:51,580 for the sake of discussion for a moment, that I can call a function now called 1623 01:21:51,580 --> 01:21:53,740 string length. 1624 01:21:53,740 --> 01:21:56,830 And the string whose length I want to get is name, 1625 01:21:56,830 --> 01:22:01,000 and then I'll go ahead and print out, just as before with %i, 1626 01:22:01,000 --> 01:22:02,398 the length of that string. 1627 01:22:02,398 --> 01:22:04,690 So now I'm abstracting away this notion of figuring out 1628 01:22:04,690 --> 01:22:05,732 the length of the string. 1629 01:22:05,732 --> 01:22:08,470 That's an opportunity for me to create my own function. 1630 01:22:08,470 --> 01:22:11,515 If I want to create a function called string length, 1631 01:22:11,515 --> 01:22:15,610 I'll claim that I want to take a string as input, 1632 01:22:15,610 --> 01:22:20,860 and what should I have this function return as its return type? 1633 01:22:20,860 --> 01:22:26,090 What should string length presumably return? 1634 01:22:26,090 --> 01:22:26,590 Yeah? 1635 01:22:26,590 --> 01:22:27,430 AUDIENCE: Int. 1636 01:22:27,430 --> 01:22:28,270 DAVID MALAN: An int, right? 1637 01:22:28,270 --> 01:22:29,020 An int makes sense. 1638 01:22:29,020 --> 01:22:30,937 Float really wouldn't make sense because we're 1639 01:22:30,937 --> 01:22:33,377 measuring things that are integers.
1640 01:22:33,377 --> 01:22:34,960 In this case, the length of something. 1641 01:22:34,960 --> 01:22:36,640 So indeed, let's have it return an int. 1642 01:22:36,640 --> 01:22:39,380 I can use the same code as before, so I'm 1643 01:22:39,380 --> 01:22:42,175 going to paste what I cut earlier in the file. 1644 01:22:42,175 --> 01:22:46,660 The only thing I have to change is the name of the variable. 1645 01:22:46,660 --> 01:22:50,240 Because in this function, I decided arbitrarily 1646 01:22:50,240 --> 01:22:53,130 that I'm going to call it s, just to be more generic. 1647 01:22:53,130 --> 01:22:55,915 So I'm going to look at s bracket i at each location. 1648 01:22:55,915 --> 01:22:58,790 And I don't want to print it at the end; that would be a side effect. 1649 01:22:58,790 --> 01:23:01,250 What's the line of code I should include here if I actually 1650 01:23:01,250 --> 01:23:04,005 want to hand back the total length? 1651 01:23:04,005 --> 01:23:04,505 Yeah? 1652 01:23:04,505 --> 01:23:05,362 AUDIENCE: Return i. 1653 01:23:05,362 --> 01:23:06,320 DAVID MALAN: Say again? 1654 01:23:06,320 --> 01:23:07,112 AUDIENCE: Return i. 1655 01:23:07,112 --> 01:23:09,270 DAVID MALAN: Return i, in this case. 1656 01:23:09,270 --> 01:23:11,540 So I'm going to return i, not print it. 1657 01:23:11,540 --> 01:23:16,490 Because now, my main function can use the return value stored in length 1658 01:23:16,490 --> 01:23:18,530 and print it on the next line itself. 1659 01:23:18,530 --> 01:23:22,520 I just need a prototype, so that's my one forgivable copy paste here. 1660 01:23:22,520 --> 01:23:24,170 I'm going to rerun make length. 1661 01:23:24,170 --> 01:23:25,640 Hopefully I didn't screw up. 1662 01:23:25,640 --> 01:23:29,330 I didn't. ./length, I'll type in hi-- oops-- 1663 01:23:29,330 --> 01:23:31,340 I'll type in hi, again. 1664 01:23:31,340 --> 01:23:31,880 That works. 1665 01:23:31,880 --> 01:23:34,970 I'll type in bye again, and so forth.
1666 01:23:34,970 --> 01:23:38,703 So now we have a function that determines the length of a string. 1667 01:23:38,703 --> 01:23:41,120 Well, it turns out we didn't actually need this all along. 1668 01:23:41,120 --> 01:23:46,042 It turns out that we can get rid of my own custom string length function here. 1669 01:23:46,042 --> 01:23:48,500 I can definitely delete the whole implementation down here. 1670 01:23:48,500 --> 01:23:52,160 Because it turns out, in a file called string.h, 1671 01:23:52,160 --> 01:23:55,520 which is a new header file today, we actually have access to a function 1672 01:23:55,520 --> 01:23:59,690 called, more succinctly, strlen, S-T-R-L-E-N, which 1673 01:23:59,690 --> 01:24:01,130 literally does that. 1674 01:24:01,130 --> 01:24:05,240 This is a function that comes with C, albeit in the string.h header file, 1675 01:24:05,240 --> 01:24:09,450 and it does what we just implemented manually. 1676 01:24:09,450 --> 01:24:13,340 So here's an example of, admittedly, a wheel we just reinvented, but no more. 1677 01:24:13,340 --> 01:24:14,480 We don't have to do that. 1678 01:24:14,480 --> 01:24:16,850 And how do you know what kinds of functions exist? 1679 01:24:16,850 --> 01:24:21,260 Well, let me pop over to my browser here, to a website that 1680 01:24:21,260 --> 01:24:24,455 is CS50's incarnation of what are called manual pages. 1681 01:24:24,455 --> 01:24:28,070 It turns out that on a lot of systems, Macs, and Unix, 1682 01:24:28,070 --> 01:24:31,100 and Linux systems, including the Visual Studio Code 1683 01:24:31,100 --> 01:24:33,020 instance that we have in the cloud, there 1684 01:24:33,020 --> 01:24:36,290 are publicly accessible manual pages for functions. 1685 01:24:36,290 --> 01:24:39,770 They tend to be written very expertly, in a way that's 1686 01:24:39,770 --> 01:24:41,160 not very beginner-friendly.
1687 01:24:41,160 --> 01:24:45,650 So what we have here at manual.cs50.io is CS50's version 1688 01:24:45,650 --> 01:24:48,740 of manual pages that have this less-comfortable mode that 1689 01:24:48,740 --> 01:24:51,290 gives you a, sort of, cheat sheet of very frequently used, 1690 01:24:51,290 --> 01:24:55,010 helpful functions in C. And we've translated the expert 1691 01:24:55,010 --> 01:24:58,075 notation to things that a beginner can understand. 1692 01:24:58,075 --> 01:25:02,190 So, for instance, let me go ahead and search for string up at the top here. 1693 01:25:02,190 --> 01:25:06,200 You'll see that there's documentation for our own get string function, 1694 01:25:06,200 --> 01:25:08,510 but more interestingly, down here, there's 1695 01:25:08,510 --> 01:25:10,850 a whole bunch of string-related functions 1696 01:25:10,850 --> 01:25:12,620 that we haven't even seen most of, yet. 1697 01:25:12,620 --> 01:25:14,660 But there's indeed one here called strlen, 1698 01:25:14,660 --> 01:25:16,620 calculate the length of a string. 1699 01:25:16,620 --> 01:25:22,160 And so if I go to strlen here, I'll see some less-comfortable documentation 1700 01:25:22,160 --> 01:25:22,970 for this function. 1701 01:25:22,970 --> 01:25:25,400 And the way a manual page typically works, 1702 01:25:25,400 --> 01:25:28,310 whether in CS50's format or any other system, 1703 01:25:28,310 --> 01:25:30,950 is you see, typically, a synopsis of what header 1704 01:25:30,950 --> 01:25:33,330 files you need to use the function. 1705 01:25:33,330 --> 01:25:35,960 So you would copy paste these couple of lines here. 1706 01:25:35,960 --> 01:25:39,530 You see what the prototype is of the function so 1707 01:25:39,530 --> 01:25:42,533 that you know what its inputs are, if any, and its outputs are, if any. 1708 01:25:42,533 --> 01:25:45,200 Then down below you might see a description, which in this case, 1709 01:25:45,200 --> 01:25:46,320 is pretty straightforward.
1710 01:25:46,320 --> 01:25:48,170 This function calculates the length of s. 1711 01:25:48,170 --> 01:25:51,110 Then you see what the return value is, if any, 1712 01:25:51,110 --> 01:25:54,310 and you might even see an example, like this one that we've whipped up here. 1713 01:25:54,310 --> 01:25:57,012 So these manual pages, which are, again, accessible 1714 01:25:57,012 --> 01:25:59,720 here, and we'll link to these in the problem sets moving forward, 1715 01:25:59,720 --> 01:26:02,510 are pretty much the place to start when you want to figure out 1716 01:26:02,510 --> 01:26:05,210 has a wheel been invented already? 1717 01:26:05,210 --> 01:26:08,490 Is there a function that might help me solve some problem set problems 1718 01:26:08,490 --> 01:26:11,900 so that I don't have to really get into the weeds of doing all 1719 01:26:11,900 --> 01:26:13,712 of those lower-level steps as I've had to? 1720 01:26:13,712 --> 01:26:16,670 Sometimes the answer is going to be yes, sometimes it's going to be no. 1721 01:26:16,670 --> 01:26:19,160 But again, the point of our having just done this together 1722 01:26:19,160 --> 01:26:21,950 is to reveal that even the functions you start taking for 1723 01:26:21,950 --> 01:26:26,135 granted, they all reduce to some of these basic building blocks. 1724 01:26:26,135 --> 01:26:29,600 At the end of the day, this is all that's inside of your computer 1725 01:26:29,600 --> 01:26:30,950 is 0s and 1s. 1726 01:26:30,950 --> 01:26:33,060 We're just learning, now, how to harness those 1727 01:26:33,060 --> 01:26:37,220 and how to manipulate them ourselves. 1728 01:26:37,220 --> 01:26:41,510 Any questions here on this? 1729 01:26:41,510 --> 01:26:43,305 Any questions at all? 1730 01:26:43,305 --> 01:26:43,805 Yeah. 1731 01:26:43,805 --> 01:26:51,779 AUDIENCE: We did just see [INAUDIBLE] Is that so common 1732 01:26:51,779 --> 01:26:54,035 that we would have to specify it, or is it not?
1733 01:26:54,035 --> 01:26:55,160 DAVID MALAN: Good question. 1734 01:26:55,160 --> 01:26:57,920 Is it so common that you would have to specify it or not? 1735 01:26:57,920 --> 01:27:00,170 You do need to include its header files because that's 1736 01:27:00,170 --> 01:27:01,670 where all of those prototypes are. 1737 01:27:01,670 --> 01:27:05,190 You don't need to worry about linking it in with -l anything. 1738 01:27:05,190 --> 01:27:07,340 And in fact, moving forward, you do not ever 1739 01:27:07,340 --> 01:27:10,910 need to worry about linking in libraries when compiling your code. 1740 01:27:10,910 --> 01:27:14,940 We, the staff, have configured make to do all of that for you automatically. 1741 01:27:14,940 --> 01:27:17,030 We want you to understand that it is doing it, 1742 01:27:17,030 --> 01:27:19,340 but we'll take care of all of the -l's for you. 1743 01:27:19,340 --> 01:27:23,360 But the onus is on you for the prototypes and the header files. 1744 01:27:23,360 --> 01:27:27,150 Other questions on these representations or techniques? 1745 01:27:27,150 --> 01:27:27,650 Yeah? 1746 01:27:27,650 --> 01:27:35,920 AUDIENCE: [INAUDIBLE] exclamation mark. 1747 01:27:35,920 --> 01:27:40,524 How does it actually define the spaces [INAUDIBLE]? 1748 01:27:40,524 --> 01:27:41,920 DAVID MALAN: A good question. 1749 01:27:41,920 --> 01:27:45,700 If you were to have a string with actual spaces in it, that is, multiple words, 1750 01:27:45,700 --> 01:27:47,530 what would the computer actually do? 1751 01:27:47,530 --> 01:27:49,960 Well, for this, let me go to asciichart.com, 1752 01:27:49,960 --> 01:27:54,880 which is just a random website that's my go-to for the first 127 characters 1753 01:27:54,880 --> 01:27:55,930 of ASCII. 1754 01:27:55,930 --> 01:27:58,520 This is, in fact, what we had a screenshot of the other day. 1755 01:27:58,520 --> 01:28:02,088 And if you look here, it's a little non-obvious, but S-P is space.
1756 01:28:02,088 --> 01:28:05,380 If a computer were to store a space, it would actually store the decimal number 1757 01:28:05,380 --> 01:28:10,430 32, or technically, the pattern of 0s and 1s that represents the number 32. 1758 01:28:10,430 --> 01:28:13,240 All of the US English keys that you might type on a keyboard 1759 01:28:13,240 --> 01:28:16,390 can be represented with a number, and using Unicode 1760 01:28:16,390 --> 01:28:18,920 you can express even things like emojis and other languages. 1761 01:28:18,920 --> 01:28:19,420 Yeah? 1762 01:28:19,420 --> 01:28:23,130 AUDIENCE: Are only strings followed by nul, 1763 01:28:23,130 --> 01:28:26,516 or let's say we had a series of numbers, would each one of them 1764 01:28:26,516 --> 01:28:27,845 be accompanied by nuls? 1765 01:28:27,845 --> 01:28:28,970 DAVID MALAN: Good question. 1766 01:28:28,970 --> 01:28:31,790 Only strings are accompanied by nuls at the end 1767 01:28:31,790 --> 01:28:34,760 because every other data type we've talked about thus far 1768 01:28:34,760 --> 01:28:37,130 is of well-defined, finite length. 1769 01:28:37,130 --> 01:28:40,190 1 byte for a char, 4 bytes for an int, and so forth. 1770 01:28:40,190 --> 01:28:44,240 If we think back to last week, we did end the week with a couple of problems. 1771 01:28:44,240 --> 01:28:48,080 Integer overflow, because 4 bytes, heck, even 8 bytes is sometimes not enough. 1772 01:28:48,080 --> 01:28:50,270 We also talked about floating-point imprecision. 1773 01:28:50,270 --> 01:28:53,480 Thankfully, in the world of scientific computing and financial computing, 1774 01:28:53,480 --> 01:28:56,930 there are libraries you can use that draw inspiration 1775 01:28:56,930 --> 01:28:58,820 from this idea of a string, and they might 1776 01:28:58,820 --> 01:29:02,640 use 9 bytes for an integer value or maybe 20 bytes 1777 01:29:02,640 --> 01:29:04,170 so that you can count really high.
1778 01:29:04,170 --> 01:29:06,680 But they will then start to manage that memory for you, 1779 01:29:06,680 --> 01:29:09,960 and what they're really probably doing is just grabbing a whole bunch of bytes 1780 01:29:09,960 --> 01:29:13,070 and somehow remembering how long the sequence of bytes is. 1781 01:29:13,070 --> 01:29:16,190 That's how these higher-level libraries work, too. 1782 01:29:16,190 --> 01:29:17,700 All right, this has been a lot. 1783 01:29:17,700 --> 01:29:19,080 Let's take one more break here. 1784 01:29:19,080 --> 01:29:20,670 We'll do a seven-minute break here. 1785 01:29:20,670 --> 01:29:23,465 And when we come back, we'll flesh out a few more details. 1786 01:29:23,465 --> 01:29:26,390 All right. 1787 01:29:26,390 --> 01:29:31,400 So we just saw strlen as an example of a function that 1788 01:29:31,400 --> 01:29:32,898 comes in the string library. 1789 01:29:32,898 --> 01:29:35,690 Let's start to take more of these library functions out for a spin, 1790 01:29:35,690 --> 01:29:39,530 so we're not relying only on the built-ins that we saw last week. 1791 01:29:39,530 --> 01:29:41,660 Let me switch over to VS Code 1792 01:29:41,660 --> 01:29:46,040 and create a file called, say, string.c 1793 01:29:46,040 --> 01:29:48,115 to apply this lesson learned, as follows. 1794 01:29:48,115 --> 01:29:54,770 Let me include cs50.h, stdio.h, and this new thing, 1795 01:29:54,770 --> 01:29:57,260 string.h as well, at the top. 1796 01:29:57,260 --> 01:29:59,698 I'm going to do the usual int main(void) here. 1797 01:29:59,698 --> 01:30:02,240 And then in this program, suppose, for the sake of discussion, 1798 01:30:02,240 --> 01:30:05,540 that I didn't know about %s for printf or, heck, 1799 01:30:05,540 --> 01:30:09,300 maybe early on there was no %s format code. 1800 01:30:09,300 --> 01:30:12,420 And so there was no easy way to print strings.
1801 01:30:12,420 --> 01:30:15,830 Well, at least if we know that strings are just arrays of characters, 1802 01:30:15,830 --> 01:30:19,820 we could use %c as a workaround, a solution to that, 1803 01:30:19,820 --> 01:30:21,420 sort of, contrived problem. 1804 01:30:21,420 --> 01:30:24,920 So let me ask myself for a string s by using get string here 1805 01:30:24,920 --> 01:30:27,500 and I'll ask the user for some input. 1806 01:30:27,500 --> 01:30:33,260 And then, let me print out, say, output, and all I want to do is print back out 1807 01:30:33,260 --> 01:30:34,460 what the user typed. 1808 01:30:34,460 --> 01:30:38,000 Now, the simplest way to do this, of course, is going to be like last week, 1809 01:30:38,000 --> 01:30:40,960 printf %s, and plug in the s, and we're done. 1810 01:30:40,960 --> 01:30:43,730 But again, for the sake of discussion, I forgot about, 1811 01:30:43,730 --> 01:30:47,820 or someone didn't implement %s, so how else could we do this? 1812 01:30:47,820 --> 01:30:51,800 Well, in pseudocode, or in English, what's the gist of how we could solve 1813 01:30:51,800 --> 01:30:58,910 this problem, printing out the string s on the screen without using %s? 1814 01:30:58,910 --> 01:31:02,420 How might we go about solving this? 1815 01:31:02,420 --> 01:31:04,147 Just in English, high-level? 1816 01:31:04,147 --> 01:31:05,730 What would your pseudocode look like? 1817 01:31:05,730 --> 01:31:06,230 Yeah? 1818 01:31:06,230 --> 01:31:09,568 AUDIENCE: You could just print each letter. 1819 01:31:09,568 --> 01:31:11,360 DAVID MALAN: OK, so just print each letter. 1820 01:31:11,360 --> 01:31:13,490 And maybe, more precisely, some kind of loop. 1821 01:31:13,490 --> 01:31:17,030 Like, let's iterate over all of the characters in s 1822 01:31:17,030 --> 01:31:18,150 and print one at a time. 1823 01:31:18,150 --> 01:31:19,290 So how can I do that?
1824 01:31:19,290 --> 01:31:24,050 Well, for int i gets 0 is kind of the go-to starting point for most loops, 1825 01:31:24,050 --> 01:31:25,580 i is less than-- 1826 01:31:25,580 --> 01:31:27,365 OK, how long do I want to iterate? 1827 01:31:27,365 --> 01:31:29,240 Well, it's going to depend on what I type in, 1828 01:31:29,240 --> 01:31:31,300 but that's why we have strlen now. 1829 01:31:31,300 --> 01:31:36,080 So iterate up to the length of s, and then increment i with plus 1830 01:31:36,080 --> 01:31:37,075 plus on each iteration. 1831 01:31:37,075 --> 01:31:40,670 And then let's just print out %c with no new line, 1832 01:31:40,670 --> 01:31:43,010 because I want everything on the same line, 1833 01:31:43,010 --> 01:31:47,780 whatever the character is at s bracket i. 1834 01:31:47,780 --> 01:31:49,790 And then at the very end, I'll give myself 1835 01:31:49,790 --> 01:31:52,350 that new line, just to move the cursor down to the next line 1836 01:31:52,350 --> 01:31:54,350 so the dollar sign is not in a weird place. 1837 01:31:54,350 --> 01:31:57,230 All right, so let's see if I didn't screw up any of the code, 1838 01:31:57,230 --> 01:32:02,690 make string, Enter, so far so good, ./string, and let me type in something 1839 01:32:02,690 --> 01:32:04,520 like, hi, Enter. 1840 01:32:04,520 --> 01:32:06,020 And I see output of hi, too. 1841 01:32:06,020 --> 01:32:09,680 Let me do it once more with bye, Enter, and that works, too. 1842 01:32:09,680 --> 01:32:12,410 Notice I very deliberately and quickly gave myself 1843 01:32:12,410 --> 01:32:15,260 two spaces here and one space here just because I, literally, 1844 01:32:15,260 --> 01:32:18,620 wanted these things to line up properly, and input is shorter than output. 1845 01:32:18,620 --> 01:32:21,830 But that was just a deliberate formatting detail. 1846 01:32:21,830 --> 01:32:23,520 So this code is correct, 1847 01:32:23,520 --> 01:32:29,240 which is a claim I've made before, but it's not well-designed.
1848 01:32:29,240 --> 01:32:33,170 It is well-designed in that I'm using someone else's library function, 1849 01:32:33,170 --> 01:32:35,660 like, I've not reinvented a wheel; there's no line 15 1850 01:32:35,660 --> 01:32:38,270 or below; I didn't implement string length myself. 1851 01:32:38,270 --> 01:32:43,640 So I'm at least practicing what I've preached. 1852 01:32:43,640 --> 01:32:48,360 But there's still an imperfection, a suboptimality. 1853 01:32:48,360 --> 01:32:50,910 This one's really subtle, though. 1854 01:32:50,910 --> 01:32:54,330 And you have to think about how loops work. 1855 01:32:54,330 --> 01:32:58,640 What am I doing that's not super efficient? 1856 01:32:58,640 --> 01:32:59,870 Yeah, in back? 1857 01:32:59,870 --> 01:33:03,178 AUDIENCE: [INAUDIBLE] over and over again. 1858 01:33:03,178 --> 01:33:04,970 DAVID MALAN: Yeah, this is a little subtle. 1859 01:33:04,970 --> 01:33:07,460 But if you think back to the basic definition of a for loop 1860 01:33:07,460 --> 01:33:10,070 and recall when I highlighted things last week, what happens? 1861 01:33:10,070 --> 01:33:12,830 Well, the first thing is that i gets set to 0. 1862 01:33:12,830 --> 01:33:14,310 Then we check the condition. 1863 01:33:14,310 --> 01:33:15,560 How do we check the condition? 1864 01:33:15,560 --> 01:33:18,380 We call strlen on s, we get back an answer 1865 01:33:18,380 --> 01:33:24,810 like 3 if it's H-I exclamation point, and 0 is less than 3, so that's fine, 1866 01:33:24,810 --> 01:33:26,570 and then we print out the character. 1867 01:33:26,570 --> 01:33:29,060 Then we increment i from 0 to 1. 1868 01:33:29,060 --> 01:33:30,468 We recheck the condition. 1869 01:33:30,468 --> 01:33:31,760 How do I recheck the condition? 1870 01:33:31,760 --> 01:33:34,100 I call strlen of s. 1871 01:33:34,100 --> 01:33:36,890 Get back the same answer, 3. 1872 01:33:36,890 --> 01:33:38,720 Compare 3 against 1. 1873 01:33:38,720 --> 01:33:39,800 We're still good.
1874 01:33:39,800 --> 01:33:44,690 So we print out another character. i gets incremented again; i is now 2. 1875 01:33:44,690 --> 01:33:46,035 We check the condition. 1876 01:33:46,035 --> 01:33:46,910 What's the condition? 1877 01:33:46,910 --> 01:33:47,960 Well, what's the strlen of s? 1878 01:33:47,960 --> 01:33:48,980 It's still 3. 1879 01:33:48,980 --> 01:33:51,860 2 is still less than 3. 1880 01:33:51,860 --> 01:33:55,430 So I keep asking the same question, sort of stupidly, 1881 01:33:55,430 --> 01:33:58,220 because the string is, presumably, never changing in length. 1882 01:33:58,220 --> 01:34:00,158 And indeed, every time I check that condition, 1883 01:34:00,158 --> 01:34:01,700 that function is going to get called. 1884 01:34:01,700 --> 01:34:04,380 And every time, the answer for hi is going to be 3. 1885 01:34:04,380 --> 01:34:04,880 3. 1886 01:34:04,880 --> 01:34:06,095 3. 1887 01:34:06,095 --> 01:34:10,850 So it's a marginal suboptimality, but I could do better, right? 1888 01:34:10,850 --> 01:34:15,560 Don't ask the same question multiple times if you can remember the answer. 1889 01:34:15,560 --> 01:34:20,960 So how could I remember the answer to this question and ask it just once? 1890 01:34:20,960 --> 01:34:24,750 How could I remember the answer to this question? 1891 01:34:24,750 --> 01:34:25,250 Let me see. 1892 01:34:25,250 --> 01:34:26,030 Yeah, back there? 1893 01:34:26,030 --> 01:34:27,446 AUDIENCE: Store it in a variable. 1894 01:34:27,446 --> 01:34:29,180 DAVID MALAN: So store it in a variable, right? 1895 01:34:29,180 --> 01:34:32,097 That's been our answer most any time we want to keep something around. 1896 01:34:32,097 --> 01:34:33,120 So how could I do this? 1897 01:34:33,120 --> 01:34:37,880 Well, I could do something like this, int, maybe, length equals strlen of s. 1898 01:34:37,880 --> 01:34:41,200 Then I can just change this function call. 1899 01:34:41,200 --> 01:34:43,160 Let me fix my spelling here.
1900 01:34:43,160 --> 01:34:47,360 Let me fix this to be comparing against length, and this is now OK, 1901 01:34:47,360 --> 01:34:50,240 because now strlen is only called once, on line 9. 1902 01:34:50,240 --> 01:34:52,740 And I'm reusing the value of that variable, a.k.a. 1903 01:34:52,740 --> 01:34:54,240 length, again, and again, and again. 1904 01:34:54,240 --> 01:34:55,282 So that's more efficient. 1905 01:34:55,282 --> 01:34:59,760 It turns out that for loops let you declare multiple variables at once, 1906 01:34:59,760 --> 01:35:04,020 so we can do this a little more elegantly, all in one line. 1907 01:35:04,020 --> 01:35:06,770 And this is just some syntactic improvement. 1908 01:35:06,770 --> 01:35:11,930 I could actually do something like this, n equals strlen of s, 1909 01:35:11,930 --> 01:35:14,750 and then I could just say n here, or I could call it length. 1910 01:35:14,750 --> 01:35:17,667 But heck, while I'm being succinct, I'm just going to use n for number. 1911 01:35:17,667 --> 01:35:22,100 So now it's just a marginal change, but I've now 1912 01:35:22,100 --> 01:35:26,030 declared two variables inside of my loop, i and n. 1913 01:35:26,030 --> 01:35:29,300 i is set to 0. n is set to the string length of s. 1914 01:35:29,300 --> 01:35:33,380 But now, hereafter, all of my condition checks are just, i less than n, 1915 01:35:33,380 --> 01:35:36,170 i less than n, and n is never changing. 1916 01:35:36,170 --> 01:35:38,008 All right, so a marginal improvement there. 1917 01:35:38,008 --> 01:35:39,800 Now that I've used this new function, let's 1918 01:35:39,800 --> 01:35:41,925 use some other functions that might be of interest. 1919 01:35:41,925 --> 01:35:48,680 Let me write a quick program here that capitalizes the beginning of-- 1920 01:35:48,680 --> 01:35:51,810 changes to uppercase some string that the user types in. 1921 01:35:51,810 --> 01:35:55,490 So let me code a file called uppercase.c.
1922 01:35:55,490 --> 01:36:01,520 Up here I'll use my new friends, cs50.h, and standard I/O, and string.h. 1923 01:36:01,520 --> 01:36:07,070 So standard I/O, and string.h. So, just as before, int main(void). 1924 01:36:07,070 --> 01:36:09,620 And then inside of main, what I'm going to do this time 1925 01:36:09,620 --> 01:36:14,390 is ask the user for a string s, using get string, asking them 1926 01:36:14,390 --> 01:36:15,680 for the before value. 1927 01:36:15,680 --> 01:36:20,130 And then let me print out something like after, 1928 01:36:20,130 --> 01:36:24,410 so that it-- just so I can see what the uppercase version thereof is. 1929 01:36:24,410 --> 01:36:28,610 And then after this, let me do the following, for int i 1930 01:36:28,610 --> 01:36:32,030 equals 0, oh, let's practice that same lesson, 1931 01:36:32,030 --> 01:36:37,790 so n equals the string length of s, i is less than n, i plus plus. 1932 01:36:37,790 --> 01:36:41,600 So really, nothing fundamentally new yet. 1933 01:36:41,600 --> 01:36:47,270 How do I now convert characters from lowercase, if they are, to uppercase? 1934 01:36:47,270 --> 01:36:50,000 In other words, if I type in hi, H-I in lowercase, 1935 01:36:50,000 --> 01:36:55,490 I want my program, now, to uppercase everything to capital H, capital I. 1936 01:36:55,490 --> 01:36:58,770 Well, how can I go about doing this? 1937 01:36:58,770 --> 01:37:01,010 Well, you might recall that there is this-- 1938 01:37:01,010 --> 01:37:03,900 you might recall that there is this ASCII chart. 1939 01:37:03,900 --> 01:37:06,855 So let's just consult this real quick on asciichart.com. 1940 01:37:06,855 --> 01:37:11,510 We looked at this last week. Notice that a-- capital A is 65, 1941 01:37:11,510 --> 01:37:15,440 capital B is 66, capital C is 67, and heck, here's 1942 01:37:15,440 --> 01:37:19,640 lowercase a, lowercase b, lowercase c, and that's 97, 98, 99.
1943 01:37:19,640 --> 01:37:22,980 And if I actually do some math, there's a distance of 32. 1944 01:37:22,980 --> 01:37:23,480 Right? 1945 01:37:23,480 --> 01:37:25,640 So if I want to go from uppercase to lowercase, 1946 01:37:25,640 --> 01:37:30,788 I can do 65 plus 32, which will give me 97, and that actually works out 1947 01:37:30,788 --> 01:37:32,330 across the board for everything else. 1948 01:37:32,330 --> 01:37:36,020 66 plus 32 gets me to 98, or lowercase b. 1949 01:37:36,020 --> 01:37:40,640 Or conversely, if you have a lowercase a, and its value is 97, 1950 01:37:40,640 --> 01:37:46,850 subtract 32 and boom, you have capital A. So there's some arithmetic involved. 1951 01:37:46,850 --> 01:37:49,460 But now that we know that strings are just arrays, 1952 01:37:49,460 --> 01:37:53,330 and we know that characters, which are in those arrays, 1953 01:37:53,330 --> 01:37:56,450 are just binary representations of numbers, 1954 01:37:56,450 --> 01:37:59,297 I think we can manipulate a few of these things as follows. 1955 01:37:59,297 --> 01:38:01,130 Let me go back to my program here, and first 1956 01:38:01,130 --> 01:38:05,360 ask the question, if the current character in the array during this loop 1957 01:38:05,360 --> 01:38:08,930 is lowercase, let's force it to uppercase. 1958 01:38:08,930 --> 01:38:10,250 So how am I going to do that?
1959 01:38:10,250 --> 01:38:16,460 If the character at s bracket i, the current location in the array, 1960 01:38:16,460 --> 01:38:21,320 is greater than or equal to lowercase a, and s bracket 1961 01:38:21,320 --> 01:38:26,660 i is less than or equal to lowercase z, kind of a weird Boolean 1962 01:38:26,660 --> 01:38:31,460 expression, but it's completely legitimate, because in this array 1963 01:38:31,460 --> 01:38:34,230 s is a whole bunch of characters that the humans typed in, 1964 01:38:34,230 --> 01:38:37,520 because that's what a string is. Greater than or equal to a might 1965 01:38:37,520 --> 01:38:39,680 be a little nonsensical because when have you ever 1966 01:38:39,680 --> 01:38:41,330 compared numbers to letters? 1967 01:38:41,330 --> 01:38:47,568 But we know from week 0 lowercase a is 97, lowercase z is, what is it, 1-- 1968 01:38:47,568 --> 01:38:48,485 I don't even remember. 1969 01:38:48,485 --> 01:38:49,065 AUDIENCE: 132. 1970 01:38:49,065 --> 01:38:49,850 DAVID MALAN: What's that? 1971 01:38:49,850 --> 01:38:50,590 AUDIENCE: 132? 1972 01:38:50,590 --> 01:38:52,590 DAVID MALAN: 132, we know. 1973 01:38:52,590 --> 01:38:56,390 And so that would allow us to answer the question, is the current letter 1974 01:38:56,390 --> 01:38:57,410 lowercase? 1975 01:38:57,410 --> 01:39:00,530 All right, so let me answer that question. 1976 01:39:00,530 --> 01:39:03,140 If it is, what do I want to print out? 1977 01:39:03,140 --> 01:39:05,870 I don't want to print out the letter itself, 1978 01:39:05,870 --> 01:39:09,290 I want to print out the letter minus 32, right? 1979 01:39:09,290 --> 01:39:13,160 Because if it happens to be a lowercase a, 97, 97 minus 32 1980 01:39:13,160 --> 01:39:15,530 gives me 65, which is uppercase A, and I know that 1981 01:39:15,530 --> 01:39:18,860 just from having stared at that chart in the past.
1982 01:39:18,860 --> 01:39:24,172 Else, if the character is not between little a and little z, 1983 01:39:24,172 --> 01:39:25,880 I'm just going to print out the character 1984 01:39:25,880 --> 01:39:28,550 itself by printing s bracket i. 1985 01:39:28,550 --> 01:39:31,580 And at the very end of this, I'm going to print out a new line just 1986 01:39:31,580 --> 01:39:33,480 to move the cursor to the next line. 1987 01:39:33,480 --> 01:39:34,930 So again, it's a little wordy. 1988 01:39:34,930 --> 01:39:39,020 But this loop here, which I borrowed from our code previously, 1989 01:39:39,020 --> 01:39:41,510 just iterates over the string, a.k.a. 1990 01:39:41,510 --> 01:39:44,630 array, character-by-character, through its length. 1991 01:39:44,630 --> 01:39:47,360 This line 11 here is just asking the question 1992 01:39:47,360 --> 01:39:50,870 if that current character, the i-th character of s, 1993 01:39:50,870 --> 01:39:53,900 is greater than or equal to little a and less 1994 01:39:53,900 --> 01:39:59,240 than or equal to little z, that is between 97 and 132, then 1995 01:39:59,240 --> 01:40:04,940 we're going to go ahead and force it to uppercase instead. 1996 01:40:04,940 --> 01:40:09,290 All right, and let me zoom out here for just a second. 1997 01:40:09,290 --> 01:40:14,270 And sorry, I misspoke: 122, which is what you might have said. 1998 01:40:14,270 --> 01:40:15,630 There are only 26 letters. 1999 01:40:15,630 --> 01:40:17,270 So 122 is little z. 2000 01:40:17,270 --> 01:40:20,280 Let me go ahead now and compile and run this program. 2001 01:40:20,280 --> 01:40:26,210 So make uppercase, ./uppercase, and let me type in hi in lowercase, Enter. 2002 01:40:26,210 --> 01:40:28,520 And there's the capitalized version, thereof. 2003 01:40:28,520 --> 01:40:30,920 Let me do it again, with my own name in lowercase, 2004 01:40:30,920 --> 01:40:33,100 and now it's capitalized as well. 2005 01:40:33,100 --> 01:40:34,860 Well, what could we do to improve this?
2006 01:40:34,860 --> 01:40:35,360 Well, 2007 01:40:35,360 --> 01:40:35,960 you know what? 2008 01:40:35,960 --> 01:40:37,640 Let's stop reinventing wheels. 2009 01:40:37,640 --> 01:40:39,840 Let's go to the manual pages. 2010 01:40:39,840 --> 01:40:43,490 So let me go here and search for something like, I don't know, 2011 01:40:43,490 --> 01:40:44,540 lowercase. 2012 01:40:44,540 --> 01:40:45,620 And there I go. 2013 01:40:45,620 --> 01:40:48,470 I did some autocomplete here; our little search box 2014 01:40:48,470 --> 01:40:50,720 is saying that, OK, there's an is-lower function, 2015 01:40:50,720 --> 01:40:52,550 check whether a character is lowercase. 2016 01:40:52,550 --> 01:40:53,640 Well, how do I use this? 2017 01:40:53,640 --> 01:40:59,150 Well, let me check is-lower; now I see the actual man page for this function. 2018 01:40:59,150 --> 01:41:01,850 Now we see, include ctype.h. 2019 01:41:01,850 --> 01:41:02,902 So that's the protot-- 2020 01:41:02,902 --> 01:41:04,610 that's the header file I need to include. 2021 01:41:04,610 --> 01:41:08,570 This is the prototype for is-lower; it apparently takes a char as input 2022 01:41:08,570 --> 01:41:10,330 and returns an int. 2023 01:41:10,330 --> 01:41:11,330 Which is a little weird. 2024 01:41:11,330 --> 01:41:14,400 I feel like is-lower should return true or false. 2025 01:41:14,400 --> 01:41:18,680 So let's scroll down to the description and return value. 2026 01:41:18,680 --> 01:41:20,810 It returns, oh, this is interesting. 2027 01:41:20,810 --> 01:41:25,370 And this is a convention in C. This function returns a non-zero int 2028 01:41:25,370 --> 01:41:30,820 if c is a lowercase letter and 0 if c is not a lowercase letter. 2029 01:41:30,820 --> 01:41:33,230 So it returns non-zero. 2030 01:41:33,230 --> 01:41:38,330 So like 1, negative 1, something that's not 0 if c is a lowercase letter, 2031 01:41:38,330 --> 01:41:41,400 and 0 if it is not a lowercase letter.
2032 01:41:41,400 --> 01:41:43,160 So how can we use this building block? 2033 01:41:43,160 --> 01:41:45,230 Let me go back to my code here. 2034 01:41:45,230 --> 01:41:49,610 Let me add this file, include ctype.h. 2035 01:41:49,610 --> 01:41:53,120 And down here, let me get rid of this cryptic expression, which 2036 01:41:53,120 --> 01:41:59,060 was kind of painful to come up with, and just ask this, is-lower s bracket i? 2037 01:42:01,970 --> 01:42:05,390 That should actually work but why? 2038 01:42:05,390 --> 01:42:10,520 Well is-lower, again, returns a non-zero value if the letter is lowercase. 2039 01:42:10,520 --> 01:42:12,150 Well, what does that mean? 2040 01:42:12,150 --> 01:42:13,415 That means it could return 1. 2041 01:42:13,415 --> 01:42:14,540 It could return negative 1. 2042 01:42:14,540 --> 01:42:16,370 It could return 50 or negative 50. 2043 01:42:16,370 --> 01:42:18,650 It's actually not precisely defined, why? 2044 01:42:18,650 --> 01:42:19,700 Just, because. 2045 01:42:19,700 --> 01:42:23,750 This was a common convention to use 0 to represent false and use 2046 01:42:23,750 --> 01:42:26,120 any other value to represent true. 2047 01:42:26,120 --> 01:42:30,140 And so it turns out, that inside of Boolean expressions, 2048 01:42:30,140 --> 01:42:34,755 if you put a value like a function call like this, that returns 0, 2049 01:42:34,755 --> 01:42:36,380 that's going to be equivalent to false. 2050 01:42:36,380 --> 01:42:38,975 It's like the answer being no, it is not lower. 2051 01:42:38,975 --> 01:42:41,990 But you can also, in parentheses, put the name 2052 01:42:41,990 --> 01:42:45,920 of the function and its arguments, and not compare it against anything. 2053 01:42:45,920 --> 01:42:51,230 Because we could do something like this, well if it's not equal to 0, then 2054 01:42:51,230 --> 01:42:52,247 it must be lowercase. 
2055 01:42:52,247 --> 01:42:54,830 Because that's the definition: if it returns a non-zero value, 2056 01:42:54,830 --> 01:42:55,760 it's lowercase. 2057 01:42:55,760 --> 01:42:59,210 But a more succinct way to do that is just a bit more like English. 2058 01:42:59,210 --> 01:43:04,110 If is-lower, then print out the character minus 32. 2059 01:43:04,110 --> 01:43:06,590 So this would be the common way of using one of these 2060 01:43:06,590 --> 01:43:10,025 is- functions to check if the answer is true or false. 2061 01:43:10,025 --> 01:43:12,810 AUDIENCE: [INAUDIBLE] 2062 01:43:12,810 --> 01:43:14,670 DAVID MALAN: OK, well, we might be done. 2063 01:43:14,670 --> 01:43:15,170 OK. 2064 01:43:15,170 --> 01:43:16,922 AUDIENCE: [INAUDIBLE] 2065 01:43:16,922 --> 01:43:17,900 DAVID MALAN: No. 2066 01:43:17,900 --> 01:43:19,520 So it's not necessarily 1. 2067 01:43:19,520 --> 01:43:23,180 It would be incorrect to check for 1, or negative 1, or anything else. 2068 01:43:23,180 --> 01:43:25,550 You want to check for the opposite of 0. 2069 01:43:25,550 --> 01:43:26,870 So, not equal 0. 2070 01:43:26,870 --> 01:43:31,820 Or, more succinctly, like I did, by just putting it into parentheses. 2071 01:43:31,820 --> 01:43:34,560 Let me see what happens here. 2072 01:43:34,560 --> 01:43:38,690 So this is great, but some of you might have spotted a better solution 2073 01:43:38,690 --> 01:43:39,680 to this problem. 2074 01:43:39,680 --> 01:43:42,230 A moment ago when we were on the manual pages searching 2075 01:43:42,230 --> 01:43:45,380 for things related to lowercase, what might be another building 2076 01:43:45,380 --> 01:43:46,475 block we can employ here? 2077 01:43:49,160 --> 01:43:50,700 Based on what's on the screen here? 2078 01:43:50,700 --> 01:43:51,200 Yeah? 2079 01:43:51,200 --> 01:43:52,888 AUDIENCE: To-upper. 2080 01:43:52,888 --> 01:43:54,140 DAVID MALAN: So to-upper.
2081 01:43:54,140 --> 01:43:57,098 There's a function that would literally do the uppercasing thing for me 2082 01:43:57,098 --> 01:44:00,032 so I don't have to get into the weeds of negative 32, plus 32. 2083 01:44:00,032 --> 01:44:01,490 I don't have to consult that chart. 2084 01:44:01,490 --> 01:44:05,120 Someone has solved this problem for me in the past. 2085 01:44:05,120 --> 01:44:09,680 And let's see if I can actually get back to it. 2086 01:44:09,680 --> 01:44:10,520 There we go. 2087 01:44:10,520 --> 01:44:12,540 Let me go ahead, now, and use this. 2088 01:44:12,540 --> 01:44:15,230 So instead of doing s bracket i minus 32, 2089 01:44:15,230 --> 01:44:19,880 let's use a function that someone else wrote, and just say to-upper, s bracket 2090 01:44:19,880 --> 01:44:20,420 i. 2091 01:44:20,420 --> 01:44:23,250 And now it's going to do the solution for me. 2092 01:44:23,250 --> 01:44:30,530 So if I rerun make uppercase, and then do, slowly, ./uppercase, type in hi, 2093 01:44:30,530 --> 01:44:32,120 now it's working as expected. 2094 01:44:32,120 --> 01:44:35,870 And honestly, if I read the documentation for to-upper 2095 01:44:35,870 --> 01:44:39,170 by going back to its man page, or manual page, what you'll see 2096 01:44:39,170 --> 01:44:44,420 is that it says if it's lowercase, it will return the uppercase version 2097 01:44:44,420 --> 01:44:45,050 thereof. 2098 01:44:45,050 --> 01:44:48,913 If it's not lowercase, if it's already uppercase or it's punctuation, 2099 01:44:48,913 --> 01:44:50,705 it will just return the original character.
2100 01:44:50,705 --> 01:44:53,900 Which means, thanks to this function, I can actually 2101 01:44:53,900 --> 01:44:57,650 tighten this up significantly, get rid of all of my conditional logic 2102 01:44:57,650 --> 01:45:02,030 there, and just print out the to-upper return value, 2103 01:45:02,030 --> 01:45:05,060 and leave it to whoever wrote that function to figure out 2104 01:45:05,060 --> 01:45:09,470 if something's uppercase or lowercase. 2105 01:45:09,470 --> 01:45:13,820 All right, questions on these kinds of tricks? 2106 01:45:13,820 --> 01:45:17,090 Again, it all reduces to week 0 basics, but we're just 2107 01:45:17,090 --> 01:45:18,750 building these abstractions on top. 2108 01:45:18,750 --> 01:45:19,250 Yeah? 2109 01:45:19,250 --> 01:45:21,208 AUDIENCE: I'm wondering if there's any way just 2110 01:45:21,208 --> 01:45:25,110 to import all packages under a certain subdomain instead 2111 01:45:25,110 --> 01:45:27,120 of having to do multiple [INAUDIBLE] statements, 2112 01:45:27,120 --> 01:45:28,412 kind of like a star [INAUDIBLE] 2113 01:45:28,412 --> 01:45:29,340 DAVID MALAN: Yes. 2114 01:45:29,340 --> 01:45:30,180 Unfortunately, no. 2115 01:45:30,180 --> 01:45:33,120 There is no easy way in C to say, give me everything. 2116 01:45:33,120 --> 01:45:35,670 That was for, historically, performance reasons. 2117 01:45:35,670 --> 01:45:38,940 They want you to be explicit as to what you want to include. 2118 01:45:38,940 --> 01:45:41,730 In other languages like Python, Java, one of which 2119 01:45:41,730 --> 01:45:44,513 we'll see later this term, you can say, give me everything. 2120 01:45:44,513 --> 01:45:47,430 But that, actually, tends not to be best practice because it can slow down 2121 01:45:47,430 --> 01:45:50,000 execution or compilation of your code. 2122 01:45:50,000 --> 01:45:50,500 Yeah? 2123 01:45:50,500 --> 01:45:52,845 AUDIENCE: Does to-upper accommodate for special characters? 2124 01:45:52,845 --> 01:45:53,340 DAVID MALAN: Ah.
2125 01:45:53,340 --> 01:45:55,980 Does to-upper accommodate special characters like punctuation? 2126 01:45:55,980 --> 01:45:56,480 Yes. 2127 01:45:56,480 --> 01:45:58,440 If I read the documentation more pedantically, 2128 01:45:58,440 --> 01:45:59,710 we would see exactly that. 2129 01:45:59,710 --> 01:46:02,940 It will properly hand me back an exclamation point, 2130 01:46:02,940 --> 01:46:04,600 if I pass one in. 2131 01:46:04,600 --> 01:46:08,970 So if I do make uppercase here, and let me do ./upper, sorry-- 2132 01:46:08,970 --> 01:46:13,620 ./uppercase, hi with an exclamation point, it's going to handle that, too, 2133 01:46:13,620 --> 01:46:15,810 pass it through unchanged. Yeah? 2134 01:46:15,810 --> 01:46:19,200 AUDIENCE: Do we have access to a function that would do all of that 2135 01:46:19,200 --> 01:46:21,590 but just to the screen rather than to [INAUDIBLE] 2136 01:46:21,590 --> 01:46:23,550 DAVID MALAN: Really good question, too. 2137 01:46:23,550 --> 01:46:28,110 No, we do not have access to a function that at least comes with C or comes 2138 01:46:28,110 --> 01:46:31,740 with CS50's library that will just force the whole thing to uppercase. 2139 01:46:31,740 --> 01:46:34,170 In C, that's actually easier said than done. 2140 01:46:34,170 --> 01:46:35,550 In Python, it's trivial. 2141 01:46:35,550 --> 01:46:39,810 So stay tuned for another language that will let us do exactly that. 2142 01:46:39,810 --> 01:46:42,510 All right, so what does this leave us with? 2143 01:46:42,510 --> 01:46:44,520 There's just a-- let's come full circle now, 2144 01:46:44,520 --> 01:46:47,490 to where we began today, where we were talking about those command line 2145 01:46:47,490 --> 01:46:48,090 arguments. 2146 01:46:48,090 --> 01:46:51,810 Recall that we talked about rm taking a command line argument.
2147 01:46:51,810 --> 01:46:54,470 The file you want to delete, we talked about clang 2148 01:46:54,470 --> 01:46:56,220 taking command line arguments, that, again, 2149 01:46:56,220 --> 01:46:58,140 modify the behavior of the program. 2150 01:46:58,140 --> 01:47:01,680 How is it that maybe you and I can start to write programs that 2151 01:47:01,680 --> 01:47:03,840 actually take command line arguments? 2152 01:47:03,840 --> 01:47:07,620 Well, here is where I can finally explain why 2153 01:47:07,620 --> 01:47:10,740 we've been typing int main(void) for the past week 2154 01:47:10,740 --> 01:47:14,490 and just asking that you take on faith that it's just the way you do things. 2155 01:47:14,490 --> 01:47:20,820 Well, by default in C, at least the most recent versions thereof, 2156 01:47:20,820 --> 01:47:24,010 there are only two official ways to write main functions. 2157 01:47:24,010 --> 01:47:26,460 You might see other formats online, but they're generally 2158 01:47:26,460 --> 01:47:28,870 not consistent with the current specification. 2159 01:47:28,870 --> 01:47:32,160 This, again, was sort of a boilerplate for the simplest 2160 01:47:32,160 --> 01:47:34,770 function we might write last week, and recall that we've 2161 01:47:34,770 --> 01:47:36,210 been doing this the whole time. 2162 01:47:36,210 --> 01:47:40,990 (void). What that (void) means, for all of the programs I have written thus far 2163 01:47:40,990 --> 01:47:43,890 and you have written thus far, is that none of our programs 2164 01:47:43,890 --> 01:47:47,040 that we've written take command line arguments. 2165 01:47:47,040 --> 01:47:49,110 That's what the void there means. 2166 01:47:49,110 --> 01:47:53,950 It turns out that main is the way you can specify that your program does, 2167 01:47:53,950 --> 01:47:55,740 in fact, take command line arguments, that 2168 01:47:55,740 --> 01:47:59,760 is, words after the command in your terminal window.
2169 01:47:59,760 --> 01:48:02,220 If you want to actually not use get int or get string, 2170 01:48:02,220 --> 01:48:05,970 you want the human to be able to say something, like hello, David 2171 01:48:05,970 --> 01:48:06,840 and hit Enter. 2172 01:48:06,840 --> 01:48:09,940 And just run-- print hello, David on the screen. 2173 01:48:09,940 --> 01:48:14,460 You can use command line arguments, words after the program name 2174 01:48:14,460 --> 01:48:16,750 on your command line. 2175 01:48:16,750 --> 01:48:20,460 So we're going to change this in a moment to be something more verbose, 2176 01:48:20,460 --> 01:48:23,930 but something that's now a bit more familiar syntactically. 2177 01:48:23,930 --> 01:48:28,440 If you change that (void) in main to be this incantation instead, 2178 01:48:28,440 --> 01:48:33,480 int, argc, comma, string, argv, open bracket, close bracket, 2179 01:48:33,480 --> 01:48:36,630 you are now giving yourself access to writing programs 2180 01:48:36,630 --> 01:48:38,910 that take command line arguments. 2181 01:48:38,910 --> 01:48:42,120 Argc, which stands for argument count, is going 2182 01:48:42,120 --> 01:48:46,410 to be an integer that stores how many words the human typed at the prompt. 2183 01:48:46,410 --> 01:48:49,050 C automatically gives that to you. 2184 01:48:49,050 --> 01:48:52,710 String argv stands for argument vector; that's 2185 01:48:52,710 --> 01:48:57,100 going to be an array of all of the words that the human typed at the prompt. 2186 01:48:57,100 --> 01:48:59,130 So with today's building block of an array, 2187 01:48:59,130 --> 01:49:01,980 we have the ability now to let the humans type as many words, 2188 01:49:01,980 --> 01:49:03,900 or as few words, as they want at the prompt. 2189 01:49:03,900 --> 01:49:06,900 C is going to automatically put them in an array called argv, 2190 01:49:06,900 --> 01:49:12,360 and it's going to tell us how many words there are in an int called argc.
2191 01:49:12,360 --> 01:49:16,060 The int, as the return type here, we'll come back to in just a moment. 2192 01:49:16,060 --> 01:49:19,350 Let's use this definition to make, maybe, 2193 01:49:19,350 --> 01:49:20,970 just a couple of simple programs. 2194 01:49:20,970 --> 01:49:23,070 But in problem set 2 we will actually use 2195 01:49:23,070 --> 01:49:26,470 this to control the behavior of your own code. 2196 01:49:26,470 --> 01:49:33,120 Let me code up a file called argv.0 just to keep it aptly named. 2197 01:49:33,120 --> 01:49:35,700 Let me include cs50.h. 2198 01:49:35,700 --> 01:49:37,240 Let me go ahead and include-- 2199 01:49:37,240 --> 01:49:37,740 oops. 2200 01:49:37,740 --> 01:49:40,950 That is not the right name of a program; let's start that over. 2201 01:49:40,950 --> 01:49:45,450 Let's go ahead and code up argv.c. 2202 01:49:45,450 --> 01:49:46,800 And here we have-- 2203 01:49:46,800 --> 01:49:52,890 include cs50.h, include stdio.h, int, main, not void, 2204 01:49:52,890 --> 01:50:00,025 let's actually say int, argc, string, argv, open bracket, close bracket. 2205 01:50:00,025 --> 01:50:02,400 No numbers in between because you don't know, in advance, 2206 01:50:02,400 --> 01:50:05,310 how many words the human's going to type at their prompt. 2207 01:50:05,310 --> 01:50:06,760 Now let's go ahead and do this. 2208 01:50:06,760 --> 01:50:10,800 Let's write a very simple program that just says, hello, David, hello, Carter, 2209 01:50:10,800 --> 01:50:12,660 whoever the name is that gets typed. 2210 01:50:12,660 --> 01:50:16,260 But not using get string, let's instead have the human just 2211 01:50:16,260 --> 01:50:19,890 type their name at the prompt, just like rm, just like clang, just like make, 2212 01:50:19,890 --> 01:50:22,170 so it's just one and done when you hit Enter. 2213 01:50:22,170 --> 01:50:23,610 No additional prompts.
2214 01:50:23,610 --> 01:50:28,380 Let me go ahead then and do this, printf, quote-unquote, hello, 2215 01:50:28,380 --> 01:50:31,500 comma, and instead of world today, I want to print out 2216 01:50:31,500 --> 01:50:33,370 whatever the human typed in. 2217 01:50:33,370 --> 01:50:38,850 So let's go ahead and do this, argv, bracket 0 for now. 2218 01:50:38,850 --> 01:50:43,080 But I don't think this is quite what I want because, of course, 2219 01:50:43,080 --> 01:50:48,370 that's going to literally print out argv, bracket, 0, bracket. 2220 01:50:48,370 --> 01:50:52,510 I need a placeholder, so let me put %s here and then put that here. 2221 01:50:52,510 --> 01:50:56,520 So if argv is an array, but it's an array of strings, 2222 01:50:56,520 --> 01:51:00,480 then argv bracket 0 is itself a single string. 2223 01:51:00,480 --> 01:51:03,450 And so it can be plugged into that %s placeholder. 2224 01:51:03,450 --> 01:51:05,740 Let me go ahead and save my program. 2225 01:51:05,740 --> 01:51:09,340 And compile argv: so far, so good. 2226 01:51:09,340 --> 01:51:13,170 Let me now type in my name after the name of the program. 2227 01:51:13,170 --> 01:51:13,980 So no get string. 2228 01:51:13,980 --> 01:51:18,280 I'm literally typing an extra word, my own name at the prompt, Enter. 2229 01:51:18,280 --> 01:51:21,290 OK, it's apparently a little buggy in a couple of ways. 2230 01:51:21,290 --> 01:51:24,500 I forgot my \n, but that's not a huge deal. 2231 01:51:24,500 --> 01:51:28,960 But apparently, inside of argv is literally everything 2232 01:51:28,960 --> 01:51:31,270 that the human typed in, including the name of the program. 2233 01:51:31,270 --> 01:51:36,250 So logically, how do I print out hello, David, or hello so-and-so and not 2234 01:51:36,250 --> 01:51:37,720 the actual name of the program? 2235 01:51:37,720 --> 01:51:38,960 What needs to change here? 2236 01:51:38,960 --> 01:51:39,460 Yeah? 2237 01:51:39,460 --> 01:51:41,050 AUDIENCE: Change the index to 1.
2238 01:51:41,050 --> 01:51:41,800 DAVID MALAN: Yeah. 2239 01:51:41,800 --> 01:51:45,940 So presumably, change the index to 1, if that's the second thing I, or whichever human, 2240 01:51:45,940 --> 01:51:46,940 has typed at the prompt. 2241 01:51:46,940 --> 01:51:51,410 So let's do make argv again, ./argv, Enter. 2242 01:51:51,410 --> 01:51:52,090 Huh. 2243 01:51:52,090 --> 01:51:53,630 Hello, (null). 2244 01:51:53,630 --> 01:51:55,690 So this is another form of null. 2245 01:51:55,690 --> 01:51:59,320 But this is user error, now, on my part. 2246 01:51:59,320 --> 01:52:01,070 I didn't do exactly what I said I would. 2247 01:52:01,070 --> 01:52:01,570 Yeah? 2248 01:52:01,570 --> 01:52:02,530 AUDIENCE: You forgot the parameter. 2249 01:52:02,530 --> 01:52:04,430 DAVID MALAN: Yeah, I forgot the parameter. 2250 01:52:04,430 --> 01:52:05,700 So that's actually, hm. 2251 01:52:05,700 --> 01:52:07,450 I should probably deal with that, somehow, 2252 01:52:07,450 --> 01:52:09,292 so that people aren't breaking my program 2253 01:52:09,292 --> 01:52:11,000 and printing out random things, like null. 2254 01:52:11,000 --> 01:52:14,770 But if I do say ./argv David, now you see hello, David. 2255 01:52:14,770 --> 01:52:18,070 I can get a little curious, like what's at location 2? 2256 01:52:18,070 --> 01:52:23,410 Well, we can see: make argv, ./argv David, Enter. 2257 01:52:23,410 --> 01:52:24,910 All right, so just nothing is there. 2258 01:52:24,910 --> 01:52:28,202 But it turns out, in a couple of weeks, we'll start really poking around memory 2259 01:52:28,202 --> 01:52:30,310 and see if we can't crash programs deliberately 2260 01:52:30,310 --> 01:52:32,800 because nothing is stopping me from saying, 2261 01:52:32,800 --> 01:52:36,470 oh, what's at location 2 million, for instance? 2262 01:52:36,470 --> 01:52:38,350 We could really start to get curious. 2263 01:52:38,350 --> 01:52:40,420 But for now, we'll do the right thing.
2264 01:52:40,420 --> 01:52:44,360 But let's now make sure the human has typed in the right number of words. 2265 01:52:44,360 --> 01:52:50,920 So let's say this: if argc equals 2, that is, the name of the program 2266 01:52:50,920 --> 01:52:54,760 and one more word after that, go ahead and trust that argv 1, 2267 01:52:54,760 --> 01:52:56,980 as you proposed, is the person's name. 2268 01:52:56,980 --> 01:53:01,810 Else, let's go ahead and default here to something simple and basic, 2269 01:53:01,810 --> 01:53:05,860 like, well, if we don't get a name from the user, just say hello, world, 2270 01:53:05,860 --> 01:53:07,300 like always. 2271 01:53:07,300 --> 01:53:10,045 So now we're programming defensively. 2272 01:53:10,045 --> 01:53:13,090 This time, the human, even if they screw up, they don't give us a name 2273 01:53:13,090 --> 01:53:15,965 or they give us too many names, we're just going to say hello, world, 2274 01:53:15,965 --> 01:53:17,890 because I now have some error handling here. 2275 01:53:17,890 --> 01:53:22,030 Because, again, argc is argument count, the number of words, total, 2276 01:53:22,030 --> 01:53:23,990 typed at the command line. 2277 01:53:23,990 --> 01:53:26,740 So make argv, ./argv. 2278 01:53:26,740 --> 01:53:28,540 Let me make the same mistake as before. 2279 01:53:28,540 --> 01:53:29,050 OK. 2280 01:53:29,050 --> 01:53:30,910 I don't get this weird null behavior. 2281 01:53:30,910 --> 01:53:32,350 I get something well-defined. 2282 01:53:32,350 --> 01:53:33,610 I could now do David. 2283 01:53:33,610 --> 01:53:36,850 I could do David Malan, but that's not currently supported. 2284 01:53:36,850 --> 01:53:41,290 I would need to alter my logic to support more than just two words 2285 01:53:41,290 --> 01:53:42,345 after the prompt. 2286 01:53:42,345 --> 01:53:43,770 So what's the point of this?
2287 01:53:43,770 --> 01:53:45,520 At the moment, it's just a simple exercise 2288 01:53:45,520 --> 01:53:50,702 to actually give myself a way of taking user input when they run the program. 2289 01:53:50,702 --> 01:53:52,660 Because, consider, it's just more convenient in 2290 01:53:52,660 --> 01:53:54,670 this new, command-line-interface world. 2291 01:53:54,670 --> 01:53:58,857 If you had to use get string every time you compile your code, 2292 01:53:58,857 --> 01:54:00,190 it'd be kind of annoying, right? 2293 01:54:00,190 --> 01:54:03,940 You type make, then you might get a prompt, what would you like to make? 2294 01:54:03,940 --> 01:54:07,690 Then you type in hello, or cash, or something else, then you hit Enter, 2295 01:54:07,690 --> 01:54:09,330 it just really slows the process. 2296 01:54:09,330 --> 01:54:11,440 But in this command-line-interface world, 2297 01:54:11,440 --> 01:54:14,770 if you support command line arguments, then you can use these little tricks. 2298 01:54:14,770 --> 01:54:18,170 Like, scrolling up and down in your history with your arrow keys. 2299 01:54:18,170 --> 01:54:22,430 You can just type commands more quickly because you can do it all at once. 2300 01:54:22,430 --> 01:54:25,000 And you don't have to keep prompting the user, more 2301 01:54:25,000 --> 01:54:27,760 pedantically, for more and more info. 2302 01:54:27,760 --> 01:54:30,280 So any questions then on command line arguments? 2303 01:54:30,280 --> 01:54:34,000 Which, finally, reveals why we had (void) initially, 2304 01:54:34,000 --> 01:54:36,610 but what more we can now put in main. 2305 01:54:36,610 --> 01:54:39,070 That's how you take command line arguments. 2306 01:54:39,070 --> 01:54:40,500 Yeah? 2307 01:54:40,500 --> 01:54:42,610 AUDIENCE: If you were to put-- 2308 01:54:42,610 --> 01:54:47,320 if you were to use argv, and you were to put integers inside of it, 2309 01:54:47,320 --> 01:54:49,923 would it still give you, like, a string? 
2310 01:54:49,923 --> 01:54:51,506 Would that still be considered a string? 2311 01:54:51,506 --> 01:54:52,923 Or would you consider [INAUDIBLE]? 2312 01:54:52,923 --> 01:54:53,760 DAVID MALAN: Yes. 2313 01:54:53,760 --> 01:54:56,550 If you were to type at the command line something 2314 01:54:56,550 --> 01:55:00,660 like, not a word, but something like the number 42, 2315 01:55:00,660 --> 01:55:03,450 that would actually be treated as a string. 2316 01:55:03,450 --> 01:55:04,290 Why? 2317 01:55:04,290 --> 01:55:06,220 Because again, context matters. 2318 01:55:06,220 --> 01:55:08,940 So if your program is currently manipulating memory 2319 01:55:08,940 --> 01:55:12,510 as though it's characters or strings, whatever those patterns of 0s and 1s 2320 01:55:12,510 --> 01:55:16,800 are, they will be interpreted as ASCII text, or Unicode text. 2321 01:55:16,800 --> 01:55:20,640 If we therefore go to the chart here, that might make you wonder, well, 2322 01:55:20,640 --> 01:55:24,510 then how do you distinguish numbers from letters in the context of something 2323 01:55:24,510 --> 01:55:25,890 like chars and strings? 2324 01:55:25,890 --> 01:55:34,380 Well, notice 65 is capital A, 97 is lowercase a, but also 49 is 1, and 50 is 2. 2325 01:55:34,380 --> 01:55:37,500 So the designers of ASCII, and then later Unicode, 2326 01:55:37,500 --> 01:55:40,680 realized, well, wait a minute, if we want to support programs 2327 01:55:40,680 --> 01:55:43,440 that let you type things that look like numbers, 2328 01:55:43,440 --> 01:55:46,350 even though they're not technically ints or floats, 2329 01:55:46,350 --> 01:55:50,620 we need a way in ASCII and Unicode to represent numbers, too. 2330 01:55:50,620 --> 01:55:51,870 So here are your numbers. 2331 01:55:51,870 --> 01:55:55,210 And it's a little silly that we have numbers representing other numbers.
2332 01:55:55,210 --> 01:55:57,863 But again, if you're in the world of letters and characters, 2333 01:55:57,863 --> 01:56:00,030 you've got to come up with a mapping for everything. 2334 01:56:00,030 --> 01:56:01,790 And notice here, here's the dot. 2335 01:56:01,790 --> 01:56:06,390 Even if you were to represent 1.23 as a string, or as characters, 2336 01:56:06,390 --> 01:56:10,840 even the dot now is going to be represented as an ASCII character. 2337 01:56:10,840 --> 01:56:12,930 So again, context here matters. 2338 01:56:12,930 --> 01:56:17,370 All right, one final example to tease apart what this int is 2339 01:56:17,370 --> 01:56:19,840 and what it's been doing here for so long. 2340 01:56:19,840 --> 01:56:24,780 So I'm going to add one bit of logic to a new file 2341 01:56:24,780 --> 01:56:27,750 that I'm going to call exit.c. 2342 01:56:27,750 --> 01:56:29,130 So an exit.c. 2343 01:56:29,130 --> 01:56:32,880 We're going to introduce something generally known as exit statuses. 2344 01:56:32,880 --> 01:56:34,980 It turns out this is not a feature we've used yet, 2345 01:56:34,980 --> 01:56:37,240 but it's just useful to know about, 2346 01:56:37,240 --> 01:56:40,350 especially when automating tests of your own code, 2347 01:56:40,350 --> 01:56:44,115 when it comes to figuring out if a program succeeded or failed. 2348 01:56:44,115 --> 01:56:48,870 It turns out that main has one more feature we haven't leveraged: 2349 01:56:48,870 --> 01:56:54,330 an ability to signal to the user whether something was successful or not. 2350 01:56:54,330 --> 01:56:57,760 And that's by way of main's return value. 2351 01:56:57,760 --> 01:57:02,060 So I'm going to modify this program as follows, like this. 2352 01:57:02,060 --> 01:57:04,920 Suppose I want to write a similar program that 2353 01:57:04,920 --> 01:57:07,900 requires that the user type a word at the prompt. 2354 01:57:07,900 --> 01:57:12,450 So that argc has to be 2 for whatever design purpose.
2355 01:57:12,450 --> 01:57:18,990 If argc does not equal 2, I want to quit out of my program prematurely. 2356 01:57:18,990 --> 01:57:22,590 I want to insist that the user operate the program correctly. 2357 01:57:22,590 --> 01:57:28,800 So I might give them an error message like "Missing command-line argument\n". 2358 01:57:28,800 --> 01:57:31,180 But now I want to quit out of the program. 2359 01:57:31,180 --> 01:57:32,310 Now how can I do that? 2360 01:57:32,310 --> 01:57:37,260 The right way, quote-unquote, to do that is to return a value from main. 2361 01:57:37,260 --> 01:57:40,590 Now it's a little weird because we never call main ourselves, 2362 01:57:40,590 --> 01:57:42,990 right? Main just gets called automatically, 2363 01:57:42,990 --> 01:57:45,300 but the convention is anytime something goes 2364 01:57:45,300 --> 01:57:50,100 wrong in a program, you should return a non-zero value from main. 2365 01:57:50,100 --> 01:57:51,780 1 is fine as a go-to. 2366 01:57:51,780 --> 01:57:55,470 We don't need to get into the weeds of having many different exit statuses, 2367 01:57:55,470 --> 01:57:56,220 so to speak. 2368 01:57:56,220 --> 01:58:01,770 But if you return 1, that is a clue to the system, the Mac, the PC, or the cloud 2369 01:58:01,770 --> 01:58:03,430 device, that something went wrong. 2370 01:58:03,430 --> 01:58:03,930 Why? 2371 01:58:03,930 --> 01:58:05,670 Because 1 is not 0. 2372 01:58:05,670 --> 01:58:11,460 If everything works fine, let's go ahead and print out hello comma %s like 2373 01:58:11,460 --> 01:58:16,620 before, quote-unquote, with argv bracket 1. 2374 01:58:16,620 --> 01:58:19,080 So this is just a version of the program without an else. 2375 01:58:19,080 --> 01:58:21,390 So this is the same as doing, essentially, 2376 01:58:21,390 --> 01:58:23,580 an else here like I did earlier. 2377 01:58:23,580 --> 01:58:26,740 I want to signal to the computer that all is well. 2378 01:58:26,740 --> 01:58:28,290 And so I return 0.
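The exit.c logic just described can be sketched in plain C roughly as follows. This is a reconstruction under assumptions, not the lecture's actual file: it uses printf from stdio.h rather than the CS50 library, and it moves the logic into a hypothetical helper, greet, so the logic can be exercised directly; main itself would then just be `int main(int argc, char *argv[]) { return greet(argc, argv); }`.

```c
#include <stdio.h>

// Hypothetical helper holding exit.c's logic; main would return its result.
// Returns the exit status: 1 on misuse, 0 on success.
int greet(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Missing command-line argument\n");
        return 1;  // non-zero exit status: something went wrong
    }
    // No else needed: we only get here if we didn't already return.
    printf("hello, %s\n", argv[1]);
    return 0;  // zero exit status: all is well
}
```

Running the compiled program without an argument and then typing `echo $?` in a shell such as bash prints the most recent exit status, 1 in that case. Note also that on POSIX systems such as Linux and macOS, the shell sees only the low 8 bits of main's return value, so the exit statuses observed there range from 0 to 255, even though an int itself can hold far more values.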
2379 01:58:28,290 --> 01:58:31,650 But strictly speaking, if I'm already returning here, 2380 01:58:31,650 --> 01:58:34,560 I don't technically need, if I really want to be nitpicky, 2381 01:58:34,560 --> 01:58:36,870 I don't technically need the else, because the only way 2382 01:58:36,870 --> 01:58:41,486 I'm going to get to line 11 is if I didn't already return. 2383 01:58:41,486 --> 01:58:43,180 So what's going on here? 2384 01:58:43,180 --> 01:58:46,530 The only new thing here, logically, is that for the first time ever, 2385 01:58:46,530 --> 01:58:48,810 I'm returning a value from main. 2386 01:58:48,810 --> 01:58:50,730 That's something I could always have done, 2387 01:58:50,730 --> 01:58:55,290 because main has always been defined by us as returning an int. 2388 01:58:55,290 --> 01:58:59,880 By default, main automatically, sort of secretly, returns 0 for you. 2389 01:58:59,880 --> 01:59:02,850 If you've never once used the return keyword, which you probably 2390 01:59:02,850 --> 01:59:05,370 haven't in main, it just automatically returns 0 2391 01:59:05,370 --> 01:59:07,295 and the system assumes that all went well. 2392 01:59:07,295 --> 01:59:09,390 But now that we're starting to get a little more 2393 01:59:09,390 --> 01:59:11,520 sophisticated with our code, and you, 2394 01:59:11,520 --> 01:59:15,480 the programmer, know something went wrong, you can abort programs early. 2395 01:59:15,480 --> 01:59:20,610 You can exit out of them by returning some value other than 0 from main. 2396 01:59:20,610 --> 01:59:23,040 And it's fortuitous that it's an int, right? 2397 01:59:23,040 --> 01:59:25,110 0 means everything worked. 2398 01:59:25,110 --> 01:59:29,250 Unfortunately, in programming, there are, seemingly, an infinite number of things 2399 01:59:29,250 --> 01:59:30,240 that can go wrong. 2400 01:59:30,240 --> 01:59:33,210 And an int gives you some 4 billion possible codes 2401 01:59:33,210 --> 01:59:36,455 that you can use, a.k.a.
exit statuses, to signify errors. 2402 01:59:36,455 --> 01:59:39,930 So if you've ever gotten some weird pop-up on your Mac or PC 2403 01:59:39,930 --> 01:59:43,320 saying that an error happened, sometimes there's a cryptic number in it. 2404 01:59:43,320 --> 01:59:45,420 Maybe it's positive, maybe it's negative. 2405 01:59:45,420 --> 01:59:50,170 It might say error code 123, or negative 49, or something like that. 2406 01:59:50,170 --> 01:59:54,310 What you're generally seeing are these exit statuses, these return 2407 01:59:54,310 --> 01:59:57,610 values from main in a program that someone at Microsoft, 2408 01:59:57,610 --> 02:00:01,120 or Apple, or somewhere else wrote. Something went wrong, 2409 02:00:01,120 --> 02:00:05,980 and they are, unnecessarily, showing you, the user, what the error code is, 2410 02:00:05,980 --> 02:00:09,100 if only so that when you call customer support or submit a ticket, 2411 02:00:09,100 --> 02:00:12,190 you can tell them what exit status you encountered, 2412 02:00:12,190 --> 02:00:15,070 what error code you encountered. 2413 02:00:15,070 --> 02:00:19,390 All right, any questions on exit statuses, 2414 02:00:19,390 --> 02:00:24,580 which are the last of our new building blocks, for now? 2415 02:00:24,580 --> 02:00:25,540 Any questions at all? 2416 02:00:25,540 --> 02:00:26,040 Yeah? 2417 02:00:26,040 --> 02:00:33,540 AUDIENCE: [INAUDIBLE] You know how if you have get string or get int, 2418 02:00:33,540 --> 02:00:35,418 if you want to make [INAUDIBLE] 2419 02:00:35,418 --> 02:00:36,085 DAVID MALAN: No. 2420 02:00:36,085 --> 02:00:39,265 The question is, can you do things again and again 2421 02:00:39,265 --> 02:00:41,890 at the command line like you could with get string and get int.
2422 02:00:41,890 --> 02:00:43,870 Which, recall, by default are automatically 2423 02:00:43,870 --> 02:00:46,420 designed to keep prompting the user in their own loop 2424 02:00:46,420 --> 02:00:49,960 until they give you an int, or a float, or the like. With command-line 2425 02:00:49,960 --> 02:00:50,740 arguments, no. 2426 02:00:50,740 --> 02:00:52,210 You're going to get an error message, but then 2427 02:00:52,210 --> 02:00:54,002 you're going to be returned to your prompt. 2428 02:00:54,002 --> 02:00:57,387 And it's up to you to type it correctly the next time. 2429 02:00:57,387 --> 02:00:57,970 Good question. 2430 02:00:57,970 --> 02:00:58,470 Yeah? 2431 02:00:58,470 --> 02:01:03,435 AUDIENCE: [INAUDIBLE] automatically for you. 2432 02:01:03,435 --> 02:01:05,310 DAVID MALAN: If you do not return a value 2433 02:01:05,310 --> 02:01:08,730 explicitly, main will automatically return 0 for you. 2434 02:01:08,730 --> 02:01:12,640 That is simply how C works, so it's not strictly necessary. 2435 02:01:12,640 --> 02:01:15,510 But now that we're starting to return values explicitly 2436 02:01:15,510 --> 02:01:18,090 if something goes wrong, it would be good practice 2437 02:01:18,090 --> 02:01:21,480 to also start returning a value from main when something goes right 2438 02:01:21,480 --> 02:01:23,775 and there are no errors. 2439 02:01:23,775 --> 02:01:27,810 So let's now get out of the weeds and contextualize 2440 02:01:27,810 --> 02:01:31,200 this for some actual problems that we'll be solving in the coming days 2441 02:01:31,200 --> 02:01:33,130 by way of problem set 2 and beyond. 2442 02:01:33,130 --> 02:01:35,740 So here, for instance-- 2443 02:01:35,740 --> 02:01:39,990 So here, for instance, is a problem that might take you back 2444 02:01:39,990 --> 02:01:43,980 to when you were a kid: the readability of some text or some book, 2445 02:01:43,980 --> 02:01:46,230 the grade level at which some book is written.
2446 02:01:46,230 --> 02:01:49,740 If you're a young student, you might read at first-grade level 2447 02:01:49,740 --> 02:01:51,240 or third-grade level in the US. 2448 02:01:51,240 --> 02:01:53,032 Or, if you're in college, presumably you're 2449 02:01:53,032 --> 02:01:54,945 reading university-level text. 2450 02:01:54,945 --> 02:01:58,073 But what does it mean for text, like in a book, 2451 02:01:58,073 --> 02:02:00,240 or in an essay, or something like that, to correspond 2452 02:02:00,240 --> 02:02:01,590 to some kind of grade level? 2453 02:02:01,590 --> 02:02:04,950 Well, here's a quote, the title of a children's book: 2454 02:02:04,950 --> 02:02:07,590 One Fish, Two Fish, Red Fish, Blue Fish. 2455 02:02:07,590 --> 02:02:10,840 What might the grade level be for a book that has words like this? 2456 02:02:10,840 --> 02:02:13,590 Maybe when you were a kid, or if you have siblings still reading 2457 02:02:13,590 --> 02:02:16,260 these things, what might the grade level of this thing be? 2458 02:02:18,800 --> 02:02:19,590 Any guesses? 2459 02:02:19,590 --> 02:02:20,090 Yeah? 2460 02:02:20,090 --> 02:02:21,257 AUDIENCE: Before grade 1. 2461 02:02:21,257 --> 02:02:22,340 DAVID MALAN: Sorry, again? 2462 02:02:22,340 --> 02:02:23,382 AUDIENCE: Before grade 1. 2463 02:02:23,382 --> 02:02:25,650 DAVID MALAN: Before grade 1 is, in fact, correct. 2464 02:02:25,650 --> 02:02:27,290 So that's for really young kids. 2465 02:02:27,290 --> 02:02:28,230 Why is that? 2466 02:02:28,230 --> 02:02:29,180 Well, let's consider. 2467 02:02:29,180 --> 02:02:32,210 These are pretty simple phrases, right? 2468 02:02:32,210 --> 02:02:33,500 One fish, two fish, red-- 2469 02:02:33,500 --> 02:02:35,960 I mean, there aren't even verbs in these sentences; 2470 02:02:35,960 --> 02:02:40,040 they're just nouns and adjectives, and very short sentences. 2471 02:02:40,040 --> 02:02:42,200 And so that might be a heuristic we could use.
2472 02:02:42,200 --> 02:02:44,810 When analyzing text, well, if the words are kind of short, 2473 02:02:44,810 --> 02:02:47,240 the sentences are kind of short, everything's very simple, 2474 02:02:47,240 --> 02:02:50,250 that's probably a very young, or early, grade level. 2475 02:02:50,250 --> 02:02:53,665 And so by one formulation, it might indeed be even before grade 1, 2476 02:02:53,665 --> 02:02:54,665 for someone quite young. 2477 02:02:54,665 --> 02:02:55,670 How about this? 2478 02:02:55,670 --> 02:02:58,022 Mr. and Mrs. Dursley, of number 4, Privet Drive, 2479 02:02:58,022 --> 02:03:00,980 were proud to say that they were perfectly normal, thank you very much. 2480 02:03:00,980 --> 02:03:02,960 They were the last people you would expect 2481 02:03:02,960 --> 02:03:05,120 to be involved in anything strange or mysterious 2482 02:03:05,120 --> 02:03:07,850 because they just didn't hold with such nonsense. 2483 02:03:07,850 --> 02:03:08,782 And so on. 2484 02:03:08,782 --> 02:03:10,490 All right, what grade level is this book? 2485 02:03:10,490 --> 02:03:11,778 AUDIENCE: Third. 2486 02:03:11,778 --> 02:03:13,070 DAVID MALAN: OK, I heard third. 2487 02:03:13,070 --> 02:03:14,585 AUDIENCE: What? 2488 02:03:14,585 --> 02:03:15,980 DAVID MALAN: Seventh, fifth. 2489 02:03:15,980 --> 02:03:17,150 OK, all over the place. 2490 02:03:17,150 --> 02:03:20,540 But grade 7, according to one particular measure. 2491 02:03:20,540 --> 02:03:24,802 And we can debate exactly what age you were when you read this, 2492 02:03:24,802 --> 02:03:27,260 and maybe you're feeling ahead of your time, or behind, now. 2493 02:03:27,260 --> 02:03:31,470 But here, we have a snippet of text. 2494 02:03:31,470 --> 02:03:36,560 What makes this text assume an older audience, a more mature audience, 2495 02:03:36,560 --> 02:03:39,690 a higher grade level, would you think? 2496 02:03:39,690 --> 02:03:40,190 Yeah?
2497 02:03:40,190 --> 02:03:42,415 AUDIENCE: [INAUDIBLE] 2498 02:03:42,415 --> 02:03:45,110 DAVID MALAN: Yeah, it's longer, different types of words, 2499 02:03:45,110 --> 02:03:47,513 there are commas now in phrases, and so forth. 2500 02:03:47,513 --> 02:03:49,680 So there's just some kind of sophistication to this. 2501 02:03:49,680 --> 02:03:52,280 So it turns out for the upcoming problem set, 2502 02:03:52,280 --> 02:03:55,370 among the things you'll do is take, as input, texts like this 2503 02:03:55,370 --> 02:03:56,510 and analyze them, 2504 02:03:56,510 --> 02:03:59,072 considering, well, how many words are in the text? 2505 02:03:59,072 --> 02:04:00,530 How many sentences are in the text? 2506 02:04:00,530 --> 02:04:02,375 How many letters are in the text? 2507 02:04:02,375 --> 02:04:06,170 And you'll use those, according to a well-defined formula, to determine what, 2508 02:04:06,170 --> 02:04:09,680 exactly, the grade level of some actual text-- there's the third-- 2509 02:04:09,680 --> 02:04:10,582 might actually be. 2510 02:04:10,582 --> 02:04:12,790 Well, what else are we going to do in the coming days? 2511 02:04:12,790 --> 02:04:15,410 Well, I've alluded to this notion of cryptography in the past: 2512 02:04:15,410 --> 02:04:18,350 this notion of scrambling information in such a way 2513 02:04:18,350 --> 02:04:21,422 that you can hide the contents of a message 2514 02:04:21,422 --> 02:04:23,630 from someone who might otherwise intercept it, right? 2515 02:04:23,630 --> 02:04:26,130 The earliest form of this might also be when you're younger, 2516 02:04:26,130 --> 02:04:29,390 and you're in class, and you're passing a note from one person to another, 2517 02:04:29,390 --> 02:04:30,650 from yourself to someone else. 2518 02:04:30,650 --> 02:04:32,960 You don't want to necessarily write a note in English, 2519 02:04:32,960 --> 02:04:35,120 or some other written language; you might want 2520 02:04:35,120 --> 02:04:37,430 to scramble it somehow, or encrypt it.
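The counting step of the readability problem described above can be sketched in C under simplifying assumptions of our own: a letter is any alphabetic character, a word is any run of non-whitespace characters, and every ., !, or ? ends a sentence. The struct text_counts and the function count_text are hypothetical names, and the grade-level formula itself is deliberately not shown here.

```c
#include <ctype.h>

// Hypothetical tallies for a passage of text.
typedef struct
{
    int letters;
    int words;
    int sentences;
} text_counts;

// Count letters, words, and sentences in one pass over the text.
text_counts count_text(const char *text)
{
    text_counts c = {0, 0, 0};
    int in_word = 0;  // are we currently inside a run of non-space chars?
    for (const char *p = text; *p != '\0'; p++)
    {
        unsigned char ch = (unsigned char) *p;
        if (isalpha(ch))
        {
            c.letters++;
        }
        if (isspace(ch))
        {
            in_word = 0;  // whitespace ends the current word
        }
        else if (!in_word)
        {
            in_word = 1;  // first non-space char starts a new word
            c.words++;
        }
        if (ch == '.' || ch == '!' || ch == '?')
        {
            c.sentences++;  // crude rule: any terminal punctuation ends a sentence
        }
    }
    return c;
}
```

For the input "One fish. Two fish. Red fish. Blue fish." this sketch reports 29 letters, 8 words, and 4 sentences. The sentence rule is crude, though: in the Dursley passage, the periods after Mr. and Mrs. would each be counted as ending a sentence.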
2521 02:04:37,430 --> 02:04:40,460 Maybe you change the As to Bs, and the Bs to Cs, 2522 02:04:40,460 --> 02:04:42,770 so that if the teacher snaps it up and intercepts it, 2523 02:04:42,770 --> 02:04:45,200 they can't actually understand what it is you've 2524 02:04:45,200 --> 02:04:47,160 written, because it's encrypted. 2525 02:04:47,160 --> 02:04:49,610 So long as your friend, the recipient of this note, 2526 02:04:49,610 --> 02:04:51,890 knows how you manipulated it, 2527 02:04:51,890 --> 02:04:55,640 how you added or subtracted letters, 2528 02:04:55,640 --> 02:04:58,850 they can decrypt it, which is to reverse that process. 2529 02:04:58,850 --> 02:05:02,070 So formally, in the world of cryptography and computer science, 2530 02:05:02,070 --> 02:05:04,130 this is another problem to solve. 2531 02:05:04,130 --> 02:05:07,173 Your input, though, when you have a message you want to send securely, 2532 02:05:07,173 --> 02:05:08,840 is what's generally known as plaintext. 2533 02:05:08,840 --> 02:05:12,980 There's some algorithm that's going to then encipher, or encrypt, 2534 02:05:12,980 --> 02:05:16,100 that information into what's called ciphertext, which 2535 02:05:16,100 --> 02:05:18,650 is the scrambled version that, theoretically, can safely 2536 02:05:18,650 --> 02:05:21,110 be intercepted without your message being spoiled, 2537 02:05:21,110 --> 02:05:24,620 unless the interceptor actually knows what algorithm 2538 02:05:24,620 --> 02:05:27,150 you used inside of this process. 2539 02:05:27,150 --> 02:05:29,720 That algorithm is generally known as a cipher. 2540 02:05:29,720 --> 02:05:33,080 Ciphers typically take, though, not one input, but two. 2541 02:05:33,080 --> 02:05:37,685 If, for instance, your cipher is as simple as A becomes B, 2542 02:05:37,685 --> 02:05:41,420 B becomes C, C becomes D, dot dot dot, Z becomes A, 2543 02:05:41,420 --> 02:05:45,140 you're essentially adding 1 to every letter to encrypt it.
2544 02:05:45,140 --> 02:05:47,750 Now that 1 would be what we call the key. 2545 02:05:47,750 --> 02:05:51,470 You and the recipient both have to agree, presumably before class, 2546 02:05:51,470 --> 02:05:55,280 in advance, what number you're going to use that day to rotate, 2547 02:05:55,280 --> 02:05:56,960 or change, all of these letters by. 2548 02:05:56,960 --> 02:06:00,410 Because when you add 1, they, upon receiving your ciphertext, 2549 02:06:00,410 --> 02:06:03,090 have to subtract 1 to get back the answer. 2550 02:06:03,090 --> 02:06:07,730 For instance, if the input plaintext is "HI!", as before, 2551 02:06:07,730 --> 02:06:13,010 and the key is 1, the ciphertext using this simple rotational algorithm, 2552 02:06:13,010 --> 02:06:17,720 otherwise known as the Caesar cipher, might be "IJ!". 2553 02:06:17,720 --> 02:06:21,408 So it's similar, but it's at least scrambled at first glance. 2554 02:06:21,408 --> 02:06:23,450 And unless the teacher really cares to figure out 2555 02:06:23,450 --> 02:06:26,420 what algorithm you're using today, or what key you're using today, 2556 02:06:26,420 --> 02:06:29,700 it's probably sufficiently secure for your purposes. 2557 02:06:29,700 --> 02:06:31,160 How do you reverse the process? 2558 02:06:31,160 --> 02:06:34,190 Well, your friend gets this and reverses it with a key of negative 1. 2559 02:06:34,190 --> 02:06:38,630 So I becomes H, J becomes I, and things like punctuation 2560 02:06:38,630 --> 02:06:41,060 remain untouched, at least in this scheme. 2561 02:06:41,060 --> 02:06:43,580 So let's consider one final example here. 2562 02:06:43,580 --> 02:06:51,080 Suppose the input to the algorithm is Uijtxbtdt50, and the key 2563 02:06:51,080 --> 02:06:53,090 this time is negative 1, 2564 02:06:53,090 --> 02:06:59,510 such that now B should become A, C should become B, and A should become Z. 2565 02:06:59,510 --> 02:07:01,130 So we're going in the other direction.
2566 02:07:01,130 --> 02:07:03,030 How might we analyze this? 2567 02:07:03,030 --> 02:07:06,000 Well, if we spread all the letters out, and we start from left to right, 2568 02:07:06,000 --> 02:07:11,780 and we start subtracting one from each letter, U becomes T, I becomes H, J becomes I, 2569 02:07:11,780 --> 02:07:17,220 T becomes S, X becomes W, B becomes A, T becomes S, D becomes C, T becomes S-- 2570 02:07:17,220 --> 02:07:18,270 this was CS50. 2571 02:07:18,270 --> 02:07:19,470 We'll see you next time. 2572 02:07:19,470 --> 02:07:21,320 [APPLAUSE] 2573 02:07:20,000 --> 02:07:56,000 [MUSIC PLAYING]
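The Caesar cipher walked through above can be sketched in C as a single rotation function, where decrypting is just the same call with the key negated. The name caesar and the in-place design are illustrative choices, not a required interface for the problem set. Uppercase and lowercase letters each wrap around within their own alphabet, and everything else, including digits and punctuation, is left untouched, matching the scheme described in the lecture.

```c
#include <ctype.h>

// Rotate each letter of text in place by key positions, wrapping around
// the alphabet (so with key 1, Z becomes A; with key -1, a becomes z).
// Non-letters such as digits and punctuation are left untouched.
void caesar(char *text, int key)
{
    // In C, % can yield a negative result for a negative key,
    // so normalize the shift into the range 0..25 first.
    int shift = ((key % 26) + 26) % 26;
    for (char *p = text; *p != '\0'; p++)
    {
        unsigned char ch = (unsigned char) *p;
        if (isupper(ch))
        {
            *p = (char) ('A' + (ch - 'A' + shift) % 26);
        }
        else if (islower(ch))
        {
            *p = (char) ('a' + (ch - 'a' + shift) % 26);
        }
    }
}
```

Calling caesar on "HI!" with key 1 yields "IJ!", and calling it on "Uijtxbtdt50" with key -1 yields "Thiswascs50", matching the two worked examples.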