1 00:00:00,000 --> 00:00:03,486 [MUSIC PLAYING] 2 00:00:49,800 --> 00:00:50,962 [VIDEO PLAYBACK] 3 00:00:50,962 --> 00:00:52,840 - --we know? 4 00:00:52,840 --> 00:00:56,660 - That at 9:15, Ray Santoya was at the ATM. 5 00:00:56,660 --> 00:00:59,890 - So the question is, what was he doing at 9:16? 6 00:00:59,890 --> 00:01:02,690 - Shooting the nine-millimeter at something. 7 00:01:02,690 --> 00:01:04,330 Maybe he saw the sniper. 8 00:01:04,330 --> 00:01:06,445 - Or he was working with him. 9 00:01:06,445 --> 00:01:07,680 - Right. 10 00:01:07,680 --> 00:01:08,620 Go back one. 11 00:01:08,620 --> 00:01:09,370 - What do you see? 12 00:01:17,190 --> 00:01:18,310 - Bring his face up. 13 00:01:18,310 --> 00:01:20,900 Full screen. 14 00:01:20,900 --> 00:01:22,020 - His glasses. 15 00:01:22,020 --> 00:01:23,324 - There's a reflection. 16 00:01:33,490 --> 00:01:35,080 - That's Neuvitas baseball team. 17 00:01:35,080 --> 00:01:36,200 That's their logo. 18 00:01:36,200 --> 00:01:39,110 - And he's talking to whoever is wearing that jacket. 19 00:01:39,110 --> 00:01:40,405 - We may have a witness. 20 00:01:40,405 --> 00:01:41,757 - To both shootings. 21 00:01:41,757 --> 00:01:42,340 [END PLAYBACK] 22 00:01:42,340 --> 00:01:45,910 DAVID MALAN: This is CS50, and this is lecture 3, and that 23 00:01:45,910 --> 00:01:47,809 is not how computer science works. 24 00:01:47,809 --> 00:01:49,600 And indeed, by the end of today, we'll make 25 00:01:49,600 --> 00:01:52,270 clear exactly what's right, what's not right about that, 26 00:01:52,270 --> 00:01:57,340 and hopefully give you some pause any time you watch TV or movies hereafter 27 00:01:57,340 --> 00:02:00,160 and notice these little things that all too many writers seem 28 00:02:00,160 --> 00:02:02,390 to take for granted. 29 00:02:02,390 --> 00:02:06,620 So recall that last time, we took a lower-level look at what compiling actually 30 00:02:06,620 --> 00:02:07,120 is. 
31 00:02:07,120 --> 00:02:10,570 And recall that it was a few things, these four steps of pre-processing 32 00:02:10,570 --> 00:02:12,386 and compiling and assembling and linking, 33 00:02:12,386 --> 00:02:14,260 so that when you start with your source code, 34 00:02:14,260 --> 00:02:17,140 which might look like this code that we have written in the past, 35 00:02:17,140 --> 00:02:20,590 you first have to preprocess it, and the first step in pre-processing was 36 00:02:20,590 --> 00:02:23,180 converting all of those preprocessor instructions-- 37 00:02:23,180 --> 00:02:26,655 anything starting with a hash at the beginning-- to their equivalents. 38 00:02:26,655 --> 00:02:29,530 So opening the files and effectively copying and pasting the contents 39 00:02:29,530 --> 00:02:32,680 there so that programs and the compiler know what get_string 40 00:02:32,680 --> 00:02:34,330 is and know what printf is. 41 00:02:34,330 --> 00:02:36,400 The next step that came after that was actually 42 00:02:36,400 --> 00:02:39,820 compiling, whereby compiling technically means taking that source 43 00:02:39,820 --> 00:02:42,790 code, once it's been preprocessed, and generating 44 00:02:42,790 --> 00:02:45,710 this very cryptic-looking stuff called assembly code. 45 00:02:45,710 --> 00:02:50,140 And those assembly codes or assembly instructions are really what the CPU-- 46 00:02:50,140 --> 00:02:52,780 the brain of your computer-- actually understands, 47 00:02:52,780 --> 00:02:55,840 although technically the computer understands them only in the form 48 00:02:55,840 --> 00:02:57,170 of 0's and 1's. 49 00:02:57,170 --> 00:03:00,010 And so when you assemble-- step three-- 50 00:03:00,010 --> 00:03:02,952 that assembly code, you actually get out those 0's and 1's. 51 00:03:02,952 --> 00:03:06,160 But even that simplest of programs where we just prompt the user for a string 52 00:03:06,160 --> 00:03:10,010 and then print out their name still involved a couple more files. 
53 00:03:10,010 --> 00:03:15,730 There was not only cs50.h and stdio.h at the top; 54 00:03:15,730 --> 00:03:20,650 somewhere in the computer system there are probably files called cs50.c, 55 00:03:20,650 --> 00:03:25,060 and in the case of stdio, printf.c, in which the code actually is 56 00:03:25,060 --> 00:03:28,300 for those two functions. Those two have to get compiled down 57 00:03:28,300 --> 00:03:31,690 to 0's and 1's, and then we need to link everything together, 58 00:03:31,690 --> 00:03:35,140 merging those 0's and 1's so that the computer has access to your code 59 00:03:35,140 --> 00:03:39,020 and to printf's code and to the cs50 library's code, and so forth. 60 00:03:39,020 --> 00:03:43,420 But all of that we can just generally wrap up in the descriptor of compiling. 61 00:03:43,420 --> 00:03:45,520 And so that's one of the looks we took last week. 62 00:03:45,520 --> 00:03:49,090 And we also have introduced, last week and previously, a few tools. 63 00:03:49,090 --> 00:03:52,420 And odds are, you're having as many frustrations perhaps already 64 00:03:52,420 --> 00:03:54,667 with the p-sets as you are accomplishments 65 00:03:54,667 --> 00:03:55,750 and sense of satisfaction. 66 00:03:55,750 --> 00:03:59,440 And that's normal, and rest assured that the scales will eventually tip more 67 00:03:59,440 --> 00:04:01,900 toward happiness and away from sadness, but we'll 68 00:04:01,900 --> 00:04:05,260 give you indeed more tools today than these for actually finding 69 00:04:05,260 --> 00:04:07,750 problems or shortcomings in your code. 70 00:04:07,750 --> 00:04:10,690 help50, recall, helps you with what process? 71 00:04:10,690 --> 00:04:14,130 When should you instinctively consider using help50? 72 00:04:14,130 --> 00:04:15,880 When you see error messages on the screen. 
73 00:04:15,880 --> 00:04:18,370 Something you don't understand that's the result of some mistake you 74 00:04:18,370 --> 00:04:21,411 probably made but you don't quite understand what the computer is telling 75 00:04:21,411 --> 00:04:24,527 you, run help50 and then that same command, and we, the staff, 76 00:04:24,527 --> 00:04:26,860 with our code will try to understand the message for you 77 00:04:26,860 --> 00:04:28,390 and provide you with feedback. 78 00:04:28,390 --> 00:04:30,500 style50 does exactly that. 79 00:04:30,500 --> 00:04:34,449 It helps you see with red and green color coding exactly what spaces should 80 00:04:34,449 --> 00:04:36,740 be there, shouldn't be there-- it just helps you prettify 81 00:04:36,740 --> 00:04:39,920 your code so that you can read it better and other humans can as well. 82 00:04:39,920 --> 00:04:44,440 And then printf, which is kind of like the coarsest tool in your toolbox, 83 00:04:44,440 --> 00:04:47,980 this is just helping you see not only messages you want to see, 84 00:04:47,980 --> 00:04:49,357 but just the values of variables. 85 00:04:49,357 --> 00:04:51,440 You can print ints and strings, whatever you want, 86 00:04:51,440 --> 00:04:54,085 and then you can delete those lines of printf 87 00:04:54,085 --> 00:04:55,960 once you're confident your program's working. 88 00:04:55,960 --> 00:04:59,001 But that gets a little tedious, and honestly, as our programs get bigger, 89 00:04:59,001 --> 00:05:02,140 we're going to want more powerful tools than manually printing things 90 00:05:02,140 --> 00:05:04,702 out, recompiling, rerunning; it very quickly gets tedious. 91 00:05:04,702 --> 00:05:07,660 And the goal of programming is not to be tedious, but to be empowering, 92 00:05:07,660 --> 00:05:10,840 and that's where we'll step to today via this. 
93 00:05:10,840 --> 00:05:14,920 So CS50 IDE is a sort of fancier version of what 94 00:05:14,920 --> 00:05:18,910 you've been using called CS50 Sandbox, and in turn, CS50 Lab. 95 00:05:18,910 --> 00:05:21,555 Now recall that both of those tools, the Sandbox and the Lab, 96 00:05:21,555 --> 00:05:23,680 have a terminal window where you can type commands, 97 00:05:23,680 --> 00:05:29,242 they have a code editor where you can actually write your code, 98 00:05:29,242 --> 00:05:31,450 and then they have a file browser with icons and such 99 00:05:31,450 --> 00:05:34,100 where you can actually see your files and folders. 100 00:05:34,100 --> 00:05:38,410 So it turns out that CS50 IDE is another tool that at first glance 101 00:05:38,410 --> 00:05:41,740 is very, very similar, even though it's laid out a little differently, 102 00:05:41,740 --> 00:05:45,820 and it has all the features of the Sandbox and the Lab, plus some more. 103 00:05:45,820 --> 00:05:49,630 More features that actually help you solve problems in your code 104 00:05:49,630 --> 00:05:53,010 and even collaborate come final project time with others if you would like. 105 00:05:53,010 --> 00:05:54,760 So this, we'll see, is the CS50 IDE. 106 00:05:54,760 --> 00:05:56,110 It comes with a so-called night mode so you 107 00:05:56,110 --> 00:05:58,540 can make everything a little darker on your screen, especially if p-setting 108 00:05:58,540 --> 00:06:00,670 at night, and let's actually take a look then 109 00:06:00,670 --> 00:06:04,150 at what you can do with this kind of tool. 110 00:06:04,150 --> 00:06:08,525 When you log into this tool for the very first time in the next problem set, 111 00:06:08,525 --> 00:06:10,900 you'll see an interface that's almost the same as before. 
112 00:06:10,900 --> 00:06:13,210 The colors are a little different, the font sizes are a little different, 113 00:06:13,210 --> 00:06:16,100 but at the bottom by default, you have your so-called terminal window, 114 00:06:16,100 --> 00:06:17,980 though instead of the dollar sign now, you'll 115 00:06:17,980 --> 00:06:21,440 see a little more detailed workspace, but more on that in a bit. 116 00:06:21,440 --> 00:06:23,650 Up here you just have the code editor window, 117 00:06:23,650 --> 00:06:25,599 nothing's really going on there. 118 00:06:25,599 --> 00:06:27,640 And then we have the added feature of Ceiling Cat 119 00:06:27,640 --> 00:06:29,390 in the top right-hand corner. 120 00:06:29,390 --> 00:06:31,760 And we'll also see some other features along the way. 121 00:06:31,760 --> 00:06:35,250 So let's actually write a program in CS50 IDE, which, to be clear, 122 00:06:35,250 --> 00:06:39,390 is just another web-based programming environment that also gives you 123 00:06:39,390 --> 00:06:42,600 access to your own cloud-based server. 124 00:06:42,600 --> 00:06:45,990 It, too, is running Ubuntu Linux, which is a popular operating system that 125 00:06:45,990 --> 00:06:47,970 is not macOS and it's not Windows. 126 00:06:47,970 --> 00:06:51,420 But unlike the sandbox environment where you don't even log in 127 00:06:51,420 --> 00:06:53,310 and you lose your files eventually, as you 128 00:06:53,310 --> 00:06:56,018 may know from when your cookies are lost or something goes wrong, 129 00:06:56,018 --> 00:06:57,209 the IDE saves everything. 130 00:06:57,209 --> 00:06:59,250 And you'll log in with your account, and whatever 131 00:06:59,250 --> 00:07:02,130 you put there last week is going to be there this week and next week 132 00:07:02,130 --> 00:07:03,040 and beyond. 
133 00:07:03,040 --> 00:07:07,230 So let me go ahead up to File, New File, or I could just click this little plus 134 00:07:07,230 --> 00:07:10,650 icon in the top right-hand corner, and let me go ahead and preemptively hit 135 00:07:10,650 --> 00:07:13,300 Control-S or Command-S or go to File, Save-- 136 00:07:13,300 --> 00:07:17,110 you should find the interface very similar to any Mac or PC program-- 137 00:07:17,110 --> 00:07:20,380 and let me go ahead and save this file as follows. 138 00:07:20,380 --> 00:07:23,039 I'm going to call this hello.c. 139 00:07:23,039 --> 00:07:25,080 And it's important to mention the file extension, 140 00:07:25,080 --> 00:07:27,510 otherwise the IDE, like the Sandbox and the Lab, 141 00:07:27,510 --> 00:07:29,564 won't know what type of program you're writing. 142 00:07:29,564 --> 00:07:32,230 And then let me go ahead and just write my simplest of programs. 143 00:07:32,230 --> 00:07:37,770 So let me go ahead and include stdio.h, int main void. 144 00:07:37,770 --> 00:07:40,920 Let me go ahead and open my curly braces, printf-- 145 00:07:40,920 --> 00:07:43,894 hello, world, backslash n, and a semi-colon. 146 00:07:43,894 --> 00:07:46,060 So you'll notice that almost everything is the same. 147 00:07:46,060 --> 00:07:48,270 The colors are a little different, perhaps, 148 00:07:48,270 --> 00:07:50,207 and you might see some different assistive 149 00:07:50,207 --> 00:07:53,040 features as you're typing your code, but the end result is the same. 150 00:07:53,040 --> 00:07:55,560 And the color coding you just get for free because it's helping 151 00:07:55,560 --> 00:07:57,760 draw your attention to different parts of the code. 152 00:07:57,760 --> 00:07:59,515 Let me go ahead now and-- 153 00:07:59,515 --> 00:08:00,465 oh notice this. 154 00:08:00,465 --> 00:08:01,740 There's one difference. 
155 00:08:01,740 --> 00:08:05,179 The IDE is a more powerful tool, but as such, it's a more manual tool 156 00:08:05,179 --> 00:08:07,470 and it's not just going to auto-save your code for you. 157 00:08:07,470 --> 00:08:10,170 Nice as that's been with the Sandbox, such that you'd never 158 00:08:10,170 --> 00:08:12,270 actually had to hit Command-S or Control-S-- 159 00:08:12,270 --> 00:08:14,870 and if you were, you didn't need to be-- the IDE 160 00:08:14,870 --> 00:08:18,570 is only going to save things when you want it to so that nothing 161 00:08:18,570 --> 00:08:20,740 will happen magically anymore. 162 00:08:20,740 --> 00:08:24,570 So what I'm going to have to do is go back up here, File, Save, or Command-S 163 00:08:24,570 --> 00:08:26,460 or Control-S, you'll see a little green dot 164 00:08:26,460 --> 00:08:29,070 briefly, and now I'm back at my prompt. 165 00:08:29,070 --> 00:08:33,750 I'm going to go ahead now and type my familiar command, make hello, Enter, 166 00:08:33,750 --> 00:08:36,270 and you'll see pretty much the same cryptic-looking clang 167 00:08:36,270 --> 00:08:40,950 command as before because the IDE is configured quite like the Sandbox. 168 00:08:40,950 --> 00:08:44,190 And if I want to go ahead and run this now, how do I run this program? 169 00:08:44,190 --> 00:08:46,150 Quick check? 170 00:08:46,150 --> 00:08:48,710 ./hello, it's exactly the same as before. 171 00:08:48,710 --> 00:08:51,680 ./hello, and there we have it, hello, world. 172 00:08:51,680 --> 00:08:54,830 So long story short, the user interface thus far is a little different, 173 00:08:54,830 --> 00:08:56,227 but functionally it's the same. 174 00:08:56,227 --> 00:08:58,560 We're just going to now start to see some more features. 175 00:08:58,560 --> 00:08:59,510 So what are those features? 
176 00:08:59,510 --> 00:09:02,060 And let's introduce some new capabilities that were actually 177 00:09:02,060 --> 00:09:05,760 possible in the Sandbox, we just didn't really introduce them at the time. 178 00:09:05,760 --> 00:09:09,710 If I click this folder icon at top left, you'll see all of my files and folders. 179 00:09:09,710 --> 00:09:12,014 And today for lecture I have a lot of pre-made examples 180 00:09:12,014 --> 00:09:14,930 that are already on the course's website, some of which we'll look at, 181 00:09:14,930 --> 00:09:16,850 some of which we'll refer to the website, 182 00:09:16,850 --> 00:09:18,870 but these are just familiar files and folders. 183 00:09:18,870 --> 00:09:21,410 And you can see that everything in my account 184 00:09:21,410 --> 00:09:23,720 is apparently in something called Workspace, which 185 00:09:23,720 --> 00:09:26,180 is just a folder name, or a directory. 186 00:09:26,180 --> 00:09:28,190 Here's my sc3 directory, which again, comes 187 00:09:28,190 --> 00:09:30,980 from the website for today's lecture, lecture 3. 188 00:09:30,980 --> 00:09:33,960 And then here's the program I just compiled and the file 189 00:09:33,960 --> 00:09:35,650 that I wrote, hello.c. 190 00:09:35,650 --> 00:09:38,662 You'll notice too that there's this funky symbol here, tilde, 191 00:09:38,662 --> 00:09:41,120 that you might not have occasion to write often in English, 192 00:09:41,120 --> 00:09:44,010 but in Spanish and other languages you might use this character. 193 00:09:44,010 --> 00:09:48,600 This is actually a shorthand notation for what's called your home directory. 
194 00:09:48,600 --> 00:09:52,190 In this environment, CS50 IDE, you have your own home directory, which 195 00:09:52,190 --> 00:09:55,880 means your folder of files and other folders that you get to create, 196 00:09:55,880 --> 00:09:59,210 you own, and that persists every time you log in-- you're not 197 00:09:59,210 --> 00:10:00,920 going to lose the contents therein. 198 00:10:00,920 --> 00:10:05,660 So this just means that in your home directory, a.k.a. tilde, 199 00:10:05,660 --> 00:10:09,212 there is a folder called workspace in which I'm currently working. 200 00:10:09,212 --> 00:10:12,170 And that's just one folder in which all of my work is going to be done, 201 00:10:12,170 --> 00:10:15,615 because there's so many other files and folders in this cloud environment, 202 00:10:15,615 --> 00:10:17,990 just like there are in your Mac and PC, we just generally 203 00:10:17,990 --> 00:10:19,430 don't care what they are. 204 00:10:19,430 --> 00:10:25,520 But notice what we can do at this terminal window besides compile 205 00:10:25,520 --> 00:10:26,600 and run code. 206 00:10:26,600 --> 00:10:27,740 There are other commands. 207 00:10:27,740 --> 00:10:33,650 For instance, this blue text here, similarly to the file browser up top, 208 00:10:33,650 --> 00:10:37,260 indicates now not just that this is my prompt per the dollar sign, 209 00:10:37,260 --> 00:10:40,790 but that I'm in my home directory's workspace directory. 210 00:10:40,790 --> 00:10:44,666 So that means I can be elsewhere even though I haven't 211 00:10:44,666 --> 00:10:46,040 specified where I want to go yet. 212 00:10:46,040 --> 00:10:49,460 And in fact, I can do this. ls stands for list, 213 00:10:49,460 --> 00:10:51,270 it's just shorthand notation for that. 214 00:10:51,270 --> 00:10:56,450 And now I see a textual version of my file tree, so to speak. 
215 00:10:56,450 --> 00:10:59,344 So you'll see here, sc3 is a folder, and you 216 00:10:59,344 --> 00:11:01,760 can tell as much because there's a slash at the end of it. 217 00:11:01,760 --> 00:11:04,500 hello.c is of course the file I wrote a moment ago. 218 00:11:04,500 --> 00:11:08,750 And then hello in green is my program that I compiled, and the star 219 00:11:08,750 --> 00:11:10,275 or asterisk there is just-- 220 00:11:10,275 --> 00:11:12,650 it's not the name of the file, it's just indicating to me 221 00:11:12,650 --> 00:11:14,390 visually that that is executable. 222 00:11:14,390 --> 00:11:17,510 That's a program I can run just so I know what's compiled 223 00:11:17,510 --> 00:11:19,190 and what maybe is source code. 224 00:11:19,190 --> 00:11:23,120 So when you're running ./hello, the reason all this time this has been 225 00:11:23,120 --> 00:11:28,670 working is because in dot, your current folder, there is a file called hello, 226 00:11:28,670 --> 00:11:32,160 and when you hit Enter, you are running that program there. 227 00:11:32,160 --> 00:11:36,470 So if after today you go back onto CS50 Sandbox or CS50 Lab and type ls, 228 00:11:36,470 --> 00:11:39,560 you'll see exactly the same thing as you might by the little folder 229 00:11:39,560 --> 00:11:41,840 icon in those programs as well. 230 00:11:41,840 --> 00:11:44,660 But suppose I want to go into a directory. 231 00:11:44,660 --> 00:11:48,200 In macOS or Windows or even the IDE, I could, of course, 232 00:11:48,200 --> 00:11:51,080 go to my file icon, and then per the little triangle 233 00:11:51,080 --> 00:11:53,390 here, which might seem intuitive, you just click it 234 00:11:53,390 --> 00:11:56,150 and you can see what's going on inside, not surprising. 235 00:11:56,150 --> 00:11:57,740 But how do you do that textually? 236 00:11:57,740 --> 00:12:00,270 At a command prompt, well it's not all that hard. 237 00:12:00,270 --> 00:12:02,160 You just need to change your directory. 
238 00:12:02,160 --> 00:12:08,670 So if I do cd space sc3, Enter, nothing seems to happen quite yet 239 00:12:08,670 --> 00:12:10,796 except that my prompt changed. 240 00:12:10,796 --> 00:12:13,670 Here's the indication that-- this is my prompt, but to the left of it 241 00:12:13,670 --> 00:12:17,540 you see in blue that I'm now in my home directory's workspace folder, 242 00:12:17,540 --> 00:12:19,990 in my sc3 folder there. 243 00:12:19,990 --> 00:12:23,411 So it's just a text-based version of the GUIs, the Graphical User Interfaces 244 00:12:23,411 --> 00:12:25,160 that all of us have certainly come to take 245 00:12:25,160 --> 00:12:29,010 for granted in the world of macOS and Windows thus far. 246 00:12:29,010 --> 00:12:33,410 Well, suppose that I'm a little done with my hello program 247 00:12:33,410 --> 00:12:34,430 and I want to delete it. 248 00:12:34,430 --> 00:12:37,920 Well in the IDE, like in the Sandbox, you can actually go up here and you can 249 00:12:37,920 --> 00:12:41,030 click on it, and then you can typically right-click or control-click, 250 00:12:41,030 --> 00:12:44,000 and you'll get a whole menu of other options, one of which is Delete-- 251 00:12:44,000 --> 00:12:46,590 and feel free to tinker like that in your own environment. 252 00:12:46,590 --> 00:12:48,020 But what about the command line? 253 00:12:48,020 --> 00:12:52,392 If I zoom in down here and I want to remove hello, you're 254 00:12:52,392 --> 00:12:55,100 not going to type remove because that just feels a little verbose 255 00:12:55,100 --> 00:12:58,250 and humans decades ago decided that's too tedious to type, 256 00:12:58,250 --> 00:13:00,410 let's just call this command rm-- 257 00:13:00,410 --> 00:13:04,920 for remove-- hello, you're going to see a somewhat cryptic prompt. 258 00:13:04,920 --> 00:13:07,040 rm-- remove regular file 'hello?' 
259 00:13:07,040 --> 00:13:09,830 This is more arcane than it needs to be, but it's just asking, 260 00:13:09,830 --> 00:13:11,590 are you sure you want to delete 'hello?' 261 00:13:11,590 --> 00:13:12,950 Then it's just waiting for you. 262 00:13:12,950 --> 00:13:18,050 And here you can type y or yes or sometimes other commands too, 263 00:13:18,050 --> 00:13:20,420 now I've confirmed that my intentions were yes. 264 00:13:20,420 --> 00:13:23,600 If I type ls again, I-- whoops, in the wrong folder. 265 00:13:23,600 --> 00:13:27,810 If I type ls again after doing hello-- 266 00:13:27,810 --> 00:13:31,940 no-- after doing hello and do ls, now I'll 267 00:13:31,940 --> 00:13:34,580 see just those two things-- sc3 and hello.c. 268 00:13:34,580 --> 00:13:36,440 What if I want to make a folder? 269 00:13:36,440 --> 00:13:37,340 Well notice this. 270 00:13:37,340 --> 00:13:41,080 If I type at the bottom here, make directory-- 271 00:13:41,080 --> 00:13:45,050 mkdir-- test just to make a test folder, I'm 272 00:13:45,050 --> 00:13:48,170 about to hit Enter, but watch the top left-hand corner 273 00:13:48,170 --> 00:13:51,680 where I currently have those other files and folders, and when I hit Enter, 274 00:13:51,680 --> 00:13:53,040 now I have a test folder. 275 00:13:53,040 --> 00:13:54,290 So these things are identical. 276 00:13:54,290 --> 00:13:57,440 One is graphical, one is command line, and there's even other commands 277 00:13:57,440 --> 00:13:59,060 if I decide I don't want that. 278 00:13:59,060 --> 00:14:02,210 rmdir is remove directory, and it just goes away 279 00:14:02,210 --> 00:14:04,590 because it's empty and thus safe. 280 00:14:04,590 --> 00:14:06,680 Any questions then on any of those commands 281 00:14:06,680 --> 00:14:11,505 or just the overall layout of what it is we're looking at? 
282 00:14:11,505 --> 00:14:13,880 All right, so don't get hung up on any of those commands, 283 00:14:13,880 --> 00:14:15,200 and the problem set and beyond will always 284 00:14:15,200 --> 00:14:16,670 remind you of those kinds of features. 285 00:14:16,670 --> 00:14:19,461 The point for now is just that we're in a somewhat new environment, 286 00:14:19,461 --> 00:14:23,289 but it's fundamentally still the same, it has the same capabilities. 287 00:14:23,289 --> 00:14:24,830 So what are other tools we looked at? 288 00:14:24,830 --> 00:14:28,550 So you might have heard rumors about a tool called check50, and indeed, 289 00:14:28,550 --> 00:14:31,970 this is a tool that the staff use on problem set 1 and problem set 290 00:14:31,970 --> 00:14:35,450 2 to evaluate the correctness of them so that we ourselves don't have to type 291 00:14:35,450 --> 00:14:41,660 ./mario or ./caesar again and again and again to test students' code. 292 00:14:41,660 --> 00:14:44,640 But starting this week, you, too, have access to the same program. 293 00:14:44,640 --> 00:14:48,440 check50 is a command from the staff that checks the correctness of your code 294 00:14:48,440 --> 00:14:51,800 just like style50 checks the style of your code. 295 00:14:51,800 --> 00:14:53,990 And in fact, if I go back over to my IDE, 296 00:14:53,990 --> 00:14:57,440 let's try to use this for the first time by making the same version of hello 297 00:14:57,440 --> 00:15:00,090 that you did perhaps for your first problem set. 
298 00:15:00,090 --> 00:15:04,400 So if I go ahead and include not just stdio, but cs50.h, 299 00:15:04,400 --> 00:15:07,010 and I go ahead and get a string from the user 300 00:15:07,010 --> 00:15:10,280 with get_string, prompting them for their name, and then go ahead 301 00:15:10,280 --> 00:15:14,990 and print not just hello, world, but hello, percent s comma name, 302 00:15:14,990 --> 00:15:17,960 this I believe was the same program you yourselves probably 303 00:15:17,960 --> 00:15:20,010 wrote, or some variant thereof. 304 00:15:20,010 --> 00:15:22,190 So if I go ahead now and test this myself-- 305 00:15:22,190 --> 00:15:26,150 make hello, Enter, seems OK, ./hello. 306 00:15:26,150 --> 00:15:29,004 I'm going to go ahead and type in my name, and voila, hello, David. 307 00:15:29,004 --> 00:15:30,920 Now suppose you're feeling pretty good, you're 308 00:15:30,920 --> 00:15:32,711 pretty confident that your code is correct, 309 00:15:32,711 --> 00:15:36,080 and most importantly, you have tested your code yourselves. 310 00:15:36,080 --> 00:15:38,780 It's not sufficient to rely on our tool alone 311 00:15:38,780 --> 00:15:41,540 to test your code because it, too, might not be exhaustive. 312 00:15:41,540 --> 00:15:45,310 So once you've tried a few inputs, not just David, but perhaps 313 00:15:45,310 --> 00:15:47,570 Veronica's name as well, seems to work. 314 00:15:47,570 --> 00:15:49,940 Brian's name as well, seems to work. 315 00:15:49,940 --> 00:15:52,704 No name at all, doesn't seem to work, maybe? 316 00:15:52,704 --> 00:15:54,620 But we'll have to look back to the problem set 317 00:15:54,620 --> 00:15:56,420 to see if that's actually a problem. 318 00:15:56,420 --> 00:15:58,640 Let me go ahead now and run check50. 319 00:15:58,640 --> 00:16:02,330 check50 expects a special slug, so to speak. 320 00:16:02,330 --> 00:16:05,335 Just a unique identifier for the problem that you want to check. 
321 00:16:05,335 --> 00:16:07,460 And you would only know this from reading a problem 322 00:16:07,460 --> 00:16:09,081 set or documentation online. 323 00:16:09,081 --> 00:16:12,080 I just happened to recall that the command that the staff had been using 324 00:16:12,080 --> 00:16:18,280 to grade and evaluate hello is just cs50/2018/fall/hello. 325 00:16:18,280 --> 00:16:21,030 And the slash is to just kind of visually distinguish those words, 326 00:16:21,030 --> 00:16:24,560 this isn't a folder or files or anything like that in your own account. 327 00:16:24,560 --> 00:16:29,030 So I'm going to run check50 cs50/2018/fall/hello in the same 328 00:16:29,030 --> 00:16:31,550 directory that hello.c is in. 329 00:16:31,550 --> 00:16:32,700 Enter. 330 00:16:32,700 --> 00:16:35,740 It's going to go ahead and connect to GitHub, which is the backend, 331 00:16:35,740 --> 00:16:37,490 recall, that we use for storing your code. 332 00:16:37,490 --> 00:16:40,800 It's authenticating me now, which means what's your username and password? 333 00:16:40,800 --> 00:16:43,850 I'm going to go ahead and use one of my test accounts. 334 00:16:43,850 --> 00:16:45,630 And now it's prompting me for my password, 335 00:16:45,630 --> 00:16:47,360 and I'm going to go ahead and type that in. 336 00:16:47,360 --> 00:16:49,970 You'll notice you're seeing stars like you see bullets in a website 337 00:16:49,970 --> 00:16:52,950 just so that someone looking over your shoulder can't see what you're typing. 338 00:16:52,950 --> 00:16:55,190 Now I'm going to go ahead and watch the progress. 339 00:16:55,190 --> 00:16:58,490 It's preparing, let me go ahead and zoom in. 340 00:16:58,490 --> 00:16:59,890 Dot-dot-dot. 
341 00:16:59,890 --> 00:17:03,050 It's looking at my code, it's getting ready for submission, 342 00:17:03,050 --> 00:17:07,460 it's now uploading it to GitHub.com, and once it's on the servers, 343 00:17:07,460 --> 00:17:11,150 then it's going to tell CS50 server, here is so-and-so's submission, 344 00:17:11,150 --> 00:17:14,359 go ahead and run a few automated tests on it, 345 00:17:14,359 --> 00:17:17,960 checking therefore its correctness, and hopefully we're about to see some 346 00:17:17,960 --> 00:17:21,200 green, happy smiley faces, and voila, yes, 347 00:17:21,200 --> 00:17:24,109 it looks like this check50 command for this problem-- 348 00:17:24,109 --> 00:17:26,150 or slug, so to speak-- 349 00:17:26,150 --> 00:17:29,630 checked that hello.c exists, because if I forgot to write the file 350 00:17:29,630 --> 00:17:32,020 or if I misnamed it, nothing's going to work. 351 00:17:32,020 --> 00:17:33,860 We checked that it compiles successfully, 352 00:17:33,860 --> 00:17:35,630 so that, too, is a happy green face. 353 00:17:35,630 --> 00:17:37,070 Then it apparently checked-- 354 00:17:37,070 --> 00:17:38,840 what if we type in Veronica? 355 00:17:38,840 --> 00:17:40,250 Do we see hello, Veronica? 356 00:17:40,250 --> 00:17:41,030 Apparently yes. 357 00:17:41,030 --> 00:17:42,800 What if we typed in another word, Brian? 358 00:17:42,800 --> 00:17:44,482 Yes, apparently we say hello, Brian. 359 00:17:44,482 --> 00:17:46,190 And so with high probability, we're going 360 00:17:46,190 --> 00:17:49,920 to conclude, based on those four tests, that your code is, in fact, correct, 361 00:17:49,920 --> 00:17:51,810 at least with respect to those inputs. 362 00:17:51,810 --> 00:17:54,302 And there's often some more detail via URL at the bottom 363 00:17:54,302 --> 00:17:56,510 where you can actually see more graphically just more 364 00:17:56,510 --> 00:17:57,740 feedback on your code. 
365 00:17:57,740 --> 00:18:01,490 Of course, the first time, second time, third time maybe you run this command, 366 00:18:01,490 --> 00:18:03,410 you might not see some green happy faces, 367 00:18:03,410 --> 00:18:06,867 you might see some red unhappy faces or some yellow flat faces, 368 00:18:06,867 --> 00:18:09,950 which just means we couldn't even run the checks because something else is 369 00:18:09,950 --> 00:18:10,590 wrong. 370 00:18:10,590 --> 00:18:14,480 But over time, this will help you feel more comfortable and more confident 371 00:18:14,480 --> 00:18:18,504 that your code's correct before you actually use submit50 and submit. 372 00:18:18,504 --> 00:18:21,170 Going into it you'll feel a little better or a little frustrated 373 00:18:21,170 --> 00:18:24,470 to know in advance-- wait a minute, I'm about to submit this but nope, 374 00:18:24,470 --> 00:18:25,550 it's not yet correct. 375 00:18:25,550 --> 00:18:28,490 So realize it's a two-edged sword. 376 00:18:28,490 --> 00:18:34,470 Any questions about check50 or any of these commands thus far? 377 00:18:34,470 --> 00:18:36,974 Anything at all? 378 00:18:36,974 --> 00:18:37,741 No? 379 00:18:37,741 --> 00:18:38,240 All right. 380 00:18:38,240 --> 00:18:41,030 So let's take a look at the final and most powerful 381 00:18:41,030 --> 00:18:45,380 tool now available to you in the IDE environment. 382 00:18:45,380 --> 00:18:49,119 Built in to CS50 IDE, which stands for Integrated Development 383 00:18:49,119 --> 00:18:52,160 Environment, which isn't a CS50 thing-- this is a common term in industry 384 00:18:52,160 --> 00:18:54,780 for tools that make it easier to write code, 385 00:18:54,780 --> 00:18:58,470 it turns out that there's some other feature besides the cat over here. 
386 00:18:58,470 --> 00:19:00,920 Namely, one, you can share your workspace 387 00:19:00,920 --> 00:19:03,090 with teaching fellows and course assistants 388 00:19:03,090 --> 00:19:06,230 so they can perhaps help you in real time a la Google Docs, even chatting 389 00:19:06,230 --> 00:19:07,370 with you in real time. 390 00:19:07,370 --> 00:19:09,910 But it also provides you with what's called a debugger. 391 00:19:09,910 --> 00:19:12,320 A debugger, as the name suggests, removes bugs-- 392 00:19:12,320 --> 00:19:15,350 or rather, helps you remove bugs from your code 393 00:19:15,350 --> 00:19:17,630 by allowing you to not just resort to printf-- 394 00:19:17,630 --> 00:19:19,700 printing out ints and strings and whatever 395 00:19:19,700 --> 00:19:22,580 is good that's going on your program, it kind of automates 396 00:19:22,580 --> 00:19:24,140 that very tedious process for you. 397 00:19:24,140 --> 00:19:26,330 And it lets you walk through your code one 398 00:19:26,330 --> 00:19:29,330 line at a time at your own comfortable pace 399 00:19:29,330 --> 00:19:33,984 and see along the way all of the values of your variables in that program. 400 00:19:33,984 --> 00:19:36,900 To activate this debugger, I'm going to go ahead and do the following. 401 00:19:36,900 --> 00:19:39,960 I'm going to compile my code as always with make hello. 402 00:19:39,960 --> 00:19:42,344 It has to compile, otherwise I might want 403 00:19:42,344 --> 00:19:44,510 to use help50 and figure out why it's not compiling, 404 00:19:44,510 --> 00:19:46,250 but it does seem to have compiled. 405 00:19:46,250 --> 00:19:50,210 And now I'm going to go ahead and run debug50, space, and then 406 00:19:50,210 --> 00:19:52,280 the name of the program I wanted to debug. 407 00:19:52,280 --> 00:19:54,738 And the name of the program I wanted to debug at the moment 408 00:19:54,738 --> 00:19:56,690 is the current directory's file called hello. 
409 00:19:56,690 --> 00:19:59,106 Let's assume that there's perhaps something wrong with it. 410 00:19:59,106 --> 00:20:01,417 The first time I run this command, though, debug50 411 00:20:01,417 --> 00:20:03,875 is not going to be happy with me because it's going to say, 412 00:20:03,875 --> 00:20:06,270 it looks like you haven't set any breakpoints. 413 00:20:06,270 --> 00:20:09,380 Set at least one breakpoint by clicking to the left of a line number 414 00:20:09,380 --> 00:20:10,920 and then rerun debug50. 415 00:20:10,920 --> 00:20:12,170 Well what is a breakpoint? 416 00:20:12,170 --> 00:20:14,420 Well as the name kind of suggests, it allows 417 00:20:14,420 --> 00:20:19,130 you to break or pause the running of your code at any of your lines. 418 00:20:19,130 --> 00:20:21,620 And all this time for the past few weeks, 419 00:20:21,620 --> 00:20:23,480 your code's been automatically line-numbered. 420 00:20:23,480 --> 00:20:27,110 And this is useful because the most interesting line in this program, 421 00:20:27,110 --> 00:20:29,880 once it really gets going, isn't this stuff at the top, 422 00:20:29,880 --> 00:20:31,130 it's not int main void, right? 423 00:20:31,130 --> 00:20:33,650 That's all copy-paste from past programs. 424 00:20:33,650 --> 00:20:37,910 It's really the sixth line here where I actually have some logic of my own. 425 00:20:37,910 --> 00:20:41,360 And so in CS50 IDE, what you can now do is 426 00:20:41,360 --> 00:20:43,340 click to the left of one of these line numbers, 427 00:20:43,340 --> 00:20:46,460 a little red light like a stop sign is going to appear saying, 428 00:20:46,460 --> 00:20:49,700 break or pause my program on this line so 429 00:20:49,700 --> 00:20:52,130 that I can poke around my actual code. 430 00:20:52,130 --> 00:20:54,180 Sandbox and Lab cannot do this.
431 00:20:54,180 --> 00:20:58,970 So now I'm going to go ahead and rerun debug50 in exactly the same way, hit 432 00:20:58,970 --> 00:21:01,400 Enter, but now I have one breakpoint. 433 00:21:01,400 --> 00:21:05,390 And you'll see on the right-hand side a fancier menu just popped up 434 00:21:05,390 --> 00:21:07,827 by the cat that provides me with a bunch of features. 435 00:21:07,827 --> 00:21:10,160 And at first glance, frankly, it's a little overwhelming 436 00:21:10,160 --> 00:21:13,940 because there's a lot going on here, but you'll notice first, 437 00:21:13,940 --> 00:21:17,540 and most importantly, there's some mention of my name variable. 438 00:21:17,540 --> 00:21:21,470 I don't quite understand 0x0 or whatnot, but I do understand string. 439 00:21:21,470 --> 00:21:26,240 And so what the debug50 program has realized is oh, on this line and below, 440 00:21:26,240 --> 00:21:28,160 you have a variable called name. 441 00:21:28,160 --> 00:21:29,950 It doesn't seem to have a value yet. 442 00:21:29,950 --> 00:21:33,890 0x0, it turns out, is just going to mean empty or null or 0. 443 00:21:33,890 --> 00:21:37,400 But that's good, because now, when I actually execute this line, 444 00:21:37,400 --> 00:21:41,000 hopefully it's going to take on the name David or Veronica or Brian. 445 00:21:41,000 --> 00:21:42,510 So let's see what happens. 446 00:21:42,510 --> 00:21:46,010 Notice that it's highlighted in yellow, line 6, which means it 447 00:21:46,010 --> 00:21:48,440 has not yet executed this line of code. 448 00:21:48,440 --> 00:21:52,670 My code has paused at this point because I set that breakpoint. 449 00:21:52,670 --> 00:21:57,260 And then notice kind of like a music player up here, there's a few icons. 450 00:21:57,260 --> 00:21:59,930 The Play button is just going to say, ah, play my program, 451 00:21:59,930 --> 00:22:03,200 run it all the way through the end, kind of like Scratch with the green flag.
452 00:22:03,200 --> 00:22:04,970 But more powerful is this. 453 00:22:04,970 --> 00:22:09,110 You can step over this line, therefore executing it just once. 454 00:22:09,110 --> 00:22:12,020 If it's a function, you can step into this line 455 00:22:12,020 --> 00:22:15,710 and actually look inside of a function that you're using, like get_string, 456 00:22:15,710 --> 00:22:19,500 or you can step out of another function, but more on that another time. 457 00:22:19,500 --> 00:22:20,859 So what I'm going to do is this. 458 00:22:20,859 --> 00:22:23,900 And the button I'm going to click most commonly when trying to understand 459 00:22:23,900 --> 00:22:25,358 how my program is working is this-- 460 00:22:25,358 --> 00:22:26,340 Step Over. 461 00:22:26,340 --> 00:22:31,670 So it's the second icon from the left, right next to the triangle. 462 00:22:31,670 --> 00:22:34,230 So once I click this, watch what's going to happen, 463 00:22:34,230 --> 00:22:38,635 even though it's a little small, on the right-hand side for my name variable. 464 00:22:38,635 --> 00:22:41,510 Notice that I'm being prompted to type in my name because the program 465 00:22:41,510 --> 00:22:44,630 is still running in my terminal window, but when I hit Enter now, 466 00:22:44,630 --> 00:22:49,310 providing my own name, automatically you see on the right-hand side 467 00:22:49,310 --> 00:22:53,270 that this name variable has a value now of, quote-unquote, 468 00:22:53,270 --> 00:22:55,020 "David" of type string. 469 00:22:55,020 --> 00:22:59,240 There's this 0x1083010-- more on that later, just a little cryptic, 470 00:22:59,240 --> 00:23:02,260 but I didn't have to use printf now, I can actually see what's going on. 471 00:23:02,260 --> 00:23:04,411 Now you can see that line 7 is highlighted, 472 00:23:04,411 --> 00:23:07,160 because I set a breakpoint above it, so now I'm on the second line 473 00:23:07,160 --> 00:23:08,540 because I just stepped into it. 
474 00:23:08,540 --> 00:23:11,090 Let me go ahead and click Next again, and you'll 475 00:23:11,090 --> 00:23:14,300 see that in my terminal window, hello, David just got executed. 476 00:23:14,300 --> 00:23:17,330 And now if I just keep going, it's going to go ahead and run to the end 477 00:23:17,330 --> 00:23:18,810 and close the debugger. 478 00:23:18,810 --> 00:23:21,530 So not all that useful for this program because frankly, I'm 479 00:23:21,530 --> 00:23:25,550 pretty sure this is correct, but the power of debug50 and a debugger more 480 00:23:25,550 --> 00:23:28,700 generally is that it lets you, whether you're less comfy or more comfy, 481 00:23:28,700 --> 00:23:33,050 walk through your own code at your pace just like a TF or a CA might say, OK, 482 00:23:33,050 --> 00:23:34,050 what is this line doing? 483 00:23:34,050 --> 00:23:35,160 What is this line doing? 484 00:23:35,160 --> 00:23:38,440 You don't have to resort to printf, you can just very methodically 485 00:23:38,440 --> 00:23:41,410 walk through your code and find that damn bug that's been bothering you 486 00:23:41,410 --> 00:23:43,120 for minutes or even hours. 
487 00:23:43,120 --> 00:23:47,530 So henceforth, any time you have a bug in your code that is compiling 488 00:23:47,530 --> 00:23:51,010 but it's just logically incorrect-- the pyramid in Mario isn't quite right, 489 00:23:51,010 --> 00:23:53,920 your encryption of Caesar isn't quite right, or something else, 490 00:23:53,920 --> 00:23:58,240 your first instinct now should be, let me compile it, run debug50 on it, 491 00:23:58,240 --> 00:24:01,910 and just step through the code, setting a breakpoint wherever I want, 492 00:24:01,910 --> 00:24:04,450 so you focus on just a few lines, not the whole thing-- 493 00:24:04,450 --> 00:24:05,500 like I just did-- 494 00:24:05,500 --> 00:24:08,840 and see if you can figure out logically when a value is not what you expected, 495 00:24:08,840 --> 00:24:09,490 then oh-- 496 00:24:09,490 --> 00:24:13,000 go ahead and just click Resume, fix the bug, and retry. 497 00:24:13,000 --> 00:24:15,110 Such a powerful tool. 498 00:24:15,110 --> 00:24:17,000 Any questions? 499 00:24:17,000 --> 00:24:19,180 Yeah? 500 00:24:19,180 --> 00:24:19,710 What is it? 501 00:24:19,710 --> 00:24:21,780 AUDIENCE: What does it look like when there is a bug? 502 00:24:21,780 --> 00:24:24,113 DAVID MALAN: What does it look like when there is a bug? 503 00:24:24,113 --> 00:24:28,170 So the debugger won't find your bugs and it won't show you your bugs, per se. 504 00:24:28,170 --> 00:24:31,137 It's going to let you see what line is executing, 505 00:24:31,137 --> 00:24:32,970 it's going to let you see what's outputting, 506 00:24:32,970 --> 00:24:34,950 it's going to let you take input, but all it's 507 00:24:34,950 --> 00:24:36,949 going to do on that right-hand side is just show 508 00:24:36,949 --> 00:24:38,910 you the values of things along the way. 
509 00:24:38,910 --> 00:24:42,840 It's up to you to infer from that information what 510 00:24:42,840 --> 00:24:46,140 it is that's going wrong, just like if you're using printf in past weeks 511 00:24:46,140 --> 00:24:48,420 to see what's going on in your program. 512 00:24:48,420 --> 00:24:50,590 Other questions? 513 00:24:50,590 --> 00:24:52,360 And let me save this too. 514 00:24:52,360 --> 00:24:55,600 It is so easy to get into the habit, especially when so many things have 515 00:24:55,600 --> 00:24:57,940 been new over the past few weeks of just saying, ah, 516 00:24:57,940 --> 00:24:59,860 this is just yet another thing to learn. 517 00:24:59,860 --> 00:25:02,620 This is hands down the kind of tool that if you 518 00:25:02,620 --> 00:25:05,622 spend a few extra minutes this week and next week just using it, 519 00:25:05,622 --> 00:25:07,330 get a little more comfortable with it, it 520 00:25:07,330 --> 00:25:09,927 will save you potentially hours in the long run, 521 00:25:09,927 --> 00:25:12,010 because all the time you've been spending manually 522 00:25:12,010 --> 00:25:14,550 trying to fix your bugs or posting questions online 523 00:25:14,550 --> 00:25:16,420 trying to understand things, this is a tool 524 00:25:16,420 --> 00:25:18,640 that if you invest those minutes upfront will just 525 00:25:18,640 --> 00:25:21,340 help you understand everything going on inside of your program, 526 00:25:21,340 --> 00:25:27,190 and will absolutely over the next few weeks save you more and more time. 527 00:25:27,190 --> 00:25:30,665 All right, any questions? yeah? 528 00:25:30,665 --> 00:25:34,060 AUDIENCE: So you have a for loop that ran [INAUDIBLE] times, 529 00:25:34,060 --> 00:25:38,199 [INAUDIBLE] separate break statements so you don't have to [INAUDIBLE].. 530 00:25:38,199 --> 00:25:39,490 DAVID MALAN: Ah, good question. 
531 00:25:39,490 --> 00:25:42,156 If you have something like a for loop or a while loop, something 532 00:25:42,156 --> 00:25:45,630 that's happening a lot, can you set a breakpoint in such a way 533 00:25:45,630 --> 00:25:49,382 that it only breaks so that you don't have to walk through it 100 times 534 00:25:49,382 --> 00:25:50,340 just to see that value? 535 00:25:50,340 --> 00:25:51,480 Short answer, yes. 536 00:25:51,480 --> 00:25:54,720 And let me defer to section and online resources for just a few 537 00:25:54,720 --> 00:25:57,240 of these features, but one, you can actually watch values, 538 00:25:57,240 --> 00:25:59,323 and you can have what's called a watch expression. 539 00:25:59,323 --> 00:26:03,330 You can say show me this value only when x is greater than 50 540 00:26:03,330 --> 00:26:04,470 or something like that. 541 00:26:04,470 --> 00:26:06,840 Or you yourself can just add some lines of code. 542 00:26:06,840 --> 00:26:11,430 You could add a, if x equals-equals 50, then print out something, 543 00:26:11,430 --> 00:26:14,080 and you can set a breakpoint on that new, if temporary line, 544 00:26:14,080 --> 00:26:15,724 so there's a couple of ways to do that. 545 00:26:15,724 --> 00:26:16,890 Good question to anticipate. 546 00:26:16,890 --> 00:26:17,390 Yeah? 547 00:26:17,390 --> 00:26:18,181 Behind. 548 00:26:18,181 --> 00:26:22,029 AUDIENCE: If you run debug50, aren't you adding 549 00:26:22,029 --> 00:26:26,994 another argument with the [INAUDIBLE] in your main method at line 4? 550 00:26:26,994 --> 00:26:28,410 DAVID MALAN: Really good question. 551 00:26:28,410 --> 00:26:30,450 If you're running debug50, aren't you adding 552 00:26:30,450 --> 00:26:33,540 another argument-- argv-- per our discussion last week of command line 553 00:26:33,540 --> 00:26:34,320 arguments? 554 00:26:34,320 --> 00:26:36,827 Short answer, no, because debug50 corrects for that, 555 00:26:36,827 --> 00:26:38,410 so you don't have to worry about that.
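The temporary-line trick described above can be sketched as follows. This is a minimal illustration, not code from the lecture: `sum_below` is a hypothetical function standing in for any loop that runs many times, and 50 is only the example number mentioned in the answer.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: adds up 0, 1, ..., n - 1.
// It stands in for any loop that runs too many times to step through by hand.
int sum_below(int n)
{
    assert(n >= 0);
    int total = 0;
    for (int x = 0; x < n; x++)
    {
        // Temporary debugging aid: this body runs only on the iteration of
        // interest, so a breakpoint placed on the printf line pauses once.
        if (x == 50)
        {
            printf("x is 50, total so far is %i\n", total);
        }
        total += x;
    }
    return total;
}
```

With a breakpoint set on the `printf` line, the debugger would pause only on that one pass through the loop rather than on every pass. The extra `if` is meant to be deleted once the bug is found.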
556 00:26:38,410 --> 00:26:40,530 It will not shift things over numerically. 557 00:26:40,530 --> 00:26:41,730 Really good thought. 558 00:26:41,730 --> 00:26:43,800 Other questions? 559 00:26:43,800 --> 00:26:50,040 All right, so with that said, let's now take some training wheels off. 560 00:26:50,040 --> 00:26:53,070 So the only reason I bought these training wheels years ago 561 00:26:53,070 --> 00:26:57,870 is to make this very dramatic point of now taking the training wheels off 562 00:26:57,870 --> 00:26:59,660 today. 563 00:26:59,660 --> 00:27:01,470 OK, so what does this mean? 564 00:27:01,470 --> 00:27:03,300 Well worth the trip to Target. 565 00:27:03,300 --> 00:27:04,350 So what does this mean? 566 00:27:04,350 --> 00:27:07,110 For the past few weeks, we have been using a whole bunch 567 00:27:07,110 --> 00:27:09,310 of functions from CS50's library. 568 00:27:09,310 --> 00:27:12,500 All of these were meant to just make it pretty easy, relatively speaking, 569 00:27:12,500 --> 00:27:14,712 in the first few weeks to get input from the user. 570 00:27:14,712 --> 00:27:16,420 Because it turns out, as we'll see today, 571 00:27:16,420 --> 00:27:20,910 it's actually a kind of a pain in the neck to get input from users in C, 572 00:27:20,910 --> 00:27:23,834 and frankly, even in other languages reliably. 573 00:27:23,834 --> 00:27:27,000 Because you'll recall that get_string and get_int and all of these functions 574 00:27:27,000 --> 00:27:30,180 take on the burden of like re-prompting the user if they don't actually 575 00:27:30,180 --> 00:27:32,130 give you an int or don't give you a float 576 00:27:32,130 --> 00:27:34,880 or don't give you a char that you're expecting, they'll re-prompt, 577 00:27:34,880 --> 00:27:37,296 they're using a while loop or a do-while loop or the like, 578 00:27:37,296 --> 00:27:40,200 so there's just a lot of error detection built into these functions.
579 00:27:40,200 --> 00:27:44,370 But, most importantly-- and most misleadingly, 580 00:27:44,370 --> 00:27:46,470 has been the last one on this list. 581 00:27:46,470 --> 00:27:50,400 Recall that we introduced a couple weeks ago now the notion of a string. 582 00:27:50,400 --> 00:27:53,570 And a string is in English what? 583 00:27:53,570 --> 00:27:54,780 An array of characters, good. 584 00:27:54,780 --> 00:27:57,300 It's a sequence of characters, and we learned last week that a sequence can 585 00:27:57,300 --> 00:27:59,758 be implemented in an array, which is just a chunk of memory 586 00:27:59,758 --> 00:28:01,290 back-to-back-to-back-to-back. 587 00:28:01,290 --> 00:28:06,120 So string, though, is not quite like any of those other data types. 588 00:28:06,120 --> 00:28:10,800 It turns out that it's not quite like int or char or even bool or float, 589 00:28:10,800 --> 00:28:13,480 and we can start to see that now as follows. 590 00:28:13,480 --> 00:28:15,522 I'm going to go ahead and go into the IDE today-- 591 00:28:15,522 --> 00:28:17,813 and henceforth we're going to just start using the IDE, 592 00:28:17,813 --> 00:28:20,942 but you're welcome to keep using the Sandbox for quick and dirty programs, 593 00:28:20,942 --> 00:28:22,650 but for anything you want to keep around, 594 00:28:22,650 --> 00:28:24,937 your instinct should now be to open your IDE. 595 00:28:24,937 --> 00:28:26,770 I'm going to go ahead and create a new file, 596 00:28:26,770 --> 00:28:31,710 and I'm going to call it compare0.c from my first example of comparing things. 597 00:28:31,710 --> 00:28:34,860 And I'm going to go ahead and whip up a relatively short program 598 00:28:34,860 --> 00:28:37,130 that you would hope would work right out of the box. 599 00:28:37,130 --> 00:28:40,230 So I'm going to go ahead and include the familiar cs50.h. 600 00:28:40,230 --> 00:28:42,540 I'm going to go include stdio.h. 601 00:28:42,540 --> 00:28:44,710 I'm going to go ahead and do int main void. 
602 00:28:44,710 --> 00:28:46,390 I'm going to go ahead and in here-- 603 00:28:46,390 --> 00:28:49,800 let me get a variable called i using get_int from the user, 604 00:28:49,800 --> 00:28:51,750 and just prompt them for i. 605 00:28:51,750 --> 00:28:55,080 Let me go ahead then and prompt the user for another get_int. 606 00:28:55,080 --> 00:28:57,292 We'll call it j and get that from them. 607 00:28:57,292 --> 00:28:59,000 And then let's just compare these things. 608 00:28:59,000 --> 00:29:03,120 So if i equals-equals j, then go ahead and print out 609 00:29:03,120 --> 00:29:05,730 with printf same and a new line. 610 00:29:05,730 --> 00:29:10,870 Else go ahead and print out the opposite, which is different. 611 00:29:10,870 --> 00:29:13,920 So the only place I think I could have screwed up, perhaps, 612 00:29:13,920 --> 00:29:16,200 is if I did this, which is kind of reasonable if you 613 00:29:16,200 --> 00:29:17,783 come in knowing what an equal sign is. 614 00:29:17,783 --> 00:29:20,250 But again, in code, we typically need two equal signs 615 00:29:20,250 --> 00:29:21,710 because that compares two values. 616 00:29:21,710 --> 00:29:24,750 So I didn't make that mistake, I'm feeling pretty good about this. 617 00:29:24,750 --> 00:29:28,170 Let me save it with Command-S or Control-S or via File, 618 00:29:28,170 --> 00:29:31,840 Save; go to my prompt and run make compare0. 619 00:29:31,840 --> 00:29:33,450 Good, everything compiled. 620 00:29:33,450 --> 00:29:38,760 And let me go ahead and run compare0, Enter, and I'll type in 50, 621 00:29:38,760 --> 00:29:42,240 and I'll type in 50, and they do seem to be the same. 622 00:29:42,240 --> 00:29:46,532 Let me go ahead and do that again, let's type in 42 and 13, 623 00:29:46,532 --> 00:29:47,490 and they are different.
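The comparison just described comes down to a single `==` between two integers. The sketch below isolates that logic in plain C so that it stands alone without the CS50 library. `verdict` and `print_verdict` are hypothetical names; the lecture's own program reads `i` and `j` with `get_int` inside `main` instead.

```c
#include <stdio.h>

// Returns the word the program would print for a pair of integers.
const char *verdict(int i, int j)
{
    // Two equals signs compare the values; a single one would assign.
    if (i == j)
    {
        return "same";
    }
    else
    {
        return "different";
    }
}

// Prints the verdict followed by a newline.
void print_verdict(int i, int j)
{
    printf("%s\n", verdict(i, j));
}
```

Calling `print_verdict(50, 50)` and `print_verdict(42, 13)` would mirror the two runs typed in at the prompt.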
624 00:29:47,490 --> 00:29:50,698 And I should probably test a few more, maybe some negative values, maybe some 625 00:29:50,698 --> 00:29:52,609 0's, positive values and the like, but I'm 626 00:29:52,609 --> 00:29:54,900 feeling pretty good about the correctness of this code. 627 00:29:54,900 --> 00:29:55,500 All right. 628 00:29:55,500 --> 00:29:57,214 So let's change this program a bit. 629 00:29:57,214 --> 00:29:59,130 Let me go ahead and create another file, which 630 00:29:59,130 --> 00:30:02,470 I can do with the little green plus or via File, New File. 631 00:30:02,470 --> 00:30:04,670 I'm going to go ahead save this one as compare1.c. 632 00:30:04,670 --> 00:30:08,880 And for the moment I'm going to go ahead and just paste in that code 633 00:30:08,880 --> 00:30:11,340 from before, but I'm going to make some changes now. 634 00:30:11,340 --> 00:30:16,170 I'm going to go ahead and rename and retype my data types as strings. 635 00:30:16,170 --> 00:30:18,840 So give me a string called s, and we'll prompt the user 636 00:30:18,840 --> 00:30:21,000 for that using get_string, then I'm going 637 00:30:21,000 --> 00:30:23,880 to go ahead and change this one to string t, 638 00:30:23,880 --> 00:30:25,860 and I'm going to go ahead and get get_string. 639 00:30:25,860 --> 00:30:30,180 I, of course, need to now compare s and t, not i and j. 640 00:30:30,180 --> 00:30:33,510 And s is a common variable name for a string. t just comes after s, 641 00:30:33,510 --> 00:30:36,860 so that's pretty reasonable too, but I should of course update that as well. 642 00:30:36,860 --> 00:30:39,390 And so I think everything's now the same logically. 643 00:30:39,390 --> 00:30:41,940 I just changed my data types and my variable names. 644 00:30:41,940 --> 00:30:42,930 So I've saved this. 645 00:30:42,930 --> 00:30:45,380 Let me go ahead and run make compare1. 646 00:30:45,380 --> 00:30:47,110 Good, everything's correct.
647 00:30:47,110 --> 00:30:51,630 Let me go ahead and do ./compare1. 648 00:30:51,630 --> 00:30:56,280 Let me go ahead and type in Brian and Veronica. 649 00:30:56,280 --> 00:30:58,380 And of course, those are different. 650 00:30:58,380 --> 00:31:01,770 Now let me go ahead and type in David, let me type in David again, 651 00:31:01,770 --> 00:31:05,591 and those of course are different? 652 00:31:05,591 --> 00:31:06,090 Huh. 653 00:31:06,090 --> 00:31:08,470 Maybe it's because I just hit the Spacebar or something. 654 00:31:08,470 --> 00:31:11,554 So let's try Erin. 655 00:31:11,554 --> 00:31:12,720 Her name's a little shorter. 656 00:31:12,720 --> 00:31:13,800 Hmm. 657 00:31:13,800 --> 00:31:16,530 OK, let's try-- oh, what's her name? 658 00:31:16,530 --> 00:31:17,040 TJ. 659 00:31:17,040 --> 00:31:19,000 OK, even shorter, perfect. 660 00:31:19,000 --> 00:31:21,281 TJ, can't go wrong. 661 00:31:21,281 --> 00:31:21,780 Different. 662 00:31:21,780 --> 00:31:23,140 I mean, what is going on? 663 00:31:23,140 --> 00:31:25,590 Let's just say i, i. 664 00:31:25,590 --> 00:31:27,010 Different? 665 00:31:27,010 --> 00:31:29,650 So where's the logical bug in this program? 666 00:31:34,070 --> 00:31:36,460 What is it that's going on? 667 00:31:36,460 --> 00:31:37,482 Yeah, what do you think? 668 00:31:37,482 --> 00:31:39,190 AUDIENCE: Is it comparing integer values? 669 00:31:39,190 --> 00:31:40,570 DAVID MALAN: Is it comparing integer values? 670 00:31:40,570 --> 00:31:41,140 Well maybe. 671 00:31:41,140 --> 00:31:43,150 I mean, thus far when we've used equal-equals 672 00:31:43,150 --> 00:31:45,520 we've probably used it mostly for comparing integers, 673 00:31:45,520 --> 00:31:47,560 so maybe I'm just misusing it, sure. 674 00:31:47,560 --> 00:31:48,862 Other thoughts? 675 00:31:48,862 --> 00:31:51,687 AUDIENCE: [INAUDIBLE] 676 00:31:51,687 --> 00:31:54,770 DAVID MALAN: Oh, that's a big word that we'll get to in just a little bit. 
677 00:31:54,770 --> 00:31:58,240 But correct, correct-- but for very similar reasons. 678 00:31:58,240 --> 00:32:02,650 So something's going on logically involving comparison, 679 00:32:02,650 --> 00:32:06,580 because I'm using equal-equal, but maybe I'm using it for the wrong data types? 680 00:32:06,580 --> 00:32:09,580 I mean, it's clearly broken for strings. 681 00:32:09,580 --> 00:32:11,890 So why might that actually be? 682 00:32:11,890 --> 00:32:16,450 Well it turns out that strings don't actually exist. 683 00:32:16,450 --> 00:32:19,150 So a string that we know is just a sequence of characters 684 00:32:19,150 --> 00:32:22,720 or an array of characters is not an actual data type. 685 00:32:22,720 --> 00:32:27,440 int is, float is, double is, long is, bool is, and even more 686 00:32:27,440 --> 00:32:28,870 are actual data types. 687 00:32:28,870 --> 00:32:30,970 String is kind of a little white lie we've 688 00:32:30,970 --> 00:32:35,170 been telling for a few weeks that's implemented only in the CS50 library. 689 00:32:35,170 --> 00:32:37,350 Now the word string is super common in programming. 690 00:32:37,350 --> 00:32:40,516 Like every programmer out there will know what you mean when you say string. 691 00:32:40,516 --> 00:32:44,740 That is not a CS50 word, but our use of it in C is CS50-specific. 692 00:32:44,740 --> 00:32:47,590 Because in that file called cs50.h, in addition 693 00:32:47,590 --> 00:32:50,290 to declaring functions like get_string and get_int and get_float 694 00:32:50,290 --> 00:32:53,860 and a bunch of other things, we also have a special line that says, 695 00:32:53,860 --> 00:32:57,760 create a data type called string. 696 00:32:57,760 --> 00:33:00,860 But what does it actually do or what does it actually mean? 697 00:33:00,860 --> 00:33:04,090 Well let's go ahead and consider what might be going on underneath the hood 698 00:33:04,090 --> 00:33:04,850 here. 
699 00:33:04,850 --> 00:33:08,950 So if I go ahead and draw the program that we just 700 00:33:08,950 --> 00:33:12,490 ran, that program compare1 gets a string s from the user, 701 00:33:12,490 --> 00:33:15,730 then gets a string t from the user, and then compares them. 702 00:33:15,730 --> 00:33:18,500 So we know from last week what a string is, it's just an array. 703 00:33:18,500 --> 00:33:22,520 So when I run that first line of code and get a string from the user-- 704 00:33:22,520 --> 00:33:28,270 for instance, Brian, I'm going to go ahead and see a B-R-I-A-N, 705 00:33:28,270 --> 00:33:33,415 which we know from last week to actually be an array of memory that might look 706 00:33:33,415 --> 00:33:36,040 pictorially like this-- and this, too, is a bit of a white lie, 707 00:33:36,040 --> 00:33:37,704 there's something else. 708 00:33:37,704 --> 00:33:38,560 AUDIENCE: The null. 709 00:33:38,560 --> 00:33:41,350 DAVID MALAN: Yeah, the null character, so to speak, N-U-L, 710 00:33:41,350 --> 00:33:45,540 which we typically just write with a backslash 0, which is just all 0 bits. 711 00:33:45,540 --> 00:33:49,390 And it turns out, you might recall from the debugger earlier, you saw this-- 712 00:33:49,390 --> 00:33:52,480 that's the even more cryptic way of expressing the null character, 713 00:33:52,480 --> 00:33:53,350 backslash 0. 714 00:33:53,350 --> 00:33:55,940 Just different programs display it in different ways. 715 00:33:55,940 --> 00:34:00,200 So when I get_string and type in Brian, this is what's allocated in memory. 716 00:34:00,200 --> 00:34:05,922 And when I type Veronica, I can see a V-E-R-O-N-I-C-A. 717 00:34:05,922 --> 00:34:07,630 I'm going to get that right preemptively. 718 00:34:07,630 --> 00:34:08,850 Backslash 0. 719 00:34:08,850 --> 00:34:12,190 That, too, is a chunk of memory, which I'll draw like this. 720 00:34:12,190 --> 00:34:16,989 1, 2, and split these up into individual characters or bytes.
721 00:34:16,989 --> 00:34:20,380 And recall from last time that these bytes just come from my memory, 722 00:34:20,380 --> 00:34:23,469 and that memory just has a bunch of bytes in it, maybe millions or even 723 00:34:23,469 --> 00:34:24,699 billions these days. 724 00:34:24,699 --> 00:34:26,830 And so honestly, if you just have that many things, 725 00:34:26,830 --> 00:34:29,290 any human or computer can certainly number them. 726 00:34:29,290 --> 00:34:31,600 Like this is byte 1, 2, 3, 4. 727 00:34:31,600 --> 00:34:34,030 So let's just assume for the sake of discussion 728 00:34:34,030 --> 00:34:36,610 that out of context of my computer's hardware, 729 00:34:36,610 --> 00:34:46,659 Brian just ended up at location 100, and location 101, and 102, 103, 104, 105. 730 00:34:46,659 --> 00:34:49,179 So this is the 100th byte in my computer, 731 00:34:49,179 --> 00:34:51,310 this is the 105th byte in my computer, and Brian 732 00:34:51,310 --> 00:34:53,100 is using that many characters in total. 733 00:34:53,100 --> 00:34:55,030 Veronica, she ended up somewhere else. 734 00:34:55,030 --> 00:35:02,710 Maybe she ended up farther away just because at location 900, 901, 902, 903, 735 00:35:02,710 --> 00:35:09,910 904, 905, 906-- a lot more memory, 907, and 908-- 736 00:35:09,910 --> 00:35:14,010 but you can see even more visually now that the length of Brian's name-- 737 00:35:14,010 --> 00:35:18,398 strlen of Brian is what? 738 00:35:18,398 --> 00:35:21,230 AUDIENCE: [INAUDIBLE] 739 00:35:21,230 --> 00:35:22,950 DAVID MALAN: I hear five and I hear six. 740 00:35:22,950 --> 00:35:24,330 The length of Brian's name-- 741 00:35:24,330 --> 00:35:25,835 Brian, how long is your name? 742 00:35:25,835 --> 00:35:26,460 AUDIENCE: Five.
743 00:35:26,460 --> 00:35:28,834 DAVID MALAN: OK, it is definitively five characters, that 744 00:35:28,834 --> 00:35:31,530 is the length of Brian's name, but you have 745 00:35:31,530 --> 00:35:35,370 to appreciate that in the computer, Brian's five-character name does indeed 746 00:35:35,370 --> 00:35:36,270 take up six bytes. 747 00:35:36,270 --> 00:35:39,750 So both answers are kind of correct, but the length of the string henceforth 748 00:35:39,750 --> 00:35:41,700 is always the number of actual characters. 749 00:35:41,700 --> 00:35:45,720 The amount of space it takes up is that plus 1 for the null character. 750 00:35:45,720 --> 00:35:49,710 So you can actually see why Brian's name takes up six bytes in this picture 751 00:35:49,710 --> 00:35:52,380 rather than just the actual length, which is five. 752 00:35:52,380 --> 00:35:55,620 So when you call get_string now, and when you call 753 00:35:55,620 --> 00:35:57,480 get_string and get another string-- 754 00:35:57,480 --> 00:36:01,740 Brian and Veronica respectively, what is actually being handed back? 755 00:36:01,740 --> 00:36:04,350 A couple weeks ago, Erin came up and she kind of like 756 00:36:04,350 --> 00:36:07,200 handed me back a string, a student's name from the audience. 757 00:36:07,200 --> 00:36:11,970 On that piece of paper we thought was the student's name. 758 00:36:11,970 --> 00:36:13,030 But it's not. 759 00:36:13,030 --> 00:36:15,750 It turns out that when a function returns a value, 760 00:36:15,750 --> 00:36:20,310 it can pretty much only return a 1 byte or maybe 2 or 4 bytes. 761 00:36:20,310 --> 00:36:25,450 It can't return an arbitrary number of bytes, like six for Brian or 1, 2, 3, 762 00:36:25,450 --> 00:36:29,460 4, 5, 6, 7, 8, 9-- it cannot return 9 bytes for Veronica. 763 00:36:29,460 --> 00:36:32,850 And if you even type a whole paragraph or page of text, 764 00:36:32,850 --> 00:36:37,600 it can't return all of that text, it can only return a single value. 
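The distinction between a string's length and the space it takes up can be checked directly. This is a minimal sketch that assumes only the standard `strlen` from `string.h`; `bytes_needed` is a hypothetical helper name, not something from the lecture or the CS50 library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Number of bytes a string occupies in memory: its visible characters
// plus one more for the terminating '\0' that marks where it ends.
size_t bytes_needed(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}
```

For "Brian", `strlen` reports 5 while `bytes_needed` reports 6; for "Veronica" the figures are 8 and 9, matching the boxes numbered 900 through 908 in the picture.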
765 00:36:37,600 --> 00:36:40,440 So to your instinct earlier, what might actually 766 00:36:40,440 --> 00:36:44,730 be getting returned by get_string when the human has 767 00:36:44,730 --> 00:36:47,726 typed in a name like Brian or Veronica? 768 00:36:47,726 --> 00:36:49,495 AUDIENCE: [INAUDIBLE] 769 00:36:49,495 --> 00:36:50,870 DAVID MALAN: The memory location. 770 00:36:50,870 --> 00:36:53,510 Indeed, an integer, or as you called it, a pointer, 771 00:36:53,510 --> 00:36:55,770 which we'll introduce more formally in just a moment. 772 00:36:55,770 --> 00:36:58,820 So when get_string returns "Brian," quote-unquote, 773 00:36:58,820 --> 00:37:05,570 it's actually not returning B-R-I-A-N backslash 0, it is just returning 100. 774 00:37:05,570 --> 00:37:08,360 And when get_string returns Veronica, it's not returning her name, 775 00:37:08,360 --> 00:37:10,580 it's returning 900. 776 00:37:10,580 --> 00:37:13,780 And so if you realize that now, when you ask, does 777 00:37:13,780 --> 00:37:19,820 s equal-equal t, what question more mundanely are you actually asking? 778 00:37:19,820 --> 00:37:20,960 Yeah. 779 00:37:20,960 --> 00:37:24,200 Memory location and memory location-- does 100 equal 900? 780 00:37:24,200 --> 00:37:25,670 And obviously not. 781 00:37:25,670 --> 00:37:28,680 And so that is why Brian's name, Veronica's name, 782 00:37:28,680 --> 00:37:32,870 my name, TJ's name-- every word I typed in was of course different, 783 00:37:32,870 --> 00:37:36,560 because each input was ending up at a different location in memory. 784 00:37:36,560 --> 00:37:40,509 And even if I typed the same word like David twice, one David was going here, 785 00:37:40,509 --> 00:37:42,800 one David was going somewhere else, they were ending up 786 00:37:42,800 --> 00:37:44,050 at different memory locations. 787 00:37:44,050 --> 00:37:46,460 Maybe 100, maybe 900, maybe something else, 788 00:37:46,460 --> 00:37:48,920 but they were ending up in different locations in memory.
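The address comparison described above can be reproduced without `get_string`. In this sketch, which uses only standard C, two separate arrays hold the same five letters. The function names are hypothetical, and the real addresses on any given machine will not be the 100 and 900 used in the picture.

```c
#include <stdbool.h>

// Compares where two strings live in memory, which is all that ==
// does when it is handed two strings.
bool same_address(const char *s, const char *t)
{
    return s == t;
}

// Two distinct arrays with identical contents, like typing David twice.
bool two_davids_share_an_address(void)
{
    char first[] = "David";
    char second[] = "David";
    // Each array has its own location, so the addresses differ
    // even though the letters match.
    return same_address(first, second);
}
```

`two_davids_share_an_address()` returns false for the same reason the lecture's program printed "different": the letters are never looked at, only the two locations.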
789 00:37:48,920 --> 00:37:51,890 So equal-equals does compare values, but dammit 790 00:37:51,890 --> 00:37:54,280 if it isn't comparing the wrong values. 791 00:37:54,280 --> 00:37:54,780 Yeah? 792 00:37:54,780 --> 00:37:56,907 AUDIENCE: Well what if you use some char*s? 793 00:37:56,907 --> 00:37:58,740 DAVID MALAN: Ah, so we'll come back to that. 794 00:37:58,740 --> 00:38:00,490 Let me come back to that in just a moment. 795 00:38:00,490 --> 00:38:02,689 char* is actually intricately related. 796 00:38:02,689 --> 00:38:03,730 More on that in a moment. 797 00:38:03,730 --> 00:38:04,230 Yeah? 798 00:38:04,230 --> 00:38:06,170 AUDIENCE: If you add two integers in memory-- 799 00:38:06,170 --> 00:38:06,658 DAVID MALAN: Uh huh? 800 00:38:06,658 --> 00:38:09,098 AUDIENCE: Wouldn't they be in different places in memory? 801 00:38:09,098 --> 00:38:11,050 So you would return-- 802 00:38:11,050 --> 00:38:12,727 so you need a different value. 803 00:38:12,727 --> 00:38:14,310 DAVID MALAN: OK, really good question. 804 00:38:14,310 --> 00:38:19,700 So wait a minute, this same logic that I'm returning the address of something 805 00:38:19,700 --> 00:38:23,750 surely applies to integers as well or floating point values as well? 806 00:38:23,750 --> 00:38:25,910 Because if I type in the number 50 like I 807 00:38:25,910 --> 00:38:29,960 did earlier, that, too, is somewhere in memory-- like a box in memory, 808 00:38:29,960 --> 00:38:32,780 and that, too, has an address somewhere in memory, 809 00:38:32,780 --> 00:38:36,135 but it turns out, for reasons that you just alluded to, actually, 810 00:38:36,135 --> 00:38:38,540 ints are returned as their values. 811 00:38:38,540 --> 00:38:40,740 Chars are returned as their values. 812 00:38:40,740 --> 00:38:42,200 Bools are returned as their values. 813 00:38:42,200 --> 00:38:43,700 Floats are returned as their values. 814 00:38:43,700 --> 00:38:45,260 Strings are different. 
815 00:38:45,260 --> 00:38:48,920 Strings are returned by their address. 816 00:38:48,920 --> 00:38:54,080 And those addresses, it turns out, are ultimately going to be called 817 00:38:54,080 --> 00:38:56,700 char*'s, which we'll see in just a moment. 818 00:38:56,700 --> 00:38:59,780 So how do we go about then fixing this fundamentally? 819 00:38:59,780 --> 00:39:03,170 Like even if you have no idea how to code this yet, just intuitively, 820 00:39:03,170 --> 00:39:06,030 if I do actually want to delete-- 821 00:39:06,030 --> 00:39:09,670 if I do actually want to compare-- 822 00:39:09,670 --> 00:39:10,170 sorry. 823 00:39:13,530 --> 00:39:14,050 OK. 824 00:39:14,050 --> 00:39:19,180 If I do want to go ahead and compare Brian and Veronica for equality, 825 00:39:19,180 --> 00:39:21,370 what do I want to do intuitively? 826 00:39:21,370 --> 00:39:23,050 I can't just compare their addresses. 827 00:39:23,050 --> 00:39:25,872 What do I need to do? 828 00:39:25,872 --> 00:39:27,955 Isolate the characters and then do what with them? 829 00:39:27,955 --> 00:39:30,820 AUDIENCE: [INAUDIBLE] 830 00:39:30,820 --> 00:39:31,570 DAVID MALAN: Good. 831 00:39:31,570 --> 00:39:32,445 Yeah, good instincts. 832 00:39:32,445 --> 00:39:35,260 Use a for loop, use a while loop-- any kind of looping structure. 833 00:39:35,260 --> 00:39:37,184 And intuitively, compare the first characters, 834 00:39:37,184 --> 00:39:40,350 and if they're different, well then we know we don't have to go any further. 835 00:39:40,350 --> 00:39:43,147 B is not a V, so surely these names are different. 836 00:39:43,147 --> 00:39:44,230 But what about in my case? 837 00:39:44,230 --> 00:39:46,870 If it was David and David, you would compare the first two. 838 00:39:46,870 --> 00:39:48,460 D and D are the same. 839 00:39:48,460 --> 00:39:50,710 Compare the second two, A and A are the same. 840 00:39:50,710 --> 00:39:55,570 V and V, I and I, D and D, and then what am I going to hit last? 
841 00:39:55,570 --> 00:39:56,500 Null character. 842 00:39:56,500 --> 00:39:58,660 And should I keep going beyond the null character? 843 00:39:58,660 --> 00:39:59,160 No. 844 00:39:59,160 --> 00:40:02,250 So this is the beauty of that super simple design for a string. 845 00:40:02,250 --> 00:40:07,750 Insofar as strings are identified by their starting address, just the byte 846 00:40:07,750 --> 00:40:10,390 at which they start, you still need to know 847 00:40:10,390 --> 00:40:14,650 how long they are, because otherwise how do you know where one word begins and ends 848 00:40:14,650 --> 00:40:16,330 and another word begins? 849 00:40:16,330 --> 00:40:20,320 And so the simple decision we made last week-- as did humans decades ago-- 850 00:40:20,320 --> 00:40:25,600 to terminate all strings with backslash 0 or all 0's is a super handy trick, 851 00:40:25,600 --> 00:40:28,630 so that if I tell you that Brian starts at 100, 852 00:40:28,630 --> 00:40:31,120 you can infer that he ends where? 853 00:40:33,800 --> 00:40:37,400 At byte number 105 or 104, if you will, however you want to think about it, 854 00:40:37,400 --> 00:40:40,610 because all you need to do in linear time, 855 00:40:40,610 --> 00:40:43,640 if you will, left to right, is check-- backslash 0, backslash 0-- ah! 856 00:40:43,640 --> 00:40:46,790 Backslash 0, now I know how long Brian's name is. 857 00:40:46,790 --> 00:40:49,880 So let's consider for a moment this program called string length. 858 00:40:49,880 --> 00:40:52,460 How does strlen actually work? 859 00:40:52,460 --> 00:40:57,140 When you pass to strlen, a variable containing a string, like Brian, 860 00:40:57,140 --> 00:41:00,299 what is strlen probably doing? 861 00:41:00,299 --> 00:41:02,345 AUDIENCE: [INAUDIBLE] 862 00:41:02,345 --> 00:41:03,220 DAVID MALAN: Exactly.
863 00:41:03,220 --> 00:41:05,740 It's looking at that null character's address 864 00:41:05,740 --> 00:41:09,340 and subtracting the start address and the end address, 865 00:41:09,340 --> 00:41:12,160 figuring out what the difference is, and actually returning 866 00:41:12,160 --> 00:41:14,890 that minus 1 the total count. 867 00:41:14,890 --> 00:41:16,940 And more mechanically, we'll see in a moment, 868 00:41:16,940 --> 00:41:19,090 it's probably doing exactly the same thing I did, 869 00:41:19,090 --> 00:41:20,357 which is, is this backslash 0? 870 00:41:20,357 --> 00:41:21,190 Is this backslash 0? 871 00:41:21,190 --> 00:41:22,780 Is this, is this, is this? 872 00:41:22,780 --> 00:41:25,840 I asked that question five times before I saw backslash 0. 873 00:41:25,840 --> 00:41:29,320 strlen is just a function some human wrote years ago 874 00:41:29,320 --> 00:41:31,930 that probably just has a simple for loop and an if condition, 875 00:41:31,930 --> 00:41:33,220 and then that's it. 876 00:41:33,220 --> 00:41:35,320 Because that person understood before we even 877 00:41:35,320 --> 00:41:39,050 did how strings are actually implemented. 878 00:41:39,050 --> 00:41:41,047 Any questions then? 879 00:41:41,047 --> 00:41:42,880 All right, so let's actually implement this. 880 00:41:42,880 --> 00:41:46,930 Let me go ahead and into my editor here, and make one other example here 881 00:41:46,930 --> 00:41:48,730 that I'm going to call compare2. 882 00:41:48,730 --> 00:41:55,150 I'm going to go ahead and do include cs50.h and include stdio.h, 883 00:41:55,150 --> 00:41:57,940 and then I'm going to do int main void, and I'm 884 00:41:57,940 --> 00:42:03,040 going to quickly now grab my code from before where I got strings 885 00:42:03,040 --> 00:42:06,380 and I compared them, but I have to obviously fix that comparison. 886 00:42:06,380 --> 00:42:08,290 So here's my code from before. 887 00:42:08,290 --> 00:42:10,130 I'm going to do this the right way. 
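The "is this backslash 0?" walk described above can be sketched as follows. This is an illustrative re-implementation under the name my_strlen, which is an assumption made here; it is not the actual source of the library's strlen, which may be written differently.

```c
#include <stddef.h>

// Hypothetical stand-in for strlen: counts characters until the
// terminating '\0' is reached. The '\0' itself is not counted.
size_t my_strlen(const char *s)
{
    size_t n = 0;

    // Ask "is this backslash 0?" one byte at a time, left to right.
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

For "Brian" the loop asks the question at five letters before it reaches the terminator, so the result is the length, not the six bytes the string occupies.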
888 00:42:10,130 --> 00:42:14,630 I'm going to call a function called compare_strings passing in s and t. 889 00:42:14,630 --> 00:42:16,870 Because as you proposed, we need to do some logic. 890 00:42:16,870 --> 00:42:18,910 We don't have to pass it to a function, but we could. 891 00:42:18,910 --> 00:42:20,618 We could just do a for loop here, but I'm 892 00:42:20,618 --> 00:42:23,600 going to go ahead and implement compare_strings as follows. 893 00:42:23,600 --> 00:42:28,150 If I want to write a function that returns a yes/no answer, what data type 894 00:42:28,150 --> 00:42:29,890 should it return? 895 00:42:29,890 --> 00:42:30,490 A bool. 896 00:42:30,490 --> 00:42:32,800 So we've not necessarily done this yet, but you 897 00:42:32,800 --> 00:42:36,280 can return a bool just like you can an int or a char or something else. 898 00:42:36,280 --> 00:42:38,500 I'm going to call this function compare_strings. 899 00:42:38,500 --> 00:42:42,570 It's going to take in one string called a and another string called b, 900 00:42:42,570 --> 00:42:44,600 but I could call those anything I want. 901 00:42:44,600 --> 00:42:47,740 And now what's the easiest thing to check? 902 00:42:47,740 --> 00:42:50,812 If I pass two strings, a and b, or Brian and Veronica, 903 00:42:50,812 --> 00:42:53,770 what's the easiest question you can ask and just immediately say, nope, 904 00:42:53,770 --> 00:42:55,235 these are different? 905 00:42:55,235 --> 00:42:56,110 String length, right? 906 00:42:56,110 --> 00:43:00,190 Like if the B-R-I-A-N is not of the same length as Veronica's name, 907 00:43:00,190 --> 00:43:02,830 we don't need to do any logic whatsoever beyond that, 908 00:43:02,830 --> 00:43:04,542 we can just quit and say false. 909 00:43:04,542 --> 00:43:05,500 So let me just do that. 910 00:43:05,500 --> 00:43:10,430 If the strlen of a does not equal the strlen of b, you know what? 911 00:43:10,430 --> 00:43:13,300 Let's just go ahead and return false and get out of here.
912 00:43:13,300 --> 00:43:17,200 OK, but now, if we get past that gateway, so to speak, 913 00:43:17,200 --> 00:43:19,780 that check, that question, that Boolean expression, 914 00:43:19,780 --> 00:43:23,182 now I have to compare things character by character by character. 915 00:43:23,182 --> 00:43:26,390 So I can do this in a bunch of ways, but I like the suggestion of a for loop. 916 00:43:26,390 --> 00:43:30,610 So for int i at 0, n for efficiency-- actually, 917 00:43:30,610 --> 00:43:33,720 let's do i is less than the string length-- 918 00:43:33,720 --> 00:43:36,825 should I do the string length of a or b? 919 00:43:36,825 --> 00:43:38,380 And it doesn't matter, right? 920 00:43:38,380 --> 00:43:39,430 So let's go with a. 921 00:43:39,430 --> 00:43:41,560 And frankly, had I been smart early on, I 922 00:43:41,560 --> 00:43:44,110 could have stored the value in a variable and then reused it, 923 00:43:44,110 --> 00:43:45,820 but we'll just keep going ahead for now. 924 00:43:45,820 --> 00:43:48,590 Then i plus-plus, but I remember from last time-- this is correct, 925 00:43:48,590 --> 00:43:49,710 but this is not good design. 926 00:43:49,710 --> 00:43:50,209 Why? 927 00:43:52,772 --> 00:43:55,980 Yeah, I keep calling strlen again and again, because remember, in a for loop, 928 00:43:55,980 --> 00:43:58,110 this condition is checked again and again 929 00:43:58,110 --> 00:44:00,160 and again-- you're just wasting your own time. 930 00:44:00,160 --> 00:44:02,970 So let me go ahead and actually do this. 931 00:44:02,970 --> 00:44:09,270 n or any variable equals the strlen of a, then just compare i against n, 932 00:44:09,270 --> 00:44:12,960 because now i is getting incremented, but n is never changing. 933 00:44:12,960 --> 00:44:15,250 So now let me go ahead and implement this for loop. 
934 00:44:15,250 --> 00:44:21,510 So if-- how about the i-th character of a does not equal the i-th character 935 00:44:21,510 --> 00:44:24,780 of b, I can immediately conclude-- 936 00:44:24,780 --> 00:44:28,410 nope, these strings can't be the same, because some letter, like a B, 937 00:44:28,410 --> 00:44:31,440 is not the same as another, like a V, or whatever letter we're actually 938 00:44:31,440 --> 00:44:32,580 comparing. 939 00:44:32,580 --> 00:44:34,380 And then I think that's it. 940 00:44:34,380 --> 00:44:37,590 If I get through these gauntlets of questions-- 941 00:44:37,590 --> 00:44:38,850 are your lengths different? 942 00:44:38,850 --> 00:44:40,100 Are your characters different? 943 00:44:40,100 --> 00:44:45,370 And I still haven't said false, what should I return by default? 944 00:44:45,370 --> 00:44:45,870 Yeah. 945 00:44:45,870 --> 00:44:49,060 Like if you make it through all of those questions and all is well, 946 00:44:49,060 --> 00:44:54,040 then D-A-V-I-D must indeed equal D-A-V-I-D or whatever the user actually 947 00:44:54,040 --> 00:44:54,920 typed in. 948 00:44:54,920 --> 00:44:56,140 Now I'm not quite done yet. 949 00:44:56,140 --> 00:44:58,780 When I've implemented a function or a helper function 950 00:44:58,780 --> 00:45:01,030 like this, because it's helping me do my work, 951 00:45:01,030 --> 00:45:02,680 what else do I have to add to the file? 952 00:45:02,680 --> 00:45:03,077 Oh? 953 00:45:03,077 --> 00:45:04,270 AUDIENCE: I've got a logical question. 954 00:45:04,270 --> 00:45:05,020 DAVID MALAN: Sure. 955 00:45:05,020 --> 00:45:08,642 AUDIENCE: In a computer, couldn't you just type in David with a capital D 956 00:45:08,642 --> 00:45:11,546 and then david with a lowercase d, you're going to run [INAUDIBLE], 957 00:45:11,546 --> 00:45:12,914 they're not going to sync because your first character's not 958 00:45:12,914 --> 00:45:13,785 the same character. 959 00:45:13,785 --> 00:45:14,660 DAVID MALAN: Correct.
960 00:45:14,660 --> 00:45:17,240 So this is a feature, not a bug at the moment. 961 00:45:17,240 --> 00:45:20,040 My program at the moment is case-sensitive. 962 00:45:20,040 --> 00:45:22,790 If I type in DAVID in all caps, that is a different string 963 00:45:22,790 --> 00:45:25,460 I claim for now than david in all lowercase. 964 00:45:25,460 --> 00:45:27,590 If you want to tolerate uppercase and lowercase, 965 00:45:27,590 --> 00:45:29,090 you're going to have to add more logic. 966 00:45:29,090 --> 00:45:32,460 But for now that's a design decision that I intend. 967 00:45:32,460 --> 00:45:32,960 All right. 968 00:45:32,960 --> 00:45:36,540 What else do I need to add to the program? 969 00:45:36,540 --> 00:45:38,057 Yeah, the prototype at top. 970 00:45:38,057 --> 00:45:41,140 You can literally copy and paste-- this is the only time copy and paste is 971 00:45:41,140 --> 00:45:43,210 probably a legitimate thing to do-- 972 00:45:43,210 --> 00:45:45,880 at the top, and then semi-colon-- don't re-implement it. 973 00:45:45,880 --> 00:45:48,550 But I do need one other header file. 974 00:45:48,550 --> 00:45:54,740 I'm using a function that's not in cs50.h or in stdio.h. 975 00:45:54,740 --> 00:45:56,900 String length? 976 00:45:56,900 --> 00:45:58,630 Where was string length? 977 00:45:58,630 --> 00:45:59,840 Yeah, string.h. 978 00:45:59,840 --> 00:46:03,460 So I just need this, include string.h, save. 979 00:46:03,460 --> 00:46:05,510 Now this I think is correct. 980 00:46:05,510 --> 00:46:08,190 We'll see if I eat my words in a moment. 981 00:46:08,190 --> 00:46:10,700 But realize that if you're writing this code yourself, 982 00:46:10,700 --> 00:46:13,940 like this is not a natural thing to be writing a program in office hours 983 00:46:13,940 --> 00:46:16,648 or at home in your dorm and just getting it right the first time.
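The steps narrated above (prototype at the top, a length check, then a character-by-character loop, then true by default) can be sketched roughly like this. It is a reconstruction from the spoken description, not the course's compare2.c file. As assumptions made here so the sketch builds without cs50.h, the parameters are written as const char * rather than string, and the counters are size_t rather than int.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Prototype, so code earlier in the file may call the function.
bool compare_strings(const char *a, const char *b);

bool compare_strings(const char *a, const char *b)
{
    // Store the length once instead of calling strlen on every iteration.
    size_t n = strlen(a);

    // Easiest check first: different lengths means different strings.
    if (n != strlen(b))
    {
        return false;
    }

    // Same length, so compare character by character.
    for (size_t i = 0; i < n; i++)
    {
        if (a[i] != b[i])
        {
            return false;
        }
    }

    // Every check passed, so the strings hold the same characters.
    return true;
}
```

As in the lecture's version, this comparison is case-sensitive: "David" and "david" differ at the very first character.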
984 00:46:16,648 --> 00:46:19,970 This is after like 20 years of doing this, so realize we happen to be-- 985 00:46:19,970 --> 00:46:21,720 and I also have a cheat sheet right here-- 986 00:46:21,720 --> 00:46:23,750 we happen to be doing this correctly often, 987 00:46:23,750 --> 00:46:26,010 but realize that's not going to be the common case. 988 00:46:26,010 --> 00:46:28,490 So with that reassurance in mind, let's see 989 00:46:28,490 --> 00:46:33,140 if I have to now take all that back. make compare2. 990 00:46:33,140 --> 00:46:33,860 OK-- phew. 991 00:46:33,860 --> 00:46:34,700 20 years worked out. 992 00:46:34,700 --> 00:46:37,940 So now I'm going to go ahead and ./compare2. 993 00:46:37,940 --> 00:46:40,850 Let's type in Brian, let's type in Veronica. 994 00:46:40,850 --> 00:46:42,800 Those are indeed still different hopefully. 995 00:46:42,800 --> 00:46:45,680 Now let's try myself, David and David. 996 00:46:45,680 --> 00:46:46,539 Phew! 997 00:46:46,539 --> 00:46:47,330 Those are the same. 998 00:46:47,330 --> 00:46:52,550 And to your point, David in capitalized and David in all lowercase, 999 00:46:52,550 --> 00:46:56,780 different, but that's what I expect now. 1000 00:46:56,780 --> 00:46:58,071 Any questions on compare2? 1001 00:46:58,071 --> 00:46:58,571 Yeah? 1002 00:46:58,571 --> 00:47:01,833 AUDIENCE: [INAUDIBLE] 1003 00:47:01,833 --> 00:47:02,499 DAVID MALAN: OK. 1004 00:47:02,499 --> 00:47:06,743 AUDIENCE: [INAUDIBLE] string in the program and in general. 1005 00:47:06,743 --> 00:47:07,409 DAVID MALAN: OK. 1006 00:47:07,409 --> 00:47:10,207 AUDIENCE: Would that still work [INAUDIBLE] 1007 00:47:10,207 --> 00:47:12,290 DAVID MALAN: If you were to hard code the strings? 1008 00:47:12,290 --> 00:47:14,000 Short answer, yes, that would still work. 1009 00:47:14,000 --> 00:47:19,160 If you for whatever reason did not do this and using get_string, 1010 00:47:19,160 --> 00:47:25,100 but you did David, and here, for instance, David, that would work too. 
1011 00:47:25,100 --> 00:47:28,088 And whatever your error is, if you can recreate it, just let us know. 1012 00:47:28,088 --> 00:47:31,574 AUDIENCE: It seems to be like a string that would be increased 1013 00:47:31,574 --> 00:47:33,566 for a set that was [INAUDIBLE] only? 1014 00:47:33,566 --> 00:47:36,557 And it was having issues in the little [INAUDIBLE].. 1015 00:47:36,557 --> 00:47:39,390 DAVID MALAN: I'd have to see it to be sure, but happy to chat after. 1016 00:47:39,390 --> 00:47:42,090 All right, so let's see if we can't now clean this 1017 00:47:42,090 --> 00:47:45,790 up just a little bit as follows. 1018 00:47:45,790 --> 00:47:51,160 Let me go ahead here and reveal what it is that's actually going on. 1019 00:47:51,160 --> 00:47:53,997 So indeed, there is no such thing as a string. 1020 00:47:53,997 --> 00:47:55,830 And indeed, as you pointed out a moment ago, 1021 00:47:55,830 --> 00:47:57,371 it actually goes by a different name. 1022 00:47:57,371 --> 00:48:01,150 String is just a synonym for what's called a char*. 1023 00:48:01,150 --> 00:48:02,440 Now what does that even mean? 1024 00:48:02,440 --> 00:48:04,106 So char is the same as it's always been. 1025 00:48:04,106 --> 00:48:05,160 It's a single character. 1026 00:48:05,160 --> 00:48:09,820 Star in a program written in C could of course mean multiplication, 1027 00:48:09,820 --> 00:48:10,650 we have seen that. 1028 00:48:10,650 --> 00:48:12,270 This is another use of the star. 1029 00:48:12,270 --> 00:48:15,480 Whenever you see it after a data type like char, 1030 00:48:15,480 --> 00:48:19,500 this means that the data type in question is not just a char, 1031 00:48:19,500 --> 00:48:21,750 it's the address of a char. 1032 00:48:21,750 --> 00:48:25,330 So the star just means the address of whatever the data type is to the left, 1033 00:48:25,330 --> 00:48:27,330 and this is, as you pointed out earlier, what 1034 00:48:27,330 --> 00:48:29,170 we're going to start calling a pointer. 
1035 00:48:29,170 --> 00:48:32,320 A pointer is, for all intents and purposes, an address. 1036 00:48:32,320 --> 00:48:34,650 It's just a buzzword to describe an address. 1037 00:48:34,650 --> 00:48:40,380 This data type here, char*, means I want a variable that doesn't store a char, 1038 00:48:40,380 --> 00:48:42,690 it stores the address of a char. 1039 00:48:42,690 --> 00:48:45,360 The number 100, the number 900. 1040 00:48:45,360 --> 00:48:48,330 But that address is just going to be called a pointer. 1041 00:48:48,330 --> 00:48:52,440 A pointer variable is a variable that stores the address of something. 1042 00:48:52,440 --> 00:48:54,850 A char or even other data types as well. 1043 00:48:54,850 --> 00:49:00,720 So with that in mind, let me actually quickly create compare3.c, paste this 1044 00:49:00,720 --> 00:49:05,670 in, and save it as compare3.c, and let me take off, if you will, 1045 00:49:05,670 --> 00:49:06,750 those training wheels. 1046 00:49:06,750 --> 00:49:10,110 It turns out that when you get a string with get_string, 1047 00:49:10,110 --> 00:49:12,660 it doesn't return a string, per se, because again, 1048 00:49:12,660 --> 00:49:16,620 that word doesn't exist in C, it actually returns a char*. 1049 00:49:16,620 --> 00:49:19,905 And when I call it again here and return another string, it, too, 1050 00:49:19,905 --> 00:49:21,090 returns a char*. 1051 00:49:21,090 --> 00:49:24,190 Now technically the star can have spaces around it. 1052 00:49:24,190 --> 00:49:27,420 Some people write it like this, but the sort of right way to do it 1053 00:49:27,420 --> 00:49:30,750 or the default way should just be to put the star next to the variable name 1054 00:49:30,750 --> 00:49:31,900 for clarity. 1055 00:49:31,900 --> 00:49:33,870 So I have to make a few other changes. 1056 00:49:33,870 --> 00:49:37,770 This should change too, because there is no more string as of today. 
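As a small sketch of the synonym idea above: a header can make string another name for char * with a typedef, so the two types are interchangeable. The typedef line below is written here for illustration, and the names name and address_of_first_byte are hypothetical helpers, not part of the lecture's code.

```c
// An illustrative synonym: "string" is just another name for char *.
typedef char *string;

// A buffer standing in for a name the user typed (illustrative).
static char name[] = "Brian";

// Takes a "string" and returns a char *; no conversion is needed because
// both are the same type. The value is the address of the first byte.
char *address_of_first_byte(string s)
{
    return &s[0];
}
```

Calling address_of_first_byte(name) gives back the same address that name itself decays to, which is the sense in which a string variable only stores where the characters start.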
1057 00:49:37,770 --> 00:49:41,700 I'm going to change this to a char*; and then I also need to change it here, 1058 00:49:41,700 --> 00:49:48,990 char*; and then here, char*; and that is actually it. 1059 00:49:48,990 --> 00:49:53,640 And honestly, the only reason we didn't introduce this like two weeks ago 1060 00:49:53,640 --> 00:49:55,015 is because it just looks cryptic. 1061 00:49:55,015 --> 00:49:58,181 Like no one wants to program the first time they're ever touching a keyboard 1062 00:49:58,181 --> 00:50:01,230 and writing code and see char* and need to worry about what that means, 1063 00:50:01,230 --> 00:50:03,000 it's just a string conceptually. 1064 00:50:03,000 --> 00:50:06,450 But the only change I technically need to make to take those training wheels 1065 00:50:06,450 --> 00:50:11,139 off is just change all mentions of string as data types to char*. 1066 00:50:11,139 --> 00:50:12,930 And that just means that you know what-- a? 1067 00:50:12,930 --> 00:50:17,160 Yes it's a string, but more technically it's the address of a string. 1068 00:50:17,160 --> 00:50:21,720 Or more precisely, it is the address of the first byte of the string, 1069 00:50:21,720 --> 00:50:25,170 like 100 for Brian or 900 for Veronica, and I'm not even 1070 00:50:25,170 --> 00:50:28,950 going to tell you where the string ends because you, the programmer, 1071 00:50:28,950 --> 00:50:32,460 can figure that out by calling strlen or just by using a loop 1072 00:50:32,460 --> 00:50:35,550 and figuring out where that backslash 0 actually is. 1073 00:50:35,550 --> 00:50:37,990 So that is enough information to pass it around. 1074 00:50:37,990 --> 00:50:41,910 So if I go ahead now and compile this, make compare3, 1075 00:50:41,910 --> 00:50:47,190 and then I go ahead and do ./compare3, let's go ahead and type in Brian 1076 00:50:47,190 --> 00:50:49,770 and Veronica, those are indeed still different.
1077 00:50:49,770 --> 00:50:53,662 Now let me go ahead and type in David and David, those are in fact the same. 1078 00:50:53,662 --> 00:50:56,370 So the training wheels are off, there is no such thing as string, 1079 00:50:56,370 --> 00:50:57,560 henceforth it's a char*. 1080 00:50:57,560 --> 00:51:00,060 Let's go ahead and take a quick break here for five minutes, 1081 00:51:00,060 --> 00:51:02,070 and we'll come back and dive in more. 1082 00:51:02,070 --> 00:51:03,240 All right. 1083 00:51:03,240 --> 00:51:06,630 So we are back, and let's go ahead and simplify this now, 1084 00:51:06,630 --> 00:51:07,710 as our tendency has been. 1085 00:51:07,710 --> 00:51:09,450 It's kind of a bunch of code, but I think 1086 00:51:09,450 --> 00:51:10,610 we can make this a little tighter. 1087 00:51:10,610 --> 00:51:12,434 But rather than type this one out manually, 1088 00:51:12,434 --> 00:51:14,850 let me go ahead and just open one of our pre-made examples 1089 00:51:14,850 --> 00:51:18,840 from today, which is all in the course's website, called compare4. 1090 00:51:18,840 --> 00:51:21,540 And you'll see in compare4, that's it. 1091 00:51:21,540 --> 00:51:23,670 I only have a main function this time. 1092 00:51:23,670 --> 00:51:26,970 I've gotten rid of my compare_strings function because you know what? 1093 00:51:26,970 --> 00:51:29,600 I seem to be using something instead. 1094 00:51:29,600 --> 00:51:33,090 What function did I apparently deploy? 1095 00:51:33,090 --> 00:51:35,640 Yeah, S-T-R-C-M-P, or as some would pronounce it, 1096 00:51:35,640 --> 00:51:37,830 just str compare or strcmp.
1097 00:51:37,830 --> 00:51:41,040 So this, like strlen, also succinctly named, 1098 00:51:41,040 --> 00:51:43,650 is just a function that's actually declared 1099 00:51:43,650 --> 00:51:46,980 in one of our familiar libraries up top, string.h, 1100 00:51:46,980 --> 00:51:49,960 and it turns out if you look in the man page, so to speak, 1101 00:51:49,960 --> 00:51:53,025 by typing man strcmp, or if you go to CS50 reference and actually 1102 00:51:53,025 --> 00:51:55,650 look at the less comfortable description of the function there, 1103 00:51:55,650 --> 00:51:57,733 this is just a function whose sole purpose in life 1104 00:51:57,733 --> 00:51:59,730 is to compare strings for you. 1105 00:51:59,730 --> 00:52:01,899 But it's a little different in behavior because it's 1106 00:52:01,899 --> 00:52:03,690 a little fancier than the one I just wrote. 1107 00:52:03,690 --> 00:52:07,860 Let me zoom in on this, and you'll see that line 14 here, I'm 1108 00:52:07,860 --> 00:52:11,490 not quite treating it in the same way. 1109 00:52:11,490 --> 00:52:14,840 My logic is ever so slightly different. 1110 00:52:14,840 --> 00:52:20,102 What am I actually checking for in my Boolean expression this time? 1111 00:52:20,102 --> 00:52:21,459 AUDIENCE: [INAUDIBLE] 1112 00:52:21,459 --> 00:52:23,250 DAVID MALAN: Yeah, which is a little weird. 1113 00:52:23,250 --> 00:52:28,860 I'm checking explicitly-- if strcmp's return value equal-equal to 0. 1114 00:52:28,860 --> 00:52:33,140 Before I just said, if compare_strings s comma 1115 00:52:33,140 --> 00:52:38,460 t, because I was expecting back a bool-- true or false. strcmp, kind of weird, 1116 00:52:38,460 --> 00:52:40,060 acts the opposite way. 1117 00:52:40,060 --> 00:52:43,380 It turns out that strcmp doesn't return true and false. 
1118 00:52:43,380 --> 00:52:48,120 If you read its documentation, it returns 0 if the strings are equal, 1119 00:52:48,120 --> 00:52:52,120 but super conveniently, it returns a negative value 1120 00:52:52,120 --> 00:52:56,190 if s is supposed to come before t, and it returns a positive value 1121 00:52:56,190 --> 00:52:59,950 if s is supposed to come after t alphabetically. 1122 00:52:59,950 --> 00:53:03,270 So it turns out that you can use strcmp not just to compare for equality, 1123 00:53:03,270 --> 00:53:04,530 but inequality-- 1124 00:53:04,530 --> 00:53:05,760 less than or equal-- 1125 00:53:05,760 --> 00:53:08,880 less than or greater than, so to speak, alphabetically, 1126 00:53:08,880 --> 00:53:10,810 or in ASCII order, so to speak. 1127 00:53:10,810 --> 00:53:13,860 It will actually compare character by character the ASCII values, 1128 00:53:13,860 --> 00:53:16,050 and that will make sure that B comes after A, 1129 00:53:16,050 --> 00:53:18,430 and C comes after B, and so forth. 1130 00:53:18,430 --> 00:53:20,940 So you can actually use strcmp to like sort a dictionary, 1131 00:53:20,940 --> 00:53:24,310 or to sort the contacts in your iPhone or your Android phone. 1132 00:53:24,310 --> 00:53:27,090 So long story short, this is a function we can use, 1133 00:53:27,090 --> 00:53:30,120 we don't have to reinvent this wheel, and thus, we have no more code 1134 00:53:30,120 --> 00:53:30,840 even after this. 1135 00:53:30,840 --> 00:53:33,810 We just have to use it correctly, and there, the documentation 1136 00:53:33,810 --> 00:53:34,720 is your friend. 1137 00:53:34,720 --> 00:53:37,770 So if I run this program it's going to work exactly the same way, 1138 00:53:37,770 --> 00:53:40,590 but let me go ahead and point out some flaws. 1139 00:53:40,590 --> 00:53:44,820 It turns out all this time, I've been a little lazy with my error checking-- 1140 00:53:44,820 --> 00:53:46,200 checking for errors.
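A short sketch of how strcmp's three kinds of result might be used. In standard C, strcmp returns 0 for equal strings, a negative value when the first string sorts earlier in ASCII order, and a positive value when it sorts later; only the sign is guaranteed, not the magnitude. The helper name order is an assumption made here for illustration.

```c
#include <string.h>

// Collapses strcmp's result to -1, 0, or 1 so callers can rely on it.
// (Hypothetical helper; only strcmp itself comes from <string.h>.)
int order(const char *s, const char *t)
{
    int result = strcmp(s, t);

    if (result < 0)
    {
        return -1; // s sorts before t
    }
    if (result > 0)
    {
        return 1; // s sorts after t
    }
    return 0; // same characters throughout
}
```

Because the comparison uses ASCII values, every uppercase letter sorts before every lowercase one, so order("David", "david") is -1 rather than 0.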
1141 00:53:46,200 --> 00:53:49,635 There's a whole bunch of things that can go wrong in week 1 of CS50 1142 00:53:49,635 --> 00:53:52,260 that we just kind of turn a blind eye to, because it would just 1143 00:53:52,260 --> 00:53:56,091 bloat our code, make it longer and sort of less interesting and fun to write 1144 00:53:56,091 --> 00:53:57,090 and less comprehensible. 1145 00:53:57,090 --> 00:53:59,776 But today, now that we know what's actually going on, 1146 00:53:59,776 --> 00:54:01,650 we can begin to ask some additional questions 1147 00:54:01,650 --> 00:54:04,020 and make our code stronger, more robust so 1148 00:54:04,020 --> 00:54:05,820 that nothing does, in fact, go wrong. 1149 00:54:05,820 --> 00:54:08,940 Turns out, if you read the documentation for get_string in the man page 1150 00:54:08,940 --> 00:54:11,880 or in CS50 reference, turns out get_string 1151 00:54:11,880 --> 00:54:14,010 does return a string-- uh, not really. 1152 00:54:14,010 --> 00:54:15,690 It returns the address of a string. 1153 00:54:15,690 --> 00:54:16,380 Uh, not really. 1154 00:54:16,380 --> 00:54:22,230 It returns the address of the first byte of a string, technically. 1155 00:54:22,230 --> 00:54:26,700 But if something goes wrong, it returns a special character called null. 1156 00:54:26,700 --> 00:54:32,230 Not to be confused with NUL, it returns a special address called null-- 1157 00:54:32,230 --> 00:54:34,510 left hand wasn't talking to right hand decades ago. 1158 00:54:34,510 --> 00:54:41,100 So null, N-U-L-L, just means the address 0, which nothing should ever live at. 1159 00:54:41,100 --> 00:54:44,190 It's just a bogus, invalid address. 
1160 00:54:44,190 --> 00:54:49,200 Insofar as get_string returns the address of a string in memory, 1161 00:54:49,200 --> 00:54:53,760 like 100 for Brian or 900 for Veronica, if get_string ever 1162 00:54:53,760 --> 00:54:56,640 runs into a problem and just something goes wrong with the computer, 1163 00:54:56,640 --> 00:55:01,830 if it ever returns 0, specifically 0, a.k.a. 1164 00:55:01,830 --> 00:55:07,470 null-- N-U-L-L, then you can detect that something has gone wrong. 1165 00:55:07,470 --> 00:55:10,089 So to do that, and it's going to get a little tedious, 1166 00:55:10,089 --> 00:55:11,880 but it's nonetheless the right thing to do, 1167 00:55:11,880 --> 00:55:14,430 I need to be a little more defensive. 1168 00:55:14,430 --> 00:55:21,270 If s equals-equals null, otherwise known as 0, otherwise known as 0x0, 1169 00:55:21,270 --> 00:55:23,160 but I'll write it conventionally like this, 1170 00:55:23,160 --> 00:55:27,270 I'm going to go ahead and return 1 as my exit code. 1171 00:55:27,270 --> 00:55:32,370 If t equals-equals null, I'm going to go ahead and return 1 as my exit code, 1172 00:55:32,370 --> 00:55:34,140 or I could return 2 or 3-- 1173 00:55:34,140 --> 00:55:36,952 I just need to return some value to signal to the computer 1174 00:55:36,952 --> 00:55:38,910 that something went wrong, but by default we'll 1175 00:55:38,910 --> 00:55:43,410 just return 1 whenever something goes wrong, but if all went well, 1176 00:55:43,410 --> 00:55:44,910 I'm going to go ahead and return 0. 1177 00:55:44,910 --> 00:55:47,580 So recall again from last week, and we didn't spend a huge amount of time 1178 00:55:47,580 --> 00:55:48,300 on this-- 1179 00:55:48,300 --> 00:55:50,340 main itself can return values. 1180 00:55:50,340 --> 00:55:53,790 By default, ever since week 1, if you don't return anything, 1181 00:55:53,790 --> 00:55:58,200 main is automatically and secretly returning 0 for you because 0 is good. 
1182 00:55:58,200 --> 00:56:02,079 The reason for 0 is because there's only one 0 in the world, obviously, 1183 00:56:02,079 --> 00:56:03,870 but there is an infinite number to the left 1184 00:56:03,870 --> 00:56:06,900 and there's an infinite number to the right, negative and positive. 1185 00:56:06,900 --> 00:56:09,340 That's great, because as you've already experienced in the past few weeks, 1186 00:56:09,340 --> 00:56:11,640 it feels like there's an infinite number of things that can go wrong when you're 1187 00:56:11,640 --> 00:56:13,410 writing even the shortest of programs. 1188 00:56:13,410 --> 00:56:17,190 So that means we have a lot of numbers we can assign to error codes, 1189 00:56:17,190 --> 00:56:18,250 so to speak. 1190 00:56:18,250 --> 00:56:20,670 Now I don't really care what the error codes are, 1191 00:56:20,670 --> 00:56:23,340 so I'm just going to adopt the human convention at the moment-- 1192 00:56:23,340 --> 00:56:27,480 if anything goes wrong, return anything other than 0. 1193 00:56:27,480 --> 00:56:31,650 And so I'm going to return 1 up here, but if nothing goes wrong, return 0. 1194 00:56:31,650 --> 00:56:36,720 The point here is that by adding these three lines here and these three 1195 00:56:36,720 --> 00:56:38,790 lines here, I'm going to avoid what's called 1196 00:56:38,790 --> 00:56:42,210 a segmentation fault or segfault. Did any of you 1197 00:56:42,210 --> 00:56:43,590 encounter this cryptic error? 1198 00:56:43,590 --> 00:56:44,130 OK. 1199 00:56:44,130 --> 00:56:46,920 So a decent number of you, and you probably had no idea what that means, 1200 00:56:46,920 --> 00:56:49,380 but starting today you will a bit more, and in the weeks to come, 1201 00:56:49,380 --> 00:56:50,790 you'll understand even more. 1202 00:56:50,790 --> 00:56:54,690 Segmentation fault means you touched memory you should not have. 1203 00:56:54,690 --> 00:56:58,050 Or something went wrong and you did not detect it.
1204 00:56:58,050 --> 00:57:01,320 It's kind of a catch-all phrase for memory-related problems. 1205 00:57:01,320 --> 00:57:03,960 This helps ward off those kinds of errors. 1206 00:57:03,960 --> 00:57:06,690 It's not the only way, but it's one such way. 1207 00:57:06,690 --> 00:57:09,330 So starting today with problem set programs and anything 1208 00:57:09,330 --> 00:57:12,090 you write in the course, you always want to be thinking about, 1209 00:57:12,090 --> 00:57:14,647 even if you go back and add it later, could this go wrong? 1210 00:57:14,647 --> 00:57:16,313 Could this go wrong? 1211 00:57:16,313 --> 00:57:18,197 And just add some additional ifs and else-ifs 1212 00:57:18,197 --> 00:57:21,280 and handle those situations so that your program doesn't just crash on you 1213 00:57:21,280 --> 00:57:25,260 or segfault or surprise someone who's actually using it. 1214 00:57:25,260 --> 00:57:28,410 All right, let's take a look at one final example, 1215 00:57:28,410 --> 00:57:30,300 because frankly this is a little tedious. 1216 00:57:30,300 --> 00:57:32,110 I'm going to go ahead and open up-- 1217 00:57:32,110 --> 00:57:34,590 and this file can be found in compare5.c. 1218 00:57:34,590 --> 00:57:39,120 Let me go ahead and save this so that we have it-- compare5.c. 1219 00:57:39,120 --> 00:57:41,095 I'm going to make one final comparison example. 1220 00:57:41,095 --> 00:57:43,800 I'm going to save this as compare6.c. 1221 00:57:43,800 --> 00:57:46,020 Turns out that humans like their succinctness. 1222 00:57:46,020 --> 00:57:50,890 And null, because it is technically the 0 address, 1223 00:57:50,890 --> 00:57:52,920 you can actually be a little clever. 1224 00:57:52,920 --> 00:57:59,430 If not s and if not t is a sufficient way to express those same things. 1225 00:57:59,430 --> 00:58:00,960 Because what does the bang do? 1226 00:58:00,960 --> 00:58:04,050 The exclamation point in code if you recall? 1227 00:58:04,050 --> 00:58:05,170 It inverts something.
1228 00:58:05,170 --> 00:58:11,725 So like if this is saying, if s is not 0, a.k.a., if s not null, or rather-- 1229 00:58:15,210 --> 00:58:18,005 if-- now I'm getting confused. 1230 00:58:18,005 --> 00:58:18,505 Yes. 1231 00:58:18,505 --> 00:58:21,910 If I had just said, if s, then it's a valid address 1232 00:58:21,910 --> 00:58:23,750 and I should go on with my business. 1233 00:58:23,750 --> 00:58:28,300 But if it's not s or if s is null, I want 1234 00:58:28,300 --> 00:58:31,240 to go ahead and return 1 because there's an error, and down here too. 1235 00:58:31,240 --> 00:58:33,910 So any time you're checking whether something equals null, 1236 00:58:33,910 --> 00:58:37,390 you can make it more succinct by just saying if not s; if it's null, 1237 00:58:37,390 --> 00:58:38,020 return 1. 1238 00:58:38,020 --> 00:58:39,190 If it's null, return 1. 1239 00:58:39,190 --> 00:58:42,680 It's just syntactic shorthand. 1240 00:58:42,680 --> 00:58:43,180 Phew! 1241 00:58:43,180 --> 00:58:45,100 I had to think about that one. 1242 00:58:45,100 --> 00:58:45,820 Any questions? 1243 00:58:45,820 --> 00:58:53,160 AUDIENCE: Why does [INAUDIBLE] will store some [INAUDIBLE] 1244 00:58:53,160 --> 00:58:54,280 DAVID MALAN: Correct. 1245 00:58:54,280 --> 00:58:58,020 You are storing an address, but if that address is 0. 1246 00:58:58,020 --> 00:59:04,300 Saying if it's not 0, 0 is like false, so not false means true, 1247 00:59:04,300 --> 00:59:07,990 and so it has the effect of inverting the logic. 1248 00:59:07,990 --> 00:59:08,590 That's all. 1249 00:59:08,590 --> 00:59:12,570 Anytime you use a bang or exclamation point, it changes a 0 to non-0-- 1250 00:59:12,570 --> 00:59:14,955 AUDIENCE: [INAUDIBLE], but even-- 1251 00:59:14,955 --> 00:59:20,820 I don't understand why [INAUDIBLE] implies that it's [INAUDIBLE].. 1252 00:59:20,820 --> 00:59:22,820 DAVID MALAN: So you can think about it this way. 1253 00:59:22,820 --> 00:59:24,725 If s-- previously we had this. 
1254 00:59:24,725 --> 00:59:30,350 If s equals-equals null is like saying if s literally equals 0. 1255 00:59:30,350 --> 00:59:32,480 And you can kind of think of that informally as 1256 00:59:32,480 --> 00:59:34,970 if s doesn't have a valid pointer-- 1257 00:59:34,970 --> 00:59:38,110 0 is not a valid pointer, it's not a valid address by definition. 1258 00:59:38,110 --> 00:59:42,450 100 is valid, 900 is valid, 0 is not valid just by a human convention. 1259 00:59:42,450 --> 00:59:46,680 So this is like saying, if s does not have a value that's valid. 1260 00:59:46,680 --> 00:59:52,130 So the way to succinctly say that, if not s, 1261 00:59:52,130 --> 00:59:55,040 and it's just shorthand for that is another way to think about it. 1262 00:59:55,040 --> 00:59:58,270 All right, so let's take a look at a very different program, 1263 00:59:58,270 --> 01:00:02,070 but that reveals the same kind of issue as follows. 1264 01:00:02,070 --> 01:00:05,030 I'm going to go ahead and open up an example called 1265 01:00:05,030 --> 01:00:09,410 copy0, whose purpose in life hopefully is to copy a string. 1266 01:00:09,410 --> 01:00:11,690 So notice that in my program here, which I 1267 01:00:11,690 --> 01:00:15,440 wrote in advance, I'm getting a string from the user on line 11, 1268 01:00:15,440 --> 01:00:17,240 and I'm storing it in a string called s. 1269 01:00:17,240 --> 01:00:20,210 I could change this to char* now, but we know what it is. 1270 01:00:20,210 --> 01:00:24,560 And I'm going to go ahead and copy the string's address from s into t. 1271 01:00:24,560 --> 01:00:29,370 And then I'm going to say, if the length of t is greater than 0, 1272 01:00:29,370 --> 01:00:31,620 then go ahead and just capitalize the first character. 1273 01:00:31,620 --> 01:00:32,660 So it's a little cryptic, but you might have 1274 01:00:32,660 --> 01:00:35,120 done something kind of like this with Caesar and with recent string 1275 01:00:35,120 --> 01:00:35,930 manipulation.
1276 01:00:35,930 --> 01:00:38,810 This is just making sure, do I have at least one character? 1277 01:00:38,810 --> 01:00:42,340 And if so, first character is t bracket 0, as you recall. 1278 01:00:42,340 --> 01:00:45,554 toupper is a function in ctype.h from last week 1279 01:00:45,554 --> 01:00:46,970 that just capitalizes this letter. 1280 01:00:46,970 --> 01:00:50,060 So this one line of code, 19, just capitalizes the first letter 1281 01:00:50,060 --> 01:00:51,232 in t, that's it. 1282 01:00:51,232 --> 01:00:54,440 And then at the very end we just print out what s is and print out what t is. 1283 01:00:54,440 --> 01:00:55,010 That's all. 1284 01:00:55,010 --> 01:00:59,390 So this program just copies s into t, capitalizes t, and that's it. 1285 01:00:59,390 --> 01:01:02,339 So let me go ahead and make copy0. 1286 01:01:02,339 --> 01:01:03,630 This is in our code from today. 1287 01:01:03,630 --> 01:01:07,340 So I'm going to do cd sc3, because I already wrote it in that directory. 1288 01:01:07,340 --> 01:01:08,975 make copy0. 1289 01:01:08,975 --> 01:01:12,810 Went well. ./copy0. 1290 01:01:12,810 --> 01:01:17,240 Let's go ahead and type in tj again in lowercase. 1291 01:01:17,240 --> 01:01:18,270 Enter. 1292 01:01:18,270 --> 01:01:19,760 Huh. 1293 01:01:19,760 --> 01:01:21,939 TJ, TJ-- both are capitalized. 1294 01:01:21,939 --> 01:01:24,230 All right, maybe it's just a weird thing with initials. 1295 01:01:24,230 --> 01:01:28,520 So let's just do Veronica, all lowercase. 1296 01:01:28,520 --> 01:01:30,140 Huh, that's definitely capital. 1297 01:01:30,140 --> 01:01:32,720 Let's do even more obvious difference, Brian where 1298 01:01:32,720 --> 01:01:35,070 the B's really going to look different. 1299 01:01:35,070 --> 01:01:38,270 Yet I'm only capitalizing t. 1300 01:01:38,270 --> 01:01:40,580 Well let's consider what's actually going on here. 
1301 01:01:40,580 --> 01:01:46,080 In this case, when I'm getting a string from the user, s and t, and I type in, 1302 01:01:46,080 --> 01:01:51,020 for instance, brian in all lowercase, backslash 0, this, of course, 1303 01:01:51,020 --> 01:01:54,710 is just an array underneath the hood. 1304 01:01:54,710 --> 01:01:56,150 This is taking up six bytes here. 1305 01:01:56,150 --> 01:01:58,922 And when I store in s, s is a string. 1306 01:01:58,922 --> 01:01:59,630 So you know what? 1307 01:01:59,630 --> 01:02:00,671 We didn't do this before. 1308 01:02:00,671 --> 01:02:05,300 Let me actually create a variable, a chunk of memory for s and call it s. 1309 01:02:05,300 --> 01:02:07,760 And suppose Brian is just where he was before-- 1310 01:02:07,760 --> 01:02:13,230 100, 101, 102, 103, 104, and 105. 1311 01:02:13,230 --> 01:02:18,110 So if I do s equals get_string and get_string returns Brian, 1312 01:02:18,110 --> 01:02:21,626 what do I write in the box called s? 1313 01:02:21,626 --> 01:02:22,820 Yeah, just 100, right? 1314 01:02:22,820 --> 01:02:24,780 This is all that's been going on all this time 1315 01:02:24,780 --> 01:02:27,050 even though we didn't talk about it at this level. 1316 01:02:27,050 --> 01:02:30,920 And actually, it turns out-- pointer actually can be used pictorially. 1317 01:02:30,920 --> 01:02:34,940 If you actually prefer to think about a pointer as being an address 1318 01:02:34,940 --> 01:02:37,957 or like kind of a map that leads you somewhere, another way a human 1319 01:02:37,957 --> 01:02:40,040 would typically draw a pointer-- because honestly, 1320 01:02:40,040 --> 01:02:41,960 who really cares that Brian is at address 100? 1321 01:02:41,960 --> 01:02:45,230 Like that is way too low level, that's week 0 stuff. 1322 01:02:45,230 --> 01:02:46,820 He's just pointing there. 1323 01:02:46,820 --> 01:02:49,190 So s is a pointer to that chunk of memory. 
1324 01:02:49,190 --> 01:02:52,640 It happens to be 100, whatever, the arrow is how you would literally 1325 01:02:52,640 --> 01:02:56,060 point at the chunk of memory if you were drawing this on some notes. 1326 01:02:56,060 --> 01:02:57,710 So that, too, is correct. 1327 01:02:57,710 --> 01:03:00,980 So the problem arises here with that line of code. 1328 01:03:00,980 --> 01:03:07,280 When I actually try to copy s and store in t, think about what's going on. 1329 01:03:07,280 --> 01:03:10,830 The right-hand side is just s's value, which happens to be 100. 1330 01:03:10,830 --> 01:03:13,460 The left-hand side is just saying, hey computer, give me 1331 01:03:13,460 --> 01:03:16,760 another variable, of type string, and call it t. 1332 01:03:16,760 --> 01:03:19,730 So that's like saying, hey, computer, give me another chunk of memory, 1333 01:03:19,730 --> 01:03:22,766 call it t, and then store s in it. 1334 01:03:22,766 --> 01:03:24,140 But what does it mean to store s? 1335 01:03:24,140 --> 01:03:27,800 Well what is s's value at this point in time? 1336 01:03:27,800 --> 01:03:30,140 It's the pointer to Brian, or it's technically-- 1337 01:03:30,140 --> 01:03:34,000 I'll write both just for thoroughness-- it's literally the number 100. 1338 01:03:34,000 --> 01:03:39,720 So if you do t equals s, that is like saying put 100 there too, 1339 01:03:39,720 --> 01:03:42,580 and pictorially that's like saying this. 1340 01:03:42,580 --> 01:03:46,860 So at this point in the story, when I copy s into t, 1341 01:03:46,860 --> 01:03:48,540 the computer took me literally. 1342 01:03:48,540 --> 01:03:51,390 It did copy s into t, but what is s? 1343 01:03:51,390 --> 01:03:52,470 It's just the address. 1344 01:03:52,470 --> 01:03:56,800 It is not B-R-I-A-N backslash 0, it's just the address. 1345 01:03:56,800 --> 01:04:00,810 So when I then say, t bracket 0 gets toupper-- 1346 01:04:00,810 --> 01:04:02,370 so let's look at this line of code.
1347 01:04:02,370 --> 01:04:04,590 The one line of code here that's highlighted, 1348 01:04:04,590 --> 01:04:07,740 when I say go to the 0th character of t and store 1349 01:04:07,740 --> 01:04:11,962 the uppercase version of that same character, you just follow the arrows. 1350 01:04:11,962 --> 01:04:13,920 If you ever played chutes and ladders as a kid, 1351 01:04:13,920 --> 01:04:16,253 you just kind of follow the arrow, see where you end up. 1352 01:04:16,253 --> 01:04:19,750 t bracket 0 is this location here, because again, 1353 01:04:19,750 --> 01:04:22,330 if this is a chunk of memory, per last week it's an array, 1354 01:04:22,330 --> 01:04:26,580 so you can also think of this as being bracket 0, this is bracket 1, 1355 01:04:26,580 --> 01:04:30,460 this is bracket 2, and so forth. 1356 01:04:30,460 --> 01:04:31,600 So it's just an array. 1357 01:04:31,600 --> 01:04:36,810 So t bracket 0 is lowercase b, and toupper of lowercase b, 1358 01:04:36,810 --> 01:04:40,650 of course, changes this little b to a B. But now 1359 01:04:40,650 --> 01:04:43,220 both s and t are still pointing at the same chunk of memory, 1360 01:04:43,220 --> 01:04:47,620 so of course s and t are both going to be Brian capitalized, 1361 01:04:47,620 --> 01:04:51,000 or TJ too in my first example. 1362 01:04:51,000 --> 01:04:57,260 Any questions then on what we just did and why that happens? 1363 01:04:57,260 --> 01:04:59,100 All right, so intuitively what's the fix? 1364 01:04:59,100 --> 01:05:01,100 Doesn't matter if you've no idea how to code it, 1365 01:05:01,100 --> 01:05:04,880 like what do we have to do to fundamentally copy a string, not 1366 01:05:04,880 --> 01:05:06,572 an address? 1367 01:05:06,572 --> 01:05:08,830 AUDIENCE: [INAUDIBLE] 1368 01:05:08,830 --> 01:05:10,136 DAVID MALAN: Create a new what? 1369 01:05:10,136 --> 01:05:12,500 AUDIENCE: Basically create the [INAUDIBLE].. 1370 01:05:12,500 --> 01:05:13,250 DAVID MALAN: Yeah.
1371 01:05:13,250 --> 01:05:15,249 Create the same string in a new chunk of memory. 1372 01:05:15,249 --> 01:05:17,970 What I really need to do is allocate or give myself 1373 01:05:17,970 --> 01:05:21,720 a bunch of more memory that's just as big as Brian, 1374 01:05:21,720 --> 01:05:24,750 including his backslash 0. 1375 01:05:24,750 --> 01:05:28,450 And then logically I just need to copy every character into that. 1376 01:05:28,450 --> 01:05:31,020 So if I go back to my original when it was a lowercase b, 1377 01:05:31,020 --> 01:05:34,080 I need to make a copy logically by using a for loop or a while loop 1378 01:05:34,080 --> 01:05:35,130 or whatever you prefer-- 1379 01:05:35,130 --> 01:05:42,990 B-R-I-A-N backslash 0, so that when I copy the string and then store it in t, 1380 01:05:42,990 --> 01:05:45,480 It's not actually copying literally s. 1381 01:05:45,480 --> 01:05:49,540 And let's suppose that he ends up at location 300 just arbitrarily-- 1382 01:05:49,540 --> 01:05:51,010 just making up easy numbers. 1383 01:05:51,010 --> 01:05:54,780 t now stores 300, points here. 1384 01:05:54,780 --> 01:05:59,370 So when I execute this line in this version of the story, t bracket 0 1385 01:05:59,370 --> 01:06:02,250 gets toupper, what am I actually doing? 1386 01:06:02,250 --> 01:06:04,440 I'm following a different arrow this time 1387 01:06:04,440 --> 01:06:08,280 because I gave myself a different chunk of memory, capitalizing this Brian, 1388 01:06:08,280 --> 01:06:13,170 thereby hopefully fixing the bug, albeit verbally only. 1389 01:06:13,170 --> 01:06:14,910 So how do we do this in code? 1390 01:06:14,910 --> 01:06:16,300 We need to do exactly that. 1391 01:06:16,300 --> 01:06:18,400 We need to give ourself some more memory, 1392 01:06:18,400 --> 01:06:23,832 so let's introduce one other feature of C. In copy1.c, 1393 01:06:23,832 --> 01:06:25,930 we see the solution to this problem. 
1394 01:06:25,930 --> 01:06:31,260 Notice at the top I'm doing things a little lower level-- oop, surprise. 1395 01:06:31,260 --> 01:06:33,740 Notice in this version of the code, copy1.c, 1396 01:06:33,740 --> 01:06:37,950 see I've started off almost the same, but just to be super clear, 1397 01:06:37,950 --> 01:06:39,180 I'm just using char*. 1398 01:06:39,180 --> 01:06:41,057 I don't want any magic, so there's no string, 1399 01:06:41,057 --> 01:06:42,390 there's no training wheels here. 1400 01:06:42,390 --> 01:06:45,300 But this logically is the exact same as before-- 1401 01:06:45,300 --> 01:06:46,800 plus the error-checking. 1402 01:06:46,800 --> 01:06:47,850 This line is new. 1403 01:06:47,850 --> 01:06:51,760 And it looks a little funky, but let's see what's going on. 1404 01:06:51,760 --> 01:06:54,510 And this line of code here, what am I doing? 1405 01:06:54,510 --> 01:06:57,510 The left-hand side, that's shorter, let's start with the easier one. 1406 01:06:57,510 --> 01:07:02,780 Char* t, just in layman's terms, what does that expression do? char*? 1407 01:07:02,780 --> 01:07:06,700 Hey computer, do what? 1408 01:07:06,700 --> 01:07:07,326 What's that? 1409 01:07:07,326 --> 01:07:08,200 AUDIENCE: [INAUDIBLE] 1410 01:07:08,200 --> 01:07:09,325 DAVID MALAN: Not quite yet. 1411 01:07:09,325 --> 01:07:10,557 Different formulation. 1412 01:07:14,140 --> 01:07:17,110 Hey computer, give me-- 1413 01:07:17,110 --> 01:07:17,960 not quite. 1414 01:07:17,960 --> 01:07:19,953 Be more precise? 1415 01:07:19,953 --> 01:07:21,390 AUDIENCE: An array? 1416 01:07:21,390 --> 01:07:23,390 DAVID MALAN: Not quite an array, just this part. 1417 01:07:23,390 --> 01:07:25,520 So let me hide all this. 1418 01:07:25,520 --> 01:07:27,485 If the star wasn't there-- 1419 01:07:27,485 --> 01:07:28,860 I can't really do this very well. 1420 01:07:28,860 --> 01:07:29,964 So this-- yeah? 1421 01:07:29,964 --> 01:07:31,900 AUDIENCE: [INAUDIBLE] character? 
1422 01:07:31,900 --> 01:07:33,316 DAVID MALAN: Good, I'll take that. 1423 01:07:33,316 --> 01:07:35,560 So hey computer, give me a pointer to a character. 1424 01:07:35,560 --> 01:07:37,960 Or even more low level, hey computer, give me 1425 01:07:37,960 --> 01:07:41,590 a chunk of memory in which I can store the address of a character. 1426 01:07:41,590 --> 01:07:42,820 I mean, it is that mundane. 1427 01:07:42,820 --> 01:07:46,000 Draw a box on the screen, call it s-- or rather, 1428 01:07:46,000 --> 01:07:49,825 call it t, but just give me space for a pointer, as you said. 1429 01:07:49,825 --> 01:07:50,950 So that's all that's doing. 1430 01:07:50,950 --> 01:07:54,310 It's drawing a box on the screen and calling it t, and it's currently empty. 1431 01:07:54,310 --> 01:07:56,920 Now let's look at the scarier part on the right-hand side. 1432 01:07:56,920 --> 01:07:58,780 malloc, new function today. 1433 01:07:58,780 --> 01:08:00,420 Stands for memory allocate. 1434 01:08:00,420 --> 01:08:03,420 It's very cryptic-sounding, but it just means give me a chunk of memory. 1435 01:08:03,420 --> 01:08:05,800 It says exactly what you said in functional terms. 1436 01:08:05,800 --> 01:08:07,990 Then it just needs you to answer one question-- 1437 01:08:07,990 --> 01:08:09,610 OK, how much memory do you want? 1438 01:08:09,610 --> 01:08:11,290 How many bytes do you want? 1439 01:08:11,290 --> 01:08:15,250 And now maybe the math, even though cryptic at first glance, makes sense. 1440 01:08:15,250 --> 01:08:19,689 Get the string length of s, add 1, and then multiply it 1441 01:08:19,689 --> 01:08:21,132 by the size of a character. 1442 01:08:21,132 --> 01:08:23,590 And we've not seen this before. sizeof literally does that. 1443 01:08:23,590 --> 01:08:26,170 It tells you how many bytes is a char. 1444 01:08:26,170 --> 01:08:28,300 Happens to be 1, and in fact, that's defined.
1445 01:08:28,300 --> 01:08:32,649 So if we simplify this in C, the char is always 1 byte, 1446 01:08:32,649 --> 01:08:35,029 so this is equivalent to just multiplying by 1. 1447 01:08:35,029 --> 01:08:37,352 And obviously mathematically that's a waste of time, 1448 01:08:37,352 --> 01:08:39,310 so we can whittle this down to be even simpler. 1449 01:08:39,310 --> 01:08:41,120 I was just being thorough. 1450 01:08:41,120 --> 01:08:45,430 So now, hey computer, allocate me this many bytes of memory. 1451 01:08:45,430 --> 01:08:46,790 Why is it plus 1? 1452 01:08:46,790 --> 01:08:48,470 AUDIENCE: You need the null character. 1453 01:08:48,470 --> 01:08:50,136 DAVID MALAN: I need that null character. 1454 01:08:50,136 --> 01:08:54,310 Brian is 1, 2, 3, 4, 5 as he said, but I need the sixth for his null character, 1455 01:08:54,310 --> 01:08:56,060 and I just know that's going to be there. 1456 01:08:56,060 --> 01:08:59,710 So at this point in the story, what has happened? 1457 01:08:59,710 --> 01:09:04,300 All that malloc does is it gives me this box of memory 1458 01:09:04,300 --> 01:09:07,863 containing room for as many bytes as are in Brian's name. 1459 01:09:07,863 --> 01:09:09,279 But it doesn't fill them just yet. 1460 01:09:09,279 --> 01:09:13,160 Now I need to logically fill those bytes with Brian's actual name. 1461 01:09:13,160 --> 01:09:15,550 So if we scroll down to my for loop here, 1462 01:09:15,550 --> 01:09:18,490 we can actually copy the string into that space. 1463 01:09:18,490 --> 01:09:21,700 And it's a little long, the expression, but nothing new here. 1464 01:09:21,700 --> 01:09:28,689 Initialize i to 0, n to the length of s, i is less than or equal to n-- 1465 01:09:28,689 --> 01:09:30,160 we'll come back to that, i++. 1466 01:09:30,160 --> 01:09:32,260 So it's just a pretty standard for loop. 1467 01:09:32,260 --> 01:09:36,490 Then copy the i-th character of s into the i-th character of t.
1468 01:09:36,490 --> 01:09:40,667 The only thing that's making me a little nervous honestly is this thing here. 1469 01:09:40,667 --> 01:09:43,000 Like I feel like every time we do less than or equal to, 1470 01:09:43,000 --> 01:09:45,250 we create a bug like last week. 1471 01:09:45,250 --> 01:09:46,486 But this is correct, why? 1472 01:09:50,770 --> 01:09:54,580 Why do I want to go up to and through the length of this? 1473 01:09:54,580 --> 01:09:56,790 AUDIENCE: Is it the null character that adds-- 1474 01:09:56,790 --> 01:09:57,360 DAVID MALAN: Exactly. 1475 01:09:57,360 --> 01:09:58,610 Because of the null character. 1476 01:09:58,610 --> 01:10:02,130 I actually don't want to stop at the strlen of s, so I could change this. 1477 01:10:02,130 --> 01:10:04,890 If you're just more comfortable using less than, because you just 1478 01:10:04,890 --> 01:10:08,460 got your mind wrapped around why we do that in the first place, that's fine, 1479 01:10:08,460 --> 01:10:11,440 we just need to do this instead. 1480 01:10:11,440 --> 01:10:16,020 So this is mathematically-- if you go to strlen plus 1, the same thing 1481 01:10:16,020 --> 01:10:18,789 as not doing that math but just going one step further. 1482 01:10:18,789 --> 01:10:20,830 Just whatever you want to think about it is fine. 1483 01:10:20,830 --> 01:10:22,621 However you want to think about it is fine. 1484 01:10:22,621 --> 01:10:25,020 OK, and then lastly, just a quick check, is the length 1485 01:10:25,020 --> 01:10:27,420 of t at least one or more characters? 1486 01:10:27,420 --> 01:10:29,910 Because otherwise there's nothing to capitalize, and if so, 1487 01:10:29,910 --> 01:10:31,120 go ahead and do it. 1488 01:10:31,120 --> 01:10:34,350 So if I now run this example, make-- oop, let me save it. 1489 01:10:34,350 --> 01:10:37,040 make copy1, that compiled. 
1490 01:10:37,040 --> 01:10:42,480 ./copy1, now let's type in tj, tj in lowercase comes back, 1491 01:10:42,480 --> 01:10:44,490 but now t is capitalized. 1492 01:10:44,490 --> 01:10:49,140 And let's go ahead and do Brian's name in all lowercase, only one of them 1493 01:10:49,140 --> 01:10:51,100 is now capitalized. 1494 01:10:51,100 --> 01:10:54,450 So does that make sense what's now happened? 1495 01:10:54,450 --> 01:10:54,950 All right. 1496 01:10:54,950 --> 01:10:57,980 So where can we go with this? 1497 01:10:57,980 --> 01:11:00,647 Well it turns out-- let me open up one final example here, 1498 01:11:00,647 --> 01:11:02,480 because honestly, that's incredibly tedious, 1499 01:11:02,480 --> 01:11:04,010 and no one's ever going to want to copy strings if you 1500 01:11:04,010 --> 01:11:05,510 have to go through all of that work. 1501 01:11:05,510 --> 01:11:08,990 Turns out that strcpy exists. 1502 01:11:08,990 --> 01:11:11,060 So when in doubt, check the man page. 1503 01:11:11,060 --> 01:11:12,710 When in doubt, check CS50 reference. 1504 01:11:12,710 --> 01:11:15,194 Does the function exist somewhere related 1505 01:11:15,194 --> 01:11:16,610 to some keywords you have in mind? 1506 01:11:16,610 --> 01:11:18,650 Like string copy, see if something comes back. 1507 01:11:18,650 --> 01:11:22,490 And indeed, we've had strlen, we've had strcmp, we now have strcpy, 1508 01:11:22,490 --> 01:11:25,970 and if you read the documentation, this is deliberately reversed like this. 1509 01:11:25,970 --> 01:11:30,050 The destination is this variable, the source or the origin string 1510 01:11:30,050 --> 01:11:32,680 is this one, and it copies from one end to the other, 1511 01:11:32,680 --> 01:11:35,150 and then I don't need that for loop. 1512 01:11:35,150 --> 01:11:37,590 It just saves me a few lines of code. 1513 01:11:37,590 --> 01:11:38,240 All right. 1514 01:11:38,240 --> 01:11:41,750 So let's take off one other detail here.
1515 01:11:41,750 --> 01:11:46,520 Oh, and you'll notice, actually, let me make one fix, one fix here. 1516 01:11:46,520 --> 01:11:50,690 It turns out that what I'm doing here is a little lazy. 1517 01:11:50,690 --> 01:11:54,260 It turns out that malloc does have an opposite. 1518 01:11:54,260 --> 01:11:57,230 So anytime you allocate memory, technically 1519 01:11:57,230 --> 01:11:59,630 you should also be freeing that memory. 1520 01:11:59,630 --> 01:12:02,909 And so C allows you to ask the computer for as much memory as you want, 1521 01:12:02,909 --> 01:12:06,200 but if you never give it back, have you ever experienced on your own Mac or PC, 1522 01:12:06,200 --> 01:12:08,158 like after your computer's been running a while 1523 01:12:08,158 --> 01:12:10,700 or using some new or bloated program like a browser, 1524 01:12:10,700 --> 01:12:13,100 it gets slower and slower and slower? 1525 01:12:13,100 --> 01:12:16,610 And in the worst case it just freezes or hangs or something? 1526 01:12:16,610 --> 01:12:19,670 It is quite possible that that program simply-- was made by humans, 1527 01:12:19,670 --> 01:12:20,240 of course-- 1528 01:12:20,240 --> 01:12:21,840 just has a memory leak. 1529 01:12:21,840 --> 01:12:25,400 So some human wrote one or more lines of code that uses malloc 1530 01:12:25,400 --> 01:12:28,525 or some equivalent in another language that just kept allocating memory 1531 01:12:28,525 --> 01:12:29,400 for the user's input. 1532 01:12:29,400 --> 01:12:31,310 You're visiting one web page, two web pages, 1533 01:12:31,310 --> 01:12:33,620 that requires memory whatever the program is. 1534 01:12:33,620 --> 01:12:37,490 And if that human never calls the opposite of allocate-- deallocate, 1535 01:12:37,490 --> 01:12:40,700 otherwise known as free, you're never giving the memory back 1536 01:12:40,700 --> 01:12:41,840 to the operating system.
1537 01:12:41,840 --> 01:12:45,129 So it gets slower and slower because it's running lower and lower and lower 1538 01:12:45,129 --> 01:12:47,420 on memory, and it might have to move some things around 1539 01:12:47,420 --> 01:12:50,390 to make room for things, that's what's called a memory leak. 1540 01:12:50,390 --> 01:12:54,170 And so indeed, in this program, I should actually improve this a little bit. 1541 01:12:54,170 --> 01:12:58,250 If I go back into this version here and line 18, recall, 1542 01:12:58,250 --> 01:13:01,010 I allocated this memory just to make my copy, 1543 01:13:01,010 --> 01:13:04,310 the very last thing I should actually do in this program 1544 01:13:04,310 --> 01:13:05,977 is this line here-- free. 1545 01:13:05,977 --> 01:13:08,810 You don't have to tell the computer how many bytes you want to free, 1546 01:13:08,810 --> 01:13:12,650 it will remember for you so long as you just pass in the pointer-- 1547 01:13:12,650 --> 01:13:16,790 the variable that's storing the address of the chunk of memory 1548 01:13:16,790 --> 01:13:18,890 that you allocated. 1549 01:13:18,890 --> 01:13:19,700 All right. 1550 01:13:19,700 --> 01:13:23,030 So let's now see why we've been using get_string, 1551 01:13:23,030 --> 01:13:25,430 since it's not just to kind of simplify the code, 1552 01:13:25,430 --> 01:13:28,640 it's also to defend against some very easy problems. 1553 01:13:28,640 --> 01:13:31,220 Here is a program called scanf0-- 1554 01:13:31,220 --> 01:13:35,070 scan formatted text, another arcane-sounding function, 1555 01:13:35,070 --> 01:13:36,630 but it's pretty straightforward. 1556 01:13:36,630 --> 01:13:39,952 This program simply gets an int from the user using scanf. 1557 01:13:39,952 --> 01:13:42,410 Up until now for the past three weeks, you've used get_int. 1558 01:13:42,410 --> 01:13:45,050 So this is an alternative to get_int that you could 1559 01:13:45,050 --> 01:13:48,320 have started using a few weeks ago.
1560 01:13:48,320 --> 01:13:51,430 Give me an int called x, print out x colon whatever-- 1561 01:13:51,430 --> 01:13:53,190 that's just the prompt to the user. 1562 01:13:53,190 --> 01:14:01,490 scanf %i, &x, whatever that is, and then print out x's value using %i. 1563 01:14:01,490 --> 01:14:02,810 So what's going on here? 1564 01:14:02,810 --> 01:14:06,140 Now today we can actually start to wrap our minds around what get_int actually 1565 01:14:06,140 --> 01:14:06,639 does. 1566 01:14:06,639 --> 01:14:07,910 This is effectively get_int. 1567 01:14:07,910 --> 01:14:11,034 If you actually look at the source code for get_int, it's a little fancier. 1568 01:14:11,034 --> 01:14:13,999 But in essence, what get_int does is it declares a variable called x, 1569 01:14:13,999 --> 01:14:16,040 and it doesn't put anything there, because that's 1570 01:14:16,040 --> 01:14:17,630 supposed to come from you, the human. 1571 01:14:17,630 --> 01:14:20,900 It then prompts you for whatever string you pass to get_int, 1572 01:14:20,900 --> 01:14:22,370 so those are the first two lines. 1573 01:14:22,370 --> 01:14:24,570 And this is the only weird-looking one. 1574 01:14:24,570 --> 01:14:26,630 Scanf is like the opposite of printf. 1575 01:14:26,630 --> 01:14:31,920 You still use a formatted string-- %s, %i, %f or whatever, 1576 01:14:31,920 --> 01:14:35,750 but you're not going to output this, you're going to input this from 1577 01:14:35,750 --> 01:14:37,220 the human's keyboard. 1578 01:14:37,220 --> 01:14:41,780 And &x is the opposite of-- 1579 01:14:41,780 --> 01:14:49,160 is the special symbol in C that says, go ahead and get me the address of x. 1580 01:14:49,160 --> 01:14:52,370 So don't pass in x, give me the address of x. 1581 01:14:52,370 --> 01:14:53,110 Now why is that? 1582 01:14:53,110 --> 01:14:56,390 We'll see, but this is the way where you can tell the computer, 1583 01:14:56,390 --> 01:14:59,510 I've made a variable for you called x, here is where it is.
1584 01:14:59,510 --> 01:15:03,840 It's a treasure map that leads you to x, go put a value here for me. 1585 01:15:03,840 --> 01:15:06,800 And so the end result is that we do, in fact, end up getting an int. 1586 01:15:06,800 --> 01:15:13,010 If I do make scanf0, and then ./scanf0, I'll type in 42, all right? 1587 01:15:13,010 --> 01:15:16,040 It's not an interesting program, it just spits back out what I got, 1588 01:15:16,040 --> 01:15:18,330 but that's literally all that get_int, of course, 1589 01:15:18,330 --> 01:15:20,270 is doing if you then print out the value. 1590 01:15:20,270 --> 01:15:24,230 So if I stipulate this is correct, this is how you get an int from the user, 1591 01:15:24,230 --> 01:15:27,660 but honestly, the reason we don't do this in week 1 of the course is like, 1592 01:15:27,660 --> 01:15:31,070 my God, we just took the fun out of even getting a simple number from the user 1593 01:15:31,070 --> 01:15:32,987 by using these lines of code and whoever knows 1594 01:15:32,987 --> 01:15:35,486 what this symbol is-- we don't want you to think about that, 1595 01:15:35,486 --> 01:15:36,800 we want you to just get an int. 1596 01:15:36,800 --> 01:15:39,230 But today those training wheels are off, but we're 1597 01:15:39,230 --> 01:15:41,990 going to run into a problem super fast. 1598 01:15:41,990 --> 01:15:44,540 Let's try the same thing with a string. 1599 01:15:44,540 --> 01:15:49,680 If I were to do this, you would think that the result is the same. 1600 01:15:49,680 --> 01:15:52,070 Or let's just do it as char*. 1601 01:15:52,070 --> 01:15:54,020 But there's going to be one tweak. 
1602 01:15:54,020 --> 01:15:59,180 If I go ahead and give myself space for the address of a character, 1603 01:15:59,180 --> 01:16:01,970 I don't need to use the ampersand now, because scanf 1604 01:16:01,970 --> 01:16:04,880 does need to be told where the chunk of memory is, 1605 01:16:04,880 --> 01:16:08,630 but it's already an address, so I don't need the ampersand here. 1606 01:16:08,630 --> 01:16:11,960 Recall earlier, I declared int x, which was just an int. 1607 01:16:11,960 --> 01:16:14,500 &x gets the address of that int. 1608 01:16:14,500 --> 01:16:19,160 Here, I'm saying from the get-go, get me the address of a char. 1609 01:16:19,160 --> 01:16:22,070 I don't need the ampersand 'cause I already have the address of a char 1610 01:16:22,070 --> 01:16:24,650 by definition of that star symbol. 1611 01:16:24,650 --> 01:16:26,580 So what's going on here? 1612 01:16:26,580 --> 01:16:27,560 Let me see now. 1613 01:16:27,560 --> 01:16:30,420 If I run scanf1, what happens? 1614 01:16:30,420 --> 01:16:33,410 So make scanf1 and-- 1615 01:16:33,410 --> 01:16:34,122 oh, let's see. 1616 01:16:34,122 --> 01:16:35,330 Here's a warning I'm getting. 1617 01:16:35,330 --> 01:16:37,372 Variable s is uninitialized when used here. 1618 01:16:37,372 --> 01:16:38,330 All right, that's fine. 1619 01:16:38,330 --> 01:16:41,210 It wants me to initialize it because this is a very common mistake. 1620 01:16:41,210 --> 01:16:43,168 Those of you who alluded to segmentation faults 1621 01:16:43,168 --> 01:16:46,040 earlier might have encountered something similar in spirit to this. 1622 01:16:46,040 --> 01:16:47,370 So that squelched that error. 1623 01:16:47,370 --> 01:16:49,330 Let me go ahead and run scanf1. 1624 01:16:49,330 --> 01:16:51,961 All right, here we go, TJ. 1625 01:16:51,961 --> 01:16:52,460 Hmm. 1626 01:16:52,460 --> 01:16:54,495 That is not your name, but OK. 1627 01:16:54,495 --> 01:16:57,000 It didn't crash at least, it's just a little weird.
1628 01:16:57,000 --> 01:16:58,280 David. 1629 01:16:58,280 --> 01:16:59,916 Null, OK, that's a little weird. 1630 01:16:59,916 --> 01:17:01,290 Let's go ahead and do this again. 1631 01:17:01,290 --> 01:17:03,740 Let's type in a really long name. 1632 01:17:06,280 --> 01:17:07,140 Enter. 1633 01:17:07,140 --> 01:17:09,060 Dammit, that didn't work. 1634 01:17:09,060 --> 01:17:11,250 So let's try an even longer name. 1635 01:17:16,350 --> 01:17:19,980 I'm hitting paste a lot. 1636 01:17:19,980 --> 01:17:21,790 OK-- dammit. 1637 01:17:21,790 --> 01:17:23,920 Too many times. 1638 01:17:23,920 --> 01:17:26,530 Command not found, that's definitely not a command. 1639 01:17:26,530 --> 01:17:27,310 Wow, OK. 1640 01:17:30,450 --> 01:17:32,004 Well that's interesting. 1641 01:17:32,004 --> 01:17:32,670 Oh, there it is. 1642 01:17:32,670 --> 01:17:33,610 Null, same thing. 1643 01:17:33,610 --> 01:17:36,260 OK, so what's actually going on? 1644 01:17:36,260 --> 01:17:38,370 Well null, which is all lowercase here, which 1645 01:17:38,370 --> 01:17:41,742 is this kind of an aesthetic thing, well it's not working. 1646 01:17:41,742 --> 01:17:42,450 It's not working. 1647 01:17:42,450 --> 01:17:44,010 Well what am I actually doing? 1648 01:17:44,010 --> 01:17:49,470 In that first line of code, when I say give me s to be a char*, 1649 01:17:49,470 --> 01:17:52,650 otherwise known as a string, all that's doing is allocating this. 1650 01:17:52,650 --> 01:17:54,750 And it's technically the size of a pointer. 1651 01:17:54,750 --> 01:17:57,330 A pointer, we never mentioned this before, but now we can. 1652 01:17:57,330 --> 01:18:02,910 Turns out it is 64 bits or 8 bytes. 1653 01:18:02,910 --> 01:18:07,770 8 bits is 1 byte, so a pointer is by definition on many computers these 1654 01:18:07,770 --> 01:18:11,610 days-- most of your Macs, most of your PCs, the IDE, the Sandbox, the Lab-- 1655 01:18:11,610 --> 01:18:12,460 is 64-bit.
1656 01:18:12,460 --> 01:18:16,260 So that just means there's 64 bits here, but we initialized it to null, 1657 01:18:16,260 --> 01:18:20,190 so that just means there's 64 0's here, dot-dot-dot. 1658 01:18:20,190 --> 01:18:24,092 But when I get a string using scanf, what 1659 01:18:24,092 --> 01:18:26,550 I'm telling the computer to do with this line of code here, 1660 01:18:26,550 --> 01:18:31,930 notice, is hey computer, go to that address and put a string there. 1661 01:18:31,930 --> 01:18:34,150 So what's actually happening? 1662 01:18:34,150 --> 01:18:37,680 It turns out that there's just not enough room to type in TJ. 1663 01:18:37,680 --> 01:18:38,722 There's not enough room-- 1664 01:18:38,722 --> 01:18:41,430 that's a bit of a white lie, because we could fit you in 64 bits, 1665 01:18:41,430 --> 01:18:45,030 but there's not enough room to type in the long sentence or paragraph of text 1666 01:18:45,030 --> 01:18:46,140 I did, right? 1667 01:18:46,140 --> 01:18:47,340 What did we not do? 1668 01:18:47,340 --> 01:18:49,500 We didn't allocate any space over here. 1669 01:18:49,500 --> 01:18:51,930 All we allocated space for was the address. 1670 01:18:51,930 --> 01:18:55,770 And so every time I use scanf saying, get me a string and put it here, 1671 01:18:55,770 --> 01:18:57,220 there's nowhere to put it. 1672 01:18:57,220 --> 01:19:00,070 And so the value just very defensively says, no, like no, 1673 01:19:00,070 --> 01:19:03,030 cannot store this anywhere for you. 1674 01:19:03,030 --> 01:19:05,550 So I actually need to be a little smarter about this. 1675 01:19:05,550 --> 01:19:10,860 I actually need to get myself some space so that I can actually store something 1676 01:19:10,860 --> 01:19:11,750 in the right place. 1677 01:19:11,750 --> 01:19:12,670 Let's do that. 1678 01:19:12,670 --> 01:19:15,420 Let me go ahead and create a new program. 1679 01:19:15,420 --> 01:19:17,280 I'm going to go ahead and call this scanf2. 
1680 01:19:21,370 --> 01:19:25,040 We need a little secret code to remind me of that. 1681 01:19:25,040 --> 01:19:27,180 Oh, wrong file name. 1682 01:19:27,180 --> 01:19:30,650 So I'm going to go ahead and create a file called scanf2. 1683 01:19:30,650 --> 01:19:32,180 scanf2.c. 1684 01:19:32,180 --> 01:19:37,310 And I'm going to quickly recreate this stdio.h, int main void, 1685 01:19:37,310 --> 01:19:39,950 and then down here I'm going to go ahead and-- you know what? 1686 01:19:39,950 --> 01:19:44,270 Instead of a string s, which I know today to be a char* s, 1687 01:19:44,270 --> 01:19:45,549 what is this string really? 1688 01:19:45,549 --> 01:19:46,590 Well you said it earlier. 1689 01:19:46,590 --> 01:19:48,520 What is this string? 1690 01:19:48,520 --> 01:19:49,840 It's an array of characters. 1691 01:19:49,840 --> 01:19:51,210 Let me take you literally. 1692 01:19:51,210 --> 01:19:54,570 Just give me an array of let's say five characters. 1693 01:19:54,570 --> 01:19:58,980 The D-A-V-I-D, or one more, that's fine, just enough for my backslash 0. 1694 01:19:58,980 --> 01:20:01,330 Let me just create a string-- really low level, 1695 01:20:01,330 --> 01:20:03,676 but this time give myself the chunk of memory. 1696 01:20:03,676 --> 01:20:05,550 I don't want just the address of a character, 1697 01:20:05,550 --> 01:20:08,290 I want the actual characters themselves. 1698 01:20:08,290 --> 01:20:11,550 Let me go ahead and just prompt the human for their string with s, 1699 01:20:11,550 --> 01:20:12,510 just like before. 1700 01:20:12,510 --> 01:20:16,770 Then let me call scanf and get a string from the user using %s and then pass 1701 01:20:16,770 --> 01:20:17,400 in s. 1702 01:20:17,400 --> 01:20:18,720 And here's a little trick.
1703 01:20:18,720 --> 01:20:22,710 It turns out that because a string is really just an array, 1704 01:20:22,710 --> 01:20:25,620 but a string is also just a pointer, you can actually treat 1705 01:20:25,620 --> 01:20:28,440 an array as though it is a pointer-- 1706 01:20:28,440 --> 01:20:29,310 an address. 1707 01:20:29,310 --> 01:20:33,900 And so even though this is a char array, this is OK. 1708 01:20:33,900 --> 01:20:37,710 This is the equivalent in this context to being just the address of a string. 1709 01:20:37,710 --> 01:20:41,880 Because strings are arrays, arrays can be treated as pointers as of now. 1710 01:20:41,880 --> 01:20:44,880 And then let me go ahead and just print out whatever the human typed in. 1711 01:20:44,880 --> 01:20:46,470 S is actually this. 1712 01:20:46,470 --> 01:20:49,320 Pass in s, save. 1713 01:20:49,320 --> 01:20:49,987 Yeah? 1714 01:20:49,987 --> 01:20:52,589 AUDIENCE: So [INAUDIBLE] char*? 1715 01:20:52,589 --> 01:20:55,130 DAVID MALAN: At this point it would be redundant to do char*, 1716 01:20:55,130 --> 01:20:58,670 because I literally want for this story six characters. 1717 01:20:58,670 --> 01:21:01,740 I want space, rather, for six characters. 1718 01:21:01,740 --> 01:21:05,780 So this is kind of week 2 stuff now, there's no pointers involved. 1719 01:21:05,780 --> 01:21:08,810 But again, just showing the equivalence of these ideas for now. 1720 01:21:08,810 --> 01:21:12,560 So if I now go into this, and this is in my other directory at the moment, 1721 01:21:12,560 --> 01:21:19,760 make scanf2, Enter, ./scanf2, s is going to type in-- 1722 01:21:19,760 --> 01:21:22,864 I'll type in my name, I know I can fit that, we're back in business. 1723 01:21:22,864 --> 01:21:26,030 Like now it's working because I didn't just create the address for a string, 1724 01:21:26,030 --> 01:21:27,488 I created the space for the string.
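The six-character array idea can be sketched as below. This is an illustration under assumptions, not the scanf2 file itself: the helper name read_word is hypothetical, it reads from a string with sscanf instead of from the keyboard, and it adds a %5s width limit that the program described above (which uses a plain %s) does not have.

```c
#include <stdio.h>

/* Hypothetical helper (not from the lecture): read one word from `input`
   into an array that has room for six chars.
   "%5s" stops after 5 characters, leaving the sixth byte for '\0'. */
int read_word(const char *input, char s[6])
{
    /* The array name is passed directly: in this context it acts as
       the address of its first character, so no ampersand is used. */
    return sscanf(input, "%5s", s) == 1;
}
```

With the width limit in place, an input longer than five characters is cut short rather than written past the end of the array.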
1725 01:21:27,488 --> 01:21:31,250 But let me get a little dangerous-- 1726 01:21:31,250 --> 01:21:32,510 David Malan? 1727 01:21:32,510 --> 01:21:35,180 OK, that kind of worked out OK. 1728 01:21:35,180 --> 01:21:40,860 David Malan or some really long other name? 1729 01:21:40,860 --> 01:21:42,230 OK, that worked out too. 1730 01:21:42,230 --> 01:21:44,690 Let me go ahead and run it again. 1731 01:21:44,690 --> 01:21:48,077 Let me try that really long string again, see what happens. 1732 01:21:48,077 --> 01:21:49,910 I know this didn't work very well last time. 1733 01:21:49,910 --> 01:21:51,470 All right, done. 1734 01:21:51,470 --> 01:21:53,090 Ooh, OK. 1735 01:21:53,090 --> 01:21:57,470 So now I'm in the club of those of you who have had segmentation faults. 1736 01:21:57,470 --> 01:21:59,930 So let's understand what's going on here. 1737 01:21:59,930 --> 01:22:01,970 Segmentation fault a moment ago I claimed 1738 01:22:01,970 --> 01:22:05,420 was touching a segment, a chunk of memory that's not your own. 1739 01:22:05,420 --> 01:22:06,410 So what just happened? 1740 01:22:06,410 --> 01:22:09,230 Well with this simple program, I told the computer, hey computer, 1741 01:22:09,230 --> 01:22:13,670 give me room for six characters, give me six bytes. 1742 01:22:13,670 --> 01:22:18,170 With the scanf line, I'm telling the computer, put the following user 1743 01:22:18,170 --> 01:22:22,280 input at that location, in that array of characters. 1744 01:22:22,280 --> 01:22:24,830 D-A-V-I-D backslash 0 fit. 1745 01:22:24,830 --> 01:22:27,890 David Malan didn't really, but it didn't seem to be a huge deal. 1746 01:22:27,890 --> 01:22:33,000 David Malan or some really long other name, also didn't crash the computer. 1747 01:22:33,000 --> 01:22:36,365 But that's because unbeknownst to us, usually when you ask for six bytes, 1748 01:22:36,365 --> 01:22:38,990 the computer is kind of sort of-- it's giving you a few extras.
1749 01:22:38,990 --> 01:22:41,480 It's not safe to use them, but it gives you enough 1750 01:22:41,480 --> 01:22:44,870 that you're not going to necessarily see a problem like a segmentation fault. 1751 01:22:44,870 --> 01:22:47,310 But it only allocates a few extra bytes typically, 1752 01:22:47,310 --> 01:22:51,019 so if you really keep pasting in long, long, long, long lines of text, 1753 01:22:51,019 --> 01:22:53,060 eventually you're going to exceed not only those six 1754 01:22:53,060 --> 01:22:55,634 bytes, but well past the special-- 1755 01:22:55,634 --> 01:22:58,550 the secret bytes that you got back that you shouldn't be using anyway, 1756 01:22:58,550 --> 01:23:00,675 and at that point the computer just gives up and says, 1757 01:23:00,675 --> 01:23:03,080 you are touching memory you shouldn't, a.k.a. 1758 01:23:03,080 --> 01:23:04,040 segmentation fault. 1759 01:23:04,040 --> 01:23:06,510 AUDIENCE: [INAUDIBLE] if the computer gives you 1760 01:23:06,510 --> 01:23:10,462 a few extra bytes, then why isn't it printing any of the other stuff? 1761 01:23:10,462 --> 01:23:14,154 After you said [INAUDIBLE] it just printed David. 1762 01:23:14,154 --> 01:23:15,570 DAVID MALAN: Really good question. 1763 01:23:15,570 --> 01:23:18,230 So even though I'm getting these sort of extra bytes, 1764 01:23:18,230 --> 01:23:20,720 why am I not seeing them after D-A-V-I-D? 1765 01:23:20,720 --> 01:23:21,920 I'm probably getting lucky. 1766 01:23:21,920 --> 01:23:24,560 Long story short, when you first run a program, 1767 01:23:24,560 --> 01:23:28,370 much of the memory that your program has access to is by default initialized 1768 01:23:28,370 --> 01:23:29,710 to 0's. 1769 01:23:29,710 --> 01:23:33,590 0 is the same thing as backslash 0, and so I'm getting lucky.
1770 01:23:33,590 --> 01:23:37,250 When I had D-A-V-I-D and then excess space in that array, 1771 01:23:37,250 --> 01:23:39,830 a lot of them are initialized as 0's already, 1772 01:23:39,830 --> 01:23:43,150 and the string is getting secretly terminated for me. 1773 01:23:43,150 --> 01:23:46,880 Or the better answer is, it's undefined behavior. 1774 01:23:46,880 --> 01:23:49,160 Like you should not touch memory that is not your own. 1775 01:23:49,160 --> 01:23:52,820 What happens after that is your risk alone. 1776 01:23:52,820 --> 01:23:55,490 But that's a conjecture as to why that's happening. 1777 01:23:55,490 --> 01:23:58,460 All right, so what is the fundamental feature that get_int 1778 01:23:58,460 --> 01:23:59,750 is providing for us? 1779 01:23:59,750 --> 01:24:02,480 All of this time get_int has actually been dealing 1780 01:24:02,480 --> 01:24:04,417 with all of this headache for us. 1781 01:24:04,417 --> 01:24:07,250 I mean honestly, even I'm getting bored like thinking about, talking 1782 01:24:07,250 --> 01:24:09,530 about how you just get a damn string from the user, 1783 01:24:09,530 --> 01:24:12,197 because you need to figure out, well how many bytes do you need? 1784 01:24:12,197 --> 01:24:15,071 And what if the human types in one more byte than you were expecting? 1785 01:24:15,071 --> 01:24:17,300 Then you need to do a switcheroo and get more memory. 1786 01:24:17,300 --> 01:24:20,059 get_string is doing all of this headache for us. 1787 01:24:20,059 --> 01:24:22,100 And that's not to say you need to use it forever, 1788 01:24:22,100 --> 01:24:23,933 they are indeed training wheels, but that's 1789 01:24:23,933 --> 01:24:27,569 just because when you're using C or a lot of programming languages, 1790 01:24:27,569 --> 01:24:29,610 the computer will only do what you tell it to do.
1791 01:24:29,610 --> 01:24:31,927 And it turns out that even asking the user for input, 1792 01:24:31,927 --> 01:24:34,010 if you don't know how many characters he or she is 1793 01:24:34,010 --> 01:24:37,144 going to type in from the get-go, you have to deal with it. 1794 01:24:37,144 --> 01:24:40,310 And so underneath the hood-- and you're welcome to take a look at the source 1795 01:24:40,310 --> 01:24:44,090 code for CS50's library, which I'll post on the home page later today, 1796 01:24:44,090 --> 01:24:48,050 it turns out that the way we're doing get_string is taking baby steps. 1797 01:24:48,050 --> 01:24:50,780 We literally like get one character at a time 1798 01:24:50,780 --> 01:24:54,369 from the user, kind of building the road as we go. 1799 01:24:54,369 --> 01:24:56,660 And if we don't have enough space, we ask the computer, 1800 01:24:56,660 --> 01:24:58,659 give me some more bytes so I can get more bytes, 1801 01:24:58,659 --> 01:25:01,130 and we just get one character at a time so 1802 01:25:01,130 --> 01:25:04,520 that we can handle the user maliciously or accidentally typing in way 1803 01:25:04,520 --> 01:25:08,460 more input than we actually expect. 1804 01:25:08,460 --> 01:25:10,640 So let's contextualize all of this then. 1805 01:25:10,640 --> 01:25:12,950 Recall that we've been drawing these pictures the past couple of weeks. 1806 01:25:12,950 --> 01:25:15,560 Let's just make this super clear as to what's been going on. 1807 01:25:15,560 --> 01:25:17,870 This is a memory module in a computer. 1808 01:25:17,870 --> 01:25:20,600 It's just a green board, it's way blown out of scale here, 1809 01:25:20,600 --> 01:25:24,620 it's easily like yea big inside of your Mac or PC laptop or desktop, 1810 01:25:24,620 --> 01:25:25,880 though it can vary in size. 1811 01:25:25,880 --> 01:25:28,501 One of these black chips is the actual memory or the bytes 1812 01:25:28,501 --> 01:25:29,750 to which we've been referring.
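The one-character-at-a-time approach to get_string described a moment ago can be sketched roughly like this. It is not the CS50 library's actual source; the function name read_line and the choice to grow the buffer by exactly one byte per character are assumptions made to keep the illustration short.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch (not CS50's real get_string): read one line from
   `in`, asking for more memory as each character arrives.
   Returns a malloc'd string the caller must free, or NULL if there is
   no input left or memory runs out. */
char *read_line(FILE *in)
{
    char *buffer = NULL;
    size_t length = 0;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        /* Room for the characters so far, one more, and the '\0'. */
        char *bigger = realloc(buffer, length + 2);
        if (bigger == NULL)
        {
            free(buffer);
            return NULL;
        }
        buffer = bigger;
        buffer[length++] = (char) c;
        buffer[length] = '\0';
    }

    /* An empty line still yields a valid, empty string. */
    if (buffer == NULL && c == '\n')
    {
        buffer = malloc(1);
        if (buffer != NULL)
        {
            buffer[0] = '\0';
        }
    }
    return buffer;
}
```

Growing by one byte per character keeps the logic easy to follow; a real implementation would more likely grow the buffer in larger steps to avoid calling realloc so often.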
1813 01:25:29,750 --> 01:25:32,300 And if we zoom in on that, recall that I proposed last week 1814 01:25:32,300 --> 01:25:35,097 that you can just think about this as like a grid, an array. 1815 01:25:35,097 --> 01:25:38,180 And it doesn't have to be rectangular, this is just an artist's rendition, 1816 01:25:38,180 --> 01:25:41,120 but each of those squares represents, we claimed, a byte. 1817 01:25:41,120 --> 01:25:44,880 And each of those bytes can be addressed in some way with a number. 1818 01:25:44,880 --> 01:25:49,250 And that number is just its location, otherwise known as an address. 1819 01:25:49,250 --> 01:25:52,590 We can actually see this, it turns out, as follows. 1820 01:25:52,590 --> 01:25:54,677 Let me go ahead and open up this example here. 1821 01:25:54,677 --> 01:25:57,260 Or actually, you know, let's just write this one from scratch. 1822 01:25:57,260 --> 01:26:01,930 Let me write a program called addresses.c. 1823 01:26:01,930 --> 01:26:09,230 And that's going to use our old friends, the CS50 library and stdio.h and int 1824 01:26:09,230 --> 01:26:11,090 main void. 1825 01:26:11,090 --> 01:26:13,580 And let me go ahead and just do this. 1826 01:26:13,580 --> 01:26:15,307 I'm going to go ahead and get a string-- 1827 01:26:15,307 --> 01:26:15,890 you know what? 1828 01:26:15,890 --> 01:26:21,420 No more string. char* from the user, get_string, ask the user for s. 1829 01:26:21,420 --> 01:26:23,050 And we get another string, a.k.a. 1830 01:26:23,050 --> 01:26:26,630 char*, get_string, call it t from the user. 1831 01:26:26,630 --> 01:26:31,580 And then, I want to print out not the strings, which I used to do like this, 1832 01:26:31,580 --> 01:26:32,550 printing out s. 1833 01:26:32,550 --> 01:26:37,360 I want to print out the pointer that s really is, that is the address. 
1834 01:26:37,360 --> 01:26:42,380 Turns out %p for pointer will print out not the string at that memory location, 1835 01:26:42,380 --> 01:26:45,620 it will print the actual memory location for you of s. 1836 01:26:45,620 --> 01:26:50,480 And I can do the same thing here, %p, backslash n, pass in t. 1837 01:26:50,480 --> 01:26:52,940 And just so I know which is which, let me just prefix it 1838 01:26:52,940 --> 01:26:55,460 with some text-- s colon and t colon. 1839 01:26:55,460 --> 01:26:58,846 Let me go ahead now down here and do make addresses. 1840 01:26:58,846 --> 01:27:02,390 Oh, I messed up, missed a semi-colon. 1841 01:27:02,390 --> 01:27:03,500 Let me do this again. 1842 01:27:03,500 --> 01:27:07,940 make addresses. 1843 01:27:07,940 --> 01:27:09,830 And get rid of this. 1844 01:27:09,830 --> 01:27:14,390 That compiled OK, ./addresses, and here we go. 1845 01:27:14,390 --> 01:27:18,210 Let's type in-- let's do Brian and Veronica like before. 1846 01:27:18,210 --> 01:27:18,970 Enter. 1847 01:27:18,970 --> 01:27:23,290 And this is a little funky, but it turns out the IDE in your Macs 1848 01:27:23,290 --> 01:27:25,430 and your PCs have a lot of memory. 1849 01:27:25,430 --> 01:27:26,600 So this is the address. 1850 01:27:26,600 --> 01:27:30,070 It's not quite as small as 100, it's not quite as small as 900. 1851 01:27:30,070 --> 01:27:31,750 It's actually kind of big. 1852 01:27:31,750 --> 01:27:36,245 It's 2331010 with this weird 0x. 1853 01:27:36,245 --> 01:27:38,370 Well it turns out, this is just a human convention. 1854 01:27:38,370 --> 01:27:40,510 In week 0 we talked about decimal and all of us 1855 01:27:40,510 --> 01:27:43,030 grew up with decimal, 10 digits from 0 to 9. 1856 01:27:43,030 --> 01:27:46,060 Talked a little bit about binary 0's and 1's.
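The %p idea from addresses.c can be sketched as follows. The helper name format_address is hypothetical and is not part of the lecture's program; it writes the address into a buffer with snprintf rather than printing it, and the exact text %p produces (such as the leading 0x) depends on the system.

```c
#include <stdio.h>

/* Hypothetical helper (not from the lecture): write the address held in
   `p` into `out` as text, the same way printf("%p") would display it.
   Returns the number of characters written, as snprintf does. */
int format_address(const void *p, char *out, size_t size)
{
    /* %p expects a void pointer, hence the cast. */
    return snprintf(out, size, "%p", (void *) p);
}
```

Calling it on two different strings gives two different results, since each string starts at its own location in memory, which is the point the Brian and Veronica run is making.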
1857 01:27:46,060 --> 01:27:48,700 Turns out there's an infinite number of base systems-- 1858 01:27:48,700 --> 01:27:53,350 decimal/dec, binary/bi are just two of those infinite number of possibilities. 1859 01:27:53,350 --> 01:27:57,370 Turns out there's another one that's super common called hexadecimal. 1860 01:27:57,370 --> 01:27:59,890 Hexa meaning 16 in this case. 1861 01:27:59,890 --> 01:28:03,940 So base-16 actually has 16 letters in its alphabet. 1862 01:28:03,940 --> 01:28:11,560 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f. 1863 01:28:11,560 --> 01:28:15,990 So it turns out that base systems that need to count higher than 10 characters 1864 01:28:15,990 --> 01:28:18,430 just start using letters of the alphabet by convention. 1865 01:28:18,430 --> 01:28:19,840 Humans just decided this. 1866 01:28:19,840 --> 01:28:22,620 So we're getting just numbers in this case, 1867 01:28:22,620 --> 01:28:24,640 but if these addresses were even bigger, we 1868 01:28:24,640 --> 01:28:30,434 might actually see some alphabetical letters between a and f there. 1869 01:28:30,434 --> 01:28:32,350 And frankly I don't know what address this is, 1870 01:28:32,350 --> 01:28:34,308 but Google's usually pretty good at this stuff, 1871 01:28:34,308 --> 01:28:39,000 so let me actually open up another browser window. 1873 01:28:39,300 --> 01:28:41,340 So Google is your friend when it comes to this stuff, 1874 01:28:41,340 --> 01:28:42,570 or any number of calculators. 1875 01:28:42,570 --> 01:28:47,000 0x2331010 in decimal please. 1876 01:28:47,000 --> 01:28:48,300 And Google has translated that. 1877 01:28:48,300 --> 01:28:51,020 So Brian, I-- kind of under a bit earlier.
1878 01:28:51,020 --> 01:28:54,240 He is not at address location 0, he's actually 1879 01:28:54,240 --> 01:28:58,890 in the 36 millionth byte inside of my computer 1880 01:28:58,890 --> 01:29:02,760 right now, location 36,900,880. 1881 01:29:02,760 --> 01:29:05,100 So a little higher address than 100. 1882 01:29:05,100 --> 01:29:09,090 And then Veronica, if you really want to get into the weeds here, 1883 01:29:09,090 --> 01:29:12,400 we can say "in decimal," let Google translate that for us. 1884 01:29:12,400 --> 01:29:16,140 She's at location 36,900,944. 1885 01:29:16,140 --> 01:29:16,740 Why? 1886 01:29:16,740 --> 01:29:17,430 Who cares? 1887 01:29:17,430 --> 01:29:22,200 The computer is managing all of this for us, but when get_string used malloc, 1888 01:29:22,200 --> 01:29:26,160 these are literally the numbers that were being returned saying, 1889 01:29:26,160 --> 01:29:28,980 you may use this chunk of memory. 1890 01:29:28,980 --> 01:29:30,980 And why did humans use hexadecimal? 1891 01:29:30,980 --> 01:29:39,030 Like it's just slightly more compact to say 0x2331050, than 36900944-- 1892 01:29:39,030 --> 01:29:41,640 like you just save a few digits, so it's just conventional. 1893 01:29:41,640 --> 01:29:43,090 That's all, there's no magic there. 1894 01:29:43,090 --> 01:29:44,580 But, recall earlier. 1895 01:29:44,580 --> 01:29:47,400 Do you recall that when I had the debugger open earlier, 1896 01:29:47,400 --> 01:29:51,820 you saw next to my name variable a value that was cryptically 0x0? 1897 01:29:51,820 --> 01:29:53,910 Then there was another value that I don't recall-- 1898 01:29:53,910 --> 01:29:55,320 0x-something? 1899 01:29:55,320 --> 01:30:00,270 That was just the numeric address of my name in hexadecimal. 1900 01:30:00,270 --> 01:30:06,170 And 0x0 is just the technical address being used by null. 1901 01:30:06,170 --> 01:30:06,950 Yeah?
1902 01:30:06,950 --> 01:30:12,710 AUDIENCE: You said the address printed out was [INAUDIBLE] x of the variable s 1903 01:30:12,710 --> 01:30:13,210 and-- 1904 01:30:13,210 --> 01:30:13,960 DAVID MALAN: Sorry, could you say that again? 1905 01:30:13,960 --> 01:30:17,530 AUDIENCE: You said the address printed out on the screen was an x, 1906 01:30:17,530 --> 01:30:20,150 but x is [INAUDIBLE] 1907 01:30:20,150 --> 01:30:21,780 DAVID MALAN: Ah, I should've clarified. 1908 01:30:21,780 --> 01:30:24,990 0x, humans years ago decided anytime you see anything 1909 01:30:24,990 --> 01:30:29,040 with 0x, that means whatever comes next is hexadecimal. 1910 01:30:29,040 --> 01:30:30,250 Just the convention. 1911 01:30:30,250 --> 01:30:35,010 It's also common too if it starts with a 0, it's an octal, which is base-8. 1912 01:30:35,010 --> 01:30:37,460 If you see a lowercase b at the end, it means binary. 1913 01:30:37,460 --> 01:30:39,210 So humans have just come up with symbology 1914 01:30:39,210 --> 01:30:41,850 as to kind of communicate this to readers, that's all. 1915 01:30:41,850 --> 01:30:42,930 Not part of the value. 1916 01:30:42,930 --> 01:30:45,840 So turns out that we can actually do this math ourselves. 1917 01:30:45,840 --> 01:30:47,760 And we won't really get into the weeds of this 1918 01:30:47,760 --> 01:30:50,010 because it's not a particularly useful life 1919 01:30:50,010 --> 01:30:52,310 skill, to be able to convert to various base systems, 1920 01:30:52,310 --> 01:30:54,480 but let's just do one example so that we've seen it. 1921 01:30:54,480 --> 01:30:56,070 Just to make clear that there's no magic here, 1922 01:30:56,070 --> 01:30:59,060 it's just a different way of thinking about numbers versus grade school. 
1923 01:30:59,060 --> 01:31:01,470 So if back in the day we had three decimal numbers-- 1924 01:31:01,470 --> 01:31:06,750 255, 216, and then another 255, if we rewound to week 0, 1925 01:31:06,750 --> 01:31:09,310 we could go through the math of converting that to binary. 1926 01:31:09,310 --> 01:31:12,480 And even if it might take you a little while, this is the binary equivalent. 1927 01:31:12,480 --> 01:31:15,590 And frankly, the first and last are kind of easy. 1928 01:31:15,590 --> 01:31:19,350 255 is kind of a special value because with 8 bits, all of which 1929 01:31:19,350 --> 01:31:21,690 are 1, that's what gives you 255. 1930 01:31:21,690 --> 01:31:23,500 So the only hard one is actually this. 1931 01:31:23,500 --> 01:31:25,410 But who cares about the math today. 1932 01:31:25,410 --> 01:31:28,350 We know from weeks ago that we can do this if we really tried. 1933 01:31:28,350 --> 01:31:35,760 But notice that bytes are eight bits, and of course, eight is a pair of four, 1934 01:31:35,760 --> 01:31:36,720 if you will. 1935 01:31:36,720 --> 01:31:40,800 Well what's really nice about hexadecimal is that it starts at 0 1936 01:31:40,800 --> 01:31:41,560 and ends at f. 1937 01:31:41,560 --> 01:31:46,100 And that's 0, 1, 2, 3, 4, 5, 6, 7, 8, 9-- 1938 01:31:46,100 --> 01:31:47,560 wait-- yes, that's 10. 1939 01:31:47,560 --> 01:31:48,060 OK. 1940 01:31:48,060 --> 01:31:51,240 And then a, b, c, d, e, f. 1941 01:31:51,240 --> 01:31:54,840 I just held up 16 fingers in total, hence, hexadecimal. 1942 01:31:54,840 --> 01:32:00,630 What's nice about base-16 is that how many bits do I need to count from 0 up 1943 01:32:00,630 --> 01:32:02,010 to-- 1944 01:32:02,010 --> 01:32:03,120 one, two, three, four-- 1945 01:32:03,120 --> 01:32:05,400 15? 1946 01:32:05,400 --> 01:32:06,510 Just 4, right? 1947 01:32:06,510 --> 01:32:09,510 So if I have all 0 bits, that's 0. 
1948 01:32:09,510 --> 01:32:13,010 And if I have 4 1-bits, that's-- 1949 01:32:13,010 --> 01:32:13,560 let's see. 1950 01:32:13,560 --> 01:32:18,940 This is an 8 plus 4 plus 2 plus 1 gives me 15. 1951 01:32:18,940 --> 01:32:22,830 So long story short, hexadecimal's super convenient because 0 through f 1952 01:32:22,830 --> 01:32:25,230 maps wonderfully cleanly to 4 bits. 1953 01:32:25,230 --> 01:32:28,110 So it's just a nice way of thinking about the world not in units of 8 1954 01:32:28,110 --> 01:32:29,670 but in 4 instead. 1955 01:32:29,670 --> 01:32:31,800 So all I did here was I took my values and I just 1956 01:32:31,800 --> 01:32:33,720 added a little bit of whitespace to make clear 1957 01:32:33,720 --> 01:32:35,970 that 8 bits is like a pair of 4 bits. 1958 01:32:35,970 --> 01:32:40,760 It turns out now that 1 1 1 1 is f for the reasons I enumerated earlier. 1959 01:32:40,760 --> 01:32:44,700 All 1's is f, otherwise known as 15. 1960 01:32:44,700 --> 01:32:47,100 All 1's is again f, otherwise known as 15. 1961 01:32:47,100 --> 01:32:55,470 If we did the math, 1 1 0 1 is d, 1 0 0 0 is 8, and then all 1's is f and f. 1962 01:32:55,470 --> 01:32:58,710 So long story short, there is a way to convert from decimal 1963 01:32:58,710 --> 01:33:01,390 to binary, to hexadecimal, to any number of other base systems. 1964 01:33:01,390 --> 01:33:03,600 It all just boils down to what digits you care about. 1965 01:33:03,600 --> 01:33:05,640 And the way you write this, to your question earlier, 1966 01:33:05,640 --> 01:33:06,720 is by human convention. 1967 01:33:06,720 --> 01:33:12,510 Not just FFD8FF, but 0xFF 0xD8 0xFF just because. 1968 01:33:12,510 --> 01:33:14,430 Then it's clear to the user what it is. 1969 01:33:14,430 --> 01:33:16,740 So a little levity now. 1970 01:33:16,740 --> 01:33:19,260 I'm sorry to do this to you, but now you will all hopefully 1971 01:33:19,260 --> 01:33:21,120 understand this famous comic.
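The mapping of each group of 4 bits to one hexadecimal digit, as walked through above for 255, 216, and 255, can be sketched in code. The helper name byte_to_hex is hypothetical and not part of the lecture; it is one small way to show the idea, not the only one.

```c
/* Hypothetical helper (not from the lecture): write the two hexadecimal
   digits for one byte into `out`, which must have room for 3 chars. */
void byte_to_hex(unsigned char byte, char out[3])
{
    const char digits[] = "0123456789abcdef";

    out[0] = digits[byte >> 4];    /* high 4 bits pick the first digit */
    out[1] = digits[byte & 0x0f];  /* low 4 bits pick the second digit */
    out[2] = '\0';
}
```

For 216, whose bits are 1101 1000, the high half selects d and the low half selects 8, matching the d and 8 read off the slide.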
1972 01:33:26,560 --> 01:33:29,610 OK, welcome to that club of people who understand things like this. 1973 01:33:29,610 --> 01:33:34,830 So let's now stumble upon just one last problem, 1974 01:33:34,830 --> 01:33:36,920 and we'll take it home by putting into the context 1975 01:33:36,920 --> 01:33:41,180 a very sexy field of forensics where all of these building blocks 1976 01:33:41,180 --> 01:33:42,190 will come into play. 1977 01:33:42,190 --> 01:33:43,740 But first let's start with a problem. 1978 01:33:43,740 --> 01:33:47,300 Suppose I want to implement a function here called swap whose purpose in life 1979 01:33:47,300 --> 01:33:49,160 is just to swap two values, a and b. 1980 01:33:49,160 --> 01:33:50,660 I just want to do a switcheroo. 1981 01:33:50,660 --> 01:33:54,800 Let's first do this with a sort of mid-lecture snack for at least 1982 01:33:54,800 --> 01:33:55,400 one person. 1983 01:33:55,400 --> 01:33:56,880 Would anyone be up for-- 1984 01:33:56,880 --> 01:33:57,680 OK, that was fast. 1985 01:33:57,680 --> 01:34:00,080 Volunteering, come on up. 1986 01:34:00,080 --> 01:34:01,160 What's your name? 1987 01:34:01,160 --> 01:34:02,580 Kelly, all right. 1988 01:34:02,580 --> 01:34:04,210 Thank you for volunteering so suddenly. 1989 01:34:07,880 --> 01:34:09,720 Kelly, David, nice to meet you. 1990 01:34:09,720 --> 01:34:11,900 OK, so very simple task at hand. 1991 01:34:11,900 --> 01:34:16,080 I have here two empty cups, and we have some orange juice. 1992 01:34:19,480 --> 01:34:22,400 OK, put this in here. 1993 01:34:22,400 --> 01:34:26,890 And we've got some milk over here. 1994 01:34:26,890 --> 01:34:29,870 That should stand out, very different colors. 1995 01:34:29,870 --> 01:34:34,830 OK, I would just like you, Kelly, if you could, swap those two values. 1996 01:34:34,830 --> 01:34:37,000 Orange goes into milk, milk goes into orange please. 1997 01:34:42,450 --> 01:34:44,390 That is cheating, OK? 
1998 01:34:44,390 --> 01:34:45,840 No, I mean literally the cups. 1999 01:34:45,840 --> 01:34:47,430 I put them in the wrong cup, I prefer my milk 2000 01:34:47,430 --> 01:34:50,130 in the other cup and my orange juice in the other cup, I'm sorry. 2001 01:34:53,190 --> 01:34:54,640 AUDIENCE: Pour it back in. 2002 01:34:54,640 --> 01:34:56,730 DAVID MALAN: No, that is not available to you, OK? 2003 01:34:56,730 --> 01:34:57,860 [LAUGHTER] 2004 01:34:57,860 --> 01:34:59,280 OK, so you're struggling. 2005 01:34:59,280 --> 01:35:00,240 Why are you struggling? 2006 01:35:00,240 --> 01:35:01,830 KELLY: Because I'm going to mix them. 2007 01:35:01,830 --> 01:35:03,080 And then it won't be the same. 2008 01:35:03,080 --> 01:35:03,870 DAVID MALAN: Right. 2009 01:35:03,870 --> 01:35:06,570 So I mean obviously, this is kind of a losing proposition. 2010 01:35:06,570 --> 01:35:07,320 You can't really do this. 2011 01:35:07,320 --> 01:35:09,300 What would make this easier for you besides putting them back 2012 01:35:09,300 --> 01:35:10,110 in the bottles? 2013 01:35:10,110 --> 01:35:10,870 KELLY: Having another container. 2014 01:35:10,870 --> 01:35:11,620 DAVID MALAN: Yeah. 2015 01:35:11,620 --> 01:35:14,490 So you need like a temporary storage space for this. 2016 01:35:14,490 --> 01:35:15,240 You know, let me-- 2017 01:35:15,240 --> 01:35:18,560 Tara, can we get some more cups over here? 2018 01:35:18,560 --> 01:35:20,700 Ah, this will make it easier. 2019 01:35:20,700 --> 01:35:22,760 OK, so if I get you some temporary space-- 2020 01:35:22,760 --> 01:35:24,970 here you go-- could you solve the problem now please? 2021 01:35:28,330 --> 01:35:30,370 Ah, very nice. 2022 01:35:30,370 --> 01:35:35,150 A little contamination, but that's OK. 2023 01:35:35,150 --> 01:35:37,270 But I need that temporary cup back for Tara. 2024 01:35:37,270 --> 01:35:38,830 Yeah, OK. 2025 01:35:38,830 --> 01:35:39,790 Thank you. 
2026 01:35:39,790 --> 01:35:42,250 All right, a round of applause if we could for Kelly here. 2027 01:35:42,250 --> 01:35:44,090 [APPLAUSE] 2028 01:35:44,090 --> 01:35:44,890 Well here we go. 2029 01:35:44,890 --> 01:35:47,230 I'm guessing you don't want warm milk, but orange juice? 2030 01:35:47,230 --> 01:35:47,720 OK. 2031 01:35:47,720 --> 01:35:48,490 Thank you so much. 2032 01:35:48,490 --> 01:35:50,350 All right, so what's the point here? 2033 01:35:50,350 --> 01:35:51,260 This is pretty easy. 2034 01:35:51,260 --> 01:35:53,350 Like once you have some temporary storage 2035 01:35:53,350 --> 01:35:57,140 space-- a variable, if you will, like it's no problem to swap two values. 2036 01:35:57,140 --> 01:36:00,080 So let me go ahead and do that as follows. 2037 01:36:00,080 --> 01:36:02,740 I'm going to go ahead and just implement this swap function 2038 01:36:02,740 --> 01:36:05,760 and see exactly as Kelly ultimately just implemented it. 2039 01:36:05,760 --> 01:36:09,430 If the goal is to swap a and b, I can't just do a complete switcheroo, 2040 01:36:09,430 --> 01:36:10,090 it seems. 2041 01:36:10,090 --> 01:36:13,600 I need to put one of those values, like the milk, in another container, 2042 01:36:13,600 --> 01:36:15,100 and then swap and then swap. 2043 01:36:15,100 --> 01:36:17,170 So it takes three steps, not just one. 2044 01:36:17,170 --> 01:36:19,810 All right, so I could call this extra variable or cup 2045 01:36:19,810 --> 01:36:22,270 that Tara gave us anything we want-- tmp. 2046 01:36:22,270 --> 01:36:25,210 So I'm just going to put a in tmp. 2047 01:36:25,210 --> 01:36:28,540 Then I'm going to put b in a, because a is now empty. 2048 01:36:28,540 --> 01:36:31,050 Then I'm going to put tmp in b, and then I don't really 2049 01:36:31,050 --> 01:36:33,760 care what happens to tmp-- indeed, it's just still sitting there, 2050 01:36:33,760 --> 01:36:35,690 but the job is now done. 
2051 01:36:35,690 --> 01:36:39,610 So let's go ahead and see this program in action, because obviously this 2052 01:36:39,610 --> 01:36:40,990 should be pretty straightforward. 2053 01:36:40,990 --> 01:36:44,260 So let me go ahead and open up this program 2054 01:36:44,260 --> 01:36:47,920 in the context of a main function so we can actually run it. 2055 01:36:47,920 --> 01:36:51,030 In this code here, I'm going to demonstrate it as follows. 2056 01:36:51,030 --> 01:36:52,030 Here's my main function. 2057 01:36:52,030 --> 01:36:55,120 I'm going to call variable x, give it 1, call variable y, 2058 01:36:55,120 --> 01:36:58,470 give it 2, go ahead and just print out just for a quick sanity check-- 2059 01:36:58,470 --> 01:37:00,580 x is this, y is that. 2060 01:37:00,580 --> 01:37:04,390 Then I'm going to call this super simple swap function, x, y. 2061 01:37:04,390 --> 01:37:08,050 Then I'm going to print the exact same thing-- x is this, y is that, 2062 01:37:08,050 --> 01:37:09,650 just so I can see in those variables-- 2063 01:37:09,650 --> 01:37:12,250 I could also use debug50, but this is meant to be a complete solution, 2064 01:37:12,250 --> 01:37:13,600 I want to see it on the screen. 2065 01:37:13,600 --> 01:37:14,410 Here is swap. 2066 01:37:14,410 --> 01:37:16,360 I copy-pasted that from before. 2067 01:37:16,360 --> 01:37:18,610 This feels like a no-brainer, super straightforward, 2068 01:37:18,610 --> 01:37:23,770 let's go into my directory and compile this program, which, slight spoiler, 2069 01:37:23,770 --> 01:37:26,590 noswap is the name. 2070 01:37:26,590 --> 01:37:29,730 ./noswap. 2071 01:37:29,730 --> 01:37:32,180 Oof. 2072 01:37:32,180 --> 01:37:33,830 Let's zoom in. 2073 01:37:33,830 --> 01:37:35,630 Nope, that is not what I intended, right? 2074 01:37:35,630 --> 01:37:38,380 I really intended milk to become OJ, OJ to become milk, 2075 01:37:38,380 --> 01:37:41,950 or x become y, y become x, this doesn't seem to work. 
2076 01:37:41,950 --> 01:37:44,750 And again, the only magic is this one call to swap. 2077 01:37:44,750 --> 01:37:46,750 All right, maybe it just works some of the time. 2078 01:37:46,750 --> 01:37:49,050 So nope, nope-- OK. 2079 01:37:49,050 --> 01:37:50,350 Now it's time for the debugger. 2080 01:37:50,350 --> 01:37:52,390 I don't understand what's going on in my program, 2081 01:37:52,390 --> 01:37:54,430 printf is not really illuminating here. 2082 01:37:54,430 --> 01:37:58,530 So let me go ahead and run debug50 ./noswap. 2083 01:37:58,530 --> 01:38:00,660 The little debugging panels get opened on the side, 2084 01:38:00,660 --> 01:38:02,440 but wait, I need a breakpoint. 2085 01:38:02,440 --> 01:38:05,660 I'm going to set a breakpoint at the very top, the first line I care about. 2086 01:38:05,660 --> 01:38:08,050 I don't really care about all the stuff at the super top. 2087 01:38:08,050 --> 01:38:12,330 Now I'm going to go ahead and rerun debug50 ./noswap, all right? 2088 01:38:12,330 --> 01:38:15,680 Now I see over here, the first line 9 is highlighted. 2089 01:38:15,680 --> 01:38:17,640 Notice on the right-hand side, and this perhaps 2090 01:38:17,640 --> 01:38:19,890 answers by example your question earlier. 2091 01:38:19,890 --> 01:38:23,490 x and y conveniently, but just because, were initialized to 0-- 2092 01:38:23,490 --> 01:38:26,500 not by me, I shouldn't necessarily trust this in all contexts, 2093 01:38:26,500 --> 01:38:28,160 but that's why they had values. 2094 01:38:28,160 --> 01:38:31,500 They're otherwise known as garbage values, but I got lucky with 0's here. 2095 01:38:31,500 --> 01:38:34,500 Let me go ahead and step over that line, and if you watch, albeit small, 2096 01:38:34,500 --> 01:38:39,070 on the right-hand side, x should suddenly take on a value of 1. 2097 01:38:39,070 --> 01:38:43,170 And if I step over one more line, y should take on a value of 2.
2098 01:38:43,170 --> 01:38:45,930 OK, so I'm pretty confident the program is thus far correct. 2099 01:38:45,930 --> 01:38:48,390 I'm going to go ahead and step over printf. 2100 01:38:48,390 --> 01:38:51,460 And notice the blue terminal window, I see one output. 2101 01:38:51,460 --> 01:38:53,010 Now things get interesting. 2102 01:38:53,010 --> 01:38:56,700 If I continue stepping over lines, it's just going to finish running 2103 01:38:56,700 --> 01:38:58,020 and that's not enough. 2104 01:38:58,020 --> 01:39:01,890 So notice this time I'm going to hover over this third icon, Step Into. 2105 01:39:01,890 --> 01:39:03,670 Now I can kind of go down the rabbit hole, 2106 01:39:03,670 --> 01:39:07,110 so to speak, and go into the swap function, and notice, 2107 01:39:07,110 --> 01:39:09,430 the debugger jumps into that other function. 2108 01:39:09,430 --> 01:39:11,400 So here now, the context changed. 2109 01:39:11,400 --> 01:39:15,570 My local variables are now a, b, and tmp, and this is really weird. 2110 01:39:15,570 --> 01:39:21,000 a is 1, b is 2, as expected, because I passed in x, y. 2111 01:39:21,000 --> 01:39:25,380 And in the context of this function I'm just calling them a, b because. 2112 01:39:25,380 --> 01:39:29,450 But why is tmp 32,767? 2113 01:39:29,450 --> 01:39:31,950 It's just because it can't be trusted, it's a garbage value. 2114 01:39:31,950 --> 01:39:35,760 If you just give yourself a temporary value, who knows what's in there? 2115 01:39:35,760 --> 01:39:38,250 We got lucky and Tara did not have anything in this cup, 2116 01:39:38,250 --> 01:39:41,370 but it could have had a garbage value, maybe it had some Pepsi, 2117 01:39:41,370 --> 01:39:44,100 and then we would have had to replace that value somehow.
2118 01:39:44,100 --> 01:39:47,130 So to be clear, when you declare variables in a program, 2119 01:39:47,130 --> 01:39:50,430 quite often they have garbage values, just bogus values-- 2120 01:39:50,430 --> 01:39:53,620 the 0's and 1's that are there underneath the hood in that chip, 2121 01:39:53,620 --> 01:39:55,020 but that you didn't set yourself. 2122 01:39:55,020 --> 01:39:59,550 But that's OK, because I'm explicitly in this next line setting tmp equal to a. 2123 01:39:59,550 --> 01:40:03,990 So it doesn't matter what its original weird value was, so if I click Next, 2124 01:40:03,990 --> 01:40:06,160 tmp is now 1, a.k.a. 2125 01:40:06,160 --> 01:40:07,500 a. 2126 01:40:07,500 --> 01:40:11,230 Now notice a is going to become b if you watch the right-hand side. 2127 01:40:11,230 --> 01:40:15,480 Now I seem to have a is 2, b is 2, which is a little worrisome but not as bad, 2128 01:40:15,480 --> 01:40:18,570 because I have that separate variable tmp, so I still have the one around. 2129 01:40:18,570 --> 01:40:22,770 So now b is about to become 1, and I've done the switcheroo. 2130 01:40:22,770 --> 01:40:27,870 OK, at this point in the story, line 22, my code seems correct. 2131 01:40:27,870 --> 01:40:30,750 b has become a, a has become b, and the values are swapped-- 2132 01:40:30,750 --> 01:40:34,650 and the debugger is confirming that for me visually. 2133 01:40:34,650 --> 01:40:39,900 So now, let's do a step and-- 2134 01:40:39,900 --> 01:40:42,560 dammit. 2135 01:40:42,560 --> 01:40:43,990 Lost. 2136 01:40:43,990 --> 01:40:45,820 What is going on? 2137 01:40:45,820 --> 01:40:46,600 Intuitively? 2138 01:40:46,600 --> 01:40:53,010 Even if you've never seen or done this before, like clearly there's a bug. 2139 01:40:53,010 --> 01:40:55,170 What is that bug? 2140 01:40:55,170 --> 01:40:56,130 What must be happening? 2141 01:40:56,130 --> 01:40:57,090 Yeah? 
2142 01:40:57,090 --> 01:41:01,830 AUDIENCE: [INAUDIBLE] a new value [INAUDIBLE] 2143 01:41:01,830 --> 01:41:03,830 doesn't have the same address for the first one? 2144 01:41:03,830 --> 01:41:04,580 DAVID MALAN: Yeah. 2145 01:41:04,580 --> 01:41:07,480 What seems to be happening here is yes, you're passing in x and y 2146 01:41:07,480 --> 01:41:12,190 and calling it a and b, but a and b would seem to be copies of x and y. 2147 01:41:12,190 --> 01:41:16,060 And I am very successfully, very correctly swapping a and b, 2148 01:41:16,060 --> 01:41:20,170 but because they're copies, it has no effect on the original x and y. 2149 01:41:20,170 --> 01:41:22,810 So our metaphor here of juice isn't quite apt 2150 01:41:22,810 --> 01:41:27,250 because I didn't pass Kelly copies of the OJ and milk, 2151 01:41:27,250 --> 01:41:31,630 I handed her the actual OJ and milk and she was able to change the values. 2152 01:41:31,630 --> 01:41:35,290 But in the context of C and code, when you pass arguments to a function, 2153 01:41:35,290 --> 01:41:38,300 you're passing copies of those arguments to the function. 2154 01:41:38,300 --> 01:41:40,930 So intuitively, what is the solution? 2155 01:41:40,930 --> 01:41:44,980 We clearly cannot pass from one function to another copies of the values if we 2156 01:41:44,980 --> 01:41:47,920 expect the function swap, or a.k.a. 2157 01:41:47,920 --> 01:41:50,720 Kelly, to make some useful change for us. 2158 01:41:50,720 --> 01:41:55,090 What do we have to pass to the function or to Kelly instead? 2159 01:41:55,090 --> 01:41:57,370 The addresses of those values, right? 2160 01:41:57,370 --> 01:41:59,950 I told her where the milk and OJ were. 2161 01:41:59,950 --> 01:42:02,710 I didn't give her copies of them, I told her, here's the milk, 2162 01:42:02,710 --> 01:42:04,740 here's the OJ, swap those. 
2163 01:42:04,740 --> 01:42:06,580 In this version of the code, I've just said, 2164 01:42:06,580 --> 01:42:10,570 here's a copy of x, here's a copy of y, you can call them a and b-- um-mmm. 2165 01:42:10,570 --> 01:42:14,660 We need to now use the ampersand or something like that to pass in a map, 2166 01:42:14,660 --> 01:42:15,160 if you will. 2167 01:42:15,160 --> 01:42:20,020 The treasure map to those values so that swap can change the original values. 2168 01:42:20,020 --> 01:42:22,660 And the way we do this is a little weird-looking, 2169 01:42:22,660 --> 01:42:27,640 but all we're going to have to do is make a little addition here 2170 01:42:27,640 --> 01:42:29,830 that looks as follows. 2171 01:42:29,830 --> 01:42:33,320 It's got to look like this instead. 2172 01:42:33,320 --> 01:42:35,170 So this is the broken version. 2173 01:42:35,170 --> 01:42:39,070 Or broken in that it doesn't have the effect we intend even though it works. 2174 01:42:39,070 --> 01:42:41,530 This is what we need to do instead, and it's the last piece 2175 01:42:41,530 --> 01:42:42,990 of new symbology for today. 2176 01:42:42,990 --> 01:42:44,950 We've seen star in a couple of different places 2177 01:42:44,950 --> 01:42:47,920 before, now we're using it in one final context. 2178 01:42:47,920 --> 01:42:53,080 When you specify a star here and here in the arguments to a function, that 2179 01:42:53,080 --> 01:42:55,120 is just the way you tell the computer, I'm 2180 01:42:55,120 --> 01:42:57,820 expecting not an int, but the address of an int. 2181 01:42:57,820 --> 01:43:00,320 I'm expecting not an int here, but the address of an int. 2182 01:43:00,320 --> 01:43:03,520 So two pointers, two addresses of integers. 2183 01:43:03,520 --> 01:43:05,860 Down here, tmp is still just an int. 2184 01:43:05,860 --> 01:43:08,630 I don't need to over think tmp, that's just an empty cup. 2185 01:43:08,630 --> 01:43:11,410 Give me an integer called tmp from week 1. 
2186 01:43:11,410 --> 01:43:14,230 But, what do I want to store in tmp? 2187 01:43:14,230 --> 01:43:17,140 Both a and b in this version are addresses. 2188 01:43:17,140 --> 01:43:22,370 Do I want to remember the address a and the address b? 2189 01:43:22,370 --> 01:43:25,460 No, I want to remember the volume of OJ, the volume of milk, 2190 01:43:25,460 --> 01:43:29,780 I want to remember 1 and 2, I don't care where in memory they are. 2191 01:43:29,780 --> 01:43:34,460 So star in this context, when there's no mention of a data type, 2192 01:43:34,460 --> 01:43:37,110 there's just a star and a variable name. 2193 01:43:37,110 --> 01:43:39,410 That variable is a pointer and it's not multiplication, 2194 01:43:39,410 --> 01:43:40,960 there's no math going on. 2195 01:43:40,960 --> 01:43:46,340 That star is the dereference operator that says, go to this address 2196 01:43:46,340 --> 01:43:48,030 and get the value there. 2197 01:43:48,030 --> 01:43:52,070 So if this address a is at location, I don't know, 100 like Brian was, 2198 01:43:52,070 --> 01:43:55,400 and this address b is at location 900 like Veronica was, 2199 01:43:55,400 --> 01:44:01,160 *a means go to the 100th byte in memory and get me that value, which is 1. 2200 01:44:01,160 --> 01:44:05,990 This means, down here, go to the address b, get me that value at address 900, 2201 01:44:05,990 --> 01:44:07,580 which is 2. 2202 01:44:07,580 --> 01:44:10,670 And go ahead and store 1 in tmp. 2203 01:44:10,670 --> 01:44:13,490 Go ahead and go to that address and put whatever's 2204 01:44:13,490 --> 01:44:17,300 at b's address-- so get that address and put it over-- get that address, 2205 01:44:17,300 --> 01:44:20,840 get the value, and put it over at that address by dereferencing. 2206 01:44:20,840 --> 01:44:26,310 And then lastly, go to b in memory, like over there, put the tmp value there. 
2207 01:44:26,310 --> 01:44:28,910 So whereas ampersand in our previous example means, 2208 01:44:28,910 --> 01:44:32,690 tell me what the address is of a variable, star is the opposite. 2209 01:44:32,690 --> 01:44:35,360 When you have an address, it says, go to that address. 2210 01:44:35,360 --> 01:44:39,410 Follow the treasure map, X marks the spot at that location in memory, 2211 01:44:39,410 --> 01:44:40,950 and get at its value. 2212 01:44:40,950 --> 01:44:42,850 So what is the net effect here? 2213 01:44:42,850 --> 01:44:46,850 If I actually now open up not this example, but swap.c-- 2214 01:44:46,850 --> 01:44:50,030 spoiler, this one is going to actually work. 2215 01:44:50,030 --> 01:44:55,490 If I open up swap.c, we're going to see now the following instead. 2216 01:44:55,490 --> 01:44:58,600 The code is almost the same, except that I pasted it 2217 01:44:58,600 --> 01:45:01,160 in this new green version of the function. 2218 01:45:01,160 --> 01:45:03,200 And notice here, this had a change. 2219 01:45:03,200 --> 01:45:11,860 Why am I typing in &x now and &y instead of just x and y? 2220 01:45:11,860 --> 01:45:17,200 AUDIENCE: [INAUDIBLE] address [INAUDIBLE] functions [INAUDIBLE]. 2221 01:45:17,200 --> 01:45:18,190 DAVID MALAN: Exactly. 2222 01:45:18,190 --> 01:45:20,140 The swap function now, the new improved version 2223 01:45:20,140 --> 01:45:22,840 is expecting two addresses-- stars. 2224 01:45:22,840 --> 01:45:25,840 Each star, a.k.a. pointers, not just values. 2225 01:45:25,840 --> 01:45:29,800 So this means I know x and y are actually integers from week 1. 2226 01:45:29,800 --> 01:45:31,990 Now I need the address of x and the address of y 2227 01:45:31,990 --> 01:45:35,270 so that swap can follow those treasure maps, 2228 01:45:35,270 --> 01:45:37,580 so to speak, and go to those addresses.
2229 01:45:37,580 --> 01:45:42,250 So now, when I run this program, this is more like the metaphor with Kelly 2230 01:45:42,250 --> 01:45:44,290 where I told her where the milk and OJ were. 2231 01:45:44,290 --> 01:45:48,760 Now swap and go to those locations as follows. make swap. 2232 01:45:48,760 --> 01:45:51,950 Let me go ahead and then do ./swap, Enter-- 2233 01:45:51,950 --> 01:45:52,630 ah! 2234 01:45:52,630 --> 01:45:54,460 Now it seems to be working. 2235 01:45:54,460 --> 01:45:56,380 And we can see as much even with the debugger. 2236 01:45:56,380 --> 01:45:59,650 Even though it doesn't seem to be buggy, I can still use debug50 2237 01:45:59,650 --> 01:46:03,040 to see and understand my program, if not obvious-- oh, 2238 01:46:03,040 --> 01:46:04,180 I still need a breakpoint. 2239 01:46:04,180 --> 01:46:05,890 Let's set a breakpoint as before. 2240 01:46:05,890 --> 01:46:07,870 Let's rerun debug50. 2241 01:46:07,870 --> 01:46:11,110 The right-hand panel will open automatically for me. 2242 01:46:11,110 --> 01:46:14,590 And let's go ahead and see, if I start stepping over this, 2243 01:46:14,590 --> 01:46:23,280 now I see that x is 1, y is 2, printf prints as much on the screen. 2244 01:46:23,280 --> 01:46:25,980 Now I'm going to go ahead and step into swap, 2245 01:46:25,980 --> 01:46:28,760 and now notice, it's a little weird-looking, 2246 01:46:28,760 --> 01:46:32,420 because now a is an address and b is an address, 2247 01:46:32,420 --> 01:46:36,530 but tmp is still an int with a garbage value, but I can fix that. 2248 01:46:36,530 --> 01:46:41,700 Now tmp is 1, but notice, a and b's values are not changing, 2249 01:46:41,700 --> 01:46:43,910 but what is clearly changing per the code? 2250 01:46:46,660 --> 01:46:48,760 So notice, this is weird and cryptic. 2251 01:46:48,760 --> 01:46:50,740 a is this 0x value. 2252 01:46:50,740 --> 01:46:54,550 That's a big hexadecimal address, like that is where in memory a is. 
2253 01:46:54,550 --> 01:46:55,300 But you know what? 2254 01:46:55,300 --> 01:46:58,620 If I click the little triangle, I can kind of follow that pointer 2255 01:46:58,620 --> 01:46:59,350 and go to it. 2256 01:46:59,350 --> 01:47:01,120 The debugger is smart like that. 2257 01:47:01,120 --> 01:47:06,820 So *a, go to a is 2; and *b at the moment is 2, but if I keep going, 2258 01:47:06,820 --> 01:47:11,240 now I've done a switcheroo, and you can see that these values have changed. 2259 01:47:11,240 --> 01:47:13,330 And again, we don't care what these addresses are, 2260 01:47:13,330 --> 01:47:15,010 I don't care what the actual addresses are. 2261 01:47:15,010 --> 01:47:17,630 I do care that it gives me this functionality, because now when 2262 01:47:17,630 --> 01:47:20,050 I return up here in print, now the values have indeed 2263 01:47:20,050 --> 01:47:23,350 changed as I expected this whole time. 2264 01:47:23,350 --> 01:47:24,070 All right. 2265 01:47:24,070 --> 01:47:30,520 That was complex, but hopefully clear as to why it now works even though we've 2266 01:47:30,520 --> 01:47:33,150 made this code look more cryptic. 2267 01:47:33,150 --> 01:47:34,990 If not, any questions are welcome. 2268 01:47:34,990 --> 01:47:35,640 Yeah? 2269 01:47:35,640 --> 01:47:39,610 AUDIENCE: Is that from the spot where [INAUDIBLE] 2270 01:47:39,610 --> 01:47:40,450 DAVID MALAN: Uh huh. 2271 01:47:40,450 --> 01:47:44,880 AUDIENCE: [INAUDIBLE] the star [INAUDIBLE] pointers? 2272 01:47:44,880 --> 01:47:46,010 DAVID MALAN: Good question. 2273 01:47:46,010 --> 01:47:49,960 Do we really need to have these ampersands here because we already 2274 01:47:49,960 --> 01:47:50,920 have the stars here? 2275 01:47:50,920 --> 01:47:52,630 Short answer, yes, for symmetry. 2276 01:47:52,630 --> 01:47:55,660 This is telling the function what to expect on the way in; 2277 01:47:55,660 --> 01:48:00,600 this is what's telling the computer actually what to send in. 
2278 01:48:00,600 --> 01:48:03,420 So what are the actual inputs to that function? 2279 01:48:03,420 --> 01:48:05,370 It has to be symmetric. 2280 01:48:05,370 --> 01:48:05,960 Yeah? 2281 01:48:05,960 --> 01:48:11,530 AUDIENCE: [INAUDIBLE] value is swapping addresses. 2282 01:48:11,530 --> 01:48:16,110 DAVID MALAN: We are swapping what is at the addresses. 2283 01:48:16,110 --> 01:48:24,870 AUDIENCE: So what if you change the address of [INAUDIBLE] 2284 01:48:24,870 --> 01:48:25,530 DAVID MALAN: OK. 2285 01:48:25,530 --> 01:48:29,000 AUDIENCE: And would we swap the addresses saying 2 is at 200 and 1 2286 01:48:29,000 --> 01:48:31,830 is at [INAUDIBLE] that could change. 2287 01:48:31,830 --> 01:48:35,970 DAVID MALAN: Short answer, you cannot for the following reason. 2288 01:48:35,970 --> 01:48:41,220 So technically, when you do &x and &y, these are converted to the address 2289 01:48:41,220 --> 01:48:42,540 of x, the address of y. 2290 01:48:42,540 --> 01:48:46,770 Technically swap is getting copies of something, C has not changed. 2291 01:48:46,770 --> 01:48:49,740 But C is now getting copies of the address 2292 01:48:49,740 --> 01:48:53,940 of x, copies of the address of y, calling them a and b. 2293 01:48:53,940 --> 01:48:57,030 So sure, you could swap the addresses, but for the same reasons as before, 2294 01:48:57,030 --> 01:48:58,860 it's going to have no fundamental effect. 2295 01:48:58,860 --> 01:49:01,770 The difference here is because I'm passing in a map, so to speak, 2296 01:49:01,770 --> 01:49:03,420 to x and y, their addresses. 2297 01:49:03,420 --> 01:49:04,830 And again, an address is like-- 2298 01:49:04,830 --> 01:49:08,490 we are at 45 Quincy Street I think right now-- 2299 01:49:08,490 --> 01:49:10,470 Cambridge, Massachusetts 02138, USA. 2300 01:49:10,470 --> 01:49:12,150 That uniquely identifies the building. 2301 01:49:12,150 --> 01:49:15,820 These 0x hexadecimal numbers uniquely identify locations in memory.
2302 01:49:15,820 --> 01:49:19,350 So this is like saying now, get me the address of x, get me the address of y, 2303 01:49:19,350 --> 01:49:22,680 and I'm technically passing in copies of those addresses, but it doesn't matter, 2304 01:49:22,680 --> 01:49:25,920 because now with the star notation, I'm saying go to those addresses 2305 01:49:25,920 --> 01:49:30,450 and swap who is physically in this building and some other. 2306 01:49:30,450 --> 01:49:31,170 All right. 2307 01:49:31,170 --> 01:49:34,380 So let's just put this now into the context of what else 2308 01:49:34,380 --> 01:49:36,180 your computer actually has just so that you've 2309 01:49:36,180 --> 01:49:39,340 seen some nomenclature around this computer's memory. 2310 01:49:39,340 --> 01:49:41,700 So this is the chip with a grid laid out on top of it 2311 01:49:41,700 --> 01:49:44,700 just to communicate that there's bytes here, and we could number them. 2312 01:49:44,700 --> 01:49:47,160 But let's think about this now more abstractly, 2313 01:49:47,160 --> 01:49:49,890 and let me just reveal that it turns out that the computer treats 2314 01:49:49,890 --> 01:49:53,910 different bytes, different squares in different ways just by convention. 2315 01:49:53,910 --> 01:49:56,040 It turns out that in your computer's memory-- 2316 01:49:56,040 --> 01:49:58,590 and this is all just an artist's representation-- 2317 01:49:58,590 --> 01:50:01,390 at the top of that chip of memory, so to speak, 2318 01:50:01,390 --> 01:50:03,130 is the so-called text of your program. 2319 01:50:03,130 --> 01:50:05,250 This is a fancy and non-obvious way of saying 2320 01:50:05,250 --> 01:50:09,540 the 0's and 1's that your code has been compiled into. 2321 01:50:09,540 --> 01:50:12,530 The text of a program is the code you wrote in binary, 2322 01:50:12,530 --> 01:50:14,080 that's where it's loaded in memory.
2323 01:50:14,080 --> 01:50:16,290 So in macOS and Windows, you double-click an icon, 2324 01:50:16,290 --> 01:50:18,540 that program is loaded into memory I said last week. 2325 01:50:18,540 --> 01:50:22,920 It's literally loaded into the top of your computer's memory conceptually. 2326 01:50:22,920 --> 01:50:23,610 What else? 2327 01:50:23,610 --> 01:50:29,460 Well the heap is the fancy name given to the chunk of memory in which memory 2328 01:50:29,460 --> 01:50:31,140 is coming from when you call malloc. 2329 01:50:31,140 --> 01:50:34,740 So when I called malloc earlier to get a bunch of space for some characters, 2330 01:50:34,740 --> 01:50:37,800 it was just coming from this big open area called the heap. 2331 01:50:37,800 --> 01:50:41,220 And that's what get_string is using and other functions as well. 2332 01:50:41,220 --> 01:50:44,940 Well it turns out that the reason for the problem we just ran into 2333 01:50:44,940 --> 01:50:48,400 is because the bottom part of memory is what's called the stack. 2334 01:50:48,400 --> 01:50:52,650 The stack is the area of memory that functions use when they are called. 2335 01:50:52,650 --> 01:50:57,640 And this is actually relevant to that very simple noswap example as follows. 2336 01:50:57,640 --> 01:51:01,800 If we now assume that anytime you call a function, the memory it uses 2337 01:51:01,800 --> 01:51:04,700 comes from the bottom of that big block of memory, 2338 01:51:04,700 --> 01:51:07,500 we can draw that, for instance, here on the screen, 2339 01:51:07,500 --> 01:51:10,560 because it turns out that anytime you call a function, that function gets 2340 01:51:10,560 --> 01:51:12,060 a slice of its own memory. 2341 01:51:12,060 --> 01:51:15,660 So for instance, main is always the first function a program calls, 2342 01:51:15,660 --> 01:51:20,910 and so it gets the first slice of memory at the bottom of the screen here.
2343 01:51:20,910 --> 01:51:25,030 And so if main had two variables x and y, that's like saying, 2344 01:51:25,030 --> 01:51:29,770 OK, give me a chunk of memory called x and put the value 1 in it; 2345 01:51:29,770 --> 01:51:33,690 give me another chunk of memory, call it y, put a value in it here. 2346 01:51:33,690 --> 01:51:38,850 But remember, from the first noswap example, the swap function was called. 2347 01:51:38,850 --> 01:51:40,430 This is a stack in the literal sense. 2348 01:51:40,430 --> 01:51:44,140 You go into a dining hall, a cafeteria, one tray for food, goes on another, 2349 01:51:44,140 --> 01:51:46,770 goes on another, goes on another so that the humans can take it 2350 01:51:46,770 --> 01:51:48,240 and put food and plates on it. 2351 01:51:48,240 --> 01:51:51,270 Well similarly in this model, when you call a function, 2352 01:51:51,270 --> 01:51:55,200 it gets its own slice of memory, but literally above, conceptually, 2353 01:51:55,200 --> 01:51:58,230 the existing frame on the stack. 2354 01:51:58,230 --> 01:52:01,620 So this is the swap function's own chunk of memory, 2355 01:52:01,620 --> 01:52:03,650 and it, too, gets some space. 2356 01:52:03,650 --> 01:52:06,030 It gets some space for a variable called a. 2357 01:52:06,030 --> 01:52:08,670 It gets some space for a variable called b. 2358 01:52:08,670 --> 01:52:11,700 And guess what goes inside those of that first example? 2359 01:52:11,700 --> 01:52:15,210 A copy of x and a copy of y. 2360 01:52:15,210 --> 01:52:15,960 And you know what? 2361 01:52:15,960 --> 01:52:19,140 It had a temp variable, so that's got to have some space here. 2362 01:52:19,140 --> 01:52:20,940 So I'll call this tmp. 2363 01:52:20,940 --> 01:52:24,720 And recall that I set tmp equal to a, so that got 1. 2364 01:52:24,720 --> 01:52:25,750 And then what happened? 2365 01:52:25,750 --> 01:52:27,970 Well then I did what-- 2366 01:52:27,970 --> 01:52:30,730 what did I? 
2367 01:52:30,730 --> 01:52:33,630 Let me get this right. 2368 01:52:33,630 --> 01:52:36,180 We had a gets b. 2369 01:52:36,180 --> 01:52:38,920 So what happened there? 2370 01:52:38,920 --> 01:52:43,440 So in this example here, a gets the value b, so that changed. 2371 01:52:43,440 --> 01:52:46,770 And then what happens here, b got the value of tmp, so that changed. 2372 01:52:46,770 --> 01:52:49,890 So swap was working in the sense that it was swapping values, 2373 01:52:49,890 --> 01:52:53,280 but the problem is, when a function returns, this chunk of memory that it 2374 01:52:53,280 --> 01:52:58,900 was previously using gets reclaimed so that someone else can now use it, 2375 01:52:58,900 --> 01:52:59,710 another function. 2376 01:52:59,710 --> 01:53:03,000 So we did all that hard work in noswap, and we did it correctly, 2377 01:53:03,000 --> 01:53:05,980 we just did it in the wrong place. 2378 01:53:05,980 --> 01:53:11,210 So by contrast, this next example that we did, which was swap.c, 2379 01:53:11,210 --> 01:53:13,260 just treated the memory a little bit differently. 2380 01:53:13,260 --> 01:53:18,210 Main this time still had two variables, one called x, and this was a 1, 2381 01:53:18,210 --> 01:53:21,300 and then another one called y, and this was a 2. 2382 01:53:21,300 --> 01:53:23,760 And then when swap was called this time, it again 2383 01:53:23,760 --> 01:53:26,520 had a variable called a and a variable called 2384 01:53:26,520 --> 01:53:29,790 b, but what was stored in a and b? 2385 01:53:29,790 --> 01:53:30,920 Well now they're addresses. 2386 01:53:30,920 --> 01:53:34,390 And I don't know what it is, but let me just arbitrarily say that this 2387 01:53:34,390 --> 01:53:37,240 is location 100, this is location-- 2388 01:53:37,240 --> 01:53:39,280 let's say 104. 2389 01:53:39,280 --> 01:53:41,740 But it could be anything, we just don't care at this point, 2390 01:53:41,740 --> 01:53:44,590 it would have 0x technically if the computer were showing us.
2391 01:53:44,590 --> 01:53:49,750 What's going in a here is 100, what's going in b here is 104. 2392 01:53:49,750 --> 01:53:54,040 And those are the addresses of x and y, and the code 2393 01:53:54,040 --> 01:53:56,410 we had using all of those new stars was saying, 2394 01:53:56,410 --> 01:54:04,030 go to address 100 and store whatever is at address 100 in tmp. 2395 01:54:04,030 --> 01:54:07,450 Then go to the address that's in b, or 104, 2396 01:54:07,450 --> 01:54:12,850 and store whatever is there at the location in a. 2397 01:54:12,850 --> 01:54:15,430 Then it was saying, go get that tmp value, by the way, 2398 01:54:15,430 --> 01:54:20,500 and go ahead and put that here, so that now we did 2399 01:54:20,500 --> 01:54:23,300 different work in a different place. 2400 01:54:23,300 --> 01:54:25,870 So now when swap is done running, it doesn't 2401 01:54:25,870 --> 01:54:31,520 matter if its memory disappears because it has now mutated or changed 2402 01:54:31,520 --> 01:54:32,260 the other memory 2403 01:54:32,260 --> 01:54:35,560 that it was passed, just like Kelly changed or mutated the cups 2404 01:54:35,560 --> 01:54:39,680 I actually pointed her at rather than copies thereof. 2405 01:54:39,680 --> 01:54:43,300 Now as an aside, there's other chunks of memory that are actually used. 2406 01:54:43,300 --> 01:54:45,760 If you have global variables in a program, 2407 01:54:45,760 --> 01:54:48,040 turns out that in between the text and the heap 2408 01:54:48,040 --> 01:54:51,310 memory are your global variables, whether they're initialized with values 2409 01:54:51,310 --> 01:54:54,740 or they're not initialized with values, as would happen with the equal sign, 2410 01:54:54,740 --> 01:54:56,710 but we don't care too much about that for today's purposes.
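The two swap walkthroughs from a moment ago can be put side by side in code. This is only a sketch reconstructed from the spoken description: the function names noswap and swap come from the lecture, but the exact bodies are an assumption, and the addresses 100 and 104 were illustrative, so nothing here depends on them.

```c
#include <assert.h>
#include <stddef.h>

// Receives copies of the caller's values. a, b, and tmp all live in this
// function's own stack frame, so the exchange is lost on return.
void noswap(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

// Receives the addresses of the caller's variables and exchanges the
// values stored at those addresses.
void swap(int *a, int *b)
{
    assert(a != NULL && b != NULL); // precondition: both must point at ints

    int tmp = *a; // go to the address in a and remember what is there
    *a = *b;      // copy the value at the address in b to the address in a
    *b = tmp;     // put the remembered value at the address in b
}
```

With x set to 1 and y set to 2, calling noswap(x, y) leaves both unchanged, while calling swap(&x, &y) leaves x holding 2 and y holding 1, because only the second version writes into main's own memory.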
2411 01:54:56,710 --> 01:54:58,510 And if you've ever heard of environment variables, which 2412 01:54:58,510 --> 01:55:01,030 we will when we get to web programming, they, too, 2413 01:55:01,030 --> 01:55:02,950 are stored elsewhere in memory. 2414 01:55:02,950 --> 01:55:04,780 But the most interesting chunks of memory 2415 01:55:04,780 --> 01:55:07,910 are stack and heap, as in this case here. 2416 01:55:07,910 --> 01:55:10,560 But unfortunately it's so easy for things to go awry-- 2417 01:55:10,560 --> 01:55:13,060 I mean, some of you experienced segmentation faults already, 2418 01:55:13,060 --> 01:55:15,200 and let's consider why that might happen. 2419 01:55:15,200 --> 01:55:18,940 So here's a contrived example of code that is by design buggy, 2420 01:55:18,940 --> 01:55:21,820 but let's just talk it through in English what these lines are doing. 2421 01:55:21,820 --> 01:55:25,390 This line here, int *x, is saying, hey, computer, 2422 01:55:25,390 --> 01:55:31,140 give me a variable that will store the address of an integer. 2423 01:55:31,140 --> 01:55:34,570 So give me a pointer to an int is the more casual way of saying it. 2424 01:55:34,570 --> 01:55:37,940 Hey computer, give me another variable that's 2425 01:55:37,940 --> 01:55:40,250 going to store the address of an int and call it y. 2426 01:55:40,250 --> 01:55:42,170 So x and y, that's it. 2427 01:55:42,170 --> 01:55:44,000 This line is new-ish. 2428 01:55:44,000 --> 01:55:48,510 Hey computer, allocate enough space that will fit an int. 2429 01:55:48,510 --> 01:55:51,560 So sizeof(int) is the new syntax we saw earlier for just figuring out 2430 01:55:51,560 --> 01:55:52,640 how many bytes is an int. 2431 01:55:52,640 --> 01:55:56,550 Odds are this is going to come back as 4, or 32 bits, on most computers. 2432 01:55:56,550 --> 01:55:59,520 So this just says, hey computer, give me 4 bytes of memory 2433 01:55:59,520 --> 01:56:02,750 and store that in this location.
2434 01:56:02,750 --> 01:56:06,140 Or rather, store that address in this variable. 2435 01:56:06,140 --> 01:56:09,650 So maybe it's going to say, OK, here's four bytes at location 100, 2436 01:56:09,650 --> 01:56:11,390 or here's four bytes at location 900. 2437 01:56:11,390 --> 01:56:15,670 Or wherever, we don't care, we're just remembering that address in x. 2438 01:56:15,670 --> 01:56:18,460 *x says, go to that address-- 2439 01:56:18,460 --> 01:56:22,430 100 or 900, whatever it is, put the number 42 there. 2440 01:56:22,430 --> 01:56:26,760 This next line says, go to the address in y and put the unlucky number-- hint, 2441 01:56:26,760 --> 01:56:27,320 hint-- 2442 01:56:27,320 --> 01:56:30,530 13 there. 2443 01:56:30,530 --> 01:56:32,220 Well what is the address in y? 2444 01:56:35,820 --> 01:56:36,950 I haven't allocated it yet. 2445 01:56:36,950 --> 01:56:38,540 What's the address in x? 2446 01:56:38,540 --> 01:56:41,420 It's wherever malloc told me to use space. 2447 01:56:41,420 --> 01:56:44,450 That's safe, that was like 100, 900, whatever the value was, 2448 01:56:44,450 --> 01:56:46,550 but did I allocate space for y? 2449 01:56:46,550 --> 01:56:49,570 So what kind of value does it contain, so to speak? 2450 01:56:49,570 --> 01:56:50,660 A garbage value. 2451 01:56:50,660 --> 01:56:53,640 Maybe it's 0, maybe it's like 32,000-- we don't know, 2452 01:56:53,640 --> 01:56:55,850 because if you don't specify the value, it 2453 01:56:55,850 --> 01:56:59,280 is not safe to trust it or do anything with it. 2454 01:56:59,280 --> 01:57:02,510 This is going to give me probably one of those segmentation faults. 2455 01:57:02,510 --> 01:57:04,550 And indeed, if I run a program like this, 2456 01:57:04,550 --> 01:57:08,510 I'm quite likely going to see exactly that kind of problem.
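The lines just talked through might look roughly like the sketch below. The slide itself is not in the transcript, so the layout is an assumption. The unsafe write through y is left commented out so the sketch runs, and it is followed by one possible fix, which is to point y at the same int as x before writing. The function name pointer_demo is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical wrapper around the lines discussed above.
// Returns the value stored in the allocated int, or -1 if malloc fails.
int pointer_demo(void)
{
    int *x; // will store the address of an int
    int *y; // also a pointer to an int, but it holds a garbage value for now

    x = malloc(sizeof(int)); // ask for enough bytes to fit one int
    if (x == NULL)
    {
        return -1;
    }

    *x = 42; // safe: go to the address malloc handed back and store 42

    // *y = 13; // unsafe: y was never given an address, so this would write
    //          // to an unknown location (undefined behavior, often a segfault)

    y = x;   // one possible fix: make y point at the same int as x
    *y = 13; // now safe: this overwrites the 42

    assert(*x == *y); // both pointers refer to the same int

    int result = *x;
    free(x); // hand the memory back once done with it
    return result;
}
```

Because x and y end up holding the same address, the value read back through x is 13 rather than 42.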
2457 01:57:08,510 --> 01:57:10,810 It's perhaps better, though, to see this in a way that 2458 01:57:10,810 --> 01:57:14,040 will paint a more memorable picture, and for that, we thought we'd take-- 2459 01:57:14,040 --> 01:57:16,340 in our 10 minutes remaining, use a few of these minutes 2460 01:57:16,340 --> 01:57:18,460 to take a look at something our friends at Stanford 2461 01:57:18,460 --> 01:57:20,070 put together with a bit of claymation. 2462 01:57:20,070 --> 01:57:22,390 It's about three minutes long, well worth it 2463 01:57:22,390 --> 01:57:24,650 to paint a picture of exactly what goes wrong 2464 01:57:24,650 --> 01:57:27,250 when you don't use memory correctly. 2465 01:57:27,250 --> 01:57:29,130 If you could dim the lights. 2466 01:57:29,130 --> 01:57:29,790 [VIDEO PLAYBACK] 2467 01:57:29,790 --> 01:57:32,640 [MUSIC PLAYING] 2468 01:57:32,640 --> 01:57:33,410 - Hey, Binky. 2469 01:57:33,410 --> 01:57:34,220 Wake up! 2470 01:57:34,220 --> 01:57:36,650 It's time for pointer fun! 2471 01:57:36,650 --> 01:57:37,930 - What's that? 2472 01:57:37,930 --> 01:57:39,080 Learn about pointers? 2473 01:57:39,080 --> 01:57:41,300 Oh goody! 2474 01:57:41,300 --> 01:57:44,320 - Well to get started, I guess we're going to need a couple of pointers. 2475 01:57:44,320 --> 01:57:45,140 - OK. 2476 01:57:45,140 --> 01:57:48,660 This code allocates two pointers which can point to integers. 2477 01:57:48,660 --> 01:57:49,160 - OK. 2478 01:57:49,160 --> 01:57:52,880 Well I see the two pointers, but they don't seem to be pointing to anything. 2479 01:57:52,880 --> 01:57:53,720 - That's right. 2480 01:57:53,720 --> 01:57:55,850 Initially pointers don't point to anything. 2481 01:57:55,850 --> 01:57:58,100 The things they point to are called pointees, 2482 01:57:58,100 --> 01:58:00,050 and setting them up is a separate step. 2483 01:58:00,050 --> 01:58:00,950 - Oh right, right. 2484 01:58:00,950 --> 01:58:01,700 I knew that.
2485 01:58:01,700 --> 01:58:03,470 The pointees are separate. 2486 01:58:03,470 --> 01:58:05,980 So how do you allocate a pointee? 2487 01:58:05,980 --> 01:58:06,680 - OK. 2488 01:58:06,680 --> 01:58:09,740 Well this code allocates a new integer pointee, 2489 01:58:09,740 --> 01:58:12,890 and this part sets x to point to it. 2490 01:58:12,890 --> 01:58:14,060 - Hey, that looks better. 2491 01:58:14,060 --> 01:58:15,510 So make it do something. 2492 01:58:15,510 --> 01:58:16,300 - OK. 2493 01:58:16,300 --> 01:58:21,340 I'll dereference the pointer x to store the number 42 into its pointee. 2494 01:58:21,340 --> 01:58:24,880 For this trick, I'll need my magic wand of dereferencing. 2495 01:58:24,880 --> 01:58:27,930 - Your magic wand of dereferencing? 2496 01:58:27,930 --> 01:58:30,040 That-- that's great. 2497 01:58:30,040 --> 01:58:31,840 - This is what the code looks like. 2498 01:58:31,840 --> 01:58:33,590 I'll just set up the number and-- 2499 01:58:33,590 --> 01:58:34,770 [POP] 2500 01:58:34,770 --> 01:58:35,440 - Hey look! 2501 01:58:35,440 --> 01:58:36,880 There it goes. 2502 01:58:36,880 --> 01:58:41,740 So doing a dereference on x follows the arrow to access its pointee. 2503 01:58:41,740 --> 01:58:43,900 In this case, to store 42 in there. 2504 01:58:43,900 --> 01:58:48,520 Hey, try using it to store the number 13 through the other pointer, y. 2505 01:58:48,520 --> 01:58:49,510 - OK. 2506 01:58:49,510 --> 01:58:54,010 I'll just go over here to y and get the number 13 set up, 2507 01:58:54,010 --> 01:58:58,020 and then take the wand of dereferencing and just-- 2508 01:58:58,020 --> 01:58:59,560 [BUZZING] whoa! 2509 01:58:59,560 --> 01:59:01,780 - Oh hey, that didn't work. 2510 01:59:01,780 --> 01:59:05,420 Say, Binky, I don't think dereferencing y is a good idea, 2511 01:59:05,420 --> 01:59:08,690 'cause setting up the pointee is a separate step 2512 01:59:08,690 --> 01:59:10,490 and I don't think we ever did it.
2513 01:59:10,490 --> 01:59:12,230 - Mmm, good point. 2514 01:59:12,230 --> 01:59:12,730 - Yeah. 2515 01:59:12,730 --> 01:59:17,120 We allocated the pointer y, but we never set it to point to a pointee. 2516 01:59:17,120 --> 01:59:19,490 - Mmm, very observant. 2517 01:59:19,490 --> 01:59:21,160 - Hey, you're looking good there, Binky. 2518 01:59:21,160 --> 01:59:24,200 Can you fix it so that y points to the same pointee as x? 2519 01:59:24,200 --> 01:59:24,700 - Sure. 2520 01:59:24,700 --> 01:59:27,520 I'll use my magic wand of pointer assignment. 2521 01:59:27,520 --> 01:59:29,810 - Is that going to be a problem like before? 2522 01:59:29,810 --> 01:59:31,540 - No, this doesn't touch the pointees. 2523 01:59:31,540 --> 01:59:35,110 It just changes one pointer to point to the same thing as another. 2524 01:59:35,110 --> 01:59:36,190 - Oh, I see. 2525 01:59:36,190 --> 01:59:38,770 Now y points to the same place as x. 2526 01:59:38,770 --> 01:59:40,790 So wait, now y is fixed. 2527 01:59:40,790 --> 01:59:41,860 It has a pointee. 2528 01:59:41,860 --> 01:59:46,360 So you can try the wand of dereferencing again to send the 13 over. 2529 01:59:46,360 --> 01:59:47,050 - OK. 2530 01:59:47,050 --> 01:59:48,830 Here goes. 2531 01:59:48,830 --> 01:59:50,080 - Hey, look at that. 2532 01:59:50,080 --> 01:59:51,790 Now dereferencing works on y. 2533 01:59:51,790 --> 01:59:55,920 And because the pointers are sharing that one pointee, they both see the 13. 2534 01:59:55,920 --> 01:59:57,610 - Yeah, sharing, whatever. 2535 01:59:57,610 --> 01:59:59,520 So we going to switch places now? 2536 01:59:59,520 --> 02:00:01,490 - Oh look, we're out of time. 2537 02:00:01,490 --> 02:00:02,060 - But-- 2538 02:00:02,060 --> 02:00:02,740 [END PLAYBACK] 2539 02:00:02,740 --> 02:00:03,700 DAVID MALAN: All right. 
2540 02:00:03,700 --> 02:00:07,780 So hopefully that puts a little more visual behind some of these ideas, 2541 02:00:07,780 --> 02:00:12,190 but let's now contextualize this in a domain that's perhaps 2542 02:00:12,190 --> 02:00:13,940 more familiar in a couple of ways. 2543 02:00:13,940 --> 02:00:16,030 So one, some of you might already know, especially 2544 02:00:16,030 --> 02:00:18,370 if you've had prior programming experience, of a very popular website 2545 02:00:18,370 --> 02:00:20,320 called Stack Overflow where lots of programmers 2546 02:00:20,320 --> 02:00:24,350 post questions and hopefully answers to common technical problems. 2547 02:00:24,350 --> 02:00:26,650 If you ever wondered why it's called Stack Overflow, 2548 02:00:26,650 --> 02:00:29,290 it turns out it reduces to this picture here. 2549 02:00:29,290 --> 02:00:33,150 This was not a mistake that I drew one arrow from the heap pointing down, 2550 02:00:33,150 --> 02:00:34,960 and one arrow from the stack growing up. 2551 02:00:34,960 --> 02:00:38,160 As you malloc, malloc, malloc more and more space, 2552 02:00:38,160 --> 02:00:41,020 starts up here, so to speak, and you just get more and more space 2553 02:00:41,020 --> 02:00:42,780 that's going this direction. 2554 02:00:42,780 --> 02:00:45,160 But the more functions you call-- function after function 2555 02:00:45,160 --> 02:00:47,030 after function after a function, each of them 2556 02:00:47,030 --> 02:00:50,380 gets its own slice or frame of memory, that, too, is growing up. 2557 02:00:50,380 --> 02:00:54,280 So this feels like a pretty bad design, but honestly, it's not really avoidable 2558 02:00:54,280 --> 02:00:56,200 because if you have a finite amount of memory, 2559 02:00:56,200 --> 02:00:58,510 you can't avoid each other forever. 2560 02:00:58,510 --> 02:01:03,010 And so there's this fundamental risk of overflowing the stack, 2561 02:01:03,010 --> 02:01:06,110 or even overflowing the heap in the reverse direction. 
2562 02:01:06,110 --> 02:01:09,520 So Stack Overflow is an allusion to, for instance, calling too 2563 02:01:09,520 --> 02:01:12,190 many-- many, many, many, many, many, many, many, many functions, 2564 02:01:12,190 --> 02:01:15,760 so many so that it overlaps other chunks or segments of memory, 2565 02:01:15,760 --> 02:01:19,800 thereby inducing a segmentation fault, and heap overflow 2566 02:01:19,800 --> 02:01:21,760 is in the reverse direction, and these are more 2567 02:01:21,760 --> 02:01:26,620 generally known as buffer overflows, and we'll see more of these in the weeks 2568 02:01:26,620 --> 02:01:27,340 to come. 2569 02:01:27,340 --> 02:01:29,710 But now that we have the ability to discuss pointers, 2570 02:01:29,710 --> 02:01:33,490 let's introduce one final feature and then a familiar face. 2571 02:01:33,490 --> 02:01:38,440 So it turns out that you can actually come up with your own custom variables 2572 02:01:38,440 --> 02:01:42,370 kind of like we did with string, but even more sophisticated than that. 2573 02:01:42,370 --> 02:01:46,120 For instance, if I wanted to implement a program that 2574 02:01:46,120 --> 02:01:49,480 involves multiple students, I might do something like this. 2575 02:01:49,480 --> 02:01:52,810 Ask the user what is the enrollment in a class, then go ahead 2576 02:01:52,810 --> 02:01:55,130 and give myself an array of strings, a.k.a. 2577 02:01:55,130 --> 02:01:59,650 char*s today of that size, and then I could also have another array of dorms. 2578 02:01:59,650 --> 02:02:03,240 And I could have two arrays, one for the students' names, 2579 02:02:03,240 --> 02:02:05,990 one for the students' dorms, and I can keep track of other things.
2580 02:02:05,990 --> 02:02:08,900 Another array for emails, another array for phone numbers-- 2581 02:02:08,900 --> 02:02:11,230 but this gets messy quickly, because you can imagine, 2582 02:02:11,230 --> 02:02:15,430 if I need names and dorms and emails and phones, 2583 02:02:15,430 --> 02:02:17,710 that starts to become a lot of copy-paste. 2584 02:02:17,710 --> 02:02:20,800 And I just have this design where I have lots and lots of arrays 2585 02:02:20,800 --> 02:02:24,160 where each bracket location-- like bracket 0, bracket 1 2586 02:02:24,160 --> 02:02:28,420 presumably refers to the same student across all of these arrays, like mmm! 2587 02:02:28,420 --> 02:02:30,370 Messy, messy, messy design. 2588 02:02:30,370 --> 02:02:32,350 So with a wave of my hand, let me actually 2589 02:02:32,350 --> 02:02:36,040 fix that immediate problem out of the gate by introducing a new feature. 2590 02:02:36,040 --> 02:02:38,170 I can invent my own data types. 2591 02:02:38,170 --> 02:02:40,240 Let me just go ahead and declare an array 2592 02:02:40,240 --> 02:02:46,480 called students with this many students, but of data type student. 2593 02:02:46,480 --> 02:02:51,400 C comes with float, bool, char, int, not string, and definitely not student. 2594 02:02:51,400 --> 02:02:54,020 So you can make your own custom data types, 2595 02:02:54,020 --> 02:02:57,160 and you can put them in your own header files, which we've not done either. 2596 02:02:57,160 --> 02:03:00,870 But I can look, and you'll see more of this in the next problem set. 2597 02:03:00,870 --> 02:03:02,620 So not to worry if this feels quite brief, 2598 02:03:02,620 --> 02:03:04,800 it's just meant to be a teaser here. 2599 02:03:04,800 --> 02:03:09,580 And in struct.h is how you declare or define your own type. 2600 02:03:09,580 --> 02:03:13,690 The keyword is literally typedef struct for structure, or data structure 2601 02:03:13,690 --> 02:03:14,740 to be more complete.
2602 02:03:14,740 --> 02:03:18,430 The name of the data structure comes at the end after some curly braces. 2603 02:03:18,430 --> 02:03:20,620 And then inside the curly braces you just specify, 2604 02:03:20,620 --> 02:03:22,280 well what do you want a student to have? 2605 02:03:22,280 --> 02:03:25,690 I want them to have a name, a dorm, maybe a phone number, maybe 2606 02:03:25,690 --> 02:03:27,140 an email address, anything I want. 2607 02:03:27,140 --> 02:03:28,510 I can just add here. 2608 02:03:28,510 --> 02:03:34,690 So that now in my actual code, I can have an array of actual students, 2609 02:03:34,690 --> 02:03:37,960 and I can just access them with this new notation like this. 2610 02:03:37,960 --> 02:03:40,990 You know that you can index into an array with bracket notation. 2611 02:03:40,990 --> 02:03:45,070 What you didn't know until now, perhaps, is that if at that location 2612 02:03:45,070 --> 02:03:46,900 is a structure, a.k.a. 2613 02:03:46,900 --> 02:03:51,100 struct, you can get at the name, the dorm, or the phone, or the email, 2614 02:03:51,100 --> 02:03:54,160 or anything else there just by using a dot-notation, which is 2615 02:03:54,160 --> 02:03:56,320 our last piece of new syntax for today. 2616 02:03:56,320 --> 02:03:58,040 Everything else is the same. 2617 02:03:58,040 --> 02:04:01,450 I can write a program that says so and so is in such and such a dorm 2618 02:04:01,450 --> 02:04:05,410 by just saying get the i-th student's name and the i-th student's dorm. 2619 02:04:05,410 --> 02:04:09,190 And I can be even fancier, and if I don't want to just print those values, 2620 02:04:09,190 --> 02:04:13,060 I can even, now that I understand pointers-- 2621 02:04:13,060 --> 02:04:15,340 or I've seen pointers and we'll soon understand them 2622 02:04:15,340 --> 02:04:19,090 by way of problem sets and practice, I can actually do this.
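Before the file-writing preview that comes next, the student type and the dot notation described above might be sketched like this. The field names name and dorm come from the lecture. The actual struct.h is not shown in the transcript, so this definition is an assumption, as is the use of const char * in place of CS50's string. The helper describe is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

// A custom data type: the fields go inside the curly braces, and the
// name of the new type comes at the end, after them.
typedef struct
{
    const char *name;
    const char *dorm;
}
student;

// Hypothetical helper: formats "NAME is in DORM." for the i-th student
// into buf and returns the number of characters snprintf wanted to write.
int describe(const student students[], int i, char *buf, size_t size)
{
    assert(students != NULL && buf != NULL);

    // Bracket notation picks the i-th struct; dot notation picks a field.
    return snprintf(buf, size, "%s is in %s.",
                    students[i].name, students[i].dorm);
}
```

Keeping each student's fields in one struct means a single array of students replaces the parallel arrays of names, dorms, emails, and phones, so index i always refers to exactly one student.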
2623 02:04:19,090 --> 02:04:21,760 This is just a little sneak preview of a line of code 2624 02:04:21,760 --> 02:04:23,920 that uses a new function called fopen. 2625 02:04:23,920 --> 02:04:27,130 fopen is file open, and it takes in the name of the file to open. 2626 02:04:27,130 --> 02:04:29,800 You might know of CSV files, they're like simple spreadsheets, 2627 02:04:29,800 --> 02:04:31,660 comma separated values. 2628 02:04:31,660 --> 02:04:33,610 And quote-unquote "w" means write. 2629 02:04:33,610 --> 02:04:37,150 So this says open the file called students.csv in write mode, 2630 02:04:37,150 --> 02:04:38,320 so I can write to this file. 2631 02:04:38,320 --> 02:04:40,840 Because in this example, as you'll see in the days to come, 2632 02:04:40,840 --> 02:04:42,800 I want to write out to a file. 2633 02:04:42,800 --> 02:04:45,940 But it turns out to use files, I need to know what a pointer is, 2634 02:04:45,940 --> 02:04:47,740 and it's a little weird that it's all caps, 2635 02:04:47,740 --> 02:04:51,770 but there is a data type in C called FILE, and we use it via a pointer. 2636 02:04:51,770 --> 02:04:54,650 So long story short, what you're going to see in the next problem set 2637 02:04:54,650 --> 02:04:56,950 as we explore the world of forensics is the ability 2638 02:04:56,950 --> 02:05:00,260 using pointers and a few new functions to open files and get back 2639 02:05:00,260 --> 02:05:04,490 the address of that file in memory so that you can go to that address, 2640 02:05:04,490 --> 02:05:07,430 change the contents of a file, and save it back out.
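The fopen preview could be fleshed out along these lines. Only the fopen call with "w" mode and the students.csv file name are described in the lecture. The helper save_students, the one-row-per-student layout, and the student definition repeated here to keep the sketch self-contained are all assumptions.

```c
#include <assert.h>
#include <stdio.h>

typedef struct
{
    const char *name;
    const char *dorm;
}
student;

// Hypothetical helper: writes one "name,dorm" row per student to path.
// Returns 0 on success and 1 if the file could not be opened or closed.
int save_students(const char *path, const student students[], int count)
{
    assert(path != NULL && students != NULL);

    // fopen hands back a FILE pointer, or NULL if the file can't be opened.
    FILE *file = fopen(path, "w"); // "w" means write mode
    if (file == NULL)
    {
        return 1;
    }

    for (int i = 0; i < count; i++)
    {
        fprintf(file, "%s,%s\n", students[i].name, students[i].dorm);
    }

    // fclose flushes anything still buffered and releases the file.
    return fclose(file) == 0 ? 0 : 1;
}
```

Calling save_students("students.csv", students, count) would then produce a file that a spreadsheet program can open. Checking the pointer against NULL before using it matters for the same reason as in the malloc discussion: going to an address that was never handed back is not safe.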
2641 02:05:07,430 --> 02:05:10,640 All of us take for granted these days that you can go to File, Open and File, 2642 02:05:10,640 --> 02:05:13,270 Save, but what's actually happening, pointers are involved, 2643 02:05:13,270 --> 02:05:15,440 stuff's getting loaded into memory, and the computer 2644 02:05:15,440 --> 02:05:17,960 is dereferencing or going to those addresses 2645 02:05:17,960 --> 02:05:20,720 and changing what's at those locations in memory. 2646 02:05:20,720 --> 02:05:22,140 Now why might you want to do this? 2647 02:05:22,140 --> 02:05:23,600 Well here, of course, is Zamila-- you might 2648 02:05:23,600 --> 02:05:26,060 recall from some of the problem sets and the walkthroughs. 2649 02:05:26,060 --> 02:05:30,320 Turns out we could try to enhance this picture of her by zooming in, 2650 02:05:30,320 --> 02:05:33,660 and here's about as much fidelity as there is in her eyes. 2651 02:05:33,660 --> 02:05:38,090 Like I do not see the glint of any criminal's logo 2652 02:05:38,090 --> 02:05:40,730 on his or her jacket in the glint of Zamila's eyes. 2653 02:05:40,730 --> 02:05:43,580 If you zoom in on an image, and an image, recall, from week 0 2654 02:05:43,580 --> 02:05:47,580 is just a grid of pixels or dots, that's all you get. 2655 02:05:47,580 --> 02:05:50,690 And you can maybe smooth it out a little bit or clean up the colors, 2656 02:05:50,690 --> 02:05:53,390 but you can't just "enhance," quote-unquote, 2657 02:05:53,390 --> 02:05:55,760 and see more of the glint in Zamila's eye, 2658 02:05:55,760 --> 02:05:59,600 because an image at the end of the day is just a bitmap, a map-- 2659 02:05:59,600 --> 02:06:01,670 top-down, left-right-- of pixels. 2660 02:06:01,670 --> 02:06:03,140 For instance, here's a smiley face. 2661 02:06:03,140 --> 02:06:06,650 If you kind of take a look back, you can kind of see a black smiley 2662 02:06:06,650 --> 02:06:08,390 face against a white backdrop.
2663 02:06:08,390 --> 02:06:11,390 And if we just decide as humans, let's represent white dots 2664 02:06:11,390 --> 02:06:15,350 with 1's and black dots with 0's, this might be what's in the file, 2665 02:06:15,350 --> 02:06:16,790 this is what the human sees. 2666 02:06:16,790 --> 02:06:21,140 So if we have the ability to open that from a file, store it in memory, 2667 02:06:21,140 --> 02:06:24,420 and then using pointers go to those locations in memory, 2668 02:06:24,420 --> 02:06:27,800 we can even change the smiley face to an unhappy face, for instance, or color it 2669 02:06:27,800 --> 02:06:29,990 or do any number of things to it. 2670 02:06:29,990 --> 02:06:32,660 Now at quick glance, there's a lot going on in files, 2671 02:06:32,660 --> 02:06:37,010 because what a file is is a set of conventions that humans decided 2672 02:06:37,010 --> 02:06:40,130 on where humans years ago just decided in a bitmap file, 2673 02:06:40,130 --> 02:06:44,960 BMP file-- so an older but still popular file format for images, humans 2674 02:06:44,960 --> 02:06:48,170 just decided that, like, we're going to put a bunch of special values 2675 02:06:48,170 --> 02:06:50,150 at the first bytes of the file, then some more 2676 02:06:50,150 --> 02:06:55,990 special values, then the actual RGB pixels in the rest of the file. 2677 02:06:55,990 --> 02:06:58,040 So this is meant to look cryptic at first glance, 2678 02:06:58,040 --> 02:07:00,530 and the next homework assignment will walk you through this, 2679 02:07:00,530 --> 02:07:04,340 but all it is is a convention of what the 0's and 1's mean 2680 02:07:04,340 --> 02:07:05,780 in these different locations. 2681 02:07:05,780 --> 02:07:08,460 And indeed, the challenge ahead is going to be to do a number of things. 2682 02:07:08,460 --> 02:07:10,190 One is to first and foremost figure out-- 2683 02:07:10,190 --> 02:07:10,990 who done it?
2684 02:07:10,990 --> 02:07:14,360 A sort of murder mystery in which there's a clue hidden in an image, 2685 02:07:14,360 --> 02:07:16,310 but an image that's a little noisy and you're 2686 02:07:16,310 --> 02:07:18,850 going to have to figure out what secret message is in the image 2687 02:07:18,850 --> 02:07:22,490 by loading that image in, tweaking it, putting a sort of red filter 2688 02:07:22,490 --> 02:07:25,940 on top of it and seeing the secret message, but all digitally; two, 2689 02:07:25,940 --> 02:07:29,800 actually resizing images and taking this many pixels in this big 2690 02:07:29,800 --> 02:07:32,130 of a smiley face or something else and making it bigger, 2691 02:07:32,130 --> 02:07:34,400 or if more comfortable, making it even smaller 2692 02:07:34,400 --> 02:07:36,680 and figuring out how to make that work out; 2693 02:07:36,680 --> 02:07:39,980 and then lastly, we've been taking some photographs of all CS50 staff 2694 02:07:39,980 --> 02:07:41,450 in Cambridge and New Haven. 2695 02:07:41,450 --> 02:07:45,440 Unfortunately we accidentally corrupted or lost the memory card, 2696 02:07:45,440 --> 02:07:49,060 but we made a forensic image of it, a copy of all of the 0's and 1's with all 2697 02:07:49,060 --> 02:07:50,900 of the staff photos, and we're going to need 2698 02:07:50,900 --> 02:07:53,880 you to write code that actually recovers all of the JPEGs 2699 02:07:53,880 --> 02:07:57,640 or photographs from that digital card by opening a file, 2700 02:07:57,640 --> 02:08:00,020 reading in those 0's and 1's, understanding what they are 2701 02:08:00,020 --> 02:08:01,720 and where they are, and just writing them 2702 02:08:01,720 --> 02:08:05,210 back out to disk using functions we'll introduce you to in the problem 2703 02:08:05,210 --> 02:08:06,080 set itself. 2704 02:08:06,080 --> 02:08:09,350 But of course, all of this takes for granted that we can do this, 2705 02:08:09,350 --> 02:08:10,590 and you can only do so much.
2706 02:08:10,590 --> 02:08:13,460 And indeed, this week is as much about solving those problems 2707 02:08:13,460 --> 02:08:16,200 as it is realizing the limitations of computers, 2708 02:08:16,200 --> 02:08:19,340 and so we thought we'd end with the final few seconds of this very 2709 02:08:19,340 --> 02:08:21,940 real example from Futurama. 2710 02:08:21,940 --> 02:08:22,720 [VIDEO PLAYBACK] 2711 02:08:22,720 --> 02:08:24,360 - Magnify that death sphere. 2712 02:08:27,000 --> 02:08:28,450 Why is it still blurry? 2713 02:08:28,450 --> 02:08:30,450 - That's all the resolution we have. 2714 02:08:30,450 --> 02:08:32,790 Making it bigger doesn't make it clearer. 2715 02:08:32,790 --> 02:08:34,500 - It does on CSI Miami. 2716 02:08:34,500 --> 02:08:35,170 - Ugh. 2717 02:08:35,170 --> 02:08:35,760 [END PLAYBACK] 2718 02:08:35,760 --> 02:08:38,400 DAVID MALAN: And that's it for CS50, we'll see you next time. 2719 02:08:38,400 --> 02:08:39,790 [APPLAUSE]