1 00:00:00,506 --> 00:00:10,776 [ Silence ] 2 00:00:11,276 --> 00:00:13,816 >> All right, welcome back to CS50. 3 00:00:13,866 --> 00:00:16,486 This is the end of week five and this is one 4 00:00:16,486 --> 00:00:19,316 of my favorite problems that's on the horizon here. 5 00:00:19,316 --> 00:00:22,406 So if some of you have kinda been staring at this, 6 00:00:22,406 --> 00:00:23,996 trying to figure out what it is, 7 00:00:23,996 --> 00:00:25,976 letting your eyes zone in and zone out. 8 00:00:25,976 --> 00:00:28,736 Odds are, you didn't quite decipher it but if you had say 9 00:00:28,736 --> 00:00:31,736 from childhood one of these pieces of plastic that came 10 00:00:31,736 --> 00:00:34,906 for instance with one of these cereal top boxes and you held it 11 00:00:34,906 --> 00:00:37,536 up to this thing, what you would see, as did last year's 12 00:00:37,536 --> 00:00:40,356 students when they implemented this idea in code, was 13 00:00:40,356 --> 00:00:42,016 that it was Professor Plumb [phonetic] in the lounge 14 00:00:42,056 --> 00:00:45,026 with a candlestick with this particular murder mystery. 15 00:00:45,026 --> 00:00:47,446 And so on the horizon for you this year is gonna be a little 16 00:00:47,446 --> 00:00:49,586 someone different and a little some place different 17 00:00:49,586 --> 00:00:52,316 with a little something different but this is a teaser 18 00:00:52,316 --> 00:00:55,076 for problem set 5 whose focus will be forensics 19 00:00:55,076 --> 00:00:58,046 and the recovery of information that's either been accidentally 20 00:00:58,286 --> 00:00:59,626 or deliberately lost. 21 00:00:59,766 --> 00:01:04,166 I went through my inbox to retrieve an email from one 22 00:01:04,166 --> 00:01:06,616 of your predecessors, I'll keep it anonymous 23 00:01:06,616 --> 00:01:09,426 but this was really quite fun to read since he sent this 24 00:01:09,426 --> 00:01:12,966 to us a few months after CS50 ended a year ago.
25 00:01:13,226 --> 00:01:14,796 And he wrote to us, "Yesterday, 26 00:01:14,846 --> 00:01:18,166 my sister accidentally formatted her camera's digital media card 27 00:01:18,446 --> 00:01:20,446 and lost a year's worth of memorable photos, 28 00:01:20,706 --> 00:01:22,666 parenthetically she unfortunately isn't the best 29 00:01:22,666 --> 00:01:23,566 at backing up her data. 30 00:01:24,266 --> 00:01:26,036 This reminded me of problem set 5. 31 00:01:26,036 --> 00:01:27,776 So I thought I would try to run her memory card 32 00:01:27,776 --> 00:01:29,996 through the recover.c program 33 00:01:30,216 --> 00:01:32,106 that I wrote all the way back in October. 34 00:01:32,386 --> 00:01:34,066 So after four hours of figuring out how 35 00:01:34,066 --> 00:01:37,666 to create a raw forensic image from the formatted memory card, 36 00:01:37,886 --> 00:01:42,846 I came across this page and was able to do the following. 37 00:01:43,066 --> 00:01:45,866 After tinkering around with some of the command line arguments, 38 00:01:45,866 --> 00:01:48,656 I managed to create the forensic image, I installed 39 00:01:48,656 --> 00:01:50,566 and configured the CS50 virtual box, 40 00:01:50,566 --> 00:01:53,416 I managed to run the forensic image through my program 41 00:01:53,636 --> 00:01:59,816 and run all-- recover all 1,027 of my sister's photos. 42 00:02:00,006 --> 00:02:02,576 I find it absolutely amazing that I was able to go 43 00:02:02,576 --> 00:02:04,496 from being one of those less comfortable 44 00:02:04,496 --> 00:02:07,526 with no programming knowledge whatsoever to having the ability 45 00:02:07,526 --> 00:02:09,736 to recover data off of a formatted memory card 46 00:02:09,736 --> 00:02:11,886 in an actual real life situation." 47 00:02:11,886 --> 00:02:13,866 So it does in fact happen. 48 00:02:13,966 --> 00:02:16,966 [Applause] So this will be problem set five.
49 00:02:17,636 --> 00:02:20,296 So quick invitation, CS50 lunch 50 00:02:20,296 --> 00:02:24,256 at the usual place cs50.net/rsvp this Friday 51 00:02:24,256 --> 00:02:28,356 at 1:15 p.m. We're gonna do a dinner in future weeks for those 52 00:02:28,356 --> 00:02:30,846 who consistently cannot make Friday, FYI. 53 00:02:31,086 --> 00:02:33,346 And now let's turn our attention back to this library. 54 00:02:33,346 --> 00:02:35,076 So very briefly last time we looked 55 00:02:35,076 --> 00:02:37,826 at this function get string, we've been taking this for-- 56 00:02:37,826 --> 00:02:41,336 taking, taking this for granted for some time now. 57 00:02:41,696 --> 00:02:44,296 And underneath the hood we recall that it's starting to-- 58 00:02:44,296 --> 00:02:45,936 it's actually been using some 59 00:02:45,936 --> 00:02:47,286 of these new fundamentals we looked at, 60 00:02:47,286 --> 00:02:48,926 this notion of memory management, 61 00:02:49,106 --> 00:02:52,556 dynamically allocating memory and just as a quick review. 62 00:02:52,556 --> 00:02:53,736 So the function we introduced 63 00:02:53,736 --> 00:02:55,546 with which you can ask the operating system 64 00:02:55,546 --> 00:02:59,426 for memory is now called, okay, malloc, for memory allocation. 65 00:02:59,636 --> 00:03:01,976 And that memory ends up on the stack or the heap? 66 00:03:03,206 --> 00:03:05,976 So it ends up on the heap and the intuitive explanation 67 00:03:05,976 --> 00:03:09,256 for that is that the stack memory constantly is coming 68 00:03:09,256 --> 00:03:09,706 and going.
69 00:03:09,706 --> 00:03:12,686 It's disappearing every time a function finishes executing 70 00:03:12,966 --> 00:03:15,386 so it stands to reason that a function like get string, 71 00:03:15,416 --> 00:03:17,596 if you want whatever memory it's allocating 72 00:03:17,596 --> 00:03:20,426 to actually survive its return 73 00:03:20,636 --> 00:03:23,016 which you do otherwise what's the point in getting a string, 74 00:03:23,216 --> 00:03:25,056 that memory needs to go on to the heap. 75 00:03:25,056 --> 00:03:27,616 So henceforth and we'll see this more on future P sets 76 00:03:27,906 --> 00:03:30,316 when you actually want memory dynamically, 77 00:03:30,356 --> 00:03:32,846 in other words you don't know how much you need in advance 78 00:03:33,006 --> 00:03:34,286 and that's absolutely the case 79 00:03:34,286 --> 00:03:36,466 when you have no idea what the human's gonna type until he 80 00:03:36,466 --> 00:03:39,766 or she does or if you want to dynamically grow 81 00:03:39,766 --> 00:03:42,506 or shrink something, you're not just gonna declare local 82 00:03:42,506 --> 00:03:45,176 variables anymore, we're gonna start using a function called, 83 00:03:45,176 --> 00:03:45,876 "malloc." 84 00:03:45,876 --> 00:03:48,326 And it comes with an opposite function called "free" 85 00:03:48,556 --> 00:03:50,936 which is a function we'll start to need to call 86 00:03:50,936 --> 00:03:52,546 when you wanna tell the operating system, 87 00:03:52,546 --> 00:03:54,126 I am done with this memory.
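A minimal sketch of the malloc and free pairing just described, in C. The function name copy_text is made up for illustration and is not part of the CS50 library; the point is only that memory obtained from malloc survives the function's return, and that the caller is then responsible for handing it back.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not CS50 library code): returns a copy of `text`
// that lives on the heap, so it still exists after this function returns.
// Returns NULL if malloc cannot provide the memory.
char *copy_text(const char *text)
{
    // +1 leaves room for the terminating '\0'
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
    {
        return NULL;
    }
    strcpy(copy, text);
    return copy;
}
```

A caller would check the result against NULL, use the string, and then call free on it exactly once. Skipping that free is what a memory leak is.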
88 00:03:54,316 --> 00:03:57,256 In fact a little dirty secret of the CS50 library is 89 00:03:57,256 --> 00:04:00,046 that currently it leaks memory whereby, 90 00:04:00,046 --> 00:04:01,806 every time you call get string, 91 00:04:01,806 --> 00:04:05,096 we call that function malloc, asking the OS for memory, 92 00:04:05,096 --> 00:04:07,716 asking the OS for memory but never once probably 93 00:04:07,716 --> 00:04:10,156 for your problem sets and even in lecture have we said 94 00:04:10,156 --> 00:04:12,216 to the operating system, "I'm done with this string, 95 00:04:12,216 --> 00:04:13,776 you can have these bytes back." 96 00:04:14,116 --> 00:04:15,536 And so that's actually a bug. 97 00:04:15,536 --> 00:04:18,546 It's a bug that's been in the CS50 library from the beginning 98 00:04:18,546 --> 00:04:21,286 of this semester but it's meant to simplify the process 99 00:04:21,286 --> 00:04:24,226 so that we can just assume that we're getting a string but now 100 00:04:24,226 --> 00:04:26,216 as we begin to dismantle these training wheels, 101 00:04:26,216 --> 00:04:28,546 anytime we actually ask the OS for memory, 102 00:04:28,816 --> 00:04:30,396 we're going to need to give it back. 103 00:04:30,776 --> 00:04:33,426 And so we're going to stop using the get string function 104 00:04:33,686 --> 00:04:35,816 since now we can appreciate perhaps 105 00:04:36,056 --> 00:04:39,556 that it's not quite doing us a huge number of favors. 106 00:04:39,556 --> 00:04:42,636 Now just as an aside, what do we mean by, what do we care 107 00:04:42,636 --> 00:04:43,756 about memory leaks for?
108 00:04:43,756 --> 00:04:45,426 Well, if you've ever been running your Mac or PC, 109 00:04:45,426 --> 00:04:48,426 your Ubuntu computer, whatever you have and you find 110 00:04:48,426 --> 00:04:52,146 that over time it's getting slower and slower 111 00:04:52,146 --> 00:04:56,286 and you're opening things like Gchat or your browser 112 00:04:56,286 --> 00:04:59,006 or Photoshop or whatever programs you typically run 113 00:04:59,166 --> 00:05:01,186 and you've been doing this a lot, loading things into memory, 114 00:05:01,186 --> 00:05:04,016 quitting, loading, well if any of those programs has one 115 00:05:04,016 --> 00:05:05,366 of these things called a memory leak 116 00:05:05,366 --> 00:05:08,856 where some programmer just screwed up and asked the OS for memory 117 00:05:09,066 --> 00:05:11,656 but didn't necessarily hand it back, what can happen 118 00:05:11,656 --> 00:05:13,866 in real terms is that your computer can slow 119 00:05:13,866 --> 00:05:17,346 down because your OS is gonna think that it's out of RAM. 120 00:05:17,596 --> 00:05:20,486 And so, future programs are going to use something called 121 00:05:20,556 --> 00:05:23,986 "virtual memory" which for today's purposes is slower. 122 00:05:23,986 --> 00:05:24,816 So if you're ever finding 123 00:05:24,816 --> 00:05:27,156 that your computer's inexplicably getting slower 124 00:05:27,156 --> 00:05:30,226 and slower and slower, it might just be frankly that you need 125 00:05:30,226 --> 00:05:33,186 to reboot because there's some buggy program or programs 126 00:05:33,416 --> 00:05:36,866 that haven't quite been playing nicely inside of your laptop. 127 00:05:37,236 --> 00:05:40,086 So let's take a quick look at the CS50 library.
128 00:05:40,126 --> 00:05:44,096 This is cs50.c, we started looking at this last time 129 00:05:44,096 --> 00:05:45,876 and it's okay if you don't understand all 130 00:05:45,876 --> 00:05:48,116 of the intricacies of get string but a function 131 00:05:48,116 --> 00:05:51,546 that is a little easier is something called, "get int." 132 00:05:51,906 --> 00:05:54,426 And it turns out that, get long, get long, long, 133 00:05:54,666 --> 00:05:57,846 they're all pretty much equivalently implemented just 134 00:05:57,846 --> 00:05:58,876 for different data types 135 00:05:58,946 --> 00:06:01,516 and they all conveniently use get string. 136 00:06:01,716 --> 00:06:04,146 So let's assume for the moment that get string works 137 00:06:04,146 --> 00:06:05,856 and it allocates memory for us 138 00:06:05,856 --> 00:06:09,016 and it ultimately hands us back a line of text 139 00:06:09,016 --> 00:06:10,116 that the user has typed in. 140 00:06:10,406 --> 00:06:13,406 So here is the implementation of get int that we've used 141 00:06:13,606 --> 00:06:16,456 as far back as week one and problem set 1 in C. 142 00:06:16,706 --> 00:06:19,166 So I first have kind of a curious feature here. 143 00:06:19,286 --> 00:06:23,046 I am deliberately inducing an infinite loop, while true means, 144 00:06:23,286 --> 00:06:24,236 do this forever. 145 00:06:24,436 --> 00:06:27,796 So hopefully somewhere in this loop I have a break statement 146 00:06:27,796 --> 00:06:30,656 or a return statement, something that's gonna deliberately break 147 00:06:30,656 --> 00:06:31,826 me out of this infinite loop. 148 00:06:32,066 --> 00:06:34,646 So let's see what I'm trying to do potentially infinitely. 149 00:06:34,896 --> 00:06:37,246 So, I have in the inside of this loop, I'm gonna try 150 00:06:37,246 --> 00:06:39,916 to get a line of text, so this is a familiar call. 151 00:06:40,236 --> 00:06:43,226 I'm now checking recall for line equals, equals null.
152 00:06:43,446 --> 00:06:47,166 So as an example, when might get string return this special 153 00:06:47,166 --> 00:06:47,866 keyword null? 154 00:06:47,866 --> 00:06:48,406 [ Inaudible Remark ] 155 00:06:48,406 --> 00:06:52,886 >> Yeah, so if there's nothing typed in and we can simulate 156 00:06:52,886 --> 00:06:54,656 that as we've discussed with control D 157 00:06:54,656 --> 00:06:56,376 which normal humans aren't likely to type, 158 00:06:56,376 --> 00:06:58,646 but there's another scenario that could very well happen 159 00:06:59,166 --> 00:07:00,226 and we'd get back null. 160 00:07:00,226 --> 00:07:00,836 [ Inaudible Remark ] 161 00:07:00,836 --> 00:07:03,186 >> It's too long a string, right. 162 00:07:03,186 --> 00:07:04,976 If they paste in a huge essay 163 00:07:05,166 --> 00:07:08,266 that would fully exhaust the computer's RAM, get string, 164 00:07:08,306 --> 00:07:09,966 inasmuch as it calls malloc, 165 00:07:10,236 --> 00:07:12,986 might end up returning this special pointer value called 166 00:07:12,986 --> 00:07:15,506 null which essentially means, "Something bad happened, 167 00:07:15,506 --> 00:07:16,756 I don't know what necessarily 168 00:07:17,036 --> 00:07:18,556 but I can't give you back a string." 169 00:07:18,796 --> 00:07:21,306 So we have to check for this otherwise we risk later 170 00:07:21,306 --> 00:07:23,036 in our program, those things called seg faults 171 00:07:23,036 --> 00:07:23,946 and core dumps. 172 00:07:23,946 --> 00:07:25,406 Now, why am I returning INT_MAX? 173 00:07:26,196 --> 00:07:28,336 Well, this is just a convention in C.
There's kind 174 00:07:28,336 --> 00:07:30,096 of this problem fundamentally in C, 175 00:07:30,336 --> 00:07:32,306 anytime you're implementing a function that's supposed 176 00:07:32,306 --> 00:07:35,426 to return a number, like an integer 'cause you can return, 177 00:07:35,426 --> 00:07:38,456 you know, upwards of 2 billion positive numbers maybe as big 178 00:07:38,456 --> 00:07:39,916 as negative 2 billion. 179 00:07:40,136 --> 00:07:42,916 But if you wanna return an error, you kinda have 180 00:07:42,916 --> 00:07:46,346 to arbitrarily say, "Okay, zero represents an error." 181 00:07:46,576 --> 00:07:47,896 But then that's problematic 182 00:07:47,896 --> 00:07:50,596 because if you want your function like get int to be able 183 00:07:50,596 --> 00:07:54,116 to legitimately return a zero, if the user typed in zero, well, 184 00:07:54,116 --> 00:07:56,276 you can't use zero as your special error value. 185 00:07:56,496 --> 00:07:58,156 So, you can say, "Alright, let's use negative 1." 186 00:07:58,156 --> 00:08:00,416 If the user-- If the-- something goes wrong, 187 00:08:00,416 --> 00:08:02,436 we'll return negative 1 by default 188 00:08:02,596 --> 00:08:04,506 but that too is problematic, what if the user wants 189 00:08:04,506 --> 00:08:05,596 to type in negative 1? 190 00:08:05,916 --> 00:08:07,426 So the convention that most functions 191 00:08:07,426 --> 00:08:10,326 in C adopt is they don't choose these popular values 192 00:08:10,326 --> 00:08:11,886 like negative 1, 0 and 1. 193 00:08:12,106 --> 00:08:14,146 They'll choose like 2 billion and 1, 194 00:08:14,226 --> 00:08:16,596 something that's crazy large and just shy 195 00:08:16,596 --> 00:08:19,116 of the maximum possible value and we define 196 00:08:19,116 --> 00:08:20,466 that typically with a constant.
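The sentinel convention described above can be sketched in C as follows. The function name parse_or_sentinel is hypothetical, and sscanf appears here only as a convenient stand-in for the parsing step; the part that matters is that 0, -1, and 1 all come back as ordinary results while the constant INT_MAX from limits.h is reserved to mean "error."

```c
#include <limits.h>
#include <stdio.h>

// Hypothetical parser illustrating the sentinel convention: any small int
// the user might plausibly type (0, -1, 1, ...) is a legitimate result, so
// the error value is the rarely needed INT_MAX from <limits.h>.
int parse_or_sentinel(const char *line)
{
    int n;
    if (line == NULL || sscanf(line, "%d", &n) != 1)
    {
        return INT_MAX;   // signals "no usable number"
    }
    return n;
}
```

A caller compares the result against INT_MAX before trusting it, in the same way it would compare a pointer against NULL.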
197 00:08:20,696 --> 00:08:23,546 So if we actually poked around a bunch of dot H files 198 00:08:23,546 --> 00:08:25,086 in the appliance, we would see 199 00:08:25,086 --> 00:08:28,056 that someone had done #define INT_MAX 200 00:08:28,426 --> 00:08:29,996 and it's something like 2 billion. 201 00:08:30,236 --> 00:08:33,246 And so we're kind of deciding, "Ugh, yes, we're sacrificing 202 00:08:33,246 --> 00:08:35,446 that number but it's less likely to be useful 203 00:08:35,446 --> 00:08:37,136 than 0 or negative 1 or 2." 204 00:08:37,136 --> 00:08:39,026 So that's why we're returning INT_MAX just 205 00:08:39,026 --> 00:08:39,986 as a matter of convention. 206 00:08:40,426 --> 00:08:43,146 Now, here is where things get interesting and here's 207 00:08:43,146 --> 00:08:45,096 where the training wheels now come off. 208 00:08:45,126 --> 00:08:47,106 So this final chunk of code, 209 00:08:47,176 --> 00:08:51,066 this big branch here is how get int ultimately works 210 00:08:51,066 --> 00:08:53,476 and it uses a function that you may have seen in readings 211 00:08:53,476 --> 00:08:56,326 or other examples called sscanf, which means, 212 00:08:56,326 --> 00:08:58,496 String Scan Formatted. 213 00:08:58,696 --> 00:09:00,416 So it's kind of like the opposite of printf, 214 00:09:00,576 --> 00:09:04,326 printf obviously prints, scanf reads in from the keyboard. 215 00:09:04,326 --> 00:09:06,106 So if we didn't have the CS50 library, 216 00:09:06,296 --> 00:09:08,746 you're seeing the hoops you would have needed to jump 217 00:09:08,746 --> 00:09:11,196 through in week 1 just to do something stupid 218 00:09:11,196 --> 00:09:12,896 like ask the user for a number. 219 00:09:13,176 --> 00:09:14,306 So how does this actually work?
220 00:09:14,306 --> 00:09:16,946 Well notice first just for temporary variables, 221 00:09:16,946 --> 00:09:19,616 I'm declaring an int N and a character C 222 00:09:19,786 --> 00:09:22,176 and I ultimately wanna put the number that the user types 223 00:09:22,176 --> 00:09:27,366 in into N. C, I just need to check if the user screws up, 224 00:09:27,366 --> 00:09:29,266 I need a character for the following reason. 225 00:09:29,596 --> 00:09:33,226 Here's the magical function, sscanf, takes a few arguments. 226 00:09:33,446 --> 00:09:37,906 The first argument is the line of text that the user typed in. 227 00:09:38,056 --> 00:09:40,886 So look at a string that the user typed in and recall 228 00:09:40,886 --> 00:09:42,686 from above, we called get string 229 00:09:42,906 --> 00:09:44,636 and we called the return value line. 230 00:09:44,636 --> 00:09:45,476 So, that's all this is. 231 00:09:45,476 --> 00:09:48,676 This is literally the array of characters, a.k.a. string, 232 00:09:48,986 --> 00:09:49,966 that the user typed in. 233 00:09:50,166 --> 00:09:53,386 So we're saying to sscanf, "Scan this string and try 234 00:09:53,386 --> 00:09:55,556 to extract an integer for us." 235 00:09:55,556 --> 00:09:56,496 But how do we do that? 236 00:09:56,656 --> 00:10:00,506 Well, much like printf which is used for output, scanf does this 237 00:10:00,506 --> 00:10:03,516 for input, we say in parent-- we say in quote marks, 238 00:10:03,686 --> 00:10:06,526 percent D which is a placeholder saying, 239 00:10:06,836 --> 00:10:11,016 "Try to fill this placeholder with an actual integer 240 00:10:11,096 --> 00:10:13,006 that the user typed in that's inside 241 00:10:13,006 --> 00:10:13,966 of this string called line." 242 00:10:14,956 --> 00:10:17,306 >> We have a white space character to the left 243 00:10:17,486 --> 00:10:19,316 and a white space character in the right-- 244 00:10:19,346 --> 00:10:21,366 to the right, just to be friendly.
245 00:10:21,366 --> 00:10:24,116 So that if the user accidentally hits the spacebar then a number, 246 00:10:24,216 --> 00:10:26,726 that's okay, or they hit a number and then the spacebar, 247 00:10:26,726 --> 00:10:28,656 that's okay, we're gonna forgive that. 248 00:10:28,906 --> 00:10:31,956 But we're not gonna forgive typing any character other 249 00:10:31,956 --> 00:10:33,026 than a space. 250 00:10:33,116 --> 00:10:38,796 If the user types in 42, space F or G or any letter 251 00:10:38,796 --> 00:10:40,646 of the alphabet or symbol on the keyboard, 252 00:10:40,996 --> 00:10:44,076 what's gonna happen is this, sscanf is also being told, 253 00:10:44,116 --> 00:10:47,496 "Alright, put the first number you see in this placeholder 254 00:10:47,826 --> 00:10:51,786 but put the first character you see, alphabetical character, 255 00:10:51,786 --> 00:10:53,336 punctuation character that you see 256 00:10:53,526 --> 00:10:54,826 in what variable apparently?" 257 00:10:56,306 --> 00:11:00,206 In C. So this is just here as a bit of error detection. 258 00:11:00,206 --> 00:11:02,006 We only want to fill N. 259 00:11:02,246 --> 00:11:05,806 But just in case the user messes with us and we also happen 260 00:11:05,806 --> 00:11:07,916 to fill C, we wanna be able to detect 261 00:11:07,916 --> 00:11:11,106 that the user typed not just a number but also a number 262 00:11:11,106 --> 00:11:13,286 and a letter of some sort. 263 00:11:13,526 --> 00:11:17,046 So we have to tell sscanf where to put those variables 264 00:11:17,316 --> 00:11:20,816 and it is not correct certainly to do this 265 00:11:20,816 --> 00:11:23,936 because anytime you pass a variable into a function, 266 00:11:23,936 --> 00:11:25,796 how are you passing it, what's the jargon? 267 00:11:27,126 --> 00:11:28,896 So you're passing it by so called value, 268 00:11:28,896 --> 00:11:30,256 you're passing a copy of it in.
269 00:11:30,456 --> 00:11:35,446 But if you want scanf, sscanf to be able to fill those variables 270 00:11:35,646 --> 00:11:37,966 with some new values, how do you do that? 271 00:11:38,276 --> 00:11:40,436 Well, you have to write down that little scrap of paper, 272 00:11:40,436 --> 00:11:43,266 the visual we keep doing where you have to say to sscanf, 273 00:11:43,646 --> 00:11:45,546 "Put your answer on these two sheets of paper." 274 00:11:45,646 --> 00:11:50,526 So here is &D, here is-- rather &N, here is &C, 275 00:11:50,666 --> 00:11:53,986 put your int here, put your character here because then, I, 276 00:11:53,986 --> 00:11:56,146 the person who wrote this code is gonna check 277 00:11:56,486 --> 00:12:00,096 if sscanf returns 1, that's perfect. 278 00:12:00,456 --> 00:12:03,516 The return value of sscanf signifies how many pieces 279 00:12:03,516 --> 00:12:06,226 of paper that function filled with values. 280 00:12:06,226 --> 00:12:08,476 So if it only filled one, the int called N, 281 00:12:08,736 --> 00:12:09,716 great, we're good to go. 282 00:12:09,716 --> 00:12:11,946 We can immediately free the line, 283 00:12:12,356 --> 00:12:14,716 the string that the user typed in and that's new today 284 00:12:14,716 --> 00:12:16,996 and then we immediately return that integer. 285 00:12:17,346 --> 00:12:20,656 But if the user somehow did not cooperate, 286 00:12:20,656 --> 00:12:23,996 did not just type a number like 42 and instead we get back 287 00:12:23,996 --> 00:12:27,316 from sscanf the number 2 which signifies that they typed 288 00:12:27,316 --> 00:12:29,906 in some number plus some garbage character 289 00:12:30,156 --> 00:12:32,886 or if sscanf returns zero because they didn't type 290 00:12:32,886 --> 00:12:35,676 in anything somehow, the else condition is gonna apply.
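The check on sscanf's return value can be sketched in C like this. The helper name parse_exactly_one_int is hypothetical and this is not the CS50 library's source; it only illustrates the format string with a percent D followed by a percent C, the two addresses passed in, and what a return value of 1 versus anything else means.

```c
#include <stdio.h>

// Hypothetical helper: tries to read exactly one int from `line`.
// Returns 1 and stores the value in *out on success, 0 otherwise.
int parse_exactly_one_int(const char *line, int *out)
{
    int n;
    char c;
    // " %d %c": optional whitespace, an int, optional whitespace, then any
    // one leftover character. sscanf returns how many of n and c it filled.
    int filled = sscanf(line, " %d %c", &n, &c);
    if (filled == 1)
    {
        *out = n;     // a number and nothing else
        return 1;
    }
    return 0;         // 2: trailing garbage; 0 or EOF: no number at all
}
```

With this sketch, "42" and "  42  " are accepted, while "42 f", "42f", "monkey", and an empty string are all rejected.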
291 00:12:35,816 --> 00:12:39,446 We're still going to free that line, that string that we got 292 00:12:39,446 --> 00:12:41,256 from the user and then we're gonna nag them, 293 00:12:41,416 --> 00:12:43,226 "Retry, retry, retry." 294 00:12:43,506 --> 00:12:45,436 So, if you've ever kind of wondered how 295 00:12:45,436 --> 00:12:48,266 that retry is being spit out automatically for you 296 00:12:48,266 --> 00:12:49,376 if the user doesn't cooperate? 297 00:12:49,676 --> 00:12:52,096 It's right there, every one of these functions, get int, 298 00:12:52,096 --> 00:12:54,626 get float, get double, has that retry line 299 00:12:54,866 --> 00:12:58,096 and if you don't get there but instead the user cooperates, 300 00:12:58,366 --> 00:13:00,816 you instead return the int or the float 301 00:13:01,156 --> 00:13:02,766 or the double or the like. 302 00:13:03,636 --> 00:13:08,106 So that was our-- the training wheels that are henceforth off. 303 00:13:09,096 --> 00:13:10,206 Questions? 304 00:13:10,206 --> 00:13:11,166 [ Inaudible Remark ] 305 00:13:11,166 --> 00:13:13,336 >> If it-- So no. 306 00:13:13,336 --> 00:13:14,326 It has-- Good question. 307 00:13:14,326 --> 00:13:17,446 So if the user just typed a character would it now still 308 00:13:17,446 --> 00:13:18,226 return 1? 309 00:13:18,406 --> 00:13:19,966 No. Then it would return 0 310 00:13:20,166 --> 00:13:23,346 because when you have the format string, percent D, percent C, 311 00:13:23,566 --> 00:13:25,516 that's telling sscanf, "You have to try 312 00:13:25,516 --> 00:13:27,496 to put a number then a letter." 313 00:13:27,496 --> 00:13:29,956 So if the user only puts a letter, it doesn't match 314 00:13:29,956 --> 00:13:32,096 that pattern, so sscanf returns 0. 315 00:13:32,486 --> 00:13:33,676 So, good, good corner case. 316 00:13:33,676 --> 00:13:33,756 Yeah? 317 00:13:34,516 --> 00:13:41,006 [ Inaudible Remark ] 318 00:13:41,506 --> 00:13:43,956 >> So, no-- good question too.
319 00:13:43,956 --> 00:13:48,676 So while true is kind of a-- sort of [inaudible] truth. 320 00:13:48,676 --> 00:13:52,636 True is always true so while true just runs forever no matter 321 00:13:52,636 --> 00:13:54,176 what, there's no way to break 322 00:13:54,176 --> 00:13:56,826 out of this loop unless you manually return 323 00:13:56,826 --> 00:13:57,766 from the inside of it. 324 00:13:58,116 --> 00:14:01,396 Now, we don't have to do it this way, in fact it's usually wrong 325 00:14:01,396 --> 00:14:03,896 to induce an infinite loop, it usually means you messed up 326 00:14:03,896 --> 00:14:04,996 and you made some mistakes 327 00:14:05,266 --> 00:14:07,116 but in this case we could have used a do-while, 328 00:14:07,116 --> 00:14:08,526 we could have used a for loop. 329 00:14:08,726 --> 00:14:10,596 But in this case, we decided as the staff, 330 00:14:10,986 --> 00:14:13,806 we don't wanna say we're gonna let the user try a hundred times 331 00:14:13,876 --> 00:14:15,036 in which case we have a for loop 332 00:14:15,036 --> 00:14:18,016 with the number hundred hard coded and we also, notice, 333 00:14:18,066 --> 00:14:20,726 did not wanna prompt the user initially 334 00:14:20,726 --> 00:14:22,616 by saying something silly like give me an integer. 335 00:14:22,616 --> 00:14:24,316 We wanted to leave that to you guys, 336 00:14:24,596 --> 00:14:26,816 the ability to actually put that first printf, 337 00:14:26,816 --> 00:14:29,726 the only printf we wanted to spit out was the retry 338 00:14:30,046 --> 00:14:32,016 so we just decided that really what we want, 339 00:14:32,106 --> 00:14:34,806 really the construct that gets the job done is just do the 340 00:14:34,806 --> 00:14:38,086 following forever but we'll stop ourselves when we're ready. 341 00:14:38,496 --> 00:14:39,736 So it's just a design decision. 342 00:14:40,416 --> 00:14:40,506 Yeah?
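The "do the following forever, but stop ourselves when we're ready" construct can be sketched in C as below. This is an illustration under assumptions rather than the library's code: the name read_int_forever is made up, input comes from a FILE pointer via fgets into a fixed-size buffer instead of from get string, and the function reports end of input with a return value of 0 instead of a sentinel.

```c
#include <stdio.h>

// Hypothetical stand-in for the loop described above. It reads lines from
// `in` (rather than calling the CS50 library) and keeps asking until one
// line holds exactly one int. Returns 1 on success, 0 if input runs out.
int read_int_forever(FILE *in, int *out)
{
    char line[128];   // fixed-size buffer: a simplification for this sketch
    while (1)
    {
        if (fgets(line, sizeof line, in) == NULL)
        {
            return 0;             // end of input (e.g. Control-D)
        }
        int n;
        char c;
        if (sscanf(line, " %d %c", &n, &c) == 1)
        {
            *out = n;
            return 1;             // the only successful way out of the loop
        }
        printf("Retry: ");        // the only prompt the function prints
    }
}
```

Calling it with stdin as the first argument reads from the keyboard. The loop has no counter and no initial prompt, which matches the design decision described above: the caller prints its own first prompt, and the function only nags on bad input.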
343 00:14:40,651 --> 00:14:42,651 [ Inaudible Remark ] 344 00:14:42,796 --> 00:14:44,526 >> So what if you typed in a number? 345 00:14:44,786 --> 00:14:46,786 [ Inaudible Remark ] 346 00:14:47,046 --> 00:14:47,546 >> Good question. 347 00:14:47,546 --> 00:14:49,866 So what if you typed in a number and a character? 348 00:14:50,006 --> 00:14:52,786 So sscanf actually treats white space inside 349 00:14:52,786 --> 00:14:54,696 of that second argument specially. 350 00:14:54,816 --> 00:14:56,916 If you actually did that, it would ignore-- 351 00:14:57,056 --> 00:14:59,316 it would still put number and then letter. 352 00:14:59,316 --> 00:15:00,956 You don't need to have the white space there. 353 00:15:01,726 --> 00:15:03,576 So that is a-- It's sort of a-- 354 00:15:04,076 --> 00:15:06,826 sort of a secret feature whereby this space is optional. 355 00:15:06,826 --> 00:15:08,226 [ Inaudible Remark ] 356 00:15:08,226 --> 00:15:09,766 >> It would return 2 in that case. 357 00:15:09,766 --> 00:15:12,956 If you typed in 4-2-F, for instance, you get the 42 358 00:15:13,216 --> 00:15:14,746 and the F but that would still be wrong. 359 00:15:15,496 --> 00:15:15,606 Yeah? 360 00:15:16,516 --> 00:15:23,756 [ Inaudible Remark ] 361 00:15:24,256 --> 00:15:25,856 >> Why do your programs crash? 362 00:15:25,856 --> 00:15:26,576 What do you mean? 363 00:15:26,576 --> 00:15:30,406 >> Why do they not crash? 364 00:15:30,406 --> 00:15:32,716 >> Why do they not crash? 365 00:15:32,716 --> 00:15:32,916 [ Inaudible Remark ] 366 00:15:32,916 --> 00:15:33,806 >> Ah, good question. 367 00:15:33,806 --> 00:15:34,716 So why do your programs-- 368 00:15:34,816 --> 00:15:36,826 not so much crash but why don't they hang and just kind 369 00:15:36,826 --> 00:15:38,496 of sit there waiting in perpetuity?
370 00:15:38,736 --> 00:15:41,086 Well remember, get string is actually the function that's 371 00:15:41,086 --> 00:15:43,846 first getting called and it's get string that sits there 372 00:15:43,906 --> 00:15:46,006 with that blinking prompt waiting for input. 373 00:15:46,436 --> 00:15:48,496 And so if the user doesn't ever type anything 374 00:15:48,496 --> 00:15:50,006 or even hit Control D, absolutely, 375 00:15:50,006 --> 00:15:52,856 your program's just gonna sit there and run infinitely long 376 00:15:53,316 --> 00:15:56,566 but get string is actually what pauses this loop and waits 377 00:15:56,566 --> 00:15:58,326 with the blinking prompt for the user's input. 378 00:15:59,276 --> 00:15:59,836 Good question. 379 00:16:00,546 --> 00:16:04,606 Alright, so scanf, let's actually see. 380 00:16:04,816 --> 00:16:07,046 This is how we might use this manually. 381 00:16:07,046 --> 00:16:07,936 So this is an example. 382 00:16:07,936 --> 00:16:09,696 It's among today's printouts online. 383 00:16:09,896 --> 00:16:12,366 This is an example called scanf-1 384 00:16:12,636 --> 00:16:14,666 and it demonstrates how you can sort 385 00:16:14,666 --> 00:16:17,206 of old school style get an integer from the user 386 00:16:17,206 --> 00:16:19,276 if you don't have the CS50 library. 387 00:16:19,416 --> 00:16:20,886 And it actually is relatively simple 388 00:16:20,886 --> 00:16:22,976 if all you wanna get is an int from the user, 389 00:16:23,336 --> 00:16:25,976 these exceptions of the user not cooperating aside, 390 00:16:26,106 --> 00:16:28,576 but it gets a little dangerous if we're not trying to get ints 391 00:16:28,576 --> 00:16:30,916 but we're trying to get characters or strings.
392 00:16:31,146 --> 00:16:34,156 Because recall, we've revealed that strings involve pointers, 393 00:16:34,326 --> 00:16:37,256 pointers involve memory, memory involves the risk at least 394 00:16:37,466 --> 00:16:40,206 of screwing up and inducing core dumps and seg faults, 395 00:16:40,416 --> 00:16:41,336 so we're about to see that. 396 00:16:41,676 --> 00:16:43,936 So here's a program whose purpose in life is to say 397 00:16:43,936 --> 00:16:45,426 to the user, give me a number please, 398 00:16:45,806 --> 00:16:49,106 then I have already declared a local variable called X 399 00:16:49,666 --> 00:16:52,386 and I'm passing it in by reference so to speak, 400 00:16:52,386 --> 00:16:55,276 by pointer, by address, these are all synonymous phrases 401 00:16:55,636 --> 00:16:57,116 to a function called scanf. 402 00:16:57,456 --> 00:17:01,546 And scanf reads directly from the keyboard, sscanf, 403 00:17:01,846 --> 00:17:04,786 string scanf, reads from a string that you already have 404 00:17:04,786 --> 00:17:06,336 in a variable as we did earlier. 405 00:17:06,706 --> 00:17:08,926 Scanf reads directly from the user's keyboard, 406 00:17:08,926 --> 00:17:10,386 which is the goal right now. 407 00:17:10,446 --> 00:17:12,996 So right now the user is being prompted for an integer 408 00:17:13,296 --> 00:17:15,366 and if the user provides an integer, 409 00:17:15,366 --> 00:17:18,756 it's going to be stored inside of the variable X and it's going 410 00:17:18,756 --> 00:17:20,806 to be printed back out on the screen. 411 00:17:20,806 --> 00:17:23,956 So let's try this, let me go ahead and do make scanf-1, 412 00:17:24,516 --> 00:17:25,896 alright it seems to compile okay. 413 00:17:25,896 --> 00:17:28,596 I'm gonna go ahead and run scanf-1 and I'm gonna type 414 00:17:28,596 --> 00:17:31,956 in the number 42, thanks for the 42, it seems to work. 415 00:17:31,956 --> 00:17:32,846 So let's try it again.
416 00:17:32,846 --> 00:17:36,416 Let's try the number, let's say, 0. 417 00:17:36,416 --> 00:17:38,076 Always try the interesting cases. 418 00:17:38,076 --> 00:17:38,926 That seems to work. 419 00:17:38,926 --> 00:17:40,776 Let's try the number negative 1. 420 00:17:41,186 --> 00:17:42,146 That seems to work. 421 00:17:42,146 --> 00:17:44,866 Let's try the arbitrary word, like [inaudible] to this term, 422 00:17:45,296 --> 00:17:48,646 monkey, something went wrong here. 423 00:17:48,806 --> 00:17:49,736 So what happened? 424 00:17:49,736 --> 00:17:53,216 Can we infer from this short program what the bug now is? 425 00:17:54,056 --> 00:17:54,136 Yeah? 426 00:17:54,166 --> 00:17:56,166 [ Inaudible Remark ] 427 00:17:56,196 --> 00:17:57,316 >> Okay, good-- it's a good thought. 428 00:17:57,316 --> 00:18:01,406 So perhaps it's converting the ints to the chars, alright. 429 00:18:01,406 --> 00:18:02,936 So other-- so maybe. 430 00:18:04,096 --> 00:18:06,096 So X, no. But let's see some other ideas. 431 00:18:06,606 --> 00:18:07,336 But that's a good thought. 432 00:18:07,836 --> 00:18:07,956 Yeah? 433 00:18:08,516 --> 00:18:13,666 [ Inaudible Remark ] 434 00:18:14,166 --> 00:18:16,156 >> Yeah, so it's actually-- 435 00:18:16,156 --> 00:18:18,816 it's a good thought but it is indeed this case whereby 436 00:18:19,056 --> 00:18:21,766 because scanf did not detect an integer, 437 00:18:21,766 --> 00:18:25,386 it instead detected a word or the character M in particular, 438 00:18:25,626 --> 00:18:28,986 it could not populate the placeholder and so nothing was put 439 00:18:28,986 --> 00:18:31,596 in X. So, its original value is unchanged 440 00:18:31,736 --> 00:18:33,896 and what is X's original value apparently? 441 00:18:34,776 --> 00:18:37,336 Well, it's just-- who knows, it's some garbage value 442 00:18:37,336 --> 00:18:39,386 and that garbage value at that moment in time happened 443 00:18:39,386 --> 00:18:42,826 to be 2719732, who knows what it might have been.
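The failure just described can be sketched in a few lines. This is not the lecture's scanf-1 source: it uses sscanf rather than scanf so that the input can be supplied as a string, the helper name parse_int is hypothetical, and x is given a starting value only so the example behaves the same on every run. Both scanf and sscanf return the number of placeholders they managed to fill, so anything other than 1 here means x was never written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: tries to read one int from text into *out.
// Returns true only if sscanf actually filled the %d placeholder;
// on failure *out is left exactly as the caller had it.
bool parse_int(const char *text, int *out)
{
    assert(text != NULL && out != NULL);
    return sscanf(text, "%d", out) == 1;
}
```

With a real keyboard the same check would presumably read `if (scanf("%d", &x) != 1)`, which catches the monkey case instead of printing whatever happened to be in x.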
444 00:18:43,086 --> 00:18:44,956 So this is another takeaway here too. 445 00:18:45,106 --> 00:18:46,786 If there's ever a risk in a program 446 00:18:46,986 --> 00:18:50,276 where your variables might not be assigned some value, 447 00:18:50,476 --> 00:18:54,416 it's actually a good habit, just to detect such things, 448 00:18:54,686 --> 00:18:57,526 to for instance initialize your variables to some known value, 449 00:18:57,526 --> 00:18:59,636 maybe it's zero, maybe it's negative one, 450 00:18:59,786 --> 00:19:01,526 maybe it's int max, but so 451 00:19:01,526 --> 00:19:05,706 that you yourself can check afterward what actually-- 452 00:19:05,706 --> 00:19:08,026 whether or not the value in there is legitimate. 453 00:19:08,206 --> 00:19:10,816 So realize this is absolutely an option and so 454 00:19:10,816 --> 00:19:11,936 if there's ever a risk again 455 00:19:11,936 --> 00:19:14,546 of your variables not getting initialized, you might want 456 00:19:14,546 --> 00:19:16,346 to pre-initialize it yourself 457 00:19:16,606 --> 00:19:18,686 to what we keep calling a sentinel value, 458 00:19:18,686 --> 00:19:21,236 just some special number or special constant. 459 00:19:21,626 --> 00:19:24,656 Well, this is relatively straightforward for ints 460 00:19:24,716 --> 00:19:27,866 but let's look at version two here at trying to get a string. 461 00:19:28,016 --> 00:19:29,386 Let's just continue this logic. 462 00:19:29,726 --> 00:19:31,286 So we've taken the training wheels off, 463 00:19:31,286 --> 00:19:34,436 I have removed the CS50 library from my appliance 464 00:19:34,626 --> 00:19:38,786 and so I cannot anymore say include CS50.h and so I instead, 465 00:19:38,786 --> 00:19:40,296 if I wanna declare a string, 466 00:19:40,546 --> 00:19:43,676 I have to go back to the old-fashioned way of saying char star.
467 00:19:44,026 --> 00:19:48,116 So char star buffer is going to represent my string 468 00:19:48,246 --> 00:19:49,976 and I'm calling this a buffer deliberately. 469 00:19:49,976 --> 00:19:53,216 In my mind a string really is an array, 470 00:19:53,216 --> 00:19:55,366 and an array is just a chunk of memory 471 00:19:55,366 --> 00:19:57,576 and computer scientists would typically call a chunk 472 00:19:57,576 --> 00:19:59,066 of memory a buffer, something 473 00:19:59,066 --> 00:20:01,076 into which you can read values: characters, 474 00:20:01,076 --> 00:20:01,866 numbers, whatever. 475 00:20:02,056 --> 00:20:05,176 >> So a buffer typically means an array of memory. 476 00:20:05,556 --> 00:20:09,826 But notice here that really what I've declared is char 477 00:20:09,826 --> 00:20:10,656 star buffer. 478 00:20:11,226 --> 00:20:13,606 So just to be proactive here, 479 00:20:13,976 --> 00:20:16,186 how much space have I actually allocated 480 00:20:16,186 --> 00:20:17,616 with this first line of code here? 481 00:20:19,596 --> 00:20:22,386 How many bytes or how many bits have just been allocated 482 00:20:22,386 --> 00:20:26,756 by char star buffer? 483 00:20:27,046 --> 00:20:28,686 So it's pretty small, right? 484 00:20:28,686 --> 00:20:30,126 How big is a char star? 485 00:20:30,126 --> 00:20:31,676 Is what the question reduces to. 486 00:20:32,986 --> 00:20:35,226 It's probably-- It's generally 32 bits, right. 487 00:20:35,226 --> 00:20:37,706 So anytime we have a pointer we've claimed at least 488 00:20:37,706 --> 00:20:39,776 for the appliance and for slightly older computers, 489 00:20:39,906 --> 00:20:42,286 a pointer is always 32 bits. 490 00:20:43,006 --> 00:20:44,906 So what does that mean, char star buffer? 491 00:20:44,906 --> 00:20:47,536 All I've allocated is 32 bits and what's supposed 492 00:20:47,536 --> 00:20:48,736 to go inside of those bits?
493 00:20:48,936 --> 00:20:50,896 Not a string, that's not really a buffer, 494 00:20:50,896 --> 00:20:52,996 that's kind of a misnomer at this point in the story 495 00:20:53,226 --> 00:20:55,736 because really I only have a pointer and I'm not supposed 496 00:20:55,736 --> 00:20:57,586 to put characters in pointers, I'm supposed 497 00:20:57,586 --> 00:21:00,226 to put memory addresses in pointers. 498 00:21:00,476 --> 00:21:02,536 So even if this still feels a little abstract 499 00:21:02,726 --> 00:21:06,466 at least take away from this that something here is wrong. 500 00:21:06,466 --> 00:21:08,696 And as an aside, just to tie this together. 501 00:21:08,696 --> 00:21:12,726 If you're pretty computer savvy and you've known for some time 502 00:21:12,726 --> 00:21:16,756 that for instance your PC can only have 2 gigabytes maximally, 503 00:21:16,756 --> 00:21:17,306 of RAM. 504 00:21:17,306 --> 00:21:19,076 You might be generally aware of these restrictions. 505 00:21:19,376 --> 00:21:26,296 Well, 2 gigabytes is actually a result of having a 32 bit CPU, 506 00:21:26,296 --> 00:21:27,946 Intel Inside if it's 32 bits, 507 00:21:28,196 --> 00:21:30,636 the biggest possible number you can express is 2 508 00:21:30,996 --> 00:21:32,966 to the 32, which is 4 billion. 509 00:21:33,186 --> 00:21:34,836 But for technical reasons that's actually half. 510 00:21:35,076 --> 00:21:38,646 So you have 31 bits that you can use to address your RAM. 511 00:21:38,906 --> 00:21:41,306 So long story short, you can't have more than 2 gigabytes 512 00:21:41,306 --> 00:21:43,736 of memory in some computers 'cause you don't have numbers 513 00:21:43,736 --> 00:21:46,526 big enough to actually say, "Put this here, put this here." 514 00:21:46,676 --> 00:21:48,286 You kind of can't count that high even 515 00:21:48,286 --> 00:21:50,026 if you have 10 gigabytes of memory.
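The arithmetic behind that aside can be written down as a small sketch. The helper name addressable_bytes is hypothetical; it only counts how many distinct byte addresses fit in a given number of bits, and says nothing about why a particular machine might reserve one of those bits.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: number of distinct byte addresses that can be
// expressed with the given number of address bits (bits must be < 64).
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);
    return UINT64_C(1) << bits;
}
```

Under that assumption, addressable_bytes(32) is 4,294,967,296 (the "4 billion" above) and addressable_bytes(31) is 2,147,483,648, which is the 2 gigabyte figure.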
516 00:21:50,316 --> 00:21:52,886 So another reason that having a 64-bit computer these days 517 00:21:52,886 --> 00:21:55,976 as most of you now probably do is actually a very good thing. 518 00:21:56,516 --> 00:22:02,356 [ Inaudible Remark ] 519 00:22:02,856 --> 00:22:03,356 >> Correct. 520 00:22:03,506 --> 00:22:06,756 So C does not have a native string class like something 521 00:22:06,756 --> 00:22:09,286 like Java does but it does know about strings 522 00:22:09,286 --> 00:22:10,166 in the sense of printf. 523 00:22:10,476 --> 00:22:13,946 But it really treats them as arrays of characters 524 00:22:13,946 --> 00:22:16,026 and it stops when it sees 0. 525 00:22:16,436 --> 00:22:18,226 So it's very primitive in that sense. 526 00:22:18,226 --> 00:22:18,836 So what do I say? 527 00:22:18,836 --> 00:22:20,186 I say to the user, string please 528 00:22:20,186 --> 00:22:23,096 and then I use scanf, percent S buffer. 529 00:22:23,306 --> 00:22:26,246 So I'm saying to scanf, take whatever the user just typed 530 00:22:26,246 --> 00:22:29,136 in at his or her keyboard and then put it, where? 531 00:22:29,136 --> 00:22:31,246 At that memory address. 532 00:22:31,746 --> 00:22:33,526 Now what is that memory address? 533 00:22:33,526 --> 00:22:35,236 Well, what is the value of buffer 534 00:22:35,236 --> 00:22:36,596 at this point in the story? 535 00:22:36,596 --> 00:22:37,296 [ Inaudible Remark ] 536 00:22:37,296 --> 00:22:41,616 >> It's a garbage value, right? 537 00:22:41,616 --> 00:22:43,306 The answer to this question is always gonna be, 538 00:22:43,306 --> 00:22:45,016 it's just some unknown garbage value. 539 00:22:45,016 --> 00:22:48,196 If I and none of my code has initialized 540 00:22:48,196 --> 00:22:49,206 it to anything explicit. 541 00:22:49,396 --> 00:22:53,926 So what you're really saying is you're allocating only 32 bits 542 00:22:53,926 --> 00:22:56,106 for a pointer to a character 543 00:22:56,326 --> 00:22:58,626 and that's got the number like 0x1234.
544 00:22:58,966 --> 00:23:01,616 You are then handing this slip of paper to scanf and saying, 545 00:23:01,846 --> 00:23:03,586 "Put whatever the user types here." 546 00:23:03,976 --> 00:23:04,946 Well what's there? 547 00:23:04,946 --> 00:23:06,306 Well, you have no idea. 548 00:23:06,596 --> 00:23:07,526 It could be 0. 549 00:23:07,526 --> 00:23:08,886 It could be 1, 2, 3, 4, 5. 550 00:23:08,886 --> 00:23:10,706 It could be some part of memory 551 00:23:10,706 --> 00:23:12,166 that you don't even have access to. 552 00:23:12,416 --> 00:23:16,306 So scanf is going to erroneously put the string there 553 00:23:16,376 --> 00:23:18,856 which typically induces one of those seg faults. 554 00:23:19,106 --> 00:23:20,666 So let's see if we can confirm this. 555 00:23:20,666 --> 00:23:22,816 So let me go ahead and make scanf-2. 556 00:23:22,996 --> 00:23:24,576 And always keep in mind that some 557 00:23:24,576 --> 00:23:28,436 of these errors are not detectable because you get lucky 558 00:23:28,436 --> 00:23:30,496 and you actually touch memory that is yours 559 00:23:30,496 --> 00:23:31,756 but you really shouldn't be using it. 560 00:23:31,986 --> 00:23:34,546 Now we've configured make in such a way 561 00:23:34,546 --> 00:23:37,546 that you can't even compile this code 562 00:23:37,546 --> 00:23:39,686 because we are proactively checking, wait a minute, 563 00:23:39,866 --> 00:23:42,126 buffer is uninitialized in this function 564 00:23:42,356 --> 00:23:43,376 and so there's an error. 565 00:23:43,376 --> 00:23:44,786 Won't let me even compile. 566 00:23:44,996 --> 00:23:47,516 But let me see if I can manually override this. 
567 00:23:47,516 --> 00:23:50,136 Rather than type make, I'm instead going 568 00:23:50,136 --> 00:23:55,376 to do gcc -std=c99, this is just a version 569 00:23:55,376 --> 00:23:59,156 of C we're using, and I'm gonna skip the -Werror, 570 00:23:59,156 --> 00:24:02,226 which is the guy that's making make be so picky, 571 00:24:02,756 --> 00:24:08,396 scanf-2.c -o scanf-2, Enter. 572 00:24:08,396 --> 00:24:10,576 So now I'm gonna run scanf-2. 573 00:24:10,576 --> 00:24:12,626 So it compiled even though I know there's a bug 574 00:24:12,626 --> 00:24:15,386 in this program, let's go ahead and run it, string please, 575 00:24:15,456 --> 00:24:18,176 monkey, enter, segmentation fault. 576 00:24:18,416 --> 00:24:20,286 So what's a possible solution here? 577 00:24:20,286 --> 00:24:23,606 Well, I can at least initialize this to some known value 578 00:24:23,606 --> 00:24:25,216 and the convention is typically null. 579 00:24:25,216 --> 00:24:26,886 Let me go ahead and recompile this. 580 00:24:27,126 --> 00:24:30,386 But notice even if I do this, this is no better, 581 00:24:30,646 --> 00:24:33,876 it's still buggy, but at least now printf is detecting 582 00:24:33,876 --> 00:24:34,956 as much in this case. 583 00:24:34,956 --> 00:24:38,506 So still a bug, but at least now it's obvious to you, the human, 584 00:24:38,506 --> 00:24:39,836 the programmer, wait a minute, 585 00:24:39,836 --> 00:24:41,296 this is clearly not what I expected. 586 00:24:41,296 --> 00:24:42,426 I'm doing something wrong. 587 00:24:42,906 --> 00:24:44,806 So what's the solution perhaps to this? 588 00:24:45,026 --> 00:24:46,766 What do-- How do we fix this problem, 589 00:24:46,766 --> 00:24:48,276 a buffer being some unknown value? 590 00:24:48,776 --> 00:24:53,366 Can I do something a little crazy like, well, 591 00:24:53,366 --> 00:24:55,296 null is obviously bad, I know that much.
592 00:24:55,486 --> 00:24:57,346 Well, let's put it at this address I keep quoting 593 00:24:57,346 --> 00:24:58,106 as popular. 594 00:24:58,966 --> 00:25:01,816 Well, we could do that but what you're really saying, 595 00:25:01,816 --> 00:25:04,186 now you're just arbitrarily saying, put what the user types 596 00:25:04,316 --> 00:25:06,946 over there and you have no idea where there is. 597 00:25:07,116 --> 00:25:10,816 So what function can we call that gives me a memory address 598 00:25:10,816 --> 00:25:12,546 of a legitimate chunk of memory? 599 00:25:12,546 --> 00:25:13,406 [ Inaudible Remark ] 600 00:25:13,406 --> 00:25:15,206 >> Right. So the solution here is malloc. 601 00:25:15,206 --> 00:25:16,106 So we could try this. 602 00:25:16,106 --> 00:25:20,256 So give me a string that's of size, the user is not gonna type 603 00:25:20,256 --> 00:25:22,456 in a word that's more than like 10 characters. 604 00:25:22,456 --> 00:25:23,606 So I'm gonna hard code 10. 605 00:25:23,936 --> 00:25:26,696 Now this should actually work because malloc, 606 00:25:26,696 --> 00:25:27,806 assuming there's RAM left 607 00:25:27,806 --> 00:25:29,576 in the computer is gonna give me a pointer 608 00:25:29,816 --> 00:25:33,216 and it's gonna say put this word that the user types here 609 00:25:33,216 --> 00:25:35,716 in memory and that address is now stored in buffer 610 00:25:35,826 --> 00:25:37,636 so scanf will put it there. 611 00:25:38,056 --> 00:25:39,996 Now, let me go ahead and try compiling this, 612 00:25:40,296 --> 00:25:42,586 let me recompile it with GCC, 613 00:25:42,816 --> 00:25:44,696 implicit declaration of function malloc.
614 00:25:45,156 --> 00:25:46,646 So we've not had to do this manually 615 00:25:46,646 --> 00:25:47,956 yet because the library-- 616 00:25:48,006 --> 00:25:50,006 the CS50 library is usually doing this for us, 617 00:25:50,316 --> 00:25:52,846 but there's another header that's popular, standard lib, 618 00:25:53,066 --> 00:25:56,656 stdlib.h, and that should make GCC happy now, 619 00:25:57,106 --> 00:25:58,506 now knowing about malloc. 620 00:25:58,506 --> 00:25:59,566 And indeed it did. 621 00:25:59,566 --> 00:26:03,836 So scanf-2, we go ahead and type monkey, nice. 622 00:26:04,246 --> 00:26:05,076 Thanks for the monkey. 623 00:26:05,256 --> 00:26:09,916 Alright but wait a minute, monkey, monkey, 624 00:26:09,996 --> 00:26:13,486 monkey, enter, interesting. 625 00:26:13,486 --> 00:26:19,276 So that kind of worked but monkey, monkey, monkey, monkey, 626 00:26:19,276 --> 00:26:21,376 with no spaces, interesting 627 00:26:21,656 --> 00:26:24,046 but what's happening here actually is 628 00:26:24,046 --> 00:26:26,316 that we are getting lucky. 629 00:26:26,316 --> 00:26:28,546 Let me see if I can make us unlucky 630 00:26:28,886 --> 00:26:30,516 by doing this ad nauseam. 631 00:26:30,666 --> 00:26:31,906 Let's see, that doesn't work. 632 00:26:32,486 --> 00:26:33,406 So let's do paste. 633 00:26:33,846 --> 00:26:34,876 Let's do paste. 634 00:26:35,606 --> 00:26:36,896 Let's do paste. 635 00:26:37,356 --> 00:26:38,256 Let's do zoom out. 636 00:26:38,816 --> 00:26:39,796 Let's do paste. 637 00:26:40,546 --> 00:26:42,476 Now obviously a user is not typically going 638 00:26:42,476 --> 00:26:43,386 to do something like this. 639 00:26:43,436 --> 00:26:45,906 But imagine it's actually, you know, a form field 640 00:26:45,906 --> 00:26:48,666 on a web page where-- still working. 641 00:26:48,976 --> 00:26:51,546 I'm getting a little bored copying and pasting.
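One partial mitigation for the hard-coded 10 bytes, which the demo above does not use, is to give the %s placeholder a maximum field width. The sketch below is an assumption-laden illustration, not the lecture's scanf-2 source: it reads from a string with sscanf so the input is easy to supply, and the helper name read_short_word is hypothetical. It truncates long input instead of overflowing, which keeps memory safe but silently drops characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: reads one whitespace-delimited word from input
// into a freshly allocated 10-byte buffer. The width in "%9s" caps the
// read at 9 characters, leaving the 10th byte for the terminating '\0'.
// Returns NULL if malloc fails or no word is found; the caller frees.
char *read_short_word(const char *input)
{
    assert(input != NULL);
    char *buffer = malloc(10);
    if (buffer == NULL)
    {
        return NULL;
    }
    if (sscanf(input, "%9s", buffer) != 1)
    {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```

Given "monkeymonkeymonkey" this stores only "monkeymon", so the buffer is never overrun, but the rest of the word is lost.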
642 00:26:51,546 --> 00:26:55,976 But take my word for it today that if we did this long enough, 643 00:26:56,066 --> 00:26:58,876 we would traverse one of those segmentation barriers 644 00:26:58,876 --> 00:27:00,986 where right now we're within it and we're just getting lucky 645 00:27:00,986 --> 00:27:03,006 but we're gonna cross over at some point 646 00:27:03,176 --> 00:27:05,296 and in fact it's going to crash on us. 647 00:27:05,296 --> 00:27:06,746 Again, segmentation fault. 648 00:27:06,926 --> 00:27:07,916 So how do you fix this? 649 00:27:07,916 --> 00:27:09,976 Well, this is actually harder to fix. 650 00:27:09,976 --> 00:27:14,336 And now the CS50 library's get string function is motivated. 651 00:27:14,556 --> 00:27:16,486 Recall, we looked at it briefly on Monday 652 00:27:16,486 --> 00:27:18,166 and anytime you call get string, 653 00:27:18,426 --> 00:27:19,996 how many characters does it get at a time? 654 00:27:21,346 --> 00:27:23,486 Well, recall, it used a new function, get character, 655 00:27:23,486 --> 00:27:26,526 get char and it just got one at a time, one at a time. 656 00:27:26,526 --> 00:27:29,166 It's incredibly paranoid, the get string function 657 00:27:29,166 --> 00:27:31,876 that we wrote so that it only slowly looks 658 00:27:31,876 --> 00:27:35,206 at what the user is typing in and only if it realizes, 659 00:27:35,256 --> 00:27:37,436 "Wait a minute, you just typed in 11 characters 660 00:27:37,436 --> 00:27:39,176 but I've only allocated 10 bytes." 661 00:27:39,526 --> 00:27:40,406 What is it gonna do? 662 00:27:40,406 --> 00:27:43,646 Well, recall, we saw briefly the realloc function which is 663 00:27:43,646 --> 00:27:44,916 like a cousin of malloc. 664 00:27:44,916 --> 00:27:48,046 And realloc as its name suggests, takes the 10 bytes 665 00:27:48,086 --> 00:27:49,936 that you might have already allocated with malloc 666 00:27:50,146 --> 00:27:53,246 and doubles it or triples it and we repeat this process.
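The one-character-at-a-time idea described above can be sketched roughly as follows. This is not the CS50 library's actual source: the function name read_line, the starting capacity of 10, and the doubling policy are all assumptions chosen to mirror the description, and the function takes a FILE pointer so that it can read from stdin or from any other stream.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical sketch: reads one line from stream, one character at a
// time, growing the buffer with realloc whenever it is about to fill.
// Returns a malloc'd, '\0'-terminated string without the newline, or
// NULL on allocation failure or if the stream is already at its end.
char *read_line(FILE *stream)
{
    assert(stream != NULL);
    size_t capacity = 10; // bytes currently allocated
    size_t length = 0;    // characters stored so far
    char *buffer = malloc(capacity);
    if (buffer == NULL)
    {
        return NULL;
    }

    int c;
    while ((c = fgetc(stream)) != EOF && c != '\n')
    {
        // Keep one byte spare for the terminating '\0'.
        if (length + 1 == capacity)
        {
            size_t bigger = capacity * 2;
            char *grown = realloc(buffer, bigger);
            if (grown == NULL)
            {
                free(buffer);
                return NULL;
            }
            buffer = grown;
            capacity = bigger;
        }
        buffer[length++] = (char) c;
    }

    if (length == 0 && c == EOF)
    {
        free(buffer);
        return NULL;
    }
    buffer[length] = '\0';
    return buffer;
}
```

A caller would presumably write `char *s = read_line(stdin);`, check s against NULL, and free it when done. Because the buffer grows on demand, no fixed guess about the user's input length is baked in.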
667 00:27:53,476 --> 00:27:56,676 So why was get string relatively complex compared to this? 668 00:27:57,056 --> 00:27:59,616 For this simple reason, there are so many programs 669 00:27:59,616 --> 00:28:01,406 to this day written out there where the-- 670 00:28:01,406 --> 00:28:04,406 you, the programmer has made an arbitrary 671 00:28:04,616 --> 00:28:07,126 and ultimately dangerous decision to say, 672 00:28:07,416 --> 00:28:10,206 no one is gonna have a name longer than 16 characters 673 00:28:10,206 --> 00:28:11,376 or a thousand characters. 674 00:28:11,566 --> 00:28:14,276 But these are precisely the opportunities that bad guys look 675 00:28:14,276 --> 00:28:17,026 for trying to crash programs because again we saw 676 00:28:17,026 --> 00:28:20,206 on Monday this opportunity for a buffer overflow exploit 677 00:28:20,426 --> 00:28:22,526 which essentially means typing something a little more 678 00:28:22,526 --> 00:28:25,516 sophisticated than monkey, monkey, monkey but rather code 679 00:28:25,516 --> 00:28:27,916 that you wanna execute and you can trick the computer 680 00:28:28,086 --> 00:28:29,676 into overflowing this buffer 681 00:28:29,866 --> 00:28:32,476 and executing your adversarial code. 682 00:28:33,426 --> 00:28:35,476 Yeah? 683 00:28:35,476 --> 00:28:36,286 [ Inaudible Remark ] 684 00:28:36,286 --> 00:28:38,486 >> The word string does not exist. 685 00:28:38,486 --> 00:28:40,686 It's in CS50.h. 686 00:28:40,686 --> 00:28:42,656 [ Inaudible Remark ] 687 00:28:42,656 --> 00:28:44,276 >> Good question, coincidence. 688 00:28:44,276 --> 00:28:48,486 So percent S is part of C. It's part of printf and percent-- 689 00:28:48,486 --> 00:28:51,696 so the word string exists in programmer's vocabulary 690 00:28:51,966 --> 00:28:54,746 but the data type string does not exist in C. 691 00:28:55,466 --> 00:28:57,666 So percent S denotes char star.
692 00:28:59,516 --> 00:29:05,546 [ Inaudible Remark ] 693 00:29:06,046 --> 00:29:07,026 >> Really good question. 694 00:29:07,266 --> 00:29:09,396 So I'm contradicting myself here, right? 695 00:29:09,616 --> 00:29:11,326 In the previous example with S-- 696 00:29:11,456 --> 00:29:14,496 with scanf-1, recall that I did this. 697 00:29:14,856 --> 00:29:18,906 I put an ampersand-- ampersand X to get the address of X. 698 00:29:19,406 --> 00:29:22,426 But we can kind of answer this just with our own jargon. 699 00:29:22,746 --> 00:29:24,176 What is buffer already? 700 00:29:24,776 --> 00:29:26,766 It's a memory address. 701 00:29:26,956 --> 00:29:28,636 So I don't need to use ampersand 702 00:29:28,856 --> 00:29:30,966 because I already have the answer to the question. 703 00:29:30,966 --> 00:29:32,466 The question is gonna be, where do you want me 704 00:29:32,466 --> 00:29:33,676 to put the user's input? 705 00:29:33,926 --> 00:29:34,996 Well, put it at that address. 706 00:29:35,146 --> 00:29:36,826 And the fundamental difference here is 707 00:29:36,826 --> 00:29:39,266 that in the previous example, we allocated an int, 708 00:29:39,406 --> 00:29:42,036 it was on the stack as a local variable 709 00:29:42,036 --> 00:29:43,436 but there was no malloc involved. 710 00:29:43,656 --> 00:29:45,046 But as soon as you involve malloc, 711 00:29:45,046 --> 00:29:46,626 what you're literally getting is that address. 712 00:29:47,096 --> 00:29:49,776 So we don't need to use the ampersand in this case. 713 00:29:50,356 --> 00:29:52,726 And realize there's one other way we can create a buffer 714 00:29:52,726 --> 00:29:55,516 that's just as dangerous as hard coding a length. 715 00:29:55,806 --> 00:29:57,796 A very common approach in a program is 716 00:29:57,796 --> 00:30:00,976 to say something like, char buffer bracket 16.
717 00:30:01,666 --> 00:30:04,426 >> So char buffer 16 doesn't feel 718 00:30:04,426 --> 00:30:06,506 like a memory address really, 719 00:30:06,816 --> 00:30:09,286 but it is in fact an array of characters. 720 00:30:09,286 --> 00:30:10,536 And what though is an array? 721 00:30:10,846 --> 00:30:12,526 Well, an array really is just the address 722 00:30:12,696 --> 00:30:14,046 of the first chunk of memory. 723 00:30:14,336 --> 00:30:15,626 So what is this really doing? 724 00:30:15,626 --> 00:30:19,686 This also is allocating not 10 but 16 bytes this time. 725 00:30:19,906 --> 00:30:23,206 It's then passing the name of the array, buffer, 726 00:30:23,476 --> 00:30:28,836 to scanf and think back now to P set 3 when you implemented sort 727 00:30:28,836 --> 00:30:31,336 or search, remember that you could pass in an array 728 00:30:31,336 --> 00:30:32,566 as an argument to a function 729 00:30:32,746 --> 00:30:34,306 and you didn't use the double brackets, 730 00:30:34,306 --> 00:30:36,116 you instead just wrote the array's name. 731 00:30:36,396 --> 00:30:38,736 That's because you can pass an array by its name, 732 00:30:38,736 --> 00:30:43,506 by its address and so scanf here would use that 16-byte buffer 733 00:30:43,816 --> 00:30:45,006 to put the user's input. 734 00:30:45,006 --> 00:30:48,176 But what's gonna happen if the user types in 17 characters? 735 00:30:48,656 --> 00:30:51,866 Just by nature, by definition, 736 00:30:51,866 --> 00:30:54,236 you're gonna go beyond the boundaries of that array 737 00:30:54,436 --> 00:30:56,496 and notice too in C, scanf 738 00:30:56,686 --> 00:31:00,746 and your own code has no idea how big the original array is. 739 00:31:00,976 --> 00:31:04,056 It has no idea how many bytes you asked malloc for.
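Because a function that receives buffer sees only an address, the usual workaround is to pass the size alongside it. The sketch below assumes a hypothetical helper named fill_buffer and uses snprintf, which is told the capacity and never writes past it; it is an illustration of the point just made, not code from the lecture.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: copies as much of input as fits into dest, which
// the caller promises is capacity bytes long. Always '\0'-terminates.
// Returns the number of characters stored, not counting the '\0'.
size_t fill_buffer(char *dest, size_t capacity, const char *input)
{
    assert(dest != NULL && input != NULL && capacity > 0);
    int wanted = snprintf(dest, capacity, "%s", input);
    if (wanted < 0)
    {
        dest[0] = '\0';
        return 0;
    }
    return (size_t) wanted < capacity ? (size_t) wanted : capacity - 1;
}
```

A caller that declared `char buffer[16];` would write `fill_buffer(buffer, sizeof buffer, text)`. Note that sizeof buffer yields 16 only where the array itself is in scope; inside fill_buffer, dest is just a pointer, which is exactly why the capacity has to travel as a separate argument.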
740 00:31:04,056 --> 00:31:06,496 It is entirely up to you, the programmer 741 00:31:06,766 --> 00:31:08,876 to remember how many bytes you asked for 742 00:31:09,116 --> 00:31:11,886 or how many bytes you hard coded in the array. 743 00:31:11,956 --> 00:31:14,596 And so again, this is the prime-- 744 00:31:14,596 --> 00:31:18,006 one of the primary reasons that so much code written in C 745 00:31:18,006 --> 00:31:22,966 and C++ and even in some modern languages is in fact exploitable 746 00:31:23,356 --> 00:31:25,336 because of these kinds of dangers. 747 00:31:25,566 --> 00:31:26,786 And if you don't believe that too, 748 00:31:26,786 --> 00:31:27,926 realize that these languages 749 00:31:27,926 --> 00:31:29,166 that you might already know a little bit, 750 00:31:29,166 --> 00:31:31,866 and we certainly will by semester's end, like JavaScript 751 00:31:31,866 --> 00:31:33,646 and PHP and Python and Ruby, 752 00:31:33,646 --> 00:31:37,906 a lot of the times the programs called interpreters that you use 753 00:31:37,906 --> 00:31:41,816 to use those languages, they're written in C themselves. 754 00:31:41,816 --> 00:31:44,406 So you might be writing PHP code, but it's being executed 755 00:31:44,406 --> 00:31:46,996 by a C program, so if that C program is buggy, 756 00:31:47,236 --> 00:31:49,306 your PHP code can be vulnerable as well. 757 00:31:49,456 --> 00:31:49,576 Yeah? 758 00:31:50,516 --> 00:31:56,566 [ Inaudible Remark ] 759 00:31:57,066 --> 00:31:57,866 >> Good question. 760 00:31:57,866 --> 00:32:02,186 So if the buffer is an address why do we not say star buffer 761 00:32:02,186 --> 00:32:04,266 as we did in our swap function on Monday? 762 00:32:04,646 --> 00:32:06,286 So the reason again boils 763 00:32:06,286 --> 00:32:08,676 down to the fundamental question we're trying to answer here. 764 00:32:08,676 --> 00:32:11,056 The question at hand for scanf is where do you want me 765 00:32:11,056 --> 00:32:12,136 to put the user's input?
766 00:32:12,496 --> 00:32:14,366 The answer to that question must be an address. 767 00:32:15,086 --> 00:32:17,596 But we already have an address, malloc gave us an address. 768 00:32:17,946 --> 00:32:20,436 So the simple answer to why we need no star 769 00:32:20,436 --> 00:32:23,556 and no ampersand here is because buffer is already an address. 770 00:32:23,996 --> 00:32:25,746 Because in this case, we called malloc 771 00:32:26,466 --> 00:32:31,846 and as I'm disclosing today the name of an array can be treated 772 00:32:31,846 --> 00:32:34,546 as though it is a pointer as well, an address. 773 00:32:34,956 --> 00:32:37,856 The only time we use the star is when we want to go there. 774 00:32:38,096 --> 00:32:40,476 Scanf will do star buffer, we do not. 775 00:32:41,706 --> 00:32:41,776 Yeah? 776 00:32:42,236 --> 00:32:43,776 >> Can you compare char buffer 16? 777 00:32:44,046 --> 00:32:49,026 Is that 16 characters or? 778 00:32:49,376 --> 00:32:52,666 >> 16 characters, 16 of whatever the data type is in green. 779 00:32:52,666 --> 00:32:52,733 [ Inaudible Remark ] 780 00:32:52,733 --> 00:32:56,756 >> No, it's still gonna be 16 bytes. 781 00:32:56,756 --> 00:32:57,826 A char is 1 byte. 782 00:32:58,236 --> 00:32:58,736 It's 8 bits. 783 00:32:59,216 --> 00:33:02,786 So, in this case we would get literally 16 bytes or 16 chars. 784 00:33:03,016 --> 00:33:06,466 If instead it were int buffer 16, then we would get 64, 785 00:33:06,466 --> 00:33:08,176 'cause it'd be 4 bytes per integer. 786 00:33:09,376 --> 00:33:13,206 Alright, so recall then the danger that this leads to. 787 00:33:13,206 --> 00:33:16,876 Alright, we saw this picture and we kind of lambasted this design 788 00:33:16,876 --> 00:33:20,006 because you have this dangerous pointer called the return 789 00:33:20,006 --> 00:33:22,426 address in red and that was simply the address of what? 790 00:33:22,426 --> 00:33:24,886 What was this red return address used for?
791 00:33:25,946 --> 00:33:26,976 It tells the function what? 792 00:33:27,766 --> 00:33:32,106 This-- The return address, is literally return address. 793 00:33:32,106 --> 00:33:34,986 It tells a function where it should return control 794 00:33:34,986 --> 00:33:37,386 of the computer to once it's done doing its thing. 795 00:33:37,576 --> 00:33:40,226 So if I'm the main function and I called the foo function, 796 00:33:40,426 --> 00:33:42,636 what I'm essentially doing conceptually is, I, main, 797 00:33:42,636 --> 00:33:45,266 I'm gonna say I am address 1, 2, 3, 4, 5. 798 00:33:45,346 --> 00:33:47,076 You hand this piece of paper to foo, 799 00:33:47,356 --> 00:33:50,506 foo keeps it around in this red slice of memory and as soon 800 00:33:50,506 --> 00:33:53,106 as foo is done executing, it checks where did main tell me 801 00:33:53,106 --> 00:33:54,816 to go back to, 1, 2, 3, 4. 802 00:33:54,976 --> 00:33:58,116 Let me hand control of the CPU back to the address that was 803 00:33:58,116 --> 00:33:59,116 on this piece of paper. 804 00:33:59,376 --> 00:34:03,386 But the problem is that if foo has a buffer of say 16 bytes, 805 00:34:03,386 --> 00:34:06,856 or in this case 12 bytes, and the user types in not hello 806 00:34:06,946 --> 00:34:10,136 but something much longer than that, where does the space go? 807 00:34:10,136 --> 00:34:11,776 It goes from top left to bottom, 808 00:34:11,986 --> 00:34:15,376 and so you run the risk ultimately of overwriting this 809 00:34:15,886 --> 00:34:18,616 with the address of some bad guy's code. 810 00:34:18,616 --> 00:34:21,316 And unfortunately even though the simple solution might be 811 00:34:21,316 --> 00:34:25,046 to just say, alright well don't write hello from top down. 812 00:34:25,046 --> 00:34:26,076 Write it from bottom up. 813 00:34:26,076 --> 00:34:28,736 It turns out that only makes the problem a little harder 814 00:34:28,736 --> 00:34:29,556 for the bad guys.
815 00:34:29,736 --> 00:34:33,166 But the problem is that you can end up tricking future functions 816 00:34:33,166 --> 00:34:35,566 that gets called into exploiting codes. 817 00:34:35,566 --> 00:34:37,566 So there's actually not a simple fix for this. 818 00:34:37,896 --> 00:34:39,936 And again, this remains one of the most common ways 819 00:34:40,296 --> 00:34:41,886 of exploiting a program. 820 00:34:42,146 --> 00:34:44,466 Let me just peel back the layer of one other thing 821 00:34:44,466 --> 00:34:45,696 with regard to pointers. 822 00:34:46,046 --> 00:34:49,746 So this here is a program called "pointers.C." 823 00:34:49,746 --> 00:34:52,566 It's among our source code from today already. 824 00:34:52,816 --> 00:34:55,546 Notice that I'm using a few header files up here, 825 00:34:55,646 --> 00:34:58,876 using a few libraries just because I wanted to resort 826 00:34:58,876 --> 00:35:00,276 to the CS50 library for this. 827 00:35:00,496 --> 00:35:02,406 And now notice, the one new habit I'm getting 828 00:35:02,406 --> 00:35:05,506 into is anytime I call get string, I now need to say 829 00:35:05,546 --> 00:35:07,846 if the return value equals-equals null, 830 00:35:07,846 --> 00:35:08,726 something went wrong. 831 00:35:08,726 --> 00:35:09,766 I should yell at the user. 832 00:35:09,766 --> 00:35:11,106 I should return. 833 00:35:11,106 --> 00:35:11,736 I should exit. 834 00:35:12,016 --> 00:35:13,536 So I'm now checking that value. 835 00:35:13,826 --> 00:35:15,436 Now why can get string return null? 836 00:35:15,476 --> 00:35:18,126 Because it uses malloc and malloc can return null. 837 00:35:18,656 --> 00:35:20,616 Alright, so notice this trick though, 838 00:35:21,126 --> 00:35:23,426 we have previously printed strings 839 00:35:23,676 --> 00:35:27,216 and previously the syntax had been this, alright. 840 00:35:27,366 --> 00:35:30,926 This should probably remind you a little bit of week 1, week 2. 
841 00:35:31,236 --> 00:35:33,636 If you wanna print a string that the user's typed in, 842 00:35:33,636 --> 00:35:35,196 it should remind you of P set 2, 843 00:35:35,196 --> 00:35:36,626 the Caesar cipher and Vigenere. 844 00:35:36,926 --> 00:35:39,846 Well, I can print each character, percent C, one 845 00:35:39,846 --> 00:35:43,506 at a time and then I can print that character by way of S, 846 00:35:43,936 --> 00:35:45,756 the name of the string, bracket I. 847 00:35:46,386 --> 00:35:47,206 So comfortable with this? 848 00:35:48,216 --> 00:35:48,596 Hopefully? 849 00:35:48,596 --> 00:35:51,276 So it turns out that all this time, 850 00:35:51,276 --> 00:35:53,716 these square brackets are what we would generally call 851 00:35:53,856 --> 00:35:55,036 syntactic sugar. 852 00:35:55,286 --> 00:35:58,116 It's just a nicer, prettier way of doing something 853 00:35:58,326 --> 00:36:00,506 that at the end of the day is actually more sophisticated. 854 00:36:00,726 --> 00:36:05,076 This code here that I just wrote is equivalent to S bracket I. 855 00:36:05,536 --> 00:36:07,096 So let's go back to the fundamentals. 856 00:36:07,096 --> 00:36:08,286 First of all, what is S? 857 00:36:08,286 --> 00:36:11,556 Well, S we call string but really as of this week, 858 00:36:11,556 --> 00:36:13,826 what is S in more technical terms? 859 00:36:14,776 --> 00:36:15,606 It's an address, right? 860 00:36:15,746 --> 00:36:17,896 It is the address in RAM at which 861 00:36:17,896 --> 00:36:20,626 that string's characters live from left to right. 862 00:36:21,106 --> 00:36:24,276 So, star S recall means "go there." 863 00:36:24,276 --> 00:36:26,826 And if you go to that address, you're gonna see the letter M 864 00:36:27,066 --> 00:36:30,016 and then O and then N and then K, if the word is monkey, right. 865 00:36:30,016 --> 00:36:31,236 If you go to that address, 866 00:36:31,436 --> 00:36:32,706 you're gonna see those characters.
867 00:36:32,886 --> 00:36:35,306 But you don't wanna go to the same address every time. 868 00:36:35,516 --> 00:36:38,346 You wanna go to the start of the string which is identified 869 00:36:38,346 --> 00:36:41,466 by the name S, but each time you iterate 870 00:36:41,466 --> 00:36:43,696 in this loop how many steps to the right 871 00:36:43,756 --> 00:36:44,906 in memory do you wanna look? 872 00:36:45,476 --> 00:36:47,966 Well I, right, one more, one more, one more. 873 00:36:48,256 --> 00:36:50,906 So if inside of your loop, you take this address S 874 00:36:50,906 --> 00:36:54,666 and you add I to it, well the first time this loop goes 875 00:36:54,666 --> 00:36:56,486 through, I is initialized to what apparently? 876 00:36:57,406 --> 00:37:01,066 Zero. So S plus 0 is S. So what are you gonna print first? 877 00:37:01,276 --> 00:37:05,176 You're gonna print star S which means go to that address 878 00:37:05,176 --> 00:37:07,546 and print out if the word is monkey, the letter M. 879 00:37:08,166 --> 00:37:10,756 If you then take that same address and do plus 1 880 00:37:10,806 --> 00:37:13,656 on your second iteration, that's not address 1, 2, 3, 4. 881 00:37:13,656 --> 00:37:16,586 That's like 1, 2, 3, 5 and what letter presumably is 882 00:37:16,586 --> 00:37:17,406 at that location? 883 00:37:18,156 --> 00:37:23,416 So, O. So star of that summation means print the O, print the N, 884 00:37:23,416 --> 00:37:25,196 print the K and so forth. 885 00:37:25,286 --> 00:37:28,276 And because we already checked in advance the length of S 886 00:37:28,546 --> 00:37:30,496 with this helpful function string length, 887 00:37:30,496 --> 00:37:31,366 we're not gonna crash. 888 00:37:31,586 --> 00:37:33,906 We're only gonna step over the characters one at a time 889 00:37:33,906 --> 00:37:34,796 and then we're gonna stop. 
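[ Editor's sketch of the loop just described, written from the spoken description rather than copied from the lecture's source file. The function name print_chars is an assumption for illustration. ]

```c
#include <stdio.h>
#include <string.h>

// Prints each character of s on its own line using pointer arithmetic
// and returns the number of characters printed.
int print_chars(const char *s)
{
    int n = (int) strlen(s);  // check the length once, up front
    for (int i = 0; i < n; i++)
    {
        // *(s + i) means "start at address s, step i chars to the
        // right, and go there"; s[i] is shorthand for the same thing.
        printf("%c\n", *(s + i));
    }
    return n;
}
```

[ Calling print_chars("monkey") would print m, o, n, k, e, y, one per line, stopping at the length that strlen reported. ]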
890 00:37:35,016 --> 00:37:38,026 But just realize all this time even as far back 891 00:37:38,026 --> 00:37:40,416 as problem set 2 in Caesar, we've been using pointers. 892 00:37:40,416 --> 00:37:41,806 We've been using memory addresses. 893 00:37:41,806 --> 00:37:44,286 We've been walking through your computer's RAM but we did it 894 00:37:44,286 --> 00:37:46,576 in a more user-friendly way with S bracket I, 895 00:37:46,866 --> 00:37:49,376 but really you've been using a feature called pointer 896 00:37:49,376 --> 00:37:51,106 arithmetic, taking an address 897 00:37:51,106 --> 00:37:54,896 and doing some mathematical arithmetic on it, plus 1, minus 1. 898 00:37:55,146 --> 00:38:03,856 So realize all we've been doing is the same topic all this time. 899 00:38:04,536 --> 00:38:05,136 Yeah? 900 00:38:05,136 --> 00:38:05,203 [ Inaudible Remark ] 901 00:38:05,203 --> 00:38:05,976 >> Really good question. 902 00:38:05,976 --> 00:38:09,376 So if we were instead iterating not over characters 903 00:38:09,376 --> 00:38:12,906 which are 1 byte typically, but instead over ints, would we have 904 00:38:12,946 --> 00:38:17,676 to do plus 4 times I so that we go 4 bytes, 4 bytes, 4 bytes? 905 00:38:17,676 --> 00:38:18,476 Short answer, no. 906 00:38:18,766 --> 00:38:21,516 The reason this feature has its own name, pointer arithmetic, 907 00:38:21,726 --> 00:38:24,046 is because the compiler will figure out that 908 00:38:24,046 --> 00:38:28,646 when you say S plus I, if S is actually a char star, 909 00:38:28,646 --> 00:38:30,116 it's gonna do literally plus 1. 910 00:38:30,546 --> 00:38:33,126 If instead though, S is an int star, 911 00:38:33,256 --> 00:38:34,856 well an int star points to an int. 912 00:38:34,906 --> 00:38:37,396 By definition, ints on this machine are 4 bytes 913 00:38:37,656 --> 00:38:41,446 so plus 1 is actually gonna be implicitly converted to plus 4 914 00:38:41,546 --> 00:38:43,476 then plus 8, plus 12, plus 16.
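[ Editor's sketch of the scaling just described. The function names int_step, char_step, and sum are assumptions for illustration; the code measures the step size instead of hard-coding 4, since the size of an int depends on the machine. ]

```c
#include <stddef.h>

// Returns how many bytes apart p and p + 1 are when p is an int *.
size_t int_step(void)
{
    int numbers[2] = {0, 0};
    int *p = numbers;
    return (size_t) ((char *) (p + 1) - (char *) p);
}

// Same measurement for a char *: always exactly 1 byte.
size_t char_step(void)
{
    char letters[2] = {'a', 'b'};
    char *p = letters;
    return (size_t) ((p + 1) - p);
}

// Sums n ints by walking the array with pointer arithmetic. Note the
// code says a + i, not a + 4 * i; the compiler scales the step.
int sum(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
    {
        total += *(a + i);
    }
    return total;
}
```

[ On a machine with 4-byte ints, int_step() would report 4, but the same source compiles unchanged where ints are some other size. ]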
915 00:38:43,686 --> 00:38:45,526 So that's what's really cool about pointer arithmetic. 916 00:38:45,526 --> 00:38:47,346 You don't even have to think about those details. 917 00:38:47,606 --> 00:38:50,746 So your code will work on old machines, new machines 918 00:38:50,746 --> 00:38:52,056 because the compiler will figure this 919 00:38:52,056 --> 00:38:54,526 out for you, really good catch. 920 00:38:55,686 --> 00:38:56,596 Alright, yeah? 921 00:38:56,596 --> 00:38:57,456 [ Inaudible Remark ] 922 00:38:57,456 --> 00:39:00,506 >> Good question. 923 00:39:00,506 --> 00:39:02,116 Is this more computationally efficient 924 00:39:02,116 --> 00:39:03,676 than using square brackets? 925 00:39:03,676 --> 00:39:06,706 No. The compiler will actually effectively turn your square 926 00:39:06,706 --> 00:39:07,676 brackets into this. 927 00:39:07,996 --> 00:39:10,006 So when your code is running, you will notice no difference. 928 00:39:10,106 --> 00:39:12,456 Back in the day, you might notice a compilation difference 929 00:39:12,716 --> 00:39:15,356 but these days on a 2 gigahertz computer compiling Caesar is 930 00:39:15,396 --> 00:39:17,636 instantaneous anyway, so it's a non-issue 931 00:39:17,896 --> 00:39:19,916 in modern times, alright. 932 00:39:19,966 --> 00:39:20,686 So that was a lot. 933 00:39:20,686 --> 00:39:24,676 Let's go ahead and take our 5-minute musical break here. 934 00:39:25,126 --> 00:39:28,526 Alright, so we are back, really good news, 935 00:39:28,686 --> 00:39:30,026 no problem set next week. 936 00:39:30,496 --> 00:39:31,636 [Cheering] I know. 937 00:39:31,636 --> 00:39:35,876 There it goes. 938 00:39:36,486 --> 00:39:38,496 So yeah, so quiz 0 is next Wednesday. 939 00:39:38,496 --> 00:39:39,776 There's no lecture on Monday 940 00:39:39,776 --> 00:39:41,366 because it's a holiday for the University. 941 00:39:41,366 --> 00:39:42,756 Quiz 0 is on Wednesday.
942 00:39:42,956 --> 00:39:44,926 We will announce via email and on the course's website 943 00:39:45,026 --> 00:39:46,296 where to go next Wednesday. 944 00:39:46,296 --> 00:39:47,846 We're gonna try to book enough classrooms 945 00:39:47,976 --> 00:39:50,036 so that we have writing surfaces for everyone. 946 00:39:50,036 --> 00:39:52,116 So we most likely will not be here. 947 00:39:52,116 --> 00:39:54,046 So again, don't show up before checking your email. 948 00:39:54,306 --> 00:39:56,796 There will be a review session this Sunday at 7 p.m. 949 00:39:56,796 --> 00:39:58,276 in Northwest Science, same time, 950 00:39:58,276 --> 00:39:59,956 same place as the walkthroughs usually are. 951 00:40:00,456 --> 00:40:01,586 >> This will be a course-wide review. 952 00:40:01,586 --> 00:40:03,566 It will be filmed, put online by the next 953 00:40:03,856 --> 00:40:07,626 to cover really the past 6 weeks of material 954 00:40:07,676 --> 00:40:09,746 and particularly field questions from you. 955 00:40:09,946 --> 00:40:12,196 And we'll also have office hours next week 956 00:40:12,196 --> 00:40:14,166 on Monday and Tuesday night. 957 00:40:14,166 --> 00:40:16,506 They most likely will not be in the dining halls, so instead be 958 00:40:16,506 --> 00:40:18,156 in a classroom where we can use a white board 959 00:40:18,156 --> 00:40:20,376 and it'll be totally casual and an opportunity 960 00:40:20,376 --> 00:40:22,706 to get some last minute questions answered. 961 00:40:23,086 --> 00:40:26,676 Know too that there's 4 years' worth of old quizzes 962 00:40:26,676 --> 00:40:27,876 on the course's website. 963 00:40:28,116 --> 00:40:30,936 So the best guidance to get a sense 964 00:40:31,266 --> 00:40:34,256 of what past quizzes have been like is to go there 965 00:40:34,256 --> 00:40:36,776 and you'll see not just the questions but also the answers. 966 00:40:37,086 --> 00:40:39,616 Do just realize that the course evolves over time.
967 00:40:39,616 --> 00:40:41,446 For instance in '07, we had three quizzes, 968 00:40:41,446 --> 00:40:44,426 thereafter it was just two but the material changes. 969 00:40:44,426 --> 00:40:47,206 So realize that if you have no idea how to answer some question 970 00:40:47,206 --> 00:40:49,266 on the quiz, that's either because you zoned 971 00:40:49,266 --> 00:40:51,776 out at some point this semester or we just never talked 972 00:40:51,776 --> 00:40:52,866 about it this semester. 973 00:40:52,866 --> 00:40:54,606 So look ultimately to the syllabus 974 00:40:54,956 --> 00:40:57,016 and to the lecture slides and scribe notes 975 00:40:57,366 --> 00:40:58,936 in particular for guidance. 976 00:40:59,016 --> 00:41:01,306 And so if you've not realized this, it's always fascinating 977 00:41:01,306 --> 00:41:03,346 at the end of the semester in the Q Guide to read 978 00:41:03,346 --> 00:41:05,526 that people are unaware of these scribe notes. 979 00:41:05,526 --> 00:41:07,286 So we have a wonderful teaching fellow 980 00:41:07,496 --> 00:41:10,096 who actually summarizes what goes 981 00:41:10,096 --> 00:41:13,446 on in class each day typically with snarky little footnotes 982 00:41:13,446 --> 00:41:14,216 which you might enjoy. 983 00:41:14,476 --> 00:41:17,636 And so this is meant to be an authoritative set of notes 984 00:41:17,716 --> 00:41:20,286 in lieu of your own potentially if you'd rather not 985 00:41:20,356 --> 00:41:22,366 so much scribble things down. 986 00:41:22,516 --> 00:41:23,106 Perfect!
987 00:41:23,106 --> 00:41:23,776 [ Laughter ] 988 00:41:23,776 --> 00:41:24,446 [ Inaudible Remark ] 989 00:41:24,446 --> 00:41:27,846 >> So realize these too are meant to be very good guidance 990 00:41:28,166 --> 00:41:30,206 through the course and I would strongly urge too, 991 00:41:30,206 --> 00:41:34,816 when you do show up at office hours and/or the course review 992 00:41:34,816 --> 00:41:37,036 session, honestly you'll be doing yourself a service 993 00:41:37,036 --> 00:41:38,266 if you spend at least a little bit 994 00:41:38,266 --> 00:41:40,736 of time this weekend taking a past quiz 995 00:41:40,736 --> 00:41:41,806 so that you're not being hit 996 00:41:41,806 --> 00:41:42,946 with material for the first time. 997 00:41:43,156 --> 00:41:44,556 You can rather make better use of your time 998 00:41:44,556 --> 00:41:45,466 and ask questions only 999 00:41:45,466 --> 00:41:49,066 about the stuff you are forgetting or struggling with. 1000 00:41:49,416 --> 00:41:53,126 Also, we put online these things. 1001 00:41:53,126 --> 00:41:54,896 We have been transcribing all 1002 00:41:54,896 --> 00:41:56,566 of the course's lectures as promised. 1003 00:41:56,566 --> 00:41:59,226 So that now when you visit past lectures' videos, 1004 00:41:59,506 --> 00:42:01,606 you will see not just the course's video on the left, 1005 00:42:01,606 --> 00:42:04,116 you will also see every word that came out of my mouth, 1006 00:42:04,386 --> 00:42:07,126 for better or for worse, and you will be able to hit play 1007 00:42:07,126 --> 00:42:09,576 and just to give you a sense of what you too will be able to do 1008 00:42:09,576 --> 00:42:12,966 by semester's end with a little JavaScript.
1009 00:42:13,206 --> 00:42:16,886 You can even read what I'm saying in real time 1010 00:42:17,176 --> 00:42:20,176 as it highlights as the words come out of my mouth, 1011 00:42:20,216 --> 00:42:22,196 then even subtitle it if you would prefer 1012 00:42:22,196 --> 00:42:23,326 to watch in that fashion. 1013 00:42:23,576 --> 00:42:26,646 So this also means, too, more compellingly, that you can search, 1014 00:42:26,646 --> 00:42:29,566 Control F and so forth, looking for topics like pointers, 1015 00:42:29,566 --> 00:42:31,866 looking for arrays, things that you might have struggled with. 1016 00:42:31,866 --> 00:42:34,646 You can actually find that point in the lecture, scroll down to 1017 00:42:34,646 --> 00:42:36,606 that spot, click on a specific sentence, 1018 00:42:36,646 --> 00:42:40,046 and the lecture will immediately jump to that point in the class. 1019 00:42:40,046 --> 00:42:43,026 So realize that's there and we will finish, by the weekend, 1020 00:42:43,126 --> 00:42:45,686 today's and Monday's lectures so that you have access 1021 00:42:45,686 --> 00:42:47,386 to those online before the quiz. 1022 00:42:47,726 --> 00:42:51,036 So I also got curious as to what words do actually come 1023 00:42:51,036 --> 00:42:51,826 out of my mouth. 1024 00:42:51,826 --> 00:42:54,476 And so, I uploaded them to a nice free tool online 1025 00:42:54,476 --> 00:42:57,886 that creates a visualization of the words that you've pasted in 1026 00:42:58,106 --> 00:42:59,676 and the bigger the word, the bigger the font, 1027 00:42:59,886 --> 00:43:01,006 the more times I said it. 1028 00:43:01,006 --> 00:43:02,896 The smaller the word or if it's not even there, 1029 00:43:02,896 --> 00:43:04,086 the fewer times I said it. 1030 00:43:04,266 --> 00:43:07,126 And this for instance was this year's very first lecture. 1031 00:43:07,726 --> 00:43:09,536 It's kind of curious.
1032 00:43:09,536 --> 00:43:13,526 I say the word "just" a lot, "actually" a lot, 1033 00:43:13,526 --> 00:43:16,656 "course" that makes sense, CS50 is a little smaller there. 1034 00:43:16,656 --> 00:43:18,986 I was worried I was saying Facebook too much the first week 1035 00:43:18,986 --> 00:43:20,766 but that's actually pretty small at the top right. 1036 00:43:21,006 --> 00:43:22,006 So that was reassuring. 1037 00:43:22,296 --> 00:43:23,856 I then fast-forwarded it a few weeks 1038 00:43:23,856 --> 00:43:28,746 and some themes definitely popped out, "just" again 1039 00:43:28,746 --> 00:43:30,706 and "gonna," so I didn't realize I sound 1040 00:43:30,706 --> 00:43:32,056 so intellectual in class. 1041 00:43:32,176 --> 00:43:35,696 And then I looked at yet another week 1042 00:43:35,696 --> 00:43:37,576 and like "just" is the theme. 1043 00:43:37,576 --> 00:43:41,576 So now I'm never gonna be able to say this word without kind 1044 00:43:41,576 --> 00:43:43,206 of tripping over myself but apparently 1045 00:43:43,206 --> 00:43:45,396 that is the most popular takeaway, 1046 00:43:45,396 --> 00:43:48,046 a word that I say in CS50. 1047 00:43:49,386 --> 00:43:55,106 So, hi! So one exciting initiative that the university, 1048 00:43:55,106 --> 00:43:57,016 across all of Harvard's schools, has been working on, 1049 00:43:57,016 --> 00:43:58,406 you may have read about it at some point 1050 00:43:58,406 --> 00:44:00,426 in the Crimson, is the Harvard Innovation Lab. 1051 00:44:00,706 --> 00:44:03,156 This is a beautiful new space across the river, 1052 00:44:03,156 --> 00:44:05,566 right next to HBS, that has a few floors to it. 1053 00:44:05,566 --> 00:44:07,656 The top two are being used by HBS classes.
1054 00:44:07,656 --> 00:44:09,856 The bottom floor is meant to be an innovation space 1055 00:44:09,856 --> 00:44:13,616 for undergraduates, GSAS students, HBS students from all 1056 00:44:13,616 --> 00:44:16,186 across the university who have entrepreneurial ideas, 1057 00:44:16,186 --> 00:44:18,886 who have projects they wanna work on collaboratively 1058 00:44:18,886 --> 00:44:21,506 with friends and they just need space to work and they want 1059 00:44:21,506 --> 00:44:24,276 to be around other smart people doing interesting things, 1060 00:44:24,276 --> 00:44:27,176 technical people, so as to ask and to answer questions. 1061 00:44:27,176 --> 00:44:28,876 And so just to give you a sense of the space, 1062 00:44:28,876 --> 00:44:31,606 it has literally just opened in the past few days. When you walk 1063 00:44:31,606 --> 00:44:33,896 across the river, you'll see a building like this. 1064 00:44:33,896 --> 00:44:34,586 You go in. 1065 00:44:34,586 --> 00:44:37,576 It's very modern and high-tech, concrete floors, nice lighting, 1066 00:44:37,576 --> 00:44:39,576 funky seating, and so forth, and a lot, 1067 00:44:39,576 --> 00:44:41,076 a lot, a lot of workspace. 1068 00:44:41,346 --> 00:44:43,306 What we thought we'd do for fun even though it is 1069 00:44:43,306 --> 00:44:45,536 across the river, so it's a little farther than Leverett 1070 00:44:45,536 --> 00:44:50,016 and Quincy and [inaudible] and Lowell is for just one week, 1071 00:44:50,016 --> 00:44:52,936 not next week but the week after for problem set 5, 1072 00:44:52,936 --> 00:44:54,606 is we've been cordially invited as a class 1073 00:44:54,606 --> 00:44:57,176 to spend office hours there, Monday, Tuesday, Wednesday, 1074 00:44:57,246 --> 00:44:59,356 Thursday, and pizza and soda 1075 00:44:59,356 --> 00:45:00,956 will be served throughout the evening.
1076 00:45:01,186 --> 00:45:03,366 The shuttles will run back and forth between campus 1077 00:45:03,366 --> 00:45:04,746 and it's actually not that far a walk, 1078 00:45:04,956 --> 00:45:07,086 but just emotionally we'll make sure that you can hop 1079 00:45:07,086 --> 00:45:08,726 on a shuttle and actually get there. 1080 00:45:09,026 --> 00:45:12,066 But it should be fun, the CS50 field trip, to see a couple 1081 00:45:12,066 --> 00:45:15,476 of hundred of us working on P set 5 1082 00:45:15,476 --> 00:45:18,226 in the Harvard Innovation Lab as its inaugural class. 1083 00:45:18,456 --> 00:45:20,256 So more on that I'm sure over time. 1084 00:45:20,626 --> 00:45:25,636 So we set the stage on Monday for forensics, focusing 1085 00:45:25,636 --> 00:45:28,526 on a problem domain, albeit using the same fundamentals, 1086 00:45:28,776 --> 00:45:31,666 of recovering information or covering your tracks when trying 1087 00:45:31,666 --> 00:45:33,566 to get rid of information. To begin 1088 00:45:33,566 --> 00:45:36,126 to discuss how information is actually stored on something 1089 00:45:36,126 --> 00:45:38,526 like a hard drive, we actually need to be able 1090 00:45:38,526 --> 00:45:41,946 to represent the data on a hard drive 1091 00:45:42,016 --> 00:45:43,886 in actual programmatic terms. 1092 00:45:44,116 --> 00:45:46,886 So just as a flashback, you might recall 1093 00:45:46,886 --> 00:45:50,516 from week 0 what is actually inside of a hard drive. 1094 00:45:50,516 --> 00:45:52,626 We'll just watch the first part of this for a few seconds. 1095 00:45:52,626 --> 00:45:54,936 Recall that this was the story. 1096 00:45:55,016 --> 00:45:55,083 [ Video Clip ] 1097 00:45:55,083 --> 00:45:55,150 [ Background Music ] 1098 00:45:55,150 --> 00:45:56,226 >> A hard drive is where your PC stores most 1099 00:45:56,226 --> 00:45:57,326 of its permanent data.
1100 00:45:57,876 --> 00:46:00,866 To do that, the data travels from RAM along 1101 00:46:00,866 --> 00:46:03,506 with software signals that tell the hard drive how 1102 00:46:03,506 --> 00:46:04,546 to store that data. 1103 00:46:05,296 --> 00:46:07,776 The hard drive circuits translate those signals 1104 00:46:07,776 --> 00:46:09,426 into voltage fluctuations. 1105 00:46:10,136 --> 00:46:12,916 These in turn control the hard drive's moving parts, 1106 00:46:13,486 --> 00:46:16,376 some of the few moving parts left in the modern computer. 1107 00:46:17,136 --> 00:46:18,776 Some of the signals control a motor 1108 00:46:18,776 --> 00:46:20,696 which spins metal-coated platters. 1109 00:46:20,876 --> 00:46:23,966 Your data is actually stored on these platters. 1110 00:46:24,676 --> 00:46:27,726 Other signals move the read/write heads to read 1111 00:46:27,726 --> 00:46:29,206 or write data on the platters. 1112 00:46:29,936 --> 00:46:31,766 This machinery is so precise 1113 00:46:32,246 --> 00:46:34,196 that a human hair couldn't even pass 1114 00:46:34,246 --> 00:46:35,876 between the heads and spinning platters. 1115 00:46:36,426 --> 00:46:38,826 Yet it all works at terrific speeds. 1116 00:46:39,416 --> 00:46:42,326 >> So the film, recall, goes on to just discuss-- 1117 00:46:42,326 --> 00:46:45,486 this is gonna be a little messing with your mind. 1118 00:46:45,486 --> 00:46:48,686 If we can focus the camera on the board here for a moment.
1119 00:46:48,976 --> 00:46:51,646 So you'll recall that the video went on 1120 00:46:51,646 --> 00:46:54,376 to show those little blue and red magnetic particles 1121 00:46:54,376 --> 00:46:56,216 and to reduce the problem of storing information 1122 00:46:56,216 --> 00:46:58,256 on a hard drive to the orientation 1123 00:46:58,256 --> 00:46:59,916 of magnetic particles either north south 1124 00:47:00,136 --> 00:47:02,056 or south north thereby representing 1 1125 00:47:02,056 --> 00:47:04,266 or 0 respectively, some system like that. 1126 00:47:04,566 --> 00:47:05,606 Well, what's really going 1127 00:47:05,606 --> 00:47:08,856 on in a computer's hard drive is obviously gonna be higher level 1128 00:47:08,856 --> 00:47:09,166 than that. 1129 00:47:09,166 --> 00:47:10,936 There's got to be some notion of file name, 1130 00:47:10,936 --> 00:47:12,436 some notion of file sizes, 1131 00:47:12,436 --> 00:47:14,346 some notion of folders in which things are. 1132 00:47:14,596 --> 00:47:16,776 So hard drives are not just zeros and ones, 1133 00:47:16,776 --> 00:47:19,706 rather they actually have some metadata stored on them, 1134 00:47:19,916 --> 00:47:22,896 and metadata means data that's useful 1135 00:47:23,156 --> 00:47:25,956 for the other data you really care about. 1136 00:47:25,956 --> 00:47:28,916 So let me just draw for instance one of those platters. 1137 00:47:28,916 --> 00:47:31,756 So this is a metal disk that's inside of a typical hard drive 1138 00:47:32,096 --> 00:47:35,496 and it spins around generally thousands of times per minute 1139 00:47:35,586 --> 00:47:39,526 and all along here are zeros and ones in some orientation. 1140 00:47:39,856 --> 00:47:42,906 Now, what's really going on there is that clusters 1141 00:47:42,906 --> 00:47:45,106 of these zeros and ones represent more 1142 00:47:45,106 --> 00:47:46,226 interesting things.
1143 00:47:46,226 --> 00:47:48,576 They represent your actual files, like your MP3S 1144 00:47:48,576 --> 00:47:50,116 and your movies, but also things 1145 00:47:50,116 --> 00:47:52,196 like the file name and where things are. 1146 00:47:52,396 --> 00:47:54,866 So it turns out even though this might represent some movie 1147 00:47:54,866 --> 00:47:57,576 you've downloaded from iTunes, there's a special part 1148 00:47:57,576 --> 00:48:01,826 of the hard drive reserved for a table, 1149 00:48:01,956 --> 00:48:04,346 sort of an Excel spreadsheet of sorts that has-- 1150 00:48:04,766 --> 00:48:06,596 to oversimplify, 2 columns. 1151 00:48:06,596 --> 00:48:09,136 On the left is the name of the file 1152 00:48:09,456 --> 00:48:11,776 and on the right is the address of the file. 1153 00:48:12,016 --> 00:48:15,206 So, just like we can say that your RAM can be addressed 1154 00:48:15,206 --> 00:48:17,496 from byte 0, 1, 2, 3, on up, 1155 00:48:17,776 --> 00:48:20,586 similarly can your hard drive even though it's a circle be 1156 00:48:20,886 --> 00:48:22,286 described in the same way. 1157 00:48:22,286 --> 00:48:23,376 This is byte 0. 1158 00:48:23,376 --> 00:48:26,736 This is byte 1, 2, 3, 4, and some system along those lines. 1159 00:48:27,016 --> 00:48:29,706 So, how does your computer remember where data 1160 00:48:29,706 --> 00:48:30,666 on your hard drive is? 1161 00:48:30,926 --> 00:48:32,456 It uses this little cheat sheet. 1162 00:48:32,596 --> 00:48:35,006 So when you create or save or download a file, 1163 00:48:35,196 --> 00:48:37,756 there's this table in your operating system's memory 1164 00:48:37,936 --> 00:48:39,966 that says, "Okay, you just downloaded movie.mov," 1165 00:48:40,126 --> 00:48:44,256 some file name like that, and so the name gets written 1166 00:48:44,256 --> 00:48:47,126 in the left column, in the right column gets written the address 1167 00:48:47,466 --> 00:48:49,966 of say the first byte of that movie. 
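[ Editor's sketch of the two-column table just described, deliberately oversimplified in the same way as the lecture's drawing. The entry type, the lookup function, and any addresses passed in are assumptions for illustration; real file systems store considerably more than a name and one address. ]

```c
#include <string.h>

// One row of the hypothetical table.
typedef struct
{
    const char *name;  // left column: the file's name
    long address;      // right column: address of the file's first byte
}
entry;

// Returns the address recorded for name, or -1 if the table has no
// row with that name.
long lookup(const entry table[], int n, const char *name)
{
    for (int i = 0; i < n; i++)
    {
        if (strcmp(table[i].name, name) == 0)
        {
            return table[i].address;
        }
    }
    return -1;
}
```

[ A table holding a row for movie.mov would let lookup report where that file's first byte sits; a name that was never recorded comes back as -1. ]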
1168 00:48:50,146 --> 00:48:51,976 Now hard drives are actually pretty fancy. 1169 00:48:51,976 --> 00:48:54,346 And so, you can get what's called disk fragmentation. 1170 00:48:54,346 --> 00:48:55,436 If you've got big files, 1171 00:48:55,696 --> 00:48:57,696 they might not necessarily all end up here. 1172 00:48:57,736 --> 00:49:01,206 You might get part of your movie here or here or here or here. 1173 00:49:01,416 --> 00:49:02,816 So, if you've ever heard this term 1174 00:49:02,816 --> 00:49:05,326 "defragment your hard drive", it's referring to the fact 1175 00:49:05,326 --> 00:49:06,636 that your file might be spread out. 1176 00:49:06,996 --> 00:49:08,586 So it's not sufficient just 1177 00:49:08,586 --> 00:49:10,866 to remember the starting address of a file. 1178 00:49:11,076 --> 00:49:13,156 It turns out there is usually a list of addresses, 1179 00:49:13,156 --> 00:49:15,756 part 1 is here, part 2 is here and so forth. 1180 00:49:16,106 --> 00:49:18,726 But now it's in the world of forensics. 1181 00:49:18,986 --> 00:49:22,806 What happens when you actually drag a file to the recycle bin 1182 00:49:22,806 --> 00:49:25,566 or to the trash can on Mac OS or Windows? 1183 00:49:26,966 --> 00:49:28,596 So you probably figured this much out, right? 1184 00:49:28,596 --> 00:49:31,226 Like what happens when you just drag it to that special icon? 1185 00:49:31,356 --> 00:49:32,906 >> You'll forget the address? 1186 00:49:33,006 --> 00:49:34,126 >> Yeah, and actually not even there, 1187 00:49:34,296 --> 00:49:37,196 in fact we can rewind further, nothing happens, right? 
1188 00:49:37,196 --> 00:49:40,116 If you drag a file, something sketchy or private, 1189 00:49:40,116 --> 00:49:42,376 that you wanna delete and you just put it in the recycle bin 1190 00:49:42,376 --> 00:49:44,206 or the trash can, hopefully by now you figured 1191 00:49:44,206 --> 00:49:46,676 out that your roommate can just double click your trash can 1192 00:49:46,676 --> 00:49:47,756 or your recycle bin, right? 1193 00:49:48,016 --> 00:49:49,306 So you actually have to do what? 1194 00:49:50,166 --> 00:49:53,436 So, empty the recycle bin or empty the trash can in some way, 1195 00:49:53,436 --> 00:49:54,996 but what really happens? 1196 00:49:54,996 --> 00:49:58,566 Well despite years' worth of our being trained by computer soci-- 1197 00:49:58,566 --> 00:50:01,566 a computer society that deleting means deleting, 1198 00:50:01,566 --> 00:50:02,966 deleting does not mean deleting, right? 1199 00:50:03,046 --> 00:50:05,786 >> So, deleting means forgetting. 1200 00:50:06,036 --> 00:50:10,516 So what really happens if you've got some financial data or some, 1201 00:50:10,816 --> 00:50:12,926 you know, Dear John letter you didn't mean to send 1202 00:50:12,926 --> 00:50:15,596 or you wanted to delete or something you don't want found, 1203 00:50:15,846 --> 00:50:18,386 well if you delete it by dragging it to the recycle bin 1204 00:50:18,386 --> 00:50:20,766 and clicking empty recycle bin or empty trash, 1205 00:50:21,096 --> 00:50:24,226 all that's happening in this picture is that 1206 00:50:24,866 --> 00:50:27,076 the operating system just forgets where it is. 1207 00:50:27,366 --> 00:50:31,516 Now, anyone who's versed in the art of forensics or anyone 1208 00:50:31,516 --> 00:50:33,836 who has Google can find a program 1209 00:50:33,836 --> 00:50:36,926 that can then search your hard drive looking 1210 00:50:36,926 --> 00:50:39,986 for what we'll call signatures of known files.
1211 00:50:39,986 --> 00:50:42,176 It turns out a lot of files on the internet 1212 00:50:42,396 --> 00:50:44,556 and in general have signatures 1213 00:50:44,556 --> 00:50:46,596 which just means they're identifiable 1214 00:50:46,816 --> 00:50:47,776 by just a few bytes. 1215 00:50:47,956 --> 00:50:51,316 For instance if you detect this sequence of bytes 1216 00:50:51,586 --> 00:50:54,366 on a hard disk, FFD8, FFE0 1217 00:50:54,366 --> 00:50:58,086 and these are just hexadecimal numbers, if you detect 1218 00:50:58,156 --> 00:51:01,366 that on a hard drive, that means with very high probability, 1219 00:51:01,556 --> 00:51:04,566 you have just encountered the start of a JPEG, 1220 00:51:04,566 --> 00:51:07,226 a JPEG is an image, a photograph. 1221 00:51:07,226 --> 00:51:09,406 These are very commonly deleted from one's hard drive. 1222 00:51:09,636 --> 00:51:12,896 And so you probably don't want it to be so easy for someone 1223 00:51:12,896 --> 00:51:15,876 to just search your entire hard drive with an automated program 1224 00:51:15,876 --> 00:51:18,126 and say JPEG, JPEG, JPEG, JPEG. 1225 00:51:18,446 --> 00:51:20,796 But indeed, this is what forensic investigators do. 1226 00:51:20,796 --> 00:51:22,526 I mean this is what your roommate could be doing. 1227 00:51:22,526 --> 00:51:26,276 If you Google for such tools, you can recover files 1228 00:51:26,276 --> 00:51:29,826 from a hard drive with very high probability because literally, 1229 00:51:29,826 --> 00:51:32,286 the bits are still there and just 1230 00:51:32,286 --> 00:51:33,616 because your computer forgot 1231 00:51:33,616 --> 00:51:37,306 about them doesn't mean there aren't enough hints scattered 1232 00:51:37,306 --> 00:51:39,206 around your hard drive to recover them. 1233 00:51:39,516 --> 00:51:40,716 So that then begs the question, "Okay, 1234 00:51:40,776 --> 00:51:43,376 how do I really delete these files from my hard drive?"
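[ Editor's sketch of a signature search over raw bytes, using only the four-byte sequence ff d8 ff e0 named in the lecture. The function name find_jpeg is an assumption for illustration, and an actual recovery tool would need to do more, for example accept other JPEG variants and work out where each image ends. ]

```c
#include <stddef.h>

// Scans length bytes of data for the sequence ff d8 ff e0.
// Returns the index where the first match begins, or -1 if none.
long find_jpeg(const unsigned char *data, size_t length)
{
    // i + 4 <= length keeps all four reads inside the buffer
    for (size_t i = 0; i + 4 <= length; i++)
    {
        if (data[i] == 0xff && data[i + 1] == 0xd8 &&
            data[i + 2] == 0xff && data[i + 3] == 0xe0)
        {
            return (long) i;
        }
    }
    return -1;
}
```

[ Because the search looks only at bytes, it works the same whether or not the operating system still remembers that a file lives there. ]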
1235 00:51:43,776 --> 00:51:45,136 Alright, so what do you do or what do you need 1236 00:51:45,136 --> 00:51:46,626 to do technically, intuitively? 1237 00:51:46,626 --> 00:51:46,746 Yeah. 1238 00:51:47,096 --> 00:51:50,046 >> After I delete all my lectures pictures, 1239 00:51:50,086 --> 00:51:52,856 I can upload a bunch of-- 1240 00:51:52,856 --> 00:51:55,026 I can download a bunch of episodes of The Sopranos. 1241 00:51:55,026 --> 00:51:56,916 >> Okay. You have a very concrete plan B here. 1242 00:51:56,916 --> 00:51:57,196 So-- 1243 00:51:57,196 --> 00:51:57,566 [ Laughter ] 1244 00:51:57,566 --> 00:52:00,886 >> So after you delete your sketchy photos, 1245 00:52:00,886 --> 00:52:04,656 you can download some very big Sopranos episodes online 1246 00:52:04,786 --> 00:52:06,126 and that's actually quite clever. 1247 00:52:06,126 --> 00:52:07,576 That would have the side effect 1248 00:52:07,836 --> 00:52:09,936 of overwriting your sketchy photos 1249 00:52:09,936 --> 00:52:12,516 because even though your computer's forgotten 1250 00:52:12,696 --> 00:52:14,256 that these files are there, 1251 00:52:14,256 --> 00:52:16,896 as soon as you start downloading a huge amount of content, 1252 00:52:16,896 --> 00:52:20,786 big movie files like Sopranos episodes, with high probability, 1253 00:52:20,786 --> 00:52:23,546 these zeros and ones, they're not gonna get erased per se 1254 00:52:23,676 --> 00:52:24,926 but they're gonna get reused. 1255 00:52:24,926 --> 00:52:27,076 Those magnetic particles are now gonna be part 1256 00:52:27,076 --> 00:52:29,066 of some Sopranos episode. 1257 00:52:29,206 --> 00:52:32,286 So now, 50 percent of your sketchy behavior is now gone, 1258 00:52:32,436 --> 00:52:32,676 right?
1259 00:52:32,676 --> 00:52:35,936 So if you've ever used something like Norton Utilities or any 1260 00:52:35,936 --> 00:52:38,126 of these programs that undelete files, what-- 1261 00:52:38,126 --> 00:52:40,716 it will sometimes say, "Oh, we can undelete this file 1262 00:52:40,716 --> 00:52:42,256 with 95 percent certainty." 1263 00:52:42,476 --> 00:52:44,896 Well, that's probably because 5 percent of the zeros 1264 00:52:44,896 --> 00:52:48,236 and ones have been reused by a Sopranos episode or the like. 1265 00:52:48,546 --> 00:52:50,156 Now thankfully, there are even easier ways. 1266 00:52:50,156 --> 00:52:51,836 This is sort of the like, "Oh my god, I really have 1267 00:52:51,836 --> 00:52:53,586 to cover my tracks, download a lot of content." 1268 00:52:53,586 --> 00:52:55,636 Thankfully, there do exist built-in ways 1269 00:52:55,636 --> 00:52:57,336 and different operating systems do this better. 1270 00:52:57,606 --> 00:53:00,216 If you've never noticed, though you've had this probably 1271 00:53:00,216 --> 00:53:02,506 if you own a Mac for some time under your Finder menu, 1272 00:53:02,956 --> 00:53:06,406 there is Empty Trash but there's also Secure Empty Trash. 1273 00:53:06,706 --> 00:53:10,646 So Secure Empty Trash will not only erase what's in this table, 1274 00:53:10,736 --> 00:53:13,066 it will also overwrite the zeros and ones 1275 00:53:13,066 --> 00:53:15,016 with, minimally, all zeros. 1276 00:53:15,216 --> 00:53:17,936 And there are Department of Defense standards 1277 00:53:17,936 --> 00:53:20,046 that actually say, "Well, you can overwrite the bits 1278 00:53:20,046 --> 00:53:22,676 with zeros and ones randomly, some 7 times, 1279 00:53:22,676 --> 00:53:23,836 even more times than that."
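[ Editor's sketch ] The overwrite idea described above, replacing a file's bytes with zeros rather than merely forgetting where the file lives, might look roughly like this in C. The function name zero_fill is hypothetical, and this is not how Secure Empty Trash is actually implemented. It also assumes the file system writes the new bytes over the old ones in place, which is not guaranteed on SSDs or on journaling and copy-on-write file systems.

```c
#include <stdio.h>

/* Hypothetical helper: overwrites every byte of the file at path with
   zeros, keeping its length. Returns 0 on success, -1 on any failure. */
int zero_fill(const char *path)
{
    static const unsigned char zeros[4096] = {0};

    FILE *fp = fopen(path, "r+b");   /* read/update mode: do not truncate */
    if (fp == NULL)
    {
        return -1;
    }

    /* Find out how many bytes there are to overwrite. */
    if (fseek(fp, 0, SEEK_END) != 0)
    {
        fclose(fp);
        return -1;
    }
    long remaining = ftell(fp);
    if (remaining < 0)
    {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    /* Write zeros over the old contents, one chunk at a time. */
    while (remaining > 0)
    {
        size_t chunk = remaining < (long) sizeof zeros
                           ? (size_t) remaining
                           : sizeof zeros;
        if (fwrite(zeros, 1, chunk, fp) != chunk)
        {
            fclose(fp);
            return -1;
        }
        remaining -= (long) chunk;
    }

    if (fflush(fp) != 0)
    {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Only after a pass like this would the file be deleted in the usual way, so that what is left behind on disk is zeros rather than the original photo.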
1280 00:53:24,186 --> 00:53:27,496 In the research literature, there has been no published evidence 1281 00:53:27,496 --> 00:53:30,146 that simply writing all zeros over your data 1282 00:53:30,196 --> 00:53:32,156 on modern hard drives is insufficient, 1283 00:53:32,156 --> 00:53:33,836 so you probably don't need to use Department 1284 00:53:33,836 --> 00:53:35,746 of Defense standards to actually cover your tracks, 1285 00:53:35,746 --> 00:53:36,976 whatever it is you're trying to delete. 1286 00:53:37,276 --> 00:53:39,896 It generally suffices to overwrite zeros and ones. 1287 00:53:40,106 --> 00:53:43,176 And this isn't a Mac versus PC thing but Apple has been much, 1288 00:53:43,176 --> 00:53:46,726 much better in recent years making this easy. On Windows, 1289 00:53:46,726 --> 00:53:47,956 to my knowledge, there is no 1290 00:53:47,956 --> 00:53:50,036 such super simple option as that. 1291 00:53:50,226 --> 00:53:52,366 In fact in Mac OS too, and it's maybe fine 1292 00:53:52,366 --> 00:53:54,196 because we're preaching to the choir since most of you, 1293 00:53:54,196 --> 00:53:57,066 a majority have Macs, you can also encrypt your whole 1294 00:53:57,066 --> 00:53:57,546 hard drive. 1295 00:53:57,576 --> 00:53:59,176 The thing called FileVault means all 1296 00:53:59,176 --> 00:54:00,536 of your data is actually encrypted. 1297 00:54:00,536 --> 00:54:01,736 So if it's lost or stolen, 1298 00:54:01,996 --> 00:54:03,586 no one can actually see what your files are, 1299 00:54:03,586 --> 00:54:04,886 they just look completely random. 1300 00:54:04,886 --> 00:54:07,136 And there's even, if you're super paranoid, 1301 00:54:07,486 --> 00:54:09,956 something called secure virtual memory.
1302 00:54:10,246 --> 00:54:12,426 Long story short, when I mentioned earlier 1303 00:54:12,426 --> 00:54:14,286 that your computer might slow down sometimes 1304 00:54:14,376 --> 00:54:15,956 because you're loading lots of programs, 1305 00:54:16,316 --> 00:54:19,266 well typically a modern operating system will not say 1306 00:54:19,266 --> 00:54:21,756 like it used to, you have too many programs running. 1307 00:54:21,756 --> 00:54:23,786 I will not launch Safari again 1308 00:54:23,786 --> 00:54:25,506 or I will not launch Internet Explorer. 1309 00:54:25,776 --> 00:54:28,126 Instead what it will do is create the illusion 1310 00:54:28,336 --> 00:54:30,586 that you don't have maximally 2 gigs of RAM. 1311 00:54:30,796 --> 00:54:33,546 The computer will pretend that you have 3 gigabytes of RAM. 1312 00:54:33,936 --> 00:54:37,526 And to create that illusion, it will take one gigabyte's worth 1313 00:54:37,526 --> 00:54:40,716 of programs and files you have open, temporarily copy them 1314 00:54:40,716 --> 00:54:43,886 from RAM to your hard drive somewhere and then voila, 1315 00:54:43,946 --> 00:54:46,286 you have a gigabyte that you can use for new programs. 1316 00:54:46,496 --> 00:54:48,946 But which is slower, RAM or hard drives? 1317 00:54:48,946 --> 00:54:51,526 Well the short answer is a hard drive is generally something 1318 00:54:51,526 --> 00:54:53,736 mechanical, though this is becoming less and less true, 1319 00:54:54,016 --> 00:54:56,686 anything mechanical is gonna be slower than anything electronic 1320 00:54:56,686 --> 00:54:58,106 and RAM is purely electronic. 1321 00:54:58,466 --> 00:55:00,116 So, one of the reasons your computer slows 1322 00:55:00,116 --> 00:55:01,896 down when you're doing lots of things is 1323 00:55:01,896 --> 00:55:04,746 because you're using virtual memory, hard disk space 1324 00:55:04,806 --> 00:55:05,776 as though it were RAM.
1325 00:55:06,186 --> 00:55:08,406 But here is the security worry, even if you're good 1326 00:55:08,406 --> 00:55:10,256 about deleting your browser's cache 1327 00:55:10,256 --> 00:55:13,136 and you even empty securely your recycle bin or trash can, 1328 00:55:13,136 --> 00:55:15,746 it doesn't matter because those sketchy photos you had opened 1329 00:55:15,746 --> 00:55:18,526 in your program might have been temporarily put 1330 00:55:18,526 --> 00:55:21,926 into virtual memory which means put into some special part 1331 00:55:21,926 --> 00:55:25,126 of the hard disk that you don't have easy access to delete. 1332 00:55:25,306 --> 00:55:28,406 So unless you enable something and I think if I go 1333 00:55:28,406 --> 00:55:34,176 in a Mac's System Preferences, Security, General-- yeah. 1334 00:55:34,556 --> 00:55:36,436 So, it's grayed out right now but this option here, 1335 00:55:36,486 --> 00:55:39,516 use secure virtual memory is checked here by default. 1336 00:55:39,786 --> 00:55:43,016 That means that even your virtual memory is encrypted. 1337 00:55:43,016 --> 00:55:45,346 So, we'll come back to this in problem set 5, 1338 00:55:45,346 --> 00:55:49,216 but realize that there are a lot of ways to both cover 1339 00:55:49,216 --> 00:55:52,256 or accidentally leave uncovered your tracks. 1340 00:55:52,476 --> 00:55:53,666 But we need now some way 1341 00:55:53,906 --> 00:55:56,296 of representing structures like this. 1342 00:55:56,296 --> 00:55:57,606 Up until now, the only kinds 1343 00:55:57,606 --> 00:56:01,136 of data structures we've had are things like chars and ints 1344 00:56:01,136 --> 00:56:03,916 and floats and slightly more fancy, arrays. 1345 00:56:04,156 --> 00:56:06,916 But even an array has just been a contiguous sequence 1346 00:56:07,116 --> 00:56:08,426 of ints or chars.
1347 00:56:08,696 --> 00:56:11,346 But what if we actually wanna represent something like a table 1348 00:56:11,446 --> 00:56:15,356 like this or a file or maybe even more familiar lately, 1349 00:56:15,356 --> 00:56:15,876 a student. 1350 00:56:15,946 --> 00:56:18,276 And a student might have a name and an ID 1351 00:56:18,486 --> 00:56:20,886 or you can imagine any number of real world entities, 1352 00:56:20,886 --> 00:56:22,616 that'd be nice to kind of represent 1353 00:56:22,786 --> 00:56:25,806 with your new custom data type. 1354 00:56:26,006 --> 00:56:27,106 So we can actually do this. 1355 00:56:27,496 --> 00:56:30,766 So this is a file this week called struct.h. 1356 00:56:31,266 --> 00:56:33,006 And notice what you can do here. 1357 00:56:33,346 --> 00:56:35,546 There's a new keyword that we've actually seen before 1358 00:56:35,546 --> 00:56:36,986 but we're using it now proactively 1359 00:56:36,986 --> 00:56:38,266 for the first time called typedef 1360 00:56:38,266 --> 00:56:40,036 and another one called struct. 1361 00:56:40,506 --> 00:56:43,946 And even though this is slightly new syntax, what this chunk 1362 00:56:43,946 --> 00:56:48,276 of code means is declare a new type, similar in spirit to ints 1363 00:56:48,276 --> 00:56:52,046 and floats but a custom one, whose structure looks like that. 1364 00:56:52,046 --> 00:56:55,556 Inside of apparently a struct that's gonna be called student 1365 00:56:55,836 --> 00:56:59,636 is an integer called ID, a string called name 1366 00:56:59,776 --> 00:57:01,236 and a string called house. 1367 00:57:01,536 --> 00:57:03,136 So, I could literally write string but again, 1368 00:57:03,166 --> 00:57:04,876 I'm trying to take off the training wheels 1369 00:57:04,876 --> 00:57:06,046 of the CS50 library. 1370 00:57:06,316 --> 00:57:10,826 But this here means give me a new variable type called student 1371 00:57:11,046 --> 00:57:12,866 inside of which are 3 things.
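[ Editor's sketch ] Going by the description above, the type declared in struct.h presumably looks something like the following. The transcript does not show the file itself, so the exact field spelling (id versus ID) and layout are assumptions. The two strings are written as char * rather than string, matching the remark about taking off the training wheels of the CS50 library.

```c
/* Reconstruction of the type described in the lecture; the field names
   and their order are assumptions based on the spoken description. */
typedef struct
{
    int id;        /* the student's ID number */
    char *name;    /* pointer to the student's name */
    char *house;   /* pointer to the student's house */
}
student;
```

With this in scope, student can be used anywhere a built-in type name such as int could be, for instance to declare a single variable or an array.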
1372 00:57:13,206 --> 00:57:14,296 Now, why is this useful? 1373 00:57:14,296 --> 00:57:18,736 Well, let me open up this file structs1.c. If I scroll 1374 00:57:18,736 --> 00:57:20,896 down here, notice a couple of things. 1375 00:57:20,896 --> 00:57:23,036 One, I'm kind of including some familiar 1376 00:57:23,036 --> 00:57:25,086 or friendly things 'cause I wanna use GetString 1377 00:57:25,086 --> 00:57:26,606 and some other stuff, printf and the like. 1378 00:57:26,906 --> 00:57:27,756 But notice this too. 1379 00:57:27,926 --> 00:57:30,676 Now that I have my own header file as we've had in some 1380 00:57:30,676 --> 00:57:33,996 of our own P sets, I have to include my own file. 1381 00:57:33,996 --> 00:57:36,876 And any time you're including a file you wrote, you use quotes. 1382 00:57:37,166 --> 00:57:39,456 Any time you're using a file someone else wrote, 1383 00:57:39,456 --> 00:57:42,236 you use angled brackets, so that's the one subtlety there. 1384 00:57:42,536 --> 00:57:44,596 I'm apparently using this trick called the constant 1385 00:57:44,666 --> 00:57:47,356 so that the total number of students in this program is 3. 1386 00:57:47,596 --> 00:57:49,046 And let's before we look at the code, 1387 00:57:49,046 --> 00:57:50,486 just look at what this thing is gonna do. 1388 00:57:50,486 --> 00:57:51,736 So this is structs 1. 1389 00:57:52,276 --> 00:57:55,686 So, let me go into my code and do make structs 1. 1390 00:57:55,686 --> 00:57:58,796 Let me go ahead and run structs 1, alright. 1391 00:57:58,856 --> 00:58:03,046 So, a student's ID is 1, student's name is David, Mather. 1392 00:58:03,046 --> 00:58:04,596 Student's ID is 2. 1393 00:58:04,596 --> 00:58:06,006 Let's say this is Rob. 1394 00:58:06,006 --> 00:58:07,216 This is Kirkland. 1395 00:58:07,606 --> 00:58:09,606 And 3, this is Matt. 1396 00:58:09,716 --> 00:58:10,916 This is Kirkland. 1397 00:58:11,906 --> 00:58:14,516 Okay, that's the only thing I did with this program, right? 
1398 00:58:14,516 --> 00:58:15,616 David didn't matter. 1399 00:58:15,616 --> 00:58:17,266 So, how did we do this, right? 1400 00:58:17,916 --> 00:58:19,516 So clearly, we threw away two-thirds 1401 00:58:19,516 --> 00:58:20,856 of the information we collected here. 1402 00:58:21,236 --> 00:58:22,646 But, what's actually going on? 1403 00:58:22,706 --> 00:58:23,566 How did I store these? 1404 00:58:23,566 --> 00:58:24,956 Now, to take a step back, 1405 00:58:24,956 --> 00:58:28,466 you could totally implement this program in like week 1, right? 1406 00:58:28,466 --> 00:58:32,546 You could have 9 variables, student ID 1, student name 1, 1407 00:58:32,546 --> 00:58:34,816 student house 1, then you can have student ID 2, 1408 00:58:34,816 --> 00:58:36,336 student house 2, student name 2. 1409 00:58:36,336 --> 00:58:37,466 And then you could just kind of come 1410 00:58:37,466 --> 00:58:38,686 up with some arbitrary convention 1411 00:58:38,686 --> 00:58:39,966 like numbering your variables. 1412 00:58:40,266 --> 00:58:41,246 But you have 9 of them. 1413 00:58:41,576 --> 00:58:43,596 And conceptually, this should hopefully start 1414 00:58:43,596 --> 00:58:44,536 to rub you the wrong way. 1415 00:58:44,536 --> 00:58:47,976 This is a little inelegant, just to store 3 real world entities 1416 00:58:47,976 --> 00:58:51,006 like teaching staff, I now need 9 variables and I need 1417 00:58:51,006 --> 00:58:53,066 to give them all separate names, I kind of like 1418 00:58:53,066 --> 00:58:56,246 to have a variable called student 1, student 2, student 3, 1419 00:58:56,246 --> 00:58:58,256 something super simple inside 1420 00:58:58,256 --> 00:59:00,246 of which are the nitty-gritty details. 1421 00:59:00,506 --> 00:59:02,056 So that's exactly what we're doing here. 
1422 00:59:02,476 --> 00:59:05,026 Notice if I scroll down to my main program here, 1423 00:59:05,456 --> 00:59:06,926 notice in my main function, 1424 00:59:07,156 --> 00:59:09,616 I first declare an array of students. 1425 00:59:09,896 --> 00:59:13,936 And I can now use jargon that just sounds more natural to me. 1426 00:59:13,936 --> 00:59:16,796 I want a student data variable. 1427 00:59:16,796 --> 00:59:20,956 I'm gonna call this a class of students and how many do I want? 1428 00:59:20,956 --> 00:59:21,986 Well, this is just 3. 1429 00:59:22,096 --> 00:59:23,686 Remember that we hard coded that up above. 1430 00:59:24,006 --> 00:59:26,866 So this means give me an array of 3 students and call 1431 00:59:26,866 --> 00:59:28,806 that array class which is just kind 1432 00:59:28,806 --> 00:59:29,876 of consistent with the idea. 1433 00:59:30,156 --> 00:59:32,576 Here is a for loop that iterates from 0 up to but not including 3. 1434 00:59:32,776 --> 00:59:34,666 And then I just use some familiar functions. 1435 00:59:34,666 --> 00:59:39,096 I use GetInt and GetString and GetString but notice the syntax. 1436 00:59:39,096 --> 00:59:41,206 This is one new piece of syntax and that's it. 1437 00:59:41,386 --> 00:59:44,956 To get the ith student in the class, I do class bracket i. 1438 00:59:44,956 --> 00:59:48,436 But if I wanna go inside of that student structure and say, 1439 00:59:48,436 --> 00:59:51,806 edit its ID number, I just say dot id 1440 00:59:51,956 --> 00:59:54,586 or I say dot name or I say dot house. 1441 00:59:54,786 --> 00:59:56,526 So this is a way of kind of clumping 1442 00:59:56,526 --> 01:00:00,266 up together multiple variables, an int, a char star, a char star 1443 01:00:00,406 --> 01:00:02,196 but thinking of them and programming them 1444 01:00:02,196 --> 01:00:04,686 as though they're one bigger entity like a student 1445 01:00:04,926 --> 01:00:08,786 but still having access to all the nitty-gritty details.
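[ Editor's sketch ] The array-plus-dot syntax just described can be shown without the CS50 library. In the sketch below, the names fill_class and copy_string are hypothetical stand-ins: the lecture's program reads each field from the user with GetInt and GetString, whereas here the values come from two arrays so the example is self-contained. copy_string allocates on the heap to mimic the fact that GetString hands back malloc'd memory.

```c
#include <stdlib.h>
#include <string.h>

#define STUDENTS 3

/* Assumed shape of the student type described in the lecture. */
typedef struct
{
    int id;
    char *name;
    char *house;
}
student;

/* Hypothetical helper: returns a heap-allocated copy of s (or NULL). */
static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
    {
        strcpy(copy, s);
    }
    return copy;
}

/* Fills class[0] through class[STUDENTS - 1], one field at a time. */
void fill_class(student class[], const char *names[], const char *houses[])
{
    for (int i = 0; i < STUDENTS; i++)
    {
        class[i].id = i + 1;                      /* class bracket i dot id */
        class[i].name = copy_string(names[i]);    /* ... dot name */
        class[i].house = copy_string(houses[i]);  /* ... dot house */
    }
}
```

The point of the design is that the caller handles one array of three student values instead of nine separately named variables, while each field remains reachable through the dot.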
1446 01:00:09,156 --> 01:00:11,526 >> So, here's how only I was printed. 1447 01:00:11,826 --> 01:00:14,626 If I iterate then over all of these students again 1448 01:00:14,626 --> 01:00:17,696 in the array, recall this function, string comparison, 1449 01:00:18,126 --> 01:00:24,126 so if the ith student in the class's house equals "Mather" 1450 01:00:24,216 --> 01:00:26,536 and I check that by checking for equals equals 0, 1451 01:00:26,536 --> 01:00:28,106 remember that's what the string comparison does, 1452 01:00:28,196 --> 01:00:29,516 it returns 0 if they're equal. 1453 01:00:29,776 --> 01:00:32,756 I print out David or whoever is in Mather. 1454 01:00:32,756 --> 01:00:34,326 And how do I get at David's name? 1455 01:00:34,606 --> 01:00:36,546 class bracket i dot name. 1456 01:00:36,796 --> 01:00:39,586 But there is one thing I have to do now to get into this habit, 1457 01:00:39,866 --> 01:00:41,546 notice at the very bottom, and I had this 1458 01:00:41,546 --> 01:00:43,136 in my last example, I call free. 1459 01:00:43,326 --> 01:00:44,786 Free is the opposite of malloc. 1460 01:00:45,086 --> 01:00:48,826 Any, any, any time you call malloc, it is up to you 1461 01:00:48,826 --> 01:00:52,026 and it's expected of you to call free at some point, 1462 01:00:52,026 --> 01:00:54,306 not right away 'cause that would kind of defeat the purpose 1463 01:00:54,666 --> 01:00:57,476 but eventually before you actually exit your program 1464 01:00:57,476 --> 01:01:00,876 or return from main, so what I'm doing here is I'm freeing name 1465 01:01:01,086 --> 01:01:04,366 and house for all 3 students but I'm not freeing ID. 1466 01:01:05,016 --> 01:01:09,326 Why? Because what? 1467 01:01:09,326 --> 01:01:10,216 [ Inaudible Remark ] 1468 01:01:10,216 --> 01:01:12,196 >> I think you're right, they can't hear you.
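[ Editor's sketch ] The string-comparison loop described above might be written along these lines. The function name print_house_members, the message format, and the returned match count are assumptions added so the behavior can be checked; the lecture's program does this inline in main and only for "Mather". The key detail is that strcmp returns 0 when the two strings are equal, which is why the condition compares against 0.

```c
#include <stdio.h>
#include <string.h>

/* Assumed shape of the student type described in the lecture. */
typedef struct
{
    int id;
    char *name;
    char *house;
}
student;

/* Prints the name of every student whose house matches, and returns
   how many matched. strcmp yields 0 for equal strings. */
int print_house_members(const student class[], int n, const char *house)
{
    int matches = 0;
    for (int i = 0; i < n; i++)
    {
        if (strcmp(class[i].house, house) == 0)
        {
            printf("%s is in %s.\n", class[i].name, house);
            matches++;
        }
    }
    return matches;
}
```

Comparing class[i].house to a string with == instead would compare two addresses rather than the characters they point to, which is why the library function is needed.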
1469 01:01:12,196 --> 01:01:13,626 [ Inaudible Remark ] 1470 01:01:13,626 --> 01:01:15,276 >> Okay, so I didn't print it 1471 01:01:15,416 --> 01:01:17,936 but why did I make a conscious decision not to free ID? 1472 01:01:17,936 --> 01:01:19,706 >> You didn't use malloc. 1473 01:01:19,706 --> 01:01:20,436 >> I didn't use malloc. 1474 01:01:20,436 --> 01:01:21,616 It's as simple as that. 1475 01:01:21,616 --> 01:01:24,456 Because in my .h file, notice my header file, 1476 01:01:24,636 --> 01:01:28,416 because in my header file, I said that a student is an int 1477 01:01:28,746 --> 01:01:30,516 and a char star and a char star. 1478 01:01:30,806 --> 01:01:34,356 Notice there's no malloc here but I did assign to name 1479 01:01:34,356 --> 01:01:36,216 and house the return value of what function? 1480 01:01:37,586 --> 01:01:39,486 GetString, GetString uses malloc. 1481 01:01:39,626 --> 01:01:42,636 And so here too is this dirty little secret that I admitted 1482 01:01:42,636 --> 01:01:45,666 to earlier, all this time we've been writing technically 1483 01:01:45,886 --> 01:01:46,806 buggy programs. 1484 01:01:47,046 --> 01:01:49,106 But the upside is my god, we didn't have to think about 1485 01:01:49,106 --> 01:01:50,846 or talk about pointers in the first week, 1486 01:01:50,846 --> 01:01:52,926 we could just use GetString and get a string from the user. 1487 01:01:53,166 --> 01:01:55,126 But now, any time you call GetString, 1488 01:01:55,126 --> 01:01:58,206 which will soon be no more or ultimately call malloc, 1489 01:01:58,206 --> 01:01:59,256 you have to free memory. 1490 01:01:59,256 --> 01:02:01,566 Otherwise, your program will "leak" 1491 01:02:01,856 --> 01:02:03,656 and that generally results in slowdowns 1492 01:02:03,656 --> 01:02:07,346 and it ultimately involves incorrectness of programs.
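[ Editor's sketch ] The cleanup rule above, free what came from malloc and nothing else, could be captured in a helper like this. The name free_class is hypothetical, and resetting the pointers to NULL afterwards is an added safety habit rather than something the lecture mentions. Note that id is never passed to free: it is a plain int stored inside the struct itself, not a pointer to heap memory.

```c
#include <stdlib.h>

/* Assumed shape of the student type described in the lecture. */
typedef struct
{
    int id;
    char *name;
    char *house;
}
student;

/* Releases the heap memory behind each student's name and house.
   Assumes both pointers came from malloc (directly or via something
   like GetString) or are NULL; free(NULL) is a harmless no-op. */
void free_class(student class[], int n)
{
    for (int i = 0; i < n; i++)
    {
        free(class[i].name);
        free(class[i].house);

        /* Guard against accidental reuse of the dangling pointers. */
        class[i].name = NULL;
        class[i].house = NULL;
    }
}
```

Calling this once, just before returning from main, pairs every earlier allocation with exactly one free.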
1493 01:02:08,036 --> 01:02:12,936 So I thought I would disclose or emphasize all the more 1494 01:02:12,936 --> 01:02:17,366 with a concrete example what is and is not possible 1495 01:02:17,366 --> 01:02:21,456 in popular culture in terms of technical shows like this. 1496 01:02:21,826 --> 01:02:25,356 So I dug out a 30 or so second clip from an actual TV show. 1497 01:02:25,666 --> 01:02:27,126 We'll then counterbalance it 1498 01:02:27,296 --> 01:02:29,786 with what really would happen had they consulted anyone 1499 01:02:29,786 --> 01:02:32,576 remotely technical before shooting this episode. 1500 01:02:33,386 --> 01:02:39,036 So, here we go. 1501 01:02:39,036 --> 01:02:39,476 [ Inaudible Remark ] 1502 01:02:39,476 --> 01:02:39,916 [ Background Music ] 1503 01:02:39,916 --> 01:02:41,656 >> Back on. 1504 01:02:41,656 --> 01:02:41,976 >> What do you see? 1505 01:02:42,516 --> 01:02:48,516 [ Music ] 1506 01:02:49,016 --> 01:02:49,083 [ Background Music ] 1507 01:02:49,146 --> 01:02:53,896 >> Bring the space up, full screen. 1508 01:02:53,896 --> 01:02:54,206 >> His glasses. 1509 01:02:54,576 --> 01:02:55,976 >> There's a reflection. 1510 01:02:56,516 --> 01:03:00,766 [ Music ] 1511 01:03:01,266 --> 01:03:05,516 [ Noise ] 1512 01:03:06,016 --> 01:03:06,083 [ Background Music ] 1513 01:03:06,346 --> 01:03:07,456 >> It's a movie that has baseball team. 1514 01:03:07,936 --> 01:03:08,516 That's a logo. 1515 01:03:08,596 --> 01:03:14,016 >> And he's talking to whoever is wearing that jacket. 1516 01:03:14,016 --> 01:03:14,276 >> Okay. 1517 01:03:14,276 --> 01:03:14,343 [ Laughter ] 1518 01:03:14,343 --> 01:03:15,916 >> So, any time you hear someone say, again, 1519 01:03:15,916 --> 01:03:16,676 can you clean that up? 1520 01:03:16,676 --> 01:03:18,176 Can you enhance that? 1521 01:03:18,176 --> 01:03:18,786 You can't. 
1522 01:03:19,016 --> 01:03:20,436 If you see a little glimmer 1523 01:03:20,436 --> 01:03:23,486 in someone's eye, for instance Rob's-- 1524 01:03:23,486 --> 01:03:23,976 [ Laughter ] 1525 01:03:23,976 --> 01:03:28,786 >> And you try to zoom in on that little white speck 1526 01:03:28,966 --> 01:03:31,316 of some reflection there as I can do 1527 01:03:31,316 --> 01:03:34,006 with some consumer program, let's call it Photoshop. 1528 01:03:34,296 --> 01:03:37,876 Let me drag suspect.jpg into Photoshop. 1529 01:03:37,876 --> 01:03:41,106 I'm going to zoom in on something suspicious here 1530 01:03:41,106 --> 01:03:42,846 in his eye and I'm gonna zoom 1531 01:03:43,556 --> 01:03:46,676 and I'm gonna zoom and I'm gonna zoom. 1532 01:03:47,416 --> 01:03:49,606 Okay, so that glimmer in his eye as well 1533 01:03:49,606 --> 01:03:53,786 as CSI's eye, 2 pixels of color. 1534 01:03:53,786 --> 01:03:56,826 This is what happens when you enhance an actual image 1535 01:03:56,826 --> 01:03:57,556 in reality. 1536 01:03:57,596 --> 01:03:59,216 So I give you Rob Bowden's [phonetic] eye. 1537 01:03:59,656 --> 01:04:00,946 This is CS50 end of week 5. 1538 01:04:00,946 --> 01:04:01,976 We will see you next week. 1539 01:04:02,516 --> 01:04:07,043 [ Applause ] 1540 01:04:07,543 --> 01:04:12,070 [ Music ]