1 00:00:08,996 --> 00:00:11,706 >> Welcome back to CS 50. 2 00:00:11,706 --> 00:00:13,006 This is Week 3. 3 00:00:13,256 --> 00:00:15,036 And on the evaluations at the end 4 00:00:15,036 --> 00:00:17,956 of last fall we got a comment from a student 5 00:00:17,956 --> 00:00:20,916 about how they apparently were in incentivized to come 6 00:00:20,916 --> 00:00:24,026 to class early simply because we started most every lecture 7 00:00:24,026 --> 00:00:25,356 with a little something stupid. 8 00:00:25,356 --> 00:00:27,476 So I thought we'd begin two minutes early 9 00:00:27,476 --> 00:00:30,756 so that we can squeeze in two such minutes of motivation. 10 00:00:30,976 --> 00:00:34,186 This one related, of course, to Problem Set 2, which for those 11 00:00:34,186 --> 00:00:36,966 of you doing the hacker edition challenges you not 12 00:00:36,966 --> 00:00:40,806 to implement ciphers but to do the reverse, cryptanalysis, 13 00:00:40,936 --> 00:00:44,196 and actually decrypt some known people's passwords. 14 00:00:44,276 --> 00:00:49,846 So with that, I give you this perhaps famous clip. 15 00:00:50,936 --> 00:00:54,866 >> What's going on, what are you doing to my daughter? 16 00:00:55,376 --> 00:00:58,876 >> Permit me to introduce the brilliant young plastic surgeon, 17 00:00:59,246 --> 00:01:03,116 Dr. Philip Slotkin, the greatest nose job man 18 00:01:03,116 --> 00:01:05,606 in the entire universe and Beverly Hills. 19 00:01:06,226 --> 00:01:07,666 >> Your Highness. 20 00:01:07,666 --> 00:01:10,236 >> Nose job, I don't understand, she's already had a nose job. 21 00:01:10,236 --> 00:01:12,356 It was her sweet 16 present. 22 00:01:12,356 --> 00:01:14,296 >> No, it's not what you think. 23 00:01:14,486 --> 00:01:15,836 It's much, much worse. 24 00:01:15,836 --> 00:01:18,626 If you do not give me the combination 25 00:01:18,626 --> 00:01:23,336 to the [Inaudible] Dr. Slotkin will give your daughter back her 26 00:01:23,336 --> 00:01:23,726 old nose! 27 00:01:24,476 --> 00:01:29,586 >> No! Where did you get that? 28 00:01:31,086 --> 00:01:32,616 >> All right, I'll tell, I'll tell. 29 00:01:32,696 --> 00:01:34,736 >> No, daddy, no, you mustn't. 30 00:01:35,716 --> 00:01:36,536 >> You're right, my dear. 31 00:01:37,276 --> 00:01:38,436 I'll miss your new nose, 32 00:01:38,556 --> 00:01:43,726 but I will not tell him the combination no matter what. 33 00:01:43,806 --> 00:01:44,156 >> Very well. 34 00:01:44,646 --> 00:01:45,536 Dr. Slotkin, do your worst. 35 00:01:46,116 --> 00:01:47,266 >> My pleasure. 36 00:01:47,936 --> 00:01:50,666 >> No! Wait, wait! 37 00:01:51,416 --> 00:01:52,636 I'll tell. 38 00:01:54,706 --> 00:01:56,056 I'll tell. 39 00:01:56,056 --> 00:01:58,096 >> I knew it would work. 40 00:01:58,096 --> 00:02:01,696 All right, give it to me. 41 00:02:02,586 --> 00:02:05,966 >> The combination is 1. 42 00:02:06,626 --> 00:02:07,546 >> One. 43 00:02:07,546 --> 00:02:08,056 >> One. 44 00:02:08,806 --> 00:02:09,046 >> Two. 45 00:02:09,526 --> 00:02:09,976 >> Two. 46 00:02:11,046 --> 00:02:11,556 >> Two. 47 00:02:12,056 --> 00:02:12,256 >> Three. 48 00:02:12,966 --> 00:02:13,126 >> Three. 49 00:02:13,126 --> 00:02:13,216 >> Three. 50 00:02:13,866 --> 00:02:14,086 >> Four. 51 00:02:14,086 --> 00:02:14,386 >> Four. 52 00:02:14,926 --> 00:02:15,156 >> Four. 53 00:02:16,556 --> 00:02:17,196 >> Five. 54 00:02:17,556 --> 00:02:17,846 >> Five. 55 00:02:18,366 --> 00:02:18,656 >> Five. 56 00:02:19,146 --> 00:02:22,976 >> So the combination is one, two, three, four, five. 57 00:02:23,526 --> 00:02:26,316 >> That's the stupidest combination I ever heard 58 00:02:26,316 --> 00:02:27,756 in my life, that's the kind 59 00:02:27,756 --> 00:02:28,976 of thing an idiot would have on his luggage. 60 00:02:29,116 --> 00:02:31,336 >> Thank you, your Highness. 61 00:02:32,576 --> 00:02:32,986 >> What did you do? 62 00:02:33,676 --> 00:02:34,296 >> I turned off the [Inaudible] -- 63 00:02:34,876 --> 00:02:36,886 >> No you didn't, you turned off the whole movie. 64 00:02:37,096 --> 00:02:41,246 >> I must have pressed the wrong button. 65 00:02:41,276 --> 00:02:42,236 >> Put it back on, put the movie back on. 66 00:02:42,266 --> 00:02:42,986 Got to get that thing fixed. 67 00:02:43,016 --> 00:02:43,976 We're back, and we have the combination. 68 00:02:45,296 --> 00:02:45,506 >> What? 69 00:02:46,546 --> 00:02:48,116 >> We're done with you. 70 00:02:48,396 --> 00:02:52,866 Go back to the golf course and work on your putts. 71 00:02:52,926 --> 00:02:55,126 >> Let's go, Arnold, come Gretchen. 72 00:02:55,246 --> 00:02:57,976 Of course you know I'll still have to bill you for this. 73 00:02:58,156 --> 00:03:03,716 >> Bet she gives great helmet. 74 00:03:04,186 --> 00:03:05,596 >> Did it work? 75 00:03:05,596 --> 00:03:05,916 Where's the key? 76 00:03:05,916 --> 00:03:08,956 >> It works sir, we have the combination. 77 00:03:08,956 --> 00:03:11,186 >> Great. Now we can take every last breath of fresh air 78 00:03:11,186 --> 00:03:12,976 from plant [Inaudible] What's the combination? 79 00:03:13,696 --> 00:03:15,886 >> One, two, three, four, five. 80 00:03:16,076 --> 00:03:17,566 >> One, two, three, four, five? 81 00:03:17,756 --> 00:03:17,936 >> Yes. 82 00:03:17,936 --> 00:03:18,866 >> That's amazing. 83 00:03:18,866 --> 00:03:22,186 I've got the same combination on my Luggage! 84 00:03:22,186 --> 00:03:24,876 Prepare, Space Ball One for immediate departure. 85 00:03:26,156 --> 00:03:26,476 >> Yes sir. 86 00:03:26,476 --> 00:03:31,976 >> And change the combination on my luggage. 87 00:03:32,046 --> 00:03:34,656 >> Okay, it really doesn't get stupider than that. 88 00:03:34,656 --> 00:03:36,716 So this is CS 50. 89 00:03:37,156 --> 00:03:40,446 We left off last week looking at cryptography, the motivation 90 00:03:40,446 --> 00:03:41,656 for which ultimately was 91 00:03:41,686 --> 00:03:45,296 to introduce a problem domain for Problem Set 2. 92 00:03:45,546 --> 00:03:47,966 But more than that, it was to make clear this reality 93 00:03:47,966 --> 00:03:50,216 that if you've already sat down to tackle P Set 2, 94 00:03:50,426 --> 00:03:51,706 or will soon this week, 95 00:03:52,276 --> 00:03:54,806 what's ultimately a fairly interesting idea, 96 00:03:54,806 --> 00:03:57,626 taking plain text, converting it to cipher text, 97 00:03:57,626 --> 00:04:00,946 and generally doing something useful, encrypting information, 98 00:04:01,256 --> 00:04:04,696 it really reduces to some things that are very, very simple. 99 00:04:04,696 --> 00:04:07,466 You're going to find that taking in plain text is a matter 100 00:04:07,466 --> 00:04:10,326 of calling something like get string, which of course returns 101 00:04:10,326 --> 00:04:13,126 to you a string, which of course is just an array of characters. 102 00:04:13,296 --> 00:04:16,106 Well, encrypting text, then, is really going to reduce itself 103 00:04:16,356 --> 00:04:19,956 to just iterating over that array from I equals 0 on up 104 00:04:19,956 --> 00:04:22,586 on the length of the string, and then doing some kind 105 00:04:22,586 --> 00:04:24,316 of manipulation of each of the characters. 106 00:04:24,316 --> 00:04:25,976 Maybe in the case of Caesar, some addition. 107 00:04:26,096 --> 00:04:28,756 But there's a problem with the Caesar cipher. 108 00:04:28,756 --> 00:04:30,506 So the Caesar cipher is this -- 109 00:04:31,096 --> 00:04:33,406 this algorithm via which you take some text 110 00:04:33,406 --> 00:04:35,676 and quote unquote rotate each of the characters 111 00:04:35,676 --> 00:04:36,806 by some number of places. 112 00:04:37,086 --> 00:04:39,446 And rot 13 [Assumed spelling] was a very specific instance 113 00:04:39,446 --> 00:04:42,116 of that, where the secret rotational value, 114 00:04:42,346 --> 00:04:45,446 the so-called key was in that case 13. 115 00:04:45,446 --> 00:04:46,376 But we can generalize. 116 00:04:46,376 --> 00:04:48,946 It can be anything, from 0 on up to infinity. 117 00:04:49,216 --> 00:04:51,436 But of course only a few of those, say 26, 118 00:04:51,436 --> 00:04:53,486 actually have some utility. 119 00:04:53,786 --> 00:04:57,796 How would you go about cracking, that is figuring 120 00:04:57,796 --> 00:04:59,976 out someone's password, if it were encrypted 121 00:04:59,976 --> 00:05:01,106 with the Caesar cipher? 122 00:05:01,826 --> 00:05:04,236 If I hand you a string of text, it looks like nonsense 123 00:05:04,236 --> 00:05:08,226 because it's been encrypted with rot 13, not twice, but once. 124 00:05:08,486 --> 00:05:08,976 What would you do? 125 00:05:09,016 --> 00:05:09,846 [ Inaudible audience comment ] 126 00:05:09,846 --> 00:05:12,166 >> What's that? 127 00:05:13,336 --> 00:05:14,886 Okay, so frequency analysis. 128 00:05:14,916 --> 00:05:17,856 So a fancy way of saying, well, there's some letters 129 00:05:17,856 --> 00:05:19,946 in the alphabet that are more popular than others. 130 00:05:19,946 --> 00:05:22,036 If you've ever watched Wheel of Fortune, you'll realize 131 00:05:22,036 --> 00:05:23,356 that most contestants know this. 132 00:05:23,626 --> 00:05:26,886 So maybe you look for the most frequently reoccurring letter 133 00:05:26,886 --> 00:05:32,436 in the cipher text and realize, hmm, Q, Q appears an awful lot. 134 00:05:32,436 --> 00:05:34,166 But wait a minute, this has been encrypted. 135 00:05:34,316 --> 00:05:36,856 It's probably the case that Q is the result 136 00:05:36,856 --> 00:05:41,616 of encrypting a more common letter, maybe E, for instance, 137 00:05:41,616 --> 00:05:43,326 and getting Q as a result. 138 00:05:43,326 --> 00:05:45,886 So you might be able to reverse engineer what the secret key is 139 00:05:46,066 --> 00:05:47,326 by making some inferences. 140 00:05:47,326 --> 00:05:49,036 Or even if you're not that clever 141 00:05:49,036 --> 00:05:51,216 and you've never heard this term frequency analysis, 142 00:05:51,256 --> 00:05:53,866 but you know it's been encrypted at least with Caesar, I mean, 143 00:05:53,866 --> 00:05:56,316 what is another approach, a correct approach, 144 00:05:56,316 --> 00:05:57,526 if naive approach, even. 145 00:05:57,576 --> 00:05:57,643 Yeah? 146 00:05:58,016 --> 00:05:58,766 [ Inaudible audience comment ] 147 00:05:58,766 --> 00:06:03,436 >> Yeah, that's only 26 keys that might have been used 148 00:06:03,436 --> 00:06:03,976 if we're talking about Caesar. 149 00:06:04,046 --> 00:06:05,736 So yeah, might be a little tedious, 150 00:06:05,736 --> 00:06:10,006 but it's not going take forever just to try all possible keys. 151 00:06:10,006 --> 00:06:12,376 And in fact, you don't have to try all possible keys 152 00:06:12,376 --> 00:06:13,636 on the entire message, 153 00:06:13,636 --> 00:06:15,706 especially if this is an entire letter or paragraph 154 00:06:15,706 --> 00:06:17,336 or whatever that's been encrypted. 155 00:06:17,426 --> 00:06:19,336 It will try the first few letters, 156 00:06:19,336 --> 00:06:22,436 and if you start seeing English pop out, or maybe French 157 00:06:22,436 --> 00:06:25,466 or Spanish or anything that can be expressed with A through Z, 158 00:06:25,766 --> 00:06:27,506 odds are if it looks like a word, 159 00:06:27,506 --> 00:06:30,226 if it looks like a sentence, bam, you found a key. 160 00:06:30,486 --> 00:06:33,256 So Caesar is not all that strong, and maybe it was 161 00:06:33,256 --> 00:06:35,176 in fact used back in the day, 162 00:06:35,436 --> 00:06:37,686 but there have been certainly improvements on it since. 163 00:06:37,926 --> 00:06:39,536 Because even we with a pace of paper 164 00:06:39,536 --> 00:06:41,436 and a pencil could crack the Caesar cipher. 165 00:06:41,626 --> 00:06:42,956 Well, the Visionaire [Assumed spelling], cipher, 166 00:06:43,186 --> 00:06:45,786 which came along some years later is a little 167 00:06:45,786 --> 00:06:46,956 for sophisticated. 168 00:06:46,956 --> 00:06:50,086 It takes the approach of Caesar, of rotating letters 169 00:06:50,086 --> 00:06:52,106 in the alphabet by some number of places, 170 00:06:52,426 --> 00:06:54,676 but of rotating different letters 171 00:06:54,716 --> 00:06:56,506 by a different number of places. 172 00:06:56,506 --> 00:06:59,306 So instead of using K, the key, the numeric key, 173 00:06:59,586 --> 00:07:02,416 to encrypt every single letter in your plain text, 174 00:07:02,706 --> 00:07:06,356 it uses a different key for each successive letter. 175 00:07:06,356 --> 00:07:09,406 Now that doesn't go on forever, you usually pick a fixed number 176 00:07:09,446 --> 00:07:11,946 of keys, and in fact, if you just need a fixed number 177 00:07:11,946 --> 00:07:14,736 of keys, you just choose what we call a keyword. 178 00:07:14,736 --> 00:07:16,006 So something like foobar. 179 00:07:16,006 --> 00:07:19,576 Now, as an aside, if you've wondered why I constantly resort 180 00:07:19,576 --> 00:07:23,086 to fu and bar and all Baz and all of these strange names 181 00:07:23,086 --> 00:07:24,396 for variables, you'll find 182 00:07:24,396 --> 00:07:26,156 that this is just a silly computer science thing, 183 00:07:26,156 --> 00:07:27,816 a silly programming thing, a convention. 184 00:07:27,816 --> 00:07:30,626 Whenever you need to whip out a random variable for the sake 185 00:07:30,626 --> 00:07:33,466 of discussion, fu is typically one's first instinct. 186 00:07:33,466 --> 00:07:36,226 Bar, is followed by that, and then ucks [Phonetic], 187 00:07:36,226 --> 00:07:38,616 and then there's a whole bunch of silly ones after that. 188 00:07:38,616 --> 00:07:40,986 So foobar is sort of a conical example 189 00:07:40,986 --> 00:07:42,536 of a random word that we might use. 190 00:07:42,846 --> 00:07:46,336 But we just have to realize that each letter in foobar 191 00:07:46,336 --> 00:07:48,766 in this case just represents a different key. 192 00:07:48,996 --> 00:07:52,486 So F represents one key, O represents another. 193 00:07:52,486 --> 00:07:54,106 O happens to represent the same. 194 00:07:54,106 --> 00:07:54,976 B is another key, 195 00:07:55,466 --> 00:07:57,656 so in other words you take your keyword, 196 00:07:57,836 --> 00:08:01,186 you line it up under your plain text, and you rotate each 197 00:08:01,186 --> 00:08:03,546 of the plain text characters by the number 198 00:08:03,596 --> 00:08:06,216 of places implied by the key. 199 00:08:06,216 --> 00:08:07,056 So now what is F? 200 00:08:07,226 --> 00:08:08,896 Well, let's see, if we start counting as zero 201 00:08:08,896 --> 00:08:12,616 as we typically do, A is zero, so then it's B, C, D, E, 202 00:08:12,616 --> 00:08:16,026 F. So F represents the number, the key 5. 203 00:08:16,396 --> 00:08:17,306 So what does that mean? 204 00:08:17,306 --> 00:08:19,966 Well, if my plain text P is hello world, 205 00:08:19,966 --> 00:08:23,886 and I need to rotate H by F, that is by 5 places, 206 00:08:24,146 --> 00:08:27,726 I go H I J K L M. Okay, 207 00:08:27,726 --> 00:08:30,836 so the resulting cipher text letter is M, and then I repeat 208 00:08:30,836 --> 00:08:32,566 that same process using different keys. 209 00:08:32,566 --> 00:08:36,616 Now presumably, I might -- plain text, the message I want 210 00:08:36,616 --> 00:08:39,246 to encrypt, is hopefully going to be longer than the key. 211 00:08:39,486 --> 00:08:41,686 If you're remembering a key that's longer 212 00:08:41,686 --> 00:08:45,436 than your actual message you're losing some efficiency there, 213 00:08:45,816 --> 00:08:46,206 perhaps. 214 00:08:46,206 --> 00:08:48,416 So foobar is only six letters. 215 00:08:48,606 --> 00:08:50,906 So what if our plain text is longer than six letters? 216 00:08:51,106 --> 00:08:54,026 Well, you just start repeating again and again and again. 217 00:08:54,476 --> 00:08:56,506 So let's ask the question for just a moment, then. 218 00:08:56,506 --> 00:08:59,316 If there are 26 possible keys for Caesar, 219 00:08:59,566 --> 00:09:07,166 how many possible keys are there for Visionaire, by contrast? 220 00:09:07,166 --> 00:09:10,876 Little louder, little more confident. 221 00:09:10,876 --> 00:09:12,756 Okay, infinitely many, certainly, 222 00:09:12,756 --> 00:09:15,806 because we can pick any possible keyword of any length. 223 00:09:16,096 --> 00:09:17,076 But let's generalize. 224 00:09:17,076 --> 00:09:22,496 Suppose we have N letters in our keyword for Visionaire. 225 00:09:22,496 --> 00:09:26,326 How many end letter key words are possible for Visionaire? 226 00:09:27,586 --> 00:09:29,366 Right, so 26 to the end. 227 00:09:29,366 --> 00:09:31,846 So if every -- you have an N letter keyword, 228 00:09:31,966 --> 00:09:35,216 six in this case, and each of those letters can be A through Z 229 00:09:35,216 --> 00:09:37,336 through any of 26 possible letters, 230 00:09:37,336 --> 00:09:41,616 you've got 26 options times 26 option, times 26, so that's 26 231 00:09:41,676 --> 00:09:44,066 on the N. And then if I relax the constraint 232 00:09:44,066 --> 00:09:47,346 that the key is six letters, and I say ehh, the key is any number 233 00:09:47,346 --> 00:09:49,166 of letters, then you have even more. 234 00:09:49,166 --> 00:09:53,576 You have 26 to the 2, plus 26 to the 3, plus 26 to the 4, 235 00:09:53,806 --> 00:09:56,276 so in short, the key space is a lot larger. 236 00:09:56,496 --> 00:09:59,656 Now ultimately you could still crack the Visionaire cipher 237 00:09:59,656 --> 00:10:00,366 by a few ways. 238 00:10:00,576 --> 00:10:03,666 You could do some intelligent manipulation, you could look 239 00:10:03,666 --> 00:10:06,146 for patterns, for instance, you could do frequency analysis. 240 00:10:06,506 --> 00:10:08,186 Some fairly sophisticated heuristics, 241 00:10:08,186 --> 00:10:10,406 or you could try all possible keys. 242 00:10:10,686 --> 00:10:12,236 Start with all the one letter keys, 243 00:10:12,266 --> 00:10:14,496 then once you've tried all those and gotten back non sense, 244 00:10:14,716 --> 00:10:17,016 try all the two letter keys, the three letter keys. 245 00:10:17,306 --> 00:10:18,826 And that algorithm, though naive, 246 00:10:19,066 --> 00:10:23,176 would eventually work, at least in theory. 247 00:10:23,176 --> 00:10:24,986 If you have enough time to wait around. 248 00:10:25,216 --> 00:10:27,836 Unfortunately, these days, when you've got two gigahertz 249 00:10:27,836 --> 00:10:30,346 of power underneath your own desk or your laptop, 250 00:10:30,676 --> 00:10:34,806 you can churn through a lot of keys very quickly, 251 00:10:34,806 --> 00:10:38,626 and in fact there's this algorithm called DES, D-E-S, 252 00:10:38,806 --> 00:10:40,956 and this is actually an algorithm that's used 253 00:10:41,256 --> 00:10:44,336 in the function that's typically used on a Linux or UNIX 254 00:10:44,336 --> 00:10:48,976 or Mac os system, still today, to encrypt your own passwords. 255 00:10:49,056 --> 00:10:50,626 So you've got a user name and password, 256 00:10:50,626 --> 00:10:51,836 maybe on your own computer, 257 00:10:52,046 --> 00:10:54,676 certainly on Nice dot F A S dot Harvard.edu now, 258 00:10:54,876 --> 00:10:58,076 and your password, fortunately, is stored in encrypted fashion, 259 00:10:58,296 --> 00:11:00,516 and this is why, for instance, the user assistants, 260 00:11:00,516 --> 00:11:03,556 beg them as you might, cannot tell you your password 261 00:11:03,556 --> 00:11:05,856 if you forget it, because the system stores it 262 00:11:05,856 --> 00:11:06,916 in encrypted fashion. 263 00:11:07,226 --> 00:11:10,256 So this algorithm, this function called DES was used 264 00:11:10,256 --> 00:11:12,526 for some time, many years after Visionaire, 265 00:11:12,706 --> 00:11:15,406 but in more recent modern times, and there were 266 00:11:15,406 --> 00:11:20,326 with DES 72 quadrillion possible keys because of the number 267 00:11:20,326 --> 00:11:22,126 of bits used to encrypt something 268 00:11:22,126 --> 00:11:23,366 with this particular algorithm. 269 00:11:23,626 --> 00:11:25,046 Now we won't go into the specifics, 270 00:11:25,046 --> 00:11:28,576 I put this little snippet from a text book to just kind of hit 271 00:11:28,576 --> 00:11:30,476 at the relative complexity of this. 272 00:11:30,736 --> 00:11:33,026 Visionaire is very simple, you can do it paper-pencil. 273 00:11:33,306 --> 00:11:35,956 DES starts to take a computer or calculator, 274 00:11:36,166 --> 00:11:39,046 as all of these arrows and this flowchart generally implies. 275 00:11:39,046 --> 00:11:42,936 But a computer can compute DES-encrypted text 276 00:11:43,486 --> 00:11:44,296 fairly quickly. 277 00:11:44,726 --> 00:11:45,996 Unfortunately, too quickly. 278 00:11:45,996 --> 00:11:48,986 So the reality is these days if you encrypt something with DES, 279 00:11:49,356 --> 00:11:52,306 you know, if someone has enough time and enough CPU cycles 280 00:11:52,306 --> 00:11:54,696 at their disposal they can crack your password. 281 00:11:54,696 --> 00:11:57,986 And that's in fact, exactly what Problem Set 2's hacker edition 282 00:11:57,986 --> 00:11:58,706 is all about. 283 00:11:58,926 --> 00:12:00,966 Now this is you know, interesting maybe fun 284 00:12:00,966 --> 00:12:04,286 in the context of passwords on some random university's system. 285 00:12:04,606 --> 00:12:08,836 It's a lot more worrisome, if random people, students in CS 50 286 00:12:08,836 --> 00:12:12,586 with a laptop on their lap, can crack information 287 00:12:12,586 --> 00:12:15,246 that has been encrypted with something like DES. 288 00:12:15,246 --> 00:12:16,996 DES is not very secure any more. 289 00:12:16,996 --> 00:12:20,946 56 bit keys, as it once used, no longer very secure. 290 00:12:21,136 --> 00:12:22,016 So there are variants. 291 00:12:22,196 --> 00:12:23,936 In fact, someone came up with the bright idea 292 00:12:23,936 --> 00:12:27,306 of not just using DES, but let's run DES three times, 293 00:12:27,306 --> 00:12:29,466 call it tripped DES or 3 DES. 294 00:12:29,466 --> 00:12:31,876 And that actually is a little more secure, 295 00:12:32,166 --> 00:12:33,406 but not terribly so. 296 00:12:33,406 --> 00:12:36,126 So the world has come up with more sophisticated algorithms. 297 00:12:36,126 --> 00:12:37,956 And if you take or have time to take 298 00:12:37,956 --> 00:12:40,736 over your years here computer science 120, 299 00:12:40,736 --> 00:12:41,976 and there's a cryptography class, 300 00:12:41,976 --> 00:12:43,166 and there's some graduate level ones, 301 00:12:43,396 --> 00:12:45,156 it's really interest space. 302 00:12:45,196 --> 00:12:48,036 But the funny thing is a lot of the security 303 00:12:48,356 --> 00:12:52,016 of today's cryptographic mechanisms ultimately boil 304 00:12:52,016 --> 00:12:53,096 down to assumptions. 305 00:12:53,166 --> 00:12:55,146 The safety of them boils down to assumptions. 306 00:12:55,146 --> 00:12:58,156 In other words, there is this algorithm called R S A, 307 00:12:58,156 --> 00:13:01,266 and you might have seen this listed to various web sites, 308 00:13:01,486 --> 00:13:03,146 you might have heard of this algorithm before. 309 00:13:03,146 --> 00:13:06,216 It basically boils down to a really fancy way 310 00:13:06,216 --> 00:13:08,806 of allowing web sites to communicate 311 00:13:08,806 --> 00:13:11,926 with browsers securely, but that's just a specific case. 312 00:13:11,926 --> 00:13:14,666 You can use it for far many more interesting purposes; 313 00:13:14,666 --> 00:13:16,166 ATM machines, bank transactions, 314 00:13:16,366 --> 00:13:18,346 a whole bunch of interesting stuff. 315 00:13:18,406 --> 00:13:21,766 But it ultimately, the security of R S A, 316 00:13:21,766 --> 00:13:24,486 whether you're using keys that are 56 bits long, 317 00:13:24,486 --> 00:13:27,756 which is pretty short these days, or 128 bits long, 318 00:13:27,756 --> 00:13:32,886 or 1,024 bits long, ultimately, whether 319 00:13:32,886 --> 00:13:35,726 or not this algorithm actually works reduces 320 00:13:35,726 --> 00:13:39,426 to whether mankind can come up with a fast way 321 00:13:39,426 --> 00:13:41,686 of factoring large numbers. 322 00:13:41,686 --> 00:13:45,956 So again, long story short, this algorithm called R S A boils 323 00:13:45,956 --> 00:13:50,176 down to assuming that mankind, and their computers, 324 00:13:50,236 --> 00:13:55,616 cannot take a really big number and factor it really quickly 325 00:13:55,616 --> 00:13:57,466 into two large prime numbers. 326 00:13:58,046 --> 00:14:00,456 So put another way, if you want to encrypt some information 327 00:14:00,456 --> 00:14:02,546 and you want to use an algorithm like R S A, 328 00:14:02,546 --> 00:14:04,936 you take two really big prime numbers, and it's not hard 329 00:14:04,936 --> 00:14:07,436 to generate prime numbers, even big ones, it's not hard 330 00:14:07,436 --> 00:14:08,926 to multiply them together. 331 00:14:09,206 --> 00:14:12,176 But it is much harder to undo that process, 332 00:14:12,176 --> 00:14:14,226 when all you do is hand an adversary 333 00:14:14,226 --> 00:14:16,946 or hand a computer a really big number and then say hey, 334 00:14:16,946 --> 00:14:18,456 this just so happens to be the product 335 00:14:18,456 --> 00:14:20,956 of two really big primes, tell me what they are. 336 00:14:21,476 --> 00:14:22,956 We, thus far, have not come 337 00:14:22,956 --> 00:14:25,616 up with a relatively fast way of figuring that out. 338 00:14:25,616 --> 00:14:28,796 It will take more units than we have in our lifetimes, 339 00:14:28,796 --> 00:14:30,866 for instance, to make sweeping generalizations. 340 00:14:31,206 --> 00:14:35,196 But the key is, no pun intended, that there is a lot 341 00:14:35,196 --> 00:14:38,776 of interesting opportunities to improve upon 342 00:14:38,826 --> 00:14:40,656 or even break some algorithms 343 00:14:40,686 --> 00:14:43,106 that the world itself is now entrusting our money, 344 00:14:43,106 --> 00:14:45,126 our security, and a whole bunch of things too. 345 00:14:45,126 --> 00:14:47,836 To give you a sense of what a bigger key looks like. 346 00:14:48,156 --> 00:14:50,586 So this is intentionally rather small and deliberate, 347 00:14:50,786 --> 00:14:54,136 this is a key, a really big number, that happens 348 00:14:54,166 --> 00:14:56,356 to be represented with alphabetical characters 349 00:14:56,356 --> 00:14:58,526 and numbers, just for efficiency's sake, 350 00:14:58,786 --> 00:15:01,656 that represents a key one might use with an algorithm 351 00:15:01,656 --> 00:15:06,656 like R S A. This is a lot bigger than a number from 1 to 26, 352 00:15:06,696 --> 00:15:08,846 this is a lot bigger than a word like foobar. 353 00:15:08,846 --> 00:15:11,016 So this hints at the types 354 00:15:11,016 --> 00:15:13,616 of security that's actually in use today. 355 00:15:14,066 --> 00:15:16,516 And so that's where we pick off -- pick up today. 356 00:15:16,606 --> 00:15:19,426 So we spent a lot of time in the past couple of weeks talking 357 00:15:19,426 --> 00:15:21,886 about syntax and C and programming, 358 00:15:22,096 --> 00:15:24,256 now finally that we've gotten some of the minutia 359 00:15:24,256 --> 00:15:27,106 out of the way, what a for loop is, what a while loop is, 360 00:15:27,106 --> 00:15:30,046 and granted, you might still need another week or two 361 00:15:30,046 --> 00:15:31,376 of practice to get comfortable 362 00:15:31,376 --> 00:15:32,846 with even those syntactical details, 363 00:15:33,106 --> 00:15:35,246 we finally now have the ability to start talking 364 00:15:35,396 --> 00:15:38,696 at a slightly higher level about how you actually start 365 00:15:38,696 --> 00:15:42,216 to think more effectively, as the syllabus 366 00:15:42,216 --> 00:15:43,586 and course catalog claims, 367 00:15:43,586 --> 00:15:46,146 how you can solve problems more efficiently, 368 00:15:46,146 --> 00:15:47,636 as is the other promise made. 369 00:15:47,636 --> 00:15:49,746 And we started this course with this sort 370 00:15:49,746 --> 00:15:51,456 of visually interesting example 371 00:15:51,456 --> 00:15:52,946 of taking a really big phone book, 372 00:15:53,116 --> 00:15:56,716 and I claim I could find Mike Smith much faster 373 00:15:57,036 --> 00:16:00,056 than say a completely naive person would, 374 00:16:00,056 --> 00:16:02,736 by starting at the front and flipping through the As, 375 00:16:03,066 --> 00:16:05,506 and then the Bs, and then the Cs. 376 00:16:05,506 --> 00:16:08,786 Because I took this approach of dividing the problem in half 377 00:16:08,816 --> 00:16:10,516 and conquering only half of it. 378 00:16:10,516 --> 00:16:13,336 And then I realized, oh, if Mike I know is in the right-hand side 379 00:16:13,336 --> 00:16:14,886 of the phone book, let me do this again, 380 00:16:14,886 --> 00:16:16,796 and split the problem again and again. 381 00:16:17,156 --> 00:16:19,136 And our take away was that if that phone book, 382 00:16:19,136 --> 00:16:23,126 and it pretty much was, a 1,000 pages or 1,024 pages, 383 00:16:23,406 --> 00:16:25,906 how many times would I actually have to split that problem 384 00:16:25,906 --> 00:16:27,776 in half to find someone like Mike Smith. 385 00:16:29,166 --> 00:16:30,716 So just ten, right? 386 00:16:30,716 --> 00:16:33,906 If you have 1024 pages, and you just do the math in your head 387 00:16:33,906 --> 00:16:35,426 or on a piece of paper, you can divide 388 00:16:35,426 --> 00:16:37,716 that number in half ten times. 389 00:16:38,326 --> 00:16:40,566 Now there might be some interesting rounding issues 390 00:16:40,566 --> 00:16:42,856 at the end when you get to the last page, but ten times, 391 00:16:42,926 --> 00:16:45,136 instead of 1024 pages. 392 00:16:45,406 --> 00:16:47,516 Suppose that phone book, as we said in week 0, 393 00:16:47,686 --> 00:16:51,586 were not 1,000 pages, but 4 billion pages, that's a hell 394 00:16:51,586 --> 00:16:53,626 of a phone book, with a lot of names in it. 395 00:16:53,946 --> 00:16:56,826 But how many pages would I actually have to look at. 396 00:16:56,826 --> 00:16:57,746 How many times would I have 397 00:16:57,746 --> 00:17:02,956 to chop a 4 billion page phone book in half. 398 00:17:02,956 --> 00:17:04,756 32 times. Why? 399 00:17:04,756 --> 00:17:06,486 Well, you can take 4 billion and divide it 400 00:17:06,486 --> 00:17:08,366 in half roughly 32 times. 401 00:17:08,366 --> 00:17:10,206 And that speaks to the power 402 00:17:10,376 --> 00:17:12,656 of actually designing algorithms intelligently, 403 00:17:12,656 --> 00:17:14,366 that speaks to the power of what you can do 404 00:17:14,556 --> 00:17:16,646 with this basic device and just a modicum 405 00:17:16,646 --> 00:17:18,906 of intelligence inserted into your algorithm. 406 00:17:19,166 --> 00:17:21,856 Right, if I told you to search a phone book right know, 407 00:17:21,856 --> 00:17:24,506 you could certainly recently through how you might implement, 408 00:17:24,506 --> 00:17:26,866 divide, and conquer, but by far the simplest way 409 00:17:26,866 --> 00:17:30,486 to implement search for a phone book using the syntax 410 00:17:30,486 --> 00:17:33,256 and the capabilities of C you have now is what, like, 411 00:17:33,366 --> 00:17:36,886 write a for loop, I gets zero, I less than or equal to 4 billion, 412 00:17:37,136 --> 00:17:41,056 and then just check from I on up to 4 billion whether 413 00:17:41,056 --> 00:17:44,376 or not someone is in the current variable or in the current array 414 00:17:44,376 --> 00:17:45,956 or however you actually code it up. 415 00:17:46,156 --> 00:17:48,456 You could implement that with just a few minute's time. 416 00:17:48,716 --> 00:17:50,746 But that thing's going to take a while to run. 417 00:17:50,746 --> 00:17:52,846 By contrast, put a little more thought up front 418 00:17:53,106 --> 00:17:55,936 and the algorithm itself will execute far more quickly. 419 00:17:56,256 --> 00:17:58,906 So we're not going to stand up and indulge in this again, 420 00:17:58,906 --> 00:18:00,926 but this is an instance of the same idea. 421 00:18:00,926 --> 00:18:04,176 The first day you all stood up, you paired off with someone, 422 00:18:04,406 --> 00:18:07,126 and then half of you sat down, then half of you sat down, 423 00:18:07,126 --> 00:18:08,236 then half of you sat down. 424 00:18:08,496 --> 00:18:09,686 Now, granted, that turned into a bit 425 00:18:09,686 --> 00:18:11,696 of a disaster, as is often the case. 426 00:18:11,696 --> 00:18:15,486 Took way longer than for me to just go 2, 4, 6, 8, 10, 12. 427 00:18:15,916 --> 00:18:19,156 But in theory, had there not been the social awkwardness, 428 00:18:19,156 --> 00:18:21,226 had there not been the problems with arithmetic, 429 00:18:21,536 --> 00:18:23,246 we would have been able to -- 430 00:18:23,346 --> 00:18:25,706 you would have been able to count the people in this room 431 00:18:25,706 --> 00:18:29,246 so much more quickly than I, because whereas I'm sort 432 00:18:29,246 --> 00:18:32,036 of going like this, 1, 2, 3, 433 00:18:32,036 --> 00:18:36,226 so with every CPU cycle I'm doing one thing, you, 434 00:18:36,226 --> 00:18:40,266 by contrast, with every toll of the clock, with every CPU cycle, 435 00:18:40,466 --> 00:18:41,916 half of you were sitting down. 436 00:18:41,916 --> 00:18:42,866 You were taking a half -- 437 00:18:43,036 --> 00:18:46,506 50% of the problem out of the room at once. 438 00:18:46,556 --> 00:18:49,266 Where I was just taking one nth of it out of the way. 439 00:18:49,796 --> 00:18:51,756 So this too has implications. 440 00:18:51,756 --> 00:18:54,116 So this is just a fancy way -- I whipped this up with Excel -- 441 00:18:54,346 --> 00:18:55,596 to depict the following idea. 442 00:18:55,686 --> 00:18:57,836 So on the X axis here is this number N, 443 00:18:58,126 --> 00:19:00,216 which generally is going to refer to the size 444 00:19:00,216 --> 00:19:02,356 of the problem, how many pages are in the phone book. 445 00:19:02,356 --> 00:19:04,676 Let's call it N. How many students are in Sanders. 446 00:19:04,676 --> 00:19:08,066 Let's call it N. So I have from zero on up to 130. 447 00:19:08,266 --> 00:19:11,286 And then on the Y axis I have what I call T of N, 448 00:19:11,546 --> 00:19:15,506 the running time of an algorithm whose input is of size N. 449 00:19:15,596 --> 00:19:18,186 So it's just a general way of describing the running time, 450 00:19:18,186 --> 00:19:20,626 how much time does it take for an algorithm to run, 451 00:19:20,886 --> 00:19:21,996 and what do we mean by time? 452 00:19:22,806 --> 00:19:25,106 Like steps or seconds, minutes, whatever, 453 00:19:25,106 --> 00:19:27,276 so long as we all agree what the metric is. 454 00:19:27,586 --> 00:19:29,696 And these different lines represent different 455 00:19:29,696 --> 00:19:30,386 running times. 456 00:19:30,386 --> 00:19:32,616 So at the top left, we have a little legend there. 457 00:19:32,676 --> 00:19:35,426 It's kind of small to see, but we have three graphs here. 458 00:19:35,586 --> 00:19:37,656 One is this one up top. 459 00:19:37,786 --> 00:19:40,996 So on the left-hand side we have a very steep curve, 460 00:19:40,996 --> 00:19:43,226 but this is a line, this is a linear graph. 461 00:19:43,226 --> 00:19:45,976 And this represents an algorithm 462 00:19:46,236 --> 00:19:50,106 that given A inputs takes N steps to run. 463 00:19:50,326 --> 00:19:54,826 And that is representative of the running time of 1, 2, 3, 464 00:19:54,936 --> 00:19:57,056 this linear algorithm of counting students. 465 00:19:57,056 --> 00:20:00,096 But we realized in Week Zero we can do better, right? 466 00:20:00,096 --> 00:20:01,896 We can actually sped those up. 467 00:20:01,956 --> 00:20:04,746 So the second graph there, the second line that's a little bit 468 00:20:04,746 --> 00:20:07,156 to the right of the first one, that would be the result 469 00:20:07,156 --> 00:20:09,176 of my doing 2, 4, 6, 8. 470 00:20:09,176 --> 00:20:13,896 It's actually twice as fast, now contrast this with your line, 471 00:20:14,166 --> 00:20:17,296 so log of N, as we described your algorithm that day, 472 00:20:17,296 --> 00:20:20,006 as we described this general principle of divide and conquer. 473 00:20:20,306 --> 00:20:23,326 What this means is your algorithm should take way 474 00:20:23,326 --> 00:20:24,096 less time. 475 00:20:24,416 --> 00:20:26,416 In other words, the more students are in the room, 476 00:20:26,646 --> 00:20:30,516 look at the contrast between how many steps or how many seconds, 477 00:20:30,516 --> 00:20:32,906 how many units much time your algorithm should take 478 00:20:33,286 --> 00:20:36,446 versus mine, which is literally off the charts. 479 00:20:36,816 --> 00:20:37,726 Now what does this mean? 480 00:20:38,296 --> 00:20:41,136 If we add -- if one more student comes into this room 481 00:20:41,566 --> 00:20:43,336 that takes me one more step. 482 00:20:43,696 --> 00:20:45,836 If one more student comes into this room, 483 00:20:46,096 --> 00:20:47,886 that's kind of rounding error for you. 484 00:20:47,886 --> 00:20:50,206 Because you can kind of assume that one additional student. 485 00:20:50,206 --> 00:20:51,636 So let's actually take a bigger -- 486 00:20:51,856 --> 00:20:53,886 let's increase the problem more significantly. 487 00:20:54,176 --> 00:20:57,206 Suppose we have roughly 2, 300 students in this room. 488 00:20:57,476 --> 00:21:01,236 Suppose another 200 or 300 students come into this room 489 00:21:01,466 --> 00:21:02,656 within the next few seconds. 490 00:21:02,886 --> 00:21:06,356 Well that's going to take me literally twice as long. 491 00:21:06,566 --> 00:21:09,766 If it took me N steps before, it's going to take me 2 N steps 492 00:21:09,856 --> 00:21:11,386 by doubling the size of the room. 493 00:21:11,706 --> 00:21:15,506 Now my contrast, if 200 or 300 more students enter this room 494 00:21:15,506 --> 00:21:18,246 in the next few seconds, how many more steps or unit 495 00:21:18,246 --> 00:21:21,166 of time does it take you to count those additional students? 496 00:21:22,216 --> 00:21:23,286 Just one, right? 497 00:21:23,316 --> 00:21:25,616 Because within the first iteration of your algorithm, 498 00:21:25,616 --> 00:21:27,726 those same kids will sit down right away. 499 00:21:27,726 --> 00:21:30,056 And that's a huge bite out of the problem, 500 00:21:30,056 --> 00:21:32,326 whereas mine has grown significantly. 501 00:21:32,546 --> 00:21:34,256 And that's the fundamental contrast 502 00:21:34,256 --> 00:21:36,386 between something that's linear, like in my case, 503 00:21:36,646 --> 00:21:38,246 and something that's logarithmic, 504 00:21:38,246 --> 00:21:40,536 or at least mathematically more interesting. 505 00:21:40,886 --> 00:21:43,586 And not to overwhelm with numbers, but just to give -- 506 00:21:43,586 --> 00:21:46,606 put concrete values on what the implications here are, 507 00:21:46,796 --> 00:21:47,806 this is just a big chart 508 00:21:47,806 --> 00:21:50,456 that computes some very basic functions. 509 00:21:50,456 --> 00:21:52,886 So in the left-hand, the third column here, 510 00:21:52,936 --> 00:21:54,746 right above where I'm standing, this is N, 511 00:21:55,066 --> 00:21:58,806 so we arbitrarily have 1, 2, 4, 8, 16, a whole bunch of values 512 00:21:58,806 --> 00:22:02,286 of N. So my algorithm, 1, 2, 3, 4 -- 513 00:22:02,396 --> 00:22:06,536 if we have let's say 1 million students in this room 514 00:22:06,856 --> 00:22:10,956 or 1,048,576, that's obviously going 515 00:22:10,956 --> 00:22:14,976 to take me 1,048,576 steps. 516 00:22:15,296 --> 00:22:18,466 But my contrast, if you scroll back to the left, 517 00:22:18,706 --> 00:22:21,066 and you can do this, you know, with a calculator or even 518 00:22:21,066 --> 00:22:23,986 in your head perhaps, that same student body 519 00:22:24,226 --> 00:22:26,506 of a million students in Sanders could be counted 520 00:22:26,506 --> 00:22:28,546 by you all in just 20 steps. 521 00:22:28,966 --> 00:22:31,066 And so the take away visually of a graph like this, 522 00:22:31,066 --> 00:22:32,866 and we'll revisit this or you can revisit this 523 00:22:32,866 --> 00:22:33,976 as we introduce yet more algorithms, 524 00:22:34,126 --> 00:22:38,016 is exactly how significant the implications are 525 00:22:38,276 --> 00:22:40,646 of well-designed algorithms. 526 00:22:40,646 --> 00:22:42,666 And we can zoom out even further, 527 00:22:42,666 --> 00:22:45,176 if you look at various graphs like these three here, 528 00:22:45,456 --> 00:22:47,496 the specifics are not so interesting today, 529 00:22:48,036 --> 00:22:49,956 but notice this last one here. 530 00:22:50,306 --> 00:22:52,866 So it's small to see, but that big curve that's going 531 00:22:52,866 --> 00:22:55,916 to the north there is called 2 to the N. 2 532 00:22:55,916 --> 00:22:57,836 to the N being an exponential function. 533 00:22:58,146 --> 00:23:00,776 So we'll talk about it at times and in future classes 534 00:23:00,776 --> 00:23:03,306 if you take other theory classes in CS, even just for fun 535 00:23:03,306 --> 00:23:04,556 or if you're minoring in CS, 536 00:23:05,076 --> 00:23:08,416 you'll find that exponential algorithms, very bad. 537 00:23:08,416 --> 00:23:12,126 Because as N increases they take a ridiculous amount of time. 538 00:23:12,456 --> 00:23:15,516 But back to our motivation today of cryptography, 539 00:23:15,516 --> 00:23:19,356 when we say it's really hard to factor large numbers quickly, 540 00:23:19,616 --> 00:23:21,346 what we mean, essentially, 541 00:23:21,626 --> 00:23:24,286 is that factoring large numbers generally boils 542 00:23:24,286 --> 00:23:28,266 down to something that takes exponential time, not linear, 543 00:23:28,316 --> 00:23:30,446 not quadratic, not polynomial, 544 00:23:30,446 --> 00:23:32,976 if you recall the terms, but exponential. 545 00:23:32,976 --> 00:23:35,166 So graphs like that, very bad. 546 00:23:35,166 --> 00:23:37,186 Takes a good amount of time to run. 547 00:23:37,766 --> 00:23:41,046 So how can we start to describe the running times of algorithms. 548 00:23:41,046 --> 00:23:43,626 Right? It's sort of -- it's easy enough to pick a few examples, 549 00:23:43,676 --> 00:23:44,746 talk about which one's better, 550 00:23:44,746 --> 00:23:46,516 why it's better, and steps it takes. 551 00:23:46,816 --> 00:23:47,786 But can we generalize. 552 00:23:47,826 --> 00:23:50,476 Can you start to look at your own code or your own programs 553 00:23:50,476 --> 00:23:54,266 and say this algorithm is good because it takes this amount 554 00:23:54,266 --> 00:23:56,946 of time, or this implementation I wrote last week sucks 555 00:23:56,946 --> 00:23:59,706 because it takes this much time in general. 556 00:23:59,746 --> 00:24:03,546 Can we slap some labels or some formalities on just 557 00:24:03,546 --> 00:24:06,256 so we can start comparing algorithms against one another, 558 00:24:06,306 --> 00:24:09,076 even in different languages or written by different people. 559 00:24:09,296 --> 00:24:12,096 Well, there's three basic constructs, three little pieces 560 00:24:12,096 --> 00:24:13,496 of jargon we'll start using. 561 00:24:13,496 --> 00:24:16,866 One is called big O, one -- and that's the guy at the top. 562 00:24:17,006 --> 00:24:18,336 The middle guy is called theta. 563 00:24:18,336 --> 00:24:19,846 And the bottom is omega. 564 00:24:20,276 --> 00:24:23,206 And these symbols are just generally used by mathematicians 565 00:24:23,206 --> 00:24:27,646 in computer science to express the running time of algorithms. 566 00:24:27,846 --> 00:24:28,776 Now what does this mean? 567 00:24:29,386 --> 00:24:30,596 So formally, it means this. 568 00:24:30,686 --> 00:24:32,116 And these slides are on line, 569 00:24:32,116 --> 00:24:33,766 if you like the mathematical formalities, 570 00:24:33,796 --> 00:24:35,516 because there's actually some juicy ideas in there, 571 00:24:35,756 --> 00:24:38,836 but for today, they reduce to some relatively simple ideas. 572 00:24:39,166 --> 00:24:43,216 If you have an algorithm that in the worst case takes N steps, 573 00:24:43,656 --> 00:24:45,506 you say that that algorithm is in -- 574 00:24:45,506 --> 00:24:52,086 you say that that algorithm is in big O of N. 575 00:24:52,236 --> 00:24:54,816 So this is the worst case running time 576 00:24:55,076 --> 00:24:57,486 of time linear algorithm of counting students, 577 00:24:57,486 --> 00:25:00,266 and as an aside, happy birthday to Jansen and Mike. 578 00:25:00,266 --> 00:25:02,056 If, though, your algorithm 579 00:25:02,116 --> 00:25:04,706 in the best case might actually take fewer steps, 580 00:25:04,886 --> 00:25:07,586 well then you start talking about omega notation. 581 00:25:07,866 --> 00:25:09,456 Now in the case of counting students, 582 00:25:09,756 --> 00:25:11,216 this really doesn't help us, 583 00:25:11,216 --> 00:25:13,206 because in the best case how many steps is it going 584 00:25:13,206 --> 00:25:14,676 to take to count N students? 585 00:25:15,166 --> 00:25:18,436 Well, N. Right, I kind of have to count everyone. 586 00:25:18,436 --> 00:25:19,326 In the best case? 587 00:25:19,606 --> 00:25:21,196 Well, if I want to count all of the students, 588 00:25:21,196 --> 00:25:22,866 it's still going to take N steps. 589 00:25:23,406 --> 00:25:25,436 But let's pick another algorithm arbitrarily. 590 00:25:25,496 --> 00:25:28,376 Suppose students file into the room, I want to search 591 00:25:28,376 --> 00:25:30,836 for a student named Mike, just because. 592 00:25:31,316 --> 00:25:33,866 So in the worst case, how many steps will be take me 593 00:25:34,126 --> 00:25:36,956 to find a student named Mike as they file in the door. 594 00:25:36,956 --> 00:25:39,396 All right, so N steps. 595 00:25:39,426 --> 00:25:41,236 Because in the worst case, the Mike 596 00:25:41,236 --> 00:25:42,466 or the several Mikes we have 597 00:25:42,466 --> 00:25:44,206 in the course this semester are just -- 598 00:25:44,376 --> 00:25:46,246 just so happen going to be the last people coming 599 00:25:46,246 --> 00:25:46,706 into the room. 600 00:25:46,706 --> 00:25:47,356 Just by chance. 601 00:25:47,636 --> 00:25:50,106 But in the best case, where might Mike be in the line? 602 00:25:51,156 --> 00:25:52,076 First, right? 603 00:25:52,216 --> 00:25:52,976 Common sense. 604 00:25:52,976 --> 00:25:56,996 So in the best case there we say that algorithm of finding Mike, 605 00:25:57,036 --> 00:26:02,096 searching for Mike, is in omega of 1 in the best case, 606 00:26:02,096 --> 00:26:03,526 just takes one step, bam, I ask the question, 607 00:26:03,566 --> 00:26:04,896 I get the answer I'm looking for right away. 608 00:26:05,046 --> 00:26:13,256 So when you have an algorithm that's both in big O of N 609 00:26:13,696 --> 00:26:16,796 or omega of N, so not the searching 610 00:26:16,846 --> 00:26:19,126 but rather the counting, then can you say 611 00:26:19,126 --> 00:26:22,876 that that algorithm is described by theta of N. So in short, 612 00:26:23,276 --> 00:26:25,106 we have worst case running time up top, 613 00:26:25,436 --> 00:26:28,346 we have best case running time at bottom, and if those two just 614 00:26:28,346 --> 00:26:30,566 so happen to be the same, then you can say 615 00:26:30,566 --> 00:26:35,316 that the algorithm is in theta of whatever formula you've come 616 00:26:35,316 --> 00:26:36,666 up with, and in that case. 617 00:26:37,236 --> 00:26:39,416 All right, so what does this mean exactly? 618 00:26:39,416 --> 00:26:44,766 So in the case of a phone book, when I tore the phone book again 619 00:26:44,766 --> 00:26:47,046 and again and again, and what was the running time 620 00:26:47,166 --> 00:26:51,526 of finding Mike Smith or really finding anyone, for that matter, 621 00:26:51,926 --> 00:26:54,716 in a phone book that happens to have N pages? 622 00:26:56,066 --> 00:27:01,896 In the worst case, searching a phone book takes how many steps? 623 00:27:02,056 --> 00:27:05,216 Okay, so log N, if my algorithm is to just start 624 00:27:05,216 --> 00:27:07,976 at the first page and go from left to right all the way 625 00:27:07,976 --> 00:27:08,876 to the end of the phone book. 626 00:27:08,876 --> 00:27:11,466 So that might be one implementation, 627 00:27:11,466 --> 00:27:14,066 and this will be say, linear search of a phone book. 628 00:27:14,166 --> 00:27:15,186 But we can do better. 629 00:27:15,466 --> 00:27:17,626 In other words, if I want to compare one algorithm, 630 00:27:17,806 --> 00:27:20,386 the linear algorithm that we started class within Week Zero 631 00:27:20,386 --> 00:27:22,926 versus the more intelligent one of divide and conquer, 632 00:27:23,336 --> 00:27:26,866 divide and conquer worst case running time is instead what? 633 00:27:27,856 --> 00:27:28,636 Log N, right? 634 00:27:28,696 --> 00:27:30,586 Because we said multiply times now 635 00:27:30,876 --> 00:27:32,596 that in the worst case we're going have 636 00:27:32,626 --> 00:27:34,986 to keep chopping the phone book in half and half and half, 637 00:27:34,986 --> 00:27:37,966 until we get to one page where Mike Smith is finally listed. 638 00:27:38,196 --> 00:27:39,886 How many times can we do that chopping? 639 00:27:40,206 --> 00:27:42,106 Log in time, so this is something 640 00:27:42,106 --> 00:27:43,416 that we'll call logarithmic. 641 00:27:43,946 --> 00:27:46,296 All right, what about the best case running time 642 00:27:46,366 --> 00:27:48,576 of the phone book searching algorithm? 643 00:27:49,416 --> 00:27:50,146 Yeah, right? 644 00:27:50,716 --> 00:27:52,486 Because in the best case -- 645 00:27:52,486 --> 00:27:54,786 and again, when you talk about an algorithm, 646 00:27:54,786 --> 00:27:56,406 you talk about a general input. 647 00:27:56,406 --> 00:27:58,836 We don't talk about an algorithm for finding Mike Smith, 648 00:27:59,116 --> 00:28:01,146 we talk about an algorithm for finding someone. 649 00:28:01,326 --> 00:28:05,516 Well, the best case running time of finding someone 650 00:28:05,516 --> 00:28:08,276 in a phone book whether you use the linear algorithm 651 00:28:08,546 --> 00:28:10,886 or the logarithmic one ultimately boils 652 00:28:10,916 --> 00:28:12,486 down to just constant time, right? 653 00:28:12,486 --> 00:28:15,066 Because you might be looking for Abigail Adams, 654 00:28:15,066 --> 00:28:17,266 who just so happens to be on the first page of the phone book. 655 00:28:17,566 --> 00:28:20,106 Bam, you're done, you don't even have to divide the phone book 656 00:28:20,466 --> 00:28:21,826 in half multiple times. 657 00:28:22,096 --> 00:28:25,256 And we'll call the running time of that algorithm constant, 658 00:28:25,716 --> 00:28:28,156 whether it takes one step, maybe it takes two steps, 659 00:28:28,156 --> 00:28:29,386 because there she is on the page, 660 00:28:29,386 --> 00:28:31,446 now I have to find what row she's in on the page. 661 00:28:31,706 --> 00:28:33,666 But those are just a constant number of steps, so we're going 662 00:28:33,666 --> 00:28:35,796 to generalize as just constant time. 663 00:28:36,146 --> 00:28:37,396 So what's the point of all this? 664 00:28:37,586 --> 00:28:39,616 Well, we now, just as we did, have this means 665 00:28:39,616 --> 00:28:42,416 of comparing not apples and oranges, per se, 666 00:28:42,416 --> 00:28:43,546 but apples and apples. 667 00:28:43,546 --> 00:28:46,286 If we have two different algorithms, linear search 668 00:28:46,346 --> 00:28:48,366 and then divide and conquer for the phone book, 669 00:28:48,656 --> 00:28:51,016 we can now say something qualitatively 670 00:28:51,016 --> 00:28:54,386 about which one is better in terms of its running time. 671 00:28:54,626 --> 00:28:57,506 And the notation folks tend to use for exactly that, 672 00:28:57,506 --> 00:28:58,686 is this notation here. 673 00:28:58,686 --> 00:29:01,146 So just to toss out some jargon in case you see it anywhere, 674 00:29:01,486 --> 00:29:04,116 constant time we discussed, logarithmic time we discussed, 675 00:29:04,426 --> 00:29:06,106 linear time we discussed. 676 00:29:06,106 --> 00:29:10,886 We'll see this running time often, N log N. Frankly, in 12, 677 00:29:10,886 --> 00:29:13,686 15 years of computer science I've never heard one human say 678 00:29:13,686 --> 00:29:17,016 supra linear, but that is in fact the formal term. 679 00:29:17,016 --> 00:29:20,056 N squared, we'll see today and-or Wednesday. 680 00:29:20,106 --> 00:29:21,126 That's called quadratic. 681 00:29:21,386 --> 00:29:22,986 More generally, if you have N raised 682 00:29:22,986 --> 00:29:24,836 to some power, that's a polynomial. 683 00:29:25,096 --> 00:29:28,936 And then there's N factorial, which is even worse than any 684 00:29:28,936 --> 00:29:30,406 of those, but you tend not to see 685 00:29:30,406 --> 00:29:33,726 that unless your algorithm is frankly downright stupid. 686 00:29:34,486 --> 00:29:36,646 So couple of announcements. 687 00:29:37,316 --> 00:29:39,266 There's one hand out today, very little code, 688 00:29:39,266 --> 00:29:40,496 but nonetheless, it's out there. 689 00:29:40,756 --> 00:29:43,196 Lunch seems to have gone well, so we were kind 690 00:29:43,196 --> 00:29:44,346 of overwhelmed by responses. 691 00:29:44,346 --> 00:29:47,386 We ended up having a barbecue for 60 of your classmates. 692 00:29:47,386 --> 00:29:50,246 So we'll try to do another such thing some future Friday, 693 00:29:50,246 --> 00:29:51,526 more on that in a future class. 694 00:29:51,876 --> 00:29:53,426 If you've not yet returned sensor boards 695 00:29:53,426 --> 00:29:56,076 from the hacker edition of P set 0, that's fine, just hand them 696 00:29:56,076 --> 00:29:58,416 to one of the TFs in our little nook over here. 697 00:29:58,686 --> 00:30:00,706 P set 2 went out the door on Friday. 698 00:30:00,996 --> 00:30:03,826 The walk through was filmed yesterday, on Sunday, 699 00:30:03,826 --> 00:30:05,376 that video will go on line by tomorrow. 700 00:30:05,376 --> 00:30:07,606 Office hours have been in progress, do take advantage 701 00:30:07,606 --> 00:30:09,516 of those, and sections have already begun. 702 00:30:09,616 --> 00:30:10,846 So your assigned section, 703 00:30:10,846 --> 00:30:12,916 you should have been informed of by e-mail. 704 00:30:12,916 --> 00:30:15,556 Either you went last night or your section is probably today 705 00:30:15,556 --> 00:30:18,446 or tomorrow, so do take advantage of those as well. 706 00:30:18,676 --> 00:30:23,206 And we also deployed per Problem Set 2 this link on Friday. 707 00:30:23,266 --> 00:30:24,966 So there is now the bulletin board, 708 00:30:25,276 --> 00:30:26,936 which if clicked will log you 709 00:30:26,936 --> 00:30:28,856 into a little interface that looks like this. 710 00:30:29,116 --> 00:30:31,486 And the bulletin board is meant to serve a couple of purposes. 711 00:30:31,486 --> 00:30:35,316 One, it's meant to allow us to answer the same question once 712 00:30:35,736 --> 00:30:38,866 for many different people to benefit from the answer. 713 00:30:38,866 --> 00:30:40,346 It's meant to be an opportunity for you 714 00:30:40,346 --> 00:30:42,786 to potentially field each other's questions, 715 00:30:42,786 --> 00:30:44,286 so long as it doesn't cross that line 716 00:30:44,286 --> 00:30:47,046 of outright telling someone some answer, and what you'll find is 717 00:30:47,046 --> 00:30:48,936 that week to week we'll add another category 718 00:30:48,936 --> 00:30:52,716 to the bulletin board for P set 2, P set 3, P set 4 719 00:30:52,716 --> 00:30:55,266 and so forth, and the goal of the bulletin board is 720 00:30:55,266 --> 00:30:58,286 to ask questions that you're pretty sure anyone can hear the 721 00:30:58,286 --> 00:31:01,716 answer to, anyone can see the specifics, but if you find 722 00:31:01,716 --> 00:31:03,736 that asking your question really requires 723 00:31:03,736 --> 00:31:07,496 that you show us your code, then continue to use help at CS.50 724 00:31:08,006 --> 00:31:10,146 for that, but what you'll find more importantly, 725 00:31:10,146 --> 00:31:12,246 especially for those less comfortable, when you log 726 00:31:12,246 --> 00:31:14,926 in for the first time although it does authenticate you 727 00:31:14,926 --> 00:31:16,266 with user name and password, 728 00:31:16,546 --> 00:31:19,806 we have made your name the default value of students. 729 00:31:19,896 --> 00:31:22,076 So you can have that pseudonym, you can change it 730 00:31:22,076 --> 00:31:24,766 to something other than that, your own name or something fun. 731 00:31:25,066 --> 00:31:28,016 But realize the point of the bulletin boards, anonymity is 732 00:31:28,016 --> 00:31:31,156 so that you can post what you fear is the proverbial dumb 733 00:31:31,156 --> 00:31:34,686 question, and no one else, other than the staff as necessary, 734 00:31:34,686 --> 00:31:36,736 is going ask that you even asked that question. 735 00:31:36,736 --> 00:31:38,696 So if that gives you more comfort, by all means, 736 00:31:38,696 --> 00:31:40,436 ask for questions in that way. 737 00:31:40,436 --> 00:31:42,846 That was a lot to absorb, let's take our five minute break. 738 00:31:43,516 --> 00:31:46,546 [ Background noise ] 739 00:31:47,046 --> 00:31:52,846 >> All right, we are back. 740 00:31:52,886 --> 00:31:55,656 So we do the following with key note. 741 00:31:55,706 --> 00:32:00,296 We have eight doors here, behind which are little surprises. 742 00:32:00,536 --> 00:32:02,606 To reveal what's behind these doors, though, 743 00:32:02,976 --> 00:32:06,706 I need to solicit the ever-awkward volunteer to come 744 00:32:06,706 --> 00:32:08,146 up on stage, if you will. 745 00:32:09,556 --> 00:32:16,736 Yeah? Come on down. 746 00:32:16,736 --> 00:32:18,606 Again, you'll need to sign your rights away 747 00:32:18,606 --> 00:32:19,726 in perpetuity in a moment. 748 00:32:20,046 --> 00:32:20,756 Come on up. 749 00:32:20,886 --> 00:32:21,306 What is your name? 750 00:32:21,306 --> 00:32:21,996 >> Edward. 751 00:32:22,186 --> 00:32:22,996 >> Edward, okay. 752 00:32:23,316 --> 00:32:24,706 David. Nice to meet you. 753 00:32:24,736 --> 00:32:25,356 Got a fan over there. 754 00:32:25,966 --> 00:32:30,666 All right, so Edward, much like a game show, 755 00:32:31,056 --> 00:32:35,846 we have the same doors -- oops -- 756 00:32:35,846 --> 00:32:37,096 [Inaudible] will spoil the surprise. 757 00:32:37,936 --> 00:32:46,196 So we have eight doors behind these pieces of paper here. 758 00:32:46,196 --> 00:32:47,196 So we have two arrays. 759 00:32:47,236 --> 00:32:49,396 So I'm going start using the geek terminology. 760 00:32:49,396 --> 00:32:52,046 So we have two arrays, arrays of integers. 761 00:32:52,046 --> 00:32:53,166 Each is of size eight. 762 00:32:53,456 --> 00:32:56,026 And you are now just a human who's been presented 763 00:32:56,026 --> 00:32:57,326 with a very awkward situation. 764 00:32:57,326 --> 00:32:59,466 The goal of which in this top array is 765 00:32:59,506 --> 00:33:00,886 to find me the number 3. 766 00:33:01,466 --> 00:33:03,006 And what we the audience hope to learn 767 00:33:03,006 --> 00:33:04,986 from this is exactly how you the human go 768 00:33:04,986 --> 00:33:06,426 about solving this problem. 769 00:33:06,506 --> 00:33:07,886 So if you would, on the top row only, 770 00:33:07,946 --> 00:33:09,066 find us the number 3, please. 771 00:33:10,316 --> 00:33:14,346 See the number 2. 772 00:33:14,346 --> 00:33:16,446 Oh, that was very good. 773 00:33:16,856 --> 00:33:17,776 See the number 3. 774 00:33:19,916 --> 00:33:23,166 Okay, so if you would, can you explain how you found the 775 00:33:23,166 --> 00:33:23,846 number 3? 776 00:33:24,516 --> 00:33:28,906 >> Just went up, [Inaudible] pieces of paper at random. 777 00:33:29,366 --> 00:33:30,956 >> Okay, so that was just chance, then, 778 00:33:30,956 --> 00:33:32,376 it was not spoiled by the wind. 779 00:33:33,626 --> 00:33:38,016 >> Right. Systematically go through and check all the paper. 780 00:33:38,266 --> 00:33:38,686 >> Okay, good. 781 00:33:38,886 --> 00:33:40,466 So thank you, first, for spoiling my demo 782 00:33:40,466 --> 00:33:41,366 by finding it so quickly. 783 00:33:41,906 --> 00:33:43,776 So -- but let's ask the question nonetheless. 784 00:33:44,016 --> 00:33:45,976 What is the running time of the algorithm 785 00:33:45,976 --> 00:33:48,166 that Edward just tried, can we slap a label on this. 786 00:33:48,256 --> 00:33:50,296 Best case running time. 787 00:33:50,296 --> 00:33:53,296 It's apparently two, but one in the truly best case, 788 00:33:53,296 --> 00:33:55,776 so this is in omega of 1, we'll say. 789 00:33:55,776 --> 00:33:57,736 In the best case, he just happens to go straight 790 00:33:57,736 --> 00:33:59,566 for the right number, just by chance 791 00:33:59,566 --> 00:34:01,486 or by some more sophisticated heuristic. 792 00:34:01,676 --> 00:34:03,576 And in the worst case, how long would it have taken you 793 00:34:03,576 --> 00:34:05,436 to find the number 3? 794 00:34:05,466 --> 00:34:07,926 Okay, 8 or more, generally big O of N. So again, 795 00:34:07,926 --> 00:34:10,856 we have an algorithm that's big O of N, or omega of 1. 796 00:34:11,056 --> 00:34:13,816 So the question now at hand is can we do better than that. 797 00:34:15,226 --> 00:34:16,206 So let's see. 798 00:34:16,206 --> 00:34:18,606 How could we fundamentally improve upon that algorithm, 799 00:34:18,986 --> 00:34:20,156 any suggestions for Edward. 800 00:34:20,806 --> 00:34:22,816 Could we do better than big O event. 801 00:34:22,816 --> 00:34:25,146 Because big O event is great when it's only eight, 802 00:34:25,146 --> 00:34:27,816 but if N is now 4 billion, as we've discussed, 803 00:34:27,816 --> 00:34:31,176 N, not so good any more. 804 00:34:31,366 --> 00:34:31,646 Sorry? 805 00:34:31,646 --> 00:34:32,786 [ Inaudible audience comment ] 806 00:34:32,786 --> 00:34:34,966 >> Okay, so we could sort these things, right? 807 00:34:34,966 --> 00:34:37,246 There is an advantage that we had all this time 808 00:34:37,246 --> 00:34:38,796 since Week Zero of a phone book. 809 00:34:38,796 --> 00:34:41,806 Someone else, Verizon or whomever, has done the hard work 810 00:34:41,806 --> 00:34:44,226 of sorting that phone book alphabetically 811 00:34:44,496 --> 00:34:46,666 so that we can make certain assumptions. 812 00:34:46,666 --> 00:34:49,026 An assumption like when I go down the middle and end 813 00:34:49,026 --> 00:34:52,456 up at the Ms or the Ns I know the Smiths, the Ss, 814 00:34:52,666 --> 00:34:53,736 are going to be to the right. 815 00:34:53,946 --> 00:34:56,846 Now I didn't give Edward any such assumptions, and in fact, 816 00:34:56,846 --> 00:35:00,626 it's a little weird that 2 is over here, and yet 3 is 817 00:35:00,626 --> 00:35:02,976 over here, kind of suggests 818 00:35:02,976 --> 00:35:05,066 that maybe these things are not so sorted. 819 00:35:05,066 --> 00:35:07,816 And in fact, there's 8, there's 1. 820 00:35:07,906 --> 00:35:09,526 So the top row is not in fact sorted. 821 00:35:09,526 --> 00:35:13,196 So if we now give you the added assumption that the bottom array 822 00:35:13,196 --> 00:35:17,026 of 8 numbers is in fact sorted, how might you go about, 823 00:35:17,396 --> 00:35:18,596 without answering, just doing, 824 00:35:18,716 --> 00:35:21,306 finding the number 50 in the bottom array. 825 00:35:22,676 --> 00:35:24,746 Different numbers, but they are now sorted. 826 00:35:24,936 --> 00:35:27,836 I see 171. 827 00:35:30,336 --> 00:35:31,906 Dammit! Okay. 828 00:35:35,346 --> 00:35:36,806 Okay, very good. 829 00:35:37,936 --> 00:35:41,366 Okay. So what did you do this time? 830 00:35:43,126 --> 00:35:45,346 >> Well, since you told me it was sorted, 831 00:35:45,346 --> 00:35:47,236 I assumed it meant it was either -- 832 00:35:48,146 --> 00:35:49,596 there was going to be some sort of pattern. 833 00:35:49,746 --> 00:35:51,036 So either the larger numbers were 834 00:35:51,276 --> 00:35:52,196 on the right or on the left. 835 00:35:52,306 --> 00:35:53,806 So I checked both sides. 836 00:35:53,806 --> 00:35:54,046 >> Okay, good. 837 00:35:54,046 --> 00:35:56,506 So as an aside, this demo was far more effective last year 838 00:35:56,506 --> 00:35:58,016 when we spent like 10 minutes on this. 839 00:35:58,256 --> 00:35:59,366 It was somewhat awkward, actually, 840 00:35:59,366 --> 00:36:02,516 because your predecessor spends a lot of time thinking 841 00:36:02,516 --> 00:36:04,416 about which number to reveal, 842 00:36:04,416 --> 00:36:06,986 thinking there was perhaps some trick to this, when really, 843 00:36:06,986 --> 00:36:08,686 five minutes before class I had written random numbers 844 00:36:08,686 --> 00:36:09,126 on the board. 845 00:36:09,126 --> 00:36:10,246 So you're doing much better. 846 00:36:10,246 --> 00:36:12,326 He was a good sport about it. 847 00:36:12,326 --> 00:36:15,086 Okay, so bottom array is now sorted. 848 00:36:15,086 --> 00:36:16,146 Let's slap some numbers on this. 849 00:36:16,146 --> 00:36:19,856 Best case is apparently two again, in reality, but formally 850 00:36:20,126 --> 00:36:23,146 or asymptotically, any time we start talking about big O 851 00:36:23,146 --> 00:36:25,186 and omega, we're talking asymptotically, 852 00:36:25,186 --> 00:36:28,496 which means as inputs get large, as N gets large, 853 00:36:28,496 --> 00:36:30,226 what generally happens to running time. 854 00:36:30,486 --> 00:36:34,346 So in the worst case, how long would you algorithm have taken. 855 00:36:34,836 --> 00:36:37,406 Okay, N, because why, what was your algorithm here? 856 00:36:38,686 --> 00:36:41,346 You said I went high, I went low, but we didn't really get 857 00:36:41,346 --> 00:36:43,116 to see what you would have done next. 858 00:36:43,336 --> 00:36:44,296 In fact, let's continue this. 859 00:36:44,626 --> 00:36:46,376 Find me the number 52. 860 00:36:46,586 --> 00:36:52,676 I see 51. I see 61. 861 00:36:54,356 --> 00:36:56,536 Good. Back on top. 862 00:36:57,196 --> 00:36:59,316 Okay, so pause there, because we know you're going 863 00:36:59,496 --> 00:37:01,226 to struggle now. 864 00:37:01,226 --> 00:37:02,766 So what was your algorithm? 865 00:37:03,636 --> 00:37:04,766 Now we have to push a little harder. 866 00:37:04,766 --> 00:37:11,706 >> So I was assuming they were sorted numerically. 867 00:37:11,706 --> 00:37:12,216 So that -- 868 00:37:12,326 --> 00:37:12,606 >> Okay. 869 00:37:12,946 --> 00:37:15,886 >> So when you said 52, since I new 50 was already down there, 870 00:37:15,986 --> 00:37:17,036 I just went to the same row. 871 00:37:17,426 --> 00:37:17,936 >> Okay, good. 872 00:37:17,936 --> 00:37:19,336 So in short, I can summarize. 873 00:37:19,336 --> 00:37:20,646 So at first you went high 874 00:37:20,776 --> 00:37:22,826 because 50 is a pretty big number, certainly relative 875 00:37:22,826 --> 00:37:24,336 to the number 3 we looked for before. 876 00:37:24,536 --> 00:37:25,976 So you went high, didn't find it, 877 00:37:25,976 --> 00:37:28,326 then sort of thought maybe I was messing with you, so went left 878 00:37:28,326 --> 00:37:29,646 to look for the number 50 there. 879 00:37:29,916 --> 00:37:31,886 Didn't find it, but you did know it was sorted. 880 00:37:32,096 --> 00:37:34,846 So the rest of your algorithm after this guess work 881 00:37:34,846 --> 00:37:37,686 at the beginning, try high, if not high, try low, 882 00:37:37,896 --> 00:37:40,006 was presumably to iterate then from left to right. 883 00:37:40,276 --> 00:37:43,436 Okay, so we can still slap some formality on this approach too. 884 00:37:43,436 --> 00:37:45,096 In the worst case, how many steps would 885 00:37:45,096 --> 00:37:47,076 that take, as you said. 886 00:37:47,076 --> 00:37:47,916 N steps, right? 887 00:37:47,916 --> 00:37:49,476 Because maybe you went high here. 888 00:37:49,666 --> 00:37:50,746 Okay, that wasn't 50. 889 00:37:50,746 --> 00:37:53,896 Then you went low, not here still, then you end up iterating 890 00:37:53,896 --> 00:37:56,726 over the N minus 1 remaining elements, okay, big O event. 891 00:37:56,936 --> 00:38:01,046 But again, best case running time is omega of -- 892 00:38:01,796 --> 00:38:03,896 in your case there, is just 1. 893 00:38:04,026 --> 00:38:05,056 So in constant time. 894 00:38:05,336 --> 00:38:07,426 So this sort of begs the question then, 895 00:38:07,536 --> 00:38:10,626 how long did it take me, the person setting this up, 896 00:38:10,626 --> 00:38:12,566 to actually sort these values. 897 00:38:12,566 --> 00:38:15,466 So let's put you out of your awkward [Inaudible] round 898 00:38:15,466 --> 00:38:16,526 of applause if we could. 899 00:38:17,516 --> 00:38:20,426 [ Applause ] 900 00:38:20,926 --> 00:38:21,746 >> Legal form awaits. 901 00:38:22,016 --> 00:38:25,636 Okay, so that begs the question, then, how do we actually go 902 00:38:25,636 --> 00:38:26,956 about sorting these values. 903 00:38:26,956 --> 00:38:28,236 Because I kind of did it for free. 904 00:38:28,236 --> 00:38:30,866 Right? I did it before class, kind of cheated 905 00:38:30,866 --> 00:38:32,376 by having the numbers pre-baked. 906 00:38:32,536 --> 00:38:34,396 But if you're just handed a bunch of numbers 907 00:38:34,396 --> 00:38:37,066 as Edward effectively was, and I left it to him 908 00:38:37,286 --> 00:38:40,006 to sort those values so that he could then impress us 909 00:38:40,006 --> 00:38:42,046 with the running time of his better algorithm, 910 00:38:42,396 --> 00:38:44,716 how much time does it takes to sort these numbers. 911 00:38:44,716 --> 00:38:47,746 Well that, we'll see, is a much harder question to answer 912 00:38:47,746 --> 00:38:50,616 and a much more time consuming problem to solve. 913 00:38:50,746 --> 00:38:54,276 But let's see if we can express now what Edward kind of did 914 00:38:54,276 --> 00:38:56,036 or in an ideal world would have done exactly. 915 00:38:56,306 --> 00:38:58,376 We can express this algorithm, 916 00:38:58,376 --> 00:39:00,636 or really any algorithm moving forward in the course, 917 00:39:00,796 --> 00:39:03,666 not just in C code which gets fairly pedantic, 918 00:39:03,666 --> 00:39:06,236 writing out all the minutia, we just want to convey the idea 919 00:39:06,236 --> 00:39:08,826 of an algorithm, just like we did in Week Zero and One, 920 00:39:09,106 --> 00:39:10,486 we can go with pseudo code. 921 00:39:10,706 --> 00:39:13,626 There is no formal standard for what pseudo code is, 922 00:39:13,626 --> 00:39:15,276 different people take different approaches. 923 00:39:15,486 --> 00:39:18,536 The only important thing is to communicate your idea clearly 924 00:39:18,786 --> 00:39:21,916 without getting bogged down in stupid details like semicolons 925 00:39:21,916 --> 00:39:23,966 and parentheses, which add nothing to the logic 926 00:39:24,226 --> 00:39:25,746 and only distract from your idea. 927 00:39:25,806 --> 00:39:27,676 So I took the somewhat conventional 928 00:39:27,676 --> 00:39:29,026 but somewhat arbitrary approach 929 00:39:29,256 --> 00:39:31,106 of defining my pseudo code as follows. 930 00:39:31,136 --> 00:39:32,856 For linear search, as I called it, 931 00:39:32,856 --> 00:39:35,116 and this is what the world generally calls an algorithm 932 00:39:35,116 --> 00:39:37,726 where you start from here to here to here to here, 933 00:39:37,726 --> 00:39:41,796 and move from left to right, so on input N, for N elements, 934 00:39:42,076 --> 00:39:47,676 for each element I, if I equals, equals N, return true. 935 00:39:47,676 --> 00:39:50,046 So on input N, for each element -- yup. 936 00:39:50,366 --> 00:39:52,946 So this is -- there are many different ways you can implement 937 00:39:52,946 --> 00:39:55,406 this pseudo code or write it, but it implies 938 00:39:55,406 --> 00:39:58,166 that there's an array of some sort, but syntactically, 939 00:39:58,166 --> 00:39:59,516 doesn't matter how we express this. 940 00:39:59,516 --> 00:40:00,766 So I'm going to take the approach of saying 941 00:40:00,766 --> 00:40:05,256 for each element I in my array, if I happens to equal N, 942 00:40:05,256 --> 00:40:08,006 the thing I'm looking for, go ahead and return true. 943 00:40:08,186 --> 00:40:11,516 But then per the indentation, if I never return true, 944 00:40:11,516 --> 00:40:13,806 the last line in my program says return false. 945 00:40:13,976 --> 00:40:16,576 And the answer is no, as was the case when Edward failed 946 00:40:16,576 --> 00:40:19,926 to find 52 because it just wasn't in that second array. 947 00:40:20,356 --> 00:40:23,546 But we can do what I was hoping he could do, but it is okay, 948 00:40:23,546 --> 00:40:24,526 I can do it myself, 949 00:40:24,916 --> 00:40:27,506 we can implement what we might call binary search, 950 00:40:27,506 --> 00:40:29,526 which is actually direct application 951 00:40:29,526 --> 00:40:31,246 of the whole divide and conquer idea. 952 00:40:31,496 --> 00:40:32,676 Now this algorithm looks 953 00:40:32,676 --> 00:40:35,666 at first glance certainly more complicated than linear search, 954 00:40:35,866 --> 00:40:38,606 but we've already seen how much time it saves on things 955 00:40:38,606 --> 00:40:40,436 like searching phone books and what not. 956 00:40:40,696 --> 00:40:43,626 So in theory, if I were searching for the number 50 here 957 00:40:43,856 --> 00:40:46,706 and I didn't have this heuristic of going high or going low, 958 00:40:46,946 --> 00:40:49,446 because maybe that would work, but even Edward had 959 00:40:49,446 --> 00:40:52,826 to make a guess, is 50 big or is it a small value. 960 00:40:52,826 --> 00:40:55,596 Well in this case, it was actually relatively small value, 961 00:40:55,596 --> 00:40:59,216 because we had 50 here, but then 171 here. 962 00:40:59,516 --> 00:41:01,696 And in the end, although it was a clever trick, 963 00:41:01,696 --> 00:41:04,276 and he actually solved that problem very quickly 964 00:41:04,276 --> 00:41:07,736 to his credit, the algorithm itself, ultimately boiled 965 00:41:07,736 --> 00:41:09,786 down to no better than just linear search. 966 00:41:09,786 --> 00:41:11,616 Because in the end these clever heuristics, 967 00:41:11,656 --> 00:41:14,326 which in the real world clearly saved us some time, 968 00:41:14,606 --> 00:41:17,606 and garnered him some applause, the rest of the algorithm, 969 00:41:17,606 --> 00:41:21,636 had he not been so lucky, is still the same linear algorithm. 970 00:41:21,636 --> 00:41:24,296 So what if I instead apply the Mike Smith approach. 971 00:41:24,526 --> 00:41:27,386 I go into the middle of my phone book, the middle of my array. 972 00:41:27,386 --> 00:41:29,296 And now I'm looking for the number 50. 973 00:41:29,486 --> 00:41:31,696 Well, we have some weird mathematical rounding errors 974 00:41:31,696 --> 00:41:33,766 that I'm sure we can work out with the rounding function 975 00:41:33,766 --> 00:41:35,536 or ceiling or floor or whatever, so I'm going 976 00:41:35,536 --> 00:41:36,736 to arbitrarily go here. 977 00:41:36,996 --> 00:41:38,156 Is this the number 50? 978 00:41:38,496 --> 00:41:39,896 No, it's the number 105. 979 00:41:39,896 --> 00:41:41,636 But just like in Week Zero when I was able 980 00:41:41,636 --> 00:41:44,286 to tear away the problem of the phone book, 981 00:41:44,556 --> 00:41:46,996 I now know that my answer is definitely not going 982 00:41:46,996 --> 00:41:47,926 to be to the right. 983 00:41:48,216 --> 00:41:50,476 So I can focus only on the remaining numbers here. 984 00:41:50,476 --> 00:41:51,416 Where do I go next? 985 00:41:51,726 --> 00:41:54,086 Well, I think essentially the same problem. 986 00:41:54,086 --> 00:41:56,876 I'm still looking for the number 50, I've still got an array, 987 00:41:56,876 --> 00:41:59,456 upside is half as big as it once was. 988 00:41:59,456 --> 00:42:01,886 I can use that to my advantage performance-wise, 989 00:42:02,176 --> 00:42:03,956 so let me just go smack-dab into the middle. 990 00:42:04,086 --> 00:42:06,196 I've got to pick left or right arbitrarily, 991 00:42:06,196 --> 00:42:07,946 because the math isn't working out beautifully, 992 00:42:08,206 --> 00:42:09,066 so let's pick this one. 993 00:42:09,376 --> 00:42:10,566 51, not the case. 994 00:42:10,566 --> 00:42:12,496 But now I know it's this way. 995 00:42:12,616 --> 00:42:15,436 So now I know it's not here, 51 or 61. 996 00:42:15,716 --> 00:42:17,746 And I'm left with a problem that's a quarter 997 00:42:17,746 --> 00:42:19,316 of my original size. 998 00:42:19,316 --> 00:42:22,756 So already, fundamentally, I'm definitely moving faster 999 00:42:22,756 --> 00:42:26,746 than Edward would have, given many more elements in his array, 1000 00:42:27,036 --> 00:42:29,296 which clearly is going to the case when we have more room 1001 00:42:29,296 --> 00:42:30,506 and more interesting problems. 1002 00:42:30,886 --> 00:42:31,946 So I've got two elements left. 1003 00:42:32,136 --> 00:42:34,096 I arbitrarily go right, as I've been doing. 1004 00:42:34,526 --> 00:42:35,996 Aha, the number 50. 1005 00:42:36,316 --> 00:42:38,996 Now, granted, and this is actually a use full take away, 1006 00:42:39,286 --> 00:42:41,726 granted, my algorithm was slower than Edward's. 1007 00:42:41,726 --> 00:42:42,926 Edward took two steps. 1008 00:42:43,376 --> 00:42:46,056 I took three steps, and yet I'm going to make the claim, 1009 00:42:46,096 --> 00:42:48,526 arrogantly, nonetheless, my algorithm is better 1010 00:42:48,946 --> 00:42:52,416 because as N increases, asymptotically, the bigger 1011 00:42:52,416 --> 00:42:55,116 and bigger N gets, the more and more you'll be impressed 1012 00:42:55,116 --> 00:42:57,726 by the running time of my algorithm, and more 1013 00:42:57,726 --> 00:43:01,186 and more Edward's algorithm, in all fairness, just devolves back 1014 00:43:01,186 --> 00:43:04,346 into a very slow but correct implementation 1015 00:43:04,416 --> 00:43:06,076 of the same idea of search. 1016 00:43:06,366 --> 00:43:09,426 Now as an aside, for the astute, this just so happens to be all 1017 00:43:09,426 --> 00:43:11,526 of the computer science courses you're qualified to take 1018 00:43:11,596 --> 00:43:13,626 after or before C S 50. 1019 00:43:13,626 --> 00:43:17,086 So keep that in mind at shopping terms -- shopping periods time. 1020 00:43:17,646 --> 00:43:22,116 But for now, the take away is that very subliminal, I know, 1021 00:43:22,896 --> 00:43:25,906 maybe it's not subliminal if I tell you what they are, 1022 00:43:26,156 --> 00:43:28,976 so for now we just need a way of expressing this. 1023 00:43:29,276 --> 00:43:31,546 So there's many different ways we can express this. 1024 00:43:31,846 --> 00:43:34,746 This is what I would call an iterative implementation 1025 00:43:35,126 --> 00:43:36,256 of binary search. 1026 00:43:36,256 --> 00:43:38,936 Thus far in the class we've only talked about phone books 1027 00:43:38,936 --> 00:43:41,956 and splitting them in half, throwing half away, repeating. 1028 00:43:42,296 --> 00:43:44,936 But now you have this ability to program a machine, 1029 00:43:44,936 --> 00:43:46,286 you kind of start -- you need 1030 00:43:46,286 --> 00:43:48,696 to start expressing yourself more precisely. 1031 00:43:48,696 --> 00:43:54,126 You can't just tell the computer find me the number 50, or 52, 1032 00:43:54,196 --> 00:43:56,316 you need to tell it how to do that. 1033 00:43:56,316 --> 00:43:58,086 You need to teach the machine, so to speak. 1034 00:43:58,346 --> 00:44:01,486 So one way we might imply that in pseudo code is as follows. 1035 00:44:01,486 --> 00:44:05,016 On the input of an array, where we call it array bracket zero 1036 00:44:05,016 --> 00:44:07,076 on up to N minus 1, so the length 1037 00:44:07,076 --> 00:44:09,566 of most any array we ever discuss now is N, 1038 00:44:09,826 --> 00:44:13,306 but the largest element index is obviously N minus 1, 1039 00:44:13,446 --> 00:44:15,726 we start counting from zero, and input K, 1040 00:44:15,726 --> 00:44:17,286 where K is the value I'm looking for. 1041 00:44:17,526 --> 00:44:19,956 I kind of need to implement this idea of binary search. 1042 00:44:20,516 --> 00:44:22,116 So I'm going to define 1043 00:44:22,116 --> 00:44:24,326 in italics here a variable called first. 1044 00:44:24,466 --> 00:44:25,566 And initialize it to zero. 1045 00:44:25,566 --> 00:44:27,476 I'm going to define another variable called last, 1046 00:44:27,706 --> 00:44:29,296 and initialize it to N minus 1. 1047 00:44:29,586 --> 00:44:32,226 And what the subsequent steps do, if you think it through, 1048 00:44:32,226 --> 00:44:35,746 as a future Problem Set may ask you to do, it's going to ask you 1049 00:44:35,746 --> 00:44:38,726 to implement very precisely this approach. 1050 00:44:38,796 --> 00:44:41,216 Here's N elements, I'm going to use my left hand 1051 00:44:41,216 --> 00:44:44,366 as my left variable, my right hand as my right variable, 1052 00:44:44,626 --> 00:44:46,786 and what is it telling me to do on the first step? 1053 00:44:47,206 --> 00:44:49,436 Well, while first is less than or equal to last. 1054 00:44:49,696 --> 00:44:52,316 So good, my left hand is indeed less than my right hand, 1055 00:44:52,316 --> 00:44:53,496 because it's to the left of it. 1056 00:44:54,006 --> 00:44:55,056 What do I do? 1057 00:44:55,696 --> 00:44:58,906 Let middle equals first, plus last, divided by 2. 1058 00:44:59,486 --> 00:45:02,226 So what that math does for me is if I add this index 1059 00:45:02,226 --> 00:45:05,836 and this index, divide by 2 and take care of the rounding, 1060 00:45:06,096 --> 00:45:07,316 that gives me a number in the middle. 1061 00:45:07,426 --> 00:45:09,126 Let's say 105 for now. 1062 00:45:09,506 --> 00:45:11,486 So middle equals that element there. 1063 00:45:11,486 --> 00:45:15,396 So if K is less than the middle element in the array, 1064 00:45:15,396 --> 00:45:18,406 then let last equal middle minus 1. 1065 00:45:18,406 --> 00:45:21,186 All right, so I'm looking for the number 50. 1066 00:45:21,396 --> 00:45:25,306 So 50 is indeed less than the middle element of the array. 1067 00:45:25,586 --> 00:45:26,476 So what do I do? 1068 00:45:26,476 --> 00:45:31,186 Well the then condition says then let last equal middle 1069 00:45:31,186 --> 00:45:31,716 minus 1. 1070 00:45:31,816 --> 00:45:35,376 Here's last, here's middle, here's middle minus 1. 1071 00:45:35,376 --> 00:45:38,156 And already in one iteration we've seen how 1072 00:45:38,156 --> 00:45:40,276 to translate a very simple idea, right? 1073 00:45:40,276 --> 00:45:43,896 Split the problem in half, in half, to something much closer 1074 00:45:44,096 --> 00:45:46,186 to what GCC could understand, 1075 00:45:46,186 --> 00:45:48,666 much closer to what you could actually code up in nano. 1076 00:45:48,946 --> 00:45:51,406 And if you follow the logic there, what you'll find is 1077 00:45:51,406 --> 00:45:54,106 that iteratively [Inaudible] my right hand, keep moving closer 1078 00:45:54,106 --> 00:45:56,786 to my left, sometimes my left goes closer to my right, 1079 00:45:57,076 --> 00:45:59,896 if my hands ever cross what's the implication, presumably? 1080 00:45:59,896 --> 00:46:02,006 That it's not there. 1081 00:46:02,006 --> 00:46:04,716 I'm looking for 52, and you know, I went too far 1082 00:46:04,716 --> 00:46:06,616 to the right, too far to the left, my hands cross 1083 00:46:06,616 --> 00:46:10,036 because it's just not there, or I end up smack dab on top 1084 00:46:10,036 --> 00:46:11,226 of the number I'm looking for. 1085 00:46:11,226 --> 00:46:13,276 Whether it's 50 or 52. 1086 00:46:13,656 --> 00:46:16,096 But what we've sacrificed here is this beauty 1087 00:46:16,216 --> 00:46:17,426 of the original algorithm. 1088 00:46:17,426 --> 00:46:19,566 What's really cool about divide and conquer is it's 1089 00:46:19,566 --> 00:46:20,906 so damn simple to explain. 1090 00:46:20,906 --> 00:46:24,076 Take the phone book, split it in half, look at the elements. 1091 00:46:24,076 --> 00:46:26,766 If it's less than, go left, if it's greater than, go right. 1092 00:46:27,456 --> 00:46:30,006 Repeat. There's this elegance about this, 1093 00:46:30,006 --> 00:46:32,076 where you just say divide the problem in half, 1094 00:46:32,296 --> 00:46:34,706 check for something, and then repeat, repeat. 1095 00:46:35,046 --> 00:46:38,466 This really clouds some of the simplicity of that algorithm. 1096 00:46:38,746 --> 00:46:42,146 But we have this ability, it turns out, in C, and in Java, 1097 00:46:42,146 --> 00:46:45,056 in C++, and all of these languages, particularly ones 1098 00:46:45,056 --> 00:46:47,876 like Lisp and Scheme, which you could cover in C S 51, 1099 00:46:48,416 --> 00:46:50,336 of this feature of recursion. 1100 00:46:50,806 --> 00:46:53,576 So it turns out that whenever we've said something 1101 00:46:53,576 --> 00:46:56,516 like repeat, you can repeat 1102 00:46:56,806 --> 00:46:59,396 by essentially applying the same algorithm again. 1103 00:46:59,876 --> 00:47:02,216 Now thus far in C, how do you implement an algorithm? 1104 00:47:02,216 --> 00:47:05,026 Well, you write a function like cipher or Caesar 1105 00:47:05,216 --> 00:47:06,756 that implements some algorithm. 1106 00:47:07,056 --> 00:47:09,996 So if we ever have to express this idea of repeat, 1107 00:47:10,356 --> 00:47:15,286 maybe we can have a function just invoke itself again 1108 00:47:15,576 --> 00:47:17,256 on a smaller piece of the puzzle. 1109 00:47:17,256 --> 00:47:18,556 Well, what do we mean by that. 1110 00:47:18,556 --> 00:47:20,716 Well, let's take a look at this very simple program. 1111 00:47:20,716 --> 00:47:23,556 So this is sigma 1.C, this is one of today's two print outs, 1112 00:47:23,976 --> 00:47:25,996 two pieces of code that's printed out. 1113 00:47:26,286 --> 00:47:27,856 So this is going to be a function that returns 1114 00:47:27,856 --> 00:47:32,446 that an int it's called sigma in homage to the little symbol 1115 00:47:32,496 --> 00:47:34,086 from your math book that looks like this, 1116 00:47:34,206 --> 00:47:35,476 and just implies summation. 1117 00:47:35,886 --> 00:47:37,656 So I want to do this. 1118 00:47:37,776 --> 00:47:42,556 I want to sum up all of the numbers from I equals 0 to M. 1119 00:47:43,196 --> 00:47:45,286 So if I want to express this like the back 1120 00:47:45,286 --> 00:47:46,666 of a high school math book which, 1121 00:47:46,666 --> 00:47:50,156 this is I equals 0 on up to M of I. 1122 00:47:50,476 --> 00:47:53,356 So I just want to compute the sum of all the numbers from 0 1123 00:47:53,356 --> 00:47:55,886 up to M. Okay, so not an interesting problem, 1124 00:47:55,886 --> 00:47:57,816 but it's interesting how we can solve it. 1125 00:47:58,246 --> 00:48:00,136 So here's one little sanity check. 1126 00:48:00,136 --> 00:48:02,216 When we said last week you better get into the habit 1127 00:48:02,216 --> 00:48:04,006 for your own sake and your program's sake 1128 00:48:04,266 --> 00:48:07,806 for error checking, you've got to do sanity checks like this. 1129 00:48:08,086 --> 00:48:09,736 Just because the function is going to be passed 1130 00:48:09,736 --> 00:48:14,286 in int per the definition, per its signature, so to speak, 1131 00:48:14,286 --> 00:48:17,256 per its declaration up here, that doesn't mean it's going 1132 00:48:17,256 --> 00:48:18,406 to be an int you want. 1133 00:48:18,466 --> 00:48:19,486 It might be negative. 1134 00:48:19,486 --> 00:48:20,366 It might be zero. 1135 00:48:20,366 --> 00:48:22,826 Neither of which we want if we're trying to sum 1136 00:48:22,826 --> 00:48:24,216 up positive values only. 1137 00:48:24,676 --> 00:48:26,936 So I'm going to avoid the risk of an infinite loop 1138 00:48:27,086 --> 00:48:28,176 by just doing a sanity check. 1139 00:48:28,296 --> 00:48:31,066 If M is less than 1, return zero. 1140 00:48:31,716 --> 00:48:34,366 So here's an interesting feature of C and a lot of languages. 1141 00:48:34,726 --> 00:48:37,586 If you have to somehow signify an error condition, 1142 00:48:38,136 --> 00:48:42,026 some languages, as you may know from high school, like Java, 1143 00:48:42,026 --> 00:48:44,386 have this thing called exceptions where you can kind 1144 00:48:44,386 --> 00:48:47,416 of raise a red flag and not bother returning a value at all. 1145 00:48:47,696 --> 00:48:49,656 But languages like C require 1146 00:48:49,656 --> 00:48:52,316 that a function return a value, generally. 1147 00:48:52,566 --> 00:48:53,586 In this is case an int. 1148 00:48:53,906 --> 00:48:56,436 And sometimes you need to return what we called last week a 1149 00:48:56,436 --> 00:48:59,006 special sentinel value, a special value 1150 00:48:59,006 --> 00:49:02,206 that the person calling your function had better check 1151 00:49:02,206 --> 00:49:05,286 for if they want to see if your function worked correctly. 1152 00:49:05,626 --> 00:49:08,856 So it's not sufficient for a function like this to just print 1153 00:49:08,856 --> 00:49:10,896 out warning, M is less than 1. 1154 00:49:11,136 --> 00:49:14,196 Because how is the code that called this function going 1155 00:49:14,196 --> 00:49:16,016 to know that an error occurred, right? 1156 00:49:16,016 --> 00:49:18,246 This is the point we were trying to make last week 1157 00:49:18,246 --> 00:49:20,436 with side effects versus return values. 1158 00:49:20,696 --> 00:49:22,116 A human can see side effects. 1159 00:49:22,176 --> 00:49:24,546 A computer can only see return values. 1160 00:49:24,546 --> 00:49:27,486 So if we're trying to write code that calls this function 1161 00:49:27,736 --> 00:49:30,546 and I want my code to be able to check if an error happens, 1162 00:49:30,746 --> 00:49:34,626 I can't rely on print def, I have to rely on return values. 1163 00:49:34,936 --> 00:49:37,416 So what value could I return in the event 1164 00:49:37,446 --> 00:49:39,406 that the input just doesn't make sense? 1165 00:49:39,736 --> 00:49:41,656 Well, here I arbitrarily chose zero. 1166 00:49:41,916 --> 00:49:44,616 What might have been another reasonable value to return 1167 00:49:44,616 --> 00:49:46,166 to signify essentially error? 1168 00:49:47,676 --> 00:49:49,406 Negative 1, why? 1169 00:49:49,406 --> 00:49:50,096 Well just because. 1170 00:49:50,166 --> 00:49:54,496 If you return a negative value clearly the caller, the function 1171 00:49:54,496 --> 00:49:56,606 that called this function can infer 1172 00:49:56,606 --> 00:49:58,846 by getting back a negative number when the whole point 1173 00:49:58,846 --> 00:50:00,986 of the exercise was summation of positive numbers, 1174 00:50:01,336 --> 00:50:02,386 that something went wrong. 1175 00:50:02,606 --> 00:50:05,906 So think too, back to last week when we mentioned very briefly, 1176 00:50:05,906 --> 00:50:07,226 because we'll come back to it, 1177 00:50:07,226 --> 00:50:10,756 that get string sometimes can return a bogus value 1178 00:50:10,756 --> 00:50:12,266 or not return a string per se. 1179 00:50:12,266 --> 00:50:15,416 It can sometimes return a special sentinel called what? 1180 00:50:15,796 --> 00:50:19,326 This was null, so we'll see this many more times, 1181 00:50:19,326 --> 00:50:20,116 it hasn't really come up, 1182 00:50:20,116 --> 00:50:24,846 but N-U-L-L in all caps is a constant that signifies, 1183 00:50:25,306 --> 00:50:28,166 there is no string, but I'm a function with a return value, 1184 00:50:28,166 --> 00:50:29,486 I have to return something. 1185 00:50:29,716 --> 00:50:32,376 So when you start consulting man pages later in the course 1186 00:50:32,376 --> 00:50:34,756 for new functions that you might want to employ, 1187 00:50:34,966 --> 00:50:36,836 you'll need to check the section of the manual 1188 00:50:36,836 --> 00:50:39,386 that says what return values does this function have, 1189 00:50:40,226 --> 00:50:42,236 because sometimes you want to check with your own little 1190 00:50:42,236 --> 00:50:45,406 if condition, did I get back a value I care about. 1191 00:50:45,666 --> 00:50:47,026 Well, this code is easy. 1192 00:50:47,126 --> 00:50:50,046 Now once I've done this sanity check and I want to sum 1193 00:50:50,046 --> 00:50:53,456 up the numbers from 1 through M or equivalently 0 through M, 1194 00:50:53,456 --> 00:50:54,876 I'm going to do this, I'm going 1195 00:50:54,876 --> 00:50:56,466 to declare a variable called sum, 1196 00:50:56,466 --> 00:50:58,426 it's going to be a type int, and I'm going 1197 00:50:58,426 --> 00:50:59,656 to initialize it to zero. 1198 00:50:59,876 --> 00:51:03,196 I'm then going to iterate from I equals 1 all the way up to M -- 1199 00:51:03,606 --> 00:51:05,376 equals, less than or equals to M -- 1200 00:51:05,376 --> 00:51:08,766 and then I++, and then what does this code ultimately do? 1201 00:51:08,766 --> 00:51:11,366 It just sums up all the numbers from 1 to M 1202 00:51:11,366 --> 00:51:14,176 or 0 to M. All right, so if I see this in action, 1203 00:51:14,176 --> 00:51:16,136 let me go ahead to a terminal window here. 1204 00:51:16,306 --> 00:51:22,066 I've got -- let's go ahead and open a new one, let me go ahead 1205 00:51:22,066 --> 00:51:25,326 and SSH to nice.fas.harvard.edu, 1206 00:51:25,986 --> 00:51:33,406 and if I go into my CS 50 directory, okay, Lecture 3, 1207 00:51:34,156 --> 00:51:35,596 Week 3, okay, sigma one. 1208 00:51:35,816 --> 00:51:38,196 So this is the same exact code. 1209 00:51:38,666 --> 00:51:41,326 Notice -- oh, why do I have this at the top of this file? 1210 00:51:41,326 --> 00:51:45,876 What's the point of significant -- 1211 00:51:45,876 --> 00:51:48,026 declaring sigma as a prototype up there? 1212 00:51:48,026 --> 00:51:53,376 Yeah, so remember from last week, 1213 00:51:53,466 --> 00:51:56,196 if you are calling a function that just so happens 1214 00:51:56,196 --> 00:51:58,906 to be implemented later in the file but you're trying 1215 00:51:58,906 --> 00:52:02,766 to call it from main, which apparently I am, so this was not 1216 00:52:02,766 --> 00:52:06,236 on the slide a moment ago, but apparently I am calling sigma 1217 00:52:06,326 --> 00:52:08,526 from main -- it's a stupid detail, but you've got 1218 00:52:08,526 --> 00:52:11,716 to proactively tell the compiler hey, there is a function 1219 00:52:11,716 --> 00:52:15,306 in this file called sigma, it returns an int, it takes an int, 1220 00:52:15,646 --> 00:52:17,436 just know to expect it eventually. 1221 00:52:17,696 --> 00:52:18,816 And realize this too. 1222 00:52:18,816 --> 00:52:20,906 It turns out that when you declare a prototype 1223 00:52:20,906 --> 00:52:24,686 for a function, you don't have to provide the names 1224 00:52:24,686 --> 00:52:26,936 of the variables, all the compiler needs to know 1225 00:52:26,996 --> 00:52:28,776 at that point in time is what types 1226 00:52:28,936 --> 00:52:30,786 of variables it has as arguments. 1227 00:52:30,786 --> 00:52:32,656 So you can literally just write inter, 1228 00:52:32,816 --> 00:52:35,316 or if it takes two arguments, int, comma, whatever. 1229 00:52:35,786 --> 00:52:37,336 So either approach is fine. 1230 00:52:37,336 --> 00:52:41,356 I could have specified the name of the variable up here 1231 00:52:41,356 --> 00:52:44,246 with int-M, but it just isn't necessary. 1232 00:52:44,246 --> 00:52:45,696 So I tend to be fairly minimalist 1233 00:52:45,696 --> 00:52:46,496 when I write the code. 1234 00:52:46,696 --> 00:52:48,436 All right, so here's the function down there. 1235 00:52:48,666 --> 00:52:50,006 And notice one other detail, 1236 00:52:50,126 --> 00:52:51,886 just to make this more compelling. 1237 00:52:52,176 --> 00:52:55,246 Why did I not do this, because this was a question asked, 1238 00:52:55,246 --> 00:52:57,036 I think on the bulletin board this weekend. 1239 00:52:57,366 --> 00:53:00,116 Why did I not do -- say, this. 1240 00:53:02,056 --> 00:53:04,536 Probably a couple problems here. 1241 00:53:05,106 --> 00:53:07,906 What's one, yeah? 1242 00:53:08,031 --> 00:53:10,031 [ Inaudible audience comment ] 1243 00:53:10,046 --> 00:53:12,916 >> Yeah. So there's a logical error, declaring int, 1244 00:53:13,246 --> 00:53:15,706 reinitializing sum to zero every time. 1245 00:53:15,706 --> 00:53:16,626 So that's not good. 1246 00:53:16,926 --> 00:53:19,316 What if I did this, as I've seen before. 1247 00:53:19,316 --> 00:53:21,296 I can sometimes initialize two variables 1248 00:53:21,906 --> 00:53:23,586 to a value in my for loop. 1249 00:53:23,586 --> 00:53:26,016 So that seems to be okay. 1250 00:53:26,016 --> 00:53:27,886 But what kind of error am I going to run into next. 1251 00:53:28,446 --> 00:53:29,726 Some of you may have -- yeah? 1252 00:53:30,636 --> 00:53:32,336 Oh, this is -- don't stretch like this during class. 1253 00:53:32,336 --> 00:53:34,366 Okay. Yeah. 1254 00:53:34,366 --> 00:53:34,976 In the back. 1255 00:53:35,516 --> 00:53:40,436 [ Inaudible audience comment ] 1256 00:53:40,936 --> 00:53:41,856 >> Exactly. 1257 00:53:41,856 --> 00:53:43,936 Remember last week's discussion of scope. 1258 00:53:43,986 --> 00:53:46,636 So generally, when you declare a variable, 1259 00:53:46,636 --> 00:53:50,486 say inside curly braces, or in this case, inside the for loop, 1260 00:53:50,576 --> 00:53:54,066 that variable only lives inside of that for loop, 1261 00:53:54,096 --> 00:53:55,636 which means this math is perfect. 1262 00:53:55,806 --> 00:53:58,566 We initialize sum to zero, we keep adding to sum I, 1263 00:53:58,816 --> 00:54:01,666 but the moment down here I try to return sum, 1264 00:54:01,996 --> 00:54:06,256 my compiler is going to complain to me sum not declared, 1265 00:54:06,416 --> 00:54:08,956 even though it is, because it's outside of the scope. 1266 00:54:08,956 --> 00:54:10,916 So it was very deliberate a moment ago 1267 00:54:10,916 --> 00:54:14,436 when I declared this variable outside of the loop 1268 00:54:14,546 --> 00:54:16,966 so that it would remain in scope the entire time. 1269 00:54:16,966 --> 00:54:18,496 All right, so let's go ahead and save this. 1270 00:54:18,496 --> 00:54:20,296 Let's make sigma 1. 1271 00:54:20,606 --> 00:54:22,716 I'm not going to bother typing GGC and all that, 1272 00:54:22,716 --> 00:54:24,816 because this handles a lot of the magic for me now. 1273 00:54:24,816 --> 00:54:27,256 I'm going to go ahead and dot slash sigma 1. 1274 00:54:27,676 --> 00:54:29,686 Be ever so clear for security reasons that it's 1275 00:54:29,686 --> 00:54:30,826 in this current directory. 1276 00:54:30,826 --> 00:54:35,736 Enter, positive integer please, let's say 10. 1277 00:54:35,736 --> 00:54:36,646 Hopefully, that's correct. 1278 00:54:37,346 --> 00:54:38,516 9. All right. 1279 00:54:38,516 --> 00:54:39,736 Let's do one I can check quickly. 1280 00:54:39,866 --> 00:54:44,216 3. Yes, I'm pretty sure that 3 plus 2 plus 1 equals in fact 6. 1281 00:54:44,216 --> 00:54:45,776 So feels like this code is correct. 1282 00:54:46,206 --> 00:54:49,036 But turns out we can implement this in a very different way. 1283 00:54:49,176 --> 00:54:50,876 How it this function get called first? 1284 00:54:51,196 --> 00:54:54,366 Well, here is an example of where do while is useful. 1285 00:54:54,596 --> 00:54:58,276 We said last week -- we said last week 1286 00:54:58,576 --> 00:55:01,266 that do while is often useful when you want to ask the user 1287 00:55:01,266 --> 00:55:03,136 for something, give them a chance to give it to you, 1288 00:55:03,136 --> 00:55:06,136 and then reprimand them or ask them again if they fail 1289 00:55:06,136 --> 00:55:07,596 to provide what you're looking for. 1290 00:55:07,916 --> 00:55:09,586 So here's exactly that scenario. 1291 00:55:09,586 --> 00:55:12,476 Declare an int called N. I'm going to do the following. 1292 00:55:12,536 --> 00:55:14,786 Tell the user positive integer, please. 1293 00:55:14,886 --> 00:55:16,156 I'm going to then use get int 1294 00:55:16,156 --> 00:55:17,786 from the CS 50 library to get the int. 1295 00:55:17,786 --> 00:55:21,496 And then right here rather elegantly am I going to check 1296 00:55:21,496 --> 00:55:25,356 within my while construct, if N is at this point in time less 1297 00:55:25,406 --> 00:55:27,976 than 1, do it again, and do it again. 1298 00:55:28,546 --> 00:55:31,606 So this is a general approach, do while, to getting user input 1299 00:55:31,606 --> 00:55:34,936 and the result to be clear is if I try messing with the computer 1300 00:55:34,936 --> 00:55:38,036 and say negative 123 or negative 6, 1301 00:55:38,036 --> 00:55:39,386 it's just going to keep pestering me. 1302 00:55:39,656 --> 00:55:42,936 But remember, my random word of last week, monkey, 1303 00:55:43,166 --> 00:55:46,216 why does it say retry instead of positive integer here? 1304 00:55:47,416 --> 00:55:49,526 So this is [Inaudible] to get int. 1305 00:55:49,526 --> 00:55:52,876 So as the name get int implies, get int will do its best 1306 00:55:52,876 --> 00:55:55,716 to get an int from the user, but it doesn't guarantee it's going 1307 00:55:55,716 --> 00:55:57,956 to be positive or negative or anything like that. 1308 00:55:58,066 --> 00:55:59,066 It just gets you an int. 1309 00:55:59,546 --> 00:56:02,146 So turns out we can implement this same function 1310 00:56:02,146 --> 00:56:05,326 in a really interesting way that hints at the beauty 1311 00:56:05,496 --> 00:56:07,226 of this thing called recursion. 1312 00:56:07,586 --> 00:56:10,266 A function is recursive if it calls itself. 1313 00:56:10,776 --> 00:56:12,926 Now on first glance or first hear 1314 00:56:13,236 --> 00:56:14,586 that might be a little worrisome. 1315 00:56:14,586 --> 00:56:16,526 If I'm a function and I got called with an input, 1316 00:56:16,526 --> 00:56:19,126 and I try to solve that problem by calling myself, 1317 00:56:19,426 --> 00:56:21,716 that should immediately should infinite loop, 1318 00:56:21,716 --> 00:56:22,726 bad stuff happens. 1319 00:56:23,076 --> 00:56:25,516 But hopefully there's some way of making sure 1320 00:56:25,516 --> 00:56:27,016 that you don't keep calling yourself 1321 00:56:27,016 --> 00:56:28,596 in perpetuity, and there is. 1322 00:56:28,596 --> 00:56:29,546 It's called the base case. 1323 00:56:29,976 --> 00:56:33,676 So notice this implementation of Signa is only four lines 1324 00:56:33,676 --> 00:56:36,656 of code, and frankly, if I got rid of some of the indentation, 1325 00:56:36,656 --> 00:56:38,566 all that, it's maybe like a two line program, 1326 00:56:38,566 --> 00:56:40,116 maybe a one line program 1327 00:56:40,476 --> 00:56:43,176 that implements exactly the same function, 1328 00:56:43,176 --> 00:56:46,016 exactly the same formula of adding numbers together. 1329 00:56:46,296 --> 00:56:47,046 So what do I do? 1330 00:56:47,046 --> 00:56:50,726 On input M I ask myself is M less than or equal to 0. 1331 00:56:51,046 --> 00:56:54,106 If so, just return 0, because bad things will happen 1332 00:56:54,106 --> 00:56:56,136 if I start trying to deal with negative numbers. 1333 00:56:56,656 --> 00:56:59,006 Now apparently all I need to do 1334 00:56:59,006 --> 00:57:01,476 to answer this question is call myself. 1335 00:57:01,476 --> 00:57:04,636 And this is where you get this weird sort of circular magic. 1336 00:57:04,946 --> 00:57:07,796 Notice that all I do otherwise, after my base case, 1337 00:57:07,926 --> 00:57:12,036 after my sanity check, is return the result of M, 1338 00:57:12,426 --> 00:57:16,336 the value I was passed, plus sigma of M minus 1. 1339 00:57:17,046 --> 00:57:20,196 Now that alone actually works. 1340 00:57:20,196 --> 00:57:23,926 If I go back to my terminal and I compile sigma 2, 1341 00:57:23,926 --> 00:57:28,966 and I run sigma 2, and I give it the number 3, I get back 6. 1342 00:57:28,966 --> 00:57:30,606 If I give it 10, I get 55. 1343 00:57:30,606 --> 00:57:34,126 In fact, it works, and yet how is it doing that? 1344 00:57:34,126 --> 00:57:34,986 Well, let's see. 1345 00:57:35,306 --> 00:57:37,696 So it turns out that this problem 1346 00:57:37,696 --> 00:57:41,166 of adding all the numbers from 0 on up on M, 1347 00:57:41,166 --> 00:57:42,986 or equivalently M all the way 1348 00:57:42,986 --> 00:57:45,526 down to 0 is pretty easy, even conceptually. 1349 00:57:45,526 --> 00:57:48,276 If you hand me the number M and ask me what's the sum from M 1350 00:57:48,276 --> 00:57:50,866 down to 0, well, I could say, all right, well, 1351 00:57:50,866 --> 00:57:54,326 it's M plus the sum of all the numbers 1352 00:57:54,326 --> 00:57:56,256 from M minus 1 on down there. 1353 00:57:56,396 --> 00:57:58,296 All right, so let me ask you again. 1354 00:57:58,296 --> 00:58:01,226 What is the sum of M minus 1 on down to 0. 1355 00:58:01,226 --> 00:58:03,806 Well, it's M minus 1 plus the sum 1356 00:58:03,806 --> 00:58:05,986 of M minus 2 all the way down to 0. 1357 00:58:05,986 --> 00:58:08,666 Right? You can be this really argumentative human pushing back 1358 00:58:08,766 --> 00:58:12,126 on the person in wise-ass form, oh, well, it's just the answer 1359 00:58:12,126 --> 00:58:13,646 of M plus the rest itself. 1360 00:58:13,896 --> 00:58:14,856 Well, what's the rest of it. 1361 00:58:14,856 --> 00:58:17,156 Well, it's a little bit less than the rest of it, 1362 00:58:17,226 --> 00:58:18,356 plus the rest of that. 1363 00:58:18,746 --> 00:58:20,646 Right, you just keep taking this bite, this bite, 1364 00:58:20,706 --> 00:58:22,846 minus 1 out of each piece of the puzzle. 1365 00:58:23,116 --> 00:58:25,506 And that would seem to go on forever, but not quite. 1366 00:58:25,506 --> 00:58:29,246 Because eventually when I get over here, my last piece 1367 00:58:29,246 --> 00:58:31,786 of the puzzle is going to be M equals what? 1368 00:58:32,146 --> 00:58:35,086 Zero. And that's when I hit the so-called base case, 1369 00:58:35,556 --> 00:58:38,416 when all of my answers now start to bubble up. 1370 00:58:38,656 --> 00:58:40,756 In a sense, I was kind of passing the buck 1371 00:58:40,956 --> 00:58:43,406 by playing this game with the person asking the question, 1372 00:58:43,406 --> 00:58:45,356 well, what's M plus all the numbers below it? 1373 00:58:45,356 --> 00:58:48,426 Well, it's M plus the sum of all the numbers below it. 1374 00:58:48,426 --> 00:58:49,546 What's the sum of that, right. 1375 00:58:49,546 --> 00:58:51,486 Eventually, we bottom out at zero, 1376 00:58:51,486 --> 00:58:54,316 and then I finally start getting useful answers back 1377 00:58:54,576 --> 00:58:57,056 until the M result is precisely the same. 1378 00:58:57,236 --> 00:58:59,716 But what's been the point of all of this? 1379 00:59:00,106 --> 00:59:02,476 Well, and again, this is something that you perhaps begin 1380 00:59:02,476 --> 00:59:05,326 to see eventually, and maybe you never see, but there's kind 1381 00:59:05,326 --> 00:59:08,866 of a beauty about expressing something as silly as addition 1382 00:59:08,866 --> 00:59:12,086 of numbers with just this recursive scenario. 1383 00:59:12,376 --> 00:59:14,646 And this will be a very representative approach 1384 00:59:14,646 --> 00:59:17,736 to solving much bigger, much more interesting problems. 1385 00:59:18,006 --> 00:59:18,576 But there is a catch. 1386 00:59:19,376 --> 00:59:23,486 Let me go ahead and run sigma of one, my first version, on 100. 1387 00:59:23,826 --> 00:59:24,726 That works pretty well. 1388 00:59:24,726 --> 00:59:26,446 How about 1,000. 1389 00:59:26,446 --> 00:59:27,136 Works pretty well. 1390 00:59:27,136 --> 00:59:28,376 How about 10,000. 1391 00:59:28,896 --> 00:59:29,696 Still working. 1392 00:59:29,766 --> 00:59:33,406 100,000. All right, then eventually I'm going 1393 00:59:33,406 --> 00:59:34,626 to start to exhaust an int. 1394 00:59:35,156 --> 00:59:38,066 Okay, so now I exhausted the int, but we would expect that. 1395 00:59:38,066 --> 00:59:40,006 Let me try now to run sigma 2. 1396 00:59:40,246 --> 00:59:41,786 So it's a more elegant solution, 1397 00:59:41,836 --> 00:59:43,796 feels like it should be even faster 1398 00:59:43,796 --> 00:59:47,396 because there's just frankly fewer lines of code, well, 1399 00:59:47,396 --> 00:59:48,926 let's see what happens. 1400 00:59:49,026 --> 00:59:50,796 Sigma 2 -- oops, 10. 1401 00:59:50,966 --> 00:59:52,656 That works, how about 100. 1402 00:59:52,656 --> 00:59:53,136 That works. 1403 00:59:53,606 --> 00:59:55,106 How about 1,000. 1404 00:59:55,236 --> 00:59:56,026 Still working. 1405 00:59:56,476 --> 00:59:58,186 10,000. All right. 1406 00:59:58,456 --> 01:00:02,576 100,000. Okay, come on, break. 1407 01:00:03,376 --> 01:00:05,886 Oh, now some of you have seen this problem too. 1408 01:00:05,886 --> 01:00:07,146 And we actually smiled. 1409 01:00:07,146 --> 01:00:07,856 So in all fairness 1410 01:00:07,906 --> 01:00:09,706 to the anonymized student asking the question 1411 01:00:09,706 --> 01:00:12,286 on the bulletin board, is it okay if my program quits 1412 01:00:12,326 --> 01:00:14,286 by saying segmentation fault, [Inaudible] dump. 1413 01:00:14,686 --> 01:00:16,566 No, so this is a bad thing. 1414 01:00:18,586 --> 01:00:19,756 Bless your heart for asking, 1415 01:00:19,756 --> 01:00:21,336 but this is not a solution to the problem. 1416 01:00:21,516 --> 01:00:23,636 This is, as I said in the bulletin board post, frankly -- 1417 01:00:23,876 --> 01:00:26,446 or maybe -- actually, maybe this was a private e-mail, 1418 01:00:26,646 --> 01:00:28,346 as I said in my response -- 1419 01:00:29,246 --> 01:00:32,676 as I said in this response this is sort of the LINUX equivalent 1420 01:00:32,746 --> 01:00:35,116 of Microsoft Word just hanging, 1421 01:00:35,116 --> 01:00:36,956 or your Mac program just freezing 1422 01:00:36,956 --> 01:00:40,036 or unexpectedly quitting, which is ever so annoying. 1423 01:00:40,256 --> 01:00:42,136 This is sort of the equivalent in the LINUX world. 1424 01:00:42,226 --> 01:00:44,306 But fortunately, now that we're the programmer 1425 01:00:44,306 --> 01:00:47,416 and we're not just the user, the fact that it's dumping core, 1426 01:00:47,416 --> 01:00:49,386 so to speak, is actually useful. 1427 01:00:49,386 --> 01:00:50,596 So for today's purposes, 1428 01:00:50,676 --> 01:00:53,256 segmentation fault means something bad happened 1429 01:00:53,256 --> 01:00:54,286 with respect to memory. 1430 01:00:54,326 --> 01:00:57,086 Because memory in your computer is generally modeled as a bunch 1431 01:00:57,086 --> 01:01:00,166 of segments, so a segmentation fault means something went awry 1432 01:01:00,166 --> 01:01:01,066 with regard to memory. 1433 01:01:01,446 --> 01:01:04,946 Core dumped means that something useful was left 1434 01:01:05,146 --> 01:01:06,326 in my current directory. 1435 01:01:06,326 --> 01:01:08,006 If I type LS for list, 1436 01:01:08,406 --> 01:01:10,566 notice that I have this file called core. 1437 01:01:10,876 --> 01:01:15,056 Core is actually a file containing pretty much the 1438 01:01:15,116 --> 01:01:18,516 contents of the computer's RAM at the moment in time 1439 01:01:18,516 --> 01:01:19,786 when my computer screwed up. 1440 01:01:20,096 --> 01:01:22,796 Now it's not all of the memory from all users on the system, 1441 01:01:22,796 --> 01:01:25,236 because that would be a security concern or privacy concern, 1442 01:01:25,526 --> 01:01:28,736 but it's diagnostically useful, we'll see in a week 1443 01:01:28,736 --> 01:01:31,796 or so once we start dissecting the insides of that file. 1444 01:01:31,796 --> 01:01:35,296 But for now the question is quite simply why it this happen, 1445 01:01:36,456 --> 01:01:40,186 why did it not appear capable of handling the same number 1446 01:01:40,186 --> 01:01:43,886 of zeros as the first version. 1447 01:01:43,886 --> 01:01:45,846 Now granted, the first version, still broken, 1448 01:01:46,286 --> 01:01:50,576 but it doesn't ultimately choke in the same horrific way. 1449 01:01:51,656 --> 01:01:56,096 So what is happening in sigma 2 that's not apparently happening 1450 01:01:56,166 --> 01:01:57,066 in sigma 1. 1451 01:01:57,366 --> 01:01:59,316 And again, some of you have tripped over this, and frankly, 1452 01:01:59,316 --> 01:02:01,776 by the end of the semester you will all have core dumped. 1453 01:02:03,296 --> 01:02:05,486 Why? What might be going wrong here? 1454 01:02:05,686 --> 01:02:11,356 So it has to do with memory. 1455 01:02:11,576 --> 01:02:13,966 Where is memory getting used in this program. 1456 01:02:13,966 --> 01:02:16,016 I don't even see any local variables other 1457 01:02:16,016 --> 01:02:18,316 than the parameter called M for this function. 1458 01:02:18,646 --> 01:02:20,016 So where's the memory getting used. 1459 01:02:20,016 --> 01:02:22,716 Like, why do I have memory problems? 1460 01:02:23,566 --> 01:02:28,976 Yeah, so when I call M -- when I call sigma does what? 1461 01:02:29,516 --> 01:02:31,736 [ Inaudible audience comment ] 1462 01:02:32,236 --> 01:02:36,486 >> So it makes -- so it allocates a bit more memory. 1463 01:02:36,486 --> 01:02:40,686 In fact, let me remind of this little picture from last week, 1464 01:02:40,686 --> 01:02:41,846 and we said we'd come back to this. 1465 01:02:42,086 --> 01:02:45,026 This recalls the rectangle that represents your computer's RAM. 1466 01:02:45,266 --> 01:02:47,276 But recall the more important point from last week 1467 01:02:47,276 --> 01:02:49,696 that any time you call a function a little chunk 1468 01:02:49,696 --> 01:02:51,396 of memory does get allocated 1469 01:02:51,396 --> 01:02:54,166 on the so-called stack for that function. 1470 01:02:54,166 --> 01:02:58,986 And that chunk of memory we call a frame, so if sigma is called 1471 01:02:58,986 --> 01:03:01,846 by name and then sigma calls who in version 2? 1472 01:03:03,036 --> 01:03:07,146 Sigma. And then sigma calls sigma, and sigma calls sigma, 1473 01:03:07,396 --> 01:03:09,716 you know, even though on this picture I've not implied 1474 01:03:09,716 --> 01:03:11,166 that there is an upper bound on the amount 1475 01:03:11,166 --> 01:03:13,506 of memory you can allocate, but in reality there is. 1476 01:03:13,506 --> 01:03:17,456 On a typical system, a given program often cannot use any 1477 01:03:17,456 --> 01:03:19,326 more than say 2 gigabytes of memory. 1478 01:03:19,586 --> 01:03:23,026 Now that's a lot of memory, but not necessarily for programs 1479 01:03:23,026 --> 01:03:25,576 like Photoshop, and so there actually are some interesting 1480 01:03:25,576 --> 01:03:26,946 real world implications here. 1481 01:03:27,276 --> 01:03:31,436 But the point is that rectangle eventually has a top to it. 1482 01:03:31,436 --> 01:03:34,366 And if you keep trying to put more and more stack frames 1483 01:03:34,436 --> 01:03:37,776 on your memory stack, eventually you're going to overlap 1484 01:03:37,976 --> 01:03:40,026 with something else that needs it more. 1485 01:03:40,026 --> 01:03:42,906 In fact, what did we say is at the very top of this rectangle, 1486 01:03:43,346 --> 01:03:44,506 if briefly, last week? 1487 01:03:45,086 --> 01:03:45,246 Sorry? 1488 01:03:45,246 --> 01:03:46,556 [ Inaudible audience comment ] 1489 01:03:46,556 --> 01:03:52,446 >> So there's a bunch of -- there's global variables, 1490 01:03:52,446 --> 01:03:54,266 but there's also the text segment, 1491 01:03:54,266 --> 01:03:56,096 as we called it, the program itself. 1492 01:03:56,166 --> 01:03:57,956 Your zeros and ones of your program itself. 1493 01:03:58,196 --> 01:04:00,506 Now this doesn't mean that you're overwriting your program 1494 01:04:00,506 --> 01:04:02,336 on disc, it doesn't mean you're erasing your code. 1495 01:04:02,606 --> 01:04:04,746 But it does mean that the operating system is just going 1496 01:04:04,746 --> 01:04:07,186 to bail all together if you try using memory 1497 01:04:07,486 --> 01:04:09,866 that already belongs to something else. 1498 01:04:11,496 --> 01:04:15,646 So let's go ahead and end there, and on Wednesday we'll figure 1499 01:04:15,646 --> 01:04:18,956 out how to work around this. 1500 01:04:19,516 --> 01:04:24,390 [ Background noise ]