1 00:00:00,000 --> 00:00:00,488 2 00:00:00,488 --> 00:00:10,800 >> [MUSIC PLAYING] 3 00:00:10,800 --> 00:00:13,500 DAVID MALAN: All right. 4 00:00:13,500 --> 00:00:14,670 All right, welcome back. 5 00:00:14,670 --> 00:00:18,120 So this is Week 4, the beginning thereof, already. 6 00:00:18,120 --> 00:00:21,320 And you'll recall that last week, we put code aside for just a little bit 7 00:00:21,320 --> 00:00:24,240 and we started talking a little more high-level, about things like 8 00:00:24,240 --> 00:00:28,130 searching and sorting, which though somewhat simple ideas, are 9 00:00:28,130 --> 00:00:31,840 representative of a class of problems you will begin to solve particularly 10 00:00:31,840 --> 00:00:34,820 as you start thinking about final projects and interesting solutions you 11 00:00:34,820 --> 00:00:36,760 might have to real-world problems. 12 00:00:36,760 --> 00:00:39,490 Now bubble sort was one of the simplest such algorithms, and it 13 00:00:39,490 --> 00:00:42,900 worked by having these small numbers in a list or in an array kind of 14 00:00:42,900 --> 00:00:46,530 bubble their way up to the top, and the big numbers move their way down to 15 00:00:46,530 --> 00:00:47,930 the end of that list. 16 00:00:47,930 --> 00:00:50,650 >> And recall that we could visualize bubble sort a little 17 00:00:50,650 --> 00:00:52,310 something like this. 18 00:00:52,310 --> 00:00:53,640 So let me go ahead and click Start. 19 00:00:53,640 --> 00:00:55,350 I've preselected bubble sort. 20 00:00:55,350 --> 00:00:58,920 And if you recall that the taller blue lines represent big numbers, small 21 00:00:58,920 --> 00:01:03,300 blue lines represent small numbers, as we go through this again and again and 22 00:01:03,300 --> 00:01:07,680 again, comparing two bars next to each other in red, we're going to swap the 23 00:01:07,680 --> 00:01:11,010 biggest and the smallest if they are out of order. 24 00:01:11,010 --> 00:01:14,150 >> So this will go on and go on and go on, and you'll see that the larger 25 00:01:14,150 --> 00:01:16,700 elements are making their way to the right, and the smaller elements are 26 00:01:16,700 --> 00:01:17,900 making their way to the left. 27 00:01:17,900 --> 00:01:21,380 But we began to quantify the efficiency, the 28 00:01:21,380 --> 00:01:22,970 quality of this algorithm. 29 00:01:22,970 --> 00:01:25,200 And we said that in the worst case, this algorithm took 30 00:01:25,200 --> 00:01:27,940 roughly how many steps? 31 00:01:27,940 --> 00:01:28,980 >> So n squared. 32 00:01:28,980 --> 00:01:30,402 And what was n? 33 00:01:30,402 --> 00:01:31,650 >> AUDIENCE: Number of elements. 34 00:01:31,650 --> 00:01:32,790 >> DAVID MALAN: So n was the number of elements. 35 00:01:32,790 --> 00:01:33,730 And so we'll do this often. 36 00:01:33,730 --> 00:01:36,650 Any time we want to talk about the size of a problem or the size of an 37 00:01:36,650 --> 00:01:39,140 input, or the amount of time it takes to produce output, we'll just 38 00:01:39,140 --> 00:01:41,610 generalized whatever the input is as n. 39 00:01:41,610 --> 00:01:45,970 So back in Week 0, the number pages in the phone book was n. 40 00:01:45,970 --> 00:01:47,550 The number of students in the room was n. 41 00:01:47,550 --> 00:01:49,630 So here, too, we're following that pattern. 42 00:01:49,630 --> 00:01:52,800 >> Now n squared isn't particularly fast, so we tried to do better. 43 00:01:52,800 --> 00:01:55,970 And so we looked at a couple of other algorithms, among which 44 00:01:55,970 --> 00:01:57,690 were selection sort. 45 00:01:57,690 --> 00:01:59,180 So selection sort was a little different. 46 00:01:59,180 --> 00:02:03,130 It was almost simpler, I dare say, whereby I started at the start of the 47 00:02:03,130 --> 00:02:06,740 list of our volunteers and I just again and again and again went through 48 00:02:06,740 --> 00:02:10,060 the list, plucking out the smallest element at a time and putting him or 49 00:02:10,060 --> 00:02:13,040 her at the beginning of the list. 50 00:02:13,040 --> 00:02:16,410 >> But this, too, once we started to think through the math and bigger 51 00:02:16,410 --> 00:02:19,860 picture, thought about how many times I was going back and forth and back 52 00:02:19,860 --> 00:02:24,090 and forth, we said in the worst case, selection sort, too, was what? 53 00:02:24,090 --> 00:02:24,960 n squared. 54 00:02:24,960 --> 00:02:27,490 Now in the real world, it might actually be marginally faster. 55 00:02:27,490 --> 00:02:30,620 Because again, I didn't have to keep backtracking once I had sorted the 56 00:02:30,620 --> 00:02:31,880 smallest elements. 57 00:02:31,880 --> 00:02:35,090 But if we think about very large n, and if you sort of do out the math as 58 00:02:35,090 --> 00:02:39,170 I did on the board, with the n squared minus something, everything else 59 00:02:39,170 --> 00:02:41,850 besides the n squared, once n gets really large, doesn't 60 00:02:41,850 --> 00:02:42,850 really matter as much. 61 00:02:42,850 --> 00:02:45,470 So as computer scientists, we sort of turn a blind eye to the smaller 62 00:02:45,470 --> 00:02:49,220 factors and focus only on the factor in an expression that's going to make 63 00:02:49,220 --> 00:02:50,330 the biggest difference. 64 00:02:50,330 --> 00:02:52,840 >> Well, lastly, we looked at insertion sort. 65 00:02:52,840 --> 00:02:56,620 And this was similar in spirit, but rather than go through iteratively and 66 00:02:56,620 --> 00:03:01,250 select the smallest element one at a time, I instead took the hand that I 67 00:03:01,250 --> 00:03:04,070 was dealt, and I decided, all right, you belong here. 68 00:03:04,070 --> 00:03:06,160 Then I moved on to the next element and decided that he or 69 00:03:06,160 --> 00:03:07,470 she belonged here. 70 00:03:07,470 --> 00:03:08,810 And then I moved on and on. 71 00:03:08,810 --> 00:03:11,780 And I might to, along the way, shift these guys in order to 72 00:03:11,780 --> 00:03:13,030 make room for them. 73 00:03:13,030 --> 00:03:16,880 So that was sort of the mental reversal of selection sort that we 74 00:03:16,880 --> 00:03:18,630 called insertion sort. 75 00:03:18,630 --> 00:03:20,810 >> So these topics to occur in the real world. 76 00:03:20,810 --> 00:03:23,640 Just a few years ago, when a certain senator was running for president, 77 00:03:23,640 --> 00:03:27,160 Eric Schmidt, at the time the CEO of Google, actually had the opportunity 78 00:03:27,160 --> 00:03:28,040 to interview him. 79 00:03:28,040 --> 00:03:32,010 And we thought we'd share this YouTube clip for you here, if we could turn up 80 00:03:32,010 --> 00:03:32,950 the volume. 81 00:03:32,950 --> 00:03:39,360 >> [VIDEO PLAYBACK] 82 00:03:39,360 --> 00:03:44,620 >> -Now, Senator, you're here at Google, and I like to think of the presidency 83 00:03:44,620 --> 00:03:46,042 as a job interview. 84 00:03:46,042 --> 00:03:47,770 >> [LAUGHTER] 85 00:03:47,770 --> 00:03:50,800 >> -Now it's hard to get a job as president. 86 00:03:50,800 --> 00:03:52,480 And you're going through the rigors now. 87 00:03:52,480 --> 00:03:54,330 It's also hard to get a job at Google. 88 00:03:54,330 --> 00:03:59,610 We have questions and we ask our candidates questions. 89 00:03:59,610 --> 00:04:02,250 And this one is from Larry Schwimmer. 90 00:04:02,250 --> 00:04:05,325 >> [LAUGHTER] 91 00:04:05,325 --> 00:04:06,400 -You guys think I'm kidding? 92 00:04:06,400 --> 00:04:08,750 It's right here. 93 00:04:08,750 --> 00:04:12,110 What is the most efficient way to sort a million two-bit integers? 94 00:04:12,110 --> 00:04:15,810 >> [LAUGHTER] 95 00:04:15,810 --> 00:04:18,260 >> -Well, uh-- 96 00:04:18,260 --> 00:04:19,029 >> -I'm sorry. 97 00:04:19,029 --> 00:04:19,745 Maybe we should-- 98 00:04:19,745 --> 00:04:21,000 >> -No, no, no, no, no. 99 00:04:21,000 --> 00:04:21,470 >> -That's not a-- 100 00:04:21,470 --> 00:04:22,185 OK. 101 00:04:22,185 --> 00:04:25,328 >> -I think the bubble sort would be the wrong way to go. 102 00:04:25,328 --> 00:04:26,792 >> [LAUGHTER] 103 00:04:26,792 --> 00:04:28,510 >> [CHEERING AND APPLAUSE] 104 00:04:28,510 --> 00:04:31,211 >> -Come on, who told him this? 105 00:04:31,211 --> 00:04:32,155 OK. 106 00:04:32,155 --> 00:04:33,350 >> [END VIDEO PLAYBACK] 107 00:04:33,350 --> 00:04:35,070 >> DAVID MALAN: So there you have it. 108 00:04:35,070 --> 00:04:39,600 So we began to quantify these running times, so to speak, with something 109 00:04:39,600 --> 00:04:43,480 called asymptotic notation, which is just referring to our sort of turning 110 00:04:43,480 --> 00:04:47,420 a blind eye to those smaller factors and only looking at the running time, 111 00:04:47,420 --> 00:04:51,250 the performance of these algorithms, as n gets really large over time. 112 00:04:51,250 --> 00:04:55,110 And so we introduced big O. And big O represented something that we thought 113 00:04:55,110 --> 00:04:57,000 of as an upper bound. 114 00:04:57,000 --> 00:04:59,570 And actually, Barry, can we lower than the mic a little bit? 115 00:04:59,570 --> 00:05:01,040 >> We thought of this is an upper bound. 116 00:05:01,040 --> 00:05:04,710 So big O of n squared means that in the worst case, something like 117 00:05:04,710 --> 00:05:07,780 selection sort would take n squared steps. 118 00:05:07,780 --> 00:05:10,310 Or something like insertion sort would n squared steps. 119 00:05:10,310 --> 00:05:15,150 Now for something like insertion sort, what was the worst case? 120 00:05:15,150 --> 00:05:18,200 Given an array, what's the worst possible scenario that you might find 121 00:05:18,200 --> 00:05:20,650 yourself faced with? 122 00:05:20,650 --> 00:05:21,860 >> It's completely backwards, right? 123 00:05:21,860 --> 00:05:24,530 Because if it's completely backwards, you have to do a whole lot of work. 124 00:05:24,530 --> 00:05:26,420 Because if you're completely backwards, you're going to find the 125 00:05:26,420 --> 00:05:28,840 biggest element here, even though it belongs down there. 126 00:05:28,840 --> 00:05:31,140 So you're going to say, all right, at this moment in time, you belong here, 127 00:05:31,140 --> 00:05:32,310 so you leave it alone. 128 00:05:32,310 --> 00:05:35,425 >> Then you realize, oh, damn, I have to move this slightly smaller element to 129 00:05:35,425 --> 00:05:36,470 the left of you. 130 00:05:36,470 --> 00:05:38,770 Then I have to do that again and again and again. 131 00:05:38,770 --> 00:05:41,410 And if I walked back and forth, you would sort of feel the performance of 132 00:05:41,410 --> 00:05:45,540 that algorithm, because constantly am I shuffling everyone else down in the 133 00:05:45,540 --> 00:05:46,510 array to make room for it. 134 00:05:46,510 --> 00:05:47,750 So that's the worst case. 135 00:05:47,750 --> 00:05:48,570 >> By contrast-- 136 00:05:48,570 --> 00:05:50,320 and this was a cliffhanger last time-- 137 00:05:50,320 --> 00:05:54,065 we said that insertion sort was an omega of what? 138 00:05:54,065 --> 00:05:57,530 What's the best-case running time of insertion sort? 139 00:05:57,530 --> 00:05:58,520 So it's actually n. 140 00:05:58,520 --> 00:06:00,980 That was the blank that we left on the board last time. 141 00:06:00,980 --> 00:06:03,160 >> And it's omega of n because why? 142 00:06:03,160 --> 00:06:06,630 Well, in the very best case, what's insertion sort going to be handed? 143 00:06:06,630 --> 00:06:09,830 Well, a list that's completely sorted already, minimal work to do. 144 00:06:09,830 --> 00:06:12,460 But what's neat about insertion sort is that because it starts here and 145 00:06:12,460 --> 00:06:15,340 decides, oh, you are the number one, you belong here. 146 00:06:15,340 --> 00:06:16,970 Oh, what a good fortune. 147 00:06:16,970 --> 00:06:17,740 >> You're the number two. 148 00:06:17,740 --> 00:06:19,030 You also belong here. 149 00:06:19,030 --> 00:06:21,010 Number three, even better, you belong here. 150 00:06:21,010 --> 00:06:25,210 As soon as it gets to the end of the list, per insertion sort's pseudocode 151 00:06:25,210 --> 00:06:28,010 that we walked through verbally last time, it's done. 152 00:06:28,010 --> 00:06:32,790 But selection sort, by contrast, kept doing what? 153 00:06:32,790 --> 00:06:35,260 >> Kept going through the list again and again and again. 154 00:06:35,260 --> 00:06:39,160 Because the key insight there was only once you've looked all the way to the 155 00:06:39,160 --> 00:06:42,500 end of the list can you be certain that the element you selected was 156 00:06:42,500 --> 00:06:45,560 indeed the currently smallest element. 157 00:06:45,560 --> 00:06:48,920 So these different mental models end up yielding some very real-world 158 00:06:48,920 --> 00:06:53,130 differences for us, as well as these theoretical asymptotic differences. 159 00:06:53,130 --> 00:06:56,910 >> So just to recap, then, big O of n squared, we've seen a few such 160 00:06:56,910 --> 00:06:58,350 algorithms thus far. 161 00:06:58,350 --> 00:06:59,580 Big O of n? 162 00:06:59,580 --> 00:07:02,870 What's an algorithm that could be said to be big O of n? 163 00:07:02,870 --> 00:07:06,930 In the worst case, it takes a linear number of steps. 164 00:07:06,930 --> 00:07:07,810 >> OK, linear search. 165 00:07:07,810 --> 00:07:10,470 And in the worst case, where is the element you're looking for when 166 00:07:10,470 --> 00:07:12,950 applying linear search? 167 00:07:12,950 --> 00:07:14,680 >> OK, in the worst case, it's not even there. 168 00:07:14,680 --> 00:07:17,000 Or in the second worst case, it's all the way in the end, which is 169 00:07:17,000 --> 00:07:18,880 plus-or-minus a one-step difference. 170 00:07:18,880 --> 00:07:21,180 So at the end of the day, we can say it's linear. 171 00:07:21,180 --> 00:07:23,910 Big O of n would be linear search, because in the worst case, the 172 00:07:23,910 --> 00:07:26,610 element's not even there or it's all the way at the end. 173 00:07:26,610 --> 00:07:29,370 >> Well, big O of log of n. 174 00:07:29,370 --> 00:07:32,760 We didn't talk in great detail about this, but we've seen this before. 175 00:07:32,760 --> 00:07:36,840 What runs in so-called logarithmic time, in the worst case? 176 00:07:36,840 --> 00:07:38,500 >> Yeah, so binary search. 177 00:07:38,500 --> 00:07:42,930 And binary search in the worst case might have the element somewhere in 178 00:07:42,930 --> 00:07:45,640 the middle, or somewhere inside the array. 179 00:07:45,640 --> 00:07:48,040 But you only find it once you divide the list in half, in 180 00:07:48,040 --> 00:07:48,940 half, in half, in half. 181 00:07:48,940 --> 00:07:50,200 And then voila, it's there. 182 00:07:50,200 --> 00:07:52,500 Or again, worst case, it's not even there. 183 00:07:52,500 --> 00:07:56,770 But you don't know that it's not there until you sort of reach that last 184 00:07:56,770 --> 00:08:00,470 bottom-most elements by halving and halving and halving. 185 00:08:00,470 --> 00:08:01,400 >> Big O of 1. 186 00:08:01,400 --> 00:08:03,540 So we could big O of 2, big O of 3. 187 00:08:03,540 --> 00:08:06,260 Anytime you want just a constant number, we just sort of just simplify 188 00:08:06,260 --> 00:08:07,280 that as big O of 1. 189 00:08:07,280 --> 00:08:10,440 Even though if realistically, it takes 2 or even 100 steps, if it's a 190 00:08:10,440 --> 00:08:13,680 constant number of steps, we just say big O of 1. 191 00:08:13,680 --> 00:08:15,930 What's an algorithm that's in big O of 1? 192 00:08:15,930 --> 00:08:18,350 >> AUDIENCE: Finding the length of a variable. 193 00:08:18,350 --> 00:08:21,090 >> DAVID MALAN: Finding the length of a variable? 194 00:08:21,090 --> 00:08:23,870 >> AUDIENCE: No, the length if it's already sorted. 195 00:08:23,870 --> 00:08:24,160 >> DAVID MALAN: Good. 196 00:08:24,160 --> 00:08:27,850 OK, so finding the length of something if the length of that something, like 197 00:08:27,850 --> 00:08:30,020 an array, is stored in some variable. 198 00:08:30,020 --> 00:08:33,380 Because you can just read the variable, or print the variable, or 199 00:08:33,380 --> 00:08:34,960 just generally access that variable. 200 00:08:34,960 --> 00:08:37,299 And voila, that takes constant time. 201 00:08:37,299 --> 00:08:38,909 >> By contrast, think back to scratch. 202 00:08:38,909 --> 00:08:42,460 Think back to the first week of C, calling just printf and printing 203 00:08:42,460 --> 00:08:46,240 something on the screen is arguably constant time, because it just takes 204 00:08:46,240 --> 00:08:50,880 some number of CPU cycles to show that text on the screen. 205 00:08:50,880 --> 00:08:52,720 Or wait-- does it? 206 00:08:52,720 --> 00:08:56,430 How else might we model the performance of printf? 207 00:08:56,430 --> 00:09:00,420 Would someone like to disagree, that maybe it's not really constant time? 208 00:09:00,420 --> 00:09:03,600 In what sense might printf's running time, actually printing a string on 209 00:09:03,600 --> 00:09:05,580 the screen, be something other than constant. 210 00:09:05,580 --> 00:09:07,860 >> AUDIENCE: [INAUDIBLE]. 211 00:09:07,860 --> 00:09:08,230 >> DAVID MALAN: Yeah. 212 00:09:08,230 --> 00:09:09,300 So it depends on our perspective. 213 00:09:09,300 --> 00:09:13,390 If we actually think of the input to printf as being the string, and 214 00:09:13,390 --> 00:09:16,380 therefore we measure the size of that input by its length-- so let's call 215 00:09:16,380 --> 00:09:17,780 that length n as well-- 216 00:09:17,780 --> 00:09:21,990 arguably, printf is itself big O of n because it's going to take you n steps 217 00:09:21,990 --> 00:09:24,750 to print out each of those n characters, most likely. 218 00:09:24,750 --> 00:09:27,730 At least to the extent that we assume that maybe it's using a for loop 219 00:09:27,730 --> 00:09:28,560 underneath the hood. 220 00:09:28,560 --> 00:09:30,860 >> But we would have to look at that code to understand it better. 221 00:09:30,860 --> 00:09:33,650 And indeed, once you guys start analyzing your own algorithms, you'll 222 00:09:33,650 --> 00:09:34,900 literally do just that. 223 00:09:34,900 --> 00:09:37,765 Sort of eyeball your code and think about-- all right, I have this loop 224 00:09:37,765 --> 00:09:41,870 here or I have a nested loops here, that's going to do n things n times, 225 00:09:41,870 --> 00:09:46,050 and you can sort of reason your way through the code, even if it's 226 00:09:46,050 --> 00:09:47,980 pseudocode and not actual code. 227 00:09:47,980 --> 00:09:49,730 >> So what about omega of n squared? 228 00:09:49,730 --> 00:09:53,582 What was an algorithm that in the best case, still took n squared steps? 229 00:09:53,582 --> 00:09:54,014 Yeah? 230 00:09:54,014 --> 00:09:54,880 >> AUDIENCE: [INAUDIBLE]. 231 00:09:54,880 --> 00:09:55,900 >> DAVID MALAN: So selection sort. 232 00:09:55,900 --> 00:09:59,150 Because in that problem really reduced to the fact that again, I don't know 233 00:09:59,150 --> 00:10:02,600 I've found the current smallest until I've checked all the darn elements. 234 00:10:02,600 --> 00:10:08,050 So omega of, say, n, we just came up with one. 235 00:10:08,050 --> 00:10:09,300 Insertion sort. 236 00:10:09,300 --> 00:10:12,370 If the list happens to be sorted already, in the best case we just have 237 00:10:12,370 --> 00:10:15,090 to make one pass through it, at which point we're sure. 238 00:10:15,090 --> 00:10:17,890 And then that could be said to be linear, for sure. 239 00:10:17,890 --> 00:10:20,570 >> What about omega of 1? 240 00:10:20,570 --> 00:10:23,790 What, in the best case, might take a constant number of steps? 241 00:10:23,790 --> 00:10:27,220 So linear search, if you just get lucky and the element you're looking 242 00:10:27,220 --> 00:10:31,000 is right at the beginning of the list, if that's where you're starting your 243 00:10:31,000 --> 00:10:33,070 linear traversal of that list. 244 00:10:33,070 --> 00:10:35,180 >> And this is true of a number of things. 245 00:10:35,180 --> 00:10:37,660 For instance, even binary search is omega of 1. 246 00:10:37,660 --> 00:10:40,310 Because what if you get really darn lucky and smack-dab in the middle of 247 00:10:40,310 --> 00:10:42,950 your array is the number you're looking for? 248 00:10:42,950 --> 00:10:45,730 So you can get lucky there, as well. 249 00:10:45,730 --> 00:10:49,190 >> This one, lastly, omega of n log n. 250 00:10:49,190 --> 00:10:52,573 So n log n, we didn't really talk about yet, but-- 251 00:10:52,573 --> 00:10:53,300 >> AUDIENCE: Merge sort? 252 00:10:53,300 --> 00:10:53,960 >> DAVID MALAN: Merge sort. 253 00:10:53,960 --> 00:10:56,920 That was the cliffhanger of last time, where we proposed, and we showed 254 00:10:56,920 --> 00:10:58,600 visually, that there are algorithms. 255 00:10:58,600 --> 00:11:02,470 And merge sort of just one such algorithm that is fundamentally faster 256 00:11:02,470 --> 00:11:03,450 than some of these other guys. 257 00:11:03,450 --> 00:11:07,800 In fact, merge short is not only in the best case n log n, in the worst 258 00:11:07,800 --> 00:11:09,460 case n log n. 259 00:11:09,460 --> 00:11:14,540 And when you have this coincidence of omega and big O being the same thing? 260 00:11:14,540 --> 00:11:17,310 We can actually describe that as what's called theta, though it's a 261 00:11:17,310 --> 00:11:18,220 little less common. 262 00:11:18,220 --> 00:11:21,730 But that just means the two bounds, in this case, are the same. 263 00:11:21,730 --> 00:11:25,770 >> So merge sort, what does this really boil down to for us? 264 00:11:25,770 --> 00:11:27,000 Well, recall the motivation. 265 00:11:27,000 --> 00:11:30,340 Let me pull up another animation that we didn't look at last time. 266 00:11:30,340 --> 00:11:33,390 This one, same idea, but it's a little larger. 267 00:11:33,390 --> 00:11:36,160 And I'm going to go ahead and point out first-- we have insertion sort on 268 00:11:36,160 --> 00:11:39,410 the top left, then selection sort, bubble sort, a couple of other sorts-- 269 00:11:39,410 --> 00:11:42,670 shell and quick-- that we haven't talked about, and heap and merge sort. 270 00:11:42,670 --> 00:11:47,090 >> So at least try to focus your eyes on the top three on the left and then 271 00:11:47,090 --> 00:11:49,120 merge sort when I click this green arrow. 272 00:11:49,120 --> 00:11:51,900 But I'll let all of them run, just to give you a sense of the diversity of 273 00:11:51,900 --> 00:11:53,980 algorithms that exist in the world. 274 00:11:53,980 --> 00:11:56,180 I'm going to let this run for just a few seconds. 275 00:11:56,180 --> 00:11:59,640 And if you focus your eyes-- pick an algorithm, focus on it for just a 276 00:11:59,640 --> 00:12:02,970 seconds-- you'll begin to see the pattern that it's implementing. 277 00:12:02,970 --> 00:12:04,530 >> Merge sort, notice, is done. 278 00:12:04,530 --> 00:12:06,430 Heap sort, quick sort, shell-- 279 00:12:06,430 --> 00:12:09,480 so it seems we introduced the three worst algorithms last week. 280 00:12:09,480 --> 00:12:12,960 But that's good that we're here today to look at merge sort, which is one of 281 00:12:12,960 --> 00:12:16,500 the easier ones is to look at, even though it probably will bend your mind 282 00:12:16,500 --> 00:12:17,490 just a little bit. 283 00:12:17,490 --> 00:12:21,130 Here we can see just how much selection sort sucks. 284 00:12:21,130 --> 00:12:24,600 >> But on the flip side, it's really easy to implement. 285 00:12:24,600 --> 00:12:28,160 And maybe for P Set 3, that's one of the algorithms you chose to implement 286 00:12:28,160 --> 00:12:28,960 for the standard edition. 287 00:12:28,960 --> 00:12:30,970 Perfectly fine, perfectly correct. 288 00:12:30,970 --> 00:12:35,210 >> But again, as n gets large, if you choose to implement a faster algorithm 289 00:12:35,210 --> 00:12:39,020 like merge sort, odds are in larger and larger inputs, your code is just 290 00:12:39,020 --> 00:12:39,800 going to run faster. 291 00:12:39,800 --> 00:12:41,090 Your website's going to work better. 292 00:12:41,090 --> 00:12:42,650 Your users are going to be happier. 293 00:12:42,650 --> 00:12:45,280 And so there are these effects of actually giving 294 00:12:45,280 --> 00:12:47,350 us some deeper thought. 295 00:12:47,350 --> 00:12:49,990 >> So let's take a look at what merge sort is actually all about. 296 00:12:49,990 --> 00:12:52,992 The cool thing is that merge sort is just this. 297 00:12:52,992 --> 00:12:56,300 This is, again, what we've called pseudocode, pseudocode being 298 00:12:56,300 --> 00:12:57,720 English-like syntax. 299 00:12:57,720 --> 00:12:59,890 And the simplicity is sort of fascinating. 300 00:12:59,890 --> 00:13:02,840 >> So on input of n elements-- so that just means, here's an array. 301 00:13:02,840 --> 00:13:04,000 It's got n things in it. 302 00:13:04,000 --> 00:13:05,370 That's all we're saying there. 303 00:13:05,370 --> 00:13:07,560 >> If n is less than 2, return. 304 00:13:07,560 --> 00:13:08,640 So that's just the trivial case. 305 00:13:08,640 --> 00:13:12,580 If n is less than 2, then obviously it's 1 or 0, in which case the thing 306 00:13:12,580 --> 00:13:14,780 is already sorted or nonexistent, so just return. 307 00:13:14,780 --> 00:13:15,900 There's nothing to do. 308 00:13:15,900 --> 00:13:18,360 So that's a simple case to pluck off. 309 00:13:18,360 --> 00:13:20,110 >> Else, we have three steps. 310 00:13:20,110 --> 00:13:23,650 Sort the left half of the elements, sort the right half of the elements, 311 00:13:23,650 --> 00:13:26,650 and then merge the sorted halves. 312 00:13:26,650 --> 00:13:29,400 What's interesting here is that I'm kind of punting, right? 313 00:13:29,400 --> 00:13:32,300 There's kind of a circular definition to this algorithm. 314 00:13:32,300 --> 00:13:35,986 In what sense is this algorithm's definition circular? 315 00:13:35,986 --> 00:13:37,850 >> AUDIENCE: [INAUDIBLE]. 316 00:13:37,850 --> 00:13:41,670 >> DAVID MALAN: Yeah, my sorting algorithm, two of its steps are "sort 317 00:13:41,670 --> 00:13:44,640 something." And so that begs the question, well, what am I going to use 318 00:13:44,640 --> 00:13:46,460 to sort the left half and the right half? 319 00:13:46,460 --> 00:13:49,600 And the beauty here is that even though again, this is the mind-bending 320 00:13:49,600 --> 00:13:54,030 part potentially, you can use same algorithm to sort the left half. 321 00:13:54,030 --> 00:13:54,700 >> But wait a minute. 322 00:13:54,700 --> 00:13:57,070 When you're told to sort the left half, what are the two 323 00:13:57,070 --> 00:13:58,240 steps going to be next? 324 00:13:58,240 --> 00:14:00,550 We'll sort the left half of the left half and the right 325 00:14:00,550 --> 00:14:01,420 half of the left half. 326 00:14:01,420 --> 00:14:04,430 Damn, how do I sort those two halves, or quarters, now? 327 00:14:04,430 --> 00:14:05,260 >> But that's OK. 328 00:14:05,260 --> 00:14:07,830 We have a sorting algorithm here. 329 00:14:07,830 --> 00:14:10,660 And even though you might worry at first this is kind of an infinite 330 00:14:10,660 --> 00:14:12,780 loop, it's a cycle that's never going to end-- it is going to 331 00:14:12,780 --> 00:14:15,770 end once what happens? 332 00:14:15,770 --> 00:14:16,970 Once n is less than 2. 333 00:14:16,970 --> 00:14:19,180 Which eventually is going to happen, because if you keep halving and 334 00:14:19,180 --> 00:14:23,020 halving in halving these halves, surely eventually you're going to end 335 00:14:23,020 --> 00:14:25,350 up with just 1 or 0 elements. 336 00:14:25,350 --> 00:14:28,500 At which point, this algorithm says you're done. 337 00:14:28,500 --> 00:14:31,020 >> So the real magic in this algorithm seems to be in 338 00:14:31,020 --> 00:14:33,470 that final step, merging. 339 00:14:33,470 --> 00:14:37,190 That simple idea just merging two things, that's what's ultimately going 340 00:14:37,190 --> 00:14:40,920 to allow us to sort an array of, let's say, eight elements. 341 00:14:40,920 --> 00:14:44,410 So I have eight more stress balls up here, eight pieces of paper, and one 342 00:14:44,410 --> 00:14:45,500 Google Glass-- 343 00:14:45,500 --> 00:14:46,140 which I get to keep. 344 00:14:46,140 --> 00:14:46,960 >> [LAUGHTER] 345 00:14:46,960 --> 00:14:48,970 >> DAVID MALAN: If we could take eight volunteers, and let's see if we can 346 00:14:48,970 --> 00:14:51,430 play this out, so. 347 00:14:51,430 --> 00:14:52,500 Wow, OK. 348 00:14:52,500 --> 00:14:53,565 Computer science is getting fun. 349 00:14:53,565 --> 00:14:54,320 All right. 350 00:14:54,320 --> 00:14:57,770 So how about you three, biggest hand up there. 351 00:14:57,770 --> 00:14:58,580 Four in the back. 352 00:14:58,580 --> 00:15:02,220 And how about we'll do you three in this row? 353 00:15:02,220 --> 00:15:03,390 And four in the front. 354 00:15:03,390 --> 00:15:04,920 So you eight, come on up. 355 00:15:04,920 --> 00:15:12,060 >> [LAUGHTER] 356 00:15:12,060 --> 00:15:13,450 >> DAVID MALAN: I'm actually not sure what it is. 357 00:15:13,450 --> 00:15:14,810 Is it the stress balls? 358 00:15:14,810 --> 00:15:16,510 The desk lamps? 359 00:15:16,510 --> 00:15:18,650 The material? 360 00:15:18,650 --> 00:15:19,680 The internet? 361 00:15:19,680 --> 00:15:20,160 >> OK. 362 00:15:20,160 --> 00:15:21,310 So come on up. 363 00:15:21,310 --> 00:15:22,310 Who would like-- 364 00:15:22,310 --> 00:15:23,570 keep coming up. 365 00:15:23,570 --> 00:15:24,240 Let's see. 366 00:15:24,240 --> 00:15:26,460 And this puts you in location-- 367 00:15:26,460 --> 00:15:27,940 you're in location one. 368 00:15:27,940 --> 00:15:28,670 Uh-oh, wait a minute. 369 00:15:28,670 --> 00:15:30,760 1, 2, 3, 4, 5, 6, 7-- oh, good. 370 00:15:30,760 --> 00:15:31,310 All right, we're good. 371 00:15:31,310 --> 00:15:35,130 All right, so everyone have a seat, but not on the Google Glass. 372 00:15:35,130 --> 00:15:36,475 Let me to queue these up. 373 00:15:36,475 --> 00:15:37,115 What's your name? 374 00:15:37,115 --> 00:15:37,440 >> MICHELLE: Michelle. 375 00:15:37,440 --> 00:15:38,090 >> DAVID MALAN: Michelle? 376 00:15:38,090 --> 00:15:42,000 All right, you get to look like the geek, if that's OK. 377 00:15:42,000 --> 00:15:44,625 Well, I do too, I suppose, for just a moment. 378 00:15:44,625 --> 00:15:45,875 All right, standby. 379 00:15:45,875 --> 00:15:48,510 380 00:15:48,510 --> 00:15:50,950 We've been trying to come up with a use case for Google Glass, and we 381 00:15:50,950 --> 00:15:53,750 thought it would be fun to just do this when people are onstage. 382 00:15:53,750 --> 00:15:57,120 We will record the world from their perspective. 383 00:15:57,120 --> 00:15:58,410 All right. 384 00:15:58,410 --> 00:15:59,830 Not probably what Google intended. 385 00:15:59,830 --> 00:16:02,260 All right, if you don't mind wearing this for the next awkward minutes, 386 00:16:02,260 --> 00:16:03,150 that would be wonderful. 387 00:16:03,150 --> 00:16:08,620 >> All right, so we have here an array of elements, and that array, as per the 388 00:16:08,620 --> 00:16:11,480 pieces of paper in these folks' hands, is currently unsorted. 389 00:16:11,480 --> 00:16:12,050 >> MICHELLE: Oh, that's so weird. 390 00:16:12,050 --> 00:16:12,810 >> DAVID MALAN: It's pretty much random. 391 00:16:12,810 --> 00:16:15,760 And in just a moment, we're going to try to implement merge sort together 392 00:16:15,760 --> 00:16:17,950 and see where that key insight is. 393 00:16:17,950 --> 00:16:21,970 And the trick here with merge sort is something that we haven't assumed yet. 394 00:16:21,970 --> 00:16:24,030 We actually need some additional space. 395 00:16:24,030 --> 00:16:26,650 So what's going to be particularly interesting about this is that these 396 00:16:26,650 --> 00:16:29,270 guys are going to move around a little bit, because I'm going to assume that 397 00:16:29,270 --> 00:16:31,880 there's an extra array of space, say, right behind them. 398 00:16:31,880 --> 00:16:34,570 >> So if they're behind their chair, that's the secondary array. 399 00:16:34,570 --> 00:16:36,960 If they're seated here, that's the primary array. 400 00:16:36,960 --> 00:16:40,170 But this is a resource that we have not leveraged thus far with bubble 401 00:16:40,170 --> 00:16:42,040 sort, with selection sort, with insertion sort. 402 00:16:42,040 --> 00:16:44,600 Recall last week, everyone just kind of shuffled in place. 403 00:16:44,600 --> 00:16:46,840 They didn't use any additional memory. 404 00:16:46,840 --> 00:16:49,310 We made room for people by moving people around. 405 00:16:49,310 --> 00:16:50,580 >> So this is a key insight, too. 406 00:16:50,580 --> 00:16:53,410 There's this trade-off, in general in computer science, of resources. 407 00:16:53,410 --> 00:16:55,800 If you want to speed up something like time, you're going to 408 00:16:55,800 --> 00:16:56,900 have to pay a price. 409 00:16:56,900 --> 00:17:00,750 And one of those prices is very often space, the amount of memory or hard 410 00:17:00,750 --> 00:17:01,700 disk space that you're using. 411 00:17:01,700 --> 00:17:03,640 Or, frankly, the amount of programmer time. 412 00:17:03,640 --> 00:17:06,700 How much time it takes you, the human, to actually implement some more 413 00:17:06,700 --> 00:17:07,829 complicated algorithm. 414 00:17:07,829 --> 00:17:09,760 But for today, the trade-off is time and space. 415 00:17:09,760 --> 00:17:11,930 >> So if you guys could just hold up your numbers so we can see that you're 416 00:17:11,930 --> 00:17:15,839 indeed matching 4, 2, 6, 1, 3, 7, 8. 417 00:17:15,839 --> 00:17:16,599 Excellent. 418 00:17:16,599 --> 00:17:19,520 So I'm going to try to orchestrate things, if you guys can just 419 00:17:19,520 --> 00:17:21,800 follow my lead here. 420 00:17:21,800 --> 00:17:26,650 >> So I am going to implement, first, the first step of the pseudocode, which is 421 00:17:26,650 --> 00:17:29,440 on input of n elements, if n is less than 2, then return. 422 00:17:29,440 --> 00:17:31,370 Obviously, that doesn't apply, so we move on. 423 00:17:31,370 --> 00:17:33,340 So sort the left half of the elements. 424 00:17:33,340 --> 00:17:36,220 So that means I'm going to focus my attention for just a moment on these 425 00:17:36,220 --> 00:17:37,310 four guys here. 426 00:17:37,310 --> 00:17:39,774 All right, what do I next do? 427 00:17:39,774 --> 00:17:40,570 >> AUDIENCE: Sort the left half. 428 00:17:40,570 --> 00:17:42,780 >> DAVID MALAN: So now I have to sort the left half of these guys. 429 00:17:42,780 --> 00:17:45,580 Because again, assume to yourself the goal is to sort the left half. 430 00:17:45,580 --> 00:17:46,440 How do you do that? 431 00:17:46,440 --> 00:17:49,140 Just follow the instructions, even though we're doing it again. 432 00:17:49,140 --> 00:17:50,160 So sort the left half. 433 00:17:50,160 --> 00:17:52,030 Now I'm sorting these two guys. 434 00:17:52,030 --> 00:17:53,563 What comes next? 435 00:17:53,563 --> 00:17:54,510 >> AUDIENCE: Sort the left half. 436 00:17:54,510 --> 00:17:55,460 >> DAVID MALAN: Sort the left half. 437 00:17:55,460 --> 00:18:00,680 So now these, this seat here, is a list of size 1. 438 00:18:00,680 --> 00:18:01,365 And what's your name again? 439 00:18:01,365 --> 00:18:02,390 >> PRINCESS DAISY: Princess Daisy. 440 00:18:02,390 --> 00:18:03,690 >> DAVID MALAN: Princess Daisy is here. 441 00:18:03,690 --> 00:18:07,470 And so she is already sorted, because the list is of size 1. 442 00:18:07,470 --> 00:18:09,490 What do I next do? 443 00:18:09,490 --> 00:18:13,680 OK, return, because that list is of size 1, which is less than 2. 444 00:18:13,680 --> 00:18:14,320 Then what's the next step? 445 00:18:14,320 --> 00:18:17,490 And now you have to kind of backtrack in your mind. 446 00:18:17,490 --> 00:18:19,340 >> Sort the right half, which is-- 447 00:18:19,340 --> 00:18:19,570 what's your name? 448 00:18:19,570 --> 00:18:20,220 >> LINDA: Linda. 449 00:18:20,220 --> 00:18:20,980 >> DAVID MALAN: Linda. 450 00:18:20,980 --> 00:18:23,210 And so what do we do now that we have a list of size 1? 451 00:18:23,210 --> 00:18:24,440 >> AUDIENCE: Return. 452 00:18:24,440 --> 00:18:24,760 >> DAVID MALAN: Careful. 453 00:18:24,760 --> 00:18:29,540 We return first, and now the third step-- and if I kind of depict it by 454 00:18:29,540 --> 00:18:33,490 embracing the two seats now, now I have to merge these two elements. 455 00:18:33,490 --> 00:18:35,530 So now unfortunately, the elements are out of order. 456 00:18:35,530 --> 00:18:39,920 But that's where the merging process starts to get compelling. 457 00:18:39,920 --> 00:18:42,410 >> So if you guys could stand up for just a moment, I'm going to need you, in a 458 00:18:42,410 --> 00:18:44,170 moment, to step behind your chair. 459 00:18:44,170 --> 00:18:46,480 And if Linda, because 2 is smaller than 4, why don't 460 00:18:46,480 --> 00:18:48,130 you come around first? 461 00:18:48,130 --> 00:18:48,690 Stay there. 462 00:18:48,690 --> 00:18:50,520 So Linda, you come around first. 463 00:18:50,520 --> 00:18:53,820 >> Now in reality if it's just an array we could just move her in real time 464 00:18:53,820 --> 00:18:55,360 from this chair to this spot. 465 00:18:55,360 --> 00:18:57,770 So imagine that took some constant number of steps 1. 466 00:18:57,770 --> 00:18:58,480 And now-- 467 00:18:58,480 --> 00:19:01,490 but we need to put you in the first location here. 468 00:19:01,490 --> 00:19:03,930 >> And now if you could come around, as well, we're going to 469 00:19:03,930 --> 00:19:06,300 be in location two. 470 00:19:06,300 --> 00:19:09,120 And even though this feels like it's taking a while, what's nice now is 471 00:19:09,120 --> 00:19:14,710 that the left half of the left half is now sorted. 472 00:19:14,710 --> 00:19:18,010 So what was the next step, if we now rewind further in the story? 473 00:19:18,010 --> 00:19:18,980 >> AUDIENCE: Right half. 474 00:19:18,980 --> 00:19:19,900 >> DAVID MALAN: Sort the right half. 475 00:19:19,900 --> 00:19:21,320 So you guys have to do this, as well. 476 00:19:21,320 --> 00:19:23,510 So if you could stand up for just a moment? 477 00:19:23,510 --> 00:19:25,192 And what's your name? 478 00:19:25,192 --> 00:19:25,540 >> JESS: Jess. 479 00:19:25,540 --> 00:19:25,870 >> DAVID MALAN: Jess. 480 00:19:25,870 --> 00:19:29,720 OK, so Jess is now the left half of the right half. 481 00:19:29,720 --> 00:19:31,400 And so she's a list of size 1. 482 00:19:31,400 --> 00:19:32,380 She's obviously sorted. 483 00:19:32,380 --> 00:19:33,070 And your name again? 484 00:19:33,070 --> 00:19:33,630 >> MICHELLE: Michelle. 485 00:19:33,630 --> 00:19:35,340 >> DAVID MALAN: Michelle is obviously a list of size 1. 486 00:19:35,340 --> 00:19:36,050 She's already sorted. 487 00:19:36,050 --> 00:19:38,690 So now the magic happens, the merging process. 488 00:19:38,690 --> 00:19:39,790 So who's going to come first? 489 00:19:39,790 --> 00:19:41,560 Obviously Michelle. 490 00:19:41,560 --> 00:19:43,280 So if you could come around back. 491 00:19:43,280 --> 00:19:47,090 The space we have available for her now is right behind this chair here. 492 00:19:47,090 --> 00:19:51,580 And now if you could come back as well, we now have, to be clear, two 493 00:19:51,580 --> 00:19:53,810 halves, each of size 2-- 494 00:19:53,810 --> 00:19:57,090 and just for depiction's sake, if you could make a little bit of a space-- 495 00:19:57,090 --> 00:19:59,780 one left half here, one right half here. 496 00:19:59,780 --> 00:20:01,160 >> Rewind further in the story. 497 00:20:01,160 --> 00:20:02,270 What step is next? 498 00:20:02,270 --> 00:20:03,030 >> AUDIENCE: Merge. 499 00:20:03,030 --> 00:20:04,160 >> DAVID MALAN: So now we have to merge. 500 00:20:04,160 --> 00:20:07,490 So OK, so now, thankfully, we just freed up four chairs. 501 00:20:07,490 --> 00:20:11,480 So we've used twice as much memory, but we can give flip-flopping between 502 00:20:11,480 --> 00:20:12,330 the two arrays. 503 00:20:12,330 --> 00:20:14,190 So which number is to come first? 504 00:20:14,190 --> 00:20:14,850 So Michelle, obviously. 505 00:20:14,850 --> 00:20:16,680 So come around and take your seat here. 506 00:20:16,680 --> 00:20:19,120 And then number 2 is obviously next, so you come here. 507 00:20:19,120 --> 00:20:21,520 Number 4, number 6. 508 00:20:21,520 --> 00:20:23,390 And again, even though there's a little bit of walking involved, 509 00:20:23,390 --> 00:20:26,010 really, these could happen instantly, by moving one-- 510 00:20:26,010 --> 00:20:26,880 OK, well played. 511 00:20:26,880 --> 00:20:28,350 >> [LAUGHTER] 512 00:20:28,350 --> 00:20:29,680 >> DAVID MALAN: And now we're in pretty good shape. 513 00:20:29,680 --> 00:20:34,910 The left half of the entire input has now been sorted. 514 00:20:34,910 --> 00:20:37,370 All right, so these guys had the advantage of my-- 515 00:20:37,370 --> 00:20:40,340 how did it end up all the girls on the left and all the boys on the right? 516 00:20:40,340 --> 00:20:42,450 >> OK, so guys' turn now. 517 00:20:42,450 --> 00:20:44,680 So I won't walk you through these steps. 518 00:20:44,680 --> 00:20:46,550 We'll see if we can reapply the same pseudocode. 519 00:20:46,550 --> 00:20:50,050 If you want to go ahead and stand up, and you guys, let me give you the mic. 520 00:20:50,050 --> 00:20:52,990 See if you can't replicate what we just did here on the 521 00:20:52,990 --> 00:20:53,880 other end of the list. 522 00:20:53,880 --> 00:20:59,530 Who needs to speak first, based on the algorithm? 523 00:20:59,530 --> 00:21:03,210 So explain what you're doing before you make any foot movements. 524 00:21:03,210 --> 00:21:05,930 >> SPEAKER 1: All right, so since I am the left half of the 525 00:21:05,930 --> 00:21:07,790 left half, I return. 526 00:21:07,790 --> 00:21:08,730 Right? 527 00:21:08,730 --> 00:21:09,250 >> DAVID MALAN: Good. 528 00:21:09,250 --> 00:21:10,350 >> SPEAKER 1: And then-- 529 00:21:10,350 --> 00:21:11,800 >> DAVID MALAN: Who does the mic go to next? 530 00:21:11,800 --> 00:21:12,920 >> SPEAKER 1: Next number. 531 00:21:12,920 --> 00:21:14,720 >> SPEAKER 2: So I'm the right half of the left half of the 532 00:21:14,720 --> 00:21:17,830 left half, and I return. 533 00:21:17,830 --> 00:21:18,050 >> DAVID MALAN: Good. 534 00:21:18,050 --> 00:21:18,550 You return. 535 00:21:18,550 --> 00:21:21,855 So now what's the next up for you two? 536 00:21:21,855 --> 00:21:23,740 >> SPEAKER 2: We want see who's smaller. 537 00:21:23,740 --> 00:21:24,200 >> DAVID MALAN: Exactly. 538 00:21:24,200 --> 00:21:24,940 We want to merge. 539 00:21:24,940 --> 00:21:27,590 The space we're going to use to merge you into, even though they're 540 00:21:27,590 --> 00:21:30,250 obviously sorted already, we're going to follow the same algorithm. 541 00:21:30,250 --> 00:21:31,560 So who goes in back first? 542 00:21:31,560 --> 00:21:35,720 So 3, and then 7. 543 00:21:35,720 --> 00:21:38,570 And now the mic goes to these guys, OK? 544 00:21:38,570 --> 00:21:43,590 >> SPEAKER 3: So I'm the right half of the left half, and my n is less than 545 00:21:43,590 --> 00:21:45,048 1, so I'm just going to pass-- 546 00:21:45,048 --> 00:21:46,380 >> DAVID MALAN: Good. 547 00:21:46,380 --> 00:21:49,450 >> SPEAKER 4: I'm the right half of the right half of the right half, and I'm 548 00:21:49,450 --> 00:21:51,740 also one person, so I'm going to return. 549 00:21:51,740 --> 00:21:52,990 So now we merge. 550 00:21:52,990 --> 00:21:55,140 551 00:21:55,140 --> 00:21:56,150 >> SPEAKER 3: So we go back. 552 00:21:56,150 --> 00:21:57,160 >> DAVID MALAN: So you go into the back. 553 00:21:57,160 --> 00:21:59,200 So 5 goes first, then 8. 554 00:21:59,200 --> 00:22:01,240 And now audience, which is the step we have to now rewind 555 00:22:01,240 --> 00:22:02,200 back to in our minds? 556 00:22:02,200 --> 00:22:02,940 >> AUDIENCE: Merge. 557 00:22:02,940 --> 00:22:07,270 >> DAVID MALAN: Merging left half and right half of the original left half. 558 00:22:07,270 --> 00:22:08,840 So now-- 559 00:22:08,840 --> 00:22:10,520 and just to make this clear, make a little bit of space 560 00:22:10,520 --> 00:22:11,690 between you two guys. 561 00:22:11,690 --> 00:22:13,800 So now that's the two lists, left and right. 562 00:22:13,800 --> 00:22:18,320 So how do we now merge you guys into the front row of seats again? 563 00:22:18,320 --> 00:22:19,600 >> 3 goes first. 564 00:22:19,600 --> 00:22:20,850 Then 5, obviously. 565 00:22:20,850 --> 00:22:23,110 566 00:22:23,110 --> 00:22:27,330 Then 7, and now 8. 567 00:22:27,330 --> 00:22:28,710 OK, and now we are? 568 00:22:28,710 --> 00:22:29,650 >> AUDIENCE: Not done. 569 00:22:29,650 --> 00:22:32,440 >> DAVID MALAN: Not done, because obviously, there's one step remaining. 570 00:22:32,440 --> 00:22:35,720 But again, the reason I'm using this jargon like "rewind in your mind," 571 00:22:35,720 --> 00:22:37,160 it's because that's really what's happening. 572 00:22:37,160 --> 00:22:39,610 We're going through all of these steps, but we're sort of pausing for a 573 00:22:39,610 --> 00:22:42,480 moment, diving deeper into the algorithm, pausing for a moment, 574 00:22:42,480 --> 00:22:45,840 diving deeper into the algorithm, and now we have to sort of rewind in our 575 00:22:45,840 --> 00:22:49,430 minds and undo all of these layers that we've sort of put on hold. 576 00:22:49,430 --> 00:22:51,790 >> So now we have two lists of size 4. 577 00:22:51,790 --> 00:22:54,790 If you guys could stand up one last time and make a bit of space here to 578 00:22:54,790 --> 00:22:57,230 make clear that this is the left half of the original, the 579 00:22:57,230 --> 00:22:58,620 right half of the original. 580 00:22:58,620 --> 00:23:01,060 Who's the first number that we need to pull into the back? 581 00:23:01,060 --> 00:23:01,870 Michelle, of course. 582 00:23:01,870 --> 00:23:03,230 >> So we put Michelle here. 583 00:23:03,230 --> 00:23:05,080 And who has number 2? 584 00:23:05,080 --> 00:23:06,440 Number 2 comes on back as well. 585 00:23:06,440 --> 00:23:07,800 Number 3? 586 00:23:07,800 --> 00:23:08,510 Excellent. 587 00:23:08,510 --> 00:23:16,570 Number 4, number 5, number 6, number 7, and number 8. 588 00:23:16,570 --> 00:23:18,850 >> OK, so it felt like a lot of steps, for sure. 589 00:23:18,850 --> 00:23:22,390 But now let's see if we can't confirm sort of intuitively that this 590 00:23:22,390 --> 00:23:26,190 algorithm fundamentally, particularly as n gets really large, as we've seen 591 00:23:26,190 --> 00:23:29,170 with the animations, is fundamentally faster. 592 00:23:29,170 --> 00:23:33,400 So I claim this algorithm, in the worst case and even in the best case, 593 00:23:33,400 --> 00:23:36,160 is big O of n times log n. 594 00:23:36,160 --> 00:23:39,160 That is, there's some aspect of this algorithm that takes n steps, but 595 00:23:39,160 --> 00:23:43,110 there's another aspect somewhere in that iteration, that looping, that 596 00:23:43,110 --> 00:23:44,410 takes log n steps. 597 00:23:44,410 --> 00:23:49,154 Can we put our finger on what those two numbers are referring to? 598 00:23:49,154 --> 00:23:51,320 Well, where-- 599 00:23:51,320 --> 00:23:54,160 where'd the mic go? 600 00:23:54,160 --> 00:23:58,660 >> SPEAKER 1: Would the log n be breaking us up into two-- 601 00:23:58,660 --> 00:23:59,630 dividing by two, essentially. 602 00:23:59,630 --> 00:24:00,120 >> DAVID MALAN: Exactly. 603 00:24:00,120 --> 00:24:03,000 Any time we see in any algorithm thus far, there's been this pattern of 604 00:24:03,000 --> 00:24:04,200 dividing, dividing, dividing. 605 00:24:04,200 --> 00:24:05,700 And it's typically reduced to something that's 606 00:24:05,700 --> 00:24:07,100 logarithmic, log base 2. 607 00:24:07,100 --> 00:24:10,180 But it could really be anything, but log base 2. 608 00:24:10,180 --> 00:24:11,330 >> Now what about the n? 609 00:24:11,330 --> 00:24:14,550 I can see that we kind of divided you guys-- divided you, divided you, 610 00:24:14,550 --> 00:24:15,910 divided you, divided you. 611 00:24:15,910 --> 00:24:18,760 Where does the end come from? 612 00:24:18,760 --> 00:24:19,810 >> So it's the merging. 613 00:24:19,810 --> 00:24:20,610 Because think about it. 614 00:24:20,610 --> 00:24:25,420 When you merge eight people together, whereby half of them are a set of four 615 00:24:25,420 --> 00:24:27,770 and the other half are another set of four, how do you go 616 00:24:27,770 --> 00:24:28,820 about doing the merging? 617 00:24:28,820 --> 00:24:30,830 Well, you guys did it fairly intuitively. 618 00:24:30,830 --> 00:24:34,140 >> But if I instead did it a little more methodically, I might have pointed at 619 00:24:34,140 --> 00:24:38,090 the leftmost person first with my left hand, pointed at the leftmost person 620 00:24:38,090 --> 00:24:42,080 of that half with my right hand, and just subsequently walked through the 621 00:24:42,080 --> 00:24:46,990 list, pointing at the smallest element each time, moving my finger over and 622 00:24:46,990 --> 00:24:48,970 over as needed throughout the list. 623 00:24:48,970 --> 00:24:51,890 But what's key about this merging process is I'm comparing these pairs 624 00:24:51,890 --> 00:24:53,460 of elements. 625 00:24:53,460 --> 00:24:57,270 From the right half and from the left half, I'm never once backtracking. 626 00:24:57,270 --> 00:25:00,570 >> So the merge itself is taking no more than n steps. 627 00:25:00,570 --> 00:25:03,250 And how many times did I have to do that merging? 628 00:25:03,250 --> 00:25:07,150 Well, no more than n, and we just saw that with the final merge. 629 00:25:07,150 --> 00:25:13,120 And so if you do something that takes log n steps n times, or vice versa, 630 00:25:13,120 --> 00:25:15,210 it's going to give us n times log n. 631 00:25:15,210 --> 00:25:16,310 >> And why is this better? 632 00:25:16,310 --> 00:25:19,600 Well, if we already know that log n is better than n-- right? 633 00:25:19,600 --> 00:25:22,590 We saw in binary search, the phone book example, log n was definitely 634 00:25:22,590 --> 00:25:23,760 better than linear. 635 00:25:23,760 --> 00:25:28,420 So that means n times log n is definitely better than n times another 636 00:25:28,420 --> 00:25:30,390 n, AKA n squared. 637 00:25:30,390 --> 00:25:32,400 And that's what we ultimately feel. 638 00:25:32,400 --> 00:25:34,928 >> So big round of applause, if we could, for these guys. 639 00:25:34,928 --> 00:25:38,920 >> [APPLAUSE] 640 00:25:38,920 --> 00:25:41,550 >> DAVID MALAN: And your parting gifts-- you may keep the numbers, 641 00:25:41,550 --> 00:25:44,010 if you would like. 642 00:25:44,010 --> 00:25:45,620 And your parting gift, as usual. 643 00:25:45,620 --> 00:25:47,290 Oh, and we will send you the footage, Michelle. 644 00:25:47,290 --> 00:25:48,343 Thank you. 645 00:25:48,343 --> 00:25:49,250 All right. 646 00:25:49,250 --> 00:25:50,400 Help yourself to a stress ball. 647 00:25:50,400 --> 00:25:54,110 >> And let me pull up, in the meantime, our friend Rob Bowden to offer 648 00:25:54,110 --> 00:25:59,520 somewhat different perspective on this, since you can think about these 649 00:25:59,520 --> 00:26:01,280 steps happening in a somewhat different way. 650 00:26:01,280 --> 00:26:04,750 In fact, the set-up for what Rob's about to show us assumes that we've 651 00:26:04,750 --> 00:26:09,030 already done the dividing up of the big list into eight small lists, 652 00:26:09,030 --> 00:26:10,570 each of size 1. 653 00:26:10,570 --> 00:26:13,350 >> So we're changing the pseudocode a little bit just to sort of get at the 654 00:26:13,350 --> 00:26:15,320 core idea of how the merging works. 655 00:26:15,320 --> 00:26:17,600 But the running time of what he's about to do is still 656 00:26:17,600 --> 00:26:19,110 going to be the same. 657 00:26:19,110 --> 00:26:23,540 And again, the set-up here is that he's begun with eight lists of size 1. 658 00:26:23,540 --> 00:26:27,730 So you've missed the part where he's actually done the log n, log n, log n 659 00:26:27,730 --> 00:26:31,205 dividing of the input. 660 00:26:31,205 --> 00:26:32,120 >> [VIDEO PLAYBACK] 661 00:26:32,120 --> 00:26:33,615 >> -That's it for step one. 662 00:26:33,615 --> 00:26:38,270 For step two, repeatedly merge pairs of lists. 663 00:26:38,270 --> 00:26:39,210 >> DAVID MALAN: Hm. 664 00:26:39,210 --> 00:26:41,270 Only audio is coming out of my computer. 665 00:26:41,270 --> 00:26:42,520 Let's try this again. 666 00:26:42,520 --> 00:26:45,330 667 00:26:45,330 --> 00:26:48,310 >> -Just arbitrarily pick which-- now we have four lists. 668 00:26:48,310 --> 00:26:51,590 669 00:26:51,590 --> 00:26:52,120 Learn before. 670 00:26:52,120 --> 00:26:53,040 >> DAVID MALAN: There we go. 671 00:26:53,040 --> 00:27:00,510 >> -Merging 108 and 15, we end up with the list 15, 108. 672 00:27:00,510 --> 00:27:07,170 Merging 50 and 4, we end up with 4, 50. 673 00:27:07,170 --> 00:27:12,990 Merging 8 and 42, we end up with 8, 42. 674 00:27:12,990 --> 00:27:19,970 And merging 23 and 16, we end up with 16, 23. 675 00:27:19,970 --> 00:27:23,270 >> Now all our lists are of size 2. 676 00:27:23,270 --> 00:27:26,690 Notice that each of the four lists is sorted. 677 00:27:26,690 --> 00:27:29,450 So we can start merging pairs of lists again. 678 00:27:29,450 --> 00:27:38,420 Merging 15 and 108 and 4 and 50, we first take the 4, then the 15, then 679 00:27:38,420 --> 00:27:41,500 the 50, then the 108. 680 00:27:41,500 --> 00:27:50,610 Merging 8, 42 and 16, 23, we first take the 8, then the 16, then the 23, 681 00:27:50,610 --> 00:27:52,700 then the 42. 682 00:27:52,700 --> 00:27:57,600 >> So now we have just two lists of size 4, each of which is sorted. 683 00:27:57,600 --> 00:28:01,170 So now we merge these two lists. 684 00:28:01,170 --> 00:28:11,835 First, we take the 4, then we take the 8, then we take the 15, then 16, then 685 00:28:11,835 --> 00:28:19,456 23, then 42, then 50, then 108. 686 00:28:19,456 --> 00:28:19,872 >> [END VIDEO PLAYBACK] 687 00:28:19,872 --> 00:28:23,430 >> DAVID MALAN: Again, notice, he never touched a given cup more than one time 688 00:28:23,430 --> 00:28:24,860 after advancing beyond it. 689 00:28:24,860 --> 00:28:26,200 So he's never repeating. 690 00:28:26,200 --> 00:28:29,850 So he's always moving to the side, and that's where we got our n. 691 00:28:29,850 --> 00:28:33,290 >> Why not let me pull up one animation that we saw earlier, but this time 692 00:28:33,290 --> 00:28:35,110 focusing only on merge sort. 693 00:28:35,110 --> 00:28:38,030 Let me go ahead and zoom in on this here. 694 00:28:38,030 --> 00:28:42,530 First let me choose a random input, magnify this, and you can sort of see 695 00:28:42,530 --> 00:28:46,600 what we took for granted, earlier, merge sort is actually doing. 696 00:28:46,600 --> 00:28:50,330 >> So notice that you get these halves or these quarters or these eighths of the 697 00:28:50,330 --> 00:28:53,140 problem that all of a sudden start to take good shape. 698 00:28:53,140 --> 00:28:57,070 And then finally, you see at the very end that bam, 699 00:28:57,070 --> 00:28:58,860 everything is merged together. 700 00:28:58,860 --> 00:29:01,690 >> So these are just three different takes on the same idea. 701 00:29:01,690 --> 00:29:05,980 But the key insight, just like divide and conquer in the very first class, 702 00:29:05,980 --> 00:29:10,640 was that we decided to somehow divide the problem into something big, into 703 00:29:10,640 --> 00:29:14,760 something sort of identical in spirit, but smaller and smaller and smaller 704 00:29:14,760 --> 00:29:15,660 and smaller. 705 00:29:15,660 --> 00:29:18,420 >> Now another fun way to sort of think about these, even though it's not 706 00:29:18,420 --> 00:29:20,520 going to give you the same intuitive understanding, is 707 00:29:20,520 --> 00:29:21,640 the following animation. 708 00:29:21,640 --> 00:29:25,400 So this is a video someone put together that associated different 709 00:29:25,400 --> 00:29:29,970 sounds with the various operations for insertion sort, for merge sort, and 710 00:29:29,970 --> 00:29:31,150 for a couple of others. 711 00:29:31,150 --> 00:29:32,330 So in a moment, I'm going to hit Play. 712 00:29:32,330 --> 00:29:33,600 It's about a minute long. 713 00:29:33,600 --> 00:29:37,410 And even though you can still see the patterns happening, this time you can 714 00:29:37,410 --> 00:29:41,420 also hear how these algorithms are performing differently and with 715 00:29:41,420 --> 00:29:43,950 somewhat different patterns. 716 00:29:43,950 --> 00:29:45,830 >> This is insertion sort. 717 00:29:45,830 --> 00:29:50,400 >> [TONES PLAYING] 718 00:29:50,400 --> 00:29:52,400 >> DAVID MALAN: It again is trying to insert each element 719 00:29:52,400 --> 00:29:52,900 into where it belongs. 720 00:29:52,900 --> 00:29:54,628 This is bubble sort. 721 00:29:54,628 --> 00:30:10,097 >> [TONES PLAYING] 722 00:30:10,097 --> 00:30:13,630 >> DAVID MALAN: And you can sort of feel how relatively little work it's doing 723 00:30:13,630 --> 00:30:15,834 on each step. 724 00:30:15,834 --> 00:30:20,470 This is what tediousness sounds like. 725 00:30:20,470 --> 00:30:21,472 >> [TONES PLAYING] 726 00:30:21,472 --> 00:30:25,222 >> DAVID MALAN: This is selection sort, where we select the element we want by 727 00:30:25,222 --> 00:30:28,845 going through again and again and again and putting it at the beginning. 728 00:30:28,845 --> 00:30:37,674 >> [TONES PLAYING] 729 00:30:37,674 --> 00:30:43,970 >> DAVID MALAN: This is merge sort, which you can really start to feel. 730 00:30:43,970 --> 00:30:51,810 >> [TONES PLAYING] 731 00:30:51,810 --> 00:30:54,770 >> [LAUGHTER] 732 00:30:54,770 --> 00:30:58,395 >> DAVID MALAN: Something called gnome sort, which we haven't looked at. 733 00:30:58,395 --> 00:31:13,630 >> [TONES PLAYING] 734 00:31:13,630 --> 00:31:17,910 >> DAVID MALAN: So let me see, now, distracted as you hopefully are by the 735 00:31:17,910 --> 00:31:21,110 music, if I can slip a little bit of math in here. 736 00:31:21,110 --> 00:31:24,850 So there's a fourth way that we can think about what it means these 737 00:31:24,850 --> 00:31:29,210 functions to be faster than ones that we've seen before. 738 00:31:29,210 --> 00:31:32,470 And if you're coming at the course from a mathematics background, you 739 00:31:32,470 --> 00:31:36,030 actually know perhaps already that you can slap a term on this technique-- 740 00:31:36,030 --> 00:31:40,400 namely recursion, a function that somehow calls itself. 741 00:31:40,400 --> 00:31:44,780 >> And again, recall that merge sort pseudocode was recursive in the sense 742 00:31:44,780 --> 00:31:48,460 that one of merge sort's steps was to call sort-- 743 00:31:48,460 --> 00:31:49,740 that is, itself. 744 00:31:49,740 --> 00:31:52,480 But thankfully, because we kept calling sort, or merge sort, 745 00:31:52,480 --> 00:31:55,880 specifically, on a smaller and smaller and smaller list, we eventually 746 00:31:55,880 --> 00:32:00,005 bottomed out thanks to what we'll call a base case, the hard-coded case that 747 00:32:00,005 --> 00:32:04,270 said if the list is small, less than 2 in that case, just return immediately. 748 00:32:04,270 --> 00:32:07,550 If we didn't have that special case, the algorithm would never bottom out, 749 00:32:07,550 --> 00:32:11,010 and you would indeed get into an infinite loop truly forever. 750 00:32:11,010 --> 00:32:14,330 >> But suppose that we wanted to now put some numbers on this, again, using n 751 00:32:14,330 --> 00:32:15,660 as the size of the input. 752 00:32:15,660 --> 00:32:18,680 And I wanted to ask you, what's the total time involved in 753 00:32:18,680 --> 00:32:19,800 running merge sort? 754 00:32:19,800 --> 00:32:22,960 Or more generally, what's the cost of it in time? 755 00:32:22,960 --> 00:32:24,730 >> Well it's pretty easy to measure that. 756 00:32:24,730 --> 00:32:29,010 If n is less than 2, the time involved in sorting n elements, 757 00:32:29,010 --> 00:32:30,480 where n is 2, is 0. 758 00:32:30,480 --> 00:32:31,410 Because we just return. 759 00:32:31,410 --> 00:32:32,510 There's no work to be done. 760 00:32:32,510 --> 00:32:35,660 Now arguably, maybe it's one step or two steps to figure out the amount of 761 00:32:35,660 --> 00:32:38,420 work, but it's close enough to 0 that I'm just going to say no work is 762 00:32:38,420 --> 00:32:40,940 required if the list is so small to be uninteresting. 763 00:32:40,940 --> 00:32:42,580 >> But this case is interesting. 764 00:32:42,580 --> 00:32:47,320 The recursive case was the branch of the pseudocode that said else, sort 765 00:32:47,320 --> 00:32:52,000 the left half, sort the right half, merge the two halves. 766 00:32:52,000 --> 00:32:55,530 Now why does this expression represent that expense? 767 00:32:55,530 --> 00:32:58,690 Well, T of n just means the time to sort n elements. 768 00:32:58,690 --> 00:33:03,070 And then on the right-hand side of the equals sign there, the T of n divided 769 00:33:03,070 --> 00:33:06,600 by 2 is referring to the cost of what? 770 00:33:06,600 --> 00:33:07,570 Sorting the left half. 771 00:33:07,570 --> 00:33:10,990 The other T of n divided by 2 is presumably referring to the cost to 772 00:33:10,990 --> 00:33:12,390 sort the right half. 773 00:33:12,390 --> 00:33:14,590 >> And then the plus n? 774 00:33:14,590 --> 00:33:15,420 Is the merging. 775 00:33:15,420 --> 00:33:19,120 Because if you have two lists, one of size n over 2 and another of size n 776 00:33:19,120 --> 00:33:22,580 over 2, you have to essentially touch each of those elements, just like Rob 777 00:33:22,580 --> 00:33:24,990 touched each of the cups, and just as we pointed at each of the 778 00:33:24,990 --> 00:33:26,310 volunteers on stage. 779 00:33:26,310 --> 00:33:28,790 So n is the expense of merging. 780 00:33:28,790 --> 00:33:31,780 >> Now unfortunately, this formula is also itself recursive. 781 00:33:31,780 --> 00:33:36,390 So if asked the question, if n is, say, 16, if there's 16 people on stage 782 00:33:36,390 --> 00:33:40,670 or 16 cups in the video, how many total steps does it take to sort them 783 00:33:40,670 --> 00:33:41,550 with merge sort? 784 00:33:41,550 --> 00:33:45,790 It's actually not an obvious answer, because now you have to sort of 785 00:33:45,790 --> 00:33:48,500 recursively answer this formula. 786 00:33:48,500 --> 00:33:51,190 >> But that's OK, because let me propose that we do the following. 787 00:33:51,190 --> 00:33:56,670 The time involved to sort 16 people or 16 cups is going to be represented 788 00:33:56,670 --> 00:33:58,020 generally as T of 16. 789 00:33:58,020 --> 00:34:01,400 But that equals, according to our previous formula, 2 times the amount 790 00:34:01,400 --> 00:34:04,780 of time it takes to sort 8 cups plus 16. 791 00:34:04,780 --> 00:34:08,590 And again, plus 16 is the time to merge, and the two times T of 8 is the 792 00:34:08,590 --> 00:34:10,790 time to sort left half and right half. 793 00:34:10,790 --> 00:34:11,989 >> But again, this isn't enough. 794 00:34:11,989 --> 00:34:13,210 We have to dive in deeper. 795 00:34:13,210 --> 00:34:16,409 This means we have to answer the question, what is T of 8? 796 00:34:16,409 --> 00:34:19,610 Well T of 8 is just 2 times T of 4 plus 8. 797 00:34:19,610 --> 00:34:20,520 Well, what's T of 4? 798 00:34:20,520 --> 00:34:23,780 T of 4 is just 2 times T of 2 plus 4. 799 00:34:23,780 --> 00:34:25,489 Well, what's T of 2? 800 00:34:25,489 --> 00:34:29,030 T of 2 is just 2 times T of 1 plus 2. 801 00:34:29,030 --> 00:34:31,940 >> And again, we're kind of getting stuck in this cycle. 802 00:34:31,940 --> 00:34:34,790 But it's about to hit that so-called base case. 803 00:34:34,790 --> 00:34:37,310 Because what's T of 1, did we claim? 804 00:34:37,310 --> 00:34:37,810 0. 805 00:34:37,810 --> 00:34:39,730 So now finally, we can work backwards. 806 00:34:39,730 --> 00:34:44,290 >> If T of 1 is 0, I can now go back up one line to this guy here, and I can 807 00:34:44,290 --> 00:34:46,330 plug in 0 for T of 1. 808 00:34:46,330 --> 00:34:51,770 So that means it equals 2 times zero, otherwise known as 0, plus 2. 809 00:34:51,770 --> 00:34:53,739 And so that whole expression is 2. 810 00:34:53,739 --> 00:34:58,740 >> Now if I take the T of 2, whose answer is 2, plug it into the middle line, T 811 00:34:58,740 --> 00:35:02,740 of 4, that gives me 2 times 2 plus 4, so 8. 812 00:35:02,740 --> 00:35:07,080 If I then plug in 8 to the previous line, that gives me 2 times 8, 16. 813 00:35:07,080 --> 00:35:12,470 And if we then continue that with the 24, adding in 16, we finally get a 814 00:35:12,470 --> 00:35:13,820 value of 64. 815 00:35:13,820 --> 00:35:18,480 >> Now that in and of itself sort of speaks nothing to the n notation, the 816 00:35:18,480 --> 00:35:20,700 big O, the omega that we've been talking about. 817 00:35:20,700 --> 00:35:24,890 But it turns out that 64 is indeed 16, the size of the input, 818 00:35:24,890 --> 00:35:27,110 log base 2 of 16. 819 00:35:27,110 --> 00:35:30,200 And if this is a little unfamiliar, just think back, and it'll come back 820 00:35:30,200 --> 00:35:30,700 to you eventually. 821 00:35:30,700 --> 00:35:33,775 If this is log base 2, it's like 2 raised to the what gives you 16? 822 00:35:33,775 --> 00:35:36,380 Oh, that's 4, so it's 16 times 4. 823 00:35:36,380 --> 00:35:39,380 >> And again, it's not a big deal if this is sort of a hazy memory now. 824 00:35:39,380 --> 00:35:43,720 But for now, take on faith that 16 log 16 is 64. 825 00:35:43,720 --> 00:35:46,590 And so indeed, with this simple sanity check, we've confirmed-- 826 00:35:46,590 --> 00:35:48,250 but not proved formally-- 827 00:35:48,250 --> 00:35:52,800 that the running time of merge sort is indeed n log n. 828 00:35:52,800 --> 00:35:53,790 >> So not bad. 829 00:35:53,790 --> 00:35:57,260 It's definitely better than the algorithms we've seen thus far, and 830 00:35:57,260 --> 00:36:00,710 it's because we've leveraged, one, a technique called recursion. 831 00:36:00,710 --> 00:36:03,880 But more interesting than that, that notion of dividing and conquering. 832 00:36:03,880 --> 00:36:07,460 Again, truly week 0 stuff that even now is recurring in a 833 00:36:07,460 --> 00:36:08,740 more compelling way. 834 00:36:08,740 --> 00:36:11,750 >> Now a fun little exercise, if you've never done this-- and you probably 835 00:36:11,750 --> 00:36:14,660 wouldn't have, because sort of normal people don't think to do this. 836 00:36:14,660 --> 00:36:20,650 But if I go to google.com and if I want to learn something about 837 00:36:20,650 --> 00:36:22,356 recursion, Enter. 838 00:36:22,356 --> 00:36:25,106 839 00:36:25,106 --> 00:36:29,058 >> [LAUGHTER] 840 00:36:29,058 --> 00:36:32,030 [MORE LAUGHTER] 841 00:36:32,030 --> 00:36:33,385 DAVID MALAN: Bad joke slowly spreading. 842 00:36:33,385 --> 00:36:34,450 [LAUGHTER] 843 00:36:34,450 --> 00:36:36,970 DAVID MALAN: Just in case, it's there. 844 00:36:36,970 --> 00:36:38,710 I didn't spell it wrong, and there's the joke. 845 00:36:38,710 --> 00:36:40,810 All right. 846 00:36:40,810 --> 00:36:42,950 Explain it to the people next to you if it hasn't quite clicked just yet. 847 00:36:42,950 --> 00:36:47,650 But recursion, more generally, refers to the process of a function calling 848 00:36:47,650 --> 00:36:51,430 itself, or more generally, dividing a problem into something that can be 849 00:36:51,430 --> 00:36:56,220 solved piecemeal by solving identical representative problems. 850 00:36:56,220 --> 00:36:58,400 >> Well, let's change gears for just a moment. 851 00:36:58,400 --> 00:37:00,840 We like to end on certain cliffhangers, so let's start to set 852 00:37:00,840 --> 00:37:05,870 the stage, for several minutes, on a very simple idea-- 853 00:37:05,870 --> 00:37:07,620 that of swapping two elements, right? 854 00:37:07,620 --> 00:37:10,040 All of these algorithms we've been talking about the past couple of 855 00:37:10,040 --> 00:37:12,420 lectures involve some sort of swapping. 856 00:37:12,420 --> 00:37:14,630 Today it was visualized by them getting up out of their chairs and 857 00:37:14,630 --> 00:37:18,570 walking around, but in code, we would just take an element from one array 858 00:37:18,570 --> 00:37:20,370 and plop it into another. 859 00:37:20,370 --> 00:37:21,880 >> So how do we go about doing this? 860 00:37:21,880 --> 00:37:24,850 Well, let me go ahead and write a quick program here. 861 00:37:24,850 --> 00:37:31,600 I'm going to go ahead and do this as the following. 862 00:37:31,600 --> 00:37:33,910 Let's call this-- 863 00:37:33,910 --> 00:37:38,070 what do we want to call this one? 864 00:37:38,070 --> 00:37:38,650 >> Actually, no. 865 00:37:38,650 --> 00:37:39,420 Let me rewind. 866 00:37:39,420 --> 00:37:41,220 I don't want to do that cliffhanger yet. 867 00:37:41,220 --> 00:37:42,270 It will spoil the fun. 868 00:37:42,270 --> 00:37:43,600 Let's do this instead. 869 00:37:43,600 --> 00:37:47,430 >> Suppose that I want to write a little program and that now embraces this 870 00:37:47,430 --> 00:37:48,700 idea of recursion. 871 00:37:48,700 --> 00:37:50,370 I kind of got ahead of myself there. 872 00:37:50,370 --> 00:37:51,420 I'm going to do the following. 873 00:37:51,420 --> 00:38:00,220 >> First, a quick include of standard io.h, as well as an include of cs50.h. 874 00:38:00,220 --> 00:38:03,200 And then I'm going to go ahead and declare int main void 875 00:38:03,200 --> 00:38:04,360 in the usual way. 876 00:38:04,360 --> 00:38:07,920 I realized I've misnamed the file, so let me just add a .c extension here so 877 00:38:07,920 --> 00:38:09,510 that we can compile it properly. 878 00:38:09,510 --> 00:38:10,970 Start this function off. 879 00:38:10,970 --> 00:38:13,290 >> And the function I want to write, quite simply, is one that asks the 880 00:38:13,290 --> 00:38:16,210 user for a number and then adds up all the numbers between that 881 00:38:16,210 --> 00:38:19,920 number and, say, 0. 882 00:38:19,920 --> 00:38:22,510 So first I'm going to go ahead and declare int n. 883 00:38:22,510 --> 00:38:24,760 Then I copy some code that we've used for a while. 884 00:38:24,760 --> 00:38:26,660 While something is true. 885 00:38:26,660 --> 00:38:28,000 I'll come back to that in a moment. 886 00:38:28,000 --> 00:38:29,010 >> What do I want to do? 887 00:38:29,010 --> 00:38:33,460 I want to say printf positive integer please. 888 00:38:33,460 --> 00:38:36,130 And then I'm going to say n gets get int. 889 00:38:36,130 --> 00:38:38,800 So again, some boilerplate code that we've used before. 890 00:38:38,800 --> 00:38:40,810 And I'm going to do this while n is less than 1. 891 00:38:40,810 --> 00:38:44,120 So this will ensure that the user gives me a positive integer. 892 00:38:44,120 --> 00:38:45,490 >> And now I'm going to do the following. 893 00:38:45,490 --> 00:38:51,020 I want to add up all of the numbers between 1 and and n, or 0 and n, 894 00:38:51,020 --> 00:38:52,570 equivalently, to get the total sum. 895 00:38:52,570 --> 00:38:55,100 So the big sigma symbol that you might recall. 896 00:38:55,100 --> 00:38:59,050 So I'm going to do this by first calling a function called sigma, 897 00:38:59,050 --> 00:39:06,030 passing it in n, and then I'm going to say printf, the answer is right there. 898 00:39:06,030 --> 00:39:08,180 >> So in short, I get and int from the user. 899 00:39:08,180 --> 00:39:09,280 I ensure it's positive. 900 00:39:09,280 --> 00:39:12,700 I declare a variable called answer of type int and store in it the return 901 00:39:12,700 --> 00:39:15,610 value of sigma, passing in n as input. 902 00:39:15,610 --> 00:39:17,060 And then I print out that answer. 903 00:39:17,060 --> 00:39:19,550 >> Unfortunately, even though sigma sounds like something that might be in 904 00:39:19,550 --> 00:39:24,040 the math.h file, its declaration, it's actually not. 905 00:39:24,040 --> 00:39:24,690 So that's OK. 906 00:39:24,690 --> 00:39:26,170 I can implement this myself. 907 00:39:26,170 --> 00:39:29,160 I'm going to implement a function called sigma, and it's going to take a 908 00:39:29,160 --> 00:39:29,900 parameter-- 909 00:39:29,900 --> 00:39:32,100 let's just call it m, just so it's different. 910 00:39:32,100 --> 00:39:35,910 And then up here, I'm going to say, well, if m is less than 1-- this is a 911 00:39:35,910 --> 00:39:38,180 very uninteresting program. 912 00:39:38,180 --> 00:39:41,700 So I'm going to go ahead and immediately return 0. 913 00:39:41,700 --> 00:39:45,920 It just doesn't make sense to add up all the numbers between 1 and m if m 914 00:39:45,920 --> 00:39:48,470 is itself 0 or negative. 915 00:39:48,470 --> 00:39:50,900 >> And then I'm going to go ahead and do this very iteratively. 916 00:39:50,900 --> 00:39:53,090 I'm going to do this sort of old-school, and I'm going to go ahead 917 00:39:53,090 --> 00:39:57,150 and say that I'm going to declare a sum to be 0. 918 00:39:57,150 --> 00:39:59,630 Then I'm going to have a for loop of int-- 919 00:39:59,630 --> 00:40:02,820 and let me do it to match our distribution code, so you have a copy 920 00:40:02,820 --> 00:40:07,500 at home. int i gets 1 on up to i is less than or equal to m. 921 00:40:07,500 --> 00:40:09,430 i plus plus. 922 00:40:09,430 --> 00:40:11,430 And then inside of this for loop-- 923 00:40:11,430 --> 00:40:12,440 we're almost there-- 924 00:40:12,440 --> 00:40:15,810 sum gets sum plus 1. 925 00:40:15,810 --> 00:40:17,670 And then I'm going to return the sum. 926 00:40:17,670 --> 00:40:19,420 >> So I did this quickly, quite admittedly. 927 00:40:19,420 --> 00:40:22,775 But again, the main function's pretty straightforward, based on code we've 928 00:40:22,775 --> 00:40:23,190 written thus far. 929 00:40:23,190 --> 00:40:25,610 Uses the dual loop to get a positive int from the user. 930 00:40:25,610 --> 00:40:29,870 I then pass that int to a new function called sigma, calling it, again, n. 931 00:40:29,870 --> 00:40:33,100 And I store the return value, the answer from the black box currently 932 00:40:33,100 --> 00:40:35,460 known as sigma, in a variable called answer. 933 00:40:35,460 --> 00:40:36,580 Then I print it. 934 00:40:36,580 --> 00:40:39,090 >> If we now continue the story, how is sigma implemented? 935 00:40:39,090 --> 00:40:40,840 I propose to implement as follows. 936 00:40:40,840 --> 00:40:43,560 First, a little bit of error-checking to make sure that the user's not 937 00:40:43,560 --> 00:40:46,480 messing with me and passing in some negative or 0 value. 938 00:40:46,480 --> 00:40:49,710 Then I declare a variable called sum and set it to 0. 939 00:40:49,710 --> 00:40:55,910 >> And now I begin to move from i equals 1 all the way up to and including m, 940 00:40:55,910 --> 00:41:00,130 because I want to include all the numbers from one through m, inclusive. 941 00:41:00,130 --> 00:41:04,350 And inside of this for loop, I just do sum gets whatever it is now, plus the 942 00:41:04,350 --> 00:41:08,900 value of i. 943 00:41:08,900 --> 00:41:10,370 Plus the value of i. 944 00:41:10,370 --> 00:41:14,090 >> As an aside, if you've not seen this before, there's some syntactic sugar 945 00:41:14,090 --> 00:41:14,870 for this line. 946 00:41:14,870 --> 00:41:21,120 I can rewrite this as plus equals i, just to save myself a few keystrokes 947 00:41:21,120 --> 00:41:22,570 and to look a bit cooler. 948 00:41:22,570 --> 00:41:23,140 But that's all. 949 00:41:23,140 --> 00:41:24,660 It's functionally the same thing. 950 00:41:24,660 --> 00:41:26,710 >> Unfortunately, this code's not going to compile yet. 951 00:41:26,710 --> 00:41:31,600 If I run make sigma 0, how am I going to get yelled at? 952 00:41:31,600 --> 00:41:33,473 What's it going to not like? 953 00:41:33,473 --> 00:41:35,740 >> AUDIENCE: [INAUDIBLE]. 954 00:41:35,740 --> 00:41:37,800 >> DAVID MALAN: Yeah, I didn't declare the function up top, right? 955 00:41:37,800 --> 00:41:40,590 C is kind of stupid, in that it only does what you tell it to do, and you 956 00:41:40,590 --> 00:41:41,880 have to do it in that order. 957 00:41:41,880 --> 00:41:45,910 And so if I hit Enter here, I'm going to get a warning about sigma implicit 958 00:41:45,910 --> 00:41:46,860 declaration. 959 00:41:46,860 --> 00:41:48,120 >> Oh, not a problem. 960 00:41:48,120 --> 00:41:50,370 I can go up to the top, and I can say, all right, wait a minute. 961 00:41:50,370 --> 00:41:54,240 Sigma is a function that returns an int and it expects an 962 00:41:54,240 --> 00:41:56,620 int as input, semicolon. 963 00:41:56,620 --> 00:41:59,550 Or I could put the whole function above main, but in general, I'd 964 00:41:59,550 --> 00:42:02,260 recommend against that, because it's nice to always have main at the top so 965 00:42:02,260 --> 00:42:06,310 you can dive right in and know what a program's doing by reading main first. 966 00:42:06,310 --> 00:42:07,930 >> So now let me clear the screen. 967 00:42:07,930 --> 00:42:09,330 Remake sigma 0. 968 00:42:09,330 --> 00:42:10,340 All seems to check out. 969 00:42:10,340 --> 00:42:11,970 Let me run sigma 0. 970 00:42:11,970 --> 00:42:12,770 Positive inter. 971 00:42:12,770 --> 00:42:15,580 I'll give it the number 3 to keep it simple. 972 00:42:15,580 --> 00:42:18,710 So that should give me 3 plus 2 plus 1, so 6. 973 00:42:18,710 --> 00:42:20,750 Enter, and indeed I get 6. 974 00:42:20,750 --> 00:42:21,820 >> I can do something bigger-- 975 00:42:21,820 --> 00:42:24,080 50, 12, 75. 976 00:42:24,080 --> 00:42:27,690 Just as a tangent, I'm going to do something ridiculous like a really big 977 00:42:27,690 --> 00:42:30,375 number, Oh, that actually worked out-- 978 00:42:30,375 --> 00:42:31,600 eh, I don't think that's right. 979 00:42:31,600 --> 00:42:32,810 Let's see. 980 00:42:32,810 --> 00:42:34,060 Let's really mess with it. 981 00:42:34,060 --> 00:42:37,150 982 00:42:37,150 --> 00:42:38,400 >> That's a problem. 983 00:42:38,400 --> 00:42:43,180 984 00:42:43,180 --> 00:42:44,970 What's going on? 985 00:42:44,970 --> 00:42:46,050 The code's not that bad. 986 00:42:46,050 --> 00:42:48,470 It's still linear. 987 00:42:48,470 --> 00:42:50,810 Whistling is a good effect, though. 988 00:42:50,810 --> 00:42:52,060 What's going on? 989 00:42:52,060 --> 00:42:54,700 990 00:42:54,700 --> 00:42:55,620 >> Not sure if I heard it. 991 00:42:55,620 --> 00:42:57,620 So it turns out-- and this is as an aside. 992 00:42:57,620 --> 00:42:59,400 This is not core to the idea of recursion. 993 00:42:59,400 --> 00:43:02,480 It turns out, because I'm trying to represent such a big number, most 994 00:43:02,480 --> 00:43:06,980 likely it's being misinterpreted by C as a not positive number, 995 00:43:06,980 --> 00:43:09,980 but negative number. 996 00:43:09,980 --> 00:43:12,690 >> We have not talked about this, but it turns out there are negative numbers 997 00:43:12,690 --> 00:43:14,210 in the world in addition to positive numbers. 998 00:43:14,210 --> 00:43:16,290 And the means by which you can represent a negative number 999 00:43:16,290 --> 00:43:19,530 essentially is, you use one special bit to indicate 1000 00:43:19,530 --> 00:43:20,400 positive over negative. 1001 00:43:20,400 --> 00:43:22,950 It's a little more complex than that, but that's the basic idea. 1002 00:43:22,950 --> 00:43:26,740 >> So unfortunately, if C is confusing one of those bits as actually meaning, 1003 00:43:26,740 --> 00:43:30,790 oh, this is a negative number, my loop here, for instance, is actually never 1004 00:43:30,790 --> 00:43:31,740 going to terminate. 1005 00:43:31,740 --> 00:43:33,850 So if I were actually printing something again and again, we would 1006 00:43:33,850 --> 00:43:34,650 see a whole lot. 1007 00:43:34,650 --> 00:43:36,220 But again, this is besides the point. 1008 00:43:36,220 --> 00:43:38,590 This is really just a sort of intellectual curiosity that we'll come 1009 00:43:38,590 --> 00:43:39,550 back to eventually. 1010 00:43:39,550 --> 00:43:43,400 But for now, this is a correct implementation if we assume that the 1011 00:43:43,400 --> 00:43:45,970 user will provide ints that fit within ints. 1012 00:43:45,970 --> 00:43:49,370 >> But I claim that this code, frankly, could be done so much more simply. 1013 00:43:49,370 --> 00:43:54,060 If the goal at hand is to take a number like m and add up all of the 1014 00:43:54,060 --> 00:43:59,510 numbers between it and 1, or conversely between 1 and it, I claim 1015 00:43:59,510 --> 00:44:03,380 that I can borrow this idea that merge sort had, which was taking a problem 1016 00:44:03,380 --> 00:44:05,660 of this size and dividing it into something smaller. 1017 00:44:05,660 --> 00:44:09,875 Maybe not half, but smaller, but representatively the same. 1018 00:44:09,875 --> 00:44:12,130 Same idea, but a smaller problem. 1019 00:44:12,130 --> 00:44:15,470 >> So I'm actually-- let me save this file with a different version number. 1020 00:44:15,470 --> 00:44:17,670 We'll call this version 1 instead of 0. 1021 00:44:17,670 --> 00:44:20,790 And I claim that I can actually reimplement this in this sort of 1022 00:44:20,790 --> 00:44:22,160 mind-bending way. 1023 00:44:22,160 --> 00:44:23,710 >> I'm going to leave part of it alone. 1024 00:44:23,710 --> 00:44:27,710 I'm going to say if m is less than or even equal to 0-- 1025 00:44:27,710 --> 00:44:29,280 I'm just going to be a little more anal this time 1026 00:44:29,280 --> 00:44:30,520 with my error checking-- 1027 00:44:30,520 --> 00:44:33,190 I'm going to go ahead and return 0. 1028 00:44:33,190 --> 00:44:34,490 This is arbitrary. 1029 00:44:34,490 --> 00:44:37,500 I am just simply deciding if the user gives me a negative number, I'm 1030 00:44:37,500 --> 00:44:41,490 returning 0, and they should have read the documentation more closely. 1031 00:44:41,490 --> 00:44:42,170 >> Else-- 1032 00:44:42,170 --> 00:44:44,070 notice what I'm going to do. 1033 00:44:44,070 --> 00:44:49,260 Else I am going to return m plus-- 1034 00:44:49,260 --> 00:44:51,010 what is sigma of m? 1035 00:44:51,010 --> 00:44:56,710 Well, sigma of m plus m minus 1, plus m minus 2, plus m minus 3. 1036 00:44:56,710 --> 00:44:58,030 I don't want to write all of that out. 1037 00:44:58,030 --> 00:44:59,120 Why don't I just punt? 1038 00:44:59,120 --> 00:45:05,080 Recursively call myself with a slightly smaller problem, semicolon, 1039 00:45:05,080 --> 00:45:06,840 and call it a day? 1040 00:45:06,840 --> 00:45:07,040 >> Right? 1041 00:45:07,040 --> 00:45:10,980 Now here, too, you might feel or worry that this is an infinite loop that I'm 1042 00:45:10,980 --> 00:45:15,450 inducing, whereby I am implementing sigma by calling sigma. 1043 00:45:15,450 --> 00:45:20,342 But that's perfectly OK, because I thought ahead an added which lines? 1044 00:45:20,342 --> 00:45:22,600 >> AUDIENCE: [INAUDIBLE]. 1045 00:45:22,600 --> 00:45:25,430 >> DAVID MALAN: 23 to 26, which is my if condition. 1046 00:45:25,430 --> 00:45:28,390 Because what's nice about the subtraction here, because I keep 1047 00:45:28,390 --> 00:45:31,180 handing sigma smaller problems, smaller problems, smaller-- it's not 1048 00:45:31,180 --> 00:45:31,870 half the size. 1049 00:45:31,870 --> 00:45:34,380 It's only a baby step smaller, but that's OK. 1050 00:45:34,380 --> 00:45:38,050 Because eventually, we'll work our way down to 1 or 0. 1051 00:45:38,050 --> 00:45:41,630 And once we hit 0, sigma's not going to call itself anymore. 1052 00:45:41,630 --> 00:45:43,590 It's going to immediately return 0. 1053 00:45:43,590 --> 00:45:47,960 >> So the effect, if you sort of wind this up in your mind, is to add m plus 1054 00:45:47,960 --> 00:45:52,740 m minus 1, plus m minus 2, plus m minus 3, plus dot, dot, dot, m minus 1055 00:45:52,740 --> 00:45:57,820 m, eventually giving you 0, and the effect is ultimately to add all of 1056 00:45:57,820 --> 00:45:59,070 these things together. 1057 00:45:59,070 --> 00:46:02,380 So we have not, with recursion, solved the problem that we 1058 00:46:02,380 --> 00:46:03,470 couldn't solve before. 1059 00:46:03,470 --> 00:46:06,840 Indeed, version 0 of this, and every problem to date, has been solvable 1060 00:46:06,840 --> 00:46:09,980 with just using for loops or while loops or similar constructs. 1061 00:46:09,980 --> 00:46:13,150 >> But recursion, I daresay, gives us a different way of thinking about 1062 00:46:13,150 --> 00:46:17,010 problems, whereby if we can take a problem, divide it from something 1063 00:46:17,010 --> 00:46:22,340 somewhat large into something somewhat smaller, I claim that we can solve it 1064 00:46:22,340 --> 00:46:26,040 perhaps a little more elegantly in terms of the design, with less code, 1065 00:46:26,040 --> 00:46:30,980 and maybe even solve problems that would be harder, as we'll eventually 1066 00:46:30,980 --> 00:46:33,280 see, solving purely iteratively. 1067 00:46:33,280 --> 00:46:35,980 >> But the cliffhanger that I did want to leave us on was this. 1068 00:46:35,980 --> 00:46:40,720 Let me go ahead and open up a file from-- 1069 00:46:40,720 --> 00:46:44,300 actually, let me go and do this real fast. 1070 00:46:44,300 --> 00:46:46,875 Let me go ahead and propose the following. 1071 00:46:46,875 --> 00:46:51,160 1072 00:46:51,160 --> 00:46:54,785 Among today's code is this file here. 1073 00:46:54,785 --> 00:47:01,090 1074 00:47:01,090 --> 00:47:03,050 This one here, noswap. 1075 00:47:03,050 --> 00:47:06,260 >> So this is a stupid little program that I whipped up that claims to do 1076 00:47:06,260 --> 00:47:06,940 the following. 1077 00:47:06,940 --> 00:47:10,140 In main, it first declares an int called x and assigns it 1078 00:47:10,140 --> 00:47:11,100 the value of 1. 1079 00:47:11,100 --> 00:47:13,520 Then it declares an int y and assigns it the value 2. 1080 00:47:13,520 --> 00:47:15,310 Then it prints out what x and y is. 1081 00:47:15,310 --> 00:47:17,781 Then it says, swapping, dot dot dot. 1082 00:47:17,781 --> 00:47:21,670 >> Then it claims to be calling a function called swap, passing in x and 1083 00:47:21,670 --> 00:47:24,290 y, the idea of which is that hopefully x and y will come back 1084 00:47:24,290 --> 00:47:25,620 different, the opposite. 1085 00:47:25,620 --> 00:47:27,110 Then it claim swapped! 1086 00:47:27,110 --> 00:47:28,420 with an exclamation point. 1087 00:47:28,420 --> 00:47:30,190 Then it prints out x and y. 1088 00:47:30,190 --> 00:47:33,480 >> But it turns out that this very simple demonstration down 1089 00:47:33,480 --> 00:47:35,570 here is actually buggy. 1090 00:47:35,570 --> 00:47:38,870 Even though I'm declaring a temporary variable and temporarily putting a in 1091 00:47:38,870 --> 00:47:42,010 it, then I'm reassigning a the value of b-- 1092 00:47:42,010 --> 00:47:45,080 which feels reasonable, because I've saved a copy of a in temp. 1093 00:47:45,080 --> 00:47:48,410 Then I update b to equal whatever was in temp. 1094 00:47:48,410 --> 00:47:51,610 This sort of shell game of moving a into b and b into a by using this 1095 00:47:51,610 --> 00:47:54,360 middle-man called temp feels perfectly reasonable. 1096 00:47:54,360 --> 00:47:57,270 >> But I claim that when I run this code, as I'll do now-- 1097 00:47:57,270 --> 00:47:59,490 let me go ahead and paste it in here. 1098 00:47:59,490 --> 00:48:01,995 I'll call this noswap.c. 1099 00:48:01,995 --> 00:48:05,630 And as the name suggests, this is not going to be a correct program. 1100 00:48:05,630 --> 00:48:09,460 Make noswap ./ no swap. 1101 00:48:09,460 --> 00:48:12,110 x is 1, y is 2, swapping, swapped. 1102 00:48:12,110 --> 00:48:14,220 x is 1, y is 2. 1103 00:48:14,220 --> 00:48:16,920 This is fundamentally wrong, even though this seems perfectly 1104 00:48:16,920 --> 00:48:17,730 reasonable to me. 1105 00:48:17,730 --> 00:48:21,330 And there is a reason, but we're not going to reveal the reason just yet. 1106 00:48:21,330 --> 00:48:24,610 >> For now the second cliffhanger I wanted to leave you with is this, an 1107 00:48:24,610 --> 00:48:27,120 announcement of sorts on coupon codes. 1108 00:48:27,120 --> 00:48:31,590 Our innovation with late days this year has provoked a non-trivial number 1109 00:48:31,590 --> 00:48:33,830 of questions, which was not our intention. 1110 00:48:33,830 --> 00:48:36,590 The intention of these coupon codes, whereby if you do part of the problem 1111 00:48:36,590 --> 00:48:39,850 set early, thereby getting an extra day, was really to help you guys help 1112 00:48:39,850 --> 00:48:42,420 yourself start early, sort of by incentivizing you. 1113 00:48:42,420 --> 00:48:44,880 Helps us distribute load across office hours better so that 1114 00:48:44,880 --> 00:48:45,740 it's sort of win-win. 1115 00:48:45,740 --> 00:48:48,860 >> Unfortunately, I think my instructions have not been, to date, very clear, so 1116 00:48:48,860 --> 00:48:52,230 I went back this weekend and updated the spec in bigger, bolder text to 1117 00:48:52,230 --> 00:48:53,600 explain bullets like these. 1118 00:48:53,600 --> 00:48:56,900 And just to say it more publicly, by default, problem sets are due Thursday 1119 00:48:56,900 --> 00:48:58,400 at noon, per the syllabus. 1120 00:48:58,400 --> 00:49:02,030 If you start early, finishing part of the problem set by Wednesday at 12:00 1121 00:49:02,030 --> 00:49:05,170 PM, the part that relates to a coupon code, the idea is that you can extend 1122 00:49:05,170 --> 00:49:07,710 your deadline for the P set until Friday. 1123 00:49:07,710 --> 00:49:10,890 That is, bit off a tiny part of the P set relative to what typically is the 1124 00:49:10,890 --> 00:49:13,200 larger problem, and you buy yourself an extra day. 1125 00:49:13,200 --> 00:49:15,190 Again, it gets you thinking about the problem set, gets you to 1126 00:49:15,190 --> 00:49:16,380 office hours sooner. 1127 00:49:16,380 --> 00:49:20,670 But the coupon code problem is still required, even if you don't submit it. 1128 00:49:20,670 --> 00:49:23,340 >> But more compellingly is this. 1129 00:49:23,340 --> 00:49:26,520 (STAGE WHISPER) And those folks leaving early are gonna regret it. 1130 00:49:26,520 --> 00:49:28,620 As are the folks on the balcony. 1131 00:49:28,620 --> 00:49:32,510 I apologize in advance to the folks on the balcony for reasons that will be 1132 00:49:32,510 --> 00:49:33,920 clear in just a moment. 1133 00:49:33,920 --> 00:49:37,050 >> So we are fortunate to have one of CS50's former head teaching fellows at 1134 00:49:37,050 --> 00:49:39,302 a company called dropbox.com. 1135 00:49:39,302 --> 00:49:45,500 They have very generously donated a coupon code here for this much space, 1136 00:49:45,500 --> 00:49:48,180 which is up from the usual 2 gigabytes. 1137 00:49:48,180 --> 00:49:51,740 So what I thought we would do on this final note is do a bit of a giveaway, 1138 00:49:51,740 --> 00:49:55,380 whereby in just a moment, we will reveal the winner and who has a coupon 1139 00:49:55,380 --> 00:49:57,980 code that you can then go to their website, type it in, and voila, get a 1140 00:49:57,980 --> 00:50:01,370 whole lot more Dropbox space for your appliance and for your personal files. 1141 00:50:01,370 --> 00:50:05,690 >> And first, who would like to participate in this drawing? 1142 00:50:05,690 --> 00:50:09,630 OK, now that makes it even more fun. 1143 00:50:09,630 --> 00:50:14,010 The person who receives this 25-gigabyte coupon code-- which is far 1144 00:50:14,010 --> 00:50:16,150 more compelling than the late days now, perhaps-- 1145 00:50:16,150 --> 00:50:20,620 is the one who is seated on top of a seat cushion beneath which there is 1146 00:50:20,620 --> 00:50:21,620 that coupon code. 1147 00:50:21,620 --> 00:50:23,480 You may now look underneath your seat cushion. 1148 00:50:23,480 --> 00:50:28,710 1149 00:50:28,710 --> 00:50:29,680 >> [VIDEO PLAYBACK] 1150 00:50:29,680 --> 00:50:31,620 >> -One, two, three. 1151 00:50:31,620 --> 00:50:35,015 >> [SCREAMING] 1152 00:50:35,015 --> 00:50:35,985 >> -You get a car! 1153 00:50:35,985 --> 00:50:36,670 You get a car! 1154 00:50:36,670 --> 00:50:37,970 >> DAVID MALAN: We will see you on Wednesday. 1155 00:50:37,970 --> 00:50:38,904 >> -You get a car! 1156 00:50:38,904 --> 00:50:39,371 You get a car! 1157 00:50:39,371 --> 00:50:40,305 You get a car! 1158 00:50:40,305 --> 00:50:41,706 You get a car! 1159 00:50:41,706 --> 00:50:43,107 You get a car! 1160 00:50:43,107 --> 00:50:45,530 >> DAVID MALAN: Balcony folks, come down here to the front, 1161 00:50:45,530 --> 00:50:46,866 where we have extras. 1162 00:50:46,866 --> 00:50:50,282 >> -Everybody gets a car! 1163 00:50:50,282 --> 00:50:52,234 Everybody gets a car! 1164 00:50:52,234 --> 00:50:52,722 >> [END VIDEO PLAYBACK] 1165 00:50:52,722 --> 00:50:54,590 >> NARRATOR: At the next CS50-- 1166 00:50:54,590 --> 00:51:00,374 >> SPEAKER 5: Oh my gosh gosh gosh gosh gosh gosh gosh gosh gosh gosh-- 1167 00:51:00,374 --> 00:51:02,106 >> [UKELELE PLAYS]