1 00:00:00,000 --> 00:00:11,100 >> [MUSIC PLAYING] 2 00:00:11,100 --> 00:00:11,490 >> DAVID J. MALAN: All right. 3 00:00:11,490 --> 00:00:12,170 So welcome back. 4 00:00:12,170 --> 00:00:15,180 This is CS50, and the is the end of week three. 5 00:00:15,180 --> 00:00:17,770 >> So recall in the past several weeks, we've been spending quite a bit of 6 00:00:17,770 --> 00:00:20,820 time on C, on programming, on syntax. 7 00:00:20,820 --> 00:00:24,680 And it's quite normal, if you're still struggling with Problem Set 2, to be 8 00:00:24,680 --> 00:00:25,950 banging your head against the wall. 9 00:00:25,950 --> 00:00:28,310 It's cryptic-looking error messages and bugs that you 10 00:00:28,310 --> 00:00:29,220 can't quite chase down. 11 00:00:29,220 --> 00:00:32,310 Because, rest assured, that in just a few weeks' time you'll look back on 12 00:00:32,310 --> 00:00:35,930 things like Caesar, and [? V-genair, ?] maybe even Crack, and 13 00:00:35,930 --> 00:00:40,050 realize just how far you've come in a short period of time. 14 00:00:40,050 --> 00:00:43,670 So if that's any consolation, hang in there for now. 15 00:00:43,670 --> 00:00:46,610 >> Today, though, we begin to transition to things higher level. 16 00:00:46,610 --> 00:00:49,820 And we start to take for granted that you guys know how to program, or at 17 00:00:49,820 --> 00:00:52,090 least the beginnings of that comfort level. 18 00:00:52,090 --> 00:00:56,520 And we'll start to consider how we can go about designing programs more 19 00:00:56,520 --> 00:00:57,440 effectively. 20 00:00:57,440 --> 00:01:01,090 How we can go about optimizing the efficiency of our algorithms, and 21 00:01:01,090 --> 00:01:03,110 generally solving more interesting problems. 22 00:01:03,110 --> 00:01:06,850 And starting to take for granted that, if we wanted to, we could code up any 23 00:01:06,850 --> 00:01:08,350 of the examples we have in mind. 24 00:01:08,350 --> 00:01:11,430 So today, we don't touch the keyboard for any form of code. 25 00:01:11,430 --> 00:01:15,150 It'll be much higher level, and ultimately, about problem-solving. 26 00:01:15,150 --> 00:01:20,490 >> So to get to that point, let me propose that the following seven 27 00:01:20,490 --> 00:01:24,290 rectangles represent seven doors, behind which are a whole bunch of 28 00:01:24,290 --> 00:01:26,340 numbers, among which is the number 50. 29 00:01:26,340 --> 00:01:30,470 Let me project this on this screen here as well. 30 00:01:30,470 --> 00:01:36,770 And propose that we need a volunteer to help find me a number in front of 31 00:01:36,770 --> 00:01:38,140 the internet here to see. 32 00:01:38,140 --> 00:01:40,755 Come on up, in the pink. 33 00:01:40,755 --> 00:01:43,050 All right. 34 00:01:43,050 --> 00:01:43,930 What's your name? 35 00:01:43,930 --> 00:01:44,850 >> JENNIFER: [INAUDIBLE] 36 00:01:44,850 --> 00:01:45,170 >> DAVID J. MALAN: Sorry? 37 00:01:45,170 --> 00:01:45,860 >> JENNIFER: Jennifer. 38 00:01:45,860 --> 00:01:46,390 >> DAVID J. MALAN: Jennifer. 39 00:01:46,390 --> 00:01:46,980 All right, Jennifer. 40 00:01:46,980 --> 00:01:47,630 Nice to meet you. 41 00:01:47,630 --> 00:01:48,370 Come on up. 42 00:01:48,370 --> 00:01:52,430 So these here are seven doors, and what I'd like you to do for us here, 43 00:01:52,430 --> 00:01:56,560 in front of all of your classmates, is find us the number, 50. 44 00:01:56,560 --> 00:02:00,860 To find a number, you can peek behind any of these doors by simply tapping 45 00:02:00,860 --> 00:02:03,030 on one of the doors, and it will reveal its number. 46 00:02:03,030 --> 00:02:06,080 And let's see how quickly you can find us the number, 50. 47 00:02:06,080 --> 00:02:09,979 48 00:02:09,979 --> 00:02:11,229 >> 15. 49 00:02:11,229 --> 00:02:13,110 50 00:02:13,110 --> 00:02:14,360 16. 51 00:02:14,360 --> 00:02:16,270 52 00:02:16,270 --> 00:02:16,530 50. 53 00:02:16,530 --> 00:02:17,350 Nicely done. 54 00:02:17,350 --> 00:02:18,040 All right. 55 00:02:18,040 --> 00:02:19,906 Round of applause for Jennifer. 56 00:02:19,906 --> 00:02:21,530 >> [APPLAUSE] 57 00:02:21,530 --> 00:02:22,320 >> All right. 58 00:02:22,320 --> 00:02:25,254 So what was your strategy for finding the number, 50? 59 00:02:25,254 --> 00:02:27,222 >> JENNIFER: Um, I thought maybe if-- 60 00:02:27,222 --> 00:02:27,714 [INAUDIBLE] 61 00:02:27,714 --> 00:02:28,206 >> DAVID J. MALAN: Oh. 62 00:02:28,206 --> 00:02:29,630 Give it one second. 63 00:02:29,630 --> 00:02:32,420 So was your strategy for finding the number, 50? 64 00:02:32,420 --> 00:02:34,760 >> JENNIFER: So I just start at the beginning to see what the first number 65 00:02:34,760 --> 00:02:38,590 was, and then I thought, maybe if they're sorted, I'll just keep 66 00:02:38,590 --> 00:02:39,970 tapping higher up? 67 00:02:39,970 --> 00:02:40,140 >> DAVID J. MALAN: OK. 68 00:02:40,140 --> 00:02:42,910 And we seem to have found that to be the case. 69 00:02:42,910 --> 00:02:45,670 Although, let's peel back the layers just a little bit, and you want to go 70 00:02:45,670 --> 00:02:47,640 ahead and reveal the other doors you could have chosen? 71 00:02:47,640 --> 00:02:50,400 72 00:02:50,400 --> 00:02:51,712 >> JENNIFER: Oh, dear. 73 00:02:51,712 --> 00:02:53,128 >> DAVID J. MALAN: Ah. 74 00:02:53,128 --> 00:02:54,280 >> JENNIFER: So I just got lucky. 75 00:02:54,280 --> 00:02:55,270 >> DAVID J. MALAN: So you got lucky. 76 00:02:55,270 --> 00:02:55,710 All right. 77 00:02:55,710 --> 00:02:56,795 So not bad. 78 00:02:56,795 --> 00:02:58,750 But that's an interesting insight, right? 79 00:02:58,750 --> 00:03:01,870 If you assumed, and you did get, indeed, a bit lucky there. 80 00:03:01,870 --> 00:03:05,350 But if you assumed that the numbers were sorted, can you be more precise 81 00:03:05,350 --> 00:03:08,750 as to how that influenced your behavior? 82 00:03:08,750 --> 00:03:11,715 >> JENNIFER: So if they were sorted, I thought maybe smallest to largest. 83 00:03:11,715 --> 00:03:11,970 >> DAVID J. MALAN: OK. 84 00:03:11,970 --> 00:03:15,260 >> JENNIFER: Or if this ended up being really big, then largest to smallest. 85 00:03:15,260 --> 00:03:15,540 >> DAVID J. MALAN: OK. 86 00:03:15,540 --> 00:03:18,170 So largest to smallest, or smallest to largest. 87 00:03:18,170 --> 00:03:21,990 But let me propose, suppose you had gotten unlucky, and suppose that they 88 00:03:21,990 --> 00:03:26,840 were not, in fact, sorted, how many of those doors might you have had to peek 89 00:03:26,840 --> 00:03:28,590 behind in that worst case? 90 00:03:28,590 --> 00:03:29,860 >> JENNIFER: All of them. 91 00:03:29,860 --> 00:03:30,420 >> DAVID J. MALAN: All of them. 92 00:03:30,420 --> 00:03:31,740 So let's generalize that as n. 93 00:03:31,740 --> 00:03:34,790 There happens to be 7, but let's more generally say there's n doors on the 94 00:03:34,790 --> 00:03:35,650 screen here. 95 00:03:35,650 --> 00:03:40,110 So in the worst case, you would have to look behind 7 doors, or n doors. 96 00:03:40,110 --> 00:03:44,140 And so this really is, it's a bit of luck today, but it's really a linear 97 00:03:44,140 --> 00:03:46,440 algorithm of sorts, even though you were kind of skipping around. 98 00:03:46,440 --> 00:03:47,080 Is that fair? 99 00:03:47,080 --> 00:03:47,500 >> JENNIFER: Yeah. 100 00:03:47,500 --> 00:03:50,000 >> DAVID J. MALAN: Well, let me see if your strategy changes if I move us to 101 00:03:50,000 --> 00:03:52,190 our second example here with 7 different doors. 102 00:03:52,190 --> 00:03:55,240 Same numbers, but this time they are sorted. 103 00:03:55,240 --> 00:03:58,350 What's your strategy here going to be, trying to put out of your mind what 104 00:03:58,350 --> 00:03:59,310 the other numbers were-- 105 00:03:59,310 --> 00:03:59,930 >> JENNIFER: OK. 106 00:03:59,930 --> 00:04:02,290 >> DAVID J. MALAN: --earlier? 107 00:04:02,290 --> 00:04:03,180 >> JENNIFER: Let's start with the first one. 108 00:04:03,180 --> 00:04:03,540 >> DAVID J. MALAN: All right. 109 00:04:03,540 --> 00:04:05,190 Start with the first one. 110 00:04:05,190 --> 00:04:05,960 4. 111 00:04:05,960 --> 00:04:08,810 Now where you going to go, and why? 112 00:04:08,810 --> 00:04:10,040 >> JENNIFER: 4 is really small. 113 00:04:10,040 --> 00:04:12,500 So if they're sort maybe smallest to largest, it should 114 00:04:12,500 --> 00:04:13,290 be twice that, and--. 115 00:04:13,290 --> 00:04:13,670 >> DAVID J. MALAN: OK. 116 00:04:13,670 --> 00:04:15,990 Let's see, which you think? 117 00:04:15,990 --> 00:04:19,050 >> JENNIFER: Try the last one. 118 00:04:19,050 --> 00:04:19,500 Nice. 119 00:04:19,500 --> 00:04:20,880 >> DAVID J. MALAN: Very nicely done. 120 00:04:20,880 --> 00:04:21,860 All right. 121 00:04:21,860 --> 00:04:23,010 >> [APPLAUSE] 122 00:04:23,010 --> 00:04:24,310 >> DAVID J. MALAN: OK. 123 00:04:24,310 --> 00:04:26,790 So you're actually doing this horribly, because you're 124 00:04:26,790 --> 00:04:27,700 doing it very well. 125 00:04:27,700 --> 00:04:31,150 Which leaves us unable to make certain points. 126 00:04:31,150 --> 00:04:32,565 So let's try to roll back here. 127 00:04:32,565 --> 00:04:34,560 >> JENNIFER: OK. 128 00:04:34,560 --> 00:04:35,980 >> DAVID J. MALAN: Very well done, nonetheless. 129 00:04:35,980 --> 00:04:39,060 So you started at the beginning, you saw that it was 4, then you 130 00:04:39,060 --> 00:04:40,240 moved to the end. 131 00:04:40,240 --> 00:04:42,320 But suppose you didn't get lucky there, and suppose 50 132 00:04:42,320 --> 00:04:42,890 was somewhere else. 133 00:04:42,890 --> 00:04:46,190 What your third step have been? 134 00:04:46,190 --> 00:04:47,680 >> JENNIFER: Go back to the beginning. 135 00:04:47,680 --> 00:04:48,320 >> DAVID J. MALAN: Go back to the beginning. 136 00:04:48,320 --> 00:04:51,320 OK, so you would've touched this door, which was 8. 137 00:04:51,320 --> 00:04:51,660 All right. 138 00:04:51,660 --> 00:04:52,650 So that's not 50. 139 00:04:52,650 --> 00:04:55,380 Where would you have looked next? 140 00:04:55,380 --> 00:04:56,720 >> JENNIFER: If I didn't know they sorted. 141 00:04:56,720 --> 00:04:57,005 >> DAVID J. MALAN: Correct. 142 00:04:57,005 --> 00:04:58,490 Well, if you did know they were sorted-- 143 00:04:58,490 --> 00:04:58,700 >> JENNIFER: Oh, did know, yeah. 144 00:04:58,700 --> 00:05:00,910 >> DAVID J. MALAN: --but you didn't know where 50 was yet? 145 00:05:00,910 --> 00:05:01,785 >> JENNIFER: Just keep going. 146 00:05:01,785 --> 00:05:02,130 >> DAVID J. MALAN: All right. 147 00:05:02,130 --> 00:05:02,520 OK. 148 00:05:02,520 --> 00:05:03,800 Keep going. 149 00:05:03,800 --> 00:05:05,270 OK, that I can work with. 150 00:05:05,270 --> 00:05:05,610 >> JENNIFER: OK. 151 00:05:05,610 --> 00:05:07,210 >> DAVID J. MALAN: Now, if you're just going to keep going, what's your 152 00:05:07,210 --> 00:05:09,680 algorithm devolving backed into. 153 00:05:09,680 --> 00:05:10,740 >> JENNIFER: The linear--. 154 00:05:10,740 --> 00:05:11,820 >> DAVID J. MALAN: It is kind of linear. 155 00:05:11,820 --> 00:05:13,480 But let me propose, let me put on the spot. 156 00:05:13,480 --> 00:05:14,900 Let me refresh the page. 157 00:05:14,900 --> 00:05:17,120 same number, same arrangement, same doors. 158 00:05:17,120 --> 00:05:21,350 But think back to that first day in class when we tore a phone book in 159 00:05:21,350 --> 00:05:25,480 half, sort of, and what was our strategy there? 160 00:05:25,480 --> 00:05:26,450 >> JENNIFER: Start at the middle. 161 00:05:26,450 --> 00:05:26,690 >> DAVID J. MALAN: OK. 162 00:05:26,690 --> 00:05:27,610 So start at the middle. 163 00:05:27,610 --> 00:05:28,790 So let's go ahead and simulate that. 164 00:05:28,790 --> 00:05:30,720 Start at the middle by revealing that door. 165 00:05:30,720 --> 00:05:31,660 So the number 16. 166 00:05:31,660 --> 00:05:35,290 So what would the strong guy have done, who tore the phone book in half, 167 00:05:35,290 --> 00:05:38,450 to get to the next guess? 168 00:05:38,450 --> 00:05:39,400 >> JENNIFER: Go in this half. 169 00:05:39,400 --> 00:05:41,700 >> DAVID J. MALAN: And why to the right? 170 00:05:41,700 --> 00:05:43,900 >> JENNIFER: If they were sort of smallest to largest, then 50 should be 171 00:05:43,900 --> 00:05:44,720 at that end. 172 00:05:44,720 --> 00:05:44,920 >> DAVID J. MALAN: Good. 173 00:05:44,920 --> 00:05:45,390 Totally reasonable. 174 00:05:45,390 --> 00:05:48,380 So like a phone book, you go to the right as opposed to the left, but here 175 00:05:48,380 --> 00:05:49,500 is the key takeaway. 176 00:05:49,500 --> 00:05:53,930 You now can throw away, or tear off, half of this problem, leaving you not 177 00:05:53,930 --> 00:05:55,970 with 7 doors, but really with just 3. 178 00:05:55,970 --> 00:05:57,870 Which is roughly half of the size of the problem. 179 00:05:57,870 --> 00:05:58,350 All right. 180 00:05:58,350 --> 00:06:01,890 So now what you would have done after you go right? 181 00:06:01,890 --> 00:06:05,870 >> JENNIFER: So 16 is still pretty small, relative to 50, so maybe I'll try, 182 00:06:05,870 --> 00:06:06,700 like, this one. 183 00:06:06,700 --> 00:06:07,890 >> DAVID J. MALAN: All right. 184 00:06:07,890 --> 00:06:08,720 42. 185 00:06:08,720 --> 00:06:10,830 All right, so now what's your instinct telling you? 186 00:06:10,830 --> 00:06:12,100 >> JENNIFER: I can throw away this and then just-- 187 00:06:12,100 --> 00:06:12,360 >> DAVID J. MALAN: OK. 188 00:06:12,360 --> 00:06:14,212 Good, you can throw away the left half there. 189 00:06:14,212 --> 00:06:14,890 >> JENNIFER: --pick this one. 190 00:06:14,890 --> 00:06:15,530 >> DAVID J. MALAN: And the right. 191 00:06:15,530 --> 00:06:15,760 >> JENNIFER: Yeah. 192 00:06:15,760 --> 00:06:17,820 >> DAVID J. MALAN: So even though it's hard to see perhaps, when there's only 193 00:06:17,820 --> 00:06:21,320 7 doors, think about, now, the consistency of the 194 00:06:21,320 --> 00:06:22,620 algorithm you just applied. 195 00:06:22,620 --> 00:06:24,510 In the previous case, you did get lucky, which was great. 196 00:06:24,510 --> 00:06:26,540 But you did use a heuristic, I would say. 197 00:06:26,540 --> 00:06:29,150 You used sort of your instincts, and knowing it sorted, if it's pretty 198 00:06:29,150 --> 00:06:31,600 small at the beginning, obviously, we've got to go more to the right. 199 00:06:31,600 --> 00:06:34,990 But in some sense, you got lucky, because maybe this was the number 100, 200 00:06:34,990 --> 00:06:36,220 and maybe 50 was more in the middle. 201 00:06:36,220 --> 00:06:37,910 Maybe 50 was even over here. 202 00:06:37,910 --> 00:06:40,960 >> But what you did a little differently this time was, you did the same thing 203 00:06:40,960 --> 00:06:42,150 again and again. 204 00:06:42,150 --> 00:06:45,310 And I would argue that what you just did, albeit influenced by the phone 205 00:06:45,310 --> 00:06:48,100 book example, is something much more algorithmic, and much 206 00:06:48,100 --> 00:06:49,930 less special cased. 207 00:06:49,930 --> 00:06:51,620 Much less instinctive. 208 00:06:51,620 --> 00:06:57,160 So in the end of the day, how would you describe the efficiency of the 209 00:06:57,160 --> 00:07:00,530 first algorithm, where you went left to right, versus the 210 00:07:00,530 --> 00:07:03,430 second algorithm here? 211 00:07:03,430 --> 00:07:06,460 >> JENNIFER: This one should, like, maybe halve the time, or even more, yeah. 212 00:07:06,460 --> 00:07:07,320 >> DAVID J. MALAN: OK, maybe even more. 213 00:07:07,320 --> 00:07:10,150 Let's push a little harder on that. 214 00:07:10,150 --> 00:07:13,030 What really, if we continue this logic, we definitely halved the 215 00:07:13,030 --> 00:07:15,830 running time with this second algorithm by throwing away half of the 216 00:07:15,830 --> 00:07:18,470 numbers, but what did we do on the next iteration, when Jennifer revealed 217 00:07:18,470 --> 00:07:20,615 the second number? 218 00:07:20,615 --> 00:07:22,830 >> We halved the numbers of doors again. 219 00:07:22,830 --> 00:07:25,270 And then what did we do after that, if there were more doors to play with? 220 00:07:25,270 --> 00:07:27,520 We would halve them, and again, and again, and again. 221 00:07:27,520 --> 00:07:30,420 And this was just like you guys all standing up at the first week of 222 00:07:30,420 --> 00:07:33,000 class, half of you sitting down, half of you sitting down, half of you 223 00:07:33,000 --> 00:07:35,440 sitting down, until one lone soul was standing. 224 00:07:35,440 --> 00:07:39,050 And we said that the running time of that, the number of steps it took was 225 00:07:39,050 --> 00:07:40,430 on the order of what? 226 00:07:40,430 --> 00:07:41,230 >> SPEAKER 1: [INAUDIBLE] 227 00:07:41,230 --> 00:07:43,970 >> DAVID J. MALAN: So log base 2 of n, or just more simply, log of n. 228 00:07:43,970 --> 00:07:45,060 So something logarithmic. 229 00:07:45,060 --> 00:07:48,380 And the graph wasn't a straight line that just got worse and worse, it was 230 00:07:48,380 --> 00:07:52,490 this interesting curve that didn't get so bad over time. 231 00:07:52,490 --> 00:07:53,910 So let's hold on to this idea. 232 00:07:53,910 --> 00:07:54,690 Let's thank Jennifer. 233 00:07:54,690 --> 00:07:56,150 Thanks so much for coming on up. 234 00:07:56,150 --> 00:07:57,400 And, one sec. 235 00:07:57,400 --> 00:08:00,170 236 00:08:00,170 --> 00:08:02,925 No desk lamps today, but we do have CS50 stress balls. 237 00:08:02,925 --> 00:08:03,420 >> JENNIFER: Yay. 238 00:08:03,420 --> 00:08:04,410 >> DAVID J. MALAN: All right, here. 239 00:08:04,410 --> 00:08:06,545 Thank you for incurring the stress up here. 240 00:08:06,545 --> 00:08:07,350 All right. 241 00:08:07,350 --> 00:08:10,620 So let's see if we can't now formalize this a bit more. 242 00:08:10,620 --> 00:08:14,820 So again, what we just did was essentially the same thing as we did 243 00:08:14,820 --> 00:08:16,660 in that first week. 244 00:08:16,660 --> 00:08:23,780 But rather than end with just a linear algorithm, which we depicted 245 00:08:23,780 --> 00:08:27,210 previously as this straight line, whereby, if we put one more door on 246 00:08:27,210 --> 00:08:29,610 the screen, then Jennifer would have had to look, potentially, 247 00:08:29,610 --> 00:08:30,600 behind one more door. 248 00:08:30,600 --> 00:08:33,490 If we put two more doors, she might have to look behind two more doors. 249 00:08:33,490 --> 00:08:35,990 >> And so, there was this linear relationship between the size of the 250 00:08:35,990 --> 00:08:39,059 problem on, say, the x-axis, and the amount of time it takes to 251 00:08:39,059 --> 00:08:40,440 solve on the y. 252 00:08:40,440 --> 00:08:43,330 But the picture I was alluding to earlier was this green line. 253 00:08:43,330 --> 00:08:45,970 Green deliberately, because it just felt better. 254 00:08:45,970 --> 00:08:49,790 In theory, the algorithm, when we did it with the phone book, when we did it 255 00:08:49,790 --> 00:08:52,420 with you guys counting each other, and in the second case, when Jennifer just 256 00:08:52,420 --> 00:08:55,250 did it up here, it was sort of fundamentally better. 257 00:08:55,250 --> 00:08:57,180 Because it wasn't just twice as fast. 258 00:08:57,180 --> 00:08:58,870 It wasn't even four times as fast. 259 00:08:58,870 --> 00:09:03,290 It was entirely dependent on what the size of the input was, as to how many 260 00:09:03,290 --> 00:09:05,220 steps it ultimately took. 261 00:09:05,220 --> 00:09:08,040 >> And so this simple idea that we all took for granted with the phone book, 262 00:09:08,040 --> 00:09:10,200 can similarly be applied to something like this. 263 00:09:10,200 --> 00:09:12,380 And this might be more casually known as, as you might 264 00:09:12,380 --> 00:09:13,940 imagine, divide and conquer. 265 00:09:13,940 --> 00:09:16,390 Not unlike what we did, of course, with the phone book. 266 00:09:16,390 --> 00:09:18,300 >> But the pseudocode, recall, was this. 267 00:09:18,300 --> 00:09:21,800 So we won't do this again, but recall that first week, all of us stood up 268 00:09:21,800 --> 00:09:25,140 and then half of you sat down, half of you sat down, half of you sat down. 269 00:09:25,140 --> 00:09:29,280 That algorithm was implemented in a bit of a cheating way, in that, it 270 00:09:29,280 --> 00:09:32,870 wasn't just one of me counting, fundamentally, more efficiently. 271 00:09:32,870 --> 00:09:35,830 In that case, I was leveraging a secondary resource. 272 00:09:35,830 --> 00:09:39,470 Sort of, multiple CPUs, multiple brains, multiple smart people in the 273 00:09:39,470 --> 00:09:42,740 room were helping me get from something linear to something 274 00:09:42,740 --> 00:09:45,190 logarithmic, from something red to something green. 275 00:09:45,190 --> 00:09:48,650 >> But in this case, Jennifer alone can fundamentally improve upon the 276 00:09:48,650 --> 00:09:52,370 performance of her first algorithm by, again, just thinking a little harder. 277 00:09:52,370 --> 00:09:56,650 And now, when it comes time to implement these things, figuring out 278 00:09:56,650 --> 00:10:00,670 what lines of code you can write such that you can repeat them again, and 279 00:10:00,670 --> 00:10:03,350 again, and again, sort of in a looping fashion. 280 00:10:03,350 --> 00:10:06,370 Because you're not going to have the luxury, like Jennifer did at first, to 281 00:10:06,370 --> 00:10:10,460 just have a whole bunch of ifs and say, hmm, if this first number is 4, 282 00:10:10,460 --> 00:10:11,800 let me jump all the way to the end. 283 00:10:11,800 --> 00:10:14,180 Ooh,if that number's too big, let me move arbitrarily back 284 00:10:14,180 --> 00:10:15,220 to the second element. 285 00:10:15,220 --> 00:10:18,210 You'll find that it's going to be a lot harder to formalize what we humans 286 00:10:18,210 --> 00:10:21,270 take for granted as very reasonable heuristics, but a computer is only 287 00:10:21,270 --> 00:10:23,260 going to do what you tell it to do. 288 00:10:23,260 --> 00:10:25,280 >> Now this has very interesting implications. 289 00:10:25,280 --> 00:10:29,950 This graph is sort of meant to sort of overwhelm visually, but notice, where 290 00:10:29,950 --> 00:10:32,230 is the straight line in this graph? 291 00:10:32,230 --> 00:10:35,330 Where is the linear graph that we call n? 292 00:10:35,330 --> 00:10:37,580 Well, it's sort of toward the bottom of this picture, right? 293 00:10:37,580 --> 00:10:40,500 So all we've done is we've sort of zoomed out to the x-axis and the 294 00:10:40,500 --> 00:10:44,780 y-axis to try to get a sense of what other types of curves look like. 295 00:10:44,780 --> 00:10:47,760 >> And the specifics of the mathematical expressions today won't matter so 296 00:10:47,760 --> 00:10:52,440 much, but notice that there's a lot of algorithms that are far worse than 297 00:10:52,440 --> 00:10:53,470 something that's linear. 298 00:10:53,470 --> 00:10:55,410 Indeed, n cubed looks pretty bad. 299 00:10:55,410 --> 00:10:58,400 2 to the n looks pretty bad. n squared looks pretty bad. 300 00:10:58,400 --> 00:11:01,630 And we'll see what some of those might be in reality today. 301 00:11:01,630 --> 00:11:05,430 And log n doesn't feel as bad, but better than n is log base 2 of n. 302 00:11:05,430 --> 00:11:08,080 But you know, it would have been even more amazing if Jennifer, or if we, 303 00:11:08,080 --> 00:11:12,910 that first week, had come up with something that's log of log of n. 304 00:11:12,910 --> 00:11:15,880 >> So in other words, there's this whole range of possible solutions to 305 00:11:15,880 --> 00:11:18,570 problems, but even here, notice what's going to happen. 306 00:11:18,570 --> 00:11:22,910 When I zoom out, which of these curves is going to prove to be the absolute 307 00:11:22,910 --> 00:11:26,630 worst of the ones on the screen now? 308 00:11:26,630 --> 00:11:28,680 So n cubed looks pretty bad at the moment. 309 00:11:28,680 --> 00:11:32,470 But if we zoom out and see more of the x and the y-axis, who's going to 310 00:11:32,470 --> 00:11:34,550 dominate ultimately? 311 00:11:34,550 --> 00:11:37,120 So it actually turns out that 2 to the n, and you can figure this out just by 312 00:11:37,120 --> 00:11:39,990 plugging in some increasingly large numbers, and you'll see that 2 to the 313 00:11:39,990 --> 00:11:42,070 n, indeed, gets bigger much faster. 314 00:11:42,070 --> 00:11:45,530 If we really zoom out, a 2 to the n algorithm absolutely sucks. 315 00:11:45,530 --> 00:11:48,170 I mean this is going to take quite a bit of time for the 316 00:11:48,170 --> 00:11:49,460 computer to churn through. 317 00:11:49,460 --> 00:11:52,500 >> But you'll see over time, especially with future problem sets and even 318 00:11:52,500 --> 00:11:55,600 final projects, is your data set gets large, all right? 319 00:11:55,600 --> 00:11:58,300 Even in the first version of Facebook, as the number of friends, and the 320 00:11:58,300 --> 00:12:01,840 number of registered users got large, you can sort of phone it in and 321 00:12:01,840 --> 00:12:05,530 implement something with linear search, or a very simple sorting 322 00:12:05,530 --> 00:12:07,030 algorithm, as we'll see today. 323 00:12:07,030 --> 00:12:09,280 You have to start thinking harder and harder about these problems. 324 00:12:09,280 --> 00:12:12,070 And the types of problems places like Facebook, and Google, and Microsoft, 325 00:12:12,070 --> 00:12:16,350 and others work on is exactly these sort of big data sort of questions 326 00:12:16,350 --> 00:12:18,530 increasingly these days. 327 00:12:18,530 --> 00:12:18,900 >> All right. 328 00:12:18,900 --> 00:12:23,800 So Jennifer's success in that second algorithm, frankly, she did amazingly 329 00:12:23,800 --> 00:12:26,110 well the first time, but let's write it as luck so that we 330 00:12:26,110 --> 00:12:27,000 can make this point. 331 00:12:27,000 --> 00:12:30,970 In the second case, she leveraged an algorithm that repeated again and 332 00:12:30,970 --> 00:12:34,670 again, but she took for granted a certain assumption that we allowed 333 00:12:34,670 --> 00:12:39,370 her, but she exploited some detail the second time that she didn't have the 334 00:12:39,370 --> 00:12:39,840 first time. 335 00:12:39,840 --> 00:12:41,800 Which was what? 336 00:12:41,800 --> 00:12:43,050 >> That the list was sorted. 337 00:12:43,050 --> 00:12:46,350 So as soon as the list was sorted, we claim that Jennifer was able to do 338 00:12:46,350 --> 00:12:47,480 fundamentally better. 339 00:12:47,480 --> 00:12:51,450 7 doors, yes, is not that interesting, but suppose it we're 7 million doors. 340 00:12:51,450 --> 00:12:54,080 Log of n is definitely going to perform much, much 341 00:12:54,080 --> 00:12:55,610 faster in the long run. 342 00:12:55,610 --> 00:12:58,880 But she had to have the doors sorted for her. 343 00:12:58,880 --> 00:13:02,320 Now, I took the liberty of doing that in advance on the computer screen 344 00:13:02,320 --> 00:13:05,160 here, but suppose that Jennifer had to do that herself? 345 00:13:05,160 --> 00:13:10,120 Suppose that the doors in question represented data in a database, or 346 00:13:10,120 --> 00:13:14,260 friends registered for Facebook, or any web pages on the internet that 347 00:13:14,260 --> 00:13:16,880 various websites might need to index or search over. 348 00:13:16,880 --> 00:13:20,940 >> Suppose that you just had a raw data set and it was left to you, or to 349 00:13:20,940 --> 00:13:23,010 Jennifer to do that sorting? 350 00:13:23,010 --> 00:13:26,950 That, rather, requires that we answer the question, well, how much time 351 00:13:26,950 --> 00:13:31,080 would have taken Jennifer, or even me, to sort those numbers in advance so 352 00:13:31,080 --> 00:13:32,680 that she could take advantage of that? 353 00:13:32,680 --> 00:13:32,880 Right? 354 00:13:32,880 --> 00:13:36,620 Because the implication, of course, is if it takes me quite a while to sort 355 00:13:36,620 --> 00:13:40,800 the numbers, who the heck cares that you can find a number like 50 so fast, 356 00:13:40,800 --> 00:13:44,850 as in Jennifer's case, if we more than overwhelmed the amount of total time 357 00:13:44,850 --> 00:13:46,920 it took by sorting things in advance? 358 00:13:46,920 --> 00:13:49,320 >> So let's see if we can't the paint the picture here. 359 00:13:49,320 --> 00:13:51,370 I have a whole bunch more stress balls, if that helps 360 00:13:51,370 --> 00:13:52,270 break the ice here. 361 00:13:52,270 --> 00:13:55,690 And if you wouldn't mind, we need seven volunteer-- 362 00:13:55,690 --> 00:13:57,060 on, OK. 363 00:13:57,060 --> 00:13:57,240 Wow. 364 00:13:57,240 --> 00:13:59,250 So we don't have to spend on desk lamps, it seems. 365 00:13:59,250 --> 00:13:59,690 All right. 366 00:13:59,690 --> 00:14:01,530 So how about you two in front. 367 00:14:01,530 --> 00:14:04,160 How about you two guys in back. 368 00:14:04,160 --> 00:14:04,870 So that's four. 369 00:14:04,870 --> 00:14:09,890 How about you in front five, six and seven. 370 00:14:09,890 --> 00:14:10,320 Right there. 371 00:14:10,320 --> 00:14:13,260 Your friend's pointing you out, so you get the prize. 372 00:14:13,260 --> 00:14:13,700 >> All right. 373 00:14:13,700 --> 00:14:14,410 Come on up. 374 00:14:14,410 --> 00:14:17,120 And why don't we have you guys come on over here. 375 00:14:17,120 --> 00:14:18,960 I'm going to give you each a number. 376 00:14:18,960 --> 00:14:22,150 And go ahead and arrange yourselves identically to what's 377 00:14:22,150 --> 00:14:25,180 depicted on the screen. 378 00:14:25,180 --> 00:14:26,530 >> [INTERPOSING VOICES] 379 00:14:26,530 --> 00:14:28,160 >> DAVID J. MALAN: Oop, sorry. 380 00:14:28,160 --> 00:14:30,210 Bug. 381 00:14:30,210 --> 00:14:32,180 All right. 382 00:14:32,180 --> 00:14:32,750 Well, here we go. 383 00:14:32,750 --> 00:14:34,180 Number five. 384 00:14:34,180 --> 00:14:35,136 Number six. 385 00:14:35,136 --> 00:14:37,770 One, two, three, four, five, six, seven. 386 00:14:37,770 --> 00:14:39,410 Oh, this is awkward. 387 00:14:39,410 --> 00:14:41,210 >> SPEAKER 2: I'll just get a--. 388 00:14:41,210 --> 00:14:41,900 >> DAVID J. MALAN: Good deal. 389 00:14:41,900 --> 00:14:43,130 All right. 390 00:14:43,130 --> 00:14:44,611 Thank you for participating. 391 00:14:44,611 --> 00:14:47,200 >> [APPLAUSE] 392 00:14:47,200 --> 00:14:48,580 >> OK. 393 00:14:48,580 --> 00:14:48,860 All right. 394 00:14:48,860 --> 00:14:51,970 So we have four, two, six, one, three, seven, five. 395 00:14:51,970 --> 00:14:56,010 Perfect so we have seven volunteers here who are equal in width to the 396 00:14:56,010 --> 00:14:57,430 array that we're playing with the earlier. 397 00:14:57,430 --> 00:14:59,470 And I chose seven for reasons that will be just 398 00:14:59,470 --> 00:15:00,840 convenient in a little bit. 399 00:15:00,840 --> 00:15:04,400 And I'm going to propose first that we sort these seven volunteers. 400 00:15:04,400 --> 00:15:06,786 If you'd like, first, to say hello though. 401 00:15:06,786 --> 00:15:08,970 Since this is going to be an awkward several minutes. 402 00:15:08,970 --> 00:15:10,370 Introduce yourselves. 403 00:15:10,370 --> 00:15:10,980 >> GRACE: Hi, I'm Grace. 404 00:15:10,980 --> 00:15:14,190 I'm a sophomore in Leverett House. 405 00:15:14,190 --> 00:15:14,620 >> BRANSON: Hi. 406 00:15:14,620 --> 00:15:15,620 I'm Branson. 407 00:15:15,620 --> 00:15:16,920 I'm a freshman in Weld. 408 00:15:16,920 --> 00:15:19,755 409 00:15:19,755 --> 00:15:20,230 >> GABE: Hi. 410 00:15:20,230 --> 00:15:21,040 I'm Gabe. 411 00:15:21,040 --> 00:15:22,300 I'm a junior in Cabot. 412 00:15:22,300 --> 00:15:24,826 413 00:15:24,826 --> 00:15:25,980 >> NEIL: I'm Neil. 414 00:15:25,980 --> 00:15:29,090 I'm a freshman in Matthews. 415 00:15:29,090 --> 00:15:29,550 >> JASON: I'm Jason. 416 00:15:29,550 --> 00:15:32,816 I'm a freshman in Greenough. 417 00:15:32,816 --> 00:15:33,700 >> MIKE : I'm Mike. 418 00:15:33,700 --> 00:15:37,360 I'm a freshman in Grays. 419 00:15:37,360 --> 00:15:37,990 >> JESS: I'm Jess. 420 00:15:37,990 --> 00:15:40,313 I'm a sophomore in Leverett. 421 00:15:40,313 --> 00:15:41,300 >> DAVID J. MALAN: Excellent. 422 00:15:41,300 --> 00:15:41,850 All right. 423 00:15:41,850 --> 00:15:44,190 Well, thank you to all of our volunteers here thus far. 424 00:15:44,190 --> 00:15:47,110 And the challenge at hand now is going to be to sort of these guys, but then 425 00:15:47,110 --> 00:15:50,250 we're going to have to think a little hard about how efficiently we actually 426 00:15:50,250 --> 00:15:51,110 sorted them. 427 00:15:51,110 --> 00:15:52,580 So let's first try this. 428 00:15:52,580 --> 00:15:55,970 You guys can see each other's numbers just by placing around the corners. 429 00:15:55,970 --> 00:15:59,380 Go ahead and take a few seconds, and sort yourselves from smallest on the 430 00:15:59,380 --> 00:16:01,240 left to largest on the right. 431 00:16:01,240 --> 00:16:02,490 Go. 432 00:16:02,490 --> 00:16:07,010 433 00:16:07,010 --> 00:16:07,530 >> OK. 434 00:16:07,530 --> 00:16:08,030 Good. 435 00:16:08,030 --> 00:16:09,370 That was really darn fast. 436 00:16:09,370 --> 00:16:14,040 Now someone here, what was the algorithm that these guys applied? 437 00:16:14,040 --> 00:16:14,900 >> SPEAKER 1: Least to greatest. 438 00:16:14,900 --> 00:16:15,000 >> DAVID J. MALAN: OK. 439 00:16:15,000 --> 00:16:18,070 Least to greatest is really sort of the objective, but I'm not sure that's 440 00:16:18,070 --> 00:16:18,890 really an algorithm. 441 00:16:18,890 --> 00:16:21,810 Least to greatest doesn't tell me step-by-step what to do. 442 00:16:21,810 --> 00:16:22,833 Yeah? 443 00:16:22,833 --> 00:16:24,083 >> SPEAKER 1: [INAUDIBLE] 444 00:16:24,083 --> 00:16:26,010 445 00:16:26,010 --> 00:16:26,280 >> DAVID J. MALAN: OK. 446 00:16:26,280 --> 00:16:28,920 So if you see a person smaller than your number, then move to 447 00:16:28,920 --> 00:16:29,680 the right of them. 448 00:16:29,680 --> 00:16:32,800 So that's now getting more expressive, more like an algorithm, because you 449 00:16:32,800 --> 00:16:35,410 can say, if this, then that. 450 00:16:35,410 --> 00:16:37,050 So we have some kind of conditional construct. 451 00:16:37,050 --> 00:16:39,700 And these guys seemed to do that a few times, because some of you moved a bit 452 00:16:39,700 --> 00:16:40,420 of a distance. 453 00:16:40,420 --> 00:16:43,410 So there was presumably some kind of looping going on in their minds. 454 00:16:43,410 --> 00:16:44,610 >> But let's try to formalize that. 455 00:16:44,610 --> 00:16:47,540 If you guys could reset back to this arrangement. 456 00:16:47,540 --> 00:16:50,650 Let's see if we can't formalize this a bit, and then ask the question, just 457 00:16:50,650 --> 00:16:51,580 how efficient is this? 458 00:16:51,580 --> 00:16:54,220 Of course, when we do this more slowly, it's going to feel as good of 459 00:16:54,220 --> 00:16:57,210 an algorithm, but let's see if we can put our fingers on the precise steps. 460 00:16:57,210 --> 00:16:58,670 >> So you two guys are four and two. 461 00:16:58,670 --> 00:17:01,020 Or you correct or incorrect order? 462 00:17:01,020 --> 00:17:01,900 Obviously incorrect. 463 00:17:01,900 --> 00:17:02,710 So we swapped. 464 00:17:02,710 --> 00:17:05,170 Now I'm going to move aside here and say, four to six. 465 00:17:05,170 --> 00:17:06,240 Are you correct or incorrect? 466 00:17:06,240 --> 00:17:06,599 >> GABE: Correct. 467 00:17:06,599 --> 00:17:07,180 >> DAVID J. MALAN: Correct. 468 00:17:07,180 --> 00:17:08,300 Six and one? 469 00:17:08,300 --> 00:17:08,609 Nope. 470 00:17:08,609 --> 00:17:09,630 Swap. 471 00:17:09,630 --> 00:17:10,490 So that's two swaps. 472 00:17:10,490 --> 00:17:11,710 Six and three? 473 00:17:11,710 --> 00:17:11,980 Nope. 474 00:17:11,980 --> 00:17:13,000 Swap. 475 00:17:13,000 --> 00:17:13,930 Six and seven? 476 00:17:13,930 --> 00:17:14,630 Looks good. 477 00:17:14,630 --> 00:17:15,396 Seven and five? 478 00:17:15,396 --> 00:17:16,150 >> JESS: [INAUDIBLE] 479 00:17:16,150 --> 00:17:17,089 >> DAVID J. MALAN: OK, swap. 480 00:17:17,089 --> 00:17:19,770 And sorted. 481 00:17:19,770 --> 00:17:19,980 All right. 482 00:17:19,980 --> 00:17:21,440 So obviously not, right? 483 00:17:21,440 --> 00:17:22,470 So there was more going on. 484 00:17:22,470 --> 00:17:24,920 But, indeed, these guys, even just instinctively. 485 00:17:24,920 --> 00:17:25,450 kept moving. 486 00:17:25,450 --> 00:17:27,710 They didn't just stop, once they corrected one problem. 487 00:17:27,710 --> 00:17:27,839 So. 488 00:17:27,839 --> 00:17:29,390 Indeed, I'm going to have to do the same thing. 489 00:17:29,390 --> 00:17:32,720 I'm going to have to sort of rewind back to the beginning of this problem, 490 00:17:32,720 --> 00:17:35,630 or the beginning of this array of people, let's start calling them. 491 00:17:35,630 --> 00:17:38,366 >> And now what should my algorithm on the second pass be? 492 00:17:38,366 --> 00:17:39,220 >> SPEAKER 1: Same thing. 493 00:17:39,220 --> 00:17:39,940 >> DAVID J. MALAN: Same thing. 494 00:17:39,940 --> 00:17:41,460 And this, I'm starting to like, right? 495 00:17:41,460 --> 00:17:44,720 As soon as you can find yourself doing the same thing again and again, that's 496 00:17:44,720 --> 00:17:47,890 becoming more like an algorithm, and less human instinct. 497 00:17:47,890 --> 00:17:48,680 >> So now, here we go again. 498 00:17:48,680 --> 00:17:49,870 Two and four? 499 00:17:49,870 --> 00:17:50,220 No. 500 00:17:50,220 --> 00:17:51,050 Four and one? 501 00:17:51,050 --> 00:17:53,380 Ah, there was indeed some work still to be done. 502 00:17:53,380 --> 00:17:53,620 For and three? 503 00:17:53,620 --> 00:17:54,572 Good. 504 00:17:54,572 --> 00:17:56,000 Four and six? 505 00:17:56,000 --> 00:17:58,380 Six and five? 506 00:17:58,380 --> 00:17:59,470 Six and seven? 507 00:17:59,470 --> 00:18:00,970 OK, now, done. 508 00:18:00,970 --> 00:18:01,550 OK, no. 509 00:18:01,550 --> 00:18:02,710 I have to go back. 510 00:18:02,710 --> 00:18:05,130 >> So now, again, we're doing this a little more deliberately. 511 00:18:05,130 --> 00:18:08,700 And now, there's just one brain executing this algorithm. 512 00:18:08,700 --> 00:18:10,290 One CPU, if you will. 513 00:18:10,290 --> 00:18:13,090 And frankly, that's the only resource we're going to have access to. 514 00:18:13,090 --> 00:18:16,280 And once we do go back to a keyboard and have something like C at our 515 00:18:16,280 --> 00:18:19,600 disposal, we're only writing a program that can do one thing at a time. 516 00:18:19,600 --> 00:18:22,900 Whereas, these guys a moment ago, we leveraged their collective brainpower 517 00:18:22,900 --> 00:18:24,180 like you guys did in week zero. 518 00:18:24,180 --> 00:18:24,980 So let's keep doing this. 519 00:18:24,980 --> 00:18:26,260 >> Two and one. 520 00:18:26,260 --> 00:18:26,945 Two and three. 521 00:18:26,945 --> 00:18:27,460 Three and four. 522 00:18:27,460 --> 00:18:28,310 Four and five. 523 00:18:28,310 --> 00:18:28,620 Five and six. 524 00:18:28,620 --> 00:18:30,510 Six and seven. 525 00:18:30,510 --> 00:18:31,880 Done? 526 00:18:31,880 --> 00:18:34,560 So I am, but let me play devil's advocate. 527 00:18:34,560 --> 00:18:37,950 Do I, the sort of computer who just made a pass through this array of 528 00:18:37,950 --> 00:18:40,225 people, know that I'm done? 529 00:18:40,225 --> 00:18:40,670 >> SPEAKER 1: No. 530 00:18:40,670 --> 00:18:41,050 >> DAVID J. MALAN: So why? 531 00:18:41,050 --> 00:18:46,900 What would I have to do in order to conclude decisively that I am done? 532 00:18:46,900 --> 00:18:48,230 Probably one more pass. 533 00:18:48,230 --> 00:18:48,430 Right? 534 00:18:48,430 --> 00:18:51,760 Because all I know from that previous pass is that I corrected a mistake. 535 00:18:51,760 --> 00:18:53,920 And that means, maybe there's still another mistake 536 00:18:53,920 --> 00:18:54,840 that I need to correct. 537 00:18:54,840 --> 00:18:58,680 So I can only be sure by rewinding, and then checking, one to two, two and 538 00:18:58,680 --> 00:19:00,940 three, three and four, four and five, five and six, six and seven. 539 00:19:00,940 --> 00:19:02,510 OK, now I did no work. 540 00:19:02,510 --> 00:19:05,990 >> I can certainly remember that I did no work with something like a variable, 541 00:19:05,990 --> 00:19:06,975 like an int. 542 00:19:06,975 --> 00:19:12,490 Call it swaps, and if swaps is 0 once I get here, and it started at 0, then 543 00:19:12,490 --> 00:19:15,520 I would just be stupid to keep going back and forth, checking again, and 544 00:19:15,520 --> 00:19:16,450 again, and again, right? 545 00:19:16,450 --> 00:19:18,450 Because you get stuck in some kind of infinite loop. 546 00:19:18,450 --> 00:19:21,250 So as soon as there's 0 swaps, we can claim that this 547 00:19:21,250 --> 00:19:23,810 algorithm is indeed complete. 548 00:19:23,810 --> 00:19:25,400 >> Now, let's put a name on this. 549 00:19:25,400 --> 00:19:28,930 The algorithm that I propose we just implemented is something called bubble 550 00:19:28,930 --> 00:19:32,800 sort, known as such in the sense that the numbers that are bigger kind of 551 00:19:32,800 --> 00:19:37,990 bubble their way up to the top, or up to the end of the array of numbers. 552 00:19:37,990 --> 00:19:40,270 But how efficient was this algorithm? 553 00:19:40,270 --> 00:19:44,600 How many steps did I physically have to take, for instance, to sort these 554 00:19:44,600 --> 00:19:45,850 seven humans? 555 00:19:45,850 --> 00:19:48,560 556 00:19:48,560 --> 00:19:49,550 >> Four to five? 557 00:19:49,550 --> 00:19:51,420 OK, too many is ultimately going to be the answer. 558 00:19:51,420 --> 00:19:54,960 But even then, the specific number isn't so interesting. 559 00:19:54,960 --> 00:19:56,670 Let's generalize it as n. 560 00:19:56,670 --> 00:20:00,520 So if I had n people up here, and they were, sort of, in random order at the 561 00:20:00,520 --> 00:20:02,180 beginning, in that original order. 562 00:20:02,180 --> 00:20:04,910 Well, how many steps did I have to take on the first pass? 563 00:20:04,910 --> 00:20:09,810 It was one, two, three, four, five, six, and they're seven people, so 564 00:20:09,810 --> 00:20:13,670 that's seven, six--, so that's n minus one steps the first time. 565 00:20:13,670 --> 00:20:16,280 >> Now, how many steps did I have to take when I rewound? 566 00:20:16,280 --> 00:20:19,310 Well, we could actually double that if we really wanted to, but for now, I'm 567 00:20:19,310 --> 00:20:22,300 just going to say, all right, another n minus 1. 568 00:20:22,300 --> 00:20:25,240 So the n minus 1 is going to get annoying to keep track of, so let's 569 00:20:25,240 --> 00:20:26,400 just round up slightly. 570 00:20:26,400 --> 00:20:27,770 So 2n steps. 571 00:20:27,770 --> 00:20:29,310 So 14 steps, give or take. 572 00:20:29,310 --> 00:20:31,930 >> How many times did I take a step the next time? 573 00:20:31,930 --> 00:20:33,740 Well, it's 3n. 574 00:20:33,740 --> 00:20:34,510 really. 575 00:20:34,510 --> 00:20:37,920 And now, in the worst case, for instance, how many times would I have 576 00:20:37,920 --> 00:20:41,730 gone back and forth, back and forth, executing this algorithm, swapping 577 00:20:41,730 --> 00:20:44,620 people on each pass, roughly? 578 00:20:44,620 --> 00:20:47,720 579 00:20:47,720 --> 00:20:50,010 It's actually n squared, right? 580 00:20:50,010 --> 00:20:53,000 >> Because in the worst case, you can kind of think about this intuitively, 581 00:20:53,000 --> 00:20:54,800 even though it might take a little bit of time to sink in. 582 00:20:54,800 --> 00:20:57,590 In the worst case, what would these seven people have looked like, in 583 00:20:57,590 --> 00:21:00,230 terms of the arrangement of their numbers? 584 00:21:00,230 --> 00:21:01,460 Completely backwards, right? 585 00:21:01,460 --> 00:21:02,815 And just to simulate that, what was your name again? 586 00:21:02,815 --> 00:21:03,360 >> MIKE : Mike. 587 00:21:03,360 --> 00:21:03,640 >> DAVID J. MALAN: Mike? 588 00:21:03,640 --> 00:21:08,100 OK, Mike, can you just join me over here for just one second? 589 00:21:08,100 --> 00:21:08,880 Actually, no. 590 00:21:08,880 --> 00:21:10,150 Sorry Mike, let's rewind. 591 00:21:10,150 --> 00:21:10,910 What's your name again? 592 00:21:10,910 --> 00:21:11,180 >> NEIL: Neil. 593 00:21:11,180 --> 00:21:11,640 >> DAVID J. MALAN: Neil. 594 00:21:11,640 --> 00:21:13,750 OK, Neil, you come with me, if you don't mind. 595 00:21:13,750 --> 00:21:17,150 So I'm going to propose, just for simplicity, that Neil is now in his 596 00:21:17,150 --> 00:21:18,510 worst possible case. 597 00:21:18,510 --> 00:21:20,720 But recall how I implemented my algorithm. 598 00:21:20,720 --> 00:21:24,530 I'm comparing, comparing, comparing, comparing, comparing, oh. 599 00:21:24,530 --> 00:21:26,640 Now these guys are out of order, so I fix. 600 00:21:26,640 --> 00:21:27,980 So you guys swap. 601 00:21:27,980 --> 00:21:31,630 But consider now, how much farther does Neil have to go? 602 00:21:31,630 --> 00:21:32,690 It's roughly n. 603 00:21:32,690 --> 00:21:33,570 You know, it's not actually n. 604 00:21:33,570 --> 00:21:36,040 It's like, n minus 1, but I'm getting annoyed keeping track of the little 605 00:21:36,040 --> 00:21:37,550 number, so let's just call it n. 606 00:21:37,550 --> 00:21:42,860 >> So if Neil moves one step maximally each time, and to move Neil one step, 607 00:21:42,860 --> 00:21:46,580 I have to make this really tedious pass back and forth, this is roughly 608 00:21:46,580 --> 00:21:52,080 doing this, n steps, a total of n times, because it's going to take me 609 00:21:52,080 --> 00:21:55,820 that many steps to get Neil all the way to where he belongs. 610 00:21:55,820 --> 00:21:58,620 Let alone everyone else if you guys were all mis-ordered as well. 611 00:21:58,620 --> 00:22:01,100 >> So let's call bubble sort n squared. 612 00:22:01,100 --> 00:22:04,860 The running time of this algorithm, the performance of this algorithm, the 613 00:22:04,860 --> 00:22:07,120 efficiency of this algorithm, we shall just describe more 614 00:22:07,120 --> 00:22:08,800 generally as n squared. 615 00:22:08,800 --> 00:22:11,650 Which is nice, because I could do the same example with eight people, nine 616 00:22:11,650 --> 00:22:15,450 people, a million people, and that answer is not going to change. 617 00:22:15,450 --> 00:22:18,870 >> So if you guys wouldn't mind, let's reset you to where you started. 618 00:22:18,870 --> 00:22:22,510 And let's try two other approaches and see if we can't do fundamentally 619 00:22:22,510 --> 00:22:23,820 better than this. 620 00:22:23,820 --> 00:22:27,130 So this time, I'm going to propose a sort of different algorithm. 621 00:22:27,130 --> 00:22:29,950 That was very clever of us last time, and you guys were right to have the 622 00:22:29,950 --> 00:22:32,470 right instincts of just kind of swapping pairwise. 623 00:22:32,470 --> 00:22:36,500 But if I really wanted to approach this simply, and my goal is to move 624 00:22:36,500 --> 00:22:39,800 all of the little numbers this way, and push all of the big numbers that 625 00:22:39,800 --> 00:22:43,030 way, why don't I just do that in the most naive way possible and see if I 626 00:22:43,030 --> 00:22:45,730 can do better than what was a fairly complex algorithm? 627 00:22:45,730 --> 00:22:46,620 >> So let's see. 628 00:22:46,620 --> 00:22:48,940 Four is a pretty small number, so I'm going to leave you there moment. 629 00:22:48,940 --> 00:22:50,610 Ooh, number two is even better. 630 00:22:50,610 --> 00:22:52,230 So can you just step forward for a moment? 631 00:22:52,230 --> 00:22:55,670 This is currently my smallest numbered candidate, and I'm going to remember 632 00:22:55,670 --> 00:22:57,000 that with, like, a variable. 633 00:22:57,000 --> 00:22:57,930 But I'm going to keep checking. 634 00:22:57,930 --> 00:22:59,890 Is there someone whose number is smaller? 635 00:22:59,890 --> 00:23:00,460 Six, no. 636 00:23:00,460 --> 00:23:01,390 Oh, there's Neil again. 637 00:23:01,390 --> 00:23:04,050 >> So I'm going to push you back sort of conceptually. 638 00:23:04,050 --> 00:23:05,120 Neil will come forward. 639 00:23:05,120 --> 00:23:08,440 And now, the variable that I'm using to keep track of who has the smallest 640 00:23:08,440 --> 00:23:11,390 number is updated to contain Neil's location. 641 00:23:11,390 --> 00:23:12,110 Well, let's see. 642 00:23:12,110 --> 00:23:13,960 Three, seven, five. 643 00:23:13,960 --> 00:23:15,590 OK, I know Neil was the smallest. 644 00:23:15,590 --> 00:23:18,110 What's the simplest thing for me to do now? 645 00:23:18,110 --> 00:23:21,410 I'm not going to waste my time by just bubbling Neil one spot to the left. 646 00:23:21,410 --> 00:23:25,350 Why don't I just put Neil where he belongs, which is of course where? 647 00:23:25,350 --> 00:23:26,160 >> All the way at the beginning. 648 00:23:26,160 --> 00:23:27,720 So Neil, come with me. 649 00:23:27,720 --> 00:23:28,910 And what was your name again? 650 00:23:28,910 --> 00:23:29,310 >> GRACE: Grace. 651 00:23:29,310 --> 00:23:29,710 >> DAVID J. MALAN: Grace. 652 00:23:29,710 --> 00:23:29,920 OK. 653 00:23:29,920 --> 00:23:32,490 So Grace, unfortunately, you're kind of in the way. 654 00:23:32,490 --> 00:23:34,290 So how do we solve this problem? 655 00:23:34,290 --> 00:23:34,490 Right? 656 00:23:34,490 --> 00:23:37,500 If this is an array, there's only seven locations. 657 00:23:37,500 --> 00:23:40,830 Recall that, with Rob, we talked about declaring ages, and we only had a 658 00:23:40,830 --> 00:23:41,740 finite number of ages? 659 00:23:41,740 --> 00:23:42,535 Same idea here. 660 00:23:42,535 --> 00:23:44,300 We only have a finite number of ints. 661 00:23:44,300 --> 00:23:47,590 Grace is kind of in our way, so how do we fix? 662 00:23:47,590 --> 00:23:49,555 >> The simplest way is like, Grace, sorry. 663 00:23:49,555 --> 00:23:51,870 You're going to have to go over there so we can make room. 664 00:23:51,870 --> 00:23:55,290 Now, if you think about this, maybe we just made the problem worse. 665 00:23:55,290 --> 00:23:58,510 And maybe we did, because what if Grace were in the right place? 666 00:23:58,510 --> 00:24:01,730 But we know she's not, because otherwise, she would have been 667 00:24:01,730 --> 00:24:03,980 standing forward instead of Neil at this time, right? 668 00:24:03,980 --> 00:24:05,550 We already checked her number out. 669 00:24:05,550 --> 00:24:05,770 >> All right. 670 00:24:05,770 --> 00:24:09,110 So now, Neil's in the right place, and I can do a slight optimization. 671 00:24:09,110 --> 00:24:11,740 For the next minute, I'm going to ignore Neil all together, so as not to 672 00:24:11,740 --> 00:24:15,280 waste his time, or accidentally swap him to the wrong place. 673 00:24:15,280 --> 00:24:17,805 So now, how do I find the next element that's smallest? 674 00:24:17,805 --> 00:24:18,480 Two. 675 00:24:18,480 --> 00:24:20,225 That's a pretty good number, if you want to step forward and 676 00:24:20,225 --> 00:24:21,100 I'll remember you. 677 00:24:21,100 --> 00:24:21,980 Six, no good. 678 00:24:21,980 --> 00:24:24,820 Four, three, seven, five, no good. 679 00:24:24,820 --> 00:24:26,800 So let me move you to your right place. 680 00:24:26,800 --> 00:24:28,470 And we just got lucky this time. 681 00:24:28,470 --> 00:24:31,350 >> Now, I'm going to ignore these two guys, and now do one more 682 00:24:31,350 --> 00:24:32,260 pass through this. 683 00:24:32,260 --> 00:24:33,490 Six, that a pretty small number. 684 00:24:33,490 --> 00:24:34,300 Come on forward. 685 00:24:34,300 --> 00:24:35,220 Oh, sorry. 686 00:24:35,220 --> 00:24:37,640 Grace's number is better, so step on forward. 687 00:24:37,640 --> 00:24:38,260 Four. 688 00:24:38,260 --> 00:24:39,120 Sorry, Grace. 689 00:24:39,120 --> 00:24:39,950 Go back again. 690 00:24:39,950 --> 00:24:41,550 Number three is better. 691 00:24:41,550 --> 00:24:42,290 Seven. 692 00:24:42,290 --> 00:24:42,720 Five. 693 00:24:42,720 --> 00:24:43,550 And now what's your name again? 694 00:24:43,550 --> 00:24:44,000 >> JASON: Jason. 695 00:24:44,000 --> 00:24:44,420 >> DAVID J. MALAN: Jason. 696 00:24:44,420 --> 00:24:47,050 So Jason is now the smallest element I've selected. 697 00:24:47,050 --> 00:24:49,160 Where is he going to go? 698 00:24:49,160 --> 00:24:50,380 So where six is. 699 00:24:50,380 --> 00:24:51,210 And your name is again? 700 00:24:51,210 --> 00:24:51,710 >> GABE: Gabe. 701 00:24:51,710 --> 00:24:52,340 >> DAVID J. MALAN: Gabe. 702 00:24:52,340 --> 00:24:53,220 Gabe's in the way. 703 00:24:53,220 --> 00:24:54,640 What's the easiest thing to do? 704 00:24:54,640 --> 00:24:58,390 Swap these two guys and continue. 705 00:24:58,390 --> 00:24:59,020 So now let's see. 706 00:24:59,020 --> 00:25:00,170 Who's the smallest? 707 00:25:00,170 --> 00:25:01,030 Four. 708 00:25:01,030 --> 00:25:01,990 Let me just kind of cheat. 709 00:25:01,990 --> 00:25:03,090 Five is going to be the smallest. 710 00:25:03,090 --> 00:25:05,220 I find next, if, you want to step forward, what do I have to do with 711 00:25:05,220 --> 00:25:06,820 these guys, with Gabe? 712 00:25:06,820 --> 00:25:08,450 Swap again. 713 00:25:08,450 --> 00:25:10,740 So now, still slightly out of order. 714 00:25:10,740 --> 00:25:14,140 I found Gabe to be the smallest, so I pop him out, move you guys over. 715 00:25:14,140 --> 00:25:15,190 And done. 716 00:25:15,190 --> 00:25:17,200 >> So answer is the same. 717 00:25:17,200 --> 00:25:18,600 The end result is the same. 718 00:25:18,600 --> 00:25:22,730 Which of these two algorithms is better? 719 00:25:22,730 --> 00:25:23,500 The second one, I heard. 720 00:25:23,500 --> 00:25:24,252 Why? 721 00:25:24,252 --> 00:25:25,900 >> SPEAKER 3: It's n steps [INAUDIBLE]. 722 00:25:25,900 --> 00:25:27,600 >> DAVID J. MALAN: It's n steps at most. 723 00:25:27,600 --> 00:25:28,490 Interesting. 724 00:25:28,490 --> 00:25:30,610 So is it though? 725 00:25:30,610 --> 00:25:33,630 So how did I find the smallest element? 726 00:25:33,630 --> 00:25:37,060 How many steps did I have to take the find the smallest element? 727 00:25:37,060 --> 00:25:39,220 I had a look all the way at the end, right? 728 00:25:39,220 --> 00:25:41,530 Because in that worst case, what if Neil were over here? 729 00:25:41,530 --> 00:25:45,700 So just finding the smallest element takes me n steps, or n minus 1. 730 00:25:45,700 --> 00:25:46,100 But, OK. 731 00:25:46,100 --> 00:25:46,980 So fix Neil. 732 00:25:46,980 --> 00:25:48,740 Remember that, a minute or so ago. 733 00:25:48,740 --> 00:25:51,680 >> But how did I find the next smallest element? 734 00:25:51,680 --> 00:25:54,830 It's n minus 1, or n minus 2 really, from the number of steps. 735 00:25:54,830 --> 00:25:55,440 So OK. 736 00:25:55,440 --> 00:25:57,390 So I did n minus 2. 737 00:25:57,390 --> 00:25:57,600 All right. 738 00:25:57,600 --> 00:25:59,130 So that feels a little better. 739 00:25:59,130 --> 00:25:59,730 All right. 740 00:25:59,730 --> 00:26:03,270 How many steps the next time to find number three? 741 00:26:03,270 --> 00:26:04,420 So n minus 4. 742 00:26:04,420 --> 00:26:07,670 So it's decreasing, one fewer step on each iteration. 743 00:26:07,670 --> 00:26:08,740 So this does feel better, right? 744 00:26:08,740 --> 00:26:13,450 If last time it was roughly n times n, this time it's n minus 1, plus n minus 745 00:26:13,450 --> 00:26:16,500 2, plus n minus 3, plus n minus 4, dot, dot, dot. 746 00:26:16,500 --> 00:26:18,750 But if you recall from your high school textbooks, the little cheat 747 00:26:18,750 --> 00:26:24,380 sheet at the back that has formulas, if you add up this series of numbers, 748 00:26:24,380 --> 00:26:31,280 what is the total number of steps going to be that I take here? 749 00:26:31,280 --> 00:26:36,580 >> This is one of those, like, n minus 1, times n, divided by 2. 750 00:26:36,580 --> 00:26:39,040 So let me see if I can pull this up for just a moment. 751 00:26:39,040 --> 00:26:42,230 And again, I'm kind of rounding some numbers just to keep our life simple, 752 00:26:42,230 --> 00:26:47,830 but as I recall, it's something like if I do n minus 1 things, then n minus 753 00:26:47,830 --> 00:26:53,570 2, then n minus 3, it's roughly something like this over 2, and if I 754 00:26:53,570 --> 00:26:55,510 multiply this out, that's actually n square. 755 00:26:55,510 --> 00:26:58,940 That's not feeling too good. n minus n over 2. 756 00:26:58,940 --> 00:27:00,350 >> But here's the thing. 757 00:27:00,350 --> 00:27:03,720 In computer science, when the problems start to get interesting is when n 758 00:27:03,720 --> 00:27:04,700 gets really large. 759 00:27:04,700 --> 00:27:08,110 And when n gets really large, which of these values is going to dominate all 760 00:27:08,110 --> 00:27:09,750 of the others? 761 00:27:09,750 --> 00:27:10,990 It's kind of the n squared, right? 762 00:27:10,990 --> 00:27:13,340 Yes, dividing by 2 is pretty good. 763 00:27:13,340 --> 00:27:16,740 But if you're talking about billions of pieces of data, or trillions of 764 00:27:16,740 --> 00:27:18,700 pieces of data, OK, so you're twice as fast. 765 00:27:18,700 --> 00:27:22,440 But who really cares if that big number, if this factor is what gets 766 00:27:22,440 --> 00:27:23,040 bigger and bigger. 767 00:27:23,040 --> 00:27:25,990 And surely, it makes more of a difference than this guy. 768 00:27:25,990 --> 00:27:29,120 So even though you guys are right, the second algorithm, we'll call it 769 00:27:29,120 --> 00:27:32,970 selection sort, is, in the real world, a bit faster potentially, because I am 770 00:27:32,970 --> 00:27:35,360 taking fewer and fewer steps each time. 771 00:27:35,360 --> 00:27:37,340 >> It's not really fundamentally faster. 772 00:27:37,340 --> 00:27:41,430 Because if we actually play this out for large values of n, at the end of 773 00:27:41,430 --> 00:27:44,750 the day, for big enough n, it's still going to feel pretty slow. 774 00:27:44,750 --> 00:27:46,770 Well, let me take one last pass at that. 775 00:27:46,770 --> 00:27:48,920 That's what I would call selection sort. 776 00:27:48,920 --> 00:27:51,040 Can you guys reset yourselves one last time? 777 00:27:51,040 --> 00:27:53,550 And in this last case, I'm going to propose something 778 00:27:53,550 --> 00:27:54,970 called insertion sort. 779 00:27:54,970 --> 00:27:57,470 Insertion sort being, conceptually, a bit different. 780 00:27:57,470 --> 00:28:00,980 >> Rather than going back and forth and selecting the smallest element, I'm 781 00:28:00,980 --> 00:28:05,030 just going to deal with each of these guys as I encounter them, and insert 782 00:28:05,030 --> 00:28:06,850 them into their correct place. 783 00:28:06,850 --> 00:28:10,160 So I'm just going to start with Grace, and I see that she's number four. 784 00:28:10,160 --> 00:28:11,720 Where does number four belong? 785 00:28:11,720 --> 00:28:14,940 I haven't started sorting anything, so Grace gets to stay right there. 786 00:28:14,940 --> 00:28:18,355 And now I'm going to claim, if you could take a step to your right, this 787 00:28:18,355 --> 00:28:21,650 my sorted list, this is my unsorted remaining list. 788 00:28:21,650 --> 00:28:23,260 So now I'm going to proceed next, and what's your name again? 789 00:28:23,260 --> 00:28:23,700 >> BRANSON: Branson. 790 00:28:23,700 --> 00:28:24,150 >> DAVID J. MALAN: Branson. 791 00:28:24,150 --> 00:28:25,375 So Branson is number two. 792 00:28:25,375 --> 00:28:27,490 So I'm going to take you out for a moment. 793 00:28:27,490 --> 00:28:30,940 And now, where do you belong in this array? 794 00:28:30,940 --> 00:28:32,360 So to the right of Grace. 795 00:28:32,360 --> 00:28:35,670 So again, we're kind of making Grace do a lot of work here. 796 00:28:35,670 --> 00:28:37,290 Where do we put you? 797 00:28:37,290 --> 00:28:40,120 So we're going to slide you to the left, and insert Branson there. 798 00:28:40,120 --> 00:28:41,680 But now I claim that you guys are done. 799 00:28:41,680 --> 00:28:43,240 But notice, I'm not using extra space. 800 00:28:43,240 --> 00:28:45,130 It's still 2 elements here, 5 over here. 801 00:28:45,130 --> 00:28:47,910 Total array size is 7, so I'm not cheating, all right? 802 00:28:47,910 --> 00:28:51,950 >> So now we have, with Gabe here, the number six, where do you belong? 803 00:28:51,950 --> 00:28:52,650 You got lucky again. 804 00:28:52,650 --> 00:28:53,820 So you get to stay right there. 805 00:28:53,820 --> 00:28:57,210 Just take a slight step to the right just to make clear that you're sorted. 806 00:28:57,210 --> 00:29:00,520 And now we have Neil again, number one, where do you go? 807 00:29:00,520 --> 00:29:03,540 And now is where we'll begin to see that this algorithm, though on first 808 00:29:03,540 --> 00:29:05,950 glance, feels pretty smart, watch what's about to happen. 809 00:29:05,950 --> 00:29:07,370 If you could step forward. 810 00:29:07,370 --> 00:29:09,260 >> Where do we want to put Neil? 811 00:29:09,260 --> 00:29:11,830 So obviously here, so how do we get Neil there? 812 00:29:11,830 --> 00:29:12,970 Let's do this step-by-step. 813 00:29:12,970 --> 00:29:15,620 Gabe, where do you need to go? 814 00:29:15,620 --> 00:29:19,590 Yep, so take one big step, or two half-steps to make 815 00:29:19,590 --> 00:29:20,820 one step over there. 816 00:29:20,820 --> 00:29:21,750 Grace, where you go? 817 00:29:21,750 --> 00:29:22,510 Good. 818 00:29:22,510 --> 00:29:23,500 So another step. 819 00:29:23,500 --> 00:29:24,960 And finally, Branson? 820 00:29:24,960 --> 00:29:25,460 Another step. 821 00:29:25,460 --> 00:29:27,190 And now we can put Neil into place. 822 00:29:27,190 --> 00:29:28,440 >> So now, continue this logic. 823 00:29:28,440 --> 00:29:32,420 Even though we are not shifting Neil over, and over, and over, to put him 824 00:29:32,420 --> 00:29:36,420 where he goes, in the worst case, the next number we might encounter could 825 00:29:36,420 --> 00:29:42,220 be the number, say, there was a number zero, then we're going to shift all of 826 00:29:42,220 --> 00:29:42,730 these guys. 827 00:29:42,730 --> 00:29:44,950 Suppose that there's a number, negative one, then we have to shift 828 00:29:44,950 --> 00:29:46,080 all of these guys. 829 00:29:46,080 --> 00:29:48,500 So we're really just kind of flipping the problem around, such that we're 830 00:29:48,500 --> 00:29:52,620 transferring the expense from the selection process so the insertion 831 00:29:52,620 --> 00:29:56,930 process, such that you guys just had to move roughly n minus something 832 00:29:56,930 --> 00:29:57,940 number of steps. 833 00:29:57,940 --> 00:30:01,200 And that number of steps is only going to increase as I select more numbers, 834 00:30:01,200 --> 00:30:04,730 if I have to keep shoving you guys back, and back, and back. 835 00:30:04,730 --> 00:30:08,320 >> So the sad thing now is all of these algorithms are n squared. 836 00:30:08,320 --> 00:30:10,570 Let's go ahead and thanks to these guys, and visualize these a bit 837 00:30:10,570 --> 00:30:11,090 differently. 838 00:30:11,090 --> 00:30:12,312 Very well done. 839 00:30:12,312 --> 00:30:14,120 >> [APPLAUSE] 840 00:30:14,120 --> 00:30:15,030 >> All right. 841 00:30:15,030 --> 00:30:16,280 There you go. 842 00:30:16,280 --> 00:30:18,390 843 00:30:18,390 --> 00:30:18,470 Thanks for-- 844 00:30:18,470 --> 00:30:19,190 >> BRANSON: [INAUDIBLE] keep the numbers. 845 00:30:19,190 --> 00:30:21,990 >> DAVID J. MALAN: No, you may keep the numbers as well. 846 00:30:21,990 --> 00:30:23,440 All right. 847 00:30:23,440 --> 00:30:24,100 Nicely done. 848 00:30:24,100 --> 00:30:25,300 All right. 849 00:30:25,300 --> 00:30:30,510 So let's see if we can't now summarize more rapidly, and more visually, 850 00:30:30,510 --> 00:30:33,410 exactly what just happened here as follows. 851 00:30:33,410 --> 00:30:36,510 852 00:30:36,510 --> 00:30:38,770 I'm going to go ahead and pull up Firefox. 853 00:30:38,770 --> 00:30:41,310 We'll link this demonstration on the course's website. 854 00:30:41,310 --> 00:30:43,870 Java is a bit annoying to get working in some browsers these days. 855 00:30:43,870 --> 00:30:46,760 So if you do play with this at home, realize you might need to use Firefox 856 00:30:46,760 --> 00:30:47,990 to get it working. 857 00:30:47,990 --> 00:30:50,440 And what I'm going to do with this demonstration is the following. 858 00:30:50,440 --> 00:30:54,875 >> At the bottom, I have a whole bunch of menu options, including a start and a 859 00:30:54,875 --> 00:30:55,840 stop button. 860 00:30:55,840 --> 00:30:59,450 Also, as an aside, there seems to be a bug in these programs, whereby you 861 00:30:59,450 --> 00:31:03,720 can't actually see the start or stop button unless you hold Command or Alt 862 00:31:03,720 --> 00:31:06,560 plus and zoom in, which curiously shows you more buttons. 863 00:31:06,560 --> 00:31:09,090 So just FYI if you play with this at home. 864 00:31:09,090 --> 00:31:12,870 Now I'm going to click Start in just a moment, after specifying a delay of, 865 00:31:12,870 --> 00:31:16,810 like, 200 milliseconds here, just so we can see what happens. 866 00:31:16,810 --> 00:31:20,180 >> So I claim that this is a visualization of the first algorithm 867 00:31:20,180 --> 00:31:23,730 these guys did, bubble sort, whereby we swapped people pair-wise. 868 00:31:23,730 --> 00:31:27,490 The key insight to this visualization is that the height of the bars 869 00:31:27,490 --> 00:31:30,510 represents the size of the number. 870 00:31:30,510 --> 00:31:32,210 So the taller the bar, the bigger the number. 871 00:31:32,210 --> 00:31:33,680 Shorter the bar, smaller the number. 872 00:31:33,680 --> 00:31:38,630 And if you notice, we're going through the first iteration of this algorithm, 873 00:31:38,630 --> 00:31:42,620 swapping big and small numbers, so that the small number comes first and 874 00:31:42,620 --> 00:31:44,280 the big number goes to the right. 875 00:31:44,280 --> 00:31:48,770 >> And as soon as we get the end of array of many more numbers than seven, we're 876 00:31:48,770 --> 00:31:49,900 going to go back to the beginning. 877 00:31:49,900 --> 00:31:51,140 And anticipate this. 878 00:31:51,140 --> 00:31:54,860 On the far left, that little guy's going to swap to the side, and this 879 00:31:54,860 --> 00:31:56,010 process repeats. 880 00:31:56,010 --> 00:31:59,450 Now this visualization quickly gets boring, so let me go ahead and stop 881 00:31:59,450 --> 00:32:04,170 it, change the delay something much faster just to get now, a feel for 882 00:32:04,170 --> 00:32:05,060 this algorithm. 883 00:32:05,060 --> 00:32:07,840 >> So even though I've sped it up, this is like upgrading my processor, buying 884 00:32:07,840 --> 00:32:08,580 a new computer. 885 00:32:08,580 --> 00:32:12,980 I haven't fundamentally changed my algorithm, but you can indeed see more 886 00:32:12,980 --> 00:32:16,800 clearly than with humans, that the big numbers are bubbling up to the top, 887 00:32:16,800 --> 00:32:20,900 and the small numbers are bubbling down to the bottom. 888 00:32:20,900 --> 00:32:22,390 And now this thing here sorted. 889 00:32:22,390 --> 00:32:25,260 And as an aside, in the squares, there's just some bookkeeping there to 890 00:32:25,260 --> 00:32:28,010 help you count how many comparisons, or how many swaps have 891 00:32:28,010 --> 00:32:28,950 actually been done. 892 00:32:28,950 --> 00:32:30,750 >> Well, let's try one of the others we saw. 893 00:32:30,750 --> 00:32:37,116 Let me click on bubble sort here, and let me choose, and this whole web page 894 00:32:37,116 --> 00:32:38,936 is a little buggy. 895 00:32:38,936 --> 00:32:41,155 Let's accept the risk and run it again. 896 00:32:41,155 --> 00:32:44,560 897 00:32:44,560 --> 00:32:45,030 There we go. 898 00:32:45,030 --> 00:32:47,180 So let's do selection sort. 899 00:32:47,180 --> 00:32:49,140 I don't know why the menu appears over there. 900 00:32:49,140 --> 00:32:54,070 Let's zoom in to fix that bug, change this to 50. 901 00:32:54,070 --> 00:32:56,020 Ah, let's actually do that much faster. 902 00:32:56,020 --> 00:32:59,160 Five milliseconds or so, and Start. 903 00:32:59,160 --> 00:33:00,470 >> So this is selection sort. 904 00:33:00,470 --> 00:33:03,070 So again, think about what we did with the humans up here. 905 00:33:03,070 --> 00:33:08,490 We went through the array and selected the smallest element again, 906 00:33:08,490 --> 00:33:09,250 and again, and again. 907 00:33:09,250 --> 00:33:11,110 Now I claim that was still pretty bad. 908 00:33:11,110 --> 00:33:15,010 It was still n squared, give or take, but it was, in the real world, a bit 909 00:33:15,010 --> 00:33:18,280 faster, because I was indeed taking slightly fewer steps each time. 910 00:33:18,280 --> 00:33:19,800 But we're only talking what? 911 00:33:19,800 --> 00:33:21,830 Maybe 40 or so bars here? 912 00:33:21,830 --> 00:33:23,200 We're not talking 40 million. 913 00:33:23,200 --> 00:33:27,430 So it's not totally clear to me that was indeed a significant gain. 914 00:33:27,430 --> 00:33:32,530 >> Let me now go back and change to our third algorithm, which was select 915 00:33:32,530 --> 00:33:33,180 insertion sort. 916 00:33:33,180 --> 00:33:36,380 And now it's really buggy because the menu really shouldn't be down there. 917 00:33:36,380 --> 00:33:40,840 So now we'll scroll back up here and start this algorithm. 918 00:33:40,840 --> 00:33:43,270 Whoop, start and stop. 919 00:33:43,270 --> 00:33:47,160 So this one kind of has a pretty pattern to it, whereby we're again 920 00:33:47,160 --> 00:33:50,240 inserting the humans, or in this case, the bars into 921 00:33:50,240 --> 00:33:52,620 their appropriate location. 922 00:33:52,620 --> 00:33:55,430 And it's already done before I turned around. 923 00:33:55,430 --> 00:33:58,940 But this one, too, in theory, is still n squared. 924 00:33:58,940 --> 00:34:01,430 >> So let's see if we can't summarize these as follows. 925 00:34:01,430 --> 00:34:04,750 I'm going to go ahead and just to give us sort of a common way of talking 926 00:34:04,750 --> 00:34:08,489 about these things, let me introduce just a bit of notation here. 927 00:34:08,489 --> 00:34:12,480 You're about to see something called big O , because it is literally a big 928 00:34:12,480 --> 00:34:16,320 O. And this is a way that a computer scientist or a mathematician even uses 929 00:34:16,320 --> 00:34:19,230 to describe the running time of some algorithm. 930 00:34:19,230 --> 00:34:21,400 How many steps does it actually take? 931 00:34:21,400 --> 00:34:25,080 >> Now I'm going to embarrass myself with my handwriting here in just a moment. 932 00:34:25,080 --> 00:34:29,020 But let me go ahead and say that this will be big O over here. 933 00:34:29,020 --> 00:34:33,610 And let me introduce one other symbol, a capital omega. 934 00:34:33,610 --> 00:34:37,080 Omega is going to be the opposite, essentially, of big O. Whereas big O 935 00:34:37,080 --> 00:34:40,790 means, in the worst case, how much time might some algorithm take, in 936 00:34:40,790 --> 00:34:43,480 terms of n, omega is going to be how much time might it 937 00:34:43,480 --> 00:34:45,409 take in the best case. 938 00:34:45,409 --> 00:34:48,090 And we'll see what we mean by best case in just a moment. 939 00:34:48,090 --> 00:34:49,940 >> So let's start something simple. 940 00:34:49,940 --> 00:34:54,719 Let me start with a linear search. 941 00:34:54,719 --> 00:34:55,679 So not sorting. 942 00:34:55,679 --> 00:34:58,000 We'll call this linear search. 943 00:34:58,000 --> 00:35:01,140 And now, make a little table out of this. 944 00:35:01,140 --> 00:35:06,600 And now, in the case of linear search, in the worst case, how many steps is 945 00:35:06,600 --> 00:35:11,770 it going to take me to find a number of arbitrary choice? 946 00:35:11,770 --> 00:35:14,540 And there's n total doors or n total numbers. 947 00:35:14,540 --> 00:35:15,940 Worst case. 948 00:35:15,940 --> 00:35:18,800 How many steps am I going to have to take to find the number 50 in an array 949 00:35:18,800 --> 00:35:20,830 of n doors? 950 00:35:20,830 --> 00:35:21,410 And why? 951 00:35:21,410 --> 00:35:23,680 Because it might be all the way over onto the end. 952 00:35:23,680 --> 00:35:27,120 So much like Jennifer encountered, the number 50 was all the way over, so in 953 00:35:27,120 --> 00:35:30,760 the worst case linear search is big O of n, we'll say. 954 00:35:30,760 --> 00:35:33,430 >> What about the best case, if you get really lucky? 955 00:35:33,430 --> 00:35:36,200 It's just going to take one step, or a constant number of steps. 956 00:35:36,200 --> 00:35:37,830 So we'll describe that as 1. 957 00:35:37,830 --> 00:35:39,010 So this is pretty good. 958 00:35:39,010 --> 00:35:41,210 Now what if we did something like binary search? 959 00:35:41,210 --> 00:35:43,860 960 00:35:43,860 --> 00:35:47,846 So binary search, in the worst case, took how much time? 961 00:35:47,846 --> 00:35:49,250 >> [INTERPOSING VOICES] 962 00:35:49,250 --> 00:35:51,310 >> DAVID J. MALAN: So actually, I heard it in a couple places. 963 00:35:51,310 --> 00:35:56,390 So it's actually log n, give or take, because as we divide the list in half 964 00:35:56,390 --> 00:36:00,730 again, and again, and again, we're able to find, ultimately, the value, 965 00:36:00,730 --> 00:36:04,750 if it's there, but there is a catch. 966 00:36:04,750 --> 00:36:08,590 What's the assumption that we have to take for granted for binary search? 967 00:36:08,590 --> 00:36:09,700 It has to be sorted. 968 00:36:09,700 --> 00:36:12,770 It's not sorted, you can split the thing in half again and again, and you 969 00:36:12,770 --> 00:36:15,490 can go left, and you can go right, and you can go left and right, but you're 970 00:36:15,490 --> 00:36:18,070 not going to find the element if the list isn't sorted, because 971 00:36:18,070 --> 00:36:18,790 you might miss it. 972 00:36:18,790 --> 00:36:22,120 Because your heuristic, for going left or right is going to be flawed if it's 973 00:36:22,120 --> 00:36:23,420 indeed not sorted. 974 00:36:23,420 --> 00:36:26,110 So there's sort of a hidden cost to using something like this. 975 00:36:26,110 --> 00:36:29,250 >> Now, let's go into our sorting algorithms not searching-- 976 00:36:29,250 --> 00:36:31,140 oh, actually let's go in this blank. 977 00:36:31,140 --> 00:36:33,190 Binary search in the best case? 978 00:36:33,190 --> 00:36:36,290 It's also 1 if it just happens to be in the very middle of the array, or 979 00:36:36,290 --> 00:36:37,810 the middle of the phone book. 980 00:36:37,810 --> 00:36:39,710 Now let's do bubble sort. 981 00:36:39,710 --> 00:36:42,570 So again, now we're entering the sorts, not the searches. 982 00:36:42,570 --> 00:36:47,220 >> In the worst case, how many steps did we claim bubble sort's going to take? 983 00:36:47,220 --> 00:36:48,410 n squared. 984 00:36:48,410 --> 00:36:49,200 So I'm going to draw that. 985 00:36:49,200 --> 00:36:51,710 Ooh, my handwriting looks even worse when it's projected that big. 986 00:36:51,710 --> 00:36:52,510 All right. 987 00:36:52,510 --> 00:36:53,570 So that's n squared. 988 00:36:53,570 --> 00:36:59,460 And in the best case of bubble sort, how many steps is it going to take? 989 00:36:59,460 --> 00:37:00,980 1, I heard. 990 00:37:00,980 --> 00:37:01,760 >> SPEAKER 1: n. 991 00:37:01,760 --> 00:37:03,286 >> DAVID J. MALAN: n, I heard. 992 00:37:03,286 --> 00:37:04,200 >> SPEAKER 1: 2. 993 00:37:04,200 --> 00:37:05,010 >> DAVID J. MALAN: 2, I heard. 994 00:37:05,010 --> 00:37:06,670 Do I hear 3? 995 00:37:06,670 --> 00:37:07,080 All right. 996 00:37:07,080 --> 00:37:11,390 So I've heard 1, n, 2, but let's pick apart at least the first of those 997 00:37:11,390 --> 00:37:12,330 suggestions, 1. 998 00:37:12,330 --> 00:37:15,370 It's not a bad instinct, because it kind of follows a pattern here. 999 00:37:15,370 --> 00:37:19,670 But if it only takes 1 step, how in the world could I claim that the list 1000 00:37:19,670 --> 00:37:22,900 is sorted, because if I'm only allowed to take 1 step, how many elements 1001 00:37:22,900 --> 00:37:25,230 could I actually check to be sure? 1002 00:37:25,230 --> 00:37:28,270 Well, just 1, which means there's n minus 1 elements that could be out of 1003 00:37:28,270 --> 00:37:31,310 order, and I'm just going on faith after looking at 1 element that the 1004 00:37:31,310 --> 00:37:31,850 thing is sorted. 1005 00:37:31,850 --> 00:37:33,930 So 1's not correct here. 1006 00:37:33,930 --> 00:37:35,710 So minimally, how many do I have to look at? 1007 00:37:35,710 --> 00:37:36,680 >> [INTERPOSING VOICES] 1008 00:37:36,680 --> 00:37:40,160 >> DAVID J. MALAN: n minus 1, or really, n, because I need to look at every 1009 00:37:40,160 --> 00:37:42,190 element to make sure that it's not out of order. 1010 00:37:42,190 --> 00:37:44,750 But again, we'll sort of wave our hands at the smaller numbers and 1011 00:37:44,750 --> 00:37:47,100 assume that, as n gets large, they're uninteresting anyway. 1012 00:37:47,100 --> 00:37:48,380 So that's bubble sort. 1013 00:37:48,380 --> 00:37:49,830 And now, let's do these last two. 1014 00:37:49,830 --> 00:37:53,520 Selection sort, and then we'll do insertion sort. 1015 00:37:53,520 --> 00:37:57,160 And then we will blow your minds with something much 1016 00:37:57,160 --> 00:37:58,926 better than all of these. 1017 00:37:58,926 --> 00:38:00,410 All right. 1018 00:38:00,410 --> 00:38:04,700 >> What is the worst case running time of selection sort? 1019 00:38:04,700 --> 00:38:05,680 >> SPEAKER 4: n squared. 1020 00:38:05,680 --> 00:38:06,710 >> DAVID J. MALAN: n square, I'm hearing. 1021 00:38:06,710 --> 00:38:09,790 But why n squared, intuitively? 1022 00:38:09,790 --> 00:38:11,170 >> SPEAKER 4: Because we just did it. 1023 00:38:11,170 --> 00:38:12,260 >> DAVID J. MALAN: Because we just did it. 1024 00:38:12,260 --> 00:38:12,550 OK. 1025 00:38:12,550 --> 00:38:13,380 Good answer. 1026 00:38:13,380 --> 00:38:16,660 But intuitively, why is selection sort n squared? 1027 00:38:16,660 --> 00:38:18,980 What did we have to do again and again? 1028 00:38:18,980 --> 00:38:22,570 We had to keep scanning through, are you the smallest, are you the 1029 00:38:22,570 --> 00:38:24,020 smallest, are you the smallest. 1030 00:38:24,020 --> 00:38:27,480 And granted, we were able to take n steps, then n minus 1, then n minus 2. 1031 00:38:27,480 --> 00:38:30,700 But if you kind of add those all up, or take it on faith that I've added 1032 00:38:30,700 --> 00:38:34,810 them up in advance, we get roughly n squared minus some smaller numbers. 1033 00:38:34,810 --> 00:38:36,730 So I'm going to call this n squared. 1034 00:38:36,730 --> 00:38:39,530 But with selection sort in the best case, how many steps is it 1035 00:38:39,530 --> 00:38:40,632 going to take me? 1036 00:38:40,632 --> 00:38:41,840 >> SPEAKER 5: [INAUDIBLE] 1037 00:38:41,840 --> 00:38:44,350 >> DAVID J. MALAN: It's unfortunately still n squared, right? 1038 00:38:44,350 --> 00:38:49,590 Because if I'm selecting the smallest element, and we had seven people here, 1039 00:38:49,590 --> 00:38:53,280 I only know, once I get to the very end, that I've found the smallest 1040 00:38:53,280 --> 00:38:55,670 number, wherever he or she may have been. 1041 00:38:55,670 --> 00:38:58,820 But how do I find the next smallest number? 1042 00:38:58,820 --> 00:39:00,160 I have to do another pass. 1043 00:39:00,160 --> 00:39:04,810 So in the best case, what is the input to selection sort? 1044 00:39:04,810 --> 00:39:07,830 It's an already sort list, number one, number two, number three, number four. 1045 00:39:07,830 --> 00:39:08,600 But I'm a computer. 1046 00:39:08,600 --> 00:39:10,190 I can only look at one thing at a time. 1047 00:39:10,190 --> 00:39:12,465 I can't sort of take a step back like a human and say, 1048 00:39:12,465 --> 00:39:14,030 ooh, this looks correct. 1049 00:39:14,030 --> 00:39:17,580 >> I can only adjudicate correctness in selection sort by selecting the 1050 00:39:17,580 --> 00:39:18,370 smallest number. 1051 00:39:18,370 --> 00:39:21,390 But even if I find number one first, if I don't know anything else about 1052 00:39:21,390 --> 00:39:24,460 the other numbers, which I don't, all I know that I've been handed an array 1053 00:39:24,460 --> 00:39:27,930 or a set of doors behind which are numbers, the only way I know that one 1054 00:39:27,930 --> 00:39:28,680 was the smallest? 1055 00:39:28,680 --> 00:39:32,440 If I get all the way here and realize, damn, one was indeed the smallest. 1056 00:39:32,440 --> 00:39:34,870 >> But how do I then determine that two is the next smallest? 1057 00:39:34,870 --> 00:39:38,350 By doing the same inefficiency again and again. 1058 00:39:38,350 --> 00:39:42,210 So finally, with insertion sort, how, in the worst case, 1059 00:39:42,210 --> 00:39:44,990 did we say it performs? 1060 00:39:44,990 --> 00:39:49,100 It too is n squared. 1061 00:39:49,100 --> 00:39:53,020 And how about with the best case? 1062 00:39:53,020 --> 00:39:56,282 We'll leave that as a cliffhanger. 1063 00:39:56,282 --> 00:40:00,090 We'll fill in that blank next time, but first let me propose that we 1064 00:40:00,090 --> 00:40:02,620 fundamentally do better than all of these, all right? 1065 00:40:02,620 --> 00:40:05,220 >> So think for yourself what insertion sort's going to be. 1066 00:40:05,220 --> 00:40:06,910 Well, that wasn't very dramatic, because I'm the only one 1067 00:40:06,910 --> 00:40:08,970 that saw the change. 1068 00:40:08,970 --> 00:40:09,620 Wow. 1069 00:40:09,620 --> 00:40:10,420 OK. 1070 00:40:10,420 --> 00:40:12,615 So here we have a somewhat different demonstration. 1071 00:40:12,615 --> 00:40:16,580 If I zoom in here, you'll see that on the left we have bubble sort, in the 1072 00:40:16,580 --> 00:40:20,740 middle we have selection sort, and on the far right, we have something we 1073 00:40:20,740 --> 00:40:23,380 haven't looked at yet called merge sort. 1074 00:40:23,380 --> 00:40:26,080 But consider what we've been doing here thus far today. 1075 00:40:26,080 --> 00:40:29,200 When Jennifer first came up on stage, we went through the array of numbers 1076 00:40:29,200 --> 00:40:33,750 again, and again, with linear search, and we got linear running time, big O 1077 00:40:33,750 --> 00:40:35,100 of n, so to speak. 1078 00:40:35,100 --> 00:40:41,000 >> When we now consider the first week of class, when we had divide and conquer, 1079 00:40:41,000 --> 00:40:43,740 and we had the phone book tearing, and Jennifer, and we collectively 1080 00:40:43,740 --> 00:40:47,500 leveraged that key insight, which was to repeat yourself again and again by 1081 00:40:47,500 --> 00:40:50,930 somehow throwing away, throwing away, throwing away, half of the problem, or 1082 00:40:50,930 --> 00:40:55,320 generally, dividing a problem in half, and then treating the smaller piece of 1083 00:40:55,320 --> 00:40:59,630 the problem as conceptually equivalent to the other, we somehow did 1084 00:40:59,630 --> 00:41:00,910 fundamentally better. 1085 00:41:00,910 --> 00:41:04,720 But with bubble sort, with selection sort, with insertion sort, we've may 1086 00:41:04,720 --> 00:41:06,560 no such insights that Jennifer did. 1087 00:41:06,560 --> 00:41:10,220 We pretty much just walked back and forth a whole bunch of times, and we 1088 00:41:10,220 --> 00:41:12,650 tweaked things a little bit, swapping in this order, maybe 1089 00:41:12,650 --> 00:41:13,730 inserting or selecting. 1090 00:41:13,730 --> 00:41:16,950 But at the end of the day, I did a lot of awkward walking back and forth. 1091 00:41:16,950 --> 00:41:21,160 We didn't really leverage something smart like Jennifer did like dividing 1092 00:41:21,160 --> 00:41:22,040 and conquering. 1093 00:41:22,040 --> 00:41:25,620 >> So merge sort, by contrast, which we won't see until next week, it's going 1094 00:41:25,620 --> 00:41:29,540 to leverage that key idea by dividing the input, and then halving, and then 1095 00:41:29,540 --> 00:41:30,580 halving, and then halving. 1096 00:41:30,580 --> 00:41:34,590 And on each iteration of that loop, sorting the left half, and the right 1097 00:41:34,590 --> 00:41:38,200 half, then the left half of left half, and the right half of the left, then 1098 00:41:38,200 --> 00:41:40,990 the left half of the right half, and the right half of the right half. 1099 00:41:40,990 --> 00:41:42,840 And repeating again and again. 1100 00:41:42,840 --> 00:41:46,170 >> So you'll see this visually, but this is what awaits us next week. 1101 00:41:46,170 --> 00:41:49,760 And in general, when we think a little bit harder on any such problem. 1102 00:41:49,760 --> 00:41:52,435 1103 00:41:52,435 --> 00:41:57,970 We have n squared on the left, n squared in the middle, and n 1104 00:41:57,970 --> 00:41:59,400 log n on the right. 1105 00:41:59,400 --> 00:42:00,590 So there's your real cliffhanger. 1106 00:42:00,590 --> 00:42:02,040 We'll see you on Monday. 1107 00:42:02,040 --> 00:42:05,163 >> [APPLAUSE]