1 00:00:00,000 --> 00:00:00,060 2 00:00:00,060 --> 00:00:02,060 DAVID MALAN: --you can leverage both intuitively 3 00:00:02,060 --> 00:00:05,261 and in code to implement this new and improved fourth algorithm. 4 00:00:05,261 --> 00:00:07,260 DOUG LLOYD: After finally having hinted about it 5 00:00:07,260 --> 00:00:09,260 now by talking about all these other algorithms, 6 00:00:09,260 --> 00:00:15,930 we finally introduce something that's better, objectively better, in a worst 7 00:00:15,930 --> 00:00:16,430 case. 8 00:00:16,430 --> 00:00:18,770 DAVID MALAN: Yeah, in most cases, frankly. 9 00:00:18,770 --> 00:00:22,950 Yeah, when I took CS50 we use Quicksort which never 10 00:00:22,950 --> 00:00:24,950 clicked for me in quite the same way. 11 00:00:24,950 --> 00:00:28,290 Merge sort has always seemed a little simpler conceptually, 12 00:00:28,290 --> 00:00:30,990 if a little more magical, given that it really 13 00:00:30,990 --> 00:00:32,770 reduces to this recursive definition. 14 00:00:32,770 --> 00:00:36,390 And frankly this is why we deliberately introduce recursion 15 00:00:36,390 --> 00:00:39,630 at this point in the course, even though we don't use it all that much in C 16 00:00:39,630 --> 00:00:42,129 we introduced the idea of it so that we can really introduce 17 00:00:42,129 --> 00:00:43,747 a recursive algorithm like this. 18 00:00:43,747 --> 00:00:45,330 DOUG LLOYD: It was interesting though. 19 00:00:45,330 --> 00:00:47,220 One of the things that we got some of the most demand 20 00:00:47,220 --> 00:00:49,590 for from our online audience was to have a video Quicksort, even though we 21 00:00:49,590 --> 00:00:52,256 don't teach it in the class, it's actually one of the few topics 22 00:00:52,256 --> 00:00:55,920 that we have a short for that we never actually use. 23 00:00:55,920 --> 00:00:59,060 I agree though, I think that the concept of a pivot and having all these-- 24 00:00:59,060 --> 00:01:03,319 it's so many extra layers to try to boil down to this. 25 00:01:03,319 --> 00:01:06,360 DAVID MALAN: And I think we've gotten better at the presentation of this, 26 00:01:06,360 --> 00:01:09,540 like last year in particular I spent a while coming up 27 00:01:09,540 --> 00:01:12,120 with the simplified animation of merge sort 28 00:01:12,120 --> 00:01:15,390 just using excruciatingly detailed keynote 29 00:01:15,390 --> 00:01:19,410 slides like these, where they're very simple, but stepwise I mean there's 30 00:01:19,410 --> 00:01:22,680 dozens of slides that sort of give us this fake animation here that we can 31 00:01:22,680 --> 00:01:26,610 step through it at our own pace without having to just hit play and follow 32 00:01:26,610 --> 00:01:28,020 along. 33 00:01:28,020 --> 00:01:30,720 And these numbers too are sort of chosen. 34 00:01:30,720 --> 00:01:34,202 I accidentally have kept the even numbers on the left and the odd numbers 35 00:01:34,202 --> 00:01:36,910 on the right, but they do lend themselves in this order at least, 36 00:01:36,910 --> 00:01:37,769 to nice merging. 37 00:01:37,769 --> 00:01:39,810 And hopefully this helps students see, especially 38 00:01:39,810 --> 00:01:42,060 since the height of what we're about to do ultimately, 39 00:01:42,060 --> 00:01:45,270 or the number of vertical steps so to speak, 40 00:01:45,270 --> 00:01:48,840 ends up being logarithmic in height, three for eight elements, 41 00:01:48,840 --> 00:01:51,750 and meanwhile the length of these things, 42 00:01:51,750 --> 00:01:55,980 the number of elements we need to look at on each iteration is n, 43 00:01:55,980 --> 00:01:57,664 and therein lies are n log n. 44 00:01:57,664 --> 00:02:00,330 DOUG LLOYD: Yeah, I think it was last year was the first time we 45 00:02:00,330 --> 00:02:02,700 had this sort of visualization. 46 00:02:02,700 --> 00:02:08,430 And I think that that was a key turning point in making that n and the log n, 47 00:02:08,430 --> 00:02:10,972 I think, actually meaningful for perhaps the first time. 48 00:02:10,972 --> 00:02:12,180 DAVID MALAN: Yeah, hopefully. 49 00:02:12,180 --> 00:02:15,132 DOUG LLOYD: Because log of eight is 3 and so you can actually finally 50 00:02:15,132 --> 00:02:18,090 now see these, were that eight and that three, or that n and that log n 51 00:02:18,090 --> 00:02:18,500 come from. 52 00:02:18,500 --> 00:02:20,040 DAVID MALAN: But there too, for students for whom 53 00:02:20,040 --> 00:02:22,230 the mathematics come a little more comfortably-- 54 00:02:22,230 --> 00:02:26,550 and it's actually very nicely defined in terms of its recursive formula, 55 00:02:26,550 --> 00:02:29,580 whereby you can abstract away what the running time is and actually see 56 00:02:29,580 --> 00:02:30,454 formulaic what it is. 57 00:02:30,454 --> 00:02:33,780 And so long as you know or can look up in the back of a math or physics 58 00:02:33,780 --> 00:02:37,230 textbook the sort of summation that that series will give you, 59 00:02:37,230 --> 00:02:44,010 then you can see the n log n popping out of the total running time. 60 00:02:44,010 --> 00:02:48,990 DOUG LLOYD: Is there a reason that we don't expect merge sort of students? 61 00:02:48,990 --> 00:02:51,420 None of our problem sets require it. 62 00:02:51,420 --> 00:02:54,660 DAVID MALAN: No, really probably for lack of time and a piece set. 63 00:02:54,660 --> 00:02:57,994 We've generally had students do something like linear search or binary 64 00:02:57,994 --> 00:02:59,910 search because those are nice stepping stones, 65 00:02:59,910 --> 00:03:02,490 and then sometimes A sorting algorithm. 66 00:03:02,490 --> 00:03:06,301 But we typically have combined our look at algorithms and sorting 67 00:03:06,301 --> 00:03:08,800 with something that involves distribution code like m of 15, 68 00:03:08,800 --> 00:03:10,110 for instance, most recently. 69 00:03:10,110 --> 00:03:13,149 And so it's really I think for just lack of enough time in the week, 70 00:03:13,149 --> 00:03:15,690 but I think that could certainly be factored out as a problem 71 00:03:15,690 --> 00:03:17,250 or set of problems unto itself. 72 00:03:17,250 --> 00:03:20,190 With that said, the problem with a lot of these canonical searching 73 00:03:20,190 --> 00:03:23,220 and sorting algorithms is that there's just so many solutions out there 74 00:03:23,220 --> 00:03:25,521 and it's not that I worry about issues of honesty here, 75 00:03:25,521 --> 00:03:27,270 it's just if you're googling around trying 76 00:03:27,270 --> 00:03:29,769 to understand bubble sort, or selection sort, or merge sort, 77 00:03:29,769 --> 00:03:33,360 or whatever better, odds are you're going to see minimally pseudo 78 00:03:33,360 --> 00:03:36,662 code in the explanation, which is great, but it's so much easier 79 00:03:36,662 --> 00:03:39,120 sometimes to explain an algorithm by way of its actual code 80 00:03:39,120 --> 00:03:42,150 when it really is only a few lines maybe a dozen or so lines. 81 00:03:42,150 --> 00:03:45,025 DOUG LLOYD: Yeah, its' not all that intellectually interesting to say 82 00:03:45,025 --> 00:03:46,630 implement this sort, right? 83 00:03:46,630 --> 00:03:47,400 DAVID MALAN: No, I'd rather they use it. 84 00:03:47,400 --> 00:03:48,630 DOUG LLOYD: It's only in the context of something else. 85 00:03:48,630 --> 00:03:51,060 DAVID MALAN: Exactly, otherwise it's a pretty mundane exercise. 86 00:03:51,060 --> 00:03:52,080 I think it's helpful, especially when you 87 00:03:52,080 --> 00:03:53,790 have to wrestle with issues of like rounding 88 00:03:53,790 --> 00:03:56,956 if you have an odd number of elements, like do you go left, do you go right, 89 00:03:56,956 --> 00:04:00,252 do you truncate or round up, but even that's like-- 90 00:04:00,252 --> 00:04:02,460 If you're going to try to understand the topic better 91 00:04:02,460 --> 00:04:05,980 you're probably going to see it anyway, at which point the opportunity is lost. 92 00:04:05,980 --> 00:04:08,460 So we focus, I think, on slightly more involved algorithms 93 00:04:08,460 --> 00:04:11,190 when we have students implement them.