1 00:00:00,000 --> 00:00:07,270 2 00:00:07,270 --> 00:00:10,390 >> MARK GROZEN-SMITH: Hi, I'm Mark Grozen-Smith, and this is Quicksort. 3 00:00:10,390 --> 00:00:13,520 Just like insertion sort and bubble sort, Quicksort is an algorithm for 4 00:00:13,520 --> 00:00:15,720 sorting a list or an array of things. 5 00:00:15,720 --> 00:00:19,080 For simplicity, let's assume that those things are just integers, but 6 00:00:19,080 --> 00:00:22,060 know that Quicksort works for more than just numbers. 7 00:00:22,060 --> 00:00:24,720 Quickstart is a bit more complicated than insertion or bubble, but it's 8 00:00:24,720 --> 00:00:27,560 also much more efficient in most cases. 9 00:00:27,560 --> 00:00:28,150 Wait a second. 10 00:00:28,150 --> 00:00:30,760 Did he just say "most cases," not "all"? 11 00:00:30,760 --> 00:00:31,710 Interestingly, no. 12 00:00:31,710 --> 00:00:33,560 Not all cases are the same. 13 00:00:33,560 --> 00:00:36,650 Don't worry about this detail if you haven't seen big O notation yet, but 14 00:00:36,650 --> 00:00:39,730 Quicksort is an O (n squared) algorithm at worst, just like 15 00:00:39,730 --> 00:00:41,430 insertion or bubble sort. 16 00:00:41,430 --> 00:00:44,950 However, it typically acts much more like an old analog m algorithm. 17 00:00:44,950 --> 00:00:45,750 Why? 18 00:00:45,750 --> 00:00:46,810 We'll get back to that later. 19 00:00:46,810 --> 00:00:49,610 But for now, let's just learn how Quicksort works. 20 00:00:49,610 --> 00:00:53,080 >> So let's walk through Quicksorting this array of integers from smallest 21 00:00:53,080 --> 00:00:54,260 to largest. 22 00:00:54,260 --> 00:01:00,110 Here we have the integers 6, 5, 1, 3, 8, 4, 7, 9, and 2. 23 00:01:00,110 --> 00:01:03,480 First, we pick the final element of this array-- in this case, two-- 24 00:01:03,480 --> 00:01:06,870 and call that the "pivot." Next, we start to look at two things-- 25 00:01:06,870 --> 00:01:10,220 one, the lowest index, which I'll refer to as staying to the right of 26 00:01:10,220 --> 00:01:13,970 the wall, and, two, the leftmost element, which I'll call the "current 27 00:01:13,970 --> 00:01:17,260 element." What we're going to do is look all the other elements, other 28 00:01:17,260 --> 00:01:20,930 than the pivot, and put all the elements smaller than the pivot to the 29 00:01:20,930 --> 00:01:24,140 left of the wall and all those larger than the pivot to the 30 00:01:24,140 --> 00:01:25,570 right of the wall. 31 00:01:25,570 --> 00:01:29,560 Then, finally, we'll drop the pivot right on that wall to put it between 32 00:01:29,560 --> 00:01:32,970 all the numbers smaller than it and all the numbers larger. 33 00:01:32,970 --> 00:01:34,460 >> So let's do that. 34 00:01:34,460 --> 00:01:38,540 Pick up the 2, put the wall at the beginning, and call the 6 the "current 35 00:01:38,540 --> 00:01:41,590 element." So we want to look at our current element, the 6. 36 00:01:41,590 --> 00:01:44,200 And since it's larger than the 2, we leave it there to the 37 00:01:44,200 --> 00:01:45,610 right of the wall. 38 00:01:45,610 --> 00:01:48,980 Then, we move on to look at the 5 as our current element and see that this 39 00:01:48,980 --> 00:01:51,840 is, again, larger than the pivot, so we leave it where it is on the right 40 00:01:51,840 --> 00:01:53,190 side of the wall. 41 00:01:53,190 --> 00:01:53,880 We move on. 42 00:01:53,880 --> 00:01:56,750 Our current element is now 1, and-- oh. 43 00:01:56,750 --> 00:01:58,030 This is different now. 44 00:01:58,030 --> 00:02:00,890 The current element is now smaller than the pivot, so we want to put it 45 00:02:00,890 --> 00:02:02,570 to the left of the wall. 46 00:02:02,570 --> 00:02:06,555 To do this, let's just switch the current element with the lowest index 47 00:02:06,555 --> 00:02:07,970 sitting just to the right of the wall. 48 00:02:07,970 --> 00:02:14,050 49 00:02:14,050 --> 00:02:17,570 Now, we move the wall up one index so the 1 is on the left 50 00:02:17,570 --> 00:02:19,750 side of the wall now. 51 00:02:19,750 --> 00:02:20,310 >> Wait. 52 00:02:20,310 --> 00:02:23,450 I just mixed up the elements on the right side of the wall, didn't I? 53 00:02:23,450 --> 00:02:23,890 Don't worry. 54 00:02:23,890 --> 00:02:24,930 That's fine. 55 00:02:24,930 --> 00:02:27,570 The only thing we care about for now is that all these elements to the 56 00:02:27,570 --> 00:02:29,570 right of the wall are bigger than the pivot. 57 00:02:29,570 --> 00:02:31,760 No actual order is assumed yet. 58 00:02:31,760 --> 00:02:33,200 >> Now, back to sorting. 59 00:02:33,200 --> 00:02:35,840 So we continue on looking at the rest of the elements. 60 00:02:35,840 --> 00:02:39,075 And in this case, we see that there are no other elements less than the 61 00:02:39,075 --> 00:02:42,100 pivot, so we leave them all on the right side of the wall. 62 00:02:42,100 --> 00:02:45,980 Finally, we get to the current element and see that it is the pivot. 63 00:02:45,980 --> 00:02:48,830 Now, that means that we have two sections of the array, the first being 64 00:02:48,830 --> 00:02:51,820 small on the pivot and on the left side of the wall, and the second being 65 00:02:51,820 --> 00:02:54,500 larger than the pivot to the right side of the wall. 66 00:02:54,500 --> 00:02:57,040 We want to put the pivot element between the two, and then we'll know 67 00:02:57,040 --> 00:03:01,000 that the pivot is in its right final sorted place. 68 00:03:01,000 --> 00:03:04,980 So we switch the first element on the right side of the wall with the pivot, 69 00:03:04,980 --> 00:03:06,410 and we know the pivot's in its right position. 70 00:03:06,410 --> 00:03:11,130 71 00:03:11,130 --> 00:03:15,650 >> We then repeat this process for the subarrays left and right of the pivot. 72 00:03:15,650 --> 00:03:18,700 Since the last subarray is only one element long, we know that's already 73 00:03:18,700 --> 00:03:22,480 sorted because how can you be out of order if you're only one element? 74 00:03:22,480 --> 00:03:28,860 For the right side of the subarray, we see that the pivot is 5, and the wall 75 00:03:28,860 --> 00:03:32,250 is just left of the 6. 76 00:03:32,250 --> 00:03:34,970 And the current element also starts out as the 6. 77 00:03:34,970 --> 00:03:36,200 So 6 is greater than 5. 78 00:03:36,200 --> 00:03:38,590 We leave it where it is on the right side of the wall. 79 00:03:38,590 --> 00:03:41,060 Now, moving on, 3 is less than 5. 80 00:03:41,060 --> 00:03:44,160 So we switch it with the first element just right of the wall. 81 00:03:44,160 --> 00:03:47,944 82 00:03:47,944 --> 00:03:50,750 Now, I moved the wall up one. 83 00:03:50,750 --> 00:03:53,010 Now, moving on to the 8. 84 00:03:53,010 --> 00:03:56,480 8 is greater than 5, and so we leave it. 85 00:03:56,480 --> 00:03:58,720 The 4 is less than 5, so we switch it. 86 00:03:58,720 --> 00:04:02,950 87 00:04:02,950 --> 00:04:03,570 And on. 88 00:04:03,570 --> 00:04:04,820 And on. 89 00:04:04,820 --> 00:04:10,190 90 00:04:10,190 --> 00:04:13,670 >> Each time we repeat the process on the left and right sides of the array. we 91 00:04:13,670 --> 00:04:17,010 choose a pivot and do the comparisons and create another level of left and 92 00:04:17,010 --> 00:04:18,240 right subarrays. 93 00:04:18,240 --> 00:04:21,500 This recursive call will continue until we reach the end when we've 94 00:04:21,500 --> 00:04:25,290 divided up the overall array into just subarrays of length 1. 95 00:04:25,290 --> 00:04:28,060 From there, we know the array is sorted because every element has, at 96 00:04:28,060 --> 00:04:29,330 some point, been a pivot. 97 00:04:29,330 --> 00:04:32,720 In other words, for every element, all the numbers to the left are lesser 98 00:04:32,720 --> 00:04:36,420 values and all the numbers to the right have greater values. 99 00:04:36,420 --> 00:04:38,980 >> This method works very well if the value of the pivot chosen is 100 00:04:38,980 --> 00:04:41,930 approximately in the middle range of the list values. 101 00:04:41,930 --> 00:04:45,630 This would mean that, after we move elements around, there about as many 102 00:04:45,630 --> 00:04:48,390 elements to the left of the pivot as there are to the right. 103 00:04:48,390 --> 00:04:52,380 And the divide-and-conquer nature of the Quicksort algorithm is then taken 104 00:04:52,380 --> 00:04:53,850 full advantage of. 105 00:04:53,850 --> 00:04:57,500 This creates a runtime of O (n log n,) the n because we have to do n minus 1 106 00:04:57,500 --> 00:05:01,640 comparisons on each generation and log n because we have to divide the list 107 00:05:01,640 --> 00:05:03,210 log n times. 108 00:05:03,210 --> 00:05:06,160 However, in the worst cases, this algorithm can actually be O (n 109 00:05:06,160 --> 00:05:09,850 squared.) Suppose on each generation, the pivot just so happens to be the 110 00:05:09,850 --> 00:05:12,520 smallest or the largest of the numbers we're sorting. 111 00:05:12,520 --> 00:05:15,870 This would mean breaking down the list n times and making n minus 1 112 00:05:15,870 --> 00:05:17,690 comparisons every single time. 113 00:05:17,690 --> 00:05:20,490 Thus, o of n squared. 114 00:05:20,490 --> 00:05:22,000 >> How can we improve the method? 115 00:05:22,000 --> 00:05:25,100 One good way to improve the method is to cut down on the probability that 116 00:05:25,100 --> 00:05:28,150 the runtime is ever actually o of n squared. 117 00:05:28,150 --> 00:05:31,860 Remember this worst, worst case scenario can only happen when the 118 00:05:31,860 --> 00:05:35,320 pivot chosen is always the highest or lowest value in the array. 119 00:05:35,320 --> 00:05:38,630 To ensure this is less likely to happen, we can find the pivot by 120 00:05:38,630 --> 00:05:42,610 choosing multiple elements and taking the median value. 121 00:05:42,610 --> 00:05:44,650 >> My name is Mark Grozen-Smith, and this is CS50. 122 00:05:44,650 --> 00:05:47,790 123 00:05:47,790 --> 00:05:50,930 >> For simplicity, let's assume that those things are just integers, but 124 00:05:50,930 --> 00:05:51,970 know that Quicksert-- 125 00:05:51,970 --> 00:05:53,160 Quicksert? 126 00:05:53,160 --> 00:05:55,200 Sorry. 127 00:05:55,200 --> 00:06:02,000 >> Here we have the integers 6, 5, 1, 3, 8, 4, 9. 128 00:06:02,000 --> 00:06:03,200 >> SPEAKER 1: Really? 129 00:06:03,200 --> 00:06:04,850 >> SPEAKER 2: Don't stop there. 130 00:06:04,850 --> 00:06:06,100 >> SPEAKER 1: Really? 131 00:06:06,100 --> 00:06:08,491