1 00:00:00,000 --> 00:00:02,290 [Insertion Sort] 2 00:00:02,290 --> 00:00:04,240 [Tommy MacWilliam] [Harvard University] 3 00:00:04,240 --> 00:00:07,290 [This is CS50.TV] 4 00:00:07,290 --> 00:00:13,060 Let's take a look at insertion sort, an algorithm for taking a list of numbers and sorting them. 5 00:00:13,060 --> 00:00:18,300 An algorithm, remember, is simply a step-by-step procedure for accomplishing a task. 6 00:00:18,300 --> 00:00:23,640 The basic idea behind insertion sort, is to divide our list into two portions, 7 00:00:23,640 --> 00:00:26,580 a sorted portion and an unsorted portion. 8 00:00:26,580 --> 00:00:29,290 >> At each step of the algorithm, a number is moved 9 00:00:29,290 --> 00:00:32,439 from the unsorted portion to the sorted portion 10 00:00:32,439 --> 00:00:35,680 until eventually the entire list is sorted. 11 00:00:35,680 --> 00:00:43,340 Here is the list of six unsorted numbers--23, 42, 4, 16, 8, and 15. 12 00:00:43,340 --> 00:00:47,680 Since these numbers are not all in ascending order, they're unsorted. 13 00:00:47,680 --> 00:00:53,890 Since we haven't started sorting yet, we'll consider all six elements our unsorted portion. 14 00:00:53,890 --> 00:00:59,270 >> Once we start sorting, we'll put these sorted numbers to the left of these. 15 00:00:59,270 --> 00:01:03,600 So, let's start with 23, the first element in our list. 16 00:01:03,600 --> 00:01:06,930 We don't have any elements in our sorted portion yet, 17 00:01:06,930 --> 00:01:12,460 so let's simply consider 23 to be the start and end of our sorted portion. 18 00:01:12,460 --> 00:01:16,510 Now, our sorted portion has one number, 23, 19 00:01:16,510 --> 00:01:20,260 and our unsorted portion has these five numbers. 20 00:01:20,260 --> 00:01:27,320 Let's now insert the next number in our unsorted portion, 42, into the sorted portion. 21 00:01:27,320 --> 00:01:35,930 >> To do so, we'll need to compare the 42 to the 23--the only element in our sorted portion so far. 22 00:01:35,930 --> 00:01:41,980 Forty-two is larger than 23, so we can simply append 42 to the end 23 00:01:41,980 --> 00:01:45,420 of the sorted portion of the list. Great! 24 00:01:45,420 --> 00:01:51,850 Now our sorted portion has two elements, and our unsorted portion has four elements. 25 00:01:51,850 --> 00:01:57,200 So, let's now move to the 4, the next element in the unsorted portion. 26 00:01:57,200 --> 00:02:00,230 So, where should this be placed in the sorted portion? 27 00:02:00,230 --> 00:02:04,220 >> Remember, we want to place this number in sorted order 28 00:02:04,220 --> 00:02:08,680 so our sorted portion remains correctly sorted at all times. 29 00:02:08,680 --> 00:02:14,380 If we place the 4 to the right of the 42, then our list will be out of order. 30 00:02:14,380 --> 00:02:18,380 So, let's continue moving right-to-left in our sort portion. 31 00:02:18,380 --> 00:02:23,260 As we move, let's shift each number down a place to make room for the new number. 32 00:02:25,440 --> 00:02:31,740 Okay, 4 is also less than 23, so we can't place it here either. 33 00:02:31,740 --> 00:02:34,480 Let's move the 23 right one place. 34 00:02:36,500 --> 00:02:43,120 >> That means we'd like to place the 4 into the first slot in the sorted portion. 35 00:02:43,120 --> 00:02:46,300 Notice how this space in the list was already empty, 36 00:02:46,300 --> 00:02:51,120 because we've been moving sorted elements down as we've encountered them. 37 00:02:51,120 --> 00:02:52,740 All right. So, we're halfway there. 38 00:02:52,740 --> 00:02:57,990 Let's continue our algorithm by inserting the 16 into the sorted portion. 39 00:02:59,260 --> 00:03:03,820 Sixteen is less than 42, so let's shift the 42 to the right. 40 00:03:05,700 --> 00:03:10,190 Sixteen is also less than 23, so let's also shift that element. 41 00:03:11,790 --> 00:03:20,820 >> Now, 16 is greater than 4. So, it looks like we'd like to insert the 16 between the 4 and the 23. 42 00:03:20,820 --> 00:03:24,620 While moving through the sorted portion of the list from right to left, 43 00:03:24,620 --> 00:03:29,160 4 is the first number we've seen that is smaller than the number 44 00:03:29,160 --> 00:03:31,540 we're trying to insert. 45 00:03:31,540 --> 00:03:35,820 So, now we can insert the 16 into this empty slot, 46 00:03:35,820 --> 00:03:40,520 which, remember, we've created by moving elements in the sort portion over 47 00:03:40,520 --> 00:03:43,340 as we've encountered them. 48 00:03:43,340 --> 00:03:47,900 >> All right. Now, we have four sorted elements and two unsorted elements. 49 00:03:47,900 --> 00:03:51,600 So, let's move the 8 into the sorted portion. 50 00:03:51,600 --> 00:03:55,010 Eight is less than 42. 51 00:03:55,010 --> 00:03:56,940 Eight is less than 23. 52 00:03:56,940 --> 00:04:00,230 And 8 is less than 16. 53 00:04:00,230 --> 00:04:03,110 But 8 is greater than 4. 54 00:04:03,110 --> 00:04:07,280 So, we'd like to insert the 8 between the 4 and the 16. 55 00:04:09,070 --> 00:04:13,650 And now we just have one more element left to sort--the 15. 56 00:04:13,950 --> 00:04:16,589 Fifteen is less than 42, 57 00:04:16,589 --> 00:04:19,130 Fifteen is less than 23. 58 00:04:19,130 --> 00:04:21,750 And 15 is less than 16. 59 00:04:21,750 --> 00:04:24,810 But 15 is greater than 8. 60 00:04:24,810 --> 00:04:27,440 >> So, here is where we want to make our final insertion. 61 00:04:28,770 --> 00:04:30,330 And we're done. 62 00:04:30,330 --> 00:04:33,540 We have no more elements in the unsorted portion, 63 00:04:33,540 --> 00:04:36,670 and our sorted portion is in the correct order. 64 00:04:36,670 --> 00:04:40,270 The numbers are ordered from smallest to largest. 65 00:04:40,270 --> 00:04:44,330 So, let's take a look at some pseudocode that describes the steps we just performed. 66 00:04:46,760 --> 00:04:51,740 >> On line 1, we can see that we'll need to iterate over each element in the list 67 00:04:51,740 --> 00:04:57,060 except the first, since the first element in the unsorted portion will simply become 68 00:04:57,060 --> 00:05:00,220 the first element in the sorted portion. 69 00:05:00,220 --> 00:05:06,320 On lines 2 and 3, we're keeping track of our current place in the unsorted portion. 70 00:05:06,320 --> 00:05:10,450 Element represents the number we are currently moving into the sorted portion, 71 00:05:10,450 --> 00:05:15,600 and j represents our index into the unsorted portion. 72 00:05:15,600 --> 00:05:20,980 >> On line 4, we're iterating through the sorted portion from right to left. 73 00:05:20,980 --> 00:05:26,010 We want to stop iterating once the element to the left of our current position 74 00:05:26,010 --> 00:05:30,050 is less than the element we're trying to insert. 75 00:05:30,050 --> 00:05:35,600 On line 5, we're shifting each element we encounter one space to the right. 76 00:05:35,600 --> 00:05:40,950 That way, we'll have a clear space to insert into when we find the first element 77 00:05:40,950 --> 00:05:44,080 less than the element we're moving. 78 00:05:44,080 --> 00:05:50,800 On line 6, we're updating our counter to continue to move left through the sorted portion. 79 00:05:50,800 --> 00:05:56,860 Finally, on line 7, we're inserting the element into the sorted portion of the list. 80 00:05:56,860 --> 00:06:00,020 >> We know that it's okay to insert into position j, 81 00:06:00,020 --> 00:06:05,360 because we've already moved the element that used to be there one space to the right. 82 00:06:05,360 --> 00:06:09,460 Remember, we're moving through the sorted portion from right to left, 83 00:06:09,460 --> 00:06:13,960 but we're moving through the unsorted portion from left to right. 84 00:06:13,960 --> 00:06:19,090 All right. Let's now take a look at how long running that algorithm took. 85 00:06:19,090 --> 00:06:25,300 We'll first ask the question how long does it take for this algorithm to run in the worst case. 86 00:06:25,300 --> 00:06:29,040 Recall that we represent this running time with Big O notation. 87 00:06:32,530 --> 00:06:38,500 In order to sort our list, we had to iterate over the elements in the unsorted portion, 88 00:06:38,500 --> 00:06:43,430 and for each of those elements, potentially over all elements in the sorted portion. 89 00:06:43,430 --> 00:06:47,950 Intuitively, this sounds like an O(n^2) operation. 90 00:06:47,950 --> 00:06:51,840 >> Looking at our pseudocode, we have a loop nested inside another loop, 91 00:06:51,840 --> 00:06:55,290 which, indeed, sounds like an O(n^2) operation. 92 00:06:55,290 --> 00:07:01,590 However, the sorted portion of list didn't contain the entire list until the very end. 93 00:07:01,590 --> 00:07:06,920 Still, we could potentially insert a new element at the very beginning of the sorted portion 94 00:07:06,920 --> 00:07:09,320 on every iteration of the algorithm, 95 00:07:09,320 --> 00:07:14,500 which means that we'd have to look at each element currently in the sorted portion. 96 00:07:14,500 --> 00:07:20,090 So, that means we could potentially make one comparison for the second element, 97 00:07:20,090 --> 00:07:24,660 two comparisons for the third element, and so on. 98 00:07:24,660 --> 00:07:32,480 So, the total number of steps is the sum of the integers from 1 to the length of the list minus 1. 99 00:07:32,480 --> 00:07:35,240 We can represent this with a summation. 100 00:07:41,170 --> 00:07:47,270 >> We won't go into summations here, but it turns out that this summation is equal to 101 00:07:47,270 --> 00:07:57,900 n (n - 1) over 2, which is equivalent n^2/2 - n/2. 102 00:07:57,900 --> 00:08:00,800 When talking about asymptotic runtime, 103 00:08:00,800 --> 00:08:05,170 this n^2 term is going to dominate this n term. 104 00:08:05,170 --> 00:08:11,430 So, insertion sort is Big O(n^2). 105 00:08:11,430 --> 00:08:16,150 What if we ran insertion sort on an already sorted list. 106 00:08:16,150 --> 00:08:20,960 In that case, we'd simply build up the sorted portion from left to right. 107 00:08:20,960 --> 00:08:25,050 So, we'll only need on the order of n steps. 108 00:08:25,050 --> 00:08:29,740 >> That means that insertion sort has a best-case performance of n, 109 00:08:29,740 --> 00:08:34,130 which we represent with Ω(n). 110 00:08:34,130 --> 00:08:36,190 And that's it for insertion sort, 111 00:08:36,190 --> 00:08:40,429 just one of the many algorithms we can use to sort a list. 112 00:08:40,429 --> 00:08:43,210 My name is Tommy, and this is CS50. 113 00:08:43,210 --> 00:08:44,880 [CS50.TV] 114 00:08:46,110 --> 00:08:49,230 Oh, you just can't stop that once it starts. 115 00:09:01,620 --> 00:09:04,760 Oh, we did that-- >>Boom!