1 00:00:00,000 --> 00:00:00,510 2 00:00:00,510 --> 00:00:02,340 BRIAN YU: In this lab, your task is going 3 00:00:02,340 --> 00:00:06,210 to be to determine which sorting algorithm various different programs 4 00:00:06,210 --> 00:00:07,380 use. 5 00:00:07,380 --> 00:00:10,850 We've been introduced, now, to three different sorting algorithms-- 6 00:00:10,850 --> 00:00:14,430 selection sort, bubble sort, and merge sort. 7 00:00:14,430 --> 00:00:17,400 Recall that in selection sort, the way the algorithm works 8 00:00:17,400 --> 00:00:21,060 is by taking an array of numbers and repeating some process-- 9 00:00:21,060 --> 00:00:23,940 the process of looking through that entire array of numbers 10 00:00:23,940 --> 00:00:27,540 to try and find the smallest item in that entire array, 11 00:00:27,540 --> 00:00:31,037 and then bringing that smallest item to the beginning of the array. 12 00:00:31,037 --> 00:00:33,120 Then we search through the remainder of the array, 13 00:00:33,120 --> 00:00:35,910 looking for the next smallest item, and when we find it, 14 00:00:35,910 --> 00:00:39,300 we bring that next smallest item back to the beginning of the array, 15 00:00:39,300 --> 00:00:42,990 and we repeat that process, continually looking through the remaining 16 00:00:42,990 --> 00:00:46,470 unsorted portion of the array, finding the smallest element, 17 00:00:46,470 --> 00:00:49,780 and swapping it with the element in the next position. 18 00:00:49,780 --> 00:00:52,380 Once we repeat that process through the entire array, 19 00:00:52,380 --> 00:00:55,440 we'll end up with a sorted array. 20 00:00:55,440 --> 00:00:58,480 Meanwhile, bubble sort works a little bit differently. 21 00:00:58,480 --> 00:01:03,060 It also makes passes through the array, but it compares two values at a time. 22 00:01:03,060 --> 00:01:05,519 It looks at every pair of values and tries 23 00:01:05,519 --> 00:01:08,970 to determine whether that pair of numbers is in the correct order. 24 00:01:08,970 --> 00:01:12,750 That is to say, if you're trying to sort from smallest to largest, making sure 25 00:01:12,750 --> 00:01:15,900 that for any pair of two numbers, the smaller number is 26 00:01:15,900 --> 00:01:18,165 to the left of the larger number. 27 00:01:18,165 --> 00:01:20,790 And what bubbles sort does is that if it finds two numbers that 28 00:01:20,790 --> 00:01:23,040 are out of order, then it swaps them. 29 00:01:23,040 --> 00:01:25,440 And bubble sort will continually go through this process 30 00:01:25,440 --> 00:01:29,580 of considering pairs of values and swapping them if necessary, until it 31 00:01:29,580 --> 00:01:32,110 gets to the end of the array. 32 00:01:32,110 --> 00:01:35,510 Finally, merge sort works fundamentally differently. 33 00:01:35,510 --> 00:01:40,060 It works by taking an array and dividing it into a left half and a right half, 34 00:01:40,060 --> 00:01:43,630 and sorting each of those halves first, and after we've recursively 35 00:01:43,630 --> 00:01:47,260 sorted each of those halves, we merge those halves back together. 36 00:01:47,260 --> 00:01:50,140 By repeating this process recursively again and again, 37 00:01:50,140 --> 00:01:54,240 we can build up an entire sorted array very quickly. 38 00:01:54,240 --> 00:01:58,110 Each of these algorithms can be analyzed in terms of their running time, 39 00:01:58,110 --> 00:01:59,640 in terms of big O-- 40 00:01:59,640 --> 00:02:03,530 the upper bound on the number of steps required to complete the sort-- 41 00:02:03,530 --> 00:02:05,310 and as well as big omega-- 42 00:02:05,310 --> 00:02:09,419 the lower bound on the number of steps required to complete the sort. 43 00:02:09,419 --> 00:02:13,410 In this case, we can see that for a selection sort, it has both a big O 44 00:02:13,410 --> 00:02:16,050 and a big omega of n squared. 45 00:02:16,050 --> 00:02:19,095 That is to say that if there are n numbers that we are trying to sort, 46 00:02:19,095 --> 00:02:23,910 it might take about n squared steps in order to actually complete that sorting 47 00:02:23,910 --> 00:02:24,810 algorithm. 48 00:02:24,810 --> 00:02:28,410 Bubble sort, meanwhile, also has a big O of n squared. 49 00:02:28,410 --> 00:02:32,385 As an upper bound, it still might take on the order of n squared steps to sort 50 00:02:32,385 --> 00:02:36,180 n numbers, but it has a big omega of n. 51 00:02:36,180 --> 00:02:38,520 That is to say, if we were to get lucky and we 52 00:02:38,520 --> 00:02:41,760 were given an array that's already sorted, for example, 53 00:02:41,760 --> 00:02:45,570 bubble sort could sort n items and using just n steps 54 00:02:45,570 --> 00:02:48,270 by making one pass through the array, concluding 55 00:02:48,270 --> 00:02:52,080 that no swaps are necessary, and then not continuing on with anything else, 56 00:02:52,080 --> 00:02:54,390 because the array is already sorted. 57 00:02:54,390 --> 00:02:59,850 Meanwhile, merge sort has a running time of big O of n log n, and big omega of n 58 00:02:59,850 --> 00:03:04,005 log n as well, meaning that if there are n numbers that we are trying to sort, 59 00:03:04,005 --> 00:03:08,940 this algorithm is going to take on the order of n times log n steps 60 00:03:08,940 --> 00:03:11,670 in order to sort all of those numbers. 61 00:03:11,670 --> 00:03:16,680 That's better than n squared, though not nearly as good as big omega of n, 62 00:03:16,680 --> 00:03:18,380 for example. 63 00:03:18,380 --> 00:03:22,460 Your task now, in this lab, is that we are going to give you three programs-- 64 00:03:22,460 --> 00:03:25,550 sort1, sort2, and sort3. 65 00:03:25,550 --> 00:03:27,290 One of them uses selection sort. 66 00:03:27,290 --> 00:03:28,910 One of them uses bubble sort. 67 00:03:28,910 --> 00:03:32,810 And one of them uses merge sort, but we're not telling you which is which. 68 00:03:32,810 --> 00:03:35,310 Your task is going to be to run these sorting algorithm 69 00:03:35,310 --> 00:03:37,760 programs on various different inputs-- 70 00:03:37,760 --> 00:03:41,020 lists of numbers of different sizes and different orders-- 71 00:03:41,020 --> 00:03:46,800 and to try to determine which algorithm corresponds to which program. 72 00:03:46,800 --> 00:03:49,320 The files we give you are going to be these. 73 00:03:49,320 --> 00:03:52,320 We'll give you a file called random5000.txt, 74 00:03:52,320 --> 00:03:55,740 for example, which will be a text file that 75 00:03:55,740 --> 00:03:58,980 contains 5,000 numbers in random order. 76 00:03:58,980 --> 00:04:03,030 Likewise, random10000.txt and random50000.txt 77 00:04:03,030 --> 00:04:08,070 will contain 10,000 50,000 numbers, respectively, all in random order. 78 00:04:08,070 --> 00:04:10,140 In addition to files in random order, we'll 79 00:04:10,140 --> 00:04:14,760 also give you files called reversed5000.txt and reverse10000.txt, 80 00:04:14,760 --> 00:04:18,540 and so forth, each of which will also contain numbers, but this time 81 00:04:18,540 --> 00:04:21,930 in reverse order, from largest to smallest. 82 00:04:21,930 --> 00:04:26,520 And finally, we also give you sorted5000 and sorted10000.txt, 83 00:04:26,520 --> 00:04:30,390 which again will be text files that just contain numbers, one on each line, 84 00:04:30,390 --> 00:04:33,720 but this time the numbers will already be in sorted order 85 00:04:33,720 --> 00:04:36,000 from smallest to largest. 86 00:04:36,000 --> 00:04:39,540 Let's take a look at what you might do with these files. 87 00:04:39,540 --> 00:04:42,592 Here's the distribution code for the sort lab. 88 00:04:42,592 --> 00:04:44,550 You'll notice that we have our three programs-- 89 00:04:44,550 --> 00:04:47,550 sort1, sort2, and sort3-- 90 00:04:47,550 --> 00:04:49,070 and we also have these files. 91 00:04:49,070 --> 00:04:54,120 Random5000.txt, for example, is going to be a file with 5,000 lines, 92 00:04:54,120 --> 00:04:58,170 each with numbers, where those numbers are not in any particular order. 93 00:04:58,170 --> 00:05:01,830 Meanwhile, reversed5000.txt is going to be a text file that 94 00:05:01,830 --> 00:05:05,700 also has 5,000 numbers, but this time they're in reverse order, 95 00:05:05,700 --> 00:05:08,070 from largest to smallest. 96 00:05:08,070 --> 00:05:11,640 And finally, files like sorted5000.txt will give you 97 00:05:11,640 --> 00:05:15,690 5,000 lines of numbers, where they're in sorted order in ascending order 98 00:05:15,690 --> 00:05:18,610 from smallest, all the way to largest. 99 00:05:18,610 --> 00:05:24,170 You can then use these sorting programs-- sort1, sort2, and sort3-- 100 00:05:24,170 --> 00:05:26,450 to try to see how you would go about sorting, 101 00:05:26,450 --> 00:05:29,750 and how long it might take to sort any of these files. 102 00:05:29,750 --> 00:05:38,610 For example, I might run ./sort1 on, let's say, random5000.txt. 103 00:05:38,610 --> 00:05:41,250 And what you'll see when I run that program is the program 104 00:05:41,250 --> 00:05:44,220 will sort those numbers using some sorting algorithm, 105 00:05:44,220 --> 00:05:48,330 and then print those numbers in sorted order from smallest to largest. 106 00:05:48,330 --> 00:05:50,190 Now, how long did that take? 107 00:05:50,190 --> 00:05:53,040 It turns out there's also a program called time 108 00:05:53,040 --> 00:05:57,490 that can time how long it takes for some other program to run. 109 00:05:57,490 --> 00:06:05,010 So I might say time ./sort1, and then random5000.txt, and if I run that, 110 00:06:05,010 --> 00:06:08,430 after everything is sorted, I'll see that the amount of time that took 111 00:06:08,430 --> 00:06:13,440 in real time was 0.168 seconds. 112 00:06:13,440 --> 00:06:16,210 And depending on which sorting program I use, 113 00:06:16,210 --> 00:06:19,920 and depending on what file I run that sorting algorithm on, and the number 114 00:06:19,920 --> 00:06:24,360 of values that are in that file, as well as what order that file is already in, 115 00:06:24,360 --> 00:06:26,910 I might get different values for the amount of time 116 00:06:26,910 --> 00:06:30,720 it takes to sort and print out all of those numbers. 117 00:06:30,720 --> 00:06:33,000 Using that information, your challenge is 118 00:06:33,000 --> 00:06:36,570 to solve that puzzle, to figure out which algorithm does sort1 use, 119 00:06:36,570 --> 00:06:42,240 and which algorithm does sort2 use, and which algorithm does sort3 use? 120 00:06:42,240 --> 00:06:48,420 Again, you can test any of these sorting programs by running ./sort1 or ./sort2 121 00:06:48,420 --> 00:06:52,660 sort3 on a text file that's provided as a command line argument. 122 00:06:52,660 --> 00:06:58,260 For example, in this case, we're running sort1 on reversed10000.txt. 123 00:06:58,260 --> 00:07:02,040 And to time how long any of these commands take, all you need to do 124 00:07:02,040 --> 00:07:05,700 is prepend the time command in front of whatever command 125 00:07:05,700 --> 00:07:07,560 you would otherwise want to run. 126 00:07:07,560 --> 00:07:11,985 For example, here we're timing how long it takes to use sort1 to sort 127 00:07:11,985 --> 00:07:17,300 and print out all of the values in reversed10000.txt. 128 00:07:17,300 --> 00:07:20,120 Ultimately, what you'll need to do is test and time 129 00:07:20,120 --> 00:07:24,260 each of these programs on some of the sample files that we give you. 130 00:07:24,260 --> 00:07:26,600 Don't have to test it on all of them, but you'll likely 131 00:07:26,600 --> 00:07:31,100 want to test on at least a few to get a sense for how these programs behave 132 00:07:31,100 --> 00:07:34,700 depending on the size of the input, and depending on how well-ordered 133 00:07:34,700 --> 00:07:37,550 the values inside of that text file already are. 134 00:07:37,550 --> 00:07:39,620 And using that information, you should be 135 00:07:39,620 --> 00:07:43,790 able to determine which program corresponds to which sorting algorithm 136 00:07:43,790 --> 00:07:46,640 by taking advantage of what you know about the running 137 00:07:46,640 --> 00:07:48,710 times of those sorting algorithms. 138 00:07:48,710 --> 00:07:53,390 If one sorting algorithm has a much better big omega then it does big O, 139 00:07:53,390 --> 00:07:57,050 then you might expect that it would be much faster on a text file that's 140 00:07:57,050 --> 00:07:58,890 already sorted, for example. 141 00:07:58,890 --> 00:08:01,760 And you might examine what happens with each of these programs 142 00:08:01,760 --> 00:08:05,060 as you try to run them on larger and larger inputs of inputs that 143 00:08:05,060 --> 00:08:10,160 have more and more values to see how long that takes in comparison to files 144 00:08:10,160 --> 00:08:11,780 that have fewer values. 145 00:08:11,780 --> 00:08:14,610 Using all of that as your clues and information, 146 00:08:14,610 --> 00:08:17,090 you should be able to piece together and draw conclusions 147 00:08:17,090 --> 00:08:21,650 about which of these sorting algorithms we're using in each of those programs. 148 00:08:21,650 --> 00:08:25,030 My name is Brian, and this was sort. 149 00:08:25,030 --> 00:08:26,000