1 00:00:00,000 --> 00:00:11,270 2 00:00:11,270 --> 00:00:14,910 >> SPEAKER: All right, this is CS50. 3 00:00:14,910 --> 00:00:19,020 This is the end of week three, and if you haven't taken advantage already, 4 00:00:19,020 --> 00:00:21,790 know that there will be lunch this Friday as usual, where 5 00:00:21,790 --> 00:00:25,430 you can enjoy good conversation and food at Fire and Ice 6 00:00:25,430 --> 00:00:27,980 with some of CS50's staff and classmates. 7 00:00:27,980 --> 00:00:30,170 Head to this URL here. 8 00:00:30,170 --> 00:00:33,420 >> Now you may recall, or you may soon be acquainted with, 9 00:00:33,420 --> 00:00:35,970 these things here, which are given out at the end 10 00:00:35,970 --> 00:00:37,850 of the semester for many classes. 11 00:00:37,850 --> 00:00:40,870 So-called exam blue books, in which you write your answers to exams. 12 00:00:40,870 --> 00:00:44,240 Now I have here 26 such blue books, on each of them 13 00:00:44,240 --> 00:00:47,580 is written a name, A through Z. And indeed the names are that simple, A 14 00:00:47,580 --> 00:00:50,490 through Z. And one of the goals at hand today 15 00:00:50,490 --> 00:00:53,910 is going to be to continue what we started on Monday, which is not 16 00:00:53,910 --> 00:00:57,830 so much looking at code, but really looking at ideas and problem solving. 17 00:00:57,830 --> 00:01:00,170 One of the goals and promises of this course 18 00:01:00,170 --> 00:01:02,985 is to teach you to think more carefully, more methodically, 19 00:01:02,985 --> 00:01:05,400 and to solve problems more efficiently. 20 00:01:05,400 --> 00:01:09,526 And indeed, we can do that really without even touching a line of code. 21 00:01:09,526 --> 00:01:12,150 So I have a couple of elephants up here today, orange and blue, 22 00:01:12,150 --> 00:01:15,780 if we could get one volunteer, maybe from farther back than usual. 23 00:01:15,780 --> 00:01:18,070 How about right there, come on down. 24 00:01:18,070 --> 00:01:24,180 The goal of which is going to be to help plus administer this exam here. 25 00:01:24,180 --> 00:01:24,935 What's your name? 26 00:01:24,935 --> 00:01:25,768 >> AUDIENCE: Mary Beth. 27 00:01:25,768 --> 00:01:27,560 SPEAKER: Mary Beth, come on up. 28 00:01:27,560 --> 00:01:29,560 Let me get the microphone here for you. 29 00:01:29,560 --> 00:01:32,172 30 00:01:32,172 --> 00:01:32,880 Nice to meet you. 31 00:01:32,880 --> 00:01:34,005 >> AUDIENCE: Nice to meet you. 32 00:01:34,005 --> 00:01:36,790 SPEAKER: All right, so I have here blue books A through Z, 33 00:01:36,790 --> 00:01:41,680 and I'm going to pretend that I have one of the students, 34 00:01:41,680 --> 00:01:45,770 and they're coming in somewhat randomly at the end of a three hour exam block, 35 00:01:45,770 --> 00:01:49,400 so they're ending up in some semi-random order like this. 36 00:01:49,400 --> 00:01:54,510 Now your job in just a moment is going to be-- this is actually how they get 37 00:01:54,510 --> 00:01:56,820 turned in at the end of the class, most likely. 38 00:01:56,820 --> 00:02:01,120 Your job now is going to be, quite simply, to sort these blue books for us 39 00:02:01,120 --> 00:02:05,220 from A through Z. 40 00:02:05,220 --> 00:02:08,400 >> AUDIENCE: Oh, this is going to take forever. 41 00:02:08,400 --> 00:02:13,747 >> SPEAKER: And we will watch as you do this, no pressure. 42 00:02:13,747 --> 00:02:15,330 AUDIENCE: No, no pressure or anything. 43 00:02:15,330 --> 00:02:19,230 44 00:02:19,230 --> 00:02:23,570 >> SPEAKER: And for fun, let's put up a timer. 45 00:02:23,570 --> 00:02:26,680 46 00:02:26,680 --> 00:02:28,700 >> AUDIENCE: So much fun, so much fun. 47 00:02:28,700 --> 00:02:36,741 48 00:02:36,741 --> 00:02:38,574 >> SPEAKER: I can hold the mic for you. 49 00:02:38,574 --> 00:02:40,240 All right, we've just doubled our speed. 50 00:02:40,240 --> 00:02:44,190 51 00:02:44,190 --> 00:02:49,060 So in the meantime, let me pose what's going to be the question for Mary Beth 52 00:02:49,060 --> 00:02:51,540 is what is she doing, how is she going about solving this? 53 00:02:51,540 --> 00:02:54,040 And in fact, you might not have ever thought about something 54 00:02:54,040 --> 00:02:57,440 so simple as when you pick up 26 books like this, 55 00:02:57,440 --> 00:02:59,350 which do have a natural ordering to them. 56 00:02:59,350 --> 00:03:01,335 What is the process that you actually use? 57 00:03:01,335 --> 00:03:03,770 Is it fairly random just picking the first one you see 58 00:03:03,770 --> 00:03:05,250 and putting it in its place? 59 00:03:05,250 --> 00:03:09,680 Do you first move your hands around looking for A then looking for B? 60 00:03:09,680 --> 00:03:11,722 Do you take a look at a pair of them side by side 61 00:03:11,722 --> 00:03:14,680 and just say, wait a minute, this isn't right, and then swap the order? 62 00:03:14,680 --> 00:03:16,960 We saw already on Monday that there's a number of ways 63 00:03:16,960 --> 00:03:22,140 in which we can do this, and indeed as we near the end here, 64 00:03:22,140 --> 00:03:26,360 I would take note perhaps of what Mary Beth is doing. 65 00:03:26,360 --> 00:03:30,040 We have a few piles it seems, a bigger one, three smaller ones. 66 00:03:30,040 --> 00:03:33,790 67 00:03:33,790 --> 00:03:36,415 >> AUDIENCE: I'm ordering them when I find two letters 68 00:03:36,415 --> 00:03:39,540 that I know are together in a sequence, I put them together so that I don't 69 00:03:39,540 --> 00:03:42,915 have to worry about keeping track of a whole row of books. 70 00:03:42,915 --> 00:03:45,706 It's just, oh, A is first, I've got this stack here. 71 00:03:45,706 --> 00:03:47,580 SPEAKER: So, almost like a puzzle pieces that 72 00:03:47,580 --> 00:03:49,860 have the right shape to match up with each other. 73 00:03:49,860 --> 00:03:51,026 AUDIENCE: Pretty much, yeah. 74 00:03:51,026 --> 00:03:55,320 SPEAKER: OK, excellent. 75 00:03:55,320 --> 00:03:59,850 And now each of these piles is presumably sorted? 76 00:03:59,850 --> 00:04:00,990 >> AUDIENCE: Yeah. 77 00:04:00,990 --> 00:04:09,900 >> SPEAKER: All right, A through Z. All right, congratulations, you did it. 78 00:04:09,900 --> 00:04:11,461 You have your choice. 79 00:04:11,461 --> 00:04:11,960 Blue? 80 00:04:11,960 --> 00:04:13,530 All right, thank you for that. 81 00:04:13,530 --> 00:04:16,679 So Mary Beth did propose what her approach was, 82 00:04:16,679 --> 00:04:19,720 but what is another approach how you might go about sorting these things? 83 00:04:19,720 --> 00:04:21,130 What would you have done? 84 00:04:21,130 --> 00:04:24,060 The record to beat would have been one minute and 50 or so seconds, 85 00:04:24,060 --> 00:04:26,039 plus the ones I forgot to count. 86 00:04:26,039 --> 00:04:27,080 What would you have done? 87 00:04:27,080 --> 00:04:27,579 Yeah? 88 00:04:27,579 --> 00:04:28,735 AUDIENCE: Take the stack. 89 00:04:28,735 --> 00:04:29,776 Start from the beginning. 90 00:04:29,776 --> 00:04:32,284 Check your papers. 91 00:04:32,284 --> 00:04:36,586 And if the top one is higher than, maybe, they are, 92 00:04:36,586 --> 00:04:38,980 the bottom one is higher, then switch them. 93 00:04:38,980 --> 00:04:41,300 >> SPEAKER: OK, so starting at the top and the bottom, 94 00:04:41,300 --> 00:04:43,716 and then working your way inward like that, swapping them? 95 00:04:43,716 --> 00:04:46,580 OK, so a little similar in spirit to bubble sort, 96 00:04:46,580 --> 00:04:49,160 but choosing the extremes not the adjacent pairs. 97 00:04:49,160 --> 00:04:52,080 But the short of it is that there's surely a bunch of different ways 98 00:04:52,080 --> 00:04:54,210 we could do this, and frankly, I think you kind of 99 00:04:54,210 --> 00:04:55,700 adopted a couple approaches, right? 100 00:04:55,700 --> 00:05:00,567 You made sort of four sorted piles, and then effectively merged them together. 101 00:05:00,567 --> 00:05:02,650 And that's, daresay, another technique altogether. 102 00:05:02,650 --> 00:05:06,950 You didn't treat it as one big pile, you divided the problem into four quads, 103 00:05:06,950 --> 00:05:09,820 if you will, and then somehow merged them in the end. 104 00:05:09,820 --> 00:05:13,410 >> So let's consider, ultimately, how else we might do this. 105 00:05:13,410 --> 00:05:15,860 We formalized the notion of bubble sort last time, 106 00:05:15,860 --> 00:05:18,780 and bubble sort recall was an algorithm that we visualized 107 00:05:18,780 --> 00:05:22,640 with eight of your classmates up here, seemingly randomly sorted at first. 108 00:05:22,640 --> 00:05:26,110 And we then decided pairwise, if two elements are out of order, 109 00:05:26,110 --> 00:05:26,950 simply swap them. 110 00:05:26,950 --> 00:05:28,930 So four and two are obviously out of order, 111 00:05:28,930 --> 00:05:31,080 so those two classmates switched positions. 112 00:05:31,080 --> 00:05:35,390 And then we repeated with four and six, then six and eight, on each iteration, 113 00:05:35,390 --> 00:05:36,980 moving to the right. 114 00:05:36,980 --> 00:05:42,590 >> So given eight people, how many pairwise comparisons did I do while walking from 115 00:05:42,590 --> 00:05:45,220 left to right in one such iteration? 116 00:05:45,220 --> 00:05:48,410 How many comparisons? 117 00:05:48,410 --> 00:05:49,197 Seven, right? 118 00:05:49,197 --> 00:05:51,405 Because if there's eight people but you have the pair 119 00:05:51,405 --> 00:05:53,880 them and you keep moving one hop to the right, 120 00:05:53,880 --> 00:05:56,060 you're not going to have eight comparisons because you can't compare 121 00:05:56,060 --> 00:05:59,226 an element against itself, or it would just be pointless, so you have seven. 122 00:05:59,226 --> 00:06:01,290 Or more generally, if we have n people, we 123 00:06:01,290 --> 00:06:04,300 do n minus 1 comparisons with bubble sort. 124 00:06:04,300 --> 00:06:08,150 >> So let's consider now how good or bad bubble sort actually was, and try 125 00:06:08,150 --> 00:06:13,570 to give ourselves vocabulary with which to critique algorithms like this, 126 00:06:13,570 --> 00:06:14,430 and soon our own. 127 00:06:14,430 --> 00:06:16,970 So the first pass through bubble sort, the first time 128 00:06:16,970 --> 00:06:20,909 I walked from left to right across the stage, took me n minus 1 comparisons. 129 00:06:20,909 --> 00:06:22,950 And that's going to be my unit of measure, right? 130 00:06:22,950 --> 00:06:26,170 I was kind of talking and strolling, somewhat fast, somewhat slow, 131 00:06:26,170 --> 00:06:29,300 so counting my number of seconds isn't particularly telling, 132 00:06:29,300 --> 00:06:32,260 but counting the number of operations that I did on Monday, 133 00:06:32,260 --> 00:06:35,900 comparing two people, that feels like a nice unit of measure. 134 00:06:35,900 --> 00:06:40,980 >> So n minus 1 steps the first time, but then what happened after that? 135 00:06:40,980 --> 00:06:46,610 What's the one upside of one pass through an otherwise unsorted list? 136 00:06:46,610 --> 00:06:49,840 What can you tell me about the element who was all the way over there? 137 00:06:49,840 --> 00:06:51,300 Yeah? 138 00:06:51,300 --> 00:06:52,870 That was the biggest element, right? 139 00:06:52,870 --> 00:06:55,710 Number eight, even though she started here, every time I 140 00:06:55,710 --> 00:06:57,860 compared her against a neighbor, she kept 141 00:06:57,860 --> 00:07:00,480 bubbling up to the right hand side of the list. 142 00:07:00,480 --> 00:07:02,710 And indeed, that's where the algorithm gets its name. 143 00:07:02,710 --> 00:07:07,630 >> Now by that logic, how many comparisons need I make on the second time 144 00:07:07,630 --> 00:07:09,800 I make that pass from left to right? 145 00:07:09,800 --> 00:07:10,730 n minus 2, right? 146 00:07:10,730 --> 00:07:14,297 It would just be wasting my time if I keep comparing eight against someone 147 00:07:14,297 --> 00:07:16,630 else because we already know she was in the right place. 148 00:07:16,630 --> 00:07:19,760 So that's a bit of an optimization, so the next pass 149 00:07:19,760 --> 00:07:23,899 is going to be plus n minus two steps, where n is the number of people. 150 00:07:23,899 --> 00:07:26,940 Now you can kind of extrapolate, even if you're not a computer scientist, 151 00:07:26,940 --> 00:07:27,680 how this ends. 152 00:07:27,680 --> 00:07:31,259 At the end of this algorithm, presumably you've got just one comparison left. 153 00:07:31,259 --> 00:07:33,800 You have to kind of fix the beginning of the list in case two 154 00:07:33,800 --> 00:07:36,540 and one are out of order and should be one and two, 155 00:07:36,540 --> 00:07:40,330 so this bottoms out at plus 1 final comparison. 156 00:07:40,330 --> 00:07:44,500 >> Now the dot, dot, dot kind of waves it's hands at some of the juicier details, 157 00:07:44,500 --> 00:07:46,452 but let's just go ahead and simplify. 158 00:07:46,452 --> 00:07:48,660 If you recall from high school, frankly, a lot of you 159 00:07:48,660 --> 00:07:50,340 had math books that had a little cheat sheet 160 00:07:50,340 --> 00:07:52,550 on the front cover or the back cover that showed you 161 00:07:52,550 --> 00:07:56,400 what series summations like this ultimately added up to. 162 00:07:56,400 --> 00:07:59,600 In the general case, if you have a variable like n, and indeed this one, 163 00:07:59,600 --> 00:08:01,634 if you looked at your old school math book, 164 00:08:01,634 --> 00:08:04,050 you would see that this actually adds up to this sum here, 165 00:08:04,050 --> 00:08:07,970 n times n minus 1 all divided by 2. 166 00:08:07,970 --> 00:08:11,172 So for now let me just stipulate this is true, so on a leap of faith, 167 00:08:11,172 --> 00:08:12,880 that's what this sums up to, and we could 168 00:08:12,880 --> 00:08:14,341 prove that in a more general case. 169 00:08:14,341 --> 00:08:15,590 But now let's expand this out. 170 00:08:15,590 --> 00:08:19,920 So let's multiply this out, so that's n squared, minus n, all divided by 2. 171 00:08:19,920 --> 00:08:23,200 That's really n squared, divided by 2, minus n over 2, 172 00:08:23,200 --> 00:08:25,010 so that's all nice and interesting. 173 00:08:25,010 --> 00:08:27,060 But what happens if we now plug-in a value? 174 00:08:27,060 --> 00:08:29,724 Suppose I didn't have eight people, but say a million. 175 00:08:29,724 --> 00:08:31,890 And a million just because it's a pretty big number, 176 00:08:31,890 --> 00:08:34,039 let's plug that in and see what happens. 177 00:08:34,039 --> 00:08:39,039 So if I plug a million into that formula I'm going to get a million squared, 178 00:08:39,039 --> 00:08:42,868 divided by 2, minus a million, divided by 2. 179 00:08:42,868 --> 00:08:44,159 Now what's that going to equal? 180 00:08:44,159 --> 00:08:47,354 So 500 billion, minus 500,000. 181 00:08:47,354 --> 00:08:49,270 And if I actually do that math out, that means 182 00:08:49,270 --> 00:08:53,920 that sorting a million people with the bubble sort 183 00:08:53,920 --> 00:09:01,800 might take me 499,999,500,000 steps or comparisons in the end, 184 00:09:01,800 --> 00:09:02,900 we're just extrapolating. 185 00:09:02,900 --> 00:09:06,860 >> That feels pretty slow, but frankly measuring one particular input 186 00:09:06,860 --> 00:09:09,160 like this, isn't all that telling. 187 00:09:09,160 --> 00:09:14,050 But indeed it does suggest that as n gets larger and larger, this algorithm 188 00:09:14,050 --> 00:09:16,280 kind of feels worse and worse, or you really 189 00:09:16,280 --> 00:09:20,450 start to feel the pain of that exponentiation, that n squared, 190 00:09:20,450 --> 00:09:21,770 which adds up pretty fast. 191 00:09:21,770 --> 00:09:25,340 And this detail isn't lost on people, in fact 192 00:09:25,340 --> 00:09:29,640 some years ago a certain senator who was campaigning, sat down for an interview 193 00:09:29,640 --> 00:09:32,180 with Google's Eric Schmidt, CEO at the time, 194 00:09:32,180 --> 00:09:36,380 and was challenged with a question much like we're exploring today. 195 00:09:36,380 --> 00:09:38,468 Let's take a look. 196 00:09:38,468 --> 00:09:45,280 >> [VIDEO PLAYBACK] 197 00:09:45,280 --> 00:09:48,560 >> -Senator, you're here at Google, and I like 198 00:09:48,560 --> 00:09:53,382 to think of the presidency as a job interview. 199 00:09:53,382 --> 00:09:56,434 Now, it's hard to get a job as president, 200 00:09:56,434 --> 00:09:58,100 and you're going through the rigors now. 201 00:09:58,100 --> 00:10:01,860 It's also hard to get a job at Google. 202 00:10:01,860 --> 00:10:05,490 We have questions, and we ask our candidates questions, 203 00:10:05,490 --> 00:10:09,770 and this one is from Larry Schwimmer. 204 00:10:09,770 --> 00:10:14,760 What-- you guys think I'm kidding, it's right here. 205 00:10:14,760 --> 00:10:17,930 What is the most efficient way to sort a million 32-bit integers? 206 00:10:17,930 --> 00:10:21,800 207 00:10:21,800 --> 00:10:24,350 >> -Well-- 208 00:10:24,350 --> 00:10:25,200 >> -I'm sorry, maybe-- 209 00:10:25,200 --> 00:10:27,400 >> -No, no, no. 210 00:10:27,400 --> 00:10:30,700 I think the bubble sort would be the wrong way to go. 211 00:10:30,700 --> 00:10:34,165 212 00:10:34,165 --> 00:10:38,180 >> -Come on, who told him this? 213 00:10:38,180 --> 00:10:40,590 I didn't see computer science in your background. 214 00:10:40,590 --> 00:10:42,130 >> -We've got our spies in there. 215 00:10:42,130 --> 00:10:44,930 216 00:10:44,930 --> 00:10:48,444 >> -OK, let's ask a different interview question. 217 00:10:48,444 --> 00:10:49,300 >> [END VIDEO PLAYBACK] 218 00:10:49,300 --> 00:10:52,290 >> SPEAKER: So talking about specific numbers though, 219 00:10:52,290 --> 00:10:53,890 isn't going to be all that useful. 220 00:10:53,890 --> 00:10:56,810 It is not a life lesson that bubble sort, given a million inputs, 221 00:10:56,810 --> 00:10:58,590 might take as many as 500 billion steps. 222 00:10:58,590 --> 00:11:01,120 You can't really generalize too effectively from that 223 00:11:01,120 --> 00:11:03,560 and make good design decisions when writing programs. 224 00:11:03,560 --> 00:11:07,070 So let's focus though on how we might simplify this result. 225 00:11:07,070 --> 00:11:11,780 >> So I've highlighted in yellow here the result of n squared divided by 2, 226 00:11:11,780 --> 00:11:14,330 so a million squared divided by 2, and then 227 00:11:14,330 --> 00:11:16,710 I've highlighted what the ultimate answer was 228 00:11:16,710 --> 00:11:20,180 once we subtracted off n divided by 2. 229 00:11:20,180 --> 00:11:24,850 And the claim I'm going to make now is, who the heck cares if you subtract off 230 00:11:24,850 --> 00:11:30,060 a little old n over 2 when the first part of this formula is so much bigger? 231 00:11:30,060 --> 00:11:33,910 It dominates the other term, n squared divided by 2 232 00:11:33,910 --> 00:11:37,510 is so much bigger, clearly, as n gets large like a million, 233 00:11:37,510 --> 00:11:41,450 that is there really a big difference at the end of the day between 500 billion 234 00:11:41,450 --> 00:11:45,730 and 499,999,500,000? 235 00:11:45,730 --> 00:11:46,349 Not really. 236 00:11:46,349 --> 00:11:48,640 And so what we're going to do as computer scientists is 237 00:11:48,640 --> 00:11:53,270 ignore those lower order terms and take something like this and really 238 00:11:53,270 --> 00:11:56,050 just simplify it to the term that's going to matter. 239 00:11:56,050 --> 00:12:00,315 The bigger our data sets get, the bigger our databases get, the more web pages 240 00:12:00,315 --> 00:12:02,690 we have to search, the more friends you have on Facebook. 241 00:12:02,690 --> 00:12:07,340 >> As n gets larger, we're really going to care about the largest 242 00:12:07,340 --> 00:12:11,560 term in any such analysis of our algorithms performance. 243 00:12:11,560 --> 00:12:16,230 And I'm going to say, you know what, bubble sort is on the order of big O, 244 00:12:16,230 --> 00:12:18,060 on the order of n squared. 245 00:12:18,060 --> 00:12:20,090 It's not exactly n squared as we've seen, 246 00:12:20,090 --> 00:12:22,060 but who really cares about those smaller terms, 247 00:12:22,060 --> 00:12:24,390 and frankly, who really cares if we divide by 2? 248 00:12:24,390 --> 00:12:25,870 That's just a constant factor. 249 00:12:25,870 --> 00:12:29,480 And is 500 billion versus 250 billion really that big of a deal? 250 00:12:29,480 --> 00:12:32,190 I could just wait one year, let my laptop literally 251 00:12:32,190 --> 00:12:34,810 get twice as fast in hardware, and that sort of difference 252 00:12:34,810 --> 00:12:36,650 just goes away naturally over time. 253 00:12:36,650 --> 00:12:39,300 >> What we care about is the expression, the part 254 00:12:39,300 --> 00:12:42,489 of the expression that's going to vary as our input gets bigger and bigger. 255 00:12:42,489 --> 00:12:45,280 And indeed, in the real world, that's what's happening increasingly 256 00:12:45,280 --> 00:12:48,330 is the inputs to our problems and algorithms are getting bigger. 257 00:12:48,330 --> 00:12:53,470 So big O is going to be the notation, the asymptotic notation, that we just 258 00:12:53,470 --> 00:12:57,160 use as computer scientists to describe the performance, or the running time, 259 00:12:57,160 --> 00:12:58,130 of an algorithm. 260 00:12:58,130 --> 00:13:00,800 So that we can compare algorithms on different computers written 261 00:13:00,800 --> 00:13:04,170 by different people, by using some fundamentally similar metric 262 00:13:04,170 --> 00:13:07,557 like the number of comparisons you're making, or maybe the number of swaps 263 00:13:07,557 --> 00:13:08,140 you're making. 264 00:13:08,140 --> 00:13:11,910 >> What we're not going to count is the amount of time 265 00:13:11,910 --> 00:13:13,981 that passes on the clock on the wall typically. 266 00:13:13,981 --> 00:13:16,230 What we're not going to worry about is how much memory 267 00:13:16,230 --> 00:13:17,820 you're using today at least, though that's 268 00:13:17,820 --> 00:13:19,370 another resource we might measure. 269 00:13:19,370 --> 00:13:23,610 We're going to try to base our analyses on just the basic operations, the ones, 270 00:13:23,610 --> 00:13:25,930 frankly, that you can see most visually. 271 00:13:25,930 --> 00:13:30,700 So with something like big O of n squared, I claim that O of n squared 272 00:13:30,700 --> 00:13:35,820 is an upper bound on the so-called running time of bubble sort. 273 00:13:35,820 --> 00:13:38,820 In other words, if you wanted to claim that there's 274 00:13:38,820 --> 00:13:41,370 this upper limit on how many steps an algorithm might take, 275 00:13:41,370 --> 00:13:46,240 it's going to be in the big O of n squared in this case, an upper bound. 276 00:13:46,240 --> 00:13:49,710 >> What if I instead change the story to be not about bubble sort, 277 00:13:49,710 --> 00:13:50,910 but about this upper bound. 278 00:13:50,910 --> 00:13:54,030 Can you think of an algorithm that we've looked at already 279 00:13:54,030 --> 00:13:59,530 whose upper bound, maximum measure of time or operations, 280 00:13:59,530 --> 00:14:04,300 would be said to be bounded by n, a linear function, 281 00:14:04,300 --> 00:14:07,260 not a quadratic one that's curved? 282 00:14:07,260 --> 00:14:10,780 What's an algorithm that always takes no more 283 00:14:10,780 --> 00:14:12,860 than like n steps, or 2n steps, or 3n steps? 284 00:14:12,860 --> 00:14:13,360 Yeah? 285 00:14:13,360 --> 00:14:15,030 >> AUDIENCE: Finding the biggest number in a list? 286 00:14:15,030 --> 00:14:16,930 >> SPEAKER: Perfect, finding the biggest number in a list. 287 00:14:16,930 --> 00:14:18,940 If I'm given a list of people for instance, 288 00:14:18,940 --> 00:14:21,440 each of who is holding a number, what is the maximum number 289 00:14:21,440 --> 00:14:23,770 of steps it should take me, a reasonably smart person, 290 00:14:23,770 --> 00:14:27,530 to find the largest person in that list? 291 00:14:27,530 --> 00:14:28,100 n, right? 292 00:14:28,100 --> 00:14:31,320 Because in the worst case, where might the biggest value be? 293 00:14:31,320 --> 00:14:32,700 Right, all the way at the end. 294 00:14:32,700 --> 00:14:34,575 So in the worst case upper bound, I might 295 00:14:34,575 --> 00:14:36,450 have to go all the way over here and be like, 296 00:14:36,450 --> 00:14:39,170 oh, here's number eight, or whatever that value is. 297 00:14:39,170 --> 00:14:41,330 Now it would just be stupid if I kept going, right? 298 00:14:41,330 --> 00:14:43,840 Looking for more and more elements if the last of them is over there? 299 00:14:43,840 --> 00:14:45,340 So surely, n is an upper bound. 300 00:14:45,340 --> 00:14:47,420 I don't need to take more steps than that. 301 00:14:47,420 --> 00:14:51,580 >> So what if instead I proposed that there are algorithms in this world that 302 00:14:51,580 --> 00:14:57,750 have a running time that's bounded by big O of log n, log n? 303 00:14:57,750 --> 00:15:00,390 Where have we seen this before? 304 00:15:00,390 --> 00:15:00,890 Yeah? 305 00:15:00,890 --> 00:15:03,309 >> AUDIENCE: In the phone book problem? 306 00:15:03,309 --> 00:15:04,850 SPEAKER: Like the phone book problem. 307 00:15:04,850 --> 00:15:07,754 What was the measure of how much time or how many tears it 308 00:15:07,754 --> 00:15:10,170 took me to find someone like Mike Smith in the phone book? 309 00:15:10,170 --> 00:15:13,212 We claimed it was log n, and even if unfamiliar or it it's 310 00:15:13,212 --> 00:15:15,170 a little hazy what a logarithm or exponent was, 311 00:15:15,170 --> 00:15:17,650 just remember that log n generally refers to the process, 312 00:15:17,650 --> 00:15:20,790 in this case, of dividing something in half again, and again, 313 00:15:20,790 --> 00:15:25,790 and again, and again, such that it gets increasingly small as you do that. 314 00:15:25,790 --> 00:15:28,470 >> So log of n refers, sure, to the phone book example, 315 00:15:28,470 --> 00:15:32,662 to binary search in theory, when we had the virtual doors on the board, 316 00:15:32,662 --> 00:15:34,370 or when Sean was searching for something. 317 00:15:34,370 --> 00:15:37,374 If he had used binary search, log n would be the upper bound on how much 318 00:15:37,374 --> 00:15:38,040 time that takes. 319 00:15:38,040 --> 00:15:44,027 But those algorithms that ran in log n assumed what key detail? 320 00:15:44,027 --> 00:15:45,360 That the list was sorted, right? 321 00:15:45,360 --> 00:15:47,789 Your algorithm is wrong if your input is not sorted, 322 00:15:47,789 --> 00:15:49,830 and yet you're using something like binary search 323 00:15:49,830 --> 00:15:51,704 because you might jump right over the element 324 00:15:51,704 --> 00:15:53,600 without realizing it's indeed there. 325 00:15:53,600 --> 00:15:55,600 >> Now what might this mean, big O of one? 326 00:15:55,600 --> 00:15:59,117 This doesn't mean that your algorithm takes one and only one step, 327 00:15:59,117 --> 00:16:01,200 it just means it takes a constant number of steps. 328 00:16:01,200 --> 00:16:04,060 Maybe it's 1, maybe it's 10, maybe it's 1,000, 329 00:16:04,060 --> 00:16:07,750 but it's independent of the size of the problem. 330 00:16:07,750 --> 00:16:10,850 No matter how big n is, a constant time algorithm 331 00:16:10,850 --> 00:16:12,747 always takes the same number of steps. 332 00:16:12,747 --> 00:16:15,080 So what might be an algorithm we've talked about or just 333 00:16:15,080 --> 00:16:20,418 intuitively that comes to you that always runs in so-called constant time? 334 00:16:20,418 --> 00:16:20,918 Yeah? 335 00:16:20,918 --> 00:16:22,001 >> AUDIENCE: Add two numbers. 336 00:16:22,001 --> 00:16:25,320 SPEAKER: Add two numbers, 2 plus 2 equals 4, done. 337 00:16:25,320 --> 00:16:27,227 So that might work, what else? 338 00:16:27,227 --> 00:16:28,560 How about more real world, yeah? 339 00:16:28,560 --> 00:16:30,686 >> AUDIENCE: Finding the first thing in a list. 340 00:16:30,686 --> 00:16:32,810 SPEAKER: Finding the first element in a list, sure. 341 00:16:32,810 --> 00:16:34,540 We've actually been talking about arrays already, 342 00:16:34,540 --> 00:16:36,540 how do you get at the first element in an array, 343 00:16:36,540 --> 00:16:40,465 no matter how long the array is in C code? 344 00:16:40,465 --> 00:16:43,090 You just use like the bracket zero notation, bam, you're there. 345 00:16:43,090 --> 00:16:46,120 And indeed arrays, as an aside, support something generally known 346 00:16:46,120 --> 00:16:49,240 as random access, random access memory, because you can literally 347 00:16:49,240 --> 00:16:50,284 jump to any one place. 348 00:16:50,284 --> 00:16:52,700 We can do this even more simply we can rewind to week zero 349 00:16:52,700 --> 00:16:53,900 when we did Scratch. 350 00:16:53,900 --> 00:16:59,707 How much time did it take for the say block in Scratch to execute? 351 00:16:59,707 --> 00:17:00,790 Just constant time, right? 352 00:17:00,790 --> 00:17:03,960 Say something, say something, it doesn't matter 353 00:17:03,960 --> 00:17:07,359 how big Scratches world is, it's always going to take the same amount of time 354 00:17:07,359 --> 00:17:08,490 to simply say something. 355 00:17:08,490 --> 00:17:11,089 >> So that's constant time, but what's the flip side? 356 00:17:11,089 --> 00:17:13,030 If that was upper bounds, what if we want 357 00:17:13,030 --> 00:17:17,089 to describe the lower bounds of our algorithms running time? 358 00:17:17,089 --> 00:17:19,852 Almost a best case potentially, if you will, 359 00:17:19,852 --> 00:17:23,060 though these terms could apply to best cases, worst cases, average cases more 360 00:17:23,060 --> 00:17:26,359 generally, but let's just focus on lower bounds more generally. 361 00:17:26,359 --> 00:17:31,920 What's an algorithm that has a lower bound of n steps, 362 00:17:31,920 --> 00:17:33,350 or 2n steps, or 3n steps? 363 00:17:33,350 --> 00:17:36,241 Some factor of n steps, that's its lower bound. 364 00:17:36,241 --> 00:17:36,740 Yeah? 365 00:17:36,740 --> 00:17:37,910 >> AUDIENCE: Bubble sort? 366 00:17:37,910 --> 00:17:41,610 >> SPEAKER: Bubble sort takes you minimally n steps, why? 367 00:17:41,610 --> 00:17:42,279 Why is that? 368 00:17:42,279 --> 00:17:45,320 Why should that start to come to you intuitively, even if it doesn't just 369 00:17:45,320 --> 00:17:46,530 yet? 370 00:17:46,530 --> 00:17:47,030 Yeah? 371 00:17:47,030 --> 00:17:47,990 >> AUDIENCE: [INAUDIBLE]. 372 00:17:47,990 --> 00:17:51,652 373 00:17:51,652 --> 00:17:52,360 SPEAKER: Exactly. 374 00:17:52,360 --> 00:17:55,810 In the best possible scenario of bubble sort, and a lot of algorithms, 375 00:17:55,810 --> 00:17:58,769 if I hand you eight people who are already sorted, 376 00:17:58,769 --> 00:18:00,560 it would be foolish for you, the algorithm, 377 00:18:00,560 --> 00:18:02,202 to go back and forth more than once, right? 378 00:18:02,202 --> 00:18:04,285 Because as soon as you walk through the list once, 379 00:18:04,285 --> 00:18:08,090 you should realize, oh, I made no swaps, this list is sorted, exit. 380 00:18:08,090 --> 00:18:09,700 But that's going to take you n steps. 381 00:18:09,700 --> 00:18:12,033 >> And conversely, what's another way of thinking about it? 382 00:18:12,033 --> 00:18:15,240 Bubble sort is an omega, so to speak, of n, 383 00:18:15,240 --> 00:18:19,050 because if you look at fewer than n elements, what 384 00:18:19,050 --> 00:18:23,009 is the fundamental issue there? 385 00:18:23,009 --> 00:18:24,550 You don't know if it's sorted, right. 386 00:18:24,550 --> 00:18:26,800 We humans might glance at eight people and be like, oh, it's sorted, 387 00:18:26,800 --> 00:18:28,430 that didn't take me n steps, but it did. 388 00:18:28,430 --> 00:18:30,810 Your eyes, even though you kind of have a big field of vision, 389 00:18:30,810 --> 00:18:33,184 you looked at eight elements, you looked at eight people, 390 00:18:33,184 --> 00:18:34,610 that's eight steps effectively. 391 00:18:34,610 --> 00:18:38,612 And only if I walk through the whole list do I realize, yes, sorted. 392 00:18:38,612 --> 00:18:41,320 If I stop halfway thinking, all right, it's pretty sorted so far, 393 00:18:41,320 --> 00:18:42,520 what are the odds it's not sorted? 394 00:18:42,520 --> 00:18:44,186 That algorithms not going to be correct. 395 00:18:44,186 --> 00:18:46,250 Might be faster, but incorrect. 396 00:18:46,250 --> 00:18:48,500 >> So now we have a way of describing a lower bounds, 397 00:18:48,500 --> 00:18:49,710 and what about constant time? 398 00:18:49,710 --> 00:18:54,565 What's an algorithm that has a lower bound on its running time of one? 399 00:18:54,565 --> 00:18:58,350 1 step, 2 steps, 10 steps, but constant, independent of n, 400 00:18:58,350 --> 00:18:59,310 the size of the input? 401 00:18:59,310 --> 00:19:03,930 402 00:19:03,930 --> 00:19:04,600 Yeah, in back. 403 00:19:04,600 --> 00:19:05,309 >> AUDIENCE: Printf? 404 00:19:05,309 --> 00:19:06,183 SPEAKER: What's that? 405 00:19:06,183 --> 00:19:07,184 AUDIENCE: Printf? 406 00:19:07,184 --> 00:19:07,850 SPEAKER: Printf. 407 00:19:07,850 --> 00:19:08,400 OK, sure. 408 00:19:08,400 --> 00:19:10,720 So it takes a fixed number of steps. 409 00:19:10,720 --> 00:19:13,170 And I should now-- now that we're talking about C code 410 00:19:13,170 --> 00:19:16,040 and not Scratch, something like say, with printf, 411 00:19:16,040 --> 00:19:17,710 we should start to get careful. 412 00:19:17,710 --> 00:19:21,090 Because printf does take input, it's a string, 413 00:19:21,090 --> 00:19:23,220 and strings do technically have length. 414 00:19:23,220 --> 00:19:25,530 So if we now want to pick on you, if you don't mind, 415 00:19:25,530 --> 00:19:29,430 technically we could argue that printf does take a variable length input, 416 00:19:29,430 --> 00:19:32,270 and surely it might take more time to print a string this long, 417 00:19:32,270 --> 00:19:33,560 than this long. 418 00:19:33,560 --> 00:19:36,570 >> So what if we consider just the sorting and searching examples? 419 00:19:36,570 --> 00:19:40,450 What about Mike Smith in the phone book, or binary search more generally? 420 00:19:40,450 --> 00:19:42,220 In the best case, what might happen? 421 00:19:42,220 --> 00:19:45,577 I open the phone book and, bam, there's Mike Smith's number. 422 00:19:45,577 --> 00:19:46,660 I can call him right away. 423 00:19:46,660 --> 00:19:49,390 >> Took one step, maybe two steps, but a constant number of steps 424 00:19:49,390 --> 00:19:50,230 if I got lucky. 425 00:19:50,230 --> 00:19:52,570 And frankly, we saw on Monday your classmate 426 00:19:52,570 --> 00:19:54,710 get quite lucky twice in a row. 427 00:19:54,710 --> 00:19:57,050 And that was indeed constant time in a lower bounds 428 00:19:57,050 --> 00:20:01,280 on the algorithm in question for finding the number 50 behind those closed 429 00:20:01,280 --> 00:20:01,830 doors. 430 00:20:01,830 --> 00:20:06,400 >> Now, as an aside, if you discover that both big O, the upper bound, 431 00:20:06,400 --> 00:20:09,310 and omega, the lower bound, are one in the same, that 432 00:20:09,310 --> 00:20:11,830 is the same formula in parentheses, you can also 433 00:20:11,830 --> 00:20:15,170 say, just to be fancy, that something is in theta 434 00:20:15,170 --> 00:20:18,270 of n or theta of some other value. 435 00:20:18,270 --> 00:20:20,661 That just means when big O and omega are the same. 436 00:20:20,661 --> 00:20:21,910 Now what about selection sort? 437 00:20:21,910 --> 00:20:23,400 Let's use this new vocabulary. 438 00:20:23,400 --> 00:20:27,407 In selection sort, what were we doing again, and again, and again? 439 00:20:27,407 --> 00:20:29,990 I was going back and forth through the list, looking for whom? 440 00:20:29,990 --> 00:20:33,260 441 00:20:33,260 --> 00:20:34,730 The smallest number. 442 00:20:34,730 --> 00:20:37,560 >> So how many steps, how many comparisons did I 443 00:20:37,560 --> 00:20:43,250 have to make in order to figure out who the smallest element in the list was? 444 00:20:43,250 --> 00:20:44,437 n minus 1, right? 445 00:20:44,437 --> 00:20:47,770 Because if I just start with the one I'm given and I start comparing him or her, 446 00:20:47,770 --> 00:20:49,519 then him or her, him or her, him or her, I 447 00:20:49,519 --> 00:20:52,010 can only pair elements together n minus 1 times. 448 00:20:52,010 --> 00:20:55,630 So selection sort similarly takes n minus 1 steps the first time. 449 00:20:55,630 --> 00:20:59,540 >> How many steps does it take me to find the second smallest element? 450 00:20:59,540 --> 00:21:02,920 n minus 2, because I'm being dumb if I keep looking at the same people 451 00:21:02,920 --> 00:21:06,280 again if I've already selected him or her and put them in their place. 452 00:21:06,280 --> 00:21:09,270 And the third step, n minus 3, then n minus 4. 453 00:21:09,270 --> 00:21:11,020 We've seen this pattern before, and indeed 454 00:21:11,020 --> 00:21:13,460 selection sort similarly has an upper bound 455 00:21:13,460 --> 00:21:16,210 of n squared if we do up that summation. 456 00:21:16,210 --> 00:21:19,790 What is its lower bound, selection sort? 457 00:21:19,790 --> 00:21:25,350 Minimally, how much time must selection sort take, as we defined it on Monday? 458 00:21:25,350 --> 00:21:29,370 459 00:21:29,370 --> 00:21:30,490 Propose two options. 460 00:21:30,490 --> 00:21:32,360 Maybe it's n, as before. 461 00:21:32,360 --> 00:21:35,040 Maybe it's n squared, as it is now as the upper bound. 462 00:21:35,040 --> 00:21:35,874 >> AUDIENCE: n squared. 463 00:21:35,874 --> 00:21:36,664 SPEAKER: n squared. 464 00:21:36,664 --> 00:21:37,368 Why? 465 00:21:37,368 --> 00:21:40,060 >> AUDIENCE: Because you have to define [INAUDIBLE]. 466 00:21:40,060 --> 00:21:41,510 >> SPEAKER: Exactly. 467 00:21:41,510 --> 00:21:45,077 At least as I defined selection sort it was pretty naive, keep going, 468 00:21:45,077 --> 00:21:46,160 find the smallest element. 469 00:21:46,160 --> 00:21:47,770 Go again, find the smallest element. 470 00:21:47,770 --> 00:21:49,490 Go again, find the smallest element. 471 00:21:49,490 --> 00:21:51,700 There's no sort of optimization in there that 472 00:21:51,700 --> 00:21:54,350 might let me abort after just n or so steps. 473 00:21:54,350 --> 00:21:57,080 So indeed, selection sort, omega of n squared. 474 00:21:57,080 --> 00:22:00,667 >> What about insertion sort, where I took who I was given, and then I plopped him 475 00:22:00,667 --> 00:22:01,750 or her in the right place? 476 00:22:01,750 --> 00:22:04,958 Then I proceeded to the second person, plopped him or her in the right place. 477 00:22:04,958 --> 00:22:07,910 Then the next person, plopped him or her in the right place. 478 00:22:07,910 --> 00:22:10,537 Notice that this is very linear, so to speak. 479 00:22:10,537 --> 00:22:12,620 I'm a straight line, I'm not going back and forth, 480 00:22:12,620 --> 00:22:16,080 I've never looking back really, but what's happening when I insert him 481 00:22:16,080 --> 00:22:20,302 or her into the beginning of the list as we did on Monday? 482 00:22:20,302 --> 00:22:21,010 What's happening? 483 00:22:21,010 --> 00:22:21,510 Yeah? 484 00:22:21,510 --> 00:22:23,122 AUDIENCE: [INAUDIBLE]. 485 00:22:23,122 --> 00:22:24,830 SPEAKER: Yeah, that was the catch, right? 486 00:22:24,830 --> 00:22:26,746 You might recall from your classmates, if they 487 00:22:26,746 --> 00:22:29,670 were making any movement with their feet, that was an operation. 488 00:22:29,670 --> 00:22:33,610 So if there were three people here and the new person belonged way over there, 489 00:22:33,610 --> 00:22:37,360 on a long stage like this, sure, he or she could just go to the very end. 490 00:22:37,360 --> 00:22:40,074 But if we're thinking about a computer and an array of memory, 491 00:22:40,074 --> 00:22:41,990 these people are going to have to shuffle over 492 00:22:41,990 --> 00:22:43,260 to make room for that person. 493 00:22:43,260 --> 00:22:46,930 And so that n minus 1 shufflings, n minus 2 shufflings, n 494 00:22:46,930 --> 00:22:50,660 minus 3 shufflings is just kind of happening behind me, not in front of me 495 00:22:50,660 --> 00:22:52,710 as before, in some sense. 496 00:22:52,557 --> 00:22:54,640 Now as an aside, and as you might have seen online 497 00:22:54,640 --> 00:22:57,699 if you start poking around about sorts, there's so many different ones 498 00:22:57,699 --> 00:22:59,490 out there, some of them better than others. 499 00:22:59,490 --> 00:23:02,200 Indeed, bogosort is one that's kind of fun to look up. 500 00:23:02,200 --> 00:23:06,650 Bogosort takes a set of numbers or say a deck of cards, 501 00:23:06,650 --> 00:23:09,870 randomly shuffles them, and checks if they're sorted. 502 00:23:09,870 --> 00:23:12,130 And if not, does it again. 503 00:23:12,130 --> 00:23:14,140 And if not, does it again. 504 00:23:14,140 --> 00:23:15,440 If not, does it again. 505 00:23:15,440 --> 00:23:17,060 Incredibly stupid. 506 00:23:17,060 --> 00:23:19,520 >> And indeed, if you read like the Wikipedia article, 507 00:23:19,520 --> 00:23:21,200 its nickname is stupid sort. 508 00:23:21,200 --> 00:23:25,180 It will eventually work, hopefully, given enough time, 509 00:23:25,180 --> 00:23:28,240 but that amount of time could take quite some time. 510 00:23:28,240 --> 00:23:31,650 So if I could, let's speed things up from Mary Beth's example earlier, 511 00:23:31,650 --> 00:23:35,150 by having a few more elements, but two more processors. 512 00:23:35,150 --> 00:23:37,100 Two people, if you wouldn't mind joining me. 513 00:23:37,100 --> 00:23:40,972 How about 1 over here, and let's go-- no one over there? 514 00:23:40,972 --> 00:23:41,722 No one over there? 515 00:23:41,722 --> 00:23:42,221 OK. 516 00:23:42,221 --> 00:23:44,190 You with the black shirt, yes, come on down. 517 00:23:44,190 --> 00:23:45,000 All right, what's your name? 518 00:23:45,000 --> 00:23:45,720 >> AUDIENCE: Peter. 519 00:23:45,720 --> 00:23:46,100 >> SPEAKER: What's that? 520 00:23:46,100 --> 00:23:46,766 >> AUDIENCE: Peter. 521 00:23:46,766 --> 00:23:49,450 SPEAKER: Peter, David, nice to meet you. 522 00:23:49,450 --> 00:23:53,670 All right, we have Peter here, if you want to come onto the table over here. 523 00:23:53,670 --> 00:23:54,550 And what's your name? 524 00:23:54,550 --> 00:23:55,216 >> AUDIENCE: Elena. 525 00:23:55,216 --> 00:23:55,970 SPEAKER: Elena. 526 00:23:55,970 --> 00:23:57,030 OK, nice to meet you. 527 00:23:57,030 --> 00:23:58,060 Elena meet Peter. 528 00:23:58,060 --> 00:23:59,170 Peter, Elena. 529 00:23:59,170 --> 00:24:02,290 And we'll need Andrew up here as well, please. 530 00:24:02,290 --> 00:24:06,107 And your challenge is going to be to sort a deck of cards. 531 00:24:06,107 --> 00:24:08,190 And if unfamiliar, deck of cards should ultimately 532 00:24:08,190 --> 00:24:11,064 be sorted a little something like this where we'll do the clubs, then 533 00:24:11,064 --> 00:24:13,660 the spades, then the hearts and diamonds, from ace as a one, 534 00:24:13,660 --> 00:24:15,570 all the way up to king. 535 00:24:15,570 --> 00:24:20,890 >> The cards I'm going to give you are going to be 52 in quantity. 536 00:24:20,890 --> 00:24:23,160 We're going to similarly time you, in just a moment. 537 00:24:23,160 --> 00:24:26,410 We're going to throw Andrew up on the screen here, 538 00:24:26,410 --> 00:24:28,170 so as to watch as you do this. 539 00:24:28,170 --> 00:24:31,070 And so that all of this is all the more visible, 540 00:24:31,070 --> 00:24:33,490 these are the cards I got on Amazon. 541 00:24:33,490 --> 00:24:42,861 So they are already randomly sorted, and we're going to time you. 542 00:24:42,861 --> 00:24:44,610 And we're going to keep it real this time, 543 00:24:44,610 --> 00:24:47,820 so we're going to try to pressure you because otherwise this will get tedious 544 00:24:47,820 --> 00:24:48,460 quickly. 545 00:24:48,460 --> 00:24:53,860 If you could proceed to sort 52 elements together via some means, now. 546 00:24:53,860 --> 00:25:04,710 547 00:25:04,710 --> 00:25:07,180 >> And again, as we watch these guys do what, in the end 548 00:25:07,180 --> 00:25:10,200 is going to produce an obvious result, think about really 549 00:25:10,200 --> 00:25:12,962 how they're each doing it, how you might describe it. 550 00:25:12,962 --> 00:25:15,045 Because again, these are all processes, algorithms 551 00:25:15,045 --> 00:25:17,090 that we take for granted as a human. 552 00:25:17,090 --> 00:25:22,349 But you've probably long had intuition, long before you even 553 00:25:22,349 --> 00:25:24,390 thought about taking a computer science class you 554 00:25:24,390 --> 00:25:27,223 might have had the intuition with which to solve problems like this. 555 00:25:27,223 --> 00:25:29,560 But once you recognize the patterns and begin 556 00:25:29,560 --> 00:25:32,407 to formalize the steps with which you're solving these problems, 557 00:25:32,407 --> 00:25:35,490 you'll find that you can solve much more interesting and much more complex 558 00:25:35,490 --> 00:25:39,190 problems quickly. 559 00:25:39,190 --> 00:25:42,351 So someone from the audience, what is at least one element of the algorithm 560 00:25:42,351 --> 00:25:43,350 that they're using here? 561 00:25:43,350 --> 00:25:44,275 >> AUDIENCE: [INAUDIBLE] 562 00:25:44,275 --> 00:25:45,150 SPEAKER: What's that? 563 00:25:45,150 --> 00:25:47,062 AUDIENCE: By suit. 564 00:25:47,062 --> 00:25:47,770 SPEAKER: By suit. 565 00:25:47,770 --> 00:25:50,630 So first they are clustering all of the diamonds together 566 00:25:50,630 --> 00:25:52,560 it seems, all of the hearts together it seems, 567 00:25:52,560 --> 00:25:56,520 and so forth, without respect for the numbers on the cards. 568 00:25:56,520 --> 00:26:00,900 And now they appear, for instance, to be sorting them by number. 569 00:26:00,900 --> 00:26:06,870 570 00:26:06,870 --> 00:26:08,910 Very good. 571 00:26:08,910 --> 00:26:12,370 >> All right, so what's going to be the final step then here? 572 00:26:12,370 --> 00:26:16,950 Once we have four sorted suits, what do we need to do to the four piles 573 00:26:16,950 --> 00:26:20,059 in order to achieve one sorted deck, quite simply? 574 00:26:20,059 --> 00:26:21,350 So we need to merge them again. 575 00:26:21,350 --> 00:26:25,160 >> So there's an interesting idea that again, daresay, is very intuitive even 576 00:26:25,160 --> 00:26:28,140 if you might never have slapped that kind of label on it. 577 00:26:28,140 --> 00:26:31,900 This fundamental notion of dividing the problem not in half this time, 578 00:26:31,900 --> 00:26:33,410 but at least into four pieces. 579 00:26:33,410 --> 00:26:36,810 Solving pretty much fundamentally identical problems 580 00:26:36,810 --> 00:26:40,480 in isolation of each other, and then merging the results. 581 00:26:40,480 --> 00:26:46,940 582 00:26:46,940 --> 00:26:50,140 And, excellent, done. 583 00:26:50,140 --> 00:26:52,140 All right, a big round of applause, if we could. 584 00:26:52,140 --> 00:26:56,480 >> [APPLAUSE] 585 00:26:56,480 --> 00:26:59,740 >> SPEAKER: I have no idea what you'll do with these, but here you go. 586 00:26:59,740 --> 00:27:01,690 Thank you so much. 587 00:27:01,690 --> 00:27:04,660 So let's see, two minutes and eight seconds, 588 00:27:04,660 --> 00:27:07,490 if you'd like to challenge your friends. 589 00:27:07,490 --> 00:27:12,160 What then is going to be a take away from this 590 00:27:12,160 --> 00:27:13,830 that we can leverage more generally? 591 00:27:13,830 --> 00:27:16,080 Well, think back to this array of numbers, 592 00:27:16,080 --> 00:27:19,060 and think back now to some of the pseudocode we've written in the past, 593 00:27:19,060 --> 00:27:22,080 and this was the pseudocode for solving the phone book problem. 594 00:27:22,080 --> 00:27:25,150 Whereby in pseudocode I enumerated a more methodical way 595 00:27:25,150 --> 00:27:28,400 of describing how I did a very intuitive human algorithm of dividing the phone 596 00:27:28,400 --> 00:27:31,650 book in half, repeat, repeat, repeat, until I find someone like Mike Smith, 597 00:27:31,650 --> 00:27:33,790 if he is indeed in the phone book. 598 00:27:33,790 --> 00:27:37,610 >> But I kind of used what I'll call a very iterative approach here, 599 00:27:37,610 --> 00:27:42,160 in particular notice line 8 and line 11. 600 00:27:42,160 --> 00:27:46,750 Those are evidence of an iterative approach, a looping approach, 601 00:27:46,750 --> 00:27:49,040 because that's exactly the behavior they induce. 602 00:27:49,040 --> 00:27:52,910 Those lines both say go to line three, and you can kind of 603 00:27:52,910 --> 00:27:55,140 think of that in your mind's eye as being a loop. 604 00:27:55,140 --> 00:27:59,080 It's telling you to go back up to step three and repeat, again, and again, 605 00:27:59,080 --> 00:28:00,010 and again. 606 00:28:00,010 --> 00:28:04,410 >> But what if we leverage a key idea here that we didn't the last time, 607 00:28:04,410 --> 00:28:10,280 and simplify line 8 and line 11 and their neighbors 608 00:28:10,280 --> 00:28:12,840 as just this, in yellow. 609 00:28:12,840 --> 00:28:16,480 It's not fundamentally shortening the pseudocode very much, 610 00:28:16,480 --> 00:28:20,530 but it's fundamentally changing the nature of my algorithm. 611 00:28:20,530 --> 00:28:24,220 What I'm now saying in step 7, in step 10, 612 00:28:24,220 --> 00:28:29,140 is to search for Mike in the exact same way, 613 00:28:29,140 --> 00:28:31,580 but just in the left half or the right half. 614 00:28:31,580 --> 00:28:33,420 >> So in other words, if I start from step one, 615 00:28:33,420 --> 00:28:36,150 pick up phone book, open to middle of phone book, look at names, 616 00:28:36,150 --> 00:28:39,010 if Smith is among name's, call Mike, else 617 00:28:39,010 --> 00:28:44,340 if Smith is earlier in book, step seven search for Mike in left half of book. 618 00:28:44,340 --> 00:28:47,130 But that kind of feels like it's leaving me hanging, right? 619 00:28:47,130 --> 00:28:49,240 In yellow, is an instruction, but how do I 620 00:28:49,240 --> 00:28:51,870 search for Mike in the left half of the phone book? 621 00:28:51,870 --> 00:28:54,210 Where do I have an algorithm with which I 622 00:28:54,210 --> 00:28:57,100 can search for someone like Mike Smith? 623 00:28:57,100 --> 00:28:58,980 Well, it's staring us in the face. 624 00:28:58,980 --> 00:29:03,090 I can literally use the exact same program effectively going up to the top 625 00:29:03,090 --> 00:29:06,490 again and re-running the same lines of code. 626 00:29:06,490 --> 00:29:10,610 >> So even though this should feel like a bit of a cyclical definition 627 00:29:10,610 --> 00:29:13,480 where you're answering someone's question by just sort of asking 628 00:29:13,480 --> 00:29:15,990 the same question again, like why, why, why? 629 00:29:15,990 --> 00:29:21,580 The reality is because we've hard coded a couple of special lines, step 4, 630 00:29:21,580 --> 00:29:25,320 which is an if, and step 12, which is effectively another branch, 631 00:29:25,320 --> 00:29:30,120 because we have those stopgap measures, this algorithm will terminate if we 632 00:29:30,120 --> 00:29:32,050 find Mike, or if we don't. 633 00:29:32,050 --> 00:29:36,810 But in step 7 and 10 now, we have what we'll call a recursive algorithm. 634 00:29:36,810 --> 00:29:40,420 And recursion is indeed a powerful idea that's a little mind bending at first, 635 00:29:40,420 --> 00:29:42,500 that we can now apply as follows. 636 00:29:42,500 --> 00:29:46,600 >> Merge sort will be the last sort that we look at, at least in class formally. 637 00:29:46,600 --> 00:29:50,040 And it's fundamentally different from those last three, and certainly 638 00:29:50,040 --> 00:29:52,140 last four if we include bogosort. 639 00:29:52,140 --> 00:29:54,810 Here's the pseudocode for merge sort. 640 00:29:54,810 --> 00:30:00,170 When on input of n elements, so given an array of size n, if n is less than 2, 641 00:30:00,170 --> 00:30:01,040 return. 642 00:30:01,040 --> 00:30:03,610 So why do I have that sanity check first? 643 00:30:03,610 --> 00:30:09,477 What's the implication if I hand you an array whose length n is less than 2? 644 00:30:09,477 --> 00:30:11,060 It's already sorted, obviously, right? 645 00:30:11,060 --> 00:30:13,640 Because the list either has one element, which is trivially 646 00:30:13,640 --> 00:30:15,180 sorted because it's the only thing there. 647 00:30:15,180 --> 00:30:18,138 Or, it's of size zero which means there's nothing to sort, so by nature 648 00:30:18,138 --> 00:30:18,720 it is sorted. 649 00:30:18,720 --> 00:30:20,410 There's just nothing wrong there. 650 00:30:20,410 --> 00:30:22,310 So that's our so-called base case. 651 00:30:22,310 --> 00:30:24,440 >> That is similar in spirit to what we did with Mike. 652 00:30:24,440 --> 00:30:26,023 If Mike's in the phone book, call him. 653 00:30:26,023 --> 00:30:27,740 If he's not there, give up. 654 00:30:27,740 --> 00:30:31,240 It's a so-called base case, to make sure this algorithm at the end of the day 655 00:30:31,240 --> 00:30:33,540 will stop in certain circumstances. 656 00:30:33,540 --> 00:30:37,890 >> But here's the leap of faith now, else, sort the left half of the elements, 657 00:30:37,890 --> 00:30:39,740 then sort the right half of the elements, 658 00:30:39,740 --> 00:30:41,189 and then merge the sorted halves. 659 00:30:41,189 --> 00:30:43,230 And here's where it feels like we're copping out. 660 00:30:43,230 --> 00:30:46,900 I've asked you to sort n elements, and I'm 661 00:30:46,900 --> 00:30:50,712 saying, OK, do it by sorting the left and sorting the right. 662 00:30:50,712 --> 00:30:52,420 But I am saying one other thing, and this 663 00:30:52,420 --> 00:30:55,530 is the key theme it seems in the intuition thus far, 664 00:30:55,530 --> 00:30:57,380 there's this third step of merging. 665 00:30:57,380 --> 00:31:00,430 Which even though it seems so dumb in spirit, 666 00:31:00,430 --> 00:31:02,320 like just merge things together, it seems 667 00:31:02,320 --> 00:31:05,380 to be a key step toward the reassembly of two problems that 668 00:31:05,380 --> 00:31:07,330 were divided ultimately in half. 669 00:31:07,330 --> 00:31:12,090 >> So merge sort, let's do this, if you'll humor me, with one more demonstration, 670 00:31:12,090 --> 00:31:14,730 just so that we have some numbers to work with. 671 00:31:14,730 --> 00:31:19,470 Can I exchange eight stress balls for eight people? 672 00:31:19,470 --> 00:31:29,320 All right, how about you three, you four in this section, five, six, and let's 673 00:31:29,320 --> 00:31:30,720 do 7, 8, come on up. 674 00:31:30,720 --> 00:31:35,120 675 00:31:35,120 --> 00:31:36,520 OK, yeah OK. 676 00:31:36,520 --> 00:31:38,640 Minus 8, there we go, plus 1. 677 00:31:38,640 --> 00:31:39,150 Excellent. 678 00:31:39,150 --> 00:31:42,000 All right come on up, let's quickly give you numbers. 679 00:31:42,000 --> 00:31:50,800 Number two, number three, number four, number five, six, seven, and eight. 680 00:31:50,800 --> 00:31:52,140 I did eight correctly this time. 681 00:31:52,140 --> 00:31:56,390 >> OK, so go ahead if you could, and let's sort in the original order 682 00:31:56,390 --> 00:31:59,810 that we had yesterday which looked like this, if you wouldn't mind. 683 00:31:59,810 --> 00:32:03,620 And let's do it in front of the table. 684 00:32:03,620 --> 00:32:06,510 All right, so merge sort. 685 00:32:06,510 --> 00:32:08,820 This is where it's going to get kind of interesting, 686 00:32:08,820 --> 00:32:12,800 because I seem to be giving myself so much less information today. 687 00:32:12,800 --> 00:32:15,149 >> So merge sort first of all on input of n elements, 688 00:32:15,149 --> 00:32:18,440 and is obviously not less than two, it's eight, so I have some more work to do. 689 00:32:18,440 --> 00:32:21,140 So now mentally we as a class are now in the else branch, 690 00:32:21,140 --> 00:32:22,540 which means three steps. 691 00:32:22,540 --> 00:32:25,017 First, I have to sort the left half of the elements. 692 00:32:25,017 --> 00:32:26,350 So how do I go about doing this? 693 00:32:26,350 --> 00:32:28,950 Well, I'm going to kind of mentally divide the list here, 694 00:32:28,950 --> 00:32:30,700 you don't have to physically move, and I'm 695 00:32:30,700 --> 00:32:33,180 going to focus only on the left half of the elements here. 696 00:32:33,180 --> 00:32:36,770 So how do I go about sorting a list now of size four? 697 00:32:36,770 --> 00:32:38,730 What's my algorithm? 698 00:32:38,730 --> 00:32:42,580 First I check is n less than two, no, so I proceed to the else block again. 699 00:32:42,580 --> 00:32:43,900 Sort left half of elements. 700 00:32:43,900 --> 00:32:45,608 >> So now again, mentally, and this is where 701 00:32:45,608 --> 00:32:49,550 you have to accrue a lot of mental history, if you will. 702 00:32:49,550 --> 00:32:51,940 Now I'm sorting the left half of the left half. 703 00:32:51,940 --> 00:32:57,000 All right, so now I call my same merge sorting algorithm, is n less than two? 704 00:32:57,000 --> 00:33:00,590 No, it is two, so I have to sort the left half, and the right half. 705 00:33:00,590 --> 00:33:02,042 So here we go, sort the left half. 706 00:33:02,042 --> 00:33:03,750 Why don't you just take one step forward. 707 00:33:03,750 --> 00:33:04,415 What's your name? 708 00:33:04,415 --> 00:33:04,860 >> AUDIENCE: Darren. 709 00:33:04,860 --> 00:33:05,260 >> SPEAKER: Dan. 710 00:33:05,260 --> 00:33:06,040 Dan has stepped forward. 711 00:33:06,040 --> 00:33:06,748 >> AUDIENCE: Darren. 712 00:33:06,748 --> 00:33:09,000 SPEAKER: Darren, done. 713 00:33:09,000 --> 00:33:10,090 Did you say Darren or Dan? 714 00:33:10,090 --> 00:33:10,550 >> AUDIENCE: Darren. 715 00:33:10,550 --> 00:33:11,216 >> SPEAKER: Darren. 716 00:33:11,216 --> 00:33:14,422 OK, Darren has stepped forward and he is now sorted. 717 00:33:14,422 --> 00:33:16,130 And this is almost an inane claim, right? 718 00:33:16,130 --> 00:33:18,862 I don't really seem to be achieving anything, but let's proceed. 719 00:33:18,862 --> 00:33:20,820 Now let me sort the right half of the elements. 720 00:33:20,820 --> 00:33:21,200 What's your name? 721 00:33:21,200 --> 00:33:21,690 >> AUDIENCE: Luke. 722 00:33:21,690 --> 00:33:22,273 >> SPEAKER: Luke. 723 00:33:22,273 --> 00:33:23,400 Come on, step forward. 724 00:33:23,400 --> 00:33:25,640 Done, I have sorted Luke. 725 00:33:25,640 --> 00:33:28,570 The left half is now sorted and the right half is now sorted, 726 00:33:28,570 --> 00:33:30,770 but again, there's a key step here. 727 00:33:30,770 --> 00:33:32,940 What do I next need to do? 728 00:33:32,940 --> 00:33:33,941 Merge the sorted halves. 729 00:33:33,941 --> 00:33:36,648 Now we're going to just have everyone back and forth in this way, 730 00:33:36,648 --> 00:33:38,620 because I kind of need some scratch space. 731 00:33:38,620 --> 00:33:40,411 It's almost like these guys are on a table, 732 00:33:40,411 --> 00:33:42,460 and I need some room to move them around on. 733 00:33:42,460 --> 00:33:44,170 So I'm going to merge you guys by looking 734 00:33:44,170 --> 00:33:45,960 at the left half and the right half. 735 00:33:45,960 --> 00:33:48,740 And who obviously comes first, left half or right half? 736 00:33:48,740 --> 00:33:52,710 So right half, so let's move Luke over here to Darren's original position. 737 00:33:52,710 --> 00:33:57,640 And now to merge their left half in, Darren's going to move right there. 738 00:33:57,640 --> 00:33:59,750 >> So feels like almost a bubble sort effect, 739 00:33:59,750 --> 00:34:02,482 but my fundamental algorithm, very different this time. 740 00:34:02,482 --> 00:34:04,815 But now's where things get a little annoying because you 741 00:34:04,815 --> 00:34:06,810 have to rewind mentally where did I leave off. 742 00:34:06,810 --> 00:34:09,893 I've just merged the sorted halves, which means I'm where in my algorithm? 743 00:34:09,893 --> 00:34:12,229 744 00:34:12,229 --> 00:34:13,770 I have to sort the right half, right? 745 00:34:13,770 --> 00:34:15,910 >> If you rewind, literally on the video, you'll 746 00:34:15,910 --> 00:34:18,339 see that we got to this point of Luke and Darren 747 00:34:18,339 --> 00:34:21,370 by sorting the left half of the left half. 748 00:34:21,370 --> 00:34:23,430 Then we merged those sorted halves, which 749 00:34:23,430 --> 00:34:27,941 means the next step is sort the right half of the left half. 750 00:34:27,941 --> 00:34:29,649 All right, so let's do this more quickly. 751 00:34:29,649 --> 00:34:33,282 All right, six, I'm going to claim you are now sorted, come on forward. 752 00:34:33,282 --> 00:34:33,990 What's your name? 753 00:34:33,990 --> 00:34:34,589 >> AUDIENCE: Adriano. 754 00:34:34,589 --> 00:34:35,200 >> SPEAKER: Adriano. 755 00:34:35,200 --> 00:34:36,010 Adriano is now sorted. 756 00:34:36,010 --> 00:34:36,450 And what's your name? 757 00:34:36,450 --> 00:34:37,080 >> AUDIENCE: Alex. 758 00:34:37,080 --> 00:34:38,379 >> SPEAKER: Alex is now sorted. 759 00:34:38,379 --> 00:34:40,750 Left half, right half, what's the final step? 760 00:34:40,750 --> 00:34:41,250 Merge. 761 00:34:41,250 --> 00:34:44,310 Pretty trivial, so I'm going to merge in six, 762 00:34:44,310 --> 00:34:46,930 take a step back, eight, take a step back. 763 00:34:46,930 --> 00:34:49,530 And now notice this is a useful takeaway, what 764 00:34:49,530 --> 00:34:53,930 is now true about the left half of the list, irrespective of how we began? 765 00:34:53,930 --> 00:34:55,090 It is sorted. 766 00:34:55,090 --> 00:34:57,750 >> Now it's not sorted in the big scheme of things, 767 00:34:57,750 --> 00:35:00,250 but it is sorted independently of the other half. 768 00:35:00,250 --> 00:35:04,100 Now what step am I on if I keep rewinding how the story began? 769 00:35:04,100 --> 00:35:05,680 Now I have to sort the right half. 770 00:35:05,680 --> 00:35:07,630 So now we're way back at the beginning of the story, 771 00:35:07,630 --> 00:35:08,921 and let's do this more rapidly. 772 00:35:08,921 --> 00:35:11,320 So I'm going to sort the right half of the whole list. 773 00:35:11,320 --> 00:35:13,060 What's the next step? 774 00:35:13,060 --> 00:35:15,840 Sort the left half of the right half. 775 00:35:15,840 --> 00:35:18,715 Sort the left half of the left half of the right half. 776 00:35:18,715 --> 00:35:19,590 And what's your name? 777 00:35:19,590 --> 00:35:20,230 >> AUDIENCE: Omar. 778 00:35:20,230 --> 00:35:21,970 >> SPEAKER: Omar, step forward, done. 779 00:35:21,970 --> 00:35:22,860 Left half is sorted. 780 00:35:22,860 --> 00:35:23,330 And what's your name? 781 00:35:23,330 --> 00:35:23,820 >> AUDIENCE: Chris. 782 00:35:23,820 --> 00:35:25,620 >> SPEAKER: Chris, take a step forward, you are now sorted. 783 00:35:25,620 --> 00:35:27,010 What's the key step now? 784 00:35:27,010 --> 00:35:27,510 Merge. 785 00:35:27,510 --> 00:35:30,509 So one is going to merge into place here, if you could take a step back, 786 00:35:30,509 --> 00:35:32,930 and three is going to take a step back, merge. 787 00:35:32,930 --> 00:35:38,080 So the left half of the right half, is now sorted. 788 00:35:38,080 --> 00:35:41,747 Frankly, this algorithm feels like we are wasting way more time than before, 789 00:35:41,747 --> 00:35:44,830 but if we did this in real time, we'll see what the takeaways going to be. 790 00:35:44,830 --> 00:35:47,970 Now here I am, right half of the right half, 791 00:35:47,970 --> 00:35:50,170 let me go ahead and sort the left half. 792 00:35:50,170 --> 00:35:51,482 Step forward, what's your name? 793 00:35:51,482 --> 00:35:52,190 AUDIENCE: Ramsey. 794 00:35:52,190 --> 00:35:53,210 SPEAKER: Ramsey is now sorted. 795 00:35:53,210 --> 00:35:53,570 What's your name? 796 00:35:53,570 --> 00:35:54,200 >> AUDIENCE: Marina. 797 00:35:54,200 --> 00:35:57,033 >> SPEAKER: Marina is now sorted as well, if you take one step forward. 798 00:35:57,033 --> 00:36:00,690 Key step here is now merge, I'm going to pluck from my two lists, 799 00:36:00,690 --> 00:36:01,720 left and right. 800 00:36:01,720 --> 00:36:05,150 Five is going to come first, and seven is going to come next. 801 00:36:05,150 --> 00:36:06,410 And again, this is deliberate. 802 00:36:06,410 --> 00:36:08,535 The fact that they're taking steps forward and back 803 00:36:08,535 --> 00:36:12,997 is meant to represent that we can't do this algorithm in place as easily 804 00:36:12,997 --> 00:36:15,830 as bubble sort, and selection sort, and insertion sort where we just 805 00:36:15,830 --> 00:36:16,960 kept swapping people. 806 00:36:16,960 --> 00:36:19,940 I literally need a sort of scratch paper in which 807 00:36:19,940 --> 00:36:21,827 to put these folks while I do the merging, 808 00:36:21,827 --> 00:36:23,410 and then I can put them back in place. 809 00:36:23,410 --> 00:36:27,260 And that's key because I'm using a new resource, space, not just time. 810 00:36:27,260 --> 00:36:28,270 >> OK, this is amazing. 811 00:36:28,270 --> 00:36:32,050 Left half is sorted, right half is sorted, now that key merging step. 812 00:36:32,050 --> 00:36:33,450 How am I going to merge this? 813 00:36:33,450 --> 00:36:35,470 So if you'll follow my left hand and right hand, 814 00:36:35,470 --> 00:36:38,930 I'm going to point my left hand at the left half, my right hand 815 00:36:38,930 --> 00:36:42,680 at the right half, and now I have to decide step by step whom to merge in. 816 00:36:42,680 --> 00:36:44,650 Who obviously comes first? 817 00:36:44,650 --> 00:36:45,150 Number one. 818 00:36:45,150 --> 00:36:47,327 So come on over here, here's our scratch pad. 819 00:36:47,327 --> 00:36:49,910 So now number one, and notice what I'll do with my right hand, 820 00:36:49,910 --> 00:36:54,152 I'm going to move my right hand one step over to point number three, 821 00:36:54,152 --> 00:36:55,860 and now I have to make the same decision. 822 00:36:55,860 --> 00:36:58,387 And actually stand right in front of Luke here if you could, 823 00:36:58,387 --> 00:36:59,720 because this is our scratch pad. 824 00:36:59,720 --> 00:37:00,610 So who comes next? 825 00:37:00,610 --> 00:37:05,000 We have Luke with number two or Chris with number three. 826 00:37:05,000 --> 00:37:07,460 Obviously Luke, number two, so you come here. 827 00:37:07,460 --> 00:37:11,270 >> But my left hand now is going to be incremented to point at Darren, 828 00:37:11,270 --> 00:37:15,160 and here's the key take away with merging, I'm going to keep doing this, 829 00:37:15,160 --> 00:37:17,340 obviously, if you kind of follow the logic. 830 00:37:17,340 --> 00:37:19,670 But my hands are never going to go backwards, 831 00:37:19,670 --> 00:37:23,861 which means I'm only ever moving to the left with my merging process, 832 00:37:23,861 --> 00:37:26,360 and that's going to be key to our analysis in just a moment. 833 00:37:26,360 --> 00:37:27,859 >> So now let's finish this up rapidly. 834 00:37:27,859 --> 00:37:31,650 So three comes next, then four comes next, 835 00:37:31,650 --> 00:37:38,750 and now five comes next, then six, and seven, and then finally eight. 836 00:37:38,750 --> 00:37:42,960 Feels like the slowest algorithm yet, but not if we actually 837 00:37:42,960 --> 00:37:45,510 run it at the same sort of clock speed, so to 838 00:37:45,510 --> 00:37:48,106 speak, with the same ticking clock as before. 839 00:37:48,106 --> 00:37:48,605 Why? 840 00:37:48,605 --> 00:37:51,100 Well, Let's take a look at the end result. 841 00:37:51,100 --> 00:37:56,990 >> Let's go back over here, let me pull up a demonstration visually 842 00:37:56,990 --> 00:37:59,030 of what we just did. 843 00:37:59,030 --> 00:38:06,110 Zooming in here, on this page here, telling Firefox 844 00:38:06,110 --> 00:38:08,200 that we want to queue up in this box, let's 845 00:38:08,200 --> 00:38:11,260 say bubble sort, with which we're now well familiar, 846 00:38:11,260 --> 00:38:14,130 selection sort, which is another fairly straightforward one, 847 00:38:14,130 --> 00:38:18,250 and now today's merge sort, which will be our climactic ending. 848 00:38:18,250 --> 00:38:21,530 The reason it took so much longer here with humans and me verbally is, 849 00:38:21,530 --> 00:38:23,480 obviously, I'm explaining every step. 850 00:38:23,480 --> 00:38:26,920 But if you simply execute this, much like we did bubble sort and selection 851 00:38:26,920 --> 00:38:30,890 sort not only visually, watch just how much more efficiently 852 00:38:30,890 --> 00:38:33,330 this leveraging of division and conquering 853 00:38:33,330 --> 00:38:39,150 can be when applied to a data set that's not even size eight, but even much, 854 00:38:39,150 --> 00:38:39,970 much bigger. 855 00:38:39,970 --> 00:38:44,585 I give you merge sort, side by side with these other algorithms. 856 00:38:44,585 --> 00:38:56,364 857 00:38:56,364 --> 00:38:58,530 This is going to get painful quickly, and the ending 858 00:38:58,530 --> 00:39:00,890 is not particularly climactic, they just end up sorted. 859 00:39:00,890 --> 00:39:05,280 But the key take away is that look how much faster merge sort 860 00:39:05,280 --> 00:39:08,110 was, unless you think I'm just kind of messing with you. 861 00:39:08,110 --> 00:39:13,100 If we do this one final time, let's reload this, let's go back 862 00:39:13,100 --> 00:39:14,960 and choose bubble sort, and just for kicks, 863 00:39:14,960 --> 00:39:17,330 let's choose insertion sort, just for good measure. 864 00:39:17,330 --> 00:39:20,020 And this time again, let's choose merge sort and let's 865 00:39:20,020 --> 00:39:21,595 actually run these side by side. 866 00:39:21,595 --> 00:39:24,140 867 00:39:24,140 --> 00:39:26,930 >> And it's not, in fact, a fluke. 868 00:39:26,930 --> 00:39:31,140 What I've effectively done is I've divided my input in half, again, 869 00:39:31,140 --> 00:39:32,240 and again, and again. 870 00:39:32,240 --> 00:39:35,590 And there's only so many times you can divide your input into halves, left 871 00:39:35,590 --> 00:39:36,240 and right. 872 00:39:36,240 --> 00:39:39,425 What's the formula that we keep seeing that describes the division in half 873 00:39:39,425 --> 00:39:41,050 again, and again, and again, and again? 874 00:39:41,050 --> 00:39:41,890 >> AUDIENCE: Log n. 875 00:39:41,890 --> 00:39:42,760 >> SPEAKER: Log n. 876 00:39:42,760 --> 00:39:46,300 But then there's one other key step, this algorithm is not log n steps. 877 00:39:46,300 --> 00:39:48,992 If it were only log n steps, we would be in the same problem 878 00:39:48,992 --> 00:39:51,200 as before where we can't be sure everything's sorted. 879 00:39:51,200 --> 00:39:54,480 You have to minimally look at n elements to be sure n elements are sorted, 880 00:39:54,480 --> 00:39:55,950 otherwise it's a leap of faith. 881 00:39:55,950 --> 00:39:59,810 >> So it's minimally log n steps, but what about this key merging step 882 00:39:59,810 --> 00:40:04,370 where I merged my left half and right half and walked across the stage? 883 00:40:04,370 --> 00:40:06,980 How many steps is that to merge? 884 00:40:06,980 --> 00:40:10,150 It's n, but I didn't just merge the final time. 885 00:40:10,150 --> 00:40:15,089 On each of those nested calls, on each of those nested merges, I still sorted. 886 00:40:15,089 --> 00:40:18,380 I merged these two guys, then these two guys, then these two guys and so forth. 887 00:40:18,380 --> 00:40:19,955 >> So I did merging again, and again. 888 00:40:19,955 --> 00:40:20,580 How many times? 889 00:40:20,580 --> 00:40:23,510 So every time I divided the list in half, I did a merge. 890 00:40:23,510 --> 00:40:25,460 Divide the list in half, do a merge. 891 00:40:25,460 --> 00:40:28,570 So if dividing the list can be done log n times, 892 00:40:28,570 --> 00:40:33,880 and the merging ultimately takes n steps, what might be now the upper 893 00:40:33,880 --> 00:40:37,000 bound on the running time of our algorithm? 894 00:40:37,000 --> 00:40:37,980 n log n. 895 00:40:37,980 --> 00:40:40,560 >> And indeed, that's what we've achieved here. 896 00:40:40,560 --> 00:40:44,650 So the feel that you see visually when those three things run side by side 897 00:40:44,650 --> 00:40:47,930 is n squared against n squared against n log n. 898 00:40:47,930 --> 00:40:51,010 Which fundamentally we'll see, not only today but in the future, 899 00:40:51,010 --> 00:40:52,760 is much, much faster. 900 00:40:52,760 --> 00:40:56,010 A round of applause for these guys, I will reward them with stress balls. 901 00:40:56,010 --> 00:41:00,260 Let's adjourn here today, and we will see you on Monday. 902 00:41:00,260 --> 00:41:02,255