1 00:00:10,976 --> 00:00:11,916 >> Lecturer: All right. 2 00:00:11,916 --> 00:00:13,516 Welcome back to CS50. 3 00:00:13,556 --> 00:00:15,866 This is the end of Week 3. 4 00:00:16,476 --> 00:00:18,706 Nothing stupid to start off with today on the overhead. 5 00:00:18,706 --> 00:00:22,106 Instead I need eight of you if you would be so brave? 6 00:00:22,866 --> 00:00:26,636 Yes, one, come on; okay, number two, come on up. 7 00:00:27,596 --> 00:00:28,196 Don't look up. 8 00:00:29,346 --> 00:00:31,206 No one is looking up in this area. 9 00:00:31,206 --> 00:00:33,086 Three? Okay, come on up. 10 00:00:33,086 --> 00:00:34,516 Again, you must be comfortable on camera. 11 00:00:34,516 --> 00:00:36,316 You two, come on. 12 00:00:36,316 --> 00:00:38,506 Okay, wait let's see, one, two, three, four, five, 13 00:00:38,816 --> 00:00:40,086 you three in front here. 14 00:00:40,866 --> 00:00:46,516 Yeah. We'll do some other awkward things in the future. 15 00:00:46,516 --> 00:00:46,866 I promise. 16 00:00:47,156 --> 00:00:49,346 Okay. So here we go. 17 00:00:49,606 --> 00:00:53,666 If you could line up in that order up here on stage? 18 00:00:54,516 --> 00:01:03,966 [ class noise ] 19 00:01:04,466 --> 00:01:06,096 Perfect, perfect. 20 00:01:06,586 --> 00:01:09,466 Okay, so if you guys could make yourselves look like that, 21 00:01:09,726 --> 00:01:12,066 the first moment of awkwardness here is just to remind you 22 00:01:12,066 --> 00:01:14,496 where we left off last time and that was the goal, 23 00:01:14,496 --> 00:01:17,966 among the goals on Monday was to actually search numbers, 24 00:01:18,266 --> 00:01:20,106 and we had a wonderful demonstration 25 00:01:20,106 --> 00:01:23,156 that worked perfectly whereby Edward found the numbers 26 00:01:23,156 --> 00:01:25,506 in question immediately, but at least the second time 27 00:01:25,506 --> 00:01:27,726 around in theory he was able to assume 28 00:01:27,726 --> 00:01:30,586 that the bottom most array was actually sorted. 29 00:01:30,586 --> 00:01:33,036 So today we ask this question, how do you go 30 00:01:33,036 --> 00:01:34,356 about sorting those numbers? 31 00:01:34,356 --> 00:01:37,326 And for this now we need the ninth volunteer. 32 00:01:37,326 --> 00:01:40,606 Yes, in the pink shirt there; come on down. 33 00:01:41,926 --> 00:01:44,536 So, this gentleman here will be in charge 34 00:01:44,536 --> 00:01:46,536 of the sorting demonstration. 35 00:01:47,426 --> 00:01:50,066 So we have eight numbers, and if you would 36 00:01:50,066 --> 00:01:51,286 as this gentleman comes down think 37 00:01:51,286 --> 00:01:53,166 to yourself you've got eight humans here each 38 00:01:53,166 --> 00:01:55,926 of which is representing an integer, how would you, 39 00:01:55,926 --> 00:01:58,546 the human, go about sorting these folks and let's see 40 00:01:58,546 --> 00:02:00,746 if yours actually coincides with? 41 00:02:00,746 --> 00:02:01,106 >> Student: Adam. 42 00:02:01,106 --> 00:02:03,006 >> Lecturer: With Adam's approach. 43 00:02:03,006 --> 00:02:07,276 So, Adam, your goal here is simply to tell these eight what 44 00:02:07,276 --> 00:02:09,736 to do and sort them from smallest on the left 45 00:02:09,736 --> 00:02:10,566 to largest on the right. 46 00:02:10,566 --> 00:02:10,716 >> Student: All right. 47 00:02:10,896 --> 00:02:13,716 You and you switch. 48 00:02:14,166 --> 00:02:19,556 Will you switch with him? 49 00:02:19,726 --> 00:02:23,576 Will you go to the end, please? 50 00:02:23,956 --> 00:02:30,116 And then will you come in between these two? 51 00:02:31,526 --> 00:02:32,786 >> Lecturer: Round of applause, absolutely. 52 00:02:33,516 --> 00:02:35,796 [ Applause ] 53 00:02:36,296 --> 00:02:37,376 That was fantastic. 54 00:02:37,376 --> 00:02:39,776 Okay. So, what was Adam's approach? 55 00:02:39,776 --> 00:02:42,346 Can someone put into words what he just did? 56 00:02:42,826 --> 00:02:43,696 What is the algorithm? 57 00:02:43,766 --> 00:02:45,666 How would you specify this? 58 00:02:46,776 --> 00:02:48,736 Not so many volunteers now. 59 00:02:48,846 --> 00:02:49,086 Okay? 60 00:02:51,046 --> 00:02:55,176 >> Student: Determine the least number and put 61 00:02:55,396 --> 00:02:57,826 that in the first index and then repeat [inaudible]. 62 00:02:57,826 --> 00:02:58,346 >> Lecturer: Okay, good. 63 00:02:58,346 --> 00:03:01,166 So look for the least, look for the smallest number, 64 00:03:01,166 --> 00:03:03,376 pull that person out and tell them go to the beginning 65 00:03:03,716 --> 00:03:07,086 and then repeat recursively so to speak do the same thing again 66 00:03:07,086 --> 00:03:09,746 but with a problem that is ever so slightly smaller 67 00:03:09,746 --> 00:03:12,536 and minus one instead of N and, in fact, 68 00:03:12,536 --> 00:03:13,306 we got to the end result. 69 00:03:13,536 --> 00:03:14,876 What was your actual approach? 70 00:03:14,876 --> 00:03:18,336 >> Student: It was basically what you said except I also 71 00:03:18,336 --> 00:03:22,336 tried to look to make sure not to change anyone that was 72 00:03:22,336 --> 00:03:23,546 in their correct position already. 73 00:03:23,656 --> 00:03:24,186 >> Lecturer: Okay, good. 74 00:03:24,186 --> 00:03:26,516 So, we had a little optimization heuristic perhaps whereby 75 00:03:26,516 --> 00:03:27,886 if someone was already in the right position, 76 00:03:27,886 --> 00:03:30,586 we didn't move them at all thereby not wasting any cycles. 77 00:03:30,826 --> 00:03:33,396 Okay, so let's see if we can't slap both a label on this 78 00:03:33,396 --> 00:03:36,016 for the sake of discussion, but then also make some claim 79 00:03:36,016 --> 00:03:38,346 as to the efficiency of what Adam just did. 80 00:03:38,346 --> 00:03:43,356 So, let's see here could you go back 81 00:03:43,466 --> 00:03:48,116 to the order you were just in? 82 00:03:48,336 --> 00:03:50,526 Okay. So we pretty much just did what this gentleman said. 83 00:03:50,756 --> 00:03:53,176 We selected so to speak the smallest element. 84 00:03:53,176 --> 00:03:54,086 Is it this person? 85 00:03:54,086 --> 00:03:54,876 Is it this person? 86 00:03:54,876 --> 00:03:55,666 Is it this person? 87 00:03:55,666 --> 00:03:56,036 This person. 88 00:03:56,166 --> 00:03:58,756 Ah-ha, we found it and notice the one difference here. 89 00:03:58,756 --> 00:04:01,016 So, the human, Adam, might have immediately gone 90 00:04:01,016 --> 00:04:03,386 for obviously you're number one you go here, 91 00:04:03,606 --> 00:04:06,186 but remember a computer is not going to be able to sort 92 00:04:06,246 --> 00:04:08,446 of [inaudible] be able to know who is the smallest 93 00:04:08,446 --> 00:04:10,076 and where they are because if you think 94 00:04:10,076 --> 00:04:12,536 about how you would represent these eight humans, 95 00:04:12,866 --> 00:04:16,316 what data structure as we'll call it would you use these days 96 00:04:16,316 --> 00:04:16,576 and see? 97 00:04:17,486 --> 00:04:19,546 So you'd use an array, which you might have had a list 98 00:04:19,546 --> 00:04:22,466 in scratch, but an array even though it allows you random 99 00:04:22,466 --> 00:04:24,556 access where you can jump randomly 100 00:04:24,556 --> 00:04:27,386 to any one location just by indexing 101 00:04:27,386 --> 00:04:30,006 into the array numerically by way of the square brackets, 102 00:04:30,286 --> 00:04:32,106 that doesn't necessarily mean [inaudible] 103 00:04:32,106 --> 00:04:33,466 that you know where to look. 104 00:04:33,466 --> 00:04:34,676 So, for instance, your name is? 105 00:04:34,996 --> 00:04:35,296 >> Student: Gabrielle. 106 00:04:35,486 --> 00:04:36,056 >> Lecturer: Gabrielle. 107 00:04:36,056 --> 00:04:38,286 So Adam didn't necessarily know that Gabrielle was 108 00:04:38,286 --> 00:04:42,136 in index location zero, one, two, three, four; he only knew 109 00:04:42,136 --> 00:04:44,866 that presumably because as a human he very quickly scanned 110 00:04:44,866 --> 00:04:47,956 the list, but much like a computer he had 111 00:04:47,956 --> 00:04:50,446 to start somewhere so start at the left and move to the right 112 00:04:50,446 --> 00:04:52,556 and then finally he saw Gabrielle and then just 113 00:04:52,556 --> 00:04:55,006 to push a little harder here once you found Gabrielle did you 114 00:04:55,006 --> 00:04:56,416 know this was the smallest element? 115 00:04:57,046 --> 00:04:57,356 >> Student: I did. 116 00:04:57,606 --> 00:04:58,706 >> Lecturer: Okay, you did, why? 117 00:04:58,706 --> 00:05:00,726 >> Student: Because I had seen all the other elements. 118 00:05:00,796 --> 00:05:01,456 >> Lecturer: Okay, good. 119 00:05:01,456 --> 00:05:03,666 So, there's at least some knowledge in advance here 120 00:05:03,666 --> 00:05:06,256 as to what the numbers are so he at least was able 121 00:05:06,256 --> 00:05:08,136 to optimize there, but in the worst case 122 00:05:08,136 --> 00:05:10,506 if you didn't have this little cheat sheet that's 20 feet high 123 00:05:10,506 --> 00:05:12,646 over here, what would you have had to do to find 124 00:05:12,646 --> 00:05:13,476 that smallest element? 125 00:05:13,476 --> 00:05:14,216 >> Student: Check the last three. 126 00:05:14,216 --> 00:05:14,646 >> Lecturer: Okay, good. 127 00:05:14,646 --> 00:05:15,706 So check the last element. 128 00:05:15,706 --> 00:05:17,736 So maximally finding Gabrielle 129 00:05:17,736 --> 00:05:19,996 or more generally finding the smallest element would take 130 00:05:20,316 --> 00:05:21,376 from the audience how many steps? 131 00:05:22,856 --> 00:05:24,006 Okay so N steps, right? 132 00:05:24,006 --> 00:05:25,556 You start here, you check four. 133 00:05:25,556 --> 00:05:28,066 Okay, four is actually the smallest element I've seen, 134 00:05:28,446 --> 00:05:29,296 but let me keep looking. 135 00:05:29,366 --> 00:05:30,516 Oh, two is even better; 136 00:05:30,686 --> 00:05:32,096 let's forget about, what was your name? 137 00:05:32,466 --> 00:05:32,596 >> Student: Praya [phonetic]. 138 00:05:32,976 --> 00:05:33,306 >> Lecturer: Praya. 139 00:05:33,306 --> 00:05:35,396 Let's forget about Praya because we've just now found? 140 00:05:35,686 --> 00:05:36,156 >> Student: Juliana. 141 00:05:36,156 --> 00:05:38,696 >> Lecturer: Juliana, who's number is smaller so now think 142 00:05:38,696 --> 00:05:40,666 about how you would implement this I just need to keep 143 00:05:40,666 --> 00:05:44,476 around it seems a variable, I or N or whatever that keeps track 144 00:05:44,476 --> 00:05:48,006 of maybe what the smallest value is or at least where it is. 145 00:05:48,006 --> 00:05:50,636 So, index location 1 and then I keep going, keep going, 146 00:05:50,636 --> 00:05:52,066 keep going, finally I get to Gabrielle 147 00:05:52,256 --> 00:05:54,936 and I update my variable because I found something better 148 00:05:55,206 --> 00:05:57,706 than Juliana and now I keep going and then I get 149 00:05:57,706 --> 00:05:59,566 to these guys and realize that was a waste of time 150 00:05:59,836 --> 00:06:01,526 because I've already found the smallest element. 151 00:06:01,526 --> 00:06:02,756 So, I pull Gabrielle out 152 00:06:02,906 --> 00:06:04,866 and then what did you tell her to do exactly? 153 00:06:05,356 --> 00:06:07,416 >> Student: Switch with Praya I believe. 154 00:06:07,646 --> 00:06:08,156 >> Lecturer: Excellent. 155 00:06:08,686 --> 00:06:09,226 Okay, good. 156 00:06:09,226 --> 00:06:11,936 So notice, and if you could implement this in reality, 157 00:06:12,766 --> 00:06:15,126 so we've switched them and this was actually pretty smart 158 00:06:15,126 --> 00:06:17,936 because what would another approach have been to make room 159 00:06:18,206 --> 00:06:19,166 for Gabrielle down here? 160 00:06:20,686 --> 00:06:20,806 Yeah? 161 00:06:20,806 --> 00:06:22,786 >> Student: Maybe [inaudible] one extra spot. 162 00:06:23,006 --> 00:06:23,326 >> Lecturer: Okay. 163 00:06:23,326 --> 00:06:25,366 So we could make another array, and we'll see 164 00:06:25,366 --> 00:06:28,306 in time how you can create arrays in advanced that are 165 00:06:28,306 --> 00:06:30,946 as big as you'd like them subject to RAM constraints 166 00:06:31,166 --> 00:06:32,396 so we could just kind of make a bunch 167 00:06:32,396 --> 00:06:34,366 of blank space then start filling in the holes. 168 00:06:34,366 --> 00:06:36,126 What would another approach be as opposed 169 00:06:36,126 --> 00:06:38,606 to just having these two swap? 170 00:06:39,336 --> 00:06:40,956 So we could shift people, right? 171 00:06:40,956 --> 00:06:42,766 If you could go back to your original locations 172 00:06:42,766 --> 00:06:43,526 for just a second? 173 00:06:43,776 --> 00:06:46,806 Another very reasonable approach would have been to say, okay, 174 00:06:46,806 --> 00:06:48,286 come out to the line if you would. 175 00:06:48,486 --> 00:06:51,236 You guys make room, she's got to go into the front of the list, 176 00:06:51,236 --> 00:06:55,386 but that's another one, two, three, four steps just 177 00:06:55,386 --> 00:06:56,926 to make room now for Gabrielle. 178 00:06:57,206 --> 00:06:59,266 So, we have a couple of options here, right? 179 00:06:59,476 --> 00:07:02,316 Clearly in a program it's going to take us N steps 180 00:07:02,316 --> 00:07:03,546 to at least find Gabrielle 181 00:07:03,546 --> 00:07:06,046 or more generally the smallest number, but even then 182 00:07:06,046 --> 00:07:08,176 after that there's some clever decisions we can make 183 00:07:08,176 --> 00:07:11,626 like as Adam did swapping the two instead of foolishly, 184 00:07:11,626 --> 00:07:13,856 though correctly, shifting everyone down. 185 00:07:14,236 --> 00:07:15,506 So let's now ask the question, 186 00:07:15,536 --> 00:07:18,086 if this algorithm is essentially defined 187 00:07:18,086 --> 00:07:21,726 as selecting the smallest person, moving them immediately 188 00:07:21,726 --> 00:07:23,696 to the front of the list and then repeating 189 00:07:23,696 --> 00:07:26,756 as this gentleman said, how many steps maximally is this 190 00:07:26,756 --> 00:07:29,276 algorithm going to take to sort everyone? 191 00:07:29,276 --> 00:07:34,826 So it's not as bad as N factorial, thankfully, 192 00:07:34,826 --> 00:07:37,746 because that would actually get really slow really fast. 193 00:07:38,396 --> 00:07:39,886 >> Student: A little less than N squared. 194 00:07:39,886 --> 00:07:40,066 >> Lecturer: Oh, okay. 195 00:07:40,066 --> 00:07:40,846 Say again? 196 00:07:41,116 --> 00:07:42,386 >> Student: A little less than N square. 197 00:07:42,386 --> 00:07:44,446 >> Lecturer: Yeah, it's actually a little less than N squared. 198 00:07:44,446 --> 00:07:45,626 Well, why is that? 199 00:07:45,626 --> 00:07:46,156 Well, let's see. 200 00:07:46,156 --> 00:07:49,806 If I take N steps initially, I take N steps and I get here, 201 00:07:49,806 --> 00:07:50,936 but now what do I have to do? 202 00:07:51,076 --> 00:07:52,236 I essentially have to repeat, 203 00:07:52,236 --> 00:07:53,856 but I can slightly optimize, right? 204 00:07:53,856 --> 00:07:55,546 Because I already put Gabrielle in the right place 205 00:07:55,846 --> 00:07:57,636 so I don't have to look at her number again 206 00:07:57,876 --> 00:07:59,556 so now I have N minus one steps 207 00:07:59,926 --> 00:08:02,696 and then the next iteration N minus two steps then N minus 208 00:08:02,756 --> 00:08:05,156 three then N minus four dot, dot, 209 00:08:05,156 --> 00:08:06,806 dot so that begs the question in the end, 210 00:08:07,046 --> 00:08:10,556 what is N plus N minus one plus N minus two plus dot, dot, 211 00:08:10,556 --> 00:08:12,116 dot down to one or down to zero? 212 00:08:12,566 --> 00:08:13,136 What is that? 213 00:08:13,136 --> 00:08:14,806 Anyone remember the little cheat sheet at the back 214 00:08:14,806 --> 00:08:15,626 of your high school math book? 215 00:08:18,386 --> 00:08:22,636 Yeah, so it's N times N plus one all over two, 216 00:08:23,146 --> 00:08:24,236 which is roughly what? 217 00:08:24,686 --> 00:08:24,976 N squared. 218 00:08:25,046 --> 00:08:29,516 So, let me part the river here for just one moment. 219 00:08:29,596 --> 00:08:31,786 Sorry, we'll get you some cough medicine shortly. 220 00:08:32,276 --> 00:08:36,056 So, that series N plus N minus one -- 221 00:08:37,156 --> 00:08:39,996 I promise not to do too much math while you guys are standing 222 00:08:39,996 --> 00:08:42,336 here -- plus, whoops, not N plus one, 223 00:08:42,396 --> 00:08:46,076 N minus one plus N minus two plus dot, dot, dot, 224 00:08:46,146 --> 00:08:48,306 plus all the way down to the very last step. 225 00:08:48,616 --> 00:08:52,066 You may recall that this generalizes N times N plus one 226 00:08:52,066 --> 00:08:53,536 over two, what does that equal? 227 00:08:53,756 --> 00:08:59,556 Well, that's N squared plus N over two and now is 228 00:08:59,556 --> 00:09:02,326 when we apparently have a way of expressing the running time 229 00:09:02,326 --> 00:09:05,866 of this algorithm and we talked on Monday about big O and omega 230 00:09:05,866 --> 00:09:07,326 and what those actually mean. 231 00:09:07,326 --> 00:09:10,106 So, what's the worst case running time of this algorithm? 232 00:09:11,536 --> 00:09:13,486 So it feels like it's big O of this. 233 00:09:13,486 --> 00:09:16,726 So, big O of this thing here, but one of the rules of thumb 234 00:09:16,726 --> 00:09:18,736 when it comes to talking about the running time 235 00:09:18,736 --> 00:09:21,966 of an algorithm mathematically or formally like this is 236 00:09:21,966 --> 00:09:25,116 when you start talking about really big values of N, 237 00:09:25,566 --> 00:09:28,206 adding another little n like this doesn't really matter. 238 00:09:28,236 --> 00:09:30,796 You start to care more about the degree, 239 00:09:31,206 --> 00:09:35,096 the member in the formula that has the highest exponent, 240 00:09:35,096 --> 00:09:36,986 for instance, because if N is not just eight 241 00:09:37,276 --> 00:09:41,026 but N is a billon, well, N squared, a billion squared, 242 00:09:41,226 --> 00:09:44,346 that's a really big damn number plus one more billion not 243 00:09:44,456 --> 00:09:48,166 so material and so when we talk about big O notation, 244 00:09:48,476 --> 00:09:51,806 we tend to get rid of the smaller terms and this divided 245 00:09:51,806 --> 00:09:54,576 by two, I mean that's really not so interesting either 246 00:09:54,576 --> 00:09:56,736 that we actually only did half as many steps 247 00:09:57,116 --> 00:09:58,796 and the motivation conceptually 248 00:09:58,796 --> 00:10:01,746 for this is essentially this idea. 249 00:10:01,746 --> 00:10:03,806 When I started counting everyone linearly 250 00:10:03,876 --> 00:10:07,176 in the lecture hall the other day, one, two, three, 251 00:10:07,506 --> 00:10:08,876 we immediately improved on that. 252 00:10:09,016 --> 00:10:10,616 That was a big O of N algorithm, 253 00:10:10,616 --> 00:10:14,536 but I immediately started doing two, four, six, eight, ten. 254 00:10:14,766 --> 00:10:17,326 So that's like an N over two algorithm 255 00:10:17,326 --> 00:10:20,606 and mathematically N is slower than N over two 256 00:10:20,606 --> 00:10:22,976 because I'm doing twice as many steps in the first version 257 00:10:23,166 --> 00:10:25,676 and half as many steps in the second, but you know what? 258 00:10:25,676 --> 00:10:29,576 If I wait 12 months, 18 months, you know, Intel is going to come 259 00:10:29,576 --> 00:10:32,186 out with a faster CPU that, you know, tends historically 260 00:10:32,186 --> 00:10:34,956 to be twice as fast and so over the course 261 00:10:34,956 --> 00:10:39,346 of a year my algorithm naive though it was automatically gets 262 00:10:39,426 --> 00:10:41,646 twice as fast, but that's not interesting. 263 00:10:41,646 --> 00:10:44,716 We need to be able to abstract away from the running time 264 00:10:44,716 --> 00:10:48,046 of me the human, the running, sorry, the speed of me, 265 00:10:48,046 --> 00:10:49,356 the human, or the speed 266 00:10:49,356 --> 00:10:51,556 of the machine implementing these algorithms 267 00:10:51,806 --> 00:10:53,776 so that we can say something interesting 268 00:10:53,776 --> 00:10:56,336 about the performance of algorithms independent 269 00:10:56,336 --> 00:10:58,206 of however many gigahertz happens to be 270 00:10:58,206 --> 00:10:59,586 in vogue at the moment. 271 00:11:00,066 --> 00:11:02,046 So, with that said the big O running time 272 00:11:02,046 --> 00:11:04,946 of this algorithm we would say it's just big O of N squared. 273 00:11:05,316 --> 00:11:06,886 Now what about in the best case? 274 00:11:06,886 --> 00:11:09,186 If you guys could realign as you were a moment ago, 275 00:11:09,546 --> 00:11:10,856 this is not the best case, 276 00:11:10,856 --> 00:11:12,646 but what would the best case scenario be 277 00:11:12,646 --> 00:11:14,446 for Adam upon arriving on stage? 278 00:11:14,446 --> 00:11:16,606 >> Student: Everyone already in the right place. 279 00:11:16,736 --> 00:11:16,986 >> Lecturer: Okay. 280 00:11:16,986 --> 00:11:18,276 So everyone is already in the right place. 281 00:11:18,276 --> 00:11:20,766 So, if you could quickly sort yourselves from one 282 00:11:20,766 --> 00:11:24,236 through eight so now Adam has just arrived on stage, 283 00:11:24,506 --> 00:11:27,716 he is handed this array say with no other knowledge in advance, 284 00:11:27,906 --> 00:11:30,666 if he applies that same algorithm of looking 285 00:11:30,666 --> 00:11:32,656 for the smallest element, putting him or her 286 00:11:32,656 --> 00:11:34,086 at the front then repeating 287 00:11:34,086 --> 00:11:36,606 for the N minus one remaining elements then repeating 288 00:11:36,606 --> 00:11:39,256 recursively, how many steps is the algorithm going 289 00:11:39,256 --> 00:11:41,906 to take this time? 290 00:11:42,046 --> 00:11:42,126 >> Student: Eight. 291 00:11:42,556 --> 00:11:43,786 >> Lecturer: So it's actually not eight. 292 00:11:43,786 --> 00:11:47,426 At least as we've described it here, it's the same number 293 00:11:47,426 --> 00:11:51,696 because what does he not know upon each iteration of this 294 00:11:52,006 --> 00:11:54,116 if we implement exactly the same algorithm? 295 00:11:54,226 --> 00:11:56,226 >> Student: I'm still not sure it one is the smallest element 296 00:11:56,226 --> 00:11:58,116 so I have to still check all the way down. 297 00:11:58,116 --> 00:11:58,716 >> Lecturer: Exactly. 298 00:11:58,716 --> 00:12:00,746 So, again, you can [inaudible] the cheat sheet that's up here 299 00:12:00,746 --> 00:12:02,956 if Adam is the computer that's been handed this array 300 00:12:02,956 --> 00:12:05,676 of eight numbers, it's wonderful that they're sorted, 301 00:12:05,966 --> 00:12:07,616 but his algorithm is not designed 302 00:12:07,616 --> 00:12:09,936 as we've just described it to really take that into account 303 00:12:09,936 --> 00:12:11,806 because he's still going to start at the left and he's going 304 00:12:11,806 --> 00:12:13,746 to realize, oh, wow, this is a really small number, 305 00:12:13,746 --> 00:12:16,796 I'm going to remember this one bigger, bigger, bigger, bigger, 306 00:12:16,796 --> 00:12:19,696 bigger, bigger, damn, I just wasted seven steps and 307 00:12:19,696 --> 00:12:22,376 yet I had the winner at the very beginning, okay, so I'm going 308 00:12:22,376 --> 00:12:24,446 to leave her where she is now let me repeat 309 00:12:24,496 --> 00:12:26,626 for the remaining elements, all right? 310 00:12:26,626 --> 00:12:28,516 Here is the small number, two, I'll remember this. 311 00:12:29,016 --> 00:12:32,326 Bigger, bigger, bigger, ah, I've wasted more time, but I'm going 312 00:12:32,446 --> 00:12:34,056 to keep wasting time because at least 313 00:12:34,056 --> 00:12:36,666 with this algorithm defined as iteratively 314 00:12:36,726 --> 00:12:38,826 or recursively selecting the smallest element 315 00:12:38,826 --> 00:12:41,996 and plopping them at the beginning, there's no cognizance 316 00:12:42,076 --> 00:12:45,186 of whether or not the list is already sorted. 317 00:12:45,386 --> 00:12:47,466 Now, you might think well this is just stupid now, right? 318 00:12:47,466 --> 00:12:49,066 Just keep track of whether 319 00:12:49,066 --> 00:12:51,986 or not you've seen smaller numbers before and that's fine, 320 00:12:52,146 --> 00:12:54,206 but that's a different algorithm then, right? 321 00:12:54,206 --> 00:12:55,606 We need more memory for that 322 00:12:55,606 --> 00:12:56,956 or we need a different set of steps. 323 00:12:57,446 --> 00:12:59,156 So let's try one other approach at least here. 324 00:12:59,156 --> 00:13:01,316 Could you guys reshuffle in this order? 325 00:13:02,526 --> 00:13:03,906 And this is really random. 326 00:13:03,906 --> 00:13:05,336 It doesn't matter what they reset to, 327 00:13:05,636 --> 00:13:07,196 but at least this gets it done quickly. 328 00:13:07,366 --> 00:13:09,256 So what might another approach be? 329 00:13:09,256 --> 00:13:11,526 Let's tell Adam this time what to do rather 330 00:13:11,526 --> 00:13:12,636 than leaving it to him. 331 00:13:12,636 --> 00:13:14,936 How else could we go about sorting these eight people? 332 00:13:15,496 --> 00:13:16,866 >> Student: Bubble sort. 333 00:13:17,056 --> 00:13:17,996 >> Lecturer: Bubble sort. 334 00:13:18,086 --> 00:13:19,356 What a wonderful lead in. 335 00:13:19,356 --> 00:13:21,436 Okay, well, what does that mean? 336 00:13:21,436 --> 00:13:23,876 Well, let's think about another very simple approach. 337 00:13:24,176 --> 00:13:25,556 You know, this is a really kind 338 00:13:25,556 --> 00:13:28,746 of overwhelming problem eight numbers, I'm the program, 339 00:13:28,816 --> 00:13:31,386 this is a lot of work to do, let me keep it simple 340 00:13:31,386 --> 00:13:32,246 and let me start at the left. 341 00:13:32,316 --> 00:13:35,816 So, these two here are clearly out of order, four is bigger 342 00:13:35,816 --> 00:13:37,636 than two, but two should be to the left 343 00:13:37,636 --> 00:13:38,466 of four so you know what? 344 00:13:38,776 --> 00:13:40,486 Let me just try a very simple algorithm 345 00:13:40,486 --> 00:13:42,976 of swapping these two volunteers. 346 00:13:42,976 --> 00:13:43,806 So we swap them. 347 00:13:43,806 --> 00:13:46,946 That takes one step or maybe three steps if you think back 348 00:13:47,006 --> 00:13:50,426 to our swap preparation, but a fixed number of steps, 349 00:13:50,426 --> 00:13:51,686 a constant number of steps. 350 00:13:51,686 --> 00:13:52,406 And that's okay. 351 00:13:52,576 --> 00:13:55,076 We're more worried about recurring costs, all right? 352 00:13:55,346 --> 00:13:56,806 So, now I compare four and six. 353 00:13:56,806 --> 00:13:58,206 Oh, these two are actually in pretty good shape. 354 00:13:58,206 --> 00:13:59,206 I'm going to leave them alone. 355 00:13:59,576 --> 00:14:00,626 Six and eight, they're good. 356 00:14:00,626 --> 00:14:03,116 Oh, eight and one, let me do a swap, 357 00:14:03,566 --> 00:14:04,946 but again, I'm not backtracking. 358 00:14:04,946 --> 00:14:06,546 I'm still making forward progressive. 359 00:14:06,546 --> 00:14:09,466 Eight and three, got to swap these two; eight and seven, 360 00:14:09,466 --> 00:14:11,836 going to swap these two; and then eight 361 00:14:11,836 --> 00:14:13,556 and five going to swap these two. 362 00:14:13,806 --> 00:14:14,496 Done, right? 363 00:14:16,296 --> 00:14:18,216 All right so obviously not, right? 364 00:14:18,216 --> 00:14:19,236 The list is not sorted. 365 00:14:19,236 --> 00:14:22,796 So I've taken N steps here and done some swaps in the middle, 366 00:14:22,796 --> 00:14:24,926 but any swaps were just constant time anyway. 367 00:14:24,926 --> 00:14:26,156 It's always the same number of steps 368 00:14:26,206 --> 00:14:27,946 to move two humans or to swap two ends. 369 00:14:28,286 --> 00:14:29,176 So, what do I do now? 370 00:14:29,416 --> 00:14:30,406 Well, let me try repeating. 371 00:14:30,406 --> 00:14:32,756 Two and four, no, four and six? 372 00:14:32,756 --> 00:14:33,696 Off six and one. 373 00:14:33,936 --> 00:14:35,946 And now notice what's happening to Gabrielle this time. 374 00:14:35,946 --> 00:14:36,956 Go ahead and swap. 375 00:14:37,196 --> 00:14:39,496 So notice, and we can slap the label that's already been 376 00:14:39,496 --> 00:14:40,966 applied to it on this algorithm. 377 00:14:41,256 --> 00:14:45,516 The smallest number seems to be bubbling her way up to this end 378 00:14:45,516 --> 00:14:48,666 of the list and meanwhile eight has already bubbled his way 379 00:14:48,866 --> 00:14:50,146 up to the end of the list. 380 00:14:50,146 --> 00:14:53,116 So this algorithm is, in fact, generally called bubble sort. 381 00:14:53,356 --> 00:14:56,696 It is just as correct as the previous one that Adam proposed, 382 00:14:56,696 --> 00:14:59,076 which was generally called selection sort 383 00:14:59,476 --> 00:15:01,946 so that begs the question, is one better than the other? 384 00:15:02,136 --> 00:15:06,166 Well, how many steps is bubble sort going to take? 385 00:15:06,726 --> 00:15:08,516 So it's also N squared. 386 00:15:08,516 --> 00:15:09,396 Now, why is that? 387 00:15:09,396 --> 00:15:09,966 Let's see. 388 00:15:10,066 --> 00:15:13,326 Every time I want to move from left to right I'm looking 389 00:15:13,486 --> 00:15:15,616 at eight elements and maybe doing some constant number 390 00:15:15,616 --> 00:15:16,226 of swaps. 391 00:15:16,226 --> 00:15:20,586 So, some product of N am I doing through each iteration, 392 00:15:20,946 --> 00:15:24,556 but in the worst case, who is where in this scenario? 393 00:15:25,246 --> 00:15:29,796 So in the worst case, maybe Gabrielle is all the way 394 00:15:29,796 --> 00:15:31,266 over here and this gentleman. 395 00:15:31,616 --> 00:15:31,926 >> Student: Ravi [phonetic]. 396 00:15:32,076 --> 00:15:33,746 >> Lecturer: Ravi is all the way over there 397 00:15:33,746 --> 00:15:36,506 so that then begs the question if on each iteration someone 398 00:15:36,506 --> 00:15:39,616 like Ravi or Gabrielle only bubble up one place to the left 399 00:15:39,866 --> 00:15:42,896 or one place to the right, how many times am I going to have 400 00:15:42,896 --> 00:15:46,246 to repeat this process to pull Ravi all the way that way 401 00:15:46,246 --> 00:15:47,916 or Gabrielle all the way this way? 402 00:15:48,246 --> 00:15:50,736 Well, another N times or N minus one. 403 00:15:50,976 --> 00:15:54,876 So I have to do N things N times 404 00:15:54,876 --> 00:15:57,396 and that one conceptually is perhaps even easier 405 00:15:57,396 --> 00:15:58,606 to wrap one's mind around. 406 00:15:58,606 --> 00:16:00,056 It's big O of N square. 407 00:16:00,306 --> 00:16:02,836 Now, I think we've had you up here 408 00:16:02,836 --> 00:16:04,506 for an awkwardly long enough moment. 409 00:16:04,506 --> 00:16:05,966 A big round of applause if we could for Adam and these guys? 410 00:16:06,181 --> 00:16:08,181 [ Applause ] 411 00:16:08,346 --> 00:16:12,386 Little souvenir and if you want, our paperwork awaits. 412 00:16:12,496 --> 00:16:12,856 Thank you. 413 00:16:13,526 --> 00:16:14,196 So, let's. 414 00:16:14,196 --> 00:16:14,866 >> Student: [Inaudible]. 415 00:16:14,866 --> 00:16:15,436 >> Lecturer: No, no. 416 00:16:15,436 --> 00:16:17,426 We don't need to do it every time. 417 00:16:17,666 --> 00:16:20,396 We have you for life now. 418 00:16:20,396 --> 00:16:25,036 All right so let me go ahead and show exactly what this means 419 00:16:25,186 --> 00:16:27,386 for something to be taken N squared time 420 00:16:27,386 --> 00:16:28,636 because we'll actually see 421 00:16:28,806 --> 00:16:31,016 that N squared though very simply defined 422 00:16:31,236 --> 00:16:33,846 and that was actually pretty simple to implement if you think 423 00:16:33,846 --> 00:16:36,686 about it I've got a four loop then maybe another four loop 424 00:16:36,686 --> 00:16:40,306 so I can pretty relatively easily translate what we just 425 00:16:40,306 --> 00:16:42,536 did with Adam and team into code. 426 00:16:42,726 --> 00:16:44,806 Well, let's see if we can't visualize it as well. 427 00:16:44,806 --> 00:16:46,816 So this is a length that we have on the course's website 428 00:16:46,816 --> 00:16:49,566 under lectures and it's just a really nice Java applet, 429 00:16:49,566 --> 00:16:51,436 a program someone happened to write in Java, 430 00:16:51,696 --> 00:16:52,736 works in a web browser, 431 00:16:52,736 --> 00:16:54,976 that helps us visualize some of these algorithms. 432 00:16:54,976 --> 00:16:57,776 If I go down here to this drop down if you want to play along 433 00:16:57,776 --> 00:17:00,686 at home and I go choose bubble sorts, array type random, 434 00:17:00,686 --> 00:17:03,136 it's just going to populate it randomly, 435 00:17:03,396 --> 00:17:05,466 and this delay is just going to control the speed 436 00:17:05,466 --> 00:17:06,306 of this demonstration. 437 00:17:06,306 --> 00:17:08,736 I'm going to make it ten milliseconds or whatever this is 438 00:17:08,976 --> 00:17:10,626 and now I'm going to go ahead and click start. 439 00:17:11,026 --> 00:17:15,866 What we have here is more, a larger input set. 440 00:17:16,166 --> 00:17:18,936 So, each of these bars represents a number or a human. 441 00:17:19,186 --> 00:17:21,786 The taller the bar the bigger the number; the shorter the bar, 442 00:17:22,076 --> 00:17:24,196 the shorter the number, the smaller the number. 443 00:17:24,196 --> 00:17:26,686 So Gabrielle would be the small bar up here number one 444 00:17:26,686 --> 00:17:28,646 and Ravi would be one of the larger bars at right. 445 00:17:28,646 --> 00:17:29,686 That's the analog here. 446 00:17:30,016 --> 00:17:32,346 So what's happening in red from left to right is 447 00:17:32,346 --> 00:17:34,506 that this program is implementing bubble sort. 448 00:17:34,506 --> 00:17:37,246 It's comparing pairs or neighbors of elements 449 00:17:37,246 --> 00:17:40,086 and if they happen to be out of order, it swaps those two 450 00:17:40,286 --> 00:17:42,626 but then forges ahead to do it again to the next pair 451 00:17:42,626 --> 00:17:45,686 of neighbors and what you can see perhaps more easily 452 00:17:45,686 --> 00:17:47,376 since this is a longer demonstration 453 00:17:47,376 --> 00:17:50,976 that bigger numbers are, in fact, bubbling up albeit slowly 454 00:17:51,416 --> 00:17:54,456 and smaller numbers are bubbling their way down. 455 00:17:54,786 --> 00:18:00,746 Now this is only what, that looks like 30 or 40 bars total? 456 00:18:00,746 --> 00:18:04,026 So N is not very large and granted we fixed the speed 457 00:18:04,026 --> 00:18:06,316 of this thing, but it's already getting kind of boring. 458 00:18:06,486 --> 00:18:09,276 So, N squared does not scale particularly well. 459 00:18:09,336 --> 00:18:13,056 Sorting elements with even more than 40 elements or whatever is 460 00:18:13,056 --> 00:18:16,086 up here definitely takes a while and let's see then 461 00:18:16,086 --> 00:18:18,046 if selection sort does a little better. 462 00:18:18,046 --> 00:18:18,826 So I'm going to go ahead 463 00:18:18,826 --> 00:18:21,006 and choose selection sort as it's called. 464 00:18:21,006 --> 00:18:23,166 I'm going to go ahead and click start, again, 465 00:18:23,816 --> 00:18:26,046 and now notice less seems to be happening, 466 00:18:26,366 --> 00:18:30,406 but that's because this algorithm is swapping bars 467 00:18:30,406 --> 00:18:33,326 as far to the left as possible on each iteration. 468 00:18:33,326 --> 00:18:34,276 So what's getting highlighted 469 00:18:34,276 --> 00:18:37,706 in red is the then smallest element that's been discovered 470 00:18:38,246 --> 00:18:40,716 say by Adam and then it's being moved all the way to the left. 471 00:18:41,076 --> 00:18:42,806 So, what's a little neat about this algorithm is 472 00:18:42,806 --> 00:18:44,716 that you can actually see the progress 473 00:18:44,716 --> 00:18:47,796 and that definitely happened more quickly, right? 474 00:18:47,796 --> 00:18:49,746 If we ran it again or you actually timed it, 475 00:18:49,746 --> 00:18:51,246 I mean I'm still talking and 476 00:18:51,246 --> 00:18:54,536 yet this thing is done selection sort seems to be faster 477 00:18:54,536 --> 00:18:57,626 and that's because what I've crossed off here 478 00:18:57,816 --> 00:19:01,216 for formality sake definitely has real-world implications. 479 00:19:01,406 --> 00:19:05,666 So N squared is slower than N squared divided by two 480 00:19:06,036 --> 00:19:08,456 for instance, but realize once we start talking 481 00:19:08,456 --> 00:19:11,696 about really large data sets, the number of Face Book users 482 00:19:11,696 --> 00:19:14,346 out there or the number of web pages that Google indexes, 483 00:19:14,346 --> 00:19:16,906 this is when things like N squared 484 00:19:16,906 --> 00:19:19,986 and these design decisions really start to hurt potentially 485 00:19:20,226 --> 00:19:22,186 because they take more time 486 00:19:22,506 --> 00:19:24,996 than you might necessarily have patience for. 487 00:19:25,146 --> 00:19:26,766 So let's see if we can't express this in a way 488 00:19:26,766 --> 00:19:27,966 that is more easily translated to code. 489 00:19:28,046 --> 00:19:31,656 So here's bubble sort pseudo code for it. 490 00:19:31,656 --> 00:19:33,286 You can write this in any number of ways, 491 00:19:33,286 --> 00:19:34,866 but the algorithm essentially boiled 492 00:19:34,866 --> 00:19:36,986 down to repeat the following N times. 493 00:19:37,166 --> 00:19:40,496 For each of the elements in the array if I 494 00:19:40,736 --> 00:19:42,456 as we'll call it iteratively and its neighbor are 495 00:19:42,456 --> 00:19:43,856 out of order, swap them. 496 00:19:44,216 --> 00:19:45,586 So this is, again, another lesson 497 00:19:45,586 --> 00:19:48,096 in pseudo code what it means to communicate your ideas 498 00:19:48,226 --> 00:19:50,596 without getting bogged down in uninteresting minutia [phonetic] 499 00:19:50,596 --> 00:19:53,386 like syntax specific to C or any other language, 500 00:19:53,666 --> 00:19:54,996 but with the expression 501 00:19:54,996 --> 00:19:59,366 of pseudo code here you can see perhaps more clearly the actual 502 00:19:59,366 --> 00:20:00,026 running time. 503 00:20:00,026 --> 00:20:03,096 You can see in the very first line there repeat N times. 504 00:20:03,426 --> 00:20:05,176 You have an explicit directive 505 00:20:05,216 --> 00:20:07,326 that the following is going the take N steps, 506 00:20:07,836 --> 00:20:10,626 but beneath that it says for each element I. 507 00:20:10,736 --> 00:20:12,026 Well, how many elements are there? 508 00:20:12,026 --> 00:20:16,316 There's N elements in the array so a four each loop is going 509 00:20:16,316 --> 00:20:19,156 to induce another N step so you can immediately see 510 00:20:19,156 --> 00:20:20,486 in the pseudo code alone 511 00:20:20,726 --> 00:20:22,156 that this algorithm is apparently going 512 00:20:22,156 --> 00:20:26,716 to take N times N steps to actually get the job done. 513 00:20:27,106 --> 00:20:31,106 Now bubble sort actually allows for more optimization though. 514 00:20:31,106 --> 00:20:33,236 So, when we had our eight volunteers up here, 515 00:20:33,486 --> 00:20:37,166 we for the sake of selection sort told them to get 516 00:20:37,166 --> 00:20:39,906 into proper order and then we asked the question how long 517 00:20:39,906 --> 00:20:42,046 would it take to sort these elements using selection sort 518 00:20:42,046 --> 00:20:44,956 in the best case, but we didn't ask that in bubble sort's case. 519 00:20:45,286 --> 00:20:48,906 So, if in bubble sort's case you have eight humans here from one 520 00:20:49,046 --> 00:20:51,126 on up to eight already sorted, 521 00:20:51,516 --> 00:20:54,486 how many steps is this algorithm going to take as written? 522 00:20:54,486 --> 00:20:55,066 >> Student: [Inaudible]. 523 00:20:55,066 --> 00:21:00,176 >> Lecturer: So I'm hearing a few answers among 524 00:21:00,176 --> 00:21:05,916 which are N squared N, I heard seven, but it is, in fact, 525 00:21:05,916 --> 00:21:08,016 again just look at the code if you will 526 00:21:08,276 --> 00:21:11,446 and do the following N times for each of the elements 527 00:21:11,666 --> 00:21:14,206 so N more things if I and its neighbor are 528 00:21:14,206 --> 00:21:15,436 out of order, swap them. 529 00:21:15,596 --> 00:21:17,536 No matter what this implementation 530 00:21:17,536 --> 00:21:19,896 of bubble sort is just blindly telling you 531 00:21:19,896 --> 00:21:21,946 to do the following N times, 532 00:21:22,396 --> 00:21:25,296 N times this algorithm takes N squared steps. 533 00:21:25,736 --> 00:21:27,756 So, if Adam had come on stage and realized, 534 00:21:27,756 --> 00:21:30,116 had been handed an array of eight people all 535 00:21:30,116 --> 00:21:32,086 of whom were sorted already, doesn't matter. 536 00:21:32,086 --> 00:21:35,666 He's going to very blindly compare again and again 537 00:21:35,926 --> 00:21:39,976 for a total of N squared steps all of the humans in that row 538 00:21:39,976 --> 00:21:42,536 and then eventually he's going to stop because what's neat 539 00:21:42,536 --> 00:21:44,486 about this algorithm at least is 540 00:21:44,486 --> 00:21:48,776 that because you are doing it N times N times no matter what it 541 00:21:48,776 --> 00:21:52,216 will have worked by the end, but there's clearly an opportunity 542 00:21:52,216 --> 00:21:53,296 for optimization here. 543 00:21:53,296 --> 00:21:58,516 So, bubble sort right now let's slap some numbers on these just 544 00:21:58,516 --> 00:22:01,876 to have sort of a list and then see if we can't do better. 545 00:22:02,246 --> 00:22:07,806 So bubble sort, so bubble sort thus far it seems 546 00:22:07,806 --> 00:22:09,726 to be big O of N squared. 547 00:22:09,976 --> 00:22:14,996 Selection sort seems to be also N O of N squared 548 00:22:14,996 --> 00:22:19,716 but we also said it also is in omega of N squared because, 549 00:22:19,716 --> 00:22:21,776 again, you just keep going through and going 550 00:22:21,776 --> 00:22:23,916 through selecting the N smallest element but you don't know 551 00:22:23,916 --> 00:22:25,736 in advanced that you found the smallest element 552 00:22:25,856 --> 00:22:26,446 until you've checked. 553 00:22:27,036 --> 00:22:30,306 So, is Omega a bubble sort better in this case? 554 00:22:31,286 --> 00:22:33,406 So, no so can we do better? 555 00:22:33,406 --> 00:22:35,206 How would you optimize this algorithm? 556 00:22:35,206 --> 00:22:37,476 How would you after receiving some feedback 557 00:22:37,516 --> 00:22:40,076 from your TF go back to the drawing board 558 00:22:40,076 --> 00:22:46,056 and just add a couple lines of code to fix this, yeah? 559 00:22:46,056 --> 00:22:46,426 >> Student: [Inaudible]. 560 00:22:46,426 --> 00:22:46,536 >> Lecturer: Yeah. 561 00:22:47,036 --> 00:22:49,446 Okay, so keep track of how many switches you've made 562 00:22:49,446 --> 00:22:50,476 by way of a variable. 563 00:22:50,506 --> 00:22:53,486 So initialize some counter variable in there, initialize it 564 00:22:53,486 --> 00:22:56,596 to zero and then as you iterate over all of the elements 565 00:22:56,656 --> 00:23:00,276 if you actually execute this if condition and swap elements, 566 00:23:00,276 --> 00:23:01,406 what should you do with this variable? 567 00:23:02,126 --> 00:23:04,126 Little plus, plus. 568 00:23:04,686 --> 00:23:06,106 Right? Or it could be a Boolean. 569 00:23:06,106 --> 00:23:07,126 It could just be a Boolean 570 00:23:07,126 --> 00:23:10,446 that says true I have swapped something because then 571 00:23:10,446 --> 00:23:13,026 when I get to the end of the list and I meet Ravi, 572 00:23:13,026 --> 00:23:15,446 who has been here the whole time because he's already sorted 573 00:23:15,446 --> 00:23:18,936 and he's number eight, what do I then check before deciding 574 00:23:18,936 --> 00:23:20,366 whether or not to repeat this algorithm? 575 00:23:20,366 --> 00:23:20,906 >> Student: [Inaudible]. 576 00:23:20,906 --> 00:23:21,006 >> Lecturer: Right. 577 00:23:22,696 --> 00:23:25,936 So if my variable equals equal zero, 578 00:23:26,236 --> 00:23:28,196 don't waste your time going back to the beginning 579 00:23:28,196 --> 00:23:29,356 and repeating this algorithm. 580 00:23:29,396 --> 00:23:33,006 Instead do something like if variable equals equal zero, 581 00:23:34,236 --> 00:23:35,516 new line, break. 582 00:23:35,906 --> 00:23:38,206 So break is a syntax you might have seen in section 583 00:23:38,206 --> 00:23:41,396 or some past examples you can forcibly exit a loop simply 584 00:23:41,396 --> 00:23:44,236 by executing the statement break semicolon and so 585 00:23:44,236 --> 00:23:46,826 that would be the easiest way of translating 586 00:23:46,826 --> 00:23:49,906 that little optimization to code here. 587 00:23:50,176 --> 00:23:51,886 So, what are the returns? 588 00:23:52,066 --> 00:23:55,386 What then is the best case running time of bubble sort? 589 00:23:55,386 --> 00:24:00,676 Yeah, it's just N and that's cool, right? 590 00:24:00,676 --> 00:24:01,896 So that was easy. 591 00:24:01,946 --> 00:24:03,966 Just by adding two lines of code and, again, 592 00:24:03,966 --> 00:24:06,576 testament to this idea of putting up a little more effort 593 00:24:06,576 --> 00:24:09,436 and thought and intellect into the design of an algorithm, 594 00:24:09,436 --> 00:24:12,706 can you significantly chip away at the running time, 595 00:24:12,706 --> 00:24:14,666 which as we saw on that big chart the other day, 596 00:24:14,946 --> 00:24:18,666 something that's linear is much, much better as N gets really, 597 00:24:18,666 --> 00:24:21,616 really large than something that's quadratic or polynomial. 598 00:24:21,956 --> 00:24:26,736 So this is already a good thing, but it still feels pretty slow. 599 00:24:26,736 --> 00:24:29,116 So is there another approach altogether? 600 00:24:29,146 --> 00:24:32,626 Well, I'll wave my hands at some more pseudo code here 601 00:24:32,626 --> 00:24:34,406 since once could write this in a number of ways 602 00:24:34,406 --> 00:24:37,026 and this just expresses what Adam 603 00:24:37,026 --> 00:24:41,016 and our volunteers already put forth to us, but it turns 604 00:24:41,016 --> 00:24:44,456 out that this trick from Monday, this capability 605 00:24:44,586 --> 00:24:47,846 of a functions calling itself is actually a really 606 00:24:47,846 --> 00:24:48,746 powerful thing. 607 00:24:49,056 --> 00:24:52,406 So, there exists this other algorithm called merge sort 608 00:24:52,676 --> 00:24:55,706 and as the name implies, it boils down to taking an array 609 00:24:55,706 --> 00:24:58,446 or taking a list more generally sorting half of it, 610 00:24:58,506 --> 00:25:00,866 sorting the other half and then merging the results 611 00:25:00,866 --> 00:25:04,726 and voila you then have a perfectly sorted list. 612 00:25:05,406 --> 00:25:06,366 Well, how does this work? 613 00:25:06,746 --> 00:25:07,436 Well, this is it. 614 00:25:08,036 --> 00:25:10,016 So this is the pseudo code for merge sort. 615 00:25:10,016 --> 00:25:12,046 It says on input of N elements so in the form 616 00:25:12,046 --> 00:25:13,916 of an array first you do a sanity check. 617 00:25:13,916 --> 00:25:15,586 If you've got fewer than two elements, 618 00:25:15,866 --> 00:25:17,736 there's no work to be done, return. 619 00:25:18,036 --> 00:25:19,826 And if you can recall the lingo from Monday, 620 00:25:19,826 --> 00:25:23,206 that represents our base case 621 00:25:23,356 --> 00:25:25,726 that actually prevents this thing from devolving 622 00:25:25,726 --> 00:25:28,876 into a really bad infinite loop after which bad things, 623 00:25:28,876 --> 00:25:31,036 segmentation faults and all of that happen. 624 00:25:31,036 --> 00:25:33,076 So, else if you do have enough elements 625 00:25:33,076 --> 00:25:35,986 to actually do interesting work on, what should you do? 626 00:25:35,986 --> 00:25:39,056 So, sort the left half of elements, sort the right half 627 00:25:39,056 --> 00:25:41,606 of elements, then merge the sorted halves. 628 00:25:42,756 --> 00:25:45,036 Well, what does this mean exactly? 629 00:25:45,516 --> 00:25:47,846 Well, if I have a list of eight numbers 630 00:25:48,166 --> 00:25:51,386 and then I take the four numbers on the left and the four numbers 631 00:25:51,386 --> 00:25:52,626 on the right, how do I go 632 00:25:52,626 --> 00:25:54,376 about sorting the left half of those elements? 633 00:25:54,376 --> 00:25:58,696 What would you do? 634 00:25:58,926 --> 00:26:01,166 Right, I've got eight volunteers here who we have dismissed, 635 00:26:01,166 --> 00:26:03,226 but pretend there's four people here, four people there, 636 00:26:03,416 --> 00:26:05,086 I have to divide the list into two 637 00:26:05,086 --> 00:26:06,616 so here's the four people I need to sort, 638 00:26:06,796 --> 00:26:08,036 how can I sort half a list? 639 00:26:08,036 --> 00:26:08,103 >> Student: [Inaudible]. 640 00:26:08,103 --> 00:26:11,436 >> Lecturer: How about in back? 641 00:26:11,496 --> 00:26:15,236 >> Student: Cut it into quarters. 642 00:26:15,306 --> 00:26:18,246 >> Lecturer: Cut it into three quarters? 643 00:26:18,246 --> 00:26:19,166 >> Student: [Inaudible]. 644 00:26:19,166 --> 00:26:19,486 >> Lecturer: Okay. 645 00:26:19,486 --> 00:26:21,506 So I could cut it in half again and again and again. 646 00:26:21,506 --> 00:26:24,196 Okay. So finally I get down to Gabrielle or any 647 00:26:24,196 --> 00:26:27,096 of our volunteers because I've cut my list of eight into four 648 00:26:27,096 --> 00:26:29,696 into two into one, what do I need now with Gabrielle 649 00:26:29,696 --> 00:26:30,746 or whoever I've found? 650 00:26:31,256 --> 00:26:33,776 Can't go back to the laptop now. 651 00:26:34,516 --> 00:26:35,036 You're on the hook. 652 00:26:35,036 --> 00:26:38,016 >> Student: Oh, you just cut it in half, 653 00:26:38,016 --> 00:26:40,076 cut in half and [inaudible]. 654 00:26:40,076 --> 00:26:40,606 >> Lecturer: Okay, true, 655 00:26:40,606 --> 00:26:43,866 but then what do I do once I keep cutting the list in half? 656 00:26:44,026 --> 00:26:46,676 Now I just have eight people, what do I do with them? 657 00:26:46,676 --> 00:26:48,656 >> Student: Merge every two. 658 00:26:49,136 --> 00:26:50,096 >> Lecturer: Okay, I merge every two. 659 00:26:50,096 --> 00:26:53,216 Okay. So, you're actually going frankly in this direction, 660 00:26:53,516 --> 00:26:55,426 but it seems like you're just kind 661 00:26:55,426 --> 00:26:57,476 of weaving a circular argument around me, right? 662 00:26:57,476 --> 00:26:59,096 I'm asking you how to sort an element 663 00:26:59,096 --> 00:27:00,646 and I've already cut them in half. 664 00:27:00,646 --> 00:27:02,386 You're telling me, oh, just cut them in half again 665 00:27:02,886 --> 00:27:03,856 and then cut them in half again. 666 00:27:03,856 --> 00:27:05,916 I feel like we're not actually getting to resolution here. 667 00:27:05,916 --> 00:27:05,996 Yeah? 668 00:27:05,996 --> 00:27:10,156 >> Student: Well, a single element [inaudible] and so 669 00:27:10,276 --> 00:27:15,376 when you get down to one element sorted 670 00:27:16,846 --> 00:27:21,556 and then you would merge it with another element [inaudible]. 671 00:27:21,976 --> 00:27:22,076 >> Lecturer: Okay. 672 00:27:22,076 --> 00:27:22,876 >> Student: And then [inaudible] 673 00:27:22,876 --> 00:27:27,196 and then those two pairs would be merged [inaudible] 674 00:27:27,526 --> 00:27:28,236 so on and so forth. 675 00:27:28,236 --> 00:27:28,766 >> Lecturer: Okay, good. 676 00:27:28,766 --> 00:27:32,796 So to summarize maybe the magic here is not in this punting 677 00:27:32,796 --> 00:27:35,716 in lines the second to bottom line. 678 00:27:35,786 --> 00:27:38,316 When I say sort left half and sort right half, you know, 679 00:27:38,316 --> 00:27:39,576 I'm kind of just punting on that 680 00:27:39,576 --> 00:27:42,726 and saying go sort these elements then the magic seems 681 00:27:42,726 --> 00:27:44,956 to be in the merging of sorted halves because if we kind 682 00:27:44,956 --> 00:27:46,496 of think through the logic that's been proposed, 683 00:27:46,686 --> 00:27:48,786 if I eventually get down to a list of one, 684 00:27:48,786 --> 00:27:51,246 of size one just Gabrielle or just Ravi, 685 00:27:51,576 --> 00:27:54,926 there's actually a really neat feature of that situation, 686 00:27:55,136 --> 00:27:59,046 which is that Gabrielle as a single number is already sorted 687 00:27:59,046 --> 00:28:01,176 by nature similarly as Ravi or any 688 00:28:01,176 --> 00:28:03,356 of our eight volunteers already sorted if I try 689 00:28:03,356 --> 00:28:05,936 to sort just them, but then I apparently need 690 00:28:05,936 --> 00:28:08,586 to take a step back conceptually and say, okay, I've got a list 691 00:28:08,686 --> 00:28:11,406 of length one, it's already sorted trivially 692 00:28:11,406 --> 00:28:13,326 because that was really an uninteresting exercise, 693 00:28:13,566 --> 00:28:16,826 but now I have two lists that sounds of size two. 694 00:28:17,226 --> 00:28:18,456 So, I know this is going to happen, 695 00:28:18,546 --> 00:28:20,976 but I will say let's take a five minute break think 696 00:28:20,976 --> 00:28:23,666 about how we will solve this and then we'll reveal the answer 697 00:28:23,856 --> 00:28:24,976 after you've not actually thought about it. 698 00:28:25,516 --> 00:28:27,576 [ Break ] 699 00:28:28,076 --> 00:28:29,276 >> Lecturer: All right so we are back. 700 00:28:29,366 --> 00:28:31,576 Here are, we'll do this one without humans, 701 00:28:31,726 --> 00:28:33,726 but here are the same numbers 702 00:28:33,726 --> 00:28:36,506 that our volunteers represented the motivation here 703 00:28:36,506 --> 00:28:39,126 to make sure the context is clear is to do better 704 00:28:39,126 --> 00:28:42,326 than N squared and even though the algorithms we demonstrated 705 00:28:42,326 --> 00:28:44,476 both with humans and with computers up top, 706 00:28:44,696 --> 00:28:48,336 hasn't taken terribly long realize that we only had, again, 707 00:28:48,416 --> 00:28:51,426 30 or 40 bars on the screen that we're getting sorted and even 708 00:28:51,426 --> 00:28:54,176 that was relatively slow, but the goal here will be 709 00:28:54,176 --> 00:28:58,536 to compare just how quickly or how much better we can do if, 710 00:28:58,536 --> 00:29:00,946 again, we inject a little more intelligence and start 711 00:29:00,946 --> 00:29:02,946 to deploy some more sophisticated techniques 712 00:29:03,206 --> 00:29:04,846 like this thing called recursion on Monday, 713 00:29:04,846 --> 00:29:06,756 which at the time very uninteresting 714 00:29:06,756 --> 00:29:10,086 to implement factorial or sigma, the summation function 715 00:29:10,086 --> 00:29:12,236 with recursion because we already had other ways of doing 716 00:29:12,236 --> 00:29:14,296 that but as in this case as we'll see, 717 00:29:14,356 --> 00:29:17,496 sometimes recursion lends itself to the expression 718 00:29:17,496 --> 00:29:19,706 of an algorithm that would just be much harder 719 00:29:19,706 --> 00:29:22,926 to express iteratively or with lots of loops and variables. 720 00:29:22,926 --> 00:29:24,756 So here's our algorithm and let's see 721 00:29:24,756 --> 00:29:25,876 if we can apply this here. 722 00:29:25,876 --> 00:29:27,016 So here's our eight elements. 723 00:29:27,286 --> 00:29:29,616 The first step is to on input of N elements, 724 00:29:29,846 --> 00:29:31,236 if N is less than two, return. 725 00:29:31,336 --> 00:29:34,306 No, N is actually eight so I'm not going to return 726 00:29:34,676 --> 00:29:35,966 so else I do the following. 727 00:29:35,966 --> 00:29:37,476 Sort the left half of elements. 728 00:29:37,896 --> 00:29:39,666 So, somehow I need to do something 729 00:29:39,666 --> 00:29:42,656 with these guys here, no. 730 00:29:44,036 --> 00:29:47,976 With these guys here so real-world bugs as well, 731 00:29:48,216 --> 00:29:50,316 and then I'm going to move on the those, but not just yet. 732 00:29:50,526 --> 00:29:53,476 So, I've called a function essentially sort left half 733 00:29:53,476 --> 00:29:53,956 of elements. 734 00:29:54,216 --> 00:29:57,226 So I need a way of sorting these four elements. 735 00:29:57,226 --> 00:29:59,366 What algorithm might I apply to these four elements? 736 00:29:59,446 --> 00:30:03,636 So, I've got bubble sorts, I've got selection sort, 737 00:30:03,636 --> 00:30:05,456 but feels like if I deploy one 738 00:30:05,456 --> 00:30:07,956 of the algorithms we've already dissected and analyzed 739 00:30:08,116 --> 00:30:10,806 and used it twice that we're not really going 740 00:30:10,806 --> 00:30:13,846 to make a fundamentally important step forward, 741 00:30:13,966 --> 00:30:16,566 but you know, we do have a new sorting algorithm supposedly. 742 00:30:16,566 --> 00:30:18,396 We don't know if it works, we haven't seen how it works, 743 00:30:18,566 --> 00:30:21,426 but it is called merge sort so let's take this leap of faith 744 00:30:21,426 --> 00:30:24,236 and sort the four elements on the left half 745 00:30:24,596 --> 00:30:26,536 with the same algorithm, merge sort. 746 00:30:26,576 --> 00:30:30,976 So you'll have a little bit of stories piling 747 00:30:30,976 --> 00:30:33,146 up in one's brain here as we do this recursively, 748 00:30:33,146 --> 00:30:33,786 but here we go. 749 00:30:34,136 --> 00:30:36,406 So, I've just sorted, I need to sort the left half, 750 00:30:36,656 --> 00:30:37,696 that's these four elements, 751 00:30:37,696 --> 00:30:40,596 which means I'm invoking merge sort N is now four, 752 00:30:40,776 --> 00:30:42,296 is four less than two? 753 00:30:42,386 --> 00:30:45,666 No. So I do the else sort left half of elements. 754 00:30:45,806 --> 00:30:46,676 Well, what does that mean? 755 00:30:46,926 --> 00:30:50,366 That means I now need to sort these elements here, 756 00:30:50,736 --> 00:30:52,156 okay, what does that mean? 757 00:30:52,156 --> 00:30:53,456 How do I sort two elements? 758 00:30:53,796 --> 00:30:54,956 Let's deploy merge sort. 759 00:30:54,956 --> 00:30:57,216 So, call merge sort on a list of size two. 760 00:30:57,456 --> 00:30:58,596 Is two less than two? 761 00:30:58,596 --> 00:31:00,826 No. So, I proceed to do the else construct 762 00:31:00,826 --> 00:31:03,866 yet again sort the left half of elements and here I go. 763 00:31:04,176 --> 00:31:08,406 I now sort the left half of elements and voila, 764 00:31:08,406 --> 00:31:11,316 I seem again, I seem already to be at the point 765 00:31:11,316 --> 00:31:13,496 where I have a list that's now size one. 766 00:31:13,686 --> 00:31:15,186 So is one less than two? 767 00:31:15,696 --> 00:31:16,646 Yes, finally. 768 00:31:16,706 --> 00:31:18,166 So I return immediately. 769 00:31:18,506 --> 00:31:20,736 Now, at this point in the story I've got a list 770 00:31:20,796 --> 00:31:24,126 of size one what was the next step 771 00:31:24,126 --> 00:31:27,596 if you roll back in time now? 772 00:31:27,716 --> 00:31:30,516 So, sort the right half, the right half in question 773 00:31:30,516 --> 00:31:33,816 at this moment in time having called merge sort multiple times 774 00:31:33,816 --> 00:31:35,246 already was sort the left half, 775 00:31:35,736 --> 00:31:38,326 now I need to sort the right half. 776 00:31:38,446 --> 00:31:40,096 So, I'm down at this point here. 777 00:31:40,296 --> 00:31:44,756 Sort two is trivial to sort because if N equals one is less 778 00:31:44,786 --> 00:31:47,136 than two I return so now I'm at this point 779 00:31:47,136 --> 00:31:49,126 in the story what do I do with four and two? 780 00:31:49,126 --> 00:31:53,236 I've got two lists of size one and the goal now is 781 00:31:53,236 --> 00:31:55,586 to merge these sorted halves. 782 00:31:56,836 --> 00:31:59,346 How do I merge these sorted halves? 783 00:31:59,346 --> 00:32:01,116 >> Student: [Inaudible] 784 00:32:01,116 --> 00:32:02,526 >> Lecturer: Well, we can do this fair, 785 00:32:02,526 --> 00:32:05,366 I mean we can do it verbally, all right, here's a list, 786 00:32:05,366 --> 00:32:07,336 here's a list, I need to merge them somehow. 787 00:32:07,336 --> 00:32:09,486 All right, who's going to come first in the merged list? 788 00:32:10,126 --> 00:32:10,996 Two obviously. 789 00:32:10,996 --> 00:32:12,586 So, I'm going to make some extra space here. 790 00:32:12,586 --> 00:32:15,756 We'll call this I need actually in reality extra memory 791 00:32:15,756 --> 00:32:18,216 for merge sort so I'm just going to give myself an extra array, 792 00:32:18,216 --> 00:32:19,826 some blank space that I can start writing 793 00:32:19,826 --> 00:32:22,606 on so here's two lists, I've already plucked off the two, 794 00:32:22,896 --> 00:32:24,086 I advance to the next element 795 00:32:24,086 --> 00:32:26,746 in list there's nothing there so who comes next? 796 00:32:26,906 --> 00:32:29,356 So four. So, now I'm at this point in the story, 797 00:32:29,356 --> 00:32:32,736 and I have a list of size two that's been sorted, 798 00:32:33,066 --> 00:32:34,666 where did we leave off in the story now? 799 00:32:34,756 --> 00:32:36,406 >> Student: Six and eight. 800 00:32:36,676 --> 00:32:37,866 >> Lecturer: So six and eight, right? 801 00:32:37,866 --> 00:32:40,496 Because this was the left half but then this was the left half 802 00:32:40,496 --> 00:32:43,386 of the left half, we sorted him, then we sorted the right half 803 00:32:43,386 --> 00:32:46,326 of the left half, that was easy, then we merged the left half 804 00:32:47,066 --> 00:32:50,226 of the left half with the right half of the left half, 805 00:32:50,566 --> 00:32:51,726 and we got two and four. 806 00:32:51,726 --> 00:32:53,666 So, I realize this is actually a little confusing 807 00:32:53,916 --> 00:32:55,526 but that's why we're actually doing a fairly small 808 00:32:55,526 --> 00:32:56,196 example here. 809 00:32:56,506 --> 00:32:59,926 Two and four now are sorted so who is their counterpart? 810 00:32:59,926 --> 00:33:01,156 Who is their right-hand side? 811 00:33:01,436 --> 00:33:02,186 That's these guys. 812 00:33:02,266 --> 00:33:04,646 So, how do I sort the right half of this list? 813 00:33:04,766 --> 00:33:07,946 All right, I sort the left half of this list first, okay, 814 00:33:07,946 --> 00:33:11,616 now I sort, that's already done because it's a list of size one; 815 00:33:11,796 --> 00:33:13,466 now I sort the right half of this list, 816 00:33:13,816 --> 00:33:16,066 that's already done, what do I do now? 817 00:33:16,216 --> 00:33:18,026 What's the third step in this sorting process? 818 00:33:19,376 --> 00:33:20,486 So I still need to merge them. 819 00:33:20,486 --> 00:33:23,076 We, the [inaudible] humans can see they're already in order, 820 00:33:23,076 --> 00:33:25,676 but we as the computer don't know in advanced 821 00:33:25,676 --> 00:33:27,576 until we actually look at them so let's look at them. 822 00:33:27,576 --> 00:33:30,046 I've got a list of size one, a list of size one, 823 00:33:30,046 --> 00:33:32,066 which one is going to come first when I merge the two? 824 00:33:32,356 --> 00:33:35,906 Obviously the six and then I advance this finger, 825 00:33:35,906 --> 00:33:37,966 the list was only size one there's nothing there 826 00:33:37,966 --> 00:33:40,856 so now I merge the remaining element six, 827 00:33:41,246 --> 00:33:43,936 remaining element eight, so I've got six and eight. 828 00:33:43,936 --> 00:33:46,526 At what point in the story am I? 829 00:33:47,296 --> 00:33:48,806 We've rolled back almost to the beginning. 830 00:33:49,376 --> 00:33:53,306 Right so I sorted the left half, which was here, 831 00:33:53,486 --> 00:33:57,226 I sorted the right half so the third step here and final step 832 00:33:57,226 --> 00:33:58,966 on the board is to merge the sorted halves. 833 00:33:59,446 --> 00:34:01,846 So now I'm going to do the same algorithm and, again, 834 00:34:01,846 --> 00:34:03,776 I tend to use my left and right finger when I need 835 00:34:03,816 --> 00:34:06,736 to represent a loop that has maybe two different indices, 836 00:34:06,736 --> 00:34:07,786 two different index variables. 837 00:34:08,096 --> 00:34:10,906 So how do I sort, how do I merge these two lists? 838 00:34:11,386 --> 00:34:14,486 Well, they're already sorted so I don't have to compare all 839 00:34:14,486 --> 00:34:16,596 of the elements against every element, I just need to look 840 00:34:16,596 --> 00:34:18,536 at the front of them because I already know 841 00:34:18,606 --> 00:34:21,386 from my previous steps that I put the smallest elements there. 842 00:34:21,646 --> 00:34:24,346 So I now need to merge this list and this list 843 00:34:24,666 --> 00:34:26,946 so here's the two smallest elements. 844 00:34:26,946 --> 00:34:27,676 I compare them. 845 00:34:27,896 --> 00:34:28,906 Is two less than six? 846 00:34:29,046 --> 00:34:29,956 Yeah, it is. 847 00:34:29,956 --> 00:34:33,696 So, let me put this into my empty space. 848 00:34:33,906 --> 00:34:34,786 Now I advance. 849 00:34:34,936 --> 00:34:36,426 Okay, this time there's something there 850 00:34:36,666 --> 00:34:38,496 so I don't just blindly take the six, 851 00:34:38,496 --> 00:34:39,866 I now compare four against six. 852 00:34:39,866 --> 00:34:42,956 Obviously the four comes next, now I advance, 853 00:34:43,046 --> 00:34:44,316 now there's nothing to compare 854 00:34:44,316 --> 00:34:46,126 against so I can just finish my loop 855 00:34:46,586 --> 00:34:48,236 over the six and over the eight. 856 00:34:48,666 --> 00:34:50,976 So at this point in the story I've just finished which step? 857 00:34:53,356 --> 00:34:53,506 >> Student: Sort. 858 00:34:53,706 --> 00:34:54,506 >> Lecturer: Sort left half. 859 00:34:54,916 --> 00:34:57,326 The very first time in this story 860 00:34:57,326 --> 00:35:00,516 that I said sort the left half was to sort these four elements. 861 00:35:00,756 --> 00:35:03,746 We finally completed that step, but notice what happened. 862 00:35:03,746 --> 00:35:06,276 It induced that simple goal of sorting the left half 863 00:35:06,276 --> 00:35:09,546 of elements, induced another call to merge sort, another call 864 00:35:09,546 --> 00:35:11,856 to merge sort, another call to merge sort, 865 00:35:12,086 --> 00:35:14,826 and then we merge the lists, but then we did the same thing. 866 00:35:14,826 --> 00:35:17,146 Another call to merge sort, another call to merge sort, 867 00:35:17,146 --> 00:35:19,506 another call to merge sort, then we merge the lists 868 00:35:19,506 --> 00:35:20,896 and we're not even done yet. 869 00:35:21,156 --> 00:35:23,166 So, it actually feels like we're doing a lot of work here, 870 00:35:23,286 --> 00:35:24,656 but let's see if it pans out. 871 00:35:24,936 --> 00:35:27,036 Well, on the right hand side if that's the next step, 872 00:35:27,426 --> 00:35:29,086 I sort the right hand side of the list. 873 00:35:29,086 --> 00:35:29,596 How do I do that? 874 00:35:30,036 --> 00:35:30,896 Sort the left half. 875 00:35:30,896 --> 00:35:32,096 All right so that's these two guys. 876 00:35:32,096 --> 00:35:32,786 How do I do that? 877 00:35:32,876 --> 00:35:33,656 Sort the left half. 878 00:35:33,656 --> 00:35:35,786 All right, that's actually pretty easy, done. 879 00:35:35,896 --> 00:35:38,336 It's a list of size one now I sort the right half. 880 00:35:38,336 --> 00:35:40,486 Easy, it's a list of size one done. 881 00:35:40,776 --> 00:35:43,816 Now the magic happens, but it's seemingly fairly simple, 882 00:35:43,816 --> 00:35:44,456 merge them. 883 00:35:44,456 --> 00:35:45,436 All right, so how do I do that? 884 00:35:45,586 --> 00:35:48,736 I compare and I'm going to plop down one first then three. 885 00:35:49,026 --> 00:35:50,406 These guys are now sorted. 886 00:35:50,746 --> 00:35:51,916 Where am I at the story? 887 00:35:51,916 --> 00:35:55,066 Now I sort the right half of the right half so that means look 888 00:35:55,066 --> 00:35:58,506 at these two, chop it into left and right, do this, trivial, 889 00:35:58,506 --> 00:36:01,196 it's already sorted do this, trivial, it's already sorted. 890 00:36:01,446 --> 00:36:04,156 Now again the magical step of merging the two 891 00:36:04,536 --> 00:36:08,466 and obviously we get out five and seven so now I have a list 892 00:36:08,466 --> 00:36:10,126 of size two that's sorted, another list 893 00:36:10,126 --> 00:36:12,366 of size two that's sorted, the left half 894 00:36:12,366 --> 00:36:15,796 and right halves respectively of the right half of the original. 895 00:36:16,186 --> 00:36:16,966 What's the last step? 896 00:36:17,246 --> 00:36:18,326 To merge the sorted halves. 897 00:36:18,396 --> 00:36:19,666 So, one and five I compare. 898 00:36:20,036 --> 00:36:21,906 One comes first, now I compare three 899 00:36:21,906 --> 00:36:23,206 and five, three comes next. 900 00:36:23,446 --> 00:36:25,146 Now I'm out of numbers there so five 901 00:36:25,146 --> 00:36:27,166 and seven get plopped down am I done? 902 00:36:29,686 --> 00:36:30,156 Not quite. 903 00:36:30,356 --> 00:36:33,096 Merge sort boils down essentially to three lines 904 00:36:33,096 --> 00:36:36,936 of code; sort left half, sort right half, and I've done that 905 00:36:37,646 --> 00:36:40,606 and I've done that and now the third and final step 906 00:36:40,636 --> 00:36:43,186 of merge sort is to merge those sorted halves. 907 00:36:43,186 --> 00:36:44,176 So what do I do? 908 00:36:44,176 --> 00:36:45,496 Well, I'm running out of board space, 909 00:36:45,496 --> 00:36:47,466 but I don't think it'll be all that enlightening to know 910 00:36:47,466 --> 00:36:49,716 that the numbers are now going to get merged as one, 911 00:36:50,056 --> 00:36:52,136 two, three, four, but how? 912 00:36:52,416 --> 00:36:54,256 Well, again, remember the two index variables. 913 00:36:54,256 --> 00:36:56,366 I've got on the start of this list, one on this one, 914 00:36:56,366 --> 00:36:59,636 I compare the two fingers, one comes first so I write down one. 915 00:36:59,916 --> 00:37:01,236 Then I advance this finger. 916 00:37:01,456 --> 00:37:03,006 Now, I compare two against three. 917 00:37:03,136 --> 00:37:05,336 I put down two and I advance this finger. 918 00:37:05,646 --> 00:37:08,346 And the point of this little finger demonstration is 919 00:37:08,346 --> 00:37:10,696 to make clear that at no point is my left 920 00:37:10,696 --> 00:37:14,706 or right finger going backwards on the board. 921 00:37:14,706 --> 00:37:16,706 They're always making forward progress. 922 00:37:17,046 --> 00:37:19,166 In other words, every time I merge, 923 00:37:19,276 --> 00:37:21,806 I'm touching physically N elements 924 00:37:22,206 --> 00:37:26,006 or put another way I'm touching every number once and only once 925 00:37:26,156 --> 00:37:29,406 because once I've touched it, I advance and move forward again. 926 00:37:29,526 --> 00:37:30,386 I never double back. 927 00:37:30,936 --> 00:37:32,916 So, in other words from left the right, 928 00:37:33,116 --> 00:37:36,046 I seem to be conceptually always taking N steps 929 00:37:36,046 --> 00:37:36,706 in this direction. 930 00:37:37,156 --> 00:37:39,916 Every time I do merging, there's N steps involved 931 00:37:39,916 --> 00:37:41,776 because I'm touching every number once 932 00:37:42,196 --> 00:37:43,416 but never going backwards. 933 00:37:44,106 --> 00:37:47,706 So that then begs the question on the other axis here, 934 00:37:47,706 --> 00:37:51,156 how many times did I go down in this list? 935 00:37:51,156 --> 00:37:53,236 How many times did I chop lists in half? 936 00:37:53,906 --> 00:37:56,186 Because if I'm doing something N times this way, 937 00:37:56,426 --> 00:37:58,876 hopefully I didn't chop the list N times this way. 938 00:37:58,876 --> 00:38:00,616 Otherwise we've got an N squared algorithm 939 00:38:00,616 --> 00:38:01,726 and this was all a waste of time, 940 00:38:02,326 --> 00:38:06,596 but how many times per week zero can we take a list of size N 941 00:38:06,946 --> 00:38:10,376 or a list of size eight and divide it in half, in half, 942 00:38:10,376 --> 00:38:12,506 in half, and then bottom out with one element? 943 00:38:12,506 --> 00:38:14,216 >> Student: [Inaudible]. 944 00:38:14,216 --> 00:38:16,756 >> Lecturer: So log base two of N. So even if you got lost 945 00:38:16,756 --> 00:38:18,496 in some of the details, that's certainly fine 946 00:38:18,496 --> 00:38:20,966 at first glance here, but just conceptually what was I doing? 947 00:38:21,206 --> 00:38:23,636 Dividing the list in half, in half, in half, in half 948 00:38:23,666 --> 00:38:25,506 and then I bottomed out with individual elements. 949 00:38:25,506 --> 00:38:26,556 How many times can you do that? 950 00:38:26,856 --> 00:38:29,926 Log N times, but every time I did that the goal was 951 00:38:29,926 --> 00:38:32,656 to then merge all of the elements together as we tried 952 00:38:32,656 --> 00:38:36,106 to do visually here so I'm doing something log N times 953 00:38:36,106 --> 00:38:38,776 and then each time I'm doing N merging steps. 954 00:38:39,126 --> 00:38:42,396 N moves with my fingers so this algorithm feels 955 00:38:42,396 --> 00:38:46,616 like it should be big O of, and I'll write it here just 956 00:38:46,616 --> 00:38:52,006 for the sake of reference, big O of N times log N, 957 00:38:52,446 --> 00:38:53,836 and it's technically base two, 958 00:38:54,026 --> 00:38:56,436 but just like we started scratching off constant numbers 959 00:38:56,436 --> 00:39:00,036 before in the context of big O, the base also means very little 960 00:39:00,036 --> 00:39:01,486 since it's just a constant value. 961 00:39:01,486 --> 00:39:03,496 You can convert one base in binary 962 00:39:03,626 --> 00:39:05,976 to any other constant value as well. 963 00:39:06,726 --> 00:39:08,516 So if this depiction, 964 00:39:08,516 --> 00:39:10,926 if I didn't do this depiction justice, let's just try to think 965 00:39:10,926 --> 00:39:12,816 about it more abstractly. 966 00:39:13,066 --> 00:39:15,246 If the problem at hand is to figure out what T of N is, 967 00:39:15,326 --> 00:39:17,686 the running time of an algorithm whose input is 968 00:39:17,686 --> 00:39:22,656 of size N how can we actually compute its running time 969 00:39:22,656 --> 00:39:24,336 by looking at the pseudo code essentially? 970 00:39:24,336 --> 00:39:25,456 So this is the algorithm. 971 00:39:25,726 --> 00:39:27,796 The last time we talked about pseudo code we were able 972 00:39:27,796 --> 00:39:30,416 to slap big O of N squared on it just by looking at it 973 00:39:30,776 --> 00:39:32,546 so let's see if we can't translate this 974 00:39:32,736 --> 00:39:33,926 to big O notation. 975 00:39:34,246 --> 00:39:36,956 All right so this first line here if N is less 976 00:39:36,956 --> 00:39:39,616 than two that's constant steps so there's a one in there 977 00:39:39,686 --> 00:39:40,856 and something uninteresting. 978 00:39:41,186 --> 00:39:44,096 Clearly the meaty part of this algorithm is down here 979 00:39:44,096 --> 00:39:45,526 in the outs condition. 980 00:39:45,906 --> 00:39:48,246 So, let's ask ourselves what is the running time 981 00:39:48,246 --> 00:39:48,956 of this algorithm? 982 00:39:48,956 --> 00:39:52,156 Well, if you had N elements, we need to figure out T of N, 983 00:39:52,156 --> 00:39:54,036 which is just a fancy way of describing its running time. 984 00:39:54,486 --> 00:39:55,506 So, what does that mean? 985 00:39:55,726 --> 00:39:59,356 Well, computing T of N means you first sort the left half 986 00:39:59,356 --> 00:40:02,656 of the list so that's T of what, N over two, 987 00:40:02,866 --> 00:40:07,406 then you sort the right half, that's T over T of N over two. 988 00:40:07,596 --> 00:40:09,866 So I don't know what the function is. 989 00:40:09,866 --> 00:40:13,136 I don't know what T is yet, but I know it's going 990 00:40:13,136 --> 00:40:16,106 to be the same thing but with an input that's half as big 991 00:40:16,106 --> 00:40:18,496 and then finally how many steps does it take 992 00:40:18,496 --> 00:40:19,796 to merge sorted halves? 993 00:40:20,016 --> 00:40:22,806 Well, per my fingers moving slowly from left to right, 994 00:40:23,126 --> 00:40:24,836 that's some number of N steps 995 00:40:24,986 --> 00:40:26,876 because every finger moves forward once 996 00:40:27,066 --> 00:40:28,996 and then they eventually both stop. 997 00:40:29,146 --> 00:40:32,326 So we can express this even without quite knowing 998 00:40:32,326 --> 00:40:35,326 where we're going yet with an algorithm like this. 999 00:40:35,446 --> 00:40:37,336 T of N, the running time of an algorithm 1000 00:40:37,546 --> 00:40:40,376 with [inaudible] elements is the same thing as sorting half 1001 00:40:40,576 --> 00:40:45,006 of it, T of N over two, plus half of it, T of N over two, 1002 00:40:45,286 --> 00:40:48,936 plus N steps, but you know I'm kind of asking lots of questions 1003 00:40:48,936 --> 00:40:50,606 so maybe there's a constant factor in there. 1004 00:40:50,886 --> 00:40:53,396 So, here two, even in this context can we say the merging 1005 00:40:53,396 --> 00:40:57,316 takes linear time I don't know exactly if this is one step 1006 00:40:57,366 --> 00:40:59,886 to compare two elements or maybe this is two steps, but again, 1007 00:40:59,886 --> 00:41:02,826 these are constant values, who cares, let's just generalize it 1008 00:41:02,826 --> 00:41:06,236 as big O of N, but now we've kind of handed and turned 1009 00:41:06,236 --> 00:41:08,396 in an answer that itself is recursive. 1010 00:41:08,396 --> 00:41:10,826 What's the running time of this algorithm? 1011 00:41:11,136 --> 00:41:13,636 Well, it's the running time of running the algorithm on half 1012 00:41:13,636 --> 00:41:15,016 of itself and half of itself. 1013 00:41:15,536 --> 00:41:18,786 Like we haven't really made progress per se, but let's see 1014 00:41:18,786 --> 00:41:20,126 if the number at least pan out. 1015 00:41:20,406 --> 00:41:22,146 So how long does it take to sort not eight, 1016 00:41:22,146 --> 00:41:24,146 but we can do a little more interesting than that, 1017 00:41:24,146 --> 00:41:26,526 16 elements, but it's still a power of two just 1018 00:41:26,526 --> 00:41:28,806 so that the math works out perfectly even though 1019 00:41:28,806 --> 00:41:30,216 in reality there'd be some rounding. 1020 00:41:30,566 --> 00:41:33,196 So, let's just plug these numbers into the formula. 1021 00:41:33,516 --> 00:41:38,376 So, the formula, again is T of N equals T of N over two plus T 1022 00:41:38,376 --> 00:41:40,586 of N over two plus some linear time. 1023 00:41:40,706 --> 00:41:41,376 So let's do it. 1024 00:41:41,506 --> 00:41:43,266 How long does it take to sort 16 elements? 1025 00:41:43,586 --> 00:41:48,516 Well, T of 16 is T of eight plus T of eight or two times T 1026 00:41:48,516 --> 00:41:50,856 of eight plus 16 to do the merging. 1027 00:41:50,856 --> 00:41:53,446 So sort the left half, sort the right half plus the merging. 1028 00:41:53,446 --> 00:41:55,696 Unfortunately, we've again gotten caught 1029 00:41:55,696 --> 00:41:57,216 in a little circular argument here 1030 00:41:57,216 --> 00:41:59,576 because now the question is, what is T of eight then? 1031 00:41:59,576 --> 00:42:01,576 All right, well T of eight is the process 1032 00:42:01,576 --> 00:42:03,676 of sorting the left half, which is now size four, 1033 00:42:03,856 --> 00:42:05,776 plus sorting the right half, which is also size four 1034 00:42:05,776 --> 00:42:08,756 so T times two of T of four plus eight. 1035 00:42:08,976 --> 00:42:11,076 All right, what is the answer to T of four? 1036 00:42:11,296 --> 00:42:14,826 Well, T of four is now this, two times T of two, plus four. 1037 00:42:15,186 --> 00:42:16,126 What is T of two? 1038 00:42:16,126 --> 00:42:18,556 Well, T of two is two times T of one 1039 00:42:18,556 --> 00:42:20,406 and hopefully we're now going to bottom out 1040 00:42:20,406 --> 00:42:21,956 and not get stuck in this loop. 1041 00:42:22,176 --> 00:42:24,506 Ah, T of one is just zero 1042 00:42:24,956 --> 00:42:27,416 because how long does it take to sort one element? 1043 00:42:27,596 --> 00:42:28,276 Well, no time. 1044 00:42:28,276 --> 00:42:28,996 It's already sorted. 1045 00:42:28,996 --> 00:42:30,496 That's when we returned immediately. 1046 00:42:30,816 --> 00:42:33,496 So now we just have some high school arithmetic, 1047 00:42:33,496 --> 00:42:34,716 a little substitution here. 1048 00:42:34,916 --> 00:42:37,996 I now can fill in T of one then I can fill in T of two, 1049 00:42:37,996 --> 00:42:41,066 and then it bobbles back up so if we do out the math, 1050 00:42:41,066 --> 00:42:42,206 and I just flip the numbers 1051 00:42:42,206 --> 00:42:44,596 around so I can do the substitution from top to bottom, 1052 00:42:44,846 --> 00:42:49,986 it looks like T of 16 is ultimately two times T 1053 00:42:49,986 --> 00:42:51,136 of eight, but what is that? 1054 00:42:51,586 --> 00:42:54,326 Well, that is just the answer to T of eight, 1055 00:42:54,326 --> 00:42:56,576 which is apparently twenty-four, if you go through the math. 1056 00:42:57,116 --> 00:43:00,996 So long story short T of 16 is 64 steps. 1057 00:43:01,836 --> 00:43:03,816 Does this jive with the claim 1058 00:43:03,816 --> 00:43:07,466 that the running time is big O of N times log N? 1059 00:43:09,056 --> 00:43:09,186 >> Student: Yes. 1060 00:43:09,466 --> 00:43:09,776 >> Lecturer: Yeah. 1061 00:43:09,776 --> 00:43:14,196 So 16, right, so if the running time here is N times log N. 1062 00:43:14,256 --> 00:43:18,256 So N is 16, that's 16 times log base two 1063 00:43:18,256 --> 00:43:24,386 of sixteen that's four so, yes, so sixteen times four, 32, 64. 1064 00:43:24,686 --> 00:43:28,506 It is, in fact, 64 steps so the claim ultimately is, in fact, 1065 00:43:28,546 --> 00:43:32,206 correct that merge sort takes 64 steps concretely 1066 00:43:32,206 --> 00:43:35,376 with 16 elements but more generally N times log N. 1067 00:43:35,376 --> 00:43:36,646 So what's the take away there? 1068 00:43:37,226 --> 00:43:38,446 Like who cares? 1069 00:43:39,386 --> 00:43:40,696 Right. There's no point in memorizing 1070 00:43:40,696 --> 00:43:44,216 that merge sort takes 64 steps or 64 seconds for inputs 1071 00:43:44,216 --> 00:43:47,976 of size 16 but what would the N squared algorithm have been? 1072 00:43:47,976 --> 00:43:49,426 How long would bubble sort have taken? 1073 00:43:49,426 --> 00:43:52,406 How long would selection sort have taken? 1074 00:43:52,556 --> 00:43:56,066 Sixteen squared and that's if you do the math 256 1075 00:43:56,066 --> 00:43:57,706 and these are small inputs. 1076 00:43:57,756 --> 00:44:02,446 Now I'm comparing 256 steps against 64 steps. 1077 00:44:02,766 --> 00:44:05,256 Already the math is starting to work out in our favor, 1078 00:44:05,256 --> 00:44:08,096 and if you think back to the slide where we had those graphs, 1079 00:44:08,466 --> 00:44:11,786 when N really gets big do these numbers really start to hurt. 1080 00:44:11,786 --> 00:44:14,146 Well, it turns out there are other algorithms to be fair 1081 00:44:14,146 --> 00:44:18,136 out there so we'll see things over time whether in sections 1082 00:44:18,136 --> 00:44:20,686 or in future CS classes things called heap sorts 1083 00:44:20,686 --> 00:44:24,136 and insertion sort, quick sort, rate sort, shell sort 1084 00:44:24,136 --> 00:44:26,296 and as an aside especially for one 1085 00:44:26,296 --> 00:44:28,946 of the hacker editions you might enjoy reading up on one or more 1086 00:44:28,946 --> 00:44:33,816 of these, but N log N for now is about as good as we'll try to do 1087 00:44:34,116 --> 00:44:37,486 but let's just see exactly how good that is. 1088 00:44:37,806 --> 00:44:40,016 All right so here we have the second demo. 1089 00:44:40,016 --> 00:44:41,636 This, too, is linked on the course's website. 1090 00:44:41,836 --> 00:44:43,256 This is just another Java applet 1091 00:44:43,286 --> 00:44:45,226 that someone very nicely put together and each 1092 00:44:45,226 --> 00:44:47,726 of these drop downs is a whole bunch of sorting algorithms. 1093 00:44:47,726 --> 00:44:50,176 So the [inaudible] has had a lot of free time over the years 1094 00:44:50,446 --> 00:44:53,756 to come up with algorithms like bozo sort and stooge sort 1095 00:44:53,756 --> 00:44:56,656 and cocktail sort and a whole bunch of other silly ones 1096 00:44:56,656 --> 00:44:59,106 so apparently if you come up with a clever algorithm, 1097 00:44:59,106 --> 00:45:02,156 the fun part is you actually get to name it, but now we have 1098 00:45:02,156 --> 00:45:03,646 from left to right selection sort, 1099 00:45:03,646 --> 00:45:05,596 bubble sort and merge sort. 1100 00:45:05,676 --> 00:45:08,546 The question of the day was, can we do better 1101 00:45:08,546 --> 00:45:10,366 than these initial algorithms, bubble sort 1102 00:45:10,366 --> 00:45:12,866 and selection sort, and if so, who cares? 1103 00:45:12,866 --> 00:45:13,976 Why does it really matter? 1104 00:45:13,976 --> 00:45:16,516 What I'm about to do is click on each of these one right 1105 00:45:16,516 --> 00:45:19,186 after the other so that these things essentially run 1106 00:45:19,186 --> 00:45:22,346 in lock step and let's see what it means for something to be 1107 00:45:22,346 --> 00:45:26,846 in N log in as opposed to N squared. 1108 00:45:28,616 --> 00:45:30,746 Apparently not much. 1109 00:45:31,976 --> 00:45:36,836 There we go. 1110 00:45:37,086 --> 00:45:43,176 Done. And with that we'll see you Monday.