1 00:00:00,506 --> 00:00:09,626 [ Silence ] 2 00:00:10,126 --> 00:00:10,726 >> Alright. 3 00:00:10,866 --> 00:00:11,516 Welcome back. 4 00:00:11,516 --> 00:00:13,536 This is the end of week 3. 5 00:00:13,606 --> 00:00:16,586 So you'll recall at the beginning of week zero, 6 00:00:16,896 --> 00:00:18,706 we played around with this, with the whole bunch 7 00:00:18,706 --> 00:00:20,496 of our staff up on the stage. 8 00:00:20,496 --> 00:00:24,066 And the goal there was to consider just how efficiently, 9 00:00:24,206 --> 00:00:28,116 how effectively we could solve a fairly mundane problem 10 00:00:28,116 --> 00:00:30,566 of just looking up someone like Mike Smith in this phone book 11 00:00:30,746 --> 00:00:33,626 and recall that one of the first passes at it involved, well, 12 00:00:33,816 --> 00:00:36,086 start at the beginning, turn to the second page, 13 00:00:36,086 --> 00:00:37,946 turn to the third page, dot, dot, dot. 14 00:00:38,136 --> 00:00:39,886 A thousand some odd pages later, bam! 15 00:00:39,886 --> 00:00:41,066 We happen to pawn Mike Smith. 16 00:00:41,306 --> 00:00:44,396 But of course almost everyone except Jason as I recall 17 00:00:44,396 --> 00:00:47,966 up here, went at this phonebook by tearing it first in half 18 00:00:47,966 --> 00:00:50,636 and then in half and then in half and then in half again 19 00:00:50,896 --> 00:00:54,496 and that algorithm or that procedure, we decided was 20 00:00:54,496 --> 00:00:57,936 in theory much, much faster than the sort of old school start 21 00:00:57,936 --> 00:00:59,966 at the beginning, go to the end and stop as soon 22 00:00:59,966 --> 00:01:01,006 as you find Mike Smith. 23 00:01:01,106 --> 00:01:03,366 And just to refresh some jargon, 24 00:01:03,366 --> 00:01:05,846 if we said that the old school naive 25 00:01:05,966 --> 00:01:09,366 but very correct approach was linear in nature 26 00:01:09,366 --> 00:01:11,646 and that we were just kind of walking along a straight line 27 00:01:11,686 --> 00:01:13,106 from left to right in the phonebook, 28 00:01:13,346 --> 00:01:15,266 how did we describe the smarter approach 29 00:01:15,266 --> 00:01:18,216 that the CA's and TF's took? 30 00:01:18,436 --> 00:01:20,266 Right. It was logarithmic, right? 31 00:01:20,266 --> 00:01:23,286 So, sort of a-- maybe a flashback to a high school math 32 00:01:23,286 --> 00:01:26,086 at this moment but logarithmic in short was better 33 00:01:26,126 --> 00:01:29,296 and we'll actually feel what we mean by better today and beyond 34 00:01:29,296 --> 00:01:32,816 because today we put aside, see entirely for just a bit, 35 00:01:33,056 --> 00:01:35,946 no new syntax, nothing new scary today but we finally start 36 00:01:35,946 --> 00:01:39,116 to deliver on this syllabus's promise that 50 is ultimately 37 00:01:39,116 --> 00:01:41,816 about thinking, teaching you to think more carefully 38 00:01:41,816 --> 00:01:43,406 and solve problems more efficiently 39 00:01:43,406 --> 00:01:44,406 and more effectively. 40 00:01:44,696 --> 00:01:47,916 So, with that said, recall this, no need to stand up this time, 41 00:01:48,186 --> 00:01:50,626 this particular algorithm which was actually very similar 42 00:01:50,626 --> 00:01:52,226 in spirit to that phonebook example. 43 00:01:52,226 --> 00:01:55,256 In the phonebook example, again the linear approach was left 44 00:01:55,256 --> 00:01:58,736 to right but the logarithmic approach, the divide 45 00:01:58,736 --> 00:02:01,046 and conquer approach was to start in the middle 46 00:02:01,336 --> 00:02:04,366 and then repeat, repeat, repeat tearing half 47 00:02:04,366 --> 00:02:06,096 of the problem away again and again. 48 00:02:06,096 --> 00:02:08,146 So, you went from a hundred percent to 50 percent 49 00:02:08,146 --> 00:02:10,876 to 25 percent to 12 and a half percent and so forth. 50 00:02:10,876 --> 00:02:13,756 It was really a whittling down of that phonebook. 51 00:02:13,996 --> 00:02:17,006 This algorithm too, recall that first day, we all stood up 52 00:02:17,216 --> 00:02:20,246 and we had each of you pair off with the person next to you 53 00:02:20,486 --> 00:02:22,716 and then one of you sat down 54 00:02:22,716 --> 00:02:27,026 but the other kept both your number plus their number. 55 00:02:27,026 --> 00:02:28,656 And so if you continue that logic 56 00:02:28,656 --> 00:02:30,956 of having everyone stand then half people sit, 57 00:02:30,956 --> 00:02:32,426 half people sit, half people sit, 58 00:02:32,556 --> 00:02:34,496 but you never throw away any of the numbers that are 59 00:02:34,496 --> 00:02:36,396 in your head because you just hand your number off 60 00:02:36,396 --> 00:02:37,596 to the person still standing. 61 00:02:37,816 --> 00:02:39,486 In theory at the end of that algorithm, 62 00:02:39,686 --> 00:02:42,406 we should have had just one person standing and his 63 00:02:42,406 --> 00:02:44,556 or her number should have been the collective count 64 00:02:44,806 --> 00:02:47,516 in the room, because everyone recall started at one. 65 00:02:47,746 --> 00:02:49,836 Now, that's a bit of a cheat, that approach, 66 00:02:49,836 --> 00:02:53,296 and that it was not so much as more efficient algorithm per se. 67 00:02:53,296 --> 00:02:56,316 It was really leveraging more than one CPU, right? 68 00:02:56,316 --> 00:02:58,906 If each of us is sort of the brains of that algorithm, well, 69 00:02:58,906 --> 00:03:02,216 we threw like 400, 500 CPU's at that problem 70 00:03:02,436 --> 00:03:04,266 and ran the algorithm in parallel. 71 00:03:04,406 --> 00:03:07,576 But we'll see when you only have just one CPU, one core, 72 00:03:07,576 --> 00:03:09,416 one brain inside of your own computer, 73 00:03:09,636 --> 00:03:12,416 you can still leverage this idea of divide and conquer 74 00:03:12,576 --> 00:03:15,066 and ultimately get much, much faster 75 00:03:15,066 --> 00:03:16,806 and smarter implementations. 76 00:03:17,366 --> 00:03:19,946 Now, rather than use everyone in the room here, thought we'd pick 77 00:03:19,946 --> 00:03:22,626 on those of you in the orchestra section, just those of you here 78 00:03:22,626 --> 00:03:24,636 on the ground floor, otherwise this isn't too easy. 79 00:03:24,866 --> 00:03:27,546 So, if just you folks would humor me for just a moment 80 00:03:27,776 --> 00:03:31,116 and execute step one here, let's see if we can't find 81 00:03:31,116 --> 00:03:34,606 within this orchestra section the tallest person among you. 82 00:03:35,096 --> 00:03:36,926 And this should be easy if you're all standing 83 00:03:36,926 --> 00:03:39,326 on flat ground, alright? 84 00:03:40,166 --> 00:03:43,946 Execute, oh come on, even you folks standing 85 00:03:43,946 --> 00:03:45,736 in the back there, come on. 86 00:03:46,516 --> 00:04:08,966 [ Inaudible Discussions ] 87 00:04:09,466 --> 00:04:13,126 >> Alright, so the group should be getting smaller and smaller. 88 00:04:13,126 --> 00:04:14,556 I still see two of you standing. 89 00:04:14,556 --> 00:04:14,623 [ Inaudible Remark ] 90 00:04:14,623 --> 00:04:14,690 [ Laughter ] 91 00:04:14,690 --> 00:04:14,757 [ Inaudible Remark ] 92 00:04:14,757 --> 00:04:21,556 >> Okay, need to do a side by side comparison. 93 00:04:21,556 --> 00:04:28,546 Alright, and what's your name again? 94 00:04:29,006 --> 00:04:29,706 >> Renzo. 95 00:04:29,706 --> 00:04:31,656 >> So, Renzo isn't he the tallest person 96 00:04:31,656 --> 00:04:32,866 in the orchestra section today? 97 00:04:33,476 --> 00:04:34,196 Congratulations. 98 00:04:36,086 --> 00:04:39,426 Alright. So, first of all, it didn't go unnoticed, 99 00:04:39,426 --> 00:04:40,406 it was a whole lot of opting 100 00:04:40,406 --> 00:04:42,096 out in the back section there, the orchestra. 101 00:04:42,136 --> 00:04:44,506 That's okay, so Renzo was the tallest and consider 102 00:04:44,506 --> 00:04:47,566 in theory how fast this algorithm should have been. 103 00:04:47,566 --> 00:04:50,346 If everyone stood up then half of you sat down, half of you, 104 00:04:50,346 --> 00:04:51,256 half of you, half of you. 105 00:04:51,476 --> 00:04:55,036 So, again if we start with say not just maybe 50, 60, 106 00:04:55,036 --> 00:04:57,846 70 people here but instead start with a thousand people, 107 00:04:58,026 --> 00:05:02,356 we then got down to say 500 and then 250 and then 125. 108 00:05:02,356 --> 00:05:04,726 If you can kind of think about these numbers, they really start 109 00:05:04,726 --> 00:05:06,886 to shrink and shrink and shrink pretty quickly. 110 00:05:07,046 --> 00:05:09,436 In fact if we extrapolate like we did in week zero, 111 00:05:09,436 --> 00:05:12,986 if we had crazy thing, 4 billion people sitting 112 00:05:12,986 --> 00:05:14,836 in just the orchestra section today, 113 00:05:14,836 --> 00:05:17,046 how many iterations would there have been 114 00:05:17,096 --> 00:05:18,756 of this three step algorithm? 115 00:05:20,016 --> 00:05:21,866 So, only 32, right? 116 00:05:21,866 --> 00:05:25,556 It would be a kind of a mess but of course with just 32 steps 117 00:05:25,646 --> 00:05:28,496 and by 32 we get that from, you can divide 4 billion 118 00:05:28,496 --> 00:05:31,696 but in half again and again and again, a total of 32 times. 119 00:05:31,896 --> 00:05:33,316 I mean that's kind of a powerful thing. 120 00:05:33,316 --> 00:05:38,036 Even if you had 4 billion people here after just 32 rounds 121 00:05:38,036 --> 00:05:41,146 of this, everyone would have sat down except one person 122 00:05:41,146 --> 00:05:42,706 and that person would be standing like Renzo 123 00:05:42,706 --> 00:05:43,636 with the total amount. 124 00:05:43,796 --> 00:05:45,116 Now, what's the contrast here, right? 125 00:05:45,116 --> 00:05:46,206 That's kind of powerful. 126 00:05:46,206 --> 00:05:47,896 It's kind of nice that we're leveraging all these brains. 127 00:05:48,116 --> 00:05:49,586 What would be the alternative, right? 128 00:05:49,586 --> 00:05:52,616 If I did it sort of old fashion style, I would start down here 129 00:05:52,616 --> 00:05:54,676 and I would say, "Would you mind standing up for just a moment?" 130 00:05:56,156 --> 00:05:58,646 Alright, so I'm gonna compare these two fellows here 131 00:05:58,646 --> 00:06:00,146 and of course this is a very close tie. 132 00:06:00,706 --> 00:06:02,306 So, we're gonna break the tie randomly. 133 00:06:02,306 --> 00:06:03,666 So, I'm gonna say, Joe, you stay standing. 134 00:06:03,666 --> 00:06:04,526 Go ahead and sit down. 135 00:06:04,746 --> 00:06:05,826 Now, what do I do again? 136 00:06:05,826 --> 00:06:07,646 Well, I now walk down to the rest of the aisle, 137 00:06:07,646 --> 00:06:09,546 I find someone else, I compare those two, 138 00:06:09,786 --> 00:06:11,626 one of them sits down, I move on to the next, 139 00:06:11,926 --> 00:06:13,196 on to the next, on to the next. 140 00:06:13,196 --> 00:06:15,086 But, you know, all this while, I'm remembering 141 00:06:15,306 --> 00:06:16,886 who is the most tall-- 142 00:06:16,886 --> 00:06:19,096 the tallest person I've seen thus far. 143 00:06:19,096 --> 00:06:21,976 So, I kind of remember that person or drag them along 144 00:06:21,976 --> 00:06:23,116 with me but at the end of the day 145 00:06:23,116 --> 00:06:25,586 that algorithm is still something linear. 146 00:06:25,586 --> 00:06:29,196 I have to make some N minus 1 comparison if there's N of you 147 00:06:29,356 --> 00:06:30,556 and I have to compare all of you-- 148 00:06:30,556 --> 00:06:33,166 one person against all of you potentially to find the tallest. 149 00:06:33,536 --> 00:06:34,906 So, that would actually be much slower. 150 00:06:34,906 --> 00:06:36,946 If we did have some 60 or so people here, 151 00:06:37,116 --> 00:06:38,516 that would take 60 steps, 152 00:06:38,556 --> 00:06:42,666 whereas yours would have taken 60 to 30 to 15 to 7 and a half 153 00:06:42,666 --> 00:06:45,586 to 4 to 2 to 8, like 7 or 8 steps to do 154 00:06:45,586 --> 00:06:46,776 that same thing in this size group. 155 00:06:46,776 --> 00:06:47,386 So thank you, Joe. 156 00:06:47,716 --> 00:06:50,036 So, linear is obviously a bit slower. 157 00:06:50,326 --> 00:06:54,226 So, how can we now leverage this sort of commonsensical ideas 158 00:06:54,226 --> 00:06:56,786 that we're using in this human context 159 00:06:56,786 --> 00:06:59,976 to actually now leverage them in more powerful context. 160 00:06:59,976 --> 00:07:01,766 Well, first what's really the goal? 161 00:07:01,766 --> 00:07:04,196 So, let's just kind of plot upper [inaudible] like you might 162 00:07:04,196 --> 00:07:06,996 on a graphing calculator or Excel or on a piece of paper, 163 00:07:07,226 --> 00:07:08,286 what these things look like. 164 00:07:08,286 --> 00:07:11,936 So, if we very generically say that this chart represents 165 00:07:11,936 --> 00:07:14,536 on the X axis the size of some problem 166 00:07:14,536 --> 00:07:16,606 and size we measure typically numerically, 167 00:07:16,816 --> 00:07:19,926 we usually call it N as a generic variable and the size 168 00:07:19,926 --> 00:07:22,846 of the problem that we've been playing with here are size-- 169 00:07:22,846 --> 00:07:24,886 the number of people in this room, so 60. 170 00:07:24,886 --> 00:07:26,676 In this case a few hundred, in the total case. 171 00:07:26,676 --> 00:07:28,216 So, N equals that value. 172 00:07:28,496 --> 00:07:31,956 So, the farther out you go over here, the more people there are 173 00:07:31,956 --> 00:07:34,646 to count in the algorithm, the more input there is. 174 00:07:34,646 --> 00:07:36,776 Now, on the Y axis here, the time to solve, 175 00:07:37,016 --> 00:07:39,006 that's just the generic way of saying how much, 176 00:07:39,196 --> 00:07:41,306 how many steps are there gonna be, how many seconds, 177 00:07:41,306 --> 00:07:42,726 how many minutes is it gonna take 178 00:07:42,806 --> 00:07:45,186 to actually execute this algorithm given these 179 00:07:45,236 --> 00:07:45,936 many people. 180 00:07:45,936 --> 00:07:48,186 So, if we plotted here in red a straight line, 181 00:07:48,516 --> 00:07:51,996 straight implying linear, so if we had N people in this room 182 00:07:52,126 --> 00:07:55,456 as the size of the problem grows from left to right, well, 183 00:07:55,456 --> 00:07:59,076 the amount of time it would take for me to count you back 184 00:07:59,076 --> 00:08:02,416 in the first week or for me to find the tallest person 185 00:08:02,416 --> 00:08:06,136 in here manually by comparing every person against-- 186 00:08:06,136 --> 00:08:08,796 comparing pairs of people again 187 00:08:08,796 --> 00:08:10,206 and again 'til I find the tallest. 188 00:08:10,526 --> 00:08:13,096 Well, every time you add another person to the orchestra section, 189 00:08:13,326 --> 00:08:14,506 it's gonna take me one more step. 190 00:08:14,616 --> 00:08:16,126 If a hundred more people come in sit 191 00:08:16,126 --> 00:08:18,276 down in the orchestra, 100 more steps. 192 00:08:18,386 --> 00:08:20,646 So, it's a one to one, a linear relationship. 193 00:08:20,936 --> 00:08:23,966 Now, what did I do in week zero as an optimization 194 00:08:23,966 --> 00:08:25,186 of counting everyone in the room? 195 00:08:25,416 --> 00:08:29,376 It was kind of slow to say 1, 2, 3, 4, 5, 196 00:08:29,776 --> 00:08:32,986 so what was the obvious marginal improvement, alright? 197 00:08:32,986 --> 00:08:37,146 So, count in pair, so two people at once, 2, 4, 6, 8, 10, 12, 198 00:08:37,146 --> 00:08:40,296 14 and so forth, and just verbally you can hear how the 199 00:08:40,296 --> 00:08:41,616 speed has obviously increased. 200 00:08:41,616 --> 00:08:45,966 It's twice that but graphically it's really just the same shape 201 00:08:45,966 --> 00:08:48,486 of a line, it just happens to be a little lower, alright? 202 00:08:48,486 --> 00:08:50,186 Because you can actually have more people 203 00:08:50,286 --> 00:08:52,966 but it's gonna take less time if you look up the Y-axis, 204 00:08:53,186 --> 00:08:55,106 obviously because you're counting them twice as-- 205 00:08:55,156 --> 00:08:57,406 twice as many at a time. 206 00:08:57,716 --> 00:08:59,636 But when you all counted yourselves, 207 00:08:59,636 --> 00:09:01,816 or in this case found the tallest person 208 00:09:02,106 --> 00:09:04,456 in just the orchestra section, well, 209 00:09:04,456 --> 00:09:08,526 the shape of that graph is much more powerfully curved 210 00:09:08,526 --> 00:09:10,576 and almost flat like this. 211 00:09:10,576 --> 00:09:13,686 Whereby if we graph how much time it takes 212 00:09:13,976 --> 00:09:15,906 to count the number of people in the mezzanine 213 00:09:15,906 --> 00:09:17,006 or find the tallest person-- 214 00:09:17,006 --> 00:09:19,556 sorry, the number of people in the theater 215 00:09:19,806 --> 00:09:22,706 or the tallest person in the orchestra, well, 216 00:09:22,706 --> 00:09:23,986 you get this shape instead. 217 00:09:23,986 --> 00:09:24,866 And why is that? 218 00:09:25,096 --> 00:09:27,236 Well, if we literally double the number of people 219 00:09:27,236 --> 00:09:29,456 in this orchestra section, suppose there are 60, 220 00:09:29,456 --> 00:09:31,166 60 more people come in, they sit down, 221 00:09:31,356 --> 00:09:33,226 how many more steps does my algorithm take 222 00:09:33,226 --> 00:09:35,256 if I'm actually using something like divide 223 00:09:35,256 --> 00:09:37,216 and conquer, this parallelism? 224 00:09:38,206 --> 00:09:40,626 Just one, and that's kind of the powerful idea. 225 00:09:40,626 --> 00:09:42,266 You can double the size of the problem 226 00:09:42,266 --> 00:09:45,156 and I'd barely bat an eye at it because I can absorb 227 00:09:45,156 --> 00:09:47,546 that problem by just executing one more step. 228 00:09:47,876 --> 00:09:49,826 But by contrast the linear algorithms, 229 00:09:50,106 --> 00:09:53,106 if I was doing it manually, 1, 2, 3, 230 00:09:53,106 --> 00:09:56,326 that would take 60 more steps to count 60 people. 231 00:09:56,326 --> 00:09:58,636 So, the goal here ultimately is not gonna be 232 00:09:58,636 --> 00:09:59,976 to count simple things like these, but try. 233 00:10:00,146 --> 00:10:03,816 >> And most every time we start implementing new problems 234 00:10:04,036 --> 00:10:07,056 and solving more interesting challenges in the P-sets is 235 00:10:07,056 --> 00:10:09,186 to try to figure out, well, can I get away 236 00:10:09,186 --> 00:10:10,046 with something linear? 237 00:10:10,046 --> 00:10:12,316 It's quick, it's easy, I know for loops and while loops, 238 00:10:12,666 --> 00:10:16,326 or can I actually make something that's smarter than that, 239 00:10:16,326 --> 00:10:19,306 that's more efficient than that, that's better designed. 240 00:10:19,306 --> 00:10:22,256 And if you consider the applications here, right, 241 00:10:22,256 --> 00:10:24,986 we talked about the 6 degrees of Kevin Bacon example 242 00:10:24,986 --> 00:10:27,476 and just asking sort of very reasonable questions like, 243 00:10:27,756 --> 00:10:30,176 "Who are my friends of friends on Facebook?" 244 00:10:30,176 --> 00:10:33,566 That is not an easy question to answer as the size 245 00:10:33,566 --> 00:10:35,086 of your social network grows 246 00:10:35,296 --> 00:10:37,786 because of the multiplying effects 247 00:10:38,066 --> 00:10:41,126 of having your thousand friends times their thousand friends 248 00:10:41,126 --> 00:10:43,276 and so forth, that graph is actually gonna look much 249 00:10:43,336 --> 00:10:43,896 more different. 250 00:10:43,896 --> 00:10:45,756 In fact, kind of take a mental image 251 00:10:45,756 --> 00:10:48,986 of these two straight lines and then this curved green line, 252 00:10:48,986 --> 00:10:50,196 green meaning good here. 253 00:10:50,326 --> 00:10:51,776 And if we actually zoom out 254 00:10:51,986 --> 00:10:54,156 and consider other possible algorithms, 255 00:10:54,406 --> 00:10:57,536 those three colored lines have now kind of flattened themselves 256 00:10:57,536 --> 00:10:59,076 out on the right hand side there. 257 00:10:59,356 --> 00:11:02,406 And what you see plotted here now are numbers like, well, 258 00:11:02,406 --> 00:11:06,176 suppose your algorithm was N squared, in other words 259 00:11:06,176 --> 00:11:09,226 if you've got N people, it takes N squared steps, 260 00:11:09,226 --> 00:11:10,756 10 people, 100 steps. 261 00:11:11,116 --> 00:11:13,226 Well if its N cubed, it's even bigger. 262 00:11:13,426 --> 00:11:15,766 If we zoom out further though, you can see that there's 263 00:11:15,766 --> 00:11:17,356 at least one type of algorithm. 264 00:11:17,386 --> 00:11:21,076 We probably want to avoid throughout life that's anything 265 00:11:21,076 --> 00:11:21,776 that looks like this. 266 00:11:22,166 --> 00:11:25,146 This is what we generally call an exponential algorithm 267 00:11:25,316 --> 00:11:28,046 where you take some constant number like 2 and then raise 268 00:11:28,046 --> 00:11:30,346 that to some variable number like N here. 269 00:11:30,436 --> 00:11:32,356 And we'll put this into context before long. 270 00:11:32,356 --> 00:11:35,226 There's some very famous and some very common problems 271 00:11:35,226 --> 00:11:38,896 that potentially fall into that bucket but you can see just 272 00:11:38,896 --> 00:11:42,456 from the shape of this graph, if you just to have an input 273 00:11:42,456 --> 00:11:44,896 of size 60, 60 people, 60 something 274 00:11:45,186 --> 00:11:48,206 and my God it takes 200,000 units of time, 275 00:11:48,496 --> 00:11:50,956 and that's a lot compared to anything that's down there. 276 00:11:51,356 --> 00:11:54,526 Now, you've all been sectioning for this class and other classes 277 00:11:54,526 --> 00:11:57,846 and can you just think about the Harvard has in the fall semester 278 00:11:57,846 --> 00:12:01,386 like a few thousand classes of varying sizes 279 00:12:01,716 --> 00:12:06,586 and we have a few hundred rooms probably and there's a few hours 280 00:12:06,586 --> 00:12:09,886 in the day, 9 to 6-ish that classes might be held in 281 00:12:09,886 --> 00:12:12,346 and there's some finite number of faculty, so there's all 282 00:12:12,346 --> 00:12:13,406 of these various inputs. 283 00:12:13,716 --> 00:12:14,706 And if you just start to think 284 00:12:14,706 --> 00:12:16,556 about now I've got a few thousand times 285 00:12:16,556 --> 00:12:18,836 of this many rooms times this many students, 286 00:12:18,946 --> 00:12:22,256 all of whom have these many constraints that the problem 287 00:12:22,256 --> 00:12:25,566 of scheduling people or resources or rooms 288 00:12:25,566 --> 00:12:28,526 or classes is actually one of those problems that tends 289 00:12:28,526 --> 00:12:30,556 to fall in the 2 to the N bucket, 290 00:12:30,556 --> 00:12:32,086 at least if implemented poorly. 291 00:12:32,086 --> 00:12:35,306 FAS's sectioning tool which you use for this class and probably 292 00:12:35,306 --> 00:12:37,616 for some of your other classes, if you actually Google around, 293 00:12:37,616 --> 00:12:39,336 you can find the algorithm that they used 294 00:12:39,476 --> 00:12:42,506 and it's actually disclaimed as not being optimal. 295 00:12:42,506 --> 00:12:44,066 It is not the perfect solution. 296 00:12:44,066 --> 00:12:47,606 It will not maximize all of your happiness across campus 297 00:12:47,866 --> 00:12:50,406 because doing so would actually take a ridiculous amount 298 00:12:50,406 --> 00:12:50,946 of time. 299 00:12:51,186 --> 00:12:54,736 You would essentially have to compare every possible persons, 300 00:12:55,016 --> 00:12:59,156 possible maximum happiness given a current schedule then compared 301 00:12:59,156 --> 00:13:01,756 against everyone else's optimum and then figure out from 302 00:13:02,256 --> 00:13:05,296 that exactly how, who we should put in which class, 303 00:13:05,296 --> 00:13:08,086 which section, with which TF, it just takes too long. 304 00:13:08,496 --> 00:13:10,406 Certainly if you have many classes, 305 00:13:10,406 --> 00:13:11,466 all of whom we're trying to section 306 00:13:11,466 --> 00:13:13,006 at once, so they optimize. 307 00:13:13,076 --> 00:13:14,276 Maybe there's a bit of randomness. 308 00:13:14,576 --> 00:13:16,756 And so with randomness, with optimizations, 309 00:13:16,786 --> 00:13:20,466 kind of cutting corners, the out program like that can say, 310 00:13:20,466 --> 00:13:23,076 you probably got your first or second or third choice 311 00:13:23,346 --> 00:13:25,226 but if you kind of added up all of your happiness, 312 00:13:25,306 --> 00:13:27,906 if you could kind of quantize it somehow, odds are most 313 00:13:27,906 --> 00:13:29,816 of you are not perfectly happy right now 314 00:13:29,856 --> 00:13:30,916 with your section assignments. 315 00:13:30,996 --> 00:13:34,016 But that's sort of the nature of these kinds of problems. 316 00:13:34,646 --> 00:13:37,246 So, let's just slap some jargon on one thing just 317 00:13:37,246 --> 00:13:39,666 so we can express things a little more succinctly later. 318 00:13:40,136 --> 00:13:42,576 In programming, in computer science more generally, 319 00:13:42,866 --> 00:13:45,446 computer scientists have some shorthand notation 320 00:13:45,446 --> 00:13:47,926 for describing how long it takes 321 00:13:47,926 --> 00:13:50,506 to run certain algorithms or certain programs. 322 00:13:50,656 --> 00:13:53,726 And this notation here just denotes what we'll call big O 323 00:13:53,986 --> 00:13:57,086 and this is the-- it stands for order, the order of magnitude 324 00:13:57,086 --> 00:13:58,416 of running time essentially. 325 00:13:58,686 --> 00:14:01,766 So, if we have an algorithm like your grade school counting, 1, 326 00:14:02,046 --> 00:14:05,026 2, 3, 4, we do say that's linear 327 00:14:05,266 --> 00:14:06,626 but if you wanna be really fancy, 328 00:14:06,626 --> 00:14:10,306 you can say that that algorithm of counting is in big O 329 00:14:10,306 --> 00:14:12,896 of N. So, big O of N is again just sort 330 00:14:12,896 --> 00:14:14,996 of the mathy [phonetic] way of saying that algorithm 331 00:14:14,996 --> 00:14:19,496 in the worst case is going to take you N steps no matter what, 332 00:14:19,976 --> 00:14:21,606 because you have to count every person. 333 00:14:21,936 --> 00:14:25,206 By contrast, if we did the two Z's counting, counting everyone 334 00:14:25,206 --> 00:14:28,206 in pairs, 2, 4, 6, 8 and so forth, we could actually whittle 335 00:14:28,206 --> 00:14:31,616 that down to something like big O of N over 2, right, 336 00:14:31,616 --> 00:14:33,756 if N is the total number of people divided by 2. 337 00:14:34,016 --> 00:14:36,106 And though if we went even smarter 338 00:14:36,106 --> 00:14:39,076 and had you guys count yourselves, pairing off again 339 00:14:39,076 --> 00:14:41,706 and again so that the problem goes from this size to this size 340 00:14:41,706 --> 00:14:45,016 to this size to this size, that we could express as big O 341 00:14:45,486 --> 00:14:50,086 of log base 2 of N. And realize that this is pretty much 342 00:14:50,086 --> 00:14:52,116 as mathy as a class like this tends to get but it-- 343 00:14:52,196 --> 00:14:54,676 well, you'll find it helpful in thinking through some 344 00:14:54,676 --> 00:14:57,906 of your own programs longer term in the future weeks just 345 00:14:57,906 --> 00:15:00,176 as to how well designed something might be 346 00:15:00,176 --> 00:15:02,596 and we'll see some alternatives to this big O notation. 347 00:15:02,836 --> 00:15:05,386 In fact what we're gonna see too before long is 348 00:15:05,386 --> 00:15:07,516 that an algorithm that's big O of N 349 00:15:07,856 --> 00:15:10,326 versus an algorithm that's big O of N over 2, 350 00:15:10,676 --> 00:15:13,256 most people just say, any constant number 351 00:15:13,256 --> 00:15:15,176 like that is actually pretty meaningless. 352 00:15:15,456 --> 00:15:16,126 Because you know what? 353 00:15:16,426 --> 00:15:20,516 If your algorithm is twice as fast as mine, whereby you count 354 00:15:20,516 --> 00:15:23,436 in twosies and I count one at a time, well you know what? 355 00:15:23,436 --> 00:15:26,206 All I have to do to catch up to you is wait literally 12 months, 356 00:15:26,206 --> 00:15:28,166 I get a brand new computer that's probably gonna be twice 357 00:15:28,166 --> 00:15:30,346 as fast, thanks to industry trends and voila, 358 00:15:30,346 --> 00:15:31,846 now our algorithms are the same. 359 00:15:32,376 --> 00:15:34,526 But that's not what we mean by comparing algorithms. 360 00:15:34,526 --> 00:15:35,986 So, we do in fact, when talking 361 00:15:35,986 --> 00:15:38,626 about how well designed something is, we just cross 362 00:15:38,626 --> 00:15:40,986 out things like divide by 2 or multiply by 2, 363 00:15:40,986 --> 00:15:42,336 any of these constant factors. 364 00:15:42,596 --> 00:15:45,166 Because again, it then ties us too much 365 00:15:45,166 --> 00:15:48,626 to current hardware capabilities and so we try to focus only 366 00:15:48,626 --> 00:15:51,486 on the-- the formulaic part, the variables. 367 00:15:51,986 --> 00:15:53,996 So, let's try to make this a little more real. 368 00:15:54,626 --> 00:15:55,966 So, dramatic flourish time. 369 00:15:56,636 --> 00:16:02,416 So, in advance, we prepared these white pieces 370 00:16:02,416 --> 00:16:03,846 of paper here. 371 00:16:04,606 --> 00:16:06,106 Hopefully you didn't see what's behind them. 372 00:16:06,106 --> 00:16:09,356 On the-- on this blackboard here, and the question 373 00:16:09,356 --> 00:16:13,126 at hand is, "How can we find numbers behind these doors 374 00:16:13,396 --> 00:16:15,606 efficiently as quickly as possible?" 375 00:16:15,606 --> 00:16:17,806 And what's nice about this point in the semester is 376 00:16:17,806 --> 00:16:19,846 that now even though you might still be fighting 377 00:16:19,846 --> 00:16:22,716 with some new season tax and functions and return variables 378 00:16:22,716 --> 00:16:23,826 and stuff that might be new to you, 379 00:16:24,006 --> 00:16:26,096 we at least have a pretty good toolkit 380 00:16:26,466 --> 00:16:27,876 for storing information,. 381 00:16:27,876 --> 00:16:30,096 You've got variables and floats and chars and strings 382 00:16:30,096 --> 00:16:32,186 but we also now have what data structure 383 00:16:32,186 --> 00:16:33,626 that we introduced a couple of days ago? 384 00:16:34,356 --> 00:16:35,416 Yeah, so we have arrays. 385 00:16:35,416 --> 00:16:37,536 And an array in English, anyone, is what? 386 00:16:40,026 --> 00:16:42,716 Wanna have one person, in just words you might use 387 00:16:42,716 --> 00:16:43,656 to explain it to your roommate? 388 00:16:44,516 --> 00:16:50,316 [ Inaudible Remark ] 389 00:16:50,816 --> 00:16:53,216 >> Okay, collection of data points, collection of integers 390 00:16:53,216 --> 00:16:55,136 or chars or whatnot and a little-- 391 00:16:55,136 --> 00:16:56,246 be a little more precise, what's-- 392 00:16:56,756 --> 00:16:58,056 what about that collection? 393 00:16:58,176 --> 00:17:02,166 Finite number of them, and can you be even more specific 394 00:17:02,166 --> 00:17:04,766 with regards to how they're laid out in memory? 395 00:17:04,766 --> 00:17:06,126 [ Inaudible Remark ] 396 00:17:06,126 --> 00:17:12,086 >> Perfect, they each take up the same amount of space, int, 397 00:17:12,086 --> 00:17:13,816 int, int, int or char, char, char, 398 00:17:13,816 --> 00:17:16,656 char and they're all contiguous in memory. 399 00:17:16,656 --> 00:17:18,886 And the advantage of the contiguousness 400 00:17:18,886 --> 00:17:20,506 of these values is that you can use 401 00:17:20,506 --> 00:17:22,736 that very succinct square bracket notation to go 402 00:17:22,736 --> 00:17:25,436 from the zeroth element to element 1, element 2, 403 00:17:25,436 --> 00:17:28,386 element N minus 1, but we did see some dangers. 404 00:17:28,386 --> 00:17:31,536 If you accidentally take an array that's of size N and go 405 00:17:31,536 --> 00:17:33,956 to the name of the array bracket N, 406 00:17:34,146 --> 00:17:38,166 what did you just do that's bad? 407 00:17:38,386 --> 00:17:41,256 Sorry? You fell off the end of the array. 408 00:17:41,256 --> 00:17:43,806 And now sometimes this might have no bad effect, right? 409 00:17:43,806 --> 00:17:45,646 I printed something out where I went a little too far 410 00:17:45,646 --> 00:17:47,846 and all we saw was a space character, not a big deal. 411 00:17:47,846 --> 00:17:49,616 But if you a little too far, 412 00:17:49,826 --> 00:17:51,406 you're actually gonna start touching RAM 413 00:17:51,406 --> 00:17:52,686 that does not belong to you. 414 00:17:52,686 --> 00:17:55,276 In fact it might belong to some other program potentially all 415 00:17:55,276 --> 00:17:58,476 together, and so the operating system will typically crash 416 00:17:58,476 --> 00:18:01,246 in some form and the-- the bad news you get 417 00:18:01,246 --> 00:18:04,866 from your command line is core dump or segmentation fault, 418 00:18:04,866 --> 00:18:06,986 one or both of those realities. 419 00:18:07,046 --> 00:18:09,866 But let's assume for now that we at least have arrays 420 00:18:09,866 --> 00:18:10,946 and so we have this capacity 421 00:18:10,946 --> 00:18:14,026 to store say 8 integers back to back to back. 422 00:18:14,256 --> 00:18:16,386 So, this might be our first array here. 423 00:18:16,586 --> 00:18:19,176 This might be a second array here and behind each 424 00:18:19,176 --> 00:18:22,536 of the doors of paper here on the top, 425 00:18:22,626 --> 00:18:25,346 we have some random number in random order. 426 00:18:25,346 --> 00:18:30,966 And suppose if I were to ask you to say find me the number 7, 427 00:18:31,466 --> 00:18:35,716 what would you do in that top array of integers? 428 00:18:39,476 --> 00:18:42,336 This is not even a computer science question. 429 00:18:42,336 --> 00:18:43,716 I say, "Here's 8 pieces of paper, 430 00:18:43,716 --> 00:18:45,066 there's 8 numbers behind them." 431 00:18:45,066 --> 00:18:46,976 It's sort of game show like, find me the number 7. 432 00:18:46,976 --> 00:18:48,316 What would you do as a reasonable person? 433 00:18:49,846 --> 00:18:50,226 I'm sorry? 434 00:18:51,326 --> 00:18:52,716 Alright, so rip off the paper, right, 435 00:18:52,716 --> 00:18:54,486 and just start looking for it, right? 436 00:18:54,486 --> 00:18:57,256 And unfortunately all of these doors are closed, 437 00:18:57,256 --> 00:18:59,216 all of the papers hiding it, so you can't kind 438 00:18:59,216 --> 00:19:01,746 of do this human thing and just kind of skim it and be like, 439 00:19:01,746 --> 00:19:03,106 "Oh, bam, it's right there." 440 00:19:03,356 --> 00:19:05,146 Rather you have to be much more methodical 441 00:19:05,146 --> 00:19:07,936 and actually open the door or lift the paper then let it go 442 00:19:07,936 --> 00:19:09,596 and then check another one, check another one. 443 00:19:09,826 --> 00:19:12,176 And this kinda hammer home-- hammers home the point perhaps 444 00:19:12,436 --> 00:19:14,376 that a computer too has to do that. 445 00:19:14,376 --> 00:19:16,766 We've had the luxury for some 18, 20, 446 00:19:16,766 --> 00:19:19,976 30 years as humans growing up of just kind of being able 447 00:19:19,976 --> 00:19:22,566 to glance at something and say, "It's right there, 448 00:19:22,566 --> 00:19:23,526 that number is right there." 449 00:19:23,526 --> 00:19:26,566 Or looking out on this room and saying, "Feels like 60 people," 450 00:19:26,716 --> 00:19:28,066 and you get this aggregate sense. 451 00:19:28,276 --> 00:19:31,066 But what your brain, you know, maybe is actually doing 452 00:19:31,066 --> 00:19:32,776 and what a computer certainly would need 453 00:19:32,776 --> 00:19:35,926 to do is you can't just kind of glance out and say, 60. 454 00:19:36,166 --> 00:19:39,466 You have to do something much more methodical, 1, 2, 3, 4, 5. 455 00:19:39,466 --> 00:19:42,066 Same thing in Scratch, you might have felt your self frustrated 456 00:19:42,066 --> 00:19:44,366 at first because Scratch kind of confines you 457 00:19:44,366 --> 00:19:48,086 to using only specific puzzle pieces that exist and you kind 458 00:19:48,086 --> 00:19:49,896 of have to figure out how to express something 459 00:19:49,896 --> 00:19:51,596 like that would be very natural to say. 460 00:19:51,846 --> 00:19:53,606 Like, move the cat up to the left. 461 00:19:53,956 --> 00:19:55,466 Well, how do you actually move that cat, 462 00:19:55,466 --> 00:19:57,296 how do you move Scratch up to the left? 463 00:19:57,646 --> 00:19:59,766 Well, you had to do something like in a loop and you had 464 00:19:59,766 --> 00:20:01,936 to have a move and then turn and move and turn. 465 00:20:02,116 --> 00:20:04,246 >> In other words you had to break down common sense 466 00:20:04,536 --> 00:20:07,156 and human assumptions into something much more precise. 467 00:20:07,726 --> 00:20:10,276 So, one of my favorite CS50 moments was a couple 468 00:20:10,276 --> 00:20:12,636 of years ago when we did a demonstration much like this 469 00:20:12,636 --> 00:20:14,256 with one of your predecessors named Shawn. 470 00:20:14,486 --> 00:20:16,246 Rather than have someone new come up this year, 471 00:20:16,246 --> 00:20:17,816 I thought I would engage us and we'll look back 472 00:20:18,146 --> 00:20:21,176 at how Shawn approached this problem and hopefully 473 00:20:21,386 --> 00:20:24,396 as you watch Shawn participates in this example here, 474 00:20:24,606 --> 00:20:27,406 you yourself will be thinking about what you would be doing 475 00:20:27,696 --> 00:20:30,586 or would not be doing if you had been Shawn standing in front 476 00:20:30,756 --> 00:20:34,866 of some 300 people trying and a little example like this. 477 00:20:34,966 --> 00:20:37,746 So, this is Shawn trying to find us the number 7. 478 00:20:37,746 --> 00:20:39,626 And again, ask yourself as we watch this, 479 00:20:39,936 --> 00:20:43,426 what you might have been doing or yelling at Shawn at the time. 480 00:20:44,626 --> 00:20:47,856 Okay, so your task here, Shawn, is the following. 481 00:20:47,856 --> 00:20:52,846 I had hidden behind these doors the number 7, but tucked away 482 00:20:52,846 --> 00:20:56,176 in some of these doors as well are other nonnegative numbers 483 00:20:56,456 --> 00:20:58,376 and your goal is to think of this top row 484 00:20:58,376 --> 00:20:59,606 of numbers is just an array. 485 00:20:59,946 --> 00:21:01,686 Were just a sequence of pieces of paper 486 00:21:01,686 --> 00:21:02,576 with numbers behind them, 487 00:21:02,836 --> 00:21:05,496 and your goal is only using the top array here, 488 00:21:05,656 --> 00:21:06,756 find me the number 7. 489 00:21:06,836 --> 00:21:07,556 And we are then going 490 00:21:07,556 --> 00:21:08,946 to critique how you go about doing it. 491 00:21:09,976 --> 00:21:10,226 >> Alright. 492 00:21:10,396 --> 00:21:20,376 >> Find us the number 7 please. 493 00:21:20,876 --> 00:21:25,426 [ Laughter ] 494 00:21:25,926 --> 00:21:27,126 >> No, 5, 19, 13. 495 00:21:27,126 --> 00:21:27,193 [ Laughter ] 496 00:21:27,193 --> 00:21:33,516 >> It's not a trick question. 497 00:21:33,766 --> 00:21:48,186 One. At this point your score is not very good, 498 00:21:48,356 --> 00:21:54,996 so you might as well keep going. 499 00:21:55,056 --> 00:21:55,466 Three, go on. 500 00:21:55,466 --> 00:22:04,566 Frankly I can't help but wonder what you're even thinking about. 501 00:22:04,566 --> 00:22:04,633 [ Laughter ] 502 00:22:04,633 --> 00:22:06,976 >> Only the top row, so you got three left, so find me 7. 503 00:22:07,516 --> 00:22:16,396 [ Inaudible Remark ] 504 00:22:16,896 --> 00:22:25,776 [ Laughter ] 505 00:22:26,276 --> 00:22:26,936 >> 17. 506 00:22:27,516 --> 00:22:31,101 [ Laughter ] 507 00:22:31,601 --> 00:22:35,186 [ Inaudible Remark ] 508 00:22:35,686 --> 00:22:37,676 >> 7. 509 00:22:38,176 --> 00:22:40,716 [ Applause ] 510 00:22:41,216 --> 00:22:42,486 >> So, we never again have to do 511 00:22:42,486 --> 00:22:44,146 that demonstration 'cause Shawn kind 512 00:22:44,286 --> 00:22:46,796 of captured the potential of it so perfectly. 513 00:22:47,196 --> 00:22:50,786 So, there really was no right answer to that question, right? 514 00:22:50,786 --> 00:22:53,396 The best he could have done was just start tearing the pieces 515 00:22:53,396 --> 00:22:56,666 of paper from left to right and find the number 7 516 00:22:56,666 --> 00:22:58,676 and I'm sure he was thinking, "There's got 517 00:22:58,676 --> 00:22:59,566 to be some trick to this." 518 00:22:59,566 --> 00:23:00,296 And it probably didn't help 519 00:23:00,296 --> 00:23:02,036 that all these eyes were bearing down on him. 520 00:23:02,166 --> 00:23:04,326 But the problem is that if you're not given any sort 521 00:23:04,326 --> 00:23:05,386 of assumptions to go on. 522 00:23:05,386 --> 00:23:07,296 You're just told, "Here's a bunch of numbers, 523 00:23:07,296 --> 00:23:08,646 they happen to be nonnegative but they're 524 00:23:08,646 --> 00:23:10,356 in some random order, find me something." 525 00:23:10,616 --> 00:23:13,756 The best you can do is to implement an algorithm 526 00:23:13,936 --> 00:23:16,046 in what running time, big O of what, 527 00:23:16,046 --> 00:23:18,086 if there's N total doors to look behind? 528 00:23:18,746 --> 00:23:19,756 It's big O of N, right? 529 00:23:19,756 --> 00:23:23,746 So, that algorithm in the worst case was gonna take Shawn big O 530 00:23:23,746 --> 00:23:25,376 of N steps and it almost did. 531 00:23:25,376 --> 00:23:27,576 It was big O of, in his case, N minus 1. 532 00:23:27,836 --> 00:23:30,536 But as we do with divide by 2, even doing plus this, 533 00:23:30,536 --> 00:23:31,576 minus this, who cares? 534 00:23:31,776 --> 00:23:33,806 We just typically erase that part of the formula 535 00:23:33,806 --> 00:23:37,106 and we just say Shawn's implementation was big O of N. 536 00:23:37,736 --> 00:23:40,406 But there was-- there's another way we can express running time 537 00:23:40,556 --> 00:23:42,766 because Shawn could have gotten lucky, right? 538 00:23:42,766 --> 00:23:46,036 If 2 years ago he had actually done something slightly 539 00:23:46,036 --> 00:23:48,436 different, he might have avoided 540 00:23:48,436 --> 00:23:51,416 that whole awkward moment for himself on stage. 541 00:23:51,416 --> 00:23:53,666 What could he have done differently just by chance 542 00:23:53,936 --> 00:23:56,506 that would have made that problem go away much faster? 543 00:23:57,576 --> 00:23:58,806 Alright, so start on the other end. 544 00:23:58,806 --> 00:24:00,696 Right now there's no way he would have known that 545 00:24:00,696 --> 00:24:02,556 and had he done that we wouldn't be showing the video 'cause it 546 00:24:02,556 --> 00:24:04,386 wouldn't be nearly as much fun to watch 547 00:24:04,596 --> 00:24:08,656 but he would have solved that problem in just one step, right? 548 00:24:08,656 --> 00:24:12,516 So, in one step you would say is the best case solution 549 00:24:12,516 --> 00:24:13,266 for that algorithm. 550 00:24:13,266 --> 00:24:14,276 The process is the same. 551 00:24:14,276 --> 00:24:15,506 Even if he started from the right 552 00:24:15,506 --> 00:24:18,446 and most likely he would have just played the same game 553 00:24:18,446 --> 00:24:20,616 but from right to left instead of left right. 554 00:24:20,826 --> 00:24:24,296 So, in the worst case, 7 could have been here and if Shawn just 555 00:24:24,296 --> 00:24:26,166 so happen to start over there, well, 556 00:24:26,166 --> 00:24:27,466 it's the same kind of bad luck. 557 00:24:27,466 --> 00:24:30,416 He would be a big O of N. But in the case of good luck 558 00:24:30,416 --> 00:24:33,366 where 7 is here and he starts here or 7 is here 559 00:24:33,366 --> 00:24:37,566 and he starts here, we would say that that algorithm is yes 560 00:24:37,566 --> 00:24:40,686 in big O of N 'cause big O is again worst case 561 00:24:41,016 --> 00:24:43,076 but we can also express it as Omega, 562 00:24:43,426 --> 00:24:46,146 Omega of some constant number. 563 00:24:46,146 --> 00:24:48,716 And how many steps in the best case might have taken Shawn 564 00:24:48,716 --> 00:24:49,716 to find the number 7? 565 00:24:50,496 --> 00:24:52,966 One, so we would say, in Omega of 1. 566 00:24:52,966 --> 00:24:55,426 So, big O parenthesis N for the worst case 567 00:24:55,646 --> 00:24:58,816 and then Omega parenthesis 1 for the best case. 568 00:24:58,816 --> 00:25:00,616 And maybe it could take him 2 steps, right? 569 00:25:00,616 --> 00:25:02,406 Maybe it actually takes 2 steps. 570 00:25:02,406 --> 00:25:05,186 Conceptually you have to lift the paper, look at number 571 00:25:05,186 --> 00:25:07,716 and that's two steps but the takeaway is 572 00:25:07,716 --> 00:25:09,346 that it's a constant number of steps. 573 00:25:09,446 --> 00:25:12,046 Irrespective of how many doors there are in the best case, 574 00:25:12,296 --> 00:25:15,026 the solution to that problem might only involve just 575 00:25:15,026 --> 00:25:15,596 one step. 576 00:25:16,206 --> 00:25:19,666 So, could he have done better with these doors? 577 00:25:19,726 --> 00:25:23,006 Would you have done anything differently given the same input 578 00:25:23,006 --> 00:25:23,526 that he was? 579 00:25:23,526 --> 00:25:24,236 [ Inaudible Remark ] 580 00:25:24,236 --> 00:25:29,056 >> Okay, so you could try the twosies approach where you could 581 00:25:29,056 --> 00:25:31,656 at least have kind of made that awkwardness go away twice 582 00:25:31,656 --> 00:25:33,916 as fast by checking these two and then these two 583 00:25:33,916 --> 00:25:36,626 and then these two and it would take 4 steps maximally instead 584 00:25:36,626 --> 00:25:37,016 of 8. 585 00:25:37,016 --> 00:25:40,776 So, in formulaic terms, it's still kind of big O 586 00:25:40,776 --> 00:25:43,646 of N 'cause it's big O of N over 2 but we said that, whatever, 587 00:25:43,646 --> 00:25:46,816 with the constant numbers it's still big O of N, but in reality 588 00:25:46,816 --> 00:25:48,306 that would absolutely have been faster, 589 00:25:48,306 --> 00:25:49,176 it would have been twice as fast. 590 00:25:49,466 --> 00:25:53,046 If he actually had some way of lifting two pages with one hand 591 00:25:53,046 --> 00:25:56,336 and then two in the other, he could actually do N over 4 592 00:25:56,336 --> 00:25:59,356 and get that over right quick in just two huge steps 593 00:25:59,726 --> 00:26:01,506 or lift all the papers at once. 594 00:26:01,806 --> 00:26:03,866 But again these aren't fundamental improvements 595 00:26:03,916 --> 00:26:06,776 and he's really given no additional information he 596 00:26:06,776 --> 00:26:07,406 can leverage. 597 00:26:07,406 --> 00:26:10,176 If I had him taught his whole, they're all in random order. 598 00:26:10,376 --> 00:26:13,506 I mean sure, you can start in random order, start here, no, 599 00:26:13,506 --> 00:26:16,706 start here, no, start here, no. 600 00:26:16,856 --> 00:26:19,386 But at the end, you're gonna have to check them all. 601 00:26:19,436 --> 00:26:21,326 And in fact f you try this random approach, 602 00:26:21,436 --> 00:26:23,606 what might you also have to do if you're now the human 603 00:26:23,606 --> 00:26:25,646 or a computer implementing that kind of randomness? 604 00:26:26,516 --> 00:26:29,436 You have to remember where you've been. 605 00:26:29,436 --> 00:26:32,116 So, here we see the first example of a tradeoff. 606 00:26:32,116 --> 00:26:35,666 Well, you can maybe get lucky and find the number 7 faster 607 00:26:35,666 --> 00:26:37,016 but it's gonna cost you something. 608 00:26:37,236 --> 00:26:39,256 You're gonna need to keep around some variables 609 00:26:39,256 --> 00:26:40,666 or another array just 610 00:26:40,666 --> 00:26:42,606 to remember what doors you've been at. 611 00:26:42,606 --> 00:26:45,556 For instance, you could have another array of size 8 612 00:26:45,776 --> 00:26:47,706 and just might-- much like you would on a piece paper, 613 00:26:47,706 --> 00:26:49,736 you could check off which doors you've been-- 614 00:26:49,876 --> 00:26:51,136 been to that you've opened 615 00:26:51,386 --> 00:26:53,366 but that's gonna take you another 8 boolean, 616 00:26:53,406 --> 00:26:55,326 some number of bits or bytes of memory. 617 00:26:55,746 --> 00:26:58,106 So, what would Shawn have benefited 618 00:26:58,106 --> 00:26:59,946 from besides my just flat 619 00:26:59,946 --> 00:27:02,196 out saying the number 7 is behind that door? 620 00:27:02,456 --> 00:27:03,356 What could I have done 621 00:27:03,356 --> 00:27:09,686 to at least help him make his algorithm better? 622 00:27:09,686 --> 00:27:09,876 [ Inaudible Remark ] 623 00:27:09,876 --> 00:27:10,506 >> Okay, yeah. 624 00:27:10,506 --> 00:27:12,256 So what if I have arranged it in ascending 625 00:27:12,256 --> 00:27:13,426 or descending order, right? 626 00:27:13,426 --> 00:27:15,466 What if I steal a very reasonable idea that, you know, 627 00:27:15,466 --> 00:27:16,526 Verizon has been doing 628 00:27:16,526 --> 00:27:18,766 for a thousand years and its predecessors. 629 00:27:18,996 --> 00:27:21,256 You're-- our phone numbers are not in random order in this book 630 00:27:21,256 --> 00:27:23,706 and for good reason, they're alphabetical, right? 631 00:27:23,706 --> 00:27:26,496 But if they're alphabetical, that's not just convenient, 632 00:27:26,556 --> 00:27:29,186 that is an assumption we can now exploit 633 00:27:29,386 --> 00:27:32,796 to implement this technique known as divide and conquer. 634 00:27:32,796 --> 00:27:35,236 So, let's go ahead and [inaudible] 635 00:27:35,236 --> 00:27:37,486 for this, we have a volunteer. 636 00:27:37,626 --> 00:27:41,076 And I think I can promise not to show you on video 2 years hence 637 00:27:41,166 --> 00:27:42,776 but let's see if you graduate first. 638 00:27:43,126 --> 00:27:45,556 How about in the back, you wanna come on down? 639 00:27:48,476 --> 00:27:53,046 So, let's suppose we did just that and behind these sets 640 00:27:53,046 --> 00:27:54,506 of doors, this second array. 641 00:27:54,506 --> 00:27:57,746 So, this is indeed what Shawn saw long ago. 642 00:27:58,826 --> 00:28:01,446 These were the answers that he was revealing one 643 00:28:01,446 --> 00:28:03,936 at a time, okay? 644 00:28:04,356 --> 00:28:07,486 Come on up, what's your name? 645 00:28:07,586 --> 00:28:07,836 >> Jeremy. 646 00:28:07,986 --> 00:28:08,606 >> Jeremy, alright. 647 00:28:08,606 --> 00:28:10,046 Jeremy, David, nice to meet you. 648 00:28:10,336 --> 00:28:12,146 Alright, so now Jeremy's tasked 649 00:28:12,786 --> 00:28:16,036 with finding let's say the number 50 but this time-- 650 00:28:16,036 --> 00:28:17,566 so these numbers are different, just to be clear, 651 00:28:17,566 --> 00:28:18,796 they're just different random numbers. 652 00:28:19,036 --> 00:28:22,726 But I did take some time before class to sort the numbers 653 00:28:23,006 --> 00:28:25,336 in such a way that there still might be gaps between them. 654 00:28:25,336 --> 00:28:28,226 The numbers are not gonna be, 48, 49, 50, 51, 52, 655 00:28:28,306 --> 00:28:30,626 there could be gaps but the array 656 00:28:30,626 --> 00:28:33,846 of numbers now is increasing from left to right. 657 00:28:33,846 --> 00:28:35,426 And I ask you, find the number 50. 658 00:28:36,106 --> 00:28:37,026 Before you reveal it, 659 00:28:37,246 --> 00:28:40,846 what's your first instinct, what do you wanna do? 660 00:28:40,846 --> 00:28:41,076 [ Inaudible Remark ] 661 00:28:41,076 --> 00:28:44,066 >> Okay, let's-- just for the folks playing along at home. 662 00:28:44,666 --> 00:28:46,396 >> Take one of two middle [inaudible] numbers. 663 00:28:47,326 --> 00:28:52,106 >> Okay. Okay, and why the middle ones? 664 00:28:52,106 --> 00:28:52,796 [ Inaudible Remark ] 665 00:28:52,796 --> 00:28:53,346 >> Okay, good. 666 00:28:53,346 --> 00:28:55,866 So, divide the numbers in half so that much 667 00:28:55,866 --> 00:28:57,346 like the phonebook we can then go left to right. 668 00:28:57,346 --> 00:28:59,696 Alright, so take your pick, reveal one such door. 669 00:29:00,966 --> 00:29:02,766 Okay, so we're at 61, alright? 670 00:29:02,766 --> 00:29:06,806 So, this was the number 61 and now this is obviously bigger 671 00:29:06,806 --> 00:29:07,956 or smaller than what we're looking for? 672 00:29:08,006 --> 00:29:10,566 Alright, so it's bigger, so much like we dramatically try 673 00:29:10,566 --> 00:29:11,866 to tear phonebooks in weeks past, 674 00:29:12,296 --> 00:29:15,286 this problem now really goes away and we don't have 675 00:29:15,286 --> 00:29:18,296 to even know what these numbers are but we can at least ignore 676 00:29:18,296 --> 00:29:20,046 that half of the problem entirely. 677 00:29:20,046 --> 00:29:21,906 So, we've gone from size 8 to size 4, 678 00:29:21,906 --> 00:29:23,006 now what do you wanna do? 679 00:29:23,606 --> 00:29:24,896 Alright, so now let's go down the middle. 680 00:29:25,606 --> 00:29:28,016 And again just to be clear here, there's really not a middle, 681 00:29:28,016 --> 00:29:30,686 so if we were implementing this in a program, you got to have 682 00:29:30,686 --> 00:29:32,216 to figure out how to round, you're either gonna have 683 00:29:32,216 --> 00:29:34,956 to round down to an even number or round up to an odd number, 684 00:29:34,956 --> 00:29:37,686 something like that, but we can at least do that consistently. 685 00:29:37,686 --> 00:29:40,256 So, you're rounding up slightly, perfect. 686 00:29:40,736 --> 00:29:42,306 So, after-- a big round of applause, 687 00:29:42,306 --> 00:29:42,986 that was-- [applause] Okay? 688 00:29:47,486 --> 00:29:49,796 So, partly good luck, right, 689 00:29:49,896 --> 00:29:53,366 because that could have taken Jeremy not 2 steps, 690 00:29:53,366 --> 00:29:55,276 which it did, but rather 3, right? 691 00:29:55,276 --> 00:29:56,856 If you'd kind of rounded down just based 692 00:29:56,856 --> 00:29:58,516 on whatever arithmetic you implemented, 693 00:29:58,726 --> 00:29:59,726 and was there any reason you went 694 00:29:59,726 --> 00:30:01,246 with the right instead of the left? 695 00:30:02,166 --> 00:30:03,796 >> 61 is pretty close. 696 00:30:03,846 --> 00:30:04,336 These numbers are [inaudible]. 697 00:30:04,886 --> 00:30:05,386 >> Okay. 698 00:30:05,386 --> 00:30:06,596 [ Inaudible Remark ] 699 00:30:06,596 --> 00:30:07,146 >> Okay, good. 700 00:30:07,146 --> 00:30:09,306 So, he was actually a little cheating then if taking 701 00:30:09,306 --> 00:30:12,026 into account the numbers that we pretended weren't there, 702 00:30:12,026 --> 00:30:14,756 but that's fine, right, because if you are given some additional 703 00:30:14,756 --> 00:30:17,446 input like the gaps are of this size or the numbers are 704 00:30:17,446 --> 00:30:18,326 between this range, 705 00:30:18,526 --> 00:30:20,846 that's absolutely something you could exploit and worse case, 706 00:30:20,846 --> 00:30:21,866 you just randomly choose. 707 00:30:22,116 --> 00:30:24,916 But in short, even though there's only 8 pieces of paper 708 00:30:24,916 --> 00:30:28,446 or doors here, we still went from 8 to 4 709 00:30:28,656 --> 00:30:31,806 and then we got lucky when we went from 4 to 1. 710 00:30:31,936 --> 00:30:33,486 But even in the worse case, we would have gone 711 00:30:33,486 --> 00:30:36,286 from 8 to 4 to 2 to 1. 712 00:30:36,546 --> 00:30:38,376 So, that's 3 iterations total 713 00:30:38,376 --> 00:30:41,916 and that indeed would be a logarithmic running time. 714 00:30:41,916 --> 00:30:44,366 Because if we then said to Jeremy, alright, 715 00:30:44,366 --> 00:30:46,366 there's not 8 pieces of paper, I actually maybe went 716 00:30:46,366 --> 00:30:48,296 to the effort of putting 16 pieces of paper, 717 00:30:48,486 --> 00:30:49,666 how many steps would that have taken you? 718 00:30:50,946 --> 00:30:55,926 Well, [inaudible] not twice as many. 719 00:30:55,926 --> 00:30:56,756 [ Inaudible Remark ] 720 00:30:56,756 --> 00:30:57,766 >> Okay, good. 721 00:30:57,766 --> 00:31:00,646 So if we had 8 pieces of paper here and it took you 2 steps 722 00:31:00,646 --> 00:31:02,756 in that case, suppose we doubled it. 723 00:31:02,756 --> 00:31:03,236 [ Inaudible Remark ] 724 00:31:03,236 --> 00:31:05,196 >> Exactly, just one more division. 725 00:31:05,196 --> 00:31:06,316 And that's the powerful thing. 726 00:31:06,316 --> 00:31:08,576 We go from 8 to 16 to 32 to 64 727 00:31:08,756 --> 00:31:11,346 and you can keep batting these problems away, chopping them 728 00:31:11,346 --> 00:31:13,536 in half ever so easily. 729 00:31:13,616 --> 00:31:17,976 So, very well done, thank you so much, alright. 730 00:31:19,426 --> 00:31:19,776 [Applause] Alright. 731 00:31:19,776 --> 00:31:23,156 So, unfortunately, when you are just given some data 732 00:31:23,346 --> 00:31:26,136 like you don't have someone presorting things for you. 733 00:31:26,136 --> 00:31:28,576 So at the end of the day, this kind of begs the question, 734 00:31:28,816 --> 00:31:32,266 just how efficiently can we actually sort all 735 00:31:32,266 --> 00:31:32,896 of these numbers? 736 00:31:32,896 --> 00:31:34,506 How do we go about doing it 737 00:31:34,506 --> 00:31:37,276 and with what techniques can we do and visualize this? 738 00:31:37,686 --> 00:31:39,556 Well, why don't we go ahead and pause here, 739 00:31:39,556 --> 00:31:41,776 take our 5-minute break and then we'll bring some more folks 740 00:31:41,776 --> 00:31:45,626 up on stage. 741 00:31:46,216 --> 00:31:50,436 Alright, so let's put these numbers aside and use numbers 742 00:31:50,436 --> 00:31:52,596 that are a little smaller and more manageable. 743 00:31:52,826 --> 00:31:55,346 Here are some numbers in essentially random order, 744 00:31:55,346 --> 00:31:56,396 and there's only 8 of them. 745 00:31:56,546 --> 00:31:58,756 And the fact that I keep using frankly 8 is kind 746 00:31:58,756 --> 00:32:00,746 of intentional 'cause it divides by two nicely, 747 00:32:00,746 --> 00:32:02,906 but in reality you might get sizes of 9 748 00:32:02,906 --> 00:32:04,456 or 13 or any crazy numbers. 749 00:32:04,736 --> 00:32:06,226 So again just realize when it comes time 750 00:32:06,226 --> 00:32:09,286 to implement something where you're dividing by 2, 751 00:32:09,476 --> 00:32:11,986 you're not necessarily gonna get as perfect numbers as we are. 752 00:32:12,236 --> 00:32:14,376 So, you're gonna have to figure out whether to round up or down 753 00:32:14,376 --> 00:32:16,406 but you've seen things like the round function already. 754 00:32:16,446 --> 00:32:18,036 So, that will come up again before long. 755 00:32:18,376 --> 00:32:22,736 So, it's more fun than just kind of talking very boringly 756 00:32:22,736 --> 00:32:23,856 about numbers on the screen. 757 00:32:24,156 --> 00:32:25,606 We actually try to engage 758 00:32:25,606 --> 00:32:26,696 in other [inaudible] worth of folks. 759 00:32:26,966 --> 00:32:30,356 Can I ask for your indulgence one last time 760 00:32:30,616 --> 00:32:32,156 to have 8 people join us up here? 761 00:32:33,316 --> 00:32:34,656 Okay, you are number 1. 762 00:32:35,096 --> 00:32:35,666 Come on up. 763 00:32:35,666 --> 00:32:41,186 Number 2, 3, 4, 5, 6, 7 and really it's just me 764 00:32:41,186 --> 00:32:41,996 and the orchestra today. 765 00:32:42,236 --> 00:32:45,206 Okay. Well, it's-- wait, wait, I said 7. 766 00:32:45,206 --> 00:32:46,966 How about let's go in the back there, 8. 767 00:32:47,416 --> 00:32:48,096 Alright, come on down. 768 00:32:48,206 --> 00:32:50,906 I think I got 3, 4, 5, 6, 7, perfect. 769 00:32:51,436 --> 00:32:54,896 Alright, you raised your hand first, so go ahead if you would 770 00:32:54,896 --> 00:32:58,946 and just line yourselves up on stage in a manner consistent 771 00:32:59,376 --> 00:33:10,786 with the sample on the screen there, 1, 2, 3, oh there we go. 772 00:33:10,926 --> 00:33:11,576 Just in time. 773 00:33:12,316 --> 00:33:15,166 Alright. And so-- so that the lectern is not in the way, 774 00:33:15,166 --> 00:33:18,226 why don't we come over here if you don't mind. 775 00:33:19,996 --> 00:33:22,136 Alright, so this is the easy part. 776 00:33:22,696 --> 00:33:25,936 Make yourselves look like that. 777 00:33:26,116 --> 00:33:30,466 Alright, hopefully we are correct, okay? 778 00:33:30,656 --> 00:33:32,226 So, now we need one ninth volunteer 779 00:33:32,226 --> 00:33:33,386 and this person actually has 780 00:33:33,416 --> 00:33:36,206 to do some thinking on their feet, okay? 781 00:33:36,206 --> 00:33:38,976 Come on down, yeah, in the black shirt. 782 00:33:40,626 --> 00:33:41,996 Oh, I'm sorry; we'll get you next time. 783 00:33:42,076 --> 00:33:42,806 Black shirt, how about it? 784 00:33:42,806 --> 00:33:44,116 Come on down. 785 00:33:44,116 --> 00:33:46,826 Alright, so in just a moment we're gonna have to figure 786 00:33:46,826 --> 00:33:49,266 out how we can take this array of numbers 787 00:33:49,266 --> 00:33:51,276 which is effectively random, much like the first array 788 00:33:51,276 --> 00:33:53,296 that Shawn actually used, but we need to sort this. 789 00:33:53,526 --> 00:33:55,716 And even though this too might seem at first glance, 790 00:33:55,966 --> 00:33:57,036 and this is kind of easy, right? 791 00:33:57,036 --> 00:33:58,856 Like, there's number 1, let's put 1 there, 8 there, 792 00:33:59,276 --> 00:34:01,286 that's the sort of human approach, where you just kind 793 00:34:01,286 --> 00:34:04,006 of get an overarching sense of things and you just kind 794 00:34:04,006 --> 00:34:05,976 of know instinctively what to do. 795 00:34:06,186 --> 00:34:08,566 But again if you're a computer or even a robot, if you think 796 00:34:08,566 --> 00:34:11,486 of it that way, you really can only do one thing at a time 797 00:34:11,676 --> 00:34:14,346 and so you might have to first look here behind this door 798 00:34:14,346 --> 00:34:15,946 and there's a number, then you can move 799 00:34:15,946 --> 00:34:17,396 on to the next, then on to the next. 800 00:34:17,646 --> 00:34:19,856 But then if you wanna swap people, you actually have 801 00:34:19,886 --> 00:34:23,286 to spend some CPU time, some CPU cycles to move them around. 802 00:34:23,556 --> 00:34:25,356 So, again we come back to this notion of swap. 803 00:34:25,356 --> 00:34:28,026 But unfortunately according to last week's buggy program, 804 00:34:28,276 --> 00:34:30,196 swap doesn't really work in C. So, 805 00:34:30,196 --> 00:34:31,926 that's something else we'll have to fix before long. 806 00:34:32,096 --> 00:34:32,616 And what is your name? 807 00:34:32,856 --> 00:34:33,166 >> Casey [phonetic]. 808 00:34:33,276 --> 00:34:33,886 >> Casey, okay. 809 00:34:33,886 --> 00:34:34,546 Casey, David. 810 00:34:34,776 --> 00:34:38,476 So, Casey is gonna be our sort of godlike figure here 811 00:34:38,476 --> 00:34:40,966 and actually moving these folks around. 812 00:34:40,966 --> 00:34:44,026 So, if you guys could just take just a step or so back just 813 00:34:44,026 --> 00:34:45,936 so Casey has some leg room in front. 814 00:34:47,086 --> 00:34:49,516 This may end up being a little awkward. 815 00:34:49,976 --> 00:34:52,226 But let's do this. 816 00:34:52,856 --> 00:34:56,646 So, Casey, your task first is to sort these people 817 00:34:56,926 --> 00:34:59,026 and before you actually instruct them to do something, 818 00:34:59,026 --> 00:35:02,096 at least give us, the audience, an idea verbally 819 00:35:02,096 --> 00:35:03,586 of what you're going to do. 820 00:35:04,846 --> 00:35:05,516 What would you like to do? 821 00:35:05,516 --> 00:35:06,856 Sort these people, what would you do first? 822 00:35:07,006 --> 00:35:08,466 >> In a descending order. 823 00:35:08,466 --> 00:35:10,696 >> Yes, so in from 1 on the left to 8 on the right. 824 00:35:11,486 --> 00:35:18,276 How would you approach this problem as a normal human being? 825 00:35:18,276 --> 00:35:18,446 [ Inaudible Remark ] 826 00:35:18,446 --> 00:35:23,836 >> Okay, so do that step, go. 827 00:35:24,336 --> 00:35:27,046 [ Inaudible Remark ] 828 00:35:27,546 --> 00:35:29,256 >> Very good so far, okay? 829 00:35:29,586 --> 00:35:32,776 What would you do next? 830 00:35:32,776 --> 00:35:33,766 [ Inaudible Remark ] 831 00:35:33,766 --> 00:35:37,746 >> Okay, and next. 832 00:35:37,746 --> 00:35:38,606 [ Inaudible Remark ] 833 00:35:38,606 --> 00:35:44,756 >> Okay, so there's a pattern. 834 00:35:44,756 --> 00:35:45,506 [ Inaudible Remark ] 835 00:35:45,506 --> 00:35:45,666 >> Good. 836 00:35:45,666 --> 00:35:47,056 [ Inaudible Remark ] 837 00:35:47,056 --> 00:35:47,156 >> Okay. 838 00:35:47,156 --> 00:35:48,796 [ Inaudible Remark ] 839 00:35:48,796 --> 00:35:49,956 >> Okay, done. 840 00:35:50,236 --> 00:35:51,416 So, this looks correct. 841 00:35:51,826 --> 00:35:52,406 Oh, very good. 842 00:35:52,411 --> 00:35:54,411 [ Applause ] 843 00:35:54,416 --> 00:35:58,866 >> Very well done. 844 00:35:58,866 --> 00:36:01,186 So, the question now for the audience is, 845 00:36:01,346 --> 00:36:02,426 "How long did that take?" 846 00:36:02,536 --> 00:36:05,316 Not so much in like seconds on the clock but how many steps? 847 00:36:05,396 --> 00:36:06,326 How many operations? 848 00:36:06,436 --> 00:36:07,586 How many units of time? 849 00:36:08,086 --> 00:36:10,516 Like, how can we begin to quantize what we just did? 850 00:36:11,916 --> 00:36:13,096 So, N minus 1. 851 00:36:13,096 --> 00:36:16,276 So it felt like if there're 8 people, right, 852 00:36:16,276 --> 00:36:18,156 she had to do 8 things minimally. 853 00:36:18,156 --> 00:36:19,546 So, there's at least 8 steps. 854 00:36:20,126 --> 00:36:22,416 But it's not quite as simple as that, right? 855 00:36:22,416 --> 00:36:25,886 Again, if you now think of these humans as an array, each of them 856 00:36:25,886 --> 00:36:28,806 in their own bucket of memory, all be it side by side 857 00:36:28,906 --> 00:36:30,536 but in a distinct location, 858 00:36:30,876 --> 00:36:33,386 how did Casey actually find the number 1? 859 00:36:33,596 --> 00:36:34,596 So, let's actually rewind. 860 00:36:34,596 --> 00:36:35,636 So, Casey you can come back here. 861 00:36:35,636 --> 00:36:38,286 If you guys could revert back to the original configuration, 862 00:36:38,756 --> 00:36:40,476 let's now dig a bit deeper 863 00:36:40,476 --> 00:36:44,076 because sorting N people is actually not as fast as big O 864 00:36:44,076 --> 00:36:45,786 of N steps or even N minus 1. 865 00:36:45,976 --> 00:36:47,446 That did not take N steps. 866 00:36:47,446 --> 00:36:49,756 We actually have to push a little harder on this question. 867 00:36:50,016 --> 00:36:52,586 So, again we rewind and you said, "Okay, 868 00:36:52,586 --> 00:36:53,756 you're gonna pluck out number 1." 869 00:36:53,956 --> 00:36:55,386 How did you find number 1? 870 00:36:55,386 --> 00:36:56,376 [ Inaudible Remark ] 871 00:36:56,376 --> 00:36:58,246 >> Okay, good. 872 00:36:59,676 --> 00:37:03,506 So, a reasonable human would do that. 873 00:37:03,506 --> 00:37:04,486 You kind of look at all the numbers, 874 00:37:04,536 --> 00:37:07,026 but therein lies a hidden cost, right? 875 00:37:07,026 --> 00:37:08,396 How do you look at all those numbers? 876 00:37:08,536 --> 00:37:10,216 Well, even if your brain did it super fast, 877 00:37:10,216 --> 00:37:13,226 what you probably kind of sort of did was at least glance at 4, 878 00:37:13,226 --> 00:37:15,626 glance at 2, glance at 6, glance at 8, glance at 1 879 00:37:15,866 --> 00:37:17,126 and then maybe there's a 0. 880 00:37:17,126 --> 00:37:19,786 So you might have glanced at 3, glanced at 7, glanced at 5 881 00:37:19,816 --> 00:37:22,846 and then you concluded that number 1 is indeed the smallest. 882 00:37:22,846 --> 00:37:24,786 So, how many steps did we actually just spend? 883 00:37:26,026 --> 00:37:27,266 That was actually 8 steps 884 00:37:27,356 --> 00:37:30,196 to find the smallest number in this array. 885 00:37:30,196 --> 00:37:31,636 And so then what did we do? 886 00:37:31,636 --> 00:37:32,566 So, now let's fast forward, 887 00:37:32,566 --> 00:37:34,706 so you identified number 1 N steps later, 888 00:37:34,706 --> 00:37:35,956 how did you move him over here? 889 00:37:35,956 --> 00:37:38,196 >> I told him to move. 890 00:37:38,196 --> 00:37:40,026 >> Okay. So, you told him to move, 891 00:37:40,026 --> 00:37:41,036 but now let's consider that. 892 00:37:41,526 --> 00:37:42,276 How does this work? 893 00:37:42,276 --> 00:37:44,016 So, what's your name, number 1? 894 00:37:44,016 --> 00:37:44,086 [ Inaudible Remark ] 895 00:37:44,086 --> 00:37:44,156 >> [Inaudible]. 896 00:37:44,156 --> 00:37:45,846 So, if you could go ahead and step forward 897 00:37:45,846 --> 00:37:48,506 and no one else move just yet, how did you go 898 00:37:48,506 --> 00:37:50,656 about inserting yourself into the right part of the list? 899 00:37:51,636 --> 00:37:53,436 Okay, so you walked in in the list, 900 00:37:53,436 --> 00:37:56,856 so let's keep playing so we could do that. 901 00:37:56,856 --> 00:37:57,066 [ Inaudible Remark ] 902 00:37:57,066 --> 00:37:58,656 >> Yeah, we'll do exactly what you did last time 903 00:37:58,856 --> 00:38:01,146 and then we'll pause you. 904 00:38:01,376 --> 00:38:04,606 Pause. Now what did you do? 905 00:38:04,606 --> 00:38:04,961 [ Inaudible Remark ] 906 00:38:04,961 --> 00:38:05,316 [ Laughter ] 907 00:38:05,316 --> 00:38:07,636 >> Yeah, we stopped, okay, and then what? 908 00:38:09,366 --> 00:38:11,666 Now, if this were an array, what did we just do? 909 00:38:11,666 --> 00:38:13,506 We tried to pull up number 1 910 00:38:13,566 --> 00:38:16,876 in location negative number-- a negative 1, right? 911 00:38:16,876 --> 00:38:19,116 You can't just kind of go to the beginning of the array, 912 00:38:19,116 --> 00:38:21,856 you wanted to go at location bracket 0. 913 00:38:22,006 --> 00:38:23,276 So-- and what's your name number 4? 914 00:38:23,276 --> 00:38:23,466 >> Sam. 915 00:38:23,686 --> 00:38:26,396 >> Sam. So Sam, what really should Sam do here 916 00:38:26,396 --> 00:38:28,336 and what should number 2 and 6 and 8 really do? 917 00:38:28,336 --> 00:38:31,396 You know, it'd be nice if they kind of made some room. 918 00:38:31,396 --> 00:38:33,136 But wait a minute, what did that just cost us? 919 00:38:33,136 --> 00:38:34,876 That's 4 more steps, right? 920 00:38:34,876 --> 00:38:35,896 So, that's expensive. 921 00:38:36,056 --> 00:38:37,646 Well, let's rewind, that feels too expensive, 922 00:38:37,646 --> 00:38:39,386 this is taking forever now with all these steps. 923 00:38:39,676 --> 00:38:41,856 What's slightly smarter than having all 924 00:38:41,856 --> 00:38:43,006 of his neighbors move down? 925 00:38:43,506 --> 00:38:44,916 Okay, so just swap who? 926 00:38:44,916 --> 00:38:48,596 So, how about just 4 and 1, right? 927 00:38:48,596 --> 00:38:50,846 'Cause Sam here, yeah, he's number 4 928 00:38:51,056 --> 00:38:52,956 but we don't really know anything about him yet. 929 00:38:53,196 --> 00:38:55,396 He might be the biggest number, he might be the smallest. 930 00:38:55,396 --> 00:38:58,736 So far as we're concerned, Sam is still completely random. 931 00:38:58,916 --> 00:39:00,986 So, who cares if we put him in another random spot? 932 00:39:00,986 --> 00:39:02,716 So, let's move Sam over there. 933 00:39:02,876 --> 00:39:04,516 That frees up bracket 0, 934 00:39:04,686 --> 00:39:08,806 so now [inaudible] can go into bracket 0. 935 00:39:09,376 --> 00:39:11,006 So, now you can, yes. 936 00:39:11,106 --> 00:39:13,016 So, now how many steps did that take? 937 00:39:13,016 --> 00:39:15,786 Well, to rewind, it-- first took Casey some 8 steps 938 00:39:16,006 --> 00:39:18,906 to identify number 1 then it took what? 939 00:39:19,026 --> 00:39:22,456 One more step to move Sam and then one more step 940 00:39:22,496 --> 00:39:25,916 to move [inaudible] into that spot, so that's like 10 steps. 941 00:39:25,916 --> 00:39:28,846 So, it's like N plus 1, N plus 2, give or take, 942 00:39:29,246 --> 00:39:31,326 but it's at least N, give or take. 943 00:39:31,556 --> 00:39:33,766 So, Casey now, and this story has obviously taken a lot 944 00:39:33,766 --> 00:39:34,276 longer now. 945 00:39:34,546 --> 00:39:36,936 Now, what do we do next to find number 2, 946 00:39:36,936 --> 00:39:39,056 or rather the next smallest? 947 00:39:39,726 --> 00:39:42,356 >> Look at them again. 948 00:39:42,356 --> 00:39:43,506 >> Okay, so we look at them again. 949 00:39:43,696 --> 00:39:46,326 Do we keep-- need to keep looking at all 8 people though? 950 00:39:46,326 --> 00:39:46,426 [ Inaudible Remark ] 951 00:39:46,426 --> 00:39:49,646 >> Okay good, so there's a slight optimization here, right? 952 00:39:49,646 --> 00:39:51,346 We don't need to look at spot number 1-- 953 00:39:51,346 --> 00:39:54,356 spot number 0 ever again where the number 1 is 954 00:39:54,356 --> 00:39:57,016 because we just put him there as the smallest so we can 955 00:39:57,016 --> 00:40:00,356 at least whittle this down to a problem of size N minus 1, 956 00:40:00,516 --> 00:40:03,016 then next N minus 2, N minus 3, dot, dot, 957 00:40:03,016 --> 00:40:04,466 dot, hopefully down to 0. 958 00:40:04,466 --> 00:40:05,436 So, how do we find number 2? 959 00:40:05,436 --> 00:40:05,976 What is your name? 960 00:40:06,046 --> 00:40:06,526 >> Emy [phonetic]. 961 00:40:06,976 --> 00:40:09,836 >> Emy. So Emy happens to be holding number 2, 962 00:40:09,836 --> 00:40:11,646 but do we know she's the next smallest? 963 00:40:12,246 --> 00:40:15,216 What do you think? 964 00:40:15,826 --> 00:40:18,036 Casey, how do you know for sure or not 965 00:40:18,096 --> 00:40:23,326 if Emy is the next smallest value? 966 00:40:23,326 --> 00:40:23,556 [ Inaudible Remark ] 967 00:40:23,556 --> 00:40:26,306 >> So, really even though it looks obviously 2 is pretty 968 00:40:26,306 --> 00:40:27,686 small, it's probably next to 1. 969 00:40:27,846 --> 00:40:30,756 We only know that if we look at a total of 7 numbers. 970 00:40:30,996 --> 00:40:33,296 And now this, this is getting a little awkward up here, right? 971 00:40:33,296 --> 00:40:34,666 So, let's just fast forward verbally. 972 00:40:34,876 --> 00:40:39,026 So, now to find 3, we do N minus 2 steps, then N minus 3 steps, 973 00:40:39,026 --> 00:40:42,916 then N minus 4 steps, until finally we find number 8 974 00:40:42,916 --> 00:40:45,976 and we can just leave him in spot number 8. 975 00:40:46,206 --> 00:40:49,056 So, that then begs the question, how may total steps was this? 976 00:40:49,156 --> 00:40:52,236 It does feel like it was way more than just the initial sense 977 00:40:52,236 --> 00:40:54,986 of N steps or N minus 1 to sort all these people 978 00:40:55,126 --> 00:40:56,076 when you actually start counting 979 00:40:56,076 --> 00:40:57,646 up the comparisons and the swaps. 980 00:40:58,076 --> 00:40:59,236 So, really how many was this? 981 00:41:00,036 --> 00:41:03,006 Yeah, so if you actually add all this up, 982 00:41:03,006 --> 00:41:05,906 this is effectively N steps, plus N minus 1, 983 00:41:05,906 --> 00:41:08,956 plus N minus 2, plus N minus 3. 984 00:41:08,956 --> 00:41:10,046 And now if you kind of think back 985 00:41:10,086 --> 00:41:12,266 to like your high school physics book or math book 986 00:41:12,266 --> 00:41:14,806 with all the formulas and whatnot, this is a series 987 00:41:14,806 --> 00:41:17,686 of sorts, a summation and so if you actually add N-- 988 00:41:17,786 --> 00:41:20,286 8 plus 7 plus 6 plus 5 plus 4 and so forth, 989 00:41:20,456 --> 00:41:25,226 you essentially get that formula N times N plus 1 over 2, 990 00:41:25,226 --> 00:41:26,986 so N squared something. 991 00:41:27,146 --> 00:41:30,286 So, in short, it is in fact something like N squared. 992 00:41:30,286 --> 00:41:32,416 And again, if we added all these up, 993 00:41:32,416 --> 00:41:34,786 that would be the formula we get, N squared something. 994 00:41:34,986 --> 00:41:36,006 And how do we feel this? 995 00:41:36,086 --> 00:41:37,916 Well, just to if you'll humor me with this part, 996 00:41:38,006 --> 00:41:39,486 how do we go and find number 3? 997 00:41:39,526 --> 00:41:41,796 Well, you go here, here, here, okay. 998 00:41:41,796 --> 00:41:47,086 So, smallest 6, not smaller, smaller 4, oh smaller still 3, 999 00:41:47,536 --> 00:41:49,536 so you can kind of feel the inefficiency of this, 1000 00:41:49,536 --> 00:41:52,426 so find out-- pluck out 3, swap him with 6. 1001 00:41:53,716 --> 00:41:55,186 Now, I can at least optimize. 1002 00:41:55,186 --> 00:41:58,016 I never again have to look over here but now I start with 8. 1003 00:41:58,016 --> 00:41:58,876 Is 8 the smallest? 1004 00:41:58,876 --> 00:41:59,806 At this moment it is. 1005 00:42:00,076 --> 00:42:03,666 How about-- well, 4 is now smaller, 6 is not, 7 is not, 1006 00:42:03,706 --> 00:42:06,006 5 is not, so I better remember that 4 is there, 1007 00:42:06,006 --> 00:42:08,476 so I pull Sam out, swap with 8. 1008 00:42:08,716 --> 00:42:11,596 So, in short, even though this went super fast the first time, 1009 00:42:11,596 --> 00:42:14,636 if you really break it down into sort of scratch puzzle pieces 1010 00:42:14,896 --> 00:42:18,746 or C-loops, well really what Casey would have been doing is 1011 00:42:18,746 --> 00:42:23,016 walking this far the first time then going back starting here, 1012 00:42:23,286 --> 00:42:24,336 walking here again. 1013 00:42:25,006 --> 00:42:27,836 So, this is step 7 steps, then we go back over here. 1014 00:42:28,386 --> 00:42:29,616 Now, we've got 6 steps. 1015 00:42:30,396 --> 00:42:31,646 Now, these are real steps. 1016 00:42:31,856 --> 00:42:34,086 Because arrays are contiguous back to back, 1017 00:42:34,426 --> 00:42:36,206 we do have random access, right? 1018 00:42:36,206 --> 00:42:38,996 I can go from bracket 7 to bracket 0 like that 1019 00:42:39,506 --> 00:42:41,456 because I know exactly where 0 is. 1020 00:42:41,656 --> 00:42:43,276 But if I wanna compare these people, 1021 00:42:43,476 --> 00:42:46,066 I can go back here very easily but again, 1022 00:42:46,066 --> 00:42:48,896 you're literally going back and forth and back and forth 1023 00:42:48,896 --> 00:42:50,376 and back and forth and that adds up 1024 00:42:50,536 --> 00:42:53,566 and in fact it gives us something like N squared. 1025 00:42:55,086 --> 00:42:56,436 Let's do something a little different. 1026 00:42:56,436 --> 00:42:58,346 So, if you guys could reset one last time 1027 00:42:58,346 --> 00:42:59,746 to this original configuration. 1028 00:43:01,316 --> 00:43:03,136 That felt nice but it turns 1029 00:43:03,136 --> 00:43:04,936 out we can solve this problem actually in like 1030 00:43:04,936 --> 00:43:06,396 at least half a dozen ways. 1031 00:43:06,396 --> 00:43:09,016 Some of them increasingly powerful or even slower, 1032 00:43:09,366 --> 00:43:10,406 so let me just do this. 1033 00:43:10,906 --> 00:43:12,576 That was perfect implementation. 1034 00:43:12,576 --> 00:43:15,346 In fact what Casey proposed is something called selection sort. 1035 00:43:15,346 --> 00:43:15,916 There's a whole bunch 1036 00:43:15,916 --> 00:43:17,376 of algorithms the world has invented. 1037 00:43:17,616 --> 00:43:18,766 Selection sort is the one 1038 00:43:18,766 --> 00:43:21,526 where literally you select the smallest number then you do it 1039 00:43:21,526 --> 00:43:25,026 again and again and again and just by common sense 1040 00:43:25,026 --> 00:43:26,886 if you keep selecting the smallest number, 1041 00:43:27,136 --> 00:43:29,786 you're gonna get a shorted list if you do what Casey just did. 1042 00:43:30,026 --> 00:43:31,766 There's this other one though called bubble sort 1043 00:43:31,766 --> 00:43:34,506 and this one you can kind of see as the name implies 1044 00:43:34,506 --> 00:43:35,496 that there's a bubbling effect. 1045 00:43:35,886 --> 00:43:38,706 So, let's actually try this, let's compare each 1046 00:43:38,706 --> 00:43:42,666 of these volunteers side by side and if they're out of order, 1047 00:43:42,666 --> 00:43:45,106 let's just swap them pairwise. 1048 00:43:45,106 --> 00:43:46,886 Let's see if we can take a whole different approach here. 1049 00:43:46,886 --> 00:43:49,766 So, Casey we point first at bracket 0 and bracket 1, 1050 00:43:49,986 --> 00:43:51,556 are they in order or out of order? 1051 00:43:52,286 --> 00:43:53,486 Out of order, so we swap? 1052 00:43:53,486 --> 00:43:59,086 So that took what, 1, 2, 3 steps maybe, right? 1053 00:43:59,086 --> 00:44:01,956 One step to compare, one step to move Sam 1054 00:44:02,216 --> 00:44:03,566 and one step to move Emy. 1055 00:44:03,566 --> 00:44:06,866 So that's like 3 total steps but that's constant, so that's nice, 1056 00:44:07,156 --> 00:44:09,326 one big 3, a 3-unit step. 1057 00:44:09,636 --> 00:44:12,186 Alright, so now we compare 4 and 6, and do you wanna swap or not? 1058 00:44:12,186 --> 00:44:13,126 >> Don't swap them. 1059 00:44:13,446 --> 00:44:14,236 >> Okay, 6 and 8? 1060 00:44:14,676 --> 00:44:15,196 >> Don't swap. 1061 00:44:15,196 --> 00:44:15,646 >> 8 and 1? 1062 00:44:15,816 --> 00:44:16,396 >> Swap. 1063 00:44:16,686 --> 00:44:19,266 >> So that's 3 more steps we just spent. 1064 00:44:19,266 --> 00:44:19,726 8 and 3. 1065 00:44:20,026 --> 00:44:20,806 >> Swap them. 1066 00:44:20,806 --> 00:44:21,786 >> 8 and 7? 1067 00:44:21,916 --> 00:44:22,386 >> Swap them. 1068 00:44:22,916 --> 00:44:23,416 >> 8 and 5. 1069 00:44:23,766 --> 00:44:24,696 >> Swap them. 1070 00:44:24,866 --> 00:44:25,396 >> Done, right? 1071 00:44:26,406 --> 00:44:27,326 Alright, still not quite. 1072 00:44:27,426 --> 00:44:28,956 Right, this algorithm is a little different 1073 00:44:28,956 --> 00:44:31,446 in that we did fix some problems along the way, right? 1074 00:44:31,446 --> 00:44:34,556 We fixed a lot of juxtapositions where we fixed the ordering 1075 00:44:34,906 --> 00:44:37,376 but what is the worst case scenario when trying 1076 00:44:37,376 --> 00:44:38,806 to sort N humans like this? 1077 00:44:39,876 --> 00:44:41,126 What's the worst possible input? 1078 00:44:42,646 --> 00:44:44,266 Right there in reverse order. 1079 00:44:44,486 --> 00:44:47,396 So, let's actually see what bubble sort is like in the-- 1080 00:44:47,396 --> 00:44:48,956 in the really,the worst case. 1081 00:44:49,016 --> 00:44:51,936 So, let's ignore this now and now go ahead and just line 1082 00:44:51,936 --> 00:44:56,796 up in reverse order, 8 to 1, and let's feel how much faster 1083 00:44:56,796 --> 00:44:59,526 or slower this alternative algorithm is, bubble sort, 1084 00:44:59,836 --> 00:45:01,186 to that thing called selection sort. 1085 00:45:01,186 --> 00:45:01,716 So, here we go. 1086 00:45:01,986 --> 00:45:03,796 Bubble sort begins with 8 and 7. 1087 00:45:03,796 --> 00:45:04,126 >> Swap them. 1088 00:45:05,046 --> 00:45:05,656 >> 8 and 6? 1089 00:45:05,696 --> 00:45:06,446 >> Swap them. 1090 00:45:06,446 --> 00:45:06,906 >> 8 and 5? 1091 00:45:06,966 --> 00:45:07,656 >> Swap. 1092 00:45:07,656 --> 00:45:08,126 >> 8 and 4? 1093 00:45:08,186 --> 00:45:08,436 >> Swap. 1094 00:45:08,436 --> 00:45:08,886 >> 8 and 3? 1095 00:45:08,886 --> 00:45:09,196 >> Swap. 1096 00:45:09,196 --> 00:45:09,596 >> 8 and 2? 1097 00:45:09,686 --> 00:45:09,806 >> Swap. 1098 00:45:09,806 --> 00:45:10,286 >> 8 and 1. 1099 00:45:10,736 --> 00:45:13,356 Okay, so done but not quite. 1100 00:45:13,676 --> 00:45:15,826 What's curious about bubble sort, and if you actually kind 1101 00:45:15,826 --> 00:45:17,526 of think this through logically to the end, 1102 00:45:17,526 --> 00:45:19,746 this will actually work because if you just keep on going back 1103 00:45:19,746 --> 00:45:22,036 and forth, back and forth, fixing all of the problems, 1104 00:45:22,036 --> 00:45:24,636 albeit slowly, they will eventually bubble 1105 00:45:24,636 --> 00:45:26,346 up the small numbers here and bubble 1106 00:45:26,346 --> 00:45:27,766 down the big numbers there and that's 1107 00:45:27,766 --> 00:45:28,946 where bubble sort gets its name. 1108 00:45:29,316 --> 00:45:32,546 But how many steps did this actually take? 1109 00:45:32,546 --> 00:45:35,246 Well, in the worst case, number 8 was indeed over here 1110 00:45:35,246 --> 00:45:39,766 and number 1 was over there so how many spots did number 8 need 1111 00:45:39,766 --> 00:45:41,026 to move to fix just him? 1112 00:45:41,026 --> 00:45:44,166 He had to move all the way from the left 1113 00:45:44,626 --> 00:45:45,716 to all the way to the right. 1114 00:45:45,716 --> 00:45:48,156 So, that's like N steps, it's actually N minus 1 1115 00:45:48,556 --> 00:45:49,716 but it's like N steps. 1116 00:45:50,186 --> 00:45:51,406 What about the next person? 1117 00:45:51,586 --> 00:45:53,416 Now, we have to move 7 all the way down. 1118 00:45:53,626 --> 00:45:54,916 That's N minus 2. 1119 00:45:55,166 --> 00:45:58,746 Then we have to move 6, N minus 3, then N minus 4. 1120 00:45:58,746 --> 00:45:59,696 So, where is this going? 1121 00:45:59,696 --> 00:46:01,196 What's the running time of bubble sort? 1122 00:46:02,386 --> 00:46:03,736 It's also N squared. 1123 00:46:03,976 --> 00:46:06,406 Why? Well, there's N total people up here 1124 00:46:06,696 --> 00:46:09,946 and on every iteration, you might move someone 1125 00:46:09,946 --> 00:46:11,236 into their perfect position 1126 00:46:11,236 --> 00:46:14,036 but you still have 7 more people then 6 more people. 1127 00:46:14,036 --> 00:46:17,576 So, you have to do N things roughly N times, 1128 00:46:17,806 --> 00:46:19,716 and so even if we don't really do out the math but just think 1129 00:46:19,716 --> 00:46:21,526 about it, you've got N people and you might have 1130 00:46:21,556 --> 00:46:25,056 to walk this list N times to fix all of the juxtapositions, 1131 00:46:25,056 --> 00:46:28,656 all of the misorderings, that's N times N or N squared. 1132 00:46:28,886 --> 00:46:31,626 So bubble sort sucks, selection sort sucks. 1133 00:46:31,696 --> 00:46:34,746 Like we saw the graph up there before where linear was good, 1134 00:46:34,746 --> 00:46:37,266 N over 2 was good, logarithm was amazing 1135 00:46:37,486 --> 00:46:39,076 but now, where are we at? 1136 00:46:39,076 --> 00:46:41,826 We're at the N squared which is this curve that's goes off 1137 00:46:41,826 --> 00:46:45,106 into the positive direction is actually pretty darn slow. 1138 00:46:45,556 --> 00:46:46,856 So, can we do better? 1139 00:46:47,116 --> 00:46:48,986 Well, first off all a round of applause if we could 1140 00:46:48,986 --> 00:46:50,546 for our volunteers here [applause] and Casey 1141 00:46:50,546 --> 00:46:53,326 in particular, nicely done. 1142 00:46:53,326 --> 00:46:53,926 [Inaudible] exit there. 1143 00:46:53,926 --> 00:46:56,116 You can keep the papers as souvenirs. 1144 00:46:56,386 --> 00:46:59,276 Let's see if we can't see or feel now some 1145 00:46:59,276 --> 00:47:00,606 of the speed of these things. 1146 00:47:00,606 --> 00:47:04,596 I'm gonna go over to the courses websites here and I'm gonna go 1147 00:47:04,596 --> 00:47:06,636 to today's lectures page where we have a couple 1148 00:47:06,636 --> 00:47:08,426 of demo programs linked. 1149 00:47:08,426 --> 00:47:13,446 If I go in here to sorting algorithms, so this happens 1150 00:47:13,486 --> 00:47:17,176 to be a web page that a fellow at another university made 1151 00:47:17,236 --> 00:47:19,786 that implements in a language called Java 1152 00:47:19,786 --> 00:47:21,096 or what's called an applet. 1153 00:47:21,266 --> 00:47:22,836 But in short, this is just a visualization, 1154 00:47:22,836 --> 00:47:25,346 it's a web based tool with which we can now see this 1155 00:47:25,346 --> 00:47:26,726 and some other algorithms. 1156 00:47:26,726 --> 00:47:28,656 So, I'm gonna scroll down here to the bottom 1157 00:47:28,816 --> 00:47:30,946 and choose my sorting algorithm. 1158 00:47:31,136 --> 00:47:33,086 You can already see that the world has a bunch 1159 00:47:33,086 --> 00:47:34,976 and this is barely scratching the surface. 1160 00:47:35,246 --> 00:47:38,026 We looked at bubble sort, we looked at selection sort 1161 00:47:38,236 --> 00:47:40,376 but there're things like heap sort and shell sort 1162 00:47:40,376 --> 00:47:42,526 and quick sort and a whole bunch of other things. 1163 00:47:42,826 --> 00:47:43,846 So, let's choose bubble sort. 1164 00:47:44,496 --> 00:47:45,646 What kind of array do I want? 1165 00:47:45,916 --> 00:47:47,206 Let's choose random initially. 1166 00:47:47,786 --> 00:47:49,466 View is just an aesthetic thing. 1167 00:47:49,466 --> 00:47:51,416 The delay will leave there for just a moment. 1168 00:47:51,416 --> 00:47:53,766 Let me go ahead and click start and see what happens. 1169 00:47:54,086 --> 00:47:56,056 So, in this particular visualization, 1170 00:47:56,496 --> 00:47:58,516 each of these blue bars represents a number, 1171 00:47:58,706 --> 00:48:01,336 the shorter the bar, the smaller the number, the taller the bar, 1172 00:48:01,336 --> 00:48:02,136 the bigger the number. 1173 00:48:02,466 --> 00:48:04,446 So, the idea is to move all the short bars to the left 1174 00:48:04,446 --> 00:48:06,046 and all the big bars to the right. 1175 00:48:06,546 --> 00:48:09,246 What you see is this thing runs is that it's comparing 1176 00:48:09,246 --> 00:48:12,346 in red two humans at a time, two bars at a time, 1177 00:48:12,556 --> 00:48:15,996 and if they are big-small it makes them small-big, 1178 00:48:15,996 --> 00:48:17,336 in other words it swaps them. 1179 00:48:17,886 --> 00:48:19,606 So, notice how every time something's 1180 00:48:19,606 --> 00:48:21,286 out of order, it swaps them. 1181 00:48:21,666 --> 00:48:23,136 So, what's really happening here? 1182 00:48:23,166 --> 00:48:27,416 In theory, the biggest bar is making its way from left 1183 00:48:27,416 --> 00:48:31,426 to right, but we might-- that's still not gonna fix the problem. 1184 00:48:31,426 --> 00:48:32,536 And you can really appreciate 1185 00:48:32,536 --> 00:48:34,676 that even though we're fixing some problems here 1186 00:48:34,916 --> 00:48:37,966 and slowly the small bars are trickling down to the left 1187 00:48:37,966 --> 00:48:40,646 and the bigger bar is bubbling its way up to the right, 1188 00:48:41,036 --> 00:48:42,646 we now have to do the whole thing again. 1189 00:48:43,076 --> 00:48:46,606 Now, this speed is way too slow to be fun so let's stop this 1190 00:48:47,016 --> 00:48:49,206 and let's actually do this much quicker 1191 00:48:49,886 --> 00:48:51,036 with a much shorter delay. 1192 00:48:51,086 --> 00:48:53,176 And actually even that's a little tedious, 1193 00:48:53,226 --> 00:48:55,136 so let's do it even faster. 1194 00:48:55,786 --> 00:48:57,706 Let's do 15, start. 1195 00:48:58,866 --> 00:49:02,576 So, same thing, so that's one pass of Casey on the stage, 1196 00:49:02,776 --> 00:49:09,346 that's a second pass, third pass, fourth, fifth. 1197 00:49:09,936 --> 00:49:13,316 But again, even though I'm counting these slowly, 1198 00:49:13,446 --> 00:49:16,446 each of these passes, how many steps is there during each 1199 00:49:16,446 --> 00:49:17,216 of these passes? 1200 00:49:18,146 --> 00:49:19,286 Well, it's N, right? 1201 00:49:19,286 --> 00:49:21,126 There's gonna be a total of N passes. 1202 00:49:21,366 --> 00:49:25,136 But to do a pass, Casey has to walk N steps across the stage. 1203 00:49:25,136 --> 00:49:27,526 The reds have to go N steps across the screen. 1204 00:49:27,866 --> 00:49:30,876 So, bubble sort is indeed something like N times N 1205 00:49:31,176 --> 00:49:35,836 or big O of N squared which per the chart before is actually 1206 00:49:35,896 --> 00:49:37,476 kind of bad and you can feel it, right? 1207 00:49:37,476 --> 00:49:38,666 I'm just kind of stalling here verbally. 1208 00:49:38,666 --> 00:49:40,166 I have nothing interesting to say at the moment. 1209 00:49:40,536 --> 00:49:42,126 But we just kind of need to kill time 1210 00:49:42,496 --> 00:49:45,086 until this algorithm finishes, 1211 00:49:46,106 --> 00:49:52,206 much like it will in just a moment. 1212 00:49:52,736 --> 00:49:55,696 Perfect. So, no, no. 1213 00:49:56,466 --> 00:49:59,276 [Applause] So that's bubble sort, what about selection sort? 1214 00:49:59,276 --> 00:50:00,966 What's neat about these visualizations is 1215 00:50:00,966 --> 00:50:03,066 that what might seem like fairly abstract ideas 1216 00:50:03,066 --> 00:50:05,316 or just common sense ideas, you can really start 1217 00:50:05,316 --> 00:50:07,976 to appreciate what's going on if we paint a picture for them. 1218 00:50:08,316 --> 00:50:10,466 >> So, this is selection sort this time. 1219 00:50:10,766 --> 00:50:13,546 Let me start it somewhat slow but then we'll speed it up. 1220 00:50:13,856 --> 00:50:17,046 So, selection sort kind of doesn't seem like he's doing 1221 00:50:17,046 --> 00:50:20,006 as much work at first because notice the red bar is just kind 1222 00:50:20,006 --> 00:50:22,906 of looking around trying to find the then smallest element. 1223 00:50:23,196 --> 00:50:26,006 But what happens with selection sort is that he plucks 1224 00:50:26,006 --> 00:50:29,396 out that smallest person or bar and then forces it 1225 00:50:29,396 --> 00:50:32,296 over to the left and then repeats and then repeats. 1226 00:50:32,636 --> 00:50:34,166 And so again and again, what you're seeing 1227 00:50:34,166 --> 00:50:37,316 in red is the smallest bar that the algorithm can find 1228 00:50:37,316 --> 00:50:40,646 on that pass and then again he's booting Sam or whoever is 1229 00:50:40,646 --> 00:50:44,616 in his way, moving that bar into place and then repeating. 1230 00:50:45,176 --> 00:50:48,096 So, on each pass, how many steps does it take in selection sort 1231 00:50:48,096 --> 00:50:49,836 to find the smallest element? 1232 00:50:49,986 --> 00:50:52,166 Well, it's actually gonna take as many as N steps, right? 1233 00:50:52,166 --> 00:50:54,236 You might find 2 or 3 pretty quickly 1234 00:50:54,486 --> 00:50:58,496 but you don't know the computer that 2 and 3 are the smallest 1235 00:50:58,496 --> 00:51:01,116 and so you compare them against every other element again 1236 00:51:01,116 --> 00:51:01,906 and again and again. 1237 00:51:02,316 --> 00:51:05,346 So, you've got N passes each time you're plucking off one 1238 00:51:05,346 --> 00:51:08,556 element, so this is N times N, to move N elements in place 1239 00:51:08,556 --> 00:51:10,026 and my god, this is really too slow. 1240 00:51:10,376 --> 00:51:12,696 So, let's change this to super fast. 1241 00:51:13,516 --> 00:51:15,776 Now, this does not mean selection sort is faster just 1242 00:51:15,776 --> 00:51:16,936 because I run it faster. 1243 00:51:17,226 --> 00:51:19,386 This again is consistent though with this idea, 1244 00:51:19,506 --> 00:51:21,356 you know your algorithms not faster than mine 1245 00:51:21,356 --> 00:51:23,306 if you just wait a year and run a new hardware. 1246 00:51:23,306 --> 00:51:26,796 We really care about the fundamental differences here. 1247 00:51:27,606 --> 00:51:29,416 So, let's go ahead and do this selection sort. 1248 00:51:29,416 --> 00:51:31,126 Now it's really wearing along but again, 1249 00:51:31,426 --> 00:51:32,886 fundamentally it's the same thing. 1250 00:51:32,886 --> 00:51:34,866 Bubble sort is big O of N squared 1251 00:51:35,356 --> 00:51:38,176 and selection sort is big O of N squared. 1252 00:51:38,236 --> 00:51:40,016 But what about this other thing? 1253 00:51:40,586 --> 00:51:42,406 So, big O is again worst case. 1254 00:51:42,406 --> 00:51:43,876 What was the notation for best case? 1255 00:51:43,876 --> 00:51:48,906 Yeah, so Omega, so if we go back to this notation with big O 1256 00:51:48,906 --> 00:51:52,356 and Omega, we've got big O of N squared now. 1257 00:51:52,356 --> 00:51:53,016 It's pretty bad. 1258 00:51:53,706 --> 00:51:55,326 So, this is big O of N squared. 1259 00:51:55,326 --> 00:51:56,266 That's all I've been saying. 1260 00:51:56,566 --> 00:52:01,006 What about let's say selection sort? 1261 00:52:01,296 --> 00:52:05,206 In the best case where you are handed a bunch of humans 1262 00:52:05,326 --> 00:52:08,136 that are already in order, how much work would Casey do 1263 00:52:08,136 --> 00:52:10,276 if she were implementing selection sort? 1264 00:52:11,236 --> 00:52:12,546 Best case? 1265 00:52:14,756 --> 00:52:16,336 So, who thinks N squared, 1266 00:52:17,386 --> 00:52:19,156 which was the original answer for worst case? 1267 00:52:19,156 --> 00:52:20,626 Who thinks N? 1268 00:52:21,266 --> 00:52:23,126 Excellent, you're almost all wrong. 1269 00:52:23,406 --> 00:52:24,756 So, why is that, right? 1270 00:52:24,796 --> 00:52:30,836 So, selection sort is actually big O of N squared but it's also 1271 00:52:30,836 --> 00:52:33,586 in Omega of N squared. 1272 00:52:33,586 --> 00:52:36,276 Why? Well, Casey started over here 1273 00:52:36,276 --> 00:52:38,756 and again suppose the humans are 1, 2, 3, 4, 5, 6, 7, 8, 1274 00:52:38,756 --> 00:52:42,016 all in the order, what was Casey's first suggestion to us? 1275 00:52:42,016 --> 00:52:43,746 Well, find the smallest element. 1276 00:52:43,856 --> 00:52:44,736 How did she do that? 1277 00:52:44,906 --> 00:52:47,356 Well, she looked to her left and said, "Oh, here's number 1." 1278 00:52:47,506 --> 00:52:51,306 But did she know at that moment in time that 1 is the smallest? 1279 00:52:52,346 --> 00:52:55,736 No, she only knew that once she checked here and then here 1280 00:52:55,736 --> 00:52:58,466 and then here and then here, increasingly disappointed 1281 00:52:58,466 --> 00:53:01,166 that she's not finding anything smaller but only once she got 1282 00:53:01,196 --> 00:53:04,276 to number 8 could she definitively conclude that was 1283 00:53:04,276 --> 00:53:05,426 in fact the smallest number. 1284 00:53:05,616 --> 00:53:06,346 So, what does she do? 1285 00:53:06,546 --> 00:53:09,556 She makes no swaps, she knows 1 is in the right place, 1286 00:53:09,556 --> 00:53:12,346 next what should she have done or would she have done? 1287 00:53:12,896 --> 00:53:14,046 Find the next smallest element. 1288 00:53:14,046 --> 00:53:17,566 So she can now ignore number 1 but she now needs to start 1289 00:53:17,566 --> 00:53:20,176 at number 2 and says, "Okay, 2 is now the current smallest, 1290 00:53:20,416 --> 00:53:21,866 let's see if there's anyone smaller, 1291 00:53:22,096 --> 00:53:23,766 1.5, something like that." 1292 00:53:23,766 --> 00:53:27,386 Well, let's see, nope, nope, nope, nope, nope, 1293 00:53:27,386 --> 00:53:30,506 she'll only when she does another N minus 1 steps does 1294 00:53:30,576 --> 00:53:33,656 Casey know definitively that 2 in fact was the smallest. 1295 00:53:33,776 --> 00:53:34,826 What she gonna do again? 1296 00:53:35,256 --> 00:53:37,426 Same mistake, same mistake, same mistake. 1297 00:53:37,576 --> 00:53:40,996 So as elegant and as simple and as sort of commonsensical 1298 00:53:40,996 --> 00:53:44,426 as selection sort is, it's always N squared 1299 00:53:44,536 --> 00:53:47,266 because you don't have this sort of omniscient view of your data. 1300 00:53:47,266 --> 00:53:50,456 You only check for the current smallest element and to do 1301 00:53:50,456 --> 00:53:51,836 that you have to check everyone else. 1302 00:53:52,446 --> 00:53:53,476 So, that's N squared. 1303 00:53:53,666 --> 00:53:54,866 What about bubble sort though? 1304 00:53:55,746 --> 00:53:57,526 If I instead revert to bubble sort 1305 00:53:57,526 --> 00:54:00,436 which was the second algorithm where Casey just swapped pairs 1306 00:54:00,436 --> 00:54:03,316 of people but then did it again and again and again, 1307 00:54:03,316 --> 00:54:05,206 we said that in the worst case that's N squared, 1308 00:54:05,206 --> 00:54:06,506 if people are reversed ordered. 1309 00:54:06,956 --> 00:54:07,946 What about in the best case? 1310 00:54:07,976 --> 00:54:10,286 What if all those volunteers came up and they were 1 1311 00:54:10,336 --> 00:54:11,746 through 8 in perfect order, 1312 00:54:12,406 --> 00:54:14,586 how much work would Casey do with bubble sort? 1313 00:54:15,266 --> 00:54:19,266 So, it would be N, right, and if you're just kind 1314 00:54:19,266 --> 00:54:21,146 of playing the odds here, most likely the answer 1315 00:54:21,146 --> 00:54:23,296 to this question is gonna be the opposite to the other one. 1316 00:54:23,296 --> 00:54:26,156 So, indeed the answer here is going to be N but how? 1317 00:54:26,496 --> 00:54:29,276 So, bubble sort does have a potential optimization. 1318 00:54:29,276 --> 00:54:32,046 In the worst case Bubble Sort, N squared, no doubt about it 1319 00:54:32,196 --> 00:54:34,156 because in the worst case everyone's out of order 1320 00:54:34,186 --> 00:54:36,376 and we kind of walked through just how much work you have 1321 00:54:36,376 --> 00:54:38,306 to do to re-bubble everyone through. 1322 00:54:38,576 --> 00:54:43,226 But in the best case, what might Casey be able to do to optimize 1323 00:54:43,226 --> 00:54:45,036 so that she stops wasting her time 1324 00:54:45,036 --> 00:54:46,306 if everyone is already sorted? 1325 00:54:46,526 --> 00:54:50,826 She first compares 1 and 2, not out of order, 2 and 3, 3 and 4, 1326 00:54:50,826 --> 00:54:53,926 4 and 5, 5 and 6, 7 and 8, done. 1327 00:54:54,506 --> 00:54:57,256 So, what does-- what could Casey do? 1328 00:54:57,676 --> 00:55:00,716 She could now go back to the start and do it again, 1329 00:55:01,046 --> 00:55:03,206 but what could she do instead? 1330 00:55:05,196 --> 00:55:07,436 She just needs to leverage some piece 1331 00:55:07,436 --> 00:55:10,066 of information she just created for herself. 1332 00:55:10,916 --> 00:55:12,076 There're no swaps. 1333 00:55:12,226 --> 00:55:13,326 So what if Casey 1334 00:55:13,326 --> 00:55:16,846 in her implementation just has a variable initialized to 0, 1335 00:55:17,006 --> 00:55:18,036 and that's just a counter. 1336 00:55:18,036 --> 00:55:20,776 And anytime she does a swap, she does plus, plus, 1337 00:55:20,776 --> 00:55:21,976 plus, plus, plus, plus. 1338 00:55:22,176 --> 00:55:24,326 If she does any swaps, that means, you know what, 1339 00:55:24,326 --> 00:55:25,136 we've seen this before. 1340 00:55:25,136 --> 00:55:27,146 Just because I've done swaps doesn't mean the problem is 1341 00:55:27,146 --> 00:55:28,596 solved, I better go do it again. 1342 00:55:28,756 --> 00:55:29,866 That should be her reasoning. 1343 00:55:30,186 --> 00:55:32,696 But if instead she makes a pass through these humans, 1344 00:55:32,906 --> 00:55:36,616 makes no swaps whatsoever, what does she obviously do? 1345 00:55:37,286 --> 00:55:39,736 Right, it would be down right stupid to do it again 1346 00:55:39,896 --> 00:55:41,966 because if you didn't change the state of your world, 1347 00:55:41,966 --> 00:55:43,126 if you didn't change your array 1348 00:55:43,126 --> 00:55:45,116 and the answer you got was no swaps, 1349 00:55:45,116 --> 00:55:46,876 what are you gonna get if-- as an answer the next time? 1350 00:55:47,696 --> 00:55:49,386 Well no swaps, no swaps, no swaps, 1351 00:55:49,386 --> 00:55:50,566 so what's the opportunity here 1352 00:55:50,856 --> 00:55:54,086 if counter equals equals 0 exits? 1353 00:55:54,786 --> 00:55:58,896 And so bubble sort by contrast is absolutely big O of N squared 1354 00:55:58,896 --> 00:56:01,666 in the worst case but in the best case we would say it's 1355 00:56:01,666 --> 00:56:07,296 instead Omega of N. She still has to check everyone. 1356 00:56:07,356 --> 00:56:09,676 It's not Omega of 1, it's not constant time. 1357 00:56:09,676 --> 00:56:12,446 She can't just glance at number 1 and say, 1358 00:56:12,896 --> 00:56:13,866 "Feels like it's in order." 1359 00:56:13,866 --> 00:56:16,916 Still has to look at all N but that's why we get Omega 1360 00:56:17,166 --> 00:56:20,996 of N. So, this isn't bad, this is feeling kind of good 1361 00:56:20,996 --> 00:56:23,366 and especially when we speed up our CPU time, 1362 00:56:23,366 --> 00:56:25,966 it's looking really good but remember where we started. 1363 00:56:25,966 --> 00:56:28,226 This was counting people one at a time, 1364 00:56:28,226 --> 00:56:29,926 this was counting people two at a time, 1365 00:56:30,146 --> 00:56:33,116 this was having you guys count yourselves sitting down half 1366 00:56:33,116 --> 00:56:34,546 at a time again and again and again 1367 00:56:34,886 --> 00:56:38,376 but again N squared is not the best place to be in. 1368 00:56:38,376 --> 00:56:41,546 This here, down at bottom right if we zoom in, 1369 00:56:41,716 --> 00:56:42,676 that's what N looks like. 1370 00:56:42,676 --> 00:56:44,276 It's still a straight line, we just kind of [inaudible] 1371 00:56:44,336 --> 00:56:46,426 with the X and Y-axis to stretch things out 1372 00:56:46,686 --> 00:56:49,356 but N is still straight as a line there. 1373 00:56:49,616 --> 00:56:53,076 But if we scroll up, N log int, now that's interesting, 1374 00:56:53,076 --> 00:56:54,236 we'll see that guy before long. 1375 00:56:54,426 --> 00:56:57,866 But N squared who we've seen today, that feels bad, right? 1376 00:56:57,866 --> 00:57:01,366 That means that if your size of your input is say, 1377 00:57:01,926 --> 00:57:04,936 my god this is only lining up at like 4 or 6, whatever the unit 1378 00:57:04,936 --> 00:57:07,346 of measure is here, that's bad. 1379 00:57:07,346 --> 00:57:09,346 It's gonna keep growing faster and faster. 1380 00:57:09,346 --> 00:57:11,186 So, with something like bubble sort, 1381 00:57:11,186 --> 00:57:14,146 sorting these elements is going to feel, feel, 1382 00:57:14,146 --> 00:57:16,876 feel slow unless you do something better. 1383 00:57:17,246 --> 00:57:19,316 But thankfully there are smarter ways. 1384 00:57:19,316 --> 00:57:22,626 And again for this promise of teaching us how 1385 00:57:22,626 --> 00:57:24,796 to do things more efficiently and effectively, 1386 00:57:25,166 --> 00:57:28,396 let's just get a sense of how much better we can do. 1387 00:57:28,916 --> 00:57:31,116 So, the world has come up with again 1388 00:57:31,116 --> 00:57:34,386 yet more sorting algorithms, stooge sort, Q sort, J sort. 1389 00:57:34,546 --> 00:57:36,036 I'm gonna choose our familiar ones. 1390 00:57:36,036 --> 00:57:37,856 I'm gonna choose selection sort in this box. 1391 00:57:38,256 --> 00:57:41,776 I'm gonna choose bubble sort in this one, 1392 00:57:41,936 --> 00:57:44,076 and then I'm gonna choose another guy, merge sort. 1393 00:57:44,436 --> 00:57:45,856 We haven't looked at merge sort 1394 00:57:45,856 --> 00:57:48,306 but someone put a little more thought 1395 00:57:48,306 --> 00:57:49,646 into implementing that algorithm. 1396 00:57:49,646 --> 00:57:51,836 It's still just as correct, all three of these are right 1397 00:57:51,836 --> 00:57:53,556 and Casey in our demonstrations, 1398 00:57:53,776 --> 00:57:56,966 no one did anything wrong today but we can do better. 1399 00:57:57,286 --> 00:57:59,666 So, what I'm gonna do here, and these examples are very similar, 1400 00:57:59,666 --> 00:58:02,336 it's just this, the person who implemented this rotated the bar 1401 00:58:02,336 --> 00:58:05,126 so that they're now sideways instead of top to bottom, 1402 00:58:05,126 --> 00:58:06,026 but it's the same idea. 1403 00:58:06,256 --> 00:58:08,286 We wanna get short bars on one end of the graph 1404 00:58:08,376 --> 00:58:09,526 and long bars on the other. 1405 00:58:09,526 --> 00:58:12,606 I'm gonna click as fast as I can on all three of these graphs 1406 00:58:12,606 --> 00:58:14,556 to have them run simultaneously in hopes 1407 00:58:14,556 --> 00:58:16,396 of demonstrating which one is better. 1408 00:58:16,686 --> 00:58:18,466 So, let's just see after a little bit of thought 1409 00:58:18,466 --> 00:58:21,246 and a little bit of CS50, how much better we can do. 1410 00:58:21,916 --> 00:58:22,896 Here we go. 1411 00:58:23,516 --> 00:58:29,826 [ Pause ] 1412 00:58:30,326 --> 00:58:34,506 >> Done. So that is merge sort, this is CS50 1413 00:58:34,506 --> 00:58:36,396 and we will see you next week. 1414 00:58:37,516 --> 00:58:45,740 [ Applause ]