1 00:00:00,506 --> 00:00:08,546 [ Silence ] 2 00:00:09,046 --> 00:00:11,356 >> Alright, welcome back to CS50, 3 00:00:11,356 --> 00:00:13,786 this is the start of week 4. 4 00:00:13,786 --> 00:00:15,576 So, now that you're fully immersed 5 00:00:15,576 --> 00:00:18,476 in CS50 it's quite understandable 6 00:00:18,476 --> 00:00:20,406 if you want people to know that you're in CS50 7 00:00:20,436 --> 00:00:21,966 and so the teaching fellas have been hard 8 00:00:21,966 --> 00:00:24,596 at work putting together this year's line of apparel 9 00:00:25,216 --> 00:00:28,446 which includes sweatshirts, CS50 T-shirts and the like. 10 00:00:28,446 --> 00:00:31,176 It's kind of a silly tradition that we started a few years ago. 11 00:00:31,176 --> 00:00:35,676 And so, I would be remised if I didn't draw your attention 12 00:00:35,676 --> 00:00:39,686 to the artwork that they've been working on at store.cs50.net. 13 00:00:40,086 --> 00:00:45,196 Can you tune in to any number of designs including duffel bags, 14 00:00:45,226 --> 00:00:49,496 if you're the athletic type, law bears if you're more 15 00:00:49,646 --> 00:00:51,266 of the stuffed animal type, 16 00:00:51,266 --> 00:00:55,026 and then under the new category here do we have a whole bunch 17 00:00:55,026 --> 00:00:57,776 of meem themes shirt as well. 18 00:00:57,776 --> 00:00:59,796 So feel free to check that out if you have interest. 19 00:00:59,936 --> 00:01:01,766 I of course picked the wrong day to wear a sweatshirt 20 00:01:01,766 --> 00:01:05,256 under really hot lights so thankfully I came prepared 21 00:01:05,256 --> 00:01:08,326 with another item from the-- from the store. 22 00:01:08,936 --> 00:01:13,566 But one of the teaching fellas also passed long to us recently, 23 00:01:13,816 --> 00:01:16,186 a little real world example of what happens 24 00:01:16,186 --> 00:01:18,386 when you're not mindful of various data types 25 00:01:18,386 --> 00:01:20,866 and you're not mindful of the imprecision that's inherent 26 00:01:20,866 --> 00:01:22,836 in representing data in a computer, 27 00:01:22,836 --> 00:01:26,266 at least using a language like C and low level primitives 28 00:01:26,266 --> 00:01:27,966 like floats and even doubles. 29 00:01:27,966 --> 00:01:30,746 So, this is a photograph that someone took on the internet 30 00:01:30,746 --> 00:01:33,536 of a real world receipt from a restaurant somewhere I think 31 00:01:33,536 --> 00:01:37,016 in the UK and I think the take away will perhaps be cleared 32 00:01:37,016 --> 00:01:38,686 pretty fast. 33 00:01:38,686 --> 00:01:39,236 [ Laughter ] 34 00:01:39,236 --> 00:01:41,706 >> So, it looks like the curry 35 00:01:41,706 --> 00:01:45,306 and some other items were really precisely defined in terms 36 00:01:45,306 --> 00:01:47,426 of price and this is of course has generated this receipt 37 00:01:47,426 --> 00:01:49,266 by some computer, some cash-- 38 00:01:49,266 --> 00:01:51,846 cash register and they just didn't account 39 00:01:51,846 --> 00:01:53,106 for the inherent imprecision 40 00:01:53,106 --> 00:01:55,616 so here is a perhaps real world incarnation of that. 41 00:01:55,836 --> 00:01:59,716 Now, we concluded last week at looking at sorts and efficiency, 42 00:01:59,956 --> 00:02:02,266 some new notation called asymptotic notation 43 00:02:02,266 --> 00:02:03,776 and we'll continue that story today. 44 00:02:04,096 --> 00:02:05,976 But what I thought I'd do is draw your attention 45 00:02:05,976 --> 00:02:08,736 to another little project that the course has been working on. 46 00:02:08,736 --> 00:02:09,916 This is essentially a reboot 47 00:02:09,916 --> 00:02:11,716 of something we put together a few years ago 48 00:02:11,716 --> 00:02:14,056 but the thing had grown in size, this particular app. 49 00:02:14,326 --> 00:02:15,986 The data set was getting unwieldy. 50 00:02:15,986 --> 00:02:17,916 It was just getting ugly in our eyes. 51 00:02:18,216 --> 00:02:19,636 So based on the experience we had 52 00:02:19,636 --> 00:02:23,836 with Harvard courses we give you version 3 of HarvardEvents. 53 00:02:23,836 --> 00:02:27,196 And this is a tool with which you can navigate all the things 54 00:02:27,196 --> 00:02:28,716 that we know about going on campus. 55 00:02:28,716 --> 00:02:31,496 It turns out there are hundreds of Google calendars floating 56 00:02:31,496 --> 00:02:33,546 around campus, for student groups, for departments, 57 00:02:33,736 --> 00:02:35,376 for special events and the like. 58 00:02:35,376 --> 00:02:37,036 There are other file formats as well. 59 00:02:37,286 --> 00:02:40,446 And being graduates of CS50 we decided to find all 60 00:02:40,446 --> 00:02:42,606 of these data, find all of the URL's of people 61 00:02:42,696 --> 00:02:45,106 with Google calendars and the like, write a script 62 00:02:45,246 --> 00:02:46,016 in a language called PHP 63 00:02:46,016 --> 00:02:48,596 which we'll actually introduce toward the end of the semester 64 00:02:48,896 --> 00:02:51,186 and then every several hours our script runs, 65 00:02:51,266 --> 00:02:53,896 grabs all of these Google calendars and the 66 00:02:53,896 --> 00:02:54,796 like from the internet. 67 00:02:55,096 --> 00:02:57,216 Parses through them, which means to read through this-- 68 00:02:57,306 --> 00:03:00,396 what are really just text files, top to bottom, left to right. 69 00:03:00,396 --> 00:03:03,846 It imports it into our database and does some fancy indexing 70 00:03:03,846 --> 00:03:05,596 as we'll call it later in the term 71 00:03:05,596 --> 00:03:07,716 to make searches more efficient. 72 00:03:07,716 --> 00:03:08,916 And it also allows us to look 73 00:03:08,916 --> 00:03:10,716 for certain key words and themes. 74 00:03:10,716 --> 00:03:14,366 And so frankly, what justified the expenditure of time 75 00:03:14,366 --> 00:03:17,946 that this took us in recent weeks was this simple little 76 00:03:17,946 --> 00:03:20,376 feature of being able now at top left, 77 00:03:20,376 --> 00:03:23,076 you click on free food [laughter] and you can filter 78 00:03:23,076 --> 00:03:25,006 out all the stuff that's going on [laughter] 79 00:03:25,006 --> 00:03:29,096 and this every place on campus you can go between now 80 00:03:29,096 --> 00:03:30,896 and the rest of the year to find something to eat. 81 00:03:30,896 --> 00:03:30,963 [ Applause ] 82 00:03:30,963 --> 00:03:38,146 >> And so if incidentally, you're a representative 83 00:03:38,146 --> 00:03:40,906 of a student group and the alike and your Google calendar is not 84 00:03:40,906 --> 00:03:43,646 in here, you can check just by flipping through the various-- 85 00:03:43,686 --> 00:03:45,536 we have over 200 calendars right now 86 00:03:45,536 --> 00:03:46,636 and you can search through them here. 87 00:03:46,836 --> 00:03:47,736 If someone-- you're-- 88 00:03:47,736 --> 00:03:50,016 one of you're predecessor didn't already submit it, 89 00:03:50,016 --> 00:03:52,896 do follow the directions at top left which says, add calendar 90 00:03:53,116 --> 00:03:56,466 so that we can augment the data set even further. 91 00:03:56,466 --> 00:03:59,286 And you'll find too because of J term this year or winter break 92 00:03:59,286 --> 00:04:01,516 where you guys can finally come back onto campus a bit early 93 00:04:01,516 --> 00:04:03,926 for various opportunities. 94 00:04:04,176 --> 00:04:06,226 You'll find that those events as they get approved 95 00:04:06,226 --> 00:04:08,346 and start getting proposed by your classmates, 96 00:04:08,516 --> 00:04:10,596 they'll start appearing on the calendar as well 97 00:04:10,596 --> 00:04:11,946 in their respective categories. 98 00:04:11,946 --> 00:04:15,906 So, more data to come, so this is a clip some 99 00:04:15,906 --> 00:04:18,726 of you might have seen in section already. 100 00:04:18,726 --> 00:04:21,326 If not, it's sort of become a geek classic. 101 00:04:21,326 --> 00:04:23,486 Eric Schmidt is the name of the CEO 102 00:04:23,486 --> 00:04:25,966 of a little company called Google and he had the fortune 103 00:04:25,966 --> 00:04:28,086 of sitting down with Barrack Obama a few years ago 104 00:04:28,086 --> 00:04:30,816 when he was still senator about to run for president 105 00:04:31,236 --> 00:04:33,816 and to give him a Google interview. 106 00:04:33,886 --> 00:04:36,816 And I think you'll find the topic that came up during 107 00:04:37,156 --> 00:04:39,676 that interview fairly apropos. 108 00:04:39,746 --> 00:04:42,656 So, I give you this interview. 109 00:04:44,676 --> 00:04:44,926 Oops! 110 00:04:44,926 --> 00:04:45,056 [ Music ] 111 00:04:45,056 --> 00:04:45,186 [ Laughter ] 112 00:04:45,186 --> 00:04:48,996 >> I give you first some dance music. 113 00:04:49,436 --> 00:04:52,816 Alright, I give you this interview. 114 00:04:53,021 --> 00:04:55,021 [ Applause ] 115 00:04:55,226 --> 00:05:02,746 >> Now senator, you're here at Google and I like to think 116 00:05:02,746 --> 00:05:04,286 of the presidency as a job interview. 117 00:05:05,426 --> 00:05:08,166 [Laughter] Now, it's hard to get a job. 118 00:05:08,166 --> 00:05:08,296 >> Right. 119 00:05:08,296 --> 00:05:09,776 >> As president-- 120 00:05:09,776 --> 00:05:09,866 >> Right. 121 00:05:09,866 --> 00:05:11,746 >> And you're going to deliver this now. 122 00:05:11,746 --> 00:05:13,526 It's also hard to get a job at Google. 123 00:05:13,526 --> 00:05:13,806 >> Right. 124 00:05:13,806 --> 00:05:13,916 [ Laughter ] 125 00:05:13,916 --> 00:05:18,706 >> We have questions and we asks our candidates questions 126 00:05:18,706 --> 00:05:21,036 and this one is from Larry Schwimmer. 127 00:05:21,036 --> 00:05:21,136 >> Okay. 128 00:05:21,136 --> 00:05:21,296 [ Laughter ] 129 00:05:21,296 --> 00:05:24,556 >> What-- you guys think I'm kidding? 130 00:05:24,556 --> 00:05:26,986 It's right here. 131 00:05:27,496 --> 00:05:28,876 What is the most efficient way 132 00:05:28,876 --> 00:05:31,096 to store a million 32-bit integers? 133 00:05:31,411 --> 00:05:33,411 [ Laughter ] 134 00:05:33,726 --> 00:05:34,626 >> Well-- 135 00:05:34,626 --> 00:05:36,976 >> I'm-- maybe-- 136 00:05:36,976 --> 00:05:37,116 >> I-- 137 00:05:37,116 --> 00:05:38,006 >> I'm sorry, maybe-- 138 00:05:38,006 --> 00:05:38,896 >> No, no, no, no. 139 00:05:38,976 --> 00:05:39,676 I-- I think-- 140 00:05:39,676 --> 00:05:39,916 >> That's not a-- 141 00:05:39,916 --> 00:05:43,716 >> I-- I think the Bubble Sort would be the wrong way to go. 142 00:05:43,716 --> 00:05:44,906 [ Laughter ] 143 00:05:44,906 --> 00:05:47,006 >> Come on. 144 00:05:47,766 --> 00:05:48,636 Who told is this? 145 00:05:49,176 --> 00:05:53,556 Okay. I didn't see computer science in his background. 146 00:05:53,556 --> 00:05:56,046 >> We-- we've got our spies in there. 147 00:05:56,046 --> 00:05:57,096 [ Laughter ] 148 00:05:57,096 --> 00:06:00,276 >> Why, let's okay-- let's ask him a different 149 00:06:00,276 --> 00:06:02,596 interview question. 150 00:06:03,416 --> 00:06:07,326 >> Alright, so bubble of sorts of course is being poked fun 151 00:06:07,326 --> 00:06:08,756 at there for what reasons? 152 00:06:10,096 --> 00:06:11,296 Alright, so it's inefficient. 153 00:06:11,296 --> 00:06:12,426 It's relatively slow. 154 00:06:12,426 --> 00:06:14,516 The upside of it of course though is that's it's pretty 155 00:06:14,516 --> 00:06:16,196 darn easy to implement. 156 00:06:16,196 --> 00:06:19,186 At least you maybe finding that and if not when you look back 157 00:06:19,246 --> 00:06:21,526 at the code you then have written you'll realize 158 00:06:21,526 --> 00:06:23,506 that in fact it just requires a few lines of code. 159 00:06:23,506 --> 00:06:26,866 And so ease of implementation is actually a very compelling 160 00:06:26,866 --> 00:06:28,676 metric against which to measure-- 161 00:06:28,676 --> 00:06:31,216 do you mind toning my voice down a bit-- 162 00:06:31,216 --> 00:06:32,966 is a very reasonable measure against which 163 00:06:32,966 --> 00:06:35,386 to measure the quality of an algorithm, right. 164 00:06:35,386 --> 00:06:37,666 One of the inputs to the problems we're gonna be working 165 00:06:37,666 --> 00:06:39,856 on this semester and if you continue on in this world 166 00:06:39,886 --> 00:06:42,426 after this course is gonna be how much time it takes, 167 00:06:42,426 --> 00:06:44,236 how much thought it takes to implement something, 168 00:06:44,476 --> 00:06:46,486 how much time it takes to run something, 169 00:06:46,486 --> 00:06:49,266 how much space it takes to run something? 170 00:06:49,266 --> 00:06:50,606 And so we'll, oops! 171 00:06:50,606 --> 00:06:52,526 That is not the slide I want. 172 00:06:52,726 --> 00:06:55,056 And so we'll see this theme of tradeoffs, to be honest, 173 00:06:55,056 --> 00:06:55,776 throughout the course. 174 00:06:55,776 --> 00:06:57,106 And we'll focus more academically 175 00:06:57,106 --> 00:06:58,896 on tradeoffs involving space, 176 00:06:58,896 --> 00:07:01,656 bram and disk space and timed CPU cycles. 177 00:07:01,886 --> 00:07:02,836 But towards semesters end 178 00:07:02,836 --> 00:07:05,166 when you're actually tackling what you probably will make 179 00:07:05,166 --> 00:07:07,356 out to be some fairly real world projects 180 00:07:07,356 --> 00:07:09,406 for your final project the amount of time it takes 181 00:07:09,406 --> 00:07:11,676 and the amount of effort it takes for the programmers, 182 00:07:11,706 --> 00:07:12,636 the computer scientists 183 00:07:12,836 --> 00:07:16,696 to actually develop something is itself a reasonable resource 184 00:07:17,046 --> 00:07:19,216 to consider spending or not spending. 185 00:07:19,216 --> 00:07:20,106 Case and point, when I was 186 00:07:20,106 --> 00:07:23,236 in graduate school I used a programming language called Perl 187 00:07:23,236 --> 00:07:23,596 a lot. 188 00:07:23,596 --> 00:07:24,836 It's a scripting language 189 00:07:24,836 --> 00:07:27,406 which like PHP is something we'll discuss later 190 00:07:27,406 --> 00:07:28,056 in this semester. 191 00:07:28,376 --> 00:07:31,506 But I was working on research involving detection of worms, 192 00:07:31,506 --> 00:07:33,116 viruses and the spread thereof. 193 00:07:33,116 --> 00:07:34,666 And so I was collecting from friends 194 00:07:34,966 --> 00:07:38,226 and colleagues' computers all sorts of logs 195 00:07:38,306 --> 00:07:40,506 that monitored the kind of low level activity 196 00:07:40,506 --> 00:07:41,796 that was happening on their computer. 197 00:07:41,796 --> 00:07:42,936 And so part of my research was 198 00:07:42,986 --> 00:07:44,986 to analyze these logs and find patterns. 199 00:07:45,016 --> 00:07:46,876 The idea being if I find a pattern 200 00:07:46,876 --> 00:07:49,016 across multiple people's computers maybe all 201 00:07:49,016 --> 00:07:50,946 of those people are infected with the same kind 202 00:07:50,946 --> 00:07:52,856 of malware or virus or worm. 203 00:07:53,176 --> 00:07:56,216 And so very often I would find myself with gigabytes worth 204 00:07:56,216 --> 00:08:00,096 of data, at night I needed to analyze this data and look 205 00:08:00,096 --> 00:08:03,206 for this patterns and frankly the reality was sometimes I 206 00:08:03,206 --> 00:08:06,236 could spend 10-15 minutes whipping up a little script, 207 00:08:06,236 --> 00:08:07,116 a little program 208 00:08:07,426 --> 00:08:10,586 that unfortunately would take eight hours or more to run. 209 00:08:10,966 --> 00:08:11,896 But I would choose to sit 210 00:08:11,896 --> 00:08:14,496 down for these particular projects right around midnight 211 00:08:14,496 --> 00:08:15,236 so that when I woke 212 00:08:15,236 --> 00:08:17,736 up my program was giving me the results I wanted. 213 00:08:17,736 --> 00:08:19,136 And so, in the real world frankly, 214 00:08:19,136 --> 00:08:20,186 these kinds of questions, 215 00:08:20,396 --> 00:08:22,986 how much time is this gonna take is certainly germane. 216 00:08:22,986 --> 00:08:25,846 But of course that process would not scale very well. 217 00:08:25,846 --> 00:08:27,836 In fact to write a run those same experience-- 218 00:08:28,046 --> 00:08:31,236 those same experiments now with the even greater volume 219 00:08:31,236 --> 00:08:33,306 of data computers are now producing I'd probably have 220 00:08:33,346 --> 00:08:35,026 terabytes worth of data and at 221 00:08:35,026 --> 00:08:36,566 that point things just would have broken. 222 00:08:36,566 --> 00:08:37,676 It would not have been viable. 223 00:08:37,676 --> 00:08:39,336 And so you have to go back to the drawing board 224 00:08:39,336 --> 00:08:40,736 and realize Bubble Sort 225 00:08:40,736 --> 00:08:43,106 or whatever it is you're using just isn't enough to snuff, 226 00:08:43,106 --> 00:08:44,946 I need to come up with something more clever. 227 00:08:45,206 --> 00:08:47,106 And so we looked at some alternatives the other day. 228 00:08:47,106 --> 00:08:49,136 We looked at something called Selection Sort 229 00:08:49,316 --> 00:08:51,196 and that too was pretty straightforward, 230 00:08:51,196 --> 00:08:52,066 at least conceptually. 231 00:08:52,066 --> 00:08:55,226 I just go down the list selecting the smallest person 232 00:08:55,226 --> 00:08:56,856 at a time and then I repeat, repeat, 233 00:08:56,856 --> 00:08:59,686 repeat but when we actually did out the math or kind of reason 234 00:08:59,716 --> 00:09:03,336 through it, the running time, the asymptotic running time 235 00:09:03,336 --> 00:09:06,326 of bub-- of Selection Sort was also what? 236 00:09:06,936 --> 00:09:08,436 So, N squared. 237 00:09:08,436 --> 00:09:10,486 So we introduced this notation big O 238 00:09:10,486 --> 00:09:12,356 which generally refers to worst case. 239 00:09:12,446 --> 00:09:15,086 And by worst case we mean different things. 240 00:09:15,086 --> 00:09:16,376 And the context of sorting, 241 00:09:16,596 --> 00:09:18,566 the worst case is your handed a problem that's 242 00:09:18,566 --> 00:09:21,596 in complete reverse order because that implies you have 243 00:09:21,596 --> 00:09:25,736 to do as more work that could possibly-- that you could-- 244 00:09:25,736 --> 00:09:28,146 you have to do more work than you would of course 245 00:09:28,146 --> 00:09:29,986 if things were in perfect order. 246 00:09:29,986 --> 00:09:31,386 And so you care-- you care 247 00:09:31,386 --> 00:09:33,986 about ultimately how much time is my algorithm gonna take 248 00:09:34,256 --> 00:09:36,656 to perform on that worst case running time. 249 00:09:36,656 --> 00:09:38,406 We also looked at Omega Notation. 250 00:09:38,406 --> 00:09:39,786 And this was just a formal way 251 00:09:39,786 --> 00:09:42,146 of describing the best case running time and in the case 252 00:09:42,146 --> 00:09:45,726 of Selection Sort, what was the best case running time? 253 00:09:47,196 --> 00:09:50,506 So, N, N squared, which was it? 254 00:09:51,156 --> 00:09:52,986 So, it wasn't one. 255 00:09:52,986 --> 00:09:55,526 Okay, so 1 is a viable answer in some contacts 256 00:09:55,526 --> 00:09:57,416 that needs constant time, the same amount 257 00:09:57,416 --> 00:09:58,276 of time no matter what. 258 00:09:58,466 --> 00:10:01,726 But it's definitely not one and in fact it wasn't N in the case 259 00:10:01,726 --> 00:10:04,146 of Selection Sort because remember the algorithm we 260 00:10:04,146 --> 00:10:06,906 implemented on stage last week had me going back and forth 261 00:10:06,906 --> 00:10:09,776 across the stage selecting on iteration, 262 00:10:09,776 --> 00:10:12,306 the smallest person I can find, the smallest number 263 00:10:12,306 --> 00:10:12,976 and then putting them into place. 264 00:10:13,286 --> 00:10:15,916 >> But the problem is I wasn't taking any kind 265 00:10:15,916 --> 00:10:16,796 of aggregate view. 266 00:10:16,796 --> 00:10:19,256 I was just finding very tunnel vision-like, 267 00:10:19,256 --> 00:10:21,716 the smallest elements at that moment in time 268 00:10:21,986 --> 00:10:23,436 which means I don't know anything 269 00:10:23,436 --> 00:10:26,296 about the other elements other than they are not the smallest 270 00:10:26,466 --> 00:10:28,626 and so no matter what with Selection Sort I had 271 00:10:28,626 --> 00:10:31,286 to repeat this again and again and again and if you do 272 00:10:31,286 --> 00:10:33,856 out the math it's roughly N squared steps 273 00:10:34,036 --> 00:10:35,366 in the worst case as well. 274 00:10:35,596 --> 00:10:38,396 So Selection Sort, while it might be easier perhaps to think 275 00:10:38,396 --> 00:10:39,206 through than Bubble Sort, 276 00:10:39,206 --> 00:10:40,376 or maybe it's pretty much equivalent, 277 00:10:40,376 --> 00:10:42,216 it's just a different approach to the same problem. 278 00:10:42,626 --> 00:10:45,176 It fundamentally didn't feel all that better. 279 00:10:45,426 --> 00:10:47,416 But there was something that was better, right? 280 00:10:47,416 --> 00:10:50,296 You'll recall that at the very end of class we pulled 281 00:10:50,296 --> 00:10:54,076 up this little demo that looked like this. 282 00:10:54,076 --> 00:10:56,536 And we had a bunch of drop downs I was able to choose from. 283 00:10:56,536 --> 00:11:00,436 I chose Bubble Sort on the left Selection Sort on the right 284 00:11:00,716 --> 00:11:04,036 and then something called merge sort on the very right hand side 285 00:11:04,266 --> 00:11:06,996 and then I started this all off roughly at the same time 286 00:11:07,726 --> 00:11:10,886 and what was frankly striking at least to me 287 00:11:11,046 --> 00:11:13,436 at the time was, my God it's done. 288 00:11:13,686 --> 00:11:15,946 Like what the heck have we been spending our time for-- 289 00:11:15,946 --> 00:11:18,566 our time on with Bubble Sort and with Selection Sort 290 00:11:18,566 --> 00:11:20,756 and in fact there's plenty of other N squared sorts 291 00:11:20,756 --> 00:11:22,486 that we're not even gonna bother looking at. 292 00:11:22,526 --> 00:11:24,666 But with this example alone can you realize 293 00:11:24,666 --> 00:11:28,066 that with some more exercise of thoughts and intelligent-- 294 00:11:28,376 --> 00:11:31,306 I don't wanna say intelligent design here, 295 00:11:31,496 --> 00:11:34,976 intelligent design can you actually solve this problem 296 00:11:34,976 --> 00:11:38,986 so much more efficiently and just as correctly. 297 00:11:38,986 --> 00:11:40,746 And so one of the things we'll look 298 00:11:40,746 --> 00:11:43,256 at today is how can we leverage an algorithm, 299 00:11:43,256 --> 00:11:45,336 how can we implement an algorithm that at least 300 00:11:45,336 --> 00:11:48,786 at first glance the second time we've now seen it feels 301 00:11:48,786 --> 00:11:50,176 so obviously better. 302 00:11:50,176 --> 00:11:52,736 Where is the ingenuity there and how can we leverage 303 00:11:52,736 --> 00:11:54,656 that in our own problem solving? 304 00:11:54,656 --> 00:11:56,926 So, first, a couple of teasers-- 305 00:11:57,516 --> 00:12:00,336 oh, that was kind of a jerk move, okay. 306 00:12:00,336 --> 00:12:00,846 [ Laughter ] 307 00:12:00,846 --> 00:12:02,036 >> Unintentional. 308 00:12:02,356 --> 00:12:05,456 So first just a couple of teasers if you are 309 00:12:05,456 --> 00:12:07,046 in fact the geeky type, so our friends 310 00:12:07,046 --> 00:12:10,116 at Microsoft are having this little nerd party in a few days. 311 00:12:10,356 --> 00:12:12,616 We'll post this information on the course's home page. 312 00:12:12,616 --> 00:12:14,146 But it looks like they're giving away stuff, 313 00:12:14,146 --> 00:12:16,296 they're feeding you stuff, and if you're interested 314 00:12:16,296 --> 00:12:18,326 in learning a bit more about Microsoft internships 315 00:12:18,326 --> 00:12:21,046 and such they have an office in Kendall Square near MIT 316 00:12:21,046 --> 00:12:23,866 and that's where I believe this event is to take place. 317 00:12:23,866 --> 00:12:28,006 If you are interested by contrast in new 318 00:12:28,006 --> 00:12:32,346 and fun toys our friends at Aces which is a hardware manufacturer 319 00:12:32,346 --> 00:12:34,116 which makes things like motherboards 320 00:12:34,116 --> 00:12:36,076 which are the innards of computers and other things. 321 00:12:36,406 --> 00:12:38,776 They are soon releasing apparently a new tablet 322 00:12:38,776 --> 00:12:40,616 like device that looks a little something like this. 323 00:12:40,886 --> 00:12:43,326 One of our teaching fellows Willie Yao is one 324 00:12:43,326 --> 00:12:45,136 of their ambassadors on campus. 325 00:12:45,376 --> 00:12:47,336 And so if you are a bit of a geek and would like to play 326 00:12:47,336 --> 00:12:50,346 with this you can go to cs50.net/aces 327 00:12:50,346 --> 00:12:52,756 and I don't think Willie will give it to you to keep 328 00:12:52,756 --> 00:12:55,996 but you can borrow and play for some amount of time in exchange 329 00:12:55,996 --> 00:12:57,116 for some feedback on it. 330 00:12:57,146 --> 00:13:01,296 So if you like toys feel free to reach out there-- what's that? 331 00:13:01,296 --> 00:13:02,706 >> Is it like an Etch-a Sketch? 332 00:13:02,706 --> 00:13:03,846 >> Is it like an Etch-a-Sketch? 333 00:13:03,846 --> 00:13:05,956 It appears to be useful in that way 334 00:13:05,956 --> 00:13:09,326 but I imagine it does more compelling things these days 335 00:13:09,356 --> 00:13:12,266 but I have not seen it myself. 336 00:13:13,016 --> 00:13:15,926 Alright, so problem set 3. 337 00:13:16,006 --> 00:13:17,326 So a bunch of you have already dived 338 00:13:17,326 --> 00:13:18,876 into this probably which is fantastic. 339 00:13:19,266 --> 00:13:21,296 What I though I'd do is just tease you with some visuals 340 00:13:21,486 --> 00:13:22,976 if you haven't gotten to the point of playing 341 00:13:22,976 --> 00:13:25,136 with the staff solution so the themes 342 00:13:25,136 --> 00:13:27,646 of problem set 3 are two-fold. 343 00:13:27,646 --> 00:13:29,876 One is to just get your hands dirty with some 344 00:13:29,876 --> 00:13:33,796 of the algorithms, some of the sorting and searching algorithms 345 00:13:33,796 --> 00:13:35,646 that we've been discussing now for a couple of weeks 346 00:13:35,646 --> 00:13:38,866 and we'll finish looking at this week specifically Bubble Sort, 347 00:13:38,866 --> 00:13:40,806 Selection Sort, this thing called merge sort 348 00:13:41,026 --> 00:13:43,326 but then the main part of this problem set is 349 00:13:43,326 --> 00:13:46,166 to implement a game that you might have received 350 00:13:46,166 --> 00:13:49,966 in a little goody bag as a party favor years ago, the game of 15. 351 00:13:49,966 --> 00:13:52,246 This is a game where you get a little plastic device, 352 00:13:52,506 --> 00:13:55,276 it's got 15 plastic numbers on it and one hole 353 00:13:55,276 --> 00:13:57,546 in this little plastic board and you can move those numbers up, 354 00:13:57,606 --> 00:13:59,386 down, left to right, and the goal is 355 00:13:59,386 --> 00:14:02,026 to take what's a random assortment of tiles 356 00:14:02,356 --> 00:14:04,356 and arrange them in numeric order and that's one 357 00:14:04,356 --> 00:14:06,846 of these little things you can play sort of absentmindedly. 358 00:14:07,166 --> 00:14:09,396 But now, there is an opportunity for us 359 00:14:09,396 --> 00:14:10,656 to implement this in code. 360 00:14:10,656 --> 00:14:13,346 And for a couple of reasons, one this is the first P set 361 00:14:13,496 --> 00:14:16,596 where we're actually gonna give you code to work from. 362 00:14:16,596 --> 00:14:19,936 So it's generally called distribution code and whereas 363 00:14:19,936 --> 00:14:21,846 for the previous problems that you pretty much started 364 00:14:21,846 --> 00:14:24,156 from scratch, blank files you opened up nano 365 00:14:24,156 --> 00:14:26,026 and there was nothing there unless you put it there. 366 00:14:26,346 --> 00:14:28,206 Now we're gonna hand you some framework, 367 00:14:28,256 --> 00:14:30,456 that code that we wrote with some blanks 368 00:14:30,636 --> 00:14:32,726 to get you accustomed to the idea of one, 369 00:14:32,726 --> 00:14:34,986 writing larger programs than time might allow 370 00:14:34,986 --> 00:14:36,146 if you were doing it on your own. 371 00:14:36,506 --> 00:14:39,236 And two, to give you a better sense of design especially 372 00:14:39,236 --> 00:14:42,576 as your programs get a little more involved and they play 373 00:14:42,576 --> 00:14:45,096 or they run for more than just a few seconds time. 374 00:14:45,096 --> 00:14:46,606 In fact they interact with the user. 375 00:14:46,606 --> 00:14:49,306 You will be able to infer from some of our code how 376 00:14:49,306 --> 00:14:52,036 in fact you can implement some more sophisticated programs. 377 00:14:52,036 --> 00:14:53,986 Now we'll guide you through it as you'll see in the specs 378 00:14:54,276 --> 00:14:57,416 as to what holes you need to fill in within the framework. 379 00:14:57,706 --> 00:15:00,536 But you'll see that once you get to final projects frankly, 380 00:15:00,536 --> 00:15:01,346 you're not gonna want 381 00:15:01,346 --> 00:15:03,256 to implement everything from scratch. 382 00:15:03,256 --> 00:15:06,006 In fact, with this website here that I pulled up a moment ago, 383 00:15:06,386 --> 00:15:10,176 the events.cs50.net a lot of this stuff, 384 00:15:10,236 --> 00:15:11,996 so it definitely took a lot of time. 385 00:15:11,996 --> 00:15:15,026 And in fact, one of the lessons you may already be realizing 386 00:15:15,026 --> 00:15:18,946 with P sets is that things seem to take twice, three times, 387 00:15:18,986 --> 00:15:21,176 four times longer than you actually might think. 388 00:15:21,216 --> 00:15:24,596 And frankly, even I never quite learned that lesson. 389 00:15:24,596 --> 00:15:25,806 I mean that-- what began as, 390 00:15:25,806 --> 00:15:28,126 ah we'll do this this weekend turned out to be 391 00:15:28,126 --> 00:15:30,496 like a 100-hour project to revamp this whole thing. 392 00:15:30,496 --> 00:15:33,766 And it was fun for us but thankfully, we didn't have 393 00:15:33,766 --> 00:15:35,666 to implement all this stuff ourselves. 394 00:15:35,666 --> 00:15:38,116 If you focus on some of the minutia of a site like this, 395 00:15:38,386 --> 00:15:39,686 you know, it's useful to be able to jump 396 00:15:39,686 --> 00:15:42,426 to a particular date whether in the future or in the past. 397 00:15:42,426 --> 00:15:45,806 Now, this is a pretty familiar widget of little gadget 398 00:15:45,806 --> 00:15:48,706 that you might see on most websites today, but my God, 399 00:15:48,706 --> 00:15:51,296 what an uninteresting problem to solve ourselves. 400 00:15:51,296 --> 00:15:53,706 I don't want to spend the whole weekend implementing just a 401 00:15:53,706 --> 00:15:56,496 little pop up calendar and frankly it probably would take 402 00:15:56,496 --> 00:15:59,156 at least that long to get something that's as interactive 403 00:15:59,416 --> 00:16:01,646 and dynamic as something like this that works 404 00:16:01,646 --> 00:16:03,266 across all browsers and so forth. 405 00:16:03,526 --> 00:16:05,966 But thankfully now in 2010, we're at the point 406 00:16:05,966 --> 00:16:08,956 where there's a whole lot of free and open source software 407 00:16:09,156 --> 00:16:10,956 that you can integrate into your projects 408 00:16:11,006 --> 00:16:12,836 that other nice people have made available. 409 00:16:13,016 --> 00:16:16,346 We're using a library as we'll call it, as we already do 410 00:16:16,426 --> 00:16:19,196 in the CS50 library whereby you can bootstrap yourselves 411 00:16:19,236 --> 00:16:23,406 to making much more interesting, much more sophisticated projects 412 00:16:23,456 --> 00:16:26,746 by standing on the shoulders of others who have come before you. 413 00:16:26,746 --> 00:16:28,796 And that's exactly what we did with a widget like that. 414 00:16:29,076 --> 00:16:32,306 Some of the search features are we borrowing other people's 415 00:16:32,376 --> 00:16:33,686 libraries and framework. 416 00:16:34,026 --> 00:16:35,996 So problem set 3 has a little taste 417 00:16:35,996 --> 00:16:37,326 of precisely that approach. 418 00:16:37,326 --> 00:16:40,426 So let me go ahead and pull up the staff solution 419 00:16:40,426 --> 00:16:43,736 to the standard edition for a moment, the game is called 15, 420 00:16:43,736 --> 00:16:46,376 it takes one command line argument which is the dimensions 421 00:16:46,376 --> 00:16:49,856 of the board 3 by 3, 4 by 4, I'll do it 4 by 4. 422 00:16:49,856 --> 00:16:51,526 And this is how we chose 423 00:16:51,526 --> 00:16:53,216 to implement the aesthetics of this game. 424 00:16:53,216 --> 00:16:54,666 You can get a little fancier. 425 00:16:54,966 --> 00:16:58,416 But this is perhaps a very simple example of ASCII art. 426 00:16:58,726 --> 00:17:02,276 Now the bottom-- the little underscore 427 00:17:02,276 --> 00:17:04,586 in the bottom right hand corner represents the blank tile 428 00:17:04,856 --> 00:17:08,076 and just as with my thumbs I would move 4 down 429 00:17:08,076 --> 00:17:09,246 or maybe 2 to the right. 430 00:17:09,616 --> 00:17:10,696 I can do exactly that. 431 00:17:10,696 --> 00:17:13,366 If I type 4 and hit Enter the 4 moves down 432 00:17:13,366 --> 00:17:15,106 and the blank space essentially moves up. 433 00:17:15,106 --> 00:17:20,266 I can now move 5 over, I can now move 9, I can move let's say 8. 434 00:17:20,346 --> 00:17:23,236 Well if I didn't want that, 8 again and so forth. 435 00:17:23,236 --> 00:17:25,576 And so for the standard edition will you be implementing the 436 00:17:25,576 --> 00:17:27,256 mechanics of this game so that you can 437 00:17:27,256 --> 00:17:29,566 in fact play against yourself. 438 00:17:29,726 --> 00:17:33,666 Now for the hacker edition, if you are feeling up to a bit more 439 00:17:33,666 --> 00:17:35,826 of a challenge you'll find that when you run the game, 440 00:17:36,256 --> 00:17:37,506 one it's not all that hard 441 00:17:37,506 --> 00:17:39,596 to use some ASCII art just make it a little fancier 442 00:17:39,596 --> 00:17:41,956 as the teaching fellow who implemented this solution did. 443 00:17:42,286 --> 00:17:45,306 But the game here is gonna be played pretty much the same. 444 00:17:45,306 --> 00:17:50,396 I can move 9 down, I can move 12 down, I can move 2 down, 445 00:17:50,396 --> 00:17:54,136 and I can cheat by entering God mode and if I hit God 446 00:17:54,136 --> 00:17:56,916 and hit Enter well now the problem set solves itself 447 00:17:56,916 --> 00:17:57,196 for me. 448 00:17:57,516 --> 00:17:59,666 And so if you choose to elect the hacker edition 449 00:17:59,666 --> 00:18:02,046 of this week's problem set the challenge will be to figure 450 00:18:02,046 --> 00:18:04,646 out not only how to implement the game but how 451 00:18:04,646 --> 00:18:06,796 to implement a solver for the game. 452 00:18:06,796 --> 00:18:08,446 And the PDF will walk you through that. 453 00:18:08,446 --> 00:18:11,076 Now this can actually take a while to watch but you'll see 454 00:18:11,076 --> 00:18:12,796 that the numbers are moving into position. 455 00:18:13,036 --> 00:18:14,346 The spec, if you want to add a little bit 456 00:18:14,346 --> 00:18:16,786 of color shows you obviously how you can add green 457 00:18:16,786 --> 00:18:17,966 and red and so forth. 458 00:18:18,346 --> 00:18:20,366 So if you're curious this one I'm gonna kill off 459 00:18:20,366 --> 00:18:21,446 because it will take a little while-- 460 00:18:22,436 --> 00:18:24,626 you'll find that available on the cloud if you'd 461 00:18:24,626 --> 00:18:26,776 like to experiment whether or not you're doing 462 00:18:26,776 --> 00:18:27,946 that edition or the other. 463 00:18:28,716 --> 00:18:31,736 So, with that said, I have a new toy here. 464 00:18:31,966 --> 00:18:34,496 So this is a little scale that if you put something here 465 00:18:34,496 --> 00:18:36,506 or here it's supposed to tell you which one is heavier 466 00:18:36,506 --> 00:18:38,426 and I also have 8 cups here. 467 00:18:38,426 --> 00:18:41,796 And inside of each of these cups is a different weight so that 468 00:18:41,796 --> 00:18:44,576 in theory all of these cups are different weights. 469 00:18:44,836 --> 00:18:47,476 And now suppose that I want to sort all of these cups. 470 00:18:47,476 --> 00:18:50,196 So all you have behind here for spaces sake is just 8 cups, 471 00:18:50,276 --> 00:18:51,166 each with different weights. 472 00:18:51,496 --> 00:18:52,816 I have my little scale here. 473 00:18:52,816 --> 00:18:54,456 And the reason for the scale is 474 00:18:54,456 --> 00:18:58,376 that as we saw last week the idea of these algorithms-- 475 00:18:58,916 --> 00:19:00,076 I broke the scale already. 476 00:19:00,896 --> 00:19:03,016 The idea behind all these algorithms is 477 00:19:03,016 --> 00:19:06,026 that what's ultimately important is how many comparisons you 478 00:19:06,026 --> 00:19:07,366 ultimately need to make. 479 00:19:08,276 --> 00:19:10,286 I did buy this at like a child store 480 00:19:10,286 --> 00:19:13,216 so it's not very robust here. 481 00:19:13,546 --> 00:19:14,256 Oh my god! 482 00:19:16,936 --> 00:19:20,546 Alright, I can figure this out it's made for like ages 5 483 00:19:20,546 --> 00:19:29,056 and up so, [laughter] oh, my gosh! 484 00:19:29,216 --> 00:19:32,546 This is ridiculous [laughter] alright you try doing this 485 00:19:32,546 --> 00:19:33,826 in front of hundreds of people. 486 00:19:33,826 --> 00:19:35,486 Alright, is that right? 487 00:19:35,826 --> 00:19:37,866 I think that's right. 488 00:19:38,986 --> 00:19:40,266 If not we're gonna call in some help. 489 00:19:40,966 --> 00:19:43,116 Alright, it feels right. 490 00:19:43,806 --> 00:19:47,006 This was not meant to be part of the demo alright nice. 491 00:19:47,406 --> 00:19:49,476 Okay. Thanks, alright. 492 00:19:50,516 --> 00:19:55,766 [ Applause ] 493 00:19:56,266 --> 00:19:59,716 >> So next year don't pick up the scale. 494 00:19:59,716 --> 00:20:01,876 Alright, so we can implement any number 495 00:20:01,876 --> 00:20:03,086 of algorithms using this thing 496 00:20:03,086 --> 00:20:04,976 because the basic mechanism I have here is a comparator. 497 00:20:05,756 --> 00:20:07,976 >> Something that takes two inputs, a left operant 498 00:20:07,976 --> 00:20:11,006 and a right operant and it just returns a Boolean value is-- 499 00:20:11,036 --> 00:20:13,306 or rather it returns a left to right value. 500 00:20:13,506 --> 00:20:15,776 Is this side heavier or is this side heavier? 501 00:20:16,076 --> 00:20:19,046 With that mechanism we can now implement these 502 00:20:19,046 --> 00:20:20,246 comparisons sorts. 503 00:20:20,246 --> 00:20:22,276 Bubble sort recall is a comparison sort. 504 00:20:22,276 --> 00:20:24,536 We pair wise compare everyone that was onstage. 505 00:20:24,856 --> 00:20:27,236 Selection sort too really reduces to a total number 506 00:20:27,236 --> 00:20:30,336 of comparisons because I'm again comparing the current smallest 507 00:20:30,336 --> 00:20:32,856 to the next thing I see, the next thing, so really a lot 508 00:20:32,856 --> 00:20:35,306 of these sorting algorithms boil down to comparisons 509 00:20:35,306 --> 00:20:37,346 and the numbers that you actually have to make. 510 00:20:37,616 --> 00:20:41,476 So supposed I did want to find the smallest cup here. 511 00:20:41,476 --> 00:20:43,626 So each of these cups again have different weights if I want 512 00:20:43,626 --> 00:20:47,266 to find the lightest of them I might start not knowing 513 00:20:47,266 --> 00:20:47,816 which is which. 514 00:20:47,816 --> 00:20:49,066 I'm just gonna start over here. 515 00:20:49,306 --> 00:20:51,156 I'm gonna put two cups down and it looks 516 00:20:51,156 --> 00:20:52,646 like this one is clearly heavier. 517 00:20:52,646 --> 00:20:54,546 And so now I can take this one away. 518 00:20:54,816 --> 00:20:57,226 Make a mental note that he is not the smallest. 519 00:20:57,416 --> 00:20:59,846 So this is like crossing off the person 520 00:21:00,126 --> 00:21:01,456 in position number 2 here. 521 00:21:01,686 --> 00:21:03,586 I know this thing is currently the smallest. 522 00:21:03,586 --> 00:21:05,676 So let me choose another cup from the array. 523 00:21:05,876 --> 00:21:07,826 Alright, he is still currently the smallest. 524 00:21:07,826 --> 00:21:10,826 So let me move down the list or put this cup back. 525 00:21:11,116 --> 00:21:12,146 How about this one here? 526 00:21:12,536 --> 00:21:13,976 Alright, this one is still the smallest. 527 00:21:13,976 --> 00:21:15,176 It's gonna be a pretty stupid demo 528 00:21:15,176 --> 00:21:17,216 if that was the smallest, but here we go. 529 00:21:17,906 --> 00:21:19,946 Now let's try this one, alright. 530 00:21:19,946 --> 00:21:21,556 So that's four comparisons. 531 00:21:22,046 --> 00:21:24,436 Alright, that's 5 comparisons. 532 00:21:24,976 --> 00:21:26,396 This demo needs a little work next year. 533 00:21:26,486 --> 00:21:30,056 That's six comparisons, so N minus 2. 534 00:21:30,056 --> 00:21:33,326 So, now let's see here, okay. 535 00:21:33,826 --> 00:21:36,476 So now thankfully I found the lightest one which happens 536 00:21:36,516 --> 00:21:39,256 to be this guy here and so what's just happened? 537 00:21:39,256 --> 00:21:41,486 It's as though I've walked across the stage 538 00:21:41,486 --> 00:21:44,116 like this realized, damn, it was the guy over here 539 00:21:44,116 --> 00:21:46,136 or rather I found the smallest element here who beat 540 00:21:46,136 --> 00:21:50,006 out number 2 over here so I can now put number 1 into place 541 00:21:50,276 --> 00:21:51,466 and recall that it didn't matte 542 00:21:51,466 --> 00:21:53,996 if I punted whoever was standing here 'cause they were given 543 00:21:53,996 --> 00:21:55,056 to me randomly anyway. 544 00:21:55,056 --> 00:21:57,786 Who cares if I randomly send him or her elsewhere in the array? 545 00:21:58,096 --> 00:22:00,916 And now I have a problem of size N minus 1. 546 00:22:01,106 --> 00:22:05,166 And so I repeat, and so I repeat and one of the visuals meant 547 00:22:05,166 --> 00:22:08,236 to be conveyed by this-- this scale here is now I know 548 00:22:08,236 --> 00:22:09,916 that this guy is the smallest. 549 00:22:09,916 --> 00:22:11,096 So, I'll just put him aside. 550 00:22:11,466 --> 00:22:12,466 But now, my God! 551 00:22:12,466 --> 00:22:14,436 I have to do this whole stupid process again. 552 00:22:14,436 --> 00:22:17,016 So, alright, so he is currently the smallest. 553 00:22:17,016 --> 00:22:20,906 Now, he is currently-- he's currently the smallest. 554 00:22:20,996 --> 00:22:25,506 Now, he's currently the smallest and you can feel the tedium. 555 00:22:25,506 --> 00:22:27,676 And so this begs the question, even in the context 556 00:22:27,676 --> 00:22:30,126 of these cups, how can we possibly do better if at the end 557 00:22:30,126 --> 00:22:34,036 of the day in order to figure out if two cups or if two people 558 00:22:34,176 --> 00:22:37,326 or if two ints inside of an array are bigger or smaller 559 00:22:37,326 --> 00:22:39,056 than one another it feels like we have 560 00:22:39,086 --> 00:22:40,916 to do this comparison work anyway. 561 00:22:41,266 --> 00:22:42,786 Well recall from week zero 562 00:22:42,826 --> 00:22:45,536 when we took a very real world phonebook 563 00:22:45,536 --> 00:22:48,036 and then just last week we had our volunteer come 564 00:22:48,036 --> 00:22:50,626 up with the array of pieces of paper on the board 565 00:22:50,846 --> 00:22:53,836 and when I challenged myself and when I challenged our volunteer 566 00:22:53,836 --> 00:22:57,406 to find me the number 50 both he and I were able 567 00:22:57,406 --> 00:22:59,226 to leverage one assumption. 568 00:22:59,226 --> 00:23:01,246 And that assumption in those examples was what? 569 00:23:01,846 --> 00:23:05,096 That the phonebook was sorted alphabetically 570 00:23:05,286 --> 00:23:08,256 and that the array of numbers on the board behind the second row 571 00:23:08,256 --> 00:23:10,586 of paper itself was sorted alphabetically. 572 00:23:10,826 --> 00:23:13,026 So that seemed to be a detail we can leverage. 573 00:23:13,026 --> 00:23:15,446 And yet we seemed to have a different problem here, right? 574 00:23:15,446 --> 00:23:19,316 Really there's no work to be done if I am handed all 575 00:23:19,316 --> 00:23:21,276 of the cups in sorted order. 576 00:23:21,276 --> 00:23:24,236 There's no work to be done if I'm handed all of the arrays 577 00:23:24,236 --> 00:23:25,836 in sorted order so, you know, 578 00:23:25,836 --> 00:23:27,896 if I demand that you give me this assumption 579 00:23:27,896 --> 00:23:30,836 that the cups are already sorted and then I'll sort them for you, 580 00:23:31,136 --> 00:23:33,146 I mean, this is kind of a cyclical argument. 581 00:23:33,146 --> 00:23:34,826 We're really not solving the problem. 582 00:23:35,086 --> 00:23:37,476 But there was an idea that we exercised on the board. 583 00:23:37,476 --> 00:23:39,706 And there was an idea that we exercised with the phonebook 584 00:23:39,916 --> 00:23:43,526 which was this notion of dividing and conquering. 585 00:23:43,526 --> 00:23:46,236 Taking the problem, recognizing that you know what, 586 00:23:46,236 --> 00:23:49,286 even though this is a pretty big problem size 8 in this case 587 00:23:49,606 --> 00:23:53,086 and last time it was size 8 or in the case of the papers 588 00:23:53,086 --> 00:23:55,856 in size of a thousand roughly with the phonebook, 589 00:23:56,436 --> 00:23:57,326 I assume these are 590 00:23:57,326 --> 00:23:59,086 in a perfectly straight line they won't quite fit. 591 00:23:59,496 --> 00:24:04,546 So if I try to apply the same logic, well how can I divide 592 00:24:04,546 --> 00:24:05,486 and conquer this problem. 593 00:24:05,486 --> 00:24:07,036 This is kind of an unwieldy problem. 594 00:24:07,036 --> 00:24:08,126 There's 8 cups here. 595 00:24:08,366 --> 00:24:10,506 I know it's gonna take me a lot of work on the order 596 00:24:10,506 --> 00:24:12,236 of N squared steps to sort these things. 597 00:24:12,516 --> 00:24:15,596 But can I borrow that idea of taking a problem and dividing it 598 00:24:15,596 --> 00:24:16,706 into something smaller. 599 00:24:16,706 --> 00:24:17,776 And let's see what happens. 600 00:24:17,776 --> 00:24:20,666 Well here is my array of 8, you know what, 601 00:24:20,896 --> 00:24:22,626 it's too much work to sort all 8. 602 00:24:22,626 --> 00:24:26,116 Let me go ahead and just sort the left half at the moment. 603 00:24:26,116 --> 00:24:28,946 So I'm just gonna conceptually move these aside and now, wow! 604 00:24:29,026 --> 00:24:30,566 Alright, I have the problem 605 00:24:30,756 --> 00:24:33,556 so clearly this algorithm whatever it's gonna be is gonna 606 00:24:33,556 --> 00:24:34,506 be at least twice as fast 607 00:24:34,596 --> 00:24:35,966 because I'm doing half as much work. 608 00:24:36,166 --> 00:24:38,296 Alright, well even this, this is still kind 609 00:24:38,296 --> 00:24:39,606 of unwieldy, sort 4 cups. 610 00:24:39,606 --> 00:24:41,366 So I have to get out the damn scale again and go 611 00:24:41,366 --> 00:24:42,376 through that whole process. 612 00:24:42,636 --> 00:24:44,216 Well let me apply this same logic, 613 00:24:44,216 --> 00:24:46,796 let me divide this problem in half, alright. 614 00:24:46,796 --> 00:24:48,356 And now I'm down to 2 cups. 615 00:24:49,066 --> 00:24:51,796 Even this, sort the-- I have to get the damn scale again, right? 616 00:24:51,796 --> 00:24:52,546 It's a lot of work. 617 00:24:52,546 --> 00:24:53,526 Can I do better? 618 00:24:53,756 --> 00:24:55,796 In the phonebook, I just kept dividing and dividing 619 00:24:55,796 --> 00:24:58,106 and dividing into half and in the end I found Mike Smith 620 00:24:58,106 --> 00:24:59,366 or I found the number 50. 621 00:24:59,536 --> 00:25:00,676 Well, let's see what happens here. 622 00:25:00,676 --> 00:25:02,126 Alright, here's the list of size 2. 623 00:25:02,436 --> 00:25:03,906 I still don't want to do all these work. 624 00:25:03,906 --> 00:25:08,836 Let me divide this into 2 lists of size 1 and now done, right? 625 00:25:08,896 --> 00:25:10,946 I have sorted with the smaller problem 626 00:25:10,946 --> 00:25:13,856 because that smaller problem right now is of size 1 627 00:25:14,126 --> 00:25:15,626 and so it's sort of obviously the case 628 00:25:15,626 --> 00:25:16,896 that this cup is now sorted. 629 00:25:16,896 --> 00:25:19,796 In fact, better than that, this guy is already sorted as well 630 00:25:19,796 --> 00:25:22,036 because I whittled that problem down to size 1. 631 00:25:22,486 --> 00:25:24,176 Now, you might think me an idiot, right? 632 00:25:24,176 --> 00:25:27,456 Because this is not really doing any work it's just talking 633 00:25:27,456 --> 00:25:28,216 about doing work. 634 00:25:28,726 --> 00:25:30,736 But what remains to be done, right? 635 00:25:30,736 --> 00:25:33,326 So supposed that this one-- I'll just guesstimate, 636 00:25:33,326 --> 00:25:34,536 this one feels heavier. 637 00:25:34,786 --> 00:25:36,386 So they're still out of order. 638 00:25:36,386 --> 00:25:40,256 If I want this one over here, so what work remains to be done? 639 00:25:40,366 --> 00:25:42,666 This is sorted, this is sorted, 640 00:25:42,666 --> 00:25:44,406 how do I now make a list of size 2? 641 00:25:44,406 --> 00:25:45,026 [ Inaudible Remark ] 642 00:25:45,026 --> 00:25:47,726 >> So I'm gonna have to compare them, right? 643 00:25:47,726 --> 00:25:50,386 So at the end of the day we're not gonna get out of this need 644 00:25:50,386 --> 00:25:53,046 to compare, at least in this context of cups 645 00:25:53,086 --> 00:25:55,026 that have weights or ints that have values. 646 00:25:55,306 --> 00:25:57,056 So I'm gonna need to do one comparison. 647 00:25:57,056 --> 00:25:58,876 So I can do this really simply with a scale 648 00:25:58,876 --> 00:26:00,136 or just with my own arms. 649 00:26:00,356 --> 00:26:01,606 Indeed, this thing is heavier. 650 00:26:01,836 --> 00:26:05,046 And so now when I have the list here, let's see, 651 00:26:05,296 --> 00:26:06,326 yours is facing this way, 652 00:26:06,326 --> 00:26:08,806 so small is gonna be here large is gonna be here. 653 00:26:09,046 --> 00:26:11,686 So here is a list of size 2, this is the light one, 654 00:26:11,866 --> 00:26:12,806 this is the heavier one. 655 00:26:13,056 --> 00:26:15,216 I'm done sorting a list of size 2. 656 00:26:15,566 --> 00:26:17,896 Now we still have all this work to do. 657 00:26:17,896 --> 00:26:19,196 Let's see what happens next. 658 00:26:19,196 --> 00:26:21,696 Well, at this point in the story how had I gotten here? 659 00:26:21,856 --> 00:26:26,186 I first had a list of size 8, then 4, then 2, 660 00:26:26,456 --> 00:26:28,626 but then I had another problem of size 2. 661 00:26:28,626 --> 00:26:30,906 It was the right hand side that I ignored for the moment. 662 00:26:31,136 --> 00:26:32,536 So, let's finish that part of the story. 663 00:26:32,536 --> 00:26:33,326 This one's done. 664 00:26:33,786 --> 00:26:36,966 So now I have this right hand side of the 4-cup problem. 665 00:26:36,966 --> 00:26:38,046 How do I sort these things? 666 00:26:38,456 --> 00:26:39,046 I don't know. 667 00:26:39,046 --> 00:26:40,096 Let's divide and conquer. 668 00:26:40,096 --> 00:26:43,256 Divide each of these into a list of size 1, done. 669 00:26:43,256 --> 00:26:45,896 Done. Well now what remains to be done next? 670 00:26:46,736 --> 00:26:48,706 Well I need to merge these two lists. 671 00:26:48,896 --> 00:26:51,716 This list is sorted and that is, you know, stupid to say 672 00:26:51,716 --> 00:26:52,816 but it's very much correct. 673 00:26:53,036 --> 00:26:54,216 This list is sorted. 674 00:26:54,386 --> 00:26:55,776 Now which order do they go in? 675 00:26:55,776 --> 00:26:56,906 I'm gonna have to compare. 676 00:26:56,906 --> 00:26:58,316 Now I won't bother with the scale anymore. 677 00:26:58,316 --> 00:26:59,726 This one feels heavier. 678 00:26:59,996 --> 00:27:04,846 So, I'm gonna say that this goes here and this goes here. 679 00:27:05,326 --> 00:27:07,596 Alright, now at what point in the story am I? 680 00:27:07,786 --> 00:27:09,476 Alright, this I haven't even touched yet. 681 00:27:09,716 --> 00:27:12,156 This list is sorted, this list is sorted 682 00:27:12,226 --> 00:27:14,496 but they could be intermingled. 683 00:27:14,496 --> 00:27:16,896 What do I now need to do in order 684 00:27:16,896 --> 00:27:18,796 to sort the whole left half? 685 00:27:19,126 --> 00:27:19,456 >> Compare them all. 686 00:27:19,616 --> 00:27:21,506 >> Alright, now you kinda need to compare them all, right? 687 00:27:21,506 --> 00:27:24,686 Because this could be one-- this could be let's say 4 688 00:27:24,686 --> 00:27:28,106 and 5 pounds and this could be 1 and 2 pounds. 689 00:27:28,196 --> 00:27:32,206 So each respectively, this is sorted 1 and 2 respectively 4 690 00:27:32,206 --> 00:27:33,356 and 5, this is sorted 691 00:27:33,586 --> 00:27:35,106 but clearly they are not in the right order. 692 00:27:35,106 --> 00:27:37,686 So now I have to merge these two lists of size 2. 693 00:27:38,026 --> 00:27:38,906 So how do I do that? 694 00:27:38,906 --> 00:27:39,706 I've got 2 lists. 695 00:27:39,946 --> 00:27:41,106 This one here, this one here, 696 00:27:41,226 --> 00:27:43,276 the lightest elements of each are where? 697 00:27:44,566 --> 00:27:45,626 Alright, it's gonna be on this side. 698 00:27:45,626 --> 00:27:48,396 So it's on the elements on your left that are already known 699 00:27:48,396 --> 00:27:49,786 to be the lightest 'cause I did that. 700 00:27:50,056 --> 00:27:51,306 So now I have to compare these. 701 00:27:51,496 --> 00:27:54,146 This one feels lighter so I'm gonna put him there. 702 00:27:54,406 --> 00:27:55,276 Now what's next? 703 00:27:55,276 --> 00:27:58,236 Well, I might move my hands here 'cause this list only has one 704 00:27:58,236 --> 00:27:58,816 element left. 705 00:27:59,036 --> 00:28:00,266 Now, which one remains? 706 00:28:00,306 --> 00:28:02,716 This one feels lighter, so let's put him here. 707 00:28:02,876 --> 00:28:03,886 Now I have these two. 708 00:28:03,886 --> 00:28:04,946 Which one feels lighter? 709 00:28:05,146 --> 00:28:07,416 Well this one feels lighter, done. 710 00:28:07,656 --> 00:28:09,436 So in the end I am doing comparisons. 711 00:28:09,436 --> 00:28:11,836 And it feel like really this is just one of those 712 00:28:11,836 --> 00:28:13,806 like games the magicians play in the square 713 00:28:13,806 --> 00:28:15,136 where you are literally moving the cups 714 00:28:15,136 --> 00:28:17,126 around accomplishing nothing other than confusion 715 00:28:17,126 --> 00:28:19,646 with the audience but-- which may actually be the case-- 716 00:28:20,236 --> 00:28:23,586 but, I argue that this is actually better. 717 00:28:23,586 --> 00:28:25,436 And in fact, if we count up all 718 00:28:25,436 --> 00:28:28,526 of these silly comparisons I was making verbally I bet I'm gonna 719 00:28:28,526 --> 00:28:30,486 be making fewer in the end than I was 720 00:28:30,486 --> 00:28:32,526 with bubble or with selection. 721 00:28:32,526 --> 00:28:35,246 Because the algorithm I proposed is going to leverage this idea 722 00:28:35,246 --> 00:28:38,536 of recursion which recall was just a piece of jargon we tossed 723 00:28:38,536 --> 00:28:41,016 out at the last-- at the end of last week's lecture, 724 00:28:41,016 --> 00:28:45,536 last time's lecture recursion really in this context refers 725 00:28:45,536 --> 00:28:48,406 to the act of a function calling it's self. 726 00:28:48,626 --> 00:28:49,886 Now, this is really not new. 727 00:28:49,886 --> 00:28:51,836 We just didn't use this word in week zero. 728 00:28:51,836 --> 00:28:55,016 In week zero, when we tore the phonebook in half and half 729 00:28:55,016 --> 00:28:57,826 and half we were recursing through that problem. 730 00:28:57,826 --> 00:29:00,966 We took the problem of size a thousand, we divided it in half. 731 00:29:00,966 --> 00:29:02,056 And what did we do? 732 00:29:02,056 --> 00:29:03,636 We then repeated the same process. 733 00:29:03,966 --> 00:29:05,806 So, the curious thing about recursion is 734 00:29:05,806 --> 00:29:09,216 that pretty much always can you implement this idea 735 00:29:09,216 --> 00:29:11,546 of doing the same thing again and again and again 736 00:29:11,546 --> 00:29:13,326 but with smaller bytes each time. 737 00:29:13,566 --> 00:29:15,666 You can do it with loops, a for loop, a while loop, 738 00:29:15,666 --> 00:29:18,356 a forever loop, a repeat block, whatever language you're using, 739 00:29:18,356 --> 00:29:20,606 whatever you want to call it you can implement these things 740 00:29:20,676 --> 00:29:24,496 iteratively but there is this simplicity you'll find 741 00:29:24,546 --> 00:29:27,406 about the idea of recursion why implement something 742 00:29:27,406 --> 00:29:29,786 with this clunky for loop, with the initialization 743 00:29:29,786 --> 00:29:31,936 and the updates and all of this syntax mess 744 00:29:32,446 --> 00:29:36,086 if you can just write a function once and have it call itself 745 00:29:36,086 --> 00:29:38,206 but with a smaller and smaller argument. 746 00:29:38,206 --> 00:29:39,456 There is an elegance about it. 747 00:29:39,456 --> 00:29:42,526 And you'll find in the end that recursion is a feature 748 00:29:42,526 --> 00:29:45,556 of a language, it allows you to map some very obvious concepts 749 00:29:45,606 --> 00:29:47,796 like the phonebook tearing in half and half and half and half 750 00:29:48,366 --> 00:29:52,506 to actual code without it using some arbitrary human contrived 751 00:29:52,966 --> 00:29:55,176 framework like a for loop or a while loop 752 00:29:55,226 --> 00:29:57,336 which are much more stilted mechanisms. 753 00:29:57,596 --> 00:30:01,726 So I propose this as a new algorithm for sorting N elements 754 00:30:01,766 --> 00:30:04,206 and being 8 in this case or really a thousand in the case 755 00:30:04,206 --> 00:30:05,976 of the phonebook, or anything of larger size. 756 00:30:06,046 --> 00:30:10,326 >> So given some input of N elements, first, if N is less 757 00:30:10,326 --> 00:30:11,856 than 2, I return immediately. 758 00:30:11,856 --> 00:30:12,386 Why is that? 759 00:30:14,206 --> 00:30:16,606 Right, 'cause that means either I've been handed zero elements 760 00:30:16,606 --> 00:30:18,356 which mean there's really no work to be done 761 00:30:18,356 --> 00:30:20,916 or I've been handed one element which is a vacuous truth 762 00:30:20,916 --> 00:30:22,096 that it's sorted, right? 763 00:30:22,096 --> 00:30:24,276 It's one element like I claimed before it's sorted 764 00:30:24,276 --> 00:30:25,866 and so there's no work to be done. 765 00:30:26,096 --> 00:30:29,196 Now, as obvious a statement as that is in this algorithm, 766 00:30:29,296 --> 00:30:30,806 it turns out that is the key 767 00:30:30,806 --> 00:30:33,556 to this whole problem being solved correctly 768 00:30:33,756 --> 00:30:35,946 without my algorithm looping infinitely. 769 00:30:35,946 --> 00:30:38,966 I need that sanity check saying at some point, you have to stop. 770 00:30:39,196 --> 00:30:41,846 And here are the criteria for stopping if N is less than 2. 771 00:30:42,176 --> 00:30:45,946 So if N is not less than 2, that is I have two elements or more 772 00:30:45,946 --> 00:30:47,886 in which case there's definitely some sorting to be done. 773 00:30:48,176 --> 00:30:49,396 I argue that, you know what? 774 00:30:49,646 --> 00:30:52,496 I can tell you how to sort N elements by saying 775 00:30:52,496 --> 00:30:53,916 in kind of a snide way. 776 00:30:54,236 --> 00:30:55,646 Alright, you want me to sort N elements. 777 00:30:55,646 --> 00:30:58,496 I'm gonna tell you go sort the left, go sort the right, 778 00:30:58,726 --> 00:31:00,456 and then I'll merge the two together. 779 00:31:00,756 --> 00:31:02,686 Now, it's kind of a cyclical argument here 780 00:31:02,686 --> 00:31:06,066 because how do you sort the left half of N elements? 781 00:31:06,246 --> 00:31:07,956 Well, you need an algorithm for sorting. 782 00:31:08,206 --> 00:31:10,386 But why don't I apply this same logic? 783 00:31:11,006 --> 00:31:14,726 And so the fact that in this whole slide here, this algorithm 784 00:31:14,726 --> 00:31:17,016 for sorting, I'm using the verb sort. 785 00:31:17,276 --> 00:31:21,116 It implies that this algorithm is calling itself again 786 00:31:21,116 --> 00:31:23,836 and again, and again, and on each time the size 787 00:31:23,836 --> 00:31:27,066 of the problem I'm trying to sort is being divided by what? 788 00:31:27,856 --> 00:31:30,696 By two, by two, by two, and here it just conceptually is why this 789 00:31:30,696 --> 00:31:32,566 thing doesn't infinitely loop. 790 00:31:32,566 --> 00:31:35,666 At some point, I'm gonna pass an N divided by two elements, 791 00:31:35,666 --> 00:31:37,556 N divided by two elements but at some point, 792 00:31:37,556 --> 00:31:40,086 N divided by 2 is going to give me 1. 793 00:31:40,466 --> 00:31:43,516 And at that point, the algorithm is going to return immediately 794 00:31:43,706 --> 00:31:46,086 so I'm not gonna keep asking the same question again 795 00:31:46,086 --> 00:31:46,696 and again, and again. 796 00:31:46,696 --> 00:31:48,636 It will bottom out at some point. 797 00:31:48,826 --> 00:31:50,786 So what does this allow me to do? 798 00:31:50,786 --> 00:31:53,146 Well, let me come up with a sample set of elements here 799 00:31:53,556 --> 00:31:55,856 and let me go ahead and just get some fresh white board space. 800 00:31:55,856 --> 00:31:59,076 It's a little distracting using the cups since they don't fit 801 00:31:59,476 --> 00:32:03,436 on table and also, you can't see what they previously 802 00:32:03,436 --> 00:32:03,826 looked like. 803 00:32:03,826 --> 00:32:06,386 So let's try to do this chronologically top to bottom. 804 00:32:06,426 --> 00:32:13,636 If I start off with fou, two, six, eight, one, three, seven, 805 00:32:13,806 --> 00:32:18,426 five, so my list is of size N equals 8 at the moment. 806 00:32:18,676 --> 00:32:20,106 The algorithm on mine professed 807 00:32:20,106 --> 00:32:21,886 to be implementing now is this thing. 808 00:32:21,886 --> 00:32:23,376 So I'm giving N equals 8. 809 00:32:23,456 --> 00:32:25,616 So N is not less than 2 so I don't return 810 00:32:25,916 --> 00:32:27,186 and so I proceed to do what? 811 00:32:27,356 --> 00:32:29,966 Well, first I have to sort the left half of these elements. 812 00:32:29,966 --> 00:32:31,326 Well, let me think about what that means. 813 00:32:31,626 --> 00:32:36,506 Here's the left half, so now I have a problem of size 4. 814 00:32:36,606 --> 00:32:38,336 And what do I do? 815 00:32:38,336 --> 00:32:41,426 So in your mind, if you are now the computer program 816 00:32:41,856 --> 00:32:44,686 and you are executing this thing from top to bottom, 817 00:32:44,916 --> 00:32:49,006 what just has happened verbally is we are stepping into the line 818 00:32:49,006 --> 00:32:52,356 of code that says sort left half of elements. 819 00:32:52,816 --> 00:32:54,946 So at this point in the story, we're not gonna forge ahead 820 00:32:54,946 --> 00:32:56,986 to here or to here, or the last two lines. 821 00:32:57,236 --> 00:32:59,496 We're gonna focus on this line in particular 822 00:32:59,496 --> 00:33:01,106 and ask ourselves how do you sort the left half. 823 00:33:01,596 --> 00:33:03,126 Well, let's call this same algorithm, 824 00:33:03,176 --> 00:33:04,836 on input of four elements. 825 00:33:05,236 --> 00:33:06,996 Check first is N less than 2. 826 00:33:07,286 --> 00:33:08,566 No. So let's proceed. 827 00:33:08,876 --> 00:33:09,756 How do I sort this? 828 00:33:09,916 --> 00:33:12,016 Alright, let me sort the left half. 829 00:33:12,056 --> 00:33:14,576 That's gonna be these two elements here, alright. 830 00:33:14,766 --> 00:33:16,676 How do I sort the left half of these elements? 831 00:33:16,676 --> 00:33:18,306 Well, let's call this same algorithm. 832 00:33:18,306 --> 00:33:20,946 On input of two elements, is less than 2? 833 00:33:20,946 --> 00:33:22,406 No, 'cause it equals 2. 834 00:33:22,746 --> 00:33:23,736 So what do I do next? 835 00:33:23,736 --> 00:33:25,996 I sort the left half of elements, alright. 836 00:33:25,996 --> 00:33:26,826 So what's the left half? 837 00:33:27,296 --> 00:33:28,506 It's just this guy here. 838 00:33:28,506 --> 00:33:29,546 And this is where it feels 839 00:33:29,546 --> 00:33:32,066 like we're not really doing anything except talking 840 00:33:32,066 --> 00:33:32,776 about sorting. 841 00:33:33,076 --> 00:33:37,366 I now have a list of size 1 so N is, in fact, less than 2. 842 00:33:37,366 --> 00:33:39,516 One is less than 2 so I return. 843 00:33:39,846 --> 00:33:42,836 And this is now consistent with my claim 844 00:33:43,116 --> 00:33:45,846 that I have sorted a list of size N equals 1. 845 00:33:46,016 --> 00:33:47,986 Now, in the big picture, that doesn't seem all 846 00:33:47,986 --> 00:33:49,626 that useful yet, but let's get there. 847 00:33:49,836 --> 00:33:52,376 So now, at this point in the story, I've just returned. 848 00:33:52,666 --> 00:33:57,396 So if you now reverse the story that we're telling here, 849 00:33:57,396 --> 00:34:00,666 what was the next line supposed to be after sorting left half? 850 00:34:01,256 --> 00:34:02,236 Sort the right half, right? 851 00:34:02,236 --> 00:34:05,206 So we just zoomed in on this line of codes. 852 00:34:05,206 --> 00:34:07,906 Sort the left half and that left half was only of size 1. 853 00:34:07,906 --> 00:34:10,616 As soon as I return, I get to resume this story 854 00:34:10,846 --> 00:34:12,076 which means sort the right half, 855 00:34:12,466 --> 00:34:14,366 so that means sort this half here. 856 00:34:14,706 --> 00:34:18,186 Well, N is, again, less than 2 'cause N is 1 now. 857 00:34:18,486 --> 00:34:21,626 So I have now sorted another list of size 1, 858 00:34:21,986 --> 00:34:22,926 and now I'm done with that. 859 00:34:23,116 --> 00:34:24,156 The function returns. 860 00:34:24,156 --> 00:34:25,536 And what line of code am I on now? 861 00:34:26,276 --> 00:34:27,716 The merge, the sorted halves. 862 00:34:28,046 --> 00:34:30,516 So this is where recursion gets a little trippy-- 863 00:34:30,516 --> 00:34:32,506 certainly initially, and that you have to kind 864 00:34:32,506 --> 00:34:35,076 of keep diving deeper, deeper, deeper into the problem. 865 00:34:35,296 --> 00:34:37,706 And then as soon as the function returns, you kinda have 866 00:34:37,846 --> 00:34:41,346 to unwind but remember where you were just a moment ago. 867 00:34:41,346 --> 00:34:43,006 So that's where it's easy to get lost, but let's see 868 00:34:43,006 --> 00:34:45,506 if these lines in the picture help us stay on track. 869 00:34:45,606 --> 00:34:47,446 So now, I need to merge the sorted halves. 870 00:34:47,446 --> 00:34:48,136 How do I do this? 871 00:34:48,446 --> 00:34:50,496 Well, it's pretty obvious to do as a human, but let's come 872 00:34:50,496 --> 00:34:51,816 up with a simple heuristic. 873 00:34:52,106 --> 00:34:56,496 I'm gonna use two integers-- two variables rather. 874 00:34:56,496 --> 00:34:59,016 I'm gonna use my left finger and my right finger and each 875 00:34:59,016 --> 00:35:01,626 of which is gonna point at the start of each of this list. 876 00:35:01,796 --> 00:35:03,886 So at the moment left hand is at the start of this list 877 00:35:03,886 --> 00:35:07,326 of size 1, my right hand is at the start of this list 878 00:35:07,326 --> 00:35:09,916 of size 1, and now I need to merge these two lists. 879 00:35:10,036 --> 00:35:11,056 How do I merge them? 880 00:35:11,326 --> 00:35:12,436 Well, I have to compare. 881 00:35:12,726 --> 00:35:14,816 So compare the front of this list which is 4 882 00:35:14,816 --> 00:35:16,556 against the front of this list which is 2. 883 00:35:16,886 --> 00:35:18,196 Which one is obviously smaller? 884 00:35:18,196 --> 00:35:20,926 Two, so now let me drop down 2. 885 00:35:21,146 --> 00:35:23,686 Now what? Well, let's consider these as lists. 886 00:35:23,716 --> 00:35:24,876 They're pretty small right now, 887 00:35:25,116 --> 00:35:26,826 but supposed there could be more numbers, 888 00:35:27,006 --> 00:35:28,676 so let me increment my right hand. 889 00:35:29,006 --> 00:35:30,866 So it's now pointing to the end of the list. 890 00:35:30,866 --> 00:35:32,456 There's nothing there, so now I'm done. 891 00:35:32,676 --> 00:35:35,276 I can now just blindly copy what remains 892 00:35:35,276 --> 00:35:36,576 in the previous list here. 893 00:35:37,126 --> 00:35:39,336 So how many comparisons did that take here? 894 00:35:39,956 --> 00:35:42,856 Well, it looks like it took 1 in this case or it involves-- 895 00:35:42,856 --> 00:35:43,766 we can put it another way, 896 00:35:44,126 --> 00:35:46,246 merging those two lists involved looking 897 00:35:46,476 --> 00:35:49,096 at two numbers, 1, 2, and that's it. 898 00:35:49,096 --> 00:35:50,486 So, tuck that away for just a moment. 899 00:35:50,716 --> 00:35:54,006 So now, I have a list that's sorted of size 2. 900 00:35:54,116 --> 00:35:56,236 Where am I in the original story? 901 00:35:57,546 --> 00:35:58,286 What comes next? 902 00:35:59,246 --> 00:36:00,176 Sort the right half. 903 00:36:00,176 --> 00:36:03,586 So again, if you're unwinding what's going on here, this-- 904 00:36:03,586 --> 00:36:05,856 we sorted the left half which meant sort the left half, 905 00:36:05,896 --> 00:36:07,236 then the right half then the merge. 906 00:36:07,536 --> 00:36:10,476 Well, then we rewind which means now sort the right half. 907 00:36:10,766 --> 00:36:12,076 So let's do this a little faster. 908 00:36:12,076 --> 00:36:13,116 I'm sorting the right half 909 00:36:13,116 --> 00:36:15,666 which means sort the left half first, alright? 910 00:36:15,666 --> 00:36:17,366 So I've done sorting of the left half. 911 00:36:17,366 --> 00:36:17,866 Let me do that. 912 00:36:18,446 --> 00:36:19,536 I return immediately. 913 00:36:19,736 --> 00:36:20,936 Now, let me sort the right half. 914 00:36:20,936 --> 00:36:22,056 This, too, is pretty easy. 915 00:36:22,226 --> 00:36:22,796 I get this. 916 00:36:22,946 --> 00:36:25,806 Now, I'm at what point? 917 00:36:25,986 --> 00:36:28,096 Merge. Now this is pretty easy, this one, 918 00:36:28,096 --> 00:36:29,486 but let's use the same heuristics. 919 00:36:29,486 --> 00:36:32,686 So finger-pointing at the start of each list, 6 is indeed less 920 00:36:32,726 --> 00:36:34,676 than 8 so I'm gonna write it down first. 921 00:36:34,846 --> 00:36:36,296 Now, I'm gonna advance this finger. 922 00:36:36,296 --> 00:36:37,546 I'm out of numbers in this list 923 00:36:37,636 --> 00:36:39,386 so I can just go ahead and right down 8. 924 00:36:39,776 --> 00:36:43,366 And again, how many numbers have I touched physically this time? 925 00:36:43,736 --> 00:36:46,376 Just two, I touched 6 once and then I moved ahead, 926 00:36:46,446 --> 00:36:48,236 and then I touched 8 and I moved ahead, 927 00:36:48,506 --> 00:36:49,786 not touching any other number. 928 00:36:49,786 --> 00:36:52,446 So at this point in the story-- at this level in the story, 929 00:36:52,446 --> 00:36:56,786 if you will, I've touched one, two, three, four numbers total 930 00:36:57,216 --> 00:36:59,426 or I've made two comparisons. 931 00:36:59,456 --> 00:37:00,476 But let's just count in terms 932 00:37:00,476 --> 00:37:02,256 of the numbers I'm touching 'cause that'll be useful 933 00:37:02,256 --> 00:37:02,986 in just a moment. 934 00:37:03,286 --> 00:37:05,176 So now, what remains to be done? 935 00:37:06,086 --> 00:37:07,636 What step is next? 936 00:37:07,886 --> 00:37:08,946 So now, I have to merge. 937 00:37:08,946 --> 00:37:10,506 Now, don't be distracted by the fact 938 00:37:10,506 --> 00:37:12,446 that they already appear to be sorted. 939 00:37:12,446 --> 00:37:13,446 That's just good luck. 940 00:37:13,686 --> 00:37:15,396 I still have to do this process and here is 941 00:37:15,396 --> 00:37:18,276 where the finger thing gets a little more useful 'cause I have 942 00:37:18,276 --> 00:37:19,026 longer lists. 943 00:37:19,026 --> 00:37:20,916 Just point at the start of this list, and this one 944 00:37:21,216 --> 00:37:22,346 so which number comes first? 945 00:37:22,346 --> 00:37:24,336 Obviously, two, now what do I do? 946 00:37:24,336 --> 00:37:25,686 Well, let's do the same thing as before. 947 00:37:25,686 --> 00:37:28,596 Advance this pointer-- this finger, to the next element 948 00:37:28,596 --> 00:37:30,866 of the list which is 4, make the comparison. 949 00:37:30,866 --> 00:37:31,986 It's obviously 4. 950 00:37:32,326 --> 00:37:34,806 I'm out of numbers here, so now I can blindly write 951 00:37:34,806 --> 00:37:36,286 down the rest of these numbers. 952 00:37:36,476 --> 00:37:38,976 But again, the key takeaway there was how many numbers did I 953 00:37:38,976 --> 00:37:39,956 touch on that pass? 954 00:37:40,516 --> 00:37:43,846 One, two, three, four, so on each level 955 00:37:43,846 --> 00:37:46,156 of the picture I'm drawing, I've seem only 956 00:37:46,156 --> 00:37:48,096 to be looking at each number once. 957 00:37:48,346 --> 00:37:50,176 And again, that's one of the key takeaways here. 958 00:37:50,416 --> 00:37:52,686 Just contrast this for a brief moment to something 959 00:37:52,686 --> 00:37:55,686 like Selection Sort which from the get go had a ridiculous 960 00:37:55,686 --> 00:37:58,396 amount of redundancy comparing the same damn numbers again 961 00:37:58,396 --> 00:37:59,156 and again, and again. 962 00:37:59,436 --> 00:38:01,696 We don't seem to be doing that just yet, certainly not 963 00:38:01,696 --> 00:38:03,356 as badly, alright, so at this point 964 00:38:03,356 --> 00:38:05,656 in the story I have a sorted list of size 4. 965 00:38:05,656 --> 00:38:07,376 What's the next step in the algorithm? 966 00:38:08,516 --> 00:38:09,356 Sort right half. 967 00:38:09,356 --> 00:38:11,746 Now, I'll do this a bit more on automatic pilot 968 00:38:11,746 --> 00:38:13,176 so that it doesn't get too tedious, 969 00:38:13,576 --> 00:38:14,936 but let's do the same thing. 970 00:38:15,026 --> 00:38:15,736 Here is the right half. 971 00:38:16,546 --> 00:38:17,656 Here is the left half. 972 00:38:18,236 --> 00:38:20,116 Here is the left half of the left half 973 00:38:20,456 --> 00:38:21,946 which is a trivial list of 1. 974 00:38:22,256 --> 00:38:25,706 Here is the right half of the left half, trivial list of 1. 975 00:38:25,936 --> 00:38:28,826 Now, I have to do the merging so I do pointer, pointer. 976 00:38:29,016 --> 00:38:30,406 I end up getting 1 and 3. 977 00:38:30,706 --> 00:38:32,286 Now, I do the same process here. 978 00:38:32,736 --> 00:38:35,066 I get a 7, same process here. 979 00:38:35,426 --> 00:38:36,166 I get a 5. 980 00:38:36,166 --> 00:38:37,186 I now have to merge. 981 00:38:37,186 --> 00:38:39,026 It's a little more interesting now, 5. 982 00:38:39,376 --> 00:38:42,076 I'm out of numbers so I can just write down the 7 and now, 983 00:38:42,346 --> 00:38:44,016 if you're following along, what step comes next? 984 00:38:44,736 --> 00:38:44,836 >> Merge. 985 00:38:45,176 --> 00:38:45,976 >> Okay, merge. 986 00:38:45,976 --> 00:38:48,256 It turns out that they already look pretty good, 987 00:38:48,256 --> 00:38:49,766 but let's see what happens. 988 00:38:50,016 --> 00:38:51,646 One, then I advance this finger. 989 00:38:51,806 --> 00:38:53,086 I'm pointing at 3 and 5. 990 00:38:53,086 --> 00:38:53,906 I take the 3. 991 00:38:54,216 --> 00:38:55,116 I advance this finger. 992 00:38:55,116 --> 00:38:57,556 I'm now pointing at just the 5, then the 7. 993 00:38:57,926 --> 00:39:00,586 Now, at this point in the story, one, two, three, four, 994 00:39:00,766 --> 00:39:02,346 so notice what seems to be happening here. 995 00:39:02,346 --> 00:39:04,006 On each level in the picture, 996 00:39:04,356 --> 00:39:06,686 I'm touching each number once, alright. 997 00:39:06,686 --> 00:39:08,236 So that's gonna be important in just a moment. 998 00:39:08,426 --> 00:39:09,536 But we have one last step 999 00:39:09,716 --> 00:39:12,506 which is obviously merge the sorted halves. 1000 00:39:12,506 --> 00:39:15,426 So on this half, I'm gonna put my finger on the 2. 1001 00:39:15,426 --> 00:39:17,226 This half, I'm gonna put my finger on the 1 1002 00:39:17,386 --> 00:39:19,156 and notice I can do the same thing. 1003 00:39:19,516 --> 00:39:21,566 I've realized 1 versus 2, 1 wins. 1004 00:39:21,806 --> 00:39:23,206 I advance this finger to 3. 1005 00:39:23,206 --> 00:39:25,956 I then compare 2 and 3 and as you might imagine, 1006 00:39:25,956 --> 00:39:29,756 if I repeat this process, again, again, again, and again, 1007 00:39:30,436 --> 00:39:34,136 the list is hopefully, now [laughter] one, 1008 00:39:34,136 --> 00:39:35,756 two, three, four, five. 1009 00:39:35,756 --> 00:39:37,256 [ Laughter ] 1010 00:39:37,256 --> 00:39:44,186 >> Wow, okay, nicely done, alright, [laughter] alright. 1011 00:39:44,326 --> 00:39:46,746 I'm getting a lot of credibility today. 1012 00:39:47,076 --> 00:39:51,186 Alright, so the list is now hopefully sorted correctly 1013 00:39:51,186 --> 00:39:51,866 and it is, in fact. 1014 00:39:52,076 --> 00:39:54,156 Alright, so then, this begs the question. 1015 00:39:54,156 --> 00:39:55,176 What was the point of all this? 1016 00:39:55,176 --> 00:39:57,176 'Cause frankly, that took a lot more time to explain. 1017 00:39:57,176 --> 00:39:57,786 I'm out of breath. 1018 00:39:58,116 --> 00:39:59,746 There was lot of work it seemed at the board. 1019 00:40:00,066 --> 00:40:00,946 Was it, in fact, better? 1020 00:40:01,076 --> 00:40:03,466 >> Well, we saw the teaser in terms of that animation 1021 00:40:03,466 --> 00:40:05,806 that suggests this merge sort algorithm when implemented 1022 00:40:05,806 --> 00:40:07,976 by a computer is absolutely faster. 1023 00:40:07,976 --> 00:40:09,666 It blows selection and Bubble Sorts 1024 00:40:09,666 --> 00:40:11,816 out of the water, but why is that? 1025 00:40:12,076 --> 00:40:14,696 Well, notice again, on each level of the tree, 1026 00:40:15,356 --> 00:40:18,316 when I actually do the merging, so this remember, 1027 00:40:18,316 --> 00:40:20,766 is when I wrote down just the numbers when I bottomed 1028 00:40:20,766 --> 00:40:22,116 out with the list of size 1. 1029 00:40:22,406 --> 00:40:25,566 When I started merging, I did merging three times 1030 00:40:25,566 --> 00:40:28,656 across this whole board, this level, this level, 1031 00:40:28,656 --> 00:40:30,386 and this level, and each time 1032 00:40:30,386 --> 00:40:32,476 because of the way I was advancing my fingers I touched 1033 00:40:32,476 --> 00:40:33,656 each number just once. 1034 00:40:34,056 --> 00:40:38,466 So if on each iteration of merging I'm doing eight things 1035 00:40:38,466 --> 00:40:41,176 or more generally, N. That then begs the question, 1036 00:40:41,176 --> 00:40:44,026 how many levels of this tree are there actually? 1037 00:40:44,316 --> 00:40:46,576 Ignore this one here 'cause this is when I was just writing 1038 00:40:46,576 --> 00:40:48,416 down the original list of size 1. 1039 00:40:48,476 --> 00:40:50,286 Well, here is when I started doing the merging. 1040 00:40:50,626 --> 00:40:51,936 So there's obviously three rows 1041 00:40:52,416 --> 00:40:54,656 but conceptually, how did I get to 3? 1042 00:40:54,916 --> 00:40:58,646 Well, here is a list of size N. How many times can you divide a 1043 00:40:58,646 --> 00:41:02,386 list of size N by 2, right? 1044 00:41:02,386 --> 00:41:07,486 So I can go from 8 to 4 to 2 to 1, so that's 3. 1045 00:41:07,866 --> 00:41:10,856 And in fact, mathematically if it's-- 1046 00:41:10,896 --> 00:41:11,866 don't really have room for this. 1047 00:41:11,946 --> 00:41:14,816 But mathematically, we've mentioned this before, 1048 00:41:14,926 --> 00:41:17,926 log N or really to be precise, log base 2 of N, 1049 00:41:17,926 --> 00:41:20,346 is the way you express this mathematically. 1050 00:41:20,636 --> 00:41:25,386 How many times you can divide N by 2 before you get down to 1? 1051 00:41:25,456 --> 00:41:29,046 And in this case, we go from 8 to 4 to 2 to 1 three times 1052 00:41:29,046 --> 00:41:31,476 and then on each iteration of this algorithm, 1053 00:41:31,516 --> 00:41:34,746 each pass across the board I'm touching N numbers, 1054 00:41:35,086 --> 00:41:38,756 so that means I'm doing N things, log N times. 1055 00:41:39,036 --> 00:41:40,906 And so what that seems to suggest is 1056 00:41:40,906 --> 00:41:44,896 that merge sort itself when all is said and done is an algorithm 1057 00:41:44,896 --> 00:41:49,376 that runs in log of N times N or rather just 1058 00:41:49,636 --> 00:41:53,876 so the parenthesization is right, N log N. And in fact, 1059 00:41:54,136 --> 00:41:56,596 this algorithm called merge sort does, in fact, 1060 00:41:56,596 --> 00:42:00,216 run in what's called N log N time. 1061 00:42:00,216 --> 00:42:02,376 Now, if you think back to some of the charts last week, 1062 00:42:02,726 --> 00:42:05,066 N squared was one that-- as like a parabola, 1063 00:42:05,066 --> 00:42:06,606 that pretty much took off on the chart 1064 00:42:06,606 --> 00:42:08,176 and got really big, really fast. 1065 00:42:08,566 --> 00:42:12,726 N log N is not nearly as good as log N. As a sanity check, 1066 00:42:12,726 --> 00:42:17,386 what algorithm have we seen that runs in log N time? 1067 00:42:17,586 --> 00:42:20,526 So binary search, the phonebook example, binary search 1068 00:42:20,526 --> 00:42:22,486 on the pieces of paper on the white board, why is that? 1069 00:42:22,486 --> 00:42:24,906 'Cause there, you're only dividing the problem in half 1070 00:42:24,906 --> 00:42:27,376 and half, and half and half, we are doing that here. 1071 00:42:27,376 --> 00:42:29,456 I'm dividing it in half and half, and half and half, 1072 00:42:29,456 --> 00:42:32,986 but each time I do that there's this merging process. 1073 00:42:33,116 --> 00:42:35,346 But that merging process only takes N steps, 1074 00:42:35,806 --> 00:42:40,046 so that's N times log of N. Now, it's a little tricky to reason 1075 00:42:40,046 --> 00:42:41,436 through this perhaps the first time, 1076 00:42:41,436 --> 00:42:43,376 let's just take a very simple example and see 1077 00:42:43,376 --> 00:42:45,276 if we can do a little sanity check here. 1078 00:42:45,576 --> 00:42:46,766 So I claim the following. 1079 00:42:47,356 --> 00:42:49,216 In general, when we wanna talk about running time, 1080 00:42:49,486 --> 00:42:50,856 we just need a silly place holder, 1081 00:42:50,856 --> 00:42:52,156 so we'll call it T for time. 1082 00:42:52,436 --> 00:42:55,136 So the running time of the problem where the input is 1083 00:42:55,136 --> 00:42:58,686 of size N as expressed here formulaically, T of N, 1084 00:42:58,756 --> 00:43:00,606 the running time of an algorithm, given an input 1085 00:43:00,606 --> 00:43:01,936 of size N. You know what? 1086 00:43:02,206 --> 00:43:04,576 It's gonna be zero if N is less than 2, right. 1087 00:43:04,576 --> 00:43:05,866 This is just statement of the obvious. 1088 00:43:05,866 --> 00:43:07,046 If there's no work to be done, 1089 00:43:07,436 --> 00:43:08,936 the running time is gonna be zero. 1090 00:43:09,186 --> 00:43:10,286 So let me just define that. 1091 00:43:10,286 --> 00:43:13,586 I need a so-called base case even in my analysis here. 1092 00:43:13,906 --> 00:43:16,246 Now, this looks a little scary at first, 1093 00:43:16,246 --> 00:43:18,096 but this is actually just another statement 1094 00:43:18,096 --> 00:43:18,586 of the obvious. 1095 00:43:19,026 --> 00:43:21,936 If I'm using algorithm that I'm now calling merge sort, 1096 00:43:22,286 --> 00:43:25,816 the running time involved in sorting N elements, T of N, 1097 00:43:26,176 --> 00:43:29,496 you know, is just the same as running the algorithm 1098 00:43:29,496 --> 00:43:32,276 on half the size of the elements plus do it again 1099 00:43:32,276 --> 00:43:35,076 for the right half, plus what's this plus N come from? 1100 00:43:36,146 --> 00:43:37,026 That's the merging, right? 1101 00:43:37,026 --> 00:43:38,816 That's the finger thing from left finger 1102 00:43:38,816 --> 00:43:41,126 to right finger going back and forth across the board, 1103 00:43:41,256 --> 00:43:43,586 never touching a number more than once. 1104 00:43:43,586 --> 00:43:46,046 So in other words, every time I merge the point 1105 00:43:46,046 --> 00:43:47,746 that I kept emphasizing verbally there 1106 00:43:47,746 --> 00:43:50,866 and that I'm only touching each number once, means that we have 1107 00:43:50,916 --> 00:43:53,246 to account for the amount of time it takes to merge 1108 00:43:53,546 --> 00:43:56,686 which is going to be just N. Now, this is again one 1109 00:43:56,686 --> 00:43:57,836 of these cyclical answers. 1110 00:43:57,836 --> 00:43:59,766 I ask you for the running time of this algorithm 1111 00:43:59,766 --> 00:44:01,376 and you give me the running time in terms 1112 00:44:01,376 --> 00:44:03,166 of the running time, right. 1113 00:44:03,166 --> 00:44:05,986 So I kinda have to push back now and, you know, push back 1114 00:44:05,986 --> 00:44:06,836 and say, alright, well, 1115 00:44:06,866 --> 00:44:08,896 what's the running time of T divided by N? 1116 00:44:08,896 --> 00:44:11,716 Well, you being the wise-- wise guy could say, well, 1117 00:44:11,716 --> 00:44:14,436 it's T of N divided by 4 plus T of N divided 1118 00:44:14,436 --> 00:44:16,176 by 4 plus and over 2, right? 1119 00:44:16,176 --> 00:44:17,656 We can keep doing this again and again. 1120 00:44:17,986 --> 00:44:20,426 Well, how can we ever get to an actual answer? 1121 00:44:20,756 --> 00:44:23,796 Well, the beauty is in this simple stupid little statement 1122 00:44:23,796 --> 00:44:24,476 of the obvious. 1123 00:44:24,476 --> 00:44:26,866 And this too is where the power of recursion comes 1124 00:44:26,866 --> 00:44:28,046 in in a programming language. 1125 00:44:28,266 --> 00:44:31,186 If you have this so-called base case, can we actually get 1126 00:44:31,186 --> 00:44:33,246 to a definitive answer eventually? 1127 00:44:33,246 --> 00:44:35,666 We're not going to run around in circles forever. 1128 00:44:35,666 --> 00:44:37,216 So again, the algorithm looks like this. 1129 00:44:37,636 --> 00:44:39,466 This is our little sanity check, the base case. 1130 00:44:39,726 --> 00:44:42,596 This is really the guts of the algorithm 'cause this line here, 1131 00:44:43,226 --> 00:44:44,516 someone pointed it out before. 1132 00:44:44,716 --> 00:44:48,436 You can actually say big O of 1, big O of 1 being constant time, 1133 00:44:48,436 --> 00:44:49,506 the same number of steps. 1134 00:44:49,806 --> 00:44:51,436 So this line or these lines of code 1135 00:44:51,436 --> 00:44:55,826 up here are arguably constant time steps to say if N is less 1136 00:44:55,826 --> 00:44:58,986 than 2 in return, that it will always take maybe one step, 1137 00:44:59,036 --> 00:45:02,176 maybe two steps, some number of fixed CPU cycles. 1138 00:45:02,466 --> 00:45:05,466 These things run for variable amounts of time 1139 00:45:05,466 --> 00:45:08,406 because they take input, a list of size of some amount. 1140 00:45:08,926 --> 00:45:09,926 So here is the formula. 1141 00:45:10,206 --> 00:45:13,596 T of N is gonna equal T of N divided by 2 'cause the time 1142 00:45:13,596 --> 00:45:16,186 to sort the left half plus T of N divided by 2, 1143 00:45:16,246 --> 00:45:19,116 the time to sort the right half plus the time to merge. 1144 00:45:19,436 --> 00:45:22,456 And so can we do a little sanity check with the simple case here? 1145 00:45:22,456 --> 00:45:23,126 Well, let's try. 1146 00:45:23,426 --> 00:45:25,296 So supposed that the list is not of size 8, 1147 00:45:25,296 --> 00:45:27,336 let's make it slightly more interesting and double it. 1148 00:45:27,596 --> 00:45:30,606 So supposed that I give you 16 elements to sort, well, 1149 00:45:30,606 --> 00:45:34,366 following the logic before, the running time involved 1150 00:45:34,366 --> 00:45:37,806 in sorting 16 elements is gonna be twice the running time 1151 00:45:37,806 --> 00:45:41,146 of sorting 8 elements, left half and right half plus 16 1152 00:45:41,146 --> 00:45:42,926 and again, a little sanity check, 16 means-- 1153 00:45:42,926 --> 00:45:45,586 just the merge steps, right? 1154 00:45:45,586 --> 00:45:48,146 You have to touch each of the 16 elements once as I go 1155 00:45:48,146 --> 00:45:49,126 across the blackboard 1156 00:45:49,126 --> 00:45:50,926 to actually merge them together, alright. 1157 00:45:50,966 --> 00:45:53,936 So now, let's recurse together, let's dive in 1158 00:45:53,936 --> 00:45:55,966 and hone in on what T of 8 is. 1159 00:45:55,966 --> 00:45:59,996 Well, T of 8 is gonna be 2 times T of 4 plus 8, alright. 1160 00:45:59,996 --> 00:46:00,936 Well, what's T of 4? 1161 00:46:00,936 --> 00:46:05,436 T of 4 is gonna be two times T of 2 plus 4, alright. 1162 00:46:05,636 --> 00:46:07,136 Well, what's T of 2? 1163 00:46:07,396 --> 00:46:11,236 Well, T of 2 is gonna be two times T of 1 plus 1 1164 00:46:11,236 --> 00:46:14,706 and thankfully, this guys is gonna become uninteresting 1165 00:46:14,976 --> 00:46:16,966 to sort one elements takes no time, 1166 00:46:16,966 --> 00:46:18,256 'cause we return immediately. 1167 00:46:18,536 --> 00:46:21,486 So that's why that base case rule was so important. 1168 00:46:22,026 --> 00:46:23,416 So I'll now put these slides online, 1169 00:46:23,656 --> 00:46:25,876 certainly if you are scribbling things down. 1170 00:46:26,106 --> 00:46:27,776 But let's now answer this question. 1171 00:46:27,776 --> 00:46:28,766 What do I wanna now do? 1172 00:46:29,066 --> 00:46:31,136 I had T of 1-- I have an answer now. 1173 00:46:31,496 --> 00:46:33,776 So now let me just do some grade school substitution. 1174 00:46:33,776 --> 00:46:35,896 I need to answer this question, T of 2. 1175 00:46:36,056 --> 00:46:39,736 So let's plug in T of 1 equals zero by changing that, alright. 1176 00:46:39,736 --> 00:46:42,826 So now, T of 2 equals zero plus 2. 1177 00:46:42,976 --> 00:46:44,566 So that's T of 2 equals 2, 1178 00:46:44,926 --> 00:46:48,066 so that means this thing here had better become 2, alright. 1179 00:46:48,116 --> 00:46:50,566 So now, 2 times 2 is 4 plus 4, that's 8. 1180 00:46:50,836 --> 00:46:54,166 So now, I'm gonna plug in 8 here and now finally, 1181 00:46:54,166 --> 00:46:57,126 that's 16 so that's plus 8, so that's 24. 1182 00:46:57,326 --> 00:46:59,346 So now, I need to plug the 24 in there. 1183 00:46:59,676 --> 00:47:04,986 So 2 times 24 plus 16 gives me 64. 1184 00:47:04,986 --> 00:47:10,266 So the running time I claim to sort 16 elements takes 64 steps. 1185 00:47:10,926 --> 00:47:14,666 Now, does this jive with our little asymptotic claim here, 1186 00:47:14,666 --> 00:47:17,326 our little analysis with N notation? 1187 00:47:17,676 --> 00:47:18,986 Well, N is what? 1188 00:47:18,986 --> 00:47:23,156 Sixteen, so that's 16 times log base 2 of 16 1189 00:47:23,156 --> 00:47:25,946 and though I'm writing small here, log base 2 of 16, 1190 00:47:25,946 --> 00:47:28,836 this gives me 4 'cause 2 to the 4 equals 16. 1191 00:47:28,836 --> 00:47:33,276 So this is 16 times 4 equals 64 and though this is not proof 1192 00:47:33,436 --> 00:47:35,056 by any means, it's not a formal proof 1193 00:47:35,056 --> 00:47:37,196 because here is one example that happens to prove my point. 1194 00:47:37,546 --> 00:47:40,666 It at least does corroborate the claim that merge sort 1195 00:47:40,666 --> 00:47:44,926 as we argue intuitively is in fact, N log N in running time. 1196 00:47:44,926 --> 00:47:46,366 And so what then is the takeaway? 1197 00:47:46,366 --> 00:47:50,136 What does a logarithmic running time look like? 1198 00:47:50,946 --> 00:47:55,296 Well, let's go ahead and restore this to, let's say, bubble. 1199 00:47:55,646 --> 00:48:01,996 Let's do selection and let's do merge sort here 1200 00:48:01,996 --> 00:48:04,366 on the right just to see what actually happens. 1201 00:48:04,366 --> 00:48:06,886 And if we zoom in, notice what's happening. 1202 00:48:06,886 --> 00:48:08,706 The fact that the picture is a little prettier 1203 00:48:08,706 --> 00:48:11,266 on the far right is testament to the fact 1204 00:48:11,266 --> 00:48:13,386 that I'm sorting the left half, the left half, the left half, 1205 00:48:13,386 --> 00:48:15,316 and that's why you get these little triangles, triangles, 1206 00:48:15,316 --> 00:48:16,906 triangles, that then get merged together. 1207 00:48:17,236 --> 00:48:20,246 And what does N log N feel like vis-a-vis N squared? 1208 00:48:20,656 --> 00:48:22,306 Well, this, for the third time. 1209 00:48:22,966 --> 00:48:24,946 Why don't we go ahead and take a 5-minute break 1210 00:48:24,946 --> 00:48:26,406 and I'll let this one run to completion. 1211 00:48:27,516 --> 00:48:55,756 [ Pause ] 1212 00:48:56,256 --> 00:48:58,046 >> Alright, we're back. 1213 00:48:58,046 --> 00:49:02,366 So an Internet forward that went around recently is this here, 1214 00:49:02,366 --> 00:49:05,326 YouTube video which is a visualization 1215 00:49:05,326 --> 00:49:09,256 and an audio rendition of what various sorting algorithms might 1216 00:49:09,446 --> 00:49:12,856 sound like if you echo a tone based on the comparisons 1217 00:49:12,856 --> 00:49:14,636 that are being made in the frequency thereof, 1218 00:49:14,856 --> 00:49:16,226 and we've linked this on the website 1219 00:49:16,226 --> 00:49:17,726 if you want to take a look at it. 1220 00:49:17,726 --> 00:49:19,656 I would say that this is more of a-- 1221 00:49:19,656 --> 00:49:21,876 more for fun than perhaps as enlightening 1222 00:49:21,876 --> 00:49:24,966 as the pure visualizations where that we looked at last time. 1223 00:49:24,966 --> 00:49:26,346 And we haven't seen all of these sorts, 1224 00:49:26,556 --> 00:49:27,776 but it's actually quite neat 1225 00:49:27,896 --> 00:49:30,906 to recognize how different the underlying work is 1226 00:49:30,906 --> 00:49:32,456 of each of these algorithms. 1227 00:49:32,456 --> 00:49:35,596 So the first one here is something called insertion sort 1228 00:49:35,896 --> 00:49:40,446 which amounts to going through the list, taking the first thing 1229 00:49:40,446 --> 00:49:44,066 that you see and inserting that element into its correct place, 1230 00:49:44,446 --> 00:49:47,286 then moving on to the next one, dealing with what element-- 1231 00:49:47,286 --> 00:49:48,246 whatever element you're given 1232 00:49:48,246 --> 00:49:49,946 and putting it in its right place. 1233 00:49:49,946 --> 00:49:52,206 And this is in contrast to Selection Sort 1234 00:49:52,206 --> 00:49:53,806 where you're fishing again and again 1235 00:49:54,106 --> 00:49:55,436 for the then smallest element. 1236 00:49:55,436 --> 00:49:57,326 This one, you're just taking what you're given and putting it 1237 00:49:57,326 --> 00:49:58,306 where you think it belongs. 1238 00:49:59,516 --> 00:50:10,516 [ Music ] 1239 00:50:11,016 --> 00:50:11,083 [ Background Music ] 1240 00:50:11,083 --> 00:50:11,886 >> This is Bubble Sort. 1241 00:50:12,516 --> 00:50:37,516 [ Music ] 1242 00:50:38,016 --> 00:50:38,083 [ Background Music ] 1243 00:50:38,586 --> 00:50:38,976 >> Selection Sort. 1244 00:50:39,516 --> 00:50:54,516 [ Music ] 1245 00:50:55,016 --> 00:50:55,083 [ Background Music ] 1246 00:50:55,306 --> 00:50:55,976 >> Merge Sort. 1247 00:50:56,516 --> 00:51:03,266 [ Music ] 1248 00:51:03,766 --> 00:51:10,516 [ Laughter ] 1249 00:51:11,016 --> 00:51:11,083 [ Background Music ] 1250 00:51:11,426 --> 00:51:11,966 >> Gnome Sort. 1251 00:51:12,516 --> 00:51:30,516 [ Music ] 1252 00:51:31,016 --> 00:51:31,083 [ Laughter ] 1253 00:51:32,026 --> 00:51:35,686 >> So, you know, we've got a bunch of juicy topics to dive 1254 00:51:35,686 --> 00:51:38,156 into this week involving pointers and memory. 1255 00:51:38,156 --> 00:51:40,246 Why don't we go ahead and save that for Wednesday? 1256 00:51:40,476 --> 00:51:41,446 We'll see you on Wednesday. 1257 00:51:42,516 --> 00:51:47,158 [ Noise ] 1258 00:51:47,658 --> 00:51:52,300 [ Applause ]