[ Silence ] >> Alright, welcome back to CS50, this is the start of week 4. So, now that you're fully immersed in CS50 it's quite understandable if you want people to know that you're in CS50 and so the teaching fellas have been hard at work putting together this year's line of apparel which includes sweatshirts, CS50 T-shirts and the like. It's kind of a silly tradition that we started a few years ago. And so, I would be remised if I didn't draw your attention to the artwork that they've been working on at store.cs50.net. Can you tune in to any number of designs including duffel bags, if you're the athletic type, law bears if you're more of the stuffed animal type, and then under the new category here do we have a whole bunch of meem themes shirt as well. So feel free to check that out if you have interest. I of course picked the wrong day to wear a sweatshirt under really hot lights so thankfully I came prepared with another item from the--from the store. But one of the teaching fellas also passed long to us recently, a little real world example of what happens when you're not mindful of various data types and you're not mindful of the imprecision that's inherent in representing data in a computer, at least using a language like C and low level primitives like floats and even doubles. So, this is a photograph that someone took on the internet of a real world receipt from a restaurant somewhere I think in the UK and I think the take away will perhaps be cleared pretty fast. [ Laughter ] >> So, it looks like the curry and some other items were really precisely defined in terms of price and this is of course has generated this receipt by some computer, some cash--cash register and they just didn't account for the inherent imprecision so here is a perhaps real world incarnation of that. Now, we concluded last week at looking at sorts and efficiency, some new notation called asymptotic notation and we'll continue that story today. But what I thought I'd do is draw your attention to another little project that the course has been working on. This is essentially a reboot of something we put together a few years ago but the thing had grown in size, this particular app. The data set was getting unwieldy. It was just getting ugly in our eyes. So based on the experience we had with Harvard courses we give you version 3 of HarvardEvents. And this is a tool with which you can navigate all the things that we know about going on campus. It turns out there are hundreds of Google calendars floating around campus, for student groups, for departments, for special events and the like. There are other file formats as well. And being graduates of CS50 we decided to find all of these data, find all of the URL's of people with Google calendars and the like, write a script in a language called PHP which we'll actually introduce toward the end of the semester and then every several hours our script runs, grabs all of these Google calendars and the like from the internet. Parses through them, which means to read through this--what are really just text files, top to bottom, left to right. It imports it into our database and does some fancy indexing as we'll call it later in the term to make searches more efficient. And it also allows us to look for certain key words and themes. And so frankly, what justified the expenditure of time that this took us in recent weeks was this simple little feature of being able now at top left, you click on free food [laughter] and you can filter out all the stuff that's going on [laughter] and this every place on campus you can go between now and the rest of the year to find something to eat. [ Applause ] >> And so if incidentally, you're a representative of a student group and the alike and your Google calendar is not in here, you can check just by flipping through the various--we have over 200 calendars right now and you can search through them here. If someone--you're--one of you're predecessor didn't already submit it, do follow the directions at top left which says, add calendar so that we can augment the data set even further. And you'll find too because of J term this year or winter break where you guys can finally come back onto campus a bit early for various opportunities. You'll find that those events as they get approved and start getting proposed by your classmates, they'll start appearing on the calendar as well in their respective categories. So, more data to come, so this is a clip some of you might have seen in section already. If not, it's sort of become a geek classic. Eric Schmidt is the name of the CEO of a little company called Google and he had the fortune of sitting down with Barrack Obama a few years ago when he was still senator about to run for president and to give him a Google interview. And I think you'll find the topic that came up during that interview fairly apropos. So, I give you this interview. Oops! [ Music ] [ Laughter ] >> I give you first some dance music. Alright, I give you this interview. [ Applause ] >> Now senator, you're here at Google and I like to think of the presidency as a job interview. [Laughter] Now, it's hard to get a job. >> Right. >> As president-- >> Right. >> And you're going to deliver this now. It's also hard to get a job at Google. >> Right. [ Laughter ] >> We have questions and we asks our candidates questions and this one is from Larry Schwimmer. >> Okay. [ Laughter ] >> What--you guys think I'm kidding? It's right here. What is the most efficient way to store a million 32-bit integers? [ Laughter ] >> Well-- >> I'm--maybe-- >> I-- >> I'm sorry, maybe-- >> No, no, no, no. I--I think-- >> That's not a-- >> I--I think the Bubble Sort would be the wrong way to go. [ Laughter ] >> Come on. Who told is this? Okay. I didn't see computer science in his background. >> We--we've got our spies in there. [ Laughter ] >> Why, let's okay--let's ask him a different interview question. >> Alright, so bubble of sorts of course is being poked fun at there for what reasons? Alright, so it's inefficient. It's relatively slow. The upside of it of course though is that's it's pretty darn easy to implement. At least you maybe finding that and if not when you look back at the code you then have written you'll realize that in fact it just requires a few lines of code. And so ease of implementation is actually a very compelling metric against which to measure--do you mind toning my voice down a bit--is a very reasonable measure against which to measure the quality of an algorithm, right. One of the inputs to the problems we're gonna be working on this semester and if you continue on in this world after this course is gonna be how much time it takes, how much thought it takes to implement something, how much time it takes to run something, how much space it takes to run something? And so we'll, oops! That is not the slide I want. And so we'll see this theme of tradeoffs, to be honest, throughout the course. And we'll focus more academically on tradeoffs involving space, bram and disk space and timed CPU cycles. But towards semesters end when you're actually tackling what you probably will make out to be some fairly real world projects for your final project the amount of time it takes and the amount of effort it takes for the programmers, the computer scientists to actually develop something is itself a reasonable resource to consider spending or not spending. Case and point, when I was in graduate school I used a programming language called Perl a lot. It's a scripting language which like PHP is something we'll discuss later in this semester. But I was working on research involving detection of worms, viruses and the spread thereof. And so I was collecting from friends and colleagues' computers all sorts of logs that monitored the kind of low level activity that was happening on their computer. And so part of my research was to analyze these logs and find patterns. The idea being if I find a pattern across multiple people's computers maybe all of those people are infected with the same kind of malware or virus or worm. And so very often I would find myself with gigabytes worth of data, at night I needed to analyze this data and look for this patterns and frankly the reality was sometimes I could spend 10-15 minutes whipping up a little script, a little program that unfortunately would take eight hours or more to run. But I would choose to sit down for these particular projects right around midnight so that when I woke up my program was giving me the results I wanted. And so, in the real world frankly, these kinds of questions, how much time is this gonna take is certainly germane. But of course that process would not scale very well. In fact to write a run those same experience--those same experiments now with the even greater volume of data computers are now producing I'd probably have terabytes worth of data and at that point things just would have broken. It would not have been viable. And so you have to go back to the drawing board and realize Bubble Sort or whatever it is you're using just isn't enough to snuff, I need to come up with something more clever. And so we looked at some alternatives the other day. We looked at something called Selection Sort and that too was pretty straightforward, at least conceptually. I just go down the list selecting the smallest person at a time and then I repeat, repeat, repeat but when we actually did out the math or kind of reason through it, the running time, the asymptotic running time of bub--of Selection Sort was also what? So, N squared. So we introduced this notation big O which generally refers to worst case. And by worst case we mean different things. And the context of sorting, the worst case is your handed a problem that's in complete reverse order because that implies you have to do as more work that could possibly--that you could--you have to do more work than you would of course if things were in perfect order. And so you care--you care about ultimately how much time is my algorithm gonna take to perform on that worst case running time. We also looked at Omega Notation. And this was just a formal way of describing the best case running time and in the case of Selection Sort, what was the best case running time? So, N, N squared, which was it? So, it wasn't one. Okay, so 1 is a viable answer in some contacts that needs constant time, the same amount of time no matter what. But it's definitely not one and in fact it wasn't N in the case of Selection Sort because remember the algorithm we implemented on stage last week had me going back and forth across the stage selecting on iteration, the smallest person I can find, the smallest number and then putting them into place. >> But the problem is I wasn't taking any kind of aggregate view. I was just finding very tunnel vision-like, the smallest elements at that moment in time which means I don't know anything about the other elements other than they are not the smallest and so no matter what with Selection Sort I had to repeat this again and again and again and if you do out the math it's roughly N squared steps in the worst case as well. So Selection Sort, while it might be easier perhaps to think through than Bubble Sort, or maybe it's pretty much equivalent, it's just a different approach to the same problem. It fundamentally didn't feel all that better. But there was something that was better, right? You'll recall that at the very end of class we pulled up this little demo that looked like this. And we had a bunch of drop downs I was able to choose from. I chose Bubble Sort on the left Selection Sort on the right and then something called merge sort on the very right hand side and then I started this all off roughly at the same time and what was frankly striking at least to me at the time was, my God it's done. Like what the heck have we been spending our time for--our time on with Bubble Sort and with Selection Sort and in fact there's plenty of other N squared sorts that we're not even gonna bother looking at. But with this example alone can you realize that with some more exercise of thoughts and intelligent--I don't wanna say intelligent design here, intelligent design can you actually solve this problem so much more efficiently and just as correctly. And so one of the things we'll look at today is how can we leverage an algorithm, how can we implement an algorithm that at least at first glance the second time we've now seen it feels so obviously better. Where is the ingenuity there and how can we leverage that in our own problem solving? So, first, a couple of teasers--oh, that was kind of a jerk move, okay. [ Laughter ] >> Unintentional. So first just a couple of teasers if you are in fact the geeky type, so our friends at Microsoft are having this little nerd party in a few days. We'll post this information on the course's home page. But it looks like they're giving away stuff, they're feeding you stuff, and if you're interested in learning a bit more about Microsoft internships and such they have an office in Kendall Square near MIT and that's where I believe this event is to take place. If you are interested by contrast in new and fun toys our friends at Aces which is a hardware manufacturer which makes things like motherboards which are the innards of computers and other things. They are soon releasing apparently a new tablet like device that looks a little something like this. One of our teaching fellows Willie Yao is one of their ambassadors on campus. And so if you are a bit of a geek and would like to play with this you can go to cs50.net/aces and I don't think Willie will give it to you to keep but you can borrow and play for some amount of time in exchange for some feedback on it. So if you like toys feel free to reach out there--what's that? >> Is it like an Etch-a Sketch? >> Is it like an Etch-a-Sketch? It appears to be useful in that way but I imagine it does more compelling things these days but I have not seen it myself. Alright, so problem set 3. So a bunch of you have already dived into this probably which is fantastic. What I though I'd do is just tease you with some visuals if you haven't gotten to the point of playing with the staff solution so the themes of problem set 3 are two-fold. One is to just get your hands dirty with some of the algorithms, some of the sorting and searching algorithms that we've been discussing now for a couple of weeks and we'll finish looking at this week specifically Bubble Sort, Selection Sort, this thing called merge sort but then the main part of this problem set is to implement a game that you might have received in a little goody bag as a party favor years ago, the game of 15. This is a game where you get a little plastic device, it's got 15 plastic numbers on it and one hole in this little plastic board and you can move those numbers up, down, left to right, and the goal is to take what's a random assortment of tiles and arrange them in numeric order and that's one of these little things you can play sort of absentmindedly. But now, there is an opportunity for us to implement this in code. And for a couple of reasons, one this is the first P set where we're actually gonna give you code to work from. So it's generally called distribution code and whereas for the previous problems that you pretty much started from scratch, blank files you opened up nano and there was nothing there unless you put it there. Now we're gonna hand you some framework, that code that we wrote with some blanks to get you accustomed to the idea of one, writing larger programs than time might allow if you were doing it on your own. And two, to give you a better sense of design especially as your programs get a little more involved and they play or they run for more than just a few seconds time. In fact they interact with the user. You will be able to infer from some of our code how in fact you can implement some more sophisticated programs. Now we'll guide you through it as you'll see in the specs as to what holes you need to fill in within the framework. But you'll see that once you get to final projects frankly, you're not gonna want to implement everything from scratch. In fact, with this website here that I pulled up a moment ago, the events.cs50.net a lot of this stuff, so it definitely took a lot of time. And in fact, one of the lessons you may already be realizing with P sets is that things seem to take twice, three times, four times longer than you actually might think. And frankly, even I never quite learned that lesson. I mean that--what began as, ah we'll do this this weekend turned out to be like a 100-hour project to revamp this whole thing. And it was fun for us but thankfully, we didn't have to implement all this stuff ourselves. If you focus on some of the minutia of a site like this, you know, it's useful to be able to jump to a particular date whether in the future or in the past. Now, this is a pretty familiar widget of little gadget that you might see on most websites today, but my God, what an uninteresting problem to solve ourselves. I don't want to spend the whole weekend implementing just a little pop up calendar and frankly it probably would take at least that long to get something that's as interactive and dynamic as something like this that works across all browsers and so forth. But thankfully now in 2010, we're at the point where there's a whole lot of free and open source software that you can integrate into your projects that other nice people have made available. We're using a library as we'll call it, as we already do in the CS50 library whereby you can bootstrap yourselves to making much more interesting, much more sophisticated projects by standing on the shoulders of others who have come before you. And that's exactly what we did with a widget like that. Some of the search features are we borrowing other people's libraries and framework. So problem set 3 has a little taste of precisely that approach. So let me go ahead and pull up the staff solution to the standard edition for a moment, the game is called 15, it takes one command line argument which is the dimensions of the board 3 by 3, 4 by 4, I'll do it 4 by 4. And this is how we chose to implement the aesthetics of this game. You can get a little fancier. But this is perhaps a very simple example of ASCII art. Now the bottom--the little underscore in the bottom right hand corner represents the blank tile and just as with my thumbs I would move 4 down or maybe 2 to the right. I can do exactly that. If I type 4 and hit Enter the 4 moves down and the blank space essentially moves up. I can now move 5 over, I can now move 9, I can move let's say 8. Well if I didn't want that, 8 again and so forth. And so for the standard edition will you be implementing the mechanics of this game so that you can in fact play against yourself. Now for the hacker edition, if you are feeling up to a bit more of a challenge you'll find that when you run the game, one it's not all that hard to use some ASCII art just make it a little fancier as the teaching fellow who implemented this solution did. But the game here is gonna be played pretty much the same. I can move 9 down, I can move 12 down, I can move 2 down, and I can cheat by entering God mode and if I hit God and hit Enter well now the problem set solves itself for me. And so if you choose to elect the hacker edition of this week's problem set the challenge will be to figure out not only how to implement the game but how to implement a solver for the game. And the PDF will walk you through that. Now this can actually take a while to watch but you'll see that the numbers are moving into position. The spec, if you want to add a little bit of color shows you obviously how you can add green and red and so forth. So if you're curious this one I'm gonna kill off because it will take a little while--you'll find that available on the cloud if you'd like to experiment whether or not you're doing that edition or the other. So, with that said, I have a new toy here. So this is a little scale that if you put something here or here it's supposed to tell you which one is heavier and I also have 8 cups here. And inside of each of these cups is a different weight so that in theory all of these cups are different weights. And now suppose that I want to sort all of these cups. So all you have behind here for spaces sake is just 8 cups, each with different weights. I have my little scale here. And the reason for the scale is that as we saw last week the idea of these algorithms--I broke the scale already. The idea behind all these algorithms is that what's ultimately important is how many comparisons you ultimately need to make. I did buy this at like a child store so it's not very robust here. Oh my god! Alright, I can figure this out it's made for like ages 5 and up so, [laughter] oh, my gosh! This is ridiculous [laughter] alright you try doing this in front of hundreds of people. Alright, is that right? I think that's right. If not we're gonna call in some help. Alright, it feels right. This was not meant to be part of the demo alright nice. Okay. Thanks, alright. [ Applause ] >> So next year don't pick up the scale. Alright, so we can implement any number of algorithms using this thing because the basic mechanism I have here is a comparator. >> Something that takes two inputs, a left operant and a right operant and it just returns a Boolean value is--or rather it returns a left to right value. Is this side heavier or is this side heavier? With that mechanism we can now implement these comparisons sorts. Bubble sort recall is a comparison sort. We pair wise compare everyone that was onstage. Selection sort too really reduces to a total number of comparisons because I'm again comparing the current smallest to the next thing I see, the next thing, so really a lot of these sorting algorithms boil down to comparisons and the numbers that you actually have to make. So supposed I did want to find the smallest cup here. So each of these cups again have different weights if I want to find the lightest of them I might start not knowing which is which. I'm just gonna start over here. I'm gonna put two cups down and it looks like this one is clearly heavier. And so now I can take this one away. Make a mental note that he is not the smallest. So this is like crossing off the person in position number 2 here. I know this thing is currently the smallest. So let me choose another cup from the array. Alright, he is still currently the smallest. So let me move down the list or put this cup back. How about this one here? Alright, this one is still the smallest. It's gonna be a pretty stupid demo if that was the smallest, but here we go. Now let's try this one, alright. So that's four comparisons. Alright, that's 5 comparisons. This demo needs a little work next year. That's six comparisons, so N minus 2. So, now let's see here, okay. So now thankfully I found the lightest one which happens to be this guy here and so what's just happened? It's as though I've walked across the stage like this realized, damn, it was the guy over here or rather I found the smallest element here who beat out number 2 over here so I can now put number 1 into place and recall that it didn't matte if I punted whoever was standing here 'cause they were given to me randomly anyway. Who cares if I randomly send him or her elsewhere in the array? And now I have a problem of size N minus 1. And so I repeat, and so I repeat and one of the visuals meant to be conveyed by this--this scale here is now I know that this guy is the smallest. So, I'll just put him aside. But now, my God! I have to do this whole stupid process again. So, alright, so he is currently the smallest. Now, he is currently--he's currently the smallest. Now, he's currently the smallest and you can feel the tedium. And so this begs the question, even in the context of these cups, how can we possibly do better if at the end of the day in order to figure out if two cups or if two people or if two ints inside of an array are bigger or smaller than one another it feels like we have to do this comparison work anyway. Well recall from week zero when we took a very real world phonebook and then just last week we had our volunteer come up with the array of pieces of paper on the board and when I challenged myself and when I challenged our volunteer to find me the number 50 both he and I were able to leverage one assumption. And that assumption in those examples was what? That the phonebook was sorted alphabetically and that the array of numbers on the board behind the second row of paper itself was sorted alphabetically. So that seemed to be a detail we can leverage. And yet we seemed to have a different problem here, right? Really there's no work to be done if I am handed all of the cups in sorted order. There's no work to be done if I'm handed all of the arrays in sorted order so, you know, if I demand that you give me this assumption that the cups are already sorted and then I'll sort them for you, I mean, this is kind of a cyclical argument. We're really not solving the problem. But there was an idea that we exercised on the board. And there was an idea that we exercised with the phonebook which was this notion of dividing and conquering. Taking the problem, recognizing that you know what, even though this is a pretty big problem size 8 in this case and last time it was size 8 or in the case of the papers in size of a thousand roughly with the phonebook, I assume these are in a perfectly straight line they won't quite fit. So if I try to apply the same logic, well how can I divide and conquer this problem. This is kind of an unwieldy problem. There's 8 cups here. I know it's gonna take me a lot of work on the order of N squared steps to sort these things. But can I borrow that idea of taking a problem and dividing it into something smaller. And let's see what happens. Well here is my array of 8, you know what, it's too much work to sort all 8. Let me go ahead and just sort the left half at the moment. So I'm just gonna conceptually move these aside and now, wow! Alright, I have the problem so clearly this algorithm whatever it's gonna be is gonna be at least twice as fast because I'm doing half as much work. Alright, well even this, this is still kind of unwieldy, sort 4 cups. So I have to get out the damn scale again and go through that whole process. Well let me apply this same logic, let me divide this problem in half, alright. And now I'm down to 2 cups. Even this, sort the--I have to get the damn scale again, right? It's a lot of work. Can I do better? In the phonebook, I just kept dividing and dividing and dividing into half and in the end I found Mike Smith or I found the number 50. Well, let's see what happens here. Alright, here's the list of size 2. I still don't want to do all these work. Let me divide this into 2 lists of size 1 and now done, right? I have sorted with the smaller problem because that smaller problem right now is of size 1 and so it's sort of obviously the case that this cup is now sorted. In fact, better than that, this guy is already sorted as well because I whittled that problem down to size 1. Now, you might think me an idiot, right? Because this is not really doing any work it's just talking about doing work. But what remains to be done, right? So supposed that this one--I'll just guesstimate, this one feels heavier. So they're still out of order. If I want this one over here, so what work remains to be done? This is sorted, this is sorted, how do I now make a list of size 2? [ Inaudible Remark ] >> So I'm gonna have to compare them, right? So at the end of the day we're not gonna get out of this need to compare, at least in this context of cups that have weights or ints that have values. So I'm gonna need to do one comparison. So I can do this really simply with a scale or just with my own arms. Indeed, this thing is heavier. And so now when I have the list here, let's see, yours is facing this way, so small is gonna be here large is gonna be here. So here is a list of size 2, this is the light one, this is the heavier one. I'm done sorting a list of size 2. Now we still have all this work to do. Let's see what happens next. Well, at this point in the story how had I gotten here? I first had a list of size 8, then 4, then 2, but then I had another problem of size 2. It was the right hand side that I ignored for the moment. So, let's finish that part of the story. This one's done. So now I have this right hand side of the 4-cup problem. How do I sort these things? I don't know. Let's divide and conquer. Divide each of these into a list of size 1, done. Done. Well now what remains to be done next? Well I need to merge these two lists. This list is sorted and that is, you know, stupid to say but it's very much correct. This list is sorted. Now which order do they go in? I'm gonna have to compare. Now I won't bother with the scale anymore. This one feels heavier. So, I'm gonna say that this goes here and this goes here. Alright, now at what point in the story am I? Alright, this I haven't even touched yet. This list is sorted, this list is sorted but they could be intermingled. What do I now need to do in order to sort the whole left half? >> Compare them all. >> Alright, now you kinda need to compare them all, right? Because this could be one--this could be let's say 4 and 5 pounds and this could be 1 and 2 pounds. So each respectively, this is sorted 1 and 2 respectively 4 and 5, this is sorted but clearly they are not in the right order. So now I have to merge these two lists of size 2. So how do I do that? I've got 2 lists. This one here, this one here, the lightest elements of each are where? Alright, it's gonna be on this side. So it's on the elements on your left that are already known to be the lightest 'cause I did that. So now I have to compare these. This one feels lighter so I'm gonna put him there. Now what's next? Well, I might move my hands here 'cause this list only has one element left. Now, which one remains? This one feels lighter, so let's put him here. Now I have these two. Which one feels lighter? Well this one feels lighter, done. So in the end I am doing comparisons. And it feel like really this is just one of those like games the magicians play in the square where you are literally moving the cups around accomplishing nothing other than confusion with the audience but--which may actually be the case--but, I argue that this is actually better. And in fact, if we count up all of these silly comparisons I was making verbally I bet I'm gonna be making fewer in the end than I was with bubble or with selection. Because the algorithm I proposed is going to leverage this idea of recursion which recall was just a piece of jargon we tossed out at the last--at the end of last week's lecture, last time's lecture recursion really in this context refers to the act of a function calling it's self. Now, this is really not new. We just didn't use this word in week zero. In week zero, when we tore the phonebook in half and half and half we were recursing through that problem. We took the problem of size a thousand, we divided it in half. And what did we do? We then repeated the same process. So, the curious thing about recursion is that pretty much always can you implement this idea of doing the same thing again and again and again but with smaller bytes each time. You can do it with loops, a for loop, a while loop, a forever loop, a repeat block, whatever language you're using, whatever you want to call it you can implement these things iteratively but there is this simplicity you'll find about the idea of recursion why implement something with this clunky for loop, with the initialization and the updates and all of this syntax mess if you can just write a function once and have it call itself but with a smaller and smaller argument. There is an elegance about it. And you'll find in the end that recursion is a feature of a language, it allows you to map some very obvious concepts like the phonebook tearing in half and half and half and half to actual code without it using some arbitrary human contrived framework like a for loop or a while loop which are much more stilted mechanisms. So I propose this as a new algorithm for sorting N elements and being 8 in this case or really a thousand in the case of the phonebook, or anything of larger size. >> So given some input of N elements, first, if N is less than 2, I return immediately. Why is that? Right, 'cause that means either I've been handed zero elements which mean there's really no work to be done or I've been handed one element which is a vacuous truth that it's sorted, right? It's one element like I claimed before it's sorted and so there's no work to be done. Now, as obvious a statement as that is in this algorithm, it turns out that is the key to this whole problem being solved correctly without my algorithm looping infinitely. I need that sanity check saying at some point, you have to stop. And here are the criteria for stopping if N is less than 2. So if N is not less than 2, that is I have two elements or more in which case there's definitely some sorting to be done. I argue that, you know what? I can tell you how to sort N elements by saying in kind of a snide way. Alright, you want me to sort N elements. I'm gonna tell you go sort the left, go sort the right, and then I'll merge the two together. Now, it's kind of a cyclical argument here because how do you sort the left half of N elements? Well, you need an algorithm for sorting. But why don't I apply this same logic? And so the fact that in this whole slide here, this algorithm for sorting, I'm using the verb sort. It implies that this algorithm is calling itself again and again, and again, and on each time the size of the problem I'm trying to sort is being divided by what? By two, by two, by two, and here it just conceptually is why this thing doesn't infinitely loop. At some point, I'm gonna pass an N divided by two elements, N divided by two elements but at some point, N divided by 2 is going to give me 1. And at that point, the algorithm is going to return immediately so I'm not gonna keep asking the same question again and again, and again. It will bottom out at some point. So what does this allow me to do? Well, let me come up with a sample set of elements here and let me go ahead and just get some fresh white board space. It's a little distracting using the cups since they don't fit on table and also, you can't see what they previously looked like. So let's try to do this chronologically top to bottom. If I start off with fou, two, six, eight, one, three, seven, five, so my list is of size N equals 8 at the moment. The algorithm on mine professed to be implementing now is this thing. So I'm giving N equals 8. So N is not less than 2 so I don't return and so I proceed to do what? Well, first I have to sort the left half of these elements. Well, let me think about what that means. Here's the left half, so now I have a problem of size 4. And what do I do? So in your mind, if you are now the computer program and you are executing this thing from top to bottom, what just has happened verbally is we are stepping into the line of code that says sort left half of elements. So at this point in the story, we're not gonna forge ahead to here or to here, or the last two lines. We're gonna focus on this line in particular and ask ourselves how do you sort the left half. Well, let's call this same algorithm, on input of four elements. Check first is N less than 2. No. So let's proceed. How do I sort this? Alright, let me sort the left half. That's gonna be these two elements here, alright. How do I sort the left half of these elements? Well, let's call this same algorithm. On input of two elements, is less than 2? No, 'cause it equals 2. So what do I do next? I sort the left half of elements, alright. So what's the left half? It's just this guy here. And this is where it feels like we're not really doing anything except talking about sorting. I now have a list of size 1 so N is, in fact, less than 2. One is less than 2 so I return. And this is now consistent with my claim that I have sorted a list of size N equals 1. Now, in the big picture, that doesn't seem all that useful yet, but let's get there. So now, at this point in the story, I've just returned. So if you now reverse the story that we're telling here, what was the next line supposed to be after sorting left half? Sort the right half, right? So we just zoomed in on this line of codes. Sort the left half and that left half was only of size 1. As soon as I return, I get to resume this story which means sort the right half, so that means sort this half here. Well, N is, again, less than 2 'cause N is 1 now. So I have now sorted another list of size 1, and now I'm done with that. The function returns. And what line of code am I on now? The merge, the sorted halves. So this is where recursion gets a little trippy--certainly initially, and that you have to kind of keep diving deeper, deeper, deeper into the problem. And then as soon as the function returns, you kinda have to unwind but remember where you were just a moment ago. So that's where it's easy to get lost, but let's see if these lines in the picture help us stay on track. So now, I need to merge the sorted halves. How do I do this? Well, it's pretty obvious to do as a human, but let's come up with a simple heuristic. I'm gonna use two integers--two variables rather. I'm gonna use my left finger and my right finger and each of which is gonna point at the start of each of this list. So at the moment left hand is at the start of this list of size 1, my right hand is at the start of this list of size 1, and now I need to merge these two lists. How do I merge them? Well, I have to compare. So compare the front of this list which is 4 against the front of this list which is 2. Which one is obviously smaller? Two, so now let me drop down 2. Now what? Well, let's consider these as lists. They're pretty small right now, but supposed there could be more numbers, so let me increment my right hand. So it's now pointing to the end of the list. There's nothing there, so now I'm done. I can now just blindly copy what remains in the previous list here. So how many comparisons did that take here? Well, it looks like it took 1 in this case or it involves--we can put it another way, merging those two lists involved looking at two numbers, 1, 2, and that's it. So, tuck that away for just a moment. So now, I have a list that's sorted of size 2. Where am I in the original story? What comes next? Sort the right half. So again, if you're unwinding what's going on here, this--we sorted the left half which meant sort the left half, then the right half then the merge. Well, then we rewind which means now sort the right half. So let's do this a little faster. I'm sorting the right half which means sort the left half first, alright? So I've done sorting of the left half. Let me do that. I return immediately. Now, let me sort the right half. This, too, is pretty easy. I get this. Now, I'm at what point? Merge. Now this is pretty easy, this one, but let's use the same heuristics. So finger-pointing at the start of each list, 6 is indeed less than 8 so I'm gonna write it down first. Now, I'm gonna advance this finger. I'm out of numbers in this list so I can just go ahead and right down 8. And again, how many numbers have I touched physically this time? Just two, I touched 6 once and then I moved ahead, and then I touched 8 and I moved ahead, not touching any other number. So at this point in the story--at this level in the story, if you will, I've touched one, two, three, four numbers total or I've made two comparisons. But let's just count in terms of the numbers I'm touching 'cause that'll be useful in just a moment. So now, what remains to be done? What step is next? So now, I have to merge. Now, don't be distracted by the fact that they already appear to be sorted. That's just good luck. I still have to do this process and here is where the finger thing gets a little more useful 'cause I have longer lists. Just point at the start of this list, and this one so which number comes first? Obviously, two, now what do I do? Well, let's do the same thing as before. Advance this pointer--this finger, to the next element of the list which is 4, make the comparison. It's obviously 4. I'm out of numbers here, so now I can blindly write down the rest of these numbers. But again, the key takeaway there was how many numbers did I touch on that pass? One, two, three, four, so on each level of the picture I'm drawing, I've seem only to be looking at each number once. And again, that's one of the key takeaways here. Just contrast this for a brief moment to something like Selection Sort which from the get go had a ridiculous amount of redundancy comparing the same damn numbers again and again, and again. We don't seem to be doing that just yet, certainly not as badly, alright, so at this point in the story I have a sorted list of size 4. What's the next step in the algorithm? Sort right half. Now, I'll do this a bit more on automatic pilot so that it doesn't get too tedious, but let's do the same thing. Here is the right half. Here is the left half. Here is the left half of the left half which is a trivial list of 1. Here is the right half of the left half, trivial list of 1. Now, I have to do the merging so I do pointer, pointer. I end up getting 1 and 3. Now, I do the same process here. I get a 7, same process here. I get a 5. I now have to merge. It's a little more interesting now, 5. I'm out of numbers so I can just write down the 7 and now, if you're following along, what step comes next? >> Merge. >> Okay, merge. It turns out that they already look pretty good, but let's see what happens. One, then I advance this finger. I'm pointing at 3 and 5. I take the 3. I advance this finger. I'm now pointing at just the 5, then the 7. Now, at this point in the story, one, two, three, four, so notice what seems to be happening here. On each level in the picture, I'm touching each number once, alright. So that's gonna be important in just a moment. But we have one last step which is obviously merge the sorted halves. So on this half, I'm gonna put my finger on the 2. This half, I'm gonna put my finger on the 1 and notice I can do the same thing. I've realized 1 versus 2, 1 wins. I advance this finger to 3. I then compare 2 and 3 and as you might imagine, if I repeat this process, again, again, again, and again, the list is hopefully, now [laughter] one, two, three, four, five. [ Laughter ] >> Wow, okay, nicely done, alright, [laughter] alright. I'm getting a lot of credibility today. Alright, so the list is now hopefully sorted correctly and it is, in fact. Alright, so then, this begs the question. What was the point of all this? 'Cause frankly, that took a lot more time to explain. I'm out of breath. There was lot of work it seemed at the board. Was it, in fact, better? >> Well, we saw the teaser in terms of that animation that suggests this merge sort algorithm when implemented by a computer is absolutely faster. It blows selection and Bubble Sorts out of the water, but why is that? Well, notice again, on each level of the tree, when I actually do the merging, so this remember, is when I wrote down just the numbers when I bottomed out with the list of size 1. When I started merging, I did merging three times across this whole board, this level, this level, and this level, and each time because of the way I was advancing my fingers I touched each number just once. So if on each iteration of merging I'm doing eight things or more generally, N. That then begs the question, how many levels of this tree are there actually? Ignore this one here 'cause this is when I was just writing down the original list of size 1. Well, here is when I started doing the merging. So there's obviously three rows but conceptually, how did I get to 3? Well, here is a list of size N. How many times can you divide a list of size N by 2, right? So I can go from 8 to 4 to 2 to 1, so that's 3. And in fact, mathematically if it's--don't really have room for this. But mathematically, we've mentioned this before, log N or really to be precise, log base 2 of N, is the way you express this mathematically. How many times you can divide N by 2 before you get down to 1? And in this case, we go from 8 to 4 to 2 to 1 three times and then on each iteration of this algorithm, each pass across the board I'm touching N numbers, so that means I'm doing N things, log N times. And so what that seems to suggest is that merge sort itself when all is said and done is an algorithm that runs in log of N times N or rather just so the parenthesization is right, N log N. And in fact, this algorithm called merge sort does, in fact, run in what's called N log N time. Now, if you think back to some of the charts last week, N squared was one that--as like a parabola, that pretty much took off on the chart and got really big, really fast. N log N is not nearly as good as log N. As a sanity check, what algorithm have we seen that runs in log N time? So binary search, the phonebook example, binary search on the pieces of paper on the white board, why is that? 'Cause there, you're only dividing the problem in half and half, and half and half, we are doing that here. I'm dividing it in half and half, and half and half, but each time I do that there's this merging process. But that merging process only takes N steps, so that's N times log of N. Now, it's a little tricky to reason through this perhaps the first time, let's just take a very simple example and see if we can do a little sanity check here. So I claim the following. In general, when we wanna talk about running time, we just need a silly place holder, so we'll call it T for time. So the running time of the problem where the input is of size N as expressed here formulaically, T of N, the running time of an algorithm, given an input of size N. You know what? It's gonna be zero if N is less than 2, right. This is just statement of the obvious. If there's no work to be done, the running time is gonna be zero. So let me just define that. I need a so-called base case even in my analysis here. Now, this looks a little scary at first, but this is actually just another statement of the obvious. If I'm using algorithm that I'm now calling merge sort, the running time involved in sorting N elements, T of N, you know, is just the same as running the algorithm on half the size of the elements plus do it again for the right half, plus what's this plus N come from? That's the merging, right? That's the finger thing from left finger to right finger going back and forth across the board, never touching a number more than once. So in other words, every time I merge the point that I kept emphasizing verbally there and that I'm only touching each number once, means that we have to account for the amount of time it takes to merge which is going to be just N. Now, this is again one of these cyclical answers. I ask you for the running time of this algorithm and you give me the running time in terms of the running time, right. So I kinda have to push back now and, you know, push back and say, alright, well, what's the running time of T divided by N? Well, you being the wise--wise guy could say, well, it's T of N divided by 4 plus T of N divided by 4 plus and over 2, right? We can keep doing this again and again. Well, how can we ever get to an actual answer? Well, the beauty is in this simple stupid little statement of the obvious. And this too is where the power of recursion comes in in a programming language. If you have this so-called base case, can we actually get to a definitive answer eventually? We're not going to run around in circles forever. So again, the algorithm looks like this. This is our little sanity check, the base case. This is really the guts of the algorithm 'cause this line here, someone pointed it out before. You can actually say big O of 1, big O of 1 being constant time, the same number of steps. So this line or these lines of code up here are arguably constant time steps to say if N is less than 2 in return, that it will always take maybe one step, maybe two steps, some number of fixed CPU cycles. These things run for variable amounts of time because they take input, a list of size of some amount. So here is the formula. T of N is gonna equal T of N divided by 2 'cause the time to sort the left half plus T of N divided by 2, the time to sort the right half plus the time to merge. And so can we do a little sanity check with the simple case here? Well, let's try. So supposed that the list is not of size 8, let's make it slightly more interesting and double it. So supposed that I give you 16 elements to sort, well, following the logic before, the running time involved in sorting 16 elements is gonna be twice the running time of sorting 8 elements, left half and right half plus 16 and again, a little sanity check, 16 means--just the merge steps, right? You have to touch each of the 16 elements once as I go across the blackboard to actually merge them together, alright. So now, let's recurse together, let's dive in and hone in on what T of 8 is. Well, T of 8 is gonna be 2 times T of 4 plus 8, alright. Well, what's T of 4? T of 4 is gonna be two times T of 2 plus 4, alright. Well, what's T of 2? Well, T of 2 is gonna be two times T of 1 plus 1 and thankfully, this guys is gonna become uninteresting to sort one elements takes no time, 'cause we return immediately. So that's why that base case rule was so important. So I'll now put these slides online, certainly if you are scribbling things down. But let's now answer this question. What do I wanna now do? I had T of 1--I have an answer now. So now let me just do some grade school substitution. I need to answer this question, T of 2. So let's plug in T of 1 equals zero by changing that, alright. So now, T of 2 equals zero plus 2. So that's T of 2 equals 2, so that means this thing here had better become 2, alright. So now, 2 times 2 is 4 plus 4, that's 8. So now, I'm gonna plug in 8 here and now finally, that's 16 so that's plus 8, so that's 24. So now, I need to plug the 24 in there. So 2 times 24 plus 16 gives me 64. So the running time I claim to sort 16 elements takes 64 steps. Now, does this jive with our little asymptotic claim here, our little analysis with N notation? Well, N is what? Sixteen, so that's 16 times log base 2 of 16 and though I'm writing small here, log base 2 of 16, this gives me 4 'cause 2 to the 4 equals 16. So this is 16 times 4 equals 64 and though this is not proof by any means, it's not a formal proof because here is one example that happens to prove my point. It at least does corroborate the claim that merge sort as we argue intuitively is in fact, N log N in running time. And so what then is the takeaway? What does a logarithmic running time look like? Well, let's go ahead and restore this to, let's say, bubble. Let's do selection and let's do merge sort here on the right just to see what actually happens. And if we zoom in, notice what's happening. The fact that the picture is a little prettier on the far right is testament to the fact that I'm sorting the left half, the left half, the left half, and that's why you get these little triangles, triangles, triangles, that then get merged together. And what does N log N feel like vis-a-vis N squared? Well, this, for the third time. Why don't we go ahead and take a 5-minute break and I'll let this one run to completion. [ Pause ] >> Alright, we're back. So an Internet forward that went around recently is this here, YouTube video which is a visualization and an audio rendition of what various sorting algorithms might sound like if you echo a tone based on the comparisons that are being made in the frequency thereof, and we've linked this on the website if you want to take a look at it. I would say that this is more of a--more for fun than perhaps as enlightening as the pure visualizations where that we looked at last time. And we haven't seen all of these sorts, but it's actually quite neat to recognize how different the underlying work is of each of these algorithms. So the first one here is something called insertion sort which amounts to going through the list, taking the first thing that you see and inserting that element into its correct place, then moving on to the next one, dealing with what element--whatever element you're given and putting it in its right place. And this is in contrast to Selection Sort where you're fishing again and again for the then smallest element. This one, you're just taking what you're given and putting it where you think it belongs. [ Music ] [ Background Music ] >> This is Bubble Sort. [ Music ] [ Background Music ] >> Selection Sort. [ Music ] [ Background Music ] >> Merge Sort. [ Music ] [ Laughter ] [ Background Music ] >> Gnome Sort. [ Music ] [ Laughter ] >> So, you know, we've got a bunch of juicy topics to dive into this week involving pointers and memory. Why don't we go ahead and save that for Wednesday? We'll see you on Wednesday. [ Noise ] [ Applause ] ==== Transcribed by Automatic Sync Technologies ====