WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:02.982 [FILM REEL] 00:00:02.982 --> 00:00:06.461 [MUSIC PLAYING] 00:01:12.660 --> 00:01:15.420 DAVID MALAN: All right, this is CS50. 00:01:15.420 --> 00:01:19.200 And this is Week 3 already, wherein we'll take a look back actually 00:01:19.200 --> 00:01:21.400 at Week 0 where we first began. 00:01:21.400 --> 00:01:24.930 And in Week 0, recall that everything was very intuitive, in a sense. 00:01:24.930 --> 00:01:28.000 We talked not just about representation of information, but algorithms. 00:01:28.000 --> 00:01:30.450 And we talked about tearing a phone book again and again. 00:01:30.450 --> 00:01:32.860 And that somehow got us to a better solution. 00:01:32.860 --> 00:01:35.888 But today, we'll try to start formalizing some of those ideas 00:01:35.888 --> 00:01:38.430 and capturing some of those same ideas not in pseudocode just 00:01:38.430 --> 00:01:41.650 yet, but in actual code as well. 00:01:41.650 --> 00:01:45.210 But we'll also consider the efficiency of those algorithms, 00:01:45.210 --> 00:01:48.372 like just how good, how well-designed our algorithms actually are. 00:01:48.372 --> 00:01:50.580 And if you recall, when we did the phone book example 00:01:50.580 --> 00:01:53.940 wherein I first had an algorithm searching one page at a time, 00:01:53.940 --> 00:01:56.460 and then second one two pages at a time, and then third, 00:01:56.460 --> 00:01:58.590 started tearing the thing in half, recall 00:01:58.590 --> 00:02:01.900 that we, with a wave of the hand, kind of analyzed it as follows. 00:02:01.900 --> 00:02:05.070 We proposed that if the x-axis here is the size of the problem, 00:02:05.070 --> 00:02:09.120 like number of pages in a phone book, and the y-axis is the time required 00:02:09.120 --> 00:02:11.430 to solve the problem in seconds, minutes, 00:02:11.430 --> 00:02:13.363 page tears, whatever your unit of measure is, 00:02:13.363 --> 00:02:16.530 recall that the first algorithm, which is the straight line such that if you 00:02:16.530 --> 00:02:19.620 had n pages in the phone book, it might have this slope of n-- 00:02:19.620 --> 00:02:23.280 and there's this one-to-one relationship between pages and tears. 00:02:23.280 --> 00:02:27.000 Two pages at a time, of course, was twice as fast, but still really 00:02:27.000 --> 00:02:30.000 the same shape, the yellow line here indicating that yeah, 00:02:30.000 --> 00:02:33.570 it's n over 2, maybe plus 1 if you have to double back, as we discussed. 00:02:33.570 --> 00:02:36.840 But it's really still fundamentally the same algorithm one 00:02:36.840 --> 00:02:38.370 or two pages at a time. 00:02:38.370 --> 00:02:41.760 But the third algorithm, recall, was this one here in green, 00:02:41.760 --> 00:02:45.660 where we called it logarithmic in terms of how fast or how slow it was. 00:02:45.660 --> 00:02:48.235 And indeed, the implication of this algorithm 00:02:48.235 --> 00:02:50.610 was that we could even double the size of the phone book, 00:02:50.610 --> 00:02:53.280 and no big deal-- one additional page tear, 00:02:53.280 --> 00:02:55.990 and we take yet another 1,000 page bite out of the phone book. 00:02:55.990 --> 00:02:58.890 So today, we'll revisit some of these ideas, formalize them a bit, 00:02:58.890 --> 00:03:01.530 but also translate some of them, ultimately, to code. 00:03:01.530 --> 00:03:03.690 And all of that now is possible because we 00:03:03.690 --> 00:03:06.060 have this lower-level understanding, perhaps, of what's 00:03:06.060 --> 00:03:07.600 actually inside of your computer. 00:03:07.600 --> 00:03:10.170 This, of course, is your computer's RAM or memory. 00:03:10.170 --> 00:03:12.720 And recall that if we start to abstract this away, 00:03:12.720 --> 00:03:15.420 your computer's memory is really just a grid of bytes. 00:03:15.420 --> 00:03:17.730 In fact, we don't have to look at the hardware anymore. 00:03:17.730 --> 00:03:21.270 And we looked at a grid of bytes like this, whereby each of these bytes 00:03:21.270 --> 00:03:26.700 could be used to store a char, an int, a long, or even an entire string, 00:03:26.700 --> 00:03:27.460 at that. 00:03:27.460 --> 00:03:30.117 But let's focus perhaps just on a subset of this 00:03:30.117 --> 00:03:31.950 because last week, of course, we emphasized, 00:03:31.950 --> 00:03:34.830 really, arrays, storing things in arrays. 00:03:34.830 --> 00:03:38.140 And that allowed us to start storing entire strings, sequences 00:03:38.140 --> 00:03:40.470 of characters, and even arrays of integers 00:03:40.470 --> 00:03:44.770 if we wanted to have multiple ones and not just multiple variables as well. 00:03:44.770 --> 00:03:48.850 But the catch is that if you look inside of an array in the computer's memory-- 00:03:48.850 --> 00:03:51.450 and for instance, suppose these integers here are stored-- 00:03:51.450 --> 00:03:53.850 it's pretty easy for us humans to glance at this 00:03:53.850 --> 00:03:55.698 and immediately find the number 50. 00:03:55.698 --> 00:03:57.990 You sort of have this bird's eye view from where you're 00:03:57.990 --> 00:03:59.700 seated of everything on the screen. 00:03:59.700 --> 00:04:02.500 And so it's pretty obvious how you get to the number 50. 00:04:02.500 --> 00:04:04.710 But in the world of computers, of course, 00:04:04.710 --> 00:04:06.720 it turns out that this is hardware. 00:04:06.720 --> 00:04:10.440 And computers, for today's purposes, can only do one thing at a time. 00:04:10.440 --> 00:04:14.820 They can't just take it all in and find instantly some number like 50. 00:04:14.820 --> 00:04:17.130 So perhaps a decent metaphor is to consider 00:04:17.130 --> 00:04:20.010 the array of memory inside of your computer 00:04:20.010 --> 00:04:22.500 really is a sequence of closed doors. 00:04:22.500 --> 00:04:25.860 And if the computer wants to find some value in an array, 00:04:25.860 --> 00:04:29.970 it has to do the digital equivalent of opening each of these doors one 00:04:29.970 --> 00:04:30.730 at a time. 00:04:30.730 --> 00:04:32.410 Now how can code do that? 00:04:32.410 --> 00:04:35.610 Well, of course, we introduced indices or indexes last week, 00:04:35.610 --> 00:04:39.900 whereby we, by convention, call the first element of an array location 0, 00:04:39.900 --> 00:04:44.490 the second location 1, the third location 2, and so forth-- so-called 0 00:04:44.490 --> 00:04:45.150 indexed. 00:04:45.150 --> 00:04:48.300 And this allowed us to now bridge this conceptual world of what's 00:04:48.300 --> 00:04:51.510 going on in memory with actual code, because now we had this square bracket 00:04:51.510 --> 00:04:55.710 syntax via which we could go searching for something if we so choose. 00:04:55.710 --> 00:04:59.760 And it turns out, if I now paint these red instead of yellow, 00:04:59.760 --> 00:05:02.970 it would seem that we actually have a pretty good physical metaphor here 00:05:02.970 --> 00:05:07.170 standing in place for what would be a computer's array of memory 00:05:07.170 --> 00:05:10.390 if, for instance, you're storing some seven numbers like that. 00:05:10.390 --> 00:05:13.710 And so today we begin with a look of a specific type of algorithm. 00:05:13.710 --> 00:05:14.910 That is for searching. 00:05:14.910 --> 00:05:16.270 Searching is all over the place. 00:05:16.270 --> 00:05:19.900 All of us have probably gone to google.com or some equivalent already 00:05:19.900 --> 00:05:21.090 multiple times per day. 00:05:21.090 --> 00:05:24.120 And getting back answers fast is what companies like Google 00:05:24.120 --> 00:05:25.110 are really good at. 00:05:25.110 --> 00:05:26.700 So how are they doing that? 00:05:26.700 --> 00:05:29.882 How are they storing information in computers' memory? 00:05:29.882 --> 00:05:31.590 Well, let's consider what this really is. 00:05:31.590 --> 00:05:34.500 It's really just a problem as it was back in Week 0. 00:05:34.500 --> 00:05:36.420 The input, though, to the problem, for now, 00:05:36.420 --> 00:05:38.253 might be this array of seven lockers. 00:05:38.253 --> 00:05:40.920 So that's the input to the problem, inside of which is a number. 00:05:40.920 --> 00:05:45.060 And maybe for simplicity now, we just want a yes/no, a true/false answer-- 00:05:45.060 --> 00:05:46.860 a bool, that is to say-- 00:05:46.860 --> 00:05:51.250 of whether or not some number like 50 is in that array. 00:05:51.250 --> 00:05:52.680 It's not quite as fancy as Google. 00:05:52.680 --> 00:05:55.500 That doesn't just tell you yes, we have search results. 00:05:55.500 --> 00:05:57.300 It actually gives you the search results. 00:05:57.300 --> 00:05:59.520 But for, now we'll keep it simple, and just output 00:05:59.520 --> 00:06:02.070 as part of this problem yes or no, true or false, 00:06:02.070 --> 00:06:06.090 we have found the number we're looking for given an input like that array. 00:06:06.090 --> 00:06:09.930 But it turns out inside of this black box that we keep coming back to, 00:06:09.930 --> 00:06:12.385 there's all sorts of possible algorithms. 00:06:12.385 --> 00:06:15.010 And we talked about this at a high level conceptually in Week 0 00:06:15.010 --> 00:06:15.860 with the phone book. 00:06:15.860 --> 00:06:19.360 But today, let's consider it a little more concretely 00:06:19.360 --> 00:06:22.648 by way of a game that some of you might have grown up with, namely Monopoly. 00:06:22.648 --> 00:06:24.940 And so behind these doors, it turns out, will be hidden 00:06:24.940 --> 00:06:26.800 some denominations of monopoly money. 00:06:26.800 --> 00:06:28.690 But for this, we now have two volunteers. 00:06:28.690 --> 00:06:30.483 If you'd like to greet the world? 00:06:30.483 --> 00:06:31.525 JACKSON: Hi, I'm Jackson. 00:06:35.440 --> 00:06:37.220 STEPHANIE: Hi, my name is Stephanie. 00:06:37.220 --> 00:06:38.410 DAVID MALAN: And do you want to say a little something 00:06:38.410 --> 00:06:40.570 about yourselves-- years, house, dorm? 00:06:40.570 --> 00:06:43.030 STEPHANIE: I'm a first year living in Matthews. 00:06:43.030 --> 00:06:43.780 DAVID MALAN: Nice. 00:06:43.780 --> 00:06:45.580 JACKSON: And I'm a first year in Canaday. 00:06:45.580 --> 00:06:46.330 DAVID MALAN: Nice. 00:06:46.330 --> 00:06:48.670 Well, welcome to our two volunteers. 00:06:48.670 --> 00:06:50.710 So why don't we do this? 00:06:50.710 --> 00:06:54.460 Would one of you like to volunteer the other to go first? 00:06:54.460 --> 00:06:55.510 STEPHANIE: I'll go. 00:06:55.510 --> 00:06:56.560 DAVID MALAN: OK. 00:06:56.560 --> 00:06:58.300 All right, so Stephanie's up first. 00:06:58.300 --> 00:07:02.560 And behind one of these doors here, we've hidden the monopoly money 50. 00:07:02.560 --> 00:07:04.180 And so we'd like you to find the 50. 00:07:04.180 --> 00:07:06.490 We'll tell you nothing more about the lockers. 00:07:06.490 --> 00:07:08.900 But we would like you to execute a certain algorithm. 00:07:08.900 --> 00:07:11.020 And in fact, I'm going to give you some pseudocode for this. 00:07:11.020 --> 00:07:12.770 And I'm going to give you the name for it. 00:07:12.770 --> 00:07:13.900 It's called linear search. 00:07:13.900 --> 00:07:15.855 And as the name implies, you're pretty much 00:07:15.855 --> 00:07:17.980 going to end up walking in sort of a straight line. 00:07:17.980 --> 00:07:19.220 But how are you going to do this? 00:07:19.220 --> 00:07:21.520 Well, let me propose that in a moment, your first step 00:07:21.520 --> 00:07:23.170 will be to think kind of like a loop. 00:07:23.170 --> 00:07:27.280 For each door from left to right, what do we want you to do on each iteration? 00:07:27.280 --> 00:07:31.528 Well, if 50 is behind that door, then we want 00:07:31.528 --> 00:07:33.070 to go ahead and have you return true. 00:07:33.070 --> 00:07:35.860 And hold up the 50 proudly, if you will, for the group. 00:07:35.860 --> 00:07:38.500 Otherwise, if you get through that whole loop 00:07:38.500 --> 00:07:40.690 and you haven't found the number 50, you can just 00:07:40.690 --> 00:07:42.640 throw up your hands in disappointment. 00:07:42.640 --> 00:07:45.190 False-- you've not found the number 50. 00:07:45.190 --> 00:07:50.293 So to be clear, step one is going to be for each door from left to right. 00:07:50.293 --> 00:07:51.460 How would you like to begin? 00:07:55.610 --> 00:07:56.495 Yep. 00:07:56.495 --> 00:07:57.840 Oh, and then-- yep. 00:07:57.840 --> 00:07:58.340 There we go. 00:07:58.340 --> 00:08:00.365 Yep. 00:08:00.365 --> 00:08:02.060 And if you'd like to at least tell-- 00:08:02.060 --> 00:08:04.040 good, good acting here. 00:08:04.040 --> 00:08:06.140 What have you found instead? 00:08:06.140 --> 00:08:07.970 STEPHANIE: It's not 50, but 20. 00:08:07.970 --> 00:08:08.900 DAVID MALAN: Oh, OK. 00:08:08.900 --> 00:08:10.280 So step one was a fail. 00:08:10.280 --> 00:08:11.870 So let's move on to step two. 00:08:11.870 --> 00:08:14.453 Inside of this loop, what are you going to do next? 00:08:14.453 --> 00:08:16.370 STEPHANIE: I'm going to move to the next door. 00:08:16.370 --> 00:08:17.037 DAVID MALAN: OK. 00:08:20.790 --> 00:08:22.100 STEPHANIE: Almost. 00:08:22.100 --> 00:08:23.100 DAVID MALAN: OK, almost. 00:08:23.100 --> 00:08:23.790 Sort of. 00:08:23.790 --> 00:08:25.110 A 500 instead. 00:08:25.110 --> 00:08:26.280 Next locker? 00:08:26.280 --> 00:08:30.200 STEPHANIE: I would rather take that. 00:08:30.200 --> 00:08:30.700 No. 00:08:33.840 --> 00:08:36.558 DAVID MALAN: OK, we're not telling the audience? 00:08:36.558 --> 00:08:38.260 STEPHANIE: It was a 10. 00:08:38.260 --> 00:08:39.789 DAVID MALAN: OK, so keep going. 00:08:39.789 --> 00:08:40.974 This is step three now. 00:08:45.470 --> 00:08:46.310 STEPHANIE: Oh, man. 00:08:49.850 --> 00:08:51.260 DAVID MALAN: Five, OK. 00:08:51.260 --> 00:08:52.670 A few more lockers to check. 00:08:57.296 --> 00:08:58.790 STEPHANIE: A little sad, guys. 00:09:02.527 --> 00:09:04.360 DAVID MALAN: All right, second-to-last step. 00:09:07.710 --> 00:09:09.070 STEPHANIE: It's 1. 00:09:09.070 --> 00:09:10.022 Kind of close. 00:09:10.022 --> 00:09:10.980 DAVID MALAN: All right. 00:09:10.980 --> 00:09:12.780 And finally, the last step. 00:09:12.780 --> 00:09:15.780 Clearly you've been, perhaps, set up here. 00:09:15.780 --> 00:09:17.340 STEPHANIE: Let's go! 00:09:17.340 --> 00:09:19.920 DAVID MALAN: All right, so the number 50. 00:09:23.500 --> 00:09:25.870 And Stephanie, if I may, let me ask you a question here. 00:09:25.870 --> 00:09:28.890 So on the screen, this is the pseudocode you just executed. 00:09:28.890 --> 00:09:31.800 Suppose, though, I had done what many of us 00:09:31.800 --> 00:09:34.770 have gotten to the habit of doing when you have an if condition. 00:09:34.770 --> 00:09:36.730 You often have an else branch as well. 00:09:36.730 --> 00:09:38.760 Suppose that I had done this now. 00:09:38.760 --> 00:09:41.680 And I'm marking it in red to be clear this is wrong. 00:09:41.680 --> 00:09:45.750 But what would have been bad about this code using an if and an else, 00:09:45.750 --> 00:09:47.040 might you say? 00:09:47.040 --> 00:09:48.360 Any instincts? 00:09:55.620 --> 00:09:58.740 STEPHANIE: Then you would end up canceling 00:09:58.740 --> 00:10:00.210 the code before you found the 50. 00:10:00.210 --> 00:10:01.020 DAVID MALAN: Yeah, exactly. 00:10:01.020 --> 00:10:02.070 STEPHANIE: I mean, you'd just be eternally sad. 00:10:02.070 --> 00:10:02.460 DAVID MALAN: Indeed. 00:10:02.460 --> 00:10:05.127 When Stephanie had opened the first locker, she'd have found 20. 00:10:05.127 --> 00:10:06.630 20, of course, is not 50. 00:10:06.630 --> 00:10:07.838 She would have decreed false. 00:10:07.838 --> 00:10:10.547 But of course, she hadn't checked all of the rest of the lockers. 00:10:10.547 --> 00:10:13.440 So that would seem to be a key detail that, with this implementation 00:10:13.440 --> 00:10:16.440 of the pseudocode, we actually do go through-- as we did-- 00:10:16.440 --> 00:10:18.810 and only return false not even with an else, 00:10:18.810 --> 00:10:23.070 but just at the end of the loop such that we only reach that line if we 00:10:23.070 --> 00:10:25.245 don't return true earlier than that. 00:10:25.245 --> 00:10:26.620 Well, let's go ahead and do this. 00:10:26.620 --> 00:10:27.360 Let me take the mic from you. 00:10:27.360 --> 00:10:28.930 If you'd like to take a seat next to Jackson? 00:10:28.930 --> 00:10:31.180 And Jackson, in just a moment, we'll have you come up. 00:10:31.180 --> 00:10:34.170 Carter, if you don't mind reorganizing the lockers for us. 00:10:34.170 --> 00:10:36.630 But in the meantime, let me point out how we might now 00:10:36.630 --> 00:10:38.010 translate that same idea to code. 00:10:38.010 --> 00:10:41.050 Pretty high level, pretty English-oriented with that pseudocode-- 00:10:41.050 --> 00:10:44.910 but really now, as of last week, we have syntax via which Stephanie 00:10:44.910 --> 00:10:48.810 and, soon, Jackson could treat this locker, this set of lockers, as really 00:10:48.810 --> 00:10:51.250 indeed an array using bracket notation. 00:10:51.250 --> 00:10:54.480 So we can now get a little closer in our pseudocode to actual code. 00:10:54.480 --> 00:10:56.670 And the way a computer scientist, for instance, 00:10:56.670 --> 00:10:59.850 would translate fairly high level English pseudocode 00:10:59.850 --> 00:11:03.720 like this to something that's a little closer to C or any language 00:11:03.720 --> 00:11:06.600 that supports arrays would be a little more cryptically like this. 00:11:06.600 --> 00:11:09.060 But you'll see more of this syntax in the coming days. 00:11:09.060 --> 00:11:13.260 For i from 0 to n minus 1-- this is still pseudocode. 00:11:13.260 --> 00:11:15.840 But that's the English-like way of expressing what 00:11:15.840 --> 00:11:17.730 we've known come to know as a for loop. 00:11:17.730 --> 00:11:22.260 If 50 is behind doors bracket i-- so I'm assuming for the sake of discussion 00:11:22.260 --> 00:11:26.503 that doors, now, is the name of my variable, this array of seven doors. 00:11:26.503 --> 00:11:28.920 But then the rest of the logic, the rest of the pseudocode 00:11:28.920 --> 00:11:30.370 really is the same way. 00:11:30.370 --> 00:11:33.270 And so you'll find in time that programmers, computer scientists more 00:11:33.270 --> 00:11:36.900 generally, when you start expressing ideas, algorithms to someone else, 00:11:36.900 --> 00:11:40.170 instead of maybe operating at this level here, 00:11:40.170 --> 00:11:43.020 you now have a new vocabulary, really a new syntax 00:11:43.020 --> 00:11:44.910 that you can be a little more specific, not 00:11:44.910 --> 00:11:47.380 getting so into the weeds of writing actual C code, 00:11:47.380 --> 00:11:50.910 but at least now doing something that's a little closer to manipulating 00:11:50.910 --> 00:11:51.810 an array like this. 00:11:51.810 --> 00:11:55.140 So Jackson, would you like to stand on up? 00:11:55.140 --> 00:11:56.760 All right. 00:11:56.760 --> 00:11:57.360 Yes, yes. 00:11:57.360 --> 00:11:59.010 Support for Jackson here, too. 00:11:59.010 --> 00:12:00.780 Nice. 00:12:00.780 --> 00:12:04.470 And here now, I'm going to allow you an assumption that Stephanie did not have. 00:12:04.470 --> 00:12:06.660 Stephanie clearly was really doing her best 00:12:06.660 --> 00:12:10.050 searching from left to right using linear search, as we'll now call it. 00:12:10.050 --> 00:12:12.180 But they were pretty much in a random order, right? 00:12:12.180 --> 00:12:15.030 There was a 20 over there, there was 1 over there, and then a 50. 00:12:15.030 --> 00:12:19.110 So we deliberately jumbled things up and did not sort the numbers for her. 00:12:19.110 --> 00:12:22.530 But Carter kindly has just come up to give you a leg up, Jackson, 00:12:22.530 --> 00:12:24.510 by sorting the numbers in advance. 00:12:24.510 --> 00:12:27.630 And we'd like you this time, much like in week 0, 00:12:27.630 --> 00:12:30.060 to do something again and again, but this time 00:12:30.060 --> 00:12:32.250 using what we'll now call binary search. 00:12:32.250 --> 00:12:35.580 It's exactly the same algorithm conceptually as we did in Week 0. 00:12:35.580 --> 00:12:38.310 But if we translate it to the context of this array, 00:12:38.310 --> 00:12:40.450 we now might say something like this. 00:12:40.450 --> 00:12:42.960 The first step for Jackson might be to ask the question-- 00:12:42.960 --> 00:12:46.110 if 50 is behind the middle door, where presumably he's 00:12:46.110 --> 00:12:48.570 done some mental math to figure out what the middle is, 00:12:48.570 --> 00:12:50.610 then he's going to just return true. 00:12:50.610 --> 00:12:53.070 And hopefully we'll get lucky and 50 will be right there. 00:12:53.070 --> 00:12:56.400 Of course, there's two other possibilities at least, 00:12:56.400 --> 00:12:58.290 which would be what? 00:12:58.290 --> 00:13:01.290 50 is, with respect to these doors? 00:13:01.290 --> 00:13:03.930 Yeah, so to the left or to the right, alternatively. 00:13:03.930 --> 00:13:07.722 So if 50 is less than the middle door, then presumably, 00:13:07.722 --> 00:13:09.180 Jackson's going to want to go left. 00:13:09.180 --> 00:13:11.310 Else, if 50 is greater than the middle door, 00:13:11.310 --> 00:13:13.380 he's going to want to go right, much like I 00:13:13.380 --> 00:13:16.170 did physically last week with the phone book, 00:13:16.170 --> 00:13:17.910 dividing and conquering left to right. 00:13:17.910 --> 00:13:20.020 But there's actually a fourth case. 00:13:20.020 --> 00:13:21.540 Let's put it on the board first. 00:13:21.540 --> 00:13:25.530 What else might happen here that Jackson should consider? 00:13:25.530 --> 00:13:26.170 Yeah. 00:13:26.170 --> 00:13:28.590 Oh, it's not there. 00:13:28.590 --> 00:13:32.930 So let me actually go back and amend my pseudocode here and just say Jackson, 00:13:32.930 --> 00:13:36.200 if we don't hand you any doors at all, or eventually, 00:13:36.200 --> 00:13:39.080 as he's dividing and conquering, if he's left with no more doors, 00:13:39.080 --> 00:13:42.380 we have to handle that situation so that the behavior is defined. 00:13:42.380 --> 00:13:44.210 All right, so with that said, Jackson, do 00:13:44.210 --> 00:13:46.127 you want to go ahead and find us the number 50 00:13:46.127 --> 00:13:48.650 and walk us through verbally what you're doing and finding? 00:13:48.650 --> 00:13:52.860 JACKSON: All right, so it looks like this one is the middle door. 00:13:52.860 --> 00:13:55.290 So I'm going to open it. 00:13:55.290 --> 00:13:57.030 But it's 20, not 50. 00:13:57.030 --> 00:13:59.622 DAVID MALAN: Aw. 00:13:59.622 --> 00:14:01.080 What's going through your head now? 00:14:01.080 --> 00:14:02.670 JACKSON: So now I'm looking-- 00:14:02.670 --> 00:14:06.490 because 50 is higher than 20, I want to look to the right. 00:14:06.490 --> 00:14:07.440 DAVID MALAN: Good. 00:14:07.440 --> 00:14:10.270 JACKSON: And look for the new middle door, which would be here. 00:14:10.270 --> 00:14:11.700 DAVID MALAN: Nice. 00:14:11.700 --> 00:14:12.900 JACKSON: And it's 100-- 00:14:12.900 --> 00:14:13.740 bad. 00:14:13.740 --> 00:14:16.560 But 50 is less than 100. 00:14:16.560 --> 00:14:20.520 So now we to look left, which would be here. 00:14:20.520 --> 00:14:21.240 And ta-da. 00:14:21.240 --> 00:14:21.990 DAVID MALAN: Nice. 00:14:21.990 --> 00:14:25.680 Very well done this time around, too. 00:14:25.680 --> 00:14:29.680 So thank you, first, to our volunteers here. 00:14:29.680 --> 00:14:32.970 And in fact, since you're a fan of Monopoly, as we're so informed, 00:14:32.970 --> 00:14:35.220 we have the Cambridge edition of Monopoly 00:14:35.220 --> 00:14:36.743 with all your Harvard favorites. 00:14:36.743 --> 00:14:37.410 JACKSON: No way. 00:14:37.410 --> 00:14:38.460 DAVID MALAN: Here you go. 00:14:38.460 --> 00:14:38.970 STEPHANIE: Thank you. 00:14:38.970 --> 00:14:40.095 JACKSON: Thank you so much. 00:14:40.095 --> 00:14:42.570 DAVID MALAN: Thank you to our volunteers for finding us 50. 00:14:42.570 --> 00:14:46.940 So-- that was more popular than we expected. 00:14:46.940 --> 00:14:50.470 So here, we can translate this one more time into something 00:14:50.470 --> 00:14:52.060 a little closer to code. 00:14:52.060 --> 00:14:54.730 And again, still pseudocode-- but here, now, 00:14:54.730 --> 00:14:57.460 might be another formulation of exactly what Jackson just did, 00:14:57.460 --> 00:14:59.380 just using the nomenclature now of arrays, 00:14:59.380 --> 00:15:01.570 where you can be a little more precise with your instructions 00:15:01.570 --> 00:15:04.640 and still leave it to someone else to translate this, finally, to code. 00:15:04.640 --> 00:15:06.640 But here we have same question at the beginning. 00:15:06.640 --> 00:15:08.650 If no doors left, return false. 00:15:08.650 --> 00:15:11.500 If 50 is behind doors bracket middle-- 00:15:11.500 --> 00:15:13.810 so I'm assuming here, because this is pseudocode-- 00:15:13.810 --> 00:15:16.810 that somewhere I've done the mental math or the actual math 00:15:16.810 --> 00:15:19.480 to figure out what the index of middle is. 00:15:19.480 --> 00:15:22.420 For instance, if these are seven doors in an array, 00:15:22.420 --> 00:15:27.310 this would be location 0, 1, 2, 3, 4, 5, 6. 00:15:27.310 --> 00:15:31.120 So somehow I've taken the total number of doors, 7, 00:15:31.120 --> 00:15:33.520 divided by 2 to find the middle. 00:15:33.520 --> 00:15:34.430 That's 3 and 1/2. 00:15:34.430 --> 00:15:35.680 We have to deal with rounding. 00:15:35.680 --> 00:15:38.920 But suffice it to say there's a well-defined formula for finding 00:15:38.920 --> 00:15:39.910 the middle index. 00:15:39.910 --> 00:15:43.280 Given the total number of lockers, divide by 2 and then round accordingly. 00:15:43.280 --> 00:15:45.550 So that's presumably what Jackson did just by counting 00:15:45.550 --> 00:15:48.790 or in his head to find us door number 3. 00:15:48.790 --> 00:15:52.210 Not the third door, the 4th door, but door bracket 3. 00:15:52.210 --> 00:15:56.200 So this is just saying if 50 is behind doors bracket middle, return true. 00:15:56.200 --> 00:15:57.200 That was not the case. 00:15:57.200 --> 00:15:58.960 He found a $20 bill instead. 00:15:58.960 --> 00:16:03.670 Else, if 50 is less than the doors bracket middle, go ahead-- 00:16:03.670 --> 00:16:10.660 and now it gets interesting-- search doors 0 through doors middle minus 1. 00:16:10.660 --> 00:16:12.730 So it's getting a little more into the weeds now. 00:16:12.730 --> 00:16:16.900 But if middle is 3, this one here, what we want to now 00:16:16.900 --> 00:16:19.300 have Jackson search if 50 had been-- 00:16:19.300 --> 00:16:22.360 if the number had been less, we want to start at bracket 0 00:16:22.360 --> 00:16:24.340 and go up through this one. 00:16:24.340 --> 00:16:26.380 And we deliberately subtract 1 because what's 00:16:26.380 --> 00:16:28.297 the point of looking in the same locker again? 00:16:28.297 --> 00:16:31.270 We might as well do 0 through middle minus 1. 00:16:31.270 --> 00:16:36.040 Else if 50 is greater than doors bracket middle, which it was, 00:16:36.040 --> 00:16:37.120 what did we then do? 00:16:37.120 --> 00:16:40.870 So Jackson intuitively searched for doors middle plus 1 00:16:40.870 --> 00:16:43.295 through doors n minus 1. 00:16:43.295 --> 00:16:46.420 And honestly, it gets a little annoying having the pluses and minuses here. 00:16:46.420 --> 00:16:47.780 But just think of what it means. 00:16:47.780 --> 00:16:49.090 This is the middle door. 00:16:49.090 --> 00:16:53.942 And Jackson then did proceed to search through doors middle plus 1 00:16:53.942 --> 00:16:56.150 because there's no point in searching this one again. 00:16:56.150 --> 00:17:00.130 And then the last element in any array of size n 00:17:00.130 --> 00:17:05.352 where n is just our go-to number for the size is always going to be n minus 1. 00:17:05.352 --> 00:17:06.310 It's not going to be n. 00:17:06.310 --> 00:17:10.839 It's going to be n minus 1 because we always start counting arrays at 0. 00:17:10.839 --> 00:17:14.200 So here then we have a translation into pseudocode that's a little closer 00:17:14.200 --> 00:17:16.430 to C of this exact same idea. 00:17:16.430 --> 00:17:18.490 And here, we come full circle to Week 0. 00:17:18.490 --> 00:17:22.240 In Week 0, it was pretty intuitive to imagine dividing and conquering 00:17:22.240 --> 00:17:23.420 a problem like this. 00:17:23.420 --> 00:17:26.770 But if you now think back to actual your iPhone, your Android phone, 00:17:26.770 --> 00:17:29.920 or the like, when you're doing autocomplete and searching the list, 00:17:29.920 --> 00:17:33.430 it's possible, if you don't have many friends or family or colleagues 00:17:33.430 --> 00:17:34.840 in the phone, you know what? 00:17:34.840 --> 00:17:39.070 Linear search, just checking every name for the person you're searching for, 00:17:39.070 --> 00:17:40.720 might be perfectly fine. 00:17:40.720 --> 00:17:43.810 But odds are your phone's being smarter than that, especially if you 00:17:43.810 --> 00:17:47.140 start to have dozens, hundreds, thousands of people in your contacts 00:17:47.140 --> 00:17:47.770 over the years. 00:17:47.770 --> 00:17:49.540 What would be better than linear search? 00:17:49.540 --> 00:17:51.340 Well, perhaps binary search. 00:17:51.340 --> 00:17:55.180 But, but, but-- there's an assumption, a requirement, which is what? 00:17:55.180 --> 00:18:00.340 Why was Jackson ultimately able to find the 50 in just three steps 00:18:00.340 --> 00:18:04.830 instead of a full seven, like Stephanie? 00:18:04.830 --> 00:18:06.570 Because the array was sorted. 00:18:06.570 --> 00:18:09.990 And so this is sort of a teaser for what we'll have to come back to later today. 00:18:09.990 --> 00:18:12.780 Well, how much effort did it take someone like Carter? 00:18:12.780 --> 00:18:15.240 How much effort does it take your phone to sort all 00:18:15.240 --> 00:18:17.040 of those names and numbers in advance? 00:18:17.040 --> 00:18:19.650 Because maybe it's not actually worth the amount of time. 00:18:19.650 --> 00:18:24.210 Now someone like Google probably somehow keeps the database of web pages sorted. 00:18:24.210 --> 00:18:28.110 You can imagine it being super slow if, when you type in cats or something else 00:18:28.110 --> 00:18:32.280 into google.com, if they searched linearly over their entire data set. 00:18:32.280 --> 00:18:35.430 Ideally, they're doing something a little smarter than that. 00:18:35.430 --> 00:18:38.820 So we'll formalize, now, exactly this kind of analysis. 00:18:38.820 --> 00:18:42.180 And it's not going to be so much mathy as it still will be intuitive. 00:18:42.180 --> 00:18:45.480 But we'll introduce you to some jargon, some terminology 00:18:45.480 --> 00:18:48.180 that most any programmer or computer scientists might use 00:18:48.180 --> 00:18:50.550 when analyzing their own algorithms. 00:18:50.550 --> 00:18:53.670 Let's formalize now what this kind of analysis is. 00:18:53.670 --> 00:18:56.910 So right now, I claim binary search better than linear search. 00:18:56.910 --> 00:18:59.100 But how much better and why, exactly? 00:18:59.100 --> 00:19:01.120 Well, it all comes back to this kind of graph. 00:19:01.120 --> 00:19:04.830 So this, recall, is how we analyzed the phone book back in Week 0. 00:19:04.830 --> 00:19:08.880 And recall that, indeed, we had these formulas, rough formulas that 00:19:08.880 --> 00:19:11.370 described the running time of those three algorithms-- 00:19:11.370 --> 00:19:13.740 one page at a time, two pages at a time, and then 00:19:13.740 --> 00:19:15.720 tearing the thing again and again in half. 00:19:15.720 --> 00:19:19.830 And precisely, if you counted up the number of pages I was touching 00:19:19.830 --> 00:19:22.020 or the number of pages I was tearing, it's 00:19:22.020 --> 00:19:24.600 fair to say that the first algorithm, in the worst case, 00:19:24.600 --> 00:19:26.700 might have taken n total pages. 00:19:26.700 --> 00:19:29.010 It didn't because I was searching for John Harvard 00:19:29.010 --> 00:19:31.260 at the time, which is somewhat early in the alphabet. 00:19:31.260 --> 00:19:34.340 But if I were searching for someone with the last name of Z, 00:19:34.340 --> 00:19:36.840 I would have had to keep going and going, in the worst case, 00:19:36.840 --> 00:19:38.430 through all n pages. 00:19:38.430 --> 00:19:41.940 Not as bad for the second algorithm, and that's why we do n divided by 2. 00:19:41.940 --> 00:19:43.920 And even that's a bit of a white lie. 00:19:43.920 --> 00:19:48.270 It's probably n divided by 2 plus 1 in case I have to double back. 00:19:48.270 --> 00:19:50.405 But again, I'm sort of doing this more generally 00:19:50.405 --> 00:19:52.030 to capture the essence of these things. 00:19:52.030 --> 00:19:54.363 And then we really got into the weeds with like log base 00:19:54.363 --> 00:19:56.940 2 event for that third and final algorithm. 00:19:56.940 --> 00:20:00.690 And at the time, we claimed any time you're dividing something in half, 00:20:00.690 --> 00:20:04.200 in half, in half, odds are there's going to be some kind of logarithm involved. 00:20:04.200 --> 00:20:05.340 And we'll see that today. 00:20:05.340 --> 00:20:09.010 But today, we're going to actually start using computer science terminology. 00:20:09.010 --> 00:20:13.590 And we're going to formalize this imprecision, if you will. 00:20:13.590 --> 00:20:17.670 We are not going to care, generally, about exactly how many steps 00:20:17.670 --> 00:20:20.820 some algorithm takes because that's not going to be that enlightening, 00:20:20.820 --> 00:20:24.630 especially if maybe you have a faster computer tomorrow than you did today. 00:20:24.630 --> 00:20:27.510 It wouldn't really be fair to compare numbers too precisely. 00:20:27.510 --> 00:20:31.590 We really want to, with the wave of a hand, just get a sense of roughly 00:20:31.590 --> 00:20:33.930 how slow or how fast an algorithm is. 00:20:33.930 --> 00:20:36.000 So the notation here is deliberate. 00:20:36.000 --> 00:20:40.620 That is literally a capital O, often italicized, refer to as big O. 00:20:40.620 --> 00:20:43.920 And so the first algorithm is in big O of n. 00:20:43.920 --> 00:20:47.760 The second algorithm is in big O of n divided by 2. 00:20:47.760 --> 00:20:51.480 The third algorithm is in big O of log base 2 of n. 00:20:51.480 --> 00:20:54.690 But even that is kind of unnecessary detail. 00:20:54.690 --> 00:20:58.830 When using big O notation, you really don't care about, we'll see, 00:20:58.830 --> 00:21:01.230 the smaller order terms. 00:21:01.230 --> 00:21:04.500 We're not going to care about the divided by 2 because you know what? 00:21:04.500 --> 00:21:07.720 The shape of these algorithms are is almost the same. 00:21:07.720 --> 00:21:10.800 And really, the idea-- the algorithm itself is sort of fundamentally 00:21:10.800 --> 00:21:11.340 the same. 00:21:11.340 --> 00:21:13.620 Instead of one page at a time, I'm doing two. 00:21:13.620 --> 00:21:17.280 But if you throw millions of pages, billions of pages at me, 00:21:17.280 --> 00:21:20.460 those algorithms are really going to kind of perform the same as n gets 00:21:20.460 --> 00:21:22.085 really large, goes off toward infinity. 00:21:22.085 --> 00:21:23.585 And the same is true for logarithms. 00:21:23.585 --> 00:21:25.560 Even if you're a little rusty, it turns out 00:21:25.560 --> 00:21:29.560 that whether you do the math with log base 2, log base 3, log base 10, 00:21:29.560 --> 00:21:33.040 you can just multiply one by the other to really get the same formula. 00:21:33.040 --> 00:21:35.700 So this is only to say a computer scientist would generally 00:21:35.700 --> 00:21:39.270 say that the first two algorithms are on the order of n steps. 00:21:39.270 --> 00:21:42.690 The third algorithm is on the order of log n steps. 00:21:42.690 --> 00:21:46.350 And we don't really care precisely what we mean beyond that. 00:21:46.350 --> 00:21:49.770 And this big O notation, as we'll see-- and actually, let me zoom out. 00:21:49.770 --> 00:21:53.500 If you can imagine suddenly making the x-axis much longer-- 00:21:53.500 --> 00:21:55.770 so more pages on the screen at once-- 00:21:55.770 --> 00:21:58.320 it is indeed going to be the shapes of these curves 00:21:58.320 --> 00:22:01.480 that matter, because imagine in your mind's eye as you zoom out, 00:22:01.480 --> 00:22:04.950 zoom out, zoom out, zoom out, and as n gets much, much, much bigger 00:22:04.950 --> 00:22:07.470 on the x-axis, the red and the yellow line 00:22:07.470 --> 00:22:11.400 are essentially going to look the same once n is sufficiently large. 00:22:11.400 --> 00:22:14.378 But the green line is never going to look the same. 00:22:14.378 --> 00:22:16.420 It's going to be a fundamentally different shape. 00:22:16.420 --> 00:22:18.570 And so that's the intuition of big O, to get 00:22:18.570 --> 00:22:23.200 a sense of these rates of performance like this. 00:22:23.200 --> 00:22:25.410 So here, then, is big O. Here is, perhaps, 00:22:25.410 --> 00:22:29.160 a cheat sheet of the common formulas that a computer scientist, certainly 00:22:29.160 --> 00:22:32.490 in an introductory context, might use when analyzing algorithms. 00:22:32.490 --> 00:22:36.060 And let's consider for a moment which of our first two algorithms-- 00:22:36.060 --> 00:22:39.040 linear search and binary search-- fall into these categories. 00:22:39.040 --> 00:22:44.318 So I've ordered them from slowest to fastest, so order of n squared. 00:22:44.318 --> 00:22:46.110 It's not something we've actually seen yet, 00:22:46.110 --> 00:22:48.392 but it tends to be slow because it's quadratic. 00:22:48.392 --> 00:22:49.350 You're doing n times n. 00:22:49.350 --> 00:22:51.090 That's got to add up to a lot of steps. 00:22:51.090 --> 00:22:53.190 Better today is going to be n log n. 00:22:53.190 --> 00:22:54.630 Even better is going to be n. 00:22:54.630 --> 00:22:56.190 Even better than that is log n. 00:22:56.190 --> 00:23:00.150 And best is so-called order of 1, like one step 00:23:00.150 --> 00:23:02.520 or maybe two steps, maybe even 1,000 steps, 00:23:02.520 --> 00:23:08.200 but a fixed, finite number of steps that never changes no matter how big n is. 00:23:08.200 --> 00:23:11.920 So given this chart, just to be clear, linear search-- 00:23:11.920 --> 00:23:13.570 let's consider the worst case. 00:23:13.570 --> 00:23:18.040 In the worst case, how many steps did it take someone like Stephanie 00:23:18.040 --> 00:23:23.500 to find the solution to the problem, assuming not seven doors but n doors? 00:23:23.500 --> 00:23:25.160 Yeah? 00:23:25.160 --> 00:23:26.540 So on the order of n. 00:23:26.540 --> 00:23:28.280 And in this case, it's exactly n. 00:23:28.280 --> 00:23:32.660 But you know what, maybe it's arguably 2n because it took Stephanie 00:23:32.660 --> 00:23:33.530 a couple of steps. 00:23:33.530 --> 00:23:34.460 She had to lift the latch. 00:23:34.460 --> 00:23:35.360 She had to open the door. 00:23:35.360 --> 00:23:36.318 Maybe it's three steps. 00:23:36.318 --> 00:23:37.530 She had to show the money. 00:23:37.530 --> 00:23:39.170 So now it's 3n, 2n. 00:23:39.170 --> 00:23:41.990 But we don't really care about that level of precision. 00:23:41.990 --> 00:23:45.660 We really just care about the fundamental number of operations. 00:23:45.660 --> 00:23:47.540 So we'll say yes, on the order of n. 00:23:47.540 --> 00:23:51.320 So that might be an upper bound, we'll call this, for linear search. 00:23:51.320 --> 00:23:53.030 And how about binary search? 00:23:53.030 --> 00:23:57.140 In Jackson's case, or in general, me in Week 0, if there's n doors, 00:23:57.140 --> 00:24:02.910 how many steps did it take Jackson or me using binary search? 00:24:02.910 --> 00:24:04.860 In this case, it was literally three. 00:24:04.860 --> 00:24:07.200 But that's not a formula. 00:24:07.200 --> 00:24:09.690 Yeah so it's on the order of log n. 00:24:09.690 --> 00:24:11.970 And indeed, if there are seven doors, well, that's 00:24:11.970 --> 00:24:14.250 almost eight, if you just do a little bit of rounding. 00:24:14.250 --> 00:24:18.480 And indeed, if you take log base 2 of 8, that does actually give us 3. 00:24:18.480 --> 00:24:19.813 So the math actually checks out. 00:24:19.813 --> 00:24:22.272 And if you're not comfortable with logarithms, no big deal. 00:24:22.272 --> 00:24:23.670 Just think about it intuitively. 00:24:23.670 --> 00:24:27.010 Logarithm of base 2 is just dividing something again and again. 00:24:27.010 --> 00:24:31.110 So on this chart, when we consider big O, which to be clear, 00:24:31.110 --> 00:24:35.100 allows you to describe the order of an algorithm's running time-- 00:24:35.100 --> 00:24:38.610 like the magnitude of it-- but it also describes, more specifically, 00:24:38.610 --> 00:24:40.090 an upper bound. 00:24:40.090 --> 00:24:42.990 So in the worst case, for instance, these 00:24:42.990 --> 00:24:45.480 are pretty good measures of how good-- 00:24:45.480 --> 00:24:47.310 or rather, of how bad-- 00:24:47.310 --> 00:24:49.270 linear search and binary search might be. 00:24:49.270 --> 00:24:49.770 Why? 00:24:49.770 --> 00:24:52.122 Well, suppose you're searching a 1,000-page phonebook 00:24:52.122 --> 00:24:55.080 and the person's name starts with Z. The algorithm is still going to be 00:24:55.080 --> 00:24:56.320 on the order of n steps. 00:24:56.320 --> 00:24:56.820 Why? 00:24:56.820 --> 00:25:01.080 Because it might take you as many as all n steps to find it. 00:25:01.080 --> 00:25:05.250 Now that's not necessarily going to be the case in practice. 00:25:05.250 --> 00:25:08.370 If I use big O as an upper bound, well, it 00:25:08.370 --> 00:25:11.160 would be nice if there's a corresponding lower bound, 00:25:11.160 --> 00:25:16.180 especially if you want to consider not just worst cases, but maybe best cases. 00:25:16.180 --> 00:25:18.040 So what might we use here? 00:25:18.040 --> 00:25:20.200 Well, this is a capital Greek omega symbol. 00:25:20.200 --> 00:25:23.550 So omega is the symbol that a computer scientist uses generally 00:25:23.550 --> 00:25:25.920 to describe a lower bound on an algorithm, 00:25:25.920 --> 00:25:28.710 often in the context of best case, though not necessarily. 00:25:28.710 --> 00:25:32.490 So a lower bound means how few steps might an algorithm take? 00:25:32.490 --> 00:25:33.990 And here, too, same formulas. 00:25:33.990 --> 00:25:36.270 And we'll fill in these blanks over time. 00:25:36.270 --> 00:25:40.140 Some algorithms might always take a minimum of n squared steps, 00:25:40.140 --> 00:25:41.370 or on the order of n steps. 00:25:41.370 --> 00:25:45.660 Some might only take n log n, or n, or log n, or 1. 00:25:45.660 --> 00:25:49.170 So something like a linear search-- 00:25:49.170 --> 00:25:51.240 when Stephanie started with linear search, 00:25:51.240 --> 00:25:52.980 she didn't get lucky this time on stage. 00:25:52.980 --> 00:25:57.720 But what if she had, and the first door she opened were 50? 00:25:57.720 --> 00:26:01.260 How might you then describe the lower bound on linear 00:26:01.260 --> 00:26:08.290 search in this so-called best case, using this list of possible answers? 00:26:08.290 --> 00:26:09.530 Yeah? 00:26:09.530 --> 00:26:11.060 Yeah, so omega of 1. 00:26:11.060 --> 00:26:14.900 So in the best case, the lower bound on how many steps 00:26:14.900 --> 00:26:18.990 it might take linear search to find something might just be one step. 00:26:18.990 --> 00:26:19.490 Why? 00:26:19.490 --> 00:26:21.500 Because maybe if Stephanie had gotten lucky 00:26:21.500 --> 00:26:25.960 and we had prefilled these lockers with the numbers in some other order 00:26:25.960 --> 00:26:28.460 such that she might have opened the first locker, and voila, 00:26:28.460 --> 00:26:31.280 the number 50 could have been there, so a lower bound arguably 00:26:31.280 --> 00:26:34.610 could indeed be omega of 1 for linear search. 00:26:34.610 --> 00:26:35.990 And how about now for Jackson? 00:26:35.990 --> 00:26:37.440 He used binary search. 00:26:37.440 --> 00:26:40.940 So he dived right into the middle of the problem. 00:26:40.940 --> 00:26:45.020 But what would be a lower bound on binary search using this logic? 00:26:45.020 --> 00:26:45.980 Yeah? 00:26:45.980 --> 00:26:47.460 Yeah, so again, omega of 1. 00:26:47.460 --> 00:26:47.960 Why? 00:26:47.960 --> 00:26:49.580 Because maybe he just gets lucky. 00:26:49.580 --> 00:26:53.300 And indeed, right in the middle of the lockers could have been the number 50. 00:26:53.300 --> 00:26:54.060 It wasn't. 00:26:54.060 --> 00:26:57.770 And so more germane in Jackson's actual practice 00:26:57.770 --> 00:27:00.050 would have been the big O discussion. 00:27:00.050 --> 00:27:03.350 But big O and omega, upper bound and lower bound, 00:27:03.350 --> 00:27:05.450 just allow a computer scientist to kind of wrestle 00:27:05.450 --> 00:27:06.980 with what could happen maybe in the worst 00:27:06.980 --> 00:27:08.670 case, what can happen in the best case? 00:27:08.670 --> 00:27:12.267 And you can even get even more precise like the average case or the like. 00:27:12.267 --> 00:27:15.350 And this is, indeed, what engineers might do at a whiteboard in a company, 00:27:15.350 --> 00:27:17.600 in a university when designing an algorithm 00:27:17.600 --> 00:27:21.380 and trying to make arguments as to why their algorithm is better than someone 00:27:21.380 --> 00:27:24.080 else's, by way of these kinds of analyses. 00:27:24.080 --> 00:27:29.180 And just so you've seen it, it turns out that if some algorithm happens 00:27:29.180 --> 00:27:32.990 to have an identical upper bound and lower bound, 00:27:32.990 --> 00:27:35.880 you can actually use a capital Greek theta as well. 00:27:35.880 --> 00:27:38.210 And this is the last of the Greek symbols today. 00:27:38.210 --> 00:27:41.900 But a Greek theta indicates a coincidence of both upper 00:27:41.900 --> 00:27:43.130 bound and lower bound. 00:27:43.130 --> 00:27:44.702 That is, they are one and the same. 00:27:44.702 --> 00:27:47.660 That was not the case for our discussion a second ago of linear search, 00:27:47.660 --> 00:27:49.220 not the case for binary search. 00:27:49.220 --> 00:27:52.970 But you could use the same kinds of formulas 00:27:52.970 --> 00:27:56.970 if it turns out that your upper bound and lower bound are the same. 00:27:56.970 --> 00:28:00.440 So for instance, if I were to count everyone literally in this room-- one, 00:28:00.440 --> 00:28:03.870 two, three, four, five, six and so forth-- 00:28:03.870 --> 00:28:09.080 you could actually say that counting in that way is in theta of n 00:28:09.080 --> 00:28:11.150 because in the best case, it's going to take 00:28:11.150 --> 00:28:13.468 me n points, people in the audience. 00:28:13.468 --> 00:28:15.260 In the worst case, it's going to take me n. 00:28:15.260 --> 00:28:18.380 It's always going to take me n steps if I want to count everyone in the room. 00:28:18.380 --> 00:28:20.930 You can't really do better than that unless you skip people. 00:28:20.930 --> 00:28:23.510 So that would be an example of the cuff of something 00:28:23.510 --> 00:28:26.150 where theta is instead germane. 00:28:26.150 --> 00:28:31.530 Are any questions now on big O, on omega, or theta, 00:28:31.530 --> 00:28:34.040 which are now just more formal tools in the toolkit 00:28:34.040 --> 00:28:38.730 for talking about the design of our algorithms? 00:28:38.730 --> 00:28:42.050 Any questions? 00:28:42.050 --> 00:28:42.860 No? 00:28:42.860 --> 00:28:44.720 Seeing none. 00:28:44.720 --> 00:28:45.560 Oh, is this-- yes? 00:28:45.560 --> 00:28:46.840 No? 00:28:46.840 --> 00:28:48.250 OK, so we're good. 00:28:48.250 --> 00:28:52.000 So let's go ahead and translate this, perhaps, to some actual code. 00:28:52.000 --> 00:28:53.900 Let me go over to VS Code here. 00:28:53.900 --> 00:28:57.100 And let's see if we can't now translate some of these ideas 00:28:57.100 --> 00:29:00.280 to some actual code, not so much using new syntax yet. 00:29:00.280 --> 00:29:03.320 We're going to still operate in this world of arrays like last week. 00:29:03.320 --> 00:29:05.237 So let me go ahead and create a program called 00:29:05.237 --> 00:29:09.280 search.c by executing code space search.c in my terminal. 00:29:09.280 --> 00:29:11.800 And then up here, let's go ahead and include our usual, 00:29:11.800 --> 00:29:14.740 so include cs50.h so I can get some input. 00:29:14.740 --> 00:29:18.370 Include standard io.h so I can print some output. 00:29:18.370 --> 00:29:21.910 We'll do int main void, the meaning of which we did start 00:29:21.910 --> 00:29:23.140 to tease apart last week. 00:29:23.140 --> 00:29:26.650 The fact that it's void again today just means no command line arguments. 00:29:26.650 --> 00:29:28.580 And let me go ahead and do this. 00:29:28.580 --> 00:29:33.130 Let me go ahead and declare, just for discussion's sake, a static array, 00:29:33.130 --> 00:29:34.840 like an array that never changes. 00:29:34.840 --> 00:29:39.280 And the syntax for this is going to be give me an array called numbers 00:29:39.280 --> 00:29:41.290 using the square bracket notation. 00:29:41.290 --> 00:29:43.390 And I'm going to immediately initialize it 00:29:43.390 --> 00:29:49.300 to 20, 500, 10, 5, 100, 1, and 50, reminiscent of those same denominations 00:29:49.300 --> 00:29:50.060 as before. 00:29:50.060 --> 00:29:54.080 So this is a slightly new syntax that we've perhaps not seen. 00:29:54.080 --> 00:29:57.130 And the curly braces here, which are different from for loops 00:29:57.130 --> 00:30:00.820 and while loops and functions, just tell the compiler please give me 00:30:00.820 --> 00:30:05.380 an array of whatever size this is containing those numbers left to right. 00:30:05.380 --> 00:30:10.220 I could alternatively use last week's syntax of saying something like this. 00:30:10.220 --> 00:30:13.090 Let's see, 1, 2, 3, 4, 5, 6, 7 denominations. 00:30:13.090 --> 00:30:15.250 I could alternatively do this. 00:30:15.250 --> 00:30:21.910 And then I could say numbers bracket 0 equals 00:30:21.910 --> 00:30:25.570 20, numbers bracket 1 equals 500. 00:30:25.570 --> 00:30:27.572 And I could do this five more times. 00:30:27.572 --> 00:30:28.780 That's just a little tedious. 00:30:28.780 --> 00:30:30.580 If you know the numbers in advance, you don't 00:30:30.580 --> 00:30:32.530 have to tell the compiler how many there are. 00:30:32.530 --> 00:30:37.420 You can just let it figure it out that your numbers will be 10, 500, 10, 5, 00:30:37.420 --> 00:30:39.430 100, 1, and 50. 00:30:39.430 --> 00:30:42.550 So this is how you statically define an array. 00:30:42.550 --> 00:30:45.380 All right, let me just go ahead and ask the user now for a number. 00:30:45.380 --> 00:30:48.920 We'll call it n by using get_int and prompting them for a number-- 00:30:48.920 --> 00:30:50.020 so nothing new there. 00:30:50.020 --> 00:30:53.680 And now let me go ahead and implement linear search. 00:30:53.680 --> 00:30:57.520 And the pseudocode we had for this before used some array-like notation. 00:30:57.520 --> 00:30:59.620 Let me go ahead, then, and start similarly. 00:30:59.620 --> 00:31:04.270 For int i-- and you almost always start counting at i by convention. 00:31:04.270 --> 00:31:06.490 So that's perhaps a good starting point. 00:31:06.490 --> 00:31:09.790 I'm going to do this so long as i is less than 7. 00:31:09.790 --> 00:31:12.138 Not the best design to hard code the 7, but this is just 00:31:12.138 --> 00:31:13.930 for demonstration's sake for now, because I 00:31:13.930 --> 00:31:15.560 know how many numbers I put in there. 00:31:15.560 --> 00:31:16.992 And then I'm going to i++. 00:31:16.992 --> 00:31:19.450 So now I have the beginnings of a loop that will just allow 00:31:19.450 --> 00:31:21.550 me to iterate over the entire array. 00:31:21.550 --> 00:31:22.760 And let me ask this. 00:31:22.760 --> 00:31:27.370 If the current number at location i equals 00:31:27.370 --> 00:31:30.650 equals n, which is the number the human typed in, 00:31:30.650 --> 00:31:33.400 then let's go ahead and do something simple like printf, 00:31:33.400 --> 00:31:36.190 quote unquote, found, backslash n. 00:31:36.190 --> 00:31:40.240 And then per our discussion last week, to indicate that this is successful, 00:31:40.240 --> 00:31:42.610 I'm going to return 0 if I found it. 00:31:42.610 --> 00:31:45.940 And if I don't find it, I'm just going to go down here and, by default, 00:31:45.940 --> 00:31:48.640 say not found, backslash n. 00:31:48.640 --> 00:31:52.610 And just for convention-- whoops, just for good measure, per convention, 00:31:52.610 --> 00:31:55.630 I'll return 1 or, really, any value other than 0. 00:31:55.630 --> 00:31:57.130 0, recall, means success. 00:31:57.130 --> 00:32:00.670 And any other integer tends to mean error of some sort, 00:32:00.670 --> 00:32:02.690 irrespective of the number I'm looking for. 00:32:02.690 --> 00:32:06.670 So just to revisit, the only thing that's new here is the syntax. 00:32:06.670 --> 00:32:09.980 We're creating an array of seven numbers, these numbers. 00:32:09.980 --> 00:32:13.540 And then after that we have really highlighted here 00:32:13.540 --> 00:32:16.090 an implementation of linear search. 00:32:16.090 --> 00:32:19.390 I mean this is the C version, I daresay, of what Stephanie did on the board, 00:32:19.390 --> 00:32:22.540 whereas now the array is called numbers instead of doors. 00:32:22.540 --> 00:32:25.460 But I think it's pretty much the same. 00:32:25.460 --> 00:32:30.380 Let me go ahead and open my terminal window and run make search. 00:32:30.380 --> 00:32:32.518 Seems to compile, ./search. 00:32:32.518 --> 00:32:34.310 And let's go ahead and search for a number. 00:32:34.310 --> 00:32:36.230 We'll start with what we did before, 50. 00:32:36.230 --> 00:32:37.340 And it's found. 00:32:37.340 --> 00:32:39.770 Let's go ahead and run it again, ./search. 00:32:39.770 --> 00:32:42.500 Let's search for maybe 20 at the beginning. 00:32:42.500 --> 00:32:43.670 That one, too, is found. 00:32:43.670 --> 00:32:47.150 Let's run it one more time searching for like 1,000, 00:32:47.150 --> 00:32:50.720 which is not among the denominations. 00:32:50.720 --> 00:32:52.980 And that one, indeed, is not found. 00:32:52.980 --> 00:32:56.210 So we've taken an idea from Week 0, now formalized in Week 3, 00:32:56.210 --> 00:32:59.300 and just translated it now to code. 00:32:59.300 --> 00:33:05.500 Questions on this implementation of linear search? 00:33:05.500 --> 00:33:07.570 Linear search. 00:33:07.570 --> 00:33:08.680 Nothing. 00:33:08.680 --> 00:33:11.810 Oh, so successful so far today. 00:33:11.810 --> 00:33:15.820 So let's see if we can't maybe make this a little more interesting 00:33:15.820 --> 00:33:19.270 and see if we can't trip over a detail that's going to be important in C. 00:33:19.270 --> 00:33:23.330 And instead of doing numbers, let me go ahead and do this. 00:33:23.330 --> 00:33:25.030 We'll stay on theme with Monopoly. 00:33:25.030 --> 00:33:26.830 And I went down the rabbit hole of reading the Wikipedia 00:33:26.830 --> 00:33:27.730 article on Monopoly. 00:33:27.730 --> 00:33:32.170 And the original pieces or tokens that came with Monopoly-- and it turns out 00:33:32.170 --> 00:33:33.740 we can represent those with strings. 00:33:33.740 --> 00:33:37.690 So I'm going to create an array called strings, plural, of whatever 00:33:37.690 --> 00:33:39.170 size I defined here. 00:33:39.170 --> 00:33:42.340 And the very first monopoly pieces back in the day 00:33:42.340 --> 00:33:44.920 were a battleship that you could play with, 00:33:44.920 --> 00:33:53.320 a boot, a cannon, an iron, a thimble, and a top hat, some of which 00:33:53.320 --> 00:33:54.700 you might from the game nowadays. 00:33:54.700 --> 00:33:56.410 Turns out they've been changing these-- 00:33:56.410 --> 00:33:57.890 had no idea-- over the years. 00:33:57.890 --> 00:34:00.170 So here is, now, an array of strings. 00:34:00.170 --> 00:34:03.940 Let me go ahead and prompt the user now not for an integer anymore. 00:34:03.940 --> 00:34:07.970 I want to now search for one of these strings still using linear search. 00:34:07.970 --> 00:34:11.020 So let me create a string s, set it equal to get_string, 00:34:11.020 --> 00:34:13.840 prompt the user for a string to search for. 00:34:13.840 --> 00:34:19.540 And then I think my code here is almost the same, except for one detail. 00:34:19.540 --> 00:34:21.850 I now have an array called strings. 00:34:21.850 --> 00:34:24.040 I now have a variable called s. 00:34:24.040 --> 00:34:27.790 But it turns out, for reasons we'll explore in more detail next week, 00:34:27.790 --> 00:34:31.030 this line of code is not going to work. 00:34:31.030 --> 00:34:34.900 And it turns out the reason has to do with what we discussed 00:34:34.900 --> 00:34:36.880 last week of what a string really is. 00:34:36.880 --> 00:34:39.355 And what is a string, again? 00:34:39.355 --> 00:34:41.000 A string is an array. 00:34:41.000 --> 00:34:44.929 And it turns out, though, that equals equals is not 00:34:44.929 --> 00:34:49.760 going to generously compare all of the characters in an array for you 00:34:49.760 --> 00:34:51.949 just because you use equal equals. 00:34:51.949 --> 00:34:54.650 It turns out it's not going to compare every letter. 00:34:54.650 --> 00:35:00.260 And so thankfully, there is, in the string library 00:35:00.260 --> 00:35:03.058 that we introduced last week, a solution to this problem. 00:35:03.058 --> 00:35:05.850 The reason for the problem, we'll explore in more detail next week. 00:35:05.850 --> 00:35:09.860 But for now, just know that when you want to compare strings in C-- 00:35:09.860 --> 00:35:13.220 especially if you've come into the class knowing a bit of Java or Python or some 00:35:13.220 --> 00:35:15.680 other language-- you cannot use equals equals. 00:35:15.680 --> 00:35:18.500 Even though you could in Scratch, you cannot in C. 00:35:18.500 --> 00:35:21.620 So what I have to actually do here is this. 00:35:21.620 --> 00:35:26.330 I have to ask the question, does the return value of a function called str 00:35:26.330 --> 00:35:33.160 compare, or strcomp, equal 0 when passed in the current string 00:35:33.160 --> 00:35:36.050 and that's user input? 00:35:36.050 --> 00:35:39.790 So if you read the documentation for this function called str compare, 00:35:39.790 --> 00:35:44.500 you'll see that it takes two strings as input, first one and second one. 00:35:44.500 --> 00:35:47.140 It then-- someone decades ago wrote the code 00:35:47.140 --> 00:35:49.060 that probably uses a for loop or a while loop 00:35:49.060 --> 00:35:51.910 to compare every character in each of those strings. 00:35:51.910 --> 00:35:56.290 And it turns out it returns 0 if they are, in fact, equal. 00:35:56.290 --> 00:36:00.640 Turns out, too, it will return a positive number or a negative number 00:36:00.640 --> 00:36:02.440 in other situations. 00:36:02.440 --> 00:36:04.990 Any intuition for why it might actually be 00:36:04.990 --> 00:36:10.810 useful to have a function that allows you to check if two strings are equal? 00:36:10.810 --> 00:36:13.480 If they're not equal, what else might be interesting to know 00:36:13.480 --> 00:36:14.830 when comparing two strings? 00:36:18.474 --> 00:36:19.391 If certain values are? 00:36:19.391 --> 00:36:23.347 STUDENT: [INAUDIBLE] 00:36:23.347 --> 00:36:24.430 DAVID MALAN: OK, possibly. 00:36:24.430 --> 00:36:26.950 Maybe you want to just how similar they are. 00:36:26.950 --> 00:36:28.810 And that's indeed an algorithm unto itself. 00:36:28.810 --> 00:36:31.410 But str compare is a little simpler than that. 00:36:31.410 --> 00:36:33.040 STUDENT: [INAUDIBLE] 00:36:35.850 --> 00:36:39.100 DAVID MALAN: Exactly, if you're trying to alphabetize a whole list of strings, 00:36:39.100 --> 00:36:41.950 just like your phone probably is for your contacts or address book. 00:36:41.950 --> 00:36:44.320 It turns out that str compare will actually 00:36:44.320 --> 00:36:48.220 return a positive number or a negative number or a 0 00:36:48.220 --> 00:36:52.120 based on whether, maybe it comes alphabetically first or later, 00:36:52.120 --> 00:36:53.800 or in fact, equal. 00:36:53.800 --> 00:36:55.130 So that can be a useful thing. 00:36:55.130 --> 00:36:57.838 And that's just a teaser for a lower level explanation 00:36:57.838 --> 00:36:58.880 that we'll see next week. 00:36:58.880 --> 00:37:01.750 So now, let me cross my fingers and see if I got this right. 00:37:01.750 --> 00:37:05.410 Let me go ahead and do make search. 00:37:05.410 --> 00:37:08.590 Did compile, albeit slowly. 00:37:08.590 --> 00:37:11.920 Dot slash search, and let's search for something like the thimble. 00:37:11.920 --> 00:37:14.048 And we see that that's, indeed, found. 00:37:14.048 --> 00:37:16.090 Otherwise, let's search for something that I know 00:37:16.090 --> 00:37:19.060 isn't there, like a race car, which was there when I grew up. 00:37:19.060 --> 00:37:23.227 But huh, segmentation fault, core dumped. 00:37:23.227 --> 00:37:25.810 And actually, some of you have tripped over this error before. 00:37:25.810 --> 00:37:27.220 Anyone want to admit seeing this? 00:37:27.220 --> 00:37:29.350 So yeah, not something we've talked about, 00:37:29.350 --> 00:37:32.170 and honestly, not something I intended just now. 00:37:32.170 --> 00:37:34.450 But that too, we'll see next week. 00:37:34.450 --> 00:37:39.920 Any intuition for why my program just broke. 00:37:39.920 --> 00:37:41.900 I didn't really change the logic. 00:37:41.900 --> 00:37:43.550 It's still linear search. 00:37:43.550 --> 00:37:46.280 Let me hide the terminal so you can see all of the code at once. 00:37:46.280 --> 00:37:49.850 The only thing I did was switched from integers to strings. 00:37:49.850 --> 00:37:52.310 And I switched to str compare here. 00:37:52.310 --> 00:37:54.205 But segmentation fault happened. 00:37:54.205 --> 00:37:57.080 And the teaser is that that somehow relates to the computer's memory. 00:37:57.080 --> 00:37:57.996 Yeah. 00:37:57.996 --> 00:38:00.690 STUDENT: [INAUDIBLE] 00:38:01.470 --> 00:38:03.670 DAVID MALAN: Yeah, and this is subtle, but spot on. 00:38:03.670 --> 00:38:06.510 So one, two, three, four, five, six elements 00:38:06.510 --> 00:38:11.880 total in this array, versus the seven numbers of monopoly denominations 00:38:11.880 --> 00:38:12.810 that we had earlier. 00:38:12.810 --> 00:38:13.888 And this is where, see? 00:38:13.888 --> 00:38:15.930 Sort of case in point, this came back to bite me. 00:38:15.930 --> 00:38:19.860 The fact that I hardcoded this value as opposed to maybe separating it out 00:38:19.860 --> 00:38:22.260 as a constant or declaring it higher up, kind of 00:38:22.260 --> 00:38:26.610 bit me here, because now, I'm iterating over an array of size 6. 00:38:26.610 --> 00:38:29.820 But clearly, I'm going one step too far, because I'm literally 00:38:29.820 --> 00:38:32.250 going to iterate seven times, not six. 00:38:32.250 --> 00:38:35.580 So it's as though I'm looking at memory that's over here. 00:38:35.580 --> 00:38:37.530 And indeed, next week, we'll focus on memory. 00:38:37.530 --> 00:38:38.860 And that's just a bad thing. 00:38:38.860 --> 00:38:42.270 So odds are, not even seeing your code from this past week, if any of you 00:38:42.270 --> 00:38:46.020 have had segmentation faults, odds are, you touched memory 00:38:46.020 --> 00:38:47.280 that you shouldn't have. 00:38:47.280 --> 00:38:49.290 You maybe looped too many times. 00:38:49.290 --> 00:38:52.770 You might have used a negative number to get into your array. 00:38:52.770 --> 00:38:55.220 In general, you touched memory that you shouldn't have. 00:38:55.220 --> 00:38:57.720 And you touched a segment of memory that you shouldn't have. 00:38:57.720 --> 00:39:00.060 The fix, though, at least in my case, is simple. 00:39:00.060 --> 00:39:01.300 Just don't do that. 00:39:01.300 --> 00:39:03.210 So let me go ahead and recompile this. 00:39:03.210 --> 00:39:06.870 Make search dot slash search. 00:39:06.870 --> 00:39:10.320 And I'll search again for race car, Enter. 00:39:10.320 --> 00:39:11.850 And now it does not crash. 00:39:11.850 --> 00:39:13.630 But it does tell me it's not found. 00:39:13.630 --> 00:39:17.040 So subtle, but something you might yourself have tripped over already. 00:39:17.040 --> 00:39:23.190 Questions then, on what I just did, intentionally or otherwise. 00:39:23.190 --> 00:39:24.423 Yeah, in front. 00:39:24.423 --> 00:39:29.060 STUDENT: One thing is the program still works if you do return-- 00:39:29.060 --> 00:39:31.275 if you don't do return 0, return 1. 00:39:31.275 --> 00:39:33.220 So what is the purpose of doing [INAUDIBLE]?? 00:39:33.220 --> 00:39:34.720 DAVID MALAN: A really good question. 00:39:34.720 --> 00:39:38.920 So the program will still work even if I don't return 0 or return 1. 00:39:38.920 --> 00:39:43.120 In fact, let me go ahead and do that and just hide my terminal window 00:39:43.120 --> 00:39:43.930 for a second. 00:39:43.930 --> 00:39:46.210 Let's get rid of the return here. 00:39:46.210 --> 00:39:48.040 Let's get rid of the return here. 00:39:48.040 --> 00:39:50.810 However, watch what happens here. 00:39:50.810 --> 00:39:53.710 Let me go ahead and recompile this, make search. 00:39:53.710 --> 00:39:55.610 Let me scroll up in my code here. 00:39:55.610 --> 00:39:57.560 Let me go ahead and do dot slash search. 00:39:57.560 --> 00:40:00.280 And let me go ahead and search for the first thing in the list, 00:40:00.280 --> 00:40:02.800 battle ship, so I know that this should be found. 00:40:02.800 --> 00:40:04.690 I hit Enter. 00:40:04.690 --> 00:40:05.858 Huh, interesting. 00:40:05.858 --> 00:40:07.150 So it's saying found not found. 00:40:07.150 --> 00:40:11.496 But do you see why, logically, in this case? 00:40:11.496 --> 00:40:12.980 STUDENT: Is the loop still running? 00:40:12.980 --> 00:40:13.910 DAVID MALAN: Exactly. 00:40:13.910 --> 00:40:15.302 So the loop is still running. 00:40:15.302 --> 00:40:17.010 So there's a couple of solutions to this. 00:40:17.010 --> 00:40:21.080 I could, for instance, somehow break out of the code here. 00:40:21.080 --> 00:40:24.200 But that's going to still result in line 18 executing. 00:40:24.200 --> 00:40:26.600 I could then instead just return here. 00:40:26.600 --> 00:40:29.390 I don't strictly need to return 1 down at the bottom. 00:40:29.390 --> 00:40:31.580 But I made this claim last week that it tends 00:40:31.580 --> 00:40:35.600 to be helpful as your programs get more sophisticated, to at least signify, 00:40:35.600 --> 00:40:39.300 just like a real world programmer, error codes when something goes wrong. 00:40:39.300 --> 00:40:44.090 So returning 0 in main is the easiest way to signify my code is done. 00:40:44.090 --> 00:40:46.340 I'm ready to exit successfully, that's it. 00:40:46.340 --> 00:40:48.988 But down here, I could absolutely still return 0, 00:40:48.988 --> 00:40:50.280 because that's not a huge deal. 00:40:50.280 --> 00:40:53.870 It's not really an error that deserves annoying the user with some kind of pop 00:40:53.870 --> 00:40:55.200 up that something went wrong. 00:40:55.200 --> 00:40:57.830 But return 1 is just a lower level way of signaling, 00:40:57.830 --> 00:41:00.330 eh, it didn't really find what I was looking for. 00:41:00.330 --> 00:41:03.510 And remember from last week, you can see this as follows. 00:41:03.510 --> 00:41:08.060 If I recompile this again, now that I've reverted those changes, so make search. 00:41:08.060 --> 00:41:13.100 And if I do a dot slash search and search for battle ship, 00:41:13.100 --> 00:41:16.700 which is indeed found, recall I can execute this magical command, 00:41:16.700 --> 00:41:19.790 echo dollar sign question mark, which you're not going to often execute. 00:41:19.790 --> 00:41:22.790 But it shows you what main returned. 00:41:22.790 --> 00:41:27.770 If I run search again and search for race car, which is not found, 00:41:27.770 --> 00:41:30.530 I see not found, but I can also run this command again 00:41:30.530 --> 00:41:32.150 and see that, oh, it returned 1. 00:41:32.150 --> 00:41:35.300 So now if you fast forward a few months, a few years, when you're actually 00:41:35.300 --> 00:41:37.850 writing code in a company or for larger projects, 00:41:37.850 --> 00:41:40.100 you might want to be automating software. 00:41:40.100 --> 00:41:43.100 You might not want the human to necessarily be running it manually. 00:41:43.100 --> 00:41:47.240 You might want code to be automated by some nightly process 00:41:47.240 --> 00:41:48.360 or something like that. 00:41:48.360 --> 00:41:53.030 Using these exit codes, can a program determine yes or no 00:41:53.030 --> 00:41:55.910 that other code succeeded or failed. 00:41:55.910 --> 00:42:01.850 Other questions on linear search in this way. 00:42:01.850 --> 00:42:02.350 No? 00:42:02.350 --> 00:42:07.570 All right, well, let's translate this to one other feature of C 00:42:07.570 --> 00:42:11.590 here by incorporating these two ideas now into one other program. 00:42:11.590 --> 00:42:16.605 So I'm going to create a phone book in C by doing code space phonebook dot C. 00:42:16.605 --> 00:42:18.730 And let's combine some of these ideas and implement 00:42:18.730 --> 00:42:21.700 this notion of searching a phonebook for an actual name 00:42:21.700 --> 00:42:23.030 and getting back a number. 00:42:23.030 --> 00:42:26.500 So I'm going to go ahead and quickly include some of the same things, cs50.h 00:42:26.500 --> 00:42:28.060 so we can get input. 00:42:28.060 --> 00:42:30.860 standard io dot h so we can print output. 00:42:30.860 --> 00:42:33.310 And I'm going to preemptively include string.h 00:42:33.310 --> 00:42:35.320 in case we need that one as well. 00:42:35.320 --> 00:42:39.010 int main void, no need for command line arguments today. 00:42:39.010 --> 00:42:42.650 And let me give myself, now, an array of names for this phone book. 00:42:42.650 --> 00:42:45.040 So string names equals. 00:42:45.040 --> 00:42:47.620 And then in curly braces, how about Carter 00:42:47.620 --> 00:42:50.840 will be one person in the phone book, and David, myself, will be the other. 00:42:50.840 --> 00:42:53.465 So we'll keep it short so we don't have to type too many names. 00:42:53.465 --> 00:42:55.840 But this is a phone book with two people thus far. 00:42:55.840 --> 00:42:59.628 Suppose, now, we want to also store Carter's phone number in mind. 00:42:59.628 --> 00:43:01.420 So it's not just saying found or not found. 00:43:01.420 --> 00:43:05.320 It's literally looking up our phone numbers like a proper phone book. 00:43:05.320 --> 00:43:09.440 Well, at the moment, there's really no way to do this. 00:43:09.440 --> 00:43:15.460 I could do something hackish like I could put a number like 617-495-1000 00:43:15.460 --> 00:43:16.510 after Carter. 00:43:16.510 --> 00:43:22.460 I could maybe do something like 949-468-2750 after me. 00:43:22.460 --> 00:43:25.300 But now you're kind of doing the whole apples and oranges thing. 00:43:25.300 --> 00:43:26.470 Now, it's not strings. 00:43:26.470 --> 00:43:28.420 It's a string int, string int. 00:43:28.420 --> 00:43:31.240 All right, so maybe I could just make all of these strings. 00:43:31.240 --> 00:43:34.600 But now it's just a conceptual mixing of apples and oranges. 00:43:34.600 --> 00:43:36.425 Like yes, that's an array of four strings. 00:43:36.425 --> 00:43:39.550 But now you're on the honor system to know that the first string is a name, 00:43:39.550 --> 00:43:42.160 the second string is a number, the third string is-- 00:43:42.160 --> 00:43:43.100 you can do it. 00:43:43.100 --> 00:43:45.110 But it's a bit of a hack, so to speak. 00:43:45.110 --> 00:43:47.300 So what might be cleaner than this? 00:43:47.300 --> 00:43:51.070 Instead of combining our phone numbers into the same array as our names, 00:43:51.070 --> 00:43:55.480 what else might we do that's perhaps a little better? 00:43:55.480 --> 00:43:56.440 Say it little louder. 00:43:58.960 --> 00:44:01.197 A 2D array, possibly something we could do. 00:44:01.197 --> 00:44:04.030 I'm going to keep it even simpler now, because we haven't used those 00:44:04.030 --> 00:44:07.823 by name, even though that is, we saw last week, technically what argv is. 00:44:07.823 --> 00:44:10.240 What else could I do if I want to store names and numbers? 00:44:10.240 --> 00:44:11.147 Yeah. 00:44:11.147 --> 00:44:12.220 STUDENT: [INAUDIBLE] 00:44:12.220 --> 00:44:13.690 DAVID MALAN: Yeah, let me go with this suggestion. 00:44:13.690 --> 00:44:14.607 It's a little simpler. 00:44:14.607 --> 00:44:17.440 Rather than complicate things in literally different dimensions, 00:44:17.440 --> 00:44:18.970 let me go ahead and do string. 00:44:18.970 --> 00:44:21.730 Well, I could do int numbers. 00:44:21.730 --> 00:44:22.690 But you know what? 00:44:22.690 --> 00:44:27.700 So that we can support punctuation like dashes or even parentheses or country 00:44:27.700 --> 00:44:29.200 codes, I'm going to do this instead. 00:44:29.200 --> 00:44:33.760 I'm going to do string numbers so that I can represent Carter's number as quote 00:44:33.760 --> 00:44:39.040 unquote plus 1 for the US, 617-495-1000, complete with hyphens, 00:44:39.040 --> 00:44:40.390 as is US convention. 00:44:40.390 --> 00:44:47.930 And then for mine I'll go ahead and do +1-949-468-2750 semicolon. 00:44:47.930 --> 00:44:51.610 And now down below, let's actually enable the user to search this phone 00:44:51.610 --> 00:44:53.860 book, just like in week 0 we did. 00:44:53.860 --> 00:44:55.960 String name equals get string. 00:44:55.960 --> 00:44:58.960 And let's ask the user for a name, presumably David or Carter 00:44:58.960 --> 00:44:59.990 or someone else. 00:44:59.990 --> 00:45:01.850 And now let's re-implement linear search. 00:45:01.850 --> 00:45:05.920 So 4, int i get 0. i is less than 2. 00:45:05.920 --> 00:45:07.510 And do as I say, not as I do. 00:45:07.510 --> 00:45:09.700 I think we should beware this hard coding, 00:45:09.700 --> 00:45:12.010 but we'll keep it simple for now. 00:45:12.010 --> 00:45:13.220 i++. 00:45:13.220 --> 00:45:16.390 And then in this for loop, I think we have all of the ingredients 00:45:16.390 --> 00:45:17.150 to solve this. 00:45:17.150 --> 00:45:23.440 So if the return value of str compare of all of the names bracket 00:45:23.440 --> 00:45:28.810 i comparing against the name that the human typed in, if all of that equals 00:45:28.810 --> 00:45:33.380 equals 0, that is, all of the characters in those two strings are equal, 00:45:33.380 --> 00:45:36.770 then I think we can go ahead and say found, just like last time. 00:45:36.770 --> 00:45:37.520 But you know what? 00:45:37.520 --> 00:45:40.130 Let's actually print Carter's or my phone number. 00:45:40.130 --> 00:45:44.770 So found percent s, and we'll plug in numbers, bracket i. 00:45:44.770 --> 00:45:47.800 And then just for consistency, I'll return 0 here. 00:45:47.800 --> 00:45:52.210 And down here, how about I'll say something like printf not 00:45:52.210 --> 00:45:53.600 found, just to be clear. 00:45:53.600 --> 00:45:56.240 And then I'll return 1 as well. 00:45:56.240 --> 00:45:58.120 So just to recap, here's all of the code. 00:45:58.120 --> 00:46:01.610 It's almost the same as before, except now it's useful. 00:46:01.610 --> 00:46:03.460 I'm not just saying found or not found. 00:46:03.460 --> 00:46:07.180 I found a number in monopoly, or I found a piece in monopoly. 00:46:07.180 --> 00:46:09.880 I'm looking up in one array, one of the strings. 00:46:09.880 --> 00:46:12.730 And then I'm printing from the other array, the answer. 00:46:12.730 --> 00:46:19.480 So let me go ahead here and run the compiler, make phone book, Enter. 00:46:19.480 --> 00:46:21.070 OK, that's promising, no errors. 00:46:21.070 --> 00:46:22.720 Dot slash phonebook now. 00:46:22.720 --> 00:46:26.350 And let's search, for instance, Carter Enter. 00:46:26.350 --> 00:46:28.060 All right, so we found Carter's number. 00:46:28.060 --> 00:46:29.393 All right, let me do that again. 00:46:29.393 --> 00:46:30.960 Phone book, let's search for David. 00:46:30.960 --> 00:46:32.960 All right, we seem to have found David's number. 00:46:32.960 --> 00:46:34.502 All right, let's do it one last time. 00:46:34.502 --> 00:46:35.410 Phone book, Enter. 00:46:35.410 --> 00:46:37.360 And now we'll search for John Harvard. 00:46:37.360 --> 00:46:40.060 Enter, not found. 00:46:40.060 --> 00:46:45.520 All right, so I daresay, albeit with minimal testing, this code is correct. 00:46:45.520 --> 00:46:48.190 Would anyone now like to critique the design? 00:46:48.190 --> 00:46:51.910 Does something rub you the wrong way, perhaps, about this approach here? 00:46:55.120 --> 00:46:58.180 And as always, think about how, if the program maybe 00:46:58.180 --> 00:47:01.510 gets longer, more complicated, how decisions like this might unfold. 00:47:01.510 --> 00:47:02.448 Yeah. 00:47:02.448 --> 00:47:04.400 STUDENT: If i is less than 2. 00:47:04.400 --> 00:47:07.820 DAVID MALAN: OK, so if i is less than 2, so technically, 00:47:07.820 --> 00:47:10.080 if I change the number of people in this phone book, 00:47:10.080 --> 00:47:11.330 I'm going to have to update i. 00:47:11.330 --> 00:47:13.290 And we've already seen that I get myself into trouble. 00:47:13.290 --> 00:47:14.165 So that's bad design. 00:47:14.165 --> 00:47:15.005 Good. 00:47:15.005 --> 00:47:17.795 STUDENT: Say you add someone's name to the phonebook, 00:47:17.795 --> 00:47:20.710 but you don't have the corresponding number. 00:47:20.710 --> 00:47:23.380 So then when you go to pull their number, 00:47:23.380 --> 00:47:24.730 it [INAUDIBLE] someone's number. 00:47:24.730 --> 00:47:25.480 DAVID MALAN: Yeah. 00:47:25.480 --> 00:47:28.180 So again, I'm sort of trusting myself not to screw up. 00:47:28.180 --> 00:47:30.850 If I add John or anyone else to the first array 00:47:30.850 --> 00:47:34.180 but I forget to add their number to the second array, 00:47:34.180 --> 00:47:36.640 eventually things are going to drift and be inconsistent. 00:47:36.640 --> 00:47:39.010 And then code will be incorrect at that point. 00:47:39.010 --> 00:47:43.420 So sort of a poor design setting me up for future failure, if you will. 00:47:43.420 --> 00:47:44.860 Other thoughts? 00:47:44.860 --> 00:47:45.460 Yeah. 00:47:45.460 --> 00:47:49.816 STUDENT: [INAUDIBLE] so if you were to switch the order of the numbers 00:47:49.816 --> 00:47:52.848 but not the main [INAUDIBLE] 00:47:52.848 --> 00:47:54.140 DAVID MALAN: Yeah, really good. 00:47:54.140 --> 00:47:55.550 We're assuming the same order. 00:47:55.550 --> 00:47:59.452 From left to right, the names go, and from left to right, the numbers go. 00:47:59.452 --> 00:48:01.160 But that's kind of just the honor system. 00:48:01.160 --> 00:48:03.440 Like, there's literally nothing in code preventing me 00:48:03.440 --> 00:48:07.047 from reversing the order for whatever reason, or maybe sorting the names. 00:48:07.047 --> 00:48:10.130 Like, they're sorted now, and maybe that's deliberate, but maybe it's not. 00:48:10.130 --> 00:48:12.920 So this honor system here, too, is just not good. 00:48:12.920 --> 00:48:17.180 I could put a comment in here to remind myself, note to self, 00:48:17.180 --> 00:48:19.490 always update arrays the same way. 00:48:19.490 --> 00:48:21.600 But like, something's going to happen eventually, 00:48:21.600 --> 00:48:26.090 especially when we have not two, but three, but 30, 300 names and numbers. 00:48:26.090 --> 00:48:29.670 It would be nice to keep all of the related data together. 00:48:29.670 --> 00:48:32.810 And so in fact, the one new feature of C we'll introduce today 00:48:32.810 --> 00:48:37.970 is one that actually allows us to implement our very own data structures. 00:48:37.970 --> 00:48:41.870 You can think of arrays as a very lightweight data 00:48:41.870 --> 00:48:44.870 structure, in that it allows you to cluster related data back 00:48:44.870 --> 00:48:45.930 to back to back to back. 00:48:45.930 --> 00:48:48.170 And this is how strings are implemented. 00:48:48.170 --> 00:48:51.560 They are a data structure effectively implemented with an array. 00:48:51.560 --> 00:48:54.290 But with C and with other languages, it turns out 00:48:54.290 --> 00:48:56.600 you can invent your own data types, whether they're 00:48:56.600 --> 00:48:59.870 one dimensional, two dimensional even, or beyond. 00:48:59.870 --> 00:49:05.960 And with C, can you specifically create your own types 00:49:05.960 --> 00:49:07.200 that have their own names? 00:49:07.200 --> 00:49:10.670 So for instance, wouldn't it have been nice if C came with, 00:49:10.670 --> 00:49:16.380 not just char and int and floats and long and others. 00:49:16.380 --> 00:49:19.970 Wouldn't it be nice if C came with a data type called person? 00:49:19.970 --> 00:49:22.790 And ideally, a person would have a name and a number. 00:49:22.790 --> 00:49:24.860 Now, that's a little naive and unrealistic. 00:49:24.860 --> 00:49:28.460 Like, why would they define a person to have just those two fields. 00:49:28.460 --> 00:49:30.950 Certainly, people could have disagreed what a person is. 00:49:30.950 --> 00:49:32.300 So they leave it to us. 00:49:32.300 --> 00:49:35.900 The authors of C gave us all of these primitives, ints and floats and strings 00:49:35.900 --> 00:49:36.810 and so forth. 00:49:36.810 --> 00:49:39.810 But it's up to us now to use those in a more interesting way 00:49:39.810 --> 00:49:44.940 so that we can create an array of person variables, if you will, 00:49:44.940 --> 00:49:48.150 inside of an array called people, just to pluralize it here. 00:49:48.150 --> 00:49:49.740 So how are we going to do this? 00:49:49.740 --> 00:49:54.140 Well, for now, let's just stipulate that a person in the world 00:49:54.140 --> 00:49:57.260 will have a name and a number that we could argue all day long what else 00:49:57.260 --> 00:49:58.010 a person should have. 00:49:58.010 --> 00:49:58.677 And that's fine. 00:49:58.677 --> 00:50:01.790 You can invent your own person eventually. 00:50:01.790 --> 00:50:04.610 At the moment, I'm using just two variables 00:50:04.610 --> 00:50:06.500 to define a person's name and number. 00:50:06.500 --> 00:50:10.490 But wouldn't it be nice to encapsulate, that is, combine these two data 00:50:10.490 --> 00:50:14.660 types, into a new and improved data type called person. 00:50:14.660 --> 00:50:17.360 And the syntax for that is going to be this. 00:50:17.360 --> 00:50:18.800 So it's a bit of a mouthful. 00:50:18.800 --> 00:50:21.960 But you can, perhaps, infer what some of this is doing here. 00:50:21.960 --> 00:50:24.500 So it turns out C has a keyword called typedef. 00:50:24.500 --> 00:50:28.310 As the name kind of suggests, this allows you to define your own type. 00:50:28.310 --> 00:50:31.550 Struct is an indication that it's a structure. 00:50:31.550 --> 00:50:35.060 It's like a structure that has multiple values inside of it 00:50:35.060 --> 00:50:36.710 that you are trying to define. 00:50:36.710 --> 00:50:39.440 And then at the very bottom here outside of the curly braces, 00:50:39.440 --> 00:50:42.270 is the name of the type that you want to create. 00:50:42.270 --> 00:50:45.790 So you don't have discretion over using typedef or struct 00:50:45.790 --> 00:50:46.790 in this particular case. 00:50:46.790 --> 00:50:48.665 But you can name the thing whatever you want. 00:50:48.665 --> 00:50:52.590 And you can put anything in the structure that you want as well. 00:50:52.590 --> 00:50:56.510 And as soon as the semicolon is executed at the bottom of the code, 00:50:56.510 --> 00:50:59.180 every line thereafter can now have access 00:50:59.180 --> 00:51:05.760 to a person data type, whether as a single variable, or as an entire array. 00:51:05.760 --> 00:51:10.260 So if I want to build on this then, let me go ahead and do this. 00:51:10.260 --> 00:51:12.230 Let me go back to my C code here. 00:51:12.230 --> 00:51:17.610 And I'm going to go ahead and change just a couple of things. 00:51:17.610 --> 00:51:19.110 Let's go ahead and do this. 00:51:19.110 --> 00:51:23.240 I'm going to go ahead and, first, get rid of those two hardcoded arrays. 00:51:23.240 --> 00:51:27.180 And let me go ahead and, at the top of my file, 00:51:27.180 --> 00:51:30.180 invent this type, so typedef struct. 00:51:30.180 --> 00:51:34.470 Inside of it will be a string name and then a string number. 00:51:34.470 --> 00:51:36.780 And then the name of the structure will be person. 00:51:36.780 --> 00:51:40.025 And best practice would have me define it at the very top of my file 00:51:40.025 --> 00:51:42.150 so that any of my functions, in fact, could use it, 00:51:42.150 --> 00:51:44.530 even though I just have main in this case. 00:51:44.530 --> 00:51:47.100 Now, if I wanted, I could do this. 00:51:47.100 --> 00:51:50.370 Person P1 and person P2. 00:51:50.370 --> 00:51:53.040 But we know from last week, that already is bad design. 00:51:53.040 --> 00:51:57.400 If you want to have multiple instances of the same type of variable, 00:51:57.400 --> 00:52:00.044 we should probably use what instead? 00:52:00.044 --> 00:52:01.046 STUDENT: [INAUDIBLE] 00:52:01.046 --> 00:52:01.796 DAVID MALAN: And-- 00:52:01.796 --> 00:52:02.470 STUDENT: An array. 00:52:02.470 --> 00:52:03.637 DAVID MALAN: Yeah, an array. 00:52:03.637 --> 00:52:05.230 So let me not even go down that road. 00:52:05.230 --> 00:52:06.700 Let me instead just do this. 00:52:06.700 --> 00:52:09.727 Person will be the type of the array. 00:52:09.727 --> 00:52:10.810 But I'm going to call it-- 00:52:10.810 --> 00:52:11.980 I could call it persons. 00:52:11.980 --> 00:52:13.720 But in English, we typically say people. 00:52:13.720 --> 00:52:15.190 So I'll call the array people. 00:52:15.190 --> 00:52:18.110 And I want two people to exist in this array, 00:52:18.110 --> 00:52:20.920 though I could certainly change that number to be anything I want. 00:52:20.920 --> 00:52:24.820 How, now, do you put a name inside of a person 00:52:24.820 --> 00:52:27.190 and then put the number inside of that same person? 00:52:27.190 --> 00:52:28.990 Well, slightly new syntax today. 00:52:28.990 --> 00:52:30.520 I'm going to go ahead and say this. 00:52:30.520 --> 00:52:34.420 People bracket 0 just gives me the first person in the array. 00:52:34.420 --> 00:52:35.570 That's not new. 00:52:35.570 --> 00:52:40.840 But if you want to go inside of that person in memory, you use a dot. 00:52:40.840 --> 00:52:44.870 And then you just specify the name of the attribute therein. 00:52:44.870 --> 00:52:47.410 So if I want to set the first person's name to Carter, 00:52:47.410 --> 00:52:49.480 I just use that so-called dot notation. 00:52:49.480 --> 00:52:52.780 And then if I want to set Carter's number using dot notation, 00:52:52.780 --> 00:52:56.680 I would do this, +1-617-495-1000. 00:52:56.680 --> 00:52:58.880 And then if I want to do the same for myself, 00:52:58.880 --> 00:53:03.730 I would now do people bracket 1 dot name equals quote unquote David. 00:53:03.730 --> 00:53:08.440 And then people bracket 1 still dot number equals quote unquote 00:53:08.440 --> 00:53:13.030 +1-949-468-2750. 00:53:13.030 --> 00:53:15.970 And now, at the bottom of my file, I think 00:53:15.970 --> 00:53:18.610 my logic can pretty much stay the same. 00:53:18.610 --> 00:53:22.180 I can still, on this line here, prompt the user 00:53:22.180 --> 00:53:24.370 for the name of the person they want to look up. 00:53:24.370 --> 00:53:26.620 For now, even though I admit it's not the best design, 00:53:26.620 --> 00:53:28.495 I'm just doing this for demonstration's sake, 00:53:28.495 --> 00:53:31.360 I'm going to leave the two there, because I know I have two people. 00:53:31.360 --> 00:53:34.100 But down here, this is going to have to change. 00:53:34.100 --> 00:53:37.000 I don't want to compare names bracket i anymore. 00:53:37.000 --> 00:53:42.190 What do I want to type here as the first argument to str compare? 00:53:42.190 --> 00:53:43.900 What do I want to do here? 00:53:43.900 --> 00:53:44.960 Yeah. 00:53:44.960 --> 00:53:46.800 STUDENT: People i dot name. 00:53:46.800 --> 00:53:49.140 DAVID MALAN: So people i dot name, yeah. 00:53:49.140 --> 00:53:52.938 So I want to go into the people array at the ith location, 00:53:52.938 --> 00:53:54.480 because that's what my loop is doing. 00:53:54.480 --> 00:53:55.890 It's updating i again and again. 00:53:55.890 --> 00:53:58.087 And then look at name, and that's good. 00:53:58.087 --> 00:53:59.670 I think now I need to change this too. 00:53:59.670 --> 00:54:01.890 What do I want to print if the person is found? 00:54:01.890 --> 00:54:02.445 Someone else? 00:54:05.070 --> 00:54:08.850 What do I want to print here, if I found the person's name? 00:54:08.850 --> 00:54:09.360 Yeah. 00:54:09.360 --> 00:54:10.890 STUDENT: [INAUDIBLE] 00:54:10.890 --> 00:54:12.390 DAVID MALAN: Say it a little louder. 00:54:12.390 --> 00:54:13.795 STUDENT: People i dot number. 00:54:13.795 --> 00:54:14.670 DAVID MALAN: Perfect. 00:54:14.670 --> 00:54:17.460 So people bracket i dot number, if indeed I 00:54:17.460 --> 00:54:20.310 want to print the corresponding number to this person. 00:54:20.310 --> 00:54:22.930 And then I think the rest of my code can stay the same. 00:54:22.930 --> 00:54:27.150 So let me go ahead and rerun make phone book to recompile this version. 00:54:27.150 --> 00:54:28.170 So far so good. 00:54:28.170 --> 00:54:29.400 Dot slash phone book. 00:54:29.400 --> 00:54:31.598 Let's go ahead and type in Carter's name, found. 00:54:31.598 --> 00:54:33.390 All right, let's go ahead and run it again. 00:54:33.390 --> 00:54:35.273 David's name, found. 00:54:35.273 --> 00:54:36.940 Let's go ahead and run it one more time. 00:54:36.940 --> 00:54:40.260 Type in John Harvard, for instance, not found, in this case. 00:54:40.260 --> 00:54:43.710 So fundamentally, the code isn't all that different. 00:54:43.710 --> 00:54:46.090 Linear search is still behaving the same way. 00:54:46.090 --> 00:54:48.690 And I admit, this is kind of ugly looking. 00:54:48.690 --> 00:54:52.350 We've kind of made a two line solution five lines of code now. 00:54:52.350 --> 00:54:56.640 But if we fast forward a week or two when we start saving information 00:54:56.640 --> 00:55:01.500 to files, we'll introduce you to files like csv files, 00:55:01.500 --> 00:55:03.892 comma separated values, or spreadsheet files, which 00:55:03.892 --> 00:55:06.600 you've surely opened on your Mac or PC at some point in the past. 00:55:06.600 --> 00:55:09.840 Suffice it to say we'll soon learn techniques for storing information, 00:55:09.840 --> 00:55:11.790 like names and numbers, in files. 00:55:11.790 --> 00:55:13.680 And at that point, we're not going to do any 00:55:13.680 --> 00:55:15.990 of this hackish sort of hard coding of the number 2 00:55:15.990 --> 00:55:19.080 and manually typing my name and Carter's name and number into our program. 00:55:19.080 --> 00:55:21.750 We'll read the information dynamically from a file. 00:55:21.750 --> 00:55:25.180 And in a few weeks, we'll read it dynamically from a database instead. 00:55:25.180 --> 00:55:30.240 But this is, for now, just syntactically how we can create an array of size 2 00:55:30.240 --> 00:55:32.190 containing one person each. 00:55:32.190 --> 00:55:34.643 We can update the name and number of the first person, 00:55:34.643 --> 00:55:36.810 update the name and the number of the second person, 00:55:36.810 --> 00:55:40.500 and then later search across those names and print out 00:55:40.500 --> 00:55:41.610 the corresponding numbers. 00:55:41.610 --> 00:55:44.220 And in this sense, this is a better design. 00:55:44.220 --> 00:55:44.730 Why? 00:55:44.730 --> 00:55:48.690 Because my person data type encapsulates, 00:55:48.690 --> 00:55:53.400 now, everything that it means to be a person, at least in this narrow world. 00:55:53.400 --> 00:55:57.580 And if I want to add something to the notion of a person, for instance, 00:55:57.580 --> 00:55:59.640 I could go up to my type def, and tomorrow, 00:55:59.640 --> 00:56:03.743 add an address to every person and start reading that in as well. 00:56:03.743 --> 00:56:05.160 And now it's not the honor system. 00:56:05.160 --> 00:56:10.260 It's not a names array, a numbers array, an addresses array, and everything else 00:56:10.260 --> 00:56:12.210 you might imagine related to a person. 00:56:12.210 --> 00:56:17.223 It's all encapsulated, which is a term of art inside of the same type. 00:56:17.223 --> 00:56:19.890 Reminiscent, if some of you have programmed before, of something 00:56:19.890 --> 00:56:21.660 called object-oriented programming. 00:56:21.660 --> 00:56:23.190 But we're not there yet. 00:56:23.190 --> 00:56:24.690 C is not that. 00:56:24.690 --> 00:56:29.280 Questions on this use of struct or this new syntax, 00:56:29.280 --> 00:56:35.037 the dot operator being really the juicy part here. 00:56:35.037 --> 00:56:35.620 Any questions? 00:56:35.620 --> 00:56:36.522 Yeah. 00:56:36.522 --> 00:56:39.414 STUDENT: [INAUDIBLE] 00:56:42.800 --> 00:56:44.420 DAVID MALAN: On what line number? 00:56:44.420 --> 00:56:46.063 STUDENT: 16. 00:56:46.063 --> 00:56:46.730 DAVID MALAN: 16? 00:56:46.730 --> 00:56:51.230 So yes, so syntactically, we introduced the square brackets last week. 00:56:51.230 --> 00:56:55.310 So doing people bracket 0 just means go to the first person in the array. 00:56:55.310 --> 00:56:58.400 That was like when Stephanie literally opened this door. 00:56:58.400 --> 00:56:59.990 That's doors bracket 0. 00:56:59.990 --> 00:57:02.330 But this is, of course, people bracket 0 instead. 00:57:02.330 --> 00:57:04.580 Today, the dot is a new piece of syntax. 00:57:04.580 --> 00:57:11.000 It means go inside of that person in memory and look at the name they're in 00:57:11.000 --> 00:57:13.297 and set it equal to Carter and do the same for number. 00:57:13.297 --> 00:57:13.880 So that's all. 00:57:13.880 --> 00:57:16.340 It's like, open the locker door, go inside of it, 00:57:16.340 --> 00:57:18.410 and check or set the name and the number. 00:57:18.410 --> 00:57:19.040 Yeah. 00:57:19.040 --> 00:57:29.280 STUDENT: [INAUDIBLE] can you set default values for each of the [INAUDIBLE]?? 00:57:29.280 --> 00:57:30.840 DAVID MALAN: Attributes is fine. 00:57:30.840 --> 00:57:31.530 Good question. 00:57:31.530 --> 00:57:34.050 In the struct, can you set default values? 00:57:34.050 --> 00:57:35.100 Short answer, no. 00:57:35.100 --> 00:57:39.000 And this is where C becomes less feature-able than more modern languages 00:57:39.000 --> 00:57:42.580 like Python and Java and others, where you can, in fact, do that. 00:57:42.580 --> 00:57:44.890 So when we transition to Python in a few weeks' time, 00:57:44.890 --> 00:57:47.140 we'll see how we can start solving problems like that. 00:57:47.140 --> 00:57:50.100 But for now, it's up to you to initialize name and number 00:57:50.100 --> 00:57:51.450 to something. 00:57:51.450 --> 00:57:52.832 Yeah. 00:57:52.832 --> 00:57:55.540 STUDENT: [INAUDIBLE] 00:58:04.123 --> 00:58:05.540 DAVID MALAN: Really good question. 00:58:05.540 --> 00:58:08.470 How can we adjust or critique the design of what I'm doing? 00:58:08.470 --> 00:58:12.190 This is one of the few situations where I would say, hypocritically, 00:58:12.190 --> 00:58:13.780 do as I say, not as I do. 00:58:13.780 --> 00:58:17.710 I am using pretty ugly lines like this, just to introduce the syntax. 00:58:17.710 --> 00:58:20.428 But my claim, pedagogically today, is that eventually, 00:58:20.428 --> 00:58:22.720 when we start storing names and numbers or other things 00:58:22.720 --> 00:58:26.230 in files or in databases, you won't have this redundancy. 00:58:26.230 --> 00:58:29.080 You'll have one line of code or two lines of code that 00:58:29.080 --> 00:58:31.660 read the information from the file or database 00:58:31.660 --> 00:58:34.630 and then fill the entire array with that data. 00:58:34.630 --> 00:58:37.330 For now, I'm just doing it manually so as to keep our focus only 00:58:37.330 --> 00:58:39.400 on the new syntax, but that's it. 00:58:39.400 --> 00:58:42.640 So forgive the bad design by design today. 00:58:42.640 --> 00:58:45.740 Other questions on this? 00:58:45.740 --> 00:58:47.595 All right, that's been a lot already. 00:58:47.595 --> 00:58:50.470 Why don't we go ahead and take our 10 minute break with snacks first. 00:58:50.470 --> 00:58:53.020 We have some delightful brownies in the lobby. 00:58:53.020 --> 00:58:55.900 All right, we are back. 00:58:55.900 --> 00:58:59.890 And up until now, it clearly seems to be a good thing if your data is sorted, 00:58:59.890 --> 00:59:02.350 because you can use binary search. 00:59:02.350 --> 00:59:05.540 You know a little something more about the data. 00:59:05.540 --> 00:59:07.570 But it turns out that sorting in and of itself 00:59:07.570 --> 00:59:10.420 is kind of a problem to solve too. 00:59:10.420 --> 00:59:14.830 And you might think, well, if sorting is going 00:59:14.830 --> 00:59:18.070 to be pretty fast, we absolutely should do it before we start searching, 00:59:18.070 --> 00:59:20.237 because that will just speed up all of our searches. 00:59:20.237 --> 00:59:22.960 But if sorting is slow, that kind of invites the question, well, 00:59:22.960 --> 00:59:25.420 should we bother sorting our data if we're only 00:59:25.420 --> 00:59:28.090 going to search the data maybe once, maybe twice? 00:59:28.090 --> 00:59:30.550 And so here is going to be, potentially, a trade off. 00:59:30.550 --> 00:59:33.250 So let's consider what it means really to sort data. 00:59:33.250 --> 00:59:35.950 In our case, it's just going to be simple and use numbers. 00:59:35.950 --> 00:59:38.230 But it might, in the case of the Googles of the world, 00:59:38.230 --> 00:59:40.880 be actual web pages or persons or the like. 00:59:40.880 --> 00:59:46.090 So here is our typical picture for sorting, for solving any problem. 00:59:46.090 --> 00:59:48.190 Input at left and output at right. 00:59:48.190 --> 00:59:54.340 The input to our sort problem is going to be some unsorted set of values. 00:59:54.340 --> 00:59:57.940 And the output, ideally, will be the same set of values sorted. 00:59:57.940 --> 01:00:00.550 And if we do this concretely, let's suppose 01:00:00.550 --> 01:00:04.786 that we want to go about sorting this list of numbers, 7, 2, 5, 4, 1, 6, 0, 01:00:04.786 --> 01:00:05.860 3. 01:00:05.860 --> 01:00:07.810 So it's all of the numbers from 0 to 7. 01:00:07.810 --> 01:00:09.757 But they're somehow jumbled up randomly. 01:00:09.757 --> 01:00:11.590 That's going to be the input to the problem. 01:00:11.590 --> 01:00:13.840 And the goal is now to sort those so that you, indeed, 01:00:13.840 --> 01:00:17.990 get out 0, 1, 2, 3, 4, 5, 6, 7 instead. 01:00:17.990 --> 01:00:20.440 So it turns out there's lots of different ways 01:00:20.440 --> 01:00:23.900 we can actually sort numbers like these here. 01:00:23.900 --> 01:00:27.430 And in fact, just to complement our search example earlier, 01:00:27.430 --> 01:00:29.952 could we perhaps quickly get some eight volunteers 01:00:29.952 --> 01:00:32.410 to come up if you're comfortable appearing on the internet? 01:00:32.410 --> 01:00:39.100 If you want to do 1, 2, 3, 4, 5, 6, 7, 8, how about? 01:00:39.100 --> 01:00:40.255 All right, come on down. 01:00:45.040 --> 01:00:47.970 All right. 01:00:47.970 --> 01:00:50.560 Come on over here, and I'll give you each a number. 01:00:50.560 --> 01:00:54.240 And if you want to start to organize yourselves in the same order 01:00:54.240 --> 01:00:58.390 you see the numbers on the board. 01:00:58.390 --> 01:01:01.530 So look up on the overhead and organize yourselves from left 01:01:01.530 --> 01:01:04.460 to right in that same order. 01:01:04.460 --> 01:01:06.210 And let's have the first of you-- perfect. 01:01:06.210 --> 01:01:10.420 If you want to come right over here, how about right in line with this? 01:01:10.420 --> 01:01:13.990 All right, and a few more numbers. 01:01:13.990 --> 01:01:14.980 All right. 01:01:14.980 --> 01:01:19.810 Number 2, 6, and perfect. 01:01:19.810 --> 01:01:21.625 Just the right number, all right. 01:01:21.625 --> 01:01:22.858 Uh oh. 01:01:22.858 --> 01:01:24.400 All right, there we go, number three. 01:01:24.400 --> 01:01:24.968 All right. 01:01:24.968 --> 01:01:26.260 So let's just do a quick check. 01:01:26.260 --> 01:01:30.867 We have 7, 2, 5, 4, 1, 6, 0, 3, very good so far. 01:01:30.867 --> 01:01:32.950 Do you want to just scootch a little this way just 01:01:32.950 --> 01:01:34.510 to make a little more room? 01:01:34.510 --> 01:01:38.090 All right, and let's consider now who we have here on stage. 01:01:38.090 --> 01:01:40.780 You want to each say a quick hello to the audience? 01:01:40.780 --> 01:01:42.070 RYAN: Hi, my name is Ryan. 01:01:42.070 --> 01:01:45.597 I'm a first year from Pennypacker. 01:01:45.597 --> 01:01:46.930 ITSELLE: Hi, my name is Itselle. 01:01:46.930 --> 01:01:49.177 I'm a first year at Strauss. 01:01:49.177 --> 01:01:50.260 LUCY: Hi, my name is Lucy. 01:01:50.260 --> 01:01:52.400 And I'm a first year from Greenough. 01:01:52.400 --> 01:01:53.650 SHILOH: Hi, my name is Shiloh. 01:01:53.650 --> 01:01:55.927 I'm a first year in Wigglesworth. 01:01:55.927 --> 01:01:57.010 JACK: Hi, my name is Jack. 01:01:57.010 --> 01:01:59.877 And I'm a first year in Strauss. 01:01:59.877 --> 01:02:01.210 KATHRYN: Hi, my name is Kathryn. 01:02:01.210 --> 01:02:02.787 I'm a first year at Strauss. 01:02:02.787 --> 01:02:04.120 MICHAEL: Hi, my name is Michael. 01:02:04.120 --> 01:02:06.063 I'm a first year at Pennypacker. 01:02:06.063 --> 01:02:07.480 MUHAMMAD: Hi, my name is Muhammad. 01:02:07.480 --> 01:02:09.047 I'm a first year in Matthews. 01:02:09.047 --> 01:02:10.630 DAVID MALAN: Hi, nice, welcome aboard. 01:02:10.630 --> 01:02:11.240 All right. 01:02:11.240 --> 01:02:16.510 So let's consider, now, how we might go about sorting our kind volunteers here, 01:02:16.510 --> 01:02:21.160 the goal being to get them into order from smallest to largest so that, 01:02:21.160 --> 01:02:24.160 presumably then, we can use something smarter than just linear search. 01:02:24.160 --> 01:02:26.350 We can actually use binary search, assuming 01:02:26.350 --> 01:02:27.878 that they are already then sorted. 01:02:27.878 --> 01:02:30.670 So let me propose that we first consider an algorithm that actually 01:02:30.670 --> 01:02:32.600 has a name called selection sort. 01:02:32.600 --> 01:02:36.220 And selection sort is going to be one that literally has me, 01:02:36.220 --> 01:02:40.060 or really you, as the programmer, selecting the smallest element again 01:02:40.060 --> 01:02:43.610 and again, and then putting them into the appropriate place. 01:02:43.610 --> 01:02:47.115 So let me go ahead and start this here, starting with the number 7. 01:02:47.115 --> 01:02:49.240 At the moment, 7 is the smallest number I've found. 01:02:49.240 --> 01:02:52.610 So I'm going to make mental note of that with a mental variable, if you will. 01:02:52.610 --> 01:02:53.710 I'm going to move on now. 01:02:53.710 --> 01:02:55.460 Number 2 is obviously smaller, so I'm just 01:02:55.460 --> 01:02:58.360 going to update my mental reminder that 2 is now the smallest, 01:02:58.360 --> 01:03:01.555 effectively forgetting, for now, number 7. 01:03:01.555 --> 01:03:02.440 5, not smaller. 01:03:02.440 --> 01:03:03.370 4, not smaller. 01:03:03.370 --> 01:03:04.170 1, smaller. 01:03:04.170 --> 01:03:05.920 And I'm going to make mental note of that. 01:03:05.920 --> 01:03:07.030 6, not smaller. 01:03:07.030 --> 01:03:08.200 0, even smaller. 01:03:08.200 --> 01:03:11.140 I'll make mental note of that, having forgotten now everything else. 01:03:11.140 --> 01:03:13.180 And now number 3 is not smaller. 01:03:13.180 --> 01:03:14.290 So what's your name again? 01:03:14.290 --> 01:03:14.630 MICHAEL: Michael. 01:03:14.630 --> 01:03:16.240 DAVID MALAN: So Michael is number 0. 01:03:16.240 --> 01:03:18.310 He belongs, of course, way down there. 01:03:18.310 --> 01:03:20.740 But unfortunately-- you are-- 01:03:20.740 --> 01:03:21.550 RYAN: Ryan. 01:03:21.550 --> 01:03:23.360 DAVID MALAN: Ryan is in the way. 01:03:23.360 --> 01:03:24.580 So what should we do? 01:03:24.580 --> 01:03:27.570 How should we start to sort this list? 01:03:27.570 --> 01:03:30.510 Where should number 0 go? 01:03:30.510 --> 01:03:31.012 Yeah. 01:03:31.012 --> 01:03:32.220 Do you want to say it louder? 01:03:32.220 --> 01:03:34.545 STUDENT: I will swap, I think. 01:03:34.545 --> 01:03:36.670 DAVID MALAN: Yeah, so let's just go ahead and swap. 01:03:36.670 --> 01:03:39.190 So if you want to go ahead and 0, go on where 7 is. 01:03:39.190 --> 01:03:41.170 We need to make room for number 7. 01:03:41.170 --> 01:03:44.350 It would kind of be cheating if maybe everyone kind of politely 01:03:44.350 --> 01:03:45.530 stepped over to the side. 01:03:45.530 --> 01:03:46.030 Why? 01:03:46.030 --> 01:03:49.000 Because if we imagine all of our volunteers here to be an array, like, 01:03:49.000 --> 01:03:52.390 that's a crazy amount of work to have every element in the array 01:03:52.390 --> 01:03:54.190 shift to the left just to make room. 01:03:54.190 --> 01:03:57.340 So we're going to keep it simple and just evict whoever's there now. 01:03:57.340 --> 01:04:00.880 Now, maybe we get lucky, and number 7 is actually closer to its destination. 01:04:00.880 --> 01:04:03.250 Maybe we get unlucky, and it goes farther away. 01:04:03.250 --> 01:04:05.260 But we've at least solved one problem. 01:04:05.260 --> 01:04:08.140 If we had n problems at first, now we have n minus 1, 01:04:08.140 --> 01:04:10.280 because number 0 is indeed in the right place. 01:04:10.280 --> 01:04:14.588 So if I continue to act this out, let me go ahead and say 2, currently 01:04:14.588 --> 01:04:15.130 the smallest. 01:04:15.130 --> 01:04:18.040 5, no, 4, no, 1 currently the smallest. 01:04:18.040 --> 01:04:19.000 I'll make mental note. 01:04:19.000 --> 01:04:22.690 6, 7, 3, and now let me pause. 01:04:22.690 --> 01:04:26.000 1 is obviously the now smallest element. 01:04:26.000 --> 01:04:27.760 So did I need to keep going? 01:04:27.760 --> 01:04:30.550 Well, turns out, at least as I've defined selection sort, 01:04:30.550 --> 01:04:33.130 I do need to keep going, because I only claim 01:04:33.130 --> 01:04:36.550 that I'm using one variable in my mind to remember the then smallest element. 01:04:36.550 --> 01:04:39.970 I'm not smart enough like us humans to remember, wait a minute, 01:04:39.970 --> 01:04:41.590 1 is definitely the smallest now. 01:04:41.590 --> 01:04:43.190 I don't have that whole recollection. 01:04:43.190 --> 01:04:45.590 So I just am keeping track of the now smallest. 01:04:45.590 --> 01:04:46.910 So number 1, your name was? 01:04:46.910 --> 01:04:47.410 JACK: Jack. 01:04:47.410 --> 01:04:49.300 DAVID MALAN: Jack, where should Jack go? 01:04:49.300 --> 01:04:50.380 Probably there. 01:04:50.380 --> 01:04:51.670 And what's your name? 01:04:51.670 --> 01:04:51.880 ITSELLE: Itselle. 01:04:51.880 --> 01:04:54.588 DAVID MALAN: OK, so Jack and Itselle, if you want to swap places, 01:04:54.588 --> 01:04:57.430 we've now solved two of the n total problems. 01:04:57.430 --> 01:04:58.990 And now we'll do it a little faster. 01:04:58.990 --> 01:05:03.880 If each of you want to start to swap as I find the right person, so 5 smallest, 01:05:03.880 --> 01:05:06.340 4 smaller, 2 is smaller. 01:05:06.340 --> 01:05:07.750 Got to keep checking. 01:05:07.750 --> 01:05:09.572 OK, 2 was smaller. 01:05:09.572 --> 01:05:11.780 All right, now I'm going to go back to the beginning. 01:05:11.780 --> 01:05:13.090 All right, 4 is small. 01:05:13.090 --> 01:05:14.050 5 is not. 01:05:14.050 --> 01:05:14.740 6 is not. 01:05:14.740 --> 01:05:16.120 7-- oh, 3 is small. 01:05:16.120 --> 01:05:17.770 Where do you want to go? 01:05:17.770 --> 01:05:18.670 OK, good. 01:05:18.670 --> 01:05:19.810 I'm going to go back here. 01:05:19.810 --> 01:05:21.060 And I can be a little smart. 01:05:21.060 --> 01:05:22.810 I don't have to go all the way to the end, 01:05:22.810 --> 01:05:24.950 because I know these folks are already sorted. 01:05:24.950 --> 01:05:26.630 So I can at least optimize slightly. 01:05:26.630 --> 01:05:27.970 So now 5 is small. 01:05:27.970 --> 01:05:28.720 6 is small. 01:05:28.720 --> 01:05:30.160 7 is 4, 4 is smaller. 01:05:30.160 --> 01:05:33.080 If you want to go in place there. 01:05:33.080 --> 01:05:34.810 And now, here things get interesting. 01:05:34.810 --> 01:05:37.180 I can optimize by not looking at these folks 01:05:37.180 --> 01:05:39.340 anymore, because they're obviously problem solved. 01:05:39.340 --> 01:05:42.970 But now 5 is small, 6 is not, 7 is not. 01:05:42.970 --> 01:05:45.010 OK, 5, you can stay where you are. 01:05:45.010 --> 01:05:46.870 Now, a human in the room is obviously going 01:05:46.870 --> 01:05:49.420 to question why I'm wasting any more time. 01:05:49.420 --> 01:05:52.090 But with selection sort, as I've defined it thus far, 01:05:52.090 --> 01:05:55.840 I still have to, now, check 6 is smallest, not 7. 01:05:55.840 --> 01:05:58.520 And now my final step, OK, they're all in place. 01:05:58.520 --> 01:06:00.760 So here, too, is this dichotomy between what we all 01:06:00.760 --> 01:06:03.730 have is this bird's eye view of the whole problem, where it's obvious 01:06:03.730 --> 01:06:05.060 where everyone needs to go. 01:06:05.060 --> 01:06:09.137 But a computer implementing this with an array really has to be more methodical. 01:06:09.137 --> 01:06:10.720 And we're actually saving a step here. 01:06:10.720 --> 01:06:13.780 If we were really doing this, none of these numbers would be visible. 01:06:13.780 --> 01:06:16.840 All eight of our volunteers would be inside of a locked door. 01:06:16.840 --> 01:06:19.220 And only then could we see them one at a time. 01:06:19.220 --> 01:06:21.670 But we're focusing now just on the sorting aspect. 01:06:21.670 --> 01:06:24.640 So let me just, before we do one other demonstration here, 01:06:24.640 --> 01:06:29.620 propose that what I really just did here in pseudocode was something like this. 01:06:29.620 --> 01:06:33.430 For i from 0 to n minus 1, keeping in mind 01:06:33.430 --> 01:06:35.590 that 0 is always the left of the array. 01:06:35.590 --> 01:06:38.110 n minus 1 is always the right end of the array. 01:06:38.110 --> 01:06:43.000 For i from 0 to n minus 1, I found the smallest number between numbers bracket 01:06:43.000 --> 01:06:45.730 i and numbers bracket n minus 1. 01:06:45.730 --> 01:06:48.610 And that's the very geeky way of expressing this optimization. 01:06:48.610 --> 01:06:51.490 I'm always starting from numbers bracket i wherever I am. 01:06:51.490 --> 01:06:53.200 And then everything else to the right. 01:06:53.200 --> 01:06:56.890 And that's what was allowing me to ignore the already sorted volunteers. 01:06:56.890 --> 01:07:01.220 If, though, my last line says swap smallest number with numbers i, 01:07:01.220 --> 01:07:03.220 think that implements what our humans were doing 01:07:03.220 --> 01:07:05.470 by physically walking to another spot. 01:07:05.470 --> 01:07:09.220 All right, so that, then, would be what we'll call selection sort. 01:07:09.220 --> 01:07:11.620 Let's go ahead and take a second approach here 01:07:11.620 --> 01:07:13.360 using an algorithm that I'm going to call bubble sort. 01:07:13.360 --> 01:07:16.090 But to do this, we need you all to reset to your original locations. 01:07:16.090 --> 01:07:17.320 We have a little cheat sheet on the board 01:07:17.320 --> 01:07:19.750 if you'd like to go back to this position here. 01:07:19.750 --> 01:07:22.600 And let me take a fundamentally different approach, because I'm not 01:07:22.600 --> 01:07:24.850 really liking selection sort as is, because it's kind 01:07:24.850 --> 01:07:26.780 of a lot of walking back and forth. 01:07:26.780 --> 01:07:30.620 And the lot of walking suggests a lot of, lot of steps again and again. 01:07:30.620 --> 01:07:32.090 So what might I do instead? 01:07:32.090 --> 01:07:34.840 Well, bubble sort is going to have me focus a little more 01:07:34.840 --> 01:07:36.730 intuitively on just smaller problems. 01:07:36.730 --> 01:07:38.605 And let's see if this gets me somewhere else. 01:07:38.605 --> 01:07:42.430 So if I just look at this list without looking at everyone else, 7 and 2, 01:07:42.430 --> 01:07:43.670 this is obviously a problem. 01:07:43.670 --> 01:07:44.170 Why? 01:07:44.170 --> 01:07:45.500 Because you're out of order. 01:07:45.500 --> 01:07:47.810 So let's just solve one tiny problem first. 01:07:47.810 --> 01:07:49.570 So 7 and 2, why don't you swap? 01:07:49.570 --> 01:07:54.160 I know 2 is in a better place now, because she's definitely less than 7. 01:07:54.160 --> 01:07:55.540 So I think I can now move on. 01:07:55.540 --> 01:07:57.350 7 and 5, problem. 01:07:57.350 --> 01:07:58.390 So let's solve that. 01:07:58.390 --> 01:07:59.830 7 and 4, problem. 01:07:59.830 --> 01:08:02.380 Let's solve that, 7 and 1, let's solve that. 01:08:02.380 --> 01:08:03.970 7 and 6, let's solve that. 01:08:03.970 --> 01:08:05.080 7 and 0, solve that. 01:08:05.080 --> 01:08:06.550 7 and 3, solve that. 01:08:06.550 --> 01:08:07.330 OK, done. 01:08:07.330 --> 01:08:09.130 Sorted, right? 01:08:09.130 --> 01:08:11.780 Or obviously not, if you just glance at these numbers here. 01:08:11.780 --> 01:08:14.530 But we have, fundamentally, taken a bite out of the problem. 01:08:14.530 --> 01:08:17.020 7 is indeed in the right place. 01:08:17.020 --> 01:08:21.170 So we maximally have n minus 1 other problems to solve. 01:08:21.170 --> 01:08:23.660 So how do I do this? 01:08:23.660 --> 01:08:25.700 I think I can just repeat the same logic. 01:08:25.700 --> 01:08:26.770 Let me go over here. 01:08:26.770 --> 01:08:28.210 2 and 5, good. 01:08:28.210 --> 01:08:29.800 5 and 4, no. 01:08:29.800 --> 01:08:31.330 5 and 1, no. 01:08:31.330 --> 01:08:32.590 5 and 6, yes. 01:08:32.590 --> 01:08:34.660 6 and 0, no. 01:08:34.660 --> 01:08:36.760 6 and 3, no. 01:08:36.760 --> 01:08:39.191 So now we've solved two of the problems. 01:08:39.191 --> 01:08:41.649 And what's nice about bubble sort, at least as this glance, 01:08:41.649 --> 01:08:42.707 it's nice and simple. 01:08:42.707 --> 01:08:43.540 It's nice and local. 01:08:43.540 --> 01:08:46.510 And you just keep incrementally solving more and more problems. 01:08:46.510 --> 01:08:48.010 So let's go ahead and do this again. 01:08:48.010 --> 01:08:50.080 And I'll do it-- we can do it faster. 01:08:50.080 --> 01:08:51.760 2 and 4, we know are good. 01:08:51.760 --> 01:08:59.200 4 and 1, 4 and 5, 5 and 0, 5 and 3, 5 and 6, 6 and 7, good. 01:08:59.200 --> 01:09:01.390 So we go back, 2 and 1. 01:09:01.390 --> 01:09:03.340 Ah, now another problem solve. 01:09:03.340 --> 01:09:09.939 2, and 4, 4 and 0, 4 and 3, 4 and 5, 5 and 6, 6 and 7. 01:09:09.939 --> 01:09:13.270 And so notice 2, as per its name, the largest elements 01:09:13.270 --> 01:09:14.895 have bubbled their way up to the top. 01:09:14.895 --> 01:09:17.020 And that's what seems to be happening just as we're 01:09:17.020 --> 01:09:18.340 fixing some remaining problems. 01:09:18.340 --> 01:09:19.120 So almost done. 01:09:19.120 --> 01:09:27.550 1 and 2, 2 and 0, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, almost done. 01:09:27.550 --> 01:09:29.830 Obviously, to us humans, it looks done. 01:09:29.830 --> 01:09:32.529 How do I know as the computer for sure? 01:09:32.529 --> 01:09:36.370 What would be the most surefire way for me to now go, it's not done, sorry. 01:09:36.370 --> 01:09:38.080 That's a bug. 01:09:38.080 --> 01:09:43.390 OK, 1 and 0, 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7. 01:09:43.390 --> 01:09:47.899 OK, so now it's obviously sorted to the rest of us on stage. 01:09:47.899 --> 01:09:50.290 How could I confirm as much as code? 01:09:50.290 --> 01:09:52.670 You're doing it with your mind, just glancing at this. 01:09:52.670 --> 01:09:56.080 How would the computer, the code, know for sure that this list is now sorted? 01:09:56.080 --> 01:09:57.000 Yeah. 01:09:57.000 --> 01:09:58.500 STUDENT: [INAUDIBLE] one more time. 01:09:58.500 --> 01:10:00.000 DAVID MALAN: Let's do one more time. 01:10:00.000 --> 01:10:03.512 And look, draw what conclusion? 01:10:03.512 --> 01:10:05.490 STUDENT: That nothing has to switch at all. 01:10:05.490 --> 01:10:07.365 DAVID MALAN: Yeah, let's do it one more time, 01:10:07.365 --> 01:10:08.860 even though it's a little wasteful. 01:10:08.860 --> 01:10:13.320 But logically, if I go through the whole list comparing pairs again, again, 01:10:13.320 --> 01:10:15.810 and again, and I don't do any work that time, 01:10:15.810 --> 01:10:19.133 now it's obviously logically safe to just stop, because otherwise, I'm 01:10:19.133 --> 01:10:21.300 wasting my time doing the same thing again and again 01:10:21.300 --> 01:10:22.933 if no one's actually moving. 01:10:22.933 --> 01:10:25.350 So I'm afraid we don't have monopoly games for all of you. 01:10:25.350 --> 01:10:26.767 But we do have eight stress balls. 01:10:26.767 --> 01:10:30.090 And round of applause, if we could, for our volunteers. 01:10:30.090 --> 01:10:33.910 If you want to put your numbers on the shelf there. 01:10:33.910 --> 01:10:36.220 So if we consider for a moment-- 01:10:36.220 --> 01:10:36.720 thank you. 01:10:36.720 --> 01:10:39.340 Thank you so much. 01:10:39.340 --> 01:10:42.150 Sure. 01:10:42.150 --> 01:10:43.170 Thank you. 01:10:43.170 --> 01:10:44.230 Thanks. 01:10:44.230 --> 01:10:44.730 Sure. 01:10:44.730 --> 01:10:48.870 So if we consider now these two algorithms, which one is better? 01:10:48.870 --> 01:10:51.990 Any intuition for whether selection sort the first 01:10:51.990 --> 01:10:55.950 is better or worse than bubble sort the second? 01:10:55.950 --> 01:10:58.020 Any thoughts? 01:10:58.020 --> 01:10:58.860 Yeah. 01:10:58.860 --> 01:11:03.620 STUDENT: Bubble sort's even better because it's less work [INAUDIBLE].. 01:11:03.620 --> 01:11:06.110 DAVID MALAN: So bubble sort seems like less work, 01:11:06.110 --> 01:11:08.930 especially since I was focusing on those localized problems. 01:11:08.930 --> 01:11:11.460 Other intuition? 01:11:11.460 --> 01:11:14.580 Selection sort versus bubble sort. 01:11:14.580 --> 01:11:18.000 Well, let me propose that we try to quantize this so we can actually 01:11:18.000 --> 01:11:19.420 analyze it in some way. 01:11:19.420 --> 01:11:22.590 And this is not an exercise we'll do constantly for lots of algorithms. 01:11:22.590 --> 01:11:24.940 But these are pretty representative of algorithms. 01:11:24.940 --> 01:11:27.690 So we can wrap our minds around, indeed, the performance 01:11:27.690 --> 01:11:28.960 or the design of these things. 01:11:28.960 --> 01:11:34.350 So here is my pseudocode for selection sort, whereby as per its name, 01:11:34.350 --> 01:11:38.500 I just iteratively select the next smallest element again and again. 01:11:38.500 --> 01:11:41.890 So how can we go about analyzing something like this? 01:11:41.890 --> 01:11:43.920 Well, we could just do it on paper pencil 01:11:43.920 --> 01:11:46.260 and count up the number of steps that seem 01:11:46.260 --> 01:11:48.030 to be implied logically by the code. 01:11:48.030 --> 01:11:52.260 We could literally count the number of steps I was taking again and again, 01:11:52.260 --> 01:11:52.890 left to right. 01:11:52.890 --> 01:11:55.830 We could also just count the number of comparisons 01:11:55.830 --> 01:11:58.302 I was making with each of the persons involved. 01:11:58.302 --> 01:12:00.510 And I was doing it kind of quickly in selection sort. 01:12:00.510 --> 01:12:03.033 But every time I was looking at a person trying to decide, 01:12:03.033 --> 01:12:04.950 do I want to remember that number is smallest? 01:12:04.950 --> 01:12:08.580 That number, I was comparing two values with an equals equals or less 01:12:08.580 --> 01:12:11.700 than or greater than sign, at least if we had done this in code. 01:12:11.700 --> 01:12:13.110 So that tends to be the norm. 01:12:13.110 --> 01:12:16.680 When analyzing algorithms like these, counting the number of comparisons, 01:12:16.680 --> 01:12:21.090 because it's kind of a global unit of measure 01:12:21.090 --> 01:12:23.490 we can use to compare different algorithms entirely. 01:12:23.490 --> 01:12:27.780 So think, too, that in the general case, when 01:12:27.780 --> 01:12:30.390 we have more than eight volunteers, more than seven doors, 01:12:30.390 --> 01:12:33.510 we can generalize our array in general, as this 01:12:33.510 --> 01:12:35.220 is the first element at bracket 0. 01:12:35.220 --> 01:12:37.770 And the end of it is always n minus 1. 01:12:37.770 --> 01:12:41.550 So arrays or doors, in this case, or volunteers, 01:12:41.550 --> 01:12:45.720 are always numerically indexed from 0 on up to n minus 1, 01:12:45.720 --> 01:12:47.200 if there's n of them in total. 01:12:47.200 --> 01:12:50.940 So how do we analyze the code of selection sort? 01:12:50.940 --> 01:12:56.370 Well, how many steps did it take me to find the first smallest element? 01:12:56.370 --> 01:12:58.835 Or more precisely, how many comparisons did I 01:12:58.835 --> 01:13:00.960 need to make when I walked to left to right to find 01:13:00.960 --> 01:13:06.100 our first-smallest person, which ended up being 0? 01:13:06.100 --> 01:13:09.310 How many comparisons did I do when walking left to right? 01:13:09.310 --> 01:13:15.850 If there were eight people on stage, how many total comparisons that I do? 01:13:15.850 --> 01:13:18.280 Like if there's eight people, I compared these folks. 01:13:18.280 --> 01:13:22.210 Then this person, this person, yeah. 01:13:22.210 --> 01:13:23.412 Yeah, so seven total, right? 01:13:23.412 --> 01:13:25.120 Because if there's eight people on stage, 01:13:25.120 --> 01:13:28.420 you can only do seven comparisons total, because otherwise you'd 01:13:28.420 --> 01:13:29.960 be comparing one number to itself. 01:13:29.960 --> 01:13:31.960 So it seems like, in the general case, if you've 01:13:31.960 --> 01:13:34.600 got n numbers that you're trying to sort, 01:13:34.600 --> 01:13:38.560 finding the smallest element first takes n minus 1 comparisons. 01:13:38.560 --> 01:13:41.275 Maybe n total steps left to right. 01:13:41.275 --> 01:13:43.150 But the number of comparisons, which I claim, 01:13:43.150 --> 01:13:46.030 is just a useful unit of measure, is n minus 1. 01:13:46.030 --> 01:13:48.490 How about finding the next smallest person? 01:13:48.490 --> 01:13:51.970 How many steps did it take me to find the next smallest number, which 01:13:51.970 --> 01:13:53.200 ended up being the number 1? 01:13:55.790 --> 01:13:56.855 Yeah. 01:13:56.855 --> 01:13:58.340 STUDENT: [INAUDIBLE] n minus 2. 01:13:58.340 --> 01:13:59.600 DAVID MALAN: Yeah, so just n minus 2. 01:13:59.600 --> 01:13:59.870 Why? 01:13:59.870 --> 01:14:01.610 Because I'd already solved one problem. 01:14:01.610 --> 01:14:03.210 Someone was already in the right position. 01:14:03.210 --> 01:14:05.490 It would be silly to keep counting them again and again. 01:14:05.490 --> 01:14:08.157 So I can whittle down my number of comparisons for the next pass 01:14:08.157 --> 01:14:09.200 to n minus 2. 01:14:09.200 --> 01:14:12.350 The third pass to find the third smallest number would be n minus 3. 01:14:12.350 --> 01:14:15.770 And then dot, dot, dot, presumably this story, this formula, 01:14:15.770 --> 01:14:19.620 ends when you have just one final pair, the people at the end, to compare. 01:14:19.620 --> 01:14:22.820 So if this is looking a little reminiscent of some kind of recurrence 01:14:22.820 --> 01:14:25.520 from high school or high school math or physics or the like, 01:14:25.520 --> 01:14:28.220 let me just stipulate that if you actually do out this math 01:14:28.220 --> 01:14:32.840 and generalize it, that is the same thing as n times n minus 1 01:14:32.840 --> 01:14:33.392 divided by 2. 01:14:33.392 --> 01:14:35.100 And if you're rusty on that, no big deal. 01:14:35.100 --> 01:14:37.070 Just kind of commit to memory that any time 01:14:37.070 --> 01:14:40.010 you add up this kind of series, something plus something slightly 01:14:40.010 --> 01:14:42.302 smaller, plus something slightly smaller, each of which 01:14:42.302 --> 01:14:46.520 differs by 1, you're going to get this formula. n times n minus 1 over 2. 01:14:46.520 --> 01:14:50.630 If we, of course, multiply that out, that's really n squared minus n, 01:14:50.630 --> 01:14:51.680 all divided by 2. 01:14:51.680 --> 01:14:56.540 If we keep multiplying it out, that's n squared divided by 2 minus n over 2. 01:14:56.540 --> 01:14:59.660 And now, we have kind of a vocabulary with which we 01:14:59.660 --> 01:15:03.140 can talk about the efficiency, the design of this algorithm. 01:15:03.140 --> 01:15:06.380 But honestly, I don't really care about this level of precision, 01:15:06.380 --> 01:15:09.560 like n squared divided by 2 minus n divided by 2. 01:15:09.560 --> 01:15:14.240 As n gets really large, which of these symbols, which of these terms 01:15:14.240 --> 01:15:17.480 is really going to dominate, become the biggest influencer 01:15:17.480 --> 01:15:20.190 on the total value of steps? 01:15:20.190 --> 01:15:20.690 Right? 01:15:20.690 --> 01:15:21.890 It's the square, right? 01:15:21.890 --> 01:15:23.382 It's definitely not n divided by 2. 01:15:23.382 --> 01:15:24.590 That's shaving some time off. 01:15:24.590 --> 01:15:27.800 But n squared, as n gets big, is going to get really big. 01:15:27.800 --> 01:15:29.990 If n is 100, then n squared is bigger. 01:15:29.990 --> 01:15:32.570 If n is a million, n squared is really bigger. 01:15:32.570 --> 01:15:35.390 And so at the end of the day, when we're really just talking 01:15:35.390 --> 01:15:39.140 about a wave of the hand analysis and upper bound, if you will, 01:15:39.140 --> 01:15:43.130 let's just say that selection sort, as analyzed here, 01:15:43.130 --> 01:15:45.860 it's on the order of n squared steps. 01:15:45.860 --> 01:15:47.690 It's not precisely n squared steps. 01:15:47.690 --> 01:15:52.830 But you know what? n squared divided by 2, the intuition here might be that, 01:15:52.830 --> 01:15:55.250 well, it's half of that. 01:15:55.250 --> 01:15:58.570 n squared is what really matters as n gets really, really large. 01:15:58.570 --> 01:16:01.070 And that's when you start thinking about and trying to solve 01:16:01.070 --> 01:16:02.445 the Google problems of the world. 01:16:02.445 --> 01:16:04.670 When n gets large, that's when you have to be smarter 01:16:04.670 --> 01:16:07.490 than just sort of naive implementations of any algorithm. 01:16:07.490 --> 01:16:12.480 So where, then, does this algorithm fall into this categorization here? 01:16:12.480 --> 01:16:14.870 Well, n squared, it turns out, is on the order 01:16:14.870 --> 01:16:19.610 of n squared steps, in the worst case, whether it's sorted or not. 01:16:19.610 --> 01:16:23.990 It turns out, though, lower bound, if we consider this same code, 01:16:23.990 --> 01:16:28.670 suppose the best case scenario, like our eight volunteers came up on stage. 01:16:28.670 --> 01:16:32.240 And just because they already sorted themselves, so 0 through 7. 01:16:32.240 --> 01:16:34.490 Suppose they just happened to be in that state. 01:16:34.490 --> 01:16:37.550 How many steps would selection store take 01:16:37.550 --> 01:16:42.670 to sort an already-sorted list of volunteers? 01:16:42.670 --> 01:16:43.420 Any intuition? 01:16:43.420 --> 01:16:44.318 Yeah. 01:16:44.318 --> 01:16:47.186 STUDENT: Would it still be [INAUDIBLE]? 01:16:47.186 --> 01:16:49.717 DAVID MALAN: Would it still be n-- 01:16:49.717 --> 01:16:51.180 STUDENT: Still be 7 [INAUDIBLE]. 01:16:51.180 --> 01:16:53.263 DAVID MALAN: So for the first pass, it would still 01:16:53.263 --> 01:16:55.710 be 7 for the first pass across the humans. 01:16:55.710 --> 01:16:58.530 Because even though, yeah, I'm claiming 0 is here, 01:16:58.530 --> 01:17:01.920 I don't know that 0 is the smallest until I make my way all the way 01:17:01.920 --> 01:17:03.990 over there doing all seven comparisons. 01:17:03.990 --> 01:17:08.220 OK, fine, first pass took seven or more generally n minus 1 steps. 01:17:08.220 --> 01:17:11.940 What if I look for the next smallest element, and the humans in this story 01:17:11.940 --> 01:17:14.370 are already sorted 0 through 7? 01:17:14.370 --> 01:17:17.580 Well, yes, the number 1 is here, and I see them first. 01:17:17.580 --> 01:17:21.405 But I don't know they're the smallest until I compare against everyone else 01:17:21.405 --> 01:17:22.530 get to the end of the list. 01:17:22.530 --> 01:17:24.238 And we're like, oh, well that was stupid. 01:17:24.238 --> 01:17:26.550 I already had the smallest person in hand then. 01:17:26.550 --> 01:17:29.820 And so this pseudocode, this implementation of selection sort, 01:17:29.820 --> 01:17:31.650 is sort of fixed like this. 01:17:31.650 --> 01:17:35.490 There's no special case that says, if already sorted, quit early. 01:17:35.490 --> 01:17:37.860 It's always going to take n squared steps. 01:17:37.860 --> 01:17:42.090 And so in this case, if we borrow our jargon from earlier 01:17:42.090 --> 01:17:46.080 using omega notation, just to be clear, selection sort 01:17:46.080 --> 01:17:50.790 is also going to be in this incarnation in omega of n squared, 01:17:50.790 --> 01:17:53.190 because even in the best case, where the list is already 01:17:53.190 --> 01:17:56.100 sorted, you're going to waste a huge amount of time 01:17:56.100 --> 01:17:59.040 essentially verifying as much or discovering as much, 01:17:59.040 --> 01:18:01.750 even though we humans of course could see it right away. 01:18:01.750 --> 01:18:06.270 So selection sort would seem to take both n squared steps in the worst 01:18:06.270 --> 01:18:08.425 case, n squared steps in the best case. 01:18:08.425 --> 01:18:09.300 And so you know what? 01:18:09.300 --> 01:18:11.280 We can use our theta terminology for that. 01:18:11.280 --> 01:18:13.770 Here would be an algorithm, just like counting earlier, 01:18:13.770 --> 01:18:17.880 that always takes n squared steps, no matter whether the array is sorted 01:18:17.880 --> 01:18:19.652 or not from the get go. 01:18:19.652 --> 01:18:21.360 All right, so hopefully we can do better. 01:18:21.360 --> 01:18:23.550 And someone proposed earlier that bubble sort 01:18:23.550 --> 01:18:25.618 felt like it was using fewer steps. 01:18:25.618 --> 01:18:26.910 Well, let's consider that next. 01:18:26.910 --> 01:18:30.630 With bubble sort, we had this pseudocode, I claim. 01:18:30.630 --> 01:18:33.780 Whereby, let's focus on the inside of the code first. 01:18:33.780 --> 01:18:36.120 Down here, what was I doing? 01:18:36.120 --> 01:18:39.960 For i from 0 to n minus 2. 01:18:39.960 --> 01:18:40.740 That's curious. 01:18:40.740 --> 01:18:42.360 We've never seen n minus 2 before. 01:18:42.360 --> 01:18:44.040 But I asked this question. 01:18:44.040 --> 01:18:50.160 If numbers bracket i and numbers bracket i plus 1 are out of order, swap them. 01:18:50.160 --> 01:18:53.610 So that was when I was pointing at our first two volunteers here. 01:18:53.610 --> 01:18:57.090 I saw that they were out of order, so I swapped them. 01:18:57.090 --> 01:19:01.830 How come I'm doing that again and again up to n minus 2, though, 01:19:01.830 --> 01:19:05.610 instead of n minus 1, which we've always used up 01:19:05.610 --> 01:19:09.670 until now as our rightmost boundary? 01:19:09.670 --> 01:19:14.170 Any intuition for why I'm doing this from 0 to n minus 2? 01:19:14.170 --> 01:19:14.700 Yeah. 01:19:14.700 --> 01:19:18.540 STUDENT: [INAUDIBLE] number, you can't get rid of the ith number. 01:19:18.540 --> 01:19:21.005 There's no benign character you can swap with. 01:19:21.005 --> 01:19:21.880 DAVID MALAN: Exactly. 01:19:21.880 --> 01:19:26.050 Because I'm looking at the ith person per this pseudocode here 01:19:26.050 --> 01:19:29.740 and the ith plus 1 person, I better make sure I don't go step 01:19:29.740 --> 01:19:31.550 beyond the boundaries of my array. 01:19:31.550 --> 01:19:33.010 So if you think of my left hand. 01:19:33.010 --> 01:19:36.250 When my back was to you here, pointing at the current person 01:19:36.250 --> 01:19:39.225 at the first position, my right hand for this if conditioner 01:19:39.225 --> 01:19:41.350 is essentially pointing at the person next to them. 01:19:41.350 --> 01:19:44.740 And you want to iterate with your left hand all through these people. 01:19:44.740 --> 01:19:47.620 But you don't want your left hand to point at the last person. 01:19:47.620 --> 01:19:50.000 You want it to point at the second to last person. 01:19:50.000 --> 01:19:54.220 But we know that the last person is always at n minus 1. 01:19:54.220 --> 01:19:57.820 So the second to last person, just mathematically, is at n minus 2. 01:19:57.820 --> 01:19:58.780 So it's a subtlety. 01:19:58.780 --> 01:20:00.880 But this is a seg fault waiting to happen. 01:20:00.880 --> 01:20:03.760 If you implemented bubble sort using n minus 1, 01:20:03.760 --> 01:20:07.340 you will, my right hand would go beyond the boundaries of the array, 01:20:07.340 --> 01:20:08.170 so just bad. 01:20:08.170 --> 01:20:10.490 All right, so why am I saying this n times? 01:20:10.490 --> 01:20:13.070 Well, we did it very organically with humans. 01:20:13.070 --> 01:20:15.940 But each time someone-- 01:20:15.940 --> 01:20:18.370 each pass I did through the array, someone 01:20:18.370 --> 01:20:19.840 bubbled their way up to the end. 01:20:19.840 --> 01:20:22.870 Number 7, then number 6, then number 5. 01:20:22.870 --> 01:20:26.600 So if on each pass through the array of volunteers, 01:20:26.600 --> 01:20:31.360 I was solving at least one problem, it seems like bubble sort can just 01:20:31.360 --> 01:20:34.315 run n times total to solve all n problems, 01:20:34.315 --> 01:20:36.940 because the first pass will get at least one number into place. 01:20:36.940 --> 01:20:38.470 Second pass, second number into place. 01:20:38.470 --> 01:20:39.970 You might get lucky, and it would do more. 01:20:39.970 --> 01:20:41.740 But worst case, this feels like enough. 01:20:41.740 --> 01:20:46.240 Just do this blindly n times, and they'll all line up together. 01:20:46.240 --> 01:20:49.780 Well, technically-- all right, now we're getting into the weeds. 01:20:49.780 --> 01:20:52.060 Technically, you can just repeat it in minus 1 times, 01:20:52.060 --> 01:20:54.520 because if you solve all n minus 1 other problems, 01:20:54.520 --> 01:20:58.120 and you're left with 1, literally that person's where they need to be, 01:20:58.120 --> 01:20:58.900 just logically. 01:20:58.900 --> 01:21:01.390 If you've already sorted everything else and you've got just the 1 left, 01:21:01.390 --> 01:21:02.540 it's already bubbled up. 01:21:02.540 --> 01:21:03.980 So how do we analyze this? 01:21:03.980 --> 01:21:06.670 Well in bubble sort, we might do something like this. 01:21:06.670 --> 01:21:11.015 I'm essentially doing n minus 1 things n minus 1 times. 01:21:11.015 --> 01:21:13.390 Now, let me back up to the pseudocode, because this one's 01:21:13.390 --> 01:21:14.980 a little less obvious. 01:21:14.980 --> 01:21:19.270 This is where you can actually mathematically infer from your loop 01:21:19.270 --> 01:21:21.110 how many steps you're taking. 01:21:21.110 --> 01:21:24.585 So this first line literally says, repeat the following n minus 1 times. 01:21:24.585 --> 01:21:26.710 So that's going to translate very straightforwardly 01:21:26.710 --> 01:21:28.240 to our mathematical formula. 01:21:28.240 --> 01:21:30.190 Do something n minus 1 times. 01:21:30.190 --> 01:21:33.970 This loop, just because I'm using for loop terminology, 01:21:33.970 --> 01:21:35.840 it's framed a little differently. 01:21:35.840 --> 01:21:39.910 But if you're iterating from 0 to n minus 2, 01:21:39.910 --> 01:21:43.258 you're iterating a total of n minus 1 times. 01:21:43.258 --> 01:21:45.550 And again, the arithmetic is getting a little annoying. 01:21:45.550 --> 01:21:48.470 But this just means do the following n minus 1 times. 01:21:48.470 --> 01:21:51.670 So do n minus 1 things n minus 1 times. 01:21:51.670 --> 01:21:54.440 We can now run out the math as follows. 01:21:54.440 --> 01:21:57.940 We have the formula n minus 1 times n minus 1. 01:21:57.940 --> 01:22:01.210 We do our little FOIL method here, n squared minus 1 times n, 01:22:01.210 --> 01:22:03.100 minus 1 times n, plus 1. 01:22:03.100 --> 01:22:06.550 We can combine like terms. n squared minus 2n plus 1. 01:22:06.550 --> 01:22:09.025 But at this point, when n gets really large, 01:22:09.025 --> 01:22:10.900 which term are we really going to care about? 01:22:10.900 --> 01:22:13.390 This is on the order of? 01:22:13.390 --> 01:22:14.870 Yeah, n squared. 01:22:14.870 --> 01:22:16.780 So at least asymptotically. 01:22:16.780 --> 01:22:20.830 Asymptotically means, as n approaches infinity, gets really large. 01:22:20.830 --> 01:22:24.070 Turns out that the upper bound on selection sort and bubble sort 01:22:24.070 --> 01:22:25.430 are essentially the same. 01:22:25.430 --> 01:22:28.540 Now, if we really nitpicked and compared the total number of comparisons, 01:22:28.540 --> 01:22:29.680 they might differ slightly. 01:22:29.680 --> 01:22:31.870 But as n gets large, honestly, you're barely 01:22:31.870 --> 01:22:36.350 going to notice the difference, it would seem, between these two algorithms. 01:22:36.350 --> 01:22:39.550 But what about the lower bound? 01:22:39.550 --> 01:22:44.620 If the upper bound on bubble sort is also big O of n, what about the lower 01:22:44.620 --> 01:22:45.470 bound here? 01:22:45.470 --> 01:22:50.170 Well, with this pseudocode, what would the lower bound be on bubble sort? 01:22:50.170 --> 01:22:53.890 Even in the best case when all of the volunteers are sorted. 01:22:53.890 --> 01:22:56.830 Any intuition? 01:22:56.830 --> 01:22:57.670 In this pseudo code. 01:22:57.670 --> 01:22:58.538 Yeah, in the middle. 01:22:58.538 --> 01:22:59.830 STUDENT: Sorry, quick question. 01:22:59.830 --> 01:23:02.360 Isn't bubble sort structured such that you 01:23:02.360 --> 01:23:05.955 wouldn't need to compare numbers that have already bubbled up? 01:23:05.955 --> 01:23:07.080 DAVID MALAN: Good question. 01:23:07.080 --> 01:23:09.122 Isn't bubble sort designed such that you wouldn't 01:23:09.122 --> 01:23:12.860 need to compare numbers that have already bubbled up? 01:23:12.860 --> 01:23:17.000 That's what's happening here in the middle, implicitly. 01:23:17.000 --> 01:23:19.220 I'm always going from left to right. 01:23:19.220 --> 01:23:21.650 But remember that even when I screwed up at the end 01:23:21.650 --> 01:23:23.900 and the last two people were out of order, I do always 01:23:23.900 --> 01:23:27.140 need to restart at the beginning, because the big numbers are 01:23:27.140 --> 01:23:29.691 going that way, and the small numbers are coming this way. 01:23:29.691 --> 01:23:32.892 STUDENT: [INAUDIBLE] 01:23:32.892 --> 01:23:34.100 DAVID MALAN: So that is true. 01:23:34.100 --> 01:23:37.460 There are some slight optimizations that I'm kind of glossing over here. 01:23:37.460 --> 01:23:40.700 Let me stipulate that it would still end up being on the order of n squared. 01:23:40.700 --> 01:23:43.910 But that would definitely shave off some actual running time here. 01:23:43.910 --> 01:23:46.340 But what if the list is already sorted? 01:23:46.340 --> 01:23:48.410 Our pseudocode, at the moment, has no allowance 01:23:48.410 --> 01:23:51.020 for if list is already sorted, quit early. 01:23:51.020 --> 01:23:53.840 So we're going to blindly do n minus 1 things 01:23:53.840 --> 01:23:58.850 and minus 1 times unless we modify our pseudocode, as I did verbally earlier, 01:23:58.850 --> 01:23:59.960 I proposed this. 01:23:59.960 --> 01:24:04.520 Inside of that outer loop, if you make a pass across all of the volunteers, 01:24:04.520 --> 01:24:06.907 and your mental counter has made no swaps, 01:24:06.907 --> 01:24:08.990 you have to keep track with some kind of variable, 01:24:08.990 --> 01:24:10.518 well then, you might as well stop. 01:24:10.518 --> 01:24:12.560 Because if you do a whole pass and make no swaps, 01:24:12.560 --> 01:24:17.550 why would you waste time doing it again expecting different behavior? 01:24:17.550 --> 01:24:23.480 So to help visualize these, whereby now bubble sort can be advantageous 01:24:23.480 --> 01:24:26.640 if the data is already sorted or mostly sorted. 01:24:26.640 --> 01:24:27.140 Why? 01:24:27.140 --> 01:24:29.510 Because it does have this short circuit detail. 01:24:29.510 --> 01:24:31.790 At least if we implement it like that, how 01:24:31.790 --> 01:24:36.263 can we go about visualizing these things a little more clearly? 01:24:36.263 --> 01:24:37.680 Well, let me go ahead and do this. 01:24:37.680 --> 01:24:41.870 Let me pull up, here, a visualization of exactly these algorithms, 01:24:41.870 --> 01:24:45.320 thanks to a third party tool here that's going to help us visualize 01:24:45.320 --> 01:24:46.850 these sorting algorithms as follows. 01:24:46.850 --> 01:24:48.740 Small bars represent small numbers. 01:24:48.740 --> 01:24:50.480 Big bars represent big numbers. 01:24:50.480 --> 01:24:53.720 And so the idea, now, is when I hit a button here 01:24:53.720 --> 01:24:56.843 to get all of the small bars this way, all of the big bars this way. 01:24:56.843 --> 01:24:58.010 So just like our volunteers. 01:24:58.010 --> 01:25:02.370 But instead of holding lighted numbers, it's bars representing their magnitude. 01:25:02.370 --> 01:25:07.190 So let's go ahead and start with, for instance, selection sort. 01:25:07.190 --> 01:25:09.920 And you'll see in pink, is being highlighted 01:25:09.920 --> 01:25:12.980 the current number that is being selected 01:25:12.980 --> 01:25:14.820 and then pulled all the way to the left. 01:25:14.820 --> 01:25:16.220 So this is selection sort. 01:25:16.220 --> 01:25:20.420 And again, it's selecting the next smallest element. 01:25:20.420 --> 01:25:25.850 But you can see here, all the more visibly, that just like my human feet, 01:25:25.850 --> 01:25:27.450 we're taking a lot of steps. 01:25:27.450 --> 01:25:32.430 So is this algorithm touching these elements, again and again and again. 01:25:32.430 --> 01:25:34.970 And this is why the n squared is really a thing. 01:25:34.970 --> 01:25:37.322 There's got to be some inherent redundancy here. 01:25:37.322 --> 01:25:40.280 Like, why do we keep looking at the same darn elements again and again? 01:25:40.280 --> 01:25:43.070 We do, in terms of our pseudocode need to do so. 01:25:43.070 --> 01:25:46.670 But it's this redundant comparisons that kind of explains 01:25:46.670 --> 01:25:48.782 why n squared is indeed the case. 01:25:48.782 --> 01:25:49.490 So now it's done. 01:25:49.490 --> 01:25:50.977 Small bars here, big bars there. 01:25:50.977 --> 01:25:53.060 And I had to just keep talking there to kill time, 01:25:53.060 --> 01:25:54.650 because it's relatively slow. 01:25:54.650 --> 01:25:57.182 Well, let me re-randomize the array, just 01:25:57.182 --> 01:25:58.640 so we start with a different order. 01:25:58.640 --> 01:26:00.380 And now let me click on bubble sort. 01:26:00.380 --> 01:26:03.240 And you'll see similar idea, but different algorithm. 01:26:03.240 --> 01:26:06.620 So now, the two bars in pink are the two that 01:26:06.620 --> 01:26:09.995 are being compared and fixed, potentially, if they're out of order. 01:26:09.995 --> 01:26:11.870 And you can see already that the biggest bars 01:26:11.870 --> 01:26:14.420 are bubbling their way up to the top. 01:26:14.420 --> 01:26:17.510 But now, you can also see this redundancy, 01:26:17.510 --> 01:26:20.450 like we keep swooping through the list again and again, just 01:26:20.450 --> 01:26:22.740 like I kept walking back and forth. 01:26:22.740 --> 01:26:23.795 And this is n squared. 01:26:23.795 --> 01:26:24.920 This is not that many bars. 01:26:24.920 --> 01:26:25.420 What? 01:26:25.420 --> 01:26:27.830 10, 20, there's like 40 or something bars, I'm guessing. 01:26:27.830 --> 01:26:31.560 That's pretty slow already just to sort 40 numbers. 01:26:31.560 --> 01:26:34.310 And I think it's going to get tedious if I keep talking over this. 01:26:34.310 --> 01:26:37.590 So let's just assume that this too is relatively slow. 01:26:37.590 --> 01:26:41.390 Had I gotten lucky and the list were almost sorted already, 01:26:41.390 --> 01:26:43.310 bubble sort would have been pretty fast. 01:26:43.310 --> 01:26:46.040 But this was a truly random array, so we did not get lucky. 01:26:46.040 --> 01:26:50.010 So indeed, the worst case might be what's kicking in here. 01:26:50.010 --> 01:26:54.470 So I feel like it'll be anticlimactic, like holding in a sneeze, if I 01:26:54.470 --> 01:26:55.980 don't let you see the end of this. 01:26:55.980 --> 01:26:57.890 So here we go. 01:26:57.890 --> 01:27:00.110 Nothing interesting is about to happen. 01:27:00.110 --> 01:27:02.330 Almost done. 01:27:02.330 --> 01:27:03.080 OK, done. 01:27:03.080 --> 01:27:05.890 All right, so thank you. 01:27:05.890 --> 01:27:06.710 [APPLAUSE] 01:27:06.710 --> 01:27:09.110 Thank you. 01:27:09.110 --> 01:27:12.500 So still somewhat slow though. 01:27:12.500 --> 01:27:15.800 How though can we, perhaps, do a little better fundamentally? 01:27:15.800 --> 01:27:19.070 So we can do so if we introduce yet another technique. 01:27:19.070 --> 01:27:22.130 And this one isn't so much a function of code as it is concept. 01:27:22.130 --> 01:27:25.290 And it's something that you might have seen in the real world, 01:27:25.290 --> 01:27:27.500 but perhaps not so obviously so. 01:27:27.500 --> 01:27:31.490 So it turns out, in programming, recursion 01:27:31.490 --> 01:27:34.970 refers to the ability of a function to call itself. 01:27:34.970 --> 01:27:37.460 In the world of mathematics, if you have a function f, 01:27:37.460 --> 01:27:41.450 if f appears on both the left side and the right side of a formula, 01:27:41.450 --> 01:27:43.850 that would be a recursive function in the math world too. 01:27:43.850 --> 01:27:46.490 Whenever f is defined in terms of itself, or in our case, 01:27:46.490 --> 01:27:51.800 in compute-- in programming, any time a function calls itself, 01:27:51.800 --> 01:27:53.660 that function is said to be recursive. 01:27:53.660 --> 01:27:55.790 And this is actually something we've seen already in class, 01:27:55.790 --> 01:27:57.373 even though we didn't call it as much. 01:27:57.373 --> 01:28:00.350 So for instance, consider this pseudocode 01:28:00.350 --> 01:28:03.590 from earlier, whereby this was the pseudocode 01:28:03.590 --> 01:28:07.760 for searching via binary search, a whole bunch of doors. 01:28:07.760 --> 01:28:10.040 If no doors are left returned false, that 01:28:10.040 --> 01:28:12.600 was the additional conditional we added. 01:28:12.600 --> 01:28:14.930 But then if number behind middle door returned true, 01:28:14.930 --> 01:28:18.980 and here's the interesting part, if number is less than middle door, 01:28:18.980 --> 01:28:20.780 search the left half. 01:28:20.780 --> 01:28:24.020 Else if number is greater than middle door, search the right half. 01:28:24.020 --> 01:28:27.800 This pseudocode earlier was, itself, recursive. 01:28:27.800 --> 01:28:28.340 Why? 01:28:28.340 --> 01:28:30.590 Because here is an algorithm for searching. 01:28:30.590 --> 01:28:32.650 But what's the algorithm telling us? 01:28:32.650 --> 01:28:37.280 Well, on this line and this line, it's telling us to search something else. 01:28:37.280 --> 01:28:40.720 So even though it's not explicitly defined in code as having a name, 01:28:40.720 --> 01:28:44.410 if this is a search algorithm, and yet the search algorithm is using a search 01:28:44.410 --> 01:28:47.650 algorithm, this pseudocode is recursive. 01:28:47.650 --> 01:28:50.380 Now, that could quickly get you into trouble if a function just 01:28:50.380 --> 01:28:53.410 calls itself again and again and again. 01:28:53.410 --> 01:28:57.130 But why, intuitively, is it not problematic 01:28:57.130 --> 01:29:01.840 that this code, this pseudocode, calls itself? 01:29:01.840 --> 01:29:03.460 Why will the algorithm still stop? 01:29:03.460 --> 01:29:03.970 Yeah. 01:29:03.970 --> 01:29:07.525 STUDENT: It has an exit condition, as in if there is no doors left, [INAUDIBLE].. 01:29:07.525 --> 01:29:08.400 DAVID MALAN: Exactly. 01:29:08.400 --> 01:29:10.860 It has some exit condition, like if no doors left. 01:29:10.860 --> 01:29:14.700 And more importantly, any time you search the left half, 01:29:14.700 --> 01:29:17.120 you're searching a smaller version of the problem. 01:29:17.120 --> 01:29:18.870 Any time you search the right half, you're 01:29:18.870 --> 01:29:22.330 searching a smaller version of the problem, literally half the size. 01:29:22.330 --> 01:29:24.270 So this is why, in the phone book, obviously 01:29:24.270 --> 01:29:27.210 I couldn't tear the phone book in half infinitely many times, 01:29:27.210 --> 01:29:29.560 because it was literally getting smaller each time. 01:29:29.560 --> 01:29:33.580 So recursion is this ability to call yourself, if you will. 01:29:33.580 --> 01:29:36.850 But what's important is that you do it on a smaller, smaller problem, 01:29:36.850 --> 01:29:39.750 so that eventually, you have no more problems to solve 01:29:39.750 --> 01:29:42.010 or no more data, no more doors at all. 01:29:42.010 --> 01:29:46.210 So these two lines here would be the recursive elements here. 01:29:46.210 --> 01:29:49.690 But if we go back to week 0, we could have used recursion in some other way. 01:29:49.690 --> 01:29:53.040 So this was our pseudocode for the phone book back in week 0. 01:29:53.040 --> 01:29:55.350 And recall that we described these yellow lines 01:29:55.350 --> 01:29:59.050 as really representing a loop, some kind of cycle again and again. 01:29:59.050 --> 01:30:01.080 But there was a missed opportunity here. 01:30:01.080 --> 01:30:05.670 What if I had re-implemented this code to do this? 01:30:05.670 --> 01:30:09.120 Instead of saying open to middle of left half of book 01:30:09.120 --> 01:30:12.450 and then go back to line 3, like literally inducing a loop, 01:30:12.450 --> 01:30:14.610 or open to middle of right half a book and go back 01:30:14.610 --> 01:30:17.400 to line 3 inducing another loop, why don't I 01:30:17.400 --> 01:30:19.590 just recognize that what I'm staring at now 01:30:19.590 --> 01:30:23.730 is a algorithm for searching a phone book? 01:30:23.730 --> 01:30:27.480 And if you want to search a smaller phone book, like A through M or N 01:30:27.480 --> 01:30:30.750 through Z, we'll just use this same algorithm. 01:30:30.750 --> 01:30:35.100 So I can replace these yellow lines with just this, casually speaking. 01:30:35.100 --> 01:30:37.282 Search left half of book, search right half of book. 01:30:37.282 --> 01:30:39.240 This would be implicitly, and now I can shorten 01:30:39.240 --> 01:30:42.360 the whole thing, a recursive implementation of the phone book 01:30:42.360 --> 01:30:43.633 pseudocode from week 0. 01:30:43.633 --> 01:30:46.050 And it's recursive, because if this is a search algorithm, 01:30:46.050 --> 01:30:48.900 and you're saying go search something else, that's fine. 01:30:48.900 --> 01:30:49.890 That's recursive. 01:30:49.890 --> 01:30:52.440 But because you're searching half of the phone book, 01:30:52.440 --> 01:30:55.710 it's indeed going to get smaller and smaller. 01:30:55.710 --> 01:30:59.400 Even in the real world or the real real virtual world, 01:30:59.400 --> 01:31:01.903 you can see recursive data structures in the wild, 01:31:01.903 --> 01:31:03.820 or at least in Super Mario Brothers like this. 01:31:03.820 --> 01:31:05.612 Let me get rid of all the distractions here 01:31:05.612 --> 01:31:08.790 and focus on this pyramid, where you have one block, 01:31:08.790 --> 01:31:10.860 then two, then three, then four. 01:31:10.860 --> 01:31:14.710 Well, this itself, is technically recursively-defined in the sense that, 01:31:14.710 --> 01:31:16.590 well, what is a pyramid of height for? 01:31:16.590 --> 01:31:18.420 Well, it's really, what? 01:31:18.420 --> 01:31:21.090 How would you describe a pyramid of height 4 01:31:21.090 --> 01:31:25.743 is actually the same thing as a pyramid of-- 01:31:25.743 --> 01:31:28.200 STUDENT: Height 3. 01:31:28.200 --> 01:31:30.750 DAVID MALAN: --of height 3, plus 1 additional layer. 01:31:30.750 --> 01:31:32.370 Well, what's a pyramid of height 3? 01:31:32.370 --> 01:31:36.250 Well, it's technically a pyramid of height 2 plus 1 additional layer. 01:31:36.250 --> 01:31:38.670 And so even physical structures can be recursive 01:31:38.670 --> 01:31:40.630 if you can define them in terms of itself. 01:31:40.630 --> 01:31:44.280 Now, at some point, you have to say that if the pyramid is of height 1, 01:31:44.280 --> 01:31:46.090 there's just one block. 01:31:46.090 --> 01:31:49.020 You can't forever say it's defined in terms of a height negative 1, 01:31:49.020 --> 01:31:50.440 negative 2, you would never stop. 01:31:50.440 --> 01:31:52.752 So you have to kind of have a special case there. 01:31:52.752 --> 01:31:55.710 But let's go ahead and translate something like this, in fact, to code. 01:31:55.710 --> 01:32:00.390 Let me go back to VS code here, and let me implement a program called iteration 01:32:00.390 --> 01:32:03.090 that refers to a loop iterating. 01:32:03.090 --> 01:32:05.620 And let me implement a very simple pyramid like that. 01:32:05.620 --> 01:32:08.370 So let me go ahead and include the CS50 library. 01:32:08.370 --> 01:32:14.918 I'll include our standard io.h int main void, no command line arguments today. 01:32:14.918 --> 01:32:16.210 And let's go ahead and do this. 01:32:16.210 --> 01:32:18.900 Let's declare a variable called height, ask the human 01:32:18.900 --> 01:32:21.150 for the height of this pyramid. 01:32:21.150 --> 01:32:25.300 And then let's go ahead and draw a pyramid of that height. 01:32:25.300 --> 01:32:27.580 Now, of course, draw does not yet exist. 01:32:27.580 --> 01:32:30.090 So I'm going to need to invent the draw function. 01:32:30.090 --> 01:32:33.180 Let me go ahead and define a function that doesn't have a return value. 01:32:33.180 --> 01:32:34.722 It's just going to have side effects. 01:32:34.722 --> 01:32:37.230 It's just going to print bricks on the screen, called draw. 01:32:37.230 --> 01:32:40.240 And it takes in an integer, n, as its input. 01:32:40.240 --> 01:32:41.950 And how am I going to implement this? 01:32:41.950 --> 01:32:46.530 Well again, I want to print one block, then two, then three, then four. 01:32:46.530 --> 01:32:49.680 That's pretty straightforward, at least once you're comfortable with loops. 01:32:49.680 --> 01:32:51.370 Let me go back to the code here. 01:32:51.370 --> 01:32:55.170 Let me go ahead and say 4, int i, get 0. 01:32:55.170 --> 01:32:56.880 i is less than n. 01:32:56.880 --> 01:32:58.260 i plus plus. 01:32:58.260 --> 01:33:01.170 And that's going to iterate, essentially row by row. 01:33:01.170 --> 01:33:05.370 And on each row, I want to print out one, then two, then three, then 01:33:05.370 --> 01:33:06.060 four bricks. 01:33:06.060 --> 01:33:08.815 But I'm iterating from 0 to 1 to 2 to 3. 01:33:08.815 --> 01:33:09.690 So I think that's OK. 01:33:09.690 --> 01:33:13.020 I can just say something like 4 int j get 0. 01:33:13.020 --> 01:33:17.160 j, let's be clever about this, is less than i. 01:33:17.160 --> 01:33:19.380 j++. 01:33:19.380 --> 01:33:22.560 And now, let me go ahead and, inside of this loop, 01:33:22.560 --> 01:33:27.130 I think I can get away with just printing out a single hash sign. 01:33:27.130 --> 01:33:30.270 But then outside of that loop, similar to last week, 01:33:30.270 --> 01:33:32.920 I'm going to print my new line separately. 01:33:32.920 --> 01:33:34.470 So a little non-obvious at first. 01:33:34.470 --> 01:33:38.790 But this outer loop iterates row by row, line by line, if you will. 01:33:38.790 --> 01:33:46.890 And then the inner loop just makes sure that when i equals zero, let's see. 01:33:46.890 --> 01:33:48.960 Oh nope, there's a bug. 01:33:48.960 --> 01:33:52.170 I need to make sure that it's j is less than i plus 1. 01:33:52.170 --> 01:33:55.500 So when i is 0 on my first line of output, 01:33:55.500 --> 01:33:57.600 I'm going to print out one brick. 01:33:57.600 --> 01:34:02.350 When i is 1, I'm going to print out two bricks and so forth. 01:34:02.350 --> 01:34:05.460 So let me go ahead and run make iteration. 01:34:05.460 --> 01:34:09.090 All right, and now, seems to compile. 01:34:09.090 --> 01:34:10.770 Uh oh, huh. 01:34:10.770 --> 01:34:12.900 Implicit declaration of function draw. 01:34:12.900 --> 01:34:16.100 So I'm making week one mistakes again. 01:34:16.100 --> 01:34:16.660 What? 01:34:16.660 --> 01:34:17.570 Say again. 01:34:17.570 --> 01:34:18.450 STUDENT: [INAUDIBLE] 01:34:18.450 --> 01:34:19.200 DAVID MALAN: Yeah. 01:34:19.200 --> 01:34:20.320 The prototype is missing. 01:34:20.320 --> 01:34:21.300 I didn't declare it at the top. 01:34:21.300 --> 01:34:23.550 That's an easy fix, and the only time, really, it's 01:34:23.550 --> 01:34:25.530 OK and necessary to copy paste. 01:34:25.530 --> 01:34:29.050 Let me copy the functions declaration there and it with a semicolon. 01:34:29.050 --> 01:34:32.370 So that clang now knows that draw will exist. 01:34:32.370 --> 01:34:33.240 Make iteration. 01:34:33.240 --> 01:34:33.930 Now it works. 01:34:33.930 --> 01:34:36.090 Thank you. dot slash iteration. 01:34:36.090 --> 01:34:37.830 We'll type in something like 4. 01:34:37.830 --> 01:34:40.860 And there we have it, our pyramid of height one, two, three, four, 01:34:40.860 --> 01:34:43.340 that looks pretty similar to this, albeit using hashes. 01:34:43.340 --> 01:34:46.590 So that's how we would have implemented this, like, two weeks ago in week one, 01:34:46.590 --> 01:34:49.110 maybe last week, but just using arrays. 01:34:49.110 --> 01:34:53.640 But let me propose that we could do something recursively instead. 01:34:53.640 --> 01:34:55.480 Let me close this version of the code. 01:34:55.480 --> 01:34:59.730 And let me go back to VS Code and open up recursion.c, 01:34:59.730 --> 01:35:01.800 just to demonstrate something recursively. 01:35:01.800 --> 01:35:04.420 And I'll do it incorrectly deliberately the first time. 01:35:04.420 --> 01:35:06.630 So let me include cs50.h. 01:35:06.630 --> 01:35:08.850 Let me include standard io.h. 01:35:08.850 --> 01:35:12.000 Let me do int main void. 01:35:12.000 --> 01:35:17.910 And let me just blindly draw a pyramid initially of height 1. 01:35:17.910 --> 01:35:21.910 But now in my draw function, let me re-implement it a little differently. 01:35:21.910 --> 01:35:24.840 So my draw function this time is still going to take a number n. 01:35:24.840 --> 01:35:26.860 But that's how many hashes it's going to print. 01:35:26.860 --> 01:35:30.030 So let's do 4, int i get 0. 01:35:30.030 --> 01:35:32.220 i is less than n. 01:35:32.220 --> 01:35:34.050 i++. 01:35:34.050 --> 01:35:38.440 Then let's go ahead and print out a single hash mark here. 01:35:38.440 --> 01:35:44.290 And then after that, let's print out the end of the line, just as before. 01:35:44.290 --> 01:35:49.770 But now this, of course, is only going to draw a single row. 01:35:49.770 --> 01:35:53.790 It's going to print out one hash or two hashes or three hashes, but only 01:35:53.790 --> 01:35:54.750 on one line. 01:35:54.750 --> 01:35:58.560 Let me now, incorrectly, but just kind of curiously say, all right. 01:35:58.560 --> 01:36:01.230 Well, if this draws a pyramid of height 1, 01:36:01.230 --> 01:36:04.860 let's just use ourselves to draw a pyramid of height n plus 1. 01:36:04.860 --> 01:36:08.370 So the first time I call draw, it will print out one hash. 01:36:08.370 --> 01:36:12.870 Then the second time I call draw, it will print out two hashes, then three, 01:36:12.870 --> 01:36:13.770 then four. 01:36:13.770 --> 01:36:18.000 So we're kind of laying these bricks down from top to bottom. 01:36:18.000 --> 01:36:20.670 Make recursion. 01:36:20.670 --> 01:36:22.420 Whoops, I screwed up again. 01:36:22.420 --> 01:36:24.630 So let's copy the prototype here. 01:36:24.630 --> 01:36:27.260 Let's put this down over here, semicolon. 01:36:27.260 --> 01:36:28.600 Let's do this again. 01:36:28.600 --> 01:36:30.010 Make recursion. 01:36:30.010 --> 01:36:32.410 All right, all good, dot slash recursion. 01:36:32.410 --> 01:36:34.930 And now let me increase the size of my terminal window, 01:36:34.930 --> 01:36:37.310 just so you can see more of the output. 01:36:37.310 --> 01:36:39.490 And here we have. 01:36:39.490 --> 01:36:41.480 OK, bad, but thank you. 01:36:41.480 --> 01:36:43.525 So we have an infinitely tall pyramid. 01:36:43.525 --> 01:36:45.400 And it's just flying across the screen, which 01:36:45.400 --> 01:36:47.020 is why it looks kind of like a mess. 01:36:47.020 --> 01:36:51.670 But I printed out a pyramid of height 1, and then 2, and then 3, and then 4. 01:36:51.670 --> 01:36:55.210 And unfortunately, what am I lacking any sort of quick condition, 01:36:55.210 --> 01:36:58.270 any kind of condition that says, wait a minute, when it's too tall, 01:36:58.270 --> 01:36:59.353 stop altogether. 01:36:59.353 --> 01:37:00.520 So this is an infinite loop. 01:37:00.520 --> 01:37:01.570 But it's not a loop. 01:37:01.570 --> 01:37:03.250 It's a recursive call. 01:37:03.250 --> 01:37:05.780 And actually, doing this in general, is very bad. 01:37:05.780 --> 01:37:08.410 We'll see next week that if you call a function too many times, 01:37:08.410 --> 01:37:11.967 you can actually trigger yet another of those segmentation faults, 01:37:11.967 --> 01:37:14.050 because you're using too much memory, essentially. 01:37:14.050 --> 01:37:16.300 But for now, I haven't triggered that yet. 01:37:16.300 --> 01:37:17.927 Control C is your friend to cancel. 01:37:17.927 --> 01:37:21.010 And as an aside, if you're playing along at home or playing with this code 01:37:21.010 --> 01:37:22.750 later, I actually cheated here. 01:37:22.750 --> 01:37:25.360 We have a special clang configuration feature 01:37:25.360 --> 01:37:28.392 that prevents you from calling a function like that 01:37:28.392 --> 01:37:29.350 and creating a problem. 01:37:29.350 --> 01:37:31.598 I overrode it just for demonstration sake. 01:37:31.598 --> 01:37:34.640 But odds are at home, you wouldn't be able to compile this code yourself. 01:37:34.640 --> 01:37:39.050 But let me do a proper version recursively of this code as follows. 01:37:39.050 --> 01:37:41.870 Let me go back into the code here. 01:37:41.870 --> 01:37:45.130 Let me go ahead and, not just blindly start drawing one, then two, 01:37:45.130 --> 01:37:46.540 then three layers of bricks. 01:37:46.540 --> 01:37:50.380 Let me prompt the human as before for the height of the pyramid 01:37:50.380 --> 01:37:53.350 they want using our get int function. 01:37:53.350 --> 01:37:55.670 And now let me call draw of height again. 01:37:55.670 --> 01:37:58.330 So now I'm going back to the loop-like version. 01:37:58.330 --> 01:38:03.250 But instead of using a loop now, this is where recursion gets rather elegant, 01:38:03.250 --> 01:38:04.120 if you will. 01:38:04.120 --> 01:38:10.690 Let me go ahead and execute and code the draw function as follows. 01:38:10.690 --> 01:38:14.050 Per your definition, if a pyramid of height 4 01:38:14.050 --> 01:38:17.437 is really just a pyramid of height 3 plus another row, well, 01:38:17.437 --> 01:38:18.520 let's take that literally. 01:38:18.520 --> 01:38:19.990 Let me go back to my code. 01:38:19.990 --> 01:38:24.340 And if you want to draw a pyramid of height 4, well go right ahead 01:38:24.340 --> 01:38:29.380 and draw a pyramid of height 3 first, or more generally, n minus 1. 01:38:29.380 --> 01:38:30.640 But what's the second step? 01:38:30.640 --> 01:38:34.510 Well, once you've drawn a pyramid of height 3, draw an extra row. 01:38:34.510 --> 01:38:37.190 So I at least have to bite off that part of the problem myself. 01:38:37.190 --> 01:38:39.310 So let me just do for int i get 0. 01:38:39.310 --> 01:38:41.530 i is less than n i++. 01:38:41.530 --> 01:38:46.010 And let me, the programmer of this function, print out my hashes. 01:38:46.010 --> 01:38:48.340 And then at the very bottom, print out a new line 01:38:48.340 --> 01:38:50.350 so the cursor moves to the next line. 01:38:50.350 --> 01:38:55.660 But this is kind of elegant now, I dare say, in that draw is recursive, 01:38:55.660 --> 01:38:58.570 because I'm literally translating from English to C code, 01:38:58.570 --> 01:39:02.050 this idea that a pyramid of height 4 is really just a pyramid of height 3. 01:39:02.050 --> 01:39:03.640 So I do that first. 01:39:03.640 --> 01:39:06.560 And I'm sort of trusting that this will work. 01:39:06.560 --> 01:39:09.800 Then I just have to lay one more layer of bricks, four of them. 01:39:09.800 --> 01:39:13.030 So if n is 4, this is just a simple for loop, a la week 1, 01:39:13.030 --> 01:39:15.520 that will print out an additional layer. 01:39:15.520 --> 01:39:18.610 But this, of course, is going to be problematic eventually. 01:39:18.610 --> 01:39:20.030 Why? 01:39:20.030 --> 01:39:22.670 It's not done yet, this program. 01:39:22.670 --> 01:39:27.644 How many times will draw call itself in this model? 01:39:27.644 --> 01:39:28.640 STUDENT: It's infinite. 01:39:28.640 --> 01:39:30.098 DAVID MALAN: Infinitely many times. 01:39:30.098 --> 01:39:30.814 Why? 01:39:30.814 --> 01:39:34.170 STUDENT: Because there's no quit function. 01:39:34.170 --> 01:39:36.450 DAVID MALAN: Yeah, there's no equivalent of quit. 01:39:36.450 --> 01:39:39.810 Like, if you've printed enough already, then quit, well, 01:39:39.810 --> 01:39:41.050 how do we capture that? 01:39:41.050 --> 01:39:43.320 Well, I don't think we want this to go negative. 01:39:43.320 --> 01:39:46.570 It would make no sense to draw a negative height pyramid. 01:39:46.570 --> 01:39:51.270 So I think we can just pluck off, as the programmer, an easy case, 01:39:51.270 --> 01:39:53.650 an easy answer, a so-called base case. 01:39:53.650 --> 01:39:54.900 And I'm just going to do this. 01:39:54.900 --> 01:40:00.030 At the top of my draw function, let me just say, if n is less than 01:40:00.030 --> 01:40:02.830 or, heck, less than or equal to 0, that's it. 01:40:02.830 --> 01:40:04.530 Go ahead and just return. 01:40:04.530 --> 01:40:06.030 There's nothing more to do. 01:40:06.030 --> 01:40:10.680 And that simple condition, technically known as a base case, 01:40:10.680 --> 01:40:13.290 will ensure that the code doesn't run forever. 01:40:13.290 --> 01:40:13.860 Why? 01:40:13.860 --> 01:40:17.730 Well, suppose that draw is called with an argument of 4. 01:40:17.730 --> 01:40:20.580 4 is, of course, not less than 0, so we don't return. 01:40:20.580 --> 01:40:22.590 But we do draw a pyramid of height 3. 01:40:22.590 --> 01:40:24.870 And here's where things get a little mentally tricky. 01:40:24.870 --> 01:40:28.320 You don't move on to line 20 until draw has been called. 01:40:28.320 --> 01:40:31.080 So when draw is called with an argument of 3, 01:40:31.080 --> 01:40:34.230 it's as though you're executing from the top of this function again. 01:40:34.230 --> 01:40:35.520 3 is not less than 0. 01:40:35.520 --> 01:40:36.330 So what do you do? 01:40:36.330 --> 01:40:38.490 You draw 2. 01:40:38.490 --> 01:40:39.540 How do you draw 2? 01:40:39.540 --> 01:40:41.950 Well, 2 is not less than 0, so you don't return. 01:40:41.950 --> 01:40:43.050 So you draw 1. 01:40:43.050 --> 01:40:44.370 Got to be careful here. 01:40:44.370 --> 01:40:45.240 Draw 1. 01:40:45.240 --> 01:40:47.340 And now, we go ahead back to the beginning. 01:40:47.340 --> 01:40:48.090 How do you draw 1? 01:40:48.090 --> 01:40:50.430 Well, 1 is not less than 0, so you don't return. 01:40:50.430 --> 01:40:53.400 You draw height 0. 01:40:53.400 --> 01:40:54.510 How do you draw height 0? 01:40:54.510 --> 01:40:55.110 Wait a minute. 01:40:55.110 --> 01:40:57.660 0 is less than or equal to 0. 01:40:57.660 --> 01:40:58.980 And you return. 01:40:58.980 --> 01:41:02.100 And so it's kind of like this mental stack, this to do list. 01:41:02.100 --> 01:41:05.190 You keep postponing, executing these lower lines of code, 01:41:05.190 --> 01:41:08.700 because you keep restarting, restarting, restarting the draw function 01:41:08.700 --> 01:41:12.840 until, finally, one of those function calls says there's nothing to do, 01:41:12.840 --> 01:41:13.530 return. 01:41:13.530 --> 01:41:16.530 And now the whole thing starts to unravel, if you will. 01:41:16.530 --> 01:41:18.330 And you pick back up where you left off. 01:41:18.330 --> 01:41:20.300 And this is, perhaps, the best scenario. 01:41:20.300 --> 01:41:21.300 We won't do it in class. 01:41:21.300 --> 01:41:23.508 But if you'd like to wrestle through this on your own 01:41:23.508 --> 01:41:27.780 using debug50 to keep stepping into, step into, step into, each 01:41:27.780 --> 01:41:31.480 of those lines, logically, you'll see exactly what's actually happening. 01:41:31.480 --> 01:41:34.330 So let me go to my terminal and do make recursion, 01:41:34.330 --> 01:41:37.740 which is now this correct version of the code, dot slash recursion. 01:41:37.740 --> 01:41:39.240 Let's type in a height of 4. 01:41:39.240 --> 01:41:44.730 And voila, now we have that same pyramid, not using iteration per se, 01:41:44.730 --> 01:41:47.910 though admittedly, we're using iteration to print the additional layer. 01:41:47.910 --> 01:41:53.340 We're now using draw recursively to print all of the smaller pyramids 01:41:53.340 --> 01:41:55.120 that need come before it. 01:41:55.120 --> 01:41:57.370 STUDENT: Can you only use recursion for void function? 01:41:57.370 --> 01:41:58.123 [INAUDIBLE] 01:41:58.123 --> 01:41:58.790 DAVID MALAN: No. 01:41:58.790 --> 01:42:01.070 Question is, can you only use recursion with a void function? 01:42:01.070 --> 01:42:01.920 No, not at all. 01:42:01.920 --> 01:42:05.120 In fact, it's very common to have a return value like an integer 01:42:05.120 --> 01:42:09.350 or something else so that you can actually do something constructively 01:42:09.350 --> 01:42:11.360 with that actual value. 01:42:11.360 --> 01:42:13.190 Other questions on this. 01:42:13.190 --> 01:42:15.290 STUDENT: When is line 21 getting executed? 01:42:15.290 --> 01:42:16.790 DAVID MALAN: Say it a little louder. 01:42:16.790 --> 01:42:18.770 STUDENT: When is line 21 getting executed? 01:42:18.770 --> 01:42:20.850 DAVID MALAN: When is line 21 getting executed? 01:42:20.850 --> 01:42:23.720 So if you continue to-- 01:42:23.720 --> 01:42:26.600 let me scroll down a bit more so you can see the top of the code. 01:42:26.600 --> 01:42:35.310 So line 21 will be executed once line 19 is done executing itself. 01:42:35.310 --> 01:42:40.790 Now, in the story I told, we kept calling draw again, again, again. 01:42:40.790 --> 01:42:43.400 But as soon as one of those function calls 01:42:43.400 --> 01:42:46.490 where n equals 0 returns immediately, then we 01:42:46.490 --> 01:42:48.510 don't keep drawing again and again. 01:42:48.510 --> 01:42:51.110 So now if you kind of think of the process as reversing, 01:42:51.110 --> 01:42:57.530 then you continue to line 21, then line 21 again, then line 21 again, 01:42:57.530 --> 01:42:59.303 and as the sort of logic unravels. 01:42:59.303 --> 01:43:01.970 And next week, we'll actually paint a picture of what's actually 01:43:01.970 --> 01:43:03.530 happening in the computer's memory. 01:43:03.530 --> 01:43:07.450 But for now, it's just, it's very similar to the pseudocode for the phone 01:43:07.450 --> 01:43:07.950 book. 01:43:07.950 --> 01:43:09.680 You're just searching again and again. 01:43:09.680 --> 01:43:14.408 But you're waiting until the very end to get back the final result. 01:43:14.408 --> 01:43:16.700 Google now, who I keep mentioning by coincidence today, 01:43:16.700 --> 01:43:18.830 is full of programmers of course. 01:43:18.830 --> 01:43:20.600 Here's a fun exercise. 01:43:20.600 --> 01:43:23.432 Let me go back to a browser. 01:43:23.432 --> 01:43:26.390 I'm going to go ahead and search for recursion, because I want to learn 01:43:26.390 --> 01:43:27.980 a little something about recursion. 01:43:27.980 --> 01:43:30.230 Here is kind of an internet meme or joke. 01:43:30.230 --> 01:43:35.360 If I zoom in here, the engineers at Google are kind of funny. 01:43:35.360 --> 01:43:37.902 See why? 01:43:37.902 --> 01:43:38.798 STUDENT: Ah. 01:43:38.798 --> 01:43:40.540 DAVID MALAN: Ah, there you go. 01:43:40.540 --> 01:43:41.740 Yes. 01:43:41.740 --> 01:43:43.030 Yes, this is recursion. 01:43:43.030 --> 01:43:45.822 And there's going to be so many memes you'll come across now, where 01:43:45.822 --> 01:43:48.490 recursion, like if you've ever pointed a camera at the TV that's 01:43:48.490 --> 01:43:51.190 showing the camera, and you sort of see yourself or the image again and again, 01:43:51.190 --> 01:43:52.660 that's really recursion. 01:43:52.660 --> 01:43:56.320 And in that case, it only stops once you hit the base case of a single pixel. 01:43:56.320 --> 01:43:59.350 But this is a very funny joke in some circles 01:43:59.350 --> 01:44:01.880 when it comes to recursion and Google. 01:44:01.880 --> 01:44:04.570 So how can we actually use Google, or rather, 01:44:04.570 --> 01:44:08.050 how can we actually use recursion constructively? 01:44:08.050 --> 01:44:11.320 Well, let me propose that we actually introduced 01:44:11.320 --> 01:44:14.950 a third and final algorithm for sorting that hopefully does better 01:44:14.950 --> 01:44:16.870 than the two sorts thus far. 01:44:16.870 --> 01:44:19.480 We've done selection sort and bubble sort. 01:44:19.480 --> 01:44:22.845 Bubble sort, we liked a little better, at least insofar as in the best case 01:44:22.845 --> 01:44:24.220 where the list is already sorted. 01:44:24.220 --> 01:44:26.387 Bubble sort's at least smarter, and it will actually 01:44:26.387 --> 01:44:28.840 terminate early, giving us a better lower bound, 01:44:28.840 --> 01:44:30.490 in terms of our omega notation. 01:44:30.490 --> 01:44:33.100 But it turns out that recursion, and this is not 01:44:33.100 --> 01:44:36.250 necessarily a feature of recursion, but something we can now leverage. 01:44:36.250 --> 01:44:39.460 It turns out, using recursion, we can take a fundamentally different approach 01:44:39.460 --> 01:44:43.090 to sorting a whole bunch of numbers in such a way 01:44:43.090 --> 01:44:47.560 that we can do far fewer comparisons and, ideally, speed up 01:44:47.560 --> 01:44:49.010 our final results. 01:44:49.010 --> 01:44:51.970 So here is the pseudocode for what we're about to see 01:44:51.970 --> 01:44:54.010 for something called merge sort. 01:44:54.010 --> 01:44:56.230 And it really is this terse. 01:44:56.230 --> 01:44:58.330 Sort the left half of numbers. 01:44:58.330 --> 01:45:00.550 Sort the right half of numbers. 01:45:00.550 --> 01:45:02.950 Merge the sorted halves. 01:45:02.950 --> 01:45:06.310 This is almost sort of nonsensical, because if you're 01:45:06.310 --> 01:45:09.880 asked for an algorithm to sort, and you respond with, well, sort the left half, 01:45:09.880 --> 01:45:10.960 sort the right half. 01:45:10.960 --> 01:45:14.230 That's being difficult, because well, I'm asking for a sorting algorithm. 01:45:14.230 --> 01:45:16.897 You're just telling me to sort the left half and the right half. 01:45:16.897 --> 01:45:21.130 But implicit in that last line, merging is a pretty powerful feature 01:45:21.130 --> 01:45:21.760 of this sort. 01:45:21.760 --> 01:45:23.908 Now, we do need another base case at the top. 01:45:23.908 --> 01:45:24.700 So let me add this. 01:45:24.700 --> 01:45:27.850 If we find ourselves with a list, an array, of size 1, 01:45:27.850 --> 01:45:29.807 well, that array is obviously sorted. 01:45:29.807 --> 01:45:32.390 If there's only one element in it, there's no work to be done. 01:45:32.390 --> 01:45:33.890 So that's going to be our base case. 01:45:33.890 --> 01:45:38.770 But allowing us now, in just these, what, four, six lines of pseudocode, 01:45:38.770 --> 01:45:40.900 to actually sort some elements. 01:45:40.900 --> 01:45:43.652 But let's focus first on just a subset of this. 01:45:43.652 --> 01:45:46.360 Let's consider for a moment what it means to merge sorted halves. 01:45:46.360 --> 01:45:48.760 So Carter has wonderfully come up to volunteer here just 01:45:48.760 --> 01:45:50.170 to help us reset these numbers. 01:45:50.170 --> 01:45:54.080 Suppose that in the middle of the story we're about to tell, 01:45:54.080 --> 01:45:56.020 we have two sorted halves. 01:45:56.020 --> 01:45:58.300 I've already sorted the left half of these numbers, 01:45:58.300 --> 01:46:01.630 and indeed, 2, 4, 5, 7 is sorted from smallest to largest. 01:46:01.630 --> 01:46:06.100 And the right half appears to be already sorted, 0, 1, 3, 6, already sorted. 01:46:06.100 --> 01:46:09.370 So in my pseudocode, we're already done sorting the left half 01:46:09.370 --> 01:46:10.630 and the right half somehow. 01:46:10.630 --> 01:46:12.160 But we'll see how in a moment. 01:46:12.160 --> 01:46:14.980 Well, how do I go about merging these two halves? 01:46:14.980 --> 01:46:18.490 Well, because they're sorted already, and you want to merge them in order, 01:46:18.490 --> 01:46:20.110 I think we can flip down. 01:46:20.110 --> 01:46:25.030 We can hide all but the first numbers in each of these sublists. 01:46:25.030 --> 01:46:28.125 So here, we have a half that starts with 2. 01:46:28.125 --> 01:46:30.250 And I don't really care what the other numbers are, 01:46:30.250 --> 01:46:32.060 because they're clearly larger than 2. 01:46:32.060 --> 01:46:35.043 I can focus only on 2, and 0 too, 0 also. 01:46:35.043 --> 01:46:37.960 We know that 0 is the smallest there, so let's just ignore the numbers 01:46:37.960 --> 01:46:39.293 that Carter kindly flipped down. 01:46:39.293 --> 01:46:44.360 So how do I merge these two lists into a new sorted larger list? 01:46:44.360 --> 01:46:47.590 Well, I compare the two on my left with the 0 01:46:47.590 --> 01:46:50.470 on my right, obviously, which comes first, the 0. 01:46:50.470 --> 01:46:51.963 So let me put this down here. 01:46:51.963 --> 01:46:54.130 And Carter, if you want to give us the next element. 01:46:54.130 --> 01:46:55.960 Now I have two sorted halves. 01:46:55.960 --> 01:46:57.650 But I've already plucked one off. 01:46:57.650 --> 01:47:00.010 So now I compare the two against the 1. 01:47:00.010 --> 01:47:01.580 1 obviously comes next. 01:47:01.580 --> 01:47:04.843 So I'm going to take out the 1 and put it in place here. 01:47:04.843 --> 01:47:06.760 Now I'm going to compare the two halves again. 01:47:06.760 --> 01:47:08.830 2 and 3, which do I merge first? 01:47:08.830 --> 01:47:10.660 Obviously the 2 comes next. 01:47:10.660 --> 01:47:14.170 And now, notice, each time I do this, my hands are theoretically 01:47:14.170 --> 01:47:15.220 making forward progress. 01:47:15.220 --> 01:47:18.580 I'm not doubling back like I kept doing with selection sort or bubble 01:47:18.580 --> 01:47:20.290 sort, back and forth, back and forth. 01:47:20.290 --> 01:47:22.810 My fingers are constantly advancing forward, 01:47:22.810 --> 01:47:24.310 and that's going to be a key detail. 01:47:24.310 --> 01:47:27.340 So I compare 4 and 3, 3 obviously. 01:47:27.340 --> 01:47:32.560 I compare 4 and 6, 4 obviously. 01:47:32.560 --> 01:47:36.520 I compare 5 and 6, 5 obviously. 01:47:36.520 --> 01:47:40.810 And then I compare 7 and 6, 6 of course. 01:47:40.810 --> 01:47:43.000 And then lastly, we have just one element left. 01:47:43.000 --> 01:47:45.460 And even though I'm kind of moving awkwardly as a human, 01:47:45.460 --> 01:47:48.160 my hands technically were only moving to the right. 01:47:48.160 --> 01:47:51.130 I was never looping back doing something again and again. 01:47:51.130 --> 01:47:54.430 And that's, perhaps, the intuition, and just enough room for the 7. 01:47:54.430 --> 01:47:58.210 So that, then, is how you would merge two sorted halves. 01:47:58.210 --> 01:48:00.610 We started with left half sorted, right half sorted. 01:48:00.610 --> 01:48:02.860 And merging is just like what you would do as a human. 01:48:02.860 --> 01:48:04.568 And Carter just flipped the numbers down, 01:48:04.568 --> 01:48:08.620 so our focus was only on the smallest elements in each. 01:48:08.620 --> 01:48:13.030 Any questions before we forge ahead with what it means, then, 01:48:13.030 --> 01:48:17.120 to be merged in this way? 01:48:17.120 --> 01:48:18.777 So now, here is an original list. 01:48:18.777 --> 01:48:20.860 We deliberately put it at the top, because there's 01:48:20.860 --> 01:48:22.480 one detail of merge sort that's key. 01:48:22.480 --> 01:48:25.490 Merge sort is technically going to use a little more space. 01:48:25.490 --> 01:48:28.270 And so whereas, previously, we just kept moving our humans around 01:48:28.270 --> 01:48:30.790 and swapping people and making sure they stayed ultimately 01:48:30.790 --> 01:48:32.110 in the original positions. 01:48:32.110 --> 01:48:36.700 With merge sort, pretends that here's our original array of memory. 01:48:36.700 --> 01:48:38.970 I'm going to need at least one other array of memory. 01:48:38.970 --> 01:48:41.160 And I'm going to cheat, and I'm going to use even more memory. 01:48:41.160 --> 01:48:43.440 But technically, I could actually go back and forth 01:48:43.440 --> 01:48:45.540 between 1 array and a secondary array. 01:48:45.540 --> 01:48:48.370 But it is going to take me more space. 01:48:48.370 --> 01:48:53.130 So how do I go about implementing merge sort on this code? 01:48:53.130 --> 01:48:54.930 Well, let's consider this. 01:48:54.930 --> 01:48:57.060 Here is a array of size 8. 01:48:57.060 --> 01:48:59.590 If only one number quit, obviously not applicable. 01:48:59.590 --> 01:49:01.230 So let's focus on the juicy part there. 01:49:01.230 --> 01:49:02.880 Sort the left half of the numbers. 01:49:02.880 --> 01:49:05.130 All right, how do I sort the left half of the numbers? 01:49:05.130 --> 01:49:09.240 I'm going to just nudge them over just to be clear, which is the left half. 01:49:09.240 --> 01:49:11.850 Here is now a sublist of size 4. 01:49:11.850 --> 01:49:14.860 How do I sort the left half? 01:49:14.860 --> 01:49:17.380 Well, do I have an algorithm for sorting? 01:49:17.380 --> 01:49:18.430 Yeah, what do I do? 01:49:18.430 --> 01:49:19.492 Here's a list of size 4. 01:49:19.492 --> 01:49:20.200 How do I sort it? 01:49:20.200 --> 01:49:22.000 What's step one? 01:49:22.000 --> 01:49:23.330 Sort the left half. 01:49:23.330 --> 01:49:28.060 So I now sort of, conceptually in my mind, take this sublist of size 4. 01:49:28.060 --> 01:49:32.740 And I sort it by first sorting the left half, focusing now on the 7 and 2. 01:49:32.740 --> 01:49:34.330 All right, here's a list of size 2. 01:49:34.330 --> 01:49:37.060 How do I sort a list of size 2? 01:49:37.060 --> 01:49:38.740 STUDENT: [INAUDIBLE] 01:49:38.740 --> 01:49:40.170 DAVID MALAN: Sorry? 01:49:40.170 --> 01:49:42.360 I think we just keep following our instructions. 01:49:42.360 --> 01:49:43.650 Sort the left half. 01:49:43.650 --> 01:49:45.630 All right, here is a list of size 1. 01:49:45.630 --> 01:49:48.417 How do I sort a list of size 1? 01:49:48.417 --> 01:49:49.803 STUDENT: [INAUDIBLE] 01:49:49.803 --> 01:49:50.720 DAVID MALAN: I'm done. 01:49:50.720 --> 01:49:51.360 It's done. 01:49:51.360 --> 01:49:52.740 So I leave this alone. 01:49:52.740 --> 01:49:54.740 What was the next step in the story? 01:49:54.740 --> 01:49:58.160 I've just sorted the left half of the left half of the left half. 01:49:58.160 --> 01:49:59.580 What comes next? 01:49:59.580 --> 01:50:03.260 I sort the right half of the left half of the left half, 01:50:03.260 --> 01:50:06.260 and I'm done, because it's just a list of size 1. 01:50:06.260 --> 01:50:09.280 What comes after this? 01:50:09.280 --> 01:50:09.972 Merge. 01:50:09.972 --> 01:50:11.680 So this is where it gets a little trippy, 01:50:11.680 --> 01:50:14.770 because you have to remember where we're pausing the story to do things 01:50:14.770 --> 01:50:16.187 recursively again and again. 01:50:16.187 --> 01:50:19.270 But if I've just sorted the left half and I've just sorted the right half, 01:50:19.270 --> 01:50:20.890 now I merge them together. 01:50:20.890 --> 01:50:25.040 This is a super short list, so we don't need Carter's help here as before. 01:50:25.040 --> 01:50:27.640 But I think the first number I take here is the 2. 01:50:27.640 --> 01:50:31.660 And then the second number I take, because it's the only option, is the 7. 01:50:31.660 --> 01:50:35.260 But what's nice now is that, notice, the left half of the left half 01:50:35.260 --> 01:50:39.228 is indeed sorted, because I trivially sorted the left half of it 01:50:39.228 --> 01:50:40.270 and the right half of it. 01:50:40.270 --> 01:50:42.760 But then merging is really where the magic happens. 01:50:42.760 --> 01:50:45.670 All right, again, if you rewind now in your mind, 01:50:45.670 --> 01:50:51.300 if I've just sorted the left half of the left half, what happens next? 01:50:51.300 --> 01:50:55.000 Sort the right half of the left half. 01:50:55.000 --> 01:50:56.980 So again, you kind of rewind in time. 01:50:56.980 --> 01:50:58.290 So how do I do this? 01:50:58.290 --> 01:50:59.520 I've got a list of size 2. 01:50:59.520 --> 01:51:01.920 I sort the left half, just the 5, done. 01:51:01.920 --> 01:51:04.200 Sort the right half, 4, done. 01:51:04.200 --> 01:51:08.910 Now the interesting part, I merge the left half and the right half 01:51:08.910 --> 01:51:11.380 of the right half of the left half. 01:51:11.380 --> 01:51:12.450 So what do I do? 01:51:12.450 --> 01:51:14.280 4 comes down here. 01:51:14.280 --> 01:51:16.260 5 comes down here. 01:51:16.260 --> 01:51:19.860 And now, notice what I have. 01:51:19.860 --> 01:51:21.600 Left half is sorted. 01:51:21.600 --> 01:51:23.130 Right half is sorted. 01:51:23.130 --> 01:51:26.610 If you rewind in time, where is my next step, 3? 01:51:26.610 --> 01:51:27.742 Merge the two halves. 01:51:27.742 --> 01:51:29.700 And so this is what Carter helped me do before. 01:51:29.700 --> 01:51:31.290 Let's focus only on the smallest elements, 01:51:31.290 --> 01:51:32.665 just so there's less distraction. 01:51:32.665 --> 01:51:34.020 I compare the 2 and the 4. 01:51:34.020 --> 01:51:36.520 2 comes first, so let's obviously put that here. 01:51:36.520 --> 01:51:39.570 Now, I compare the new beginning of this list 01:51:39.570 --> 01:51:41.280 and the old beginning of this list. 01:51:41.280 --> 01:51:43.050 4 obviously comes next. 01:51:43.050 --> 01:51:45.940 And now, I compare the 7 against the 5. 01:51:45.940 --> 01:51:47.430 5 obviously comes next. 01:51:47.430 --> 01:51:49.240 And now, lastly, I'm left with one number. 01:51:49.240 --> 01:51:50.970 So now I'm down to the 7. 01:51:50.970 --> 01:51:53.490 So even if you've kind of lost track of some of the nuances 01:51:53.490 --> 01:51:55.510 here, if you just kind of take a step back, 01:51:55.510 --> 01:51:58.260 we have the original right half here still untouched. 01:51:58.260 --> 01:52:03.210 But the left half of the original input is now, indeed, sorted, 01:52:03.210 --> 01:52:07.140 all by way of doing sorting left half, right half, left half, right half, 01:52:07.140 --> 01:52:08.940 but with those merges in between. 01:52:08.940 --> 01:52:11.965 All right, so if we've just sorted the left half, 01:52:11.965 --> 01:52:13.590 we rewind all the way to the beginning. 01:52:13.590 --> 01:52:15.590 What do I now do? 01:52:15.590 --> 01:52:17.120 All right, so sort the right half. 01:52:17.120 --> 01:52:18.410 So sort the right half. 01:52:18.410 --> 01:52:20.180 How do I sort a list of size 4? 01:52:20.180 --> 01:52:22.550 Well, I first sort the left half, the 1 and the 6. 01:52:22.550 --> 01:52:24.560 How do I sort a list of size 2? 01:52:24.560 --> 01:52:26.757 You sort the left half, just the number 1. 01:52:26.757 --> 01:52:28.340 Obviously, there's no work to be done. 01:52:28.340 --> 01:52:30.620 Done, sorting the left half. 01:52:30.620 --> 01:52:33.080 6, done, sorting the right half. 01:52:33.080 --> 01:52:34.280 Now, what do I do? 01:52:34.280 --> 01:52:40.610 I merge the left half here with the right half here. 01:52:40.610 --> 01:52:42.240 And that one's pretty straightforward. 01:52:42.240 --> 01:52:43.050 Now, what do I do? 01:52:43.050 --> 01:52:43.910 I've just merged. 01:52:43.910 --> 01:52:45.048 So now I sort it. 01:52:45.048 --> 01:52:47.090 I've just sorted the left half of the right half. 01:52:47.090 --> 01:52:49.550 So now I sort the right half of the right half. 01:52:49.550 --> 01:52:51.590 So I consider the 0, done. 01:52:51.590 --> 01:52:53.270 I consider the 3, done. 01:52:53.270 --> 01:52:55.040 I now merge these two together. 01:52:55.040 --> 01:52:56.640 0, of course, comes first. 01:52:56.640 --> 01:52:58.100 Then comes the 3. 01:52:58.100 --> 01:53:00.680 And now I'm at the point of the story where 01:53:00.680 --> 01:53:02.930 I've sorted the left half of the right half 01:53:02.930 --> 01:53:04.910 and the right half of the right half. 01:53:04.910 --> 01:53:07.535 So step 3 is merge. 01:53:07.535 --> 01:53:09.410 And I'll do it again like we did with Carter. 01:53:09.410 --> 01:53:12.320 All right, 1 and 0, obviously the 0 comes first. 01:53:12.320 --> 01:53:14.390 Now, compare the 1 and the 3. 01:53:14.390 --> 01:53:16.130 Obviously, the 1 comes first. 01:53:16.130 --> 01:53:18.590 Compare the 6 and the 3, obviously the 3. 01:53:18.590 --> 01:53:20.300 And then lastly, the 6. 01:53:20.300 --> 01:53:21.890 So now, where are we? 01:53:21.890 --> 01:53:26.840 We've taken the left half of the whole thing and sorted it. 01:53:26.840 --> 01:53:29.990 We then took the right half of the whole thing and sorted it. 01:53:29.990 --> 01:53:33.560 So now we're at, lastly, step 3 for the last time. 01:53:33.560 --> 01:53:35.120 What do we do? 01:53:35.120 --> 01:53:35.780 Merge. 01:53:35.780 --> 01:53:40.220 And so just to be consistent, let me push these down, and let's compare. 01:53:40.220 --> 01:53:43.310 Left hand or right hand, noticing that they only make forward progress, 01:53:43.310 --> 01:53:45.230 none of this back and forth comparisons. 01:53:45.230 --> 01:53:47.270 2 and 0, of course, the 0. 01:53:47.270 --> 01:53:48.860 So we'll put that in place. 01:53:48.860 --> 01:53:51.140 2 and 1, of course, the 1. 01:53:51.140 --> 01:53:52.880 So we put that in place. 01:53:52.880 --> 01:53:56.930 2 and 3, we merge in, of course, the 2 in this case. 01:53:56.930 --> 01:54:00.770 4 and 3, we now merge in the 3 in this case. 01:54:00.770 --> 01:54:05.630 4 and 6, we now merge, of course, the 4 in place. 01:54:05.630 --> 01:54:07.760 And now, we compare 5 and 6. 01:54:07.760 --> 01:54:08.615 We keep the 5. 01:54:12.590 --> 01:54:15.226 Bug. 01:54:15.226 --> 01:54:17.295 OK, well pretend that the 5 is on. 01:54:20.040 --> 01:54:21.450 Oh, this is why. 01:54:21.450 --> 01:54:24.240 All right, so now we compare the 7 and the 6. 01:54:24.240 --> 01:54:26.430 6th is gone. 01:54:26.430 --> 01:54:29.520 And lastly, 7 is the last one in place. 01:54:29.520 --> 01:54:32.140 And even though I grant that of all the algorithms, 01:54:32.140 --> 01:54:34.320 this is probably the hardest one to stay on top of, 01:54:34.320 --> 01:54:36.210 especially when I'm doing it as a voice over. 01:54:36.210 --> 01:54:41.010 Realize that what we've just done is only those three steps, recursively. 01:54:41.010 --> 01:54:42.510 We started with a list of size 8. 01:54:42.510 --> 01:54:43.650 We sorted the left half. 01:54:43.650 --> 01:54:44.790 We sorted the right half. 01:54:44.790 --> 01:54:46.450 And then we merge the two together. 01:54:46.450 --> 01:54:48.960 But if you go down each of those rabbit holes, so to speak, 01:54:48.960 --> 01:54:52.410 sorting the left half involves sorting the left half of the left half 01:54:52.410 --> 01:54:54.970 and the right half of the left half, and so forth. 01:54:54.970 --> 01:54:59.250 But this germ of an idea of really dividing and conquering the problem, 01:54:59.250 --> 01:55:02.520 not such that you're having the problem and only dealing with one half. 01:55:02.520 --> 01:55:05.700 Clearly, we're sorting one half and the other half 01:55:05.700 --> 01:55:07.780 and merging them together, ultimately. 01:55:07.780 --> 01:55:10.810 It does still lead us to the same solution. 01:55:10.810 --> 01:55:14.550 And if we visualize the remnants of this now, if I depict this as follows, 01:55:14.550 --> 01:55:18.390 where on the screen here, you see where the numbers originally started 01:55:18.390 --> 01:55:20.310 in the top row from left to right. 01:55:20.310 --> 01:55:23.470 Essentially, even though this is in a different order, 01:55:23.470 --> 01:55:28.708 I divided that list of size 8, ultimately, into eight lists of size 1. 01:55:28.708 --> 01:55:31.000 And that's where the base case kicked in and just said, 01:55:31.000 --> 01:55:32.580 OK, we're done sorting that. 01:55:32.580 --> 01:55:37.680 And after that, logically, I then merged two lists of size 1 01:55:37.680 --> 01:55:41.580 into many lists of size 2 and those lists of size 2 into lists of size 4. 01:55:41.580 --> 01:55:47.250 And then finally, the list of size 4 into one big list sorted of size 8. 01:55:47.250 --> 01:55:50.610 And so I put forth this picture with the little line indicators 01:55:50.610 --> 01:55:55.620 here, because how many times did I divide, divide, divide in half? 01:55:55.620 --> 01:55:57.360 Or really double, double, double. 01:55:57.360 --> 01:55:59.280 So exponent is the opposite-- 01:55:59.280 --> 01:56:00.600 spoiler. 01:56:00.600 --> 01:56:02.610 How many times did I divide? 01:56:02.610 --> 01:56:04.320 So three, concretely. 01:56:04.320 --> 01:56:08.070 But if there's eight elements total, and there's n more generally, 01:56:08.070 --> 01:56:12.300 it really is a matter of dividing and conquering log n times. 01:56:12.300 --> 01:56:15.360 You start this, and you can divide one, two, three times, log n times. 01:56:15.360 --> 01:56:18.930 Or conversely, you can start here and exponentially double, double, 01:56:18.930 --> 01:56:21.060 double three times, which is log n. 01:56:21.060 --> 01:56:24.510 But on every row, every shelf, literally, I 01:56:24.510 --> 01:56:28.350 made a fuss about pointing my hands only from the left to the right, 01:56:28.350 --> 01:56:31.380 constantly advancing them, such that every time I did those merges, 01:56:31.380 --> 01:56:34.500 I touched every element once and only once. 01:56:34.500 --> 01:56:37.470 There was none of this back and forth, back and forth on stage. 01:56:37.470 --> 01:56:42.750 So if I'm doing something log n times, or if I'm doing, 01:56:42.750 --> 01:56:49.410 rather, n things log n times, what would be our big O formula, perhaps? 01:56:49.410 --> 01:56:51.057 n things log n times? 01:56:51.057 --> 01:56:52.140 STUDENT: Oh, it's n log n. 01:56:52.140 --> 01:56:53.490 DAVID MALAN: Yeah, so n log n. 01:56:53.490 --> 01:56:56.430 The order of n log n is, indeed, how we would describe 01:56:56.430 --> 01:56:58.560 the running time of merge sort. 01:56:58.560 --> 01:57:03.270 And so of all of the sorts thus far, we've seen that merge sort here, 01:57:03.270 --> 01:57:07.120 actually, is n log n, which is strictly better than n squared, 01:57:07.120 --> 01:57:10.650 which is where both selection sort and bubble sort landed. 01:57:10.650 --> 01:57:14.015 But it's also slower than linear search, for instance. 01:57:14.015 --> 01:57:15.390 But you would rather expect that. 01:57:15.390 --> 01:57:17.880 If you have to do a lot of work up front sorting 01:57:17.880 --> 01:57:19.878 some elements versus just searching them, 01:57:19.878 --> 01:57:21.670 you're going to have to put in more effort. 01:57:21.670 --> 01:57:24.150 And so the question of whether or not you should just search something 01:57:24.150 --> 01:57:26.670 blindly with linear search and not bother sorting it, 01:57:26.670 --> 01:57:30.327 really boils down to, can you afford to spend this amount of time? 01:57:30.327 --> 01:57:32.160 And if you're the Googles of the world, odds 01:57:32.160 --> 01:57:35.170 are you don't want to be searching their database linearly every time. 01:57:35.170 --> 01:57:35.670 Why? 01:57:35.670 --> 01:57:40.380 Because you can sort it once and then benefit millions, billions of people, 01:57:40.380 --> 01:57:43.560 subsequently using something like binary search or, frankly in practice, 01:57:43.560 --> 01:57:46.443 something even fancier and faster than binary search. 01:57:46.443 --> 01:57:48.360 But there's always going to be this trade off. 01:57:48.360 --> 01:57:52.140 You can achieve binary search only if the elements are sorted. 01:57:52.140 --> 01:57:53.940 How much does it cost you to sort them? 01:57:53.940 --> 01:57:56.940 Well, maybe n squared, if you used some of the earlier algorithms. 01:57:56.940 --> 01:58:00.850 But it turns out, n log in is pretty fast as well. 01:58:00.850 --> 01:58:06.180 So at the end of the day, these running times involve trade offs. 01:58:06.180 --> 01:58:09.600 And indeed, in merge sort 2, I should note that the lower bound on merge 01:58:09.600 --> 01:58:12.090 sort is also going to be omega of n log n. 01:58:12.090 --> 01:58:14.640 As such, we can describe it in terms of our theta notation, 01:58:14.640 --> 01:58:18.030 saying that merge sort is, indeed, in theta of n log n. 01:58:18.030 --> 01:58:21.780 So generally speaking, probably better to use something like merge sort 01:58:21.780 --> 01:58:24.030 or some other algorithm that's n log n. 01:58:24.030 --> 01:58:27.660 In practice, most programmers are not implementing these sorting algorithms 01:58:27.660 --> 01:58:28.200 themselves. 01:58:28.200 --> 01:58:30.390 Odds are, they're using a library off the shelf 01:58:30.390 --> 01:58:34.200 that themselves have made the decision as to which of these algorithms to do. 01:58:34.200 --> 01:58:37.140 But generally speaking, and we're seeing now this for the first time, 01:58:37.140 --> 01:58:41.430 if you want to improve time, like use less time, write faster code, 01:58:41.430 --> 01:58:42.750 you've got to pay a price. 01:58:42.750 --> 01:58:45.120 And that might be your human time, just takes you 01:58:45.120 --> 01:58:47.400 more time to code up something more sophisticated, 01:58:47.400 --> 01:58:49.080 more difficult to implement. 01:58:49.080 --> 01:58:51.840 Or you need to spend something like space. 01:58:51.840 --> 01:58:54.060 And as these shelves suggest, that too is 01:58:54.060 --> 01:58:55.740 one of the key details of merge sort. 01:58:55.740 --> 01:58:58.500 You can't just have the elements swapping in place. 01:58:58.500 --> 01:59:02.520 You need at least an auxiliary array, so that when you do the merging, 01:59:02.520 --> 01:59:03.960 you have a place to put them. 01:59:03.960 --> 01:59:05.988 And this is excessive, this amount of memory. 01:59:05.988 --> 01:59:09.030 I could have just gone back and forth between top shelf and bottom shelf. 01:59:09.030 --> 01:59:11.113 But it's a little more interesting to go top down. 01:59:11.113 --> 01:59:12.870 But you do need more space. 01:59:12.870 --> 01:59:15.605 Back in the day, decades ago, space was really expensive. 01:59:15.605 --> 01:59:16.480 And so you know what? 01:59:16.480 --> 01:59:19.450 It might have been better to not use merge sort, use bubble sort 01:59:19.450 --> 01:59:23.230 or selection sort even, or some other algorithm altogether. 01:59:23.230 --> 01:59:25.300 Nowadays, space is relatively cheap. 01:59:25.300 --> 01:59:27.160 And so these are more acceptable trade offs. 01:59:27.160 --> 01:59:29.650 But it totally depends on the application. 01:59:29.650 --> 01:59:32.950 The very last thing we thought we'd do is show you an actual comparison 01:59:32.950 --> 01:59:34.450 of some of these sorting algorithms. 01:59:34.450 --> 01:59:35.800 It's about 60 seconds long. 01:59:35.800 --> 01:59:39.790 And it will compare for you, selection sort, bubble sort, 01:59:39.790 --> 01:59:44.350 and merge sort in parallel simultaneously with some fun sorting 01:59:44.350 --> 01:59:46.570 music, showing you ultimately what it really 01:59:46.570 --> 01:59:52.750 means to be an O of n squared, or better yet, big O of n log n. 01:59:52.750 --> 01:59:54.850 Selection on the top. 01:59:54.850 --> 01:59:58.050 Bubble on the bottom. 01:59:58.050 --> 01:59:59.400 Merge in the middle. 01:59:59.400 --> 02:00:01.885 [MUSIC PLAYING] 02:00:53.660 --> 02:00:55.700 All right, that's it for CS50. 02:00:55.700 --> 02:00:57.910 We'll see you next time.