WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:01:24.180 [MUSIC PLAYING] 00:01:24.180 --> 00:01:25.890 SPEAKER 1: This is CS50. 00:01:25.890 --> 00:01:27.870 And perhaps not unlike past lectures, today 00:01:27.870 --> 00:01:30.187 is going to feel a bit like a fire hose again, 00:01:30.187 --> 00:01:32.520 but realize that it's going to be a lot less code today. 00:01:32.520 --> 00:01:35.550 So there's less syntax, just a few new ideas here and there. 00:01:35.550 --> 00:01:38.550 And the goal ultimately today really is the concepts and the ideas 00:01:38.550 --> 00:01:39.360 that we take away. 00:01:39.360 --> 00:01:41.735 And keep in mind that over the remainder of the semester, 00:01:41.735 --> 00:01:44.710 we will continue to apply and reapply these same ideas. 00:01:44.710 --> 00:01:46.380 So the goal today really is exposure. 00:01:46.380 --> 00:01:48.690 And the goal for the semester is comfort. 00:01:48.690 --> 00:01:52.058 So with that said, let's consider where we left off 00:01:52.058 --> 00:01:54.600 last time, which was to consider that inside of our computer. 00:01:54.600 --> 00:01:55.683 Of course, we have memory. 00:01:55.683 --> 00:01:57.360 We have RAM, Random Access Memory. 00:01:57.360 --> 00:02:00.180 And it was convenient, we found, to start to divide this up 00:02:00.180 --> 00:02:01.440 into individual bytes. 00:02:01.440 --> 00:02:05.520 Like, byte 0 might be in the top left, and 2 gigabytes, 2 billion bytes, 00:02:05.520 --> 00:02:08.370 might be all the way down there in the bottom right-hand corner. 00:02:08.370 --> 00:02:10.199 And once we started to layout our memory, 00:02:10.199 --> 00:02:14.730 both conceptually and technologically, in this way, left to right, top 00:02:14.730 --> 00:02:18.420 to bottom, we had the ability to use a data structure of sorts. 00:02:18.420 --> 00:02:21.330 We introduced, recall, arrays, whereby if we think of our memory 00:02:21.330 --> 00:02:24.180 as just a grid of bytes, we can start using it 00:02:24.180 --> 00:02:26.610 kind of to our advantage to solve problems. 00:02:26.610 --> 00:02:30.660 And it turns out that perhaps a physical incarnation of this idea of an array 00:02:30.660 --> 00:02:32.790 might be like these red lockers here. 00:02:32.790 --> 00:02:35.970 Because even though you and I, every time we've looked at arrays thus far, 00:02:35.970 --> 00:02:39.330 can kind of get a bird's-eye view of everything that's in the array, 00:02:39.330 --> 00:02:40.830 computer's actually pretty limited. 00:02:40.830 --> 00:02:43.387 It doesn't have that instant detection that you and I 00:02:43.387 --> 00:02:46.470 have when we just scan a list of numbers and [? kind ?] of take it all in. 00:02:46.470 --> 00:02:50.760 A computer is only going to be able to look at the contents of an array step 00:02:50.760 --> 00:02:53.692 by step by step, consistent with this whole idea of an algorithm. 00:02:53.692 --> 00:02:56.400 A computer can't just look at all the numbers and take it all in. 00:02:56.400 --> 00:03:00.000 It can only open, so to speak, one of these lockers at a time. 00:03:00.000 --> 00:03:03.120 So toward that end, we've gone ahead and populated these lockers 00:03:03.120 --> 00:03:04.545 with a whole bunch of numbers. 00:03:04.545 --> 00:03:06.420 And the goal at hand is to solve the problem. 00:03:06.420 --> 00:03:08.693 The goal at hand is to find one of those numbers. 00:03:08.693 --> 00:03:11.610 So if we distill computer science into this problem-solving mechanism, 00:03:11.610 --> 00:03:14.190 the input today is these seven lockers. 00:03:14.190 --> 00:03:18.090 And the output is going to be a Boolean, true or false, 00:03:18.090 --> 00:03:21.480 is the number we're looking for among those seven lockers? 00:03:21.480 --> 00:03:25.140 And so rather than my kind of poking around looking for this number, 00:03:25.140 --> 00:03:31.050 might I call on, say, two volunteers to kick us off with two, say, algorithms? 00:03:31.050 --> 00:03:32.310 Let's see. 00:03:32.310 --> 00:03:33.180 Over here? 00:03:33.180 --> 00:03:33.870 Yeah. 00:03:33.870 --> 00:03:37.710 And let's go over here in front, if we may. 00:03:37.710 --> 00:03:39.560 Come on up. 00:03:39.560 --> 00:03:40.560 And what are your names? 00:03:40.560 --> 00:03:41.500 AUDIENCE: [? Nizari. ?] 00:03:41.500 --> 00:03:41.930 SPEAKER 1: Lizari? 00:03:41.930 --> 00:03:42.430 AUDIENCE: [? Nizari. ?] 00:03:42.430 --> 00:03:44.050 SPEAKER 1: [? Nizari. ?] OK, David. 00:03:44.050 --> 00:03:44.800 Nice to meet you. 00:03:44.800 --> 00:03:47.670 Come on up, [? Nizari. ?] And over here, what's your name? 00:03:47.670 --> 00:03:48.467 AUDIENCE: Eric. 00:03:48.467 --> 00:03:49.300 SPEAKER 1: Eric, OK. 00:03:49.300 --> 00:03:49.800 David. 00:03:49.800 --> 00:03:51.210 And come on up, Brian, as well. 00:03:51.210 --> 00:03:52.670 Nice to meet you, Eric. 00:03:52.670 --> 00:03:54.560 Eric, [? Nizari. ?] [? Nizari, ?] Eric. 00:03:54.560 --> 00:03:55.890 [APPLAUSE] 00:03:55.890 --> 00:03:57.630 Come on over here. 00:03:57.630 --> 00:03:59.730 So you came on up the stage first. 00:03:59.730 --> 00:04:01.530 Would you like to go first or go second? 00:04:01.530 --> 00:04:02.270 AUDIENCE: I'll go second. 00:04:02.270 --> 00:04:03.510 SPEAKER 1: You're going to go second. 00:04:03.510 --> 00:04:04.260 So Eric, you're up first. 00:04:04.260 --> 00:04:05.593 Come on over here, if you would. 00:04:05.593 --> 00:04:10.260 So Eric, behind these seven doors we have placed, in advance, the number 50. 00:04:10.260 --> 00:04:12.570 And we would simply like you, the computer, 00:04:12.570 --> 00:04:15.560 to search this array for the number 50. 00:04:15.560 --> 00:04:17.440 AUDIENCE: Is it sorted? 00:04:17.440 --> 00:04:21.180 SPEAKER 1: I cannot answer that question at this time. 00:04:21.180 --> 00:04:23.582 Go. 00:04:23.582 --> 00:04:25.290 Oh, and so that the audience knows what's 00:04:25.290 --> 00:04:28.170 going on, if you wouldn't mind taking the numbers out to see. 00:04:28.170 --> 00:04:29.360 AUDIENCE: This is seven. 00:04:29.360 --> 00:04:30.480 SPEAKER 1: Excellent. 00:04:30.480 --> 00:04:31.310 Not 50. 00:04:31.310 --> 00:04:33.160 AUDIENCE: Should I-- can I take it out? 00:04:33.160 --> 00:04:35.827 SPEAKER 1: At this point, yes, you may do whatever you want now. 00:04:35.827 --> 00:04:37.594 Just find us 50. 00:04:37.594 --> 00:04:38.940 AUDIENCE: Two. 00:04:38.940 --> 00:04:39.860 [LAUGHTER] 00:04:39.860 --> 00:04:42.216 SPEAKER 1: Very good. 00:04:42.216 --> 00:04:43.400 AUDIENCE: One. 00:04:43.400 --> 00:04:45.972 SPEAKER 1: Nice. 00:04:45.972 --> 00:04:47.286 AUDIENCE: Six. 00:04:47.286 --> 00:04:50.094 SPEAKER 1: Very good. 00:04:50.094 --> 00:04:50.800 AUDIENCE: Three. 00:04:51.300 --> 00:04:51.967 SPEAKER 1: Nice. 00:04:51.967 --> 00:04:53.633 AUDIENCE: None of these are close to 50. 00:04:53.633 --> 00:04:55.620 SPEAKER 1: No, none of them are close to 50. 00:04:55.620 --> 00:04:56.355 AUDIENCE: Four. 00:04:56.355 --> 00:04:57.810 SPEAKER 1: Four? 00:04:57.810 --> 00:04:59.748 And? 00:04:59.748 --> 00:05:00.720 AUDIENCE: 50! 00:05:00.720 --> 00:05:01.770 SPEAKER 1: Amazing! 00:05:01.770 --> 00:05:02.770 Very well done. 00:05:02.770 --> 00:05:04.960 [APPLAUSE] 00:05:05.785 --> 00:05:06.410 Very well done. 00:05:06.410 --> 00:05:08.390 Now, if I may, Eric, what was the algorithm 00:05:08.390 --> 00:05:10.160 via which you found us the number 50? 00:05:10.160 --> 00:05:11.195 AUDIENCE: Linear search. 00:05:11.195 --> 00:05:13.520 SPEAKER 1: OK, linear search, meaning what to you? 00:05:13.520 --> 00:05:17.215 AUDIENCE: You just go in a line, starting from there until there. 00:05:17.215 --> 00:05:19.340 SPEAKER 1: OK, that was a very sophisticated answer 00:05:19.340 --> 00:05:20.798 to a term we've not yet introduced. 00:05:20.798 --> 00:05:24.290 And that's great, linear search from left to right, so literally 00:05:24.290 --> 00:05:25.043 following a line. 00:05:25.043 --> 00:05:26.960 And was your algorithm correct, would you say? 00:05:26.960 --> 00:05:27.605 AUDIENCE: Yes. 00:05:27.605 --> 00:05:29.010 SPEAKER 1: OK, so it was correct. 00:05:29.010 --> 00:05:31.048 But there's these different parameters that we 00:05:31.048 --> 00:05:33.590 want to optimize solutions for not just correctness, but what 00:05:33.590 --> 00:05:35.110 other property as well? 00:05:35.110 --> 00:05:35.870 AUDIENCE: Design. 00:05:35.870 --> 00:05:37.953 SPEAKER 1: So maybe design, right, the efficiency. 00:05:37.953 --> 00:05:40.100 So was that the most efficient you could have done? 00:05:40.100 --> 00:05:43.160 AUDIENCE: Actually, yeah, I think so. 00:05:43.160 --> 00:05:43.940 [LAUGHTER] 00:05:43.940 --> 00:05:44.940 SPEAKER 1: And why do you say that? 00:05:44.940 --> 00:05:46.950 AUDIENCE: Because-- so the numbers are sorted. 00:05:46.950 --> 00:05:50.160 So at the end of the day, I have to look through every single one. 00:05:50.160 --> 00:05:50.360 SPEAKER 1: Yeah. 00:05:50.360 --> 00:05:52.520 AUDIENCE: And it's just by chance that the 50 was at last. 00:05:52.520 --> 00:05:53.312 SPEAKER 1: Exactly. 00:05:53.312 --> 00:05:55.370 So it's unfortunate that they were all random. 00:05:55.370 --> 00:05:57.680 And I didn't want to tell you because I didn't want to bias your algorithm one 00:05:57.680 --> 00:05:58.388 way or the other. 00:05:58.388 --> 00:06:01.400 But not knowing if they're sorted and them not even being sorted 00:06:01.400 --> 00:06:04.968 means that that is the best you can do, look at all of the doors 00:06:04.968 --> 00:06:06.260 to find the number in question. 00:06:06.260 --> 00:06:08.908 And maybe you could have gotten lucky, if we had put 50 here. 00:06:08.908 --> 00:06:10.700 But in the worst case, Eric was, of course, 00:06:10.700 --> 00:06:13.280 going to have to do exactly that, searching all of the boxes. 00:06:13.280 --> 00:06:14.480 So thank you, Eric. 00:06:14.480 --> 00:06:16.100 Stay on stage with us, if you would, for a moment. 00:06:16.100 --> 00:06:18.230 And a round of applause, if we could, for finding 50 so well. 00:06:18.230 --> 00:06:18.730 [APPLAUSE] 00:06:18.730 --> 00:06:21.120 [? Nizari, ?] could you come on up? 00:06:21.120 --> 00:06:23.390 We need you not to look at the numbers, because Brian 00:06:23.390 --> 00:06:24.830 needs to do a little bit of magic. 00:06:24.830 --> 00:06:27.170 And he's going to put some of the numbers back into the locker. 00:06:27.170 --> 00:06:29.253 So literally everyone in the room will know what's 00:06:29.253 --> 00:06:30.932 going on except you, at the moment. 00:06:30.932 --> 00:06:33.140 But we're going to give you the added bonus this time 00:06:33.140 --> 00:06:36.080 of sorting the numbers in advance. 00:06:36.080 --> 00:06:38.567 So Brian is in the process of sorting some numbers for us. 00:06:38.567 --> 00:06:40.400 The goal at hand, in just a moment, is still 00:06:40.400 --> 00:06:42.380 going to be the find the number 50. 00:06:42.380 --> 00:06:45.780 I'm really just stalling right now because he's still doing this. 00:06:45.780 --> 00:06:49.730 So I don't really have anything interesting to say just yet. 00:06:49.730 --> 00:06:50.660 Brian's back now. 00:06:50.660 --> 00:06:51.460 Hold on. 00:06:51.460 --> 00:06:52.790 And would you like to introduce yourself maybe? 00:06:52.790 --> 00:06:53.580 AUDIENCE: I'm [? Nizari. ?] 00:06:53.580 --> 00:06:54.590 SPEAKER 1: [? Nizari, ?] and what year are you? 00:06:54.590 --> 00:06:56.615 AUDIENCE: I'm a high school student, a senior. 00:06:56.615 --> 00:06:57.490 SPEAKER 1: Wonderful. 00:06:57.490 --> 00:06:58.130 At what school? 00:06:58.130 --> 00:06:59.300 AUDIENCE: Cambridge Rindge and Latin, it's down the street. 00:06:59.300 --> 00:07:00.572 SPEAKER 1: Just down the road. 00:07:00.572 --> 00:07:02.030 So glad you can join us here today. 00:07:02.030 --> 00:07:03.770 And perfect timing, if I may. 00:07:03.770 --> 00:07:06.230 Now we have seven lockers here behind you. 00:07:06.230 --> 00:07:08.270 And the goal now is to still find the number 50. 00:07:08.270 --> 00:07:10.860 But I'm going to tell you that the numbers are sorted. 00:07:10.860 --> 00:07:14.100 So what's going to be your algorithm, if not the same as Eric? 00:07:14.100 --> 00:07:15.100 AUDIENCE: I will start-- 00:07:15.100 --> 00:07:15.510 SPEAKER 1: And here you go. 00:07:15.510 --> 00:07:17.450 AUDIENCE: I'm going to start in the middle. 00:07:17.450 --> 00:07:19.400 SPEAKER 1: All right, go ahead and show us what's in the middle. 00:07:19.400 --> 00:07:20.805 AUDIENCE: Middle number is seven. 00:07:20.805 --> 00:07:21.680 SPEAKER 1: All right. 00:07:21.680 --> 00:07:23.695 And now what's your next step going to be? 00:07:23.695 --> 00:07:25.070 AUDIENCE: So I want to get to 50. 00:07:25.070 --> 00:07:27.770 So assuming that they're sorted, I'm going to go this way. 00:07:27.770 --> 00:07:28.700 SPEAKER 1: Go to the right, OK. 00:07:28.700 --> 00:07:31.117 So we have three lockers remaining on the right-hand side. 00:07:31.117 --> 00:07:32.240 What's your instinct now? 00:07:32.240 --> 00:07:34.620 AUDIENCE: Mm, I'm going to start with this locker. 00:07:34.620 --> 00:07:36.530 SPEAKER 1: OK, this one being in the middle of those three. 00:07:36.530 --> 00:07:36.910 And you find? 00:07:36.910 --> 00:07:37.940 AUDIENCE: And we got 81. 00:07:37.940 --> 00:07:38.435 SPEAKER 1: 81. 00:07:38.435 --> 00:07:39.800 AUDIENCE: So I know that's too big. 00:07:39.800 --> 00:07:40.620 SPEAKER 1: Way too far. 00:07:40.620 --> 00:07:42.260 AUDIENCE: Hoo, so I'm going to go with this one. 00:07:42.350 --> 00:07:43.970 SPEAKER 1: Which is now in the middle of the two lockers. 00:07:43.970 --> 00:07:44.590 AUDIENCE: And I got 50. 00:07:44.590 --> 00:07:47.574 SPEAKER 1: And a round of applause, if we could, for [? Nizari. ?] 00:07:47.574 --> 00:07:50.240 [APPLAUSE] 00:07:50.240 --> 00:07:52.190 Congratulations and thank you to you both. 00:07:52.190 --> 00:07:53.340 So thanks to you both. 00:07:53.340 --> 00:07:57.260 So here were two algorithms, dubbed linear search and binary search. 00:07:57.260 --> 00:08:00.440 And that's all we have for you right now. 00:08:00.440 --> 00:08:01.730 [LAUGHTER] 00:08:01.730 --> 00:08:04.190 So linear search and binary search are aptly 00:08:04.190 --> 00:08:06.320 named for exactly the reasons we saw. 00:08:06.320 --> 00:08:09.230 Eric literally walked across in a line looking 00:08:09.230 --> 00:08:12.110 for some element, where [? Nizari, ?] instead, actually used 00:08:12.110 --> 00:08:14.510 binary search, "bi" meaning two, and being 00:08:14.510 --> 00:08:18.080 very reminiscent of our discussion of phone books in week 0, 00:08:18.080 --> 00:08:20.153 when I did this divide-and-conquer approach. 00:08:20.153 --> 00:08:22.070 That, too, was called, even though I might not 00:08:22.070 --> 00:08:24.620 have labeled it as such, binary search because I 00:08:24.620 --> 00:08:27.995 kept dividing the problem in two, hence the "bi" in binary. 00:08:27.995 --> 00:08:30.150 Binary search was again and again and again, 00:08:30.150 --> 00:08:34.773 just as we did here when searching for 50 the second time around. 00:08:34.773 --> 00:08:36.440 So these, of course, are two algorithms. 00:08:36.440 --> 00:08:39.380 But let's now start to formalize this discussion a little bit 00:08:39.380 --> 00:08:43.340 and consider how it was each of them was able to solve the problem correctly 00:08:43.340 --> 00:08:45.470 and then ultimately with better design. 00:08:45.470 --> 00:08:48.740 So linear search, we might distill as pseudocode like this 00:08:48.740 --> 00:08:52.408 and again, pseudocode, English-like syntax, no one way to write this. 00:08:52.408 --> 00:08:54.950 Eric, if I can put words in your mouth, might have done this. 00:08:54.950 --> 00:08:58.190 You might have thought to yourself for [? i ?] from 0 to n minus 1, 00:08:58.190 --> 00:09:03.230 to very quickly kind of map it to the idea of code, where this is locker 0, 00:09:03.230 --> 00:09:07.820 and this is locker n minus 1 or 7 or 6, specifically, in this case, 00:09:07.820 --> 00:09:09.350 with 7 total lockers. 00:09:09.350 --> 00:09:13.670 He then checked if the ith elements-- ith just meaning the one he's currently 00:09:13.670 --> 00:09:14.330 looking at-- 00:09:14.330 --> 00:09:16.910 happens to be 50, then go ahead and return true, 00:09:16.910 --> 00:09:19.910 the bool that was meant to be the output of this algorithm. 00:09:19.910 --> 00:09:22.490 And he kept doing that and doing that and doing that. 00:09:22.490 --> 00:09:24.710 But suppose 50 were not there. 00:09:24.710 --> 00:09:28.160 And suppose he got all the way here, to where there is no locker. 00:09:28.160 --> 00:09:29.736 What should he ultimately return? 00:09:29.736 --> 00:09:30.430 AUDIENCE: False. 00:09:30.430 --> 00:09:31.263 SPEAKER 1: So false. 00:09:31.263 --> 00:09:34.910 And so the very last step of this algorithm not inside of that loop 00:09:34.910 --> 00:09:38.013 has to be kind of a catch all, where you just say, return false. 00:09:38.013 --> 00:09:40.430 If I got all the way through this loop and didn't find it, 00:09:40.430 --> 00:09:42.840 it must be the case that 50 is simply not there. 00:09:42.840 --> 00:09:45.660 So that might be one way to write the pseudocode for this problem. 00:09:45.660 --> 00:09:50.600 But now let's consider for a moment just how efficient or inefficient that code 00:09:50.600 --> 00:09:53.300 might have been vis a vis the second algorithm, [? Nizari, ?] 00:09:53.300 --> 00:09:56.470 where she actually divided the conquer in half in half in half. 00:09:56.470 --> 00:09:58.280 That, of course, was called binary search. 00:09:58.280 --> 00:10:00.447 And we can write this in any number of ways as well. 00:10:00.447 --> 00:10:02.120 But in pseudocode, I might propose this. 00:10:02.120 --> 00:10:03.870 Look right in the middle, just as she did. 00:10:03.870 --> 00:10:07.160 And if that number is 50, what should she have returned or outputted? 00:10:07.160 --> 00:10:08.150 AUDIENCE: True. 00:10:08.150 --> 00:10:09.650 SPEAKER 1: So true as our bool. 00:10:09.650 --> 00:10:14.600 And so we might have done this, else if 50 were less than the middle item. 00:10:14.600 --> 00:10:16.940 She probably wanted a search to the left, 00:10:16.940 --> 00:10:20.300 just as when I was searching for Mike Smith, I might have gone left or right. 00:10:20.300 --> 00:10:25.250 So if 50 is less than the middle item, she might want to search the left half. 00:10:25.250 --> 00:10:28.040 Meanwhile, if 50 is greater than the middle item, 00:10:28.040 --> 00:10:31.160 then she might want to search instead the right half. 00:10:31.160 --> 00:10:34.730 But there is a fourth possibility, just to be safe here. 00:10:34.730 --> 00:10:36.050 What else might be the case? 00:10:36.050 --> 00:10:39.742 It's not in the middle, and it's not to the left, and it's not to the right. 00:10:39.742 --> 00:10:40.700 So it's just not there. 00:10:40.700 --> 00:10:43.550 And so there's actually a fourth case, and we can express this differently. 00:10:43.550 --> 00:10:45.425 I'm going to go ahead and just say at the top 00:10:45.425 --> 00:10:49.070 if there's no items in the list, let me go ahead and just claim return false. 00:10:49.070 --> 00:10:50.120 There's nothing there. 00:10:50.120 --> 00:10:53.360 After all, if I keep dividing a list in half and half and half and half, 00:10:53.360 --> 00:10:55.250 eventually there's going to be no list left. 00:10:55.250 --> 00:10:58.160 At which point, I should just conclude, oh, it clearly wasn't there. 00:10:58.160 --> 00:11:02.623 If I halved it so many times, nothing is left on the right or the left. 00:11:02.623 --> 00:11:04.040 So how might we now think of this? 00:11:04.040 --> 00:11:07.230 Well, just as in week 0, we had a picture like this. 00:11:07.230 --> 00:11:09.740 And we claimed that these algorithms were either 00:11:09.740 --> 00:11:11.600 linear in nature, literally a straight line, 00:11:11.600 --> 00:11:14.780 like Eric's, or a little more curved or logarithmic, 00:11:14.780 --> 00:11:16.640 so to speak, like [? Nizari's. ?] And these 00:11:16.640 --> 00:11:18.098 had fundamentally different shapes. 00:11:18.098 --> 00:11:20.420 And we refer to them really by the number of steps 00:11:20.420 --> 00:11:22.220 they might take in the worst case. 00:11:22.220 --> 00:11:26.720 If the phone book or today, the number of lockers was n in total, 00:11:26.720 --> 00:11:29.120 it might take as many as n steps for Eric or anyone 00:11:29.120 --> 00:11:32.790 to find Mike Smith or the number 50 from left to right. 00:11:32.790 --> 00:11:37.910 If in week 0, I did two pages at a time, you can actually speed that up, 00:11:37.910 --> 00:11:39.800 but the shape of the line was the same. 00:11:39.800 --> 00:11:41.360 Eric didn't do that here, but he could have. 00:11:41.360 --> 00:11:43.970 With two hands, he maybe could have looked at two lockers at once. 00:11:43.970 --> 00:11:46.970 So that might have been an intermediate step between those two extremes. 00:11:46.970 --> 00:11:49.160 But logarithmic was this more curved shape. 00:11:49.160 --> 00:11:52.010 But today, we're going to start to formalize this a little bit 00:11:52.010 --> 00:11:54.380 so that we don't keep talking about searching 00:11:54.380 --> 00:11:58.100 and binary search in linear search alone, but other algorithms as well. 00:11:58.100 --> 00:12:00.470 And computer scientists now actually have 00:12:00.470 --> 00:12:03.680 terminology with which to describe algorithms 00:12:03.680 --> 00:12:06.140 and just how well designed your algorithm is 00:12:06.140 --> 00:12:08.090 or how well implemented your code is. 00:12:08.090 --> 00:12:12.050 And it's generally called big O, literally a capital, italicized O. 00:12:12.050 --> 00:12:14.870 Big O notation just means on the order of. 00:12:14.870 --> 00:12:19.490 So if you were asked by someone what is the efficiency of your algorithm 00:12:19.490 --> 00:12:21.620 or the efficiency of your code, you could kind of 00:12:21.620 --> 00:12:23.910 wave your hand, literally and figuratively, 00:12:23.910 --> 00:12:27.710 and give them an approximation of just how fast or slow your code is. 00:12:27.710 --> 00:12:31.840 So instead of saying literally n steps or n/2 or log n steps, 00:12:31.840 --> 00:12:33.920 a computer scientists would typically say, ah, 00:12:33.920 --> 00:12:38.240 that algorithm is on the order of n or on the order of n/2 00:12:38.240 --> 00:12:40.220 or on the order of log n. 00:12:40.220 --> 00:12:43.160 So this is just cryptic-looking syntax that you pronounce verbally 00:12:43.160 --> 00:12:44.840 as "on the order of." 00:12:44.840 --> 00:12:48.470 And it's kind of written like a math function, just as we have here. 00:12:48.470 --> 00:12:51.050 But it turns out that when you're using big O notation, 00:12:51.050 --> 00:12:52.520 it really is kind of hand-waving. 00:12:52.520 --> 00:12:54.560 Like, it's just meant to be an approximation. 00:12:54.560 --> 00:12:55.460 And you know what? 00:12:55.460 --> 00:12:59.030 In this case here, these lines are so similar looking, 00:12:59.030 --> 00:13:01.370 I'm actually going to throw away the divided by 2. 00:13:01.370 --> 00:13:03.470 And we'll see why this is OK in just a moment. 00:13:03.470 --> 00:13:06.830 But those are so similar that I'm just going to call them the same thing. 00:13:06.830 --> 00:13:09.950 And it turns out-- and it's fine if you don't recall logarithms too well-- 00:13:09.950 --> 00:13:11.660 the base 2 there it doesn't really matter. 00:13:11.660 --> 00:13:12.868 I'm going to throw that away. 00:13:12.868 --> 00:13:14.510 It can be base 2 or 3 or 10. 00:13:14.510 --> 00:13:16.527 They're all within multiples of one another. 00:13:16.527 --> 00:13:18.110 So that's no big deal either, I claim. 00:13:18.110 --> 00:13:20.330 And if you don't recall, that's OK, too. 00:13:20.330 --> 00:13:23.780 But the reason I claim that this red line and this yellow line 00:13:23.780 --> 00:13:27.950 are essentially the same thing is because if the problem gets big enough, 00:13:27.950 --> 00:13:30.500 that is the size of the problem gets bigger and bigger, 00:13:30.500 --> 00:13:34.580 and I only have so much screen here-- so let me instead just zoom out so that we 00:13:34.580 --> 00:13:37.010 see more y-axis and more x-axis. 00:13:37.010 --> 00:13:40.160 Notice how much closer the yellow and red lines even get to one another. 00:13:40.160 --> 00:13:42.080 And honestly, if I kept zooming out so that we 00:13:42.080 --> 00:13:44.300 could see bigger and bigger and bigger problems, 00:13:44.300 --> 00:13:46.343 these, frankly, would look pretty much the same. 00:13:46.343 --> 00:13:49.260 So when a computer scientist describes the efficiency of an algorithm, 00:13:49.260 --> 00:13:53.930 they say it's on the order of n, even if it's technically on the order of n/2. 00:13:53.930 --> 00:13:56.930 And here, too, on the order of [? law, ?] [? again ?] irrespective 00:13:56.930 --> 00:13:57.810 of what the base is. 00:13:57.810 --> 00:13:58.730 So it's kind of nice, right? 00:13:58.730 --> 00:14:01.772 Even though it looks a little mathy, you can still kind of wave your hand 00:14:01.772 --> 00:14:04.107 and approximate just a little bit. 00:14:04.107 --> 00:14:06.440 So there are different algorithms, though, in the world. 00:14:06.440 --> 00:14:10.370 And here's kind of a cheat sheet of common running times. 00:14:10.370 --> 00:14:12.530 A running time is just how much time it takes 00:14:12.530 --> 00:14:15.140 for your program or your algorithm to run, 00:14:15.140 --> 00:14:18.350 how many seconds does it take, how many seconds does it take, 00:14:18.350 --> 00:14:21.680 how many steps does it make, whenever your unit of measure is. 00:14:21.680 --> 00:14:24.470 And we'll see on the list here some familiar terms. 00:14:24.470 --> 00:14:27.890 If I were to label this chart now with a couple of the algorithms we've seen, 00:14:27.890 --> 00:14:30.380 linear search, we'll say, is in big O of n. 00:14:30.380 --> 00:14:34.070 In the worst case, Eric is going to have to look at all of the lockers, 00:14:34.070 --> 00:14:38.870 just like a few weeks ago I had to look at all of the pages in the phone book 00:14:38.870 --> 00:14:40.910 maximally to find Mike Smith. 00:14:40.910 --> 00:14:43.430 And just to be clear, where's binary search going 00:14:43.430 --> 00:14:46.470 to be in this list of running times? 00:14:46.470 --> 00:14:47.432 AUDIENCE: Log n. 00:14:47.432 --> 00:14:48.140 SPEAKER 1: Log n. 00:14:48.140 --> 00:14:49.280 So it's actually better. 00:14:49.280 --> 00:14:52.792 Lower on this chart is better, at least in terms of time required, 00:14:52.792 --> 00:14:53.750 than anything above it. 00:14:53.750 --> 00:14:55.170 So we've seen this thus far. 00:14:55.170 --> 00:14:57.170 And now this sort of invites the question, well, 00:14:57.170 --> 00:14:59.120 what algorithms kind of go here or here? 00:14:59.120 --> 00:15:00.150 Which ones are slower? 00:15:00.150 --> 00:15:01.070 Which ones are faster? 00:15:01.070 --> 00:15:03.395 That'll be one of the things we look at here today. 00:15:03.395 --> 00:15:06.020 But computer scientist have another sort of tool in the toolkit 00:15:06.020 --> 00:15:07.650 that we want to introduce you today. 00:15:07.650 --> 00:15:11.570 And this is just a capital Greek omega, this symbol here. 00:15:11.570 --> 00:15:15.470 And this just refers to not a-- 00:15:15.470 --> 00:15:17.450 it's the opposite of big O, if you will. 00:15:17.450 --> 00:15:21.200 Big O is essentially an upper bound on how much time an algorithm might take. 00:15:21.200 --> 00:15:24.290 It might have taken Eric n steps, 7 lockers, 00:15:24.290 --> 00:15:27.230 to find the number 50 because of linear search. 00:15:27.230 --> 00:15:29.540 That's big O of n, or on the order of n. 00:15:29.540 --> 00:15:32.310 That's an upper bound, worst case in this scenario. 00:15:32.310 --> 00:15:35.550 You can use omega, though, to describe things like best cases. 00:15:35.550 --> 00:15:39.530 So for instance, with Eric's linear search approach in the worst case, 00:15:39.530 --> 00:15:43.670 it could have and it did take him n steps, or 7 specifically. 00:15:43.670 --> 00:15:47.263 But in the best case, how few steps might it have taken him? 00:15:47.263 --> 00:15:47.930 Just one, right? 00:15:47.930 --> 00:15:51.300 He might have gotten lucky, and 50 might have been just there. 00:15:51.300 --> 00:15:53.600 Similarly, when [? Nizari, ?] when she looked for 50 00:15:53.600 --> 00:15:56.330 in the middle, how few steps might she have 00:15:56.330 --> 00:15:59.500 needed to find 50 among her 7 lockers? 00:15:59.500 --> 00:16:00.170 AUDIENCE: One. 00:16:00.170 --> 00:16:00.830 SPEAKER 1: One step, too. 00:16:00.830 --> 00:16:02.747 She might have just gotten lucky because Brian 00:16:02.747 --> 00:16:06.230 might have just, by coincidence or design, put the number 50 there. 00:16:06.230 --> 00:16:09.800 So whereas you have this upper bound on how many steps an algorithm might take, 00:16:09.800 --> 00:16:11.030 sometimes you can get lucky. 00:16:11.030 --> 00:16:13.280 And if the inputs are in a certain order, 00:16:13.280 --> 00:16:15.740 you might get lucky and have a lower bound on the running 00:16:15.740 --> 00:16:17.193 time that's much, much better. 00:16:17.193 --> 00:16:19.110 So we might have a chart that looks like this. 00:16:19.110 --> 00:16:22.130 This is the same function, so to speak, the same math. 00:16:22.130 --> 00:16:25.220 But I'm just using omega now instead of big O. 00:16:25.220 --> 00:16:29.030 And now let's just apply some of these algorithms to the chat here, then. 00:16:29.030 --> 00:16:33.380 Linear search is in omega of what, so to speak, by this definition? 00:16:33.380 --> 00:16:34.422 AUDIENCE: Omega 1. 00:16:34.422 --> 00:16:35.630 SPEAKER 1: Omega of 1, right? 00:16:35.630 --> 00:16:39.705 In the best case, the lower bound on how much time linear search might 00:16:39.705 --> 00:16:41.330 have taken Eric would just be one step. 00:16:41.330 --> 00:16:44.930 So we're going to call linear search omega of 1. 00:16:44.930 --> 00:16:46.770 And meanwhile, when we did binary search, 00:16:46.770 --> 00:16:50.030 secondly, it's not going to be log n in the best case. 00:16:50.030 --> 00:16:54.710 It might be also omega of 1 because we might just get lucky. 00:16:54.710 --> 00:16:57.050 And so now we have kind of useful rules of thumb 00:16:57.050 --> 00:16:59.810 for describing just how good or bad your algorithm or your code 00:16:59.810 --> 00:17:05.849 might be, depending on, at least, the inputs that are fed to that algorithm. 00:17:05.849 --> 00:17:07.730 So that's big O, and that's omega. 00:17:07.730 --> 00:17:13.130 Any questions on these two principles, big O or omega? 00:17:13.130 --> 00:17:13.630 Yeah? 00:17:13.630 --> 00:17:22.407 AUDIENCE: [INAUDIBLE] no matter where it starts [INAUDIBLE]?? 00:17:22.407 --> 00:17:23.740 SPEAKER 1: Really good question. 00:17:23.740 --> 00:17:25.698 And we'll touch on a few such algorithms today. 00:17:25.698 --> 00:17:28.780 But for now the question is, what's an example of an algorithm that 00:17:28.780 --> 00:17:33.490 might be omega of n, such that in the best case, no matter how good or bad 00:17:33.490 --> 00:17:35.167 your input is, it takes n steps? 00:17:35.167 --> 00:17:37.000 Maybe counting the number of lockers, right? 00:17:37.000 --> 00:17:38.110 How do I do that? 00:17:38.110 --> 00:17:42.580 1, 2, 3, 4, 5, 6, 7-- my output is 7. 00:17:42.580 --> 00:17:44.080 How many steps did that take? 00:17:44.080 --> 00:17:47.060 Big O of n because in the worst case, I had to look at all of them, 00:17:47.060 --> 00:17:49.622 but also omega of n because in the best case, 00:17:49.622 --> 00:17:51.080 I still had to look at all of them. 00:17:51.080 --> 00:17:53.410 Otherwise, I couldn't have given you an accurate count. 00:17:53.410 --> 00:17:55.780 So that would be an example of an omega of n algorithm. 00:17:55.780 --> 00:17:57.103 And we'll see others over time. 00:17:57.103 --> 00:17:57.770 Other questions? 00:17:57.770 --> 00:17:58.884 Yeah? 00:17:58.884 --> 00:18:06.516 AUDIENCE: [INAUDIBLE] omega or [INAUDIBLE] better omega value 00:18:06.516 --> 00:18:07.887 or a better O value? 00:18:07.887 --> 00:18:09.220 SPEAKER 1: Really good question. 00:18:09.220 --> 00:18:13.240 Is it better to have a really good omega value or a really good O value? 00:18:13.240 --> 00:18:15.205 The latter, and we'll see this over time. 00:18:15.205 --> 00:18:17.080 Really what computer scientists tend to worry 00:18:17.080 --> 00:18:19.900 about is how their code performs in the worst case, 00:18:19.900 --> 00:18:21.790 or maybe not even that, in the average case. 00:18:21.790 --> 00:18:25.270 Typically, day today, best case is nice to have. 00:18:25.270 --> 00:18:28.765 But who really cares if your code is super fast when the input happens 00:18:28.765 --> 00:18:30.580 to be sorted for you already? 00:18:30.580 --> 00:18:32.600 That would be a corner case, so to speak. 00:18:32.600 --> 00:18:34.960 So it's a useful tool to describe your algorithm. 00:18:34.960 --> 00:18:39.260 But a big O and upper bound is typically what we'll care about a little more. 00:18:39.260 --> 00:18:41.260 So let's go in and make this a little more real. 00:18:41.260 --> 00:18:43.490 Let me go ahead and switch over to CS50 IDE. 00:18:43.490 --> 00:18:47.440 And let me go ahead and create a program here called numbers.c 00:18:47.440 --> 00:18:50.990 that's going to allow us to explore, for instance, linear search. 00:18:50.990 --> 00:18:54.620 So numbers.c is going to start off with our usual lines. 00:18:54.620 --> 00:18:57.580 So I'm going to go ahead and include cs50.h. 00:18:57.580 --> 00:19:01.420 I'm going to go ahead and include standard io.h, int main void, 00:19:01.420 --> 00:19:03.790 so no command line arguments for now. 00:19:03.790 --> 00:19:06.460 And in here, let me go ahead and just declare some numbers, 00:19:06.460 --> 00:19:07.960 maybe six numbers total. 00:19:07.960 --> 00:19:11.080 And if I want to declare an array of six numbers, recall from last week, 00:19:11.080 --> 00:19:12.670 I can literally say this. 00:19:12.670 --> 00:19:14.440 And if I want to initialize those numbers, 00:19:14.440 --> 00:19:17.830 I can do numbers bracket 0 gets, for instance, the number 4. 00:19:17.830 --> 00:19:21.190 Numbers bracket 1 gets the number, say, 8. 00:19:21.190 --> 00:19:25.150 Numbers bracket 2 gets the number 15. 00:19:25.150 --> 00:19:28.000 Numbers-- OK, so this is getting really tedious. 00:19:28.000 --> 00:19:30.460 Turns out in C, there's a shorthand notation 00:19:30.460 --> 00:19:34.270 when you know in advance what values you want to put in an array. 00:19:34.270 --> 00:19:42.070 I can actually go up and do this, 4, 8, 15, 16, 23, 42 with curly braces 00:19:42.070 --> 00:19:43.180 on either side. 00:19:43.180 --> 00:19:46.310 So this is just what's called a statically initialized array. 00:19:46.310 --> 00:19:48.520 You just know in advance what the values are. 00:19:48.520 --> 00:19:51.230 And so I can just save some lines of code that way. 00:19:51.230 --> 00:19:54.130 But it's the same thing as the road I was going down a moment ago. 00:19:54.130 --> 00:19:56.828 But the curly braces are new for that little feature. 00:19:56.828 --> 00:19:58.870 Now I'm going to go ahead and iterate over these. 00:19:58.870 --> 00:20:01.728 So for int, i gets 0, i less than 6. 00:20:01.728 --> 00:20:05.020 And I'm going to cut some corners now so that we focus on the new stuff and not 00:20:05.020 --> 00:20:05.520 on the old. 00:20:05.520 --> 00:20:09.370 I'm hard coding 6 instead of using a constant or something like that. 00:20:09.370 --> 00:20:13.120 But all I want to do ultimately is search for the number 50. 00:20:13.120 --> 00:20:15.520 So what code can I now write inside of this for loop 00:20:15.520 --> 00:20:21.440 to just ask the question, is 50 behind this door? 00:20:21.440 --> 00:20:24.400 Someone want to call it out? 00:20:24.400 --> 00:20:24.900 Yeah? 00:20:24.900 --> 00:20:26.178 If? 00:20:26.178 --> 00:20:29.500 AUDIENCE: Number i [INAUDIBLE]. 00:20:29.500 --> 00:20:33.175 SPEAKER 1: Numbers i equals and not just single equals, but equals equals 50. 00:20:33.175 --> 00:20:34.925 I can go ahead now and return some answer. 00:20:34.925 --> 00:20:37.675 So I'm going to go ahead [? and say ?] printf, for instance, found 00:20:37.675 --> 00:20:38.510 in a new line. 00:20:38.510 --> 00:20:42.010 And then if I want to say that, no 50 was found, 00:20:42.010 --> 00:20:45.460 recall, that I want to do this outside of the loop, just like in my pseudocode 00:20:45.460 --> 00:20:45.970 earlier. 00:20:45.970 --> 00:20:48.230 So not found can go way down there. 00:20:48.230 --> 00:20:51.313 So just to be clear, what algorithm have I implemented here? 00:20:51.313 --> 00:20:52.280 AUDIENCE: [INAUDIBLE]. 00:20:52.280 --> 00:20:52.700 SPEAKER 1: Yeah. 00:20:52.700 --> 00:20:53.742 So this is linear search. 00:20:53.742 --> 00:20:58.910 This is the code incarnation of my pseudocode in Eric's actual execution 00:20:58.910 --> 00:20:59.645 of his algorithm. 00:20:59.645 --> 00:21:01.020 So let me go ahead and save this. 00:21:01.020 --> 00:21:04.220 Let me go ahead and make numbers, no error messages, which is good, 00:21:04.220 --> 00:21:08.840 dot slash numbers and Enter, or I should see what when I hit Enter here? 00:21:08.840 --> 00:21:10.100 AUDIENCE: Not found. 00:21:10.100 --> 00:21:13.640 SPEAKER 1: Hopefully not found because indeed 50 is not among those numbers. 00:21:13.640 --> 00:21:17.170 So that's interesting, but it's mostly warm up from last week. 00:21:17.170 --> 00:21:18.920 Why don't we consider a different problem, 00:21:18.920 --> 00:21:22.090 where now we might want to search not just for numbers, but maybe names. 00:21:22.090 --> 00:21:24.590 Like, if the goal is to search a phone book, let me go ahead 00:21:24.590 --> 00:21:28.950 and create names.c that allows me to search now for names in an array. 00:21:28.950 --> 00:21:31.340 So let me go ahead and include cs50.h. 00:21:31.340 --> 00:21:34.100 Let me go ahead and include standard io.h. 00:21:34.100 --> 00:21:36.620 Let me go ahead and do it int main void. 00:21:36.620 --> 00:21:40.220 And then down here let me go ahead and give myself an array, 00:21:40.220 --> 00:21:42.560 so an array of string called names. 00:21:42.560 --> 00:21:44.870 I'm going to go ahead and give myself four names. 00:21:44.870 --> 00:21:48.170 And just like last time, I can do names bracket 0 gets Emma. 00:21:48.170 --> 00:21:52.190 Or again, to save myself time, I can cut a few corners here 00:21:52.190 --> 00:21:57.740 and say Emma, Rodrigo, Brian, David, just like last week, 00:21:57.740 --> 00:21:59.210 capitalized just because. 00:21:59.210 --> 00:22:04.610 So that's another way of writing the same code as with more lines than that. 00:22:04.610 --> 00:22:06.410 Now I'm going to do int i gets 0. 00:22:06.410 --> 00:22:10.130 i is less than 5, in this case, i plus plus. 00:22:10.130 --> 00:22:12.800 And now things get a little interesting because I 00:22:12.800 --> 00:22:17.370 might want to say if names bracket i equals equals-- let's not search for 50 00:22:17.370 --> 00:22:17.870 now. 00:22:17.870 --> 00:22:20.030 Let's search for Emma, just like last week. 00:22:20.030 --> 00:22:24.530 I want to go ahead and say found if I find Emma, 00:22:24.530 --> 00:22:28.700 else down here I want to say not found. 00:22:28.700 --> 00:22:31.050 The catch is that this will not work. 00:22:31.050 --> 00:22:31.550 Sorry. 00:22:31.550 --> 00:22:32.925 It's a little warm up here today. 00:22:32.925 --> 00:22:36.830 The catch is this will not work, even though I'm pretty much 00:22:36.830 --> 00:22:39.860 doing exactly what I did last time. 00:22:39.860 --> 00:22:43.310 What might the intuition be, especially if you've never studied C before, 00:22:43.310 --> 00:22:47.570 as to why line 10 here won't actually work 00:22:47.570 --> 00:22:51.246 as easily as numbers did a moment ago? 00:22:51.246 --> 00:22:51.746 Yeah? 00:22:51.746 --> 00:22:53.080 AUDIENCE: Difference in data type. 00:22:53.080 --> 00:22:54.370 SPEAKER 1: Difference in data type, and what do 00:22:54.370 --> 00:22:56.462 what are the differences, to be clear? 00:22:56.462 --> 00:22:58.893 AUDIENCE: [INAUDIBLE] 00:22:58.893 --> 00:22:59.560 SPEAKER 1: Yeah. 00:22:59.560 --> 00:23:05.158 AUDIENCE: [INAUDIBLE] of the array [INAUDIBLE].. 00:23:05.158 --> 00:23:05.950 SPEAKER 1: Exactly. 00:23:05.950 --> 00:23:08.760 You can't use equals equals 4 strings because, remember, 00:23:08.760 --> 00:23:12.390 a string is not a data type, like a char, a bool, a float, an int. 00:23:12.390 --> 00:23:15.120 Remember, it's actually an array and an array 00:23:15.120 --> 00:23:17.070 that likely has multiple characters. 00:23:17.070 --> 00:23:19.320 And odds if you want to compare two strings, 00:23:19.320 --> 00:23:23.550 you probably intuitively need to compare all of the characters in those strings, 00:23:23.550 --> 00:23:25.170 not just the whole thing at once. 00:23:25.170 --> 00:23:27.480 In other languages, if you use Python or Java, 00:23:27.480 --> 00:23:30.030 you can actually do this in one line, just like this. 00:23:30.030 --> 00:23:32.430 But in C, everything is much more low level. 00:23:32.430 --> 00:23:35.460 If you want to compare strings, you can't use equal equals. 00:23:35.460 --> 00:23:37.260 However, it turns out there's a function, 00:23:37.260 --> 00:23:39.360 and you might have even used this in p-set 2, 00:23:39.360 --> 00:23:43.200 if you took this approach, where you can actually compare two strings. 00:23:43.200 --> 00:23:45.400 So I'm going to delete this line and instead say, 00:23:45.400 --> 00:23:50.430 str comp, for string comparison, names bracket i being the first string I want 00:23:50.430 --> 00:23:53.850 to compare and then, quote unquote, "Emma" being the second string 00:23:53.850 --> 00:23:54.808 that I want to compare. 00:23:54.808 --> 00:23:57.017 And you would only know this from having been told it 00:23:57.017 --> 00:23:58.410 or reading in the documentation. 00:23:58.410 --> 00:24:05.190 This function str compare returns 0 if two strings are the same. 00:24:05.190 --> 00:24:07.098 It happens to return a positive number if one 00:24:07.098 --> 00:24:09.390 comes after the other alphabetically or negative number 00:24:09.390 --> 00:24:11.265 if one comes before the other alphabetically. 00:24:11.265 --> 00:24:15.240 But for today, we're just using it to test equality of strings, so to speak. 00:24:15.240 --> 00:24:17.350 So let me go ahead and save this. 00:24:17.350 --> 00:24:22.050 Let me go ahead and scroll up here and do make names this time. 00:24:22.050 --> 00:24:25.350 And unfortunately, I can't just use this function, it seems. 00:24:25.350 --> 00:24:27.330 And while it's fine certainly to keep using 00:24:27.330 --> 00:24:30.360 help 50 to understand these messages, any thoughts as to what 00:24:30.360 --> 00:24:32.964 I've done wrong? 00:24:32.964 --> 00:24:33.910 AUDIENCE: [INAUDIBLE] 00:24:33.910 --> 00:24:34.300 SPEAKER 1: Yeah. 00:24:34.300 --> 00:24:36.520 I mean, I can't quite understand all of the words on the screen, 00:24:36.520 --> 00:24:37.600 frankly, at first glance. 00:24:37.600 --> 00:24:40.150 But string.h is something we've seen before. 00:24:40.150 --> 00:24:43.040 And indeed, if you read the documentation or the manual page, 00:24:43.040 --> 00:24:46.660 you'll see that str compare, indeed, comes in string.h 00:24:46.660 --> 00:24:48.700 so I need to put this up here. 00:24:48.700 --> 00:24:53.800 And now if I save my file and recompile my code down here with make names, 00:24:53.800 --> 00:24:54.730 now it compiles. 00:24:54.730 --> 00:24:59.545 And if I do dot slash names, I should see, hmm, interesting, 00:24:59.545 --> 00:25:02.290 a mixed message, literally. 00:25:02.290 --> 00:25:05.245 So is Emma there or not there in my array? 00:25:08.000 --> 00:25:09.590 She's obviously there. 00:25:09.590 --> 00:25:11.300 And yet she's somehow not there. 00:25:11.300 --> 00:25:13.010 So what have I done wrong logically. 00:25:13.010 --> 00:25:13.510 Yeah? 00:25:13.510 --> 00:25:20.200 AUDIENCE: Do you have [INAUDIBLE] if it's found or not. 00:25:20.200 --> 00:25:23.343 So [INAUDIBLE] is not found [INAUDIBLE]. 00:25:23.343 --> 00:25:24.010 SPEAKER 1: Yeah. 00:25:24.010 --> 00:25:27.890 So it's this not found that I'm just blindingly printing 00:25:27.890 --> 00:25:29.510 at the end as a sort of catch all. 00:25:29.510 --> 00:25:34.070 But really, if I execute found or print found up here, 00:25:34.070 --> 00:25:37.780 what should I really be doing maybe right after that? 00:25:37.780 --> 00:25:38.330 Returning. 00:25:38.330 --> 00:25:39.663 And we looked at this last week. 00:25:39.663 --> 00:25:42.950 Recall that if you want to go ahead and return a successful outcome, 00:25:42.950 --> 00:25:45.110 the convention is to return 0. 00:25:45.110 --> 00:25:47.163 And actually down here, if you're unsuccessful, 00:25:47.163 --> 00:25:48.830 what should be perhaps returned instead? 00:25:48.830 --> 00:25:49.398 AUDIENCE: 1. 00:25:49.398 --> 00:25:49.940 SPEAKER 1: 1. 00:25:49.940 --> 00:25:51.980 And again, these are totally arbitrary conventions. 00:25:51.980 --> 00:25:53.630 You just kind of learn them as you go. 00:25:53.630 --> 00:25:54.860 But 0 mean success. 00:25:54.860 --> 00:25:57.020 1 tends to mean failure. 00:25:57.020 --> 00:25:58.260 And that now lines up. 00:25:58.260 --> 00:26:01.440 So now my function main will essentially exit early. 00:26:01.440 --> 00:26:05.030 So if I go ahead and run make names and then do dot slash names, now 00:26:05.030 --> 00:26:07.580 if I'm searching for Emma in that array of four names, 00:26:07.580 --> 00:26:11.480 she's found and only found. 00:26:11.480 --> 00:26:15.620 Any questions, then, on this here? 00:26:15.620 --> 00:26:18.560 All right, well, what if I want to do one further thing 00:26:18.560 --> 00:26:21.740 and combine these two ideas into one final program, namely 00:26:21.740 --> 00:26:22.762 that of a phone book? 00:26:22.762 --> 00:26:24.470 So let me go ahead and close these files. 00:26:24.470 --> 00:26:26.430 Let me go ahead and give myself a new file. 00:26:26.430 --> 00:26:28.430 I'll call it phonebook.c. 00:26:28.430 --> 00:26:31.310 And let's actually integrate all of these building blocks 00:26:31.310 --> 00:26:33.680 as follows, cs50.h again. 00:26:33.680 --> 00:26:36.200 I'm going to go ahead and include standard io.h. 00:26:36.200 --> 00:26:39.390 I'm going to go ahead and include string.h just as before. 00:26:39.390 --> 00:26:41.210 And now I'm going to do int main void. 00:26:41.210 --> 00:26:44.420 And now I want to implement the idea of searching a phone book, 00:26:44.420 --> 00:26:48.540 just like in week 0, but now doing it in C. So let's keep it simple. 00:26:48.540 --> 00:26:52.918 And we'll have just four names in this phone book, so string names 4 equals. 00:26:52.918 --> 00:26:54.710 And I'm going to use my same new trick just 00:26:54.710 --> 00:26:58.520 to save myself some lines of code, Emma, Rodrigo, 00:26:58.520 --> 00:27:03.500 and then, quote unquote, "Brian," quote unquote, "myself." 00:27:03.500 --> 00:27:05.000 But then our numbers. 00:27:05.000 --> 00:27:08.990 So how should we store phone number, would you propose, what data type? 00:27:11.990 --> 00:27:13.022 AUDIENCE: [INAUDIBLE] 00:27:13.022 --> 00:27:13.730 SPEAKER 1: Sorry? 00:27:13.730 --> 00:27:14.750 AUDIENCE: String. 00:27:14.750 --> 00:27:15.140 SPEAKER 1: String? 00:27:15.140 --> 00:27:15.680 Why string? 00:27:15.680 --> 00:27:18.160 I feel like phone numbers are numbers and strings-- 00:27:18.160 --> 00:27:21.864 AUDIENCE: Maybe if you store it as a [INAUDIBLE] or an integer, 00:27:21.864 --> 00:27:25.910 then it's implied that you need to do much [INAUDIBLE].. 00:27:25.910 --> 00:27:30.728 You don't have [INAUDIBLE] the [INAUDIBLE],, 00:27:30.728 --> 00:27:34.200 like, [? add ?] [INAUDIBLE] number a dash or something. 00:27:34.200 --> 00:27:36.468 It would be really hard to manipulate an integer. 00:27:36.468 --> 00:27:37.260 SPEAKER 1: Exactly. 00:27:37.260 --> 00:27:41.040 So to summarize if a phone number has dashes in it or parentheses or maybe 00:27:41.040 --> 00:27:43.345 plus signs abroad, those are characters. 00:27:43.345 --> 00:27:44.220 Those aren't numbers. 00:27:44.220 --> 00:27:45.803 So they won't fit in ints or in longs. 00:27:45.803 --> 00:27:48.762 So even though we call it a phone number, now that you're a programmer, 00:27:48.762 --> 00:27:51.720 it's not really a number so much as a string that looks like a number. 00:27:51.720 --> 00:27:53.820 So string is probably the better bet here. 00:27:53.820 --> 00:27:56.100 And if you consider, too, in certain geographies, 00:27:56.100 --> 00:27:59.340 you sometimes have to dial 0 to dial someone's number if it's local. 00:27:59.340 --> 00:28:01.842 But if it's a 0, it's going to get dropped mathematically 00:28:01.842 --> 00:28:03.300 because leading zeros don't matter. 00:28:03.300 --> 00:28:07.680 So again, modeling things that look like numbers but really aren't as integers 00:28:07.680 --> 00:28:08.870 is probably the wrong call. 00:28:08.870 --> 00:28:11.545 So let's indeed do string numbers. 00:28:11.545 --> 00:28:13.170 And I'll give myself four numbers here. 00:28:13.170 --> 00:28:19.170 And let's do 617 555 how about, 0100. 00:28:19.170 --> 00:28:26.580 We'll do 617 555 just like in the movies, 0101. 00:28:26.580 --> 00:28:27.660 Let me fix that. 00:28:27.660 --> 00:28:34.620 Then we'll do 617 555 [? 0102. ?] And then lastly my number, which shall be-- 00:28:34.620 --> 00:28:39.360 whoops-- which shall be 617 555 0103. 00:28:39.360 --> 00:28:41.550 And I'm doing a same kind of trick, but this 00:28:41.550 --> 00:28:44.910 is giving me now two arrays, one called names, one called numbers. 00:28:44.910 --> 00:28:50.580 Here we go, for int i gets 0, i less than 4 i plus plus, so same quick loop 00:28:50.580 --> 00:28:51.720 as before. 00:28:51.720 --> 00:28:53.400 I'm going to go ahead and compare now. 00:28:53.400 --> 00:28:54.390 I'm searching for Emma. 00:28:54.390 --> 00:28:57.810 And specifically now I'm searching for her number not just her name. 00:28:57.810 --> 00:29:01.350 So I want to print out her number this time not just found or not found. 00:29:01.350 --> 00:29:06.990 So as before, I can say if comparing the two strings at names bracket i 00:29:06.990 --> 00:29:12.480 and, quote unquote, "Emma" equals equals 0, I know that I found Emma. 00:29:12.480 --> 00:29:16.500 And if I want to go ahead and print out Emma's phone number, what should 00:29:16.500 --> 00:29:18.930 I do here? 00:29:18.930 --> 00:29:20.670 It's not names. 00:29:20.670 --> 00:29:22.883 It's not numbers. 00:29:22.883 --> 00:29:24.300 What should go between the quotes? 00:29:24.300 --> 00:29:25.623 AUDIENCE: [INAUDIBLE]. 00:29:25.623 --> 00:29:28.540 SPEAKER 1: Yeah, so [? %s, ?] remember, just our familiar place holder 00:29:28.540 --> 00:29:29.130 for strings. 00:29:29.130 --> 00:29:32.760 And then here not names because I know I'm looking for Emma. 00:29:32.760 --> 00:29:34.500 Here I want to go ahead and put number. 00:29:34.500 --> 00:29:38.520 So it's a separate array, but it's at the same location, bracket 1. 00:29:38.520 --> 00:29:40.060 Let me go ahead and save that. 00:29:40.060 --> 00:29:42.870 And down here, I'm going to go ahead and say printf not found, 00:29:42.870 --> 00:29:45.630 if we don't find Emma, even though we surely will in this case. 00:29:45.630 --> 00:29:47.005 And I'm going to learn my lesson. 00:29:47.005 --> 00:29:51.120 I'm going to return 0 for success and return 1 for failure in this case. 00:29:51.120 --> 00:29:54.450 Let me save the file, scroll my terminal window up a little bit, 00:29:54.450 --> 00:29:59.160 do make phone book Enter, compiles OK dot slash phone book. 00:29:59.160 --> 00:30:02.706 And what should I see when I run the program now? 00:30:02.706 --> 00:30:04.600 AUDIENCE: [INAUDIBLE] 00:30:04.600 --> 00:30:09.610 SPEAKER 1: 617 555 0100, hopefully. 00:30:09.610 --> 00:30:11.860 So this code is correct. 00:30:11.860 --> 00:30:14.500 And this is an opportunity now for us to criticize it, though, 00:30:14.500 --> 00:30:16.150 along a different line. 00:30:16.150 --> 00:30:17.150 This is correct. 00:30:17.150 --> 00:30:20.740 I've got two arrays, both of size 4, one with names, one with numbers, 00:30:20.740 --> 00:30:23.710 code finds Emma, prints her number, returns 0. 00:30:23.710 --> 00:30:26.050 I seem to have done everything correctly. 00:30:26.050 --> 00:30:30.520 But does anything rub you the wrong way perhaps about the design of this code? 00:30:30.520 --> 00:30:33.250 Could we do better? 00:30:33.250 --> 00:30:36.490 Is there's something that's a little arbitrary, a little contrived, 00:30:36.490 --> 00:30:39.680 a little dangerous about this code? 00:30:39.680 --> 00:30:40.690 Any glimpses? 00:30:40.690 --> 00:30:41.734 Yeah, over here? 00:30:41.734 --> 00:30:43.630 AUDIENCE: [INAUDIBLE]. 00:30:43.630 --> 00:30:45.134 SPEAKER 1: Sorry, a little louder. 00:30:45.134 --> 00:30:47.444 AUDIENCE: [INAUDIBLE] [? two ?] [? single digit ?] [? on both sides ?] 00:30:47.444 --> 00:30:48.370 [INAUDIBLE]. 00:30:48.370 --> 00:30:50.980 SPEAKER 1: So we could use a two-dimensional array 00:30:50.980 --> 00:30:52.420 to store data like this. 00:30:52.420 --> 00:30:54.253 I would propose it's not strictly necessary, 00:30:54.253 --> 00:30:56.378 and it might make things a little more complicated, 00:30:56.378 --> 00:30:58.030 but a reasonable alternative as well. 00:30:58.030 --> 00:30:58.974 Other thoughts? 00:30:58.974 --> 00:31:01.580 AUDIENCE: [INAUDIBLE] Emma's number [INAUDIBLE].. 00:31:01.580 --> 00:31:04.600 SPEAKER 1: Yeah, it's assuming that Emma's number is the first one. 00:31:04.600 --> 00:31:06.220 And that seems reasonable, right? 00:31:06.220 --> 00:31:07.300 Emma's name is first. 00:31:07.300 --> 00:31:08.950 So presumably her number's first. 00:31:08.950 --> 00:31:10.270 Rodrigo's name is second. 00:31:10.270 --> 00:31:12.070 So presumably his number is second. 00:31:12.070 --> 00:31:13.750 And that might be true. 00:31:13.750 --> 00:31:17.380 But frankly, that's the concern, this sort of honor system 00:31:17.380 --> 00:31:19.810 that I promised to keep the names in the right order, 00:31:19.810 --> 00:31:23.080 and I promise to keep the numbers in the right order, when really, 00:31:23.080 --> 00:31:26.410 that is just sort of an unspoken agreement between me and myself, 00:31:26.410 --> 00:31:28.810 or if I'm working with colleagues or classmates, 00:31:28.810 --> 00:31:31.273 that we all just agree to keep those things in sync. 00:31:31.273 --> 00:31:32.440 And that's dangerous, right? 00:31:32.440 --> 00:31:35.710 If you had more numbers than four, you could imagine things very quickly 00:31:35.710 --> 00:31:37.240 getting slightly out of order. 00:31:37.240 --> 00:31:39.970 Or god forbid, you sort the names alphabetically, 00:31:39.970 --> 00:31:43.970 how do you go about sorting the numbers as well and keeping things together? 00:31:43.970 --> 00:31:46.780 So this feels like an opportunity for one new feature in C 00:31:46.780 --> 00:31:48.820 and in programming languages more generally, 00:31:48.820 --> 00:31:52.540 whereby we can actually keep these pieces of data, someone's name 00:31:52.540 --> 00:31:54.010 and number, together. 00:31:54.010 --> 00:31:56.140 And today we give ourselves the opportunity 00:31:56.140 --> 00:31:58.450 to introduce our own custom types. 00:31:58.450 --> 00:32:01.930 We've seen ints and bools and floats and longs and strings. 00:32:01.930 --> 00:32:04.390 And string, recall, is a custom CS50 data type. 00:32:04.390 --> 00:32:07.840 And we'll take that one away in a couple of weeks as a training wheel. 00:32:07.840 --> 00:32:11.650 But today let's give ourselves our own data type as follows. 00:32:11.650 --> 00:32:14.030 Typedef is our new keyword today. 00:32:14.030 --> 00:32:16.360 And it literally means define a type. 00:32:16.360 --> 00:32:17.710 It's going to be a structure. 00:32:17.710 --> 00:32:20.740 And so struct in C is an actual keyword, and it 00:32:20.740 --> 00:32:26.500 refers to a container, inside of which you can put multiple other data types. 00:32:26.500 --> 00:32:29.620 Struct is a container for multiple data types. 00:32:29.620 --> 00:32:31.720 What do I want to contain? 00:32:31.720 --> 00:32:34.210 Well, I want to give myself a name for everyone. 00:32:34.210 --> 00:32:36.580 And I want to give myself a number for everyone, 00:32:36.580 --> 00:32:39.880 even though it's a string because phone numbers can have dashes and parentheses 00:32:39.880 --> 00:32:40.900 and so forth. 00:32:40.900 --> 00:32:41.650 And you know what? 00:32:41.650 --> 00:32:45.070 The name I'm going to give to this structure is going to be person. 00:32:45.070 --> 00:32:46.390 It's a simple person. 00:32:46.390 --> 00:32:49.420 But using this syntax, I can teach my compiler, 00:32:49.420 --> 00:32:51.490 [INAUDIBLE] in this case, that not only are there 00:32:51.490 --> 00:32:55.240 ints and floats and chars and bools and so forth and strings, 00:32:55.240 --> 00:32:59.710 there are also person types now in C. They didn't come with the language. 00:32:59.710 --> 00:33:04.360 But I'm inventing them now with typedef struct person, inside of which, 00:33:04.360 --> 00:33:06.700 or encapsulated, so to speak, inside of which 00:33:06.700 --> 00:33:09.970 is going to be two things, name and number. 00:33:09.970 --> 00:33:11.570 So what can I do with this? 00:33:11.570 --> 00:33:15.820 Well, my code gets a little different but better designed, I would argue. 00:33:15.820 --> 00:33:19.150 Down in my code now, I'm going to give myself an array of people. 00:33:19.150 --> 00:33:20.560 There's four of us on the staff. 00:33:20.560 --> 00:33:22.960 And I want to give myself an array of four people. 00:33:22.960 --> 00:33:26.140 So I might do literally the same approach I've always 00:33:26.140 --> 00:33:27.520 done when declaring a data type. 00:33:27.520 --> 00:33:28.645 What data type do you want? 00:33:28.645 --> 00:33:29.470 Person. 00:33:29.470 --> 00:33:31.330 And what should my array be called? 00:33:31.330 --> 00:33:32.860 Well, I could call it persons. 00:33:32.860 --> 00:33:35.320 Or frankly, I could just call it people in English. 00:33:35.320 --> 00:33:37.450 And how many people do I want to represent? 00:33:37.450 --> 00:33:38.470 Four. 00:33:38.470 --> 00:33:40.900 So my array is called people. 00:33:40.900 --> 00:33:42.340 It's a size 4. 00:33:42.340 --> 00:33:46.070 And each element in that array is going to be a person. 00:33:46.070 --> 00:33:48.110 So this syntax is not new. 00:33:48.110 --> 00:33:50.800 This syntax up here is new. 00:33:50.800 --> 00:33:54.220 But as of today now, persons exist in C. Now, 00:33:54.220 --> 00:33:57.550 my syntax here does have to change a little bit, but not all that much. 00:33:57.550 --> 00:34:01.450 Now, if I want to go ahead and fill this array, I can do something like this. 00:34:01.450 --> 00:34:03.700 Emma will be our 0th person. 00:34:03.700 --> 00:34:07.570 But I don't just do something like this because, quote unquote, "Emma" is not 00:34:07.570 --> 00:34:08.409 a person. 00:34:08.409 --> 00:34:10.989 Quote unquote "Emma" is a name. 00:34:10.989 --> 00:34:15.409 And quote unquote "617 555 0100" is a number. 00:34:15.409 --> 00:34:17.590 So I actually need to be a little more specific. 00:34:17.590 --> 00:34:23.320 I need to say that people 0 name is Emma. 00:34:23.320 --> 00:34:27.610 And then people 0 number is whatever Emma's was, 00:34:27.610 --> 00:34:31.460 which was 617 555 0100 semicolon. 00:34:31.460 --> 00:34:35.620 And now I can do the same thing again, so people bracket 1 dot name 00:34:35.620 --> 00:34:37.300 gets Rodrigo. 00:34:37.300 --> 00:34:43.719 People bracket 1 dot number gets 617 555 0101 semicolon. 00:34:43.719 --> 00:34:47.679 People bracket 2 dot name gets Brian. 00:34:47.679 --> 00:34:54.130 And people bracket 2 dot number gets 617 555-- 00:34:54.130 --> 00:34:56.469 555-- 0102. 00:34:56.469 --> 00:34:58.900 And then lastly-- it's getting tedious quickly. 00:34:58.900 --> 00:35:02.380 But in an ideal world, we would just ask the human for these inputs. 00:35:02.380 --> 00:35:03.880 Name will be mine. 00:35:03.880 --> 00:35:07.570 And then lastly, people bracket 3 dot number equals, quote unquote, 00:35:07.570 --> 00:35:10.390 "617 555 0103." 00:35:10.390 --> 00:35:11.750 Whew. 00:35:11.750 --> 00:35:14.750 So it's a little more to write in this case. 00:35:14.750 --> 00:35:16.990 And so it might rub you the wrong way in that sense. 00:35:16.990 --> 00:35:19.990 But notice that we're now kind of encapsulating everything together. 00:35:19.990 --> 00:35:23.920 We only have four values, each of which is a person. 00:35:23.920 --> 00:35:26.950 And each of those persons, inside of them, so to speak, 00:35:26.950 --> 00:35:28.570 have a name and a number. 00:35:28.570 --> 00:35:30.700 And everything is intricately related. 00:35:30.700 --> 00:35:33.440 So even if I sought these things by name, 00:35:33.440 --> 00:35:37.790 they're going to end up having the same associations between numbers and names. 00:35:37.790 --> 00:35:40.720 So now the last thing I have to do is change my logic down here. 00:35:40.720 --> 00:35:45.940 It's not sufficient anymore to compare names bracket i against Emma. 00:35:45.940 --> 00:35:48.650 What should I compare name against Emma? 00:35:48.650 --> 00:35:50.090 AUDIENCE: [INAUDIBLE]. 00:35:50.090 --> 00:35:51.820 SPEAKER 1: Dot name. 00:35:51.820 --> 00:35:55.180 And then down here, numbers doesn't even-- oh, and this was-- 00:35:55.180 --> 00:35:56.680 this is people. 00:35:56.680 --> 00:35:58.000 Numbers doesn't exist either. 00:35:58.000 --> 00:35:58.840 It's people. 00:35:58.840 --> 00:36:00.340 But I want to print her number here. 00:36:00.340 --> 00:36:02.800 So I do dot number. 00:36:02.800 --> 00:36:04.930 So again, we've add a little bit of complexity 00:36:04.930 --> 00:36:07.540 by adding typedef and these dot notations. 00:36:07.540 --> 00:36:11.680 But if I go ahead and make my phone book now, all too many errors. 00:36:11.680 --> 00:36:13.360 Oh, interesting. 00:36:13.360 --> 00:36:18.310 Array index 4 is past the end of the array, which contains four elements. 00:36:18.310 --> 00:36:21.340 So I made a stupid mistake here. 00:36:21.340 --> 00:36:22.185 What did I do? 00:36:22.185 --> 00:36:23.060 AUDIENCE: [INAUDIBLE] 00:36:23.060 --> 00:36:23.727 SPEAKER 1: Yeah. 00:36:23.727 --> 00:36:25.990 So I just kept incrementing incorrectly. 00:36:25.990 --> 00:36:28.630 Let me save that, run make phone book, Enter. 00:36:28.630 --> 00:36:29.650 Now it's good. 00:36:29.650 --> 00:36:35.080 Dot slash phone book, Enter, and hopefully I will see Emma's number. 00:36:35.080 --> 00:36:37.240 So it's no more correct than before. 00:36:37.240 --> 00:36:40.900 But it's arguably better designed/ and we'll come back to this later 00:36:40.900 --> 00:36:41.630 in the semester. 00:36:41.630 --> 00:36:43.338 [? As ?] you choose your choice of tracks 00:36:43.338 --> 00:36:47.230 and start implementing applications for the web or mobile devices or games, 00:36:47.230 --> 00:36:50.020 it's going to be quite common to encapsulate related information 00:36:50.020 --> 00:36:52.330 like this so that you keep lots of information 00:36:52.330 --> 00:36:55.000 together, especially when you use something called a database. 00:36:55.000 --> 00:36:55.738 Yeah? 00:36:55.738 --> 00:36:58.037 AUDIENCE: [INAUDIBLE] 00:36:58.037 --> 00:37:00.620 SPEAKER 1: Is there any shortcut for writing everything I did? 00:37:00.620 --> 00:37:02.760 Yes, you can actually use curly bracket notation. 00:37:02.760 --> 00:37:05.510 It gets a little uglier in this case so I'm not going to bother doing it. 00:37:05.510 --> 00:37:06.927 But, yes, there is a way to do it. 00:37:06.927 --> 00:37:08.968 However, this is, at the end of the day, realize, 00:37:08.968 --> 00:37:11.180 kind of a silly program because I'm writing a program 00:37:11.180 --> 00:37:13.550 to find Emma in a list of names I already wrote. 00:37:13.550 --> 00:37:14.700 So it's not dynamic at all. 00:37:14.700 --> 00:37:18.230 So in an ideal world, we would be using get string or something fancier anyway. 00:37:18.230 --> 00:37:21.180 Other questions on this? 00:37:21.180 --> 00:37:21.860 All right. 00:37:21.860 --> 00:37:24.650 So this is only to say we clearly have the ability, then, 00:37:24.650 --> 00:37:30.140 in code to implement these ideas, like [? Nizari ?] and Eric implemented 00:37:30.140 --> 00:37:33.980 more physically, using something like this array of lockers. 00:37:33.980 --> 00:37:35.640 So where do we go from here? 00:37:35.640 --> 00:37:38.450 Well, unfortunately, [? Nizari ?] benefited from the fact 00:37:38.450 --> 00:37:41.690 that the lockers were, of course, already 00:37:41.690 --> 00:37:43.940 sorted sort of behind her by Brian. 00:37:43.940 --> 00:37:45.638 But there were some price paid, right? 00:37:45.638 --> 00:37:48.680 Indeed, even we had to wait a little bit of time for all of those numbers 00:37:48.680 --> 00:37:50.513 to get sorted in the lockers before we could 00:37:50.513 --> 00:37:52.800 proceed to execute that algorithm. 00:37:52.800 --> 00:37:55.350 So a question, then, reasonable to ask is, well, 00:37:55.350 --> 00:37:57.590 how expensive is it to sort numbers? 00:37:57.590 --> 00:38:00.080 And should you sort numbers and then search? 00:38:00.080 --> 00:38:02.120 Or should you just jump right into searching 00:38:02.120 --> 00:38:05.150 and not worry about sorting the numbers, especially if one 00:38:05.150 --> 00:38:06.650 might be more costly than the other? 00:38:06.650 --> 00:38:08.483 These are going to be ultimately trade-offs. 00:38:08.483 --> 00:38:09.960 So let's consider them as follows. 00:38:09.960 --> 00:38:13.190 If now the problem at hand is to provide as input to our problem 00:38:13.190 --> 00:38:16.160 an unsorted list of numbers, the goal of which 00:38:16.160 --> 00:38:19.010 is to get a sorted list of numbers back out, 00:38:19.010 --> 00:38:20.990 how do we go about implementing this? 00:38:20.990 --> 00:38:26.720 For instance, if the numbers are 7, 2, 1, 6, 3, 4, 50 in that order, 00:38:26.720 --> 00:38:29.390 that unsorted order, the goal at hand is to get out 00:38:29.390 --> 00:38:34.700 1, 2, 3, 4, 6, 7, 50 in sorted order from left to right, 00:38:34.700 --> 00:38:36.570 smallest to largest. 00:38:36.570 --> 00:38:39.710 So how can we go about implementing that idea? 00:38:39.710 --> 00:38:41.110 Well, let me go ahead and see. 00:38:41.110 --> 00:38:42.470 We have a few stress balls left. 00:38:42.470 --> 00:38:44.803 And we could perhaps do this a little dramatically maybe 00:38:44.803 --> 00:38:47.330 with eight volunteers, if you will. 00:38:47.330 --> 00:38:48.350 OK, that's a plan. 00:38:48.350 --> 00:38:58.100 OK, so 1, about 2, 3, if we could, OK, 4 in the middle there, 5, 6, 7, and let's 00:38:58.100 --> 00:38:58.820 see-- 00:38:58.820 --> 00:39:02.250 and let's see, [INAUDIBLE] can come up here. 00:39:02.250 --> 00:39:03.170 Can we do it after? 00:39:03.170 --> 00:39:03.808 OK, thanks. 00:39:03.808 --> 00:39:05.850 And how about-- wait, I saw a hand in the middle. 00:39:05.850 --> 00:39:07.770 How about eight, volunteered by your friends. 00:39:07.770 --> 00:39:08.780 Come on up. 00:39:08.780 --> 00:39:10.042 So come on up, if you would. 00:39:10.042 --> 00:39:11.750 And Brian, if we could go ahead and equip 00:39:11.750 --> 00:39:15.230 our volunteers each with a number. 00:39:15.230 --> 00:39:18.320 We're going to go ahead and see if we can't solve together 00:39:18.320 --> 00:39:25.310 the idea of finding an algorithm for sorting the numbers at hand. 00:39:25.310 --> 00:39:27.770 So in just a moment, each of you will be handed a number. 00:39:27.770 --> 00:39:30.520 In the meantime, let's go ahead and just say a quick introduction, 00:39:30.520 --> 00:39:32.160 who you are, and perhaps your house. 00:39:32.160 --> 00:39:36.080 AUDIENCE: [? Crus, ?] Dudley House, from Germany. 00:39:36.080 --> 00:39:38.410 AUDIENCE: Curtis, just here visiting. 00:39:38.410 --> 00:39:39.696 SPEAKER 1: Wonderful. 00:39:39.696 --> 00:39:43.775 AUDIENCE: Ali, freshman, [INAUDIBLE],, from Turkey. 00:39:43.775 --> 00:39:46.110 AUDIENCE: Farah [? Foho, ?] from Detroit. 00:39:46.910 --> 00:39:47.660 SPEAKER 1: Nice. 00:39:47.660 --> 00:39:52.151 AUDIENCE: Allison, Hollis because I'm first year, from Cleveland. 00:39:52.151 --> 00:39:53.150 AUDIENCE: I'm Claude. 00:39:53.150 --> 00:39:53.720 I'm in Mauer. 00:39:53.720 --> 00:39:55.290 And I'm from Virginia. 00:39:55.290 --> 00:39:57.260 AUDIENCE: I'm [? Rohil. ?] I'm in Wigglesworth. 00:39:57.260 --> 00:39:58.946 And I'm from Atlanta. 00:39:58.946 --> 00:40:01.250 AUDIENCE: I'm [? Yowell. ?] I'm also from Wigglesworth. 00:40:01.250 --> 00:40:02.505 And I'm from New York. 00:40:02.505 --> 00:40:03.390 AUDIENCE: I'm Bonnie. 00:40:03.390 --> 00:40:03.665 I'm in Lowell. 00:40:03.665 --> 00:40:05.457 I'm from Beijing and [? Ann ?] [? Arbor. ?] 00:40:05.457 --> 00:40:06.530 SPEAKER 1: Wonderful. 00:40:06.530 --> 00:40:10.360 And I'm noticing now, as you might be too, we have nine volunteers on stage. 00:40:10.360 --> 00:40:12.110 So we're going to go ahead and solve this. 00:40:12.110 --> 00:40:12.740 That's OK. 00:40:12.740 --> 00:40:13.270 What's your name again? 00:40:13.270 --> 00:40:13.960 AUDIENCE: Bonnie. 00:40:13.960 --> 00:40:14.770 SPEAKER 1: Bonnie, come on over here. 00:40:14.770 --> 00:40:16.937 You're going to be maybe my assistant, if you could, 00:40:16.937 --> 00:40:18.592 as we sought these elements. 00:40:18.592 --> 00:40:20.300 Let's go ahead and give you the mic here. 00:40:20.300 --> 00:40:23.360 Each of you has been handed a number that 00:40:23.360 --> 00:40:27.330 happens to match with this, which is just an unsorted list of numbers. 00:40:27.330 --> 00:40:30.530 And let me just ask that our eight volunteers here sort yourselves. 00:40:30.530 --> 00:40:32.752 Go. 00:40:32.752 --> 00:40:35.353 [INTERPOSING VOICES] 00:40:35.353 --> 00:40:37.520 SPEAKER 1: And I'll have you direct them after this. 00:40:41.200 --> 00:40:41.830 Excellent. 00:40:41.830 --> 00:40:42.910 Very well done. 00:40:42.910 --> 00:40:45.010 [APPLAUSE] 00:40:45.010 --> 00:40:45.727 OK. 00:40:45.727 --> 00:40:47.560 So let me ask any of you, and we'll hand you 00:40:47.560 --> 00:40:50.200 the mic, if need be, what was the algorithm you used to sort yourselves? 00:40:50.200 --> 00:40:51.540 AUDIENCE: Human intuition. 00:40:51.540 --> 00:40:53.370 SPEAKER 1: Human intuition, OK. 00:40:53.370 --> 00:40:55.130 [LAUGHTER] 00:40:55.130 --> 00:40:55.630 Nice. 00:40:55.630 --> 00:40:58.080 [APPLAUSE] 00:40:59.320 --> 00:40:59.820 Nice. 00:40:59.820 --> 00:41:01.220 Other formulations? 00:41:01.220 --> 00:41:01.720 Yeah? 00:41:04.642 --> 00:41:09.160 AUDIENCE: I just checked if the person who's left 00:41:09.160 --> 00:41:13.320 me, who is supposed to be larger than me is larger than me. 00:41:13.320 --> 00:41:17.880 And if he was larger than me, then I stayed there. 00:41:17.880 --> 00:41:20.535 And if I was larger than him, I just switched places with him. 00:41:20.535 --> 00:41:21.660 SPEAKER 1: OK, I like that. 00:41:21.660 --> 00:41:23.400 It's sort [? of a ?] locally optimum approach, 00:41:23.400 --> 00:41:25.942 where you just kind of look to the left and right and sort of 00:41:25.942 --> 00:41:27.630 fix any transpositions or mismatches. 00:41:27.630 --> 00:41:30.172 And in fact, let's go ahead and try and apply that same idea. 00:41:30.172 --> 00:41:32.790 Can all eight of you reorder yourselves, just like that, 00:41:32.790 --> 00:41:34.620 so that you're standing below your number 00:41:34.620 --> 00:41:39.820 so that we're undoing the human intuition that we just executed. 00:41:39.820 --> 00:41:42.450 And now let's go ahead and say, all right, so, Bonnie, 00:41:42.450 --> 00:41:44.560 if you don't mind helping direct us there-- 00:41:44.560 --> 00:41:47.610 direct us here, we clearly have now an unsorted list of numbers. 00:41:47.610 --> 00:41:49.928 Let's just bite off this problem one bit at a time. 00:41:49.928 --> 00:41:51.720 So for instance, you two, your names again? 00:41:51.720 --> 00:41:52.350 AUDIENCE: Tris. 00:41:52.350 --> 00:41:52.740 SPEAKER 1: Tris. 00:41:52.740 --> 00:41:53.280 AUDIENCE: Curtis. 00:41:53.280 --> 00:41:53.940 SPEAKER 1: And Curtis. 00:41:53.940 --> 00:41:55.655 So you guys are clearly out of order. 00:41:55.655 --> 00:41:57.780 So what would be the locally optimal solution here. 00:41:57.780 --> 00:41:58.800 AUDIENCE: They would switch orders. 00:41:58.800 --> 00:42:00.050 SPEAKER 1: OK, please do that. 00:42:00.050 --> 00:42:01.885 All right, now let's consider 6 and 8. 00:42:01.885 --> 00:42:02.910 AUDIENCE: They're fine. 00:42:02.910 --> 00:42:03.892 SPEAKER 1: OK, 8 and 5? 00:42:03.892 --> 00:42:05.100 AUDIENCE: Let's switch again. 00:42:05.100 --> 00:42:06.392 SPEAKER 1: Please switch again. 00:42:06.392 --> 00:42:07.140 8 and 2? 00:42:07.140 --> 00:42:08.190 AUDIENCE: Switch. 00:42:08.190 --> 00:42:09.030 SPEAKER 1: OK. 00:42:09.030 --> 00:42:09.840 8 and 7? 00:42:09.840 --> 00:42:11.063 AUDIENCE: Switch. 00:42:11.063 --> 00:42:11.855 SPEAKER 1: 8 and 4? 00:42:11.855 --> 00:42:13.110 AUDIENCE: Switch. 00:42:13.110 --> 00:42:13.860 SPEAKER 1: 8 and-- 00:42:13.860 --> 00:42:14.490 AUDIENCE: 1. 00:42:14.685 --> 00:42:14.880 SPEAKER 1: --1? 00:42:14.880 --> 00:42:15.480 AUDIENCE: Switch. 00:42:15.480 --> 00:42:16.355 SPEAKER 1: All right. 00:42:16.355 --> 00:42:17.930 So have we solved the problem? 00:42:17.930 --> 00:42:18.605 AUDIENCE: No. 00:42:18.605 --> 00:42:20.730 SPEAKER 1: OK, no, obviously not, but is it better? 00:42:20.730 --> 00:42:23.160 Are we closer to the solution? 00:42:23.160 --> 00:42:27.493 I'd argue we are closer because, right, like 8 somehow made its way all 00:42:27.493 --> 00:42:30.660 the way to the correct destination, even though we still have kind of a mess 00:42:30.660 --> 00:42:31.920 here to fix. 00:42:31.920 --> 00:42:35.580 But notice that the solution got better in this direction and a little better 00:42:35.580 --> 00:42:36.210 this direction. 00:42:36.210 --> 00:42:37.270 But we're going to do this again. 00:42:37.270 --> 00:42:38.715 So Bonnie, can you direct us once more? 00:42:38.715 --> 00:42:39.325 AUDIENCE: Yes. 00:42:39.325 --> 00:42:44.100 So if you would proceed from this order, you two would switch. 00:42:44.100 --> 00:42:45.802 SPEAKER 1: 5 and 6? 00:42:45.802 --> 00:42:47.240 AUDIENCE: Let's switch again. 00:42:47.240 --> 00:42:48.330 SPEAKER 1: 6 and 2? 00:42:48.330 --> 00:42:50.630 AUDIENCE: Remain, and then the next person-- 00:42:50.630 --> 00:42:51.450 SPEAKER 1: 7 and 4? 00:42:51.450 --> 00:42:52.440 AUDIENCE: 7 and 4 switch. 00:42:52.440 --> 00:42:52.620 SPEAKER 1: Nice. 00:42:52.620 --> 00:42:53.120 7 and 1? 00:42:53.120 --> 00:42:54.162 AUDIENCE: 1 and 7 switch. 00:42:54.162 --> 00:42:54.780 And then-- 00:42:54.780 --> 00:42:55.560 SPEAKER 1: So now are we done? 00:42:55.560 --> 00:42:56.233 AUDIENCE: No. 00:42:56.233 --> 00:42:58.650 SPEAKER 1: So no, but look, the problem is getting better. 00:42:58.650 --> 00:43:01.968 It's closer to solution because now we have 8 in place and 7 in place. 00:43:01.968 --> 00:43:04.260 So we've taken a bite out of the problem, if you would. 00:43:04.260 --> 00:43:05.670 Now, we can do this a little more rapid. 00:43:05.670 --> 00:43:08.580 So if you want to tell everyone what to do pairwise, pretty quickly. 00:43:08.580 --> 00:43:09.080 Go. 00:43:09.080 --> 00:43:12.187 AUDIENCE: So everyone, just if you're-- 00:43:12.187 --> 00:43:13.650 [LAUGHTER] 00:43:13.650 --> 00:43:15.408 SPEAKER 1: Human intuition, if you would. 00:43:15.408 --> 00:43:16.450 But let's do it pairwise. 00:43:16.450 --> 00:43:17.210 AUDIENCE: OK. 00:43:17.210 --> 00:43:18.360 Sure. 00:43:18.360 --> 00:43:21.240 Could everyone if the person on your right is smaller than you, 00:43:21.240 --> 00:43:25.453 switch with them and then do that again. 00:43:25.453 --> 00:43:26.120 SPEAKER 1: Good. 00:43:26.120 --> 00:43:27.540 AUDIENCE: Do that again, again. 00:43:27.540 --> 00:43:29.231 SPEAKER 1: Good. 00:43:29.231 --> 00:43:30.010 AUDIENCE: Again. 00:43:32.670 --> 00:43:33.833 And then one last time. 00:43:33.833 --> 00:43:34.500 SPEAKER 1: Yeah. 00:43:34.500 --> 00:43:36.810 So even though we allowed it to get a little organic there at the end, 00:43:36.810 --> 00:43:38.100 now is the list sorted? 00:43:38.100 --> 00:43:39.570 AUDIENCE: Yeah. 00:43:39.570 --> 00:43:40.722 SPEAKER 1: [LAUGHS] Yes. 00:43:40.722 --> 00:43:42.930 So maybe a round of applause for our volunteers here. 00:43:42.930 --> 00:43:44.260 And thank you to Bonnie, especially. 00:43:44.260 --> 00:43:44.760 Thank you. 00:43:44.760 --> 00:43:47.160 [APPLAUSE] 00:43:47.660 --> 00:43:49.950 Brian, here we have a stressful for each of you. 00:43:49.950 --> 00:43:51.090 And thank you so much. 00:43:51.090 --> 00:43:54.160 So let's see if we can't now formalize-- 00:43:54.160 --> 00:43:56.350 feel free to make your way off to either side. 00:43:56.350 --> 00:44:01.080 Let's see if we can't formalize exactly what it is these volunteers wonderfully 00:44:01.080 --> 00:44:03.870 did at Bonnie's direction to get this list sorted. 00:44:03.870 --> 00:44:06.870 It turns out that what everyone did here has a name. 00:44:06.870 --> 00:44:10.230 It's an algorithm known as bubble sort because as you notice, 00:44:10.230 --> 00:44:14.110 the 8 initially kind of bubbled its way up from left to right, 00:44:14.110 --> 00:44:16.837 and then the 7 kind of bubbled its way up from left to right. 00:44:16.837 --> 00:44:19.920 And as they repeated, even though we did it more quickly at the [? end, ?] 00:44:19.920 --> 00:44:23.070 [? the ?] bigger numbers bubble their way all the way up until they were 00:44:23.070 --> 00:44:24.060 in the right place. 00:44:24.060 --> 00:44:26.580 So in pseudocode, I'd argue that what we did was this. 00:44:26.580 --> 00:44:29.460 Bonnie directed our audience, at an increasing speed, 00:44:29.460 --> 00:44:32.340 to repeat the following n minus 1 times. 00:44:32.340 --> 00:44:33.390 Why n minus 1? 00:44:33.390 --> 00:44:36.900 Well, if you've got n people and they're comparing each other, 00:44:36.900 --> 00:44:40.800 you can only compare people n minus 1 times if you have n people. 00:44:40.800 --> 00:44:44.370 So she told them to do this n minus 1 times in total 00:44:44.370 --> 00:44:47.130 for i from 0 to n minus 2. 00:44:47.130 --> 00:44:48.960 Now what's that actually referring to? 00:44:48.960 --> 00:44:50.610 So this i is our index. 00:44:50.610 --> 00:44:52.980 So it's kind of like treating our humans like an array. 00:44:52.980 --> 00:44:54.030 What did we do? 00:44:54.030 --> 00:45:00.130 If the ith person, starting at 0, and the ith plus 1 person are out of order, 00:45:00.130 --> 00:45:02.910 what did she tell them to do? 00:45:02.910 --> 00:45:05.680 Switch places or swap, so to speak. 00:45:05.680 --> 00:45:07.350 And so this looks pretty technical. 00:45:07.350 --> 00:45:11.790 But it's really just a pseudocode way of distilling into more succinct English, 00:45:11.790 --> 00:45:15.860 with some numbers involved, what it is Bonnie was directing everyone to do. 00:45:15.860 --> 00:45:17.610 She said do the following n minus 1 times. 00:45:17.610 --> 00:45:20.610 That's why it went on for several rotations, quicker and quicker. 00:45:20.610 --> 00:45:23.160 She then pretty much treated the first person 00:45:23.160 --> 00:45:27.580 as bracket 0, the next person as bracket 1, bracket 2, just like an array, 00:45:27.580 --> 00:45:28.740 albeit of humans. 00:45:28.740 --> 00:45:31.950 And then she compared them side by side, calling one person 00:45:31.950 --> 00:45:34.320 i and the person next to them i plus 1. 00:45:34.320 --> 00:45:37.815 And if they were out of order, it was swapped and again and again 00:45:37.815 --> 00:45:39.440 and again till this algorithm executed. 00:45:39.440 --> 00:45:43.080 Until finally, the whole thing was hopefully sorted. 00:45:43.080 --> 00:45:46.320 How many times did it-- 00:45:46.320 --> 00:45:47.880 how many steps did it take? 00:45:47.880 --> 00:45:48.910 How long did it take? 00:45:48.910 --> 00:45:52.200 What's the running time in big O notation of bubble sort? 00:45:52.200 --> 00:45:54.810 Well, the outer loop takes n minus 1 steps. 00:45:54.810 --> 00:45:59.820 The inner loop also takes n minus 1 steps because it's 0 through n minus 2. 00:45:59.820 --> 00:46:02.800 And so if we go ahead and multiply that out, ala FOIL, 00:46:02.800 --> 00:46:06.640 we have n squared minus 1n minus 1n plus 1. 00:46:06.640 --> 00:46:10.380 If we combine like terms, we now have n squared minus 2n plus 1. 00:46:10.380 --> 00:46:12.660 But at this point, what matters ultimately 00:46:12.660 --> 00:46:16.800 is that the highest order term, the n squared, is what ultimately dominates. 00:46:16.800 --> 00:46:19.860 The bigger n gets, the more impact that n squared has. 00:46:19.860 --> 00:46:22.410 And so a computer scientist would say that bubble sort 00:46:22.410 --> 00:46:25.720 is on the order of n squared. 00:46:25.720 --> 00:46:29.970 So if we add to our list from before the algorithm's upper bounds, 00:46:29.970 --> 00:46:33.750 we can now put bubble sort way up at the top, unfortunately, which 00:46:33.750 --> 00:46:36.240 is to say that sorting numbers with bubble sort 00:46:36.240 --> 00:46:41.085 is apparently way more expensive than linearly searching or binary searching. 00:46:41.085 --> 00:46:42.960 And so it kind of invites the question, then, 00:46:42.960 --> 00:46:45.168 with Eric and [? Nizari ?] when they came up earlier. 00:46:45.168 --> 00:46:47.550 Yes, [? Nizari's ?] algorithm was better. 00:46:47.550 --> 00:46:50.520 But it was better in the sense that it ran faster. 00:46:50.520 --> 00:46:53.650 But it presupposed what, just to be clear? 00:46:53.650 --> 00:46:54.583 AUDIENCE: [INAUDIBLE] 00:46:54.583 --> 00:46:56.250 SPEAKER 1: That the numbers were sorted. 00:46:56.250 --> 00:46:59.040 And so it's a little misleading to say that binary search is 00:46:59.040 --> 00:47:00.570 better than linear search. 00:47:00.570 --> 00:47:02.490 Because if it costs you a huge amount of time 00:47:02.490 --> 00:47:04.573 to sort those elements so that, then, [? Nizari ?] 00:47:04.573 --> 00:47:07.530 can go ahead and execute binary search, it might be a wash, 00:47:07.530 --> 00:47:09.390 or it might even be a net negative. 00:47:09.390 --> 00:47:11.400 So it's really going to depend on, well, are you 00:47:11.400 --> 00:47:13.698 searching more often, more than once? 00:47:13.698 --> 00:47:15.990 Are you searching lots and lots and lots of times, such 00:47:15.990 --> 00:47:18.210 that it's worth it to sort it once and then 00:47:18.210 --> 00:47:20.670 benefit long term by much faster code? 00:47:20.670 --> 00:47:23.470 Well, what about omega for bubble sort? 00:47:23.470 --> 00:47:25.930 Bubble sort's code, again, looked like this. 00:47:25.930 --> 00:47:30.040 And frankly, it doesn't really take into account at all good inputs, right? 00:47:30.040 --> 00:47:33.150 Like, the best possible input to any sorting algorithm most likely 00:47:33.150 --> 00:47:34.817 is it's already sorted for you, right? 00:47:34.817 --> 00:47:36.900 Because if it's already sorted, presumably there's 00:47:36.900 --> 00:47:38.190 no actual work to be done. 00:47:38.190 --> 00:47:40.020 How lucky would that be? 00:47:40.020 --> 00:47:42.660 But bubble sort, as defined, is kind of stupid, right? 00:47:42.660 --> 00:47:45.360 It doesn't say if already sorted, quit. 00:47:45.360 --> 00:47:48.450 It just blindly does the following n minus 1 times 00:47:48.450 --> 00:47:51.480 and then inside of that does something n minus 2 times. 00:47:51.480 --> 00:47:54.360 So what's the lower bound on the running time of bubble sort, 00:47:54.360 --> 00:47:58.347 even if you get lucky and the whole thing is already sorted for you? 00:47:58.347 --> 00:47:59.430 AUDIENCE: [? n squared. ?] 00:47:59.430 --> 00:48:02.760 SPEAKER 1: It's still in squared because it's still going to take as many steps 00:48:02.760 --> 00:48:03.310 as before. 00:48:03.310 --> 00:48:07.963 And so bubble short as a lower bound, arguably, has omega of n squared. 00:48:07.963 --> 00:48:10.380 And let's see, Brian, if you wouldn't mind lending a hand, 00:48:10.380 --> 00:48:13.740 let's see if we can't do better than by taking maybe a fundamentally 00:48:13.740 --> 00:48:16.680 different approach to sorting, as by laying out 00:48:16.680 --> 00:48:18.750 something called selection sort. 00:48:18.750 --> 00:48:21.560 So in selection sort, we have a similar set of numbers, 00:48:21.560 --> 00:48:23.940 but we won't bother using something as large as 50. 00:48:23.940 --> 00:48:26.200 Brian's going to kindly set them up in a random order, 00:48:26.200 --> 00:48:28.200 but we happen to have a cheat sheet on the board 00:48:28.200 --> 00:48:30.240 so that we can try this again if we need to. 00:48:30.240 --> 00:48:35.400 And these numbers right now are unsorted from left to right. 00:48:35.400 --> 00:48:39.620 And we have 1 2, 3, 4, 5, 6, 7, 8 numbers in total here. 00:48:39.620 --> 00:48:42.120 So bubble sort was nice because it leveraged your intuition, 00:48:42.120 --> 00:48:43.770 where it will just look to the left, look to the right 00:48:43.770 --> 00:48:45.220 and fix those small problems. 00:48:45.220 --> 00:48:47.970 But honestly, a fundamentally different way to think about sorting 00:48:47.970 --> 00:48:51.570 would be, well, if I know I want small to large, left to right, 00:48:51.570 --> 00:48:53.610 why don't I just do that? 00:48:53.610 --> 00:48:54.930 What is the smallest number? 00:48:54.930 --> 00:48:58.470 Well, recall that these things, if they're implemented in an array, 00:48:58.470 --> 00:48:59.760 might as well be in lockers. 00:48:59.760 --> 00:49:01.960 I can't just use a human intuition in this case. 00:49:01.960 --> 00:49:03.930 I have to look at each element individually. 00:49:03.930 --> 00:49:05.610 But I'm not going to bother throwing them back in the locker 00:49:05.610 --> 00:49:07.890 because that's just going to take unnecessary time. 00:49:07.890 --> 00:49:08.880 But I look at 6. 00:49:08.880 --> 00:49:11.453 6 is the smallest number I have seen thus far. 00:49:11.453 --> 00:49:13.870 So at the moment, this is the smallest number in the list. 00:49:13.870 --> 00:49:16.620 So I'm going to remember that with a variable in my mind. 00:49:16.620 --> 00:49:17.400 Now I see 3. 00:49:17.400 --> 00:49:20.460 3 is obviously less than 6, so I'm going to forget about 6 00:49:20.460 --> 00:49:23.940 and just remember for now that 3 is the smallest element I've seen. 00:49:23.940 --> 00:49:25.170 8 is no smaller. 00:49:25.170 --> 00:49:26.250 5 is no smaller. 00:49:26.250 --> 00:49:27.493 Ooh, 2 is smaller. 00:49:27.493 --> 00:49:29.160 I'm going to remember 2 is the smallest. 00:49:29.160 --> 00:49:31.050 I'm going to forget about the 3. 00:49:31.050 --> 00:49:34.680 Meanwhile I keep going, 7, 4-- ooh, 1 is even smaller. 00:49:34.680 --> 00:49:36.430 And so I've gotten to the end of the list. 00:49:36.430 --> 00:49:38.610 The smallest element in this list is 1. 00:49:38.610 --> 00:49:40.388 It obviously belongs over there. 00:49:40.388 --> 00:49:41.430 So what can I do with it? 00:49:41.430 --> 00:49:42.843 AUDIENCE: [INAUDIBLE]. 00:49:42.843 --> 00:49:44.760 SPEAKER 1: Yeah, ideally I could just move it. 00:49:44.760 --> 00:49:46.302 Now, maybe I should make room, right? 00:49:46.302 --> 00:49:48.873 The table's a little small, or my array is a fixed size. 00:49:48.873 --> 00:49:51.040 So I could start scooching everything over this way. 00:49:51.040 --> 00:49:51.420 But you know what? 00:49:51.420 --> 00:49:53.460 Frankly, that's going to take a while, right? [? I ?] 00:49:53.460 --> 00:49:54.990 have to move, like, seven elements. 00:49:54.990 --> 00:49:59.370 Why don't I just kind of forcefully evict the 6, 00:49:59.370 --> 00:50:03.050 put it over here, because after all, it was in random order in the first place. 00:50:03.050 --> 00:50:05.850 Who cares if I move it someplace else even more random? 00:50:05.850 --> 00:50:06.930 I'll deal with it later. 00:50:06.930 --> 00:50:08.263 So you could do either approach. 00:50:08.263 --> 00:50:09.510 You could shift everything. 00:50:09.510 --> 00:50:11.490 But that feels like it'll take some time. 00:50:11.490 --> 00:50:14.440 Or you can just evict whatever is in the place you want to be. 00:50:14.440 --> 00:50:18.330 But what's nice now is that my list is closer to sorted. 00:50:18.330 --> 00:50:21.010 The 1 is in its correct place. 00:50:21.010 --> 00:50:24.535 So now all I have to look at is n minus one other element. 00:50:24.535 --> 00:50:25.410 So let's take a look. 00:50:25.410 --> 00:50:26.785 What's the next smallest element? 00:50:26.785 --> 00:50:30.070 At the moment, it's 3, still 3, still 3. 00:50:30.070 --> 00:50:31.950 Oh, wait a minute, it looks like 2. 00:50:31.950 --> 00:50:34.680 Now, you might want to just abort now and rip out the 2. 00:50:34.680 --> 00:50:36.993 But you don't know necessarily, as the computer, 00:50:36.993 --> 00:50:38.910 if you're only looking at one value at a time, 00:50:38.910 --> 00:50:41.340 unless you have multiple variables in your mind, which 00:50:41.340 --> 00:50:42.573 I'm not going to bother with. 00:50:42.573 --> 00:50:44.490 Let me see if there's anything smaller than 2. 00:50:44.490 --> 00:50:46.900 7, 4, 6-- no. 00:50:46.900 --> 00:50:48.570 So I'm going to grab the 2. 00:50:48.570 --> 00:50:51.312 And where do I want to put it? 00:50:51.312 --> 00:50:52.020 Right over there. 00:50:52.020 --> 00:50:52.832 And you know what? 00:50:52.832 --> 00:50:54.040 This could be a net negative. 00:50:54.040 --> 00:50:55.650 But I think it's going to average out. 00:50:55.650 --> 00:50:59.160 I'm going to move the 3 to where I do have room and go ahead and claim 00:50:59.160 --> 00:51:01.385 that my 2 is now sorted. 00:51:01.385 --> 00:51:03.510 And I'm going to do this again and again and again. 00:51:03.510 --> 00:51:05.760 And just like Bonnie did, I'm going to do it a little faster now, 00:51:05.760 --> 00:51:06.677 walk through the list. 00:51:06.677 --> 00:51:08.140 OK, 3 is the smallest. 00:51:08.140 --> 00:51:11.277 I'm going to go ahead and put it in sorted order by evicting the 8. 00:51:11.277 --> 00:51:12.360 Now I'm going to go ahead. 00:51:12.360 --> 00:51:14.130 All right, 5 8, 7-- 00:51:14.130 --> 00:51:15.690 4 is now the smallest. 00:51:15.690 --> 00:51:18.000 I'm going to go ahead and evict the 5, move it over 00:51:18.000 --> 00:51:19.980 here, and claim that that's sorted. 00:51:19.980 --> 00:51:23.250 Let me do it once more, 8, 7, 5, 6. 00:51:23.250 --> 00:51:24.900 5 is clearly the smallest. 00:51:24.900 --> 00:51:28.310 Let me go ahead and evict the 8 again, make room for the 5. 00:51:28.310 --> 00:51:31.670 But I only have three steps left, 7, 8, 6. 00:51:31.670 --> 00:51:36.420 Let me go ahead and move the 7 over here, put the sixth in place. 00:51:36.420 --> 00:51:37.170 8 is the smallest. 00:51:37.170 --> 00:51:38.280 No, 7 is smaller. 00:51:38.280 --> 00:51:40.920 Let me go ahead and put it in place, evicting the 8. 00:51:40.920 --> 00:51:46.360 Voila, hopefully now, oof, done but a fundamentally different algorithm, 00:51:46.360 --> 00:51:46.860 right? 00:51:46.860 --> 00:51:50.220 There was no pairwise swapping back and forth and back and forth. 00:51:50.220 --> 00:51:53.670 Each time I sort of set my mind on a goal, get the next smallest element, 00:51:53.670 --> 00:51:55.000 get the next smallest element. 00:51:55.000 --> 00:51:59.460 And that is what we shall call selection sort, where on each iteration 00:51:59.460 --> 00:52:01.510 you select the next smallest element. 00:52:01.510 --> 00:52:05.880 So in pseudocode we might say this, for i from 0 to n minus 1. 00:52:05.880 --> 00:52:07.580 And again, just adopt this habit now. 00:52:07.580 --> 00:52:11.650 Any time in the life, and certainly a CS class, when you have n items, 00:52:11.650 --> 00:52:15.300 the first one is ironically 1 but in this case, 0. 00:52:15.300 --> 00:52:17.520 And the last one is n minus 1. 00:52:17.520 --> 00:52:19.920 0 to n minus 1 is how a computer scientist 00:52:19.920 --> 00:52:22.780 counts from 1 to n in the real world. 00:52:22.780 --> 00:52:26.880 So this just says do the following n times but use i. 00:52:26.880 --> 00:52:28.530 Start counting from 0. 00:52:28.530 --> 00:52:32.280 Find the smallest item between the ith item and the last item. 00:52:32.280 --> 00:52:33.510 What am I saying there? 00:52:33.510 --> 00:52:36.030 Well, if I initialize i initially to 0, that's 00:52:36.030 --> 00:52:38.820 just saying find the smallest element among all eight 00:52:38.820 --> 00:52:42.810 and grab it, swap the smallest item with that ith item. 00:52:42.810 --> 00:52:45.030 So wherever I found the smallest element, 00:52:45.030 --> 00:52:47.160 go ahead and swap it with that one. 00:52:47.160 --> 00:52:48.870 And then this algorithm-- whoops-- 00:52:48.870 --> 00:52:51.570 is just going to repeat again and again and again. 00:52:51.570 --> 00:52:56.440 It's almost a little more succinct to represent in pseudocode. 00:52:56.440 --> 00:52:59.260 But it invites the question, then, is this better? 00:52:59.260 --> 00:53:00.330 Is selection sort better? 00:53:00.330 --> 00:53:01.950 Well, what would it mean for an algorithm to be better? 00:53:01.950 --> 00:53:04.128 We have two rules of thumb, big O and omega. 00:53:04.128 --> 00:53:04.920 So let's try those. 00:53:04.920 --> 00:53:09.870 So in big O notation, how many steps does it take to sort a list of numbers 00:53:09.870 --> 00:53:12.870 like I did, where you just again and again and again select 00:53:12.870 --> 00:53:16.210 the smallest, the smallest, the smallest element? 00:53:16.210 --> 00:53:18.210 Well, how do you even begin to think about that? 00:53:18.210 --> 00:53:18.710 Yeah? 00:53:18.710 --> 00:53:23.691 AUDIENCE: [INAUDIBLE] n squared because you have at iteration n 00:53:23.691 --> 00:53:28.113 [? an ?] n minus 1 [INAUDIBLE]. 00:53:28.113 --> 00:53:28.780 SPEAKER 1: Yeah. 00:53:28.780 --> 00:53:29.730 That's the right intuition. 00:53:29.730 --> 00:53:31.820 And let me back up just one step until we get to that. 00:53:31.820 --> 00:53:33.168 The proposal was its n squared. 00:53:33.168 --> 00:53:34.960 And indeed, that's going to be the spoiler. 00:53:34.960 --> 00:53:35.770 But why? 00:53:35.770 --> 00:53:37.570 Well, if you actually started to count up 00:53:37.570 --> 00:53:41.110 how many steps I was taking physically, right, to find the smallest element, 00:53:41.110 --> 00:53:44.498 it's going to take me maybe seven steps to find the smallest 00:53:44.498 --> 00:53:46.540 element because I'm going to look at all of them. 00:53:46.540 --> 00:53:49.450 So in my first pass, I'm looking at all eight elements, 00:53:49.450 --> 00:53:53.620 or taking almost n steps to find the smallest number, like 1. 00:53:53.620 --> 00:53:55.300 But after that, the 1 was in place. 00:53:55.300 --> 00:53:58.820 And I turned on its light bulbs, and that left seven numbers left. 00:53:58.820 --> 00:54:00.490 And how many steps did I then take? 00:54:00.490 --> 00:54:01.570 Well, n minus 1. 00:54:01.570 --> 00:54:04.300 Then after the 2 was in place, how many steps? 00:54:04.300 --> 00:54:07.090 n minus 2 and then n minus 3, n minus 4, dot, 00:54:07.090 --> 00:54:09.490 dot, dot, until there was just one number left. 00:54:09.490 --> 00:54:13.873 So that invites the question, what, then, does this total up to? 00:54:13.873 --> 00:54:15.790 And indeed, you jumped to the right intuition. 00:54:15.790 --> 00:54:18.730 If you start with n, and you add to the n minus 1 steps, 00:54:18.730 --> 00:54:21.160 and you add to that n minus 2 steps, dot, dot, dot, 00:54:21.160 --> 00:54:24.303 one final step once you get to the end of the list, 00:54:24.303 --> 00:54:25.720 what does this actually sum up to? 00:54:25.720 --> 00:54:26.710 It's actually not obvious. 00:54:26.710 --> 00:54:29.050 And this is one of those things in life, unless you're a math major, 00:54:29.050 --> 00:54:31.030 you probably would look at the back of a math textbook 00:54:31.030 --> 00:54:32.860 or a physics textbook for those little cheat sheets 00:54:32.860 --> 00:54:34.630 that they used to come with, at least in high school. 00:54:34.630 --> 00:54:37.870 Allow me just to propose for today's sake, if you actually do out this math 00:54:37.870 --> 00:54:40.840 or look it up at the back of a book, it ends up being this, 00:54:40.840 --> 00:54:43.877 n times n plus 1 divided by 2. 00:54:43.877 --> 00:54:45.460 And you can prove this mathematically. 00:54:45.460 --> 00:54:48.820 But for our purposes, just trust me, if you will, 00:54:48.820 --> 00:54:52.730 that adding a number plus the smaller number plus the smaller number plus 00:54:52.730 --> 00:54:56.230 the smaller number all the way to 1, gives you this relationship, n times n 00:54:56.230 --> 00:54:57.550 plus 1 divided by 2. 00:54:57.550 --> 00:54:59.745 And it's fine if you just take that as fact. 00:54:59.745 --> 00:55:01.120 So let me just multiply this out. 00:55:01.120 --> 00:55:03.340 That's n squared plus n/2. 00:55:03.340 --> 00:55:05.890 That, of course, is n squared divided by 2 plus n/2. 00:55:05.890 --> 00:55:08.140 But, again, who cares? 00:55:08.140 --> 00:55:10.840 Big O notation would propose that we focus only on what? 00:55:10.840 --> 00:55:11.765 AUDIENCE: n squared. 00:55:11.765 --> 00:55:12.640 SPEAKER 1: n squared. 00:55:12.640 --> 00:55:14.980 This frankly, is on the order of n squared, 00:55:14.980 --> 00:55:17.650 exactly as you said, because as n gets large, 00:55:17.650 --> 00:55:20.710 the only factor in that mathematical expression that we're really 00:55:20.710 --> 00:55:23.890 going to care about is the one that gets bigger and bigger and bigger 00:55:23.890 --> 00:55:26.030 faster than everything else. 00:55:26.030 --> 00:55:28.870 So in terms of selection sort, it would seem that we 00:55:28.870 --> 00:55:30.907 have big O of n squared for it as well. 00:55:30.907 --> 00:55:32.740 So it's a fundamentally different algorithm, 00:55:32.740 --> 00:55:34.720 but mathematically and in the real world, 00:55:34.720 --> 00:55:36.410 it kind of works out to be the same. 00:55:36.410 --> 00:55:38.080 So we haven't really done better yet. 00:55:38.080 --> 00:55:40.330 What about omega for selection sort? 00:55:40.330 --> 00:55:43.690 If the code for selection sort is this, does it 00:55:43.690 --> 00:55:48.170 benefit from the list being sorted already? 00:55:48.170 --> 00:55:51.920 Or is it just going to blindly do its order of n squared work 00:55:51.920 --> 00:55:54.470 again and again anyway, right? 00:55:54.470 --> 00:55:56.720 Like, this is opportune that the numbers are currently 00:55:56.720 --> 00:55:59.540 sorted because we can make a point, well, this is the best case scenario. 00:55:59.540 --> 00:56:01.020 I hand you the numbers 1 through 8. 00:56:01.020 --> 00:56:01.800 They're already sorted. 00:56:01.800 --> 00:56:03.467 And you try to use selection sort on it. 00:56:03.467 --> 00:56:05.698 Well, you might think, ooh, it's in the right place. 00:56:05.698 --> 00:56:07.490 I'm just going to grab the smallest number. 00:56:07.490 --> 00:56:10.377 Now I'm going to grab the next smallest number and so forth. 00:56:10.377 --> 00:56:11.210 But that's not true. 00:56:11.210 --> 00:56:14.780 When I'm the computer, and I open the first locker, and I see the number 1, 00:56:14.780 --> 00:56:16.810 do I know anything more about my numbers yet? 00:56:16.810 --> 00:56:17.660 AUDIENCE: No. 00:56:17.660 --> 00:56:18.140 SPEAKER 1: No, right? 00:56:18.140 --> 00:56:21.223 You're using human intuition to see that, OK, obviously it's the smallest. 00:56:21.223 --> 00:56:25.970 I, the program, do not know that until I look at the other numbers in the list. 00:56:25.970 --> 00:56:29.360 And so again, if you just iterate through using selection sort, 00:56:29.360 --> 00:56:31.460 you only know what's in front of you, which 00:56:31.460 --> 00:56:35.390 means you're going to execute the exact same code again and again. 00:56:35.390 --> 00:56:36.890 And that means the math is the same. 00:56:36.890 --> 00:56:39.680 Even in this best case, we are truly wasting our time 00:56:39.680 --> 00:56:45.020 now with selections sort because it is going to be omega of n squared, too. 00:56:45.020 --> 00:56:48.740 So my god, now we have two bad solutions to a problem. 00:56:48.740 --> 00:56:49.950 Can we do better? 00:56:49.950 --> 00:56:52.120 Well, let me propose we revisit bubble sort. 00:56:52.120 --> 00:56:56.390 Bubble sort, again, just has you swap adjacent elements again and again 00:56:56.390 --> 00:56:59.720 and again and again until you're all sorted. 00:56:59.720 --> 00:57:02.960 But when might you want to stop going back and forth the list? 00:57:02.960 --> 00:57:08.550 Like, when might Bonnie have wanted to say, ooh, that's enough work, I'm done? 00:57:08.550 --> 00:57:11.550 If she walks through the list looking at every person, i and i 00:57:11.550 --> 00:57:13.950 plus 1 next to each other, when might she 00:57:13.950 --> 00:57:16.200 conclude that she's done doing that work of sorting? 00:57:19.984 --> 00:57:20.484 Yeah? 00:57:20.484 --> 00:57:22.400 AUDIENCE: [INAUDIBLE] 00:57:22.400 --> 00:57:24.530 SPEAKER 1: Yeah, if there was a question she asked, 00:57:24.530 --> 00:57:27.980 or if there was a pass she made walking through the volunteers 00:57:27.980 --> 00:57:29.360 and didn't have to do any work. 00:57:29.360 --> 00:57:31.485 She doesn't have to keep doing work again and again 00:57:31.485 --> 00:57:34.280 just because the algorithm said to repeat it n minus 1 times. 00:57:34.280 --> 00:57:37.820 We kind of want to have a condition in here, or some way of short circuiting 00:57:37.820 --> 00:57:41.570 the algorithm, so that we stop once we're really just wasting our time. 00:57:41.570 --> 00:57:43.820 And bubble sort lends itself to that because we 00:57:43.820 --> 00:57:48.485 can tweak our wording of our pseudocode as follows, repeat until no swaps. 00:57:48.485 --> 00:57:51.110 So again, it's opportune that these numbers are already sorted. 00:57:51.110 --> 00:57:52.370 Let's try bubble sort on it. 00:57:52.370 --> 00:57:54.998 So Bonnie probably would have said, compare 1 and 2. 00:57:54.998 --> 00:57:56.040 They're not out of order. 00:57:56.040 --> 00:57:57.260 So we don't have to swap. 00:57:57.260 --> 00:58:02.570 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 00:58:02.570 --> 00:58:04.580 she obviously did no swaps. 00:58:04.580 --> 00:58:09.290 It would be stupid for her to go again just because the algorithm said 00:58:09.290 --> 00:58:13.460 do this n minus 1 times because she's going to get no, no, no, no, 00:58:13.460 --> 00:58:15.080 again and again as her answer. 00:58:15.080 --> 00:58:19.520 So by saying repeat until no swaps, she can abort this algorithm early and then 00:58:19.520 --> 00:58:23.327 have taken how many steps in this best case? 00:58:23.327 --> 00:58:24.202 AUDIENCE: [INAUDIBLE] 00:58:24.202 --> 00:58:26.119 SPEAKER 1: Yeah, technically n minus 1, right? 00:58:26.119 --> 00:58:31.750 Because if this is n elements, or 8, you can compare seven pairs, 1, 2, 3, 4, 5, 00:58:31.750 --> 00:58:35.000 6, 7, so n minus 1. 00:58:35.000 --> 00:58:38.480 So she could, in the best case, then, have a lower bound 00:58:38.480 --> 00:58:41.030 on running time for selection sort-- 00:58:41.030 --> 00:58:45.620 of bubble sort no longer of n squared, but now n. 00:58:45.620 --> 00:58:47.630 So it would seem that with a bit more cleverness 00:58:47.630 --> 00:58:52.635 we can actually benefit in terms of the running time of these algorithms. 00:58:52.635 --> 00:58:55.760 Well, let's see if we can't see these from a slightly different perspective 00:58:55.760 --> 00:58:58.290 now by doing this visualization. 00:58:58.290 --> 00:59:02.990 I'm going to go ahead and open up a graphical visualization of each 00:59:02.990 --> 00:59:04.890 of these algorithms in turn. 00:59:04.890 --> 00:59:09.890 So what you have here is an array of numbers, each of which 00:59:09.890 --> 00:59:11.690 is represented by a vertical bar. 00:59:11.690 --> 00:59:14.270 Short bar is small number, like 0, 1, 2. 00:59:14.270 --> 00:59:18.500 Tall bar is big number, like 99 or 100 or anything in between. 00:59:18.500 --> 00:59:20.330 This is a visualization tool online. 00:59:20.330 --> 00:59:21.740 And we'll link this on the course's website 00:59:21.740 --> 00:59:23.240 so that we can try these algorithms. 00:59:23.240 --> 00:59:25.090 So let's try bubble sort, for instance. 00:59:25.090 --> 00:59:26.660 I'm going to start it kind of slow. 00:59:26.660 --> 00:59:30.860 But you can see highlighted in pink two elements being compared side 00:59:30.860 --> 00:59:34.220 by side, i and i plus 1 being swapped if they're out of order. 00:59:34.220 --> 00:59:37.460 So this is the graphical version of what Bonnie's instructions were 00:59:37.460 --> 00:59:38.810 to our volunteers. 00:59:38.810 --> 00:59:40.800 And now notice, bubble sort gets its name 00:59:40.800 --> 00:59:43.550 because notice what's happening to apparently the biggest element. 00:59:43.550 --> 00:59:46.880 It's sort of bubbling its way up all the way to the end. 00:59:46.880 --> 00:59:48.710 Smaller elements are making progress. 00:59:48.710 --> 00:59:51.410 Like a 15 and a 12 just moved a little bit to the left. 00:59:51.410 --> 00:59:52.340 But they're not done. 00:59:52.340 --> 00:59:54.020 They're not in their right places yet. 00:59:54.020 --> 00:59:58.100 But the big elements are starting to bubble all the way to the right. 00:59:58.100 --> 01:00:00.600 Now, this gets a little tedious pretty quickly. 01:00:00.600 --> 01:00:03.390 So I'm going to go ahead and speed up the animation speed. 01:00:03.390 --> 01:00:04.650 And if we watch it now-- 01:00:04.650 --> 01:00:06.650 same algorithm, it's just running faster-- 01:00:06.650 --> 01:00:08.690 you can really see that the larger elements are 01:00:08.690 --> 01:00:10.920 accumulating at the right-hand side. 01:00:10.920 --> 01:00:12.830 So this is identical to our eight volunteers. 01:00:12.830 --> 01:00:15.260 It's just each human now is represented by a bar. 01:00:15.260 --> 01:00:19.070 And you can really see the larger numbers bubbling their way up 01:00:19.070 --> 01:00:20.210 to the top. 01:00:20.210 --> 01:00:23.810 But you can see perhaps more visually there's a lot of work here. 01:00:23.810 --> 01:00:25.940 Bonnie was uttering a lot of sentences. 01:00:25.940 --> 01:00:28.310 She was doing a lot of back and forth, because, just 01:00:28.310 --> 01:00:32.120 as this pink bar suggests, it's going back and forth and back and forth, 01:00:32.120 --> 01:00:35.887 doing a lot of work again and again and again. 01:00:35.887 --> 01:00:36.470 And let's see. 01:00:36.470 --> 01:00:38.512 It's going to start to speed up now because we're 01:00:38.512 --> 01:00:40.820 nearing the latter half of it. 01:00:40.820 --> 01:00:44.420 But as you can see, ultimately, this is kind 01:00:44.420 --> 01:00:46.130 of what n squared feels like, right? 01:00:46.130 --> 01:00:47.540 I'm kind of out of words again. 01:00:47.540 --> 01:00:49.050 And I could say some more things. 01:00:49.050 --> 01:00:51.842 But it's really just stalling because the algorithm's kind of slow. 01:00:51.842 --> 01:00:55.550 n squared is not a good upper bound on running 01:00:55.550 --> 01:00:58.598 time, especially when your [? elements ?] are randomly sorted. 01:00:58.598 --> 01:00:59.640 So let's try another one. 01:00:59.640 --> 01:01:02.375 Let's do, in this case, selection sort. 01:01:02.375 --> 01:01:05.520 So I'm going to re-randomize the numbers just as we started and now do 01:01:05.520 --> 01:01:06.380 selection sort. 01:01:06.380 --> 01:01:08.240 And I'm starting at the faster speed. 01:01:08.240 --> 01:01:10.010 And it's working a little differently. 01:01:10.010 --> 01:01:11.990 Notice that the pink line is sweeping from 01:01:11.990 --> 01:01:13.990 left to right, looking for the smallest element. 01:01:13.990 --> 01:01:16.880 And when it finds it, it highlights the little bar, 01:01:16.880 --> 01:01:19.363 and it moves it all the way in place to the left. 01:01:19.363 --> 01:01:22.280 So whereas bubble [? sort's ?] large elements bubbled up to the right, 01:01:22.280 --> 01:01:26.360 selection sort is much more emphatically grabbing the smallest element 01:01:26.360 --> 01:01:29.340 and putting it into its place one after the other. 01:01:29.340 --> 01:01:30.890 So this has a different feel. 01:01:30.890 --> 01:01:33.200 But here, too, I'm going to have to ad lib 01:01:33.200 --> 01:01:35.380 quite a bit because it's taking a while. 01:01:35.380 --> 01:01:37.970 And you can see the pink bars are really going back and forth 01:01:37.970 --> 01:01:43.610 and back and forth, doing quite a bit of work, quite a bit of work, 01:01:43.610 --> 01:01:44.960 quite a bit of work. 01:01:44.960 --> 01:01:47.060 And now finally, it's done. 01:01:47.060 --> 01:01:50.690 So in a bit, we'll take a look at fundamentally faster solutions 01:01:50.690 --> 01:01:52.730 and see why n squared actually is small. 01:01:52.730 --> 01:01:56.270 But first let's take our five-minute break with mini cupcakes outside. 01:01:59.900 --> 01:02:01.370 So we are back. 01:02:01.370 --> 01:02:03.860 And just as a teaser for this coming week, 01:02:03.860 --> 01:02:07.280 the week is ultimately about algorithms and the implementation thereof. 01:02:07.280 --> 01:02:09.528 And it turns out that certainly on campus 01:02:09.528 --> 01:02:11.570 and in the real world, our [? election's ?] quite 01:02:11.570 --> 01:02:14.600 often, an algorithmic process to which there's actually 01:02:14.600 --> 01:02:16.550 multiple possible solutions. 01:02:16.550 --> 01:02:19.450 Indeed, when you vote for someone, how those votes are tabulated 01:02:19.450 --> 01:02:21.700 can actually differ based on the algorithm being used. 01:02:21.700 --> 01:02:24.460 And those can actually had very real-world effects 01:02:24.460 --> 01:02:26.110 on the outcomes of those elections. 01:02:26.110 --> 01:02:28.930 So among the challenges ahead for the coming week with problem set 01:02:28.930 --> 01:02:32.260 three is to implement a number of algorithms related to elections. 01:02:32.260 --> 01:02:34.300 For instance, it might be a very simple ballot, 01:02:34.300 --> 01:02:39.610 whereby whoever has the most votes among all among all of the candidates, 01:02:39.610 --> 01:02:41.218 or whoever has a plurality wins. 01:02:41.218 --> 01:02:44.260 Or you can implement some kind of runoff election, whereby you don't just 01:02:44.260 --> 01:02:46.690 vote for one candidate, but you rank your preferences. 01:02:46.690 --> 01:02:49.720 And then you use software or a more manual human process 01:02:49.720 --> 01:02:52.660 to adjudicate who wins based on the ranking of those candidates. 01:02:52.660 --> 01:02:55.150 And there's even more possibilities that ultimately 01:02:55.150 --> 01:02:57.370 can influence real-world outcomes, whether it's here 01:02:57.370 --> 01:02:59.090 on campus or in the real world. 01:02:59.090 --> 01:03:02.080 And so that's what we'll explore this week in code. 01:03:02.080 --> 01:03:04.240 But now let's see if we can't fundamentally 01:03:04.240 --> 01:03:07.660 do better than both bubble sort and selection sort. 01:03:07.660 --> 01:03:10.810 And let me stipulate, there's actually dozens of sorting algorithms. 01:03:10.810 --> 01:03:14.320 We're looking just at a couple of representative algorithms here. 01:03:14.320 --> 01:03:19.090 But let's see if we can't do fundamentally better than the n squared 01:03:19.090 --> 01:03:20.863 big O that we kept bumping up against. 01:03:20.863 --> 01:03:23.530 And to do that, let me propose that we introduce a fundamentally 01:03:23.530 --> 01:03:27.070 new idea that, frankly, among the ideas we explore in computer science 01:03:27.070 --> 01:03:28.960 will kind of bend your mind a little bit. 01:03:28.960 --> 01:03:30.460 So again, here comes that fire hose. 01:03:30.460 --> 01:03:33.760 But again, the goal today is exposure, not yet comfort. 01:03:33.760 --> 01:03:37.600 Comfort will come in the coming weeks as we apply [? this ?] [? ideas ?] 01:03:37.600 --> 01:03:38.470 and others. 01:03:38.470 --> 01:03:41.608 So let's rewind to week 0, where everything was very simple at the time. 01:03:41.608 --> 01:03:43.900 And we were just searching a phone book for Mike Smith. 01:03:43.900 --> 01:03:45.670 And we had this pseudocode here. 01:03:45.670 --> 01:03:48.550 This had an example of a programming construct that, at the time, 01:03:48.550 --> 01:03:51.460 we highlighted and called a loop, go back to line 3 01:03:51.460 --> 01:03:54.010 so that you can do something again and again. 01:03:54.010 --> 01:03:56.503 This is an example of what's called iteration, 01:03:56.503 --> 01:03:58.420 a word you might have heard your [? TFs ?] say 01:03:58.420 --> 01:04:02.365 or someone else, where to iterate just means to loop again and again. 01:04:02.365 --> 01:04:03.740 And this is very straightforward. 01:04:03.740 --> 01:04:05.698 And we could implement this in code if we want. 01:04:05.698 --> 01:04:08.980 But there there's an opportunity to design this algorithm not only 01:04:08.980 --> 01:04:11.500 differently, but perhaps better, right? 01:04:11.500 --> 01:04:14.440 After all, let me go ahead and erase that line there 01:04:14.440 --> 01:04:17.530 and get rid of this iteration and see if I can't solve the problem 01:04:17.530 --> 01:04:21.230 more elegantly, if you will, a better design, if you will, 01:04:21.230 --> 01:04:23.230 though there will invariably be some trade-offs. 01:04:23.230 --> 01:04:26.230 Here, with the open to the middle of the left-- 01:04:26.230 --> 01:04:28.720 here, with open to middle of left half of book 01:04:28.720 --> 01:04:31.540 and here, open to middle of right half of book, 01:04:31.540 --> 01:04:35.530 the whole point of opening to the middle of the left or the middle of the right 01:04:35.530 --> 01:04:39.190 was just a search for Mike Smith again but in half of the phone book, 01:04:39.190 --> 01:04:40.180 left or right. 01:04:40.180 --> 01:04:43.810 The key detail being it's half the size of the whole phone book. 01:04:43.810 --> 01:04:45.500 But the algorithm is really the same. 01:04:45.500 --> 01:04:47.593 So in fact, why don't we simplify our pseudocode 01:04:47.593 --> 01:04:50.260 and not get into the logistics of like, oh, go back to this line 01:04:50.260 --> 01:04:51.635 and then do this again and again. 01:04:51.635 --> 01:04:54.700 No, let's just say search the left half of book 01:04:54.700 --> 01:04:56.800 or search the right half of book. 01:04:56.800 --> 01:04:59.620 And in fact, let's tighten up the code and make it fewer lines 01:04:59.620 --> 01:05:03.250 so that we don't even need to get into the specific line numbers. 01:05:03.250 --> 01:05:06.070 We can just tell ourselves what to do. 01:05:06.070 --> 01:05:09.040 Now, highlighted in yellow here are those two new lines. 01:05:09.040 --> 01:05:11.165 And it might seem kind of like a cyclical argument. 01:05:11.165 --> 01:05:12.790 Well, how do you search for Mike Smith? 01:05:12.790 --> 01:05:14.590 Well, you just search for Mike Smith. 01:05:14.590 --> 01:05:17.620 But the key detail here is I'm not just telling 01:05:17.620 --> 01:05:19.390 you to do the same thing endlessly. 01:05:19.390 --> 01:05:22.057 I'm telling you, if you want to search for Mike Smith in a phone 01:05:22.057 --> 01:05:23.140 book of this size, mm-mm. 01:05:23.140 --> 01:05:25.315 Search for Mike Smith in a phone book of this size. 01:05:25.315 --> 01:05:27.940 And then the next step of that algorithm becomes search for him 01:05:27.940 --> 01:05:32.320 in a phone book of this size, this size, when you keep halving the problem. 01:05:32.320 --> 01:05:37.150 So this is an example of a technique in programming called recursion, whereby 01:05:37.150 --> 01:05:43.810 you implement a program or an algorithm or code that, in a sense, calls itself. 01:05:43.810 --> 01:05:47.950 If what we're looking at here on the board is a function called search, 01:05:47.950 --> 01:05:52.600 a function is recursive if it literally references its own name 01:05:52.600 --> 01:05:53.785 in its own code. 01:05:53.785 --> 01:05:55.910 And this is where your mind starts to bend perhaps. 01:05:55.910 --> 01:05:57.640 And we'll see this more concretely. 01:05:57.640 --> 01:06:00.910 But recursion is when a function calls itself. 01:06:00.910 --> 01:06:03.970 So if this is a function implementing search and highlighted in yellow 01:06:03.970 --> 01:06:07.690 are two lines of code that say search again but on a smaller 01:06:07.690 --> 01:06:13.330 piece of the problem, that is recursion, something happening again and again. 01:06:13.330 --> 01:06:14.715 So let's see this in context. 01:06:14.715 --> 01:06:17.590 So let's go back to Mario, where this is a slightly different pyramid 01:06:17.590 --> 01:06:18.548 that we've seen before. 01:06:18.548 --> 01:06:22.270 Notice that it's left aligned, and it goes downward to the right. 01:06:22.270 --> 01:06:25.930 Let's, in fact, get rid of the ground and just focus only on the pyramid. 01:06:25.930 --> 01:06:29.980 How could I go about writing code that implements this type of Mario pyramid? 01:06:29.980 --> 01:06:33.100 Well, let me go ahead and create a new file called Mario.c. 01:06:33.100 --> 01:06:35.860 Or actually, no, let's be even more specific this time. 01:06:35.860 --> 01:06:38.313 Let's call this iteration.c to make clear 01:06:38.313 --> 01:06:39.730 that this is an iterative program. 01:06:39.730 --> 01:06:42.580 And let me go ahead and include cs50.h. 01:06:42.580 --> 01:06:44.920 And let me include standard io.h. 01:06:44.920 --> 01:06:48.220 And let me go ahead then and do int main void. 01:06:48.220 --> 01:06:51.460 And in here, let me go ahead and just get the height of the pyramid 01:06:51.460 --> 01:06:53.673 from the user, using our old friend get int. 01:06:53.673 --> 01:06:55.840 I'm not going to bother, for today, doing a do while 01:06:55.840 --> 01:06:57.382 and making sure the human cooperates. 01:06:57.382 --> 01:07:00.670 They need to just behave and give us a positive integer here. 01:07:00.670 --> 01:07:04.300 And then I'm going to go ahead and just draw a pyramid of that height 01:07:04.300 --> 01:07:07.690 by using a function that doesn't exist yet but that's going to be called draw. 01:07:07.690 --> 01:07:10.240 I'm going to implement this function draw as follows, 01:07:10.240 --> 01:07:13.790 void draw, because it doesn't need to return a value, per our discussion 01:07:13.790 --> 01:07:14.290 last week. 01:07:14.290 --> 01:07:15.520 It's just going to print something. 01:07:15.520 --> 01:07:17.650 But it is going to take input, like a number n. 01:07:17.650 --> 01:07:19.450 Or rather, let's call it H for Height. 01:07:19.450 --> 01:07:21.700 That represents the height of the pyramid to draw. 01:07:21.700 --> 01:07:25.010 And how do I draw a pyramid that looks like this? 01:07:25.010 --> 01:07:28.630 Well, again, use some intuition, as you might have four problem set 1, 01:07:28.630 --> 01:07:31.190 even though the pyramid there was a little trickier. 01:07:31.190 --> 01:07:33.043 On the first row, I want to print one brick. 01:07:33.043 --> 01:07:34.960 On the second row, I want to print two bricks. 01:07:34.960 --> 01:07:37.450 On the third row, three, fourth row, four. 01:07:37.450 --> 01:07:40.210 So it turns out this is an easier pyramid than the one we 01:07:40.210 --> 01:07:41.950 had you do for problem set 1. 01:07:41.950 --> 01:07:42.610 Sorry. 01:07:42.610 --> 01:07:45.550 So for int, i gets 0. 01:07:45.550 --> 01:07:47.410 i is less than-- actually, you know what? 01:07:47.410 --> 01:07:50.530 Let me make it a little clearer and more mapping to my verbal pseudocode. 01:07:50.530 --> 01:07:53.080 Let's initialize i to 1 for the first row. 01:07:53.080 --> 01:07:56.140 Let's do this so long as i is less than or equal to the height. 01:07:56.140 --> 01:07:57.310 And let's do i plus plus. 01:07:57.310 --> 01:08:02.550 So this is the same thing as starting from 0 but just, surprise-- 01:08:02.550 --> 01:08:08.380 [LAUGHS] so I actually didn't make that mistake, if you didn't see it. 01:08:08.380 --> 01:08:14.860 So I'm going to go ahead and say for in i gets 1 to represent my first row. 01:08:14.860 --> 01:08:17.439 i is less than or equal to height, i plus plus. 01:08:17.439 --> 01:08:20.242 This is identical, again, to starting from 0, 01:08:20.242 --> 01:08:23.200 but it's just nice to start counting from 1 sometimes, as in this case, 01:08:23.200 --> 01:08:24.279 for the first row. 01:08:24.279 --> 01:08:26.696 And then anytime you want to do something two dimensional, 01:08:26.696 --> 01:08:28.700 like in Mario, odds are, if you're like me, 01:08:28.700 --> 01:08:33.310 you probably had an inner nested loop, maybe calling it j, and doing j 01:08:33.310 --> 01:08:37.593 is less than or equal to i and then j plus plus. 01:08:37.593 --> 01:08:39.760 And I'll run this so that it's clear what I'm doing. 01:08:39.760 --> 01:08:43.090 But inside this nested loop, I'm just going to print one brick. 01:08:43.090 --> 01:08:49.600 And then down here, I'm going to print my new line backslash n. 01:08:49.600 --> 01:08:51.370 So again, it's a simple draw function. 01:08:51.370 --> 01:08:53.710 And now because it's at the bottom of my file, 01:08:53.710 --> 01:08:56.950 I need to put its prototype up here, one of the few times copy and paste 01:08:56.950 --> 01:08:58.720 is reasonable, I would say. 01:08:58.720 --> 01:09:03.055 So let me make iteration, compiles OK, dot slash iteration. 01:09:03.055 --> 01:09:04.430 And now I'm asked for the height. 01:09:04.430 --> 01:09:06.470 Let's go ahead do a pyramid of size 4. 01:09:06.470 --> 01:09:07.715 And voila, it seems to work. 01:09:07.715 --> 01:09:08.840 And let me do it once more. 01:09:08.840 --> 01:09:12.100 I'll try, for instance, a pyramid of height 3. 01:09:12.100 --> 01:09:12.950 That works. 01:09:12.950 --> 01:09:15.340 And let me go ahead and do a pyramid of size 5. 01:09:15.340 --> 01:09:16.420 So it seems to work. 01:09:16.420 --> 01:09:19.180 And this is a very reasonable, very correct approach 01:09:19.180 --> 01:09:24.040 to implementing that Mario pyramid using iteration, that is to say, 01:09:24.040 --> 01:09:26.920 using loops, in this case, two loops. 01:09:26.920 --> 01:09:29.380 But you know what's interesting about this Mario pyramid, 01:09:29.380 --> 01:09:31.540 as well as some of the others we've seen, 01:09:31.540 --> 01:09:33.609 is there's this common structure, right? 01:09:33.609 --> 01:09:35.649 And if we look at the pyramid in isolation, 01:09:35.649 --> 01:09:38.950 what is the definition of a pyramid of height 4? 01:09:38.950 --> 01:09:44.050 Well, arguably, it's a pyramid of height 3 plus 1 additional row. 01:09:44.050 --> 01:09:46.359 What's the definition of a pyramid of height 3? 01:09:46.359 --> 01:09:49.359 Well, it's a pyramid of height 2 plus 1 additional row. 01:09:49.359 --> 01:09:51.460 What's the definition of a pyramid of high 2? 01:09:51.460 --> 01:09:54.730 It's a pyramid of height 1 with an additional row. 01:09:54.730 --> 01:09:59.830 That's a recursive definition of just a physical object or a virtual object, 01:09:59.830 --> 01:10:04.090 whereby you can describe the structure of something in terms of itself. 01:10:04.090 --> 01:10:08.020 Now, at some point, I need a special case, at least one height. 01:10:08.020 --> 01:10:09.970 What is a pyramid of height 0? 01:10:09.970 --> 01:10:10.780 Nothing, right? 01:10:10.780 --> 01:10:16.210 Return or exit or quit, whatever the right verbiage is for the algorithm. 01:10:16.210 --> 01:10:19.430 So long as you have a so-called base case, where you manually say, 01:10:19.430 --> 01:10:21.580 oh, in that specific case, just don't do anything, 01:10:21.580 --> 01:10:23.950 and you don't recursively call yourself again 01:10:23.950 --> 01:10:28.340 and again, we can use this principle of code calling itself. 01:10:28.340 --> 01:10:29.630 So let's try this once more. 01:10:29.630 --> 01:10:33.880 Let me go ahead and create another file called recursion.c. 01:10:33.880 --> 01:10:37.570 I'm again going to go ahead and include [? cs50.h. ?] And I'm going to go ahead 01:10:37.570 --> 01:10:41.590 and include standard [? io.h. ?] And then I'm going to go ahead and have int 01:10:41.590 --> 01:10:43.450 main void again. 01:10:43.450 --> 01:10:45.520 And in this program here, I'm going to again ask 01:10:45.520 --> 01:10:48.430 the user for the height of interest for their pyramid 01:10:48.430 --> 01:10:52.540 using int height gets get int and ask them for height. 01:10:52.540 --> 01:10:55.000 I'm not going to bother error checking here. 01:10:55.000 --> 01:10:57.400 I'm going to go ahead and draw a pyramid of that height. 01:10:57.400 --> 01:11:00.610 And so what's going to change this time is my draw function, 01:11:00.610 --> 01:11:04.400 void draw int h as before. 01:11:04.400 --> 01:11:06.070 And now's where things get interesting. 01:11:06.070 --> 01:11:09.460 My goal now is not to just use nested loops, 01:11:09.460 --> 01:11:13.390 but to define a bigger pyramid in terms of a small pyramid. 01:11:13.390 --> 01:11:19.030 So suppose that the goal at hand is to draw a pyramid of size 4. 01:11:19.030 --> 01:11:23.890 What should I do first, according to this definition of a pyramid? 01:11:23.890 --> 01:11:29.060 How do I draw a pyramid of size 4 in English? 01:11:29.060 --> 01:11:29.560 Yeah? 01:11:29.560 --> 01:11:33.750 AUDIENCE: Draw a pyramid of the size 4 minus 1. 01:11:33.750 --> 01:11:37.715 SPEAKER 1: Yeah, draw a pyramid of size 4 minus 1, or a pyramid of size 3. 01:11:37.715 --> 01:11:39.090 So how do I express this in code? 01:11:39.090 --> 01:11:43.140 Well, wonderfully in code, this is super simple, h minus 1. 01:11:43.140 --> 01:11:51.060 That will draw me a pyramid of height h minus 1, or 3 in this specific case. 01:11:51.060 --> 01:11:53.250 Now, it's not done the program, right? 01:11:53.250 --> 01:11:55.500 I can't possibly just compile this and expect 01:11:55.500 --> 01:11:59.790 it to work because this seems like it's just going to call itself endlessly. 01:11:59.790 --> 01:12:05.610 Well, what's a pyramid of size 3, 2, 1, 0, negative 1, negative 2, right? 01:12:05.610 --> 01:12:09.360 It would go on endlessly if I just blindly subtract 1. 01:12:09.360 --> 01:12:10.860 So I need that base case. 01:12:10.860 --> 01:12:13.630 Under what circumstances should I actually not draw anything? 01:12:13.630 --> 01:12:15.373 AUDIENCE: [INAUDIBLE] 01:12:15.373 --> 01:12:16.040 SPEAKER 1: Yeah. 01:12:16.040 --> 01:12:19.190 So maybe if h equals equals 0, you know what? 01:12:19.190 --> 01:12:19.880 Just return. 01:12:19.880 --> 01:12:21.290 Don't do anything, right? 01:12:21.290 --> 01:12:23.480 I need a base case, a hard-coded condition 01:12:23.480 --> 01:12:27.440 that says stop doing this, this mind-bending [? cyclity ?] 01:12:27.440 --> 01:12:28.700 again and again. 01:12:28.700 --> 01:12:30.330 But I do need to do one more thing. 01:12:30.330 --> 01:12:34.130 So this is just an error check to make sure I don't do this forever. 01:12:34.130 --> 01:12:36.730 This is this leap of faith, where somehow I haven't even 01:12:36.730 --> 01:12:38.480 written the function yet, and somehow it's 01:12:38.480 --> 01:12:40.572 magically going to draw my pyramid. 01:12:40.572 --> 01:12:42.530 But what's the second step of drawing a pyramid 01:12:42.530 --> 01:12:44.678 of height 4, if I can ask again? 01:12:44.678 --> 01:12:46.630 AUDIENCE: Well, in terms of [INAUDIBLE]? 01:12:46.630 --> 01:12:48.130 SPEAKER 1: Yeah, so what comes next? 01:12:48.130 --> 01:12:49.713 I've just drawn a pyramid of height 3. 01:12:49.713 --> 01:12:51.910 AUDIENCE: Oh, then you draw a pyramid of height 2. 01:12:51.910 --> 01:12:53.482 SPEAKER 1: Now I draw a-- 01:12:53.482 --> 01:12:54.190 say it once more. 01:12:54.190 --> 01:12:55.675 AUDIENCE: Pyramid of height 2. 01:12:55.675 --> 01:12:56.550 SPEAKER 1: Not quite. 01:12:56.550 --> 01:12:57.383 Take this literally. 01:12:57.383 --> 01:13:01.330 If I have just in code drawn a pyramid of height 3, 01:13:01.330 --> 01:13:03.280 how do I get to a pyramid of height 4 now? 01:13:03.280 --> 01:13:04.772 AUDIENCE: Oh, you add [INAUDIBLE]. 01:13:04.772 --> 01:13:07.480 SPEAKER 1: Yeah, I add that additional row, right? [? Because, ?] 01:13:07.480 --> 01:13:10.040 again, per our diagram, what's a pyramid of height 4? 01:13:10.040 --> 01:13:13.440 Well, it's really just a pyramid of height 3 plus an additional row. 01:13:13.440 --> 01:13:15.830 So if we all just kind of agree, a leap of faith, 01:13:15.830 --> 01:13:20.150 that somehow or other I have the ability to draw pyramids of height h minus 1, 01:13:20.150 --> 01:13:26.280 lets you and I do the hard part in code of drawing that one additional row. 01:13:26.280 --> 01:13:30.560 So if I go back in code here, after drawing a pyramid of height h minus 1, 01:13:30.560 --> 01:13:37.250 I need to go ahead and for int i gets 0, i is less than h, i plus plus. 01:13:37.250 --> 01:13:43.200 It would seem that I just need to print out, for instance, 01:13:43.200 --> 01:13:48.800 up here a hash followed by a new line after that, right? 01:13:48.800 --> 01:13:51.947 So I do need a for loop, but just one not nested. 01:13:51.947 --> 01:13:53.780 And what does this have the effect of doing? 01:13:53.780 --> 01:13:59.240 Well, on the fourth row, where h equals 4, how many hashes am I going to print? 01:13:59.240 --> 01:14:04.430 1, 2, 3, 4, if I'm iterating from 0 on up to h, 0, 1, 2, 3, 4. 01:14:04.430 --> 01:14:08.820 So these lines of code, in the story at hand, are going to print four hashes. 01:14:08.820 --> 01:14:13.460 This line of code, amazingly, is going to print everything else above it, 01:14:13.460 --> 01:14:14.855 the pyramid of height 3. 01:14:14.855 --> 01:14:16.730 And the line of code above that is just going 01:14:16.730 --> 01:14:22.070 to make sure that we don't blindly call draw forever into the negative numbers. 01:14:22.070 --> 01:14:27.290 I'm literally going to say, if h equals equals 0, stop doing this magic. 01:14:27.290 --> 01:14:31.190 So let's go ahead and put my prototype up top, just as before, 01:14:31.190 --> 01:14:36.260 even though it's the same, save the file, make recursion, Enter. 01:14:36.260 --> 01:14:37.700 It compiles OK. 01:14:37.700 --> 01:14:40.910 Now let me go ahead and run recursion a height of 4. 01:14:40.910 --> 01:14:47.150 And, oh, my god, I wrote a function that called itself and somehow magically 01:14:47.150 --> 01:14:47.940 printed a pyramid. 01:14:47.940 --> 01:14:51.620 And yet all I ever explicitly did was print what? 01:14:51.620 --> 01:14:54.860 A row of bricks myself. 01:14:54.860 --> 01:14:58.280 And the recursion comes from the fact that I'm calling myself. 01:14:58.280 --> 01:15:02.630 But just like with binary search, just like 01:15:02.630 --> 01:15:08.780 with any divide-and-conquer approach, I'm calling myself on a smaller problem 01:15:08.780 --> 01:15:09.860 than I was handed. 01:15:09.860 --> 01:15:14.220 The bites are eating into the problem again and again and again. 01:15:14.220 --> 01:15:17.570 Any questions on this technique, a function 01:15:17.570 --> 01:15:19.880 that calls itself is recursive? 01:15:19.880 --> 01:15:20.606 Yeah? 01:15:20.606 --> 01:15:22.550 AUDIENCE: A quick question. 01:15:22.550 --> 01:15:25.952 [INAUDIBLE] 01:15:25.952 --> 01:15:29.283 So [INAUDIBLE] loop, how does it go back [INAUDIBLE]?? 01:15:29.283 --> 01:15:31.450 SPEAKER 1: Really good question, after the for loop, 01:15:31.450 --> 01:15:32.700 how does it go back and print? 01:15:32.700 --> 01:15:33.240 It doesn't. 01:15:33.240 --> 01:15:34.960 That happens first. 01:15:34.960 --> 01:15:38.220 So if you actually were to use debug 50 in the [? IDE, ?] 01:15:38.220 --> 01:15:40.740 you would see that when this line 20 is called, 01:15:40.740 --> 01:15:45.650 and you call draw of a pyramid of height 3, draw gets called again. 01:15:45.650 --> 01:15:47.400 And then it gets called again on height 2. 01:15:47.400 --> 01:15:49.140 Then it gets called again on height 1. 01:15:49.140 --> 01:15:51.450 But guess what happens on a pyramid of height 1? 01:15:51.450 --> 01:15:53.100 It prints a single hash. 01:15:53.100 --> 01:15:55.410 Then if you rewind the story, what happens next? 01:15:55.410 --> 01:15:57.300 You print a row of two hashes. 01:15:57.300 --> 01:15:58.350 What happens next? 01:15:58.350 --> 01:15:59.940 You print a row of three hashes. 01:15:59.940 --> 01:16:01.080 What happens next? 01:16:01.080 --> 01:16:03.420 You print a row of four hashes. 01:16:03.420 --> 01:16:05.790 And we'll see more of this before long. 01:16:05.790 --> 01:16:07.800 But because I'm printing-- 01:16:07.800 --> 01:16:12.660 I'm calling draw before I'm printing the base, I don't know how this works yet. 01:16:12.660 --> 01:16:14.970 That's the leap of faith to which I keep alluding. 01:16:14.970 --> 01:16:18.870 But it keeps happening because, 1, I have this base case that 01:16:18.870 --> 01:16:20.880 stops this from happening forever. 01:16:20.880 --> 01:16:26.470 And I have this other case that adds to my pyramid again and again. 01:16:26.470 --> 01:16:26.970 Yeah? 01:16:26.970 --> 01:16:30.906 AUDIENCE: It's kind of like a layering of [INAUDIBLE] for iterations. 01:16:30.906 --> 01:16:34.962 But instead of going from top down, it's going [? down up. ?] 01:16:34.962 --> 01:16:35.670 SPEAKER 1: It is. 01:16:35.670 --> 01:16:36.390 It's going down up. 01:16:36.390 --> 01:16:37.860 And you're referring actually to a [? concept ?] 01:16:37.860 --> 01:16:40.652 we'll talk about actually in a week or two's time called the stack. 01:16:40.652 --> 01:16:43.110 We'll see actually how this magic is working. 01:16:43.110 --> 01:16:45.870 For now let me just stipulate that functions can call themselves, 01:16:45.870 --> 01:16:48.090 so long as what you pass them is a smaller 01:16:48.090 --> 01:16:50.545 input than you were handed initially. 01:16:50.545 --> 01:16:52.920 And now just to demonstrate that computer scientists have 01:16:52.920 --> 01:16:55.080 a sense of humor, if we Google recursion, 01:16:55.080 --> 01:16:57.810 as you might currently be doing to understand what this is, 01:16:57.810 --> 01:16:59.032 you'll notice-- 01:16:59.032 --> 01:17:01.392 [LAUGHTER] 01:17:02.810 --> 01:17:04.040 Get it? 01:17:04.040 --> 01:17:06.350 Kind of-- OK, anyhow. 01:17:06.350 --> 01:17:10.640 Google has literally hard-coded that into their source code of google.com. 01:17:10.640 --> 01:17:17.090 So let's now use this to solve a problem of sorting. 01:17:17.090 --> 01:17:19.997 It turns out there's an algorithm out there called merge sort. 01:17:19.997 --> 01:17:23.080 And it's representative of sorts that are actually better than bubble sort 01:17:23.080 --> 01:17:25.550 and better than selection sort fundamentally. 01:17:25.550 --> 01:17:28.160 In terms of big O notation, we can do better. 01:17:28.160 --> 01:17:30.708 n squared does not have to be our fate. 01:17:30.708 --> 01:17:32.750 After all, so many things in our life are sorted. 01:17:32.750 --> 01:17:36.320 Your contacts in your phone, maybe your friends on Facebook or Instagram, 01:17:36.320 --> 01:17:40.100 or any application using the cloud typically sorts data in some way. 01:17:40.100 --> 01:17:42.920 It would be a shame if it's super slow to sort, 01:17:42.920 --> 01:17:44.870 as we saw already, with n squared. 01:17:44.870 --> 01:17:47.402 So merge sort works as follows. 01:17:47.402 --> 01:17:49.610 This is pseudocode for an algorithm called merge sort 01:17:49.610 --> 01:17:53.000 that if you hand it an array of numbers or names or anything, 01:17:53.000 --> 01:17:54.050 it acts as follows. 01:17:54.050 --> 01:17:57.850 If there's only one item you're handed in your array, well, just return. 01:17:57.850 --> 01:17:58.850 There's nothing to sort. 01:17:58.850 --> 01:18:00.050 So that's our base case. 01:18:00.050 --> 01:18:02.000 That's the sort of stupid case where you have 01:18:02.000 --> 01:18:04.520 to hard code, that is literally write out, 01:18:04.520 --> 01:18:06.903 if this situation happens, do this. 01:18:06.903 --> 01:18:09.320 And the case is if you just hand me a list with one thing, 01:18:09.320 --> 01:18:12.070 it's obviously sorted, by definition, because nothing can possibly 01:18:12.070 --> 01:18:12.950 be out of order. 01:18:12.950 --> 01:18:15.380 Things get more interesting otherwise. 01:18:15.380 --> 01:18:18.800 Merge sort says, just like that Mario example, you know what? 01:18:18.800 --> 01:18:21.140 If you want me to sort this whole list, I'm 01:18:21.140 --> 01:18:25.970 going to tell you sort the left half, then sort the right half, 01:18:25.970 --> 01:18:30.200 and then merge those lists together, such that you weave them together 01:18:30.200 --> 01:18:33.830 in such a way that the merged list is sorted as well. 01:18:33.830 --> 01:18:38.330 So merge sort is three steps, sort left half, sort right half, 01:18:38.330 --> 01:18:41.870 merge those two sorted halves. 01:18:41.870 --> 01:18:43.310 And this is the-- 01:18:43.310 --> 01:18:45.800 we were chatting earlier about an apt metaphor here. 01:18:45.800 --> 01:18:49.770 This is kind of a roller-coaster-type ride, where you got to hold on. 01:18:49.770 --> 01:18:50.640 You've got to focus. 01:18:50.640 --> 01:18:54.080 It's OK if it doesn't all work out for the best the first time around. 01:18:54.080 --> 01:18:58.530 But each step will be important here in the metaphor of the fire hose as well. 01:18:58.530 --> 01:19:00.890 So here is a list of unsorted numbers. 01:19:00.890 --> 01:19:04.470 The goal at hand is to sort them faster than bubble sort and selection sort 01:19:04.470 --> 01:19:05.140 can. 01:19:05.140 --> 01:19:07.160 So merge sort tells me what? 01:19:07.160 --> 01:19:09.560 Sort left half, sort right half, merge. 01:19:09.560 --> 01:19:10.670 That is it for merge sort. 01:19:10.670 --> 01:19:11.390 That's the magic. 01:19:11.390 --> 01:19:16.220 Just like Mario says, print a pyramid of height h minus 1, print the base, done. 01:19:16.220 --> 01:19:19.160 That's the essence of this recursive algorithm, left half, right half, 01:19:19.160 --> 01:19:19.680 merge. 01:19:19.680 --> 01:19:20.680 So what's the left half? 01:19:20.680 --> 01:19:22.100 It's these four elements here. 01:19:22.100 --> 01:19:24.440 Let me go ahead now and sort those four elements. 01:19:24.440 --> 01:19:27.800 How do I saw a list of four elements? 01:19:27.800 --> 01:19:28.880 Merge sort them, right? 01:19:28.880 --> 01:19:32.175 Sort the left half, then sort the right half, then merge them together. 01:19:32.175 --> 01:19:33.800 So you're kind of like kicking the can. 01:19:33.800 --> 01:19:34.620 Like, I've done no work. 01:19:34.620 --> 01:19:36.560 You're just telling me to go sort something [? else. ?] 01:19:36.560 --> 01:19:38.185 But OK, let me follow those directions. 01:19:38.185 --> 01:19:40.310 Let me sort the left half, 7, 4. 01:19:40.310 --> 01:19:41.810 How do I saw a list of size 2? 01:19:41.810 --> 01:19:43.190 AUDIENCE: Swap. 01:19:43.190 --> 01:19:44.555 SPEAKER 1: Not swapping yet. 01:19:44.555 --> 01:19:45.860 AUDIENCE: [INAUDIBLE] 01:19:45.860 --> 01:19:46.910 SPEAKER 1: Merge sort-- 01:19:46.910 --> 01:19:49.555 the left half, then the right half, then merge them together. 01:19:49.555 --> 01:19:51.680 So again, it's kind of crazy talk because we've not 01:19:51.680 --> 01:19:53.030 done any actual work yet. 01:19:53.030 --> 01:19:54.300 And I claim we're sorting. 01:19:54.300 --> 01:19:55.470 But let's see what happens. 01:19:55.470 --> 01:19:56.900 Here's the left half. 01:19:56.900 --> 01:19:59.550 How do I sort a list of size 1? 01:19:59.550 --> 01:20:00.050 Done. 01:20:00.050 --> 01:20:00.980 That's the return. 01:20:00.980 --> 01:20:03.980 That's the base case to make sure I don't do this forever. 01:20:03.980 --> 01:20:05.210 What came next? 01:20:05.210 --> 01:20:06.380 I just sorted the left half. 01:20:06.380 --> 01:20:07.977 What was the second step? 01:20:07.977 --> 01:20:08.810 Sort the right half. 01:20:08.810 --> 01:20:10.130 How do I sort this? 01:20:10.130 --> 01:20:11.000 Done. 01:20:11.000 --> 01:20:12.500 Now it gets interesting. 01:20:12.500 --> 01:20:13.800 What was the third step? 01:20:13.800 --> 01:20:14.540 AUDIENCE: Merge. 01:20:14.540 --> 01:20:17.283 SPEAKER 1: Merge two lists of size 1. 01:20:17.283 --> 01:20:18.575 So now I need some extra space. 01:20:18.575 --> 01:20:21.480 So I'm going to give myself an extra row, some extra memory, if you will, 01:20:21.480 --> 01:20:22.280 in the computer. 01:20:22.280 --> 01:20:23.840 4 obviously comes first. 01:20:23.840 --> 01:20:25.033 7 obviously comes next. 01:20:25.033 --> 01:20:25.950 That's the merge step. 01:20:25.950 --> 01:20:29.075 That's what I mean by merge, take the smallest element from whichever list, 01:20:29.075 --> 01:20:32.180 and then follow it by the smallest element in the other list. 01:20:32.180 --> 01:20:34.760 This now is a sorted list of size 2. 01:20:34.760 --> 01:20:39.690 So if you rewind in your mind, what was the second step now? 01:20:39.690 --> 01:20:41.790 That was sort left half. 01:20:41.790 --> 01:20:43.620 Sort right half, right? 01:20:43.620 --> 01:20:47.008 So you really have to kind of rewind in the story, like, 30-plus seconds ago. 01:20:47.008 --> 01:20:48.300 How do you sort the right half? 01:20:48.300 --> 01:20:52.470 Well, you sort the left half, done, right half, done. 01:20:52.470 --> 01:20:54.600 Here's the magic, merge. 01:20:54.600 --> 01:20:56.850 How do I merge these two lists? 01:20:56.850 --> 01:20:58.240 2 comes first. 01:20:58.240 --> 01:20:59.640 5 comes next. 01:20:59.640 --> 01:21:04.022 I have just sorted the right half of this list. 01:21:04.022 --> 01:21:05.730 So I sorted left half, sorted right half. 01:21:05.730 --> 01:21:06.647 What's the third step? 01:21:06.647 --> 01:21:07.980 AUDIENCE: Merge. 01:21:07.980 --> 01:21:08.730 SPEAKER 1: Merge. 01:21:08.730 --> 01:21:09.940 So how do I do that? 01:21:09.940 --> 01:21:11.190 Well, I look at the two lists. 01:21:11.190 --> 01:21:13.230 And how do I merge these together [? interleaving ?] them 01:21:13.230 --> 01:21:14.130 in the right order? 01:21:14.130 --> 01:21:18.810 2 comes first, then 4, then 5, then 7. 01:21:18.810 --> 01:21:21.850 So now I have sorted the left half of the original list. 01:21:21.850 --> 01:21:24.750 So what was step two originally? 01:21:24.750 --> 01:21:26.370 Sort the right half. 01:21:26.370 --> 01:21:29.820 So sort the right half means sort the left half. 01:21:29.820 --> 01:21:34.790 And then sort the left half of that, done, right half of that, done. 01:21:34.790 --> 01:21:38.220 Merge makes it interesting, 3 and then 6. 01:21:38.220 --> 01:21:40.770 I've now sorted the left half of four numbers. 01:21:40.770 --> 01:21:41.790 What comes next? 01:21:41.790 --> 01:21:44.250 Sort of right half, so 8 in 1. 01:21:44.250 --> 01:21:47.430 Sort the left half of that, done, right half of that, done. 01:21:47.430 --> 01:21:50.670 Now merge those two together, 1 and 8. 01:21:50.670 --> 01:21:53.610 I've now sorted the right half of the four elements. 01:21:53.610 --> 01:21:56.230 What's the third step? 01:21:56.230 --> 01:21:56.730 Merge. 01:21:56.730 --> 01:21:58.980 So it's left half, right half, merge, again and again. 01:21:58.980 --> 01:22:05.310 So right half, left half, let's merge them, 1, 3, 6, 8. 01:22:05.310 --> 01:22:07.530 And now if you rewind, like, two minutes, 01:22:07.530 --> 01:22:11.050 this is the right half of the whole list. 01:22:11.050 --> 01:22:13.180 So what's step three? 01:22:13.180 --> 01:22:13.680 Merge. 01:22:13.680 --> 01:22:15.570 So let's give ourselves a little more memory 01:22:15.570 --> 01:22:24.620 and merge these two, 1, 2, 3, 4, 5, 6, 7, 8. 01:22:24.620 --> 01:22:26.550 And my god, it's merged in the end. 01:22:26.550 --> 01:22:28.410 Now, that was a lot of steps. 01:22:28.410 --> 01:22:31.350 But it turns out it was far fewer than the number 01:22:31.350 --> 01:22:33.940 of steps we were used to thus far. 01:22:33.940 --> 01:22:36.600 In fact, if you consider what really happened, 01:22:36.600 --> 01:22:39.180 after all of those verbal gymnastics, what I really did was 01:22:39.180 --> 01:22:42.060 I took a list of size 8 and broke it down at some point 01:22:42.060 --> 01:22:43.195 into eight lists of size 1. 01:22:43.195 --> 01:22:45.570 And that's when there was no interesting work to be done. 01:22:45.570 --> 01:22:46.320 We just returned. 01:22:46.820 --> 01:22:50.760 But I did that so that I could then compose four lists of size 2 01:22:50.760 --> 01:22:51.720 along the way. 01:22:51.720 --> 01:22:54.545 And I did that so I could compose two lists of size 4. 01:22:54.545 --> 01:22:56.670 And I did that so that I could aggregate everything 01:22:56.670 --> 01:23:00.810 together and get one list of size 8. 01:23:00.810 --> 01:23:02.550 So notice the pattern here. 01:23:02.550 --> 01:23:05.310 If you go bottom up, even, here's one list. 01:23:05.310 --> 01:23:06.660 I divide it in half. 01:23:06.660 --> 01:23:07.980 I divided those halves in half. 01:23:07.980 --> 01:23:10.180 I divided those halves in halves. 01:23:10.180 --> 01:23:13.590 So what function or mathematics have we use 01:23:13.590 --> 01:23:16.110 to describe any process thus far since week 0, 01:23:16.110 --> 01:23:19.498 where we're doing something halves at a time? 01:23:19.498 --> 01:23:20.675 AUDIENCE: Logarithm. 01:23:20.675 --> 01:23:21.550 SPEAKER 1: Logarithm. 01:23:21.550 --> 01:23:25.150 So any time you see in CS50 and really in algorithms is more generally 01:23:25.150 --> 01:23:29.570 a process that is dividing and dividing and dividing again and again, 01:23:29.570 --> 01:23:31.750 there's a logarithm involved there. 01:23:31.750 --> 01:23:36.040 And indeed, the number of times that you can chop up a list of size 8 01:23:36.040 --> 01:23:40.600 into eight lists of size 1 is, by definition, log base 2 of n 01:23:40.600 --> 01:23:43.322 or just, again, with a wave of the hand, log n, which 01:23:43.322 --> 01:23:46.030 is to say like the height of this picture, if you will, is log n. 01:23:46.030 --> 01:23:48.405 But again, we don't have to worry too much about numbers. 01:23:48.405 --> 01:23:52.210 But every time we did that dividing into smaller lists, we merged, right? 01:23:52.210 --> 01:23:54.430 That was the third and most important step. 01:23:54.430 --> 01:23:58.810 And every time we merged, we combined 4 elements 01:23:58.810 --> 01:24:02.740 plus 4 elements or 2 plus 2 plus 2 plus 2 elements or 1 plus 1 01:24:02.740 --> 01:24:04.840 plus 1, 8 elements individually. 01:24:04.840 --> 01:24:07.370 So we touched all n elements. 01:24:07.370 --> 01:24:11.050 So this picture, if you will, is, like, 8 numbers wide. 01:24:11.050 --> 01:24:14.740 And I-- or n numbers wide, if we generalize as n. 01:24:14.740 --> 01:24:19.360 And it's log n rows tall, if you will, because that's how many times you 01:24:19.360 --> 01:24:20.890 can divide things again and again. 01:24:20.890 --> 01:24:24.970 So what is the running time intuitively, perhaps, of merge sort? 01:24:24.970 --> 01:24:28.240 It's actually n times log n because you've 01:24:28.240 --> 01:24:31.000 got n numbers that need to be merged again and again and again. 01:24:31.000 --> 01:24:34.000 But how many times did I say again? 01:24:34.000 --> 01:24:38.620 Log n times, because that's the number of times you can have things again 01:24:38.620 --> 01:24:40.758 and again and again. 01:24:40.758 --> 01:24:44.050 And if you do the math, log base 2 of 8, which is the total number of elements, 01:24:44.050 --> 01:24:45.970 indeed is 1, 2, 3. 01:24:45.970 --> 01:24:47.080 So math works out. 01:24:47.080 --> 01:24:49.660 But it's OK if you think about it more intuitively. 01:24:49.660 --> 01:24:51.640 So this is perhaps the bigger leap of faith, 01:24:51.640 --> 01:24:55.420 to just believe that, indeed, that is how that math works out. 01:24:55.420 --> 01:24:58.600 But it turns out that what this means is the algorithm 01:24:58.600 --> 01:25:01.610 itself is fundamentally faster. 01:25:01.610 --> 01:25:03.568 So if we consider our little chart from before, 01:25:03.568 --> 01:25:06.318 where bubble sort and selection sort were way up here at the top-- 01:25:06.318 --> 01:25:08.810 and frankly, you can have even slower algorithms than that, 01:25:08.810 --> 01:25:11.680 especially if the problems are even more difficult to solve. 01:25:11.680 --> 01:25:16.283 Now we can add to the list merge sort there at n log n. 01:25:16.283 --> 01:25:16.950 It's in between. 01:25:16.950 --> 01:25:17.450 Why? 01:25:17.450 --> 01:25:20.920 Because, again, even if you're not 100% comfortable with what log in is, 01:25:20.920 --> 01:25:22.270 notice that here's n. 01:25:22.270 --> 01:25:23.420 Here's n squared. 01:25:23.420 --> 01:25:27.910 So n times a slightly smaller value is in between, or n log n. 01:25:27.910 --> 01:25:31.540 And we'll see in a moment what this actually means or feels like. 01:25:31.540 --> 01:25:33.310 What about omega? 01:25:33.310 --> 01:25:37.460 In the best case with merge sort, how much time does it take? 01:25:37.460 --> 01:25:39.730 Well, it, too, does not have that optimization 01:25:39.730 --> 01:25:42.970 that bubble sort had, which is, well, if you do no swaps, just quit. 01:25:42.970 --> 01:25:45.940 It does the same thing always, sort the left half, sort the right half, 01:25:45.940 --> 01:25:48.430 merge, even if it's a bit unnecessary. 01:25:48.430 --> 01:25:53.710 So it turns out that the omega notation for merge sort is also n log n. 01:25:53.710 --> 01:25:58.540 The newer version of bubble sort, recall, we could get as good as n steps 01:25:58.540 --> 01:26:01.000 if we stop after seeing no swaps. 01:26:01.000 --> 01:26:02.920 So merge sort, it's a trade-off, right? 01:26:02.920 --> 01:26:05.038 In the worst case, much faster, I claim. 01:26:05.038 --> 01:26:05.830 It's not n squared. 01:26:05.830 --> 01:26:06.760 It's n log n. 01:26:06.760 --> 01:26:09.280 But in the best case, you might waste a little bit of time. 01:26:09.280 --> 01:26:11.800 And again, that's thematic in computer science more generally. 01:26:11.800 --> 01:26:13.210 You're not going to get anything for free. 01:26:13.210 --> 01:26:15.190 If you want to improve your upper bound, you 01:26:15.190 --> 01:26:18.493 might have to sacrifice your lower bound as well. 01:26:18.493 --> 01:26:20.410 Now, it turns out with some algorithms-- and I 01:26:20.410 --> 01:26:24.580 promise this is last Greek notation for the course. 01:26:24.580 --> 01:26:27.220 This is a capital theta in Greek. 01:26:27.220 --> 01:26:31.000 And it turns out that if an algorithm has an upper bound and a lower 01:26:31.000 --> 01:26:33.940 bound that are identical, you can describe it using, just 01:26:33.940 --> 01:26:35.860 for shorthand notation, theta. 01:26:35.860 --> 01:26:38.320 So we've seen two algorithms that fit this criteria. 01:26:38.320 --> 01:26:40.000 Selection sort was pretty bad. 01:26:40.000 --> 01:26:42.100 It was big O of n squared. 01:26:42.100 --> 01:26:44.320 And it was omega of n squared because it just 01:26:44.320 --> 01:26:47.590 kept blindly looking for the smallest elements again and again. 01:26:47.590 --> 01:26:51.910 Merge sort is in theta of n log n for the same reason. 01:26:51.910 --> 01:26:54.250 It just blindly does the same algorithm again and again, 01:26:54.250 --> 01:26:58.420 no matter whether the input is already sorted or completely unsorted. 01:26:58.420 --> 01:27:04.130 But on the whole, n log n is a pretty powerful, compelling feature. 01:27:04.130 --> 01:27:07.990 So let me go ahead and turn our attention, finally, 01:27:07.990 --> 01:27:11.530 to a little visualization that might help this sink in as well. 01:27:11.530 --> 01:27:15.760 What you're about to see is a bunch of vertical bars, the top of which 01:27:15.760 --> 01:27:17.590 are 100 bars from left to right. 01:27:17.590 --> 01:27:19.840 Small bars equals small number. 01:27:19.840 --> 01:27:21.160 Big bar equals big number. 01:27:21.160 --> 01:27:23.650 And the first algorithm up here is selection sort. 01:27:23.650 --> 01:27:25.810 The second algorithm down here is bubble sort. 01:27:25.810 --> 01:27:27.820 And the middle algorithm is merge sort. 01:27:27.820 --> 01:27:30.820 So if you will, we'll end on this note today. 01:27:30.820 --> 01:27:33.490 We'll time these algorithms with these simple inputs 01:27:33.490 --> 01:27:39.160 and see just how much better, I claim, merge sort is, which is to say, 01:27:39.160 --> 01:27:43.330 just how big of a difference does n squared versus n log n make, 01:27:43.330 --> 01:27:46.120 which is to say when you design algorithms, making things correct 01:27:46.120 --> 01:27:47.650 is not the ultimate goal. 01:27:47.650 --> 01:27:50.290 It's to make them well designed as well. 01:27:50.290 --> 01:27:53.284 [MUSIC PLAYING] 01:28:53.280 --> 01:28:55.020 That's it for C50 and merge sort. 01:28:55.020 --> 01:28:56.820 We will see you next time. 01:28:56.820 --> 01:28:59.570 [APPLAUSE]