1 00:00:00,000 --> 00:01:24,180 [MUSIC PLAYING] 2 00:01:24,180 --> 00:01:25,890 SPEAKER 1: This is CS50. 3 00:01:25,890 --> 00:01:27,870 And perhaps not unlike past lectures, today 4 00:01:27,870 --> 00:01:30,187 is going to feel a bit like a fire hose again, 5 00:01:30,187 --> 00:01:32,520 but realize that it's going to be a lot less code today. 6 00:01:32,520 --> 00:01:35,550 So there's less syntax, just a few new ideas here and there. 7 00:01:35,550 --> 00:01:38,550 And the goal ultimately today really is the concepts and the ideas 8 00:01:38,550 --> 00:01:39,360 that we take away. 9 00:01:39,360 --> 00:01:41,735 And keep in mind that over the remainder of the semester, 10 00:01:41,735 --> 00:01:44,710 we will continue to apply and reapply these same ideas. 11 00:01:44,710 --> 00:01:46,380 So the goal today really is exposure. 12 00:01:46,380 --> 00:01:48,690 And the goal for the semester is comfort. 13 00:01:48,690 --> 00:01:52,058 So with that said, let's consider where we left off 14 00:01:52,058 --> 00:01:54,600 last time, which was to consider that inside of our computer, 15 00:01:54,600 --> 00:01:55,683 of course, we have memory. 16 00:01:55,683 --> 00:01:57,360 We have RAM, Random Access Memory. 17 00:01:57,360 --> 00:02:00,180 And it was convenient, we found, to start to divide this up 18 00:02:00,180 --> 00:02:01,440 into individual bytes. 19 00:02:01,440 --> 00:02:05,520 Like, byte 0 might be in the top left, and 2 gigabytes, 2 billion bytes, 20 00:02:05,520 --> 00:02:08,370 might be all the way down there in the bottom right-hand corner. 21 00:02:08,370 --> 00:02:10,199 And once we started to lay out our memory, 22 00:02:10,199 --> 00:02:14,730 both conceptually and technologically, in this way, left to right, top 23 00:02:14,730 --> 00:02:18,420 to bottom, we had the ability to use a data structure of sorts. 
24 00:02:18,420 --> 00:02:21,330 We introduced, recall, arrays, whereby if we think of our memory 25 00:02:21,330 --> 00:02:24,180 as just a grid of bytes, we can start using it 26 00:02:24,180 --> 00:02:26,610 kind of to our advantage to solve problems. 27 00:02:26,610 --> 00:02:30,660 And it turns out that perhaps a physical incarnation of this idea of an array 28 00:02:30,660 --> 00:02:32,790 might be like these red lockers here. 29 00:02:32,790 --> 00:02:35,970 Because even though you and I, every time we've looked at arrays thus far, 30 00:02:35,970 --> 00:02:39,330 can kind of get a bird's-eye view of everything that's in the array, 31 00:02:39,330 --> 00:02:40,830 a computer's actually pretty limited. 32 00:02:40,830 --> 00:02:43,387 It doesn't have that instant detection that you and I 33 00:02:43,387 --> 00:02:46,470 have when we just scan a list of numbers and kind of take it all in. 34 00:02:46,470 --> 00:02:50,760 A computer is only going to be able to look at the contents of an array step 35 00:02:50,760 --> 00:02:53,692 by step by step, consistent with this whole idea of an algorithm. 36 00:02:53,692 --> 00:02:56,400 A computer can't just look at all the numbers and take it all in. 37 00:02:56,400 --> 00:03:00,000 It can only open, so to speak, one of these lockers at a time. 38 00:03:00,000 --> 00:03:03,120 So toward that end, we've gone ahead and populated these lockers 39 00:03:03,120 --> 00:03:04,545 with a whole bunch of numbers. 40 00:03:04,545 --> 00:03:06,420 And the goal at hand is to solve the problem. 41 00:03:06,420 --> 00:03:08,693 The goal at hand is to find one of those numbers. 42 00:03:08,693 --> 00:03:11,610 So if we distill computer science into this problem-solving mechanism, 43 00:03:11,610 --> 00:03:14,190 the input today is these seven lockers. 44 00:03:14,190 --> 00:03:18,090 And the output is going to be a Boolean, true or false, 45 00:03:18,090 --> 00:03:21,480 is the number we're looking for among those seven lockers? 
46 00:03:21,480 --> 00:03:25,140 And so rather than my kind of poking around looking for this number, 47 00:03:25,140 --> 00:03:31,050 might I call on, say, two volunteers to kick us off with two, say, algorithms? 48 00:03:31,050 --> 00:03:32,310 Let's see. 49 00:03:32,310 --> 00:03:33,180 Over here? 50 00:03:33,180 --> 00:03:33,870 Yeah. 51 00:03:33,870 --> 00:03:37,710 And let's go over here in front, if we may. 52 00:03:37,710 --> 00:03:39,560 Come on up. 53 00:03:39,560 --> 00:03:40,560 And what are your names? 54 00:03:40,560 --> 00:03:41,500 AUDIENCE: [? Nizari. ?] 55 00:03:41,500 --> 00:03:41,930 SPEAKER 1: Lizari? 56 00:03:41,930 --> 00:03:42,430 AUDIENCE: [? Nizari. ?] 57 00:03:42,430 --> 00:03:44,050 SPEAKER 1: [? Nizari. ?] OK, David. 58 00:03:44,050 --> 00:03:44,800 Nice to meet you. 59 00:03:44,800 --> 00:03:47,670 Come on up, [? Nizari. ?] And over here, what's your name? 60 00:03:47,670 --> 00:03:48,467 AUDIENCE: Eric. 61 00:03:48,467 --> 00:03:49,300 SPEAKER 1: Eric, OK. 62 00:03:49,300 --> 00:03:49,800 David. 63 00:03:49,800 --> 00:03:51,210 And come on up, Brian, as well. 64 00:03:51,210 --> 00:03:52,670 Nice to meet you, Eric. 65 00:03:52,670 --> 00:03:54,560 Eric, [? Nizari. ?] [? Nizari, ?] Eric. 66 00:03:54,560 --> 00:03:55,890 [APPLAUSE] 67 00:03:55,890 --> 00:03:57,630 Come on over here. 68 00:03:57,630 --> 00:03:59,730 So you came on up the stage first. 69 00:03:59,730 --> 00:04:01,530 Would you like to go first or go second? 70 00:04:01,530 --> 00:04:02,270 AUDIENCE: I'll go second. 71 00:04:02,270 --> 00:04:03,510 SPEAKER 1: You're going to go second. 72 00:04:03,510 --> 00:04:04,260 So Eric, you're up first. 73 00:04:04,260 --> 00:04:05,593 Come on over here, if you would. 74 00:04:05,593 --> 00:04:10,260 So Eric, behind these seven doors we have placed, in advance, the number 50. 75 00:04:10,260 --> 00:04:12,570 And we would simply like you, the computer, 76 00:04:12,570 --> 00:04:15,560 to search this array for the number 50. 
77 00:04:15,560 --> 00:04:17,440 AUDIENCE: Is it sorted? 78 00:04:17,440 --> 00:04:21,180 SPEAKER 1: I cannot answer that question at this time. 79 00:04:21,180 --> 00:04:23,582 Go. 80 00:04:23,582 --> 00:04:25,290 Oh, and so that the audience knows what's 81 00:04:25,290 --> 00:04:28,170 going on, if you wouldn't mind taking the numbers out to see. 82 00:04:28,170 --> 00:04:29,360 AUDIENCE: This is seven. 83 00:04:29,360 --> 00:04:30,480 SPEAKER 1: Excellent. 84 00:04:30,480 --> 00:04:31,310 Not 50. 85 00:04:31,310 --> 00:04:33,160 AUDIENCE: Should I-- can I take it out? 86 00:04:33,160 --> 00:04:35,827 SPEAKER 1: At this point, yes, you may do whatever you want now. 87 00:04:35,827 --> 00:04:37,594 Just find us 50. 88 00:04:37,594 --> 00:04:38,940 AUDIENCE: Two. 89 00:04:38,940 --> 00:04:39,860 [LAUGHTER] 90 00:04:39,860 --> 00:04:42,216 SPEAKER 1: Very good. 91 00:04:42,216 --> 00:04:43,400 AUDIENCE: One. 92 00:04:43,400 --> 00:04:45,972 SPEAKER 1: Nice. 93 00:04:45,972 --> 00:04:47,286 AUDIENCE: Six. 94 00:04:47,286 --> 00:04:50,094 SPEAKER 1: Very good. 95 00:04:50,094 --> 00:04:50,800 AUDIENCE: Three. 96 00:04:50,800 --> 00:04:51,300 97 00:04:51,300 --> 00:04:51,967 SPEAKER 1: Nice. 98 00:04:51,967 --> 00:04:53,633 AUDIENCE: None of these are close to 50. 99 00:04:53,633 --> 00:04:55,620 SPEAKER 1: No, none of them are close to 50. 100 00:04:55,620 --> 00:04:56,355 AUDIENCE: Four. 101 00:04:56,355 --> 00:04:57,810 SPEAKER 1: Four? 102 00:04:57,810 --> 00:04:59,748 And? 103 00:04:59,748 --> 00:05:00,720 AUDIENCE: 50! 104 00:05:00,720 --> 00:05:01,770 SPEAKER 1: Amazing! 105 00:05:01,770 --> 00:05:02,770 Very well done. 106 00:05:02,770 --> 00:05:04,960 [APPLAUSE] 107 00:05:04,960 --> 00:05:05,785 108 00:05:05,785 --> 00:05:06,410 Very well done. 109 00:05:06,410 --> 00:05:08,390 Now, if I may, Eric, what was the algorithm 110 00:05:08,390 --> 00:05:10,160 via which you found us the number 50? 111 00:05:10,160 --> 00:05:11,195 AUDIENCE: Linear search. 
112 00:05:11,195 --> 00:05:13,520 SPEAKER 1: OK, linear search, meaning what to you? 113 00:05:13,520 --> 00:05:17,215 AUDIENCE: You just go in a line, starting from there until there. 114 00:05:17,215 --> 00:05:19,340 SPEAKER 1: OK, that was a very sophisticated answer 115 00:05:19,340 --> 00:05:20,798 to a term we've not yet introduced. 116 00:05:20,798 --> 00:05:24,290 And that's great, linear search from left to right, so literally 117 00:05:24,290 --> 00:05:25,043 following a line. 118 00:05:25,043 --> 00:05:26,960 And was your algorithm correct, would you say? 119 00:05:26,960 --> 00:05:27,605 AUDIENCE: Yes. 120 00:05:27,605 --> 00:05:29,010 SPEAKER 1: OK, so it was correct. 121 00:05:29,010 --> 00:05:31,048 But there's these different parameters that we 122 00:05:31,048 --> 00:05:33,590 want to optimize solutions for not just correctness, but what 123 00:05:33,590 --> 00:05:35,110 other property as well? 124 00:05:35,110 --> 00:05:35,870 AUDIENCE: Design. 125 00:05:35,870 --> 00:05:37,953 SPEAKER 1: So maybe design, right, the efficiency. 126 00:05:37,953 --> 00:05:40,100 So was that the most efficient you could have done? 127 00:05:40,100 --> 00:05:43,160 AUDIENCE: Actually, yeah, I think so. 128 00:05:43,160 --> 00:05:43,940 [LAUGHTER] 129 00:05:43,940 --> 00:05:44,940 SPEAKER 1: And why do you say that? 130 00:05:44,940 --> 00:05:46,950 AUDIENCE: Because-- so the numbers aren't sorted. 131 00:05:46,950 --> 00:05:50,160 So at the end of the day, I have to look through every single one. 132 00:05:50,160 --> 00:05:50,360 SPEAKER 1: Yeah. 133 00:05:50,360 --> 00:05:52,520 AUDIENCE: And it's just by chance that the 50 was at last. 134 00:05:52,520 --> 00:05:53,312 SPEAKER 1: Exactly. 135 00:05:53,312 --> 00:05:55,370 So it's unfortunate that they were all random. 136 00:05:55,370 --> 00:05:57,680 And I didn't want to tell you because I didn't want to bias your algorithm one 137 00:05:57,680 --> 00:05:58,388 way or the other. 
138 00:05:58,388 --> 00:06:01,400 But not knowing if they're sorted and them not even being sorted 139 00:06:01,400 --> 00:06:04,968 means that that is the best you can do, look at all of the doors 140 00:06:04,968 --> 00:06:06,260 to find the number in question. 141 00:06:06,260 --> 00:06:08,908 And maybe you could have gotten lucky, if we had put 50 here. 142 00:06:08,908 --> 00:06:10,700 But in the worst case, Eric was, of course, 143 00:06:10,700 --> 00:06:13,280 going to have to do exactly that, searching all of the boxes. 144 00:06:13,280 --> 00:06:14,480 So thank you, Eric. 145 00:06:14,480 --> 00:06:16,100 Stay on stage with us, if you would, for a moment. 146 00:06:16,100 --> 00:06:18,230 And a round of applause, if we could, for finding 50 so well. 147 00:06:18,230 --> 00:06:18,730 [APPLAUSE] 148 00:06:18,730 --> 00:06:21,120 [? Nizari, ?] could you come on up? 149 00:06:21,120 --> 00:06:23,390 We need you not to look at the numbers, because Brian 150 00:06:23,390 --> 00:06:24,830 needs to do a little bit of magic. 151 00:06:24,830 --> 00:06:27,170 And he's going to put some of the numbers back into the locker. 152 00:06:27,170 --> 00:06:29,253 So literally everyone in the room will know what's 153 00:06:29,253 --> 00:06:30,932 going on except you, at the moment. 154 00:06:30,932 --> 00:06:33,140 But we're going to give you the added bonus this time 155 00:06:33,140 --> 00:06:36,080 of sorting the numbers in advance. 156 00:06:36,080 --> 00:06:38,567 So Brian is in the process of sorting some numbers for us. 157 00:06:38,567 --> 00:06:40,400 The goal at hand, in just a moment, is still 158 00:06:40,400 --> 00:06:42,380 going to be to find the number 50. 159 00:06:42,380 --> 00:06:45,780 I'm really just stalling right now because he's still doing this. 160 00:06:45,780 --> 00:06:49,730 So I don't really have anything interesting to say just yet. 161 00:06:49,730 --> 00:06:50,660 Brian's back now. 162 00:06:50,660 --> 00:06:51,460 Hold on. 
163 00:06:51,460 --> 00:06:52,790 And would you like to introduce yourself maybe? 164 00:06:52,790 --> 00:06:53,580 AUDIENCE: I'm [? Nizari. ?] 165 00:06:53,580 --> 00:06:54,590 SPEAKER 1: [? Nizari, ?] and what year are you? 166 00:06:54,590 --> 00:06:56,615 AUDIENCE: I'm a high school student, a senior. 167 00:06:56,615 --> 00:06:57,490 SPEAKER 1: Wonderful. 168 00:06:57,490 --> 00:06:58,130 At what school? 169 00:06:58,130 --> 00:06:59,300 AUDIENCE: Cambridge Rindge and Latin, it's down the street. 170 00:06:59,300 --> 00:07:00,572 SPEAKER 1: Just down the road. 171 00:07:00,572 --> 00:07:02,030 So glad you can join us here today. 172 00:07:02,030 --> 00:07:03,770 And perfect timing, if I may. 173 00:07:03,770 --> 00:07:06,230 Now we have seven lockers here behind you. 174 00:07:06,230 --> 00:07:08,270 And the goal now is to still find the number 50. 175 00:07:08,270 --> 00:07:10,860 But I'm going to tell you that the numbers are sorted. 176 00:07:10,860 --> 00:07:14,100 So what's going to be your algorithm, if not the same as Eric? 177 00:07:14,100 --> 00:07:15,100 AUDIENCE: I will start-- 178 00:07:15,100 --> 00:07:15,510 SPEAKER 1: And here you go. 179 00:07:15,510 --> 00:07:17,450 AUDIENCE: I'm going to start in the middle. 180 00:07:17,450 --> 00:07:19,400 SPEAKER 1: All right, go ahead and show us what's in the middle. 181 00:07:19,400 --> 00:07:20,805 AUDIENCE: Middle number is seven. 182 00:07:20,805 --> 00:07:21,680 SPEAKER 1: All right. 183 00:07:21,680 --> 00:07:23,695 And now what's your next step going to be? 184 00:07:23,695 --> 00:07:25,070 AUDIENCE: So I want to get to 50. 185 00:07:25,070 --> 00:07:27,770 So assuming that they're sorted, I'm going to go this way. 186 00:07:27,770 --> 00:07:28,700 SPEAKER 1: Go to the right, OK. 187 00:07:28,700 --> 00:07:31,117 So we have three lockers remaining on the right-hand side. 188 00:07:31,117 --> 00:07:32,240 What's your instinct now? 
189 00:07:32,240 --> 00:07:34,620 AUDIENCE: Mm, I'm going to start with this locker. 190 00:07:34,620 --> 00:07:36,530 SPEAKER 1: OK, this one being in the middle of those three. 191 00:07:36,530 --> 00:07:36,910 And you find? 192 00:07:36,910 --> 00:07:37,940 AUDIENCE: And we got 81. 193 00:07:37,940 --> 00:07:38,435 SPEAKER 1: 81. 194 00:07:38,435 --> 00:07:39,800 AUDIENCE: So I know that's too big. 195 00:07:39,800 --> 00:07:40,620 SPEAKER 1: Way too far. 196 00:07:40,620 --> 00:07:42,260 AUDIENCE: Hoo, so I'm going to go with this one. 197 00:07:42,260 --> 00:07:42,350 198 00:07:42,350 --> 00:07:43,970 SPEAKER 1: Which is now in the middle of the two lockers. 199 00:07:43,970 --> 00:07:44,590 AUDIENCE: And I got 50. 200 00:07:44,590 --> 00:07:47,574 SPEAKER 1: And a round of applause, if we could, for [? Nizari. ?] 201 00:07:47,574 --> 00:07:50,240 [APPLAUSE] 202 00:07:50,240 --> 00:07:52,190 Congratulations and thank you to you both. 203 00:07:52,190 --> 00:07:53,340 So thanks to you both. 204 00:07:53,340 --> 00:07:57,260 So here were two algorithms, dubbed linear search and binary search. 205 00:07:57,260 --> 00:08:00,440 And that's all we have for you right now. 206 00:08:00,440 --> 00:08:01,730 [LAUGHTER] 207 00:08:01,730 --> 00:08:04,190 So linear search and binary search are aptly 208 00:08:04,190 --> 00:08:06,320 named for exactly the reasons we saw. 209 00:08:06,320 --> 00:08:09,230 Eric literally walked across in a line looking 210 00:08:09,230 --> 00:08:12,110 for some element, where [? Nizari, ?] instead, actually used 211 00:08:12,110 --> 00:08:14,510 binary search, "bi" meaning two, and being 212 00:08:14,510 --> 00:08:18,080 very reminiscent of our discussion of phone books in week 0, 213 00:08:18,080 --> 00:08:20,153 when I did this divide-and-conquer approach. 
214 00:08:20,153 --> 00:08:22,070 That, too, was called, even though I might not 215 00:08:22,070 --> 00:08:24,620 have labeled it as such, binary search because I 216 00:08:24,620 --> 00:08:27,995 kept dividing the problem in two, hence the "bi" in binary. 217 00:08:27,995 --> 00:08:30,150 Binary search was again and again and again, 218 00:08:30,150 --> 00:08:34,773 just as we did here when searching for 50 the second time around. 219 00:08:34,773 --> 00:08:36,440 So these, of course, are two algorithms. 220 00:08:36,440 --> 00:08:39,380 But let's now start to formalize this discussion a little bit 221 00:08:39,380 --> 00:08:43,340 and consider how it was each of them was able to solve the problem correctly 222 00:08:43,340 --> 00:08:45,470 and then ultimately with better design. 223 00:08:45,470 --> 00:08:48,740 So linear search, we might distill as pseudocode like this 224 00:08:48,740 --> 00:08:52,408 and again, pseudocode, English-like syntax, no one way to write this. 225 00:08:52,408 --> 00:08:54,950 Eric, if I can put words in your mouth, might have done this. 226 00:08:54,950 --> 00:08:58,190 You might have thought to yourself for i from 0 to n minus 1, 227 00:08:58,190 --> 00:09:03,230 to very quickly kind of map it to the idea of code, where this is locker 0, 228 00:09:03,230 --> 00:09:07,820 and this is locker n minus 1, or 6, specifically, in this case, 229 00:09:07,820 --> 00:09:09,350 with 7 total lockers. 230 00:09:09,350 --> 00:09:13,670 He then checked if the ith element-- ith just meaning the one he's currently 231 00:09:13,670 --> 00:09:14,330 looking at-- 232 00:09:14,330 --> 00:09:16,910 happens to be 50, then go ahead and return true, 233 00:09:16,910 --> 00:09:19,910 the bool that was meant to be the output of this algorithm. 234 00:09:19,910 --> 00:09:22,490 And he kept doing that and doing that and doing that. 235 00:09:22,490 --> 00:09:24,710 But suppose 50 were not there. 
236 00:09:24,710 --> 00:09:28,160 And suppose he got all the way here, to where there is no locker. 237 00:09:28,160 --> 00:09:29,736 What should he ultimately return? 238 00:09:29,736 --> 00:09:30,430 AUDIENCE: False. 239 00:09:30,430 --> 00:09:31,263 SPEAKER 1: So false. 240 00:09:31,263 --> 00:09:34,910 And so the very last step of this algorithm not inside of that loop 241 00:09:34,910 --> 00:09:38,013 has to be kind of a catch all, where you just say, return false. 242 00:09:38,013 --> 00:09:40,430 If I got all the way through this loop and didn't find it, 243 00:09:40,430 --> 00:09:42,840 it must be the case that 50 is simply not there. 244 00:09:42,840 --> 00:09:45,660 So that might be one way to write the pseudocode for this problem. 245 00:09:45,660 --> 00:09:50,600 But now let's consider for a moment just how efficient or inefficient that code 246 00:09:50,600 --> 00:09:53,300 might have been vis a vis the second algorithm, [? Nizari, ?] 247 00:09:53,300 --> 00:09:56,470 where she actually divided the problem in half in half in half. 248 00:09:56,470 --> 00:09:58,280 That, of course, was called binary search. 249 00:09:58,280 --> 00:10:00,447 And we can write this in any number of ways as well. 250 00:10:00,447 --> 00:10:02,120 But in pseudocode, I might propose this. 251 00:10:02,120 --> 00:10:03,870 Look right in the middle, just as she did. 252 00:10:03,870 --> 00:10:07,160 And if that number is 50, what should she have returned or outputted? 253 00:10:07,160 --> 00:10:08,150 AUDIENCE: True. 254 00:10:08,150 --> 00:10:09,650 SPEAKER 1: So true as our bool. 255 00:10:09,650 --> 00:10:14,600 And so we might have done this, else if 50 were less than the middle item. 256 00:10:14,600 --> 00:10:16,940 She probably wanted to search to the left, 257 00:10:16,940 --> 00:10:20,300 just as when I was searching for Mike Smith, I might have gone left or right. 
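The linear search pseudocode described above (loop from locker 0 to locker n minus 1, return true on a match, return false after the loop) might be sketched in C as follows. The function name `linear_search` and its parameters are assumptions made for illustration; they are not taken from the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: return true if target is behind any of the
// n lockers, checking them one at a time from left to right.
bool linear_search(const int lockers[], size_t n, int target)
{
    for (size_t i = 0; i < n; i++)
    {
        // "If the ith element is the target, return true"
        if (lockers[i] == target)
        {
            return true;
        }
    }

    // The catch-all outside the loop: every locker was opened
    // and none of them held the target.
    return false;
}
```

Called on the seven values Eric uncovered on stage (7, 2, 1, 6, 3, 4, 50) with a target of 50, this sketch returns true only after opening the last locker.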
258 00:10:20,300 --> 00:10:25,250 So if 50 is less than the middle item, she might want to search the left half. 259 00:10:25,250 --> 00:10:28,040 Meanwhile, if 50 is greater than the middle item, 260 00:10:28,040 --> 00:10:31,160 then she might want to search instead the right half. 261 00:10:31,160 --> 00:10:34,730 But there is a fourth possibility, just to be safe here. 262 00:10:34,730 --> 00:10:36,050 What else might be the case? 263 00:10:36,050 --> 00:10:39,742 It's not in the middle, and it's not to the left, and it's not to the right. 264 00:10:39,742 --> 00:10:40,700 So it's just not there. 265 00:10:40,700 --> 00:10:43,550 And so there's actually a fourth case, and we can express this differently. 266 00:10:43,550 --> 00:10:45,425 I'm going to go ahead and just say at the top 267 00:10:45,425 --> 00:10:49,070 if there's no items in the list, let me go ahead and just claim return false. 268 00:10:49,070 --> 00:10:50,120 There's nothing there. 269 00:10:50,120 --> 00:10:53,360 After all, if I keep dividing a list in half and half and half and half, 270 00:10:53,360 --> 00:10:55,250 eventually there's going to be no list left. 271 00:10:55,250 --> 00:10:58,160 At which point, I should just conclude, oh, it clearly wasn't there. 272 00:10:58,160 --> 00:11:02,623 If I halved it so many times, nothing is left on the right or the left. 273 00:11:02,623 --> 00:11:04,040 So how might we now think of this? 274 00:11:04,040 --> 00:11:07,230 Well, just as in week 0, we had a picture like this. 275 00:11:07,230 --> 00:11:09,740 And we claimed that these algorithms were either 276 00:11:09,740 --> 00:11:11,600 linear in nature, literally a straight line, 277 00:11:11,600 --> 00:11:14,780 like Eric's, or a little more curved or logarithmic, 278 00:11:14,780 --> 00:11:16,640 so to speak, like [? Nizari's. ?] And these 279 00:11:16,640 --> 00:11:18,098 had fundamentally different shapes. 
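The four cases of the binary search pseudocode (no items left, found in the middle, search the left half, search the right half) could be sketched recursively in C like this. The name `binary_search` and the pointer arithmetic used to describe each half are illustrative assumptions, and the sketch only works if the array is already sorted in ascending order.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: search a sorted array of n lockers for target
// by repeatedly halving the range that could still contain it.
bool binary_search(const int lockers[], size_t n, int target)
{
    // Case 1: no items left, so the target cannot be here.
    if (n == 0)
    {
        return false;
    }

    size_t middle = n / 2;

    if (lockers[middle] == target)
    {
        // Case 2: the middle locker holds the target.
        return true;
    }
    else if (target < lockers[middle])
    {
        // Case 3: search the left half (the lockers before the middle).
        return binary_search(lockers, middle, target);
    }
    else
    {
        // Case 4: search the right half (the lockers after the middle).
        return binary_search(lockers + middle + 1, n - middle - 1, target);
    }
}
```

With seven sorted lockers, this sketch opens the middle locker, then the middle of the remaining three, then the single locker left, which mirrors the three lockers opened on stage (7, then 81, then 50).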
280 00:11:18,098 --> 00:11:20,420 And we refer to them really by the number of steps 281 00:11:20,420 --> 00:11:22,220 they might take in the worst case. 282 00:11:22,220 --> 00:11:26,720 If the phone book or today, the number of lockers was n in total, 283 00:11:26,720 --> 00:11:29,120 it might take as many as n steps for Eric or anyone 284 00:11:29,120 --> 00:11:32,790 to find Mike Smith or the number 50 from left to right. 285 00:11:32,790 --> 00:11:37,910 If in week 0, I did two pages at a time, you can actually speed that up, 286 00:11:37,910 --> 00:11:39,800 but the shape of the line was the same. 287 00:11:39,800 --> 00:11:41,360 Eric didn't do that here, but he could have. 288 00:11:41,360 --> 00:11:43,970 With two hands, he maybe could have looked at two lockers at once. 289 00:11:43,970 --> 00:11:46,970 So that might have been an intermediate step between those two extremes. 290 00:11:46,970 --> 00:11:49,160 But logarithmic was this more curved shape. 291 00:11:49,160 --> 00:11:52,010 But today, we're going to start to formalize this a little bit 292 00:11:52,010 --> 00:11:54,380 so that we don't keep talking about searching, 293 00:11:54,380 --> 00:11:58,100 and binary search and linear search alone, but other algorithms as well. 294 00:11:58,100 --> 00:12:00,470 And computer scientists now actually have 295 00:12:00,470 --> 00:12:03,680 terminology with which to describe algorithms 296 00:12:03,680 --> 00:12:06,140 and just how well designed your algorithm is 297 00:12:06,140 --> 00:12:08,090 or how well implemented your code is. 298 00:12:08,090 --> 00:12:12,050 And it's generally called big O, literally a capital, italicized O. 299 00:12:12,050 --> 00:12:14,870 Big O notation just means on the order of. 
300 00:12:14,870 --> 00:12:19,490 So if you were asked by someone what is the efficiency of your algorithm 301 00:12:19,490 --> 00:12:21,620 or the efficiency of your code, you could kind of 302 00:12:21,620 --> 00:12:23,910 wave your hand, literally and figuratively, 303 00:12:23,910 --> 00:12:27,710 and give them an approximation of just how fast or slow your code is. 304 00:12:27,710 --> 00:12:31,840 So instead of saying literally n steps or n/2 or log n steps, 305 00:12:31,840 --> 00:12:33,920 a computer scientist would typically say, ah, 306 00:12:33,920 --> 00:12:38,240 that algorithm is on the order of n or on the order of n/2 307 00:12:38,240 --> 00:12:40,220 or on the order of log n. 308 00:12:40,220 --> 00:12:43,160 So this is just cryptic-looking syntax that you pronounce verbally 309 00:12:43,160 --> 00:12:44,840 as "on the order of." 310 00:12:44,840 --> 00:12:48,470 And it's kind of written like a math function, just as we have here. 311 00:12:48,470 --> 00:12:51,050 But it turns out that when you're using big O notation, 312 00:12:51,050 --> 00:12:52,520 it really is kind of hand-waving. 313 00:12:52,520 --> 00:12:54,560 Like, it's just meant to be an approximation. 314 00:12:54,560 --> 00:12:55,460 And you know what? 315 00:12:55,460 --> 00:12:59,030 In this case here, these lines are so similar looking, 316 00:12:59,030 --> 00:13:01,370 I'm actually going to throw away the divided by 2. 317 00:13:01,370 --> 00:13:03,470 And we'll see why this is OK in just a moment. 318 00:13:03,470 --> 00:13:06,830 But those are so similar that I'm just going to call them the same thing. 319 00:13:06,830 --> 00:13:09,950 And it turns out-- and it's fine if you don't recall logarithms too well-- 320 00:13:09,950 --> 00:13:11,660 the base 2 there doesn't really matter. 321 00:13:11,660 --> 00:13:12,868 I'm going to throw that away. 322 00:13:12,868 --> 00:13:14,510 It can be base 2 or 3 or 10. 
323 00:13:14,510 --> 00:13:16,527 They're all within multiples of one another. 324 00:13:16,527 --> 00:13:18,110 So that's no big deal either, I claim. 325 00:13:18,110 --> 00:13:20,330 And if you don't recall, that's OK, too. 326 00:13:20,330 --> 00:13:23,780 But the reason I claim that this red line and this yellow line 327 00:13:23,780 --> 00:13:27,950 are essentially the same thing is because if the problem gets big enough, 328 00:13:27,950 --> 00:13:30,500 that is the size of the problem gets bigger and bigger, 329 00:13:30,500 --> 00:13:34,580 and I only have so much screen here-- so let me instead just zoom out so that we 330 00:13:34,580 --> 00:13:37,010 see more y-axis and more x-axis. 331 00:13:37,010 --> 00:13:40,160 Notice how much closer the yellow and red lines even get to one another. 332 00:13:40,160 --> 00:13:42,080 And honestly, if I kept zooming out so that we 333 00:13:42,080 --> 00:13:44,300 could see bigger and bigger and bigger problems, 334 00:13:44,300 --> 00:13:46,343 these, frankly, would look pretty much the same. 335 00:13:46,343 --> 00:13:49,260 So when a computer scientist describes the efficiency of an algorithm, 336 00:13:49,260 --> 00:13:53,930 they say it's on the order of n, even if it's technically on the order of n/2. 337 00:13:53,930 --> 00:13:56,930 And here, too, on the order of log n, again irrespective 338 00:13:56,930 --> 00:13:57,810 of what the base is. 339 00:13:57,810 --> 00:13:58,730 So it's kind of nice, right? 340 00:13:58,730 --> 00:14:01,772 Even though it looks a little mathy, you can still kind of wave your hand 341 00:14:01,772 --> 00:14:04,107 and approximate just a little bit. 342 00:14:04,107 --> 00:14:06,440 So there are different algorithms, though, in the world. 343 00:14:06,440 --> 00:14:10,370 And here's kind of a cheat sheet of common running times. 
344 00:14:10,370 --> 00:14:12,530 A running time is just how much time it takes 345 00:14:12,530 --> 00:14:15,140 for your program or your algorithm to run, 346 00:14:15,140 --> 00:14:18,350 how many seconds does it take, 347 00:14:18,350 --> 00:14:21,680 how many steps does it take, whatever your unit of measure is. 348 00:14:21,680 --> 00:14:24,470 And we'll see on the list here some familiar terms. 349 00:14:24,470 --> 00:14:27,890 If I were to label this chart now with a couple of the algorithms we've seen, 350 00:14:27,890 --> 00:14:30,380 linear search, we'll say, is in big O of n. 351 00:14:30,380 --> 00:14:34,070 In the worst case, Eric is going to have to look at all of the lockers, 352 00:14:34,070 --> 00:14:38,870 just like a few weeks ago I had to look at all of the pages in the phone book 353 00:14:38,870 --> 00:14:40,910 maximally to find Mike Smith. 354 00:14:40,910 --> 00:14:43,430 And just to be clear, where's binary search going 355 00:14:43,430 --> 00:14:46,470 to be in this list of running times? 356 00:14:46,470 --> 00:14:47,432 AUDIENCE: Log n. 357 00:14:47,432 --> 00:14:48,140 SPEAKER 1: Log n. 358 00:14:48,140 --> 00:14:49,280 So it's actually better. 359 00:14:49,280 --> 00:14:52,792 Lower on this chart is better, at least in terms of time required, 360 00:14:52,792 --> 00:14:53,750 than anything above it. 361 00:14:53,750 --> 00:14:55,170 So we've seen this thus far. 362 00:14:55,170 --> 00:14:57,170 And now this sort of invites the question, well, 363 00:14:57,170 --> 00:14:59,120 what algorithms kind of go here or here? 364 00:14:59,120 --> 00:15:00,150 Which ones are slower? 365 00:15:00,150 --> 00:15:01,070 Which ones are faster? 366 00:15:01,070 --> 00:15:03,395 That'll be one of the things we look at here today. 367 00:15:03,395 --> 00:15:06,020 But computer scientists have another sort of tool in the toolkit 368 00:15:06,020 --> 00:15:07,650 that we want to introduce you to today. 
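To make the "more curved" logarithmic shape concrete, here is a small C sketch that counts how many times a problem of size n can be halved before nothing is left. The function name `halving_steps` is an assumption for illustration, and the count is only a rough stand-in for the worst-case number of lockers a binary search would open.

```c
#include <stddef.h>

// Hypothetical helper: count how many times n can be cut in half
// (rounding down) before no items remain. This grows roughly like
// log base 2 of n, whereas a left-to-right search grows like n.
size_t halving_steps(size_t n)
{
    size_t steps = 0;
    while (n > 0)
    {
        n /= 2;
        steps++;
    }
    return steps;
}
```

Under these assumptions, 7 lockers need at most 3 halvings and 1,000 items need at most 10, while a linear search could need all 7 or all 1,000 steps.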
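The claim that the divided-by-2 and the base of the logarithm can be thrown away can be written out as a short worked equation. This is a sketch of the usual argument, not something shown in the lecture: in both cases the two quantities differ only by a constant multiple, which big O notation ignores.

```latex
\[
\frac{n}{2} = \frac{1}{2}\cdot n
\quad\Longrightarrow\quad
\frac{n}{2} \in O(n)
\]
\[
\log_{2} n = \frac{\log_{10} n}{\log_{10} 2} \approx 3.32\,\log_{10} n
\quad\Longrightarrow\quad
\log_{2} n \in O(\log n)\ \text{for any fixed base}
\]
```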
369 00:15:07,650 --> 00:15:11,570 And this is just a capital Greek omega, this symbol here. 370 00:15:11,570 --> 00:15:15,470 And this just refers to not a-- 371 00:15:15,470 --> 00:15:17,450 it's the opposite of big O, if you will. 372 00:15:17,450 --> 00:15:21,200 Big O is essentially an upper bound on how much time an algorithm might take. 373 00:15:21,200 --> 00:15:24,290 It might have taken Eric n steps, 7 lockers, 374 00:15:24,290 --> 00:15:27,230 to find the number 50 because of linear search. 375 00:15:27,230 --> 00:15:29,540 That's big O of n, or on the order of n. 376 00:15:29,540 --> 00:15:32,310 That's an upper bound, worst case in this scenario. 377 00:15:32,310 --> 00:15:35,550 You can use omega, though, to describe things like best cases. 378 00:15:35,550 --> 00:15:39,530 So for instance, with Eric's linear search approach in the worst case, 379 00:15:39,530 --> 00:15:43,670 it could have and it did take him n steps, or 7 specifically. 380 00:15:43,670 --> 00:15:47,263 But in the best case, how few steps might it have taken him? 381 00:15:47,263 --> 00:15:47,930 Just one, right? 382 00:15:47,930 --> 00:15:51,300 He might have gotten lucky, and 50 might have been just there. 383 00:15:51,300 --> 00:15:53,600 Similarly, when [? Nizari, ?] when she looked for 50 384 00:15:53,600 --> 00:15:56,330 in the middle, how few steps might she have 385 00:15:56,330 --> 00:15:59,500 needed to find 50 among her 7 lockers? 386 00:15:59,500 --> 00:16:00,170 AUDIENCE: One. 387 00:16:00,170 --> 00:16:00,830 SPEAKER 1: One step, too. 388 00:16:00,830 --> 00:16:02,747 She might have just gotten lucky because Brian 389 00:16:02,747 --> 00:16:06,230 might have just, by coincidence or design, put the number 50 there. 390 00:16:06,230 --> 00:16:09,800 So whereas you have this upper bound on how many steps an algorithm might take, 391 00:16:09,800 --> 00:16:11,030 sometimes you can get lucky. 
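The difference between the upper bound and the lower bound for linear search can be seen by counting how many lockers are opened for different inputs. The sketch below is an illustration only; `linear_search_steps` is a hypothetical name, and it returns a step count rather than the true-or-false answer from the pseudocode.

```c
#include <stddef.h>

// Hypothetical helper: count how many lockers a left-to-right
// linear search opens before it stops (on a match, or after
// running out of lockers).
size_t linear_search_steps(const int lockers[], size_t n, int target)
{
    size_t steps = 0;
    for (size_t i = 0; i < n; i++)
    {
        steps++; // opening locker i costs one step
        if (lockers[i] == target)
        {
            break;
        }
    }
    return steps;
}
```

With 50 behind the last of seven lockers, as on stage, the count is 7, which matches the big O of n worst case. If 50 happened to be behind the first locker, the count would be 1, which matches the omega of 1 best case.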
392 00:16:11,030 --> 00:16:13,280 And if the inputs are in a certain order, 393 00:16:13,280 --> 00:16:15,740 you might get lucky and have a lower bound on the running 394 00:16:15,740 --> 00:16:17,193 time that's much, much better. 395 00:16:17,193 --> 00:16:19,110 So we might have a chart that looks like this. 396 00:16:19,110 --> 00:16:22,130 This is the same function, so to speak, the same math. 397 00:16:22,130 --> 00:16:25,220 But I'm just using omega now instead of big O. 398 00:16:25,220 --> 00:16:29,030 And now let's just apply some of these algorithms to the chat here, then. 399 00:16:29,030 --> 00:16:33,380 Linear search is in omega of what, so to speak, by this definition? 400 00:16:33,380 --> 00:16:34,422 AUDIENCE: Omega 1. 401 00:16:34,422 --> 00:16:35,630 SPEAKER 1: Omega of 1, right? 402 00:16:35,630 --> 00:16:39,705 In the best case, the lower bound on how much time linear search might 403 00:16:39,705 --> 00:16:41,330 have taken Eric would just be one step. 404 00:16:41,330 --> 00:16:44,930 So we're going to call linear search omega of 1. 405 00:16:44,930 --> 00:16:46,770 And meanwhile, when we did binary search, 406 00:16:46,770 --> 00:16:50,030 secondly, it's not going to be log n in the best case. 407 00:16:50,030 --> 00:16:54,710 It might be also omega of 1 because we might just get lucky. 408 00:16:54,710 --> 00:16:57,050 And so now we have kind of useful rules of thumb 409 00:16:57,050 --> 00:16:59,810 for describing just how good or bad your algorithm or your code 410 00:16:59,810 --> 00:17:05,849 might be, depending on, at least, the inputs that are fed to that algorithm. 411 00:17:05,849 --> 00:17:07,730 So that's big O, and that's omega. 412 00:17:07,730 --> 00:17:13,130 Any questions on these two principles, big O or omega? 413 00:17:13,130 --> 00:17:13,630 Yeah? 414 00:17:13,630 --> 00:17:22,407 AUDIENCE: [INAUDIBLE] no matter where it starts [INAUDIBLE]?? 415 00:17:22,407 --> 00:17:23,740 SPEAKER 1: Really good question. 
416 00:17:23,740 --> 00:17:25,698 And we'll touch on a few such algorithms today. 417 00:17:25,698 --> 00:17:28,780 But for now the question is, what's an example of an algorithm that 418 00:17:28,780 --> 00:17:33,490 might be omega of n, such that in the best case, no matter how good or bad 419 00:17:33,490 --> 00:17:35,167 your input is, it takes n steps? 420 00:17:35,167 --> 00:17:37,000 Maybe counting the number of lockers, right? 421 00:17:37,000 --> 00:17:38,110 How do I do that? 422 00:17:38,110 --> 00:17:42,580 1, 2, 3, 4, 5, 6, 7-- my output is 7. 423 00:17:42,580 --> 00:17:44,080 How many steps did that take? 424 00:17:44,080 --> 00:17:47,060 Big O of n because in the worst case, I had to look at all of them, 425 00:17:47,060 --> 00:17:49,622 but also omega of n because in the best case, 426 00:17:49,622 --> 00:17:51,080 I still had to look at all of them. 427 00:17:51,080 --> 00:17:53,410 Otherwise, I couldn't have given you an accurate count. 428 00:17:53,410 --> 00:17:55,780 So that would be an example of an omega of n algorithm. 429 00:17:55,780 --> 00:17:57,103 And we'll see others over time. 430 00:17:57,103 --> 00:17:57,770 Other questions? 431 00:17:57,770 --> 00:17:58,884 Yeah? 432 00:17:58,884 --> 00:18:06,516 AUDIENCE: [INAUDIBLE] omega or [INAUDIBLE] better omega value 433 00:18:06,516 --> 00:18:07,887 or a better O value? 434 00:18:07,887 --> 00:18:09,220 SPEAKER 1: Really good question. 435 00:18:09,220 --> 00:18:13,240 Is it better to have a really good omega value or a really good O value? 436 00:18:13,240 --> 00:18:15,205 The latter, and we'll see this over time. 437 00:18:15,205 --> 00:18:17,080 Really what computer scientists tend to worry 438 00:18:17,080 --> 00:18:19,900 about is how their code performs in the worst case, 439 00:18:19,900 --> 00:18:21,790 or maybe not even that, in the average case. 440 00:18:21,790 --> 00:18:25,270 Typically, day to day, best case is nice to have.
441 00:18:25,270 --> 00:18:28,765 But who really cares if your code is super fast when the input happens 442 00:18:28,765 --> 00:18:30,580 to be sorted for you already? 443 00:18:30,580 --> 00:18:32,600 That would be a corner case, so to speak. 444 00:18:32,600 --> 00:18:34,960 So it's a useful tool to describe your algorithm. 445 00:18:34,960 --> 00:18:39,260 But big O, an upper bound, is typically what we'll care about a little more. 446 00:18:39,260 --> 00:18:41,260 So let's go in and make this a little more real. 447 00:18:41,260 --> 00:18:43,490 Let me go ahead and switch over to CS50 IDE. 448 00:18:43,490 --> 00:18:47,440 And let me go ahead and create a program here called numbers.c 449 00:18:47,440 --> 00:18:50,990 that's going to allow us to explore, for instance, linear search. 450 00:18:50,990 --> 00:18:54,620 So numbers.c is going to start off with our usual lines. 451 00:18:54,620 --> 00:18:57,580 So I'm going to go ahead and include cs50.h. 452 00:18:57,580 --> 00:19:01,420 I'm going to go ahead and include standard io.h, int main void, 453 00:19:01,420 --> 00:19:03,790 so no command line arguments for now. 454 00:19:03,790 --> 00:19:06,460 And in here, let me go ahead and just declare some numbers, 455 00:19:06,460 --> 00:19:07,960 maybe six numbers total. 456 00:19:07,960 --> 00:19:11,080 And if I want to declare an array of six numbers, recall from last week, 457 00:19:11,080 --> 00:19:12,670 I can literally say this. 458 00:19:12,670 --> 00:19:14,440 And if I want to initialize those numbers, 459 00:19:14,440 --> 00:19:17,830 I can do numbers bracket 0 gets, for instance, the number 4. 460 00:19:17,830 --> 00:19:21,190 Numbers bracket 1 gets the number, say, 8. 461 00:19:21,190 --> 00:19:25,150 Numbers bracket 2 gets the number 15. 462 00:19:25,150 --> 00:19:28,000 Numbers-- OK, so this is getting really tedious.
463 00:19:28,000 --> 00:19:30,460 Turns out in C, there's a shorthand notation 464 00:19:30,460 --> 00:19:34,270 when you know in advance what values you want to put in an array. 465 00:19:34,270 --> 00:19:42,070 I can actually go up and do this, 4, 8, 15, 16, 23, 42 with curly braces 466 00:19:42,070 --> 00:19:43,180 on either side. 467 00:19:43,180 --> 00:19:46,310 So this is just what's called a statically initialized array. 468 00:19:46,310 --> 00:19:48,520 You just know in advance what the values are. 469 00:19:48,520 --> 00:19:51,230 And so I can just save some lines of code that way. 470 00:19:51,230 --> 00:19:54,130 But it's the same thing as the road I was going down a moment ago. 471 00:19:54,130 --> 00:19:56,828 But the curly braces are new for that little feature. 472 00:19:56,828 --> 00:19:58,870 Now I'm going to go ahead and iterate over these. 473 00:19:58,870 --> 00:20:01,728 So for int, i gets 0, i less than 6. 474 00:20:01,728 --> 00:20:05,020 And I'm going to cut some corners now so that we focus on the new stuff and not 475 00:20:05,020 --> 00:20:05,520 on the old. 476 00:20:05,520 --> 00:20:09,370 I'm hard coding 6 instead of using a constant or something like that. 477 00:20:09,370 --> 00:20:13,120 But all I want to do ultimately is search for the number 50. 478 00:20:13,120 --> 00:20:15,520 So what code can I now write inside of this for loop 479 00:20:15,520 --> 00:20:21,440 to just ask the question, is 50 behind this door? 480 00:20:21,440 --> 00:20:24,400 Someone want to call it out? 481 00:20:24,400 --> 00:20:24,900 Yeah? 482 00:20:24,900 --> 00:20:26,178 If? 483 00:20:26,178 --> 00:20:29,500 AUDIENCE: Number i [INAUDIBLE]. 484 00:20:29,500 --> 00:20:33,175 SPEAKER 1: Numbers i equals and not just single equals, but equals equals 50. 485 00:20:33,175 --> 00:20:34,925 I can go ahead now and return some answer. 486 00:20:34,925 --> 00:20:37,675 So I'm going to go ahead [? and say ?] 
printf, for instance, found 487 00:20:37,675 --> 00:20:38,510 in a new line. 488 00:20:38,510 --> 00:20:42,010 And then if I want to say that, no 50 was found, 489 00:20:42,010 --> 00:20:45,460 recall, that I want to do this outside of the loop, just like in my pseudocode 490 00:20:45,460 --> 00:20:45,970 earlier. 491 00:20:45,970 --> 00:20:48,230 So not found can go way down there. 492 00:20:48,230 --> 00:20:51,313 So just to be clear, what algorithm have I implemented here? 493 00:20:51,313 --> 00:20:52,280 AUDIENCE: [INAUDIBLE]. 494 00:20:52,280 --> 00:20:52,700 SPEAKER 1: Yeah. 495 00:20:52,700 --> 00:20:53,742 So this is linear search. 496 00:20:53,742 --> 00:20:58,910 This is the code incarnation of my pseudocode in Eric's actual execution 497 00:20:58,910 --> 00:20:59,645 of his algorithm. 498 00:20:59,645 --> 00:21:01,020 So let me go ahead and save this. 499 00:21:01,020 --> 00:21:04,220 Let me go ahead and make numbers, no error messages, which is good, 500 00:21:04,220 --> 00:21:08,840 dot slash numbers and Enter, and I should see what when I hit Enter here? 501 00:21:08,840 --> 00:21:10,100 AUDIENCE: Not found. 502 00:21:10,100 --> 00:21:13,640 SPEAKER 1: Hopefully not found because indeed 50 is not among those numbers. 503 00:21:13,640 --> 00:21:17,170 So that's interesting, but it's mostly warm up from last week. 504 00:21:17,170 --> 00:21:18,920 Why don't we consider a different problem, 505 00:21:18,920 --> 00:21:22,090 where now we might want to search not just for numbers, but maybe names. 506 00:21:22,090 --> 00:21:24,590 Like, if the goal is to search a phone book, let me go ahead 507 00:21:24,590 --> 00:21:28,950 and create names.c that allows me to search now for names in an array. 508 00:21:28,950 --> 00:21:31,340 So let me go ahead and include cs50.h. 509 00:21:31,340 --> 00:21:34,100 Let me go ahead and include standard io.h. 510 00:21:34,100 --> 00:21:36,620 Let me go ahead and do int main void.
511 00:21:36,620 --> 00:21:40,220 And then down here let me go ahead and give myself an array, 512 00:21:40,220 --> 00:21:42,560 so an array of string called names. 513 00:21:42,560 --> 00:21:44,870 I'm going to go ahead and give myself four names. 514 00:21:44,870 --> 00:21:48,170 And just like last time, I can do names bracket 0 gets Emma. 515 00:21:48,170 --> 00:21:52,190 Or again, to save myself time, I can cut a few corners here 516 00:21:52,190 --> 00:21:57,740 and say Emma, Rodrigo, Brian, David, just like last week, 517 00:21:57,740 --> 00:21:59,210 capitalized just because. 518 00:21:59,210 --> 00:22:04,610 So that's another way of writing the same code as with more lines than that. 519 00:22:04,610 --> 00:22:06,410 Now I'm going to do int i gets 0. 520 00:22:06,410 --> 00:22:10,130 i is less than 5, in this case, i plus plus. 521 00:22:10,130 --> 00:22:12,800 And now things get a little interesting because I 522 00:22:12,800 --> 00:22:17,370 might want to say if names bracket i equals equals-- let's not search for 50 523 00:22:17,370 --> 00:22:17,870 now. 524 00:22:17,870 --> 00:22:20,030 Let's search for Emma, just like last week. 525 00:22:20,030 --> 00:22:24,530 I want to go ahead and say found if I find Emma, 526 00:22:24,530 --> 00:22:28,700 else down here I want to say not found. 527 00:22:28,700 --> 00:22:31,050 The catch is that this will not work. 528 00:22:31,050 --> 00:22:31,550 Sorry. 529 00:22:31,550 --> 00:22:32,925 It's a little warm up here today. 530 00:22:32,925 --> 00:22:36,830 The catch is this will not work, even though I'm pretty much 531 00:22:36,830 --> 00:22:39,860 doing exactly what I did last time. 532 00:22:39,860 --> 00:22:43,310 What might the intuition be, especially if you've never studied C before, 533 00:22:43,310 --> 00:22:47,570 as to why line 10 here won't actually work 534 00:22:47,570 --> 00:22:51,246 as easily as numbers did a moment ago? 535 00:22:51,246 --> 00:22:51,746 Yeah? 
536 00:22:51,746 --> 00:22:53,080 AUDIENCE: Difference in data type. 537 00:22:53,080 --> 00:22:54,370 SPEAKER 1: Difference in data type, and 538 00:22:54,370 --> 00:22:56,462 what are the differences, to be clear? 539 00:22:56,462 --> 00:22:58,893 AUDIENCE: [INAUDIBLE] 540 00:22:58,893 --> 00:22:59,560 SPEAKER 1: Yeah. 541 00:22:59,560 --> 00:23:05,158 AUDIENCE: [INAUDIBLE] of the array [INAUDIBLE]. 542 00:23:05,158 --> 00:23:05,950 SPEAKER 1: Exactly. 543 00:23:05,950 --> 00:23:08,760 You can't use equals equals for strings because, remember, 544 00:23:08,760 --> 00:23:12,390 a string is not a data type, like a char, a bool, a float, an int. 545 00:23:12,390 --> 00:23:15,120 Remember, it's actually an array and an array 546 00:23:15,120 --> 00:23:17,070 that likely has multiple characters. 547 00:23:17,070 --> 00:23:19,320 And odds are, if you want to compare two strings, 548 00:23:19,320 --> 00:23:23,550 you probably intuitively need to compare all of the characters in those strings, 549 00:23:23,550 --> 00:23:25,170 not just the whole thing at once. 550 00:23:25,170 --> 00:23:27,480 In other languages, if you use Python or Java, 551 00:23:27,480 --> 00:23:30,030 you can actually do this in one line, just like this. 552 00:23:30,030 --> 00:23:32,430 But in C, everything is much more low level. 553 00:23:32,430 --> 00:23:35,460 If you want to compare strings, you can't use equals equals. 554 00:23:35,460 --> 00:23:37,260 However, it turns out there's a function, 555 00:23:37,260 --> 00:23:39,360 and you might have even used this in p-set 2, 556 00:23:39,360 --> 00:23:43,200 if you took this approach, where you can actually compare two strings.
557 00:23:43,200 --> 00:23:45,400 So I'm going to delete this line and instead say, 558 00:23:45,400 --> 00:23:50,430 strcmp, for string comparison, names bracket i being the first string I want 559 00:23:50,430 --> 00:23:53,850 to compare and then, quote unquote, "Emma" being the second string 560 00:23:53,850 --> 00:23:54,808 that I want to compare. 561 00:23:54,808 --> 00:23:57,017 And you would only know this from having been told it 562 00:23:57,017 --> 00:23:58,410 or reading in the documentation. 563 00:23:58,410 --> 00:24:05,190 This function strcmp returns 0 if two strings are the same. 564 00:24:05,190 --> 00:24:07,098 It happens to return a positive number if one 565 00:24:07,098 --> 00:24:09,390 comes after the other alphabetically or negative number 566 00:24:09,390 --> 00:24:11,265 if one comes before the other alphabetically. 567 00:24:11,265 --> 00:24:15,240 But for today, we're just using it to test equality of strings, so to speak. 568 00:24:15,240 --> 00:24:17,350 So let me go ahead and save this. 569 00:24:17,350 --> 00:24:22,050 Let me go ahead and scroll up here and do make names this time. 570 00:24:22,050 --> 00:24:25,350 And unfortunately, I can't just use this function, it seems. 571 00:24:25,350 --> 00:24:27,330 And while it's fine certainly to keep using 572 00:24:27,330 --> 00:24:30,360 help50 to understand these messages, any thoughts as to what 573 00:24:30,360 --> 00:24:32,964 I've done wrong? 574 00:24:32,964 --> 00:24:33,910 AUDIENCE: [INAUDIBLE] 575 00:24:33,910 --> 00:24:34,300 SPEAKER 1: Yeah. 576 00:24:34,300 --> 00:24:36,520 I mean, I can't quite understand all of the words on the screen, 577 00:24:36,520 --> 00:24:37,600 frankly, at first glance. 578 00:24:37,600 --> 00:24:40,150 But string.h is something we've seen before.
579 00:24:40,150 --> 00:24:43,040 And indeed, if you read the documentation or the manual page, 580 00:24:43,040 --> 00:24:46,660 you'll see that strcmp, indeed, comes in string.h 581 00:24:46,660 --> 00:24:48,700 so I need to put this up here. 582 00:24:48,700 --> 00:24:53,800 And now if I save my file and recompile my code down here with make names, 583 00:24:53,800 --> 00:24:54,730 now it compiles. 584 00:24:54,730 --> 00:24:59,545 And if I do dot slash names, I should see, hmm, interesting, 585 00:24:59,545 --> 00:25:02,290 a mixed message, literally. 586 00:25:02,290 --> 00:25:05,245 So is Emma there or not there in my array? 587 00:25:05,245 --> 00:25:08,000 588 00:25:08,000 --> 00:25:09,590 She's obviously there. 589 00:25:09,590 --> 00:25:11,300 And yet she's somehow not there. 590 00:25:11,300 --> 00:25:13,010 So what have I done wrong logically? 591 00:25:13,010 --> 00:25:13,510 Yeah? 592 00:25:13,510 --> 00:25:20,200 AUDIENCE: Do you have [INAUDIBLE] if it's found or not. 593 00:25:20,200 --> 00:25:23,343 So [INAUDIBLE] is not found [INAUDIBLE]. 594 00:25:23,343 --> 00:25:24,010 SPEAKER 1: Yeah. 595 00:25:24,010 --> 00:25:27,890 So it's this not found that I'm just blindly printing 596 00:25:27,890 --> 00:25:29,510 at the end as a sort of catch all. 597 00:25:29,510 --> 00:25:34,070 But really, if I execute found or print found up here, 598 00:25:34,070 --> 00:25:37,780 what should I really be doing maybe right after that? 599 00:25:37,780 --> 00:25:38,330 Returning. 600 00:25:38,330 --> 00:25:39,663 And we looked at this last week. 601 00:25:39,663 --> 00:25:42,950 Recall that if you want to go ahead and return a successful outcome, 602 00:25:42,950 --> 00:25:45,110 the convention is to return 0. 603 00:25:45,110 --> 00:25:47,163 And actually down here, if you're unsuccessful, 604 00:25:47,163 --> 00:25:48,830 what should be perhaps returned instead? 605 00:25:48,830 --> 00:25:49,398 AUDIENCE: 1. 606 00:25:49,398 --> 00:25:49,940 SPEAKER 1: 1.
607 00:25:49,940 --> 00:25:51,980 And again, these are totally arbitrary conventions. 608 00:25:51,980 --> 00:25:53,630 You just kind of learn them as you go. 609 00:25:53,630 --> 00:25:54,860 But 0 means success. 610 00:25:54,860 --> 00:25:57,020 1 tends to mean failure. 611 00:25:57,020 --> 00:25:58,260 And that now lines up. 612 00:25:58,260 --> 00:26:01,440 So now my function main will essentially exit early. 613 00:26:01,440 --> 00:26:05,030 So if I go ahead and run make names and then do dot slash names, now 614 00:26:05,030 --> 00:26:07,580 if I'm searching for Emma in that array of four names, 615 00:26:07,580 --> 00:26:11,480 she's found and only found. 616 00:26:11,480 --> 00:26:15,620 Any questions, then, on this here? 617 00:26:15,620 --> 00:26:18,560 All right, well, what if I want to do one further thing 618 00:26:18,560 --> 00:26:21,740 and combine these two ideas into one final program, namely 619 00:26:21,740 --> 00:26:22,762 that of a phone book? 620 00:26:22,762 --> 00:26:24,470 So let me go ahead and close these files. 621 00:26:24,470 --> 00:26:26,430 Let me go ahead and give myself a new file. 622 00:26:26,430 --> 00:26:28,430 I'll call it phonebook.c. 623 00:26:28,430 --> 00:26:31,310 And let's actually integrate all of these building blocks 624 00:26:31,310 --> 00:26:33,680 as follows, cs50.h again. 625 00:26:33,680 --> 00:26:36,200 I'm going to go ahead and include standard io.h. 626 00:26:36,200 --> 00:26:39,390 I'm going to go ahead and include string.h just as before. 627 00:26:39,390 --> 00:26:41,210 And now I'm going to do int main void. 628 00:26:41,210 --> 00:26:44,420 And now I want to implement the idea of searching a phone book, 629 00:26:44,420 --> 00:26:48,540 just like in week 0, but now doing it in C. So let's keep it simple. 630 00:26:48,540 --> 00:26:52,918 And we'll have just four names in this phone book, so string names 4 equals.
631 00:26:52,918 --> 00:26:54,710 And I'm going to use my same new trick just 632 00:26:54,710 --> 00:26:58,520 to save myself some lines of code, Emma, Rodrigo, 633 00:26:58,520 --> 00:27:03,500 and then, quote unquote, "Brian," quote unquote, "myself." 634 00:27:03,500 --> 00:27:05,000 But then our numbers. 635 00:27:05,000 --> 00:27:08,990 So how should we store phone numbers, would you propose, what data type? 636 00:27:08,990 --> 00:27:11,990 637 00:27:11,990 --> 00:27:13,022 AUDIENCE: [INAUDIBLE] 638 00:27:13,022 --> 00:27:13,730 SPEAKER 1: Sorry? 639 00:27:13,730 --> 00:27:14,750 AUDIENCE: String. 640 00:27:14,750 --> 00:27:15,140 SPEAKER 1: String? 641 00:27:15,140 --> 00:27:15,680 Why string? 642 00:27:15,680 --> 00:27:18,160 I feel like phone numbers are numbers and strings-- 643 00:27:18,160 --> 00:27:21,864 AUDIENCE: Maybe if you store it as a [INAUDIBLE] or an integer, 644 00:27:21,864 --> 00:27:25,910 then it's implied that you need to do much [INAUDIBLE]. 645 00:27:25,910 --> 00:27:30,728 You don't have [INAUDIBLE] the [INAUDIBLE], 646 00:27:30,728 --> 00:27:34,200 like, [? add ?] [INAUDIBLE] number a dash or something. 647 00:27:34,200 --> 00:27:36,468 It would be really hard to manipulate an integer. 648 00:27:36,468 --> 00:27:37,260 SPEAKER 1: Exactly. 649 00:27:37,260 --> 00:27:41,040 So to summarize if a phone number has dashes in it or parentheses or maybe 650 00:27:41,040 --> 00:27:43,345 plus signs abroad, those are characters. 651 00:27:43,345 --> 00:27:44,220 Those aren't numbers. 652 00:27:44,220 --> 00:27:45,803 So they won't fit in ints or in longs. 653 00:27:45,803 --> 00:27:48,762 So even though we call it a phone number, now that you're a programmer, 654 00:27:48,762 --> 00:27:51,720 it's not really a number so much as a string that looks like a number. 655 00:27:51,720 --> 00:27:53,820 So string is probably the better bet here.
656 00:27:53,820 --> 00:27:56,100 And if you consider, too, in certain geographies, 657 00:27:56,100 --> 00:27:59,340 you sometimes have to dial 0 to dial someone's number if it's local. 658 00:27:59,340 --> 00:28:01,842 But if it's a 0, it's going to get dropped mathematically 659 00:28:01,842 --> 00:28:03,300 because leading zeros don't matter. 660 00:28:03,300 --> 00:28:07,680 So again, modeling things that look like numbers but really aren't as integers 661 00:28:07,680 --> 00:28:08,870 is probably the wrong call. 662 00:28:08,870 --> 00:28:11,545 So let's indeed do string numbers. 663 00:28:11,545 --> 00:28:13,170 And I'll give myself four numbers here. 664 00:28:13,170 --> 00:28:19,170 And let's do 617 555 how about, 0100. 665 00:28:19,170 --> 00:28:26,580 We'll do 617 555 just like in the movies, 0101. 666 00:28:26,580 --> 00:28:27,660 Let me fix that. 667 00:28:27,660 --> 00:28:34,620 Then we'll do 617 555 [? 0102. ?] And then lastly my number, which shall be-- 668 00:28:34,620 --> 00:28:39,360 whoops-- which shall be 617 555 0103. 669 00:28:39,360 --> 00:28:41,550 And I'm doing a same kind of trick, but this 670 00:28:41,550 --> 00:28:44,910 is giving me now two arrays, one called names, one called numbers. 671 00:28:44,910 --> 00:28:50,580 Here we go, for int i gets 0, i less than 4 i plus plus, so same quick loop 672 00:28:50,580 --> 00:28:51,720 as before. 673 00:28:51,720 --> 00:28:53,400 I'm going to go ahead and compare now. 674 00:28:53,400 --> 00:28:54,390 I'm searching for Emma. 675 00:28:54,390 --> 00:28:57,810 And specifically now I'm searching for her number not just her name. 676 00:28:57,810 --> 00:29:01,350 So I want to print out her number this time not just found or not found. 677 00:29:01,350 --> 00:29:06,990 So as before, I can say if comparing the two strings at names bracket i 678 00:29:06,990 --> 00:29:12,480 and, quote unquote, "Emma" equals equals 0, I know that I found Emma. 
679 00:29:12,480 --> 00:29:16,500 And if I want to go ahead and print out Emma's phone number, what should 680 00:29:16,500 --> 00:29:18,930 I do here? 681 00:29:18,930 --> 00:29:20,670 It's not names. 682 00:29:20,670 --> 00:29:22,883 It's not numbers. 683 00:29:22,883 --> 00:29:24,300 What should go between the quotes? 684 00:29:24,300 --> 00:29:25,623 AUDIENCE: [INAUDIBLE]. 685 00:29:25,623 --> 00:29:28,540 SPEAKER 1: Yeah, so [? %s, ?] remember, just our familiar place holder 686 00:29:28,540 --> 00:29:29,130 for strings. 687 00:29:29,130 --> 00:29:32,760 And then here not names because I know I'm looking for Emma. 688 00:29:32,760 --> 00:29:34,500 Here I want to go ahead and put numbers. 689 00:29:34,500 --> 00:29:38,520 So it's a separate array, but it's at the same location, bracket i. 690 00:29:38,520 --> 00:29:40,060 Let me go ahead and save that. 691 00:29:40,060 --> 00:29:42,870 And down here, I'm going to go ahead and say printf not found, 692 00:29:42,870 --> 00:29:45,630 if we don't find Emma, even though we surely will in this case. 693 00:29:45,630 --> 00:29:47,005 And I'm going to learn my lesson. 694 00:29:47,005 --> 00:29:51,120 I'm going to return 0 for success and return 1 for failure in this case. 695 00:29:51,120 --> 00:29:54,450 Let me save the file, scroll my terminal window up a little bit, 696 00:29:54,450 --> 00:29:59,160 do make phonebook Enter, compiles OK dot slash phonebook. 697 00:29:59,160 --> 00:30:02,706 And what should I see when I run the program now? 698 00:30:02,706 --> 00:30:04,600 AUDIENCE: [INAUDIBLE] 699 00:30:04,600 --> 00:30:09,610 SPEAKER 1: 617 555 0100, hopefully. 700 00:30:09,610 --> 00:30:11,860 So this code is correct. 701 00:30:11,860 --> 00:30:14,500 And this is an opportunity now for us to criticize it, though, 702 00:30:14,500 --> 00:30:16,150 along a different line. 703 00:30:16,150 --> 00:30:17,150 This is correct.
704 00:30:17,150 --> 00:30:20,740 I've got two arrays, both of size 4, one with names, one with numbers, 705 00:30:20,740 --> 00:30:23,710 code finds Emma, prints her number, returns 0. 706 00:30:23,710 --> 00:30:26,050 I seem to have done everything correctly. 707 00:30:26,050 --> 00:30:30,520 But does anything rub you the wrong way perhaps about the design of this code? 708 00:30:30,520 --> 00:30:33,250 Could we do better? 709 00:30:33,250 --> 00:30:36,490 Is there something that's a little arbitrary, a little contrived, 710 00:30:36,490 --> 00:30:39,680 a little dangerous about this code? 711 00:30:39,680 --> 00:30:40,690 Any glimpses? 712 00:30:40,690 --> 00:30:41,734 Yeah, over here? 713 00:30:41,734 --> 00:30:43,630 AUDIENCE: [INAUDIBLE]. 714 00:30:43,630 --> 00:30:45,134 SPEAKER 1: Sorry, a little louder. 715 00:30:45,134 --> 00:30:47,444 AUDIENCE: [INAUDIBLE] [? two ?] [? single digit ?] [? on both sides ?] 716 00:30:47,444 --> 00:30:48,370 [INAUDIBLE]. 717 00:30:48,370 --> 00:30:50,980 SPEAKER 1: So we could use a two-dimensional array 718 00:30:50,980 --> 00:30:52,420 to store data like this. 719 00:30:52,420 --> 00:30:54,253 I would propose it's not strictly necessary, 720 00:30:54,253 --> 00:30:56,378 and it might make things a little more complicated, 721 00:30:56,378 --> 00:30:58,030 but a reasonable alternative as well. 722 00:30:58,030 --> 00:30:58,974 Other thoughts? 723 00:30:58,974 --> 00:31:01,580 AUDIENCE: [INAUDIBLE] Emma's number [INAUDIBLE]. 724 00:31:01,580 --> 00:31:04,600 SPEAKER 1: Yeah, it's assuming that Emma's number is the first one. 725 00:31:04,600 --> 00:31:06,220 And that seems reasonable, right? 726 00:31:06,220 --> 00:31:07,300 Emma's name is first. 727 00:31:07,300 --> 00:31:08,950 So presumably her number's first. 728 00:31:08,950 --> 00:31:10,270 Rodrigo's name is second. 729 00:31:10,270 --> 00:31:12,070 So presumably his number is second. 730 00:31:12,070 --> 00:31:13,750 And that might be true.
731 00:31:13,750 --> 00:31:17,380 But frankly, that's the concern, this sort of honor system 732 00:31:17,380 --> 00:31:19,810 that I promised to keep the names in the right order, 733 00:31:19,810 --> 00:31:23,080 and I promise to keep the numbers in the right order, when really, 734 00:31:23,080 --> 00:31:26,410 that is just sort of an unspoken agreement between me and myself, 735 00:31:26,410 --> 00:31:28,810 or if I'm working with colleagues or classmates, 736 00:31:28,810 --> 00:31:31,273 that we all just agree to keep those things in sync. 737 00:31:31,273 --> 00:31:32,440 And that's dangerous, right? 738 00:31:32,440 --> 00:31:35,710 If you had more numbers than four, you could imagine things very quickly 739 00:31:35,710 --> 00:31:37,240 getting slightly out of order. 740 00:31:37,240 --> 00:31:39,970 Or god forbid, you sort the names alphabetically, 741 00:31:39,970 --> 00:31:43,970 how do you go about sorting the numbers as well and keeping things together? 742 00:31:43,970 --> 00:31:46,780 So this feels like an opportunity for one new feature in C 743 00:31:46,780 --> 00:31:48,820 and in programming languages more generally, 744 00:31:48,820 --> 00:31:52,540 whereby we can actually keep these pieces of data, someone's name 745 00:31:52,540 --> 00:31:54,010 and number, together. 746 00:31:54,010 --> 00:31:56,140 And today we give ourselves the opportunity 747 00:31:56,140 --> 00:31:58,450 to introduce our own custom types. 748 00:31:58,450 --> 00:32:01,930 We've seen ints and bools and floats and longs and strings. 749 00:32:01,930 --> 00:32:04,390 And string, recall, is a custom CS50 data type. 750 00:32:04,390 --> 00:32:07,840 And we'll take that one away in a couple of weeks as a training wheel. 751 00:32:07,840 --> 00:32:11,650 But today let's give ourselves our own data type as follows. 752 00:32:11,650 --> 00:32:14,030 Typedef is our new keyword today. 753 00:32:14,030 --> 00:32:16,360 And it literally means define a type. 
754 00:32:16,360 --> 00:32:17,710 It's going to be a structure. 755 00:32:17,710 --> 00:32:20,740 And so struct in C is an actual keyword, and it 756 00:32:20,740 --> 00:32:26,500 refers to a container, inside of which you can put multiple other data types. 757 00:32:26,500 --> 00:32:29,620 Struct is a container for multiple data types. 758 00:32:29,620 --> 00:32:31,720 What do I want to contain? 759 00:32:31,720 --> 00:32:34,210 Well, I want to give myself a name for everyone. 760 00:32:34,210 --> 00:32:36,580 And I want to give myself a number for everyone, 761 00:32:36,580 --> 00:32:39,880 even though it's a string because phone numbers can have dashes and parentheses 762 00:32:39,880 --> 00:32:40,900 and so forth. 763 00:32:40,900 --> 00:32:41,650 And you know what? 764 00:32:41,650 --> 00:32:45,070 The name I'm going to give to this structure is going to be person. 765 00:32:45,070 --> 00:32:46,390 It's a simple person. 766 00:32:46,390 --> 00:32:49,420 But using this syntax, I can teach my compiler, 767 00:32:49,420 --> 00:32:51,490 [INAUDIBLE] in this case, that not only are there 768 00:32:51,490 --> 00:32:55,240 ints and floats and chars and bools and so forth and strings, 769 00:32:55,240 --> 00:32:59,710 there are also person types now in C. They didn't come with the language. 770 00:32:59,710 --> 00:33:04,360 But I'm inventing them now with typedef struct person, inside of which, 771 00:33:04,360 --> 00:33:06,700 or encapsulated, so to speak, inside of which 772 00:33:06,700 --> 00:33:09,970 is going to be two things, name and number. 773 00:33:09,970 --> 00:33:11,570 So what can I do with this? 774 00:33:11,570 --> 00:33:15,820 Well, my code gets a little different but better designed, I would argue. 775 00:33:15,820 --> 00:33:19,150 Down in my code now, I'm going to give myself an array of people. 776 00:33:19,150 --> 00:33:20,560 There's four of us on the staff. 777 00:33:20,560 --> 00:33:22,960 And I want to give myself an array of four people. 
778 00:33:22,960 --> 00:33:26,140 So I might do literally the same approach I've always 779 00:33:26,140 --> 00:33:27,520 done when declaring a data type. 780 00:33:27,520 --> 00:33:28,645 What data type do you want? 781 00:33:28,645 --> 00:33:29,470 Person. 782 00:33:29,470 --> 00:33:31,330 And what should my array be called? 783 00:33:31,330 --> 00:33:32,860 Well, I could call it persons. 784 00:33:32,860 --> 00:33:35,320 Or frankly, I could just call it people in English. 785 00:33:35,320 --> 00:33:37,450 And how many people do I want to represent? 786 00:33:37,450 --> 00:33:38,470 Four. 787 00:33:38,470 --> 00:33:40,900 So my array is called people. 788 00:33:40,900 --> 00:33:42,340 It's a size 4. 789 00:33:42,340 --> 00:33:46,070 And each element in that array is going to be a person. 790 00:33:46,070 --> 00:33:48,110 So this syntax is not new. 791 00:33:48,110 --> 00:33:50,800 This syntax up here is new. 792 00:33:50,800 --> 00:33:54,220 But as of today now, persons exist in C. Now, 793 00:33:54,220 --> 00:33:57,550 my syntax here does have to change a little bit, but not all that much. 794 00:33:57,550 --> 00:34:01,450 Now, if I want to go ahead and fill this array, I can do something like this. 795 00:34:01,450 --> 00:34:03,700 Emma will be our 0th person. 796 00:34:03,700 --> 00:34:07,570 But I don't just do something like this because, quote unquote, "Emma" is not 797 00:34:07,570 --> 00:34:08,409 a person. 798 00:34:08,409 --> 00:34:10,989 Quote unquote "Emma" is a name. 799 00:34:10,989 --> 00:34:15,409 And quote unquote "617 555 0100" is a number. 800 00:34:15,409 --> 00:34:17,590 So I actually need to be a little more specific. 801 00:34:17,590 --> 00:34:23,320 I need to say that people 0 name is Emma. 802 00:34:23,320 --> 00:34:27,610 And then people 0 number is whatever Emma's was, 803 00:34:27,610 --> 00:34:31,460 which was 617 555 0100 semicolon. 
804 00:34:31,460 --> 00:34:35,620 And now I can do the same thing again, so people bracket 1 dot name 805 00:34:35,620 --> 00:34:37,300 gets Rodrigo. 806 00:34:37,300 --> 00:34:43,719 People bracket 1 dot number gets 617 555 0101 semicolon. 807 00:34:43,719 --> 00:34:47,679 People bracket 2 dot name gets Brian. 808 00:34:47,679 --> 00:34:54,130 And people bracket 2 dot number gets 617 555-- 809 00:34:54,130 --> 00:34:56,469 555-- 0102. 810 00:34:56,469 --> 00:34:58,900 And then lastly-- it's getting tedious quickly. 811 00:34:58,900 --> 00:35:02,380 But in an ideal world, we would just ask the human for these inputs. 812 00:35:02,380 --> 00:35:03,880 Name will be mine. 813 00:35:03,880 --> 00:35:07,570 And then lastly, people bracket 3 dot number equals, quote unquote, 814 00:35:07,570 --> 00:35:10,390 "617 555 0103." 815 00:35:10,390 --> 00:35:11,750 Whew. 816 00:35:11,750 --> 00:35:14,750 So it's a little more to write in this case. 817 00:35:14,750 --> 00:35:16,990 And so it might rub you the wrong way in that sense. 818 00:35:16,990 --> 00:35:19,990 But notice that we're now kind of encapsulating everything together. 819 00:35:19,990 --> 00:35:23,920 We only have four values, each of which is a person. 820 00:35:23,920 --> 00:35:26,950 And each of those persons, inside of them, so to speak, 821 00:35:26,950 --> 00:35:28,570 have a name and a number. 822 00:35:28,570 --> 00:35:30,700 And everything is intricately related. 823 00:35:30,700 --> 00:35:33,440 So even if I sort these things by name, 824 00:35:33,440 --> 00:35:37,790 they're going to end up having the same associations between numbers and names. 825 00:35:37,790 --> 00:35:40,720 So now the last thing I have to do is change my logic down here. 826 00:35:40,720 --> 00:35:45,940 It's not sufficient anymore to compare names bracket i against Emma. 827 00:35:45,940 --> 00:35:48,650 What should I compare name against Emma? 828 00:35:48,650 --> 00:35:50,090 AUDIENCE: [INAUDIBLE].
829 00:35:50,090 --> 00:35:51,820 SPEAKER 1: Dot name. 830 00:35:51,820 --> 00:35:55,180 And then down here, numbers doesn't even-- oh, and this was-- 831 00:35:55,180 --> 00:35:56,680 this is people. 832 00:35:56,680 --> 00:35:58,000 Numbers doesn't exist either. 833 00:35:58,000 --> 00:35:58,840 It's people. 834 00:35:58,840 --> 00:36:00,340 But I want to print her number here. 835 00:36:00,340 --> 00:36:02,800 So I do dot number. 836 00:36:02,800 --> 00:36:04,930 So again, we've added a little bit of complexity 837 00:36:04,930 --> 00:36:07,540 by adding typedef and these dot notations. 838 00:36:07,540 --> 00:36:11,680 But if I go ahead and make my phone book now, all too many errors. 839 00:36:11,680 --> 00:36:13,360 Oh, interesting. 840 00:36:13,360 --> 00:36:18,310 Array index 4 is past the end of the array, which contains four elements. 841 00:36:18,310 --> 00:36:21,340 So I made a stupid mistake here. 842 00:36:21,340 --> 00:36:22,185 What did I do? 843 00:36:22,185 --> 00:36:23,060 AUDIENCE: [INAUDIBLE] 844 00:36:23,060 --> 00:36:23,727 SPEAKER 1: Yeah. 845 00:36:23,727 --> 00:36:25,990 So I just kept incrementing incorrectly. 846 00:36:25,990 --> 00:36:28,630 Let me save that, run make phone book, Enter. 847 00:36:28,630 --> 00:36:29,650 Now it's good. 848 00:36:29,650 --> 00:36:35,080 Dot slash phone book, Enter, and hopefully I will see Emma's number. 849 00:36:35,080 --> 00:36:37,240 So it's no more correct than before. 850 00:36:37,240 --> 00:36:40,900 But it's arguably better designed, and we'll come back to this later 851 00:36:40,900 --> 00:36:41,630 in the semester. 852 00:36:41,630 --> 00:36:43,338 [? As ?]
you choose your choice of tracks 853 00:36:43,338 --> 00:36:47,230 and start implementing applications for the web or mobile devices or games, 854 00:36:47,230 --> 00:36:50,020 it's going to be quite common to encapsulate related information 855 00:36:50,020 --> 00:36:52,330 like this so that you keep lots of information 856 00:36:52,330 --> 00:36:55,000 together, especially when you use something called a database. 857 00:36:55,000 --> 00:36:55,738 Yeah? 858 00:36:55,738 --> 00:36:58,037 AUDIENCE: [INAUDIBLE] 859 00:36:58,037 --> 00:37:00,620 SPEAKER 1: Is there any shortcut for writing everything I did? 860 00:37:00,620 --> 00:37:02,760 Yes, you can actually use curly bracket notation. 861 00:37:02,760 --> 00:37:05,510 It gets a little uglier in this case so I'm not going to bother doing it. 862 00:37:05,510 --> 00:37:06,927 But, yes, there is a way to do it. 863 00:37:06,927 --> 00:37:08,968 However, this is, at the end of the day, realize, 864 00:37:08,968 --> 00:37:11,180 kind of a silly program because I'm writing a program 865 00:37:11,180 --> 00:37:13,550 to find Emma in a list of names I already wrote. 866 00:37:13,550 --> 00:37:14,700 So it's not dynamic at all. 867 00:37:14,700 --> 00:37:18,230 So in an ideal world, we would be using get string or something fancier anyway. 868 00:37:18,230 --> 00:37:21,180 Other questions on this? 869 00:37:21,180 --> 00:37:21,860 All right. 870 00:37:21,860 --> 00:37:24,650 So this is only to say we clearly have the ability, then, 871 00:37:24,650 --> 00:37:30,140 in code to implement these ideas, like [? Nizari ?] and Eric implemented 872 00:37:30,140 --> 00:37:33,980 more physically, using something like this array of lockers. 873 00:37:33,980 --> 00:37:35,640 So where do we go from here? 874 00:37:35,640 --> 00:37:38,450 Well, unfortunately, [? Nizari ?] 
benefited from the fact 875 00:37:38,450 --> 00:37:41,690 that the lockers were, of course, already 876 00:37:41,690 --> 00:37:43,940 sorted sort of behind her by Brian. 877 00:37:43,940 --> 00:37:45,638 But there was some price paid, right? 878 00:37:45,638 --> 00:37:48,680 Indeed, even we had to wait a little bit of time for all of those numbers 879 00:37:48,680 --> 00:37:50,513 to get sorted in the lockers before we could 880 00:37:50,513 --> 00:37:52,800 proceed to execute that algorithm. 881 00:37:52,800 --> 00:37:55,350 So a question, then, reasonable to ask is, well, 882 00:37:55,350 --> 00:37:57,590 how expensive is it to sort numbers? 883 00:37:57,590 --> 00:38:00,080 And should you sort numbers and then search? 884 00:38:00,080 --> 00:38:02,120 Or should you just jump right into searching 885 00:38:02,120 --> 00:38:05,150 and not worry about sorting the numbers, especially if one 886 00:38:05,150 --> 00:38:06,650 might be more costly than the other? 887 00:38:06,650 --> 00:38:08,483 These are going to be ultimately trade-offs. 888 00:38:08,483 --> 00:38:09,960 So let's consider them as follows. 889 00:38:09,960 --> 00:38:13,190 If now the problem at hand is to provide as input to our problem 890 00:38:13,190 --> 00:38:16,160 an unsorted list of numbers, the goal of which 891 00:38:16,160 --> 00:38:19,010 is to get a sorted list of numbers back out, 892 00:38:19,010 --> 00:38:20,990 how do we go about implementing this? 893 00:38:20,990 --> 00:38:26,720 For instance, if the numbers are 7, 2, 1, 6, 3, 4, 50 in that order, 894 00:38:26,720 --> 00:38:29,390 that unsorted order, the goal at hand is to get out 895 00:38:29,390 --> 00:38:34,700 1, 2, 3, 4, 6, 7, 50 in sorted order from left to right, 896 00:38:34,700 --> 00:38:36,570 smallest to largest. 897 00:38:36,570 --> 00:38:39,710 So how can we go about implementing that idea? 898 00:38:39,710 --> 00:38:41,110 Well, let me go ahead and see.
899 00:38:41,110 --> 00:38:42,470 We have a few stress balls left. 900 00:38:42,470 --> 00:38:44,803 And we could perhaps do this a little dramatically maybe 901 00:38:44,803 --> 00:38:47,330 with eight volunteers, if you will. 902 00:38:47,330 --> 00:38:48,350 OK, that's a plan. 903 00:38:48,350 --> 00:38:58,100 OK, so 1, about 2, 3, if we could, OK, 4 in the middle there, 5, 6, 7, and let's 904 00:38:58,100 --> 00:38:58,820 see-- 905 00:38:58,820 --> 00:39:02,250 and let's see, [INAUDIBLE] can come up here. 906 00:39:02,250 --> 00:39:03,170 Can we do it after? 907 00:39:03,170 --> 00:39:03,808 OK, thanks. 908 00:39:03,808 --> 00:39:05,850 And how about-- wait, I saw a hand in the middle. 909 00:39:05,850 --> 00:39:07,770 How about eight, volunteered by your friends. 910 00:39:07,770 --> 00:39:08,780 Come on up. 911 00:39:08,780 --> 00:39:10,042 So come on up, if you would. 912 00:39:10,042 --> 00:39:11,750 And Brian, if we could go ahead and equip 913 00:39:11,750 --> 00:39:15,230 our volunteers each with a number. 914 00:39:15,230 --> 00:39:18,320 We're going to go ahead and see if we can't solve together 915 00:39:18,320 --> 00:39:25,310 the idea of finding an algorithm for sorting the numbers at hand. 916 00:39:25,310 --> 00:39:27,770 So in just a moment, each of you will be handed a number. 917 00:39:27,770 --> 00:39:30,520 In the meantime, let's go ahead and just say a quick introduction, 918 00:39:30,520 --> 00:39:32,160 who you are, and perhaps your house. 919 00:39:32,160 --> 00:39:36,080 AUDIENCE: [? Crus, ?] Dudley House, from Germany. 920 00:39:36,080 --> 00:39:38,410 AUDIENCE: Curtis, just here visiting. 921 00:39:38,410 --> 00:39:39,696 SPEAKER 1: Wonderful. 922 00:39:39,696 --> 00:39:43,775 AUDIENCE: Ali, freshman, [INAUDIBLE],, from Turkey. 923 00:39:43,775 --> 00:39:46,110 AUDIENCE: Farah [? Foho, ?] from Detroit. 924 00:39:46,110 --> 00:39:46,910 925 00:39:46,910 --> 00:39:47,660 SPEAKER 1: Nice. 
926 00:39:47,660 --> 00:39:52,151 AUDIENCE: Allison, Hollis because I'm first year, from Cleveland. 927 00:39:52,151 --> 00:39:53,150 AUDIENCE: I'm Claude. 928 00:39:53,150 --> 00:39:53,720 I'm in Mauer. 929 00:39:53,720 --> 00:39:55,290 And I'm from Virginia. 930 00:39:55,290 --> 00:39:57,260 AUDIENCE: I'm [? Rohil. ?] I'm in Wigglesworth. 931 00:39:57,260 --> 00:39:58,946 And I'm from Atlanta. 932 00:39:58,946 --> 00:40:01,250 AUDIENCE: I'm [? Yowell. ?] I'm also from Wigglesworth. 933 00:40:01,250 --> 00:40:02,505 And I'm from New York. 934 00:40:02,505 --> 00:40:03,390 AUDIENCE: I'm Bonnie. 935 00:40:03,390 --> 00:40:03,665 I'm in Lowell. 936 00:40:03,665 --> 00:40:05,457 I'm from Beijing and [? Ann ?] [? Arbor. ?] 937 00:40:05,457 --> 00:40:06,530 SPEAKER 1: Wonderful. 938 00:40:06,530 --> 00:40:10,360 And I'm noticing now, as you might be too, we have nine volunteers on stage. 939 00:40:10,360 --> 00:40:12,110 So we're going to go ahead and solve this. 940 00:40:12,110 --> 00:40:12,740 That's OK. 941 00:40:12,740 --> 00:40:13,270 What's your name again? 942 00:40:13,270 --> 00:40:13,960 AUDIENCE: Bonnie. 943 00:40:13,960 --> 00:40:14,770 SPEAKER 1: Bonnie, come on over here. 944 00:40:14,770 --> 00:40:16,937 You're going to be maybe my assistant, if you could, 945 00:40:16,937 --> 00:40:18,592 as we sort these elements. 946 00:40:18,592 --> 00:40:20,300 Let's go ahead and give you the mic here. 947 00:40:20,300 --> 00:40:23,360 Each of you has been handed a number that 948 00:40:23,360 --> 00:40:27,330 happens to match with this, which is just an unsorted list of numbers. 949 00:40:27,330 --> 00:40:30,530 And let me just ask that our eight volunteers here sort yourselves. 950 00:40:30,530 --> 00:40:32,752 Go. 951 00:40:32,752 --> 00:40:35,353 [INTERPOSING VOICES] 952 00:40:35,353 --> 00:40:37,520 SPEAKER 1: And I'll have you direct them after this. 953 00:40:37,520 --> 00:40:41,200 954 00:40:41,200 --> 00:40:41,830 Excellent.
955 00:40:41,830 --> 00:40:42,910 Very well done. 956 00:40:42,910 --> 00:40:45,010 [APPLAUSE] 957 00:40:45,010 --> 00:40:45,727 OK. 958 00:40:45,727 --> 00:40:47,560 So let me ask any of you, and we'll hand you 959 00:40:47,560 --> 00:40:50,200 the mic, if need be, what was the algorithm you used to sort yourselves? 960 00:40:50,200 --> 00:40:51,540 AUDIENCE: Human intuition. 961 00:40:51,540 --> 00:40:53,370 SPEAKER 1: Human intuition, OK. 962 00:40:53,370 --> 00:40:55,130 [LAUGHTER] 963 00:40:55,130 --> 00:40:55,630 Nice. 964 00:40:55,630 --> 00:40:58,080 [APPLAUSE] 965 00:40:58,080 --> 00:40:59,320 966 00:40:59,320 --> 00:40:59,820 Nice. 967 00:40:59,820 --> 00:41:01,220 Other formulations? 968 00:41:01,220 --> 00:41:01,720 Yeah? 969 00:41:01,720 --> 00:41:04,642 970 00:41:04,642 --> 00:41:09,160 AUDIENCE: I just checked if the person who's left 971 00:41:09,160 --> 00:41:13,320 me, who is supposed to be larger than me is larger than me. 972 00:41:13,320 --> 00:41:17,880 And if he was larger than me, then I stayed there. 973 00:41:17,880 --> 00:41:20,535 And if I was larger than him, I just switched places with him. 974 00:41:20,535 --> 00:41:21,660 SPEAKER 1: OK, I like that. 975 00:41:21,660 --> 00:41:23,400 It's sort [? of a ?] locally optimum approach, 976 00:41:23,400 --> 00:41:25,942 where you just kind of look to the left and right and sort of 977 00:41:25,942 --> 00:41:27,630 fix any transpositions or mismatches. 978 00:41:27,630 --> 00:41:30,172 And in fact, let's go ahead and try and apply that same idea. 979 00:41:30,172 --> 00:41:32,790 Can all eight of you reorder yourselves, just like that, 980 00:41:32,790 --> 00:41:34,620 so that you're standing below your number 981 00:41:34,620 --> 00:41:39,820 so that we're undoing the human intuition that we just executed. 
982 00:41:39,820 --> 00:41:42,450 And now let's go ahead and say, all right, so, Bonnie, 983 00:41:42,450 --> 00:41:44,560 if you don't mind helping direct us there-- 984 00:41:44,560 --> 00:41:47,610 direct us here, we clearly have now an unsorted list of numbers. 985 00:41:47,610 --> 00:41:49,928 Let's just bite off this problem one bit at a time. 986 00:41:49,928 --> 00:41:51,720 So for instance, you two, your names again? 987 00:41:51,720 --> 00:41:52,350 AUDIENCE: Tris. 988 00:41:52,350 --> 00:41:52,740 SPEAKER 1: Tris. 989 00:41:52,740 --> 00:41:53,280 AUDIENCE: Curtis. 990 00:41:53,280 --> 00:41:53,940 SPEAKER 1: And Curtis. 991 00:41:53,940 --> 00:41:55,655 So you guys are clearly out of order. 992 00:41:55,655 --> 00:41:57,780 So what would be the locally optimal solution here. 993 00:41:57,780 --> 00:41:58,800 AUDIENCE: They would switch orders. 994 00:41:58,800 --> 00:42:00,050 SPEAKER 1: OK, please do that. 995 00:42:00,050 --> 00:42:01,885 All right, now let's consider 6 and 8. 996 00:42:01,885 --> 00:42:02,910 AUDIENCE: They're fine. 997 00:42:02,910 --> 00:42:03,892 SPEAKER 1: OK, 8 and 5? 998 00:42:03,892 --> 00:42:05,100 AUDIENCE: Let's switch again. 999 00:42:05,100 --> 00:42:06,392 SPEAKER 1: Please switch again. 1000 00:42:06,392 --> 00:42:07,140 8 and 2? 1001 00:42:07,140 --> 00:42:08,190 AUDIENCE: Switch. 1002 00:42:08,190 --> 00:42:09,030 SPEAKER 1: OK. 1003 00:42:09,030 --> 00:42:09,840 8 and 7? 1004 00:42:09,840 --> 00:42:11,063 AUDIENCE: Switch. 1005 00:42:11,063 --> 00:42:11,855 SPEAKER 1: 8 and 4? 1006 00:42:11,855 --> 00:42:13,110 AUDIENCE: Switch. 1007 00:42:13,110 --> 00:42:13,860 SPEAKER 1: 8 and-- 1008 00:42:13,860 --> 00:42:14,490 AUDIENCE: 1. 1009 00:42:14,490 --> 00:42:14,685 1010 00:42:14,685 --> 00:42:14,880 SPEAKER 1: --1? 1011 00:42:14,880 --> 00:42:15,480 AUDIENCE: Switch. 1012 00:42:15,480 --> 00:42:16,355 SPEAKER 1: All right. 1013 00:42:16,355 --> 00:42:17,930 So have we solved the problem? 
1014 00:42:17,930 --> 00:42:18,605 AUDIENCE: No. 1015 00:42:18,605 --> 00:42:20,730 SPEAKER 1: OK, no, obviously not, but is it better? 1016 00:42:20,730 --> 00:42:23,160 Are we closer to the solution? 1017 00:42:23,160 --> 00:42:27,493 I'd argue we are closer because, right, like 8 somehow made its way all 1018 00:42:27,493 --> 00:42:30,660 the way to the correct destination, even though we still have kind of a mess 1019 00:42:30,660 --> 00:42:31,920 here to fix. 1020 00:42:31,920 --> 00:42:35,580 But notice that the solution got better in this direction and a little better 1021 00:42:35,580 --> 00:42:36,210 this direction. 1022 00:42:36,210 --> 00:42:37,270 But we're going to do this again. 1023 00:42:37,270 --> 00:42:38,715 So Bonnie, can you direct us once more? 1024 00:42:38,715 --> 00:42:39,325 AUDIENCE: Yes. 1025 00:42:39,325 --> 00:42:44,100 So if you would proceed from this order, you two would switch. 1026 00:42:44,100 --> 00:42:45,802 SPEAKER 1: 5 and 6? 1027 00:42:45,802 --> 00:42:47,240 AUDIENCE: Let's switch again. 1028 00:42:47,240 --> 00:42:48,330 SPEAKER 1: 6 and 2? 1029 00:42:48,330 --> 00:42:50,630 AUDIENCE: Remain, and then the next person-- 1030 00:42:50,630 --> 00:42:51,450 SPEAKER 1: 7 and 4? 1031 00:42:51,450 --> 00:42:52,440 AUDIENCE: 7 and 4 switch. 1032 00:42:52,440 --> 00:42:52,620 SPEAKER 1: Nice. 1033 00:42:52,620 --> 00:42:53,120 7 and 1? 1034 00:42:53,120 --> 00:42:54,162 AUDIENCE: 1 and 7 switch. 1035 00:42:54,162 --> 00:42:54,780 And then-- 1036 00:42:54,780 --> 00:42:55,560 SPEAKER 1: So now are we done? 1037 00:42:55,560 --> 00:42:56,233 AUDIENCE: No. 1038 00:42:56,233 --> 00:42:58,650 SPEAKER 1: So no, but look, the problem is getting better. 1039 00:42:58,650 --> 00:43:01,968 It's closer to solution because now we have 8 in place and 7 in place. 1040 00:43:01,968 --> 00:43:04,260 So we've taken a bite out of the problem, if you would. 1041 00:43:04,260 --> 00:43:05,670 Now, we can do this a little more rapid. 
1042 00:43:05,670 --> 00:43:08,580 So if you want to tell everyone what to do pairwise, pretty quickly. 1043 00:43:08,580 --> 00:43:09,080 Go. 1044 00:43:09,080 --> 00:43:12,187 AUDIENCE: So everyone, just if you're-- 1045 00:43:12,187 --> 00:43:13,650 [LAUGHTER] 1046 00:43:13,650 --> 00:43:15,408 SPEAKER 1: Human intuition, if you would. 1047 00:43:15,408 --> 00:43:16,450 But let's do it pairwise. 1048 00:43:16,450 --> 00:43:17,210 AUDIENCE: OK. 1049 00:43:17,210 --> 00:43:18,360 Sure. 1050 00:43:18,360 --> 00:43:21,240 Could everyone if the person on your right is smaller than you, 1051 00:43:21,240 --> 00:43:25,453 switch with them and then do that again. 1052 00:43:25,453 --> 00:43:26,120 SPEAKER 1: Good. 1053 00:43:26,120 --> 00:43:27,540 AUDIENCE: Do that again, again. 1054 00:43:27,540 --> 00:43:29,231 SPEAKER 1: Good. 1055 00:43:29,231 --> 00:43:30,010 AUDIENCE: Again. 1056 00:43:30,010 --> 00:43:32,670 1057 00:43:32,670 --> 00:43:33,833 And then one last time. 1058 00:43:33,833 --> 00:43:34,500 SPEAKER 1: Yeah. 1059 00:43:34,500 --> 00:43:36,810 So even though we allowed it to get a little organic there at the end, 1060 00:43:36,810 --> 00:43:38,100 now is the list sorted? 1061 00:43:38,100 --> 00:43:39,570 AUDIENCE: Yeah. 1062 00:43:39,570 --> 00:43:40,722 SPEAKER 1: [LAUGHS] Yes. 1063 00:43:40,722 --> 00:43:42,930 So maybe a round of applause for our volunteers here. 1064 00:43:42,930 --> 00:43:44,260 And thank you to Bonnie, especially. 1065 00:43:44,260 --> 00:43:44,760 Thank you. 1066 00:43:44,760 --> 00:43:47,160 [APPLAUSE] 1067 00:43:47,160 --> 00:43:47,660 1068 00:43:47,660 --> 00:43:49,950 Brian, here we have a stress ball for each of you. 1069 00:43:49,950 --> 00:43:51,090 And thank you so much. 1070 00:43:51,090 --> 00:43:54,160 So let's see if we can't now formalize-- 1071 00:43:54,160 --> 00:43:56,350 feel free to make your way off to either side.
1072 00:43:56,350 --> 00:44:01,080 Let's see if we can't formalize exactly what it is these volunteers wonderfully 1073 00:44:01,080 --> 00:44:03,870 did at Bonnie's direction to get this list sorted. 1074 00:44:03,870 --> 00:44:06,870 It turns out that what everyone did here has a name. 1075 00:44:06,870 --> 00:44:10,230 It's an algorithm known as bubble sort because as you notice, 1076 00:44:10,230 --> 00:44:14,110 the 8 initially kind of bubbled its way up from left to right, 1077 00:44:14,110 --> 00:44:16,837 and then the 7 kind of bubbled its way up from left to right. 1078 00:44:16,837 --> 00:44:19,920 And as they repeated, even though we did it more quickly at the [? end, ?] 1079 00:44:19,920 --> 00:44:23,070 [? the ?] bigger numbers bubble their way all the way up until they were 1080 00:44:23,070 --> 00:44:24,060 in the right place. 1081 00:44:24,060 --> 00:44:26,580 So in pseudocode, I'd argue that what we did was this. 1082 00:44:26,580 --> 00:44:29,460 Bonnie directed our audience, at an increasing speed, 1083 00:44:29,460 --> 00:44:32,340 to repeat the following n minus 1 times. 1084 00:44:32,340 --> 00:44:33,390 Why n minus 1? 1085 00:44:33,390 --> 00:44:36,900 Well, if you've got n people and they're comparing each other, 1086 00:44:36,900 --> 00:44:40,800 you can only compare people n minus 1 times if you have n people. 1087 00:44:40,800 --> 00:44:44,370 So she told them to do this n minus 1 times in total 1088 00:44:44,370 --> 00:44:47,130 for i from 0 to n minus 2. 1089 00:44:47,130 --> 00:44:48,960 Now what's that actually referring to? 1090 00:44:48,960 --> 00:44:50,610 So this i is our index. 1091 00:44:50,610 --> 00:44:52,980 So it's kind of like treating our humans like an array. 1092 00:44:52,980 --> 00:44:54,030 What did we do? 1093 00:44:54,030 --> 00:45:00,130 If the ith person, starting at 0, and the ith plus 1 person are out of order, 1094 00:45:00,130 --> 00:45:02,910 what did she tell them to do? 
1095 00:45:02,910 --> 00:45:05,680 Switch places or swap, so to speak. 1096 00:45:05,680 --> 00:45:07,350 And so this looks pretty technical. 1097 00:45:07,350 --> 00:45:11,790 But it's really just a pseudocode way of distilling into more succinct English, 1098 00:45:11,790 --> 00:45:15,860 with some numbers involved, what it is Bonnie was directing everyone to do. 1099 00:45:15,860 --> 00:45:17,610 She said do the following n minus 1 times. 1100 00:45:17,610 --> 00:45:20,610 That's why it went on for several rotations, quicker and quicker. 1101 00:45:20,610 --> 00:45:23,160 She then pretty much treated the first person 1102 00:45:23,160 --> 00:45:27,580 as bracket 0, the next person as bracket 1, bracket 2, just like an array, 1103 00:45:27,580 --> 00:45:28,740 albeit of humans. 1104 00:45:28,740 --> 00:45:31,950 And then she compared them side by side, calling one person 1105 00:45:31,950 --> 00:45:34,320 i and the person next to them i plus 1. 1106 00:45:34,320 --> 00:45:37,815 And if they were out of order, it was swapped and again and again 1107 00:45:37,815 --> 00:45:39,440 and again till this algorithm executed. 1108 00:45:39,440 --> 00:45:43,080 Until finally, the whole thing was hopefully sorted. 1109 00:45:43,080 --> 00:45:46,320 How many times did it-- 1110 00:45:46,320 --> 00:45:47,880 how many steps did it take? 1111 00:45:47,880 --> 00:45:48,910 How long did it take? 1112 00:45:48,910 --> 00:45:52,200 What's the running time in big O notation of bubble sort? 1113 00:45:52,200 --> 00:45:54,810 Well, the outer loop takes n minus 1 steps. 1114 00:45:54,810 --> 00:45:59,820 The inner loop also takes n minus 1 steps because it's 0 through n minus 2. 1115 00:45:59,820 --> 00:46:02,800 And so if we go ahead and multiply that out, ala FOIL, 1116 00:46:02,800 --> 00:46:06,640 we have n squared minus 1n minus 1n plus 1. 1117 00:46:06,640 --> 00:46:10,380 If we combine like terms, we now have n squared minus 2n plus 1. 
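The pseudocode just described can be sketched in C roughly as follows. This is a direct transliteration under stated assumptions: the function name `bubble_sort` and the parameter names are hypothetical, and the values are assumed to be plain `int`s sorted smallest to largest.

```c
// Bubble sort, following the lecture's pseudocode:
//   repeat n - 1 times
//     for i from 0 to n - 2
//       if the i'th and i+1'th elements are out of order, swap them
void bubble_sort(int values[], int n)
{
    for (int pass = 0; pass < n - 1; pass++)
    {
        for (int i = 0; i <= n - 2; i++)
        {
            if (values[i] > values[i + 1])
            {
                // Swap the two neighbors via a temporary variable.
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
            }
        }
    }
}
```

Called on the volunteers' order 6, 3, 8, 5, 2, 7, 4, 1 with n equal to 8, each of the 7 passes makes 7 comparisons, which is the (n - 1) times (n - 1) product being multiplied out above.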
1118 00:46:10,380 --> 00:46:12,660 But at this point, what matters ultimately 1119 00:46:12,660 --> 00:46:16,800 is that the highest order term, the n squared, is what ultimately dominates. 1120 00:46:16,800 --> 00:46:19,860 The bigger n gets, the more impact that n squared has. 1121 00:46:19,860 --> 00:46:22,410 And so a computer scientist would say that bubble sort 1122 00:46:22,410 --> 00:46:25,720 is on the order of n squared. 1123 00:46:25,720 --> 00:46:29,970 So if we add to our list from before the algorithm's upper bounds, 1124 00:46:29,970 --> 00:46:33,750 we can now put bubble sort way up at the top, unfortunately, which 1125 00:46:33,750 --> 00:46:36,240 is to say that sorting numbers with bubble sort 1126 00:46:36,240 --> 00:46:41,085 is apparently way more expensive than linearly searching or binary searching. 1127 00:46:41,085 --> 00:46:42,960 And so it kind of invites the question, then, 1128 00:46:42,960 --> 00:46:45,168 with Eric and [? Nizari ?] when they came up earlier. 1129 00:46:45,168 --> 00:46:47,550 Yes, [? Nizari's ?] algorithm was better. 1130 00:46:47,550 --> 00:46:50,520 But it was better in the sense that it ran faster. 1131 00:46:50,520 --> 00:46:53,650 But it presupposed what, just to be clear? 1132 00:46:53,650 --> 00:46:54,583 AUDIENCE: [INAUDIBLE] 1133 00:46:54,583 --> 00:46:56,250 SPEAKER 1: That the numbers were sorted. 1134 00:46:56,250 --> 00:46:59,040 And so it's a little misleading to say that binary search is 1135 00:46:59,040 --> 00:47:00,570 better than linear search. 1136 00:47:00,570 --> 00:47:02,490 Because if it costs you a huge amount of time 1137 00:47:02,490 --> 00:47:04,573 to sort those elements so that, then, [? Nizari ?] 1138 00:47:04,573 --> 00:47:07,530 can go ahead and execute binary search, it might be a wash, 1139 00:47:07,530 --> 00:47:09,390 or it might even be a net negative. 
1140 00:47:09,390 --> 00:47:11,400 So it's really going to depend on, well, are you 1141 00:47:11,400 --> 00:47:13,698 searching more often, more than once? 1142 00:47:13,698 --> 00:47:15,990 Are you searching lots and lots and lots of times, such 1143 00:47:15,990 --> 00:47:18,210 that it's worth it to sort it once and then 1144 00:47:18,210 --> 00:47:20,670 benefit long term by much faster code? 1145 00:47:20,670 --> 00:47:23,470 Well, what about omega for bubble sort? 1146 00:47:23,470 --> 00:47:25,930 Bubble sort's code, again, looked like this. 1147 00:47:25,930 --> 00:47:30,040 And frankly, it doesn't really take into account at all good inputs, right? 1148 00:47:30,040 --> 00:47:33,150 Like, the best possible input to any sorting algorithm most likely 1149 00:47:33,150 --> 00:47:34,817 is it's already sorted for you, right? 1150 00:47:34,817 --> 00:47:36,900 Because if it's already sorted, presumably there's 1151 00:47:36,900 --> 00:47:38,190 no actual work to be done. 1152 00:47:38,190 --> 00:47:40,020 How lucky would that be? 1153 00:47:40,020 --> 00:47:42,660 But bubble sort, as defined, is kind of stupid, right? 1154 00:47:42,660 --> 00:47:45,360 It doesn't say if already sorted, quit. 1155 00:47:45,360 --> 00:47:48,450 It just blindly does the following n minus 1 times 1156 00:47:48,450 --> 00:47:51,480 and then inside of that does something n minus 1 times. 1157 00:47:51,480 --> 00:47:54,360 So what's the lower bound on the running time of bubble sort, 1158 00:47:54,360 --> 00:47:58,347 even if you get lucky and the whole thing is already sorted for you? 1159 00:47:58,347 --> 00:47:59,430 AUDIENCE: [? n squared. ?] 1160 00:47:59,430 --> 00:48:02,760 SPEAKER 1: It's still n squared because it's still going to take as many steps 1161 00:48:02,760 --> 00:48:03,310 as before. 1162 00:48:03,310 --> 00:48:07,963 And so bubble sort as a lower bound, arguably, has omega of n squared.
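One way to check the lower-bound claim is to count comparisons. The sketch below runs bubble sort as defined in the lecture's pseudocode and returns how many neighbor comparisons it made; the function name `count_bubble_comparisons` is hypothetical, and counting comparisons is only one reasonable proxy for "steps".

```c
// Runs the lecture's bubble sort (no early exit) on values[0..n-1] and
// returns the number of neighbor comparisons performed.
long count_bubble_comparisons(int values[], int n)
{
    long comparisons = 0;
    for (int pass = 0; pass < n - 1; pass++)
    {
        for (int i = 0; i <= n - 2; i++)
        {
            comparisons++;
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
            }
        }
    }
    return comparisons;
}
```

Under these assumptions the count depends only on n, not on the order of the input: an already sorted list of eight values and a shuffled one both cost (8 - 1) times (8 - 1) comparisons, which is why the best case is no better than the worst case here.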
1163 00:48:07,963 --> 00:48:10,380 And let's see, Brian, if you wouldn't mind lending a hand, 1164 00:48:10,380 --> 00:48:13,740 let's see if we can't do better than that by taking maybe a fundamentally 1165 00:48:13,740 --> 00:48:16,680 different approach to sorting, as by laying out 1166 00:48:16,680 --> 00:48:18,750 something called selection sort. 1167 00:48:18,750 --> 00:48:21,560 So in selection sort, we have a similar set of numbers, 1168 00:48:21,560 --> 00:48:23,940 but we won't bother using something as large as 50. 1169 00:48:23,940 --> 00:48:26,200 Brian's going to kindly set them up in a random order, 1170 00:48:26,200 --> 00:48:28,200 but we happen to have a cheat sheet on the board 1171 00:48:28,200 --> 00:48:30,240 so that we can try this again if we need to. 1172 00:48:30,240 --> 00:48:35,400 And these numbers right now are unsorted from left to right. 1173 00:48:35,400 --> 00:48:39,620 And we have 1, 2, 3, 4, 5, 6, 7, 8 numbers in total here. 1174 00:48:39,620 --> 00:48:42,120 So bubble sort was nice because it leveraged your intuition, 1175 00:48:42,120 --> 00:48:43,770 where it will just look to the left, look to the right 1176 00:48:43,770 --> 00:48:45,220 and fix those small problems. 1177 00:48:45,220 --> 00:48:47,970 But honestly, a fundamentally different way to think about sorting 1178 00:48:47,970 --> 00:48:51,570 would be, well, if I know I want small to large, left to right, 1179 00:48:51,570 --> 00:48:53,610 why don't I just do that? 1180 00:48:53,610 --> 00:48:54,930 What is the smallest number? 1181 00:48:54,930 --> 00:48:58,470 Well, recall that these things, if they're implemented in an array, 1182 00:48:58,470 --> 00:48:59,760 might as well be in lockers. 1183 00:48:59,760 --> 00:49:01,960 I can't just use a human intuition in this case. 1184 00:49:01,960 --> 00:49:03,930 I have to look at each element individually.
1185 00:49:03,930 --> 00:49:05,610 But I'm not going to bother throwing them back in the locker 1186 00:49:05,610 --> 00:49:07,890 because that's just going to take unnecessary time. 1187 00:49:07,890 --> 00:49:08,880 But I look at 6. 1188 00:49:08,880 --> 00:49:11,453 6 is the smallest number I have seen thus far. 1189 00:49:11,453 --> 00:49:13,870 So at the moment, this is the smallest number in the list. 1190 00:49:13,870 --> 00:49:16,620 So I'm going to remember that with a variable in my mind. 1191 00:49:16,620 --> 00:49:17,400 Now I see 3. 1192 00:49:17,400 --> 00:49:20,460 3 is obviously less than 6, so I'm going to forget about 6 1193 00:49:20,460 --> 00:49:23,940 and just remember for now that 3 is the smallest element I've seen. 1194 00:49:23,940 --> 00:49:25,170 8 is no smaller. 1195 00:49:25,170 --> 00:49:26,250 5 is no smaller. 1196 00:49:26,250 --> 00:49:27,493 Ooh, 2 is smaller. 1197 00:49:27,493 --> 00:49:29,160 I'm going to remember 2 is the smallest. 1198 00:49:29,160 --> 00:49:31,050 I'm going to forget about the 3. 1199 00:49:31,050 --> 00:49:34,680 Meanwhile I keep going, 7, 4-- ooh, 1 is even smaller. 1200 00:49:34,680 --> 00:49:36,430 And so I've gotten to the end of the list. 1201 00:49:36,430 --> 00:49:38,610 The smallest element in this list is 1. 1202 00:49:38,610 --> 00:49:40,388 It obviously belongs over there. 1203 00:49:40,388 --> 00:49:41,430 So what can I do with it? 1204 00:49:41,430 --> 00:49:42,843 AUDIENCE: [INAUDIBLE]. 1205 00:49:42,843 --> 00:49:44,760 SPEAKER 1: Yeah, ideally I could just move it. 1206 00:49:44,760 --> 00:49:46,302 Now, maybe I should make room, right? 1207 00:49:46,302 --> 00:49:48,873 The table's a little small, or my array is a fixed size. 1208 00:49:48,873 --> 00:49:51,040 So I could start scooching everything over this way. 1209 00:49:51,040 --> 00:49:51,420 But you know what? 1210 00:49:51,420 --> 00:49:53,460 Frankly, that's going to take a while, right? [? I ?] 
1211 00:49:53,460 --> 00:49:54,990 have to move, like, seven elements. 1212 00:49:54,990 --> 00:49:59,370 Why don't I just kind of forcefully evict the 6, 1213 00:49:59,370 --> 00:50:03,050 put it over here, because after all, it was in random order in the first place. 1214 00:50:03,050 --> 00:50:05,850 Who cares if I move it someplace else even more random? 1215 00:50:05,850 --> 00:50:06,930 I'll deal with it later. 1216 00:50:06,930 --> 00:50:08,263 So you could do either approach. 1217 00:50:08,263 --> 00:50:09,510 You could shift everything. 1218 00:50:09,510 --> 00:50:11,490 But that feels like it'll take some time. 1219 00:50:11,490 --> 00:50:14,440 Or you can just evict whatever is in the place you want to be. 1220 00:50:14,440 --> 00:50:18,330 But what's nice now is that my list is closer to sorted. 1221 00:50:18,330 --> 00:50:21,010 The 1 is in its correct place. 1222 00:50:21,010 --> 00:50:24,535 So now all I have to look at is n minus 1 other elements. 1223 00:50:24,535 --> 00:50:25,410 So let's take a look. 1224 00:50:25,410 --> 00:50:26,785 What's the next smallest element? 1225 00:50:26,785 --> 00:50:30,070 At the moment, it's 3, still 3, still 3. 1226 00:50:30,070 --> 00:50:31,950 Oh, wait a minute, it looks like 2. 1227 00:50:31,950 --> 00:50:34,680 Now, you might want to just abort now and rip out the 2. 1228 00:50:34,680 --> 00:50:36,993 But you don't know necessarily, as the computer, 1229 00:50:36,993 --> 00:50:38,910 if you're only looking at one value at a time, 1230 00:50:38,910 --> 00:50:41,340 unless you have multiple variables in your mind, which 1231 00:50:41,340 --> 00:50:42,573 I'm not going to bother with. 1232 00:50:42,573 --> 00:50:44,490 Let me see if there's anything smaller than 2. 1233 00:50:44,490 --> 00:50:46,900 7, 4, 6-- no. 1234 00:50:46,900 --> 00:50:48,570 So I'm going to grab the 2. 1235 00:50:48,570 --> 00:50:51,312 And where do I want to put it? 1236 00:50:51,312 --> 00:50:52,020 Right over there.
1237 00:50:52,020 --> 00:50:52,832 And you know what? 1238 00:50:52,832 --> 00:50:54,040 This could be a net negative. 1239 00:50:54,040 --> 00:50:55,650 But I think it's going to average out. 1240 00:50:55,650 --> 00:50:59,160 I'm going to move the 3 to where I do have room and go ahead and claim 1241 00:50:59,160 --> 00:51:01,385 that my 2 is now sorted. 1242 00:51:01,385 --> 00:51:03,510 And I'm going to do this again and again and again. 1243 00:51:03,510 --> 00:51:05,760 And just like Bonnie did, I'm going to do it a little faster now, 1244 00:51:05,760 --> 00:51:06,677 walk through the list. 1245 00:51:06,677 --> 00:51:08,140 OK, 3 is the smallest. 1246 00:51:08,140 --> 00:51:11,277 I'm going to go ahead and put it in sorted order by evicting the 8. 1247 00:51:11,277 --> 00:51:12,360 Now I'm going to go ahead. 1248 00:51:12,360 --> 00:51:14,130 All right, 5, 8, 7-- 1249 00:51:14,130 --> 00:51:15,690 4 is now the smallest. 1250 00:51:15,690 --> 00:51:18,000 I'm going to go ahead and evict the 5, move it over 1251 00:51:18,000 --> 00:51:19,980 here, and claim that that's sorted. 1252 00:51:19,980 --> 00:51:23,250 Let me do it once more, 8, 7, 5, 6. 1253 00:51:23,250 --> 00:51:24,900 5 is clearly the smallest. 1254 00:51:24,900 --> 00:51:28,310 Let me go ahead and evict the 8 again, make room for the 5. 1255 00:51:28,310 --> 00:51:31,670 But I only have three steps left, 7, 8, 6. 1256 00:51:31,670 --> 00:51:36,420 Let me go ahead and move the 7 over here, put the 6 in place. 1257 00:51:36,420 --> 00:51:37,170 8 is the smallest. 1258 00:51:37,170 --> 00:51:38,280 No, 7 is smaller. 1259 00:51:38,280 --> 00:51:40,920 Let me go ahead and put it in place, evicting the 8. 1260 00:51:40,920 --> 00:51:46,360 Voila, hopefully now, oof, done but a fundamentally different algorithm, 1261 00:51:46,360 --> 00:51:46,860 right? 1262 00:51:46,860 --> 00:51:50,220 There was no pairwise swapping back and forth and back and forth.
1263 00:51:50,220 --> 00:51:53,670 Each time I sort of set my mind on a goal, get the next smallest element, 1264 00:51:53,670 --> 00:51:55,000 get the next smallest element. 1265 00:51:55,000 --> 00:51:59,460 And that is what we shall call selection sort, where on each iteration 1266 00:51:59,460 --> 00:52:01,510 you select the next smallest element. 1267 00:52:01,510 --> 00:52:05,880 So in pseudocode we might say this, for i from 0 to n minus 1. 1268 00:52:05,880 --> 00:52:07,580 And again, just adopt this habit now. 1269 00:52:07,580 --> 00:52:11,650 Any time in life, and certainly in a CS class, when you have n items, 1270 00:52:11,650 --> 00:52:15,300 the first one is ironically 1 but in this case, 0. 1271 00:52:15,300 --> 00:52:17,520 And the last one is n minus 1. 1272 00:52:17,520 --> 00:52:19,920 0 to n minus 1 is how a computer scientist 1273 00:52:19,920 --> 00:52:22,780 counts from 1 to n in the real world. 1274 00:52:22,780 --> 00:52:26,880 So this just says do the following n times but use i. 1275 00:52:26,880 --> 00:52:28,530 Start counting from 0. 1276 00:52:28,530 --> 00:52:32,280 Find the smallest item between the ith item and the last item. 1277 00:52:32,280 --> 00:52:33,510 What am I saying there? 1278 00:52:33,510 --> 00:52:36,030 Well, if I initialize i initially to 0, that's 1279 00:52:36,030 --> 00:52:38,820 just saying find the smallest element among all eight 1280 00:52:38,820 --> 00:52:42,810 and grab it, swap the smallest item with that ith item. 1281 00:52:42,810 --> 00:52:45,030 So wherever I found the smallest element, 1282 00:52:45,030 --> 00:52:47,160 go ahead and swap it with that one. 1283 00:52:47,160 --> 00:52:48,870 And then this algorithm-- whoops-- 1284 00:52:48,870 --> 00:52:51,570 is just going to repeat again and again and again. 1285 00:52:51,570 --> 00:52:56,440 It's almost a little more succinct to represent in pseudocode. 1286 00:52:56,440 --> 00:52:59,260 But it invites the question, then, is this better?
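The selection sort pseudocode just described might look roughly like this in C. This is a minimal sketch, not code from the lecture: the function name selection_sort, the int array, and the variable names are assumptions made for illustration.

```c
// Sort the first n values of the array in ascending order using selection sort.
// (Hypothetical helper; the lecture only gives pseudocode.)
void selection_sort(int values[], int n)
{
    // For i from 0 to n - 1
    for (int i = 0; i < n; i++)
    {
        // Find the smallest item between the ith item and the last item
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (values[j] < values[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest item with the ith item
        int tmp = values[i];
        values[i] = values[smallest];
        values[smallest] = tmp;
    }
}
```

Calling selection_sort on an array of eight numbers makes eight passes of the outer loop, and the inner loop scans everything to the right of position i on each pass, regardless of how the numbers are arranged.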
1287 00:52:59,260 --> 00:53:00,330 Is selection sort better? 1288 00:53:00,330 --> 00:53:01,950 Well, what would it mean for an algorithm to be better? 1289 00:53:01,950 --> 00:53:04,128 We have two rules of thumb, big O and omega. 1290 00:53:04,128 --> 00:53:04,920 So let's try those. 1291 00:53:04,920 --> 00:53:09,870 So in big O notation, how many steps does it take to sort a list of numbers 1292 00:53:09,870 --> 00:53:12,870 like I did, where you just again and again and again select 1293 00:53:12,870 --> 00:53:16,210 the smallest, the smallest, the smallest element? 1294 00:53:16,210 --> 00:53:18,210 Well, how do you even begin to think about that? 1295 00:53:18,210 --> 00:53:18,710 Yeah? 1296 00:53:18,710 --> 00:53:23,691 AUDIENCE: [INAUDIBLE] n squared because you have at iteration n 1297 00:53:23,691 --> 00:53:28,113 [? an ?] n minus 1 [INAUDIBLE]. 1298 00:53:28,113 --> 00:53:28,780 SPEAKER 1: Yeah. 1299 00:53:28,780 --> 00:53:29,730 That's the right intuition. 1300 00:53:29,730 --> 00:53:31,820 And let me back up just one step until we get to that. 1301 00:53:31,820 --> 00:53:33,168 The proposal was it's n squared. 1302 00:53:33,168 --> 00:53:34,960 And indeed, that's going to be the spoiler. 1303 00:53:34,960 --> 00:53:35,770 But why? 1304 00:53:35,770 --> 00:53:37,570 Well, if you actually started to count up 1305 00:53:37,570 --> 00:53:41,110 how many steps I was taking physically, right, to find the smallest element, 1306 00:53:41,110 --> 00:53:44,498 it's going to take me maybe seven steps to find the smallest 1307 00:53:44,498 --> 00:53:46,540 element because I'm going to look at all of them. 1308 00:53:46,540 --> 00:53:49,450 So in my first pass, I'm looking at all eight elements, 1309 00:53:49,450 --> 00:53:53,620 or taking almost n steps to find the smallest number, like 1. 1310 00:53:53,620 --> 00:53:55,300 But after that, the 1 was in place. 1311 00:53:55,300 --> 00:53:58,820 And I turned on its light bulbs, and that left seven numbers left.
1312 00:53:58,820 --> 00:54:00,490 And how many steps did I then take? 1313 00:54:00,490 --> 00:54:01,570 Well, n minus 1. 1314 00:54:01,570 --> 00:54:04,300 Then after the 2 was in place, how many steps? 1315 00:54:04,300 --> 00:54:07,090 n minus 2 and then n minus 3, n minus 4, dot, 1316 00:54:07,090 --> 00:54:09,490 dot, dot, until there was just one number left. 1317 00:54:09,490 --> 00:54:13,873 So that invites the question, what, then, does this total up to? 1318 00:54:13,873 --> 00:54:15,790 And indeed, you jumped to the right intuition. 1319 00:54:15,790 --> 00:54:18,730 If you start with n, and you add to that n minus 1 steps, 1320 00:54:18,730 --> 00:54:21,160 and you add to that n minus 2 steps, dot, dot, dot, 1321 00:54:21,160 --> 00:54:24,303 one final step once you get to the end of the list, 1322 00:54:24,303 --> 00:54:25,720 what does this actually sum up to? 1323 00:54:25,720 --> 00:54:26,710 It's actually not obvious. 1324 00:54:26,710 --> 00:54:29,050 And this is one of those things in life, unless you're a math major, 1325 00:54:29,050 --> 00:54:31,030 you probably would look at the back of a math textbook 1326 00:54:31,030 --> 00:54:32,860 or a physics textbook for those little cheat sheets 1327 00:54:32,860 --> 00:54:34,630 that they used to come with, at least in high school. 1328 00:54:34,630 --> 00:54:37,870 Allow me just to propose for today's sake, if you actually do out this math 1329 00:54:37,870 --> 00:54:40,840 or look it up at the back of a book, it ends up being this, 1330 00:54:40,840 --> 00:54:43,877 n times n plus 1 divided by 2. 1331 00:54:43,877 --> 00:54:45,460 And you can prove this mathematically.
1332 00:54:45,460 --> 00:54:48,820 But for our purposes, just trust me, if you will, 1333 00:54:48,820 --> 00:54:52,730 that adding a number plus the smaller number plus the smaller number plus 1334 00:54:52,730 --> 00:54:56,230 the smaller number all the way to 1, gives you this relationship, n times n 1335 00:54:56,230 --> 00:54:57,550 plus 1 divided by 2. 1336 00:54:57,550 --> 00:54:59,745 And it's fine if you just take that as fact. 1337 00:54:59,745 --> 00:55:01,120 So let me just multiply this out. 1338 00:55:01,120 --> 00:55:03,340 That's n squared plus n, all divided by 2. 1339 00:55:03,340 --> 00:55:05,890 That, of course, is n squared divided by 2 plus n/2. 1340 00:55:05,890 --> 00:55:08,140 But, again, who cares? 1341 00:55:08,140 --> 00:55:10,840 Big O notation would propose that we focus only on what? 1342 00:55:10,840 --> 00:55:11,765 AUDIENCE: n squared. 1343 00:55:11,765 --> 00:55:12,640 SPEAKER 1: n squared. 1344 00:55:12,640 --> 00:55:14,980 This frankly, is on the order of n squared, 1345 00:55:14,980 --> 00:55:17,650 exactly as you said, because as n gets large, 1346 00:55:17,650 --> 00:55:20,710 the only factor in that mathematical expression that we're really 1347 00:55:20,710 --> 00:55:23,890 going to care about is the one that gets bigger and bigger and bigger 1348 00:55:23,890 --> 00:55:26,030 faster than everything else. 1349 00:55:26,030 --> 00:55:28,870 So in terms of selection sort, it would seem that we 1350 00:55:28,870 --> 00:55:30,907 have big O of n squared for it as well. 1351 00:55:30,907 --> 00:55:32,740 So it's a fundamentally different algorithm, 1352 00:55:32,740 --> 00:55:34,720 but mathematically and in the real world, 1353 00:55:34,720 --> 00:55:36,410 it kind of works out to be the same. 1354 00:55:36,410 --> 00:55:38,080 So we haven't really done better yet. 1355 00:55:38,080 --> 00:55:40,330 What about omega for selection sort?
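The step count for selection sort walked through above can be written out as a short worked equation. This is a sketch of the arithmetic only; the summation index k is notation introduced here, not something from the lecture.

```latex
\begin{align*}
n + (n-1) + (n-2) + \dots + 1 &= \sum_{k=1}^{n} k = \frac{n(n+1)}{2} \\
&= \frac{n^2 + n}{2} = \frac{n^2}{2} + \frac{n}{2} \in O(n^2)
\end{align*}
```

For the eight numbers used in the demonstration, that is 8 times 9 divided by 2, or 36 steps. If you count only comparisons, the first pass takes n minus 1 of them rather than n, and the total becomes n times n minus 1 divided by 2 (28 when n is 8), which is still on the order of n squared.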
1356 00:55:40,330 --> 00:55:43,690 If the code for selection sort is this, does it 1357 00:55:43,690 --> 00:55:48,170 benefit from the list being sorted already? 1358 00:55:48,170 --> 00:55:51,920 Or is it just going to blindly do its order of n squared work 1359 00:55:51,920 --> 00:55:54,470 again and again anyway, right? 1360 00:55:54,470 --> 00:55:56,720 Like, this is opportune that the numbers are currently 1361 00:55:56,720 --> 00:55:59,540 sorted because we can make a point, well, this is the best case scenario. 1362 00:55:59,540 --> 00:56:01,020 I hand you the numbers 1 through 8. 1363 00:56:01,020 --> 00:56:01,800 They're already sorted. 1364 00:56:01,800 --> 00:56:03,467 And you try to use selection sort on it. 1365 00:56:03,467 --> 00:56:05,698 Well, you might think, ooh, it's in the right place. 1366 00:56:05,698 --> 00:56:07,490 I'm just going to grab the smallest number. 1367 00:56:07,490 --> 00:56:10,377 Now I'm going to grab the next smallest number and so forth. 1368 00:56:10,377 --> 00:56:11,210 But that's not true. 1369 00:56:11,210 --> 00:56:14,780 When I'm the computer, and I open the first locker, and I see the number 1, 1370 00:56:14,780 --> 00:56:16,810 do I know anything more about my numbers yet? 1371 00:56:16,810 --> 00:56:17,660 AUDIENCE: No. 1372 00:56:17,660 --> 00:56:18,140 SPEAKER 1: No, right? 1373 00:56:18,140 --> 00:56:21,223 You're using human intuition to see that, OK, obviously it's the smallest. 1374 00:56:21,223 --> 00:56:25,970 I, the program, do not know that until I look at the other numbers in the list. 1375 00:56:25,970 --> 00:56:29,360 And so again, if you just iterate through using selection sort, 1376 00:56:29,360 --> 00:56:31,460 you only know what's in front of you, which 1377 00:56:31,460 --> 00:56:35,390 means you're going to execute the exact same code again and again. 1378 00:56:35,390 --> 00:56:36,890 And that means the math is the same. 
1379 00:56:36,890 --> 00:56:39,680 Even in this best case, we are truly wasting our time 1380 00:56:39,680 --> 00:56:45,020 now with selection sort because it is going to be omega of n squared, too. 1381 00:56:45,020 --> 00:56:48,740 So my god, now we have two bad solutions to a problem. 1382 00:56:48,740 --> 00:56:49,950 Can we do better? 1383 00:56:49,950 --> 00:56:52,120 Well, let me propose we revisit bubble sort. 1384 00:56:52,120 --> 00:56:56,390 Bubble sort, again, just has you swap adjacent elements again and again 1385 00:56:56,390 --> 00:56:59,720 and again and again until you're all sorted. 1386 00:56:59,720 --> 00:57:02,960 But when might you want to stop going back and forth the list? 1387 00:57:02,960 --> 00:57:08,550 Like, when might Bonnie have wanted to say, ooh, that's enough work, I'm done? 1388 00:57:08,550 --> 00:57:11,550 If she walks through the list looking at every person, i and i 1389 00:57:11,550 --> 00:57:13,950 plus 1 next to each other, when might she 1390 00:57:13,950 --> 00:57:16,200 conclude that she's done doing that work of sorting? 1391 00:57:16,200 --> 00:57:19,984 1392 00:57:19,984 --> 00:57:20,484 Yeah? 1393 00:57:20,484 --> 00:57:22,400 AUDIENCE: [INAUDIBLE] 1394 00:57:22,400 --> 00:57:24,530 SPEAKER 1: Yeah, if there was a question she asked, 1395 00:57:24,530 --> 00:57:27,980 or if there was a pass she made walking through the volunteers 1396 00:57:27,980 --> 00:57:29,360 and didn't have to do any work. 1397 00:57:29,360 --> 00:57:31,485 She doesn't have to keep doing work again and again 1398 00:57:31,485 --> 00:57:34,280 just because the algorithm said to repeat it n minus 1 times. 1399 00:57:34,280 --> 00:57:37,820 We kind of want to have a condition in here, or some way of short circuiting 1400 00:57:37,820 --> 00:57:41,570 the algorithm, so that we stop once we're really just wasting our time.
1401 00:57:41,570 --> 00:57:43,820 And bubble sort lends itself to that because we 1402 00:57:43,820 --> 00:57:48,485 can tweak our wording of our pseudocode as follows, repeat until no swaps. 1403 00:57:48,485 --> 00:57:51,110 So again, it's opportune that these numbers are already sorted. 1404 00:57:51,110 --> 00:57:52,370 Let's try bubble sort on it. 1405 00:57:52,370 --> 00:57:54,998 So Bonnie probably would have said, compare 1 and 2. 1406 00:57:54,998 --> 00:57:56,040 They're not out of order. 1407 00:57:56,040 --> 00:57:57,260 So we don't have to swap. 1408 00:57:57,260 --> 00:58:02,570 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, 7 and 8, 1409 00:58:02,570 --> 00:58:04,580 she obviously did no swaps. 1410 00:58:04,580 --> 00:58:09,290 It would be stupid for her to go again just because the algorithm said 1411 00:58:09,290 --> 00:58:13,460 do this n minus 1 times because she's going to get no, no, no, no, 1412 00:58:13,460 --> 00:58:15,080 again and again as her answer. 1413 00:58:15,080 --> 00:58:19,520 So by saying repeat until no swaps, she can abort this algorithm early and then 1414 00:58:19,520 --> 00:58:23,327 have taken how many steps in this best case? 1415 00:58:23,327 --> 00:58:24,202 AUDIENCE: [INAUDIBLE] 1416 00:58:24,202 --> 00:58:26,119 SPEAKER 1: Yeah, technically n minus 1, right? 1417 00:58:26,119 --> 00:58:31,750 Because if this is n elements, or 8, you can compare seven pairs, 1, 2, 3, 4, 5, 1418 00:58:31,750 --> 00:58:35,000 6, 7, so n minus 1. 1419 00:58:35,000 --> 00:58:38,480 So she could, in the best case, then, have a lower bound 1420 00:58:38,480 --> 00:58:41,030 on running time for selection sort-- 1421 00:58:41,030 --> 00:58:45,620 of bubble sort no longer of n squared, but now n. 1422 00:58:45,620 --> 00:58:47,630 So it would seem that with a bit more cleverness 1423 00:58:47,630 --> 00:58:52,635 we can actually benefit in terms of the running time of these algorithms. 
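The "repeat until no swaps" version of bubble sort described above could be sketched in C as follows. The function name bubble_sort, the swapped flag, and the returned pass count are assumptions added for illustration; the lecture itself only shows pseudocode.

```c
#include <stdbool.h>

// Sort the first n values in ascending order using bubble sort,
// stopping as soon as a full pass makes no swaps.
// Returns the number of passes made (a hypothetical extra, handy for checking).
int bubble_sort(int values[], int n)
{
    int passes = 0;
    bool swapped = true;

    // Repeat until no swaps
    while (swapped)
    {
        swapped = false;
        passes++;

        // Compare each adjacent pair, i and i + 1
        for (int i = 0; i < n - 1; i++)
        {
            if (values[i] > values[i + 1])
            {
                // Out of order, so swap them
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
                swapped = true;
            }
        }
    }
    return passes;
}
```

On a list that is already sorted, the loop body runs once, makes n minus 1 comparisons, sees no swaps, and stops, which is where the lower bound of omega of n comes from.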
1424 00:58:52,635 --> 00:58:55,760 Well, let's see if we can't see these from a slightly different perspective 1425 00:58:55,760 --> 00:58:58,290 now by doing this visualization. 1426 00:58:58,290 --> 00:59:02,990 I'm going to go ahead and open up a graphical visualization of each 1427 00:59:02,990 --> 00:59:04,890 of these algorithms in turn. 1428 00:59:04,890 --> 00:59:09,890 So what you have here is an array of numbers, each of which 1429 00:59:09,890 --> 00:59:11,690 is represented by a vertical bar. 1430 00:59:11,690 --> 00:59:14,270 Short bar is small number, like 0, 1, 2. 1431 00:59:14,270 --> 00:59:18,500 Tall bar is big number, like 99 or 100 or anything in between. 1432 00:59:18,500 --> 00:59:20,330 This is a visualization tool online. 1433 00:59:20,330 --> 00:59:21,740 And we'll link this on the course's website 1434 00:59:21,740 --> 00:59:23,240 so that we can try these algorithms. 1435 00:59:23,240 --> 00:59:25,090 So let's try bubble sort, for instance. 1436 00:59:25,090 --> 00:59:26,660 I'm going to start it kind of slow. 1437 00:59:26,660 --> 00:59:30,860 But you can see highlighted in pink two elements being compared side 1438 00:59:30,860 --> 00:59:34,220 by side, i and i plus 1 being swapped if they're out of order. 1439 00:59:34,220 --> 00:59:37,460 So this is the graphical version of what Bonnie's instructions were 1440 00:59:37,460 --> 00:59:38,810 to our volunteers. 1441 00:59:38,810 --> 00:59:40,800 And now notice, bubble sort gets its name 1442 00:59:40,800 --> 00:59:43,550 because notice what's happening to apparently the biggest element. 1443 00:59:43,550 --> 00:59:46,880 It's sort of bubbling its way up all the way to the end. 1444 00:59:46,880 --> 00:59:48,710 Smaller elements are making progress. 1445 00:59:48,710 --> 00:59:51,410 Like a 15 and a 12 just moved a little bit to the left. 1446 00:59:51,410 --> 00:59:52,340 But they're not done. 1447 00:59:52,340 --> 00:59:54,020 They're not in their right places yet. 
1448 00:59:54,020 --> 00:59:58,100 But the big elements are starting to bubble all the way to the right. 1449 00:59:58,100 --> 01:00:00,600 Now, this gets a little tedious pretty quickly. 1450 01:00:00,600 --> 01:00:03,390 So I'm going to go ahead and speed up the animation speed. 1451 01:00:03,390 --> 01:00:04,650 And if we watch it now-- 1452 01:00:04,650 --> 01:00:06,650 same algorithm, it's just running faster-- 1453 01:00:06,650 --> 01:00:08,690 you can really see that the larger elements are 1454 01:00:08,690 --> 01:00:10,920 accumulating at the right-hand side. 1455 01:00:10,920 --> 01:00:12,830 So this is identical to our eight volunteers. 1456 01:00:12,830 --> 01:00:15,260 It's just each human now is represented by a bar. 1457 01:00:15,260 --> 01:00:19,070 And you can really see the larger numbers bubbling their way up 1458 01:00:19,070 --> 01:00:20,210 to the top. 1459 01:00:20,210 --> 01:00:23,810 But you can see perhaps more visually there's a lot of work here. 1460 01:00:23,810 --> 01:00:25,940 Bonnie was uttering a lot of sentences. 1461 01:00:25,940 --> 01:00:28,310 She was doing a lot of back and forth, because, just 1462 01:00:28,310 --> 01:00:32,120 as this pink bar suggests, it's going back and forth and back and forth, 1463 01:00:32,120 --> 01:00:35,887 doing a lot of work again and again and again. 1464 01:00:35,887 --> 01:00:36,470 And let's see. 1465 01:00:36,470 --> 01:00:38,512 It's going to start to speed up now because we're 1466 01:00:38,512 --> 01:00:40,820 nearing the latter half of it. 1467 01:00:40,820 --> 01:00:44,420 But as you can see, ultimately, this is kind 1468 01:00:44,420 --> 01:00:46,130 of what n squared feels like, right? 1469 01:00:46,130 --> 01:00:47,540 I'm kind of out of words again. 1470 01:00:47,540 --> 01:00:49,050 And I could say some more things. 1471 01:00:49,050 --> 01:00:51,842 But it's really just stalling because the algorithm's kind of slow. 
1472 01:00:51,842 --> 01:00:55,550 n squared is not a good upper bound on running 1473 01:00:55,550 --> 01:00:58,598 time, especially when your elements are randomly sorted. 1474 01:00:58,598 --> 01:00:59,640 So let's try another one. 1475 01:00:59,640 --> 01:01:02,375 Let's do, in this case, selection sort. 1476 01:01:02,375 --> 01:01:05,520 So I'm going to re-randomize the numbers just as we started and now do 1477 01:01:05,520 --> 01:01:06,380 selection sort. 1478 01:01:06,380 --> 01:01:08,240 And I'm starting at the faster speed. 1479 01:01:08,240 --> 01:01:10,010 And it's working a little differently. 1480 01:01:10,010 --> 01:01:11,990 Notice that the pink line is sweeping from 1481 01:01:11,990 --> 01:01:13,990 left to right, looking for the smallest element. 1482 01:01:13,990 --> 01:01:16,880 And when it finds it, it highlights the little bar, 1483 01:01:16,880 --> 01:01:19,363 and it moves it all the way in place to the left. 1484 01:01:19,363 --> 01:01:22,280 So whereas bubble sort's large elements bubbled up to the right, 1485 01:01:22,280 --> 01:01:26,360 selection sort is much more emphatically grabbing the smallest element 1486 01:01:26,360 --> 01:01:29,340 and putting it into its place one after the other. 1487 01:01:29,340 --> 01:01:30,890 So this has a different feel. 1488 01:01:30,890 --> 01:01:33,200 But here, too, I'm going to have to ad lib 1489 01:01:33,200 --> 01:01:35,380 quite a bit because it's taking a while. 1490 01:01:35,380 --> 01:01:37,970 And you can see the pink bars are really going back and forth 1491 01:01:37,970 --> 01:01:43,610 and back and forth, doing quite a bit of work, quite a bit of work, 1492 01:01:43,610 --> 01:01:44,960 quite a bit of work. 1493 01:01:44,960 --> 01:01:47,060 And now finally, it's done. 1494 01:01:47,060 --> 01:01:50,690 So in a bit, we'll take a look at fundamentally faster solutions
1496 01:01:52,730 --> 01:01:56,270 But first let's take our five-minute break with mini cupcakes outside. 1497 01:01:56,270 --> 01:01:59,900 1498 01:01:59,900 --> 01:02:01,370 So we are back. 1499 01:02:01,370 --> 01:02:03,860 And just as a teaser for this coming week, 1500 01:02:03,860 --> 01:02:07,280 the week is ultimately about algorithms and the implementation thereof. 1501 01:02:07,280 --> 01:02:09,528 And it turns out that certainly on campus 1502 01:02:09,528 --> 01:02:11,570 and in the real world, our [? election's ?] quite 1503 01:02:11,570 --> 01:02:14,600 often, an algorithmic process to which there's actually 1504 01:02:14,600 --> 01:02:16,550 multiple possible solutions. 1505 01:02:16,550 --> 01:02:19,450 Indeed, when you vote for someone, how those votes are tabulated 1506 01:02:19,450 --> 01:02:21,700 can actually differ based on the algorithm being used. 1507 01:02:21,700 --> 01:02:24,460 And those can actually had very real-world effects 1508 01:02:24,460 --> 01:02:26,110 on the outcomes of those elections. 1509 01:02:26,110 --> 01:02:28,930 So among the challenges ahead for the coming week with problem set 1510 01:02:28,930 --> 01:02:32,260 three is to implement a number of algorithms related to elections. 1511 01:02:32,260 --> 01:02:34,300 For instance, it might be a very simple ballot, 1512 01:02:34,300 --> 01:02:39,610 whereby whoever has the most votes among all among all of the candidates, 1513 01:02:39,610 --> 01:02:41,218 or whoever has a plurality wins. 1514 01:02:41,218 --> 01:02:44,260 Or you can implement some kind of runoff election, whereby you don't just 1515 01:02:44,260 --> 01:02:46,690 vote for one candidate, but you rank your preferences. 1516 01:02:46,690 --> 01:02:49,720 And then you use software or a more manual human process 1517 01:02:49,720 --> 01:02:52,660 to adjudicate who wins based on the ranking of those candidates. 
1518 01:02:52,660 --> 01:02:55,150 And there's even more possibilities that ultimately 1519 01:02:55,150 --> 01:02:57,370 can influence real-world outcomes, whether it's here 1520 01:02:57,370 --> 01:02:59,090 on campus or in the real world. 1521 01:02:59,090 --> 01:03:02,080 And so that's what we'll explore this week in code. 1522 01:03:02,080 --> 01:03:04,240 But now let's see if we can't fundamentally 1523 01:03:04,240 --> 01:03:07,660 do better than both bubble sort and selection sort. 1524 01:03:07,660 --> 01:03:10,810 And let me stipulate, there's actually dozens of sorting algorithms. 1525 01:03:10,810 --> 01:03:14,320 We're looking just at a couple of representative algorithms here. 1526 01:03:14,320 --> 01:03:19,090 But let's see if we can't do fundamentally better than the n squared 1527 01:03:19,090 --> 01:03:20,863 big O that we kept bumping up against. 1528 01:03:20,863 --> 01:03:23,530 And to do that, let me propose that we introduce a fundamentally 1529 01:03:23,530 --> 01:03:27,070 new idea that, frankly, among the ideas we explore in computer science 1530 01:03:27,070 --> 01:03:28,960 will kind of bend your mind a little bit. 1531 01:03:28,960 --> 01:03:30,460 So again, here comes that fire hose. 1532 01:03:30,460 --> 01:03:33,760 But again, the goal today is exposure, not yet comfort. 1533 01:03:33,760 --> 01:03:37,600 Comfort will come in the coming weeks as we apply these ideas 1534 01:03:37,600 --> 01:03:38,470 and others. 1535 01:03:38,470 --> 01:03:41,608 So let's rewind to week 0, where everything was very simple at the time. 1536 01:03:41,608 --> 01:03:43,900 And we were just searching a phone book for Mike Smith. 1537 01:03:43,900 --> 01:03:45,670 And we had this pseudocode here.
1538 01:03:45,670 --> 01:03:48,550 This had an example of a programming construct that, at the time, 1539 01:03:48,550 --> 01:03:51,460 we highlighted and called a loop, go back to line 3 1540 01:03:51,460 --> 01:03:54,010 so that you can do something again and again. 1541 01:03:54,010 --> 01:03:56,503 This is an example of what's called iteration, 1542 01:03:56,503 --> 01:03:58,420 a word you might have heard your TFs say 1543 01:03:58,420 --> 01:04:02,365 or someone else, where to iterate just means to loop again and again. 1544 01:04:02,365 --> 01:04:03,740 And this is very straightforward. 1545 01:04:03,740 --> 01:04:05,698 And we could implement this in code if we want. 1546 01:04:05,698 --> 01:04:08,980 But there's an opportunity to design this algorithm not only 1547 01:04:08,980 --> 01:04:11,500 differently, but perhaps better, right? 1548 01:04:11,500 --> 01:04:14,440 After all, let me go ahead and erase that line there 1549 01:04:14,440 --> 01:04:17,530 and get rid of this iteration and see if I can't solve the problem 1550 01:04:17,530 --> 01:04:21,230 more elegantly, if you will, a better design, if you will, 1551 01:04:21,230 --> 01:04:23,230 though there will invariably be some trade-offs. 1552 01:04:23,230 --> 01:04:26,230 Here, with the open to the middle of the left-- 1553 01:04:26,230 --> 01:04:28,720 here, with open to middle of left half of book 1554 01:04:28,720 --> 01:04:31,540 and here, open to middle of right half of book, 1555 01:04:31,540 --> 01:04:35,530 the whole point of opening to the middle of the left or the middle of the right 1556 01:04:35,530 --> 01:04:39,190 was just a search for Mike Smith again but in half of the phone book, 1557 01:04:39,190 --> 01:04:40,180 left or right. 1558 01:04:40,180 --> 01:04:43,810 The key detail being it's half the size of the whole phone book. 1559 01:04:43,810 --> 01:04:45,500 But the algorithm is really the same.
1560 01:04:45,500 --> 01:04:47,593 So in fact, why don't we simplify our pseudocode 1561 01:04:47,593 --> 01:04:50,260 and not get into the logistics of like, oh, go back to this line 1562 01:04:50,260 --> 01:04:51,635 and then do this again and again. 1563 01:04:51,635 --> 01:04:54,700 No, let's just say search the left half of book 1564 01:04:54,700 --> 01:04:56,800 or search the right half of book. 1565 01:04:56,800 --> 01:04:59,620 And in fact, let's tighten up the code and make it fewer lines 1566 01:04:59,620 --> 01:05:03,250 so that we don't even need to get into the specific line numbers. 1567 01:05:03,250 --> 01:05:06,070 We can just tell ourselves what to do. 1568 01:05:06,070 --> 01:05:09,040 Now, highlighted in yellow here are those two new lines. 1569 01:05:09,040 --> 01:05:11,165 And it might seem kind of like a cyclical argument. 1570 01:05:11,165 --> 01:05:12,790 Well, how do you search for Mike Smith? 1571 01:05:12,790 --> 01:05:14,590 Well, you just search for Mike Smith. 1572 01:05:14,590 --> 01:05:17,620 But the key detail here is I'm not just telling 1573 01:05:17,620 --> 01:05:19,390 you to do the same thing endlessly. 1574 01:05:19,390 --> 01:05:22,057 I'm telling you, if you want to search for Mike Smith in a phone 1575 01:05:22,057 --> 01:05:23,140 book of this size, mm-mm. 1576 01:05:23,140 --> 01:05:25,315 Search for Mike Smith in a phone book of this size. 1577 01:05:25,315 --> 01:05:27,940 And then the next step of that algorithm becomes search for him 1578 01:05:27,940 --> 01:05:32,320 in a phone book of this size, this size, when you keep halving the problem. 1579 01:05:32,320 --> 01:05:37,150 So this is an example of a technique in programming called recursion, whereby 1580 01:05:37,150 --> 01:05:43,810 you implement a program or an algorithm or code that, in a sense, calls itself. 
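To make "search the left half" and "search the right half" concrete, here is a minimal sketch of a recursive search in C. It assumes a sorted array of integers standing in for the phone book; the function name search, its parameters, and the use of numbers instead of names are all assumptions for illustration, not the lecture's code.

```c
#include <stdbool.h>

// Look for target in values[low..high] (inclusive), which must be sorted
// in ascending order. Hypothetical stand-in for the phone book search.
bool search(const int values[], int low, int high, int target)
{
    // Nothing left to search, so the target is not here
    if (low > high)
    {
        return false;
    }

    // Open to the middle
    int middle = low + (high - low) / 2;

    if (values[middle] == target)
    {
        return true;
    }
    else if (target < values[middle])
    {
        // Search the left half: same function, smaller problem
        return search(values, low, middle - 1, target);
    }
    else
    {
        // Search the right half: same function, smaller problem
        return search(values, middle + 1, high, target);
    }
}
```

Each call hands a range about half the size to the next call, so the function does not repeat the same work endlessly; it stops when it finds the target or when the range is empty.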
1581 01:05:43,810 --> 01:05:47,950 If what we're looking at here on the board is a function called search, 1582 01:05:47,950 --> 01:05:52,600 a function is recursive if it literally references its own name 1583 01:05:52,600 --> 01:05:53,785 in its own code. 1584 01:05:53,785 --> 01:05:55,910 And this is where your mind starts to bend perhaps. 1585 01:05:55,910 --> 01:05:57,640 And we'll see this more concretely. 1586 01:05:57,640 --> 01:06:00,910 But recursion is when a function calls itself. 1587 01:06:00,910 --> 01:06:03,970 So if this is a function implementing search and highlighted in yellow 1588 01:06:03,970 --> 01:06:07,690 are two lines of code that say search again but on a smaller 1589 01:06:07,690 --> 01:06:13,330 piece of the problem, that is recursion, something happening again and again. 1590 01:06:13,330 --> 01:06:14,715 So let's see this in context. 1591 01:06:14,715 --> 01:06:17,590 So let's go back to Mario, where this is a slightly different pyramid 1592 01:06:17,590 --> 01:06:18,548 that we've seen before. 1593 01:06:18,548 --> 01:06:22,270 Notice that it's left aligned, and it goes downward to the right. 1594 01:06:22,270 --> 01:06:25,930 Let's, in fact, get rid of the ground and just focus only on the pyramid. 1595 01:06:25,930 --> 01:06:29,980 How could I go about writing code that implements this type of Mario pyramid? 1596 01:06:29,980 --> 01:06:33,100 Well, let me go ahead and create a new file called Mario.c. 1597 01:06:33,100 --> 01:06:35,860 Or actually, no, let's be even more specific this time. 1598 01:06:35,860 --> 01:06:38,313 Let's call this iteration.c to make clear 1599 01:06:38,313 --> 01:06:39,730 that this is an iterative program. 1600 01:06:39,730 --> 01:06:42,580 And let me go ahead and include cs50.h. 1601 01:06:42,580 --> 01:06:44,920 And let me include standard io.h. 1602 01:06:44,920 --> 01:06:48,220 And let me go ahead then and do int main void. 
1603 01:06:48,220 --> 01:06:51,460 And in here, let me go ahead and just get the height of the pyramid 1604 01:06:51,460 --> 01:06:53,673 from the user, using our old friend get int. 1605 01:06:53,673 --> 01:06:55,840 I'm not going to bother, for today, doing a do while 1606 01:06:55,840 --> 01:06:57,382 and making sure the human cooperates. 1607 01:06:57,382 --> 01:07:00,670 They need to just behave and give us a positive integer here. 1608 01:07:00,670 --> 01:07:04,300 And then I'm going to go ahead and just draw a pyramid of that height 1609 01:07:04,300 --> 01:07:07,690 by using a function that doesn't exist yet but that's going to be called draw. 1610 01:07:07,690 --> 01:07:10,240 I'm going to implement this function draw as follows, 1611 01:07:10,240 --> 01:07:13,790 void draw, because it doesn't need to return a value, per our discussion 1612 01:07:13,790 --> 01:07:14,290 last week. 1613 01:07:14,290 --> 01:07:15,520 It's just going to print something. 1614 01:07:15,520 --> 01:07:17,650 But it is going to take input, like a number n. 1615 01:07:17,650 --> 01:07:19,450 Or rather, let's call it H for Height. 1616 01:07:19,450 --> 01:07:21,700 That represents the height of the pyramid to draw. 1617 01:07:21,700 --> 01:07:25,010 And how do I draw a pyramid that looks like this? 1618 01:07:25,010 --> 01:07:28,630 Well, again, use some intuition, as you might have for problem set 1, 1619 01:07:28,630 --> 01:07:31,190 even though the pyramid there was a little trickier. 1620 01:07:31,190 --> 01:07:33,043 On the first row, I want to print one brick. 1621 01:07:33,043 --> 01:07:34,960 On the second row, I want to print two bricks. 1622 01:07:34,960 --> 01:07:37,450 On the third row, three, fourth row, four. 1623 01:07:37,450 --> 01:07:40,210 So it turns out this is an easier pyramid than the one we 1624 01:07:40,210 --> 01:07:41,950 had you do for problem set 1. 1625 01:07:41,950 --> 01:07:42,610 Sorry. 1626 01:07:42,610 --> 01:07:45,550 So for int, i gets 0.
1627 01:07:45,550 --> 01:07:47,410 i is less than-- actually, you know what? 1628 01:07:47,410 --> 01:07:50,530 Let me make it a little clearer and more mapping to my verbal pseudocode. 1629 01:07:50,530 --> 01:07:53,080 Let's initialize i to 1 for the first row. 1630 01:07:53,080 --> 01:07:56,140 Let's do this so long as i is less than or equal to the height. 1631 01:07:56,140 --> 01:07:57,310 And let's do i plus plus. 1632 01:07:57,310 --> 01:08:02,550 So this is the same thing as starting from 0 but just, surprise-- 1633 01:08:02,550 --> 01:08:08,380 [LAUGHS] so I actually didn't make that mistake, if you didn't see it. 1634 01:08:08,380 --> 01:08:14,860 So I'm going to go ahead and say for int i gets 1 to represent my first row. 1635 01:08:14,860 --> 01:08:17,439 i is less than or equal to height, i plus plus. 1636 01:08:17,439 --> 01:08:20,242 This is identical, again, to starting from 0, 1637 01:08:20,242 --> 01:08:23,200 but it's just nice to start counting from 1 sometimes, as in this case, 1638 01:08:23,200 --> 01:08:24,279 for the first row. 1639 01:08:24,279 --> 01:08:26,696 And then anytime you want to do something two dimensional, 1640 01:08:26,696 --> 01:08:28,700 like in Mario, odds are, if you're like me, 1641 01:08:28,700 --> 01:08:33,310 you probably had an inner nested loop, maybe calling it j, and doing j 1642 01:08:33,310 --> 01:08:37,593 is less than or equal to i and then j plus plus. 1643 01:08:37,593 --> 01:08:39,760 And I'll run this so that it's clear what I'm doing. 1644 01:08:39,760 --> 01:08:43,090 But inside this nested loop, I'm just going to print one brick. 1645 01:08:43,090 --> 01:08:49,600 And then down here, I'm going to print my new line backslash n. 1646 01:08:49,600 --> 01:08:51,370 So again, it's a simple draw function.
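Typed out, the iterative draw function narrated above might look like the sketch below. It is reconstructed from the spoken description rather than copied from the lecture's file, so some details are assumptions: the hash character used as the brick, and the inner loop starting j at 1. The main function and the call to get_int are left out so that the sketch compiles without the CS50 library.

```c
#include <stdio.h>

// Draw a left-aligned pyramid of height h using two nested loops.
void draw(int h)
{
    // One pass per row, counting rows from 1 up to h
    for (int i = 1; i <= h; i++)
    {
        // Row i gets i bricks ('#' is an assumed brick character)
        for (int j = 1; j <= i; j++)
        {
            printf("#");
        }

        // Move to the next row
        printf("\n");
    }
}
```

In a complete program, main would read the height from the user and call draw with it. If draw is defined below main, as in the lecture, its prototype has to appear above main.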
1647 01:08:51,370 --> 01:08:53,710 And now because it's at the bottom of my file, 1648 01:08:53,710 --> 01:08:56,950 I need to put its prototype up here, one of the few times copy and paste 1649 01:08:56,950 --> 01:08:58,720 is reasonable, I would say. 1650 01:08:58,720 --> 01:09:03,055 So let me make iteration, compiles OK, dot slash iteration. 1651 01:09:03,055 --> 01:09:04,430 And now I'm asked for the height. 1652 01:09:04,430 --> 01:09:06,470 Let's go ahead and do a pyramid of size 4. 1653 01:09:06,470 --> 01:09:07,715 And voila, it seems to work. 1654 01:09:07,715 --> 01:09:08,840 And let me do it once more. 1655 01:09:08,840 --> 01:09:12,100 I'll try, for instance, a pyramid of height 3. 1656 01:09:12,100 --> 01:09:12,950 That works. 1657 01:09:12,950 --> 01:09:15,340 And let me go ahead and do a pyramid of size 5. 1658 01:09:15,340 --> 01:09:16,420 So it seems to work. 1659 01:09:16,420 --> 01:09:19,180 And this is a very reasonable, very correct approach 1660 01:09:19,180 --> 01:09:24,040 to implementing that Mario pyramid using iteration, that is to say, 1661 01:09:24,040 --> 01:09:26,920 using loops, in this case, two loops. 1662 01:09:26,920 --> 01:09:29,380 But you know what's interesting about this Mario pyramid, 1663 01:09:29,380 --> 01:09:31,540 as well as some of the others we've seen, 1664 01:09:31,540 --> 01:09:33,609 is there's this common structure, right? 1665 01:09:33,609 --> 01:09:35,649 And if we look at the pyramid in isolation, 1666 01:09:35,649 --> 01:09:38,950 what is the definition of a pyramid of height 4? 1667 01:09:38,950 --> 01:09:44,050 Well, arguably, it's a pyramid of height 3 plus 1 additional row. 1668 01:09:44,050 --> 01:09:46,359 What's the definition of a pyramid of height 3? 1669 01:09:46,359 --> 01:09:49,359 Well, it's a pyramid of height 2 plus 1 additional row. 1670 01:09:49,359 --> 01:09:51,460 What's the definition of a pyramid of height 2?
1671 01:09:51,460 --> 01:09:54,730 It's a pyramid of height 1 with an additional row. 1672 01:09:54,730 --> 01:09:59,830 That's a recursive definition of just a physical object or a virtual object, 1673 01:09:59,830 --> 01:10:04,090 whereby you can describe the structure of something in terms of itself. 1674 01:10:04,090 --> 01:10:08,020 Now, at some point, I need a special case, at least one height. 1675 01:10:08,020 --> 01:10:09,970 What is a pyramid of height 0? 1676 01:10:09,970 --> 01:10:10,780 Nothing, right? 1677 01:10:10,780 --> 01:10:16,210 Return or exit or quit, whatever the right verbiage is for the algorithm. 1678 01:10:16,210 --> 01:10:19,430 So long as you have a so-called base case, where you manually say, 1679 01:10:19,430 --> 01:10:21,580 oh, in that specific case, just don't do anything, 1680 01:10:21,580 --> 01:10:23,950 and you don't recursively call yourself again 1681 01:10:23,950 --> 01:10:28,340 and again, we can use this principle of code calling itself. 1682 01:10:28,340 --> 01:10:29,630 So let's try this once more. 1683 01:10:29,630 --> 01:10:33,880 Let me go ahead and create another file called recursion.c. 1684 01:10:33,880 --> 01:10:37,570 I'm again going to go ahead and include cs50.h. And I'm going to go ahead 1685 01:10:37,570 --> 01:10:41,590 and include standard io.h. And then I'm going to go ahead and have int 1686 01:10:41,590 --> 01:10:43,450 main void again. 1687 01:10:43,450 --> 01:10:45,520 And in this program here, I'm going to again ask 1688 01:10:45,520 --> 01:10:48,430 the user for the height of interest for their pyramid 1689 01:10:48,430 --> 01:10:52,540 using int height gets get int and ask them for height. 1690 01:10:52,540 --> 01:10:55,000 I'm not going to bother error checking here. 1691 01:10:55,000 --> 01:10:57,400 I'm going to go ahead and draw a pyramid of that height.
1692 01:10:57,400 --> 01:11:00,610 And so what's going to change this time is my draw function, 1693 01:11:00,610 --> 01:11:04,400 void draw int h as before. 1694 01:11:04,400 --> 01:11:06,070 And now's where things get interesting. 1695 01:11:06,070 --> 01:11:09,460 My goal now is not to just use nested loops, 1696 01:11:09,460 --> 01:11:13,390 but to define a bigger pyramid in terms of a small pyramid. 1697 01:11:13,390 --> 01:11:19,030 So suppose that the goal at hand is to draw a pyramid of size 4. 1698 01:11:19,030 --> 01:11:23,890 What should I do first, according to this definition of a pyramid? 1699 01:11:23,890 --> 01:11:29,060 How do I draw a pyramid of size 4 in English? 1700 01:11:29,060 --> 01:11:29,560 Yeah? 1701 01:11:29,560 --> 01:11:33,750 AUDIENCE: Draw a pyramid of the size 4 minus 1. 1702 01:11:33,750 --> 01:11:37,715 SPEAKER 1: Yeah, draw a pyramid of size 4 minus 1, or a pyramid of size 3. 1703 01:11:37,715 --> 01:11:39,090 So how do I express this in code? 1704 01:11:39,090 --> 01:11:43,140 Well, wonderfully in code, this is super simple, h minus 1. 1705 01:11:43,140 --> 01:11:51,060 That will draw me a pyramid of height h minus 1, or 3 in this specific case. 1706 01:11:51,060 --> 01:11:53,250 Now, it's not done the program, right? 1707 01:11:53,250 --> 01:11:55,500 I can't possibly just compile this and expect 1708 01:11:55,500 --> 01:11:59,790 it to work because this seems like it's just going to call itself endlessly. 1709 01:11:59,790 --> 01:12:05,610 Well, what's a pyramid of size 3, 2, 1, 0, negative 1, negative 2, right? 1710 01:12:05,610 --> 01:12:09,360 It would go on endlessly if I just blindly subtract 1. 1711 01:12:09,360 --> 01:12:10,860 So I need that base case. 1712 01:12:10,860 --> 01:12:13,630 Under what circumstances should I actually not draw anything? 1713 01:12:13,630 --> 01:12:15,373 AUDIENCE: [INAUDIBLE] 1714 01:12:15,373 --> 01:12:16,040 SPEAKER 1: Yeah. 
1715 01:12:16,040 --> 01:12:19,190 So maybe if h equals equals 0, you know what? 1716 01:12:19,190 --> 01:12:19,880 Just return. 1717 01:12:19,880 --> 01:12:21,290 Don't do anything, right? 1718 01:12:21,290 --> 01:12:23,480 I need a base case, a hard-coded condition 1719 01:12:23,480 --> 01:12:27,440 that says stop doing this, this mind-bending [? cyclity ?] 1720 01:12:27,440 --> 01:12:28,700 again and again. 1721 01:12:28,700 --> 01:12:30,330 But I do need to do one more thing. 1722 01:12:30,330 --> 01:12:34,130 So this is just an error check to make sure I don't do this forever. 1723 01:12:34,130 --> 01:12:36,730 This is this leap of faith, where somehow I haven't even 1724 01:12:36,730 --> 01:12:38,480 written the function yet, and somehow it's 1725 01:12:38,480 --> 01:12:40,572 magically going to draw my pyramid. 1726 01:12:40,572 --> 01:12:42,530 But what's the second step of drawing a pyramid 1727 01:12:42,530 --> 01:12:44,678 of height 4, if I can ask again? 1728 01:12:44,678 --> 01:12:46,630 AUDIENCE: Well, in terms of [INAUDIBLE]? 1729 01:12:46,630 --> 01:12:48,130 SPEAKER 1: Yeah, so what comes next? 1730 01:12:48,130 --> 01:12:49,713 I've just drawn a pyramid of height 3. 1731 01:12:49,713 --> 01:12:51,910 AUDIENCE: Oh, then you draw a pyramid of height 2. 1732 01:12:51,910 --> 01:12:53,482 SPEAKER 1: Now I draw a-- 1733 01:12:53,482 --> 01:12:54,190 say it once more. 1734 01:12:54,190 --> 01:12:55,675 AUDIENCE: Pyramid of height 2. 1735 01:12:55,675 --> 01:12:56,550 SPEAKER 1: Not quite. 1736 01:12:56,550 --> 01:12:57,383 Take this literally. 1737 01:12:57,383 --> 01:13:01,330 If I have just in code drawn a pyramid of height 3, 1738 01:13:01,330 --> 01:13:03,280 how do I get to a pyramid of height 4 now? 1739 01:13:03,280 --> 01:13:04,772 AUDIENCE: Oh, you add [INAUDIBLE]. 1740 01:13:04,772 --> 01:13:07,480 SPEAKER 1: Yeah, I add that additional row, right? [? Because, ?] 
1741 01:13:07,480 --> 01:13:10,040 again, per our diagram, what's a pyramid of height 4? 1742 01:13:10,040 --> 01:13:13,440 Well, it's really just a pyramid of height 3 plus an additional row. 1743 01:13:13,440 --> 01:13:15,830 So if we all just kind of agree, a leap of faith, 1744 01:13:15,830 --> 01:13:20,150 that somehow or other I have the ability to draw pyramids of height h minus 1, 1745 01:13:20,150 --> 01:13:26,280 let's you and I do the hard part in code of drawing that one additional row. 1746 01:13:26,280 --> 01:13:30,560 So if I go back in code here, after drawing a pyramid of height h minus 1, 1747 01:13:30,560 --> 01:13:37,250 I need to go ahead and for int i gets 0, i is less than h, i plus plus. 1748 01:13:37,250 --> 01:13:43,200 It would seem that I just need to print out, for instance, 1749 01:13:43,200 --> 01:13:48,800 up here a hash followed by a new line after that, right? 1750 01:13:48,800 --> 01:13:51,947 So I do need a for loop, but just one not nested. 1751 01:13:51,947 --> 01:13:53,780 And what does this have the effect of doing? 1752 01:13:53,780 --> 01:13:59,240 Well, on the fourth row, where h equals 4, how many hashes am I going to print? 1753 01:13:59,240 --> 01:14:04,430 1, 2, 3, 4, if I'm iterating from 0 on up to h, 0, 1, 2, 3, 4. 1754 01:14:04,430 --> 01:14:08,820 So these lines of code, in the story at hand, are going to print four hashes. 1755 01:14:08,820 --> 01:14:13,460 This line of code, amazingly, is going to print everything else above it, 1756 01:14:13,460 --> 01:14:14,855 the pyramid of height 3. 1757 01:14:14,855 --> 01:14:16,730 And the line of code above that is just going 1758 01:14:16,730 --> 01:14:22,070 to make sure that we don't blindly call draw forever into the negative numbers. 1759 01:14:22,070 --> 01:14:27,290 I'm literally going to say, if h equals equals 0, stop doing this magic.
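Put together, the three pieces just described (the base case, the recursive call on h minus 1, and the single loop for the last row) might look like the sketch below. Two things here are assumptions rather than the lecture's code: the base case tests h <= 0 instead of h == 0, so that a negative height also stops immediately, and the function returns the number of bricks printed so the result can be checked, where the lecture's draw returns void.

```c
#include <stdio.h>

// Recursive pyramid: a pyramid of height h is a pyramid of
// height h - 1 plus one additional row of h bricks.
// Assumption: returns the total bricks printed (lecture version is void).
int draw(int h)
{
    // Base case: nothing to draw.
    // Assumption: <= 0 rather than == 0, to guard negative input too.
    if (h <= 0)
    {
        return 0;
    }

    // Recursive case: first draw everything above the last row
    int above = draw(h - 1);

    // Then draw the one additional row of h bricks
    for (int i = 0; i < h; i++)
    {
        printf("#");
    }
    printf("\n");

    return above + h;
}
```

Because the recursive call comes before the loop, the shortest row is printed first, even though the first call made was for the tallest pyramid.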
1760 01:14:27,290 --> 01:14:31,190 So let's go ahead and put my prototype up top, just as before, 1761 01:14:31,190 --> 01:14:36,260 even though it's the same, save the file, make recursion, Enter. 1762 01:14:36,260 --> 01:14:37,700 It compiles OK. 1763 01:14:37,700 --> 01:14:40,910 Now let me go ahead and run recursion with a height of 4. 1764 01:14:40,910 --> 01:14:47,150 And, oh, my god, I wrote a function that called itself and somehow magically 1765 01:14:47,150 --> 01:14:47,940 printed a pyramid. 1766 01:14:47,940 --> 01:14:51,620 And yet all I ever explicitly did was print what? 1767 01:14:51,620 --> 01:14:54,860 A row of bricks myself. 1768 01:14:54,860 --> 01:14:58,280 And the recursion comes from the fact that I'm calling myself. 1769 01:14:58,280 --> 01:15:02,630 But just like with binary search, just like 1770 01:15:02,630 --> 01:15:08,780 with any divide-and-conquer approach, I'm calling myself on a smaller problem 1771 01:15:08,780 --> 01:15:09,860 than I was handed. 1772 01:15:09,860 --> 01:15:14,220 The bites are eating into the problem again and again and again. 1773 01:15:14,220 --> 01:15:17,570 Any questions on this technique, a function 1774 01:15:17,570 --> 01:15:19,880 that calls itself is recursive? 1775 01:15:19,880 --> 01:15:20,606 Yeah? 1776 01:15:20,606 --> 01:15:22,550 AUDIENCE: A quick question. 1777 01:15:22,550 --> 01:15:25,952 [INAUDIBLE] 1778 01:15:25,952 --> 01:15:29,283 So [INAUDIBLE] loop, how does it go back [INAUDIBLE]? 1779 01:15:29,283 --> 01:15:31,450 SPEAKER 1: Really good question, after the for loop, 1780 01:15:31,450 --> 01:15:32,700 how does it go back and print? 1781 01:15:32,700 --> 01:15:33,240 It doesn't. 1782 01:15:33,240 --> 01:15:34,960 That happens first. 1783 01:15:34,960 --> 01:15:38,220 So if you actually were to use debug50 in the [? IDE, ?]
1784 01:15:38,220 --> 01:15:40,740 you would see that when this line 20 is called, 1785 01:15:40,740 --> 01:15:45,650 and you call draw of a pyramid of height 3, draw gets called again. 1786 01:15:45,650 --> 01:15:47,400 And then it gets called again on height 2. 1787 01:15:47,400 --> 01:15:49,140 Then it gets called again on height 1. 1788 01:15:49,140 --> 01:15:51,450 But guess what happens on a pyramid of height 1? 1789 01:15:51,450 --> 01:15:53,100 It prints a single hash. 1790 01:15:53,100 --> 01:15:55,410 Then if you rewind the story, what happens next? 1791 01:15:55,410 --> 01:15:57,300 You print a row of two hashes. 1792 01:15:57,300 --> 01:15:58,350 What happens next? 1793 01:15:58,350 --> 01:15:59,940 You print a row of three hashes. 1794 01:15:59,940 --> 01:16:01,080 What happens next? 1795 01:16:01,080 --> 01:16:03,420 You print a row of four hashes. 1796 01:16:03,420 --> 01:16:05,790 And we'll see more of this before long. 1797 01:16:05,790 --> 01:16:07,800 But because I'm printing-- 1798 01:16:07,800 --> 01:16:12,660 I'm calling draw before I'm printing the base, I don't know how this works yet. 1799 01:16:12,660 --> 01:16:14,970 That's the leap of faith to which I keep alluding. 1800 01:16:14,970 --> 01:16:18,870 But it keeps happening because, 1, I have this base case that 1801 01:16:18,870 --> 01:16:20,880 stops this from happening forever. 1802 01:16:20,880 --> 01:16:26,470 And I have this other case that adds to my pyramid again and again. 1803 01:16:26,470 --> 01:16:26,970 Yeah? 1804 01:16:26,970 --> 01:16:30,906 AUDIENCE: It's kind of like a layering of [INAUDIBLE] for iterations. 1805 01:16:30,906 --> 01:16:34,962 But instead of going from top down, it's going [? down up. ?] 1806 01:16:34,962 --> 01:16:35,670 SPEAKER 1: It is. 1807 01:16:35,670 --> 01:16:36,390 It's going down up. 1808 01:16:36,390 --> 01:16:37,860 And you're referring actually to a [? concept ?] 
1809 01:16:37,860 --> 01:16:40,652 we'll talk about actually in a week or two's time called the stack. 1810 01:16:40,652 --> 01:16:43,110 We'll see actually how this magic is working. 1811 01:16:43,110 --> 01:16:45,870 For now let me just stipulate that functions can call themselves, 1812 01:16:45,870 --> 01:16:48,090 so long as what you pass them is a smaller 1813 01:16:48,090 --> 01:16:50,545 input than you were handed initially. 1814 01:16:50,545 --> 01:16:52,920 And now just to demonstrate that computer scientists have 1815 01:16:52,920 --> 01:16:55,080 a sense of humor, if we Google recursion, 1816 01:16:55,080 --> 01:16:57,810 as you might currently be doing to understand what this is, 1817 01:16:57,810 --> 01:16:59,032 you'll notice-- 1818 01:16:59,032 --> 01:17:01,392 [LAUGHTER] 1819 01:17:01,392 --> 01:17:02,810 1820 01:17:02,810 --> 01:17:04,040 Get it? 1821 01:17:04,040 --> 01:17:06,350 Kind of-- OK, anyhow. 1822 01:17:06,350 --> 01:17:10,640 Google has literally hard-coded that into their source code of google.com. 1823 01:17:10,640 --> 01:17:17,090 So let's now use this to solve a problem of sorting. 1824 01:17:17,090 --> 01:17:19,997 It turns out there's an algorithm out there called merge sort. 1825 01:17:19,997 --> 01:17:23,080 And it's representative of sorts that are actually better than bubble sort 1826 01:17:23,080 --> 01:17:25,550 and better than selection sort fundamentally. 1827 01:17:25,550 --> 01:17:28,160 In terms of big O notation, we can do better. 1828 01:17:28,160 --> 01:17:30,708 n squared does not have to be our fate. 1829 01:17:30,708 --> 01:17:32,750 After all, so many things in our life are sorted. 1830 01:17:32,750 --> 01:17:36,320 Your contacts in your phone, maybe your friends on Facebook or Instagram, 1831 01:17:36,320 --> 01:17:40,100 or any application using the cloud typically sorts data in some way. 
1832 01:17:40,100 --> 01:17:42,920 It would be a shame if it's super slow to sort, 1833 01:17:42,920 --> 01:17:44,870 as we saw already, with n squared. 1834 01:17:44,870 --> 01:17:47,402 So merge sort works as follows. 1835 01:17:47,402 --> 01:17:49,610 This is pseudocode for an algorithm called merge sort 1836 01:17:49,610 --> 01:17:53,000 that if you hand it an array of numbers or names or anything, 1837 01:17:53,000 --> 01:17:54,050 it acts as follows. 1838 01:17:54,050 --> 01:17:57,850 If there's only one item you're handed in your array, well, just return. 1839 01:17:57,850 --> 01:17:58,850 There's nothing to sort. 1840 01:17:58,850 --> 01:18:00,050 So that's our base case. 1841 01:18:00,050 --> 01:18:02,000 That's the sort of stupid case where you have 1842 01:18:02,000 --> 01:18:04,520 to hard code, that is literally write out, 1843 01:18:04,520 --> 01:18:06,903 if this situation happens, do this. 1844 01:18:06,903 --> 01:18:09,320 And the case is if you just hand me a list with one thing, 1845 01:18:09,320 --> 01:18:12,070 it's obviously sorted, by definition, because nothing can possibly 1846 01:18:12,070 --> 01:18:12,950 be out of order. 1847 01:18:12,950 --> 01:18:15,380 Things get more interesting otherwise. 1848 01:18:15,380 --> 01:18:18,800 Merge sort says, just like that Mario example, you know what? 1849 01:18:18,800 --> 01:18:21,140 If you want me to sort this whole list, I'm 1850 01:18:21,140 --> 01:18:25,970 going to tell you sort the left half, then sort the right half, 1851 01:18:25,970 --> 01:18:30,200 and then merge those lists together, such that you weave them together 1852 01:18:30,200 --> 01:18:33,830 in such a way that the merged list is sorted as well. 1853 01:18:33,830 --> 01:18:38,330 So merge sort is three steps, sort left half, sort right half, 1854 01:18:38,330 --> 01:18:41,870 merge those two sorted halves. 
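The three-step pseudocode above (sort left half, sort right half, merge) could be written in C along these lines. This is a sketch under assumptions, not code from the lecture: the names merge and merge_sort, the half-open index ranges, and the caller-supplied scratch array tmp (the "extra memory" the merge step needs) are all illustrative choices.

```c
#include <string.h>

// Merge the sorted ranges a[lo..mid) and a[mid..hi) into sorted order,
// using tmp as scratch space (tmp must be at least as long as a).
static void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo;   // next unmerged element of the left half
    int j = mid;  // next unmerged element of the right half
    int k = lo;   // next free slot in tmp

    // Repeatedly take the smaller front element of the two halves
    while (i < mid && j < hi)
    {
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    }

    // Copy whatever remains of either half
    while (i < mid)
    {
        tmp[k++] = a[i++];
    }
    while (j < hi)
    {
        tmp[k++] = a[j++];
    }

    // Copy the merged run back into the original array
    memcpy(a + lo, tmp + lo, (size_t) (hi - lo) * sizeof(int));
}

// Sort a[lo..hi) recursively.
void merge_sort(int a[], int tmp[], int lo, int hi)
{
    // Base case: zero or one item is already sorted
    if (hi - lo <= 1)
    {
        return;
    }

    int mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);   // sort left half
    merge_sort(a, tmp, mid, hi);   // sort right half
    merge(a, tmp, lo, mid, hi);    // merge the two sorted halves
}
```

For an array of eight ints and a scratch array of the same length, the call would be merge_sort(a, tmp, 0, 8).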
1855 01:18:41,870 --> 01:18:43,310 And this is the-- 1856 01:18:43,310 --> 01:18:45,800 we were chatting earlier about an apt metaphor here. 1857 01:18:45,800 --> 01:18:49,770 This is kind of a roller-coaster-type ride, where you got to hold on. 1858 01:18:49,770 --> 01:18:50,640 You've got to focus. 1859 01:18:50,640 --> 01:18:54,080 It's OK if it doesn't all work out for the best the first time around. 1860 01:18:54,080 --> 01:18:58,530 But each step will be important here in the metaphor of the fire hose as well. 1861 01:18:58,530 --> 01:19:00,890 So here is a list of unsorted numbers. 1862 01:19:00,890 --> 01:19:04,470 The goal at hand is to sort them faster than bubble sort and selection sort 1863 01:19:04,470 --> 01:19:05,140 can. 1864 01:19:05,140 --> 01:19:07,160 So merge sort tells me what? 1865 01:19:07,160 --> 01:19:09,560 Sort left half, sort right half, merge. 1866 01:19:09,560 --> 01:19:10,670 That is it for merge sort. 1867 01:19:10,670 --> 01:19:11,390 That's the magic. 1868 01:19:11,390 --> 01:19:16,220 Just like Mario says, print a pyramid of height h minus 1, print the base, done. 1869 01:19:16,220 --> 01:19:19,160 That's the essence of this recursive algorithm, left half, right half, 1870 01:19:19,160 --> 01:19:19,680 merge. 1871 01:19:19,680 --> 01:19:20,680 So what's the left half? 1872 01:19:20,680 --> 01:19:22,100 It's these four elements here. 1873 01:19:22,100 --> 01:19:24,440 Let me go ahead now and sort those four elements. 1874 01:19:24,440 --> 01:19:27,800 How do I sort a list of four elements? 1875 01:19:27,800 --> 01:19:28,880 Merge sort them, right? 1876 01:19:28,880 --> 01:19:32,175 Sort the left half, then sort the right half, then merge them together. 1877 01:19:32,175 --> 01:19:33,800 So you're kind of like kicking the can. 1878 01:19:33,800 --> 01:19:34,620 Like, I've done no work. 1879 01:19:34,620 --> 01:19:36,560 You're just telling me to go sort something [? else. ?]
1880 01:19:36,560 --> 01:19:38,185 But OK, let me follow those directions. 1881 01:19:38,185 --> 01:19:40,310 Let me sort the left half, 7, 4. 1882 01:19:40,310 --> 01:19:41,810 How do I sort a list of size 2? 1883 01:19:41,810 --> 01:19:43,190 AUDIENCE: Swap. 1884 01:19:43,190 --> 01:19:44,555 SPEAKER 1: Not swapping yet. 1885 01:19:44,555 --> 01:19:45,860 AUDIENCE: [INAUDIBLE] 1886 01:19:45,860 --> 01:19:46,910 SPEAKER 1: Merge sort-- 1887 01:19:46,910 --> 01:19:49,555 the left half, then the right half, then merge them together. 1888 01:19:49,555 --> 01:19:51,680 So again, it's kind of crazy talk because we've not 1889 01:19:51,680 --> 01:19:53,030 done any actual work yet. 1890 01:19:53,030 --> 01:19:54,300 And I claim we're sorting. 1891 01:19:54,300 --> 01:19:55,470 But let's see what happens. 1892 01:19:55,470 --> 01:19:56,900 Here's the left half. 1893 01:19:56,900 --> 01:19:59,550 How do I sort a list of size 1? 1894 01:19:59,550 --> 01:20:00,050 Done. 1895 01:20:00,050 --> 01:20:00,980 That's the return. 1896 01:20:00,980 --> 01:20:03,980 That's the base case to make sure I don't do this forever. 1897 01:20:03,980 --> 01:20:05,210 What came next? 1898 01:20:05,210 --> 01:20:06,380 I just sorted the left half. 1899 01:20:06,380 --> 01:20:07,977 What was the second step? 1900 01:20:07,977 --> 01:20:08,810 Sort the right half. 1901 01:20:08,810 --> 01:20:10,130 How do I sort this? 1902 01:20:10,130 --> 01:20:11,000 Done. 1903 01:20:11,000 --> 01:20:12,500 Now it gets interesting. 1904 01:20:12,500 --> 01:20:13,800 What was the third step? 1905 01:20:13,800 --> 01:20:14,540 AUDIENCE: Merge. 1906 01:20:14,540 --> 01:20:17,283 SPEAKER 1: Merge two lists of size 1. 1907 01:20:17,283 --> 01:20:18,575 So now I need some extra space. 1908 01:20:18,575 --> 01:20:21,480 So I'm going to give myself an extra row, some extra memory, if you will, 1909 01:20:21,480 --> 01:20:22,280 in the computer. 1910 01:20:22,280 --> 01:20:23,840 4 obviously comes first.
1911 01:20:23,840 --> 01:20:25,033 7 obviously comes next. 1912 01:20:25,033 --> 01:20:25,950 That's the merge step. 1913 01:20:25,950 --> 01:20:29,075 That's what I mean by merge, take the smallest element from whichever list, 1914 01:20:29,075 --> 01:20:32,180 and then follow it by the smallest element in the other list. 1915 01:20:32,180 --> 01:20:34,760 This now is a sorted list of size 2. 1916 01:20:34,760 --> 01:20:39,690 So if you rewind in your mind, what was the second step now? 1917 01:20:39,690 --> 01:20:41,790 That was sort left half. 1918 01:20:41,790 --> 01:20:43,620 Sort right half, right? 1919 01:20:43,620 --> 01:20:47,008 So you really have to kind of rewind in the story, like, 30-plus seconds ago. 1920 01:20:47,008 --> 01:20:48,300 How do you sort the right half? 1921 01:20:48,300 --> 01:20:52,470 Well, you sort the left half, done, right half, done. 1922 01:20:52,470 --> 01:20:54,600 Here's the magic, merge. 1923 01:20:54,600 --> 01:20:56,850 How do I merge these two lists? 1924 01:20:56,850 --> 01:20:58,240 2 comes first. 1925 01:20:58,240 --> 01:20:59,640 5 comes next. 1926 01:20:59,640 --> 01:21:04,022 I have just sorted the right half of this list. 1927 01:21:04,022 --> 01:21:05,730 So I sorted left half, sorted right half. 1928 01:21:05,730 --> 01:21:06,647 What's the third step? 1929 01:21:06,647 --> 01:21:07,980 AUDIENCE: Merge. 1930 01:21:07,980 --> 01:21:08,730 SPEAKER 1: Merge. 1931 01:21:08,730 --> 01:21:09,940 So how do I do that? 1932 01:21:09,940 --> 01:21:11,190 Well, I look at the two lists. 1933 01:21:11,190 --> 01:21:13,230 And how do I merge these together [? interleaving ?] them 1934 01:21:13,230 --> 01:21:14,130 in the right order? 1935 01:21:14,130 --> 01:21:18,810 2 comes first, then 4, then 5, then 7. 1936 01:21:18,810 --> 01:21:21,850 So now I have sorted the left half of the original list. 1937 01:21:21,850 --> 01:21:24,750 So what was step two originally? 1938 01:21:24,750 --> 01:21:26,370 Sort the right half. 
1939 01:21:26,370 --> 01:21:29,820 So sort the right half means sort the left half. 1940 01:21:29,820 --> 01:21:34,790 And then sort the left half of that, done, right half of that, done. 1941 01:21:34,790 --> 01:21:38,220 Merge makes it interesting, 3 and then 6. 1942 01:21:38,220 --> 01:21:40,770 I've now sorted the left half of four numbers. 1943 01:21:40,770 --> 01:21:41,790 What comes next? 1944 01:21:41,790 --> 01:21:44,250 Sort the right half, so 8 and 1. 1945 01:21:44,250 --> 01:21:47,430 Sort the left half of that, done, right half of that, done. 1946 01:21:47,430 --> 01:21:50,670 Now merge those two together, 1 and 8. 1947 01:21:50,670 --> 01:21:53,610 I've now sorted the right half of the four elements. 1948 01:21:53,610 --> 01:21:56,230 What's the third step? 1949 01:21:56,230 --> 01:21:56,730 Merge. 1950 01:21:56,730 --> 01:21:58,980 So it's left half, right half, merge, again and again. 1951 01:21:58,980 --> 01:22:05,310 So right half, left half, let's merge them, 1, 3, 6, 8. 1952 01:22:05,310 --> 01:22:07,530 And now if you rewind, like, two minutes, 1953 01:22:07,530 --> 01:22:11,050 this is the right half of the whole list. 1954 01:22:11,050 --> 01:22:13,180 So what's step three? 1955 01:22:13,180 --> 01:22:13,680 Merge. 1956 01:22:13,680 --> 01:22:15,570 So let's give ourselves a little more memory 1957 01:22:15,570 --> 01:22:24,620 and merge these two, 1, 2, 3, 4, 5, 6, 7, 8. 1958 01:22:24,620 --> 01:22:26,550 And my god, it's merged in the end. 1959 01:22:26,550 --> 01:22:28,410 Now, that was a lot of steps. 1960 01:22:28,410 --> 01:22:31,350 But it turns out it was far fewer than the number 1961 01:22:31,350 --> 01:22:33,940 of steps we were used to thus far.
1962 01:22:33,940 --> 01:22:36,600 In fact, if you consider what really happened, 1963 01:22:36,600 --> 01:22:39,180 after all of those verbal gymnastics, what I really did was 1964 01:22:39,180 --> 01:22:42,060 I took a list of size 8 and broke it down at some point 1965 01:22:42,060 --> 01:22:43,195 into eight lists of size 1. 1966 01:22:43,195 --> 01:22:45,570 And that's when there was no interesting work to be done. 1967 01:22:45,570 --> 01:22:46,320 We just returned. 1968 01:22:46,320 --> 01:22:46,820 1969 01:22:46,820 --> 01:22:50,760 But I did that so that I could then compose four lists of size 2 1970 01:22:50,760 --> 01:22:51,720 along the way. 1971 01:22:51,720 --> 01:22:54,545 And I did that so I could compose two lists of size 4. 1972 01:22:54,545 --> 01:22:56,670 And I did that so that I could aggregate everything 1973 01:22:56,670 --> 01:23:00,810 together and get one list of size 8. 1974 01:23:00,810 --> 01:23:02,550 So notice the pattern here. 1975 01:23:02,550 --> 01:23:05,310 If you go bottom up, even, here's one list. 1976 01:23:05,310 --> 01:23:06,660 I divide it in half. 1977 01:23:06,660 --> 01:23:07,980 I divided those halves in half. 1978 01:23:07,980 --> 01:23:10,180 I divided those halves in halves. 1979 01:23:10,180 --> 01:23:13,590 So what function or mathematics have we used 1980 01:23:13,590 --> 01:23:16,110 to describe any process thus far since week 0, 1981 01:23:16,110 --> 01:23:19,498 where we're doing something halves at a time? 1982 01:23:19,498 --> 01:23:20,675 AUDIENCE: Logarithm. 1983 01:23:20,675 --> 01:23:21,550 SPEAKER 1: Logarithm. 1984 01:23:21,550 --> 01:23:25,150 So any time you see in CS50, and really in algorithms more generally, 1985 01:23:25,150 --> 01:23:29,570 a process that is dividing and dividing and dividing again and again, 1986 01:23:29,570 --> 01:23:31,750 there's a logarithm involved there.
1987 01:23:31,750 --> 01:23:36,040 And indeed, the number of times that you can chop up a list of size 8 1988 01:23:36,040 --> 01:23:40,600 into eight lists of size 1 is, by definition, log base 2 of n 1989 01:23:40,600 --> 01:23:43,322 or just, again, with a wave of the hand, log n, which 1990 01:23:43,322 --> 01:23:46,030 is to say like the height of this picture, if you will, is log n. 1991 01:23:46,030 --> 01:23:48,405 But again, we don't have to worry too much about numbers. 1992 01:23:48,405 --> 01:23:52,210 But every time we did that dividing into smaller lists, we merged, right? 1993 01:23:52,210 --> 01:23:54,430 That was the third and most important step. 1994 01:23:54,430 --> 01:23:58,810 And every time we merged, we combined 4 elements 1995 01:23:58,810 --> 01:24:02,740 plus 4 elements or 2 plus 2 plus 2 plus 2 elements or 1 plus 1 1996 01:24:02,740 --> 01:24:04,840 plus 1, 8 elements individually. 1997 01:24:04,840 --> 01:24:07,370 So we touched all n elements. 1998 01:24:07,370 --> 01:24:11,050 So this picture, if you will, is, like, 8 numbers wide. 1999 01:24:11,050 --> 01:24:14,740 And I-- or n numbers wide, if we generalize as n. 2000 01:24:14,740 --> 01:24:19,360 And it's log n rows tall, if you will, because that's how many times you 2001 01:24:19,360 --> 01:24:20,890 can divide things again and again. 2002 01:24:20,890 --> 01:24:24,970 So what is the running time intuitively, perhaps, of merge sort? 2003 01:24:24,970 --> 01:24:28,240 It's actually n times log n because you've 2004 01:24:28,240 --> 01:24:31,000 got n numbers that need to be merged again and again and again. 2005 01:24:31,000 --> 01:24:34,000 But how many times did I say again? 2006 01:24:34,000 --> 01:24:38,620 Log n times, because that's the number of times you can halve things again 2007 01:24:38,620 --> 01:24:40,758 and again and again.
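As a rough worked version of that count for the eight numbers in the walkthrough (a back-of-the-envelope tally of elements touched while merging, not an exact operation count):

```latex
\underbrace{\log_2 n}_{\text{rows of merging}} = \log_2 8 = 3,
\qquad
\underbrace{n}_{\text{elements merged per row}} = 8,
\qquad
n \log_2 n = 8 \cdot 3 = 24
\quad\text{versus}\quad
n^2 = 8^2 = 64 .
```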
2008 01:24:40,758 --> 01:24:44,050 And if you do the math, log base 2 of 8, which is the total number of elements, 2009 01:24:44,050 --> 01:24:45,970 indeed is 1, 2, 3. 2010 01:24:45,970 --> 01:24:47,080 So math works out. 2011 01:24:47,080 --> 01:24:49,660 But it's OK if you think about it more intuitively. 2012 01:24:49,660 --> 01:24:51,640 So this is perhaps the bigger leap of faith, 2013 01:24:51,640 --> 01:24:55,420 to just believe that, indeed, that is how that math works out. 2014 01:24:55,420 --> 01:24:58,600 But it turns out that what this means is the algorithm 2015 01:24:58,600 --> 01:25:01,610 itself is fundamentally faster. 2016 01:25:01,610 --> 01:25:03,568 So if we consider our little chart from before, 2017 01:25:03,568 --> 01:25:06,318 where bubble sort and selection sort were way up here at the top-- 2018 01:25:06,318 --> 01:25:08,810 and frankly, you can have even slower algorithms than that, 2019 01:25:08,810 --> 01:25:11,680 especially if the problems are even more difficult to solve. 2020 01:25:11,680 --> 01:25:16,283 Now we can add to the list merge sort there at n log n. 2021 01:25:16,283 --> 01:25:16,950 It's in between. 2022 01:25:16,950 --> 01:25:17,450 Why? 2023 01:25:17,450 --> 01:25:20,920 Because, again, even if you're not 100% comfortable with what log n is, 2024 01:25:20,920 --> 01:25:22,270 notice that here's n. 2025 01:25:22,270 --> 01:25:23,420 Here's n squared. 2026 01:25:23,420 --> 01:25:27,910 So n times a slightly smaller value is in between, or n log n. 2027 01:25:27,910 --> 01:25:31,540 And we'll see in a moment what this actually means or feels like. 2028 01:25:31,540 --> 01:25:33,310 What about omega? 2029 01:25:33,310 --> 01:25:37,460 In the best case with merge sort, how much time does it take? 2030 01:25:37,460 --> 01:25:39,730 Well, it, too, does not have that optimization 2031 01:25:39,730 --> 01:25:42,970 that bubble sort had, which is, well, if you do no swaps, just quit.
2032 01:25:42,970 --> 01:25:45,940 It does the same thing always, sort the left half, sort the right half, 2033 01:25:45,940 --> 01:25:48,430 merge, even if it's a bit unnecessary. 2034 01:25:48,430 --> 01:25:53,710 So it turns out that the omega notation for merge sort is also n log n. 2035 01:25:53,710 --> 01:25:58,540 The newer version of bubble sort, recall, we could get as good as n steps 2036 01:25:58,540 --> 01:26:01,000 if we stop after seeing no swaps. 2037 01:26:01,000 --> 01:26:02,920 So merge sort, it's a trade-off, right? 2038 01:26:02,920 --> 01:26:05,038 In the worst case, much faster, I claim. 2039 01:26:05,038 --> 01:26:05,830 It's not n squared. 2040 01:26:05,830 --> 01:26:06,760 It's n log n. 2041 01:26:06,760 --> 01:26:09,280 But in the best case, you might waste a little bit of time. 2042 01:26:09,280 --> 01:26:11,800 And again, that's thematic in computer science more generally. 2043 01:26:11,800 --> 01:26:13,210 You're not going to get anything for free. 2044 01:26:13,210 --> 01:26:15,190 If you want to improve your upper bound, you 2045 01:26:15,190 --> 01:26:18,493 might have to sacrifice your lower bound as well. 2046 01:26:18,493 --> 01:26:20,410 Now, it turns out with some algorithms-- and I 2047 01:26:20,410 --> 01:26:24,580 promise this is the last Greek notation for the course. 2048 01:26:24,580 --> 01:26:27,220 This is a capital theta in Greek. 2049 01:26:27,220 --> 01:26:31,000 And it turns out that if an algorithm has an upper bound and a lower 2050 01:26:31,000 --> 01:26:33,940 bound that are identical, you can describe it using, just 2051 01:26:33,940 --> 01:26:35,860 for shorthand notation, theta. 2052 01:26:35,860 --> 01:26:38,320 So we've seen two algorithms that fit these criteria. 2053 01:26:38,320 --> 01:26:40,000 Selection sort was pretty bad. 2054 01:26:40,000 --> 01:26:42,100 It was big O of n squared.
2055 01:26:42,100 --> 01:26:44,320 And it was omega of n squared because it just 2056 01:26:44,320 --> 01:26:47,590 kept blindly looking for the smallest elements again and again. 2057 01:26:47,590 --> 01:26:51,910 Merge sort is in theta of n log n for the same reason. 2058 01:26:51,910 --> 01:26:54,250 It just blindly does the same algorithm again and again, 2059 01:26:54,250 --> 01:26:58,420 no matter whether the input is already sorted or completely unsorted. 2060 01:26:58,420 --> 01:27:04,130 But on the whole, n log n is a pretty powerful, compelling feature. 2061 01:27:04,130 --> 01:27:07,990 So let me go ahead and turn our attention, finally, 2062 01:27:07,990 --> 01:27:11,530 to a little visualization that might help this sink in as well. 2063 01:27:11,530 --> 01:27:15,760 What you're about to see is a bunch of vertical bars, the top of which 2064 01:27:15,760 --> 01:27:17,590 are 100 bars from left to right. 2065 01:27:17,590 --> 01:27:19,840 Small bars equals small number. 2066 01:27:19,840 --> 01:27:21,160 Big bar equals big number. 2067 01:27:21,160 --> 01:27:23,650 And the first algorithm up here is selection sort. 2068 01:27:23,650 --> 01:27:25,810 The second algorithm down here is bubble sort. 2069 01:27:25,810 --> 01:27:27,820 And the middle algorithm is merge sort. 2070 01:27:27,820 --> 01:27:30,820 So if you will, we'll end on this note today. 2071 01:27:30,820 --> 01:27:33,490 We'll time these algorithms with these simple inputs 2072 01:27:33,490 --> 01:27:39,160 and see just how much better, I claim, merge sort is, which is to say, 2073 01:27:39,160 --> 01:27:43,330 just how big of a difference does n squared versus n log n make, 2074 01:27:43,330 --> 01:27:46,120 which is to say when you design algorithms, making things correct 2075 01:27:46,120 --> 01:27:47,650 is not the ultimate goal. 2076 01:27:47,650 --> 01:27:50,290 It's to make them well designed as well. 
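The trade-off described above, where merge sort does the same n log n work on any input while bubble sort can quit after n steps on sorted input, can be illustrated with a small sketch. This is not code from the lecture: the function names `merge_sort` and `bubble_sort`, the choice of Python, and the way "steps" are counted (elements written during merges, comparisons during bubble passes) are all assumptions made for illustration.

```python
def merge_sort(values):
    """Return (sorted_copy, steps); steps counts elements written while merging.

    Illustrative sketch: the step-counting convention is an assumption.
    """
    if len(values) <= 1:
        return list(values), 0  # a list of 0 or 1 elements is already sorted

    middle = len(values) // 2
    left, left_steps = merge_sort(values[:middle])    # sort the left half
    right, right_steps = merge_sort(values[middle:])  # sort the right half

    # Merge the two sorted halves, always taking the smaller front element.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])

    return merged, left_steps + right_steps + len(merged)


def bubble_sort(values):
    """Return (sorted_copy, comparisons), quitting early after a pass with no swaps."""
    items = list(values)
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for k in range(end):
            comparisons += 1
            if items[k] > items[k + 1]:
                items[k], items[k + 1] = items[k + 1], items[k]
                swapped = True
        if not swapped:
            break  # the optimization: no swaps means the list is sorted
    return items, comparisons


if __name__ == "__main__":
    already_sorted = list(range(8))
    reversed_input = list(range(7, -1, -1))
    print(merge_sort(already_sorted)[1], merge_sort(reversed_input)[1])    # 24 24
    print(bubble_sort(already_sorted)[1], bubble_sort(reversed_input)[1])  # 7 28
```

With 8 elements, this merge sort reports 24 steps (8 times log base 2 of 8) whether the input is sorted or reversed, which is the theta of n log n behavior. The early-exit bubble sort needs only 7 comparisons on sorted input but 28 on reversed input, which grows like n squared.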
2077 01:27:50,290 --> 01:27:53,284 [MUSIC PLAYING] 2078 01:27:53,284 --> 01:28:53,280 2079 01:28:53,280 --> 01:28:55,020 That's it for CS50 and merge sort. 2080 01:28:55,020 --> 01:28:56,820 We will see you next time. 2081 01:28:56,820 --> 01:28:59,570 [APPLAUSE]