1 00:00:00,000 --> 00:00:02,982 [FILM REEL] 2 00:00:02,982 --> 00:00:06,461 [MUSIC PLAYING] 3 00:00:06,461 --> 00:01:12,660 4 00:01:12,660 --> 00:01:15,420 DAVID MALAN: All right, this is CS50. 5 00:01:15,420 --> 00:01:19,200 And this is Week 3 already, wherein we'll take a look back actually 6 00:01:19,200 --> 00:01:21,400 at Week 0 where we first began. 7 00:01:21,400 --> 00:01:24,930 And in Week 0, recall that everything was very intuitive, in a sense. 8 00:01:24,930 --> 00:01:28,000 We talked not just about representation of information, but algorithms. 9 00:01:28,000 --> 00:01:30,450 And we talked about tearing a phone book again and again. 10 00:01:30,450 --> 00:01:32,860 And that somehow got us to a better solution. 11 00:01:32,860 --> 00:01:35,888 But today, we'll try to start formalizing some of those ideas 12 00:01:35,888 --> 00:01:38,430 and capturing some of those same ideas not in pseudocode just 13 00:01:38,430 --> 00:01:41,650 yet, but in actual code as well. 14 00:01:41,650 --> 00:01:45,210 But we'll also consider the efficiency of those algorithms, 15 00:01:45,210 --> 00:01:48,372 like just how good, how well-designed our algorithms actually are. 16 00:01:48,372 --> 00:01:50,580 And if you recall, when we did the phone book example 17 00:01:50,580 --> 00:01:53,940 wherein I first had an algorithm searching one page at a time, 18 00:01:53,940 --> 00:01:56,460 and then second one two pages at a time, and then third, 19 00:01:56,460 --> 00:01:58,590 started tearing the thing in half, recall 20 00:01:58,590 --> 00:02:01,900 that we, with a wave of the hand, kind of analyzed it as follows. 21 00:02:01,900 --> 00:02:05,070 We proposed that if the x-axis here is the size of the problem, 22 00:02:05,070 --> 00:02:09,120 like number of pages in a phone book, and the y-axis is the time required 23 00:02:09,120 --> 00:02:11,430 to solve the problem in seconds, minutes, 24 00:02:11,430 --> 00:02:13,363 page tears, whatever your unit of measure is, 25 00:02:13,363 --> 00:02:16,530 recall that the first algorithm, which is the straight line such that if you 26 00:02:16,530 --> 00:02:19,620 had n pages in the phone book, it might have this slope of n-- 27 00:02:19,620 --> 00:02:23,280 and there's this one-to-one relationship between pages and tears. 28 00:02:23,280 --> 00:02:27,000 Two pages at a time, of course, was twice as fast, but still really 29 00:02:27,000 --> 00:02:30,000 the same shape, the yellow line here indicating that yeah, 30 00:02:30,000 --> 00:02:33,570 it's n over 2, maybe plus 1 if you have to double back, as we discussed. 31 00:02:33,570 --> 00:02:36,840 But it's really still fundamentally the same algorithm one 32 00:02:36,840 --> 00:02:38,370 or two pages at a time. 33 00:02:38,370 --> 00:02:41,760 But the third algorithm, recall, was this one here in green, 34 00:02:41,760 --> 00:02:45,660 where we called it logarithmic in terms of how fast or how slow it was. 35 00:02:45,660 --> 00:02:48,235 And indeed, the implication of this algorithm 36 00:02:48,235 --> 00:02:50,610 was that we could even double the size of the phone book, 37 00:02:50,610 --> 00:02:53,280 and no big deal-- one additional page tear, 38 00:02:53,280 --> 00:02:55,990 and we take yet another 1,000 page bite out of the phone book. 39 00:02:55,990 --> 00:02:58,890 So today, we'll revisit some of these ideas, formalize them a bit, 40 00:02:58,890 --> 00:03:01,530 but also translate some of them, ultimately, to code. 41 00:03:01,530 --> 00:03:03,690 And all of that now is possible because we 42 00:03:03,690 --> 00:03:06,060 have this lower-level understanding, perhaps, of what's 43 00:03:06,060 --> 00:03:07,600 actually inside of your computer. 44 00:03:07,600 --> 00:03:10,170 This, of course, is your computer's RAM or memory. 45 00:03:10,170 --> 00:03:12,720 And recall that if we start to abstract this away, 46 00:03:12,720 --> 00:03:15,420 your computer's memory is really just a grid of bytes. 47 00:03:15,420 --> 00:03:17,730 In fact, we don't have to look at the hardware anymore. 48 00:03:17,730 --> 00:03:21,270 And we looked at a grid of bytes like this, whereby each of these bytes 49 00:03:21,270 --> 00:03:26,700 could be used to store a char, an int, a long, or even an entire string, 50 00:03:26,700 --> 00:03:27,460 at that. 51 00:03:27,460 --> 00:03:30,117 But let's focus perhaps just on a subset of this 52 00:03:30,117 --> 00:03:31,950 because last week, of course, we emphasized, 53 00:03:31,950 --> 00:03:34,830 really, arrays, storing things in arrays. 54 00:03:34,830 --> 00:03:38,140 And that allowed us to start storing entire strings, sequences 55 00:03:38,140 --> 00:03:40,470 of characters, and even arrays of integers 56 00:03:40,470 --> 00:03:44,770 if we wanted to have multiple ones and not just multiple variables as well. 57 00:03:44,770 --> 00:03:48,850 But the catch is that if you look inside of an array in the computer's memory-- 58 00:03:48,850 --> 00:03:51,450 and for instance, suppose these integers here are stored-- 59 00:03:51,450 --> 00:03:53,850 it's pretty easy for us humans to glance at this 60 00:03:53,850 --> 00:03:55,698 and immediately find the number 50. 61 00:03:55,698 --> 00:03:57,990 You sort of have this bird's eye view from where you're 62 00:03:57,990 --> 00:03:59,700 seated of everything on the screen. 63 00:03:59,700 --> 00:04:02,500 And so it's pretty obvious how you get to the number 50. 64 00:04:02,500 --> 00:04:04,710 But in the world of computers, of course, 65 00:04:04,710 --> 00:04:06,720 it turns out that this is hardware. 66 00:04:06,720 --> 00:04:10,440 And computers, for today's purposes, can only do one thing at a time. 67 00:04:10,440 --> 00:04:14,820 They can't just take it all in and find instantly some number like 50. 68 00:04:14,820 --> 00:04:17,130 So perhaps a decent metaphor is to consider 69 00:04:17,130 --> 00:04:20,010 the array of memory inside of your computer 70 00:04:20,010 --> 00:04:22,500 really is a sequence of closed doors. 71 00:04:22,500 --> 00:04:25,860 And if the computer wants to find some value in an array, 72 00:04:25,860 --> 00:04:29,970 it has to do the digital equivalent of opening each of these doors one 73 00:04:29,970 --> 00:04:30,730 at a time. 74 00:04:30,730 --> 00:04:32,410 Now how can code do that? 75 00:04:32,410 --> 00:04:35,610 Well, of course, we introduced indices or indexes last week, 76 00:04:35,610 --> 00:04:39,900 whereby we, by convention, call the first element of an array location 0, 77 00:04:39,900 --> 00:04:44,490 the second location 1, the third location 2, and so forth-- so-called 0 78 00:04:44,490 --> 00:04:45,150 indexed. 79 00:04:45,150 --> 00:04:48,300 And this allowed us to now bridge this conceptual world of what's 80 00:04:48,300 --> 00:04:51,510 going on in memory with actual code, because now we had this square bracket 81 00:04:51,510 --> 00:04:55,710 syntax via which we could go searching for something if we so choose. 82 00:04:55,710 --> 00:04:59,760 And it turns out, if I now paint these red instead of yellow, 83 00:04:59,760 --> 00:05:02,970 it would seem that we actually have a pretty good physical metaphor here 84 00:05:02,970 --> 00:05:07,170 standing in place for what would be a computer's array of memory 85 00:05:07,170 --> 00:05:10,390 if, for instance, you're storing some seven numbers like that. 86 00:05:10,390 --> 00:05:13,710 And so today we begin with a look of a specific type of algorithm. 87 00:05:13,710 --> 00:05:14,910 That is for searching. 88 00:05:14,910 --> 00:05:16,270 Searching is all over the place. 89 00:05:16,270 --> 00:05:19,900 All of us have probably gone to google.com or some equivalent already 90 00:05:19,900 --> 00:05:21,090 multiple times per day. 91 00:05:21,090 --> 00:05:24,120 And getting back answers fast is what companies like Google 92 00:05:24,120 --> 00:05:25,110 are really good at. 93 00:05:25,110 --> 00:05:26,700 So how are they doing that? 94 00:05:26,700 --> 00:05:29,882 How are they storing information in computers' memory? 95 00:05:29,882 --> 00:05:31,590 Well, let's consider what this really is. 96 00:05:31,590 --> 00:05:34,500 It's really just a problem as it was back in Week 0. 97 00:05:34,500 --> 00:05:36,420 The input, though, to the problem, for now, 98 00:05:36,420 --> 00:05:38,253 might be this array of seven lockers. 99 00:05:38,253 --> 00:05:40,920 So that's the input to the problem, inside of which is a number. 100 00:05:40,920 --> 00:05:45,060 And maybe for simplicity now, we just want a yes/no, a true/false answer-- 101 00:05:45,060 --> 00:05:46,860 a bool, that is to say-- 102 00:05:46,860 --> 00:05:51,250 of whether or not some number like 50 is in that array. 103 00:05:51,250 --> 00:05:52,680 It's not quite as fancy as Google. 104 00:05:52,680 --> 00:05:55,500 That doesn't just tell you yes, we have search results. 105 00:05:55,500 --> 00:05:57,300 It actually gives you the search results. 106 00:05:57,300 --> 00:05:59,520 But for, now we'll keep it simple, and just output 107 00:05:59,520 --> 00:06:02,070 as part of this problem yes or no, true or false, 108 00:06:02,070 --> 00:06:06,090 we have found the number we're looking for given an input like that array. 109 00:06:06,090 --> 00:06:09,930 But it turns out inside of this black box that we keep coming back to, 110 00:06:09,930 --> 00:06:12,385 there's all sorts of possible algorithms. 111 00:06:12,385 --> 00:06:15,010 And we talked about this at a high level conceptually in Week 0 112 00:06:15,010 --> 00:06:15,860 with the phone book. 113 00:06:15,860 --> 00:06:19,360 But today, let's consider it a little more concretely 114 00:06:19,360 --> 00:06:22,648 by way of a game that some of you might have grown up with, namely Monopoly. 115 00:06:22,648 --> 00:06:24,940 And so behind these doors, it turns out, will be hidden 116 00:06:24,940 --> 00:06:26,800 some denominations of monopoly money. 117 00:06:26,800 --> 00:06:28,690 But for this, we now have two volunteers. 118 00:06:28,690 --> 00:06:30,483 If you'd like to greet the world? 119 00:06:30,483 --> 00:06:31,525 JACKSON: Hi, I'm Jackson. 120 00:06:31,525 --> 00:06:35,440 121 00:06:35,440 --> 00:06:37,220 STEPHANIE: Hi, my name is Stephanie. 122 00:06:37,220 --> 00:06:38,410 DAVID MALAN: And do you want to say a little something 123 00:06:38,410 --> 00:06:40,570 about yourselves-- years, house, dorm? 124 00:06:40,570 --> 00:06:43,030 STEPHANIE: I'm a first year living in Matthews. 125 00:06:43,030 --> 00:06:43,780 DAVID MALAN: Nice. 126 00:06:43,780 --> 00:06:45,580 JACKSON: And I'm a first year in Canaday. 127 00:06:45,580 --> 00:06:46,330 DAVID MALAN: Nice. 128 00:06:46,330 --> 00:06:48,670 Well, welcome to our two volunteers. 129 00:06:48,670 --> 00:06:50,710 So why don't we do this? 130 00:06:50,710 --> 00:06:54,460 Would one of you like to volunteer the other to go first? 131 00:06:54,460 --> 00:06:55,510 STEPHANIE: I'll go. 132 00:06:55,510 --> 00:06:56,560 DAVID MALAN: OK. 133 00:06:56,560 --> 00:06:58,300 All right, so Stephanie's up first. 134 00:06:58,300 --> 00:07:02,560 And behind one of these doors here, we've hidden the monopoly money 50. 135 00:07:02,560 --> 00:07:04,180 And so we'd like you to find the 50. 136 00:07:04,180 --> 00:07:06,490 We'll tell you nothing more about the lockers. 137 00:07:06,490 --> 00:07:08,900 But we would like you to execute a certain algorithm. 138 00:07:08,900 --> 00:07:11,020 And in fact, I'm going to give you some pseudocode for this. 139 00:07:11,020 --> 00:07:12,770 And I'm going to give you the name for it. 140 00:07:12,770 --> 00:07:13,900 It's called linear search. 141 00:07:13,900 --> 00:07:15,855 And as the name implies, you're pretty much 142 00:07:15,855 --> 00:07:17,980 going to end up walking in sort of a straight line. 143 00:07:17,980 --> 00:07:19,220 But how are you going to do this? 144 00:07:19,220 --> 00:07:21,520 Well, let me propose that in a moment, your first step 145 00:07:21,520 --> 00:07:23,170 will be to think kind of like a loop. 146 00:07:23,170 --> 00:07:27,280 For each door from left to right, what do we want you to do on each iteration? 147 00:07:27,280 --> 00:07:31,528 Well, if 50 is behind that door, then we want 148 00:07:31,528 --> 00:07:33,070 to go ahead and have you return true. 149 00:07:33,070 --> 00:07:35,860 And hold up the 50 proudly, if you will, for the group. 150 00:07:35,860 --> 00:07:38,500 Otherwise, if you get through that whole loop 151 00:07:38,500 --> 00:07:40,690 and you haven't found the number 50, you can just 152 00:07:40,690 --> 00:07:42,640 throw up your hands in disappointment. 153 00:07:42,640 --> 00:07:45,190 False-- you've not found the number 50. 154 00:07:45,190 --> 00:07:50,293 So to be clear, step one is going to be for each door from left to right. 155 00:07:50,293 --> 00:07:51,460 How would you like to begin? 156 00:07:51,460 --> 00:07:55,610 157 00:07:55,610 --> 00:07:56,495 Yep. 158 00:07:56,495 --> 00:07:57,840 Oh, and then-- yep. 159 00:07:57,840 --> 00:07:58,340 There we go. 160 00:07:58,340 --> 00:08:00,365 Yep. 161 00:08:00,365 --> 00:08:02,060 And if you'd like to at least tell-- 162 00:08:02,060 --> 00:08:04,040 good, good acting here. 163 00:08:04,040 --> 00:08:06,140 What have you found instead? 164 00:08:06,140 --> 00:08:07,970 STEPHANIE: It's not 50, but 20. 165 00:08:07,970 --> 00:08:08,900 DAVID MALAN: Oh, OK. 166 00:08:08,900 --> 00:08:10,280 So step one was a fail. 167 00:08:10,280 --> 00:08:11,870 So let's move on to step two. 168 00:08:11,870 --> 00:08:14,453 Inside of this loop, what are you going to do next? 169 00:08:14,453 --> 00:08:16,370 STEPHANIE: I'm going to move to the next door. 170 00:08:16,370 --> 00:08:17,037 DAVID MALAN: OK. 171 00:08:17,037 --> 00:08:20,790 172 00:08:20,790 --> 00:08:22,100 STEPHANIE: Almost. 173 00:08:22,100 --> 00:08:23,100 DAVID MALAN: OK, almost. 174 00:08:23,100 --> 00:08:23,790 Sort of. 175 00:08:23,790 --> 00:08:25,110 A 500 instead. 176 00:08:25,110 --> 00:08:26,280 Next locker? 177 00:08:26,280 --> 00:08:30,200 STEPHANIE: I would rather take that. 178 00:08:30,200 --> 00:08:30,700 No. 179 00:08:30,700 --> 00:08:33,840 180 00:08:33,840 --> 00:08:36,558 DAVID MALAN: OK, we're not telling the audience? 181 00:08:36,558 --> 00:08:38,260 STEPHANIE: It was a 10. 182 00:08:38,260 --> 00:08:39,789 DAVID MALAN: OK, so keep going. 183 00:08:39,789 --> 00:08:40,974 This is step three now. 184 00:08:40,974 --> 00:08:45,470 185 00:08:45,470 --> 00:08:46,310 STEPHANIE: Oh, man. 186 00:08:46,310 --> 00:08:49,850 187 00:08:49,850 --> 00:08:51,260 DAVID MALAN: Five, OK. 188 00:08:51,260 --> 00:08:52,670 A few more lockers to check. 189 00:08:52,670 --> 00:08:57,296 190 00:08:57,296 --> 00:08:58,790 STEPHANIE: A little sad, guys. 191 00:08:58,790 --> 00:09:02,527 192 00:09:02,527 --> 00:09:04,360 DAVID MALAN: All right, second-to-last step. 193 00:09:04,360 --> 00:09:07,710 194 00:09:07,710 --> 00:09:09,070 STEPHANIE: It's 1. 195 00:09:09,070 --> 00:09:10,022 Kind of close. 196 00:09:10,022 --> 00:09:10,980 DAVID MALAN: All right. 197 00:09:10,980 --> 00:09:12,780 And finally, the last step. 198 00:09:12,780 --> 00:09:15,780 Clearly you've been, perhaps, set up here. 199 00:09:15,780 --> 00:09:17,340 STEPHANIE: Let's go! 200 00:09:17,340 --> 00:09:19,920 DAVID MALAN: All right, so the number 50. 201 00:09:19,920 --> 00:09:23,500 202 00:09:23,500 --> 00:09:25,870 And Stephanie, if I may, let me ask you a question here. 203 00:09:25,870 --> 00:09:28,890 So on the screen, this is the pseudocode you just executed. 204 00:09:28,890 --> 00:09:31,800 Suppose, though, I had done what many of us 205 00:09:31,800 --> 00:09:34,770 have gotten to the habit of doing when you have an if condition. 206 00:09:34,770 --> 00:09:36,730 You often have an else branch as well. 207 00:09:36,730 --> 00:09:38,760 Suppose that I had done this now. 208 00:09:38,760 --> 00:09:41,680 And I'm marking it in red to be clear this is wrong. 209 00:09:41,680 --> 00:09:45,750 But what would have been bad about this code using an if and an else, 210 00:09:45,750 --> 00:09:47,040 might you say? 211 00:09:47,040 --> 00:09:48,360 Any instincts? 212 00:09:48,360 --> 00:09:55,620 213 00:09:55,620 --> 00:09:58,740 STEPHANIE: Then you would end up canceling 214 00:09:58,740 --> 00:10:00,210 the code before you found the 50. 215 00:10:00,210 --> 00:10:01,020 DAVID MALAN: Yeah, exactly. 216 00:10:01,020 --> 00:10:02,070 STEPHANIE: I mean, you'd just be eternally sad. 217 00:10:02,070 --> 00:10:02,460 DAVID MALAN: Indeed. 218 00:10:02,460 --> 00:10:05,127 When Stephanie had opened the first locker, she'd have found 20. 219 00:10:05,127 --> 00:10:06,630 20, of course, is not 50. 220 00:10:06,630 --> 00:10:07,838 She would have decreed false. 221 00:10:07,838 --> 00:10:10,547 But of course, she hadn't checked all of the rest of the lockers. 222 00:10:10,547 --> 00:10:13,440 So that would seem to be a key detail that, with this implementation 223 00:10:13,440 --> 00:10:16,440 of the pseudocode, we actually do go through-- as we did-- 224 00:10:16,440 --> 00:10:18,810 and only return false not even with an else, 225 00:10:18,810 --> 00:10:23,070 but just at the end of the loop such that we only reach that line if we 226 00:10:23,070 --> 00:10:25,245 don't return true earlier than that. 227 00:10:25,245 --> 00:10:26,620 Well, let's go ahead and do this. 228 00:10:26,620 --> 00:10:27,360 Let me take the mic from you. 229 00:10:27,360 --> 00:10:28,930 If you'd like to take a seat next to Jackson? 230 00:10:28,930 --> 00:10:31,180 And Jackson, in just a moment, we'll have you come up. 231 00:10:31,180 --> 00:10:34,170 Carter, if you don't mind reorganizing the lockers for us. 232 00:10:34,170 --> 00:10:36,630 But in the meantime, let me point out how we might now 233 00:10:36,630 --> 00:10:38,010 translate that same idea to code. 234 00:10:38,010 --> 00:10:41,050 Pretty high level, pretty English-oriented with that pseudocode-- 235 00:10:41,050 --> 00:10:44,910 but really now, as of last week, we have syntax via which Stephanie 236 00:10:44,910 --> 00:10:48,810 and, soon, Jackson could treat this locker, this set of lockers, as really 237 00:10:48,810 --> 00:10:51,250 indeed an array using bracket notation. 238 00:10:51,250 --> 00:10:54,480 So we can now get a little closer in our pseudocode to actual code. 239 00:10:54,480 --> 00:10:56,670 And the way a computer scientist, for instance, 240 00:10:56,670 --> 00:10:59,850 would translate fairly high level English pseudocode 241 00:10:59,850 --> 00:11:03,720 like this to something that's a little closer to C or any language 242 00:11:03,720 --> 00:11:06,600 that supports arrays would be a little more cryptically like this. 243 00:11:06,600 --> 00:11:09,060 But you'll see more of this syntax in the coming days. 244 00:11:09,060 --> 00:11:13,260 For i from 0 to n minus 1-- this is still pseudocode. 245 00:11:13,260 --> 00:11:15,840 But that's the English-like way of expressing what 246 00:11:15,840 --> 00:11:17,730 we've known come to know as a for loop. 247 00:11:17,730 --> 00:11:22,260 If 50 is behind doors bracket i-- so I'm assuming for the sake of discussion 248 00:11:22,260 --> 00:11:26,503 that doors, now, is the name of my variable, this array of seven doors. 249 00:11:26,503 --> 00:11:28,920 But then the rest of the logic, the rest of the pseudocode 250 00:11:28,920 --> 00:11:30,370 really is the same way. 251 00:11:30,370 --> 00:11:33,270 And so you'll find in time that programmers, computer scientists more 252 00:11:33,270 --> 00:11:36,900 generally, when you start expressing ideas, algorithms to someone else, 253 00:11:36,900 --> 00:11:40,170 instead of maybe operating at this level here, 254 00:11:40,170 --> 00:11:43,020 you now have a new vocabulary, really a new syntax 255 00:11:43,020 --> 00:11:44,910 that you can be a little more specific, not 256 00:11:44,910 --> 00:11:47,380 getting so into the weeds of writing actual C code, 257 00:11:47,380 --> 00:11:50,910 but at least now doing something that's a little closer to manipulating 258 00:11:50,910 --> 00:11:51,810 an array like this. 259 00:11:51,810 --> 00:11:55,140 So Jackson, would you like to stand on up? 260 00:11:55,140 --> 00:11:56,760 All right. 261 00:11:56,760 --> 00:11:57,360 Yes, yes. 262 00:11:57,360 --> 00:11:59,010 Support for Jackson here, too. 263 00:11:59,010 --> 00:12:00,780 Nice. 264 00:12:00,780 --> 00:12:04,470 And here now, I'm going to allow you an assumption that Stephanie did not have. 265 00:12:04,470 --> 00:12:06,660 Stephanie clearly was really doing her best 266 00:12:06,660 --> 00:12:10,050 searching from left to right using linear search, as we'll now call it. 267 00:12:10,050 --> 00:12:12,180 But they were pretty much in a random order, right? 268 00:12:12,180 --> 00:12:15,030 There was a 20 over there, there was 1 over there, and then a 50. 269 00:12:15,030 --> 00:12:19,110 So we deliberately jumbled things up and did not sort the numbers for her. 270 00:12:19,110 --> 00:12:22,530 But Carter kindly has just come up to give you a leg up, Jackson, 271 00:12:22,530 --> 00:12:24,510 by sorting the numbers in advance. 272 00:12:24,510 --> 00:12:27,630 And we'd like you this time, much like in week 0, 273 00:12:27,630 --> 00:12:30,060 to do something again and again, but this time 274 00:12:30,060 --> 00:12:32,250 using what we'll now call binary search. 275 00:12:32,250 --> 00:12:35,580 It's exactly the same algorithm conceptually as we did in Week 0. 276 00:12:35,580 --> 00:12:38,310 But if we translate it to the context of this array, 277 00:12:38,310 --> 00:12:40,450 we now might say something like this. 278 00:12:40,450 --> 00:12:42,960 The first step for Jackson might be to ask the question-- 279 00:12:42,960 --> 00:12:46,110 if 50 is behind the middle door, where presumably he's 280 00:12:46,110 --> 00:12:48,570 done some mental math to figure out what the middle is, 281 00:12:48,570 --> 00:12:50,610 then he's going to just return true. 282 00:12:50,610 --> 00:12:53,070 And hopefully we'll get lucky and 50 will be right there. 283 00:12:53,070 --> 00:12:56,400 Of course, there's two other possibilities at least, 284 00:12:56,400 --> 00:12:58,290 which would be what? 285 00:12:58,290 --> 00:13:01,290 50 is, with respect to these doors? 286 00:13:01,290 --> 00:13:03,930 Yeah, so to the left or to the right, alternatively. 287 00:13:03,930 --> 00:13:07,722 So if 50 is less than the middle door, then presumably, 288 00:13:07,722 --> 00:13:09,180 Jackson's going to want to go left. 289 00:13:09,180 --> 00:13:11,310 Else, if 50 is greater than the middle door, 290 00:13:11,310 --> 00:13:13,380 he's going to want to go right, much like I 291 00:13:13,380 --> 00:13:16,170 did physically last week with the phone book, 292 00:13:16,170 --> 00:13:17,910 dividing and conquering left to right. 293 00:13:17,910 --> 00:13:20,020 But there's actually a fourth case. 294 00:13:20,020 --> 00:13:21,540 Let's put it on the board first. 295 00:13:21,540 --> 00:13:25,530 What else might happen here that Jackson should consider? 296 00:13:25,530 --> 00:13:26,170 Yeah. 297 00:13:26,170 --> 00:13:28,590 Oh, it's not there. 298 00:13:28,590 --> 00:13:32,930 So let me actually go back and amend my pseudocode here and just say Jackson, 299 00:13:32,930 --> 00:13:36,200 if we don't hand you any doors at all, or eventually, 300 00:13:36,200 --> 00:13:39,080 as he's dividing and conquering, if he's left with no more doors, 301 00:13:39,080 --> 00:13:42,380 we have to handle that situation so that the behavior is defined. 302 00:13:42,380 --> 00:13:44,210 All right, so with that said, Jackson, do 303 00:13:44,210 --> 00:13:46,127 you want to go ahead and find us the number 50 304 00:13:46,127 --> 00:13:48,650 and walk us through verbally what you're doing and finding? 305 00:13:48,650 --> 00:13:52,860 JACKSON: All right, so it looks like this one is the middle door. 306 00:13:52,860 --> 00:13:55,290 So I'm going to open it. 307 00:13:55,290 --> 00:13:57,030 But it's 20, not 50. 308 00:13:57,030 --> 00:13:59,622 DAVID MALAN: Aw. 309 00:13:59,622 --> 00:14:01,080 What's going through your head now? 310 00:14:01,080 --> 00:14:02,670 JACKSON: So now I'm looking-- 311 00:14:02,670 --> 00:14:06,490 because 50 is higher than 20, I want to look to the right. 312 00:14:06,490 --> 00:14:07,440 DAVID MALAN: Good. 313 00:14:07,440 --> 00:14:10,270 JACKSON: And look for the new middle door, which would be here. 314 00:14:10,270 --> 00:14:11,700 DAVID MALAN: Nice. 315 00:14:11,700 --> 00:14:12,900 JACKSON: And it's 100-- 316 00:14:12,900 --> 00:14:13,740 bad. 317 00:14:13,740 --> 00:14:16,560 But 50 is less than 100. 318 00:14:16,560 --> 00:14:20,520 So now we to look left, which would be here. 319 00:14:20,520 --> 00:14:21,240 And ta-da. 320 00:14:21,240 --> 00:14:21,990 DAVID MALAN: Nice. 321 00:14:21,990 --> 00:14:25,680 Very well done this time around, too. 322 00:14:25,680 --> 00:14:29,680 So thank you, first, to our volunteers here. 323 00:14:29,680 --> 00:14:32,970 And in fact, since you're a fan of Monopoly, as we're so informed, 324 00:14:32,970 --> 00:14:35,220 we have the Cambridge edition of Monopoly 325 00:14:35,220 --> 00:14:36,743 with all your Harvard favorites. 326 00:14:36,743 --> 00:14:37,410 JACKSON: No way. 327 00:14:37,410 --> 00:14:38,460 DAVID MALAN: Here you go. 328 00:14:38,460 --> 00:14:38,970 STEPHANIE: Thank you. 329 00:14:38,970 --> 00:14:40,095 JACKSON: Thank you so much. 330 00:14:40,095 --> 00:14:42,570 DAVID MALAN: Thank you to our volunteers for finding us 50. 331 00:14:42,570 --> 00:14:46,940 So-- that was more popular than we expected. 332 00:14:46,940 --> 00:14:50,470 So here, we can translate this one more time into something 333 00:14:50,470 --> 00:14:52,060 a little closer to code. 334 00:14:52,060 --> 00:14:54,730 And again, still pseudocode-- but here, now, 335 00:14:54,730 --> 00:14:57,460 might be another formulation of exactly what Jackson just did, 336 00:14:57,460 --> 00:14:59,380 just using the nomenclature now of arrays, 337 00:14:59,380 --> 00:15:01,570 where you can be a little more precise with your instructions 338 00:15:01,570 --> 00:15:04,640 and still leave it to someone else to translate this, finally, to code. 339 00:15:04,640 --> 00:15:06,640 But here we have same question at the beginning. 340 00:15:06,640 --> 00:15:08,650 If no doors left, return false. 341 00:15:08,650 --> 00:15:11,500 If 50 is behind doors bracket middle-- 342 00:15:11,500 --> 00:15:13,810 so I'm assuming here, because this is pseudocode-- 343 00:15:13,810 --> 00:15:16,810 that somewhere I've done the mental math or the actual math 344 00:15:16,810 --> 00:15:19,480 to figure out what the index of middle is. 345 00:15:19,480 --> 00:15:22,420 For instance, if these are seven doors in an array, 346 00:15:22,420 --> 00:15:27,310 this would be location 0, 1, 2, 3, 4, 5, 6. 347 00:15:27,310 --> 00:15:31,120 So somehow I've taken the total number of doors, 7, 348 00:15:31,120 --> 00:15:33,520 divided by 2 to find the middle. 349 00:15:33,520 --> 00:15:34,430 That's 3 and 1/2. 350 00:15:34,430 --> 00:15:35,680 We have to deal with rounding. 351 00:15:35,680 --> 00:15:38,920 But suffice it to say there's a well-defined formula for finding 352 00:15:38,920 --> 00:15:39,910 the middle index. 353 00:15:39,910 --> 00:15:43,280 Given the total number of lockers, divide by 2 and then round accordingly. 354 00:15:43,280 --> 00:15:45,550 So that's presumably what Jackson did just by counting 355 00:15:45,550 --> 00:15:48,790 or in his head to find us door number 3. 356 00:15:48,790 --> 00:15:52,210 Not the third door, the 4th door, but door bracket 3. 357 00:15:52,210 --> 00:15:56,200 So this is just saying if 50 is behind doors bracket middle, return true. 358 00:15:56,200 --> 00:15:57,200 That was not the case. 359 00:15:57,200 --> 00:15:58,960 He found a $20 bill instead. 360 00:15:58,960 --> 00:16:03,670 Else, if 50 is less than the doors bracket middle, go ahead-- 361 00:16:03,670 --> 00:16:10,660 and now it gets interesting-- search doors 0 through doors middle minus 1. 362 00:16:10,660 --> 00:16:12,730 So it's getting a little more into the weeds now. 363 00:16:12,730 --> 00:16:16,900 But if middle is 3, this one here, what we want to now 364 00:16:16,900 --> 00:16:19,300 have Jackson search if 50 had been-- 365 00:16:19,300 --> 00:16:22,360 if the number had been less, we want to start at bracket 0 366 00:16:22,360 --> 00:16:24,340 and go up through this one. 367 00:16:24,340 --> 00:16:26,380 And we deliberately subtract 1 because what's 368 00:16:26,380 --> 00:16:28,297 the point of looking in the same locker again? 369 00:16:28,297 --> 00:16:31,270 We might as well do 0 through middle minus 1. 370 00:16:31,270 --> 00:16:36,040 Else if 50 is greater than doors bracket middle, which it was, 371 00:16:36,040 --> 00:16:37,120 what did we then do? 372 00:16:37,120 --> 00:16:40,870 So Jackson intuitively searched for doors middle plus 1 373 00:16:40,870 --> 00:16:43,295 through doors n minus 1. 374 00:16:43,295 --> 00:16:46,420 And honestly, it gets a little annoying having the pluses and minuses here. 375 00:16:46,420 --> 00:16:47,780 But just think of what it means. 376 00:16:47,780 --> 00:16:49,090 This is the middle door. 377 00:16:49,090 --> 00:16:53,942 And Jackson then did proceed to search through doors middle plus 1 378 00:16:53,942 --> 00:16:56,150 because there's no point in searching this one again. 379 00:16:56,150 --> 00:17:00,130 And then the last element in any array of size n 380 00:17:00,130 --> 00:17:05,352 where n is just our go-to number for the size is always going to be n minus 1. 381 00:17:05,352 --> 00:17:06,310 It's not going to be n. 382 00:17:06,310 --> 00:17:10,839 It's going to be n minus 1 because we always start counting arrays at 0. 383 00:17:10,839 --> 00:17:14,200 So here then we have a translation into pseudocode that's a little closer 384 00:17:14,200 --> 00:17:16,430 to C of this exact same idea. 385 00:17:16,430 --> 00:17:18,490 And here, we come full circle to Week 0. 386 00:17:18,490 --> 00:17:22,240 In Week 0, it was pretty intuitive to imagine dividing and conquering 387 00:17:22,240 --> 00:17:23,420 a problem like this. 388 00:17:23,420 --> 00:17:26,770 But if you now think back to actual your iPhone, your Android phone, 389 00:17:26,770 --> 00:17:29,920 or the like, when you're doing autocomplete and searching the list, 390 00:17:29,920 --> 00:17:33,430 it's possible, if you don't have many friends or family or colleagues 391 00:17:33,430 --> 00:17:34,840 in the phone, you know what? 392 00:17:34,840 --> 00:17:39,070 Linear search, just checking every name for the person you're searching for, 393 00:17:39,070 --> 00:17:40,720 might be perfectly fine. 394 00:17:40,720 --> 00:17:43,810 But odds are your phone's being smarter than that, especially if you 395 00:17:43,810 --> 00:17:47,140 start to have dozens, hundreds, thousands of people in your contacts 396 00:17:47,140 --> 00:17:47,770 over the years. 397 00:17:47,770 --> 00:17:49,540 What would be better than linear search? 398 00:17:49,540 --> 00:17:51,340 Well, perhaps binary search. 399 00:17:51,340 --> 00:17:55,180 But, but, but-- there's an assumption, a requirement, which is what? 400 00:17:55,180 --> 00:18:00,340 Why was Jackson ultimately able to find the 50 in just three steps 401 00:18:00,340 --> 00:18:04,830 instead of a full seven, like Stephanie? 402 00:18:04,830 --> 00:18:06,570 Because the array was sorted. 403 00:18:06,570 --> 00:18:09,990 And so this is sort of a teaser for what we'll have to come back to later today. 404 00:18:09,990 --> 00:18:12,780 Well, how much effort did it take someone like Carter? 405 00:18:12,780 --> 00:18:15,240 How much effort does it take your phone to sort all 406 00:18:15,240 --> 00:18:17,040 of those names and numbers in advance? 407 00:18:17,040 --> 00:18:19,650 Because maybe it's not actually worth the amount of time. 408 00:18:19,650 --> 00:18:24,210 Now someone like Google probably somehow keeps the database of web pages sorted. 409 00:18:24,210 --> 00:18:28,110 You can imagine it being super slow if, when you type in cats or something else 410 00:18:28,110 --> 00:18:32,280 into google.com, if they searched linearly over their entire data set. 411 00:18:32,280 --> 00:18:35,430 Ideally, they're doing something a little smarter than that. 412 00:18:35,430 --> 00:18:38,820 So we'll formalize, now, exactly this kind of analysis. 413 00:18:38,820 --> 00:18:42,180 And it's not going to be so much mathy as it still will be intuitive. 414 00:18:42,180 --> 00:18:45,480 But we'll introduce you to some jargon, some terminology 415 00:18:45,480 --> 00:18:48,180 that most any programmer or computer scientists might use 416 00:18:48,180 --> 00:18:50,550 when analyzing their own algorithms. 417 00:18:50,550 --> 00:18:53,670 Let's formalize now what this kind of analysis is. 418 00:18:53,670 --> 00:18:56,910 So right now, I claim binary search better than linear search. 419 00:18:56,910 --> 00:18:59,100 But how much better and why, exactly? 420 00:18:59,100 --> 00:19:01,120 Well, it all comes back to this kind of graph. 421 00:19:01,120 --> 00:19:04,830 So this, recall, is how we analyzed the phone book back in Week 0. 422 00:19:04,830 --> 00:19:08,880 And recall that, indeed, we had these formulas, rough formulas that 423 00:19:08,880 --> 00:19:11,370 described the running time of those three algorithms-- 424 00:19:11,370 --> 00:19:13,740 one page at a time, two pages at a time, and then 425 00:19:13,740 --> 00:19:15,720 tearing the thing again and again in half. 426 00:19:15,720 --> 00:19:19,830 And precisely, if you counted up the number of pages I was touching 427 00:19:19,830 --> 00:19:22,020 or the number of pages I was tearing, it's 428 00:19:22,020 --> 00:19:24,600 fair to say that the first algorithm, in the worst case, 429 00:19:24,600 --> 00:19:26,700 might have taken n total pages. 430 00:19:26,700 --> 00:19:29,010 It didn't because I was searching for John Harvard 431 00:19:29,010 --> 00:19:31,260 at the time, which is somewhat early in the alphabet. 432 00:19:31,260 --> 00:19:34,340 But if I were searching for someone with the last name of Z, 433 00:19:34,340 --> 00:19:36,840 I would have had to keep going and going, in the worst case, 434 00:19:36,840 --> 00:19:38,430 through all n pages. 435 00:19:38,430 --> 00:19:41,940 Not as bad for the second algorithm, and that's why we do n divided by 2. 436 00:19:41,940 --> 00:19:43,920 And even that's a bit of a white lie. 437 00:19:43,920 --> 00:19:48,270 It's probably n divided by 2 plus 1 in case I have to double back. 438 00:19:48,270 --> 00:19:50,405 But again, I'm sort of doing this more generally 439 00:19:50,405 --> 00:19:52,030 to capture the essence of these things. 440 00:19:52,030 --> 00:19:54,363 And then we really got into the weeds with like log base 441 00:19:54,363 --> 00:19:56,940 2 event for that third and final algorithm. 442 00:19:56,940 --> 00:20:00,690 And at the time, we claimed any time you're dividing something in half, 443 00:20:00,690 --> 00:20:04,200 in half, in half, odds are there's going to be some kind of logarithm involved. 444 00:20:04,200 --> 00:20:05,340 And we'll see that today. 445 00:20:05,340 --> 00:20:09,010 But today, we're going to actually start using computer science terminology. 446 00:20:09,010 --> 00:20:13,590 And we're going to formalize this imprecision, if you will. 447 00:20:13,590 --> 00:20:17,670 We are not going to care, generally, about exactly how many steps 448 00:20:17,670 --> 00:20:20,820 some algorithm takes because that's not going to be that enlightening, 449 00:20:20,820 --> 00:20:24,630 especially if maybe you have a faster computer tomorrow than you did today. 450 00:20:24,630 --> 00:20:27,510 It wouldn't really be fair to compare numbers too precisely. 451 00:20:27,510 --> 00:20:31,590 We really want to, with the wave of a hand, just get a sense of roughly 452 00:20:31,590 --> 00:20:33,930 how slow or how fast an algorithm is. 453 00:20:33,930 --> 00:20:36,000 So the notation here is deliberate. 454 00:20:36,000 --> 00:20:40,620 That is literally a capital O, often italicized, refer to as big O. 455 00:20:40,620 --> 00:20:43,920 And so the first algorithm is in big O of n. 456 00:20:43,920 --> 00:20:47,760 The second algorithm is in big O of n divided by 2. 457 00:20:47,760 --> 00:20:51,480 The third algorithm is in big O of log base 2 of n. 458 00:20:51,480 --> 00:20:54,690 But even that is kind of unnecessary detail. 459 00:20:54,690 --> 00:20:58,830 When using big O notation, you really don't care about, we'll see, 460 00:20:58,830 --> 00:21:01,230 the smaller order terms. 461 00:21:01,230 --> 00:21:04,500 We're not going to care about the divided by 2 because you know what? 462 00:21:04,500 --> 00:21:07,720 The shape of these algorithms are is almost the same. 463 00:21:07,720 --> 00:21:10,800 And really, the idea-- the algorithm itself is sort of fundamentally 464 00:21:10,800 --> 00:21:11,340 the same. 465 00:21:11,340 --> 00:21:13,620 Instead of one page at a time, I'm doing two. 466 00:21:13,620 --> 00:21:17,280 But if you throw millions of pages, billions of pages at me, 467 00:21:17,280 --> 00:21:20,460 those algorithms are really going to kind of perform the same as n gets 468 00:21:20,460 --> 00:21:22,085 really large, goes off toward infinity. 469 00:21:22,085 --> 00:21:23,585 And the same is true for logarithms. 470 00:21:23,585 --> 00:21:25,560 Even if you're a little rusty, it turns out 471 00:21:25,560 --> 00:21:29,560 that whether you do the math with log base 2, log base 3, log base 10, 472 00:21:29,560 --> 00:21:33,040 you can just multiply one by the other to really get the same formula. 473 00:21:33,040 --> 00:21:35,700 So this is only to say a computer scientist would generally 474 00:21:35,700 --> 00:21:39,270 say that the first two algorithms are on the order of n steps. 475 00:21:39,270 --> 00:21:42,690 The third algorithm is on the order of log n steps. 476 00:21:42,690 --> 00:21:46,350 And we don't really care precisely what we mean beyond that. 477 00:21:46,350 --> 00:21:49,770 And this big O notation, as we'll see-- and actually, let me zoom out. 478 00:21:49,770 --> 00:21:53,500 If you can imagine suddenly making the x-axis much longer-- 479 00:21:53,500 --> 00:21:55,770 so more pages on the screen at once-- 480 00:21:55,770 --> 00:21:58,320 it is indeed going to be the shapes of these curves 481 00:21:58,320 --> 00:22:01,480 that matter, because imagine in your mind's eye as you zoom out, 482 00:22:01,480 --> 00:22:04,950 zoom out, zoom out, zoom out, and as n gets much, much, much bigger 483 00:22:04,950 --> 00:22:07,470 on the x-axis, the red and the yellow line 484 00:22:07,470 --> 00:22:11,400 are essentially going to look the same once n is sufficiently large. 485 00:22:11,400 --> 00:22:14,378 But the green line is never going to look the same. 486 00:22:14,378 --> 00:22:16,420 It's going to be a fundamentally different shape. 487 00:22:16,420 --> 00:22:18,570 And so that's the intuition of big O, to get 488 00:22:18,570 --> 00:22:23,200 a sense of these rates of performance like this. 489 00:22:23,200 --> 00:22:25,410 So here, then, is big O. Here is, perhaps, 490 00:22:25,410 --> 00:22:29,160 a cheat sheet of the common formulas that a computer scientist, certainly 491 00:22:29,160 --> 00:22:32,490 in an introductory context, might use when analyzing algorithms. 492 00:22:32,490 --> 00:22:36,060 And let's consider for a moment which of our first two algorithms-- 493 00:22:36,060 --> 00:22:39,040 linear search and binary search-- fall into these categories. 494 00:22:39,040 --> 00:22:44,318 So I've ordered them from slowest to fastest, so order of n squared. 495 00:22:44,318 --> 00:22:46,110 It's not something we've actually seen yet, 496 00:22:46,110 --> 00:22:48,392 but it tends to be slow because it's quadratic. 497 00:22:48,392 --> 00:22:49,350 You're doing n times n. 498 00:22:49,350 --> 00:22:51,090 That's got to add up to a lot of steps. 499 00:22:51,090 --> 00:22:53,190 Better today is going to be n log n. 500 00:22:53,190 --> 00:22:54,630 Even better is going to be n. 501 00:22:54,630 --> 00:22:56,190 Even better than that is log n. 502 00:22:56,190 --> 00:23:00,150 And best is so-called order of 1, like one step 503 00:23:00,150 --> 00:23:02,520 or maybe two steps, maybe even 1,000 steps, 504 00:23:02,520 --> 00:23:08,200 but a fixed, finite number of steps that never changes no matter how big n is. 505 00:23:08,200 --> 00:23:11,920 So given this chart, just to be clear, linear search-- 506 00:23:11,920 --> 00:23:13,570 let's consider the worst case. 507 00:23:13,570 --> 00:23:18,040 In the worst case, how many steps did it take someone like Stephanie 508 00:23:18,040 --> 00:23:23,500 to find the solution to the problem, assuming not seven doors but n doors? 509 00:23:23,500 --> 00:23:25,160 Yeah? 510 00:23:25,160 --> 00:23:26,540 So on the order of n. 511 00:23:26,540 --> 00:23:28,280 And in this case, it's exactly n. 512 00:23:28,280 --> 00:23:32,660 But you know what, maybe it's arguably 2n because it took Stephanie 513 00:23:32,660 --> 00:23:33,530 a couple of steps. 514 00:23:33,530 --> 00:23:34,460 She had to lift the latch. 515 00:23:34,460 --> 00:23:35,360 She had to open the door. 516 00:23:35,360 --> 00:23:36,318 Maybe it's three steps. 517 00:23:36,318 --> 00:23:37,530 She had to show the money. 518 00:23:37,530 --> 00:23:39,170 So now it's 3n, 2n. 519 00:23:39,170 --> 00:23:41,990 But we don't really care about that level of precision. 520 00:23:41,990 --> 00:23:45,660 We really just care about the fundamental number of operations. 521 00:23:45,660 --> 00:23:47,540 So we'll say yes, on the order of n. 522 00:23:47,540 --> 00:23:51,320 So that might be an upper bound, we'll call this, for linear search. 523 00:23:51,320 --> 00:23:53,030 And how about binary search? 524 00:23:53,030 --> 00:23:57,140 In Jackson's case, or in general, me in Week 0, if there's n doors, 525 00:23:57,140 --> 00:24:02,910 how many steps did it take Jackson or me using binary search? 526 00:24:02,910 --> 00:24:04,860 In this case, it was literally three. 527 00:24:04,860 --> 00:24:07,200 But that's not a formula. 528 00:24:07,200 --> 00:24:09,690 Yeah so it's on the order of log n. 529 00:24:09,690 --> 00:24:11,970 And indeed, if there are seven doors, well, that's 530 00:24:11,970 --> 00:24:14,250 almost eight, if you just do a little bit of rounding. 531 00:24:14,250 --> 00:24:18,480 And indeed, if you take log base 2 of 8, that does actually give us 3. 532 00:24:18,480 --> 00:24:19,813 So the math actually checks out. 533 00:24:19,813 --> 00:24:22,272 And if you're not comfortable with logarithms, no big deal. 534 00:24:22,272 --> 00:24:23,670 Just think about it intuitively. 535 00:24:23,670 --> 00:24:27,010 Logarithm of base 2 is just dividing something again and again. 536 00:24:27,010 --> 00:24:31,110 So on this chart, when we consider big O, which to be clear, 537 00:24:31,110 --> 00:24:35,100 allows you to describe the order of an algorithm's running time-- 538 00:24:35,100 --> 00:24:38,610 like the magnitude of it-- but it also describes, more specifically, 539 00:24:38,610 --> 00:24:40,090 an upper bound. 540 00:24:40,090 --> 00:24:42,990 So in the worst case, for instance, these 541 00:24:42,990 --> 00:24:45,480 are pretty good measures of how good-- 542 00:24:45,480 --> 00:24:47,310 or rather, of how bad-- 543 00:24:47,310 --> 00:24:49,270 linear search and binary search might be. 544 00:24:49,270 --> 00:24:49,770 Why? 545 00:24:49,770 --> 00:24:52,122 Well, suppose you're searching a 1,000-page phonebook 546 00:24:52,122 --> 00:24:55,080 and the person's name starts with Z. The algorithm is still going to be 547 00:24:55,080 --> 00:24:56,320 on the order of n steps. 548 00:24:56,320 --> 00:24:56,820 Why? 549 00:24:56,820 --> 00:25:01,080 Because it might take you as many as all n steps to find it. 550 00:25:01,080 --> 00:25:05,250 Now that's not necessarily going to be the case in practice. 551 00:25:05,250 --> 00:25:08,370 If I use big O as an upper bound, well, it 552 00:25:08,370 --> 00:25:11,160 would be nice if there's a corresponding lower bound, 553 00:25:11,160 --> 00:25:16,180 especially if you want to consider not just worst cases, but maybe best cases. 554 00:25:16,180 --> 00:25:18,040 So what might we use here? 555 00:25:18,040 --> 00:25:20,200 Well, this is a capital Greek omega symbol. 556 00:25:20,200 --> 00:25:23,550 So omega is the symbol that a computer scientist uses generally 557 00:25:23,550 --> 00:25:25,920 to describe a lower bound on an algorithm, 558 00:25:25,920 --> 00:25:28,710 often in the context of best case, though not necessarily. 559 00:25:28,710 --> 00:25:32,490 So a lower bound means how few steps might an algorithm take? 560 00:25:32,490 --> 00:25:33,990 And here, too, same formulas. 561 00:25:33,990 --> 00:25:36,270 And we'll fill in these blanks over time. 562 00:25:36,270 --> 00:25:40,140 Some algorithms might always take a minimum of n squared steps, 563 00:25:40,140 --> 00:25:41,370 or on the order of n steps. 564 00:25:41,370 --> 00:25:45,660 Some might only take n log n, or n, or log n, or 1. 565 00:25:45,660 --> 00:25:49,170 So something like a linear search-- 566 00:25:49,170 --> 00:25:51,240 when Stephanie started with linear search, 567 00:25:51,240 --> 00:25:52,980 she didn't get lucky this time on stage. 568 00:25:52,980 --> 00:25:57,720 But what if she had, and the first door she opened were 50? 569 00:25:57,720 --> 00:26:01,260 How might you then describe the lower bound on linear 570 00:26:01,260 --> 00:26:08,290 search in this so-called best case, using this list of possible answers? 571 00:26:08,290 --> 00:26:09,530 Yeah? 572 00:26:09,530 --> 00:26:11,060 Yeah, so omega of 1. 573 00:26:11,060 --> 00:26:14,900 So in the best case, the lower bound on how many steps 574 00:26:14,900 --> 00:26:18,990 it might take linear search to find something might just be one step. 575 00:26:18,990 --> 00:26:19,490 Why? 576 00:26:19,490 --> 00:26:21,500 Because maybe if Stephanie had gotten lucky 577 00:26:21,500 --> 00:26:25,960 and we had prefilled these lockers with the numbers in some other order 578 00:26:25,960 --> 00:26:28,460 such that she might have opened the first locker, and voila, 579 00:26:28,460 --> 00:26:31,280 the number 50 could have been there, so a lower bound arguably 580 00:26:31,280 --> 00:26:34,610 could indeed be omega of 1 for linear search. 581 00:26:34,610 --> 00:26:35,990 And how about now for Jackson? 582 00:26:35,990 --> 00:26:37,440 He used binary search. 583 00:26:37,440 --> 00:26:40,940 So he dived right into the middle of the problem. 584 00:26:40,940 --> 00:26:45,020 But what would be a lower bound on binary search using this logic? 585 00:26:45,020 --> 00:26:45,980 Yeah? 586 00:26:45,980 --> 00:26:47,460 Yeah, so again, omega of 1. 587 00:26:47,460 --> 00:26:47,960 Why? 588 00:26:47,960 --> 00:26:49,580 Because maybe he just gets lucky. 589 00:26:49,580 --> 00:26:53,300 And indeed, right in the middle of the lockers could have been the number 50. 590 00:26:53,300 --> 00:26:54,060 It wasn't. 591 00:26:54,060 --> 00:26:57,770 And so more germane in Jackson's actual practice 592 00:26:57,770 --> 00:27:00,050 would have been the big O discussion. 593 00:27:00,050 --> 00:27:03,350 But big O and omega, upper bound and lower bound, 594 00:27:03,350 --> 00:27:05,450 just allow a computer scientist to kind of wrestle 595 00:27:05,450 --> 00:27:06,980 with what could happen maybe in the worst 596 00:27:06,980 --> 00:27:08,670 case, what can happen in the best case? 597 00:27:08,670 --> 00:27:12,267 And you can even get even more precise like the average case or the like. 598 00:27:12,267 --> 00:27:15,350 And this is, indeed, what engineers might do at a whiteboard in a company, 599 00:27:15,350 --> 00:27:17,600 in a university when designing an algorithm 600 00:27:17,600 --> 00:27:21,380 and trying to make arguments as to why their algorithm is better than someone 601 00:27:21,380 --> 00:27:24,080 else's, by way of these kinds of analyses. 602 00:27:24,080 --> 00:27:29,180 And just so you've seen it, it turns out that if some algorithm happens 603 00:27:29,180 --> 00:27:32,990 to have an identical upper bound and lower bound, 604 00:27:32,990 --> 00:27:35,880 you can actually use a capital Greek theta as well. 605 00:27:35,880 --> 00:27:38,210 And this is the last of the Greek symbols today. 606 00:27:38,210 --> 00:27:41,900 But a Greek theta indicates a coincidence of both upper 607 00:27:41,900 --> 00:27:43,130 bound and lower bound. 608 00:27:43,130 --> 00:27:44,702 That is, they are one and the same. 609 00:27:44,702 --> 00:27:47,660 That was not the case for our discussion a second ago of linear search, 610 00:27:47,660 --> 00:27:49,220 not the case for binary search. 611 00:27:49,220 --> 00:27:52,970 But you could use the same kinds of formulas 612 00:27:52,970 --> 00:27:56,970 if it turns out that your upper bound and lower bound are the same. 613 00:27:56,970 --> 00:28:00,440 So for instance, if I were to count everyone literally in this room-- one, 614 00:28:00,440 --> 00:28:03,870 two, three, four, five, six and so forth-- 615 00:28:03,870 --> 00:28:09,080 you could actually say that counting in that way is in theta of n 616 00:28:09,080 --> 00:28:11,150 because in the best case, it's going to take 617 00:28:11,150 --> 00:28:13,468 me n points, people in the audience. 618 00:28:13,468 --> 00:28:15,260 In the worst case, it's going to take me n. 619 00:28:15,260 --> 00:28:18,380 It's always going to take me n steps if I want to count everyone in the room. 620 00:28:18,380 --> 00:28:20,930 You can't really do better than that unless you skip people. 621 00:28:20,930 --> 00:28:23,510 So that would be an example of the cuff of something 622 00:28:23,510 --> 00:28:26,150 where theta is instead germane. 623 00:28:26,150 --> 00:28:31,530 Are any questions now on big O, on omega, or theta, 624 00:28:31,530 --> 00:28:34,040 which are now just more formal tools in the toolkit 625 00:28:34,040 --> 00:28:38,730 for talking about the design of our algorithms? 626 00:28:38,730 --> 00:28:42,050 Any questions? 627 00:28:42,050 --> 00:28:42,860 No? 628 00:28:42,860 --> 00:28:44,720 Seeing none. 629 00:28:44,720 --> 00:28:45,560 Oh, is this-- yes? 630 00:28:45,560 --> 00:28:46,840 No? 631 00:28:46,840 --> 00:28:48,250 OK, so we're good. 632 00:28:48,250 --> 00:28:52,000 So let's go ahead and translate this, perhaps, to some actual code. 633 00:28:52,000 --> 00:28:53,900 Let me go over to VS Code here. 634 00:28:53,900 --> 00:28:57,100 And let's see if we can't now translate some of these ideas 635 00:28:57,100 --> 00:29:00,280 to some actual code, not so much using new syntax yet. 636 00:29:00,280 --> 00:29:03,320 We're going to still operate in this world of arrays like last week. 637 00:29:03,320 --> 00:29:05,237 So let me go ahead and create a program called 638 00:29:05,237 --> 00:29:09,280 search.c by executing code space search.c in my terminal. 639 00:29:09,280 --> 00:29:11,800 And then up here, let's go ahead and include our usual, 640 00:29:11,800 --> 00:29:14,740 so include cs50.h so I can get some input. 641 00:29:14,740 --> 00:29:18,370 Include standard io.h so I can print some output. 642 00:29:18,370 --> 00:29:21,910 We'll do int main void, the meaning of which we did start 643 00:29:21,910 --> 00:29:23,140 to tease apart last week. 644 00:29:23,140 --> 00:29:26,650 The fact that it's void again today just means no command line arguments. 645 00:29:26,650 --> 00:29:28,580 And let me go ahead and do this. 646 00:29:28,580 --> 00:29:33,130 Let me go ahead and declare, just for discussion's sake, a static array, 647 00:29:33,130 --> 00:29:34,840 like an array that never changes. 648 00:29:34,840 --> 00:29:39,280 And the syntax for this is going to be give me an array called numbers 649 00:29:39,280 --> 00:29:41,290 using the square bracket notation. 650 00:29:41,290 --> 00:29:43,390 And I'm going to immediately initialize it 651 00:29:43,390 --> 00:29:49,300 to 20, 500, 10, 5, 100, 1, and 50, reminiscent of those same denominations 652 00:29:49,300 --> 00:29:50,060 as before. 653 00:29:50,060 --> 00:29:54,080 So this is a slightly new syntax that we've perhaps not seen. 654 00:29:54,080 --> 00:29:57,130 And the curly braces here, which are different from for loops 655 00:29:57,130 --> 00:30:00,820 and while loops and functions, just tell the compiler please give me 656 00:30:00,820 --> 00:30:05,380 an array of whatever size this is containing those numbers left to right. 657 00:30:05,380 --> 00:30:10,220 I could alternatively use last week's syntax of saying something like this. 658 00:30:10,220 --> 00:30:13,090 Let's see, 1, 2, 3, 4, 5, 6, 7 denominations. 659 00:30:13,090 --> 00:30:15,250 I could alternatively do this. 660 00:30:15,250 --> 00:30:21,910 And then I could say numbers bracket 0 equals 661 00:30:21,910 --> 00:30:25,570 20, numbers bracket 1 equals 500. 662 00:30:25,570 --> 00:30:27,572 And I could do this five more times. 663 00:30:27,572 --> 00:30:28,780 That's just a little tedious. 664 00:30:28,780 --> 00:30:30,580 If you know the numbers in advance, you don't 665 00:30:30,580 --> 00:30:32,530 have to tell the compiler how many there are. 666 00:30:32,530 --> 00:30:37,420 You can just let it figure it out that your numbers will be 10, 500, 10, 5, 667 00:30:37,420 --> 00:30:39,430 100, 1, and 50. 668 00:30:39,430 --> 00:30:42,550 So this is how you statically define an array. 669 00:30:42,550 --> 00:30:45,380 All right, let me just go ahead and ask the user now for a number. 670 00:30:45,380 --> 00:30:48,920 We'll call it n by using get_int and prompting them for a number-- 671 00:30:48,920 --> 00:30:50,020 so nothing new there. 672 00:30:50,020 --> 00:30:53,680 And now let me go ahead and implement linear search. 673 00:30:53,680 --> 00:30:57,520 And the pseudocode we had for this before used some array-like notation. 674 00:30:57,520 --> 00:30:59,620 Let me go ahead, then, and start similarly. 675 00:30:59,620 --> 00:31:04,270 For int i-- and you almost always start counting at i by convention. 676 00:31:04,270 --> 00:31:06,490 So that's perhaps a good starting point. 677 00:31:06,490 --> 00:31:09,790 I'm going to do this so long as i is less than 7. 678 00:31:09,790 --> 00:31:12,138 Not the best design to hard code the 7, but this is just 679 00:31:12,138 --> 00:31:13,930 for demonstration's sake for now, because I 680 00:31:13,930 --> 00:31:15,560 know how many numbers I put in there. 681 00:31:15,560 --> 00:31:16,992 And then I'm going to i++. 682 00:31:16,992 --> 00:31:19,450 So now I have the beginnings of a loop that will just allow 683 00:31:19,450 --> 00:31:21,550 me to iterate over the entire array. 684 00:31:21,550 --> 00:31:22,760 And let me ask this. 685 00:31:22,760 --> 00:31:27,370 If the current number at location i equals 686 00:31:27,370 --> 00:31:30,650 equals n, which is the number the human typed in, 687 00:31:30,650 --> 00:31:33,400 then let's go ahead and do something simple like printf, 688 00:31:33,400 --> 00:31:36,190 quote unquote, found, backslash n. 689 00:31:36,190 --> 00:31:40,240 And then per our discussion last week, to indicate that this is successful, 690 00:31:40,240 --> 00:31:42,610 I'm going to return 0 if I found it. 691 00:31:42,610 --> 00:31:45,940 And if I don't find it, I'm just going to go down here and, by default, 692 00:31:45,940 --> 00:31:48,640 say not found, backslash n. 693 00:31:48,640 --> 00:31:52,610 And just for convention-- whoops, just for good measure, per convention, 694 00:31:52,610 --> 00:31:55,630 I'll return 1 or, really, any value other than 0. 695 00:31:55,630 --> 00:31:57,130 0, recall, means success. 696 00:31:57,130 --> 00:32:00,670 And any other integer tends to mean error of some sort, 697 00:32:00,670 --> 00:32:02,690 irrespective of the number I'm looking for. 698 00:32:02,690 --> 00:32:06,670 So just to revisit, the only thing that's new here is the syntax. 699 00:32:06,670 --> 00:32:09,980 We're creating an array of seven numbers, these numbers. 700 00:32:09,980 --> 00:32:13,540 And then after that we have really highlighted here 701 00:32:13,540 --> 00:32:16,090 an implementation of linear search. 702 00:32:16,090 --> 00:32:19,390 I mean this is the C version, I daresay, of what Stephanie did on the board, 703 00:32:19,390 --> 00:32:22,540 whereas now the array is called numbers instead of doors. 704 00:32:22,540 --> 00:32:25,460 But I think it's pretty much the same. 705 00:32:25,460 --> 00:32:30,380 Let me go ahead and open my terminal window and run make search. 706 00:32:30,380 --> 00:32:32,518 Seems to compile, ./search. 707 00:32:32,518 --> 00:32:34,310 And let's go ahead and search for a number. 708 00:32:34,310 --> 00:32:36,230 We'll start with what we did before, 50. 709 00:32:36,230 --> 00:32:37,340 And it's found. 710 00:32:37,340 --> 00:32:39,770 Let's go ahead and run it again, ./search. 711 00:32:39,770 --> 00:32:42,500 Let's search for maybe 20 at the beginning. 712 00:32:42,500 --> 00:32:43,670 That one, too, is found. 713 00:32:43,670 --> 00:32:47,150 Let's run it one more time searching for like 1,000, 714 00:32:47,150 --> 00:32:50,720 which is not among the denominations. 715 00:32:50,720 --> 00:32:52,980 And that one, indeed, is not found. 716 00:32:52,980 --> 00:32:56,210 So we've taken an idea from Week 0, now formalized in Week 3, 717 00:32:56,210 --> 00:32:59,300 and just translated it now to code. 718 00:32:59,300 --> 00:33:05,500 Questions on this implementation of linear search? 719 00:33:05,500 --> 00:33:07,570 Linear search. 720 00:33:07,570 --> 00:33:08,680 Nothing. 721 00:33:08,680 --> 00:33:11,810 Oh, so successful so far today. 722 00:33:11,810 --> 00:33:15,820 So let's see if we can't maybe make this a little more interesting 723 00:33:15,820 --> 00:33:19,270 and see if we can't trip over a detail that's going to be important in C. 724 00:33:19,270 --> 00:33:23,330 And instead of doing numbers, let me go ahead and do this. 725 00:33:23,330 --> 00:33:25,030 We'll stay on theme with Monopoly. 726 00:33:25,030 --> 00:33:26,830 And I went down the rabbit hole of reading the Wikipedia 727 00:33:26,830 --> 00:33:27,730 article on Monopoly. 728 00:33:27,730 --> 00:33:32,170 And the original pieces or tokens that came with Monopoly-- and it turns out 729 00:33:32,170 --> 00:33:33,740 we can represent those with strings. 730 00:33:33,740 --> 00:33:37,690 So I'm going to create an array called strings, plural, of whatever 731 00:33:37,690 --> 00:33:39,170 size I defined here. 732 00:33:39,170 --> 00:33:42,340 And the very first monopoly pieces back in the day 733 00:33:42,340 --> 00:33:44,920 were a battleship that you could play with, 734 00:33:44,920 --> 00:33:53,320 a boot, a cannon, an iron, a thimble, and a top hat, some of which 735 00:33:53,320 --> 00:33:54,700 you might from the game nowadays. 736 00:33:54,700 --> 00:33:56,410 Turns out they've been changing these-- 737 00:33:56,410 --> 00:33:57,890 had no idea-- over the years. 738 00:33:57,890 --> 00:34:00,170 So here is, now, an array of strings. 739 00:34:00,170 --> 00:34:03,940 Let me go ahead and prompt the user now not for an integer anymore. 740 00:34:03,940 --> 00:34:07,970 I want to now search for one of these strings still using linear search. 741 00:34:07,970 --> 00:34:11,020 So let me create a string s, set it equal to get_string, 742 00:34:11,020 --> 00:34:13,840 prompt the user for a string to search for. 743 00:34:13,840 --> 00:34:19,540 And then I think my code here is almost the same, except for one detail. 744 00:34:19,540 --> 00:34:21,850 I now have an array called strings. 745 00:34:21,850 --> 00:34:24,040 I now have a variable called s. 746 00:34:24,040 --> 00:34:27,790 But it turns out, for reasons we'll explore in more detail next week, 747 00:34:27,790 --> 00:34:31,030 this line of code is not going to work. 748 00:34:31,030 --> 00:34:34,900 And it turns out the reason has to do with what we discussed 749 00:34:34,900 --> 00:34:36,880 last week of what a string really is. 750 00:34:36,880 --> 00:34:39,355 And what is a string, again? 751 00:34:39,355 --> 00:34:41,000 A string is an array. 752 00:34:41,000 --> 00:34:44,929 And it turns out, though, that equals equals is not 753 00:34:44,929 --> 00:34:49,760 going to generously compare all of the characters in an array for you 754 00:34:49,760 --> 00:34:51,949 just because you use equal equals. 755 00:34:51,949 --> 00:34:54,650 It turns out it's not going to compare every letter. 756 00:34:54,650 --> 00:35:00,260 And so thankfully, there is, in the string library 757 00:35:00,260 --> 00:35:03,058 that we introduced last week, a solution to this problem. 758 00:35:03,058 --> 00:35:05,850 The reason for the problem, we'll explore in more detail next week. 759 00:35:05,850 --> 00:35:09,860 But for now, just know that when you want to compare strings in C-- 760 00:35:09,860 --> 00:35:13,220 especially if you've come into the class knowing a bit of Java or Python or some 761 00:35:13,220 --> 00:35:15,680 other language-- you cannot use equals equals. 762 00:35:15,680 --> 00:35:18,500 Even though you could in Scratch, you cannot in C. 763 00:35:18,500 --> 00:35:21,620 So what I have to actually do here is this. 764 00:35:21,620 --> 00:35:26,330 I have to ask the question, does the return value of a function called str 765 00:35:26,330 --> 00:35:33,160 compare, or strcomp, equal 0 when passed in the current string 766 00:35:33,160 --> 00:35:36,050 and that's user input? 767 00:35:36,050 --> 00:35:39,790 So if you read the documentation for this function called str compare, 768 00:35:39,790 --> 00:35:44,500 you'll see that it takes two strings as input, first one and second one. 769 00:35:44,500 --> 00:35:47,140 It then-- someone decades ago wrote the code 770 00:35:47,140 --> 00:35:49,060 that probably uses a for loop or a while loop 771 00:35:49,060 --> 00:35:51,910 to compare every character in each of those strings. 772 00:35:51,910 --> 00:35:56,290 And it turns out it returns 0 if they are, in fact, equal. 773 00:35:56,290 --> 00:36:00,640 Turns out, too, it will return a positive number or a negative number 774 00:36:00,640 --> 00:36:02,440 in other situations. 775 00:36:02,440 --> 00:36:04,990 Any intuition for why it might actually be 776 00:36:04,990 --> 00:36:10,810 useful to have a function that allows you to check if two strings are equal? 777 00:36:10,810 --> 00:36:13,480 If they're not equal, what else might be interesting to know 778 00:36:13,480 --> 00:36:14,830 when comparing two strings? 779 00:36:14,830 --> 00:36:18,474 780 00:36:18,474 --> 00:36:19,391 If certain values are? 781 00:36:19,391 --> 00:36:23,347 STUDENT: [INAUDIBLE] 782 00:36:23,347 --> 00:36:24,430 DAVID MALAN: OK, possibly. 783 00:36:24,430 --> 00:36:26,950 Maybe you want to just how similar they are. 784 00:36:26,950 --> 00:36:28,810 And that's indeed an algorithm unto itself. 785 00:36:28,810 --> 00:36:31,410 But str compare is a little simpler than that. 786 00:36:31,410 --> 00:36:33,040 STUDENT: [INAUDIBLE] 787 00:36:33,040 --> 00:36:35,850 788 00:36:35,850 --> 00:36:39,100 DAVID MALAN: Exactly, if you're trying to alphabetize a whole list of strings, 789 00:36:39,100 --> 00:36:41,950 just like your phone probably is for your contacts or address book. 790 00:36:41,950 --> 00:36:44,320 It turns out that str compare will actually 791 00:36:44,320 --> 00:36:48,220 return a positive number or a negative number or a 0 792 00:36:48,220 --> 00:36:52,120 based on whether, maybe it comes alphabetically first or later, 793 00:36:52,120 --> 00:36:53,800 or in fact, equal. 794 00:36:53,800 --> 00:36:55,130 So that can be a useful thing. 795 00:36:55,130 --> 00:36:57,838 And that's just a teaser for a lower level explanation 796 00:36:57,838 --> 00:36:58,880 that we'll see next week. 797 00:36:58,880 --> 00:37:01,750 So now, let me cross my fingers and see if I got this right. 798 00:37:01,750 --> 00:37:05,410 Let me go ahead and do make search. 799 00:37:05,410 --> 00:37:08,590 Did compile, albeit slowly. 800 00:37:08,590 --> 00:37:11,920 Dot slash search, and let's search for something like the thimble. 801 00:37:11,920 --> 00:37:14,048 And we see that that's, indeed, found. 802 00:37:14,048 --> 00:37:16,090 Otherwise, let's search for something that I know 803 00:37:16,090 --> 00:37:19,060 isn't there, like a race car, which was there when I grew up. 804 00:37:19,060 --> 00:37:23,227 But huh, segmentation fault, core dumped. 805 00:37:23,227 --> 00:37:25,810 And actually, some of you have tripped over this error before. 806 00:37:25,810 --> 00:37:27,220 Anyone want to admit seeing this? 807 00:37:27,220 --> 00:37:29,350 So yeah, not something we've talked about, 808 00:37:29,350 --> 00:37:32,170 and honestly, not something I intended just now. 809 00:37:32,170 --> 00:37:34,450 But that too, we'll see next week. 810 00:37:34,450 --> 00:37:39,920 Any intuition for why my program just broke. 811 00:37:39,920 --> 00:37:41,900 I didn't really change the logic. 812 00:37:41,900 --> 00:37:43,550 It's still linear search. 813 00:37:43,550 --> 00:37:46,280 Let me hide the terminal so you can see all of the code at once. 814 00:37:46,280 --> 00:37:49,850 The only thing I did was switched from integers to strings. 815 00:37:49,850 --> 00:37:52,310 And I switched to str compare here. 816 00:37:52,310 --> 00:37:54,205 But segmentation fault happened. 817 00:37:54,205 --> 00:37:57,080 And the teaser is that that somehow relates to the computer's memory. 818 00:37:57,080 --> 00:37:57,996 Yeah. 819 00:37:57,996 --> 00:38:00,690 STUDENT: [INAUDIBLE] 820 00:38:00,690 --> 00:38:01,470 821 00:38:01,470 --> 00:38:03,670 DAVID MALAN: Yeah, and this is subtle, but spot on. 822 00:38:03,670 --> 00:38:06,510 So one, two, three, four, five, six elements 823 00:38:06,510 --> 00:38:11,880 total in this array, versus the seven numbers of monopoly denominations 824 00:38:11,880 --> 00:38:12,810 that we had earlier. 825 00:38:12,810 --> 00:38:13,888 And this is where, see? 826 00:38:13,888 --> 00:38:15,930 Sort of case in point, this came back to bite me. 827 00:38:15,930 --> 00:38:19,860 The fact that I hardcoded this value as opposed to maybe separating it out 828 00:38:19,860 --> 00:38:22,260 as a constant or declaring it higher up, kind of 829 00:38:22,260 --> 00:38:26,610 bit me here, because now, I'm iterating over an array of size 6. 830 00:38:26,610 --> 00:38:29,820 But clearly, I'm going one step too far, because I'm literally 831 00:38:29,820 --> 00:38:32,250 going to iterate seven times, not six. 832 00:38:32,250 --> 00:38:35,580 So it's as though I'm looking at memory that's over here. 833 00:38:35,580 --> 00:38:37,530 And indeed, next week, we'll focus on memory. 834 00:38:37,530 --> 00:38:38,860 And that's just a bad thing. 835 00:38:38,860 --> 00:38:42,270 So odds are, not even seeing your code from this past week, if any of you 836 00:38:42,270 --> 00:38:46,020 have had segmentation faults, odds are, you touched memory 837 00:38:46,020 --> 00:38:47,280 that you shouldn't have. 838 00:38:47,280 --> 00:38:49,290 You maybe looped too many times. 839 00:38:49,290 --> 00:38:52,770 You might have used a negative number to get into your array. 840 00:38:52,770 --> 00:38:55,220 In general, you touched memory that you shouldn't have. 841 00:38:55,220 --> 00:38:57,720 And you touched a segment of memory that you shouldn't have. 842 00:38:57,720 --> 00:39:00,060 The fix, though, at least in my case, is simple. 843 00:39:00,060 --> 00:39:01,300 Just don't do that. 844 00:39:01,300 --> 00:39:03,210 So let me go ahead and recompile this. 845 00:39:03,210 --> 00:39:06,870 Make search dot slash search. 846 00:39:06,870 --> 00:39:10,320 And I'll search again for race car, Enter. 847 00:39:10,320 --> 00:39:11,850 And now it does not crash. 848 00:39:11,850 --> 00:39:13,630 But it does tell me it's not found. 849 00:39:13,630 --> 00:39:17,040 So subtle, but something you might yourself have tripped over already. 850 00:39:17,040 --> 00:39:23,190 Questions then, on what I just did, intentionally or otherwise. 851 00:39:23,190 --> 00:39:24,423 Yeah, in front. 852 00:39:24,423 --> 00:39:29,060 STUDENT: One thing is the program still works if you do return-- 853 00:39:29,060 --> 00:39:31,275 if you don't do return 0, return 1. 854 00:39:31,275 --> 00:39:33,220 So what is the purpose of doing [INAUDIBLE]?? 855 00:39:33,220 --> 00:39:34,720 DAVID MALAN: A really good question. 856 00:39:34,720 --> 00:39:38,920 So the program will still work even if I don't return 0 or return 1. 857 00:39:38,920 --> 00:39:43,120 In fact, let me go ahead and do that and just hide my terminal window 858 00:39:43,120 --> 00:39:43,930 for a second. 859 00:39:43,930 --> 00:39:46,210 Let's get rid of the return here. 860 00:39:46,210 --> 00:39:48,040 Let's get rid of the return here. 861 00:39:48,040 --> 00:39:50,810 However, watch what happens here. 862 00:39:50,810 --> 00:39:53,710 Let me go ahead and recompile this, make search. 863 00:39:53,710 --> 00:39:55,610 Let me scroll up in my code here. 864 00:39:55,610 --> 00:39:57,560 Let me go ahead and do dot slash search. 865 00:39:57,560 --> 00:40:00,280 And let me go ahead and search for the first thing in the list, 866 00:40:00,280 --> 00:40:02,800 battle ship, so I know that this should be found. 867 00:40:02,800 --> 00:40:04,690 I hit Enter. 868 00:40:04,690 --> 00:40:05,858 Huh, interesting. 869 00:40:05,858 --> 00:40:07,150 So it's saying found not found. 870 00:40:07,150 --> 00:40:11,496 But do you see why, logically, in this case? 871 00:40:11,496 --> 00:40:12,980 STUDENT: Is the loop still running? 872 00:40:12,980 --> 00:40:13,910 DAVID MALAN: Exactly. 873 00:40:13,910 --> 00:40:15,302 So the loop is still running. 874 00:40:15,302 --> 00:40:17,010 So there's a couple of solutions to this. 875 00:40:17,010 --> 00:40:21,080 I could, for instance, somehow break out of the code here. 876 00:40:21,080 --> 00:40:24,200 But that's going to still result in line 18 executing. 877 00:40:24,200 --> 00:40:26,600 I could then instead just return here. 878 00:40:26,600 --> 00:40:29,390 I don't strictly need to return 1 down at the bottom. 879 00:40:29,390 --> 00:40:31,580 But I made this claim last week that it tends 880 00:40:31,580 --> 00:40:35,600 to be helpful as your programs get more sophisticated, to at least signify, 881 00:40:35,600 --> 00:40:39,300 just like a real world programmer, error codes when something goes wrong. 882 00:40:39,300 --> 00:40:44,090 So returning 0 in main is the easiest way to signify my code is done. 883 00:40:44,090 --> 00:40:46,340 I'm ready to exit successfully, that's it. 884 00:40:46,340 --> 00:40:48,988 But down here, I could absolutely still return 0, 885 00:40:48,988 --> 00:40:50,280 because that's not a huge deal. 886 00:40:50,280 --> 00:40:53,870 It's not really an error that deserves annoying the user with some kind of pop 887 00:40:53,870 --> 00:40:55,200 up that something went wrong. 888 00:40:55,200 --> 00:40:57,830 But return 1 is just a lower level way of signaling, 889 00:40:57,830 --> 00:41:00,330 eh, it didn't really find what I was looking for. 890 00:41:00,330 --> 00:41:03,510 And remember from last week, you can see this as follows. 891 00:41:03,510 --> 00:41:08,060 If I recompile this again, now that I've reverted those changes, so make search. 892 00:41:08,060 --> 00:41:13,100 And if I do a dot slash search and search for battle ship, 893 00:41:13,100 --> 00:41:16,700 which is indeed found, recall I can execute this magical command, 894 00:41:16,700 --> 00:41:19,790 echo dollar sign question mark, which you're not going to often execute. 895 00:41:19,790 --> 00:41:22,790 But it shows you what main returned. 896 00:41:22,790 --> 00:41:27,770 If I run search again and search for race car, which is not found, 897 00:41:27,770 --> 00:41:30,530 I see not found, but I can also run this command again 898 00:41:30,530 --> 00:41:32,150 and see that, oh, it returned 1. 899 00:41:32,150 --> 00:41:35,300 So now if you fast forward a few months, a few years, when you're actually 900 00:41:35,300 --> 00:41:37,850 writing code in a company or for larger projects, 901 00:41:37,850 --> 00:41:40,100 you might want to be automating software. 902 00:41:40,100 --> 00:41:43,100 You might not want the human to necessarily be running it manually. 903 00:41:43,100 --> 00:41:47,240 You might want code to be automated by some nightly process 904 00:41:47,240 --> 00:41:48,360 or something like that. 905 00:41:48,360 --> 00:41:53,030 Using these exit codes, can a program determine yes or no 906 00:41:53,030 --> 00:41:55,910 that other code succeeded or failed. 907 00:41:55,910 --> 00:42:01,850 Other questions on linear search in this way. 908 00:42:01,850 --> 00:42:02,350 No? 909 00:42:02,350 --> 00:42:07,570 All right, well, let's translate this to one other feature of C 910 00:42:07,570 --> 00:42:11,590 here by incorporating these two ideas now into one other program. 911 00:42:11,590 --> 00:42:16,605 So I'm going to create a phone book in C by doing code space phonebook dot C. 912 00:42:16,605 --> 00:42:18,730 And let's combine some of these ideas and implement 913 00:42:18,730 --> 00:42:21,700 this notion of searching a phonebook for an actual name 914 00:42:21,700 --> 00:42:23,030 and getting back a number. 915 00:42:23,030 --> 00:42:26,500 So I'm going to go ahead and quickly include some of the same things, cs50.h 916 00:42:26,500 --> 00:42:28,060 so we can get input. 917 00:42:28,060 --> 00:42:30,860 standard io dot h so we can print output. 918 00:42:30,860 --> 00:42:33,310 And I'm going to preemptively include string.h 919 00:42:33,310 --> 00:42:35,320 in case we need that one as well. 920 00:42:35,320 --> 00:42:39,010 int main void, no need for command line arguments today. 921 00:42:39,010 --> 00:42:42,650 And let me give myself, now, an array of names for this phone book. 922 00:42:42,650 --> 00:42:45,040 So string names equals. 923 00:42:45,040 --> 00:42:47,620 And then in curly braces, how about Carter 924 00:42:47,620 --> 00:42:50,840 will be one person in the phone book, and David, myself, will be the other. 925 00:42:50,840 --> 00:42:53,465 So we'll keep it short so we don't have to type too many names. 926 00:42:53,465 --> 00:42:55,840 But this is a phone book with two people thus far. 927 00:42:55,840 --> 00:42:59,628 Suppose, now, we want to also store Carter's phone number in mind. 928 00:42:59,628 --> 00:43:01,420 So it's not just saying found or not found. 929 00:43:01,420 --> 00:43:05,320 It's literally looking up our phone numbers like a proper phone book. 930 00:43:05,320 --> 00:43:09,440 Well, at the moment, there's really no way to do this. 931 00:43:09,440 --> 00:43:15,460 I could do something hackish like I could put a number like 617-495-1000 932 00:43:15,460 --> 00:43:16,510 after Carter. 933 00:43:16,510 --> 00:43:22,460 I could maybe do something like 949-468-2750 after me. 934 00:43:22,460 --> 00:43:25,300 But now you're kind of doing the whole apples and oranges thing. 935 00:43:25,300 --> 00:43:26,470 Now, it's not strings. 936 00:43:26,470 --> 00:43:28,420 It's a string int, string int. 937 00:43:28,420 --> 00:43:31,240 All right, so maybe I could just make all of these strings. 938 00:43:31,240 --> 00:43:34,600 But now it's just a conceptual mixing of apples and oranges. 939 00:43:34,600 --> 00:43:36,425 Like yes, that's an array of four strings. 940 00:43:36,425 --> 00:43:39,550 But now you're on the honor system to know that the first string is a name, 941 00:43:39,550 --> 00:43:42,160 the second string is a number, the third string is-- 942 00:43:42,160 --> 00:43:43,100 you can do it. 943 00:43:43,100 --> 00:43:45,110 But it's a bit of a hack, so to speak. 944 00:43:45,110 --> 00:43:47,300 So what might be cleaner than this? 945 00:43:47,300 --> 00:43:51,070 Instead of combining our phone numbers into the same array as our names, 946 00:43:51,070 --> 00:43:55,480 what else might we do that's perhaps a little better? 947 00:43:55,480 --> 00:43:56,440 Say it little louder. 948 00:43:56,440 --> 00:43:58,960 949 00:43:58,960 --> 00:44:01,197 A 2D array, possibly something we could do. 950 00:44:01,197 --> 00:44:04,030 I'm going to keep it even simpler now, because we haven't used those 951 00:44:04,030 --> 00:44:07,823 by name, even though that is, we saw last week, technically what argv is. 952 00:44:07,823 --> 00:44:10,240 What else could I do if I want to store names and numbers? 953 00:44:10,240 --> 00:44:11,147 Yeah. 954 00:44:11,147 --> 00:44:12,220 STUDENT: [INAUDIBLE] 955 00:44:12,220 --> 00:44:13,690 DAVID MALAN: Yeah, let me go with this suggestion. 956 00:44:13,690 --> 00:44:14,607 It's a little simpler. 957 00:44:14,607 --> 00:44:17,440 Rather than complicate things in literally different dimensions, 958 00:44:17,440 --> 00:44:18,970 let me go ahead and do string. 959 00:44:18,970 --> 00:44:21,730 Well, I could do int numbers. 960 00:44:21,730 --> 00:44:22,690 But you know what? 961 00:44:22,690 --> 00:44:27,700 So that we can support punctuation like dashes or even parentheses or country 962 00:44:27,700 --> 00:44:29,200 codes, I'm going to do this instead. 963 00:44:29,200 --> 00:44:33,760 I'm going to do string numbers so that I can represent Carter's number as quote 964 00:44:33,760 --> 00:44:39,040 unquote plus 1 for the US, 617-495-1000, complete with hyphens, 965 00:44:39,040 --> 00:44:40,390 as is US convention. 966 00:44:40,390 --> 00:44:47,930 And then for mine I'll go ahead and do +1-949-468-2750 semicolon. 967 00:44:47,930 --> 00:44:51,610 And now down below, let's actually enable the user to search this phone 968 00:44:51,610 --> 00:44:53,860 book, just like in week 0 we did. 969 00:44:53,860 --> 00:44:55,960 String name equals get string. 970 00:44:55,960 --> 00:44:58,960 And let's ask the user for a name, presumably David or Carter 971 00:44:58,960 --> 00:44:59,990 or someone else. 972 00:44:59,990 --> 00:45:01,850 And now let's re-implement linear search. 973 00:45:01,850 --> 00:45:05,920 So 4, int i get 0. i is less than 2. 974 00:45:05,920 --> 00:45:07,510 And do as I say, not as I do. 975 00:45:07,510 --> 00:45:09,700 I think we should beware this hard coding, 976 00:45:09,700 --> 00:45:12,010 but we'll keep it simple for now. 977 00:45:12,010 --> 00:45:13,220 i++. 978 00:45:13,220 --> 00:45:16,390 And then in this for loop, I think we have all of the ingredients 979 00:45:16,390 --> 00:45:17,150 to solve this. 980 00:45:17,150 --> 00:45:23,440 So if the return value of str compare of all of the names bracket 981 00:45:23,440 --> 00:45:28,810 i comparing against the name that the human typed in, if all of that equals 982 00:45:28,810 --> 00:45:33,380 equals 0, that is, all of the characters in those two strings are equal, 983 00:45:33,380 --> 00:45:36,770 then I think we can go ahead and say found, just like last time. 984 00:45:36,770 --> 00:45:37,520 But you know what? 985 00:45:37,520 --> 00:45:40,130 Let's actually print Carter's or my phone number. 986 00:45:40,130 --> 00:45:44,770 So found percent s, and we'll plug in numbers, bracket i. 987 00:45:44,770 --> 00:45:47,800 And then just for consistency, I'll return 0 here. 988 00:45:47,800 --> 00:45:52,210 And down here, how about I'll say something like printf not 989 00:45:52,210 --> 00:45:53,600 found, just to be clear. 990 00:45:53,600 --> 00:45:56,240 And then I'll return 1 as well. 991 00:45:56,240 --> 00:45:58,120 So just to recap, here's all of the code. 992 00:45:58,120 --> 00:46:01,610 It's almost the same as before, except now it's useful. 993 00:46:01,610 --> 00:46:03,460 I'm not just saying found or not found. 994 00:46:03,460 --> 00:46:07,180 I found a number in monopoly, or I found a piece in monopoly. 995 00:46:07,180 --> 00:46:09,880 I'm looking up in one array, one of the strings. 996 00:46:09,880 --> 00:46:12,730 And then I'm printing from the other array, the answer. 997 00:46:12,730 --> 00:46:19,480 So let me go ahead here and run the compiler, make phone book, Enter. 998 00:46:19,480 --> 00:46:21,070 OK, that's promising, no errors. 999 00:46:21,070 --> 00:46:22,720 Dot slash phonebook now. 1000 00:46:22,720 --> 00:46:26,350 And let's search, for instance, Carter Enter. 1001 00:46:26,350 --> 00:46:28,060 All right, so we found Carter's number. 1002 00:46:28,060 --> 00:46:29,393 All right, let me do that again. 1003 00:46:29,393 --> 00:46:30,960 Phone book, let's search for David. 1004 00:46:30,960 --> 00:46:32,960 All right, we seem to have found David's number. 1005 00:46:32,960 --> 00:46:34,502 All right, let's do it one last time. 1006 00:46:34,502 --> 00:46:35,410 Phone book, Enter. 1007 00:46:35,410 --> 00:46:37,360 And now we'll search for John Harvard. 1008 00:46:37,360 --> 00:46:40,060 Enter, not found. 1009 00:46:40,060 --> 00:46:45,520 All right, so I daresay, albeit with minimal testing, this code is correct. 1010 00:46:45,520 --> 00:46:48,190 Would anyone now like to critique the design? 1011 00:46:48,190 --> 00:46:51,910 Does something rub you the wrong way, perhaps, about this approach here? 1012 00:46:51,910 --> 00:46:55,120 1013 00:46:55,120 --> 00:46:58,180 And as always, think about how, if the program maybe 1014 00:46:58,180 --> 00:47:01,510 gets longer, more complicated, how decisions like this might unfold. 1015 00:47:01,510 --> 00:47:02,448 Yeah. 1016 00:47:02,448 --> 00:47:04,400 STUDENT: If i is less than 2. 1017 00:47:04,400 --> 00:47:07,820 DAVID MALAN: OK, so if i is less than 2, so technically, 1018 00:47:07,820 --> 00:47:10,080 if I change the number of people in this phone book, 1019 00:47:10,080 --> 00:47:11,330 I'm going to have to update i. 1020 00:47:11,330 --> 00:47:13,290 And we've already seen that I get myself into trouble. 1021 00:47:13,290 --> 00:47:14,165 So that's bad design. 1022 00:47:14,165 --> 00:47:15,005 Good. 1023 00:47:15,005 --> 00:47:17,795 STUDENT: Say you add someone's name to the phonebook, 1024 00:47:17,795 --> 00:47:20,710 but you don't have the corresponding number. 1025 00:47:20,710 --> 00:47:23,380 So then when you go to pull their number, 1026 00:47:23,380 --> 00:47:24,730 it [INAUDIBLE] someone's number. 1027 00:47:24,730 --> 00:47:25,480 DAVID MALAN: Yeah. 1028 00:47:25,480 --> 00:47:28,180 So again, I'm sort of trusting myself not to screw up. 1029 00:47:28,180 --> 00:47:30,850 If I add John or anyone else to the first array 1030 00:47:30,850 --> 00:47:34,180 but I forget to add their number to the second array, 1031 00:47:34,180 --> 00:47:36,640 eventually things are going to drift and be inconsistent. 1032 00:47:36,640 --> 00:47:39,010 And then code will be incorrect at that point. 1033 00:47:39,010 --> 00:47:43,420 So sort of a poor design setting me up for future failure, if you will. 1034 00:47:43,420 --> 00:47:44,860 Other thoughts? 1035 00:47:44,860 --> 00:47:45,460 Yeah. 1036 00:47:45,460 --> 00:47:49,816 STUDENT: [INAUDIBLE] so if you were to switch the order of the numbers 1037 00:47:49,816 --> 00:47:52,848 but not the main [INAUDIBLE] 1038 00:47:52,848 --> 00:47:54,140 DAVID MALAN: Yeah, really good. 1039 00:47:54,140 --> 00:47:55,550 We're assuming the same order. 1040 00:47:55,550 --> 00:47:59,452 From left to right, the names go, and from left to right, the numbers go. 1041 00:47:59,452 --> 00:48:01,160 But that's kind of just the honor system. 1042 00:48:01,160 --> 00:48:03,440 Like, there's literally nothing in code preventing me 1043 00:48:03,440 --> 00:48:07,047 from reversing the order for whatever reason, or maybe sorting the names. 1044 00:48:07,047 --> 00:48:10,130 Like, they're sorted now, and maybe that's deliberate, but maybe it's not. 1045 00:48:10,130 --> 00:48:12,920 So this honor system here, too, is just not good. 1046 00:48:12,920 --> 00:48:17,180 I could put a comment in here to remind myself, note to self, 1047 00:48:17,180 --> 00:48:19,490 always update arrays the same way. 1048 00:48:19,490 --> 00:48:21,600 But like, something's going to happen eventually, 1049 00:48:21,600 --> 00:48:26,090 especially when we have not two, but three, but 30, 300 names and numbers. 1050 00:48:26,090 --> 00:48:29,670 It would be nice to keep all of the related data together. 1051 00:48:29,670 --> 00:48:32,810 And so in fact, the one new feature of C we'll introduce today 1052 00:48:32,810 --> 00:48:37,970 is one that actually allows us to implement our very own data structures. 1053 00:48:37,970 --> 00:48:41,870 You can think of arrays as a very lightweight data 1054 00:48:41,870 --> 00:48:44,870 structure, in that it allows you to cluster related data back 1055 00:48:44,870 --> 00:48:45,930 to back to back to back. 1056 00:48:45,930 --> 00:48:48,170 And this is how strings are implemented. 1057 00:48:48,170 --> 00:48:51,560 They are a data structure effectively implemented with an array. 1058 00:48:51,560 --> 00:48:54,290 But with C and with other languages, it turns out 1059 00:48:54,290 --> 00:48:56,600 you can invent your own data types, whether they're 1060 00:48:56,600 --> 00:48:59,870 one dimensional, two dimensional even, or beyond. 1061 00:48:59,870 --> 00:49:05,960 And with C, can you specifically create your own types 1062 00:49:05,960 --> 00:49:07,200 that have their own names? 1063 00:49:07,200 --> 00:49:10,670 So for instance, wouldn't it have been nice if C came with, 1064 00:49:10,670 --> 00:49:16,380 not just char and int and floats and long and others. 1065 00:49:16,380 --> 00:49:19,970 Wouldn't it be nice if C came with a data type called person? 1066 00:49:19,970 --> 00:49:22,790 And ideally, a person would have a name and a number. 1067 00:49:22,790 --> 00:49:24,860 Now, that's a little naive and unrealistic. 1068 00:49:24,860 --> 00:49:28,460 Like, why would they define a person to have just those two fields. 1069 00:49:28,460 --> 00:49:30,950 Certainly, people could have disagreed what a person is. 1070 00:49:30,950 --> 00:49:32,300 So they leave it to us. 1071 00:49:32,300 --> 00:49:35,900 The authors of C gave us all of these primitives, ints and floats and strings 1072 00:49:35,900 --> 00:49:36,810 and so forth. 1073 00:49:36,810 --> 00:49:39,810 But it's up to us now to use those in a more interesting way 1074 00:49:39,810 --> 00:49:44,940 so that we can create an array of person variables, if you will, 1075 00:49:44,940 --> 00:49:48,150 inside of an array called people, just to pluralize it here. 1076 00:49:48,150 --> 00:49:49,740 So how are we going to do this? 1077 00:49:49,740 --> 00:49:54,140 Well, for now, let's just stipulate that a person in the world 1078 00:49:54,140 --> 00:49:57,260 will have a name and a number that we could argue all day long what else 1079 00:49:57,260 --> 00:49:58,010 a person should have. 1080 00:49:58,010 --> 00:49:58,677 And that's fine. 1081 00:49:58,677 --> 00:50:01,790 You can invent your own person eventually. 1082 00:50:01,790 --> 00:50:04,610 At the moment, I'm using just two variables 1083 00:50:04,610 --> 00:50:06,500 to define a person's name and number. 1084 00:50:06,500 --> 00:50:10,490 But wouldn't it be nice to encapsulate, that is, combine these two data 1085 00:50:10,490 --> 00:50:14,660 types, into a new and improved data type called person. 1086 00:50:14,660 --> 00:50:17,360 And the syntax for that is going to be this. 1087 00:50:17,360 --> 00:50:18,800 So it's a bit of a mouthful. 1088 00:50:18,800 --> 00:50:21,960 But you can, perhaps, infer what some of this is doing here. 1089 00:50:21,960 --> 00:50:24,500 So it turns out C has a keyword called typedef. 1090 00:50:24,500 --> 00:50:28,310 As the name kind of suggests, this allows you to define your own type. 1091 00:50:28,310 --> 00:50:31,550 Struct is an indication that it's a structure. 1092 00:50:31,550 --> 00:50:35,060 It's like a structure that has multiple values inside of it 1093 00:50:35,060 --> 00:50:36,710 that you are trying to define. 1094 00:50:36,710 --> 00:50:39,440 And then at the very bottom here outside of the curly braces, 1095 00:50:39,440 --> 00:50:42,270 is the name of the type that you want to create. 1096 00:50:42,270 --> 00:50:45,790 So you don't have discretion over using typedef or struct 1097 00:50:45,790 --> 00:50:46,790 in this particular case. 1098 00:50:46,790 --> 00:50:48,665 But you can name the thing whatever you want. 1099 00:50:48,665 --> 00:50:52,590 And you can put anything in the structure that you want as well. 1100 00:50:52,590 --> 00:50:56,510 And as soon as the semicolon is executed at the bottom of the code, 1101 00:50:56,510 --> 00:50:59,180 every line thereafter can now have access 1102 00:50:59,180 --> 00:51:05,760 to a person data type, whether as a single variable, or as an entire array. 1103 00:51:05,760 --> 00:51:10,260 So if I want to build on this then, let me go ahead and do this. 1104 00:51:10,260 --> 00:51:12,230 Let me go back to my C code here. 1105 00:51:12,230 --> 00:51:17,610 And I'm going to go ahead and change just a couple of things. 1106 00:51:17,610 --> 00:51:19,110 Let's go ahead and do this. 1107 00:51:19,110 --> 00:51:23,240 I'm going to go ahead and, first, get rid of those two hardcoded arrays. 1108 00:51:23,240 --> 00:51:27,180 And let me go ahead and, at the top of my file, 1109 00:51:27,180 --> 00:51:30,180 invent this type, so typedef struct. 1110 00:51:30,180 --> 00:51:34,470 Inside of it will be a string name and then a string number. 1111 00:51:34,470 --> 00:51:36,780 And then the name of the structure will be person. 1112 00:51:36,780 --> 00:51:40,025 And best practice would have me define it at the very top of my file 1113 00:51:40,025 --> 00:51:42,150 so that any of my functions, in fact, could use it, 1114 00:51:42,150 --> 00:51:44,530 even though I just have main in this case. 1115 00:51:44,530 --> 00:51:47,100 Now, if I wanted, I could do this. 1116 00:51:47,100 --> 00:51:50,370 Person P1 and person P2. 1117 00:51:50,370 --> 00:51:53,040 But we know from last week, that already is bad design. 1118 00:51:53,040 --> 00:51:57,400 If you want to have multiple instances of the same type of variable, 1119 00:51:57,400 --> 00:52:00,044 we should probably use what instead? 1120 00:52:00,044 --> 00:52:01,046 STUDENT: [INAUDIBLE] 1121 00:52:01,046 --> 00:52:01,796 DAVID MALAN: And-- 1122 00:52:01,796 --> 00:52:02,470 STUDENT: An array. 1123 00:52:02,470 --> 00:52:03,637 DAVID MALAN: Yeah, an array. 1124 00:52:03,637 --> 00:52:05,230 So let me not even go down that road. 1125 00:52:05,230 --> 00:52:06,700 Let me instead just do this. 1126 00:52:06,700 --> 00:52:09,727 Person will be the type of the array. 1127 00:52:09,727 --> 00:52:10,810 But I'm going to call it-- 1128 00:52:10,810 --> 00:52:11,980 I could call it persons. 1129 00:52:11,980 --> 00:52:13,720 But in English, we typically say people. 1130 00:52:13,720 --> 00:52:15,190 So I'll call the array people. 1131 00:52:15,190 --> 00:52:18,110 And I want two people to exist in this array, 1132 00:52:18,110 --> 00:52:20,920 though I could certainly change that number to be anything I want. 1133 00:52:20,920 --> 00:52:24,820 How, now, do you put a name inside of a person 1134 00:52:24,820 --> 00:52:27,190 and then put the number inside of that same person? 1135 00:52:27,190 --> 00:52:28,990 Well, slightly new syntax today. 1136 00:52:28,990 --> 00:52:30,520 I'm going to go ahead and say this. 1137 00:52:30,520 --> 00:52:34,420 People bracket 0 just gives me the first person in the array. 1138 00:52:34,420 --> 00:52:35,570 That's not new. 1139 00:52:35,570 --> 00:52:40,840 But if you want to go inside of that person in memory, you use a dot. 1140 00:52:40,840 --> 00:52:44,870 And then you just specify the name of the attribute therein. 1141 00:52:44,870 --> 00:52:47,410 So if I want to set the first person's name to Carter, 1142 00:52:47,410 --> 00:52:49,480 I just use that so-called dot notation. 1143 00:52:49,480 --> 00:52:52,780 And then if I want to set Carter's number using dot notation, 1144 00:52:52,780 --> 00:52:56,680 I would do this, +1-617-495-1000. 1145 00:52:56,680 --> 00:52:58,880 And then if I want to do the same for myself, 1146 00:52:58,880 --> 00:53:03,730 I would now do people bracket 1 dot name equals quote unquote David. 1147 00:53:03,730 --> 00:53:08,440 And then people bracket 1 still dot number equals quote unquote 1148 00:53:08,440 --> 00:53:13,030 +1-949-468-2750. 1149 00:53:13,030 --> 00:53:15,970 And now, at the bottom of my file, I think 1150 00:53:15,970 --> 00:53:18,610 my logic can pretty much stay the same. 1151 00:53:18,610 --> 00:53:22,180 I can still, on this line here, prompt the user 1152 00:53:22,180 --> 00:53:24,370 for the name of the person they want to look up. 1153 00:53:24,370 --> 00:53:26,620 For now, even though I admit it's not the best design, 1154 00:53:26,620 --> 00:53:28,495 I'm just doing this for demonstration's sake, 1155 00:53:28,495 --> 00:53:31,360 I'm going to leave the two there, because I know I have two people. 1156 00:53:31,360 --> 00:53:34,100 But down here, this is going to have to change. 1157 00:53:34,100 --> 00:53:37,000 I don't want to compare names bracket i anymore. 1158 00:53:37,000 --> 00:53:42,190 What do I want to type here as the first argument to str compare? 1159 00:53:42,190 --> 00:53:43,900 What do I want to do here? 1160 00:53:43,900 --> 00:53:44,960 Yeah. 1161 00:53:44,960 --> 00:53:46,800 STUDENT: People i dot name. 1162 00:53:46,800 --> 00:53:49,140 DAVID MALAN: So people i dot name, yeah. 1163 00:53:49,140 --> 00:53:52,938 So I want to go into the people array at the ith location, 1164 00:53:52,938 --> 00:53:54,480 because that's what my loop is doing. 1165 00:53:54,480 --> 00:53:55,890 It's updating i again and again. 1166 00:53:55,890 --> 00:53:58,087 And then look at name, and that's good. 1167 00:53:58,087 --> 00:53:59,670 I think now I need to change this too. 1168 00:53:59,670 --> 00:54:01,890 What do I want to print if the person is found? 1169 00:54:01,890 --> 00:54:02,445 Someone else? 1170 00:54:02,445 --> 00:54:05,070 1171 00:54:05,070 --> 00:54:08,850 What do I want to print here, if I found the person's name? 1172 00:54:08,850 --> 00:54:09,360 Yeah. 1173 00:54:09,360 --> 00:54:10,890 STUDENT: [INAUDIBLE] 1174 00:54:10,890 --> 00:54:12,390 DAVID MALAN: Say it a little louder. 1175 00:54:12,390 --> 00:54:13,795 STUDENT: People i dot number. 1176 00:54:13,795 --> 00:54:14,670 DAVID MALAN: Perfect. 1177 00:54:14,670 --> 00:54:17,460 So people bracket i dot number, if indeed I 1178 00:54:17,460 --> 00:54:20,310 want to print the corresponding number to this person. 1179 00:54:20,310 --> 00:54:22,930 And then I think the rest of my code can stay the same. 1180 00:54:22,930 --> 00:54:27,150 So let me go ahead and rerun make phone book to recompile this version. 1181 00:54:27,150 --> 00:54:28,170 So far so good. 1182 00:54:28,170 --> 00:54:29,400 Dot slash phone book. 1183 00:54:29,400 --> 00:54:31,598 Let's go ahead and type in Carter's name, found. 1184 00:54:31,598 --> 00:54:33,390 All right, let's go ahead and run it again. 1185 00:54:33,390 --> 00:54:35,273 David's name, found. 1186 00:54:35,273 --> 00:54:36,940 Let's go ahead and run it one more time. 1187 00:54:36,940 --> 00:54:40,260 Type in John Harvard, for instance, not found, in this case. 1188 00:54:40,260 --> 00:54:43,710 So fundamentally, the code isn't all that different. 1189 00:54:43,710 --> 00:54:46,090 Linear search is still behaving the same way. 1190 00:54:46,090 --> 00:54:48,690 And I admit, this is kind of ugly looking. 1191 00:54:48,690 --> 00:54:52,350 We've kind of made a two line solution five lines of code now. 1192 00:54:52,350 --> 00:54:56,640 But if we fast forward a week or two when we start saving information 1193 00:54:56,640 --> 00:55:01,500 to files, we'll introduce you to files like csv files, 1194 00:55:01,500 --> 00:55:03,892 comma separated values, or spreadsheet files, which 1195 00:55:03,892 --> 00:55:06,600 you've surely opened on your Mac or PC at some point in the past. 1196 00:55:06,600 --> 00:55:09,840 Suffice it to say we'll soon learn techniques for storing information, 1197 00:55:09,840 --> 00:55:11,790 like names and numbers, in files. 1198 00:55:11,790 --> 00:55:13,680 And at that point, we're not going to do any 1199 00:55:13,680 --> 00:55:15,990 of this hackish sort of hard coding of the number 2 1200 00:55:15,990 --> 00:55:19,080 and manually typing my name and Carter's name and number into our program. 1201 00:55:19,080 --> 00:55:21,750 We'll read the information dynamically from a file. 1202 00:55:21,750 --> 00:55:25,180 And in a few weeks, we'll read it dynamically from a database instead. 1203 00:55:25,180 --> 00:55:30,240 But this is, for now, just syntactically how we can create an array of size 2 1204 00:55:30,240 --> 00:55:32,190 containing one person each. 1205 00:55:32,190 --> 00:55:34,643 We can update the name and number of the first person, 1206 00:55:34,643 --> 00:55:36,810 update the name and the number of the second person, 1207 00:55:36,810 --> 00:55:40,500 and then later search across those names and print out 1208 00:55:40,500 --> 00:55:41,610 the corresponding numbers. 1209 00:55:41,610 --> 00:55:44,220 And in this sense, this is a better design. 1210 00:55:44,220 --> 00:55:44,730 Why? 1211 00:55:44,730 --> 00:55:48,690 Because my person data type encapsulates, 1212 00:55:48,690 --> 00:55:53,400 now, everything that it means to be a person, at least in this narrow world. 1213 00:55:53,400 --> 00:55:57,580 And if I want to add something to the notion of a person, for instance, 1214 00:55:57,580 --> 00:55:59,640 I could go up to my type def, and tomorrow, 1215 00:55:59,640 --> 00:56:03,743 add an address to every person and start reading that in as well. 1216 00:56:03,743 --> 00:56:05,160 And now it's not the honor system. 1217 00:56:05,160 --> 00:56:10,260 It's not a names array, a numbers array, an addresses array, and everything else 1218 00:56:10,260 --> 00:56:12,210 you might imagine related to a person. 1219 00:56:12,210 --> 00:56:17,223 It's all encapsulated, which is a term of art inside of the same type. 1220 00:56:17,223 --> 00:56:19,890 Reminiscent, if some of you have programmed before, of something 1221 00:56:19,890 --> 00:56:21,660 called object-oriented programming. 1222 00:56:21,660 --> 00:56:23,190 But we're not there yet. 1223 00:56:23,190 --> 00:56:24,690 C is not that. 1224 00:56:24,690 --> 00:56:29,280 Questions on this use of struct or this new syntax, 1225 00:56:29,280 --> 00:56:35,037 the dot operator being really the juicy part here. 1226 00:56:35,037 --> 00:56:35,620 Any questions? 1227 00:56:35,620 --> 00:56:36,522 Yeah. 1228 00:56:36,522 --> 00:56:39,414 STUDENT: [INAUDIBLE] 1229 00:56:39,414 --> 00:56:42,800 1230 00:56:42,800 --> 00:56:44,420 DAVID MALAN: On what line number? 1231 00:56:44,420 --> 00:56:46,063 STUDENT: 16. 1232 00:56:46,063 --> 00:56:46,730 DAVID MALAN: 16? 1233 00:56:46,730 --> 00:56:51,230 So yes, so syntactically, we introduced the square brackets last week. 1234 00:56:51,230 --> 00:56:55,310 So doing people bracket 0 just means go to the first person in the array. 1235 00:56:55,310 --> 00:56:58,400 That was like when Stephanie literally opened this door. 1236 00:56:58,400 --> 00:56:59,990 That's doors bracket 0. 1237 00:56:59,990 --> 00:57:02,330 But this is, of course, people bracket 0 instead. 1238 00:57:02,330 --> 00:57:04,580 Today, the dot is a new piece of syntax. 1239 00:57:04,580 --> 00:57:11,000 It means go inside of that person in memory and look at the name they're in 1240 00:57:11,000 --> 00:57:13,297 and set it equal to Carter and do the same for number. 1241 00:57:13,297 --> 00:57:13,880 So that's all. 1242 00:57:13,880 --> 00:57:16,340 It's like, open the locker door, go inside of it, 1243 00:57:16,340 --> 00:57:18,410 and check or set the name and the number. 1244 00:57:18,410 --> 00:57:19,040 Yeah. 1245 00:57:19,040 --> 00:57:29,280 STUDENT: [INAUDIBLE] can you set default values for each of the [INAUDIBLE]?? 1246 00:57:29,280 --> 00:57:30,840 DAVID MALAN: Attributes is fine. 1247 00:57:30,840 --> 00:57:31,530 Good question. 1248 00:57:31,530 --> 00:57:34,050 In the struct, can you set default values? 1249 00:57:34,050 --> 00:57:35,100 Short answer, no. 1250 00:57:35,100 --> 00:57:39,000 And this is where C becomes less feature-able than more modern languages 1251 00:57:39,000 --> 00:57:42,580 like Python and Java and others, where you can, in fact, do that. 1252 00:57:42,580 --> 00:57:44,890 So when we transition to Python in a few weeks' time, 1253 00:57:44,890 --> 00:57:47,140 we'll see how we can start solving problems like that. 1254 00:57:47,140 --> 00:57:50,100 But for now, it's up to you to initialize name and number 1255 00:57:50,100 --> 00:57:51,450 to something. 1256 00:57:51,450 --> 00:57:52,832 Yeah. 1257 00:57:52,832 --> 00:57:55,540 STUDENT: [INAUDIBLE] 1258 00:57:55,540 --> 00:58:04,123 1259 00:58:04,123 --> 00:58:05,540 DAVID MALAN: Really good question. 1260 00:58:05,540 --> 00:58:08,470 How can we adjust or critique the design of what I'm doing? 1261 00:58:08,470 --> 00:58:12,190 This is one of the few situations where I would say, hypocritically, 1262 00:58:12,190 --> 00:58:13,780 do as I say, not as I do. 1263 00:58:13,780 --> 00:58:17,710 I am using pretty ugly lines like this, just to introduce the syntax. 1264 00:58:17,710 --> 00:58:20,428 But my claim, pedagogically today, is that eventually, 1265 00:58:20,428 --> 00:58:22,720 when we start storing names and numbers or other things 1266 00:58:22,720 --> 00:58:26,230 in files or in databases, you won't have this redundancy. 1267 00:58:26,230 --> 00:58:29,080 You'll have one line of code or two lines of code that 1268 00:58:29,080 --> 00:58:31,660 read the information from the file or database 1269 00:58:31,660 --> 00:58:34,630 and then fill the entire array with that data. 1270 00:58:34,630 --> 00:58:37,330 For now, I'm just doing it manually so as to keep our focus only 1271 00:58:37,330 --> 00:58:39,400 on the new syntax, but that's it. 1272 00:58:39,400 --> 00:58:42,640 So forgive the bad design by design today. 1273 00:58:42,640 --> 00:58:45,740 Other questions on this? 1274 00:58:45,740 --> 00:58:47,595 All right, that's been a lot already. 1275 00:58:47,595 --> 00:58:50,470 Why don't we go ahead and take our 10 minute break with snacks first. 1276 00:58:50,470 --> 00:58:53,020 We have some delightful brownies in the lobby. 1277 00:58:53,020 --> 00:58:55,900 All right, we are back. 1278 00:58:55,900 --> 00:58:59,890 And up until now, it clearly seems to be a good thing if your data is sorted, 1279 00:58:59,890 --> 00:59:02,350 because you can use binary search. 1280 00:59:02,350 --> 00:59:05,540 You know a little something more about the data. 1281 00:59:05,540 --> 00:59:07,570 But it turns out that sorting in and of itself 1282 00:59:07,570 --> 00:59:10,420 is kind of a problem to solve too. 1283 00:59:10,420 --> 00:59:14,830 And you might think, well, if sorting is going 1284 00:59:14,830 --> 00:59:18,070 to be pretty fast, we absolutely should do it before we start searching, 1285 00:59:18,070 --> 00:59:20,237 because that will just speed up all of our searches. 1286 00:59:20,237 --> 00:59:22,960 But if sorting is slow, that kind of invites the question, well, 1287 00:59:22,960 --> 00:59:25,420 should we bother sorting our data if we're only 1288 00:59:25,420 --> 00:59:28,090 going to search the data maybe once, maybe twice? 1289 00:59:28,090 --> 00:59:30,550 And so here is going to be, potentially, a trade off. 1290 00:59:30,550 --> 00:59:33,250 So let's consider what it means really to sort data. 1291 00:59:33,250 --> 00:59:35,950 In our case, it's just going to be simple and use numbers. 1292 00:59:35,950 --> 00:59:38,230 But it might, in the case of the Googles of the world, 1293 00:59:38,230 --> 00:59:40,880 be actual web pages or persons or the like. 1294 00:59:40,880 --> 00:59:46,090 So here is our typical picture for sorting, for solving any problem. 1295 00:59:46,090 --> 00:59:48,190 Input at left and output at right. 1296 00:59:48,190 --> 00:59:54,340 The input to our sort problem is going to be some unsorted set of values. 1297 00:59:54,340 --> 00:59:57,940 And the output, ideally, will be the same set of values sorted. 1298 00:59:57,940 --> 01:00:00,550 And if we do this concretely, let's suppose 1299 01:00:00,550 --> 01:00:04,786 that we want to go about sorting this list of numbers, 7, 2, 5, 4, 1, 6, 0, 1300 01:00:04,786 --> 01:00:05,860 3. 1301 01:00:05,860 --> 01:00:07,810 So it's all of the numbers from 0 to 7. 1302 01:00:07,810 --> 01:00:09,757 But they're somehow jumbled up randomly. 1303 01:00:09,757 --> 01:00:11,590 That's going to be the input to the problem. 1304 01:00:11,590 --> 01:00:13,840 And the goal is now to sort those so that you, indeed, 1305 01:00:13,840 --> 01:00:17,990 get out 0, 1, 2, 3, 4, 5, 6, 7 instead. 1306 01:00:17,990 --> 01:00:20,440 So it turns out there's lots of different ways 1307 01:00:20,440 --> 01:00:23,900 we can actually sort numbers like these here. 1308 01:00:23,900 --> 01:00:27,430 And in fact, just to complement our search example earlier, 1309 01:00:27,430 --> 01:00:29,952 could we perhaps quickly get some eight volunteers 1310 01:00:29,952 --> 01:00:32,410 to come up if you're comfortable appearing on the internet? 1311 01:00:32,410 --> 01:00:39,100 If you want to do 1, 2, 3, 4, 5, 6, 7, 8, how about? 1312 01:00:39,100 --> 01:00:40,255 All right, come on down. 1313 01:00:40,255 --> 01:00:45,040 1314 01:00:45,040 --> 01:00:47,970 All right. 1315 01:00:47,970 --> 01:00:50,560 Come on over here, and I'll give you each a number. 1316 01:00:50,560 --> 01:00:54,240 And if you want to start to organize yourselves in the same order 1317 01:00:54,240 --> 01:00:58,390 you see the numbers on the board. 1318 01:00:58,390 --> 01:01:01,530 So look up on the overhead and organize yourselves from left 1319 01:01:01,530 --> 01:01:04,460 to right in that same order. 1320 01:01:04,460 --> 01:01:06,210 And let's have the first of you-- perfect. 1321 01:01:06,210 --> 01:01:10,420 If you want to come right over here, how about right in line with this? 1322 01:01:10,420 --> 01:01:13,990 All right, and a few more numbers. 1323 01:01:13,990 --> 01:01:14,980 All right. 1324 01:01:14,980 --> 01:01:19,810 Number 2, 6, and perfect. 1325 01:01:19,810 --> 01:01:21,625 Just the right number, all right. 1326 01:01:21,625 --> 01:01:22,858 Uh oh. 1327 01:01:22,858 --> 01:01:24,400 All right, there we go, number three. 1328 01:01:24,400 --> 01:01:24,968 All right. 1329 01:01:24,968 --> 01:01:26,260 So let's just do a quick check. 1330 01:01:26,260 --> 01:01:30,867 We have 7, 2, 5, 4, 1, 6, 0, 3, very good so far. 1331 01:01:30,867 --> 01:01:32,950 Do you want to just scootch a little this way just 1332 01:01:32,950 --> 01:01:34,510 to make a little more room? 1333 01:01:34,510 --> 01:01:38,090 All right, and let's consider now who we have here on stage. 1334 01:01:38,090 --> 01:01:40,780 You want to each say a quick hello to the audience? 1335 01:01:40,780 --> 01:01:42,070 RYAN: Hi, my name is Ryan. 1336 01:01:42,070 --> 01:01:45,597 I'm a first year from Pennypacker. 1337 01:01:45,597 --> 01:01:46,930 ITSELLE: Hi, my name is Itselle. 1338 01:01:46,930 --> 01:01:49,177 I'm a first year at Strauss. 1339 01:01:49,177 --> 01:01:50,260 LUCY: Hi, my name is Lucy. 1340 01:01:50,260 --> 01:01:52,400 And I'm a first year from Greenough. 1341 01:01:52,400 --> 01:01:53,650 SHILOH: Hi, my name is Shiloh. 1342 01:01:53,650 --> 01:01:55,927 I'm a first year in Wigglesworth. 1343 01:01:55,927 --> 01:01:57,010 JACK: Hi, my name is Jack. 1344 01:01:57,010 --> 01:01:59,877 And I'm a first year in Strauss. 1345 01:01:59,877 --> 01:02:01,210 KATHRYN: Hi, my name is Kathryn. 1346 01:02:01,210 --> 01:02:02,787 I'm a first year at Strauss. 1347 01:02:02,787 --> 01:02:04,120 MICHAEL: Hi, my name is Michael. 1348 01:02:04,120 --> 01:02:06,063 I'm a first year at Pennypacker. 1349 01:02:06,063 --> 01:02:07,480 MUHAMMAD: Hi, my name is Muhammad. 1350 01:02:07,480 --> 01:02:09,047 I'm a first year in Matthews. 1351 01:02:09,047 --> 01:02:10,630 DAVID MALAN: Hi, nice, welcome aboard. 1352 01:02:10,630 --> 01:02:11,240 All right. 1353 01:02:11,240 --> 01:02:16,510 So let's consider, now, how we might go about sorting our kind volunteers here, 1354 01:02:16,510 --> 01:02:21,160 the goal being to get them into order from smallest to largest so that, 1355 01:02:21,160 --> 01:02:24,160 presumably then, we can use something smarter than just linear search. 1356 01:02:24,160 --> 01:02:26,350 We can actually use binary search, assuming 1357 01:02:26,350 --> 01:02:27,878 that they are already then sorted. 1358 01:02:27,878 --> 01:02:30,670 So let me propose that we first consider an algorithm that actually 1359 01:02:30,670 --> 01:02:32,600 has a name called selection sort. 1360 01:02:32,600 --> 01:02:36,220 And selection sort is going to be one that literally has me, 1361 01:02:36,220 --> 01:02:40,060 or really you, as the programmer, selecting the smallest element again 1362 01:02:40,060 --> 01:02:43,610 and again, and then putting them into the appropriate place. 1363 01:02:43,610 --> 01:02:47,115 So let me go ahead and start this here, starting with the number 7. 1364 01:02:47,115 --> 01:02:49,240 At the moment, 7 is the smallest number I've found. 1365 01:02:49,240 --> 01:02:52,610 So I'm going to make mental note of that with a mental variable, if you will. 1366 01:02:52,610 --> 01:02:53,710 I'm going to move on now. 1367 01:02:53,710 --> 01:02:55,460 Number 2 is obviously smaller, so I'm just 1368 01:02:55,460 --> 01:02:58,360 going to update my mental reminder that 2 is now the smallest, 1369 01:02:58,360 --> 01:03:01,555 effectively forgetting, for now, number 7. 1370 01:03:01,555 --> 01:03:02,440 5, not smaller. 1371 01:03:02,440 --> 01:03:03,370 4, not smaller. 1372 01:03:03,370 --> 01:03:04,170 1, smaller. 1373 01:03:04,170 --> 01:03:05,920 And I'm going to make mental note of that. 1374 01:03:05,920 --> 01:03:07,030 6, not smaller. 1375 01:03:07,030 --> 01:03:08,200 0, even smaller. 1376 01:03:08,200 --> 01:03:11,140 I'll make mental note of that, having forgotten now everything else. 1377 01:03:11,140 --> 01:03:13,180 And now number 3 is not smaller. 1378 01:03:13,180 --> 01:03:14,290 So what's your name again? 1379 01:03:14,290 --> 01:03:14,630 MICHAEL: Michael. 1380 01:03:14,630 --> 01:03:16,240 DAVID MALAN: So Michael is number 0. 1381 01:03:16,240 --> 01:03:18,310 He belongs, of course, way down there. 1382 01:03:18,310 --> 01:03:20,740 But unfortunately-- you are-- 1383 01:03:20,740 --> 01:03:21,550 RYAN: Ryan. 1384 01:03:21,550 --> 01:03:23,360 DAVID MALAN: Ryan is in the way. 1385 01:03:23,360 --> 01:03:24,580 So what should we do? 1386 01:03:24,580 --> 01:03:27,570 How should we start to sort this list? 1387 01:03:27,570 --> 01:03:30,510 Where should number 0 go? 1388 01:03:30,510 --> 01:03:31,012 Yeah. 1389 01:03:31,012 --> 01:03:32,220 Do you want to say it louder? 1390 01:03:32,220 --> 01:03:34,545 STUDENT: I will swap, I think. 1391 01:03:34,545 --> 01:03:36,670 DAVID MALAN: Yeah, so let's just go ahead and swap. 1392 01:03:36,670 --> 01:03:39,190 So if you want to go ahead and 0, go on where 7 is. 1393 01:03:39,190 --> 01:03:41,170 We need to make room for number 7. 1394 01:03:41,170 --> 01:03:44,350 It would kind of be cheating if maybe everyone kind of politely 1395 01:03:44,350 --> 01:03:45,530 stepped over to the side. 1396 01:03:45,530 --> 01:03:46,030 Why? 1397 01:03:46,030 --> 01:03:49,000 Because if we imagine all of our volunteers here to be an array, like, 1398 01:03:49,000 --> 01:03:52,390 that's a crazy amount of work to have every element in the array 1399 01:03:52,390 --> 01:03:54,190 shift to the left just to make room. 1400 01:03:54,190 --> 01:03:57,340 So we're going to keep it simple and just evict whoever's there now. 1401 01:03:57,340 --> 01:04:00,880 Now, maybe we get lucky, and number 7 is actually closer to its destination. 1402 01:04:00,880 --> 01:04:03,250 Maybe we get unlucky, and it goes farther away. 1403 01:04:03,250 --> 01:04:05,260 But we've at least solved one problem. 1404 01:04:05,260 --> 01:04:08,140 If we had n problems at first, now we have n minus 1, 1405 01:04:08,140 --> 01:04:10,280 because number 0 is indeed in the right place. 1406 01:04:10,280 --> 01:04:14,588 So if I continue to act this out, let me go ahead and say 2, currently 1407 01:04:14,588 --> 01:04:15,130 the smallest. 1408 01:04:15,130 --> 01:04:18,040 5, no, 4, no, 1 currently the smallest. 1409 01:04:18,040 --> 01:04:19,000 I'll make mental note. 1410 01:04:19,000 --> 01:04:22,690 6, 7, 3, and now let me pause. 1411 01:04:22,690 --> 01:04:26,000 1 is obviously the now smallest element. 1412 01:04:26,000 --> 01:04:27,760 So did I need to keep going? 1413 01:04:27,760 --> 01:04:30,550 Well, turns out, at least as I've defined selection sort, 1414 01:04:30,550 --> 01:04:33,130 I do need to keep going, because I only claim 1415 01:04:33,130 --> 01:04:36,550 that I'm using one variable in my mind to remember the then smallest element. 1416 01:04:36,550 --> 01:04:39,970 I'm not smart enough like us humans to remember, wait a minute, 1417 01:04:39,970 --> 01:04:41,590 1 is definitely the smallest now. 1418 01:04:41,590 --> 01:04:43,190 I don't have that whole recollection. 1419 01:04:43,190 --> 01:04:45,590 So I just am keeping track of the now smallest. 1420 01:04:45,590 --> 01:04:46,910 So number 1, your name was? 1421 01:04:46,910 --> 01:04:47,410 JACK: Jack. 1422 01:04:47,410 --> 01:04:49,300 DAVID MALAN: Jack, where should Jack go? 1423 01:04:49,300 --> 01:04:50,380 Probably there. 1424 01:04:50,380 --> 01:04:51,670 And what's your name? 1425 01:04:51,670 --> 01:04:51,880 ITSELLE: Itselle. 1426 01:04:51,880 --> 01:04:54,588 DAVID MALAN: OK, so Jack and Itselle, if you want to swap places, 1427 01:04:54,588 --> 01:04:57,430 we've now solved two of the n total problems. 1428 01:04:57,430 --> 01:04:58,990 And now we'll do it a little faster. 1429 01:04:58,990 --> 01:05:03,880 If each of you want to start to swap as I find the right person, so 5 smallest, 1430 01:05:03,880 --> 01:05:06,340 4 smaller, 2 is smaller. 1431 01:05:06,340 --> 01:05:07,750 Got to keep checking. 1432 01:05:07,750 --> 01:05:09,572 OK, 2 was smaller. 1433 01:05:09,572 --> 01:05:11,780 All right, now I'm going to go back to the beginning. 1434 01:05:11,780 --> 01:05:13,090 All right, 4 is small. 1435 01:05:13,090 --> 01:05:14,050 5 is not. 1436 01:05:14,050 --> 01:05:14,740 6 is not. 1437 01:05:14,740 --> 01:05:16,120 7-- oh, 3 is small. 1438 01:05:16,120 --> 01:05:17,770 Where do you want to go? 1439 01:05:17,770 --> 01:05:18,670 OK, good. 1440 01:05:18,670 --> 01:05:19,810 I'm going to go back here. 1441 01:05:19,810 --> 01:05:21,060 And I can be a little smart. 1442 01:05:21,060 --> 01:05:22,810 I don't have to go all the way to the end, 1443 01:05:22,810 --> 01:05:24,950 because I know these folks are already sorted. 1444 01:05:24,950 --> 01:05:26,630 So I can at least optimize slightly. 1445 01:05:26,630 --> 01:05:27,970 So now 5 is small. 1446 01:05:27,970 --> 01:05:28,720 6 is small. 1447 01:05:28,720 --> 01:05:30,160 7 is 4, 4 is smaller. 1448 01:05:30,160 --> 01:05:33,080 If you want to go in place there. 1449 01:05:33,080 --> 01:05:34,810 And now, here things get interesting. 1450 01:05:34,810 --> 01:05:37,180 I can optimize by not looking at these folks 1451 01:05:37,180 --> 01:05:39,340 anymore, because they're obviously problem solved. 1452 01:05:39,340 --> 01:05:42,970 But now 5 is small, 6 is not, 7 is not. 1453 01:05:42,970 --> 01:05:45,010 OK, 5, you can stay where you are. 1454 01:05:45,010 --> 01:05:46,870 Now, a human in the room is obviously going 1455 01:05:46,870 --> 01:05:49,420 to question why I'm wasting any more time. 1456 01:05:49,420 --> 01:05:52,090 But with selection sort, as I've defined it thus far, 1457 01:05:52,090 --> 01:05:55,840 I still have to, now, check 6 is smallest, not 7. 1458 01:05:55,840 --> 01:05:58,520 And now my final step, OK, they're all in place. 1459 01:05:58,520 --> 01:06:00,760 So here, too, is this dichotomy between what we all 1460 01:06:00,760 --> 01:06:03,730 have is this bird's eye view of the whole problem, where it's obvious 1461 01:06:03,730 --> 01:06:05,060 where everyone needs to go. 1462 01:06:05,060 --> 01:06:09,137 But a computer implementing this with an array really has to be more methodical. 1463 01:06:09,137 --> 01:06:10,720 And we're actually saving a step here. 1464 01:06:10,720 --> 01:06:13,780 If we were really doing this, none of these numbers would be visible. 1465 01:06:13,780 --> 01:06:16,840 All eight of our volunteers would be inside of a locked door. 1466 01:06:16,840 --> 01:06:19,220 And only then could we see them one at a time. 1467 01:06:19,220 --> 01:06:21,670 But we're focusing now just on the sorting aspect. 1468 01:06:21,670 --> 01:06:24,640 So let me just, before we do one other demonstration here, 1469 01:06:24,640 --> 01:06:29,620 propose that what I really just did here in pseudocode was something like this. 1470 01:06:29,620 --> 01:06:33,430 For i from 0 to n minus 1, keeping in mind 1471 01:06:33,430 --> 01:06:35,590 that 0 is always the left of the array. 1472 01:06:35,590 --> 01:06:38,110 n minus 1 is always the right end of the array. 1473 01:06:38,110 --> 01:06:43,000 For i from 0 to n minus 1, I found the smallest number between numbers bracket 1474 01:06:43,000 --> 01:06:45,730 i and numbers bracket n minus 1. 1475 01:06:45,730 --> 01:06:48,610 And that's the very geeky way of expressing this optimization. 1476 01:06:48,610 --> 01:06:51,490 I'm always starting from numbers bracket i wherever I am. 1477 01:06:51,490 --> 01:06:53,200 And then everything else to the right. 1478 01:06:53,200 --> 01:06:56,890 And that's what was allowing me to ignore the already sorted volunteers. 1479 01:06:56,890 --> 01:07:01,220 If, though, my last line says swap smallest number with numbers i, 1480 01:07:01,220 --> 01:07:03,220 think that implements what our humans were doing 1481 01:07:03,220 --> 01:07:05,470 by physically walking to another spot. 1482 01:07:05,470 --> 01:07:09,220 All right, so that, then, would be what we'll call selection sort. 1483 01:07:09,220 --> 01:07:11,620 Let's go ahead and take a second approach here 1484 01:07:11,620 --> 01:07:13,360 using an algorithm that I'm going to call bubble sort. 1485 01:07:13,360 --> 01:07:16,090 But to do this, we need you all to reset to your original locations. 1486 01:07:16,090 --> 01:07:17,320 We have a little cheat sheet on the board 1487 01:07:17,320 --> 01:07:19,750 if you'd like to go back to this position here. 1488 01:07:19,750 --> 01:07:22,600 And let me take a fundamentally different approach, because I'm not 1489 01:07:22,600 --> 01:07:24,850 really liking selection sort as is, because it's kind 1490 01:07:24,850 --> 01:07:26,780 of a lot of walking back and forth. 1491 01:07:26,780 --> 01:07:30,620 And the lot of walking suggests a lot of, lot of steps again and again. 1492 01:07:30,620 --> 01:07:32,090 So what might I do instead? 1493 01:07:32,090 --> 01:07:34,840 Well, bubble sort is going to have me focus a little more 1494 01:07:34,840 --> 01:07:36,730 intuitively on just smaller problems. 1495 01:07:36,730 --> 01:07:38,605 And let's see if this gets me somewhere else. 1496 01:07:38,605 --> 01:07:42,430 So if I just look at this list without looking at everyone else, 7 and 2, 1497 01:07:42,430 --> 01:07:43,670 this is obviously a problem. 1498 01:07:43,670 --> 01:07:44,170 Why? 1499 01:07:44,170 --> 01:07:45,500 Because you're out of order. 1500 01:07:45,500 --> 01:07:47,810 So let's just solve one tiny problem first. 1501 01:07:47,810 --> 01:07:49,570 So 7 and 2, why don't you swap? 1502 01:07:49,570 --> 01:07:54,160 I know 2 is in a better place now, because she's definitely less than 7. 1503 01:07:54,160 --> 01:07:55,540 So I think I can now move on. 1504 01:07:55,540 --> 01:07:57,350 7 and 5, problem. 1505 01:07:57,350 --> 01:07:58,390 So let's solve that. 1506 01:07:58,390 --> 01:07:59,830 7 and 4, problem. 1507 01:07:59,830 --> 01:08:02,380 Let's solve that, 7 and 1, let's solve that. 1508 01:08:02,380 --> 01:08:03,970 7 and 6, let's solve that. 1509 01:08:03,970 --> 01:08:05,080 7 and 0, solve that. 1510 01:08:05,080 --> 01:08:06,550 7 and 3, solve that. 1511 01:08:06,550 --> 01:08:07,330 OK, done. 1512 01:08:07,330 --> 01:08:09,130 Sorted, right? 1513 01:08:09,130 --> 01:08:11,780 Or obviously not, if you just glance at these numbers here. 1514 01:08:11,780 --> 01:08:14,530 But we have, fundamentally, taken a bite out of the problem. 1515 01:08:14,530 --> 01:08:17,020 7 is indeed in the right place. 1516 01:08:17,020 --> 01:08:21,170 So we maximally have n minus 1 other problems to solve. 1517 01:08:21,170 --> 01:08:23,660 So how do I do this? 1518 01:08:23,660 --> 01:08:25,700 I think I can just repeat the same logic. 1519 01:08:25,700 --> 01:08:26,770 Let me go over here. 1520 01:08:26,770 --> 01:08:28,210 2 and 5, good. 1521 01:08:28,210 --> 01:08:29,800 5 and 4, no. 1522 01:08:29,800 --> 01:08:31,330 5 and 1, no. 1523 01:08:31,330 --> 01:08:32,590 5 and 6, yes. 1524 01:08:32,590 --> 01:08:34,660 6 and 0, no. 1525 01:08:34,660 --> 01:08:36,760 6 and 3, no. 1526 01:08:36,760 --> 01:08:39,191 So now we've solved two of the problems. 1527 01:08:39,191 --> 01:08:41,649 And what's nice about bubble sort, at least as this glance, 1528 01:08:41,649 --> 01:08:42,707 it's nice and simple. 1529 01:08:42,707 --> 01:08:43,540 It's nice and local. 1530 01:08:43,540 --> 01:08:46,510 And you just keep incrementally solving more and more problems. 1531 01:08:46,510 --> 01:08:48,010 So let's go ahead and do this again. 1532 01:08:48,010 --> 01:08:50,080 And I'll do it-- we can do it faster. 1533 01:08:50,080 --> 01:08:51,760 2 and 4, we know are good. 1534 01:08:51,760 --> 01:08:59,200 4 and 1, 4 and 5, 5 and 0, 5 and 3, 5 and 6, 6 and 7, good. 1535 01:08:59,200 --> 01:09:01,390 So we go back, 2 and 1. 1536 01:09:01,390 --> 01:09:03,340 Ah, now another problem solve. 1537 01:09:03,340 --> 01:09:09,939 2, and 4, 4 and 0, 4 and 3, 4 and 5, 5 and 6, 6 and 7. 1538 01:09:09,939 --> 01:09:13,270 And so notice 2, as per its name, the largest elements 1539 01:09:13,270 --> 01:09:14,895 have bubbled their way up to the top. 1540 01:09:14,895 --> 01:09:17,020 And that's what seems to be happening just as we're 1541 01:09:17,020 --> 01:09:18,340 fixing some remaining problems. 1542 01:09:18,340 --> 01:09:19,120 So almost done. 1543 01:09:19,120 --> 01:09:27,550 1 and 2, 2 and 0, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, almost done. 1544 01:09:27,550 --> 01:09:29,830 Obviously, to us humans, it looks done. 1545 01:09:29,830 --> 01:09:32,529 How do I know as the computer for sure? 1546 01:09:32,529 --> 01:09:36,370 What would be the most surefire way for me to now go, it's not done, sorry. 1547 01:09:36,370 --> 01:09:38,080 That's a bug. 1548 01:09:38,080 --> 01:09:43,390 OK, 1 and 0, 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7. 1549 01:09:43,390 --> 01:09:47,899 OK, so now it's obviously sorted to the rest of us on stage. 1550 01:09:47,899 --> 01:09:50,290 How could I confirm as much as code? 1551 01:09:50,290 --> 01:09:52,670 You're doing it with your mind, just glancing at this. 1552 01:09:52,670 --> 01:09:56,080 How would the computer, the code, know for sure that this list is now sorted? 1553 01:09:56,080 --> 01:09:57,000 Yeah. 1554 01:09:57,000 --> 01:09:58,500 STUDENT: [INAUDIBLE] one more time. 1555 01:09:58,500 --> 01:10:00,000 DAVID MALAN: Let's do one more time. 1556 01:10:00,000 --> 01:10:03,512 And look, draw what conclusion? 1557 01:10:03,512 --> 01:10:05,490 STUDENT: That nothing has to switch at all. 1558 01:10:05,490 --> 01:10:07,365 DAVID MALAN: Yeah, let's do it one more time, 1559 01:10:07,365 --> 01:10:08,860 even though it's a little wasteful. 1560 01:10:08,860 --> 01:10:13,320 But logically, if I go through the whole list comparing pairs again, again, 1561 01:10:13,320 --> 01:10:15,810 and again, and I don't do any work that time, 1562 01:10:15,810 --> 01:10:19,133 now it's obviously logically safe to just stop, because otherwise, I'm 1563 01:10:19,133 --> 01:10:21,300 wasting my time doing the same thing again and again 1564 01:10:21,300 --> 01:10:22,933 if no one's actually moving. 1565 01:10:22,933 --> 01:10:25,350 So I'm afraid we don't have monopoly games for all of you. 1566 01:10:25,350 --> 01:10:26,767 But we do have eight stress balls. 1567 01:10:26,767 --> 01:10:30,090 And round of applause, if we could, for our volunteers. 1568 01:10:30,090 --> 01:10:33,910 If you want to put your numbers on the shelf there. 1569 01:10:33,910 --> 01:10:36,220 So if we consider for a moment-- 1570 01:10:36,220 --> 01:10:36,720 thank you. 1571 01:10:36,720 --> 01:10:39,340 Thank you so much. 1572 01:10:39,340 --> 01:10:42,150 Sure. 1573 01:10:42,150 --> 01:10:43,170 Thank you. 1574 01:10:43,170 --> 01:10:44,230 Thanks. 1575 01:10:44,230 --> 01:10:44,730 Sure. 1576 01:10:44,730 --> 01:10:48,870 So if we consider now these two algorithms, which one is better? 1577 01:10:48,870 --> 01:10:51,990 Any intuition for whether selection sort the first 1578 01:10:51,990 --> 01:10:55,950 is better or worse than bubble sort the second? 1579 01:10:55,950 --> 01:10:58,020 Any thoughts? 1580 01:10:58,020 --> 01:10:58,860 Yeah. 1581 01:10:58,860 --> 01:11:03,620 STUDENT: Bubble sort's even better because it's less work [INAUDIBLE].. 1582 01:11:03,620 --> 01:11:06,110 DAVID MALAN: So bubble sort seems like less work, 1583 01:11:06,110 --> 01:11:08,930 especially since I was focusing on those localized problems. 1584 01:11:08,930 --> 01:11:11,460 Other intuition? 1585 01:11:11,460 --> 01:11:14,580 Selection sort versus bubble sort. 1586 01:11:14,580 --> 01:11:18,000 Well, let me propose that we try to quantize this so we can actually 1587 01:11:18,000 --> 01:11:19,420 analyze it in some way. 1588 01:11:19,420 --> 01:11:22,590 And this is not an exercise we'll do constantly for lots of algorithms. 1589 01:11:22,590 --> 01:11:24,940 But these are pretty representative of algorithms. 1590 01:11:24,940 --> 01:11:27,690 So we can wrap our minds around, indeed, the performance 1591 01:11:27,690 --> 01:11:28,960 or the design of these things. 1592 01:11:28,960 --> 01:11:34,350 So here is my pseudocode for selection sort, whereby as per its name, 1593 01:11:34,350 --> 01:11:38,500 I just iteratively select the next smallest element again and again. 1594 01:11:38,500 --> 01:11:41,890 So how can we go about analyzing something like this? 1595 01:11:41,890 --> 01:11:43,920 Well, we could just do it on paper pencil 1596 01:11:43,920 --> 01:11:46,260 and count up the number of steps that seem 1597 01:11:46,260 --> 01:11:48,030 to be implied logically by the code. 1598 01:11:48,030 --> 01:11:52,260 We could literally count the number of steps I was taking again and again, 1599 01:11:52,260 --> 01:11:52,890 left to right. 1600 01:11:52,890 --> 01:11:55,830 We could also just count the number of comparisons 1601 01:11:55,830 --> 01:11:58,302 I was making with each of the persons involved. 1602 01:11:58,302 --> 01:12:00,510 And I was doing it kind of quickly in selection sort. 1603 01:12:00,510 --> 01:12:03,033 But every time I was looking at a person trying to decide, 1604 01:12:03,033 --> 01:12:04,950 do I want to remember that number is smallest? 1605 01:12:04,950 --> 01:12:08,580 That number, I was comparing two values with an equals equals or less 1606 01:12:08,580 --> 01:12:11,700 than or greater than sign, at least if we had done this in code. 1607 01:12:11,700 --> 01:12:13,110 So that tends to be the norm. 1608 01:12:13,110 --> 01:12:16,680 When analyzing algorithms like these, counting the number of comparisons, 1609 01:12:16,680 --> 01:12:21,090 because it's kind of a global unit of measure 1610 01:12:21,090 --> 01:12:23,490 we can use to compare different algorithms entirely. 1611 01:12:23,490 --> 01:12:27,780 So think, too, that in the general case, when 1612 01:12:27,780 --> 01:12:30,390 we have more than eight volunteers, more than seven doors, 1613 01:12:30,390 --> 01:12:33,510 we can generalize our array in general, as this 1614 01:12:33,510 --> 01:12:35,220 is the first element at bracket 0. 1615 01:12:35,220 --> 01:12:37,770 And the end of it is always n minus 1. 1616 01:12:37,770 --> 01:12:41,550 So arrays or doors, in this case, or volunteers, 1617 01:12:41,550 --> 01:12:45,720 are always numerically indexed from 0 on up to n minus 1, 1618 01:12:45,720 --> 01:12:47,200 if there's n of them in total. 1619 01:12:47,200 --> 01:12:50,940 So how do we analyze the code of selection sort? 1620 01:12:50,940 --> 01:12:56,370 Well, how many steps did it take me to find the first smallest element? 1621 01:12:56,370 --> 01:12:58,835 Or more precisely, how many comparisons did I 1622 01:12:58,835 --> 01:13:00,960 need to make when I walked to left to right to find 1623 01:13:00,960 --> 01:13:06,100 our first-smallest person, which ended up being 0? 1624 01:13:06,100 --> 01:13:09,310 How many comparisons did I do when walking left to right? 1625 01:13:09,310 --> 01:13:15,850 If there were eight people on stage, how many total comparisons that I do? 1626 01:13:15,850 --> 01:13:18,280 Like if there's eight people, I compared these folks. 1627 01:13:18,280 --> 01:13:22,210 Then this person, this person, yeah. 1628 01:13:22,210 --> 01:13:23,412 Yeah, so seven total, right? 1629 01:13:23,412 --> 01:13:25,120 Because if there's eight people on stage, 1630 01:13:25,120 --> 01:13:28,420 you can only do seven comparisons total, because otherwise you'd 1631 01:13:28,420 --> 01:13:29,960 be comparing one number to itself. 1632 01:13:29,960 --> 01:13:31,960 So it seems like, in the general case, if you've 1633 01:13:31,960 --> 01:13:34,600 got n numbers that you're trying to sort, 1634 01:13:34,600 --> 01:13:38,560 finding the smallest element first takes n minus 1 comparisons. 1635 01:13:38,560 --> 01:13:41,275 Maybe n total steps left to right. 1636 01:13:41,275 --> 01:13:43,150 But the number of comparisons, which I claim, 1637 01:13:43,150 --> 01:13:46,030 is just a useful unit of measure, is n minus 1. 1638 01:13:46,030 --> 01:13:48,490 How about finding the next smallest person? 1639 01:13:48,490 --> 01:13:51,970 How many steps did it take me to find the next smallest number, which 1640 01:13:51,970 --> 01:13:53,200 ended up being the number 1? 1641 01:13:53,200 --> 01:13:55,790 1642 01:13:55,790 --> 01:13:56,855 Yeah. 1643 01:13:56,855 --> 01:13:58,340 STUDENT: [INAUDIBLE] n minus 2. 1644 01:13:58,340 --> 01:13:59,600 DAVID MALAN: Yeah, so just n minus 2. 1645 01:13:59,600 --> 01:13:59,870 Why? 1646 01:13:59,870 --> 01:14:01,610 Because I'd already solved one problem. 1647 01:14:01,610 --> 01:14:03,210 Someone was already in the right position. 1648 01:14:03,210 --> 01:14:05,490 It would be silly to keep counting them again and again. 1649 01:14:05,490 --> 01:14:08,157 So I can whittle down my number of comparisons for the next pass 1650 01:14:08,157 --> 01:14:09,200 to n minus 2. 1651 01:14:09,200 --> 01:14:12,350 The third pass to find the third smallest number would be n minus 3. 1652 01:14:12,350 --> 01:14:15,770 And then dot, dot, dot, presumably this story, this formula, 1653 01:14:15,770 --> 01:14:19,620 ends when you have just one final pair, the people at the end, to compare. 1654 01:14:19,620 --> 01:14:22,820 So if this is looking a little reminiscent of some kind of recurrence 1655 01:14:22,820 --> 01:14:25,520 from high school or high school math or physics or the like, 1656 01:14:25,520 --> 01:14:28,220 let me just stipulate that if you actually do out this math 1657 01:14:28,220 --> 01:14:32,840 and generalize it, that is the same thing as n times n minus 1 1658 01:14:32,840 --> 01:14:33,392 divided by 2. 1659 01:14:33,392 --> 01:14:35,100 And if you're rusty on that, no big deal. 1660 01:14:35,100 --> 01:14:37,070 Just kind of commit to memory that any time 1661 01:14:37,070 --> 01:14:40,010 you add up this kind of series, something plus something slightly 1662 01:14:40,010 --> 01:14:42,302 smaller, plus something slightly smaller, each of which 1663 01:14:42,302 --> 01:14:46,520 differs by 1, you're going to get this formula. n times n minus 1 over 2. 1664 01:14:46,520 --> 01:14:50,630 If we, of course, multiply that out, that's really n squared minus n, 1665 01:14:50,630 --> 01:14:51,680 all divided by 2. 1666 01:14:51,680 --> 01:14:56,540 If we keep multiplying it out, that's n squared divided by 2 minus n over 2. 1667 01:14:56,540 --> 01:14:59,660 And now, we have kind of a vocabulary with which we 1668 01:14:59,660 --> 01:15:03,140 can talk about the efficiency, the design of this algorithm. 1669 01:15:03,140 --> 01:15:06,380 But honestly, I don't really care about this level of precision, 1670 01:15:06,380 --> 01:15:09,560 like n squared divided by 2 minus n divided by 2. 1671 01:15:09,560 --> 01:15:14,240 As n gets really large, which of these symbols, which of these terms 1672 01:15:14,240 --> 01:15:17,480 is really going to dominate, become the biggest influencer 1673 01:15:17,480 --> 01:15:20,190 on the total value of steps? 1674 01:15:20,190 --> 01:15:20,690 Right? 1675 01:15:20,690 --> 01:15:21,890 It's the square, right? 1676 01:15:21,890 --> 01:15:23,382 It's definitely not n divided by 2. 1677 01:15:23,382 --> 01:15:24,590 That's shaving some time off. 1678 01:15:24,590 --> 01:15:27,800 But n squared, as n gets big, is going to get really big. 1679 01:15:27,800 --> 01:15:29,990 If n is 100, then n squared is bigger. 1680 01:15:29,990 --> 01:15:32,570 If n is a million, n squared is really bigger. 1681 01:15:32,570 --> 01:15:35,390 And so at the end of the day, when we're really just talking 1682 01:15:35,390 --> 01:15:39,140 about a wave of the hand analysis and upper bound, if you will, 1683 01:15:39,140 --> 01:15:43,130 let's just say that selection sort, as analyzed here, 1684 01:15:43,130 --> 01:15:45,860 it's on the order of n squared steps. 1685 01:15:45,860 --> 01:15:47,690 It's not precisely n squared steps. 1686 01:15:47,690 --> 01:15:52,830 But you know what? n squared divided by 2, the intuition here might be that, 1687 01:15:52,830 --> 01:15:55,250 well, it's half of that. 1688 01:15:55,250 --> 01:15:58,570 n squared is what really matters as n gets really, really large. 1689 01:15:58,570 --> 01:16:01,070 And that's when you start thinking about and trying to solve 1690 01:16:01,070 --> 01:16:02,445 the Google problems of the world. 1691 01:16:02,445 --> 01:16:04,670 When n gets large, that's when you have to be smarter 1692 01:16:04,670 --> 01:16:07,490 than just sort of naive implementations of any algorithm. 1693 01:16:07,490 --> 01:16:12,480 So where, then, does this algorithm fall into this categorization here? 1694 01:16:12,480 --> 01:16:14,870 Well, n squared, it turns out, is on the order 1695 01:16:14,870 --> 01:16:19,610 of n squared steps, in the worst case, whether it's sorted or not. 1696 01:16:19,610 --> 01:16:23,990 It turns out, though, lower bound, if we consider this same code, 1697 01:16:23,990 --> 01:16:28,670 suppose the best case scenario, like our eight volunteers came up on stage. 1698 01:16:28,670 --> 01:16:32,240 And just because they already sorted themselves, so 0 through 7. 1699 01:16:32,240 --> 01:16:34,490 Suppose they just happened to be in that state. 1700 01:16:34,490 --> 01:16:37,550 How many steps would selection store take 1701 01:16:37,550 --> 01:16:42,670 to sort an already-sorted list of volunteers? 1702 01:16:42,670 --> 01:16:43,420 Any intuition? 1703 01:16:43,420 --> 01:16:44,318 Yeah. 1704 01:16:44,318 --> 01:16:47,186 STUDENT: Would it still be [INAUDIBLE]? 1705 01:16:47,186 --> 01:16:49,717 DAVID MALAN: Would it still be n-- 1706 01:16:49,717 --> 01:16:51,180 STUDENT: Still be 7 [INAUDIBLE]. 1707 01:16:51,180 --> 01:16:53,263 DAVID MALAN: So for the first pass, it would still 1708 01:16:53,263 --> 01:16:55,710 be 7 for the first pass across the humans. 1709 01:16:55,710 --> 01:16:58,530 Because even though, yeah, I'm claiming 0 is here, 1710 01:16:58,530 --> 01:17:01,920 I don't know that 0 is the smallest until I make my way all the way 1711 01:17:01,920 --> 01:17:03,990 over there doing all seven comparisons. 1712 01:17:03,990 --> 01:17:08,220 OK, fine, first pass took seven or more generally n minus 1 steps. 1713 01:17:08,220 --> 01:17:11,940 What if I look for the next smallest element, and the humans in this story 1714 01:17:11,940 --> 01:17:14,370 are already sorted 0 through 7? 1715 01:17:14,370 --> 01:17:17,580 Well, yes, the number 1 is here, and I see them first. 1716 01:17:17,580 --> 01:17:21,405 But I don't know they're the smallest until I compare against everyone else 1717 01:17:21,405 --> 01:17:22,530 get to the end of the list. 1718 01:17:22,530 --> 01:17:24,238 And we're like, oh, well that was stupid. 1719 01:17:24,238 --> 01:17:26,550 I already had the smallest person in hand then. 1720 01:17:26,550 --> 01:17:29,820 And so this pseudocode, this implementation of selection sort, 1721 01:17:29,820 --> 01:17:31,650 is sort of fixed like this. 1722 01:17:31,650 --> 01:17:35,490 There's no special case that says, if already sorted, quit early. 1723 01:17:35,490 --> 01:17:37,860 It's always going to take n squared steps. 1724 01:17:37,860 --> 01:17:42,090 And so in this case, if we borrow our jargon from earlier 1725 01:17:42,090 --> 01:17:46,080 using omega notation, just to be clear, selection sort 1726 01:17:46,080 --> 01:17:50,790 is also going to be in this incarnation in omega of n squared, 1727 01:17:50,790 --> 01:17:53,190 because even in the best case, where the list is already 1728 01:17:53,190 --> 01:17:56,100 sorted, you're going to waste a huge amount of time 1729 01:17:56,100 --> 01:17:59,040 essentially verifying as much or discovering as much, 1730 01:17:59,040 --> 01:18:01,750 even though we humans of course could see it right away. 1731 01:18:01,750 --> 01:18:06,270 So selection sort would seem to take both n squared steps in the worst 1732 01:18:06,270 --> 01:18:08,425 case, n squared steps in the best case. 1733 01:18:08,425 --> 01:18:09,300 And so you know what? 1734 01:18:09,300 --> 01:18:11,280 We can use our theta terminology for that. 1735 01:18:11,280 --> 01:18:13,770 Here would be an algorithm, just like counting earlier, 1736 01:18:13,770 --> 01:18:17,880 that always takes n squared steps, no matter whether the array is sorted 1737 01:18:17,880 --> 01:18:19,652 or not from the get go. 1738 01:18:19,652 --> 01:18:21,360 All right, so hopefully we can do better. 1739 01:18:21,360 --> 01:18:23,550 And someone proposed earlier that bubble sort 1740 01:18:23,550 --> 01:18:25,618 felt like it was using fewer steps. 1741 01:18:25,618 --> 01:18:26,910 Well, let's consider that next. 1742 01:18:26,910 --> 01:18:30,630 With bubble sort, we had this pseudocode, I claim. 1743 01:18:30,630 --> 01:18:33,780 Whereby, let's focus on the inside of the code first. 1744 01:18:33,780 --> 01:18:36,120 Down here, what was I doing? 1745 01:18:36,120 --> 01:18:39,960 For i from 0 to n minus 2. 1746 01:18:39,960 --> 01:18:40,740 That's curious. 1747 01:18:40,740 --> 01:18:42,360 We've never seen n minus 2 before. 1748 01:18:42,360 --> 01:18:44,040 But I asked this question. 1749 01:18:44,040 --> 01:18:50,160 If numbers bracket i and numbers bracket i plus 1 are out of order, swap them. 1750 01:18:50,160 --> 01:18:53,610 So that was when I was pointing at our first two volunteers here. 1751 01:18:53,610 --> 01:18:57,090 I saw that they were out of order, so I swapped them. 1752 01:18:57,090 --> 01:19:01,830 How come I'm doing that again and again up to n minus 2, though, 1753 01:19:01,830 --> 01:19:05,610 instead of n minus 1, which we've always used up 1754 01:19:05,610 --> 01:19:09,670 until now as our rightmost boundary? 1755 01:19:09,670 --> 01:19:14,170 Any intuition for why I'm doing this from 0 to n minus 2? 1756 01:19:14,170 --> 01:19:14,700 Yeah. 1757 01:19:14,700 --> 01:19:18,540 STUDENT: [INAUDIBLE] number, you can't get rid of the ith number. 1758 01:19:18,540 --> 01:19:21,005 There's no benign character you can swap with. 1759 01:19:21,005 --> 01:19:21,880 DAVID MALAN: Exactly. 1760 01:19:21,880 --> 01:19:26,050 Because I'm looking at the ith person per this pseudocode here 1761 01:19:26,050 --> 01:19:29,740 and the ith plus 1 person, I better make sure I don't go step 1762 01:19:29,740 --> 01:19:31,550 beyond the boundaries of my array. 1763 01:19:31,550 --> 01:19:33,010 So if you think of my left hand. 1764 01:19:33,010 --> 01:19:36,250 When my back was to you here, pointing at the current person 1765 01:19:36,250 --> 01:19:39,225 at the first position, my right hand for this if conditioner 1766 01:19:39,225 --> 01:19:41,350 is essentially pointing at the person next to them. 1767 01:19:41,350 --> 01:19:44,740 And you want to iterate with your left hand all through these people. 1768 01:19:44,740 --> 01:19:47,620 But you don't want your left hand to point at the last person. 1769 01:19:47,620 --> 01:19:50,000 You want it to point at the second to last person. 1770 01:19:50,000 --> 01:19:54,220 But we know that the last person is always at n minus 1. 1771 01:19:54,220 --> 01:19:57,820 So the second to last person, just mathematically, is at n minus 2. 1772 01:19:57,820 --> 01:19:58,780 So it's a subtlety. 1773 01:19:58,780 --> 01:20:00,880 But this is a seg fault waiting to happen. 1774 01:20:00,880 --> 01:20:03,760 If you implemented bubble sort using n minus 1, 1775 01:20:03,760 --> 01:20:07,340 you will, my right hand would go beyond the boundaries of the array, 1776 01:20:07,340 --> 01:20:08,170 so just bad. 1777 01:20:08,170 --> 01:20:10,490 All right, so why am I saying this n times? 1778 01:20:10,490 --> 01:20:13,070 Well, we did it very organically with humans. 1779 01:20:13,070 --> 01:20:15,940 But each time someone-- 1780 01:20:15,940 --> 01:20:18,370 each pass I did through the array, someone 1781 01:20:18,370 --> 01:20:19,840 bubbled their way up to the end. 1782 01:20:19,840 --> 01:20:22,870 Number 7, then number 6, then number 5. 1783 01:20:22,870 --> 01:20:26,600 So if on each pass through the array of volunteers, 1784 01:20:26,600 --> 01:20:31,360 I was solving at least one problem, it seems like bubble sort can just 1785 01:20:31,360 --> 01:20:34,315 run n times total to solve all n problems, 1786 01:20:34,315 --> 01:20:36,940 because the first pass will get at least one number into place. 1787 01:20:36,940 --> 01:20:38,470 Second pass, second number into place. 1788 01:20:38,470 --> 01:20:39,970 You might get lucky, and it would do more. 1789 01:20:39,970 --> 01:20:41,740 But worst case, this feels like enough. 1790 01:20:41,740 --> 01:20:46,240 Just do this blindly n times, and they'll all line up together. 1791 01:20:46,240 --> 01:20:49,780 Well, technically-- all right, now we're getting into the weeds. 1792 01:20:49,780 --> 01:20:52,060 Technically, you can just repeat it in minus 1 times, 1793 01:20:52,060 --> 01:20:54,520 because if you solve all n minus 1 other problems, 1794 01:20:54,520 --> 01:20:58,120 and you're left with 1, literally that person's where they need to be, 1795 01:20:58,120 --> 01:20:58,900 just logically. 1796 01:20:58,900 --> 01:21:01,390 If you've already sorted everything else and you've got just the 1 left, 1797 01:21:01,390 --> 01:21:02,540 it's already bubbled up. 1798 01:21:02,540 --> 01:21:03,980 So how do we analyze this? 1799 01:21:03,980 --> 01:21:06,670 Well in bubble sort, we might do something like this. 1800 01:21:06,670 --> 01:21:11,015 I'm essentially doing n minus 1 things n minus 1 times. 1801 01:21:11,015 --> 01:21:13,390 Now, let me back up to the pseudocode, because this one's 1802 01:21:13,390 --> 01:21:14,980 a little less obvious. 1803 01:21:14,980 --> 01:21:19,270 This is where you can actually mathematically infer from your loop 1804 01:21:19,270 --> 01:21:21,110 how many steps you're taking. 1805 01:21:21,110 --> 01:21:24,585 So this first line literally says, repeat the following n minus 1 times. 1806 01:21:24,585 --> 01:21:26,710 So that's going to translate very straightforwardly 1807 01:21:26,710 --> 01:21:28,240 to our mathematical formula. 1808 01:21:28,240 --> 01:21:30,190 Do something n minus 1 times. 1809 01:21:30,190 --> 01:21:33,970 This loop, just because I'm using for loop terminology, 1810 01:21:33,970 --> 01:21:35,840 it's framed a little differently. 1811 01:21:35,840 --> 01:21:39,910 But if you're iterating from 0 to n minus 2, 1812 01:21:39,910 --> 01:21:43,258 you're iterating a total of n minus 1 times. 1813 01:21:43,258 --> 01:21:45,550 And again, the arithmetic is getting a little annoying. 1814 01:21:45,550 --> 01:21:48,470 But this just means do the following n minus 1 times. 1815 01:21:48,470 --> 01:21:51,670 So do n minus 1 things n minus 1 times. 1816 01:21:51,670 --> 01:21:54,440 We can now run out the math as follows. 1817 01:21:54,440 --> 01:21:57,940 We have the formula n minus 1 times n minus 1. 1818 01:21:57,940 --> 01:22:01,210 We do our little FOIL method here, n squared minus 1 times n, 1819 01:22:01,210 --> 01:22:03,100 minus 1 times n, plus 1. 1820 01:22:03,100 --> 01:22:06,550 We can combine like terms. n squared minus 2n plus 1. 1821 01:22:06,550 --> 01:22:09,025 But at this point, when n gets really large, 1822 01:22:09,025 --> 01:22:10,900 which term are we really going to care about? 1823 01:22:10,900 --> 01:22:13,390 This is on the order of? 1824 01:22:13,390 --> 01:22:14,870 Yeah, n squared. 1825 01:22:14,870 --> 01:22:16,780 So at least asymptotically. 1826 01:22:16,780 --> 01:22:20,830 Asymptotically means, as n approaches infinity, gets really large. 1827 01:22:20,830 --> 01:22:24,070 Turns out that the upper bound on selection sort and bubble sort 1828 01:22:24,070 --> 01:22:25,430 are essentially the same. 1829 01:22:25,430 --> 01:22:28,540 Now, if we really nitpicked and compared the total number of comparisons, 1830 01:22:28,540 --> 01:22:29,680 they might differ slightly. 1831 01:22:29,680 --> 01:22:31,870 But as n gets large, honestly, you're barely 1832 01:22:31,870 --> 01:22:36,350 going to notice the difference, it would seem, between these two algorithms. 1833 01:22:36,350 --> 01:22:39,550 But what about the lower bound? 1834 01:22:39,550 --> 01:22:44,620 If the upper bound on bubble sort is also big O of n, what about the lower 1835 01:22:44,620 --> 01:22:45,470 bound here? 1836 01:22:45,470 --> 01:22:50,170 Well, with this pseudocode, what would the lower bound be on bubble sort? 1837 01:22:50,170 --> 01:22:53,890 Even in the best case when all of the volunteers are sorted. 1838 01:22:53,890 --> 01:22:56,830 Any intuition? 1839 01:22:56,830 --> 01:22:57,670 In this pseudo code. 1840 01:22:57,670 --> 01:22:58,538 Yeah, in the middle. 1841 01:22:58,538 --> 01:22:59,830 STUDENT: Sorry, quick question. 1842 01:22:59,830 --> 01:23:02,360 Isn't bubble sort structured such that you 1843 01:23:02,360 --> 01:23:05,955 wouldn't need to compare numbers that have already bubbled up? 1844 01:23:05,955 --> 01:23:07,080 DAVID MALAN: Good question. 1845 01:23:07,080 --> 01:23:09,122 Isn't bubble sort designed such that you wouldn't 1846 01:23:09,122 --> 01:23:12,860 need to compare numbers that have already bubbled up? 1847 01:23:12,860 --> 01:23:17,000 That's what's happening here in the middle, implicitly. 1848 01:23:17,000 --> 01:23:19,220 I'm always going from left to right. 1849 01:23:19,220 --> 01:23:21,650 But remember that even when I screwed up at the end 1850 01:23:21,650 --> 01:23:23,900 and the last two people were out of order, I do always 1851 01:23:23,900 --> 01:23:27,140 need to restart at the beginning, because the big numbers are 1852 01:23:27,140 --> 01:23:29,691 going that way, and the small numbers are coming this way. 1853 01:23:29,691 --> 01:23:32,892 STUDENT: [INAUDIBLE] 1854 01:23:32,892 --> 01:23:34,100 DAVID MALAN: So that is true. 1855 01:23:34,100 --> 01:23:37,460 There are some slight optimizations that I'm kind of glossing over here. 1856 01:23:37,460 --> 01:23:40,700 Let me stipulate that it would still end up being on the order of n squared. 1857 01:23:40,700 --> 01:23:43,910 But that would definitely shave off some actual running time here. 1858 01:23:43,910 --> 01:23:46,340 But what if the list is already sorted? 1859 01:23:46,340 --> 01:23:48,410 Our pseudocode, at the moment, has no allowance 1860 01:23:48,410 --> 01:23:51,020 for if list is already sorted, quit early. 1861 01:23:51,020 --> 01:23:53,840 So we're going to blindly do n minus 1 things 1862 01:23:53,840 --> 01:23:58,850 and minus 1 times unless we modify our pseudocode, as I did verbally earlier, 1863 01:23:58,850 --> 01:23:59,960 I proposed this. 1864 01:23:59,960 --> 01:24:04,520 Inside of that outer loop, if you make a pass across all of the volunteers, 1865 01:24:04,520 --> 01:24:06,907 and your mental counter has made no swaps, 1866 01:24:06,907 --> 01:24:08,990 you have to keep track with some kind of variable, 1867 01:24:08,990 --> 01:24:10,518 well then, you might as well stop. 1868 01:24:10,518 --> 01:24:12,560 Because if you do a whole pass and make no swaps, 1869 01:24:12,560 --> 01:24:17,550 why would you waste time doing it again expecting different behavior? 1870 01:24:17,550 --> 01:24:23,480 So to help visualize these, whereby now bubble sort can be advantageous 1871 01:24:23,480 --> 01:24:26,640 if the data is already sorted or mostly sorted. 1872 01:24:26,640 --> 01:24:27,140 Why? 1873 01:24:27,140 --> 01:24:29,510 Because it does have this short circuit detail. 1874 01:24:29,510 --> 01:24:31,790 At least if we implement it like that, how 1875 01:24:31,790 --> 01:24:36,263 can we go about visualizing these things a little more clearly? 1876 01:24:36,263 --> 01:24:37,680 Well, let me go ahead and do this. 1877 01:24:37,680 --> 01:24:41,870 Let me pull up, here, a visualization of exactly these algorithms, 1878 01:24:41,870 --> 01:24:45,320 thanks to a third party tool here that's going to help us visualize 1879 01:24:45,320 --> 01:24:46,850 these sorting algorithms as follows. 1880 01:24:46,850 --> 01:24:48,740 Small bars represent small numbers. 1881 01:24:48,740 --> 01:24:50,480 Big bars represent big numbers. 1882 01:24:50,480 --> 01:24:53,720 And so the idea, now, is when I hit a button here 1883 01:24:53,720 --> 01:24:56,843 to get all of the small bars this way, all of the big bars this way. 1884 01:24:56,843 --> 01:24:58,010 So just like our volunteers. 1885 01:24:58,010 --> 01:25:02,370 But instead of holding lighted numbers, it's bars representing their magnitude. 1886 01:25:02,370 --> 01:25:07,190 So let's go ahead and start with, for instance, selection sort. 1887 01:25:07,190 --> 01:25:09,920 And you'll see in pink, is being highlighted 1888 01:25:09,920 --> 01:25:12,980 the current number that is being selected 1889 01:25:12,980 --> 01:25:14,820 and then pulled all the way to the left. 1890 01:25:14,820 --> 01:25:16,220 So this is selection sort. 1891 01:25:16,220 --> 01:25:20,420 And again, it's selecting the next smallest element. 1892 01:25:20,420 --> 01:25:25,850 But you can see here, all the more visibly, that just like my human feet, 1893 01:25:25,850 --> 01:25:27,450 we're taking a lot of steps. 1894 01:25:27,450 --> 01:25:32,430 So is this algorithm touching these elements, again and again and again. 1895 01:25:32,430 --> 01:25:34,970 And this is why the n squared is really a thing. 1896 01:25:34,970 --> 01:25:37,322 There's got to be some inherent redundancy here. 1897 01:25:37,322 --> 01:25:40,280 Like, why do we keep looking at the same darn elements again and again? 1898 01:25:40,280 --> 01:25:43,070 We do, in terms of our pseudocode need to do so. 1899 01:25:43,070 --> 01:25:46,670 But it's this redundant comparisons that kind of explains 1900 01:25:46,670 --> 01:25:48,782 why n squared is indeed the case. 1901 01:25:48,782 --> 01:25:49,490 So now it's done. 1902 01:25:49,490 --> 01:25:50,977 Small bars here, big bars there. 1903 01:25:50,977 --> 01:25:53,060 And I had to just keep talking there to kill time, 1904 01:25:53,060 --> 01:25:54,650 because it's relatively slow. 1905 01:25:54,650 --> 01:25:57,182 Well, let me re-randomize the array, just 1906 01:25:57,182 --> 01:25:58,640 so we start with a different order. 1907 01:25:58,640 --> 01:26:00,380 And now let me click on bubble sort. 1908 01:26:00,380 --> 01:26:03,240 And you'll see similar idea, but different algorithm. 1909 01:26:03,240 --> 01:26:06,620 So now, the two bars in pink are the two that 1910 01:26:06,620 --> 01:26:09,995 are being compared and fixed, potentially, if they're out of order. 1911 01:26:09,995 --> 01:26:11,870 And you can see already that the biggest bars 1912 01:26:11,870 --> 01:26:14,420 are bubbling their way up to the top. 1913 01:26:14,420 --> 01:26:17,510 But now, you can also see this redundancy, 1914 01:26:17,510 --> 01:26:20,450 like we keep swooping through the list again and again, just 1915 01:26:20,450 --> 01:26:22,740 like I kept walking back and forth. 1916 01:26:22,740 --> 01:26:23,795 And this is n squared. 1917 01:26:23,795 --> 01:26:24,920 This is not that many bars. 1918 01:26:24,920 --> 01:26:25,420 What? 1919 01:26:25,420 --> 01:26:27,830 10, 20, there's like 40 or something bars, I'm guessing. 1920 01:26:27,830 --> 01:26:31,560 That's pretty slow already just to sort 40 numbers. 1921 01:26:31,560 --> 01:26:34,310 And I think it's going to get tedious if I keep talking over this. 1922 01:26:34,310 --> 01:26:37,590 So let's just assume that this too is relatively slow. 1923 01:26:37,590 --> 01:26:41,390 Had I gotten lucky and the list were almost sorted already, 1924 01:26:41,390 --> 01:26:43,310 bubble sort would have been pretty fast. 1925 01:26:43,310 --> 01:26:46,040 But this was a truly random array, so we did not get lucky. 1926 01:26:46,040 --> 01:26:50,010 So indeed, the worst case might be what's kicking in here. 1927 01:26:50,010 --> 01:26:54,470 So I feel like it'll be anticlimactic, like holding in a sneeze, if I 1928 01:26:54,470 --> 01:26:55,980 don't let you see the end of this. 1929 01:26:55,980 --> 01:26:57,890 So here we go. 1930 01:26:57,890 --> 01:27:00,110 Nothing interesting is about to happen. 1931 01:27:00,110 --> 01:27:02,330 Almost done. 1932 01:27:02,330 --> 01:27:03,080 OK, done. 1933 01:27:03,080 --> 01:27:05,890 All right, so thank you. 1934 01:27:05,890 --> 01:27:06,710 [APPLAUSE] 1935 01:27:06,710 --> 01:27:09,110 Thank you. 1936 01:27:09,110 --> 01:27:12,500 So still somewhat slow though. 1937 01:27:12,500 --> 01:27:15,800 How though can we, perhaps, do a little better fundamentally? 1938 01:27:15,800 --> 01:27:19,070 So we can do so if we introduce yet another technique. 1939 01:27:19,070 --> 01:27:22,130 And this one isn't so much a function of code as it is concept. 1940 01:27:22,130 --> 01:27:25,290 And it's something that you might have seen in the real world, 1941 01:27:25,290 --> 01:27:27,500 but perhaps not so obviously so. 1942 01:27:27,500 --> 01:27:31,490 So it turns out, in programming, recursion 1943 01:27:31,490 --> 01:27:34,970 refers to the ability of a function to call itself. 1944 01:27:34,970 --> 01:27:37,460 In the world of mathematics, if you have a function f, 1945 01:27:37,460 --> 01:27:41,450 if f appears on both the left side and the right side of a formula, 1946 01:27:41,450 --> 01:27:43,850 that would be a recursive function in the math world too. 1947 01:27:43,850 --> 01:27:46,490 Whenever f is defined in terms of itself, or in our case, 1948 01:27:46,490 --> 01:27:51,800 in compute-- in programming, any time a function calls itself, 1949 01:27:51,800 --> 01:27:53,660 that function is said to be recursive. 1950 01:27:53,660 --> 01:27:55,790 And this is actually something we've seen already in class, 1951 01:27:55,790 --> 01:27:57,373 even though we didn't call it as much. 1952 01:27:57,373 --> 01:28:00,350 So for instance, consider this pseudocode 1953 01:28:00,350 --> 01:28:03,590 from earlier, whereby this was the pseudocode 1954 01:28:03,590 --> 01:28:07,760 for searching via binary search, a whole bunch of doors. 1955 01:28:07,760 --> 01:28:10,040 If no doors are left returned false, that 1956 01:28:10,040 --> 01:28:12,600 was the additional conditional we added. 1957 01:28:12,600 --> 01:28:14,930 But then if number behind middle door returned true, 1958 01:28:14,930 --> 01:28:18,980 and here's the interesting part, if number is less than middle door, 1959 01:28:18,980 --> 01:28:20,780 search the left half. 1960 01:28:20,780 --> 01:28:24,020 Else if number is greater than middle door, search the right half. 1961 01:28:24,020 --> 01:28:27,800 This pseudocode earlier was, itself, recursive. 1962 01:28:27,800 --> 01:28:28,340 Why? 1963 01:28:28,340 --> 01:28:30,590 Because here is an algorithm for searching. 1964 01:28:30,590 --> 01:28:32,650 But what's the algorithm telling us? 1965 01:28:32,650 --> 01:28:37,280 Well, on this line and this line, it's telling us to search something else. 1966 01:28:37,280 --> 01:28:40,720 So even though it's not explicitly defined in code as having a name, 1967 01:28:40,720 --> 01:28:44,410 if this is a search algorithm, and yet the search algorithm is using a search 1968 01:28:44,410 --> 01:28:47,650 algorithm, this pseudocode is recursive. 1969 01:28:47,650 --> 01:28:50,380 Now, that could quickly get you into trouble if a function just 1970 01:28:50,380 --> 01:28:53,410 calls itself again and again and again. 1971 01:28:53,410 --> 01:28:57,130 But why, intuitively, is it not problematic 1972 01:28:57,130 --> 01:29:01,840 that this code, this pseudocode, calls itself? 1973 01:29:01,840 --> 01:29:03,460 Why will the algorithm still stop? 1974 01:29:03,460 --> 01:29:03,970 Yeah. 1975 01:29:03,970 --> 01:29:07,525 STUDENT: It has an exit condition, as in if there is no doors left, [INAUDIBLE].. 1976 01:29:07,525 --> 01:29:08,400 DAVID MALAN: Exactly. 1977 01:29:08,400 --> 01:29:10,860 It has some exit condition, like if no doors left. 1978 01:29:10,860 --> 01:29:14,700 And more importantly, any time you search the left half, 1979 01:29:14,700 --> 01:29:17,120 you're searching a smaller version of the problem. 1980 01:29:17,120 --> 01:29:18,870 Any time you search the right half, you're 1981 01:29:18,870 --> 01:29:22,330 searching a smaller version of the problem, literally half the size. 1982 01:29:22,330 --> 01:29:24,270 So this is why, in the phone book, obviously 1983 01:29:24,270 --> 01:29:27,210 I couldn't tear the phone book in half infinitely many times, 1984 01:29:27,210 --> 01:29:29,560 because it was literally getting smaller each time. 1985 01:29:29,560 --> 01:29:33,580 So recursion is this ability to call yourself, if you will. 1986 01:29:33,580 --> 01:29:36,850 But what's important is that you do it on a smaller, smaller problem, 1987 01:29:36,850 --> 01:29:39,750 so that eventually, you have no more problems to solve 1988 01:29:39,750 --> 01:29:42,010 or no more data, no more doors at all. 1989 01:29:42,010 --> 01:29:46,210 So these two lines here would be the recursive elements here. 1990 01:29:46,210 --> 01:29:49,690 But if we go back to week 0, we could have used recursion in some other way. 1991 01:29:49,690 --> 01:29:53,040 So this was our pseudocode for the phone book back in week 0. 1992 01:29:53,040 --> 01:29:55,350 And recall that we described these yellow lines 1993 01:29:55,350 --> 01:29:59,050 as really representing a loop, some kind of cycle again and again. 1994 01:29:59,050 --> 01:30:01,080 But there was a missed opportunity here. 1995 01:30:01,080 --> 01:30:05,670 What if I had re-implemented this code to do this? 1996 01:30:05,670 --> 01:30:09,120 Instead of saying open to middle of left half of book 1997 01:30:09,120 --> 01:30:12,450 and then go back to line 3, like literally inducing a loop, 1998 01:30:12,450 --> 01:30:14,610 or open to middle of right half a book and go back 1999 01:30:14,610 --> 01:30:17,400 to line 3 inducing another loop, why don't I 2000 01:30:17,400 --> 01:30:19,590 just recognize that what I'm staring at now 2001 01:30:19,590 --> 01:30:23,730 is a algorithm for searching a phone book? 2002 01:30:23,730 --> 01:30:27,480 And if you want to search a smaller phone book, like A through M or N 2003 01:30:27,480 --> 01:30:30,750 through Z, we'll just use this same algorithm. 2004 01:30:30,750 --> 01:30:35,100 So I can replace these yellow lines with just this, casually speaking. 2005 01:30:35,100 --> 01:30:37,282 Search left half of book, search right half of book. 2006 01:30:37,282 --> 01:30:39,240 This would be implicitly, and now I can shorten 2007 01:30:39,240 --> 01:30:42,360 the whole thing, a recursive implementation of the phone book 2008 01:30:42,360 --> 01:30:43,633 pseudocode from week 0. 2009 01:30:43,633 --> 01:30:46,050 And it's recursive, because if this is a search algorithm, 2010 01:30:46,050 --> 01:30:48,900 and you're saying go search something else, that's fine. 2011 01:30:48,900 --> 01:30:49,890 That's recursive. 2012 01:30:49,890 --> 01:30:52,440 But because you're searching half of the phone book, 2013 01:30:52,440 --> 01:30:55,710 it's indeed going to get smaller and smaller. 2014 01:30:55,710 --> 01:30:59,400 Even in the real world or the real real virtual world, 2015 01:30:59,400 --> 01:31:01,903 you can see recursive data structures in the wild, 2016 01:31:01,903 --> 01:31:03,820 or at least in Super Mario Brothers like this. 2017 01:31:03,820 --> 01:31:05,612 Let me get rid of all the distractions here 2018 01:31:05,612 --> 01:31:08,790 and focus on this pyramid, where you have one block, 2019 01:31:08,790 --> 01:31:10,860 then two, then three, then four. 2020 01:31:10,860 --> 01:31:14,710 Well, this itself, is technically recursively-defined in the sense that, 2021 01:31:14,710 --> 01:31:16,590 well, what is a pyramid of height for? 2022 01:31:16,590 --> 01:31:18,420 Well, it's really, what? 2023 01:31:18,420 --> 01:31:21,090 How would you describe a pyramid of height 4 2024 01:31:21,090 --> 01:31:25,743 is actually the same thing as a pyramid of-- 2025 01:31:25,743 --> 01:31:28,200 STUDENT: Height 3. 2026 01:31:28,200 --> 01:31:30,750 DAVID MALAN: --of height 3, plus 1 additional layer. 2027 01:31:30,750 --> 01:31:32,370 Well, what's a pyramid of height 3? 2028 01:31:32,370 --> 01:31:36,250 Well, it's technically a pyramid of height 2 plus 1 additional layer. 2029 01:31:36,250 --> 01:31:38,670 And so even physical structures can be recursive 2030 01:31:38,670 --> 01:31:40,630 if you can define them in terms of itself. 2031 01:31:40,630 --> 01:31:44,280 Now, at some point, you have to say that if the pyramid is of height 1, 2032 01:31:44,280 --> 01:31:46,090 there's just one block. 2033 01:31:46,090 --> 01:31:49,020 You can't forever say it's defined in terms of a height negative 1, 2034 01:31:49,020 --> 01:31:50,440 negative 2, you would never stop. 2035 01:31:50,440 --> 01:31:52,752 So you have to kind of have a special case there. 2036 01:31:52,752 --> 01:31:55,710 But let's go ahead and translate something like this, in fact, to code. 2037 01:31:55,710 --> 01:32:00,390 Let me go back to VS code here, and let me implement a program called iteration 2038 01:32:00,390 --> 01:32:03,090 that refers to a loop iterating. 2039 01:32:03,090 --> 01:32:05,620 And let me implement a very simple pyramid like that. 2040 01:32:05,620 --> 01:32:08,370 So let me go ahead and include the CS50 library. 2041 01:32:08,370 --> 01:32:14,918 I'll include our standard io.h int main void, no command line arguments today. 2042 01:32:14,918 --> 01:32:16,210 And let's go ahead and do this. 2043 01:32:16,210 --> 01:32:18,900 Let's declare a variable called height, ask the human 2044 01:32:18,900 --> 01:32:21,150 for the height of this pyramid. 2045 01:32:21,150 --> 01:32:25,300 And then let's go ahead and draw a pyramid of that height. 2046 01:32:25,300 --> 01:32:27,580 Now, of course, draw does not yet exist. 2047 01:32:27,580 --> 01:32:30,090 So I'm going to need to invent the draw function. 2048 01:32:30,090 --> 01:32:33,180 Let me go ahead and define a function that doesn't have a return value. 2049 01:32:33,180 --> 01:32:34,722 It's just going to have side effects. 2050 01:32:34,722 --> 01:32:37,230 It's just going to print bricks on the screen, called draw. 2051 01:32:37,230 --> 01:32:40,240 And it takes in an integer, n, as its input. 2052 01:32:40,240 --> 01:32:41,950 And how am I going to implement this? 2053 01:32:41,950 --> 01:32:46,530 Well again, I want to print one block, then two, then three, then four. 2054 01:32:46,530 --> 01:32:49,680 That's pretty straightforward, at least once you're comfortable with loops. 2055 01:32:49,680 --> 01:32:51,370 Let me go back to the code here. 2056 01:32:51,370 --> 01:32:55,170 Let me go ahead and say 4, int i, get 0. 2057 01:32:55,170 --> 01:32:56,880 i is less than n. 2058 01:32:56,880 --> 01:32:58,260 i plus plus. 2059 01:32:58,260 --> 01:33:01,170 And that's going to iterate, essentially row by row. 2060 01:33:01,170 --> 01:33:05,370 And on each row, I want to print out one, then two, then three, then 2061 01:33:05,370 --> 01:33:06,060 four bricks. 2062 01:33:06,060 --> 01:33:08,815 But I'm iterating from 0 to 1 to 2 to 3. 2063 01:33:08,815 --> 01:33:09,690 So I think that's OK. 2064 01:33:09,690 --> 01:33:13,020 I can just say something like 4 int j get 0. 2065 01:33:13,020 --> 01:33:17,160 j, let's be clever about this, is less than i. 2066 01:33:17,160 --> 01:33:19,380 j++. 2067 01:33:19,380 --> 01:33:22,560 And now, let me go ahead and, inside of this loop, 2068 01:33:22,560 --> 01:33:27,130 I think I can get away with just printing out a single hash sign. 2069 01:33:27,130 --> 01:33:30,270 But then outside of that loop, similar to last week, 2070 01:33:30,270 --> 01:33:32,920 I'm going to print my new line separately. 2071 01:33:32,920 --> 01:33:34,470 So a little non-obvious at first. 2072 01:33:34,470 --> 01:33:38,790 But this outer loop iterates row by row, line by line, if you will. 2073 01:33:38,790 --> 01:33:46,890 And then the inner loop just makes sure that when i equals zero, let's see. 2074 01:33:46,890 --> 01:33:48,960 Oh nope, there's a bug. 2075 01:33:48,960 --> 01:33:52,170 I need to make sure that it's j is less than i plus 1. 2076 01:33:52,170 --> 01:33:55,500 So when i is 0 on my first line of output, 2077 01:33:55,500 --> 01:33:57,600 I'm going to print out one brick. 2078 01:33:57,600 --> 01:34:02,350 When i is 1, I'm going to print out two bricks and so forth. 2079 01:34:02,350 --> 01:34:05,460 So let me go ahead and run make iteration. 2080 01:34:05,460 --> 01:34:09,090 All right, and now, seems to compile. 2081 01:34:09,090 --> 01:34:10,770 Uh oh, huh. 2082 01:34:10,770 --> 01:34:12,900 Implicit declaration of function draw. 2083 01:34:12,900 --> 01:34:16,100 So I'm making week one mistakes again. 2084 01:34:16,100 --> 01:34:16,660 What? 2085 01:34:16,660 --> 01:34:17,570 Say again. 2086 01:34:17,570 --> 01:34:18,450 STUDENT: [INAUDIBLE] 2087 01:34:18,450 --> 01:34:19,200 DAVID MALAN: Yeah. 2088 01:34:19,200 --> 01:34:20,320 The prototype is missing. 2089 01:34:20,320 --> 01:34:21,300 I didn't declare it at the top. 2090 01:34:21,300 --> 01:34:23,550 That's an easy fix, and the only time, really, it's 2091 01:34:23,550 --> 01:34:25,530 OK and necessary to copy paste. 2092 01:34:25,530 --> 01:34:29,050 Let me copy the functions declaration there and it with a semicolon. 2093 01:34:29,050 --> 01:34:32,370 So that clang now knows that draw will exist. 2094 01:34:32,370 --> 01:34:33,240 Make iteration. 2095 01:34:33,240 --> 01:34:33,930 Now it works. 2096 01:34:33,930 --> 01:34:36,090 Thank you. dot slash iteration. 2097 01:34:36,090 --> 01:34:37,830 We'll type in something like 4. 2098 01:34:37,830 --> 01:34:40,860 And there we have it, our pyramid of height one, two, three, four, 2099 01:34:40,860 --> 01:34:43,340 that looks pretty similar to this, albeit using hashes. 2100 01:34:43,340 --> 01:34:46,590 So that's how we would have implemented this, like, two weeks ago in week one, 2101 01:34:46,590 --> 01:34:49,110 maybe last week, but just using arrays. 2102 01:34:49,110 --> 01:34:53,640 But let me propose that we could do something recursively instead. 2103 01:34:53,640 --> 01:34:55,480 Let me close this version of the code. 2104 01:34:55,480 --> 01:34:59,730 And let me go back to VS Code and open up recursion.c, 2105 01:34:59,730 --> 01:35:01,800 just to demonstrate something recursively. 2106 01:35:01,800 --> 01:35:04,420 And I'll do it incorrectly deliberately the first time. 2107 01:35:04,420 --> 01:35:06,630 So let me include cs50.h. 2108 01:35:06,630 --> 01:35:08,850 Let me include standard io.h. 2109 01:35:08,850 --> 01:35:12,000 Let me do int main void. 2110 01:35:12,000 --> 01:35:17,910 And let me just blindly draw a pyramid initially of height 1. 2111 01:35:17,910 --> 01:35:21,910 But now in my draw function, let me re-implement it a little differently. 2112 01:35:21,910 --> 01:35:24,840 So my draw function this time is still going to take a number n. 2113 01:35:24,840 --> 01:35:26,860 But that's how many hashes it's going to print. 2114 01:35:26,860 --> 01:35:30,030 So let's do 4, int i get 0. 2115 01:35:30,030 --> 01:35:32,220 i is less than n. 2116 01:35:32,220 --> 01:35:34,050 i++. 2117 01:35:34,050 --> 01:35:38,440 Then let's go ahead and print out a single hash mark here. 2118 01:35:38,440 --> 01:35:44,290 And then after that, let's print out the end of the line, just as before. 2119 01:35:44,290 --> 01:35:49,770 But now this, of course, is only going to draw a single row. 2120 01:35:49,770 --> 01:35:53,790 It's going to print out one hash or two hashes or three hashes, but only 2121 01:35:53,790 --> 01:35:54,750 on one line. 2122 01:35:54,750 --> 01:35:58,560 Let me now, incorrectly, but just kind of curiously say, all right. 2123 01:35:58,560 --> 01:36:01,230 Well, if this draws a pyramid of height 1, 2124 01:36:01,230 --> 01:36:04,860 let's just use ourselves to draw a pyramid of height n plus 1. 2125 01:36:04,860 --> 01:36:08,370 So the first time I call draw, it will print out one hash. 2126 01:36:08,370 --> 01:36:12,870 Then the second time I call draw, it will print out two hashes, then three, 2127 01:36:12,870 --> 01:36:13,770 then four. 2128 01:36:13,770 --> 01:36:18,000 So we're kind of laying these bricks down from top to bottom. 2129 01:36:18,000 --> 01:36:20,670 Make recursion. 2130 01:36:20,670 --> 01:36:22,420 Whoops, I screwed up again. 2131 01:36:22,420 --> 01:36:24,630 So let's copy the prototype here. 2132 01:36:24,630 --> 01:36:27,260 Let's put this down over here, semicolon. 2133 01:36:27,260 --> 01:36:28,600 Let's do this again. 2134 01:36:28,600 --> 01:36:30,010 Make recursion. 2135 01:36:30,010 --> 01:36:32,410 All right, all good, dot slash recursion. 2136 01:36:32,410 --> 01:36:34,930 And now let me increase the size of my terminal window, 2137 01:36:34,930 --> 01:36:37,310 just so you can see more of the output. 2138 01:36:37,310 --> 01:36:39,490 And here we have. 2139 01:36:39,490 --> 01:36:41,480 OK, bad, but thank you. 2140 01:36:41,480 --> 01:36:43,525 So we have an infinitely tall pyramid. 2141 01:36:43,525 --> 01:36:45,400 And it's just flying across the screen, which 2142 01:36:45,400 --> 01:36:47,020 is why it looks kind of like a mess. 2143 01:36:47,020 --> 01:36:51,670 But I printed out a pyramid of height 1, and then 2, and then 3, and then 4. 2144 01:36:51,670 --> 01:36:55,210 And unfortunately, what am I lacking any sort of quick condition, 2145 01:36:55,210 --> 01:36:58,270 any kind of condition that says, wait a minute, when it's too tall, 2146 01:36:58,270 --> 01:36:59,353 stop altogether. 2147 01:36:59,353 --> 01:37:00,520 So this is an infinite loop. 2148 01:37:00,520 --> 01:37:01,570 But it's not a loop. 2149 01:37:01,570 --> 01:37:03,250 It's a recursive call. 2150 01:37:03,250 --> 01:37:05,780 And actually, doing this in general, is very bad. 2151 01:37:05,780 --> 01:37:08,410 We'll see next week that if you call a function too many times, 2152 01:37:08,410 --> 01:37:11,967 you can actually trigger yet another of those segmentation faults, 2153 01:37:11,967 --> 01:37:14,050 because you're using too much memory, essentially. 2154 01:37:14,050 --> 01:37:16,300 But for now, I haven't triggered that yet. 2155 01:37:16,300 --> 01:37:17,927 Control C is your friend to cancel. 2156 01:37:17,927 --> 01:37:21,010 And as an aside, if you're playing along at home or playing with this code 2157 01:37:21,010 --> 01:37:22,750 later, I actually cheated here. 2158 01:37:22,750 --> 01:37:25,360 We have a special clang configuration feature 2159 01:37:25,360 --> 01:37:28,392 that prevents you from calling a function like that 2160 01:37:28,392 --> 01:37:29,350 and creating a problem. 2161 01:37:29,350 --> 01:37:31,598 I overrode it just for demonstration sake. 2162 01:37:31,598 --> 01:37:34,640 But odds are at home, you wouldn't be able to compile this code yourself. 2163 01:37:34,640 --> 01:37:39,050 But let me do a proper version recursively of this code as follows. 2164 01:37:39,050 --> 01:37:41,870 Let me go back into the code here. 2165 01:37:41,870 --> 01:37:45,130 Let me go ahead and, not just blindly start drawing one, then two, 2166 01:37:45,130 --> 01:37:46,540 then three layers of bricks. 2167 01:37:46,540 --> 01:37:50,380 Let me prompt the human as before for the height of the pyramid 2168 01:37:50,380 --> 01:37:53,350 they want using our get int function. 2169 01:37:53,350 --> 01:37:55,670 And now let me call draw of height again. 2170 01:37:55,670 --> 01:37:58,330 So now I'm going back to the loop-like version. 2171 01:37:58,330 --> 01:38:03,250 But instead of using a loop now, this is where recursion gets rather elegant, 2172 01:38:03,250 --> 01:38:04,120 if you will. 2173 01:38:04,120 --> 01:38:10,690 Let me go ahead and execute and code the draw function as follows. 2174 01:38:10,690 --> 01:38:14,050 Per your definition, if a pyramid of height 4 2175 01:38:14,050 --> 01:38:17,437 is really just a pyramid of height 3 plus another row, well, 2176 01:38:17,437 --> 01:38:18,520 let's take that literally. 2177 01:38:18,520 --> 01:38:19,990 Let me go back to my code. 2178 01:38:19,990 --> 01:38:24,340 And if you want to draw a pyramid of height 4, well go right ahead 2179 01:38:24,340 --> 01:38:29,380 and draw a pyramid of height 3 first, or more generally, n minus 1. 2180 01:38:29,380 --> 01:38:30,640 But what's the second step? 2181 01:38:30,640 --> 01:38:34,510 Well, once you've drawn a pyramid of height 3, draw an extra row. 2182 01:38:34,510 --> 01:38:37,190 So I at least have to bite off that part of the problem myself. 2183 01:38:37,190 --> 01:38:39,310 So let me just do for int i get 0. 2184 01:38:39,310 --> 01:38:41,530 i is less than n i++. 2185 01:38:41,530 --> 01:38:46,010 And let me, the programmer of this function, print out my hashes. 2186 01:38:46,010 --> 01:38:48,340 And then at the very bottom, print out a new line 2187 01:38:48,340 --> 01:38:50,350 so the cursor moves to the next line. 2188 01:38:50,350 --> 01:38:55,660 But this is kind of elegant now, I dare say, in that draw is recursive, 2189 01:38:55,660 --> 01:38:58,570 because I'm literally translating from English to C code, 2190 01:38:58,570 --> 01:39:02,050 this idea that a pyramid of height 4 is really just a pyramid of height 3. 2191 01:39:02,050 --> 01:39:03,640 So I do that first. 2192 01:39:03,640 --> 01:39:06,560 And I'm sort of trusting that this will work. 2193 01:39:06,560 --> 01:39:09,800 Then I just have to lay one more layer of bricks, four of them. 2194 01:39:09,800 --> 01:39:13,030 So if n is 4, this is just a simple for loop, a la week 1, 2195 01:39:13,030 --> 01:39:15,520 that will print out an additional layer. 2196 01:39:15,520 --> 01:39:18,610 But this, of course, is going to be problematic eventually. 2197 01:39:18,610 --> 01:39:20,030 Why? 2198 01:39:20,030 --> 01:39:22,670 It's not done yet, this program. 2199 01:39:22,670 --> 01:39:27,644 How many times will draw call itself in this model? 2200 01:39:27,644 --> 01:39:28,640 STUDENT: It's infinite. 2201 01:39:28,640 --> 01:39:30,098 DAVID MALAN: Infinitely many times. 2202 01:39:30,098 --> 01:39:30,814 Why? 2203 01:39:30,814 --> 01:39:34,170 STUDENT: Because there's no quit function. 2204 01:39:34,170 --> 01:39:36,450 DAVID MALAN: Yeah, there's no equivalent of quit. 2205 01:39:36,450 --> 01:39:39,810 Like, if you've printed enough already, then quit, well, 2206 01:39:39,810 --> 01:39:41,050 how do we capture that? 2207 01:39:41,050 --> 01:39:43,320 Well, I don't think we want this to go negative. 2208 01:39:43,320 --> 01:39:46,570 It would make no sense to draw a negative height pyramid. 2209 01:39:46,570 --> 01:39:51,270 So I think we can just pluck off, as the programmer, an easy case, 2210 01:39:51,270 --> 01:39:53,650 an easy answer, a so-called base case. 2211 01:39:53,650 --> 01:39:54,900 And I'm just going to do this. 2212 01:39:54,900 --> 01:40:00,030 At the top of my draw function, let me just say, if n is less than 2213 01:40:00,030 --> 01:40:02,830 or, heck, less than or equal to 0, that's it. 2214 01:40:02,830 --> 01:40:04,530 Go ahead and just return. 2215 01:40:04,530 --> 01:40:06,030 There's nothing more to do. 2216 01:40:06,030 --> 01:40:10,680 And that simple condition, technically known as a base case, 2217 01:40:10,680 --> 01:40:13,290 will ensure that the code doesn't run forever. 2218 01:40:13,290 --> 01:40:13,860 Why? 2219 01:40:13,860 --> 01:40:17,730 Well, suppose that draw is called with an argument of 4. 2220 01:40:17,730 --> 01:40:20,580 4 is, of course, not less than 0, so we don't return. 2221 01:40:20,580 --> 01:40:22,590 But we do draw a pyramid of height 3. 2222 01:40:22,590 --> 01:40:24,870 And here's where things get a little mentally tricky. 2223 01:40:24,870 --> 01:40:28,320 You don't move on to line 20 until draw has been called. 2224 01:40:28,320 --> 01:40:31,080 So when draw is called with an argument of 3, 2225 01:40:31,080 --> 01:40:34,230 it's as though you're executing from the top of this function again. 2226 01:40:34,230 --> 01:40:35,520 3 is not less than 0. 2227 01:40:35,520 --> 01:40:36,330 So what do you do? 2228 01:40:36,330 --> 01:40:38,490 You draw 2. 2229 01:40:38,490 --> 01:40:39,540 How do you draw 2? 2230 01:40:39,540 --> 01:40:41,950 Well, 2 is not less than 0, so you don't return. 2231 01:40:41,950 --> 01:40:43,050 So you draw 1. 2232 01:40:43,050 --> 01:40:44,370 Got to be careful here. 2233 01:40:44,370 --> 01:40:45,240 Draw 1. 2234 01:40:45,240 --> 01:40:47,340 And now, we go ahead back to the beginning. 2235 01:40:47,340 --> 01:40:48,090 How do you draw 1? 2236 01:40:48,090 --> 01:40:50,430 Well, 1 is not less than 0, so you don't return. 2237 01:40:50,430 --> 01:40:53,400 You draw height 0. 2238 01:40:53,400 --> 01:40:54,510 How do you draw height 0? 2239 01:40:54,510 --> 01:40:55,110 Wait a minute. 2240 01:40:55,110 --> 01:40:57,660 0 is less than or equal to 0. 2241 01:40:57,660 --> 01:40:58,980 And you return. 2242 01:40:58,980 --> 01:41:02,100 And so it's kind of like this mental stack, this to do list. 2243 01:41:02,100 --> 01:41:05,190 You keep postponing, executing these lower lines of code, 2244 01:41:05,190 --> 01:41:08,700 because you keep restarting, restarting, restarting the draw function 2245 01:41:08,700 --> 01:41:12,840 until, finally, one of those function calls says there's nothing to do, 2246 01:41:12,840 --> 01:41:13,530 return. 2247 01:41:13,530 --> 01:41:16,530 And now the whole thing starts to unravel, if you will. 2248 01:41:16,530 --> 01:41:18,330 And you pick back up where you left off. 2249 01:41:18,330 --> 01:41:20,300 And this is, perhaps, the best scenario. 2250 01:41:20,300 --> 01:41:21,300 We won't do it in class. 2251 01:41:21,300 --> 01:41:23,508 But if you'd like to wrestle through this on your own 2252 01:41:23,508 --> 01:41:27,780 using debug50 to keep stepping into, step into, step into, each 2253 01:41:27,780 --> 01:41:31,480 of those lines, logically, you'll see exactly what's actually happening. 2254 01:41:31,480 --> 01:41:34,330 So let me go to my terminal and do make recursion, 2255 01:41:34,330 --> 01:41:37,740 which is now this correct version of the code, dot slash recursion. 2256 01:41:37,740 --> 01:41:39,240 Let's type in a height of 4. 2257 01:41:39,240 --> 01:41:44,730 And voila, now we have that same pyramid, not using iteration per se, 2258 01:41:44,730 --> 01:41:47,910 though admittedly, we're using iteration to print the additional layer. 2259 01:41:47,910 --> 01:41:53,340 We're now using draw recursively to print all of the smaller pyramids 2260 01:41:53,340 --> 01:41:55,120 that need come before it. 2261 01:41:55,120 --> 01:41:57,370 STUDENT: Can you only use recursion for void function? 2262 01:41:57,370 --> 01:41:58,123 [INAUDIBLE] 2263 01:41:58,123 --> 01:41:58,790 DAVID MALAN: No. 2264 01:41:58,790 --> 01:42:01,070 Question is, can you only use recursion with a void function? 2265 01:42:01,070 --> 01:42:01,920 No, not at all. 2266 01:42:01,920 --> 01:42:05,120 In fact, it's very common to have a return value like an integer 2267 01:42:05,120 --> 01:42:09,350 or something else so that you can actually do something constructively 2268 01:42:09,350 --> 01:42:11,360 with that actual value. 2269 01:42:11,360 --> 01:42:13,190 Other questions on this. 2270 01:42:13,190 --> 01:42:15,290 STUDENT: When is line 21 getting executed? 2271 01:42:15,290 --> 01:42:16,790 DAVID MALAN: Say it a little louder. 2272 01:42:16,790 --> 01:42:18,770 STUDENT: When is line 21 getting executed? 2273 01:42:18,770 --> 01:42:20,850 DAVID MALAN: When is line 21 getting executed? 2274 01:42:20,850 --> 01:42:23,720 So if you continue to-- 2275 01:42:23,720 --> 01:42:26,600 let me scroll down a bit more so you can see the top of the code. 2276 01:42:26,600 --> 01:42:35,310 So line 21 will be executed once line 19 is done executing itself. 2277 01:42:35,310 --> 01:42:40,790 Now, in the story I told, we kept calling draw again, again, again. 2278 01:42:40,790 --> 01:42:43,400 But as soon as one of those function calls 2279 01:42:43,400 --> 01:42:46,490 where n equals 0 returns immediately, then we 2280 01:42:46,490 --> 01:42:48,510 don't keep drawing again and again. 2281 01:42:48,510 --> 01:42:51,110 So now if you kind of think of the process as reversing, 2282 01:42:51,110 --> 01:42:57,530 then you continue to line 21, then line 21 again, then line 21 again, 2283 01:42:57,530 --> 01:42:59,303 and as the sort of logic unravels. 2284 01:42:59,303 --> 01:43:01,970 And next week, we'll actually paint a picture of what's actually 2285 01:43:01,970 --> 01:43:03,530 happening in the computer's memory. 2286 01:43:03,530 --> 01:43:07,450 But for now, it's just, it's very similar to the pseudocode for the phone 2287 01:43:07,450 --> 01:43:07,950 book. 2288 01:43:07,950 --> 01:43:09,680 You're just searching again and again. 2289 01:43:09,680 --> 01:43:14,408 But you're waiting until the very end to get back the final result. 2290 01:43:14,408 --> 01:43:16,700 Google now, who I keep mentioning by coincidence today, 2291 01:43:16,700 --> 01:43:18,830 is full of programmers of course. 2292 01:43:18,830 --> 01:43:20,600 Here's a fun exercise. 2293 01:43:20,600 --> 01:43:23,432 Let me go back to a browser. 2294 01:43:23,432 --> 01:43:26,390 I'm going to go ahead and search for recursion, because I want to learn 2295 01:43:26,390 --> 01:43:27,980 a little something about recursion. 2296 01:43:27,980 --> 01:43:30,230 Here is kind of an internet meme or joke. 2297 01:43:30,230 --> 01:43:35,360 If I zoom in here, the engineers at Google are kind of funny. 2298 01:43:35,360 --> 01:43:37,902 See why? 2299 01:43:37,902 --> 01:43:38,798 STUDENT: Ah. 2300 01:43:38,798 --> 01:43:40,540 DAVID MALAN: Ah, there you go. 2301 01:43:40,540 --> 01:43:41,740 Yes. 2302 01:43:41,740 --> 01:43:43,030 Yes, this is recursion. 2303 01:43:43,030 --> 01:43:45,822 And there's going to be so many memes you'll come across now, where 2304 01:43:45,822 --> 01:43:48,490 recursion, like if you've ever pointed a camera at the TV that's 2305 01:43:48,490 --> 01:43:51,190 showing the camera, and you sort of see yourself or the image again and again, 2306 01:43:51,190 --> 01:43:52,660 that's really recursion. 2307 01:43:52,660 --> 01:43:56,320 And in that case, it only stops once you hit the base case of a single pixel. 2308 01:43:56,320 --> 01:43:59,350 But this is a very funny joke in some circles 2309 01:43:59,350 --> 01:44:01,880 when it comes to recursion and Google. 2310 01:44:01,880 --> 01:44:04,570 So how can we actually use Google, or rather, 2311 01:44:04,570 --> 01:44:08,050 how can we actually use recursion constructively? 2312 01:44:08,050 --> 01:44:11,320 Well, let me propose that we actually introduced 2313 01:44:11,320 --> 01:44:14,950 a third and final algorithm for sorting that hopefully does better 2314 01:44:14,950 --> 01:44:16,870 than the two sorts thus far. 2315 01:44:16,870 --> 01:44:19,480 We've done selection sort and bubble sort. 2316 01:44:19,480 --> 01:44:22,845 Bubble sort, we liked a little better, at least insofar as in the best case 2317 01:44:22,845 --> 01:44:24,220 where the list is already sorted. 2318 01:44:24,220 --> 01:44:26,387 Bubble sort's at least smarter, and it will actually 2319 01:44:26,387 --> 01:44:28,840 terminate early, giving us a better lower bound, 2320 01:44:28,840 --> 01:44:30,490 in terms of our omega notation. 2321 01:44:30,490 --> 01:44:33,100 But it turns out that recursion, and this is not 2322 01:44:33,100 --> 01:44:36,250 necessarily a feature of recursion, but something we can now leverage. 2323 01:44:36,250 --> 01:44:39,460 It turns out, using recursion, we can take a fundamentally different approach 2324 01:44:39,460 --> 01:44:43,090 to sorting a whole bunch of numbers in such a way 2325 01:44:43,090 --> 01:44:47,560 that we can do far fewer comparisons and, ideally, speed up 2326 01:44:47,560 --> 01:44:49,010 our final results. 2327 01:44:49,010 --> 01:44:51,970 So here is the pseudocode for what we're about to see 2328 01:44:51,970 --> 01:44:54,010 for something called merge sort. 2329 01:44:54,010 --> 01:44:56,230 And it really is this terse. 2330 01:44:56,230 --> 01:44:58,330 Sort the left half of numbers. 2331 01:44:58,330 --> 01:45:00,550 Sort the right half of numbers. 2332 01:45:00,550 --> 01:45:02,950 Merge the sorted halves. 2333 01:45:02,950 --> 01:45:06,310 This is almost sort of nonsensical, because if you're 2334 01:45:06,310 --> 01:45:09,880 asked for an algorithm to sort, and you respond with, well, sort the left half, 2335 01:45:09,880 --> 01:45:10,960 sort the right half. 2336 01:45:10,960 --> 01:45:14,230 That's being difficult, because well, I'm asking for a sorting algorithm. 2337 01:45:14,230 --> 01:45:16,897 You're just telling me to sort the left half and the right half. 2338 01:45:16,897 --> 01:45:21,130 But implicit in that last line, merging is a pretty powerful feature 2339 01:45:21,130 --> 01:45:21,760 of this sort. 2340 01:45:21,760 --> 01:45:23,908 Now, we do need another base case at the top. 2341 01:45:23,908 --> 01:45:24,700 So let me add this. 2342 01:45:24,700 --> 01:45:27,850 If we find ourselves with a list, an array, of size 1, 2343 01:45:27,850 --> 01:45:29,807 well, that array is obviously sorted. 2344 01:45:29,807 --> 01:45:32,390 If there's only one element in it, there's no work to be done. 2345 01:45:32,390 --> 01:45:33,890 So that's going to be our base case. 2346 01:45:33,890 --> 01:45:38,770 But allowing us now, in just these, what, four, six lines of pseudocode, 2347 01:45:38,770 --> 01:45:40,900 to actually sort some elements. 2348 01:45:40,900 --> 01:45:43,652 But let's focus first on just a subset of this. 2349 01:45:43,652 --> 01:45:46,360 Let's consider for a moment what it means to merge sorted halves. 2350 01:45:46,360 --> 01:45:48,760 So Carter has wonderfully come up to volunteer here just 2351 01:45:48,760 --> 01:45:50,170 to help us reset these numbers. 2352 01:45:50,170 --> 01:45:54,080 Suppose that in the middle of the story we're about to tell, 2353 01:45:54,080 --> 01:45:56,020 we have two sorted halves. 2354 01:45:56,020 --> 01:45:58,300 I've already sorted the left half of these numbers, 2355 01:45:58,300 --> 01:46:01,630 and indeed, 2, 4, 5, 7 is sorted from smallest to largest. 2356 01:46:01,630 --> 01:46:06,100 And the right half appears to be already sorted, 0, 1, 3, 6, already sorted. 2357 01:46:06,100 --> 01:46:09,370 So in my pseudocode, we're already done sorting the left half 2358 01:46:09,370 --> 01:46:10,630 and the right half somehow. 2359 01:46:10,630 --> 01:46:12,160 But we'll see how in a moment. 2360 01:46:12,160 --> 01:46:14,980 Well, how do I go about merging these two halves? 2361 01:46:14,980 --> 01:46:18,490 Well, because they're sorted already, and you want to merge them in order, 2362 01:46:18,490 --> 01:46:20,110 I think we can flip down. 2363 01:46:20,110 --> 01:46:25,030 We can hide all but the first numbers in each of these sublists. 2364 01:46:25,030 --> 01:46:28,125 So here, we have a half that starts with 2. 2365 01:46:28,125 --> 01:46:30,250 And I don't really care what the other numbers are, 2366 01:46:30,250 --> 01:46:32,060 because they're clearly larger than 2. 2367 01:46:32,060 --> 01:46:35,043 I can focus only on 2, and 0 too, 0 also. 2368 01:46:35,043 --> 01:46:37,960 We know that 0 is the smallest there, so let's just ignore the numbers 2369 01:46:37,960 --> 01:46:39,293 that Carter kindly flipped down. 2370 01:46:39,293 --> 01:46:44,360 So how do I merge these two lists into a new sorted larger list? 2371 01:46:44,360 --> 01:46:47,590 Well, I compare the two on my left with the 0 2372 01:46:47,590 --> 01:46:50,470 on my right, obviously, which comes first, the 0. 2373 01:46:50,470 --> 01:46:51,963 So let me put this down here. 2374 01:46:51,963 --> 01:46:54,130 And Carter, if you want to give us the next element. 2375 01:46:54,130 --> 01:46:55,960 Now I have two sorted halves. 2376 01:46:55,960 --> 01:46:57,650 But I've already plucked one off. 2377 01:46:57,650 --> 01:47:00,010 So now I compare the two against the 1. 2378 01:47:00,010 --> 01:47:01,580 1 obviously comes next. 2379 01:47:01,580 --> 01:47:04,843 So I'm going to take out the 1 and put it in place here. 2380 01:47:04,843 --> 01:47:06,760 Now I'm going to compare the two halves again. 2381 01:47:06,760 --> 01:47:08,830 2 and 3, which do I merge first? 2382 01:47:08,830 --> 01:47:10,660 Obviously the 2 comes next. 2383 01:47:10,660 --> 01:47:14,170 And now, notice, each time I do this, my hands are theoretically 2384 01:47:14,170 --> 01:47:15,220 making forward progress. 2385 01:47:15,220 --> 01:47:18,580 I'm not doubling back like I kept doing with selection sort or bubble 2386 01:47:18,580 --> 01:47:20,290 sort, back and forth, back and forth. 2387 01:47:20,290 --> 01:47:22,810 My fingers are constantly advancing forward, 2388 01:47:22,810 --> 01:47:24,310 and that's going to be a key detail. 2389 01:47:24,310 --> 01:47:27,340 So I compare 4 and 3, 3 obviously. 2390 01:47:27,340 --> 01:47:32,560 I compare 4 and 6, 4 obviously. 2391 01:47:32,560 --> 01:47:36,520 I compare 5 and 6, 5 obviously. 2392 01:47:36,520 --> 01:47:40,810 And then I compare 7 and 6, 6 of course. 2393 01:47:40,810 --> 01:47:43,000 And then lastly, we have just one element left. 2394 01:47:43,000 --> 01:47:45,460 And even though I'm kind of moving awkwardly as a human, 2395 01:47:45,460 --> 01:47:48,160 my hands technically were only moving to the right. 2396 01:47:48,160 --> 01:47:51,130 I was never looping back doing something again and again. 2397 01:47:51,130 --> 01:47:54,430 And that's, perhaps, the intuition, and just enough room for the 7. 2398 01:47:54,430 --> 01:47:58,210 So that, then, is how you would merge two sorted halves. 2399 01:47:58,210 --> 01:48:00,610 We started with left half sorted, right half sorted. 2400 01:48:00,610 --> 01:48:02,860 And merging is just like what you would do as a human. 2401 01:48:02,860 --> 01:48:04,568 And Carter just flipped the numbers down, 2402 01:48:04,568 --> 01:48:08,620 so our focus was only on the smallest elements in each. 2403 01:48:08,620 --> 01:48:13,030 Any questions before we forge ahead with what it means, then, 2404 01:48:13,030 --> 01:48:17,120 to be merged in this way? 2405 01:48:17,120 --> 01:48:18,777 So now, here is an original list. 2406 01:48:18,777 --> 01:48:20,860 We deliberately put it at the top, because there's 2407 01:48:20,860 --> 01:48:22,480 one detail of merge sort that's key. 2408 01:48:22,480 --> 01:48:25,490 Merge sort is technically going to use a little more space. 2409 01:48:25,490 --> 01:48:28,270 And so whereas, previously, we just kept moving our humans around 2410 01:48:28,270 --> 01:48:30,790 and swapping people and making sure they stayed ultimately 2411 01:48:30,790 --> 01:48:32,110 in the original positions. 2412 01:48:32,110 --> 01:48:36,700 With merge sort, pretends that here's our original array of memory. 2413 01:48:36,700 --> 01:48:38,970 I'm going to need at least one other array of memory. 2414 01:48:38,970 --> 01:48:41,160 And I'm going to cheat, and I'm going to use even more memory. 2415 01:48:41,160 --> 01:48:43,440 But technically, I could actually go back and forth 2416 01:48:43,440 --> 01:48:45,540 between 1 array and a secondary array. 2417 01:48:45,540 --> 01:48:48,370 But it is going to take me more space. 2418 01:48:48,370 --> 01:48:53,130 So how do I go about implementing merge sort on this code? 2419 01:48:53,130 --> 01:48:54,930 Well, let's consider this. 2420 01:48:54,930 --> 01:48:57,060 Here is a array of size 8. 2421 01:48:57,060 --> 01:48:59,590 If only one number quit, obviously not applicable. 2422 01:48:59,590 --> 01:49:01,230 So let's focus on the juicy part there. 2423 01:49:01,230 --> 01:49:02,880 Sort the left half of the numbers. 2424 01:49:02,880 --> 01:49:05,130 All right, how do I sort the left half of the numbers? 2425 01:49:05,130 --> 01:49:09,240 I'm going to just nudge them over just to be clear, which is the left half. 2426 01:49:09,240 --> 01:49:11,850 Here is now a sublist of size 4. 2427 01:49:11,850 --> 01:49:14,860 How do I sort the left half? 2428 01:49:14,860 --> 01:49:17,380 Well, do I have an algorithm for sorting? 2429 01:49:17,380 --> 01:49:18,430 Yeah, what do I do? 2430 01:49:18,430 --> 01:49:19,492 Here's a list of size 4. 2431 01:49:19,492 --> 01:49:20,200 How do I sort it? 2432 01:49:20,200 --> 01:49:22,000 What's step one? 2433 01:49:22,000 --> 01:49:23,330 Sort the left half. 2434 01:49:23,330 --> 01:49:28,060 So I now sort of, conceptually in my mind, take this sublist of size 4. 2435 01:49:28,060 --> 01:49:32,740 And I sort it by first sorting the left half, focusing now on the 7 and 2. 2436 01:49:32,740 --> 01:49:34,330 All right, here's a list of size 2. 2437 01:49:34,330 --> 01:49:37,060 How do I sort a list of size 2? 2438 01:49:37,060 --> 01:49:38,740 STUDENT: [INAUDIBLE] 2439 01:49:38,740 --> 01:49:40,170 DAVID MALAN: Sorry? 2440 01:49:40,170 --> 01:49:42,360 I think we just keep following our instructions. 2441 01:49:42,360 --> 01:49:43,650 Sort the left half. 2442 01:49:43,650 --> 01:49:45,630 All right, here is a list of size 1. 2443 01:49:45,630 --> 01:49:48,417 How do I sort a list of size 1? 2444 01:49:48,417 --> 01:49:49,803 STUDENT: [INAUDIBLE] 2445 01:49:49,803 --> 01:49:50,720 DAVID MALAN: I'm done. 2446 01:49:50,720 --> 01:49:51,360 It's done. 2447 01:49:51,360 --> 01:49:52,740 So I leave this alone. 2448 01:49:52,740 --> 01:49:54,740 What was the next step in the story? 2449 01:49:54,740 --> 01:49:58,160 I've just sorted the left half of the left half of the left half. 2450 01:49:58,160 --> 01:49:59,580 What comes next? 2451 01:49:59,580 --> 01:50:03,260 I sort the right half of the left half of the left half, 2452 01:50:03,260 --> 01:50:06,260 and I'm done, because it's just a list of size 1. 2453 01:50:06,260 --> 01:50:09,280 What comes after this? 2454 01:50:09,280 --> 01:50:09,972 Merge. 2455 01:50:09,972 --> 01:50:11,680 So this is where it gets a little trippy, 2456 01:50:11,680 --> 01:50:14,770 because you have to remember where we're pausing the story to do things 2457 01:50:14,770 --> 01:50:16,187 recursively again and again. 2458 01:50:16,187 --> 01:50:19,270 But if I've just sorted the left half and I've just sorted the right half, 2459 01:50:19,270 --> 01:50:20,890 now I merge them together. 2460 01:50:20,890 --> 01:50:25,040 This is a super short list, so we don't need Carter's help here as before. 2461 01:50:25,040 --> 01:50:27,640 But I think the first number I take here is the 2. 2462 01:50:27,640 --> 01:50:31,660 And then the second number I take, because it's the only option, is the 7. 2463 01:50:31,660 --> 01:50:35,260 But what's nice now is that, notice, the left half of the left half 2464 01:50:35,260 --> 01:50:39,228 is indeed sorted, because I trivially sorted the left half of it 2465 01:50:39,228 --> 01:50:40,270 and the right half of it. 2466 01:50:40,270 --> 01:50:42,760 But then merging is really where the magic happens. 2467 01:50:42,760 --> 01:50:45,670 All right, again, if you rewind now in your mind, 2468 01:50:45,670 --> 01:50:51,300 if I've just sorted the left half of the left half, what happens next? 2469 01:50:51,300 --> 01:50:55,000 Sort the right half of the left half. 2470 01:50:55,000 --> 01:50:56,980 So again, you kind of rewind in time. 2471 01:50:56,980 --> 01:50:58,290 So how do I do this? 2472 01:50:58,290 --> 01:50:59,520 I've got a list of size 2. 2473 01:50:59,520 --> 01:51:01,920 I sort the left half, just the 5, done. 2474 01:51:01,920 --> 01:51:04,200 Sort the right half, 4, done. 2475 01:51:04,200 --> 01:51:08,910 Now the interesting part, I merge the left half and the right half 2476 01:51:08,910 --> 01:51:11,380 of the right half of the left half. 2477 01:51:11,380 --> 01:51:12,450 So what do I do? 2478 01:51:12,450 --> 01:51:14,280 4 comes down here. 2479 01:51:14,280 --> 01:51:16,260 5 comes down here. 2480 01:51:16,260 --> 01:51:19,860 And now, notice what I have. 2481 01:51:19,860 --> 01:51:21,600 Left half is sorted. 2482 01:51:21,600 --> 01:51:23,130 Right half is sorted. 2483 01:51:23,130 --> 01:51:26,610 If you rewind in time, where is my next step, 3? 2484 01:51:26,610 --> 01:51:27,742 Merge the two halves. 2485 01:51:27,742 --> 01:51:29,700 And so this is what Carter helped me do before. 2486 01:51:29,700 --> 01:51:31,290 Let's focus only on the smallest elements, 2487 01:51:31,290 --> 01:51:32,665 just so there's less distraction. 2488 01:51:32,665 --> 01:51:34,020 I compare the 2 and the 4. 2489 01:51:34,020 --> 01:51:36,520 2 comes first, so let's obviously put that here. 2490 01:51:36,520 --> 01:51:39,570 Now, I compare the new beginning of this list 2491 01:51:39,570 --> 01:51:41,280 and the old beginning of this list. 2492 01:51:41,280 --> 01:51:43,050 4 obviously comes next. 2493 01:51:43,050 --> 01:51:45,940 And now, I compare the 7 against the 5. 2494 01:51:45,940 --> 01:51:47,430 5 obviously comes next. 2495 01:51:47,430 --> 01:51:49,240 And now, lastly, I'm left with one number. 2496 01:51:49,240 --> 01:51:50,970 So now I'm down to the 7. 2497 01:51:50,970 --> 01:51:53,490 So even if you've kind of lost track of some of the nuances 2498 01:51:53,490 --> 01:51:55,510 here, if you just kind of take a step back, 2499 01:51:55,510 --> 01:51:58,260 we have the original right half here still untouched. 2500 01:51:58,260 --> 01:52:03,210 But the left half of the original input is now, indeed, sorted, 2501 01:52:03,210 --> 01:52:07,140 all by way of doing sorting left half, right half, left half, right half, 2502 01:52:07,140 --> 01:52:08,940 but with those merges in between. 2503 01:52:08,940 --> 01:52:11,965 All right, so if we've just sorted the left half, 2504 01:52:11,965 --> 01:52:13,590 we rewind all the way to the beginning. 2505 01:52:13,590 --> 01:52:15,590 What do I now do? 2506 01:52:15,590 --> 01:52:17,120 All right, so sort the right half. 2507 01:52:17,120 --> 01:52:18,410 So sort the right half. 2508 01:52:18,410 --> 01:52:20,180 How do I sort a list of size 4? 2509 01:52:20,180 --> 01:52:22,550 Well, I first sort the left half, the 1 and the 6. 2510 01:52:22,550 --> 01:52:24,560 How do I sort a list of size 2? 2511 01:52:24,560 --> 01:52:26,757 You sort the left half, just the number 1. 2512 01:52:26,757 --> 01:52:28,340 Obviously, there's no work to be done. 2513 01:52:28,340 --> 01:52:30,620 Done, sorting the left half. 2514 01:52:30,620 --> 01:52:33,080 6, done, sorting the right half. 2515 01:52:33,080 --> 01:52:34,280 Now, what do I do? 2516 01:52:34,280 --> 01:52:40,610 I merge the left half here with the right half here. 2517 01:52:40,610 --> 01:52:42,240 And that one's pretty straightforward. 2518 01:52:42,240 --> 01:52:43,050 Now, what do I do? 2519 01:52:43,050 --> 01:52:43,910 I've just merged. 2520 01:52:43,910 --> 01:52:45,048 So now I sort it. 2521 01:52:45,048 --> 01:52:47,090 I've just sorted the left half of the right half. 2522 01:52:47,090 --> 01:52:49,550 So now I sort the right half of the right half. 2523 01:52:49,550 --> 01:52:51,590 So I consider the 0, done. 2524 01:52:51,590 --> 01:52:53,270 I consider the 3, done. 2525 01:52:53,270 --> 01:52:55,040 I now merge these two together. 2526 01:52:55,040 --> 01:52:56,640 0, of course, comes first. 2527 01:52:56,640 --> 01:52:58,100 Then comes the 3. 2528 01:52:58,100 --> 01:53:00,680 And now I'm at the point of the story where 2529 01:53:00,680 --> 01:53:02,930 I've sorted the left half of the right half 2530 01:53:02,930 --> 01:53:04,910 and the right half of the right half. 2531 01:53:04,910 --> 01:53:07,535 So step 3 is merge. 2532 01:53:07,535 --> 01:53:09,410 And I'll do it again like we did with Carter. 2533 01:53:09,410 --> 01:53:12,320 All right, 1 and 0, obviously the 0 comes first. 2534 01:53:12,320 --> 01:53:14,390 Now, compare the 1 and the 3. 2535 01:53:14,390 --> 01:53:16,130 Obviously, the 1 comes first. 2536 01:53:16,130 --> 01:53:18,590 Compare the 6 and the 3, obviously the 3. 2537 01:53:18,590 --> 01:53:20,300 And then lastly, the 6. 2538 01:53:20,300 --> 01:53:21,890 So now, where are we? 2539 01:53:21,890 --> 01:53:26,840 We've taken the left half of the whole thing and sorted it. 2540 01:53:26,840 --> 01:53:29,990 We then took the right half of the whole thing and sorted it. 2541 01:53:29,990 --> 01:53:33,560 So now we're at, lastly, step 3 for the last time. 2542 01:53:33,560 --> 01:53:35,120 What do we do? 2543 01:53:35,120 --> 01:53:35,780 Merge. 2544 01:53:35,780 --> 01:53:40,220 And so just to be consistent, let me push these down, and let's compare. 2545 01:53:40,220 --> 01:53:43,310 Left hand or right hand, noticing that they only make forward progress, 2546 01:53:43,310 --> 01:53:45,230 none of this back and forth comparisons. 2547 01:53:45,230 --> 01:53:47,270 2 and 0, of course, the 0. 2548 01:53:47,270 --> 01:53:48,860 So we'll put that in place. 2549 01:53:48,860 --> 01:53:51,140 2 and 1, of course, the 1. 2550 01:53:51,140 --> 01:53:52,880 So we put that in place. 2551 01:53:52,880 --> 01:53:56,930 2 and 3, we merge in, of course, the 2 in this case. 2552 01:53:56,930 --> 01:54:00,770 4 and 3, we now merge in the 3 in this case. 2553 01:54:00,770 --> 01:54:05,630 4 and 6, we now merge, of course, the 4 in place. 2554 01:54:05,630 --> 01:54:07,760 And now, we compare 5 and 6. 2555 01:54:07,760 --> 01:54:08,615 We keep the 5. 2556 01:54:08,615 --> 01:54:12,590 2557 01:54:12,590 --> 01:54:15,226 Bug. 2558 01:54:15,226 --> 01:54:17,295 OK, well pretend that the 5 is on. 2559 01:54:17,295 --> 01:54:20,040 2560 01:54:20,040 --> 01:54:21,450 Oh, this is why. 2561 01:54:21,450 --> 01:54:24,240 All right, so now we compare the 7 and the 6. 2562 01:54:24,240 --> 01:54:26,430 6th is gone. 2563 01:54:26,430 --> 01:54:29,520 And lastly, 7 is the last one in place. 2564 01:54:29,520 --> 01:54:32,140 And even though I grant that of all the algorithms, 2565 01:54:32,140 --> 01:54:34,320 this is probably the hardest one to stay on top of, 2566 01:54:34,320 --> 01:54:36,210 especially when I'm doing it as a voice over. 2567 01:54:36,210 --> 01:54:41,010 Realize that what we've just done is only those three steps, recursively. 2568 01:54:41,010 --> 01:54:42,510 We started with a list of size 8. 2569 01:54:42,510 --> 01:54:43,650 We sorted the left half. 2570 01:54:43,650 --> 01:54:44,790 We sorted the right half. 2571 01:54:44,790 --> 01:54:46,450 And then we merge the two together. 2572 01:54:46,450 --> 01:54:48,960 But if you go down each of those rabbit holes, so to speak, 2573 01:54:48,960 --> 01:54:52,410 sorting the left half involves sorting the left half of the left half 2574 01:54:52,410 --> 01:54:54,970 and the right half of the left half, and so forth. 2575 01:54:54,970 --> 01:54:59,250 But this germ of an idea of really dividing and conquering the problem, 2576 01:54:59,250 --> 01:55:02,520 not such that you're having the problem and only dealing with one half. 2577 01:55:02,520 --> 01:55:05,700 Clearly, we're sorting one half and the other half 2578 01:55:05,700 --> 01:55:07,780 and merging them together, ultimately. 2579 01:55:07,780 --> 01:55:10,810 It does still lead us to the same solution. 2580 01:55:10,810 --> 01:55:14,550 And if we visualize the remnants of this now, if I depict this as follows, 2581 01:55:14,550 --> 01:55:18,390 where on the screen here, you see where the numbers originally started 2582 01:55:18,390 --> 01:55:20,310 in the top row from left to right. 2583 01:55:20,310 --> 01:55:23,470 Essentially, even though this is in a different order, 2584 01:55:23,470 --> 01:55:28,708 I divided that list of size 8, ultimately, into eight lists of size 1. 2585 01:55:28,708 --> 01:55:31,000 And that's where the base case kicked in and just said, 2586 01:55:31,000 --> 01:55:32,580 OK, we're done sorting that. 2587 01:55:32,580 --> 01:55:37,680 And after that, logically, I then merged two lists of size 1 2588 01:55:37,680 --> 01:55:41,580 into many lists of size 2 and those lists of size 2 into lists of size 4. 2589 01:55:41,580 --> 01:55:47,250 And then finally, the list of size 4 into one big list sorted of size 8. 2590 01:55:47,250 --> 01:55:50,610 And so I put forth this picture with the little line indicators 2591 01:55:50,610 --> 01:55:55,620 here, because how many times did I divide, divide, divide in half? 2592 01:55:55,620 --> 01:55:57,360 Or really double, double, double. 2593 01:55:57,360 --> 01:55:59,280 So exponent is the opposite-- 2594 01:55:59,280 --> 01:56:00,600 spoiler. 2595 01:56:00,600 --> 01:56:02,610 How many times did I divide? 2596 01:56:02,610 --> 01:56:04,320 So three, concretely. 2597 01:56:04,320 --> 01:56:08,070 But if there's eight elements total, and there's n more generally, 2598 01:56:08,070 --> 01:56:12,300 it really is a matter of dividing and conquering log n times. 2599 01:56:12,300 --> 01:56:15,360 You start this, and you can divide one, two, three times, log n times. 2600 01:56:15,360 --> 01:56:18,930 Or conversely, you can start here and exponentially double, double, 2601 01:56:18,930 --> 01:56:21,060 double three times, which is log n. 2602 01:56:21,060 --> 01:56:24,510 But on every row, every shelf, literally, I 2603 01:56:24,510 --> 01:56:28,350 made a fuss about pointing my hands only from the left to the right, 2604 01:56:28,350 --> 01:56:31,380 constantly advancing them, such that every time I did those merges, 2605 01:56:31,380 --> 01:56:34,500 I touched every element once and only once. 2606 01:56:34,500 --> 01:56:37,470 There was none of this back and forth, back and forth on stage. 2607 01:56:37,470 --> 01:56:42,750 So if I'm doing something log n times, or if I'm doing, 2608 01:56:42,750 --> 01:56:49,410 rather, n things log n times, what would be our big O formula, perhaps? 2609 01:56:49,410 --> 01:56:51,057 n things log n times? 2610 01:56:51,057 --> 01:56:52,140 STUDENT: Oh, it's n log n. 2611 01:56:52,140 --> 01:56:53,490 DAVID MALAN: Yeah, so n log n. 2612 01:56:53,490 --> 01:56:56,430 The order of n log n is, indeed, how we would describe 2613 01:56:56,430 --> 01:56:58,560 the running time of merge sort. 2614 01:56:58,560 --> 01:57:03,270 And so of all of the sorts thus far, we've seen that merge sort here, 2615 01:57:03,270 --> 01:57:07,120 actually, is n log n, which is strictly better than n squared, 2616 01:57:07,120 --> 01:57:10,650 which is where both selection sort and bubble sort landed. 2617 01:57:10,650 --> 01:57:14,015 But it's also slower than linear search, for instance. 2618 01:57:14,015 --> 01:57:15,390 But you would rather expect that. 2619 01:57:15,390 --> 01:57:17,880 If you have to do a lot of work up front sorting 2620 01:57:17,880 --> 01:57:19,878 some elements versus just searching them, 2621 01:57:19,878 --> 01:57:21,670 you're going to have to put in more effort. 2622 01:57:21,670 --> 01:57:24,150 And so the question of whether or not you should just search something 2623 01:57:24,150 --> 01:57:26,670 blindly with linear search and not bother sorting it, 2624 01:57:26,670 --> 01:57:30,327 really boils down to, can you afford to spend this amount of time? 2625 01:57:30,327 --> 01:57:32,160 And if you're the Googles of the world, odds 2626 01:57:32,160 --> 01:57:35,170 are you don't want to be searching their database linearly every time. 2627 01:57:35,170 --> 01:57:35,670 Why? 2628 01:57:35,670 --> 01:57:40,380 Because you can sort it once and then benefit millions, billions of people, 2629 01:57:40,380 --> 01:57:43,560 subsequently using something like binary search or, frankly in practice, 2630 01:57:43,560 --> 01:57:46,443 something even fancier and faster than binary search. 2631 01:57:46,443 --> 01:57:48,360 But there's always going to be this trade off. 2632 01:57:48,360 --> 01:57:52,140 You can achieve binary search only if the elements are sorted. 2633 01:57:52,140 --> 01:57:53,940 How much does it cost you to sort them? 2634 01:57:53,940 --> 01:57:56,940 Well, maybe n squared, if you used some of the earlier algorithms. 2635 01:57:56,940 --> 01:58:00,850 But it turns out, n log in is pretty fast as well. 2636 01:58:00,850 --> 01:58:06,180 So at the end of the day, these running times involve trade offs. 2637 01:58:06,180 --> 01:58:09,600 And indeed, in merge sort 2, I should note that the lower bound on merge 2638 01:58:09,600 --> 01:58:12,090 sort is also going to be omega of n log n. 2639 01:58:12,090 --> 01:58:14,640 As such, we can describe it in terms of our theta notation, 2640 01:58:14,640 --> 01:58:18,030 saying that merge sort is, indeed, in theta of n log n. 2641 01:58:18,030 --> 01:58:21,780 So generally speaking, probably better to use something like merge sort 2642 01:58:21,780 --> 01:58:24,030 or some other algorithm that's n log n. 2643 01:58:24,030 --> 01:58:27,660 In practice, most programmers are not implementing these sorting algorithms 2644 01:58:27,660 --> 01:58:28,200 themselves. 2645 01:58:28,200 --> 01:58:30,390 Odds are, they're using a library off the shelf 2646 01:58:30,390 --> 01:58:34,200 that themselves have made the decision as to which of these algorithms to do. 2647 01:58:34,200 --> 01:58:37,140 But generally speaking, and we're seeing now this for the first time, 2648 01:58:37,140 --> 01:58:41,430 if you want to improve time, like use less time, write faster code, 2649 01:58:41,430 --> 01:58:42,750 you've got to pay a price. 2650 01:58:42,750 --> 01:58:45,120 And that might be your human time, just takes you 2651 01:58:45,120 --> 01:58:47,400 more time to code up something more sophisticated, 2652 01:58:47,400 --> 01:58:49,080 more difficult to implement. 2653 01:58:49,080 --> 01:58:51,840 Or you need to spend something like space. 2654 01:58:51,840 --> 01:58:54,060 And as these shelves suggest, that too is 2655 01:58:54,060 --> 01:58:55,740 one of the key details of merge sort. 2656 01:58:55,740 --> 01:58:58,500 You can't just have the elements swapping in place. 2657 01:58:58,500 --> 01:59:02,520 You need at least an auxiliary array, so that when you do the merging, 2658 01:59:02,520 --> 01:59:03,960 you have a place to put them. 2659 01:59:03,960 --> 01:59:05,988 And this is excessive, this amount of memory. 2660 01:59:05,988 --> 01:59:09,030 I could have just gone back and forth between top shelf and bottom shelf. 2661 01:59:09,030 --> 01:59:11,113 But it's a little more interesting to go top down. 2662 01:59:11,113 --> 01:59:12,870 But you do need more space. 2663 01:59:12,870 --> 01:59:15,605 Back in the day, decades ago, space was really expensive. 2664 01:59:15,605 --> 01:59:16,480 And so you know what? 2665 01:59:16,480 --> 01:59:19,450 It might have been better to not use merge sort, use bubble sort 2666 01:59:19,450 --> 01:59:23,230 or selection sort even, or some other algorithm altogether. 2667 01:59:23,230 --> 01:59:25,300 Nowadays, space is relatively cheap. 2668 01:59:25,300 --> 01:59:27,160 And so these are more acceptable trade offs. 2669 01:59:27,160 --> 01:59:29,650 But it totally depends on the application. 2670 01:59:29,650 --> 01:59:32,950 The very last thing we thought we'd do is show you an actual comparison 2671 01:59:32,950 --> 01:59:34,450 of some of these sorting algorithms. 2672 01:59:34,450 --> 01:59:35,800 It's about 60 seconds long. 2673 01:59:35,800 --> 01:59:39,790 And it will compare for you, selection sort, bubble sort, 2674 01:59:39,790 --> 01:59:44,350 and merge sort in parallel simultaneously with some fun sorting 2675 01:59:44,350 --> 01:59:46,570 music, showing you ultimately what it really 2676 01:59:46,570 --> 01:59:52,750 means to be an O of n squared, or better yet, big O of n log n. 2677 01:59:52,750 --> 01:59:54,850 Selection on the top. 2678 01:59:54,850 --> 01:59:58,050 Bubble on the bottom. 2679 01:59:58,050 --> 01:59:59,400 Merge in the middle. 2680 01:59:59,400 --> 02:00:01,885 [MUSIC PLAYING] 2681 02:00:01,885 --> 02:00:53,660 2682 02:00:53,660 --> 02:00:55,700 All right, that's it for CS50. 2683 02:00:55,700 --> 02:00:57,910 We'll see you next time. 2684 02:00:57,910 --> 02:01:35,000