1 00:00:00,000 --> 00:00:02,997 2 00:00:02,997 --> 00:00:05,080 DAVID MALAN: So let's consider one such algorithm. 3 00:00:05,080 --> 00:00:08,050 So on all of our phones, whether iOS or Android or the like, 4 00:00:08,050 --> 00:00:10,233 you have some contacts application. 5 00:00:10,233 --> 00:00:12,400 And that contacts application's probably storing all 6 00:00:12,400 --> 00:00:14,483 of your friends and family members and colleagues, 7 00:00:14,483 --> 00:00:17,650 probably alphabetically, maybe by first name, maybe by last name, 8 00:00:17,650 --> 00:00:19,810 or however you've organized that device. 9 00:00:19,810 --> 00:00:22,510 Well, the old-school version of this happens 10 00:00:22,510 --> 00:00:26,590 to be in paper form which looks a little something like this, a phonebook. 11 00:00:26,590 --> 00:00:30,070 And inside of an old-school phonebook, really, is that exact same idea. 12 00:00:30,070 --> 00:00:30,880 It's much larger. 13 00:00:30,880 --> 00:00:32,860 It's much more printed. 14 00:00:32,860 --> 00:00:34,010 But it's the same thing. 15 00:00:34,010 --> 00:00:36,760 There's a whole bunch of names and numbers in a typical phonebook, 16 00:00:36,760 --> 00:00:41,170 sorted alphabetically just like your own Android phone or iOS 17 00:00:41,170 --> 00:00:42,590 phone might be as well. 18 00:00:42,590 --> 00:00:44,200 So suppose we want to solve a problem. 19 00:00:44,200 --> 00:00:47,260 And the input to that problem is not only this phone book, but also 20 00:00:47,260 --> 00:00:49,600 the name of someone to look up the number for. 21 00:00:49,600 --> 00:00:53,200 So my own name, for instance, if I want to look up my phone number or you do, 22 00:00:53,200 --> 00:00:56,050 you might open up this book and start looking for David, 23 00:00:56,050 --> 00:00:58,690 for instance, if we assume that it's sorted by first name. 24 00:00:58,690 --> 00:01:00,470 I don't see David on the first page. 25 00:01:00,470 --> 00:01:01,870 So I move on to the second. 26 00:01:01,870 --> 00:01:03,080 I don't see myself there. 27 00:01:03,080 --> 00:01:04,300 So I move on to the third. 28 00:01:04,300 --> 00:01:05,450 I don't see myself there. 29 00:01:05,450 --> 00:01:07,880 So I move on to the fourth and so forth. 30 00:01:07,880 --> 00:01:12,310 One page at a time, looking for my name, and in turn, my number. 31 00:01:12,310 --> 00:01:15,970 Well, if correctness is important, let me ask that question first. 32 00:01:15,970 --> 00:01:18,970 Is this algorithm, turning the pages step by step, 33 00:01:18,970 --> 00:01:22,650 looking for David, correct? 34 00:01:22,650 --> 00:01:23,430 What do you think? 35 00:01:23,430 --> 00:01:28,140 Within Zoom, you should see some icons under the participant's window 36 00:01:28,140 --> 00:01:29,620 labeled Yes and No. 37 00:01:29,620 --> 00:01:33,990 If you'd like to go ahead and vote virtually. 38 00:01:33,990 --> 00:01:41,160 Yes or no, is this algorithm correct, one page at a time, looking for myself? 39 00:01:41,160 --> 00:01:43,290 Never mind the fact that this is yellow pages, 40 00:01:43,290 --> 00:01:45,880 and so I'm not going to be anywhere in the phone book. 41 00:01:45,880 --> 00:01:49,710 But indeed, we'll assume it contains humans as well. 42 00:01:49,710 --> 00:01:55,230 All right, so it looks like the algorithm is indeed correct. 43 00:01:55,230 --> 00:01:57,450 But it's terribly, terribly slow. 44 00:01:57,450 --> 00:02:01,530 And that's OK because one of the ideas we're going to consider in CS50, 45 00:02:01,530 --> 00:02:05,460 and in turn, computer science, is not only the correctness of an algorithm, 46 00:02:05,460 --> 00:02:09,250 but also the efficiency, how well designed is the algorithm. 47 00:02:09,250 --> 00:02:09,990 This is correct. 48 00:02:09,990 --> 00:02:12,810 It's just incredibly, incredibly tedious and slow. 49 00:02:12,810 --> 00:02:13,955 But I will find myself. 50 00:02:13,955 --> 00:02:15,330 But, of course, we can do better. 51 00:02:15,330 --> 00:02:19,020 Instead of looking for myself one page at a time, why don't I do one page? 52 00:02:19,020 --> 00:02:22,710 Let me do 2, 4, 6, 8, 10. 53 00:02:22,710 --> 00:02:23,442 It sounds faster. 54 00:02:23,442 --> 00:02:24,150 And it is faster. 55 00:02:24,150 --> 00:02:27,210 I'm going twice as fast through the phone book, looking for myself. 56 00:02:27,210 --> 00:02:28,560 Is this algorithm correct? 57 00:02:28,560 --> 00:02:30,570 Let me go to someone in the audience this time. 58 00:02:30,570 --> 00:02:34,350 Is this algorithm of searching for someone's name two pages at a time 59 00:02:34,350 --> 00:02:35,160 correct? 60 00:02:35,160 --> 00:02:36,960 Because I claim it's more efficient. 61 00:02:36,960 --> 00:02:41,520 I claim it's better designed because I'll solve the problem twice as fast. 62 00:02:41,520 --> 00:02:42,030 Annika? 63 00:02:42,030 --> 00:02:42,780 What do you think? 64 00:02:42,780 --> 00:02:45,630 ANNIKA: No because you might skip your name on the page. 65 00:02:45,630 --> 00:02:46,168 DAVID: Yeah. 66 00:02:46,168 --> 00:02:47,460 I might skip my name on a page. 67 00:02:47,460 --> 00:02:49,020 And let me ask a follow up question. 68 00:02:49,020 --> 00:02:49,778 Can I fix this? 69 00:02:49,778 --> 00:02:51,570 Do I have to throw out the whole algorithm? 70 00:02:51,570 --> 00:02:53,700 Or can we at least fix this problem, do you think? 71 00:02:53,700 --> 00:02:56,830 ANNIKA: I think whatever page you'd flip to, it 72 00:02:56,830 --> 00:02:59,580 would help to see what name is there, maybe see if your name would 73 00:02:59,580 --> 00:03:00,730 come before or after. 74 00:03:00,730 --> 00:03:01,230 DAVID: Nice. 75 00:03:01,230 --> 00:03:02,640 So that's exactly the right intuition. 76 00:03:02,640 --> 00:03:05,610 I don't think we have to completely sacrifice the idea of speeding up 77 00:03:05,610 --> 00:03:07,380 this algorithm by moving twice as fast. 78 00:03:07,380 --> 00:03:09,930 But as you propose, if I go too far maybe. 79 00:03:09,930 --> 00:03:12,780 I get to the E section, which is one letter too late, 80 00:03:12,780 --> 00:03:16,350 I should at least double back one page because I could get unlucky. 81 00:03:16,350 --> 00:03:19,102 And maybe David is kind of sandwiched in between two pages. 82 00:03:19,102 --> 00:03:21,810 At which point, I might fly by, get to the end of the phone book, 83 00:03:21,810 --> 00:03:22,920 say, no, there's no David. 84 00:03:22,920 --> 00:03:25,740 And I just got unlucky with, like, 50% probability. 85 00:03:25,740 --> 00:03:29,250 But, as you propose, I can at least recover and sort of conditionally 86 00:03:29,250 --> 00:03:32,190 ask myself, wait a minute, maybe I just missed it and double back. 87 00:03:32,190 --> 00:03:35,670 So I can get the overall speed improvement but then at least 88 00:03:35,670 --> 00:03:37,590 fix that kind of mistake or bug. 89 00:03:37,590 --> 00:03:39,840 And bug, a term of art in programming, a bug 90 00:03:39,840 --> 00:03:42,000 is just a mistake in a program or a mistake, more 91 00:03:42,000 --> 00:03:43,170 generally, in an algorithm. 92 00:03:43,170 --> 00:03:44,850 But honestly, none of us are going to do that. 93 00:03:44,850 --> 00:03:47,430 When we actually go to search for someone in a phone book, 94 00:03:47,430 --> 00:03:51,180 just like our phones do, they typically don't start at the top 95 00:03:51,180 --> 00:03:52,320 and go to the bottom. 96 00:03:52,320 --> 00:03:55,070 And computers do exactly what you might do more intuitively. 97 00:03:55,070 --> 00:03:56,820 They'll probably go roughly to the middle. 98 00:03:56,820 --> 00:03:58,545 Maybe they'll skew a little to the left if you know 99 00:03:58,545 --> 00:04:00,087 D is toward the start of an alphabet. 100 00:04:00,087 --> 00:04:02,400 But, no, I open to the middle sort of sloppily. 101 00:04:02,400 --> 00:04:03,850 And I'm in the M section. 102 00:04:03,850 --> 00:04:07,020 So what do I know when I'm in the M section about this problem? 103 00:04:07,020 --> 00:04:08,880 Let me call on one more person. 104 00:04:08,880 --> 00:04:10,180 I'm in the M section. 105 00:04:10,180 --> 00:04:13,710 What would you do as a human now, taking this as input to solve this problem? 106 00:04:13,710 --> 00:04:21,360 What do I know about the location, of course, my name in the phone book? 107 00:04:21,360 --> 00:04:24,900 What decision can I make here? 108 00:04:24,900 --> 00:04:26,510 What decision can I make? 109 00:04:26,510 --> 00:04:27,510 Kyle, what do you think? 110 00:04:27,510 --> 00:04:28,010 KYLE: Yeah. 111 00:04:28,010 --> 00:04:31,090 So from the M onwards, you know that your name won't be there for sure. 112 00:04:31,090 --> 00:04:31,590 DAVID: Yeah. 113 00:04:31,590 --> 00:04:33,300 So my name is not going to be in the M section. 114 00:04:33,300 --> 00:04:35,730 But thanks to the alphabetized portion of the phone book, I at least know, 115 00:04:35,730 --> 00:04:36,480 you know what? 116 00:04:36,480 --> 00:04:39,060 I can take a huge bite out of this problem 117 00:04:39,060 --> 00:04:42,517 and tear the problem in half, both metaphorically and also 118 00:04:42,517 --> 00:04:44,100 literally in the case of a phone book. 119 00:04:44,100 --> 00:04:47,430 And I can literally throw half of the problem away. 120 00:04:47,430 --> 00:04:50,580 And so if I started with some, like, 1,000 pages in this phone book 121 00:04:50,580 --> 00:04:53,970 or 1,000 contacts in my phone, just by going to the middle 122 00:04:53,970 --> 00:04:56,880 roughly and taking a look to the left and the right, I can decide, 123 00:04:56,880 --> 00:05:00,220 as you note, well, it's not on the page I'm looking for. 124 00:05:00,220 --> 00:05:02,340 But I can decide it's to the left or to the right. 125 00:05:02,340 --> 00:05:05,970 I know D comes before M. And so now I can go to the left. 126 00:05:05,970 --> 00:05:08,130 And you know what's interesting here is that I 127 00:05:08,130 --> 00:05:09,947 can use that exact same algorithm. 128 00:05:09,947 --> 00:05:11,530 I don't have to think any differently. 129 00:05:11,530 --> 00:05:15,420 I can apply the same logic, open to the middle of this half of the phonebook, 130 00:05:15,420 --> 00:05:17,340 and now I see I'm in the G section. 131 00:05:17,340 --> 00:05:18,960 So I'm still a little too far. 132 00:05:18,960 --> 00:05:22,800 But, again, I can tear half the problem away, throw it down. 133 00:05:22,800 --> 00:05:26,940 And now I've gone from 1,000 pages to 500 pages to 250 pages. 134 00:05:26,940 --> 00:05:30,990 If I do this again, I might find myself, oh, I made it to the C section now. 135 00:05:30,990 --> 00:05:34,230 I can tear the problem in half again, throw the left half away, 136 00:05:34,230 --> 00:05:36,810 and now I'm down to just 125 pages. 137 00:05:36,810 --> 00:05:37,920 Now, that's still a lot. 138 00:05:37,920 --> 00:05:41,880 But, my God, I've gone from 1,000 to 500 to 250, to 125. 139 00:05:41,880 --> 00:05:47,070 That is way faster than going from 1,000 to 999 to 998. 140 00:05:47,070 --> 00:05:53,130 And it's even faster than going from 1,000 to 998, to 996, to 994. 141 00:05:53,130 --> 00:05:57,890 Both of those algorithms are going to take me much longer as well. 142 00:05:57,890 --> 00:05:59,000