WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.437 [MUSIC PLAYING] 00:00:10.330 --> 00:00:13.530 DAVID J. MALAN: So odds are you use a computer most every day, 00:00:13.530 --> 00:00:16.750 but you might not necessarily think of yourself as a computer person. 00:00:16.750 --> 00:00:20.090 Something goes wrong, you don't necessarily know how to fix it. 00:00:20.090 --> 00:00:23.180 And if you actually want to solve some problem using technology, 00:00:23.180 --> 00:00:25.420 the whole world of technology, and computing, 00:00:25.420 --> 00:00:28.930 and algorithms and all of that might seem all quite foreign. 00:00:28.930 --> 00:00:31.180 So much so that you can't necessarily feel like you're 00:00:31.180 --> 00:00:33.140 making an informed decision. 00:00:33.140 --> 00:00:35.890 Well, that's just because the technology that we all use around us 00:00:35.890 --> 00:00:37.867 is kind of working at this level. 00:00:37.867 --> 00:00:39.700 Whereby there's so many people who have come 00:00:39.700 --> 00:00:43.810 before us that have invented this level, and this level, and this level, 00:00:43.810 --> 00:00:44.510 and this level. 00:00:44.510 --> 00:00:48.730 And so what we'll do here is try to start from that ground level up. 00:00:48.730 --> 00:00:52.360 And indeed, at the base of all computing, do we really have, 00:00:52.360 --> 00:00:54.670 as we'll see, just zeros and ones? 00:00:54.670 --> 00:00:57.930 And that's probably something you know, at least, generally speaking. 00:00:57.930 --> 00:01:00.070 But it turns out once you go have zeros and ones, 00:01:00.070 --> 00:01:01.662 can you very quickly go to text? 00:01:01.662 --> 00:01:02.620 Can you go to graphics? 00:01:02.620 --> 00:01:03.490 Can you go to videos? 00:01:03.490 --> 00:01:04.615 Can you go to spreadsheets? 00:01:04.615 --> 00:01:06.370 Can you go to so much more? 00:01:06.370 --> 00:01:08.500 But again, that's the world we now inhabit. 00:01:08.500 --> 00:01:10.760 But let's now build ourselves up to that point, 00:01:10.760 --> 00:01:12.926 so that you actually begin to look around yourselves 00:01:12.926 --> 00:01:17.020 and realize, oh, I understand now, what's going on. 00:01:17.020 --> 00:01:21.430 And to do that, let's consider this, computational thinking, which really 00:01:21.430 --> 00:01:23.650 refers to thinking computationally. 00:01:23.650 --> 00:01:27.160 Thinking more methodically, thinking more carefully, 00:01:27.160 --> 00:01:32.590 and somehow framing all problems in the world as a sequence of steps. 00:01:32.590 --> 00:01:38.474 And those steps are quite simply, input comes in, and output goes out. 00:01:38.474 --> 00:01:40.390 And what's in between those inputs and outputs 00:01:40.390 --> 00:01:42.765 we'll eventually describe as something called algorithms, 00:01:42.765 --> 00:01:45.620 but for now, let's just consider it to be this black box. 00:01:45.620 --> 00:01:48.250 We don't know, we don't care what's going on inside there. 00:01:48.250 --> 00:01:52.810 All that we care is that we get back a correct output or a solution 00:01:52.810 --> 00:01:54.400 to a problem. 00:01:54.400 --> 00:01:58.480 But how do we go about representing those inputs and outputs? 00:01:58.480 --> 00:02:01.471 If our problem at hand is to analyze a whole bunch of financial data, 00:02:01.471 --> 00:02:03.220 a whole bunch of numbers in a spreadsheet, 00:02:03.220 --> 00:02:05.053 how can we actually sum all of those numbers 00:02:05.053 --> 00:02:08.380 together, or perform some other arithmetic calculation on them? 00:02:08.380 --> 00:02:12.020 If we instead have a really big document, a contract of some sort. 00:02:12.020 --> 00:02:14.460 And we want to make sure that it is properly spellchecked? 00:02:14.460 --> 00:02:16.994 Well, the input there isn't numbers, but is 00:02:16.994 --> 00:02:18.910 all of the words and letters in that document. 00:02:18.910 --> 00:02:21.610 And the output, we hope, is a document without all 00:02:21.610 --> 00:02:23.410 of those little red squigglies. 00:02:23.410 --> 00:02:26.080 We want a correctly spelled document. 00:02:26.080 --> 00:02:30.040 So those are just some of the problems that you might experience most any day, 00:02:30.040 --> 00:02:33.160 but underneath the hood, there's quite a bit of complexity going on. 00:02:33.160 --> 00:02:35.260 But that complexity, it turns out, is just 00:02:35.260 --> 00:02:40.064 the result of layering, pretty simple ideas, one on top of another. 00:02:40.064 --> 00:02:42.230 But in order to get to that point in the discussion, 00:02:42.230 --> 00:02:45.640 we need to somehow represent these inputs, these numbers, these letters, 00:02:45.640 --> 00:02:47.800 whatever it is that we have at hand. 00:02:47.800 --> 00:02:51.820 And to do that, we're going to use binary. 00:02:51.820 --> 00:02:55.330 Now odds are you've probably heard that computers indeed only speak 00:02:55.330 --> 00:02:57.430 or understand zeros and ones. 00:02:57.430 --> 00:03:01.060 But how can that possibly be, when they can do so much today, whether they're 00:03:01.060 --> 00:03:05.020 on our desktops, or laptops or even our mobile phones in our pockets, 00:03:05.020 --> 00:03:07.420 if all you have at the end of the day is zeros and ones? 00:03:07.420 --> 00:03:12.550 How could you possibly count to two, let alone three, or four, or four billion, 00:03:12.550 --> 00:03:15.540 let alone representing any number of other types of media 00:03:15.540 --> 00:03:17.760 that computers today understand? 00:03:17.760 --> 00:03:20.770 Well, odds are you recognize in this word here, 00:03:20.770 --> 00:03:24.910 this prefix, bi, bi meaning two, and indeed that hints at the fact 00:03:24.910 --> 00:03:28.090 that you only have two letters in your alphabet, so to speak. 00:03:28.090 --> 00:03:30.490 Two digits, zero and one. 00:03:30.490 --> 00:03:34.960 Now we humans have typically grown up using the decimal system, 00:03:34.960 --> 00:03:36.640 dec meaning 10. 00:03:36.640 --> 00:03:41.800 And of course we have 0,1, 2,3, 4,5, 6, 7, 8, 9, 10 possible digits 00:03:41.800 --> 00:03:45.730 that we can permute and arrange to count even higher than that. 00:03:45.730 --> 00:03:49.570 But with just zeros and ones, how do you get to two, how do you get to three? 00:03:49.570 --> 00:03:55.090 Well, you're thinking still in base 10, so to speak, base 10 being decimal. 00:03:55.090 --> 00:03:59.200 But we want to think now in base 2, so to speak, binary, 00:03:59.200 --> 00:04:00.970 where we have just two of these inputs. 00:04:00.970 --> 00:04:03.290 And let me propose that if you're like me, 00:04:03.290 --> 00:04:07.150 you probably grew up understanding numbers as really just this pattern 00:04:07.150 --> 00:04:11.430 of symbols like this, where each symbol was in a column of sorts, 00:04:11.430 --> 00:04:13.160 a placeholder, if you will. 00:04:13.160 --> 00:04:17.170 And this, if you recall, was generally called the ones place, and then 00:04:17.170 --> 00:04:19.750 the tens place, and then with the hundreds place. 00:04:19.750 --> 00:04:22.270 And so at the moment, what we really just have on the screen 00:04:22.270 --> 00:04:26.230 is a pattern of symbols, a pattern of digits, one, two, three. 00:04:26.230 --> 00:04:29.920 But why is this pattern of symbols, one, two, three, 00:04:29.920 --> 00:04:34.180 the number you and I think of immediately as 123? 00:04:34.180 --> 00:04:38.500 Well, that's because by way of these columns or places do we implicitly, 00:04:38.500 --> 00:04:40.870 in our own mind, quickly do a bit of arithmetic? 00:04:40.870 --> 00:04:45.790 Of course, if we have one, two, three, that's really like 100 times 1 plus 00:04:45.790 --> 00:04:53.840 10 times 2 plus 1 times 3, and of course that gives us 123. 00:04:53.840 --> 00:04:58.240 So it turns out that the binary system is actually fundamentally the same. 00:04:58.240 --> 00:05:01.550 So if you get decimal, if you've been counting like that since grade school, 00:05:01.550 --> 00:05:03.880 you are good to go now with binary, because in fact 00:05:03.880 --> 00:05:07.259 and arguably, binary is even simpler, because you don't even 00:05:07.259 --> 00:05:08.800 have to keep track of so many digits. 00:05:08.800 --> 00:05:11.740 In fact, if we consider this pattern of symbols, 00:05:11.740 --> 00:05:14.200 this of course, in our human world, would generally 00:05:14.200 --> 00:05:17.467 be immediately recognized by us humans as the number zero. 00:05:17.467 --> 00:05:19.300 Of course, it's a little silly to have those 00:05:19.300 --> 00:05:22.210 left most zeros, those leading zeros, so to speak, 00:05:22.210 --> 00:05:24.640 because they don't really add much mathematically. 00:05:24.640 --> 00:05:25.270 But that's OK. 00:05:25.270 --> 00:05:28.630 You can put as many zeros to the left of a number in our real world 00:05:28.630 --> 00:05:31.120 and it doesn't really affect the value. 00:05:31.120 --> 00:05:38.530 Now instead of using one and 10 and 100, let me just use one, two, and four. 00:05:38.530 --> 00:05:39.580 Now why those values? 00:05:39.580 --> 00:05:43.910 Well before, 1, 10, 100, and again a thousand, 10,000, 100,000, and so 00:05:43.910 --> 00:05:44.410 forth. 00:05:44.410 --> 00:05:49.450 Those are powers of 10, 10 to the zero is 1, 10 to the one is 10, 00:05:49.450 --> 00:05:52.760 10 to the two is 100, and so forth. 00:05:52.760 --> 00:05:55.900 So what pattern do you perhaps see here? 00:05:55.900 --> 00:06:00.520 These aren't powers of 10 anymore, 1, 2, 4, an 8, 16, 32. 00:06:00.520 --> 00:06:02.470 Now you really hear the pattern? 00:06:02.470 --> 00:06:03.970 These are instead powers of two. 00:06:03.970 --> 00:06:08.740 Two to the zero is one, two to the one is two, two to the two 00:06:08.740 --> 00:06:10.730 is four, and so forth. 00:06:10.730 --> 00:06:11.950 So that's the only change. 00:06:11.950 --> 00:06:16.270 By using base 10, just two digits, zero and one, 00:06:16.270 --> 00:06:19.040 can we still count using the same arithmetic system. 00:06:19.040 --> 00:06:21.850 So for instance, this of course is still the number zero, 00:06:21.850 --> 00:06:27.670 because we have zero fours, and zero twos, and zero ones. 00:06:27.670 --> 00:06:30.850 But if we instead change this pattern to 0 0 1, 00:06:30.850 --> 00:06:32.620 well what value does this now represent? 00:06:32.620 --> 00:06:35.410 In the world of decimal, this would represent the number one. 00:06:35.410 --> 00:06:40.660 And that's true in the world of binary, because we have 4 times 0 plus 2 times 00:06:40.660 --> 00:06:44.800 0 plus 1 times 1, which of course is just 1. 00:06:44.800 --> 00:06:46.360 So how do we get to two? 00:06:46.360 --> 00:06:51.040 You might be inclined to just change this zero to a one, 00:06:51.040 --> 00:06:52.510 but that would be incorrect. 00:06:52.510 --> 00:06:54.280 That's like carrying the one, but then not 00:06:54.280 --> 00:06:56.920 wrapping around on the rightmost digit. 00:06:56.920 --> 00:06:59.769 Rather, if we want to represent the number two, 00:06:59.769 --> 00:07:02.060 we need to come up with a pattern that represents that. 00:07:02.060 --> 00:07:06.850 So let me propose instead that we don't just change that zero to a one, 00:07:06.850 --> 00:07:10.030 but we also change that one to zero. 00:07:10.030 --> 00:07:12.190 Because now we have 4 times 0 plus 2 times 00:07:12.190 --> 00:07:16.910 1 plus times 0 which of course is the number we know as 2. 00:07:16.910 --> 00:07:19.240 And now you can perhaps grok the pattern at hand. 00:07:19.240 --> 00:07:21.790 If we want to count up to the number we know as three, 00:07:21.790 --> 00:07:25.330 let's just change that rightmost one to a one. 00:07:25.330 --> 00:07:29.290 So that's four times 0 plus 2 times 1 plus 1 times 1. 00:07:29.290 --> 00:07:30.480 That of course is three. 00:07:30.480 --> 00:07:32.140 Four is perhaps jumping out at you now. 00:07:32.140 --> 00:07:36.640 We just change that zero to a one, and then the other two digits to zeros. 00:07:36.640 --> 00:07:42.730 And now if we want to count up to five, or to six, or to seven, or to-- 00:07:42.730 --> 00:07:45.070 dang it, what's supposed to come next? 00:07:45.070 --> 00:07:48.830 It would seem that in the binary system, we can only count as high as-- 00:07:48.830 --> 00:07:49.622 what, seven? 00:07:49.622 --> 00:07:52.990 And that's a little strange, because of course, we started at zero, 00:07:52.990 --> 00:07:56.440 we counted here up to seven, but surely computers can count higher than that. 00:07:56.440 --> 00:07:58.930 And that's fine, just need another digit. 00:07:58.930 --> 00:08:04.600 Much like if we wanted to count from 999, in our human decimal world, 00:08:04.600 --> 00:08:07.480 up to a thousand, which has four digits. 00:08:07.480 --> 00:08:09.730 Similarly here, do we need another digit? 00:08:09.730 --> 00:08:12.790 We need an eights place, and so we could represent 00:08:12.790 --> 00:08:16.636 the number we know as eight with one, zero, zero, zero. 00:08:16.636 --> 00:08:20.620 But the key here is that we needed another digit. 00:08:20.620 --> 00:08:23.740 To put it more mechanically, we needed more hardware. 00:08:23.740 --> 00:08:27.430 We needed another physical place, at least in this depiction, 00:08:27.430 --> 00:08:32.360 to actually store that one. 00:08:32.360 --> 00:08:37.990 And now we begin already to hint at a connection between this digital world 00:08:37.990 --> 00:08:42.309 of zeros and ones, and the hardware world in which it's typically 00:08:42.309 --> 00:08:43.390 incarnated. 00:08:43.390 --> 00:08:48.250 And by that, I mean, we need to decide, ultimately, 00:08:48.250 --> 00:08:51.850 how and where to store patterns like this. 00:08:51.850 --> 00:08:55.600 And you know the nice thing about binary only having two values, 00:08:55.600 --> 00:08:58.450 is that that effectively, means you have just two states. 00:08:58.450 --> 00:09:01.450 And you might think of these two states as something being on, 00:09:01.450 --> 00:09:02.710 or something being off. 00:09:02.710 --> 00:09:06.010 And maybe we might think of something being on as a one, and something being 00:09:06.010 --> 00:09:07.570 off as a zero. 00:09:07.570 --> 00:09:11.920 And so just with two states, can we have these two extremes. 00:09:11.920 --> 00:09:15.850 And maybe it's not on or off, maybe it's true, or false, or black, or white, 00:09:15.850 --> 00:09:16.750 or red, or blue. 00:09:16.750 --> 00:09:19.390 It doesn't even matter what we call the two states, 00:09:19.390 --> 00:09:21.745 the key is that we have two of them. 00:09:21.745 --> 00:09:25.510 And you know In the physical world, the simplest thing 00:09:25.510 --> 00:09:31.930 I can think of to turn on and off, is perhaps something like this. 00:09:31.930 --> 00:09:32.920 Just a light bulb. 00:09:32.920 --> 00:09:35.080 This here's a little desk lamp, and in fact, 00:09:35.080 --> 00:09:38.740 if I go ahead and clip this on here, and let 00:09:38.740 --> 00:09:41.950 me plug this into an electrical outlet so that I'm 00:09:41.950 --> 00:09:46.030 getting some additional input if you will, electricity or electrons, 00:09:46.030 --> 00:09:49.450 this lamp right now I claim is representing the number zero. 00:09:49.450 --> 00:09:53.430 It's completely off, and I'm just going to agree, to agree with you, that this 00:09:53.430 --> 00:09:54.650 shall represent zero. 00:09:54.650 --> 00:09:58.400 But you know what, if I want to represent a one in the physical world, 00:09:58.400 --> 00:09:59.720 just going to turn it on. 00:09:59.720 --> 00:10:04.550 And so now we have something that's on, or true, we'll call that a one. 00:10:04.550 --> 00:10:06.170 Zero, one. 00:10:06.170 --> 00:10:09.200 Of course, we're not really counting all that high just yet. 00:10:09.200 --> 00:10:13.220 If I want to count higher than zero to one, to something higher, 00:10:13.220 --> 00:10:15.680 I'm going to need another light bulb. 00:10:15.680 --> 00:10:18.567 I'm going to need more hardware, more memory, if you will, 00:10:18.567 --> 00:10:19.650 in the world of computers. 00:10:19.650 --> 00:10:25.400 So let's actually now give myself a second zero or one. 00:10:25.400 --> 00:10:28.890 Plug this thing in here. 00:10:28.890 --> 00:10:31.940 Give it some electricity as well, which again, is really 00:10:31.940 --> 00:10:34.500 one of the few physical resources we have in a computer, 00:10:34.500 --> 00:10:37.040 whether it's coming from a power cord or a battery. 00:10:37.040 --> 00:10:40.460 And so now, if I want to represent the number zero, here we are. 00:10:40.460 --> 00:10:42.500 But I'm doing it really with zero zero. 00:10:42.500 --> 00:10:48.950 This now is going to be one, this now is going to be two, and this, of course, 00:10:48.950 --> 00:10:50.180 is going to be three. 00:10:50.180 --> 00:10:51.830 And let's not stop there. 00:10:51.830 --> 00:10:54.920 Why have two desk lamps when you can have three desk lamps. 00:10:54.920 --> 00:10:57.680 If we instead make a little more room for here, and we 00:10:57.680 --> 00:11:01.010 want to instead now count not just as high as three. 00:11:01.010 --> 00:11:06.310 Let me go ahead and plug-in this third and final desk lamp, 00:11:06.310 --> 00:11:08.230 and go ahead and turn this on. 00:11:08.230 --> 00:11:12.340 And of course, we instantly go from three to-- 00:11:12.340 --> 00:11:14.480 not quite instantly because you have the hands-- 00:11:14.480 --> 00:11:16.410 to the number four. 00:11:16.410 --> 00:11:17.770 And so forth. 00:11:17.770 --> 00:11:20.850 So what's now the connection between the digital 00:11:20.850 --> 00:11:23.430 and the sort of analog world of light bulbs, 00:11:23.430 --> 00:11:25.710 and the physical world of computers? 00:11:25.710 --> 00:11:29.460 Well inside of a computer is a whole bunch of tiny, tiny little bulbs, 00:11:29.460 --> 00:11:29.961 so to speak. 00:11:29.961 --> 00:11:33.001 In fact, you can think of what's inside of your computer is exactly this. 00:11:33.001 --> 00:11:35.520 So many little light bulbs that are either on or off. 00:11:35.520 --> 00:11:41.100 And thanks to those little light bulbs, can your computer represent numbers? 00:11:41.100 --> 00:11:44.250 Zero, one, two, three, four, as many-- 00:11:44.250 --> 00:11:47.850 as high of a number as it wants so long as it has enough little lights. 00:11:47.850 --> 00:11:49.590 Now of course I'm oversimplifying. 00:11:49.590 --> 00:11:53.280 It's not actually lights that you have inside of your computer, rather, 00:11:53.280 --> 00:11:55.250 you have just the switches. 00:11:55.250 --> 00:11:58.034 These switches being called in the computing world transistors, 00:11:58.034 --> 00:11:58.950 as you may have heard. 00:11:58.950 --> 00:12:02.190 Indeed, one of the things that companies like Intel are increasingly doing, 00:12:02.190 --> 00:12:06.420 is packing more and more transistors into their CPUs, central processing 00:12:06.420 --> 00:12:06.990 units. 00:12:06.990 --> 00:12:08.250 The brains of computers. 00:12:08.250 --> 00:12:12.270 And with those additional transistors, can they store even more values, 00:12:12.270 --> 00:12:15.330 or equivalently can they count even higher? 00:12:15.330 --> 00:12:18.690 And so now, even though we began the discussion down here 00:12:18.690 --> 00:12:21.400 with just zeros and ones in the abstract, 00:12:21.400 --> 00:12:23.580 now we've made it a little more physical, in so far 00:12:23.580 --> 00:12:26.820 as we can represent those zeros and ones with something physical, 00:12:26.820 --> 00:12:29.400 drawing electricity to power these kinds of light bulbs. 00:12:29.400 --> 00:12:31.680 Of course, we can now miniaturize that and consider 00:12:31.680 --> 00:12:35.160 these light bulbs to really be, what we'll call in the computer world, 00:12:35.160 --> 00:12:37.980 transistors. 00:12:37.980 --> 00:12:40.320 But at the end of the day, even though we 00:12:40.320 --> 00:12:43.380 have now a way of representing information, 00:12:43.380 --> 00:12:47.160 the zeros and ones, even though we have a way now 00:12:47.160 --> 00:12:51.300 of representing even bigger numbers by using even more transistors inside 00:12:51.300 --> 00:12:55.530 of our computer, we're still talking only about numbers. 00:12:55.530 --> 00:12:58.170 And with my computer I want to do more than just use something 00:12:58.170 --> 00:12:59.670 like Microsoft Excel. 00:12:59.670 --> 00:13:04.170 I want to do more than just do math on my data. 00:13:04.170 --> 00:13:07.710 I want to actually store things like letters, and words. 00:13:07.710 --> 00:13:15.040 Let alone, colors, and images, and videos, and more. 00:13:15.040 --> 00:13:17.410 So we need to abstract higher. 00:13:17.410 --> 00:13:20.890 So again zeros and ones, we know how to represent it, now let's 00:13:20.890 --> 00:13:24.610 build on top of the zeros and ones and start to represent 00:13:24.610 --> 00:13:26.470 something more interesting. 00:13:26.470 --> 00:13:31.520 Something more alphabetical, if you will. 00:13:31.520 --> 00:13:34.630 And so this gives us ASCII, the American Standard 00:13:34.630 --> 00:13:37.570 Code for Information Interchange, or more modernly 00:13:37.570 --> 00:13:40.780 referred to as a superset of Unicode. 00:13:40.780 --> 00:13:43.420 Turns out, human some time ago, decided on a mapping 00:13:43.420 --> 00:13:45.070 between letters and numbers. 00:13:45.070 --> 00:13:47.560 Somewhat arbitrarily, but in a way that's 00:13:47.560 --> 00:13:51.290 convenient when it comes to actually programming, things like this. 00:13:51.290 --> 00:13:53.290 And this is to say that humans time ago decided, 00:13:53.290 --> 00:13:58.140 you know what, we are going to represent the letter A using the decimal number 00:13:58.140 --> 00:13:58.900 65. 00:13:58.900 --> 00:13:59.990 Now what does that mean? 00:13:59.990 --> 00:14:03.670 Well, this means that if your computer is to store the letter A, or the letter 00:14:03.670 --> 00:14:06.550 B, or the letter C, it simply has to store ultimately 00:14:06.550 --> 00:14:09.680 the number 65, or 66, or 67. 00:14:09.680 --> 00:14:13.030 What does it mean to store number like 65, 66, or 67? 00:14:13.030 --> 00:14:17.040 Well that just means to store some pattern of bits, some pattern of bulbs 00:14:17.040 --> 00:14:21.910 as we did a moment ago, and turn them on and off in such a way 00:14:21.910 --> 00:14:27.521 that you're representing in binary, the decimal number 65, 66, 67. 00:14:27.521 --> 00:14:30.520 So if you think about something like Microsoft Word or Google documents, 00:14:30.520 --> 00:14:33.580 or any program in which you type text, what you're really 00:14:33.580 --> 00:14:37.480 doing by typing on the keyboard, is sending some pattern of signals 00:14:37.480 --> 00:14:42.190 that's telling the computer to store not just the letter ABC per se, 00:14:42.190 --> 00:14:45.700 but really to store the number 65 or 66 or 67, 00:14:45.700 --> 00:14:50.260 or really, to store the pattern of zeros and ones 00:14:50.260 --> 00:14:52.680 that ultimately represent those same values. 00:14:52.680 --> 00:14:56.020 So again, this spirit of layering binary now 00:14:56.020 --> 00:14:59.380 becomes ASCII, or Unicode, something higher still. 00:14:59.380 --> 00:15:01.630 And so with this can we represent messages? 00:15:01.630 --> 00:15:07.950 For instance, if I wanted to represent a message like this, a familiar message, 00:15:07.950 --> 00:15:10.780 there say, I might store these three values. 00:15:10.780 --> 00:15:15.640 72, 73, and 33. 00:15:15.640 --> 00:15:16.540 Now what is that? 00:15:16.540 --> 00:15:20.305 Well turns out if I look back at this pattern here, where 65 is A, 00:15:20.305 --> 00:15:26.910 and in tens 72 is H, and 73 is I, well what are we really representing here? 00:15:26.910 --> 00:15:29.301 It would seem we're representing the pattern H I, 00:15:29.301 --> 00:15:31.300 and then you wouldn't know this from that chart, 00:15:31.300 --> 00:15:33.091 but if you look at another online reference 00:15:33.091 --> 00:15:35.390 you'll see a 33 is an exclamation point. 00:15:35.390 --> 00:15:38.980 So now we have the word Hi or the exclamation Hi. 00:15:38.980 --> 00:15:42.700 But that's in the context of a text editor, or word processor like word. 00:15:42.700 --> 00:15:45.460 Suppose, instead, you were using not a text 00:15:45.460 --> 00:15:51.220 editor, but something like Photoshop, or MS Paint, 00:15:51.220 --> 00:15:53.290 or some other graphical program. 00:15:53.290 --> 00:15:57.520 Well instead, you might have that same pattern of numbers, 00:15:57.520 --> 00:16:01.300 or really bits at the end of the day, but in this context, 00:16:01.300 --> 00:16:04.540 something like Photoshop, these numbers are 00:16:04.540 --> 00:16:08.980 meant to be values between 0 and 255. 00:16:08.980 --> 00:16:12.937 Turns out, long story short, that if you have eight zeros or ones, 00:16:12.937 --> 00:16:14.770 where zero or one, you know what, let's just 00:16:14.770 --> 00:16:19.030 start calling them by their formal name bits for binary digits. 00:16:19.030 --> 00:16:24.190 If you have eight bits, you can count from 0 all the way up to 255. 00:16:24.190 --> 00:16:27.160 And the quick math there is 2 to the 8, means 00:16:27.160 --> 00:16:30.220 you have 256 possible permutations of zeros and ones. 00:16:30.220 --> 00:16:33.880 So therefore, if you start counting at zero you can count as high up as 255. 00:16:33.880 --> 00:16:40.000 And so in the context of a graphics program, you can think of 72 in so far 00:16:40.000 --> 00:16:46.210 as it's not quite halfway between zero and 255, it's a medium amount of red 00:16:46.210 --> 00:16:47.950 and a medium amount of green. 00:16:47.950 --> 00:16:49.810 Not too much blue. 00:16:49.810 --> 00:16:58.090 Where zero means none of this color, and 255 means a lot of this color. 00:16:58.090 --> 00:17:03.490 So if you had a pattern of bits, and in-turn numbers, stored inside 00:17:03.490 --> 00:17:06.525 of your computer for the purposes of a graphics program, 00:17:06.525 --> 00:17:09.400 it's really like telling the computer give me a medium amount of red, 00:17:09.400 --> 00:17:12.910 give me a minimum amount of green and just a little bit of blue, 00:17:12.910 --> 00:17:19.390 and combine those like paint, or like frequencies of light., 00:17:19.390 --> 00:17:21.609 until you get the summation of those, really, 00:17:21.609 --> 00:17:23.770 are the combination really of those, which 00:17:23.770 --> 00:17:26.500 is this murky shade here of yellow. 00:17:26.500 --> 00:17:29.200 So again, using the same patterns of bits, 00:17:29.200 --> 00:17:32.080 can we represent either letters and words and paragraphs, 00:17:32.080 --> 00:17:35.410 or in another context all together, could that same pattern of bits 00:17:35.410 --> 00:17:40.180 still represent numbers, but be interpreted not as letters and words 00:17:40.180 --> 00:17:42.190 and paragraphs, but as colors. 00:17:42.190 --> 00:17:47.290 If you treat each trio of 8 bits as representing some amount of red, 00:17:47.290 --> 00:17:49.240 some amount of green, some amount of blue, 00:17:49.240 --> 00:17:55.320 otherwise known as, if you've heard the term, RGB, for red, green, blue. 00:17:55.320 --> 00:17:59.100 So this principle of starting simple, and gradually making 00:17:59.100 --> 00:18:02.270 things more and more and more complicated, 00:18:02.270 --> 00:18:04.354 is really a principle of abstraction. 00:18:04.354 --> 00:18:06.270 Because at this point in the story, when we're 00:18:06.270 --> 00:18:10.020 talking about words and paragraphs and essays and documents, 00:18:10.020 --> 00:18:13.230 or in another context we're talking about images, or maybe even 00:18:13.230 --> 00:18:17.430 movies and more, we no longer really need to care about, 00:18:17.430 --> 00:18:19.470 or even need to know about, or understand, 00:18:19.470 --> 00:18:21.570 what's underneath the hood those zeros and ones, 00:18:21.570 --> 00:18:24.960 because we've abstracted away from that lower level. 00:18:24.960 --> 00:18:28.410 And this principle of abstraction, layering idea on idea 00:18:28.410 --> 00:18:30.780 on idea on idea such that you no longer need 00:18:30.780 --> 00:18:33.750 to worry about how the lower level ideas are implemented 00:18:33.750 --> 00:18:36.540 is nice, because it allows us humans to focus only 00:18:36.540 --> 00:18:38.670 on the problems we really care about, which 00:18:38.670 --> 00:18:41.220 in theory are those top most problems. 00:18:41.220 --> 00:18:44.640 The ones that are immediately at hand, that we're building solutions to, 00:18:44.640 --> 00:18:47.820 on the shoulders of, computer scientists, and engineers, 00:18:47.820 --> 00:18:50.580 and just colleagues that have come before us. 00:18:50.580 --> 00:18:52.510 And we don't have to get lost in the weeds, 00:18:52.510 --> 00:18:54.870 so to speak, of the earlier complexity. 00:18:54.870 --> 00:18:57.210 And so this is a powerful problem solving technique, 00:18:57.210 --> 00:18:59.910 and indeed it's a principle that we'll see applied ultimately 00:18:59.910 --> 00:19:04.070 in the world of programming as well. 00:19:04.070 --> 00:19:05.500 Now let's try something. 00:19:05.500 --> 00:19:08.140 Speaking of abstractions, let me encourage you 00:19:08.140 --> 00:19:11.354 at this point to take out a piece of paper if you have one there. 00:19:11.354 --> 00:19:13.270 And surely, if you don't have one right there, 00:19:13.270 --> 00:19:17.380 surely you could pause this video and actually go dig one up. 00:19:17.380 --> 00:19:18.950 Grab a pencil too, or pen. 00:19:21.640 --> 00:19:24.620 Yeah, Ill wait. 00:19:24.620 --> 00:19:28.460 Now you could just pause this, I can't wait all that long. 00:19:28.460 --> 00:19:32.090 Let's assume at this point in the story you've got that piece of paper, 00:19:32.090 --> 00:19:34.760 and a pencil or a pen, and let's play a little game. 00:19:34.760 --> 00:19:37.446 I of course can't really see what you're doing. 00:19:37.446 --> 00:19:40.070 But I'm going to hope that you either do or don't do what I do, 00:19:40.070 --> 00:19:42.200 because either way it will be instructive. 00:19:42.200 --> 00:19:44.420 And I'm going to leverage some abstractions here, 00:19:44.420 --> 00:19:45.740 for better or for worse. 00:19:45.740 --> 00:19:49.130 I want you to go ahead, and please don't be one of those people like me 00:19:49.130 --> 00:19:50.890 who's just following along pretending like all right, 00:19:50.890 --> 00:19:54.056 I have the piece of paper, I have the pencil, this is what I would be doing. 00:19:54.056 --> 00:19:55.170 Actually do this. 00:19:55.170 --> 00:19:59.000 This will be kind of fun for one of us in the end, if this works out. 00:19:59.000 --> 00:20:03.050 Go ahead now, on that piece of paper, with your pen or pencil, and go ahead 00:20:03.050 --> 00:20:05.040 and just draw a circle. 00:20:07.810 --> 00:20:08.710 OK. 00:20:08.710 --> 00:20:09.210 All right. 00:20:09.210 --> 00:20:13.890 And below that circle draw a square. 00:20:13.890 --> 00:20:15.100 All set. 00:20:15.100 --> 00:20:18.025 All right, and below that square draw a triangle. 00:20:21.360 --> 00:20:25.460 Now odds are, you know exactly what I meant when I said draw a circle, 00:20:25.460 --> 00:20:27.690 and perhaps you did a little something like this. 00:20:27.690 --> 00:20:29.700 Or a little more perfect than my circle there. 00:20:29.700 --> 00:20:31.499 And then beneath that I said draw a square. 00:20:31.499 --> 00:20:33.290 And you just intuitively know what a square 00:20:33.290 --> 00:20:36.920 is, so you might have drawn a square like this. 00:20:36.920 --> 00:20:39.710 And then the third thing I said was draw a triangle. 00:20:39.710 --> 00:20:44.120 And so you might have drawn a triangle that looks like this. 00:20:44.120 --> 00:20:47.490 But immediately, ant that looks curiously like part of a jack-o-lantern 00:20:47.490 --> 00:20:47.990 now. 00:20:47.990 --> 00:20:50.917 But curiously, there's some ambiguity there. 00:20:50.917 --> 00:20:53.750 Right, a circle is kind of a circle, but I didn't specify the radius 00:20:53.750 --> 00:20:56.270 or diameter, so maybe yours is bigger, maybe yours a smaller, 00:20:56.270 --> 00:20:59.270 maybe it's over here on your paper, maybe it's over there on your paper. 00:20:59.270 --> 00:21:02.540 So already, these abstractions are useful in that you immediately 00:21:02.540 --> 00:21:06.050 knew what to draw, but you didn't really know how to draw it. 00:21:06.050 --> 00:21:09.560 Similarly, this square, I don't know why I drew it a little smaller here, 00:21:09.560 --> 00:21:11.960 but it's indeed smaller, and it's barely a square. 00:21:11.960 --> 00:21:13.250 But it's meant to be a square. 00:21:13.250 --> 00:21:16.540 But there's a gap between it and the circle, but I didn't really specify. 00:21:16.540 --> 00:21:18.290 So there, too, the abstraction is valuable 00:21:18.290 --> 00:21:22.760 in that you immediately knew how to draw a square, but not where to draw it. 00:21:22.760 --> 00:21:24.890 Or in what size to draw it. 00:21:24.890 --> 00:21:27.140 And lastly, the triangle, perhaps the most, 00:21:27.140 --> 00:21:29.510 the juiciest opportunity for ambiguity, I 00:21:29.510 --> 00:21:31.350 didn't tell you how to orient that triangle. 00:21:31.350 --> 00:21:34.400 Maybe you, instead, did something a little different, 00:21:34.400 --> 00:21:37.130 where your triangle wasn't drawn like that, 00:21:37.130 --> 00:21:39.140 with the pointy part at the bottom. 00:21:39.140 --> 00:21:42.380 Maybe you, instead, did something like this. 00:21:42.380 --> 00:21:46.310 Or maybe it was somewhere else on the paper altogether. 00:21:46.310 --> 00:21:50.450 So now at this point in the story, you could have followed my instructions 00:21:50.450 --> 00:21:54.230 exactly as I described, but we could still somehow come up 00:21:54.230 --> 00:21:55.550 with different solutions. 00:21:55.550 --> 00:21:59.660 And in fact, if I can now spoil what the actual task at hand was, 00:21:59.660 --> 00:22:02.000 this was the picture I was describing. 00:22:02.000 --> 00:22:04.220 This picture here has a circle, below which 00:22:04.220 --> 00:22:06.400 is a square, below which is a triangle. 00:22:06.400 --> 00:22:08.750 But that leaves out some key details. 00:22:08.750 --> 00:22:12.500 And the curious thing here, is that even though abstraction is a useful 00:22:12.500 --> 00:22:16.730 mechanism, once you start to move away from those implementation details, 00:22:16.730 --> 00:22:21.140 if you will, you very quickly realize that I don't really know what 00:22:21.140 --> 00:22:23.390 you're telling me to do, necessarily. 00:22:23.390 --> 00:22:28.430 And the challenge is, that computers, as complicated or intimidating as they 00:22:28.430 --> 00:22:30.960 might seem to you, they're really not all that bright. 00:22:30.960 --> 00:22:31.460 Right. 00:22:31.460 --> 00:22:34.640 They can only do what they are explicitly told to do. 00:22:34.640 --> 00:22:38.000 And so if you, the human, or you eventually perhaps, 00:22:38.000 --> 00:22:42.410 the programmer, don't actually specify absolutely 00:22:42.410 --> 00:22:45.420 precisely what you want the computer-- 00:22:45.420 --> 00:22:48.620 or the human in this case-- to do, you might not 00:22:48.620 --> 00:22:53.630 get the output, or the correctness of a solution, that you're hoping for. 00:22:53.630 --> 00:22:57.080 In fact, let's try one other, and let's see 00:22:57.080 --> 00:22:58.761 what the other extreme might feel like. 00:22:58.761 --> 00:23:01.010 So on that same piece of paper, maybe on the flip side 00:23:01.010 --> 00:23:05.569 or another sheet of paper, let's play this game just one more time. 00:23:05.569 --> 00:23:07.360 And this one's going to be a little harder. 00:23:07.360 --> 00:23:11.170 I'm going to try to tell you exactly what to do. 00:23:11.170 --> 00:23:13.940 So lesson learned, that was too abstract. 00:23:13.940 --> 00:23:16.070 Let's now drill down on the implementation details. 00:23:16.070 --> 00:23:16.570 OK. 00:23:16.570 --> 00:23:20.650 So let me think like a computer might, or I think a computer might, 00:23:20.650 --> 00:23:24.400 go ahead and put your pen or pencil down on the piece of paper 00:23:24.400 --> 00:23:27.320 toward the top middle of the page. 00:23:27.320 --> 00:23:37.020 Now from that point, move say Southwest to halfway down the piece of paper. 00:23:37.020 --> 00:23:40.270 So really at a 45 degree downward angle. 00:23:40.270 --> 00:23:47.350 And then go ahead and move south east, or a different 45 degree angle, 00:23:47.350 --> 00:23:49.580 toward the bottom of the page. 00:23:49.580 --> 00:23:52.950 So you've kind of sort of made the left half of a triangle, 00:23:52.950 --> 00:23:55.690 or not really that, 2/3 of a triangle, two sides of a triangle. 00:23:55.690 --> 00:23:57.340 But stay where you are. 00:23:57.340 --> 00:24:03.580 At this point in the story your pen is probably below your original dot. 00:24:03.580 --> 00:24:07.150 But you've drawn two lines that form an angle. 00:24:07.150 --> 00:24:13.370 From where your pen now is, draw a line Northeast, 00:24:13.370 --> 00:24:15.950 or in an upward and to the right 45 degree 00:24:15.950 --> 00:24:19.490 angle, to the same height as your previous line. 00:24:19.490 --> 00:24:25.490 And then double back and go say Northwest, or a different 45 degree 00:24:25.490 --> 00:24:28.570 angle, still back to the original point. 00:24:28.570 --> 00:24:33.340 And at this point in the story, you have really either nothing at all, 00:24:33.340 --> 00:24:35.020 or you have a diamond. 00:24:35.020 --> 00:24:37.240 And therein lies a curiosity too. 00:24:37.240 --> 00:24:40.492 I could have just said diamond, draw a diamond, or draw a kite. 00:24:40.492 --> 00:24:41.950 Which would be an equivalent shape. 00:24:41.950 --> 00:24:44.950 But that too lends itself to same ambiguity, so I drill down deeper. 00:24:44.950 --> 00:24:48.290 But my God, it's taken us just a minute or more just to get to this point. 00:24:48.290 --> 00:24:49.417 And we're not done yet. 00:24:49.417 --> 00:24:51.000 We're not drawing a diamond or a kite. 00:24:51.000 --> 00:24:58.520 Now you have a diamond or a kite, with a top vertex or point, a bottom one, 00:24:58.520 --> 00:24:59.540 a left and a right. 00:24:59.540 --> 00:25:04.040 Move your pen to the left one, and draw a vertical line down. 00:25:04.040 --> 00:25:09.020 Now go back to the diamond or the kite, and on the right vertex or point, 00:25:09.020 --> 00:25:13.350 draw similarly a vertical line down. 00:25:13.350 --> 00:25:19.170 And now, in that original diameter kite, at the bottom most vertex or point, 00:25:19.170 --> 00:25:21.930 draw a vertical line down. 00:25:21.930 --> 00:25:25.740 And now, if you followed along with these very precise machine 00:25:25.740 --> 00:25:30.212 instructions, you've got three vertical lines that are just kind of dangling. 00:25:30.212 --> 00:25:33.420 I don't even want to mislead you with any hand gestures, three vertical lines 00:25:33.420 --> 00:25:37.110 that are dangling, go ahead and draw two lines that 00:25:37.110 --> 00:25:42.070 connect the ends of those three lines. 00:25:42.070 --> 00:25:42.920 Oh my God. 00:25:42.920 --> 00:25:46.870 Like it's so complex to do what we just did, and I would put money on the fact 00:25:46.870 --> 00:25:50.410 that you did not draw correctly, what I was trying to get you to draw, 00:25:50.410 --> 00:25:52.030 which is just a cube. 00:25:52.030 --> 00:25:56.170 Right, a cube is a wonderfully simple abstraction. 00:25:56.170 --> 00:25:59.190 A shape with which you and I are probably long familiar, 00:25:59.190 --> 00:26:01.876 and it's so easy to say draw a cube. 00:26:01.876 --> 00:26:04.750 But as we saw before with the circle and the square and the triangle, 00:26:04.750 --> 00:26:06.670 just saying draw a cube is ambiguous. 00:26:06.670 --> 00:26:08.170 At what angle should it be oriented? 00:26:08.170 --> 00:26:09.220 How big should it be? 00:26:09.220 --> 00:26:10.934 Where on the piece of paper should it be? 00:26:10.934 --> 00:26:12.850 And so I was trying to be so much more precise 00:26:12.850 --> 00:26:15.800 this time by having you put your pen down at the top, 00:26:15.800 --> 00:26:19.120 go down to the southwest and draw this line, 00:26:19.120 --> 00:26:25.810 then go down to the southernmost point here, then another Northeasternly line, 00:26:25.810 --> 00:26:28.300 and then a north westerly line. 00:26:28.300 --> 00:26:30.580 And that just got us to the kite. 00:26:30.580 --> 00:26:33.700 But the takeaway here, is that when it comes 00:26:33.700 --> 00:26:36.640 to making a computer do what you want to do, 00:26:36.640 --> 00:26:39.130 you can't just speak these abstractions. 00:26:39.130 --> 00:26:43.540 You actually have to implement them, or program them, or code them 00:26:43.540 --> 00:26:44.830 at least once. 00:26:44.830 --> 00:26:48.490 In fact, some of the earliest graphical programs in the world of computing, 00:26:48.490 --> 00:26:50.530 were kind of as low level as this. 00:26:50.530 --> 00:26:52.890 There was an old programming language called logo, 00:26:52.890 --> 00:26:56.625 in fact, that allowed you to program by moving a cursor, 00:26:56.625 --> 00:26:59.500 like a turtle of sorts, up and down and left and right on the screen. 00:26:59.500 --> 00:27:01.812 And putting either down or up a marker of sorts, 00:27:01.812 --> 00:27:05.020 and that you could draw shapes like this by just moving around on the screen. 00:27:05.020 --> 00:27:08.560 But to draw things like this clearly, as in our verbal example here, 00:27:08.560 --> 00:27:10.930 you have to be so darn precise. 00:27:10.930 --> 00:27:13.090 And it just gets so tedious so quickly. 00:27:13.090 --> 00:27:16.270 It certainly would take all of the fun out of using a computer, 00:27:16.270 --> 00:27:19.540 or all the fun out of using programming a computer, if you 00:27:19.540 --> 00:27:21.340 had to do this every time. 00:27:21.340 --> 00:27:24.100 But that's where there's an ingredient here to be leveraged. 00:27:24.100 --> 00:27:27.370 One of the things that a computer scientist, and a programmer, 00:27:27.370 --> 00:27:30.610 and engineers, more generally, very often do, 00:27:30.610 --> 00:27:34.270 is they absolutely implement these kinds of low level details once. 00:27:34.270 --> 00:27:38.110 They go through those very methodical, if mundane, steps 00:27:38.110 --> 00:27:40.330 of getting something just right. 00:27:40.330 --> 00:27:43.300 And then they save the instructions they wrote. 00:27:43.300 --> 00:27:46.460 They save the programs they wrote, if you will, 00:27:46.460 --> 00:27:48.332 so that they can reuse them later. 00:27:48.332 --> 00:27:50.665 And the fancy words for these things will eventually see 00:27:50.665 --> 00:27:52.330 are called libraries. 00:27:52.330 --> 00:27:53.500 Or functions. 00:27:53.500 --> 00:27:56.470 Or other names still. 00:27:56.470 --> 00:27:59.980 So once one human in the world has implemented a program, if you will, 00:27:59.980 --> 00:28:03.280 with which to draw a cube, similarly can we stand on his or her shoulders 00:28:03.280 --> 00:28:05.140 and reuse that same routine. 00:28:05.140 --> 00:28:09.090 And hopefully, they were clever enough to allow us to parameterize it. 00:28:09.090 --> 00:28:13.210 To customize it, by maybe changing the angle and the size and the depth 00:28:13.210 --> 00:28:14.090 and so forth. 00:28:14.090 --> 00:28:17.260 So it doesn't just have to be that one cube. 00:28:17.260 --> 00:28:21.820 And so here we have a wonderfully powerful problem solving technique. 00:28:21.820 --> 00:28:22.730 Abstraction. 00:28:22.730 --> 00:28:26.560 Which allows us to say what we mean, and the rest of the humans in the room 00:28:26.560 --> 00:28:30.250 just immediately understand-- at least after some instruction-- what 00:28:30.250 --> 00:28:31.600 it is we're talking about. 00:28:31.600 --> 00:28:35.560 But with computers being these very little literal devices, 00:28:35.560 --> 00:28:37.930 we can only talk at those levels of abstraction 00:28:37.930 --> 00:28:42.970 once we've actually built up software, implemented solutions to get us 00:28:42.970 --> 00:28:44.950 to that point in the conversation. 00:28:44.950 --> 00:28:48.670 And this is why, at first glance, using a phone in your pocket 00:28:48.670 --> 00:28:52.442 or a computer on your desk might actually seem super, super complicated. 00:28:52.442 --> 00:28:53.650 There's so many moving parts. 00:28:53.650 --> 00:28:55.270 And absolutely there are. 00:28:55.270 --> 00:28:58.420 Windows and Mac OS are literally the result of millions 00:28:58.420 --> 00:29:02.550 of lines of programming code these days, having been written over the years. 00:29:02.550 --> 00:29:05.530 And so of course it's to be expected that you might be a little daunted 00:29:05.530 --> 00:29:08.350 or overwhelmed by the apparent complexity. 00:29:08.350 --> 00:29:10.750 But one of the goals here for this lesson, 00:29:10.750 --> 00:29:14.740 is to really help you appreciate that beneath all that complexity, 00:29:14.740 --> 00:29:16.180 is a simpler idea. 00:29:16.180 --> 00:29:17.770 And then an even simpler idea. 00:29:17.770 --> 00:29:20.290 And then a very simple idea, and so forth. 00:29:20.290 --> 00:29:24.100 And so once you sort of bottom out and understand those first principles, 00:29:24.100 --> 00:29:28.330 zeros and ones, binary on top of which might be Ascii or Unicode, on top 00:29:28.330 --> 00:29:31.030 of which might be some other encoding still, 00:29:31.030 --> 00:29:35.370 can you resume the current conversation and understand 00:29:35.370 --> 00:29:38.770 that what might have looked completely complex at first glance, 00:29:38.770 --> 00:29:41.440 is really just the result of assembling, if you will, 00:29:41.440 --> 00:29:45.970 a whole bunch of pretty simple puzzle pieces. 00:29:45.970 --> 00:29:51.460 So at this point in the story, we now have a way of representing information. 00:29:51.460 --> 00:29:53.080 But now let's just stipulate. 00:29:53.080 --> 00:29:54.836 We know how to represent information. 00:29:54.836 --> 00:29:57.586 At the end of the day, Yeah it's binary, And at the end of the day 00:29:57.586 --> 00:30:01.720 you can think about it as decimal, and maybe you're using Ascii in Unicode, 00:30:01.720 --> 00:30:04.030 or maybe you're using graphics, or whatever 00:30:04.030 --> 00:30:05.680 is going on underneath the hood. 00:30:05.680 --> 00:30:09.310 All we need to know and care about now is that you can do it. 00:30:09.310 --> 00:30:12.550 And we don't really have to think too much more at that level. 00:30:12.550 --> 00:30:17.200 Now we can resume our look at the overarching model 00:30:17.200 --> 00:30:19.360 at hand, which is problem solving. 00:30:19.360 --> 00:30:23.780 We now have a way to represent our inputs and outputs. 00:30:23.780 --> 00:30:26.290 What then is inside that black box? 00:30:26.290 --> 00:30:28.660 Well the buzzword du jour these days is perhaps 00:30:28.660 --> 00:30:32.050 algorithms, where even if you don't necessarily know what it is 00:30:32.050 --> 00:30:36.040 or how to make one, or how to use one in the context of a computer program, 00:30:36.040 --> 00:30:39.620 algorithms seem to be increasingly the solution to all of our problems. 00:30:39.620 --> 00:30:42.370 Well an algorithm isn't all that complex, fancy 00:30:42.370 --> 00:30:45.100 though the word might sound, it's really just step 00:30:45.100 --> 00:30:48.160 by step instructions for solving a problem. 00:30:48.160 --> 00:30:51.340 And those steps can be in English, or they can be in something we'll call 00:30:51.340 --> 00:30:54.790 pseudo code, sort of code like but it's not really an actual language, 00:30:54.790 --> 00:30:58.600 or can be in Java, or C, or c++, or JavaScript, 00:30:58.600 --> 00:31:00.640 or any number of other programming languages. 00:31:00.640 --> 00:31:05.560 Algorithms are, again, just sets of instructions for solving a problem. 00:31:05.560 --> 00:31:08.610 Well what's one such problem we might want to solve? 00:31:08.610 --> 00:31:11.380 Well in the real world we might have-- 00:31:11.380 --> 00:31:13.390 the real old world shall we say-- 00:31:13.390 --> 00:31:16.960 we might have a problem that once upon a time looked like this. 00:31:16.960 --> 00:31:20.970 A phone book with a whole lot of pieces of paper inside of it, 00:31:20.970 --> 00:31:24.760 on which were a whole bunch of names and a whole bunch of telephone numbers. 00:31:24.760 --> 00:31:28.482 And the phone book was alphabetized from A to Z typically. 00:31:28.482 --> 00:31:31.690 And maybe there were some other sections like the yellow pages, or apparently 00:31:31.690 --> 00:31:33.550 the red pages, whatever this is here. 00:31:33.550 --> 00:31:35.758 But we'll just assume that these are the white pages, 00:31:35.758 --> 00:31:38.470 so to speak, with just a whole bunch of humans names and numbers. 00:31:38.470 --> 00:31:40.540 So suppose I want to find one such human, 00:31:40.540 --> 00:31:43.270 a human whom we'll call Mike Smith. 00:31:43.270 --> 00:31:45.970 How do you go about finding Mike Smith in this phone book? 00:31:45.970 --> 00:31:51.970 Well I could, somewhat stupidly, but arguably quite correctly, 00:31:51.970 --> 00:31:54.130 start at the beginning of the book. 00:31:54.130 --> 00:31:56.890 And I see here instructions for calling 911. 00:31:56.890 --> 00:32:00.220 If I move on to page two, I now see someone's names and numbers. 00:32:00.220 --> 00:32:02.320 But these people's names all start with A. 00:32:02.320 --> 00:32:04.660 And so I continue going through the A section. 00:32:04.660 --> 00:32:05.740 And the A section. 00:32:05.740 --> 00:32:08.380 And then eventually I get to the B section. 00:32:08.380 --> 00:32:09.640 And the B section. 00:32:09.640 --> 00:32:10.780 And the B section. 00:32:10.780 --> 00:32:14.410 And then the Cs, and the Ds and the Es and Fs and so forth. 00:32:14.410 --> 00:32:20.050 And eventually, tediously but correctly, I'll get to the Ss in the phone book. 00:32:20.050 --> 00:32:24.580 And if I see Mike Smith here, I can now pick up the phone and call Mike Smith. 00:32:24.580 --> 00:32:28.330 Now no one out there, if you've been still use this technology, 00:32:28.330 --> 00:32:30.550 is going to have looked up Mike Smith in that way. 00:32:30.550 --> 00:32:33.010 You're going to fly through this phone book far faster than I, right, 00:32:33.010 --> 00:32:35.470 you learned probably in grade school why count by ones 00:32:35.470 --> 00:32:36.880 when you can count by twos? 00:32:36.880 --> 00:32:41.230 So two, four, six, eight, ten, twelve. 00:32:41.230 --> 00:32:43.660 Sounds faster, is faster. 00:32:43.660 --> 00:32:47.680 It's going to get me to Mike faster, but is it going to get to me to Mike 00:32:47.680 --> 00:32:49.650 correctly? 00:32:49.650 --> 00:32:52.810 I'm going twice as fast by doing two pages at a time. 00:32:52.810 --> 00:32:55.510 So I'm going to have flip half as many times. 00:32:55.510 --> 00:33:01.770 But there's actually a bug, so to speak, a potential mistake here. 00:33:01.770 --> 00:33:03.700 What is that? 00:33:03.700 --> 00:33:06.720 Well in my cleverness to get to Mike's name 00:33:06.720 --> 00:33:10.230 twice as fast, what if I go ever so slightly too 00:33:10.230 --> 00:33:16.620 far, just because by bad luck, Mike is sandwiched between two of the pages 00:33:16.620 --> 00:33:20.220 that I so cleverly was skimming two at a time? 00:33:20.220 --> 00:33:22.490 Of course I'm looking at this side, maybe this side, 00:33:22.490 --> 00:33:24.370 but maybe Mike is in the middle. 00:33:24.370 --> 00:33:28.770 And so it turns out with that algorithm, that twosie approach, 00:33:28.770 --> 00:33:32.070 am I going to have to have a little bit of a check at the end? 00:33:32.070 --> 00:33:35.769 Such that if I hit like the T section, and see a name starting with T, 00:33:35.769 --> 00:33:37.560 I better wait a minute, let me double back, 00:33:37.560 --> 00:33:40.890 I might need to go back at most one page to make sure 00:33:40.890 --> 00:33:42.810 that I didn't actually miss Mike Smith. 00:33:42.810 --> 00:33:45.030 So at the end of the day, it's not correct 00:33:45.030 --> 00:33:46.860 if you naively just do two pages at a time, 00:33:46.860 --> 00:33:49.410 but it is correct if you do two pages at a time, 00:33:49.410 --> 00:33:53.580 with one final reversal by a page, just to make sure 00:33:53.580 --> 00:33:57.360 once you go past SMITH in the phone book, 00:33:57.360 --> 00:34:00.210 that you didn't accidentally miss Mike just one page prior. 00:34:00.210 --> 00:34:03.080 So it's still super fast, you just need that one little check. 00:34:03.080 --> 00:34:08.250 But still, no one out there, is going to look up Mike Smith one page at 00:34:08.250 --> 00:34:09.530 a time, two pages at a time. 00:34:09.530 --> 00:34:12.170 You're going to open in the middle of the damn phone book, 00:34:12.170 --> 00:34:17.070 look down and see, oh, not in the S section yet, I'm in the M section. 00:34:17.070 --> 00:34:21.270 And so what you intuitively know how to do since growing up perhaps, 00:34:21.270 --> 00:34:24.449 is that you know Mike is not in this half of the phone book. 00:34:24.449 --> 00:34:26.940 Mike is clearly in this half of the phone book. 00:34:26.940 --> 00:34:29.800 And so at this point you can figuratively-- but in our case 00:34:29.800 --> 00:34:32.610 literally-- tear the problem in half. 00:34:32.610 --> 00:34:36.739 Throw half of the problem away, and be left with, 00:34:36.739 --> 00:34:42.489 we single use phone book, and half at that, but half the size of the problem. 00:34:42.489 --> 00:34:46.350 So if we had 1,000 pages originally, now maybe I've got only 500 pages. 00:34:46.350 --> 00:34:48.030 And so I can repeat this intuition. 00:34:48.030 --> 00:34:49.580 Jump roughly to the middle. 00:34:49.580 --> 00:34:52.500 Darn, I'm a little too far now, I'm in the T section. 00:34:52.500 --> 00:34:57.420 Again, let's tear half the problem away, throw it away as well, 00:34:57.420 --> 00:35:02.880 and now be left with a 250 page problem, which can now be 125 page problem. 00:35:02.880 --> 00:35:06.520 Which is getting easier and easier and easier, until we repeat, 00:35:06.520 --> 00:35:08.160 repeat, repeat, repeat. 00:35:08.160 --> 00:35:12.600 Until theoretically, we're left with just one page in the end. 00:35:12.600 --> 00:35:15.750 Maybe Mike's on it, maybe Mike's not, but if he's in the phone book, 00:35:15.750 --> 00:35:18.160 he will be on this page. 00:35:18.160 --> 00:35:19.950 So how efficient was that. 00:35:19.950 --> 00:35:25.150 Well if that phone book had 1,000 pages, in my first algorithm 00:35:25.150 --> 00:35:28.900 it might have taken me what, like 700 plus steps to get to Mike? 00:35:28.900 --> 00:35:31.590 Or in the worst case 1,000 steps, right, it's alphabetical. 00:35:31.590 --> 00:35:33.150 But maybe it's not Smith, maybe it's someone 00:35:33.150 --> 00:35:36.316 with the last name that starts with Z. In the worst case in that first phone 00:35:36.316 --> 00:35:39.840 book, maybe it would take me 1,000 steps maximally to find Mike Smith. 00:35:39.840 --> 00:35:40.860 Pretty slow. 00:35:40.860 --> 00:35:44.320 What about that second algorithm where I was going two pages at a time? 00:35:44.320 --> 00:35:47.790 Well, with that algorithm, it's going to take me like 500 steps 00:35:47.790 --> 00:35:49.650 maximally to find Mike Smith. 00:35:49.650 --> 00:35:51.750 That's twice as fast, that's pretty good. 00:35:51.750 --> 00:35:56.790 It's not nearly as amazing as the algorithm we settled on. 00:35:56.790 --> 00:36:00.510 The intuitive one, arguably, where I divided and conquered. 00:36:00.510 --> 00:36:03.030 I have the problem again and again and again. 00:36:03.030 --> 00:36:09.150 Because if I start at 1,000, and I go to 500, then 250, then 125 and so forth, 00:36:09.150 --> 00:36:14.730 rounding as needed, I actually get to one page much faster. 00:36:14.730 --> 00:36:19.020 Put another way, how many times can you divide 1,000 in half 00:36:19.020 --> 00:36:22.537 before you're left with just the number one? 00:36:22.537 --> 00:36:25.120 Well if you do the math, either in one direction or the other, 00:36:25.120 --> 00:36:27.810 you'll see that it's roughly ten. 00:36:27.810 --> 00:36:31.150 In fact, if you want to pause even and grab a calculator, or a piece of paper, 00:36:31.150 --> 00:36:34.233 or pencil, or just think through it in your head, you can start with 1,000 00:36:34.233 --> 00:36:36.660 and go to 500, 125, and so forth. 00:36:36.660 --> 00:36:39.630 And you'll eventually hit one, after just 10-- 00:36:39.630 --> 00:36:41.610 give or take, depending on how you round-- 00:36:41.610 --> 00:36:42.900 steps. 00:36:42.900 --> 00:36:44.430 That's pretty powerful. 00:36:44.430 --> 00:36:45.740 But not that big of a deal. 00:36:45.740 --> 00:36:46.240 Right. 00:36:46.240 --> 00:36:48.360 Ten still is like a lot of page turns. 00:36:48.360 --> 00:36:51.240 But what about an even bigger problem. 00:36:51.240 --> 00:36:54.780 The types of problems that Google, and Microsoft, and Facebook, 00:36:54.780 --> 00:36:57.570 and Oracle, and really big companies deal with that 00:36:57.570 --> 00:36:58.800 have lots and lots of data. 00:36:58.800 --> 00:37:02.310 Suppose, for instance, that I'm searching through not a phone 00:37:02.310 --> 00:37:04.170 book, but a database. 00:37:04.170 --> 00:37:06.300 A big program that stores lots and lots of data. 00:37:06.300 --> 00:37:09.780 And suppose the data that's being stored is still names and numbers. 00:37:09.780 --> 00:37:13.440 How much time might it take to find someone like Mike Smith, 00:37:13.440 --> 00:37:18.510 in a database that's got like 4 billion names and numbers in it somehow? 00:37:18.510 --> 00:37:20.400 Well, four billion names and numbers. 00:37:20.400 --> 00:37:22.251 Well if we use that first algorithm, might 00:37:22.251 --> 00:37:24.000 take as many as four billion steps to find 00:37:24.000 --> 00:37:25.560 Mike Smith in a really big database. 00:37:25.560 --> 00:37:28.350 In a really big computer program, that's not too smart. 00:37:28.350 --> 00:37:30.480 But if I instead use the twosie approach, 00:37:30.480 --> 00:37:33.870 flipping through two database records at once, 00:37:33.870 --> 00:37:37.730 maybe it's not 4 billion operations, maybe it's just 2 billion. 00:37:37.730 --> 00:37:39.000 That's good. 00:37:39.000 --> 00:37:40.450 That's half as many operations. 00:37:40.450 --> 00:37:43.740 But what if I use this super clever intuition 00:37:43.740 --> 00:37:47.160 that I kind of grew up with here, with that divide and conquer algorithm. 00:37:47.160 --> 00:37:50.310 Well, I can start with 4 billion database records, go to 2 billion, 00:37:50.310 --> 00:37:56.190 then 1 billion, then 500 million, 250 million, 125 million, and so forth. 00:37:56.190 --> 00:37:58.690 I'm getting to one much, much faster. 00:37:58.690 --> 00:38:03.270 In fact, I can only divide the number 4 billion 00:38:03.270 --> 00:38:07.020 in half, roughly 32 times total. 00:38:07.020 --> 00:38:11.260 Again, depending kind of sort of on how you round, but 32 times. 00:38:11.260 --> 00:38:15.810 32 is so much smaller than 2 billion steps. 00:38:15.810 --> 00:38:18.675 And 32 is certainly smaller than the original starting point, 00:38:18.675 --> 00:38:20.010 4 billion steps. 00:38:20.010 --> 00:38:23.160 So this is a really powerful problem solving technique 00:38:23.160 --> 00:38:24.480 to divide and conquer. 00:38:24.480 --> 00:38:28.590 And here, too, even though what computers might seem to be doing these 00:38:28.590 --> 00:38:31.470 days is super complicated and sophisticated-- 00:38:31.470 --> 00:38:33.120 and it is in many ways-- 00:38:33.120 --> 00:38:36.180 but some of the ideas that those computers and the programmers who 00:38:36.180 --> 00:38:41.460 program them are leveraging, are actually pretty familiar to us already. 00:38:41.460 --> 00:38:46.710 Inside of this black box might not be something super fancy, but just 00:38:46.710 --> 00:38:49.890 a clever adaptation of some of your grade school 00:38:49.890 --> 00:38:56.380 human intuition to the context of a computer program. 00:38:56.380 --> 00:39:00.490 Now it's one thing to talk about algorithms, especially 00:39:00.490 --> 00:39:03.730 if we're just spit balling it verbally. 00:39:03.730 --> 00:39:07.330 But computers, of course, need us to be more precise. 00:39:07.330 --> 00:39:10.570 And they need us to state our thoughts more methodically. 00:39:10.570 --> 00:39:11.930 So what does this mean? 00:39:11.930 --> 00:39:16.330 Well, let me propose that we write some code, or really pseudocode, 00:39:16.330 --> 00:39:19.450 for this same algorithm, where we're looking for someone like Mike 00:39:19.450 --> 00:39:20.710 Smith in a phone book. 00:39:20.710 --> 00:39:23.620 And it's pseudocode because it's not going to be Java, or c++, 00:39:23.620 --> 00:39:25.120 or JavaScript, or anything else. 00:39:25.120 --> 00:39:28.990 It's going to be English like syntax, that's kind of sort of like code. 00:39:28.990 --> 00:39:31.630 And before long, we'll see some actual code as well. 00:39:31.630 --> 00:39:33.422 But step zero, and just to be playful here, 00:39:33.422 --> 00:39:36.588 I'm not going to start counting at one, I'm going to start counting at zero, 00:39:36.588 --> 00:39:39.010 just because with any number of bits, or digits, 00:39:39.010 --> 00:39:42.970 the lowest number that I could count with is of course zero. 00:39:42.970 --> 00:39:45.550 Pick up phone book is the first thing that I did. 00:39:45.550 --> 00:39:47.260 One, open to the middle of the phone book 00:39:47.260 --> 00:39:50.200 is the second thing I did in that third and final algorithm. 00:39:50.200 --> 00:39:54.100 Look at the names was the next thing I did, looking down at that phone book. 00:39:54.100 --> 00:39:56.830 And then if Smith is among names, and notice, 00:39:56.830 --> 00:40:00.000 this is semantically different from those first three steps, 00:40:00.000 --> 00:40:02.050 because this expression starts with if. 00:40:02.050 --> 00:40:04.390 So this is kind of like the proverbial fork in the road, 00:40:04.390 --> 00:40:06.940 if Smith is among the names, let's do this. 00:40:06.940 --> 00:40:08.200 What do we want to do? 00:40:08.200 --> 00:40:10.370 Call Mike, that's great. 00:40:10.370 --> 00:40:14.890 Otherwise, or else if Mike is earlier in the book, 00:40:14.890 --> 00:40:18.310 let's go a different direction instead. 00:40:18.310 --> 00:40:24.340 Let's instead open to the middle of the left half of the book, all right. 00:40:24.340 --> 00:40:27.730 So that would be the left half that I threw away earlier, 00:40:27.730 --> 00:40:29.710 and then go back to step two. 00:40:29.710 --> 00:40:34.000 Because once I have opened to the middle of the left half of the book, 00:40:34.000 --> 00:40:35.980 I don't have to actually dramatically tear it. 00:40:35.980 --> 00:40:40.390 I now need to look for Mike's name again, as per step two. 00:40:40.390 --> 00:40:43.870 Else if Mike is later in the book, I actually 00:40:43.870 --> 00:40:47.560 want to open to the middle of the right half of the book, 00:40:47.560 --> 00:40:50.880 and then go back to step two as well. 00:40:50.880 --> 00:40:54.020 Now you might think that's everything to this program. 00:40:54.020 --> 00:40:55.860 But there's actually a remaining step. 00:40:55.860 --> 00:40:59.870 Indeed I've got room left on the screen for a remaining step or two, 00:40:59.870 --> 00:41:02.540 but there are more than three possibilities. 00:41:02.540 --> 00:41:05.510 One of them is that Mike is among the names, one of them 00:41:05.510 --> 00:41:09.460 is that Mike is to the left, the other is that Mike is to the right. 00:41:09.460 --> 00:41:10.670 What's the fourth? 00:41:10.670 --> 00:41:13.610 If I don't consider the fourth, and indeed if in a program 00:41:13.610 --> 00:41:16.700 I don't implement the fourth, my program might crash. 00:41:16.700 --> 00:41:17.839 My computer might hang. 00:41:17.839 --> 00:41:19.880 My computer might behave in an unpredictable way, 00:41:19.880 --> 00:41:23.870 because if the programmer wasn't so precise as to anticipate 00:41:23.870 --> 00:41:27.590 something that might happen, who knows what the computer might do. 00:41:27.590 --> 00:41:30.440 And indeed, that's often why your own computer might 00:41:30.440 --> 00:41:32.600 have a little spinning beachball, or icon, 00:41:32.600 --> 00:41:34.510 or it might crash outright or freeze. 00:41:34.510 --> 00:41:37.100 It's because something unanticipated happened. 00:41:37.100 --> 00:41:39.090 So let's be precise. 00:41:39.090 --> 00:41:41.510 There's a fourth and final scenario I can think of, 00:41:41.510 --> 00:41:46.260 which perhaps on your mind too, just quit. 00:41:46.260 --> 00:41:49.024 Because in the fourth scenario, if Mike's not among the names, 00:41:49.024 --> 00:41:51.690 and he's not to the left, and he's not to the right in the book, 00:41:51.690 --> 00:41:53.110 he must not be there. 00:41:53.110 --> 00:41:56.400 And so let's avoid just hanging infinitely somehow or other, 00:41:56.400 --> 00:41:58.967 by actually proactively deciding to quit. 00:41:58.967 --> 00:42:01.800 But now let's tease apart what some of these terms actually are now. 00:42:01.800 --> 00:42:06.030 So in yellow are some things that really just look like verbs, or actions. 00:42:06.030 --> 00:42:09.000 And we're going to call those statements, or more specifically 00:42:09.000 --> 00:42:12.690 functions, or procedures would be a reasonable synonym too. 00:42:12.690 --> 00:42:15.030 And each of these yellow terms is really a call 00:42:15.030 --> 00:42:19.350 to action for the computer to just do something unconditionally. 00:42:19.350 --> 00:42:22.950 But sometimes, we want that computer to do something conditionally, 00:42:22.950 --> 00:42:25.920 as evinced by these yellow terms now. 00:42:25.920 --> 00:42:30.630 If, and else if, and else and else, kind of paints the picture of a four-way 00:42:30.630 --> 00:42:31.890 fork in the road. 00:42:31.890 --> 00:42:34.650 Where each of these branches, or conditions, 00:42:34.650 --> 00:42:36.480 leads us to take different action. 00:42:36.480 --> 00:42:41.160 And you'll see that I've indented lines four and six and seven and nine and ten 00:42:41.160 --> 00:42:45.510 and twelve, because they are meant to happen 00:42:45.510 --> 00:42:52.230 only if lines three and five and eight and eleven, or eleven, 00:42:52.230 --> 00:42:54.330 actually applies. 00:42:54.330 --> 00:42:58.030 So those indentation kind of captures the logic of this program. 00:42:58.030 --> 00:43:01.950 Lastly, or second to last there is this. 00:43:01.950 --> 00:43:05.680 This is what we call a little more fancily, a Boolean expression. 00:43:05.680 --> 00:43:08.810 These yellow phrases here are kind of like questions. 00:43:08.810 --> 00:43:12.840 There are either yes no, or true false, or one and zero 00:43:12.840 --> 00:43:16.320 or any number of other binary terms. 00:43:16.320 --> 00:43:17.880 But these are questions we're asking. 00:43:17.880 --> 00:43:19.470 The answer to which is going to be true or false. 00:43:19.470 --> 00:43:21.010 Smith is among the names, true or false? 00:43:21.010 --> 00:43:22.830 Smith is earlier in the book, true or false? 00:43:22.830 --> 00:43:24.580 Smith is later in the book, true or false? 00:43:24.580 --> 00:43:27.360 And so these Boolean expressions, named after someone 00:43:27.360 --> 00:43:31.860 who the last name of Bool, long ago, is a way 00:43:31.860 --> 00:43:35.550 of having conditions take you in a different direction based 00:43:35.550 --> 00:43:38.190 on whether something is true or false. 00:43:38.190 --> 00:43:40.000 Three such examples there. 00:43:40.000 --> 00:43:42.150 And then, lastly, there's these two lines. 00:43:42.150 --> 00:43:44.970 On seven and ten there's this expression go back 00:43:44.970 --> 00:43:49.750 to step two, which is to induce a bit of cyclicity, right. 00:43:49.750 --> 00:43:52.500 You can sort of think about it visually if you go from step seven, 00:43:52.500 --> 00:43:54.990 or ten for that matter, back up to step two. 00:43:54.990 --> 00:43:58.200 This might happen again, and again, and again, 00:43:58.200 --> 00:44:00.720 if Mike is still to the left half, the left half, 00:44:00.720 --> 00:44:03.210 the left, as you're whittling down the phone book. 00:44:03.210 --> 00:44:05.990 And so we're going to induce, and induce, and induce potentially, 00:44:05.990 --> 00:44:08.889 this cycling or looping behavior. 00:44:08.889 --> 00:44:10.680 So these lines here now in yellow represent 00:44:10.680 --> 00:44:14.020 what a computer program might call a loop. 00:44:14.020 --> 00:44:18.060 Now these same English phrases we'll eventually see can be translated 00:44:18.060 --> 00:44:21.150 into Java, and c++, and JavaScript, and Python, and Ruby, 00:44:21.150 --> 00:44:22.740 and other languages still. 00:44:22.740 --> 00:44:24.630 But the key takeaway for us here today is 00:44:24.630 --> 00:44:28.740 one, the precision with which we specified these steps, two, 00:44:28.740 --> 00:44:31.937 the fact that there's this ordering, what happens after the other. 00:44:31.937 --> 00:44:34.020 The fact that there is these conditions, some only 00:44:34.020 --> 00:44:36.540 happen if something is true, and the fact that some of them 00:44:36.540 --> 00:44:40.680 can happen again, and again, and again, based on some kind of looping. 00:44:43.300 --> 00:44:45.040 But is this good? 00:44:45.040 --> 00:44:48.020 Like this is correct, and that of course, was one of our goals 00:44:48.020 --> 00:44:50.850 with the phone book, initially, was let's get it correct, 00:44:50.850 --> 00:44:53.450 and better still, let's get it correct and efficient. 00:44:53.450 --> 00:44:55.100 correct and fast. 00:44:55.100 --> 00:44:57.890 But how fast now is this? 00:44:57.890 --> 00:45:01.010 In fact, how do I put a number, of really a formula, 00:45:01.010 --> 00:45:03.170 on the performance of the algorithm so that I 00:45:03.170 --> 00:45:07.190 can claim that I am a good programmer, or I am a good problem solver? 00:45:07.190 --> 00:45:10.550 I've not only solved your problem correctly, but really, really well. 00:45:10.550 --> 00:45:13.820 Well let me propose that we analyze these three functions. 00:45:13.820 --> 00:45:14.900 Kind of in the abstract. 00:45:14.900 --> 00:45:19.310 We don't need actual arithmetic expressions here, per se. 00:45:19.310 --> 00:45:21.930 We'll just do things variably as follows. 00:45:21.930 --> 00:45:26.240 So here's a nice little Cartesian plane, with an x-axis of size of problem, 00:45:26.240 --> 00:45:29.000 and a y-axis of time to solve. 00:45:29.000 --> 00:45:33.020 And the farther out you go on size of problem, from left to right, 00:45:33.020 --> 00:45:36.210 means bigger, bigger, bigger, bigger, bigger, bigger, bigger bigger problem. 00:45:36.210 --> 00:45:38.570 And by bigger problem I mean more and more pages. 00:45:38.570 --> 00:45:41.150 More and more input, whatever the problem actually is. 00:45:41.150 --> 00:45:44.930 And then time to solve, is not very much time, lots of time. 00:45:44.930 --> 00:45:47.240 So on the y-axis too, the higher you go, the more 00:45:47.240 --> 00:45:50.360 time it takes to solve a problem, or really the slower 00:45:50.360 --> 00:45:52.380 it is to solve the problem. 00:45:52.380 --> 00:45:58.100 So let's now draw this red line here as a depiction of the first algorithms 00:45:58.100 --> 00:46:00.080 performance, or running time. 00:46:00.080 --> 00:46:04.820 Whereby, there is a one to one relationship between size and time. 00:46:04.820 --> 00:46:07.580 A one to one relationship between number of pages, 00:46:07.580 --> 00:46:10.520 and maybe number of page turns, or seconds, or whatever 00:46:10.520 --> 00:46:12.470 your unit of measure happens to be. 00:46:12.470 --> 00:46:18.380 So that if I have a phone book of this size, this dot here on the red line 00:46:18.380 --> 00:46:21.390 is how many seconds, or page turns it takes me to find it. 00:46:21.390 --> 00:46:24.920 And there's a linear relationship there, as implied by this variable-- 00:46:24.920 --> 00:46:26.630 as we'll call it-- as in algebra. 00:46:26.630 --> 00:46:29.330 n for number of pages, for instance. 00:46:29.330 --> 00:46:33.692 It's linear in so far as Verizon, if the phone company like Verizon 00:46:33.692 --> 00:46:36.650 increases the number of pages in the phone book just buy one next year, 00:46:36.650 --> 00:46:39.860 a few more people move into town so they add one more page to the phone book. 00:46:39.860 --> 00:46:44.360 That might take one additional page turn to find someone like Mike, 00:46:44.360 --> 00:46:46.960 or anyone else in that phone book if Verizon just adds a page. 00:46:46.960 --> 00:46:48.710 Or if they doubled the number of pages, it 00:46:48.710 --> 00:46:51.084 might double the amount of time it takes to find someone. 00:46:51.084 --> 00:46:53.781 There's a linear relationship between the two. 00:46:53.781 --> 00:46:56.780 Now with the second algorithm, where I was flying through the phone book 00:46:56.780 --> 00:47:00.060 two pages at a time, there's still a linear relationship, 00:47:00.060 --> 00:47:02.460 but it's not quite as bad so to speak. 00:47:02.460 --> 00:47:05.300 For instance, if I have this many pages in my phone book, 00:47:05.300 --> 00:47:10.010 my first algorithm might take this many seconds in the first algorithm, 00:47:10.010 --> 00:47:13.280 but because I'm flying through the phone book two at a time, 00:47:13.280 --> 00:47:15.440 it will take me half as much time. 00:47:15.440 --> 00:47:18.060 Half as many page turns with that second algorithm. 00:47:18.060 --> 00:47:20.690 So we'll describe it algebraically as n over two. 00:47:20.690 --> 00:47:22.980 Where n, again, is just the number of pages. 00:47:22.980 --> 00:47:26.970 So there is a relationship between those two lines and those two formulas. 00:47:26.970 --> 00:47:28.550 But that third algorithm. 00:47:28.550 --> 00:47:31.680 Super, super fancy I'll wager. 00:47:31.680 --> 00:47:35.100 And the shape of its line is fundamentally different. 00:47:35.100 --> 00:47:37.740 If you remember your logarithms, you'll see here 00:47:37.740 --> 00:47:42.170 this green curved line, otherwise depicted here as log of n, 00:47:42.170 --> 00:47:44.240 or really log base 2 of n, but we'll just keep 00:47:44.240 --> 00:47:46.760 it abstracted away as just log of n. 00:47:46.760 --> 00:47:49.100 This is a fundamentally different shape. 00:47:49.100 --> 00:47:51.590 And it still goes forever up and up and up and up. 00:47:51.590 --> 00:47:53.506 Even though it looks like it's flattening out, 00:47:53.506 --> 00:47:58.430 it's not perfectly flattening out, it's just growing and growing very slowly. 00:47:58.430 --> 00:48:02.180 And that's key, because what's powerful about that third and final algorithm, 00:48:02.180 --> 00:48:06.705 is that Verizon, for instance, doubles the number of pages in the phone book 00:48:06.705 --> 00:48:09.830 next year, because a whole lot of people move into town, or maybe two towns 00:48:09.830 --> 00:48:11.510 merge or whatever. 00:48:11.510 --> 00:48:14.510 Then how many more steps will it take me to find someone 00:48:14.510 --> 00:48:17.200 like Mike Smith in that phone book? 00:48:17.200 --> 00:48:19.460 Well just one additional page turn. 00:48:19.460 --> 00:48:21.950 One additional tear of the phone book. 00:48:21.950 --> 00:48:26.060 Because with each tear can I take a bite out of that problem. 00:48:26.060 --> 00:48:27.800 That's 50% of it. 00:48:27.800 --> 00:48:30.054 I can literally tear it in half. 00:48:30.054 --> 00:48:32.720 Meanwhile, Verizon doubled the number of pages in the phone book 00:48:32.720 --> 00:48:37.190 with my first algorithm, as per this straight linear relationship, aka n, 00:48:37.190 --> 00:48:40.640 it would double the amount of time necessary, or double the number of page 00:48:40.640 --> 00:48:43.440 turns necessary, to find Mike Smith in that case. 00:48:43.440 --> 00:48:46.640 So again, in the case of the first algorithm, 00:48:46.640 --> 00:48:49.040 would double the amount of time taken. 00:48:49.040 --> 00:48:53.900 The case of the third algorithm would increase by one step. 00:48:53.900 --> 00:48:54.740 One step. 00:48:54.740 --> 00:48:59.540 So if that phone book went ridiculously from four billion to eight billion, 00:48:59.540 --> 00:49:00.470 no big deal. 00:49:00.470 --> 00:49:03.770 But that third algorithm, just need one additional page turn. 00:49:03.770 --> 00:49:06.040 One additional tear of the phone book. 00:49:06.040 --> 00:49:07.850 Not in the case of the first algorithm. 00:49:07.850 --> 00:49:13.790 I would need an additional 4 billion page turns to find who I'm looking for. 00:49:17.290 --> 00:49:17.790 OK. 00:49:17.790 --> 00:49:20.460 But a phone book is a physical problem. 00:49:20.460 --> 00:49:24.360 And searching for phone numbers in a phone book is a physical solution. 00:49:24.360 --> 00:49:26.920 But what can computers do electronically? 00:49:26.920 --> 00:49:30.150 Like what is the analog in our computer to the problems 00:49:30.150 --> 00:49:31.450 we've been solving thus far? 00:49:31.450 --> 00:49:34.590 Well what about our own iPhones, or Android phones, or the like, 00:49:34.590 --> 00:49:36.480 where you have contacts these days? 00:49:36.480 --> 00:49:38.440 Sort of a digital version of a phone book. 00:49:38.440 --> 00:49:40.809 And those contacts are typically sorted alphabetically. 00:49:40.809 --> 00:49:43.350 But even then, you might know a lot of people, and most of us 00:49:43.350 --> 00:49:46.380 probably don't scroll, scroll, scroll, scroll, scroll. 00:49:46.380 --> 00:49:48.070 And if you do, you're doing it wrong. 00:49:48.070 --> 00:49:51.502 You can instead search by keyword, or by first name, or by last name, 00:49:51.502 --> 00:49:53.460 to actually find someone much more efficiently. 00:49:53.460 --> 00:49:56.680 And typically their name jumps to the top of the list. 00:49:56.680 --> 00:49:58.250 So that's an algorithm. 00:49:58.250 --> 00:50:00.660 The Google, and Apple, and other companies 00:50:00.660 --> 00:50:04.740 have implemented algorithms for searching, or even sorting 00:50:04.740 --> 00:50:07.620 your list of contacts so that they can find people for you quickly, 00:50:07.620 --> 00:50:10.830 so that you can just send them a text message or make that call. 00:50:10.830 --> 00:50:12.270 But let's simplify further. 00:50:12.270 --> 00:50:17.300 Rather than even get into the sorting of names, and alphabetical phrases, 00:50:17.300 --> 00:50:19.020 let's keep things simple for a moment. 00:50:19.020 --> 00:50:22.710 And just assume that we want to store things like numbers. 00:50:22.710 --> 00:50:24.940 A list of numbers for instance. 00:50:24.940 --> 00:50:28.800 So if we want to store a list of numbers, how can we do it? 00:50:28.800 --> 00:50:32.790 Well let me propose that the screen here, or if you will a piece of paper 00:50:32.790 --> 00:50:36.495 in front of you, really just represents your computer's memory. 00:50:36.495 --> 00:50:38.370 Now you may know that inside of your computer 00:50:38.370 --> 00:50:39.786 there's different types of memory. 00:50:39.786 --> 00:50:42.320 Something called the hard disk, or maybe a solid state disk, 00:50:42.320 --> 00:50:45.630 or something called ram, or random access memory, something called Rom 00:50:45.630 --> 00:50:48.510 or read only memory, something called l-1 cache, l-2 cache, 00:50:48.510 --> 00:50:50.130 sometimes l-3 cache and the like. 00:50:50.130 --> 00:50:52.110 So there's different types of memory. 00:50:52.110 --> 00:50:55.515 But the one we're going to think about right now is ram, random access memory. 00:50:55.515 --> 00:50:57.390 And this is the type of memory that you might 00:50:57.390 --> 00:51:02.910 have a gigabyte of, four gigabytes of, 16 gigabytes of, depending on how fancy 00:51:02.910 --> 00:51:07.360 your laptop or desktop computer is, or your phone for that matter. 00:51:07.360 --> 00:51:10.620 So inside of your computer, or phone, is a whole bunch of memory. 00:51:10.620 --> 00:51:11.790 A whole bunch of RAM. 00:51:11.790 --> 00:51:15.240 And if you have, for instance, 4 gigabytes of RAM, 00:51:15.240 --> 00:51:17.940 that is four billion bytes. 00:51:17.940 --> 00:51:20.580 Because what that means is giga means billion, 00:51:20.580 --> 00:51:25.220 so four giga means for billion, four gigabytes means four billion bytes. 00:51:25.220 --> 00:51:28.420 Mega, meanwhile means million, give or take. 00:51:25.840 --> 00:51:28.776 And in fact, earlier when we talked about rgb, 00:51:28.420 --> 00:51:31.310 And four megabytes then would be four million bytes. 00:51:31.310 --> 00:51:32.592 So 4 gigabytes is even bigger. 00:51:32.592 --> 00:51:34.800 And if you want to count higher still, four terabytes 00:51:34.800 --> 00:51:37.540 would be a huge amount of memory. 00:51:37.540 --> 00:51:39.780 So if you have four gigabytes of memory, that's 00:51:39.780 --> 00:51:45.840 four billion bytes of memory, and a byte meanwhile, is just eight bits. 00:51:45.910 --> 00:51:47.460 So why talk in terms of bytes? 00:51:47.460 --> 00:51:49.964 Well it's just easier to talk in terms of eight bits, 00:51:49.964 --> 00:51:51.130 rather than individual bits. 00:51:51.130 --> 00:51:53.463 Because with one bit, you can't really do all that much. 00:51:53.463 --> 00:51:55.220 You can only count from zero to one. 00:51:55.220 --> 00:51:59.390 So with 8 bits at least we can count a bit higher and express a bit more. 00:51:59.390 --> 00:52:04.744 so if we have 4 billion bytes of memory inside of our computer, 00:52:04.744 --> 00:52:07.660 you know it stands to reason that we could number each of those bytes. 00:52:07.660 --> 00:52:11.710 And maybe the top most byte up here represents byte number zero, 00:52:11.710 --> 00:52:18.039 and maybe the bottom most byte over here represents the number four billion, 00:52:18.039 --> 00:52:18.580 give or take. 00:52:18.580 --> 00:52:21.740 In fact it's not exactly 4 billion, it's a little more than that. 00:52:21.740 --> 00:52:24.580 But for our purposes, let's just assume that if this screen here 00:52:24.580 --> 00:52:27.950 represents all of my computer's memory, and you know for that matter, 00:52:27.950 --> 00:52:31.720 let's think of my memory as being divisible into rows and columns, 00:52:31.720 --> 00:52:33.250 just to kind of keep things orderly. 00:52:33.250 --> 00:52:36.070 This is not actually what it looks like underneath the hood. 00:52:36.070 --> 00:52:39.310 And it's definitely not as wavy as I making it out to be here. 00:52:39.310 --> 00:52:42.280 But you can think of it really, as rows and columns, 00:52:42.280 --> 00:52:45.700 that you can address each of the cells in this table. 00:52:45.700 --> 00:52:48.970 So kind of sort of like a spreadsheet with a whole lot of room for values. 00:52:48.970 --> 00:52:53.670 So this might be byte zero one, two, three, dot dot dot if you will. 00:52:53.670 --> 00:52:55.967 Four billion or so in the bottom right hand corner. 00:52:55.967 --> 00:52:59.050 But where these things are laid out doesn't really matter for our purposes 00:52:59.050 --> 00:53:00.290 right now. 00:53:00.290 --> 00:53:02.510 So that's how you might address memory. 00:53:02.510 --> 00:53:04.210 But how might you use memory? 00:53:04.210 --> 00:53:06.730 So suppose I want to store a whole bunch of values 00:53:06.730 --> 00:53:08.620 inside of my computer's memory. 00:53:08.620 --> 00:53:11.630 I don't really care where it is for the moment. 00:53:11.630 --> 00:53:13.840 And therefore, I don't care what those addresses are. 00:53:13.840 --> 00:53:20.290 But let's suppose that I do care that the numbers are close by to each other. 00:53:20.290 --> 00:53:21.640 They are contiguous in memory. 00:53:21.640 --> 00:53:22.870 Back to back to back to back. 00:53:22.870 --> 00:53:26.290 And we'll see that that has some advantages for efficiency. 00:53:26.290 --> 00:53:29.290 So suppose that I want to store one number in my computer's memory, 00:53:29.290 --> 00:53:34.750 and suppose that this memory and this memory and this memory 00:53:34.750 --> 00:53:37.690 here, those are all being used already by other programs, 00:53:37.690 --> 00:53:39.410 or other things going on in my program. 00:53:39.410 --> 00:53:43.119 And so the first available slot, for instance, say is this one here. 00:53:43.119 --> 00:53:45.160 And the number I want to store is the number one. 00:53:45.160 --> 00:53:48.940 And the next number that I want to store is going to be the number 50. 00:53:48.940 --> 00:53:51.670 And the next number I want to store is going to be the number 51. 00:53:51.670 --> 00:53:53.410 And the next one 61. 00:53:53.410 --> 00:53:55.720 And then the next one 109. 00:53:55.720 --> 00:53:58.180 And then the next one 121. 00:53:58.180 --> 00:53:59.690 And so forth. 00:53:59.690 --> 00:54:03.400 Which is to say, if I'm storing not people's phone numbers or names, 00:54:03.400 --> 00:54:05.650 just more simply for the sake of discussion, 00:54:05.650 --> 00:54:09.130 all I care about for whatever reason is storing a list of numbers. 00:54:09.130 --> 00:54:13.090 I might store them in my computer's memory in exactly this way. 00:54:13.090 --> 00:54:16.150 Each of these squares represents a byte, or eight bits. 00:54:16.150 --> 00:54:18.910 We know that we can abstract above bits, and actually 00:54:18.910 --> 00:54:20.530 represent things like decimal numbers. 00:54:20.530 --> 00:54:22.613 So I'm just stipulating that inside of these boxes 00:54:22.613 --> 00:54:25.669 really is eight bits, or some pattern of eight zeros and ones. 00:54:25.669 --> 00:54:26.710 But that's too low level. 00:54:26.710 --> 00:54:30.160 We're going to abstract away from that and just talk in terms, for now, 00:54:30.160 --> 00:54:32.300 in decimal digits. 00:54:32.300 --> 00:54:36.100 So this thing here might be how I store something 00:54:36.100 --> 00:54:38.260 underneath the hood in my computer. 00:54:38.260 --> 00:54:43.910 And what's nice about this, is that it's super, super easy to find values. 00:54:43.910 --> 00:54:47.680 In fact, let me just highlight this range of values, and slap a name on it. 00:54:47.680 --> 00:54:50.620 This boxed in region here in a computer program, 00:54:50.620 --> 00:54:53.620 would very often be called an array. 00:54:53.620 --> 00:54:57.370 It's a list of sorts, of values, but what's key 00:54:57.370 --> 00:55:01.540 here is that this list of values is back to back to back 00:55:01.540 --> 00:55:04.090 to back all contiguous in memory. 00:55:04.090 --> 00:55:08.620 When that is the case, when you have this property inside of a computer, 00:55:08.620 --> 00:55:10.870 you can do things pretty darned fast. 00:55:10.870 --> 00:55:14.500 Because there's a mathematical relationship among the locations of all 00:55:14.500 --> 00:55:15.320 of these values. 00:55:15.320 --> 00:55:17.770 For instance, if you are the computer, and you 00:55:17.770 --> 00:55:20.170 find yourself looking at the number 50, how do you 00:55:20.170 --> 00:55:22.180 find the next number in the list? 00:55:22.180 --> 00:55:25.990 Well, you just add one to your location. 00:55:25.990 --> 00:55:28.390 Not one to your value, one to your location. 00:55:28.390 --> 00:55:31.060 How do you go from the number 51 to 61? 00:55:31.060 --> 00:55:33.130 You just add one to your location. 00:55:33.130 --> 00:55:35.982 Again, because if this is byte zero, one. two, three, and I 00:55:35.982 --> 00:55:37.190 don't know where we are here. 00:55:37.190 --> 00:55:40.750 Maybe this is byte 100, 101, 102, 103. 00:55:40.750 --> 00:55:43.390 Because all of these values are back to back to back to back, 00:55:43.390 --> 00:55:46.960 you can very quickly access them just by looking to the left, 00:55:46.960 --> 00:55:51.260 looking to the right, by a fixed number of bits. 00:55:51.260 --> 00:55:53.560 This is advantageous, because you can jump around. 00:55:53.560 --> 00:55:56.950 In fact, this is where the name random access memory comes from. 00:55:56.950 --> 00:56:01.150 You have random access, not in the sense that you just go anywhere randomly, 00:56:01.150 --> 00:56:06.290 but you can go anywhere you want in the same amount of time. 00:56:06.290 --> 00:56:10.000 Even though this byte four billion looks really far away from everything else, 00:56:10.000 --> 00:56:10.840 doesn't matter. 00:56:10.840 --> 00:56:14.950 Because arithmetically the computer can say, give me byte number four billion. 00:56:14.950 --> 00:56:18.590 And that's a constant time operation, it's not linear. 00:56:18.590 --> 00:56:21.880 Doesn't take more and more steps the farther away it is, because all of this 00:56:21.880 --> 00:56:25.060 is electricity and electronic, not moving parts. 00:56:25.060 --> 00:56:30.220 You can just instantly jump to any of the cells in your computer's memory 00:56:30.220 --> 00:56:32.020 like that. 00:56:32.020 --> 00:56:36.220 The catch is, me the human, can look at these values and just see, 00:56:36.220 --> 00:56:38.290 oh I see a whole bunch of values. 00:56:38.290 --> 00:56:42.610 Six values inside of that array of memory there. 00:56:42.610 --> 00:56:44.360 But a computer can't do that. 00:56:44.360 --> 00:56:48.430 In fact a computer can really only see one value at a time. 00:56:48.430 --> 00:56:52.440 And so when I say the computer can go get value four billion, it can do that. 00:56:52.440 --> 00:56:54.090 But it can only go get that one value. 00:56:54.090 --> 00:56:57.160 If it wants another, it has to go get that value and another value. 00:56:57.160 --> 00:57:01.140 And indeed, inside of a computer, inside of a CPU that a company like Intel 00:57:01.140 --> 00:57:02.850 might make, there's an instruction that's 00:57:02.850 --> 00:57:06.000 often called load, which refers to exactly this process. 00:57:06.000 --> 00:57:09.660 Go load a value, maybe one byte, maybe four bytes, maybe eight bytes, 00:57:09.660 --> 00:57:11.880 from memory, and bring it back to me. 00:57:11.880 --> 00:57:14.460 And put it in something called a register, typically. 00:57:14.460 --> 00:57:18.660 But that load operation is kind of like, if you've ever seen those fancy soda 00:57:18.660 --> 00:57:21.675 machines where it doesn't just drop your soda on the ground, 00:57:21.675 --> 00:57:24.300 instead there's that fancy little robotic arm that for whatever 00:57:24.300 --> 00:57:27.420 reason goes left and then goes right and then goes up and then goes down, 00:57:27.420 --> 00:57:30.450 and then finally decides where your soda is. 00:57:30.450 --> 00:57:33.540 Then it moves it over the dispenser and drops it out. 00:57:33.540 --> 00:57:37.560 Well that kind of robotic arm is kind of like a physical incarnation 00:57:37.560 --> 00:57:41.070 of a computer, in that it can't just take a look back and figure out 00:57:41.070 --> 00:57:46.290 where the coca-cola is, it has to go to that particular location 00:57:46.290 --> 00:57:47.670 and actually drop it for you. 00:57:47.670 --> 00:57:51.420 But computers, when there aren't moving parts, can just jump there instantly. 00:57:51.420 --> 00:57:53.730 You don't have to wait for that robotic arm to move. 00:57:53.730 --> 00:57:57.690 And so in this case, the computer can get one of these values instantly, 00:57:57.690 --> 00:58:02.640 but it doesn't know what value is there until it arrives. 00:58:02.640 --> 00:58:04.110 Right It's kind of like being-- 00:58:04.110 --> 00:58:09.240 another analogy might be a whole bunch of lockers 00:58:09.240 --> 00:58:11.160 in a train station or the like. 00:58:11.160 --> 00:58:14.370 Where the only way you can see what's inside of a locker is by renting it 00:58:14.370 --> 00:58:17.490 or by putting in a key and opening it up and seeing what's inside. 00:58:17.490 --> 00:58:21.150 Similarly, can you think of there being tiny little doors, 00:58:21.150 --> 00:58:24.540 occluding these numbers so that the computer has to open it up 00:58:24.540 --> 00:58:25.920 before it sees what's there. 00:58:28.890 --> 00:58:32.920 Speaking of doors, let's suppose that the computer's memory actually 00:58:32.920 --> 00:58:34.270 has some doors in front of it. 00:58:34.270 --> 00:58:36.760 Depicted here, is just these simple rectangles. 00:58:36.760 --> 00:58:40.660 And behind each of these doors is some random numbers, 00:58:40.660 --> 00:58:43.600 among which is one number I do know and care about. 00:58:43.600 --> 00:58:46.000 In fact, I want to find myself the number 50, 00:58:46.000 --> 00:58:47.430 but I don't know where it is. 00:58:47.430 --> 00:58:51.280 So I'm going to take a pretty naive, but correct approach, of just starting 00:58:51.280 --> 00:58:51.910 from the left. 00:58:51.910 --> 00:58:55.210 Just as we did with a phone book, starting from the first page. 00:58:55.210 --> 00:58:58.819 And I'm going to go ahead and knock on, and then open the first door. 00:58:58.819 --> 00:59:01.360 And I see the number 15, that of course is not the number 50. 00:59:01.360 --> 00:59:02.693 So I'm going to proceed further. 00:59:02.693 --> 00:59:07.030 To go and knock on and touch the second door, and I see 23. 00:59:07.030 --> 00:59:10.030 Still not the number I'm looking for, so I knock and touch again. 00:59:10.030 --> 00:59:10.690 16. 00:59:10.690 --> 00:59:12.400 Knock and touch again. 00:59:12.400 --> 00:59:14.170 Knock and touch again. 00:59:14.170 --> 00:59:16.600 Still no number, but I have found the meaning of life. 00:59:16.600 --> 00:59:22.150 And let's try one more door here and touch the number 50 behind this door. 00:59:22.150 --> 00:59:25.807 So here too is a very linear algorithm, but these numbers 00:59:25.807 --> 00:59:27.390 are kind of all over the place, right. 00:59:27.390 --> 00:59:30.980 It starts at 15, gets a little bigger, then gets smaller, then smaller still, 00:59:30.980 --> 00:59:32.440 then bigger then bigger again. 00:59:32.440 --> 00:59:36.580 So it doesn't seem that these numbers are sorted. 00:59:36.580 --> 00:59:41.560 And in fact, if the analog then in my phone might be, 00:59:41.560 --> 00:59:44.020 my contacts might be just randomly ordered. 00:59:44.020 --> 00:59:46.099 Maybe ordered in the order in which I inputted 00:59:46.099 --> 00:59:48.640 them, which probably isn't ideal when I want to find someone. 00:59:48.640 --> 00:59:50.770 I don't really care in what order they were put in, 00:59:50.770 --> 00:59:52.930 I care that I find them quickly. 00:59:52.930 --> 00:59:55.540 And so maybe I should do a little work up front, 00:59:55.540 --> 00:59:58.330 or maybe Apple or Google should do a little work upfront, 00:59:58.330 --> 01:00:01.120 and actually maintain my contacts in sorted order. 01:00:01.120 --> 01:00:02.380 Which indeed they do. 01:00:02.380 --> 01:00:05.470 Or in this case here, why don't I at least maintain my numbers 01:00:05.470 --> 01:00:06.800 in sorted order. 01:00:06.800 --> 01:00:10.870 So let me go ahead and reset all of the doors, and start again. 01:00:10.870 --> 01:00:13.180 This time with an assumption. 01:00:13.180 --> 01:00:17.050 This time with a feature, if you will, that the numbers are now 01:00:17.050 --> 01:00:19.240 sorted behind these doors. 01:00:19.240 --> 01:00:22.960 So they go from smallest to biggest, and there might be some gaps in the middle, 01:00:22.960 --> 01:00:25.720 but in so far as they're now sorted, I can 01:00:25.720 --> 01:00:29.470 leverage some of that old school phone book intuition and divide 01:00:29.470 --> 01:00:30.880 and conquer this problem. 01:00:30.880 --> 01:00:33.760 Let me go ahead into the middle of the problem now, not the left, 01:00:33.760 --> 01:00:37.060 and let me go ahead and knock on and open the middle door. 01:00:37.060 --> 01:00:38.040 And there's 16. 01:00:38.040 --> 01:00:40.450 Now 16 is kind of a small number, but I now 01:00:40.450 --> 01:00:42.414 know that none of the numbers to the left 01:00:42.414 --> 01:00:44.080 are going to be the one I'm looking for. 01:00:44.080 --> 01:00:45.670 Because I'm still looking for 50. 01:00:45.670 --> 01:00:48.070 So I can sort of throw that half of the problem 01:00:48.070 --> 01:00:50.350 away, or really just turn a blind eye to it. 01:00:50.350 --> 01:00:52.270 And now go to the right half of the problem, 01:00:52.270 --> 01:00:58.270 identify the middle door here, knock on and open this one, there again is 42. 01:00:58.270 --> 01:01:00.580 But now, again, I can apply some of that intuition 01:01:00.580 --> 01:01:03.820 and know that 50 is bigger than 42, so must be 01:01:03.820 --> 01:01:08.680 that 50 is right here at the last door. 01:01:08.680 --> 01:01:13.360 And in this case, have I gotten away with opening fewer doors? 01:01:13.360 --> 01:01:17.789 It's not a frightening number of fewer doors, it's still relatively small, 01:01:17.789 --> 01:01:20.830 but that's only because we could fit some seven doors or so on the screen 01:01:20.830 --> 01:01:21.340 here. 01:01:21.340 --> 01:01:25.330 But if the number of doors were four billion, or if the number of doors 01:01:25.330 --> 01:01:30.130 were even larger, surely with this divide and conquer approach, 01:01:30.130 --> 01:01:33.800 could I find my number super fast. 01:01:33.800 --> 01:01:35.890 But there's kind of a trade off here, right. 01:01:35.890 --> 01:01:39.520 Like yes, I can now find numbers faster, or equivalently 01:01:39.520 --> 01:01:42.790 I can find people presumably in my address book in my phone 01:01:42.790 --> 01:01:47.890 faster, if they are maintained in sorted order, but what price did I pay? 01:01:47.890 --> 01:01:52.300 In fact, a theme in computer science, and in programming specifically, 01:01:52.300 --> 01:01:54.760 is this theme of trade offs. 01:01:54.760 --> 01:01:59.030 Yes, you can speed up searches, but you're going to have to pay a price. 01:01:59.030 --> 01:02:00.680 What is that price in this context? 01:02:00.680 --> 01:02:04.930 Well it took someone some amount of work to sort those contacts, right. 01:02:04.930 --> 01:02:08.811 Before class I had actually come up with the sorted order of those doors, 01:02:08.811 --> 01:02:11.060 and then put the right numbers behind the right doors. 01:02:11.060 --> 01:02:12.700 There was a non-zero amount of work. 01:02:12.700 --> 01:02:15.130 Not a huge amount of work, but it's non-zero. 01:02:15.130 --> 01:02:19.030 So really, by speeding things up during search, 01:02:19.030 --> 01:02:24.940 I have to first incur an up front, one time cost, of sorting things 01:02:24.940 --> 01:02:26.830 before I even begin to search. 01:02:26.830 --> 01:02:29.907 Or equivalently, someone at Google, or someone at Microsoft, 01:02:29.907 --> 01:02:32.740 or someone at Facebook, or any of these big companies with big data, 01:02:32.740 --> 01:02:36.280 if they want to use searching efficiently, 01:02:36.280 --> 01:02:39.190 they need to pay a price upfront of sorting the data first. 01:02:39.190 --> 01:02:42.100 And indeed, that's what Android and iOS and other devices 01:02:42.100 --> 01:02:43.760 too, must be doing underneath the hood. 01:02:43.760 --> 01:02:47.620 Someone at some point in time, probably long ago at this point, 01:02:47.620 --> 01:02:50.820 implemented a sorting routine. 01:02:50.820 --> 01:02:53.380 But how quickly can we sort numbers? 01:02:53.380 --> 01:02:58.150 How quickly can we actually achieve that result of sorted values 01:02:58.150 --> 01:03:02.020 so that we can then do search more efficiently? 01:03:02.020 --> 01:03:04.110 So for this, let's try something a little visual, 01:03:04.110 --> 01:03:05.549 a little old school, in fact. 01:03:05.549 --> 01:03:07.090 We're here in this beautiful theater. 01:03:07.090 --> 01:03:09.965 And in fact we have a whole bunch of music stands from the musicians. 01:03:09.965 --> 01:03:12.010 Let's go ahead and use a few music stands, 01:03:12.010 --> 01:03:15.705 each of which will represent a location in my computer's memory, 01:03:15.705 --> 01:03:17.830 and then I'm just going to have some numbers that I 01:03:17.830 --> 01:03:19.930 want to put inside of those locations. 01:03:19.930 --> 01:03:25.625 And let's see how expensive it is for me to actually sort those values. 01:03:25.625 --> 01:03:28.610 So I have here in my hand the numbers one through eight, 01:03:28.610 --> 01:03:32.900 and I have here eight memory locations as represented here by music stands. 01:03:32.900 --> 01:03:37.070 And let me go ahead and just put these numbers in pretty much random order. 01:03:37.070 --> 01:03:39.320 I've got my little cheat sheet there up on the screen, 01:03:39.320 --> 01:03:42.380 so that we reset each time to the same location. 01:03:42.380 --> 01:03:45.955 And I'm just kind of putting these in some random order. 01:03:45.955 --> 01:03:48.080 Where four's going to be over here, and the numbers 01:03:48.080 --> 01:03:49.913 are going to sometimes get bigger, sometimes 01:03:49.913 --> 01:03:51.560 get smaller from left to right. 01:03:51.560 --> 01:03:54.590 The key, though, is that they're not actually sorted. 01:03:54.590 --> 01:04:00.200 And so I start now with these values as such. 01:04:00.200 --> 01:04:00.860 All right. 01:04:00.860 --> 01:04:04.850 So suppose I wanted to find now, maybe the number 50. 01:04:04.850 --> 01:04:07.850 Well 50 is not among these numbers, so we better think a little smaller. 01:04:07.850 --> 01:04:10.250 Suppose I want to find myself the number seven. 01:04:10.250 --> 01:04:13.570 Well again, as in the case of a computer, or the doors, 01:04:13.570 --> 01:04:16.340 I can only find seven not just by kind of cheating like a human 01:04:16.340 --> 01:04:18.170 and say, oh there it is over there. 01:04:18.170 --> 01:04:20.180 I have to get to that point more methodically. 01:04:20.180 --> 01:04:24.830 And define the number seven linearly, using linear search, so to speak. 01:04:24.830 --> 01:04:26.690 I have to check is this it, nope. 01:04:26.690 --> 01:04:27.770 Is this it, nope. 01:04:27.770 --> 01:04:28.270 Is this it? 01:04:28.270 --> 01:04:29.270 Nope. 01:04:29.270 --> 01:04:29.770 Is this it? 01:04:29.770 --> 01:04:30.270 Nope. 01:04:30.270 --> 01:04:30.770 Is this it? 01:04:30.770 --> 01:04:31.270 Nope. 01:04:31.270 --> 01:04:31.770 Is this it? 01:04:31.770 --> 01:04:32.270 Nope. 01:04:32.270 --> 01:04:33.050 Is this it? 01:04:33.050 --> 01:04:33.680 Yes. 01:04:33.680 --> 01:04:36.139 And now I found it, coincidentally, some seven steps later. 01:04:36.139 --> 01:04:38.555 But seven could have been anywhere, but in the worst case, 01:04:38.555 --> 01:04:39.930 it's all the way over here. 01:04:39.930 --> 01:04:43.830 Now could I use binary search on this? 01:04:43.830 --> 01:04:47.040 Let's actually slap a label on that algorithm we've leveraged with a phone 01:04:47.040 --> 01:04:47.540 book. 01:04:47.540 --> 01:04:48.920 Dividing and conquering. 01:04:48.920 --> 01:04:52.040 If you do it half and half and half and half, looking for some value, 01:04:52.040 --> 01:04:54.740 it's actually more technically called binary search. 01:04:54.740 --> 01:04:57.450 Bi meaning two, and so it's two because you're splitting 01:04:57.450 --> 01:04:58.700 the problem in half each time. 01:04:58.700 --> 01:05:00.620 Could I use binary search here? 01:05:00.620 --> 01:05:02.592 Well there's one minor problem, which is that I 01:05:02.592 --> 01:05:04.550 don't have an odd number of music stands, which 01:05:04.550 --> 01:05:07.184 means the middle element is like between eight and one. 01:05:07.184 --> 01:05:07.850 But that's fine. 01:05:07.850 --> 01:05:10.940 We can deal with that arithmetically by just rounding up or down, 01:05:10.940 --> 01:05:13.460 depending on how we want to implement it ultimately. 01:05:13.460 --> 01:05:15.810 But even then, that's not the biggest problem. 01:05:15.810 --> 01:05:19.070 If I look at eight here, and I'm looking for seven, well 01:05:19.070 --> 01:05:22.010 based on my previous divide and conquer strategy, 01:05:22.010 --> 01:05:24.890 I should be looking to the left because seven is smaller than eight. 01:05:24.890 --> 01:05:27.223 Of course I'm not going to find it to the left of eight, 01:05:27.223 --> 01:05:29.220 because it's not there, it's over there. 01:05:29.220 --> 01:05:31.530 And that's because the numbers aren't sorted. 01:05:31.530 --> 01:05:34.410 So as in the case with my hypothetical address book, 01:05:34.410 --> 01:05:37.250 if I want to be able to search this efficiently, 01:05:37.250 --> 01:05:41.020 or search a database sufficiently, or search most anything efficiently, 01:05:41.020 --> 01:05:42.770 I need to know something about the data. 01:05:42.770 --> 01:05:44.070 And ideally that it's sorted. 01:05:44.070 --> 01:05:47.280 So how do I get to the point of sorting this data? 01:05:47.280 --> 01:05:50.390 Well, you know what, I could take a fairly localized approach. 01:05:50.390 --> 01:05:52.700 Like four and two, I'm looking at them right here. 01:05:52.700 --> 01:05:54.920 And technically, I'm looking at them one at a time. 01:05:54.920 --> 01:05:58.380 I see four, I see two, and I know they're out of order. 01:05:58.380 --> 01:06:00.180 Well, as the computer, I could do this. 01:06:00.180 --> 01:06:03.530 I could, using my load and store instruction, essentially, 01:06:03.530 --> 01:06:05.300 or a move instruction, do that. 01:06:05.300 --> 01:06:08.370 And now I've improved the sorting of this list. 01:06:08.370 --> 01:06:10.640 Now four and six, those look OK. 01:06:10.640 --> 01:06:12.170 Six and eight, those look OK. 01:06:12.170 --> 01:06:14.390 One and eight, Mm-mm, not OK. 01:06:14.390 --> 01:06:16.370 So let me just make a quick fix. 01:06:16.370 --> 01:06:18.450 Let me just swap one and eight. 01:06:18.450 --> 01:06:20.510 It's not perfect yet, but it's better. 01:06:20.510 --> 01:06:23.300 Eight and three, let me make a quick fix. 01:06:23.300 --> 01:06:26.090 Well, that's better. 01:06:26.090 --> 01:06:28.735 Eight and seven, a quick fix, that's better. 01:06:28.735 --> 01:06:31.280 Eight and five, quick fix, that's better. 01:06:31.280 --> 01:06:32.240 So this is great. 01:06:32.240 --> 01:06:34.460 Eight is all the way to the right as intended, 01:06:34.460 --> 01:06:38.780 and one is not yet all the way to the left. 01:06:38.780 --> 01:06:42.440 So I haven't solved the problem, but I have improved it. 01:06:42.440 --> 01:06:45.650 I've gotten a step closer to the correct solution of sorted numbers, 01:06:45.650 --> 01:06:49.430 because some of the values moved at least one step closer 01:06:49.430 --> 01:06:51.962 to their intended location. 01:06:51.962 --> 01:06:53.670 But of course, I've got to do this again. 01:06:53.670 --> 01:06:55.097 So two and four their good. 01:06:55.097 --> 01:06:56.180 Four and six they're good. 01:06:56.180 --> 01:06:58.470 Oh six and one, now that's out of order. 01:06:58.470 --> 01:07:00.980 And so I can transpose these two. 01:07:00.980 --> 01:07:06.200 So this algorithm has a name, whereby I am transposing pairwise elements that 01:07:06.200 --> 01:07:10.190 are out of order, such that the biggest numbers are bubbling their way up 01:07:10.190 --> 01:07:14.060 to the top. eight already made its way there, seven just made its way there. 01:07:14.060 --> 01:07:16.880 And so if I keep doing this, these bigger values 01:07:16.880 --> 01:07:19.010 are going to continue to bubble their way up. 01:07:19.010 --> 01:07:20.600 Two and four those or fine. 01:07:20.600 --> 01:07:23.750 Four and one, ooh notice, four is starting to bubble to the right. 01:07:23.750 --> 01:07:26.200 Four and three, it's continuing to bubble. 01:07:26.200 --> 01:07:27.440 Four and six is OK. 01:07:27.440 --> 01:07:28.670 Six and five, mm-mm. 01:07:28.670 --> 01:07:32.090 Six bubbled its way up, and now I got a good one there. 01:07:32.090 --> 01:07:34.160 I really fixed a lot of those values. 01:07:34.160 --> 01:07:35.990 I just need to do this once more. 01:07:35.990 --> 01:07:38.180 One and two, fixed that. 01:07:38.180 --> 01:07:42.050 Two and three, three and four, four and five, six and seven, seven and eight. 01:07:42.050 --> 01:07:44.630 Let me do another spot check just to make sure. 01:07:44.630 --> 01:07:48.320 One and two, three and four, five,six, seven, eight. 01:07:48.320 --> 01:07:49.010 I'm good. 01:07:49.010 --> 01:07:51.950 But I did a lot of walking, and I did a lot of transpositions. 01:07:51.950 --> 01:07:54.450 And it turns out this algorithm is correct, 01:07:54.450 --> 01:07:56.960 does have a formal name called bubble sort. 01:07:56.960 --> 01:07:58.529 But it's not very efficient. 01:07:58.529 --> 01:08:02.149 In fact, if you actually do the math out, this takes as many as like n times 01:08:02.149 --> 01:08:03.819 n steps in the worst case. 01:08:03.819 --> 01:08:06.649 So if you have n elements, that's eight in this case. 01:08:06.649 --> 01:08:07.990 N times n, that's 64. 01:08:07.990 --> 01:08:12.500 It's not actually 64, but it starts to approach 64 asymptotically, 01:08:12.500 --> 01:08:13.109 so to speak. 01:08:13.109 --> 01:08:15.479 So if we did a formal analysis, long story short, 01:08:15.479 --> 01:08:18.590 it would start to feel like something quadratic, so to speak. 01:08:18.590 --> 01:08:19.819 Not a fast algorithm. 01:08:19.819 --> 01:08:20.830 But there's other ones. 01:08:20.830 --> 01:08:22.663 So let's actually try out another algorithm. 01:08:22.663 --> 01:08:27.149 Let me go ahead and restore the original, seemingly random order here. 01:08:27.149 --> 01:08:29.580 And indeed it is random in so far as I'm just 01:08:29.580 --> 01:08:35.260 putting it in some non-sorted order that I thought through initially. 01:08:35.260 --> 01:08:39.569 One's going to go here, three is going to go here, seven and five. 01:08:39.569 --> 01:08:42.180 So now we're back to the original unsorted order. 01:08:42.180 --> 01:08:44.670 You know what, I can do another algorithm altogether here. 01:08:44.670 --> 01:08:48.840 Let me not look at them pairwise, let me just go even simpler. 01:08:48.840 --> 01:08:51.534 Let me just find the smallest number, and deal with it. 01:08:51.534 --> 01:08:54.700 All right, four is pretty small indeed it's the smallest I've seen thus far, 01:08:54.700 --> 01:08:55.908 I'm going to hang on to this. 01:08:55.908 --> 01:08:58.050 Oh wait a minute, two a smaller. 01:08:58.050 --> 01:08:59.229 Let me hang on to that. 01:08:59.229 --> 01:09:00.450 Six not smaller. 01:09:00.450 --> 01:09:02.010 One, eight not smaller. 01:09:02.010 --> 01:09:05.500 One is smaller, let me go ahead and hang on to this. 01:09:05.500 --> 01:09:07.810 Three, not smaller. 01:09:07.810 --> 01:09:08.729 Seven, five. 01:09:08.729 --> 01:09:10.200 OK so here's the smallest element. 01:09:10.200 --> 01:09:12.424 I found it I've selected the smallest element. 01:09:12.424 --> 01:09:14.590 You know what, this belongs all the way to the left. 01:09:14.590 --> 01:09:17.380 I'm going to put it there. 01:09:17.380 --> 01:09:18.810 I'm going to do that. 01:09:18.810 --> 01:09:20.220 Of course, this is cheating. 01:09:20.220 --> 01:09:22.890 Like this music stand, even though it's kind of wide, 01:09:22.890 --> 01:09:24.540 should only be storing one value. 01:09:24.540 --> 01:09:27.880 There's only enough room for one byte for instance of information. 01:09:27.880 --> 01:09:29.340 So where do I put four? 01:09:29.340 --> 01:09:31.840 Like four can't stay on the same music stand. 01:09:31.840 --> 01:09:33.479 You know I can't cheat and do that. 01:09:33.479 --> 01:09:37.440 So I could move eight over, move six over, move two over, 01:09:37.440 --> 01:09:38.460 that's a lot of work. 01:09:38.460 --> 01:09:41.921 Or I could just, fine, four was already out of order in the first place. 01:09:41.921 --> 01:09:43.170 Let's just put four over here. 01:09:43.170 --> 01:09:45.254 It's not made the problem any fundamentally worse, 01:09:45.254 --> 01:09:46.795 because they started in random order. 01:09:46.795 --> 01:09:47.970 I randomly put four there. 01:09:47.970 --> 01:09:51.840 The key is, one is in really good shape now. 01:09:51.840 --> 01:09:54.754 Let me repeat this process of selecting the smallest element. 01:09:54.754 --> 01:09:56.170 Let me look for the next smallest. 01:09:56.170 --> 01:09:58.020 Two is pretty small. 01:09:58.020 --> 01:09:59.550 Small, smallest, yep. 01:09:59.550 --> 01:10:00.720 OK I found it. 01:10:00.720 --> 01:10:01.840 Two is the smallest. 01:10:01.840 --> 01:10:04.290 Let's put him co-incidentally right where he goes. 01:10:04.290 --> 01:10:08.670 So that was kind of a waste of time, but verification that it's correct. 01:10:08.670 --> 01:10:09.750 Let's keep looking. 01:10:09.750 --> 01:10:11.370 Six is pretty small now. 01:10:11.370 --> 01:10:13.410 Eight is not, oh four is smaller. 01:10:13.410 --> 01:10:14.700 Let's grab four. 01:10:14.700 --> 01:10:16.224 Oh Three is even better. 01:10:16.224 --> 01:10:16.890 Let's grab that. 01:10:16.890 --> 01:10:17.810 Seven, five. 01:10:17.810 --> 01:10:20.830 OK 3 needs to be put in its place. 01:10:20.830 --> 01:10:23.500 I don't want to touch one and two, because those are good. 01:10:23.500 --> 01:10:25.570 Six, sorry you don't belong there. 01:10:25.570 --> 01:10:28.470 Let's go ahead and evict you and put six over there. 01:10:28.470 --> 01:10:32.910 So if I repeat this process of just continually iterating, or looping, 01:10:32.910 --> 01:10:36.420 if you will, through the list, looking for the smallest element, 01:10:36.420 --> 01:10:41.220 and evicting whatever is in the place that it belongs, 01:10:41.220 --> 01:10:44.370 notice that I've sorted it seemingly faster. 01:10:44.370 --> 01:10:46.170 But I actually got a little lucky there. 01:10:46.170 --> 01:10:50.700 It turns out that this algorithm, which also has a name called selection sort, 01:10:50.700 --> 01:10:52.570 is also quadratic in nature. 01:10:52.570 --> 01:10:55.830 If you've got n elements, it's going to take roughly n times n total 01:10:55.830 --> 01:10:58.020 steps in order to achieve that. 01:10:58.020 --> 01:10:59.100 Not in every case. 01:10:59.100 --> 01:11:01.470 But in the worst case, so to speak. 01:11:01.470 --> 01:11:05.520 So it turns out this is just two ways now to sort elements. 01:11:05.520 --> 01:11:10.511 And there are dozens, if not more, other ways to actually sort elements, 01:11:10.511 --> 01:11:13.260 with each one getting maybe a little better, maybe a little worse, 01:11:13.260 --> 01:11:15.660 maybe a little better depending on the inputs. 01:11:15.660 --> 01:11:17.760 So sometimes it really depends. 01:11:17.760 --> 01:11:21.030 But these are all examples then of algorithms. 01:11:21.030 --> 01:11:24.420 And algorithms have performance associated with them. 01:11:24.420 --> 01:11:27.010 Algorithms have a running time associated with them. 01:11:27.010 --> 01:11:29.730 And one of the things that theoretical computer scientists do, 01:11:29.730 --> 01:11:31.950 is not only design algorithms like these, 01:11:31.950 --> 01:11:35.050 or really even way more sophisticated algorithms than these, 01:11:35.050 --> 01:11:36.180 but also analyze them. 01:11:36.180 --> 01:11:39.240 So that we can make arguments, mathematical arguments, sometimes 01:11:39.240 --> 01:11:42.820 that state as a proof, this algorithm is correct. 01:11:42.820 --> 01:11:46.020 No matter what, this will sort your elements correctly, 01:11:46.020 --> 01:11:48.180 if implemented correctly in a computer, and this 01:11:48.180 --> 01:11:53.070 is how much time it will take in general in order to achieve that. 01:11:53.070 --> 01:11:57.120 But the takeaway here, is that it was a lot of work, right. 01:11:57.120 --> 01:11:59.520 I had the luxury with that phone book, long ago, 01:11:59.520 --> 01:12:02.280 of just opening it up to the middle and looking for Mike Smith, 01:12:02.280 --> 01:12:05.370 and da da da da da, found Mike Smith pretty darn efficiently, 01:12:05.370 --> 01:12:07.950 because Verizon, or whoever made the phone book, 01:12:07.950 --> 01:12:11.380 actually did all of the upfront work for me. 01:12:11.380 --> 01:12:12.690 So here's a question then. 01:12:12.690 --> 01:12:16.200 Suppose that you have a whole bunch of random values, 01:12:16.200 --> 01:12:19.020 and you want to search for something specific among those values. 01:12:19.020 --> 01:12:22.950 Mike Smith, number 50, whatever it is you're looking for, 01:12:22.950 --> 01:12:27.960 should you sort those values first before searching? 01:12:27.960 --> 01:12:31.030 Should you sort the values first? 01:12:31.030 --> 01:12:34.200 Well, I would wager you should sort them if you're 01:12:34.200 --> 01:12:36.730 going to be searching over them pretty darn often. 01:12:36.730 --> 01:12:37.230 Right. 01:12:37.230 --> 01:12:39.240 Because it might be painful, might be boring, 01:12:39.240 --> 01:12:42.480 might be tedious to actually sort those values either manually, 01:12:42.480 --> 01:12:46.170 or algorithmically with a computer, and a computer program. 01:12:46.170 --> 01:12:48.760 But it'll probably pay off over time. 01:12:48.760 --> 01:12:52.080 So there's this notion of amortization of the cost over time, whereby yeah, it 01:12:52.080 --> 01:12:53.746 might take you a decent amount of steps. 01:12:53.746 --> 01:12:56.830 Quadratic, seconds, minutes, whatever you want to measure it in. 01:12:56.830 --> 01:12:59.050 But, it's going to pay off in the long run. 01:12:59.050 --> 01:12:59.100 Of 01:12:59.100 --> 01:13:00.974 Course, if you add data, you're going to need 01:13:00.974 --> 01:13:02.710 to keep that new data sorted as well. 01:13:02.710 --> 01:13:04.202 But again, it's this trade off. 01:13:04.202 --> 01:13:07.160 However, consider a scenario where you just have a whole bunch of data, 01:13:07.160 --> 01:13:09.120 it's in random order for whatever reason, 01:13:09.120 --> 01:13:10.705 and you just want to find Mike Smith. 01:13:10.705 --> 01:13:12.330 Or you just want to find the number 50. 01:13:12.330 --> 01:13:14.589 And after that, you do not care about the data. 01:13:14.589 --> 01:13:16.380 You're never going to look up someone else. 01:13:16.380 --> 01:13:18.213 You're never going to look up another value. 01:13:18.213 --> 01:13:20.940 Maybe then, in that case, you should just blindly 01:13:20.940 --> 01:13:23.190 go through it brute force, so to speak. 01:13:23.190 --> 01:13:26.760 Left or right, in some sense, or linearly, as opposed 01:13:26.760 --> 01:13:29.730 to trying to do anything clever like we did with the phone book. 01:13:29.730 --> 01:13:34.290 Because it's faster to just brute force your way through the solution, 01:13:34.290 --> 01:13:38.550 than doing some upfront sophistication just to improve the subsequent result. 01:13:38.550 --> 01:13:40.290 So again, it's a trade off. 01:13:40.290 --> 01:13:43.950 It's a cost benefit analysis that you need to decide, ultimately, 01:13:43.950 --> 01:13:45.300 on what is most important. 01:13:45.300 --> 01:13:51.420 Which resource, time, space, money, people, is most important to you. 01:13:51.420 --> 01:13:54.150 So in all of these examples thus far have we 01:13:54.150 --> 01:13:57.180 assumed that the data is contiguous. 01:13:57.180 --> 01:13:59.079 Back to back to back with other values. 01:13:59.079 --> 01:14:01.620 And indeed, that's exactly what the music stands represented, 01:14:01.620 --> 01:14:03.930 were these fixed locations in memory. 01:14:03.930 --> 01:14:07.320 Bytes in your computer's ram, so to speak, that could have values. 01:14:07.320 --> 01:14:11.070 But you had to physically move those values from one location to another 01:14:11.070 --> 01:14:13.110 if you wanted to sort the values therein. 01:14:13.110 --> 01:14:15.699 you couldn't just kind of insert a new music stand, 01:14:15.699 --> 01:14:18.240 and push the others aside, because that's simply not allowed. 01:14:18.240 --> 01:14:20.550 That's not the metaphor in question. 01:14:20.550 --> 01:14:23.010 Those music stands were fixed just as the locations 01:14:23.010 --> 01:14:26.030 in memory of your computer are as well. 01:14:26.030 --> 01:14:31.050 And as it turns out, just as we drew a picture before wherein we had some 01:14:31.050 --> 01:14:36.210 sequence of back to back values in memory, turns out that in programming, 01:14:36.210 --> 01:14:40.080 actually using languages that aren't just pseudo code, but actual code, 01:14:40.080 --> 01:14:41.430 Java and c++ and others. 01:14:41.430 --> 01:14:45.900 Still, there is very often something called an array, or a list. 01:14:45.900 --> 01:14:50.340 A data structure that gives you the appearance of having values 01:14:50.340 --> 01:14:52.140 back to back to back to back. 01:14:52.140 --> 01:14:54.810 And in some languages, these values are indeed 01:14:54.810 --> 01:14:59.310 next to each other in terms of their bits inside of your computer's memory. 01:14:59.310 --> 01:15:01.590 But there's a problem with a world like this. 01:15:01.590 --> 01:15:05.460 Because if you put in a whole bunch of values up front, as I did earlier, 01:15:05.460 --> 01:15:08.200 and you want to actually insert some new value, 01:15:08.200 --> 01:15:10.680 where do you actually put that value? 01:15:10.680 --> 01:15:15.540 For instance, earlier we had one and 50 and 51 and 61 and 109 and 121. 01:15:15.540 --> 01:15:20.820 Suppose I want to put the number 55 in this sequence of numbers. 01:15:20.820 --> 01:15:25.320 And better still, I want to keep the whole list sorted. 01:15:25.320 --> 01:15:26.190 Well dammit. 01:15:26.190 --> 01:15:28.800 There's really not room for 55 in there. 01:15:28.800 --> 01:15:32.100 I know where it should go numerically, it should go right here, of course, 01:15:32.100 --> 01:15:34.320 between the 51 and the 61. 01:15:34.320 --> 01:15:35.320 But there's no room. 01:15:35.320 --> 01:15:38.070 Or I could put it arbitrarily over here, but then it's not sorted. 01:15:38.070 --> 01:15:39.778 I could put it, well I can't put it there 01:15:39.778 --> 01:15:42.130 because the dot dot dot remember from earlier, suggested 01:15:42.130 --> 01:15:43.380 that something else was there. 01:15:43.380 --> 01:15:45.300 Something else from your program or computer 01:15:45.300 --> 01:15:46.836 was already using that location. 01:15:46.836 --> 01:15:49.210 Now I could put it maybe below, but that's kind of weird. 01:15:49.210 --> 01:15:52.710 The whole point here is this contiguousness. 01:15:52.710 --> 01:15:55.500 And indeed, with the music stands a moment ago, 01:15:55.500 --> 01:15:58.380 was it advantageous for me to be able to take one step 01:15:58.380 --> 01:15:59.910 and be at another music stand? 01:15:59.910 --> 01:16:01.950 One more step be at another music stand. 01:16:01.950 --> 01:16:04.830 I didn't have to walk all around stage looking for my values, 01:16:04.830 --> 01:16:06.705 because they were indeed back to back to back 01:16:06.705 --> 01:16:10.540 and that made things very efficient and predictably close to one another. 01:16:10.540 --> 01:16:11.520 So what are my options? 01:16:11.520 --> 01:16:13.460 Like surely a computer can do this. 01:16:13.460 --> 01:16:16.630 It's not the case that computers are so dumb that once you write your values 01:16:16.630 --> 01:16:19.860 you can't solve any new problems. 01:16:19.860 --> 01:16:22.470 So what are my options? 01:16:22.470 --> 01:16:25.700 I could put the 55 here at the end, but then I'm 01:16:25.700 --> 01:16:27.860 going to have to reshuffle all of the numbers. 01:16:27.860 --> 01:16:31.009 Or maybe I ask the computer for more memory altogether, 01:16:31.009 --> 01:16:33.050 and you know what, I re do this, and I say let me 01:16:33.050 --> 01:16:40.400 put one here, 50 here, 51 here, 55 here, then 61, 109, and 121. 01:16:40.400 --> 01:16:43.190 In other words, why don't I just use different memory? 01:16:43.190 --> 01:16:46.460 Different bytes in my computer's memory, but that feels kind of lame, 01:16:46.460 --> 01:16:50.100 because now I need twice as much memory, at least temporarily. 01:16:50.100 --> 01:16:52.250 So I can kind of copy the old values into the new, 01:16:52.250 --> 01:16:54.680 but leave room for that new value. 01:16:54.680 --> 01:16:58.070 So that seems a little time consuming, and tedious certainly. 01:16:58.070 --> 01:17:00.900 And indeed, it would be for a computer as well. 01:17:00.900 --> 01:17:03.710 But maybe there's another solution altogether. 01:17:03.710 --> 01:17:06.530 And here too, we're experiencing yet another trade off. 01:17:06.530 --> 01:17:09.830 The contiguousness of my memory has thus far been an advantage. 01:17:09.830 --> 01:17:13.070 Because again, each of my music stands just one step away. 01:17:13.070 --> 01:17:16.490 In the case of actual ram, each number is just one byte away, 01:17:16.490 --> 01:17:20.510 and that lends itself to again, random access, ergo random access memory. 01:17:20.510 --> 01:17:23.150 But it's kind of painting me into a corner, 01:17:23.150 --> 01:17:26.840 because I'm finding that now I can no longer really 01:17:26.840 --> 01:17:30.060 add new values without incurring significant cost. 01:17:30.060 --> 01:17:35.180 And by significant cost I mean having to duplicate the entire array of memory 01:17:35.180 --> 01:17:38.840 into a new location, leaving one additional space for that new value, 01:17:38.840 --> 01:17:43.700 which is going to take as many steps as there are values in the original array. 01:17:43.700 --> 01:17:46.940 So you know what, maybe I take an even older school approach, 01:17:46.940 --> 01:17:49.730 and maybe I do something proactive. 01:17:49.730 --> 01:17:50.960 A little defensive. 01:17:50.960 --> 01:17:57.590 I'll put one there, and 50 here, and 51 here, and 61 and then 109, 01:17:57.590 --> 01:17:59.510 and then 121 and so forth. 01:17:59.510 --> 01:18:01.970 And wow, this was clever of me, because now I 01:18:01.970 --> 01:18:04.620 left myself little gaps in my values. 01:18:04.620 --> 01:18:08.750 So that I still get the predictability of every value is now two steps away, 01:18:08.750 --> 01:18:13.060 but now I can fit 55 in here. 01:18:13.060 --> 01:18:16.490 Well, clever as you might initially think that, now we've 01:18:16.490 --> 01:18:17.900 created kind of a problem. 01:18:17.900 --> 01:18:22.730 Because most values are two bytes away from each other. 01:18:22.730 --> 01:18:25.670 But now there's this weirdness where 55 is only one byte away 01:18:25.670 --> 01:18:26.490 from its neighbors. 01:18:26.490 --> 01:18:29.330 So that's really just complicating things now. 01:18:29.330 --> 01:18:32.570 I can no longer predictably take just one step at a time. 01:18:32.570 --> 01:18:34.490 Sometimes I take two, sometimes I take one, 01:18:34.490 --> 01:18:39.350 and that just feels like it's devolving into a very confusing bug prone 01:18:39.350 --> 01:18:40.760 situation already. 01:18:40.760 --> 01:18:44.679 And in fact, if you ever saw, or maybe grew up with the basic programming 01:18:44.679 --> 01:18:46.970 language, back in the day, you didn't number your lines 01:18:46.970 --> 01:18:48.840 of code zero, one, two, three, four. 01:18:48.840 --> 01:18:51.170 If you numbered of them at all, you instead 01:18:51.170 --> 01:18:55.610 did something like line ten, line 20, line 30, line 40. 01:18:55.610 --> 01:18:59.630 Which wonderfully left your room for like nine more additions later on, 01:18:59.630 --> 01:19:01.670 without having to renumber everything. 01:19:01.670 --> 01:19:04.790 Of course now, programs just number or lines automatically for us. 01:19:04.790 --> 01:19:07.530 And they're not even strictly necessary, those line numbers. 01:19:07.530 --> 01:19:11.480 So we've tried this, and it's just, it's not really a good solution. 01:19:11.480 --> 01:19:15.590 Better would be I'll claim some dynamism. 01:19:15.590 --> 01:19:21.410 What if instead I consider my computer's memory 01:19:21.410 --> 01:19:25.080 still to be this grid of locations. 01:19:25.080 --> 01:19:26.480 But let me do this. 01:19:26.480 --> 01:19:28.530 Let me propose that the first number. 01:19:28.530 --> 01:19:31.712 I'm going to put in, maybe is here, and just to keep things prettier I'm not 01:19:31.712 --> 01:19:33.170 going to draw the whole grid again. 01:19:33.170 --> 01:19:35.810 But let's assume that one byte I'm using is right there, 01:19:35.810 --> 01:19:37.460 and I put the number one in there. 01:19:37.460 --> 01:19:41.660 And then eventually, I decide to add that second number. 01:19:41.660 --> 01:19:45.410 And I go ahead and add the number 50. 01:19:45.410 --> 01:19:51.110 And the number 50 just so happens to end up, oh I don't know, over here. 01:19:51.110 --> 01:19:52.480 And it's farther away in memory. 01:19:52.480 --> 01:19:54.815 Maybe enough time has passed that the bunch 01:19:54.815 --> 01:19:57.940 of memory in between those two values is being used for some other purpose. 01:19:57.940 --> 01:19:59.140 And that's OK. 01:19:59.140 --> 01:20:02.050 But the 50 and the one are no longer next to each 01:20:02.050 --> 01:20:06.820 other, or therefore pictorially, or really related, 01:20:06.820 --> 01:20:09.250 unless I somehow interconnect them. 01:20:09.250 --> 01:20:11.890 So for now, let me go ahead and draw an arrow. 01:20:11.890 --> 01:20:15.070 And let me kind of weave together my values. 01:20:15.070 --> 01:20:19.120 Much like popcorn on a thread, or any number of other metaphors 01:20:19.120 --> 01:20:21.100 where you're linking things together. 01:20:21.100 --> 01:20:24.370 Like with a chain, for instance, can you make a list like this? 01:20:24.370 --> 01:20:27.010 Because my other numbers, suppose that more time passes. 01:20:27.010 --> 01:20:31.150 I insert a third value like 51, just happens to be over there, whatever. 01:20:31.150 --> 01:20:32.900 It's not contiguous, which is unfortunate. 01:20:32.900 --> 01:20:37.750 But that's OK, so long as I somehow link these together like this. 01:20:37.750 --> 01:20:41.560 And then maybe that next number here, 61, get lucky and it's pretty close. 01:20:41.560 --> 01:20:42.610 So it's over here. 01:20:42.610 --> 01:20:45.850 But we still need an arrow to this one here. 01:20:45.850 --> 01:20:53.500 And then maybe we have maybe 121 and 109, each of which 01:20:53.500 --> 01:20:55.180 is in some different locations. 01:20:55.180 --> 01:20:57.520 So we might need this arrow here. 01:20:57.520 --> 01:20:59.300 And this arrow there. 01:20:59.300 --> 01:21:01.490 So it's kind of all over the place. 01:21:01.490 --> 01:21:03.430 And if you ever play the game growing up, 01:21:03.430 --> 01:21:05.138 it's kind of like chutes and ladders now, 01:21:05.138 --> 01:21:08.140 you kind of have to like follow the chutes to get from one number 01:21:08.140 --> 01:21:09.320 to another. 01:21:09.320 --> 01:21:10.090 But that's OK. 01:21:10.090 --> 01:21:13.850 In fact, it's actually quite beautiful, my handwriting aside. 01:21:13.850 --> 01:21:15.610 It's not quite as efficient though. 01:21:15.610 --> 01:21:20.260 Because it turns out that if you're at one of the numbers in this list, 01:21:20.260 --> 01:21:24.240 and you want to jump to another, you can get to the next number pretty fast. 01:21:24.240 --> 01:21:26.340 If I'm at one, I can get to 50 pretty quickly. 01:21:26.340 --> 01:21:27.840 I just follow that arrow somehow. 01:21:27.840 --> 01:21:30.173 I don't know how it works in memory, and we don't really 01:21:30.173 --> 01:21:31.625 need to know how it works. 01:21:31.625 --> 01:21:34.750 We can kind of abstract that away for the sake of discussion at the moment. 01:21:34.750 --> 01:21:38.350 But I can get to 50 based on this picture alone pretty darn directly. 01:21:38.350 --> 01:21:41.410 But I can't really get to like 109 pretty quickly. 01:21:41.410 --> 01:21:44.770 If I'm starting at one, I want to get to 109. 01:21:44.770 --> 01:21:49.180 previously, I could just take one two, three, four steps away, four bytes away 01:21:49.180 --> 01:21:52.420 and I could just take a step that's like four times as big as usual 01:21:52.420 --> 01:21:54.040 and I'm instantly there. 01:21:54.040 --> 01:21:57.234 But these arrows seem to suggest that the memory could be anywhere. 01:21:57.234 --> 01:21:58.900 And so it's almost like following a map. 01:21:58.900 --> 01:22:01.810 If I'm at one, all right let me go find 50. 01:22:01.810 --> 01:22:04.180 Once I found 50, I can pick up another map, 01:22:04.180 --> 01:22:08.260 and now I can follow another arrow to 51, which is maybe over here. 01:22:08.260 --> 01:22:09.380 Now I pick up another map. 01:22:09.380 --> 01:22:10.780 Oh here's 61. 01:22:10.780 --> 01:22:13.810 OK now this leads me to, Ah here is 109. 01:22:13.810 --> 01:22:16.300 It wasn't as simple as just one big step that's 01:22:16.300 --> 01:22:21.580 four times as big as was possible in the case of contiguous memory. 01:22:21.580 --> 01:22:24.506 Now, I kind of have to weave my way through. 01:22:24.506 --> 01:22:25.630 That actually kind of hurt. 01:22:25.630 --> 01:22:29.630 Weave my way through my computer's memory to get where I'm going. 01:22:29.630 --> 01:22:30.940 So it's a trade off, right. 01:22:30.940 --> 01:22:33.820 Because now I have perfect dynamism. 01:22:33.820 --> 01:22:36.820 No matter where there is available memory in my computer, 01:22:36.820 --> 01:22:39.590 over there, over here, over here, over here, I can use it. 01:22:39.590 --> 01:22:42.950 And I can just kind of stitch together this data structure, 01:22:42.950 --> 01:22:44.740 the structure for my data. 01:22:44.740 --> 01:22:50.630 I have eliminated the gotcha that my data structure is a fixed size. 01:22:50.630 --> 01:22:54.130 So now if I want to add another value, suppose I want to add 55. 01:22:54.130 --> 01:22:58.120 OK, suppose that 55 happens to have some available space here. 01:22:58.120 --> 01:23:01.690 You know what I can do here, I can just get rid of this arrow, 01:23:01.690 --> 01:23:07.720 and then I can go ahead and stitch this in like this. 01:23:07.720 --> 01:23:08.391 No big deal. 01:23:08.391 --> 01:23:10.390 I don't have to touch any of the other elements. 01:23:10.390 --> 01:23:12.100 But think back to the array. 01:23:12.100 --> 01:23:13.990 The array being the rectangular example where 01:23:13.990 --> 01:23:15.448 everybody was back to back to back. 01:23:15.448 --> 01:23:19.480 That I had to make room for things, or duplicate all of it. 01:23:19.480 --> 01:23:23.410 Had to do more work here, just to allocate space for your 55, 01:23:23.410 --> 01:23:25.870 and then go ahead and just update two of these arrows, 01:23:25.870 --> 01:23:30.650 or pointers, or references as that might be called in different languages. 01:23:30.650 --> 01:23:32.510 So that's pretty powerful. 01:23:32.510 --> 01:23:34.780 But again I've lost the random access, which 01:23:34.780 --> 01:23:38.870 means I can't do things like binary search anymore. 01:23:38.870 --> 01:23:41.080 Which we're predicated on simple arithmetic. 01:23:41.080 --> 01:23:44.110 Divide by two, divide by two, divide by two, Maybe round, 01:23:44.110 --> 01:23:47.410 but divide by two each time. 01:23:47.410 --> 01:23:48.580 And these arrows. 01:23:48.580 --> 01:23:52.480 Kind of abstracting maybe too generously here. 01:23:52.480 --> 01:23:55.420 These arrows are going to cost me some memory too. 01:23:55.420 --> 01:23:58.900 Now I've drawn them prettily as arrows on the screen, 01:23:58.900 --> 01:24:01.420 but those actually themselves are some kind of value. 01:24:01.420 --> 01:24:06.340 And it turns out that those arrows themselves take up space, take up bits. 01:24:06.340 --> 01:24:09.730 So I've kind of doubled, let's say, the amount of space 01:24:09.730 --> 01:24:11.440 necessary to store this data. 01:24:11.440 --> 01:24:13.881 Because I need to somehow store and memory those arrows. 01:24:13.881 --> 01:24:15.880 Underneath the hood those arrows are effectively 01:24:15.880 --> 01:24:17.350 stored as numbers themselves. 01:24:17.350 --> 01:24:20.080 Which is to say, they take up at least a byte or more. 01:24:20.080 --> 01:24:24.402 In fact, on most systems, they take up four or eight bytes still. 01:24:24.402 --> 01:24:26.110 So it depends on what's important to you. 01:24:26.110 --> 01:24:29.670 Do you want to be able to search the data super efficiently 01:24:29.670 --> 01:24:32.970 and actually have random access, and do something like binary search 01:24:32.970 --> 01:24:36.375 and get that divide and conquer upside? 01:24:36.375 --> 01:24:39.000 Or do you want to have dynamism, and be able to grow and shrink 01:24:39.000 --> 01:24:43.950 the data structure super fast, but pay a price in terms of memory as well. 01:24:43.950 --> 01:24:45.960 And use more of those bits. 01:24:45.960 --> 01:24:48.480 It really quite depends. 01:24:48.480 --> 01:24:51.390 Now these aren't the only data structures at our disposal. 01:24:51.390 --> 01:24:54.630 We can use not just arrays, not just linked lists, 01:24:54.630 --> 01:24:56.880 as we'll call that last one, whereby we have 01:24:56.880 --> 01:25:00.450 links tying these lists items together. 01:25:00.450 --> 01:25:02.320 But we have maybe tree structures too. 01:25:02.320 --> 01:25:05.760 In fact, we think back to like your family tree that you 01:25:05.760 --> 01:25:07.437 might have made in grade school. 01:25:07.437 --> 01:25:09.270 You might have drawn a little something that 01:25:09.270 --> 01:25:12.510 might have grandma or grandpa at the top of the tree, 01:25:12.510 --> 01:25:15.330 and then their children might be down here, 01:25:15.330 --> 01:25:18.610 and then these children might be down here, and so forth. 01:25:18.610 --> 01:25:20.820 Well if we generalize away from a fam-- 01:25:20.820 --> 01:25:21.730 OK ignore that one. 01:25:21.730 --> 01:25:24.060 If we generalize away from a family tree, 01:25:24.060 --> 01:25:27.150 it turns out that this tree data structure is another kind 01:25:27.150 --> 01:25:29.610 of incarnation of that previous idea. 01:25:29.610 --> 01:25:34.080 A linked list doesn't have to just be linked from left to right, so to speak, 01:25:34.080 --> 01:25:36.420 even if it's kind of swooping all over the screen. 01:25:36.420 --> 01:25:38.430 It doesn't have to have a start and an end. 01:25:38.430 --> 01:25:42.870 It can be a tree structure, whereby your arrows actually, much 01:25:42.870 --> 01:25:45.710 like a branch and a program, can go in different directions. 01:25:45.710 --> 01:25:49.920 So each of these edges in this graph actually are directional. 01:25:49.920 --> 01:25:51.790 They take you from one place to another. 01:25:51.790 --> 01:25:56.380 You can imagine using a data structure like this to store values as well. 01:25:56.380 --> 01:26:02.320 For instance, suppose that I wanted to store numbers in this data structure. 01:26:02.320 --> 01:26:04.320 Instead of using squares, I'm just doing circles 01:26:04.320 --> 01:26:05.778 now just because it's conventional. 01:26:05.778 --> 01:26:06.990 Doesn't mean anything else. 01:26:06.990 --> 01:26:07.870 Anything special. 01:26:07.870 --> 01:26:11.290 These are still just bytes underneath the hood, or ultimately bits. 01:26:11.290 --> 01:26:16.320 Now suppose I want to store the number two here, and one here, and three 01:26:16.320 --> 01:26:20.490 here, and five here, and six here, and seven here. 01:26:20.490 --> 01:26:23.250 This is all very deliberate. 01:26:23.250 --> 01:26:25.890 Because notice the pattern that I've kind of adopted. 01:26:25.890 --> 01:26:29.660 Four is at the top, in the so-called root of the tree. 01:26:29.660 --> 01:26:33.960 What do you notice about all of the values to the left of the four? 01:26:33.960 --> 01:26:37.560 All of its left descendants so to speak, to borrow that family tree 01:26:37.560 --> 01:26:39.530 nomenclature. 01:26:39.530 --> 01:26:41.990 One, and two, and three. 01:26:41.990 --> 01:26:46.400 Curiously, all three of those are smaller than the number four. 01:26:46.400 --> 01:26:49.370 What about all of its right descendants, down this branch? 01:26:49.370 --> 01:26:51.710 Six and five and six and seven. 01:26:51.710 --> 01:26:53.510 Those are all bigger than four. 01:26:53.510 --> 01:26:55.400 That's kind of neat. 01:26:55.400 --> 01:26:56.570 How about here? 01:26:56.570 --> 01:26:59.360 Two, this is kind of a mini tree, if you will. 01:26:59.360 --> 01:27:01.730 This is a new route, if you ignore everything above it. 01:27:01.730 --> 01:27:04.059 Two has two children, left and right. 01:27:04.059 --> 01:27:06.350 Notice that the one is less than the two, and the three 01:27:06.350 --> 01:27:07.310 is more than the two. 01:27:07.310 --> 01:27:09.930 So this is not a coincidence that I drew it like this. 01:27:09.930 --> 01:27:13.450 Similarly, if you think of this as a tree, just those three nodes below it, 01:27:13.450 --> 01:27:16.430 six is bigger than five, six is less than seven. 01:27:16.430 --> 01:27:18.050 So there's a pattern. 01:27:18.050 --> 01:27:24.200 So we can actually reclaim some of the capabilities of divide 01:27:24.200 --> 01:27:26.330 and conquer by laying out our data, still 01:27:26.330 --> 01:27:29.690 with these arrows, these pointers or references if you will. 01:27:29.690 --> 01:27:33.110 But not just doing it linearly, or swiveling all over the screen, 01:27:33.110 --> 01:27:37.430 but from one start node to an end node. 01:27:37.430 --> 01:27:42.110 Where node is a fancy word for these squares or these circles. 01:27:42.110 --> 01:27:45.870 What if we instead allow the user to go in two different directions 01:27:45.870 --> 01:27:46.941 when searching for data? 01:27:46.941 --> 01:27:49.190 And suppose we leverage that same phone book principle 01:27:49.190 --> 01:27:52.700 where the smaller values are this way, and the bigger values are that way. 01:27:52.700 --> 01:27:54.470 What does that afford us? 01:27:54.470 --> 01:27:59.120 Well, this binary search tree, if you will, to give it a fancy name, 01:27:59.120 --> 01:28:02.540 actually gives us back a logarithmic running time. 01:28:02.540 --> 01:28:05.180 Much like dividing and conquering a phone book. 01:28:05.180 --> 01:28:09.050 Because indeed, if I start at the four and I'm looking for seven, 01:28:09.050 --> 01:28:13.310 and I immediately realize four is here, which means seven must be to the right 01:28:13.310 --> 01:28:18.800 if it's present at all, I can pretty much chop off the half of the problem 01:28:18.800 --> 01:28:20.240 that I know seven is not in. 01:28:20.240 --> 01:28:22.698 So it's like throwing away the left half of the phone book. 01:28:22.698 --> 01:28:25.850 I can ignore almost half of the nodes in the tree 01:28:25.850 --> 01:28:29.330 by just snipping off that left branch. 01:28:29.330 --> 01:28:31.380 Can't be that easy. 01:28:31.380 --> 01:28:32.800 Got to be a trade off. 01:28:32.800 --> 01:28:35.930 What it that trade off here? 01:28:35.930 --> 01:28:38.240 Well, from the looks of this picture, pretty 01:28:38.240 --> 01:28:43.220 though it is, relatively speaking, it looks like now each of my nodes, 01:28:43.220 --> 01:28:47.150 each of my values has not just one arrow associated with it potentially, 01:28:47.150 --> 01:28:49.250 but as many as two. 01:28:49.250 --> 01:28:52.740 And that means we're spending twice as much space as before. 01:28:52.740 --> 01:28:56.030 So whereas in our array, we use just one byte per value, 01:28:56.030 --> 01:28:59.660 or four bytes per value, or some fixed number of bytes per value, 01:28:59.660 --> 01:29:01.670 in my linked list example I had to double 01:29:01.670 --> 01:29:03.470 that because I had to also store-- 01:29:03.470 --> 01:29:05.060 using bytes in some form-- 01:29:05.060 --> 01:29:05.840 those arrows. 01:29:05.840 --> 01:29:07.920 Those pointers, or references as they're called. 01:29:07.920 --> 01:29:11.960 But now in this darn tree, I need three times as much data, 01:29:11.960 --> 01:29:14.730 because each of my nodes might have a left child so to speak, 01:29:14.730 --> 01:29:16.370 or a right child. 01:29:16.370 --> 01:29:18.230 So again it's a trade off. 01:29:18.230 --> 01:29:23.242 And the prevailing trade off here, thus far, seems to be if you want less time, 01:29:23.242 --> 01:29:24.950 you're going to have to spend more space. 01:29:24.950 --> 01:29:29.510 Or rather, less time, I'm getting the visual metaphor wrong here. 01:29:29.510 --> 01:29:33.290 If you want to spend less time, you're going to have to spend more space. 01:29:33.290 --> 01:29:38.180 And if you want to save space, you might have to tolerate a little more time. 01:29:38.180 --> 01:29:39.890 But can we do better? 01:29:39.890 --> 01:29:44.960 Indeed, can we combine some of the best features of multiple data structures 01:29:44.960 --> 01:29:48.110 in order to achieve something better still? 01:29:48.110 --> 01:29:48.890 Well, sort of. 01:29:48.890 --> 01:29:52.400 It turns out that there exists a very popular data 01:29:52.400 --> 01:29:54.710 structure called a hash table. 01:29:54.710 --> 01:29:57.950 And a hash table is a data structure that a computer scientist 01:29:57.950 --> 01:30:01.940 will use when he or she really wants to store a dynamic amount of data. 01:30:01.940 --> 01:30:03.350 It might grow or might shrink. 01:30:03.350 --> 01:30:06.740 But they also want to search that data pretty efficiently, 01:30:06.740 --> 01:30:10.830 and ideally be able to find data instantly, in constant time, 01:30:10.830 --> 01:30:11.730 so to speak. 01:30:11.730 --> 01:30:15.320 So you'll often see a hash table implemented, really 01:30:15.320 --> 01:30:18.350 with an array something like this, and I've drawn it vertically just 01:30:18.350 --> 01:30:19.790 to make the picture prettier. 01:30:19.790 --> 01:30:22.040 But again, these are just artists depictions. 01:30:22.040 --> 01:30:24.920 Bad artists depictions of what's going on in memory. 01:30:24.920 --> 01:30:27.065 It's still just an array of back to back values. 01:30:27.065 --> 01:30:28.940 But the values now in the array are not going 01:30:28.940 --> 01:30:31.670 to be the numbers I care about, or the words, or whatever. 01:30:31.670 --> 01:30:36.260 It's actually going to be arrows, which again we've stipulated take space. 01:30:36.260 --> 01:30:38.720 And each of these arrows is going to point 01:30:38.720 --> 01:30:42.380 to what we described as a linked list. 01:30:42.380 --> 01:30:46.880 So it turns out that a hash table will often 01:30:46.880 --> 01:30:48.830 look a little something like this. 01:30:48.830 --> 01:30:52.640 A combination of an array, and a linked list 01:30:52.640 --> 01:30:57.410 where those linked lists might be not present at all, might be short, 01:30:57.410 --> 01:31:00.410 might be long, totally depends. 01:31:00.410 --> 01:31:05.670 But we decide where to put our values based on some property of the value. 01:31:05.670 --> 01:31:09.260 So for instance, if this array weren't this big, but maybe 01:31:09.260 --> 01:31:11.810 had 26 different locations in it. 01:31:11.810 --> 01:31:15.350 And suppose that we're no longer storing numbers, we're storing names. 01:31:15.350 --> 01:31:20.960 Well every name in the English alphabet starts with an A or B or C 01:31:20.960 --> 01:31:24.500 or dot dot dot a Z. One of 26 possibilities. 01:31:24.500 --> 01:31:29.720 So if my hash table has an array of size 26, 01:31:29.720 --> 01:31:33.770 then maybe, if I'm storing names that start with A, 01:31:33.770 --> 01:31:37.250 I could just put them in this linked list here, the first one. 01:31:37.250 --> 01:31:39.590 And if they start with B, they'll go in the second one. 01:31:39.590 --> 01:31:41.190 And if they start with C, the third. 01:31:41.190 --> 01:31:44.630 D the fourth, Z the very last. 01:31:44.630 --> 01:31:49.760 And what you'll find in this case is partial optimization. 01:31:49.760 --> 01:31:53.120 It's not instantaneous to find someone with a given name. 01:31:53.120 --> 01:31:55.580 If there's a lot of people who's names start with A, 01:31:55.580 --> 01:31:59.000 you might still have to look through all of the A names linearly 01:31:59.000 --> 01:32:00.350 through that list. 01:32:00.350 --> 01:32:04.280 But at least you only have to look through 126th 01:32:04.280 --> 01:32:06.770 of the possible values in your data structure, 01:32:06.770 --> 01:32:08.240 assuming a uniform distribution. 01:32:08.240 --> 01:32:11.570 Of course, that's not fair, there's probably fewer names that start with Z, 01:32:11.570 --> 01:32:13.880 fewer names that start with Q, maybe a lot 01:32:13.880 --> 01:32:16.610 that start with A or D or other such letters. 01:32:16.610 --> 01:32:19.040 So your chains might be of variable length, 01:32:19.040 --> 01:32:20.690 and maybe that's not the best design. 01:32:20.690 --> 01:32:22.970 So maybe we should think a little harder about this. 01:32:22.970 --> 01:32:27.180 So this principle of bucketizing things, and putting values in their place, 01:32:27.180 --> 01:32:29.750 and using that as a stepping stone to getting 01:32:29.750 --> 01:32:32.710 to a complete solution-- in this case of sorted order-- 01:32:32.710 --> 01:32:36.560 is really useful not just in these increasingly complex examples, 01:32:36.560 --> 01:32:38.190 but even in the real world. 01:32:38.190 --> 01:32:40.880 So for instance, if you're a fan of playing cards, 01:32:40.880 --> 01:32:43.100 you might have a deck of cards at home. 01:32:43.100 --> 01:32:47.000 And those cards of course, have different values or numbers 01:32:47.000 --> 01:32:48.650 on them, and also different suits. 01:32:48.650 --> 01:32:50.990 Clubs and spades and diamonds and hearts. 01:32:50.990 --> 01:32:53.870 And you know, sometimes you might want to either shuffle the deck, 01:32:53.870 --> 01:32:56.280 or really un-shuffle it and actually sort it. 01:32:56.280 --> 01:32:58.640 And if you did want to, sort of obsessively like me, 01:32:58.640 --> 01:33:00.980 want to sort your deck of cards, well you 01:33:00.980 --> 01:33:03.500 could just go through it one card at a time. 01:33:03.500 --> 01:33:07.010 And then try to find or select or maybe bubble up the values, 01:33:07.010 --> 01:33:08.480 like our algorithms before. 01:33:08.480 --> 01:33:11.190 But most of us probably use a bit of intuition. 01:33:11.190 --> 01:33:14.169 So if I see a spade here, I might put it in one pile, 01:33:14.169 --> 01:33:16.460 and a heart, I'm going to put that in a different pile. 01:33:16.460 --> 01:33:18.470 A diamond, different pile still. 01:33:18.470 --> 01:33:19.880 A spade will go up here. 01:33:19.880 --> 01:33:21.700 A club now goes over here. 01:33:21.700 --> 01:33:25.640 And as I continue this pattern, I'm kind of dividing the problem, 01:33:25.640 --> 01:33:27.140 albeit in a different way. 01:33:27.140 --> 01:33:30.020 I now have, instead of a 52 card problem, 01:33:30.020 --> 01:33:33.350 I'm eventually going to have four 13-card problems. 01:33:33.350 --> 01:33:35.480 But how am I getting to that point? 01:33:35.480 --> 01:33:37.040 Well, I'm bucketizing things. 01:33:37.040 --> 01:33:40.940 Not physical buckets, but there's like four piles here of cards. 01:33:40.940 --> 01:33:42.950 But what really am I doing? 01:33:42.950 --> 01:33:45.500 I'm really hashing these values. 01:33:45.500 --> 01:33:48.160 I am looking at each card as input. 01:33:48.160 --> 01:33:52.310 I am passing it through a sort of mental function, a mental black box if you 01:33:52.310 --> 01:33:54.590 will, that outputs a value. 01:33:54.590 --> 01:34:00.140 Either bucket one, or two, or three, or four, or more specifically spade, 01:34:00.140 --> 01:34:02.690 or diamond, or heart, or club. 01:34:02.690 --> 01:34:04.269 That's kind of my hash function. 01:34:04.269 --> 01:34:06.060 So if you've ever done something like that, 01:34:06.060 --> 01:34:09.510 where you're trying to clean up a mess, or you're trying to sort some data, 01:34:09.510 --> 01:34:11.090 or organize your cards. 01:34:11.090 --> 01:34:14.480 Once you start bucketizing in this way, you are hashing values. 01:34:14.480 --> 01:34:17.090 You're taking as input some value, maybe a card, 01:34:17.090 --> 01:34:19.130 and producing as output some identifier. 01:34:19.130 --> 01:34:21.200 Similarly, with our example of names, if I've 01:34:21.200 --> 01:34:24.382 got a whole bunch of names, each of which might start with A through Z, 01:34:24.382 --> 01:34:26.090 each time you hand me a name, I'm looking 01:34:26.090 --> 01:34:28.310 at the very first letter in that person's name, 01:34:28.310 --> 01:34:32.390 and I'm deciding A bucket, Z bucket, D bucket, C bucket or so forth. 01:34:32.390 --> 01:34:33.590 C should be over there. 01:34:33.590 --> 01:34:37.280 And deciding, based on the input, what the output should be. 01:34:37.280 --> 01:34:41.060 So again, here to, this sort of intuitive principle 01:34:41.060 --> 01:34:44.900 can very quickly be applied to something much more sophisticated. 01:34:44.900 --> 01:34:48.740 Again, compare and contrast the sort of sophistication 01:34:48.740 --> 01:34:50.750 of the before and the after. 01:34:50.750 --> 01:34:55.290 This is odds are pretty darn intuitive, if not something very familiar to you. 01:34:55.290 --> 01:34:57.680 This might still look a bit like Greek. 01:34:57.680 --> 01:35:01.430 Certainly pretty arcane versus some simple playing cards. 01:35:01.430 --> 01:35:08.090 But to the overarching point, that while computing and technology all around 01:35:08.090 --> 01:35:11.150 us might seem especially sophisticated, and complicated, 01:35:11.150 --> 01:35:13.250 and well beyond one's understanding, that's 01:35:13.250 --> 01:35:16.520 just because it might be presently beyond your familiarity 01:35:16.520 --> 01:35:18.090 with some of those building blocks. 01:35:18.090 --> 01:35:21.660 Indeed, we started so low level with just zeros and ones, 01:35:21.660 --> 01:35:23.660 and then we got to letters, and then maybe words 01:35:23.660 --> 01:35:26.420 and paragraphs, and graphics and videos, and eventually 01:35:26.420 --> 01:35:28.610 so many more forms of media. 01:35:28.610 --> 01:35:31.310 And at the end of the day, it's useful to understand 01:35:31.310 --> 01:35:34.550 what's going on underneath the hood, but it's also very empowering. 01:35:34.550 --> 01:35:36.830 Because you realize that even once we're at the point 01:35:36.830 --> 01:35:40.640 as we are literally right now, of talking it this degree of complexity 01:35:40.640 --> 01:35:43.730 and sophistication for how you might store data inside of her computer's 01:35:43.730 --> 01:35:46.670 memory, and very efficiently get at it, all of this 01:35:46.670 --> 01:35:49.040 was the result of these stepping stones. 01:35:49.040 --> 01:35:50.360 These abstractions. 01:35:50.360 --> 01:35:53.180 And along the way, in building up these abstractions, 01:35:53.180 --> 01:35:56.390 and in solving problems more effectively by standing 01:35:56.390 --> 01:35:58.830 on the shoulders of problems solved past, 01:35:58.830 --> 01:36:01.610 similarly we have to make some trade offs. 01:36:01.610 --> 01:36:05.580 And so at the end of the day, this is what computational thinking is all 01:36:05.580 --> 01:36:06.080 about. 01:36:06.080 --> 01:36:09.350 This is what computer science and engineering and programming 01:36:09.350 --> 01:36:10.460 is all about. 01:36:10.460 --> 01:36:14.090 It's solving problems using solutions to problems past, 01:36:14.090 --> 01:36:17.930 and along the way making very conscious, calculated, and hopefully 01:36:17.930 --> 01:36:21.890 correct decisions, as to what resources and what 01:36:21.890 --> 01:36:25.810 objectives are most important to you.