1 00:00:00,000 --> 00:00:00,030 2 00:00:00,030 --> 00:00:00,460 >> DAVID MALAN: All right. 3 00:00:00,460 --> 00:00:01,094 We are back. 4 00:00:01,094 --> 00:00:04,260 So in this segment on programming what I thought we'd do is a mix of things. 5 00:00:04,260 --> 00:00:06,340 One, do a little bit of something hands-on, 6 00:00:06,340 --> 00:00:08,690 albeit using a more playful programming environment-- 7 00:00:08,690 --> 00:00:11,620 one that is demonstrative of exactly the kinds of ideas 8 00:00:11,620 --> 00:00:14,220 we've been talking about, but a little more formally. 9 00:00:14,220 --> 00:00:18,200 Two, look at some of the more technical ways 10 00:00:18,200 --> 00:00:21,520 that a programmer would actually solve problems like the searching problem 11 00:00:21,520 --> 00:00:24,530 that we looked at before and also a more fundamentally 12 00:00:24,530 --> 00:00:26,020 interesting problem of sorting. 13 00:00:26,020 --> 00:00:28,840 >> We just assumed from the get go that that phone book was sorted, 14 00:00:28,840 --> 00:00:31,980 but that alone is actually kind of a hard problem with many different ways 15 00:00:31,980 --> 00:00:32,479 to solve it. 16 00:00:32,479 --> 00:00:34,366 So we'll use these as a class of problems 17 00:00:34,366 --> 00:00:36,740 representative of things that might be solved in general. 18 00:00:36,740 --> 00:00:38,980 And then we'll talk about in some detail what 19 00:00:38,980 --> 00:00:42,360 are called data structures-- fancier ways like linked lists 20 00:00:42,360 --> 00:00:46,290 and hash tables and trees that a programmer would actually 21 00:00:46,290 --> 00:00:48,890 use and generally use on a whiteboard to paint 22 00:00:48,890 --> 00:00:51,840 a picture of what he or she envisions for implementing 23 00:00:51,840 --> 00:00:52,980 some piece of software. 24 00:00:52,980 --> 00:00:55,130 >> So let's do the hands-on portion first. 25 00:00:55,130 --> 00:01:00,090 So just get your hands dirty with an environment called scratch.mit.edu. 26 00:01:00,090 --> 00:01:02,636 This is a tool that we use in our undergraduate class. 27 00:01:02,636 --> 00:01:04,510 Even though it's designed for ages 12 and up, 28 00:01:04,510 --> 00:01:07,570 we use it for the up part of that quite a bit 29 00:01:07,570 --> 00:01:10,020 since it's a nice, fun graphical way of learning 30 00:01:10,020 --> 00:01:12,160 a little something about programming. 31 00:01:12,160 --> 00:01:17,600 So head to that URL, where you should see a page quite like this, 32 00:01:17,600 --> 00:01:23,330 and go ahead and click Join Scratch at top right 33 00:01:23,330 --> 00:01:28,300 and choose a username and a password and ultimately get yourself 34 00:01:28,300 --> 00:01:29,970 an account-- scratch.mit.edu. 35 00:01:29,970 --> 00:01:32,165 36 00:01:32,165 --> 00:01:34,665 I thought I'd use this as an opportunity first to show this. 37 00:01:34,665 --> 00:01:39,120 A question came up during the break about what code actually looks like. 38 00:01:39,120 --> 00:01:41,315 And we were talking during the break about C, 39 00:01:41,315 --> 00:01:45,060 in particular-- particularly a lower level in an older language. 40 00:01:45,060 --> 00:01:47,750 And I just did a quick Google search to find C code 41 00:01:47,750 --> 00:01:51,574 for binary search, the algorithm that we used to search that phone book earlier. 42 00:01:51,574 --> 00:01:54,240 This particular example, of course, doesn't search a phone book. 43 00:01:54,240 --> 00:01:57,840 It just searches a whole bunch of numbers in the computer's memory. 44 00:01:57,840 --> 00:02:01,000 But if you'd like to just get a visual sense of what an actual programming 45 00:02:01,000 --> 00:02:05,370 language looks like, it looks a little something like this. 46 00:02:05,370 --> 00:02:09,759 So it's about 20-plus, 30 or so lines of code, 47 00:02:09,759 --> 00:02:12,640 but the conversation we were having over break 48 00:02:12,640 --> 00:02:16,000 was about how this actually gets morphed into zeros and ones 49 00:02:16,000 --> 00:02:19,200 and if you can't just revert that process and go from zeros and ones 50 00:02:19,200 --> 00:02:20,210 back to code. 51 00:02:20,210 --> 00:02:22,620 >> Unfortunately, the process is so transformative 52 00:02:22,620 --> 00:02:24,890 that it's a lot easier said than done. 53 00:02:24,890 --> 00:02:29,400 I went ahead and actually turned that program, Binary Search, 54 00:02:29,400 --> 00:02:32,700 into zeros and ones by way of a program called The Compiler that I 55 00:02:32,700 --> 00:02:34,400 happen to have here right on my Mac. 56 00:02:34,400 --> 00:02:37,850 And if you look at the screen here, focusing specifically 57 00:02:37,850 --> 00:02:43,520 on these middle six columns only, you'll see only zeros and ones. 58 00:02:43,520 --> 00:02:48,290 And those are the zeros and ones that compose exactly that searching program. 59 00:02:48,290 --> 00:02:53,720 >> And so each chunk of five bits, each byte of zeros and ones here, 60 00:02:53,720 --> 00:02:57,310 represent some instruction typically inside of a computer. 61 00:02:57,310 --> 00:03:00,730 And in fact, if you've heard the marketing slogan "Intel inside"-- that, 62 00:03:00,730 --> 00:03:04,610 of course, just means you have an Intel CPU or brain inside the computer. 63 00:03:04,610 --> 00:03:08,000 And what that means to be a CPU is that you have an instruction set, 64 00:03:08,000 --> 00:03:08,840 so to speak. 65 00:03:08,840 --> 00:03:11,620 >> Every CPU in the world, many of them made by Intel these days, 66 00:03:11,620 --> 00:03:13,690 understands a finite number of instructions. 67 00:03:13,690 --> 00:03:18,690 And those instructions are so low level as add these two numbers together, 68 00:03:18,690 --> 00:03:22,560 multiply these two numbers together, move this piece of data from here 69 00:03:22,560 --> 00:03:27,340 to here in memory, save this information from here to here in memory, 70 00:03:27,340 --> 00:03:32,200 and so forth-- so very, very low-level, almost electronic details. 71 00:03:32,200 --> 00:03:34,780 But with those mathematical operations coupled 72 00:03:34,780 --> 00:03:37,410 with what we discussed earlier, the representation of data 73 00:03:37,410 --> 00:03:40,450 as zeros and ones, can you build up everything 74 00:03:40,450 --> 00:03:44,180 that a computer can do today, whether it's textual, graphical, musical, 75 00:03:44,180 --> 00:03:45,580 or otherwise. 76 00:03:45,580 --> 00:03:49,450 >> So this is very easy to get lost in the weeds of quickly. 77 00:03:49,450 --> 00:03:52,150 And there's a lot of syntactical challenges 78 00:03:52,150 --> 00:03:56,630 whereby if you make the simplest, stupidest of typos none of the program 79 00:03:56,630 --> 00:03:57,860 will work whatsoever. 80 00:03:57,860 --> 00:04:00,366 And so instead of using a language like C this morning, 81 00:04:00,366 --> 00:04:02,240 I thought it would be more fun to actually do 82 00:04:02,240 --> 00:04:04,840 something more visual, which while designed for kids 83 00:04:04,840 --> 00:04:08,079 is actually a perfect manifestation of an actual programming 84 00:04:08,079 --> 00:04:10,370 language-- just happens to use pictures instead of text 85 00:04:10,370 --> 00:04:11,710 to represent those ideas. 86 00:04:11,710 --> 00:04:15,470 >> So once you indeed have an account on scratch.mit.edu, 87 00:04:15,470 --> 00:04:21,070 click the Create button at top left of the site. 88 00:04:21,070 --> 00:04:24,620 And you should see an environment like the one I'm about to see on my screen 89 00:04:24,620 --> 00:04:26,310 here. 90 00:04:26,310 --> 00:04:29,350 And we'll spend just a little bit of time playing here. 91 00:04:29,350 --> 00:04:34,080 Let's see if we can't all solve some problems together in the following way. 92 00:04:34,080 --> 00:04:39,420 >> So what you'll see within this environment-- and actually just let 93 00:04:39,420 --> 00:04:40,050 me pause. 94 00:04:40,050 --> 00:04:42,680 Is anyone not here? 95 00:04:42,680 --> 00:04:45,070 Not here? 96 00:04:45,070 --> 00:04:45,800 OK. 97 00:04:45,800 --> 00:04:49,030 So let me point out a few characteristics of this environment. 98 00:04:49,030 --> 00:04:55,024 >> So at the top left of the screen, we have Scratch's stage, so to speak. 99 00:04:55,024 --> 00:04:57,440 Scratch is not only the name of this programming language; 100 00:04:57,440 --> 00:05:00,356 it's also the name of the cat that you see by default there in orange. 101 00:05:00,356 --> 00:05:02,160 He is on a stage, so much like I described 102 00:05:02,160 --> 00:05:05,770 the turtle earlier as being in a rectangular white board environment. 103 00:05:05,770 --> 00:05:09,800 This cat's world is confined entirely to that rectangle up top there. 104 00:05:09,800 --> 00:05:12,210 >> Meanwhile, on the right hand side here, it's 105 00:05:12,210 --> 00:05:15,610 just a scripts area, a blank slate if you will. 106 00:05:15,610 --> 00:05:18,590 This is where we're going to write our programs in just a moment. 107 00:05:18,590 --> 00:05:22,935 And the building blocks that we shall use to write this program-- the puzzle 108 00:05:22,935 --> 00:05:25,310 pieces, if you will-- are those right here in the middle, 109 00:05:25,310 --> 00:05:27,500 and they're categorized by functionality. 110 00:05:27,500 --> 00:05:31,000 So, for instance, I'm going to go ahead and demonstrate at least one of these. 111 00:05:31,000 --> 00:05:33,690 I'm going to go ahead and click the Control category up top. 112 00:05:33,690 --> 00:05:35,720 >> So these are the categories up top. 113 00:05:35,720 --> 00:05:39,410 I'm going to click the Control category. 114 00:05:39,410 --> 00:05:44,020 Rather, I'm going to click the Events category, the very first one up top. 115 00:05:44,020 --> 00:05:47,790 And if you'd like to follow along even as we do this, you're quite welcome to. 116 00:05:47,790 --> 00:05:52,180 I'm going to click and drag this first one, "when green flag clicked." 117 00:05:52,180 --> 00:05:58,410 And then I'm going to drop it just roughly at the top of my blank slates. 118 00:05:58,410 --> 00:06:01,450 >> And what's nice about Scratch is that this puzzle piece, when 119 00:06:01,450 --> 00:06:04,560 interlocked with other puzzle pieces, is going to do literally 120 00:06:04,560 --> 00:06:06,460 what those puzzle pieces say to do. 121 00:06:06,460 --> 00:06:09,710 So, for instance, Scratch is right now in the middle of his world. 122 00:06:09,710 --> 00:06:14,660 I'm going to go ahead and choose now, let's say, the Motion category, 123 00:06:14,660 --> 00:06:18,000 if you'd like to do the same-- Motion category. 124 00:06:18,000 --> 00:06:20,430 And now notice I have a whole bunch of puzzle pieces here 125 00:06:20,430 --> 00:06:23,370 that, again, kind of do what they say. 126 00:06:23,370 --> 00:06:28,110 And I'm going to go ahead and drag and drop the move block right over here. 127 00:06:28,110 --> 00:06:31,860 >> And notice that as soon as you get close to the bottom of the "green flag 128 00:06:31,860 --> 00:06:34,580 clicked" button, notice how a white line appears, 129 00:06:34,580 --> 00:06:36,950 as though it's almost magnetic, it wants to go there. 130 00:06:36,950 --> 00:06:43,070 Just let go, and it will snap together and the shapes will match. 131 00:06:43,070 --> 00:06:46,620 And now you can perhaps almost guess where we're going with this. 132 00:06:46,620 --> 00:06:51,570 >> If you look at the Scratch stage over here and look to the top of it, 133 00:06:51,570 --> 00:06:55,142 you'll see a red light, a stop sign, and a green flag. 134 00:06:55,142 --> 00:06:57,100 And I'm going to go ahead and watch my screen-- 135 00:06:57,100 --> 00:06:58,460 for just a moment, if you could. 136 00:06:58,460 --> 00:07:01,960 I'm going to click the green flag right now, 137 00:07:01,960 --> 00:07:07,850 and he moved what appears to be 10 steps or 10 pixels, 10 dots, on the screen. 138 00:07:07,850 --> 00:07:13,390 >> And so not that exciting, but let me propose 139 00:07:13,390 --> 00:07:17,440 without even teaching this, just using the own your own intuition-- let 140 00:07:17,440 --> 00:07:22,560 me propose that you figure out how to make Scratch walk right off the stage. 141 00:07:22,560 --> 00:07:28,700 Have him make way for the right side of the screen, all the way to the right. 142 00:07:28,700 --> 00:07:32,200 Let me give you a moment or so to wrestle with that. 143 00:07:32,200 --> 00:07:37,681 You might want to take a look at other categories of blocks. 144 00:07:37,681 --> 00:07:38,180 All right. 145 00:07:38,180 --> 00:07:41,290 So just to recap, when we have the green flag clicked here 146 00:07:41,290 --> 00:07:44,850 and move 10 steps is the only instruction, each time I 147 00:07:44,850 --> 00:07:46,720 click the green flag, what's happening? 148 00:07:46,720 --> 00:07:50,070 Well, that's running my program. 149 00:07:50,070 --> 00:07:52,450 So I could do this maybe 10 times manually, 150 00:07:52,450 --> 00:07:55,130 but this feels a little bit hackish, so to speak, 151 00:07:55,130 --> 00:07:57,480 whereby I'm not really solving the problem. 152 00:07:57,480 --> 00:08:00,530 I'm just trying again and again and again and again 153 00:08:00,530 --> 00:08:03,180 until I sort of accidentally achieve the directive 154 00:08:03,180 --> 00:08:05,560 that I set out to achieve earlier. 155 00:08:05,560 --> 00:08:08,200 >> But we know from our pseudocode earlier that there's 156 00:08:08,200 --> 00:08:11,870 this notion in programming of looping, doing something again and again. 157 00:08:11,870 --> 00:08:14,888 And so I saw that a bunch of you reached for what puzzle piece? 158 00:08:14,888 --> 00:08:17,870 159 00:08:17,870 --> 00:08:18,730 Repeat until. 160 00:08:18,730 --> 00:08:21,400 So we could do something like repeat until. 161 00:08:21,400 --> 00:08:23,760 And what did you repeat until exactly? 162 00:08:23,760 --> 00:08:27,720 163 00:08:27,720 --> 00:08:28,540 >> OK. 164 00:08:28,540 --> 00:08:31,974 And let me go with one that's somewhat simpler for just a moment. 165 00:08:31,974 --> 00:08:33,140 Let me go ahead and do this. 166 00:08:33,140 --> 00:08:35,559 Notice that, as you may have discovered under Control, 167 00:08:35,559 --> 00:08:38,409 there is this repeat block, which doesn't look like it's that big. 168 00:08:38,409 --> 00:08:41,039 There's not much room in between those two yellow lines. 169 00:08:41,039 --> 00:08:43,539 But as some of you might have noticed, if you drag and drop, 170 00:08:43,539 --> 00:08:45,150 notice how it grows to fill the shape. 171 00:08:45,150 --> 00:08:46,274 >> And you can even cram more. 172 00:08:46,274 --> 00:08:48,670 It'll just keep growing if you drag and hover over it. 173 00:08:48,670 --> 00:08:51,110 And I don't know what's best here, so let 174 00:08:51,110 --> 00:08:54,760 me at least repeat five times, for instance, and then go back to the stage 175 00:08:54,760 --> 00:08:56,720 and click the green flag. 176 00:08:56,720 --> 00:08:59,110 And now notice it's not quite there. 177 00:08:59,110 --> 00:09:02,400 >> Now some of you proposed, as Victoria just did, repeat 10 times. 178 00:09:02,400 --> 00:09:05,140 And that generally does get him all the way, 179 00:09:05,140 --> 00:09:10,510 but wouldn't there be a more robust way than arbitrarily figuring out 180 00:09:10,510 --> 00:09:12,640 how many moves to make? 181 00:09:12,640 --> 00:09:17,680 What might be a better block than repeat 10 times be? 182 00:09:17,680 --> 00:09:20,380 >> Yeah, so why not do something forever? 183 00:09:20,380 --> 00:09:24,390 And now let me move this puzzle piece inside there and get rid of this one. 184 00:09:24,390 --> 00:09:28,300 Now notice no matter where Scratch starts, he goes to the edge. 185 00:09:28,300 --> 00:09:30,700 And thankfully MIT, who makes Scratch, just 186 00:09:30,700 --> 00:09:33,190 makes sure that he never disappears completely. 187 00:09:33,190 --> 00:09:35,360 You can always grab his tail. 188 00:09:35,360 --> 00:09:37,680 >> And just intuitively, why does he keep moving? 189 00:09:37,680 --> 00:09:38,892 What is going on here? 190 00:09:38,892 --> 00:09:41,440 191 00:09:41,440 --> 00:09:43,824 He seems to have stopped, but then if I pick up and drag 192 00:09:43,824 --> 00:09:45,240 he keeps wanting to go over there. 193 00:09:45,240 --> 00:09:46,123 Why is that? 194 00:09:46,123 --> 00:09:51,610 195 00:09:51,610 --> 00:09:54,360 Truly, a computer is literally going to do what you tell it to do. 196 00:09:54,360 --> 00:09:58,380 So if you told it earlier do the following thing forever, move 10 steps, 197 00:09:58,380 --> 00:10:01,860 it's going to keep going and going until I hit the red stop sign 198 00:10:01,860 --> 00:10:04,620 and stop the program altogether. 199 00:10:04,620 --> 00:10:06,610 >> So even if you didn't do this, how could I 200 00:10:06,610 --> 00:10:09,510 make Scratch move faster across the screen? 201 00:10:09,510 --> 00:10:12,060 202 00:10:12,060 --> 00:10:13,280 More steps, right? 203 00:10:13,280 --> 00:10:15,710 So instead of doing 10 at a time, why don't we 204 00:10:15,710 --> 00:10:20,100 go ahead and change it to-- what would you propose-- 50? 205 00:10:20,100 --> 00:10:24,410 So now I'm going to click the green flag, and indeed, he goes really fast. 206 00:10:24,410 --> 00:10:27,180 >> And this, of course, is just a manifestation of animation. 207 00:10:27,180 --> 00:10:28,060 What is animation? 208 00:10:28,060 --> 00:10:33,090 It's just showing you the human a whole bunch of still images really, 209 00:10:33,090 --> 00:10:34,160 really, really fast. 210 00:10:34,160 --> 00:10:36,500 And so if we're just telling him to move more steps, 211 00:10:36,500 --> 00:10:39,750 we're just having the effect be to change where he is on the screen 212 00:10:39,750 --> 00:10:42,900 all the more rapidly per unit of time. 213 00:10:42,900 --> 00:10:46,454 >> Now the next challenge that I proposed was to have him bounce off the edge. 214 00:10:46,454 --> 00:10:49,120 And without knowing what puzzle pieces exist-- because it's fine 215 00:10:49,120 --> 00:10:53,030 if you don't get to the stage of the challenge-- what 216 00:10:53,030 --> 00:10:54,280 do you want to do intuitively? 217 00:10:54,280 --> 00:10:58,030 How would we have him bounce back and forth, between the left and right? 218 00:10:58,030 --> 00:11:02,630 219 00:11:02,630 --> 00:11:03,810 >> Yeah. 220 00:11:03,810 --> 00:11:05,680 So we need some kind of condition, and we 221 00:11:05,680 --> 00:11:09,710 seem to have conditionals, so to speak, under the Control category. 222 00:11:09,710 --> 00:11:14,110 Which of these blocks do we probably want? 223 00:11:14,110 --> 00:11:15,200 Yeah, maybe "if, then." 224 00:11:15,200 --> 00:11:18,780 So notice that among the yellow blocks we have here, there is this "if" 225 00:11:18,780 --> 00:11:23,920 or this "if, else" block that will allow us to make a decision to do this 226 00:11:23,920 --> 00:11:25,000 or to do that. 227 00:11:25,000 --> 00:11:27,380 And you can even nest them to do multiple things. 228 00:11:27,380 --> 00:11:34,910 Or if you've not gone here yet, go ahead to the Sensing category 229 00:11:34,910 --> 00:11:39,612 and-- let's see if it's here. 230 00:11:39,612 --> 00:11:43,050 231 00:11:43,050 --> 00:11:52,050 >> So what block might be helpful here to detect if he's off the stage? 232 00:11:52,050 --> 00:11:56,740 Yeah, notice that some of these blocks can be parametrized, so to speak. 233 00:11:56,740 --> 00:12:00,706 They can be sort of customized, not unlike HTML yesterday with attributes, 234 00:12:00,706 --> 00:12:03,330 where those attributes kind of customize the behavior of a tag. 235 00:12:03,330 --> 00:12:08,880 Similarly here, can I grab this touching block and change and ask the question, 236 00:12:08,880 --> 00:12:11,500 are you touching the mouse pointer like the cursor 237 00:12:11,500 --> 00:12:13,250 or are you touching the edge? 238 00:12:13,250 --> 00:12:15,210 >> So let me go in and do this. 239 00:12:15,210 --> 00:12:18,130 I'm going to zoom out for a moment. 240 00:12:18,130 --> 00:12:21,320 Let me grab this puzzle piece here, this puzzle piece this, 241 00:12:21,320 --> 00:12:24,570 and I'm going to jumble them up for just a moment. 242 00:12:24,570 --> 00:12:27,620 I'm going to move this, change this to touching edge, 243 00:12:27,620 --> 00:12:38,590 and I'm going to motion do this. 244 00:12:38,590 --> 00:12:40,490 So here are some ingredients. 245 00:12:40,490 --> 00:12:42,570 I think I've got everything I want. 246 00:12:42,570 --> 00:12:47,710 >> Would someone like to propose how I can connect these maybe top to bottom 247 00:12:47,710 --> 00:12:52,020 in order to solve the problem of having Scratch move right to left to right to 248 00:12:52,020 --> 00:12:57,020 left to right to left, each time just bouncing off the wall? 249 00:12:57,020 --> 00:12:58,050 What do I want to do? 250 00:12:58,050 --> 00:13:01,097 Which block should I connect to the "when green flag clicked first"? 251 00:13:01,097 --> 00:13:04,060 252 00:13:04,060 --> 00:13:06,200 >> OK, so let's start with the "forever." 253 00:13:06,200 --> 00:13:07,170 What goes inside next? 254 00:13:07,170 --> 00:13:10,290 Someone else. 255 00:13:10,290 --> 00:13:11,850 OK, move steps. 256 00:13:11,850 --> 00:13:12,350 All right. 257 00:13:12,350 --> 00:13:14,470 Then what? 258 00:13:14,470 --> 00:13:15,120 So then the if. 259 00:13:15,120 --> 00:13:17,720 And notice, even though it looks sandwiched together tightly, 260 00:13:17,720 --> 00:13:19,500 it will just grow to fill. 261 00:13:19,500 --> 00:13:21,500 It will just jump in where I want it. 262 00:13:21,500 --> 00:13:25,920 >> And what do I put between the if and the then? 263 00:13:25,920 --> 00:13:27,180 Probably "if touching edge." 264 00:13:27,180 --> 00:13:31,800 And notice, again, it's too big for it, but it will grow to fill. 265 00:13:31,800 --> 00:13:35,002 And then turn 15 degrees? 266 00:13:35,002 --> 00:13:35,710 How many degrees? 267 00:13:35,710 --> 00:13:38,800 268 00:13:38,800 --> 00:13:41,196 Yeah, so 180 will spin me all the way around. 269 00:13:41,196 --> 00:13:42,570 So let's see if I got this right. 270 00:13:42,570 --> 00:13:43,930 Let me zoom out. 271 00:13:43,930 --> 00:13:45,130 >> Let me drag Scratch up. 272 00:13:45,130 --> 00:13:50,030 So he's a little distorted now, but that's fine. 273 00:13:50,030 --> 00:13:52,231 How can I reset him easily? 274 00:13:52,231 --> 00:13:59,879 275 00:13:59,879 --> 00:14:01,045 I'm going to cheat slightly. 276 00:14:01,045 --> 00:14:04,074 277 00:14:04,074 --> 00:14:05,990 So I'm adding another block, just to be clear. 278 00:14:05,990 --> 00:14:08,424 I want him to point 90 degrees to the right by default, 279 00:14:08,424 --> 00:14:10,840 so I'm just going to tell him to do that programmatically. 280 00:14:10,840 --> 00:14:11,632 And here we go. 281 00:14:11,632 --> 00:14:14,740 282 00:14:14,740 --> 00:14:15,740 We seem to have done it. 283 00:14:15,740 --> 00:14:19,980 It's a little weird, because he's walking upside down. 284 00:14:19,980 --> 00:14:21,250 Let's call that a bug. 285 00:14:21,250 --> 00:14:22,120 That's a mistake. 286 00:14:22,120 --> 00:14:27,320 A bug is a mistake in a program, a logical error that I, the human, made. 287 00:14:27,320 --> 00:14:28,985 Why is he going upside down? 288 00:14:28,985 --> 00:14:33,560 289 00:14:33,560 --> 00:14:35,250 Did MIT screw up or did I? 290 00:14:35,250 --> 00:14:38,840 291 00:14:38,840 --> 00:14:42,550 >> Yeah, I mean, it's not MIT's fault. They gave me a puzzle piece 292 00:14:42,550 --> 00:14:44,970 that says turn some number of degrees. 293 00:14:44,970 --> 00:14:47,672 And at Victoria's suggestion, I'm turning 180 degrees, 294 00:14:47,672 --> 00:14:48,880 which is the right intuition. 295 00:14:48,880 --> 00:14:53,700 But turning 180 degrees literally means turning 180 degrees, 296 00:14:53,700 --> 00:14:55,860 and that's not really what I want, apparently. 297 00:14:55,860 --> 00:14:58,026 Because at least he's in this two-dimensional world, 298 00:14:58,026 --> 00:15:00,740 so turning is really going to flip him upside down. 299 00:15:00,740 --> 00:15:04,030 >> I probably want to use what block instead, based on what you see here? 300 00:15:04,030 --> 00:15:11,890 301 00:15:11,890 --> 00:15:14,790 How might we fix this? 302 00:15:14,790 --> 00:15:18,380 Yeah, so we could point in the opposite direction. 303 00:15:18,380 --> 00:15:22,300 And actually even that's not going to be enough, 304 00:15:22,300 --> 00:15:26,410 because we can only hard code to pointing left or right. 305 00:15:26,410 --> 00:15:27,920 >> You know what we could do? 306 00:15:27,920 --> 00:15:30,160 It looks like we have a convenience block here. 307 00:15:30,160 --> 00:15:32,987 If I zoom in, see something we like here? 308 00:15:32,987 --> 00:15:36,120 309 00:15:36,120 --> 00:15:40,020 So it looks like MIT has an abstraction built in here. 310 00:15:40,020 --> 00:15:45,440 This block seems to be equivalent to which other blocks, plural? 311 00:15:45,440 --> 00:15:49,510 >> This one block seems to be equivalent to this whole trio of blocks 312 00:15:49,510 --> 00:15:50,880 that we have here. 313 00:15:50,880 --> 00:15:54,670 So it turns out I can simplify my program by getting rid of all of that 314 00:15:54,670 --> 00:15:58,270 and just put this in here. 315 00:15:58,270 --> 00:16:01,620 And now he's still a little buggy, and that's fine for now. 316 00:16:01,620 --> 00:16:03,370 We'll leave that be. 317 00:16:03,370 --> 00:16:06,000 But my program is even simpler, and this, too, 318 00:16:06,000 --> 00:16:09,060 would be representative of a goal in programming-- 319 00:16:09,060 --> 00:16:13,430 is to ideally make your code as simple, as compact as possible, 320 00:16:13,430 --> 00:16:15,650 while still being as readable as possible. 321 00:16:15,650 --> 00:16:20,310 You don't want to make it so succinct that it's hard to understand. 322 00:16:20,310 --> 00:16:22,826 >> But notice I've replaced three blocks with one, 323 00:16:22,826 --> 00:16:24,200 and that's arguably a good thing. 324 00:16:24,200 --> 00:16:27,280 I've abstracted away the notion of checking whether you're 325 00:16:27,280 --> 00:16:29,120 on the edge with just one block. 326 00:16:29,120 --> 00:16:31,520 Now we can have fun with this, in fact. 327 00:16:31,520 --> 00:16:35,790 This doesn't add so much intellectual value but playful value. 328 00:16:35,790 --> 00:16:39,730 I'm going to go ahead and grab this sound here. 329 00:16:39,730 --> 00:16:42,900 330 00:16:42,900 --> 00:16:46,420 So let me go ahead, and let me stop the program for a moment. 331 00:16:46,420 --> 00:16:52,070 I'm going to record the following, allowing access to my microphone. 332 00:16:52,070 --> 00:16:53,181 >> Here we go. 333 00:16:53,181 --> 00:16:53,680 Ouch. 334 00:16:53,680 --> 00:16:58,710 335 00:16:58,710 --> 00:17:01,140 Let's try this again. 336 00:17:01,140 --> 00:17:02,279 Here we go. 337 00:17:02,279 --> 00:17:03,570 OK, I recorded the wrong thing. 338 00:17:03,570 --> 00:17:04,580 Here we go. 339 00:17:04,580 --> 00:17:05,080 Ouch. 340 00:17:05,080 --> 00:17:07,910 341 00:17:07,910 --> 00:17:08,800 Ouch. 342 00:17:08,800 --> 00:17:09,300 All right. 343 00:17:09,300 --> 00:17:10,791 Now I need to get rid of that. 344 00:17:10,791 --> 00:17:11,290 All right. 345 00:17:11,290 --> 00:17:13,950 >> So now I have a recording of just "ouch." 346 00:17:13,950 --> 00:17:18,040 So now I'm going to go ahead and call this "ouch." 347 00:17:18,040 --> 00:17:20,270 I'm going to go back to my scripts, and now 348 00:17:20,270 --> 00:17:25,460 notice there's this block that's called play sound "meow" or play sound "ouch." 349 00:17:25,460 --> 00:17:28,920 I'm going to drag this, and where should I put this for comical effect? 350 00:17:28,920 --> 00:17:31,740 351 00:17:31,740 --> 00:17:37,860 Yeah, so now it's kind of buggy, because now this block-- 352 00:17:37,860 --> 00:17:42,050 notice how this "if on edge, bounce" is kind of self-contained. 353 00:17:42,050 --> 00:17:43,704 So I need to fix this. 354 00:17:43,704 --> 00:17:44,870 Let me go ahead and do this. 355 00:17:44,870 --> 00:17:48,630 Let me get rid of this and go back to our original, more deliberate 356 00:17:48,630 --> 00:17:49,870 functionality. 357 00:17:49,870 --> 00:18:01,080 So "if touching edge, then" I want to turn, as Victoria proposed, 358 00:18:01,080 --> 00:18:02,480 180 degrees. 359 00:18:02,480 --> 00:18:05,497 And do I want to play the sound "ouch" there? 360 00:18:05,497 --> 00:18:11,800 361 00:18:11,800 --> 00:18:15,580 >> Yeah, notice it's outside that yellow block. 362 00:18:15,580 --> 00:18:17,680 So this, too, would be a bug, but I've noticed it. 363 00:18:17,680 --> 00:18:21,290 So I'm going to drag it up here, and notice now it's inside the "if." 364 00:18:21,290 --> 00:18:24,250 So the "if" is this sort of like arm-like blot 365 00:18:24,250 --> 00:18:26,260 that's only going to do what's inside of it. 366 00:18:26,260 --> 00:18:30,216 So now if I zoom out at the risk of annoying-- 367 00:18:30,216 --> 00:18:32,860 368 00:18:32,860 --> 00:18:36,470 >> COMPUTER: Ouch, ouch, ouch. 369 00:18:36,470 --> 00:18:39,910 >> DAVID MALAN: And it will just go on forever. 370 00:18:39,910 --> 00:18:44,160 Now just to accelerate things here, let me go ahead and open up, 371 00:18:44,160 --> 00:18:50,460 let's say-- let me go to some of my own stuff from class. 372 00:18:50,460 --> 00:18:53,000 373 00:18:53,000 --> 00:19:00,220 And let me open up, let's say, this one made by one of our teaching fellows 374 00:19:00,220 --> 00:19:01,500 a couple of years ago. 375 00:19:01,500 --> 00:19:04,732 So some of you might recall this game from yesteryear, 376 00:19:04,732 --> 00:19:05,940 and it's actually remarkable. 377 00:19:05,940 --> 00:19:08,190 Even though we've done the simplest of programs right now, 378 00:19:08,190 --> 00:19:09,980 let's consider what this actually looks like. 379 00:19:09,980 --> 00:19:10,650 Let me hit play. 380 00:19:10,650 --> 00:19:14,210 381 00:19:14,210 --> 00:19:18,980 >> So in this game, we have a frog, and using the arrow keys-- 382 00:19:18,980 --> 00:19:23,340 he takes bigger steps than I remember-- I have control over this frog. 383 00:19:23,340 --> 00:19:29,630 And the goal is to get across the busy road without running into the cars. 384 00:19:29,630 --> 00:19:34,735 And let's see-- if I go up here, I have to wait for a log to scroll by. 385 00:19:34,735 --> 00:19:38,130 386 00:19:38,130 --> 00:19:39,274 This feels like a bug. 387 00:19:39,274 --> 00:19:42,240 388 00:19:42,240 --> 00:19:43,495 This is kind of a bug. 389 00:19:43,495 --> 00:19:45,980 390 00:19:45,980 --> 00:19:46,480 All right. 391 00:19:46,480 --> 00:19:51,550 I'm on this here, there, and then you keep 392 00:19:51,550 --> 00:19:54,100 going until you get all the frogs to the lily pads. 393 00:19:54,100 --> 00:19:55,920 Now this might look all the more complex, 394 00:19:55,920 --> 00:19:57,840 but let's try to break this down mentally 395 00:19:57,840 --> 00:20:00,040 and verbally into its component blocks. 396 00:20:00,040 --> 00:20:03,910 So there's probably a puzzle piece that we haven't seen yet 397 00:20:03,910 --> 00:20:07,440 but that's responding to keystrokes, to things I hit on the keyboard. 398 00:20:07,440 --> 00:20:11,660 >> So there's probably some kind of block that says, if key equals up, 399 00:20:11,660 --> 00:20:15,965 then do something with Scratch-- maybe move it 10 steps this way. 400 00:20:15,965 --> 00:20:20,240 If down key is pressed, move 10 steps this way, or left key, move 10 steps 401 00:20:20,240 --> 00:20:21,710 this way, 10 steps that. 402 00:20:21,710 --> 00:20:23,644 I've clearly turned the cat into a frog. 403 00:20:23,644 --> 00:20:26,060 So that's just where the costume, as Scratch calls it-- we 404 00:20:26,060 --> 00:20:28,440 just imported a picture of the frog. 405 00:20:28,440 --> 00:20:29,570 >> But what else is happening? 406 00:20:29,570 --> 00:20:32,794 What other lines of code, what other puzzle pieces 407 00:20:32,794 --> 00:20:35,460 did Blake, our teaching fellow, use in this program, apparently? 408 00:20:35,460 --> 00:20:38,320 409 00:20:38,320 --> 00:20:42,730 What's making everything move-- what programming construct? 410 00:20:42,730 --> 00:20:44,950 >> Motion, sure-- so the move block, for sure. 411 00:20:44,950 --> 00:20:49,330 And what's that move block inside of, most likely? 412 00:20:49,330 --> 00:20:52,850 Yeah, some kind of loop, maybe a forever block, maybe a repeat block-- 413 00:20:52,850 --> 00:20:54,070 repeat until block. 414 00:20:54,070 --> 00:20:57,330 And that's what's making the logs and the lily pads and everything else move 415 00:20:57,330 --> 00:20:57,990 back and forth. 416 00:20:57,990 --> 00:21:00,270 It's just happening endlessly. 417 00:21:00,270 --> 00:21:03,180 >> Why are some of the cars moving faster than the others? 418 00:21:03,180 --> 00:21:06,607 What is different about those programs? 419 00:21:06,607 --> 00:21:09,690 Yeah, probably some of them are taking more steps at once and some of them 420 00:21:09,690 --> 00:21:10,690 fewer steps at once. 421 00:21:10,690 --> 00:21:14,670 And the visual effect is fast versus slow. 422 00:21:14,670 --> 00:21:16,030 >> What do you think happened? 423 00:21:16,030 --> 00:21:19,700 When I got my frog all the way across the street and the river 424 00:21:19,700 --> 00:21:23,560 onto the lily pad, something noteworthy happened. 425 00:21:23,560 --> 00:21:26,540 What happened as soon as I did that? 426 00:21:26,540 --> 00:21:27,210 It stopped. 427 00:21:27,210 --> 00:21:29,680 That frog stopped, and I got a second frog. 428 00:21:29,680 --> 00:21:33,155 So what construct must be used there, what feature? 429 00:21:33,155 --> 00:21:36,020 430 00:21:36,020 --> 00:21:38,660 >> Yeah, so there's some kind of "if" condition up there, too. 431 00:21:38,660 --> 00:21:41,909 And it turns out-- we didn't see this-- but there's other blocks in there that 432 00:21:41,909 --> 00:21:45,300 can say, if you are touching another thing on the screen, 433 00:21:45,300 --> 00:21:47,720 if you're touching the lily pad, "then." 434 00:21:47,720 --> 00:21:50,810 And then that's when we make the second frog appear. 435 00:21:50,810 --> 00:21:54,969 So even though this game is certainly very dated, even though at first glance 436 00:21:54,969 --> 00:21:58,010 there's so much going on-- and Blake did not whip this up in two minutes, 437 00:21:58,010 --> 00:22:00,390 it probably took him several hours to create this game 438 00:22:00,390 --> 00:22:03,850 based on his memory or videos of yesteryear's version of it. 439 00:22:03,850 --> 00:22:07,940 But all of these little things going on the screen in isolation 440 00:22:07,940 --> 00:22:11,550 boil down to these very simple constructs-- movements or statements 441 00:22:11,550 --> 00:22:15,519 like we've discussed, loops and conditions, and that's about it. 442 00:22:15,519 --> 00:22:17,060 There's a few other fancier features. 443 00:22:17,060 --> 00:22:19,130 Some of them are purely aesthetic or acoustic, 444 00:22:19,130 --> 00:22:20,964 like the sounds I just played with. 445 00:22:20,964 --> 00:22:23,380 But for the most part, you have in this language, Scratch, 446 00:22:23,380 --> 00:22:25,350 all of the fundamental building blocks that you 447 00:22:25,350 --> 00:22:29,280 have in C, Java, JavaScript, PHP, Ruby, Python, 448 00:22:29,280 --> 00:22:32,960 and any number of other languages. 449 00:22:32,960 --> 00:22:36,720 Any questions about Scratch? 450 00:22:36,720 --> 00:22:37,220 All right. 451 00:22:37,220 --> 00:22:40,303 So we won't dive in deeper to Scratch, though you're welcome this weekend, 452 00:22:40,303 --> 00:22:42,860 especially if you have kids or nieces and nephews and such, 453 00:22:42,860 --> 00:22:44,220 to introduce them to Scratch. 454 00:22:44,220 --> 00:22:47,960 It's actually a wonderfully playful environment with, as its authors say, 455 00:22:47,960 --> 00:22:49,120 very high ceilings. 456 00:22:49,120 --> 00:22:51,670 Even though we started with very low-level details, 457 00:22:51,670 --> 00:22:54,890 you can really do quite a bit with it, and this is perhaps 458 00:22:54,890 --> 00:22:57,360 a demonstration of exactly that. 459 00:22:57,360 --> 00:23:02,920 >> But let's now transition to some more sophisticated problems, if you will, 460 00:23:02,920 --> 00:23:05,870 known as "searching" and "sorting," more generally. 461 00:23:05,870 --> 00:23:09,500 We had this phone book earlier-- here's another one just for discussion-- 462 00:23:09,500 --> 00:23:13,460 that we were able to search more efficiently because 463 00:23:13,460 --> 00:23:15,270 of a significant assumption. 464 00:23:15,270 --> 00:23:17,655 And just to be clear, what assumption was I making 465 00:23:17,655 --> 00:23:19,280 when searching through this phone book? 466 00:23:19,280 --> 00:23:23,342 467 00:23:23,342 --> 00:23:25,300 That Mike Smith was in the phone book, though I 468 00:23:25,300 --> 00:23:27,410 would be able to handle the scenario without him 469 00:23:27,410 --> 00:23:30,720 there if I just stopped prematurely. 470 00:23:30,720 --> 00:23:31,806 The book is alphabetical. 471 00:23:31,806 --> 00:23:33,930 And that's a very generous assumption, because that 472 00:23:33,930 --> 00:23:36,580 means someone-- I'm kind of cutting a corner, 473 00:23:36,580 --> 00:23:40,580 like I am faster because someone else did a lot of hard work for me. 474 00:23:40,580 --> 00:23:43,120 >> But what if the phone book were unsorted? 475 00:23:43,120 --> 00:23:47,050 Maybe Verizon got lazy, just threw everyone's names and numbers in there 476 00:23:47,050 --> 00:23:50,120 maybe in the order in which they signed up for phone service. 477 00:23:50,120 --> 00:23:54,570 And how much time does it take me to find someone like Mike Smith? 478 00:23:54,570 --> 00:23:58,160 1,000 page phone book-- how many pages do I have to look through? 479 00:23:58,160 --> 00:23:58,905 >> All of them. 480 00:23:58,905 --> 00:24:00,030 You're sort of out of luck. 481 00:24:00,030 --> 00:24:03,420 You literally have to look at every page if the phone book is just 482 00:24:03,420 --> 00:24:04,450 randomly sorted. 483 00:24:04,450 --> 00:24:06,910 You might get lucky and find Mike on the very first page, because he 484 00:24:06,910 --> 00:24:08,826 was the first customer to order phone service. 485 00:24:08,826 --> 00:24:10,760 But he might have been the last, too. 486 00:24:10,760 --> 00:24:12,500 >> So random order is not good. 487 00:24:12,500 --> 00:24:16,750 So suppose we have to sort the phone book or in general sort data 488 00:24:16,750 --> 00:24:18,520 that we've been given. 489 00:24:18,520 --> 00:24:19,440 How can we do that? 490 00:24:19,440 --> 00:24:21,360 >> Well, let me just try a simple example here. 491 00:24:21,360 --> 00:24:24,290 Let me go ahead and toss a few numbers on the board. 492 00:24:24,290 --> 00:24:35,480 Suppose the numbers we have are, let's say, four, two, one, and three. 493 00:24:35,480 --> 00:24:38,390 And, Ben, sort these numbers for us. 494 00:24:38,390 --> 00:24:39,017 >> OK, good. 495 00:24:39,017 --> 00:24:39,850 How did you do that? 496 00:24:39,850 --> 00:24:42,731 497 00:24:42,731 --> 00:24:43,230 All right. 498 00:24:43,230 --> 00:24:44,710 So start with the smallest value and the highest, 499 00:24:44,710 --> 00:24:46,084 and that's really good intuition. 500 00:24:46,084 --> 00:24:48,080 And realize that we humans are actually pretty 501 00:24:48,080 --> 00:24:49,913 good at solving problems like this, at least 502 00:24:49,913 --> 00:24:51,810 when the data is relatively small. 503 00:24:51,810 --> 00:24:54,860 As soon as you start to have hundreds of numbers, thousands of numbers, 504 00:24:54,860 --> 00:24:58,440 millions of numbers, Ben probably couldn't do it quite that fast, 505 00:24:58,440 --> 00:25:00,620 assuming that there were gaps in the numbers. 506 00:25:00,620 --> 00:25:03,450 Pretty easy to count to a million otherwise, just time consuming. 507 00:25:03,450 --> 00:25:07,150 >> So the algorithm it sounds like Ben used just now 508 00:25:07,150 --> 00:25:08,930 was search for the smallest number. 509 00:25:08,930 --> 00:25:12,900 So even though we humans can take in a lot of information visually, 510 00:25:12,900 --> 00:25:14,830 a computer is actually a little more limited. 511 00:25:14,830 --> 00:25:17,560 The computer can only look at one byte at a time 512 00:25:17,560 --> 00:25:20,770 or maybe four bytes at a time-- these days maybe 8 bytes at a time-- 513 00:25:20,770 --> 00:25:24,450 but a very small number of bytes at a given time. 514 00:25:24,450 --> 00:25:28,480 >> So given that we really have four separate values here-- 515 00:25:28,480 --> 00:25:32,440 and you can think of Ben as having blinders on if he were a computer such 516 00:25:32,440 --> 00:25:36,450 that he can't see anything other than one number at a time-- 517 00:25:36,450 --> 00:25:39,720 so we generally will assume, like in English, we'll read from right to left. 518 00:25:39,720 --> 00:25:42,870 So the first number Ben probably looked at was four and then very quickly 519 00:25:42,870 --> 00:25:44,770 realized that's a pretty big number-- let me keep looking. 520 00:25:44,770 --> 00:25:45,357 >> There's two. 521 00:25:45,357 --> 00:25:45,940 Wait a minute. 522 00:25:45,940 --> 00:25:47,070 Two is smaller than four. 523 00:25:47,070 --> 00:25:47,986 I'm going to remember. 524 00:25:47,986 --> 00:25:49,070 Two is now the smallest. 525 00:25:49,070 --> 00:25:50,417 Now one-- that's even better. 526 00:25:50,417 --> 00:25:51,250 That's even smaller. 527 00:25:51,250 --> 00:25:54,000 I'm going to forget about two and just remember one now. 528 00:25:54,000 --> 00:25:56,550 >> And could he stop looking? 529 00:25:56,550 --> 00:25:58,360 Well, he could based on this information, 530 00:25:58,360 --> 00:26:00,477 but he'd better search the rest of the list. 531 00:26:00,477 --> 00:26:02,060 Because what if zero were in the list? 532 00:26:02,060 --> 00:26:03,643 What if negative one were in the list? 533 00:26:03,643 --> 00:26:07,720 He only knows that his answer is correct if he's exhaustively 534 00:26:07,720 --> 00:26:08,729 checked the whole list. 535 00:26:08,729 --> 00:26:10,020 So we look at the rest of this. 536 00:26:10,020 --> 00:26:11,394 Three-- that was a waste of time. 537 00:26:11,394 --> 00:26:13,540 Got unlucky, but I was still correct to do so. 538 00:26:13,540 --> 00:26:17,857 And so now he presumably selected the smallest number 539 00:26:17,857 --> 00:26:20,440 and just put it at the beginning of the list, as I'll do here. 540 00:26:20,440 --> 00:26:23,480 Now what did you do next, even though you didn't think about it nearly 541 00:26:23,480 --> 00:26:25,962 to this extent? 542 00:26:25,962 --> 00:26:27,670 Repeat the process, so some kind of loop. 543 00:26:27,670 --> 00:26:28,920 There's a familiar idea. 544 00:26:28,920 --> 00:26:30,860 So here is four. 545 00:26:30,860 --> 00:26:32,110 That's currently the smallest. 546 00:26:32,110 --> 00:26:33,220 That's a candidate. 547 00:26:33,220 --> 00:26:33,900 Not anymore. 548 00:26:33,900 --> 00:26:34,770 Now I've seen two. 549 00:26:34,770 --> 00:26:36,630 That's the next smallest element. 550 00:26:36,630 --> 00:26:40,800 Three-- that's not smaller, so now Ben can pluck out the two. 551 00:26:40,800 --> 00:26:44,510 >> And now we repeat the process, and of course three gets pulled out next. 552 00:26:44,510 --> 00:26:45,420 Repeat the process. 553 00:26:45,420 --> 00:26:46,990 Four gets pulled out. 554 00:26:46,990 --> 00:26:50,140 And now we're out of numbers, so the list must be sorted. 555 00:26:50,140 --> 00:26:51,960 >> And indeed, this is a formal algorithm. 556 00:26:51,960 --> 00:26:56,610 A computer scientist would call this "selection sort," 557 00:26:56,610 --> 00:27:00,880 the idea being sort a list iteratively-- again 558 00:27:00,880 --> 00:27:03,807 and again and again selecting the smallest number. 559 00:27:03,807 --> 00:27:06,140 And what's nice about it is it's just so darn intuitive. 560 00:27:06,140 --> 00:27:07,470 It's so simple. 561 00:27:07,470 --> 00:27:11,100 And you can repeat the same operation again and again. 562 00:27:11,100 --> 00:27:12,150 It's simple. 563 00:27:12,150 --> 00:27:17,170 >> In this case it was fast, but how long does it actually take? 564 00:27:17,170 --> 00:27:19,880 Let's make it seem and feel a little more tedious. 565 00:27:19,880 --> 00:27:24,150 So one, two, three, four, five six, seven, eight, nine, 10, 11, 12, 13, 14, 566 00:27:24,150 --> 00:27:26,160 15, 16-- arbitrary number. 567 00:27:26,160 --> 00:27:28,780 I just wanted more this time than just the four. 568 00:27:28,780 --> 00:27:30,780 So if I've got a whole bunch of numbers now-- it 569 00:27:30,780 --> 00:27:32,420 doesn't even matter what they are-- let's 570 00:27:32,420 --> 00:27:34,380 think about what this algorithm really is like. 571 00:27:34,380 --> 00:27:35,857 >> Suppose there are numbers there. 572 00:27:35,857 --> 00:27:38,190 Again, doesn't matter what they are, but they're random. 573 00:27:38,190 --> 00:27:39,679 I am applying Ben's algorithm. 574 00:27:39,679 --> 00:27:41,220 I need to select the smallest number. 575 00:27:41,220 --> 00:27:41,761 What do I do? 576 00:27:41,761 --> 00:27:44,240 And I'm going to physically do it this time to act it out. 577 00:27:44,240 --> 00:27:46,099 Looking, looking, looking, looking, looking. 578 00:27:46,099 --> 00:27:48,140 Only by the time I get to the end of the list can 579 00:27:48,140 --> 00:27:51,230 I realize the smallest number was two this time. 580 00:27:51,230 --> 00:27:52,720 One's not in the list. 581 00:27:52,720 --> 00:27:54,400 So I put down two. 582 00:27:54,400 --> 00:27:55,590 >> What do I do next? 583 00:27:55,590 --> 00:27:58,600 Looking, looking, looking, looking. 584 00:27:58,600 --> 00:28:02,250 Now I found the number seven, because there's gaps in these numbers-- 585 00:28:02,250 --> 00:28:03,300 but just arbitrary. 586 00:28:03,300 --> 00:28:03,800 All right. 587 00:28:03,800 --> 00:28:06,030 So now I can put down seven. 588 00:28:06,030 --> 00:28:08,860 Looking looking, looking. 589 00:28:08,860 --> 00:28:11,030 >> Now I'm assuming, of course, that Ben doesn't 590 00:28:11,030 --> 00:28:14,780 have extra RAM, extra memory, because, of course, 591 00:28:14,780 --> 00:28:16,080 I'm looking at the same number. 592 00:28:16,080 --> 00:28:18,246 Surely I could have remembered all of those numbers, 593 00:28:18,246 --> 00:28:19,930 and that's absolutely true. 594 00:28:19,930 --> 00:28:22,610 But if Ben remembers all of the numbers he's seen, 595 00:28:22,610 --> 00:28:24,430 he hasn't really made fundamental progress 596 00:28:24,430 --> 00:28:26,170 because he already has the ability to search 597 00:28:26,170 --> 00:28:27,540 through the numbers on the board. 598 00:28:27,540 --> 00:28:29,373 Remembering all of the numbers doesn't help, 599 00:28:29,373 --> 00:28:32,490 because he can still as a computer only look at, we've said, one number 600 00:28:32,490 --> 00:28:33,080 at a time. 601 00:28:33,080 --> 00:28:35,760 So there's no sort of cheat there that you can leverage. 602 00:28:35,760 --> 00:28:39,170 >> So in reality, as I keep searching the list, 603 00:28:39,170 --> 00:28:44,200 I literally have to just keep going back and forth through it, plucking out 604 00:28:44,200 --> 00:28:45,710 the next smallest number. 605 00:28:45,710 --> 00:28:48,810 And as you can kind of infer from my silly movements, 606 00:28:48,810 --> 00:28:50,860 this just gets very tedious very quickly, 607 00:28:50,860 --> 00:28:54,850 and I seem to be going back and forth, back and forth quite a bit. 608 00:28:54,850 --> 00:29:03,220 Now to be fair, I don't have to go quite as, well, let's see-- to be fair, 609 00:29:03,220 --> 00:29:06,310 I don't have to walk quite as many steps each time. 610 00:29:06,310 --> 00:29:09,200 Because, of course, as I select numbers from the list, 611 00:29:09,200 --> 00:29:11,860 the remaining list is getting shorter. 612 00:29:11,860 --> 00:29:14,240 >> And so let's think about how many steps I'm actually 613 00:29:14,240 --> 00:29:16,010 traipsing through each time. 614 00:29:16,010 --> 00:29:18,950 In the very first situation we had 16 numbers, 615 00:29:18,950 --> 00:29:22,210 and so maximally-- let's just do this for a discussion-- 616 00:29:22,210 --> 00:29:25,640 I had to look through 16 numbers to find the smallest. 617 00:29:25,640 --> 00:29:28,420 But once I plucked out the smallest number, how 618 00:29:28,420 --> 00:29:30,590 long was the remaining list, of course? 619 00:29:30,590 --> 00:29:31,420 Just 15. 620 00:29:31,420 --> 00:29:34,670 So how many numbers did Ben or I have to look through the second time around? 621 00:29:34,670 --> 00:29:36,832 15, just to go and find the smallest. 622 00:29:36,832 --> 00:29:39,540 But now, of course, the list is, too, smaller than it was before. 623 00:29:39,540 --> 00:29:42,540 So how many steps did I have to take the next time? 624 00:29:42,540 --> 00:29:49,970 14 and then 13 and then 12, plus dot, dot, dot, until I'm left with just one. 625 00:29:49,970 --> 00:29:53,146 So now a computer scientist would ask, well, what does that all equal? 626 00:29:53,146 --> 00:29:55,770 It actually equals some concrete number that we could certainly 627 00:29:55,770 --> 00:30:00,490 do arithmetically, but we want to talk about the efficiency of algorithms 628 00:30:00,490 --> 00:30:04,940 a little more formulaically, independent of how long the list is. 629 00:30:04,940 --> 00:30:06,240 >> And so you know what? 630 00:30:06,240 --> 00:30:09,860 This is 16, but like I said before, let's just call the size of the problem 631 00:30:09,860 --> 00:30:10,970 n, where n is some number. 632 00:30:10,970 --> 00:30:13,220 Maybe it's 16, maybe it's three, maybe it's a million. 633 00:30:13,220 --> 00:30:13,761 I don't know. 634 00:30:13,761 --> 00:30:14,390 I don't care. 635 00:30:14,390 --> 00:30:16,520 What I really want is a formula that I can 636 00:30:16,520 --> 00:30:19,420 use to compare this algorithm against other algorithms 637 00:30:19,420 --> 00:30:22,350 that someone might claim are better or worse. 638 00:30:22,350 --> 00:30:25,430 >> So it turns out, and I only know this from grade school, 639 00:30:25,430 --> 00:30:34,790 that this actually works out to the same thing as n over n plus one over two. 640 00:30:34,790 --> 00:30:40,020 And this happens to equal, of course, n squared plus n over two. 641 00:30:40,020 --> 00:30:43,250 So if I wanted a formula for how many steps 642 00:30:43,250 --> 00:30:46,330 were involved in looking at all of those numbers again and again 643 00:30:46,330 --> 00:30:52,681 and again and again, I would say it's n squared plus n over two. 644 00:30:52,681 --> 00:30:53,430 But you know what? 645 00:30:53,430 --> 00:30:54,500 This just looks messy. 646 00:30:54,500 --> 00:30:56,470 I just really want a general sense of things. 647 00:30:56,470 --> 00:30:58,810 And you might recall from high school that there 648 00:30:58,810 --> 00:31:00,660 is the notion of highest order term. 649 00:31:00,660 --> 00:31:05,300 Which of these terms, the n squared, the n, or the half, 650 00:31:05,300 --> 00:31:07,550 has the most impact over time? 651 00:31:07,550 --> 00:31:11,920 The bigger n gets, which of these matters the most? 652 00:31:11,920 --> 00:31:15,560 >> In other words, if I plug in a million, n squared 653 00:31:15,560 --> 00:31:17,900 is going to be most likely the dominating factor, 654 00:31:17,900 --> 00:31:21,670 because a million times itself is a lot bigger 655 00:31:21,670 --> 00:31:23,682 than plus one additional million. 656 00:31:23,682 --> 00:31:24,390 So you know what? 657 00:31:24,390 --> 00:31:27,305 This is such a darn big number if you square a number. 658 00:31:27,305 --> 00:31:28,430 This doesn't really matter. 659 00:31:28,430 --> 00:31:30,596 We're just going cross that out and forget about it. 660 00:31:30,596 --> 00:31:34,250 And so a computer scientist would say that the efficiency of this algorithm 661 00:31:34,250 --> 00:31:37,850 is on the order of n squared-- I mean truly an approximation. 662 00:31:37,850 --> 00:31:40,810 It is sort of roughly n squared. 663 00:31:40,810 --> 00:31:44,130 Over time, the bigger and bigger n gets, this 664 00:31:44,130 --> 00:31:47,610 is a good estimation for what the efficiency or lack of efficiency 665 00:31:47,610 --> 00:31:49,400 of this algorithm actually is. 666 00:31:49,400 --> 00:31:52,040 And I derive that, of course, from actually doing the math. 667 00:31:52,040 --> 00:31:54,040 But now I'm just waving my hands, because I just 668 00:31:54,040 --> 00:31:55,790 want a general sense of this algorithm. 669 00:31:55,790 --> 00:31:58,850 >> So using the same logic, meanwhile, let's consider another algorithm 670 00:31:58,850 --> 00:32:01,162 we already looked at-- linear search. 671 00:32:01,162 --> 00:32:02,870 When I was searching for the phone book-- 672 00:32:02,870 --> 00:32:05,980 not sorting it, searching through the phone book-- 673 00:32:05,980 --> 00:32:09,197 we kept saying that it was 1,000 steps, or 500 steps. 674 00:32:09,197 --> 00:32:10,280 But let's generalize that. 675 00:32:10,280 --> 00:32:12,860 If there's n pages in the phone book, what's 676 00:32:12,860 --> 00:32:17,250 the running time or the efficiency of linear search? 677 00:32:17,250 --> 00:32:19,750 It's on the order of how many steps to find 678 00:32:19,750 --> 00:32:24,210 Mike Smith using linear search, the first algorithm, or even the second? 679 00:32:24,210 --> 00:32:27,240 680 00:32:27,240 --> 00:32:31,710 >> In the worst case, Mike is at the end of the book. 681 00:32:31,710 --> 00:32:35,590 So if the phone book has 1,000 pages, we said last time, in the worst case, 682 00:32:35,590 --> 00:32:38,380 it might take roughly how many pages to find Mike? 683 00:32:38,380 --> 00:32:38,990 Like 1,000. 684 00:32:38,990 --> 00:32:39,830 It's an upper bound. 685 00:32:39,830 --> 00:32:41,790 It's a worst possible situation. 686 00:32:41,790 --> 00:32:44,410 But again, we're moving away from numbers like 1,000 now. 687 00:32:44,410 --> 00:32:45,730 It's just n. 688 00:32:45,730 --> 00:32:47,470 >> So what's the logical conclusion? 689 00:32:47,470 --> 00:32:50,210 Finding Mike in a phone book that has n pages 690 00:32:50,210 --> 00:32:55,280 might take, in the very worst case, how many steps on the order of n? 691 00:32:55,280 --> 00:32:58,110 And indeed a computer scientist would say 692 00:32:58,110 --> 00:33:02,340 that the running time, or the performance or the efficiency 693 00:33:02,340 --> 00:33:07,470 or inefficiency, of an algorithm like a linear search is on the order of n. 694 00:33:07,470 --> 00:33:10,010 And we can apply the same logic of crossing something out 695 00:33:10,010 --> 00:33:13,170 as I just did to the second algorithm we had with the phone book, 696 00:33:13,170 --> 00:33:16,040 where we went two pages at a time. 697 00:33:16,040 --> 00:33:20,120 >> So 1,000 page phone book might take us 500 page turns, plus one 698 00:33:20,120 --> 00:33:21,910 if we double back a bit. 699 00:33:21,910 --> 00:33:26,590 So if a phone book has n pages, but we're doing two pages at a time, 700 00:33:26,590 --> 00:33:28,900 that's roughly what? 701 00:33:28,900 --> 00:33:33,190 N over two, so that's like n over two. 702 00:33:33,190 --> 00:33:38,490 But I made the claim a moment ago that n over two-- 703 00:33:38,490 --> 00:33:41,060 that's kind of the same as just n. 704 00:33:41,060 --> 00:33:44,050 It's just a constant factor, computer scientists would say. 705 00:33:44,050 --> 00:33:45,970 Let's only focus on the variables, really-- 706 00:33:45,970 --> 00:33:47,780 the biggest variables in the equation. 707 00:33:47,780 --> 00:33:52,530 >> So linear search, whether done one page at a time or two pages at a time, 708 00:33:52,530 --> 00:33:54,810 is sort of fundamentally the same. 709 00:33:54,810 --> 00:33:56,880 It's still on the order of n. 710 00:33:56,880 --> 00:34:01,930 But I claimed with my picture earlier that the third algorithm was not 711 00:34:01,930 --> 00:34:02,480 linear. 712 00:34:02,480 --> 00:34:03,605 It was not a straight line. 713 00:34:03,605 --> 00:34:08,659 It was that curved line, and the algebraic formula there was what? 714 00:34:08,659 --> 00:34:11,812 Log of n-- so log base two of n. 715 00:34:11,812 --> 00:34:14,520 And we don't have to go into too much detail on logarithms today, 716 00:34:14,520 --> 00:34:17,394 but most computer scientists wouldn't even tell you what the base is. 717 00:34:17,394 --> 00:34:20,639 Because it's all just constant factors, so to speak, 718 00:34:20,639 --> 00:34:22,659 just slight numeric differences. 719 00:34:22,659 --> 00:34:31,179 And so this would be a very common way for particularly formal computer 720 00:34:31,179 --> 00:34:33,949 scientists at a board or programmers at a white board 721 00:34:33,949 --> 00:34:36,889 actually arguing which algorithm they would use 722 00:34:36,889 --> 00:34:39,500 or what the efficiency of their algorithm is. 723 00:34:39,500 --> 00:34:42,960 >> And this isn't necessarily something you discuss in any great detail, 724 00:34:42,960 --> 00:34:47,889 but a good programmer is someone who has a solid, formal background. 725 00:34:47,889 --> 00:34:50,120 He's able to speak to you in this kind of way 726 00:34:50,120 --> 00:34:53,350 and actually make qualitative arguments as 727 00:34:53,350 --> 00:34:56,870 to why one algorithm or one piece of software 728 00:34:56,870 --> 00:35:00,165 is superior in some way to another. 729 00:35:00,165 --> 00:35:02,540 Because you could certainly just run one person's program 730 00:35:02,540 --> 00:35:04,980 and count the number of seconds it takes to sort some numbers, 731 00:35:04,980 --> 00:35:06,710 and you can run some other person's program 732 00:35:06,710 --> 00:35:08,418 and count the number of seconds it takes. 733 00:35:08,418 --> 00:35:12,840 But this is a more general way that you can use to analyze algorithms, 734 00:35:12,840 --> 00:35:15,520 if you will, just on paper or just verbally. 735 00:35:15,520 --> 00:35:18,430 Without even running it, without even trying sample inputs, 736 00:35:18,430 --> 00:35:20,180 you can just reason through it. 737 00:35:20,180 --> 00:35:24,670 And so with hiring a developer or if having him or her sort of argue to you 738 00:35:24,670 --> 00:35:28,460 why their algorithm, their secret sauce for searching billions 739 00:35:28,460 --> 00:35:30,580 of web pages for your company is better, these 740 00:35:30,580 --> 00:35:33,302 are the kinds of arguments they should ideally be able to make. 741 00:35:33,302 --> 00:35:35,010 Or at least these are the kinds of things 742 00:35:35,010 --> 00:35:40,211 that would come up in discussion, at least in a very formal discussion. 743 00:35:40,211 --> 00:35:40,710 All right. 744 00:35:40,710 --> 00:35:44,400 So Ben proposed something called selection sort. 745 00:35:44,400 --> 00:35:48,200 But I'm going to propose that there's other ways of doing this, too. 746 00:35:48,200 --> 00:35:50,400 What I didn't really like about Ben's algorithm 747 00:35:50,400 --> 00:35:54,400 is that he kept walking, or having me walk, back and forth 748 00:35:54,400 --> 00:35:56,930 and back and forth and back and forth. 749 00:35:56,930 --> 00:36:04,130 What if instead I were to do something like these numbers here 750 00:36:04,130 --> 00:36:08,200 and I were to just deal with each number in turn as I'm given it? 751 00:36:08,200 --> 00:36:10,780 >> In other words, here's my list of numbers. 752 00:36:10,780 --> 00:36:12,944 Four, one, three, two. 753 00:36:12,944 --> 00:36:14,360 And I'm going to do the following. 754 00:36:14,360 --> 00:36:17,230 I'm going to insert the numbers where they belong rather 755 00:36:17,230 --> 00:36:18,980 than selecting them one at a time. 756 00:36:18,980 --> 00:36:20,820 In other words, here's the number four. 757 00:36:20,820 --> 00:36:22,430 >> Here's my original list. 758 00:36:22,430 --> 00:36:25,290 And I'm going to maintain essentially a new list here. 759 00:36:25,290 --> 00:36:26,710 So this is the old list. 760 00:36:26,710 --> 00:36:28,560 This is the new list. 761 00:36:28,560 --> 00:36:30,220 I see the number four first. 762 00:36:30,220 --> 00:36:34,500 My new list is initially empty, so it is trivially the case 763 00:36:34,500 --> 00:36:36,410 that four is now assorted list. 764 00:36:36,410 --> 00:36:39,560 I'm just taking the number I'm given, and I'm putting it in my new list. 765 00:36:39,560 --> 00:36:41,460 >> Is this new list sorted? 766 00:36:41,460 --> 00:36:41,990 Yeah. 767 00:36:41,990 --> 00:36:45,090 It's stupid because there's just one element, but it's absolutely sorted. 768 00:36:45,090 --> 00:36:46,390 There's nothing out of place. 769 00:36:46,390 --> 00:36:49,290 It's more interesting, this algorithm, when I move to the next step. 770 00:36:49,290 --> 00:36:50,550 >> Now I have one. 771 00:36:50,550 --> 00:36:55,430 So one, of course, belongs at the beginning or the end of this new list? 772 00:36:55,430 --> 00:36:56,360 The beginning. 773 00:36:56,360 --> 00:36:58,530 So I have to do some work now. 774 00:36:58,530 --> 00:37:01,410 I've been taking some liberties with my marker 775 00:37:01,410 --> 00:37:03,120 by just drawing things where I want them, 776 00:37:03,120 --> 00:37:05,320 but that's not really accurate in a computer. 777 00:37:05,320 --> 00:37:08,530 A computer, as we know, has RAM, or Random Access Memory, 778 00:37:08,530 --> 00:37:12,411 and that's one byte and another byte and another byte. 779 00:37:12,411 --> 00:37:14,910 And if you have a gigabyte of RAM, you have a billion bytes, 780 00:37:14,910 --> 00:37:16,680 but they're physically in one location. 781 00:37:16,680 --> 00:37:19,540 You can't just move stuff around by drawing it on the board 782 00:37:19,540 --> 00:37:20,750 wherever you want. 783 00:37:20,750 --> 00:37:24,090 So if my new list has four locations in memory, 784 00:37:24,090 --> 00:37:27,480 unfortunately the four is already in the wrong place. 785 00:37:27,480 --> 00:37:30,410 >> So to insert the number one I can't just draw it here. 786 00:37:30,410 --> 00:37:31,901 This memory location doesn't exist. 787 00:37:31,901 --> 00:37:35,150 That would be cheating, and I have been cheating pictorially for a few minutes 788 00:37:35,150 --> 00:37:35,800 here. 789 00:37:35,800 --> 00:37:40,950 So really, if I want to put one here, I have to temporarily copy the four 790 00:37:40,950 --> 00:37:43,030 and then put the one there. 791 00:37:43,030 --> 00:37:45,500 >> That's fine, that's correct, that's technically possible, 792 00:37:45,500 --> 00:37:48,410 but realize that's extra work. 793 00:37:48,410 --> 00:37:50,460 I didn't just put the number in place. 794 00:37:50,460 --> 00:37:53,026 I first had to move a number, then put it in place, 795 00:37:53,026 --> 00:37:54,650 so I kind of doubled my amount of work. 796 00:37:54,650 --> 00:37:55,660 So keep that in mind. 797 00:37:55,660 --> 00:37:57,120 >> But I'm now done with this element. 798 00:37:57,120 --> 00:37:59,056 Now I want to grab the number three. 799 00:37:59,056 --> 00:38:00,430 Where, of course, does it belong? 800 00:38:00,430 --> 00:38:01,480 In between. 801 00:38:01,480 --> 00:38:03,650 I can't cheat anymore and just put it there, 802 00:38:03,650 --> 00:38:06,770 because, again, this memory is in physical locations. 803 00:38:06,770 --> 00:38:10,900 So I have to copy the four and put the three over here. 804 00:38:10,900 --> 00:38:11,550 Not a big deal. 805 00:38:11,550 --> 00:38:14,610 It's just one extra step again-- feels very inexpensive. 806 00:38:14,610 --> 00:38:16,445 >> But now I move on to the two. 807 00:38:16,445 --> 00:38:17,820 The two, of course, belongs here. 808 00:38:17,820 --> 00:38:20,990 Now you start to see how the work can pile up. 809 00:38:20,990 --> 00:38:23,520 Now what do I have to do? 810 00:38:23,520 --> 00:38:28,570 Yeah, I have to move the four, I then have to copy the three, 811 00:38:28,570 --> 00:38:31,200 and now I can insert the two. 812 00:38:31,200 --> 00:38:34,460 And the catch with this algorithm, interestingly enough, 813 00:38:34,460 --> 00:38:41,050 is that suppose we have a more extreme case where it's let's say eight, seven, 814 00:38:41,050 --> 00:38:45,150 six, five, four, three, two, one. 815 00:38:45,150 --> 00:38:49,450 This is, in many contexts, the worst case scenario, 816 00:38:49,450 --> 00:38:51,570 because the darn thing is literally backwards. 817 00:38:51,570 --> 00:38:53,670 >> It doesn't really affect Ben's algorithm, 818 00:38:53,670 --> 00:38:55,940 because in Ben's selection sort he's going to keep 819 00:38:55,940 --> 00:38:58,359 going back and forth through the list. 820 00:38:58,359 --> 00:39:01,150 And because he was always looking through the whole remaining list, 821 00:39:01,150 --> 00:39:02,858 it doesn't matter where the elements are. 822 00:39:02,858 --> 00:39:05,630 But in this case with my inserting approach-- let's try this. 823 00:39:05,630 --> 00:39:08,616 >> So one, two, three, four, five, six, seven, eight. 824 00:39:08,616 --> 00:39:11,630 One, two, three, four, five, six, seven, eight. 825 00:39:11,630 --> 00:39:14,320 I'm going to take the eight, and where do I put it? 826 00:39:14,320 --> 00:39:17,260 Well, at the beginning of my list, because this new list is sorted. 827 00:39:17,260 --> 00:39:18,760 And I cross it out. 828 00:39:18,760 --> 00:39:20,551 >> Where do I put the seven? 829 00:39:20,551 --> 00:39:21,050 Darn it. 830 00:39:21,050 --> 00:39:23,174 It needs to go there, so I have to do some copying. 831 00:39:23,174 --> 00:39:26,820 832 00:39:26,820 --> 00:39:28,480 And now the seven goes here. 833 00:39:28,480 --> 00:39:29,860 Now I move on to the six. 834 00:39:29,860 --> 00:39:30,980 Now it's even more work. 835 00:39:30,980 --> 00:39:32,570 >> Eight has to go here. 836 00:39:32,570 --> 00:39:33,920 Seven has to go here. 837 00:39:33,920 --> 00:39:35,450 Now the six can go here. 838 00:39:35,450 --> 00:39:37,950 Now I grab the five. 839 00:39:37,950 --> 00:39:40,560 Now the eight has to go here, seven has to go here, 840 00:39:40,560 --> 00:39:43,650 six has to go here, and now the five and repeat. 841 00:39:43,650 --> 00:39:46,610 And I'm pretty much moving it constantly. 842 00:39:46,610 --> 00:39:52,950 >> So at the end, this algorithm-- we'll call it insertion sort-- actually 843 00:39:52,950 --> 00:39:55,020 has a lot of work, too. 844 00:39:55,020 --> 00:39:56,970 It's just different kind of work than Ben's. 845 00:39:56,970 --> 00:40:00,090 Ben's work had me going back and forth all the time, 846 00:40:00,090 --> 00:40:03,510 selecting the next smallest element again and again. 847 00:40:03,510 --> 00:40:06,660 So it was this very visual kind of work. 848 00:40:06,660 --> 00:40:10,600 >> This other algorithm, which is still correct-- it will get the job done-- 849 00:40:10,600 --> 00:40:12,800 just changes the amount of work. 850 00:40:12,800 --> 00:40:15,420 It looks like initially you're saving, because you're just 851 00:40:15,420 --> 00:40:19,190 dealing with each element up front without walking all 852 00:40:19,190 --> 00:40:20,930 the way through the list like Ben was. 853 00:40:20,930 --> 00:40:25,300 But the problem is, especially in these crazy cases where it's all backwards, 854 00:40:25,300 --> 00:40:27,830 you're just kind of postponing the hard work 855 00:40:27,830 --> 00:40:30,360 until you have to fix your mistakes. 856 00:40:30,360 --> 00:40:33,919 >> And so if you can imagine this eight and seven and six and five 857 00:40:33,919 --> 00:40:36,710 and later four and three and two moving their way through the list, 858 00:40:36,710 --> 00:40:39,060 we've just changed the type of work we're doing. 859 00:40:39,060 --> 00:40:42,340 Instead of doing it at the beginning of my iteration, 860 00:40:42,340 --> 00:40:45,250 I'm just doing it at the end of every iteration. 861 00:40:45,250 --> 00:40:50,550 So it turns out that this algorithm, too, generally called insertion sort, 862 00:40:50,550 --> 00:40:52,190 is also on the order of n squared. 863 00:40:52,190 --> 00:40:56,480 It's actually no better, no better at all. 864 00:40:56,480 --> 00:41:00,810 >> However, there's a third approach I would encourage us to consider, 865 00:41:00,810 --> 00:41:02,970 which is this. 866 00:41:02,970 --> 00:41:07,850 So suppose my list, for simplicity again, is four, one, three, 867 00:41:07,850 --> 00:41:11,080 two-- just four numbers. 868 00:41:11,080 --> 00:41:13,300 Ben had good intuition, good human intuition 869 00:41:13,300 --> 00:41:16,340 before, by which we fixed the whole list eventually-- insertion sort. 870 00:41:16,340 --> 00:41:18,020 I coaxed us along. 871 00:41:18,020 --> 00:41:22,530 But let's consider the simplest way to fix this list. 872 00:41:22,530 --> 00:41:24,110 >> This list is not sorted. 873 00:41:24,110 --> 00:41:26,130 Why? 874 00:41:26,130 --> 00:41:31,920 In English, explain why it's not actually sorted. 875 00:41:31,920 --> 00:41:33,400 What does it mean not to be sorted? 876 00:41:33,400 --> 00:41:34,220 >> STUDENT: It's not sequential. 877 00:41:34,220 --> 00:41:34,990 >> DAVID MALAN: Not sequential. 878 00:41:34,990 --> 00:41:35,822 Give me an example. 879 00:41:35,822 --> 00:41:37,180 >> STUDENT: Put them in order. 880 00:41:37,180 --> 00:41:37,440 >> DAVID MALAN: OK. 881 00:41:37,440 --> 00:41:38,790 Give me a more specific example. 882 00:41:38,790 --> 00:41:39,832 >> STUDENT: Ascending order. 883 00:41:39,832 --> 00:41:41,206 DAVID MALAN: Not ascending order. 884 00:41:41,206 --> 00:41:42,100 Be more precise. 885 00:41:42,100 --> 00:41:45,190 I don't know what you mean by ascending. 886 00:41:45,190 --> 00:41:47,150 What's wrong? 887 00:41:47,150 --> 00:41:49,930 >> STUDENT: The smallest of the numbers is not in the first space. 888 00:41:49,930 --> 00:41:51,140 >> DAVID MALAN: Smallest number's not in the first space. 889 00:41:51,140 --> 00:41:52,120 Be more specific. 890 00:41:52,120 --> 00:41:55,000 I'm starting to catch on. 891 00:41:55,000 --> 00:41:59,470 We're counting, but what's out of order here? 892 00:41:59,470 --> 00:42:00,707 >> STUDENT: Numerical sequence. 893 00:42:00,707 --> 00:42:02,040 DAVID MALAN: Numerical sequence. 894 00:42:02,040 --> 00:42:04,248 Everyone's kind of keeping it here-- very high level. 895 00:42:04,248 --> 00:42:07,450 Just literally tell me what's wrong like a five-year-old might. 896 00:42:07,450 --> 00:42:08,310 >> STUDENT: Plus one. 897 00:42:08,310 --> 00:42:08,750 >> DAVID MALAN: What's that? 898 00:42:08,750 --> 00:42:09,610 >> STUDENT: Plus one. 899 00:42:09,610 --> 00:42:11,235 >> DAVID MALAN: What do you mean plus one? 900 00:42:11,235 --> 00:42:12,754 901 00:42:12,754 --> 00:42:14,170 Give me a different five-year-old. 902 00:42:14,170 --> 00:42:16,840 903 00:42:16,840 --> 00:42:18,330 What's wrong, mom? 904 00:42:18,330 --> 00:42:19,940 What's wrong, dad? 905 00:42:19,940 --> 00:42:22,808 What do you mean this isn't sorted? 906 00:42:22,808 --> 00:42:24,370 >> STUDENT: It's not the right place. 907 00:42:24,370 --> 00:42:25,580 >> DAVID MALAN: What's not in the right place? 908 00:42:25,580 --> 00:42:26,174 >> STUDENT: Four. 909 00:42:26,174 --> 00:42:27,090 DAVID MALAN: OK, good. 910 00:42:27,090 --> 00:42:29,110 So four is not where it should be. 911 00:42:29,110 --> 00:42:30,590 In particular, is this right? 912 00:42:30,590 --> 00:42:33,000 Four and one, the first two numbers I see. 913 00:42:33,000 --> 00:42:34,930 Is this right? 914 00:42:34,930 --> 00:42:36,427 No, they're out of order, right? 915 00:42:36,427 --> 00:42:38,135 In fact, think now about a computer, too. 916 00:42:38,135 --> 00:42:40,824 It can only look at maybe one, maybe two things at once-- 917 00:42:40,824 --> 00:42:43,240 and actually only one thing at a time, but it can at least 918 00:42:43,240 --> 00:42:45,790 look at one thing then the next thing right next to it. 919 00:42:45,790 --> 00:42:47,380 >> So are these in order? 920 00:42:47,380 --> 00:42:48,032 Of course not. 921 00:42:48,032 --> 00:42:48,740 So you know what? 922 00:42:48,740 --> 00:42:51,020 Why don't we take baby steps fixing this problem 923 00:42:51,020 --> 00:42:53,410 instead of doing these fancy algorithms like Ben, where 924 00:42:53,410 --> 00:42:56,440 he's sort of fixing it by looping through the list 925 00:42:56,440 --> 00:42:59,670 instead of doing what I did, where I just kind of fixed it as we go? 926 00:42:59,670 --> 00:43:03,650 Let's just literally break down the notion of order-- numerical order, 927 00:43:03,650 --> 00:43:06,990 call it whatever you want-- into these pairwise comparisons. 928 00:43:06,990 --> 00:43:07,590 >> Four and one. 929 00:43:07,590 --> 00:43:09,970 Is this the correct order? 930 00:43:09,970 --> 00:43:11,310 So let's fix that. 931 00:43:11,310 --> 00:43:14,700 One and four, and then we'll just copy that. 932 00:43:14,700 --> 00:43:15,560 All right, good. 933 00:43:15,560 --> 00:43:17,022 I fixed one and four. 934 00:43:17,022 --> 00:43:18,320 Three and two? 935 00:43:18,320 --> 00:43:18,820 No. 936 00:43:18,820 --> 00:43:21,690 Let my words match my fingers. 937 00:43:21,690 --> 00:43:23,695 Four and three? 938 00:43:23,695 --> 00:43:27,930 >> It's not in order, so I'm going to do one, three, four, two. 939 00:43:27,930 --> 00:43:28,680 OK, good. 940 00:43:28,680 --> 00:43:32,310 Now four and two? 941 00:43:32,310 --> 00:43:33,370 We need to fix this, too. 942 00:43:33,370 --> 00:43:36,700 So one, three, two, four. 943 00:43:36,700 --> 00:43:39,820 So is it sorted? 944 00:43:39,820 --> 00:43:43,170 No, but is it closer to sorted? 945 00:43:43,170 --> 00:43:48,930 >> It is, because we fixed this mistake, we fixed this mistake, 946 00:43:48,930 --> 00:43:50,370 and we fixed this mistake. 947 00:43:50,370 --> 00:43:52,420 So we fixed three mistakes arguably. 948 00:43:52,420 --> 00:43:58,100 Still doesn't really look sorted, but it is objectively closer to sorted 949 00:43:58,100 --> 00:44:00,080 because we fixed some of those mistakes. 950 00:44:00,080 --> 00:44:02,047 >> Now what do I do next? 951 00:44:02,047 --> 00:44:03,630 I kind of reached the end of the list. 952 00:44:03,630 --> 00:44:05,680 I seemed to have fixed all the mistakes, but no. 953 00:44:05,680 --> 00:44:08,510 Because in this case, some numbers might have bubbled up closer 954 00:44:08,510 --> 00:44:10,410 to other numbers that are still out of order. 955 00:44:10,410 --> 00:44:12,951 So let's do it again, and I'll just do it in place this time. 956 00:44:12,951 --> 00:44:14,170 One and three? 957 00:44:14,170 --> 00:44:14,720 It's fine. 958 00:44:14,720 --> 00:44:16,070 Three and two? 959 00:44:16,070 --> 00:44:17,560 Of course no, so let's change that. 960 00:44:17,560 --> 00:44:19,160 So two, three. 961 00:44:19,160 --> 00:44:21,340 Three and four? 962 00:44:21,340 --> 00:44:24,370 And now let's just be particularly pedantic here. 963 00:44:24,370 --> 00:44:26,350 Is it sorted? 964 00:44:26,350 --> 00:44:29,280 You humans know it's sorted. 965 00:44:29,280 --> 00:44:30,400 >> I should try again. 966 00:44:30,400 --> 00:44:31,900 So Olivia is proposing I try again. 967 00:44:31,900 --> 00:44:32,530 Why? 968 00:44:32,530 --> 00:44:35,810 Because a computer doesn't have the luxury of our human eyes 969 00:44:35,810 --> 00:44:38,080 of just glancing back-- OK, I'm done. 970 00:44:38,080 --> 00:44:41,610 How does the computer determine that the list is now sorted? 971 00:44:41,610 --> 00:44:44,590 Mechanically. 972 00:44:44,590 --> 00:44:47,650 >> I should go through once more, and only if I 973 00:44:47,650 --> 00:44:51,190 don't make/find any mistakes can I then conclude as the computer, yep, 974 00:44:51,190 --> 00:44:51,980 we're good to go. 975 00:44:51,980 --> 00:44:54,850 So one and two, two and three, three and four. 976 00:44:54,850 --> 00:44:58,030 Now I can definitively say this is sorted, because I made no changes. 977 00:44:58,030 --> 00:45:01,940 Now it would be a bug and just foolish if I, the computer, 978 00:45:01,940 --> 00:45:05,640 asked those same questions again expecting different answers. 979 00:45:05,640 --> 00:45:07,110 Shouldn't happen. 980 00:45:07,110 --> 00:45:08,600 >> And so now the list is sorted. 981 00:45:08,600 --> 00:45:12,630 Unfortunately, running time of this algorithm is also n squared. 982 00:45:12,630 --> 00:45:13,130 Why? 983 00:45:13,130 --> 00:45:19,520 Because you have n numbers, and in the worst case you have to move n numbers 984 00:45:19,520 --> 00:45:23,637 n times because you have to keep going back to check and potentially fix 985 00:45:23,637 --> 00:45:24,220 these numbers. 986 00:45:24,220 --> 00:45:26,280 And we can do a more formal analysis, too. 987 00:45:26,280 --> 00:45:29,530 >> So this is all to say we've taken three different approaches, one 988 00:45:29,530 --> 00:45:32,210 of them immediately intuitive off the bat from Ben 989 00:45:32,210 --> 00:45:35,170 to my suggested insertion sort to this one 990 00:45:35,170 --> 00:45:38,540 where you kind of lose sight of the forest for the trees initially. 991 00:45:38,540 --> 00:45:41,760 But then if you take a step back, voila, we've fixed the sorting notion. 992 00:45:41,760 --> 00:45:43,824 So this is, dare say, a lower level perhaps 993 00:45:43,824 --> 00:45:45,740 than some of those other algorithms, but let's 994 00:45:45,740 --> 00:45:48,550 see if we can't visualize these by way of this. 995 00:45:48,550 --> 00:45:51,450 >> So this is some nice software that someone 996 00:45:51,450 --> 00:45:56,110 wrote using colorful bars that's going to do the following for us. 997 00:45:56,110 --> 00:45:57,736 Each of these bars represents a number. 998 00:45:57,736 --> 00:46:00,026 Taller the bar, the bigger the number, smaller the bar, 999 00:46:00,026 --> 00:46:00,990 the smaller the number. 1000 00:46:00,990 --> 00:46:05,880 So ideally we want a nice pyramid where it starts small and gets big, 1001 00:46:05,880 --> 00:46:08,330 and that would mean that these bars are sorted. 1002 00:46:08,330 --> 00:46:11,200 So I'm going to go ahead and choose, for instance, Ben's algorithm 1003 00:46:11,200 --> 00:46:13,990 first-- selection sort. 1004 00:46:13,990 --> 00:46:16,220 >> And notice what it's doing. 1005 00:46:16,220 --> 00:46:18,670 The way they've chosen to visualize this algorithm 1006 00:46:18,670 --> 00:46:22,090 is that, just like I was walking through my list, 1007 00:46:22,090 --> 00:46:24,710 this program is walking through its list of numbers, 1008 00:46:24,710 --> 00:46:28,160 highlighting in pink each number that it's looking at. 1009 00:46:28,160 --> 00:46:32,360 And what's about to happen right now? 1010 00:46:32,360 --> 00:46:35,154 >> The smallest number that I or Ben found suddenly 1011 00:46:35,154 --> 00:46:36,820 gets moved to the beginning of the list. 1012 00:46:36,820 --> 00:46:40,037 And notice they did evict the number that was there, 1013 00:46:40,037 --> 00:46:41,120 and that's perfectly fine. 1014 00:46:41,120 --> 00:46:42,600 I didn't get into that level of detail. 1015 00:46:42,600 --> 00:46:44,308 But we need to put that number somewhere, 1016 00:46:44,308 --> 00:46:47,775 so we just moved it to the open spot that was created. 1017 00:46:47,775 --> 00:46:49,900 So I'm going to speed this up, because otherwise it 1018 00:46:49,900 --> 00:46:51,871 becomes very tedious quickly. 1019 00:46:51,871 --> 00:46:55,800 1020 00:46:55,800 --> 00:46:58,600 Animation speed-- there we go. 1021 00:46:58,600 --> 00:47:01,850 So now same principle I was applying, but you 1022 00:47:01,850 --> 00:47:06,540 can start to feel the algorithm, if you will, or see it a little more clearly. 1023 00:47:06,540 --> 00:47:13,190 And this algorithm has the effect of selecting the next smallest element, 1024 00:47:13,190 --> 00:47:16,422 so you're going to start to see it ramp up on the left. 1025 00:47:16,422 --> 00:47:19,130 And on each iteration, as I proposed, it does a little less work. 1026 00:47:19,130 --> 00:47:21,921 It doesn't have to go all the way back to the left end of the list, 1027 00:47:21,921 --> 00:47:23,900 because it already knows those are sorted. 1028 00:47:23,900 --> 00:47:28,129 So it kind of feels like it's accelerating, even though each step is 1029 00:47:28,129 --> 00:47:29,420 taking the same amount of time. 1030 00:47:29,420 --> 00:47:31,600 There's just fewer steps remaining. 1031 00:47:31,600 --> 00:47:35,240 And now you can kind of feel the algorithm cleaning up the end of it, 1032 00:47:35,240 --> 00:47:37,040 and indeed now it's sorted. 1033 00:47:37,040 --> 00:47:41,620 >> So insertion sort is all done. 1034 00:47:41,620 --> 00:47:43,600 I need to re-randomize the array. 1035 00:47:43,600 --> 00:47:45,940 And notice I can just keep randomizing it, 1036 00:47:45,940 --> 00:47:50,630 and we'll get an approximation of the same approach, insertion sort. 1037 00:47:50,630 --> 00:47:55,050 Let me slow it down to here. 1038 00:47:55,050 --> 00:47:56,915 Let's start that over. 1039 00:47:56,915 --> 00:47:57,414 Stop. 1040 00:47:57,414 --> 00:48:00,662 1041 00:48:00,662 --> 00:48:02,410 >> Let's skip four. 1042 00:48:02,410 --> 00:48:03,200 There we go. 1043 00:48:03,200 --> 00:48:04,190 Randomize they array. 1044 00:48:04,190 --> 00:48:05,555 And here we go-- insertion sort. 1045 00:48:05,555 --> 00:48:10,260 1046 00:48:10,260 --> 00:48:12,800 Play. 1047 00:48:12,800 --> 00:48:17,280 Notice that it's dealing with each element it encounters right away, 1048 00:48:17,280 --> 00:48:20,282 but if it belongs in the wrong place notice 1049 00:48:20,282 --> 00:48:21,740 all of the work that has to happen. 1050 00:48:21,740 --> 00:48:24,700 We have to keep shifting more and more elements to make room 1051 00:48:24,700 --> 00:48:27,340 for the one we want to put in place. 1052 00:48:27,340 --> 00:48:30,740 >> So we're focusing on the left end of the list only. 1053 00:48:30,740 --> 00:48:34,460 Notice we haven't even looked at-- we haven't highlighted in pink anything 1054 00:48:34,460 --> 00:48:35,610 to the right. 1055 00:48:35,610 --> 00:48:38,180 We're just dealing with the problems as we go, 1056 00:48:38,180 --> 00:48:40,430 but we're creating a lot of work for ourselves still. 1057 00:48:40,430 --> 00:48:44,410 And so if we speed this up now to go to completion, 1058 00:48:44,410 --> 00:48:46,210 it has a different feel to it indeed. 1059 00:48:46,210 --> 00:48:50,150 It's just focusing on the left end but doing a little more work as needed-- 1060 00:48:50,150 --> 00:48:53,230 kind of smoothing things over, fixing things, 1061 00:48:53,230 --> 00:48:58,350 but dealing ultimately with each element one at a time 1062 00:48:58,350 --> 00:49:07,740 until we get to the-- well, we all know how this is going to end, 1063 00:49:07,740 --> 00:49:09,700 so it's a little underwhelming perhaps. 1064 00:49:09,700 --> 00:49:12,830 >> But the list in the end-- spoiler-- is going to be sorted. 1065 00:49:12,830 --> 00:49:15,300 So let's look at one last one. 1066 00:49:15,300 --> 00:49:16,840 We can't just skip now. 1067 00:49:16,840 --> 00:49:18,000 We're almost there. 1068 00:49:18,000 --> 00:49:19,980 Two to go, one to go. 1069 00:49:19,980 --> 00:49:22,680 And voila. 1070 00:49:22,680 --> 00:49:23,450 Excellent. 1071 00:49:23,450 --> 00:49:27,220 >> So now let's do one last one, re-randomizing with bubble sort. 1072 00:49:27,220 --> 00:49:31,690 And notice here, especially if I slow it down, this does keep swooping through. 1073 00:49:31,690 --> 00:49:36,830 But notice it just makes pairwise comparisons-- sort of local solutions. 1074 00:49:36,830 --> 00:49:39,050 But as soon as we get to the end of the list in pink, 1075 00:49:39,050 --> 00:49:40,690 what's going to have to happen again? 1076 00:49:40,690 --> 00:49:44,539 1077 00:49:44,539 --> 00:49:46,830 Yeah, it's going to have to start over, because it only 1078 00:49:46,830 --> 00:49:49,870 fixed pairwise mistakes. 1079 00:49:49,870 --> 00:49:53,120 And that might have revealed yet others. 1080 00:49:53,120 --> 00:49:58,950 And so if you speed this up, you'll see that, much as the name implies, 1081 00:49:58,950 --> 00:50:01,870 the smaller elements-- or rather, the larger elements-- are starting 1082 00:50:01,870 --> 00:50:03,740 to bubble up to the top, if you will. 1083 00:50:03,740 --> 00:50:07,380 And the smaller elements are starting to bubble down to the left. 1084 00:50:07,380 --> 00:50:10,780 And indeed, that's kind of the visual effect as well. 1085 00:50:10,780 --> 00:50:17,150 And so this will end up finishing in a very similar way, too. 1086 00:50:17,150 --> 00:50:19,160 >> We don't have to dwell on this particular one. 1087 00:50:19,160 --> 00:50:21,010 Let me open this now, too. 1088 00:50:21,010 --> 00:50:24,040 There's a few other sorting algorithms in the world, a few of which 1089 00:50:24,040 --> 00:50:25,580 are captured here. 1090 00:50:25,580 --> 00:50:29,960 And especially for learners who aren't necessarily visual or mathematical, 1091 00:50:29,960 --> 00:50:31,930 as we did before, we can also do this audially 1092 00:50:31,930 --> 00:50:34,210 if we associate a sound with this. 1093 00:50:34,210 --> 00:50:36,990 And just for fun, here's a few different algorithms, 1094 00:50:36,990 --> 00:50:40,950 and one of them in particular you're going to notice is called "merge sort." 1095 00:50:40,950 --> 00:50:43,250 >> It is actually a fundamentally better algorithm, 1096 00:50:43,250 --> 00:50:45,860 such that merge sort, one of the ones you're about to see, 1097 00:50:45,860 --> 00:50:49,170 is not order of n squared. 1098 00:50:49,170 --> 00:50:57,280 It's on the order of n times log of n, which is actually smaller and thus 1099 00:50:57,280 --> 00:50:58,940 faster than those other three. 1100 00:50:58,940 --> 00:51:00,670 And there's a couple other silly ones that we'll see. 1101 00:51:00,670 --> 00:51:01,933 >> So here we go with some sound. 1102 00:51:01,933 --> 00:51:06,620 1103 00:51:06,620 --> 00:51:10,490 This is insertion sort, so again it's just dealing with the elements 1104 00:51:10,490 --> 00:51:13,420 as they come. 1105 00:51:13,420 --> 00:51:17,180 This is bubble sort, so it's considering them pairs at a time. 1106 00:51:17,180 --> 00:51:22,030 1107 00:51:22,030 --> 00:51:24,490 And again, the biggest elements are bubbling up to the top. 1108 00:51:24,490 --> 00:51:38,098 1109 00:51:38,098 --> 00:51:41,710 >> Next up selection sort. 1110 00:51:41,710 --> 00:51:45,420 This is Ben's algorithm, where again he's selecting iteratively 1111 00:51:45,420 --> 00:51:46,843 the next smallest element. 1112 00:51:46,843 --> 00:51:49,801 1113 00:51:49,801 --> 00:51:53,900 And again, now you can really hear that it's speeding up but only in so far 1114 00:51:53,900 --> 00:51:58,230 as it's doing less and less work on each iteration. 1115 00:51:58,230 --> 00:52:04,170 This is the faster one, merge sort, which is sorting clusters of numbers 1116 00:52:04,170 --> 00:52:05,971 together and then combining them. 1117 00:52:05,971 --> 00:52:07,720 So look-- the left half is already sorted. 1118 00:52:07,720 --> 00:52:14,165 >> Now it's sorting the right half, and now it's going to combine them into one. 1119 00:52:14,165 --> 00:52:19,160 This is something called "Gnome sort." 1120 00:52:19,160 --> 00:52:23,460 And you can kind of see that it's going back and forth, 1121 00:52:23,460 --> 00:52:27,950 fixing work a little bit here and there before it proceeds to new work. 1122 00:52:27,950 --> 00:52:32,900 1123 00:52:32,900 --> 00:52:33,692 And that's it. 1124 00:52:33,692 --> 00:52:36,400 There's another sort, which is really just for academic purposes, 1125 00:52:36,400 --> 00:52:40,980 called "stupid sort," which takes your data, sorts it randomly, 1126 00:52:40,980 --> 00:52:43,350 and then checks if it is sorted. 1127 00:52:43,350 --> 00:52:47,880 And if it is not, it re-sorts it randomly, checks if it's sorted, 1128 00:52:47,880 --> 00:52:49,440 and if not repeats. 1129 00:52:49,440 --> 00:52:52,660 And in theory, probabilistically this will complete, 1130 00:52:52,660 --> 00:52:54,140 but after quite a bit of time. 1131 00:52:54,140 --> 00:52:56,930 It's not the most efficient of algorithms. 1132 00:52:56,930 --> 00:53:02,550 So any questions on those particular algorithms or anything 1133 00:53:02,550 --> 00:53:04,720 related there, too? 1134 00:53:04,720 --> 00:53:09,430 >> Well, let's now tease apart what all these lines are that I've been drawing 1135 00:53:09,430 --> 00:53:15,090 and what I'm assuming the computer can do underneath the hood. 1136 00:53:15,090 --> 00:53:18,650 I would argue that all of these numbers I keep drawing-- they need to get 1137 00:53:18,650 --> 00:53:21,330 stored somewhere in memory. 1138 00:53:21,330 --> 00:53:24,130 We'll get rid of this guy now, too. 1139 00:53:24,130 --> 00:53:30,110 >> So a piece of memory in a computer-- so RAM DIMM is 1140 00:53:30,110 --> 00:53:35,480 what we searched for yesterday, dual inline memory module-- looks like this. 1141 00:53:35,480 --> 00:53:39,370 And each of these little black chips is some number of bytes, typically. 1142 00:53:39,370 --> 00:53:44,380 And then the gold pins are like the wires that connect it to the computer, 1143 00:53:44,380 --> 00:53:47,521 and the green silicon board is just what keeps everything all together. 1144 00:53:47,521 --> 00:53:48,770 So what does this really mean? 1145 00:53:48,770 --> 00:53:53,180 If I kind of draw this same picture, let's suppose for simplicity 1146 00:53:53,180 --> 00:53:55,280 that this DIMM, dual inline memory module, 1147 00:53:55,280 --> 00:54:00,530 is one gigabyte of RAM, one gigabyte of memory, which is how many bytes total? 1148 00:54:00,530 --> 00:54:02,100 One gigabyte is how many bytes? 1149 00:54:02,100 --> 00:54:04,860 1150 00:54:04,860 --> 00:54:06,030 More than that. 1151 00:54:06,030 --> 00:54:09,960 1,124 is kilo, 1,000. 1152 00:54:09,960 --> 00:54:11,730 Mega is million. 1153 00:54:11,730 --> 00:54:14,570 Giga is a billion. 1154 00:54:14,570 --> 00:54:15,070 >> Am I lying? 1155 00:54:15,070 --> 00:54:16,670 Can we even read the label? 1156 00:54:16,670 --> 00:54:19,920 This is actually 128 gigabytes, so it's more. 1157 00:54:19,920 --> 00:54:22,130 But we'll pretend this is just one gigabyte. 1158 00:54:22,130 --> 00:54:25,640 So that means there's a billion bytes of memory available to me 1159 00:54:25,640 --> 00:54:29,770 or 8 billion bits, but we're going to talk in terms of bytes now, 1160 00:54:29,770 --> 00:54:30,750 moving forward. 1161 00:54:30,750 --> 00:54:36,330 >> So what that means is this is one byte, this is another byte, 1162 00:54:36,330 --> 00:54:38,680 this is another byte, and if we really wanted 1163 00:54:38,680 --> 00:54:43,280 to be specific we would have to draw a billion little squares. 1164 00:54:43,280 --> 00:54:44,320 But what does that mean? 1165 00:54:44,320 --> 00:54:46,420 Well, let me just zoom in on this picture. 1166 00:54:46,420 --> 00:54:50,900 If I've got something that looks like this now, that's four bytes. 1167 00:54:50,900 --> 00:54:53,710 >> And so I could put four numbers here. 1168 00:54:53,710 --> 00:54:54,990 One, two, three, four. 1169 00:54:54,990 --> 00:55:00,170 Or I could put four letters or symbols. 1170 00:55:00,170 --> 00:55:02,620 "Hey!" could go right there, because each of the letters, 1171 00:55:02,620 --> 00:55:04,370 we discussed earlier, could be represented 1172 00:55:04,370 --> 00:55:06,650 with eight bits or ASCII or a byte. 1173 00:55:06,650 --> 00:55:09,370 So in other words, you can put 8 billion things inside 1174 00:55:09,370 --> 00:55:11,137 of this one stick of memory. 1175 00:55:11,137 --> 00:55:14,345 Now what does it mean to put things back to back to back in memory like this? 1176 00:55:14,345 --> 00:55:17,330 This is what a programmer would call an "array." 1177 00:55:17,330 --> 00:55:21,250 In a computer program, you don't think about the underlying hardware, per se. 1178 00:55:21,250 --> 00:55:24,427 You just think of yourself as having access to a billion bytes total, 1179 00:55:24,427 --> 00:55:26,010 and you can anything you want with it. 1180 00:55:26,010 --> 00:55:27,880 But for convenience it's generally useful 1181 00:55:27,880 --> 00:55:31,202 to keep your memory right next to each other like this. 1182 00:55:31,202 --> 00:55:33,660 So if I zoom in on this-- because we're certainly not going 1183 00:55:33,660 --> 00:55:39,310 to draw a billion little squares-- let's suppose that this board represents 1184 00:55:39,310 --> 00:55:40,610 that stick of memory now. 1185 00:55:40,610 --> 00:55:43,800 And I'll just draw as many as my marker ends up giving me here. 1186 00:55:43,800 --> 00:55:46,420 1187 00:55:46,420 --> 00:55:52,300 So now we have a stick of memory on the board 1188 00:55:52,300 --> 00:55:56,400 that's got one, two, three, four, five, six, one, two, three, four, five, six, 1189 00:55:56,400 --> 00:56:01,130 seven-- so 42 bytes of memory on the screen total. 1190 00:56:01,130 --> 00:56:01,630 Thank you. 1191 00:56:01,630 --> 00:56:02,838 Yes, did my arithmetic right. 1192 00:56:02,838 --> 00:56:05,120 So 42 bytes of memory here. 1193 00:56:05,120 --> 00:56:06,660 So what does this actually mean? 1194 00:56:06,660 --> 00:56:09,830 Well, a computer programmer would actually generally 1195 00:56:09,830 --> 00:56:12,450 think of this memory as addressable. 1196 00:56:12,450 --> 00:56:16,630 In other words, every one of these locations in memory, in hardware, 1197 00:56:16,630 --> 00:56:18,030 has a unique address. 1198 00:56:18,030 --> 00:56:22,020 >> It's not as complex as One Brattle Square, Cambridge, Mass., 02138. 1199 00:56:22,020 --> 00:56:23,830 Instead, it's just a number. 1200 00:56:23,830 --> 00:56:27,930 This is byte number zero, this is one, this is two, this is three, 1201 00:56:27,930 --> 00:56:30,327 and this is 41. 1202 00:56:30,327 --> 00:56:30,910 Wait a minute. 1203 00:56:30,910 --> 00:56:32,510 I thought I said 42 a moment ago. 1204 00:56:32,510 --> 00:56:35,050 1205 00:56:35,050 --> 00:56:37,772 I started counting at zero, so that's actually correct. 1206 00:56:37,772 --> 00:56:40,980 Now we don't have to actually draw it as a grid, and if you draw it as a grid 1207 00:56:40,980 --> 00:56:43,520 I think things actually get a bit misleading. 1208 00:56:43,520 --> 00:56:46,650 What a programmer would, in his or her own mind, 1209 00:56:46,650 --> 00:56:50,310 generally think of this memory as is just like a tape, 1210 00:56:50,310 --> 00:56:53,340 like a piece of masking tape that just goes on and on forever 1211 00:56:53,340 --> 00:56:54,980 or until you run out of memory. 1212 00:56:54,980 --> 00:56:59,200 So a more common way to draw and just think about memory 1213 00:56:59,200 --> 00:57:03,710 would be that this is byte zero, one, two, three, and then dot, dot, dot. 1214 00:57:03,710 --> 00:57:07,650 And you have 42 such bytes total, even though physically it might actually 1215 00:57:07,650 --> 00:57:09,480 be something more like this. 1216 00:57:09,480 --> 00:57:12,850 >> So if you now think of your memory as this, just like a tape, 1217 00:57:12,850 --> 00:57:17,640 this is what a programmer again would call an array of memory. 1218 00:57:17,640 --> 00:57:20,660 And when you want to actually store something in a computer's memory, 1219 00:57:20,660 --> 00:57:23,290 you generally do store things back-to-back to back-to-back. 1220 00:57:23,290 --> 00:57:25,010 So we've been talking about numbers. 1221 00:57:25,010 --> 00:57:30,880 And when I wanted to solve problems like four, one, three, two, 1222 00:57:30,880 --> 00:57:33,820 even though I was just drawing only the numbers four, one, three, 1223 00:57:33,820 --> 00:57:39,490 two on the board, the computer would really have this setup in memory. 1224 00:57:39,490 --> 00:57:43,347 >> And what would be next to the two in the computer's memory? 1225 00:57:43,347 --> 00:57:44,680 Well, there's no answer to that. 1226 00:57:44,680 --> 00:57:45,770 We don't really know. 1227 00:57:45,770 --> 00:57:48,200 And so long as the computer doesn't need it, 1228 00:57:48,200 --> 00:57:51,440 it doesn't have to care what is next to the numbers it does care about. 1229 00:57:51,440 --> 00:57:55,130 And when I said earlier that a computer can only look at one address at a time, 1230 00:57:55,130 --> 00:57:56,170 this is kind of why. 1231 00:57:56,170 --> 00:57:59,490 >> Not unlike a record player and a reading head 1232 00:57:59,490 --> 00:58:03,030 only being able to look at a certain groove in a physical old-school record 1233 00:58:03,030 --> 00:58:06,500 at a time, similarly can a computer thanks 1234 00:58:06,500 --> 00:58:09,810 to its CPU and its Intel instruction set, 1235 00:58:09,810 --> 00:58:12,480 among whose instruction is read from memory 1236 00:58:12,480 --> 00:58:15,590 or save to memory-- a computer can only look 1237 00:58:15,590 --> 00:58:19,210 at one location at a time-- sometimes a combination of them, 1238 00:58:19,210 --> 00:58:21,770 but really just one location at a time. 1239 00:58:21,770 --> 00:58:24,770 So when we were doing these various algorithms, 1240 00:58:24,770 --> 00:58:28,110 I'm not just writing in a vacuum-- four, one, three, two. 1241 00:58:28,110 --> 00:58:30,849 Those numbers actually belong somewhere physical in memory. 1242 00:58:30,849 --> 00:58:32,890 So there are tiny little transistors or some kind 1243 00:58:32,890 --> 00:58:35,840 of electronics underneath the hood storing these values. 1244 00:58:35,840 --> 00:58:40,460 >> And in total, how many bits are involved right now, just to be clear? 1245 00:58:40,460 --> 00:58:45,580 So this is four bytes, or now it's 32 bits total. 1246 00:58:45,580 --> 00:58:49,280 So there are actually 32 zeros and ones composing these four things. 1247 00:58:49,280 --> 00:58:52,070 There's even more over here, but again we don't care about that. 1248 00:58:52,070 --> 00:58:55,120 >> So now let's ask another question using memory, 1249 00:58:55,120 --> 00:58:57,519 because that at the end of the day is in variance. 1250 00:58:57,519 --> 00:59:00,310 No matter what we might do with the computer, at the end of the day 1251 00:59:00,310 --> 00:59:02,560 the hardware is still the same underneath the hood. 1252 00:59:02,560 --> 00:59:04,670 How would I store a word in here? 1253 00:59:04,670 --> 00:59:09,710 Well, a word in a computer like "hey!" would be stored just like this. 1254 00:59:09,710 --> 00:59:12,300 And if you wanted a longer word, you can simply 1255 00:59:12,300 --> 00:59:19,120 overwrite that and say something like "hello" and store that here. 1256 00:59:19,120 --> 00:59:23,930 >> And so here, too, this contiguousness is actually an advantage, 1257 00:59:23,930 --> 00:59:26,530 because a computer can just read from right to left. 1258 00:59:26,530 --> 00:59:28,680 But here's a question. 1259 00:59:28,680 --> 00:59:33,480 In the context of this word, h-e-l-l-o, exclamation point, 1260 00:59:33,480 --> 00:59:38,740 how might the computer know where the word begins and where the word ends? 1261 00:59:38,740 --> 00:59:41,690 1262 00:59:41,690 --> 00:59:43,800 In the context of numbers, how does the computer 1263 00:59:43,800 --> 00:59:48,396 know how long the sequence of numbers is or where it starts? 1264 00:59:48,396 --> 00:59:50,270 Well, it turns out-- and we won't go too much 1265 00:59:50,270 --> 00:59:54,970 into this level of detail-- computers move stuff around in memory 1266 00:59:54,970 --> 00:59:57,800 literally by way of these addresses. 1267 00:59:57,800 --> 01:00:02,080 So in a computer, if you're writing code to store things 1268 01:00:02,080 --> 01:00:05,800 like words, what you're really doing is typing 1269 01:00:05,800 --> 01:00:11,320 expressions that remember where in the computer's memory these words are. 1270 01:00:11,320 --> 01:00:14,370 So let me do a very, very simple example. 1271 01:00:14,370 --> 01:00:18,260 >> I'm going to go ahead and open up a simple text program, 1272 01:00:18,260 --> 01:00:20,330 and I'm going to create a file called hello.c. 1273 01:00:20,330 --> 01:00:22,849 Most of this information we won't go into in great detail, 1274 01:00:22,849 --> 01:00:25,140 but I'm going to write a program in that same language, 1275 01:00:25,140 --> 01:00:31,140 C. This is far more intimidating, I would argue, than Scratch, 1276 01:00:31,140 --> 01:00:32,490 but it's very similar in spirit. 1277 01:00:32,490 --> 01:00:34,364 In fact, these curly braces-- you can kind of 1278 01:00:34,364 --> 01:00:37,820 think of what I just did as this. 1279 01:00:37,820 --> 01:00:39,240 >> Let's do this, actually. 1280 01:00:39,240 --> 01:00:45,100 When green flag clicked, do the following. 1281 01:00:45,100 --> 01:00:50,210 I want to print out "hello." 1282 01:00:50,210 --> 01:00:51,500 So this is now pseudocode. 1283 01:00:51,500 --> 01:00:53,000 I'm kind of blurring the lines. 1284 01:00:53,000 --> 01:00:56,750 In C, this language I'm talking about, this line print hello 1285 01:00:56,750 --> 01:01:01,940 actually becomes "printf" with some parentheses and a semi-colon. 1286 01:01:01,940 --> 01:01:03,480 >> But it's the exact same idea. 1287 01:01:03,480 --> 01:01:06,730 And this very user-friendly "when green flag clicked" becomes 1288 01:01:06,730 --> 01:01:10,182 the much more arcane "int main void." 1289 01:01:10,182 --> 01:01:12,890 And this really has no mapping, so I'm just going to ignore that. 1290 01:01:12,890 --> 01:01:17,210 But the curly braces are like the curved puzzle pieces like this. 1291 01:01:17,210 --> 01:01:18,700 >> So you can kind of guess. 1292 01:01:18,700 --> 01:01:22,357 Even if you've never programmed before, what does this program probably do? 1293 01:01:22,357 --> 01:01:25,560 1294 01:01:25,560 --> 01:01:28,000 Probably prints hello with an exclamation point. 1295 01:01:28,000 --> 01:01:29,150 >> So let's try that. 1296 01:01:29,150 --> 01:01:30,800 I'm going to save it. 1297 01:01:30,800 --> 01:01:34,000 And this is, again, a very old school environment. 1298 01:01:34,000 --> 01:01:35,420 I can't click, I can't drag. 1299 01:01:35,420 --> 01:01:36,910 I have to type commands. 1300 01:01:36,910 --> 01:01:41,320 So I want to run my program, so I might do this, like hello.c. 1301 01:01:41,320 --> 01:01:42,292 That's the file I ran. 1302 01:01:42,292 --> 01:01:43,500 But wait, I'm missing a step. 1303 01:01:43,500 --> 01:01:46,470 What did we say is a necessary step for a language like C? 1304 01:01:46,470 --> 01:01:49,470 I've just written source code, but what do I need? 1305 01:01:49,470 --> 01:01:50,670 Yeah, I need a compiler. 1306 01:01:50,670 --> 01:01:57,670 So on my Mac here, I have a program called GCC, GNU C compiler, 1307 01:01:57,670 --> 01:02:03,990 which allows me to do this-- turn my source code into, we'll call it, 1308 01:02:03,990 --> 01:02:04,930 machine code. 1309 01:02:04,930 --> 01:02:10,180 >> And I can see that, again, as follows, these 1310 01:02:10,180 --> 01:02:14,090 are the zeros and ones I just created from my source code, 1311 01:02:14,090 --> 01:02:15,730 all of the zeros and ones. 1312 01:02:15,730 --> 01:02:17,770 And if I want to run my program-- it happens 1313 01:02:17,770 --> 01:02:23,010 to be called a.out for historical reasons-- "hello." 1314 01:02:23,010 --> 01:02:24,070 I can run it again. 1315 01:02:24,070 --> 01:02:25,690 Hello, hello, hello. 1316 01:02:25,690 --> 01:02:27,430 And it seems to be working. 1317 01:02:27,430 --> 01:02:31,000 >> But that means somewhere in my computer's memory are the words 1318 01:02:31,000 --> 01:02:35,279 h-e-l-l-o, exclamation point. 1319 01:02:35,279 --> 01:02:38,070 And it turns out, just as an aside, what a computer would typically 1320 01:02:38,070 --> 01:02:40,550 do so that it knows where things start and end-- it's 1321 01:02:40,550 --> 01:02:42,460 going to put a special symbol here. 1322 01:02:42,460 --> 01:02:46,064 And the convention is to put the number zero at the end of a word 1323 01:02:46,064 --> 01:02:48,230 so that you know where it actually ends, so that you 1324 01:02:48,230 --> 01:02:52,750 don't keep printing out more and more characters than you actually intend. 1325 01:02:52,750 --> 01:02:55,400 >> But the takeaway here, even though this is fairly arcane, 1326 01:02:55,400 --> 01:02:58,140 is that it's ultimately relatively simple. 1327 01:02:58,140 --> 01:03:04,550 You were given sort of a tape, a blank space on which you can write letters. 1328 01:03:04,550 --> 01:03:07,150 You simply have to have a special symbol, like arbitrarily 1329 01:03:07,150 --> 01:03:10,316 the number zero, to put at the end of your words so that the computer knows, 1330 01:03:10,316 --> 01:03:13,410 oh, I should stop printing after I see the exclamation point. 1331 01:03:13,410 --> 01:03:16,090 Because the next thing there is an ASCII value of zero, 1332 01:03:16,090 --> 01:03:19,125 or the null character as someone would call it. 1333 01:03:19,125 --> 01:03:21,500 But there's kind of a problem here, and let's revert back 1334 01:03:21,500 --> 01:03:23,320 to numbers for a moment. 1335 01:03:23,320 --> 01:03:28,720 Suppose that I do, in fact, have an array of numbers, 1336 01:03:28,720 --> 01:03:30,730 and suppose that the program I'm writing is 1337 01:03:30,730 --> 01:03:34,680 like a grade book for a teacher and a teachers classroom. 1338 01:03:34,680 --> 01:03:38,720 And this program allows him or her to type in their students' scores 1339 01:03:38,720 --> 01:03:39,960 on quizzes. 1340 01:03:39,960 --> 01:03:43,750 And suppose that the student gets 100 on their first quiz, maybe 1341 01:03:43,750 --> 01:03:49,920 like an 80 on the next one, then a 75, then a 90 on the fourth quiz. 1342 01:03:49,920 --> 01:03:54,150 >> So at this point in the story, the array is of size four. 1343 01:03:54,150 --> 01:03:58,470 There's absolutely more memory in the computer, but the array, so to speak, 1344 01:03:58,470 --> 01:04:00,350 is of size four. 1345 01:04:00,350 --> 01:04:06,060 Suppose now that the teacher wants to assign a fifth quiz to the class. 1346 01:04:06,060 --> 01:04:08,510 Well, one of the things he or she is going to have to do 1347 01:04:08,510 --> 01:04:10,650 is now store an additional value here. 1348 01:04:10,650 --> 01:04:15,490 But if the array the teacher has created in this program is of size for, 1349 01:04:15,490 --> 01:04:22,440 one of the problem with an array is that you can't just keep adding to memory. 1350 01:04:22,440 --> 01:04:26,470 Because what if another part of the program has the word "hey" right there? 1351 01:04:26,470 --> 01:04:29,650 >> In other words, my memory can be used for anything in a program. 1352 01:04:29,650 --> 01:04:33,250 And if in advance I typed in, hey, I want to input four quiz scores, 1353 01:04:33,250 --> 01:04:34,784 they might go here and here. 1354 01:04:34,784 --> 01:04:37,700 And if you suddenly change your mind later and say I want a fifth quiz 1355 01:04:37,700 --> 01:04:40,872 score, you can't just put it wherever you want, 1356 01:04:40,872 --> 01:04:42,580 because what if this memory is being used 1357 01:04:42,580 --> 01:04:45,990 for something else-- some other program or some other feature of the program 1358 01:04:45,990 --> 01:04:46,910 that you're running? 1359 01:04:46,910 --> 01:04:50,650 So you have to think in advance how you want to store your data, 1360 01:04:50,650 --> 01:04:54,480 because now you've painted yourself into a digital corner. 1361 01:04:54,480 --> 01:04:57,280 >> So a teacher might instead say when writing a program 1362 01:04:57,280 --> 01:04:59,360 to store his or her grades, you know what? 1363 01:04:59,360 --> 01:05:04,180 I am going to request, when writing my program, 1364 01:05:04,180 --> 01:05:12,070 that I want zero, one, two, three, four, five, six, eight grades total. 1365 01:05:12,070 --> 01:05:15,320 So one, two, three, four, five, six, seven, eight. 1366 01:05:15,320 --> 01:05:18,612 The teacher can just over-allocate memory when writing his or her program 1367 01:05:18,612 --> 01:05:19,570 and say, you know what? 1368 01:05:19,570 --> 01:05:22,236 I'm never going to assign more than eight quizzes in a semester. 1369 01:05:22,236 --> 01:05:23,130 That's just crazy. 1370 01:05:23,130 --> 01:05:24,470 I'll never allocate that. 1371 01:05:24,470 --> 01:05:28,270 So that this way he or she has the flexibility to store student scores, 1372 01:05:28,270 --> 01:05:33,010 like 75, 90, and maybe one extra where the student got extra credit, 105. 1373 01:05:33,010 --> 01:05:36,130 >> But if the teacher never uses these three spaces, 1374 01:05:36,130 --> 01:05:38,860 there's an intuitive takeaway here. 1375 01:05:38,860 --> 01:05:41,410 He or she is just wasting space. 1376 01:05:41,410 --> 01:05:44,790 So in other words, there's this common tradeoff in programming 1377 01:05:44,790 --> 01:05:48,241 where you can either allocate exactly as much memory as you want, 1378 01:05:48,241 --> 01:05:51,490 the upside of which is that you're super efficient-- you're not being wasteful 1379 01:05:51,490 --> 01:05:54,640 at all-- but the downside of which is what if you change your mind when 1380 01:05:54,640 --> 01:05:58,780 using the program that you want to store more data than you originally intended. 1381 01:05:58,780 --> 01:06:03,030 >> So maybe the solution is, then, write your programs in such a way 1382 01:06:03,030 --> 01:06:05,605 that they use more memory than they actually need. 1383 01:06:05,605 --> 01:06:07,730 This way you're not going to run into that problem, 1384 01:06:07,730 --> 01:06:09,730 but you're being wasteful. 1385 01:06:09,730 --> 01:06:12,960 And the more memory your program uses, as we discussed yesterday, the less 1386 01:06:12,960 --> 01:06:15,410 memory that's available for other programs, 1387 01:06:15,410 --> 01:06:18,790 the sooner your computer might slow down because of virtual memory. 1388 01:06:18,790 --> 01:06:22,670 And so the ideal solution might be what? 1389 01:06:22,670 --> 01:06:24,610 >> Under-allocating seems bad. 1390 01:06:24,610 --> 01:06:27,030 Over-allocating seems bad. 1391 01:06:27,030 --> 01:06:31,120 So what might be a better solution? 1392 01:06:31,120 --> 01:06:32,390 Reallocating. 1393 01:06:32,390 --> 01:06:33,590 Be more dynamic. 1394 01:06:33,590 --> 01:06:37,520 Don't force yourself to choose a priori, at the beginning, what you want. 1395 01:06:37,520 --> 01:06:41,370 And certainly don't over-allocate, lest you be wasteful. 1396 01:06:41,370 --> 01:06:45,770 >> And so to achieve that goal, we need to throw this data structure, 1397 01:06:45,770 --> 01:06:48,100 so to speak, away. 1398 01:06:48,100 --> 01:06:51,080 And so what a programmer will typically use 1399 01:06:51,080 --> 01:06:55,940 is something called not an array but a linked list. 1400 01:06:55,940 --> 01:07:00,860 In other words, he or she will start to think of their memory 1401 01:07:00,860 --> 01:07:05,280 as being kind of a shape that they can draw in the following way. 1402 01:07:05,280 --> 01:07:08,520 If I want to store one number in a program-- so it's September, 1403 01:07:08,520 --> 01:07:12,600 I've given my students a quiz; I want to store the students' first quiz, 1404 01:07:12,600 --> 01:07:16,220 and they got a 100 on it-- I am going to ask my computer, 1405 01:07:16,220 --> 01:07:19,540 by way of the program I've written, for one chunk of memory. 1406 01:07:19,540 --> 01:07:22,570 And I'm going to store the number 100 in it, and that's it. 1407 01:07:22,570 --> 01:07:24,820 >> Then a few weeks later when I get my second quiz, 1408 01:07:24,820 --> 01:07:27,890 and it's time to type in that 90%, I am going 1409 01:07:27,890 --> 01:07:32,129 to ask the computer, hey, computer, can I have another chunk of memory? 1410 01:07:32,129 --> 01:07:34,170 It's going to give me this empty chunk of memory. 1411 01:07:34,170 --> 01:07:39,370 I'm going to put in the number 90, but in my program somehow or other-- 1412 01:07:39,370 --> 01:07:42,100 and we won't worry about the syntax for this-- I need 1413 01:07:42,100 --> 01:07:44,430 to somehow chain these things together. 1414 01:07:44,430 --> 01:07:47,430 And I'll chain them together with what looks like an arrow here. 1415 01:07:47,430 --> 01:07:50,050 >> The third quiz that comes up, I'm going to say, hey, computer, 1416 01:07:50,050 --> 01:07:51,680 give me another chunk of memory. 1417 01:07:51,680 --> 01:07:54,660 And I'm going to put down whatever it was, like 75, 1418 01:07:54,660 --> 01:07:56,920 and I have to chain this together now somehow. 1419 01:07:56,920 --> 01:08:00,290 Fourth quiz comes along, and maybe that's toward the end of the semester. 1420 01:08:00,290 --> 01:08:03,140 And by that point my program might be using memory 1421 01:08:03,140 --> 01:08:05,540 all over the place, all over physically. 1422 01:08:05,540 --> 01:08:08,170 And so just for kicks, I'm going to draw this forth 1423 01:08:08,170 --> 01:08:11,260 quiz-- I forget what it was; I think maybe an 80 or something-- 1424 01:08:11,260 --> 01:08:12,500 way over here. 1425 01:08:12,500 --> 01:08:15,920 >> But that's fine, because pictorially I'm going to draw this line. 1426 01:08:15,920 --> 01:08:19,063 In other words, in reality, in your computer's hardware, 1427 01:08:19,063 --> 01:08:20,979 the first score might end up here because it's 1428 01:08:20,979 --> 01:08:22,529 right at the start of the semester. 1429 01:08:22,529 --> 01:08:25,810 The next one might end up here because a bit of time has passed 1430 01:08:25,810 --> 01:08:27,210 and the program keeps running. 1431 01:08:27,210 --> 01:08:30,060 The next score, which was a 75, might be over here. 1432 01:08:30,060 --> 01:08:33,420 And the last score might be an 80, which is over here. 1433 01:08:33,420 --> 01:08:38,729 >> So in reality, physically, this might be what your computer's memory looks like. 1434 01:08:38,729 --> 01:08:41,569 But this is not a useful mental paradigm for a computer programmer. 1435 01:08:41,569 --> 01:08:44,649 Why should you care where the heck your data is ending up? 1436 01:08:44,649 --> 01:08:46,200 You just want to store data. 1437 01:08:46,200 --> 01:08:49,390 >> This is kind of like our discussion earlier of drawing the cube. 1438 01:08:49,390 --> 01:08:52,200 Why do you care what the angle is of the cube 1439 01:08:52,200 --> 01:08:53,740 and how you have to turn to draw it? 1440 01:08:53,740 --> 01:08:54,950 You just want a cube. 1441 01:08:54,950 --> 01:08:57,359 Similarly here, you just want to grade book. 1442 01:08:57,359 --> 01:08:59,559 You just want to think of this as a list of numbers. 1443 01:08:59,559 --> 01:09:01,350 Who cares how it's implemented in hardware? 1444 01:09:01,350 --> 01:09:05,180 >> So the abstraction now is this picture here. 1445 01:09:05,180 --> 01:09:07,580 This is a linked list, as a programmer would call it, 1446 01:09:07,580 --> 01:09:10,640 insofar as you have a list, obviously of numbers. 1447 01:09:10,640 --> 01:09:14,990 But it's linked pictorially by way of these arrows, 1448 01:09:14,990 --> 01:09:18,510 and all these arrows are-- underneath the hood, if you're curious, 1449 01:09:18,510 --> 01:09:23,210 recall that our physical hardware has addresses zero, one, two, three, four. 1450 01:09:23,210 --> 01:09:28,465 All these arrows are is like a map or directions, where if 90 is-- now 1451 01:09:28,465 --> 01:09:29,090 I got to count. 1452 01:09:29,090 --> 01:09:31,750 >> Zero, one, two, three, four, five, six, seven. 1453 01:09:31,750 --> 01:09:35,640 It looks like the 90 is at memory address number seven. 1454 01:09:35,640 --> 01:09:38,460 All these arrows are is like a little scrap of paper 1455 01:09:38,460 --> 01:09:42,439 that's giving directions to the program that says follow this map 1456 01:09:42,439 --> 01:09:43,880 to get to location seven. 1457 01:09:43,880 --> 01:09:46,680 And there you will find the student's second quiz score. 1458 01:09:46,680 --> 01:09:52,100 Meanwhile, the 75-- if I continue this, this is seven, eight, nine, 10, 11, 12, 1459 01:09:52,100 --> 01:09:54,240 13, 14, 15. 1460 01:09:54,240 --> 01:09:59,080 >> This other arrow just represents a map to memory location 15. 1461 01:09:59,080 --> 01:10:02,550 But again, the programmer generally does not care about this level of detail. 1462 01:10:02,550 --> 01:10:05,530 And in most every programming language today, the programmer 1463 01:10:05,530 --> 01:10:10,490 won't even know where in memory these numbers actually are. 1464 01:10:10,490 --> 01:10:14,830 All he or she has to care about is that they are somehow linked together 1465 01:10:14,830 --> 01:10:18,390 in a data structure like this. 1466 01:10:18,390 --> 01:10:21,580 >> But it turns out not to get too technical. 1467 01:10:21,580 --> 01:10:27,430 But just because we can perhaps afford to have this discussion here, 1468 01:10:27,430 --> 01:10:33,630 suppose that we revisit this issue here of an array. 1469 01:10:33,630 --> 01:10:35,780 Let's see if we regret going here. 1470 01:10:35,780 --> 01:10:42,950 This is 100, 90, 75, and 80. 1471 01:10:42,950 --> 01:10:44,980 >> Let me briefly make this claim. 1472 01:10:44,980 --> 01:10:48,980 This is an array, and again, the salient characteristic of an array 1473 01:10:48,980 --> 01:10:52,400 is that all of your data is back to back to back in memory-- literally 1474 01:10:52,400 --> 01:10:56,830 one byte or maybe four bytes, some fixed number of bytes away. 1475 01:10:56,830 --> 01:11:00,710 In a linked list, which we might draw like this, underneath the hood who 1476 01:11:00,710 --> 01:11:02,000 knows where that stuff is? 1477 01:11:02,000 --> 01:11:03,630 It doesn't even need to flow like this. 1478 01:11:03,630 --> 01:11:06,050 Some of the data could be back to the left up there. 1479 01:11:06,050 --> 01:11:07,530 You don't even know. 1480 01:11:07,530 --> 01:11:15,430 >> And so with an array, you have a feature known as random access. 1481 01:11:15,430 --> 01:11:20,570 And what random access means is that the computer can jump instantly 1482 01:11:20,570 --> 01:11:22,730 to any location in an array. 1483 01:11:22,730 --> 01:11:23,580 Why? 1484 01:11:23,580 --> 01:11:26,000 Because the computer knows that the first location is 1485 01:11:26,000 --> 01:11:29,540 zero, one, two, and three. 1486 01:11:29,540 --> 01:11:33,890 >> And so if you want to go from this element to the next element, 1487 01:11:33,890 --> 01:11:36,099 you literally, in the computer's mind, just add one. 1488 01:11:36,099 --> 01:11:39,140 If you want to go to the third element, just add one-- next element, just 1489 01:11:39,140 --> 01:11:40,290 add one. 1490 01:11:40,290 --> 01:11:42,980 However, in this version of the story, suppose 1491 01:11:42,980 --> 01:11:46,080 the computer is currently looking at or dealing with the number 100. 1492 01:11:46,080 --> 01:11:49,770 How do you get to the next grade in the grade book? 1493 01:11:49,770 --> 01:11:52,560 >> You have to take seven steps, which is arbitrary. 1494 01:11:52,560 --> 01:11:58,120 To get to the next one, you have to take another eight steps to get to 15. 1495 01:11:58,120 --> 01:12:02,250 In other words, it's not a constant gap between the numbers, 1496 01:12:02,250 --> 01:12:04,857 and so it just takes the computer more time is the point. 1497 01:12:04,857 --> 01:12:06,940 The computer has to search through memory in order 1498 01:12:06,940 --> 01:12:08,990 to find what you're looking for. 1499 01:12:08,990 --> 01:12:14,260 >> So whereas an array tends to be a fast data structure-- because you 1500 01:12:14,260 --> 01:12:17,610 can literally just do simple arithmetic and get where you want by adding one, 1501 01:12:17,610 --> 01:12:21,300 for instance-- a linked list, you sacrifice that feature. 1502 01:12:21,300 --> 01:12:24,020 You can't just go from first to second to third to fourth. 1503 01:12:24,020 --> 01:12:25,240 You have to follow the map. 1504 01:12:25,240 --> 01:12:28,160 You have to take more steps to get to those values, which 1505 01:12:28,160 --> 01:12:30,230 would seem to be adding a cost. 1506 01:12:30,230 --> 01:12:35,910 So we're paying a price, but what was the feature that Dan was seeking here? 1507 01:12:35,910 --> 01:12:38,110 What does a linked list apparently allow us to do, 1508 01:12:38,110 --> 01:12:40,240 which was the origin of this particular story? 1509 01:12:40,240 --> 01:12:43,250 1510 01:12:43,250 --> 01:12:43,830 >> Exactly. 1511 01:12:43,830 --> 01:12:46,220 A dynamic size to it. 1512 01:12:46,220 --> 01:12:48,040 We can add to this list. 1513 01:12:48,040 --> 01:12:51,430 We can even shrink the list, so that we're only using as much memory 1514 01:12:51,430 --> 01:12:55,560 as we actually want and so we're never over-allocating. 1515 01:12:55,560 --> 01:12:58,470 >> Now just to be really nit-picky, there's a hidden cost. 1516 01:12:58,470 --> 01:13:01,980 So you shouldn't just let me convince you that this is a compelling tradeoff. 1517 01:13:01,980 --> 01:13:04,190 There's another hidden cost here. 1518 01:13:04,190 --> 01:13:06,550 The benefit, to be clear, is that we get dynamism. 1519 01:13:06,550 --> 01:13:10,359 If I want another element, I can just draw it and put a number in there. 1520 01:13:10,359 --> 01:13:12,150 And then I can link it with a picture here, 1521 01:13:12,150 --> 01:13:14,970 whereas over here, again, if I've painted myself into a corner, 1522 01:13:14,970 --> 01:13:19,410 if something else is already using the memory here, I'm out of luck. 1523 01:13:19,410 --> 01:13:21,700 I've painted myself into the corner. 1524 01:13:21,700 --> 01:13:24,390 >> But what's the hidden cost in this picture? 1525 01:13:24,390 --> 01:13:27,690 It's not just the amount of time that it takes 1526 01:13:27,690 --> 01:13:29,870 to go from here to here, which is seven steps, then 1527 01:13:29,870 --> 01:13:32,820 eight steps, which is more than one. 1528 01:13:32,820 --> 01:13:34,830 What's another hidden cost? 1529 01:13:34,830 --> 01:13:35,440 Not just time. 1530 01:13:35,440 --> 01:13:44,790 1531 01:13:44,790 --> 01:13:49,940 Additional information is necessary to achieve this picture. 1532 01:13:49,940 --> 01:13:53,210 >> Yeah, that map, those little scraps of paper, as I keep describing them as. 1533 01:13:53,210 --> 01:13:55,650 These arrows-- those aren't free. 1534 01:13:55,650 --> 01:13:57,660 A computer-- you know what a computer has. 1535 01:13:57,660 --> 01:13:58,790 It has zeros and ones. 1536 01:13:58,790 --> 01:14:03,170 If you want to represent an arrow or a map or a number, you need some memory. 1537 01:14:03,170 --> 01:14:05,950 So the other price you pay for a linked list, 1538 01:14:05,950 --> 01:14:09,070 a common computer science resource, is also space. 1539 01:14:09,070 --> 01:14:11,710 >> And indeed so, so commonly, among the tradeoffs 1540 01:14:11,710 --> 01:14:15,580 in designing software engineering systems is time and space-- 1541 01:14:15,580 --> 01:14:18,596 are two of your ingredients, two of your most costly ingredients. 1542 01:14:18,596 --> 01:14:21,220 This is costing me more time because I have to follow this map, 1543 01:14:21,220 --> 01:14:25,730 but it's also costing me more space because I have to keep this map around. 1544 01:14:25,730 --> 01:14:28,730 So the hope, as we've kind of discussed over yesterday and today, 1545 01:14:28,730 --> 01:14:31,720 is that the benefits will outweigh the costs. 1546 01:14:31,720 --> 01:14:33,870 >> But there's no obvious solution here. 1547 01:14:33,870 --> 01:14:35,870 Maybe it is better-- a la quick and dirty, 1548 01:14:35,870 --> 01:14:38,660 as Kareem proposed earlier-- to throw memory at the problem. 1549 01:14:38,660 --> 01:14:42,520 Just buy more memory, think less hard about solving the problem, 1550 01:14:42,520 --> 01:14:44,595 and solve it in an easier way. 1551 01:14:44,595 --> 01:14:46,720 And indeed earlier, when we talked about tradeoffs, 1552 01:14:46,720 --> 01:14:49,190 it wasn't space in the computer and time. 1553 01:14:49,190 --> 01:14:51,810 It was developer time, which is yet another resource. 1554 01:14:51,810 --> 01:14:54,829 >> So again, it's this balancing act trying to decide which of those things 1555 01:14:54,829 --> 01:14:55,870 are you willing to spend? 1556 01:14:55,870 --> 01:14:57,380 Which is the least expensive? 1557 01:14:57,380 --> 01:15:01,040 Which yields the better results? 1558 01:15:01,040 --> 01:15:01,540 Yeah? 1559 01:15:01,540 --> 01:15:11,310 1560 01:15:11,310 --> 01:15:12,580 >> Indeed. 1561 01:15:12,580 --> 01:15:15,970 In this case, if you're representing numbers in the maps-- 1562 01:15:15,970 --> 01:15:18,820 these are called in many languages "pointers" or "addresses"-- 1563 01:15:18,820 --> 01:15:20,390 it's double the space. 1564 01:15:20,390 --> 01:15:24,390 That needn't be as bad as double if right now we're just storing numbers. 1565 01:15:24,390 --> 01:15:27,410 Suppose that we were storing patient records in a hospital-- 1566 01:15:27,410 --> 01:15:30,870 so Pierson's names, phone numbers, social security numbers, doctor 1567 01:15:30,870 --> 01:15:31,540 history. 1568 01:15:31,540 --> 01:15:34,160 This box might be much, much bigger, in which case 1569 01:15:34,160 --> 01:15:38,000 a tiny little pointer, the address of the next element-- it's not a big deal. 1570 01:15:38,000 --> 01:15:40,620 It's such a fringe cost it doesn't matter. 1571 01:15:40,620 --> 01:15:43,210 But in this case, yeah, it's a doubling. 1572 01:15:43,210 --> 01:15:45,290 Good question. 1573 01:15:45,290 --> 01:15:47,900 >> Let's talk about time a little more concretely. 1574 01:15:47,900 --> 01:15:50,380 What's the running time of searching this list? 1575 01:15:50,380 --> 01:15:53,640 Suppose I wanted to search through all the students' grades, 1576 01:15:53,640 --> 01:15:55,980 and there's n grades in this data structure. 1577 01:15:55,980 --> 01:15:58,830 Here, too, we can borrow the vocabulary of earlier. 1578 01:15:58,830 --> 01:16:00,890 This is a linear data structure. 1579 01:16:00,890 --> 01:16:04,570 >> Big O of n is what's required to get to the end of this data structure, 1580 01:16:04,570 --> 01:16:08,410 whereas-- and we haven't seen this before-- an array gives you 1581 01:16:08,410 --> 01:16:13,555 what's called constant time, which means one step or two steps or 10 steps-- 1582 01:16:13,555 --> 01:16:14,180 doesn't matter. 1583 01:16:14,180 --> 01:16:15,440 It's a fixed number. 1584 01:16:15,440 --> 01:16:17,440 It has nothing to do with the size of the array. 1585 01:16:17,440 --> 01:16:20,130 And the reason for that, again, is random access. 1586 01:16:20,130 --> 01:16:23,180 The computer can just immediately jump to another location, 1587 01:16:23,180 --> 01:16:27,770 because they're all the same distance from everything else. 1588 01:16:27,770 --> 01:16:29,112 There is no thinking involved. 1589 01:16:29,112 --> 01:16:31,900 1590 01:16:31,900 --> 01:16:32,400 All right. 1591 01:16:32,400 --> 01:16:39,230 So if I can, let me try to paint two final pictures. 1592 01:16:39,230 --> 01:16:42,830 A very common one known as a hash table. 1593 01:16:42,830 --> 01:16:51,120 So to motivate this discussion, let me think about how to do this. 1594 01:16:51,120 --> 01:16:52,610 >> So how about this? 1595 01:16:52,610 --> 01:16:55,160 Suppose that the problem we want to solve now 1596 01:16:55,160 --> 01:16:58,360 is implementing in a dictionary-- so a whole bunch of English words 1597 01:16:58,360 --> 01:16:59,330 or whatever. 1598 01:16:59,330 --> 01:17:02,724 And the goal is to be able to answer questions of the form is this a word? 1599 01:17:02,724 --> 01:17:04,640 So you want to implement a spell checker, just 1600 01:17:04,640 --> 01:17:07,220 like a physical dictionary that you can look things up in. 1601 01:17:07,220 --> 01:17:10,490 Suppose I were to do this with an array. 1602 01:17:10,490 --> 01:17:12,590 I could do this. 1603 01:17:12,590 --> 01:17:20,756 >> And suppose the words are apple and banana and cantaloupe. 1604 01:17:20,756 --> 01:17:23,330 1605 01:17:23,330 --> 01:17:26,465 And I can't think of fruits that start with d, so we're just 1606 01:17:26,465 --> 01:17:27,590 going to have three fruits. 1607 01:17:27,590 --> 01:17:31,510 So this is an array, and we're storing all of these words 1608 01:17:31,510 --> 01:17:34,200 in this dictionary as an array. 1609 01:17:34,200 --> 01:17:39,350 The question, then, is how else could you store this information? 1610 01:17:39,350 --> 01:17:43,160 >> Well, I'm kind of cheating here, because each of these letters in the word 1611 01:17:43,160 --> 01:17:44,490 is really an individual byte. 1612 01:17:44,490 --> 01:17:46,740 So if I really wanted to be nit-picky, I should really 1613 01:17:46,740 --> 01:17:49,600 be dividing this up into much smaller chunks of memory, 1614 01:17:49,600 --> 01:17:51,289 and we could do exactly that. 1615 01:17:51,289 --> 01:17:53,580 But we're going to run into the same problem as before. 1616 01:17:53,580 --> 01:17:56,674 What if, as Merriam Webster or Oxford does every year-- they add words 1617 01:17:56,674 --> 01:17:59,340 to the dictionary-- we don't necessarily want to paint ourselves 1618 01:17:59,340 --> 01:18:00,780 into a corner with an array? 1619 01:18:00,780 --> 01:18:05,710 >> So instead, maybe a smarter approach is to put apple in its own node or box, 1620 01:18:05,710 --> 01:18:11,190 as we would say, banana, and then here we have cantaloupe. 1621 01:18:11,190 --> 01:18:14,990 1622 01:18:14,990 --> 01:18:16,790 And we string these things together. 1623 01:18:16,790 --> 01:18:19,980 So this is the array, and this is the linked list. 1624 01:18:19,980 --> 01:18:23,300 If you can't quite see, it just says "array," and this says "list." 1625 01:18:23,300 --> 01:18:25,780 >> So we have the same exact issues as before, 1626 01:18:25,780 --> 01:18:28,600 whereby we now have dynamism in our linked list. 1627 01:18:28,600 --> 01:18:31,090 But we have a fairly slow dictionary. 1628 01:18:31,090 --> 01:18:32,870 Suppose I want to look up a word. 1629 01:18:32,870 --> 01:18:35,430 It might take me big O of n steps, because the word might 1630 01:18:35,430 --> 01:18:37,840 be all the way at the end of the list, like cantaloupe. 1631 01:18:37,840 --> 01:18:40,600 And it turns out that in programming, sort 1632 01:18:40,600 --> 01:18:42,700 of the holy grail of data structures, is something 1633 01:18:42,700 --> 01:18:46,620 that gives you constant time like an array 1634 01:18:46,620 --> 01:18:50,870 but that still gives you dynamism. 1635 01:18:50,870 --> 01:18:52,940 >> So can we have the best of both worlds? 1636 01:18:52,940 --> 01:18:55,570 And indeed, there is something called the hash table 1637 01:18:55,570 --> 01:18:59,320 that allows you to do exactly that, albeit approximately. 1638 01:18:59,320 --> 01:19:03,140 A hash table is a fancier data structure that we 1639 01:19:03,140 --> 01:19:06,340 can think of as the combination of an array-- 1640 01:19:06,340 --> 01:19:12,390 and I'm going to draw it like this-- and linked lists 1641 01:19:12,390 --> 01:19:17,310 that I'll draw like this over here. 1642 01:19:17,310 --> 01:19:19,760 >> And the way this thing works is as follows. 1643 01:19:19,760 --> 01:19:23,310 1644 01:19:23,310 --> 01:19:29,540 If this now-- hash table-- is my third data structure, 1645 01:19:29,540 --> 01:19:32,590 and I want to store words in this, I don't 1646 01:19:32,590 --> 01:19:35,440 want to just store all of the words back to back to back to back. 1647 01:19:35,440 --> 01:19:37,430 I want to leverage some piece of information 1648 01:19:37,430 --> 01:19:40,330 about the words that will let me get it where it's faster. 1649 01:19:40,330 --> 01:19:43,666 >> So given the words apple and banana and cantaloupe, 1650 01:19:43,666 --> 01:19:45,040 I deliberately chose those words. 1651 01:19:45,040 --> 01:19:45,340 Why? 1652 01:19:45,340 --> 01:19:47,631 What's sort of fundamentally different about the three? 1653 01:19:47,631 --> 01:19:49,950 1654 01:19:49,950 --> 01:19:51,484 What's the obvious? 1655 01:19:51,484 --> 01:19:52,900 They start with different letters. 1656 01:19:52,900 --> 01:19:53,900 >> So you know what? 1657 01:19:53,900 --> 01:19:57,120 Rather than put all my words in the same bucket, so to speak, 1658 01:19:57,120 --> 01:20:00,390 like in one big list, why don't I at least try an optimization 1659 01:20:00,390 --> 01:20:04,180 and make my lists 1/26 as long. 1660 01:20:04,180 --> 01:20:07,440 A compelling optimization might be why don't 1661 01:20:07,440 --> 01:20:10,650 I-- when inserting a word into this data structure, 1662 01:20:10,650 --> 01:20:14,300 into the computer's memory, why don't I put all the 'a' words here, 1663 01:20:14,300 --> 01:20:17,270 all the 'b' words here, and all the 'c' words here? 1664 01:20:17,270 --> 01:20:24,610 So this ends up putting an apple here, banana here, cantaloupe here, 1665 01:20:24,610 --> 01:20:25,730 and so forth. 1666 01:20:25,730 --> 01:20:31,700 >> And if I have an additional word like-- what's another? 1667 01:20:31,700 --> 01:20:36,640 Apple, banana, pear. 1668 01:20:36,640 --> 01:20:39,370 Anyone think of a fruit that starts with a, b, or c? 1669 01:20:39,370 --> 01:20:40,570 Blueberry-- perfect. 1670 01:20:40,570 --> 01:20:43,990 That is going to end up here. 1671 01:20:43,990 --> 01:20:47,530 And so we seem to have a marginally better solution, 1672 01:20:47,530 --> 01:20:50,820 because now if I want to search for apple, I 1673 01:20:50,820 --> 01:20:53,200 first-- I don't just dive into my data structure. 1674 01:20:53,200 --> 01:20:54,850 I don't dive into my computer's memory. 1675 01:20:54,850 --> 01:20:56,530 I first look at the first letter. 1676 01:20:56,530 --> 01:20:58,610 >> And this is what a computer scientist would say. 1677 01:20:58,610 --> 01:21:00,760 You hash into your data structure. 1678 01:21:00,760 --> 01:21:04,100 You take your input, which in this case is a word like apple. 1679 01:21:04,100 --> 01:21:07,150 You analyze it, looking at the first letter in this case, 1680 01:21:07,150 --> 01:21:08,340 thereby hashing it. 1681 01:21:08,340 --> 01:21:10,950 Hashing is a general term whereby you take something as input 1682 01:21:10,950 --> 01:21:12,116 and you produce some output. 1683 01:21:12,116 --> 01:21:15,090 And the output in that case is the location 1684 01:21:15,090 --> 01:21:18,150 you want to search, the first location, second location, third. 1685 01:21:18,150 --> 01:21:22,160 So the input is apple, the output is first. 1686 01:21:22,160 --> 01:21:25,054 The input is banana, the output should be second. 1687 01:21:25,054 --> 01:21:27,220 The input is cantaloupe, the output should be third. 1688 01:21:27,220 --> 01:21:30,320 The input is blueberry, the output should again be second. 1689 01:21:30,320 --> 01:21:34,010 And that's what helps you take shortcuts through your memory 1690 01:21:34,010 --> 01:21:39,050 in order to get to words or data more effectively. 1691 01:21:39,050 --> 01:21:43,330 >> Now this cuts down our time potentially by as much as one out of 26, 1692 01:21:43,330 --> 01:21:45,850 because if you assume that you have as many "a" words as "z" 1693 01:21:45,850 --> 01:21:48,080 words as "q" words, which isn't really realistic-- 1694 01:21:48,080 --> 01:21:50,830 you're going to have skew across certain letters of the alphabet-- 1695 01:21:50,830 --> 01:21:53,204 but this would be an incremental approach that does allow 1696 01:21:53,204 --> 01:21:55,930 you to get to words much more quickly. 1697 01:21:55,930 --> 01:21:59,660 And in reality, a sophisticated program, the Googles of the world, 1698 01:21:59,660 --> 01:22:02,180 the Facebooks of the world-- they would use a hash table 1699 01:22:02,180 --> 01:22:03,740 for a lot of different purposes. 1700 01:22:03,740 --> 01:22:06,590 But they wouldn't be so naive as to just look at the first letter 1701 01:22:06,590 --> 01:22:09,700 in apple or banana or pear or cantaloupe, 1702 01:22:09,700 --> 01:22:13,420 because as you can see these lists could still get long. 1703 01:22:13,420 --> 01:22:17,130 >> And so this might still be sort of linear-- so sort of slow, 1704 01:22:17,130 --> 01:22:19,980 like with the big O of n that we discussed earlier. 1705 01:22:19,980 --> 01:22:25,290 So what a real good hash table will do-- it will have a much bigger array. 1706 01:22:25,290 --> 01:22:28,574 And it will use a much more sophisticated hashing function, 1707 01:22:28,574 --> 01:22:30,240 so that it doesn't just look at the "a." 1708 01:22:30,240 --> 01:22:35,480 Maybe it looks at "a-p-p-l-e" and somehow converts those five letters 1709 01:22:35,480 --> 01:22:38,400 into the location where apple should be stored. 1710 01:22:38,400 --> 01:22:42,660 We're just naively using the letter 'a' alone, because it's nice and simple. 1711 01:22:42,660 --> 01:22:44,600 >> But a hash table, in the end, you can think 1712 01:22:44,600 --> 01:22:47,270 of as a combination of an array, each of which 1713 01:22:47,270 --> 01:22:51,700 has a linked list that ideally should be as short as possible. 1714 01:22:51,700 --> 01:22:54,364 And this is not an obvious solution. 1715 01:22:54,364 --> 01:22:57,280 In fact, much of the fine tuning that goes on underneath the hood when 1716 01:22:57,280 --> 01:22:59,654 implementing these kinds of sophisticated data structures 1717 01:22:59,654 --> 01:23:01,640 is what is the right length of the array? 1718 01:23:01,640 --> 01:23:03,250 What is the right hash function? 1719 01:23:03,250 --> 01:23:04,830 How do you store things in memory? 1720 01:23:04,830 --> 01:23:07,249 >> But realize how quickly this sort of discussion 1721 01:23:07,249 --> 01:23:10,540 escalated, either so far that it's kind of over one's head at this point, which 1722 01:23:10,540 --> 01:23:11,360 is fine. 1723 01:23:11,360 --> 01:23:18,820 But we started, recall, with truly something low-level and electronic. 1724 01:23:18,820 --> 01:23:20,819 And so this again is this theme of abstraction, 1725 01:23:20,819 --> 01:23:23,610 where once you start to take for granted, OK, I've got it-- there's 1726 01:23:23,610 --> 01:23:26,680 physical memory, OK, got it, every physical location has an address, 1727 01:23:26,680 --> 01:23:29,910 OK, I got it, I can represent those addresses as arrows-- 1728 01:23:29,910 --> 01:23:34,650 you can very quickly start to have more sophisticated conversations that 1729 01:23:34,650 --> 01:23:38,360 in the end seem to be allowing us to solve problems like searching 1730 01:23:38,360 --> 01:23:41,620 and sorting more effectively. 1731 01:23:41,620 --> 01:23:44,190 And rest assured, too-- because I think this 1732 01:23:44,190 --> 01:23:48,700 is the deepest we've gone into some of these CS topics proper-- we've 1733 01:23:48,700 --> 01:23:51,880 done in a day and a half at this point what you might typically do over 1734 01:23:51,880 --> 01:23:55,520 the course of eight weeks in a semester. 1735 01:23:55,520 --> 01:23:59,670 >> Any questions on these? 1736 01:23:59,670 --> 01:24:01,100 No? 1737 01:24:01,100 --> 01:24:01,940 All right. 1738 01:24:01,940 --> 01:24:05,610 Well, why don't we pause there, start lunch a few minutes early, 1739 01:24:05,610 --> 01:24:07,052 resume in just about an hour? 1740 01:24:07,052 --> 01:24:08,760 And I'll linger for a bit with questions. 1741 01:24:08,760 --> 01:24:11,343 Then I'm going to have to go take a couple calls if that's OK. 1742 01:24:11,343 --> 01:24:15,000 I'll turn on some music in the meantime, but lunch should be around the corner. 1743 01:24:15,000 --> 01:24:17,862