1 00:00:00,000 --> 00:00:02,000 [Week 6] 2 00:00:02,000 --> 00:00:04,000 [David J. Malan] [Harvard University] 3 00:00:04,000 --> 00:00:08,000 [This is CS50.] [CS50.TV] 4 00:00:08,000 --> 00:00:12,000 >> This is CS50, and this is the start of Week 6, 5 00:00:12,000 --> 00:00:16,000 so a couple of new tools are now available for you to take advantage of, 6 00:00:16,000 --> 00:00:19,000 the first of which is called CS50 Style. 7 00:00:19,000 --> 00:00:22,000 Odds are if you're like me or any of the teaching fellows, 8 00:00:22,000 --> 00:00:26,000 you've probably seen a program whose style looks a little something like this. 9 00:00:26,000 --> 00:00:30,000 Maybe you start cutting some corners late at night, or you'll deal with it later, 10 00:00:30,000 --> 00:00:32,000 and then a TF or CA comes over during office hours. 11 00:00:32,000 --> 00:00:34,000 Then it's hard for us to read. 12 00:00:34,000 --> 00:00:38,000 Well, this code is syntactically correct, and it will compile, and it will actually run. 13 00:00:38,000 --> 00:00:40,000 But it's definitely not a 5 for style. 14 00:00:40,000 --> 00:00:45,000 >> But now, if we go into this directory here— 15 00:00:45,000 --> 00:00:48,000 and notice that I have conditions2.c— 16 00:00:48,000 --> 00:00:55,000 and I run this new command, style50, on this file conditions2.c, Enter, 17 00:00:55,000 --> 00:00:57,000 notice that it's informed me that it has been stylized. 18 00:00:57,000 --> 00:01:00,000 Gedit noticed that the file has been changed on disk, 19 00:01:00,000 --> 00:01:08,000 and if I click reload, all your problems are now automated. 20 00:01:08,000 --> 00:01:15,000 [applause] 21 00:01:15,000 --> 00:01:17,000 That's one of the things we did this weekend. 22 00:01:17,000 --> 00:01:20,000 Realize that it is imperfect because there are some code 23 00:01:20,000 --> 00:01:23,000 that it simply won't be able to stylize perfectly, 24 00:01:23,000 --> 00:01:26,000 but realize this is now a tool you can take advantage of 25 00:01:26,000 --> 00:01:33,000 if only to tidy up some of the more errantly placed curly braces and the like. 26 00:01:33,000 --> 00:01:36,000 >> But more compelling now is CS50 Check. 27 00:01:36,000 --> 00:01:39,000 With CS50 Check, you can actually perform the same correctness tests 28 00:01:39,000 --> 00:01:42,000 on your own code that the teaching fellows are able to. 29 00:01:42,000 --> 00:01:44,000 This is a command line utility that comes now in the appliance 30 00:01:44,000 --> 00:01:46,000 as soon as you do an update50 as per 31 00:01:46,000 --> 00:01:49,000 pset 4 specifications, and you use it essentially like this. 32 00:01:49,000 --> 00:01:51,000 You run the command check50. 33 00:01:51,000 --> 00:01:56,000 Then you pass in a command line argument, or more generally known as a switch or a flag. 34 00:01:56,000 --> 00:01:58,000 Generally, things that have hyphens are called a switch 35 00:01:58,000 --> 00:02:02,000 to a command line program, so -c specifies 36 00:02:02,000 --> 00:02:04,000 the checks that you want to run. 37 00:02:04,000 --> 00:02:07,000 >> The tests that you want to run are identified uniquely by this string, 38 00:02:07,000 --> 00:02:10,000 2012/pset4/resize. 39 00:02:10,000 --> 00:02:13,000 In other words, that's just an arbitrary but unique string 40 00:02:13,000 --> 00:02:18,000 that we use to uniquely identify pset 4's correctness tests. 41 00:02:18,000 --> 00:02:21,000 And then you specify a space separated list of the files that you want to upload 42 00:02:21,000 --> 00:02:24,000 to CS50 Check for analysis. 43 00:02:24,000 --> 00:02:29,000 For instance, if I go into my solution here for resize.c— 44 00:02:29,000 --> 00:02:31,000 let me open up a bigger terminal window— 45 00:02:31,000 --> 00:02:42,000 and I go ahead and run let's say check50 -c 2012/pset4/resize, 46 00:02:42,000 --> 00:02:46,000 and then I go ahead and specify the names of the files, 47 00:02:46,000 --> 00:02:49,000 resize.c, and then hit Enter, it compresses, 48 00:02:49,000 --> 00:02:53,000 it uploads, it checks, and I just failed a whole bunch of tests. 49 00:02:53,000 --> 00:02:59,000 The one in red at top left says that resize.c and bmp exist. 50 00:02:59,000 --> 00:03:01,000 That was the test. That was the question we asked. 51 00:03:01,000 --> 00:03:04,000 And it's unhappy because the answer was false. 52 00:03:04,000 --> 00:03:08,000 The white text below it says expected bmp.h to exist, and that's simply my fault. 53 00:03:08,000 --> 00:03:11,000 I forgot to upload it, so I need to upload both files, 54 00:03:11,000 --> 00:03:14,000 resize.c and bmp.h. 55 00:03:14,000 --> 00:03:17,000 But now notice all of the other tests are in yellow because they haven't run, 56 00:03:17,000 --> 00:03:21,000 and so the smiley face is vertical because he's neither happy nor sad, 57 00:03:21,000 --> 00:03:25,000 but we have to redress that issue in red before those other checks will run. 58 00:03:25,000 --> 00:03:27,000 >> Let me fix this. 59 00:03:27,000 --> 00:03:30,000 Let me zoom out and rerun this, this time with bmp.h also 60 00:03:30,000 --> 00:03:34,000 on the command line, Enter, and now if all goes well, 61 00:03:34,000 --> 00:03:38,000 it's going to check and then return a result of—hold your breath— 62 00:03:38,000 --> 00:03:42,000 all green, which means I'm doing really well on pset 4 so far. 63 00:03:42,000 --> 00:03:44,000 You can see and infer from the descriptive text here 64 00:03:44,000 --> 00:03:47,000 exactly what it is we tested. 65 00:03:47,000 --> 00:03:49,000 We tested first do the files exist? 66 00:03:49,000 --> 00:03:51,000 We then tested does resize.c compile? 67 00:03:51,000 --> 00:03:58,000 Then we tested does it not resize a 1x1-pixel BMP when n, the resize factor, is 1. 68 00:03:58,000 --> 00:04:01,000 Now, if you have no idea what n is, you will once you dive into pset 4, 69 00:04:01,000 --> 00:04:04,000 but that simply is a sanity check to make sure that you're not resizing 70 00:04:04,000 --> 00:04:08,000 an image at all if the resize factor is 1. 71 00:04:08,000 --> 00:04:14,000 If, by contrast, it resizes a 1x1 pixel to a 1x1 pixel BMP to 2x2 correctly 72 00:04:14,000 --> 00:04:19,000 when n is 2, then similarly, mine forms accordingly. 73 00:04:19,000 --> 00:04:22,000 >> In short, this is meant to, one, take the crossing the fingers 74 00:04:22,000 --> 00:04:25,000 out of the equation right before you submit your pset. 75 00:04:25,000 --> 00:04:28,000 You will know exactly what your TF will soon know 76 00:04:28,000 --> 00:04:30,000 when you go about submitting some of these problem sets, 77 00:04:30,000 --> 00:04:34,000 and also the pedagogical motivation is really to put 78 00:04:34,000 --> 00:04:37,000 the opportunity in front of you so that when you know a priori 79 00:04:37,000 --> 00:04:39,000 that there's bugs in your code and tests that aren't being passed, 80 00:04:39,000 --> 00:04:43,000 you can put in more effective time up front to solve those problems 81 00:04:43,000 --> 00:04:45,000 rather than lose points, get feedback from your TF, 82 00:04:45,000 --> 00:04:48,000 and then go, "Ahh," like I should have figured that out. 83 00:04:48,000 --> 00:04:50,000 Now at least there's a tool to help you find that. 84 00:04:50,000 --> 00:04:52,000 It's not going to point out where the bug is, but it will tell you 85 00:04:52,000 --> 00:04:54,000 what is symptomatic of it. 86 00:04:54,000 --> 00:04:57,000 >> Now realize the tests aren't necessarily exhaustive. 87 00:04:57,000 --> 00:04:59,000 Just because you get a screen full of green smiley faces 88 00:04:59,000 --> 00:05:02,000 doesn't mean your code is perfect, but it does mean 89 00:05:02,000 --> 00:05:06,000 that it has passed certain tests prescribed by the spec. 90 00:05:06,000 --> 00:05:08,000 Sometimes we won't release checks. 91 00:05:08,000 --> 00:05:10,000 For instance, whodunit, one of the aspects of pset 4, 92 00:05:10,000 --> 00:05:15,000 is kind of disappointing if we give you 93 00:05:15,000 --> 00:05:18,000 the answer as to what it is, and there's a number of ways to reveal 94 00:05:18,000 --> 00:05:21,000 who the person is in that red noise. 95 00:05:21,000 --> 00:05:24,000 The spec will always specify in the future for pset 5 onward 96 00:05:24,000 --> 00:05:26,000 what checks exist for you. 97 00:05:26,000 --> 00:05:28,000 You'll notice there's this white URL at the bottom. 98 00:05:28,000 --> 00:05:30,000 For now, this is just diagnostic output. 99 00:05:30,000 --> 00:05:33,000 If you visit that URL, you'll get a whole bunch of crazy, cryptic messages 100 00:05:33,000 --> 00:05:36,000 that you're welcome to look through, but it's mostly for the staff 101 00:05:36,000 --> 00:05:41,000 so that we can diagnose and debug bugs in check50 itself. 102 00:05:41,000 --> 00:05:46,000 >> Without ado, let's move on to where we left off. 103 00:05:46,000 --> 00:05:48,000 CS50 library we took for granted for some weeks, 104 00:05:48,000 --> 00:05:52,000 but then last week, we started peeling back one of the layers of it. 105 00:05:52,000 --> 00:05:55,000 We started putting aside string in favor of what instead? 106 00:05:55,000 --> 00:05:57,000 [Students] Char. 107 00:05:57,000 --> 00:05:59,000 Char*, which has been a char* all this time, 108 00:05:59,000 --> 00:06:03,000 but now we don't have to pretend that it's an actual data type string. 109 00:06:03,000 --> 00:06:06,000 Rather, it's been a synonym of sorts for char*, 110 00:06:06,000 --> 00:06:09,000 and a string is a sequence of characters, 111 00:06:09,000 --> 00:06:14,000 so why does it make sense to represent strings as char*s? 112 00:06:14,000 --> 00:06:20,000 What does a char* represent in the context of this concept of a string? 113 00:06:20,000 --> 00:06:23,000 Yeah.>>[Student] The first character. 114 00:06:23,000 --> 00:06:25,000 Good, the first character, but not quite the first character. 115 00:06:25,000 --> 00:06:27,000 It's the—[Students] Address. 116 00:06:27,000 --> 00:06:29,000 Good, the address of the first character. 117 00:06:29,000 --> 00:06:33,000 All that's necessary to represent a string in a computer's memory 118 00:06:33,000 --> 00:06:36,000 is just the unique address of its very first byte. 119 00:06:36,000 --> 00:06:38,000 You don't even have to know how long it is 120 00:06:38,000 --> 00:06:42,000 because how can you figure that out dynamically? 121 00:06:42,000 --> 00:06:44,000 [Student] String length. 122 00:06:44,000 --> 00:06:48,000 You can call string length, excellent, but how does string length work? 123 00:06:48,000 --> 00:06:50,000 What does it do? Yeah. 124 00:06:50,000 --> 00:06:52,000 [Student] Keep going until you get the null character. 125 00:06:52,000 --> 00:06:54,000 Yeah, exactly, it just iterates with a for loop, while loop, 126 00:06:54,000 --> 00:06:57,000 whatever from * to the end, and the end is represented 127 00:06:57,000 --> 00:07:01,000 by \0, the so-called nul character, nul, 128 00:07:01,000 --> 00:07:05,000 not to be confused with null, which is a pointer, 129 00:07:05,000 --> 00:07:07,000 which will come up in conversation again today. 130 00:07:07,000 --> 00:07:11,000 >> We peeled back a layer of GetInt, and then we took a look at GetString, 131 00:07:11,000 --> 00:07:14,000 and recall that both of those functions, or really, 132 00:07:14,000 --> 00:07:18,000 GetString, was using a certain function 133 00:07:18,000 --> 00:07:21,000 to actually parse, that is, read or analyze, the user's input. 134 00:07:21,000 --> 00:07:25,000 And what was that new function? 135 00:07:25,000 --> 00:07:27,000 Scanf or sscanf. It actually comes in a few different flavors. 136 00:07:27,000 --> 00:07:31,000 There's scanf, there's sscanf, there's fscanf. 137 00:07:31,000 --> 00:07:35,000 For now, though, let's focus on the one most easily illustrated, 138 00:07:35,000 --> 00:07:38,000 and let me go ahead and open up in the appliance 139 00:07:38,000 --> 00:07:41,000 a file like this, scanf1.c. 140 00:07:41,000 --> 00:07:43,000 This is a super simple program, 141 00:07:43,000 --> 00:07:46,000 but that does something that we've never done 142 00:07:46,000 --> 00:07:48,000 without the help of the CS50 library. 143 00:07:48,000 --> 00:07:51,000 This gets an int from a user. How does it work? 144 00:07:51,000 --> 00:07:53,000 Well, in line 16 there, 145 00:07:53,000 --> 00:07:56,000 notice that we declare an int called x, and at this point in the story, 146 00:07:56,000 --> 00:07:58,000 what is the value of x? 147 00:07:58,000 --> 00:08:00,000 [inaudible student response] 148 00:08:00,000 --> 00:08:02,000 [David M.] Right, who knows, some garbage value potentially, so in 17, we just tell the user 149 00:08:02,000 --> 00:08:06,000 give me a number, please, and step 18 is where it gets interesting. 150 00:08:06,000 --> 00:08:11,000 Scanf seems to borrow an idea from printf in that it uses these format codes in quotes. 151 00:08:11,000 --> 00:08:13,000 %d is of course a decimal number. 152 00:08:13,000 --> 00:08:21,000 But why am I passing in &x instead of just x? 153 00:08:21,000 --> 00:08:24,000 The former is correct. Yeah. 154 00:08:24,000 --> 00:08:26,000 [inaudible student response] 155 00:08:26,000 --> 00:08:31,000 Exactly, if the goal of this program, like the function GetInt itself, 156 00:08:31,000 --> 00:08:34,000 is to get an int from the user I can pass functions 157 00:08:34,000 --> 00:08:38,000 all the variables I want, but if I don't pass them by reference 158 00:08:38,000 --> 00:08:41,000 or by address or by pointer, all synonymous for today's purposes, 159 00:08:41,000 --> 00:08:46,000 then that function has no ability to change the contents of that variable. 160 00:08:46,000 --> 00:08:49,000 This would pass in a copy just like the buggy version of swap 161 00:08:49,000 --> 00:08:51,000 that we've talked about a few times now. 162 00:08:51,000 --> 00:08:54,000 >> But instead, by doing &x, I'm literally passing in what? 163 00:08:54,000 --> 00:08:57,000 [Student] The address.>>The address of x. 164 00:08:57,000 --> 00:09:01,000 It's like drawing a map for the function called scanf and saying here, 165 00:09:01,000 --> 00:09:04,000 these are directions to a chunk of memory in the computer 166 00:09:04,000 --> 00:09:07,000 that you can go store some integer in. 167 00:09:07,000 --> 00:09:10,000 In order for sscanf to now do that 168 00:09:10,000 --> 00:09:13,000 what operator, what piece of syntax is it going to have to use 169 00:09:13,000 --> 00:09:19,000 even though we can't see it because someone else wrote this function? 170 00:09:19,000 --> 00:09:21,000 In other words--what's that? 171 00:09:21,000 --> 00:09:23,000 [Student] X read. 172 00:09:23,000 --> 00:09:27,000 There's going to be some reading, but only with regard to x here. 173 00:09:27,000 --> 00:09:30,000 If scanf is being passed the address of x, 174 00:09:30,000 --> 00:09:35,000 syntactically, what operator is bound to exist somewhere 175 00:09:35,000 --> 00:09:38,000 inside of scanf's implementation so that scanf 176 00:09:38,000 --> 00:09:42,000 can actually write a number 2 to that address? 177 00:09:42,000 --> 00:09:44,000 Yeah, so the *. 178 00:09:44,000 --> 00:09:47,000 Recall that the * is our dereference operator, which essentially means go there. 179 00:09:47,000 --> 00:09:50,000 >> Once you've been handed an address, as is the case here, 180 00:09:50,000 --> 00:09:53,000 scanf is probably—if we actually looked around its source code— 181 00:09:53,000 --> 00:09:59,000 is doing *x or the equivalent to actually go to that address and put some value there. 182 00:09:59,000 --> 00:10:02,000 Now, as for how scanf gets input from the keyboard, 183 00:10:02,000 --> 00:10:04,000 we'll wave our hands out for today. 184 00:10:04,000 --> 00:10:07,000 Just assume that the operating system allows sscanf to talk 185 00:10:07,000 --> 00:10:11,000 to the user's keyboard, but at this point now in line 19, 186 00:10:11,000 --> 00:10:14,000 when we simply print out x, it seems to be the case 187 00:10:14,000 --> 00:10:17,000 that scanf has put an int in x. 188 00:10:17,000 --> 00:10:19,000 That's exactly how scanf works, and recall last week 189 00:10:19,000 --> 00:10:25,000 that's exactly how GetString and GetInt and its other family of functions 190 00:10:25,000 --> 00:10:28,000 ultimately works, albeit with slight variance like sscanf, 191 00:10:28,000 --> 00:10:31,000 which means scan a string instead of the keyboard. 192 00:10:31,000 --> 00:10:33,000 But let's take a look at a little variance of this. 193 00:10:33,000 --> 00:10:37,000 In scanf2, I actually screwed up. 194 00:10:37,000 --> 00:10:42,000 What is wrong—and I'll hide the comment that explains as much— 195 00:10:42,000 --> 00:10:47,000 what is wrong with this program, version 2? 196 00:10:47,000 --> 00:10:55,000 Be as technical as possible this time. 197 00:10:55,000 --> 00:10:57,000 It looks pretty good. 198 00:10:57,000 --> 00:11:03,000 It's nicely indented, but— 199 00:11:03,000 --> 00:11:07,000 okay, how about let's prune it down to shorter questions? 200 00:11:07,000 --> 00:11:17,000 Line 16. What's line 16 doing in precise but technical English? 201 00:11:17,000 --> 00:11:20,000 Getting a little awkward. Yes, Michael. 202 00:11:20,000 --> 00:11:25,000 [Student] It's pointing to the first letter of a string. 203 00:11:25,000 --> 00:11:27,000 >> Okay, close. Let me tweak that a little bit. 204 00:11:27,000 --> 00:11:33,000 Pointing to the first letter of a string, you are declaring a variable called buffer 205 00:11:33,000 --> 00:11:36,000 that will point to the first address of a string, 206 00:11:36,000 --> 00:11:39,000 or rather, that will point more specifically to a char. 207 00:11:39,000 --> 00:11:42,000 Notice it's not actually pointing anywhere because there's no assignment operator. 208 00:11:42,000 --> 00:11:46,000 There's no equal sign, so all we're doing is allocating the variable called buffer. 209 00:11:46,000 --> 00:11:49,000 It happens to be 32 bits because it's a pointer, 210 00:11:49,000 --> 00:11:52,000 and the contents of buffer presumably eventually 211 00:11:52,000 --> 00:11:57,000 will contain an address of a char, but for now, what does buffer contain? 212 00:11:57,000 --> 00:11:59,000 Just some bogus, who knows, some garbage value, 213 00:11:59,000 --> 00:12:03,000 because we haven't explicitly initialized it, so we shouldn't assume anything. 214 00:12:03,000 --> 00:12:06,000 Okay, so now line 17 is—what does line 17 do? 215 00:12:06,000 --> 00:12:08,000 Maybe that will warm this up. 216 00:12:08,000 --> 00:12:10,000 It prints a string, right? 217 00:12:10,000 --> 00:12:12,000 It prints String please. 218 00:12:12,000 --> 00:12:15,000 >> Line 18 is kind of familiar now in that we just saw a variance of this 219 00:12:15,000 --> 00:12:18,000 but with a different format code, so in line 18, 220 00:12:18,000 --> 00:12:23,000 we're telling scanf here is the address of a chunk of memory. 221 00:12:23,000 --> 00:12:27,000 I want you to ring in a string, as implied by %s, 222 00:12:27,000 --> 00:12:32,000 but the problem is that we have not done a couple of things here. 223 00:12:32,000 --> 00:12:35,000 What's one of the problems? 224 00:12:35,000 --> 00:12:38,000 [Student] It's trying to dereference a null pointer. 225 00:12:38,000 --> 00:12:41,000 Good, null or just otherwise unknown pointers. 226 00:12:41,000 --> 00:12:45,000 You're handing scanf an address, but you just said a moment ago 227 00:12:45,000 --> 00:12:49,000 that that address is some garbage value because we didn't actually assign it to anything, 228 00:12:49,000 --> 00:12:53,000 and so you're telling scanf effectively go put a string here, 229 00:12:53,000 --> 00:12:56,000 but we don't know where here yet is, 230 00:12:56,000 --> 00:12:59,000 so we haven't actually allocated memory for buffer. 231 00:12:59,000 --> 00:13:03,000 Moreover, what are you also not even telling scanf? 232 00:13:03,000 --> 00:13:06,000 Suppose this was a chunk of memory, and it wasn't a garbage value, 233 00:13:06,000 --> 00:13:09,000 but you're still not telling scanf something important. 234 00:13:09,000 --> 00:13:12,000 [Student] Where it actually is, the ampersand. 235 00:13:12,000 --> 00:13:15,000 Ampersand, so in this case, it's okay. 236 00:13:15,000 --> 00:13:18,000 Because buffer is already declared as a pointer 237 00:13:18,000 --> 00:13:22,000 with the * piece of syntax, we don't need to use ampersand 238 00:13:22,000 --> 00:13:25,000 because it's already an address, but I think I heard it here. 239 00:13:25,000 --> 00:13:27,000 [Student] How big is it? 240 00:13:27,000 --> 00:13:29,000 Good, we're not telling scanf how big this buffer is, 241 00:13:29,000 --> 00:13:32,000 which means even if buffer were a pointer, 242 00:13:32,000 --> 00:13:35,000 we're saying scanf, put a string here, 243 00:13:35,000 --> 00:13:38,000 but here could be 2 bytes, it could be 10 bytes, it could be a megabyte. 244 00:13:38,000 --> 00:13:41,000 Scanf has no idea, and because this is a chunk of memory 245 00:13:41,000 --> 00:13:43,000 presumably, it's not a string yet. 246 00:13:43,000 --> 00:13:48,000 It's only a string once you write characters and a \0 to that chunk of memory. 247 00:13:48,000 --> 00:13:51,000 Now it's just some chunk of memory. 248 00:13:51,000 --> 00:13:55,000 Scanf won't know when to stop writing to that address. 249 00:13:55,000 --> 00:13:59,000 >> If you recall some examples in the past where I randomly typed on the keyboard 250 00:13:59,000 --> 00:14:03,000 trying to overflow a buffer, and we talked on Friday about exactly that. 251 00:14:03,000 --> 00:14:07,000 If an adversary somehow injects into your program a much bigger word 252 00:14:07,000 --> 00:14:10,000 or sentence or phrase then you were expecting you can overrun 253 00:14:10,000 --> 00:14:13,000 a chunk of memory, which can have bad consequences, 254 00:14:13,000 --> 00:14:15,000 like taking over the whole program itself. 255 00:14:15,000 --> 00:14:17,000 We need to fix this somehow. 256 00:14:17,000 --> 00:14:20,000 Let me zoom out and go into version 3 of this program. 257 00:14:20,000 --> 00:14:22,000 That's a little bit better. 258 00:14:22,000 --> 00:14:24,000 In this version, notice the difference. 259 00:14:24,000 --> 00:14:27,000 In line 16, I'm again declaring a variable called buffer, 260 00:14:27,000 --> 00:14:29,000 but what is it now? 261 00:14:29,000 --> 00:14:33,000 It's an array of 16 chars. 262 00:14:33,000 --> 00:14:36,000 This is good because this means I can now tell scanf 263 00:14:36,000 --> 00:14:39,000 here is an actual chunk of memory. 264 00:14:39,000 --> 00:14:42,000 You can almost think of arrays as being pointers now, 265 00:14:42,000 --> 00:14:44,000 even though they're not actually equivalent. 266 00:14:44,000 --> 00:14:47,000 They'll behave differently in different contexts. 267 00:14:47,000 --> 00:14:50,000 But it's certainly the case that buffer is referencing 268 00:14:50,000 --> 00:14:53,000 16 contiguous chars because that's what an array is 269 00:14:53,000 --> 00:14:55,000 and has been for some weeks now. 270 00:14:55,000 --> 00:14:59,000 >> Here, I am telling scanf here's a chunk of memory. 271 00:14:59,000 --> 00:15:01,000 This time, it's actually a chunk of memory, 272 00:15:01,000 --> 00:15:07,000 but why is this program still exploitable? 273 00:15:07,000 --> 00:15:11,000 What's wrong still? 274 00:15:11,000 --> 00:15:14,000 I've said give me 16 bytes but— 275 00:15:14,000 --> 00:15:16,000 [Student] What if they type in more than 16? 276 00:15:16,000 --> 00:15:20,000 Exactly, what if the user types in 17 characters or 1700 characters? 277 00:15:20,000 --> 00:15:23,000 In fact, let's see if we can't trip over this mistake now. 278 00:15:23,000 --> 00:15:25,000 It's better but not perfect. 279 00:15:25,000 --> 00:15:28,000 Let me go ahead and run make scanf3 to compile this program. 280 00:15:28,000 --> 00:15:34,000 Let me run scanf3, String please: hello, and we seem to be okay. 281 00:15:34,000 --> 00:15:37,000 Let me try a slightly longer one, hello there. 282 00:15:37,000 --> 00:15:42,000 Okay, let's do hello there how are you today, Enter. 283 00:15:42,000 --> 00:15:54,000 Getting kind of lucky here, let's say hello there how are you. 284 00:15:54,000 --> 00:15:56,000 Damn it. 285 00:15:56,000 --> 00:16:03,000 Okay, so we got lucky. Let's see if we can't fix this. 286 00:16:03,000 --> 00:16:06,000 No, it's not going to let me copy. 287 00:16:06,000 --> 00:16:09,000 Let's try this again. 288 00:16:09,000 --> 00:16:12,000 All right, stand by. 289 00:16:12,000 --> 00:16:20,000 We'll see how long I can pretend to focus while still doing this. 290 00:16:20,000 --> 00:16:23,000 Damn it. That's rather appropriate, actually. 291 00:16:23,000 --> 00:16:26,000 There we go. 292 00:16:26,000 --> 00:16:30,000 Point made. 293 00:16:30,000 --> 00:16:34,000 >> This, embarrassing though it also is, it is also one of the sources of great confusion 294 00:16:34,000 --> 00:16:38,000 when writing programs that have bugs because they manifest themselves 295 00:16:38,000 --> 00:16:40,000 only once in a while sometimes. 296 00:16:40,000 --> 00:16:43,000 The reality is that even if your code is completely broken, 297 00:16:43,000 --> 00:16:46,000 it might only be completely broken once in a while 298 00:16:46,000 --> 00:16:49,000 because sometimes, essentially what happens is the operating system allocates 299 00:16:49,000 --> 00:16:52,000 a little more memory than you actually need for whatever reason, 300 00:16:52,000 --> 00:16:57,000 and so no one else is using the memory right after your chunk of 16 characters, 301 00:16:57,000 --> 00:17:01,000 so if you go to 17, 18, 19, whatever, it's not such a big deal. 302 00:17:01,000 --> 00:17:04,000 Now, the computer, even if it doesn't crash at that point, 303 00:17:04,000 --> 00:17:09,000 might eventually use byte number 17 or 18 or 19 for something else, 304 00:17:09,000 --> 00:17:14,000 at which point your data that you put there, albeit excessively long, 305 00:17:14,000 --> 00:17:18,000 is going to get overwritten potentially by some other function. 306 00:17:18,000 --> 00:17:21,000 It's not necessarily going to remain intact, 307 00:17:21,000 --> 00:17:23,000 but it won't necessarily cause a seg fault. 308 00:17:23,000 --> 00:17:26,000 But in this case, I finally provided enough characters 309 00:17:26,000 --> 00:17:29,000 that I essentially exceeded my segment of memory, and bam, 310 00:17:29,000 --> 00:17:33,000 the operating system said, "Sorry, that's no good, segmentation fault." 311 00:17:33,000 --> 00:17:38,000 >> And let's see now if what remains here in my directory— 312 00:17:38,000 --> 00:17:40,000 notice that I have this file here, core. 313 00:17:40,000 --> 00:17:42,000 Notice that this is again called a core dump. 314 00:17:42,000 --> 00:17:46,000 It's essentially a file that contains the content of your program's memory 315 00:17:46,000 --> 00:17:48,000 at the point at which it crashed, 316 00:17:48,000 --> 00:17:51,000 and just to try a little example here let me go in here 317 00:17:51,000 --> 00:17:57,000 and run gdb on scanf3 and then specify a third argument called core, 318 00:17:57,000 --> 00:18:01,000 and notice here that if I list the code, 319 00:18:01,000 --> 00:18:06,000 we'll be able as usual with gdb to start walking through this program, 320 00:18:06,000 --> 00:18:10,000 and I can run it and as soon as I hit—as with the step command in gdb— 321 00:18:10,000 --> 00:18:13,000 as soon as I hit the potentially buggy line after typing in a huge string, 322 00:18:13,000 --> 00:18:16,000 I'll be able to actually identify it here. 323 00:18:16,000 --> 00:18:19,000 More on this, though, in section in terms of core dumps 324 00:18:19,000 --> 00:18:22,000 and the like so that you can actually poke around inside of the core dump 325 00:18:22,000 --> 00:18:27,000 and see on what line the program failed you. 326 00:18:27,000 --> 00:18:32,000 Any questions then on pointers and on addresses? 327 00:18:32,000 --> 00:18:36,000 Because today on, we're going to start taking for granted that these things exist 328 00:18:36,000 --> 00:18:40,000 and we know exactly what they are. 329 00:18:40,000 --> 00:18:42,000 Yes. 330 00:18:42,000 --> 00:18:46,000 >> [Student] How come you didn't have to put an ampersand next to the part— 331 00:18:46,000 --> 00:18:48,000 Good question. 332 00:18:48,000 --> 00:18:51,000 How come I didn't have to put an ampersand next to the character array as I did previously 333 00:18:51,000 --> 00:18:53,000 with most of our examples? 334 00:18:53,000 --> 00:18:55,000 The short answer is arrays are a little special. 335 00:18:55,000 --> 00:18:59,000 You can almost think a buffer as actually being an address, 336 00:18:59,000 --> 00:19:03,000 and it just so happens to be the case that the square bracket notation 337 00:19:03,000 --> 00:19:06,000 is a convenience so that we can go into bracket 0, bracket 1, 338 00:19:06,000 --> 00:19:10,000 bracket 2, without having to use the * notation. 339 00:19:10,000 --> 00:19:13,000 That's a bit of a white lie because arrays and pointers 340 00:19:13,000 --> 00:19:17,000 are, in fact, a little bit different, but they can often but not always be used interchangeably. 341 00:19:17,000 --> 00:19:21,000 In short, when a function is expecting a pointer to a chunk of memory, 342 00:19:21,000 --> 00:19:24,000 you can either pass it an address that was returned by malloc, 343 00:19:24,000 --> 00:19:29,000 and we'll see malloc again before long, or you can pass it the name of an array. 344 00:19:29,000 --> 00:19:32,000 You don't have to do ampersand with arrays because they are already 345 00:19:32,000 --> 00:19:34,000 essentially like addresses. 346 00:19:34,000 --> 00:19:36,000 That's the one exception. 347 00:19:36,000 --> 00:19:39,000 The square brackets make them special. 348 00:19:39,000 --> 00:19:41,000 >> Could you put an ampersand next to the buffer? 349 00:19:41,000 --> 00:19:43,000 Not in this case. 350 00:19:43,000 --> 00:19:46,000 That would not work because, again, of this corner case 351 00:19:46,000 --> 00:19:49,000 where arrays aren't quite actually addresses. 352 00:19:49,000 --> 00:19:54,000 But we'll perhaps come back to that before long with other examples. 353 00:19:54,000 --> 00:19:56,000 Let's try to solve a problem here. 354 00:19:56,000 --> 00:20:00,000 We have a data structure that we've been using for some time known as an array. 355 00:20:00,000 --> 00:20:02,000 Case in point, that's what we just had. 356 00:20:02,000 --> 00:20:04,000 But arrays have some upsides and downsides. 357 00:20:04,000 --> 00:20:06,000 Arrays are nice why? 358 00:20:06,000 --> 00:20:11,000 What's one thing that you like—to the extent you like arrays—about arrays? 359 00:20:11,000 --> 00:20:13,000 What's convenient about them? What's compelling? 360 00:20:13,000 --> 00:20:18,000 Why did we introduce them in the first place? 361 00:20:18,000 --> 00:20:20,000 Yeah. 362 00:20:20,000 --> 00:20:27,000 [Student] They can store a lot of data, and you don't have to use an entire thing. 363 00:20:27,000 --> 00:20:29,000 You can use a section. 364 00:20:29,000 --> 00:20:32,000 Good, with an array you can store a lot of data, 365 00:20:32,000 --> 00:20:35,000 and you don't necessarily have to use all of it, so you can overallocate, 366 00:20:35,000 --> 00:20:39,000 which might be convenient if you don't know in advance how many of something to expect. 367 00:20:39,000 --> 00:20:41,000 >> GetString is a perfect example. 368 00:20:41,000 --> 00:20:44,000 GetString, written by us, has no idea how many chars to expect, 369 00:20:44,000 --> 00:20:48,000 so the fact that we can allocate chunks of contiguous memory is good. 370 00:20:48,000 --> 00:20:51,000 Arrays also solve a problem we saw a couple weeks ago now 371 00:20:51,000 --> 00:20:54,000 where your code starts to devolve into something very poorly designed. 372 00:20:54,000 --> 00:20:57,000 Recall that I created a student structure called David, 373 00:20:57,000 --> 00:21:00,000 and then that was actually an alternative, though, 374 00:21:00,000 --> 00:21:04,000 to having a variable called name and another variable called, I think, house, 375 00:21:04,000 --> 00:21:08,000 and another variable called ID because in that story I then wanted to introduce something else 376 00:21:08,000 --> 00:21:11,000 like Rob into the program, so then I decided wait a minute, 377 00:21:11,000 --> 00:21:13,000 I need to rename these variables. 378 00:21:13,000 --> 00:21:16,000 Let's call mine name1, ID1, house1. 379 00:21:16,000 --> 00:21:20,000 Let's call Rob's name2, house2, ID2. 380 00:21:20,000 --> 00:21:22,000 But then wait a minute, what about Tommy? 381 00:21:22,000 --> 00:21:24,000 Then we had three more variables. 382 00:21:24,000 --> 00:21:27,000 We introduced someone else, four sets of variables. 383 00:21:27,000 --> 00:21:30,000 The world started to get messy very quickly, 384 00:21:30,000 --> 00:21:33,000 so we introduced structs, and what's compelling about a struct? 385 00:21:33,000 --> 00:21:39,000 What does a C struct let you do? 386 00:21:39,000 --> 00:21:42,000 It's really awkward today. 387 00:21:42,000 --> 00:21:44,000 What?>>[inaudible student response] 388 00:21:44,000 --> 00:21:47,000 Yeah, specifically, typedef allows you to create a new data type, 389 00:21:47,000 --> 00:21:51,000 and struct, the struct keyword, allows you to encapsulate 390 00:21:51,000 --> 00:21:54,000 conceptually related pieces of data together 391 00:21:54,000 --> 00:21:56,000 and thereafter call them something like a student. 392 00:21:56,000 --> 00:21:58,000 >> That was good because now we can model 393 00:21:58,000 --> 00:22:03,000 much more sort of conceptually consistent the notion of a student in a variable 394 00:22:03,000 --> 00:22:07,000 rather than arbitrarily having one for a string, one for an ID, and so forth. 395 00:22:07,000 --> 00:22:10,000 Arrays are nice because they allow us to start cleaning up our code. 396 00:22:10,000 --> 00:22:13,000 But what is a downside now of an array? 397 00:22:13,000 --> 00:22:15,000 What can you not do? Yeah. 398 00:22:15,000 --> 00:22:17,000 [Student] You have to know how big it is. 399 00:22:17,000 --> 00:22:19,000 You have to know how big it is, so it's kind of a pain. 400 00:22:19,000 --> 00:22:21,000 Those of you with prior programming experience know that in a lot of languages, 401 00:22:21,000 --> 00:22:24,000 like Java, you can ask a chunk of memory, specifically an array, 402 00:22:24,000 --> 00:22:28,000 how big are you, with a length, property, so to speak, and that's really convenient. 403 00:22:28,000 --> 00:22:32,000 In C, you can't even call strlen on a generic array 404 00:22:32,000 --> 00:22:35,000 because strlen, as the word implies, is only for strings, 405 00:22:35,000 --> 00:22:39,000 and you can figure out the length of a string because of this human convention 406 00:22:39,000 --> 00:22:43,000 of having a \0, but an array, more generically, is just a chunk of memory. 407 00:22:43,000 --> 00:22:46,000 If it's an array of ints, there's not going to be some special character 408 00:22:46,000 --> 00:22:48,000 at the end waiting for you. 409 00:22:48,000 --> 00:22:50,000 You have to remember the length of an array. 410 00:22:50,000 --> 00:22:54,000 Another downside of an array reared its head in GetString itself. 411 00:22:54,000 --> 00:22:59,000 What's another downside of an array? 412 00:22:59,000 --> 00:23:01,000 Sir, just you and me today. 413 00:23:01,000 --> 00:23:04,000 [inaudible student response]>>It's what? 414 00:23:04,000 --> 00:23:06,000 It's declared on the stack. 415 00:23:06,000 --> 00:23:09,000 Okay, declared on the stack. Why don't you like that? 416 00:23:09,000 --> 00:23:13,000 [Student] Because it gets reused. 417 00:23:13,000 --> 00:23:15,000 It gets reused. 418 00:23:15,000 --> 00:23:18,000 Okay, if you use an array to allocate memory, 419 00:23:18,000 --> 00:23:21,000 you can't, for instance, return it because it's on the stack. 420 00:23:21,000 --> 00:23:23,000 Okay, that's a disadvantage. 421 00:23:23,000 --> 00:23:25,000 And how about one other with an array? 422 00:23:25,000 --> 00:23:28,000 Once you allocate it, you're kind of screwed if you need more space 423 00:23:28,000 --> 00:23:30,000 than that array has. 424 00:23:30,000 --> 00:23:34,000 >> Then we introduced, recall, malloc, which gave us the ability to dynamically allocate memory. 425 00:23:34,000 --> 00:23:37,000 But what if we tried a different world altogether? 426 00:23:37,000 --> 00:23:40,000 What if we wanted to solve a couple of those problems 427 00:23:40,000 --> 00:23:45,000 so we instead—my pen has fallen asleep here— 428 00:23:45,000 --> 00:23:51,000 what if we instead wanted to essentially create a world that's no longer like this? 429 00:23:51,000 --> 00:23:56,000 This is an array, and, of course, this kind of deteriorates once we hit the end of the array, 430 00:23:56,000 --> 00:24:00,000 and I now no longer have space for another integer or another character. 431 00:24:00,000 --> 00:24:03,000 What if we sort of preemptively say well, why don't we relax 432 00:24:03,000 --> 00:24:07,000 this requirement that all these chunks of memory be contiguous back to back, 433 00:24:07,000 --> 00:24:10,000 and why don't, when I need an int or a char, 434 00:24:10,000 --> 00:24:12,000 just give me space for one of them? 435 00:24:12,000 --> 00:24:14,000 And when I need another, give me another space, 436 00:24:14,000 --> 00:24:16,000 and when I need another, give me another space. 437 00:24:16,000 --> 00:24:19,000 The advantage of which now is that if someone else 438 00:24:19,000 --> 00:24:21,000 takes the memory over here, no big deal. 439 00:24:21,000 --> 00:24:25,000 I'll take this additional chunk of memory here and then this one. 440 00:24:25,000 --> 00:24:28,000 >> Now, the only catch here is that this almost feels like I have 441 00:24:28,000 --> 00:24:30,000 a whole bunch of different variables. 442 00:24:30,000 --> 00:24:33,000 This feels like five different variables potentially. 443 00:24:33,000 --> 00:24:36,000 But what if we steal an idea from strings 444 00:24:36,000 --> 00:24:41,000 whereby we somehow link these things together conceptually, and what if I did this? 445 00:24:41,000 --> 00:24:44,000 This is my very poorly drawn arrow. 446 00:24:44,000 --> 00:24:46,000 But suppose that each of these chunks of memory 447 00:24:46,000 --> 00:24:52,000 pointed to the other, and this guy, who has no sibling to his right, 448 00:24:52,000 --> 00:24:54,000 has no such arrow. 449 00:24:54,000 --> 00:24:56,000 This is in fact what's called a linked list. 450 00:24:56,000 --> 00:25:00,000 This is a new data structure that allows us to allocate a chunk of memory, 451 00:25:00,000 --> 00:25:03,000 then another, then another, then another, any time we want 452 00:25:03,000 --> 00:25:07,000 during a program, and we remember that they're all somehow related 453 00:25:07,000 --> 00:25:11,000 by literally chaining them together, and we did that pictorially here with an arrow. 454 00:25:11,000 --> 00:25:15,000 But in code, what would be the mechanism via which you could somehow connect, 455 00:25:15,000 --> 00:25:20,000 almost like Scratch, one chunk to another chunk? 456 00:25:20,000 --> 00:25:22,000 We could use a pointer, right? 457 00:25:22,000 --> 00:25:25,000 Because really the arrow that's going from the top left square, 458 00:25:25,000 --> 00:25:31,000 this guy here to this one, could contain inside of this square 459 00:25:31,000 --> 00:25:34,000 not just some ints, not just some char, but what if I actually allocated 460 00:25:34,000 --> 00:25:37,000 a little extra space so that now, 461 00:25:37,000 --> 00:25:41,000 each of my chunks of memory, even though this is going to cost me, 462 00:25:41,000 --> 00:25:45,000 now looks a little more rectangular where one of the chunks of memory 463 00:25:45,000 --> 00:25:47,000 is used for a number, like the number 1, 464 00:25:47,000 --> 00:25:50,000 and then if this guy stores the number 2, 465 00:25:50,000 --> 00:25:52,000 this other chunk of memory is used for an arrow, 466 00:25:52,000 --> 00:25:54,000 or more concretely, a pointer. 467 00:25:54,000 --> 00:25:59,000 And suppose I store the number 3 over here while I use this to point at that guy, 468 00:25:59,000 --> 00:26:02,000 and now this guy, let's suppose I only want three such chunks of memory. 469 00:26:02,000 --> 00:26:05,000 I'll draw a line through that, indicating null. 470 00:26:05,000 --> 00:26:07,000 There is no additional character. 471 00:26:07,000 --> 00:26:10,000 >> Indeed, this is how we can go about implementing 472 00:26:10,000 --> 00:26:12,000 something that's called a linked list. 473 00:26:12,000 --> 00:26:18,000 A linked list is a new data structure, and it's a stepping stone toward 474 00:26:18,000 --> 00:26:21,000 much fancier data structures that begin to solve problems 475 00:26:21,000 --> 00:26:23,000 along the lines of Facebook-type problems and Google-type problems 476 00:26:23,000 --> 00:26:26,000 where you have huge data sets, and it no longer cuts it 477 00:26:26,000 --> 00:26:29,000 to store everything contiguously and use something like linear search 478 00:26:29,000 --> 00:26:31,000 or even something like binary search. 479 00:26:31,000 --> 00:26:33,000 You want even better running times. 480 00:26:33,000 --> 00:26:37,000 In fact, one of the Holy Grails we'll talk about later this week or next 481 00:26:37,000 --> 00:26:41,000 is an algorithm whose running time is constant. 482 00:26:41,000 --> 00:26:44,000 In other words, it always takes the same amount of time no matter 483 00:26:44,000 --> 00:26:47,000 how big the input is, and that would indeed be compelling, 484 00:26:47,000 --> 00:26:49,000 even more so than something logarithmic. 485 00:26:49,000 --> 00:26:51,000 What is this on the screen here? 486 00:26:51,000 --> 00:26:55,000 Each of the rectangles is exactly what I just drew by hand. 487 00:26:55,000 --> 00:26:59,000 But the thing all the way on the left is a special variable. 488 00:26:59,000 --> 00:27:02,000 It's going to be a single pointer because the one gotcha 489 00:27:02,000 --> 00:27:04,000 with a linked list, as these things are called, 490 00:27:04,000 --> 00:27:09,000 is that you have to hang onto one end of the linked list. 491 00:27:09,000 --> 00:27:13,000 >> Just like with a string, you have to know the address of the first char. 492 00:27:13,000 --> 00:27:15,000 Same deal for linked lists. 493 00:27:15,000 --> 00:27:19,000 You have to know the address of the first chunk of memory 494 00:27:19,000 --> 00:27:25,000 because from there, you can reach every other one. 495 00:27:25,000 --> 00:27:27,000 Downside. 496 00:27:27,000 --> 00:27:30,000 What price are we paying for this versatility of having a dynamically 497 00:27:30,000 --> 00:27:34,000 sizable data structure that if we ever need more memory, fine, 498 00:27:34,000 --> 00:27:37,000 just allocate one more chunk and draw a pointer from 499 00:27:37,000 --> 00:27:39,000 the old to the new tail of the list? 500 00:27:39,000 --> 00:27:41,000 Yeah. 501 00:27:41,000 --> 00:27:43,000 [Student] It takes about twice as much space. 502 00:27:43,000 --> 00:27:45,000 It takes twice as much space, so that's definitely a downside, and we've seen this 503 00:27:45,000 --> 00:27:48,000 tradeoff before between time and space and flexibility 504 00:27:48,000 --> 00:27:51,000 where by now, we need not 32 bits for each of these numbers. 505 00:27:51,000 --> 00:27:57,000 We really need 64, 32 for the number and 32 for the pointer. 506 00:27:57,000 --> 00:27:59,000 But hey, I have 2 gigabytes of RAM. 507 00:27:59,000 --> 00:28:02,000 Adding another 32 bits here and here doesn't seem that big of a deal. 508 00:28:02,000 --> 00:28:05,000 But for large data sets, it definitely adds up to literally twice as much. 509 00:28:05,000 --> 00:28:09,000 What's another downside now, or what feature do we give up, 510 00:28:09,000 --> 00:28:12,000 if we represent lists of things with a linked list and not an array? 511 00:28:12,000 --> 00:28:14,000 [Student] You can't traverse it backwards. 512 00:28:14,000 --> 00:28:16,000 You can't traverse it backwards, so you're kind of screwed if you're walking 513 00:28:16,000 --> 00:28:19,000 from left to right using a for loop or a while loop 514 00:28:19,000 --> 00:28:21,000 and then you realize, "Oh, I want to go back to the beginning of the list." 515 00:28:21,000 --> 00:28:26,000 You can't because these pointers only go from left to right as the arrows indicate. 516 00:28:26,000 --> 00:28:29,000 >> Now, you could remember the start of the list with another variable, 517 00:28:29,000 --> 00:28:31,000 but that's a complexity to keep in mind. 518 00:28:31,000 --> 00:28:35,000 An array, no matter how far you go, you can always do minus, minus, minus, minus 519 00:28:35,000 --> 00:28:37,000 and go back from whence you came. 520 00:28:37,000 --> 00:28:40,000 What's another downside here? Yeah. 521 00:28:40,000 --> 00:28:43,000 [inaudible student question] 522 00:28:43,000 --> 00:28:47,000 You could, so you've actually just proposed a data structure called a doubly linked list, 523 00:28:47,000 --> 00:28:50,000 and indeed, you would add another pointer to each of these rectangles 524 00:28:50,000 --> 00:28:53,000 that goes the other direction, the upside of which 525 00:28:53,000 --> 00:28:55,000 is now you can traverse back and forth, 526 00:28:55,000 --> 00:28:59,000 the downside of which is now you're using three times as much memory as we used to 527 00:28:59,000 --> 00:29:04,000 and also adding complexity in terms of the code you have to write to get it right. 528 00:29:04,000 --> 00:29:08,000 But these are all perhaps very reasonable tradeoffs, if the reversal is more important. 529 00:29:08,000 --> 00:29:10,000 Yeah. 530 00:29:10,000 --> 00:29:12,000 [Student] You also can't have a 2D linked list. 531 00:29:12,000 --> 00:29:16,000 Good, you can't really have a 2D linked list. 532 00:29:16,000 --> 00:29:18,000 You could. It's not nearly as easy as an array. 533 00:29:18,000 --> 00:29:21,000 Like an array, you do open bracket, closed bracket, open bracket, closed bracket, 534 00:29:21,000 --> 00:29:23,000 and you get some 2-dimensional structure. 535 00:29:23,000 --> 00:29:26,000 You could implement a 2-dimensional linked list 536 00:29:26,000 --> 00:29:29,000 if you do add—as you proposed—a third pointer to each of these things, 537 00:29:29,000 --> 00:29:34,000 and if you think about another list coming at you 3D style 538 00:29:34,000 --> 00:29:40,000 from the screen to all of us, which is just another chain of some sort. 539 00:29:40,000 --> 00:29:45,000 We could do it, but it's not as simple as typing open bracket, square bracket. Yeah. 540 00:29:45,000 --> 00:29:48,000 [inaudible student question] 541 00:29:48,000 --> 00:29:50,000 Good, so this is a real kicker. 542 00:29:50,000 --> 00:29:54,000 >> These algorithms that we've pined over, like oh, binary search, 543 00:29:54,000 --> 00:29:57,000 you can search an array of numbers on the board 544 00:29:57,000 --> 00:30:01,000 or a phone book so much more quickly if you use divide and conquer 545 00:30:01,000 --> 00:30:05,000 and a binary search algorithm, but binary search required two assumptions. 546 00:30:05,000 --> 00:30:09,000 One, that the data was sorted. 547 00:30:09,000 --> 00:30:11,000 Now, we can presumably keep this sorted, 548 00:30:11,000 --> 00:30:14,000 so maybe that's not a concern, but binary search also assumed 549 00:30:14,000 --> 00:30:18,000 that you had random access to the list of numbers, 550 00:30:18,000 --> 00:30:21,000 and an array allows you to have random access, and by random access, 551 00:30:21,000 --> 00:30:24,000 I mean if you're given an array, how much time does it take you 552 00:30:24,000 --> 00:30:26,000 to get to bracket 0? 553 00:30:26,000 --> 00:30:29,000 One operation, you just use [0] and you're right there. 554 00:30:29,000 --> 00:30:33,000 How many steps does it take to get to location 10? 555 00:30:33,000 --> 00:30:36,000 One step, you just go to [10] and you're there. 556 00:30:36,000 --> 00:30:40,000 By contrast, how do you get to the 10th integer in a linked list? 557 00:30:40,000 --> 00:30:42,000 You have to start at the beginning because you're only remembering 558 00:30:42,000 --> 00:30:45,000 the beginning of a linked list, just like a string is being remembered 559 00:30:45,000 --> 00:30:48,000 by the address of its first char, and to find that 10th int 560 00:30:48,000 --> 00:30:53,000 or that 10th character in a string, you have to search the whole damn thing. 561 00:30:53,000 --> 00:30:55,000 >> Again, we're not solving all of our problems. 562 00:30:55,000 --> 00:31:00,000 We're introducing new ones, but it really depends on what you're trying to design for. 563 00:31:00,000 --> 00:31:04,000 In terms of implementing this, we can borrow an idea from that student structure. 564 00:31:04,000 --> 00:31:07,000 The syntax is very similar, except now, the idea is a little more abstract 565 00:31:07,000 --> 00:31:09,000 than house and name and ID. 566 00:31:09,000 --> 00:31:13,000 But I propose that we could have a data structure in C 567 00:31:13,000 --> 00:31:17,000 that is called node, as the last word on the slide suggests, 568 00:31:17,000 --> 00:31:21,000 inside of a node, and a node is just a generic container in computer science. 569 00:31:21,000 --> 00:31:25,000 It's usually drawn as a circle or a square or rectangle as we've done. 570 00:31:25,000 --> 00:31:27,000 And in this data structure, we have an int, n, 571 00:31:27,000 --> 00:31:29,000 so that's the number I want to store. 572 00:31:29,000 --> 00:31:36,000 But what is this second line, struct node *next? 573 00:31:36,000 --> 00:31:40,000 Why is this correct, or what role does this thing play, 574 00:31:40,000 --> 00:31:42,000 even though it's a little cryptic at first glance? 575 00:31:42,000 --> 00:31:44,000 Yeah. 576 00:31:44,000 --> 00:31:46,000 [inaudible student response] 577 00:31:46,000 --> 00:31:50,000 Exactly, so the * sort of spoils that it's a pointer of some sort. 578 00:31:50,000 --> 00:31:53,000 The name of this pointer is arbitrarily next, 579 00:31:53,000 --> 00:32:00,000 but we could have called it anything we want, but what does this pointer point to? 580 00:32:00,000 --> 00:32:03,000 [Student] Another node.>>Exactly, it points to another such node. 581 00:32:03,000 --> 00:32:05,000 >> Now, this is sort of a curiosity of C. 582 00:32:05,000 --> 00:32:09,000 Recall that C is read by a compiler top to bottom, left to right, 583 00:32:09,000 --> 00:32:13,000 which means if—this is a little different from what we did with the student. 584 00:32:13,000 --> 00:32:16,000 When we defined a student, we actually did not put a word there. 585 00:32:16,000 --> 00:32:18,000 It just said typedef. 586 00:32:18,000 --> 00:32:20,000 Then we had int id, string name, string house, 587 00:32:20,000 --> 00:32:23,000 and then student at the bottom of the struct. 588 00:32:23,000 --> 00:32:26,000 This declaration is a little different because, 589 00:32:26,000 --> 00:32:28,000 again, the C compiler is a little dumb. 590 00:32:28,000 --> 00:32:30,000 It's only going to read top to bottom, 591 00:32:30,000 --> 00:32:33,000 so if it reaches the 2nd line here 592 00:32:33,000 --> 00:32:37,000 where next is declared and it sees, oh, here's a variable called next. 593 00:32:37,000 --> 00:32:39,000 It's a pointer to a struct node. 594 00:32:39,000 --> 00:32:42,000 The compiler is going to realize what is a struct node? 595 00:32:42,000 --> 00:32:44,000 I've never heard of this thing before, 596 00:32:44,000 --> 00:32:47,000 because the word node might not otherwise appear 597 00:32:47,000 --> 00:32:49,000 until the bottom, so there is this redundancy. 598 00:32:49,000 --> 00:32:53,000 You have to say struct node here, which you can then shorten later on 599 00:32:53,000 --> 00:32:56,000 thanks to typedef down here, but this is because 600 00:32:56,000 --> 00:33:02,000 we are referencing the structure itself inside of the structure. 601 00:33:02,000 --> 00:33:05,000 That's the one gotcha there. 602 00:33:05,000 --> 00:33:07,000 >> Some interesting problems are going to arise. 603 00:33:07,000 --> 00:33:09,000 We've got a list of numbers. How do we insert into it? 604 00:33:09,000 --> 00:33:11,000 How do we search it? How do we delete from it? 605 00:33:11,000 --> 00:33:13,000 Especially now that we have to manage all of these pointers. 606 00:33:13,000 --> 00:33:15,000 You thought pointers were sort of mind-bending 607 00:33:15,000 --> 00:33:17,000 when you had one of them just trying to read an int to it. 608 00:33:17,000 --> 00:33:20,000 Now we have to manipulate an entire list's worth. 609 00:33:20,000 --> 00:33:22,000 Why don't we take our 5-minute break here, and then we'll bring 610 00:33:22,000 --> 00:33:34,000 some folks up on stage to do exactly that. 611 00:33:34,000 --> 00:33:36,000 >> C is much more fun when it's acted out. 612 00:33:36,000 --> 00:33:39,000 Who would literally like to be first? 613 00:33:39,000 --> 00:33:41,000 Okay, come on up. You are first. 614 00:33:41,000 --> 00:33:44,000 Who would like to be 9? Okay, 9. 615 00:33:44,000 --> 00:33:46,000 How about 9? 17? 616 00:33:46,000 --> 00:33:51,000 A little clique here. 22 and 26 in that front row. 617 00:33:51,000 --> 00:33:53,000 And then how about someone over there being pointed at. 618 00:33:53,000 --> 00:33:57,000 You are 34. Okay, 34, come on up. 619 00:33:57,000 --> 00:33:59,000 First is over there. Okay, all four of you guys. 620 00:33:59,000 --> 00:34:01,000 And who did we say for 9? 621 00:34:01,000 --> 00:34:04,000 Who is our 9? 622 00:34:04,000 --> 00:34:07,000 Who really wants to be 9? All right, come on, be 9. 623 00:34:07,000 --> 00:34:10,000 Here we go. 624 00:34:10,000 --> 00:34:13,000 34, we'll meet you over there. 625 00:34:13,000 --> 00:34:17,000 The first part is make yourselves look like that. 626 00:34:17,000 --> 00:34:21,000 26, 22, 17, good. 627 00:34:21,000 --> 00:34:25,000 If you can stand off to the side, because we're going to malloc you in a moment. 628 00:34:25,000 --> 00:34:29,000 >> Good, good. 629 00:34:29,000 --> 00:34:32,000 Okay, excellent, so let's ask a couple of questions here. 630 00:34:32,000 --> 00:34:34,000 And actually, what's your name?>>Anita. 631 00:34:34,000 --> 00:34:37,000 Anita, okay, come on over here. 632 00:34:37,000 --> 00:34:41,000 Anita is going to help us sort of solve one fairly simple question in first, 633 00:34:41,000 --> 00:34:44,000 which is how do you find whether or not a value is in the list? 634 00:34:44,000 --> 00:34:48,000 Now, notice that first, represented here by Lucas, 635 00:34:48,000 --> 00:34:52,000 is a little different, and so his piece of paper is deliberately sideways 636 00:34:52,000 --> 00:34:55,000 because it's not quite as tall and doesn't take up as many bits, 637 00:34:55,000 --> 00:34:58,000 even though technically he has the same size of paper just rotated. 638 00:34:58,000 --> 00:35:01,000 But he's a little different in that he's only 32 bits for a pointer, 639 00:35:01,000 --> 00:35:05,000 and all of these guys are 64 bits, half of which is the number, half of which is a pointer. 640 00:35:05,000 --> 00:35:08,000 But the pointer is not depicted, so if you guys could somewhat awkwardly 641 00:35:08,000 --> 00:35:12,000 use your left hand to point at the person next to you. 642 00:35:12,000 --> 00:35:14,000 And you're number 34. What's your name? 643 00:35:14,000 --> 00:35:16,000 Ari. 644 00:35:16,000 --> 00:35:19,000 Ari, so actually, hold the paper in your right hand, and left hand goes straight down. 645 00:35:19,000 --> 00:35:21,000 You represent null on the left. 646 00:35:21,000 --> 00:35:24,000 >> Now our human picture is very consistent. 647 00:35:24,000 --> 00:35:26,000 This is actually how pointers work. 648 00:35:26,000 --> 00:35:29,000 And if you can scrunch a little bit this way so I'm not in your way. 649 00:35:29,000 --> 00:35:34,000 Anita here, find me the number 22, 650 00:35:34,000 --> 00:35:40,000 but assume a constraint of not humans holding up pieces of paper, 651 00:35:40,000 --> 00:35:43,000 but this is a list, and you only have Lucas to begin with 652 00:35:43,000 --> 00:35:46,000 because he is literally the first pointer. 653 00:35:46,000 --> 00:35:51,000 Suppose you yourself are a pointer, and so you too have the ability to point at something. 654 00:35:51,000 --> 00:35:56,000 Why don't you start by pointing at exactly what Lucas is pointing at? 655 00:35:56,000 --> 00:35:58,000 Good, and let me enact this out over here. 656 00:35:58,000 --> 00:36:04,000 Just for the sake of discussion, let me pull up a blank page here. 657 00:36:04,000 --> 00:36:06,000 How do you spell your name?>>Anita. 658 00:36:06,000 --> 00:36:08,000 Okay, Anita. 659 00:36:08,000 --> 00:36:18,000 Let's say node* anita = lucas. 660 00:36:18,000 --> 00:36:22,000 Well, we shouldn't call you lucas. We should call you first. 661 00:36:22,000 --> 00:36:25,000 Why is this in fact consistent with reality here? 662 00:36:25,000 --> 00:36:27,000 One, first already exists. 663 00:36:27,000 --> 00:36:30,000 First has been allocated presumably somewhere up here. 664 00:36:30,000 --> 00:36:35,000 Node* first, and it's been allocated a list somehow. 665 00:36:35,000 --> 00:36:37,000 I don't know how that happened. That happened before class started. 666 00:36:37,000 --> 00:36:40,000 This linked list of humans has been created. 667 00:36:40,000 --> 00:36:44,000 And now at this point in the story—this is all going on Facebook apparently later— 668 00:36:44,000 --> 00:36:49,000 at this point in the story, Anita has been initialized to be equal to first, 669 00:36:49,000 --> 00:36:51,000 which doesn't mean that Anita points at Lucas. 670 00:36:51,000 --> 00:36:53,000 Rather, she points at what he points at 671 00:36:53,000 --> 00:36:57,000 because the same address that's inside of Lucas's 32 bits--1, 2, 3-- 672 00:36:57,000 --> 00:37:01,000 is now also inside of Anita's 32 bits--1, 2, 3. 673 00:37:01,000 --> 00:37:05,000 >> Now find 22. How would you go about doing this? 674 00:37:05,000 --> 00:37:07,000 What's that?>>Point to whatever. 675 00:37:07,000 --> 00:37:11,000 Point to whatever, so go ahead and act it out as best you can here. 676 00:37:11,000 --> 00:37:15,000 Good, good, and now you're pointing at—what's your name with 22? 677 00:37:15,000 --> 00:37:18,000 Ramon.>>Ramon, so Ramon is holding up 22. 678 00:37:18,000 --> 00:37:20,000 You have now done a check. 679 00:37:20,000 --> 00:37:24,000 Does Ramon == 22, and if so, for instance, we can return true. 680 00:37:24,000 --> 00:37:26,000 Let me—while these guys stand here somewhat awkwardly— 681 00:37:26,000 --> 00:37:32,000 let me do something quickly like bool find. 682 00:37:32,000 --> 00:37:37,000 I'm going to go ahead and say (node* list, int n). 683 00:37:37,000 --> 00:37:39,000 I'll be right back with you guys. I just have to write some code. 684 00:37:39,000 --> 00:37:45,000 And now I'm going to go ahead and do this, node* anita = list. 685 00:37:45,000 --> 00:37:51,000 And I'm going to go ahead and say while (anita != NULL). 686 00:37:51,000 --> 00:37:57,000 >> The metaphor here is getting a little stretched, but while (anita != NULL), what do I want to do? 687 00:37:57,000 --> 00:38:03,000 I need some way of referencing 688 00:38:03,000 --> 00:38:05,000 the integer that Anita is pointing at. 689 00:38:05,000 --> 00:38:08,000 In the past, when we had structures, which a node is, 690 00:38:08,000 --> 00:38:11,000 we used the dot notation, and we would say something like 691 00:38:11,000 --> 00:38:15,000 anita.n, but the problem here is that Anita is not a struct per se. 692 00:38:15,000 --> 00:38:17,000 What is she? 693 00:38:17,000 --> 00:38:21,000 She's a pointer, so really, if we want to use this dot notation— 694 00:38:21,000 --> 00:38:23,000 and this is going to look deliberately a little cryptic— 695 00:38:23,000 --> 00:38:28,000 we have to do something like go to whatever Anita's left hand is pointing at 696 00:38:28,000 --> 00:38:31,000 and then get the field called n. 697 00:38:31,000 --> 00:38:35,000 Anita is a pointer, but what is *anita? 698 00:38:35,000 --> 00:38:38,000 What do you find when you go to what Anita is pointing at? 699 00:38:38,000 --> 00:38:42,000 A struct, a node, and a node, recall, has a field called n 700 00:38:42,000 --> 00:38:47,000 because it has, recall, these 2 fields, next and n, 701 00:38:47,000 --> 00:38:50,000 that we saw a moment ago right here. 702 00:38:50,000 --> 00:38:53,000 >> To actually imitate this in code, 703 00:38:53,000 --> 00:39:02,000 we could do this and say if ((*anita).n == n), the n that I'm looking for. 704 00:39:02,000 --> 00:39:04,000 Notice that the function was passed in the number I care about. 705 00:39:04,000 --> 00:39:10,000 Then I can go ahead and do something like return true. 706 00:39:10,000 --> 00:39:12,000 Else, if that's not the case, what do I want to do? 707 00:39:12,000 --> 00:39:19,000 How do I translate to code what Anita did so intuitively by walking through the list? 708 00:39:19,000 --> 00:39:26,000 What should I do up here to simulate Anita taking that step to the left, that step to the left? 709 00:39:26,000 --> 00:39:28,000 [inaudible student response]>>What's that? 710 00:39:28,000 --> 00:39:30,000 [inaudible student response] 711 00:39:30,000 --> 00:39:34,000 Good, not a bad idea, but in the past, when we've done this, we've done anita++ 712 00:39:34,000 --> 00:39:37,000 because that would add the number 1 to Anita, 713 00:39:37,000 --> 00:39:40,000 which would typically point at the next person, like Ramon, 714 00:39:40,000 --> 00:39:44,000 or the person next to him, or the next to him person down the line. 715 00:39:44,000 --> 00:39:49,000 But that's not quite good here because what does this thing look like in memory? 716 00:39:49,000 --> 00:39:54,000 Not that. We have to disable that. 717 00:39:54,000 --> 00:40:00,000 It looks like this in memory, and even though I've drawn 1 and 2 and 3 close to one another, 718 00:40:00,000 --> 00:40:03,000 if we really simulate this—can you guys, while still pointing at the same people, 719 00:40:03,000 --> 00:40:07,000 can some of you take a random step back, some of you a random step forward? 720 00:40:07,000 --> 00:40:10,000 >> This mess is still a linked list, 721 00:40:10,000 --> 00:40:13,000 but these guys could be anywhere in memory, 722 00:40:13,000 --> 00:40:15,000 so anita++ is not going to work why? 723 00:40:15,000 --> 00:40:19,000 What's at location anita++? 724 00:40:19,000 --> 00:40:21,000 Who knows. 725 00:40:21,000 --> 00:40:24,000 It's some other value that just so happens to be interposed 726 00:40:24,000 --> 00:40:28,000 among all of these nodes by chance because we're not using an array. 727 00:40:28,000 --> 00:40:30,000 We allocated each of these nodes individually. 728 00:40:30,000 --> 00:40:32,000 Okay, if you guys can clean yourselves back up. 729 00:40:32,000 --> 00:40:37,000 Let me propose that instead of anita++, we instead do anita gets— 730 00:40:37,000 --> 00:40:42,000 well, why don't we go to whatever Anita is pointing at and then do .next? 731 00:40:42,000 --> 00:40:45,000 In other words, we go to Ramon, who's holding the number 22, 732 00:40:45,000 --> 00:40:51,000 and then .next is as though Anita would be copying his left hand pointer. 733 00:40:51,000 --> 00:40:54,000 But she wouldn't go farther than Ramon because we found 22. 734 00:40:54,000 --> 00:40:56,000 But that would be the idea. Now, this is a god-awful mess. 735 00:40:56,000 --> 00:40:59,000 Honestly, no one will ever remember this syntax, and so thankfully, 736 00:40:59,000 --> 00:41:04,000 it's actually a little deliberate—oh, you didn't actually see what I wrote. 737 00:41:04,000 --> 00:41:08,000 This would be more compelling if you could. Voila! 738 00:41:08,000 --> 00:41:10,000 >> Behind the scenes, I was solving the problem this way. 739 00:41:10,000 --> 00:41:14,000 Anita, to take that step to the left, 740 00:41:14,000 --> 00:41:18,000 first, we do go to the address that Anita is pointing at 741 00:41:18,000 --> 00:41:23,000 and where she will find not only n, which we just checked for comparison's sake, 742 00:41:23,000 --> 00:41:25,000 but you will also find next--in this case, 743 00:41:25,000 --> 00:41:28,000 Ramon's left hand pointing to the next node in the list. 744 00:41:28,000 --> 00:41:32,000 But this is the god-awful mess to which I referred earlier, 745 00:41:32,000 --> 00:41:34,000 but it turns out C lets us simplify this. 746 00:41:34,000 --> 00:41:40,000 Instead of writing (*anita), we can instead just write anita->n, 747 00:41:40,000 --> 00:41:45,000 and it's the exact same thing functionally, but it's a lot more intuitive, 748 00:41:45,000 --> 00:41:48,000 and it's a lot more consistent with the picture that we've been drawing 749 00:41:48,000 --> 00:41:50,000 all this time using arrows. 750 00:41:50,000 --> 00:41:57,000 >> Lastly, what do we need to do at the end of this program? 751 00:41:57,000 --> 00:42:00,000 There's one line of code remaining. 752 00:42:00,000 --> 00:42:02,000 Return what? 753 00:42:02,000 --> 00:42:05,000 False, because if we get through the whole while loop 754 00:42:05,000 --> 00:42:10,000 and Anita is, in fact, null, that means she went all the way to the end of the list 755 00:42:10,000 --> 00:42:12,000 where she was pointing at—what's your name again? 756 00:42:12,000 --> 00:42:15,000 Ari.>>Ari's left hand, which is null. 757 00:42:15,000 --> 00:42:18,000 Anita is now null, and I realize you're just standing here awkwardly in limbo 758 00:42:18,000 --> 00:42:21,000 because I'm going off on a monologue here, 759 00:42:21,000 --> 00:42:23,000 but we'll involve you again in just a moment. 760 00:42:23,000 --> 00:42:27,000 Anita is null at that point in the story, so the while loop terminates, 761 00:42:27,000 --> 00:42:30,000 and we have to return false because if she got all the way to Ari's null pointer 762 00:42:30,000 --> 00:42:34,000 then there was no number that she sought in the list. 763 00:42:34,000 --> 00:42:39,000 We can clean this up too, but this is a pretty good implementation then 764 00:42:39,000 --> 00:42:43,000 of a traversal function, a find function for a linked list. 765 00:42:43,000 --> 00:42:48,000 It's still linear search, but it's not as simple as ++ a pointer 766 00:42:48,000 --> 00:42:52,000 or ++ an i variable because now we can't guess 767 00:42:52,000 --> 00:42:54,000 where each of these nodes are in memory. 768 00:42:54,000 --> 00:42:57,000 We have to literally follow the trail of breadcrumbs or, more specifically, 769 00:42:57,000 --> 00:43:00,000 pointers, to get from one node to another. 770 00:43:00,000 --> 00:43:02,000 >> Now let's try another one. Anita, do you want to come back here? 771 00:43:02,000 --> 00:43:06,000 Why don't we go ahead and allocate one other person from the audience? 772 00:43:06,000 --> 00:43:08,000 Malloc—what's your name?>>Rebecca. 773 00:43:08,000 --> 00:43:10,000 Rebecca. Rebecca has been malloced from the audience, 774 00:43:10,000 --> 00:43:13,000 and she is now storing the number 55. 775 00:43:13,000 --> 00:43:17,000 And the goal at hand now is for Anita to insert 776 00:43:17,000 --> 00:43:22,000 Rebecca into the linked list here in its appropriate place. 777 00:43:22,000 --> 00:43:24,000 Come on over here for a moment. 778 00:43:24,000 --> 00:43:28,000 I have done something like this. 779 00:43:28,000 --> 00:43:32,000 I have done node*. And what's your name again? 780 00:43:32,000 --> 00:43:34,000 Rebecca.>>Rebecca, okay. 781 00:43:34,000 --> 00:43:41,000 Rebecca gets malloc(sizeof(node)). 782 00:43:41,000 --> 00:43:44,000 Just like we have allocated things like students and whatnot in the past, 783 00:43:44,000 --> 00:43:46,000 we need the size of the node, so now Rebecca 784 00:43:46,000 --> 00:43:49,000 is pointing at what? 785 00:43:49,000 --> 00:43:52,000 Rebecca has two fields inside of her, one of which is 55. 786 00:43:52,000 --> 00:43:55,000 Let's do what, rebecca-> = 55. 787 00:43:55,000 --> 00:44:00,000 But then rebecca->next should be—like right now, her hand is kind of who knows? 788 00:44:00,000 --> 00:44:03,000 It's pointing at some garbage value, so why don't for good measure 789 00:44:03,000 --> 00:44:07,000 we at least do this so that left hand is now at her side. 790 00:44:07,000 --> 00:44:09,000 Now Anita, take it from here. 791 00:44:09,000 --> 00:44:11,000 You have Rebecca having been allocated. 792 00:44:11,000 --> 00:44:20,000 Go ahead and find where we should put Rebecca. 793 00:44:20,000 --> 00:44:25,000 Good, very good. 794 00:44:25,000 --> 00:44:28,000 Okay, good, and now we need you to provide a bit of direction, 795 00:44:28,000 --> 00:44:30,000 so you've reached Ari. 796 00:44:30,000 --> 00:44:33,000 His left hand is null, but Rebecca clearly belongs to the right, 797 00:44:33,000 --> 00:44:36,000 so how do we have to alter this linked list 798 00:44:36,000 --> 00:44:38,000 in order to insert Rebecca into the appropriate place? 799 00:44:38,000 --> 00:44:42,000 If you could literally move people's left hands around as needed, 800 00:44:42,000 --> 00:44:48,000 we'll fix the problem that way. 801 00:44:48,000 --> 00:44:52,000 Okay, good, and meanwhile, Rebecca's left hand is now by her side. 802 00:44:52,000 --> 00:44:54,000 >> That was too easy. 803 00:44:54,000 --> 00:44:57,000 Let's try allocating—we're almost done, 20. 804 00:44:57,000 --> 00:44:59,000 Okay, come on up. 805 00:44:59,000 --> 00:45:04,000 20 has been allocated, so let me go ahead and say again here 806 00:45:04,000 --> 00:45:07,000 we've just done node* saad. 807 00:45:07,000 --> 00:45:11,000 We have malloc(sizeof(node)). 808 00:45:11,000 --> 00:45:16,000 We then do the same exact syntax as we did before for 20, 809 00:45:16,000 --> 00:45:20,000 and I'll do next = NULL, and now it's up to Anita 810 00:45:20,000 --> 00:45:23,000 to insert you into the linked list, if you could play that exact same role. 811 00:45:23,000 --> 00:45:30,000 Execute. 812 00:45:30,000 --> 00:45:32,000 Okay, good. 813 00:45:32,000 --> 00:45:38,000 Now think carefully before you start moving left hands around. 814 00:45:38,000 --> 00:45:46,000 You by far got the most awkward role today. 815 00:45:46,000 --> 00:45:59,000 Whose hand should be moved first? 816 00:45:59,000 --> 00:46:02,000 Okay, wait, I'm hearing some no's. 817 00:46:02,000 --> 00:46:07,000 If some folks would politely like to help solve an awkward situation here. 818 00:46:07,000 --> 00:46:11,000 Whose left hand should be updated first perhaps? Yeah. 819 00:46:11,000 --> 00:46:13,000 [Student] Saad's. 820 00:46:13,000 --> 00:46:15,000 Okay, Saad's, why, though? 821 00:46:15,000 --> 00:46:17,000 [inaudible student response] 822 00:46:17,000 --> 00:46:19,000 Good, because if we move—what's your name?>>Marshall. 823 00:46:19,000 --> 00:46:22,000 Marshall, if we move his hand first down to null, 824 00:46:22,000 --> 00:46:25,000 now we have literally orphaned four people in this list 825 00:46:25,000 --> 00:46:29,000 because he was the only thing pointing at Ramon and everyone to the left, 826 00:46:29,000 --> 00:46:31,000 so updating that pointer first was bad. 827 00:46:31,000 --> 00:46:33,000 Let's undo that. 828 00:46:33,000 --> 00:46:37,000 Good, and now go ahead and move the appropriate left hand pointing at Ramon. 829 00:46:37,000 --> 00:46:39,000 This feels a little redundant. 830 00:46:39,000 --> 00:46:41,000 Now there's two people pointing at Ramon, but that's fine 831 00:46:41,000 --> 00:46:43,000 because now how else do we update the list? 832 00:46:43,000 --> 00:46:48,000 What other hand has to move? 833 00:46:48,000 --> 00:46:53,000 Excellent, now have we lost any memory? 834 00:46:53,000 --> 00:46:57,000 No, so good, let's see if we can't break this once more. 835 00:46:57,000 --> 00:47:00,000 >> Mallocing one last time, number 5. 836 00:47:00,000 --> 00:47:04,000 All the way in back, come on down. 837 00:47:04,000 --> 00:47:08,000 It's very exciting. 838 00:47:08,000 --> 00:47:15,000 [applause] 839 00:47:15,000 --> 00:47:17,000 What's your name?>>Ron. 840 00:47:17,000 --> 00:47:19,000 Ron, okay, you are malloced as number 5. 841 00:47:19,000 --> 00:47:23,000 We've just executed code that's almost identical to these 842 00:47:23,000 --> 00:47:26,000 with just a different name. 843 00:47:26,000 --> 00:47:28,000 Excellent. 844 00:47:28,000 --> 00:47:38,000 Now, Anita, good luck inserting number 5 into the list now. 845 00:47:38,000 --> 00:47:43,000 Good, and? 846 00:47:43,000 --> 00:47:47,000 Excellent, so this is really the third of three total cases. 847 00:47:47,000 --> 00:47:49,000 We first had someone at the end, Rebecca. 848 00:47:49,000 --> 00:47:51,000 We then had someone in the middle. 849 00:47:51,000 --> 00:47:53,000 Now we have someone at the beginning, and in this example, 850 00:47:53,000 --> 00:47:56,000 we now had to update Lucas for the first time 851 00:47:56,000 --> 00:48:00,000 because the first element in the list now has to point at a new node, 852 00:48:00,000 --> 00:48:03,000 who, in turn, is pointing at node number 9. 853 00:48:03,000 --> 00:48:06,000 >> This was a hugely awkward demonstration, I'm sure, 854 00:48:06,000 --> 00:48:08,000 so a big round of applause for these guys if you could. 855 00:48:08,000 --> 00:48:11,000 Nicely done. 856 00:48:11,000 --> 00:48:17,000 That's all. You may keep your pieces of paper as a little memory. 857 00:48:17,000 --> 00:48:22,000 It turns out that doing this in code 858 00:48:22,000 --> 00:48:26,000 is not quite as simple as just moving hands around 859 00:48:26,000 --> 00:48:28,000 and pointing pointers at different things. 860 00:48:28,000 --> 00:48:31,000 But realize that when it comes time to implement something like 861 00:48:31,000 --> 00:48:34,000 a linked list or a variant of it if you focus on really 862 00:48:34,000 --> 00:48:38,000 these basic fundamentals, the bite-size problems I have to figure out, 863 00:48:38,000 --> 00:48:43,000 is it this hand or this hand, realize that what is otherwise a fairly complex program 864 00:48:43,000 --> 00:48:47,000 can, in fact, be reduced to fairly simple building blocks like this. 865 00:48:47,000 --> 00:48:51,000 >> Let's take things in a more sophisticated direction still. 866 00:48:51,000 --> 00:48:53,000 We now have the notion of the linked list. 867 00:48:53,000 --> 00:48:57,000 We also have—thanks to the suggestion back there—a doubly linked list, 868 00:48:57,000 --> 00:49:01,000 which looks almost the same, but now we have two pointers inside of the struct 869 00:49:01,000 --> 00:49:05,000 instead of one, and we could probably call those pointers previous and next 870 00:49:05,000 --> 00:49:08,000 or left or right, but we do, in fact, need two of them. 871 00:49:08,000 --> 00:49:10,000 The code would be a little more involved. 872 00:49:10,000 --> 00:49:12,000 Anita would have had to do more work here on the stage. 873 00:49:12,000 --> 00:49:15,000 But we could certainly implement that kind of structure. 874 00:49:15,000 --> 00:49:19,000 In terms of running time, though, what would be the running time 875 00:49:19,000 --> 00:49:24,000 for Anita of finding a number n in a linked list now? 876 00:49:24,000 --> 00:49:27,000 Still big O of n, so it's no better than linear search. 877 00:49:27,000 --> 00:49:29,000 We can't do binary search, though, again. 878 00:49:29,000 --> 00:49:34,000 Why was that the case? You can't jump around. 879 00:49:34,000 --> 00:49:36,000 Even though we obviously see all the humans on the stage, 880 00:49:36,000 --> 00:49:39,000 and Anita could have eyeballed it and said, "Here is the middle of the list," 881 00:49:39,000 --> 00:49:42,000 she wouldn't know that if she were the computer program 882 00:49:42,000 --> 00:49:47,000 because the only thing she had to latch on to at the start of the scenario 883 00:49:47,000 --> 00:49:50,000 was Lucas, who was the first pointer. 884 00:49:50,000 --> 00:49:53,000 She would necessarily have to follow those links, 885 00:49:53,000 --> 00:49:56,000 counting her way until she found roughly the middle, 886 00:49:56,000 --> 00:49:58,000 and even then, she's not going to know when she's reached the middle 887 00:49:58,000 --> 00:50:01,000 unless she goes all the way to the end to figure out how many there are, 888 00:50:01,000 --> 00:50:05,000 then backtracks, and that too would be hard unless you had 889 00:50:05,000 --> 00:50:07,000 a doubly linked list of some sort. 890 00:50:07,000 --> 00:50:10,000 >> Solving some problems today, but introducing others. 891 00:50:10,000 --> 00:50:12,000 What about a different data structure altogether? 892 00:50:12,000 --> 00:50:15,000 This is a photograph of the trays in Mather House, 893 00:50:15,000 --> 00:50:19,000 and in this case, we have a data structure we've also kind of already been talking about. 894 00:50:19,000 --> 00:50:22,000 We talked about a stack in the context of memory, 895 00:50:22,000 --> 00:50:26,000 and that's sort of deliberately named because a stack in the terms of memory 896 00:50:26,000 --> 00:50:31,000 is effectively a data structure that has more and more stuff layered on top of it. 897 00:50:31,000 --> 00:50:35,000 But the interesting thing about a stack, as is the case in reality, 898 00:50:35,000 --> 00:50:38,000 is that it's a special kind of data structure. 899 00:50:38,000 --> 00:50:42,000 It's a data structure whereby the first element in 900 00:50:42,000 --> 00:50:46,000 is the last element out. 901 00:50:46,000 --> 00:50:50,000 If you are the first tray to be put onto the stack, 902 00:50:50,000 --> 00:50:53,000 you're going to be unfortunately the last tray to be taken off the stack, 903 00:50:53,000 --> 00:50:55,000 and that's not necessarily a good thing. 904 00:50:55,000 --> 00:50:58,000 Conversely, you can think about it the other way around, 905 00:50:58,000 --> 00:51:02,000 the last in is the first out. 906 00:51:02,000 --> 00:51:05,000 >> Now, do any scenarios come to mind where having a stack 907 00:51:05,000 --> 00:51:08,000 data structure where you have that property 908 00:51:08,000 --> 00:51:13,000 of the last in, first out, is actually compelling? 909 00:51:13,000 --> 00:51:16,000 Is that a good thing? Is that a bad thing? 910 00:51:16,000 --> 00:51:19,000 It's definitely a bad thing if the trays weren't all identical 911 00:51:19,000 --> 00:51:21,000 and they were all special different colors or whatnot, 912 00:51:21,000 --> 00:51:24,000 and the color you want is all the way at the bottom. 913 00:51:24,000 --> 00:51:26,000 Of course, you can't get that without great effort. 914 00:51:26,000 --> 00:51:28,000 You have to start from the top and work your way down. 915 00:51:28,000 --> 00:51:31,000 Similarly, what if you were one of these fan boys 916 00:51:31,000 --> 00:51:34,000 who waits up all night trying to get an iPhone and lines up 917 00:51:34,000 --> 00:51:36,000 at a place like this? 918 00:51:36,000 --> 00:51:40,000 Wouldn't it be nice if the Apple store 919 00:51:40,000 --> 00:51:42,000 were a stack data structure? 920 00:51:42,000 --> 00:51:44,000 Yay? Nay? 921 00:51:44,000 --> 00:51:47,000 It's only good for the people who show up at the last possible minute 922 00:51:47,000 --> 00:51:50,000 and then get plucked off the queue. 923 00:51:50,000 --> 00:51:52,000 And in fact, the fact that I was so inclined to say queue 924 00:51:52,000 --> 00:51:56,000 is actually consistent with what we would call this kind of data structure, 925 00:51:56,000 --> 00:51:59,000 one in reality where the order does matter, 926 00:51:59,000 --> 00:52:02,000 and you want the first one in to be the first one out 927 00:52:02,000 --> 00:52:04,000 if only for the sake of human fairness. 928 00:52:04,000 --> 00:52:07,000 We'll generally call that a queue data structure. 929 00:52:07,000 --> 00:52:11,000 >> It turns out besides linked lists, we can start using these same basic ideas 930 00:52:11,000 --> 00:52:15,000 and start creating new and different types of solutions to problems. 931 00:52:15,000 --> 00:52:19,000 For instance, in the case of a stack, we could represent a stack 932 00:52:19,000 --> 00:52:22,000 using a data structure like this, I would propose. 933 00:52:22,000 --> 00:52:26,000 In this case, I've declared a struct, and I've said inside of this structure 934 00:52:26,000 --> 00:52:30,000 is an array of numbers and then a variable called size, 935 00:52:30,000 --> 00:52:33,000 and I am going to call this thing a stack. 936 00:52:33,000 --> 00:52:35,000 Now, why does this actually work? 937 00:52:35,000 --> 00:52:43,000 In the case of a stack, I could draw this effectively on the screen as an array. 938 00:52:43,000 --> 00:52:47,000 Here is my stack. Those are my numbers. 939 00:52:47,000 --> 00:52:50,000 And we'll draw them as this, this, this, this, this. 940 00:52:50,000 --> 00:52:53,000 And then I have some other data member here, 941 00:52:53,000 --> 00:52:58,000 which is called size, so this is size, and this is numbers, 942 00:52:58,000 --> 00:53:02,000 and collectively, the whole iPad here represents one stack structure. 943 00:53:02,000 --> 00:53:07,000 Now, by default, size has presumably got to be initialized to 0, 944 00:53:07,000 --> 00:53:11,000 and what's inside of the array of numbers initially 945 00:53:11,000 --> 00:53:14,000 when I first allocate an array? 946 00:53:14,000 --> 00:53:16,000 Garbage. Who knows? And it doesn't actually matter. 947 00:53:16,000 --> 00:53:20,000 It doesn't matter if this is 1, 2, 3, 4, 5, completely randomly 948 00:53:20,000 --> 00:53:25,000 by bad luck stored in my structure because so long as I know that the size of the stack 949 00:53:25,000 --> 00:53:29,000 is 0, then I know programmatically, don't look at any of the elements in the array. 950 00:53:29,000 --> 00:53:31,000 It doesn't matter what's there. 951 00:53:31,000 --> 00:53:34,000 Don't look at them, as would be the implication of a size of 0. 952 00:53:34,000 --> 00:53:38,000 >> But suppose now I go ahead and insert something into the stack. 953 00:53:38,000 --> 00:53:42,000 I want to insert the number 5, so I put number 5 here, 954 00:53:42,000 --> 00:53:45,000 and then what do I put down here? 955 00:53:45,000 --> 00:53:48,000 Now I would actually put down 1 for the size, 956 00:53:48,000 --> 00:53:50,000 and now the stack is of size 1. 957 00:53:50,000 --> 00:53:53,000 What if I go ahead and insert the number, let's say, 7 next? 958 00:53:53,000 --> 00:53:57,000 This then gets updated to 2, and then we'll do 9, 959 00:53:57,000 --> 00:54:02,000 and then this gets updated to 3. 960 00:54:02,000 --> 00:54:05,000 But the interesting feature now of this stack is that 961 00:54:05,000 --> 00:54:09,000 I'm supposed to remove which element if I want to pop 962 00:54:09,000 --> 00:54:12,000 something off of the stack, so to speak? 963 00:54:12,000 --> 00:54:14,000 9 would be the first thing to go. 964 00:54:14,000 --> 00:54:18,000 How should the picture change if I want to pop an element off the stack, 965 00:54:18,000 --> 00:54:20,000 much like a tray in Mather? 966 00:54:20,000 --> 00:54:22,000 Yeah.>>[Student] Set size to 2. 967 00:54:22,000 --> 00:54:27,000 Exactly, all I do is set size to 2, and what do I do with the array? 968 00:54:27,000 --> 00:54:29,000 I don't have to do anything. 969 00:54:29,000 --> 00:54:32,000 I could, just to be anal, put a 0 there or a -1 or something to signify 970 00:54:32,000 --> 00:54:34,000 that this is not a legit value, but it doesn't matter because 971 00:54:34,000 --> 00:54:37,000 I can record outside of the array itself how long it is 972 00:54:37,000 --> 00:54:41,000 so that I know only look at the first two elements in this array. 973 00:54:41,000 --> 00:54:47,000 Now, if I go and add the number 8 to this array, how does the picture change next? 974 00:54:47,000 --> 00:54:50,000 This becomes 8, and this becomes 3. 975 00:54:50,000 --> 00:54:52,000 I'm cutting a few corners here. 976 00:54:52,000 --> 00:54:56,000 Now we have 5, 7, 8, and we're back to a size of 3. 977 00:54:56,000 --> 00:54:58,000 This is pretty simple to implement, 978 00:54:58,000 --> 00:55:06,000 but when are we going to regret this design decision? 979 00:55:06,000 --> 00:55:09,000 When do things start to go very, very wrong? Yeah. 980 00:55:09,000 --> 00:55:11,000 [inaudible student response] 981 00:55:11,000 --> 00:55:13,000 When you want to go back and get the first element you put in. 982 00:55:13,000 --> 00:55:18,000 >> It turns out here even though a stack is an array underneath the hood, 983 00:55:18,000 --> 00:55:21,000 these data structures we've started talking about are also generally known as 984 00:55:21,000 --> 00:55:25,000 abstract data structures whereby how they're implemented 985 00:55:25,000 --> 00:55:27,000 is completely besides the point. 986 00:55:27,000 --> 00:55:31,000 A data structure like a stack is supposed to add support 987 00:55:31,000 --> 00:55:35,000 operations like push, which pushes a tray onto the stack, 988 00:55:35,000 --> 00:55:39,000 and pop, which removes an element from the stack, and that's it. 989 00:55:39,000 --> 00:55:43,000 If you were to download someone else's code who already implemented 990 00:55:43,000 --> 00:55:46,000 this thing called a stack, that person would have written 991 00:55:46,000 --> 00:55:49,000 only two functions for you, push and pop, whose sole purpose in life 992 00:55:49,000 --> 00:55:51,000 would be to do exactly that. 993 00:55:51,000 --> 00:55:54,000 You or him or her who implemented that program 994 00:55:54,000 --> 00:55:58,000 would have been entirely the one to decide how to implement 995 00:55:58,000 --> 00:56:00,000 the semantics of pushing and popping underneath the hood 996 00:56:00,000 --> 00:56:03,000 or the functionality of pushing and popping. 997 00:56:03,000 --> 00:56:07,000 And I have made a somewhat shortsighted decision here 998 00:56:07,000 --> 00:56:10,000 by implementing my stack with this simple data structure why? 999 00:56:10,000 --> 00:56:12,000 When does this data structure break? 1000 00:56:12,000 --> 00:56:18,000 At what point do I have to return an error when the user calls push, for instance? 1001 00:56:18,000 --> 00:56:20,000 [Student] If there's no more space. 1002 00:56:20,000 --> 00:56:23,000 Exactly, if there's no more space, if I've exceeded capacity, 1003 00:56:23,000 --> 00:56:27,000 which is all caps because it suggests that it's some kind of global constant. 1004 00:56:27,000 --> 00:56:30,000 Well, then I'm just going to have to say, "Sorry, I can't push another value 1005 00:56:30,000 --> 00:56:32,000 onto the stack," much like in Mather. 1006 00:56:32,000 --> 00:56:36,000 >> At some point, they're going to hit the top part of that little cabinet. 1007 00:56:36,000 --> 00:56:39,000 There's no more space or capacity in the stack, at which point there's some kind of error. 1008 00:56:39,000 --> 00:56:42,000 They have to put the element somewhere else, the tray somewhere else, 1009 00:56:42,000 --> 00:56:44,000 or nowhere at all. 1010 00:56:44,000 --> 00:56:47,000 Now, with a queue, we could implement it a little differently. 1011 00:56:47,000 --> 00:56:50,000 A queue is a little different in that underneath the hood, it can be implemented 1012 00:56:50,000 --> 00:56:54,000 as an array, but why, in this case, am I proposing 1013 00:56:54,000 --> 00:56:59,000 to also have a head element representing the head of the list, 1014 00:56:59,000 --> 00:57:06,000 the front of the list, the first person in line at the Apple store, in addition to size? 1015 00:57:06,000 --> 00:57:14,000 Why do I need an additional piece of data here? 1016 00:57:14,000 --> 00:57:16,000 Think back to what numbers is 1017 00:57:16,000 --> 00:57:18,000 if I've drawn it as follows. 1018 00:57:18,000 --> 00:57:21,000 Suppose this is now a queue instead of a stack, 1019 00:57:21,000 --> 00:57:24,000 the difference being—just like the Apple store—queue is fair. 1020 00:57:24,000 --> 00:57:27,000 The first person in line at the start of the list, number 5 in this case, 1021 00:57:27,000 --> 00:57:30,000 he or she is going to be let into the store first. 1022 00:57:30,000 --> 00:57:32,000 Let's do that. 1023 00:57:32,000 --> 00:57:35,000 Suppose that this is the state of my queue at this moment in time, and now the Apple store 1024 00:57:35,000 --> 00:57:39,000 opens and the first person, number 5, is led into the store. 1025 00:57:39,000 --> 00:57:43,000 How do I change the picture now that I have de-queued the first person 1026 00:57:43,000 --> 00:57:47,000 at the front of the line? 1027 00:57:47,000 --> 00:57:50,000 What's that?>>[Student] Change the queue. 1028 00:57:50,000 --> 00:57:52,000 Change the head, so 5 disappears. 1029 00:57:52,000 --> 00:57:56,000 In reality, it's as though—how best to do this? 1030 00:57:56,000 --> 00:58:00,000 In reality, it's as though this guy disappears. 1031 00:58:00,000 --> 00:58:03,000 What would number 7 do in an actual store? 1032 00:58:03,000 --> 00:58:05,000 They would take a big step forward. 1033 00:58:05,000 --> 00:58:08,000 >> But what have we come to appreciate when it comes to arrays 1034 00:58:08,000 --> 00:58:10,000 and moving things around? 1035 00:58:10,000 --> 00:58:12,000 That's kind of a waste of your time, right? 1036 00:58:12,000 --> 00:58:16,000 Why do you have to be so anal as to have the first person 1037 00:58:16,000 --> 00:58:21,000 at the start of the line at physically the start of the chunk of memory? 1038 00:58:21,000 --> 00:58:23,000 That's completely unnecessary. Why? 1039 00:58:23,000 --> 00:58:26,000 What could I just remember instead?>>[inaudible student response] 1040 00:58:26,000 --> 00:58:30,000 Exactly, I could just remember with this additional data member head 1041 00:58:30,000 --> 00:58:34,000 that now the head of the list is no longer 0, which it was a moment ago. 1042 00:58:34,000 --> 00:58:39,000 Now it's actually the number 1. In this way, I get a slight optimization. 1043 00:58:39,000 --> 00:58:44,000 Just because I've de-queued someone from line at the start of the line at the Apple store 1044 00:58:44,000 --> 00:58:47,000 doesn't mean everyone has to shift, which recall is a linear operation. 1045 00:58:47,000 --> 00:58:50,000 I can instead spend constant time only 1046 00:58:50,000 --> 00:58:53,000 and achieve then a much faster response. 1047 00:58:53,000 --> 00:58:56,000 But the price I'm paying is what to gain that additional performance 1048 00:58:56,000 --> 00:58:58,000 and not having to shift everyone? 1049 00:58:58,000 --> 00:59:01,000 Yeah.>>[inaudible student response] 1050 00:59:01,000 --> 00:59:04,000 Can add more people, well, that problem is orthogonal 1051 00:59:04,000 --> 00:59:07,000 to the fact that we're not shifting people around. 1052 00:59:07,000 --> 00:59:11,000 It's still an array, so whether or not we shift everyone or not— 1053 00:59:11,000 --> 00:59:13,000 oh, I see what you mean, okay. 1054 00:59:13,000 --> 00:59:16,000 Actually, I agree with what you're saying in that it's almost as though 1055 00:59:16,000 --> 00:59:19,000 we're now never going to use the start of this array anymore 1056 00:59:19,000 --> 00:59:22,000 because if I remove 5, then I remove 7. 1057 00:59:22,000 --> 00:59:24,000 But I only put people to the right. 1058 00:59:24,000 --> 00:59:28,000 >> It feels like I'm wasting space, and eventually my queue disintegrates into nothing at all, 1059 00:59:28,000 --> 00:59:31,000 so we could just have people wraparound, 1060 00:59:31,000 --> 00:59:35,000 and we could think of this array really as some kind of circular structure, 1061 00:59:35,000 --> 00:59:38,000 but we use what operator in C to do that sort of wraparound? 1062 00:59:38,000 --> 00:59:40,000 [inaudible student response]>>The modulo operator. 1063 00:59:40,000 --> 00:59:43,000 It would be a little annoying to think through how do you do the wraparound, 1064 00:59:43,000 --> 00:59:46,000 but we could do it, and we could start putting people at what used to be the front of the line, 1065 00:59:46,000 --> 00:59:52,000 but we just remember with this head variable who the actual head of the line actually is. 1066 00:59:52,000 --> 00:59:57,000 What if, instead, our goal ultimately, though, 1067 00:59:57,000 --> 01:00:00,000 was to look up numbers, as we did here on stage with Anita, 1068 01:00:00,000 --> 01:00:02,000 but we really want the best of all these worlds? 1069 01:00:02,000 --> 01:00:05,000 We want more sophistication than array allows 1070 01:00:05,000 --> 01:00:09,000 because we want the ability to dynamically grow the data structure. 1071 01:00:09,000 --> 01:00:12,000 But we don't want to have to resort to something that we pointed out 1072 01:00:12,000 --> 01:00:15,000 in the first lecture was not an optimal algorithm, 1073 01:00:15,000 --> 01:00:17,000 that of linear search. 1074 01:00:17,000 --> 01:00:21,000 It turns out that you can, in fact, achieve 1075 01:00:21,000 --> 01:00:24,000 or at least close to constant time, whereby someone like Anita, 1076 01:00:24,000 --> 01:00:27,000 if she configures her data structure not to be a linked list, 1077 01:00:27,000 --> 01:00:30,000 not to be a stack, not to be a queue, could, in fact, 1078 01:00:30,000 --> 01:00:33,000 come up with a data structure that allows her to look up things, 1079 01:00:33,000 --> 01:00:37,000 even words, not just numbers, in what we'll call constant time. 1080 01:00:37,000 --> 01:00:40,000 >> And in fact, looking ahead, one of the psets in this class is almost always 1081 01:00:40,000 --> 01:00:43,000 an implementation of a spellchecker, whereby 1082 01:00:43,000 --> 01:00:46,000 we give you again some 150,000 English words and the goal is to 1083 01:00:46,000 --> 01:00:51,000 load those into memory and rapidly be able to answer questions of the form 1084 01:00:51,000 --> 01:00:54,000 is this word spelled correctly? 1085 01:00:54,000 --> 01:00:58,000 And it would really suck if you had to iterate through all 150,000 words to answer that. 1086 01:00:58,000 --> 01:01:02,000 But, in fact, we'll see that we can do it in very, very quick time. 1087 01:01:02,000 --> 01:01:06,000 And it's going to involve implementing something called a hash table, 1088 01:01:06,000 --> 01:01:09,000 and even though at first glance this thing called a hash table is going to 1089 01:01:09,000 --> 01:01:12,000 let us achieve these super rapid response times, 1090 01:01:12,000 --> 01:01:18,000 it turns out that there is in fact a problem. 1091 01:01:18,000 --> 01:01:23,000 When it comes time to implement this thing called—again, I'm doing it again. 1092 01:01:23,000 --> 01:01:25,000 I'm the only one here. 1093 01:01:25,000 --> 01:01:28,000 When it comes time to implementing this thing called a hash table, 1094 01:01:28,000 --> 01:01:30,000 we're going to have to make a decision. 1095 01:01:30,000 --> 01:01:32,000 How big should this thing actually be? 1096 01:01:32,000 --> 01:01:36,000 And when we start inserting numbers into this hash table, 1097 01:01:36,000 --> 01:01:38,000 how are we going to store them in such a way 1098 01:01:38,000 --> 01:01:42,000 that we can get them back out as quickly as we got them in? 1099 01:01:42,000 --> 01:01:45,000 But we'll see before long that this question of 1100 01:01:45,000 --> 01:01:48,000 when everyone's birthday is in the class will be quite germane. 1101 01:01:48,000 --> 01:01:51,000 It turns out that in this room, we've got a few hundred people, 1102 01:01:51,000 --> 01:01:56,000 so the odds that two of us have the same birthday is probably pretty high. 1103 01:01:56,000 --> 01:01:58,000 What if there were only 40 of us in this room? 1104 01:01:58,000 --> 01:02:02,000 What are the odds of two people having the same birthday? 1105 01:02:02,000 --> 01:02:04,000 [Students] Over 50%. 1106 01:02:04,000 --> 01:02:06,000 Yeah, over 50%. In fact, I even brought a chart. 1107 01:02:06,000 --> 01:02:08,000 It turns out—and this is really just a sneak preview— 1108 01:02:08,000 --> 01:02:12,000 if there's only 58 of us in this room, the probability of 2 of us 1109 01:02:12,000 --> 01:02:16,000 having the same birthday is hugely high, almost 100%, 1110 01:02:16,000 --> 01:02:20,000 and that's going to cause a whole bunch of hurt for us on Wednesday. 1111 01:02:20,000 --> 01:02:24,000 >> With that said, let's adjourn here. We'll see you on Wednesday. 1112 01:02:24,000 --> 01:02:28,000 [applause] 1113 01:02:28,000 --> 01:02:30,000 [CS50.TV]