1 00:00:00,000 --> 00:00:11,460 2 00:00:11,460 --> 00:00:12,250 >> DAVID MALAN: All right. 3 00:00:12,250 --> 00:00:13,860 Welcome back to CS50. 4 00:00:13,860 --> 00:00:16,190 This is the start of week 8. 5 00:00:16,190 --> 00:00:21,320 And recall that problem set 5 ended with a little bit of a challenge. 6 00:00:21,320 --> 00:00:25,210 So assuming you recovered all of your teaching Fellows and CA's photographs 7 00:00:25,210 --> 00:00:30,480 in the card.raw file, you are eligible to now find all of those people, and 8 00:00:30,480 --> 00:00:34,510 one lucky winner will walk home with one of these things, the leap motion 9 00:00:34,510 --> 00:00:37,450 device that you can use for final projects, for instance. 10 00:00:37,450 --> 00:00:39,860 >> This, every year, leads to a bit of creepiness. 11 00:00:39,860 --> 00:00:43,480 And so what I thought I'd do is share with you some of the notes that have 12 00:00:43,480 --> 00:00:47,370 gone back and forth over the staff list of late. 13 00:00:47,370 --> 00:00:51,110 For instance, just last night, quote unquote, from one of the staff 14 00:00:51,110 --> 00:00:55,000 members, "I just had a student knock on my door to take a photo with me. 15 00:00:55,000 --> 00:00:59,020 Stalkers, I tell you." Started off fairly descriptive and then we moved 16 00:00:59,020 --> 00:01:02,830 on to, an hour or so later, "I had a student waiting for me after section 17 00:01:02,830 --> 00:01:06,080 and he had all of our names and photos on some sheets of paper." All right. 18 00:01:06,080 --> 00:01:09,230 So organized, but not all that creepy yet. 19 00:01:09,230 --> 00:01:12,520 >> Then, "I was out of town this weekend, and when I got back, there was one in 20 00:01:12,520 --> 00:01:12,630 my 21 00:01:12,630 --> 00:01:16,740 bedroom." [LAUGHTER] 22 00:01:16,740 --> 00:01:20,410 DAVID MALAN: Next quote from a staff member, "a student came to my house at 23 00:01:20,410 --> 00:01:25,330 Somerville at 4 AM this morning." Next staff, "I got to my hotel in San 24 00:01:25,330 --> 00:01:30,016 Francisco and a student was waiting for me at the lobby with three DSLRs." 25 00:01:30,016 --> 00:01:31,510 Type of camera. 26 00:01:31,510 --> 00:01:34,980 "I'm not even on staff this semester, but a student broke into my house this 27 00:01:34,980 --> 00:01:40,480 morning and recorded the whole thing with Google Glass." And then lastly, 28 00:01:40,480 --> 00:01:43,650 "at least 12 people were eagerly awaiting for me when I got out of my 29 00:01:43,650 --> 00:01:44,800 limo, and then I 30 00:01:44,800 --> 00:01:46,970 woke up." All right. 31 00:01:46,970 --> 00:01:57,690 So among the photographs, as you may recall, are this fellow here, who you 32 00:01:57,690 --> 00:02:01,850 might know as Milo Banana, who lives with Lauren Carvalho, our head 33 00:02:01,850 --> 00:02:02,905 teaching Fellow. 34 00:02:02,905 --> 00:02:05,170 Milo, Milo, come here boy. 35 00:02:05,170 --> 00:02:06,320 Milo. 36 00:02:06,320 --> 00:02:08,650 Milo. 37 00:02:08,650 --> 00:02:12,230 Mind you, he's wearing Google Glass, so we'll show you all of this after. 38 00:02:12,230 --> 00:02:16,190 So this is Milo if you would like to take a photograph with him afterward. 39 00:02:16,190 --> 00:02:18,240 If you'd like to look out at the audience there. 40 00:02:18,240 --> 00:02:19,430 OK. 41 00:02:19,430 --> 00:02:20,200 That's good footage. 42 00:02:20,200 --> 00:02:22,556 Well, Milo Banana. 43 00:02:22,556 --> 00:02:23,941 Oh, don't do that. 44 00:02:23,941 --> 00:02:29,020 >> [LAUGHTER] 45 00:02:29,020 --> 00:02:29,470 >> OK. 46 00:02:29,470 --> 00:02:34,550 So a word then on what lies ahead, because as we begin to transition, 47 00:02:34,550 --> 00:02:38,410 this week specifically, from C in a command line environment to PHP and 48 00:02:38,410 --> 00:02:42,720 JavaScript and SQL and HTML and CSS in a web-based environment, we'll be 49 00:02:42,720 --> 00:02:44,490 equipping you with all the more knowledge for 50 00:02:44,490 --> 00:02:46,010 potential final projects. 51 00:02:46,010 --> 00:02:49,240 Toward that end, the course has a tradition of holding seminars which 52 00:02:49,240 --> 00:02:50,950 are on tangential topics to the course. 53 00:02:50,950 --> 00:02:54,330 Very much related to programming and to app development and so forth, but 54 00:02:54,330 --> 00:02:57,010 not necessarily explored by the course's own syllabus. 55 00:02:57,010 --> 00:03:00,250 >> So if you might be interested in one or more of this year's seminars, 56 00:03:00,250 --> 00:03:02,530 register at cs50.net/seminar. 57 00:03:02,530 --> 00:03:06,170 There are older seminars at cs50.net/seminars. 58 00:03:06,170 --> 00:03:10,620 And on the roster thus far for this year are Amazing Web Apps with Ruby on 59 00:03:10,620 --> 00:03:13,580 Rails, which is an alternative language to PHP. 60 00:03:13,580 --> 00:03:14,900 Computational Linguistics. 61 00:03:14,900 --> 00:03:18,710 Introduction to iOS, which is the platform that's used for iPhone and 62 00:03:18,710 --> 00:03:19,850 iPad development. 63 00:03:19,850 --> 00:03:22,890 JavaScript for Web Apps, we'll cover that, but in this seminar, you'll go 64 00:03:22,890 --> 00:03:24,070 into more detail. 65 00:03:24,070 --> 00:03:27,390 >> Leap Motion, so we'll actually have some of our friends from Leap Motion, 66 00:03:27,390 --> 00:03:29,160 the company itself, join us. 67 00:03:29,160 --> 00:03:31,800 Tomorrow, in fact, to provide a hands-on seminar, if 68 00:03:31,800 --> 00:03:33,320 of interest to you. 69 00:03:33,320 --> 00:03:38,770 Meteor.js, an alternative technique for using JavaScript not in a browser, 70 00:03:38,770 --> 00:03:39,970 but on a server. 71 00:03:39,970 --> 00:03:42,110 Node.js, which is very much in that vein as well. 72 00:03:42,110 --> 00:03:43,650 Sleek Android Design. 73 00:03:43,650 --> 00:03:46,990 Android being a very popular alternative to iOS and Windows Phone 74 00:03:46,990 --> 00:03:48,790 and other mobile platforms. 75 00:03:48,790 --> 00:03:51,180 And Web Security Active Defense. 76 00:03:51,180 --> 00:03:54,590 >> So in fact, if you would like to engage in this, let me 77 00:03:54,590 --> 00:03:55,840 make note of this. 78 00:03:55,840 --> 00:03:57,790 We're very happy to say that our friends at Leap 79 00:03:57,790 --> 00:03:59,140 Motion, which is a startup-- 80 00:03:59,140 --> 00:04:01,300 this device really just came out a few months ago-- 81 00:04:01,300 --> 00:04:05,960 has graciously donated 30 such devices to the class for as many students, if 82 00:04:05,960 --> 00:04:08,670 you'd like to borrow the hardware towards semester's end and use it for 83 00:04:08,670 --> 00:04:10,390 an actual final project. 84 00:04:10,390 --> 00:04:11,890 They support a number of languages. 85 00:04:11,890 --> 00:04:16,040 None of them C, none of them PHP, so realize one or more of these seminars 86 00:04:16,040 --> 00:04:16,899 might prove of interest. 87 00:04:16,899 --> 00:04:19,730 And all of them will be filmed in the event that you are not able 88 00:04:19,730 --> 00:04:21,380 to attend in person. 89 00:04:21,380 --> 00:04:25,000 The schedule be announced via email as we solidify rooms. 90 00:04:25,000 --> 00:04:28,460 >> And lastly, if you go to projects.cs.50.net, this is a website 91 00:04:28,460 --> 00:04:31,450 we maintain each year that we invite folks from the community, faculty, 92 00:04:31,450 --> 00:04:36,420 departments, staff, and both in an outside of CS50 to 93 00:04:36,420 --> 00:04:37,730 propose project ideas. 94 00:04:37,730 --> 00:04:39,050 Things of interest to student groups. 95 00:04:39,050 --> 00:04:40,600 Things of interest to departments. 96 00:04:40,600 --> 00:04:43,990 So do turn there if you're struggling with uncertainty as to what you 97 00:04:43,990 --> 00:04:46,700 yourself would like to tackle. 98 00:04:46,700 --> 00:04:51,760 >> So last time we introduced an arguably more complex data structure than we'd 99 00:04:51,760 --> 00:04:53,300 seen in weeks past. 100 00:04:53,300 --> 00:04:56,550 We'd been using arrays pretty happily as a useful if 101 00:04:56,550 --> 00:04:58,160 simplistic data structure. 102 00:04:58,160 --> 00:05:00,570 Then we introduced these, which of course are linked lists. 103 00:05:00,570 --> 00:05:05,470 And what was one of the motivations for introducing this data structure? 104 00:05:05,470 --> 00:05:06,930 Yeah? 105 00:05:06,930 --> 00:05:07,250 What's that? 106 00:05:07,250 --> 00:05:08,080 >> AUDIENCE: Dynamic size. 107 00:05:08,080 --> 00:05:09,040 >> DAVID MALAN: Dynamic size. 108 00:05:09,040 --> 00:05:11,890 So whereas in array, you have to know its size in advance when 109 00:05:11,890 --> 00:05:12,740 you allocate it. 110 00:05:12,740 --> 00:05:14,380 In linked list, you don't have to know that. 111 00:05:14,380 --> 00:05:17,610 You can just malloc, or more generally, allocate an additional 112 00:05:17,610 --> 00:05:20,720 node, so to speak, any time you want to insert more data. 113 00:05:20,720 --> 00:05:22,670 And node has no predetermined meaning. 114 00:05:22,670 --> 00:05:25,580 It's just a generic term describing some kind of container that we're 115 00:05:25,580 --> 00:05:29,610 using in our data structure to store some item of interest, which in this 116 00:05:29,610 --> 00:05:31,750 case happen to be integers. 117 00:05:31,750 --> 00:05:33,160 >> But there's always a tradeoff. 118 00:05:33,160 --> 00:05:38,070 So we get dynamic sizes of the data structure, but what price do we pay? 119 00:05:38,070 --> 00:05:40,040 What's the downside of linked lists? 120 00:05:40,040 --> 00:05:41,006 Yeah? 121 00:05:41,006 --> 00:05:41,980 >> AUDIENCE: Requires more memory. 122 00:05:41,980 --> 00:05:44,240 >> DAVID MALAN: It requires more memory, how exactly? 123 00:05:44,240 --> 00:05:46,440 >> AUDIENCE: [INAUDIBLE]. 124 00:05:46,440 --> 00:05:47,050 >> DAVID MALAN: Exactly. 125 00:05:47,050 --> 00:05:50,460 So now we have pointers taking up additional memory that we previously 126 00:05:50,460 --> 00:05:53,040 didn't need, because the advantage of an array, of course, is that 127 00:05:53,040 --> 00:05:54,860 everything's contiguous, back to back to back, which 128 00:05:54,860 --> 00:05:56,380 gives you random access. 129 00:05:56,380 --> 00:06:00,710 Because just by using square bracket notation, or more technically pointer 130 00:06:00,710 --> 00:06:03,580 arithmetic, very simple addition, can you access any 131 00:06:03,580 --> 00:06:05,700 elements in constant time. 132 00:06:05,700 --> 00:06:08,975 And in fact, that's kind of hinting at another price that we're paying with a 133 00:06:08,975 --> 00:06:09,760 linked list. 134 00:06:09,760 --> 00:06:13,890 >> What happens to the running time of something like Search, if I want to 135 00:06:13,890 --> 00:06:17,270 find some value and inside of a linked list? 136 00:06:17,270 --> 00:06:20,290 What does my running time become? 137 00:06:20,290 --> 00:06:21,560 Big O of n. 138 00:06:21,560 --> 00:06:24,060 If it's sorted to? 139 00:06:24,060 --> 00:06:25,440 What if the data structure's sorted? 140 00:06:25,440 --> 00:06:28,640 Can I do better than big O of n for searching? 141 00:06:28,640 --> 00:06:31,700 No, because in the worst case it might very well be sorted, but the number 142 00:06:31,700 --> 00:06:32,950 you're looking for might be big. 143 00:06:32,950 --> 00:06:35,370 It might be the number 100, which might happen to be all 144 00:06:35,370 --> 00:06:36,410 the way at the end. 145 00:06:36,410 --> 00:06:39,950 And because you can only access a linked list in this implementation by 146 00:06:39,950 --> 00:06:42,690 way of its first node, you're still kind of out of luck. 147 00:06:42,690 --> 00:06:47,450 You have to traverse the whole thing from first to last in order to find 148 00:06:47,450 --> 00:06:49,150 that big value like 100. 149 00:06:49,150 --> 00:06:51,350 Or to determine if it's not even there. 150 00:06:51,350 --> 00:06:55,960 >> So we cannot do what algorithm in a data structure that looks like this? 151 00:06:55,960 --> 00:06:59,460 We can't do binary search, because binary search required that we had 152 00:06:59,460 --> 00:07:00,740 random access. 153 00:07:00,740 --> 00:07:04,500 We could just leap from location to location without having to follow 154 00:07:04,500 --> 00:07:07,080 these bread crumbs in the form of all these pointers. 155 00:07:07,080 --> 00:07:08,300 >> Now, how did we implement this? 156 00:07:08,300 --> 00:07:12,830 Well, if we go to the screen here, if we can quickly reimplement this data 157 00:07:12,830 --> 00:07:13,440 structure-- 158 00:07:13,440 --> 00:07:15,670 my handwriting's not all that great here, but we'll try. 159 00:07:15,670 --> 00:07:22,030 So typedef struct, and what did I want to call this thing up here? 160 00:07:22,030 --> 00:07:22,960 Node. 161 00:07:22,960 --> 00:07:24,580 So I'll get us started. 162 00:07:24,580 --> 00:07:27,860 And now, what needs to be inside of the data structure for that singly 163 00:07:27,860 --> 00:07:28,430 linked list? 164 00:07:28,430 --> 00:07:29,950 How many fields? 165 00:07:29,950 --> 00:07:30,450 >> So two. 166 00:07:30,450 --> 00:07:31,570 One is pretty easy. 167 00:07:31,570 --> 00:07:33,050 So int n. 168 00:07:33,050 --> 00:07:35,930 And we could call n anything we want, but it should be an int if we're 169 00:07:35,930 --> 00:07:37,660 implementing a linked list for ints. 170 00:07:37,660 --> 00:07:41,920 And now what does the second field have to be? 171 00:07:41,920 --> 00:07:43,460 Struct node *. 172 00:07:43,460 --> 00:07:50,570 So if I do struct node *, and then I can call this also whatever I want, 173 00:07:50,570 --> 00:07:53,510 but just to be clear I'll call it next, as we've been doing. 174 00:07:53,510 --> 00:07:55,270 And then I'll close my curly braces. 175 00:07:55,270 --> 00:08:00,700 >> And now, as last time, I put node down here. 176 00:08:00,700 --> 00:08:03,830 But if I'm declaring this is as a node, why did I bother being so 177 00:08:03,830 --> 00:08:07,320 verbose here in declaring struct node * next, as opposed 178 00:08:07,320 --> 00:08:09,210 to just node * next? 179 00:08:09,210 --> 00:08:09,904 Yeah? 180 00:08:09,904 --> 00:08:12,810 >> AUDIENCE: [INAUDIBLE]. 181 00:08:12,810 --> 00:08:14,050 >> DAVID MALAN: Exactly. 182 00:08:14,050 --> 00:08:14,530 Exactly. 183 00:08:14,530 --> 00:08:18,320 Because C really takes you literally and only sees the definition of node 184 00:08:18,320 --> 00:08:21,230 way down here, you can't refer to it up here. 185 00:08:21,230 --> 00:08:24,760 So we have this sort of preemptive declaration here, which is admittedly 186 00:08:24,760 --> 00:08:25,390 more verbose. 187 00:08:25,390 --> 00:08:27,810 Struct node, that means we can now access it 188 00:08:27,810 --> 00:08:29,760 inside of the data structure. 189 00:08:29,760 --> 00:08:33,370 >> And as an aside, because this is becoming a little more subjective now, 190 00:08:33,370 --> 00:08:36,230 the star can technically go here, it can go here, it can 191 00:08:36,230 --> 00:08:37,179 even go in the middle. 192 00:08:37,179 --> 00:08:39,890 We've adopted, in the style guide for the course, the convention of putting 193 00:08:39,890 --> 00:08:42,299 the star right next to the data type, which in this case, 194 00:08:42,299 --> 00:08:43,460 would be struct node. 195 00:08:43,460 --> 00:08:46,620 But realize in a lot of textbooks and online references, you might indeed 196 00:08:46,620 --> 00:08:48,450 see it on the other side. 197 00:08:48,450 --> 00:08:52,200 But just realize that both will actually work and you should simply be 198 00:08:52,200 --> 00:08:52,970 consistent. 199 00:08:52,970 --> 00:08:53,580 >> All right. 200 00:08:53,580 --> 00:08:55,630 So that was our declaration of struct node. 201 00:08:55,630 --> 00:08:59,430 But then we started doing more sophisticated things. 202 00:08:59,430 --> 00:09:03,410 For instance, we decided to introduce something like a hash table. 203 00:09:03,410 --> 00:09:08,160 So here is a hash table of size n, indexed from 0 on the top left to n 204 00:09:08,160 --> 00:09:09,690 minus 1 on the bottom left. 205 00:09:09,690 --> 00:09:11,640 This could be a hash table for anything. 206 00:09:11,640 --> 00:09:15,340 But what kinds of things did we talk about using a hash table for? 207 00:09:15,340 --> 00:09:18,370 Storing what? 208 00:09:18,370 --> 00:09:18,800 >> Names. 209 00:09:18,800 --> 00:09:20,870 We could do names like we did last time. 210 00:09:20,870 --> 00:09:22,200 And really, you can store anything. 211 00:09:22,200 --> 00:09:24,640 And we'll see this again in PHP and in JavaScript. 212 00:09:24,640 --> 00:09:28,550 A hash table is a nice sort of Swiss Army knife that allows you to store 213 00:09:28,550 --> 00:09:33,690 pretty much whatever you want inside of it by associating keys with values. 214 00:09:33,690 --> 00:09:34,770 Keys with values. 215 00:09:34,770 --> 00:09:37,800 >> Now in this simple case, our keys are just numbers. 216 00:09:37,800 --> 00:09:40,380 We're implementing a hash table as an array. 217 00:09:40,380 --> 00:09:43,500 And so the keys are 0, 1, 2, and so forth. 218 00:09:43,500 --> 00:09:47,200 And so we, as humans, decided last week that you know what, if we're 219 00:09:47,200 --> 00:09:50,410 going to store names, let's just arbitrarily, but pretty reasonably, 220 00:09:50,410 --> 00:09:54,680 assume that Alice, an A name, will just be indexed into 0. 221 00:09:54,680 --> 00:09:58,030 And Bob, a B name, will be indexed into 1, and so forth. 222 00:09:58,030 --> 00:10:02,490 So we had a mapping between inputs, which are strings, and the hash 223 00:10:02,490 --> 00:10:04,560 locations, which are numbers. 224 00:10:04,560 --> 00:10:07,740 >> So that process is generally known as a hash function, and you can truly 225 00:10:07,740 --> 00:10:09,130 implement it in code. 226 00:10:09,130 --> 00:10:12,080 If I wanted to implement a hash function that does exactly what we 227 00:10:12,080 --> 00:10:17,070 just described from last time, I might declare a function that takes, as 228 00:10:17,070 --> 00:10:18,330 input for instance-- 229 00:10:18,330 --> 00:10:22,190 and let's do this on this screen over here. 230 00:10:22,190 --> 00:10:26,180 If I wanted to implement a hash function, I might say 231 00:10:26,180 --> 00:10:27,410 something like this. 232 00:10:27,410 --> 00:10:29,030 >> It's going to return an int. 233 00:10:29,030 --> 00:10:33,600 It's going to be called hash, and it's going to accept as an argument a 234 00:10:33,600 --> 00:10:38,920 string, or we can be more proper now, and say char*, we'll call it s. 235 00:10:38,920 --> 00:10:43,840 And then all this function has to do, ultimately, is return an int. 236 00:10:43,840 --> 00:10:45,990 Now, how it does that might not be so clear. 237 00:10:45,990 --> 00:10:49,510 I'm going to implement this without any form of error checking right now. 238 00:10:49,510 --> 00:10:55,740 I'm just going to blindly say, return whatever is at s bracket 0, minus, 239 00:10:55,740 --> 00:10:58,850 let's say, capital A semicolon. 240 00:10:58,850 --> 00:10:59,960 >> Totally broken. 241 00:10:59,960 --> 00:11:02,620 It's not perfect because one, what if s is null? 242 00:11:02,620 --> 00:11:04,000 Bad things are going to happen. 243 00:11:04,000 --> 00:11:07,940 Two, what if the first letter in this name is not a capital letter? 244 00:11:07,940 --> 00:11:09,860 That's not going to turn out well either. 245 00:11:09,860 --> 00:11:11,970 It might be a lowercase letter or not a letter at all. 246 00:11:11,970 --> 00:11:15,520 So totally room for improvement here, but this is the basic idea. 247 00:11:15,520 --> 00:11:19,010 >> What we described last week verbally as just a process of mapping Alice to 248 00:11:19,010 --> 00:11:23,360 0 and Bob to 1 can be expressed certainly more formulaically as a C 249 00:11:23,360 --> 00:11:24,320 function here. 250 00:11:24,320 --> 00:11:28,630 Called again hash, takes a string as input, and then somehow does something 251 00:11:28,630 --> 00:11:31,020 with that input to produce an output. 252 00:11:31,020 --> 00:11:34,130 Not unlike our black box description that we've long done. 253 00:11:34,130 --> 00:11:36,550 I don't know how this might be working underneath the hood. 254 00:11:36,550 --> 00:11:40,120 >> For problem set 6, one of the challenges is for you to decide what 255 00:11:40,120 --> 00:11:41,920 will your hash function be? 256 00:11:41,920 --> 00:11:45,760 What's going to be inside of that black box, and presumably, it'll be a 257 00:11:45,760 --> 00:11:50,380 little more interesting than this, and definitely more prone to error 258 00:11:50,380 --> 00:11:53,180 checking than this particular implementation. 259 00:11:53,180 --> 00:11:54,580 >> But problems can arise, right? 260 00:11:54,580 --> 00:11:57,760 If we have a data structure such as this one, what's one of the problems 261 00:11:57,760 --> 00:12:01,600 you can run into over time as you insert more and more names into the 262 00:12:01,600 --> 00:12:02,880 hash table? 263 00:12:02,880 --> 00:12:04,630 You get collisions, right? 264 00:12:04,630 --> 00:12:07,560 What if you have Alice and Aaron, two people whose names happened 265 00:12:07,560 --> 00:12:08,190 to start with A? 266 00:12:08,190 --> 00:12:11,660 That begs the question, where you put the second such A name? 267 00:12:11,660 --> 00:12:15,050 >> Well, you might naively just put it where Bob belongs, but then Bob is 268 00:12:15,050 --> 00:12:17,300 kind of screwed if you try to insert his name next and 269 00:12:17,300 --> 00:12:18,240 there's no room for him. 270 00:12:18,240 --> 00:12:21,400 So you might put Bob where Charlie is, and you can imagine this very quickly 271 00:12:21,400 --> 00:12:23,020 devolving into a bit of a mess. 272 00:12:23,020 --> 00:12:25,600 Something linear in the end, where you just have to search the whole thing 273 00:12:25,600 --> 00:12:28,190 looking for Alice or Bob or Aaron or Charlie. 274 00:12:28,190 --> 00:12:33,230 >> So instead we proposed, instead of just linearly probing for open spaces 275 00:12:33,230 --> 00:12:36,450 and plopping the names there, we proposed a fancier approach. 276 00:12:36,450 --> 00:12:41,740 A hash table implemented still with an array of indices, but the data type of 277 00:12:41,740 --> 00:12:44,500 those indices now were pointers. 278 00:12:44,500 --> 00:12:47,360 Pointers to what? 279 00:12:47,360 --> 00:12:48,730 Pointers to linked lists. 280 00:12:48,730 --> 00:12:53,330 >> Because recall that a linked list is really just a pointer to a node, and 281 00:12:53,330 --> 00:12:57,110 the node has a next field, and that node has a next field, and so forth. 282 00:12:57,110 --> 00:13:00,690 So you can now think of this array on the left-hand side of a hash table as 283 00:13:00,690 --> 00:13:01,820 leading to a linked list. 284 00:13:01,820 --> 00:13:07,000 The advantage of which is if you get a collision between Alice and Aaron, 285 00:13:07,000 --> 00:13:09,300 what do you do with the second such person? 286 00:13:09,300 --> 00:13:14,150 You just attach him or her to the end, or even the beginning 287 00:13:14,150 --> 00:13:15,490 of that linked list. 288 00:13:15,490 --> 00:13:17,340 >> And actually, let's just noodle through that for just a second. 289 00:13:17,340 --> 00:13:18,640 Where would make the most sense? 290 00:13:18,640 --> 00:13:22,060 If I insert Alice and she ends up at the first location, then I try to 291 00:13:22,060 --> 00:13:25,310 insert Aaron's name, and there's obviously a collision, should I put 292 00:13:25,310 --> 00:13:27,400 him at the beginning of the linked list? 293 00:13:27,400 --> 00:13:30,944 That's at that first location, or at the end? 294 00:13:30,944 --> 00:13:31,440 >> AUDIENCE: [INAUDIBLE]. 295 00:13:31,440 --> 00:13:31,990 >> DAVID MALAN: OK. 296 00:13:31,990 --> 00:13:32,490 I heard beginning. 297 00:13:32,490 --> 00:13:33,903 Why at the beginning? 298 00:13:33,903 --> 00:13:34,750 >> AUDIENCE: [INAUDIBLE]. 299 00:13:34,750 --> 00:13:34,940 >> DAVID MALAN: OK. 300 00:13:34,940 --> 00:13:36,520 It's alphabetical, so that's nice. 301 00:13:36,520 --> 00:13:37,330 That's a good property. 302 00:13:37,330 --> 00:13:39,335 It will save me some time potentially. 303 00:13:39,335 --> 00:13:43,290 It won't let me do binary search, but I might at least be able to break out 304 00:13:43,290 --> 00:13:47,340 of a loop if I realize, well, I'm way past were Aaron would be in this 305 00:13:47,340 --> 00:13:48,310 sorted linked list. 306 00:13:48,310 --> 00:13:50,360 I don't have to waste my time looking all the way to the end. 307 00:13:50,360 --> 00:13:51,530 So that's reasonable. 308 00:13:51,530 --> 00:13:54,710 Why else might you want to insert the colliding name at the 309 00:13:54,710 --> 00:13:56,660 beginning of the list? 310 00:13:56,660 --> 00:13:57,397 What's that? 311 00:13:57,397 --> 00:13:58,680 >> AUDIENCE: [INAUDIBLE]. 312 00:13:58,680 --> 00:14:00,820 >> DAVID MALAN: It could take a long time to get to the end of the list. 313 00:14:00,820 --> 00:14:02,490 And in fact, longer and longer. 314 00:14:02,490 --> 00:14:04,920 The more names you insert that start with A, the longer that 315 00:14:04,920 --> 00:14:06,280 chain is going to get. 316 00:14:06,280 --> 00:14:07,890 The longer that linked list is going to get. 317 00:14:07,890 --> 00:14:09,420 So you're really just wasting your time. 318 00:14:09,420 --> 00:14:14,070 Maybe you're better off maintaining constant insertion time, big O of 1, 319 00:14:14,070 --> 00:14:18,470 by always putting the colliding name at the beginning of the linked list, 320 00:14:18,470 --> 00:14:21,230 and not worrying as much about sorting. 321 00:14:21,230 --> 00:14:22,600 >> What's the best answer? 322 00:14:22,600 --> 00:14:23,320 It's unclear. 323 00:14:23,320 --> 00:14:26,140 It kind of depends on what the distribution is, what the pattern is 324 00:14:26,140 --> 00:14:27,850 of the names you are inserting. 325 00:14:27,850 --> 00:14:29,430 It's not necessarily an obvious answer. 326 00:14:29,430 --> 00:14:33,100 But here to, again, is a design opportunity. 327 00:14:33,100 --> 00:14:37,220 >> So we then looked at this thing, which is really the other big opportunity 328 00:14:37,220 --> 00:14:38,180 for p-set 6. 329 00:14:38,180 --> 00:14:41,770 And realize, if you haven't already, Zamyla dives into both of these, hash 330 00:14:41,770 --> 00:14:43,260 tables and tries, in more detail. 331 00:14:43,260 --> 00:14:45,630 And the video walkthrough is embedded in p-set spec. 332 00:14:45,630 --> 00:14:46,590 This was a trie-- 333 00:14:46,590 --> 00:14:51,670 T-R-I-E. And what was interesting about this was that the running time 334 00:14:51,670 --> 00:14:59,510 of searching for a name, like Maxwell last time, was big O of what? 335 00:14:59,510 --> 00:15:01,040 What's that? 336 00:15:01,040 --> 00:15:01,920 >> AUDIENCE: The number of letters. 337 00:15:01,920 --> 00:15:02,550 >> DAVID MALAN: Number of letters. 338 00:15:02,550 --> 00:15:03,210 I heard two things. 339 00:15:03,210 --> 00:15:04,630 Number of letters and constant time. 340 00:15:04,630 --> 00:15:05,540 So let's go with that first. 341 00:15:05,540 --> 00:15:06,410 The number of letters. 342 00:15:06,410 --> 00:15:10,195 Well, this data structure, recall, is like a tree, a family tree, each of 343 00:15:10,195 --> 00:15:12,860 whose nodes are made up of arrays. 344 00:15:12,860 --> 00:15:16,300 And those arrays are pointers to other such nodes, or other such 345 00:15:16,300 --> 00:15:17,670 arrays in the tree. 346 00:15:17,670 --> 00:15:22,890 >> So if we wanted to then determine whether Maxwell is in here, I might go 347 00:15:22,890 --> 00:15:26,890 to the first array, at the very top of the tree, the so-called root, top of 348 00:15:26,890 --> 00:15:30,521 the trie, and then follow the m pointer, then the a pointer, then x, 349 00:15:30,521 --> 00:15:31,710 w, e, l, l. 350 00:15:31,710 --> 00:15:34,910 And then when I see some special symbol, denoted here as a triangle. 351 00:15:34,910 --> 00:15:38,480 In code you'll see we propose that you implemented as a bool, just saying yes 352 00:15:38,480 --> 00:15:40,540 or no, a word stops here. 353 00:15:40,540 --> 00:15:45,270 >> Well, once we've gone to M-A-X-W-E-L-L, feels like seven, maybe 354 00:15:45,270 --> 00:15:48,910 eight if we go one past it, eight steps to find Maxwell. 355 00:15:48,910 --> 00:15:53,050 Or let's call it K. But recall last time, I argued that if there's 356 00:15:53,050 --> 00:15:57,540 realistically a maximum length on a word, like 40-some-odd characters, a 357 00:15:57,540 --> 00:16:00,810 maximum length implies a constant value. 358 00:16:00,810 --> 00:16:05,770 So really, yes, it's technically big O of 8 or 7, or really big O of K. But 359 00:16:05,770 --> 00:16:09,420 if there's a finite cap on what K could be, it's a constant. 360 00:16:09,420 --> 00:16:12,080 And so it's big O of 1 at the end of the day. 361 00:16:12,080 --> 00:16:13,040 >> Not in the real world. 362 00:16:13,040 --> 00:16:15,960 Not when you actually start watching your clock as your program's running. 363 00:16:15,960 --> 00:16:20,690 It's absolutely going to be a bit slower than truly constant 364 00:16:20,690 --> 00:16:21,840 time with one step. 365 00:16:21,840 --> 00:16:25,540 It's going to be seven or eight steps, but still that's much, much better 366 00:16:25,540 --> 00:16:30,080 than an algorithm like big O of n that depends on the size of what's in the 367 00:16:30,080 --> 00:16:31,220 data structure. 368 00:16:31,220 --> 00:16:34,970 >> Notice the upside here is we can insert a million more names into this 369 00:16:34,970 --> 00:16:38,170 data structure, but how many more steps is it going to take us to find 370 00:16:38,170 --> 00:16:40,480 Maxwell in that case? 371 00:16:40,480 --> 00:16:40,780 None. 372 00:16:40,780 --> 00:16:41,820 He's unaffected. 373 00:16:41,820 --> 00:16:45,480 And to date, I don't think we've seen an example of a data structure or an 374 00:16:45,480 --> 00:16:48,560 algorithm that was completely unaffected by external 375 00:16:48,560 --> 00:16:50,040 behaviors like that. 376 00:16:50,040 --> 00:16:51,160 But this can't be amazing. 377 00:16:51,160 --> 00:16:52,900 This can't be the only solution for the p-set 378 00:16:52,900 --> 00:16:53,570 >> And it's not. 379 00:16:53,570 --> 00:16:55,980 This is not necessarily the data structure you should gravitate to, 380 00:16:55,980 --> 00:16:58,220 because like hash tables, tradeoff. 381 00:16:58,220 --> 00:17:00,500 What's the price you pay here? 382 00:17:00,500 --> 00:17:00,940 Memory. 383 00:17:00,940 --> 00:17:02,890 I mean, this is an atrocious amount of memory. 384 00:17:02,890 --> 00:17:05,569 And you can't quite see it here because the author of this picture 385 00:17:05,569 --> 00:17:09,420 obviously truncated all of the arrays, and we're not seeing lots of A's and 386 00:17:09,420 --> 00:17:12,700 B's and C's and Q's and Y's and Z's in these arrays. 387 00:17:12,700 --> 00:17:13,630 But they're there. 388 00:17:13,630 --> 00:17:17,660 >> Each of these nodes is a whole array of some 26 or more bytes, each of 389 00:17:17,660 --> 00:17:19,170 which represents a letter. 390 00:17:19,170 --> 00:17:22,920 27 in our case, so that we can support apostrophes in the problem set. 391 00:17:22,920 --> 00:17:27,030 So this data structure is really, really dense and wide. 392 00:17:27,030 --> 00:17:30,880 And that alone might end up slowing things down, or at least costing you a 393 00:17:30,880 --> 00:17:32,240 lot more space. 394 00:17:32,240 --> 00:17:34,020 But again, we can draw comparisons here. 395 00:17:34,020 --> 00:17:39,190 >> Recall a while back, we achieved much more exciting running time in sorting 396 00:17:39,190 --> 00:17:42,880 when we use merge sort, but the price we paid to achieve n log n for merge 397 00:17:42,880 --> 00:17:46,930 sort required that we spend more what resource? 398 00:17:46,930 --> 00:17:47,690 More space. 399 00:17:47,690 --> 00:17:50,530 We needed a secondary array to copy people into, just like 400 00:17:50,530 --> 00:17:51,620 we did here on stage. 401 00:17:51,620 --> 00:17:55,880 So again, no clear winners, but just subjective design 402 00:17:55,880 --> 00:17:57,710 decisions to be made. 403 00:17:57,710 --> 00:17:58,060 >> All right. 404 00:17:58,060 --> 00:17:59,130 So how about this? 405 00:17:59,130 --> 00:18:02,050 Anyone recognize which D-Hall? 406 00:18:02,050 --> 00:18:02,440 OK. 407 00:18:02,440 --> 00:18:03,170 So three of us do. 408 00:18:03,170 --> 00:18:03,750 Mather House. 409 00:18:03,750 --> 00:18:05,070 So this is for Mather's dining. 410 00:18:05,070 --> 00:18:09,650 I'll bet all the dining halls have stacks of trays like this. 411 00:18:09,650 --> 00:18:11,950 And this is actually representative of something we've 412 00:18:11,950 --> 00:18:13,050 obviously seen already. 413 00:18:13,050 --> 00:18:14,850 We called it literally a stack. 414 00:18:14,850 --> 00:18:18,970 And the stack, in terms of your computer's memory, is where data goes 415 00:18:18,970 --> 00:18:20,460 while functions are being called. 416 00:18:20,460 --> 00:18:23,410 >> For instance, what kinds of things go on the stack with respect to the 417 00:18:23,410 --> 00:18:27,420 memory layout we've discussed in weeks past? 418 00:18:27,420 --> 00:18:28,736 What's that? 419 00:18:28,736 --> 00:18:29,670 >> AUDIENCE: Calls to functions. 420 00:18:29,670 --> 00:18:30,260 >> DAVID MALAN: I'm sorry. 421 00:18:30,260 --> 00:18:31,210 >> AUDIENCE: Calls to functions. 422 00:18:31,210 --> 00:18:33,590 >> DAVID MALAN: Calls to functions, but specifically, what's inside of each of 423 00:18:33,590 --> 00:18:35,340 those frames? 424 00:18:35,340 --> 00:18:37,220 What kinds of things? 425 00:18:37,220 --> 00:18:37,460 Yeah. 426 00:18:37,460 --> 00:18:38,500 So local variables. 427 00:18:38,500 --> 00:18:43,080 Anytime we needed some local storage, like an argument, or int I, or int 428 00:18:43,080 --> 00:18:45,940 temp, or whatever the local variable is, we've been 429 00:18:45,940 --> 00:18:47,210 putting that on the stack. 430 00:18:47,210 --> 00:18:49,610 And we call it a stack because of that layering idea. 431 00:18:49,610 --> 00:18:52,940 Just kind of matches up with reality, the concept thereof. 432 00:18:52,940 --> 00:18:56,650 >> But it turns out that a stack can also be seen as a data structure, an 433 00:18:56,650 --> 00:19:00,110 alternative to an array, an alternative to a linked list. 434 00:19:00,110 --> 00:19:02,770 Something conceptually more interesting that can still be 435 00:19:02,770 --> 00:19:06,030 implemented using either of those things, but it's a different type of 436 00:19:06,030 --> 00:19:09,140 data structure supporting, really, only two operations. 437 00:19:09,140 --> 00:19:11,000 But you can add on fancier features than these. 438 00:19:11,000 --> 00:19:12,180 But these are the basics-- 439 00:19:12,180 --> 00:19:13,510 push and pop. 440 00:19:13,510 --> 00:19:19,240 >> And the idea with a stack is that if I have here, with or without Annenberg 441 00:19:19,240 --> 00:19:22,880 knowing, a tray from next door with the number 9 on it. 442 00:19:22,880 --> 00:19:23,870 So just an int. 443 00:19:23,870 --> 00:19:26,990 And I want to push this onto the data structure, which currently is empty. 444 00:19:26,990 --> 00:19:28,790 Consider this the bottom of the stack. 445 00:19:28,790 --> 00:19:33,150 I would push this number 9 onto the stack, and now it's right there. 446 00:19:33,150 --> 00:19:36,040 >> But the interesting thing about a stack is that if I now want to push 447 00:19:36,040 --> 00:19:40,210 some other value, like 17, and I push this onto the stack, I'm going to do 448 00:19:40,210 --> 00:19:43,290 the only intuitive thing, I'm just going to put it right where we humans 449 00:19:43,290 --> 00:19:45,180 would be inclined to put it, on top. 450 00:19:45,180 --> 00:19:48,850 But what's interesting now is, how do I get at 9? 451 00:19:48,850 --> 00:19:50,670 You know, I don't without some effort. 452 00:19:50,670 --> 00:19:54,070 >> So what's interesting about a stack is that by design, 453 00:19:54,070 --> 00:19:56,330 it's a LIFO data structure. 454 00:19:56,330 --> 00:19:59,680 Silly way of describing last in, first out. 455 00:19:59,680 --> 00:20:03,280 So the last number in at this time was 17. 456 00:20:03,280 --> 00:20:07,540 So if I want to pop something off of the stack, it can only be 17. 457 00:20:07,540 --> 00:20:11,890 So there's a mandatory order of operations here, where the last item 458 00:20:11,890 --> 00:20:14,260 in has to be the first one out. 459 00:20:14,260 --> 00:20:16,440 Hence the acronym, LIFO. 460 00:20:16,440 --> 00:20:19,160 >> So why might this be useful? 461 00:20:19,160 --> 00:20:22,690 Are their contexts in which you'd want a data structure like this? 462 00:20:22,690 --> 00:20:24,810 Well, it's certainly been useful inside of a computer. 463 00:20:24,810 --> 00:20:29,050 So operating systems clearly use this kind of data structure for stacks. 464 00:20:29,050 --> 00:20:32,800 We'll also see the same idea when it comes to web pages. 465 00:20:32,800 --> 00:20:35,890 So this week and next week and beyond, and as you start implementing web 466 00:20:35,890 --> 00:20:39,490 pages in a language called HTML, you can actually use a data structure like 467 00:20:39,490 --> 00:20:42,690 this to determine if the page is correctly formatted. 468 00:20:42,690 --> 00:20:47,170 Because we'll see all web pages follow a sort of hierarchy, an indentation 469 00:20:47,170 --> 00:20:52,030 that will, at the end of the day, be a tree structure underneath the hood. 470 00:20:52,030 --> 00:20:53,620 So more on that in just a bit. 471 00:20:53,620 --> 00:20:56,560 >> But for now, let's propose for a moment, how could we go about 472 00:20:56,560 --> 00:20:58,830 representing what a stack is? 473 00:20:58,830 --> 00:21:03,370 Let me propose that we implement a stack with code like this. 474 00:21:03,370 --> 00:21:07,990 So a stack is going to have inside of it two things, an array, called trays, 475 00:21:07,990 --> 00:21:09,510 just to be consistent with the demo. 476 00:21:09,510 --> 00:21:12,660 And each of the items in that array is going to be a type int. 477 00:21:12,660 --> 00:21:14,740 And capacity is presumably what? 478 00:21:14,740 --> 00:21:18,796 Because I've not written the full definition here. 479 00:21:18,796 --> 00:21:21,535 >> It's probably the maximum size of the array. 480 00:21:21,535 --> 00:21:25,150 And it's probably declared as a sharp define at the top of the file, some 481 00:21:25,150 --> 00:21:28,450 kind of constant as implied by the mere capitalization. 482 00:21:28,450 --> 00:21:32,250 So somewhere capacity is defined as the maximum possible size. 483 00:21:32,250 --> 00:21:35,590 Meanwhile, inside of the data structure known as a stack there will 484 00:21:35,590 --> 00:21:38,630 be an integer just known simply as size. 485 00:21:38,630 --> 00:21:43,400 >> So if I were to represent this now pictorially, let's suppose that this 486 00:21:43,400 --> 00:21:48,070 whole black box represents my stack. 487 00:21:48,070 --> 00:21:50,070 Inside of it is two variables. 488 00:21:50,070 --> 00:21:54,780 So I'm going to draw the first one as size. 489 00:21:54,780 --> 00:21:57,420 And the second one I'm going to draw as an array. 490 00:21:57,420 --> 00:22:01,060 >> But just to keep things orderly, normally I would draw an array like 491 00:22:01,060 --> 00:22:04,910 this, but it's kind of nice if we match reality, or 492 00:22:04,910 --> 00:22:06,230 match the mental model. 493 00:22:06,230 --> 00:22:12,880 So let me instead draw the array vertically, which is just, again, 494 00:22:12,880 --> 00:22:13,840 artist's rendition. 495 00:22:13,840 --> 00:22:16,610 Doesn't really matter what it is underneath the hood. 496 00:22:16,610 --> 00:22:20,350 And we'll say that, by default, capacity is going to be three. 497 00:22:20,350 --> 00:22:23,480 So this will be location 0, this will be location 1, this 498 00:22:23,480 --> 00:22:25,740 will be location 2. 499 00:22:25,740 --> 00:22:29,330 >> If I bribe with a stress ball, would someone like to come up and run the 500 00:22:29,330 --> 00:22:30,870 board here for just a moment? 501 00:22:30,870 --> 00:22:31,960 OK, saw your hand first. 502 00:22:31,960 --> 00:22:33,950 Come on up. 503 00:22:33,950 --> 00:22:36,500 All right. 504 00:22:36,500 --> 00:22:38,760 So I believe it is Steven. 505 00:22:38,760 --> 00:22:40,035 Come on up. 506 00:22:40,035 --> 00:22:40,770 All right. 507 00:22:40,770 --> 00:22:46,760 >> But suppose now we rewind to the initial state of the world where I 508 00:22:46,760 --> 00:22:52,180 have just declared a stack, and it's going to be of capacity three. 509 00:22:52,180 --> 00:22:54,470 But size has not yet been determined. 510 00:22:54,470 --> 00:22:56,100 Trays has not yet been determined. 511 00:22:56,100 --> 00:22:57,300 So a couple of questions first. 512 00:22:57,300 --> 00:23:01,310 And let me give you mic so you can partake more actively in this. 513 00:23:01,310 --> 00:23:05,190 >> So what is inside of size at this moment in time if all I have done is 514 00:23:05,190 --> 00:23:09,340 declared a stack with one line of code? 515 00:23:09,340 --> 00:23:10,100 >> STEVEN: Not much. 516 00:23:10,100 --> 00:23:12,080 >> DAVID MALAN: OK, not much. 517 00:23:12,080 --> 00:23:14,410 Do we know what's inside of size, do we know what's inside 518 00:23:14,410 --> 00:23:16,330 of this array here? 519 00:23:16,330 --> 00:23:18,630 >> STEVEN: Just random code, right? 520 00:23:18,630 --> 00:23:20,220 Just-- 521 00:23:20,220 --> 00:23:23,230 >> DAVID MALAN: Yeah, I'm going to call it code, but random-- 522 00:23:23,230 --> 00:23:23,820 >> STEVEN: Things. 523 00:23:23,820 --> 00:23:28,290 >> DAVID MALAN: Things like random 524 00:23:28,290 --> 00:23:28,870 >> STEVEN: Bits. 525 00:23:28,870 --> 00:23:29,530 >> DAVID MALAN: Bits, right? 526 00:23:29,530 --> 00:23:31,190 So garbage values, right? 527 00:23:31,190 --> 00:23:33,470 So permutations of 0's and 1's. 528 00:23:33,470 --> 00:23:35,920 Remnants of previous usages of this memory. 529 00:23:35,920 --> 00:23:38,150 And we don't really know what the values are, so we typically draw them 530 00:23:38,150 --> 00:23:38,930 as question marks. 531 00:23:38,930 --> 00:23:41,990 >> So the first thing we're presumably going to want to do here-- 532 00:23:41,990 --> 00:23:46,630 and let me give this field inside of there a name-- trays. 533 00:23:46,630 --> 00:23:49,540 What should we presumably initialize size to if we want to 534 00:23:49,540 --> 00:23:51,040 start using this stack? 535 00:23:51,040 --> 00:23:53,070 >> STEVEN: Tray is sub 3. 536 00:23:53,070 --> 00:23:53,910 >> DAVID MALAN: So, OK. 537 00:23:53,910 --> 00:23:56,710 To be clear, capacity is declared elsewhere as three. 538 00:23:56,710 --> 00:23:58,570 And that's what I've used to allocate the array. 539 00:23:58,570 --> 00:24:03,535 Size is going to refer to how many trays are currently on the stack. 540 00:24:03,535 --> 00:24:03,880 >> STEVEN: Zero. 541 00:24:03,880 --> 00:24:04,460 >> DAVID MALAN: So it should be zero. 542 00:24:04,460 --> 00:24:07,760 So go ahead, and with any finger, draw a zero in size. 543 00:24:07,760 --> 00:24:08,440 All right. 544 00:24:08,440 --> 00:24:10,920 So now, what's inside of this here, we don't know. 545 00:24:10,920 --> 00:24:12,160 These are really just garbage values. 546 00:24:12,160 --> 00:24:14,800 So we could draw question marks, but let's keep the board clean for now 547 00:24:14,800 --> 00:24:16,300 because it doesn't matter what's there. 548 00:24:16,300 --> 00:24:19,130 We don't need to initialize the array to anything, because if we know that 549 00:24:19,130 --> 00:24:23,100 the size of the stack is zero, well, we shouldn't be looking at anything in 550 00:24:23,100 --> 00:24:25,590 this array anyway at this point in time. 551 00:24:25,590 --> 00:24:29,970 >> So now suppose that I push the number 9 onto the stack. 552 00:24:29,970 --> 00:24:33,750 How should we update the data structure inside of this black box? 553 00:24:33,750 --> 00:24:35,540 What values need to change? 554 00:24:35,540 --> 00:24:36,200 >> STEVEN: Within-- 555 00:24:36,200 --> 00:24:37,400 the size? 556 00:24:37,400 --> 00:24:37,650 >> DAVID MALAN: OK. 557 00:24:37,650 --> 00:24:38,770 Size should become what? 558 00:24:38,770 --> 00:24:39,580 >> STEVEN: Size would be one. 559 00:24:39,580 --> 00:24:39,870 >> DAVID MALAN: OK. 560 00:24:39,870 --> 00:24:41,110 So size should become one. 561 00:24:41,110 --> 00:24:42,540 So you can do this in a couple ways. 562 00:24:42,540 --> 00:24:46,920 Let me give you, now your finger is an eraser. 563 00:24:46,920 --> 00:24:47,260 All right. 564 00:24:47,260 --> 00:24:49,960 Then now your finger is a brush. 565 00:24:49,960 --> 00:24:50,330 All right. 566 00:24:50,330 --> 00:24:52,820 And now what else has to change, obviously, in the data structure? 567 00:24:52,820 --> 00:24:57,060 >> STEVEN: We're going from bottom up to 9. 568 00:24:57,060 --> 00:24:57,760 >> DAVID MALAN: 9. 569 00:24:57,760 --> 00:24:58,420 OK, Good. 570 00:24:58,420 --> 00:25:01,550 So still doesn't matter what's at location one or two because they're 571 00:25:01,550 --> 00:25:04,520 garbage values, but we shouldn't bother looking there because size is 572 00:25:04,520 --> 00:25:07,540 telling us that only the first element is actually legitimate. 573 00:25:07,540 --> 00:25:10,400 So now I push 17 onto the list. 574 00:25:10,400 --> 00:25:11,830 What happens to this picture? 575 00:25:11,830 --> 00:25:14,720 >> STEVEN: So size is going to go to two. 576 00:25:14,720 --> 00:25:15,300 >> DAVID MALAN: OK. 577 00:25:15,300 --> 00:25:16,070 You're eraser-- 578 00:25:16,070 --> 00:25:16,810 oops. 579 00:25:16,810 --> 00:25:18,026 You're an eraser. 580 00:25:18,026 --> 00:25:18,840 >> STEVEN: Eraser. 581 00:25:18,840 --> 00:25:19,720 >> DAVID MALAN: You're a brush. 582 00:25:19,720 --> 00:25:20,560 >> STEVEN: Brush. 583 00:25:20,560 --> 00:25:20,920 >> DAVID MALAN: OK. 584 00:25:20,920 --> 00:25:21,600 And what else? 585 00:25:21,600 --> 00:25:22,600 >> STEVEN: And then we-- 586 00:25:22,600 --> 00:25:22,915 >> DAVID MALAN: We pushed 17. 587 00:25:22,915 --> 00:25:24,760 >> STEVEN: We stick 17 on top, so-- 588 00:25:24,760 --> 00:25:25,710 >> DAVID MALAN: OK, good. 589 00:25:25,710 --> 00:25:27,040 >> STEVEN: --drop it down. 590 00:25:27,040 --> 00:25:27,530 >> DAVID MALAN: All right. 591 00:25:27,530 --> 00:25:27,940 It's getting easy. 592 00:25:27,940 --> 00:25:29,300 I'm not going to help you this time. 593 00:25:29,300 --> 00:25:30,510 Push 22. 594 00:25:30,510 --> 00:25:31,720 >> STEVEN: Done. 595 00:25:31,720 --> 00:25:34,870 Becoming an eraser. 596 00:25:34,870 --> 00:25:37,340 I'm becoming a brush. 597 00:25:37,340 --> 00:25:39,340 And then I am putting 22. 598 00:25:39,340 --> 00:25:40,100 >> DAVID MALAN: 22. 599 00:25:40,100 --> 00:25:40,620 Excellent. 600 00:25:40,620 --> 00:25:41,380 So one more time. 601 00:25:41,380 --> 00:25:44,280 I'm now going to push onto the stack 26. 602 00:25:44,280 --> 00:25:46,350 >> STEVEN: Ooh. 603 00:25:46,350 --> 00:25:50,278 Oh gosh. 604 00:25:50,278 --> 00:25:52,520 You really caught me off guard. 605 00:25:52,520 --> 00:25:53,703 >> DAVID MALAN: You didn't see this coming? 606 00:25:53,703 --> 00:25:55,930 >> STEVEN: I didn't see this coming. 607 00:25:55,930 --> 00:25:58,756 Could we re-initial capacity? 608 00:25:58,756 --> 00:25:59,790 >> DAVID MALAN: That's a good question. 609 00:25:59,790 --> 00:26:02,360 So we've kind of painted ourselves in a corner here. 610 00:26:02,360 --> 00:26:06,740 There really is no good out for Steven because we've allocated this array 611 00:26:06,740 --> 00:26:09,130 statically, so to speak, inside of the data structure. 612 00:26:09,130 --> 00:26:12,170 And we've essentially hard coded it to be of size three. 613 00:26:12,170 --> 00:26:14,170 So we can't really reallocate it. 614 00:26:14,170 --> 00:26:20,020 >> We could if we went back in, we redefined trays to be a pointer that 615 00:26:20,020 --> 00:26:22,300 we then use malloc to hand memory to. 616 00:26:22,300 --> 00:26:25,050 Because if we got the memory from the heap via malloc, we 617 00:26:25,050 --> 00:26:26,430 could then free it. 618 00:26:26,430 --> 00:26:29,630 But before freeing it, we could reallocate a bigger chunk of memory, 619 00:26:29,630 --> 00:26:31,330 update the pointer, and so forth. 620 00:26:31,330 --> 00:26:33,500 But for now, this is really the best we can do. 621 00:26:33,500 --> 00:26:36,360 Push and pop are presumably going to have to signal some error. 622 00:26:36,360 --> 00:26:40,270 >> So for instance, our implementation of push could return a bool which 623 00:26:40,270 --> 00:26:42,390 previously returned true, true, true. 624 00:26:42,390 --> 00:26:48,390 But the fourth time, it's going to have to return false, for instance. 625 00:26:48,390 --> 00:26:48,540 All right. 626 00:26:48,540 --> 00:26:49,540 Very well done. 627 00:26:49,540 --> 00:26:50,060 Congratulations. 628 00:26:50,060 --> 00:26:52,160 You've earned your stress ball today. 629 00:26:52,160 --> 00:26:53,110 >> [APPLAUSE] 630 00:26:53,110 --> 00:26:54,382 >> STEVEN: Thank you. 631 00:26:54,382 --> 00:26:55,680 >> DAVID MALAN: Thank you. 632 00:26:55,680 --> 00:26:59,740 OK, so this seems to be not much of a step forward, right? 633 00:26:59,740 --> 00:27:01,410 We described this data structure. 634 00:27:01,410 --> 00:27:02,320 It's been compelling, right? 635 00:27:02,320 --> 00:27:03,200 Operating systems like it. 636 00:27:03,200 --> 00:27:06,360 Apparently the web can make use of this, and other applications still. 637 00:27:06,360 --> 00:27:10,870 But what a stupid limitation that we're back to sort of week two limits 638 00:27:10,870 --> 00:27:12,880 where we have fixed size arrays. 639 00:27:12,880 --> 00:27:15,010 >> So there are indeed a couple of ways we could solve this. 640 00:27:15,010 --> 00:27:18,750 We could dynamically allocate the array, not by hard coding it as I've 641 00:27:18,750 --> 00:27:22,600 done here, but instead re-declaring this, just to be clear, as 642 00:27:22,600 --> 00:27:23,830 something like this. 643 00:27:23,830 --> 00:27:29,040 Int* trays, not deciding on a capacity yet. 644 00:27:29,040 --> 00:27:35,460 But when I declare the stack elsewhere in my code, I could then call malloc, 645 00:27:35,460 --> 00:27:38,250 get the address of a chunk of memory, and I could assign 646 00:27:38,250 --> 00:27:39,980 that address to trays. 647 00:27:39,980 --> 00:27:43,340 >> And then, because it's just a chunk of memory, I could continue to use square 648 00:27:43,340 --> 00:27:45,450 bracket notation in the usual way. 649 00:27:45,450 --> 00:27:49,020 Because again, there's sort of this functional equivalent of arrays and 650 00:27:49,020 --> 00:27:50,820 chunks of memory that come back from malloc. 651 00:27:50,820 --> 00:27:53,090 We can treat one as the other using pointer arithmetic or 652 00:27:53,090 --> 00:27:54,440 square bracket notation. 653 00:27:54,440 --> 00:27:55,660 So that's one approach. 654 00:27:55,660 --> 00:28:00,120 >> But how else might we implement this same data structure, potentially? 655 00:28:00,120 --> 00:28:00,280 Right? 656 00:28:00,280 --> 00:28:04,530 I feel like we just solved this problem like a week ago. 657 00:28:04,530 --> 00:28:08,860 What was the solution to this problem that Steven ran into? 658 00:28:08,860 --> 00:28:10,370 So linked lists, right. 659 00:28:10,370 --> 00:28:13,410 >> If the problem is that we're painting ourselves into a corner by allocating 660 00:28:13,410 --> 00:28:17,580 in advance too little memory that we then have to somehow deal with, well, 661 00:28:17,580 --> 00:28:19,880 why not just avoid that issue altogether? 662 00:28:19,880 --> 00:28:26,170 Why not just declare trays to be a pointer to a node, ergo a linked list, 663 00:28:26,170 --> 00:28:30,740 and then simply allocate new nodes every time Steven needed to fit a 664 00:28:30,740 --> 00:28:32,400 number into the data structure. 665 00:28:32,400 --> 00:28:34,200 >> So the picture would have to change. 666 00:28:34,200 --> 00:28:38,220 It's not going to be as clean and as simple as just an array of three ints. 667 00:28:38,220 --> 00:28:42,970 Now it's going to be a pointer to a struct, and that struct is going to 668 00:28:42,970 --> 00:28:44,830 have an int and a next pointer. 669 00:28:44,830 --> 00:28:47,670 It's going to lead via that pointer to another such struct to 670 00:28:47,670 --> 00:28:48,600 another such struct. 671 00:28:48,600 --> 00:28:50,560 So the picture would actually get a bit messier. 672 00:28:50,560 --> 00:28:52,950 And we'd have arrows tying everything together. 673 00:28:52,950 --> 00:28:55,280 >> But that's fine, right, because we've seen how to do this. 674 00:28:55,280 --> 00:28:58,180 And once you get comfortable implementing something like a linked 675 00:28:58,180 --> 00:29:01,450 list, which you'll have to do if you choose to implement a hash table with 676 00:29:01,450 --> 00:29:05,120 separate chaining for p-set 6, you can use it as a building block, or an 677 00:29:05,120 --> 00:29:08,870 ingredient, or in Scratch speak, a procedure, something that you put, you 678 00:29:08,870 --> 00:29:12,560 created your own puzzle piece that you can then reuse. 679 00:29:12,560 --> 00:29:17,090 So tradeoffs, but potential solutions that we've actually seen before. 680 00:29:17,090 --> 00:29:20,560 >> So quite often, you see this every year or two when Apple releases 681 00:29:20,560 --> 00:29:23,060 something new, and all the crazy people line up outside of the Apple 682 00:29:23,060 --> 00:29:27,050 store to buy their marginal upgrade on hardware. 683 00:29:27,050 --> 00:29:30,420 I say this, it's OK, because I am one of those people. 684 00:29:30,420 --> 00:29:35,140 So what kind of data structure might represent this reality? 685 00:29:35,140 --> 00:29:36,980 >> Well, let's call it a queue, a line. 686 00:29:36,980 --> 00:29:40,270 So British would call it typically a queue anyway, so it's a nice name. 687 00:29:40,270 --> 00:29:44,960 And the two operations that a queue shall support we'll call a enqueue 688 00:29:44,960 --> 00:29:48,900 operation and a dequeue operation, which are similar in 689 00:29:48,900 --> 00:29:50,120 spirit to push and pop. 690 00:29:50,120 --> 00:29:54,060 It's just sort of different in convention, what we're calling these. 691 00:29:54,060 --> 00:29:57,680 But to enqueue something means to add or insert it to the data structure. 692 00:29:57,680 --> 00:29:59,570 To dequeue means to remove it. 693 00:29:59,570 --> 00:30:05,170 But whereas a stack was a LIFO data structure, a queue is a first in, 694 00:30:05,170 --> 00:30:06,740 first out data structure. 695 00:30:06,740 --> 00:30:10,050 >> If you are the first person in line, you will be the first person to get 696 00:30:10,050 --> 00:30:12,420 out of line and buy your new device. 697 00:30:12,420 --> 00:30:18,070 Imagine how upset these people would be if Apple instead used a stack, for 698 00:30:18,070 --> 00:30:21,250 instance, to implement the picking up of your new toy. 699 00:30:21,250 --> 00:30:24,310 So queues make sense, certainly, and we can think of all sorts of 700 00:30:24,310 --> 00:30:27,480 applications, presumably, for queues, especially when you want fairness. 701 00:30:27,480 --> 00:30:30,040 So how might we implement these as a data structure? 702 00:30:30,040 --> 00:30:33,680 >> Well, I propose that we might need to do it this way. 703 00:30:33,680 --> 00:30:35,225 So I'm going to now have numbers. 704 00:30:35,225 --> 00:30:38,190 So we'll keep it simple and not necessarily talk in terms of trays. 705 00:30:38,190 --> 00:30:40,220 Just numbers that people of gotten. 706 00:30:40,220 --> 00:30:43,760 Capacity is going to, again, fix the total number of people that can be in 707 00:30:43,760 --> 00:30:46,900 this line, as three or whatever other value. 708 00:30:46,900 --> 00:30:50,760 >> But I propose that I need to keep track not only of the size of the 709 00:30:50,760 --> 00:30:52,370 queue, how many things are in it. 710 00:30:52,370 --> 00:30:56,310 So size is the current size, capacity is the maximum size. 711 00:30:56,310 --> 00:30:58,540 Just again, nomenclature by convention. 712 00:30:58,540 --> 00:31:03,680 Why do I need an additional int inside of a queue to keep track of who's in 713 00:31:03,680 --> 00:31:05,365 front of the line? 714 00:31:05,365 --> 00:31:07,930 715 00:31:07,930 --> 00:31:10,910 Why do I need to do that in this case? 716 00:31:10,910 --> 00:31:14,750 717 00:31:14,750 --> 00:31:16,190 >> Well, how is this picture going to change? 718 00:31:16,190 --> 00:31:19,280 I can probably reuse most of this picture. 719 00:31:19,280 --> 00:31:21,480 Let me go ahead and erase what's here. 720 00:31:21,480 --> 00:31:24,580 We'll give this a slightly different name up here. 721 00:31:24,580 --> 00:31:28,930 Let's get rid of the 17, let's get rid of the 9, let's get rid of the 3. 722 00:31:28,930 --> 00:31:30,410 And let's add one other thing. 723 00:31:30,410 --> 00:31:34,710 I propose that I need to keep track of the front of the list, which is just 724 00:31:34,710 --> 00:31:35,570 going to be an int as well. 725 00:31:35,570 --> 00:31:36,550 And we're going to keep it simple. 726 00:31:36,550 --> 00:31:37,740 No linked list for now. 727 00:31:37,740 --> 00:31:40,900 >> We'll admit that we're going to bump up against this limit. 728 00:31:40,900 --> 00:31:43,720 But what do I want to see happen this time? 729 00:31:43,720 --> 00:31:47,240 So suppose I go ahead and the first person comes up in line, and 730 00:31:47,240 --> 00:31:48,560 it's the number 9. 731 00:31:48,560 --> 00:31:49,680 We do have stress balls. 732 00:31:49,680 --> 00:31:51,330 Can I steal, say, two or three people? 733 00:31:51,330 --> 00:31:52,690 One, two, three? 734 00:31:52,690 --> 00:31:53,120 Come on up. 735 00:31:53,120 --> 00:31:56,022 Right from the front, because we'll make this one quick. 736 00:31:56,022 --> 00:31:59,415 >> Each of you is now going to be a fan boy in line at Apple. 737 00:31:59,415 --> 00:32:03,970 738 00:32:03,970 --> 00:32:06,210 You will not be receiving Apple hardware at the end of this though. 739 00:32:06,210 --> 00:32:06,500 All right. 740 00:32:06,500 --> 00:32:09,430 So you're number 9, you're number 17, number 22. 741 00:32:09,430 --> 00:32:12,130 These are arbitrary numbers, like student IDs or whatnot. 742 00:32:12,130 --> 00:32:14,550 And in just a moment, let's begin to start adding things. 743 00:32:14,550 --> 00:32:16,000 And I'll run the board here this time. 744 00:32:16,000 --> 00:32:19,570 >> So in this case, I've initialized the front to be-- 745 00:32:19,570 --> 00:32:22,380 I actually don't really care what the front is, because the size is zero. 746 00:32:22,380 --> 00:32:24,480 So this might as well just be a question mark. 747 00:32:24,480 --> 00:32:26,170 These are all question marks. 748 00:32:26,170 --> 00:32:29,880 So now we'll begin to actually see some people lining up at the store. 749 00:32:29,880 --> 00:32:33,320 >> So if number 9, you're the first one there at 5 AM, go ahead and line up, 750 00:32:33,320 --> 00:32:34,210 or the night before. 751 00:32:34,210 --> 00:32:34,580 OK. 752 00:32:34,580 --> 00:32:35,940 So now 9 is here. 753 00:32:35,940 --> 00:32:37,940 So 9 is in the front of the list. 754 00:32:37,940 --> 00:32:41,440 So I'm going to go ahead and update the size of this current data 755 00:32:41,440 --> 00:32:44,740 structure to not be 0 anymore, but to be 1. 756 00:32:44,740 --> 00:32:47,630 I'm going to put 9 at the front of the list. 757 00:32:47,630 --> 00:32:51,020 Let me go ahead and toggle the screen so we can see past us here. 758 00:32:51,020 --> 00:32:53,220 >> And now what do I want to put at front? 759 00:32:53,220 --> 00:32:56,240 I'm going to keep track that the front of the queue right now 760 00:32:56,240 --> 00:32:58,570 is at location 0. 761 00:32:58,570 --> 00:33:00,510 Because what is next going to happen? 762 00:33:00,510 --> 00:33:03,000 Well, suppose now I enqueue 17 as well. 763 00:33:03,000 --> 00:33:04,510 So hop in line there. 764 00:33:04,510 --> 00:33:07,060 And again, the sort of door to the store is going to be here. 765 00:33:07,060 --> 00:33:08,700 So now I've added 17. 766 00:33:08,700 --> 00:33:10,810 And even though these guys are blocking the screen, that's OK, 767 00:33:10,810 --> 00:33:12,300 because we can see it up here. 768 00:33:12,300 --> 00:33:12,910 Sorry. 769 00:33:12,910 --> 00:33:13,810 >> AUDIENCE: We can move-- 770 00:33:13,810 --> 00:33:14,660 >> DAVID MALAN: No, that's OK. 771 00:33:14,660 --> 00:33:16,000 It's huge up there. 772 00:33:16,000 --> 00:33:18,580 So 17 is now inside of the queue. 773 00:33:18,580 --> 00:33:21,332 I need to update which fields now though? 774 00:33:21,332 --> 00:33:23,210 OK, definitely size. 775 00:33:23,210 --> 00:33:26,430 And how about front? 776 00:33:26,430 --> 00:33:27,040 OK, no. 777 00:33:27,040 --> 00:33:30,200 Front should not change, because unlike a stack, we 778 00:33:30,200 --> 00:33:31,370 want to maintain fairness. 779 00:33:31,370 --> 00:33:35,150 So if 9 came in first, we want 9 to be the first out of the line 780 00:33:35,150 --> 00:33:36,420 and into the store. 781 00:33:36,420 --> 00:33:37,220 >> In fact, let's see that. 782 00:33:37,220 --> 00:33:42,235 Before we insert 22, let's go ahead and dequeue 9. 783 00:33:42,235 --> 00:33:42,970 What's your name again? 784 00:33:42,970 --> 00:33:43,680 >> AUDIENCE: Jake. 785 00:33:43,680 --> 00:33:45,440 >> DAVID MALAN: Jake is going to be dequeued now. 786 00:33:45,440 --> 00:33:48,050 So you get to walk into the store. 787 00:33:48,050 --> 00:33:49,880 And pretend that the store is over there. 788 00:33:49,880 --> 00:33:51,970 So now what needs-- dit-dit-dit! 789 00:33:51,970 --> 00:33:53,400 What needs to happen now? 790 00:33:53,400 --> 00:33:54,490 Design decision. 791 00:33:54,490 --> 00:33:56,825 So not a bad instinct, but-- what's your name again? 792 00:33:56,825 --> 00:33:57,090 >> AUDIENCE: David. 793 00:33:57,090 --> 00:33:57,500 >> DAVID MALAN: David. 794 00:33:57,500 --> 00:33:58,810 So what did David do? 795 00:33:58,810 --> 00:34:02,590 He was trying to sort of fix the data structure and move from his location 796 00:34:02,590 --> 00:34:04,100 into Jake's former location. 797 00:34:04,100 --> 00:34:06,740 And that's fine if we're willing to accept that as an 798 00:34:06,740 --> 00:34:08,199 implementation detail. 799 00:34:08,199 --> 00:34:11,100 But first, let's update the data structure before we do that. 800 00:34:11,100 --> 00:34:14,139 Because I'm not liking the idea of all the people shifting in this line. 801 00:34:14,139 --> 00:34:17,360 >> It's no big deal if David does it with one step, but again, think back to 802 00:34:17,360 --> 00:34:20,360 when we've had eight volunteers on the stage and we've done like insertion 803 00:34:20,360 --> 00:34:22,600 sort, where we had to start moving everyone around. 804 00:34:22,600 --> 00:34:23,790 That got expensive, right? 805 00:34:23,790 --> 00:34:28,330 That makes me cringe about big O of n, big O of n squared again. 806 00:34:28,330 --> 00:34:30,650 It's not feeling like an ideal outcome. 807 00:34:30,650 --> 00:34:32,080 >> So let's just update this. 808 00:34:32,080 --> 00:34:35,120 So the size of the queue is no longer 2. 809 00:34:35,120 --> 00:34:37,090 It's now simply 1. 810 00:34:37,090 --> 00:34:40,360 But I can now update something I didn't update before, the 811 00:34:40,360 --> 00:34:41,130 front of the list. 812 00:34:41,130 --> 00:34:45,420 I could just say, is that location 1? 813 00:34:45,420 --> 00:34:49,770 So now we have garbage value here, garbage value here, and David in the 814 00:34:49,770 --> 00:34:51,469 middle of this garbage. 815 00:34:51,469 --> 00:34:54,980 But the data structure is still intact. 816 00:34:54,980 --> 00:34:58,540 >> And in fact, I don't even need to change Jake's former number 817 00:34:58,540 --> 00:35:00,460 9, because who cares. 818 00:35:00,460 --> 00:35:04,470 I have enough information now in the size that I know there's one person in 819 00:35:04,470 --> 00:35:05,030 this queue. 820 00:35:05,030 --> 00:35:08,340 And I know that that person is at location 1, not 0. 821 00:35:08,340 --> 00:35:09,760 I'm not counting. 822 00:35:09,760 --> 00:35:11,300 So 1 as well. 823 00:35:11,300 --> 00:35:13,410 So the data structure's still OK. 824 00:35:13,410 --> 00:35:14,330 >> Well, what happens next? 825 00:35:14,330 --> 00:35:15,010 Let's enqueue-- 826 00:35:15,010 --> 00:35:15,370 what's your name? 827 00:35:15,370 --> 00:35:16,160 >> AUDIENCE: Callen. 828 00:35:16,160 --> 00:35:16,580 >> DAVID MALAN: Callen. 829 00:35:16,580 --> 00:35:20,770 Let's enqueue a Callen, and 22 is now in the queue. 830 00:35:20,770 --> 00:35:22,300 So now what has to change here? 831 00:35:22,300 --> 00:35:24,380 Front is not going to change, obviously. 832 00:35:24,380 --> 00:35:27,160 Size is going to change to be 2 again. 833 00:35:27,160 --> 00:35:31,590 And 22 ends up here, 9 is still present, but it's effectively a 834 00:35:31,590 --> 00:35:32,600 garbage value now. 835 00:35:32,600 --> 00:35:35,910 It's just a remnant of Jake past. 836 00:35:35,910 --> 00:35:39,200 >> So now what happens if I dequeue David? 837 00:35:39,200 --> 00:35:41,560 One last operation, dequeue David. 838 00:35:41,560 --> 00:35:46,070 We could shift, but I propose let's do as little work as possible. 839 00:35:46,070 --> 00:35:50,280 Now my data structure goes back in size from 2 to 1. 840 00:35:50,280 --> 00:35:53,730 But the front of the queue now becomes 2. 841 00:35:53,730 --> 00:35:56,640 I don't need to change these numbers just yet, because they're 842 00:35:56,640 --> 00:35:58,230 just garbage values. 843 00:35:58,230 --> 00:35:59,720 >> But now what happens? 844 00:35:59,720 --> 00:36:03,280 Suppose I enqueue myself, 26? 845 00:36:03,280 --> 00:36:05,890 I feel like I belong over here. 846 00:36:05,890 --> 00:36:06,890 So I'm being enqueued. 847 00:36:06,890 --> 00:36:08,760 So I kind of belong here. 848 00:36:08,760 --> 00:36:11,300 And even though you don't quite appreciate this visually on the stage, 849 00:36:11,300 --> 00:36:15,075 because we have plenty of room, I should not be standing here, why? 850 00:36:15,075 --> 00:36:16,290 >> AUDIENCE: You're out of bounds. 851 00:36:16,290 --> 00:36:16,370 >> DAVID MALAN: Right. 852 00:36:16,370 --> 00:36:16,940 I'm out of bounds. 853 00:36:16,940 --> 00:36:19,330 I've indexed beyond the bounds of this array. 854 00:36:19,330 --> 00:36:23,420 I really should be in one of the three possible locations. 855 00:36:23,420 --> 00:36:25,150 Now, where's most natural to go? 856 00:36:25,150 --> 00:36:27,760 I propose we leveraged a week one trick. 857 00:36:27,760 --> 00:36:30,150 The mod operator, percentage. 858 00:36:30,150 --> 00:36:36,850 Because I'm technically standing at location 3, but I do 3 mod capacity, 859 00:36:36,850 --> 00:36:40,250 so 3, a percent sign, 3-- 860 00:36:40,250 --> 00:36:40,970 capacity's 3. 861 00:36:40,970 --> 00:36:41,720 What's that? 862 00:36:41,720 --> 00:36:43,700 What's the remainder when you divide 3 by 3? 863 00:36:43,700 --> 00:36:44,070 0. 864 00:36:44,070 --> 00:36:48,140 >> So that puts me were Jake was, which is actually good. 865 00:36:48,140 --> 00:36:50,370 So now the implementation of this thing's going to 866 00:36:50,370 --> 00:36:51,250 be a bit of a headache. 867 00:36:51,250 --> 00:36:53,740 It's really just one line of headache, of code. 868 00:36:53,740 --> 00:36:56,580 But at least now there's garbage value here, but there's two 869 00:36:56,580 --> 00:36:57,910 legitimate ints here. 870 00:36:57,910 --> 00:37:04,160 And I claim that now we have done exactly what we need to do so long as 871 00:37:04,160 --> 00:37:08,600 we change what Jake's value was to be 26. 872 00:37:08,600 --> 00:37:12,110 >> We now have enough information still to maintain the integrity 873 00:37:12,110 --> 00:37:13,060 of this data structure. 874 00:37:13,060 --> 00:37:17,160 We're still kind of out of luck when we want to insert four or more total 875 00:37:17,160 --> 00:37:20,740 elements, but I can at least make pretty efficient use of this constant 876 00:37:20,740 --> 00:37:21,740 time, in fact. 877 00:37:21,740 --> 00:37:27,150 I don't have to worry about shifting everyone, as David's inclination was. 878 00:37:27,150 --> 00:37:30,816 >> Any questions on stacks, or this queue? 879 00:37:30,816 --> 00:37:32,184 >> AUDIENCE: Is the reason why you need size so you know 880 00:37:32,184 --> 00:37:34,010 where to have a person? 881 00:37:34,010 --> 00:37:34,770 >> DAVID MALAN: Exactly. 882 00:37:34,770 --> 00:37:38,230 I need to know the size of the array because I need to know exactly how 883 00:37:38,230 --> 00:37:41,940 many of these values are legitimate, and so that I can find where to put 884 00:37:41,940 --> 00:37:42,800 the next person. 885 00:37:42,800 --> 00:37:43,300 Exactly. 886 00:37:43,300 --> 00:37:44,580 The size is-- 887 00:37:44,580 --> 00:37:46,360 actually, we didn't update this yet. 888 00:37:46,360 --> 00:37:48,380 I added myself at 26. 889 00:37:48,380 --> 00:37:51,760 The size is now, not 1, but 2. 890 00:37:51,760 --> 00:37:57,780 So now this indeed helps me find the head of the list, which is not 0, not 891 00:37:57,780 --> 00:37:59,250 1, but is 2. 892 00:37:59,250 --> 00:38:01,665 The front of the list is indeed number 22. 893 00:38:01,665 --> 00:38:05,120 Because he came in first, so he should be allowed into the store before me, 894 00:38:05,120 --> 00:38:08,780 even though visually I'm standing closer to the store. 895 00:38:08,780 --> 00:38:09,220 >> All right? 896 00:38:09,220 --> 00:38:12,410 A round of applause for these guys and we'll let them out of there. 897 00:38:12,410 --> 00:38:17,090 >> [APPLAUSE] 898 00:38:17,090 --> 00:38:18,150 >> DAVID MALAN: I could let you keep the tray. 899 00:38:18,150 --> 00:38:20,760 We could see what happens if you want, but maybe not. 900 00:38:20,760 --> 00:38:21,590 All right. 901 00:38:21,590 --> 00:38:25,380 So what now does that leave us? 902 00:38:25,380 --> 00:38:28,900 Well, let me propose that there's a few other data structures we could 903 00:38:28,900 --> 00:38:33,810 start adding to our tool kit that will actually be quite, quite relevant as 904 00:38:33,810 --> 00:38:35,270 we dive into web stuff. 905 00:38:35,270 --> 00:38:38,150 Which again, has some kind of connection to trees in the form of 906 00:38:38,150 --> 00:38:40,550 something called DOM, document object model. 907 00:38:40,550 --> 00:38:42,370 But we'll see more of that before long. 908 00:38:42,370 --> 00:38:46,260 >> Let me propose definitionally that we call tree now what you might know as 909 00:38:46,260 --> 00:38:48,820 more of a family tree, where you have some ancestor at the 910 00:38:48,820 --> 00:38:49,790 roots of the tree. 911 00:38:49,790 --> 00:38:54,480 A patriarchal or a matriarch at the very top of the tree. 912 00:38:54,480 --> 00:38:56,700 Without their spouse, in this case. 913 00:38:56,700 --> 00:39:00,940 But we now have what we'll call children, which are nodes that hang 914 00:39:00,940 --> 00:39:05,480 off the left child or the right child, arrows as depicted here. 915 00:39:05,480 --> 00:39:10,490 >> In other words, in a tree data structure in computer, a tree has zero 916 00:39:10,490 --> 00:39:11,480 or more nodes. 917 00:39:11,480 --> 00:39:13,500 If it has at least one node, that's called the root. 918 00:39:13,500 --> 00:39:15,700 It's the things visually that we draw at the top. 919 00:39:15,700 --> 00:39:20,280 And that node, like any other node, can have zero, one, or two, or three, 920 00:39:20,280 --> 00:39:23,600 or however many children the data structure supports. 921 00:39:23,600 --> 00:39:29,150 In this case, the root, storing the value one, has two children, 2 and 3, 922 00:39:29,150 --> 00:39:33,020 so we generally call 2 the left child and 3 the right child. 923 00:39:33,020 --> 00:39:36,940 >> And then when we get down to 5, 6, and 7, 6 might be called the middle child. 924 00:39:36,940 --> 00:39:38,940 If you have four children, it gets confusing. 925 00:39:38,940 --> 00:39:42,260 So we stop using that kind of shortcut verbally. 926 00:39:42,260 --> 00:39:44,580 But it's really just a family tree. 927 00:39:44,580 --> 00:39:48,880 And the leaves here are the nodes that themselves have no children. 928 00:39:48,880 --> 00:39:52,540 They hang off the bottom of the tree. 929 00:39:52,540 --> 00:39:56,940 >> So how might we implement a tree that has just two children maximally? 930 00:39:56,940 --> 00:39:58,410 We'll call it a binary tree. 931 00:39:58,410 --> 00:40:00,960 Bi again meaning two, in this case, like with binary. 932 00:40:00,960 --> 00:40:04,830 And so it can have zero, one, or two children maximally. 933 00:40:04,830 --> 00:40:08,650 >> I'll propose that we implement the node for that structure with an int n, 934 00:40:08,650 --> 00:40:11,910 and then two pointers, one called left, one called right. 935 00:40:11,910 --> 00:40:14,830 But those are just nice arbitrary conventions. 936 00:40:14,830 --> 00:40:18,170 And what's nice now, especially if you kind of struggled conceptually with 937 00:40:18,170 --> 00:40:21,300 recursion, or thought that it wasn't really a solution to anything, 938 00:40:21,300 --> 00:40:23,120 especially if you could run out of memory. 939 00:40:23,120 --> 00:40:26,600 Now that we're talking about data structures and algorithms that allow 940 00:40:26,600 --> 00:40:31,030 us to traverse and manipulate them, turns out that recursion comes back in 941 00:40:31,030 --> 00:40:34,240 a much more compelling if not beautiful way. 942 00:40:34,240 --> 00:40:38,670 >> So this I propose is the implementation of a Search function. 943 00:40:38,670 --> 00:40:39,870 Given two inputs-- 944 00:40:39,870 --> 00:40:41,570 so think of this as a black box. 945 00:40:41,570 --> 00:40:46,560 Given two inputs, n, an int, and a pointer to a tree, a pointer to a 946 00:40:46,560 --> 00:40:50,020 node, or really the root of a tree, I claim that this function can return 947 00:40:50,020 --> 00:40:53,530 true or false, that value n is inside of this tree. 948 00:40:53,530 --> 00:40:55,210 >> What's inside of this black box? 949 00:40:55,210 --> 00:40:57,440 Well, four branches. 950 00:40:57,440 --> 00:40:58,385 The first just checks. 951 00:40:58,385 --> 00:41:00,490 If tree is null, just return false. 952 00:41:00,490 --> 00:41:04,580 If there's no node, there's no n, there's no number, just return false. 953 00:41:04,580 --> 00:41:12,330 If though, n, the value you're looking for, is less than tree arrow n, and 954 00:41:12,330 --> 00:41:15,180 just to be clear, what does it mean when I write tree and then the arrow 955 00:41:15,180 --> 00:41:18,150 notation, n? 956 00:41:18,150 --> 00:41:18,690 Exactly. 957 00:41:18,690 --> 00:41:21,970 It means dereference that pointer called tree. 958 00:41:21,970 --> 00:41:26,750 Go there, and then get inside of that node and get its field called n. 959 00:41:26,750 --> 00:41:30,810 And then compare the actual n that was passed into Search against it. 960 00:41:30,810 --> 00:41:35,390 >> So if n is less than, the n value in the tree node itself, well, 961 00:41:35,390 --> 00:41:36,720 what does that mean? 962 00:41:36,720 --> 00:41:40,690 That means nothing at first glance. 963 00:41:40,690 --> 00:41:40,900 Right? 964 00:41:40,900 --> 00:41:45,560 Just like when you have an array of values, you might like to apply binary 965 00:41:45,560 --> 00:41:48,290 search as a form of divide and conquer. 966 00:41:48,290 --> 00:41:51,790 But what assumption did we need to make for binary search to work at all 967 00:41:51,790 --> 00:41:54,510 in the phone book and earlier examples? 968 00:41:54,510 --> 00:41:55,530 >> How to be sorted. 969 00:41:55,530 --> 00:41:59,490 So let's refine the definition of tree here not to be just a tree, which can 970 00:41:59,490 --> 00:42:00,880 have any number of children. 971 00:42:00,880 --> 00:42:04,700 Not just a binary tree, which can have 0, 1, or 2 maximally. 972 00:42:04,700 --> 00:42:09,700 But as a binary search tree, or BST, which is just a fancy way of saying a 973 00:42:09,700 --> 00:42:15,430 binary tree such that every node's left child, if present, is 974 00:42:15,430 --> 00:42:16,830 less than the node. 975 00:42:16,830 --> 00:42:20,170 And every node's right child, if present, is greater 976 00:42:20,170 --> 00:42:21,740 than the node itself. 977 00:42:21,740 --> 00:42:25,200 >> So in other words, if you were to draw the tree out, all of the numbers are 978 00:42:25,200 --> 00:42:30,620 carefully balanced like this so that if you have 55 as the root, 33 can go 979 00:42:30,620 --> 00:42:33,090 to its left because it's less than 55. 980 00:42:33,090 --> 00:42:36,430 77 can go to its right because it's greater than 55. 981 00:42:36,430 --> 00:42:40,750 But now notice, the same definition, it's a recursive definition verbally , 982 00:42:40,750 --> 00:42:42,600 has to apply for 33. 983 00:42:42,600 --> 00:42:47,610 33's left child must be less than it, and 33's right child, 44, must be 984 00:42:47,610 --> 00:42:48,580 larger than it. 985 00:42:48,580 --> 00:42:51,670 >> So this is a binary search tree, and I propose, using a little bit of 986 00:42:51,670 --> 00:42:53,910 recursion, we can now find n. 987 00:42:53,910 --> 00:42:59,160 So if n is less than the value n that's current node, I'm going to go 988 00:42:59,160 --> 00:43:04,090 ahead and punt, so to speak, and just return whatever the answer is of 989 00:43:04,090 --> 00:43:08,470 searching for n on the tree's left child. 990 00:43:08,470 --> 00:43:11,370 Notice again, this function just expects a node star, a 991 00:43:11,370 --> 00:43:12,780 pointer to a node. 992 00:43:12,780 --> 00:43:17,360 So surely I can just do tree arrow left, which will lead 993 00:43:17,360 --> 00:43:18,400 me to another node. 994 00:43:18,400 --> 00:43:19,480 But what is that node? 995 00:43:19,480 --> 00:43:22,820 >> Well, according to this declaration, left is just a pointer, so that just 996 00:43:22,820 --> 00:43:27,090 means I'm passing to the search function a different pointer, namely 997 00:43:27,090 --> 00:43:30,750 the one that represents my left child's tree. 998 00:43:30,750 --> 00:43:36,040 So in this case, the pointer to 33, if this is our sample input Meanwhile, if 999 00:43:36,040 --> 00:43:40,740 n is greater than the value n at the current node in the tree, then I'm 1000 00:43:40,740 --> 00:43:43,370 going to go ahead and punt in the other direction and just say, I don't 1001 00:43:43,370 --> 00:43:47,280 know if this value n is in the tree, but I know if it is, it's down my 1002 00:43:47,280 --> 00:43:49,090 right branch, so to speak. 1003 00:43:49,090 --> 00:43:53,120 So let me call recursively search, passing an n again, but passing in a 1004 00:43:53,120 --> 00:43:54,580 pointer to my right child. 1005 00:43:54,580 --> 00:44:00,020 >> In other words, if I'm currently at 55 and I'm looking for 99, I know that 99 1006 00:44:00,020 --> 00:44:04,270 is greater than 55, so just like I tore the phone book weeks ago and we 1007 00:44:04,270 --> 00:44:07,140 went right, similarly are we going to go right here. 1008 00:44:07,140 --> 00:44:11,960 And I don't know if it's at my right child, and it's not, 77 is there, but 1009 00:44:11,960 --> 00:44:13,210 I know it's in that direction. 1010 00:44:13,210 --> 00:44:18,770 So I call search on my right child, 77, and let search figure out from 1011 00:44:18,770 --> 00:44:24,950 there if 99 in this arbitrary example is actually there. 1012 00:44:24,950 --> 00:44:26,900 >> Else, what's the final case? 1013 00:44:26,900 --> 00:44:28,620 If tree is null is one case. 1014 00:44:28,620 --> 00:44:31,890 If n is less than the current node's value is another case. 1015 00:44:31,890 --> 00:44:35,120 If n is greater than the current node's value is a third case. 1016 00:44:35,120 --> 00:44:38,250 What's the fourth and final case? 1017 00:44:38,250 --> 00:44:39,480 I think we're out of cases, right? 1018 00:44:39,480 --> 00:44:44,690 It must be that n is in the current node that I'm on. 1019 00:44:44,690 --> 00:44:49,640 >> So if I'm searching for 55 at this point in the story, that branch of the 1020 00:44:49,640 --> 00:44:51,780 tree would return true. 1021 00:44:51,780 --> 00:44:55,380 So what's interesting here is that we actually, unlike weeks past, we kind 1022 00:44:55,380 --> 00:44:56,740 of have two base cases. 1023 00:44:56,740 --> 00:44:58,300 And they don't have to be all at the top. 1024 00:44:58,300 --> 00:45:01,390 The top is a base case because if the tree is null, there's nothing to do. 1025 00:45:01,390 --> 00:45:03,410 Just return a hard coded value of false. 1026 00:45:03,410 --> 00:45:07,400 >> The bottom branch is sort of the default, whereby if we've checked for 1027 00:45:07,400 --> 00:45:11,550 null, we've checked if it should be left, but it shouldn't be, we've 1028 00:45:11,550 --> 00:45:14,640 checked if it should be right, but it shouldn't be, clearly it has to be 1029 00:45:14,640 --> 00:45:15,870 right where we are. 1030 00:45:15,870 --> 00:45:16,780 That's a base case. 1031 00:45:16,780 --> 00:45:19,920 So there's two recursive cases sandwiched there in the middle. 1032 00:45:19,920 --> 00:45:21,630 But I could have written this in any order. 1033 00:45:21,630 --> 00:45:24,520 I just thought it kind of felt natural to first check for a possible error, 1034 00:45:24,520 --> 00:45:28,340 then check left, then check right, then assume that you're at the node 1035 00:45:28,340 --> 00:45:30,630 you're actually looking for. 1036 00:45:30,630 --> 00:45:36,240 >> So why might this be useful? 1037 00:45:36,240 --> 00:45:37,910 So it turns out-- 1038 00:45:37,910 --> 00:45:42,110 and let me jump to a teaser here that's in the web. 1039 00:45:42,110 --> 00:45:44,920 We're going to start using not a programming language at first, but a 1040 00:45:44,920 --> 00:45:46,030 markup language. 1041 00:45:46,030 --> 00:45:48,740 A markup language being one that's similar in spirit to programming 1042 00:45:48,740 --> 00:45:51,715 language, but it doesn't give you the ability to express yourself logically. 1043 00:45:51,715 --> 00:45:55,070 It only gives you the ability to express yourself structurally. 1044 00:45:55,070 --> 00:45:57,960 >> Where do you want to put something on the page, the web page? 1045 00:45:57,960 --> 00:45:59,200 What color do you want to make it? 1046 00:45:59,200 --> 00:46:00,950 What font size do you want to make it? 1047 00:46:00,950 --> 00:46:02,970 What words do you actually want on the web page? 1048 00:46:02,970 --> 00:46:04,060 So that's a markup language. 1049 00:46:04,060 --> 00:46:07,690 But then we'll very quickly introduce JavaScript, which is a full-fledged 1050 00:46:07,690 --> 00:46:08,560 programming language. 1051 00:46:08,560 --> 00:46:12,530 Very similar syntactically in appearance to C, but it'll have some 1052 00:46:12,530 --> 00:46:15,200 nice, more powerful, more user friendly features. 1053 00:46:15,200 --> 00:46:18,050 >> And one of the frustrations at this point in the semester is that we'll 1054 00:46:18,050 --> 00:46:22,065 soon implement speller in far fewer lines of code using other languages 1055 00:46:22,065 --> 00:46:25,580 than C itself allows, but for reason's we'll soon understand. 1056 00:46:25,580 --> 00:46:27,750 This will be the first such web page. 1057 00:46:27,750 --> 00:46:30,120 It will be completely underwhelming, the first one we make. 1058 00:46:30,120 --> 00:46:31,400 It will simply say, hello world. 1059 00:46:31,400 --> 00:46:34,010 But if you've never seen it before, this is HTML, 1060 00:46:34,010 --> 00:46:35,670 HyperText Markup Language. 1061 00:46:35,670 --> 00:46:39,310 >> If you go to a certain menu option in most any browser, on any web page on 1062 00:46:39,310 --> 00:46:43,160 the internet, you can see the HTML that some people wrote to 1063 00:46:43,160 --> 00:46:44,400 create that web page. 1064 00:46:44,400 --> 00:46:47,850 And it probably doesn't look as brief or as neat as this. 1065 00:46:47,850 --> 00:46:51,400 But it will follow the pattern of these open brackets and slashes and 1066 00:46:51,400 --> 00:46:53,660 letters and potentially numbers. 1067 00:46:53,660 --> 00:46:56,770 >> I thought I'd give you a teaser of what you'll be able to do 1068 00:46:56,770 --> 00:46:57,950 after taking CS50. 1069 00:46:57,950 --> 00:47:02,620 Let me go to cs.harvard.edu/rob, our own Rob Bowden's homepage. 1070 00:47:02,620 --> 00:47:06,080 He made this for us. 1071 00:47:06,080 --> 00:47:07,490 So you'll soon be able to do that. 1072 00:47:07,490 --> 00:47:10,660 And also, what you heard this morning-- 1073 00:47:10,660 --> 00:47:12,480 what you heard this morning-- 1074 00:47:12,480 --> 00:47:13,780 >> [HAMSTER DANCE MUSIC] 1075 00:47:13,780 --> 00:47:15,702 >> --you'll be able to make this. 1076 00:47:15,702 --> 00:47:16,790 That awaits us on Wednesday. 1077 00:47:16,790 --> 00:47:17,791 We will see you then. 1078 00:47:17,791 --> 00:47:22,950 >> [HAMSTER DANCE MUSIC] 1079 00:47:22,950 --> 00:47:24,300 DAVID MALAN: At the next CS50-- 1080 00:47:24,300 --> 00:47:31,670