DAVID MALAN: All right. Welcome back to CS50. This is the start of week 8. And recall that problem set 5 ended with a little bit of a challenge. So assuming you recovered all of your teaching Fellows and CA's photographs in the card.raw file, you are eligible to now find all of those people, and one lucky winner will walk home with one of these things, the leap motion device that you can use for final projects, for instance. This, every year, leads to a bit of creepiness. And so what I thought I'd do is share with you some of the notes that have gone back and forth over the staff list of late. For instance, just last night, quote unquote, from one of the staff members, "I just had a student knock on my door to take a photo with me. Stalkers, I tell you." Started off fairly descriptive and then we moved on to, an hour or so later, "I had a student waiting for me after section and he had all of our names and photos on some sheets of paper." All right. So organized, but not all that creepy yet. Then, "I was out of town this weekend, and when I got back, there was one in my bedroom." [LAUGHTER] DAVID MALAN: Next quote from a staff member, "a student came to my house at Somerville at 4 AM this morning." Next staff, "I got to my hotel in San Francisco and a student was waiting for me at the lobby with three DSLRs." Type of camera. "I'm not even on staff this semester, but a student broke into my house this morning and recorded the whole thing with Google Glass." And then lastly, "at least 12 people were eagerly awaiting for me when I got out of my limo, and then I woke up." All right. So among the photographs, as you may recall, are this fellow here, who you might know as Milo Banana, who lives with Lauren Carvalho, our head teaching Fellow. Milo, Milo, come here boy. Milo. Milo. Mind you, he's wearing Google Glass, so we'll show you all of this after. So this is Milo if you would like to take a photograph with him afterward. If you'd like to look out at the audience there. OK. That's good footage. Well, Milo Banana. Oh, don't do that. [LAUGHTER] OK. So a word then on what lies ahead, because as we begin to transition, this week specifically, from C in a command line environment to PHP and JavaScript and SQL and HTML and CSS in a web-based environment, we'll be equipping you with all the more knowledge for potential final projects. Toward that end, the course has a tradition of holding seminars which are on tangential topics to the course. Very much related to programming and to app development and so forth, but not necessarily explored by the course's own syllabus. So if you might be interested in one or more of this year's seminars, register at cs50.net/seminar. There are older seminars at cs50.net/seminars. And on the roster thus far for this year are Amazing Web Apps with Ruby on Rails, which is an alternative language to PHP. Computational Linguistics. Introduction to iOS, which is the platform that's used for iPhone and iPad development. JavaScript for Web Apps, we'll cover that, but in this seminar, you'll go into more detail. Leap Motion, so we'll actually have some of our friends from Leap Motion, the company itself, join us. Tomorrow, in fact, to provide a hands-on seminar, if of interest to you. Meteor.js, an alternative technique for using JavaScript not in a browser, but on a server. Node.js, which is very much in that vein as well. Sleek Android Design. Android being a very popular alternative to iOS and Windows Phone and other mobile platforms. And Web Security Active Defense. So in fact, if you would like to engage in this, let me make note of this. We're very happy to say that our friends at Leap Motion, which is a startup-- this device really just came out a few months ago-- has graciously donated 30 such devices to the class for as many students, if you'd like to borrow the hardware towards semester's end and use it for an actual final project. They support a number of languages. None of them C, none of them PHP, so realize one or more of these seminars might prove of interest. And all of them will be filmed in the event that you are not able to attend in person. The schedule be announced via email as we solidify rooms. And lastly, if you go to projects.cs.50.net, this is a website we maintain each year that we invite folks from the community, faculty, departments, staff, and both in an outside of CS50 to propose project ideas. Things of interest to student groups. Things of interest to departments. So do turn there if you're struggling with uncertainty as to what you yourself would like to tackle. So last time we introduced an arguably more complex data structure than we'd seen in weeks past. We'd been using arrays pretty happily as a useful if simplistic data structure. Then we introduced these, which of course are linked lists. And what was one of the motivations for introducing this data structure? Yeah? What's that? AUDIENCE: Dynamic size. DAVID MALAN: Dynamic size. So whereas in array, you have to know its size in advance when you allocate it. In linked list, you don't have to know that. You can just malloc, or more generally, allocate an additional node, so to speak, any time you want to insert more data. And node has no predetermined meaning. It's just a generic term describing some kind of container that we're using in our data structure to store some item of interest, which in this case happen to be integers. But there's always a tradeoff. So we get dynamic sizes of the data structure, but what price do we pay? What's the downside of linked lists? Yeah? AUDIENCE: Requires more memory. DAVID MALAN: It requires more memory, how exactly? AUDIENCE: [INAUDIBLE]. DAVID MALAN: Exactly. So now we have pointers taking up additional memory that we previously didn't need, because the advantage of an array, of course, is that everything's contiguous, back to back to back, which gives you random access. Because just by using square bracket notation, or more technically pointer arithmetic, very simple addition, can you access any elements in constant time. And in fact, that's kind of hinting at another price that we're paying with a linked list. What happens to the running time of something like Search, if I want to find some value and inside of a linked list? What does my running time become? Big O of n. If it's sorted to? What if the data structure's sorted? Can I do better than big O of n for searching? No, because in the worst case it might very well be sorted, but the number you're looking for might be big. It might be the number 100, which might happen to be all the way at the end. And because you can only access a linked list in this implementation by way of its first node, you're still kind of out of luck. You have to traverse the whole thing from first to last in order to find that big value like 100. Or to determine if it's not even there. So we cannot do what algorithm in a data structure that looks like this? We can't do binary search, because binary search required that we had random access. We could just leap from location to location without having to follow these bread crumbs in the form of all these pointers. Now, how did we implement this? Well, if we go to the screen here, if we can quickly reimplement this data structure-- my handwriting's not all that great here, but we'll try. So typedef struct, and what did I want to call this thing up here? Node. So I'll get us started. And now, what needs to be inside of the data structure for that singly linked list? How many fields? So two. One is pretty easy. So int n. And we could call n anything we want, but it should be an int if we're implementing a linked list for ints. And now what does the second field have to be? Struct node *. So if I do struct node *, and then I can call this also whatever I want, but just to be clear I'll call it next, as we've been doing. And then I'll close my curly braces. And now, as last time, I put node down here. But if I'm declaring this is as a node, why did I bother being so verbose here in declaring struct node * next, as opposed to just node * next? Yeah? AUDIENCE: [INAUDIBLE]. DAVID MALAN: Exactly. Exactly. Because C really takes you literally and only sees the definition of node way down here, you can't refer to it up here. So we have this sort of preemptive declaration here, which is admittedly more verbose. Struct node, that means we can now access it inside of the data structure. And as an aside, because this is becoming a little more subjective now, the star can technically go here, it can go here, it can even go in the middle. We've adopted, in the style guide for the course, the convention of putting the star right next to the data type, which in this case, would be struct node. But realize in a lot of textbooks and online references, you might indeed see it on the other side. But just realize that both will actually work and you should simply be consistent. All right. So that was our declaration of struct node. But then we started doing more sophisticated things. For instance, we decided to introduce something like a hash table. So here is a hash table of size n, indexed from 0 on the top left to n minus 1 on the bottom left. This could be a hash table for anything. But what kinds of things did we talk about using a hash table for? Storing what? Names. We could do names like we did last time. And really, you can store anything. And we'll see this again in PHP and in JavaScript. A hash table is a nice sort of Swiss Army knife that allows you to store pretty much whatever you want inside of it by associating keys with values. Keys with values. Now in this simple case, our keys are just numbers. We're implementing a hash table as an array. And so the keys are 0, 1, 2, and so forth. And so we, as humans, decided last week that you know what, if we're going to store names, let's just arbitrarily, but pretty reasonably, assume that Alice, an A name, will just be indexed into 0. And Bob, a B name, will be indexed into 1, and so forth. So we had a mapping between inputs, which are strings, and the hash locations, which are numbers. So that process is generally known as a hash function, and you can truly implement it in code. If I wanted to implement a hash function that does exactly what we just described from last time, I might declare a function that takes, as input for instance-- and let's do this on this screen over here. If I wanted to implement a hash function, I might say something like this. It's going to return an int. It's going to be called hash, and it's going to accept as an argument a string, or we can be more proper now, and say char*, we'll call it s. And then all this function has to do, ultimately, is return an int. Now, how it does that might not be so clear. I'm going to implement this without any form of error checking right now. I'm just going to blindly say, return whatever is at s bracket 0, minus, let's say, capital A semicolon. Totally broken. It's not perfect because one, what if s is null? Bad things are going to happen. Two, what if the first letter in this name is not a capital letter? That's not going to turn out well either. It might be a lowercase letter or not a letter at all. So totally room for improvement here, but this is the basic idea. What we described last week verbally as just a process of mapping Alice to 0 and Bob to 1 can be expressed certainly more formulaically as a C function here. Called again hash, takes a string as input, and then somehow does something with that input to produce an output. Not unlike our black box description that we've long done. I don't know how this might be working underneath the hood. For problem set 6, one of the challenges is for you to decide what will your hash function be? What's going to be inside of that black box, and presumably, it'll be a little more interesting than this, and definitely more prone to error checking than this particular implementation. But problems can arise, right? If we have a data structure such as this one, what's one of the problems you can run into over time as you insert more and more names into the hash table? You get collisions, right? What if you have Alice and Aaron, two people whose names happened to start with A? That begs the question, where you put the second such A name? Well, you might naively just put it where Bob belongs, but then Bob is kind of screwed if you try to insert his name next and there's no room for him. So you might put Bob where Charlie is, and you can imagine this very quickly devolving into a bit of a mess. Something linear in the end, where you just have to search the whole thing looking for Alice or Bob or Aaron or Charlie. So instead we proposed, instead of just linearly probing for open spaces and plopping the names there, we proposed a fancier approach. A hash table implemented still with an array of indices, but the data type of those indices now were pointers. Pointers to what? Pointers to linked lists. Because recall that a linked list is really just a pointer to a node, and the node has a next field, and that node has a next field, and so forth. So you can now think of this array on the left-hand side of a hash table as leading to a linked list. The advantage of which is if you get a collision between Alice and Aaron, what do you do with the second such person? You just attach him or her to the end, or even the beginning of that linked list. And actually, let's just noodle through that for just a second. Where would make the most sense? If I insert Alice and she ends up at the first location, then I try to insert Aaron's name, and there's obviously a collision, should I put him at the beginning of the linked list? That's at that first location, or at the end? AUDIENCE: [INAUDIBLE]. DAVID MALAN: OK. I heard beginning. Why at the beginning? AUDIENCE: [INAUDIBLE]. DAVID MALAN: OK. It's alphabetical, so that's nice. That's a good property. It will save me some time potentially. It won't let me do binary search, but I might at least be able to break out of a loop if I realize, well, I'm way past were Aaron would be in this sorted linked list. I don't have to waste my time looking all the way to the end. So that's reasonable. Why else might you want to insert the colliding name at the beginning of the list? What's that? AUDIENCE: [INAUDIBLE]. DAVID MALAN: It could take a long time to get to the end of the list. And in fact, longer and longer. The more names you insert that start with A, the longer that chain is going to get. The longer that linked list is going to get. So you're really just wasting your time. Maybe you're better off maintaining constant insertion time, big O of 1, by always putting the colliding name at the beginning of the linked list, and not worrying as much about sorting. What's the best answer? It's unclear. It kind of depends on what the distribution is, what the pattern is of the names you are inserting. It's not necessarily an obvious answer. But here to, again, is a design opportunity. So we then looked at this thing, which is really the other big opportunity for p-set 6. And realize, if you haven't already, Zamyla dives into both of these, hash tables and tries, in more detail. And the video walkthrough is embedded in p-set spec. This was a trie-- T-R-I-E. And what was interesting about this was that the running time of searching for a name, like Maxwell last time, was big O of what? What's that? AUDIENCE: The number of letters. DAVID MALAN: Number of letters. I heard two things. Number of letters and constant time. So let's go with that first. The number of letters. Well, this data structure, recall, is like a tree, a family tree, each of whose nodes are made up of arrays. And those arrays are pointers to other such nodes, or other such arrays in the tree. So if we wanted to then determine whether Maxwell is in here, I might go to the first array, at the very top of the tree, the so-called root, top of the trie, and then follow the m pointer, then the a pointer, then x, w, e, l, l. And then when I see some special symbol, denoted here as a triangle. In code you'll see we propose that you implemented as a bool, just saying yes or no, a word stops here. Well, once we've gone to M-A-X-W-E-L-L, feels like seven, maybe eight if we go one past it, eight steps to find Maxwell. Or let's call it K. But recall last time, I argued that if there's realistically a maximum length on a word, like 40-some-odd characters, a maximum length implies a constant value. So really, yes, it's technically big O of 8 or 7, or really big O of K. But if there's a finite cap on what K could be, it's a constant. And so it's big O of 1 at the end of the day. Not in the real world. Not when you actually start watching your clock as your program's running. It's absolutely going to be a bit slower than truly constant time with one step. It's going to be seven or eight steps, but still that's much, much better than an algorithm like big O of n that depends on the size of what's in the data structure. Notice the upside here is we can insert a million more names into this data structure, but how many more steps is it going to take us to find Maxwell in that case? None. He's unaffected. And to date, I don't think we've seen an example of a data structure or an algorithm that was completely unaffected by external behaviors like that. But this can't be amazing. This can't be the only solution for the p-set And it's not. This is not necessarily the data structure you should gravitate to, because like hash tables, tradeoff. What's the price you pay here? Memory. I mean, this is an atrocious amount of memory. And you can't quite see it here because the author of this picture obviously truncated all of the arrays, and we're not seeing lots of A's and B's and C's and Q's and Y's and Z's in these arrays. But they're there. Each of these nodes is a whole array of some 26 or more bytes, each of which represents a letter. 27 in our case, so that we can support apostrophes in the problem set. So this data structure is really, really dense and wide. And that alone might end up slowing things down, or at least costing you a lot more space. But again, we can draw comparisons here. Recall a while back, we achieved much more exciting running time in sorting when we use merge sort, but the price we paid to achieve n log n for merge sort required that we spend more what resource? More space. We needed a secondary array to copy people into, just like we did here on stage. So again, no clear winners, but just subjective design decisions to be made. All right. So how about this? Anyone recognize which D-Hall? OK. So three of us do. Mather House. So this is for Mather's dining. I'll bet all the dining halls have stacks of trays like this. And this is actually representative of something we've obviously seen already. We called it literally a stack. And the stack, in terms of your computer's memory, is where data goes while functions are being called. For instance, what kinds of things go on the stack with respect to the memory layout we've discussed in weeks past? What's that? AUDIENCE: Calls to functions. DAVID MALAN: I'm sorry. AUDIENCE: Calls to functions. DAVID MALAN: Calls to functions, but specifically, what's inside of each of those frames? What kinds of things? Yeah. So local variables. Anytime we needed some local storage, like an argument, or int I, or int temp, or whatever the local variable is, we've been putting that on the stack. And we call it a stack because of that layering idea. Just kind of matches up with reality, the concept thereof. But it turns out that a stack can also be seen as a data structure, an alternative to an array, an alternative to a linked list. Something conceptually more interesting that can still be implemented using either of those things, but it's a different type of data structure supporting, really, only two operations. But you can add on fancier features than these. But these are the basics-- push and pop. And the idea with a stack is that if I have here, with or without Annenberg knowing, a tray from next door with the number 9 on it. So just an int. And I want to push this onto the data structure, which currently is empty. Consider this the bottom of the stack. I would push this number 9 onto the stack, and now it's right there. But the interesting thing about a stack is that if I now want to push some other value, like 17, and I push this onto the stack, I'm going to do the only intuitive thing, I'm just going to put it right where we humans would be inclined to put it, on top. But what's interesting now is, how do I get at 9? You know, I don't without some effort. So what's interesting about a stack is that by design, it's a LIFO data structure. Silly way of describing last in, first out. So the last number in at this time was 17. So if I want to pop something off of the stack, it can only be 17. So there's a mandatory order of operations here, where the last item in has to be the first one out. Hence the acronym, LIFO. So why might this be useful? Are their contexts in which you'd want a data structure like this? Well, it's certainly been useful inside of a computer. So operating systems clearly use this kind of data structure for stacks. We'll also see the same idea when it comes to web pages. So this week and next week and beyond, and as you start implementing web pages in a language called HTML, you can actually use a data structure like this to determine if the page is correctly formatted. Because we'll see all web pages follow a sort of hierarchy, an indentation that will, at the end of the day, be a tree structure underneath the hood. So more on that in just a bit. But for now, let's propose for a moment, how could we go about representing what a stack is? Let me propose that we implement a stack with code like this. So a stack is going to have inside of it two things, an array, called trays, just to be consistent with the demo. And each of the items in that array is going to be a type int. And capacity is presumably what? Because I've not written the full definition here. It's probably the maximum size of the array. And it's probably declared as a sharp define at the top of the file, some kind of constant as implied by the mere capitalization. So somewhere capacity is defined as the maximum possible size. Meanwhile, inside of the data structure known as a stack there will be an integer just known simply as size. So if I were to represent this now pictorially, let's suppose that this whole black box represents my stack. Inside of it is two variables. So I'm going to draw the first one as size. And the second one I'm going to draw as an array. But just to keep things orderly, normally I would draw an array like this, but it's kind of nice if we match reality, or match the mental model. So let me instead draw the array vertically, which is just, again, artist's rendition. Doesn't really matter what it is underneath the hood. And we'll say that, by default, capacity is going to be three. So this will be location 0, this will be location 1, this will be location 2. If I bribe with a stress ball, would someone like to come up and run the board here for just a moment? OK, saw your hand first. Come on up. All right. So I believe it is Steven. Come on up. All right. But suppose now we rewind to the initial state of the world where I have just declared a stack, and it's going to be of capacity three. But size has not yet been determined. Trays has not yet been determined. So a couple of questions first. And let me give you mic so you can partake more actively in this. So what is inside of size at this moment in time if all I have done is declared a stack with one line of code? STEVEN: Not much. DAVID MALAN: OK, not much. Do we know what's inside of size, do we know what's inside of this array here? STEVEN: Just random code, right? Just-- DAVID MALAN: Yeah, I'm going to call it code, but random-- STEVEN: Things. DAVID MALAN: Things like random STEVEN: Bits. DAVID MALAN: Bits, right? So garbage values, right? So permutations of 0's and 1's. Remnants of previous usages of this memory. And we don't really know what the values are, so we typically draw them as question marks. So the first thing we're presumably going to want to do here-- and let me give this field inside of there a name-- trays. What should we presumably initialize size to if we want to start using this stack? STEVEN: Tray is sub 3. DAVID MALAN: So, OK. To be clear, capacity is declared elsewhere as three. And that's what I've used to allocate the array. Size is going to refer to how many trays are currently on the stack. STEVEN: Zero. DAVID MALAN: So it should be zero. So go ahead, and with any finger, draw a zero in size. All right. So now, what's inside of this here, we don't know. These are really just garbage values. So we could draw question marks, but let's keep the board clean for now because it doesn't matter what's there. We don't need to initialize the array to anything, because if we know that the size of the stack is zero, well, we shouldn't be looking at anything in this array anyway at this point in time. So now suppose that I push the number 9 onto the stack. How should we update the data structure inside of this black box? What values need to change? STEVEN: Within-- the size? DAVID MALAN: OK. Size should become what? STEVEN: Size would be one. DAVID MALAN: OK. So size should become one. So you can do this in a couple ways. Let me give you, now your finger is an eraser. All right. Then now your finger is a brush. All right. And now what else has to change, obviously, in the data structure? STEVEN: We're going from bottom up to 9. DAVID MALAN: 9. OK, Good. So still doesn't matter what's at location one or two because they're garbage values, but we shouldn't bother looking there because size is telling us that only the first element is actually legitimate. So now I push 17 onto the list. What happens to this picture? STEVEN: So size is going to go to two. DAVID MALAN: OK. You're eraser-- oops. You're an eraser. STEVEN: Eraser. DAVID MALAN: You're a brush. STEVEN: Brush. DAVID MALAN: OK. And what else? STEVEN: And then we-- DAVID MALAN: We pushed 17. STEVEN: We stick 17 on top, so-- DAVID MALAN: OK, good. STEVEN: --drop it down. DAVID MALAN: All right. It's getting easy. I'm not going to help you this time. Push 22. STEVEN: Done. Becoming an eraser. I'm becoming a brush. And then I am putting 22. DAVID MALAN: 22. Excellent. So one more time. I'm now going to push onto the stack 26. STEVEN: Ooh. Oh gosh. You really caught me off guard. DAVID MALAN: You didn't see this coming? STEVEN: I didn't see this coming. Could we re-initial capacity? DAVID MALAN: That's a good question. So we've kind of painted ourselves in a corner here. There really is no good out for Steven because we've allocated this array statically, so to speak, inside of the data structure. And we've essentially hard coded it to be of size three. So we can't really reallocate it. We could if we went back in, we redefined trays to be a pointer that we then use malloc to hand memory to. Because if we got the memory from the heap via malloc, we could then free it. But before freeing it, we could reallocate a bigger chunk of memory, update the pointer, and so forth. But for now, this is really the best we can do. Push and pop are presumably going to have to signal some error. So for instance, our implementation of push could return a bool which previously returned true, true, true. But the fourth time, it's going to have to return false, for instance. All right. Very well done. Congratulations. You've earned your stress ball today. [APPLAUSE] STEVEN: Thank you. DAVID MALAN: Thank you. OK, so this seems to be not much of a step forward, right? We described this data structure. It's been compelling, right? Operating systems like it. Apparently the web can make use of this, and other applications still. But what a stupid limitation that we're back to sort of week two limits where we have fixed size arrays. So there are indeed a couple of ways we could solve this. We could dynamically allocate the array, not by hard coding it as I've done here, but instead re-declaring this, just to be clear, as something like this. Int* trays, not deciding on a capacity yet. But when I declare the stack elsewhere in my code, I could then call malloc, get the address of a chunk of memory, and I could assign that address to trays. And then, because it's just a chunk of memory, I could continue to use square bracket notation in the usual way. Because again, there's sort of this functional equivalent of arrays and chunks of memory that come back from malloc. We can treat one as the other using pointer arithmetic or square bracket notation. So that's one approach. But how else might we implement this same data structure, potentially? Right? I feel like we just solved this problem like a week ago. What was the solution to this problem that Steven ran into? So linked lists, right. If the problem is that we're painting ourselves into a corner by allocating in advance too little memory that we then have to somehow deal with, well, why not just avoid that issue altogether? Why not just declare trays to be a pointer to a node, ergo a linked list, and then simply allocate new nodes every time Steven needed to fit a number into the data structure. So the picture would have to change. It's not going to be as clean and as simple as just an array of three ints. Now it's going to be a pointer to a struct, and that struct is going to have an int and a next pointer. It's going to lead via that pointer to another such struct to another such struct. So the picture would actually get a bit messier. And we'd have arrows tying everything together. But that's fine, right, because we've seen how to do this. And once you get comfortable implementing something like a linked list, which you'll have to do if you choose to implement a hash table with separate chaining for p-set 6, you can use it as a building block, or an ingredient, or in Scratch speak, a procedure, something that you put, you created your own puzzle piece that you can then reuse. So tradeoffs, but potential solutions that we've actually seen before. So quite often, you see this every year or two when Apple releases something new, and all the crazy people line up outside of the Apple store to buy their marginal upgrade on hardware. I say this, it's OK, because I am one of those people. So what kind of data structure might represent this reality? Well, let's call it a queue, a line. So British would call it typically a queue anyway, so it's a nice name. And the two operations that a queue shall support we'll call a enqueue operation and a dequeue operation, which are similar in spirit to push and pop. It's just sort of different in convention, what we're calling these. But to enqueue something means to add or insert it to the data structure. To dequeue means to remove it. But whereas a stack was a LIFO data structure, a queue is a first in, first out data structure. If you are the first person in line, you will be the first person to get out of line and buy your new device. Imagine how upset these people would be if Apple instead used a stack, for instance, to implement the picking up of your new toy. So queues make sense, certainly, and we can think of all sorts of applications, presumably, for queues, especially when you want fairness. So how might we implement these as a data structure? Well, I propose that we might need to do it this way. So I'm going to now have numbers. So we'll keep it simple and not necessarily talk in terms of trays. Just numbers that people of gotten. Capacity is going to, again, fix the total number of people that can be in this line, as three or whatever other value. But I propose that I need to keep track not only of the size of the queue, how many things are in it. So size is the current size, capacity is the maximum size. Just again, nomenclature by convention. Why do I need an additional int inside of a queue to keep track of who's in front of the line? Why do I need to do that in this case? Well, how is this picture going to change? I can probably reuse most of this picture. Let me go ahead and erase what's here. We'll give this a slightly different name up here. Let's get rid of the 17, let's get rid of the 9, let's get rid of the 3. And let's add one other thing. I propose that I need to keep track of the front of the list, which is just going to be an int as well. And we're going to keep it simple. No linked list for now. We'll admit that we're going to bump up against this limit. But what do I want to see happen this time? So suppose I go ahead and the first person comes up in line, and it's the number 9. We do have stress balls. Can I steal, say, two or three people? One, two, three? Come on up. Right from the front, because we'll make this one quick. Each of you is now going to be a fan boy in line at Apple. You will not be receiving Apple hardware at the end of this though. All right. So you're number 9, you're number 17, number 22. These are arbitrary numbers, like student IDs or whatnot. And in just a moment, let's begin to start adding things. And I'll run the board here this time. So in this case, I've initialized the front to be-- I actually don't really care what the front is, because the size is zero. So this might as well just be a question mark. These are all question marks. So now we'll begin to actually see some people lining up at the store. So if number 9, you're the first one there at 5 AM, go ahead and line up, or the night before. OK. So now 9 is here. So 9 is in the front of the list. So I'm going to go ahead and update the size of this current data structure to not be 0 anymore, but to be 1. I'm going to put 9 at the front of the list. Let me go ahead and toggle the screen so we can see past us here. And now what do I want to put at front? I'm going to keep track that the front of the queue right now is at location 0. Because what is next going to happen? Well, suppose now I enqueue 17 as well. So hop in line there. And again, the sort of door to the store is going to be here. So now I've added 17. And even though these guys are blocking the screen, that's OK, because we can see it up here. Sorry. AUDIENCE: We can move-- DAVID MALAN: No, that's OK. It's huge up there. So 17 is now inside of the queue. I need to update which fields now though? OK, definitely size. And how about front? OK, no. Front should not change, because unlike a stack, we want to maintain fairness. So if 9 came in first, we want 9 to be the first out of the line and into the store. In fact, let's see that. Before we insert 22, let's go ahead and dequeue 9. What's your name again? AUDIENCE: Jake. DAVID MALAN: Jake is going to be dequeued now. So you get to walk into the store. And pretend that the store is over there. So now what needs-- dit-dit-dit! What needs to happen now? Design decision. So not a bad instinct, but-- what's your name again? AUDIENCE: David. DAVID MALAN: David. So what did David do? He was trying to sort of fix the data structure and move from his location into Jake's former location. And that's fine if we're willing to accept that as an implementation detail. But first, let's update the data structure before we do that. Because I'm not liking the idea of all the people shifting in this line. It's no big deal if David does it with one step, but again, think back to when we've had eight volunteers on the stage and we've done like insertion sort, where we had to start moving everyone around. That got expensive, right? That makes me cringe about big O of n, big O of n squared again. It's not feeling like an ideal outcome. So let's just update this. So the size of the queue is no longer 2. It's now simply 1. But I can now update something I didn't update before, the front of the list. I could just say, is that location 1? So now we have garbage value here, garbage value here, and David in the middle of this garbage. But the data structure is still intact. And in fact, I don't even need to change Jake's former number 9, because who cares. I have enough information now in the size that I know there's one person in this queue. And I know that that person is at location 1, not 0. I'm not counting. So 1 as well. So the data structure's still OK. Well, what happens next? Let's enqueue-- what's your name? AUDIENCE: Callen. DAVID MALAN: Callen. Let's enqueue a Callen, and 22 is now in the queue. So now what has to change here? Front is not going to change, obviously. Size is going to change to be 2 again. And 22 ends up here, 9 is still present, but it's effectively a garbage value now. It's just a remnant of Jake past. So now what happens if I dequeue David? One last operation, dequeue David. We could shift, but I propose let's do as little work as possible. Now my data structure goes back in size from 2 to 1. But the front of the queue now becomes 2. I don't need to change these numbers just yet, because they're just garbage values. But now what happens? Suppose I enqueue myself, 26? I feel like I belong over here. So I'm being enqueued. So I kind of belong here. And even though you don't quite appreciate this visually on the stage, because we have plenty of room, I should not be standing here, why? AUDIENCE: You're out of bounds. DAVID MALAN: Right. I'm out of bounds. I've indexed beyond the bounds of this array. I really should be in one of the three possible locations. Now, where's most natural to go? I propose we leveraged a week one trick. The mod operator, percentage. Because I'm technically standing at location 3, but I do 3 mod capacity, so 3, a percent sign, 3-- capacity's 3. What's that? What's the remainder when you divide 3 by 3? 0. So that puts me were Jake was, which is actually good. So now the implementation of this thing's going to be a bit of a headache. It's really just one line of headache, of code. But at least now there's garbage value here, but there's two legitimate ints here. And I claim that now we have done exactly what we need to do so long as we change what Jake's value was to be 26. We now have enough information still to maintain the integrity of this data structure. We're still kind of out of luck when we want to insert four or more total elements, but I can at least make pretty efficient use of this constant time, in fact. I don't have to worry about shifting everyone, as David's inclination was. Any questions on stacks, or this queue? AUDIENCE: Is the reason why you need size so you know where to have a person? DAVID MALAN: Exactly. I need to know the size of the array because I need to know exactly how many of these values are legitimate, and so that I can find where to put the next person. Exactly. The size is-- actually, we didn't update this yet. I added myself at 26. The size is now, not 1, but 2. So now this indeed helps me find the head of the list, which is not 0, not 1, but is 2. The front of the list is indeed number 22. Because he came in first, so he should be allowed into the store before me, even though visually I'm standing closer to the store. All right? A round of applause for these guys and we'll let them out of there. [APPLAUSE] DAVID MALAN: I could let you keep the tray. We could see what happens if you want, but maybe not. All right. So what now does that leave us? Well, let me propose that there's a few other data structures we could start adding to our tool kit that will actually be quite, quite relevant as we dive into web stuff. Which again, has some kind of connection to trees in the form of something called DOM, document object model. But we'll see more of that before long. Let me propose definitionally that we call tree now what you might know as more of a family tree, where you have some ancestor at the roots of the tree. A patriarchal or a matriarch at the very top of the tree. Without their spouse, in this case. But we now have what we'll call children, which are nodes that hang off the left child or the right child, arrows as depicted here. In other words, in a tree data structure in computer, a tree has zero or more nodes. If it has at least one node, that's called the root. It's the things visually that we draw at the top. And that node, like any other node, can have zero, one, or two, or three, or however many children the data structure supports. In this case, the root, storing the value one, has two children, 2 and 3, so we generally call 2 the left child and 3 the right child. And then when we get down to 5, 6, and 7, 6 might be called the middle child. If you have four children, it gets confusing. So we stop using that kind of shortcut verbally. But it's really just a family tree. And the leaves here are the nodes that themselves have no children. They hang off the bottom of the tree. So how might we implement a tree that has just two children maximally? We'll call it a binary tree. Bi again meaning two, in this case, like with binary. And so it can have zero, one, or two children maximally. I'll propose that we implement the node for that structure with an int n, and then two pointers, one called left, one called right. But those are just nice arbitrary conventions. And what's nice now, especially if you kind of struggled conceptually with recursion, or thought that it wasn't really a solution to anything, especially if you could run out of memory. Now that we're talking about data structures and algorithms that allow us to traverse and manipulate them, turns out that recursion comes back in a much more compelling if not beautiful way. So this I propose is the implementation of a Search function. Given two inputs-- so think of this as a black box. Given two inputs, n, an int, and a pointer to a tree, a pointer to a node, or really the root of a tree, I claim that this function can return true or false, that value n is inside of this tree. What's inside of this black box? Well, four branches. The first just checks. If tree is null, just return false. If there's no node, there's no n, there's no number, just return false. If though, n, the value you're looking for, is less than tree arrow n, and just to be clear, what does it mean when I write tree and then the arrow notation, n? Exactly. It means dereference that pointer called tree. Go there, and then get inside of that node and get its field called n. And then compare the actual n that was passed into Search against it. So if n is less than, the n value in the tree node itself, well, what does that mean? That means nothing at first glance. Right? Just like when you have an array of values, you might like to apply binary search as a form of divide and conquer. But what assumption did we need to make for binary search to work at all in the phone book and earlier examples? How to be sorted. So let's refine the definition of tree here not to be just a tree, which can have any number of children. Not just a binary tree, which can have 0, 1, or 2 maximally. But as a binary search tree, or BST, which is just a fancy way of saying a binary tree such that every node's left child, if present, is less than the node. And every node's right child, if present, is greater than the node itself. So in other words, if you were to draw the tree out, all of the numbers are carefully balanced like this so that if you have 55 as the root, 33 can go to its left because it's less than 55. 77 can go to its right because it's greater than 55. But now notice, the same definition, it's a recursive definition verbally , has to apply for 33. 33's left child must be less than it, and 33's right child, 44, must be larger than it. So this is a binary search tree, and I propose, using a little bit of recursion, we can now find n. So if n is less than the value n that's current node, I'm going to go ahead and punt, so to speak, and just return whatever the answer is of searching for n on the tree's left child. Notice again, this function just expects a node star, a pointer to a node. So surely I can just do tree arrow left, which will lead me to another node. But what is that node? Well, according to this declaration, left is just a pointer, so that just means I'm passing to the search function a different pointer, namely the one that represents my left child's tree. So in this case, the pointer to 33, if this is our sample input Meanwhile, if n is greater than the value n at the current node in the tree, then I'm going to go ahead and punt in the other direction and just say, I don't know if this value n is in the tree, but I know if it is, it's down my right branch, so to speak. So let me call recursively search, passing an n again, but passing in a pointer to my right child. In other words, if I'm currently at 55 and I'm looking for 99, I know that 99 is greater than 55, so just like I tore the phone book weeks ago and we went right, similarly are we going to go right here. And I don't know if it's at my right child, and it's not, 77 is there, but I know it's in that direction. So I call search on my right child, 77, and let search figure out from there if 99 in this arbitrary example is actually there. Else, what's the final case? If tree is null is one case. If n is less than the current node's value is another case. If n is greater than the current node's value is a third case. What's the fourth and final case? I think we're out of cases, right? It must be that n is in the current node that I'm on. So if I'm searching for 55 at this point in the story, that branch of the tree would return true. So what's interesting here is that we actually, unlike weeks past, we kind of have two base cases. And they don't have to be all at the top. The top is a base case because if the tree is null, there's nothing to do. Just return a hard coded value of false. The bottom branch is sort of the default, whereby if we've checked for null, we've checked if it should be left, but it shouldn't be, we've checked if it should be right, but it shouldn't be, clearly it has to be right where we are. That's a base case. So there's two recursive cases sandwiched there in the middle. But I could have written this in any order. I just thought it kind of felt natural to first check for a possible error, then check left, then check right, then assume that you're at the node you're actually looking for. So why might this be useful? So it turns out-- and let me jump to a teaser here that's in the web. We're going to start using not a programming language at first, but a markup language. A markup language being one that's similar in spirit to programming language, but it doesn't give you the ability to express yourself logically. It only gives you the ability to express yourself structurally. Where do you want to put something on the page, the web page? What color do you want to make it? What font size do you want to make it? What words do you actually want on the web page? So that's a markup language. But then we'll very quickly introduce JavaScript, which is a full-fledged programming language. Very similar syntactically in appearance to C, but it'll have some nice, more powerful, more user friendly features. And one of the frustrations at this point in the semester is that we'll soon implement speller in far fewer lines of code using other languages than C itself allows, but for reason's we'll soon understand. This will be the first such web page. It will be completely underwhelming, the first one we make. It will simply say, hello world. But if you've never seen it before, this is HTML, HyperText Markup Language. If you go to a certain menu option in most any browser, on any web page on the internet, you can see the HTML that some people wrote to create that web page. And it probably doesn't look as brief or as neat as this. But it will follow the pattern of these open brackets and slashes and letters and potentially numbers. I thought I'd give you a teaser of what you'll be able to do after taking CS50. Let me go to cs.harvard.edu/rob, our own Rob Bowden's homepage. He made this for us. So you'll soon be able to do that. And also, what you heard this morning-- what you heard this morning-- [HAMSTER DANCE MUSIC] --you'll be able to make this. That awaits us on Wednesday. We will see you then. [HAMSTER DANCE MUSIC] DAVID MALAN: At the next CS50--