[Section 7: More Comfortable] [Rob Bowden] [Harvard University] [This is CS50] [CS50.TV] All right. So like I said in my email, this is going to be a binary tree-intensive section. But there aren't that many questions. So we're going to try and draw out each question and go into painful detail of all the best ways of doing things. So right at the beginning, we go through sample drawings of binary trees and stuff. So here, "Remember that a binary tree has nodes similar to those of a linked list, except instead of one pointer there are two: one for the left 'child' and one for the right 'child'." So a binary tree is just like a linked list, except the struct is going to have two pointers. There's trinary trees, which are going to have three pointers, there are N-ary trees, which just have a generic pointer that you then have to malloc to be large enough to have enough pointers to all the possible children. So binary tree just happens to have a constant number of two. If you want, you can give a linked list as a unary tree, but I don't think anyone calls it that. "Draw a boxes-and-arrows diagram of a binary tree node containing Nate's favorite number, 7, where each child pointer is null." So iPad mode. It's going to be pretty straightforward. We're just going to have a node, I'll draw it as a square. And I'll draw the values in here. So the value will go in here, and then down here, we'll have the left pointer on the left and the right pointer on the right. And it is very much so convention to call it left and right, the pointer names. Both of these are going to be null. That will just be null, and that will just be null. Okay. So back to here. "With a linked list, we only had to store a pointer to the first node in the list in order to remember the whole linked list, or the whole list. Likewise, with trees, we only have to store a pointer to a single node in order to remember the whole tree. This node is called the "root" of the tree. Build upon your diagram from before or draw a new one such that you have a boxes-and-arrows depiction of a binary tree with the the value 7, then 3 in the left, then 9 on the right, and then 6 on the right of the 3." Let's see if I can remember all of that in my head. So this is going to be our root up here. We'll have some pointer somewhere, something that we'll call root, and it's pointing to this guy. Now to make a new node, what do we have, 3 on the left? So a new node with 3, and it initially points null. I'll just put N. Now we want to make that go to the left of 7. So we change this pointer to now point to this guy. And we do the same. We want a 9 over here, which initially just says null. We're going to change this pointer, point to 9, and now we want to put 6 to the right of 3. So it's going to--make a 6. And that guy will point there. Okay. So that's all it asks us to do. Now let's go over some terminology. We already talked about how the root of the tree is the top-most node in the tree. The one containing 7. The nodes at the bottom of the tree are called the leaves. Any node which just has null as its children is a leaf. So it is possible, if our binary tree is just a single node, that a tree is a leaf, and that's it. "The 'height' of the tree is the number of hops you have to make to get from the top to a leaf." We'll get into, in a second, the difference between balanced and unbalanced binary trees, but for now, the overall height of this tree, I would say, is 3, although if you count the number of hops you have to make in order to get to 9, then it's really only a height of 2. Right now, this is an unbalanced binary tree, but we'll talked about balanced when it gets to be relevant. So now we can talk about nodes in a tree in terms relative to the other nodes in the tree. So now we have parents, children, siblings, ancestors, and descendants. They are pretty common sense, what they mean. If we ask--it's parents. So what is the parent of 3? [Students] 7. >>Yeah. The parent is just going to be what points to you. Then what are the children of 7? [Students] 3 and 9. >>Yeah. Notice that "children" literally means children, so 6 would not apply, because it's like a grandchild. But then if we go descendants, so what are the descendants of 7? [Students] 3, 6 and 9. >>Yeah. The descendants of the root node is going to be everything in the tree, except maybe the root node itself, if you don't want to consider that a descendant. And finally, ancestors, so it's the opposite direction. So what are the ancestors of 6? [Students] 3 and 7. >>Yeah. 9 is not included. It's just the direct lineage back to the root is going to be your ancestors. "We say that a binary tree is 'ordered' if for each node in the tree, all of its descendants on the left have lesser values and all of the ones on the right have greater values. For example, the tree above is ordered but it's not the only possible ordered arrangement." Before we get to that, an ordered binary tree is also known as a binary search tree. Here we happen to be calling it an ordered binary tree, but I have never heard it called an ordered binary tree before, and on a quiz, we are much more likely to put binary search tree. They're one and the same, and it's important you recognize the distinction between binary tree and binary search tree. A binary tree is just a tree that points to two things. Each node points to two things. There is no reasoning about the values that it points to. So like over here, since it's a binary search tree, we know that if we go left of 7, then all of the values that we can possibly reach by going left of 7 have to be less than 7. Notice that all the values less than 7 are 3 and 6. Those are all to the left of 7. If we go to the right of 7, everything has to be greater than 7, so 9 is to the right of 7, so we're good. This is not the case for a binary tree-- for a regular binary tree we can just have 3 at the top, 7 to the left, 9 to the left of 7; there's no ordering of values whatsoever. Now, we won't actually do this because it's tedious and unnecessary, but "try to draw as many ordered trees as you can think of using the numbers 7, 3, 9, and 6. How many distinct arrangements are there? What is the height of each one?" We'll do a couple, but the main idea is this is in no way a unique representation of a binary tree containing these values. All we need is some binary tree that contains 7, 3, 6, and 9. Another possible valid one would be the root is 3, go to the left and it's 6, go to the left and it's 7, go to the left and it's 9. That is a perfectly valid binary search tree. It's not very helpful, because it's just like a linked list and all of these pointers are just null. But it is a valid tree. Yeah? [Student] Don't the values have to be greater on the right? Or is this--? >>These I meant to go the other way. There's also--yeah, let's switch that. 9, 7, 6, 3. Good catch. It still has to obey what a binary tree search is supposed to do. So everything to the left has to be less than any given node. We could just move, say, this 6 and put it here. No, we can't. Why do I keep doing that? Let's do--here is 6, here is 7, 6 points to 3. This is still a valid binary search tree. What is wrong if I--let's see if I can come up with an arrangement. Yeah, okay. So what is wrong with this tree? I guess I've already given you a hint that there is something wrong with it. Why do I keep doing that? Okay. This looks reasonable. If we look at each node, like 7, then to the left of 7 is a 3. So we have 3, the thing to the right of 3 is a 6. And if you look at 6, the thing to the right of 6 is a 9. So why is this not a valid binary search tree? [Students] 9 is still to the left of 7. >>Yeah. It must be true that all values you can possibly reach by going to the left of 7 are less than 7. If we go left of 7, we get to 3 and we can still get to 6, we can still get to 9, but by having gone less than 7, we shouldn't be able to get to a number that's greater than 7. So this is not a valid binary search tree. My brother actually had an interview question that was basically this: just code up something to validate whether a tree is a binary search tree, and so the first thing he did was just check to see if the left and right are correct, and then iterate down there. But you can't just do that; you have to keep track of the fact that now that I've gone left of 7, everything in this subtree must be less than 7. The correct algorithm needs to keep track of the bounds that the values can possibly fall in. We won't go through all of them. There is a nice recurrence relation, although we haven't gotten to those, or we won't get to those, defining how many there actually are. So there are 14 of them. The idea of how you would do it mathematically is like you can pick any single one to be the root node, and then if I pick 7 to be the root node, then there are, say, some numbers that can go be my left node, and there are some numbers that can be my right node, but if I have n total numbers, then the amount that can go to the left plus the amount that can go to the right is n - 1. So of the remaining numbers, they have to be able to go either to the left or the right. It seems difficult that if I put 3 first, then everything has to go to the left, but if I put 7, then some things can go the the left and some things can go to the right. And by "3 first", I meant everything can go to the right. It's really--you just have to think about it as-- how many things can go on the next level of the tree. And it comes out to be 14; or you can draw all of them, and then you'll get 14. Going back here, "Ordered binary trees are cool because we can search through them in a very similar way to searching over a sorted array. To do so, we start at the root and work our way down the tree towards leaves, checking each node's values against the values we're searching for. If the current node's value is less than the value we're looking for, you go next to the node's right child. Otherwise, you go to the node's left child. At some point, you'll either find the value you're looking for, or you'll run into a null, indicating the value's not in the tree." I have to redraw the tree we had before; that'll take a second. But we want to look up whether 6, 10, and 1 are in the tree. So what it was, 7, 9, 3, 6. Okay. The numbers you want to look up, we want to look up 6. How does this algorithm work? Well, we also have some root pointer to our tree. And we would go to the root and say is this value equal to the value we're searching for? So we're looking for 6, so it's not equal. So we keep going, and now we say okay, so 6 is less than 7. Does that mean we want to go to the left, or do we want to go to the right? [Student] Left. >>Yeah. It's significantly easier, all you have to do is draw one possible node of the tree and then you don't--instead of trying to think in your head, okay, if it's less, do I go to the left or go the right, just looking at this picture, it's very clear that I have to go to the left if this node is greater than the value that I'm looking for. So you go to the left; now I'm at 3. I want to--3 is less than the value I'm looking for, which is 6. So we go to the right, and now I end up at 6, which is the value I'm looking for, so I return true. The next value I'm going to look for is 10. Okay. So 10, now, is going to--cut off that--going to follow the root. Now, 10 is going to be greater than 7, so I want to look to the right. I'm going to come over here, 10 is going to be greater than 9, so I'm going to want to look to the right. I come over here, but over here now I'm at null. What do I do if I hit null? [Student] Return false? >>Yeah. I did not find 10. 1 is going to be a nearly identical case, except it's just going to be flipped; instead of looking down the right side, I'm going to look down the left side. Now I think we actually get to code. Here's where--open up the CS50 Appliance and navigate your way there, but you can also just do it in the space. It's probably ideal to do it in the space, because we can work in the space. "First we'll need a new type definition for a binary tree node containing int values. Using the boilerplate typedef below, create a new type definition for a node in a binary tree. If you get stuck. . . " blah, blah, blah. Okay. So let's put the boilerplate here, typedef struct node, and node. Yeah, okay. So what are the fields we're going to want in our node? [Student] Int and then two pointers? >>Int value, two pointers? How do I write the pointers? [Student] Struct. >>I should zoom in. Yeah, so struct node* left, and struct node* right. And remember the discussion from last time, that this makes no sense, this makes no sense, this makes no sense. You need everything there in order to define this recursive struct. Okay, so that's what our tree is going to look like. If we did a trinary tree, then a node might look like b1, b2, struct node* b3, where b is a branch--actually, I've more heard it left, middle, right, but whatever. We only care about binary, so right, left. "Now declare a global node * variable for the root of the tree." So we're not going to do that. In order to make things slightly more difficult and more generalized, we won't have a global node variable. Instead, in main, we will declare all our node things, and that means that below, when we start running our contains function and our insert function, instead of our contains function just using this global node variable, we're going to have it take as an argument the tree that we want it to process. Having the global variable was supposed to make things easier. We're going to make things harder. Now take a minute or so to just do this sort of thing, where inside of main you want to create this tree, and that's all you want to do. Try and construct this tree in your main function. Okay. So you don't even have to have constructed the tree the entire way yet. But anyone have something I could pull up to show how one might start constructing such a tree? [Student] Someone's banging, trying to get out. [Bowden] Anyone comfortable with their tree construction? [Student] Sure. It's not done. >>It's fine. We can just finish-- oh, can you save it? Hooray. So here we have--oh, I'm slightly cut off. Am I zoomed in? Zoom in, scroll out. >>I have a question. >>Yeah? [Student] When you define struct, are things like initialized to anything? [Bowden] No. >>Okay. So you would have to initialize-- [Bowden] No. When you define, or when you declare a struct, it is not initialized by default; it's just like if you declare an int. It's exactly the same thing. It's like each of its individual fields can have a garbage value in it. >>And is it possible to define--or to declare a struct in a way that it does initialize them? [Bowden] Yes. So, shortcut initialization syntax is going to look like-- There's two ways we can do this. I think we should compile it to make sure Clang also does this. The order of arguments that comes in the struct, you put as the order of arguments inside of these curly braces. So if you want to initialize it to 9 and left be null and then right be null, it would be 9, null, null. The alternative is, and the editor does not like this syntax, and it thinks I want a new block, but the alternative is something like-- here, I'll put it on a new line. You can explicitly say--I forget the exact syntax. So you can explicitly address them by name and say .c, or .value = 9, .left = NULL. I'm guessing these need to be commas. .right = NULL, so this way you don't actually need to know the order of the struct, and when you're reading this, it's much more explicit about what the value's being initialized to. This happens to be one of the things that-- so, for the most part, C++ is a superset of C. You can take C code, move it over to C++, and it should compile. This is one of the things that C++ does not support, so people tend not to do it. I don't know if that's the only reason people tend not to do it, but the case where I needed to use it needed to work with C++ and so I couldn't use this. Another example of something that doesn't work with C++ is how malloc returns a "void *", technically, but you could just say char* x = malloc whatever, and it will automatically be cast to a char*. That automatic cast does not happen in C++. That would not compile, and you would explicitly need to say char*, malloc, whatever, to cast it to a char*. There aren't many things that C and C++ disagree on, but those are two. So we'll go with this syntax. But even if we didn't go with that syntax, what is--might be wrong with this? [Student] I don't need to dereference it? >>Yeah. Remember that the arrow has an implicit dereference, and so when we're just dealing with a struct, we want to use . to get at a field inside of the struct. And the only time we use arrow is when we want to do-- well, arrow is equivalent to-- that's what it would have meant if I used arrow. All arrow means is dereference this, now I'm at a struct, and I can get the field. Either get the field directly or dereference and get the field-- I guess this should be value. But here I'm dealing with just a struct, not a pointer to a struct, and so I cannot use the arrow. But this sort of thing we can do for all nodes. Oh my God. This is 6, 7, and 3. Then we can set up the branches in our tree, we can have 7-- we can have--its left should point to 3. So how do we do that? [Students, unintelligible] >>Yeah. The address of node3, and if you didn't have address, then it just wouldn't compile. But remember that these are pointers to the next nodes. The right should point to 9, and 3 should point on the right to 6. I think this is all set. Any comments or questions? [Student, unintelligible] The root is going to be 7. We can just say node* ptr = or root, = &node7. For our purposes, we're going to be dealing with insert, so we're going to want to write a function to insert into this binary tree, and insert is inevitably going to call malloc to create a new node for this tree. So things are going to get messy with the fact that some nodes are currently on the stack and other nodes are going to end up on the heap when we insert them. This is perfectly valid, but the only reason we're able to do this on the stack is because it's such a contrived example that we know the tree is supposed to be constructed as 7, 3, 6, 9. If we didn't have this, then we wouldn't have to malloc in the first place. As we'll see a bit later, we should be malloc'ing. Right now it's perfectly reasonable to put on the stack, but let's change this to a malloc implementation. So each of these is now going to be something like node* node9 = malloc (sizeof(node)). And now we're going to have to do our check. if (node9 == NULL)--I didn't want that-- return 1, and then we can do node9-> because now it's a pointer, value = 6, node9->left = NULL, node9->right = NULL, and we're going to have to do that for each of those nodes. So instead, let's put it inside of a separate function. Let's call it node* build_node, and this is somewhat similar to the APIs we provide for Huffman encoding. We give you initializer functions for a tree and deconstructor "functions" for those trees and the same for forests. So here we're going to have an initializer function to just build a node for us. And it's going to look pretty much exactly like this. And I'm even going to be lazy and not change the name of the variable, even though node9 makes no sense anymore. Oh, I guess node9's value should not have been 6. Now we can return node9. And here we should return null. Everyone agree on that build-a-node function? So now we can just call that to build any node with a given value and null pointers. Now we can call that, we can do node* node9 = build_node(9). And let's do. . . 6, 3, 7, 6, 3, 7. And now we want to set up the same pointers, except now everything's already in terms of pointers so no longer need the address of. Okay. So what's the last thing I want to do? There's an error-checking that I'm not doing. What does build node return? [Student, unintelligible] >>Yeah. If malloc failed, it'll return null. So I'm going to lazily put it down here instead of doing a condition for each. If (node9 == NULL) or--even simpler, this is equivalent to just if not node9. So if not node9, or not node6, or not node3, or not node7, return 1. Maybe we should print "malloc failed" or something. [Student] Is false equal to null as well? [Bowden] Any zero value is false. So null is a zero value. Zero is a zero value. False is a zero value. Any--pretty much the only two zero values are null and zero, false is just hash-defined as zero. That also applies if we do declare global variable. If we did have node* root up here, then--the nice thing about global variables is that they always have an initial value. That's not true of functions, how inside of here, if we have, like, node* or node x. We have no idea what x.value, x.whatever, or we could print them and they could be arbitrary. That's not true of global variables. So node root or node x. By default, everything that's global, if not explicitly initialized to some value, has a zero value as its value. So here, node* root, we don't explicitly initialize it to anything, so its default value will be null, which is the zero value of pointers. The default value of x is going to mean that x.value is zero, x.left is null, and x.right is null. So since it is a struct, all of the fields of the struct will be zero values. We don't need to use that here, though. [Student] The structs are different than other variables, and the other variables are garbage values; these are zeros? [Bowden] Other values too. So in x, x will be zero. If it's at global scope, it has an initial value. >>Okay. [Bowden] Either the initial value you gave it or zero. I think that takes care of all of this. Okay. So the next part of the question asks, "Now we want to write a function called contains with a prototype of bool contains int value." We are not going to do bool contains int value. Our prototype is going to look like bool contains(int value). And then we're also going to pass it the tree that it should be checking to see if it has that value. So (node* tree). Okay. And then we can call it with something like, maybe we'll want to printf or something. Contains 6, our root. That should return 1, or true, whereas contains 5 root should return false. So take a second to implement this. You can do it either iteratively or recursively. The nice thing about the way we've set things up is that it lends itself to our recursive solution much easier than the global-variable way did. Because if we just have contains int value, then we have no way of recursing down subtrees. We would have to have a separate helper function that recurses down the subtrees for us. But since we've changed it to take the tree as an argument, which it should have always been in the first place, now we can recurse more easily. So iterative or recursive, we'll go over both, but we'll see that recursive ends up being quite easy. Okay. Does anyone have something we can work with? [Student] I've got an iterative solution. >>All right, iterative. Okay, this looks good. So, want to walk us through it? [Student] Sure. So I set a temp variable to get the first node of the tree. And then I just looped through while temp doesn't equal null, so while was still in the tree, I guess. And if the value is equal to the value that temp is pointing to, then it returns that value. Otherwise, it checks if it's on the right side or the left side. If you ever get a situation where there's no more tree, then it returns--it exits the loop and returns false. [Bowden] Okay. So that seems good. Anyone have any comments on anything? I have no correctness comments at all. The one thing we can do is this guy. Oh, it's going to go a little longish. I'll fix that up. Okay. Everyone should remember how ternary works. There have definitely been quizzes in the past that give you a function with a ternary operator, and say translate this, do something that doesn't use ternary. So this is a very common case of when I would think to use ternary, where if some condition set a variable to something, else set that same variable to something else. That's something that very frequently can be transformed into this sort of thing where set that variable to this--or, well, is this true? Then this, else this. [Student] The first one is if true, right? [Bowden] Yeah. The way I always read it is, temp equals value greater than temp value, then this, else this. It's asking a question. Is it greater? Then do the first thing. Else do the second thing. I almost always--the colon, I just--in my head, I read as else. Does anyone have a recursive solution? Okay. This one we're going to--it could already be great, but we're going to make it even better. This is pretty much the same exact idea. It's just, well, do you want to explain? [Student] Sure. So we're making sure that the tree isn't null first, because if the tree is null, then it's going to return false because we haven't found it. And if there's still a tree, we go into--we first check if the value is the current node. Return true if it is, and if not, we recurse on the left or right. Does that sound appropriate? >>Mm-hmm. (Agreement) So notice that this is almost--structurally very similar to the iterative solution. It's just that instead of recursing, we had a while loop. And the base case here where tree does not equal null was the condition under which we broke out of the while loop. They're very similar. But we're going to take this one step further. Now, we do the same thing here. Notice we're returning the same thing in both of these lines, except for one argument is different. So we're going to make that into a ternary. I hit Option-something, and it made a symbol. Okay. So we're going to return contains that. This is getting to be multiple lines--well, zoomed in, it is. Usually, as a stylistic thing, I don't think many people put a space after the arrow, but I guess if you're consistent, it's fine. If value is less than tree value, we want to recurse on tree left, else we want to recurse on tree right. So that was step one of making this look smaller. Step two of making this look smaller-- we can separate this to multiple lines. Okay. Step two of making it look smaller is here, so return value equals tree value, or contains whatever. This is an important thing. I'm not sure if he said it explicitly in class, but it's called short-circuit evaluation. The idea here is value == tree value. If that is true, then this is true, and we want to OR that with whatever is over here. So without even thinking about whatever is over here, what is the entire expression going to return? [Student] True? >>Yeah, because true of anything, ORed--or true ORed with anything is necessarily true. So as soon as we see return value = tree value, we're just going to return true. Not even going to recurse further contains down the line. We can take this one step further. Return tree does not equal null and all of this. It made it a one-line function. This is also an example of short-circuit evaluation. But now it's the same idea-- instead of--so if tree does not equal null--or, well, if tree does equal null, which is the bad case, if tree equals null, then the first condition is going to be false. So false ANDed with anything is going to be what? [Student] False. >>Yeah. This is the other half of short-circuit evaluation, where if tree does not equal null, then we aren't going to even go-- or if tree does equal null, then we aren't going to do value == tree value. We're just going to immediately return false. Which is important, since if it did not short-circuit evaluate, then if tree does equal null, this second condition is going to seg fault, because tree->value is dereferencing null. So that's that. Can make this--shift once over. This is a very common thing also, not making this one line with this, but it's a common thing in conditions, maybe not right here, but if (tree != NULL && tree->value == value), do whatever. This is a very common condition, where instead of having to break this into two ifs, where like, is the tree null? Okay, it's not null, so now is the tree value equal to value? Do this. Instead, this condition, this will never seg fault because it will break out if this happens to be null. Well, I guess if your tree is a completely invalid pointer, it can still seg fault, but it can't seg fault if tree is null. If it were null, it would break out before you ever dereferenced the pointer in the first place. [Student] Is this called lazy evaluation? [Bowden] Lazy evaluation is a separate thing. Lazy evaluation is more like you ask for a value, you ask to calculate a value, kind of, but you don't need it immediately. So until you actually need it, it isn't evaluated. This isn't exactly the same thing, but in the Huffman pset, it says that we "lazily" write. The reason we do that is because we're actually buffering the write-- we don't want to write individual bits at a time, or individual bytes at a time, we instead want to get a chunk of bytes. Then once we have a chunk of bytes, then we'll write it out. Even though you ask it to write--and fwrite and fread do the same sort of thing. They buffer your reads and writes. Even though you ask it to write immediately, it probably won't. And you can't be sure that things are going to be written until you call hfclose or whatever it is, which then says okay, I'm closing my file, that means I'd better write everything I haven't written yet. It has no need to write everything out until you are closing the file, and then it needs to. So that's just what lazy--it waits until it has to happen. This--take 51 and you'll go into it in more detail, because OCaml and everything in 51, everything is recursion. There are no iterative solutions, basically. Everything is recursion, and lazy evaluation is going to be important for a lot of circumstances where, if you didn't lazily evaluate, that would mean-- The example is streams, which are infinitely long. In theory, you can think of the natural numbers as a stream of 1-2-3-4-5-6-7, so lazily evaluated things are fine. If I say I want the tenth number, then I can evaluate up to the tenth number. If I want the hundredth number, then I can evaluate up to the hundredth number. Without lazy evaluation, then it's going to try to evaluate all numbers immediately. You're evaluating infinitely many numbers, and that's just not possible. So there are a lot of circumstances where lazy evaluation is just essential to getting things to work. Now we want to write insert where insert is going to be similarly changing in its definition. So right now it's bool insert(int value). We're going to change that to bool insert(int value, node* tree). We're actually going to change that again in a bit, we'll see why. And let's move build_node, just for the heck of it, above insert so we don't have to write a function prototype. Which is a hint that you're going to be using build_node in insert. Okay. Take a minute for that. I think I saved the revision if you want to pull from that, or, at least, I did now. I wanted a slight break to think about the logic of insert, if you can't think of it. Basically, you will only ever be inserting at leaves. Like, if I insert 1, then I'm inevitably going to be inserting 1-- I'll change to black--I'll be inserting 1 over here. Or if I insert 4, I want to be inserting 4 over here. So no matter what you do, you're going to be inserting at a leaf. All you have to do is iterate down the tree until you get to the node that should be the node's parent, the new node's parent, and then change its left or right pointer, depending on whether it's greater than or less than the current node. Change that pointer to point to your new node. So iterate down the tree, make the leaf point to the new node. Also think about why that forbids the type of situation before, where I constructed the binary tree where it was correct if you only looked at a single node, but 9 was to the left of 7 if you iterated down all the way. So that's impossible in this scenario, since-- think about inserting 9 or something; at the very first node, I'm going to see 7 and I'm just going to go to the right. So no matter what I do, if I'm inserting by going to a leaf, and to a leaf using the appropriate algorithm, it's going to be impossible for me to insert 9 to the left of 7 because as soon as I hit 7, I'm going to go to the right. Does anyone have something to start with? [Student] I do. >>Sure. [Student, unintelligible] [Other student, unintelligible] [Bowden] It's appreciated. Okay. Want to explain? [Student] Since we know that we were inserting new nodes at the end of the tree, I looped through the tree iteratively until I got to a node that pointed to null. And then I decided to put it either on the right side or the left side using this right variable; it told me where to put it. And then, essentially, I just made that last-- that temp node point to the new node that it was creating, either on the left side or on the right side, depending on what the value right was. Finally, I set the new node value to the value of its testing. [Bowden] Okay, so I see one issue here. This is like 95% of the way there. The one issue that I see, well, does anyone else see an issue? What is the circumstance under which they break out of the loop? [Student] If temp is null? >>Yeah. So how you break out of the loop is if temp is null. But what do I do right here? I dereference temp, which is inevitably null. So the other thing you need to do is not just keep track until temp is null, you want to keep track of the parent at all times. We also want node * parent, I guess we can keep that at null at first. This is going to have weird behavior for the root of the tree, but we'll get to that. If value is greater than whatever, then temp = temp right. But before we do that, parent = temp. Or are parents always going to equal temp? Is that the case? If temp is not null, then I'm going to move down, no matter what, to a node for which temp is the parent. So parent's going to be temp, and then I move temp down. Now temp is null, but parent points to the parent of the thing that is null. So down here, I don't want to set right equals 1. So I moved to the right, so if right = 1, and I guess you also want to do-- if you move to the left, you want to set right equal to 0. Or else if you ever move to the right. So right = 0. If right = 1, now we want to make the parent right pointer newnode, else we want to make the parent left pointer newnode. Questions on that? Okay. So this is the way we--well, actually, instead of doing this, we half expected you to use build_node. And then if newnode equals null, return false. That's that. Now, this is what we expected you to do. This is what the staff solutions do. I disagree with this as the "right" way of going about it, but this is perfectly fine and it will work. One thing that's a little weird right now is if the tree starts off as null, we pass in a null tree. I guess it depends on how you define the behavior of passing in a null tree. I would think that if you pass in a null tree, then inserting the value into a null tree should just return a tree where the only value is that single node. Do people agree with that? You could, if you wanted, if you pass in a null tree and you want to insert a value into it, return false. It's up to you to define that. To do the first thing I said and then-- well, you're going to have trouble doing that, because it would be easier if we had a global pointer to the thing, but we don't, so if tree is null, there's nothing we can do about that. We can just return false. So I'm going to change insert. We technically could just change this right here, how it's iterating over things, but I'm going to change insert to take a node** tree. Double pointers. What does this mean? Instead of dealing with pointers to nodes, the thing I'm going to be manipulating is this pointer. I'm going to be manipulating this pointer. I'm going to be manipulating pointers directly. This makes sense since, think about down-- well, right now, this points to null. What I want to do is manipulate this pointer to point to not null. I want it to point to my new node. If I just keep track of pointers to my pointers, then I don't need to keep track of a parent pointer. I can just keep track to see if the pointer is pointing to null, and if the pointer is pointing to null, change it to point to the node I want. And I can change it since I have a pointer to the pointer. Let's see this right now. You can actually do it recursively pretty easily. Do we want to do that? Yes, we do. Let's see it recursively. First, what is our base case going to be? Almost always our base case; but actually, this is kind of tricky. First things first, if (tree == NULL) I guess we're just going to return false. This is different from your tree being null. This is the pointer to your root pointer being null, which means that your root pointer does not exist. So down here, if I do node*--let's just reuse this. Node* root = NULL, and then I'm going to call insert by doing something like insert 4 into &root. So &root, if root is a node*, then &root is going to be a node**. This is valid. In this case, tree, up here, tree is not null--or insert. Here. Tree is not null; *tree is null, which is fine because if *tree is null, then I can manipulate it to now point to what I want it to point to. But if tree is null, that means I just came down here and said null. That does not make sense. I can't do anything with that. If tree is null, return false. So I basically already said what our real base case is. And what is that going to be? [Student, unintelligible] [Bowden] Yes. So if (*tree == NULL). This relates to the case over here where if my red pointer is the pointer I'm focused on, so like I'm focused on this pointer, now I'm focused on this pointer. Now I'm focused on this pointer. So if my red pointer, which is my node**, is ever--if *, my red pointer, is ever null, that means that I am at the case where I'm focusing on a pointer that points-- this is a pointer that belongs to a leaf. I want to change this pointer to point to my new node. Come back over here. My new node will just be node* n= build_node(value) then n, if n = NULL, return false. Else we want to change what the pointer is currently pointing to to now point to our newly built node. We can actually do that here. Instead of saying n, we say *tree= if *tree. Everyone understand that? That by dealing with pointers to pointers, we can change null pointers to point to things we want them to point to. That's our base case. Now our recurrence, or our recursion, is going to be very similar to all other recursions we've been doing. We're going to want to insert value, and now I'm going to use ternary again, but what is our condition going to be? What is it we're looking for to decide whether we want to go left or right? Let's do it in separate steps. If (value <) what? [Student] The tree's value? [Bowden] So remember that I'm currently-- [Students, unintelligible] [Bowden] Yeah, so right here, let's say that this green arrow is an example of what tree currently is, is a pointer to this pointer. So that means I am a pointer to a pointer to 3. The dereference twice sounded good. What do I--how do I do that? [Student] Dereference once, and then do arrow that way? [Bowden] So (*tree) is the dereference once, ->value is going to give me the value of the node that I'm indirectly pointing to. So I can also write it **tree.value if you prefer that. Either works. If that is the case, then I want to call insert with value. And what is my updated node** going to be? I want to go to the left, so **tree.left is going to be my left. And I want the pointer to that thing so that if the left ends up being the null pointer, I can modify it to point to my new node. And the other case can be very similar. Let's actually make that my ternary right now. Insert value if value < (**tree).value. Then we want to update our ** to the left, else we want to update our ** to the right. [Student] Does that get the pointer to the pointer? [Bowden] Remember that--**tree.right is a node star. [Student, unintelligible] >>Yeah. **tree.right is like this pointer or something. So by taking a pointer to that, that gives me what I want of the pointer to that guy. [Student] Could we go over again why we are using the two pointers? [Bowden] Yeah. So--no, you can, and that solution before was a way of doing it without doing two pointers. You need to be able to understand using two pointers, and this is a cleaner solution. Also, notice that, what happens if my tree-- what happens if my root was null? What happens if I do this case right here? So node* root = NULL, insert 4 into &root. What is root going to be after this? [Student, unintelligible] >>Yeah. Root value is going to be 4. Root left is going to be null, root right is going to be null. In the case where we did not pass root by address, we could not modify root. In the case where the tree--where root was null, we just had to return false. There's nothing we could do. We cannot insert a node into an empty tree. But now we can; we just make an empty tree into a one-node tree. Which is usually the expected way that it's supposed to work. Furthermore, this is significantly shorter than also keeping track of the parent, and so you iterate down all the way. Now I have my parent, and I just have my parent right pointer to the whatever. Instead, if we did this iteratively, it'd be the same idea with a while loop. But instead of having to deal with my parent pointer, instead, my current pointer would be the thing that I'm directly modifying to point to my new node. I don't have to deal with whether it's pointing to the left. I don't have to deal with whether it's pointing to the right. It's just whatever this pointer is, I'm going to set it to point to my new node. Everyone understand how it works? If not why do we want to do it this way, but at least that this works as a solution? [Student] Where do we return true? [Bowden] That's probably right here. If we correctly insert it, return true. Else, down here we're going to want to return whatever insert returns. And what's special about this recursive function? It's tail recursive, so as long as we compile with some optimization, it will recognize that and you will never get a stack overflow from this, even if our tree has a height of 10 thousand or 10 million. [Student, unintelligible] [Bowden] I think it does it at dash--or what optimization level is required for a tail recursion to be recognized. I think it recognizes--GCC and Clang also have different meanings for their optimization levels. I want to say it's -o 2, for sure that it will recognize tail recursion. But we--you could construct like a Fibonocci example or something. It's not easy to test with this, because it's difficult to construct a binary tree that's so large. But yeah, I think it's -o 2, that if you compile with -o 2, it will look for tail recursion and optimize that out. Let's go back to--insert's literally the last thing it needs. Let's go back to the insert over here where we're going to do the same idea. It'll still have the flaw of not being able to entirely handle when the root itself is null, or the past entry is null, but instead of dealing with a parent pointer, let's apply the same logic of keeping pointers to pointers. If here we keep our node ** cur, and we don't need to keep track of right anymore, but node ** cur = &tree. And now our while loop is going to be while *cur does not equal null. Don't need to keep track of parents anymore. Don't need to keep track of left and right. And I'll call it temp, because we're already using temp. Okay. So if (value > *temp), then &(*temp)->right else temp = &(*temp)->left. And now, at this point, after this while loop, I only do this because maybe it's easier to think about iteratively than recursively, but after this while loop, *temp is the pointer we want to change. Before we had parent, and we wanted to change either parent left or parent right, but if we want to change parent right, then *temp is parent right, and we can change it directly. So down here, we can do *temp = newnode, and that's it. So notice, all we did in this was take out lines of code. In order to keep track of the parent in all that is additional effort. Here, if we just keep track of the pointer to the pointer, and even if we wanted to get rid of all these curly braces now, make it look shorter. This now is the exact same solution, but fewer lines of code. Once you start recognizing this as a valid solution, it's also easier to reason about than like, okay, why do I have this flag at int right? What does that mean? Oh, it's signifying that every time I go right, I need to set it, else if I go left I need to set it to zero. Here, I don't have to reason about that; it's just easier to think about. Questions? [Student, unintelligible] >>Yeah. Okay, so in the last bit-- I guess one quick and easy function we can do is, let's--together, I guess, try and write a contains function which does not care whether it is a binary search tree. This contains function should return true if anywhere in this general binary tree is the value we're looking for. So let's first do it recursively and then we'll do it iteratively. We can actually just do it together, because this is going to be really short. What is my base case going to be? [Student, unintelligible] [Bowden] So if (tree == NULL), then what? [Student] Return false. [Bowden] Else, well, I don't need the else. If was my other base case. [Student] Tree's value? >>Yeah. So if (tree->value == value). Notice we're back to node*, not node**s? Contains will never need to use a node**, since we aren't modifying pointers. We're just traversing them. If that happens, then we want to return true. Else we want to traverse the children. So we cannot reason about whether everything to the left is less and everything to the right is greater. So what is our condition going to be here--or, what are we going to do? [Student, unintelligible] >>Yeah. Return contains(value, tree->left) or contains(value, tree->right). And that's it. And notice there is some short-circuit evaluation, where if we happen to find the value in the left tree, we never need to look at the right tree. That's the entire function. Now let's do it iteratively, which is going to be less nice. We'll take the usual start of node* cur = tree. While (cur != NULL). Quickly going to see a problem. If cur--out here, if we ever break out of this, then we've run out of things to look at, so return false. If (cur->value == value), return true. So now, we are at a place-- we don't know whether we want to go left or right. So arbitrarily, let's just go left. I've obviously run into an issue where I've completely abandoned everything-- I will only ever check the left side of a tree. I will never check anything that is a right child of anything. How do I fix this? [Student] You have to keep track of the left and right in a stack. [Bowden] Yeah. So let's make it struct list, node* n, and then node** next? I think that works fine. We want to go over the left, or let's--up here. Struct list list =, it'll start out at this struct list. *list = NULL. So that's going to be our linked list of subtrees that we have skipped over. We are going to traverse left now, but since we inevitably need to come back to the right, we're going to keep the right side inside of our struct list. Then we'll have new_list or struct, struct list*, new_list = malloc (sizeof(list)). I'm going to ignore error checking that, but you should check to see if it's null. New_list the node it's going to point to-- oh, that's why I wanted it up here. It's going to point to a second struct list. That's just how linked lists work. This is the same as an int linked list except we're just replacing int with node*. It's exactly the same. So new_list, the value of our new_list node, is going to be cur->right. The value of our new_list->next is going to be our original list, and then we're going to update our list to point to new_list. Now we need some sort of way of pulling things, like we have traversed the entire left subtree. Now we need to pull stuff out of it, like cur is null; we don't want to just return false. We want to now pull outside at our new list. A convenient way of doing this--well, actually, there's multiple ways of doing this. Anyone have a suggestion? Where I should do this or how I should do this? We only have a couple minutes, but any suggestions? Instead of--one way, instead of our condition being while, what we're currently looking at is not null, instead we're going to continue to go until our list itself is null. So if our list ends up being null, then we have run out of things to look for, to search over. But that means that the first thing in our list is just going to be the first node. The very first thing will be--we no longer need to see that. So list->n will be our tree. list->next is going to be null. And now while list does not equal null. Cur is going to pull something from our list. So cur is going to equal list->n. And then list is going to equal list->n, or list->next. So if cur value equals value. Now we can add both our right pointer and our left pointer as long as they're not null. Down here, I guess we should have done that in the first place. If (cur->right != NULL) then we are going to insert that node into our list. If (cur->left), this is a little bit of extra work, but it's fine. If (cur->left != NULL), and we're going to insert the left into our linked list, and that should be it. We iterate--as long as we have something in our list, we have another node to look at. So we look at that node, we advance our list to the next one. If that node is the value we're looking for, we can return true. Else insert both our left and right subtrees, as long as they're not null, into our list so that we inevitably go over them. So if they were not null, if our root pointer pointed to two things, then at first we pulled something out so our list ends up being null. And then we put two things back in, so now our list is of size 2. Then we're going to loop back up and we're just going to pull, let's say, the left pointer of our root node. And that'll just keep happening; we'll end up looping over everything. Notice that this was significantly more complicated in the recursive solution. And I've said multiple times that the recursive solution usually has much in common with the iterative solution. Here, this is exactly what the recursive solution is doing. The only change is that instead of implicitly using the stack, the program stack, as your way of keeping track of what nodes you still need to visit, now you have to explicitly use a linked list. In both cases you are keeping track of what node still needs to be visited. In the recursive case, it's just easier because a stack is implemented for you as the program stack. Notice that this linked list, it is a stack. Whatever we just put on the stack is immediately what we're going to pull off the stack to visit next. We're out of time, but any questions? [Student, unintelligible] [Bowden] Yeah. So if we have our linked list, current is going to point to this guy, and now we're just advancing our linked list to focus on this guy. We're traversing over the linked list in that line. And then I guess we should free our linked list and stuff once before returning true or false, we need to iterate over our linked list and always down here, I guess, if we cur right is not equal to, add it, so now we want to free cur because, well, did we completely forget about the list? Yeah. So that's what we want to do here. Where's the pointer? Cur was then--we want to a struct list* 10 equals list next. Free list, list = temp. And in the case where we return true, we do need to iterate over the remainder of our linked list, freeing things. The nice thing about the recursive solution is freeing things just means popping factorings off the stack, which will happen for you. So we've gone from something that's like 3 lines of hard-to-think-about code to something that is significantly many more hard-to-think-about lines of code. Any more questions? All right. We're good. Bye! [CS50.TV]