1 00:00:00,000 --> 00:00:02,770 [Section 7: More Comfortable] 2 00:00:02,770 --> 00:00:05,090 [Rob Bowden] [Harvard University] 3 00:00:05,090 --> 00:00:07,930 [This is CS50] [CS50.TV] 4 00:00:07,930 --> 00:00:10,110 All right. So like I said in my email, 5 00:00:10,110 --> 00:00:14,060 this is going to be a binary tree-intensive section. 6 00:00:14,060 --> 00:00:16,820 But there aren't that many questions. 7 00:00:16,820 --> 00:00:18,920 So we're going to try and draw out each question 8 00:00:18,920 --> 00:00:25,430 and go into painful detail of all the best ways of doing things. 9 00:00:25,430 --> 00:00:31,200 So right at the beginning, we go through sample drawings of binary trees and stuff. 10 00:00:31,200 --> 00:00:35,970 So here, "Remember that a binary tree has nodes similar to those of a linked list, 11 00:00:35,970 --> 00:00:38,150 except instead of one pointer there are two: one for the left 'child' 12 00:00:38,150 --> 00:00:41,950 and one for the right 'child'." 13 00:00:41,950 --> 00:00:45,630 So a binary tree is just like a linked list, 14 00:00:45,630 --> 00:00:47,910 except the struct is going to have two pointers. 15 00:00:47,910 --> 00:00:51,390 There's trinary trees, which are going to have three pointers, 16 00:00:51,390 --> 00:00:56,540 there are N-ary trees, which just have a generic pointer 17 00:00:56,540 --> 00:01:00,320 that you then have to malloc to be large enough to have 18 00:01:00,320 --> 00:01:04,840 enough pointers to all the possible children. 19 00:01:04,840 --> 00:01:08,200 So binary tree just happens to have a constant number of two. 20 00:01:08,200 --> 00:01:11,260 If you want, you can give a linked list as a unary tree, 21 00:01:11,260 --> 00:01:15,360 but I don't think anyone calls it that. 22 00:01:15,360 --> 00:01:18,060 "Draw a boxes-and-arrows diagram of a binary tree node 23 00:01:18,060 --> 00:01:24,110 containing Nate's favorite number, 7, where each child pointer is null." 24 00:01:24,110 --> 00:01:27,780 So iPad mode. 25 00:01:27,780 --> 00:01:30,280 >> It's going to be pretty straightforward. 26 00:01:30,280 --> 00:01:36,150 We're just going to have a node, I'll draw it as a square. 27 00:01:36,150 --> 00:01:38,730 And I'll draw the values in here. 28 00:01:38,730 --> 00:01:41,090 So the value will go in here, 29 00:01:41,090 --> 00:01:44,770 and then down here, we'll have the left pointer on the left and the right pointer on the right. 30 00:01:44,770 --> 00:01:50,430 And it is very much so convention to call it left and right, the pointer names. 31 00:01:50,430 --> 00:01:52,460 Both of these are going to be null. 32 00:01:52,460 --> 00:01:57,870 That will just be null, and that will just be null. 33 00:01:57,870 --> 00:02:03,630 Okay. So back to here. 34 00:02:03,630 --> 00:02:05,700 "With a linked list, we only had to store a pointer 35 00:02:05,700 --> 00:02:09,639 to the first node in the list in order to remember the whole linked list, or the whole list. 36 00:02:09,639 --> 00:02:11,650 Likewise, with trees, we only have to store a pointer 37 00:02:11,650 --> 00:02:13,420 to a single node in order to remember the whole tree. 38 00:02:13,420 --> 00:02:15,980 This node is called the "root" of the tree. 39 00:02:15,980 --> 00:02:18,320 Build upon your diagram from before or draw a new one 40 00:02:18,320 --> 00:02:21,690 such that you have a boxes-and-arrows depiction of a binary tree 41 00:02:21,690 --> 00:02:25,730 with the the value 7, then 3 in the left, 42 00:02:25,730 --> 00:02:32,760 then 9 on the right, and then 6 on the right of the 3." 43 00:02:32,760 --> 00:02:34,810 Let's see if I can remember all of that in my head. 44 00:02:34,810 --> 00:02:37,670 So this is going to be our root up here. 45 00:02:37,670 --> 00:02:41,600 We'll have some pointer somewhere, something that we'll call root, 46 00:02:41,600 --> 00:02:45,140 and it's pointing to this guy. 47 00:02:45,140 --> 00:02:48,490 Now to make a new node, 48 00:02:48,490 --> 00:02:52,870 what do we have, 3 on the left? 49 00:02:52,870 --> 00:02:57,140 So a new node with 3, and it initially points null. 50 00:02:57,140 --> 00:02:59,190 I'll just put N. 51 00:02:59,190 --> 00:03:02,250 Now we want to make that go to the left of 7. 52 00:03:02,250 --> 00:03:06,840 So we change this pointer to now point to this guy. 53 00:03:06,840 --> 00:03:12,420 And we do the same. We want a 9 over here, 54 00:03:12,420 --> 00:03:14,620 which initially just says null. 55 00:03:14,620 --> 00:03:19,600 We're going to change this pointer, point to 9, 56 00:03:19,600 --> 00:03:23,310 and now we want to put 6 to the right of 3. 57 00:03:23,310 --> 00:03:32,170 So it's going to--make a 6. 58 00:03:32,170 --> 00:03:34,310 And that guy will point there. 59 00:03:34,310 --> 00:03:38,320 Okay. So that's all it asks us to do. 60 00:03:38,320 --> 00:03:42,770 >> Now let's go over some terminology. 61 00:03:42,770 --> 00:03:46,690 We already talked about how the root of the tree is the top-most node in the tree. 62 00:03:46,690 --> 00:03:54,720 The one containing 7. The nodes at the bottom of the tree are called the leaves. 63 00:03:54,720 --> 00:04:01,240 Any node which just has null as its children is a leaf. 64 00:04:01,240 --> 00:04:03,680 So it is possible, if our binary tree is just a single node, 65 00:04:03,680 --> 00:04:10,130 that a tree is a leaf, and that's it. 66 00:04:10,130 --> 00:04:12,060 "The 'height' of the tree is the number of hops you have to make 67 00:04:12,060 --> 00:04:16,620 to get from the top to a leaf." 68 00:04:16,620 --> 00:04:18,640 We'll get into, in a second, the difference 69 00:04:18,640 --> 00:04:21,940 between balanced and unbalanced binary trees, 70 00:04:21,940 --> 00:04:29,470 but for now, the overall height of this tree, 71 00:04:29,470 --> 00:04:34,520 I would say, is 3, although if you count the number of hops 72 00:04:34,520 --> 00:04:39,710 you have to make in order to get to 9, then it's really only a height of 2. 73 00:04:39,710 --> 00:04:43,340 Right now, this is an unbalanced binary tree, 74 00:04:43,340 --> 00:04:49,840 but we'll talked about balanced when it gets to be relevant. 75 00:04:49,840 --> 00:04:52,040 So now we can talk about nodes in a tree in terms 76 00:04:52,040 --> 00:04:54,700 relative to the other nodes in the tree. 77 00:04:54,700 --> 00:04:59,730 So now we have parents, children, siblings, ancestors, and descendants. 78 00:04:59,730 --> 00:05:05,650 They are pretty common sense, what they mean. 79 00:05:05,650 --> 00:05:09,610 If we ask--it's parents. 80 00:05:09,610 --> 00:05:12,830 So what is the parent of 3? 81 00:05:12,830 --> 00:05:16,090 [Students] 7. >>Yeah. The parent is just going to be what points to you. 82 00:05:16,090 --> 00:05:20,510 Then what are the children of 7? 83 00:05:20,510 --> 00:05:23,860 [Students] 3 and 9. >>Yeah. 84 00:05:23,860 --> 00:05:26,460 Notice that "children" literally means children, 85 00:05:26,460 --> 00:05:31,010 so 6 would not apply, because it's like a grandchild. 86 00:05:31,010 --> 00:05:35,540 But then if we go descendants, so what are the descendants of 7? 87 00:05:35,540 --> 00:05:37,500 [Students] 3, 6 and 9. >>Yeah. 88 00:05:37,500 --> 00:05:42,330 The descendants of the root node is going to be everything in the tree, 89 00:05:42,330 --> 00:05:47,950 except maybe the root node itself, if you don't want to consider that a descendant. 90 00:05:47,950 --> 00:05:50,750 And finally, ancestors, so it's the opposite direction. 91 00:05:50,750 --> 00:05:55,740 So what are the ancestors of 6? 92 00:05:55,740 --> 00:05:58,920 [Students] 3 and 7. >>Yeah. 9 is not included. 93 00:05:58,920 --> 00:06:02,520 It's just the direct lineage back to the root 94 00:06:02,520 --> 00:06:13,230 is going to be your ancestors. 95 00:06:13,230 --> 00:06:16,300 >> "We say that a binary tree is 'ordered' if for each node in the tree, 96 00:06:16,300 --> 00:06:19,530 all of its descendants on the left have lesser values 97 00:06:19,530 --> 00:06:22,890 and all of the ones on the right have greater values. 98 00:06:22,890 --> 00:06:27,060 For example, the tree above is ordered but it's not the only possible ordered arrangement." 99 00:06:27,060 --> 00:06:30,180 Before we get to that, 100 00:06:30,180 --> 00:06:36,420 an ordered binary tree is also known as a binary search tree. 101 00:06:36,420 --> 00:06:38,660 Here we happen to be calling it an ordered binary tree, 102 00:06:38,660 --> 00:06:41,850 but I have never heard it called an ordered binary tree before, 103 00:06:41,850 --> 00:06:46,650 and on a quiz, we are much more likely to put binary search tree. 104 00:06:46,650 --> 00:06:49,250 They're one and the same, 105 00:06:49,250 --> 00:06:54,580 and it's important you recognize the distinction between binary tree and binary search tree. 106 00:06:54,580 --> 00:06:58,830 A binary tree is just a tree that points to two things. 107 00:06:58,830 --> 00:07:02,120 Each node points to two things. 108 00:07:02,120 --> 00:07:06,310 There is no reasoning about the values that it points to. 109 00:07:06,310 --> 00:07:11,370 So like over here, since it's a binary search tree, 110 00:07:11,370 --> 00:07:14,030 we know that if we go left of 7, 111 00:07:14,030 --> 00:07:16,670 then all of the values that we can possibly reach 112 00:07:16,670 --> 00:07:19,310 by going left of 7 have to be less than 7. 113 00:07:19,310 --> 00:07:24,070 Notice that all the values less than 7 are 3 and 6. 114 00:07:24,070 --> 00:07:26,040 Those are all to the left of 7. 115 00:07:26,040 --> 00:07:29,020 If we go to the right of 7, everything has to be greater than 7, 116 00:07:29,020 --> 00:07:33,220 so 9 is to the right of 7, so we're good. 117 00:07:33,220 --> 00:07:36,150 This is not the case for a binary tree-- 118 00:07:36,150 --> 00:07:40,020 for a regular binary tree we can just have 3 at the top, 7 to the left, 119 00:07:40,020 --> 00:07:47,630 9 to the left of 7; there's no ordering of values whatsoever. 120 00:07:47,630 --> 00:07:56,140 Now, we won't actually do this because it's tedious and unnecessary, 121 00:07:56,140 --> 00:07:59,400 but "try to draw as many ordered trees as you can think of 122 00:07:59,400 --> 00:08:01,900 using the numbers 7, 3, 9, and 6. 123 00:08:01,900 --> 00:08:06,800 How many distinct arrangements are there? What is the height of each one?" 124 00:08:06,800 --> 00:08:10,480 >> We'll do a couple, but the main idea is 125 00:08:10,480 --> 00:08:16,530 this is in no way a unique representation of a binary tree containing these values. 126 00:08:16,530 --> 00:08:22,760 All we need is some binary tree that contains 7, 3, 6, and 9. 127 00:08:22,760 --> 00:08:25,960 Another possible valid one would be the root is 3, 128 00:08:25,960 --> 00:08:30,260 go to the left and it's 6, go to the left and it's 7, 129 00:08:30,260 --> 00:08:32,730 go to the left and it's 9. 130 00:08:32,730 --> 00:08:36,169 That is a perfectly valid binary search tree. 131 00:08:36,169 --> 00:08:39,570 It's not very helpful, because it's just like a linked list 132 00:08:39,570 --> 00:08:43,750 and all of these pointers are just null. 133 00:08:43,750 --> 00:08:48,900 But it is a valid tree. Yeah? 134 00:08:48,900 --> 00:08:51,310 [Student] Don't the values have to be greater on the right? 135 00:08:51,310 --> 00:08:56,700 Or is this--? >>These I meant to go the other way. 136 00:08:56,700 --> 00:09:00,960 There's also--yeah, let's switch that. 137 00:09:00,960 --> 00:09:07,770 9, 7, 6, 3. Good catch. 138 00:09:07,770 --> 00:09:11,730 It still has to obey what a binary tree search is supposed to do. 139 00:09:11,730 --> 00:09:15,650 So everything to the left has to be less than any given node. 140 00:09:15,650 --> 00:09:23,180 We could just move, say, this 6 and put it here. 141 00:09:23,180 --> 00:09:26,880 No, we can't. Why do I keep doing that? 142 00:09:26,880 --> 00:09:35,350 Let's do--here is 6, here is 7, 6 points to 3. 143 00:09:35,350 --> 00:09:39,290 This is still a valid binary search tree. 144 00:09:39,290 --> 00:09:49,260 What is wrong if I--let's see if I can come up with an arrangement. 145 00:09:49,260 --> 00:09:52,280 Yeah, okay. So what is wrong with this tree? 146 00:09:52,280 --> 00:10:08,920 I guess I've already given you a hint that there is something wrong with it. 147 00:10:08,920 --> 00:10:14,430 Why do I keep doing that? 148 00:10:14,430 --> 00:10:18,510 Okay. This looks reasonable. 149 00:10:18,510 --> 00:10:22,590 If we look at each node, like 7, then to the left of 7 is a 3. 150 00:10:22,590 --> 00:10:24,960 So we have 3, the thing to the right of 3 is a 6. 151 00:10:24,960 --> 00:10:27,750 And if you look at 6, the thing to the right of 6 is a 9. 152 00:10:27,750 --> 00:10:30,910 So why is this not a valid binary search tree? 153 00:10:30,910 --> 00:10:36,330 [Students] 9 is still to the left of 7. >>Yeah. 154 00:10:36,330 --> 00:10:43,710 It must be true that all values you can possibly reach by going to the left of 7 are less than 7. 155 00:10:43,710 --> 00:10:49,120 If we go left of 7, we get to 3 and we can still get to 6, 156 00:10:49,120 --> 00:10:52,520 we can still get to 9, but by having gone less than 7, 157 00:10:52,520 --> 00:10:55,070 we shouldn't be able to get to a number that's greater than 7. 158 00:10:55,070 --> 00:10:59,830 So this is not a valid binary search tree. 159 00:10:59,830 --> 00:11:02,330 >> My brother actually had an interview question 160 00:11:02,330 --> 00:11:07,760 that was basically this: just code up something to validate 161 00:11:07,760 --> 00:11:10,500 whether a tree is a binary search tree, 162 00:11:10,500 --> 00:11:13,580 and so the first thing he did was just check to see 163 00:11:13,580 --> 00:11:17,020 if the left and right are correct, and then iterate down there. 164 00:11:17,020 --> 00:11:19,700 But you can't just do that; you have to keep track 165 00:11:19,700 --> 00:11:22,550 of the fact that now that I've gone left of 7, 166 00:11:22,550 --> 00:11:26,340 everything in this subtree must be less than 7. 167 00:11:26,340 --> 00:11:28,430 The correct algorithm needs to keep track 168 00:11:28,430 --> 00:11:35,970 of the bounds that the values can possibly fall in. 169 00:11:35,970 --> 00:11:38,360 We won't go through all of them. 170 00:11:38,360 --> 00:11:41,630 There is a nice recurrence relation, 171 00:11:41,630 --> 00:11:44,810 although we haven't gotten to those, or we won't get to those, 172 00:11:44,810 --> 00:11:47,030 defining how many there actually are. 173 00:11:47,030 --> 00:11:53,180 So there are 14 of them. 174 00:11:53,180 --> 00:11:56,020 The idea of how you would do it mathematically is like 175 00:11:56,020 --> 00:11:59,770 you can pick any single one to be the root node, 176 00:11:59,770 --> 00:12:03,160 and then if I pick 7 to be the root node, 177 00:12:03,160 --> 00:12:08,150 then there are, say, some numbers that can go be my left node, 178 00:12:08,150 --> 00:12:10,440 and there are some numbers that can be my right node, 179 00:12:10,440 --> 00:12:15,090 but if I have n total numbers, then the amount that can go to the left 180 00:12:15,090 --> 00:12:18,820 plus the amount that can go to the right is n - 1. 181 00:12:18,820 --> 00:12:26,130 So of the remaining numbers, they have to be able to go either to the left or the right. 182 00:12:26,130 --> 00:12:31,580 It seems difficult that if I put 3 first, then everything has to go to the left, 183 00:12:31,580 --> 00:12:35,080 but if I put 7, then some things can go the the left and some things can go to the right. 184 00:12:35,080 --> 00:12:39,570 And by "3 first", I meant everything can go to the right. 185 00:12:39,570 --> 00:12:42,350 It's really--you just have to think about it as-- 186 00:12:42,350 --> 00:12:47,980 how many things can go on the next level of the tree. 187 00:12:47,980 --> 00:12:50,420 And it comes out to be 14; or you can draw all of them, 188 00:12:50,420 --> 00:12:54,650 and then you'll get 14. 189 00:12:54,650 --> 00:13:01,120 >> Going back here, 190 00:13:01,120 --> 00:13:03,510 "Ordered binary trees are cool because we can search through them 191 00:13:03,510 --> 00:13:06,910 in a very similar way to searching over a sorted array. 192 00:13:06,910 --> 00:13:10,120 To do so, we start at the root and work our way down the tree 193 00:13:10,120 --> 00:13:13,440 towards leaves, checking each node's values against the values we're searching for. 194 00:13:13,440 --> 00:13:15,810 If the current node's value is less than the value we're looking for, 195 00:13:15,810 --> 00:13:18,050 you go next to the node's right child. 196 00:13:18,050 --> 00:13:20,030 Otherwise, you go to the node's left child. 197 00:13:20,030 --> 00:13:22,800 At some point, you'll either find the value you're looking for, or you'll run into a null, 198 00:13:22,800 --> 00:13:28,520 indicating the value's not in the tree." 199 00:13:28,520 --> 00:13:32,450 I have to redraw the tree we had before; that'll take a second. 200 00:13:32,450 --> 00:13:38,280 But we want to look up whether 6, 10, and 1 are in the tree. 201 00:13:38,280 --> 00:13:49,180 So what it was, 7, 9, 3, 6. Okay. 202 00:13:49,180 --> 00:13:52,730 The numbers you want to look up, we want to look up 6. 203 00:13:52,730 --> 00:13:55,440 How does this algorithm work? 204 00:13:55,440 --> 00:14:03,040 Well, we also have some root pointer to our tree. 205 00:14:03,040 --> 00:14:12,460 And we would go to the root and say is this value equal to the value we're searching for? 206 00:14:12,460 --> 00:14:15,000 So we're looking for 6, so it's not equal. 207 00:14:15,000 --> 00:14:20,140 So we keep going, and now we say okay, so 6 is less than 7. 208 00:14:20,140 --> 00:14:23,940 Does that mean we want to go to the left, or do we want to go to the right? 209 00:14:23,940 --> 00:14:27,340 [Student] Left. >>Yeah. 210 00:14:27,340 --> 00:14:33,340 It's significantly easier, all you have to do is draw one possible node of the tree 211 00:14:33,340 --> 00:14:37,760 and then you don't--instead of trying to think in your head, 212 00:14:37,760 --> 00:14:40,020 okay, if it's less, do I go to the left or go the right, 213 00:14:40,020 --> 00:14:43,030 just looking at this picture, it's very clear that I have to go to the left 214 00:14:43,030 --> 00:14:47,700 if this node is greater than the value that I'm looking for. 215 00:14:47,700 --> 00:14:52,600 So you go to the left; now I'm at 3. 216 00:14:52,600 --> 00:14:57,770 I want to--3 is less than the value I'm looking for, which is 6. 217 00:14:57,770 --> 00:15:03,420 So we go to the right, and now I end up at 6, 218 00:15:03,420 --> 00:15:07,380 which is the value I'm looking for, so I return true. 219 00:15:07,380 --> 00:15:15,760 The next value I'm going to look for is 10. 220 00:15:15,760 --> 00:15:23,230 Okay. So 10, now, is going to--cut off that--going to follow the root. 221 00:15:23,230 --> 00:15:27,670 Now, 10 is going to be greater than 7, so I want to look to the right. 222 00:15:27,670 --> 00:15:31,660 I'm going to come over here, 10 is going to be greater than 9, 223 00:15:31,660 --> 00:15:34,520 so I'm going to want to look to the right. 224 00:15:34,520 --> 00:15:42,100 I come over here, but over here now I'm at null. 225 00:15:42,100 --> 00:15:44,170 What do I do if I hit null? 226 00:15:44,170 --> 00:15:47,440 [Student] Return false? >>Yeah. I did not find 10. 227 00:15:47,440 --> 00:15:49,800 1 is going to be a nearly identical case, 228 00:15:49,800 --> 00:15:51,950 except it's just going to be flipped; instead of looking 229 00:15:51,950 --> 00:15:56,540 down the right side, I'm going to look down the left side. 230 00:15:56,540 --> 00:16:00,430 >> Now I think we actually get to code. 231 00:16:00,430 --> 00:16:04,070 Here's where--open up the CS50 Appliance and navigate your way there, 232 00:16:04,070 --> 00:16:07,010 but you can also just do it in the space. 233 00:16:07,010 --> 00:16:09,170 It's probably ideal to do it in the space, 234 00:16:09,170 --> 00:16:13,760 because we can work in the space. 235 00:16:13,760 --> 00:16:19,170 "First we'll need a new type definition for a binary tree node containing int values. 236 00:16:19,170 --> 00:16:21,400 Using the boilerplate typedef below, 237 00:16:21,400 --> 00:16:24,510 create a new type definition for a node in a binary tree. 238 00:16:24,510 --> 00:16:27,930 If you get stuck. . . " blah, blah, blah. Okay. 239 00:16:27,930 --> 00:16:30,380 So let's put the boilerplate here, 240 00:16:30,380 --> 00:16:41,630 typedef struct node, and node. Yeah, okay. 241 00:16:41,630 --> 00:16:44,270 So what are the fields we're going to want in our node? 242 00:16:44,270 --> 00:16:46,520 [Student] Int and then two pointers? 243 00:16:46,520 --> 00:16:49,700 >>Int value, two pointers? How do I write the pointers? 244 00:16:49,700 --> 00:16:58,440 [Student] Struct. >>I should zoom in. Yeah, so struct node* left, 245 00:16:58,440 --> 00:17:01,320 and struct node* right. 246 00:17:01,320 --> 00:17:03,460 And remember the discussion from last time, 247 00:17:03,460 --> 00:17:15,290 that this makes no sense, this makes no sense, 248 00:17:15,290 --> 00:17:18,270 this makes no sense. 249 00:17:18,270 --> 00:17:25,000 You need everything there in order to define this recursive struct. 250 00:17:25,000 --> 00:17:27,970 Okay, so that's what our tree is going to look like. 251 00:17:27,970 --> 00:17:37,670 If we did a trinary tree, then a node might look like b1, b2, struct node* b3, 252 00:17:37,670 --> 00:17:43,470 where b is a branch--actually, I've more heard it left, middle, right, but whatever. 253 00:17:43,470 --> 00:17:55,610 We only care about binary, so right, left. 254 00:17:55,610 --> 00:17:59,110 "Now declare a global node * variable for the root of the tree." 255 00:17:59,110 --> 00:18:01,510 So we're not going to do that. 256 00:18:01,510 --> 00:18:08,950 In order to make things slightly more difficult and more generalized, 257 00:18:08,950 --> 00:18:11,570 we won't have a global node variable. 258 00:18:11,570 --> 00:18:16,710 Instead, in main, we will declare all our node things, 259 00:18:16,710 --> 00:18:20,500 and that means that below, when we start running 260 00:18:20,500 --> 00:18:23,130 our contains function and our insert function, 261 00:18:23,130 --> 00:18:27,410 instead of our contains function just using this global node variable, 262 00:18:27,410 --> 00:18:34,280 we're going to have it take as an argument the tree that we want it to process. 263 00:18:34,280 --> 00:18:36,650 Having the global variable was supposed to make things easier. 264 00:18:36,650 --> 00:18:42,310 We're going to make things harder. 265 00:18:42,310 --> 00:18:51,210 Now take a minute or so to just do this sort of thing, 266 00:18:51,210 --> 00:18:57,690 where inside of main you want to create this tree, and that's all you want to do. 267 00:18:57,690 --> 00:19:04,530 Try and construct this tree in your main function. 268 00:19:42,760 --> 00:19:47,610 >> Okay. So you don't even have to have constructed the tree the entire way yet. 269 00:19:47,610 --> 00:19:49,840 But anyone have something I could pull up 270 00:19:49,840 --> 00:20:03,130 to show how one might start constructing such a tree? 271 00:20:03,130 --> 00:20:08,080 [Student] Someone's banging, trying to get out. 272 00:20:08,080 --> 00:20:13,100 [Bowden] Anyone comfortable with their tree construction? 273 00:20:13,100 --> 00:20:15,520 [Student] Sure. It's not done. >>It's fine. We can just finish-- 274 00:20:15,520 --> 00:20:26,860 oh, can you save it? Hooray. 275 00:20:26,860 --> 00:20:32,020 So here we have--oh, I'm slightly cut off. 276 00:20:32,020 --> 00:20:34,770 Am I zoomed in? 277 00:20:34,770 --> 00:20:37,730 Zoom in, scroll out. >>I have a question. >>Yeah? 278 00:20:37,730 --> 00:20:44,410 [Student] When you define struct, are things like initialized to anything? 279 00:20:44,410 --> 00:20:47,160 [Bowden] No. >>Okay. So you would have to initialize-- 280 00:20:47,160 --> 00:20:50,450 [Bowden] No. When you define, or when you declare a struct, 281 00:20:50,450 --> 00:20:55,600 it is not initialized by default; it's just like if you declare an int. 282 00:20:55,600 --> 00:20:59,020 It's exactly the same thing. It's like each of its individual fields 283 00:20:59,020 --> 00:21:04,460 can have a garbage value in it. >>And is it possible to define--or to declare a struct 284 00:21:04,460 --> 00:21:07,740 in a way that it does initialize them? 285 00:21:07,740 --> 00:21:13,200 [Bowden] Yes. So, shortcut initialization syntax 286 00:21:13,200 --> 00:21:18,660 is going to look like-- 287 00:21:18,660 --> 00:21:22,200 There's two ways we can do this. I think we should compile it 288 00:21:22,200 --> 00:21:25,840 to make sure Clang also does this. 289 00:21:25,840 --> 00:21:28,510 The order of arguments that comes in the struct, 290 00:21:28,510 --> 00:21:32,170 you put as the order of arguments inside of these curly braces. 291 00:21:32,170 --> 00:21:35,690 So if you want to initialize it to 9 and left be null and then right be null, 292 00:21:35,690 --> 00:21:41,570 it would be 9, null, null. 293 00:21:41,570 --> 00:21:47,890 The alternative is, and the editor does not like this syntax, 294 00:21:47,890 --> 00:21:50,300 and it thinks I want a new block, 295 00:21:50,300 --> 00:22:01,800 but the alternative is something like-- 296 00:22:01,800 --> 00:22:04,190 here, I'll put it on a new line. 297 00:22:04,190 --> 00:22:09,140 You can explicitly say--I forget the exact syntax. 298 00:22:09,140 --> 00:22:13,110 So you can explicitly address them by name and say 299 00:22:13,110 --> 00:22:21,570 .c, or .value = 9, .left = NULL. 300 00:22:21,570 --> 00:22:24,540 I'm guessing these need to be commas. 301 00:22:24,540 --> 00:22:29,110 .right = NULL, so this way you don't 302 00:22:29,110 --> 00:22:33,780 actually need to know the order of the struct, 303 00:22:33,780 --> 00:22:36,650 and when you're reading this, it's much more explicit 304 00:22:36,650 --> 00:22:41,920 about what the value's being initialized to. 305 00:22:41,920 --> 00:22:44,080 >> This happens to be one of the things that-- 306 00:22:44,080 --> 00:22:49,100 so, for the most part, C++ is a superset of C. 307 00:22:49,100 --> 00:22:54,160 You can take C code, move it over to C++, and it should compile. 308 00:22:54,160 --> 00:22:59,570 This is one of the things that C++ does not support, so people tend not to do it. 309 00:22:59,570 --> 00:23:01,850 I don't know if that's the only reason people tend not to do it, 310 00:23:01,850 --> 00:23:10,540 but the case where I needed to use it needed to work with C++ and so I couldn't use this. 311 00:23:10,540 --> 00:23:14,000 Another example of something that doesn't work with C++ is 312 00:23:14,000 --> 00:23:16,940 how malloc returns a "void *", technically, 313 00:23:16,940 --> 00:23:20,200 but you could just say char* x = malloc whatever, 314 00:23:20,200 --> 00:23:22,840 and it will automatically be cast to a char*. 315 00:23:22,840 --> 00:23:25,450 That automatic cast does not happen in C++. 316 00:23:25,450 --> 00:23:28,150 That would not compile, and you would explicitly need to say 317 00:23:28,150 --> 00:23:34,510 char*, malloc, whatever, to cast it to a char*. 318 00:23:34,510 --> 00:23:37,270 There aren't many things that C and C++ disagree on, 319 00:23:37,270 --> 00:23:40,620 but those are two. 320 00:23:40,620 --> 00:23:43,120 So we'll go with this syntax. 321 00:23:43,120 --> 00:23:46,150 But even if we didn't go with that syntax, 322 00:23:46,150 --> 00:23:49,470 what is--might be wrong with this? 323 00:23:49,470 --> 00:23:52,170 [Student] I don't need to dereference it? >>Yeah. 324 00:23:52,170 --> 00:23:55,110 Remember that the arrow has an implicit dereference, 325 00:23:55,110 --> 00:23:58,230 and so when we're just dealing with a struct, 326 00:23:58,230 --> 00:24:04,300 we want to use . to get at a field inside of the struct. 327 00:24:04,300 --> 00:24:07,200 And the only time we use arrow is when we want to do-- 328 00:24:07,200 --> 00:24:13,450 well, arrow is equivalent to-- 329 00:24:13,450 --> 00:24:17,020 that's what it would have meant if I used arrow. 330 00:24:17,020 --> 00:24:24,600 All arrow means is dereference this, now I'm at a struct, and I can get the field. 331 00:24:24,600 --> 00:24:28,040 Either get the field directly or dereference and get the field-- 332 00:24:28,040 --> 00:24:30,380 I guess this should be value. 333 00:24:30,380 --> 00:24:33,910 But here I'm dealing with just a struct, not a pointer to a struct, 334 00:24:33,910 --> 00:24:38,780 and so I cannot use the arrow. 335 00:24:38,780 --> 00:24:47,510 But this sort of thing we can do for all nodes. 336 00:24:47,510 --> 00:24:55,790 Oh my God. 337 00:24:55,790 --> 00:25:09,540 This is 6, 7, and 3. 338 00:25:09,540 --> 00:25:16,470 Then we can set up the branches in our tree, we can have 7-- 339 00:25:16,470 --> 00:25:21,650 we can have--its left should point to 3. 340 00:25:21,650 --> 00:25:25,130 So how do we do that? 341 00:25:25,130 --> 00:25:29,320 [Students, unintelligible] >>Yeah. The address of node3, 342 00:25:29,320 --> 00:25:34,170 and if you didn't have address, then it just wouldn't compile. 343 00:25:34,170 --> 00:25:38,190 But remember that these are pointers to the next nodes. 344 00:25:38,190 --> 00:25:44,930 The right should point to 9, 345 00:25:44,930 --> 00:25:53,330 and 3 should point on the right to 6. 346 00:25:53,330 --> 00:25:58,480 I think this is all set. Any comments or questions? 347 00:25:58,480 --> 00:26:02,060 [Student, unintelligible] The root is going to be 7. 348 00:26:02,060 --> 00:26:08,210 We can just say node* ptr = 349 00:26:08,210 --> 00:26:14,160 or root, = &node7. 350 00:26:14,160 --> 00:26:20,730 >> For our purposes, we're going to be dealing with insert, 351 00:26:20,730 --> 00:26:25,490 so we're going to want to write a function to insert into this binary tree, 352 00:26:25,490 --> 00:26:34,230 and insert is inevitably going to call malloc to create a new node for this tree. 353 00:26:34,230 --> 00:26:36,590 So things are going to get messy with the fact that some nodes 354 00:26:36,590 --> 00:26:38,590 are currently on the stack 355 00:26:38,590 --> 00:26:43,680 and other nodes are going to end up on the heap when we insert them. 356 00:26:43,680 --> 00:26:47,640 This is perfectly valid, but the only reason 357 00:26:47,640 --> 00:26:49,600 we're able to do this on the stack 358 00:26:49,600 --> 00:26:51,840 is because it's such a contrived example that we know 359 00:26:51,840 --> 00:26:56,360 the tree is supposed to be constructed as 7, 3, 6, 9. 360 00:26:56,360 --> 00:27:02,970 If we didn't have this, then we wouldn't have to malloc in the first place. 361 00:27:02,970 --> 00:27:06,160 As we'll see a bit later, we should be malloc'ing. 362 00:27:06,160 --> 00:27:08,570 Right now it's perfectly reasonable to put on the stack, 363 00:27:08,570 --> 00:27:14,750 but let's change this to a malloc implementation. 364 00:27:14,750 --> 00:27:17,160 So each of these is now going to be something like 365 00:27:17,160 --> 00:27:24,240 node* node9 = malloc (sizeof(node)). 366 00:27:24,240 --> 00:27:28,040 And now we're going to have to do our check. 367 00:27:28,040 --> 00:27:34,010 if (node9 == NULL)--I didn't want that-- 368 00:27:34,010 --> 00:27:40,950 return 1, and then we can do node9-> because now it's a pointer, 369 00:27:40,950 --> 00:27:45,300 value = 6, node9->left = NULL, 370 00:27:45,300 --> 00:27:48,930 node9->right = NULL, 371 00:27:48,930 --> 00:27:51,410 and we're going to have to do that for each of those nodes. 372 00:27:51,410 --> 00:27:57,490 So instead, let's put it inside of a separate function. 373 00:27:57,490 --> 00:28:00,620 Let's call it node* build_node, 374 00:28:00,620 --> 00:28:09,050 and this is somewhat similar to the APIs we provide for Huffman encoding. 375 00:28:09,050 --> 00:28:12,730 We give you initializer functions for a tree 376 00:28:12,730 --> 00:28:20,520 and deconstructor "functions" for those trees and the same for forests. 377 00:28:20,520 --> 00:28:22,570 >> So here we're going to have an initializer function 378 00:28:22,570 --> 00:28:25,170 to just build a node for us. 379 00:28:25,170 --> 00:28:29,320 And it's going to look pretty much exactly like this. 380 00:28:29,320 --> 00:28:32,230 And I'm even going to be lazy and not change the name of the variable, 381 00:28:32,230 --> 00:28:34,380 even though node9 makes no sense anymore. 382 00:28:34,380 --> 00:28:38,610 Oh, I guess node9's value should not have been 6. 383 00:28:38,610 --> 00:28:42,800 Now we can return node9. 384 00:28:42,800 --> 00:28:49,550 And here we should return null. 385 00:28:49,550 --> 00:28:52,690 Everyone agree on that build-a-node function? 386 00:28:52,690 --> 00:28:59,780 So now we can just call that to build any node with a given value and null pointers. 387 00:28:59,780 --> 00:29:09,750 Now we can call that, we can do node* node9 = build_node(9). 388 00:29:09,750 --> 00:29:25,810 And let's do. . . 389 00:29:25,810 --> 00:29:33,980 6, 3, 7, 6, 3, 7. 390 00:29:33,980 --> 00:29:39,330 And now we want to set up the same pointers, 391 00:29:39,330 --> 00:29:42,470 except now everything's already in terms of pointers 392 00:29:42,470 --> 00:29:48,480 so no longer need the address of. 393 00:29:48,480 --> 00:29:56,300 Okay. So what's the last thing I want to do? 394 00:29:56,300 --> 00:30:03,850 There's an error-checking that I'm not doing. 395 00:30:03,850 --> 00:30:06,800 What does build node return? 396 00:30:06,800 --> 00:30:11,660 [Student, unintelligible] >>Yeah. If malloc failed, it'll return null. 397 00:30:11,660 --> 00:30:16,460 So I'm going to lazily put it down here instead of doing a condition for each. 398 00:30:16,460 --> 00:30:22,320 If (node9 == NULL) or--even simpler, 399 00:30:22,320 --> 00:30:28,020 this is equivalent to just if not node9. 400 00:30:28,020 --> 00:30:38,310 So if not node9, or not node6, or not node3, or not node7, return 1. 401 00:30:38,310 --> 00:30:42,850 Maybe we should print "malloc failed" or something. 402 00:30:42,850 --> 00:30:46,680 [Student] Is false equal to null as well? 403 00:30:46,680 --> 00:30:51,290 [Bowden] Any zero value is false. 404 00:30:51,290 --> 00:30:53,920 So null is a zero value. 405 00:30:53,920 --> 00:30:56,780 Zero is a zero value. False is a zero value. 406 00:30:56,780 --> 00:31:02,130 Any--pretty much the only two zero values are null and zero, 407 00:31:02,130 --> 00:31:07,900 false is just hash-defined as zero. 408 00:31:07,900 --> 00:31:13,030 That also applies if we do declare global variable. 409 00:31:13,030 --> 00:31:17,890 If we did have node* root up here, 410 00:31:17,890 --> 00:31:24,150 then--the nice thing about global variables is that they always have an initial value. 411 00:31:24,150 --> 00:31:27,500 That's not true of functions, how inside of here, 412 00:31:27,500 --> 00:31:34,870 if we have, like, node* or node x. 413 00:31:34,870 --> 00:31:37,380 We have no idea what x.value, x.whatever, 414 00:31:37,380 --> 00:31:40,700 or we could print them and they could be arbitrary. 415 00:31:40,700 --> 00:31:44,980 That's not true of global variables. 416 00:31:44,980 --> 00:31:47,450 So node root or node x. 417 00:31:47,450 --> 00:31:53,410 By default, everything that's global, if not explicitly initialized to some value, 418 00:31:53,410 --> 00:31:57,390 has a zero value as its value. 419 00:31:57,390 --> 00:32:01,150 So here, node* root, we don't explicitly initialize it to anything, 420 00:32:01,150 --> 00:32:06,350 so its default value will be null, which is the zero value of pointers. 421 00:32:06,350 --> 00:32:11,870 The default value of x is going to mean that x.value is zero, 422 00:32:11,870 --> 00:32:14,260 x.left is null, and x.right is null. 423 00:32:14,260 --> 00:32:21,070 So since it is a struct, all of the fields of the struct will be zero values. 424 00:32:21,070 --> 00:32:24,410 We don't need to use that here, though. 425 00:32:24,410 --> 00:32:27,320 [Student] The structs are different than other variables, and the other variables are 426 00:32:27,320 --> 00:32:31,400 garbage values; these are zeros? 427 00:32:31,400 --> 00:32:36,220 [Bowden] Other values too. So in x, x will be zero. 428 00:32:36,220 --> 00:32:40,070 If it's at global scope, it has an initial value. >>Okay. 429 00:32:40,070 --> 00:32:48,950 [Bowden] Either the initial value you gave it or zero. 430 00:32:48,950 --> 00:32:53,260 I think that takes care of all of this. 431 00:32:53,260 --> 00:32:58,390 >> Okay. So the next part of the question asks, 432 00:32:58,390 --> 00:33:01,260 "Now we want to write a function called contains 433 00:33:01,260 --> 00:33:04,930 with a prototype of bool contains int value." 434 00:33:04,930 --> 00:33:08,340 We are not going to do bool contains int value. 435 00:33:08,340 --> 00:33:15,110 Our prototype is going to look like 436 00:33:15,110 --> 00:33:22,380 bool contains(int value). 437 00:33:22,380 --> 00:33:24,490 And then we're also going to pass it the tree 438 00:33:24,490 --> 00:33:28,870 that it should be checking to see if it has that value. 439 00:33:28,870 --> 00:33:38,290 So (node* tree). Okay. 440 00:33:38,290 --> 00:33:44,440 And then we can call it with something like, 441 00:33:44,440 --> 00:33:46,090 maybe we'll want to printf or something. 442 00:33:46,090 --> 00:33:51,040 Contains 6, our root. 443 00:33:51,040 --> 00:33:54,300 That should return 1, or true, 444 00:33:54,300 --> 00:33:59,390 whereas contains 5 root should return false. 445 00:33:59,390 --> 00:34:02,690 So take a second to implement this. 446 00:34:02,690 --> 00:34:06,780 You can do it either iteratively or recursively. 447 00:34:06,780 --> 00:34:09,739 The nice thing about the way we've set things up is that 448 00:34:09,739 --> 00:34:12,300 it lends itself to our recursive solution much easier 449 00:34:12,300 --> 00:34:14,719 than the global-variable way did. 450 00:34:14,719 --> 00:34:19,750 Because if we just have contains int value, then we have no way of recursing down subtrees. 451 00:34:19,750 --> 00:34:24,130 We would have to have a separate helper function that recurses down the subtrees for us. 452 00:34:24,130 --> 00:34:29,610 But since we've changed it to take the tree as an argument, 453 00:34:29,610 --> 00:34:32,760 which it should have always been in the first place, 454 00:34:32,760 --> 00:34:35,710 now we can recurse more easily. 455 00:34:35,710 --> 00:34:38,960 So iterative or recursive, we'll go over both, 456 00:34:38,960 --> 00:34:46,139 but we'll see that recursive ends up being quite easy. 457 00:34:59,140 --> 00:35:02,480 Okay. Does anyone have something we can work with? 458 00:35:02,480 --> 00:35:12,000 >> [Student] I've got an iterative solution. >>All right, iterative. 459 00:35:12,000 --> 00:35:16,030 Okay, this looks good. 460 00:35:16,030 --> 00:35:18,920 So, want to walk us through it? 461 00:35:18,920 --> 00:35:25,780 [Student] Sure. So I set a temp variable to get the first node of the tree. 462 00:35:25,780 --> 00:35:28,330 And then I just looped through while temp doesn't equal null, 463 00:35:28,330 --> 00:35:31,700 so while was still in the tree, I guess. 464 00:35:31,700 --> 00:35:35,710 And if the value is equal to the value that temp is pointing to, 465 00:35:35,710 --> 00:35:37,640 then it returns that value. 466 00:35:37,640 --> 00:35:40,210 Otherwise, it checks if it's on the right side or the left side. 467 00:35:40,210 --> 00:35:43,400 If you ever get a situation where there's no more tree, 468 00:35:43,400 --> 00:35:47,330 then it returns--it exits the loop and returns false. 469 00:35:47,330 --> 00:35:52,190 [Bowden] Okay. So that seems good. 470 00:35:52,190 --> 00:35:58,470 Anyone have any comments on anything? 471 00:35:58,470 --> 00:36:02,400 I have no correctness comments at all. 472 00:36:02,400 --> 00:36:11,020 The one thing we can do is this guy. 473 00:36:11,020 --> 00:36:21,660 Oh, it's going to go a little longish. 474 00:36:21,660 --> 00:36:33,460 I'll fix that up. Okay. 475 00:36:33,460 --> 00:36:37,150 >> Everyone should remember how ternary works. 476 00:36:37,150 --> 00:36:38,610 There have definitely been quizzes in the past 477 00:36:38,610 --> 00:36:41,170 that give you a function with a ternary operator, 478 00:36:41,170 --> 00:36:45,750 and say translate this, do something that doesn't use ternary. 479 00:36:45,750 --> 00:36:49,140 So this is a very common case of when I would think to use ternary, 480 00:36:49,140 --> 00:36:54,610 where if some condition set a variable to something, 481 00:36:54,610 --> 00:36:58,580 else set that same variable to something else. 482 00:36:58,580 --> 00:37:03,390 That's something that very frequently can be transformed into this sort of thing 483 00:37:03,390 --> 00:37:14,420 where set that variable to this--or, well, is this true? Then this, else this. 484 00:37:14,420 --> 00:37:18,550 [Student] The first one is if true, right? 485 00:37:18,550 --> 00:37:25,570 [Bowden] Yeah. The way I always read it is, temp equals value greater than temp value, 486 00:37:25,570 --> 00:37:30,680 then this, else this. It's asking a question. 487 00:37:30,680 --> 00:37:35,200 Is it greater? Then do the first thing. Else do the second thing. 488 00:37:35,200 --> 00:37:41,670 I almost always--the colon, I just--in my head, I read as else. 489 00:37:41,670 --> 00:37:47,180 >> Does anyone have a recursive solution? 490 00:37:47,180 --> 00:37:49,670 Okay. This one we're going to--it could already be great, 491 00:37:49,670 --> 00:37:53,990 but we're going to make it even better. 492 00:37:53,990 --> 00:37:58,720 This is pretty much the same exact idea. 493 00:37:58,720 --> 00:38:03,060 It's just, well, do you want to explain? 494 00:38:03,060 --> 00:38:08,340 [Student] Sure. So we're making sure that the tree isn't null first, 495 00:38:08,340 --> 00:38:13,380 because if the tree is null, then it's going to return false because we haven't found it. 496 00:38:13,380 --> 00:38:19,200 And if there's still a tree, we go into--we first check if the value is the current node. 497 00:38:19,200 --> 00:38:23,740 Return true if it is, and if not, we recurse on the left or right. 498 00:38:23,740 --> 00:38:25,970 Does that sound appropriate? >>Mm-hmm. (Agreement) 499 00:38:25,970 --> 00:38:33,880 So notice that this is almost--structurally very similar to the iterative solution. 500 00:38:33,880 --> 00:38:38,200 It's just that instead of recursing, we had a while loop. 501 00:38:38,200 --> 00:38:40,840 And the base case here where tree does not equal null 502 00:38:40,840 --> 00:38:44,850 was the condition under which we broke out of the while loop. 503 00:38:44,850 --> 00:38:50,200 They're very similar. But we're going to take this one step further. 504 00:38:50,200 --> 00:38:54,190 Now, we do the same thing here. 505 00:38:54,190 --> 00:38:57,680 Notice we're returning the same thing in both of these lines, 506 00:38:57,680 --> 00:39:00,220 except for one argument is different. 507 00:39:00,220 --> 00:39:07,950 So we're going to make that into a ternary. 508 00:39:07,950 --> 00:39:13,790 I hit Option-something, and it made a symbol. Okay. 509 00:39:13,790 --> 00:39:21,720 So we're going to return contains that. 510 00:39:21,720 --> 00:39:28,030 This is getting to be multiple lines--well, zoomed in, it is. 511 00:39:28,030 --> 00:39:31,060 Usually, as a stylistic thing, I don't think many people 512 00:39:31,060 --> 00:39:38,640 put a space after the arrow, but I guess if you're consistent, it's fine. 513 00:39:38,640 --> 00:39:44,320 If value is less than tree value, we want to recurse on tree left, 514 00:39:44,320 --> 00:39:53,890 else we want to recurse on tree right. 515 00:39:53,890 --> 00:39:58,610 So that was step one of making this look smaller. 516 00:39:58,610 --> 00:40:02,660 Step two of making this look smaller-- 517 00:40:02,660 --> 00:40:09,150 we can separate this to multiple lines. 518 00:40:09,150 --> 00:40:16,500 Okay. Step two of making it look smaller is here, 519 00:40:16,500 --> 00:40:25,860 so return value equals tree value, or contains whatever. 520 00:40:25,860 --> 00:40:28,340 >> This is an important thing. 521 00:40:28,340 --> 00:40:30,860 I'm not sure if he said it explicitly in class, 522 00:40:30,860 --> 00:40:34,740 but it's called short-circuit evaluation. 523 00:40:34,740 --> 00:40:41,060 The idea here is value == tree value. 524 00:40:41,060 --> 00:40:49,960 If that is true, then this is true, and we want to OR that with whatever is over here. 525 00:40:49,960 --> 00:40:52,520 So without even thinking about whatever is over here, 526 00:40:52,520 --> 00:40:55,070 what is the entire expression going to return? 527 00:40:55,070 --> 00:40:59,430 [Student] True? >>Yeah, because true of anything, 528 00:40:59,430 --> 00:41:04,990 ORed--or true ORed with anything is necessarily true. 529 00:41:04,990 --> 00:41:08,150 So as soon as we see return value = tree value, 530 00:41:08,150 --> 00:41:10,140 we're just going to return true. 531 00:41:10,140 --> 00:41:15,710 Not even going to recurse further contains down the line. 532 00:41:15,710 --> 00:41:20,980 We can take this one step further. 533 00:41:20,980 --> 00:41:29,490 Return tree does not equal null and all of this. 534 00:41:29,490 --> 00:41:32,650 It made it a one-line function. 535 00:41:32,650 --> 00:41:36,790 This is also an example of short-circuit evaluation. 536 00:41:36,790 --> 00:41:40,680 But now it's the same idea-- 537 00:41:40,680 --> 00:41:47,320 instead of--so if tree does not equal null--or, well, 538 00:41:47,320 --> 00:41:49,580 if tree does equal null, which is the bad case, 539 00:41:49,580 --> 00:41:54,790 if tree equals null, then the first condition is going to be false. 540 00:41:54,790 --> 00:42:00,550 So false ANDed with anything is going to be what? 541 00:42:00,550 --> 00:42:04,640 [Student] False. >>Yeah. This is the other half of short-circuit evaluation, 542 00:42:04,640 --> 00:42:10,710 where if tree does not equal null, then we aren't going to even go-- 543 00:42:10,710 --> 00:42:14,930 or if tree does equal null, then we aren't going to do value == tree value. 544 00:42:14,930 --> 00:42:17,010 We're just going to immediately return false. 545 00:42:17,010 --> 00:42:20,970 Which is important, since if it did not short-circuit evaluate, 546 00:42:20,970 --> 00:42:25,860 then if tree does equal null, this second condition is going to seg fault, 547 00:42:25,860 --> 00:42:32,510 because tree->value is dereferencing null. 548 00:42:32,510 --> 00:42:40,490 So that's that. Can make this--shift once over. 549 00:42:40,490 --> 00:42:44,840 This is a very common thing also, not making this one line with this, 550 00:42:44,840 --> 00:42:49,000 but it's a common thing in conditions, maybe not right here, 551 00:42:49,000 --> 00:42:59,380 but if (tree != NULL && tree->value == value), do whatever. 552 00:42:59,380 --> 00:43:01,560 This is a very common condition, where instead of having 553 00:43:01,560 --> 00:43:06,770 to break this into two ifs, where like, is the tree null? 554 00:43:06,770 --> 00:43:11,780 Okay, it's not null, so now is the tree value equal to value? Do this. 555 00:43:11,780 --> 00:43:17,300 Instead, this condition, this will never seg fault 556 00:43:17,300 --> 00:43:21,220 because it will break out if this happens to be null. 557 00:43:21,220 --> 00:43:24,000 Well, I guess if your tree is a completely invalid pointer, it can still seg fault, 558 00:43:24,000 --> 00:43:26,620 but it can't seg fault if tree is null. 559 00:43:26,620 --> 00:43:32,900 If it were null, it would break out before you ever dereferenced the pointer in the first place. 560 00:43:32,900 --> 00:43:35,000 [Student] Is this called lazy evaluation? 561 00:43:35,000 --> 00:43:40,770 >> [Bowden] Lazy evaluation is a separate thing. 562 00:43:40,770 --> 00:43:49,880 Lazy evaluation is more like you ask for a value, 563 00:43:49,880 --> 00:43:54,270 you ask to calculate a value, kind of, but you don't need it immediately. 564 00:43:54,270 --> 00:43:59,040 So until you actually need it, it isn't evaluated. 565 00:43:59,040 --> 00:44:03,920 This isn't exactly the same thing, but in the Huffman pset, 566 00:44:03,920 --> 00:44:06,520 it says that we "lazily" write. 567 00:44:06,520 --> 00:44:08,670 The reason we do that is because we're actually buffering the write-- 568 00:44:08,670 --> 00:44:11,820 we don't want to write individual bits at a time, 569 00:44:11,820 --> 00:44:15,450 or individual bytes at a time, we instead want to get a chunk of bytes. 570 00:44:15,450 --> 00:44:19,270 Then once we have a chunk of bytes, then we'll write it out. 571 00:44:19,270 --> 00:44:22,640 Even though you ask it to write--and fwrite and fread do the same sort of thing. 572 00:44:22,640 --> 00:44:26,260 They buffer your reads and writes. 573 00:44:26,260 --> 00:44:31,610 Even though you ask it to write immediately, it probably won't. 574 00:44:31,610 --> 00:44:34,540 And you can't be sure that things are going to be written 575 00:44:34,540 --> 00:44:37,540 until you call hfclose or whatever it is, 576 00:44:37,540 --> 00:44:39,620 which then says okay, I'm closing my file, 577 00:44:39,620 --> 00:44:43,450 that means I'd better write everything I haven't written yet. 578 00:44:43,450 --> 00:44:45,770 It has no need to write everything out 579 00:44:45,770 --> 00:44:49,010 until you are closing the file, and then it needs to. 580 00:44:49,010 --> 00:44:56,220 So that's just what lazy--it waits until it has to happen. 581 00:44:56,220 --> 00:44:59,990 This--take 51 and you'll go into it in more detail, 582 00:44:59,990 --> 00:45:05,470 because OCaml and everything in 51, everything is recursion. 583 00:45:05,470 --> 00:45:08,890 There are no iterative solutions, basically. 584 00:45:08,890 --> 00:45:11,550 Everything is recursion, and lazy evaluation 585 00:45:11,550 --> 00:45:14,790 is going to be important for a lot of circumstances 586 00:45:14,790 --> 00:45:19,920 where, if you didn't lazily evaluate, that would mean-- 587 00:45:19,920 --> 00:45:24,760 The example is streams, which are infinitely long. 588 00:45:24,760 --> 00:45:30,990 In theory, you can think of the natural numbers as a stream of 1-2-3-4-5-6-7, 589 00:45:30,990 --> 00:45:33,090 so lazily evaluated things are fine. 590 00:45:33,090 --> 00:45:37,210 If I say I want the tenth number, then I can evaluate up to the tenth number. 591 00:45:37,210 --> 00:45:40,300 If I want the hundredth number, then I can evaluate up to the hundredth number. 592 00:45:40,300 --> 00:45:46,290 Without lazy evaluation, then it's going to try to evaluate all numbers immediately. 593 00:45:46,290 --> 00:45:50,000 You're evaluating infinitely many numbers, and that's just not possible. 594 00:45:50,000 --> 00:45:52,080 So there are a lot of circumstances where lazy evaluation 595 00:45:52,080 --> 00:45:57,840 is just essential to getting things to work. 596 00:45:57,840 --> 00:46:05,260 >> Now we want to write insert where insert is going to be 597 00:46:05,260 --> 00:46:08,430 similarly changing in its definition. 598 00:46:08,430 --> 00:46:10,470 So right now it's bool insert(int value). 599 00:46:10,470 --> 00:46:23,850 We're going to change that to bool insert(int value, node* tree). 600 00:46:23,850 --> 00:46:29,120 We're actually going to change that again in a bit, we'll see why. 601 00:46:29,120 --> 00:46:32,210 And let's move build_node, just for the heck of it, 602 00:46:32,210 --> 00:46:36,730 above insert so we don't have to write a function prototype. 603 00:46:36,730 --> 00:46:42,450 Which is a hint that you're going to be using build_node in insert. 604 00:46:42,450 --> 00:46:49,590 Okay. Take a minute for that. 605 00:46:49,590 --> 00:46:52,130 I think I saved the revision if you want to pull from that, 606 00:46:52,130 --> 00:47:00,240 or, at least, I did now. 607 00:47:00,240 --> 00:47:04,190 I wanted a slight break to think about the logic of insert, 608 00:47:04,190 --> 00:47:08,750 if you can't think of it. 609 00:47:08,750 --> 00:47:12,860 Basically, you will only ever be inserting at leaves. 610 00:47:12,860 --> 00:47:18,640 Like, if I insert 1, then I'm inevitably going to be inserting 1-- 611 00:47:18,640 --> 00:47:21,820 I'll change to black--I'll be inserting 1 over here. 612 00:47:21,820 --> 00:47:28,070 Or if I insert 4, I want to be inserting 4 over here. 613 00:47:28,070 --> 00:47:32,180 So no matter what you do, you're going to be inserting at a leaf. 614 00:47:32,180 --> 00:47:36,130 All you have to do is iterate down the tree until you get to the node 615 00:47:36,130 --> 00:47:40,890 that should be the node's parent, the new node's parent, 616 00:47:40,890 --> 00:47:44,560 and then change its left or right pointer, depending on whether 617 00:47:44,560 --> 00:47:47,060 it's greater than or less than the current node. 618 00:47:47,060 --> 00:47:51,180 Change that pointer to point to your new node. 619 00:47:51,180 --> 00:48:05,490 So iterate down the tree, make the leaf point to the new node. 620 00:48:05,490 --> 00:48:09,810 Also think about why that forbids the type of situation before, 621 00:48:09,810 --> 00:48:17,990 where I constructed the binary tree where it was correct 622 00:48:17,990 --> 00:48:19,920 if you only looked at a single node, 623 00:48:19,920 --> 00:48:24,830 but 9 was to the left of 7 if you iterated down all the way. 624 00:48:24,830 --> 00:48:29,770 So that's impossible in this scenario, since-- 625 00:48:29,770 --> 00:48:32,530 think about inserting 9 or something; at the very first node, 626 00:48:32,530 --> 00:48:35,350 I'm going to see 7 and I'm just going to go to the right. 627 00:48:35,350 --> 00:48:38,490 So no matter what I do, if I'm inserting by going to a leaf, 628 00:48:38,490 --> 00:48:40,790 and to a leaf using the appropriate algorithm, 629 00:48:40,790 --> 00:48:43,130 it's going to be impossible for me to insert 9 to the left of 7 630 00:48:43,130 --> 00:48:48,250 because as soon as I hit 7, I'm going to go to the right. 631 00:48:59,740 --> 00:49:02,070 Does anyone have something to start with? 632 00:49:02,070 --> 00:49:05,480 [Student] I do. >>Sure. 633 00:49:05,480 --> 00:49:09,200 [Student, unintelligible] 634 00:49:09,200 --> 00:49:14,390 [Other student, unintelligible] 635 00:49:14,390 --> 00:49:18,330 [Bowden] It's appreciated. Okay. Want to explain? 636 00:49:18,330 --> 00:49:20,690 >> [Student] Since we know that we were inserting 637 00:49:20,690 --> 00:49:24,060 new nodes at the end of the tree, 638 00:49:24,060 --> 00:49:28,070 I looped through the tree iteratively 639 00:49:28,070 --> 00:49:31,360 until I got to a node that pointed to null. 640 00:49:31,360 --> 00:49:34,220 And then I decided to put it either on the right side or the left side 641 00:49:34,220 --> 00:49:37,420 using this right variable; it told me where to put it. 642 00:49:37,420 --> 00:49:41,850 And then, essentially, I just made that last-- 643 00:49:41,850 --> 00:49:47,520 that temp node point to the new node that it was creating, 644 00:49:47,520 --> 00:49:50,770 either on the left side or on the right side, depending on what the value right was. 645 00:49:50,770 --> 00:49:56,530 Finally, I set the new node value to the value of its testing. 646 00:49:56,530 --> 00:49:59,550 [Bowden] Okay, so I see one issue here. 647 00:49:59,550 --> 00:50:02,050 This is like 95% of the way there. 648 00:50:02,050 --> 00:50:07,550 The one issue that I see, well, does anyone else see an issue? 649 00:50:07,550 --> 00:50:18,400 What is the circumstance under which they break out of the loop? 650 00:50:18,400 --> 00:50:22,100 [Student] If temp is null? >>Yeah. So how you break out of the loop is if temp is null. 651 00:50:22,100 --> 00:50:30,220 But what do I do right here? 652 00:50:30,220 --> 00:50:35,410 I dereference temp, which is inevitably null. 653 00:50:35,410 --> 00:50:39,840 So the other thing you need to do is not just keep track until temp is null, 654 00:50:39,840 --> 00:50:45,590 you want to keep track of the parent at all times. 655 00:50:45,590 --> 00:50:52,190 We also want node * parent, I guess we can keep that at null at first. 656 00:50:52,190 --> 00:50:55,480 This is going to have weird behavior for the root of the tree, 657 00:50:55,480 --> 00:50:59,210 but we'll get to that. 658 00:50:59,210 --> 00:51:03,950 If value is greater than whatever, then temp = temp right. 659 00:51:03,950 --> 00:51:07,910 But before we do that, parent = temp. 660 00:51:07,910 --> 00:51:14,500 Or are parents always going to equal temp? Is that the case? 661 00:51:14,500 --> 00:51:19,560 If temp is not null, then I'm going to move down, no matter what, 662 00:51:19,560 --> 00:51:24,030 to a node for which temp is the parent. 663 00:51:24,030 --> 00:51:27,500 So parent's going to be temp, and then I move temp down. 664 00:51:27,500 --> 00:51:32,410 Now temp is null, but parent points to the parent of the thing that is null. 665 00:51:32,410 --> 00:51:39,110 So down here, I don't want to set right equals 1. 666 00:51:39,110 --> 00:51:41,300 So I moved to the right, so if right = 1, 667 00:51:41,300 --> 00:51:45,130 and I guess you also want to do-- 668 00:51:45,130 --> 00:51:48,530 if you move to the left, you want to set right equal to 0. 669 00:51:48,530 --> 00:51:55,950 Or else if you ever move to the right. 670 00:51:55,950 --> 00:51:58,590 So right = 0. If right = 1, 671 00:51:58,590 --> 00:52:04,260 now we want to make the parent right pointer newnode, 672 00:52:04,260 --> 00:52:08,520 else we want to make the parent left pointer newnode. 673 00:52:08,520 --> 00:52:16,850 Questions on that? 674 00:52:16,850 --> 00:52:24,880 Okay. So this is the way we--well, actually, instead of doing this, 675 00:52:24,880 --> 00:52:29,630 we half expected you to use build_node. 676 00:52:29,630 --> 00:52:40,450 And then if newnode equals null, return false. 677 00:52:40,450 --> 00:52:44,170 That's that. Now, this is what we expected you to do. 678 00:52:44,170 --> 00:52:47,690 >> This is what the staff solutions do. 679 00:52:47,690 --> 00:52:52,360 I disagree with this as the "right" way of going about it, 680 00:52:52,360 --> 00:52:57,820 but this is perfectly fine and it will work. 681 00:52:57,820 --> 00:53:02,840 One thing that's a little weird right now is 682 00:53:02,840 --> 00:53:08,310 if the tree starts off as null, we pass in a null tree. 683 00:53:08,310 --> 00:53:12,650 I guess it depends on how you define the behavior of passing in a null tree. 684 00:53:12,650 --> 00:53:15,490 I would think that if you pass in a null tree, 685 00:53:15,490 --> 00:53:17,520 then inserting the value into a null tree 686 00:53:17,520 --> 00:53:23,030 should just return a tree where the only value is that single node. 687 00:53:23,030 --> 00:53:25,830 Do people agree with that? You could, if you wanted, 688 00:53:25,830 --> 00:53:28,050 if you pass in a null tree 689 00:53:28,050 --> 00:53:31,320 and you want to insert a value into it, return false. 690 00:53:31,320 --> 00:53:33,830 It's up to you to define that. 691 00:53:33,830 --> 00:53:38,320 To do the first thing I said and then-- 692 00:53:38,320 --> 00:53:40,720 well, you're going to have trouble doing that, because 693 00:53:40,720 --> 00:53:44,090 it would be easier if we had a global pointer to the thing, 694 00:53:44,090 --> 00:53:48,570 but we don't, so if tree is null, there's nothing we can do about that. 695 00:53:48,570 --> 00:53:50,960 We can just return false. 696 00:53:50,960 --> 00:53:52,840 So I'm going to change insert. 697 00:53:52,840 --> 00:53:56,540 We technically could just change this right here, 698 00:53:56,540 --> 00:53:58,400 how it's iterating over things, 699 00:53:58,400 --> 00:54:04,530 but I'm going to change insert to take a node** tree. 700 00:54:04,530 --> 00:54:07,510 Double pointers. 701 00:54:07,510 --> 00:54:09,710 What does this mean? 702 00:54:09,710 --> 00:54:12,330 Instead of dealing with pointers to nodes, 703 00:54:12,330 --> 00:54:16,690 the thing I'm going to be manipulating is this pointer. 704 00:54:16,690 --> 00:54:18,790 I'm going to be manipulating this pointer. 705 00:54:18,790 --> 00:54:21,770 I'm going to be manipulating pointers directly. 706 00:54:21,770 --> 00:54:25,760 This makes sense since, think about down-- 707 00:54:25,760 --> 00:54:33,340 well, right now, this points to null. 708 00:54:33,340 --> 00:54:38,130 What I want to do is manipulate this pointer to point to not null. 709 00:54:38,130 --> 00:54:40,970 I want it to point to my new node. 710 00:54:40,970 --> 00:54:44,870 If I just keep track of pointers to my pointers, 711 00:54:44,870 --> 00:54:47,840 then I don't need to keep track of a parent pointer. 712 00:54:47,840 --> 00:54:52,640 I can just keep track to see if the pointer is pointing to null, 713 00:54:52,640 --> 00:54:54,580 and if the pointer is pointing to null, 714 00:54:54,580 --> 00:54:57,370 change it to point to the node I want. 715 00:54:57,370 --> 00:55:00,070 And I can change it since I have a pointer to the pointer. 716 00:55:00,070 --> 00:55:02,040 Let's see this right now. 717 00:55:02,040 --> 00:55:05,470 You can actually do it recursively pretty easily. 718 00:55:05,470 --> 00:55:08,080 Do we want to do that? 719 00:55:08,080 --> 00:55:10,980 Yes, we do. 720 00:55:10,980 --> 00:55:16,790 >> Let's see it recursively. 721 00:55:16,790 --> 00:55:24,070 First, what is our base case going to be? 722 00:55:24,070 --> 00:55:29,140 Almost always our base case; but actually, this is kind of tricky. 723 00:55:29,140 --> 00:55:34,370 First things first, if (tree == NULL) 724 00:55:34,370 --> 00:55:37,550 I guess we're just going to return false. 725 00:55:37,550 --> 00:55:40,460 This is different from your tree being null. 726 00:55:40,460 --> 00:55:44,510 This is the pointer to your root pointer being null, 727 00:55:44,510 --> 00:55:48,360 which means that your root pointer does not exist. 728 00:55:48,360 --> 00:55:52,390 So down here, if I do 729 00:55:52,390 --> 00:55:55,760 node*--let's just reuse this. 730 00:55:55,760 --> 00:55:58,960 Node* root = NULL, 731 00:55:58,960 --> 00:56:00,730 and then I'm going to call insert by doing something like 732 00:56:00,730 --> 00:56:04,540 insert 4 into &root. 733 00:56:04,540 --> 00:56:06,560 So &root, if root is a node*, 734 00:56:06,560 --> 00:56:11,170 then &root is going to be a node**. 735 00:56:11,170 --> 00:56:15,120 This is valid. In this case, tree, up here, 736 00:56:15,120 --> 00:56:20,120 tree is not null--or insert. 737 00:56:20,120 --> 00:56:24,630 Here. Tree is not null; *tree is null, which is fine 738 00:56:24,630 --> 00:56:26,680 because if *tree is null, then I can manipulate it 739 00:56:26,680 --> 00:56:29,210 to now point to what I want it to point to. 740 00:56:29,210 --> 00:56:34,750 But if tree is null, that means I just came down here and said null. 741 00:56:34,750 --> 00:56:42,710 That does not make sense. I can't do anything with that. 742 00:56:42,710 --> 00:56:45,540 If tree is null, return false. 743 00:56:45,540 --> 00:56:48,220 So I basically already said what our real base case is. 744 00:56:48,220 --> 00:56:50,580 And what is that going to be? 745 00:56:50,580 --> 00:56:55,030 [Student, unintelligible] 746 00:56:55,030 --> 00:57:00,000 [Bowden] Yes. So if (*tree == NULL). 747 00:57:00,000 --> 00:57:04,520 This relates to the case over here 748 00:57:04,520 --> 00:57:09,640 where if my red pointer is the pointer I'm focused on, 749 00:57:09,640 --> 00:57:12,960 so like I'm focused on this pointer, now I'm focused on this pointer. 750 00:57:12,960 --> 00:57:15,150 Now I'm focused on this pointer. 751 00:57:15,150 --> 00:57:18,160 So if my red pointer, which is my node**, 752 00:57:18,160 --> 00:57:22,880 is ever--if *, my red pointer, is ever null, 753 00:57:22,880 --> 00:57:28,470 that means that I am at the case where I'm focusing on a pointer that points-- 754 00:57:28,470 --> 00:57:32,530 this is a pointer that belongs to a leaf. 755 00:57:32,530 --> 00:57:41,090 I want to change this pointer to point to my new node. 756 00:57:41,090 --> 00:57:45,230 Come back over here. 757 00:57:45,230 --> 00:57:56,490 My new node will just be node* n= build_node(value) 758 00:57:56,490 --> 00:58:07,300 then n, if n = NULL, return false. 759 00:58:07,300 --> 00:58:12,600 Else we want to change what the pointer is currently pointing to 760 00:58:12,600 --> 00:58:17,830 to now point to our newly built node. 761 00:58:17,830 --> 00:58:20,280 We can actually do that here. 762 00:58:20,280 --> 00:58:30,660 Instead of saying n, we say *tree= if *tree. 763 00:58:30,660 --> 00:58:35,450 Everyone understand that? That by dealing with pointers to pointers, 764 00:58:35,450 --> 00:58:40,750 we can change null pointers to point to things we want them to point to. 765 00:58:40,750 --> 00:58:42,820 That's our base case. 766 00:58:42,820 --> 00:58:45,740 >> Now our recurrence, or our recursion, 767 00:58:45,740 --> 00:58:51,430 is going to be very similar to all other recursions we've been doing. 768 00:58:51,430 --> 00:58:55,830 We're going to want to insert value, 769 00:58:55,830 --> 00:58:59,040 and now I'm going to use ternary again, but what is our condition going to be? 770 00:58:59,040 --> 00:59:05,180 What is it we're looking for to decide whether we want to go left or right? 771 00:59:05,180 --> 00:59:07,760 Let's do it in separate steps. 772 00:59:07,760 --> 00:59:18,850 If (value <) what? 773 00:59:18,850 --> 00:59:23,200 [Student] The tree's value? 774 00:59:23,200 --> 00:59:27,490 [Bowden] So remember that I'm currently-- 775 00:59:27,490 --> 00:59:31,260 [Students, unintelligible] 776 00:59:31,260 --> 00:59:34,140 [Bowden] Yeah, so right here, let's say that this green arrow 777 00:59:34,140 --> 00:59:39,050 is an example of what tree currently is, is a pointer to this pointer. 778 00:59:39,050 --> 00:59:46,610 So that means I am a pointer to a pointer to 3. 779 00:59:46,610 --> 00:59:48,800 The dereference twice sounded good. 780 00:59:48,800 --> 00:59:51,010 What do I--how do I do that? 781 00:59:51,010 --> 00:59:53,210 [Student] Dereference once, and then do arrow that way? 782 00:59:53,210 --> 00:59:58,420 [Bowden] So (*tree) is the dereference once, ->value 783 00:59:58,420 --> 01:00:05,960 is going to give me the value of the node that I'm indirectly pointing to. 784 01:00:05,960 --> 01:00:11,980 So I can also write it **tree.value if you prefer that. 785 01:00:11,980 --> 01:00:18,490 Either works. 786 01:00:18,490 --> 01:00:26,190 If that is the case, then I want to call insert with value. 787 01:00:26,190 --> 01:00:32,590 And what is my updated node** going to be? 788 01:00:32,590 --> 01:00:39,440 I want to go to the left, so **tree.left is going to be my left. 789 01:00:39,440 --> 01:00:41,900 And I want the pointer to that thing 790 01:00:41,900 --> 01:00:44,930 so that if the left ends up being the null pointer, 791 01:00:44,930 --> 01:00:48,360 I can modify it to point to my new node. 792 01:00:48,360 --> 01:00:51,460 >> And the other case can be very similar. 793 01:00:51,460 --> 01:00:55,600 Let's actually make that my ternary right now. 794 01:00:55,600 --> 01:01:04,480 Insert value if value < (**tree).value. 795 01:01:04,480 --> 01:01:11,040 Then we want to update our ** to the left, 796 01:01:11,040 --> 01:01:17,030 else we want to update our ** to the right. 797 01:01:17,030 --> 01:01:22,540 [Student] Does that get the pointer to the pointer? 798 01:01:22,540 --> 01:01:30,940 [Bowden] Remember that--**tree.right is a node star. 799 01:01:30,940 --> 01:01:35,040 [Student, unintelligible] >>Yeah. 800 01:01:35,040 --> 01:01:41,140 **tree.right is like this pointer or something. 801 01:01:41,140 --> 01:01:45,140 So by taking a pointer to that, that gives me what I want 802 01:01:45,140 --> 01:01:50,090 of the pointer to that guy. 803 01:01:50,090 --> 01:01:54,260 [Student] Could we go over again why we are using the two pointers? 804 01:01:54,260 --> 01:01:58,220 [Bowden] Yeah. So--no, you can, and that solution before 805 01:01:58,220 --> 01:02:04,540 was a way of doing it without doing two pointers. 806 01:02:04,540 --> 01:02:08,740 You need to be able to understand using two pointers, 807 01:02:08,740 --> 01:02:11,660 and this is a cleaner solution. 808 01:02:11,660 --> 01:02:16,090 Also, notice that, what happens if my tree-- 809 01:02:16,090 --> 01:02:24,480 what happens if my root was null? What happens if I do this case right here? 810 01:02:24,480 --> 01:02:30,540 So node* root = NULL, insert 4 into &root. 811 01:02:30,540 --> 01:02:35,110 What is root going to be after this? 812 01:02:35,110 --> 01:02:37,470 [Student, unintelligible] >>Yeah. 813 01:02:37,470 --> 01:02:41,710 Root value is going to be 4. 814 01:02:41,710 --> 01:02:45,510 Root left is going to be null, root right is going to be null. 815 01:02:45,510 --> 01:02:49,490 In the case where we did not pass root by address, 816 01:02:49,490 --> 01:02:52,490 we could not modify root. 817 01:02:52,490 --> 01:02:56,020 In the case where the tree--where root was null, 818 01:02:56,020 --> 01:02:58,410 we just had to return false. There's nothing we could do. 819 01:02:58,410 --> 01:03:01,520 We cannot insert a node into an empty tree. 820 01:03:01,520 --> 01:03:06,810 But now we can; we just make an empty tree into a one-node tree. 821 01:03:06,810 --> 01:03:13,400 Which is usually the expected way that it's supposed to work. 822 01:03:13,400 --> 01:03:21,610 >> Furthermore, this is significantly shorter than 823 01:03:21,610 --> 01:03:26,240 also keeping track of the parent, and so you iterate down all the way. 824 01:03:26,240 --> 01:03:30,140 Now I have my parent, and I just have my parent right pointer to the whatever. 825 01:03:30,140 --> 01:03:35,640 Instead, if we did this iteratively, it'd be the same idea with a while loop. 826 01:03:35,640 --> 01:03:38,100 But instead of having to deal with my parent pointer, 827 01:03:38,100 --> 01:03:40,600 instead, my current pointer would be the thing 828 01:03:40,600 --> 01:03:43,790 that I'm directly modifying to point to my new node. 829 01:03:43,790 --> 01:03:46,090 I don't have to deal with whether it's pointing to the left. 830 01:03:46,090 --> 01:03:48,810 I don't have to deal with whether it's pointing to the right. 831 01:03:48,810 --> 01:04:00,860 It's just whatever this pointer is, I'm going to set it to point to my new node. 832 01:04:00,860 --> 01:04:05,740 Everyone understand how it works? 833 01:04:05,740 --> 01:04:09,460 If not why do we want to do it this way, 834 01:04:09,460 --> 01:04:14,920 but at least that this works as a solution? 835 01:04:14,920 --> 01:04:17,110 [Student] Where do we return true? 836 01:04:17,110 --> 01:04:21,550 [Bowden] That's probably right here. 837 01:04:21,550 --> 01:04:26,690 If we correctly insert it, return true. 838 01:04:26,690 --> 01:04:32,010 Else, down here we're going to want to return whatever insert returns. 839 01:04:32,010 --> 01:04:35,950 >> And what's special about this recursive function? 840 01:04:35,950 --> 01:04:43,010 It's tail recursive, so as long as we compile with some optimization, 841 01:04:43,010 --> 01:04:48,060 it will recognize that and you will never get a stack overflow from this, 842 01:04:48,060 --> 01:04:53,230 even if our tree has a height of 10 thousand or 10 million. 843 01:04:53,230 --> 01:04:55,630 [Student, unintelligible] 844 01:04:55,630 --> 01:05:00,310 [Bowden] I think it does it at dash--or what optimization level 845 01:05:00,310 --> 01:05:03,820 is required for a tail recursion to be recognized. 846 01:05:03,820 --> 01:05:09,350 I think it recognizes--GCC and Clang 847 01:05:09,350 --> 01:05:12,610 also have different meanings for their optimization levels. 848 01:05:12,610 --> 01:05:17,870 I want to say it's -o 2, for sure that it will recognize tail recursion. 849 01:05:17,870 --> 01:05:27,880 But we--you could construct like a Fibonocci example or something. 850 01:05:27,880 --> 01:05:30,030 It's not easy to test with this, because it's difficult to construct 851 01:05:30,030 --> 01:05:32,600 a binary tree that's so large. 852 01:05:32,600 --> 01:05:37,140 But yeah, I think it's -o 2, that 853 01:05:37,140 --> 01:05:40,580 if you compile with -o 2, it will look for tail recursion 854 01:05:40,580 --> 01:05:54,030 and optimize that out. 855 01:05:54,030 --> 01:05:58,190 Let's go back to--insert's literally the last thing it needs. 856 01:05:58,190 --> 01:06:04,900 Let's go back to the insert over here 857 01:06:04,900 --> 01:06:07,840 where we're going to do the same idea. 858 01:06:07,840 --> 01:06:14,340 It'll still have the flaw of not being able to entirely handle 859 01:06:14,340 --> 01:06:17,940 when the root itself is null, or the past entry is null, 860 01:06:17,940 --> 01:06:20,060 but instead of dealing with a parent pointer, 861 01:06:20,060 --> 01:06:27,430 let's apply the same logic of keeping pointers to pointers. 862 01:06:27,430 --> 01:06:35,580 If here we keep our node ** cur, 863 01:06:35,580 --> 01:06:37,860 and we don't need to keep track of right anymore, 864 01:06:37,860 --> 01:06:47,190 but node ** cur = &tree. 865 01:06:47,190 --> 01:06:54,800 And now our while loop is going to be while *cur does not equal null. 866 01:06:54,800 --> 01:07:00,270 Don't need to keep track of parents anymore. 867 01:07:00,270 --> 01:07:04,180 Don't need to keep track of left and right. 868 01:07:04,180 --> 01:07:10,190 And I'll call it temp, because we're already using temp. 869 01:07:10,190 --> 01:07:17,200 Okay. So if (value > *temp), 870 01:07:17,200 --> 01:07:24,010 then &(*temp)->right 871 01:07:24,010 --> 01:07:29,250 else temp = &(*temp)->left. 872 01:07:29,250 --> 01:07:32,730 And now, at this point, after this while loop, 873 01:07:32,730 --> 01:07:36,380 I only do this because maybe it's easier to think about iteratively than recursively, 874 01:07:36,380 --> 01:07:39,010 but after this while loop, 875 01:07:39,010 --> 01:07:43,820 *temp is the pointer we want to change. 876 01:07:43,820 --> 01:07:48,960 >> Before we had parent, and we wanted to change either parent left or parent right, 877 01:07:48,960 --> 01:07:51,310 but if we want to change parent right, 878 01:07:51,310 --> 01:07:54,550 then *temp is parent right, and we can change it directly. 879 01:07:54,550 --> 01:08:05,860 So down here, we can do *temp = newnode, and that's it. 880 01:08:05,860 --> 01:08:09,540 So notice, all we did in this was take out lines of code. 881 01:08:09,540 --> 01:08:14,770 In order to keep track of the parent in all that is additional effort. 882 01:08:14,770 --> 01:08:18,689 Here, if we just keep track of the pointer to the pointer, 883 01:08:18,689 --> 01:08:22,990 and even if we wanted to get rid of all these curly braces now, 884 01:08:22,990 --> 01:08:27,170 make it look shorter. 885 01:08:27,170 --> 01:08:32,529 This now is the exact same solution, 886 01:08:32,529 --> 01:08:38,210 but fewer lines of code. 887 01:08:38,210 --> 01:08:41,229 Once you start recognizing this as a valid solution, 888 01:08:41,229 --> 01:08:43,529 it's also easier to reason about than like, okay, 889 01:08:43,529 --> 01:08:45,779 why do I have this flag at int right? 890 01:08:45,779 --> 01:08:49,290 What does that mean? Oh, it's signifying that 891 01:08:49,290 --> 01:08:51,370 every time I go right, I need to set it, 892 01:08:51,370 --> 01:08:53,819 else if I go left I need to set it to zero. 893 01:08:53,819 --> 01:09:04,060 Here, I don't have to reason about that; it's just easier to think about. 894 01:09:04,060 --> 01:09:06,710 Questions? 895 01:09:06,710 --> 01:09:16,290 [Student, unintelligible] >>Yeah. 896 01:09:16,290 --> 01:09:23,359 Okay, so in the last bit-- 897 01:09:23,359 --> 01:09:28,080 I guess one quick and easy function we can do is, 898 01:09:28,080 --> 01:09:34,910 let's--together, I guess, try and write a contains function 899 01:09:34,910 --> 01:09:38,899 which does not care whether it is a binary search tree. 900 01:09:38,899 --> 01:09:43,770 This contains function should return true 901 01:09:43,770 --> 01:09:58,330 if anywhere in this general binary tree is the value we're looking for. 902 01:09:58,330 --> 01:10:02,520 So let's first do it recursively and then we'll do it iteratively. 903 01:10:02,520 --> 01:10:07,300 We can actually just do it together, because this is going to be really short. 904 01:10:07,300 --> 01:10:11,510 >> What is my base case going to be? 905 01:10:11,510 --> 01:10:13,890 [Student, unintelligible] 906 01:10:13,890 --> 01:10:18,230 [Bowden] So if (tree == NULL), then what? 907 01:10:18,230 --> 01:10:22,740 [Student] Return false. 908 01:10:22,740 --> 01:10:26,160 [Bowden] Else, well, I don't need the else. 909 01:10:26,160 --> 01:10:30,250 If was my other base case. 910 01:10:30,250 --> 01:10:32,450 [Student] Tree's value? >>Yeah. 911 01:10:32,450 --> 01:10:36,430 So if (tree->value == value). 912 01:10:36,430 --> 01:10:39,920 Notice we're back to node*, not node**s? 913 01:10:39,920 --> 01:10:42,990 Contains will never need to use a node**, 914 01:10:42,990 --> 01:10:45,480 since we aren't modifying pointers. 915 01:10:45,480 --> 01:10:50,540 We're just traversing them. 916 01:10:50,540 --> 01:10:53,830 If that happens, then we want to return true. 917 01:10:53,830 --> 01:10:58,270 Else we want to traverse the children. 918 01:10:58,270 --> 01:11:02,620 So we cannot reason about whether everything to the left is less 919 01:11:02,620 --> 01:11:05,390 and everything to the right is greater. 920 01:11:05,390 --> 01:11:09,590 So what is our condition going to be here--or, what are we going to do? 921 01:11:09,590 --> 01:11:11,840 [Student, unintelligible] >>Yeah. 922 01:11:11,840 --> 01:11:17,400 Return contains(value, tree->left) 923 01:11:17,400 --> 01:11:26,860 or contains(value, tree->right). And that's it. 924 01:11:26,860 --> 01:11:29,080 And notice there is some short-circuit evaluation, 925 01:11:29,080 --> 01:11:32,520 where if we happen to find the value in the left tree, 926 01:11:32,520 --> 01:11:36,820 we never need to look at the right tree. 927 01:11:36,820 --> 01:11:40,430 That's the entire function. 928 01:11:40,430 --> 01:11:43,690 Now let's do it iteratively, 929 01:11:43,690 --> 01:11:48,060 which is going to be less nice. 930 01:11:48,060 --> 01:11:54,750 We'll take the usual start of node* cur = tree. 931 01:11:54,750 --> 01:12:03,250 While (cur != NULL). 932 01:12:03,250 --> 01:12:08,600 Quickly going to see a problem. 933 01:12:08,600 --> 01:12:12,250 If cur--out here, if we ever break out of this, 934 01:12:12,250 --> 01:12:16,020 then we've run out of things to look at, so return false. 935 01:12:16,020 --> 01:12:24,810 If (cur->value == value), return true. 936 01:12:24,810 --> 01:12:32,910 So now, we are at a place-- 937 01:12:32,910 --> 01:12:36,250 we don't know whether we want to go left or right. 938 01:12:36,250 --> 01:12:44,590 So arbitrarily, let's just go left. 939 01:12:44,590 --> 01:12:47,910 I've obviously run into an issue where I've completely abandoned everything-- 940 01:12:47,910 --> 01:12:50,760 I will only ever check the left side of a tree. 941 01:12:50,760 --> 01:12:56,050 I will never check anything that is a right child of anything. 942 01:12:56,050 --> 01:13:00,910 How do I fix this? 943 01:13:00,910 --> 01:13:05,260 [Student] You have to keep track of the left and right in a stack. 944 01:13:05,260 --> 01:13:13,710 [Bowden] Yeah. So let's make it 945 01:13:13,710 --> 01:13:32,360 struct list, node* n, and then node** next? 946 01:13:32,360 --> 01:13:40,240 I think that works fine. 947 01:13:40,240 --> 01:13:45,940 We want to go over the left, or let's--up here. 948 01:13:45,940 --> 01:13:54,350 Struct list list =, it'll start 949 01:13:54,350 --> 01:13:58,170 out at this struct list. 950 01:13:58,170 --> 01:14:04,040 *list = NULL. So that's going to be our linked list 951 01:14:04,040 --> 01:14:08,430 of subtrees that we have skipped over. 952 01:14:08,430 --> 01:14:13,800 We are going to traverse left now, 953 01:14:13,800 --> 01:14:17,870 but since we inevitably need to come back to the right, 954 01:14:17,870 --> 01:14:26,140 we're going to keep the right side inside of our struct list. 955 01:14:26,140 --> 01:14:32,620 Then we'll have new_list or struct, 956 01:14:32,620 --> 01:14:41,080 struct list*, new_list = malloc (sizeof(list)). 957 01:14:41,080 --> 01:14:44,920 I'm going to ignore error checking that, but you should check to see if it's null. 958 01:14:44,920 --> 01:14:50,540 New_list the node it's going to point to-- 959 01:14:50,540 --> 01:14:56,890 oh, that's why I wanted it up here. It's going to point to a second struct list. 960 01:14:56,890 --> 01:15:02,380 That's just how linked lists work. 961 01:15:02,380 --> 01:15:04,810 This is the same as an int linked list 962 01:15:04,810 --> 01:15:06,960 except we're just replacing int with node*. 963 01:15:06,960 --> 01:15:11,040 It's exactly the same. 964 01:15:11,040 --> 01:15:15,100 So new_list, the value of our new_list node, 965 01:15:15,100 --> 01:15:19,120 is going to be cur->right. 966 01:15:19,120 --> 01:15:24,280 The value of our new_list->next is going to be our original list, 967 01:15:24,280 --> 01:15:30,760 and then we're going to update our list to point to new_list. 968 01:15:30,760 --> 01:15:33,650 >> Now we need some sort of way of pulling things, 969 01:15:33,650 --> 01:15:37,020 like we have traversed the entire left subtree. 970 01:15:37,020 --> 01:15:40,480 Now we need to pull stuff out of it, 971 01:15:40,480 --> 01:15:43,850 like cur is null; we don't want to just return false. 972 01:15:43,850 --> 01:15:50,370 We want to now pull outside at our new list. 973 01:15:50,370 --> 01:15:53,690 A convenient way of doing this--well, actually, there's multiple ways of doing this. 974 01:15:53,690 --> 01:15:56,680 Anyone have a suggestion? 975 01:15:56,680 --> 01:15:58,790 Where I should do this or how I should do this? 976 01:15:58,790 --> 01:16:08,010 We only have a couple minutes, but any suggestions? 977 01:16:08,010 --> 01:16:14,750 Instead of--one way, instead of our condition being while, 978 01:16:14,750 --> 01:16:17,090 what we're currently looking at is not null, 979 01:16:17,090 --> 01:16:22,340 instead we're going to continue to go until our list itself is null. 980 01:16:22,340 --> 01:16:25,680 So if our list ends up being null, 981 01:16:25,680 --> 01:16:30,680 then we have run out of things to look for, to search over. 982 01:16:30,680 --> 01:16:39,860 But that means that the first thing in our list is just going to be the first node. 983 01:16:39,860 --> 01:16:49,730 The very first thing will be--we no longer need to see that. 984 01:16:49,730 --> 01:16:58,710 So list->n will be our tree. 985 01:16:58,710 --> 01:17:02,860 list->next is going to be null. 986 01:17:02,860 --> 01:17:07,580 And now while list does not equal null. 987 01:17:07,580 --> 01:17:11,610 Cur is going to pull something from our list. 988 01:17:11,610 --> 01:17:13,500 So cur is going to equal list->n. 989 01:17:13,500 --> 01:17:20,850 And then list is going to equal list->n, or list->next. 990 01:17:20,850 --> 01:17:23,480 So if cur value equals value. 991 01:17:23,480 --> 01:17:28,790 Now we can add both our right pointer and our left pointer 992 01:17:28,790 --> 01:17:32,900 as long as they're not null. 993 01:17:32,900 --> 01:17:36,390 Down here, I guess we should have done that in the first place. 994 01:17:36,390 --> 01:17:44,080 If (cur->right != NULL) 995 01:17:44,080 --> 01:17:56,380 then we are going to insert that node into our list. 996 01:17:56,380 --> 01:17:59,290 If (cur->left), this is a little bit of extra work, but it's fine. 997 01:17:59,290 --> 01:18:02,690 If (cur->left != NULL), 998 01:18:02,690 --> 01:18:14,310 and we're going to insert the left into our linked list, 999 01:18:14,310 --> 01:18:19,700 and that should be it. 1000 01:18:19,700 --> 01:18:22,670 We iterate--as long as we have something in our list, 1001 01:18:22,670 --> 01:18:26,640 we have another node to look at. 1002 01:18:26,640 --> 01:18:28,920 So we look at that node, 1003 01:18:28,920 --> 01:18:31,390 we advance our list to the next one. 1004 01:18:31,390 --> 01:18:34,060 If that node is the value we're looking for, we can return true. 1005 01:18:34,060 --> 01:18:37,640 Else insert both our left and right subtrees, 1006 01:18:37,640 --> 01:18:40,510 as long as they're not null, into our list 1007 01:18:40,510 --> 01:18:43,120 so that we inevitably go over them. 1008 01:18:43,120 --> 01:18:45,160 So if they were not null, 1009 01:18:45,160 --> 01:18:47,950 if our root pointer pointed to two things, 1010 01:18:47,950 --> 01:18:51,670 then at first we pulled something out so our list ends up being null. 1011 01:18:51,670 --> 01:18:56,890 And then we put two things back in, so now our list is of size 2. 1012 01:18:56,890 --> 01:19:00,030 Then we're going to loop back up and we're just going to pull, 1013 01:19:00,030 --> 01:19:04,250 let's say, the left pointer of our root node. 1014 01:19:04,250 --> 01:19:07,730 And that'll just keep happening; we'll end up looping over everything. 1015 01:19:07,730 --> 01:19:11,280 >> Notice that this was significantly more complicated 1016 01:19:11,280 --> 01:19:14,160 in the recursive solution. 1017 01:19:14,160 --> 01:19:17,260 And I've said multiple times 1018 01:19:17,260 --> 01:19:25,120 that the recursive solution usually has much in common with the iterative solution. 1019 01:19:25,120 --> 01:19:30,820 Here, this is exactly what the recursive solution is doing. 1020 01:19:30,820 --> 01:19:36,740 The only change is that instead of implicitly using the stack, the program stack, 1021 01:19:36,740 --> 01:19:40,840 as your way of keeping track of what nodes you still need to visit, 1022 01:19:40,840 --> 01:19:45,430 now you have to explicitly use a linked list. 1023 01:19:45,430 --> 01:19:49,070 In both cases you are keeping track of what node still needs to be visited. 1024 01:19:49,070 --> 01:19:51,790 In the recursive case, it's just easier because a stack 1025 01:19:51,790 --> 01:19:57,100 is implemented for you as the program stack. 1026 01:19:57,100 --> 01:19:59,550 Notice that this linked list, it is a stack. 1027 01:19:59,550 --> 01:20:01,580 Whatever we just put on the stack 1028 01:20:01,580 --> 01:20:09,960 is immediately what we're going to pull off the stack to visit next. 1029 01:20:09,960 --> 01:20:14,380 We're out of time, but any questions? 1030 01:20:14,380 --> 01:20:23,530 [Student, unintelligible] 1031 01:20:23,530 --> 01:20:27,790 [Bowden] Yeah. So if we have our linked list, 1032 01:20:27,790 --> 01:20:30,150 current is going to point to this guy, 1033 01:20:30,150 --> 01:20:34,690 and now we're just advancing our linked list to focus on this guy. 1034 01:20:34,690 --> 01:20:44,200 We're traversing over the linked list in that line. 1035 01:20:44,200 --> 01:20:51,200 And then I guess we should free our linked list and stuff 1036 01:20:51,200 --> 01:20:53,880 once before returning true or false, we need to 1037 01:20:53,880 --> 01:20:57,360 iterate over our linked list and always down here, I guess, 1038 01:20:57,360 --> 01:21:03,900 if we cur right is not equal to, add it, so now we want to free 1039 01:21:03,900 --> 01:21:09,600 cur because, well, did we completely forget about the list? Yeah. 1040 01:21:09,600 --> 01:21:12,880 So that's what we want to do here. 1041 01:21:12,880 --> 01:21:16,730 Where's the pointer? 1042 01:21:16,730 --> 01:21:23,320 Cur was then--we want to a struct list* 10 equals list next. 1043 01:21:23,320 --> 01:21:29,240 Free list, list = temp. 1044 01:21:29,240 --> 01:21:32,820 And in the case where we return true, we do need to iterate 1045 01:21:32,820 --> 01:21:37,050 over the remainder of our linked list, freeing things. 1046 01:21:37,050 --> 01:21:39,700 The nice thing about the recursive solution is freeing things 1047 01:21:39,700 --> 01:21:44,930 just means popping factorings off the stack, which will happen for you. 1048 01:21:44,930 --> 01:21:50,720 So we've gone from something that's like 3 lines of hard-to-think-about code 1049 01:21:50,720 --> 01:21:57,520 to something that is significantly many more hard-to-think-about lines of code. 1050 01:21:57,520 --> 01:22:00,620 Any more questions? 1051 01:22:00,620 --> 01:22:08,760 All right. We're good. Bye! 1052 01:22:08,760 --> 01:22:11,760 [CS50.TV]