1 00:00:00,000 --> 00:00:02,500 [Section 7] [Less Comfortable] 2 00:00:02,500 --> 00:00:04,890 [Nate Hardison] [Harvard University] 3 00:00:04,890 --> 00:00:07,000 [This is CS50.] [CS50.TV] 4 00:00:07,000 --> 00:00:09,080 >> Welcome to Section 7. 5 00:00:09,080 --> 00:00:11,330 Thanks to hurricane Sandy, 6 00:00:11,330 --> 00:00:13,440 instead of having a normal section this week, 7 00:00:13,440 --> 00:00:17,650 we're doing this walkthrough through the section of questions. 8 00:00:17,650 --> 00:00:22,830 I'm going to be following along with the Problem Set 6 specification 9 00:00:22,830 --> 00:00:25,650 and going through all of the questions in the 10 00:00:25,650 --> 00:00:27,770 "A Section of Questions" section. 11 00:00:27,770 --> 00:00:30,940 If there are any questions, 12 00:00:30,940 --> 00:00:32,960 please post these on CS50 Discuss. 13 00:00:32,960 --> 00:00:35,480 >> Alright. Let's get started. 14 00:00:35,480 --> 00:00:40,780 Right now, I'm looking at page 3 of the problem set specification. 15 00:00:40,780 --> 00:00:44,110 We're going to first start talking about binary trees 16 00:00:44,110 --> 00:00:47,850 since those have a lot of relevance to this week's problem set-- 17 00:00:47,850 --> 00:00:49,950 the Huffman tree encoding. 18 00:00:49,950 --> 00:00:55,000 One of the very first data structures we talked about on CS50 was the array. 19 00:00:55,000 --> 00:01:00,170 Remember that an array is a sequence of elements-- 20 00:01:00,170 --> 00:01:04,019 all of the same type--stored right next to each other in memory. 21 00:01:04,019 --> 00:01:14,420 If I have an integer array that I can draw using this boxes-numbers-integers style-- 22 00:01:14,420 --> 00:01:20,290 Let's say I have 5 in the first box, I have 7 in the second, 23 00:01:20,290 --> 00:01:27,760 then I have 8, 10, and 20 in the final box. 24 00:01:27,760 --> 00:01:33,000 Remember, the two really good things about this array 25 00:01:33,000 --> 00:01:38,800 are that we have this constant-time access to any particular element 26 00:01:38,800 --> 00:01:40,500 in the array if we know its index. 27 00:01:40,500 --> 00:01:44,670 For example, if I want to grab the third element in the array-- 28 00:01:44,670 --> 00:01:47,870 at index 2 using our zero-based indexing system-- 29 00:01:47,870 --> 00:01:52,220 I literally just have to do a simple mathematical calculation, 30 00:01:52,220 --> 00:01:56,170 hop to that position in the array, 31 00:01:56,170 --> 00:01:57,840 pull out the 8 that is stored there, 32 00:01:57,840 --> 00:01:59,260 and I'm good to go. 33 00:01:59,260 --> 00:02:03,350 >> One of the bad things about this array--that we talked about 34 00:02:03,350 --> 00:02:05,010 when we discussed linked lists-- 35 00:02:05,010 --> 00:02:09,120 is that if I want to insert an element into this array, 36 00:02:09,120 --> 00:02:11,090 I'm going to have to do some shifting around. 37 00:02:11,090 --> 00:02:12,940 For example, this array right here 38 00:02:12,940 --> 00:02:16,850 is in sorted order--sorted in ascending order-- 39 00:02:16,850 --> 00:02:19,440 5, then 7, then 8, then 10, and then 20-- 40 00:02:19,440 --> 00:02:23,100 but if I want to insert the number 9 into this array, 41 00:02:23,100 --> 00:02:27,460 I'm going to have to shift some of the elements in order to make space. 42 00:02:27,460 --> 00:02:30,440 We can draw this out here. 43 00:02:30,440 --> 00:02:35,650 I'm going to have to move the 5, the 7, and then the 8; 44 00:02:35,650 --> 00:02:38,720 create a gap where I can put the 9, 45 00:02:38,720 --> 00:02:45,910 and then the 10 and the 20 can go to the right of the 9. 46 00:02:45,910 --> 00:02:49,450 This is kind of a pain because in the worst-case scenario-- 47 00:02:49,450 --> 00:02:54,350 when we're having to insert either at the beginning or at the end 48 00:02:54,350 --> 00:02:56,040 of the array, depending on how we're shifting-- 49 00:02:56,040 --> 00:02:58,850 we might end up having to shift all of the elements 50 00:02:58,850 --> 00:03:00,750 that we're currently storing in the array. 51 00:03:00,750 --> 00:03:03,810 >> So, what was the way around this? 52 00:03:03,810 --> 00:03:09,260 The way around this was to go to our linked-list method where-- 53 00:03:09,260 --> 00:03:19,820 instead of storing the elements 5, 7, 8, 10, and 20 all next to each other in memory-- 54 00:03:19,820 --> 00:03:25,630 what we instead did was store them kind of wherever we wanted to store them 55 00:03:25,630 --> 00:03:32,470 in these linked-list nodes which I'm drawing out here, kind of ad hoc. 56 00:03:32,470 --> 00:03:42,060 And then we connected them together using these next pointers. 57 00:03:42,060 --> 00:03:44,370 I can have a pointer from 5 to the 7, 58 00:03:44,370 --> 00:03:46,420 a pointer from the 7 to the 8, 59 00:03:46,420 --> 00:03:47,770 a pointer from the 8 to the 10, 60 00:03:47,770 --> 00:03:51,220 and, finally, a pointer from the 10 to the 20, 61 00:03:51,220 --> 00:03:54,880 and then a null pointer at the 20 indicating that there's nothing left. 62 00:03:54,880 --> 00:03:59,690 The trade-off that we have here 63 00:03:59,690 --> 00:04:05,360 is that now if we want to insert the number 9 into our sorted list, 64 00:04:05,360 --> 00:04:08,270 all we have to do is create a new node with 9, 65 00:04:08,270 --> 00:04:12,290 wire it up to point to the appropriate place, 66 00:04:12,290 --> 00:04:20,630 and then rewire the 8 to point down to the 9. 67 00:04:20,630 --> 00:04:25,660 That's pretty fast, assuming we know exactly where we want to insert the 9. 68 00:04:25,660 --> 00:04:32,610 But the trade-off in exchange for this is that we've now lost the constant-time access 69 00:04:32,610 --> 00:04:36,230 to any particular element in our data structure. 70 00:04:36,230 --> 00:04:40,950 For example, if I want to find the fourth element in this linked list, 71 00:04:40,950 --> 00:04:43,510 I'm going to have to start at the very beginning of the list 72 00:04:43,510 --> 00:04:48,930 and work my way through counting node-by-node until I find the fourth one. 73 00:04:48,930 --> 00:04:55,870 >> In order to get better access performance than a linked list-- 74 00:04:55,870 --> 00:04:59,360 but also retain some of the benefits that we had 75 00:04:59,360 --> 00:05:01,800 in terms of insertion time from a linked list-- 76 00:05:01,800 --> 00:05:05,750 a binary tree is going to need to use a little more memory. 77 00:05:05,750 --> 00:05:11,460 In particular, instead of just having one pointer in a binary tree node-- 78 00:05:11,460 --> 00:05:13,350 like the linked-list node does-- 79 00:05:13,350 --> 00:05:16,950 we're going to add a second pointer to the binary tree node. 80 00:05:16,950 --> 00:05:19,950 Rather than just having one pointer to the next element, 81 00:05:19,950 --> 00:05:24,420 we're going to have a pointer to a left child and a right child. 82 00:05:24,420 --> 00:05:26,560 >> Let's draw a picture to see what that actually looks like. 83 00:05:26,560 --> 00:05:31,350 Again, I'm going to use these boxes and arrows. 84 00:05:31,350 --> 00:05:37,150 A binary tree node will start off with just a simple box. 85 00:05:37,150 --> 00:05:40,940 It's going to have a space for the value, 86 00:05:40,940 --> 00:05:47,280 and then it's also going to have a space for the left child and the right child. 87 00:05:47,280 --> 00:05:49,280 I'm going to label them here. 88 00:05:49,280 --> 00:05:57,560 We're going to have the left child, and then we're going to have the right child. 89 00:05:57,560 --> 00:05:59,920 There are many different ways of doing this. 90 00:05:59,920 --> 00:06:02,050 Sometimes for space and convenience, 91 00:06:02,050 --> 00:06:06,460 I'll actually draw it like I'm doing here on the bottom 92 00:06:06,460 --> 00:06:10,910 where I'm going to have the value at the top, 93 00:06:10,910 --> 00:06:14,060 and then the right child on the bottom right, 94 00:06:14,060 --> 00:06:16,060 and the left child on the bottom left. 95 00:06:16,060 --> 00:06:20,250 Going back to this top diagram, 96 00:06:20,250 --> 00:06:22,560 we have the value at the very top, 97 00:06:22,560 --> 00:06:25,560 then we have the left-child pointer, and then we have the right-child pointer. 98 00:06:25,560 --> 00:06:30,110 >> In the problem set specification, 99 00:06:30,110 --> 00:06:33,110 we talk about drawing a node that has a value 7, 100 00:06:33,110 --> 00:06:39,750 and then a left-child pointer that is null, and a right-child pointer that is null. 101 00:06:39,750 --> 00:06:46,040 We can either write capital N-U-L-L in the space for 102 00:06:46,040 --> 00:06:51,610 both the left child and the right child, or we can draw this diagonal slash 103 00:06:51,610 --> 00:06:53,750 through each of the boxes to indicate that it's null. 104 00:06:53,750 --> 00:06:57,560 I'm going to do that just because that's simpler. 105 00:06:57,560 --> 00:07:03,700 What you see here are two ways of diagramming a very simple binary tree node 106 00:07:03,700 --> 00:07:07,960 where we have the value 7 and null child pointers. 107 00:07:07,960 --> 00:07:15,220 >> The second part of our specification talks about how with linked lists-- 108 00:07:15,220 --> 00:07:18,270 remember, we only had to hold on to the very first element in a list 109 00:07:18,270 --> 00:07:20,270 to remember the entire list-- 110 00:07:20,270 --> 00:07:26,140 and likewise, with a binary tree, we only have to hold on to one pointer to the tree 111 00:07:26,140 --> 00:07:31,120 in order to maintain control over the entire data structure. 112 00:07:31,120 --> 00:07:36,150 This special element of the tree is called the root node of the tree. 113 00:07:36,150 --> 00:07:43,360 For example, if this one node--this node containing the value 7 114 00:07:43,360 --> 00:07:45,500 with null left- and right-child pointers-- 115 00:07:45,500 --> 00:07:47,360 were the only value in our tree, 116 00:07:47,360 --> 00:07:50,390 then this would be our root node. 117 00:07:50,390 --> 00:07:52,240 It's the very beginning of our tree. 118 00:07:52,240 --> 00:07:58,530 We can see this a little more clearly once we start adding more nodes to our tree. 119 00:07:58,530 --> 00:08:01,510 Let me pull up a new page. 120 00:08:01,510 --> 00:08:05,000 >> Now we're going to draw a tree that has 7 at the root, 121 00:08:05,000 --> 00:08:10,920 and 3 inside of the left child, and 9 inside of the right child. 122 00:08:10,920 --> 00:08:13,500 Again, this is pretty simple. 123 00:08:13,500 --> 00:08:26,510 We've got 7, draw a node for the 3, a node for 9, 124 00:08:26,510 --> 00:08:32,150 and I'm going to set the left-child pointer of 7 to point to the node containing 3, 125 00:08:32,150 --> 00:08:37,850 and the right-child pointer of the node containing 7 to the node containing 9. 126 00:08:37,850 --> 00:08:42,419 Now, since 3 and 9 don't have any children, 127 00:08:42,419 --> 00:08:48,500 we're going to set all of their child pointers to be null. 128 00:08:48,500 --> 00:08:56,060 Here, the root of our tree corresponds to the node containing the number 7. 129 00:08:56,060 --> 00:09:02,440 You can see that if all we have is a pointer to that root node, 130 00:09:02,440 --> 00:09:07,330 we can then walk through our tree and access both child nodes-- 131 00:09:07,330 --> 00:09:10,630 both 3 and 9. 132 00:09:10,630 --> 00:09:14,820 No need to maintain pointers to every single node on the tree. 133 00:09:14,820 --> 00:09:22,080 All right. Now we're going to add another node to this diagram. 134 00:09:22,080 --> 00:09:25,370 We're going to add a node containing 6, 135 00:09:25,370 --> 00:09:34,140 and we're going to add this as the right child of the node containing 3. 136 00:09:34,140 --> 00:09:41,850 To do that, I'm going to erase that null pointer in the 3 node 137 00:09:41,850 --> 00:09:47,750 and wire it up to point to the node containing 6. All right. 138 00:09:47,750 --> 00:09:53,800 >> At this point, let's go over a little bit of terminology. 139 00:09:53,800 --> 00:09:58,230 To start, the reason that this is called a binary tree in particular 140 00:09:58,230 --> 00:10:00,460 is that it has two child pointers. 141 00:10:00,460 --> 00:10:06,020 There are other kinds of trees that have more child pointers. 142 00:10:06,020 --> 00:10:10,930 In particular, you did a "trie" in Problem Set 5. 143 00:10:10,930 --> 00:10:19,310 You'll notice that in that trie, you had 27 different pointers to different children-- 144 00:10:19,310 --> 00:10:22,410 one for each of the 26 letters in the English alphabet, 145 00:10:22,410 --> 00:10:25,410 and then the 27th for the apostrophe-- 146 00:10:25,410 --> 00:10:28,900 so, that's similar to a type of tree. 147 00:10:28,900 --> 00:10:34,070 But here, since it's binary, we only have two child pointers. 148 00:10:34,070 --> 00:10:37,880 >> In addition to this root node that we talked about, 149 00:10:37,880 --> 00:10:41,470 we've also been throwing around this term "child". 150 00:10:41,470 --> 00:10:44,470 What does it mean for one node to be a child of another node? 151 00:10:44,470 --> 00:10:54,060 It literally means that a child node is a child of another node 152 00:10:54,060 --> 00:10:58,760 if that other node has one of its child pointers set to point to that node. 153 00:10:58,760 --> 00:11:01,230 To put this into more concrete terms, 154 00:11:01,230 --> 00:11:11,170 if 3 is pointed to by one of the child pointers of 7, then 3 is a child of 7. 155 00:11:11,170 --> 00:11:14,510 If we were to figure out what the children of 7 are-- 156 00:11:14,510 --> 00:11:18,510 well, we see that 7 has a pointer to 3 and a pointer to 9, 157 00:11:18,510 --> 00:11:22,190 so 9 and 3 are the children of 7. 158 00:11:22,190 --> 00:11:26,650 9 has no children because its child pointers are null, 159 00:11:26,650 --> 00:11:30,940 and 3 has only one child, 6. 160 00:11:30,940 --> 00:11:37,430 6 also has no children because both of its pointers are null, which we'll draw right now. 161 00:11:37,430 --> 00:11:45,010 >> Additionally, we also talk about the parents of a particular node, 162 00:11:45,010 --> 00:11:51,100 and this is, as you'd expect, the reverse of this child description. 163 00:11:51,100 --> 00:11:58,620 Each node has only one parent--instead of two as you might expect with humans. 164 00:11:58,620 --> 00:12:03,390 For example, the parent of 3 is 7. 165 00:12:03,390 --> 00:12:10,800 The parent of 9 is also 7, and the parent of 6 is 3. Not much to it. 166 00:12:10,800 --> 00:12:15,720 We also have terms to talk about, grandparents and grandchildren, 167 00:12:15,720 --> 00:12:18,210 and more generally, we talk about the ancestors 168 00:12:18,210 --> 00:12:20,960 and descendants of a particular node. 169 00:12:20,960 --> 00:12:25,710 The ancestor of a node--or ancestors, rather, of a node-- 170 00:12:25,710 --> 00:12:32,730 are all of the nodes that lie on the path from the root to that node. 171 00:12:32,730 --> 00:12:36,640 For example, if I'm looking at the node 6, 172 00:12:36,640 --> 00:12:46,430 then the ancestors are going to be both 3 and 7. 173 00:12:46,430 --> 00:12:55,310 The ancestors of 9, for example, are--if I'm looking at the node 9-- 174 00:12:55,310 --> 00:12:59,990 then the ancestor of 9 is just 7. 175 00:12:59,990 --> 00:13:01,940 And descendants are exactly the reverse. 176 00:13:01,940 --> 00:13:05,430 If I want to look at all of the descendants of 7, 177 00:13:05,430 --> 00:13:11,020 then I have to look at all of the nodes beneath it. 178 00:13:11,020 --> 00:13:16,950 So, I have 3, 9, and 6 all as descendants of 7. 179 00:13:16,950 --> 00:13:24,170 >> The final term that we'll talk about is this notion of being a sibling. 180 00:13:24,170 --> 00:13:27,980 Siblings--kind of following along on these family terms-- 181 00:13:27,980 --> 00:13:33,150 are nodes that are at the same level in the tree. 182 00:13:33,150 --> 00:13:42,230 So, 3 and 9 are siblings because they are at the same level in the tree. 183 00:13:42,230 --> 00:13:46,190 They both have the same parent, 7. 184 00:13:46,190 --> 00:13:51,400 The 6 has no siblings because 9 doesn't have any children. 185 00:13:51,400 --> 00:13:55,540 And 7 doesn't have any siblings because it's the root of our tree, 186 00:13:55,540 --> 00:13:59,010 and there's only ever one root. 187 00:13:59,010 --> 00:14:02,260 For 7 to have siblings, there would have to be a node above 7. 188 00:14:02,260 --> 00:14:07,480 There would have to be a parent of 7, in which case 7 would no longer be the root of the tree. 189 00:14:07,480 --> 00:14:10,480 Then that new parent of 7 would also have to have a child, 190 00:14:10,480 --> 00:14:16,480 and that child would then be the sibling of 7. 191 00:14:16,480 --> 00:14:21,040 >> All right. Moving on. 192 00:14:21,040 --> 00:14:24,930 When we started our discussion of binary trees, 193 00:14:24,930 --> 00:14:28,790 we talked about how we were going to use them to 194 00:14:28,790 --> 00:14:32,800 gain an advantage over both arrays and linked lists. 195 00:14:32,800 --> 00:14:37,220 And the way we're going to do that is with this ordering property. 196 00:14:37,220 --> 00:14:41,080 We say that a binary tree is ordered, according to the specification, 197 00:14:41,080 --> 00:14:45,740 if for each node in our tree, all of its descendants on the left-- 198 00:14:45,740 --> 00:14:48,670 the left child and all of the left child's descendants-- 199 00:14:48,670 --> 00:14:54,510 have lesser values, and all of the nodes on the right-- 200 00:14:54,510 --> 00:14:57,770 the right child and all of the right child's descendants-- 201 00:14:57,770 --> 00:15:02,800 have nodes greater than the value of the current node that we're looking at. 202 00:15:02,800 --> 00:15:07,850 Just for simplicity, we're going to assume that there aren't any duplicate nodes in our tree. 203 00:15:07,850 --> 00:15:11,180 For example, in this tree, we're not going to deal with the case 204 00:15:11,180 --> 00:15:13,680 where we have the value 7 at the root 205 00:15:13,680 --> 00:15:16,720 and then we also have the value 7 elsewhere in the tree. 206 00:15:16,720 --> 00:15:24,390 In this case, you'll notice that this tree is indeed ordered. 207 00:15:24,390 --> 00:15:26,510 We have the value 7 at the root. 208 00:15:26,510 --> 00:15:29,720 Everything to the left of 7-- 209 00:15:29,720 --> 00:15:35,310 if I undo all of these little marks here-- 210 00:15:35,310 --> 00:15:40,450 everything to the left of 7--the 3 and the 6-- 211 00:15:40,450 --> 00:15:49,410 those values are both less than 7, and everything to the right--which is just this 9-- 212 00:15:49,410 --> 00:15:53,450 is greater than 7. 213 00:15:53,450 --> 00:15:58,650 >> This is not the only ordered tree containing these values, 214 00:15:58,650 --> 00:16:03,120 but let's draw a few more of them. 215 00:16:03,120 --> 00:16:05,030 There is actually a whole bunch of ways that we can do this. 216 00:16:05,030 --> 00:16:09,380 I'm going to use a shorthand just to keep things simple where-- 217 00:16:09,380 --> 00:16:11,520 rather than drawing out the whole boxes-and-arrows-- 218 00:16:11,520 --> 00:16:14,220 I'm just going to draw the numbers and add arrows connecting them. 219 00:16:14,220 --> 00:16:22,920 To start, we'll just write our original tree again where we had 7, and then a 3, 220 00:16:22,920 --> 00:16:25,590 and then 3 pointed back to the right to the 6, 221 00:16:25,590 --> 00:16:30,890 and 7 had a right child that was 9. 222 00:16:30,890 --> 00:16:33,860 All right. What's another way that we could write this tree? 223 00:16:33,860 --> 00:16:38,800 Instead of having 3 be the left child of 7, 224 00:16:38,800 --> 00:16:41,360 we could also have the 6 be the left child of 7, 225 00:16:41,360 --> 00:16:44,470 and then 3 be the left child of the 6. 226 00:16:44,470 --> 00:16:48,520 That would look like this tree right here where I've got 7, 227 00:16:48,520 --> 00:16:57,860 then 6, then 3, and a 9 on the right. 228 00:16:57,860 --> 00:17:01,490 We also don't have to have 7 as our root node. 229 00:17:01,490 --> 00:17:03,860 We could also have 6 as our root node. 230 00:17:03,860 --> 00:17:06,470 What would that look like? 231 00:17:06,470 --> 00:17:09,230 If we're going to maintain this ordered property, 232 00:17:09,230 --> 00:17:12,970 everything to the left of the 6 has to be less than it. 233 00:17:12,970 --> 00:17:16,540 There's only one possibility, and that's 3. 234 00:17:16,540 --> 00:17:19,869 But then as the right child of 6, we have two possibilities. 235 00:17:19,869 --> 00:17:25,380 First, we could have the 7 and then the 9, 236 00:17:25,380 --> 00:17:28,850 or we could draw it--like I'm about to do here-- 237 00:17:28,850 --> 00:17:34,790 where we have the 9 as the right child of the 6, 238 00:17:34,790 --> 00:17:39,050 and then the 7 as the left child of the 9. 239 00:17:39,050 --> 00:17:44,240 >> Now, 7 and 6 aren't the only possible values for the root. 240 00:17:44,240 --> 00:17:50,200 We could also have 3 be at the root. 241 00:17:50,200 --> 00:17:52,240 What happens if 3 is at the root? 242 00:17:52,240 --> 00:17:54,390 Here, things get a little bit interesting. 243 00:17:54,390 --> 00:17:58,440 3 doesn't have any values that are less than it, 244 00:17:58,440 --> 00:18:02,070 so that entire left side of the tree is just going to be null. 245 00:18:02,070 --> 00:18:03,230 There's not going to be anything there. 246 00:18:03,230 --> 00:18:07,220 To the right, we could list things in ascending order. 247 00:18:07,220 --> 00:18:15,530 We could have 3, then 6, then 7, then 9. 248 00:18:15,530 --> 00:18:26,710 Or, we could do 3, then 6, then 9, then 7. 249 00:18:26,710 --> 00:18:35,020 Or, we could do 3, then 7, then 6, then 9. 250 00:18:35,020 --> 00:18:40,950 Or, 3, 7--actually no, we can't do a 7 anymore. 251 00:18:40,950 --> 00:18:43,330 That's our one thing there. 252 00:18:43,330 --> 00:18:54,710 We can do 9, and then from the 9 we can do 6 and then 7. 253 00:18:54,710 --> 00:19:06,980 Or, we can do 3, then 9, then 7, and then 6. 254 00:19:06,980 --> 00:19:12,490 >> One thing to draw your attention to here is 255 00:19:12,490 --> 00:19:14,510 that these trees are a little weird-looking. 256 00:19:14,510 --> 00:19:17,840 In particular, if we look at the 4 trees on the right side-- 257 00:19:17,840 --> 00:19:20,930 I'll circle them, here-- 258 00:19:20,930 --> 00:19:28,410 these trees look almost exactly like a linked list. 259 00:19:28,410 --> 00:19:32,670 Each node has only one child, 260 00:19:32,670 --> 00:19:38,950 and so we don't have any of this tree-like structure that we see, for example, 261 00:19:38,950 --> 00:19:44,720 in this one lone tree over here on the bottom left. 262 00:19:44,720 --> 00:19:52,490 These trees are actually called degenerate binary trees, 263 00:19:52,490 --> 00:19:54,170 and we'll talk about these more in the future-- 264 00:19:54,170 --> 00:19:56,730 especially if you go on to take other computer science courses. 265 00:19:56,730 --> 00:19:59,670 These trees are degenerate. 266 00:19:59,670 --> 00:20:03,780 They're not very useful because, indeed, this structure lends itself 267 00:20:03,780 --> 00:20:08,060 to lookup times similar to that of a linked list. 268 00:20:08,060 --> 00:20:13,050 We don't get to take advantage of the extra memory--this extra pointer-- 269 00:20:13,050 --> 00:20:18,840 because of our structure being bad in this way. 270 00:20:18,840 --> 00:20:24,700 Rather than go on and draw out the binary trees that have 9 at the root, 271 00:20:24,700 --> 00:20:27,220 which is the final case that we would have, 272 00:20:27,220 --> 00:20:32,380 we're instead, at this point, going to talk a little bit about this other term 273 00:20:32,380 --> 00:20:36,150 that we use when talking about trees, which is called the height. 274 00:20:36,150 --> 00:20:45,460 >> The height of a tree is the distance from the root to the most-distant node, 275 00:20:45,460 --> 00:20:48,370 or, rather, the number of hops that you would have to make in order to 276 00:20:48,370 --> 00:20:53,750 start from the root and then end up at the most-distant node in the tree. 277 00:20:53,750 --> 00:20:57,330 If we look at some of these trees that we've drawn right here, 278 00:20:57,330 --> 00:21:07,870 we can see that if we take this tree in the top left corner and we start at the 3, 279 00:21:07,870 --> 00:21:14,510 then we have to make one hop to get to the 6, a second hop to get to the 7, 280 00:21:14,510 --> 00:21:20,560 and a third hop to get to the 9. 281 00:21:20,560 --> 00:21:26,120 So, the height of this tree is 3. 282 00:21:26,120 --> 00:21:30,640 We can do the same exercise for the other trees outlined with this green, 283 00:21:30,640 --> 00:21:40,100 and we see that the height of all of these trees is also indeed 3. 284 00:21:40,100 --> 00:21:45,230 That's part of what makes them degenerate-- 285 00:21:45,230 --> 00:21:53,750 that their height is just one less than the number of nodes in the entire tree. 286 00:21:53,750 --> 00:21:58,400 If we look at this other tree that's encircled with red, on the other hand, 287 00:21:58,400 --> 00:22:03,920 we see that the most-distant leaf nodes are the 6 and the 9-- 288 00:22:03,920 --> 00:22:06,940 the leaves being those nodes that have no children. 289 00:22:06,940 --> 00:22:11,760 So, in order to get from the root node to either the 6 or the 9, 290 00:22:11,760 --> 00:22:17,840 we have to make one hop to get to the 7 and then a second hop to get to the 9, 291 00:22:17,840 --> 00:22:21,240 and likewise, only a second hop from the 7 to get to the 6. 292 00:22:21,240 --> 00:22:29,080 So, the height of this tree over here is only 2. 293 00:22:29,080 --> 00:22:35,330 You can go back and do that for all of the other trees that we previously discussed 294 00:22:35,330 --> 00:22:37,380 beginning with the 7 and the 6, 295 00:22:37,480 --> 00:22:42,500 and you'll find that the height of all of those trees is also 2. 296 00:22:42,500 --> 00:22:46,320 >> The reason we've been talking about ordered binary trees 297 00:22:46,320 --> 00:22:50,250 and why they're cool is because you can search through them in 298 00:22:50,250 --> 00:22:53,810 a very similar way to searching over a sorted array. 299 00:22:53,810 --> 00:22:58,720 This is where we talk about getting that improved lookup time 300 00:22:58,720 --> 00:23:02,730 over the simple linked list. 301 00:23:02,730 --> 00:23:05,010 With a linked list--if you want to find a particular element-- 302 00:23:05,010 --> 00:23:07,470 you're at best going to do some sort of linear search 303 00:23:07,470 --> 00:23:10,920 where you start at the beginning of a list and hop one-by-one-- 304 00:23:10,920 --> 00:23:12,620 one node by one node-- 305 00:23:12,620 --> 00:23:16,060 through the entire list until you find whatever you're searching for. 306 00:23:16,060 --> 00:23:19,440 Whereas, if you have a binary tree that's stored in this nice format, 307 00:23:19,440 --> 00:23:23,300 you can actually get more of a binary search going on 308 00:23:23,300 --> 00:23:25,160 where you divide and conquer 309 00:23:25,160 --> 00:23:29,490 and search through the appropriate half of the tree at each step. 310 00:23:29,490 --> 00:23:32,840 Let's see how that works. 311 00:23:32,840 --> 00:23:38,850 >> If we have--again, going back to our original tree-- 312 00:23:38,850 --> 00:23:46,710 we start at 7, we have 3 on the left, 9 on the right, 313 00:23:46,710 --> 00:23:51,740 and beneath the 3 we have the 6. 314 00:23:51,740 --> 00:24:01,880 If we want to search for the number 6 in this tree, we'd start at the root. 315 00:24:01,880 --> 00:24:08,910 We would compare the value we're looking for--say 6-- 316 00:24:08,910 --> 00:24:12,320 to the value stored in the node that we're currently looking at, 7, 317 00:24:12,320 --> 00:24:21,200 find that 6 is indeed less than 7, so we'd move to the left. 318 00:24:21,200 --> 00:24:25,530 If the value of 6 had been greater than 7, we would have instead moved to the right. 319 00:24:25,530 --> 00:24:29,770 Since we know that--due to the structure of our ordered binary tree-- 320 00:24:29,770 --> 00:24:33,910 all of the values less than 7 are going to be stored to the left of 7, 321 00:24:33,910 --> 00:24:40,520 there's no need to even bother looking through the right side of the tree. 322 00:24:40,520 --> 00:24:43,780 Once we move to the left and we're now at the node containing 3, 323 00:24:43,780 --> 00:24:48,110 we can do that same comparison again where we compare the 3 and the 6. 324 00:24:48,110 --> 00:24:52,430 And we find that while 6--the value we're looking for--is greater than 3, 325 00:24:52,430 --> 00:24:58,580 we can go to the right side of the node containing 3. 326 00:24:58,580 --> 00:25:02,670 There's no left side here, so we could have ignored that. 327 00:25:02,670 --> 00:25:06,510 But we only know that because we're looking at the tree itself, 328 00:25:06,510 --> 00:25:08,660 and we can see that the tree has no children. 329 00:25:08,660 --> 00:25:13,640 >> It's also pretty easy to look up 6 in this tree if we're doing it ourselves as humans, 330 00:25:13,640 --> 00:25:16,890 but let's follow this process mechanically like a computer would do 331 00:25:16,890 --> 00:25:18,620 to really understand the algorithm. 332 00:25:18,620 --> 00:25:26,200 At this point, we're now looking at a node that contains 6, 333 00:25:26,200 --> 00:25:29,180 and we're looking for the value 6, 334 00:25:29,180 --> 00:25:31,740 so, indeed, we've found the appropriate node. 335 00:25:31,740 --> 00:25:35,070 We found the value 6 in our tree, and we can stop our search. 336 00:25:35,070 --> 00:25:37,330 At this point, depending on what's going on, 337 00:25:37,330 --> 00:25:41,870 we can say yes, we have found the value 6, it exists in our tree. 338 00:25:41,870 --> 00:25:47,640 Or, if we're planning to insert a node or do something, we can do that at this point. 339 00:25:47,640 --> 00:25:53,010 >> Let's do a couple more lookups just to see how this works. 340 00:25:53,010 --> 00:25:59,390 Let's look at what happens if we were to try and look up the value 10. 341 00:25:59,390 --> 00:26:02,970 If we were to look up the value 10, we would start at the root. 342 00:26:02,970 --> 00:26:07,070 We'd see that 10 is greater than 7, so we'd move to the right. 343 00:26:07,070 --> 00:26:13,640 We'd get to the 9 and compare 9 to the 10 and see that 9 is indeed less than 10. 344 00:26:13,640 --> 00:26:16,210 So again, we'd try to move to the right. 345 00:26:16,210 --> 00:26:20,350 But at this point, we'd notice that we're at a null node. 346 00:26:20,350 --> 00:26:23,080 There's nothing there. There's nothing where the 10 should be. 347 00:26:23,080 --> 00:26:29,360 This is where we can report failure--that there is indeed no 10 in the tree. 348 00:26:29,360 --> 00:26:35,420 And finally, let's go through the case where we're trying to look up 1 in the tree. 349 00:26:35,420 --> 00:26:38,790 This is similar to what happens if we look up 10, except instead of going to the right, 350 00:26:38,790 --> 00:26:41,260 we're going to go to the left. 351 00:26:41,260 --> 00:26:46,170 We start at the 7 and see that 1 is less than 7, so we move to the left. 352 00:26:46,170 --> 00:26:51,750 We get to the 3 and see that 1 is less than 3, so again, we try to move to the left. 353 00:26:51,750 --> 00:26:59,080 At this point, we have a null node, so again we can report failure. 354 00:26:59,080 --> 00:27:10,260 >> If you do want to learn more about binary trees, 355 00:27:10,260 --> 00:27:14,420 there are a whole bunch of fun little problems that you can do with them. 356 00:27:14,420 --> 00:27:19,450 I suggest practicing the drawing out of these diagrams one-by-one 357 00:27:19,450 --> 00:27:21,910 and following through all of the different steps, 358 00:27:21,910 --> 00:27:25,060 because this will come in super-handy 359 00:27:25,060 --> 00:27:27,480 not only when you're doing the Huffman encoding problem set 360 00:27:27,480 --> 00:27:29,390 but also in future courses-- 361 00:27:29,390 --> 00:27:32,220 just learning how to draw out these data structures and think through the problems 362 00:27:32,220 --> 00:27:38,000 with pen and paper or, in this case, iPad and stylus. 363 00:27:38,000 --> 00:27:41,000 >> At this point, though, we're going to move on to do some coding practice 364 00:27:41,000 --> 00:27:44,870 and actually play with these binary trees and see. 365 00:27:44,870 --> 00:27:52,150 I'm going to switch back over to my computer. 366 00:27:52,150 --> 00:27:58,480 For this part of the section, instead of using CS50 Run or CS50 Spaces, 367 00:27:58,480 --> 00:28:01,500 I'm going to use the appliance. 368 00:28:01,500 --> 00:28:04,950 >> Following along with the problem set specification, 369 00:28:04,950 --> 00:28:07,740 I see that I'm supposed to open up the appliance, 370 00:28:07,740 --> 00:28:11,020 go to my Dropbox folder, create a folder called Section 7, 371 00:28:11,020 --> 00:28:15,730 and then create a file called binary_tree.c. 372 00:28:15,730 --> 00:28:22,050 Here we go. I've got the appliance already open. 373 00:28:22,050 --> 00:28:25,910 I'm going pull up a terminal. 374 00:28:25,910 --> 00:28:38,250 I'm going to go to the Dropbox folder, make a directory called section7, 375 00:28:38,250 --> 00:28:42,230 and see it's totally empty. 376 00:28:42,230 --> 00:28:48,860 Now I'm going to open up binary_tree.c. 377 00:28:48,860 --> 00:28:51,750 Alright. Here we go--empty file. 378 00:28:51,750 --> 00:28:54,330 >> Let's go back to the specification. 379 00:28:54,330 --> 00:28:59,850 The specification asks to create a new type definition 380 00:28:59,850 --> 00:29:03,080 for a binary tree node containing int values-- 381 00:29:03,080 --> 00:29:07,110 just like the values that we drew out in our diagramming before. 382 00:29:07,110 --> 00:29:11,740 We're going to use this boilerplate typedef that we've done right here 383 00:29:11,740 --> 00:29:14,420 that you should recognize from Problem Set 5-- 384 00:29:14,420 --> 00:29:19,190 if you did the hash table way of conquering the speller program. 385 00:29:19,190 --> 00:29:22,540 You should also recognize it from last week's section 386 00:29:22,540 --> 00:29:23,890 where we talked about linked lists. 387 00:29:23,890 --> 00:29:27,870 We've got this typedef of a struct node, 388 00:29:27,870 --> 00:29:34,430 and we've given this struct node this name of struct node beforehand 389 00:29:34,430 --> 00:29:39,350 so that we can then refer to it since we'll want to have struct node pointers 390 00:29:39,350 --> 00:29:45,740 as part of our struct, but we've then encircled this-- 391 00:29:45,740 --> 00:29:47,700 or rather, enclosed this--in a typedef 392 00:29:47,700 --> 00:29:54,600 so that, later in the code, we can refer to this struct as just a node instead of a struct node. 393 00:29:54,600 --> 00:30:03,120 >> This is going to be very similar to the singly-linked list definition that we saw last week. 394 00:30:03,120 --> 00:30:20,070 To do this, let's just start by writing out the boilerplate. 395 00:30:20,070 --> 00:30:23,840 We know that we have to have an integer value, 396 00:30:23,840 --> 00:30:32,170 so we'll put in int value, and then instead of having just one pointer to the next element-- 397 00:30:32,170 --> 00:30:33,970 as we did with singly-linked lists-- 398 00:30:33,970 --> 00:30:38,110 we're going to have left and right child pointers. 399 00:30:38,110 --> 00:30:42,880 That's pretty simple too--struct node* left_child; 400 00:30:42,880 --> 00:30:51,190 and struct node* right_child;. Cool. 401 00:30:51,190 --> 00:30:54,740 That looks like a pretty good start. 402 00:30:54,740 --> 00:30:58,530 Let's go back to the specification. 403 00:30:58,530 --> 00:31:05,030 >> Now we need to declare a global node* variable for the root of a tree. 404 00:31:05,030 --> 00:31:10,590 We're going to make this global just like we made first pointer in our linked list also global. 405 00:31:10,590 --> 00:31:12,690 This was so that in the functions that we write, 406 00:31:12,690 --> 00:31:16,180 we don't have to keep passing around this root-- 407 00:31:16,180 --> 00:31:19,620 though we'll see that if you do want to write these functions recursively, 408 00:31:19,620 --> 00:31:22,830 it might be better to not even pass it around as a global in the first place 409 00:31:22,830 --> 00:31:28,090 and instead initialize it locally in your main function. 410 00:31:28,090 --> 00:31:31,960 But, we'll do it globally to start. 411 00:31:31,960 --> 00:31:39,920 Again, we'll give a couple of spaces, and I'm going to declare a node* root. 412 00:31:39,920 --> 00:31:46,770 Just to make sure that I don't leave this uninitialized, I'm going to set it equal to null. 413 00:31:46,770 --> 00:31:52,210 Now, in the main function--which we'll write really quickly right here-- 414 00:31:52,210 --> 00:32:00,450 int main(int argc, const char* argv[ ])-- 415 00:32:00,450 --> 00:32:10,640 and I'm going to start declaring my argv array as const just so that I know 416 00:32:10,640 --> 00:32:14,550 that those arguments are arguments that I probably don't want to modify. 417 00:32:14,550 --> 00:32:18,390 If I want to modify them, I should probably be making copies of them. 418 00:32:18,390 --> 00:32:21,740 You'll see this a lot in code. 419 00:32:21,740 --> 00:32:25,440 It's fine either way. It's fine to leave it as--omit the const if you'd like. 420 00:32:25,440 --> 00:32:28,630 I typically put it in just so that I remind myself 421 00:32:28,630 --> 00:32:33,650 that I probably don't want to modify those arguments. 422 00:32:33,650 --> 00:32:39,240 >> As always, I'm going to include this return 0 line at the end of main. 423 00:32:39,240 --> 00:32:45,730 Here, I will initialize my root node. 424 00:32:45,730 --> 00:32:48,900 As it stands right now, I've got a pointer that's set to null, 425 00:32:48,900 --> 00:32:52,970 so it's pointing at nothing. 426 00:32:52,970 --> 00:32:57,480 In order to actually start building the node, 427 00:32:57,480 --> 00:32:59,250 I first need to allocate memory for it. 428 00:32:59,250 --> 00:33:05,910 I'm going to do that by making memory on the heap using malloc. 429 00:33:05,910 --> 00:33:10,660 I'm going to set root equal to the result of malloc, 430 00:33:10,660 --> 00:33:19,550 and I'm going to use the sizeof operator to calculate the size of a node. 431 00:33:19,550 --> 00:33:24,990 The reason that I use sizeof node as opposed to, say, 432 00:33:24,990 --> 00:33:37,020 doing something like this--malloc(4 + 4 +4) or malloc 12-- 433 00:33:37,020 --> 00:33:40,820 is because I want my code to be as compatible as possible. 434 00:33:40,820 --> 00:33:44,540 I want to be able to take this .c file, compile it on the appliance, 435 00:33:44,540 --> 00:33:48,820 and then compile it on my 64-bit Mac-- 436 00:33:48,820 --> 00:33:52,040 or on a completely different architecture-- 437 00:33:52,040 --> 00:33:54,640 and I want this all to work the same. 438 00:33:54,640 --> 00:33:59,510 >> If I'm making assumptions about the size of variables-- 439 00:33:59,510 --> 00:34:02,070 the size of an int or the size of a pointer-- 440 00:34:02,070 --> 00:34:06,070 then I'm also making assumptions about the kinds of architectures 441 00:34:06,070 --> 00:34:10,440 on which my code can successfully compile when run. 442 00:34:10,440 --> 00:34:15,030 Always use sizeof as opposed to manually summing the struct fields. 443 00:34:15,030 --> 00:34:20,500 The other reason is that there might also be padding that the compiler puts into the struct. 444 00:34:20,500 --> 00:34:26,570 Even just summing the individual fields is not something that you typically want to do, 445 00:34:26,570 --> 00:34:30,340 so, delete that line. 446 00:34:30,340 --> 00:34:33,090 Now, to really initialize this root node, 447 00:34:33,090 --> 00:34:36,489 I'm going to have to plug in values for each of its different fields. 448 00:34:36,489 --> 00:34:41,400 For example, for value I know I want to initialize to 7, 449 00:34:41,400 --> 00:34:46,920 and for now I'm going to set the left child to be null 450 00:34:46,920 --> 00:34:55,820 and the right child to also be null. Great! 451 00:34:55,820 --> 00:35:02,670 We've done that part of the spec. 452 00:35:02,670 --> 00:35:07,390 >> The specification down at the bottom of page 3 asks me to create three more nodes-- 453 00:35:07,390 --> 00:35:10,600 one containing 3, one containing 6, and one with 9-- 454 00:35:10,600 --> 00:35:14,210 and then wire them up so it looks exactly like our tree diagram 455 00:35:14,210 --> 00:35:17,120 that we were talking about previously. 456 00:35:17,120 --> 00:35:20,450 Let's do that pretty quickly here. 457 00:35:20,450 --> 00:35:26,270 You'll see really quickly that I'm going to start writing a bunch of duplicate code. 458 00:35:26,270 --> 00:35:32,100 I'm going to create a node* and I'm going to call it three. 459 00:35:32,100 --> 00:35:36,000 I'm going to set it equal to malloc(sizeof(node)). 460 00:35:36,000 --> 00:35:41,070 I'm going to set three->value = 3. 461 00:35:41,070 --> 00:35:54,780 Three->left_child = NULL; three->right _child = NULL; as well. 462 00:35:54,780 --> 00:36:01,150 That looks pretty similar to initializing the root, and that's exactly what 463 00:36:01,150 --> 00:36:05,760 I'm going to have to do if I start initializing 6 and 9 as well. 464 00:36:05,760 --> 00:36:20,720 I'll do that really quickly here--actually, I'm going to do a little copy and paste, 465 00:36:20,720 --> 00:36:46,140 and make sure that I--alright. 466 00:36:46,470 --> 00:37:09,900 Now, I've got it copied and I can go ahead and set this equal to 6. 467 00:37:09,900 --> 00:37:14,670 You can see that this takes a while and isn't super-efficient. 468 00:37:14,670 --> 00:37:22,610 In just a little bit, we'll write a function that will do this for us. 469 00:37:22,610 --> 00:37:32,890 I want to replace this with a 9, replace that with a 6. 470 00:37:32,890 --> 00:37:37,360 >> Now we've got all of our nodes created and initialized. 471 00:37:37,360 --> 00:37:41,200 We've got our root set equal to 7, or containing the value 7, 472 00:37:41,200 --> 00:37:46,510 our node containing 3, our node containing 6, and our node containing 9. 473 00:37:46,510 --> 00:37:50,390 At this point, all we have to do is wire everything up. 474 00:37:50,390 --> 00:37:53,020 The reason I initialized all the pointers to null is just so that I make sure that 475 00:37:53,020 --> 00:37:56,260 I don't have any uninitialized pointers in there by accident. 476 00:37:56,260 --> 00:38:02,290 And also since, at this point, I only have to actually connect the nodes to each other-- 477 00:38:02,290 --> 00:38:04,750 to the ones that they're actually connected to--I don't have to go through and make 478 00:38:04,750 --> 00:38:08,240 sure that all the nulls are in there in the appropriate places. 479 00:38:08,240 --> 00:38:15,630 >> Starting at the root, I know that the root's left child is 3. 480 00:38:15,630 --> 00:38:21,250 I know that the root's right child is 9. 481 00:38:21,250 --> 00:38:24,880 After that, the only other child that I have left to worry about 482 00:38:24,880 --> 00:38:39,080 is 3's right child, which is 6. 483 00:38:39,080 --> 00:38:44,670 At this point, it all looks pretty good. 484 00:38:44,670 --> 00:38:54,210 We'll delete some of these lines. 485 00:38:54,210 --> 00:38:59,540 Now everything looks pretty good. 486 00:38:59,540 --> 00:39:04,240 Let's go back to our specification and see what we have to do next. 487 00:39:04,240 --> 00:39:07,610 >> At this point, we have to write a function called "contains", 488 00:39:07,610 --> 00:39:14,150 with a prototype of "bool contains(int value)". 489 00:39:14,150 --> 00:39:17,060 And this contains function is going to return true 490 00:39:17,060 --> 00:39:21,200 if the tree pointed to by our global root variable 491 00:39:21,200 --> 00:39:26,950 contains the value passed into the function and false otherwise. 492 00:39:26,950 --> 00:39:29,000 Let's go ahead and do that. 493 00:39:29,000 --> 00:39:35,380 This is going to be exactly like the lookup that we did by hand on the iPad just a little bit ago. 494 00:39:35,380 --> 00:39:40,130 Let's zoom back in a little bit and scroll up. 495 00:39:40,130 --> 00:39:43,130 We're going to put this function right above our main function 496 00:39:43,130 --> 00:39:48,990 so that we don't have to do any sort of prototyping. 497 00:39:48,990 --> 00:39:55,960 So, bool contains(int value). 498 00:39:55,960 --> 00:40:00,330 There we go. There's our boilerplate declaration. 499 00:40:00,330 --> 00:40:02,900 Just to make sure that this will compile, 500 00:40:02,900 --> 00:40:06,820 I'm going to go ahead and just set it equal to return false. 501 00:40:06,820 --> 00:40:09,980 Right now, this function just won't do anything and always report that 502 00:40:09,980 --> 00:40:14,010 the value that we're looking for is not in the tree. 503 00:40:14,010 --> 00:40:16,280 >> At this point, it's probably a good idea-- 504 00:40:16,280 --> 00:40:19,600 since we've written a whole bunch of code and we haven't even tried testing it yet-- 505 00:40:19,600 --> 00:40:22,590 to make sure that it all compiles. 506 00:40:22,590 --> 00:40:27,460 There are a couple of things that we have to do to make sure that this will actually compile. 507 00:40:27,460 --> 00:40:33,530 First, see if we've been using any functions in any libraries that we haven't yet included. 508 00:40:33,530 --> 00:40:37,940 The functions we've used so far are malloc, 509 00:40:37,940 --> 00:40:43,310 and then we've also been using this type--this nonstandard type called "bool"-- 510 00:40:43,310 --> 00:40:45,750 which is included in the standard bool header file. 511 00:40:45,750 --> 00:40:53,250 We definitely want to include stdbool.h for the bool type, 512 00:40:53,250 --> 00:40:59,230 and we also want to #include stdlib.h for the standard libraries 513 00:40:59,230 --> 00:41:03,530 that include malloc and free and all of that. 514 00:41:03,530 --> 00:41:08,660 So, zoom out, we're going to quit. 515 00:41:08,660 --> 00:41:14,190 Let's try and make sure that this actually did compile. 516 00:41:14,190 --> 00:41:18,150 We see that it does, so we're on the right track. 517 00:41:18,150 --> 00:41:22,990 >> Let's open up binary_tree.c again. 518 00:41:22,990 --> 00:41:34,530 It compiles. Let's go down and make sure that we insert some calls to our contains function-- 519 00:41:34,530 --> 00:41:40,130 just to make sure that that's all well and good. 520 00:41:40,130 --> 00:41:43,170 For example, when we did some lookups in our tree earlier, 521 00:41:43,170 --> 00:41:48,500 we tried to look up the values 6, 10, and 1, and we knew that 6 was in the tree, 522 00:41:48,500 --> 00:41:52,220 10 was not in the tree, and 1 was not in the tree either. 523 00:41:52,220 --> 00:41:57,230 Let's use those sample calls as a way to figure out whether or not 524 00:41:57,230 --> 00:41:59,880 our contains function is working. 525 00:41:59,880 --> 00:42:05,210 In order to do that, I'm going to use the printf function, 526 00:42:05,210 --> 00:42:10,280 and we're going to print out the result of the call to contains. 527 00:42:10,280 --> 00:42:13,280 I'm going to put in a string "contains(%d) = because 528 00:42:13,280 --> 00:42:20,470 we're going to plug in the value that we're going to look for, 529 00:42:20,470 --> 00:42:27,130 and = %s\n" and use that as our format string. 530 00:42:27,130 --> 00:42:30,720 We're just going to see--literally print out on the screen-- 531 00:42:30,720 --> 00:42:32,060 what looks like a function call. 532 00:42:32,060 --> 00:42:33,580 This isn't actually the function call. 533 00:42:33,580 --> 00:42:36,760 This is just a string designed to look like a function call. 534 00:42:36,760 --> 00:42:41,140 >> Now, we're going to plug in the values. 535 00:42:41,140 --> 00:42:43,580 We're going to try contains on 6, 536 00:42:43,580 --> 00:42:48,340 and then what we're going to do here is use that ternary operator. 537 00:42:48,340 --> 00:42:56,340 Let's see--contains(6)--so, now I've contained 6 and if contains(6) is true, 538 00:42:56,340 --> 00:43:01,850 the string that we're going to send to the %s format character 539 00:43:01,850 --> 00:43:04,850 is going to be the string "true". 540 00:43:04,850 --> 00:43:07,690 Let's scroll over a little bit. 541 00:43:07,690 --> 00:43:16,210 Otherwise, we want to send the string "false" if contains(6) returns false. 542 00:43:16,210 --> 00:43:19,730 This is a little goofy-looking, but I figure I might as well illustrate 543 00:43:19,730 --> 00:43:23,780 what the ternary operator looks like since we haven't seen it for a while. 544 00:43:23,780 --> 00:43:27,670 This will be a nice, handy way to figure out if our contains function is working. 545 00:43:27,670 --> 00:43:30,040 I'm going to scroll back over to the left, 546 00:43:30,040 --> 00:43:39,900 and I'm going to copy and paste this line a few times. 547 00:43:39,900 --> 00:43:44,910 It changed some of these values around, 548 00:43:44,910 --> 00:43:59,380 so this is going to be 1, and this is going to be 10. 549 00:43:59,380 --> 00:44:02,480 >> At this point, we've got a nice contains function. 550 00:44:02,480 --> 00:44:06,080 We've got some tests, and we'll see if this all works. 551 00:44:06,080 --> 00:44:08,120 At this point, we've written some more code. 552 00:44:08,120 --> 00:44:13,160 Time to quit out and make sure that everything still compiles. 553 00:44:13,160 --> 00:44:20,360 Quit out, and now let's try making binary tree again. 554 00:44:20,360 --> 00:44:22,260 Well, it looks like we've got an error, 555 00:44:22,260 --> 00:44:26,930 and we've got this explicitly declaring the library function printf. 556 00:44:26,930 --> 00:44:39,350 It looks like we need to go in and #include stdio.h. 557 00:44:39,350 --> 00:44:45,350 And with that, everything should compile. We're all good. 558 00:44:45,350 --> 00:44:50,420 Now let's try running binary tree and see what happens. 559 00:44:50,420 --> 00:44:53,520 Here we are, ./binary_tree, 560 00:44:53,520 --> 00:44:55,190 and we see that, as we expected-- 561 00:44:55,190 --> 00:44:56,910 because we haven't implemented contains yet, 562 00:44:56,910 --> 00:44:59,800 or rather, we've just put in return false-- 563 00:44:59,800 --> 00:45:03,300 we see that it's just returning false for all of them, 564 00:45:03,300 --> 00:45:06,180 so that's all working for the most part fairly well. 565 00:45:06,180 --> 00:45:11,860 >> Let's go back in and actually implement contains at this point. 566 00:45:11,860 --> 00:45:17,490 I'm going to scroll down, zoom in, and-- 567 00:45:17,490 --> 00:45:22,330 remember, the algorithm that we used was that we started at the root node 568 00:45:22,330 --> 00:45:28,010 and then at each node that we encounter, we do a comparison, 569 00:45:28,010 --> 00:45:32,380 and based on that comparison, we either move to the left child or to the right child. 570 00:45:32,380 --> 00:45:39,670 This is going to look very similar to the binary search code that we wrote earlier in the term. 571 00:45:39,670 --> 00:45:47,810 When we start off, we know that we want to hold on to the current node 572 00:45:47,810 --> 00:45:54,050 that we're looking at, and the current node is going to be initialized to the root node. 573 00:45:54,050 --> 00:45:56,260 And now, we're going to keep going through the tree, 574 00:45:56,260 --> 00:45:58,140 and remember that our stopping condition-- 575 00:45:58,140 --> 00:46:01,870 when we actually worked through the example by hand-- 576 00:46:01,870 --> 00:46:03,960 was when we encountered a null node, 577 00:46:03,960 --> 00:46:05,480 not when we looked at a null child, 578 00:46:05,480 --> 00:46:09,620 but rather when we actually moved to a node in the tree 579 00:46:09,620 --> 00:46:12,640 and found that we're at a null node. 580 00:46:12,640 --> 00:46:20,720 We're going to iterate until cur is not equal to null. 581 00:46:20,720 --> 00:46:22,920 And what are we going to do? 582 00:46:22,920 --> 00:46:31,610 We're going to test if (cur->value == value), 583 00:46:31,610 --> 00:46:35,160 then we know that we've actually found the node that we're looking for. 584 00:46:35,160 --> 00:46:40,450 So here, we can return true. 585 00:46:40,450 --> 00:46:49,830 Otherwise, we want to see whether or not the value is less than the value. 586 00:46:49,830 --> 00:46:53,850 If the current node's value is less than the value we're looking for, 587 00:46:53,850 --> 00:46:57,280 we're going to move to the right. 588 00:46:57,280 --> 00:47:10,600 So, cur = cur->right_child; and otherwise, we're going to move to the left. 589 00:47:10,600 --> 00:47:17,480 cur = cur->left_child;. Pretty simple. 590 00:47:17,480 --> 00:47:22,830 >> You probably recognize the loop that looks very similar to this from 591 00:47:22,830 --> 00:47:27,580 binary search earlier in the term, except then we were dealing with low, mid, and high. 592 00:47:27,580 --> 00:47:30,000 Here, we just have to look at a current value, 593 00:47:30,000 --> 00:47:31,930 so it's nice and simple. 594 00:47:31,930 --> 00:47:34,960 Let's make sure this code is working. 595 00:47:34,960 --> 00:47:42,780 First, make sure it compiles. Looks like it does. 596 00:47:42,780 --> 00:47:47,920 Let's try running it. 597 00:47:47,920 --> 00:47:50,160 And indeed, it prints out everything that we expected. 598 00:47:50,160 --> 00:47:54,320 It finds 6 in the tree, doesn't find 10 because 10's not in the tree, 599 00:47:54,320 --> 00:47:57,740 and doesn't find 1 either because 1's also not in the tree. 600 00:47:57,740 --> 00:48:01,420 Cool stuff. 601 00:48:01,420 --> 00:48:04,470 >> All rright. Let's go back to our specification and see what's next. 602 00:48:04,470 --> 00:48:07,990 Now, it wants to add some more nodes to our tree. 603 00:48:07,990 --> 00:48:11,690 It wants to add 5, 2, and 8, and make sure that our contains code 604 00:48:11,690 --> 00:48:13,570 still works as expected. 605 00:48:13,570 --> 00:48:14,900 Let's go do that. 606 00:48:14,900 --> 00:48:19,430 At this point, rather than doing that annoying copy and paste again, 607 00:48:19,430 --> 00:48:23,770 let's write a function to actually create a node. 608 00:48:23,770 --> 00:48:27,740 If we scroll down all the way to main, we see that we've been doing this 609 00:48:27,740 --> 00:48:31,210 very similar code over and over again each time that we want to create a node. 610 00:48:31,210 --> 00:48:39,540 >> Let's write a function that will actually build a node for us and return it. 611 00:48:39,540 --> 00:48:41,960 I'm going to call it build_node. 612 00:48:41,960 --> 00:48:45,130 I'm going to build a node with a particular value. 613 00:48:45,130 --> 00:48:51,040 Zoom in here. 614 00:48:51,040 --> 00:48:56,600 The first thing I'm going to do is actually create space for the node on the heap. 615 00:48:56,600 --> 00:49:05,400 So, node* n = malloc(sizeof(node)); n->value = value; 616 00:49:05,400 --> 00:49:14,960 and then here, I'm just going to initialize all of the fields to be appropriate values. 617 00:49:14,960 --> 00:49:22,500 And at the very end, we'll return our node. 618 00:49:22,500 --> 00:49:28,690 All right. One thing to note is that this function right here 619 00:49:28,690 --> 00:49:34,320 is going to return a pointer to memory that has been heap-allocated. 620 00:49:34,320 --> 00:49:38,880 What's nice about this is that this node now-- 621 00:49:38,880 --> 00:49:42,420 we have to declare it on the heap because if we declared it on the stack, 622 00:49:42,420 --> 00:49:45,050 we wouldn't be able to do it in this function like this. 623 00:49:45,050 --> 00:49:47,690 That memory would go out of scope 624 00:49:47,690 --> 00:49:51,590 and would be invalid if we tried to access it later on. 625 00:49:51,590 --> 00:49:53,500 Since we are declaring heap-allocated memory, 626 00:49:53,500 --> 00:49:55,830 we are going to have to take care of freeing it later 627 00:49:55,830 --> 00:49:58,530 for our program to not leak any memory. 628 00:49:58,530 --> 00:50:01,270 We've been punting on that for everything else in the code 629 00:50:01,270 --> 00:50:02,880 just for simplicity's sake at the time, 630 00:50:02,880 --> 00:50:05,610 but if you ever write a function that looks like this 631 00:50:05,610 --> 00:50:10,370 where you've got--some call it a malloc or realloc inside-- 632 00:50:10,370 --> 00:50:14,330 you want to make sure that you put some sort of comment up here that says, 633 00:50:14,330 --> 00:50:29,970 hey, you know, returns a heap-allocated node initialized with the passed-in value. 634 00:50:29,970 --> 00:50:33,600 And then you want to make sure that you put in some sort of note that says 635 00:50:33,600 --> 00:50:41,720 the caller must free the returned memory. 636 00:50:41,720 --> 00:50:45,450 That way, if somebody ever goes and uses that function, 637 00:50:45,450 --> 00:50:48,150 they know that whatever they get back from that function 638 00:50:48,150 --> 00:50:50,870 at some point will need to be freed. 639 00:50:50,870 --> 00:50:53,940 >> Assuming that all is well and good here, 640 00:50:53,940 --> 00:51:02,300 we can go down into our code and replace all of these lines right here 641 00:51:02,300 --> 00:51:05,410 with calls to our build node function. 642 00:51:05,410 --> 00:51:08,170 Let's do that really quickly. 643 00:51:08,170 --> 00:51:15,840 The one part that we're not going to replace is this part down here 644 00:51:15,840 --> 00:51:18,520 at the bottom where we actually wire up the nodes to point to each other, 645 00:51:18,520 --> 00:51:21,030 because that we can't do in our function. 646 00:51:21,030 --> 00:51:37,400 But, let's do root = build_node(7); node* three = build_node(3); 647 00:51:37,400 --> 00:51:47,980 node* six = build_node(6); node* nine = build_node(9);. 648 00:51:47,980 --> 00:51:52,590 And now, we also wanted to add nodes for-- 649 00:51:52,590 --> 00:52:03,530 node* five = build_node(5); node* eight = build_node(8); 650 00:52:03,530 --> 00:52:09,760 and what was the other node? Let's see here. We wanted to also add a 2-- 651 00:52:09,760 --> 00:52:20,280 node* two = build_node(2);. 652 00:52:20,280 --> 00:52:26,850 All right. At this point, we know that we've got the 7, the 3, the 9, and the 6 653 00:52:26,850 --> 00:52:30,320 all wired up appropriately, but what about the 5, the 8, and the 2? 654 00:52:30,320 --> 00:52:33,550 To keep everything in the appropriate order, 655 00:52:33,550 --> 00:52:39,230 we know that three's right child is 6. 656 00:52:39,230 --> 00:52:40,890 So, if we're going to add the 5, 657 00:52:40,890 --> 00:52:46,670 the 5 also belongs in the right side of the tree of which 3 is the root, 658 00:52:46,670 --> 00:52:50,440 so 5 belongs as the left child of 6. 659 00:52:50,440 --> 00:52:58,650 We can do this by saying, six->left_child = five; 660 00:52:58,650 --> 00:53:10,790 and then the 8 belongs as the left child of 9, so nine->left_child = eight; 661 00:53:10,790 --> 00:53:22,190 and then 2 is the left child of 3, so we can do that up here--three->left_child = two;. 662 00:53:22,190 --> 00:53:27,730 If you didn't quite follow along with that, I suggest you draw it out yourself. 663 00:53:27,730 --> 00:53:35,660 >> All right. Let's save this. Let's go out and make sure it compiles, 664 00:53:35,660 --> 00:53:40,760 and then we can add in our contains calls. 665 00:53:40,760 --> 00:53:44,120 Looks like everything still compiles. 666 00:53:44,120 --> 00:53:51,790 Let's go in and add in some contains calls. 667 00:53:51,790 --> 00:53:59,640 Again, I'm going to do a little bit of copy and paste. 668 00:53:59,640 --> 00:54:15,860 Now let's search for 5, 8, and 2. Alright. 669 00:54:15,860 --> 00:54:28,330 Let's make sure that this all looks good still. Great! Save and quit. 670 00:54:28,330 --> 00:54:33,220 Now let's make, compile, and now let's run. 671 00:54:33,220 --> 00:54:37,540 From the results, it looks like everything is working just nice and well. 672 00:54:37,540 --> 00:54:41,780 Great! So now we've got our contains function written. 673 00:54:41,780 --> 00:54:46,160 Let's move on and start working on how to insert nodes into the tree 674 00:54:46,160 --> 00:54:50,000 since, as we're doing it right now, things aren't very pretty. 675 00:54:50,000 --> 00:54:52,280 >> If we go back to the specification, 676 00:54:52,280 --> 00:55:00,540 it asks us to write a function called insert--again, returning a bool 677 00:55:00,540 --> 00:55:04,400 for whether or not we could actually insert the node into the tree-- 678 00:55:04,400 --> 00:55:07,710 and then the value to insert into the tree is specified as 679 00:55:07,710 --> 00:55:11,060 the only argument to our insert function. 680 00:55:11,060 --> 00:55:18,180 We will return true if we could indeed insert the node containing value into the tree, 681 00:55:18,180 --> 00:55:20,930 which means that we, one, had enough memory, 682 00:55:20,930 --> 00:55:24,840 and then, two, that node didn't already exist in the tree since-- 683 00:55:24,840 --> 00:55:32,170 remember, we are not going to have duplicate values in the tree, just to make things simple. 684 00:55:32,170 --> 00:55:35,590 All right. Back to the code. 685 00:55:35,590 --> 00:55:44,240 Open it up. Zoom in a bit, then scroll down. 686 00:55:44,240 --> 00:55:47,220 Let's put the insert function right above the contains. 687 00:55:47,220 --> 00:55:56,360 Again, it's going to be called bool insert(int value). 688 00:55:56,360 --> 00:56:01,840 Give it a little more space, and then, as a default, 689 00:56:01,840 --> 00:56:08,870 let's put in return false at the very end. 690 00:56:08,870 --> 00:56:22,620 Now down at the bottom, let's go ahead and instead of manually building the nodes 691 00:56:22,620 --> 00:56:27,900 in main ourselves and wiring them up to point to each other like we're doing, 692 00:56:27,900 --> 00:56:30,600 we'll rely on our insert function to do that. 693 00:56:30,600 --> 00:56:35,510 We won't rely on our insert function to build the entire tree from scratch just yet, 694 00:56:35,510 --> 00:56:39,970 but rather, we'll get rid of these lines--we'll comment out these lines-- 695 00:56:39,970 --> 00:56:43,430 that build the nodes 5, 8, and 2. 696 00:56:43,430 --> 00:56:55,740 And instead, we'll insert calls to our insert function 697 00:56:55,740 --> 00:57:01,280 to make sure that that actually works. 698 00:57:01,280 --> 00:57:05,840 Here we go. 699 00:57:05,840 --> 00:57:09,300 >> Now we've commented out these lines. 700 00:57:09,300 --> 00:57:13,700 We only have 7, 3, 9, and 6 in our tree at this point. 701 00:57:13,700 --> 00:57:18,870 To make sure that this is all working, 702 00:57:18,870 --> 00:57:25,050 we can zoom out, make our binary tree, 703 00:57:25,050 --> 00:57:30,750 run it, and we can see that contains is now telling us that we're totally right-- 704 00:57:30,750 --> 00:57:33,110 5, 8, and 2 are no longer in the tree. 705 00:57:33,110 --> 00:57:37,960 Go back into the code, 706 00:57:37,960 --> 00:57:41,070 and how are we going to insert? 707 00:57:41,070 --> 00:57:46,290 Remember what we did when we were actually inserting 5, 8, and 2 previously. 708 00:57:46,290 --> 00:57:50,100 We played that Plinko game where we started at the root, 709 00:57:50,100 --> 00:57:52,780 went down the tree one by one by one 710 00:57:52,780 --> 00:57:54,980 until we found the appropriate gap, 711 00:57:54,980 --> 00:57:57,570 and then we wired in the node at the appropriate spot. 712 00:57:57,570 --> 00:57:59,480 We're going to do the same thing. 713 00:57:59,480 --> 00:58:04,070 This is basically like writing the code that we used in the contains function 714 00:58:04,070 --> 00:58:05,910 to find the spot where the node should be, 715 00:58:05,910 --> 00:58:10,560 and then we're just going to insert the node right there. 716 00:58:10,560 --> 00:58:17,000 Let's start doing that. 717 00:58:17,000 --> 00:58:24,200 >> So we have a node* cur = root; we're just going to follow the contains code 718 00:58:24,200 --> 00:58:26,850 until we find that it doesn't quite work for us. 719 00:58:26,850 --> 00:58:32,390 We're going to go through the tree while the current element is not null, 720 00:58:32,390 --> 00:58:45,280 and if we find that cur's value is equal to the value that we're trying to insert-- 721 00:58:45,280 --> 00:58:49,600 well, this is one of the cases in which we couldn't actually insert the node 722 00:58:49,600 --> 00:58:52,730 into the tree because this means we have a duplicate value. 723 00:58:52,730 --> 00:58:59,010 Here, we're actually going to return false. 724 00:58:59,010 --> 00:59:08,440 Now, else if cur's value is less than value, 725 00:59:08,440 --> 00:59:10,930 now we know that we move to the right 726 00:59:10,930 --> 00:59:17,190 because the value belongs in the right half of the cur tree. 727 00:59:17,190 --> 00:59:30,110 Otherwise, we're going to move to the left. 728 00:59:30,110 --> 00:59:37,980 That's basically our contains function right there. 729 00:59:37,980 --> 00:59:41,820 >> At this point, once we've completed this while loop, 730 00:59:41,820 --> 00:59:47,350 our cur pointer is going to be pointing to null 731 00:59:47,350 --> 00:59:51,540 if the function hasn't already returned. 732 00:59:51,540 --> 00:59:58,710 We're therefore having cur at the spot where we want to insert the new node. 733 00:59:58,710 --> 01:00:05,210 What remains to be done is to actually build the new node, 734 01:00:05,210 --> 01:00:08,480 which we can do pretty easily. 735 01:00:08,480 --> 01:00:14,930 We can use our super-handy build_node function, 736 01:00:14,930 --> 01:00:17,210 and something that we didn't do earlier-- 737 01:00:17,210 --> 01:00:21,400 we just kind of took for granted, but now we'll do just to make sure-- 738 01:00:21,400 --> 01:00:27,130 we'll test to make sure that the value returned by new_node was actually 739 01:00:27,130 --> 01:00:33,410 not null, because we don't want to start accessing that memory if it is null. 740 01:00:33,410 --> 01:00:39,910 We can test to make sure that new_node is not equal to null. 741 01:00:39,910 --> 01:00:42,910 Or instead, we can just see if it actually is null, 742 01:00:42,910 --> 01:00:52,120 and if it is null, then we can just return false early. 743 01:00:52,120 --> 01:00:59,090 >> At this point, we have to wire new-node into its appropriate spot in the tree. 744 01:00:59,090 --> 01:01:03,510 If we look back at main and where we were actually wiring in values before, 745 01:01:03,510 --> 01:01:08,470 we see that the way we were doing it when we wanted to put 3 in the tree 746 01:01:08,470 --> 01:01:11,750 was we accessed the left child of root. 747 01:01:11,750 --> 01:01:14,920 When we put 9 in the tree, we had to access the right child of the root. 748 01:01:14,920 --> 01:01:21,030 We had to have a pointer to the parent in order to put a new value into the tree. 749 01:01:21,030 --> 01:01:24,430 Scrolling back up to insert, that's not going to quite work here 750 01:01:24,430 --> 01:01:27,550 because we don't have a parent pointer. 751 01:01:27,550 --> 01:01:31,650 What we want to be able to do is, at this point, 752 01:01:31,650 --> 01:01:37,080 check the parent's value and see--well, gosh, 753 01:01:37,080 --> 01:01:41,990 if the parent's value is less than the current value, 754 01:01:41,990 --> 01:01:54,440 then the parent's right child should be the new node; 755 01:01:54,440 --> 01:02:07,280 otherwise, the parent's left child should be the new node. 756 01:02:07,280 --> 01:02:10,290 But we don't have this parent pointer quite yet. 757 01:02:10,290 --> 01:02:15,010 >> In order to get it, we're actually going to have to track it as we go through the tree 758 01:02:15,010 --> 01:02:18,440 and find the appropriate spot in our loop above. 759 01:02:18,440 --> 01:02:26,840 We can do that by scrolling back up to the top of our insert function 760 01:02:26,840 --> 01:02:32,350 and tracking another pointer variable called the parent. 761 01:02:32,350 --> 01:02:39,340 We're going to set it equal to null initially, 762 01:02:39,340 --> 01:02:43,640 and then every time we go through the tree, 763 01:02:43,640 --> 01:02:51,540 we're going to set the parent pointer to match the current pointer. 764 01:02:51,540 --> 01:02:59,140 Set parent equal to cur. 765 01:02:59,140 --> 01:03:02,260 This way, each time we go through, 766 01:03:02,260 --> 01:03:05,550 we're going to ensure that as the current pointer gets incremented, 767 01:03:05,550 --> 01:03:12,640 the parent pointer follows it--just one level higher than the current pointer in the tree. 768 01:03:12,640 --> 01:03:17,370 That all looks pretty good. 769 01:03:17,370 --> 01:03:22,380 >> I think the one thing that we'll want to adjust is this build_node returning null. 770 01:03:22,380 --> 01:03:25,380 In order to get build_node to actually successfully return null, 771 01:03:25,380 --> 01:03:31,060 we'll have to modify that code, 772 01:03:31,060 --> 01:03:37,270 because here, we never tested to make sure that malloc returned a valid pointer. 773 01:03:37,270 --> 01:03:53,390 So, if (n != NULL), then-- 774 01:03:53,390 --> 01:03:55,250 if malloc returned a valid pointer, then we'll initialize it; 775 01:03:55,250 --> 01:04:02,540 otherwise, we'll just return and that will end up returning null for us. 776 01:04:02,540 --> 01:04:13,050 Now all looks pretty good. Let's make sure this actually compiles. 777 01:04:13,050 --> 01:04:22,240 Make binary_tree, and oh, we've got some stuff going on here. 778 01:04:22,240 --> 01:04:29,230 >> We've got an implicit declaration of function build_node. 779 01:04:29,230 --> 01:04:31,950 Again, with these compilers, we're going to start at the top. 780 01:04:31,950 --> 01:04:36,380 What that must mean is that I'm calling build_node before I've actually declared it. 781 01:04:36,380 --> 01:04:37,730 Let's go back to the code really quickly. 782 01:04:37,730 --> 01:04:43,510 Scroll down, and sure enough, my insert function is declared 783 01:04:43,510 --> 01:04:47,400 above the build_node function, 784 01:04:47,400 --> 01:04:50,070 but I'm trying to use build_node inside of insert. 785 01:04:50,070 --> 01:05:06,610 I'm going to go in and copy and then paste the build_node function way up here at the top. 786 01:05:06,610 --> 01:05:11,750 That way, hopefully, that will work. Let's give this another go. 787 01:05:11,750 --> 01:05:18,920 Now it all compiles. All is good. 788 01:05:18,920 --> 01:05:21,640 >> But at this point, we haven't actually called our insert function. 789 01:05:21,640 --> 01:05:26,510 We just know that it compiles, so let's go in and put some calls in. 790 01:05:26,510 --> 01:05:28,240 Let's do that in our main function. 791 01:05:28,240 --> 01:05:32,390 Here, we commented out 5, 8, and 2, 792 01:05:32,390 --> 01:05:36,680 and then we didn't wire them up down here. 793 01:05:36,680 --> 01:05:41,640 Let's make some calls to insert, 794 01:05:41,640 --> 01:05:46,440 and let's also use the same kind of stuff that we used 795 01:05:46,440 --> 01:05:52,810 when we made these printf calls to make sure that everything did get inserted properly. 796 01:05:52,810 --> 01:06:00,550 I'm going to copy and paste, 797 01:06:00,550 --> 01:06:12,450 and instead of contains, we're going to do insert. 798 01:06:12,450 --> 01:06:30,140 And instead of 6, 10, and 1, we're going to use 5, 8, and 2. 799 01:06:30,140 --> 01:06:37,320 This should hopefully insert 5, 8, and 2 into the tree. 800 01:06:37,320 --> 01:06:44,050 Compile. All is good. Now we'll actually run our program. 801 01:06:44,050 --> 01:06:47,330 >> Everything returned false. 802 01:06:47,330 --> 01:06:53,830 So, 5, 8, and 2 did not go, and it looks like contains didn't find them either. 803 01:06:53,830 --> 01:06:58,890 What's going on? Let's zoom out. 804 01:06:58,890 --> 01:07:02,160 The first problem was that insert seemed to return false, 805 01:07:02,160 --> 01:07:08,750 and it looks like that's because we left in our return false call, 806 01:07:08,750 --> 01:07:14,590 and we never actually returned true. 807 01:07:14,590 --> 01:07:17,880 We can set that up. 808 01:07:17,880 --> 01:07:25,290 The second problem is now even if we do--save this, quit this, 809 01:07:25,290 --> 01:07:34,530 run make again, have it compile, then run it-- 810 01:07:34,530 --> 01:07:37,670 we see that something else happened here. 811 01:07:37,670 --> 01:07:42,980 The 5, 8, and 2 were still never found in the tree. 812 01:07:42,980 --> 01:07:44,350 So what's going on? 813 01:07:44,350 --> 01:07:45,700 >> Let's take a look at this in the code. 814 01:07:45,700 --> 01:07:49,790 Let's see if we can figure this out. 815 01:07:49,790 --> 01:07:57,940 We start with the parent not being null. 816 01:07:57,940 --> 01:07:59,510 We set the current pointer equal to the root pointer, 817 01:07:59,510 --> 01:08:04,280 and we're going to work our way down through the tree. 818 01:08:04,280 --> 01:08:08,650 If the current node isn't null, then we know that we can move down a little bit. 819 01:08:08,650 --> 01:08:12,330 We set our parent pointer to be equal to the current pointer, 820 01:08:12,330 --> 01:08:15,420 checked the value--if the values are the same, we returned false. 821 01:08:15,420 --> 01:08:17,540 If the values are less, we moved to the right; 822 01:08:17,540 --> 01:08:20,399 otherwise, we moved to the left. 823 01:08:20,399 --> 01:08:24,220 Then we build a node. I'll zoom in a little bit. 824 01:08:24,220 --> 01:08:31,410 And here, we're going to try to wire up the values to be the same. 825 01:08:31,410 --> 01:08:37,250 What's going on? Let's see if possibly Valgrind can give us a hint. 826 01:08:37,250 --> 01:08:43,910 >> I like to use Valgrind just because Valgrind really quickly runs 827 01:08:43,910 --> 01:08:46,729 and tells you if there are any memory errors. 828 01:08:46,729 --> 01:08:48,300 When we run Valgrind on the code, 829 01:08:48,300 --> 01:08:55,859 as you can see right here--Valgrind./binary_tree--and hit Enter. 830 01:08:55,859 --> 01:09:03,640 You see that we didn't have any memory error, so it looks like everything's okay so far. 831 01:09:03,640 --> 01:09:07,529 We do have some memory leaks, which we know because we're not 832 01:09:07,529 --> 01:09:08,910 happening to free any of our nodes. 833 01:09:08,910 --> 01:09:13,050 Let's try running GDB to see what's actually going on. 834 01:09:13,050 --> 01:09:20,010 We'll do gdb ./binary_tree. It booted up just fine. 835 01:09:20,010 --> 01:09:23,670 Let's set a break point on insert. 836 01:09:23,670 --> 01:09:28,600 Let's run. 837 01:09:28,600 --> 01:09:31,200 It looks like we never actually called insert. 838 01:09:31,200 --> 01:09:39,410 It looks like the problem was just that when I changed down here in main-- 839 01:09:39,410 --> 01:09:44,279 all of these printf calls from contains-- 840 01:09:44,279 --> 01:09:56,430 I never actually changed these to call insert. 841 01:09:56,430 --> 01:10:01,660 Now let's give it a try. Let's compile. 842 01:10:01,660 --> 01:10:09,130 All looks good there. Now let's try running it, see what happens. 843 01:10:09,130 --> 01:10:13,320 All right! Everything looks pretty good there. 844 01:10:13,320 --> 01:10:18,130 >> The final thing to think about is are there any edge cases to this insert? 845 01:10:18,130 --> 01:10:23,170 And it turns out that, well, the one edge case that is always interesting to think about 846 01:10:23,170 --> 01:10:26,250 is what happens if your tree is empty and you call this insert function? 847 01:10:26,250 --> 01:10:30,330 Will it work? Well, let's give it a try. 848 01:10:30,330 --> 01:10:32,110 --binary_tree.c-- 849 01:10:32,110 --> 01:10:35,810 The way we're going to test this is we're going to go down to our main function, 850 01:10:35,810 --> 01:10:41,690 and rather than wiring these nodes up like this, 851 01:10:41,690 --> 01:10:56,730 we're just going to comment out the entire thing, 852 01:10:56,730 --> 01:11:02,620 and instead of wiring up the nodes ourselves, 853 01:11:02,620 --> 01:11:09,400 we can actually just go ahead and delete all of this. 854 01:11:09,400 --> 01:11:17,560 We're going to make everything a call to insert. 855 01:11:17,560 --> 01:11:49,020 So, let's do--instead of 5, 8, and 2, we're going to insert 7, 3, and 9. 856 01:11:49,020 --> 01:11:58,440 And then we'll also want to insert 6 as well. 857 01:11:58,440 --> 01:12:05,190 Save. Quit. Make binary_tree. 858 01:12:05,190 --> 01:12:08,540 It all compiles. 859 01:12:08,540 --> 01:12:10,320 We can just run it as is and see what happens, 860 01:12:10,320 --> 01:12:12,770 but it's also going to be really important to make sure that 861 01:12:12,770 --> 01:12:14,740 we don't have any memory errors, 862 01:12:14,740 --> 01:12:16,840 since this is one of our edge cases that we know about. 863 01:12:16,840 --> 01:12:20,150 >> Let's make sure that it works well under Valgrind, 864 01:12:20,150 --> 01:12:28,310 which we'll do by just running Valgrind ./binary_tree. 865 01:12:28,310 --> 01:12:31,110 It looks like we indeed have one error from one context-- 866 01:12:31,110 --> 01:12:33,790 we have this segmentation fault. 867 01:12:33,790 --> 01:12:36,690 What happened? 868 01:12:36,690 --> 01:12:41,650 Valgrind actually tells us where it is. 869 01:12:41,650 --> 01:12:43,050 Zoom out a little bit. 870 01:12:43,050 --> 01:12:46,040 It looks like it's happening in our insert function, 871 01:12:46,040 --> 01:12:53,420 where we have an invalid read of size 4 at insert, line 60. 872 01:12:53,420 --> 01:13:03,590 Let's go back and see what's going on here. 873 01:13:03,590 --> 01:13:05,350 Zoom out really quick. 874 01:13:05,350 --> 01:13:14,230 I want to make sure that it doesn't go to the edge of the screen so we can see everything. 875 01:13:14,230 --> 01:13:18,760 Pull that in a little bit. All right. 876 01:13:18,760 --> 01:13:35,030 Scroll down, and the problem is right here. 877 01:13:35,030 --> 01:13:40,120 What happens if we get down and our current node is already null, 878 01:13:40,120 --> 01:13:44,010 our parent node is null, so if we look up at the very top, right here-- 879 01:13:44,010 --> 01:13:47,340 if this while loop never actually executes 880 01:13:47,340 --> 01:13:52,330 because our current value is null--our root is null, so cur is null-- 881 01:13:52,330 --> 01:13:57,810 then our parent never gets set to cur or to a valid value, 882 01:13:57,810 --> 01:14:00,580 so parent will also be null. 883 01:14:00,580 --> 01:14:03,700 We need to remember to check for that 884 01:14:03,700 --> 01:14:08,750 by the time we get down here and we start accessing the parent's value. 885 01:14:08,750 --> 01:14:13,190 So what happens? Well, if the parent is null-- 886 01:14:13,190 --> 01:14:17,990 if (parent == NULL)--then we know that 887 01:14:17,990 --> 01:14:19,530 there must not be anything in the tree. 888 01:14:19,530 --> 01:14:22,030 We must be trying to insert it at the root. 889 01:14:22,030 --> 01:14:32,570 We can do that by just setting the root equal to the new node. 890 01:14:32,570 --> 01:14:40,010 Then, at this point, we don't actually want to go through these other things. 891 01:14:40,010 --> 01:14:44,780 Instead, right here, we can do either an else-if-else, 892 01:14:44,780 --> 01:14:47,610 or we could combine everything up here in an else, 893 01:14:47,610 --> 01:14:56,300 but here, we'll just use else and do it that way. 894 01:14:56,300 --> 01:14:59,030 Now, we're going to test to make sure that our parent is not null 895 01:14:59,030 --> 01:15:02,160 before then actually trying to access its fields. 896 01:15:02,160 --> 01:15:05,330 This will help us avoid the segmentation fault. 897 01:15:05,330 --> 01:15:14,740 So, we quit, zoom out, compile, run. 898 01:15:14,740 --> 01:15:18,130 >> No errors, but we still have a bunch of memory leaks 899 01:15:18,130 --> 01:15:20,650 because we didn't free any of our nodes. 900 01:15:20,650 --> 01:15:24,350 But if we go up here and we look at our printout, 901 01:15:24,350 --> 01:15:30,880 we see that, well, looks like all of our inserts were returning true, which is good. 902 01:15:30,880 --> 01:15:33,050 The inserts are all true, 903 01:15:33,050 --> 01:15:36,670 and then the appropriate contains calls are also true. 904 01:15:36,670 --> 01:15:41,510 >> Good job! It looks like we've successfully written insert. 905 01:15:41,510 --> 01:15:47,430 That's all we have for this week's problem set specification. 906 01:15:47,430 --> 01:15:51,720 One fun challenge to think about is how you would actually go in 907 01:15:51,720 --> 01:15:55,340 and free all of the nodes in this tree. 908 01:15:55,340 --> 01:15:58,830 We can do so a number of different ways, 909 01:15:58,830 --> 01:16:01,930 but I'll leave that up to you to experiment, 910 01:16:01,930 --> 01:16:06,080 and as a fun challenge, try and make sure that you can make sure 911 01:16:06,080 --> 01:16:09,490 that this Valgrind report returns no errors and no leaks. 912 01:16:09,490 --> 01:16:12,880 >> Good luck on this week's Huffman coding problem set, 913 01:16:12,880 --> 01:16:14,380 and we'll see you next week! 914 01:16:14,380 --> 01:16:17,290 [CS50.TV]