1 00:00:07,050 --> 00:00:09,630 How would you represent all your family members in a computer? 2 00:00:10,790 --> 00:00:12,390 We could simply use a list, 3 00:00:12,390 --> 00:00:14,400 but there's a clear hierarchy here. 4 00:00:14,400 --> 00:00:17,110 >> Let's say we're beginning with your great-grandmother, Alice. 5 00:00:17,110 --> 00:00:19,030 She has 2 sons, Bob 6 00:00:19,030 --> 00:00:21,310 and your grandfather, Charlie. 7 00:00:21,310 --> 00:00:23,190 Charlie has 3 children, 8 00:00:23,190 --> 00:00:26,730 your uncle, Dave, your aunt, Eve, and your father, Fred. 9 00:00:26,730 --> 00:00:28,750 You are Fred's only child. 10 00:00:28,750 --> 00:00:30,950 >> Why would organizing your family members in this way 11 00:00:30,950 --> 00:00:34,010 be better than the simple list representation? 12 00:00:34,010 --> 00:00:36,630 One reason is that this hierarchical structure, 13 00:00:36,630 --> 00:00:39,660 called a 'tree,' contains more information than a simple list. 14 00:00:40,540 --> 00:00:43,520 We know the familial relationships between everyone 15 00:00:43,520 --> 00:00:45,440 just by examining the tree. 16 00:00:45,440 --> 00:00:47,250 Also, it can speed up 17 00:00:47,250 --> 00:00:50,010 look-up time tremendously, if tree data is sorted. 18 00:00:50,010 --> 00:00:53,850 >> We can't take advantage of that here, but we'll see an example of this soon. 19 00:00:53,850 --> 00:00:56,150 Each person is represented by a node on the tree. 20 00:00:56,680 --> 00:00:58,370 Nodes can have child nodes 21 00:00:58,370 --> 00:01:00,350 as well as a parent node. 22 00:01:00,350 --> 00:01:02,390 These are the technical terms, 23 00:01:02,390 --> 00:01:05,220 even when using trees for things besides families. 24 00:01:05,220 --> 00:01:07,940 Alice's node has 2 children and no parents, 25 00:01:07,940 --> 00:01:11,500 while Charlie's node has 3 children and 1 parent. 26 00:01:11,500 --> 00:01:14,330 >> A leaf node is one that does not have any children 27 00:01:14,330 --> 00:01:16,410 on the outside edge of the tree. 28 00:01:16,410 --> 00:01:18,520 The topmost node of the tree, the root node, 29 00:01:18,520 --> 00:01:20,240 has no parent. 30 00:01:20,240 --> 00:01:23,170 >> A binary tree is a specific type of tree, 31 00:01:23,170 --> 00:01:26,720 in which each node has, at most, 2 children. 32 00:01:26,720 --> 00:01:30,490 Here is the struct of a node of a binary tree in C. 33 00:01:31,560 --> 00:01:34,530 Every node has some data associated with it 34 00:01:34,530 --> 00:01:36,520 and 2 pointers to other nodes. 35 00:01:36,520 --> 00:01:38,110 >> In our family tree, 36 00:01:38,110 --> 00:01:40,900 the associated data was each person's name. 37 00:01:40,900 --> 00:01:43,850 Here it is an int, though it could be anything. 38 00:01:44,510 --> 00:01:46,200 As it turns out, 39 00:01:46,200 --> 00:01:48,990 a binary tree would not be a good representation for a family, 40 00:01:48,990 --> 00:01:51,960 since people frequently have more than 2 children. 41 00:01:51,960 --> 00:01:53,510 >> A binary search tree 42 00:01:53,510 --> 00:01:56,380 is a special, ordered type of binary tree 43 00:01:56,380 --> 00:01:58,090 that allows us to look at values quickly. 44 00:01:58,740 --> 00:02:00,050 You may have noticed 45 00:02:00,050 --> 00:02:02,010 that every node below the root of a tree 46 00:02:02,010 --> 00:02:04,620 is the root of another tree, called a 'subtree.' 47 00:02:04,960 --> 00:02:07,090 Here, the root of the tree is 6, 48 00:02:07,090 --> 00:02:09,860 and its child, 2, is the root of a subtree. 49 00:02:09,860 --> 00:02:11,870 >> In a binary search tree 50 00:02:11,870 --> 00:02:15,790 all the values of a node's right subtree 51 00:02:15,790 --> 00:02:18,690 are greater than the node's value. Here: 6. 52 00:02:18,690 --> 00:02:22,640 Well, the values in a node's left subtree 53 00:02:24,540 --> 00:02:26,890 are less than the node's value. 54 00:02:26,890 --> 00:02:28,620 If we need to handle duplicate values, 55 00:02:28,620 --> 00:02:31,760 we can change either of those to a loose inequality, 56 00:02:31,760 --> 00:02:34,410 meaning identical values can fall either on the left or right, 57 00:02:34,410 --> 00:02:37,400 as long as we are consistent about it throughout. 58 00:02:37,400 --> 00:02:39,620 This tree is a binary search tree 59 00:02:39,620 --> 00:02:41,540 because it follows these rules. 60 00:02:42,320 --> 00:02:46,020 >> This is how it would look if we turned all the nodes into C structs. 61 00:02:46,880 --> 00:02:48,410 Notice that if a child is missing, 62 00:02:48,410 --> 00:02:50,340 the pointer is null. 63 00:02:50,340 --> 00:02:53,270 How do we check to see if 7 is in the tree? 64 00:02:53,270 --> 00:02:55,020 We start at the root. 65 00:02:55,020 --> 00:02:58,690 Seven is greater than 6, so if it's in the tree, it must be to the right. 66 00:02:59,680 --> 00:03:03,650 Then, it is less than 8, so it must be left. 67 00:03:03,650 --> 00:03:06,210 Here, we have found 7. 68 00:03:06,210 --> 00:03:08,160 Now, we'll check for 5. 69 00:03:08,160 --> 00:03:11,500 Five is less than 6, so it must be to the left. 70 00:03:11,500 --> 00:03:13,460 Five is greater than 2, 71 00:03:13,460 --> 00:03:15,010 so it must be right, 72 00:03:15,010 --> 00:03:18,100 and it is also greater than 4, so it must be right again. 73 00:03:18,100 --> 00:03:20,300 However, there is no child here. 74 00:03:20,300 --> 00:03:22,000 The pointer is null. 75 00:03:22,000 --> 00:03:24,270 This means that 5 is not in our tree. 76 00:03:24,270 --> 00:03:27,250 >> We can search the binary tree with the following code: 77 00:03:28,430 --> 00:03:31,140 At every node, we check to see if we have found 78 00:03:31,140 --> 00:03:33,020 the value we are looking for. 79 00:03:33,020 --> 00:03:35,740 If we do not find it, we determine if it should be 80 00:03:35,740 --> 00:03:38,990 on the left or right and check that subtree. 81 00:03:38,990 --> 00:03:41,160 This loop will continue down the tree 82 00:03:41,160 --> 00:03:44,190 until there is no child node on either the left or right. 83 00:03:45,190 --> 00:03:48,600 >> Remember that 5 was not in the tree. 84 00:03:48,600 --> 00:03:50,460 How do we insert it? 85 00:03:50,460 --> 00:03:52,370 The process looks similar to search. 86 00:03:52,370 --> 00:03:54,830 We iterate down the tree starting from 6, 87 00:03:54,830 --> 00:03:57,040 left to 2, 88 00:03:57,040 --> 00:03:59,140 right to 4, 89 00:03:59,140 --> 00:04:02,500 and right again, but 4 has no child on this side. 90 00:04:02,500 --> 00:04:06,090 This will be the new position for 5, 91 00:04:06,090 --> 00:04:08,500 and it will start with no children. 92 00:04:09,020 --> 00:04:12,220 >> How fast are operations on a binary search tree? 93 00:04:12,220 --> 00:04:15,410 Remember that Bigohnotation seeks to provide an upper bound. 94 00:04:15,410 --> 00:04:17,390 In the worst case, 95 00:04:17,390 --> 00:04:19,480 our tree could simply be a linked list 96 00:04:19,480 --> 00:04:22,220 meaning that insertion, deletion, and search 97 00:04:22,220 --> 00:04:25,380 could take time proportional to the number of nodes in the tree. 98 00:04:25,380 --> 00:04:27,380 This is O(n). 99 00:04:27,380 --> 00:04:30,690 >> For example, the following is a valid binary search tree. 100 00:04:31,850 --> 00:04:34,020 However, if we try to find 9, 101 00:04:34,020 --> 00:04:36,760 we have to traverse every node. 102 00:04:36,760 --> 00:04:39,120 It is no better than a linked list. 103 00:04:39,120 --> 00:04:41,410 Ideally, we would want every node 104 00:04:41,410 --> 00:04:44,120 of our binary search tree to have 2 children. 105 00:04:44,120 --> 00:04:46,370 This way, insertion, deletion and search 106 00:04:46,370 --> 00:04:50,190 would take, at worst, O(log n) time. 107 00:04:50,190 --> 00:04:52,470 The tree from before could be more balanced, 108 00:04:52,470 --> 00:04:54,140 like this. 109 00:04:54,140 --> 00:04:57,980 Now, looking up any value takes, at most, 3 steps. 110 00:04:57,980 --> 00:04:59,460 This tree is balanced, 111 00:04:59,460 --> 00:05:01,190 meaning that is has a minimal depth 112 00:05:01,190 --> 00:05:03,680 relative to the number of nodes. 113 00:05:03,680 --> 00:05:06,300 >> Looking for a value in a balanced binary search tree 114 00:05:06,300 --> 00:05:09,540 is similar to binary search on a sorted array. 115 00:05:09,540 --> 00:05:12,290 In fact, if we don't need to insert or delete items, 116 00:05:12,290 --> 00:05:15,150 they behave exactly the same way. 117 00:05:15,150 --> 00:05:17,600 However, a tree structure is better 118 00:05:17,600 --> 00:05:19,530 for handling insertions and deletions 119 00:05:20,360 --> 00:05:22,670 >> My name is Bannus Van der Kloot. 120 00:05:22,670 --> 00:05:24,030 This is CS50.