[Binary Search] [Patrick Schmid - Harvard University] [This is CS50. - CS50.TV] If I gave you a list of Disney character names in alphabetical order and asked you to find Mickey Mouse, how would you go about doing this? One obvious way would be to scan the list from the beginning and check every name to see if it's Mickey. But as you read Aladdin, Alice, Ariel, and so forth, you'll quickly realize that starting at the front of the list wasn't a good idea. Okay, maybe you should start working backwards from the end of the list. Now you read Tarzan, Stitch, Snow White, and so forth. Still, this doesn't seem like the best way to go about it. Well, another way that you could go about doing this is to try to narrow down the list of names that you have to look at. Since you know that they are in alphabetical order, you could just look at the names in the middle of the list and check if Mickey Mouse is before or after this name. Looking at the last name in the second column you'd realize that M for Mickey comes after J for Jasmine, so you'd simply ignore the first half of the list. Then you'd probably look at the top of the last column and see that it begins with Rapunzel. Mickey comes before Rapunzel; looks like we can ignore the last column as well. Continuing the search strategy, you'll quickly see that Mickey is in the first half of the remaining list of names and finally find Mickey hiding between Merlin and Minnie. What you just did was basically binary search. As this name implies, it performs its searching strategy in a binary matter. What does this mean? Well, given a list of sorted items, the binary search algorithm makes a binary decision-- left or right, greater than or less than, alphabetically before or after--at each point. Now that we have a name that goes along with this search algorithm, let's look at another example, but this time with a list of sorted numbers. Say we are looking for the number 144 in this list of sorted numbers. Just like before, we find the number that's in the middle-- which in this case is 13--and see if 144 is greater than or less than 13. Since it's clearly greater than 13, we can ignore everything that is 13 or less and just concentrate on the remaining half. Since we now have an even number of items left, we simply choose a number that is close to the middle. In this case we choose 55. We could have just as easily chosen 89. Okay. Again, 144 is greater than 55, so we go to the right. Fortunately for us, the next middle number is 144, the one we're looking for. So in order to find 144 using a binary search, we're able to find it in only 3 steps. If we would have used linear search here, it would have taken us 12 steps. As a matter of fact, since this search method halves the number of items it has to look at at each step, it will find the item it is searching for in about the log of the number of items in the list. Now that we've seen 2 examples, let's take a look at some pseudocode for a recursive function that implements binary search. Starting at the top, we see that we have to find a function that takes 4 arguments: key, array, min, and max. The key is the number that we are looking for, so 144 in the previous example. Array is the list of numbers that we are searching. Min and max are the indices of the minimum and maximum positions that we are currently looking at. So when we start, min will be zero and max will be the maximum index of the array. As we narrow the search down, we will update min and max to be just the range that we are still looking in. Let's skip to the interesting part first. The first thing we do is find the midpoint, the index that is halfway between the min and max of the array that we are still considering. Then we look at the value of the array at that midpoint location and see if the number that we are looking for is less than that key. If the number at that position is less, then it means the key is larger than all numbers to the left of that position. So we can call binary search function again, but this time updating the min and max parameters to read just the half that is greater than or equal to the value we just looked at. On the other hand, if the key is less than the number at the current midpoint of the array, we want to go to the left and ignore all numbers that are greater. Again, we call binary search but this time with the range of min and max updated to include just the lower half. If the value at the current midpoint in the array is neither larger than nor smaller than the key, then it must be equal to the key. Thus, we can simply return the current midpoint index, and we're done. Finally, this check here is for the case that the number isn't actually in the array of numbers we are searching. If the maximum index of the range that we are looking for is ever less than the minimum, that means that we've gone too far. Since the number wasn't in the input array, we return -1 to indicate that nothing was found. You may have noticed that for this algorithm to work, the list of numbers has to be sorted. In other words, we can only find 144 using binary search if all the numbers are ordered from lowest to highest. If this weren't the case, we wouldn't be able to exclude half of the numbers at each step. So we have 2 options. Either we can take an unsorted list and sort it before using binary search, or we can make sure that the list of numbers is sorted as we add numbers to it. Thus, instead of sorting just when we have to search, why not keep the list sorted at all times? One way to keep a list of numbers sorted while simultaneously allowing one to add or move numbers from this list is by using something called a binary search tree. A binary search tree is a data structure that has 3 properties. First, the left subtree of any node contains only values that are less than or equal to the node's value. Second, the right subtree of a node only contains values that are greater than or equal to the node's value. And, finally, both the left and right subtrees of all nodes are also binary search trees. Let's look at an example with the same numbers we used earlier. For those of you who have never seen a computer science tree before, let me begin by telling you that a computer science tree grows downwards. Yes, unlike trees you are accustomed to, the root of a computer science tree is at the top, and the leaves are at the bottom. Each little box is called a node, and the nodes are connected to each other by edges. So the root of this tree is a node value with the value 13, which has 2 children nodes with the values 5 and 34. A subtree is the tree that is formed just by looking at a subsection of the entire tree. For example, the left subtree of the node 3 is the tree created by the nodes 0, 1, and 2. So, if we go back to the properties of a binary search tree, we see that each node in the tree conforms to all 3 properties, namely, the left subtree only contains values that are less than or equal to the node's value; the right subtree of all nodes only contains values that are greater than or equal to the node's value; and both left and right subtrees of all nodes also are binary search trees. Although this tree looks different, this is a valid binary search tree for the same set of numbers. As a matter of fact, there are many possible ways that you can create a valid binary search tree from these numbers. Well, let's go back to the first one we created. So what can we do with these trees? Well, we can very simply find the minimum and maximum values. The minimum values can be found by always going to the left until there are no more nodes to visit. Conversely, to find the maximum one simply just goes down to the right at each time. Finding any other number that isn't the minimum or the maximum is just as easy. Say we're looking for the number 89. We simply check the value of each node and go to the left or right, depending on whether the node's value is less than or greater than the one we're looking for. So, starting at the root of 13, we see that 89 is greater, and so we go to the right. Then we see the node for 34, and again we go to the right. 89 is still greater than 55, so we continue going to the right. We then come up with a node with the value of 144 and go to the left. Lo and behold, 89 is right there. Another thing that we can do is print out all numbers by performing an inorder traversal. An inorder traversal means to print everything out in the left subtree followed by printing the node itself and then followed by printing everything out in the right subtree. For example, let's take our favorite binary search tree and print out the numbers in sorted order. We start at the root of 13, but before printing 13 we have to print out everything in the left subtree. So we go to 5. We still have to go deeper down in the tree until we find the left-most node, which is zero. After printing zero, we go back up to the 1 and print that out. Then we go to the right subtree, which is 2, and print that out. Now that we're done with that subtree, we can go back up to the 3 and print it out. Continuing back up, we print the 5 and then the 8. Now that we have completed the entire left subtree, we can print out the 13 and start working on the right subtree. We hop down to 34, but before printing 34 we have to print out its left subtree. So we print out 21; then we get to print out 34 and visit its right subtree. Since 55 has no left subtree, we print it out and continue on to its right subtree. 144 has a left subtree, and so we print out the 89, followed by the 144, and finally the right-most node of 233. There you have it; all the numbers are printed out in order from lowest to highest. Adding something to the tree is relatively painless as well. All we have to do is make sure that we follow 3 binary search tree properties and then insert the value where there is space. Let's say we want to insert the value of 7. Since 7 is less than 13, we go to the left. But it's greater than 5, so we traverse to the right. Since it's less than 8 and 8 is a leaf node, we add 7 to the left child of 8. Voila! We've added a number to our binary search tree. If we can add things, we better be able to delete things as well. Unfortunately for us, deleting is a little bit more complicated-- not much, but just a little bit. There are 3 different scenarios that we have to consider when deleting elements from binary search trees. First, the easiest case is that the element is a leaf node. In this case, we simply delete it and go on with our business. Say we want to delete the 7 that we just added. Well, we simply find it, remove it, and that's it. The next case is if the node has only 1 child. Here we can delete the node, but we first have to make sure to connect the subtree that is now left parentless to the parent of the node we just deleted. Say we want to delete 3 from our tree. We take the child element of that node and attach it to the parent of the node. In this case, we're now attaching the 1 to the 5. This works without a problem because we know, according to the binary search tree property, that everything in the left subtree of 3 was less than 5. Now that 3's subtree is taken care of, we can delete it. The third and final case is the most complex. This is the case when the node we want to delete has 2 children. In order to do this, we first have to find the node that has the next largest value, swap the two, and then delete the node in question. Notice the node that has the next largest value can't have 2 children itself since its left child would be a better candidate for the next largest. Therefore, deleting a node with 2 children amounts to swapping of 2 nodes, and then deleting is handled by 1 of the 2 aforementioned rules. For example, let's say we want to delete the root node, 13. The first thing we do is we find the next largest value in the tree which, in this case, is 21. We then swap the 2 nodes, making 13 a leaf and 21 the central group node. Now we can simply delete 13. As alluded to earlier, there are many possible ways to make a valid binary search tree. Unfortunately for us, some are worse than others. For example, what happens when we build a binary search tree from a sorted list of numbers? All numbers are just added to the right at each step. When we want to search for a number, we have no choice but only to look at the right at each step. This isn't better than linear search at all. Although we won't cover them here, there are other, more complex, data structures that make sure that this doesn't happen. However, one simple thing that can be done to avoid this is to just randomly shuffle the input values. It's highly unlikely that by random chance a shuffled list of numbers is sorted. If this were the case, casinos wouldn't stay in business for long. There you have it. You now know about binary searching and binary search trees. I'm Patrick Schmid, and this is CS50. [CS50.TV] One obvious way would be to scan the list from...[beep] ...the number of items...yep [laughs] ...post node of 234...augh. >>Yay! That was--