WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.346 >> [MUSIC PLAYING] 00:00:05.258 --> 00:00:06.220 >> DOUG LLOYD: All right. 00:00:06.220 --> 00:00:08.140 So binary search is an algorithm we can use 00:00:08.140 --> 00:00:10.530 to find an element inside of an array. 00:00:10.530 --> 00:00:14.710 Unlike linear search, it requires a special condition be met beforehand, 00:00:14.710 --> 00:00:19.020 but it's so much more efficient if that condition is, in fact, met. 00:00:19.020 --> 00:00:20.470 >> So what's the idea here? 00:00:20.470 --> 00:00:21.780 it's divide and conquer. 00:00:21.780 --> 00:00:25.100 We want to reduce the size of the search area by half each time 00:00:25.100 --> 00:00:27.240 in order to find a target number. 00:00:27.240 --> 00:00:29.520 This is where that condition comes into play, though. 00:00:29.520 --> 00:00:32.740 We can only leverage the power of eliminating half of the elements 00:00:32.740 --> 00:00:36.070 without even looking at them if the array is sorted. 00:00:36.070 --> 00:00:39.200 >> If it's a complete mix up, we can't just out of hand 00:00:39.200 --> 00:00:42.870 discard half of the elements, because we don't know what we're discarding. 00:00:42.870 --> 00:00:45.624 But if the array is sorted, we can do that, because we 00:00:45.624 --> 00:00:48.040 know that everything to the left of where we currently are 00:00:48.040 --> 00:00:50.500 must be lower than the value we're currently at. 00:00:50.500 --> 00:00:52.300 And everything to the right of where we are 00:00:52.300 --> 00:00:55.040 must be greater than the value we're currently looking at. 00:00:55.040 --> 00:00:58.710 >> So what's the pseudocode steps for binary search? 00:00:58.710 --> 00:01:02.310 We repeat this process until the array or, as we proceed through, 00:01:02.310 --> 00:01:07.740 sub arrays, smaller pieces of the original array, is of size 0. 00:01:07.740 --> 00:01:10.960 Calculate the midpoint of the current sub array. 00:01:10.960 --> 00:01:14.460 >> If the value you're looking for is in that element of the array, stop. 00:01:14.460 --> 00:01:15.030 You found it. 00:01:15.030 --> 00:01:16.550 That's great. 00:01:16.550 --> 00:01:19.610 Otherwise, if the target is less than what's at the middle, 00:01:19.610 --> 00:01:23.430 so if the value we're looking for is lower than what we see, 00:01:23.430 --> 00:01:26.780 repeat this process again, but change the end point, instead 00:01:26.780 --> 00:01:29.300 of being the original complete full array, 00:01:29.300 --> 00:01:34.110 to be just to the left of where we just looked. 00:01:34.110 --> 00:01:38.940 >> We knew that the middle was too high, or the target was less than the middle, 00:01:38.940 --> 00:01:42.210 and so it must exist, if it exists at all in the array, 00:01:42.210 --> 00:01:44.660 somewhere to the left of the midpoint. 00:01:44.660 --> 00:01:48.120 And so we'll set the array location just to the left 00:01:48.120 --> 00:01:51.145 of the midpoint as the new end point. 00:01:51.145 --> 00:01:53.770 Conversely, if the target is greater than what's at the middle, 00:01:53.770 --> 00:01:55.750 we do the exact same process, but instead we 00:01:55.750 --> 00:01:59.520 change the start point to be just to the right of the midpoint 00:01:59.520 --> 00:02:00.680 we just calculated. 00:02:00.680 --> 00:02:03.220 And then, we begin the process again. 00:02:03.220 --> 00:02:05.220 >> Let's visualize this, OK? 00:02:05.220 --> 00:02:08.620 So there's a lot going and on here, but here's an array of the 15 elements. 00:02:08.620 --> 00:02:11.400 And we're going to be keeping track of a lot more stuff this time. 00:02:11.400 --> 00:02:13.870 So in linear search, we were just caring about a target. 00:02:13.870 --> 00:02:15.869 But this time we want to care about where are we 00:02:15.869 --> 00:02:18.480 starting to look, where are we stopping looking, 00:02:18.480 --> 00:02:21.876 and what's the midpoint of the current array. 00:02:21.876 --> 00:02:23.250 So here we go with binary search. 00:02:23.250 --> 00:02:25.290 We're pretty much good to go, right? 00:02:25.290 --> 00:02:28.650 I'm just going to put down below here a set of indices. 00:02:28.650 --> 00:02:32.430 This is basically just what element of the array we're talking about. 00:02:32.430 --> 00:02:34.500 With linear search, we care, inasmuch as we 00:02:34.500 --> 00:02:36.800 need to know how many elements we're iterating over, 00:02:36.800 --> 00:02:40.010 but we don't actually care what element we're currently looking at. 00:02:40.010 --> 00:02:41.014 In binary search, we do. 00:02:41.014 --> 00:02:42.930 And so those are just there as a little guide. 00:02:42.930 --> 00:02:44.910 >> So we can start, right? 00:02:44.910 --> 00:02:46.240 Well, not quite. 00:02:46.240 --> 00:02:48.160 Remember what I said about binary search? 00:02:48.160 --> 00:02:50.955 We can't do it on an unsorted array or else, 00:02:50.955 --> 00:02:55.820 we are not guaranteeing that the certain elements or values aren't 00:02:55.820 --> 00:02:57.650 being accidentally discarded when we just 00:02:57.650 --> 00:02:59.920 decide to ignore half of the array. 00:02:59.920 --> 00:03:02.574 >> So step one with binary search is you must have a sorted array. 00:03:02.574 --> 00:03:05.240 And you can use any of the sorting algorithms we've talked about 00:03:05.240 --> 00:03:06.700 to get you to that position. 00:03:06.700 --> 00:03:10.370 So now, we're in a position where we can perform binary search. 00:03:10.370 --> 00:03:12.560 >> So let's repeat the process step by step and keep 00:03:12.560 --> 00:03:14.830 track of what's happening as we go. 00:03:14.830 --> 00:03:17.980 So the first we need to do is calculate the midpoint of the current array. 00:03:17.980 --> 00:03:20.620 Well, we'll say we're, first of all, looking for the value 19. 00:03:20.620 --> 00:03:22.290 We're trying to find the number 19. 00:03:22.290 --> 00:03:25.380 The first element of this array is located at index zero, 00:03:25.380 --> 00:03:28.880 and the last element of this array is located at index 14. 00:03:28.880 --> 00:03:31.430 And so we'll call those start and end. 00:03:31.430 --> 00:03:35.387 >> So we calculate the midpoint by adding 0 plus 14 divided by 2; 00:03:35.387 --> 00:03:36.720 pretty straightforward midpoint. 00:03:36.720 --> 00:03:40.190 And we can say that the midpoint is now 7. 00:03:40.190 --> 00:03:43.370 So is 15 what we're looking for? 00:03:43.370 --> 00:03:43.940 No, it's not. 00:03:43.940 --> 00:03:45.270 We're looking for 19. 00:03:45.270 --> 00:03:49.400 But we know that 19 is greater than what we found at the middle. 00:03:49.400 --> 00:03:52.470 >> So what we can do is change the start point 00:03:52.470 --> 00:03:57.280 to be just to the right of the midpoint, and repeat the process again. 00:03:57.280 --> 00:04:01.690 And when we do that, we now say the new start point is array location 8. 00:04:01.690 --> 00:04:07.220 What we've effectively done is ignored everything to the left of 15. 00:04:07.220 --> 00:04:09.570 We've eliminated half of the problem, and now, 00:04:09.570 --> 00:04:13.510 instead of having to search over 15 elements in our array, 00:04:13.510 --> 00:04:15.610 we only have to search over 7. 00:04:15.610 --> 00:04:17.706 So 8 is the new start point. 00:04:17.706 --> 00:04:19.600 14 is still the end point. 00:04:19.600 --> 00:04:21.430 >> And now, we go over this again. 00:04:21.430 --> 00:04:22.810 We calculate the new midpoint. 00:04:22.810 --> 00:04:27.130 8 plus 14 is 22, divided by 2 is 11. 00:04:27.130 --> 00:04:28.660 Is this what we're looking for? 00:04:28.660 --> 00:04:30.110 No, it's not. 00:04:30.110 --> 00:04:32.930 We're looking for a value that's less than what we just found. 00:04:32.930 --> 00:04:34.721 So we're going to repeat the process again. 00:04:34.721 --> 00:04:38.280 We're going to change the end point to be just to the left of the midpoint. 00:04:38.280 --> 00:04:41.800 So the new end point becomes 10. 00:04:41.800 --> 00:04:44.780 And now, that's the only part of the array we have to sort through. 00:04:44.780 --> 00:04:48.460 So we have now eliminated 12 of the 15 elements. 00:04:48.460 --> 00:04:51.550 We know that if 19 exists in the array, it 00:04:51.550 --> 00:04:57.210 must exist somewhere between element number 8 and element number 10. 00:04:57.210 --> 00:04:59.400 >> So we calculate the new midpoint again. 00:04:59.400 --> 00:05:02.900 8 plus 10 is 18, divided by 2 is 9. 00:05:02.900 --> 00:05:05.074 And in this case, look, the target is at the middle. 00:05:05.074 --> 00:05:06.740 We found exactly what we're looking for. 00:05:06.740 --> 00:05:07.780 We can stop. 00:05:07.780 --> 00:05:10.561 We successfully completed a binary search. 00:05:10.561 --> 00:05:11.060 All right. 00:05:11.060 --> 00:05:13.930 So we know this algorithm works if the target is 00:05:13.930 --> 00:05:16.070 somewhere inside of the array. 00:05:16.070 --> 00:05:19.060 Does this algorithm work if the target is not in the array? 00:05:19.060 --> 00:05:20.810 Well, let's start it again, and this time, 00:05:20.810 --> 00:05:23.380 let's look for the element 16, which visually we can see 00:05:23.380 --> 00:05:25.620 does not exist anywhere in the array. 00:05:25.620 --> 00:05:27.110 >> The start point is again 0. 00:05:27.110 --> 00:05:28.590 The end point is again 14. 00:05:28.590 --> 00:05:32.490 Those are the indices of the first and last elements of the complete array. 00:05:32.490 --> 00:05:36.057 And we'll go through the process we just went through again, trying to find 16, 00:05:36.057 --> 00:05:39.140 even though visually, we can already tell that it's not going to be there. 00:05:39.140 --> 00:05:43.450 We just want to make sure this algorithm will, in fact, still work in some way 00:05:43.450 --> 00:05:46.310 and not just leave us stuck in an infinite loop. 00:05:46.310 --> 00:05:48.190 >> So what's the step first? 00:05:48.190 --> 00:05:50.230 Calculate the midpoint of the current array. 00:05:50.230 --> 00:05:52.790 What's the midpoint of the current array? 00:05:52.790 --> 00:05:54.410 Well, it's 7, right? 00:05:54.410 --> 00:05:57.560 14 plus 0 divided by 2 is 7. 00:05:57.560 --> 00:05:59.280 Is 15 what we're looking for? 00:05:59.280 --> 00:05:59.780 No. 00:05:59.780 --> 00:06:02.930 It's pretty close, but we're looking for a value slightly bigger than that. 00:06:02.930 --> 00:06:06.310 >> So we know that it's going to be nowhere to the left of 15. 00:06:06.310 --> 00:06:08.540 The target is greater than what's in the midpoint. 00:06:08.540 --> 00:06:12.450 And so we set the new start point to be just to the right of the middle. 00:06:12.450 --> 00:06:16.130 The midpoint is currently 7, so let's say the new start point is 8. 00:06:16.130 --> 00:06:18.130 And what we've effectively done again is ignored 00:06:18.130 --> 00:06:21.150 the entire left half of the array. 00:06:21.150 --> 00:06:23.750 >> Now, we repeat the process one more time. 00:06:23.750 --> 00:06:24.910 Calculate the new midpoint. 00:06:24.910 --> 00:06:29.350 8 plus 14 is 22, divided by 2 is 11. 00:06:29.350 --> 00:06:31.060 Is 23 what we're looking for? 00:06:31.060 --> 00:06:31.870 Unfortunately, no. 00:06:31.870 --> 00:06:34.930 We're looking for a value that is less than 23. 00:06:34.930 --> 00:06:37.850 And so in this case, we're going to change the end point to be just 00:06:37.850 --> 00:06:40.035 to the left of the current midpoint. 00:06:40.035 --> 00:06:43.440 The current midpoint is 11, and so we'll set the new end point 00:06:43.440 --> 00:06:46.980 for the next time we go through this process to 10. 00:06:46.980 --> 00:06:48.660 >> Again, we go through the process again. 00:06:48.660 --> 00:06:49.640 Calculate the midpoint. 00:06:49.640 --> 00:06:53.100 8 plus 10 divided by 2 is 9. 00:06:53.100 --> 00:06:54.750 Is 19 what we're looking for? 00:06:54.750 --> 00:06:55.500 Unfortunately, no. 00:06:55.500 --> 00:06:58.050 We're still looking for a number less than that. 00:06:58.050 --> 00:07:02.060 So we'll change the end point this time to be just to the left of the midpoint. 00:07:02.060 --> 00:07:05.532 The midpoint is currently 9, so the end point will be 8. 00:07:05.532 --> 00:07:07.920 And now, we're just looking at a single element array. 00:07:07.920 --> 00:07:10.250 >> What's the midpoint of this array? 00:07:10.250 --> 00:07:13.459 Well, it starts at 8, it ends at 8, the midpoint is 8. 00:07:13.459 --> 00:07:14.750 Is that what we're looking for? 00:07:14.750 --> 00:07:16.339 Are we looking for 17? 00:07:16.339 --> 00:07:17.380 No, we're looking for 16. 00:07:17.380 --> 00:07:20.160 So if it exists in the array, it must exist somewhere 00:07:20.160 --> 00:07:21.935 to the left of where we currently are. 00:07:21.935 --> 00:07:23.060 So what are we going to do? 00:07:23.060 --> 00:07:26.610 Well, we'll set the end point to be just to the left of the current midpoint. 00:07:26.610 --> 00:07:29.055 So we'll change the end point to 7. 00:07:29.055 --> 00:07:32.120 Do you see what just happened here, though? 00:07:32.120 --> 00:07:33.370 Look up here now. 00:07:33.370 --> 00:07:35.970 >> Start is now greater than end. 00:07:35.970 --> 00:07:39.620 Effectively, the two ends of our array have crossed, 00:07:39.620 --> 00:07:42.252 and the start point is now after the end point. 00:07:42.252 --> 00:07:43.960 Well, that doesn't make any sense, right? 00:07:43.960 --> 00:07:47.960 So now, what we can say is we have a sub array of size 0. 00:07:47.960 --> 00:07:50.110 And once we're gotten to this point, we can now 00:07:50.110 --> 00:07:53.940 guarantee that element 16 does not exist in the array, 00:07:53.940 --> 00:07:56.280 because the start point and end point have crossed. 00:07:56.280 --> 00:07:58.340 And so it's not there. 00:07:58.340 --> 00:08:01.340 Now, notice that this is slightly different than the start point and end 00:08:01.340 --> 00:08:02.900 point being the same. 00:08:02.900 --> 00:08:05.030 If we had been looking for 17, it would have 00:08:05.030 --> 00:08:08.870 been in the array, and the start point and end point of that last iteration 00:08:08.870 --> 00:08:11.820 before those points crossed, we would have found 17 there. 00:08:11.820 --> 00:08:15.510 It's only when they cross that we can guarantee that the element does not 00:08:15.510 --> 00:08:17.180 exist in the array. 00:08:17.180 --> 00:08:20.260 >> So let's take a lot fewer steps than linear search. 00:08:20.260 --> 00:08:23.250 In the worst case scenario, we had to split up a list of n elements 00:08:23.250 --> 00:08:27.770 repeatedly in half to find the target, either because the target element 00:08:27.770 --> 00:08:33.110 will be somewhere in the last division, or it doesn't exist at all. 00:08:33.110 --> 00:08:37.830 So in the worst case, we have to split up the array-- do you know? 00:08:37.830 --> 00:08:40.510 Log of n times; we have to cut the problem 00:08:40.510 --> 00:08:42.610 in half a certain number of times. 00:08:42.610 --> 00:08:45.160 That number of times is log n. 00:08:45.160 --> 00:08:46.510 >> What's the best case scenario? 00:08:46.510 --> 00:08:48.899 Well, the first time we calculate the midpoint, 00:08:48.899 --> 00:08:50.190 we find what we're looking for. 00:08:50.190 --> 00:08:52.150 In all the previous examples on binary search 00:08:52.150 --> 00:08:55.489 we've just gone over, if we had been looking for the element 15, 00:08:55.489 --> 00:08:57.030 we would have found that immediately. 00:08:57.030 --> 00:08:58.321 That was at the very beginning. 00:08:58.321 --> 00:09:01.200 That was the midpoint of the first attempt at a split 00:09:01.200 --> 00:09:03.950 of a division in binary search. 00:09:03.950 --> 00:09:06.350 >> And so in the worst case, binary search runs 00:09:06.350 --> 00:09:11.580 in log n, which is substantially better than linear search, in the worst case. 00:09:11.580 --> 00:09:16.210 In the best case, binary search runs in omega of 1. 00:09:16.210 --> 00:09:18.990 So binary search is a lot better than linear search, 00:09:18.990 --> 00:09:23.270 but you have to deal with the process of sorting your array first before you can 00:09:23.270 --> 00:09:26.140 leverage the power of binary search. 00:09:26.140 --> 00:09:27.130 >> I'm Doug Lloyd. 00:09:27.130 --> 00:09:29.470 This is CS 50.