[Section 3] [Less Comfortable] 

[Nate Hardison] [Harvard University] 

[This is CS50.] [CS50.TV] 

All right, let's get started. Welcome to Week 4 of CS50. If you guys open up a web browser and open up pset 3, Scramble with CS50, we're going to start going through the section of questions there. Just like last week, we'll be working in CS50 Spaces, if you'll also pull that up as well, and if you go ahead and visit this link that I've got up here at the top. It's time to get started. We've got our little hi program here. Nothing crazy. One of the first things I want to do with you guys today is go over a few solutions to Problem Set 1, kind of example solutions, just so you can get a feel for what kinds of code staff is writing, what kinds of code other students are writing, and have you take a look at it because I know it's weird when you submit a solution to a problem set and get comments on your own version, but sometimes it's helpful to see how other people did it, especially ones that are nice looking. For the most part, I was really impressed with the solutions that you guys produced. I haven't yet started looking at your Problem Set 2s, but if they're anything like the first, it means nothing but good things. 

If you look at my revisions, let's start all the way down at Revision 1, and we're going to take a quick look at a Mario solution. If you pull this up, these programs that we're going to present are correct. There weren't correctness issues with these problems, but rather, we want to talk a little bit about the different design issues that were being used here. One of the things that was interesting about the solution is that it used this new construct called pound define, sometimes also referred to as a hash define. Let me zoom in on it here. A #define allows you to give names to these numbers in your program. In this case, the maximum height of a pyramid in Mario was 23 and rather than putting 23 in my code— we would refer to that as hard coding 23— instead this gives the name MAX_HEIGHT to that number, so that down here in my do-while loop you can actually refer to MAX_HEIGHT instead of putting the number 23 in. [Student] What is the advantage of doing that? That's a great question. One is readability. An advantage of using this #define is readability. When I'm reading this code, I can see what's going on. 

I can see in this condition here that we're testing for the height being < 0, which we could have also defined to be a minimum height or a min height. The other advantage is that I can then read the rest of the line to see that we're also checking to make sure that height is not greater than the max height, because we're going to continue while the height is greater than the max height. The other advantage is—if I zoom out a little bit here— if I run this program and I run it, say, with 23 right now, it will print out all 23 rows just like that. But say I wanted to change the max height, and now I want to limit the maximum height of pyramids to be only say—man, that was funky. #include <stdio.h>, #define MAX_HEIGHT, and let's say we wanted to set it equal to 10. Now at this point, all I had to do was change it in this one location. I can recompile the code, and now if I try and type in 12, it will prompt me again. In this case, we're only using MAX_HEIGHT once. It's not that big of a hassle to go in and change it in the while loop if you need to. But in programs where you're referencing the same magic number over and over again, this #define mechanism is really handy because you just change it one time at the top of the file—it's typically where you put them— and the change percolates through the rest of the file. 

Other things I wanted to note in this assignment that I thought looked really nice, one was the naming of the variables. You see here that we've got integer variables called row and called height. Spaces, hashes, it helps make the code a little more readable, makes it a little more understandable what's actually going on. This is in contrast to using, say, random letters or just gobbledygook altogether. A final thing I'll point out is that in for loops, often these iterator variables, these counters that you use in your for loops, it's standard and conventional to start them with either i and then j and then k and going on from there if you need more variables, and this is just a convention. There are lots of conventions. It depends on the programming language you're using. But in C, we typically start with i. It doesn't make sense to use, say, a or b depending on the situation. That's it for this one. If you now pull up Revision 2, you'll see another Mario, and this one is similar to the other one that we just saw, but it does something kind of cool. If we look at this section right here inside the inner for loop, they're using some crazy looking syntax here right in this line. This is called a ternary operator. It is an if else statement condensed into one line. The condition is this part within parentheses. It's equivalent to saying if j < height - i - 1. And then what the contents of that if block would be are the space and then the contents of what the else would be are this #. It's essentially assigning a space to this variable. It's putting a space in the contents of the block variable, if this condition is met, and if the condition is not met, then the block variable gets this #. And then, of course, instead of building up an entire string and printing everything out at the end this solution prints it out one character at a time. Pretty cool. 

Another couple of things to look at. We'll move on to greedy. Now if we look at greedy, this first solution uses these #defines quite a bit. We've got one constant defined for each of the different numbers in this program. We've got one for cents per dollar, one for quarters, dimes, nickels, and pennies, and now if we scroll down and read the code, we can see a standard do-while loop printing everything out. Kind of the crux of this problem was realizing that you needed to convert the float that you read in from the user to an integer to accurately do the math, and this is because with floating point numbers, like we talked about in lecture briefly, it's not possible to accurately represent every single value on the number line because there are infinitely many values between 3 and, say, 3.1 even. You can have 3.01 and 3.001 and 3.0001, and you can keep going. It turns out whenever you're working with money, you often want to convert it into integer format so that you're not losing pennies and that kind of stuff. Doing that and rounding was key. This solution used a perfectly straightforward, great algorithm, which decremented the number of cents remaining, first by quarters, then by dimes, then by nickels, then by pennies, and adding to the number of coins each time. 

Another solution that we'll see, as I zoom out and go to Revision 4, had a very similar beginning but instead used div and mod right over here to calculate the number of cents. This, the number of quarters is equal to the number of cents divided by 25, and the reason this works is because we're doing integer division, so it's discarding any remainder. [Student] Do we have to comment the search? It really depends. [Student] You're commenting more than code right here. Yeah, and so there are a bunch of varying philosophies on this. My personal philosophy is that your code is really the truth, like your code is what's actually executing on the computer, and so your code should be as readable as possible to not necessitate as many comments. That said, when you are doing things that are kind of tricky mathematically or algorithmically, it's good to comment those so that you can add an extra dimension, an extra layer to whoever is reading your code. In these solutions, often they are commented more heavily just because we want to be able to distribute them and have people pick them up and read them pretty easily. But definitely, I would agree that this is heavy. [Student] But when in doubt, go heavier? When in doubt, go heavier. Some people will sometimes say return 0 or something like that. I think that's a ridiculous comment. Clearly that's what's happening. I don't need English to tell me that. Sometimes people will write stuff like "kthxbai!" That's kind of cute but also not— that's not making the difference between commenting points or not. Those kinds of comments are just ha, ha. Cool. 

At this point, let's start working on the Problem Set 3 section of questions. If you guys pull this up again, as with last week, we're not going to watch the shorts in this section. We'll let you guys do that on your own time and talk about the questions. But now in this section we're going to spend a little more time talking about less of the coding basics like we did last week, and instead, we're going to focus more on a little bit more of the theory, so talking about binary search and then sorting. From those of you who have been following along with the lecture, can somebody give me a recap of what the difference is between binary search and linear search? What's going on? Sure. Linear search searches through every element in the sorted list one by one by one by one by one, and binary search divides the list into 2 groups, checks if the keys value that you're searching for is greater than or less than the midpoint value that you just found, and if it's less than, it goes with the lower list and then divides that again, does the same function all the way down until it finds the midpoint to be equal to the value itself. Right. 

Why do we care? Why do we talk about binary search versus linear search? Yeah. Binary is a lot faster, so if you double the size of the problem it takes one more step rather than twice as many. Exactly. That's a great answer. Linear search is very much checking one element at a time, and as we saw on the very first day of lecture when David went through his phone book example and ripped out one page of the phone book at a time and kept doing that over and over and over again, it's going to take him a really long time to find anybody in the phone book, unless, of course, he was looking for somebody at the very beginning of the alphabet. With binary search, you can go a lot faster, and it's not just twice as fast or 3 times as fast or 4 times as fast. But the problem gets smaller and smaller and smaller much faster. To illustrate this, we'll start talking about what's going on when we write binary search. The problem at hand is that if I have an array of numbers, say, 1, 2, 3, 5, 7, 23, 45, 78, 12323, and then 9 with a ton of 0s after it, we want to be able to figure out really quickly what is in this array of numbers. I know this seems a little silly and a little contrived, because right now it is. We have an array that doesn't have very many elements in it, and if I ask one of you to figure out whether or not 23 is in the array, you can do that pretty quickly just by glancing at this and telling me yes or no. The analog to consider is imagine if this were, say, an Excel spreadsheet with 10,000 rows, 20,000 rows. Of course, you can do the command F or the control F and look something up. You can also use the filters and the search stuff, but if you had to look through that file line by line by line, it would take you a long time to find it. It's kind of like in the phone book example, too, where nobody looks through a phone book one page at a time. Typically, they do open it to the middle, or in the case of a lot of phone books and dictionaries where you actually have it keyed on the first letter, you flip to that first letter and open and start going through there. 

Remind me of your name again.>>Sam. Sam. Like Sam said, that linear search process is going to be really slow, and instead with binary search, the way this works is that every time we go through an iteration of our searching algorithm, we're going to divide the list in half, essentially, into two smaller lists. And then on the next iteration of the loop, we'll divide it again into other smaller lists. As you can see, the problem keeps getting smaller and smaller because we keep discarding half of the list every single time. How does this discard work? Just as a reminder, what we're going to do if we were a computer and we were, say, searching for the number 5 in this list is that we would pick a number in the middle. In the middle of this list, because there are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 numbers, we'd pick the number either at the 4th position or at the 5th position, and we'd call that the middle of our list. Pick number in middle. Then, just like Sam said, we'll test to see if that number is equal to the number that we want to get or our desired number. If it's equal, then we've found it. We win. If it's not equal, then there are a couple of cases. The two cases are either the number has to be greater than the number we're looking at, or it's less than. If it's greater, we move to the right. And if it's less, we move to the left. And then we repeat the whole process again on either the right half or the left half of the list. 

The first problem in today's section is to figure out how we can actually start to express this in C code. We've got the pseudocode here. What we'll start doing is I'll pull up a brand-new space, save this revision so that we have these notes for later, we'll delete all this, and then copy and paste from the problem set this information into our spaces, and hopefully this doesn't break. Perfect. If you guys all do that, copy and paste this code into your new space, into a blank one. Let's try Daniel. If you compile and run this program, does it work? No.>>What's it saying? It says the control reaches end of non-void function. Yeah, so let me try running it. Have you guys seen this before? Do you know what this means? Okay, let's dissect this a little bit. It's saying at file.c on line 9, column 1 we have an error, just like you said, and it says that it's stemming from the error warning and the return type warning. It looks like something is going on with the return type, which makes sense. We've got a non-void function, which means that we've got a function that doesn't return void. A void function is one that looks like this: void foo(), and it's void because the return type is void, which means that if we had something in here like return 1, we'd get a compiler error for this. However, we have a non-void function. Our non-void function in this case is our search function because it has a return type of bool. When it's saying that the control reaches the end of a non-void function, it's because search doesn't have a return statement. It's not returning anything of type bool. 

We can fix that, and what do you guys think search should return by default? What should be the default return value of search? Because that's what we can put at the end. Charlotte, do you have any—? True or false?>>True or false. Which one? False. I don't know. False? Let's try it. Why would you say return false? That's great intuition. [Charlotte] I don't know. We're going to return false in this case because this will be our default if for some reason the list is empty or the needle that we're looking for doesn't exist. Then at the very end, if we don't return true earlier in this function, we always know that this function will say nope, it's not in the array. It's not in the haystack. Now if we compile and run it—let me save this so we can pull it up. Now if we compile and run our program, it builds. We get our little prompt. If I hit 4—uh-oh. It didn't print out anything. It looks like everything ended okay. We've got to fill this in. We talked about the algorithm in pseudocode a little bit ago. Let me see, save this, and I'll pull that algorithm back up again. Let's hit this guy. Nope. There it is. How do we do this? What would be a good strategy for starting off this code? You have to pick a number in the middle. How do we pick a number in the middle of an array? Any suggestions? [Student] Strlen divided by 2. Strlen divided by 2. That's a great one. Strlen works with special kinds of arrays. What kinds of arrays? String arrays, character arrays. It's that same sort of concept that we want to apply, but we can't use strlen because we don't have an array of characters. We have an array of ints. But what does strlen get for us? Do you know what it gets for us? [Student] Strlen gets us the length. Exactly, it gets us the length. Strlen gets the length of the array for us. 

How do we get that in our binary search program? How would you get the length of an array? [Student] Strlen? You can get the length of a properly formatted C string array with strlen. The problem, though, is that we don't have a string array. If we look back at this code, we have this integer array. How do we know how long it is? [Student] Is there an equivalent one for endpoint, like int l or something? It turns out there actually isn't, and so in a way, this is one of those things that's just good to know about C, that there is no way to get the length of an array if all I give you is the array. The reason it works with strings, the reason strlen works, is because if a string is properly formatted, it will have that special \0 character at the very end. 

You can also imagine if you have an improperly formatted string and there's no \0 character there, then the whole thing doesn't work. [Student] Can you add the \0? We could in this case. We could add some sort of \0 or some sort of signifying character and then use that. But that's not quite going to work because the \0 is for a char type, and here we've got ints. The other thing is if we were to use a special value like -1 to mark the end of an array then we could never store a -1 in our integer arrays. We'd be stuck. It turns out that the only way to get the length of an array in C is to actually remember it when you set it up and then pass it around with the array so that whenever I have a function that's going to do some work on an array of integers or floats or doubles or what have you, I also need to give the function the array's length, and that's exactly what we've done here in the search function. If you look, what we've done when we pass in our array here, we also pass in the length, the size. It just happens that we have called this variable here, this parameter or argument. This is called a function's argument list or parameter list, and these are also called arguments or parameters. People use different terms at different times. I sometimes interchange them myself. It just so happens that this variable here is named similarly to this #define up here. But they're not the same thing. The capitalization does matter. 

If you look at what happens here, we declare our int array, which we've called numbers. We've given it our size, which corresponds to our #define up at the top. It's going to be 8. And then when we then call our search function down below, we pass in the number we want to search for, which we've prompted, gotten from the user. We pass in the array, this numbers, and then we also have to pass in the size of the array, and then the value of size 8 gets stored or passed to this integer variable called size. We have the size of the array. Now if we go back to what we were talking about earlier, I think Missy brought up the point that what we needed to do is get the length of the array and divide it by 2, and that will give us the midpoint. Let's see. Can I have somebody write this and save it in their space? How about Leila? Can I have you write this in? Write the first line where you take the length of the array and get the midpoint and store it in a new variable. I'll give you a couple seconds. Are you ready? [Student inaudible] Sure, could I have you calculate the midpoint of the haystack array inside the search function using the length of the haystack array, which is the size variable? Nothing tricky here. [Leila] Just size/2 and just— And save it, and hit the Save button up here at the top, and we'll pull it up. Perfect. There we go. Awesome. 

As is, will this compile? [Leila] No, it needs to be higher. [Nate] Yeah, so what do we need to do? [Leila] Like int midpoint or something. Awesome. Yeah, let's do that, int midpoint = size. Will this compile? Let's delete this comment and get it out of the way. What won't compile about this? We're not doing anything with integer, so we need to print it or something like that. Yeah, exactly. We'll get an unused variable. What else is not going to work about this? I think you said something, Sam. Semicolons. Yeah, I'm missing those semicolons. It's going to be a constant thing throughout the course of the term. The final thing I'll do is I'll put some white space on either side of this operator here, since that's typically how we do it according to our style guide. We've got the midpoint of our array. Now if we remember back to our algorithm, what was the second step that we had to do once we have the midpoint? [Student] If it's greater [inaudible]. Yeah, so we have to do some sort of comparison, and what are we comparing here? You said if it is greater than. What is it in that sentence referring to? The number that comes up, if that's greater than the midpoint, then go up to the array? Exactly, so the number that comes up when we— The needle, so we're comparing to the needle, and what are we comparing against the needle? Because the needle is what we're looking for. We're comparing it to get to the midpoint. 

But does it make sense to check to see if needle = midpoint? Does that make sense? Does anybody disagree? Let's give it a try, if (needle == midpoint). [Student] Do printf you found it. [Nate] Printf("We found it!\n"); Otherwise—I'm going to start doing something different here. I'm going to start putting braces around if statements all the time just because if we add more stuff, then we don't get the compilers. Yeah, Sam. You've got a point. The problem is that midpoint represents a position in the array, but you can get it to represent the value in that position of the array. That's a great point. Did everybody hear what Sam said? He said that midpoint as is represents just a position in the array, but it's not the actual element in the array. If you think about the code as written right now, if we look at this array down here, which has 8 elements in it, what is the value of midpoint going to be in this function? [Student] 4. [Nate] 4. If we look for the number 4— and we can just run this code and put a little sad face in here because we didn't find it—if we run this code as is right now, uploading it, building, let me scroll down, and if we look for the number 4, we found it, but we didn't get this to printf yes. One reason is that we didn't return true, but did we really find the number 4? And Sam is saying no. What did we find? We really found the midpoint, which if we look at the array down here, it's going to be the element at index 4 that we're looking at, which is 23. 

How do we actually get that element at the midpoint and not just the midpoint itself? [Student] We would enter char or something? What would that do, just out of curiosity? Can you elaborate a little more? You have to transform the position into the number, so you've got to make some connection—I think it's char, but it might not be. Yeah, that's a good point. We've been doing a lot of this converting positions into chars, these characters, in the first two problem sets. It turns out that here, this is almost similar to accessing the ith character within a string, if that makes sense. Here we want to access the midpoint element. How do we do that? Kevin, do you have any suggestions how we might do that? You could do haystack, open bracket, mid, closed bracket. Can you write that for us? Save it in here, and we'll pull that up. We're looking at this line 9, and we're realizing that we don't want to compare the needle to the midpoint, but instead, we want to compare the needle to the element at position midpoint within our haystack array. Cool. There we go. Yeah, that looks pretty good, if (needle == haystack[midpoint]). We found it. Now if we run the code—we'll back up a little bit— it compiles, it runs, and now if we look for 4, we didn't find it because now we're actually getting the number 23. We're getting the value 23, and that's what we're comparing to our needle. But that's good. That's a step in the right direction. 

That's what we're trying to do. We're not trying to compare the needle against positions in the array but rather against the actual elements in the array. If we look back again now at the next step in our algorithm, what is the next step? Leila already mentioned it briefly. [Student] Check to see if it's greater than or less than and then decide which way to move. [Nate] Yeah, so how would we do that? Can you put in some—I'll save this revision, and then if you put in some lines that will do that. Yeah, Charlotte.>>I have a question. Shouldn't it be midpoint - 1 because the first thing is it's 0 indexed, so if we put 4, that's not actually the character we're looking for? Yes, and the other problem with that is— that's a great catch, because what is going to end up happening possibly if we keep moving and we don't ever adjust initially? I guess what we might end up doing is trying to access the element at the 8th position of the array, which in this case doesn't exist. We will want to do some sort of accounting for the fact that we have some zero indexing. [Charlotte] Sorry, I meant midpoint - 1 in the square brackets. We can do that. We'll come back to this issue in just a bit. Once we start to get to the actual looping, that's when we'll really see this come into play. For the time being, we can do this, but you're totally right. That zero indexing will have an effect that we need to account for. Let's see. 

How is the greater than and less than—? [Student] I get how to do the greater than and less than part. I just wasn't sure what to print if you find that it is less than the haystack midpoint or greater than. Here I can save what I've— [Nate] Yeah, if you save what you've got, and we'll pull it up. There we go. [Student] And I put question marks for what I didn't know. [Nate] That looks great. Here we've got question marks because we still don't know what we're going to quite do yet. What would we want to do—oops, we've got some braces all funky on us. We'll correct these braces. There we go. And so what do we want to do, according to our algorithm, if we don't find the needle? Say in the case that the needle is less than what we're looking at. Kevin. Only look at the left half. Right, so we'll put a comment in here that says "look at left half." And if the needle is greater than the haystack at the midpoint, what do we want to do? [Student] Then you look at the right half. Look at the right half, "look at right half." Not too shabby. Okay, so at this point, things are looking pretty good. The problem with the code as written is what? [Student] You don't have endpoints for the halves. Right, we don't have endpoints for the halves. We also are only going to go through this once. We're only going to look at one midpoint. Either the element is there, or it's not. In order to complete this, we'll need to do some sort of repetition. We need to keep repeating until we find that either the element is in there because we've narrowed down and finally found it, or it's not in there because we've looked through all of the things in the appropriate halves of the array and found that nothing is in there. 

Whenever we've got this repetition going on, what are we going to use? [Student] A loop. Some sort of loop. Yes. [Student] Can we do a do-while loop and have it do that and then while the needle does not equal—I'm not sure where I was going with that. But kind of like do that as long as it doesn't equal the value that the user input. Yeah, so let's see, how might this write itself? You said let's use a do-while loop. Where does the do start? [Student] Right after the size/2. [Nate] Okay, and what are we going to do? We'll fill in the while later. What are we going to do? [Student] Don't we want to do all the stuff we have in the if portion? [Nate] Do all this stuff, great. Copy and paste. Oh, man. Let's see if this works, if we can tab this over. Beautiful. Okay, and we save this so you guys have it. All right, and we are going to do this while— what was the while condition you were after? [Student] While the needle does not equal, so like the exclamation point. But I'm not sure exactly what that is yet. [Nate] Yeah, this is one way to do it. Sam, do you have a comment? [Sam] I remembered when I looked at the videos, I took a screenshot of one of the—like when we did the pseudocode for it, there was some relationship between max and min. I think it was something like if max is ever less than min. Got it. [Sam] Or like if max is not less than min or something like that, because that would mean that you've searched everything. 

Yeah, so what does it sound like max and min were referring to? [Sam] Values that—integers that are going to change relative to where we put the midpoint. Exactly. [Sam] At that point, it's going to [inaudible] calculate the max and min. Midpoint is this max and min idea. Does that make sense to folks? If we were to start looking at how we're going to do this iteration, you're totally right that we want to use some sort of do-while loop. But I guess if we remember what's going on at the spot of this array and what's actually happening—I'm going to write over here— at the very first iteration of binary search, we have— I'm going to use b and e to denote the beginning. And then the end of our array. We know that the beginning is at 4 right over here, and we know that the end is at 108. Say we're searching for the number 15. The first time we do this, like we saw earlier, the midpoint is either going to be 16 or 23 depending on how we calculate things out. Since evenly dividing in the middle would give us this space between 16 and 23, we can't evenly divide it or divide it and get at a true midpoint. We'll look at 16. We'll realize "Hey, 16 > 15 that we're looking for." To then look at the left half of the array what we'll end up doing is discarding this entire upper portion and saying, "Okay, now our endpoint is going to be here." The next iteration of our loop, we're now looking at this array, effectively having discarded this portion because now if we're taking the midpoint to be the difference between the beginning and the end, we find our midpoint to be 8, which we can then test 8 to see where it is in relation to the number we're looking for, 15, find that 15 is greater, so we have to move to the right portion of the list, which we know because we're humans, and we can see it. We know that the right portion is going to be where we find it, but the computer doesn't know that, so what we'll do is we'll actually have this go up, and now the beginning and the end are the same spot, so the midpoint becomes the only number in the list at that point, which is 15, and we've found it. Does that shed some light on where this whole max and min notation is going, keeping track of the endpoints of the array in order to figure out how to narrow things down? 

What would happen if this weren't equal to 15 now? What if we were looking for 15 and, instead, this number were also 16? We'd say, "Oh, it's greater. We want to go back to the left." And we'd move our e to the right, at which point we have an endpoint that would be conflicting. It wouldn't be able to search for any more elements because now we have our endpoint and our beginning point, our max and our min, are now flipped. We search through the entire array. We can't find anything. That's the point at which we'd want to say, "Okay, we're going to stop this algorithm. We haven't found anything. We know it's not in here." How is this going? [Student] How exactly does the computer switch the end? How does the end end up before the beginning? The end ends up before the beginning because of the math that we're going to do each time we do this. The way we swap is if you look at the very first time we do this swap where we have the beginning at 4 and the end all the way down at 108 and our midpoint, say, at 16— I'm going to reset this back to 15—if we're looking for the 15, we knew that what we did when we checked the 16 and saw that it was greater and wanted to discard the entire right portion of the list, we saw that what we wanted to do is move this e right here. Effectively, the e got moved to one before the midpoint. Likewise, when we did this iteration of the algorithm and the midpoint was at 8, we found that 8 < 15, so we wanted to move the b one past the midpoint. Now, the beginning and the end are both together at this 15. 

If we'd been happening to look for some other value, not 15, or if this 15 had instead been a 16, we would have found that the e we want to move one before the midpoint. Now the e would be there flipped less than the b. Let's walk through how we actually end up coding this algorithm. We know that we want to have this midpoint calculation. We know also that we want to track the beginning and the end of the array of our current array so we can figure out where this left half of the list is and where the right half of the list is. We do that with either begin and end, or we can call them min and max. I'll use begin and end this time. When we begin, if we look back at our example down here, our beginning was set to the very beginning of the array, as natural. What index was this? What should our begin be? Daniel. [Daniel] Haystack[0]. [Nate] Yeah, so we could set it equal to haystack[0]. The problem, though, is that this gives us not the position of the first element. It gives us the index of the first element or the actual value at that first position. [Student] That will convert to .20? [Nate] What this will do is—well, it won't do any converting. What it will do is it will store a 4 in begin, and then it will be hard to make comparisons against begin because begin will be holding the value of 4, which is the beginning of our array, but we want to track the indices in the array as opposed to the values. We'll actually use a 0, like that. For the end of the array—Charlotte brought this up a little earlier. This is where we'll take into account the zero indexing. 

Charlotte, what's the end of the array? What is the index of the end? [Charlotte] Size - 1. Yeah, and which size should we use? Should we use capital size or lowercase size? Capital size. In this case, we could use capital size. If we wanted this function to be portable and use this function in other programs, we can actually use lowercase size. It's fine too. But Charlotte is totally right that we want to have size - 1. At this point— [Student] How is it that you can use uppercase size? How is it that we could use uppercase size? It turns out that these #defines are really, under the hood, a text like find and replace, if that makes sense. When you compile your code, the preprocessing phase of the compiler goes through the file, and it looks for everywhere that you've written capital size, and it replaces that text literally with an 8, just like that. In that sense, this is very different from a variable. It doesn't take up any space in memory. It's a simple text replace trick. In this case, we're going to use size. From here we do want to do some sort of repetition, and we're on the right track with our do-while loop. We want to do something until a condition doesn't hold anymore, and as we saw earlier, we saw that that condition was indeed that we don't want the end to be less than the begin. 

This is our stopping condition. If this occurs, we want to stop and declare like, "Hey, we haven't found anything." To express this, we do want to use some sort of loop. In this case, would it be a do-while loop, a for loop, a while loop? We have a do-while loop here. Do you guys like that approach? Do you think we should try a different approach? Kevin, any thoughts? We could have a while loop because we know maximum would be greater than min at the start anyways. Yeah, so there's no initialization that needs to happen. Those do-while loops are great when you have to initialize something before then testing, whereas here we know that we're not going to keep reinitializing both begin and end every round of the loop. We know that we want to initialize them, then check our condition. In this case, I'll actually go with a simple while loop. It turns out that do-while loops are used fairly infrequently. A lot of places don't even teach do while loops. They're good for handling user input, so we've seen a lot of them thus far. But normal for and while loops are a lot more common. It turns out that this condition as written won't really do us much good, and why is that? I'm sorry, I don't know your name. I'm Jerry.>>Sorry? It's B-O-R-U-I. Oh, okay. I don't see you on my list. Oh, it's because—oh, that makes sense. Do you have an idea of why this while loop might not work as intended, as written with the condition? [Jerry] You mean like you want all the stuff after it into the—? Yeah, so that's one. We might have to put all of this stuff into the while loop, which is totally true. The other thing that's a little more problematic, though, is that this condition doesn't work. [Student] You need to flip it. Right, so this condition won't ever be true initially the way we talked about it. We want to do something until end < begin, but we want to do something while begin ≤ end. 

There's that reversal of the logic there. I'm guilty of making those mistakes all the time. [Student] Why does it have to be less than or equal to? Because do you remember the case that we got to where there was only one element, and we were down, and we were looking at just the 15 in our array? And our beginning and our end were the same element. We want to make sure that we handle that case. If we did a straight less than, we would only be able to get down to a 2-element array. Once we got down to that last element, if that were our element, we'd never find it. Now here, we can do exactly like you were saying. We can start plopping stuff right into the middle of our while loop. We can plop in our midpoint. We can take all of these if statements, pull them out of this do-while loop, plop them in, clean things up a little bit, and I'll go ahead and save this revision. And at this point, we're getting pretty close. Sam. I think you also have to have int midpoint = size - 1 / 2. Got it, size - 1 / 2. Is there anything else we need to change about that line? That was a good catch. 

What does size do? Are we ever changing size? In order to keep the line like this, we have to change the size. We have to change the size every time we go around the for loop. But remember when we were going through our example just a little bit earlier, and we had the beginning at 4 and the end all the way over at 108? How did we calculate the midpoint? Were we using the size? Or we were using begin and end instead? It's the difference between the end and the beginning. Exactly, and how exactly should I write that, Charlotte? Just end - begin. You wouldn't need to do the - 1 because the - 1 has been included in the end and the begin already. [Nate] Great, you're totally right. We don't have to do the - 1 because that - 1 has been included and accounted for when we initialize the end variable. 

Is there anything else I need to do syntactically to have this line make sense? [Student] Plus begin.>>Plus begin? [Student] At the end. Because it's only calculated half the length. You need to add the begin. [Nate] What would this calculate for us? If we think about end on this very first iteration of the loop, end is going to be in position index 7. Begin is in position 0. Remember, we're looking for either position 3 or position 4. If we look at this math, just to make it a little more tangible, put some numbers here, we have 7, 0, so 7 - 0, and then / 2 is 3 in integer division, that is. Then do we need to then add back our begin? We don't in this case. On the very first iteration, it will be fine because begin is 0. But as we progress, we do really all just need end - begin / 2. There's one other trick here, and that is namely one of precedence. [Student] Do we need parentheses? [Nate] Exactly, and that's because if we don't put these parentheses, then this line will be interpreted instead as (end) - (begin / 2), which we definitely don't want. Watch out for those precedence rules. [Student] Why isn't it end + begin? Why isn't it end + begin? [Student] Why is it not that? Why would it be +? I think you're right. [Student] Because it's average? [Nate] End + begin, you're totally right. Wow, I totally goofed. You're right. If we were doing the minus, we would want to add the begin back in. In this case, you're very right that we want to take the average of the two, so we do want to add them, as opposed to subtract them. [Student] It would also work if you did end - begin / 2 + begin. It would if we do—I believe so. 

For example, if we were looking at begin, and we shifted it over here to the 15. Now begin is at position 2. End is at position 7. If we subtract them, we get 5. Divide that by 2, we get 2. And then we add 2 back in, and that gets us to the 4th position, which is right here, which is the midpoint. [Student] Do we need to take care of wrapping? In what sense do we need to take care of wrapping? If the sum or the difference between depending on how we do it is not an even number. Then the computer gets confused whether when it's 2.5; do you move to the left or to the right to determine which is the midpoint? Got it. It turns out that with integer division, we don't ever get these floating point numbers. We never get the decimal. It's totally discarded. If you have a computer divide two int variables, and one is 7, and the other is 2, you won't get 3.5 as a result. It will get 3. The remainder will be discarded, so it's effectively rounding— not a round but rather a floor, if you guys are familiar with that in math, where you completely discard the decimal, and so you're essentially truncating it down to the nearest whole position, to the nearest whole number. [Student] But then that's problematic because if you have an array of 7 elements then that automatically takes the 3rd element out of the midpoint instead of the 4th. How do we deal with that? It's problematic because if we had an array of 7, it would pick the 3rd instead of the 4th. Could you explain a little more? [Student] Because if you have 7 elements then the 4th element would be the midpoint, right? Remember your comment about being zero indexed, though. [Student] Yeah, so in position 3. That would be the midpoint. Yeah. Oh, okay. I see what you mean. It's kind of weird, as we get used to this whole notion of getting rid of decimals. That's a great point. Let's finish this up. We've calculated our midpoint. 

We're testing to see if our needle is equal to the middle value. We're printing that we found it, but really, what do we want to do in this situation? We've found it, so we want to let the caller know that we found it. We've got a function that's a boolean typed function. The way we signal to the caller of our function that we're ready to go is we say, "Hey, this is true." How would we do that, Kevin? You're nodding your head.>>[Kevin] Add return true. [Nate] Exactly, return true. Now, if it's not equal, how would we look at the left half? Any ideas? Stella, any ideas? You need to set a new position for end. Yeah. So we have to do position of midpoint  - the end. Great. We need to set a new position for the end to look at the left half. This was what we talked about before where I keep going back to this example. I have the begin here, and then I have the end all the way over here. 

Again, if we're looking for 15, and our midpoint is at 16, and we realize, "Oops, 16 is greater. We want to move to the left half." We would then move the end to the 15, and we do that by taking one away from the midpoint and setting that as our new end. Likewise, if we want to look at the right half, how would we do that? Do you have an idea? [Student] You just set begin to midpoint + 1. [Nate] Great. And now in the case that we don't find anything, does that get taken care of for us? Daniel, does that get taken care of for us? [Daniel] No. [Nate] If we make it through the entire array and we don't find anything, where would that be taken care of, or should we take care of it? [Daniel] The while condition. [Nate] Yeah, the while condition, exactly. It will take care of going through the entire array if we don't find anything. This while loop will end. We will never have encountered this condition, and we can return false. We can also leave this if in here like this because if this if statement is true, and our function will return, and so we'll essentially abort this function at this point when we return true. But what happens with this structure here? Will this work entirely, or is there some logical flaw in there? 

There is some logical flaw in there, with the way it's set up. What might it be? [Student] Why do you need the - and + 1s? That sets our array up to be our new left half and right half. [Student] But why couldn't you do it without the - 1s and + 1s? [Nate] We could set it equal to the midpoint? What might be problematic about that? [Student] I guess it's inefficient because you're checking a value that's already been checked. [Nate] Exactly, so Sam is totally right. If you set the end and the begin equal to the midpoint instead of - 1 and + 1 reflectively, at some point in the future we'll end up checking the midpoint again. [Student] I started the pset, and then I had something like that where I forgot the + 1, and it got stuck in an infinite loop. Right, because at some point you're never going to get begin and end to actually overlap. Cool. There's one more logical flaw, and that is that this should definitely be an else if. Why might that be? 

The reason is if it's not an else if—did you see it, Kevin? [Kevin] Yeah, because you're changing the end point. [Nate] Exactly. We're changing the endpoint, and if it's written like this—we'll make spaces between— it will check this case. This case, if it succeeds, will abort out of the function. Then it will check this next case, and if this succeeds, it will adjust the endpoint, and then it will continue on and check this case. But at this point, we don't want it to continue checking. Fortunately, we haven't reset the midpoint here, and we know that this case won't succeed. But we definitely want to put the else if in there even though that might—in this case since we're not adjusting the midpoint, would that make a difference? No, because these cases are all exclusive. Again, my bad. We don't, I think, need this else if. We can give it a try and run it and see what happens. Building, an error occurred. It's probably because I left these b's and e's in here. Do I have any more of those up at the top? It doesn't look like it. We zoom out, build, there it goes, so now if we search for 15, yes. Let me zoom in. 15, yes. We can run it again. Uploading source code, building, running. We can search for something like 13, and we don't get anything printing out, so it's not finding that for us. That's great, because it's not in our list. 

We are now out of time. That's going to be it for this week. Thanks for joining, and see you later. 

[CS50.TV]