[Section 4] [Less Comfortable] [Nate Hardison] [Harvard University] [This is CS50.] [CS50.TV] All right, welcome back to section. In this week's section we're going to do a couple of things. We're going to first recap Problem Set 2, which is the Caesar and Vigenère problem set. And then we're going to dive into Quiz 0 review and spend a little bit of time recapping what we've talked about in each of the lectures so far, and we'll also do a few problems from previous year's quizzes. That way you guys have a good way to prepare for that. To start, I've booted up a couple of good solutions for the previous problem set, Problem Set 2, into this space. If you guys all hit this link, and if you click my name and click on my first revision you'll see caesar.c, which is exactly what I'm looking at. Let's talk about this really quickly. This is just a sample solution. This is not necessarily the perfect solution. There are many different ways to write this, but there are a few things that I wanted to highlight that I saw as I was grading, common mistakes that I think this solution does a very good job of handling. The first is having some sort of header comment at the top. On lines 1 through 7 you see the details, what exactly this program is doing. A good standard practice when you're writing C code regardless if your program is contained within a single file or whether it's split over multiple files is to have some sort of orienting comment at the top. This is also for people who go out and write code in the real world. This is where they'll put copyright information. Below are the #includes. On line 16 there's this #define, which we'll come back to in just a bit. And then once the function starts, once main starts, because this program has been all contained in a single function the very first thing that happens—and this is very idiomatic and typical of a C program that takes in command line arguments—is that it immediately checks for the argument count, argc. Right here we see that this program is expecting 2 arguments exactly. Remember there's that first argument that's the special one that's always the name of the program that's being run, the name of the executable file. And so what this does is it prevents the user from running the program with more or fewer arguments. The reason we want to check for this right away is because we can't actually access this argv array right here reliably until we've checked to see how big it is. One of the common errors I saw was people would immediately go in and grab argv[1]. They'd grab the key argument out of the array and do the a to i check on it, and then they'd do the test for argc as well as the next test, whether or not the first argument was indeed an integer at the same time, and that doesn't work because in the case that there are no arguments supplied you'll be grabbing an argument that isn't there or attempting to grab one that isn't there. The other big thing that you should notice is that you always want to print out some sort of helpful error message to the user to orient them. I'm sure you've all run programs where all of a sudden it crashes, and you get this ridiculous little dialog that pops up and says something horribly cryptic and maybe gives you an error code or something like that that makes no sense. This is where you really want to provide something helpful and targeted to the user so that when they run it they go "Oh," face palm. "I know exactly what to do. I know how to fix this." If you don't print a message, then you end up actually leaving the user to go examine your source code to figure out what went wrong. There are also some times that you'll use different error codes. Here we just used one to say there was an error, there was an error, there was an error. Bigger programs, often programs that are called by other programs, will return some sort of special error codes in different scenarios to programmatically communicate what you would otherwise just use a nice English message for. Cool. As we work down, you can see we pull the key out. We test to see if the key fits. We get a message from the user. The reason we do it in this do while loop—and this is something that we will cover in a little bit—but it turns out that if you type control D when you get that GetString prompt on the terminal what that actually does is it sends a special character to the program. It's called the ELF or the end of file character. And in that case, our message string will be null, so this wasn't something we checked for in the problem set itself. But as we go on, now that we've started to talk about pointers and dynamic memory allocation on the heap, checking for null whenever you have a function that might return null as a value is something that you'll want to get in the habit of doing. This is here primarily for illustration. But when you do see GetString in the future, so from Problem Set 4 on, you'll want to keep this in mind. Again, this is not an issue for Problem Set 3 either since we hadn't covered it yet. Finally, we get to this part where we get to the main encryption loop, and there are a couple of things going on here. First, we iterate over the entire message string itself. Here we've kept the strlen call in the condition, which a number of you have pointed out is not a great way to go. It turns out in this case it's also not great, partly because we're modifying the contents of the message itself inside the for loop, so if we have a message that's 10 characters long, the first time we start that for loop strlen will return what? 10. But if we then modify message, say we modify its 5th character, and we throw in a \0 character in the 5th position, on a subsequent iteration strlen(message) won't return what it did the very first time we iterated, but it will instead return 5 because we threw in that null terminator, and the string's length is defined by the position of that \0. In this case, this is a great way to go because we're modifying it in place. But you notice that this is actually surprisingly simple to encrypt if you can get the math correct. All that's required is to check whether or not the letter that you're looking at is uppercase or lowercase. The reason we only have to check for that and we don't have to check for the is alpha case is because if a character is uppercase or if it's lowercase then it's definitely an alphabetic character, because we don't have uppercase and lowercase digits. The other thing we do—and this is a little tricky— is we've modified the standard Caesar cipher formula that we gave in the problem set specification. What's different here is that we subtracted in the uppercase case capital A, and then we added capital A back in at the end. I know a few of you have done this in your code. Did any of you do this in your submissions? You did this. Can you explain what this does, Sahb? By subtracting it out, because you did a mod right after it, you have to take it out, so that way you get [coughing] position. And then by adding it back later you shifted over the one that you wanted. Yeah, exactly. What Sahb said was that when we want to add our message and our key together and then mod that, mod that by NUM_LETTERS, if we don't scale our message into the appropriate 0 to 25 range first, then we might end up getting a really weird number because the values that we're looking at when we look at message[i], when we look at the ith character of our plain-text message, is a value somewhere in this 65 to 122 range based on the ASCII values for uppercase A through lowercase z. And so when we mod it by 26 or by NUM_LETTERS, since that was our #define at the top right up here, that's going to give us a value that's in the 0 to 25 range, and we need a way to then scale that back up and get it in the appropriate ASCII range. The easiest way to do that is to just scale everything down into the 0 to 25 range to begin with, and then shift everything back up at the end. Another common error that I saw people run into is that if you don't actually do this scaling right away and you add message and key together and you add them, say, into a char variable, the problem with that is since message[i] is a relatively big number to begin with— remember it's at least 65 if it's an uppercase character— if you have a large key, say, something like 100, and you add those 2 together into a signed char you're going to get an overflow. You're going to get a value that's larger than 127, which is the largest value that a char variable can hold. Again, that's why you'd want to do that sort of thing to begin with. Some people got around that case by doing an if else and testing to see if it would overflow before doing that, but this way gets around that. And then in this solution we printed out the whole string at the very end. Other people printed out a character at a time. Both are awesome. At this point, do you guys have any questions, any comments about this? Things you like, things you don't like? I had a question. Maybe I missed it during your explanation, but how does this program skip the spaces for connecting the key to the length of the text? This is just Caesar cipher.>>Oh, sorry, yeah. Yeah, we'll see that. In the Caesar cipher we got around that because we only flipped characters. We only rotated them if they were uppercase or lowercase. You guys feeling pretty good about this? Feel free to copy this home, take it, compare it to what you guys wrote. Definitely feel free to send questions about it too. And again, realize that the goal here with your problem sets is not to get you guys to write perfect code for your problem sets. It's a learning experience. Yeah. Back to the do while loop, if it equals null, so null just means nothing, they just hit enter? Null is a special pointer value, and we use null when we want to say we have a pointer variable that is pointing to nothing. And so typically it means that this variable, this message variable is empty, and here, because we're using the CS50 special string type, what is the CS50 string type? Have you seen what it is when David pulled back the hood in lecture? It's a funky—it's a pointer, right? Okay, yeah.>>It's a char*. And so really we could replace this right here with char* message, and so the GetString function, if it doesn't successfully get a string from the user, it can't parse a string, and the one case in which it can't parse a string is if the user types the end of file character, the control D, which is not something you typically do, but if that happens then the function will return this null value as a way of saying "Hey, I didn't get a string." What would happen if we don't put message = null, which is something that we haven't been doing yet? Why would that be a problem here? Because I know that we talked a little bit in lecture about memory leaks. Yeah, let's do that, and let's see what happens. Basil's question was what happens if we don't actually have this message = null test? Let's scroll up to the top. You guys can comment this out. Actually, I'll save it in a revision. This will be Revision 3. What you'll have to do to run this program is you'll have to click this gear icon up here, and you'll have to add an argument to it. You'll have to give it the key argument since we want to pass in a command line argument. Here I'm going to give it the number 3. I like 3. Now zooming back out, running the program. It's running, compiling, building. Here we go. It's waiting to be prompted. If I type in something like hello—where did that go? Oh, my program took too long to run. I was jawing for too long. Here it goes. Now I type in hello. We see that it encrypts appropriately. Now what happens if we do prompt GetString to return null? Remember, I said that we did that by pressing control D at the same time. I'll scroll up here. We'll run it again. Building. There it goes. Now when I hit control D I got this line that says opt/sandbox50/bin/run.sh, Segmentation fault. Have you guys seen that before? [Student] Why is there no—>>Sorry? [Student] Why is there no core dump in this case? The core dump is—the question is why is there no core dump here? The question is that there may be, but the core dump is a file that gets stored on the hard drive. In this case we've disabled core dumps on the run server so that we don't have people seg faulting and building up tons of core dumps. But you may get one. Core dumps are the sort of thing that you can often disable, and sometimes you do. The segmentation fault, to answer your question, Basil, is saying that we tried to access a pointer that wasn't set to point to anything. Remember Binky in the video when Binky tries to go access a pointer that's not pointing to anything? In this case I guess technically the pointer is pointing to something. It's pointing to null, which is technically 0, but that is defined to be in a segment that is not accessible by your program, so you get a segmentation fault because you're not accessing memory that's in a valid segment like the heap segment or the stack segment or the data segment. Cool. Any more questions about Caesar? Let's move on. Let's look at Revision 2 really quickly. That's Vigenère. Here in Vigenère we'll walk through this one pretty quickly because, again, Vigenère and Caesar are quite similar. Header comment is before, #define is before to avoid using these magic numbers. The nice thing is say we wanted to move to a different alphabet or something like that. Rather than having to go manually change all the 26's in the code we could change this to 27 or drop it down if we were using different alphabets, different languages. Again, we've got this check of the argument count, and really you can almost take this as a template. Pretty much every program you write should have— if it takes command line arguments—some sequence of lines that reads like this at the very beginning. That's one of the first sanity tests you want to do. Here what we did was we made sure that the keyword was valid, and that was the second check that we did. Notice again that we separated this from argc and 2. Note that in this case one thing that we had to do was instead of using a to i we wanted to validate the entire string, and in order to do that you actually have to go character by character over the string. There's no good way to call something on it because even, for example, a to i will return 0 if it can't parse an integer, so that doesn't even work. Again, nice message telling the user exactly what happened. Then here, again, we also handle the case where the user types in a control D random character. And then Charlotte had a question earlier about how we manage to skip spaces in our string here. This was kind of similar to what we did with the Myspace program that we did in section, and the way this worked is that we tracked the number of letters that we'd seen. As we walked over the message string, as we walked over character by character, we tracked the index as part of our for loop, and then we also tracked the number of letters, so non-special characters, non-digits, non-white space that we'd seen in the separate variable. And then this solution modifies the key to get an actual key integer, and it does that on the fly, right before it then goes to encrypt the actual message character. There are some solutions that were perfectly great too that would modify the key up when testing for the key's validity. In addition to making sure that the character and the keyword was an alphabetic character it also turned that into an integer in the 0 to 25 range to then skip having to do that later on in this for loop. Again, you see here this is really the exact same code that we used in Caesar at this point. You're doing the exact same thing, so the real trick is figuring out how to turn the keyword into an integer. One thing that we did here that is a little dense is we repeated this phrase, I guess you could call it, 3 separate times on lines 58, 59, and 61. Can somebody explain what exactly this phrase does? It's accessing a character, like you said. Yeah, it's [inaudible] a character in the keyword, and so it's number of letters seen because you're only moving along the keyword once you've seen the letter, so that's going to effectively skip spaces and stuff like that. Yeah, exactly. And then once you've seen the keyword blank you just mod so you move back around. Exactly. That's a perfect explanation. What Kevin said is that we want to index into the keyword. We want to get the num_letters_seen character, if you will, but if num_letters_seen exceeds the length of the keyword, the way we get back into the appropriate range is we use the mod operator to effectively wrap around. For example, like in the short, our keyword is bacon, and it's 5 letters long. But we've seen 6 letters in our plain text at this point and encrypted 6. We will end up accessing the num_letters_seen, which is 6, mod the length of the keyword, 5, and so we'll get 1, and so what we'll do is we'll access the first character inside of our keyword at that point. All right, any questions on Vigenère before we move on? You guys feeling pretty good about this? Cool, great. I want to make sure that you guys are getting the chance to see code that we think looks good and have the chance to learn from it. This is going to be the last we'll be using spaces for the time being, and we're going to transition now, and I'm going to go to cs50.net/lectures so we can do a little bit of quiz review. The best way I think to start doing quiz review is to come to this Lectures page, cs50.net/lectures, and underneath each of the week headings, so if I look here at Week 0, I see that we have a list of topics that we covered in Week 0. If any of these topics seem unfamiliar to you you'll definitely want to go back and scour the lecture notes and possibly even skim through the lectures, watch them again if you want to get a feel for what's going on with each of those topics. I will say additionally this year one of the cool resources we've got is these shorts that we've created, and if you look at Week 0, we don't have all of the topics covered, but we've got quite a few of them, some of the trickier ones, so watching these shorts again is a good way to get you up to speed. In particular, I'm going to put in a plug for the 3 on the bottom, since I did those. But if you're struggling with binary, bits, hex, that kind of stuff, binary is a great place to start. ASCII is another one that's good to view too. You can even watch me at 1.5x speed if I'm going too slow for you. Since it's review, feel free to do that. Just to start really quickly, we're going to go through a couple of these quiz problems just to quickly churn through these. For example, let's look at problem 16 that I've got right up here on the board. We've got this following calculation in binary, and we want to show any work. Okay, I'm going to give this a shot. You guys should follow along with paper, and we'll do this really quickly. We want to perform the following calculation in binary. I've got 00110010. And I'm going to add to it 00110010. For the math geniuses following along at home, this is effectively multiplying by 2. Let's start. We're going to follow the same addition algorithm that we do when we add decimal numbers together. Really the only difference here is that we loop back around once we have 1 + 1 instead of once we get to 10. If we start from the right, really quickly, what's the first digit? [Student] 0.>>[Nate H.] 0. Great, the second digit? [Student] 1. [Nate H.] Is it a 1? 1 + 1 is? [Student] 10. [Nate H.] Exactly, so what is the digit that I write right beneath the 2 ones added together? [Student] 1, 0, or 0 and then carry the 1. [Nate H.] 0 and carry a 1, exactly. Next one up, Basil, you're up. What's the third?>>[Basil] 1. [Nate H.] 1, perfect. Kevin? [Kevin] 0.>>[Nate H.] 0, Charlotte? [Charlotte] 0.>>[Nate H.] Yeah, and what do I do? [Student] The 1. [Nate H.] And what do I do? And then I carry the 1. Perfect, Sahb?>>[Sahb] Now you have 1. [Nate H.] And do I do anything here? [Sahb] Then for the next one you have 1 because you carried over 1. [Nate H.] Great, so here we can finish it up. Cool. [Student] Does 0 + 0 = 0? 0 + 0 = 0. 1 + 1, like you said, is 10, or 1, 0, rather. 10 is a misnomer because to me 10 means the number 10, and it's the quirk of how we're representing it when we're writing it. We represent the number 2 by 1, 0, and the number 10 is slightly different. What's kind of nice about binary is that there really aren't that many cases you need to learn. There's 0 + 0 = 0, 0 + 1 = 1, 1 + 1 is 0, and then carry a 1, and then you can see here on the third column from the right we had this 1, 1, and 1. And 1 + 1 + 1 is a 1, and you carry another 1. When you're doing binary addition, pretty simple. I'd do a couple more of these to sanity check yourselves before you go in because this is probably something that we'll see on the quiz. Now let's do this next one as well. Let's do problem 17. We're going to convert the following binary number to decimal. I've got 10100111001. Remember in the binary video that I did I walked through a couple of examples, and I showed how everything works when you're doing it in decimal. When you're working in decimal representation I think we're at this point in our lives so fluent in it that it's pretty easy to gloss over the mechanics of how it actually works. But to do a quick recap, if I have the number 137 this really means—and again, this is in decimal representation— the number 137 in decimal means that I have 1 x 100 + 3 x 10 + 7 x 1. This is all staying on the screen. And then if you look at these numbers right here, 100, 10 and 1, you see that they're actually all powers of 10. I have 10², 10¹, and 10 to the zero. We have a similar sort of thing in binary, except that our base, as we call it, is 2 instead of 10. These 10s that I wrote down here at the bottom, this 10², 10¹, 10 to the zero, 10 is our base, and the exponent, 0, 1, or 2, is implied by the position of the digit in the number that we write. 1, if we look at it, this 1 is in the 2nd position. The 3 is in the 1st position, and the 7 is in the 0th position. That's how we get the various exponents below for our bases. Following all of this we'll—actually, you know what? We'll do—where did my undo button go? There it goes. I love this undo thing. Following this I think for me at least the easiest way to start converting a binary number or a hexadecimal number where the base is 16 and not 10 or 2 is to go ahead and write out the bases and exponents for all of the numbers in my binary number at the top. If we start from left to right again, which is kind of counterintuitive, I'll change back to black here, we have the 2 to the 0th position, and then we have 2¹, 2², and then 2 to the 3, 2 to the 4, 2 to the 5, 6, 7, 8, 9, and 10. These numbers I've written out are all the exponents. I only wrote the bases here in the first 3 just for space. At this point I'm going to go ahead and I'm actually going to erase the stuff that we did in decimal, if that's okay. You've all got that. Those of you watching online I'm sure will be able to rewind me if you'd like. Switching back to the pen. Now, what we can do—if you guys aren't totally up to speed on your powers of 2, that's totally cool. It happens. I understand. I once had a job interview where I was told I should know all powers of 2 up through 2 to the 30th. It was not a job I got. Anyway, you guys can go ahead and do the math here, but with binary it doesn't really make sense, and nor does it make sense with decimal or hexadecimal either, to do the math out where you have zeros. You can see I have 0 here, a 0 here, 0 here, 0 here, 0 here, 0 here. Why might it not make sense to do the actual math to calculate the appropriate power of 2 for that position? Exactly, like Charlotte said, it will be 0. Might as well save yourself the time if calculating powers of 2 isn't your strong suit. In this case we only need to calculate it for 2 to the 0 which is—? [Student] 1. [Nate H.] 1, 2 to the 3 which is—? [Student] 8.>>[Nate H.] 8. 2 to the 4? [Student] 2. I'm sorry, 1. [Nate H.] 2 to the 4 is 16, exactly. 2 to the 5, Kevin?>>32. [Nate H.] 32, 2 to the 8? [Student] 32 x 8, 256. [Nate H.] Perfect. And 2 to the 10? [Student] 1024. [Nate H.] Yeah, 1024. Once we've got these numbers we can sum them all up. And this is where it's really important to do a couple of things. One is go slow and check your work. You can tell that there's a 1 at the end of this number, so I should definitely get an odd number as my result, because all the other ones are going to be even numbers given that it's a binary number. The other thing to do is if you get to this point on the test and you've written it out this far and you're running out of time look at the number of points that this problem is worth. This problem, as you can see—if I flip back to my laptop really quickly— this problem is worth 2 points, so this is not the sort of addition you should be going through if you're really pressed for time. But we'll switch back to the iPad, and we'll go through it really quickly. I like doing the small numbers first because I find that easier. I like 32 and 8 because they go together pretty easily, and we get 50. 16 and 1 gets 17. There we get 57, and then we can do the rest of this, so we can do 57, 156. Come on. Man, well, let's see. We had 57, 256, and 1024. At this point, I'd rather just go through. I have no clue. I clearly need to read up on this. 7, 6, and 4, you get 17. 1, 5, 5, 2, 13. Then we get 3, and then we get 1. 1337. Easter egg, anybody? Anybody recognize this number? Chris recognizes the number. What does it mean, Chris? [Chris] Leet. Leet, so if you look at this, it looks like leet. Hacker stuff. Watch out for that kind of stuff on the midterm or the quiz, rather. If you see that kind of stuff and you're wondering "Huh," that might actually mean something. I don't know. David likes putting it in. It's a good way to sanity check it. Like okay, I can see what's going on. That's Week 0/Week 1 stuff. If we switch back to our laptop now, zoom out, and a couple of other things. There's ASCII, which we've been doing a lot of with the problem sets. This notion of capital A. What is that really? Knowing it's the decimal integer. 65 is what it's mapped to in the ASCII table, and that's therefore how the computer writes it, and that's how we've been getting away with actually writing the character capital A and the character lowercase a in some of these solutions and problem sets that you've been doing. A couple of other things. We've got statements, boolean expressions, conditions, loops, variables and threads. Those all seem to make sense for the most part? Some of this terminology is a little funky at times. I like to think of a statement as for the most part something that ends with a semicolon. Statements such as x = 7, which sets a variable, presumably called x = 7. Presumably x is also a type that can store the number 7, so it's an int or possibly a float or a short or a char, something like that. A boolean expression is using these double equals and the bang equals or the not equals, less than, greater than, less than or equal to, all that kind of stuff. Conditions then are if else statements. I would remember that you can't have an else without a corresponding if. Likewise, you can't have an else if without a corresponding if. Loops, recall the 3 kinds of loops we've been hammering into you for the last couple of sections and problem sets. Using do while when you're getting user input, using while loops until a particular condition is true, and then using those for loops if you need to know which iteration of the loop you're currently on is how I think about it. Or if you're doing a for each character in a string I want to do something, for each element in an array I want to do something to that element. Threads and events. These we haven't covered so explicitly in C, but remember this from Scratch. This is the notion of having different scripts. This is also this notion of broadcasting an event. Some people didn't use broadcasting in their projects initially, which is totally cool, but these are 2 different ways of handling this larger issue called concurrency, which is how do you get programs to execute or seemingly execute at the same time? Different tasks running while other tasks are also running. This is how your operating system seems to work. This is why even though, for example, I've got my browser running, I can also turn on Spotify and play a song. That's more of a conceptual thing to understand. I would take a look at the threads short if you'd like to learn more about that. Let's see, I believe there might have been a problem on this in one of these. Again, I think threads and events are not something that we will cover in C just because it's significantly more difficult than in Scratch. You shouldn't worry about it there, but definitely understand the concepts, understand what's going on. Before we move on, any questions on Week 0 material? Everybody feeling pretty good? Understanding variables and what a variable is? Moving on. Week 1. A couple of things here that weren't particularly covered in the quiz review necessarily and also are more conceptual things to think about. The first is this notion of what source code, compilers and object code are. Anybody? Basil. Is object code—I mean source code is what you put into clang, and object code is what clang puts out so that your computer can read the program. Exactly. Source code is the C code that you actually type up. Object code is what you get out of clang. It's the 0s and 1s in that binary format. Then what happens is when you have a bunch of object files, say you're compiling a project or a program that uses multiple source code files, which by convention are given the .c file extension. That's why we have caesar.c, vigenère.c. If you're writing Java programs you give them the extension .java. Python programs have the extension .py often. Once you have multiple .c files, you compile them. Clang spits out all this binary junk. Then because you only want 1 program you have the linker link all of these object files together into 1 executable file. This is also what happens when you use the CS50 library, for example. The CS50 library is both that .h header file that you read, that #includecs50.h. And then it's also a special binary library file that's been compiled that is 0s and 1s, and that -l flag, so if we go back to our Spaces and we look really quickly at what's going on here when we look at our clang command, what we've got is this is our source code file right here. These are a bunch of compiler flags. And then at the very end, these -l flags link in the actual binary files for these 2 libraries, the CS50 library and then the math library. Understanding each type of files' purpose in the compilation process is something you'll want to be able to give at least a high level overview of. Source code comes in. Object code comes out. Object code files link together, and you get a beautiful, executable file. Cool. This is also where you can get errors at multiple points in the compilation process. This is where, for example, if you take out this linking flag, the CS50 flag, and you omit it in Spaces or when you're running your code, this is where you'll get an error in the linking phase, and the linker will say, "Hey, you called a function GetString that's in the CS50 library." "You told me it was in the CS50 library, and I can't find the code for it." That's where you have to link it in, and that's separate from a compiler error because the compiler is looking at syntax and that kind of stuff. It's good to know what's going on when. Other things to know about. I would say you definitely want to take a look at the short on typecasting done by Jordan to understand what ints are under the hood, what chars are under the hood. When we talk about ASCII and we actually look at the ASCII table, what that's doing is giving us an under the hood look at how the computer actually represents capital A and the digit 7 and a comma and a question mark. The computer also has special ways to represent the number 7 as an integer. It has a special way to represent the number 7 as a floating point number, and those are very different. Typecasting is how you tell the computer "Hey, I want you to convert from one representation to another representation." Why don't we take a look at that. I would also take a look at the short on libraries and the short on compilers. Those talk about the process of compilation, what a library is, and go over some of these questions that you might get asked. Questions on Week 1 material? Are there any topics in here that seem daunting you'd like to cover? I'm trying to blow through most of these earlier topics so that we can get to pointers and do a little bit of recursion. Thoughts? Anything to cover? Time for some chocolate maybe? You guys are working through it. I'm going to keep sipping on my coffee. Week 2. Good call, good call. In Week 2 we talked a little bit more about functions. In the first few problem sets we didn't really write any functions at all other than which function? [Student] Main.>>Main, exactly. And so we've seen the different costumes that main wears. There's the one in which it takes no arguments, and we just say void in between the parentheses, and then there's the other one where we do want to take command line arguments, and as we saw, that's where you have int argc and string argv array or now that we've actually exposed string to be the char* that it is we're going to start writing it as char* argv and then brackets. In Problem Set 3, you guys saw a bunch of functions, and you implemented a bunch of functions, draw, look up, scramble. The prototypes were all written there for you. What I wanted to talk about here with functions really quickly is that there are 3 parts to them whenever you write a function. You have to specify the return type of the function. You have to specify a name for the function, and then you have to specify the argument list or the parameter list. For example, if I were to write a function to sum up a bunch of integers and then return to me the sum what would be my return type if I wanted to sum integers and then return the sum? Then the name of the function. If I go ahead and write in green, this part is the return type. This part is the name. And then in between parentheses is where I give the arguments, often abbreviated as args, sometimes called params for parameters. And if you have one, you just specify the one. If you have multiple you separate each one with a comma. And for each argument you give it 2 things which are—Kevin? [Kevin] You have to give the type and then the name. And then the name, and the name is the name that you're going to use to refer to that argument within the sum function, within the function that you're currently writing. You don't have to—for example, if I'm going to sum up, say, an array of integers—we'll do int array, and I'll give myself some curly braces there— then when I pass an array to the sum function I pass it in the first position of the argument list. But the array that I pass in doesn't have to have the name arr. Arr is going to be how I refer to that argument within the body of the function. The other thing that we need to take into account, and this is slightly different from functions, but I think it's an important point, is that in C when I'm writing a function like this how do I know how many elements are in this array? This is somewhat of a trick question. We talked about this a little bit in last week's section. How do I know the number of elements inside an array in C? Is there a way? It turns out that there's no way to know. You have to pass it in separately. There is a trick that you can do if you're in the same function in which the array has been declared, and you're working with a stack array. But that only works if you're in the same function. Once you pass an array to another function or if you've declared an array and you put that array on the heap, you've used malloc and that kind of stuff, then all bets are off. Then you actually have to pass around a special argument or another parameter telling you how big the array is. In this case, I'd want to use a comma—I'm sorry, it's going off the screen here— and I'd pass in another argument and call it int len for the length. One thing that might come up on the quiz is asking you to write or implement a particular function called something. If we don't give you the prototype, so this whole thing here, this whole mess is called the function declaration or the function prototype, this is one of the first things that you'll want to nail down if it's not given to you right away on the quiz. The other trick I've learned is that say we do give you a prototype for a function, and we say, "Hey, you've got to write it." Inside the curly braces that you have on the quiz if you notice that there is a return type and you notice that the return type is something other than void, which means that the function doesn't return anything, then one thing you definitely want to do is write some sort of return statement at the very end of the function. Return, and in this case, we'll put a blank because we want to fill in the blank. But this gets you thinking in the right way about how am I going to approach this problem? And it reminds you you're going to have to return a value to the caller of the function. Yeah.>>[Student] Does style apply when we're writing code on the quiz? Such as indentation and that kind of stuff?>>[Student] Yeah. No, not as much. I think a lot of—this is something we'll clarify on the quiz on the day of, but typically worrying about #includes and that kind of stuff, it's kind of outside. [Student] Do you need to comment your handwritten code? Do you need to comment your handwritten code? Commenting is always good if you're worried about partial credit or you want to communicate your intent to the grader. But I, again, will clarify on the quiz itself and on the quiz day, but I don't believe that you'll be required to write comments, no. Typically not, but it's definitely the sort of thing where you can communicate your intent, like "Hey, this is where I'm going with it." And sometimes that can help with partial credit. Cool. Basil. [Basil] What's the difference between declaring, say, int lang in the arguments or parameters versus declaring a variable within the function? Wow, coffee went down the windpipe. [Basil] Like which things we want to put in arguments. Yeah, that's a great question. How do you choose what things you want to put in the arguments versus what things you should do inside of the function? In this case we included both of these as arguments because they're something that whoever is going to use the sum function needs to specify those things. The sum function, like we talked about, has no way of knowing how big the array is it gets from its caller or whoever is using the sum function. It has no way of knowing how big that array is. The reason we pass in this length right here as an argument is because that's something that we're basically telling the caller of the function, whoever is going to use the sum function, "Hey, not only do you have to give us an array of ints, you also have to tell us how big the array that you've given us is." [Basil] Those will both be command line arguments? No, these are actual arguments that you would pass to the function. Let me do a new page here. [Basil] Like name would pass— [Nate H.] If I have int main (void), and I'm going to put in my return 0 down here at the bottom, and say I want to call the sum function. I want to say int x = sum( ); To use the sum function I have to pass in both the array that I want to sum up and the length of the array, so this is where assuming I had an array of ints, say I had int numbaz [ ] = 1, 2, 3, kind of use that hacked up syntax right there, then what I would do is in sum I would want to pass in both numbaz and the number 3 to tell the sum function "Okay, here's the array I want you to sum." "Here's its size." Does that make sense? Does that answer your question? In many ways it does parallel what we're doing with main when we have the command line arguments. A program like Caesar cipher, for example, that needed command line arguments would not be able to do anything. It wouldn't know how to encrypt if you didn't tell it what key to use or if you didn't tell it what string you wanted to encrypt. Prompting for input, this is where we've got 2 different mechanisms for taking input in from the user, for taking information in from the user. For Problem Set 1 we saw this GetInt, GetString, GetFloat way of prompting for input, and that's called using the standard input stream. It's slightly different. It's something that you can do at one time as opposed to when you invoke the program, when you start the program running. The command line arguments all are specified when you start the program running. We've been mixing the two of those. When we use arguments to a function, it's much like command line arguments to main. It's when you invoke the function you need to tell it what exactly it needs in order to perform its tasks. Another good thing to look at—and I'll let you look at it in your spare time, and it was covered in the quiz—was this notion of scope and local variables versus global variables. Do pay attention to that. Now that we're getting on to this other stuff, in Week 3 we started talking about searching and sorting. Searching and sorting, at least in CS50, is very much an introduction to some of the more theoretical parts of computer science. The problem of searching, the problem of sorting are big, canonical problems. How do you find a particular number in an array of billions of integers? How do you find a particular name inside a phone book that's stored on your laptop? And so we introduce this notion of asymptotic run times to really quantify how long, how hard these problem are, how long they take to solve. In, I believe, 2011's quiz there's a problem that I think merits covering very quickly, which is this one, problem 12. O no, it's Omega. Here we're talking about the fastest possible run time for a particular algorithm and then the slowest possible run time. This Omega and O are really just shortcuts. They're notational shortcuts for saying how fast in the best possible case will our algorithm run, and how slow in the worst possible case will our algorithm run? Let's do a couple of these, and these were also covered in the short on asymptotic notation, which I highly recommend. Jackson did a really good job. With binary search, we talk about binary search as being an algorithm, and we usually talk about it in terms of its big O. What is the big O? What is the slowest possible run time of binary search? [Student] N²? Close, I guess similar to that. It's a lot faster than that. [Student] Binary?>>Yeah, binary search. [Student] It's log n. Log n, so what does log n mean? It halves it each iteration. Exactly, so in the slowest possible case, say if you have a sorted array of a million integers and the number you're looking for is either the very first element in the array or the very last element in the array. Remember, the binary search algorithm works by looking at the middle element, seeing if that's the match that you're looking for. If it is, then great, you found it. In the best possible case, how fast does binary search run? [Students] 1. 1, it's constant time, big O of 1. Yeah. [Student] I have a question. When you say log of n, you mean with respect to base 2, right? Yes, so that's the other thing. We say log n, and I guess when I was in high school I always assumed that log was base 10. Yeah, so yes, log base 2 typically is what we use. Again, going back to binary search, if you're searching for either the element at the very end or the element at the very beginning, because you start in the middle and then you discard whichever half doesn't meet the criteria that you're looking for, and you go to the next half and the next half and the next half. If I'm searching for the largest element in the million integer array I'm going to halve it at most log of 1 million times before I finally test and see that the element I'm looking for is in the biggest or in the highest index of the array, and that will take log of n, log of 1 million times. Bubble sort. Do you guys remember the bubble sort algorithm? Kevin, can you give me a quick recap of what happened in the bubble sort algorithm? [Kevin] Basically it goes through everything in the list. It looks at the first two. If the first one is bigger than the second one it swaps them. Then it compares second and third, same thing, swaps, third and fourth, all the way down. Bigger numbers will follow up to the end. And after however many loops you're done. Exactly, so what Kevin said is that we'll watch bigger numbers bubble up to the end of the array. For example, do you mind walking us through this example if this is our array? [Kevin] You'll take 2 and 3. 3 is bigger than 2, so you swap them. [Nate H.] Right, so we swap these, and so we get 2, 3, 6, 4, and 9. [Kevin] Then you compare the 3 and 6. 3 is smaller than 6, so you leave them, and 6 and 4, you'd swap them because 4 is smaller than 6. [Nate H.] Right, so I get 2, 3, 4, 6, 9. [Kevin] And 9 is bigger than 6, so you leave it. And you'd go back through it again. [Nate H.] Am I done at this point?>>[Kevin] No. And why am I not done at this point? Because it looks like my array is sorted. I'm looking at it. [Kevin] Go through it again and make sure that there are no more swaps before you can fully stop. Exactly, so you need to keep going through and make sure that there are no swaps that you can make at this point. It was really just lucky, like you said, that we ended up only having to make 1 pass through and we're sorted. But to do this in the general case we'll actually have to do this over and over again. And in fact, this was an example of the best possible case, like we saw in the problem. We saw that the best possible case was n. We went through the array 1 time. What is the worst possible case for this algorithm? [Kevin] N². And what does that look like? What would an array look like that would take n² time? [Kevin] [inaudible] sorted. Exactly, so if I had the array 9, 7, 6, 5, 2, first the 9 would bubble all the way up. After 1 iteration we'd have 7, 6, 5, 2, 9. Then the 7 would bubble up, 6, 5, 2, 7, 9, and so on and so forth. We'd have to go through the entire array n times, and you can actually get slightly more precise than this because once we've moved the 9 all the way up into its last possible position we know that we never have to compare against that element again. Once we start bubbling the 7 up we know that we can stop once the 7 is right before the 9 since we've already compared the 9 to it. If you do this in a smart way it's not truly, I guess, that much time. You're not going to compare all the possible [inaudible] combinations every single time you go through each iteration. But still, when we talk about this upper bound we say that you are looking at n² comparisons all the way through. Let's go back, and since we're starting to get a little short on time I would say you should definitely go through the rest of this table, fill it all out. Think of examples. Think of concrete examples. That's really handy and helpful to do. Draw it out. This is the sort of table that as you go through in computer science you should really start to know these by heart. These are the kinds of questions you get in interviews. These are sorts of things that are good to know, and think about those edge cases, really figuring out how to think about knowing that for bubble sort the worst possible array to sort with that is one that's in reverse order. Pointers. Let's talk a little bit about pointers. In the last few minutes we have here I know this is something along with file I/O that is rather new. When we talk about pointers the reason we want to talk about pointers is because, one, when we're working in C we are really at a fairly low level compared to most modern programming languages. We're actually able to manipulate the variables in memory, figure out where they're actually located within our RAM. Once you've gone on to take operating system classes you'll see that that's, again, kind of an abstraction. That's not actually the case. We've got virtual memory that's hiding those details from us. But for now you can assume that when you have a program, for example, when you start running your Caesar cipher program— I'll switch back to my iPad really quickly— that at the very beginning your program, if you have, say, 4 gigabytes of RAM on your laptop, you get set aside this chunk, and we'll call this RAM. And it starts in a place we're going to call 0, and it ends at a place that we'll call 4 gigabytes. I really can't write. Man, that is hacked. When your program executes the operating system carves up RAM, and it specifies different segments for different parts of your program to live in. Down here this area is kind of a no man's land. When you go up a little farther here you've got actually the place where the code for your program lives. That actual binary code, that executable file actually gets loaded into memory when you run a program, and it lives in the code segment. And as your program executes the processor looks at this code segment to figure out what is the next instruction? What is the next line of code I need to execute? There's also a data segment, and this is where those string constants get stored that you've been using. And then farther up there's this place called the heap. We access memory in there by using malloc, and then towards the very top of your program there's the stack, and that's where we've been playing for most of the beginning. This isn't to scale or anything. A lot of this is very machine dependent, operating system dependent, but this is relatively how things get chunked up. When you run a program and you declare a variable called x— I'm going to draw another box down below, and this is going to be RAM as well. And I'm going to look. We'll draw jagged lines to indicate this is just a small section of RAM and not all of it as we draw at the top. If I declare an integer variable called x, then what I actually get is a mapping that is stored in the symbol table of my program that connects the name x to this region of memory that I've drawn right here between the vertical bars. If I have a line of code in my program that says x = 7 the processor knows "Oh, okay, I know that x lives at this location in memory." "I'm going to go ahead and write a 7 there." How does it know what location this is in memory? Well, that's all done at compile time. The compiler takes care of allocating where each of the variables are going to go and creating a special mapping or rather connecting the dots between a symbol and where it's going, a variable's name and where it's going to live in memory. But it turns out that we can actually access it in our programs as well. This gets important when we start talking about some of the data structures, which is a concept that we're going to introduce later on. But for now, what you can know is that I can create a pointer to this location, x. For example, I can create a pointer variable. When we create a pointer variable we use the star notation. In this case, this says I'm going to create a pointer to an int. It's a type just like any other. We give it a variable like y, and then we set it equal to the address, to an address. In this case, we can set y to point to x by taking the address of x, which we do with this ampersand, and then we set y to point to it. What this essentially does is if we look at our RAM this creates a separate variable. It's going to call it y, and when this line of code executes it's actually going to create a little pointer which we typically draw as an arrow, and it sets y to point to x. Yes. [Student] If x is already a pointer, would you just do int *y = x instead of having the ampersand? Yes. If x is already a pointer, then you can set 2 pointers equal to each other, in which case y would not point to x, but it would point to whatever x is pointing to. Unfortunately, we're out of time. What I would say at this point, we can talk about this offline, but I would say start working through this problem, #14. You can see there's already a little bit filled in for you here. You can see that when we declare 2 pointers, int *x and *y, and note that pointing the * next to the variable was something that was done last year. It turns out that this is similar to what we're doing this year. It doesn't matter where you write the * when you're declaring the pointer. But we have written the * next to the type because that makes it very clear that you're declaring a pointer variable. You can see that declaring the 2 pointers gives us 2 boxes. Here when we set x equal to malloc what this is saying is setting aside memory in the heap. This little box right here, this circle, is located on the heap. X is pointing to it. Note that y is still not pointing to anything. To get memory—to store the number 42 into x we would use what notation? [Student] *x = 42. Exactly, *x = 42. That means follow the arrow and throw 42 in there. Here where we set y and x we have y pointing to x. Again, this is just like what Kevin said where we set y equal to x. Y is not pointing to x. Rather, it's pointing to what x is pointing to as well. And then finally in this last box there are 2 possible things that we could do. One is we could say *x = 13. The other thing is we could say—Alex, do you know what we could do here? You could say *x = 13 or— [Student] You could say int whatever. [Nate H.] If this were referred to as an int variable we could do that. We could also say *y = 13 because they're both pointing to the same place, so we could use either variable to get there. Yeah.>>[Student] What would it look like if we just say int x is 13? That would be declaring a new variable called x, which wouldn't work. We'd have a collision because we declared x to be a pointer up here. [Student] If we just had that statement by itself what would it look like in terms of the circle? If we had x = 13 then we'd have a box, and rather than having an arrow coming out of the box we'd draw it as just a 13. [Student] In the box. Okay. Thank you for watching, and good luck on Quiz 0. [CS50.TV]