[SECTION 5: LESS COMFORTABLE] [Nate Hardison, Harvard University] [This is CS50.] [CS50.TV] So welcome back, guys. Welcome to section 5. At this point, having completed quiz 0 and having seen how you've done, hopefully you feel really good because I was very impressed by the scores in this section. For our online viewers, we've had a couple of questions about the last two problems on the problem set--or on the quiz, rather. So we're going to go over those really quickly so that everybody sees what happened and how to go through the actual solution rather than just viewing the solution itself. We're going to go over the last couple of problems really quickly, 32 and 33. Just, again, so that the online viewers can see this. 

If you turn to your problem 32, which is on page 13, 13 out of 16, problem 32 is all about swaps. It was all about swapping two integers. It's the problem that we'd gone over a couple of times in lecture. And in here, what we were asking you to do is a quick memory trace. To fill in the values of the variables as they are on the stack as the code goes through this swap function. In particular, what we're looking at--I'm going to put this iPad down-- in particular, what we're looking at is this line numbered 6 right here. And it's numbered 6 for just contiguity with the previous problem. What we want to do is display or label the state of memory as it is at the time when we execute this line number 6, which is effectively a return from our swap function right here. If we scroll down here, we saw that the addresses of everything in memory was provided for us. This is very key; we'll come back to it in just a moment. And then down here at the bottom, we had a little memory diagram that we're going to refer to. I have actually done this out on my iPad. So I'm going to alternate back and forth between the iPad and this code just for reference. 

Let's start. First, let's focus on the first couple of lines of main right here. To start, we're going to initialize x to 1 and y to 2. So we have two integer variables, they're both going to be placed on the stack. We're going to put a 1 and a 2 in them. So if I flip over to my iPad, hopefully, let's see-- Apple TV mirroring, and there we go. Okay. So if I flip over to my iPad, I want to initialize x to 1 and y to 2. We do that quite simply by writing a 1 in the box marked x and a 2 in the box marked y. Fairly simple. So now let's go back to the laptop, see what happens next. So this next line is where things get tricky. We pass the address of x and the address of y as the parameters a and b to the swap function. The address of x and the address of y are things that we can't calculate without referring to these bullet points right down here. And fortunately, the first two bullet points tell us exactly what the answers are. The address of x in memory is 10, and the address of y in memory is 14. So those are the values that get passed in as a and b up top in our swap function. So again, switching back to our diagram, I can write a 10 in a and a 14 in b. Now, this point is where we proceed with the swap. So flipping back to the laptop again, we see that the way the swap works is I first dereference a and store the result in tmp. So the dereference operator says, "Hey. Treat the contents of variable a as an address. Go to whatever is stored at that address, and load it." What you load out of the variable is going to be stored into our tmp variable. Flipping back to the iPad. If we go to address 10, we know that address 10 is the variable x because we were told by our bullet point that the address of x in memory is 10. So we can go there, get the value of it, which is 1, as we see on our iPad, and load that into tmp. Again, this is not the final contents. We're going to walk through and we'll get to our final state of the program at the end. 
But right now, we have the value 1 stored in tmp. 

And there's a quick question over here. [Alexander] Is the dereference operator--that's just the star right in front of the variable? >>Yes. So the dereference operator, as we flip back to our laptop once again, is this star right in front. In that sense, it is--you contrast it with the multiplication operator which requires two things; the dereference operator is a unary operator. Just applied to one value as opposed to a binary operator, where you apply to two different values. So that's what happens in this line. We loaded the value 1 and stored it into our temporary integer variable. The next line, we store the contents of b into-- or, rather, we store the contents that b is pointing to into the place where a is pointing to. If we analyze this from right to left, we are going to dereference b, we are going to address 14, we are going to grab the integer that is there, and then we are going to go to the address 10, and we are going to throw the result of our dereference of b into that space. Flipping back to our iPad, where we can make this a little more concrete, it might help if I write numbers on all of the addresses here. So we know that at y, we are at address 14, x is at address 10. When we start at b, we dereference b, we're going to grab the value 2. We are going to grab this value because that is the value that lives at address 14. And we're going to put it into the variable that lives at address 10, which is right there, corresponding to our variable x. So we can do a little bit of overwriting here where we get rid of our 1 and instead we write a 2. So all's well and good in the world, even though we've overwritten x now. We have stored x's old value in our tmp variable. So we can complete the swap with the next line. Flipping back to our laptop. Now all that remains is to take the contents out of our temporary integer variable and store them into the variable that lives at the address that b is holding. 
So we're going to effectively dereference b to get access to the variable that is at the address that b holds in it, and we're going to stuff the value that tmp is holding into it. Flipping back to the iPad once more. I can erase this value here, 2, and instead we'll copy the 1 right into it. Then the next line that executes, of course-- if we flip back to the laptop--is this point 6, which is the point at which we wanted to have our diagram completely filled out. So flipping back to the iPad once more, just so you can see the completed diagram, you can see that we have a 10 in a, a 14 in b, a 1 in tmp, a 2 in x, and a 1 in y. Are there any questions about this? Does this make more sense, having walked through it? Make less sense? Hopefully not. Okay. 
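
For readers following along without the quiz in hand, here is a minimal sketch of the swap being traced. It is reconstructed from the walkthrough above rather than copied from the quiz, and trace_swap is a hypothetical helper added only to replay what main does. The addresses 10 and 14 come from the quiz's diagram; on a real machine the addresses will be different, but the values end up the same.

```c
#include <stdio.h>

// Swap the ints that a and b point to (reconstructed from the walkthrough).
void swap(int *a, int *b)
{
    int tmp = *a;  // go to the address stored in a and load the int there
    *a = *b;       // load the int at b's address, store it at a's address
    *b = tmp;      // store the saved value at b's address
}

// Hypothetical helper: replays main from the trace and prints the result.
void trace_swap(void)
{
    int x = 1;
    int y = 2;
    swap(&x, &y);  // a receives the address of x, b the address of y
    printf("x = %d, y = %d\n", x, y);
}
```

Calling trace_swap() from main prints x = 2, y = 1, which matches the completed diagram: a 2 in x and a 1 in y.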

Pointers are a very tricky subject. One of the guys we work with has a very common saying: "To understand pointers, you must first understand pointers." Which I think is very true. It does take a while to get used to it. Drawing lots of pictures, drawing lots of memory diagrams like this one are very helpful, and after you walk through example after example after example, it'll start to make a little more sense and a little more sense and a little more sense. Finally, one day, you'll have it all completely mastered. Any questions before we move on to the next problem? All right. So flip back to the laptop. The next problem we have is problem number 33 on file I/O. Zoom in on this a little bit. Problem 33--Yes? 

[Daniel] I just had a quick question. This star, or the asterisk, it's called dereferencing when you use an asterisk before. What's it called when you use the ampersand before? >>The ampersand before is the address-of operator. So let's scroll back up. Oops. I'm in zoom mode so I can't really scroll. If we look at this code really quickly right here, again, same thing happening. If we look at this code right here, on this line where we make the call to swap, the ampersand is just saying "get the address at which variable x lives." When your compiler compiles your code, it has to actually physically mark out a place in memory for all of your variables to live. And so what the compiler can then do once it's compiled everything, it knows, "Oh, I put x at address 10. I put y at address 14." It can then fill in these values for you. So you can then--it can then pass this in and pass &y in as well. These guys get the address, but they also, when you pass them into the swap function, this type information, this int* right here, tells the compiler, "Okay, we're going to be interpreting this address as an address of an integer variable." As an address of an int, which is different from the address of a character variable because an int takes up, on a 32-bit machine, takes up 4 bytes of space, whereas a character only takes up 1 byte of space. So it's important to know also what is--what lives, what type of value is living at the address that got passed in. Or the address that you're dealing with. That way, you know how many bytes of information to actually load out of your RAM. And then, yes, this dereference operator, like you were asking, goes and accesses information at a particular address. So it says, with this a variable here, treat the contents of a as an address, go to that address, and pull out, load into the processor, load into a register the actual values or the contents that live at that address. Any more questions? These are good questions. 
It's a lot of new terminology too. It's also kind of funky, seeing & and * in different places. 
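
To see the two operators side by side, here is a small sketch. The function names load_int and show_sizes are made up for illustration. Note that the 4-byte size of an int mentioned above is typical but not guaranteed by C, so the code prints sizeof(int) rather than assuming it; sizeof(char) is always 1.

```c
#include <stdio.h>

// Dereference: treat the contents of p as an address and load the int there.
int load_int(const int *p)
{
    return *p;
}

// Hypothetical helper: prints where two variables live and how big their types are.
void show_sizes(void)
{
    int x = 1;
    char c = 'A';
    int *px = &x;    // & is the address-of operator: where does x live?
    char *pc = &c;   // a char * says "one char lives at this address"

    printf("x lives at %p and holds %d\n", (void *) px, *px);
    printf("c lives at %p and holds %c\n", (void *) pc, *pc);
    printf("sizeof(int) = %zu, sizeof(char) = %zu\n", sizeof(int), sizeof(char));
}
```

The pointer type is what tells the compiler how many bytes to load when the pointer is dereferenced.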

All right. So back to problem 33, file I/O. This was one of those problems that I think a couple of things happened. One, it's a fairly new topic. It was presented pretty soon before the quiz, and then I think it was kind of like one of those word problems in math where they give you a lot of information, but you actually don't end up having to use a ton of it. The first part of this problem is describing what a CSV file is. Now, a CSV file, according to the description, is a comma-separated values file. The reason these are at all interesting, and the reason you ever use them, is, because, how many of you have ever used stuff like Excel? Figure most of you have, probably, or will use at some point in your life. You'll use something like Excel. In order to get the data out of an Excel spreadsheet or do any sort of processing with it, if you wanted to write a C program or Python program, Java program, to deal with the data you have stored in there, one of the most common ways to get it out is in a CSV file. And you can open up Excel and when you go to the 'Save As' dialogue, you can get out an actual CSV file. 

Handy to know how to deal with these things. The way it works is that it's similar to--I mean, it's essentially mimicking a spreadsheet, where, as we see here, in the very left-most piece, we have all the last names. So we have Malan, then Hardison, and then Bowden, MacWilliam, and then Chan. All the last names. And then a comma separates the last names from the first names. David, Nate, Rob, Tommy, and Zamyla. I always mix up Robby and Tom. And then, finally, the third column is the email addresses. Once you understand that, the rest of the program is fairly straightforward to implement. What we've done in order to mimic this same structure in our C program is we've used a structure. We'll start playing with these a little more as well. We saw them for the first little bit in problem set 3, when we were dealing with the dictionaries. But this staff struct stores a last name, a first name, and an email. Just like our CSV file was storing. So this is just converting from one format to another. We have to convert, in this case, a staff struct into a line, a comma-separated line, just like that. Does that make sense? You guys have all taken the quiz, so I imagine you have at least had some time to think about this. 

In the hire function, the problem asks us to take in--we'll zoom in on this a little bit-- take in a staff structure, a staff struct, with name s, and append its contents to our staff.csv file. It turns out that this is fairly straightforward to use. We'll kind of play around with these functions a little bit more today. But in this case, the fprintf function is really the key. So with fprintf, we can print, just like you guys have been using printf this whole term. You can printf a line to a file. So instead of just making the usual printf call where you give it the format string and then you replace all the variables with the following arguments, with fprintf, your very first argument is instead the file you want to write to. If we were to look at this in the appliance, for example, man fprintf, we can see the difference between printf and fprintf. I'll zoom in here a little bit. So with printf, we give it a format string, and then the subsequent arguments are all the variables for replacement or substitution into our format string. Whereas with fprintf, the first argument is indeed this file* called a stream. 

Moving back over here to our hire, we've already got our file* stream opened for us. That's what this first line does; it opens the staff.csv file, it opens it in append mode, and all that's left for us to do is write the staff structure to the file. And, let's see, do I want to use the iPad? I'll use the iPad. We have void--let's put this on the table so I can write a little better-- void hire and it takes in one argument, a staff structure called s. Got our braces, we've got our file* called file, we have our fopen line given to us, and I'll just write it as dots since it's already in the pedia. And then on our next line, we're going to make a call to fprintf and we're going to pass in the file that we want to print to, and then our format string, which-- I'll let you guys tell me what it looks like. How about you, Stella? Do you know what the first part of the format string looks like? [Stella] I'm not sure. >>Feel free to ask Jimmy. Do you know, Jimmy? [Jimmy] Would it just be last? I don't know. I'm not entirely sure. >>Okay. How about, did anybody get this correct on the exam? No. All right. It turns out that here all we have to do is we want each part of our staff structure to be printed out as a string into our file. We just use the string substitution character three different times because we have a last name followed by comma, then a first name followed by a comma, and then finally the email address which is followed--which is not fitting on my screen--but it's followed by a newline character. So I'm going to write it just down there. And then following our format string, we just have the substitutions, which we access using the dot notation that we saw in problem set 3. We can use s.last, s.first, and s.email to substitute in those three values into our format string. So how did that go? Make sense? Yes? No? Possibly? Okay. 

The final thing that we do after we've printed and after we've opened our file: whenever we've opened a file, we always have to remember to close it. Because otherwise we'll end up leaking the memory, using up file descriptors. So to close it, which function do we use? Daniel? [Daniel] fclose? >> fclose, exactly. So the last part of this problem was to properly close the file, using the fclose function, which just looks like that. Not too crazy. Cool. So that's problem 33 on the quiz. We'll have definitely more file I/O coming up. We'll do a little bit more in lecture today, or in section today, because that's what's going to form the bulk of this upcoming pset. Let's move on from the quiz at this point. Yes? 
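
Putting the pieces of problem 33 together, a sketch of hire might look like the following. The struct layout is an assumption based on the description above (the quiz uses CS50's string type, which is a char * under the hood), and the check on fopen's return value is an addition that may not have been part of the quiz's own code.

```c
#include <stdio.h>

// Assumed layout of the staff struct described in the problem.
typedef struct
{
    char *last;
    char *first;
    char *email;
} staff;

// Appends one staff member to staff.csv as a comma-separated line.
void hire(staff s)
{
    // "a" opens the file in append mode, creating it if it doesn't exist
    FILE *file = fopen("staff.csv", "a");
    if (file == NULL)
    {
        return;  // added safety check: nothing to write to
    }

    // One %s per field: last name, first name, email, then a newline
    fprintf(file, "%s,%s,%s\n", s.last, s.first, s.email);

    // Always close what you open
    fclose(file);
}
```

A caller fills in the struct first and then passes it in, for example with a staff value whose fields are "Hardison", "Nate", and a placeholder address such as "nate@example.com".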

[Charlotte] Why fclose(file) instead of fclose(staff.csv)? >>Ah. Because it turns out that--so the question, which is a great one, is why, when we write fclose, are we passing it the file* variable as opposed to the file name, staff.csv? Is that correct? Yeah. So let's take a look. If I switch back to my laptop, and let's look at the fclose function. So the fclose function closes a stream and it takes in the pointer to the stream that we want to close, as opposed to the actual file name that we want to close. And this is because behind the scenes, when you make a call to fopen, when you open up a file, you're actually allocating memory to store information about the file. So you have a file pointer that has information about the file, such as whether it's open, its size, where you are currently in the file, so that you can make reading and writing calls to that particular place within the file. You end up closing the pointer instead of closing the file name. 

Yes? [Daniel] So in order to use hire, would you say--how does it get the user input? Does fprintf act like GetString in the sense that it'll just wait for the user input and ask you to type this--or wait for you to type these three things in? Or do you need to use something to implement hire? >>Yeah. So we're not--the question was, how do we get the user input in order to implement hire? And what we have here is the caller of hire, passed in this staff struct with all of the data stored in the struct already. So fprintf is able to just write that data directly to the file. There's no waiting for user input. The user's already given the input by properly putting it in this staff struct. And things, of course, would break if any of those pointers were null, so we scroll back up here and we look at our struct. We have string last, string first, string email. We now know that all of those really, under the hood, are char* variables. That may or may not be pointing to null. They may be pointing to memory on the heap, maybe memory on the stack. We don't really know, but if any of these pointers are null, or invalid, that'll definitely crash our hire function. That was something that was kind of beyond the scope of the exam. We're not worrying about that. Great. Okay. So moving on from the quiz. 
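
Although it was beyond the scope of the exam, the null case is easy to guard against. Here is a sketch of a check a caller could run before hire; the function name staff_has_all_fields and the struct layout are assumptions for illustration. It only catches null pointers. A pointer that is non-null but invalid cannot be detected this way.

```c
#include <stdbool.h>
#include <stddef.h>

// Assumed layout of the staff struct described in the problem.
typedef struct
{
    char *last;
    char *first;
    char *email;
} staff;

// Returns true only if none of the three fields is a null pointer.
bool staff_has_all_fields(staff s)
{
    return s.last != NULL && s.first != NULL && s.email != NULL;
}
```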

Let's close this guy, and we're going to look at pset 4. So if you guys look at the pset spec, once you can access it, cs50.net/quizzes, we are going to go through a few of the section problems today. I'm scrolling down--section of questions begins on the third page of the pset spec. And the first part asks you to go and watch the short on redirecting and pipes. Which was kind of a cool short, shows you some new, cool command line tricks that you can use. And then we've got a few questions for you as well. This first question about streams, to which printf writes by default, we kind of touched on just a little bit a moment ago. This fprintf that we were just discussing takes in a file* stream as its argument. fclose takes in a file* stream as well, and the return value of fopen gives you a file* stream as well. The reason we haven't seen those before when we've dealt with printf is because printf has a default stream. And the default stream to which it writes you'll find out about in the short. So definitely take a look at it. 

In today's section, we're going to talk a little bit about GDB, since the more familiar you are with it, the more practice you get with it, the better able you'll be to actually hunt down bugs in your own code. This speeds the process of debugging up tremendously. So by using printf, every time you do that you have to recompile your code, you have to run it again, sometimes you have to move the printf call around, comment out code, it just takes a while. Our goal is to try and convince you that with GDB, you can essentially printf anything at any point in your code and you never have to recompile it. You never have to start and keep guessing where to printf next. The first thing to do is to copy this line and get the section code off of the web. I'm copying this line of code that says, "wget http://cdn.cs50.net". I'm going to copy it. I'm going to go over to my appliance, zoom out so you can see what I'm doing, pasting it in there, and when I hit Enter, this wget command literally is a web get. It's going to pull down this file off of the Internet, and it's going to save it to the current directory. Now if I list my current directory you can see that I've got this section5.zip file right in there. The way to deal with that guy is to unzip it, which you can do in the command line, just like this. Section5.zip. That'll unzip it, create the folder for me, inflate all of the contents, put them in there. So now I can go into my section 5 directory using the cd command. Clear the screen using clear. So clear the screen. Now I've got a nice clean terminal to deal with. 

Now if I list all the files that I see in this directory, you see that I've got four files: buggy1, buggy2, buggy3, and buggy4. I've also got their corresponding .c files. We're not going to look at the .c files for now. Instead, we're going to use them when we open up GDB. We've kept them around so that we have access to the actual source code when we're using GDB, but the goal of this part of the section is to tinker around with GDB and see how we can use it to figure out what's going wrong with each of these four buggy programs. So we're just going to go around the room really quickly, and I'm going to ask somebody to run one of the buggy programs, and then we'll go as a group through GDB, and we'll see what we can do to fix these programs, or to at least identify what's going wrong in each of them. Let's start over here with Daniel. Will you run buggy1? Let's see what happens. [Daniel] It says there's an application fault. >>Yeah. Exactly. So if I run buggy1, I get a seg fault. At this point, I could go and open up buggy1.c, try and figure out what's going wrong, but one of the most obnoxious things about this seg fault error is that it doesn't tell you on what line of the program things actually went wrong and broke. You kind of have to look at the code and figure out using guess and check or printf to see what's going wrong. One of the coolest things about GDB is that it's really, really easy to figure out the line at which your program crashes. It's totally worth it to use it, even if just for that. So to boot up GDB, I type gdb, and then I give it the path to the executable that I want to run. Here I'm typing gdb ./buggy1. Hit Enter. Gives me all this copyright information, and down here you'll see this line that says, "Reading symbols from /home/jharvard/section5/buggy1." And if all goes well, you'll see it print out a message that looks like this. 
It'll read symbols, it'll say "I'm reading symbols from your executable file," and then it will have this "done" message over here. If you see some other variation of this, or you see it couldn't find the symbols or something like that, what that means is that you just haven't compiled your executable properly. When we compile programs for use with GDB, we have to use that special -g flag, and that's done by default if you compile your programs, just by typing make or make buggy or make recover, any of those. But if you're compiling manually with Clang, then you'll have to go in and include that -g flag. 

At this point, now that we have our GDB prompt, it's pretty simple to run the program. We can either type run, or we can just type r. Most GDB commands can be abbreviated. Usually to just one or a couple letters, which is pretty nice. So Saad, if you type r and hit Enter, what happens? [Saad] I got SIGSEGV, segmentation fault, and then all this gobbledygook. >>Yeah. Like we're seeing on the screen right now, and like Saad said, when we type run or r and hit Enter, we still get the same seg fault. So using GDB doesn't solve our problem. But it gives us some gobbledygook, and it turns out that this gobbledygook actually tells us where it's happening. To parse this a little bit, this first bit is the function in which everything's going wrong. There's this __strcmp_sse4_2, and it tells us that it's happening in this file called sysdeps/i386, all this, again, kind of a mess--but line 254. That's kind of hard to parse. Usually when you see stuff like this, that means that it's seg faulting in one of the system libraries. So something to do with strcmp. You guys have seen strcmp before. Not too crazy, but does this mean that strcmp is broken or that there's a problem with strcmp? What do you think, Alexander? [Alexander] Is that--is 254 the line? And the--not the binary, but it's not their ceilings, and then there's another language for each function. Is that 254 in that function, or--? >>It's line 254. It looks like in this .s file, so it's assembly code probably. 

But, I guess the more pressing thing is, because we've gotten a seg fault, and it looks like it's coming from the strcmp function, does this imply, then, that strcmp is broken? It shouldn't, hopefully. So just because you have a segmentation fault in one of the system functions, typically that means that you just haven't called it correctly. The quickest thing to do to figure out what's actually going on when you see something crazy like this, whenever you see a seg fault, especially if you have a program that's using more than just main, is to use a backtrace. I abbreviate backtrace by writing bt, as opposed to the full backtrace word. But Charlotte, what happens when you type bt and hit Enter? [Charlotte] It shows me two lines, line 0 and line 1. >>Yeah. So line 0 and line 1. These are the actual stack frames that were currently in play when your program crashed. Starting from the topmost frame, frame 0, and going to the bottom-most, which is frame 1. Our topmost frame is the strcmp frame. You can think of this as similar to that problem we were just doing on the quiz with the pointers, where we had swap stack frame on top of main stack frame, and we had the variables that swap was using on top of the variables that main was using. Here our crash happened in our strcmp function, which was called by our main function, and backtrace is giving us not only the functions in which things failed, but it's also telling us where everything was called from. So if I scroll over a little more to the right, we can see that yeah, we were on line 254 of this strcmp-sse4.s file. But the call was made at buggy1.c, line 6. So that means we can do--is we can just go check out and see what was going on at buggy1.c, line 6. Again, there are a couple ways to do this. One is to exit out of GDB or have your code open in another window and cross reference. 
That, in and of itself, is pretty handy because now if you're at office hours and you've got a seg fault and your TF's wondering where everything was breaking, you can just say, "Oh, line 6. I don't know what's going on, but something about line 6 is causing my program to break." The other way to do it is you can use this command called list in GDB. You can also abbreviate it with l. So if we hit l, what do we get here? We get a whole bunch of weird stuff. This is the actual assembly code that is in strcmp_sse4_2. This looks kind of funky, and the reason we're getting this is because right now, GDB has us in frame 0. 

So anytime we look at variables, any time we look at source code, we're looking at source code that pertains to the stack frame we're currently in. So in order to get anything meaningful, we have to move to a stack frame that makes more sense. In this case, the main stack frame would make a little more sense, because that was actually the code that we wrote. Not the strcmp code. The way you can move between frames, in this case, because we have two, we have 0 and 1, you do that with the up and down commands. If I move up one frame, now I'm in the main stack frame. I can move down to go back to where I was, go up again, go down again, and go up again. If you ever do your program in GDB, you get a crash, you get the backtrace, and you see that it's in some file that you don't know what's going on. You try list, the code doesn't look familiar to you, take a look at your frames and figure out where you are. You're probably in the wrong stack frame. Or at least you're in a stack frame that isn't one that you can really debug. Now that we're in the appropriate stack frame, we're in main, now we can use the list command to figure out what the line was. And you can see it; it printed it for us right here. But we can hit list all the same, and list gives us this nice printout of the actual source code that's going on in here. 

In particular, we can look at line 6. We can see what's going on here. And it looks like we're making a string comparison between the string "CS50 rocks" and argv[1]. Something about this was crashing. So Missy, do you have any thoughts on what might be going on here? [Missy] I don't know why it's crashing.  >>You don't know why it's crashing? Jimmy, any thoughts? [Jimmy] I'm not entirely sure, but the last time we used string compare, or strcmp, we had like three different cases under it. We didn't have an ==, I don't think, right in that first line. Instead it was separated into three, and one was ==0, one was < 0, I think, and one was > 0. So maybe something like that? >>Yeah. So there's this issue of are we doing the comparison correctly? Stella? Any thoughts? [Stella] I'm not sure. >>Not sure. Daniel? Thoughts? Okay. It turns out what's happening right here is when we ran the program and we got the seg fault, when you ran the program for the first time, Daniel, did you give it any command line arguments? [Daniel] No. >>No. In that case, what is the value of argv[1]? >>There is no value. >>Right. Well, there is no appropriate string value. But there is some value. What is the value that gets stored in there? >>A garbage value? >>It's either a garbage value or, in this case, the end of the argv array is always terminated with null. So what actually got stored in there is null. The other way to solve this, rather than thinking it through, is to try printing it out. This is where I was saying that using GDB is great, because you can print out all the variables, all the values that you want using this handy-dandy p command. So if I type p and then I type the value of a variable or the name of a variable, say, argc, I see that argc is 1. If I want to print out argv[0], I can do so just like that. And like we saw, argv[0] is always the name of your program, always the name of the executable. Here you see it's got the full path name. 
I can also print out argv[1] and see what happens. 

Here we got this kind of mystical value. We got this 0x0. Remember at the beginning of the term when we talked about hexadecimal numbers? Or that little question at the end of pset 0 about how to represent 50 in hex? The way we write hex numbers in CS, just to not confuse ourselves with decimal numbers, is we always prefix them with 0x. So this 0x prefix always just means interpret the following number as a hexadecimal number, not as a string, not as a decimal number, not as a binary number. Since the number 5-0 is a valid number in hexadecimal. And it's a number in decimal, 50. So this is just how we disambiguate. So 0x0 means hexadecimal 0, which is also decimal 0, binary 0. It's just the value 0. It turns out that this is what null is, actually, in memory. Null is just 0. Here, the element stored at argv[1] is null. So we're trying to compare our "CS50 rocks" string to a null string. So dereferencing null, trying to access things at null, those are typically going to cause some sort of segmentation fault or other bad things to happen. And it turns out that strcmp doesn't check to see whether or not you've passed in a value that's null. Rather, it just goes ahead, tries to do its thing, and if it seg faults, it seg faults, and it's your problem. You have to go fix it. Really quickly, how might we fix this problem? Charlotte? [Charlotte] You can check using if. So if argv[1] is null, ==0, then return 1, or something [unintelligible]. >>Yeah. So that's one great way to do it, as we can check to see, the value we're about to pass into strcmp, argv[1], is it null? If it's null, then we can say okay, abort. 
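
Charlotte's suggestion can be sketched as a small wrapper around strcmp. The function name is_cs50_rocks is made up for illustration; the actual source of buggy1.c is not shown here, only that it compares argv[1] against "CS50 rocks".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Returns true only when arg is non-null and equals "CS50 rocks".
bool is_cs50_rocks(const char *arg)
{
    if (arg == NULL)
    {
        return false;  // strcmp won't check for null, so we have to
    }
    return strcmp(arg, "CS50 rocks") == 0;
}
```

With this in place, passing in a null argv[1] yields false instead of a segmentation fault.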

A more common way to do this is to use the argc value. You can see right here at the beginning of main, we omitted that first test that we typically do when we use command line arguments, which is to test whether or not our argc value is what we expect. In this case, we're expecting at least two arguments, the name of the program plus one other. Because we're about to use the second argument right here. So having some sort of test beforehand, before our strcmp call that tests whether or not argc is at least 2, would also do the same sort of thing. We can see if that works by running the program again. You can always restart your program within GDB, which is really nice. You can run, and when you pass in arguments to your program, you pass them in when you call run, not when you boot up GDB. That way you can keep invoking your program with different arguments each time. So run, or again, I can type r, and let's see what happens if we type "hello". It will always ask you if you want to start it from the beginning again. Usually, you do want to start it from the beginning again. And at this point, it restarts it again, it prints out the program that we're running, buggy1, with the argument hello, and it prints this standard out; it says, "You get a D," sad face. But we didn't seg fault. It said that process exited normally. So that looks pretty good. No more seg fault, we made it past, so it looks like that was indeed the seg fault bug that we were getting. Unfortunately, it tells us that we're getting a D. 
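
Here is a sketch of the argc test described above, pulled out into a function so the logic is easy to see. The function name grade and its return values are assumptions for illustration, not the contents of buggy1.c: it returns 'A' or 'D', or 0 when there are too few arguments to look at argv[1] safely.

```c
#include <string.h>

// Returns 'A' if argv[1] is "CS50 rocks", 'D' if it is anything else,
// and 0 if argv[1] was never supplied.
char grade(int argc, char *argv[])
{
    // We need the program name plus one argument before touching argv[1]
    if (argc < 2)
    {
        return 0;
    }

    if (strcmp(argv[1], "CS50 rocks") == 0)
    {
        return 'A';
    }
    return 'D';
}
```

In main, this would be called as grade(argc, argv), with the 0 case printing a usage message and returning a nonzero exit code.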

We can go back and look at the code and see what was going on there to figure out why it was telling us that we got a D. Let's see, here was this printf saying that you got a D. If we type list, as you keep typing list, it keeps iterating down through your program, so it'll show you the first few lines of your program. Then it'll show you the next few lines, and the next chunk and the next chunk. And it'll keep trying to go down. And now we'll get to "line number 16 is out of range." Because it only has 15 lines. If you get to this point and you're wondering, "What do I do?" you can use the help command. Use help and then give it the name of a command. And you see that GDB gives us all this sort of stuff. It says, "With no argument, lists ten more lines after or around the previous listing. List - lists the ten lines before--" So let's try using list minus. And that lists the 10 lines previous; you can play around with list a little bit. You can do list, list -, you can even give list a number, like list 8, and it'll list the 10 lines around line 8. And you can see what's going on here is you've got a simple if else. If you type in CS50 rocks, it prints out "You get an A." Otherwise it prints out "You get a D." Bummer town. All right. Yes? 

[Daniel] So when I tried doing CS50 rocks without the quotes, it says "You get a D." I needed the quotes to get it to work; why is that? >>Yeah. It turns out that when--this is another fun little tidbit-- when you run the program, if we run it and we type in CS50 rocks, just like Daniel was saying he did, and you hit Enter, it still says we get a D. And the question is, why is this? And it turns out that both our terminal and GDB parse these as two separate arguments. Because when there's a space, that's taken to mean the first argument has ended and the next argument is about to begin. The way to combine those into one argument is to use the quotes. So now, if we put it in quotes and run it again, we get an A. So just to recap, no quotes, CS50 and rocks are parsed as two separate arguments. With quotes, it's parsed as one argument altogether. 
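One way to see the difference for yourself is to print out every argument the program received. The sketch below is an illustration only; print_args is a made-up helper, not part of buggy1.

```c
#include <stdio.h>

// Hypothetical helper: prints each command-line argument on its own line,
// wrapped in brackets so any spaces inside an argument are visible.
// Returns the number of arguments printed.
int print_args(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++)
    {
        printf("argv[%d] = [%s]\n", i, argv[i]);
    }
    return argc;
}
```

If main just calls print_args(argc, argv), then running the program with CS50 rocks unquoted shows three entries (the program name, CS50, and rocks), while running it with "CS50 rocks" in quotes shows two (the program name and CS50 rocks as a single string).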

We can see this with a breakpoint. So far we've been running our program, and it's been running until either it seg faults or hits an error or until it has exited and all has been totally fine. This isn't necessarily the most helpful thing, because sometimes you have an error in your program, but it's not causing a segmentation fault. It's not causing your program to stop or anything like that. The way to get GDB to pause your program at a particular point is to set a breakpoint. You can either do this by setting a breakpoint on a function name or you can set a breakpoint on a particular line of code. I like to set breakpoints on function names, because--easy to remember, and if you actually go in and change your source code up a little bit, then your breakpoint will actually stay at the same place within your code. Whereas if you're using line numbers, and the line numbers change because you add or delete some code, then your breakpoints are all totally screwed up. One of the most common things I do is set a breakpoint on the main function. Often I'll boot up GDB, I'll type b main, hit Enter, and that'll set a breakpoint on the main function which just says, "Pause the program as soon as you start running," and that way, when I run my program with, say, CS50 rocks as two arguments and hit Enter, it gets to the main function and it stops right at the very first line, right before it evaluates the strcmp function. 

Since I'm paused, now I can start mucking around and seeing what's going on with all of the different variables that are passed into my program. Here I can print out argc and see what's going on. See that argc is 3, because it's got 3 different values in it. It's got the name of the program, it's got the first argument and the second argument. We can print those out by looking at argv[0], argv[1], and argv[2]. So now you can also see why this strcmp call is going to fail, because you see that it did split up the CS50 and the rocks into two separate arguments. At this point, once you've hit a breakpoint, you can continue to step through your program line by line, as opposed to starting your program again. So if you don't want to start your program again and just continue on from here, you can use the continue command and continue will run the program to the end. Just like it did here. However, if I restart the program, CS50 rocks, it hits my breakpoint again, and this time, if I don't want to just go all the way through the rest of the program, I can use the next command, which I also abbreviate with n. And this will step through the program line by line. So you can watch as things execute, as variables change, as things get updated. Which is pretty nice. The other cool thing is rather than repeating the same command over and over and over again, if you just hit Enter--so here you see I haven't typed in anything-- if I just hit Enter, it will repeat the previous command, or the previous GDB command that I just put in. I can keep hitting Enter and it'll keep stepping through my code line by line. I would encourage you guys to go check out the other buggy programs as well. We don't have time to get through all of them today in section. 
The source code is there, so you can kind of see what's going on behind the scenes if you get really stuck, but at the very least, just practice booting up GDB, running the program until it breaks on you, getting the backtrace, figuring out what function the crash was in, what line it was on, printing out some variable values, just so you get a feel for it, because that will really help you going forward. At this point, we're going to quit out of GDB, which you do using quit or just q. If your program is in the middle of running still, and it hasn't exited, it will always ask you, "Are you sure you really want to quit?" You can just hit yes. 

Now we're going to look at the next problem we have, which is the cat program. If you watch the short on redirecting and pipes, you'll see that Tommy uses this program that basically prints all the output of a file to the screen. So if I run cat, this is actually a built-in program to the appliance, and if you have Macs you can do this on your Mac too, if you open up terminal. And we--cat, let's say, cp.c, and hit Enter. What this did, if we scroll up a little bit and see where we ran the line, or where we ran the cat command, it literally just printed out the contents of cp.c to our screen. We can run it again and you can put in multiple files together. So you can do cat cp.c, and then we can also concatenate the cat.c file, which is the program we're about to write, and it'll print both files back to back to our screen. So if we scroll up a little bit, we see that when we ran this cat cp.c, cat.c, first it printed out the cp file, and then below it, it printed out the cat.c file right down here. We're going to use this to just get our feet wet. Play around with simple printing to the terminal, see how that works. If you guys open up with gedit cat.c, hit Enter, you can see the program that we're about to write. We've included this nice boiler plate, so we don't have to spend time typing all that out. We also check the number of arguments passed in. We print out a nice usage message. 

This is the sort of thing that, again, like we've been talking about, it's almost like muscle memory. Just remember to keep doing the same sort of stuff and always printing out some sort of helpful message so that people know how to run your program. With cat, it's pretty simple; we're just going to go through all of the different arguments that were passed to our program, and we're going to print their contents out to the screen one at a time. In order to print files out to the screen, we're going to do something very similar to what we did at the end of the quiz. At the end of the quiz, in that hire program, we had to open up a file, and then we had to print to it. In this case, we're going to open up a file, and we're going to read from it instead. Then we're going to print, instead of to a file, we're going to print to the screen. So printing to the screen you've all done before with printf. So that's not too crazy. But reading a file is kind of weird. We'll go through that a little bit at a time. If you guys go back to that last problem on your quiz, problem 33, the first line that we're going to do here, opening the file, is very similar to what we did there. So Stella, what does that line look like, when we open a file? [Stella] Capital FILE*, file-- >>Okay. >>--is equal to fopen. >>Yup. Which in this case is? It's in the comment. >>It's in the comment? argv[i] and r? >>Exactly. Right on. So Stella's totally right. This is what the line looks like. We're going to get a file stream variable, store it in a FILE*, so all caps, F-I-L-E, *, and the name of this variable will be file. We could call it whatever we like. We could call it first_file, or file_i, whatever we'd like. And then the name of the file was passed in on the command line to this program. So it's stored in argv[i], and then we're going to open this file in read mode. Now that we've opened the file, what's the thing that we always have to remember to do whenever we've opened a file? Close it. 
So Missy, how do we close a file? [Missy] fclose(file) >>fclose(file). Exactly. Great. Okay. If we look at this to do comment right here, it says, "Open argv[i] and print its contents to stdout." 
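Putting Stella's and Missy's answers together, the open and close steps might look roughly like this. The helper name open_and_close is made up for illustration, and the NULL check on fopen's return value is an addition that wasn't covered in section: fopen returns NULL when a file can't be opened, for example because it doesn't exist.

```c
#include <stdio.h>

// Hypothetical helper: opens the file at `path` in read mode and closes it.
// Returns 0 on success and 1 if the file could not be opened.
int open_and_close(const char *path)
{
    FILE *file = fopen(path, "r");
    if (file == NULL)
    {
        printf("Could not open %s.\n", path);
        return 1;
    }

    // Reading from the file would go here.

    fclose(file);
    return 0;
}
```

In cat.c, path would be argv[i] for each argument in turn.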

Standard out is a weird name. Stdout is just our way of saying we want to print it to the terminal; we want to print it to the standard output stream. We can actually get rid of this comment right here. I'm going to copy it and paste it since that's what we did. At this point, now we have to read the file bit by bit. We've discussed a couple of ways of reading files. Which ones are your favorites so far? Which ways have you seen or do you remember, to read files? [Daniel] fread? >>fread? So fread is one. Jimmy, do you know any others? [Jimmy] No. >>Okay. Nope. Charlotte? Alexander? Any others? Okay. So the other ones are fgetc, is one that we'll use a lot. There's also fscanf; you guys see a pattern here? They all begin with f. Anything to do with a file. There's fread, fgetc, fscanf. These are all of the reading functions. For writing we have fwrite, we have fputc instead of fgetc. We also have fprintf like we saw on the quiz. Since this is a problem that involves reading from a file, we're going to use one of these three functions. We're not going to use these functions down here. These functions are all found in the standard I/O library. So if you look at the top of this program, you can see that we've already included the header file for the standard I/O library. If we want to figure out which one we want to use, we can always open up the man pages. So we can type man stdio and read all about the stdio input and output functions in C. And we can already see oh, look. It's mentioning fgetc, it's mentioning fputc. So you can drill down a little bit and look at, say, fgetc and look at its man page. You can see that it goes along with a whole bunch of other functions: fgetc, fgets, getc, getchar, gets, ungetc, and its input of characters and strings. So this is how we read in characters and strings from files from standard input, which is essentially from the user. And this is how we do it in actual C. 
So this is not using the GetString and GetChar functions that we used from the CS50 library. We're going to do this problem in a couple of ways so that you can see two different ways of doing it. Both the fread function that Daniel mentioned and fgetc are good ways to do it. I think fgetc is a little easier, because it only has, as you see, one argument, the FILE* that we're trying to read the character from, and its return value is an int. And this is a little confusing, right? 

Because we're getting a character, so why doesn't this return a char? You guys have any ideas on why this might not return a char? [Missy answers, unintelligible] >>Yeah. So Missy's totally right. If it's ASCII, then this integer could be mapped to an actual char. Could be an ASCII character, and that's right. That's exactly what's happening. We're using an int simply because it has more bits. It's bigger than a char; our char only has 8 bits, that 1 byte, and on our 32-bit machines an int has all 4 bytes' worth of space. And it turns out that the way fgetc works, if we scroll down in our synopsis in this man page a little bit, scroll all the way down. It turns out that they use this special value called EOF. It's a special constant used as the return value of the fgetc function whenever you hit the end of the file or if you get an error. And it turns out that to do these comparisons with EOF properly, you want to have that extra amount of information that you have in an int as opposed to using a char variable. Even though fgetc is effectively getting a character from a file, you want to remember that it is returning something that's of type int to you. That said, it's fairly easy to use. It's going to give us a character; so all we have to do is keep asking the file, "Give me the next character, give me the next character, give me the next character," until we get to the end of the file. And that will pull in one character at a time from our file, and then we can do whatever we like with it. We can store it, we can add it to a string, we can print it out. Do any of that. 
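Here is a small sketch of why the int matters. The function count_chars is a made-up example, not part of cat.c. It keeps the result of fgetc in an int, so a byte with value 0xFF (which fgetc hands back as 255) can't be mistaken for EOF; if the result were stored in a char first, that byte could end up comparing equal to EOF on machines where char is signed, and the loop would stop early.

```c
#include <stdio.h>

// Hypothetical helper: counts how many characters can be read from `file`
// before fgetc reports EOF (end of file or an error).
long count_chars(FILE *file)
{
    long count = 0;

    // ch is an int, not a char, so every possible byte value stays
    // distinct from the special EOF value.
    for (int ch = fgetc(file); ch != EOF; ch = fgetc(file))
    {
        count++;
    }
    return count;
}
```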

Zooming back out and going back to our cat.c program, if we're going to use fgetc, how might we approach this next line of code? We're going to use--fread will do something slightly different. And this time, we're just going to use fgetc to get one character at a time. To process an entire file, what might we have to do? How many characters are there in a file? There are a lot. So you probably want to get one and then get another and get another and get another. What kind of algorithm do you think we might have to use here? What type of--? [Alexander] A for loop? >>Exactly. Some type of loop. A for loop is actually great, in this case. And like you were saying, it sounds like you want a loop over the entire file, getting a character at a time. Any suggestions on what that might look like? [Alexander, unintelligible] >>Okay, just tell me in English what you're trying to do? [Alexander, unintelligible] So in this case, it sounds like we're just trying to loop over the entire file. [Alexander] So i < the size of int? >>The size of--? I guess the size of the file, right? The size--we'll just write it like this. Size of file for the time being, i++. So it turns out that the way you do this using fgetc, and this is new, is that there's no easy way to just get the size of a file with this "sizeof" type of construct that you've seen before. When we use that fgetc function, we're introducing some kind of new, funky syntax to this for loop, where instead of using just a basic counter to go character by character, we're going to pull one character at a time, one character at a time, and the way we know we're at the end is not when we've counted a certain number of characters, but when the character we pull out is that special end of file character. So we can do this by--I call this ch, and we're going to initialize it with our first call to get the first character out of the file. 
So this part right here, this is going to get a character out of the file and store it into the variable ch. We're going to keep doing this until we get to the end of the file, which we do by testing for the character not being equal to that special EOF value. And then instead of doing ch++, which would just increment the value, so if we read an A out of the file, a capital A, say, ch++ would give us B, and then we'd get C and then D. That's clearly not what we want. What we want here in this last bit is we want to get the next character from the file. 

So how might we get the next character from the file? How do we get the first character from the file? [Student] fgetfile? >>fgetc, or, sorry, you were totally right. I misspelled it right there. So yeah. Here instead of doing ch++, we're just going to call fgetc(file) again and store the result in our same ch variable. [Student question, unintelligible] >>This is where these FILE* guys are special. The way they work is they--when you first open--when you first make that fopen call, the FILE* effectively serves as a pointer to the beginning of the file. And then every time you call fgetc, it moves one character through the file. So whenever you call this, you're incrementing the file pointer by one character. And when you fgetc again, you're moving it another character and another character and another character and another character. [Student question, unintelligible] >>And that's--yeah. It's kind of this magic under the hood. You just keep incrementing through. At this point, you're able to actually work with a character. So how might we print this out to the screen, now? We can use the same printf thing that we used before. That we've been using all semester. We can call printf, and we can pass in the character just like that. Another way to do it is rather than using printf and having to do this format string, we can also use one of the other functions. We can use fputc, which prints a character to the screen, except if we look at fputc--let me zoom out a little bit. We see what's nice is it takes in the character that we read out using fgetc, but then we have to give it a stream to print to. We can also use the putchar function, which will put directly to standard out. So there are a whole bunch of different options that we can use for printing. They're all in the standard I/O library. Whenever you want to print--so printf, by default, will print to the special standard out stream, which is that stdout. 
So we can just refer to it as kind of this magic value, stdout in here. Oops. Put the semicolon outside. 
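Pulling the pieces together, the whole thing might look something like the sketch below. This is one possible version under stated assumptions, not the official cat.c solution: the function names copy_stream and cat, the usage message, and the error message are made up for illustration, and the check for fopen returning NULL is an addition.

```c
#include <stdio.h>

// Hypothetical helper: reads `file` one character at a time with fgetc and
// writes each character to `out` with fputc. Returns the number copied.
long copy_stream(FILE *file, FILE *out)
{
    long copied = 0;
    for (int ch = fgetc(file); ch != EOF; ch = fgetc(file))
    {
        fputc(ch, out);
        copied++;
    }
    return copied;
}

// Hypothetical stand-in for cat.c's main: prints each named file to stdout.
int cat(int argc, char *argv[])
{
    if (argc < 2)
    {
        printf("Usage: cat file [file ...]\n");
        return 1;
    }

    for (int i = 1; i < argc; i++)
    {
        FILE *file = fopen(argv[i], "r");
        if (file == NULL)
        {
            printf("cat: could not open %s\n", argv[i]);
            return 1;
        }
        copy_stream(file, stdout);
        fclose(file);
    }
    return 0;
}
```

A main function would just return cat(argc, argv). Swapping fputc(ch, out) for putchar(ch) or printf("%c", ch) would also work when the destination is always the screen; fputc is used here because it lets the same loop write to any stream.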

This is a lot of new, funky information in here. A lot of this is very idiomatic, in the sense that this is code that is written this way just because it's clean to read, easy to read. There are many different ways to do it, many different functions you can use, but we tend to just follow these same patterns over and over. So don't be surprised if you see code like this coming up again and again. All right. At this point, we need to break for the day. Thanks for coming. Thanks for watching if you're online. And we'll see you next week. [CS50.TV]