1 00:00:00,000 --> 00:00:02,730 [SECTION 5: LESS COMFORTABLE] 2 00:00:02,730 --> 00:00:05,180 [Nate Hardison, Harvard University] 3 00:00:05,180 --> 00:00:08,260 [This is CS50.] [CS50.TV] 4 00:00:08,260 --> 00:00:11,690 So welcome back, guys. 5 00:00:11,690 --> 00:00:16,320 Welcome to section 5. 6 00:00:16,320 --> 00:00:20,220 At this point, having completed quiz 0 and having seen how you've done, 7 00:00:20,220 --> 00:00:25,770 hopefully you feel really good because I was very impressed by the scores in this section. 8 00:00:25,770 --> 00:00:28,050 For our online viewers, we've had a couple of questions 9 00:00:28,050 --> 00:00:33,680 about the last two problems on the problem set--or on the quiz, rather. 10 00:00:33,680 --> 00:00:39,690 So we're going to go over those really quickly so that everybody sees what happened 11 00:00:39,690 --> 00:00:45,060 and how to go through the actual solution rather than just viewing the solution itself. 12 00:00:45,060 --> 00:00:50,330 We're going to go over the last couple of problems really quickly, 32 and 33. 13 00:00:50,330 --> 00:00:53,240 Just, again, so that the online viewers can see this. 14 00:00:53,240 --> 00:00:59,080 >> If you turn to your problem 32, which is on page 13, 15 00:00:59,080 --> 00:01:02,730 13 out of 16, problem 32 is all about swaps. 16 00:01:02,730 --> 00:01:05,010 It was all about swapping two integers. 17 00:01:05,010 --> 00:01:08,740 It's the problem that we'd gone over a couple of times in lecture. 18 00:01:08,740 --> 00:01:13,590 And in here, what we were asking you to do is a quick memory trace. 19 00:01:13,590 --> 00:01:17,000 To fill in the values of the variables as they are on the stack 20 00:01:17,000 --> 00:01:20,250 as the code goes through this swap function. 21 00:01:20,250 --> 00:01:24,500 In particular, what we're looking at--I'm going to put this iPad down-- 22 00:01:24,500 --> 00:01:29,650 in particular, what we're looking at is this line numbered 6 right here. 
23 00:01:29,650 --> 00:01:36,740 And it's numbered 6 for just contiguity with the previous problem. 24 00:01:36,740 --> 00:01:41,720 What we want to do is display or label the state of memory 25 00:01:41,720 --> 00:01:46,090 as it is at the time when we execute this line number 6, 26 00:01:46,090 --> 00:01:52,540 which is effectively a return from our swap function right here. 27 00:01:52,540 --> 00:01:59,450 If we scroll down here, we saw that the addresses of everything in memory was provided for us. 28 00:01:59,450 --> 00:02:02,540 This is very key; we'll come back to it in just a moment. 29 00:02:02,540 --> 00:02:09,240 And then down here at the bottom, we had a little memory diagram that we're going to refer to. 30 00:02:09,240 --> 00:02:12,490 I have actually done this out on my iPad. 31 00:02:12,490 --> 00:02:20,720 So I'm going to alternate back and forth between the iPad and this code just for reference. 32 00:02:20,720 --> 00:02:26,540 >> Let's start. First, let's focus on the first couple of lines of main right here. 33 00:02:26,540 --> 00:02:30,220 To start, we're going to initialize x to 1 and y to 2. 34 00:02:30,220 --> 00:02:33,040 So we have two integer variables, they're both going to be placed on the stack. 35 00:02:33,040 --> 00:02:36,050 We're going to put a 1 and a 2 in them. 36 00:02:36,050 --> 00:02:43,150 So if I flip over to my iPad, hopefully, let's see-- 37 00:02:43,150 --> 00:02:48,660 Apple TV mirroring, and there we go. Okay. 38 00:02:48,660 --> 00:02:51,670 So if I flip over to my iPad, 39 00:02:51,670 --> 00:02:56,220 I want to initialize x to 1 and y to 2. 40 00:02:56,220 --> 00:03:00,580 We do that quite simply by writing a 1 in the box marked x 41 00:03:00,580 --> 00:03:07,730 and a 2 in the box marked y. Fairly simple. 42 00:03:07,730 --> 00:03:11,620 So now let's go back to the laptop, see what happens next. 43 00:03:11,620 --> 00:03:15,810 So this next line is where things get tricky. 
44 00:03:15,810 --> 00:03:28,110 We pass the address of x and the address of y as the parameters a and b to the swap function. 45 00:03:28,110 --> 00:03:32,380 The address of x and the address of y are things that we can't calculate 46 00:03:32,380 --> 00:03:36,360 without referring to these bullet points right down here. 47 00:03:36,360 --> 00:03:39,750 And fortunately, the first two bullet points tell us exactly what the answers are. 48 00:03:39,750 --> 00:03:44,740 The address of x in memory is 10, and the address of y in memory is 14. 49 00:03:44,740 --> 00:03:51,870 So those are the values that get passed in as a and b up top in our swap function. 50 00:03:51,870 --> 00:04:00,760 So again, switching back to our diagram, I can write a 10 in a 51 00:04:00,760 --> 00:04:07,400 and a 14 in b. 52 00:04:07,400 --> 00:04:11,610 Now, this point is where we proceed with the swap. 53 00:04:11,610 --> 00:04:14,520 So flipping back to the laptop again, 54 00:04:14,520 --> 00:04:21,079 we see that the way the swap works is I first dereference a and store the result in tmp. 55 00:04:21,079 --> 00:04:27,650 So the dereference operator says, "Hey. Treat the contents of variable a as an address. 56 00:04:27,650 --> 00:04:33,830 Go to whatever is stored at that address, and load it." 57 00:04:33,830 --> 00:04:41,720 What you load out of the variable is going to be stored into our tmp variable. 58 00:04:41,720 --> 00:04:45,150 Flipping back to the iPad. 59 00:04:45,150 --> 00:04:51,690 If we go to address 10, we know that address 10 is the variable x 60 00:04:51,690 --> 00:04:55,480 because we were told by our bullet point that the address of x in memory is 10. 61 00:04:55,480 --> 00:05:00,180 So we can go there, get the value of it, which is 1, as we see on our iPad, 62 00:05:00,180 --> 00:05:06,300 and load that into tmp. 63 00:05:06,300 --> 00:05:08,250 Again, this is not the final contents.
64 00:05:08,250 --> 00:05:14,350 We're going to walk through and we'll get to our final state of the program at the end. 65 00:05:14,350 --> 00:05:17,210 But right now, we have the value 1 stored in tmp. 66 00:05:17,210 --> 00:05:19,210 >> And there's a quick question over here. 67 00:05:19,210 --> 00:05:23,980 [Alexander] Is the dereference operator--that's just the star right in front of the variable? 68 00:05:23,980 --> 00:05:27,600 >>Yes. So the dereference operator, as we flip back to our laptop once again, 69 00:05:27,600 --> 00:05:33,780 is this star right in front. 70 00:05:33,780 --> 00:05:37,460 In that sense, it is--you contrast it with the multiplication operator 71 00:05:37,460 --> 00:05:42,400 which requires two things; the dereference operator is a unary operator. 72 00:05:42,400 --> 00:05:46,130 Just applied to one value as opposed to a binary operator, 73 00:05:46,130 --> 00:05:48,810 where you apply to two different values. 74 00:05:48,810 --> 00:05:52,080 So that's what happens in this line. 75 00:05:52,080 --> 00:05:58,390 We loaded the value 1 and stored it into our temporary integer variable. 76 00:05:58,390 --> 00:06:05,800 The next line, we store the contents of b into-- 77 00:06:05,800 --> 00:06:12,630 or, rather, we store the contents that b is pointing to into the place where a is pointing to. 78 00:06:12,630 --> 00:06:17,690 If we analyze this from right to left, we are going to dereference b, 79 00:06:17,690 --> 00:06:23,580 we are going to address 14, we are going to grab the integer that is there, 80 00:06:23,580 --> 00:06:26,900 and then we are going to go to the address 10, 81 00:06:26,900 --> 00:06:34,240 and we are going to throw the result of our dereference of b into that space. 82 00:06:34,240 --> 00:06:40,080 Flipping back to our iPad, where we can make this a little more concrete, 83 00:06:40,080 --> 00:06:44,070 it might help if I write numbers on all of the addresses here. 
84 00:06:44,070 --> 00:06:53,820 So we know that at y, we are at address 14, x is at address 10. 85 00:06:53,820 --> 00:07:00,180 When we start at b, we dereference b, we're going to grab the value 2. 86 00:07:00,180 --> 00:07:08,320 We are going to grab this value because that is the value that lives at address 14. 87 00:07:08,320 --> 00:07:15,700 And we're going to put it into the variable that lives at address 10, 88 00:07:15,700 --> 00:07:19,160 which is right there, corresponding to our variable x. 89 00:07:19,160 --> 00:07:21,810 So we can do a little bit of overwriting here 90 00:07:21,810 --> 00:07:35,380 where we get rid of our 1 and instead we write a 2. 91 00:07:35,380 --> 00:07:39,560 So all's well and good in the world, even though we've overwritten x now. 92 00:07:39,560 --> 00:07:44,890 We have stored x's old value in our tmp variable. 93 00:07:44,890 --> 00:07:50,210 So we can complete the swap with the next line. 94 00:07:50,210 --> 00:07:53,030 Flipping back to our laptop. 95 00:07:53,030 --> 00:07:58,150 Now all that remains is to take the contents out of our temporary integer variable 96 00:07:58,150 --> 00:08:05,630 and store them into the variable that lives at the address that b is holding. 97 00:08:05,630 --> 00:08:10,230 So we're going to effectively dereference b to get access to the variable 98 00:08:10,230 --> 00:08:14,340 that is at the address that b holds in it, 99 00:08:14,340 --> 00:08:19,190 and we're going to stuff the value that tmp is holding into it. 100 00:08:19,190 --> 00:08:23,280 Flipping back to the iPad once more. 101 00:08:23,280 --> 00:08:31,290 I can erase this value here, 2, 102 00:08:31,290 --> 00:08:41,010 and instead we'll copy the 1 right into it. 
103 00:08:41,010 --> 00:08:43,059 Then the next line that executes, of course-- 104 00:08:43,059 --> 00:08:47,150 if we flip back to the laptop--is this point 6, 105 00:08:47,150 --> 00:08:52,500 which is the point at which we wanted to have our diagram completely filled out. 106 00:08:52,500 --> 00:08:58,940 So flipping back to the iPad once more, just so you can see the completed diagram, 107 00:08:58,940 --> 00:09:06,610 you can see that we have a 10 in a, a 14 in b, a 1 in tmp, a 2 in x, and a 1 in y. 108 00:09:06,610 --> 00:09:11,000 Are there any questions about this? 109 00:09:11,000 --> 00:09:14,640 Does this make more sense, having walked through it? 110 00:09:14,640 --> 00:09:24,850 Make less sense? Hopefully not. Okay. 111 00:09:24,850 --> 00:09:28,230 >> Pointers are a very tricky subject. 112 00:09:28,230 --> 00:09:33,420 One of the guys we work with has a very common saying: 113 00:09:33,420 --> 00:09:36,590 "To understand pointers, you must first understand pointers." 114 00:09:36,590 --> 00:09:40,530 Which I think is very true. It does take a while to get used to it. 115 00:09:40,530 --> 00:09:45,360 Drawing lots of pictures, drawing lots of memory diagrams like this one are very helpful, 116 00:09:45,360 --> 00:09:49,480 and after you walk through example after example after example, 117 00:09:49,480 --> 00:09:54,450 it'll start to make a little more sense and a little more sense and a little more sense. 118 00:09:54,450 --> 00:10:01,560 Finally, one day, you'll have it all completely mastered. 119 00:10:01,560 --> 00:10:13,800 Any questions before we move on to the next problem? All right. 120 00:10:13,800 --> 00:10:18,840 So flip back to the laptop. 121 00:10:18,840 --> 00:10:23,300 The next problem we have is problem number 33 on file I/O. 122 00:10:23,300 --> 00:10:26,350 Zoom in on this a little bit. 123 00:10:26,350 --> 00:10:28,710 Problem 33--Yes? 124 00:10:28,710 --> 00:10:32,110 >> [Daniel] I just had a quick question. 
This star, or the asterisk, 125 00:10:32,110 --> 00:10:35,590 it's called dereferencing when you use an asterisk before. 126 00:10:35,590 --> 00:10:38,820 What's it called when you use the ampersand before? 127 00:10:38,820 --> 00:10:43,140 >>The ampersand before is the address-of operator. 128 00:10:43,140 --> 00:10:45,880 So let's scroll back up. 129 00:10:45,880 --> 00:10:49,310 Oops. I'm in zoom mode so I can't really scroll. 130 00:10:49,310 --> 00:10:52,780 If we look at this code really quickly right here, 131 00:10:52,780 --> 00:10:54,980 again, same thing happening. 132 00:10:54,980 --> 00:10:59,180 If we look at this code right here, on this line where we make the call to swap, 133 00:10:59,180 --> 00:11:10,460 the ampersand is just saying "get the address at which variable x lives." 134 00:11:10,460 --> 00:11:14,460 When your compiler compiles your code, 135 00:11:14,460 --> 00:11:20,590 it has to actually physically mark out a place in memory for all of your variables to live. 136 00:11:20,590 --> 00:11:24,910 And so what the compiler can then do once it's compiled everything, 137 00:11:24,910 --> 00:11:31,110 it knows, "Oh, I put x at address 10. I put y at address 14." 138 00:11:31,110 --> 00:11:34,640 It can then fill in these values for you. 139 00:11:34,640 --> 00:11:44,740 So you can then--it can then pass this in and pass &y in as well. 140 00:11:44,740 --> 00:11:50,730 These guys get the address, but they also, when you pass them into the swap function, 141 00:11:50,730 --> 00:11:55,690 this type information, this int* right here, tells the compiler, 142 00:11:55,690 --> 00:12:01,350 "Okay, we're going to be interpreting this address as an address of an integer variable." 
143 00:12:01,350 --> 00:12:05,900 As an address of an int, which is different from the address of a character variable 144 00:12:05,900 --> 00:12:09,930 because an int takes up, on a 32-bit machine, takes up 4 bytes of space, 145 00:12:09,930 --> 00:12:13,310 whereas a character only takes up 1 byte of space. 146 00:12:13,310 --> 00:12:17,310 So it's important to know also what is--what lives, what type of value 147 00:12:17,310 --> 00:12:20,340 is living at the address that got passed in. 148 00:12:20,340 --> 00:12:22,020 Or the address that you're dealing with. 149 00:12:22,020 --> 00:12:29,020 That way, you know how many bytes of information to actually load out of your RAM. 150 00:12:29,020 --> 00:12:31,780 And then, yes, this dereference operator, like you were asking, 151 00:12:31,780 --> 00:12:37,200 goes and accesses information at a particular address. 152 00:12:37,200 --> 00:12:42,820 So it says, with this a variable here, treat the contents of a as an address, 153 00:12:42,820 --> 00:12:47,880 go to that address, and pull out, load into the processor, load into a register 154 00:12:47,880 --> 00:12:56,340 the actual values or the contents that live at that address. 155 00:12:56,340 --> 00:12:59,620 Any more questions? These are good questions. 156 00:12:59,620 --> 00:13:01,650 It's a lot of new terminology too. 157 00:13:01,650 --> 00:13:09,800 It's also kind of funky, seeing & and * in different places. 158 00:13:09,800 --> 00:13:13,180 >> All right. 159 00:13:13,180 --> 00:13:18,530 So back to problem 33, file I/O. 160 00:13:18,530 --> 00:13:22,540 This was one of those problems that I think a couple of things happened. 161 00:13:22,540 --> 00:13:25,400 One, it's a fairly new topic. 
162 00:13:25,400 --> 00:13:30,590 It was presented pretty soon before the quiz, 163 00:13:30,590 --> 00:13:33,400 and then I think it was kind of like one of those word problems in math 164 00:13:33,400 --> 00:13:39,720 where they give you a lot of information, but you actually don't end up having to use a ton of it. 165 00:13:39,720 --> 00:13:44,060 The first part of this problem is describing what a CSV file is. 166 00:13:44,060 --> 00:13:50,620 Now, a CSV file, according to the description, is a comma-separated values file. 167 00:13:50,620 --> 00:13:55,300 The reason these are at all interesting, and the reason you ever use them, 168 00:13:55,300 --> 00:14:00,800 is, because, how many of you have ever used stuff like Excel? 169 00:14:00,800 --> 00:14:03,240 Figure most of you have, probably, or will use at some point in your life. 170 00:14:03,240 --> 00:14:06,430 You'll use something like Excel. 171 00:14:06,430 --> 00:14:10,940 In order to get the data out of an Excel spreadsheet or do any sort of processing with it, 172 00:14:10,940 --> 00:14:17,240 if you wanted to write a C program or Python program, Java program, 173 00:14:17,240 --> 00:14:20,070 to deal with the data you have stored in there, 174 00:14:20,070 --> 00:14:23,170 one of the most common ways to get it out is in a CSV file. 175 00:14:23,170 --> 00:14:26,850 And you can open up Excel and when you go to the 'Save As' dialogue, 176 00:14:26,850 --> 00:14:32,840 you can get out an actual CSV file. 177 00:14:32,840 --> 00:14:35,890 >> Handy to know how to deal with these things. 178 00:14:35,890 --> 00:14:42,010 The way it works is that it's similar to--I mean, it's essentially mimicking a spreadsheet, 179 00:14:42,010 --> 00:14:47,590 where, as we see here, in the very left-most piece, 180 00:14:47,590 --> 00:14:49,910 we have all the last names. 181 00:14:49,910 --> 00:14:54,670 So we have Malan, then Hardison, and then Bowden, MacWilliam, and then Chan. 
182 00:14:54,670 --> 00:14:59,470 All the last names. And then a comma separates the last names from the first names. 183 00:14:59,470 --> 00:15:02,970 David, Nate, Rob, Tommy, and Zamyla. 184 00:15:02,970 --> 00:15:06,850 I always mix up Robby and Tom. 185 00:15:06,850 --> 00:15:10,940 And then, finally, the third column is the email addresses. 186 00:15:10,940 --> 00:15:18,500 Once you understand that, the rest of the program is fairly straightforward to implement. 187 00:15:18,500 --> 00:15:23,850 What we've done in order to mimic this same structure in our C program 188 00:15:23,850 --> 00:15:27,510 is we've used a structure. 189 00:15:27,510 --> 00:15:30,520 We'll start playing with these a little more as well. 190 00:15:30,520 --> 00:15:35,790 We saw them for the first little bit in problem set 3, when we were dealing with the dictionaries. 191 00:15:35,790 --> 00:15:40,290 But this staff struct stores a last name, a first name, and an email. 192 00:15:40,290 --> 00:15:44,500 Just like our CSV file was storing. 193 00:15:44,500 --> 00:15:47,950 So this is just converting from one format to another. 194 00:15:47,950 --> 00:15:54,630 We have to convert, in this case, a staff struct into a line, 195 00:15:54,630 --> 00:15:59,060 a comma-separated line, just like that. 196 00:15:59,060 --> 00:16:01,500 Does that make sense? You guys have all taken the quiz, 197 00:16:01,500 --> 00:16:07,680 so I imagine you have at least had some time to think about this. 198 00:16:07,680 --> 00:16:16,410 >> In the hire function, the problem asks us to take in--we'll zoom in on this a little bit-- 199 00:16:16,410 --> 00:16:22,480 take in a staff structure, a staff struct, with name s, 200 00:16:22,480 --> 00:16:30,900 and append its contents to our staff.csv file. 201 00:16:30,900 --> 00:16:34,230 It turns out that this is fairly straightforward to use. 202 00:16:34,230 --> 00:16:37,430 We'll kind of play around with these functions a little bit more today. 
203 00:16:37,430 --> 00:16:44,510 But in this case, the fprintf function is really the key. 204 00:16:44,510 --> 00:16:51,960 So with fprintf, we can print, just like you guys have been using printf this whole term. 205 00:16:51,960 --> 00:16:55,050 You can printf a line to a file. 206 00:16:55,050 --> 00:16:59,030 So instead of just making the usual printf call where you give it the format string 207 00:16:59,030 --> 00:17:05,380 and then you replace all the variables with the following arguments, 208 00:17:05,380 --> 00:17:11,290 with fprintf, your very first argument is instead the file you want to write to. 209 00:17:11,290 --> 00:17:21,170 If we were to look at this in the appliance, for example, man fprintf, 210 00:17:21,170 --> 00:17:25,980 we can see the difference between printf and fprintf. 211 00:17:25,980 --> 00:17:28,960 I'll zoom in here a little bit. 212 00:17:28,960 --> 00:17:33,140 So with printf, we give it a format string, and then the subsequent arguments 213 00:17:33,140 --> 00:17:37,580 are all the variables for replacement or substitution into our format string. 214 00:17:37,580 --> 00:17:47,310 Whereas with fprintf, the first argument is indeed this FILE* called a stream. 215 00:17:47,310 --> 00:17:51,800 >> Moving back over here to our hire, 216 00:17:51,800 --> 00:17:54,550 we've already got our FILE* stream opened for us. 217 00:17:54,550 --> 00:17:57,810 That's what this first line does; it opens the staff.csv file, 218 00:17:57,810 --> 00:18:01,690 it opens it in append mode, and all that's left for us to do is 219 00:18:01,690 --> 00:18:08,640 write the staff structure to the file. 220 00:18:08,640 --> 00:18:10,870 And, let's see, do I want to use the iPad? 221 00:18:10,870 --> 00:18:17,900 I'll use the iPad. We have void--let's put this on the table so I can write a little better-- 222 00:18:17,900 --> 00:18:33,680 void hire and it takes in one argument, a staff structure called s.
223 00:18:33,680 --> 00:18:44,120 Got our braces, we've got our FILE* called file, 224 00:18:44,120 --> 00:18:48,380 we have our fopen line given to us, 225 00:18:48,380 --> 00:18:51,890 and I'll just write it as dots since it's already in the pedia. 226 00:18:51,890 --> 00:19:00,530 And then on our next line, we're going to make a call to fprintf 227 00:19:00,530 --> 00:19:03,700 and we're going to pass in the file that we want to print to, 228 00:19:03,700 --> 00:19:10,290 and then our format string, which-- 229 00:19:10,290 --> 00:19:14,300 I'll let you guys tell me what it looks like. 230 00:19:14,300 --> 00:19:20,500 How about you, Stella? Do you know what the first part of the format string looks like? 231 00:19:20,500 --> 00:19:24,270 [Stella] I'm not sure. >>Feel free to ask Jimmy. 232 00:19:24,270 --> 00:19:27,690 Do you know, Jimmy? 233 00:19:27,690 --> 00:19:31,000 [Jimmy] Would it just be last? I don't know. I'm not entirely sure. 234 00:19:31,000 --> 00:19:39,020 >>Okay. How about, did anybody get this correct on the exam? 235 00:19:39,020 --> 00:19:41,770 No. All right. 236 00:19:41,770 --> 00:19:47,920 It turns out that here all we have to do is we want each part of our staff structure 237 00:19:47,920 --> 00:19:53,290 to be printed out as a string into our file. 238 00:19:53,290 --> 00:19:59,900 We just use the string substitution character three different times because we have a last name 239 00:19:59,900 --> 00:20:07,160 followed by comma, then a first name followed by a comma, 240 00:20:07,160 --> 00:20:12,430 and then finally the email address which is followed--which is not 241 00:20:12,430 --> 00:20:15,140 fitting on my screen--but it's followed by a newline character. 242 00:20:15,140 --> 00:20:20,060 So I'm going to write it just down there.
243 00:20:20,060 --> 00:20:23,560 And then following our format string, 244 00:20:23,560 --> 00:20:27,880 we just have the substitutions, which we access using the dot notation 245 00:20:27,880 --> 00:20:31,370 that we saw in problem set 3. 246 00:20:31,370 --> 00:20:48,820 We can use s.last, s.first, and s.email 247 00:20:48,820 --> 00:20:58,990 to substitute in those three values into our format string. 248 00:20:58,990 --> 00:21:06,190 So how did that go? Make sense? 249 00:21:06,190 --> 00:21:09,700 Yes? No? Possibly? Okay. 250 00:21:09,700 --> 00:21:14,180 >> The final thing that we do after we've printed and after we've opened our file: 251 00:21:14,180 --> 00:21:17,370 whenever we've opened a file, we always have to remember to close it. 252 00:21:17,370 --> 00:21:19,430 Because otherwise we'll end up leaking the memory, 253 00:21:19,430 --> 00:21:22,500 using up file descriptors. 254 00:21:22,500 --> 00:21:25,950 So to close it, which function do we use? Daniel? 255 00:21:25,950 --> 00:21:30,120 [Daniel] fclose? >> fclose, exactly. 256 00:21:30,120 --> 00:21:37,520 So the last part of this problem was to properly close the file, using the fclose function, 257 00:21:37,520 --> 00:21:40,370 which just looks like that. 258 00:21:40,370 --> 00:21:43,880 Not too crazy. 259 00:21:43,880 --> 00:21:46,990 Cool. 260 00:21:46,990 --> 00:21:49,520 So that's problem 33 on the quiz. 261 00:21:49,520 --> 00:21:52,480 We'll have definitely more file I/O coming up. 262 00:21:52,480 --> 00:21:55,130 We'll do a little bit more in lecture today, or in section today, 263 00:21:55,130 --> 00:22:01,710 because that's what's going to form the bulk of this upcoming pset. 264 00:22:01,710 --> 00:22:05,020 Let's move on from the quiz at this point. Yes? 265 00:22:05,020 --> 00:22:10,880 >> [Charlotte] Why fclose(file) instead of fclose(staff.csv)? 266 00:22:10,880 --> 00:22:19,100 >>Ah.
Because it turns out that--so the question, which is a great one, 267 00:22:19,100 --> 00:22:27,800 is why, when we write fclose, are we writing fclose with the FILE* variable 268 00:22:27,800 --> 00:22:33,680 as opposed to the file name, staff.csv? Is that correct? Yeah. 269 00:22:33,680 --> 00:22:39,570 So let's take a look. If I switch back to my laptop, 270 00:22:39,570 --> 00:22:45,040 and let's look at the fclose function. 271 00:22:45,040 --> 00:22:51,460 So the fclose function closes a stream and it takes in the pointer to the stream that we want to close, 272 00:22:51,460 --> 00:22:57,010 as opposed to the actual file name that we want to close. 273 00:22:57,010 --> 00:23:01,620 And this is because behind the scenes, when you make a call to fopen, 274 00:23:01,620 --> 00:23:12,020 when you open up a file, you're actually allocating memory to store information about the file. 275 00:23:12,020 --> 00:23:16,380 So you have a file pointer that has information about the file, 276 00:23:16,380 --> 00:23:23,080 such as it's open, its size, where you are currently in the file, 277 00:23:23,080 --> 00:23:29,100 so that you can make reading and writing calls to that particular place within the file. 278 00:23:29,100 --> 00:23:38,060 You end up closing the pointer instead of closing the file name. 279 00:23:38,060 --> 00:23:48,990 >> Yes? [Daniel] So in order to use hire, would you say--how does it get the user input? 280 00:23:48,990 --> 00:23:53,830 Does fprintf act like GetString in the sense that it'll just wait for the user input 281 00:23:53,830 --> 00:23:57,180 and ask you to type this--or wait for you to type these three things in? 282 00:23:57,180 --> 00:24:00,480 Or do you need to use something to implement hire? 283 00:24:00,480 --> 00:24:04,100 >>Yeah. So we're not--the question was, how do we get the user input 284 00:24:04,100 --> 00:24:09,220 in order to implement hire?
And what we have here is the caller of hire, 285 00:24:09,220 --> 00:24:17,690 passed in this staff struct with all of the data stored in the struct already. 286 00:24:17,690 --> 00:24:22,990 So fprintf is able to just write that data directly to the file. 287 00:24:22,990 --> 00:24:25,690 There's no waiting for user input. 288 00:24:25,690 --> 00:24:32,110 The user's already given the input by properly putting it in this staff struct. 289 00:24:32,110 --> 00:24:36,510 And things, of course, would break if any of those pointers were null, 290 00:24:36,510 --> 00:24:40,370 so we scroll back up here and we look at our struct. 291 00:24:40,370 --> 00:24:43,640 We have string last, string first, string email. 292 00:24:43,640 --> 00:24:48,530 We now know that all of those really, under the hood, are char* variables. 293 00:24:48,530 --> 00:24:53,470 That may or may not be pointing to null. 294 00:24:53,470 --> 00:24:55,800 They may be pointing to memory on the heap, 295 00:24:55,800 --> 00:24:59,650 maybe memory on the stack. 296 00:24:59,650 --> 00:25:04,580 We don't really know, but if any of these pointers are null, or invalid, 297 00:25:04,580 --> 00:25:08,120 that'll definitely crash our hire function. 298 00:25:08,120 --> 00:25:11,050 That was something that was kind of beyond the scope of the exam. 299 00:25:11,050 --> 00:25:16,440 We're not worrying about that. 300 00:25:16,440 --> 00:25:22,170 Great. Okay. So moving on from the quiz. 301 00:25:22,170 --> 00:25:25,760 >> Let's close this guy, and we're going to look at pset 4. 302 00:25:25,760 --> 00:25:34,700 So if you guys look at the pset spec, once you can access it, cs50.net/quizzes, 303 00:25:34,700 --> 00:25:42,730 we are going to go through a few of the section problems today. 304 00:25:42,730 --> 00:25:52,240 I'm scrolling down--section of questions begins on the third page of the pset spec.
305 00:25:52,240 --> 00:25:57,800 And the first part asks you to go and watch the short on redirecting and pipes. 306 00:25:57,800 --> 00:26:02,820 Which was kind of a cool short, shows you some new, cool command line tricks that you can use. 307 00:26:02,820 --> 00:26:06,050 And then we've got a few questions for you as well. 308 00:26:06,050 --> 00:26:10,860 This first question about streams, to which printf writes by default, 309 00:26:10,860 --> 00:26:15,920 we kind of touched on just a little bit a moment ago. 310 00:26:15,920 --> 00:26:22,380 This fprintf that we were just discussing takes in a FILE* stream as its argument. 311 00:26:22,380 --> 00:26:26,580 fclose takes in a FILE* stream as well, 312 00:26:26,580 --> 00:26:32,660 and the return value of fopen gives you a FILE* stream as well. 313 00:26:32,660 --> 00:26:36,060 The reason we haven't seen those before when we've dealt with printf 314 00:26:36,060 --> 00:26:39,450 is because printf has a default stream. 315 00:26:39,450 --> 00:26:41,810 And the default stream to which it writes 316 00:26:41,810 --> 00:26:45,190 you'll find out about in the short. 317 00:26:45,190 --> 00:26:50,080 So definitely take a look at it. 318 00:26:50,080 --> 00:26:53,010 >> In today's section, we're going to talk a little bit about GDB, 319 00:26:53,010 --> 00:26:57,720 since the more familiar you are with it, the more practice you get with it, 320 00:26:57,720 --> 00:27:01,390 the better able you'll be to actually hunt down bugs in your own code. 321 00:27:01,390 --> 00:27:05,540 This speeds the process of debugging up tremendously. 322 00:27:05,540 --> 00:27:09,230 So by using printf, every time you do that you have to recompile your code, 323 00:27:09,230 --> 00:27:13,000 you have to run it again, sometimes you have to move the printf call around, 324 00:27:13,000 --> 00:27:17,100 comment out code, it just takes a while.
325 00:27:17,100 --> 00:27:20,850 Our goal is to try and convince you that with GDB, you can essentially 326 00:27:20,850 --> 00:27:26,810 printf anything at any point in your code and you never have to recompile it. 327 00:27:26,810 --> 00:27:35,120 You never have to start and keep guessing where to printf next. 328 00:27:35,120 --> 00:27:40,910 The first thing to do is to copy this line and get the section code off of the web. 329 00:27:40,910 --> 00:27:47,530 I'm copying this line of code that says, "wget http://cdn.cs50.net". 330 00:27:47,530 --> 00:27:49,510 I'm going to copy it. 331 00:27:49,510 --> 00:27:55,950 I'm going to go over to my appliance, zoom out so you can see what I'm doing, 332 00:27:55,950 --> 00:28:01,890 pasting it in there, and when I hit Enter, this wget command literally is a web get. 333 00:28:01,890 --> 00:28:06,210 It's going to pull down this file off of the Internet, 334 00:28:06,210 --> 00:28:11,790 and it's going to save it to the current directory. 335 00:28:11,790 --> 00:28:21,630 Now if I list my current directory you can see that I've got this section5.zip file right in there. 336 00:28:21,630 --> 00:28:25,260 The way to deal with that guy is to unzip it, 337 00:28:25,260 --> 00:28:27,650 which you can do in the command line, just like this. 338 00:28:27,650 --> 00:28:31,880 Section5.zip. 339 00:28:31,880 --> 00:28:36,980 That'll unzip it, create the folder for me, 340 00:28:36,980 --> 00:28:40,410 inflate all of the contents, put them in there. 341 00:28:40,410 --> 00:28:47,410 So now I can go into my section 5 directory using the cd command. 342 00:28:47,410 --> 00:28:58,310 Clear the screen using clear. So clear the screen. 343 00:28:58,310 --> 00:29:02,280 Now I've got a nice clean terminal to deal with. 344 00:29:02,280 --> 00:29:06,200 >> Now if I list all the files that I see in this directory, 345 00:29:06,200 --> 00:29:12,270 you see that I've got four files: buggy1, buggy2, buggy3, and buggy4. 
346 00:29:12,270 --> 00:29:16,180 I've also got their corresponding .c files. 347 00:29:16,180 --> 00:29:20,400 We're not going to look at the .c files for now. 348 00:29:20,400 --> 00:29:24,140 Instead, we're going to use them when we open up GDB. 349 00:29:24,140 --> 00:29:28,220 We've kept them around so that we have access to the actual source code when we're using GDB, 350 00:29:28,220 --> 00:29:32,740 but the goal of this part of the section is to tinker around with GDB 351 00:29:32,740 --> 00:29:40,370 and see how we can use it to figure out what's going wrong with each of these four buggy programs. 352 00:29:40,370 --> 00:29:43,380 So we're just going to go around the room really quickly, 353 00:29:43,380 --> 00:29:47,000 and I'm going to ask somebody to run one of the buggy programs, 354 00:29:47,000 --> 00:29:54,730 and then we'll go as a group through GDB, and we'll see what we can do to fix these programs, 355 00:29:54,730 --> 00:29:58,460 or to at least identify what's going wrong in each of them. 356 00:29:58,460 --> 00:30:04,760 Let's start over here with Daniel. Will you run buggy1? Let's see what happens. 357 00:30:04,760 --> 00:30:09,470 [Daniel] It says there's an application fault. >>Yeah. Exactly. 358 00:30:09,470 --> 00:30:12,460 So if I run buggy1, I get a seg fault. 359 00:30:12,460 --> 00:30:16,210 At this point, I could go and open up buggy1.c, 360 00:30:16,210 --> 00:30:19,450 try and figure out what's going wrong, 361 00:30:19,450 --> 00:30:22,000 but one of the most obnoxious things about this seg fault error 362 00:30:22,000 --> 00:30:27,610 is that it doesn't tell you on what line of the program things actually went wrong and broke. 363 00:30:27,610 --> 00:30:29,880 You kind of have to look at the code 364 00:30:29,880 --> 00:30:33,990 and figure out using guess and check or printf to see what's going wrong.
365 00:30:33,990 --> 00:30:37,840 One of the coolest things about GDB is that it's really, really easy 366 00:30:37,840 --> 00:30:42,170 to figure out the line at which your program crashes. 367 00:30:42,170 --> 00:30:46,160 It's totally worth it to use it, even if just for that. 368 00:30:46,160 --> 00:30:56,190 So to boot up GDB, I type GDB, and then I give it the path to the executable that I want to run. 369 00:30:56,190 --> 00:31:01,960 Here I'm typing gdb ./buggy1. 370 00:31:01,960 --> 00:31:06,600 Hit Enter. Gives me all this copyright information, 371 00:31:06,600 --> 00:31:13,000 and down here you'll see this line that says, "Reading symbols from /home/ 372 00:31:13,000 --> 00:31:17,680 jharvard/section5/buggy1." 373 00:31:17,680 --> 00:31:22,060 And if all goes well, you'll see it print out a message that looks like this. 374 00:31:22,060 --> 00:31:25,500 It'll read symbols, it'll say "I'm reading symbols from your executable file," 375 00:31:25,500 --> 00:31:29,900 and then it will have this "done" message over here. 376 00:31:29,900 --> 00:31:35,410 If you see some other variation of this, or you see it couldn't find the symbols 377 00:31:35,410 --> 00:31:41,460 or something like that, what that means is that you just haven't compiled your executable properly. 378 00:31:41,460 --> 00:31:49,980 When we compile programs for use with GDB, we have to use that special -g flag, 379 00:31:49,980 --> 00:31:54,540 and that's done by default if you compile your programs, just by typing make 380 00:31:54,540 --> 00:31:59,320 or make buggy or make recover, any of those. 381 00:31:59,320 --> 00:32:07,800 But if you're compiling manually with Clang, then you'll have to go in and include that -g flag. 382 00:32:07,800 --> 00:32:10,310 >> At this point, now that we have our GDB prompt, 383 00:32:10,310 --> 00:32:12,310 it's pretty simple to run the program. 384 00:32:12,310 --> 00:32:19,740 We can either type run, or we can just type r.
385 00:32:19,740 --> 00:32:22,820 Most GDB commands can be abbreviated. 386 00:32:22,820 --> 00:32:25,940 Usually to just one or a couple letters, which is pretty nice. 387 00:32:25,940 --> 00:32:30,980 So Saad, if you type r and hit Enter, what happens? 388 00:32:30,980 --> 00:32:39,390 [Saad] I got SIGSEGV, segmentation fault, and then all this gobbledygook. 389 00:32:39,390 --> 00:32:43,650 >>Yeah. 390 00:32:43,650 --> 00:32:47,990 Like we're seeing on the screen right now, and like Saad said, 391 00:32:47,990 --> 00:32:53,430 when we type run or r and hit Enter, we still get the same seg fault. 392 00:32:53,430 --> 00:32:55,830 So using GDB doesn't solve our problem. 393 00:32:55,830 --> 00:32:59,120 But it gives us some gobbledygook, and it turns out that this gobbledygook 394 00:32:59,120 --> 00:33:03,080 actually tells us where it's happening. 395 00:33:03,080 --> 00:33:10,680 To parse this a little bit, this first bit is the function in which everything's going wrong. 396 00:33:10,680 --> 00:33:20,270 There's this __strcmp_sse4_2, and it tells us that it's happening in this file 397 00:33:20,270 --> 00:33:29,450 called sysdeps/i386, all this, again, kind of a mess--but line 254. 398 00:33:29,450 --> 00:33:31,670 That's kind of hard to parse. Usually when you see stuff like this, 399 00:33:31,670 --> 00:33:38,770 that means that it's seg faulting in one of the system libraries. 400 00:33:38,770 --> 00:33:43,220 So something to do with strcmp. You guys have seen strcmp before. 401 00:33:43,220 --> 00:33:52,730 Not too crazy, but does this mean that strcmp is broken or that there's a problem with strcmp? 402 00:33:52,730 --> 00:33:57,110 What do you think, Alexander? 403 00:33:57,110 --> 00:34:04,890 [Alexander] Is that--is 254 the line? And the--not the binary, but it's not their ceilings, 404 00:34:04,890 --> 00:34:10,590 and then there's another language for each function. Is that 254 in that function, or--? 
405 00:34:10,590 --> 00:34:21,460 >>It's line 254. It looks like in this .s file, so it's assembly code probably. 406 00:34:21,460 --> 00:34:25,949 >> But, I guess the more pressing thing is, because we've gotten a seg fault, 407 00:34:25,949 --> 00:34:29,960 and it looks like it's coming from the strcmp function, 408 00:34:29,960 --> 00:34:38,030 does this imply, then, that strcmp is broken? 409 00:34:38,030 --> 00:34:42,290 It shouldn't, hopefully. So just because you have a segmentation fault 410 00:34:42,290 --> 00:34:49,480 in one of the system functions, typically that means that you just haven't called it correctly. 411 00:34:49,480 --> 00:34:52,440 The quickest thing to do to figure out what's actually going on 412 00:34:52,440 --> 00:34:55,500 when you see something crazy like this, whenever you see a seg fault, 413 00:34:55,500 --> 00:34:59,800 especially if you have a program that's using more than just main, 414 00:34:59,800 --> 00:35:03,570 is to use a backtrace. 415 00:35:03,570 --> 00:35:13,080 I abbreviate backtrace by writing bt, as opposed to the full backtrace word. 416 00:35:13,080 --> 00:35:16,510 But Charlotte, what happens when you type bt and hit Enter? 417 00:35:16,510 --> 00:35:23,200 [Charlotte] It shows me two lines, line 0 and line 1. 418 00:35:23,200 --> 00:35:26,150 >>Yeah. So line 0 and line 1. 419 00:35:26,150 --> 00:35:34,560 These are the actual stack frames that were currently in play when your program crashed. 420 00:35:34,560 --> 00:35:42,230 Starting from the topmost frame, frame 0, and going to the bottom-most, which is frame 1. 421 00:35:42,230 --> 00:35:45,140 Our topmost frame is the strcmp frame. 
422 00:35:45,140 --> 00:35:50,080 You can think of this as similar to that problem we were just doing on the quiz with the pointers, 423 00:35:50,080 --> 00:35:54,890 where we had swap stack frame on top of main stack frame, 424 00:35:54,890 --> 00:35:59,700 and we had the variables that swap was using on top of the variables that main was using. 425 00:35:59,700 --> 00:36:08,440 Here our crash happened in our strcmp function, which was called by our main function, 426 00:36:08,440 --> 00:36:14,370 and backtrace is giving us not only the functions in which things failed, 427 00:36:14,370 --> 00:36:16,440 but it's also telling us where everything was called from. 428 00:36:16,440 --> 00:36:18,830 So if I scroll over a little more to the right, 429 00:36:18,830 --> 00:36:26,110 we can see that yeah, we were on line 254 of this strcmp-sse4.s file. 430 00:36:26,110 --> 00:36:32,540 But the call was made at buggy1.c, line 6. 431 00:36:32,540 --> 00:36:35,960 So that means we can do--is we can just go check out and see what was going on 432 00:36:35,960 --> 00:36:39,930 at buggy1.c, line 6. 433 00:36:39,930 --> 00:36:43,780 Again, there are a couple ways to do this. One is to exit out of GDB 434 00:36:43,780 --> 00:36:49,460 or have your code open in another window and cross reference. 435 00:36:49,460 --> 00:36:54,740 That, in and of itself, is pretty handy because now if you're at office hours 436 00:36:54,740 --> 00:36:57,220 and you've got a seg fault and your TF's wondering where everything was breaking, 437 00:36:57,220 --> 00:36:59,710 you can just say, "Oh, line 6. I don't know what's going on, 438 00:36:59,710 --> 00:37:03,670 but something about line 6 is causing my program to break." 439 00:37:03,670 --> 00:37:10,430 The other way to do it is you can use this command called list in GDB. 440 00:37:10,430 --> 00:37:13,650 You can also abbreviate it with l. 441 00:37:13,650 --> 00:37:18,910 So if we hit l, what do we get here? 
442 00:37:18,910 --> 00:37:21,160 We get a whole bunch of weird stuff. 443 00:37:21,160 --> 00:37:26,030 This is the actual assembly code 444 00:37:26,030 --> 00:37:29,860 that is in strcmp_sse4_2. 445 00:37:29,860 --> 00:37:32,440 This looks kind of funky, 446 00:37:32,440 --> 00:37:36,520 and the reason we're getting this is because right now, 447 00:37:36,520 --> 00:37:40,160 GDB has us in frame 0. 448 00:37:40,160 --> 00:37:43,070 >> So anytime we look at variables, any time we look at source code, 449 00:37:43,070 --> 00:37:50,530 we're looking at source code that pertains to the stack frame we're currently in. 450 00:37:50,530 --> 00:37:53,200 So in order to get anything meaningful, we have to 451 00:37:53,200 --> 00:37:57,070 move to a stack frame that makes more sense. 452 00:37:57,070 --> 00:38:00,180 In this case, the main stack frame would make a little more sense, 453 00:38:00,180 --> 00:38:02,680 because that was actually the code that we wrote. 454 00:38:02,680 --> 00:38:05,330 Not the strcmp code. 455 00:38:05,330 --> 00:38:08,650 The way you can move between frames, in this case, because we have two, 456 00:38:08,650 --> 00:38:10,430 we have 0 and 1, 457 00:38:10,430 --> 00:38:13,650 you do that with the up and down commands. 458 00:38:13,650 --> 00:38:18,480 If I move up one frame, 459 00:38:18,480 --> 00:38:21,770 now I'm in the main stack frame. 460 00:38:21,770 --> 00:38:24,330 I can move down to go back to where I was, 461 00:38:24,330 --> 00:38:32,830 go up again, go down again, and go up again. 462 00:38:32,830 --> 00:38:39,750 If you ever do your program in GDB, you get a crash, you get the backtrace, 463 00:38:39,750 --> 00:38:42,380 and you see that it's in some file that you don't know what's going on. 464 00:38:42,380 --> 00:38:45,460 You try list, the code doesn't look familiar to you, 465 00:38:45,460 --> 00:38:48,150 take a look at your frames and figure out where you are. 
466 00:38:48,150 --> 00:38:51,010 You're probably in the wrong stack frame. 467 00:38:51,010 --> 00:38:58,760 Or at least you're in a stack frame that isn't one that you can really debug. 468 00:38:58,760 --> 00:39:03,110 Now that we're in the appropriate stack frame, we're in main, 469 00:39:03,110 --> 00:39:08,100 now we can use the list command to figure out what the line was. 470 00:39:08,100 --> 00:39:13,590 And you can see it; it printed it for us right here. 471 00:39:13,590 --> 00:39:19,470 But we can hit list all the same, and list gives us this nice printout 472 00:39:19,470 --> 00:39:23,920 of the actual source code that's going on in here. 473 00:39:23,920 --> 00:39:26,420 >> In particular, we can look at line 6. 474 00:39:26,420 --> 00:39:29,330 We can see what's going on here. 475 00:39:29,330 --> 00:39:31,250 And it looks like we're making a string comparison 476 00:39:31,250 --> 00:39:41,050 between the string "CS50 rocks" and argv[1]. 477 00:39:41,050 --> 00:39:45,700 Something about this was crashing. 478 00:39:45,700 --> 00:39:54,120 So Missy, do you have any thoughts on what might be going on here? 479 00:39:54,120 --> 00:39:59,400 [Missy] I don't know why it's crashing. >>You don't know why it's crashing? 480 00:39:59,400 --> 00:40:02,700 Jimmy, any thoughts? 481 00:40:02,700 --> 00:40:06,240 [Jimmy] I'm not entirely sure, but the last time we used string compare, 482 00:40:06,240 --> 00:40:10,260 or strcmp, we had like three different cases under it. 483 00:40:10,260 --> 00:40:12,800 We didn't have an ==, I don't think, right in that first line. 484 00:40:12,800 --> 00:40:16,700 Instead it was separated into three, and one was ==0, 485 00:40:16,700 --> 00:40:19,910 one was < 0, I think, and one was > 0. 486 00:40:19,910 --> 00:40:22,590 So maybe something like that? >>Yeah. So there's this issue 487 00:40:22,590 --> 00:40:27,200 of are we doing the comparison correctly? 488 00:40:27,200 --> 00:40:31,660 Stella? Any thoughts? 
489 00:40:31,660 --> 00:40:38,110 [Stella] I'm not sure. >>Not sure. Daniel? Thoughts? Okay. 490 00:40:38,110 --> 00:40:44,770 It turns out what's happening right here is when we ran the program 491 00:40:44,770 --> 00:40:48,370 and we got the seg fault, when you ran the program for the first time, Daniel, 492 00:40:48,370 --> 00:40:50,800 did you give it any command line arguments? 493 00:40:50,800 --> 00:40:58,420 [Daniel] No. >>No. In that case, what is the value of argv[1]? 494 00:40:58,420 --> 00:41:00,920 >>There is no value. >>Right. 495 00:41:00,920 --> 00:41:06,120 Well, there is no appropriate string value. 496 00:41:06,120 --> 00:41:10,780 But there is some value. What is the value that gets stored in there? 497 00:41:10,780 --> 00:41:15,130 >>A garbage value? >>It's either a garbage value or, in this case, 498 00:41:15,130 --> 00:41:19,930 the end of the argv array is always terminated with null. 499 00:41:19,930 --> 00:41:26,050 So what actually got stored in there is null. 500 00:41:26,050 --> 00:41:30,810 The other way to solve this, rather than thinking it through, 501 00:41:30,810 --> 00:41:33,420 is to try printing it out. 502 00:41:33,420 --> 00:41:35,880 This is where I was saying that using GDB is great, 503 00:41:35,880 --> 00:41:40,640 because you can print out all the variables, all the values that you want 504 00:41:40,640 --> 00:41:43,230 using this handy-dandy p command. 505 00:41:43,230 --> 00:41:48,520 So if I type p and then I type the value of a variable or the name of a variable, 506 00:41:48,520 --> 00:41:55,320 say, argc, I see that argc is 1. 507 00:41:55,320 --> 00:42:01,830 If I want to print out argv[0], I can do so just like that. 508 00:42:01,830 --> 00:42:04,840 And like we saw, argv[0] is always the name of your program, 509 00:42:04,840 --> 00:42:06,910 always the name of the executable. 510 00:42:06,910 --> 00:42:09,740 Here you see it's got the full path name. 
511 00:42:09,740 --> 00:42:15,920 I can also print out argv[1] and see what happens. 512 00:42:15,920 --> 00:42:20,890 >> Here we got this kind of mystical value. 513 00:42:20,890 --> 00:42:23,890 We got this 0x0. 514 00:42:23,890 --> 00:42:27,850 Remember at the beginning of the term when we talked about hexadecimal numbers? 515 00:42:27,850 --> 00:42:34,680 Or that little question at the end of pset 0 about how to represent 50 in hex? 516 00:42:34,680 --> 00:42:39,410 The way we write hex numbers in CS, just to not confuse ourselves 517 00:42:39,410 --> 00:42:46,080 with decimal numbers, is we always prefix them with 0x. 518 00:42:46,080 --> 00:42:51,420 So this 0x prefix always just means interpret the following number as a hexadecimal number, 519 00:42:51,420 --> 00:42:57,400 not as a string, not as a decimal number, not as a binary number. 520 00:42:57,400 --> 00:43:02,820 Since the number 5-0 is a valid number in hexadecimal. 521 00:43:02,820 --> 00:43:06,240 And it's a number in decimal, 50. 522 00:43:06,240 --> 00:43:10,050 So this is just how we disambiguate. 523 00:43:10,050 --> 00:43:14,860 So 0x0 means hexadecimal 0, which is also decimal 0, binary 0. 524 00:43:14,860 --> 00:43:17,030 It's just the value 0. 525 00:43:17,030 --> 00:43:22,630 It turns out that this is what null is, actually, in memory. 526 00:43:22,630 --> 00:43:25,940 Null is just 0. 527 00:43:25,940 --> 00:43:37,010 Here, the element stored at argv[1] is null. 528 00:43:37,010 --> 00:43:45,220 So we're trying to compare our "CS50 rocks" string to a null string. 529 00:43:45,220 --> 00:43:48,130 So dereferencing null, trying to access things at null, 530 00:43:48,130 --> 00:43:55,050 those are typically going to cause some sort of segmentation fault or other bad things to happen. 531 00:43:55,050 --> 00:43:59,350 And it turns out that strcmp doesn't check to see 532 00:43:59,350 --> 00:44:04,340 whether or not you've passed in a value that's null. 
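To make the point about the 0x prefix concrete, here is a small sketch in C. The helper name parse_in_base is made up for illustration and is not part of the section code; it only wraps the standard strtol function so the same digits can be read in two different bases.

```c
#include <stdlib.h>

// Hypothetical helper for illustration: interprets a digit string in the
// given base (10 for decimal, 16 for hexadecimal) using the standard strtol.
long parse_in_base(const char *digits, int base)
{
    return strtol(digits, NULL, base);
}
```

Read as decimal, the digits 5-0 are fifty; read as hexadecimal, the same digits are eighty, which is why the prefix is needed to tell them apart. Going the other way, decimal 50 is written 0x32 in hex, and 0x0 is plain zero in any base.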
533 00:44:04,340 --> 00:44:06,370 Rather, it just goes ahead, tries to do its thing, 534 00:44:06,370 --> 00:44:14,640 and if it seg faults, it seg faults, and it's your problem. You have to go fix it. 535 00:44:14,640 --> 00:44:19,730 Really quickly, how might we fix this problem? Charlotte? 536 00:44:19,730 --> 00:44:23,540 [Charlotte] You can check using if. 537 00:44:23,540 --> 00:44:32,240 So if argv[1] is null, ==0, then return 1, or something [unintelligible]. 538 00:44:32,240 --> 00:44:34,590 >>Yeah. So that's one great way to do it, as we can check to see, 539 00:44:34,590 --> 00:44:39,230 the value we're about to pass into strcmp, argv[1], is it null? 540 00:44:39,230 --> 00:44:45,830 If it's null, then we can say okay, abort. 541 00:44:45,830 --> 00:44:49,450 >> A more common way to do this is to use the argc value. 542 00:44:49,450 --> 00:44:52,040 You can see right here at the beginning of main, 543 00:44:52,040 --> 00:44:58,040 we omitted that first test that we typically do when we use command line arguments, 544 00:44:58,040 --> 00:45:05,240 which is to test whether or not our argc value is what we expect. 545 00:45:05,240 --> 00:45:10,290 In this case, we're expecting at least two arguments, 546 00:45:10,290 --> 00:45:13,660 the name of the program plus one other. 547 00:45:13,660 --> 00:45:17,140 Because we're about to use the second argument right here. 548 00:45:17,140 --> 00:45:21,350 So having some sort of test beforehand, before our strcmp call 549 00:45:21,350 --> 00:45:37,390 that tests whether or not argc is at least 2, would also do the same sort of thing. 550 00:45:37,390 --> 00:45:40,620 We can see if that works by running the program again. 551 00:45:40,620 --> 00:45:45,610 You can always restart your program within GDB, which is really nice. 552 00:45:45,610 --> 00:45:49,310 You can run, and when you pass in arguments to your program, 553 00:45:49,310 --> 00:45:53,060 you pass them in when you call run, not when you boot up GDB.
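The argc check described above could look something like the following sketch. This is not the actual buggy1.c source: the function name grade_message is hypothetical, and the message strings are approximations of what the program prints. The only point is that the guard runs before strcmp ever sees argv[1].

```c
#include <string.h>

// Hypothetical helper, not the real buggy1.c: returns the message the
// program would print, or NULL when no argument was supplied.
const char *grade_message(int argc, char *argv[])
{
    // Guard first: with fewer than 2 arguments, argv[1] is NULL,
    // and handing NULL to strcmp is what caused the seg fault.
    if (argc < 2)
    {
        return NULL;
    }

    // strcmp returns 0 when the two strings are identical.
    if (strcmp("CS50 rocks", argv[1]) == 0)
    {
        return "You get an A.";
    }
    return "You get a D.";
}
```

A main function built on this would print a usage message and return a nonzero value whenever the helper hands back NULL, rather than carrying on to the comparison.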
554 00:45:53,060 --> 00:45:57,120 That way you can keep invoking your program with different arguments each time. 555 00:45:57,120 --> 00:46:08,080 So run, or again, I can type r, and let's see what happens if we type "hello". 556 00:46:08,080 --> 00:46:11,140 It will always ask you if you want to start it from the beginning again. 557 00:46:11,140 --> 00:46:17,490 Usually, you do want to start it from the beginning again. 558 00:46:17,490 --> 00:46:25,010 And at this point, it restarts it again, it prints out 559 00:46:25,010 --> 00:46:28,920 the program that we're running, buggy1, with the argument hello, 560 00:46:28,920 --> 00:46:32,720 and it prints this standard out; it says, "You get a D," sad face. 561 00:46:32,720 --> 00:46:37,610 But we didn't seg fault. It said that process exited normally. 562 00:46:37,610 --> 00:46:39,900 So that looks pretty good. 563 00:46:39,900 --> 00:46:43,050 No more seg fault, we made it past, 564 00:46:43,050 --> 00:46:48,190 so it looks like that was indeed the seg fault bug that we were getting. 565 00:46:48,190 --> 00:46:51,540 Unfortunately, it tells us that we're getting a D. 566 00:46:51,540 --> 00:46:54,090 >> We can go back and look at the code and see what was going on there 567 00:46:54,090 --> 00:46:57,980 to figure out what was--why it was telling us that we got a D. 568 00:46:57,980 --> 00:47:03,690 Let's see, here was this printf saying that you got a D. 569 00:47:03,690 --> 00:47:08,540 If we type list, as you keep typing list, it keeps iterating down through your program, 570 00:47:08,540 --> 00:47:10,940 so it'll show you the first few lines of your program. 571 00:47:10,940 --> 00:47:15,450 Then it'll show you the next few lines, and the next chunk and the next chunk. 572 00:47:15,450 --> 00:47:18,240 And it'll keep trying to go down. 573 00:47:18,240 --> 00:47:21,180 And now we'll get to "line number 16 is out of range." 574 00:47:21,180 --> 00:47:23,940 Because it only has 15 lines. 
575 00:47:23,940 --> 00:47:30,310 If you get to this point and you're wondering, "What do I do?" you can use the help command. 576 00:47:30,310 --> 00:47:34,340 Use help and then give it the name of a command. 577 00:47:34,340 --> 00:47:36,460 And you see that GDB gives us all this sort of stuff. 578 00:47:36,460 --> 00:47:43,870 It says, "With no argument, lists ten more lines after or around the previous listing. 579 00:47:43,870 --> 00:47:47,920 List - lists the ten lines before--" 580 00:47:47,920 --> 00:47:52,960 So let's try using list minus. 581 00:47:52,960 --> 00:47:57,000 And that lists the 10 lines previous; you can play around with list a little bit. 582 00:47:57,000 --> 00:48:02,330 You can do list, list -, you can even give list a number, like list 8, 583 00:48:02,330 --> 00:48:07,500 and it'll list the 10 lines around line 8. 584 00:48:07,500 --> 00:48:10,290 And you can see what's going on here is you've got a simple if else. 585 00:48:10,290 --> 00:48:13,980 If you type in CS50 rocks, it prints out "You get an A." 586 00:48:13,980 --> 00:48:16,530 Otherwise it prints out "You get a D." 587 00:48:16,530 --> 00:48:23,770 Bummer town. All right. Yes? 588 00:48:23,770 --> 00:48:26,730 >> [Daniel] So when I tried doing CS50 rocks without the quotes, 589 00:48:26,730 --> 00:48:29,290 it says "You get a D." 590 00:48:29,290 --> 00:48:32,560 I needed the quotes to get it to work; why is that? 591 00:48:32,560 --> 00:48:38,490 >>Yeah. It turns out that when--this is another fun little tidbit-- 592 00:48:38,490 --> 00:48:47,900 when you run the program, if we run it and we type in CS50 rocks, 593 00:48:47,900 --> 00:48:50,800 just like Daniel was saying he did, and you hit Enter, 594 00:48:50,800 --> 00:48:52,870 it still says we get a D. 595 00:48:52,870 --> 00:48:55,580 And the question is, why is this? 596 00:48:55,580 --> 00:49:02,120 And it turns out that both our terminal and GDB parse these as two separate arguments.
597 00:49:02,120 --> 00:49:04,800 Because when there's a space, that's implied as 598 00:49:04,800 --> 00:49:08,730 the first argument ended; the next argument is about to begin. 599 00:49:08,730 --> 00:49:13,260 The way to combine those into two, or sorry, into one argument, 600 00:49:13,260 --> 00:49:18,510 is to use the quotes. 601 00:49:18,510 --> 00:49:29,560 So now, if we put it in quotes and run it again, we get an A. 602 00:49:29,560 --> 00:49:38,780 So just to recap, no quotes, CS50 and rocks are parsed as two separate arguments. 603 00:49:38,780 --> 00:49:45,320 With quotes, it's parsed as one argument altogether. 604 00:49:45,320 --> 00:49:53,070 >> We can see this with a breakpoint. 605 00:49:53,070 --> 00:49:54,920 So far we've been running our program, and it's been running 606 00:49:54,920 --> 00:49:58,230 until either it seg faults or hits an error 607 00:49:58,230 --> 00:50:05,930 or until it has exited and all has been totally fine. 608 00:50:05,930 --> 00:50:08,360 This isn't necessarily the most helpful thing, because sometimes 609 00:50:08,360 --> 00:50:11,840 you have an error in your program, but it's not causing a segmentation fault. 610 00:50:11,840 --> 00:50:16,950 It's not causing your program to stop or anything like that. 611 00:50:16,950 --> 00:50:20,730 The way to get GDB to pause your program at a particular point 612 00:50:20,730 --> 00:50:23,260 is to set a breakpoint. 613 00:50:23,260 --> 00:50:26,520 You can either do this by setting a breakpoint on a function name 614 00:50:26,520 --> 00:50:30,770 or you can set a breakpoint on a particular line of code. 615 00:50:30,770 --> 00:50:34,450 I like to set breakpoints on function names, because--easy to remember, 616 00:50:34,450 --> 00:50:37,700 and if you actually go in and change your source code up a little bit, 617 00:50:37,700 --> 00:50:42,020 then your breakpoint will actually stay at the same place within your code. 
618 00:50:42,020 --> 00:50:44,760 Whereas if you're using line numbers, and the line numbers change 619 00:50:44,760 --> 00:50:51,740 because you add or delete some code, then your breakpoints are all totally screwed up. 620 00:50:51,740 --> 00:50:58,590 One of the most common things I do is set a breakpoint on the main function. 621 00:50:58,590 --> 00:51:05,300 Often I'll boot up GDB, I'll type b main, hit Enter, and that'll set a breakpoint 622 00:51:05,300 --> 00:51:10,630 on the main function which just says, "Pause the program as soon as you start running," 623 00:51:10,630 --> 00:51:17,960 and that way, when I run my program with, say, CS50 rocks as two arguments 624 00:51:17,960 --> 00:51:24,830 and hit Enter, it gets to the main function and it stops right at the very first line, 625 00:51:24,830 --> 00:51:30,620 right before it evaluates the strcmp function. 626 00:51:30,620 --> 00:51:34,940 >> Since I'm paused, now I can start mucking around and seeing what's going on 627 00:51:34,940 --> 00:51:40,250 with all of the different variables that are passed into my program. 628 00:51:40,250 --> 00:51:43,670 Here I can print out argc and see what's going on. 629 00:51:43,670 --> 00:51:50,030 See that argc is 3, because it's got 3 different values in it. 630 00:51:50,030 --> 00:51:54,060 It's got the name of the program, it's got the first argument and the second argument. 631 00:51:54,060 --> 00:52:09,330 We can print those out by looking at argv[0], argv[1], and argv[2]. 632 00:52:09,330 --> 00:52:12,030 So now you can also see why this strcmp call is going to fail, 633 00:52:12,030 --> 00:52:21,650 because you see that it did split up the CS50 and the rocks into two separate arguments. 634 00:52:21,650 --> 00:52:27,250 At this point, once you've hit a breakpoint, you can continue to step through your program 635 00:52:27,250 --> 00:52:32,920 line by line, as opposed to starting your program again. 
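The argc values seen in GDB follow directly from how the argument list is laid out: argv is an array of strings that ends with a NULL entry, so counting up to that NULL gives argc. The sketch below uses a made-up helper name, count_args, and hand-built arrays standing in for what the shell would pass with and without quotes.

```c
#include <stddef.h>

// Hypothetical helper for illustration: counts the entries in an
// argv-style array by walking forward until the terminating NULL.
int count_args(char *argv[])
{
    int n = 0;
    while (argv[n] != NULL)
    {
        n++;
    }
    return n;
}
```

For an array holding "./buggy1", "CS50", "rocks", and NULL (the unquoted case), the count is 3, matching what p argc printed. For "./buggy1", "CS50 rocks", and NULL (the quoted case), the count is 2, and argv[1] is the single string that strcmp expects.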
636 00:52:32,920 --> 00:52:35,520 So if you don't want to start your program again and just continue on from here, 637 00:52:35,520 --> 00:52:41,970 you can use the continue command and continue will run the program to the end. 638 00:52:41,970 --> 00:52:45,010 Just like it did here. 639 00:52:45,010 --> 00:52:54,880 However, if I restart the program, CS50 rocks, it hits my breakpoint again, 640 00:52:54,880 --> 00:52:59,670 and this time, if I don't want to just go all the way through the rest of the program, 641 00:52:59,670 --> 00:53:08,040 I can use the next command, which I also abbreviate with n. 642 00:53:08,040 --> 00:53:12,960 And this will step through the program line by line. 643 00:53:12,960 --> 00:53:17,530 So you can watch as things execute, as variables change, as things get updated. 644 00:53:17,530 --> 00:53:21,550 Which is pretty nice. 645 00:53:21,550 --> 00:53:26,570 The other cool thing is rather than repeating the same command over and over and over again, 646 00:53:26,570 --> 00:53:30,670 if you just hit Enter--so here you see I haven't typed in anything-- 647 00:53:30,670 --> 00:53:33,780 if I just hit Enter, it will repeat the previous command, 648 00:53:33,780 --> 00:53:36,900 or the previous GDB command that I just put in. 649 00:53:36,900 --> 00:53:56,000 I can keep hitting Enter and it'll keep stepping through my code line by line. 650 00:53:56,000 --> 00:53:59,310 I would encourage you guys to go check out the other buggy programs as well. 651 00:53:59,310 --> 00:54:01,330 We don't have time to get through all of them today in section. 
652 00:54:01,330 --> 00:54:05,890 The source code is there, so you can kind of see what's going on 653 00:54:05,890 --> 00:54:07,730 behind the scenes if you get really stuck, 654 00:54:07,730 --> 00:54:11,940 but at the very least, just practice booting up GDB, 655 00:54:11,940 --> 00:54:13,940 running the program until it breaks on you, 656 00:54:13,940 --> 00:54:18,260 getting the backtrace, figuring out what function the crash was in, 657 00:54:18,260 --> 00:54:24,450 what line it was on, printing out some variable values, 658 00:54:24,450 --> 00:54:30,140 just so you get a feel for it, because that will really help you going forward. 659 00:54:30,140 --> 00:54:36,340 At this point, we're going to quit out of GDB, which you do using quit or just q. 660 00:54:36,340 --> 00:54:40,460 If your program is in the middle of running still, and it hasn't exited, 661 00:54:40,460 --> 00:54:43,510 it will always ask you, "Are you sure you really want to quit?" 662 00:54:43,510 --> 00:54:48,770 You can just hit yes. 663 00:54:48,770 --> 00:54:55,250 >> Now we're going to look at the next problem we have, which is the cat program. 664 00:54:55,250 --> 00:54:59,880 If you watch the short on redirecting and pipes, you'll see that Tommy uses this program 665 00:54:59,880 --> 00:55:07,540 that basically prints all the output of a file to the screen. 666 00:55:07,540 --> 00:55:12,660 So if I run cat, this is actually a built-in program to the appliance, 667 00:55:12,660 --> 00:55:16,860 and if you have Macs you can do this on your Mac too, if you open up terminal. 668 00:55:16,860 --> 00:55:25,630 And we--cat, let's say, cp.c, and hit Enter. 669 00:55:25,630 --> 00:55:29,640 What this did, if we scroll up a little bit and see where we ran the line, 670 00:55:29,640 --> 00:55:40,440 or where we ran the cat command, it literally just printed out the contents of cp.c to our screen. 671 00:55:40,440 --> 00:55:44,140 We can run it again and you can put in multiple files together. 
672 00:55:44,140 --> 00:55:49,880 So you can do cat cp.c, and then we can also concatenate the cat.c file, 673 00:55:49,880 --> 00:55:53,250 which is the program we're about to write, 674 00:55:53,250 --> 00:55:58,140 and it'll print both files back to back to our screen. 675 00:55:58,140 --> 00:56:05,490 So if we scroll up a little bit, we see that when we ran this cat cp.c, cat.c, 676 00:56:05,490 --> 00:56:17,110 first it printed out the cp file, and then below it, it printed out the cat.c file right down here. 677 00:56:17,110 --> 00:56:19,650 We're going to use this to just get our feet wet. 678 00:56:19,650 --> 00:56:25,930 Play around with simple printing to the terminal, see how that works. 679 00:56:25,930 --> 00:56:39,170 If you guys open up with gedit cat.c, hit Enter, 680 00:56:39,170 --> 00:56:43,760 you can see the program that we're about to write. 681 00:56:43,760 --> 00:56:48,980 We've included this nice boiler plate, so we don't have to spend time typing all that out. 682 00:56:48,980 --> 00:56:52,310 We also check the number of arguments passed in. 683 00:56:52,310 --> 00:56:56,910 We print out a nice usage message. 684 00:56:56,910 --> 00:57:00,950 >> This is the sort of thing that, again, like we've been talking about, 685 00:57:00,950 --> 00:57:04,490 it's almost like muscle memory. 686 00:57:04,490 --> 00:57:07,190 Just remember to keep doing the same sort of stuff 687 00:57:07,190 --> 00:57:11,310 and always printing out some sort of helpful message 688 00:57:11,310 --> 00:57:17,670 so that people know how to run your program. 689 00:57:17,670 --> 00:57:21,630 With cat, it's pretty simple; we're just going to go through all of the different arguments 690 00:57:21,630 --> 00:57:24,300 that were passed to our program, and we're going to print 691 00:57:24,300 --> 00:57:29,950 their contents out to the screen one at a time. 
692 00:57:29,950 --> 00:57:35,670 In order to print files out to the screen, we're going to do something very similar 693 00:57:35,670 --> 00:57:38,120 to what we did at the end of the quiz. 694 00:57:38,120 --> 00:57:45,350 At the end of the quiz, that hire program, we had to open up a file, 695 00:57:45,350 --> 00:57:48,490 and then we had to print to it. 696 00:57:48,490 --> 00:57:54,660 In this case, we're going to open up a file, and we're going to read from it instead. 697 00:57:54,660 --> 00:58:00,630 Then we're going to print, instead of to a file, we're going to print to the screen. 698 00:58:00,630 --> 00:58:05,830 So printing to the screen you've all done before with printf. 699 00:58:05,830 --> 00:58:08,290 So that's not too crazy. 700 00:58:08,290 --> 00:58:12,190 But reading a file is kind of weird. 701 00:58:12,190 --> 00:58:17,300 We'll go through that a little bit at a time. 702 00:58:17,300 --> 00:58:20,560 If you guys go back to that last problem on your quiz, problem 33, 703 00:58:20,560 --> 00:58:27,280 the first line that we're going to do here, opening the file, is very similar to what we did there. 704 00:58:27,280 --> 00:58:36,370 So Stella, what does that line look like, when we open a file? 705 00:58:36,370 --> 00:58:47,510 [Stella] Capital FILE*, file-- >>Okay. >>--is equal to fopen. >>Yup. 706 00:58:47,510 --> 00:58:55,980 Which in this case is? It's in the comment. 707 00:58:55,980 --> 00:59:06,930 >>It's in the comment? argv[i] and r? 708 00:59:06,930 --> 00:59:11,300 >>Exactly. Right on. So Stella's totally right. 709 00:59:11,300 --> 00:59:13,720 This is what the line looks like. 710 00:59:13,720 --> 00:59:19,670 We're going to get a file stream variable, store it in a FILE*, so all caps, 711 00:59:19,670 --> 00:59:25,720 F-I-L-E, *, and the name of this variable will be file. 712 00:59:25,720 --> 00:59:32,250 We could call it whatever we like. We could call it first_file, or file_i, whatever we'd like. 
713 00:59:32,250 --> 00:59:37,590 And then the name of the file was passed in on the command line to this program. 714 00:59:37,590 --> 00:59:44,450 So it's stored in argv[i], and then we're going to open this file in read mode. 715 00:59:44,450 --> 00:59:48,100 Now that we've opened the file, what's the thing that we always have to remember to do 716 00:59:48,100 --> 00:59:52,230 whenever we've opened a file? Close it. 717 00:59:52,230 --> 00:59:57,220 So Missy, how do we close a file? 718 00:59:57,220 --> 01:00:01,020 [Missy] fclose(file) >>fclose(file). Exactly. 719 01:00:01,020 --> 01:00:05,340 Great. Okay. If we look at this TODO comment right here, 720 01:00:05,340 --> 01:00:11,940 it says, "Open argv[i] and print its contents to stdout." 721 01:00:11,940 --> 01:00:15,460 >> Standard out is a weird name. Stdout is just our way of saying 722 01:00:15,460 --> 01:00:22,880 we want to print it to the terminal; we want to print it to the standard output stream. 723 01:00:22,880 --> 01:00:26,450 We can actually get rid of this comment right here. 724 01:00:26,450 --> 01:00:36,480 I'm going to copy it and paste it since that's what we did. 725 01:00:36,480 --> 01:00:41,290 At this point, now we have to read the file bit by bit. 726 01:00:41,290 --> 01:00:46,300 We've discussed a couple of ways of reading files. 727 01:00:46,300 --> 01:00:51,830 Which ones are your favorites so far? 728 01:00:51,830 --> 01:00:57,960 Which ways have you seen or do you remember, to read files? 729 01:00:57,960 --> 01:01:04,870 [Daniel] fread? >>fread? So fread is one. Jimmy, do you know any others? 730 01:01:04,870 --> 01:01:12,150 [Jimmy] No. >>Okay. Nope. Charlotte? Alexander? Any others? Okay. 731 01:01:12,150 --> 01:01:20,740 So the other ones are fgetc, is one that we'll use a lot. 732 01:01:20,740 --> 01:01:26,410 There's also fscanf; you guys see a pattern here? 733 01:01:26,410 --> 01:01:29,170 They all begin with f. Anything to do with a file.
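The open-and-close pair that Stella and Missy just described might be sketched roughly as follows. The wrapper function `open_and_close` and the NULL check are additions for illustration; the section itself only spells out the fopen and fclose calls.

```c
#include <stdio.h>

// Hypothetical helper: opens the named file in read mode, then closes it.
// Returns 0 on success and 1 if the file could not be opened.
int open_and_close(const char *filename)
{
    // "r" asks for read mode, as in the fopen(argv[i], "r") call above
    FILE *file = fopen(filename, "r");
    if (file == NULL)
    {
        // fopen returns NULL on failure, e.g. when the file does not exist
        return 1;
    }

    // ... reading from the file would go here ...

    // every successful fopen should be matched by an fclose
    fclose(file);
    return 0;
}
```

Inside cat.c, the filename passed in would be argv[i], so the call would look something like open_and_close(argv[i]).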
734 01:01:29,170 --> 01:01:35,260 There's fread, fgetc, fscanf. These are all of the reading functions. 735 01:01:35,260 --> 01:01:49,120 For writing we have fwrite, we have fputc instead of fgetc. 736 01:01:49,120 --> 01:01:58,250 We also have fprintf like we saw on the quiz. 737 01:01:58,250 --> 01:02:01,680 Since this is a problem that involves reading from a file, 738 01:02:01,680 --> 01:02:04,940 we're going to use one of these three functions. 739 01:02:04,940 --> 01:02:10,890 We're not going to use these functions down here. 740 01:02:10,890 --> 01:02:14,880 These functions are all found in the standard I/O library. 741 01:02:14,880 --> 01:02:17,510 So if you look at the top of this program, 742 01:02:17,510 --> 01:02:24,110 you can see that we've already included the header file for the standard I/O library. 743 01:02:24,110 --> 01:02:27,120 If we want to figure out which one we want to use, 744 01:02:27,120 --> 01:02:29,690 we can always open up the man pages. 745 01:02:29,690 --> 01:02:34,350 So we can type man stdio 746 01:02:34,350 --> 01:02:43,180 and read all about the stdio input and output functions in C. 747 01:02:43,180 --> 01:02:49,870 And we can already see oh, look. It's mentioning fgetc, it's mentioning fputc. 748 01:02:49,870 --> 01:02:57,220 So you can drill down a little bit and look at, say, fgetc 749 01:02:57,220 --> 01:03:00,060 and look at its man page. 750 01:03:00,060 --> 01:03:03,430 You can see that it goes along with a whole bunch of other functions: 751 01:03:03,430 --> 01:03:12,640 fgetc, fgets, getc, getchar, gets, ungetc, and it's input of characters and strings. 752 01:03:12,640 --> 01:03:19,180 So this is how we read in characters and strings from files and from standard input, 753 01:03:19,180 --> 01:03:21,990 which is essentially from the user. 754 01:03:21,990 --> 01:03:24,780 And this is how we do it in actual C.
755 01:03:24,780 --> 01:03:30,850 So this is not using the GetString and GetChar functions 756 01:03:30,850 --> 01:03:36,840 that we used from the CS50 library. 757 01:03:36,840 --> 01:03:39,710 We're going to do this problem in a couple of ways 758 01:03:39,710 --> 01:03:43,430 so that you can see two different ways of doing it. 759 01:03:43,430 --> 01:03:48,490 Both the fread function that Daniel mentioned and fgetc are good ways to do it. 760 01:03:48,490 --> 01:03:53,790 I think fgetc is a little easier, because it only has, as you see, 761 01:03:53,790 --> 01:03:59,660 one argument, the FILE* that we're trying to read the character from, 762 01:03:59,660 --> 01:04:02,740 and its return value is an int. 763 01:04:02,740 --> 01:04:05,610 And this is a little confusing, right? 764 01:04:05,610 --> 01:04:11,450 >> Because we're getting a character, so why doesn't this return a char? 765 01:04:11,450 --> 01:04:18,700 You guys have any ideas on why this might not return a char? 766 01:04:18,700 --> 01:04:25,510 [Missy answers, unintelligible] >>Yeah. So Missy's totally right. 767 01:04:25,510 --> 01:04:31,570 If it's ASCII, then this integer could be mapped to an actual char. 768 01:04:31,570 --> 01:04:33,520 Could be an ASCII character, and that's right. 769 01:04:33,520 --> 01:04:36,220 That's exactly what's happening. 770 01:04:36,220 --> 01:04:39,190 We're using an int simply because it has more bits. 771 01:04:39,190 --> 01:04:44,750 It's bigger than a char; our char only has 8 bits, that 1 byte on our 32-bit machines. 772 01:04:44,750 --> 01:04:48,520 And an int has all 4 bytes' worth of space. 773 01:04:48,520 --> 01:04:50,940 And it turns out that the way fgetc works, 774 01:04:50,940 --> 01:04:53,940 if we scroll down in our synopsis in this man page a little bit, 775 01:04:53,940 --> 01:05:05,000 scroll all the way down. It turns out that they use this special value called EOF. 
776 01:05:05,000 --> 01:05:09,640 It's a special constant as the return value of the fgetc function 777 01:05:09,640 --> 01:05:14,570 whenever you hit the end of the file or if you get an error. 778 01:05:14,570 --> 01:05:18,170 And it turns out that to do these comparisons with EOF properly, 779 01:05:18,170 --> 01:05:24,060 you want to have that extra amount of information that you have in an int 780 01:05:24,060 --> 01:05:28,420 as opposed to using a char variable. 781 01:05:28,420 --> 01:05:32,130 Even though fgetc is effectively getting a character from a file, 782 01:05:32,130 --> 01:05:38,450 you want to remember that it is returning something that's of type int to you. 783 01:05:38,450 --> 01:05:41,360 That said, it's fairly easy to use. 784 01:05:41,360 --> 01:05:44,960 It's going to give us a character; so all we have to do is keep asking the file, 785 01:05:44,960 --> 01:05:48,440 "Give me the next character, give me the next character, give me the next character," 786 01:05:48,440 --> 01:05:51,400 until we get to the end of the file. 787 01:05:51,400 --> 01:05:54,730 And that will pull in one character at a time from our file, 788 01:05:54,730 --> 01:05:56,250 and then we can do whatever we like with it. 789 01:05:56,250 --> 01:06:00,160 We can store it, we can add it to a string, we can print it out. 790 01:06:00,160 --> 01:06:04,630 Do any of that. 791 01:06:04,630 --> 01:06:09,600 >> Zooming back out and going back to our cat.c program, 792 01:06:09,600 --> 01:06:16,170 if we're going to use fgetc, 793 01:06:16,170 --> 01:06:21,710 how might we approach this next line of code? 794 01:06:21,710 --> 01:06:26,020 We're going to use--fread will do something slightly different. 795 01:06:26,020 --> 01:06:32,600 And this time, we're just going to use fgetc to get one character at a time. 796 01:06:32,600 --> 01:06:40,910 To process an entire file, what might we have to do? 797 01:06:40,910 --> 01:06:44,030 How many characters are there in a file? 
798 01:06:44,030 --> 01:06:47,390 There are a lot. So you probably want to get one 799 01:06:47,390 --> 01:06:49,860 and then get another and get another and get another. 800 01:06:49,860 --> 01:06:53,330 What kind of algorithm do you think we might have to use here? 801 01:06:53,330 --> 01:06:55,470 What type of--? [Alexander] A for loop? >>Exactly. 802 01:06:55,470 --> 01:06:57,500 Some type of loop. 803 01:06:57,500 --> 01:07:03,380 A for loop is actually great, in this case. 804 01:07:03,380 --> 01:07:08,620 And like you were saying, it sounds like you want a loop over the entire file, 805 01:07:08,620 --> 01:07:11,820 getting a character at a time. 806 01:07:11,820 --> 01:07:13,850 Any suggestions on what that might look like? 807 01:07:13,850 --> 01:07:22,090 [Alexander, unintelligible] 808 01:07:22,090 --> 01:07:30,050 >>Okay, just tell me in English what you're trying to do? [Alexander, unintelligible] 809 01:07:30,050 --> 01:07:36,270 So in this case, it sounds like we're just trying to loop over the entire file. 810 01:07:36,270 --> 01:07:45,330 [Alexander] So i < the size of int? >>The size of--? 811 01:07:45,330 --> 01:07:49,290 I guess the size of the file, right? The size--we'll just write it like this. 812 01:07:49,290 --> 01:07:57,470 Size of file for the time being, i++. 813 01:07:57,470 --> 01:08:04,610 So it turns out that the way you do this using fgetc, and this is new, 814 01:08:04,610 --> 01:08:10,460 is that there's no easy way to just get the size of a file 815 01:08:10,460 --> 01:08:16,979 with this "sizeof" type of construct that you've seen before. 
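To make the point about int versus char a bit more concrete, here is a small sketch that finds out how many characters a file holds the only way fgetc allows: by reading until EOF. The helper name `count_bytes` is made up for this example. The key detail is that ch is declared as an int. fgetc hands back each byte as a non-negative value and uses the negative constant EOF for "no more data", so a char variable could either mistake a legitimate byte such as 0xFF for EOF or never match EOF at all, depending on whether char is signed on the machine.

```c
#include <stdio.h>

// Hypothetical helper: counts the characters in an already opened stream
// by calling fgetc until it reports EOF.
long count_bytes(FILE *file)
{
    long count = 0;

    // ch is an int, not a char, so that EOF stays distinct from every
    // byte value that can legitimately appear in the file
    for (int ch = fgetc(file); ch != EOF; ch = fgetc(file))
    {
        count++;
    }
    return count;
}
```

Note that sizeof(file) would not help here: it reports the size of the FILE* pointer variable, not the length of the file on disk.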
816 01:08:16,979 --> 01:08:20,910 When we use that fgetc function, we're introducing some kind of 817 01:08:20,910 --> 01:08:29,069 new, funky syntax to this for loop, where instead of using just a basic counter 818 01:08:29,069 --> 01:08:33,920 to go character by character, we're going to pull one character at a time, 819 01:08:33,920 --> 01:08:37,120 one character at a time, and the way we know we're at the end 820 01:08:37,120 --> 01:08:41,290 is not when we've counted a certain number of characters, 821 01:08:41,290 --> 01:08:49,939 but when the character we pull out is that special end-of-file value. 822 01:08:49,939 --> 01:08:58,689 So we can do this by--I call this ch, and we're going to initialize it 823 01:08:58,689 --> 01:09:08,050 with our first call to get the first character out of the file. 824 01:09:08,050 --> 01:09:14,979 So this part right here, this is going to get a character out of the file 825 01:09:14,979 --> 01:09:20,840 and store it into the variable ch. 826 01:09:20,840 --> 01:09:25,420 We're going to keep doing this until we get to the end of the file, 827 01:09:25,420 --> 01:09:41,170 which we do by testing for the character not being equal to that special EOF value. 828 01:09:41,170 --> 01:09:48,750 And then instead of doing ch++, which would just increment the value, 829 01:09:48,750 --> 01:09:52,710 so if we read an A out of the file, a capital A, say, 830 01:09:52,710 --> 01:09:56,810 ch++ would give us B, and then we'd get C and then D. 831 01:09:56,810 --> 01:09:59,310 That's clearly not what we want. What we want here 832 01:09:59,310 --> 01:10:05,830 in this last bit is we want to get the next character from the file. 833 01:10:05,830 --> 01:10:09,500 >> So how might we get the next character from the file? 834 01:10:09,500 --> 01:10:13,470 How do we get the first character from the file? 835 01:10:13,470 --> 01:10:17,200 [Student] fgetfile? >>fgetc, or, sorry, you were totally right.
836 01:10:17,200 --> 01:10:20,470 I misspelled it right there. So yeah. 837 01:10:20,470 --> 01:10:26,240 Here instead of doing ch++, 838 01:10:26,240 --> 01:10:29,560 we're just going to call fgetc(file) again 839 01:10:29,560 --> 01:10:39,180 and store the result in our same ch variable. 840 01:10:39,180 --> 01:10:43,730 [Student question, unintelligible] 841 01:10:43,730 --> 01:10:52,390 >>This is where these FILE* guys are special. 842 01:10:52,390 --> 01:10:59,070 The way they work is they--when you first open--when you first make that fopen call, 843 01:10:59,070 --> 01:11:04,260 the FILE* effectively serves as a pointer to the beginning of the file. 844 01:11:04,260 --> 01:11:12,830 And then every time you call fgetc, it moves one character through the file. 845 01:11:12,830 --> 01:11:23,280 So whenever you call this, you're incrementing the file pointer by one character. 846 01:11:23,280 --> 01:11:26,210 And when you fgetc again, you're moving it another character 847 01:11:26,210 --> 01:11:28,910 and another character and another character and another character. 848 01:11:28,910 --> 01:11:32,030 [Student question, unintelligible] >>And that's--yeah. 849 01:11:32,030 --> 01:11:34,810 It's kind of this magic under the hood. 850 01:11:34,810 --> 01:11:37,930 You just keep incrementing through. 851 01:11:37,930 --> 01:11:46,510 At this point, you're able to actually work with a character. 852 01:11:46,510 --> 01:11:52,150 So how might we print this out to the screen, now? 853 01:11:52,150 --> 01:11:58,340 We can use the same printf thing that we used before. 854 01:11:58,340 --> 01:12:00,330 That we've been using all semester. 855 01:12:00,330 --> 01:12:05,450 We can call printf, 856 01:12:05,450 --> 01:12:21,300 and we can pass in the character just like that. 857 01:12:21,300 --> 01:12:27,430 Another way to do it is rather than using printf and having to do this format string, 858 01:12:27,430 --> 01:12:29,490 we can also use one of the other functions. 
859 01:12:29,490 --> 01:12:40,090 We can use fputc, which prints a character to the screen, 860 01:12:40,090 --> 01:12:52,580 except if we look at fputc--let me zoom out a little bit. 861 01:12:52,580 --> 01:12:56,430 We see what's nice is it takes in the character that we read out using fgetc, 862 01:12:56,430 --> 01:13:05,100 but then we have to give it a stream to print to. 863 01:13:05,100 --> 01:13:11,850 We can also use the putchar function, which will put directly to standard out. 864 01:13:11,850 --> 01:13:16,070 So there are a whole bunch of different options that we can use for printing. 865 01:13:16,070 --> 01:13:19,580 They're all in the standard I/O library. 866 01:13:19,580 --> 01:13:25,150 Whenever you want to print--so printf, by default, will print to the special standard out stream, 867 01:13:25,150 --> 01:13:27,910 which is that stdout. 868 01:13:27,910 --> 01:13:41,300 So we can just refer to it as kind of this magic value, stdout in here. 869 01:13:41,300 --> 01:13:48,410 Oops. Put the semicolon outside. 870 01:13:48,410 --> 01:13:52,790 >> This is a lot of new, funky information in here. 871 01:13:52,790 --> 01:13:58,600 A lot of this is very idiomatic, in the sense that this is code 872 01:13:58,600 --> 01:14:05,700 that is written this way just because it's clean to read, easy to read. 873 01:14:05,700 --> 01:14:11,520 There are many different ways to do it, many different functions you can use, 874 01:14:11,520 --> 01:14:14,680 but we tend to just follow these same patterns over and over. 875 01:14:14,680 --> 01:14:20,180 So don't be surprised if you see code like this coming up again and again. 876 01:14:20,180 --> 01:14:25,690 All right. At this point, we need to break for the day. 877 01:14:25,690 --> 01:14:31,300 Thanks for coming. Thanks for watching if you're online. And we'll see you next week. 878 01:14:31,300 --> 01:14:33,890 [CS50.TV]
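Putting the pieces from this section together, the body of the loop over the arguments might end up looking roughly like the sketch below. The exact code typed on screen is not captured in the transcript, so the function name `cat_file`, its `out` parameter, and the NULL check are assumptions; the fopen call, the for loop built around fgetc, the fputc call, and the fclose call follow what was described above.

```c
#include <stdio.h>

// Hypothetical helper: copies the named file to the stream `out`, one
// character at a time. Returns 0 on success, 1 if the file can't be opened.
int cat_file(const char *filename, FILE *out)
{
    FILE *file = fopen(filename, "r");
    if (file == NULL)
    {
        return 1;
    }

    // Each call to fgetc returns the next character and advances the
    // stream's position, so no counter is needed; we stop at EOF.
    for (int ch = fgetc(file); ch != EOF; ch = fgetc(file))
    {
        // With out == stdout this behaves like putchar(ch)
        // or printf("%c", ch)
        fputc(ch, out);
    }

    fclose(file);
    return 0;
}
```

From main, the call for each argument would be something like cat_file(argv[i], stdout). Taking the output stream as a parameter is just a convenience for this sketch, since it lets the same function write to a file instead of the terminal.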