1 00:00:00,000 --> 00:00:01,856 2 00:00:01,856 --> 00:00:03,850 CARTER ZENKE: OK, well, hello one and all 3 00:00:03,850 --> 00:00:07,430 and welcome to CS50's fourth section this week. 4 00:00:07,430 --> 00:00:11,260 My name is Carter Zenke, and I am the course's preceptor here on campus. 5 00:00:11,260 --> 00:00:14,350 And this week, week 4, we learned about memory 6 00:00:14,350 --> 00:00:17,830 and how our computer actually stores things in bits and bytes 7 00:00:17,830 --> 00:00:20,920 across its own landscape of bits and bytes. 8 00:00:20,920 --> 00:00:23,920 And so our goal today will be to talk about a few different topics, most 9 00:00:23,920 --> 00:00:26,860 of them from lecture, and to give you the chance to ask the questions 10 00:00:26,860 --> 00:00:30,950 you want to ask so you can prepare yourself for this week's problem set. 11 00:00:30,950 --> 00:00:34,700 So among these topics are this idea of pointers 12 00:00:34,700 --> 00:00:36,590 and why we would even use them. 13 00:00:36,590 --> 00:00:39,250 So as we saw in lecture, a pointer is a way 14 00:00:39,250 --> 00:00:44,710 of storing some address in a computer's memory, kind of similar to a variable. 15 00:00:44,710 --> 00:00:49,610 We'll also talk about how we can read and write data to files. 16 00:00:49,610 --> 00:00:53,650 So up until now, you've been writing programs that maybe take user input 17 00:00:53,650 --> 00:00:55,750 and print something back to the screen. 18 00:00:55,750 --> 00:00:58,910 By the end of this section and by the end of this week, 19 00:00:58,910 --> 00:01:02,810 you'll be able to write programs that actually open up files and can actually 20 00:01:02,810 --> 00:01:06,380 write data to files, getting more advanced along the way. 21 00:01:06,380 --> 00:01:11,490 And all of this will hopefully prepare you for problem set 4 at the very end. 22 00:01:11,490 --> 00:01:14,090 So I'm excited to dive into these topics today. 
23 00:01:14,090 --> 00:01:17,820 Now, the first topic will be this idea of pointers. 24 00:01:17,820 --> 00:01:20,123 And they are admittedly a little bit scary. 25 00:01:20,123 --> 00:01:22,790 I've seen a lot of fear around pointers when we talk about them. 26 00:01:22,790 --> 00:01:26,120 I think they're scary because they introduce some new syntax that 27 00:01:26,120 --> 00:01:27,780 doesn't feel quite familiar yet. 28 00:01:27,780 --> 00:01:30,900 And so the goal is to help you become familiar with that syntax 29 00:01:30,900 --> 00:01:34,560 so you can actually leave feeling a little bit less scared of pointers 30 00:01:34,560 --> 00:01:37,980 and more empowered to use them at the end of the day. 31 00:01:37,980 --> 00:01:41,540 So I think one thing that makes pointers less scary 32 00:01:41,540 --> 00:01:44,210 is this idea that they really tie closely 33 00:01:44,210 --> 00:01:50,250 to an idea you already know about, and that is this idea of a variable. 34 00:01:50,250 --> 00:01:53,630 So if you remember back in, let's say, week 1 or so, we 35 00:01:53,630 --> 00:01:56,160 talked about this idea of a variable. 36 00:01:56,160 --> 00:01:58,370 And in this section, we had this idea of trying 37 00:01:58,370 --> 00:02:01,340 to make a contact application to store, let's say, 38 00:02:01,340 --> 00:02:04,700 the number of times we had called somebody on the phone. 39 00:02:04,700 --> 00:02:08,360 And so we had this variable named calls, and this 40 00:02:08,360 --> 00:02:10,070 was our visual of that variable. 41 00:02:10,070 --> 00:02:14,540 We had a box that had some value in it, like the number 3, 42 00:02:14,540 --> 00:02:17,630 and that box had a name called calls. 43 00:02:17,630 --> 00:02:20,190 And we said that a variable was exactly this. 44 00:02:20,190 --> 00:02:24,240 It is a name for some value that can change. 45 00:02:24,240 --> 00:02:26,390 That is, the value of that variable could change. 
46 00:02:26,390 --> 00:02:28,590 Could be three calls right now. 47 00:02:28,590 --> 00:02:32,390 But when I call somebody new, it could be then, later on, four calls. 48 00:02:32,390 --> 00:02:34,940 So some value that can change. 49 00:02:34,940 --> 00:02:39,410 What we didn't see in week 1, which we'll now see this week, 50 00:02:39,410 --> 00:02:43,663 is that this value has to be stored somewhere in our computer. 51 00:02:43,663 --> 00:02:45,080 It can't just be out in the ether. 52 00:02:45,080 --> 00:02:48,710 It has to have some actual physical location among all the bits 53 00:02:48,710 --> 00:02:53,030 and bytes inside of our computer, and that is what we called an address. 54 00:02:53,030 --> 00:02:56,540 This value has some address inside of our computer. 55 00:02:56,540 --> 00:03:01,970 And so in lecture, we saw this grid of addresses of bits and bytes, 56 00:03:01,970 --> 00:03:03,590 kind of like a yellow grid. 57 00:03:03,590 --> 00:03:06,632 We could store number 50, a pointer, and so on. 58 00:03:06,632 --> 00:03:08,840 I want to give you one other visual with which to think 59 00:03:08,840 --> 00:03:11,220 about pointers and addresses too. 60 00:03:11,220 --> 00:03:15,680 So you could think about it, as well, as kind of like this table here, 61 00:03:15,680 --> 00:03:20,870 where your computer is keeping track of both the addresses of values 62 00:03:20,870 --> 00:03:22,620 and the values themselves. 63 00:03:22,620 --> 00:03:27,320 That is, what is the value I'm storing, and where, among all the possible bits 64 00:03:27,320 --> 00:03:29,900 and bytes, am I storing that value? 65 00:03:29,900 --> 00:03:35,815 And so notice here that we have addresses that begin with 0x, 0x. 66 00:03:35,815 --> 00:03:38,310 And I'm curious, as a question for you all, 67 00:03:38,310 --> 00:03:43,070 what do you remember 0x meaning or signifying? 68 00:03:43,070 --> 00:03:49,150 Why do we always append or really prepend this 0x? 
69 00:03:49,150 --> 00:03:52,990 So I'm seeing some right answers here, which is that this means hexadecimal. 70 00:03:52,990 --> 00:03:57,550 So whatever comes after 0x is a hexadecimal value, which 71 00:03:57,550 --> 00:04:00,290 is to say it's a base-16 value. 72 00:04:00,290 --> 00:04:04,090 So you're probably used to the base-10 system, like decimal. 73 00:04:04,090 --> 00:04:08,230 You've seen, in this class, the base-2 system, like binary. 74 00:04:08,230 --> 00:04:10,690 And now we're introducing a new one called 75 00:04:10,690 --> 00:04:15,880 base 16, which is just handy because we can count up to larger numbers 76 00:04:15,880 --> 00:04:18,100 without using so many digits. 77 00:04:18,100 --> 00:04:21,519 Like if we're talking about the many billions of bits that are inside 78 00:04:21,519 --> 00:04:23,920 of a computer, it's worth representing them in a way 79 00:04:23,920 --> 00:04:27,190 we can count pretty high with some fewer number of actual digits 80 00:04:27,190 --> 00:04:30,190 we'll see as part of our number we're writing out. 81 00:04:30,190 --> 00:04:32,830 So that's the background for this address here. 82 00:04:32,830 --> 00:04:35,590 But to keep going with this idea, our computer 83 00:04:35,590 --> 00:04:38,450 wants to store some value somewhere in memory. 84 00:04:38,450 --> 00:04:41,440 And so for that calls variable we saw earlier, well, 85 00:04:41,440 --> 00:04:43,840 that could be stored at this address right here. 86 00:04:43,840 --> 00:04:45,310 0x-- let's see-- 87 00:04:45,310 --> 00:04:47,980 50-- a 5, 000-- 88 00:04:47,980 --> 00:04:49,140 5 and seven 0's. 89 00:04:49,140 --> 00:04:50,960 So 0x5 and seven 0's. 90 00:04:50,960 --> 00:04:55,550 Somewhere-- doesn't quite matter-- it has some address in our computer. 91 00:04:55,550 --> 00:04:59,510 And the way we put it there was this C syntax on the right. 
92 00:04:59,510 --> 00:05:03,980 We simply said, we're going to create some new integer whose name is 93 00:05:03,980 --> 00:05:06,620 calls that will get the value 3. 94 00:05:06,620 --> 00:05:09,570 And we're going to store it somewhere in memory. 95 00:05:09,570 --> 00:05:15,230 Now, a pointer is this exact same idea, but the only difference 96 00:05:15,230 --> 00:05:19,580 is that we're not going to store an integer or a character 97 00:05:19,580 --> 00:05:21,500 or really a string or anything else. 98 00:05:21,500 --> 00:05:24,930 We're going to store, instead, an address itself. 99 00:05:24,930 --> 00:05:27,920 So here's what we would say if we're going to make a pointer. 100 00:05:27,920 --> 00:05:33,800 If we go here, notice how on the next table row, 101 00:05:33,800 --> 00:05:37,070 I'm not storing an integer or a character. 102 00:05:37,070 --> 00:05:42,270 I'm storing an address itself at some address inside the computer. 103 00:05:42,270 --> 00:05:44,570 And notice in particular that this address 104 00:05:44,570 --> 00:05:49,160 I'm storing, 0x5 and then seven 0's afterwards, 105 00:05:49,160 --> 00:05:54,330 that seems to reference some other address in my computer's memory. 106 00:05:54,330 --> 00:05:56,550 That is the place where the 3 is stored. 107 00:05:56,550 --> 00:06:01,370 So we could say that this value, 0x5 and seven 0's, 108 00:06:01,370 --> 00:06:05,780 that is pointing to this other address where the 3 is stored, 109 00:06:05,780 --> 00:06:08,700 and, therefore, it is a pointer. 110 00:06:08,700 --> 00:06:12,590 So the only difference here is we're no longer storing integers or characters 111 00:06:12,590 --> 00:06:13,220 or so on. 112 00:06:13,220 --> 00:06:18,080 We're now just storing addresses at some place inside our computer. 
113 00:06:18,080 --> 00:06:25,350 So let me ask, what questions do we have on this so far, on this mental model 114 00:06:25,350 --> 00:06:29,190 or on this idea of now storing addresses and not just actual 115 00:06:29,190 --> 00:06:31,800 values like numbers or characters? 116 00:06:31,800 --> 00:06:35,670 117 00:06:35,670 --> 00:06:36,890 OK. 118 00:06:36,890 --> 00:06:38,552 Not seeing any so far. 119 00:06:38,552 --> 00:06:40,260 Actually, let me ask you a question here. 120 00:06:40,260 --> 00:06:42,950 How can we dereference this pointer? 121 00:06:42,950 --> 00:06:45,180 It's a good vocabulary word we learned in lecture, 122 00:06:45,180 --> 00:06:49,380 which is to follow a pointer to the place it is pointing to. 123 00:06:49,380 --> 00:06:51,500 So let's begin with a bit of syntax here. 124 00:06:51,500 --> 00:06:54,990 And here, if I show you this in full screen, 125 00:06:54,990 --> 00:06:59,060 you'll see that this is the syntax we could use to do exactly what we 126 00:06:59,060 --> 00:07:01,070 did on the table on the left. 127 00:07:01,070 --> 00:07:05,300 Let's create now a pointer whose name is p, 128 00:07:05,300 --> 00:07:11,420 and we're going to store the address of this variable named calls. 129 00:07:11,420 --> 00:07:13,430 So notice a few pieces of syntax here. 130 00:07:13,430 --> 00:07:18,410 We have int *, which means this is no longer an integer, 131 00:07:18,410 --> 00:07:20,990 but it is a pointer to an integer. 132 00:07:20,990 --> 00:07:24,070 It is going to direct us-- if we dereference this pointer, 133 00:07:24,070 --> 00:07:25,820 we're going to get-- we're going to end up 134 00:07:25,820 --> 00:07:28,490 at an integer in our computer's memory. 135 00:07:28,490 --> 00:07:32,000 And notice, too, this ampersand, which means 136 00:07:32,000 --> 00:07:37,260 we should store the address of calls, that is, the place where calls itself is stored. 
137 00:07:37,260 --> 00:07:40,950 In which case, we see in the table here, that is 0x5 138 00:07:40,950 --> 00:07:43,060 and then seven 0's afterwards. 139 00:07:43,060 --> 00:07:46,890 So the data type of p here is a pointer, but it 140 00:07:46,890 --> 00:07:51,950 is a pointer to an integer and not a character, for instance. 141 00:07:51,950 --> 00:07:56,168 So to highlight here as well, let's try setting this up 142 00:07:56,168 --> 00:07:57,710 and visualizing it a little bit more. 143 00:07:57,710 --> 00:08:01,540 So all we're doing is saying the same thing we did for a variable-- 144 00:08:01,540 --> 00:08:07,600 that p, this value named p, should store this particular address now. 145 00:08:07,600 --> 00:08:10,610 And I think it's worth breaking down the syntax on the left 146 00:08:10,610 --> 00:08:14,690 so we actually see it, and you can use it later on in your own programs. 147 00:08:14,690 --> 00:08:20,440 So one common point of confusion here is that this variable, this pointer, 148 00:08:20,440 --> 00:08:22,810 is not called star p. 149 00:08:22,810 --> 00:08:24,160 That is not its name. 150 00:08:24,160 --> 00:08:26,110 In fact, it is just called p. 151 00:08:26,110 --> 00:08:30,178 But it is confusing though because we see int *p, 152 00:08:30,178 --> 00:08:31,720 like the star is right next to the p. 153 00:08:31,720 --> 00:08:33,640 And so we kind of assume, well, this pointer 154 00:08:33,640 --> 00:08:35,780 is just going to be called star p later on. 155 00:08:35,780 --> 00:08:38,080 But in fact, it's just a syntactic trick. 156 00:08:38,080 --> 00:08:45,730 All we're doing here is saying this is a pointer named p, and its type is int *, 157 00:08:45,730 --> 00:08:48,050 or a pointer to an integer. 
158 00:08:48,050 --> 00:08:52,880 So when you see this star being used to initialize or declare some value, 159 00:08:52,880 --> 00:08:55,850 be careful here and think, what is the name 160 00:08:55,850 --> 00:08:58,790 of the variable that I'm working with or the pointer that I have, 161 00:08:58,790 --> 00:08:59,903 and what is its type? 162 00:08:59,903 --> 00:09:01,820 Because it might not be quite apparent to you, 163 00:09:01,820 --> 00:09:05,240 if you're new, that you might say, OK, well, the star is next to the p. 164 00:09:05,240 --> 00:09:06,570 Maybe that's its name. 165 00:09:06,570 --> 00:09:08,390 It's not. p is its name. 166 00:09:08,390 --> 00:09:10,160 int * is its type. 167 00:09:10,160 --> 00:09:12,890 Now, it's more obvious on the right-hand side 168 00:09:12,890 --> 00:09:15,710 here that the value is simply the address 169 00:09:15,710 --> 00:09:18,270 of calls using the ampersand here. 170 00:09:18,270 --> 00:09:22,700 And as we saw in lecture, I like to think of this as ampersand for address. 171 00:09:22,700 --> 00:09:26,710 Both begin with A in this case. 172 00:09:26,710 --> 00:09:30,270 And a question here is, could we actually modify this syntax and say, 173 00:09:30,270 --> 00:09:31,500 maybe, int* p? 174 00:09:31,500 --> 00:09:34,380 175 00:09:34,380 --> 00:09:37,150 That's absolutely possible, as we saw in lecture. 176 00:09:37,150 --> 00:09:41,670 We could also have int * p, which would be maybe 177 00:09:41,670 --> 00:09:44,680 a little more confusing in my opinion, but that would work as well. 178 00:09:44,680 --> 00:09:46,530 So white space doesn't quite matter. 179 00:09:46,530 --> 00:09:49,570 But by convention, we tend to use this syntax 180 00:09:49,570 --> 00:09:55,200 you're going to see here, int * followed by the variable name. 
181 00:09:55,200 --> 00:09:57,420 Now, a question here too is, what if we wanted 182 00:09:57,420 --> 00:10:02,310 to store not just an integer or a pointer to an integer, but a pointer 183 00:10:02,310 --> 00:10:06,360 to a character, which, as we saw, is kind of like what a string would be? 184 00:10:06,360 --> 00:10:10,440 Well, in that case, we'd simply replace the int part of this syntax 185 00:10:10,440 --> 00:10:12,330 with a char or "kayr"-- 186 00:10:12,330 --> 00:10:18,160 C-H-A-R. That would then store a pointer to a character. 187 00:10:18,160 --> 00:10:20,590 And the follow-on question is, well, if we 188 00:10:20,590 --> 00:10:24,280 can store a pointer to an integer, a pointer to a character, 189 00:10:24,280 --> 00:10:26,530 could we have a pointer to a pointer? 190 00:10:26,530 --> 00:10:27,400 You could. 191 00:10:27,400 --> 00:10:29,710 There's nothing stopping you. 192 00:10:29,710 --> 00:10:32,710 You can have a pointer to a pointer to a pointer and so on and so forth. 193 00:10:32,710 --> 00:10:35,418 But at that point, I'd say we're getting a little bit ridiculous. 194 00:10:35,418 --> 00:10:37,930 We can pause and say, what are we actually doing here? 195 00:10:37,930 --> 00:10:40,150 But you could possibly do it. 196 00:10:40,150 --> 00:10:44,920 So to recap here, some key syntax to study, to practice with, 197 00:10:44,920 --> 00:10:46,760 is going to be the following. 198 00:10:46,760 --> 00:10:49,210 You could say the type *-- 199 00:10:49,210 --> 00:10:51,190 just get it in your head-- is a pointer that 200 00:10:51,190 --> 00:10:54,400 stores the address of a certain type. 201 00:10:54,400 --> 00:10:59,470 But then later on, as we'll see, if I want to dereference that value-- 202 00:10:59,470 --> 00:11:02,980 that is, get the value that that pointer is pointing to-- 203 00:11:02,980 --> 00:11:05,570 I would use the star in a different way. 204 00:11:05,570 --> 00:11:09,580 I would say star and then the name of that pointer. 
205 00:11:09,580 --> 00:11:14,800 So *x, for example, takes a pointer named x and gets the value that is 206 00:11:14,800 --> 00:11:17,110 stored at that address. 207 00:11:17,110 --> 00:11:22,910 As we just discussed here, &x takes x and gets its address. 208 00:11:22,910 --> 00:11:28,040 Ampersand for address, A for A in this case. 209 00:11:28,040 --> 00:11:31,660 So questions on this syntax here? 210 00:11:31,660 --> 00:11:35,270 211 00:11:35,270 --> 00:11:38,160 Question about how pointers work with arrays. 212 00:11:38,160 --> 00:11:42,620 So we'll see this probably a little bit more as you work on the problem set 213 00:11:42,620 --> 00:11:44,720 and go on with the course. 214 00:11:44,720 --> 00:11:48,290 But you could think of a pointer as pointing 215 00:11:48,290 --> 00:11:51,590 to the beginning of some chunk of memory. 216 00:11:51,590 --> 00:11:53,720 In the previous example, we saw a pointer 217 00:11:53,720 --> 00:11:58,820 pointing to one particular value like one integer or one character. 218 00:11:58,820 --> 00:12:01,310 But there's nothing stopping us from having 219 00:12:01,310 --> 00:12:05,990 a pointer that points to an array of values or a list of values all back 220 00:12:05,990 --> 00:12:08,220 to back to back in a computer's memory. 221 00:12:08,220 --> 00:12:13,140 So we can have pointers to arrays. 222 00:12:13,140 --> 00:12:14,280 Other questions too? 223 00:12:14,280 --> 00:12:19,630 224 00:12:19,630 --> 00:12:23,470 OK, so I think one question that comes to my mind 225 00:12:23,470 --> 00:12:27,230 when I first see this idea of pointers is, that's all fine and good. 226 00:12:27,230 --> 00:12:28,000 I love pointers. 227 00:12:28,000 --> 00:12:28,780 I love addresses. 228 00:12:28,780 --> 00:12:31,420 But why would I ever use them, because at this point, 229 00:12:31,420 --> 00:12:33,430 it seems kind of confusing? 
230 00:12:33,430 --> 00:12:38,530 Why would I use this syntax versus the one I already know, and why is it worth 231 00:12:38,530 --> 00:12:40,930 getting into the weeds about all these addresses 232 00:12:40,930 --> 00:12:43,120 when I can just store things wherever? 233 00:12:43,120 --> 00:12:44,530 I don't quite care. 234 00:12:44,530 --> 00:12:48,470 Well, there are a few reasons you might want to use pointers, 235 00:12:48,470 --> 00:12:52,910 and pointers can enable you to do things you actually couldn't do before. 236 00:12:52,910 --> 00:12:56,110 So among them are these things that you can now do with pointers. 237 00:12:56,110 --> 00:13:00,490 You could now write a function that allows you to pass something 238 00:13:00,490 --> 00:13:04,360 in by reference and not just by copy. 239 00:13:04,360 --> 00:13:09,820 So that is to say, I could have a function that modifies some value right 240 00:13:09,820 --> 00:13:11,170 where it currently is. 241 00:13:11,170 --> 00:13:14,650 And this is helpful more so as you will see next week when 242 00:13:14,650 --> 00:13:17,320 we make these very big data structures. 243 00:13:17,320 --> 00:13:21,710 I could write functions that modify those data structures without copying 244 00:13:21,710 --> 00:13:23,670 all of them somewhere else. 245 00:13:23,670 --> 00:13:26,270 And perhaps more resonant probably this week 246 00:13:26,270 --> 00:13:29,840 is this one, which is you can use what we call dynamic memory. 247 00:13:29,840 --> 00:13:32,540 That is, up until now, you've been writing programs 248 00:13:32,540 --> 00:13:35,305 that use some fixed amount of memory. 249 00:13:35,305 --> 00:13:36,680 The user might type something in. 250 00:13:36,680 --> 00:13:39,870 Maybe it uses maybe 5 bytes or 10 bytes or so on. 251 00:13:39,870 --> 00:13:44,600 But you could now, as the program runs, actually ask your program 252 00:13:44,600 --> 00:13:48,300 for more and more memory depending on what the user actually needs. 
253 00:13:48,300 --> 00:13:52,170 And you can use pointers to manage the memory for you. 254 00:13:52,170 --> 00:13:55,140 So more again on that in some future weeks. 255 00:13:55,140 --> 00:13:57,800 But pointers here are setting up these additional capabilities 256 00:13:57,800 --> 00:14:01,650 that you can use as we go through the course as well. 257 00:14:01,650 --> 00:14:04,370 So I hope you find at least some of this exciting. 258 00:14:04,370 --> 00:14:07,550 And I want to focus in particular on what we can actually 259 00:14:07,550 --> 00:14:08,640 do because of this. 260 00:14:08,640 --> 00:14:11,660 So in this first step, it's like, OK, great, 261 00:14:11,660 --> 00:14:15,050 I can write functions that pass things by reference and not by copy. 262 00:14:15,050 --> 00:14:17,720 But what does that get me ultimately? 263 00:14:17,720 --> 00:14:20,300 Well, ultimately, if you write functions that 264 00:14:20,300 --> 00:14:23,930 allow you to pass values by reference, not just by copy, 265 00:14:23,930 --> 00:14:27,410 you can write code that's ultimately cleaner as a result. 266 00:14:27,410 --> 00:14:31,400 And as I said before, you can also now scale your usage of memory 267 00:14:31,400 --> 00:14:35,090 in your application by using pointers to actually manage that memory 268 00:14:35,090 --> 00:14:40,350 and make your programs all the more efficient too. 269 00:14:40,350 --> 00:14:44,190 All right, so let's keep going here and show 270 00:14:44,190 --> 00:14:47,940 a bit of a visual example of what it means now to pass things 271 00:14:47,940 --> 00:14:51,870 by reference and not just by copy. 272 00:14:51,870 --> 00:14:57,450 So we saw in lecture we had this idea of trying to swap two values around 273 00:14:57,450 --> 00:15:00,250 and trying to write a function to do that. 
274 00:15:00,250 --> 00:15:04,800 So let's revisit that and get a visual example here in our slides 275 00:15:04,800 --> 00:15:09,240 so we understand exactly what it means to now pass things by copy as opposed 276 00:15:09,240 --> 00:15:12,000 to passing things by reference. 277 00:15:12,000 --> 00:15:16,900 So you could imagine here, we have some code on the left-hand side 278 00:15:16,900 --> 00:15:20,980 and what we call our call stack on the right-hand side here. 279 00:15:20,980 --> 00:15:23,820 So you could imagine we have already called 280 00:15:23,820 --> 00:15:26,610 main, the main function of our program. 281 00:15:26,610 --> 00:15:32,820 And we've set two variables-- one equal to 10, called a, and one equal to 50, 282 00:15:32,820 --> 00:15:34,340 called b. 283 00:15:34,340 --> 00:15:39,510 And in our first implementation of swap, we had the following code 284 00:15:39,510 --> 00:15:44,140 that swap would take an integer named a, an integer named b, 285 00:15:44,140 --> 00:15:49,510 and it would swap them around, trying to store the value of b in the place of a 286 00:15:49,510 --> 00:15:52,820 and the value of a in the place of b. 287 00:15:52,820 --> 00:15:56,200 So, visually, it looks a bit like this, where I have main running. 288 00:15:56,200 --> 00:16:02,380 But now as soon as I call or use swap, I'm able to have another function call. 289 00:16:02,380 --> 00:16:06,090 I'm copying down those values from main. 290 00:16:06,090 --> 00:16:11,170 So to try animation one more time, we had main running, but now we call swap. 291 00:16:11,170 --> 00:16:13,815 We pass in these values a and b. 292 00:16:13,815 --> 00:16:16,050 We pass them by copy. 293 00:16:16,050 --> 00:16:21,380 This space that swap has gets a copy of the values for a and b. 294 00:16:21,380 --> 00:16:24,460 And then, of course, within swap, we do swap them. 
295 00:16:24,460 --> 00:16:26,610 So the value of a gets 50-- 296 00:16:26,610 --> 00:16:31,960 or the variable a gets 50, and the variable b now gets 10. 297 00:16:31,960 --> 00:16:33,760 But what's the problem? 298 00:16:33,760 --> 00:16:41,060 Like if I finish now running swap, what do you notice? 299 00:16:41,060 --> 00:16:45,080 They seem to be swapped in swap, but what hasn't happened, actually? 300 00:16:45,080 --> 00:16:47,760 301 00:16:47,760 --> 00:16:48,260 Yeah. 302 00:16:48,260 --> 00:16:50,280 So in main, they're still the same. 303 00:16:50,280 --> 00:16:55,580 I see that a is still equal to 10, and b is still equal to 50. 304 00:16:55,580 --> 00:16:59,690 So if we call a function without using pointers, 305 00:16:59,690 --> 00:17:04,109 we're essentially saying, let's go ahead and take a copy of these values, 306 00:17:04,109 --> 00:17:07,020 put them somewhere else, and do something to them. 307 00:17:07,020 --> 00:17:09,770 We won't actually impact the variables we 308 00:17:09,770 --> 00:17:14,390 were hoping to impact when we set them or use them in main. 309 00:17:14,390 --> 00:17:18,079 So once swap finishes, we're still left here 310 00:17:18,079 --> 00:17:21,530 with the same values in the same places. 311 00:17:21,530 --> 00:17:26,079 So what do we do to fix this in lecture? 312 00:17:26,079 --> 00:17:27,999 What do we do to fix this problem? 313 00:17:27,999 --> 00:17:33,190 314 00:17:33,190 --> 00:17:36,040 Any ideas? 315 00:17:36,040 --> 00:17:38,540 So we did end up using pointers. 316 00:17:38,540 --> 00:17:41,800 And what we did in particular is we didn't 317 00:17:41,800 --> 00:17:45,385 copy in the actual values of a and b. 318 00:17:45,385 --> 00:17:48,430 We copied their locations or their addresses-- 319 00:17:48,430 --> 00:17:52,730 that is, the pointer to a and the pointer to b. 320 00:17:52,730 --> 00:17:58,760 So another option here is to rewrite swap so that it does this instead. 
321 00:17:58,760 --> 00:18:05,590 So notice in the syntax now, this swap function is taking two arguments, 322 00:18:05,590 --> 00:18:11,830 one that is now a pointer to a and one that is now a pointer to b-- that is, 323 00:18:11,830 --> 00:18:14,950 the address of a and the address of b. 324 00:18:14,950 --> 00:18:19,840 And then, down below, it's going to use the dereference syntax to go and get 325 00:18:19,840 --> 00:18:23,390 those values and swap them in place. 326 00:18:23,390 --> 00:18:29,180 So although we're still running swap, and we're copying in some value, 327 00:18:29,180 --> 00:18:33,580 the ultimate result, though, is that we actually copy in the address, 328 00:18:33,580 --> 00:18:39,660 and we change the values exactly where those values currently are in memory. 329 00:18:39,660 --> 00:18:41,880 So, for instance, let's visualize this. 330 00:18:41,880 --> 00:18:42,680 I'll show you. 331 00:18:42,680 --> 00:18:45,340 We're going to call swap here. 332 00:18:45,340 --> 00:18:46,660 Here's swap. 333 00:18:46,660 --> 00:18:51,310 And what we pass into swap now is not the value 10 or the value 50, 334 00:18:51,310 --> 00:18:54,900 but the address of a and the address of b. 335 00:18:54,900 --> 00:18:59,650 So now we have a link from swap's location in memory 336 00:18:59,650 --> 00:19:03,760 back to main, where a and b originally are. 337 00:19:03,760 --> 00:19:09,320 Now, if we follow the steps in swap, we can follow these pointers 338 00:19:09,320 --> 00:19:12,530 and swap the values all at once in place. 339 00:19:12,530 --> 00:19:15,730 So now a is 50 and b is 10. 
340 00:19:15,730 --> 00:19:17,830 And so as you go on in the course, you'll 341 00:19:17,830 --> 00:19:21,860 be able to write functions that do this not just for two variables 342 00:19:21,860 --> 00:19:24,520 but perhaps for entire data structures and that 343 00:19:24,520 --> 00:19:29,920 allow you to ultimately build up much more complex ways of storing data 344 00:19:29,920 --> 00:19:32,150 in your programs. 345 00:19:32,150 --> 00:19:38,290 So questions, then, on this idea of swapping and passing by reference 346 00:19:38,290 --> 00:19:39,385 and passing by copy? 347 00:19:39,385 --> 00:19:43,233 348 00:19:43,233 --> 00:19:43,900 A question here. 349 00:19:43,900 --> 00:19:49,195 If you wanted to access the address, wouldn't you write &a and &b? 350 00:19:49,195 --> 00:19:50,770 And, in fact, you would. 351 00:19:50,770 --> 00:19:55,855 In this case, though, as we see in void swap(int *a, int *b), 352 00:19:55,855 --> 00:20:00,130 what we're defining here is the definition of swap. 353 00:20:00,130 --> 00:20:03,500 And you'll see swap takes two arguments. 354 00:20:03,500 --> 00:20:06,460 The first argument is a pointer called a. 355 00:20:06,460 --> 00:20:08,380 But how do we know it's a pointer? 356 00:20:08,380 --> 00:20:12,370 Well, we see here, it has the type int *. 357 00:20:12,370 --> 00:20:15,040 And then the same thing here, a pointer called b. 358 00:20:15,040 --> 00:20:21,916 We know it's a pointer because the type is int *, and the name is b. 359 00:20:21,916 --> 00:20:26,433 So I think it's helpful actually to show you the entire code for this as well. 360 00:20:26,433 --> 00:20:28,850 And for that, I'll pull up my codespace, and I'll actually 361 00:20:28,850 --> 00:20:32,420 walk through each step of this now with debug50 362 00:20:32,420 --> 00:20:35,330 so we can see the exact values of each variable 363 00:20:35,330 --> 00:20:39,270 here and understand exactly what's going on in our program. 
364 00:20:39,270 --> 00:20:40,790 So I'll come to my codespace. 365 00:20:40,790 --> 00:20:44,660 And now I will code swap.c. 366 00:20:44,660 --> 00:20:47,450 So swap.c is already done for me. 367 00:20:47,450 --> 00:20:49,130 I already wrote this code. 368 00:20:49,130 --> 00:20:51,480 But let me show you a few pieces here. 369 00:20:51,480 --> 00:20:56,900 So on line 4, I have the prototype for my function swap. 370 00:20:56,900 --> 00:21:03,360 swap will take a pointer called a and a pointer called b. 371 00:21:03,360 --> 00:21:05,330 It won't return us anything. 372 00:21:05,330 --> 00:21:09,890 It will simply have the side effect of swapping those two values. 373 00:21:09,890 --> 00:21:13,640 Now, in our function called main, the very first function 374 00:21:13,640 --> 00:21:18,020 to run in our program, I'll create this variable named a, 375 00:21:18,020 --> 00:21:21,630 put it someplace in memory, and give it the value 10. 376 00:21:21,630 --> 00:21:26,090 Similarly, I'll create this variable named b, put it someplace in memory, 377 00:21:26,090 --> 00:21:29,420 and give it now the value of 50. 378 00:21:29,420 --> 00:21:33,455 So, first, I'll print out what is a and what is b. 379 00:21:33,455 --> 00:21:35,600 I'll call the function swap. 380 00:21:35,600 --> 00:21:37,190 And notice now. 381 00:21:37,190 --> 00:21:41,900 I'm going to give it not the value of a but the address of a. 382 00:21:41,900 --> 00:21:45,290 That is, wherever a is stored, I'll pass that into swap. 383 00:21:45,290 --> 00:21:48,280 And in the same way, I'll do that for b. 384 00:21:48,280 --> 00:21:52,250 And then once that's done, I'll call printf, 385 00:21:52,250 --> 00:21:57,706 and I'll tell us again what is a and what is b. 386 00:21:57,706 --> 00:22:02,550 Now, down below, here's swap, the same implementation we had before. 387 00:22:02,550 --> 00:22:07,410 We're going to take in a pointer called a and a pointer called b. 
388 00:22:07,410 --> 00:22:09,150 And through what we saw in lecture, we're 389 00:22:09,150 --> 00:22:13,440 going to swap them all at once down here. 390 00:22:13,440 --> 00:22:20,630 So let's go ahead and write a bit of syntax 391 00:22:20,630 --> 00:22:23,640 here to actually debug this program and see it step by step. 392 00:22:23,640 --> 00:22:28,280 So if you're familiar with debug50, we can actually pause our program 393 00:22:28,280 --> 00:22:33,060 and walk through it step by step, seeing the values of variables as we go. 394 00:22:33,060 --> 00:22:38,300 So I might pause this program here, right where we're about to call swap. 395 00:22:38,300 --> 00:22:46,490 And now if I make swap to compile it and run debug50 ./swap and hit Enter, 396 00:22:46,490 --> 00:22:51,850 I should be whisked away to the debug50 environment. 397 00:22:51,850 --> 00:23:00,390 And I should hopefully see here soon that I'm able to see my program running 398 00:23:00,390 --> 00:23:03,060 but paused at this point. 399 00:23:03,060 --> 00:23:04,760 So now I want to show you where we are. 400 00:23:04,760 --> 00:23:09,830 Notice in my terminal, I see that a is 10, and b is 50. 401 00:23:09,830 --> 00:23:13,310 And, actually, in this tab over here called Variables, 402 00:23:13,310 --> 00:23:15,080 I can see that in real-time. 403 00:23:15,080 --> 00:23:19,580 I do see a is 10, and b is 50. 404 00:23:19,580 --> 00:23:21,500 So it's true, right? 405 00:23:21,500 --> 00:23:26,190 Now, down in my Call Stack, notice that main() is currently running. 406 00:23:26,190 --> 00:23:28,620 It's the highest thing in my Call Stack. 407 00:23:28,620 --> 00:23:31,580 So now I want to run swap. 408 00:23:31,580 --> 00:23:35,840 But if I want to see the workings of swap, how swap is actually 409 00:23:35,840 --> 00:23:40,550 going to swap these two values, I should use step into, 410 00:23:40,550 --> 00:23:45,450 which means, run this function but show me each step along the way. 
411 00:23:45,450 --> 00:23:48,240 So I'll step into swap here. 412 00:23:48,240 --> 00:23:50,850 And notice how I change my variables. 413 00:23:50,850 --> 00:23:54,620 These are the variables that have been passed into swap. 414 00:23:54,620 --> 00:24:01,530 I see this variable named temp that currently has value 257. 415 00:24:01,530 --> 00:24:03,330 Why do you think it has that value? 416 00:24:03,330 --> 00:24:04,230 Any ideas? 417 00:24:04,230 --> 00:24:07,510 418 00:24:07,510 --> 00:24:10,950 Doesn't seem to make much sense to me. 419 00:24:10,950 --> 00:24:18,440 And we learned a name for this value in lecture or rather this type of value. 420 00:24:18,440 --> 00:24:21,920 I'm seeing some people say a junk value or a garbage value. 421 00:24:21,920 --> 00:24:23,210 That's exactly what it is. 422 00:24:23,210 --> 00:24:24,760 There's some garbage value. 423 00:24:24,760 --> 00:24:27,880 It is a value that is just kind of already there in memory. 424 00:24:27,880 --> 00:24:30,160 We haven't assigned it anything yet. 425 00:24:30,160 --> 00:24:35,410 But once I actually step over this line of code, we should see what? 426 00:24:35,410 --> 00:24:41,680 Well, int temp, this value here, is currently 257. 427 00:24:41,680 --> 00:24:46,510 But once I run this, I'll set it equal to the result 428 00:24:46,510 --> 00:24:51,110 of dereferencing the pointer a. 429 00:24:51,110 --> 00:24:57,920 Notice the value of a here is 0x7ffd26c1790c. 430 00:24:57,920 --> 00:25:02,210 OK, but what is at the end of that pointer? 431 00:25:02,210 --> 00:25:04,460 If I follow it, what will I find? 432 00:25:04,460 --> 00:25:07,530 I'll find 10 here. 433 00:25:07,530 --> 00:25:12,920 So notice that it gives us the value not just for a, but the value of *a-- 434 00:25:12,920 --> 00:25:17,780 that is, following the pointer, going to that location in memory, and finding 435 00:25:17,780 --> 00:25:18,890 whatever value is there. 436 00:25:18,890 --> 00:25:21,660 In that case, that value is 10.
437 00:25:21,660 --> 00:25:26,210 So it goes to say that if I were to run this line of code, line 18, 438 00:25:26,210 --> 00:25:30,440 I should set temp equal to 10-- 439 00:25:30,440 --> 00:25:34,370 that is, the value that I would get if I dereferenced a. 440 00:25:34,370 --> 00:25:41,130 So now I'll step over it, and I'll see temp becomes 10. 441 00:25:41,130 --> 00:25:42,790 Now, what's the next step? 442 00:25:42,790 --> 00:25:51,370 Well, this says *a = *b, which is kind of confusing. 443 00:25:51,370 --> 00:25:53,580 But keep in mind what *a is. 444 00:25:53,580 --> 00:25:55,290 Well, *a is currently 10. 445 00:25:55,290 --> 00:25:59,580 It is the value at the location that a is stored. 446 00:25:59,580 --> 00:26:05,250 Now I want to put in the value that is-- 447 00:26:05,250 --> 00:26:08,980 the value that I get if I follow the pointer b, which is 50. 448 00:26:08,980 --> 00:26:11,550 So notice here *b is 50. 449 00:26:11,550 --> 00:26:17,400 If I follow this 0x7ffd26c17908, wherever 450 00:26:17,400 --> 00:26:20,970 that is in memory, if I follow that pointer and find that location, 451 00:26:20,970 --> 00:26:23,760 I will then get the value 50. 452 00:26:23,760 --> 00:26:30,420 And I'll then assign that value to the dereferenced part of this pointer a, 453 00:26:30,420 --> 00:26:32,070 which is currently 10. 454 00:26:32,070 --> 00:26:39,660 So I'll step over this now, and we'll see, well, *a becomes 50. 455 00:26:39,660 --> 00:26:45,390 I follow that pointer to the location of a and set it equal to 50. 456 00:26:45,390 --> 00:26:51,180 Now then, I'll say *b gets the value of temp. 457 00:26:51,180 --> 00:26:52,860 Well, *b is currently 50. 458 00:26:52,860 --> 00:26:59,250 I'm going to follow this pointer b to its value, which is 50. 459 00:26:59,250 --> 00:27:01,450 And then I'll set it equal to 10. 460 00:27:01,450 --> 00:27:08,620 I'll step over, and now I'll see *a is 50, and *b is 10. 
461 00:27:08,620 --> 00:27:13,120 Now, if I end this program, finish swap, what do we see? 462 00:27:13,120 --> 00:27:16,030 Well, a is now 50, and b is now 10. 463 00:27:16,030 --> 00:27:18,610 I'll step over and finish my program again. 464 00:27:18,610 --> 00:27:21,900 And now I'll see a is 50. b is 10. 465 00:27:21,900 --> 00:27:25,200 I've swapped these two values right there in place, 466 00:27:25,200 --> 00:27:27,070 walking through step by step. 467 00:27:27,070 --> 00:27:31,140 So I'll close this program, and let me ask, what questions do we 468 00:27:31,140 --> 00:27:35,020 have on swapping or on pointers so far? 469 00:27:35,020 --> 00:27:40,517 470 00:27:40,517 --> 00:27:41,100 Question here. 471 00:27:41,100 --> 00:27:47,020 Why is swap defined on line 4 and then again called on line 16? 472 00:27:47,020 --> 00:27:48,200 It's a good question. 473 00:27:48,200 --> 00:27:50,650 So to be clear, on line 4, what I'm doing 474 00:27:50,650 --> 00:27:53,380 is setting up the prototype for swap. 475 00:27:53,380 --> 00:27:56,320 I'm not so much defining it as I am declaring it. 476 00:27:56,320 --> 00:27:59,230 I'm telling C that, hey, look, up ahead, you 477 00:27:59,230 --> 00:28:03,010 should see a function called swap that takes these two inputs 478 00:28:03,010 --> 00:28:05,590 and returns this value-- in this case, nothing. 479 00:28:05,590 --> 00:28:12,130 And down below, when I, on line 16, have that same prototype followed 480 00:28:12,130 --> 00:28:15,520 by some brackets and some code here, that's 481 00:28:15,520 --> 00:28:19,370 where I'm defining swap in itself. 482 00:28:19,370 --> 00:28:20,330 A good question though. 483 00:28:20,330 --> 00:28:23,830 484 00:28:23,830 --> 00:28:25,070 Another good question. 485 00:28:25,070 --> 00:28:27,830 Why aren't we returning in this function? 
486 00:28:27,830 --> 00:28:34,720 So often we'll see if we want to get some value back in, let's say, main, 487 00:28:34,720 --> 00:28:39,590 we have the function we call return us some value. 488 00:28:39,590 --> 00:28:44,270 In this case, though, swap doesn't need to return much of anything. 489 00:28:44,270 --> 00:28:48,820 In fact, its entire purpose is to have this side effect of swapping values 490 00:28:48,820 --> 00:28:53,560 in place because we give to swap the addresses of two variables in memory. 491 00:28:53,560 --> 00:28:57,400 We have it follow those pointers to where those values are 492 00:28:57,400 --> 00:28:59,660 and move them around a bit like this. 493 00:28:59,660 --> 00:29:01,750 So we don't need to return anything. 494 00:29:01,750 --> 00:29:05,080 Because swap already has access to that place in memory, 495 00:29:05,080 --> 00:29:07,135 we're going to move those two values around. 496 00:29:07,135 --> 00:29:10,710 497 00:29:10,710 --> 00:29:11,880 All right. 498 00:29:11,880 --> 00:29:12,975 Other questions too? 499 00:29:12,975 --> 00:29:21,330 500 00:29:21,330 --> 00:29:22,440 A question here. 501 00:29:22,440 --> 00:29:29,700 Would this work without the &a and &b because swap is defined above? 502 00:29:29,700 --> 00:29:31,280 Let's see. 503 00:29:31,280 --> 00:29:33,280 I'm not sure if this is answering your question, 504 00:29:33,280 --> 00:29:39,610 but we do see here that swap is taking in these values, *a and *b. 505 00:29:39,610 --> 00:29:42,370 And it's important here that we actually give swap 506 00:29:42,370 --> 00:29:46,330 the address of these values, not the values themselves. 507 00:29:46,330 --> 00:29:52,180 If I said a and b here, I would now be passing by copy. 508 00:29:52,180 --> 00:29:55,540 I would say, swap, take the value of a, which is 10, 509 00:29:55,540 --> 00:29:59,750 and take the value of b, which is 50, and do whatever you want with it. 
510 00:29:59,750 --> 00:30:03,790 But in this case, beyond the fact that swap is expecting a pointer, 511 00:30:03,790 --> 00:30:08,410 if I were to do that, I'm not actually telling swap where my values currently 512 00:30:08,410 --> 00:30:10,580 are so it can move them around. 513 00:30:10,580 --> 00:30:12,580 I'm just giving it a copy of those values for it 514 00:30:12,580 --> 00:30:14,200 to do whatever it wants with it. 515 00:30:14,200 --> 00:30:16,540 Here, though, I should make sure that swap 516 00:30:16,540 --> 00:30:19,700 is able to take in the address of these values 517 00:30:19,700 --> 00:30:23,660 so it can move them around right where they are. 518 00:30:23,660 --> 00:30:24,690 Another question. 519 00:30:24,690 --> 00:30:28,800 Why don't we declare swap instead of prototyping it? 520 00:30:28,800 --> 00:30:29,520 So we do. 521 00:30:29,520 --> 00:30:32,120 So this is the prototype and declaration for swap. 522 00:30:32,120 --> 00:30:34,340 We're telling C exactly what swap will be. 523 00:30:34,340 --> 00:30:38,237 And down below, we define it as well. 524 00:30:38,237 --> 00:30:39,820 All right, so that was a bit of a lot. 525 00:30:39,820 --> 00:30:41,570 And our goal here is I'm going to show you 526 00:30:41,570 --> 00:30:43,510 all what you can now do with pointers. 527 00:30:43,510 --> 00:30:47,200 It's not so much interesting to swap things around like this and that. 528 00:30:47,200 --> 00:30:52,060 But what is powerful, what is fun, is actually opening up files, reading data 529 00:30:52,060 --> 00:30:55,240 from them, and writing data to those files, 530 00:30:55,240 --> 00:30:59,660 allowing you to write even more complex programs as you go. 531 00:30:59,660 --> 00:31:04,570 So here we'll talk about this idea of file I/O. 532 00:31:04,570 --> 00:31:08,470 And file I/O stands for file input and file 533 00:31:08,470 --> 00:31:13,640 output, how we can write data to files and how we can read data from files.
534 00:31:13,640 --> 00:31:19,240 So as a visual here, let's think of how we can both open and close files first. 535 00:31:19,240 --> 00:31:23,890 When we work with files in C, there's this idea of opening up a file 536 00:31:23,890 --> 00:31:28,270 and closing a file, similar to what you would do on your own computer. 537 00:31:28,270 --> 00:31:31,040 If you were to open up, let's say, a Microsoft Word document, 538 00:31:31,040 --> 00:31:31,840 you open it up. 539 00:31:31,840 --> 00:31:34,000 And later on, when you're done, you just close it. 540 00:31:34,000 --> 00:31:36,820 Hit that X button, and it's gone and stored away. 541 00:31:36,820 --> 00:31:42,690 In a similar way, your programs can open and close files too. 542 00:31:42,690 --> 00:31:48,710 So there are two key functions here for opening and closing files in C. One 543 00:31:48,710 --> 00:31:51,830 is called fopen, which stands for file open. 544 00:31:51,830 --> 00:31:54,920 Going to open up a file for future reading or writing. 545 00:31:54,920 --> 00:31:57,410 If we're reading, it is looking at the data, 546 00:31:57,410 --> 00:32:01,200 and writing is adding data or modifying data. 547 00:32:01,200 --> 00:32:05,820 Now, fclose goes ahead and closes that file for us. 548 00:32:05,820 --> 00:32:09,940 And now there's a bit of a hint here you saw in lecture a bit 549 00:32:09,940 --> 00:32:13,140 and one you should keep in mind as well, which is you should always 550 00:32:13,140 --> 00:32:17,010 fclose all files you fopen. 551 00:32:17,010 --> 00:32:21,330 And I'm curious to get a sense of your intuition for this. 552 00:32:21,330 --> 00:32:24,510 Why is it important that we always close files we open? 553 00:32:24,510 --> 00:32:27,660 Why, do you think? 554 00:32:27,660 --> 00:32:31,710 I'm seeing to free up memory, which is a good idea. 555 00:32:31,710 --> 00:32:34,300 Why else? 556 00:32:34,300 --> 00:32:38,930 So memory isn't wasted, another good idea. 
557 00:32:38,930 --> 00:32:42,710 Maybe you won't be able to use the file later if you don't close it. 558 00:32:42,710 --> 00:32:45,998 Preventing memory leaks. That is like a ballooning program 559 00:32:45,998 --> 00:32:47,540 using more and more memory over time. 560 00:32:47,540 --> 00:32:48,857 These are all good ideas. 561 00:32:48,857 --> 00:32:50,690 And I think you could really draw a parallel 562 00:32:50,690 --> 00:32:53,780 between you working on your own computer and just 563 00:32:53,780 --> 00:32:58,510 always opening files, opening files, opening files but never closing them. 564 00:32:58,510 --> 00:33:01,510 One, it would just be a mess if your computer, every time you opened it, 565 00:33:01,510 --> 00:33:04,670 just had all these open files on it, right? 566 00:33:04,670 --> 00:33:07,040 But the other thing is that if you ever want 567 00:33:07,040 --> 00:33:10,790 to send that file to somebody else, you had better 568 00:33:10,790 --> 00:33:16,020 close it so they can modify it without you also modifying it at the same time. 569 00:33:16,020 --> 00:33:17,930 So, generally, it's good practice to make 570 00:33:17,930 --> 00:33:21,800 sure you open files when you're going to use them and close them 571 00:33:21,800 --> 00:33:23,930 when you're done with them to save memory 572 00:33:23,930 --> 00:33:30,650 and also to ensure that no two programs are opening a file at the same time. 573 00:33:30,650 --> 00:33:39,000 OK, so let's get a visual here for how we can actually write to our file. 574 00:33:39,000 --> 00:33:43,260 So here we see an actual file on the right-hand side 575 00:33:43,260 --> 00:33:47,660 and some piece of syntax we could use to open that file. 576 00:33:47,660 --> 00:33:51,620 In this case, we're going to use the fopen function. 577 00:33:51,620 --> 00:33:56,390 So fopen takes two particular arguments or inputs. 578 00:33:56,390 --> 00:33:58,610 It takes the name of the file.
579 00:33:58,610 --> 00:34:02,880 And let's say this file here is called hi.txt, as we see down below. 580 00:34:02,880 --> 00:34:06,380 So it's a text file that just says "Hi!" on the inside. 581 00:34:06,380 --> 00:34:12,020 And now fopen takes another argument, another input, which is r. 582 00:34:12,020 --> 00:34:16,139 That is the mode we're going to use to open this file. 583 00:34:16,139 --> 00:34:19,017 There are two modes, read and write. 584 00:34:19,017 --> 00:34:20,600 And so what do you think r stands for? 585 00:34:20,600 --> 00:34:21,590 Well, read. 586 00:34:21,590 --> 00:34:25,409 We're going to open this file just so we can see what's inside of it. 587 00:34:25,409 --> 00:34:27,880 We can read some data from it. 588 00:34:27,880 --> 00:34:33,449 And now what do you think fopen returns to us, based 589 00:34:33,449 --> 00:34:36,420 on what you see on the left-hand side? 590 00:34:36,420 --> 00:34:39,165 What does fopen return to us? 591 00:34:39,165 --> 00:34:41,820 592 00:34:41,820 --> 00:34:45,315 You could look at the type on the left-hand side here. 593 00:34:45,315 --> 00:34:48,949 594 00:34:48,949 --> 00:34:50,239 So I'm seeing a few ideas. 595 00:34:50,239 --> 00:34:56,440 One is that it returns the contents of hi.txt, which is close, but not quite. 596 00:34:56,440 --> 00:35:02,570 I'm seeing another idea that it's a pointer to the file in memory, 597 00:35:02,570 --> 00:35:04,087 which is a good idea. 598 00:35:04,087 --> 00:35:05,920 And I think we could get this intuition-wise 599 00:35:05,920 --> 00:35:11,110 if we said, well, it looks like we have this variable named f. 600 00:35:11,110 --> 00:35:14,530 And its type is FILE *. 601 00:35:14,530 --> 00:35:18,250 And whenever we see type *, well, we should 602 00:35:18,250 --> 00:35:20,990 assume that is a pointer to that type. 
603 00:35:20,990 --> 00:35:25,580 So in this case, in C, there is a type called, all caps, FILE, 604 00:35:25,580 --> 00:35:28,810 which is basically a way of trying to access the file. 605 00:35:28,810 --> 00:35:30,100 It's a bit of a fancy type. 606 00:35:30,100 --> 00:35:33,460 But suffice to say for now, it allows you to access some file. 607 00:35:33,460 --> 00:35:38,110 And it has a pointer to that particular file type. 608 00:35:38,110 --> 00:35:40,930 Now, as a bit of an oversimplification here, 609 00:35:40,930 --> 00:35:46,250 you could imagine that this variable named f now just points to, 610 00:35:46,250 --> 00:35:50,510 let's say, this location in memory or this file we're trying to open. 611 00:35:50,510 --> 00:35:53,300 It points to the very beginning of that file. 612 00:35:53,300 --> 00:35:56,810 In reality, there is more going on underneath the hood. 613 00:35:56,810 --> 00:35:59,450 There is a very special file type, like we discussed. 614 00:35:59,450 --> 00:36:03,540 C tends to move some of the data of that file to its own program. 615 00:36:03,540 --> 00:36:06,020 But for now, you could just think of fopen 616 00:36:06,020 --> 00:36:10,400 as returning to you the location of this file in memory 617 00:36:10,400 --> 00:36:12,950 so you can actually see where it's stored. 618 00:36:12,950 --> 00:36:16,010 Much like you trying to find a file to open it, 619 00:36:16,010 --> 00:36:18,260 you first have to figure out, what directory is it in, 620 00:36:18,260 --> 00:36:21,770 where is it located, and so on. 621 00:36:21,770 --> 00:36:25,810 So questions, then, on fopen and how we can 622 00:36:25,810 --> 00:36:28,870 try to find files to open them with our own programs? 623 00:36:28,870 --> 00:36:39,320 624 00:36:39,320 --> 00:36:40,490 OK. 625 00:36:40,490 --> 00:36:41,400 Oh, a question. 626 00:36:41,400 --> 00:36:42,440 Why do the modes matter? 627 00:36:42,440 --> 00:36:43,440 That's a great question. 
628 00:36:43,440 --> 00:36:47,690 So here we see we're going to open hi.txt using 629 00:36:47,690 --> 00:36:50,220 the mode r, which stands for read. 630 00:36:50,220 --> 00:36:53,120 There's also the mode w, which stands for write, 631 00:36:53,120 --> 00:36:56,680 which allows us to actually write some data into the file 632 00:36:56,680 --> 00:37:00,240 to change or modify it in some way. 633 00:37:00,240 --> 00:37:03,200 I would say, for now, just keep in mind that fopen 634 00:37:03,200 --> 00:37:04,920 might do two different things. 635 00:37:04,920 --> 00:37:09,260 It might set up the file in different ways, for reading and for writing, 636 00:37:09,260 --> 00:37:12,350 because reading just requires us to look at that file 637 00:37:12,350 --> 00:37:14,240 and see what it is right now. 638 00:37:14,240 --> 00:37:17,240 Whereas writing, we have to set up the entire process of being 639 00:37:17,240 --> 00:37:21,440 able to change that file in some way. 640 00:37:21,440 --> 00:37:24,320 You can also, I believe, have both modes together, 641 00:37:24,320 --> 00:37:26,852 able to read and write at the same time. 642 00:37:26,852 --> 00:37:29,060 There are also other kinds of modes you could look up 643 00:37:29,060 --> 00:37:33,070 as well in the C standard library. 644 00:37:33,070 --> 00:37:34,090 OK. 645 00:37:34,090 --> 00:37:37,330 So here we're able to open up a file, and it's 646 00:37:37,330 --> 00:37:40,960 easy to close it simply using fclose and then 647 00:37:40,960 --> 00:37:44,000 giving fclose the pointer to that file. 648 00:37:44,000 --> 00:37:48,490 So here we see f pointing to this file, telling us where it is in memory. 649 00:37:48,490 --> 00:37:52,610 We can simply say fclose and give it the pointer to that file. 650 00:37:52,610 --> 00:37:55,930 And now that file will, afterwards, be closed. 651 00:37:55,930 --> 00:38:01,050 We can no longer read or write from it. 
652 00:38:01,050 --> 00:38:06,610 But let's say, along the way, we do want to read or write data from our file. 653 00:38:06,610 --> 00:38:11,740 So let's get a visual now for what it means to read and write from a file. 654 00:38:11,740 --> 00:38:18,540 So, often, one thing we'll see is having a file a bit like this one, hi.txt, 655 00:38:18,540 --> 00:38:23,370 and our program on the left-hand side with some variable like text. 656 00:38:23,370 --> 00:38:28,530 And maybe we want to store whatever is inside this file in some variable. 657 00:38:28,530 --> 00:38:33,160 We want to get it into our program so we can modify it or use it in some way. 658 00:38:33,160 --> 00:38:36,300 Well, if we were to read this data, it's basically 659 00:38:36,300 --> 00:38:39,660 like taking a copy of some chunk of our file 660 00:38:39,660 --> 00:38:42,250 and putting it inside of our program. 661 00:38:42,250 --> 00:38:44,790 So here I see this text, "Hi!" 662 00:38:44,790 --> 00:38:49,770 I'll take a copy of it and put it inside this variable called text. 663 00:38:49,770 --> 00:38:51,600 And now it's inside my program. 664 00:38:51,600 --> 00:38:54,360 I could use it, modify it as I wish. 665 00:38:54,360 --> 00:38:58,780 It's simply a copy of whatever is inside this file. 666 00:38:58,780 --> 00:39:02,740 But now if I want to write data or modify this file, 667 00:39:02,740 --> 00:39:07,810 you could imagine maybe I have some value, like what's currently in text, 668 00:39:07,810 --> 00:39:10,390 and I want to put that in the file. 669 00:39:10,390 --> 00:39:15,760 Well, I could write data by copying what I have in this particular variable 670 00:39:15,760 --> 00:39:19,850 and appending it-- that is, adding it to this file here. 671 00:39:19,850 --> 00:39:24,130 I'll take "Hi!", and now I'll put it at the very end of this file. 672 00:39:24,130 --> 00:39:29,830 I've written data or added data to this file here. 
673 00:39:29,830 --> 00:39:34,155 So what questions, then, on this visual for reading and for writing? 674 00:39:34,155 --> 00:39:38,560 675 00:39:38,560 --> 00:39:41,450 Question on the syntax, I think, which is a good question. 676 00:39:41,450 --> 00:39:46,000 So we've seen how to open and close files with fopen and fclose. 677 00:39:46,000 --> 00:39:50,240 But now how do we actually add data to them and read data from them? 678 00:39:50,240 --> 00:39:54,820 So for that, we have two other functions, one called fread 679 00:39:54,820 --> 00:39:57,730 and one called fwrite. 680 00:39:57,730 --> 00:40:00,190 And more on these as we go. 681 00:40:00,190 --> 00:40:05,800 But suffice to say for now that fread lets us read data from a file 682 00:40:05,800 --> 00:40:07,630 into our program. 683 00:40:07,630 --> 00:40:13,240 And fwrite allows us to take data from our program 684 00:40:13,240 --> 00:40:16,660 and add it to a file. 685 00:40:16,660 --> 00:40:21,540 Now, in particular, there's a new vocabulary word here called a buffer. 686 00:40:21,540 --> 00:40:24,600 And a buffer is simply a place we can temporarily 687 00:40:24,600 --> 00:40:27,580 store some data in our program. 688 00:40:27,580 --> 00:40:31,950 So let's say we have a file, and like we saw before, I wanted to read, 689 00:40:31,950 --> 00:40:34,410 let's say, three characters from that file-- 690 00:40:34,410 --> 00:40:36,910 H, i, exclamation point. 691 00:40:36,910 --> 00:40:40,500 I would have a variable, which serves as a buffer, that 692 00:40:40,500 --> 00:40:45,750 is, some place to store that data inside my program temporarily. 693 00:40:45,750 --> 00:40:49,950 So a buffer is simply some particular name for a kind of variable 694 00:40:49,950 --> 00:40:52,635 that often stores file contents. 695 00:40:52,635 --> 00:40:56,180 696 00:40:56,180 --> 00:41:02,770 So let's consider then why we might even want to use the idea of a buffer.
697 00:41:02,770 --> 00:41:05,690 I'm curious what you all might think here. 698 00:41:05,690 --> 00:41:12,350 If a buffer is simply some place to store part of a file, not all of it, 699 00:41:12,350 --> 00:41:14,600 why would we use a buffer? 700 00:41:14,600 --> 00:41:17,070 Another way of asking this question is this. 701 00:41:17,070 --> 00:41:22,460 Why might we not want to read the entirety of our file into memory 702 00:41:22,460 --> 00:41:23,550 all at once? 703 00:41:23,550 --> 00:41:25,430 Why do we need a buffer in this case? 704 00:41:25,430 --> 00:41:28,780 705 00:41:28,780 --> 00:41:33,727 If the goal of buffer is to break our file into smaller bits, 706 00:41:33,727 --> 00:41:34,685 why might we need that? 707 00:41:34,685 --> 00:41:37,590 708 00:41:37,590 --> 00:41:42,810 So I'm seeing this idea of saving up memory, trying to use less memory 709 00:41:42,810 --> 00:41:45,860 overall, which is a good one. 710 00:41:45,860 --> 00:41:46,790 Let's see. 711 00:41:46,790 --> 00:41:49,730 Trying to avoid overflow, which makes sense. 712 00:41:49,730 --> 00:41:54,520 Maybe we don't quite know how big the file is. 713 00:41:54,520 --> 00:41:57,550 Trying to avoid segmentation fault. So I'm seeing some themes here. 714 00:41:57,550 --> 00:42:01,510 And among them are this idea that maybe our file is really big, 715 00:42:01,510 --> 00:42:04,660 like we don't want to have that entire file loaded up at once 716 00:42:04,660 --> 00:42:05,600 and put into memory. 717 00:42:05,600 --> 00:42:06,850 That's a good idea. 718 00:42:06,850 --> 00:42:11,960 The other idea is we often don't know exactly how big a file is. 719 00:42:11,960 --> 00:42:14,860 So the best we can do is look at small bits of it 720 00:42:14,860 --> 00:42:21,478 one at a time until we get to the end of our file at the end of the day. 721 00:42:21,478 --> 00:42:23,270 So that's why I might want to use a buffer. 
722 00:42:23,270 --> 00:42:26,880 This allows us to look at some particular pieces of our file and not 723 00:42:26,880 --> 00:42:30,220 the entire file all at once. 724 00:42:30,220 --> 00:42:32,610 So if we have this buffer now, it's worth asking, 725 00:42:32,610 --> 00:42:35,370 how could we get data into that buffer? 726 00:42:35,370 --> 00:42:39,430 And for that, we'll see this idea of reading from a file. 727 00:42:39,430 --> 00:42:42,150 So if I wanted to read from a file, there 728 00:42:42,150 --> 00:42:44,850 are really two questions I should answer. 729 00:42:44,850 --> 00:42:48,510 The first is, from where am I reading data? 730 00:42:48,510 --> 00:42:52,120 What file am I trying to get data from? 731 00:42:52,120 --> 00:42:56,730 And then where am I trying to read that data into? 732 00:42:56,730 --> 00:42:59,190 What buffer am I trying to put it into? 733 00:42:59,190 --> 00:43:04,590 And it turns out that fread requires us to answer these two questions before it 734 00:43:04,590 --> 00:43:07,420 can do what we want it to do. 735 00:43:07,420 --> 00:43:11,020 So let's see one example of fread here. 736 00:43:11,020 --> 00:43:13,230 So this is fread. 737 00:43:13,230 --> 00:43:17,790 And fread takes four inputs or four arguments. 738 00:43:17,790 --> 00:43:21,340 One of the first ones you might care about is this one, which is, 739 00:43:21,340 --> 00:43:23,740 from where are we reading data? 740 00:43:23,740 --> 00:43:27,380 From what file are we trying to read data? 741 00:43:27,380 --> 00:43:30,730 And for that, we give fread a file pointer, 742 00:43:30,730 --> 00:43:36,620 some way of finding the location of our file in the computer's memory. 743 00:43:36,620 --> 00:43:39,730 So, for instance, let's say we have this file here. 744 00:43:39,730 --> 00:43:42,130 The pointer to that file is called f. 745 00:43:42,130 --> 00:43:48,380 I could run fread a bit like this, by slotting in f as that fourth argument. 
746 00:43:48,380 --> 00:43:55,340 So now fread knows from where to get the data from this file. 747 00:43:55,340 --> 00:43:59,780 But the next question, we said, is where is the data going to go? 748 00:43:59,780 --> 00:44:03,350 Where in my program should I put this temporary piece 749 00:44:03,350 --> 00:44:05,540 of data I'm going to get from the file? 750 00:44:05,540 --> 00:44:08,930 Well, in that case, we might have something like a buffer, 751 00:44:08,930 --> 00:44:12,570 and that is going to be the first input to fread. 752 00:44:12,570 --> 00:44:14,480 So here now is our visual. 753 00:44:14,480 --> 00:44:17,780 Let's say we have a pointer to our file called f, 754 00:44:17,780 --> 00:44:20,810 and we have some place in our program called 755 00:44:20,810 --> 00:44:23,820 buffer, some place to store this data. 756 00:44:23,820 --> 00:44:27,260 Well, we could then just call that variable buffer. 757 00:44:27,260 --> 00:44:34,190 And now when we call fread, we could say, we want to read into the buffer. 758 00:44:34,190 --> 00:44:39,710 That is, that will be a pointer to some place in our computer's memory, 759 00:44:39,710 --> 00:44:43,400 some address we want to put that data in. 760 00:44:43,400 --> 00:44:49,040 So now we've seen two out of the four arguments so fread can function. 761 00:44:49,040 --> 00:44:55,060 Where are we getting data, and where are we going to put it in the end? 762 00:44:55,060 --> 00:44:57,270 So questions here as well. 763 00:44:57,270 --> 00:45:00,480 764 00:45:00,480 --> 00:45:02,165 And we'll see some examples later on. 765 00:45:02,165 --> 00:45:08,367 766 00:45:08,367 --> 00:45:08,950 Good question. 767 00:45:08,950 --> 00:45:09,950 So buffer is a variable? 768 00:45:09,950 --> 00:45:11,230 Buffer is a variable. 769 00:45:11,230 --> 00:45:14,250 And in this case, it is the address of some place in memory 770 00:45:14,250 --> 00:45:17,710 where we want to store the file's contents. 
771 00:45:17,710 --> 00:45:20,070 Good question. 772 00:45:20,070 --> 00:45:20,570 OK. 773 00:45:20,570 --> 00:45:24,740 So now the next thing is there are still two arguments we haven't yet defined. 774 00:45:24,740 --> 00:45:26,360 What are these arguments? 775 00:45:26,360 --> 00:45:31,290 Well, it turns out that these two are answering these questions here, 776 00:45:31,290 --> 00:45:35,450 which is, what size is the block of data I want to read, 777 00:45:35,450 --> 00:45:38,630 and how many blocks do I want to read? 778 00:45:38,630 --> 00:45:42,440 So it turns out that files themselves are 779 00:45:42,440 --> 00:45:46,080 composed of individual blocks of data. 780 00:45:46,080 --> 00:45:49,430 So you could imagine, let's say, a text file. 781 00:45:49,430 --> 00:45:55,160 And I want to ask you, what might be the file-- 782 00:45:55,160 --> 00:45:59,600 what might be the individual chunks of a text file? 783 00:45:59,600 --> 00:46:03,290 If I'm storing some text, what do you think 784 00:46:03,290 --> 00:46:06,665 might compose those individual chunks of the file? 785 00:46:06,665 --> 00:46:09,210 786 00:46:09,210 --> 00:46:12,630 How would you break up that file if I were to ask 787 00:46:12,630 --> 00:46:16,060 you to break it up into smaller pieces? 788 00:46:16,060 --> 00:46:19,010 You could certainly break it up into lines or sentences or so on. 789 00:46:19,010 --> 00:46:22,600 But I would argue, maybe the smallest unit we could get 790 00:46:22,600 --> 00:46:25,930 is an individual character of text-- 791 00:46:25,930 --> 00:46:29,740 so maybe a character like a or a character like e. 792 00:46:29,740 --> 00:46:35,770 So we saw our hi.txt earlier had H, i, exclamation point. 793 00:46:35,770 --> 00:46:41,440 Well, that file is probably just three chunks of memory, each 1 byte long. 794 00:46:41,440 --> 00:46:43,420 H is one chunk. 795 00:46:43,420 --> 00:46:45,100 i is one chunk. 796 00:46:45,100 --> 00:46:47,710 Exclamation point is another chunk. 
797 00:46:47,710 --> 00:46:50,350 And we know that characters are 1 byte long. 798 00:46:50,350 --> 00:46:53,590 So you can probably assume that text file is broken up 799 00:46:53,590 --> 00:46:59,460 into individual pieces, individual bytes in this case. 800 00:46:59,460 --> 00:47:02,870 So some files do have individual bytes that make them up. 801 00:47:02,870 --> 00:47:05,760 Other files, though, are a little fancier. 802 00:47:05,760 --> 00:47:10,520 So while text files might have chunks that are 1 byte long, 803 00:47:10,520 --> 00:47:15,230 you could imagine a file like an image that stores color. 804 00:47:15,230 --> 00:47:18,290 Well, to store individual pixels, it turns out 805 00:47:18,290 --> 00:47:20,910 each pixel needs about 3 bytes. 806 00:47:20,910 --> 00:47:23,840 And so we could probably best think of an image file 807 00:47:23,840 --> 00:47:29,030 as being broken up into not one-byte chunks but three-byte chunks, 808 00:47:29,030 --> 00:47:31,250 a bit like this visually. 809 00:47:31,250 --> 00:47:36,270 So as we read files, it's important we figure out, well, 810 00:47:36,270 --> 00:47:41,720 how big are the individual pieces of data that make up this file? 811 00:47:41,720 --> 00:47:45,950 In the case of a text file, it's those characters that are 1 byte long. 812 00:47:45,950 --> 00:47:48,710 In the case of an image though, it's those pixels 813 00:47:48,710 --> 00:47:53,900 that could be up to 3 bytes long as well to store all the possible colors 814 00:47:53,900 --> 00:47:56,210 that pixel could be. 815 00:47:56,210 --> 00:47:58,470 And the question is, how do we know that? 816 00:47:58,470 --> 00:48:00,920 Well, often, you just know that from convention. 817 00:48:00,920 --> 00:48:03,980 So if you're working with text files, by convention, 818 00:48:03,980 --> 00:48:06,500 we store them character by character. 
819 00:48:06,500 --> 00:48:10,520 If you're working with images, you were able to maybe look up documentation 820 00:48:10,520 --> 00:48:14,060 for, let's say, the .png or .bmp file type. 821 00:48:14,060 --> 00:48:18,980 It tells you that those pixels are stored in three-byte chunks. 822 00:48:18,980 --> 00:48:23,690 So often you don't know yourself, but somebody else, the maker of that file, 823 00:48:23,690 --> 00:48:27,330 will tell you how their data is stored. 824 00:48:27,330 --> 00:48:30,100 A good question. 825 00:48:30,100 --> 00:48:34,530 So it stands to reason, then, if we know how to break a file into smaller 826 00:48:34,530 --> 00:48:37,410 chunks, we need to answer these two questions, which 827 00:48:37,410 --> 00:48:42,910 is, how big are those chunks, and how many do we want to read at once? 828 00:48:42,910 --> 00:48:44,880 So let's answer that first question. 829 00:48:44,880 --> 00:48:47,110 What size are these chunks? 830 00:48:47,110 --> 00:48:49,180 And this question is in bytes. 831 00:48:49,180 --> 00:48:51,990 So if we're working now with, let's say, a file 832 00:48:51,990 --> 00:48:54,600 that looks a bit like this-- maybe it's a text file, 833 00:48:54,600 --> 00:48:59,790 has individual characters-- we could say the individual chunks of this file 834 00:48:59,790 --> 00:49:02,580 are simply 1 byte long. 835 00:49:02,580 --> 00:49:09,720 So we could tell fread we're going to read chunks that are 1 byte big. 836 00:49:09,720 --> 00:49:15,900 But now the next question is, OK, I know my file is made of one-byte chunks. 837 00:49:15,900 --> 00:49:19,560 But how many of those should I read all at once? 838 00:49:19,560 --> 00:49:23,940 Well, we could say, maybe we want to read 4 at a time or 8 839 00:49:23,940 --> 00:49:27,030 at a time or 2 or 1 at a time, whatever it is. 
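As a rough sketch of those two questions (how big is each chunk, and how many to read), here is how the size and count arguments to fread might look for a text file versus an image. The `pixel` struct and the helper names `read_chars` and `read_pixels` are assumptions made up for illustration, not code from the section, and the sketch assumes the compiler lays out the struct in exactly 3 bytes, which is typical but not guaranteed.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical 3-byte pixel: one byte each for blue, green, and red.
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} pixel;

// Read up to n one-byte chunks (characters) from f into out.
size_t read_chars(FILE *f, char *out, size_t n)
{
    return fread(out, sizeof(char), n, f);
}

// Read up to n three-byte chunks (pixels) from f into out.
size_t read_pixels(FILE *f, pixel *out, size_t n)
{
    return fread(out, sizeof(pixel), n, f);
}
```

In both helpers the second argument answers "what size is each chunk?" and the third answers "how many chunks?".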
840 00:49:27,030 --> 00:49:32,130 In this case, let's say, I want to read just 4 bytes at a time, 841 00:49:32,130 --> 00:49:37,150 able to see my file in 4 byte sliding windows, if you will. 842 00:49:37,150 --> 00:49:41,550 So here, if I read 4 bytes, visually, it looks a bit like this. 843 00:49:41,550 --> 00:49:44,700 I'm looking at my file here, and I want to read 844 00:49:44,700 --> 00:49:47,400 whatever is in these first four chunks. 845 00:49:47,400 --> 00:49:50,250 My pointer f points to the first one. 846 00:49:50,250 --> 00:49:54,480 I'll take out the first four and put them in my buffer. 847 00:49:54,480 --> 00:49:59,160 I've made a copy of them from my file into my program, 848 00:49:59,160 --> 00:50:04,390 and now my file pointer points at whatever is still left to read. 849 00:50:04,390 --> 00:50:07,260 So if I have some sentence here, maybe I'm 850 00:50:07,260 --> 00:50:09,900 reading it four characters at a time. 851 00:50:09,900 --> 00:50:12,480 I see those first four characters. 852 00:50:12,480 --> 00:50:15,900 My file pointer gets updated and points to the rest of my file. 853 00:50:15,900 --> 00:50:18,930 If I call fread again and again and again, 854 00:50:18,930 --> 00:50:22,440 I'll keep moving further and further and further and further down 855 00:50:22,440 --> 00:50:27,870 my file 4 bytes at a time, 4 bytes at a time. 856 00:50:27,870 --> 00:50:32,810 So questions on this, whether visually or syntax-wise? 857 00:50:32,810 --> 00:50:37,160 This is our final call or usage of fread. 858 00:50:37,160 --> 00:50:41,130 859 00:50:41,130 --> 00:50:43,570 And we'll see this in action in just a bit. 860 00:50:43,570 --> 00:50:48,857 861 00:50:48,857 --> 00:50:49,440 Good question. 862 00:50:49,440 --> 00:50:52,920 How do we decide how many chunks to take out? 863 00:50:52,920 --> 00:50:56,320 Often it will depend on a few things. 864 00:50:56,320 --> 00:51:02,170 So one is, how much memory do you want to use at any one time? 
865 00:51:02,170 --> 00:51:05,430 So if you were to take a big chunk out of your file, 866 00:51:05,430 --> 00:51:09,630 that's a lot of memory to store in your program, perhaps. 867 00:51:09,630 --> 00:51:14,940 At the same time, maybe you care about seeing all that data at once. 868 00:51:14,940 --> 00:51:16,650 That would be a good reason to do that. 869 00:51:16,650 --> 00:51:21,120 If you don't, though, quite care about seeing all your data all at once, 870 00:51:21,120 --> 00:51:24,750 you could read things in smaller chunks just 871 00:51:24,750 --> 00:51:28,530 to make sure you're not using up too much memory at any one time. 872 00:51:28,530 --> 00:51:33,270 The other reason you might make this value smaller or larger is it 873 00:51:33,270 --> 00:51:39,630 tends to be a little bit faster to read things in bigger chunks, like to read, 874 00:51:39,630 --> 00:51:47,620 let's say, 100 individual bytes versus, let's say, 10 bytes ten times, 875 00:51:47,620 --> 00:51:48,520 for instance. 876 00:51:48,520 --> 00:51:54,250 Or to make it even simpler, it's easier to read 10 bytes all at once 877 00:51:54,250 --> 00:51:58,820 than it is to read 10 bytes individually over and over again. 878 00:51:58,820 --> 00:52:01,570 So that's a consideration as well, though probably not as 879 00:52:01,570 --> 00:52:05,110 much a consideration for the kind of work you'll do in CS50 in particular. 880 00:52:05,110 --> 00:52:08,583 881 00:52:08,583 --> 00:52:09,500 Another question here. 882 00:52:09,500 --> 00:52:13,640 Is f able to return to the beginning of the file after doing the reading? 883 00:52:13,640 --> 00:52:14,750 It is. 884 00:52:14,750 --> 00:52:19,430 There is a special function you would call to move the file pointer back 885 00:52:19,430 --> 00:52:20,340 to the top. 
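To make the idea of moving through a file 4 bytes at a time a little more concrete, here is a minimal sketch. The function name `count_windows` is hypothetical, and it assumes the file has already been opened for reading.

```c
#include <stdint.h>
#include <stdio.h>

// Walk through an open file 4 bytes at a time and return how many
// calls to fread brought back at least one byte.
int count_windows(FILE *f)
{
    uint8_t buffer[4];
    int windows = 0;

    // Each call copies up to 4 bytes into buffer and advances f.
    while (fread(buffer, 1, 4, f) > 0)
    {
        windows++;
    }
    return windows;
}
```

Changing the 4 to a bigger number means fewer calls to fread but a bigger buffer in memory, which is the trade-off described here.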
886 00:52:20,340 --> 00:52:23,370 So when you use fread, as we said before, 887 00:52:23,370 --> 00:52:27,230 this file pointer continues through the file, through the file, 888 00:52:27,230 --> 00:52:29,510 through the file, gradually getting to the end. 889 00:52:29,510 --> 00:52:33,690 When you get to the end, you have to kind of rewind it back to the top. 890 00:52:33,690 --> 00:52:36,440 And if you're familiar with this idea of a cassette player, where 891 00:52:36,440 --> 00:52:40,040 you have some tape that goes and spins and spins and spins, 892 00:52:40,040 --> 00:52:42,735 that's kind of what this file pointer is doing. 893 00:52:42,735 --> 00:52:45,110 It starts at the beginning, spins all the way to the end. 894 00:52:45,110 --> 00:52:46,820 Once you get there, you have to rewind it 895 00:52:46,820 --> 00:52:50,450 all the way back to the very beginning. 896 00:52:50,450 --> 00:52:52,250 Or a VHS tape as well-- 897 00:52:52,250 --> 00:52:53,570 nice metaphor there. 898 00:52:53,570 --> 00:52:55,808 Yeah. 899 00:52:55,808 --> 00:52:57,100 And then another question here. 900 00:52:57,100 --> 00:53:00,180 How do we find how many chunks there are in the file? 901 00:53:00,180 --> 00:53:05,160 Well, actually, I would argue, you can't quite know at the very beginning. 902 00:53:05,160 --> 00:53:06,960 Your computer is able to tell you roughly 903 00:53:06,960 --> 00:53:09,520 how many bytes are inside a file. 904 00:53:09,520 --> 00:53:13,650 But if you were to write a program to find the size of a file 905 00:53:13,650 --> 00:53:17,460 that you didn't know beforehand, logically, the only way you 906 00:53:17,460 --> 00:53:21,780 can find the end of that file or how big it is is by starting at the beginning, 907 00:53:21,780 --> 00:53:26,130 reading byte by byte by byte by byte until you get to the end. 908 00:53:26,130 --> 00:53:29,730 And you'll notice, well, there's no more file left to read.
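Here is a small sketch of both ideas from these two answers: reading byte by byte until nothing is left, then rewinding to the top. The function name `count_bytes` is made up for illustration. The "special function" for going back to the beginning is assumed here to be `rewind` from stdio.h, which does exist in the C standard library, though the section doesn't name it.

```c
#include <stdint.h>
#include <stdio.h>

// Count the bytes left in an open file by reading one at a time,
// then rewind so the caller can read from the top again.
long count_bytes(FILE *f)
{
    uint8_t byte;
    long total = 0;

    // fread returns 1 while there is still a byte to hand back.
    while (fread(&byte, 1, 1, f) == 1)
    {
        total++;
    }

    rewind(f);  // move the file pointer back to the beginning
    return total;
}
```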
909 00:53:29,730 --> 00:53:32,160 And so, often, at the end of your file, you 910 00:53:32,160 --> 00:53:37,050 will get a special signal, not a null terminating character 911 00:53:37,050 --> 00:53:41,310 stored in the file itself but an end-of-file, or EOF, condition that the reading function reports. 912 00:53:41,310 --> 00:53:44,190 That tells you there are no more bytes to read. 913 00:53:44,190 --> 00:53:48,150 But it's a bit like strlen, as we saw a bit in lecture, where you can't quite 914 00:53:48,150 --> 00:53:51,810 know how long the string is until you go character by character 915 00:53:51,810 --> 00:53:54,540 or byte by byte to the very end of it, and you 916 00:53:54,540 --> 00:53:58,750 reach that ending marker, a null character for a string or an EOF 917 00:53:58,750 --> 00:54:01,780 condition for a file. 918 00:54:01,780 --> 00:54:02,740 Good questions. 919 00:54:02,740 --> 00:54:05,340 920 00:54:05,340 --> 00:54:09,060 OK, so let's make this a little more concrete, 921 00:54:09,060 --> 00:54:12,990 and we'll write a program here to actually check what type 922 00:54:12,990 --> 00:54:14,440 a file is. 923 00:54:14,440 --> 00:54:20,070 So one interesting fact about a file-- let me find this problem right here-- 924 00:54:20,070 --> 00:54:24,870 is that, generally, files have a signature of bytes 925 00:54:24,870 --> 00:54:30,300 at the very beginning that tell your program and tell you what kind of file 926 00:54:30,300 --> 00:54:31,270 it is. 927 00:54:31,270 --> 00:54:37,050 So, for instance, a PDF tends to begin with these 4 bytes. 928 00:54:37,050 --> 00:54:40,530 Or rather, 4 bytes represent these numbers 929 00:54:40,530 --> 00:54:46,500 when stored as integers-- so 37, 80, 68, and 70. 930 00:54:46,500 --> 00:54:51,760 If you see 4 bytes at the beginning of a file that look like 37, 931 00:54:51,760 --> 00:54:58,380 80, 68, and 70, that file, turns out, will most likely be a PDF.
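One way the signature check itself might look, as a sketch only: compare four bytes you already have against 37, 80, 68, and 70. Read as ASCII characters, those four numbers spell "%PDF". The names `PDF_SIGNATURE` and `has_pdf_signature` are assumptions for illustration and don't come from the section.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// The four signature bytes from the section; as ASCII they spell "%PDF".
static const uint8_t PDF_SIGNATURE[4] = {37, 80, 68, 70};

// Return true if the four bytes passed in match the PDF signature.
bool has_pdf_signature(const uint8_t first_bytes[4])
{
    return memcmp(first_bytes, PDF_SIGNATURE, 4) == 0;
}
```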
932 00:54:58,380 --> 00:55:00,660 And other files have their own signatures 933 00:55:00,660 --> 00:55:06,170 that they can tell you what kind of file they are at the very beginning of them. 934 00:55:06,170 --> 00:55:11,650 So what we'll do is write a program that actually opens up any given file 935 00:55:11,650 --> 00:55:16,540 and tells us whether that file is a PDF or is not a PDF 936 00:55:16,540 --> 00:55:20,000 based on those first 4 bytes. 937 00:55:20,000 --> 00:55:21,350 So let's go ahead and do that. 938 00:55:21,350 --> 00:55:24,100 I'll go back to my program here, and I'll refresh the window 939 00:55:24,100 --> 00:55:26,880 so I'm able to set up my codespace. 940 00:55:26,880 --> 00:55:29,640 And I'll wait for that to load. 941 00:55:29,640 --> 00:55:34,900 And while we do, let me ask what questions we have on this prompt here, 942 00:55:34,900 --> 00:55:36,480 if any-- 943 00:55:36,480 --> 00:55:37,690 on file signatures. 944 00:55:37,690 --> 00:55:45,000 945 00:55:45,000 --> 00:55:47,070 A question I see is, what happens if my chunk is 946 00:55:47,070 --> 00:55:50,130 bigger than the last bit in the file? 947 00:55:50,130 --> 00:55:50,890 A good question. 948 00:55:50,890 --> 00:55:55,920 So you could imagine, let's say you're reading maybe 8 bytes at a time. 949 00:55:55,920 --> 00:56:00,690 And you get to the end of your file, and there are only 4 bytes left to read. 950 00:56:00,690 --> 00:56:03,900 Well, what might happen with fread is it actually 951 00:56:03,900 --> 00:56:05,990 won't read past the end of your file. 952 00:56:05,990 --> 00:56:10,920 And instead of giving you 8 bytes, it'll give you 4 back. 953 00:56:10,920 --> 00:56:14,580 And the cool thing about fread is that it'll return to you 954 00:56:14,580 --> 00:56:16,840 the number of elements it has read. 
955 00:56:16,840 --> 00:56:23,070 So if you say, read me eight elements of 1 byte size each, 956 00:56:23,070 --> 00:56:26,880 it'll return to you the number 8 if it successfully did that. 957 00:56:26,880 --> 00:56:29,880 If it only read four it'll return to you 4. 958 00:56:29,880 --> 00:56:32,220 Or if it only read three, it'll return to you 3. 959 00:56:32,220 --> 00:56:35,250 Or zero, it'll return to you 0. 960 00:56:35,250 --> 00:56:38,910 So fread does have a return value you can use to figure out, 961 00:56:38,910 --> 00:56:42,820 are you at the end of your file, or are you not? 962 00:56:42,820 --> 00:56:47,950 So I think here my codespace is loaded, so I will get set up here. 963 00:56:47,950 --> 00:56:50,470 And we're going to finish this pdf.c program. 964 00:56:50,470 --> 00:56:53,410 So I will code pdf.c. 965 00:56:53,410 --> 00:56:56,870 And here I have the beginnings of my program. 966 00:56:56,870 --> 00:57:00,940 So the goal is to use pdf.c a bit like this-- 967 00:57:00,940 --> 00:57:07,210 ./pdf and then typing in some file name, like, let's say, test.pdf. 968 00:57:07,210 --> 00:57:12,700 And my program will say, yes, this is a PDF, or no, it's not a PDF. 969 00:57:12,700 --> 00:57:15,640 So the very first thing I should probably do 970 00:57:15,640 --> 00:57:18,940 is figure out how I can get that file name. 971 00:57:18,940 --> 00:57:23,890 And it turns out that that file name is going to be the first command line 972 00:57:23,890 --> 00:57:26,720 argument to my program. 973 00:57:26,720 --> 00:57:32,830 So if I do ./pdf test.pdf, I could access that file name using argv 974 00:57:32,830 --> 00:57:35,450 bracket 1, as we learned in a prior week. 975 00:57:35,450 --> 00:57:38,240 So here I'll go ahead and write the following. 976 00:57:38,240 --> 00:57:44,260 I'll say that I'm going to get a string called filename, 977 00:57:44,260 --> 00:57:47,950 and it will be equal to argv bracket 1. 
978 00:57:47,950 --> 00:57:54,170 So not the very first argument, which is PDF, this ./pdf part here, 979 00:57:54,170 --> 00:57:57,350 but the second part, which is test.pdf right here. 980 00:57:57,350 --> 00:57:59,780 So now I have a string called filename. 981 00:57:59,780 --> 00:58:03,620 And now I have to try to open up that file. 982 00:58:03,620 --> 00:58:09,540 So what function could we use to open up a file? 983 00:58:09,540 --> 00:58:10,245 Any ideas? 984 00:58:10,245 --> 00:58:12,930 985 00:58:12,930 --> 00:58:14,790 We could use fopen. 986 00:58:14,790 --> 00:58:18,270 So fopen allows us to open up a file, find where it is, 987 00:58:18,270 --> 00:58:22,570 and return to us a pointer to that file we can use in our program. 988 00:58:22,570 --> 00:58:24,880 So I'll use fopen here. 989 00:58:24,880 --> 00:58:30,430 And it turns out fopen takes a file name as its first input. 990 00:58:30,430 --> 00:58:35,520 So whatever file name is in argv bracket 1, I'll give that as input to fopen. 991 00:58:35,520 --> 00:58:41,160 And I'll try to open that file using just the mode r for reading. 992 00:58:41,160 --> 00:58:42,900 I don't want to modify the file. 993 00:58:42,900 --> 00:58:45,710 I just want to read from it. 994 00:58:45,710 --> 00:58:49,980 But now I have to keep track of the pointer to this file. 995 00:58:49,980 --> 00:58:54,680 So to create a file pointer, I can use this FILE * type 996 00:58:54,680 --> 00:58:56,610 and make sure I give it a name. 997 00:58:56,610 --> 00:58:59,030 In this case, I'll call it f for consistency. 998 00:58:59,030 --> 00:59:02,510 And I'll say I have a pointer to a file called f. 999 00:59:02,510 --> 00:59:05,300 It is of type FILE *, FILE pointer. 1000 00:59:05,300 --> 00:59:11,340 And it is the result of calling fopen on some file name to open up that file 1001 00:59:11,340 --> 00:59:15,600 and tell me exactly where it is in memory. 
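Put together, the steps so far (take the file name from argv bracket 1, hand it to fopen with mode r, keep the FILE * that comes back) might be sketched like this. The helper name `open_input` is hypothetical, and the checks on argc and on a NULL result are additions that the walkthrough hasn't made at this point. It also uses plain `char *` where the CS50 library would let you write `string`.

```c
#include <stdio.h>

// Open the file named by the first command-line argument for reading.
// Returns NULL if no name was given or the file could not be opened.
FILE *open_input(int argc, char *argv[])
{
    if (argc != 2)
    {
        return NULL;
    }

    char *filename = argv[1];   // argv[0] is the program name, e.g. ./pdf
    FILE *f = fopen(filename, "r");
    return f;                   // fopen itself returns NULL on failure
}
```

A main function could call `open_input(argc, argv)` and stop early with a message if it gets NULL back.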
1002 00:59:15,600 --> 00:59:22,450 OK, so now with f available, I need to figure out how I can read from my file. 1003 00:59:22,450 --> 00:59:26,550 But to read from my file, what do I need first? 1004 00:59:26,550 --> 00:59:29,620 1005 00:59:29,620 --> 00:59:34,100 Thinking back to what we had seen earlier about fread, 1006 00:59:34,100 --> 00:59:35,730 there are a few questions to answer. 1007 00:59:35,730 --> 00:59:41,855 I think we know where we're going to read from, but what do we still need? 1008 00:59:41,855 --> 00:59:45,050 1009 00:59:45,050 --> 00:59:49,590 Probably, I'm seeing some people say, a place to read to. 1010 00:59:49,590 --> 00:59:51,890 So we're going to read from our file. 1011 00:59:51,890 --> 00:59:57,600 But now, in our own program, we need to have some space to store those values. 1012 00:59:57,600 --> 01:00:04,933 So, in this case, it's probably worth thinking about, first, we 1013 01:00:04,933 --> 01:00:06,350 want to read into a certain place. 1014 01:00:06,350 --> 01:00:10,280 But then again, how many chunks are we going to read, 1015 01:00:10,280 --> 01:00:11,870 and how big are those chunks? 1016 01:00:11,870 --> 01:00:17,000 Turns out that a PDF is full of one-byte chunks of memory, 1017 01:00:17,000 --> 01:00:20,690 and we want to read the first four. 1018 01:00:20,690 --> 01:00:24,290 So one way we could do that is by reading 1019 01:00:24,290 --> 01:00:27,020 1 byte one at a time four times. 1020 01:00:27,020 --> 01:00:33,620 Or we could use an array and store those first 4 bytes back to back to back. 1021 01:00:33,620 --> 01:00:37,590 So, often, buffers will be arrays of values. 1022 01:00:37,590 --> 01:00:40,340 And in this case, I'll try doing that. 
1023 01:00:40,340 --> 01:00:43,670 I'll make an array of integers because we're 1024 01:00:43,670 --> 01:00:48,390 going to look at those first 4 bytes in the PDF and think of them as integers 1025 01:00:48,390 --> 01:00:53,830 and ask, is this equal to some value, like in this case 37, 1026 01:00:53,830 --> 01:00:56,920 80, 68, and 70, those first 4 bytes? 1027 01:00:56,920 --> 01:01:02,020 So I'll create some space in my program to store four integers. 1028 01:01:02,020 --> 01:01:05,280 And this, as we saw in prior weeks, is the syntax to do that. 1029 01:01:05,280 --> 01:01:12,420 I now have an array called buffer that stores four types, 1030 01:01:12,420 --> 01:01:17,440 and those types are int back to back to back. 1031 01:01:17,440 --> 01:01:22,580 Now, this is good but not quite particular enough. 1032 01:01:22,580 --> 01:01:25,750 So if you recall from lecture, we might have 1033 01:01:25,750 --> 01:01:29,930 this idea of an integer being worth about 4 bytes in size. 1034 01:01:29,930 --> 01:01:34,870 And notice here that, well, we want to read things in individual bytes, 1035 01:01:34,870 --> 01:01:37,730 like 1 byte at a time, not 4 bytes at a time. 1036 01:01:37,730 --> 01:01:42,700 So we're going to use an integer type but a special kind of integer type. 1037 01:01:42,700 --> 01:01:49,210 And that is one called a uint8_t, which looks a little weird. 1038 01:01:49,210 --> 01:01:51,310 But, basically, all this is doing is saying, 1039 01:01:51,310 --> 01:01:55,810 I want not just the generic integer type, which is 4 bytes long. 1040 01:01:55,810 --> 01:02:01,150 I want, in particular, an integer that is 8 bits or 1 byte 1041 01:02:01,150 --> 01:02:04,750 long that is unsigned, that is only positive, 1042 01:02:04,750 --> 01:02:07,070 and that entire thing is its own type. 1043 01:02:07,070 --> 01:02:08,260 So uint8_t. 
1044 01:02:08,260 --> 01:02:11,080 This is essentially a particular kind of integer, 1045 01:02:11,080 --> 01:02:15,550 and it comes as part of the stdint.h header file here. 1046 01:02:15,550 --> 01:02:18,630 So I only know this because I looked it up beforehand. 1047 01:02:18,630 --> 01:02:21,750 But when you're reading files, you could look up what kind of type 1048 01:02:21,750 --> 01:02:24,370 is best suited for reading data from that file. 1049 01:02:24,370 --> 01:02:28,440 Turns out that is a uint8_t for this particular kind of file. 1050 01:02:28,440 --> 01:02:31,350 And in CS50, we'll tell you in advance what kind of type 1051 01:02:31,350 --> 01:02:34,350 you should use when you're going to read from a particular kind of file, 1052 01:02:34,350 --> 01:02:36,150 all right? 1053 01:02:36,150 --> 01:02:37,680 So here I have my buffer. 1054 01:02:37,680 --> 01:02:43,910 And now I need to ask how I could read into that buffer. 1055 01:02:43,910 --> 01:02:46,830 Probably going to use fread here. 1056 01:02:46,830 --> 01:02:52,390 And does anyone remember what the first argument to fread is going to be? 1057 01:02:52,390 --> 01:02:55,430 We were answering four questions here. 1058 01:02:55,430 --> 01:03:00,060 Where we're reading from, where we're reading to-- 1059 01:03:00,060 --> 01:03:03,000 and the first one is indeed where we're going to read to. 1060 01:03:03,000 --> 01:03:06,570 So we're going to read into our buffer here. 1061 01:03:06,570 --> 01:03:10,410 And the next question is, how big are the chunks? 1062 01:03:10,410 --> 01:03:13,370 Well, they're 1 byte long, as we said before. 1063 01:03:13,370 --> 01:03:16,730 The next question is, how many chunks to read all at once? 1064 01:03:16,730 --> 01:03:19,290 Well, four to read all at once. 1065 01:03:19,290 --> 01:03:23,420 And now we're going to read from our file pointer.
1066 01:03:23,420 --> 01:03:27,570 So notice we're not doing filename, not the name of our file, 1067 01:03:27,570 --> 01:03:31,590 but instead the file pointer, f in this case. 1068 01:03:31,590 --> 01:03:36,397 So with that, we actually successfully have some data inside our buffer. 1069 01:03:36,397 --> 01:03:37,230 And we can prove it. 1070 01:03:37,230 --> 01:03:39,920 So I will write a for loop here-- 1071 01:03:39,920 --> 01:03:47,510 for int i = 0; i is less than 4; i++ to go from 0 to 3 and read through 1072 01:03:47,510 --> 01:03:48,500 our entire buffer. 1073 01:03:48,500 --> 01:03:54,110 I'll print out whatever's inside that buffer as an integer, like this, 1074 01:03:54,110 --> 01:03:57,410 and I'll say buffer bracket i. 1075 01:03:57,410 --> 01:04:01,820 And then at the very end, I will close our file like this 1076 01:04:01,820 --> 01:04:04,250 to make sure we're being safe with our memory. 1077 01:04:04,250 --> 01:04:12,340 I'll make pdf, run ./pdf test.pdf, and now we see those first 4 bytes 1078 01:04:12,340 --> 01:04:14,920 in our file. 1079 01:04:14,920 --> 01:04:16,450 I have another file here too. 1080 01:04:16,450 --> 01:04:21,450 I could say ./pdf or dot slash-- 1081 01:04:21,450 --> 01:04:25,960 yeah, sorry, ./pdf and then test.jpg, I believe. 1082 01:04:25,960 --> 01:04:27,680 Hit Enter. 1083 01:04:27,680 --> 01:04:31,390 And notice now these are very different values for this JPEG. 1084 01:04:31,390 --> 01:04:39,230 So the very first 4 bytes are 255, 216, 255, and then 224. 1085 01:04:39,230 --> 01:04:44,000 So we see here this signature of what this file is telling us its type 1086 01:04:44,000 --> 01:04:44,510 might be. 1087 01:04:44,510 --> 01:04:48,530 And we said before, well, a PDF seems to have these first 4 1088 01:04:48,530 --> 01:04:55,200 bytes of 37, 80, 68, and 70. 1089 01:04:55,200 --> 01:04:58,930 So questions here on how this is working. 
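For reference, here is a sketch of roughly what the program does at this point, wrapped in a function so it can be reused. The name `print_first_four` is an assumption for illustration. Unlike the version typed on screen, this sketch checks that fopen succeeded and only prints as many bytes as fread actually delivered.

```c
#include <stdint.h>
#include <stdio.h>

// Print the first four bytes of a file as integers, one per line.
// Returns how many bytes were actually read (0 if the file won't open).
size_t print_first_four(const char *filename)
{
    FILE *f = fopen(filename, "r");
    if (f == NULL)
    {
        return 0;
    }

    uint8_t buffer[4];                      // four 1-byte integers
    size_t count = fread(buffer, 1, 4, f);  // buffer, chunk size, chunks, file

    for (size_t i = 0; i < count; i++)
    {
        printf("%i\n", buffer[i]);
    }

    fclose(f);
    return count;
}
```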
1090 01:04:58,930 --> 01:05:02,920 We first opened our file, created a buffer, 1091 01:05:02,920 --> 01:05:07,930 used fread to read 4 individual bytes from that file, 1092 01:05:07,930 --> 01:05:12,260 and then printed those out as we went. 1093 01:05:12,260 --> 01:05:15,230 A question here on, does this apply to all file types? 1094 01:05:15,230 --> 01:05:18,590 Generally, all file types will have some signature 1095 01:05:18,590 --> 01:05:24,720 or some metadata that tells you what type of file they're going to be. 1096 01:05:24,720 --> 01:05:25,260 A question. 1097 01:05:25,260 --> 01:05:28,680 What if we used int instead of uint8_t? 1098 01:05:28,680 --> 01:05:31,990 I'm actually curious about this too, so I'll try int here as well. 1099 01:05:31,990 --> 01:05:35,010 I will recompile PDF. 1100 01:05:35,010 --> 01:05:37,290 I'll do ./pdf test.pdf. 1101 01:05:37,290 --> 01:05:40,740 1102 01:05:40,740 --> 01:05:42,880 That doesn't look as good. 1103 01:05:42,880 --> 01:05:47,250 And I'm curious why you think this might have happened. 1104 01:05:47,250 --> 01:05:48,720 What might have gone wrong? 1105 01:05:48,720 --> 01:05:57,590 1106 01:05:57,590 --> 01:06:01,840 So remember that the reason we used uint8_t is that it 1107 01:06:01,840 --> 01:06:04,760 was the right size of value to use. 1108 01:06:04,760 --> 01:06:10,210 So the key thing about a uint8_t is that it is only a single byte big. 1109 01:06:10,210 --> 01:06:14,030 A regular integer, though, is 4 bytes big. 1110 01:06:14,030 --> 01:06:17,620 So here what we see is that we're trying to perhaps create 1111 01:06:17,620 --> 01:06:23,470 an array of up to 16 bytes if an integer is 4 bytes long. 1112 01:06:23,470 --> 01:06:30,130 But we're only going to read four of them into that particular buffer. 1113 01:06:30,130 --> 01:06:35,120 So if we take 4 bytes and store them inside this buffer, 1114 01:06:35,120 --> 01:06:37,300 well, we might not get the values we expect. 
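A sketch of why the int version misbehaves, assuming an int is 4 bytes: the 4 bytes read from the file all end up packed inside the first int, and the other three ints are never filled in. The helper names `read_as_bytes` and `read_as_ints` are hypothetical. The int array is zeroed first here so the untouched slots are predictable, which the on-screen version did not do.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Read the first 4 bytes of f into an array of four 1-byte integers.
size_t read_as_bytes(FILE *f, uint8_t out[4])
{
    return fread(out, 1, 4, f);
}

// Read the same 4 bytes into an array of four ints, zeroed beforehand.
// With 4-byte ints, all 4 bytes land inside out[0]; out[1] through
// out[3] are never touched by fread.
size_t read_as_ints(FILE *f, int out[4])
{
    memset(out, 0, 4 * sizeof(int));
    return fread(out, 1, 4, f);
}
```

With `read_as_bytes`, each slot holds one byte of the signature. With `read_as_ints`, slot 0 holds one large number built from all four bytes, and its exact value depends on the machine's byte order.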
1115 01:06:37,300 --> 01:06:40,270 And that's why it's important, when you're reading files, 1116 01:06:40,270 --> 01:06:42,640 to make sure you're using the appropriate types 1117 01:06:42,640 --> 01:06:45,890 and getting particular about the kinds of types you're going to use. 1118 01:06:45,890 --> 01:06:50,200 So I revert this back to uint8_t, which basically 1119 01:06:50,200 --> 01:06:54,640 means an integer that is going to be always positive, that is 1120 01:06:54,640 --> 01:06:56,890 8 bits or 1 byte long. 1121 01:06:56,890 --> 01:07:02,150 I can now recompile-- make pdf, ./pdf test.pdf, 1122 01:07:02,150 --> 01:07:05,045 and I'll see the numbers as I expect them to be. 1123 01:07:05,045 --> 01:07:10,790 1124 01:07:10,790 --> 01:07:12,950 And a good follow-on question here. 1125 01:07:12,950 --> 01:07:16,130 This program is decent, but I wouldn't say it's 1126 01:07:16,130 --> 01:07:18,890 quite all the way, let's say, safe. 1127 01:07:18,890 --> 01:07:21,050 So you can imagine me doing this. 1128 01:07:21,050 --> 01:07:25,993 If I type in ./pdf and leave this last argument blank, 1129 01:07:25,993 --> 01:07:28,910 who knows what could happen, right, because we're going to try to move 1130 01:07:28,910 --> 01:07:30,707 beyond argv and access some value. 1131 01:07:30,707 --> 01:07:32,540 So there is some more work to do here, which 1132 01:07:32,540 --> 01:07:35,623 I'll leave up to you to make sure we have the right number of command line 1133 01:07:35,623 --> 01:07:39,390 arguments and so on. 1134 01:07:39,390 --> 01:07:42,360 Other questions conceptually on reading and writing? 1135 01:07:42,360 --> 01:07:45,650 1136 01:07:45,650 --> 01:07:48,890 Question on the _t here. 1137 01:07:48,890 --> 01:07:54,600 So _t basically identifies this as its very own type. 1138 01:07:54,600 --> 01:07:56,910 So it's kind of a convention here. 1139 01:07:56,910 --> 01:07:59,120 And this is broken out into a few parts. 
1140 01:07:59,120 --> 01:08:05,210 u stands for unsigned only positive. int stands, of course, for integer. 1141 01:08:05,210 --> 01:08:07,730 8 stands for the number of bits used. 1142 01:08:07,730 --> 01:08:08,660 We have 8. 1143 01:08:08,660 --> 01:08:10,920 We have 16 as well. 1144 01:08:10,920 --> 01:08:14,990 And then _t means all of that is its very own type. 1145 01:08:14,990 --> 01:08:19,377 1146 01:08:19,377 --> 01:08:20,210 And a question here. 1147 01:08:20,210 --> 01:08:22,468 What is the return value of fread? 1148 01:08:22,468 --> 01:08:23,760 That's an interesting question. 1149 01:08:23,760 --> 01:08:25,069 So why don't I try that out? 1150 01:08:25,069 --> 01:08:31,250 I could say int blocks_read because I know fread returns to me 1151 01:08:31,250 --> 01:08:34,560 the number of blocks that it did read successfully. 1152 01:08:34,560 --> 01:08:36,950 So maybe I'll print out the buffer, and at the end, 1153 01:08:36,950 --> 01:08:43,160 I'll also print out "Blocks read %i backslash n". 1154 01:08:43,160 --> 01:08:47,370 I'll substitute blocks_read in here like this. 1155 01:08:47,370 --> 01:08:52,010 I will then make pdf and do ./pdf test.pdf. 1156 01:08:52,010 --> 01:08:56,640 And I'll see I successfully read four blocks. 1157 01:08:56,640 --> 01:09:03,616 If I got to the end of my file, I might see fewer, or I might see none at all. 1158 01:09:03,616 --> 01:09:04,199 Good question. 1159 01:09:04,199 --> 01:09:09,020 1160 01:09:09,020 --> 01:09:14,460 OK, so I think this brings us close to the end of our section. 1161 01:09:14,460 --> 01:09:16,430 Suffice to say, in problem set this week, 1162 01:09:16,430 --> 01:09:20,359 we get a lot more practice using fread and fwrite even. 1163 01:09:20,359 --> 01:09:26,960 Before we go off to our own studies, I want to remind you all of fwrite 1164 01:09:26,960 --> 01:09:27,870 as well. 
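To see that return value change near the end of a file, here is a small sketch that asks for 8 one-byte blocks each time it is called. The helper name `blocks_available` is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

// Ask for 8 one-byte blocks and report how many fread actually delivered.
int blocks_available(FILE *f)
{
    uint8_t buffer[8];
    int blocks_read = (int) fread(buffer, 1, 8, f);
    return blocks_read;
}
```

On a file that is 11 bytes long, the first call would return 8, the second 3, and every call after that 0.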
1165 01:09:27,870 --> 01:09:31,310 And one handy trick here is that fwrite is basically 1166 01:09:31,310 --> 01:09:33,800 the same ordering as fread. 1167 01:09:33,800 --> 01:09:37,910 Notice we have buffer, size of the chunk, number of chunks, 1168 01:09:37,910 --> 01:09:40,258 and the place to now write into. 1169 01:09:40,258 --> 01:09:42,050 The only difference, as I just said before, 1170 01:09:42,050 --> 01:09:45,920 is that we're now not just reading from the file into our buffer. 1171 01:09:45,920 --> 01:09:53,520 We're copying from our buffer into the file and adding data as we go. 1172 01:09:53,520 --> 01:09:54,850 All right. 1173 01:09:54,850 --> 01:09:57,860 So that, I think, should hopefully set you up very well for this week. 1174 01:09:57,860 --> 01:09:59,980 Feel free to reach out if you have any questions. 1175 01:09:59,980 --> 01:10:02,970 And, hopefully, see you next time. 1176 01:10:02,970 --> 01:10:04,000
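As a closing sketch of fread and fwrite working together, here is one way to copy a file 4 bytes at a time. The function name `copy_file` is hypothetical, and both files are assumed to be already open, the source for reading and the destination for writing.

```c
#include <stdint.h>
#include <stdio.h>

// Copy everything left in src into dst, 4 bytes at a time.
// Returns the total number of bytes written.
long copy_file(FILE *src, FILE *dst)
{
    uint8_t buffer[4];
    size_t n;
    long total = 0;

    while ((n = fread(buffer, 1, 4, src)) > 0)
    {
        // Same argument order as fread: buffer, chunk size, chunk count, file.
        total += (long) fwrite(buffer, 1, n, dst);
    }
    return total;
}
```

Passing `n` rather than 4 to fwrite means a short final read only writes the bytes that were really there.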