CARTER ZENKE: Well, hello, one and all, and welcome to our week four section. My name is Carter Zenke. I'm the course's preceptor, and today we'll dive into memory: the little details of how a computer actually stores things inside its own memory. So let's jump in. We'll take a look at a few questions today, and these questions look a bit like this. First, we'll talk about pointers. What is a pointer, how do we use it, and what syntax should we get used to? Second, we'll think about how we can read and write data from a file. How do we take data stored inside a file and put it inside our program? And then how do we take data from our program and put it inside a file? And finally, we'll talk about dynamic memory. How do we use new tools like malloc, and when should we use them? And really, why should we use them? Why do we care about them? So to jump in, let's take a look first at pointers, where a pointer is a way of storing the address of some variable inside a new variable, in this case a pointer. To get started, let's think back to what we talked about first: variables. What is a variable?
If a pointer is a kind of variable, what first is a variable? Well, we saw earlier that we had this contacts application. We were trying to store some variables about people: names, addresses, phone numbers, and so on, or even the number of times we had called them. So let's go back to that idea first. Let's take a look at this syntax, where we had some variable named calls that is an integer and has the value four. We broke this variable down as follows. We said that we can give this variable a name, in this case calls. It also has a type, int, for integer: it stores integer values. And it also has a value, in this case four. So calls is simply a name for some place in memory that has the value four. But the picture is a little more complex than that, as we now know. Instead of simply saying that calls is located someplace in memory, we can tell you very specifically where it is in memory using this new tool called addresses. Similar to how homes or businesses have addresses, so do variables. In this case, we'll say that calls has the address 0x1A, chosen more or less at random here.
Here we're using hexadecimal format, where the 0x in front notes that this number is in base 16. Programmers use hexadecimal because it's a more convenient way to talk about memory locations, given how many locations in memory there can be. It's easier to use base 16 than, for example, base 10, partly because each hexadecimal digit stands for exactly four bits. And we denote it with that 0x in front. So calls has this location, 0x1A. But how do we actually make use of this location? Well, we can use what we now call a pointer, where a pointer is simply a variable as we know it, but it stores not a regular value like an integer or a character, but an address in memory. So let's take a look at this visual, where we have a pointer called p. We know it's called p because that's the name we give it in the syntax on the left-hand side. We also have, similarly, a type for this pointer: int star. We know int is the integer type, and we also have char for characters and so on. But in this case, we see int star. So whenever we see one of those basic types followed by a star, like char star or int star, we can infer that this is not a char or an int, but a pointer to a char or an int.
In this case, p is a pointer to an integer. That star denotes that we're talking about a pointer, not an actual value like an integer or a character. To that end, notice how the value of p is itself an address: 0x1A. So p stores this address wherever p itself is located in memory. Not to get too into the weeds, but if pointers are themselves variables, a special kind of variable that stores addresses, then a pointer itself also has an address in memory. We won't talk too much about this idea; it's not often we try to get the location of a pointer that itself stores a location in memory. But just so you know, because p is a variable, it does, of course, have its own address in memory. Now let's think about how we could use this. We have this idea of storing values in variables, and in this case addresses in pointers, but what do we do with them? What syntax can we use for them? Let's take a look. Well, we know we can always get the value of a variable simply by writing its name in code after we've declared and initialized it.
So in this case, we're simply getting the value of calls. If we ever write calls in our program, that gives us back the value stored at that variable's location, in this case the value of calls. Now, if we want the address stored inside our pointer, p, we could do the same. We could say, give me whatever's inside of p. In this way, we can keep track of, for example, where the variable calls is. If we have a pointer called p that points to calls, we can simply use the syntax p to figure out where calls is located. We can also use some new syntax we've seen, the ampersand, where ampersand stands for the address of some variable. I like to remember this by saying that ampersand begins with an a, and so does address: ampersand and address. So in this case, ampersand calls tells us where calls is located. Maybe that address isn't stored inside a variable yet, but where is calls? If we look at the slides here, we'll see that calls is located at 0x1A, as we saw before. Ampersand calls gives us not the value of calls, not four, but in this case the address, 0x1A.
Now, thinking similarly, we could do the same with a pointer. We could ask, what's the address of this pointer, and get back, perhaps, 0xF0, or wherever this pointer is located in memory. It's not often we'll do this, getting the address of a pointer that itself stores an address, but you can do it if you're curious. Now let's think about this, then. If we have a way to get the value of a pointer and the address of a pointer, what do we actually use a pointer for? Well, often, we'll use a pointer to go to wherever it's pointing and get the value from there. For that, we use this syntax, the star. It's a little bit confusing, because we saw a star before, in int star and char star, to declare a pointer. But in this case, we're going to use it to follow a pointer to the location it's pointing to. The visual for this is a bit like this. If we say star p, we first think, OK, what is the value of star p? Think to yourself for a minute here. What is the value of star p? Well, what is the value of just p, first? 0x1A.
And now, putting the star in front of it, we say, let's go to the address that is stored in p, in this case 0x1A, and let's get the value there. So let's follow the arrow, follow the pointer so to speak, and get that value, in this case four, through our pointer, p. This is handy, of course. This simple example might seem redundant, but as you build up larger and more complex data structures, like your own linked lists and hash tables, it can be really useful to follow a pointer like this: to see where a value is stored and get the value it's pointing to. As you go off, here are a few things to remember about the pointer syntax we just saw. First, type star, whether it's int star or char star, declares a pointer that stores the address of a value of that type. If you say int star, it stores the address of an integer. If you say char star, it stores the address of a character. Next, star x takes a pointer, in this case called x, and goes to the address stored in that pointer, as we just saw. Star p finds the pointer p and asks, what's the value inside of p?
Let me go to that address and find the value there. And finally, ampersand x takes whatever variable you have, in this case called x, and gets its address: ampersand for address, the two a's together. Now let's take a look at an example exercise we can do to make sure we're practicing this syntax as we go. This is a pointer prediction exercise. When first starting out, it's often helpful to look at a program and predict what it might do, and only afterward run it and check your assumptions. So today, we'll take a look at this file called pointers.c. I encourage you, before you run this file, to read through the code line by line and act like a computer. In your mind or on a piece of paper, write down: what's the value of a? What's the address of a, for example, inside this pointers.c file? Then, once you've done that, go ahead and run it and see if your assumptions are correct. Let's do this a bit together here. We have the code on the left-hand side and a visual on the right-hand side. Again, your job is to take a few minutes, maybe pause the video, read through this code from top to bottom, and work like a computer.
Fill in the potential address of a and the value of a, the potential address of b and the value of b, and so on for c. It's OK if you need to make up addresses; you can certainly do that. But at the end of running this program, quote unquote, in your own head, you should have some addresses for a, b, and c, and some values for a, b, and c. So go ahead and pause the video here, work on this, and we'll come back to it together.

OK, so by now you might have, in your mind, some potential values for a, b, and c. Let's go ahead and run this program, again called pointers.c, to actually see what happens. I'll open up code.cs50.io, where I already have a folder called pointers. Inside of this folder, I have pointers.c. You, too, can download pointers.c and put it in your own codespace. I'll type cd pointers to change directory into pointers. And I'll type ls, where you can see I have pointers.c and a compiled version of pointers that I could run if I want to. Let's first open pointers.c. We see the very same code, in this case wrapped in some extra syntax that lets us actually run this program.
Notice how we have the same declarations and initializations of variables, and the same manipulation down below. Finally, we print the results: after running this program, a has some value and is located at some location in memory, and likewise for b and c. Notice how we're using the same syntax from before. If we want to find out where a is located, we use ampersand. So ampersand a asks, what's the address of a? The format code for a pointer is simply percent p, and percent i is for an integer. So we use percent i for the integers and percent p for the pointers. Let's compile pointers. Let's go down to our terminal and type make pointers. Now I'll clear my terminal, run ./pointers, and hit Enter. And now we'll see the results. You might have made up some addresses as you went, but we can still see the basic gist here: a has the value 14, and it's located at some location in memory that ends in, let's say, 53c. b has the value 25 and is located at some other location, one that ends with 538. So it's in a different location than a.
And finally, c has the value... well, what value does it have? It doesn't hold an integer; it holds an address. So in this case, c is a pointer storing an address. And of course, a pointer, like any other variable, does have a location of its own in memory, in this case one that ends with 530, distinct from the locations of a and b. So if you predicted correctly, congratulations: you're getting the hang of your pointer syntax. If not, not to worry. Feel free to practice more with the syntax and learn by doing as you write programs that use pointers. But first, let's get a feel for how this is working. First, we had int a gets the value 28. So at some location in memory, we create a variable called a that gets the value 28, and so on for b. We can give them some random addresses, wherever it is they're assigned. So a and b have some addresses, and they have the values 28 and 50. Now let's take a look at c. Well, c looks a bit different. Its type is int star. So what is int star? Think to yourself.
It's a pointer to an integer. So in this case, we have a pointer called c. It gets not a, but the address of a. So wherever a is stored, c gets that address. Now c is pointing, metaphorically, to a, or the value stored at a. Next, we do some manipulation of these pointers. We've set them up initially here, and now we manipulate them. We say, OK, let's go to the location c is holding and make the value there 14. But where, currently, is c pointing? Well, c, as we just said, is pointing at a: c, this pointer to an integer, holds the address of a. So if we say star c, that means go to the location of a and change the value to be 14. Now we update c. We say c gets the value ampersand b. So where does c point now? c points to b. We take the address of b and store that in c. And finally, we say, let's follow c to wherever it's pointing using star c, and change the value there to be 25.
So at the end, because we first set a, then set b, then had c point to a, updated a to be 14, then had c point to b, and updated b to be 25, we should, of course, see that a has the value 14 and b has the value 25. And notice that c has a value that corresponds to which other variable? It corresponds to b: c's value is the location of b. OK. So again, feel free to get some more practice with this, either on your own or while you work on the problem set itself. These pieces of syntax will be really useful to you and great to master as you go through your programming journey. Now, with that in mind, let's do something a bit more advanced. Let's take a look at how we can use pointers to write to files, read from files, and so on. Let's go back to our slides here and think about file I/O, where file I/O stands for file input and file output. How do we take input from files, and how do we output data to files?
If we think first about a file, you might have a common notion of a file being some place in memory that has a name and maybe some text inside of it, or some other characters or pieces of data. In this case, we might have a file called hi.txt. And inside of hi.txt, we just have the characters h, i, and an exclamation point: hi!. Now, this file really is located somewhere, of course. This file has to have a place, and let's say it's at 0x456 in memory. Now, if we want to open this file in order to read data from it, we have one way of doing that in C. We could write this: FILE star input gets the value of fopen, hi.txt, r. This is maybe a lot to take in at once, so let's break it down. First, we're giving something a name: we're creating some variable and calling it input. OK. But we also have some other pieces going on here. We have a type for this variable called input: it's a FILE pointer, a pointer to a file. So this variable, this pointer called input, points to some file. It has the address of some file inside of it.
321 00:17:08,240 --> 00:17:12,220 Now, if we use fopen here, it's our very own function 322 00:17:12,220 --> 00:17:14,589 that C gives to us as part of its standard library. 323 00:17:14,589 --> 00:17:21,520 Well, what we could do is say, I want to open hi.txt using the read mode. 324 00:17:21,520 --> 00:17:25,599 So the first argument to fopen, in this case hi.txt 325 00:17:25,599 --> 00:17:27,980 is the file name we want to open. 326 00:17:27,980 --> 00:17:30,430 And when we run this, if we give a certain file name, 327 00:17:30,430 --> 00:17:34,070 C will look in our current directory for that file name. 328 00:17:34,070 --> 00:17:39,860 And if it finds it, go ahead and give us the address of that file. 329 00:17:39,860 --> 00:17:43,670 Now in this case, we have to specify the mode in which we open this file. 330 00:17:43,670 --> 00:17:45,940 Is it a reading mode or a writing mode? 331 00:17:45,940 --> 00:17:49,720 Reading mode allows us to read data, to take data from the file, 332 00:17:49,720 --> 00:17:54,430 to really copy it, see what's inside of that file, but not add data to it. 333 00:17:54,430 --> 00:17:58,240 If we want to do that, we have to open it in writing mode, which is w. 334 00:17:58,240 --> 00:18:02,500 So we could say hi.txt comma w, if you wanted to open this file 335 00:18:02,500 --> 00:18:04,003 and add more data to it. 336 00:18:04,003 --> 00:18:05,920 But for now, we're going to read data from it, 337 00:18:05,920 --> 00:18:09,520 take data copied into our program now. 338 00:18:09,520 --> 00:18:11,597 Now this visual here looks a bit like this. 339 00:18:11,597 --> 00:18:13,180 We're at the end of running this code. 340 00:18:13,180 --> 00:18:16,150 We have some variable called input, some pointer 341 00:18:16,150 --> 00:18:18,370 that has the address of our file. 342 00:18:18,370 --> 00:18:20,710 Now it's not quite as simple as this. 343 00:18:20,710 --> 00:18:23,443 There is more going on underneath the hood. 
But for now, we can tell a bit of a white lie: input holds, roughly, the address of this file. In actuality, when we run this line of code, fopen doesn't give us the exact address of the file; it gives us a pointer to a FILE structure, which is a bit more complicated and keeps track of the open file for us. But the basic idea is that the file has some location, and we use fopen to get a pointer, roughly, to that file, which we can use in a special way, as we'll see in just a minute. Now, it's all fine and good to open our files, but wouldn't it be good if we could actually take data from them and read it into our program? Read those bytes and put them inside our program so we can use them for our own purposes. Well, to do that, we first need to complicate our picture of this file. Let's take a look. We have a file and some variable, often called input, that points, roughly, to our file. This tells us where our file is and allows us to access whatever is inside of it. Now, this file is, of course, itself composed of individual bytes, and we might have many bytes inside of this .txt file.
And our goal is to take a peek at those bytes, see what's inside of them, and put them inside our own program. To do that, we'll need a place inside our program to store these bytes, and this will often be called a buffer. A buffer is a technical term for some place we store data as we read it from a file. A buffer might look, visually, a bit like this: maybe a sequence of three bytes. And if a buffer is a sequence of bytes back to back to back, or a sequence of locations in memory back to back to back, what kind of structure that we've seen before might be good to use for a buffer? What kind of structure would you use? A buffer might be an array, right? An array is a set of locations in memory back to back to back, so a buffer might similarly be an array of the bytes we read from our file. Now we have a buffer to store this data in and a file to read this data from. It stands to reason that we need some tool to actually take these bytes from our file and put them in our buffer. And thankfully, the C standard library gives us a function called fread.
389 00:20:52,850 --> 00:20:54,780 Fread looks a bit like this. 390 00:20:54,780 --> 00:20:57,890 And you can certainly change the values you pass to fread. 391 00:20:57,890 --> 00:21:03,390 But importantly, it takes four distinct arguments-- four distinct inputs. 392 00:21:03,390 --> 00:21:06,590 The input that makes the most logical sense to start with 393 00:21:06,590 --> 00:21:08,850 is the file we're reading from. 394 00:21:08,850 --> 00:21:13,190 So this is the file pointer that we're going to read data from. 395 00:21:13,190 --> 00:21:16,460 Notice how in our prior visual, we had this input 396 00:21:16,460 --> 00:21:19,070 file pointer, pointing roughly towards our file. 397 00:21:19,070 --> 00:21:22,500 Well, in this case, we want fread to read from that location, 398 00:21:22,500 --> 00:21:26,760 so we'll pass input as the fourth argument to fread. 399 00:21:26,760 --> 00:21:31,530 Next is the size of the blocks to read, in bytes. 400 00:21:31,530 --> 00:21:36,960 Notice how our file here is composed of, in this case, individual bytes. 401 00:21:36,960 --> 00:21:41,530 So we want to read those bytes one byte at a time. 402 00:21:41,530 --> 00:21:44,490 So we'll say that one byte is the size of the block to read. 403 00:21:44,490 --> 00:21:47,970 The blocks in our file are a single byte large. 404 00:21:47,970 --> 00:21:51,330 The next question then is, OK, how many blocks do we want to read? 405 00:21:51,330 --> 00:21:53,730 If we say-- 406 00:21:53,730 --> 00:21:55,708 the size of our blocks in our file is one byte, 407 00:21:55,708 --> 00:21:57,000 well, let's read three of them. 408 00:21:57,000 --> 00:21:59,820 So three single-byte blocks from our file. 409 00:21:59,820 --> 00:22:00,990 And where do we put them? 410 00:22:00,990 --> 00:22:04,300 We put them inside of our buffer, in this case. 411 00:22:04,300 --> 00:22:06,030 So, to recap, the first argument to fread:
412 00:22:06,030 --> 00:22:08,130 The location where we store the data we're reading. 413 00:22:08,130 --> 00:22:10,800 The second argument: how big is each block in our file? 414 00:22:10,800 --> 00:22:13,770 How big is a chunk of data in our file? 415 00:22:13,770 --> 00:22:16,300 Third argument: how many blocks do I want to read? 416 00:22:16,300 --> 00:22:20,200 And finally, the fourth input: where are we reading from? 417 00:22:20,200 --> 00:22:22,420 So let's get a visual for this as we go through. 418 00:22:22,420 --> 00:22:25,810 So first, if we look at this location to read from, 419 00:22:25,810 --> 00:22:28,720 input, we're telling fread, here's what we're reading from: 420 00:22:28,720 --> 00:22:31,450 the start of the hi.txt file, because we opened it 421 00:22:31,450 --> 00:22:34,500 using fopen just a little bit ago. 422 00:22:34,500 --> 00:22:35,780 Now the next question. 423 00:22:35,780 --> 00:22:37,700 What's the size of the blocks to read? 424 00:22:37,700 --> 00:22:40,430 Well, they're a single byte each. 425 00:22:40,430 --> 00:22:41,980 Now how many do we want to read? 426 00:22:41,980 --> 00:22:43,230 We want to read three of them. 427 00:22:43,230 --> 00:22:45,830 So we'll look at three of these here. 428 00:22:45,830 --> 00:22:48,710 And then where do we want to store these? 429 00:22:48,710 --> 00:22:51,470 Where do we want to copy them into? 430 00:22:51,470 --> 00:22:52,880 In this case, it's buffer. 431 00:22:52,880 --> 00:22:57,710 So we'll simply take the data in our file 432 00:22:57,710 --> 00:22:59,960 here and put it inside of buffer. 433 00:22:59,960 --> 00:23:03,560 And now the file's current position, tracked through input, advances 434 00:23:03,560 --> 00:23:06,260 to the next location in our file. 435 00:23:06,260 --> 00:23:09,260 And as such, we can keep reading, and reading, and reading from our file 436 00:23:09,260 --> 00:23:11,660 while that position keeps moving forward.
437 00:23:11,660 --> 00:23:15,680 So it's often good to see this in practice. 438 00:23:15,680 --> 00:23:18,110 And what we'll do for this is actually take a look 439 00:23:18,110 --> 00:23:21,128 at how we can test whether a file is a PDF. 440 00:23:21,128 --> 00:23:24,170 But before we do that, let's take a look at how we can use a buffer here. 441 00:23:24,170 --> 00:23:25,878 If we want to use our buffer as an array, 442 00:23:25,878 --> 00:23:28,910 we could, of course, use this bracket syntax, like buffer bracket zero, 443 00:23:28,910 --> 00:23:32,180 buffer bracket one, and buffer bracket two, giving us access 444 00:23:32,180 --> 00:23:36,090 to those individual bytes, which will be important in just a moment. 445 00:23:36,090 --> 00:23:39,830 So for our practice, we'll create this program called pdf.c. 446 00:23:39,830 --> 00:23:45,230 And the goal of pdf.c is to open a file given to our program and check: 447 00:23:45,230 --> 00:23:48,110 is that file likely a PDF? 448 00:23:48,110 --> 00:23:52,250 And we can know this because every PDF, or at least those 449 00:23:52,250 --> 00:23:56,160 of a certain common type, starts with this four-byte sequence. 450 00:23:56,160 --> 00:23:59,990 And these bytes correspond to these four integers-- 451 00:23:59,990 --> 00:24:03,590 37, 80, 68, and 70. 452 00:24:03,590 --> 00:24:05,690 So this may be news to you. 453 00:24:05,690 --> 00:24:09,140 Whenever you open a PDF, the first four bytes in that file 454 00:24:09,140 --> 00:24:11,630 are often going to represent these integers-- 455 00:24:11,630 --> 00:24:14,630 37, 80, 68, and 70. 456 00:24:14,630 --> 00:24:16,790 This is known as a file signature. 457 00:24:16,790 --> 00:24:19,680 It tells a program opening this file, hey, 458 00:24:19,680 --> 00:24:22,380 this is very likely a PDF. 459 00:24:22,380 --> 00:24:24,860 Now let's write this code together. 460 00:24:24,860 --> 00:24:27,110 Let's go to pdf.c over here.
461 00:24:27,110 --> 00:24:30,020 I'll do cd dot dot to get out of my pointers directory, 462 00:24:30,020 --> 00:24:31,670 and I'll clear my terminal. 463 00:24:31,670 --> 00:24:37,130 I'll then cd into my new folder called PDF. 464 00:24:37,130 --> 00:24:38,090 I'll type ls. 465 00:24:38,090 --> 00:24:40,010 And notice how I have some test files here. 466 00:24:40,010 --> 00:24:42,810 I have test JPEG and test PDF. 467 00:24:42,810 --> 00:24:47,240 I don't yet, though, have pdf.c. 468 00:24:47,240 --> 00:24:51,950 I'll use these files as tests for my code, but first I need to create pdf.c. 469 00:24:51,950 --> 00:24:56,940 So I'll do that with code pdf.c. 470 00:24:56,940 --> 00:24:59,790 Now I have this new file, pdf.c. 471 00:24:59,790 --> 00:25:03,990 And what's the kind of boilerplate syntax 472 00:25:03,990 --> 00:25:06,540 I should use to start off my PDF program? 473 00:25:06,540 --> 00:25:09,840 I might want to have an int main void 474 00:25:09,840 --> 00:25:12,870 function, some place to run the main part of my code. 475 00:25:12,870 --> 00:25:17,640 I might also want to include the CS50 library. 476 00:25:17,640 --> 00:25:21,390 I might also want to include the standard I/O library 477 00:25:21,390 --> 00:25:26,450 to print something out to the user, telling them if it's a PDF or if it's not. 478 00:25:26,450 --> 00:25:31,225 Now to get started, I first need to accept some command-line arguments 479 00:25:31,225 --> 00:25:31,850 for my program. 480 00:25:31,850 --> 00:25:36,300 I need the user to type in the name of the file they want to open. 481 00:25:36,300 --> 00:25:40,520 And so what I'll do is check to see if the user has actually 482 00:25:40,520 --> 00:25:42,510 given me a file name. 483 00:25:42,510 --> 00:25:48,920 Now instead of void here, I want to use int argc and string argv[].
484 00:25:48,920 --> 00:25:51,890 This allows my program to take command-line arguments. 485 00:25:51,890 --> 00:25:54,350 And remember that the number of arguments I've been given 486 00:25:54,350 --> 00:25:58,580 is stored in argc, and the actual content of the arguments 487 00:25:58,580 --> 00:26:00,530 is stored in this array, argv. 488 00:26:00,530 --> 00:26:08,990 Now I should check: is argc equal to the number of command 489 00:26:08,990 --> 00:26:10,680 line arguments I expect? 490 00:26:10,680 --> 00:26:14,240 Now, if I expect one command-line argument, I should check: 491 00:26:14,240 --> 00:26:18,190 is argc not equal to two? 492 00:26:18,190 --> 00:26:23,120 And if it's not two, I'll printf that this is improper usage. 493 00:26:23,120 --> 00:26:24,940 This is not the way to run my program. 494 00:26:24,940 --> 00:26:30,090 And I'll return 1, signaling that this is not the right way to do it. 495 00:26:30,090 --> 00:26:35,390 Now why is argc two here, not one? 496 00:26:35,390 --> 00:26:42,270 We really expect one argument, so why should I check if argc is two? 497 00:26:42,270 --> 00:26:45,930 Well, keep in mind that when I eventually run this program as, in this case, 498 00:26:45,930 --> 00:26:54,090 dot slash pdf, maybe test JPEG dot JPEG, the first command-line 499 00:26:54,090 --> 00:26:57,990 argument, technically, is going to be dot slash PDF, 500 00:26:57,990 --> 00:27:00,190 and then the second one is going to be the file name. 501 00:27:00,190 --> 00:27:03,387 So if I really want one command-line argument to my program here, 502 00:27:03,387 --> 00:27:05,220 I should keep in mind that the program name still counts 503 00:27:05,220 --> 00:27:07,290 as an argument given at the terminal. 504 00:27:07,290 --> 00:27:12,150 So in the end, I should check if argc is not equal to two. 505 00:27:12,150 --> 00:27:13,480 Now let me scroll down here. 506 00:27:13,480 --> 00:27:18,550 And once I know that I have a file name, let me try to open this file.
507 00:27:18,550 --> 00:27:21,720 So let's add a comment here: open file. 508 00:27:21,720 --> 00:27:22,838 And how can we do that? 509 00:27:22,838 --> 00:27:25,380 I encourage you to pause the video and maybe try it on your own. 510 00:27:25,380 --> 00:27:29,505 How could we open whatever file is given to us by the user? 511 00:27:29,505 --> 00:27:33,160 512 00:27:33,160 --> 00:27:35,220 So let's first get the file name, right? 513 00:27:35,220 --> 00:27:40,420 We could perhaps say this: the file name, string file name, 514 00:27:40,420 --> 00:27:44,020 is located at argv 1. 515 00:27:44,020 --> 00:27:45,280 Now why argv 1? 516 00:27:45,280 --> 00:27:48,510 As we saw below, when we eventually run our program with dot 517 00:27:48,510 --> 00:27:52,780 slash PDF test JPEG dot JPEG or test PDF dot PDF, 518 00:27:52,780 --> 00:27:56,860 well, dot slash PDF will be what's stored in argv 0, 519 00:27:56,860 --> 00:28:00,580 and the file name will be what's stored in argv 1. 520 00:28:00,580 --> 00:28:04,500 So the file name is located at argv 1. 521 00:28:04,500 --> 00:28:11,130 And we need the file name to pass into what function to open our file? 522 00:28:11,130 --> 00:28:11,820 Fopen. 523 00:28:11,820 --> 00:28:14,940 So we saw that we could use fopen like this-- 524 00:28:14,940 --> 00:28:17,770 fopen and then the file name. 525 00:28:17,770 --> 00:28:19,890 So we'll pass in our file name, simply using 526 00:28:19,890 --> 00:28:21,960 this variable we just created. 527 00:28:21,960 --> 00:28:24,210 And then the mode we want to open the file in. 528 00:28:24,210 --> 00:28:26,850 Because we're simply reading this file, seeing 529 00:28:26,850 --> 00:28:29,040 what information is inside of it, not writing to it, 530 00:28:29,040 --> 00:28:32,910 we can open this file in read mode, quote r. 531 00:28:32,910 --> 00:28:38,190 But we need some place to store what we'll get back from fopen-- 532 00:28:38,190 --> 00:28:40,282 the file pointer we'll get back from fopen.
533 00:28:40,282 --> 00:28:42,240 And so for that, let's go ahead and create one. 534 00:28:42,240 --> 00:28:45,580 We can say, give me a new file pointer, a FILE star, 535 00:28:45,580 --> 00:28:49,640 this one perhaps called PDF. 536 00:28:49,640 --> 00:28:55,487 And now we've created some file pointer called PDF using fopen. 537 00:28:55,487 --> 00:28:58,070 And actually, because we don't know if it's going to be a PDF, 538 00:28:58,070 --> 00:29:01,070 let's just call it file in this case. 539 00:29:01,070 --> 00:29:07,470 So we have a file pointer called file, and fopen will take a file name, 540 00:29:07,470 --> 00:29:11,190 find it in our current directory, open it in read mode, and give it back to us 541 00:29:11,190 --> 00:29:14,920 in the form of this file pointer. 542 00:29:14,920 --> 00:29:16,280 OK. 543 00:29:16,280 --> 00:29:17,760 So we have opened the file. 544 00:29:17,760 --> 00:29:21,660 But as we do, we should think about some edge cases. 545 00:29:21,660 --> 00:29:27,370 What if, for example, we type in a file name we actually don't have? 546 00:29:27,370 --> 00:29:32,470 Well, in that case, fopen will return NULL, a special value that says, 547 00:29:32,470 --> 00:29:34,130 I couldn't open that file for you. 548 00:29:34,130 --> 00:29:37,810 So it's important here not to blindly assume the file opened, but to check 549 00:29:37,810 --> 00:29:38,470 first: 550 00:29:38,470 --> 00:29:43,470 is file equal to NULL? 551 00:29:43,470 --> 00:29:46,260 And if it is, let's go ahead and return 1. 552 00:29:46,260 --> 00:29:48,340 Before that, we print an error message to the user. 553 00:29:48,340 --> 00:29:54,330 We could say, in this case, no such file found. 554 00:29:54,330 --> 00:29:57,210 And for good measure, why don't we add some backslash 555 00:29:57,210 --> 00:30:02,680 n's to tidy up our output here. 556 00:30:02,680 --> 00:30:06,195 All right, so this is going to check if the file exists.
557 00:30:06,195 --> 00:30:09,340 558 00:30:09,340 --> 00:30:11,650 Now we've opened the file, ideally. 559 00:30:11,650 --> 00:30:13,000 We've checked if it exists. 560 00:30:13,000 --> 00:30:16,640 And presumably, if we get past this line of code, the file does exist. 561 00:30:16,640 --> 00:30:18,210 So what's the next step? 562 00:30:18,210 --> 00:30:23,560 We want to read from this file, and ideally, have some place to read into. 563 00:30:23,560 --> 00:30:25,840 So we saw before that it's common for us to create 564 00:30:25,840 --> 00:30:29,410 what we call a buffer-- some place to read data from our file 565 00:30:29,410 --> 00:30:32,950 and put it inside our program in smaller chunks. 566 00:30:32,950 --> 00:30:37,060 It wouldn't be wise, for example, for us to take the entire file 567 00:30:37,060 --> 00:30:38,380 and put it all 568 00:30:38,380 --> 00:30:41,150 in our program at once. 569 00:30:41,150 --> 00:30:44,350 Instead, we want to take single chunks of that file and deal 570 00:30:44,350 --> 00:30:46,750 with those chunks individually, one 571 00:30:46,750 --> 00:30:49,210 at a time, reading in smaller pieces of the file 572 00:30:49,210 --> 00:30:52,300 but going through the entire file over time. 573 00:30:52,300 --> 00:30:55,780 Now to create this buffer-- remember, a buffer is simply an array. 574 00:30:55,780 --> 00:30:59,590 But in this case, for a PDF, it's an array of individual bytes. 575 00:30:59,590 --> 00:31:03,860 We want the buffer to store the same kind of data that's inside of our file. 576 00:31:03,860 --> 00:31:10,270 And in this case, we'll store each of a PDF's bytes as a special type called a uint8_t. 577 00:31:10,270 --> 00:31:12,130 This is a type we haven't used before. 578 00:31:12,130 --> 00:31:15,910 And it might look scary at first, until we break it down into smaller pieces. 579 00:31:15,910 --> 00:31:18,190 So first, notice some familiar syntax.
580 00:31:18,190 --> 00:31:20,020 Here we have int, right? 581 00:31:20,020 --> 00:31:23,110 So presumably, this is some kind of integer. 582 00:31:23,110 --> 00:31:24,340 It's a special kind, though. 583 00:31:24,340 --> 00:31:25,780 It's a uint. 584 00:31:25,780 --> 00:31:29,920 U stands for unsigned, meaning it's never negative. 585 00:31:29,920 --> 00:31:34,360 Remember that we talked about signed and unsigned integers, where signed means 586 00:31:34,360 --> 00:31:36,260 it can be positive or negative. 587 00:31:36,260 --> 00:31:38,680 We can have a minus sign in front or not. 588 00:31:38,680 --> 00:31:43,100 But an unsigned integer, in this case, can only be zero or positive. 589 00:31:43,100 --> 00:31:45,020 Now we have this 8 here. 590 00:31:45,020 --> 00:31:49,190 It's a uint, unsigned integer, then 8, then underscore t. 591 00:31:49,190 --> 00:31:53,810 Well, the 8 here denotes this is only 8 bits-- a single byte for an integer. 592 00:31:53,810 --> 00:31:57,230 It's not a 2-byte integer or a 4-byte integer. 593 00:31:57,230 --> 00:31:58,760 It's only a single byte. 594 00:31:58,760 --> 00:32:02,180 So it can represent as many values as a single unsigned byte can hold, 595 00:32:02,180 --> 00:32:06,110 which is 256 of them, from 0 through 255. 596 00:32:06,110 --> 00:32:10,370 Now this underscore t here means that all that together, this unsigned 597 00:32:10,370 --> 00:32:14,060 integer of 8 bits, is going to be its own type. 598 00:32:14,060 --> 00:32:17,600 So uint8_t is an unsigned integer 599 00:32:17,600 --> 00:32:20,780 of 8 bits that is its very own type. 600 00:32:20,780 --> 00:32:24,560 Now this is a special kind-- it's used for much more than PDFs, 601 00:32:24,560 --> 00:32:26,100 but it's a new one for us today. 602 00:32:26,100 --> 00:32:28,190 So we'll get some practice with this.
603 00:32:28,190 --> 00:32:33,050 To use a uint8_t, we need to include a different header from the C 604 00:32:33,050 --> 00:32:37,010 standard library, this one called standard int-- 605 00:32:37,010 --> 00:32:41,180 S-T-D-I-N-T dot H. So we'll include this header file, 606 00:32:41,180 --> 00:32:46,910 stdint.h, and now we can use it inside our pdf.c down below. 607 00:32:46,910 --> 00:32:48,170 Now we want a buffer here. 608 00:32:48,170 --> 00:32:51,160 So we want a buffer of uint8_t's. 609 00:32:51,160 --> 00:32:57,510 And how many integers are we going to store in this buffer, do you think? 610 00:32:57,510 --> 00:33:05,270 We need four-- the PDF signature is only four bytes, remember. 611 00:33:05,270 --> 00:33:07,580 So I only need four spaces. 612 00:33:07,580 --> 00:33:12,290 So I'll create enough space in an array for four uint8_t's that we'll 613 00:33:12,290 --> 00:33:14,300 read from our PDF. 614 00:33:14,300 --> 00:33:16,400 And currently it holds nothing meaningful, but in just a minute, 615 00:33:16,400 --> 00:33:20,560 we'll actually go ahead and try to read bytes into our buffer. 616 00:33:20,560 --> 00:33:25,663 So now that we have some place to read in the data from our file, 617 00:33:25,663 --> 00:33:26,830 let's go ahead and try that. 618 00:33:26,830 --> 00:33:29,020 We could say fread. 619 00:33:29,020 --> 00:33:33,620 And remember that the first argument to fread is the place we're reading into. 620 00:33:33,620 --> 00:33:37,330 So in this case, we'll read into our buffer. 621 00:33:37,330 --> 00:33:41,590 Now the next two arguments are the size of blocks in the file, 622 00:33:41,590 --> 00:33:44,050 and how many of those blocks we want to read. 623 00:33:44,050 --> 00:33:48,880 Now we could consider a PDF as being composed of individual bytes. 624 00:33:48,880 --> 00:33:53,110 So we could say the size of each block is one byte. 625 00:33:53,110 --> 00:33:57,110 But how many of these bytes do we want to read?
626 00:33:57,110 --> 00:33:58,110 Only the first four. 627 00:33:58,110 --> 00:34:01,790 So we could say, take the first four single-byte blocks from this file. 628 00:34:01,790 --> 00:34:03,110 And now where from? 629 00:34:03,110 --> 00:34:04,730 Where is fread reading from? 630 00:34:04,730 --> 00:34:10,209 Well, in this case, it's going to be from our file pointer from above. 631 00:34:10,209 --> 00:34:13,334 Notice the correspondence between the file we've opened before and the file 632 00:34:13,334 --> 00:34:15,960 we're reading from down here. 633 00:34:15,960 --> 00:34:18,989 Now fread's result doesn't need to be stored in some variable. 634 00:34:18,989 --> 00:34:24,659 We don't say, file pointer updated file equals fread. 635 00:34:24,659 --> 00:34:25,889 Fread works just on its own. 636 00:34:25,889 --> 00:34:26,730 You can just call it; its return value, the number of blocks read, can be ignored here. 637 00:34:26,730 --> 00:34:29,909 It'll do the magic of reading in your bytes from your file, 638 00:34:29,909 --> 00:34:33,420 putting them inside the buffer, and advancing the file's position so the next read starts 639 00:34:33,420 --> 00:34:37,110 after these four bytes in your file. 640 00:34:37,110 --> 00:34:39,199 Now once we've done that, we could 641 00:34:39,199 --> 00:34:42,710 take a look at what's inside of buffer, just for the sake of looking. 642 00:34:42,710 --> 00:34:45,130 Now to iterate through our buffer, what could we do? 643 00:34:45,130 --> 00:34:46,213 We could write a for loop. 644 00:34:46,213 --> 00:34:49,040 We could say for int i equals 0; 645 00:34:49,040 --> 00:34:51,920 i is less than four; i plus plus. 646 00:34:51,920 --> 00:34:53,060 Now we could look-- 647 00:34:53,060 --> 00:34:57,920 let's print out whatever integer is inside of buffer-- 648 00:34:57,920 --> 00:35:00,410 let's put a space after it-- 649 00:35:00,410 --> 00:35:02,670 buffer bracket i.
650 00:35:02,670 --> 00:35:06,450 So now, ideally, we've created some space in our program 651 00:35:06,450 --> 00:35:08,580 to store the first four bytes from our file. 652 00:35:08,580 --> 00:35:12,300 We've read them from our file, from our PDF, perhaps. 653 00:35:12,300 --> 00:35:15,690 And now we're going to print them out, just as a check here to see 654 00:35:15,690 --> 00:35:18,290 what's inside of our buffer. 655 00:35:18,290 --> 00:35:23,350 So I'll go down here, and I'll run make PDF to compile it and see 656 00:35:23,350 --> 00:35:24,475 whether we get any errors. 657 00:35:24,475 --> 00:35:28,715 658 00:35:28,715 --> 00:35:30,090 And we'll wait for it to compile. 659 00:35:30,090 --> 00:35:31,423 And it looks like it's all good. 660 00:35:31,423 --> 00:35:34,080 So now I'll type dot slash PDF. 661 00:35:34,080 --> 00:35:36,420 And now we'll give the name of the file we want to open. 662 00:35:36,420 --> 00:35:40,350 In this case we'll do test underscore PDF dot PDF. 663 00:35:40,350 --> 00:35:41,790 And let's see. 664 00:35:41,790 --> 00:35:46,060 OK, it looks like inside this file, the first four bytes 665 00:35:46,060 --> 00:35:50,530 are 37, 80, 68, and 70. 666 00:35:50,530 --> 00:35:52,570 And that does correspond to what we expect-- 667 00:35:52,570 --> 00:35:56,770 37, 80, 68, and 70. 668 00:35:56,770 --> 00:36:00,360 Let's try, in this case, our JPEG. 669 00:36:00,360 --> 00:36:03,270 But first, notice how I print these all on the same line here. 670 00:36:03,270 --> 00:36:04,000 Why do I do this? 671 00:36:04,000 --> 00:36:07,740 I want all four on one line, so I'll add a single backslash n at the end to print a new line, 672 00:36:07,740 --> 00:36:11,500 only after I print out these first four bytes. 673 00:36:11,500 --> 00:36:13,680 Now I'll make PDF again. 674 00:36:13,680 --> 00:36:17,910 I'll do dot slash PDF test JPEG dot JPEG. 675 00:36:17,910 --> 00:36:23,280 Open this, and now we see the first four integers inside of this JPEG.
676 00:36:23,280 --> 00:36:25,620 And let's just be sure-- test a file that doesn't exist. 677 00:36:25,620 --> 00:36:28,950 Let's do dot slash PDF hello.c. 678 00:36:28,950 --> 00:36:31,810 And no such file was found. 679 00:36:31,810 --> 00:36:32,560 Great. 680 00:36:32,560 --> 00:36:34,968 So we seem to be able to read in data from our file 681 00:36:34,968 --> 00:36:36,010 and put it inside our buffer. 682 00:36:36,010 --> 00:36:42,700 But now I need to check: is this buffer the same as 37, 80, 68, and 70? 683 00:36:42,700 --> 00:36:45,833 Now I'll leave this up to you to work on for a minute here. 684 00:36:45,833 --> 00:36:48,250 I encourage you to do that on your own before coming back. 685 00:36:48,250 --> 00:36:51,760 So pause the video, and we'll work on this together in just a moment. 686 00:36:51,760 --> 00:36:56,240 687 00:36:56,240 --> 00:36:56,860 OK. 688 00:36:56,860 --> 00:36:59,050 So presumably you've attempted this on your own, 689 00:36:59,050 --> 00:37:00,400 but if you haven't, that's OK. 690 00:37:00,400 --> 00:37:01,450 Let's go ahead and check. 691 00:37:01,450 --> 00:37:04,660 How could we take the data inside our buffer 692 00:37:04,660 --> 00:37:07,480 and check it against the signature we're looking for? 693 00:37:07,480 --> 00:37:11,050 Well, it would be handy if we created our very own signature-- 694 00:37:11,050 --> 00:37:13,840 our own array-- 695 00:37:13,840 --> 00:37:17,060 that actually has the data we expect inside of it. 696 00:37:17,060 --> 00:37:18,230 So we could say this. 697 00:37:18,230 --> 00:37:21,070 Let's make a new uint8_t array, this one 698 00:37:21,070 --> 00:37:24,800 called signature, that stores four values. 699 00:37:24,800 --> 00:37:27,760 But in this case, we'll give it its values directly. 700 00:37:27,760 --> 00:37:31,500 We'll say first we're looking for 37. 701 00:37:31,500 --> 00:37:34,520 Then we're looking for 80. 702 00:37:34,520 --> 00:37:37,040 Then we're looking for 68.
703 00:37:37,040 --> 00:37:40,550 And then we're looking for, in this case, 70. 704 00:37:40,550 --> 00:37:43,210 So this is the file signature we're looking for. 705 00:37:43,210 --> 00:37:44,980 And as a bit of trivia, you don't actually 706 00:37:44,980 --> 00:37:48,490 need to write the length of the array inside the brackets 707 00:37:48,490 --> 00:37:54,880 if you give it its values right here, in this case four integers. 708 00:37:54,880 --> 00:37:57,475 So now we have the buffer that we're reading from our file 709 00:37:57,475 --> 00:38:00,100 and the signature that has the data we're actually looking for. 710 00:38:00,100 --> 00:38:04,240 Well, what if we compared buffer bracket i with signature bracket i 711 00:38:04,240 --> 00:38:05,973 to see if they're the same? 712 00:38:05,973 --> 00:38:06,640 So I'll do this. 713 00:38:06,640 --> 00:38:10,330 I'll say, as I loop through my buffer, why don't I ask: 714 00:38:10,330 --> 00:38:16,450 is buffer bracket i the same as signature bracket i? 715 00:38:16,450 --> 00:38:19,190 And if that's the case, well, what do I want to do? 716 00:38:19,190 --> 00:38:21,580 Well, I actually can't draw a conclusion 717 00:38:21,580 --> 00:38:27,530 if I only know that one of buffer's integers is the same as signature's. 718 00:38:27,530 --> 00:38:31,850 Instead, I should probably ask, what if it's not the same? 719 00:38:31,850 --> 00:38:36,140 What if, as I loop through, buffer and signature ever differ? 720 00:38:36,140 --> 00:38:40,080 Well, if I know that one is not the same, I could print to the user 721 00:38:40,080 --> 00:38:41,790 that this is likely not a PDF. 722 00:38:41,790 --> 00:38:44,120 So I'll say, likely not a PDF. 723 00:38:44,120 --> 00:38:46,520 And I'll return 0 to say everything went OK; 724 00:38:46,520 --> 00:38:49,330 this just isn't a PDF.
725 00:38:49,330 --> 00:38:52,470 Now then, if I get all the way through this loop, 726 00:38:52,470 --> 00:38:55,690 then I know that I likely have a PDF. 727 00:38:55,690 --> 00:38:58,545 I'll say printf, likely a PDF. 728 00:38:58,545 --> 00:39:01,070 729 00:39:01,070 --> 00:39:03,290 And I'll add backslash n up here, too. 730 00:39:03,290 --> 00:39:05,360 And then I'll say return 0. 731 00:39:05,360 --> 00:39:09,030 I don't need this separate new line anymore. 732 00:39:09,030 --> 00:39:11,250 So notice the logic here. 733 00:39:11,250 --> 00:39:16,970 First we're going to read some data into our buffer. 734 00:39:16,970 --> 00:39:22,200 Then we're going to ask the question: does the buffer match the signature? 735 00:39:22,200 --> 00:39:25,680 To find out, 736 00:39:25,680 --> 00:39:27,810 we'll loop through buffer and check: 737 00:39:27,810 --> 00:39:31,950 is every integer in buffer the same as in signature? 738 00:39:31,950 --> 00:39:34,380 If it is, if we never trigger this condition, 739 00:39:34,380 --> 00:39:36,960 we'll get down here and print, likely a PDF. 740 00:39:36,960 --> 00:39:41,130 If we ever find it's not the same, though, we'll print, likely not a PDF, 741 00:39:41,130 --> 00:39:47,580 and return 0, thus ending our program before we get down to likely a PDF. 742 00:39:47,580 --> 00:39:49,440 OK, so let's run this program again. I'll 743 00:39:49,440 --> 00:39:54,830 run make PDF, then dot slash PDF. 744 00:39:54,830 --> 00:39:57,550 I'll do test JPEG dot JPEG. 745 00:39:57,550 --> 00:40:00,280 And we see, likely not a PDF, which makes sense. 746 00:40:00,280 --> 00:40:06,310 Now let's do dot slash PDF, test PDF dot PDF, and we see, likely a PDF. 747 00:40:06,310 --> 00:40:11,360 So this program seems to be working exactly as we intend. 748 00:40:11,360 --> 00:40:11,870 All right. 749 00:40:11,870 --> 00:40:14,060 So this is some of the magic we can do, now that we 750 00:40:14,060 --> 00:40:17,650 can open files and read data from them.
751 00:40:17,650 --> 00:40:19,400 What we'll take a look at in just a moment 752 00:40:19,400 --> 00:40:22,490 is what we can do if we want to keep asking 753 00:40:22,490 --> 00:40:24,508 for more and more memory as our program runs. 754 00:40:24,508 --> 00:40:26,300 Here we're not using much memory; we're 755 00:40:26,300 --> 00:40:29,943 only using four bytes to read in our data 756 00:40:29,943 --> 00:40:31,610 from our file and put it inside our buffer. 757 00:40:31,610 --> 00:40:34,100 But there's more we can do with memory. 758 00:40:34,100 --> 00:40:38,780 So beyond opening files, let's take a look at dynamic memory. 759 00:40:38,780 --> 00:40:43,310 And dynamic memory is often seen in the context of using malloc in C. 760 00:40:43,310 --> 00:40:45,800 So malloc, if you recall, is used for asking 761 00:40:45,800 --> 00:40:48,302 for more memory while our program is running. 762 00:40:48,302 --> 00:40:51,510 Importantly, it gets that memory from a special place that we'll see in just a minute. 763 00:40:51,510 --> 00:40:56,040 But the basic idea is asking for memory for our program on the fly. 764 00:40:56,040 --> 00:41:00,860 Now for example, let's say we wanted to create some integer called hours. 765 00:41:00,860 --> 00:41:02,300 Well, we can use malloc here. 766 00:41:02,300 --> 00:41:04,520 We could say int star hours 767 00:41:04,520 --> 00:41:09,050 gets the value of running malloc given the size of an integer. 768 00:41:09,050 --> 00:41:11,610 And there's some stuff to break down here, so let's do it. 769 00:41:11,610 --> 00:41:15,320 Notice how we still have, in this case, the name of our variable. 770 00:41:15,320 --> 00:41:16,520 It's hours. 771 00:41:16,520 --> 00:41:18,320 But it's not an integer right now. 772 00:41:18,320 --> 00:41:20,840 It's an integer pointer. 773 00:41:20,840 --> 00:41:25,730 So malloc, when it succeeds, gives back a pointer to whatever space 774 00:41:25,730 --> 00:41:28,440 it set aside in memory for us, or NULL if it couldn't.
775 00:41:28,440 --> 00:41:32,990 So as we run malloc, though, it needs to know: what size of space 776 00:41:32,990 --> 00:41:33,890 should I give you? 777 00:41:33,890 --> 00:41:37,500 And we can use, in this case, the sizeof operator to say, 778 00:41:37,500 --> 00:41:39,680 give me the size of whatever type I have. 779 00:41:39,680 --> 00:41:41,660 sizeof(int), sizeof(char). 780 00:41:41,660 --> 00:41:46,360 sizeof always gives us that size in bytes, which is exactly what malloc expects. 781 00:41:46,360 --> 00:41:51,240 So we're asking malloc here simply for space for one integer. 782 00:41:51,240 --> 00:41:53,360 Now we get back a pointer to the integer, 783 00:41:53,360 --> 00:41:55,760 and to then store a value inside of it, 784 00:41:55,760 --> 00:42:00,050 we need to use that star syntax we saw a little bit earlier. 785 00:42:00,050 --> 00:42:03,350 What if we wanted not just a single space for an integer, 786 00:42:03,350 --> 00:42:05,240 but actually maybe an array of integers? 787 00:42:05,240 --> 00:42:06,830 We could ask malloc for that, too. 788 00:42:06,830 --> 00:42:10,340 We could say, instead of sizeof(int), give me sizeof(int) times 789 00:42:10,340 --> 00:42:14,190 five, in this case, five spaces for integers. 790 00:42:14,190 --> 00:42:17,040 Now what if we wanted to actually add in some data? 791 00:42:17,040 --> 00:42:22,020 Well, we could say, as we saw before, star hours gets 7. 792 00:42:22,020 --> 00:42:26,230 And if we wanted to add some data to the right of 7, we could do this. 793 00:42:26,230 --> 00:42:32,670 We could say, maybe, star of hours plus 1, in parentheses, gets 9, or star of hours plus 2 794 00:42:32,670 --> 00:42:34,480 gets something else. 795 00:42:34,480 --> 00:42:37,800 Notice how we're using some pointer arithmetic here, where we're saying 796 00:42:37,800 --> 00:42:42,000 hours plus 1 means go to the next integer's location in memory 797 00:42:42,000 --> 00:42:43,980 after whatever hours is pointing to.
798 00:42:43,980 --> 00:42:49,460 And of course, hours points to that first location in our broader array here. 799 00:42:49,460 --> 00:42:53,350 So we could also, of course, use the same bracket notation we saw before-- 800 00:42:53,350 --> 00:42:57,880 like hours bracket 2 gets 8, or hours bracket 3, in this case, 801 00:42:57,880 --> 00:42:59,792 gets 7, and so on. 802 00:42:59,792 --> 00:43:02,500 So it kind of raises the question, what's the point of using malloc 803 00:43:02,500 --> 00:43:07,430 if we can use this very same syntax, and now we have to deal with pointers? 804 00:43:07,430 --> 00:43:10,810 As we saw briefly in lecture, malloc gives us memory 805 00:43:10,810 --> 00:43:12,920 from a special place called the Heap. 806 00:43:12,920 --> 00:43:15,550 What we've been using up until now in CS50 807 00:43:15,550 --> 00:43:17,680 has been the Stack, where the Stack is what 808 00:43:17,680 --> 00:43:19,660 you use when you simply call a function. 809 00:43:19,660 --> 00:43:22,120 You create some variable inside your function, 810 00:43:22,120 --> 00:43:24,670 and that variable gets its memory from the Stack. 811 00:43:24,670 --> 00:43:28,480 When you use malloc, though, you get memory from the Heap. 812 00:43:28,480 --> 00:43:30,380 And why would you get memory from the Heap? 813 00:43:30,380 --> 00:43:33,520 Well, if you want a much larger data structure, 814 00:43:33,520 --> 00:43:37,240 you might often put it on the Heap because the Heap is more persistent 815 00:43:37,240 --> 00:43:38,163 and it's quite a bit larger. 816 00:43:38,163 --> 00:43:40,330 You don't really want to fill up the Stack too much. 817 00:43:40,330 --> 00:43:43,840 You often want to use the Heap for these really large kinds of data, like files 818 00:43:43,840 --> 00:43:45,280 that you might work with.
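To illustrate why a large array might go on the Heap, this sketch allocates `n` ints with malloc and fills them using bracket notation. `sum_on_heap` is a hypothetical name. Exact stack limits vary by system, but a local array of ten million ints (about 40 MB) would likely overflow a typical default stack, whereas the Heap can usually provide it.

```c
#include <stdlib.h>

// Hypothetical helper: sum 0 + 1 + ... + (n - 1) using an array that
// lives on the heap instead of the stack. Expects n > 0.
// Returns -1 if the allocation fails.
long long sum_on_heap(size_t n)
{
    int *values = malloc(sizeof(int) * n);
    if (values == NULL)
    {
        return -1;
    }

    long long total = 0;
    for (size_t i = 0; i < n; i++)
    {
        values[i] = (int) i;  // bracket notation: same as *(values + i)
        total += values[i];
    }

    free(values);  // give the heap memory back once we're done with it
    return total;
}
```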
819 00:43:45,280 --> 00:43:47,410 You might also use the Heap when you want 820 00:43:47,410 --> 00:43:51,040 to have a data structure that many functions can operate on-- for example, 821 00:43:51,040 --> 00:43:52,810 a linked list or a hash table. 822 00:43:52,810 --> 00:43:56,920 If you want a single structure that you can write many functions to operate on, 823 00:43:56,920 --> 00:43:59,260 you'll want to use the Heap for that, because remember, 824 00:43:59,260 --> 00:44:01,990 a variable on the Stack only lasts for its single function call, 825 00:44:01,990 --> 00:44:05,150 but memory on the Heap can be shared across functions overall. 826 00:44:05,150 --> 00:44:08,080 This is good because, as you'll see in the problem set this week 827 00:44:08,080 --> 00:44:12,670 and in coming weeks, you'll be able to actually share data across functions 828 00:44:12,670 --> 00:44:15,580 when you use malloc and the Heap. 829 00:44:15,580 --> 00:44:19,420 Now if you use malloc, though, there are still things to be wary of. 830 00:44:19,420 --> 00:44:22,090 And in fact, when we use malloc, there are these brand 831 00:44:22,090 --> 00:44:24,680 new errors that we're now capable of making. 832 00:44:24,680 --> 00:44:27,280 And so let's take a look at these kinds of errors now. 833 00:44:27,280 --> 00:44:29,740 Often when we use malloc, we have to make 834 00:44:29,740 --> 00:44:31,750 sure we actually free every block of memory we malloc'd-- 835 00:44:31,750 --> 00:44:34,875 and often, you actually won't remember to do that when you're first beginning. 836 00:44:34,875 --> 00:44:35,650 That's OK. 837 00:44:35,650 --> 00:44:39,370 If you use fopen, you want to make sure you always fclose the file that you've 838 00:44:39,370 --> 00:44:40,750 opened. 839 00:44:40,750 --> 00:44:43,690 And of course, we want to be wary of using more memory 840 00:44:43,690 --> 00:44:45,310 than has actually been allocated. 841 00:44:45,310 --> 00:44:46,675 So let's take a look at this.
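As a rough illustration of one heap structure shared across several functions, here is a tiny linked list. The `node` type and the functions `prepend`, `length`, and `free_list` are all hypothetical, not taken from the problem set. Each node is allocated inside `prepend`, yet it stays valid after that call returns because heap memory lasts until it is freed. That same lifetime is why every node has to be freed explicitly at the end.

```c
#include <stdlib.h>

// Hypothetical singly linked list node.
typedef struct node
{
    int value;
    struct node *next;
} node;

// Allocate a node on the heap and put it at the front of the list.
// The node outlives this call; a local variable would not.
node *prepend(node *list, int value)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return list;  // leave the list unchanged if malloc fails
    }
    n->value = value;
    n->next = list;
    return n;
}

// A different function can walk the same heap-allocated list...
int length(const node *list)
{
    int count = 0;
    for (const node *p = list; p != NULL; p = p->next)
    {
        count++;
    }
    return count;
}

// ...and a third frees every node, so nothing is left behind.
void free_list(node *list)
{
    while (list != NULL)
    {
        node *next = list->next;  // save the next pointer before freeing
        free(list);
        list = next;
    }
}
```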
842 00:44:46,675 --> 00:44:49,630 But before we actually jump into a new exercise, 843 00:44:49,630 --> 00:44:53,290 let's go back to our previous PDF exercise and see if there's something 844 00:44:53,290 --> 00:44:56,530 we could have done a little bit better, keeping in mind these common memory 845 00:44:56,530 --> 00:44:57,650 errors. 846 00:44:57,650 --> 00:44:58,870 Let's go back over here. 847 00:44:58,870 --> 00:45:05,810 And reading through this file, I might not see much to improve off the bat, 848 00:45:05,810 --> 00:45:07,710 especially if I'm a beginner here. 849 00:45:07,710 --> 00:45:10,310 But what I can do is run a special program 850 00:45:10,310 --> 00:45:14,210 that tells me what might be going wrong, memory-wise, in this program. 851 00:45:14,210 --> 00:45:17,190 It seemed to work, but we could still do better. 852 00:45:17,190 --> 00:45:20,570 So I'll do this-- valgrind dot slash PDF. 853 00:45:20,570 --> 00:45:25,220 And this time I'll type test underscore PDF dot PDF. 854 00:45:25,220 --> 00:45:31,400 So if I run my program as I usually do, but type valgrind in front, 855 00:45:31,400 --> 00:45:35,510 I'll then be able to see what memory errors, if any, I encounter. 856 00:45:35,510 --> 00:45:37,500 So I hit Enter here. 857 00:45:37,500 --> 00:45:38,370 And now I see-- 858 00:45:38,370 --> 00:45:38,870 whoops. 859 00:45:38,870 --> 00:45:40,640 Let me cd into PDF first. 860 00:45:40,640 --> 00:45:46,590 valgrind dot slash PDF, and then test underscore PDF dot PDF. 861 00:45:46,590 --> 00:45:47,730 It'll run. 862 00:45:47,730 --> 00:45:49,950 It gave me some slightly cryptic output. 863 00:45:49,950 --> 00:45:51,300 But notice how I see this-- 864 00:45:51,300 --> 00:45:52,770 leak summary. 865 00:45:52,770 --> 00:45:56,130 If I see a leak summary here and I see that there are still 866 00:45:56,130 --> 00:45:57,960 some bytes in that leak summary, well, I 867 00:45:57,960 --> 00:46:02,760 haven't necessarily been keeping memory as tidy as I should be.
868 00:46:02,760 --> 00:46:06,960 In this case, it might mean that perhaps I left some memory on the table 869 00:46:06,960 --> 00:46:11,520 and didn't free it up so other programs could 870 00:46:11,520 --> 00:46:14,070 use this very same memory. 871 00:46:14,070 --> 00:46:15,060 So what did I do wrong? 872 00:46:15,060 --> 00:46:17,820 If I read through this file top to bottom, 873 00:46:17,820 --> 00:46:19,830 keeping in mind our common mistakes, 874 00:46:19,830 --> 00:46:25,780 I'll see, well, I didn't use malloc, but I did open the file. 875 00:46:25,780 --> 00:46:27,290 Did I close it later on? 876 00:46:27,290 --> 00:46:30,230 Let's look through. 877 00:46:30,230 --> 00:46:33,170 I checked the buffer, but I never closed the file. 878 00:46:33,170 --> 00:46:35,750 It's important, before you end your program, 879 00:46:35,750 --> 00:46:38,960 to make sure you always close the file that you've opened. 880 00:46:38,960 --> 00:46:41,120 So in this case, I could close it down here. 881 00:46:41,120 --> 00:46:43,610 fclose. 882 00:46:43,610 --> 00:46:46,710 And I'll give it the file pointer I want to close-- in this case, 883 00:46:46,710 --> 00:46:49,290 we called it simply file. 884 00:46:49,290 --> 00:46:51,965 fclose file. 885 00:46:51,965 --> 00:46:53,840 But this isn't the only place I should close the file. 886 00:46:53,840 --> 00:46:56,090 I should also close it up here. 887 00:46:56,090 --> 00:46:59,360 Keep in mind that if I run this for loop and ever find 888 00:46:59,360 --> 00:47:03,080 that buffer isn't the same as signature, I'll print likely not a PDF 889 00:47:03,080 --> 00:47:04,460 and return 0. 890 00:47:04,460 --> 00:47:07,820 If I didn't close the file here, that would be the end of my program 891 00:47:07,820 --> 00:47:09,930 and I wouldn't have closed the file at all. 892 00:47:09,930 --> 00:47:14,400 So I want to make sure to close it in both places.
893 00:47:14,400 --> 00:47:17,743 I could also, perhaps, close it just after I finish reading-- 894 00:47:17,743 --> 00:47:18,660 I could do it up here. 895 00:47:18,660 --> 00:47:21,360 fclose file there, and that would avoid duplicating 896 00:47:21,360 --> 00:47:25,233 this in more than one place. 897 00:47:25,233 --> 00:47:26,400 So now let's run this again. 898 00:47:26,400 --> 00:47:33,240 Let's do valgrind dot slash PDF, test underscore PDF dot PDF, run it again. 899 00:47:33,240 --> 00:47:34,460 And now we see-- 900 00:47:34,460 --> 00:47:35,090 whoops. 901 00:47:35,090 --> 00:47:36,740 Still reachable. 902 00:47:36,740 --> 00:47:39,470 Hm, let's see. 903 00:47:39,470 --> 00:47:42,872 fclose of file. 904 00:47:42,872 --> 00:47:44,330 Let's try doing what we had before. 905 00:47:44,330 --> 00:47:48,770 So we do fclose down here, putting it right there. 906 00:47:48,770 --> 00:47:52,760 fclose down here, putting it right there. 907 00:47:52,760 --> 00:48:00,060 And now let's run this again and just see if that helps us. 908 00:48:00,060 --> 00:48:03,000 So do Command a there. 909 00:48:03,000 --> 00:48:05,825 make PDF-- oh, we might not have recompiled 910 00:48:05,825 --> 00:48:07,200 this, which might explain it. 911 00:48:07,200 --> 00:48:12,960 So we do valgrind dot slash PDF, test underscore PDF dot PDF. 912 00:48:12,960 --> 00:48:17,770 And now we should see all heap blocks were freed, no leaks are possible. 913 00:48:17,770 --> 00:48:20,910 So whenever you add fcloses to your program 914 00:48:20,910 --> 00:48:23,010 after you've already run valgrind, make sure you 915 00:48:23,010 --> 00:48:27,530 recompile it to see the results of your new program here. 916 00:48:27,530 --> 00:48:28,190 OK.
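Here is a sketch of the "close it just after I finish reading" idea. It assumes, without having the original pdf.c in front of us, that the exercise compares the first four bytes against the `%PDF` signature (bytes 37, 80, 68, 70). The helper `is_pdf` and its return codes are hypothetical. A single `fclose` placed right after `fread` covers every return path that follows, so there is nothing to duplicate.

```c
#include <stdio.h>

// Hypothetical helper: return 1 if the file starts with the four bytes
// "%PDF", 0 if it doesn't, or -1 if it can't be opened.
int is_pdf(const char *path)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return -1;  // nothing was opened, so there's nothing to close
    }

    unsigned char signature[] = {37, 80, 68, 70};  // '%', 'P', 'D', 'F'
    unsigned char buffer[4];
    size_t n = fread(buffer, 1, 4, file);

    // Close as soon as we're done reading, so this one fclose covers
    // every return statement below.
    fclose(file);

    if (n < 4)
    {
        return 0;  // too short to hold the signature
    }
    for (int i = 0; i < 4; i++)
    {
        if (buffer[i] != signature[i])
        {
            return 0;
        }
    }
    return 1;
}
```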
917 00:48:28,190 --> 00:48:30,860 So let's move on to a new exercise-- this one focused 918 00:48:30,860 --> 00:48:33,740 on identifying more kinds of memory errors, more errors you're now 919 00:48:33,740 --> 00:48:35,270 capable of making. 920 00:48:35,270 --> 00:48:40,250 So for this one, we'll take a look at this program called create. 921 00:48:40,250 --> 00:48:43,250 You've seen code before in VS Code. 922 00:48:43,250 --> 00:48:47,730 You know how you can type code, maybe, hello.c to open up a new hello.c file. 923 00:48:47,730 --> 00:48:52,650 Similarly, create allows you to make a new file as well. 924 00:48:52,650 --> 00:48:55,100 So I'll go back to my codespace. 925 00:48:55,100 --> 00:48:56,660 I'll do cd dot dot. 926 00:48:56,660 --> 00:48:58,700 And you, too, can download this file. 927 00:48:58,700 --> 00:49:02,570 But I'll do cd create and type ls, and now I see, in this case-- 928 00:49:02,570 --> 00:49:04,870 I'll remove hello.c. 929 00:49:04,870 --> 00:49:10,130 I see create.c, which I'll open like this-- code create.c. 930 00:49:10,130 --> 00:49:12,740 So the goal of create is to run it a bit like this. 931 00:49:12,740 --> 00:49:17,690 I type make create to compile this, then I'll type dot slash create, 932 00:49:17,690 --> 00:49:19,670 and then maybe hello.c. 933 00:49:19,670 --> 00:49:24,560 And now I type ls again, and I should see I have this brand new hello.c file. 934 00:49:24,560 --> 00:49:30,030 I can type code hello.c, open it up, and now I have this blank file for me here. 935 00:49:30,030 --> 00:49:35,252 So create makes a file with whatever name I type after it. 936 00:49:35,252 --> 00:49:37,460 But as we'll see in just a minute, there are probably 937 00:49:37,460 --> 00:49:41,460 going to be some errors in here that we should address first.
938 00:49:41,460 --> 00:49:47,270 So to test if our program has any errors in it, let's run valgrind dot 939 00:49:47,270 --> 00:49:53,360 slash create, test.c to create this file test.c, but along the way, figure out, 940 00:49:53,360 --> 00:49:56,270 are there errors we should consider fixing in this case? 941 00:49:56,270 --> 00:50:01,820 So I'll do valgrind, dot slash create, test.c, hit Enter. 942 00:50:01,820 --> 00:50:02,990 Valgrind will run. 943 00:50:02,990 --> 00:50:05,150 And notice how I have an error summary down below. 944 00:50:05,150 --> 00:50:08,210 I can see three errors from three contexts. 945 00:50:08,210 --> 00:50:13,010 I've definitely lost 6 bytes, and 472 bytes are still reachable, 946 00:50:13,010 --> 00:50:16,820 but those bytes were never freed either. 947 00:50:16,820 --> 00:50:20,180 Or at least they were leaked from my program. 948 00:50:20,180 --> 00:50:25,360 So I'll give you a minute here to download this file and read through it. 949 00:50:25,360 --> 00:50:28,870 And keep in mind these three common memory errors. 950 00:50:28,870 --> 00:50:34,030 See if you can figure out where we've gone wrong here, 951 00:50:34,030 --> 00:50:36,340 and try to fix the file as you go. 952 00:50:36,340 --> 00:50:38,350 Again, remember that you can always run valgrind 953 00:50:38,350 --> 00:50:45,010 using valgrind dot slash create, test.c, but always be sure in this case 954 00:50:45,010 --> 00:50:48,250 to recompile your program before you run valgrind. 955 00:50:48,250 --> 00:50:51,190 We'll come back in one minute while you all work on this. 956 00:50:51,190 --> 00:51:06,130 957 00:51:06,130 --> 00:51:06,630 All right. 958 00:51:06,630 --> 00:51:09,380 So now that you've had the chance to identify these memory errors, 959 00:51:09,380 --> 00:51:11,080 let's take a look at them together. 960 00:51:11,080 --> 00:51:13,120 So let's go back to our valgrind summary, 961 00:51:13,120 --> 00:51:17,550 which I can find by doing valgrind dot slash create, test.c.
962 00:51:17,550 --> 00:51:21,450 And now I'll look, and I definitely lost 6 bytes, 963 00:51:21,450 --> 00:51:24,420 and these 472 bytes are still reachable, 964 00:51:24,420 --> 00:51:28,720 but we probably want to make sure we don't leak those in the end. 965 00:51:28,720 --> 00:51:30,030 Let's look at our first error: 966 00:51:30,030 --> 00:51:34,290 failing to free every block of memory we've malloc'd. 967 00:51:34,290 --> 00:51:37,350 So here, let's figure out where I used malloc. 968 00:51:37,350 --> 00:51:44,180 I seem to have used it on line 16 to create this space for the file name. 969 00:51:44,180 --> 00:51:47,025 Did I ever free it? 970 00:51:47,025 --> 00:51:48,150 It doesn't look like I did. 971 00:51:48,150 --> 00:51:53,160 So I need to make sure I free this file name after I'm done using it. 972 00:51:53,160 --> 00:51:55,330 At what point am I done using it, though? 973 00:51:55,330 --> 00:51:58,360 If I scroll down below, well, I used it for fopen, 974 00:51:58,360 --> 00:52:02,670 but once I do that, I think I can go ahead and just simply free 975 00:52:02,670 --> 00:52:04,330 that file name. 976 00:52:04,330 --> 00:52:06,000 So I'll run this again. 977 00:52:06,000 --> 00:52:08,900 I'll do make create. 978 00:52:08,900 --> 00:52:13,170 I'll do valgrind dot slash create test.c. 979 00:52:13,170 --> 00:52:18,420 And now, those 6 bytes that were lost before are no longer lost, 980 00:52:18,420 --> 00:52:22,130 and I am down to two errors from two contexts. 981 00:52:22,130 --> 00:52:22,760 OK. 982 00:52:22,760 --> 00:52:24,470 Let's see what else I can do. 983 00:52:24,470 --> 00:52:29,150 Well, I want to keep in mind, I need to fclose every file I've fopened. 984 00:52:29,150 --> 00:52:34,690 Well, I'll look here and see, I used fopen, but did I use fclose? 985 00:52:34,690 --> 00:52:36,790 I don't seem to have used fclose. 986 00:52:36,790 --> 00:52:41,500 So after I've opened this file, what should I maybe immediately do?
987 00:52:41,500 --> 00:52:43,660 Well, I should probably just go ahead and close it. 988 00:52:43,660 --> 00:52:49,300 I'll do fclose, in this case, of new file. 989 00:52:49,300 --> 00:52:54,710 Now one thing I should also check while I'm here is this-- 990 00:52:54,710 --> 00:52:59,340 fopen, again, isn't guaranteed to work, as we saw before. 991 00:52:59,340 --> 00:53:03,320 So if it doesn't work, if I can't open this file for whatever reason, 992 00:53:03,320 --> 00:53:06,260 it's good to check, is this file NULL? 993 00:53:06,260 --> 00:53:07,730 Is this file pointer NULL? 994 00:53:07,730 --> 00:53:11,840 So after I open it, I'll ask the question, is new file-- 995 00:53:11,840 --> 00:53:17,160 or rather, if new file is equal to NULL, what should I do? 996 00:53:17,160 --> 00:53:24,410 I should printf "Could not create file," backslash n, and then return 997 00:53:24,410 --> 00:53:26,560 1, for instance. 998 00:53:26,560 --> 00:53:30,610 But down below, I want to make sure that if I did successfully open this file, 999 00:53:30,610 --> 00:53:34,396 I go ahead and close it and then free the file name. 1000 00:53:34,396 --> 00:53:35,660 OK, let's do this again. 1001 00:53:35,660 --> 00:53:38,650 So we'll do valgrind-- actually, first make create. 1002 00:53:38,650 --> 00:53:45,230 Then valgrind dot slash create, and then test.c. 1003 00:53:45,230 --> 00:53:49,960 I still see two errors from two contexts. 1004 00:53:49,960 --> 00:53:53,860 But this is at least a little better. 1005 00:53:53,860 --> 00:53:56,200 I see all heap blocks were freed. 1006 00:53:56,200 --> 00:53:57,470 No leaks are possible. 1007 00:53:57,470 --> 00:53:59,303 So there's still something going wrong here. 1008 00:53:59,303 --> 00:54:01,060 I still see two errors from two contexts. 1009 00:54:01,060 --> 00:54:06,960 But I do see I'm no longer leaking any memory. 1010 00:54:06,960 --> 00:54:08,780 So let's take a look at this.
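The open, check, and close pattern described above might look like this in isolation. `create_empty_file` is a hypothetical helper rather than the actual create.c code, and it leaves out the malloc'd file name to focus on fopen and fclose. Opening with mode `"w"` creates the file, or truncates it if it already exists.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: create (or truncate) an empty file at path.
// fopen can fail, so check for NULL before doing anything else,
// and fclose whatever was successfully opened.
bool create_empty_file(const char *path)
{
    FILE *new_file = fopen(path, "w");
    if (new_file == NULL)
    {
        printf("Could not create file\n");
        return false;
    }

    fclose(new_file);
    return true;
}
```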
1011 00:54:08,780 --> 00:54:12,860 Our next error was using more memory than we've allocated. 1012 00:54:12,860 --> 00:54:17,050 Where might we be doing that? 1013 00:54:17,050 --> 00:54:22,410 Well, we malloc'd, in this case, the size of a character times the file name 1014 00:54:22,410 --> 00:54:23,970 length. 1015 00:54:23,970 --> 00:54:29,460 But if you think about strings, strings require more space 1016 00:54:29,460 --> 00:54:31,410 than just the characters we store in them. 1017 00:54:31,410 --> 00:54:33,405 What else do they require? 1018 00:54:33,405 --> 00:54:35,532 1019 00:54:35,532 --> 00:54:37,740 They require a place for that null character 1020 00:54:37,740 --> 00:54:39,990 at the very end, that terminating character that says, 1021 00:54:39,990 --> 00:54:42,190 this is the end of our string. 1022 00:54:42,190 --> 00:54:48,300 So if we went back to that valgrind output, we might see up above-- 1023 00:54:48,300 --> 00:54:49,260 let's see. 1024 00:54:49,260 --> 00:54:52,970 Invalid write of size 1. 1025 00:54:52,970 --> 00:54:57,740 So an invalid write means we wrote to some space in memory 1026 00:54:57,740 --> 00:54:59,180 that wasn't allocated to us. 1027 00:54:59,180 --> 00:55:02,020 We wrote somewhere we really shouldn't have. 1028 00:55:02,020 --> 00:55:03,280 But how could I fix this? 1029 00:55:03,280 --> 00:55:05,197 Well, I could just simply ask for more memory. 1030 00:55:05,197 --> 00:55:09,130 If I know that this is going to store a string, I should probably in the end 1031 00:55:09,130 --> 00:55:13,150 ask for, of course, the size of a character times the file name length. 1032 00:55:13,150 --> 00:55:15,560 But then, let's go ahead and add 1 to that length. 1033 00:55:15,560 --> 00:55:22,340 So let's say file name length plus 1, for that null character at the very end. 1034 00:55:22,340 --> 00:55:26,380 So now we'll go back down below and clear the terminal.
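The "plus 1" fix might look like this, assuming the name is copied into the malloc'd buffer with `strcpy` (the original create.c may copy it differently). `copy_filename` is a hypothetical helper. `strlen` counts only the visible characters, so for `test.c` it returns 6, which matches the 6 bytes valgrind reported, while the copy actually needs 7 bytes to include the `'\0'`. Without the extra byte, `strcpy` writes the terminator one byte past the allocation, which is exactly the kind of "invalid write of size 1" shown above.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: copy a file name into freshly malloc'd memory,
// with one extra byte for the terminating '\0'.
char *copy_filename(const char *name)
{
    size_t length = strlen(name);  // does not count the '\0'
    char *copy = malloc(sizeof(char) * (length + 1));
    if (copy == NULL)
    {
        return NULL;
    }
    strcpy(copy, name);  // copies the characters plus the terminating '\0'
    return copy;         // the caller must free(copy) when done with it
}
```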
1035 00:55:26,380 --> 00:55:33,070 Make create, then run valgrind dot slash create, test.c. 1036 00:55:33,070 --> 00:55:37,570 And now, hopefully, fingers crossed, we do see that all heap blocks were freed, 1037 00:55:37,570 --> 00:55:43,930 no leaks are possible, and we have 0 errors from 0 contexts down below. 1038 00:55:43,930 --> 00:55:49,125 Now often the best way to avoid these errors is to really-- 1039 00:55:49,125 --> 00:55:54,870 to take some preventive measures and really be judicious, or thoughtful, 1040 00:55:54,870 --> 00:55:56,910 about your use of malloc, in this case. 1041 00:55:56,910 --> 00:56:01,140 So there is probably a better way to write this same program. 1042 00:56:01,140 --> 00:56:03,480 We might not even need to use malloc. 1043 00:56:03,480 --> 00:56:08,130 We might not even need to worry about creating a string that 1044 00:56:08,130 --> 00:56:09,805 has space for a null character. 1045 00:56:09,805 --> 00:56:12,930 And so I actually encourage you, as an optional additional exercise, to figure 1046 00:56:12,930 --> 00:56:17,490 out how you would rewrite create.c such that you would 1047 00:56:17,490 --> 00:56:19,890 avoid these errors in the first place. 1048 00:56:19,890 --> 00:56:23,205 That's always a good step for a programmer, figuring out, 1049 00:56:23,205 --> 00:56:27,750 OK, we solved these errors, but how could we avoid making them even 1050 00:56:27,750 --> 00:56:28,900 in the first place? 1051 00:56:28,900 --> 00:56:31,480 So I'll leave you with that. 1052 00:56:31,480 --> 00:56:35,170 Now this is going to bring us to the end of our section today. 1053 00:56:35,170 --> 00:56:36,550 We've taken a look at pointers. 1054 00:56:36,550 --> 00:56:39,505 We've taken a look at opening files, file 1055 00:56:39,505 --> 00:56:42,280 I/O. We've also taken a look at malloc and common memory errors.
1056 00:56:42,280 --> 00:56:44,030 And this should equip you to go off and do 1057 00:56:44,030 --> 00:56:47,920 this week's problem set with confidence to tackle filter and so on. 1058 00:56:47,920 --> 00:56:50,260 And so I hope you enjoy this week's problem set. 1059 00:56:50,260 --> 00:56:51,680 Wonderful to spend time with you. 1060 00:56:51,680 --> 00:56:55,890 This was CS50 and we'll see you next time. 1061 00:56:55,890 --> 00:56:57,000