CARTER ZENKE: Well, hello one and all and welcome to our week four section. My name is Carter Zenke. I'm the course's preceptor, and today we'll dive into memory, little details of how a computer actually stores things inside of its own memory in this case. So jump in. We'll take a look at a few questions today, and these questions look a bit like this. 

First we'll talk about this idea of pointers. What is a pointer, how do we use it, what's the syntax we should get used to? Second, we'll think about how we can read and write data from a file. How do we take data stored inside a file and put it inside our program? And then how do we take data from our program and put it inside of a file?

And then finally, we'll talk about this idea of dynamic memory. How do we use these new tools like malloc and when should we use them? Even really, why should we use them? Why do we care about them? So to jump in, let's take a look first at pointers, where pointers is this idea of trying to store the address of some variable inside a new variable, in this case a pointer. 

So to get started with this, let's think back to what we talked about first with variables. What is a variable? If a pointer points to a variable, what first is a variable? Well, we saw earlier-- we had this contact application. We were trying to store some variables inside of it, like people with names, addresses, phone numbers, et cetera, or even the number of times we had called them. So let's go back to that idea first. Let's take a look at this syntax where we had some variable named calls that is an integer, and it has the value four. And we broke this variable down as follows.

We said that we can give this variable a name-- in this case, calls. It also has a type called int for integer, it stores integer values. And it also has a value, in this case, four. So calls is simply some name for some place in memory that has the value four. But the picture is a little more complex than that, as we now know. In fact, instead of simply saying that calls is located someplace in memory, we can tell you very specifically where it is in memory using this new tool called addresses. So similar to how homes or businesses have addresses, so do variables. And in this case, we'll see that calls has the address 0x1A, kind of by random choice here. 

But in this case, we use hexadecimal format, where we see 0x in front to note that this is in base 16. And programs have used hexadecimal because it's more convenient to talk about memory locations given how many locations in memory there can possibly be. It's easier to use base 16 than, for example, base 10. And so we denote that using that 0x in front.

So calls has this location, 0x1A. But how do we actually make use of this location? Well we could actually use what we call now a pointer, where a pointer is simply a variable as we know it, but it stores not a regular value like an integer or a character, but some address in memory. So let's take a look at this visual now where we have this pointer called p, and we know it because we're calling it p in our syntax on the left hand side. This is the name for our variable. 

We also have, similarly, a type for this pointer. This is an int star. So we know int is an integer value, integer type. We also have characters and so on. But in this case, we see int star. So whenever we see one of those basic types, in this case like char star or int star, we can infer that this is not a char or an int, but a pointer to a char or an int. In this case, p is a pointer to an integer. That star notes this is a pointer we're talking about, not the actual value, in this case, an integer or a character. 

Now to that end, notice how the value of p is itself an address. So we have 0x1a. So p will get this address and store it wherever it has located itself in memory. And so not to get too in the weeds, but we know that if pointers are themselves variables, they're a special kind of variable and they store addresses, but this pointer itself also has an address in memory. We won't really talk too much about this idea. It's not often we try to get the location of a pointer that itself is a location in memory. But just so you know, because this p variable is a variable, it does, of course, have an address in memory. 

Now let's think about how we could use this. So we have this idea of storing values and variables, in this case, addresses in pointers, but what do we do with them? What syntax can we use for them? Let's take a look at this. Well, we know we can always get the value of a variable, simply kind of saying its name in code after we've declared it and initialized it. 

So in this case, we're simply getting the value of calls. If we ever say calls in our program, that gives us back the value at that variable location-- in this case, the value for calls. Now if we want the address or the value inside of our pointer, p, we could do the same. We could say, give me whatever's inside of p. 

And in this way, we can keep track of, for example, where this variable calls is. If we know that we have a variable, a pointer called p that points to wherever calls is, we could simply use the syntax, p, to figure out where, in this case, calls is located. We could, similarly, use some new syntax we've seen in this case, ampersand, where ampersand stands for the address of some variable. And I like to remember this by saying that ampersand begins with an a, and so does address. Ampersand and address.

So in this case ampersand calls tells us, OK, where is calls located, just in general? Maybe it's not inside a variable yet, but where is calls? So if we look at the slides here, we'll see that calls is located at 0x1a, as we saw before. Ampersand calls will give us not the value of calls, not four, but in this case, the address, 0x1a. 

Now thinking similarly, we could run the same on a pointer. We could say, what's the address of this pointer, and get back, perhaps, 0xf0, or wherever this pointer is located in memory. It's not often we'll do this. It's not often that we'll get the address of a pointer that, again, itself has an address, but you can do it if you're curious. 

Now let's think about this, then. If we have a way to get the value of the pointer and the address of the pointer, what do we actually use a pointer for? Well, often, we'll use a pointer to instead go to wherever it's pointing to and get the value from there. And for that we use this syntax, this star. It's a little bit confusing because we saw a star before, like int star and char star to declare a pointer. But in this case, we're actually going to use it to follow a pointer to the location it's pointing to.

Now the visual for this is a bit like this. If we say star p, we first think, OK, what is inside of star p? What is the value of star p? And think to yourself for a minute here. What is the value of star p? What is the value of just p, first. 0x1a. And now, putting the star in front of it, we say, let's go to that value that is stored in p. Let's go to whatever address is stored in p-- in this case, 0x1A, and let's get that value there. So let's follow the arrow, follow the pointer so to speak, and get that value, in this case, four, from our pointer, p. 

This is handy, of course, as we build up really large and complex data structures. And this simple example might seem redundant, but as you go off and build your own linked lists and hash tables and so on, it can be really useful to be able to follow a pointer like this, to be able to see, OK, where is this value stored, how can I get the value it's pointing to?

Now as you go off, here are a few things to remember about pointer syntax, as we just saw. First, type star, whether it's int star or char star, is a pointer that stores the address of a certain type. If you say int star, it stores the address of an integer. If you say char star, it's the address of a character.

Now additionally, star x takes a pointer-- in this case called x, and goes to the address stored at that pointer, as we just saw recently. Star p finds that pointer, p, and says, what's the value inside of p? Let me go to that address and find the value there. And finally, ampersand x takes whatever variable you have, in this case called x, and gets its address. Ampersand for address, the two A's together there. 

Now let's take a look at an example exercise we can do to make sure we're practicing this syntax as we go. And this is a pointer prediction exercise. So often, it's helpful when first starting out to look at a program and predict what it might do. And only afterwards, we run it and check your assumptions. So today, we'll take a look at this file called pointers.c. 

And I encourage you, before you run this file, to read through the code, line by line, and act like a computer. In your mind or on a piece of paper, write down, what's the value of a? What's the address of a, for example, inside this pointers.c file? And then once you've done that, go ahead and run it and see if your assumptions are correct. Now let's do this a bit together here.

Here we have the code on the left hand side and this visual on the right hand side. And again, your job is to take a few minutes, maybe pause the video, read through this code top to bottom and work like a computer. Fill in the potential address of a and the value of a, the potential address of b and the value of b, and so on for c. It's OK if you need to make up addresses. You can certainly do that. But at the end of running this program, quote unquote, in your own head, you should have some addresses for a, b, and c, and some values for a, b, and c. So go ahead and pause the video here, work on this, and we'll come back to this one together.

OK, so by now you might have, in your mind, some potential values for a, b, and c. And let's go ahead and run this program, again, called pointers.c to actually see what will happen here. I'll open up code.cs50.io, in which case, I already have a folder called pointers. And inside of this folder, I have pointers.c. You too can download pointers.c and get it inside of your own code space. 

In this case, I'll type cd pointers to change directory into pointers. And I'll type ls, where you can see, I have pointers.c and this compiled version of pointers that I could run if I want to. Let's go ahead and open, first, pointers.c. And we see the very same file-- in this case, encased with some syntax to help us run this program, ultimately. But notice how we have the same declarations of variables and initialization and the same manipulation down below. 

And finally what we'll do is print the results. So in the end, after running this program, we'll see that a has the value-- whatever it has, located at some certain location in memory. And notice how we're using that same syntax from before. We're saying, if we want to find out where a is located, let's use ampersand. So ampersand a says, what's the address of a? And the format code for a pointer is simply percent p. Percent i for integer. Again, these are all integers here. So we can use percent i for an integer, percent p for a pointer.

So let's compile pointers. Let's go down to our terminal and type make pointers. And now I'll clear my terminal and I'll run dot slash pointers and hit Enter. And now we'll see the results. And you might have made up some addresses as you went, but we can still see the basic gist here where a has the value 14 and it's located at some location in memory that ends in, let's say, 53c. B has the value 25, located at some other location, but it ends with 538. So it's in a different location than a.

And finally, c has the value-- well, what value does it have? It doesn't have an integer, it just has an actual address. So in this case, c is a pointer storing an address. And of course, a pointer as another variable does have a location itself in memory-- in this case, one that ends with 530, so someplace altogether different than a and b. 

So if you predicted correctly, congratulations. You're getting a hang of your pointer syntax. If not, not to worry. Feel free to practice more with the syntax and actually learn by doing as you write programs that use this syntax here for pointers. But maybe first let's get a feel for how this is working. 

Well first we had integer a gets the value 28. So at some location in memory, we'll create this variable called a that gets the value 28, and so on for b. So we can give them some random address, wherever it is that they're assigned. So a and b have some random address, but they do have the values 28 and 50. 

Now though, let's take a look at c. Well, C looks a bit different. It says int star is its type. So what is int star? Think to yourself. It's a pointer to an integer. So in this case, we have a pointer called c. This is going to get not a, but the address of a. So wherever a is stored, c will get that address. Now c is pointing to, metaphorically, a, or the value stored at a. 

Now we do some manipulation of these pointers. So we've set them up initially here, but now I do some manipulation of them. We say, OK, let's go to the value that c is holding and make that 14. But where, currently, is c pointing? Well, c, as we just said, is pointing at a. C, this pointer called c, to an integer, has the address of a. So if we say star c, that means go to the location of a and change the value to be 14. 

Now we'll update c. We'll say, c, as a value, has ampersand b. So where does c point to now? C points to b, essentially. We take the address of b and store that in c. And finally, we say, let's follow c to wherever it's pointing using star c, and then change the value to be 25. So at the end result, we should see-- because we've first set a, first set b, then had c point to a, updated a to be 14, then had c point to b, updated b to be 25, we should, of course, see that a has the value 14, b has the value 25.

And notice that c has this value that corresponds to what other variable? Corresponds to b. C has the value that is the location of b. OK. So again, feel free to get some more practice with this, either on your own or while you work on the problem set itself. But these pieces of syntax will be really useful for you and great to master as you go through your programming journey. 

Now with that in mind, let's do something a bit more advanced. Let's take a look at how we can use these pointers for actually adding to files, reading from files and so on. So let's go back to our slides here. And let's think about this idea of file I/O, where file I/O stands for file input and file output. How do we take input from files and how do we output data to files? 

So if we think first about this idea of a file, you might have this kind of common notion of a file being some place in memory that has a name, and some maybe text inside of it, or some other characters or some other pieces of data. In this case, we might have this file called hi.txt. And inside of hi.txt, well, we just have the characters, hi, exclamation point. And now this file is really located somewhere, of course, in memory. This file has to have a place, and let's say it's at 0x456 in memory. 

Now, if we want to open this file in order to read data from it, we have one way of doing it in C. For example, we could write this. We could say file star input gets this value of fopen, hi.txt, r. And this is maybe a lot to take in at once, so let's break it down as a whole. First what we're doing is giving something a name. We're creating some variable and calling it input. OK. 

But we also have some other pieces going on here. Let's look at this. We have a type for this variable we're creating called input. This is a file pointer, a pointer to a file. So it goes to say that maybe this variable, this pointer called input, points to some file. It has the address of some file inside of it. 

Now, if we use fopen here, it's a function that C gives to us as part of its standard library. Well, what we could do is say, I want to open hi.txt using the read mode. So the first argument to fopen, in this case hi.txt, is the file name we want to open. And when we run this, if we give a certain file name, C will look in our current directory for that file name. And if it finds it, it'll go ahead and give us the address of that file.

Now in this case, we have to specify the mode in which we open this file. Is it a reading mode or a writing mode? Reading mode allows us to read data, to take data from the file, to really copy it, see what's inside of that file, but not add data to it. If we want to do that, we have to open it in writing mode, which is w. So we could say hi.txt comma w, if you wanted to open this file and write data to it. Keep in mind that w starts the file over from empty; there's also an append mode, a, that adds to the end. But for now, we're going to read data from it, take data and copy it into our program.

Now this visual here looks a bit like this. At the end of running this code, we have some variable called input, some pointer that has the address of our file. Now it's not quite as simple as this. There is more going on underneath the hood. But for now, we can tell a bit of a white lie, that input does have roughly the address of this file. In actuality, when we run this line of code right here, fopen doesn't give us the exact address of the file, it gives us some file structure that is a bit more complicated. But the basic idea of this is that some file has a location in memory, and we use fopen to find that location and get a pointer roughly to that file that we can use in a special way, as we'll see in just a minute.

Now it's all fine and good to open our files, but wouldn't it be good if we could actually take data from them and read them into our program? Read those bytes and put them inside our program so we can use them for our own good. Well, to do that, we first need to complicate our vision of this file. Let's take a look at this. 

So we have this idea of a file and some variable, often called input, that points roughly towards our file. This tells us where our file is and allows us to access whatever is inside of it. Now this file is, of course, composed itself of different bytes of memory, where we might have, in this case, many bytes of memory inside of this dot txt file. And our goal is to take a peek at those bytes and see what's inside of them and put them inside of our own program.

Well, to do that, we'll often need a place inside of our program to store these bytes. And this will often, in our case, be called a buffer. So a buffer is a technical term for some place we're going to store some data as we read it from a file. A buffer might look, visually, a bit like this, where it's maybe a sequence of three bytes. And if a buffer is some sequence of bytes, back, to back, to back, or some sequence of locations in memory, back, to back, to back, what kind of structure do you think would be good to use for a buffer that we've seen before? What kind of structure would you use?

A buffer might be an array, right, where we have an array being some locations in memory, back, to back, to back. A buffer similarly might be an array of some bytes we read from our file. Now we have a buffer to store this data in and a file to read this data from. It stands to reason that we need some tool to actually take these bytes from our file and put them now in our buffer.

And thankfully, the C library does give us this function called fread. Fread looks a bit like this. And you can certainly change the arguments to fread. But importantly, it takes four distinct arguments-- four distinct inputs. The first input that we care about that makes logical sense to start with is the file we're reading from. So this is the file pointer that we're going to read data from. Notice how in our prior visual, we had this input file pointer, pointing roughly towards our file. Well, in this case, we want fread to read from that location, so we'll put this in the fourth argument to fread. 

Next in importance is the size of the blocks to read in bytes. Notice how our file here is composed of, in this case, individual bytes. So we want to read each of those bytes as a single byte at a time. So we'll say that this is the size of the block to read. The blocks in our file are a single byte large. 

The next question then is, OK, how many blocks do you want to read? If we say we have-- the size of our blocks in our file is one byte, well, let's read three of them. So three single byte blocks from our file. And where do we put them? We put them inside of our buffer, in this case. So the first argument to fread. The location we store the data we're reading. 

The second argument, how big is the block in our file? How big is a chunk of data in our file? Third argument, how many do I want to read? And finally, for this input, where are we reading from? So let's get a visual for this as we go through. So first, if we look at this location to read from, input, we're telling fread, here's what we're reading from, the start of the hi.txt file, because we opened it using fopen just a little bit ago.

Now we finish the question. What's the size of blocks to read? Well, they're a single byte big. Now how many do we want to read? We want to read three of them. So we'll look at three of these here. And then where do we want to store these? Where do we want to read them and copy them into? In this case, it's buffer. So we'll simply take our data in our file here and then put them inside of buffer. And now our file pointer, previously called input, updates to the next location in our file. And as such, we can keep reading, and reading, and reading from our file while our file pointer updates. 

So it's often good to kind of see this in practice. And what we'll do for this is actually take a look at how we can test whether a file is a PDF. But before we do that, let's take a look at how we can use a buffer here. If we want to use our buffer as an array, we could, of course, use this bracket syntax, like buffer bracket zero, buffer bracket one, and buffer bracket two, giving us access to those individual bytes, which will be important in just a moment.

So for our practice, we'll create this program called pdf.c. And the goal of pdf.c is to open a file given to our program and check, is that file likely a PDF? And we can know this because we know that every PDF, or at least those of a certain type, a common type, starts with this four byte sequence. And these bytes correspond to these four integers-- 37, 80, 68, and 70. So maybe news to you. Whenever you open a PDF, the first four bytes in that file are often going to represent these integers-- 37, 80, 68, and 70. This is known as a file signature. It tells a program opening this file, hey, this is pretty sure to be a PDF.

Now let's write this code together. Let's go to pdf.c over here. I'll do cd dot dot, get out of my pointers directory and I'll clear my terminal. I'll then go into my new folder called PDF and cd inside of it. I'll type ls. And notice how I have some test files here. I have test JPEG and test PDF. I don't yet, though, have, in this case, pdf.c. I'll use these files as tests for my code, but first I need to create pdf.c. 

So I'll do that with code, in this case PDF, dot c. Now I have this new file, pdf.c. And what's the kind of boilerplate syntax I should use to start off my PDF program? I might want to have an int main void program-- or a function, some place to run the main part of my code. I might also want to simply import the CS50 library. I might also want to import maybe the standard I/O library to print something out to the user, tell them if it's a PDF or if it's not. 

Now to get started, I first need to accept some command line arguments for my program. I need to ask the user to type in the name of the file they want to open. And so what I'll do is I'll check to see if the user has actually given me a file name. Now instead of void here, I probably want to use int argc and string argv. This allows my program to take command line arguments. And remember that the amount of arguments I've been given is stored in argc, and the actual content of the arguments is stored in this array, argv. 

Now I should check, is argc equal to the number of command line arguments I expect? Now, if I expect one command line argument, I should check, is argc not equal to two? And if it's not, I'll say, printf, this is improper usage. This is not the way to run my program. And I'll return 1 to the user, saying, this is not the right way to do it. Now why is argc two here, not one? We expect one argument, really, but why should I check if argc is two?

Well, keep in mind that when I run this program, eventually as, in this case, dot slash pdf, maybe test JPEG dot JPEG, the first command line argument, technically, for a program is going to be dot slash PDF, and then the second one is going to be this. So if I want really one command line argument to my program here, I should keep in mind that this still counts as some argument given at the terminal. So I should say if argc is not equal to two in the end.

Now let me scroll down here. And once I know that I have some file name, let me try to open this file. So let's say open file here. And how can we do that? I encourage you to pause the video, maybe try it on your own. How could we open whatever file is given to us from the user? 

So let's first keep in mind the file name, right? We could perhaps say this-- the file name, string file name, is located at argv 1. Now why argv 1? As we saw below, when we eventually run our program with dot slash PDF test JPEG dot JPEG or test PDF dot PDF, well, this will be what's stored in argv 0, and this will be what's stored in argv 1. So the file name is located at argv 1. And we need the file name to pass it into what function to open our file? Fopen.

So we saw, we could use fopen like this-- fopen and then the file name. So we'll say this is our file name, simply using this variable we've used before. And then what mode we want to open the file in. Because we're simply reading this file, seeing what information is inside of it, not adding to it, we can simply open this file in read mode. 

But we need some place to store the structure we'll get back from fopen-- the file pointer we'll get back from fopen. And so for that, let's go ahead and create one. We can say, give me a new file pointer. This one perhaps called PDF. And now we've created some file pointer called PDF using fopen. And actually, because we don't know if it's going to be a PDF, let's just call it in this case file. So we have a file pointer called file that will take a file name, find it in our current directory, open it in read mode, and give it back to us in terms of this file pointer. 

OK. So we have opened the file. But as we do, you can think of some edge cases. What if, for example, we type in a file name we actually don't have? Well, in that case, fopen will return to us null, as a special value to say, we don't have that file for you here. So it's important here to not simply blindly open the file, but to check, is file equal to null? And if it is, let's go ahead and return 1. And let's print an error message to the user. We could say, in this case, no such file found. And for good measure, why don't we have some backslash n's to tidy up our output here. All right, so this is going to check if the file exists.

Now we've opened the file, ideally. We've checked if it exists. And presumably, if we get past this line of code, the file does exist. So what's the next step? We want to read from this file, and ideally, have some place to read into. So we saw before that it's common for us to create what we call a buffer-- some place to read data from our file and put it inside our program in smaller chunks. It wouldn't be wise, for example, for us to take the entire program and put it all-- the entire file and put it all in our program at once. Instead, we want to take single chunks of that file and deal with those chunks kind of individually, one at a time, kind of reading in smaller pieces of the file but going through the entire file over time. 

Now to create this buffer-- remember, a buffer is simply an array. But in this case, for a PDF, it's an array of a special kind of byte. We want the buffer to store the same type of data that's inside of our file. And in this case, a PDF stores the special type called a uint 8 t. This is a special type of data. And it might look scary at first, until we break it down into smaller pieces. 

So first, notice some familiar syntax. Here we have int, right? So presumably, this is some kind of integer. It's a special kind, though. It's a U int. U stands for unsigned, meaning it's never negative. Remember that we talk about signed or unsigned integers, where signed means it can be positive or negative. We can have a minus sign in front or not. But an unsigned integer can be, in this case, only zero or positive.

Now we have this 8 here. It's a uint, unsigned integer, 8 underscore t. Well, the 8 here denotes this is only 8 bits-- a single byte for an integer. It's not a 2 byte integer or a 4 byte integer. It's only a single byte. With a single unsigned byte, we can represent the values 0 through 255.

Now this underscore t here means that all that together, this unsigned integer of 8 bits, is going to be its own type. So uint 8 underscore t, an unsigned integer of 8 bits that is its very own type. Now this is a special kind-- it's presumably used in more than PDFs, but it's a new one for us today. So we'll get some practice with this.

To use a uint 8 t, we need to actually include a different header from the C library, this one called standard int-- S-T-D-I-N-T dot H. So we'll get it from this header file, standard int dot h, and now we can use it inside our pdf.c down below. Now we want a buffer here. So we want a buffer of uint 8 t's. And how many integers are we going to store in this buffer, do you think?

We need four-- the PDF signature is only four bytes, remember. So I only need four spaces. So I'll create enough space in an array for four uint 8 t's that we'll read from our PDF. And currently this is empty, but in just a minute, we'll actually go ahead and try to read these inside of our buffer. So now that we have some place to read in the data from our file, let's go ahead and try that.

We could say fread. And remember that the first argument to fread is the place we're reading into. So in this case, we'll read into our buffer. Now the next two arguments are the size of blocks in the file, and how many of those blocks you want to read. Now for a PDF, we could consider it as being composed of individual bytes. So we could say, the size of the block is one byte.

But how many of these bytes do you want to read? Only the first four. So we could say, take the first four single byte blocks from this file. And now where from? Where is this file reading from? Well, in this case, it's going to be from our file pointer from above. Notice the correspondence between the file we've opened before and the file we're reading from down here. 

Now fread doesn't necessarily need to be stored in some variable. We don't necessarily say, file pointer, updated file equals fread. Fread works just on its own. You can call it. It'll do the magic of reading in your bytes from your file, putting them inside the buffer, and updating the file pointer to look at the next four bytes in your file. 

Now once we've done that, we could possibly take a look at what's inside of buffer, just for the sake of looking. Now to iterate through our buffer, what could we do? We could write a for loop. We could say for int i equals 0. i is less than four, i plus plus. Now we could look-- let's print out whatever integer is inside of buffer-- let's put a space after it. Buffer bracket i. 

So now ideally, we've created some space in our program to store the first four bytes from our file. We've read them from our file, from our PDF, perhaps. And now we're going to print them out, just as a check here to see what's inside of our buffer. So I'll go down here, and I'll run make PDF to compile it and see if we don't get any errors. And we'll wait for it to compile. And it looks like it's all good. So now I'll type dot slash PDF. And now we'll give the name of the file we want to open. In this case we'll do test underscore PDF dot PDF. And let's see. 

OK, it looks like inside this file, the first four bytes are 37, 80, 68, and 70. And that does correspond to what we expect-- 37, 80, 68, and 70. Let's try our, in this case, our JPEG. But first, notice how I have this on the same line here. Why do I do this? Why don't I say backslash n at the end to print a new line, ultimately after I print out these first four bytes? 

Now I'll make PDF again. I'll do dot slash PDF test JPEG dot JPEG. Open this, and now we see the first four integers inside of this JPEG. And let's just be sure-- test a file that doesn't exist. Let's do dot slash PDF, hello.c. And no such file was found. Great. So we seem to be able to read in data from our file, put it inside our buffer. But now I need to check, is this buffer the same as 37, 80, 68, and 70? Now I'll leave this up to you to work on for a minute here. I encourage you to try it on your own first. So maybe pause the video, and we'll work on this together in just a moment. 

OK. So presumably you've attempted this on your own, but if you haven't, that's OK. Let's go ahead and check. How could we see the data inside our buffer and check it against the signature we're looking for? Well, it would be handy if we created our very own signature-- our own array-- that actually has the data we expect inside of it. So we could say this. Let's make a new uint8_t buffer, this one called signature, that stores four values. 

But in this case, we'll actually just give them to the buffer. We'll say first we're looking for 37. Then we're looking for 80. Then we're looking for 68. And then we're looking for, in this case, 70. So this is our file signature we're looking for. And as a bit of trivia, you don't actually need to include the length of the buffer or the length of the array if you have the definition already over here of four, in this case, integers. 

So now we have the buffer that we're reading from our file and the signature that has the data we're actually looking for. Well, what if we compared buffer bracket i with signature bracket i, and see if they're the same? So I'll do this. I'll say, as I loop through my buffer, why don't I ask, is buffer bracket i the same as signature bracket i? And if that's the case, well, what do I want to do? 

Well, I actually really can't make a conclusion if I only know that one of buffer's integers is the same as the signature's. Instead, I should probably ask, what if it's not the same? What if as I loop through, it's ever not the same between buffer and signature? Well, I could perhaps print to the user, if I know that one is not the same, this is likely not a PDF. So I'll say, likely not a PDF. And I'll return just a 0 to say everything went OK, but this is just not a PDF. 

Now then, if I get all the way through this loop, then I know that I likely have a PDF. I'll go say, printf, likely a PDF. And I'll do backslash n up here, too. And then I'll say return 0. I don't need this new line anymore. So notice the logic here. First we're going to read in some data into our buffer. We're going to ask the question, does the buffer match the signature? 

We'll loop through buffer and check: is every integer in buffer the same as in signature? If it is, if we never trigger this condition, we'll get down here and print, likely a PDF. If we ever find it's not the same, though, we'll print, likely not a PDF, and return 0, thus ending our program before we get down to likely a PDF. OK, so let's run this program again. I'll run make PDF, then dot slash PDF. I'll do test JPEG dot JPEG. And we see, likely not a PDF, which makes sense. Now let's do dot slash PDF, test PDF dot PDF, and we see, likely a PDF. So this program seems to be working exactly as we intend. 
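The comparison logic just described, where we bail out on the first mismatch and only conclude "likely a PDF" after the whole loop finishes, might look something like this. It's a sketch under assumptions: the function name `matches_pdf_signature` is invented here, and it returns 1 or 0 instead of printing, so the caller decides what to tell the user.

```c
#include <stdint.h>

// Hypothetical helper: returns 1 if the four bytes in buffer match the
// signature 37, 80, 68, 70, and 0 as soon as any one of them differs.
int matches_pdf_signature(const uint8_t buffer[4])
{
    // The length can be left out because the initializer lists four values
    uint8_t signature[] = {37, 80, 68, 70};

    for (int i = 0; i < 4; i++)
    {
        // One mismatch is enough to conclude "likely not a PDF"
        if (buffer[i] != signature[i])
        {
            return 0;
        }
    }

    // Only reached if every byte matched
    return 1;
}
```

Notice that a single matching byte never lets us return 1 early. Only a mismatch can end the loop ahead of time.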

All right. So this is some of the magic we can do, now that we can open files and read data from them. What we'll take a look at in just a moment is what we can do if we want to keep asking our program for more and more memory. Here we're not using too much memory, we're only using four bytes to read in our data from our file, put it inside our buffer. But there's more we can do with memory in this case. 

So beyond opening files, let's take a look at dynamic memory. And dynamic memory is often seen in the context of using malloc in C. So malloc, if you recall, is used for asking our program for more and more memory. Importantly, it does it from a special place that we'll see in just a minute. But the basic idea is asking for memory for our program on the fly. 

Now for example, let's say we wanted to create some integer called hours. Well, we can use malloc here. We could say int star hours gets the value of running malloc given the size of an integer. And there is some stuff to break down here, so let's do it. Notice how we have, in this case, the name of our variable, still. It's hours. But it's not an integer right now. It's an integer pointer. 

So malloc, when it runs, always gives back a pointer to whatever space it created in memory for us. So as we run malloc, though, it needs to know, what size of space should I give you? And we can use, in this case, this operator, sizeof, to say, give me the size of whatever type I have. Sizeof int, sizeof char. Sizeof gives us that size in bytes, and if we give that to malloc, it'll give us back that many bytes. 
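As a small illustration of asking malloc for space for exactly one integer, here is a minimal sketch. The helper name `make_hours` is made up for this example. The NULL check is there because malloc returns NULL when it can't provide the memory.

```c
#include <stdlib.h>

// Hypothetical helper: ask for enough heap memory for one int, store a
// value there, and hand back the pointer. The caller must free it later.
int *make_hours(int value)
{
    // sizeof(int) is the number of bytes one int needs
    int *hours = malloc(sizeof(int));
    if (hours == NULL)
    {
        return NULL;
    }

    // Go to the address in hours and store value there
    *hours = value;
    return hours;
}
```

A caller might write `int *hours = make_hours(7);`, read the value back with `*hours`, and call `free(hours);` when finished.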

So we're asking malloc here for simply some space for one integer. Now we get back a pointer to the integer, and to then store some values inside of it, we need to use that star syntax we saw a little bit earlier. What if we wanted not just a single space for an integer, but actually maybe an array of integers? We could ask malloc for that, too. We could say, instead of sizeof int, give me sizeof int times five, in this case, five spaces for this integer here. 

Now what if you wanted to actually add in some data? Well, we could say, as we saw before, star hours gets 7. And if you wanted to add in some data to the right of 7, we could do this. We could say, maybe, star of hours plus 1 is 9, or star of hours plus 2 is something else. Notice how we're using some pointer arithmetic here, where we're saying, hours plus 1 means go to the next location in memory after whatever hours is pointing to. And of course, hours points to that first location in our broader array here. 
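Written out, the five-integer allocation and the pointer arithmetic might look like the sketch below. The function name `make_hours_array` is hypothetical, and the value stored in the last slot is arbitrary. One detail that's easy to miss when this is spoken aloud: the parentheses in `*(hours + 1)` matter, because `*hours + 1` would mean "the value at hours, plus 1" and can't be assigned to.

```c
#include <stdlib.h>

// Hypothetical helper: allocate space for five ints on the heap and fill
// them in using both pointer arithmetic and bracket notation.
int *make_hours_array(void)
{
    int *hours = malloc(sizeof(int) * 5);
    if (hours == NULL)
    {
        return NULL;
    }

    *hours = 7;        // first location, same as hours[0]
    *(hours + 1) = 9;  // next int-sized location, same as hours[1]
    hours[2] = 8;      // bracket notation, same as *(hours + 2)
    hours[3] = 7;
    hours[4] = 0;      // arbitrary value so every slot is initialized

    return hours;      // caller is responsible for calling free
}
```

Adding 1 to an `int *` moves forward by one whole int, not by one byte, which is why `hours + 1` lands on the next element.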

So we could also, of course, use the same bracket notation we saw before-- like hours bracket 2 for 8, or hours bracket 3, in this case, for 7, and so on. So it kind of begs the question, what's the point of using malloc if we can use this very same syntax, and now we have to deal with pointers? As we saw briefly in lecture, malloc gives us some memory from the special place called the Heap. What we've been using up until now in CS50 has been the Stack, where the Stack is what you use when you use simply a function. You create some variable inside your function that asks for memory from the Stack. 

When you use malloc, though, you get memory from the Heap. And why would you get memory from the Heap? Well, if you want a much larger data structure, you might often do that on the Heap because the Heap is more persistent, and it's quite a bit larger. You don't really want to fill up the Stack too much. You often want to use the Heap for these really large kind of files that you might work with. 

You might also use the Heap when you want to have a data structure that many functions can operate on-- for example, a linked list or a hash table. If you want a single structure that you can write many functions to operate on, you'll want to use the Heap for that, because remember, a Stack is limited to the single function call, but the Heap can be shared across functions overall. So this is good, as you'll see, in the problem set this week and in coming weeks, you'll be able to actually share data across functions when you use malloc and the Heap. 
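To make the sharing idea concrete, here is a small sketch in which one function creates an array on the heap and a second function updates it. Both function names are invented for illustration. The memory outlives the call to `create_counts` because it came from malloc, not from that function's stack frame.

```c
#include <stdlib.h>

// Hypothetical helper: create an array of n call counts on the heap,
// all starting at 0. The array still exists after this function returns.
int *create_counts(int n)
{
    int *counts = malloc(sizeof(int) * n);
    if (counts == NULL)
    {
        return NULL;
    }
    for (int i = 0; i < n; i++)
    {
        counts[i] = 0;
    }
    return counts;
}

// Hypothetical helper: a different function operating on the same memory
void record_call(int *counts, int index)
{
    counts[index]++;
}
```

A caller could do `int *counts = create_counts(3);`, call `record_call(counts, 1);` a few times, and then `free(counts);` once no function needs the array anymore. Returning the address of a local array instead would not work, since that stack memory is no longer valid once the function returns.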

Now if you use malloc, though, there are still things to be wary of. And in fact, when we use malloc, there's often these brand new errors that we're now capable of making. And so let's take a look at these kinds of errors now. Often when we use malloc, we have to make sure we actually free every block of memory-- and often, you actually won't remember to do that when you're first beginning. That's OK. 

If you use fopen, you want to make sure you always close the file that you've then opened before. And of course, we want to be wary of using more memory than has actually been allocated. So let's take a look at this. But before we actually jump into a new exercise, let's go back to our previous PDF one and see if there's something we could have done a little bit better, keeping in mind these common memory errors. 

Let's go back over here. And reading through this file, I might not see much to improve off the bat, especially if I'm a beginner here. But what I can do is run a special program that tells me what might be going wrong, memory-wise, in this program. It seemed to work, but we could still do better. So I'll do this-- valgrind dot slash PDF. And I'll type in this time, test PDF dot PDF. 

So if I run my program as I usually do, but type valgrind in front, I'll then be able to see what memory errors, if any, I encounter. So I hit Enter here. And now I see-- whoops. Let me cd into PDF first. valgrind dot slash PDF, and then test PDF dot PDF. It'll run. It gave me some slightly cryptic output. But notice how I see this-- leak summary. If I see a leak summary here and I see that there are still some bytes in that leak summary, well, I've actually been not necessarily keeping memory as tidy as I should be. In this case, it might mean that perhaps I left some memory on the table and didn't tell my program to free that up so other programs could use this very same memory. 

So what did I do wrong? If I read through this file top to bottom-- keep in mind our common mistakes. I'll see, well, I didn't use malloc, but I did open the file. Did I close it later on? Let's look through. I checked the buffer but I didn't close it. It's important, before you end your program, to make sure you always close the file that you've opened. So in this case, I could close it down here. Fclose. And I'll give it the file pointer I want to close-- in this case, we called it simply file. Fclose file. 

But this isn't the only place I should close it. I should also close it up here. Keep in mind that if I run this for loop and ever find that buffer isn't the same as signature, I'll print likely not a PDF and return 0. If I hadn't closed the file here, that would be the end of my program and I wouldn't have closed the file at all. So I want to make sure to close it, at least, in both places. I could also, perhaps, close it just after I finished reading-- I could do it up here. Fclose file there, and that would avoid duplicating this in more than one place. 
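The "close it just after reading" option could be sketched like this. It's an illustration rather than the program on screen: the function name `check_pdf` and its return codes are assumptions made for this example, and it opens the file in "rb" mode, which behaves the same as "r" on Linux. Because fclose runs right after fread, every return that follows happens with the file already closed.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: returns 1 if the file likely is a PDF, 0 if it
// likely is not, and -1 if the file could not be opened at all.
int check_pdf(const char *path)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return -1;
    }

    uint8_t buffer[4];
    size_t blocks_read = fread(buffer, 1, 4, file);

    // We're done with the file itself, so close it once, right here
    fclose(file);

    // A file shorter than four bytes can't contain the whole signature
    if (blocks_read < 4)
    {
        return 0;
    }

    uint8_t signature[] = {37, 80, 68, 70};
    for (int i = 0; i < 4; i++)
    {
        if (buffer[i] != signature[i])
        {
            return 0; // early return, but the file is already closed
        }
    }
    return 1;
}
```

The trade-off is the one mentioned above: a single fclose near the top versus one fclose before each return. Either keeps valgrind happy, but the single call is harder to forget when a new return is added later.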

So now let's run this again. Let's do valgrind dot slash PDF, test underscore PDF dot PDF, run it again. And now we see-- whoops. Still reachable. Hm, let's see. Fclose of file. Let's try doing what we had before. So we do fclose down here, putting it right there. Fclose down here, putting it right there. And now let's run this again and just see if that helps us. So do Command a there. Make PDF-- oh, we might not have recompiled this, which might make it happen. 

So we do valgrind dot slash PDF, test underscore PDF dot PDF. And now we should see all heap blocks were freed, no leaks are possible. So whenever you go ahead and add your fcloses to a file and you've run valgrind before, make sure you recompile it to see the new results of your new program here. OK. 

So let's go back to a new exercise-- this one focused on identifying more kinds of memory leaks, more errors you're now capable of making. So for this one, we'll take a look at this program called create. Where you've seen code before in VSCode. You know how you can type code, maybe hello.c to open up a new hello.c file. Similarly, create.c allows you to open up a new file as well. 

So I'll go back to my code base. I'll do cd dot dot. And you, too, can download this file. But I'll do cd create and type ls, and now I see, in this case-- I'll remove hello.c. I see create.c, which I'll open like this-- code create.c. So the goal of create.c is to run it a bit like this. If I type make create to recompile this, I'll type dot slash create, and I'll type maybe, hello.c. 

And now I type ls again, and I should see, I have this brand new hello.c file. I can type code hello.c, open it up, and now I have this blank file for me here. So create is capable of making whatever file name I type after I type it. But as we'll see in just a minute, there are probably going to be some errors in here that we should address first. So to test if our file has any errors in it, let's run valgrind of dot slash create, test.c to create this file test.c, but along the way, figure out, are there errors we should consider fixing in this case? 

So I'll do valgrind, dot slash create, test.c, hit Enter. Valgrind will run. And notice how I have error summary down below. I can see three errors from three contexts. I've definitely lost 6 bytes and I still can reach 472 bytes, but those bytes were still lost as well. Or at least they were leaked from my file. So I'll give you a minute here to download this file, read through. And keep in mind these three common memory errors. See if you can figure out where we've gone wrong here, and try to fix the file as you go. Again, remember that you can always run valgrind using valgrind dot slash create, test.c, but always be sure in this case to recompile your program before you run valgrind. We'll come back in one minute while you all work on this. 

All right. So now that you've had the chance to identify these memory errors, let's take a look at them together. So let's go back to our valgrind summary, which I can find by doing valgrind dot slash create, test.c. And now I'll look, and I definitely lost 6 bytes, and there are still reachable these 472 bytes, but we probably want to make sure we don't leak those in the end. 

Let's look at our first error. Failing to free every block of memory which we've malloc'd. So here, let's figure out where I used malloc. I seem to have used it maybe on line 16 to create this space for the file name. Did I ever free it? It doesn't look like I did. So I need to make sure I free this file name after I'm done using it. At what point am I done using it, though? If I scroll down below, well, I used it for fopen, but once I do that, I think I can go ahead and just simply free that file name. 

So I'll run this again. I'll do make create. I'll do valgrind dot slash create test.c. And now, those 6 bytes that were before lost are now not lost, and I am down to two errors from two contexts. OK. Let's see what else I can do. Well, I want to keep in mind, I need to fclose every file I've fopened. Well, I'll look here and see, I used fopen, but did I use fclose? I don't seem to have used fclose. 

So after I've opened this file, what should I maybe immediately do? Well, I should probably just go ahead and close it. I'll do fclose in this case, new file. Now one thing I should also check while I'm here is this-- fopen, again, isn't guaranteed to work, as we saw before. So if it doesn't work, if I can't open this file for whatever reason, it's good to check, is this file null? Is this file pointer a null? 

So I'll ask the question after I open it, is new file-- or if new file is equal to null, what should I do? I should printf, could not create file, backslash n, and then say return 1, for instance. But down below, I want to make sure that if I did successfully open this file, I should go ahead and close it and then free the file name. OK, let's do this again. So we'll do valgrind-- actually do make create. Then valgrind dot slash create, and then test.c. 
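Since create.c itself isn't reproduced here, the following is only a sketch of the pattern being described: check whether fopen gave back NULL, close the file if it opened, and free the malloc'd file name. The function name `create_and_release` is made up, and it assumes the file name passed in was allocated with malloc. Unlike the walkthrough, this sketch also frees the name on the failure path, so nothing leaks whichever way fopen goes.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: create an empty file called filename, then free
// filename. Assumes filename came from malloc. Returns 0 on success and
// 1 if the file could not be created.
int create_and_release(char *filename)
{
    FILE *new_file = fopen(filename, "w");
    if (new_file == NULL)
    {
        printf("Could not create file\n");
        free(filename); // still ours to free, even though fopen failed
        return 1;
    }

    // The file exists now, so close it right away
    fclose(new_file);

    // Done with the name as well
    free(filename);
    return 0;
}
```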

I still see two errors from two contexts. But this is at least a little better. I see all heap blocks were freed. No leaks are possible. So there's still something going wrong here. I still see two errors from two contexts. But I do see I'm no longer leaking much memory. So let's take a look at this. Our next error was using more memory than we've allocated. Where might we be doing that? 

Well, we malloc'd, in this case, the size of a character times the file name length. But if you think about strings, strings require more space than just the characters we store in them. What else do they require? They might require some place for that null character at the very end, that terminating character that says, this is the end of our string. So if we went back to that valgrind output, we might see up above-- let's see. Invalid write of size 1. 

So an invalid write means, we wrote to some space in memory we didn't have allocated to us. We overwrote something we really shouldn't have done. But how could I do this? Well, I could just simply ask for more memory. If I know that this is going to store a string, I should probably in the end ask for, of course, the number of characters times the file name length. But then, let's go ahead and add 1 to that. So let's say file name length plus 1 for that null character at the very end. 
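The plus-one fix can be illustrated with a short sketch. The helper name `copy_filename` is hypothetical, and strcpy is used here only as one way to copy the string, since the copying code in create.c isn't shown. For a name like test.c, strlen reports 6 characters, so the string needs 7 bytes once the terminating null character is counted.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: make a heap copy of name with room for the
// terminating '\0'. The caller must free the result.
char *copy_filename(const char *name)
{
    // strlen counts the characters but not the '\0' at the end
    size_t filename_length = strlen(name);

    // Ask for one extra char so the '\0' has somewhere to go
    char *filename = malloc(sizeof(char) * (filename_length + 1));
    if (filename == NULL)
    {
        return NULL;
    }

    // strcpy writes filename_length characters followed by '\0'
    strcpy(filename, name);
    return filename;
}
```

Without the `+ 1`, that final '\0' would be written one byte past the end of the allocated block, which is the kind of thing valgrind reports as an invalid write of size 1.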

So now we'll go back down below and clear the terminal. Make create, run valgrind of dot slash create, test.c. And now, hopefully, fingers crossed, we do see that all heap blocks were freed, no leaks are possible, and we have 0 errors from 0 contexts down below. Now often the best way to avoid these errors is to take some preventive measures and really be judicious or be thoughtful about your use of malloc, in this case. 

So there is probably a better way to write this same program. We might not need to even use malloc. We might not even need to worry about creating a string that has space for a null character. And so I actually encourage you as an optional additional exercise to figure out, how would you rewrite create.c, such that you would avoid these errors in the first place? That's always maybe a good first step for a programmer, is figuring out, OK, we solved these errors, but how could we avoid making them even in the first place? So I'll leave you with that. 

Now this is going to bring us to the end of our section today. We've taken a look at pointers. We've taken a look at opening files, file I/O. We've also taken a look at malloc and common memory errors. And this should equip you to go off and do this week's problem set with confidence to tackle filter and so on. And so I hope you enjoy this week's problem set. Wonderful to spend time with you. This was CS50 and we'll see you next time.