SPEAKER: So far, it's likely that most of your programs have been a bit ephemeral. You run a program like Mario or Greedy. It does something, it maybe prompts the user for some information, prints some output to the screen, but then when your program's over, there's really no evidence that it was ever run in the first place. I mean, sure, you might have left it open in the terminal window, but if you clear your screen, there's really no evidence that it existed. We don't have a means of storing persistent information, information that exists after our program has stopped running, or we haven't up to this point. Fortunately though, C does provide us with the ability to do this by implementing something called a file, a structure that basically represents a file that you would double click on your computer, if you're used to a graphical user environment. Generally when working with C, we're actually going to be working with pointers to files-- file stars. Except for a little bit when we talk about a couple of the functions that work with file pointers, you don't need to have really dug too deep into understanding pointers themselves. There's a little teeny bit where we will talk about them, but generally file pointers and pointers, while interrelated, are not exactly the same thing. Now what do I mean when I say persistent data? What is persistent data? Why do we care about it? Say, for example, that you're running a program or you've written a program that's a game, and you want to keep track of all of the user's moves so that maybe if something goes wrong, you can review the file after the game. That's what we mean when we talk about persistent data. In the course of running your program, a file is created. And when your program has stopped running, that file still exists on your system. And we can look at it and examine it. And so that program would be said to have created some persistent data, data that exists after the program has finished running.
Now all of these functions that work with creating files and manipulating them in various ways live in standard io.h, which is a header file that you've likely been pound including at the top of pretty much all of your programs because it contains one of the most useful functions for us, printf, which also lives in standard io.h. So you probably don't need to pound include any additional files in order to work with file pointers. Now every single file pointer function, or every single file I/O, input output function, accepts as one of its parameters or inputs a file pointer-- except for one, fopen, which is what you use to get the file pointer in the first place. But after you've opened the file and you get file pointers, you can then pass them as arguments to the various functions we're going to talk about today, as well as many others so that you can work with files. So there are six pretty common basic ones that we're going to talk about today. fopen and its companion function fclose, fgetc and its companion function fputc, and fread and its companion function, fwrite. So let's get right into it. fopen-- what does it do? Well, it opens a file and it gives you a file pointer to it, so that you can then use that file pointer as an argument to any of the other file I/O functions. The most important thing to remember with fopen is that after you have opened the file or made a call like the one here, you need to check to make sure that the pointer that you got back is not equal to null. If you haven't watched the video on pointers, this might not make sense. But recall that if you try and dereference a null pointer, your program will probably suffer a segmentation fault. We want to make sure that we got a legitimate pointer back. The vast majority of the time we will have gotten a legitimate pointer back and it won't be a problem. So how do we make a call to fopen? It looks pretty much like this.
File star ptr-- ptr being a generic name for file pointer-- equals fopen, and we pass in two things, a file name and an operation we want to undertake. So we might have a call that looks like this-- file star ptr1 equals fopen file1.txt. And the operation I've chosen is r. So what do you think r is here? What are the kinds of things we might be able to do to files? So r is the operation that we choose when we want to read a file. So when we make a call like this, we would basically be getting ourselves a file pointer such that we could then read information from file1.txt. Similarly, we could open file2.txt for writing, w, and so we can pass ptr2, the file pointer I've created here, as an argument to any function that writes information to a file. And similar to writing, there's also the option to append, a. The difference between writing and appending being that when you write to a file, if you make a call to fopen for writing and that file already exists, it's going to overwrite the entire file. It's going to start at the very beginning, deleting all the information that's already there. Whereas if you open it for appending, it will go to the end of the file if there's already text in it or information in it, and it will then start writing from there. So you won't lose any of the information you've written before. Whether you want to write or append sort of depends on the situation. But you'll probably know what the right operation is when the time comes. So that's fopen. What about fclose? Well, pretty simply, fclose just accepts the file pointer. And as you might expect, it closes that file. And once we've closed a file, we can't perform any more file I/O functions, reading or writing, on that file. We have to re-open the file another time in order to continue working with it using the I/O functions. So fclose means we're done working with this file. And all we need to pass in is the name of a file pointer.
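To make the open, check, close pattern concrete, here is a minimal sketch in C. The helper name try_open and its return convention are made up for illustration and are not from the lecture; the only library calls used are fopen, fclose, and printf.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): try to open `filename` with the
// given mode ("r", "w", or "a"), report whether that worked, and close the
// file again. Returns 1 on success, 0 if fopen handed back NULL.
int try_open(const char *filename, const char *mode)
{
    FILE *ptr = fopen(filename, mode);

    // Always check the pointer before passing it to any other file function.
    if (ptr == NULL)
    {
        // Nothing was opened, so there is nothing to close.
        printf("Could not open %s in mode %s\n", filename, mode);
        return 0;
    }

    printf("Opened %s in mode %s\n", filename, mode);

    // We're done working with this file.
    fclose(ptr);
    return 1;
}
```

With this sketch, try_open("file1.txt", "r") gives back 0 when file1.txt doesn't exist, because there is nothing to read. Opening a file for writing creates it if it isn't there yet, so after try_open("file1.txt", "w") the same call with "r" would succeed.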
So a couple slides ago, we fopened file1.txt for reading and we assigned that file pointer to ptr1. Now we've decided we're done reading from that file. We don't need to do any more with it. We can just fclose ptr1. And similarly, we could fclose the other ones. All right. So that's opening and closing. Those are the two basic starting operations. Now we want to actually do some interesting stuff, and the first function that we'll see that will do that is fgetc-- file get a character. That's what fgetc generally would translate to. Its goal in life is to read the next character-- or if this is your very first call to fgetc for a particular file, the first character, and after that the very next character of that file-- and store it in a character variable. As we've done here, char ch equals fgetc, pass in the name of a file pointer. Again, it's very important here to remember that in order to have this operation succeed, the file pointer itself must've been opened for reading. We can't read a character from a file pointer that we opened for writing. So that's one of the limitations of fopen, right? With the modes we've seen, we have to restrict ourselves to only performing one operation with one file pointer. If we wanted to read and write from the same file, we would have to open two separate file pointers to the same file-- one for reading, one for writing. So again, the only reason I bring that up now is because if we're going to make a call to fgetc, that file pointer must've been opened for reading. And then pretty simply, all we need to do is pass in the name of the file pointer. So char ch equals fgetc ptr1. That's going to get us the next character-- or again, if this is the first time we've made this call, the first character-- of whatever file is pointed to by ptr1. Recall that that was file1.txt. It'll get the first character of that and we'll store it in the variable ch. Pretty straightforward.
So we've only looked at three functions and already we can do something pretty neat. So if we take this ability of getting a character and we loop it-- so we continue to get characters from a file over and over and over-- now we can read every single character of a file. And if we print every character immediately after we read it, we have now read from a file and printed its contents to the screen. We've effectively concatenated that file on the screen. And that's what the Linux command cat does. If you type cat and the file name, it will print out the entire contents of the file in your terminal window. And so this little loop here is only three lines of code, but it effectively duplicates the Linux command cat. So this syntax might look a little weird, but here's what's happening here. While ch equals fgetc ptr is not equal to EOF-- it's a whole mouthful, but let's break it down just so it's clear on the syntax. I've consolidated it for the sake of space, although it's a little syntactically tricky. So this part in green right now, what is it doing? Well, that's just our fgetc call, right? We've seen that before. It's obtaining one character from the file. Then we compare that character against EOF. EOF is a special value that's defined in standard io.h, which is the end of file character. So basically what's going to happen is this loop will read a character, compare it to EOF, the end of file character. If they don't match, so we haven't reached the end of the file, we'll print that character out. Then we'll go back to the beginning of the loop again. We'll get a character, check against EOF, print it out, and so on and so on and so on, looping through in that way until we've reached the end of the file. And then by that point, we will have printed out the entire contents of the file. So again, we've only seen fopen, fclose, and fgetc and already we can duplicate a Linux terminal command.
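Written out in full, the loop just described might look something like the sketch below. The function name print_file and the character count it returns are assumptions added for illustration. One deliberate difference from the spoken version: ch is declared as an int rather than a char, because fgetc is declared to return an int, and that is what lets EOF be told apart from every real character.

```c
#include <stdio.h>

// Hypothetical helper: print every character of `filename` to the screen,
// like a very small version of cat. Returns how many characters were
// printed, or -1 if the file could not be opened for reading.
long print_file(const char *filename)
{
    FILE *ptr = fopen(filename, "r");
    if (ptr == NULL)
    {
        return -1;
    }

    int ch;          // int, not char, so EOF stays distinguishable
    long count = 0;

    // Read a character, compare it against EOF, and print it if it isn't EOF.
    while ((ch = fgetc(ptr)) != EOF)
    {
        printf("%c", ch);
        count++;
    }

    fclose(ptr);
    return count;
}
```

Calling print_file("file1.txt") would then behave roughly like typing cat file1.txt, with the return value only there so the caller can tell whether anything was read.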
As I said at the beginning, we had fgetc and fputc, and fputc was the companion function of fgetc. And so, as you might imagine, it is the writing equivalent. It allows us to write a single character to a file. Again, the caveat being, just like it was with fgetc, the file that we're writing to must've been opened for writing or for appending. If we try and use fputc on a file that we've opened for reading, we're going to suffer a bit of a mistake. But the call is pretty simple. fputc capital A ptr2, all that's going to do is it's going to write the letter A into file2.txt, which was the name of the file that we opened and assigned the pointer to ptr2. So we're going to write a capital A to file2.txt. And we'll write an exclamation point to file3.txt, which was pointed to by ptr3. So again, pretty straightforward here. But now we can do another thing. We have this example we were just going over about being able to replicate the cat Linux command, the one that prints out to the screen. Well, now that we have the ability to read characters from files and write characters to files, why don't we just substitute that call to printf with a call to fputc? And now we've duplicated cp, a very basic Linux command that we talked about way long ago in the Linux commands video. We've effectively duplicated that right here. We're reading a character and then we're writing that character to another file. Reading from one file, writing to another, over and over and over again until we hit EOF. We've got to the end of the file we're trying to copy from. And by that point we'll have written all of the characters we need to the file that we're writing to. So this is cp, the Linux copy command. At the very beginning of this video, I had the caveat that we would talk a little bit about pointers. Here is specifically where we're going to talk about pointers in addition to file pointers. So this function looks kind of scary. It's got several parameters.
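Before getting into that function, the cp idea from a moment ago can be sketched in C like this. The name copy_file and its return values are made up for illustration; the loop itself is the same read-compare loop as before, with fputc in place of printf.

```c
#include <stdio.h>

// Hypothetical helper: copy `source` to `destination` one character at a
// time. Returns 0 on success, -1 if either file could not be opened.
int copy_file(const char *source, const char *destination)
{
    // One file pointer for reading...
    FILE *in = fopen(source, "r");
    if (in == NULL)
    {
        return -1;
    }

    // ...and a second one for writing. "w" overwrites an existing file.
    FILE *out = fopen(destination, "w");
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }

    int ch;  // int, so EOF can be told apart from real characters
    while ((ch = fgetc(in)) != EOF)
    {
        fputc(ch, out);  // write what we just read to the other file
    }

    fclose(in);
    fclose(out);
    return 0;
}
```

So copy_file("file1.txt", "file2.txt") would do roughly what cp file1.txt file2.txt does for a plain text file, without any of the extra options the real command has.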
There's a lot going on here. There's a lot of different colors and texts. But really, it's just the generic version of fgetc that allows us to get any amount of information. It can be a bit inefficient if we're getting characters one at a time, iterating through the file one character at a time. Wouldn't it be nicer to get 100 at a time or 500 at a time? Well, fread and its companion function fwrite, which we'll talk about in a second, allow us to do just that. We can read an arbitrary amount of information from a file and we store it somewhere temporarily. Instead of being able to just fit it in a single variable, we might need to store it in an array. And so, we pass in four arguments to fread-- a pointer to the location where we're going to store information, how large each unit of information will be, how many units of information we want to acquire, and from which file we want to get them. Probably best illustrated with an example here. So let's say that we declare an array of 10 integers. We've just declared on the stack arbitrarily int arr 10. So that's pretty straightforward. Now what we're doing though with the fread call is we're reading size of int times 10 bytes of information. Size of int being four-- that's the size of an integer in C. So what we're doing is we're reading 40 bytes worth of information from the file pointed to by ptr. And we're storing those 40 bytes somewhere where we have set aside 40 bytes worth of memory. Fortunately, we've already done that by declaring arr, that array right there. That is capable of holding 10 four-byte units. So in total, it can hold 40 bytes worth of information. And we are now reading 40 bytes of information from the file, and we're storing it in arr. Recall from the video on pointers that the name of an array, such as arr, is really just a pointer to its first element. So when we pass in arr there, we are, in fact, passing in a pointer.
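As a rough sketch of that fread call in context, assuming a file that already holds at least 10 integers in raw binary form: the helper name read_ten_ints is hypothetical, and the "rb" mode is an assumption on my part. The b asks for binary mode, which on Linux behaves the same as plain "r". The code uses sizeof(int) rather than a hard-coded 4, so it doesn't depend on the exact size of an int.

```c
#include <stdio.h>

// Hypothetical helper: read up to 10 ints from `filename` into `arr`.
// Returns how many ints were actually read, or -1 if the file could not
// be opened.
int read_ten_ints(const char *filename, int arr[10])
{
    FILE *ptr = fopen(filename, "rb");
    if (ptr == NULL)
    {
        return -1;
    }

    // Four arguments: where to store the data, how big each unit is,
    // how many units we want, and which file to read them from.
    // arr is already a pointer to its first element, so no ampersand.
    size_t got = fread(arr, sizeof(int), 10, ptr);

    fclose(ptr);
    return (int) got;
}
```

fread gives back the number of units it actually managed to read, so a result smaller than 10 here would mean the file ran out early or something went wrong.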
Similarly we can do this-- we don't necessarily need to save our buffer on the stack. We could also dynamically allocate a buffer like this, using malloc. Remember, when we dynamically allocate memory, we're saving it on the heap, not the stack. But it's still a buffer. It still, in this case, is holding 640 bytes of information because a double takes up eight bytes. And we're asking for 80 of them. We want to have space to hold 80 doubles. So 80 times 8 is 640 bytes of information. And that call to fread is collecting 640 bytes of information from the file pointed to by ptr and storing it now in arr2. Now we can also treat fread just like a call to fgetc. In this case, we're just trying to get one character from the file. And we don't need an array to hold a character. We can just store it in a character variable. The catch, though, is that when we just have a variable, we need to pass in the address of that variable because recall that the first argument to fread is a pointer to the location in memory where we want to store the information. Again, the name of an array is a pointer. So we don't need to do ampersand array. But c, the character c here, is not an array. It's just a variable. And so we need to pass in ampersand c to indicate that that's the address where we want to store this one byte of information, this one character that we're collecting from ptr. fwrite-- I'll go through this a little more quickly-- is pretty much the exact equivalent of fread except it's for writing instead of reading, just like the others-- we've had open and close, get a character, write a character. Now it's get an arbitrary amount of information, write an arbitrary amount of information. So just like before, we can have an array of 10 integers where we already have information stored, perhaps. There should probably be some lines of code between these two where I fill arr with something meaningful. I fill it with 10 different integers.
And instead, what I'm doing is writing from arr and collecting the information from arr. And I'm taking that information and putting it into the file. So instead of it being from the file to the buffer, we're now going from the buffer to the file. So it's just the reverse. So again, just like before, we can also have a heap chunk of memory that we've dynamically allocated and read from that and write that to the file. And we can also have a single variable capable of holding one byte of information, such as a character. But again, we need to pass in the address of that variable when we want to read from it. So we can write the information we find at that address to the file pointer, ptr. There are lots of other great file I/O functions that do various things besides the ones we've talked about today. A couple of the ones you might find useful are fgets and fputs, which are the equivalent of fgetc and fputc but for reading and writing a string. Instead of a single character, fgets will read an entire string from a file. fprintf, which basically allows you to use printf to write to a file. So just like you can do the variable substitution using the placeholders percent i and percent d, and so on, with printf you can similarly take the printf string and print something like that to a file. fseek-- if you have a DVD player is the analogy I usually use here-- is sort of like using your rewind and fast forward buttons to move around the movie. Similarly, you can move around the file. One of the things inside that file structure that C creates for you is an indicator of where you are in the file. Are you at the very beginning, at byte zero? Are you at byte 100, byte 1,000, and so on? You can use fseek to arbitrarily move that indicator forward or backward. And ftell, again similar to a DVD player, is like a little clock that tells you how many minutes and seconds you are into a particular movie. Similarly, ftell tells you how many bytes you are into the file.
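To tie a few of these together, here is a small sketch that goes from a buffer to a file with fwrite, and then uses fseek and ftell to find out how many bytes ended up in that file. The helper names save_doubles and file_size are invented for illustration, and the "wb" and "rb" modes are the binary variants of "w" and "r", which behave the same as the plain ones on Linux. Seeking to the end of a file this way works on the usual Linux setup, though the C standard doesn't promise it everywhere.

```c
#include <stdio.h>

// Hypothetical helper: write `count` doubles from `values` into `filename`.
// Returns how many doubles were written, or -1 if the file could not be
// opened.
int save_doubles(const char *filename, const double *values, size_t count)
{
    FILE *ptr = fopen(filename, "wb");
    if (ptr == NULL)
    {
        return -1;
    }

    // Same four arguments as fread, but the data flows buffer -> file.
    size_t written = fwrite(values, sizeof(double), count, ptr);

    fclose(ptr);
    return (int) written;
}

// Hypothetical helper: report how many bytes are in `filename` by fast
// forwarding to the end with fseek and asking ftell where we are.
// Returns -1 on failure.
long file_size(const char *filename)
{
    FILE *ptr = fopen(filename, "rb");
    if (ptr == NULL)
    {
        return -1;
    }

    long size = -1;
    if (fseek(ptr, 0, SEEK_END) == 0)  // move the indicator to the end
    {
        size = ftell(ptr);             // how many bytes into the file we are
    }

    fclose(ptr);
    return size;
}
```

If 80 doubles were saved this way, file_size on the same file would be expected to report 80 times sizeof(double) bytes, which is the 640 from earlier when a double takes up eight bytes.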
feof is a different way of detecting whether you've reached the end of the file. And ferror is a function that you can use to detect whether something has gone wrong working with a file. Again, this is just scratching the surface. There are still plenty more file I/O functions in standard io.h. But this will probably get you started working with file pointers. I'm Doug Lloyd. This is CS50.