[Walkthrough - Problem Set 4] [Zamyla Chan - Harvard University] [This is CS50. - CS50.TV] All right. Hello, everyone, and welcome to Walkthrough 4. 

Today our pset is Forensics. Forensics is a really fun pset that involves dealing with bitmap files to discover who committed a crime. Then we're going to resize some bitmap files, then we're also going to deal with a really fun part called Recover, in which we're basically handed a memory card in which someone has accidentally deleted all of their files, and we're asked to recover those files. 

But first, before we get into the pset, I really just want to congratulate everyone. We're about at the midpoint of this course. Quiz 0 is behind us, and we're at pset4, so essentially, we're halfway. We've come a long way if you look back to your psets, pset0 and pset1, so congratulate yourself about that, and we're going to get into some really fun stuff. 

So our toolbox for this pset, again, instead of running sudo yum -y update, we're able to just run update50 if you're at version 17.3 and above of the appliance. So be sure to run update50--it's a lot easier, a few fewer characters--to make sure that you're at the latest version of the appliance. It's especially important to run update50 when we start using CS50 Check. So make sure that you do that. 

For all of the sections for this pset, we're going to be dealing with file inputs and outputs, file I/O. We're going to be going over a lot of programs that deal with arrays pointing to files and things like that, so we want to make sure that we're really familiar and comfortable dealing with how to input and output into files. 

In the distribution code for this pset is a file called copy.c, and that's what we're going to find is going to be really useful to us because we're going to end up actually copying the copy.c file and just altering it slightly to be able to achieve the first 2 parts of the problem set. 

And so then as I mentioned before, we are dealing with bitmaps as well as JPEGs. So really understanding the structure of how those files are organized, how we can really translate the 0s and 1s into structs and things that we can actually understand and interpret and edit, that will be really important, so going into JPEG and bitmap files and understanding the structure of those. 

Pset4, as usual, starts with a section of questions. Those will deal with file I/O and get you accustomed to that. Then part 1 is Whodunit, in which you're given a bitmap file that looks kind of like red dots all over. And then basically what we're going to do is take this file and just edit it slightly into a version that we can read. Essentially, once we finish, we'll have the same file, except we'll be able to see the hidden message hidden by all those red dots. Then Resize is a program that, given a file and then given the name of the file that it outputs and then given a number as well, will actually resize that bitmap by that integer value. Then lastly, we have the Recover pset. We are given a memory card and then have to recover all of the photos that have been accidentally deleted, but, as we'll learn, not actually deleted and removed from the file; we just kind of lost where they were in the file, but we're going to recover that. 

Great. So going into file I/O specifically, this is a whole list of functions that you'll be using. You've already seen a little bit of the basics of fopen, fread, and fwrite, but we're going to look further into some file I/O functions such as fputc, in which you just write one character at a time, and fseek, where you move the file position indicator forwards and backwards, and then some others. But we'll go into that a bit later during the pset. 

So first, just to get into file I/O before we go into the pset, to open a file, for instance, what you have to do is actually set a pointer to that file. So we have a FILE* pointer. In this case, I'm calling it an in pointer because that's going to be my infile. And so I'm going to use the function fopen and then the name of the file and then the mode in which I'm going to be dealing with the file. So there's "r" in this case for reading, "w" for writing, and then "a" for appending. For instance, when you're dealing with an infile and all you want to do is read the bits and bytes stored there, then you're probably going to want to use "r" as your mode. When you want to actually write, kind of make a new file, then what we're going to do is we're going to open the new file and use the "w" mode for writing. 
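As a rough sketch of the opening step just described, something like the following could work. The helper name `open_pair` and its return convention are made up for illustration; they are not part of the distribution code.

```c
#include <stdio.h>

// Hypothetical helper: open an input file for reading and an output file
// for writing. Returns 0 on success, 1 if either fopen call fails.
int open_pair(const char *in_name, const char *out_name,
              FILE **in, FILE **out)
{
    // "r": read the bytes of a file that already exists
    *in = fopen(in_name, "r");
    if (*in == NULL)
    {
        return 1;
    }

    // "w": create the file (or empty an existing one), then write to it
    *out = fopen(out_name, "w");
    if (*out == NULL)
    {
        fclose(*in);
        return 1;
    }

    return 0;
}
```

Checking each pointer against NULL right after fopen matters because fopen returns NULL when the file can't be opened, for instance when the infile doesn't exist.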

So then when you're actually reading into the files, the structure is as follows. First you include the pointer to the struct that will contain the bytes that you're reading. So that's going to be the end location of the bytes that you're reading. You're then going to indicate the size, like basically how many bytes your program has to read in to the file, the size basically one element is, and then you're going to specify how many elements you want to read. And then finally, you have to know where you're reading from, so that's going to be your in pointer. I color-coded these because fread is also very similar to fwrite, except you want to make sure that you use the right order, make sure that you're actually writing to or reading from the right file. 

So then as before, if we have the size of the element as well as the number of elements, then we can play around here a little bit. Say I have a DOG struct and so then I want to read two DOGs at a time. What I could do is say the size of one element is going to be the size of one DOG and I'm going to actually read two of them. Alternatively, what I could do is say I'm just going to read one element and that one element is going to be the size of two DOGs. So that's how you can play around with size and number depending on what's more intuitive to you. 
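To make the two DOG options concrete, here is a small sketch. The fields of `DOG` and the two function names are invented for the example; the only real call is fread, whose return value is the number of complete elements it read.

```c
#include <stdio.h>

// Hypothetical struct standing in for the DOG on the slide.
typedef struct
{
    char name[16];
    int age;
} DOG;

// Option 1: two elements, each the size of one DOG.
// fread returns 2 when both DOGs were read.
size_t read_two_dogs(DOG dogs[2], FILE *in)
{
    return fread(dogs, sizeof(DOG), 2, in);
}

// Option 2: one element that is the size of two DOGs.
// fread returns 1 when the whole pair was read.
size_t read_dog_pair(DOG dogs[2], FILE *in)
{
    return fread(dogs, 2 * sizeof(DOG), 1, in);
}
```

Both versions pull the same bytes out of the file. The difference shows up in the return value, so whichever one you pick, compare the result against the count you passed in.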

All right. So now we get to writing files. When you want to write a file, the first argument is where the data is coming from. So that's basically the data that you are going to write into the file, and the file itself is the out pointer at the end. So when you're dealing with the pset, make sure you don't get confused. Maybe have the definitions side by side. You can pull the definitions up in the manual by typing m-a-n and then fwrite, for instance, in the terminal, or you can refer back to this slide and make sure that you're using the right one. So again, for fwrite, when you have a file that you want to write into, that's going to be the last argument and that's going to be a pointer to that file.

So then that's how we deal with writing perhaps several bytes at a time, but say you want to write just one single character. As we'll see later, in the bitmaps we'll have to do that. That's when we can use fputc, essentially just putting one character at a time, chr, into the file pointer, and that's our out pointer there.

So then whenever we read or write in a file, the file is keeping track of where we are. So it's a sort of cursor, the file position indicator. And so whenever we write or read again into a file, the file actually remembers where it is, and so it continues from where the cursor is. This can be beneficial when you want to, say, read in a certain amount to do something and then read in the following amount, but sometimes we might want to go back or actually start from a certain reference value. So then what the fseek function does is allow us to move the cursor in a certain file a certain number of bytes. And then what we have to do is specify where the reference value is. So either it moves forward or backward from where the cursor currently is, or we can specify that it should just move in from the beginning of the file or from the end of the file. And so you can pass in negative or positive values to amount, and that will move the cursor either forwards or backwards. 
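Here is a minimal sketch that puts fwrite, fputc, and fseek together. The function name `write_then_seek` and the byte values are arbitrary choices for the example, and it assumes the file was opened in a mode that allows both reading and writing.

```c
#include <stdio.h>

// Write three bytes with fwrite and one more with fputc, then use fseek
// to move the cursor back and read one byte.
// Returns that byte, or -1 if any step fails.
int write_then_seek(FILE *fp)
{
    unsigned char data[3] = {0x10, 0x20, 0x30};

    // fwrite: the data comes first, the destination file comes last
    if (fwrite(data, sizeof(unsigned char), 3, fp) != 3)
    {
        return -1;
    }

    // fputc: one character at a time; the cursor is now at offset 4
    if (fputc(0x40, fp) == EOF)
    {
        return -1;
    }

    // A negative amount with SEEK_CUR moves backward from the current
    // position: offset 4 becomes offset 1
    if (fseek(fp, -3, SEEK_CUR) != 0)
    {
        return -1;
    }

    // Reads the byte at offset 1, which is 0x20
    return fgetc(fp);
}
```

The third argument to fseek is the reference point: SEEK_CUR for the current position, SEEK_SET for the beginning of the file, and SEEK_END for the end.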

Before we get into the other psets, any questions on file I/O? Okay. As we get into more examples, feel free to stop me for questions. 

So in Whodunit, you're handed a bitmap file similar to this red one on the slide, and it looks like this--a bunch of red dots--and you don't really know what's written. If you squint, you may be able to see a slight bluish color inside the middle. Essentially, that's where the text is stored. There was a murder that happened, and we need to find out who did it. In order to do that, we need to kind of convert this image into a readable format. If you guys ever encountered this, sometimes there would be little kits where you would have a magnifying glass with a red film. Anyone? Yeah. So you would be handed something like this, you would have a magnifying glass with the red film over it, you would put it over the image, and you would be able to see the message hidden therein. We don't have a magnifying glass with red film, so instead we're going to kind of create our own in this pset. And so the user is going to input whodunit, then the clue, .bmp, so that's the infile, that's the red dot message, and then they're saying verdict.bmp is going to be our outfile. So it's going to create a new bitmap image similar to the clue one except in a readable format where we can see the hidden message. 

Since we're going to be dealing with editing and manipulating bitmaps of some sort, we're going to dive into the structure of these bitmap files. We went over these a little bit in lecture, but let's look into them some more. Bitmaps are essentially just an arrangement of bytes where we've specified which bytes mean what. So here is a map of the bitmap image saying that it starts with some header fields, starts with some information in there. You see that at about byte number 14 the size is indicated of the bitmap image, and it continues on. But then what we're really interested in here is starting around byte number 54. We have these RGB triples. Those contain the actual pixels, the color values. Everything above that in the header is some information corresponding to the size of the image, the width of the image, and the height. When we go into padding later on, we'll see why the size of the image might be different than the width or the height.

So then to represent these--these bitmap images are sequences of bytes--what we could do is say okay, I'm going to remember that at index 14, that's where the size is, for instance, but instead what we're going to do to make this easier is encapsulate it in a struct. And so we have two structs made for us, a BITMAPFILEHEADER and a BITMAPINFOHEADER, and so whenever we read in from that file, by default it's going to be going in order, and so in order it's also going to fill in variables such as biWidth and biSize.

And then finally, every pixel is represented by three bytes. The first one is the amount of blue in the pixel, the second is the amount of green, and finally, the amount of red, where 0 is essentially no blue or no green or no red and then ff is the maximum value. These are hexadecimal values. So then if we have ff0000, then that corresponds to the maximum amount of blue and then no green and no red, so then that would give us a blue pixel. Then if we have ff's all across the board, then that means that we have a white pixel. This is the opposite of what we typically mean when we say RGB. It's actually going BGR. 

So if we actually look into an example of a bitmap image--let me pull one up here. It's a little small. I'm zooming in, and we can see it's pixelated. It looks like blocks of color. You have white blocks and then red blocks. If you play in Microsoft Paint, for instance, you could make something like that by basically just painting certain squares in a specific order. So then what this translates to in the bitmap is as follows. Here we have first white pixels, that all 6 are f's, and then we have red pixels, indicated by 0000ff. And so the sequence of bytes that we have indicates how the bitmap image is going to look. So what I've done here is just written out all those bytes and then colored in the red so that you can kind of see, if you squint a little bit, how that kind of indicates a smiley face. 

The way that bitmap images work is I envision it basically as a grid. And so by default, every row of the grid has to be a multiple of 4 bytes. If we look at a bitmap image, you're filling in every value. For instance, you might have a red here, a green here, a blue here, but you have to make sure that the image is filled in with a multiple of four bytes. So if I want my image to be three blocks wide, then I would have to put an empty value in the last one to make it a multiple of four. So then I would add in something which we're calling padding. I'm just going to indicate that there with an x. Now say we want an image that is 7 pixels long, for instance. We have 1, 2, 3, 4, 5, 6, 7, and all of that is filled in with color. The way that bitmap images work is that we need an 8th. Right now we have 1, 2, 3, 4, 5, 6, 7. We need 8 spaces for the bitmap image to read correctly. So then what we have to do is add in just a bit of padding to make sure that all of the rows are uniform and that all of the rows are a multiple of 4. And so I previously indicated padding as an x or a squiggly line, but in the actual bitmap images the padding is indicated by a hexadecimal 0. So that would be a single character, 0.

What might come in handy is the xxd command. Similar to what I did before with the smiley, when I printed out what each color would be for the pixel and then color-coded it, when you run xxd with the following flags, it will print out what the colors are for those pixels. Over here, the -s 54 says that I'm going to start at the 54th byte because before that, remember if we look back to the map of the bitmaps, that's all the header information and things like that. But what we really care about is the actual pixels that indicate the color. So by adding in that flag, -s 54, we're able to see the color values. And don't worry about the complicated flags and things like that. In the problem set spec, you'll have directions on how to use xxd to display the pixels.

So if you see here, it kind of looks like a green box, this small thing. I've color-coded the 00ff00 as basically saying no blue, a lot of green, and no red. So that corresponds to green. As you see here, we see a green rectangle. This green rectangle is only 3 pixels wide, which is 9 bytes per row, so then what we have to do to make sure that each row is a multiple of 4 bytes is add in extra padding. And so then that's how you see these 0s here. This will actually be the result of your Resize pset, essentially taking the small bitmap and then enlarging it by 4. And so what we see is that this image is 12 pixels wide, which is 36 bytes per row, and 36 is a multiple of 4, so we don't see any 0s at the end because we don't need to add any. The row is already a multiple of 4 bytes. 
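One detail worth spelling out: the blocks on the slide are a simplified picture, and in the file the padding is counted in bytes, not pixels. Each pixel is 3 bytes, so a sketch of the calculation (the function name `padding_for_width` is made up for this example) could look like this:

```c
// Number of padding bytes needed so that a row of `width` pixels,
// at 3 bytes per pixel, takes up a multiple of 4 bytes.
int padding_for_width(int width)
{
    int row_bytes = width * 3;

    // The outer % 4 turns a result of 4 into 0 for rows that already fit
    return (4 - row_bytes % 4) % 4;
}
```

For the green rectangle, 3 pixels are 9 bytes, so each row gets 3 bytes of padding to reach 12. For the 12-pixel-wide version, 36 bytes is already a multiple of 4, so the padding is 0.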

Okay. Any questions about padding? Okay. Cool. 

As I mentioned before, the bitmaps are just a sequence of bytes. And so instead of needing to keep track of exactly which byte number corresponds to a specific element, we actually have a struct to represent that. So what we have is an RGBTRIPLE struct. Whenever you have an instance of an RGB triple, because this is a typedef struct, you can access the rgbtBlue variable, and similarly the rgbtGreen and rgbtRed variables, which will indicate how much blue, green, and red, respectively, you have. 

So if we have the blue variable set to 0, the green set to ff, which is the maximum value you can have, and then the red variable set to 0, then what color would this particular RGB triple represent? >>[student] Green. Green. Exactly. It's going to be useful to know that whenever you have an instance of an RGB triple, you can actually access the amount of color--blue, green, and red--separately. 
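As an illustration of accessing the three colors separately, here is a small sketch. The struct below is a stand-in written with `uint8_t` fields rather than a copy of the one in the distribution code, and `is_pure_green` is a made-up helper.

```c
#include <stdint.h>

// Stand-in for the pixel struct. The fields are declared in the order
// blue, green, red to match the order of the bytes in the file.
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

// Hypothetical helper: returns 1 if the pixel is 00 ff 00
// (no blue, maximum green, no red), and 0 otherwise.
int is_pure_green(RGBTRIPLE px)
{
    return px.rgbtBlue == 0x00 && px.rgbtGreen == 0xff && px.rgbtRed == 0x00;
}
```

Because each field is a single byte, one of these structs lines up with the three bytes of one pixel, which is what lets fread fill it in directly.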

Now that we've talked about the structure of that, let's take a look at the BMP file. These are structs made for you. Here we have a BITMAPFILEHEADER struct. Of interest is the size. Later on, we have the info header, which has a few more things that are interesting to us, namely the size, the width, and the height. As we'll go into later, when you read in to the file, it automatically reads in because we've set the order to be the same. So the biSize will contain the right bytes that correspond to the actual size of the image. And then here, lastly, as we've talked about, we have the RGBTRIPLE typedef struct. We have an rgbtBlue, Green, and Red associated with it. 

Great. Okay. Now we understand bitmaps a little bit. We have a file header and an info header associated with each one, and then after that, we have the interesting stuff of the colors. Those colors are represented by RGBTRIPLE structs, and those, in turn, have three values associated with the blue, the green, and the red. 

So now, we can kind of think about Recover a bit. Sorry. Think about Whodunit. When we have our clue file, then what we want to do is read in to it pixel by pixel and then somehow change those pixels so that we can output it into a readable format. And so to output it, we're going to write pixel by pixel into the verdict.bmp file. That's kind of a lot to do. We realize that. So what we've done is we've actually provided you with copy.c. What copy.c does is just makes an exact copy of a given bitmap file and then outputs it. So this already opens the file for you, reads in pixel by pixel, and then writes it in into an output file. 

Let's take a look at that. This is ensuring proper usage, getting the filenames here. What this does is it sets the input file to be what we've passed in in the infile here, which is our second command-line argument. Checks to make sure that we can open the file. Checks to make sure we can make a new outfile here. Then what this does here, it just basically starts reading in to the bitmap file from the beginning. The beginning, as we know, contains the BITMAPFILEHEADER, and so those sequences of bits will directly fill in the BITMAPFILEHEADER. So what we have here is saying that BITMAPFILEHEADER bf-- that's our new variable of type BITMAPFILEHEADER-- we're going to put inside bf what we read from in pointer, which is our infile. How much do we read? We read in how many bytes we need to contain the whole BITMAPFILEHEADER. Similarly, that's what we do for the info header. So we're continuing along our file in the infile, and we're reading those bits and bytes, and we're plugging them directly in into these instances of the variables that we're making. Here we're just making sure that the bitmap is a bitmap. 
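The header-reading step could be sketched as follows. The struct layouts follow the standard 14-byte and 40-byte BMP headers, but treat them as a stand-in for the definitions that ship with the pset. The sketch assumes a little-endian machine and a compiler that supports the GCC/Clang `packed` attribute, and `read_headers` is a name invented for the example.

```c
#include <stdint.h>
#include <stdio.h>

// 14-byte file header; packed so the compiler adds no gaps between fields
typedef struct
{
    uint16_t bfType;
    uint32_t bfSize;
    uint16_t bfReserved1;
    uint16_t bfReserved2;
    uint32_t bfOffBits;
} __attribute__((__packed__)) BITMAPFILEHEADER;

// 40-byte info header
typedef struct
{
    uint32_t biSize;
    int32_t biWidth;
    int32_t biHeight;
    uint16_t biPlanes;
    uint16_t biBitCount;
    uint32_t biCompression;
    uint32_t biSizeImage;
    int32_t biXPelsPerMeter;
    int32_t biYPelsPerMeter;
    uint32_t biClrUsed;
    uint32_t biClrImportant;
} __attribute__((__packed__)) BITMAPINFOHEADER;

// Read both headers in file order. Returns 1 if they look like a
// 24-bit uncompressed bitmap, 0 otherwise.
int read_headers(FILE *in, BITMAPFILEHEADER *bf, BITMAPINFOHEADER *bi)
{
    if (fread(bf, sizeof(BITMAPFILEHEADER), 1, in) != 1 ||
        fread(bi, sizeof(BITMAPINFOHEADER), 1, in) != 1)
    {
        return 0;
    }

    // 0x4d42 is the two characters "BM"; pixels start at byte 54
    return bf->bfType == 0x4d42 && bf->bfOffBits == 54 &&
           bi->biSize == 40 && bi->biBitCount == 24 &&
           bi->biCompression == 0;
}
```

After both reads, the cursor sits at byte 54, right at the first RGB triple.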

Now we have an outfile, right? So as it stands when we create it, it's essentially empty. So we have to basically create a new bitmap from scratch. What we do is we have to make sure that we copy in the file header and the info header just like the infile has. What we do is we write--and remember that bf is the variable of type BITMAPFILEHEADER, so what we do is we just use that content to write into the outfile. Here, remember we talked about padding, how it's important to make sure that the number of bytes in each row is a multiple of 4. This is a pretty useful formula to calculate how much padding you have given the width of your file. I want you guys to remember that in copy.c we have a formula for calculating padding. Okay? So everyone remember that. Great.

So then what copy.c does next is it iterates over all of the scanlines. It goes through the rows first and then stores every triple that it reads and then writes it into the outfile. So then here we're reading only one RGB triple at a time and then putting that same triple into the outfile. The tricky part is that the padding isn't an RGB triple, and so we can't just read that padding amount of RGB triples. What we have to do is actually just move our file position indicator, move our cursor, to skip over all the padding so that we're at the next row. And then copy.c shows you how you might want to add the padding. So we've calculated how much padding we need, so that means that we need padding number of 0s. What this does is a for loop that puts padding number of 0s into our outfile. And then finally, you close both files. You close the infile as well as the outfile. 
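The pixel-copying loop described above could be sketched like this. It is a simplified reconstruction rather than the distribution code itself: the function name `copy_scanlines` is invented, the `RGBTRIPLE` here is a stand-in, and it assumes both cursors already sit just past the headers.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Stand-in pixel struct: blue, green, red, one byte each
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

// Copy every scanline from in to out. Returns 0 on success,
// 1 if the infile runs out of pixels early.
int copy_scanlines(FILE *in, FILE *out, int width, int height)
{
    // Padding bytes per row, so each row is a multiple of 4 bytes
    int padding = (4 - (width * (int) sizeof(RGBTRIPLE)) % 4) % 4;

    // The height can be negative in the header, hence abs
    for (int i = 0; i < abs(height); i++)
    {
        // Read one triple at a time and write that same triple out
        for (int j = 0; j < width; j++)
        {
            RGBTRIPLE triple;
            if (fread(&triple, sizeof(RGBTRIPLE), 1, in) != 1)
            {
                return 1;
            }
            fwrite(&triple, sizeof(RGBTRIPLE), 1, out);
        }

        // Padding isn't a triple, so skip over it in the infile...
        fseek(in, padding, SEEK_CUR);

        // ...and write that many 0s into the outfile
        for (int k = 0; k < padding; k++)
        {
            fputc(0x00, out);
        }
    }
    return 0;
}
```

Whatever bytes the infile happened to have in its padding are never copied; the outfile always gets fresh 0s.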

So that's how copy.c works, and that's going to be pretty useful. Instead of just actually directly copying and pasting it or just looking at it and typing in whatever you want, you might just want to execute this command in the terminal, cp copy.c whodunit.c, which will create a new file, whodunit.c, that contains the exact same content as copy does. So then what we can do is use that as a framework upon which to build and edit for our whodunit file. 

These are our to-dos for Whodunit, but copy.c actually takes care of most of them for us. So all we need to do next is change the pixels as needed to actually make the file readable. Remember that for a given pixel triple, so for a given variable of type RGBTRIPLE, you can access the blue, green, and red values. That's going to come in handy because if you can access them, that means that you can also check them, and that means that you can also change them. 

So when we went back to our red magnifying glass example, basically, that was acting as a sort of filter for us. So what we want to do is we want to filter all of the triples that are coming in. There are several different ways to do this. Basically, you can have whatever type of filter you want. Maybe you want to change all red pixels or maybe you want to change a different color pixel to a different color. That's up to you. Remember that you can check what color the pixel is and then you can also change it as you're going through. 
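Just to show the check-then-change idea, here is one possible filter. Turning pure red pixels white is an arbitrary choice made for this sketch, not necessarily the filter that reveals the message, and `filter_pixel` is an invented name.

```c
#include <stdint.h>

// Stand-in pixel struct: blue, green, red, one byte each
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

// Hypothetical filter: if the pixel is pure red (00 00 ff in BGR order),
// return a white pixel; otherwise return the pixel unchanged.
RGBTRIPLE filter_pixel(RGBTRIPLE px)
{
    if (px.rgbtBlue == 0x00 && px.rgbtGreen == 0x00 && px.rgbtRed == 0xff)
    {
        px.rgbtBlue = 0xff;
        px.rgbtGreen = 0xff;
    }
    return px;
}
```

In the copy.c framework, a call like this would sit between reading a triple from the infile and writing it to the outfile.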

Okay. So that's Whodunit. Once you run Whodunit, you'll know who the culprit of the crime was. 

Now we're going to go to Resize. We're going to still be dealing with bitmaps. What we're going to do is we're going to have an input bitmap and then we're going to pass in a number and then get an outfile bitmap that's basically our infile scaled by n. Say my file was just one pixel large. Then if my n was 3, scaling by 3, then I would repeat that pixel n number of times across, so 3 times, and then also repeat it 3 times going down as well. So you see I'm scaling it vertically as well as horizontally. 

And then here's an example. If you have n = 2, you see that the first blue pixel there repeated two times horizontally as well as two times vertically. And then that continues on, and so you have a direct scaling of your original image by two. 

So then if we were to detail the pseudocode for this, we want to open the file. And then if we go back here, we see that the width for the outfile is going to be different than the width for the infile. What does that mean? That means that our header information is going to change. And so what we'll want to do is update the header info, knowing that when we read in the files, if you're operating on the copy.c framework, we already have a variable that indicates what the size is and things like that. So once you have that, what you might want to do is change those particular variables. Remember, if you have a struct, how you access the variables within that. You use the dot operator, right? So then using that, you know that you'll need to change the header info. So here's just a list of the actual elements that are going to be changing in your file. The file size is going to be changing, the image size, as well as the width and the height. So then going back to the map of the bitmaps, look at whether it's the file header or the info header that contains that information and then change as needed. Again, say cp copy.c resize.c. That means that resize.c now contains everything that's contained inside copy because copy provides us a way of reading in each scanline pixel by pixel. Except now, instead of just changing the values like we did in Whodunit, what we want to do is write in multiple pixels as long as our n is greater than 1. 

Then what we want to do is we want to stretch it horizontally by n, as well as stretch it vertically by n. How might we do this? Say your n is 2 and you have this given infile. Your cursor is going to start at the first one, and what you want to do if n is 2, you want to print in 2 of those. So you print in 2 of those. Then your cursor is going to move to the next pixel, which is the red one, and it's going to print out 2 of those red ones, appending it onto what it's done before. Then the cursor will move to the next pixel and draw in 2 of those. If you look back to the copy.c framework, what this does right here is it creates a new instance of an RGB triple, a new variable called triple. And here when it reads into it, it reads from the infile 1 RGBTRIPLE and stores it inside of that triple variable. So then you actually have a variable representing that particular pixel. Then when you write, what you might want to do is encase the fwrite statement into a for loop that writes it into your outfile as many times as needed. That's simple enough. Just basically repeat the writing process n number of times to scale it horizontally. 
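Wrapping the fwrite in a for loop might look like the following sketch. It handles only the pixels of a single row and leaves padding out entirely; `stretch_row` is an invented name and the `RGBTRIPLE` is a stand-in.

```c
#include <stdint.h>
#include <stdio.h>

// Stand-in pixel struct: blue, green, red, one byte each
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

// Read `width` pixels from in and write each one n times to out.
// Returns 0 on success, 1 if a pixel could not be read.
int stretch_row(FILE *in, FILE *out, int width, int n)
{
    for (int j = 0; j < width; j++)
    {
        // One read per infile pixel...
        RGBTRIPLE triple;
        if (fread(&triple, sizeof(RGBTRIPLE), 1, in) != 1)
        {
            return 1;
        }

        // ...and n writes of that same pixel
        for (int k = 0; k < n; k++)
        {
            fwrite(&triple, sizeof(RGBTRIPLE), 1, out);
        }
    }
    return 0;
}
```

With n = 2, a row of blue then red comes out as blue, blue, red, red, matching the picture on the slide.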

But then we have to remember that our padding is going to change. Previously, say we had something of length 3. Then we would just add in how much padding? Just one more to make it a multiple of 4. But say we're scaling this particular image by n = 2. So then how many blue pixels would we have at the end? We would have 6. 1, 2, 3, 4, 5, 6. All right. 6 isn't a multiple of 4. What's the nearest multiple of 4? That's going to be 8. So we're actually going to have 2 characters of padding there. 

Does anyone remember if we have a formula to calculate padding and where that might be? [inaudible student response] >>Yeah, copy.c. Right. There is a formula in copy.c to calculate how much padding you have given a particular width of the bitmap image. So then that's going to be useful when you need to add in a certain amount of padding to actually figure out how much padding you need to add. But one note, though, is that you want to make sure that you're using the right size. Just be careful because you're basically going to be dealing with two bitmap images. You want to make sure that you're using the right one. When you're calculating the padding for the outfile, you want to use the width of the outfile and not the width of the previous one. 
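To keep the infile and outfile numbers apart, one option is to compute all of the outfile's values in one place. This is only a sketch: `scaled_info` and `scale_info` are invented names, and it assumes 3 bytes per pixel and 54 bytes of headers (14 for the file header plus 40 for the info header).

```c
#include <stdlib.h>

// Hypothetical bundle of the values that change when resizing
typedef struct
{
    int width;       // outfile width in pixels
    int height;      // outfile height in pixels (sign kept from infile)
    int padding;     // padding bytes per outfile row
    int size_image;  // bytes of pixel data, padding included
    int file_size;   // size_image plus the two headers
} scaled_info;

scaled_info scale_info(int in_width, int in_height, int n)
{
    scaled_info s;
    s.width = in_width * n;
    s.height = in_height * n;

    // Padding uses the OUTFILE width, not the infile width
    s.padding = (4 - (s.width * 3) % 4) % 4;

    s.size_image = (s.width * 3 + s.padding) * abs(s.height);
    s.file_size = s.size_image + 54;
    return s;
}
```

For a 3 by 3 infile and n = 2, this gives a width of 6, 2 bytes of padding per row, 120 bytes of image data, and a file size of 174. Those are the values that would go into the width, height, image size, and file size fields of the outfile's headers.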

Great. That kind of takes care of stretching a whole bitmap image horizontally. But what we want to do is actually stretch it vertically as well. This is going to be a little bit trickier because when we've finished copying a row and writing that row, our cursor is going to be at the end. So if we read again, then it's just going to read in to the next line. So what we want to do is kind of find some way of copying those rows again or just kind of taking that row and then rewriting it again. As I kind of alluded to, there are several different ways to do this. What you could do is as you're going through and reading through the particular scanline and changing it as necessary, then kind of store all of those pixels in an array. Then later on you know that you'll need to print out that array again, and so you can just use that array to do that. Another way to do it is you could copy down one row, understand that you need to copy that again, so actually move your cursor, and that's going to be using the method fseek. You could move your cursor all the way back and then repeat the copy process again. 

So if our scaling number is n, then how many times would we have to go back and rewrite a line? >>[student] n - 1. >>Yeah, perfect. n - 1. We've done it once already, so then we'll want to repeat the going back process n - 1 amount of times. Okay. So there you have your resize function. 
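The fseek approach, with its n - 1 trips back to the start of the row, could be sketched like this. The function name `resize_row` is made up, the `RGBTRIPLE` is a stand-in, and the sketch assumes the infile cursor starts at the beginning of a row.

```c
#include <stdint.h>
#include <stdio.h>

// Stand-in pixel struct: blue, green, red, one byte each
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

// Write one infile row to out n times, each copy stretched by n.
// Returns 0 on success, 1 if a pixel could not be read.
int resize_row(FILE *in, FILE *out, int width, int n)
{
    long row_bytes = (long) width * (long) sizeof(RGBTRIPLE);
    int in_padding = (4 - (width * 3) % 4) % 4;
    int out_padding = (4 - (width * n * 3) % 4) % 4;

    for (int rep = 0; rep < n; rep++)
    {
        for (int j = 0; j < width; j++)
        {
            RGBTRIPLE triple;
            if (fread(&triple, sizeof(RGBTRIPLE), 1, in) != 1)
            {
                return 1;
            }
            for (int k = 0; k < n; k++)
            {
                fwrite(&triple, sizeof(RGBTRIPLE), 1, out);
            }
        }

        // Each outfile row gets the outfile's padding
        for (int k = 0; k < out_padding; k++)
        {
            fputc(0x00, out);
        }

        // Go back to the start of the row, but only n - 1 times
        if (rep < n - 1)
        {
            fseek(in, -row_bytes, SEEK_CUR);
        }
    }

    // Done with this row: skip the infile's padding
    fseek(in, in_padding, SEEK_CUR);
    return 0;
}
```

The array approach mentioned earlier would replace the fseek with a buffer that holds the stretched row and gets written out n times.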

Now we can get to a really fun part, my favorite pset, which is Recover. Instead of bitmaps, this time we're dealing with JPEGs. We're actually not given a file just of JPEGs, we're given basically a raw memory card format. And so this contains a bit of info and garbage values in the beginning, and then it starts and it has a bunch of JPEG files. However, we're handed a card where we've deleted the photos; essentially, we've forgotten where the photos are located within the card. So then our task in Recover is to go through this card format and find those pictures again. 

Luckily, the structure of JPEG files and the card file is a bit helpful. It definitely could have been a bit trickier if it weren't in this particular format. Every JPEG file actually starts with one of two possible sequences, listed above. Basically, whenever you have a new JPEG file, it starts with either the sequence ffd8 ffe0 or the other one, ffd8 ffe1. Another helpful thing to know is that JPEGs are stored contiguously. So whenever one JPEG file ends, the other one starts. So there aren't any in-between values there. Once you hit the start of a JPEG, if you've already been reading a JPEG, you know that you've hit the end of the previous one and the start of the next one. 

To kind of visualize this, I made a schematic. Another thing about JPEGs is that we can read them in sequences of 512 bytes at a time, similarly with the beginning of the card. We don't need to be checking every single byte because that would suck. So instead, what we can do is actually just read in 512 bytes at a time and then, instead of checking in between those in those tiny little slices, we can just check the beginning of the 512 bytes. Essentially, in this picture, what you see is in the beginning of the card, you have values that aren't really relevant to the actual JPEGs themselves. But then what I have is a star to indicate one of the two starting sequences for a JPEG. So whenever you see a star, you know that you have a JPEG file. And then every JPEG file is going to be some multiple of 512 bytes but not necessarily the same multiple. The way that you know that you've hit another JPEG is if you hit another star, another starting sequence of bytes. Then what you have here is you have the red JPEG file continuing until you hit a star, which is indicated by a new color. You continue and then you hit another star, you hit another JPEG, you continue all the way until the end. You're at the last picture here, the pink one. You go on until you hit the end of the file. This is going to be really useful. 

A few main takeaways here: The card file doesn't start with a JPEG, but once a JPEG starts, all of the JPEGs are stored side by side to one another. 
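Checking a 512-byte block for a star could be sketched as below. The function name `is_jpeg_start` is invented; the byte values are the two starting sequences mentioned above.

```c
#include <stdint.h>

// Returns 1 if the block begins with ff d8 ff e0 or ff d8 ff e1,
// and 0 otherwise. Only the first four bytes are examined.
int is_jpeg_start(const uint8_t block[512])
{
    return block[0] == 0xff &&
           block[1] == 0xd8 &&
           block[2] == 0xff &&
           (block[3] == 0xe0 || block[3] == 0xe1);
}
```

Because JPEGs on the card start on 512-byte boundaries, looking at the first four bytes of each block is enough; there's no need to scan the other 508.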

Some pseudocode for the Recover. First, we're going to open our card file, and that's going to be using our file I/O functions. We're going to repeat the following process until we've reached the end of the file. We're going to read 512 bytes at a time. And what I said here is we're going to store it in a buffer, so basically hold on to those 512 bytes until we know exactly what to do with them. Then what we want to do is we want to check whether we've hit a star or not. If we've hit a star, if we've hit one of the starting sequences, then we know that we've hit a new JPEG file. What we'll want to do is we're going to want to create a new file in our pset4 directory to continue making that file. But also, if we've already made a JPEG before, then we want to end that file and push it to the pset4 folder, where we'll have that file stored because if we don't specify that we've ended that JPEG file, then we'll basically have an indeterminate amount. The JPEGs will never end. So we want to make sure that when we're reading in to a JPEG file and writing that, we want to specifically close that in order to open the next one. We'll want to check several things. We want to check whether we're at the start of a new JPEG with our buffer and also if we already have found a JPEG before because that will change your process slightly. So then after you go through all the way and you hit the end of the file, then what you'll want to do is you'll want to close all the files that are currently open. That will probably be the last JPEG file that you have, as well as the card file that you've been dealing with. 
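That pseudocode could translate into something like the following sketch. It is one possible arrangement, not the official solution: the function name `recover` and its return value are made up here, and it writes the recovered files into the current directory.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

// Walk the card 512 bytes at a time and write out each JPEG found.
// Returns the number of JPEGs written, or -1 if a file can't be created.
int recover(FILE *card)
{
    uint8_t buffer[BLOCK_SIZE];
    FILE *img = NULL;   // NULL until the first JPEG is found
    int count = 0;

    // Repeat until a full block can no longer be read
    while (fread(buffer, BLOCK_SIZE, 1, card) == 1)
    {
        int is_start = buffer[0] == 0xff && buffer[1] == 0xd8 &&
                       buffer[2] == 0xff &&
                       (buffer[3] == 0xe0 || buffer[3] == 0xe1);

        if (is_start)
        {
            // Already writing a JPEG? Then this star ends it.
            if (img != NULL)
            {
                fclose(img);
            }

            // Name the new file by the order in which it was found
            char name[8];
            snprintf(name, sizeof(name), "%03d.jpg", count);
            img = fopen(name, "w");
            if (img == NULL)
            {
                return -1;
            }
            count++;
        }

        // Blocks before the first star are skipped
        if (img != NULL)
        {
            fwrite(buffer, BLOCK_SIZE, 1, img);
        }
    }

    // Close the last JPEG; the caller closes the card
    if (img != NULL)
    {
        fclose(img);
    }
    return count;
}
```

The `img != NULL` checks are what handle the "have we already found a JPEG" question: the pointer stays NULL through the junk at the start of the card and is non-NULL from the first star onward.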

The last obstacle that we need to tackle is how to actually make a JPEG file and how to actually get it into the folder. The pset requires that every JPEG that you find be named in the following format, where you have the number and then .jpg. The number always has three digits, so even if it's 0, we call it 000.jpg. Whenever you find a JPEG in your program, you're going to want to name it in the order that it's found. What does this mean? We need to keep track of how many we've found and what the number of each JPEG should be. Here we're going to take advantage of the sprintf function. Similar to printf, which prints a value out to the terminal, sprintf prints into a string instead: it stores the formatted result in a char array, which we can then use as the name of the file that we open. So if I had sprintf, title, and then the string there, it would store 2.jpg in title. Assuming that I open a file with that name and close it correctly, that file would contain the JPEG that I had been writing out. But one thing is that the code that I have here doesn't quite satisfy what the pset requires. The pset requires that the JPEG numbered 2 be named 002.jpg instead of just 2.jpg. So when you print out the name, you might want to alter the placeholder slightly.

Does anyone remember how we allow for extra spaces when we print something? Yeah. >>[student] You put a 3 between the percent sign and the d. >>Yeah, that's the idea. You'll put a 3 in this case because we want room for 3 digits. A plain %3d would pad with spaces, though, so to pad with zeros you put a 0 in front of the 3: %03d would give you 002.jpg instead of 2.jpg. The first argument into the sprintf function is actually a char array, which we previously knew as strings. It acts as temporary storage that just holds the resulting string. You won't really be doing much with it beyond using it as the file name, but you need to include it.

Knowing that every file name has the number, which takes up three characters, and then .jpg, how long should this array be? Throw out a number. How many characters in the title, in the name? So there's 3 hashtags, period, jpg. >>[student] 7. >>7. Not quite. We're going to want 8 because we want to allow for the null terminator as well. 
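As a small sketch of the naming step, the snippet below builds a name like 002.jpg into an 8-character array. The helper name `make_jpeg_name` and the constant `NAME_LEN` are hypothetical. It uses `snprintf`, the bounded variant of `sprintf`, so that the write can never run past the end of the array. The `%03d` placeholder pads to three digits with zeros; `%3d` would pad with spaces instead.

```c
#include <assert.h>
#include <stdio.h>

// Three digits + ".jpg" is 7 characters, plus 1 for the null terminator.
#define NAME_LEN 8

// Hypothetical helper: writes a name such as "002.jpg" into name.
// index must be between 0 and 999 so that it fits in three digits.
void make_jpeg_name(char name[NAME_LEN], int index)
{
    assert(index >= 0 && index <= 999);
    // %03d: minimum width 3, padded on the left with zeros.
    snprintf(name, NAME_LEN, "%03d.jpg", index);
}
```

For example, calling `make_jpeg_name(name, 2)` leaves `"002.jpg"` in `name`, which you can then pass straight to `fopen` to create the output file.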

Finally, just to draw out the process that you'll be doing for Recover, you have some beginning information. You continue until you find the start of a JPEG file, and that can be either one of two starting sequences. You keep on reading. Every slash here represents 512 bytes. You keep on reading, keep on reading until you encounter another starting sequence. Once you have that, you end the current JPEG, which in this case is the red one, so you want to close that file. Then you want to use sprintf to build the name of the next one, open a new JPEG with that name, and keep on reading until you encounter the next. Keep on reading, keep on reading, and then finally, eventually, you're going to reach the end of the file, and so you'll want to close the last JPEG that you were working with so that it's saved in your pset4 folder, and then look at all of the pictures that you've gotten.

Those pictures are actually pictures of CS50 staff, and this is where the bonus fun part of the pset comes in: you are competing in your sections to find the TFs in the pictures and take pictures with them to prove that you've done the pset, so you can see which staff members are in the pictures. So then you take pictures with the staff. Sometimes you'll have to chase them down. Probably some of them will try to run away from you. You take pictures with them. This is ongoing. It's not due when the pset is due. The deadline will be announced in the spec. Then together with your section, whichever section takes the most pictures with the most staff members will win a pretty awesome prize. That's kind of an incentive to get your pset4 finished as quickly as possible because then you can get down to business hunting down all the different CS50 staff members. That's not mandatory, though, so once you get the pictures, you are finished with pset4.

And I'm finished with Walkthrough 4, so thank you all for coming. Good luck with Forensics. [applause] [CS50.TV]