[ Silence ] >> Alright, hello everybody, welcome to the walkthrough for problem set 5, our forensics pset. Today's music is another mashup group, so I want to take this opportunity to thank everyone for filling out the feedback, particularly the walkthrough question. I learned that a lot of people don't like my taste in music and I'm sorry. I also learned that a lot of you would like to see more code examples and more syntax and have it be less conceptual and more like, here is the code, here's what it kind of looks like, here's what you need to type, and so keeping all that in mind, we'll try to do that moving forward. So that was really helpful and thanks for being honest on those. So, here is a list of things we'll be covering for today. So I should note that we'll see file I/O much more in lecture this week, lecture and section. And so if you're a little bit confused after the walkthrough about the finer details of file I/O, don't worry, all your questions will be answered this week in lecture and section. So, then on to the topic of bitmaps and the different image types you'll see in this pset, and then the four programs that you'll encounter as you're going through forensics. So file I/O, so this is the major overarching topic for pset 5, and file I/O is just short for file input and output, so input being I have some file and it contains some bytes and some text and whatever else, and I wanna read that in, I want my C program to be able to tell me what is inside of that file. Output on the other hand is saying, well, I want to create a file from my C program and I wanna write this sequence of bytes into this file, and that could create an image, that could create a text file, that could create a Word document. At the core, these files are just sequences of bytes, and our programs can either read them in or they can write them out. So, when we open a file, every file that we open is gonna have an associated file position indicator.
And this is basically just like the little cursor, when you open up Microsoft Word, that says where am I in the file right now. So if I say I want to read some number of bytes or read some number of letters from my file, I'm gonna start at where my cursor is, I'm gonna read something like 4 bytes and the cursor's gonna move over 4 bytes. So, if I read another 4 bytes, I'm not just gonna get the same thing back because I already read that, I'm gonna read the next 4 bytes. So, the first thing we wanna do before we can do anything with our files is open them up. So, one of the functions we can use is fopen, so a lot of these functions are gonna start with f, f for file, so you can fopen a file, the first argument is going to be where the file is and the second argument is going to be what you want to do with the file. So in whodunit, you have this bitmap, it's called clue.bmp and you don't wanna modify that at all. You don't wanna modify clue.bmp, you wanna just read in the bytes, and do something with them. So that "r" just says I'm only going to be reading this file. r for read actually makes sense, which is nice. So if you wanted to open a file to write to, this file may or may not exist yet, you would say, okay, I want some path, in this case, just verdict.bmp, which may or may not exist. If it doesn't exist yet, then C is gonna create it for you, and this "w" says that I'm going to write to this file. I'm not gonna be reading any bytes from it, I'm just going to be writing to it. So now, how do we read those bytes? So we have a function fread, or f-read, file read, and it's gonna take 4 arguments. So the first of these arguments is going to be a pointer, and this could be a pointer to a struct, or, as we'll see later, a pointer to an array, and this pointer, after you call fread, is what's going to contain the data from your file. So the reason it has to be a pointer is so that the function can actually modify that variable rather than making a copy of it and then you not getting it back.
So fread is going to hand back what was read by putting it into that first argument. So the second two are just basically how much do you want to read, so it's basically going to read size times number, so you're gonna read a block of size size, which is just some integer, and it's going to read some number of them, so that's pretty simple, and the last one is just the file pointer, which is what you get back when you fopen some file. So this is a pointer to a file and that is what you're going to pass as the last argument to a function like fread. And so just remember that because these two, the second and third arguments, are basically just multiplied together, you could say something like okay, I wanna read some RGBTRIPLE, which is just some struct, I wanna read two of those into my data. So that's just some struct, I'm gonna read it into data, I'm gonna read two of those. Or I can say, well, I'm gonna read one of them but it's two times the size of one, so both of those things are the same thing 'cause they're just multiplying together the second and third argument. So that's reading. Now for when you write to a file, we're gonna have a very, very similar function. So fwrite is going to take, again, just a pointer to some data that we want to write, and now this last argument is no longer the file that we're reading in, but the file that we're writing out. So again, it's still just a pointer to a file, so that thing we get back when we call fopen, and this is what's going to actually write whatever is contained within the struct to the file. So if we don't wanna write a whole block of bytes but we wanna just write a single character instead, we can use another function called fputc, and this as its data argument is just going to take a single character or a char. So you could just say fputc, a in single quotes, and then the outpointer is still that file pointer that you got back when you called fopen.
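To put the four calls side by side, here is a minimal sketch, not code from the pset. The function name `write_then_read`, the path, and the 4-byte buffer are placeholders made up for illustration. It writes four bytes with fputc and fwrite, then reads them back with two fread calls whose size and number arguments multiply out to the same 2 bytes each. One detail worth knowing: the value fread itself returns is the number of items it read, while the data lands in the buffer you pass as the first argument.

```c
#include <stdio.h>

// Writes the four bytes "abcd" to path, then reads them back in two calls.
// Returns 0 on success, 1 on any failure. (Hypothetical helper for this sketch.)
int write_then_read(const char *path, char out[4])
{
    // "w": open for writing, creating the file if it does not exist yet
    FILE *outptr = fopen(path, "w");
    if (outptr == NULL)
    {
        return 1;
    }
    fputc('a', outptr);                      // write a single char
    fwrite("bcd", sizeof(char), 3, outptr);  // write three 1-byte items
    fclose(outptr);

    // "r": open for reading only
    FILE *inptr = fopen(path, "r");
    if (inptr == NULL)
    {
        return 1;
    }

    // Two items of size 1 ...
    size_t first = fread(out, 1, 2, inptr);
    // ... then one item of size 2. Same byte count, and this read starts
    // where the previous one stopped because the position indicator moved.
    size_t second = fread(out + 2, 2, 1, inptr);
    fclose(inptr);

    // fread returns how many items it read, not the data itself
    return (first == 2 && second == 1) ? 0 : 1;
}
```

Calling `write_then_read("demo.txt", buf)` with a 4-byte `buf` should leave the bytes a, b, c, d in `buf`, in that order.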
So there are many more functions to read and write strings from files or bytes from files, but these are the ones that you'll see on this pset. So any questions on how those are working? Yup? [ Inaudible Remark ] >> Okay sure, so, let's take a look at an example. Aha, so here we have some struct, and so I've just called this struct quad because it's gonna read in four letters. So notice that inside this struct I have essentially 4 bytes, right, I have one for the first, this first char, one for the second, and then I have a two byte array. So I can fill up this struct with 4 bytes, right, because each of these members is effectively one byte. So now, the first thing I'm gonna do is open up my file. So I just have some file, it's called text.txt--question first? >> Can you explain what typedef is? >> Oh sure, so can you explain what typedef is? So again, this is something that we'll see more in section this week, but this is just kind of an idiomatic way of creating a struct. This is how your struct in pset 4 was declared. We saw a typedef struct board and then we called it g. This is just kind of an idiomatic syntactical way of declaring a struct, but basically what that does is it allows us to say later on quad letters. So I have a struct, its type is quad. If I didn't have this typedef, I'd have to say struct quad letters, which is more annoying to type. So, more about typedef in section, but this is just kind of the formula for creating a struct so I don't have to say struct quad every time, I can just say quad. Other questions to the fopen? Okay, I really like pronouncing this, so sorry if that's annoying. So now, we know that this file is going to contain 12 bytes, just because I already wrote the file and that's how many letters it has. So it of course just says "this is CS50", which is 12 characters. Each character is a single byte. "this" is 4. The space plus "is" plus another space is another 4, and then "CS50" makes 12.
So we know how big this file is, and in the bitmaps that we're gonna be working with, we're also gonna know exactly how big they are before we do anything. So more on that later. But so now we're going to read 3 times, right, because our struct can hold 4 bytes, we have 4 one-byte slots that we could read things into. So if we read 3 times, that means we're gonna read the whole file and I'm gonna get back 3 total structs. So the first thing I need to do is create a struct that I can fill with data. So this quad letters just says create a new struct, put it on the stack, and it's gonna be an instance of this struct up here. So now I have some struct called letters, and inside of letters, I have letters.first, letters.second, and so on. So now when I say fread, I'm saying okay, I want to read into this variable, or into the address of this variable, the first 4 bytes. And how did I specify 4 bytes? Well, I said, I want you to read one times the size of that single struct, and because that struct has 2 characters and one 2-character array, that's 4 bytes, that means I'm reading in 4 characters into the struct. Question? Okay, so now that I've read my data from the file, this struct letters, that's just a variable, contains data from the file. So in this case it's going to contain the letters that I specified in my text file. So now when I just printf letters.first, I know that letters.first is a char because I said up here in the definition of my struct that it's going to be a char, and so I can just print out percent c for a character, so then I can print out the first, I can print out the second, and now I had an array for my third one so I can iterate through that array and print out each character. So if I run this on this file, just text.txt, and I run this, I can say make I/O, I can run I/O, I'm gonna get back just this is CS50, 'cause I just read 4 characters at a time and after each 4 characters, I output a new line.
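A sketch of what that example program might look like, reconstructed from the description rather than copied from the actual file. The member name `rest`, the function name `print_quads`, and the decision to stop early on a short read are assumptions; `first`, `second`, `letters`, `fp`, and the loop bound of 3 come from the walkthrough.

```c
#include <stdio.h>

#define LENGTH 3   // 12 bytes in the file / 4 bytes per struct

// Four one-byte slots: two chars plus a two-char array.
typedef struct
{
    char first;
    char second;
    char rest[2];   // assumed name; the walkthrough does not name the array
} quad;

// Reads the file at path 4 bytes at a time and prints each group on its
// own line. Returns how many structs were filled, or -1 if the file
// could not be opened.
int print_quads(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
    {
        return -1;
    }

    int count = 0;
    for (int i = 0; i < LENGTH; i++)
    {
        // read 1 item of sizeof(quad) bytes into the address of letters
        quad letters;
        if (fread(&letters, sizeof(quad), 1, fp) != 1)
        {
            break;
        }

        printf("%c%c", letters.first, letters.second);
        for (int j = 0; j < 2; j++)
        {
            printf("%c", letters.rest[j]);
        }
        printf("\n");
        count++;
    }

    fclose(fp);
    return count;
}
```

Because every member is a char, `sizeof(quad)` is 4 here, which is why each fread picks up exactly four characters.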
So questions on how we're reading that text file into a struct and outputting them after we've filled up the struct? Yeah? [ Inaudible Remark ] >> How does it go to the next--how does it know to go to the next line? My output? [ Inaudible Remark ] >> So my reader, so how does it know to go to the next line? So in this case, I only have one line of text, right, it just says this is CS50. >> So when I call fread, I'm not really concerned about lines, all I've told this function is I said I want you to go get 4 bytes. So fread says, okay, I'm gonna start at the beginning of the file, because you haven't done anything yet, and it's gonna say, I'm gonna read 1, 2, 3, 4, just these. So now the next time I call fread, you notice that my cursor, where that file position indicator is, is now after the word this. So when I call fread again, it's gonna go 1, 2, 3, 4 and I just read a different 4 characters. Does that make sense? Other questions? Yeah? [ Inaudible Remark ] >> So are those 4 characters getting put somewhere? So they're getting put into my struct because my struct contains spaces for those characters, in this case two variables and then an array. So it's just getting put in, in order, the first thing is getting read, it's getting put into letters.first, and then letters.second, and then it's gonna fill up that array with the last two letters that it reads. So now I can access those letters. It doesn't look like I've actually modified this struct called letters, but I have, because fread changed its contents. So now when I say letters.first, I'm actually getting data from the file, because it's now inside of the struct. Yes, in the front. [ Inaudible Remark ] >> So, okay--so, it says read data into struct and then here it says fp. So remember the arguments to fread. So we had the data, the size, the number, and then this input pointer.
Remember this input pointer is whatever we got back when we said fopen, because this is going to return a pointer to a file. Now in my code, I didn't call it inptr, I just called it fp, I called fopen and it returned a pointer to a file, so now when I tell fread to use fp, for file pointer, it knows to go to the file that I opened up there. Does that make sense? Other questions? Yeah. [ Inaudible Remark ] >> So how does it know to read 4 characters at a time? So remember these second two arguments to fread are how much you want to read. So sizeof--when I take the size of a struct, what it's basically gonna do is add up everything that's inside of the struct. So inside of the struct, I have 4 characters. So it says, okay, each of these characters is 1 byte, so I have 4 when I say sizeof quad. So because I passed in that 4 and I passed in a 1 for the third argument, 1 times 4 is just 4, fread says I just want 4 bytes from that file at a time. [ Inaudible Remark ] >> So how do we know that--how do we know to do it 3 times? Well notice I'm calling it here in a loop. I said for i is less than length, and this is just something that I defined at the top of my file, as 3. So it's gonna repeat this process 3 times, and every time it repeats, it's gonna print a new line at the end. I wasn't too clear on that, but that's why it's going 3 times. Other questions on reading into structs? Okay, so we can also manipulate this cursor or where we are in the file without explicitly reading or writing. So when we read and write, we're gonna be moving over automatically, but if you wanna move it over to some position without necessarily reading or writing bytes, we have this handy function called fseek, and the arguments are kind of annoyingly in a different order, but the first one is just the file.
So fp or inptr, whatever you wanna call it, and then we can say, well, I want you to just move over something like 8 bytes or 16 bytes, I don't want you to read the data, I just want you to move over, so the next time that I read the data, you're gonna read somewhere else. So this last argument just says that this offset, this amount to move the cursor, is that starting from the beginning of the file, or is that starting from where you are right now, and so that's just gonna--you can change how you're manipulating the cursor based on that third argument. So questions on how fseek works? So again, more info in lecture and section this week, but that is basic file I/O. Yup? [ Inaudible Remark ] >> So when you do fwrite, is that when you wanna write from a struct to a file? Exactly, that's exactly what you wanna do. This data is what's going to go into the file that you specified in outpointer, whereas before, whatever was in the file was going into the struct. Other last questions on file I/O functions? Okay, so now in this case, we just used a text file, because it's easy to see exactly what we are reading, we are just reading 1 byte letters. On this pset you're gonna be working with images, which sounds really scary because an image, you know, you can't just open up an image and look at it and be able to figure out what's going on. But an image is just a sequence of bytes and a text file is just a sequence of bytes. So they're really not treated any differently as far as you're concerned. So it might be a little intimidating, but it's really not, they're just a sequence of bytes and that's all we're concerned about, is what are these bytes and what can we do with them. So inside of a 24-bit bitmap, each pixel in that bitmap is going to be represented by 3 bytes, and those 3 bytes are going to describe the amount of blue in the pixel, the amount of green and the amount of red, in that order. So let's take a look at some examples of how we can make bitmap colors.
So, we're gonna divide each number into 3 parts, so the blue part, the green part, and the red part. So if we have something like 0000ff, that says we have no blue, because the blue is 0, we have 0 green, we have a lot of red because ff is the biggest number we can represent with 2 hexadecimal digits. So if I have this number, that's going to be a red pixel. And 00ffff on the other hand, we have no blue, so that's 00, but we have a lot of green and red, and those two are gonna combine to form yellow, because if you add a lot of red and a lot of green, then you're gonna get a purely yellow pixel. So now these numbers don't just have to be either 00 or ff, but they can really be anything on a scale of 0 to 255. So if you had 3c which is, you know, some blue, 14, just a little bit of green, and then dc which is a lot of red, a lot more red compared to that, that's how you form a color like crimson. So now, how are we gonna represent that inside C? So we've written for you inside of bmp.h this struct and it's called RGBTRIPLE. So if you wanted to create a new one, you could just say RGBTRIPLE triple, and now you have a variable called triple, and its type is this struct that we wrote for you. Now inside of the struct are 3 bytes, so each field is just a single byte, 'cause now remember that we represented the amount of each color with a single byte. So now if you wanted to create that red pixel we saw, that 0000ff, we could say, well the amount of blue is gonna be 0x00, amount of green is 0, but the amount of red is going to be ff. So this rgbt is just red, green, blue triple and this is something that we wrote for you. If you wanna take a look at it, it's just in bmp.h. And so this is how we would literally construct the pixel inside of C if you wanted to create a red pixel for our bitmap. So, we're also concerned about padding, and this is just kind of a nuance to the bitmap file format.
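Before getting into padding, the red pixel just described could be built roughly like this. It is a sketch: the struct below is a local stand-in modeled on the RGBTRIPLE in bmp.h (check the real header for the authoritative definition), and `make_red` is a made-up helper name.

```c
#include <stdint.h>

typedef uint8_t BYTE;

// Stand-in for the RGBTRIPLE in bmp.h: one byte each, stored in
// blue, green, red order to match the file layout.
typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} RGBTRIPLE;

// Builds the 0000ff pixel from the walkthrough: no blue, no green, all red.
RGBTRIPLE make_red(void)
{
    RGBTRIPLE triple;
    triple.rgbtBlue = 0x00;
    triple.rgbtGreen = 0x00;
    triple.rgbtRed = 0xff;
    return triple;
}
```

Swapping in 0x3c, 0x14, and 0xdc for blue, green, and red would give the crimson pixel instead.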
So the bitmap file specification is what says, if you want a bitmap, it better look like this. It says that the size of each of your scan lines, or just like a single line of horizontal pixels in your image, the size of that better be a multiple of 4 bytes, or else it's just--it's just not a valid bitmap. Remember that each pixel is going to be 3 bytes. So if the number of pixels in your line times 3, because there are 3 bytes per pixel, is not a multiple of 4, then we need to add some padding, and this padding is literally just some empty data that says, okay, this data is not really part of the image, but because bitmap tells us so, we need to make sure that every line has a number of bytes that is a multiple of 4, so in this case, what we do in copy.c is we literally just put in some extra zeroes, so if we had something like 6 bytes, which isn't right because we needed to have 8, we can just add in a couple of zeroes and now we've created a line of 8 bytes. So we've given you this handy dandy formula to calculate the number of padding bytes based on some width of width pixels. So if you have a width of 1 pixel, we know that the size of each--the size of each byte is gonna be 1 byte and then the--so the size of each pixel is gonna be 3 bytes. So if we have a 1 pixel image, that means we have 3 bytes. So that means we need 1 byte of padding, because we need it to be a multiple of 4. If we have 3 bytes and we need 4 bytes, we can just add 1 extra byte of padding. So if instead we had an image that's 2 pixels wide, so now we have 2 times 3, that's 6, now we need to make that 8, because 8 is the nearest multiple of 4, we need 2 bytes of padding, and so on and so forth. And this formula that we've given to you can just take some image of any width and it's gonna spit back the number of bytes of padding that you're gonna need to make each line a multiple of 4. Question? [ Inaudible Remark ] >> So why are we taking 4 bytes of space in the previous example?
[ Inaudible Remark ] >> So why are we--so in this case we're actually taking 3 at a time, right, because each pixel is going to be 3 bytes and so inside of the struct we have fields that are going to fit 3 bytes. So, we only wanna read enough data that fits into the struct, okay? We don't wanna try to read too much data and then it's not gonna fit in our struct. So because our struct can handle 3 bytes, we wanna read in 3 bytes at a time. Yeah? [ Inaudible Remark ] >> So in our--so in our previous example we had basically the same deal, so--and then--and when we're working with bitmaps, this RGBTRIPLE has space for 3 bytes but the one that I created has space for 4 bytes. We have 1, 2, 3, 4 characters because the last one is basically 2. So, we don't wanna read in 3 because we'd just be wasting a character, so we can fill up this entire 4 byte struct. Other questions? Okay, so that's how we're gonna be representing each pixel. Don't forget about padding. If you're confused about the formula, you can just kind of run it through any width and it's gonna handle any case that you need to throw at it. So, now unfortunately we have more in our bitmap file. Question first? >> Why are you doing modulus twice on the previous formula? >> So, why are we doing modulus twice on this formula? So, if we just say width times the size of a pixel, mod 4, that's gonna give us how many bytes past a multiple of 4 we are. So if we subtract that from 4 then we get the amount of padding. But in the case that the line is already a multiple of 4, that subtraction gives us 4, so we can just mod 4 it again and make sure that we wrap around to 0. So, this formula might be overly simplified a bit but this is just the one that we gave you. So, it's gonna make sure that it never returns something out of range like a 4 or something like that. Other questions?
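The padding formula can be tried out on its own. This is a small sketch that wraps it in a hypothetical helper, `padding_for`, with the 3 bytes per pixel written as a constant instead of `sizeof(RGBTRIPLE)` so the block stands alone.

```c
#define BYTES_PER_PIXEL 3   // a 24-bit pixel: blue, green, red

// Number of padding bytes needed so a scanline of `width` pixels
// occupies a multiple of 4 bytes.
int padding_for(int width)
{
    // inner % 4: how many bytes past a multiple of 4 the row is
    // 4 - that:  how many more bytes are needed to reach the next multiple
    // outer % 4: turns a result of 4 (row already aligned) into 0
    return (4 - (width * BYTES_PER_PIXEL) % 4) % 4;
}
```

Running it by hand for widths 1 through 4: 3 bytes needs 1, 6 bytes needs 2, 9 bytes needs 3, and 12 bytes is already a multiple of 4. In that last case the subtraction gives 4 and the second mod wraps it to 0.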
Okay, so in addition to all of our pixels and the padding that's included in each line, we have this special block of information at the beginning of our bitmap and it looks just like this, we're dividing it into 2 parts, the first is the file header. Most importantly it's gonna say how big is my image, and then after that we just have some info about the bitmap itself like the compression or the width and height of the actual image. And so these are the ones that you're really, really concerned about. So, from that BITMAPFILEHEADER, you're concerned about the total size of the image. So this is given in bytes and this includes the header, it includes the pixels, it includes all the padding bytes. So, this is how many bytes your image is, total. And this needs to be set correctly, all of these need to be set correctly in order for whatever program is reading and displaying your bitmap to display it correctly. Because if these are set wrong, then it's gonna try to read the wrong amount of bytes per line or something and your image isn't gonna look right. So, from the second part, this BITMAPINFOHEADER, which contains information about the image itself, you're concerned about the size, which is still the total size of the image, which includes pixels and padding, and you're also concerned about the width and the height now, which is the width of the image in pixels, which does not include padding, and then the height of the image. So, when you're modifying any images, if you're doing something like resizing it maybe, you need to make sure that before you do anything, you change these fields. So, these again are just structs that we created for you, they're inside the file bmp.h, which you can look up, if you wanna open it and just check it out, but they're gonna look very much as you expect because these are just gonna be fields inside of the struct.
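For orientation, here is a sketch of what those two structs look like, modeled on the standard Windows bitmap headers that bmp.h follows. Treat the real bmp.h as the authority. The `__attribute__((__packed__))` spelling assumes GCC or Clang, and the helpers `image_bytes` and `file_bytes` are hypothetical names added only to show how the size fields relate to width, height, and padding.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t  BYTE;
typedef uint32_t DWORD;
typedef int32_t  LONG;
typedef uint16_t WORD;

// 14 bytes at the very start of the file
typedef struct
{
    WORD  bfType;
    DWORD bfSize;          // whole file in bytes: headers + pixels + padding
    WORD  bfReserved1;
    WORD  bfReserved2;
    DWORD bfOffBits;
} __attribute__((__packed__)) BITMAPFILEHEADER;

// 40 bytes immediately after the file header
typedef struct
{
    DWORD biSize;
    LONG  biWidth;         // in pixels, padding not included
    LONG  biHeight;        // in pixels; may be negative
    WORD  biPlanes;
    WORD  biBitCount;
    DWORD biCompression;
    DWORD biSizeImage;     // pixels + padding, no headers
    LONG  biXPelsPerMeter;
    LONG  biYPelsPerMeter;
    DWORD biClrUsed;
    DWORD biClrImportant;
} __attribute__((__packed__)) BITMAPINFOHEADER;

// Hypothetical helper: bytes of pixel data plus padding for a 24-bit image.
DWORD image_bytes(LONG width, LONG height)
{
    int padding = (4 - (width * 3) % 4) % 4;
    return (DWORD) ((width * 3 + padding) * abs(height));
}

// Hypothetical helper: total file size, headers included.
DWORD file_bytes(LONG width, LONG height)
{
    return image_bytes(width, height)
           + sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER);
}
```

The two struct sizes add up to 14 + 40 = 54 bytes, which is where the 54 used elsewhere in the walkthrough comes from.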
So we can simply create one using BITMAPFILEHEADER bf, and if we wanna fill that up with data from our file, just as we expect, we can say fread and we wanna put that data inside of the struct, no different than what we were doing before, then we can specify things like the size and the input pointer just as we know how to do. So questions on the header, yup? >> Just a general question, what distinguishes a white pixel from a padded pixel? >> So, what distinguishes a white pixel from a padded pixel? So, because of this file header, we're gonna know what is padding and what isn't padding, right? Because if we specify inside of my info header how wide the image should be in pixels, it's gonna know, okay, well this byte must be padding because I know how wide it should be and I now have some extra thing and okay well, it's not a multiple of 4 so this is padding and as I'm an image viewer, I'm not gonna display it. So, that's why it's critical that you make sure you update these headers or you might have like a valid file, like when you do xxd and look at it, but if you try to open it up in some image viewer it's not gonna work right. Other questions? Okay, so speaking of xxd, so this is a handy program that we can use to read in a bitmap and display the bytes in a format that we can read. So if we use this command, which is just given to you in the pset, on something like smiley, then we can say we're going to our pset 5, bmp. So when you clone the file you have all these bitmaps and they're here for you. And if you wanna display one you can say, xxd and then you can just literally copy and paste, it's gonna be -c 24 -g 3 -s 54 and eventually you're going to get--I typed it too long ago. That's okay. So actually you're gonna get xxd -c 24 -g--oh, boy--3 -s 54 smiley.bmp. Oh, and I'm too--you can kind of see it. So my font size is a little too big. But you can basically see what each of these pixels looks like.
So every single one of these ffffff's, that says I have all blue, all green, all red, that's just the color white. But if you have a smaller font size, then you can see that all of these 0000ff, which is red, are gonna be formed in the shape of a smiley face. So there's a better image inside of the pset. But to do that we're just gonna run this xxd command, and these options are just things like read 3 bytes at a time, this s 54 says, "I just want you to skip the header." Right, you'll notice that over here. This leftmost number is gonna be where you are in the file. So you notice that we're starting at 00036, so that's actually hexadecimal and that's gonna be equal to 3 times 16, just 48, plus 6 more, which is 54. That makes sense because we said s 54, start at the 54th byte. So the reason for that being we just don't wanna display the bitmap header in hexadecimal because it's just gonna mess up what the image looks like. And you have the parts like the c 24 and the g 3 to specify how many columns you want to be displayed at once. So questions on using that? It could be really handy when you're writing your programs if, you know, it's not displaying right inside of your image viewer but it is displaying right inside of xxd, you can determine, well, maybe there's something wrong with my header, or vice versa. So now that we are empowered with knowledge of bitmaps and knowledge of file I/O, we can walk through copy.c. So again, this was written for us. Question first? [ Inaudible Remark ] >> Yeah. So how do you know if it's displayed right? Unfortunately, this is a bad example because my font size is so big. We may try knocking it down a little bit. So if we make this standard 12 instead. Yeah, so now you can kind of see now, you can kind of see the smiley face? If you can't get this to open up in your image viewer at all, then this is actually not a bad way of trying to figure out if you resized the image correctly.
So you can kind of see it there, we highlighted the colors in the pset so it's more visible. But this is a good fallback for, oh no, the image viewer isn't working at all. But you're not totally in trouble because you can just do this instead. Unfortunately, last year we didn't have an image viewer, and so this is what everyone had to do. So you're in luck. So now, for copy.c. So this again was a file that was written for you, and when you clone the pset 5 repository, you're gonna pull this down. So let's just walk through it. So first, as always, we're just making sure we've supplied the right number of arguments. This file is gonna take some input file, just probably a bitmap, and it's going to create another file that's identical to the original bitmap. So outfile and infile are gonna have the identical sequence of bytes once this is done. So now, we're just getting the filenames from the command line. That's easy because argv[1] is the infile, argv[2] is gonna be the outfile, we know what's going on there. And so now we're going to open them both up. So you notice that we open the first one with the mode of r because we're reading from the infile, and we open the second one with the mode of w because we're writing to that file. So after we do some more error checking, now we can start working with these two headers I mentioned. So we created a BITMAPFILEHEADER and then we read in that first amount of bytes that's equal to the size of that struct. We filled up that struct with the first bytes in the file. And so now we can access both the BITMAPFILEHEADER and the info header, for which we're just doing the same thing, just creating a different struct and filling that struct up with the next bytes. And so now we can say something like bf.bfType. And this bfType is a field defined in a struct that someone else wrote. So we can just do some more error checking. But that's a good example of how to access the struct.
And so now you'll notice we want to write out those same bytes in the same exact order, this time to outpointer. >> So we called fread on inpointer. We're calling fwrite on outpointer. So now it's basically going to copy the headers from the infile into the outfile. So, that's the first 54 bytes. So now we need to calculate our padding, so this is just the same formula we mentioned, so that's just gonna give us back the number of bytes of padding we need. And so now we need to iterate over the entire file. So, this first for loop is going to be iterating over the height, so iterating over the rows, and the second for loop is going to be iterating over the columns 'cause we're going height and width. So this looks really familiar, it's just the same as iterating over a Sudoku board. We're going rows and columns. So now we're creating a struct that's going to hold a single pixel because we're iterating over pixels now that we're inside of this second for loop. So, we're reading in each pixel into the struct called triple. So now that the struct contains the data from the first file after this read, then we can write it out into the other file, so we're just copying pixel by pixel this first file into the second file. So now that we've read the entire line, we need to deal with all of the padding. So, you notice here we're using fputc instead of fwrite, that's just because we just wanna write one byte of padding at a time. And these are just zeroes, they can be whatever you want and we're writing them out to that file. We're also using fseek to say, well, the next time that I call fread on my input file, I don't want you to try to read that padding, instead I wanna start at the current position of my cursor, so that's SEEK_CUR for current, and I want you to move over padding bytes. So, I want you to skip over that padding so the next time that you call fread, you're actually at the next line, the next scanline of your image, and you don't start combining padding bytes with pixel bytes.
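The row-by-row loop just described can be sketched as a standalone function. This is an illustration in the spirit of copy.c, not the file itself: the function name `copy_pixels`, its parameters, and its return convention are assumptions, the RGBTRIPLE is a local stand-in for the one in bmp.h, and both files are assumed to be open and already positioned just past the 54 bytes of headers.

```c
#include <stdint.h>
#include <stdio.h>

// Local stand-in for the RGBTRIPLE in bmp.h
typedef struct
{
    uint8_t rgbtBlue;
    uint8_t rgbtGreen;
    uint8_t rgbtRed;
} RGBTRIPLE;

// Copies `height` scanlines of `width` pixels from inptr to outptr.
// Returns 0 on success, 1 if a read or write comes up short.
int copy_pixels(FILE *inptr, FILE *outptr, int width, int height)
{
    int padding = (4 - (width * (int) sizeof(RGBTRIPLE)) % 4) % 4;

    // outer loop: rows (scanlines)
    for (int i = 0; i < height; i++)
    {
        // inner loop: pixels within the row
        for (int j = 0; j < width; j++)
        {
            RGBTRIPLE triple;
            if (fread(&triple, sizeof(RGBTRIPLE), 1, inptr) != 1)
            {
                return 1;
            }
            if (fwrite(&triple, sizeof(RGBTRIPLE), 1, outptr) != 1)
            {
                return 1;
            }
        }

        // skip the input's padding without reading it
        fseek(inptr, padding, SEEK_CUR);

        // write fresh zero padding to the output, one byte at a time
        for (int k = 0; k < padding; k++)
        {
            fputc(0x00, outptr);
        }
    }
    return 0;
}
```

Note that the padding is handled once per row, outside the inner loop, and that whatever bytes the input used as padding are replaced by zeroes in the output.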
So, once that's done, we can just close both of the files by calling fclose and we're done. So, this is initially a really intimidating file, does it make a little more sense now, or any questions on copy? Yup? >> Is there padding after each pixel? >> So, is there padding after each pixel? So, there's only padding at the end of the line, so that's why it's outside of this inner for loop but still within the outer for loop. Yeah? >> Why haven't you added that on the [inaudible]? >> So, okay, so this fseek says what are we doing with this input file? So, inside of the input file, we're skipping over the padding, so the next time I read, I'm at the next line. But inside of our output file, we know exactly how much padding we need to add because we calculated it. And so we're writing out to the output file that number of padding bytes while skipping over it inside of the input file. So, we could have just, we didn't have to do this, we could have just written the padding, but this is just kind of to demonstrate what's actually going on here. You can skip it with fseek, then you can explicitly write it with fputc in the zeroes. Other questions on copy? Yeah. >> It's not really about copy but if there was a 32 bit bitmap would we need padding? >> So, if it were a 32 bit bitmap would we need padding? I don't know off the top of my head like what the rules are, if the 4 byte rule applies to 32 and 24 bit bitmaps, but we're only concerned with 24 bit bitmaps for this pset. So, that would just depend on what the rule is for those images. Other questions, yup? >> I did try and go over it and I don't know if you wanna go over this one [inaudible]. The difference between bi and bf and the weight and [inaudible]? >> Sure, so the difference between this bi and this bf? So all--the only difference is what the bitmap specification tells us the difference is.
So, this is just what it looks like. The info header and the file header just contain different information about the file, and they're just separated into two structs because whoever designed bitmaps thought that would be a good idea. So, because these contain different information, the structs contain different fields and they're different sizes. And so that's why we have different structs for each of them, and we know the file header is going to occur before the info header, so read the file header first and then the info header, and write the file header first and then write the info header. So, the short answer to why they're two things is that organizationally it makes more sense to group some fields into one and others into the other. So that's why we just have two different structs, it's purely because bitmap said so. Yup? [ Inaudible Remark ] >> So, inside the loop, it's a little different than Sudoku. I'm never referring to the row or the column number, and that's because every time I call fread, remember, I'm moving over my cursor. So if I just say read, read, I'm not going to read the same two bytes, or however much I'm reading, twice. Because after the first read, I'm going to move over to a new location, so when I read from that location, I'm gonna read a different sequence of bytes. So we're reading and moving forward as we read, rather than saying go to this location. We're kind of literally moving through the file, and we can't go back unless we call fseek. Any last questions on copy? Okay. So let's move on to whodunit. So the goal of whodunit is to take this picture that's really unreadable. It has some red pixels, some white pixels and then some bluish ones, and there's some message in there and you can't read it. And so we need to figure out how we can read it. So here's what we need to do: we need to open the file and we need to read each line pixel by pixel.
And so as you read in those pixels, we need to somehow change them in some deterministic way so we can go from this totally unreadable state to something that we can read. So then after we've changed the pixel, we want to write it out to some new file. And so we want to start off with copy.c because that essentially handles three of these things for you. Right, copy.c handles opening it, reading it, and writing it pixel by pixel. So I would highly recommend you use copy.c as a starting point and just add in some extra code to change the color of the pixels. And so what we wanna be doing is kind of this effect that you've seen on like a cereal box or some decoder ring, where we only wanna let through the red part of every pixel. So we're basically creating some filter, and every pixel has some red, some blue and some green. And we only wanna let through the red. So we can just tone down the green and tone down the blue, and that means only red is gonna be left, and we get this effect of putting the red glasses over the really confusing image, and then the actual message is going to be revealed. So remember that with our struct, we can access the red, the blue, and the green individually because they're just fields of our struct, and we can change those values and write the pixel back and we're going to get a new image. That's basically the result of putting on those 3D glasses. So questions on how we're gonna do that or how that works? So as the--oh yup. [ Inaudible Remark ] >> Exactly, so when I say tone down, I literally just mean it's some value that's greater than 0 now, and we get rid of it. Make it 0. So as the pset mentioned, you might be surprised about how little code it takes to actually write this, but it's actually a pretty cool effect once you're done. So questions on whodunit? Okay, so let's move on to resize. So the goal for this resize.c is to take some image and rather than just copy it or change the colors, we want to enlarge it.
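Going back to the whodunit filter for a moment, here is a minimal sketch of the "let only the red through" idea, under the assumption that a pixel is a struct with one byte each for blue, green, and red. The `pixel` type and the `keep_red_only` name are hypothetical stand-ins for illustration, not the pset's own definitions.

```c
#include <stdint.h>

// Stand-in for the pset's pixel struct: three one-byte channels (assumed layout).
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} pixel;

// "Tone down" blue and green all the way to 0, leaving the red channel untouched.
void keep_red_only(pixel *p)
{
    p->blue = 0x00;
    p->green = 0x00;
}
```

In a copy.c-style loop, a call like this would sit between the fread that fills the struct and the fwrite that writes it back out.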
So, scale the image up by a factor of n, and you can be guaranteed that this is just an integer from 0 to 100. It's not gonna be a decimal, but do make sure that what's typed in is actually an integer and not some negative number, and then we want to resize by it. So if you want to resize by a factor of 2, we need to do 2 things. So given some pixel, in this case that pixel on the top left, we need to resize it horizontally and we need to resize it vertically. So in order to resize it horizontally, we take this one pixel and we need to make it two pixels that contain the same exact data. Now to resize it vertically, we just take that one pixel and add another pixel below it that's the same data. Or effectively, take this one line and duplicate the line to make that one line two lines. So we're doing two things: we're duplicating pixels across and we're duplicating lines down. And the number of times that we duplicate is gonna be dependent on whatever this n is, which is an argument to resize. So we know how to do things like open the file and read each scanline, and so now, we need to make sure to update the header info because it's definitely different unless the user just typed in one. The image has a new size, it has a new height and a new width. So we need to make sure we change that or we're not gonna be able to display the image correctly. So now, once we're writing each row, we need to write each pixel n times and then write each line or row n times. So again, don't forget about this. We know how to change the values because we know their original values, we know what we're scaling by, and so we can just calculate the new values very straightforwardly. So now, to write each pixel, you notice that copy.c is just writing it once. But we don't wanna write it once, we want to read a single pixel and write that pixel multiple times. >> So once we read the pixel, we know it's stored inside of some struct.
Now writing that struct doesn't change what that struct contains, right? Unless we explicitly change it, that struct is still gonna contain that pixel. So what we need to do is, instead of just calling fwrite once, we probably wanna call fwrite in some loop where the number of times that loop is repeated is the number of times we need to write that pixel. So for every one time we read, we need to write, instead of once, n times. And so that's gonna take care of our horizontal resizing. But we also need to consider this padding thing. So because our old image and our new image have different sizes, we're probably going to have different amounts of padding. So we definitely need to recalculate padding based on the new dimensions of the resized image. That's no big deal because we have a formula and that's really easy to plug into. But it's a little nuanced as we're now reading and writing data, because when you're reading, you need to make sure that you skip over the original padding. So remember in copy.c, we explicitly used fseek and we skipped over the padding. So when using that fseek, you need to make sure you skip over the original image's padding. But when you're writing, you need to write the amount of padding that you calculated for the new image. So in copy.c, we're just using the same padding value. Now that we're resizing, we need to make sure we use different padding values or else we're just not going to get a valid bitmap. So any questions on handling the padding between sizes or why we need to? As you're writing, think about, am I working on the original image or am I working on the new image now, and what's the padding for the current image that I'm working on. So now we have our image and it's resized horizontally. So that's fine, we know how we doubled all of our pixels, whatever, now we need to handle vertically. So we wrote this line once, and we need to write it n times, so we need to write it n minus 1 more times.
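Here is a hedged sketch of the horizontal step just described: read each pixel once, write it n times, skip the original image's padding on the input side, and write the new image's padding on the output side. The `pixel` struct and the names `row_padding` and `write_scaled_row` are made up for this example, and the function handles only a single output scanline.

```c
#include <stdint.h>
#include <stdio.h>

// Stand-in for the pset's pixel struct: three one-byte channels (assumed layout).
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} pixel;

// Padding bytes needed so that a scanline of `width` pixels is a multiple of 4 bytes.
int row_padding(int width)
{
    return (4 - (width * (int) sizeof(pixel)) % 4) % 4;
}

// Read one scanline of `old_width` pixels from in and write it once to out,
// stretched horizontally by a factor of n. Returns 0 on success, 1 on failure.
int write_scaled_row(FILE *in, FILE *out, int old_width, int n)
{
    // Two different padding values: one for the file we read, one for the file we write.
    int old_padding = row_padding(old_width);
    int new_padding = row_padding(old_width * n);

    for (int j = 0; j < old_width; j++)
    {
        pixel triple;
        if (fread(&triple, sizeof(pixel), 1, in) != 1)
        {
            return 1;
        }

        // Writing the struct does not change it, so we can write it n times.
        for (int k = 0; k < n; k++)
        {
            if (fwrite(&triple, sizeof(pixel), 1, out) != 1)
            {
                return 1;
            }
        }
    }

    // Skip the ORIGINAL image's padding in the input.
    if (fseek(in, old_padding, SEEK_CUR) != 0)
    {
        return 1;
    }

    // Write the NEW image's padding to the output.
    for (int k = 0; k < new_padding; k++)
    {
        if (fputc(0x00, out) == EOF)
        {
            return 1;
        }
    }
    return 0;
}
```

For example, a 1-pixel-wide row (3 bytes plus 1 byte of padding) scaled by 2 becomes a 2-pixel-wide row (6 bytes plus 2 bytes of padding).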
But the problem with the way copy is written is that as soon as copy writes a pixel, it forgets about it. It's not saving these pixels anywhere. As soon as it writes it, it's gonna forget about it and just create a new struct and move on to the next pixel. So we obviously don't want that to happen, because if we forget the pixel and never look at it again, there's no way to duplicate the entire line. There are a few things we can do here. One thing we can do is remember the pixels inside of an array. So we could say, okay, I'm reading my lines, and I'm gonna have some array that represents a single line inside of my image. And so then once I've read in the entire line, I can just write that line multiple times. Or if you don't wanna create a new array, we can use fseek. We can say okay, well, I've written this one line once, so instead of going to the next line, I need to restart at the current line and write it again. So both approaches are totally, totally valid, but the basic idea is we don't want to move on to the next line immediately. We need to somehow come back to the current line that we're on and write that n times to the file. So the horizontal case and the vertical case are kind of two separate cases, and they're gonna combine to make sure that we get an image that's resized by n, and all the padding is handled, and we made sure we updated our headers, and we're good to go. So if you wanted to take the former approach of saving every pixel in a line inside of an array, then we need to create an array that's not some static size. I can't just say like array 5 because I could be entering in images that are various sizes, so we wanna create an array whose size is based on a variable. And so there are two ways to do that.
The simpler way is just to say, okay, I can have some array, in this case we're using integers, but we're not gonna be using integers on this pset, we're gonna make sure we have an array of those RGBTRIPLEs or those pixels. And for the length of the array, we could just use a variable. And whatever the value of that variable is when this is executed is going to determine the size of the array. But now that we know about the heap, we can also use malloc. Well, we know that we want n ints, so that's just n times size of int, where n could be however big I want the array to be. And if you take the latter approach, make sure that you free, because freeing after malloc is really, really important. So questions on how we're resizing the image in either direction? Yup. [ Inaudible Remark ] >> So how do we know how much to initialize the length of this array to? So in this case, I have some variable and I just said 5 for the example. But we know from our bitmap header how wide and how tall our image is. So we can read the value from that struct, which is gonna be determined by the file itself, and then create an array based on that value. Other questions on anything resize? Okay. So now, let's move on to the fun one, which is recover. So the goal of recover is we want to take some corrupt CF card, because David has no idea what he is doing with computers, and we want to recover 37 images from it. And so just a note, if you happened to start this pset on Friday before like 2:30 or so, make sure you update your appliance [phonetic] again. There was just a small issue in the card we supplied you where you couldn't get the last image, so just make sure you update your appliance as soon as you're done here to make sure you get the correct version of the CF card. So this is what your CF card looks like, and this middle image is indeed David's college yearbook photo. So, inside of your CF card, your images are stored contiguously.
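To illustrate the array-based approach to resize described above, here is a small sketch that mallocs a buffer sized from the image width, reads one scanline into it, writes that scanline n times (each pixel n times), and then frees the buffer. The `pixel` struct and the names `row_padding` and `resize_row` are hypothetical, and the error handling is deliberately minimal.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Stand-in for the pset's pixel struct: three one-byte channels (assumed layout).
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} pixel;

// Padding bytes needed so that a scanline of `width` pixels is a multiple of 4 bytes.
static int row_padding(int width)
{
    return (4 - (width * (int) sizeof(pixel)) % 4) % 4;
}

// Read one scanline of `width` pixels from in, then write it to out n times,
// each time stretched horizontally by n. Returns 0 on success, 1 on failure.
int resize_row(FILE *in, FILE *out, int width, int n)
{
    // The array length comes from a variable, so it lives on the heap.
    pixel *row = malloc((size_t) width * sizeof(pixel));
    if (row == NULL)
    {
        return 1;
    }

    // Remember the whole line, then skip the original padding.
    if (fread(row, sizeof(pixel), (size_t) width, in) != (size_t) width)
    {
        free(row);
        return 1;
    }
    fseek(in, row_padding(width), SEEK_CUR);

    int new_padding = row_padding(width * n);

    // Vertical: write the remembered line n times.
    for (int r = 0; r < n; r++)
    {
        // Horizontal: write each pixel n times.
        for (int j = 0; j < width; j++)
        {
            for (int k = 0; k < n; k++)
            {
                fwrite(&row[j], sizeof(pixel), 1, out);
            }
        }
        for (int p = 0; p < new_padding; p++)
        {
            fputc(0x00, out);
        }
    }

    // Every malloc needs a matching free.
    free(row);
    return 0;
}
```

The fseek-based alternative would avoid the buffer by moving the input cursor back to the start of the current line before each repeat.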
So, one after another. But before we go into that, just here's our to-do again: we're gonna open up the card, we're gonna determine where the start of each image is, we wanna determine what we wanna call that image, and we wanna write out all the bytes of that image to the same file. So, this card.raw, which again, when you do your git clone, you're not gonna pull down the raw file, but when you do your sudo [inaudible] update, that's what's gonna download the raw file, so make sure you update all the time. And so we wanna open this file, and it's located here on your appliance, so we don't need to worry about anyone telling us where the file is because it's always gonna be in the same place. So just feel free to hard code this value into your recover. It's not going to be a command-line argument. So, that's how we open the file, just fopen with an R as usual. And so now we need to determine the start of each JPEG. So a JPEG, just like a bitmap, is still just a sequence of bytes because that's all files are. Even though bitmaps and JPEGs are very different, they're just sequences of bytes and it's no scarier than that. So JPEGs can start with one of these two sequences. They can start with either ff d8 ff e0, or the same thing with e1 at the end. So if you find this sequence of bytes, you know a JPEG starts right here. And so, because these JPEGs are stored contiguously, we're really looking at something like this: we're gonna find this sequence of bytes, and those 4 bytes are gonna be the first 4 bytes of my image. So as I'm reading through the CF card, I find those 4 bytes again. That means that the image that I was reading has gotta be done, because I just found a new image. And because these photos are stored contiguously, we know that it has to be some new image when we find the sequence.
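A minimal sketch of the signature check described above, assuming the bytes to test have already been read into an array. The function name `is_jpeg_start` is hypothetical, and it checks only the two 4-byte sequences mentioned in this walkthrough.

```c
#include <stdbool.h>
#include <stdint.h>

// Returns true if the first four bytes of `block` are one of the two
// JPEG starting sequences named in the walkthrough:
//   ff d8 ff e0   or   ff d8 ff e1
bool is_jpeg_start(const uint8_t *block)
{
    return block[0] == 0xff &&
           block[1] == 0xd8 &&
           block[2] == 0xff &&
           (block[3] == 0xe0 || block[3] == 0xe1);
}
```

Since all four bytes have to match in order, a block that merely contains ff d8 somewhere else would not count as the start of a new image under this check.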
So in that case, you wanna say okay, that first JPEG is done, I wanna create a new JPEG and start writing all of those bytes, including these first 4. So these first 4 bytes are part of the JPEG itself, so you don't wanna skip over them and then start writing, you wanna make sure you include those first 4 bytes. So that's how we're going to determine the start of our JPEG. So any questions to this point? Okay. So we know how to open a file and we know how to read it, and we're just gonna make sure we're looking for these 4 bytes in order. So now we need to do a little work to determine the name of our file. They're all gonna end in .jpg, and the pset mentions that they have to be in the format number number number, where this number is running from 0 up to 36. And so we can use this new function called sprintf, which is the exact same thing as printf except instead of writing out to the terminal, you can write to a string or a character array. So the first argument to sprintf here is going to be the array you're writing to. So up somewhere else in my program, I declared some fixed size character array, and now, the rest of the arguments are gonna operate just like printf. So I can say the string that I wanna write in, and that string can contain things like percent D and percent C, and then every argument after that is going to be what we're substituting in. So if I said sprintf coolness and then percent D and I'm making that a 10, instead of writing that out to the terminal, I'm going to fill up this character array with the resulting string. So one thing to note is because we're saying 000 and there are only 37 images, you know that every filename is going to be the same size, and that's pretty handy because these arrays have to be the same size from [inaudible] and all the way through their existence. So, also note that the JPEGs are named in the order you find them.
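As a small illustration of building the filename with sprintf, the sketch below formats a counter as three zero-padded digits followed by .jpg. The helper name `make_jpeg_name` is invented here, and the `%03d` format (pad to width 3 with zeroes) is one way to get the number-number-number shape; it assumes the counter stays between 0 and 999, so the result always fits in 8 bytes including the terminating null character.

```c
#include <stdio.h>

// Write a name like "000.jpg" or "036.jpg" into `name`.
// Assumes 0 <= counter <= 999 and that `name` has room for at least 8 bytes:
// 3 digits + ".jpg" (4 characters) + the terminating '\0'.
void make_jpeg_name(char *name, int counter)
{
    sprintf(name, "%03d.jpg", counter);
}
```

A caller would typically declare something like `char name[8];`, call `make_jpeg_name(name, counter)`, pass `name` to fopen, and then increment the counter for the next image.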
So 000 is gonna be the first one, the next one has to be 001, so you can kinda see what we're gonna do here. We need some counter, and based on the counter, we need to sprintf some filename, and that filename is always gonna be the same length, which is really cool. So now once we've formulated that, we can pass it as the argument for the file that we open up when we call fopen, and start writing bytes. >> So, any questions on sprintf or how we can use that to construct the filename? Okay, so now that we have the filename, we wanna actually start writing the bytes of the image to the file. So with bitmaps, we were concerned with reading 3 bytes at a time, because that's how many bytes were inside of the pixel. With JPEGs, we're now concerned with much larger blocks, in this case, 512 byte blocks. So instead of reading 3 bytes at a time, and instead of saying 3 inside of fread, we now wanna read 512 bytes at a time. So before, we were reading into structs, right, and the number of fields in the struct determined how much we could read at a time. But instead of reading into a struct, we can also read straight into an array. So here I've created an array of bytes, and you wanna use BYTE, which is declared in that same header file. A BYTE is just a data type, and that's what we wanna use for our JPEGs because it's what our 512 byte blocks are made of. So we're reading into that array 5 bytes, because it's saying 5 times the size of 1 byte, so that's 5 total bytes. So after this call, array is now filled up with the first 5 bytes of whatever file I opened with input pointer. So, it's the same thing as using structs, except now we don't have to create a struct, and instead of 3 bytes at a time we can read in really as much as we want, because the size of the array that we create determines how much we can read in at once.
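Here is a hedged sketch of reading a 512-byte block straight into an array rather than into a struct. The `BLOCK_SIZE` constant and the `read_block` name are made up for this example, and `uint8_t` stands in for the pset's one-byte type.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

// Read up to BLOCK_SIZE bytes from `in` into `block`.
// Returns the number of bytes actually read, which is less than
// BLOCK_SIZE only when the end of the file (or an error) is hit.
size_t read_block(FILE *in, uint8_t block[BLOCK_SIZE])
{
    // Element size is 1 byte and the count is BLOCK_SIZE,
    // so fread's return value counts bytes.
    return fread(block, sizeof(uint8_t), BLOCK_SIZE, in);
}
```

Each call moves the file's cursor forward by however many bytes were read, so calling `read_block` twice in a row yields two consecutive blocks, and after a call `block[0]` is the first byte of the block just read.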
So after we read this, if we want to get the first byte that we just read, we would say array of 0, and that's just going to be the first byte that fread read, and then you can go on until the size of the array. So where with structs you had to access some named field, if we use arrays, we can literally just say I want the third byte, not some named field like first or second. So, questions on how to use fread with arrays? Yup? [ Inaudible Remark ] >> So fread and sprintf, those are kind of doing two different things. So sprintf is creating some string from something that you specify, so the string that we specified as the second argument to sprintf is gonna determine what the variable array contains. Whereas with fread, we're not supplying a string, we're operating on a file, so we're reading from the file and putting it into an array. So with sprintf, we're not using files, but with fread we're reading from files and not from something that we specified. Yup. [ Inaudible Remark ] >> So is that 5 the 5 bytes from the cursor? Exactly. So, even though I'm not reading into structs anymore, I'm still moving the cursor along as I fread, so every time I call this, I'm reading 5 bytes and the cursor is moving over 5. So, exactly the same thing. Yeah. Other questions? Okay. So now we've read in our 512 byte blocks, so now we need to write out those 512 byte blocks, and we know how to do that because we can do the same thing: supply our array and tell fwrite that we want 512 bytes written out to some file. Now the file that we're writing out to needs to stay the same until we find some new image. So we're basically gonna be finding an image. We say okay, we just found a JPEG, we're gonna create some new file, let's just say 000.jpg. And until we find another JPEG, we're writing out to 000.jpg, just writing out 512 byte blocks. And then eventually, while we're reading, we're gonna say wait, we just found that sequence of 4 bytes again that says I'm a new JPEG.
So at this point, you wanna say okay, I no longer care about 000.jpg, that's totally done, it's finished, because I found a new JPEG and JPEGs are stored right next to each other on the card. So now I wanna open up a new file, which now is gonna be 001.jpg. And now I wanna start writing to that file until I find some other JPEG. And we're just gonna continue this process until we find the start of 37 different JPEGs. But we need to worry about when we're done, because we're not gonna end with the start of a new JPEG. Our CF card is gonna end with the end of the last JPEG, so we have this handy function, feof, that's going to return a Boolean. And if it returns true, then that means that we have reached the end of our file, we're done reading. So if we put this inside of a loop and we keep freading 512 bytes at a time, eventually, this feof is going to return true and that means we've read through the entire CF card. And if at that point we have our program correct, that means we found 37 JPEGs and we've written to 37 different files. So questions on recover? Yeah. [ Inaudible Remark ] >> So, do we have to deal with RGBTRIPLEs in JPEGs? So no, we don't. RGBTRIPLEs are just a bitmap thing. With JPEGs we're just saying we're gonna look at every block of 512 bytes. We don't really know what those 512 bytes are. But somehow, those bytes come together to form our JPEG images. So we never need to change any of the bytes themselves, we just need to ask, did I find a new JPEG yet? And if not, keep writing into the same file. And if we do find a new one, create a new file. Other questions on recover? Yeah. [ Inaudible Remark ] >> Oh, sure. So feof just takes a single argument, and that's the file pointer. So this says, is this file pointer at the end? So is the cursor associated with this file at the end of the file? And if not, then it's gonna return false. And if it is at the end or past the end, it's gonna return true.
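Putting the pieces of recover together, here is one possible sketch of the overall loop, not the pset's official solution. The names `recover`, `is_jpeg_start`, and `BLOCK_SIZE` are invented for this example. One deliberate difference from the description above: instead of testing feof in the loop condition, this sketch stops when fread returns fewer than 512 bytes, because feof only reports true after a read has already tried to go past the end of the file.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

// True if the block begins with ff d8 ff e0 or ff d8 ff e1.
static int is_jpeg_start(const uint8_t *b)
{
    return b[0] == 0xff && b[1] == 0xd8 && b[2] == 0xff &&
           (b[3] == 0xe0 || b[3] == 0xe1);
}

// Scan `card` block by block, writing each JPEG found to 000.jpg, 001.jpg, ...
// in the current directory. Returns the number of JPEGs written, or -1 on error.
int recover(FILE *card)
{
    uint8_t block[BLOCK_SIZE];
    char name[16];      // roomy enough for "%03d.jpg" with any int
    FILE *out = NULL;   // no JPEG open until we find the first signature
    int count = 0;

    // Keep going while a full block can be read; a short read means end of card.
    while (fread(block, 1, BLOCK_SIZE, card) == BLOCK_SIZE)
    {
        if (is_jpeg_start(block))
        {
            // The previous JPEG, if any, is finished.
            if (out != NULL)
            {
                fclose(out);
            }
            sprintf(name, "%03d.jpg", count);
            out = fopen(name, "w");
            if (out == NULL)
            {
                return -1;
            }
            count++;
        }

        // Write the block (signature included) to whichever JPEG is open.
        if (out != NULL && fwrite(block, 1, BLOCK_SIZE, out) != BLOCK_SIZE)
        {
            fclose(out);
            return -1;
        }
    }

    if (out != NULL)
    {
        fclose(out);
    }
    return count;
}
```

Blocks that appear before the first signature are simply skipped, since no output file is open yet at that point.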
So this is a good condition for some loop that goes through the entire CF card. Any last questions? Yeah. [ Inaudible Remark ] >> Oh, so why does this have a size of 5? >> Yeah. >> No, no reason, it's just to show you that we need to have some fixed size, and that fixed size just has to match the amount of bytes that we're reading in. So for your recover file, you want it to read in 512 bytes at a time. Well, this is reading in 5 bytes at a time. Yeah. >> So it's reading 5 bytes at a time from that file pointer, not the pointer but what it points to? >> Yes. [ Inaudible Remark ] >> Yes. So what this line is doing is reading from this input pointer. And where is this data going? Inside of array. So before, we put it inside of a struct, and now we're putting it inside of an array. If we then wanted to write out to the JPEG, we would need to call fwrite with different parameters, because we don't want input pointer anymore, we now want our output pointer. But we can still say array, because that's the array from the original file that we need to write out to the JPEG. >> So we would then write another function using the array we just got through reading out to the file? >> Right. So after this, you'd have some other function that writes out to the file, to the JPEG, exactly. Other questions? So before you go, this is an email that we actually received at the beginning of this year, and I thought this was the coolest thing ever. So just read through that and just let it sink in how much we are empowering you as a class. [ Laughter ] [ Pause ] >> So, on that note, good luck on pset 5.