1 00:00:00,000 --> 00:00:09,780 >> [MUSIC PLAYING] 2 00:00:09,780 --> 00:00:11,150 >> ZAMYLA CHAN: Let's tackle recover. 3 00:00:11,150 --> 00:00:14,030 Recover is probably my favorite PSET, and mainly because I think it's 4 00:00:14,030 --> 00:00:15,650 really, really cool. 5 00:00:15,650 --> 00:00:19,040 Basically, you're given a memory card file in which 6 00:00:19,040 --> 00:00:20,900 pictures have been deleted. 7 00:00:20,900 --> 00:00:23,650 But what you're going to do is recover them all. 8 00:00:23,650 --> 00:00:24,250 >> OK. 9 00:00:24,250 --> 00:00:28,230 So it's really exciting, but maybe a little intimidating, because you're 10 00:00:28,230 --> 00:00:32,430 given an empty C file and you have to fill it in. 11 00:00:32,430 --> 00:00:36,250 OK, so let's break this into manageable parts. 12 00:00:36,250 --> 00:00:38,160 You'll want to open the memory card file. 13 00:00:38,160 --> 00:00:39,900 That seems simple enough. 14 00:00:39,900 --> 00:00:43,030 Then, find the beginning of a JPG image. 15 00:00:43,030 --> 00:00:46,740 All the files on this memory card are going to be JPGs. 16 00:00:46,740 --> 00:00:50,840 Then, once you find the beginning, you're going to open a new JPG, that 17 00:00:50,840 --> 00:00:57,610 is, like, create a JPG, and write 512 byte at a time until a new JPG is 18 00:00:57,610 --> 00:01:02,930 found, and ending the program, once you detect the end of the file. 19 00:01:02,930 --> 00:01:06,400 >> So first steps first is to open the memory card file. 20 00:01:06,400 --> 00:01:09,850 But you know this already, and there's a file I/O function that's going to 21 00:01:09,850 --> 00:01:12,030 prove very useful. 22 00:01:12,030 --> 00:01:12,820 OK. 23 00:01:12,820 --> 00:01:14,760 So what are JPGs? 24 00:01:14,760 --> 00:01:16,330 Because we need to the beginning it. 25 00:01:16,330 --> 00:01:21,310 Well, JPGs, just like bit maps, are just sequences of bytes. 26 00:01:21,310 --> 00:01:30,660 Luckily, every JPG starts with either 0xff, 0xd8, 0xff, 0xe0, one sequence 27 00:01:30,660 --> 00:01:33,610 of bytes, or another sequence of bytes. 28 00:01:33,610 --> 00:01:37,250 >> So those four bytes indicate the start of a JPG. 29 00:01:37,250 --> 00:01:40,780 None other than those two combinations of four bytes. 30 00:01:40,780 --> 00:01:44,840 And luckily for us, another fact that we can take advantage of is that every 31 00:01:44,840 --> 00:01:48,550 JPG is stored side-by-side on the memory card. 32 00:01:48,550 --> 00:01:52,210 I've represented the structure of a memory card schematically on this 33 00:01:52,210 --> 00:01:53,310 slide here. 34 00:01:53,310 --> 00:01:59,270 Here, every square, every rectangle, represents 512 bytes, and it starts 35 00:01:59,270 --> 00:02:01,750 with a gray in that we don't really have a JPG. 36 00:02:01,750 --> 00:02:05,700 >> But then we finally hit a block with a star. 37 00:02:05,700 --> 00:02:10,940 That means that the first four bytes out of those 512 are one of those two 38 00:02:10,940 --> 00:02:13,230 starting sequences of a JPG. 39 00:02:13,230 --> 00:02:17,340 And we go from there, and then once one JPG ends, the next one begins. 40 00:02:17,340 --> 00:02:20,990 We don't ever have any more gray space in-between. 41 00:02:20,990 --> 00:02:25,550 >> But how do we actually read this, and read the 512 bytes so that we can make 42 00:02:25,550 --> 00:02:27,500 the comparison the first place? 43 00:02:27,500 --> 00:02:33,470 Well, let's go back to fread, which takes in the struct that will contain 44 00:02:33,470 --> 00:02:34,470 the bytes that you're reading. 45 00:02:34,470 --> 00:02:36,570 So you're going to put those in there-- 46 00:02:36,570 --> 00:02:42,192 the size, the number, and then inpointer that you're reading from. 47 00:02:42,192 --> 00:02:49,900 Now, we want to read 512 at a time, and we want to store this in a buffer, 48 00:02:49,900 --> 00:02:50,700 I'm going to call it. 49 00:02:50,700 --> 00:02:54,100 >> Basically, we're going to hold onto those 512 bytes and do 50 00:02:54,100 --> 00:02:55,500 things with it, right? 51 00:02:55,500 --> 00:02:58,260 We're either going to compare the first four bytes, or we're going to 52 00:02:58,260 --> 00:02:59,830 read it in, OK? 53 00:02:59,830 --> 00:03:05,050 So then the data pointer will then serve as your buffer, and the 54 00:03:05,050 --> 00:03:07,745 inpointer, well, that's just going to be your memory card. 55 00:03:07,745 --> 00:03:09,500 >> Back to our memory card schematic. 56 00:03:09,500 --> 00:03:14,690 We're going to read 512 bytes at a time, storing every 512-byte block 57 00:03:14,690 --> 00:03:19,190 into a buffer, holding onto those buffer, those 512 bytes, until we know 58 00:03:19,190 --> 00:03:22,000 exactly what to do them. 59 00:03:22,000 --> 00:03:25,960 So the beginning isn't anything, so we'll read the buffer, compare it, and 60 00:03:25,960 --> 00:03:28,160 we won't need to do anything with it. 61 00:03:28,160 --> 00:03:32,030 And then, we finally hit a star block, meaning that we've 62 00:03:32,030 --> 00:03:33,630 found our first JPG. 63 00:03:33,630 --> 00:03:36,560 So the buffer now hold bytes from that JPG. 64 00:03:36,560 --> 00:03:40,220 >> The next time 512 bytes, because they're not a star block, are also 65 00:03:40,220 --> 00:03:41,740 part of that JPG. 66 00:03:41,740 --> 00:03:47,630 And JPGs are continuous from there on in, until we hit the next JPG. 67 00:03:47,630 --> 00:03:51,880 And then the buffer then holds 512 bytes for that JPG, and 68 00:03:51,880 --> 00:03:53,580 so on, and so forth. 69 00:03:53,580 --> 00:03:54,250 OK. 70 00:03:54,250 --> 00:03:58,980 >> So once you hit the first starred block, the first JPG, how do you 71 00:03:58,980 --> 00:04:01,910 actually, well, open it? 72 00:04:01,910 --> 00:04:04,990 Let's make a new JPG. 73 00:04:04,990 --> 00:04:08,846 The filenames for a JPG are going to be in the format, number, number, 74 00:04:08,846 --> 00:04:13,830 number.jpg, in that they're named in the order in which they are found, 75 00:04:13,830 --> 00:04:14,780 starting at 0. 76 00:04:14,780 --> 00:04:19,890 >> So the first JPG that you find will be 000.jpg. 77 00:04:19,890 --> 00:04:26,560 So, probably a good idea to keep track of how many JPGs you've found so far. 78 00:04:26,560 --> 00:04:27,610 So that's the file name. 79 00:04:27,610 --> 00:04:29,660 But how do you actually make that? 80 00:04:29,660 --> 00:04:34,310 Well, we're going to use a function called sprintf. 81 00:04:34,310 --> 00:04:38,260 A little bit similar to printf, where you can use placeholders for strings, 82 00:04:38,260 --> 00:04:42,420 except in this case, sprintf will print the file out into the current 83 00:04:42,420 --> 00:04:45,550 directory, not into the terminal. 84 00:04:45,550 --> 00:04:46,120 >> OK. 85 00:04:46,120 --> 00:04:49,950 So here we see that we have title, a char array that will store the 86 00:04:49,950 --> 00:04:55,120 resultant string, and we pass in the title of the actual string with a 87 00:04:55,120 --> 00:04:58,720 placeholder, just like we've learned to do with printf. 88 00:04:58,720 --> 00:05:05,530 But this code that I have here will give 2.jpg, not 002.jpg. 89 00:05:05,530 --> 00:05:09,920 So I'll leave to you to find out how to modify the placeholder to make the 90 00:05:09,920 --> 00:05:11,920 correct name. 91 00:05:11,920 --> 00:05:12,610 >> OK. 92 00:05:12,610 --> 00:05:17,390 So once you've sprintf'd then you can open that file, because it exists in 93 00:05:17,390 --> 00:05:22,690 your directory, with fopen, using the title, and then whatever mode you want 94 00:05:22,690 --> 00:05:25,140 to open that file in. 95 00:05:25,140 --> 00:05:30,260 So now that we've opened a new JPG file, now we can write 512 bytes at a 96 00:05:30,260 --> 00:05:33,320 time, until a new JPG is found. 97 00:05:33,320 --> 00:05:36,640 So let's take another look at the syntax of fwrite. 98 00:05:36,640 --> 00:05:40,060 >> I know that I'm showing this slide a lot, but I just want to make sure that 99 00:05:40,060 --> 00:05:43,530 you guys don't get too confused, because I know that it's very easy to 100 00:05:43,530 --> 00:05:47,000 mix up the first and the last argument, in particular. 101 00:05:47,000 --> 00:05:54,390 But remember that you're writing from your buffer into the out file images. 102 00:05:54,390 --> 00:05:59,250 >> Now that you know how the write 512 bytes into your JPG file that you've 103 00:05:59,250 --> 00:06:03,230 created, well, we want to stop that process once we've reached the end of 104 00:06:03,230 --> 00:06:06,720 our card, because there won't be any more images to be found. 105 00:06:06,720 --> 00:06:10,760 So let's go back to fread once more, I promise. 106 00:06:10,760 --> 00:06:15,600 fread returns how many items of size, size, were ready in successfully. 107 00:06:15,600 --> 00:06:19,440 Ideally, this is going to be whatever you pass in for number, right? 108 00:06:19,440 --> 00:06:24,140 Because you're trying to read number of elements of size, size. 109 00:06:24,140 --> 00:06:29,380 But if fread isn't able to read that number of elements, then it'll return 110 00:06:29,380 --> 00:06:32,530 whatever number it read successfully. 111 00:06:32,530 --> 00:06:36,310 >> Now, one important thing to note is that if you use another file I/O 112 00:06:36,310 --> 00:06:43,860 function like fgetc, it'll also return how many items it read successfully. 113 00:06:43,860 --> 00:06:48,000 What's useful about this function is that if you use functions inside of a 114 00:06:48,000 --> 00:06:53,190 condition, it'll execute itself while determining that condition, which is 115 00:06:53,190 --> 00:06:54,340 just really useful. 116 00:06:54,340 --> 00:07:00,440 So if you have this conditions, say, if fread buffer, sizeof DOG, 2, 117 00:07:00,440 --> 00:07:04,870 pointer, equals equals 1, that means that I'd like to read 118 00:07:04,870 --> 00:07:06,540 2 dogs at the time. 119 00:07:06,540 --> 00:07:13,490 But if fread returns 1 instead of 2 as expected, that means that there are 2 120 00:07:13,490 --> 00:07:16,480 dogs left in my file, but rather 1. 121 00:07:16,480 --> 00:07:22,450 But if it returns 2, then I still have those 2 dogs inside of my buffer. 122 00:07:22,450 --> 00:07:26,280 >> So now that gives you a sense of how to check for the end of the file, but 123 00:07:26,280 --> 00:07:28,940 let's go through now the logic. 124 00:07:28,940 --> 00:07:32,460 How do we actually piece all of these elements together? 125 00:07:32,460 --> 00:07:36,880 Once we hit our first JPG, since we know that JPGs are stored 126 00:07:36,880 --> 00:07:40,910 contiguously, we'll be writing until we reach the end of the card file. 127 00:07:40,910 --> 00:07:43,950 But we don't want to write anything until then. 128 00:07:43,950 --> 00:07:48,710 So it matters, not only that we're at the start of a new JPG, but whether 129 00:07:48,710 --> 00:07:50,655 we've already found a JPG or not. 130 00:07:50,655 --> 00:07:55,390 >> If It's the start of a new JPG, we'll want to close our current JPG file if 131 00:07:55,390 --> 00:07:59,110 we have one open, and open a new one to write into. 132 00:07:59,110 --> 00:08:03,340 If it's not the start of the new JPG, though, we'll keep the same JPG file 133 00:08:03,340 --> 00:08:05,910 open and write into that. 134 00:08:05,910 --> 00:08:10,100 We'll write our buffer into whichever JPG file we have open, provided that 135 00:08:10,100 --> 00:08:12,120 we have one open, of course. 136 00:08:12,120 --> 00:08:16,190 If we haven't found our first JPG yet, we don't write anything. 137 00:08:16,190 --> 00:08:20,290 And this process continues until you reach the end of the card file. 138 00:08:20,290 --> 00:08:23,410 >> And finally, you'll want to make sure that you fclose any 139 00:08:23,410 --> 00:08:25,800 files that you've fopened. 140 00:08:25,800 --> 00:08:28,360 Once you're comfortable with the concepts, take a look at some 141 00:08:28,360 --> 00:08:30,840 pseudocode, which I've included here. 142 00:08:30,840 --> 00:08:34,830 First, you want to open the card file, and then repeat the following process 143 00:08:34,830 --> 00:08:37,144 until you've reached the end of the card. 144 00:08:37,144 --> 00:08:40,880 You want to read 512 bytes into a buffer. 145 00:08:40,880 --> 00:08:43,934 Using that buffer, you'll want to check whether you're at the start of a 146 00:08:43,934 --> 00:08:45,300 new JPG or not. 147 00:08:45,300 --> 00:08:48,400 And the answer to that question will affect your file management-- 148 00:08:48,400 --> 00:08:51,940 which files you open, which ones do you close. 149 00:08:51,940 --> 00:08:55,220 >> Then, have you already found a JPG? 150 00:08:55,220 --> 00:08:57,740 How have you been keeping track of that? 151 00:08:57,740 --> 00:09:01,735 Then, depending on that, you'll either write into the current JPG that you 152 00:09:01,735 --> 00:09:07,090 have open, or not write it at all, because you haven't found a JPG yet. 153 00:09:07,090 --> 00:09:10,870 Finally, once you've reached the end of the file, you'll want to close any 154 00:09:10,870 --> 00:09:12,590 remaining files that you have open. 155 00:09:12,590 --> 00:09:14,590 We want to be tidy here. 156 00:09:14,590 --> 00:09:18,790 >> And with that, you've recovered all of the missing files from that memory 157 00:09:18,790 --> 00:09:21,620 card, which is a pretty amazing feat. 158 00:09:21,620 --> 00:09:23,430 So pat yourself on the back. 159 00:09:23,430 --> 00:09:27,560 But, there's one more element to the PSET, which is the contest. 160 00:09:27,560 --> 00:09:30,920 You'll find that all of the pictures that you've recovered are actually 161 00:09:30,920 --> 00:09:32,820 pictures of CS50's staff. 162 00:09:32,820 --> 00:09:38,500 So if you're on campus or somewhere near, then you can take pictures with 163 00:09:38,500 --> 00:09:42,600 the staff, and the section that has the most pictures with staff members 164 00:09:42,600 --> 00:09:46,940 from their recovered files will get an awesome prize. 165 00:09:46,940 --> 00:09:50,650 With that, then you've finished the recover PSET. 166 00:09:50,650 --> 00:09:53,600 My name is Zamyla, and this is CS50. 167 00:09:53,600 --> 00:10:01,835