1 00:00:00,000 --> 00:00:00,500 2 00:00:00,500 --> 00:00:03,120 In recover, we've taken a memory card from a camera 3 00:00:03,120 --> 00:00:06,300 and accidentally deleted all of the images. 4 00:00:06,300 --> 00:00:10,020 Your task is going to be to write a program to recover those images, 5 00:00:10,020 --> 00:00:12,990 generating new JPEG files for each. 6 00:00:12,990 --> 00:00:14,530 How are you going to do that? 7 00:00:14,530 --> 00:00:16,830 Well, first, you're going to want to open the memory 8 00:00:16,830 --> 00:00:19,170 card file that we'll give to you. 9 00:00:19,170 --> 00:00:21,450 Then, you're going to look through that memory card 10 00:00:21,450 --> 00:00:24,610 file for the beginning of a JPEG file. 11 00:00:24,610 --> 00:00:27,990 Once you find a JPEG file, you're going to open a new JPEG file 12 00:00:27,990 --> 00:00:29,640 that you're going to start writing to. 13 00:00:29,640 --> 00:00:33,690 And you're going to keep writing data in 512 byte chunks 14 00:00:33,690 --> 00:00:37,380 until you find a new JPEG file, at which point you can close the old one 15 00:00:37,380 --> 00:00:38,820 and start writing the new one. 16 00:00:38,820 --> 00:00:40,528 And you're going to repeat this process-- 17 00:00:40,528 --> 00:00:43,920 looking for new JPEG files, writing new data to those JPEG files-- 18 00:00:43,920 --> 00:00:46,230 until you reach the end of the file. 19 00:00:46,230 --> 00:00:47,880 How are you going to do all of that? 20 00:00:47,880 --> 00:00:49,755 Well, let's start by talking about how you're 21 00:00:49,755 --> 00:00:51,660 going to open the memory card file. 22 00:00:51,660 --> 00:00:55,410 To do this, you can take advantage of the fopen function, which 23 00:00:55,410 --> 00:00:59,130 will take as its first parameter the name of a file you want to open. 
24 00:00:59,130 --> 00:01:01,470 And the second parameter represents what mode 25 00:01:01,470 --> 00:01:05,099 you want to open it in, where r stands for read mode, where you want 26 00:01:05,099 --> 00:01:07,830 to read information from that file. 27 00:01:07,830 --> 00:01:09,660 As you're reading that file, though, you're 28 00:01:09,660 --> 00:01:12,450 going to want to be on the lookout for JPEGs. 29 00:01:12,450 --> 00:01:14,740 How are you going to know that an image is a JPEG? 30 00:01:14,740 --> 00:01:18,060 Well, every JPEG begins with a distinct header, 31 00:01:18,060 --> 00:01:23,830 meaning the first byte of every JPEG file is 0xff in hexadecimal. 32 00:01:23,830 --> 00:01:26,850 The second byte is always 0xd8. 33 00:01:26,850 --> 00:01:29,670 The third byte is always 0xff. 34 00:01:29,670 --> 00:01:31,710 And the fourth byte could vary a little bit. 35 00:01:31,710 --> 00:01:35,490 But it's always 0xe0, or 0xe1, or 0xe2. 36 00:01:35,490 --> 00:01:37,950 Anything that starts with 0xe and then something. 37 00:01:37,950 --> 00:01:40,440 So it could be 0xef, for example. 38 00:01:40,440 --> 00:01:42,960 These are all valid JPEG headers. 39 00:01:42,960 --> 00:01:47,850 So if you notice this pattern of four bytes at the beginning of any 512 byte 40 00:01:47,850 --> 00:01:53,200 block, you can pretty safely assume it's the beginning of a JPEG file. 41 00:01:53,200 --> 00:01:56,070 So let's talk in a little bit more detail about JPEGs. 42 00:01:56,070 --> 00:01:58,770 Each JPEG is going to start with a distinct header, where 43 00:01:58,770 --> 00:02:04,560 the first three bytes, as we've talked about, are 0xff, 0xd8, and 0xff. 44 00:02:04,560 --> 00:02:08,639 And the last byte is 0xe0, or 0xe1, or 0xe2. 45 00:02:08,639 --> 00:02:11,009 Anything up to 0xef. 46 00:02:11,009 --> 00:02:13,560 So this is the header that you're going to be looking for.
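Before you can look for that header, you need the card open. Here is a minimal sketch of the fopen call described a moment ago. The helper name open_card and the error message are assumptions made for illustration, not something the problem prescribes.

```c
#include <stdio.h>

// Hypothetical helper: open a memory card file for reading.
// Returns NULL if the file could not be opened.
FILE *open_card(const char *path)
{
    // First argument: the name of the file. Second argument: the mode.
    // "r" asks for read mode.
    FILE *card = fopen(path, "r");
    if (card == NULL)
    {
        fprintf(stderr, "Could not open %s\n", path);
    }
    return card;
}
```

Checking for NULL matters because fopen returns NULL when the file does not exist or cannot be read, and reading from a NULL file pointer would crash the program.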
47 00:02:13,560 --> 00:02:15,480 And we'll tell you that each of the JPEGs 48 00:02:15,480 --> 00:02:18,150 is stored back-to-back in the memory card. 49 00:02:18,150 --> 00:02:22,050 After one JPEG ends, the next one begins, and so on and so forth, 50 00:02:22,050 --> 00:02:27,460 up through the end of the file inside of these 512 byte blocks. 51 00:02:27,460 --> 00:02:30,570 So what does this mean for how you can look through this memory card 52 00:02:30,570 --> 00:02:32,070 to try to find JPEGs? 53 00:02:32,070 --> 00:02:36,900 Well, if you imagine your memory card as a whole bunch of these 512 byte blocks, 54 00:02:36,900 --> 00:02:40,560 then what you can do is start at the first block of the memory card file, 55 00:02:40,560 --> 00:02:43,850 and check to see if you found a JPEG header or not. 56 00:02:43,850 --> 00:02:45,600 If you haven't found a JPEG header, you'll 57 00:02:45,600 --> 00:02:48,270 keep moving on, looking one block at a time 58 00:02:48,270 --> 00:02:51,060 until you find that sequence of four bytes that 59 00:02:51,060 --> 00:02:55,540 indicates to you that this block is the beginning of a JPEG file. 60 00:02:55,540 --> 00:02:58,740 Once you find that block, now you can open up a new JPEG file 61 00:02:58,740 --> 00:03:00,360 that you're going to start writing to. 62 00:03:00,360 --> 00:03:03,330 And you can start writing block after block after block 63 00:03:03,330 --> 00:03:06,810 as you find more and more blocks of this JPEG file. 64 00:03:06,810 --> 00:03:08,850 What you'll run into eventually, though, is 65 00:03:08,850 --> 00:03:11,910 another block where the first four bytes also 66 00:03:11,910 --> 00:03:14,260 look like the start of a JPEG file.
67 00:03:14,260 --> 00:03:17,610 And when you detect that, you should realize that this is probably the start 68 00:03:17,610 --> 00:03:21,150 of a new JPEG file, meaning you should close the old one-- 69 00:03:21,150 --> 00:03:22,500 you're done writing to it-- 70 00:03:22,500 --> 00:03:25,200 and start writing to a new JPEG file that has 71 00:03:25,200 --> 00:03:29,080 all of these next sequences of blocks as the data within it. 72 00:03:29,080 --> 00:03:30,900 So you continue writing again and again. 73 00:03:30,900 --> 00:03:33,720 And you'll repeat this process, looking through this file, 74 00:03:33,720 --> 00:03:36,060 until you find the start of another JPEG file. 75 00:03:36,060 --> 00:03:37,760 Then, starting to write that new file. 76 00:03:37,760 --> 00:03:39,510 And repeating that process again and again 77 00:03:39,510 --> 00:03:43,260 and again until you reach the end of the memory card, at which point 78 00:03:43,260 --> 00:03:48,090 you'll have found all of the JPEG files that are inside of this memory card. 79 00:03:48,090 --> 00:03:51,420 How are you going to read data, though, from the memory card? 80 00:03:51,420 --> 00:03:56,230 Well, to do that you can use the fread function, which takes four parameters. 81 00:03:56,230 --> 00:03:58,200 The first parameter is called data. 82 00:03:58,200 --> 00:04:00,600 And it is going to be a pointer to where you're 83 00:04:00,600 --> 00:04:02,760 going to store the data that you're reading, 84 00:04:02,760 --> 00:04:07,020 likely some buffer of some kind that might be an array, for example. 85 00:04:07,020 --> 00:04:11,430 Next is size, which is the number of bytes of each element you're 86 00:04:11,430 --> 00:04:14,330 going to try to read from the file. 87 00:04:14,330 --> 00:04:17,420 Next up is number, which is the number of those elements 88 00:04:17,420 --> 00:04:20,120 that you want to try to read all at once. 
89 00:04:20,120 --> 00:04:22,940 And then, finally, is inptr, which is the file that you're 90 00:04:22,940 --> 00:04:25,403 going to actually read that data from. 91 00:04:25,403 --> 00:04:28,070 And recall that what you want to do is read from the memory card 92 00:04:28,070 --> 00:04:31,850 file in 512 byte chunks. 93 00:04:31,850 --> 00:04:34,740 Once you've read one of those 512 byte chunks, though, 94 00:04:34,740 --> 00:04:38,540 how are you going to know if that 512 byte chunk is actually 95 00:04:38,540 --> 00:04:41,330 the start of a JPEG file or not? 96 00:04:41,330 --> 00:04:44,580 Well, recall that the JPEG file has a distinct header. 97 00:04:44,580 --> 00:04:48,215 So if you've read this 512 byte block into some sort of buffer, 98 00:04:48,215 --> 00:04:51,380 which might be an array of bytes, then what you can do 99 00:04:51,380 --> 00:04:54,480 is check to see if buffer square bracket 0-- 100 00:04:54,480 --> 00:04:57,252 in other words, the first byte of the buffer-- 101 00:04:57,252 --> 00:05:00,320 is 0xff, the first byte of a JPEG file. 102 00:05:00,320 --> 00:05:02,000 And likewise, you can do the same thing. 103 00:05:02,000 --> 00:05:05,060 Checking if the second byte in the buffer is 0xd8 104 00:05:05,060 --> 00:05:07,970 and the third byte in the buffer is 0xff. 105 00:05:07,970 --> 00:05:11,150 The fourth byte is where things get a little bit tricky. 106 00:05:11,150 --> 00:05:14,940 Recall that there are 16 different values that buffer square bracket 107 00:05:14,940 --> 00:05:16,280 3 could take on. 108 00:05:16,280 --> 00:05:19,490 It could be 0xe0, 0xe1, so on and so forth. 109 00:05:19,490 --> 00:05:23,780 And so what you might imagine doing as a first pass is a Boolean condition where 110 00:05:23,780 --> 00:05:31,280 you ask yourself, if buffer 3 is 0xe0 or buffer 3 is 0xe1, or 0xe2, or 0xe3, 111 00:05:31,280 --> 00:05:33,360 or 0xe4, so on and so forth.
112 00:05:33,360 --> 00:05:36,770 But doing this 16 times is going to get very, very tedious. 113 00:05:36,770 --> 00:05:40,900 So to simplify, we can use a trick known as bitwise arithmetic 114 00:05:40,900 --> 00:05:46,180 where I can take the bitwise AND of buffer 3 and 0xf0. 115 00:05:46,180 --> 00:05:49,160 What that's going to do is it's going to say, just look 116 00:05:49,160 --> 00:05:52,850 at the first four bits of this 8-bit byte, 117 00:05:52,850 --> 00:05:55,730 and set the remaining four bits to 0. 118 00:05:55,730 --> 00:06:00,410 What that means is that 0xe0, 0xe1, 0xe2, so on and so forth all 119 00:06:00,410 --> 00:06:05,600 become 0xe0 because we've cleared out those last four bits. 120 00:06:05,600 --> 00:06:10,370 Then, after we've done this bitwise AND, we can just compare the result to 0xe0 121 00:06:10,370 --> 00:06:13,430 to determine whether or not this byte is a byte 122 00:06:13,430 --> 00:06:16,940 that might appear as the fourth byte inside of a JPEG. 123 00:06:16,940 --> 00:06:19,670 If all of these conditions are true, it's pretty likely 124 00:06:19,670 --> 00:06:24,470 that the 512 byte block that you found represents the beginning of a new JPEG 125 00:06:24,470 --> 00:06:25,550 file. 126 00:06:25,550 --> 00:06:28,880 So when you found the beginning of a new JPEG file, what you'll need to do 127 00:06:28,880 --> 00:06:33,140 is create a new file where you're going to write this data to. 128 00:06:33,140 --> 00:06:34,910 How do you make a new JPEG file? 129 00:06:34,910 --> 00:06:38,810 Well, each file should have a very particular file name as digit digit 130 00:06:38,810 --> 00:06:44,375 digit dot JPEG starting with 000 dot JPEG in the order in which you find it. 131 00:06:44,375 --> 00:06:48,320 So you'll likely want to keep track of how many JPEGs you found so far 132 00:06:48,320 --> 00:06:52,040 so that you can write the correct file names in the correct order.
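Going back to the header test for a moment, the four conditions described above can be sketched as a single function. The name is_jpeg_header is an assumption for illustration, and so is the choice of uint8_t for the bytes of the buffer.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: does this block begin with a JPEG header?
// buffer must point to at least four bytes.
bool is_jpeg_header(const uint8_t *buffer)
{
    return buffer[0] == 0xff &&
           buffer[1] == 0xd8 &&
           buffer[2] == 0xff &&
           // Keep the first four bits, clear the last four, then compare:
           // 0xe0 through 0xef all become 0xe0.
           (buffer[3] & 0xf0) == 0xe0;
}
```

The parentheses around the bitwise AND are needed because == binds more tightly than & in C, so without them the comparison would happen first.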
133 00:06:52,040 --> 00:06:56,000 How do you create a string of the format digit digit digit dot JPEG? 134 00:06:56,000 --> 00:06:59,360 Well, to do this, you can take advantage of a function called sprintf, where 135 00:06:59,360 --> 00:07:02,480 you're printing not to the terminal, but to a string where 136 00:07:02,480 --> 00:07:05,570 the first parameter is the name of the string you want to write to. 137 00:07:05,570 --> 00:07:07,130 In this case, file name. 138 00:07:07,130 --> 00:07:09,340 The second parameter is the format string. 139 00:07:09,340 --> 00:07:14,990 %03i just means print an integer with three digits to represent it, 140 00:07:14,990 --> 00:07:18,830 even if a number like 0, or 1, or 2 doesn't really require 3 digits 141 00:07:18,830 --> 00:07:20,270 to print out that integer. 142 00:07:20,270 --> 00:07:23,120 And the final parameter is the number that you want to substitute. 143 00:07:23,120 --> 00:07:27,470 In this case, 2, where the result would be that the file name string would now 144 00:07:27,470 --> 00:07:31,490 contain 002 dot JPEG instead. 145 00:07:31,490 --> 00:07:34,310 Make sure that file name, though, has enough memory 146 00:07:34,310 --> 00:07:39,050 or has enough characters to fully represent this entire file name. 147 00:07:39,050 --> 00:07:43,160 After you've created the file name, you can open a new file with that file name 148 00:07:43,160 --> 00:07:44,658 by calling fopen again. 149 00:07:44,658 --> 00:07:47,450 This time, providing as the first argument the name of the file you 150 00:07:47,450 --> 00:07:48,500 want to open. 151 00:07:48,500 --> 00:07:51,650 And now, the second argument to fopen, instead of r for 152 00:07:51,650 --> 00:07:54,530 read, is going to be w for writing so that you 153 00:07:54,530 --> 00:07:57,770 can begin to write to this new file all of the data 154 00:07:57,770 --> 00:08:00,590 that you're going to find from the memory card.
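Here is a rough sketch of those two steps, building the file name and then opening it for writing. A few things here are assumptions: the helper names make_jpeg_name and open_jpeg are made up for illustration, the extension is spelled .jpg, and the sketch swaps in snprintf, a standard sibling of sprintf that also takes the size of the destination so it can never write past the end of the string.

```c
#include <stdio.h>

// Hypothetical helper: write a name like "002.jpg" into filename.
// size is the number of chars available in filename.
void make_jpeg_name(char *filename, size_t size, int count)
{
    // %03i pads the integer with leading zeros to at least three digits
    snprintf(filename, size, "%03i.jpg", count);
}

// Hypothetical helper: open the output file for the count-th JPEG found.
// Returns NULL if the file could not be created.
FILE *open_jpeg(int count)
{
    // Three digits, ".jpg", and the terminating '\0' make 8 chars
    char filename[8];
    make_jpeg_name(filename, sizeof filename, count);

    // "w" opens the file for writing, creating it if needed
    return fopen(filename, "w");
}
```

The size of 8 is the "enough characters" point from above: three digits, four characters for .jpg, and one more for the terminator that ends every C string.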
155 00:08:00,590 --> 00:08:02,720 How do you write data to a file? 156 00:08:02,720 --> 00:08:06,740 To do that, you can use the inverse of fread, which is fwrite, 157 00:08:06,740 --> 00:08:08,960 which is going to write data to a file. 158 00:08:08,960 --> 00:08:12,540 Fwrite, just like fread, also takes four parameters. 159 00:08:12,540 --> 00:08:14,840 The first parameter, called data, is going 160 00:08:14,840 --> 00:08:19,580 to be a pointer to all of the bytes that you want to write to the file. 161 00:08:19,580 --> 00:08:23,270 Next, you need to tell fwrite how big the elements that you're going to write 162 00:08:23,270 --> 00:08:23,810 are. 163 00:08:23,810 --> 00:08:26,240 So the second parameter, size, represents 164 00:08:26,240 --> 00:08:30,470 the number of bytes in each element that you're going to write to the file. 165 00:08:30,470 --> 00:08:31,640 Next is number. 166 00:08:31,640 --> 00:08:34,740 The number of elements that you're going to write to the file. 167 00:08:34,740 --> 00:08:38,090 And finally, outptr is the file pointer that you actually 168 00:08:38,090 --> 00:08:40,490 want to write the data to, likely the JPEG 169 00:08:40,490 --> 00:08:43,432 that you've just opened for the purpose of writing to it. 170 00:08:43,432 --> 00:08:45,140 And you're going to continue doing this-- 171 00:08:45,140 --> 00:08:47,480 writing data to JPEG after JPEG-- 172 00:08:47,480 --> 00:08:49,820 until you reach the end of the file. 173 00:08:49,820 --> 00:08:53,450 But how do you detect when the end of the file is? 174 00:08:53,450 --> 00:08:56,022 Well, to do that let's take another look at fread, 175 00:08:56,022 --> 00:08:57,980 which is the function that we're using in order 176 00:08:57,980 --> 00:09:01,100 to read data from the memory card that's given 177 00:09:01,100 --> 00:09:03,200 to you at the beginning of the program. 178 00:09:03,200 --> 00:09:05,150 What does fread return? 
179 00:09:05,150 --> 00:09:09,560 Well, it turns out that fread returns the number of items of size size 180 00:09:09,560 --> 00:09:10,880 that were read. 181 00:09:10,880 --> 00:09:14,870 We currently try to read number elements of size size. 182 00:09:14,870 --> 00:09:18,630 And so most likely, if fread is able to read all of that data, 183 00:09:18,630 --> 00:09:20,840 it's going to return back to you number. 184 00:09:20,840 --> 00:09:25,310 Say that I'm trying to read 255 elements, for example. 185 00:09:25,310 --> 00:09:28,880 Then fread is going to return to me 255. 186 00:09:28,880 --> 00:09:31,500 Of course, once I reach the end of the file, 187 00:09:31,500 --> 00:09:35,660 it might be the case that I don't have 255 additional bytes to read, 188 00:09:35,660 --> 00:09:37,280 and I have fewer than that. 189 00:09:37,280 --> 00:09:39,740 In that case, fread is going to return to me 190 00:09:39,740 --> 00:09:42,625 some number that is less than 255. 191 00:09:42,625 --> 00:09:44,750 So you might want to think about what condition you 192 00:09:44,750 --> 00:09:48,980 could write to determine whether fread has gotten to the end of the file 193 00:09:48,980 --> 00:09:50,660 or not. 194 00:09:50,660 --> 00:09:54,050 Let's put all of this together now and start to think about some pseudocode 195 00:09:54,050 --> 00:09:57,230 that you could use to implement this recover program. 196 00:09:57,230 --> 00:10:00,560 The first thing you'll want to do is open the memory card. 197 00:10:00,560 --> 00:10:03,680 Then after that, you're going to want to repeat some process until you 198 00:10:03,680 --> 00:10:05,570 reach the end of the card. 199 00:10:05,570 --> 00:10:08,480 You'll want to read 512 bytes, likely using 200 00:10:08,480 --> 00:10:12,680 fread, into some sort of buffer-- some space in your computer's memory 201 00:10:12,680 --> 00:10:15,080 where you have 512 bytes worth of storage 202 00:10:15,080 --> 00:10:17,780 that you can read that data into.
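One way that read loop might look is sketched below. It only counts blocks, so that the end-of-file condition stands out. The helper name count_blocks and the BLOCK_SIZE constant are assumptions for illustration, and this is one possible condition among several that would work.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

// Hypothetical helper: count the complete 512-byte blocks in an open file.
long count_blocks(FILE *card)
{
    // 512 bytes of storage to read each block into
    uint8_t buffer[BLOCK_SIZE];
    long blocks = 0;

    // Ask for 1 element of 512 bytes. fread returns the number of
    // elements it read, so 1 means a full block and 0 means the file
    // ran out before a full block was available.
    while (fread(buffer, BLOCK_SIZE, 1, card) == 1)
    {
        blocks++;
    }
    return blocks;
}
```

Asking instead for 512 elements of 1 byte each would work too. In that case a full block makes fread return 512, and anything smaller signals the end of the file.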
203 00:10:17,780 --> 00:10:20,900 Then, take a look at the 512 bytes that you've read. 204 00:10:20,900 --> 00:10:22,910 If it's the start of a new JPEG, which you 205 00:10:22,910 --> 00:10:25,370 can detect by looking at those first four bytes 206 00:10:25,370 --> 00:10:28,310 and determining whether or not it's a JPEG header or not, 207 00:10:28,310 --> 00:10:30,800 then you know you've found a new JPEG. 208 00:10:30,800 --> 00:10:35,060 So if this is the first JPEG, you should start writing 000 dot JPEG and start 209 00:10:35,060 --> 00:10:37,070 writing your very first file. 210 00:10:37,070 --> 00:10:39,350 Of course, if you've already found a JPEG, 211 00:10:39,350 --> 00:10:41,840 then you'll need to make sure to close the file you've just 212 00:10:41,840 --> 00:10:45,110 been writing to so that you can open up this new file 213 00:10:45,110 --> 00:10:47,830 that you're going to continue writing to. 214 00:10:47,830 --> 00:10:53,620 So that's what to do if the 512 bytes you read is the start of a new JPEG. 215 00:10:53,620 --> 00:10:57,580 But if it's not the start of the new JPEG, what should you do then instead? 216 00:10:57,580 --> 00:11:01,120 Well, if you've already found a JPEG and you've already been writing to it, 217 00:11:01,120 --> 00:11:02,660 then you should keep writing to it. 218 00:11:02,660 --> 00:11:06,490 This is the next 512 byte block of this current JPEG 219 00:11:06,490 --> 00:11:07,690 that you've been writing to. 220 00:11:07,690 --> 00:11:09,970 And you might repeat this process multiple times 221 00:11:09,970 --> 00:11:12,790 because every JPEG might take up multiple blocks 222 00:11:12,790 --> 00:11:16,130 of memory inside of the memory card that we've given to you. 223 00:11:16,130 --> 00:11:17,650 So you'll repeat this process-- 224 00:11:17,650 --> 00:11:20,200 reading 512 bytes and then checking. 225 00:11:20,200 --> 00:11:23,230 If it's the start of a new JPEG, you might need to do something. 
226 00:11:23,230 --> 00:11:27,010 If it's not the start of a new JPEG, you might need to do something else. 227 00:11:27,010 --> 00:11:29,350 After you've reached the end of the memory card, 228 00:11:29,350 --> 00:11:31,460 then you should close any remaining files. 229 00:11:31,460 --> 00:11:33,427 And if all goes well, you should be done. 230 00:11:33,427 --> 00:11:35,260 And you should see that you've now generated 231 00:11:35,260 --> 00:11:39,190 a number of JPEG files, each of which contains image data that you can then 232 00:11:39,190 --> 00:11:41,800 open up and view visually. 233 00:11:41,800 --> 00:11:45,420 My name is Brian, and this was recover. 234 00:11:45,420 --> 00:11:47,487
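To tie the pseudocode together, here is one way it might translate into C. Treat it as a sketch under stated assumptions and not as the official solution: the function name recover_images, the return values, and the .jpg spelling of the extension are all choices made for illustration, and the sketch uses snprintf, the size-checked sibling of sprintf.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

// Hypothetical function: recover JPEGs from the card file at card_path.
// Returns how many JPEGs were written, or -1 if a file could not be opened.
int recover_images(const char *card_path)
{
    // Open the memory card
    FILE *card = fopen(card_path, "r");
    if (card == NULL)
    {
        return -1;
    }

    uint8_t buffer[BLOCK_SIZE];
    char filename[16];   // room for any int plus ".jpg" and '\0'
    FILE *img = NULL;    // the JPEG currently being written, if any
    int count = 0;       // how many JPEGs have been found so far

    // Repeat until the end of the card: read 512 bytes into the buffer
    while (fread(buffer, 1, BLOCK_SIZE, card) == BLOCK_SIZE)
    {
        bool is_start = buffer[0] == 0xff && buffer[1] == 0xd8 &&
                        buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0;

        if (is_start)
        {
            // Already writing a JPEG? Close it before starting the next
            if (img != NULL)
            {
                fclose(img);
            }
            snprintf(filename, sizeof filename, "%03i.jpg", count);
            img = fopen(filename, "w");
            if (img == NULL)
            {
                fclose(card);
                return -1;
            }
            count++;
        }

        // Blocks before the first header are skipped; every later block
        // belongs to whichever JPEG is currently open
        if (img != NULL)
        {
            fwrite(buffer, 1, BLOCK_SIZE, img);
        }
    }

    // Close any remaining files
    if (img != NULL)
    {
        fclose(img);
    }
    fclose(card);
    return count;
}
```

A main function could call recover_images with the card's file name from the command line and report an error when it returns -1. Notice that a block which starts a new JPEG still gets written, because the write happens after the new file is opened.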