[MUSIC PLAYING]

SPEAKER 1: Multimedia. Odds are you see it every day and hear it every day, but what is it? Well, let's start with audio, what you hear coming out of a computer. It turns out computers are really good at recording and playing back audio, and they're really good at generating audio as well. They can do so using any number of file formats, where a file format is just a way of storing zeros and ones on disk so that certain software knows how to interpret them.

So let's start with a particularly common file format for musical instruments known as MIDI. Using the MIDI format, M-I-D-I, you can store, effectively, the musical notes that compose some song. You can do this for different instruments, and you can then play those instruments together by telling the computer to interpret those notes and render them based on particular choices of instruments. For instance, this here is a program called GarageBand on Mac OS, and I've preloaded a MIDI file that I downloaded online. I daresay you'll soon recognize the tune. Let me go ahead and hit play.
[MUSIC PLAYING]

All right, well, that doesn't sound as good as you might remember it sounding in the movie, but why is that? Well, that's because my computer was synthesizing that music based only on those musical notes. That wasn't an actual recording of an orchestra performing the song; rather, it was a computer synthesizing, or generating, the music based on an interpretation of those notes. So MIDI is especially common among musicians who want to share music with each other, and in the digital music space, where you do want the computer to synthesize the music for you.

But, of course, we humans are generally in the habit of listening to songs as we know and love them: on the radio, from CDs back in the day, or from streaming media services. Those are songs that have actually been performed, typically by humans, and recorded, often in a concert or in a sound studio, so they sound really, really good and really pristine. You don't have to use MIDI for those kinds of experiences; rather, you can use any number of other file formats.
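Because MIDI stores notes as numbers rather than as recorded sound, the synthesizer has to turn each note number into a pitch before it can play anything. Here is a minimal sketch of just that one step, assuming the common equal-temperament convention in which MIDI note 69 is the A above middle C at 440 Hz (real synthesizers can be tuned differently). The function name is hypothetical.

```python
def midi_note_to_hz(note: int) -> float:
    """Frequency of a MIDI note number in equal temperament, with note 69 (A4) at 440 Hz."""
    # Each semitone multiplies the frequency by 2**(1/12); 12 semitones double it.
    return 440.0 * 2 ** ((note - 69) / 12)

for name, note in [("A4", 69), ("C4 (middle C)", 60), ("A5", 81)]:
    print(f"{name}: note {note} -> {midi_note_to_hz(note):.2f} Hz")
```

The synthesizer then picks a timbre (piano, strings, and so on) for that pitch, which is why the same MIDI file can sound quite different on different software.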
For instance, one of the earliest formats for audio, and still one of the most common for uncompressed audio, is the WAV (or "wave") file format, which can store data in uncompressed form so that you have a really, really high-quality version of some audio recording. But also popular, and perhaps more popular among consumers, is the format known as MP3, short for MPEG Audio Layer III. MP3 uses compression to significantly reduce, generally by a factor of more than 10, just how many bits are necessary to store some song on your hard drive, your music player, your phone, or any other technology where you might store music. It does so by throwing away zeros and ones representing sounds that we humans can't necessarily hear. Now, some people, true audiophiles among them, will disagree and insist that you can actually tell the difference among these file formats. That may very well be the case, because there's a trade-off here: if you want to use fewer bits, and really fewer megabytes, to store your audio files, you might indeed have to sacrifice some of the quality. But the upside is that you might be able to store 10 times as much music on your phone, your iPod, or some other device as a result of that compression.
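To make the "factor of more than 10" concrete, here is a back-of-the-envelope sketch. It assumes CD-quality audio (44,100 samples per second, 16 bits per sample, two stereo channels) as the uncompressed baseline and 128 kbit/s as a typical MP3 bitrate; both are illustrative choices, and real files vary. The function name `pcm_bits_per_second` is hypothetical.

```python
def pcm_bits_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits needed for one second of uncompressed (PCM) audio."""
    return sample_rate_hz * bit_depth * channels

cd_bps = pcm_bits_per_second(44_100, 16, 2)  # CD-quality stereo
mp3_bps = 128_000                            # an assumed, commonly used MP3 bitrate

print(f"Uncompressed: {cd_bps / 1000:.1f} kbit/s")
print(f"MP3 at 128 kbit/s is about {cd_bps / mp3_bps:.1f}x smaller")
```

At these settings the uncompressed stream needs 1,411.2 kbit/s, roughly 11 times the MP3 bitrate.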
So this kind of audio compression is what's known as lossy, L-O-S-S-Y, whereby you actually lose some of the quality, or fidelity, of the music, but the gain is that you use far less space to store that information. A format similar in spirit is AAC, which is commonly used for audio files as well as for the audio inside video files. That's something you might see when you download files via iTunes, for instance, or the like. And then there are streaming services these days, like Google Play, the Amazon store, Apple Music, Spotify, Pandora, and others, that don't necessarily transfer files outright to your computer but instead stream the bits to you so that they're played in real time, so long as your internet connection can keep up with the required bandwidth.

So how do we think about the quality of these recordings, whichever of these file formats we're using? Well, you can think of it in terms of at least two parameters. One is the sampling frequency: the number of times per second that we take a digital snapshot, so to speak, of what a human would otherwise be hearing in person, so as to then represent it digitally using zeros and ones.
And the second parameter is the bit depth: just how many bits you use for each of those snapshots in time, taken some number of times per second, in order to represent the sound at that instant, essentially its amplitude. If you multiply those two values together, the bit depth and the sample rate, you get just how many total bits are necessary to store, for instance, one second of music on a single channel (a stereo recording needs that many bits for each of its two channels). These file formats vary, and they allow you to vary exactly what these parameters are. By using fewer bits, you might be able to save space but get a lower-quality recording; or, if you want a super-high-quality recording, you might use a higher bit rate altogether.

So now let's transition to graphics, what we see in the world of multimedia. It turns out that here, too, there are multiple file formats for representing graphics. And what is a graphic? Well, if you think about it, a graphic is really just a whole bunch of dots, otherwise known as pixels, arranged horizontally and vertically. Indeed, most images that you and I see on the web, on our phones, and on our computers are rectangular in nature, though you can make parts of an image transparent so that it might appear to be some other shape.
But at the end of the day, image file formats are rectangular in nature, and you can think of an image as just a grid of pixels, or dots. In the simplest form, each of those dots might be represented by a single bit, a 1 or a 0. So, for instance, here, if you look from far enough back, is what appears to be a very happy smiley face. But it's pretty simply implemented. If you think of this rectangular region, again, as just a whole bunch of dots or pixels, I've colored in black only those dots necessary to convey the idea of a happy face and left in white any of the dots that are part of the background. You might then consider the white pixels to be represented with a 1 and the black pixels with a 0, or vice versa. It doesn't really matter, so long as we're consistent in our file format. And so if you take a step back, you can kind of, sort of see the same image even among those zeros and ones, though it's really hard. That might be the simplest mapping from binary to an image: you simply have to decide that there's some number of bits horizontally and some number of bits vertically.
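Here is a minimal sketch of that simplest mapping, using a hypothetical 8-by-7 grid rather than the smiley face from the slide. As in the talk, a 1 means a white pixel and a 0 means a black one; the `render` helper is made up for illustration and simply prints each bit as a character so the picture becomes visible.

```python
# A tiny 1-bit "image": 1 = white pixel, 0 = black pixel (an arbitrary but consistent choice).
SMILEY = [
    "11111111",
    "11011011",  # eyes
    "11011011",
    "11111111",
    "10111101",  # corners of the mouth
    "11000011",  # bottom of the mouth
    "11111111",
]

def render(bitmap: list[str]) -> str:
    """Draw each bit as a character: '#' for black (0), '.' for white (1)."""
    return "\n".join("".join("#" if bit == "0" else "." for bit in row) for row in bitmap)

print(render(SMILEY))
```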
Then if a bit is a 1, it's a white pixel, and if it's a 0, it's a black pixel, or equivalently vice versa. But, of course, we don't generally use black-and-white images alone on the internet, on our phones, or on our computers. Indeed, the world would be pretty boring if it only looked like that. And that's how it looked way back in the day, even before there was digital and before we had file formats like this, when you just had black-and-white TV, which would be similar in spirit to what we're looking at here, though with some grayscale as well.

But here let's focus on color and its introduction in a digital context: RGB, red, green, blue. Whether or not you've heard this acronym before, it represents the three colors that can be mixed together to give us just about any color we want. Using three different values (how much red do you want, how much green, and how much blue), you can tell a computer to colorize each of those dots in a certain way. If you have none of these colors, you get a black dot, and if you have all of these colors mixed together at full intensity, you get a white dot.
But it's in the gradations in between that you get all sorts of disparate colors. So let's consider this. Here are three bytes, and each is a byte because each is 8 bits, where, again, a bit is just a 0 or a 1. So I have eight bits here, eight bits here, and eight bits here. The first byte, the first eight bits, is apparently all ones. The second byte is all zeros, and the third byte is all zeros as well. If you view each of these bytes, 1, 2, 3, as representing how much of a certain color (red, green, or blue, RGB), this appears to be a lot of red, because all of those bits are ones, with no green and no blue. So, RGB: lots of red, no green, no blue. And indeed, this is how a computer using eight bits per color, or 24 bits in total (8 plus 8 plus 8), would typically represent the color we know as red.

That is to say, if you think of this whole screen as just one dot (it's not quite a square; it's a rectangle in this case), and a computer wanted to make this dot red, it would store a pattern of 24 bits: the first eight all ones, the second eight all zeros, and the third eight all zeros as well. It would interpret those three groups of eight bits as meaning give me a lot of red, give me no green, give me no blue, and thus you get a whole screen full of red, or a whole pixel full of red.

What if we change it up? What if we have a zero byte, a byte of all ones, and then another zero byte, thereby making the red all zeros, the green all ones, and the blue all zeros? Indeed, we'll get a screen filled with green using that encoding of 24 bits. And you might guess that if we have zeros, zeros, and then ones for R, G, and B, this time we're going to get blue. That's how a computer using 24 bits would represent a dot that's entirely blue. Meanwhile, if you wanted to represent black, you would use all zeros for each of the R, G, and B values, and if you wanted to represent white, you would use all ones for each of the R, G, and B values.
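The 24-bit scheme can be sketched as packing three 8-bit values into a single integer: red in the top eight bits, then green, then blue. The function name `pack_rgb` is hypothetical, and this red-first ordering is just one convention; some file formats store the bytes in a different order (24-bit BMP pixel data, for one, is stored blue first).

```python
def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit channel values (0-255) into one 24-bit integer, red first."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (red << 16) | (green << 8) | blue

colors = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}
for name, rgb in colors.items():
    # Show all 24 bits, padded with leading zeros.
    print(f"{name:>5}: {pack_rgb(*rgb):024b}")
```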
And you can get any number of colors in between these extremes, in any number of variations of red, green, and blue, by just mixing those colors together in different quantities.

Now, it turns out that when we talk about graphical file formats, we don't typically talk or think in terms of binary. Rather, we use something called hexadecimal. Whereas binary has just two digits, 0 and 1, and whereas decimal, recall, has 10 digits, 0 through 9, hexadecimal is a little different: it has 16 possible digits. It's a little weird, but it's at least pretty straightforward. Those 16 digits are 0 through 9 and then A through F. In other words: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. So 0 is the smallest number a single hexadecimal digit can represent, and 15 is the largest, which is to say that F represents 15. So let's consider an example. Here is a pattern of eight bits, all of which are ones.
Let me go ahead and add a little space to these eight bits just to separate them into two groups of four, because it turns out that one of the nice mathematical features of hexadecimal is that each hexadecimal digit, 0 through F, represents exactly four bits. That is to say, we can take a number in binary like this, look at it as two halves, one half of a byte followed by another half of a byte, and use one hexadecimal digit instead of four binary digits to represent the first four bits, and then one other hexadecimal digit to represent the other four bits. So we can take something that takes eight symbols to represent and whittle it down to just two, which is pretty convenient. If those four bits were all zeros, in hexadecimal that would just be 0. But if we have all ones, 1, 1, 1, 1, and we convert that to hexadecimal, treating the bits from right to left as the ones place, the twos place, the fours place, and the eights place, that's 8 plus 4 plus 2 plus 1, the number 15, otherwise known in hexadecimal as F.
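The split-into-halves trick maps directly onto code. In this sketch (the function name `byte_to_hex` is hypothetical), each 4-bit half, or nibble, is read as a number from 0 to 15 and looked up among the 16 hexadecimal digits. Python's built-in `format(value, "02X")` would do the same conversion for an integer; the manual version just makes the two-nibbles idea visible.

```python
HEX_DIGITS = "0123456789ABCDEF"

def byte_to_hex(bits: str) -> str:
    """Convert an 8-character string of 0s and 1s into two hex digits, one per 4-bit half."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 binary digits")
    high, low = bits[:4], bits[4:]  # split the byte into two nibbles
    return HEX_DIGITS[int(high, 2)] + HEX_DIGITS[int(low, 2)]

print(byte_to_hex("11111111"))  # FF
print(byte_to_hex("00000000"))  # 00
print(byte_to_hex("10100101"))  # A5

# Three bytes (red, green, blue) become the familiar six-digit color code.
red = "".join(byte_to_hex(b) for b in ("11111111", "00000000", "00000000"))
print(red)  # FF0000
```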
So if you have a byte of bits, 8 bits, all of which are ones, you can think of that same byte as being two hexadecimal digits, FF. Rather than thinking of it as 1, 1, 1, 1, 1, 1, 1, 1, it's just FF: a more succinct way of representing the exact same information. Accordingly, if you want to think about red a little more succinctly, you don't have to think about it in terms of eight ones, eight zeros, and eight zeros; you can think of it as FF0000. Similarly, green is 00FF00, and blue is 0000FF. It's just a more succinct way of expressing oneself, and indeed a lot of graphics-editing programs, Photoshop being one of the most popular, use this notation by convention, certainly instead of binary and often instead of decimal.

So now let's consider some specific file formats. If you're a PC user, you might not have seen this in a while, but odds are that when you did, it was for quite a few years: this beautiful rolling hill with a beautiful cloudy sky behind it. This was, of course, the wallpaper, or background image, that came by default with Windows XP, an operating system from Microsoft for PCs.
So the very first time you turned on your computer and perhaps logged in, you would see a screen like this, and maybe some of your icons, your Recycle Bin, and the like. Now, as an aside and spoiler, this is what that same hill apparently looks like today. So it hasn't necessarily aged well. But for our purposes, what's interesting here is what this image was stored as. It turns out that this image originally was a bitmap file, BMP, or bitmap, B-I-T-M-A-P, to spell it out. And that file format really is what its name implies: it's a map of bits, a grid of bits, which is perfectly consistent with our earlier definition of a very simple smiley face using just zeros and ones, or black and white dots. This case clearly has many more colors than that, and indeed that's the case in general: graphical file formats on computers support dozens of colors, hundreds, thousands, maybe even millions of colors, certainly more than just black and white alone. But there's a finite amount of information here.
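Because every pixel gets the same number of bits, that finite amount of information is easy to compute. This sketch assumes 24-bit color and a hypothetical 800-by-600 image (not necessarily the wallpaper's real dimensions), and it counts only raw pixel data, ignoring the file's header and any per-row padding a real format might add. The function name `raw_image_bytes` is hypothetical.

```python
BITS_PER_PIXEL = 24  # 8 bits each for red, green, and blue

# How many distinct colors 24 bits can encode.
distinct_colors = 2 ** BITS_PER_PIXEL
print(f"{distinct_colors:,} possible colors")

def raw_image_bytes(width: int, height: int, bits_per_pixel: int = BITS_PER_PIXEL) -> int:
    """Bytes of uncompressed pixel data only (no file header, no row padding)."""
    return width * height * bits_per_pixel // 8

print(f"800 x 600: {raw_image_bytes(800, 600):,} bytes")
print(f"100 x 100: {raw_image_bytes(100, 100):,} bytes")
```

That works out to 16,777,216 possible colors, and 1,440,000 bytes for the hypothetical 800-by-600 image.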
And even though this looks like a beautifully crisp, green, grassy area and a beautifully blue sky with some very smooth clouds, if we actually zoom in on those clouds, you'll see that an image is indeed really just a grid of dots. In fact, let me zoom in on those clouds. I've not made any alterations; I simply used a graphics program to take that same sky and zoom in, zoom in, zoom in as much as I can. And as soon as you zoom in enough, you see that the cloud that previously looked especially smooth to the human eye really isn't. It's just that my human eyes can't see individual dots clearly when they're really small and there's a very high resolution, so to speak: a lot of pixels horizontally and a lot of pixels vertically in an image. But if I do zoom in, I see the pixelation, so to speak, whereby you actually see the dots. And you can see that those clouds are really just roughly represented as a grid of dots, or a map of pixels: a rectangular region of pixels.
So that's all very interesting, because it would seem that we don't have an endless ability to zoom and zoom and zoom in and see more and more detail unless that information is already there. And so, much as with audio, where you have a choice over just how many bits to use, in the world of images you have discretion over how many bits to use to represent each dot's color. That might indeed be just 8 bits for red, 8 bits for green, and 8 bits for blue, AKA 24-bit color, but resolution also comes into play. If you have an image that's only 100 pixels by 100 pixels, horizontally by vertically, it might only be this big. That might not be big enough to fill your whole background wallpaper on your computer, so you might try to scale it up or zoom in on it. But when you do that, you're taking only a limited amount of information, 100 pixels by 100 pixels, and essentially just duplicating those pixels, making them bigger and blotchier, just to fill your screen. Better would be not to start with an image with so few pixels, but rather to get a much higher-resolution image.
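The blotchy enlargement described above is essentially nearest-neighbor scaling: each original pixel is simply repeated. Here is a minimal sketch with a hypothetical `upscale_nearest` function and pixels as (red, green, blue) tuples. Real image editors usually offer smoother interpolation methods as well, but no method can invent detail that was never captured.

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255

def upscale_nearest(image: list[list[Pixel]], factor: int) -> list[list[Pixel]]:
    """Enlarge an image by repeating each pixel `factor` times horizontally and vertically."""
    result = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]  # repeat each pixel sideways
        result.extend(list(wide_row) for _ in range(factor))        # repeat the whole row downward
    return result

tiny = [[(255, 0, 0), (0, 0, 255)],
        [(0, 255, 0), (255, 255, 255)]]
big = upscale_nearest(tiny, 3)
print(len(big), "x", len(big[0]))  # 6 x 6, but still only 4 distinct colors
```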
And indeed, this is what you get with newer and better camera phones and newer, better digital cameras these days: among other things, you get higher and higher resolution. More and more dots, so that the dots we humans ultimately see on our screens are so small that the image looks ever smoother than, say, an image like this. So generally speaking, higher resolution gives us higher fidelity and a cleaner image, though other factors in cameras certainly play into that as well.

But there's something else I notice here. It seems a little silly that I'm using the same number of bits to represent the color of every one of the dots on the screen. Even though I do see a few different shades of gray or white in there, and light blue and dark blue, I see a lot of identical blue throughout this image. There's a lot of redundancy. Indeed, if we rewind, there's a whole lot of blue in the original image itself, a whole bunch of similar white, it would seem, in the middles of the clouds, and a whole bunch of similar-looking green. And yet we're using, by default it would seem, 24 bits for every pixel, which just seems wasteful when one pixel is identical to the one next to it.
So it turns out that graphical file formats can often be compressed, and this can be done in different ways: losslessly or lossily. Earlier, you'll recall, I proposed shrinking audio files by throwing away information that maybe my human ears can't hear, or that I, as a non-audiophile, might not even notice is missing. That would be lossy compression: I'm just throwing information away, assuming that the user's not going to notice. But that's not always necessary. Sometimes you can do lossless compression, whereby you use fewer bits to store the same information; you just have to store it more intelligently.

So consider this example here, where you have an apple against a blue backdrop that, much like our blue sky, seems pretty consistent throughout. It seems a little silly, intuitively, to record an image like this on disk as follows, if we think of me as a verbalization of a file format: make this pixel blue, make this pixel blue, make this pixel blue, make this pixel blue, make this pixel blue, make this pixel blue. I'm literally saying the same sentence or, more technically, using the same 24 bits for every pixel across that entire row, even though my sentence isn't changing.
Instead, what a clever file format might do is this. This is not what the user sees, but it is what the file format could store with respect to all of that redundant blue. Just remember, for instance, the leftmost pixel's color by saying this pixel is blue, and then, for the rest of the row, or scanline, as it's called in an image, just say: and so are the rest of the pixels in this row. So I can say much more concisely: essentially, repeat this color throughout the rest of the row, thereby saving myself any number of sentences, let alone any number of 24-bit values. And I can do the same here: make this pixel blue, and then repeat that color again and again and again. Now, it gets a little less efficient as soon as we hit, say, the stem on the apple, because then the sentence has to change. Then we have to say something like make this pixel brown, make this pixel blue, and then repeat again. So we have to, kind of, stop and start if there's some obstruction in the way, and the same goes for the red apple itself.
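The "repeat this color" idea is commonly called run-length encoding (RLE). Here is a minimal sketch of it, not the scheme of any particular file format: each run of identical pixels in a scanline becomes a (count, color) pair, and decoding expands the pairs back out. The brown pixel stands in for the apple's stem, and the exact color values and function names are made up for illustration.

```python
from itertools import groupby

def rle_encode(scanline):
    """Collapse runs of identical pixels into (count, pixel) pairs."""
    return [(len(list(run)), pixel) for pixel, run in groupby(scanline)]

def rle_decode(pairs):
    """Expand (count, pixel) pairs back into the original scanline."""
    return [pixel for count, pixel in pairs for _ in range(count)]

BLUE, BROWN = (0, 0, 255), (139, 69, 19)
row = [BLUE] * 10 + [BROWN] + [BLUE] * 9  # 20 pixels, interrupted once by the "stem"

encoded = rle_encode(row)
print(encoded)  # [(10, (0, 0, 255)), (1, (139, 69, 19)), (9, (0, 0, 255))]
assert rle_decode(encoded) == row  # lossless: the original row comes back exactly
```

Twenty pixels shrink to three pairs here; a row full of alternating colors, by contrast, would not shrink at all, which is the stop-and-start cost described above.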
376 00:19:43,530 --> 00:19:47,820 But just look, based on the white, at how much information we're potentially 377 00:19:47,820 --> 00:19:50,130 saving or how many bits we're potentially saving, 378 00:19:50,130 --> 00:19:54,420 and yet we're saving those bits in a way that the original information is 379 00:19:54,420 --> 00:19:55,350 recoverable. 380 00:19:55,350 --> 00:19:58,110 Just because we don't store 24 bits representing 381 00:19:58,110 --> 00:20:02,250 blue for every one of these dots on the screen doesn't mean we 382 00:20:02,250 --> 00:20:05,640 can't display blue there just by interpreting this file 383 00:20:05,640 --> 00:20:07,710 format a little more cleverly. 384 00:20:07,710 --> 00:20:12,990 And so this is indeed how a file format might actually losslessly compress 385 00:20:12,990 --> 00:20:16,200 an image, using fewer bits to store it, but in a way 386 00:20:16,200 --> 00:20:20,140 where you can recover the original image itself. 387 00:20:20,140 --> 00:20:24,540 Now let's take a look at another example, this time of lossy compression. 388 00:20:24,540 --> 00:20:28,620 Here is a beautiful sunflower taken somewhere here on campus 389 00:20:28,620 --> 00:20:29,730 at Harvard University. 390 00:20:29,730 --> 00:20:34,890 This is a high-quality JPEG photograph, where JPEG is a popular file 391 00:20:34,890 --> 00:20:37,440 format, especially for photographs. 392 00:20:37,440 --> 00:20:42,000 And this image here was somewhat compressed, but not very compressed. 393 00:20:42,000 --> 00:20:45,930 In fact, only if I put my face really awkwardly close to the screen 394 00:20:45,930 --> 00:20:49,350 do I see that it's a little bit blotchy way up close. 395 00:20:49,350 --> 00:20:53,640 But from just a foot or so or beyond, it looks perfectly pristine. 396 00:20:53,640 --> 00:20:56,650 But that changes if we compress this image further.
397 00:20:56,650 --> 00:21:01,080 Suppose that this image is just too big to fit on my Facebook profile page, 398 00:21:01,080 --> 00:21:06,240 or it's just too big to email to a friend via my phone. 399 00:21:06,240 --> 00:21:10,170 In other words, I need to use fewer bits, or fewer megabytes 400 00:21:10,170 --> 00:21:13,920 if it's a really big file, to store this same image 401 00:21:13,920 --> 00:21:16,440 and convey the gist of the image to that friend. 402 00:21:16,440 --> 00:21:19,110 Now I see a little bit of blue and I do see a bunch of yellow, 403 00:21:19,110 --> 00:21:21,690 but it's not quite the same clean pattern 404 00:21:21,690 --> 00:21:25,110 that we saw with the apple or even the blissful blue sky 405 00:21:25,110 --> 00:21:27,660 above the green grassy hill. 406 00:21:27,660 --> 00:21:31,170 And so if I were instead using a file format that can still 407 00:21:31,170 --> 00:21:36,780 be compressed, but lossily, where we're actually throwing information away, 408 00:21:36,780 --> 00:21:39,270 this might be the before image. 409 00:21:39,270 --> 00:21:40,560 And now wait for it. 410 00:21:40,560 --> 00:21:43,530 This might be the after image. 411 00:21:43,530 --> 00:21:46,500 So it's still clearly a sunflower, though it looks 412 00:21:46,500 --> 00:21:48,420 a little more sickly at this point. 413 00:21:48,420 --> 00:21:50,070 But it definitely looks blotchier. 414 00:21:50,070 --> 00:21:52,620 In fact, from a foot or more away, I can actually 415 00:21:52,620 --> 00:21:55,140 see that my sky has become very pixelated. 416 00:21:55,140 --> 00:21:58,560 It almost looks like Super Mario Bros. on the old Nintendo systems 417 00:21:58,560 --> 00:22:00,330 where you could really see the big dots.
418 00:22:00,330 --> 00:22:03,120 And the greenery here is just a grid of pixels too, 419 00:22:03,120 --> 00:22:05,250 and even the flower has really just become 420 00:22:05,250 --> 00:22:09,210 a collection of dots that I ever so clearly see on the screen. 421 00:22:09,210 --> 00:22:12,720 And certainly this flower doesn't look so good anymore. 422 00:22:12,720 --> 00:22:13,890 So let's rewind. 423 00:22:13,890 --> 00:22:19,620 This was before, after, before, after. 424 00:22:19,620 --> 00:22:24,330 And so this is what it means to lossily compress an image. 425 00:22:24,330 --> 00:22:28,830 I cannot go from this pretty poor version back to the original 426 00:22:28,830 --> 00:22:33,210 if I have achieved this compression by just throwing away some of those bits. 427 00:22:33,210 --> 00:22:35,550 So whereas before I was very cleverly just 428 00:22:35,550 --> 00:22:39,870 remembering repetition in the image, in this case, using this file format, 429 00:22:39,870 --> 00:22:42,000 especially when you really turn the virtual knob 430 00:22:42,000 --> 00:22:44,460 and say compress this as much as you can, 431 00:22:44,460 --> 00:22:47,070 essentially what my graphical software is going to do 432 00:22:47,070 --> 00:22:48,600 is start to use approximations. 433 00:22:48,600 --> 00:22:52,800 Well, does this leaf here really need to be 20 different shades of green? 434 00:22:52,800 --> 00:22:54,115 How about just two? 435 00:22:54,115 --> 00:22:57,240 And that's why I get this big green blotch here and this other green blotch 436 00:22:57,240 --> 00:22:57,970 here. 437 00:22:57,970 --> 00:23:01,500 Does this sky really need to be 30 different shades of blue? 438 00:23:01,500 --> 00:23:04,950 How about two shades of blue and two shades of gray? 439 00:23:04,950 --> 00:23:07,860 And so that might be a way to use less information to still represent 440 00:23:07,860 --> 00:23:09,070 the same sky.
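The "20 shades of green? How about just two?" approximation can be sketched as simple quantization: snap every value to the nearest of a few allowed levels. This is an illustrative assumption, not JPEG's actual algorithm (real encoders are considerably more sophisticated), and each shade is modeled as a hypothetical green intensity from 0 to 255. What it does show is why the step is lossy: many different inputs collapse onto the same output.

```python
from typing import List

def quantize(values: List[int], levels: List[int]) -> List[int]:
    """Replace each value with the closest allowed level (ties go to the earlier level)."""
    return [min(levels, key=lambda level: abs(level - v)) for v in values]

shades = [60 + 10 * i for i in range(20)]  # 20 hypothetical greens: 60, 70, ..., 250
two_greens = [100, 200]                    # hypothetical palette of just two greens

approx = quantize(shades, two_greens)
print(len(set(shades)), "->", len(set(approx)))  # 20 -> 2

# Lossy: 60 and 150 both became 100, so nothing in `approx` records which it was.
assert approx[0] == approx[9] == 100
```

Storing two distinct values instead of twenty takes fewer bits, but, as with the sunflower, there is no way to run the process backwards and recover the original shades.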
441 00:23:09,070 --> 00:23:13,920 I don't know, in this file format, just how clear the sky used to be, 442 00:23:13,920 --> 00:23:17,700 because those dots have essentially been thrown away and aggregated in this way. 443 00:23:17,700 --> 00:23:19,320 But it makes for a much 444 00:23:19,320 --> 00:23:21,650 smaller file. 445 00:23:21,650 --> 00:23:23,724 And so what are the formats that are at our disposal? 446 00:23:23,724 --> 00:23:25,890 Well, there's any number of options out there today, 447 00:23:25,890 --> 00:23:28,110 but perhaps the most common are these. 448 00:23:28,110 --> 00:23:30,330 There's the bitmap file format, which was commonly 449 00:23:30,330 --> 00:23:34,050 used originally in Windows and other contexts, not super common these days. 450 00:23:34,050 --> 00:23:36,450 Certainly not on the web, but it does indeed 451 00:23:36,450 --> 00:23:41,940 lay out all of your pixels in a grid, essentially, on disk as zeros and ones. 452 00:23:41,940 --> 00:23:48,150 Meanwhile, there's GIF, which is commonly used for low-quality images 453 00:23:48,150 --> 00:23:49,800 in multiple senses of the word. 454 00:23:49,800 --> 00:23:53,790 This is often used for icons on the screen or clip art that you might see, 455 00:23:53,790 --> 00:23:57,964 and it's also increasingly used for internet memes or the kinds of images 456 00:23:57,964 --> 00:23:59,880 that you might forward along to friends or see 457 00:23:59,880 --> 00:24:03,600 popping up on your screen, in large part because GIFs can be animated. 458 00:24:03,600 --> 00:24:07,380 So they're, sort of, a very low-end version of a video file, 459 00:24:07,380 --> 00:24:10,720 where really 460 00:24:10,720 --> 00:24:14,250 it's like a video file with just a few images inside 461 00:24:14,250 --> 00:24:17,520 of it that often play on repeat, one after the other, 462 00:24:17,520 --> 00:24:19,890 creating the illusion of some form of animation.
463 00:24:19,890 --> 00:24:24,120 But the quality of GIFs tends to be not very high: although they 464 00:24:24,120 --> 00:24:27,570 can be losslessly compressed, as we saw with the apple before, 465 00:24:27,570 --> 00:24:30,990 they only support 8-bit color. 466 00:24:30,990 --> 00:24:34,140 And 8 bits implies that we can only 467 00:24:34,140 --> 00:24:38,466 have a total of 256 colors in the image itself, which limits the range. 468 00:24:38,466 --> 00:24:40,590 And so they tend not to look great, especially when 469 00:24:40,590 --> 00:24:45,630 large, for things like photographs of humans and grassy knolls. 470 00:24:45,630 --> 00:24:49,030 JPEG, meanwhile, is the file format we saw just a moment ago 471 00:24:49,030 --> 00:24:50,760 of that beautiful sunflower. 472 00:24:50,760 --> 00:24:54,617 This actually supports 24-bit color, but is lossily compressed, 473 00:24:54,617 --> 00:24:57,450 so you might lose some information when shrinking those image files, 474 00:24:57,450 --> 00:25:01,500 but it allows you so many more colors that you can see images typically 475 00:25:01,500 --> 00:25:04,950 with much higher fidelity at much greater quality. 476 00:25:04,950 --> 00:25:07,110 Meanwhile, there are PNGs as well. 477 00:25:07,110 --> 00:25:10,380 PNGs are commonly used for high-quality graphics 478 00:25:10,380 --> 00:25:15,150 that you might want to print or resize, supporting 24-bit color as well, 479 00:25:15,150 --> 00:25:18,090 and are generally used for images that you might indeed 480 00:25:18,090 --> 00:25:20,130 want to use in multiple contexts. 481 00:25:20,130 --> 00:25:25,110 Not so much photographs, but other artwork that's higher quality 482 00:25:25,110 --> 00:25:25,950 than GIFs. 483 00:25:25,950 --> 00:25:27,390 And here are just a few examples. 484 00:25:27,390 --> 00:25:32,190 This is, perhaps, the most ridiculous animated GIF that I could find.
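The arithmetic behind "8 bits means 256 colors" is just powers of two: n bits can distinguish 2^n values. The sketch below also estimates how many bytes the raw pixels of an uncompressed image would take; the 1024 by 768 dimensions are a hypothetical example, not taken from the slides, and real files add headers on top.

```python
def color_count(bits_per_pixel: int) -> int:
    """Number of distinct colors n bits can represent: 2 ** n."""
    return 2 ** bits_per_pixel

def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data only (no headers), rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8

print(color_count(8))    # 256       (8-bit color, as in GIF)
print(color_count(24))   # 16777216  (24-bit color, as in JPEG and PNG)

# A hypothetical 1024 x 768 image, before any compression:
print(raw_size_bytes(1024, 768, 8))    # 786432 bytes
print(raw_size_bytes(1024, 768, 24))   # 2359296 bytes, three times as much
```

Tripling the bits per pixel multiplies the number of available colors by 2^16, which is why 24-bit formats handle photographs so much better, and also why compression matters so much for them.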
485 00:25:32,190 --> 00:25:34,660 This one here is a cat flying through the sky. 486 00:25:34,660 --> 00:25:37,680 And this is an animated GIF in the sense that it's really just one 487 00:25:37,680 --> 00:25:39,960 image after another, after another, after another, 488 00:25:39,960 --> 00:25:43,050 and they're repeating again and again and again and again. 489 00:25:43,050 --> 00:25:45,330 So even though it looks like motion, really you're 490 00:25:45,330 --> 00:25:47,356 just seeing a bunch of images, each of which 491 00:25:47,356 --> 00:25:49,230 has the cat in a slightly different position, 492 00:25:49,230 --> 00:25:51,924 and its rainbow and the stars in a slightly different position. 493 00:25:51,924 --> 00:25:54,840 And if you loop these again and again, it looks like the cat's moving, 494 00:25:54,840 --> 00:25:58,750 but really you're just seeing a whole bunch of images every split second. 495 00:25:58,750 --> 00:26:02,790 Meanwhile, here is another JPEG in addition to the sunflower earlier. 496 00:26:02,790 --> 00:26:06,240 This is a beautiful shot of the ceiling here in Sanders Theatre at Harvard 497 00:26:06,240 --> 00:26:09,270 University, and JPEG really lends itself to photography, 498 00:26:09,270 --> 00:26:11,610 because you have not only a huge range of colors, 499 00:26:11,610 --> 00:26:14,910 you also have the choice not to compress the files very much. 500 00:26:14,910 --> 00:26:17,970 The fact that my sunflower got so ugly on the screen 501 00:26:17,970 --> 00:26:21,510 was because I deliberately said compress that sunflower as much as you can, 502 00:26:21,510 --> 00:26:23,070 but that doesn't need to be the case.
503 00:26:23,070 --> 00:26:25,400 If you can afford to spend the bytes on disk 504 00:26:25,400 --> 00:26:28,400 or you can afford to post a really big image on the internet, 505 00:26:28,400 --> 00:26:30,830 then you can certainly use minimal compression 506 00:26:30,830 --> 00:26:33,140 and capture a really beautiful image. 507 00:26:33,140 --> 00:26:36,080 As for a PNG, here might be a good opportunity for one: 508 00:26:36,080 --> 00:26:38,840 a really high-resolution version of, say, Harvard's crest 509 00:26:38,840 --> 00:26:41,870 that you might want to print small on some piece of paper or large 510 00:26:41,870 --> 00:26:43,140 on a banner or the like. 511 00:26:43,140 --> 00:26:48,110 And so this might lend itself especially to an application like that. 512 00:26:48,110 --> 00:26:51,140 Of course, we don't have an infinite amount of information 513 00:26:51,140 --> 00:26:53,750 at our disposal in graphics. 514 00:26:53,750 --> 00:26:56,810 Rather, we only have the pixels and the dots 515 00:26:56,810 --> 00:27:01,550 and the colors that are there when that image was saved in some file format. 516 00:27:01,550 --> 00:27:06,650 And so it's all too common to see, in popular television and film, 517 00:27:06,650 --> 00:27:11,007 sort of, abuses of what it means to be a multimedia format, and a graphical 518 00:27:11,007 --> 00:27:11,840 file format at that, 519 00:27:11,840 --> 00:27:15,530 so much so that this notion of enhance has entered the lexicon, 520 00:27:15,530 --> 00:27:19,220 where enhance apparently means, in the media, 521 00:27:19,220 --> 00:27:22,580 make this image as clearly readable as possible 522 00:27:22,580 --> 00:27:24,980 no matter what format it was saved in. 523 00:27:24,980 --> 00:27:29,770 And we can see some examples of that with this popular TV show here. 524 00:27:29,770 --> 00:27:32,210 SPEAKER 2: We know. 525 00:27:32,210 --> 00:27:35,702 SPEAKER 3: That at 9:15 Ray Santoya was at the ATM.
526 00:27:35,702 --> 00:27:39,230 SPEAKER 2: The question is what was he doing at 9:16? 527 00:27:39,230 --> 00:27:42,040 SPEAKER 3: Shooting the 9 millimeter at something. 528 00:27:42,040 --> 00:27:43,640 Maybe he saw the sniper. 529 00:27:43,640 --> 00:27:45,630 SPEAKER 2: [INAUDIBLE] 530 00:27:45,630 --> 00:27:46,970 SPEAKER 3: Right. 531 00:27:46,970 --> 00:27:47,576 Go back one. 532 00:27:47,576 --> 00:27:48,700 SPEAKER 2: What do you see? 533 00:27:48,700 --> 00:27:56,560 534 00:27:56,560 --> 00:28:00,126 SPEAKER 3: Bring his face up full screen. 535 00:28:00,126 --> 00:28:01,593 SPEAKER 2: His glasses. 536 00:28:01,593 --> 00:28:03,549 SPEAKER 3: There's a reflection. 537 00:28:03,549 --> 00:28:12,840 538 00:28:12,840 --> 00:28:14,630 SPEAKER 2: [INAUDIBLE] baseball team. 539 00:28:14,630 --> 00:28:15,560 That's their logo. 540 00:28:15,560 --> 00:28:18,290 SPEAKER 3: And he's talking to whoever's wearing a jacket. 541 00:28:18,290 --> 00:28:19,980 SPEAKER 2: We may have a witness. 542 00:28:19,980 --> 00:28:21,934 SPEAKER 3: To both shootings. 543 00:28:21,934 --> 00:28:23,850 SPEAKER 1: All right, let's take a closer look 544 00:28:23,850 --> 00:28:25,420 at exactly what we just saw. 545 00:28:25,420 --> 00:28:28,290 So they're watching this video of some bad guy, presumably, 546 00:28:28,290 --> 00:28:30,090 and they're trying to identify the suspect. 547 00:28:30,090 --> 00:28:32,048 So they're really just looking at what's called 548 00:28:32,048 --> 00:28:34,660 a frame in a video, which for all intents and purposes 549 00:28:34,660 --> 00:28:37,080 is just an image inside of a video. 550 00:28:37,080 --> 00:28:38,190 Because what's a video?
551 00:28:38,190 --> 00:28:40,560 Well, much like the animation we saw a moment ago, 552 00:28:40,560 --> 00:28:43,230 a video really is just a set of images being 553 00:28:43,230 --> 00:28:45,480 shown really fast to the human eye, generally 554 00:28:45,480 --> 00:28:48,120 at a rate of 24 frames or images per second 555 00:28:48,120 --> 00:28:50,370 or as many as 30 frames or images per second, 556 00:28:50,370 --> 00:28:54,060 thereby creating the illusion of motion, or really, motion pictures. 557 00:28:54,060 --> 00:28:56,130 But really it's just a whole bunch of pictures 558 00:28:56,130 --> 00:28:57,910 being shown to us super quickly. 559 00:28:57,910 --> 00:28:59,280 So here's one such picture. 560 00:28:59,280 --> 00:29:02,470 And here, apparently, is the key to solving this mystery. 561 00:29:02,470 --> 00:29:07,050 Indeed, if we enhance that glint in this fellow's eye, 562 00:29:07,050 --> 00:29:08,940 we apparently see exactly this. 563 00:29:08,940 --> 00:29:13,950 And by the magical incantation of enhance do we apparently see this. 564 00:29:13,950 --> 00:29:16,950 And this is where reality breaks down. 565 00:29:16,950 --> 00:29:19,110 If this is the entirety of the information that 566 00:29:19,110 --> 00:29:21,330 has been stored in some file format, and indeed you 567 00:29:21,330 --> 00:29:24,960 can see the pixels and the pixelation, the blotchiness, 568 00:29:24,960 --> 00:29:28,020 because only so many bits and only so much resolution 569 00:29:28,020 --> 00:29:30,450 was used to store that image, and we are looking 570 00:29:30,450 --> 00:29:34,140 at a tiny, tiny, tiny fraction of it in the reflection 571 00:29:34,140 --> 00:29:38,400 of that fellow's sunglasses, then this is all the information that we might have.
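A small sketch of why "enhance" can't conjure detail. The simplest way to enlarge an image, nearest-neighbor scaling, just copies each existing pixel into a bigger block, and smoother methods such as bilinear interpolation likewise only blend pixels that are already there. The tiny grayscale "reflection" below is hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, one list per row

def upscale_nearest(image: Image, factor: int) -> Image:
    """Enlarge by an integer factor, copying each pixel into a factor x factor block."""
    bigger: Image = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        bigger.extend([list(wide_row) for _ in range(factor)])
    return bigger

def downscale_nearest(image: Image, factor: int) -> Image:
    """Keep every factor-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]

tiny = [[10, 200],
        [200, 10]]                  # a hypothetical 2 x 2 "reflection in the sunglasses"
big = upscale_nearest(tiny, 4)      # now 8 x 8 = 64 pixels
print(len(big), len(big[0]))        # 8 8
print(sorted({p for row in big for p in row}))  # [10, 200]: still only the original values
assert downscale_nearest(big, 4) == tiny  # the bigger image carries nothing new
```

The enlarged image has sixteen times as many pixels, yet every one of them is a copy of one of the original four: making it bigger doesn't make it clearer.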
572 00:29:38,400 --> 00:29:40,262 Now, you might stare at this all day long 573 00:29:40,262 --> 00:29:41,970 and, kind of, sort of, think that you see 574 00:29:41,970 --> 00:29:46,140 who it is that perpetrated this crime, 575 00:29:46,140 --> 00:29:49,110 but you're certainly not going to get from that anything 576 00:29:49,110 --> 00:29:52,470 close to the resolution of this, unless the original video 577 00:29:52,470 --> 00:29:56,340 and, therefore, the original frame or image was as high-resolution 578 00:29:56,340 --> 00:29:58,220 as this output suggests. 579 00:29:58,220 --> 00:30:01,290 So the information, the bits, the pixels just aren't there. 580 00:30:01,290 --> 00:30:06,110 And even cartoons of today, like Futurama, know this. 581 00:30:06,110 --> 00:30:07,690 SPEAKER 4: Magnify that death sphere. 582 00:30:07,690 --> 00:30:10,430 583 00:30:10,430 --> 00:30:11,890 Why is it still blurry? 584 00:30:11,890 --> 00:30:13,850 SPEAKER 5: That's the resolution we have. 585 00:30:13,850 --> 00:30:16,190 Making it bigger doesn't make it clearer. 586 00:30:16,190 --> 00:30:19,090 SPEAKER 4: It does on CSI Miami. 587 00:30:19,090 --> 00:30:21,650 SPEAKER 1: All right, and what better segue, then, 588 00:30:21,650 --> 00:30:26,330 to video file formats themselves than these excerpts from some actual videos. 589 00:30:26,330 --> 00:30:30,050 Indeed, you can think of a video file format as very reminiscent of something 590 00:30:30,050 --> 00:30:30,900 from the real world. 591 00:30:30,900 --> 00:30:34,130 In fact, as a kid, if you either made or played with these little flip books, 592 00:30:34,130 --> 00:30:38,240 you might have had the ability to actually see something animated really 593 00:30:38,240 --> 00:30:41,870 by just flipping through some physical pieces of paper really quickly. 594 00:30:41,870 --> 00:30:45,320 Well, that's all a video format is in the digital age.
595 00:30:45,320 --> 00:30:49,370 It is simply a file format that contains essentially a whole bunch of images 596 00:30:49,370 --> 00:30:53,630 inside of it, each of which is shown to you so fast that there appears to be 597 00:30:53,630 --> 00:30:55,940 the illusion, just like this, of motion. 598 00:30:55,940 --> 00:30:59,330 And you're seeing 24 images per second, 30 images per second, 599 00:30:59,330 --> 00:31:03,710 and it's not necessarily that they're all PNGs or JPEGs or GIFs 600 00:31:03,710 --> 00:31:05,930 or actual images inside of it; there are actually 601 00:31:05,930 --> 00:31:09,530 more complicated and sophisticated ways of storing the information so that you're 602 00:31:09,530 --> 00:31:11,697 not just storing each of the frames. 603 00:31:11,697 --> 00:31:13,655 You can actually use algorithms and mathematics 604 00:31:13,655 --> 00:31:15,770 to actually go from one frame to another. 605 00:31:15,770 --> 00:31:19,070 And indeed, there are some very clever opportunities 606 00:31:19,070 --> 00:31:23,000 when it comes to videos for compressing video formats themselves. 607 00:31:23,000 --> 00:31:27,500 We can certainly leverage within frames, or intraframe, so to speak, 608 00:31:27,500 --> 00:31:30,334 the exact same techniques that we saw earlier with something 609 00:31:30,334 --> 00:31:33,500 like a GIF and an apple, where we can actually leverage the fact that there's 610 00:31:33,500 --> 00:31:37,760 redundancy in a given frame of a video, throw that redundancy away, and just 611 00:31:37,760 --> 00:31:42,200 remember that the whole sky is blue or the whole rest of some line or row in a frame 612 00:31:42,200 --> 00:31:45,380 is blue and, therefore, save on information and bits. 613 00:31:45,380 --> 00:31:48,230 But with videos, you have another opportunity, because you don't just 614 00:31:48,230 --> 00:31:52,400 have an individual picture; you have a picture and every subsequent picture, 615 00:31:52,400 --> 00:31:54,870 which might look very similar as well.
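Some rough arithmetic shows why these compression opportunities matter so much for video. Assuming hypothetical full-HD frames of 1920 by 1080 pixels at 24 bits per pixel, storing every frame as raw pixels at the 24 or 30 frames per second mentioned above adds up quickly:

```python
def raw_video_bytes_per_second(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Bytes per second if every frame were stored as raw, uncompressed pixels."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps

for fps in (24, 30):
    rate = raw_video_bytes_per_second(1920, 1080, 24, fps)
    print(f"{fps} fps: {rate / 1_000_000:.1f} MB/s, {rate * 60 / 1_000_000_000:.1f} GB per minute")
```

Run as-is, this prints 149.3 MB/s (9.0 GB per minute) at 24 fps and 186.6 MB/s (11.2 GB per minute) at 30 fps, before any intraframe or interframe compression is applied.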
616 00:31:54,870 --> 00:31:59,450 In fact, if I hold very still for multiple seconds, 617 00:31:59,450 --> 00:32:02,000 odds are almost everything in this video is 618 00:32:02,000 --> 00:32:04,940 staying the same except for my mouth, apparently 619 00:32:04,940 --> 00:32:10,730 my pointer finger, and my lips and eyes as I blink, 620 00:32:10,730 --> 00:32:13,710 but everything else about me is pretty much the same. 621 00:32:13,710 --> 00:32:18,200 So why would you, in your file format, store all of the various colors 622 00:32:18,200 --> 00:32:20,330 that we see behind me and around me? 623 00:32:20,330 --> 00:32:21,690 You don't need to do that. 624 00:32:21,690 --> 00:32:26,030 You can also leverage something called interframe compression, whereby, 625 00:32:26,030 --> 00:32:29,490 in its simplest form, you can take a look at the current frame of a video 626 00:32:29,490 --> 00:32:32,387 and look at the next frame and decide what has changed. 627 00:32:32,387 --> 00:32:34,220 And maybe look at another frame after that, see 628 00:32:34,220 --> 00:32:37,219 what has changed, and another frame after that and see what has changed. 629 00:32:37,219 --> 00:32:42,620 And essentially store not every image from the starting point 630 00:32:42,620 --> 00:32:47,630 to the ending point, but really just the differences between those frames that 631 00:32:47,630 --> 00:32:48,500 are adjacent. 632 00:32:48,500 --> 00:32:52,850 So for instance, if we start off with this bee here on a trio of flowers 633 00:32:52,850 --> 00:32:56,060 and he moves and he moves and he moves, then, 634 00:32:56,060 --> 00:32:59,150 if we're not compressing this video and these four frames that 635 00:32:59,150 --> 00:33:03,020 compose the video, we could just store each of those images essentially as is, 636 00:33:03,020 --> 00:33:06,240 even though the flowers are not moving and the leaves are not moving. 637 00:33:06,240 --> 00:33:08,240 The only thing that's moving is the bee.
638 00:33:08,240 --> 00:33:10,970 Or we can be more clever about this just as we were 639 00:33:10,970 --> 00:33:13,310 with the blue sky behind the apple. 640 00:33:13,310 --> 00:33:16,685 We can recognize that between picture one and picture four, 641 00:33:16,685 --> 00:33:20,540 or the first four frames of this video, the only thing that's moving 642 00:33:20,540 --> 00:33:22,100 is indeed that bee. 643 00:33:22,100 --> 00:33:26,900 So maybe we should store just what we'll call keyframes or a snapshot 644 00:33:26,900 --> 00:33:29,520 in time of what the video looks like. 645 00:33:29,520 --> 00:33:31,640 And then on each subsequent frame, essentially, 646 00:33:31,640 --> 00:33:34,980 just remember what information has changed, in this case, 647 00:33:34,980 --> 00:33:38,900 the position of the bee and leave it to the computer playing the video 648 00:33:38,900 --> 00:33:42,560 to infer or interpolate these inner frames based 649 00:33:42,560 --> 00:33:44,320 on those so-called keyframes. 650 00:33:44,320 --> 00:33:47,630 Use a bit of clever math, use some algorithms to actually figure out 651 00:33:47,630 --> 00:33:49,700 that, oh, here's where the bee now is. 652 00:33:49,700 --> 00:33:53,600 Let me redraw the exact same flower and the exact same leaves behind that bee. 653 00:33:53,600 --> 00:33:56,150 But I now only have to store really as many bits 654 00:33:56,150 --> 00:33:58,936 as it takes to remember where the bee now is there, 655 00:33:58,936 --> 00:34:01,310 where the bee now is here, and then just for good measure 656 00:34:01,310 --> 00:34:03,854 to keep everything synchronized maybe every few frames 657 00:34:03,854 --> 00:34:06,770 we'll have another keyframe that, even though it's a little expensive, 658 00:34:06,770 --> 00:34:08,210 stores the entirety of the frame. 
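The keyframe-plus-differences idea described above can be sketched as follows. Assumptions, all hypothetical: each frame is a flat list of pixel values, a "delta" is just the list of (index, new value) pairs that changed since the previous frame, and a full keyframe is stored every `keyframe_interval` frames. Real codecs are far more sophisticated (they predict motion rather than diffing raw pixels), so this illustrates the principle, not any actual format.

```python
from typing import List, Tuple, Union

Frame = List[int]              # hypothetical: one value per pixel
Delta = List[Tuple[int, int]]  # (pixel index, new value) pairs
Packet = Tuple[str, Union[Frame, Delta]]

def encode(frames: List[Frame], keyframe_interval: int) -> List[Packet]:
    """Store a full keyframe every keyframe_interval frames, otherwise only the changes."""
    packets: List[Packet] = []
    previous: Frame = []
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0:
            packets.append(("key", list(frame)))  # expensive, but a safe restart point
        else:
            changes = [(j, v) for j, (old, v) in enumerate(zip(previous, frame)) if old != v]
            packets.append(("delta", changes))
        previous = frame
    return packets

def decode(packets: List[Packet]) -> List[Frame]:
    """Rebuild every frame by applying each delta to the frame before it."""
    frames: List[Frame] = []
    current: Frame = []
    for kind, payload in packets:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)  # copy, then patch only the pixels that changed
            for j, v in payload:
                current[j] = v
        frames.append(current)
    return frames

def bee_frame(position: int, width: int = 8) -> Frame:
    """Hypothetical 1-pixel-tall scene: background 0 (flowers, leaves), bee 9 at `position`."""
    frame = [0] * width
    frame[position] = 9
    return frame

frames = [bee_frame(p) for p in range(4)]
packets = encode(frames, keyframe_interval=3)
for kind, payload in packets:
    print(kind, payload)
# key [9, 0, 0, 0, 0, 0, 0, 0]
# delta [(0, 0), (1, 9)]
# delta [(1, 0), (2, 9)]
# key [0, 0, 0, 9, 0, 0, 0, 0]
assert decode(packets) == frames
```

Each delta stores only the two pixels the bee vacated and entered, while the periodic keyframe bounds how far an error can propagate before the decoder resynchronizes.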
659 00:34:08,210 --> 00:34:11,330 Just in case something goes wrong, we can guarantee ourselves 660 00:34:11,330 --> 00:34:14,929 that we can reconstruct what the video actually is, even if there's 661 00:34:14,929 --> 00:34:17,699 a little bit of a glitch otherwise. 662 00:34:17,699 --> 00:34:21,130 So what are the file formats that we have at our disposal? 663 00:34:21,130 --> 00:34:24,290 Well, in the video world, the terminology gets a little more complicated 664 00:34:24,290 --> 00:34:27,487 in that there are a number of different solutions 665 00:34:27,487 --> 00:34:28,820 to the problem of storing video. 666 00:34:28,820 --> 00:34:31,670 And indeed, these are what the world might call containers. 667 00:34:31,670 --> 00:34:33,810 And a container is just what the name implies: 668 00:34:33,810 --> 00:34:39,139 it's a digital container inside of which you can put multiple types of data. 669 00:34:39,139 --> 00:34:41,570 And the types of data you might put into a container 670 00:34:41,570 --> 00:34:44,389 would be a video track, like the actual footage 671 00:34:44,389 --> 00:34:46,639 that you see on the screen, or an audio track, 672 00:34:46,639 --> 00:34:49,730 which is the actual audio that you hear, and maybe a secondary audio track. 673 00:34:49,730 --> 00:34:53,070 If a film has been dubbed from one language into another, 674 00:34:53,070 --> 00:34:55,699 you might have multiple audio tracks in the same container. 675 00:34:55,699 --> 00:34:58,940 And then the software on your computer, or even on your TV for that matter, 676 00:34:58,940 --> 00:35:00,950 that's playing back this video can actually 677 00:35:00,950 --> 00:35:04,370 choose between those multiple audio tracks. 678 00:35:04,370 --> 00:35:07,430 You might have closed captions or some other track inside the container. 679 00:35:07,430 --> 00:35:10,090 So, long story short, a container really is just that.
680 00:35:10,090 --> 00:35:12,790 It's this bucket inside of which is the video and the audio, 681 00:35:12,790 --> 00:35:16,360 but maybe multiple formats thereof so that you can play them back 682 00:35:16,360 --> 00:35:18,040 based on your own preferences. 683 00:35:18,040 --> 00:35:21,910 So AVI is a very popular format that's been commonly used in the Windows world 684 00:35:21,910 --> 00:35:23,850 for years, as has DivX. 685 00:35:23,850 --> 00:35:27,910 MP4 and QuickTime have been more common on the Mac side, 686 00:35:27,910 --> 00:35:30,760 although MP4 is now pretty much universal across all browsers 687 00:35:30,760 --> 00:35:32,650 and operating systems and more. 688 00:35:32,650 --> 00:35:35,440 Matroska is more of an open-source container that's 689 00:35:35,440 --> 00:35:38,230 meant to be even more versatile than these others 690 00:35:38,230 --> 00:35:42,160 on this screen, capable of storing any number of file formats inside. 691 00:35:42,160 --> 00:35:46,330 And as to those formats inside, they might indeed be video. 692 00:35:46,330 --> 00:35:47,710 They might indeed be audio. 693 00:35:47,710 --> 00:35:49,570 But within those worlds, realize that there are 694 00:35:49,570 --> 00:35:53,380 different ways of storing and encoding information, 695 00:35:53,380 --> 00:35:57,190 and those innermost wrappers use what are called codecs, 696 00:35:57,190 --> 00:36:00,460 where a codec is just a way of encoding information in a video 697 00:36:00,460 --> 00:36:02,710 or in an audio file format. 698 00:36:02,710 --> 00:36:05,560 And there are any number of these options as well, but perhaps 699 00:36:05,560 --> 00:36:09,370 one of the most common these days is something called H.264 for video, which 700 00:36:09,370 --> 00:36:13,470 is a way of storing video on disk inside of a container, or MPEG-4 Part 2, 701 00:36:13,470 --> 00:36:14,740 with its somewhat more verbose name, 702 00:36:14,740 --> 00:36:16,540 a popular alternative there too.
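As a toy model only (not the actual layout of AVI, MP4, QuickTime, or Matroska), a container can be pictured as a bundle of tracks, each tagged with its kind, its codec, and perhaps a language, from which the player picks what it needs. The `Track` and `Container` classes and the `pick` method are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Track:
    kind: str                       # "video", "audio", or "captions"
    codec: str                      # how this track's data is encoded, e.g. "H.264" or "AAC"
    language: Optional[str] = None  # e.g. "en"; None when not applicable

@dataclass
class Container:
    tracks: List[Track] = field(default_factory=list)

    def pick(self, kind: str, preferred_language: Optional[str] = None) -> Optional[Track]:
        """Return a track of this kind, favoring the preferred language if one matches."""
        candidates = [t for t in self.tracks if t.kind == kind]
        for track in candidates:
            if track.language == preferred_language:
                return track
        return candidates[0] if candidates else None

movie = Container([
    Track("video", "H.264"),
    Track("audio", "AAC", "en"),
    Track("audio", "AAC", "fr"),     # a dubbed second audio track
    Track("captions", "text", "en"),
])
print(movie.pick("audio", "fr"))  # Track(kind='audio', codec='AAC', language='fr')
print(movie.pick("video"))        # Track(kind='video', codec='H.264', language=None)
```

The separation mirrors the distinction above: the container only bundles and labels the tracks, while each track's codec decides how its own bits are encoded.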
703 00:36:16,540 --> 00:36:19,510 And then, in the world of audio, there are two terms we've seen before, 704 00:36:19,510 --> 00:36:21,730 and this is where the world gets a little confusing: 705 00:36:21,730 --> 00:36:25,990 sometimes the container formats are the same as the actual media formats. 706 00:36:25,990 --> 00:36:29,980 In this case, AAC and MP3 can be standalone files 707 00:36:29,980 --> 00:36:33,400 that you download and listen to in iTunes or some other software, 708 00:36:33,400 --> 00:36:40,210 or they can be tracks inside of a container that actually provide a video 709 00:36:40,210 --> 00:36:43,330 with the audio that accompanies it. 710 00:36:43,330 --> 00:36:47,200 But there aren't just these two-dimensional file formats, if you will. 711 00:36:47,200 --> 00:36:51,070 There are, increasingly, three-dimensional or virtual formats 712 00:36:51,070 --> 00:36:55,480 as well that allow you to capture the entirety of spaces like this. 713 00:36:55,480 --> 00:36:58,960 In fact, this is a picture that is admittedly a little bit distorted, 714 00:36:58,960 --> 00:37:02,110 because if you look up and around in reality at this space, 715 00:37:02,110 --> 00:37:03,970 it doesn't look so wide and stretched out. 716 00:37:03,970 --> 00:37:06,220 And the stage definitely isn't curved like this, 717 00:37:06,220 --> 00:37:08,260 but essentially what you're looking at now 718 00:37:08,260 --> 00:37:12,400 is a 360-degree photograph of this exact stage.
719 00:37:12,400 --> 00:37:14,770 As for that image, even though it's effectively 720 00:37:14,770 --> 00:37:17,500 a sphere that captures the entirety of this space, 721 00:37:17,500 --> 00:37:20,890 it's essentially as though you've taken a sphere, cut it around the edges, 722 00:37:20,890 --> 00:37:24,880 and then flattened it out, much like flattening a globe of the Earth 723 00:37:24,880 --> 00:37:26,800 into a rectangular region, and what you get 724 00:37:26,800 --> 00:37:28,870 is something that's a little distorted. 725 00:37:28,870 --> 00:37:31,030 But if you kind of stare at this for just a moment 726 00:37:31,030 --> 00:37:34,060 and you imagine that the wooden stage here is really 727 00:37:34,060 --> 00:37:36,430 meant to be a straight line and all of these seats 728 00:37:36,430 --> 00:37:39,450 are supposed to be put together side by side, 729 00:37:39,450 --> 00:37:43,360 you can imagine re-forming a sphere out of this otherwise flat 730 00:37:43,360 --> 00:37:45,940 two-dimensional image and putting yourself inside of it 731 00:37:45,940 --> 00:37:48,195 and being able to experience a space like this. 732 00:37:48,195 --> 00:37:51,070 So, increasingly, some of these same file formats that we've discussed, 733 00:37:51,070 --> 00:37:53,350 among them JPEG for photographs, 734 00:37:53,350 --> 00:37:57,580 have the ability to inject what's called metadata, some additional, often 735 00:37:57,580 --> 00:38:01,240 textual data that the human looking at an image doesn't see.
736 00:38:01,240 --> 00:38:04,150 But programs like Photoshop and browsers and applications 737 00:38:04,150 --> 00:38:07,360 can actually read it and realize, oh, this image 738 00:38:07,360 --> 00:38:10,150 has not only a grid of pixels, compressed 739 00:38:10,150 --> 00:38:13,670 or otherwise, color or otherwise, that I can display to the user; 740 00:38:13,670 --> 00:38:16,390 it also has some additional metadata that tells me 741 00:38:16,390 --> 00:38:20,590 how to display this image in a way that's much more immersive, 742 00:38:20,590 --> 00:38:23,240 so that the image effectively wraps around the user. 743 00:38:23,240 --> 00:38:25,540 Now, the user might look a little silly doing so, 744 00:38:25,540 --> 00:38:29,760 but if he or she has a headset quite like this one here, 745 00:38:29,760 --> 00:38:32,590 he or she can take a look at this image, pull it up 746 00:38:32,590 --> 00:38:34,720 on the digital screen in front of him or her, and thanks 747 00:38:34,720 --> 00:38:40,900 to two small lenses, left eye and right eye, start to look up and down 748 00:38:40,900 --> 00:38:44,620 and left and right and all around him or her and actually see a space like this 749 00:38:44,620 --> 00:38:49,090 and experience it in 360-degree virtual reality. 750 00:38:49,090 --> 00:38:52,420 So this is just a taste, then, of the file formats that currently exist 751 00:38:52,420 --> 00:38:55,570 and that are on the horizon today, and who knows what more will exist. 752 00:38:55,570 --> 00:38:59,110 But at the end of the day, it all boils down to bits, to zeros and ones, 753 00:38:59,110 --> 00:39:02,500 how you arrange them on disk, and what features you provide to the users 754 00:39:02,500 --> 00:39:06,470 with which to capture their imagination. 755 00:39:06,470 --> 00:39:07,931