[MUSIC PLAYING]

SPEAKER 1: Cryptography. What is it, and why is it important? We're going to answer those two questions in exactly that order. Let's start with what cryptography is. It's the art and science of obscuring, and ideally protecting, information. It's a science because there's math involved. It's pretty straightforward to manipulate characters in some way, by adding some constant number to them or by changing them in some other systematic manner. But it's an art, because doing so in a way that defends against potential attacks is not as easy as it might first appear. There's a lot of guesswork and calculation that goes into finding a really strong cipher. Cryptography gives us a basic level of security against an adversary who might do bad things with our information. We usually contrast enciphered information with information that is presented in the clear, which is to say information with no protection surrounding it at all. And it's generally considered better to protect information using cryptography than to leave it freely available.
Now, to get to what a cipher is, we're going to start by talking about cryptography sort of through history. We'll lead up to more modern forms of cryptography, which are derived from more ancient ones. A cipher is one of the most fundamental forms of cryptography, and ciphers are algorithms. Recall that an algorithm is just a step-by-step set of instructions that we use to complete a task. In this case, the task is to obscure, or encipher, information. Ciphers can also be run in reverse to unobscure, or decipher, that same information once it has been enciphered. There are many different ciphers out there, with varying levels of security. Some of the more ancient ciphers that we're going to start with should be considered to have no real security at all, considering how easy they are to crack. But again, they lead into the more modern approach to cryptography, which is much more secure than these basic ones. So let's start by imagining that we have possession of this device.
If this device looks somewhat familiar to you, it may be because you've recently seen the movie A Christmas Story, in which the character Ralphie obtains one of these: a Little Orphan Annie's Secret Society decoder pin. This decoder pin has the numbers 1 through 26 running sequentially around the inner edge, and a set of letters, not presented in any particular order, around the outer edge. A radio announcer would tell listeners to set their pins to some combination, lining up one number with one letter, and would then read off a secret message as a series of numbers. Ostensibly, only people who possessed this pin (or one of the many duplicates distributed to children around the country) could decipher it, by transforming the numbers read over the radio back into letters so that the message made sense. If you zoom in on this image, it might be a little difficult to see, but you can see that the 3 corresponds to the letter L and the 4 corresponds to an M, based on this particular setting of the decoder pin.
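A pin-style letter-for-number lookup can be sketched in a few lines of Python. This is a minimal sketch, not a model of the real pin. The scrambled letter order comes from a seeded shuffle, and `make_pin_key`, `encode`, and `decode` are hypothetical names. The point it illustrates is that decoding requires nothing beyond holding the same key.

```python
import random
import string

def make_pin_key(seed: int) -> dict[str, int]:
    """Hypothetical stand-in for one pin setting: scrambled letters paired with 1..26."""
    letters = list(string.ascii_uppercase)
    random.Random(seed).shuffle(letters)  # scramble the outer ring of letters
    # Pair each scrambled letter with the sequential inner number it lines up with.
    return {letter: number for number, letter in enumerate(letters, start=1)}

def encode(message: str, key: dict[str, int]) -> list[int]:
    """Replace every letter with its number; spaces and punctuation are dropped."""
    return [key[ch] for ch in message.upper() if ch in key]

def decode(numbers: list[int], key: dict[str, int]) -> str:
    """Reverse the lookup; anyone holding the same key (pin) can do this."""
    inverse = {number: letter for letter, number in key.items()}
    return "".join(inverse[n] for n in numbers)

key = make_pin_key(seed=1940)
secret = encode("MEET AT NOON", key)
print(secret)               # a list of numbers between 1 and 26
print(decode(secret, key))  # MEETATNOON
```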
This is an example of what we would call a substitution cipher: we're substituting, in this case, a number for a letter, and that number will represent that letter for the rest of the message. But what is the problem with this cipher? More generally, when we think about issues in computer science where adversaries are trying to penetrate some system, whether that's breaking a code, breaking in, or hacking your password, we sometimes frame the question as: what is the attack vector? Where is the vulnerability in this particular cipher? In this case, it's that anybody who has access to this pin is able to break any message enciphered with it. And again, this pin was distributed pretty extensively in the 1930s and '40s to children who listened to this very popular radio program, so these pins were in the hands of many people, and anybody who had one would be able to understand the message. So that's how we might frame this attack vector: the key (in this case the pin, which we'll call a key for this purpose) is just very prevalent.
It's pretty well known how to use and manipulate this key, and a lot of people have access to it. But that's just one example of a substitution cipher; there are many others we could use. Let's take another very simple, straightforward one: imagine we take all of the letters of the alphabet and assign each letter's ordinal position as its cipher value. With the secret society pin, there was a sort of random element, right? Letters were skipped, and there wasn't any rhyme or reason to their order, although the numbers were sequential. Here, let's line up both: sequential letters mapped to sequential numbers, so A becomes 1, B becomes 2, and so on. Both increase linearly. Now, you may recall that as computer scientists we ordinarily start counting from zero rather than from one. I'm counting from one here because mapping A to 1 and Z to 26 is much more intuitive to us as humans, and I want to keep us grounded in this discussion of cryptography right now.
Ordinarily, though, you might instead see this as 0 to 25, with A being 0 through Z being 25, as opposed to 1 through 26. This cipher would work exactly the same way and has roughly the same security potential as Annie's secret society cipher. And we can actually make it a little better. Because the letters A through Z and the numbers 1 through 26 both increase consistently, we could, instead of using this direct mapping, rotate around: start the 1 somewhere other than A. Now, instead of having just one cipher where A maps to 1 and B maps to 2, we have a variety of different ciphers, depending on where we decide to put our starting point. For example, we might add two to every number, so that instead of going from 1 to 26, we go from 3 to 28. Now think about it. If you're trying to break this cipher and you see patterns like this, with all these numbers in them, what might jump out at you?
Well, suppose you're used to seeing ciphers that use 1 through 26. Suddenly you see no 1s or 2s, but you do see 27s and 28s in a message long enough to contain Ys or Zs. It might seem to you that something is slightly off, as if this cipher must be shifted in some way. Instead of being a straightforward line, it has had some modification made to it. That's a tip-off you'd rather not give if you're trying to keep somebody from figuring the cipher out. So instead of going 27, 28 at the end, we might wrap around the alphabet. Once we have exhausted the 26 possible values we started with, one for each letter of the alphabet, we say that since X is 26, Y is not 27 but 1, and Z is 2. This is not a massive improvement to the security of the cipher. Like I said, it's still quite fragile and quite easy to break. But it doesn't give a potential adversary quite as much of a clue as to how to crack it and decipher the message. And this can be done with any shift value to obtain a number of different ciphers.
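Here is a sketch of the shift-by-two example, including the wrap-around that turns Y into 1 and Z into 2. It keeps the transcript's 1-based numbering (A = 1 through Z = 26). The function name `caesar_numbers` is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase  # A..Z, 26 letters

def caesar_numbers(message: str, shift: int) -> list[int]:
    """Map each letter to its 1-based position plus `shift`, wrapping 27 -> 1, 28 -> 2, ..."""
    result = []
    for ch in message.upper():
        if ch in ALPHABET:
            position = ALPHABET.index(ch) + 1                 # A=1 ... Z=26
            result.append((position - 1 + shift) % 26 + 1)   # stay within 1..26
    return result

print(caesar_numbers("AXYZ", 2))   # [3, 26, 1, 2]
print(caesar_numbers("AXYZ", 20))  # same idea with a shift of 20
```

Doing the modulus on `position - 1` and adding 1 back afterward keeps the results in 1 through 26 instead of 0 through 25.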
Instead of going forward by two positions, we could add 20 to every letter's value. Again, we wrap around the alphabet: when we get past 26, instead of having 27, 28, we just reset at 1 and continue on. We could also add 26. But that doesn't look any different from what we had before, and that's where this cipher's vulnerability comes into play. There are only 26 possible ways to rotate the alphabet while keeping the order of the letters preserved, unless we start skipping (A, D, G, and so on) or rearranging the letters in some other way. If we want to keep everything in a straight line, wrapping around at 26 when necessary, there are only 26 ways to do it. That is to say, shifting the alphabet forward by 26 is exactly the same as shifting it forward by 0. And so that's our limitation: there is a very small number of keys (to use that word again) that can be used to encipher and decipher with this cipher. This is an example of something called a rotational cipher, and it's actually a rather famous one, known as the Caesar Cipher.
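Because a shift of 26 equals a shift of 0, an attacker can simply try every key. Below is a minimal sketch of that brute-force attack. Two choices here are assumptions, not the transcript's: letters are indexed from 0 internally, and non-letters pass through unchanged. `caesar` is a hypothetical helper name.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    """Rotate each letter forward by `shift`, wrapping around; other characters pass through."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

# Shifting by 26 is the same as shifting by 0, so only 26 distinct keys exist.
assert caesar("HELLO", 26) == caesar("HELLO", 0) == "HELLO"

# An attacker can therefore try every key and read the candidates.
ciphertext = caesar("ATTACK AT DAWN", 3)
for key in range(26):
    print(key, caesar(ciphertext, -key))  # the key-3 line reads ATTACK AT DAWN
```

Only one of the 26 printed lines reads as English, and that is all an attacker needs.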
It's attributed to Julius Caesar, who apparently used it more than two millennia ago to encode messages to his troops on the line. At the time, this was revolutionary. And what you'll generally find with cryptography is a pattern of breaking the mold, doing something new, and trying to stay one step ahead, until other people catch up. This cipher, once lauded as a wonderful cipher, is no longer as strong as it was once thought to be. So we keep having to advance and improve to stay ahead of whatever kind of adversary we face. That might be a potential enemy on the battle line, as with Julius Caesar, or a hacker trying to break into your system, as might be the case today. Fortunately, we're not using the Caesar Cipher today to encipher any of our information; we're using much more modern techniques. But those modern techniques evolved from watching codes and ciphers being created and broken, and then having to be created anew to defend against the vulnerabilities that had been exposed.
So, like I said, the Caesar Cipher is very easy to crack today, but at the time it was very, very difficult. Its limitation, again, is the limited number of keys: there are only 26 ways to rotate the alphabet for it to make sense. That's in the English alphabet, of course. If you're using a different alphabet with the same rotational approach, your number of keys might be different. But the fundamental limitation is that you are confined by how many letters are in the alphabet you're using to encipher information. So let's take things one step further. What improvement might we be able to make to Caesar? That leads us to the idea of the Vigenère Cipher. Caesar had the limitation that there's one key, and there are only 26 possible values for that key. Instead of using a single key, the Vigenère Cipher uses multiple keys. Instead of picking a number to shift by, we define a keyword.
We use the letters of that keyword in sequence as we go, changing what our key is at any given time. So instead of being enciphered using one key, the message might use three keys or five keys or 10 keys, depending on whether the keyword is three or five or 10 letters long. This keyword is the interesting twist that makes Vigenère much more challenging than Caesar for an adversary to crack, because it uses different keys. Let's walk through an example of how the Vigenère Cipher works, because I think it makes more sense to see it visually than to just discuss it verbally. What we want to do here is encrypt the message HELLO using the keyword LAW. Our message, HELLO, is what might also be called plain text: it is in the clear, not enciphered, and not hidden from any adversary. And our key is LAW. All right, let's take a look at how we might do this. When enciphering or deciphering with the Vigenère Cipher, it often helps to consider all of the inputs that go into determining each final output character.
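Written as a formula, the per-letter rule described next looks like the following, assuming the transcript's 1-based numbering. Here $p_i$ is the position (A = 1 through Z = 26) of the $i$-th plain-text letter, with $i$ counted from 0. $k_j$ is the position of the $j$-th keyword letter, and $n$ is the keyword length.

$$
c_i = \big( (p_i + k_{i \bmod n} - 1) \bmod 26 \big) + 1
$$

Subtracting 1 before the modulus and adding it back afterward keeps results in 1 through 26. For a sum between 27 and 52, this is the same as subtracting 26. For the third letter of HELLO with keyword LAW, the result is $((12 + 23 - 1) \bmod 26) + 1 = 8 + 1 = 9$, which is I.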
We take a letter of the plain text and convert it, just like we did with Caesar, to its ordinal position: where in the alphabet is it? If it's the first letter, it's 1; if it's the last letter, it's 26; and so on. We do the exact same thing with the corresponding letter of our keyword, figuring out its numerical correspondence. We then add those two numbers together. If we go over 26, just as we did with the Caesar Cipher, we wrap back around so that we stay confined to the range 1 through 26. Then we take that number and transform it back into a letter. For example, if the result is 2, we change it into a B, because B is the second letter of the alphabet. So let's walk through this with HELLO as our plain text and LAW as our key. The first letter of our plain text is H, and the ordinal position of H is 8: it is the eighth letter of the alphabet. We do the same thing with the first letter of LAW, which is L.
L is 12; it's the 12th letter of the alphabet. Our next step is to add those two values, 8 and 12, together. We get 20. We didn't go over 26, so we don't need to wrap around. The 20th letter of the alphabet is T. So the first step of enciphering HELLO with the Vigenère Cipher, using the key LAW, is to turn the H into a T. We can do this again with E, the second letter of our plain text, this time using the second letter of our keyword. So we're not using the same key, 12, over and over; we're now using A, the second letter of our keyword, whose ordinal position is 1. 5 plus 1 is 6, which results in F. Next, we use the first L of HELLO and the W of LAW. L is 12 and W is the 23rd letter of the alphabet. We add those together and we're at 35. But 35 is not a legal value for this cipher, since we are confined to 1 through 26. So we just subtract 26, which gets us down to 9, and now we have I. So now what do we do?
We've exhausted our keyword, but we still have plain text that we need to encipher. Well, as you might expect, the logical thing to do is go back to the beginning of the keyword and continue on. And so we will. We take the second L of our plain text and pair it with the L of LAW, because we've now used up all the letters of the keyword and have to go back to the beginning. 12 plus 12 is 24, and the 24th letter of the alphabet is X. Finally, we do the same for the O, advancing it one position because of the A in LAW, from 15 to 16, which is P. So ultimately, HELLO in this case becomes this seemingly random set of characters: TFIXP. Some advantages might immediately jump out at you. With the Caesar Cipher, any time we changed a letter, it became the same letter every time we saw it in the enciphered message. If we were advancing everything by two characters, every B in the original message would always be a D, because D comes two letters after B. So again, if our Caesar Cipher key is two, every B becomes a D and every A becomes a C, always.
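Here is the HELLO/LAW walkthrough as a short Python function. It follows the transcript's convention of numbering A as 1, so a keyword letter A shifts by one position. Many textbook descriptions instead number A as 0, so that A means no shift, and they would produce a different ciphertext for the same keyword. The function name `vigenere` and its `decrypt` flag are hypothetical. Decryption simply subtracts the key instead of adding it.

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Vigenère with 1-based positions (A=1 ... Z=26), as in the walkthrough.

    Non-letters are skipped, and the keyword restarts from its first letter
    whenever it runs out. The keyword is assumed to contain only letters.
    """
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    key = keyword.upper()
    out = []
    for i, ch in enumerate(letters):
        p = ALPHABET.index(ch) + 1                   # text position, 1..26
        k = ALPHABET.index(key[i % len(key)]) + 1    # keyword position, cycling
        total = p - k if decrypt else p + k
        c = (total - 1) % 26 + 1                     # wrap back into 1..26
        out.append(ALPHABET[c - 1])
    return "".join(out)

print(vigenere("HELLO", "LAW"))                # TFIXP
print(vigenere("TFIXP", "LAW", decrypt=True))  # HELLO
```

The two Ls of HELLO come out as I and X, which is exactly the property discussed next.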
Here, with the Vigenère Cipher, the keys change: how far we rotate depends on which letter of the keyword and which letter of the plain text we're on. So those two Ls are not the same, right? HELLO's two Ls don't map to one fixed letter; they become I and X, two different characters. So already we're seeing a bit more security here, because an adversary can no longer assume that each cipher letter always stands for the same plain-text letter. Vigenère is also much more secure when you consider how many keys are available. With the Caesar Cipher, we had 26 keys available to us. With the Vigenère Cipher, we have 26 to the n keys, where n is the length of our keyword. For example, a two-letter keyword (AA, AB, and so on, all the way up to ZZ) leaves us with 26 squared, or 676, possibilities. If we extend to three-letter or four-letter keywords, we get more and more possibilities. And as the number of possibilities grows, we really increase the difficulty for an adversary to figure out what the key is. And that's really the goal of cryptography, right?
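The 26-to-the-n count can be checked directly. This small sketch counts keywords of exactly length n; an adversary who doesn't know n faces even more candidates. `vigenere_keyspace` is a hypothetical helper name. A large count alone does not make a cipher safe, since, as noted here, programs can still crack Vigenère.

```python
def vigenere_keyspace(keyword_length: int, alphabet_size: int = 26) -> int:
    """Number of distinct keywords of the given length: alphabet_size ** keyword_length."""
    return alphabet_size ** keyword_length

for n in range(1, 6):
    print(n, vigenere_keyspace(n))
# 1 26         (a one-letter keyword is just a Caesar shift)
# 2 676
# 3 17576
# 4 456976
# 5 11881376
```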
We want to be able to protect information and defend it from being determined by other people. So the more work we put into making keys harder to guess, the more likely we are to succeed in enciphering information. So again, Vigenère is a much more secure cipher than Caesar. It's still not secure, though, and it's definitely not a cipher that is used today. There are computer programs that can crack the Vigenère Cipher pretty well. But it's more secure than Caesar for sure, because of its changing alphabets and its much larger number of keys. Let's go back to this decoder pin and think about another potential problem. Now assume that your adversary is actually not a member of Annie's secret society: they don't have this pin. That's already a step up. We previously assumed that anybody who had the pin could crack the code, and that's still true. But let's assume your adversary, lucky you, doesn't have the pin. Is there still a way they would be able to crack the code without it? Think about it for a second.
Think about what characteristics of the English language might help someone figure out this cipher. Think about some unique features of English, such as one-letter words like I and A, which might appear in the message. If you see a single-letter word in a message, you're probably going to guess that it's either I or A. If you guess I, you assume that every time you see that character or number it's an I. If you guess A, you try plugging in an A everywhere. Some trial and error might reveal patterns that emerge. And there is a very prevalent pattern in the English language: letters appear with a pretty regular frequency. Given any arbitrary text in English, the distribution of letters within that text is pretty likely to follow this pattern. Roughly 13% of the time, give or take, an arbitrary letter selected from a text will be an E. Only about 1/10 of 1% of the time will it be a Z, and only about 2/10 of 1% of the time a J. So some letters appear very frequently and other letters appear very infrequently.
And that is still a problem for this generic substitution cipher, even with the letters scrambled. At first blush, scrambling might seem much more secure than a cipher where the letters and numbers both increase sequentially. But suppose we're still confined to two domains, A through Z and 1 through 26, with a fixed mapping between them, ordered or not. Then even this scattershot mapping of letters to numbers is still vulnerable, in the English language anyway, because of frequency analysis. Puzzles like this are actually very common. Humans might find them kind of tedious to solve, but if you are the puzzling type, you may know this type of puzzle as a cryptogram. And this frequency pattern shows up across messages written in English. There are plenty of other ciphers in use that are more secure than any of these one-to-one ciphers, as we might call them, which map a single character to a different character or to a number.
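This sketch shows why scrambling doesn't help against frequency analysis. A one-to-one substitution relabels letters, but it cannot change how often each one occurs. The scrambled key below is a hypothetical one built from a seeded shuffle, and the sample sentence is a deliberately E-heavy toy. Real frequency analysis compares a longer ciphertext against typical English letter frequencies. `letter_frequencies` is a hypothetical helper name.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def letter_frequencies(text: str) -> dict[str, float]:
    """Fraction of the letters in `text` accounted for by each letter (non-letters ignored)."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    counts = Counter(letters)
    return {ch: counts[ch] / len(letters) for ch in sorted(counts)}

# A hypothetical scrambled one-to-one key, like the pin's jumbled letter order.
shuffled = list(ALPHABET)
random.Random(7).shuffle(shuffled)
key = dict(zip(ALPHABET, shuffled))

plaintext = "SEE THE TREE NEAR THE SEA"
ciphertext = "".join(key.get(ch, ch) for ch in plaintext)

# Sorted frequency profiles are identical: only the labels changed.
plain_profile = sorted(letter_frequencies(plaintext).values(), reverse=True)
cipher_profile = sorted(letter_frequencies(ciphertext).values(), reverse=True)
print(plain_profile == cipher_profile)  # True

# The most common cipher symbol is a strong candidate for E.
most_common_cipher = Counter(ch for ch in ciphertext if ch in ALPHABET).most_common(1)[0][0]
print(most_common_cipher == key["E"])  # True
```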
394 00:20:32,130 --> 00:20:35,010 There are some ciphers that substitute pairs or triples of characters 395 00:20:35,010 --> 00:20:35,510 at a time. 396 00:20:35,510 --> 00:20:37,950 And these ciphers, again, form the basis for what 397 00:20:37,950 --> 00:20:40,470 eventually becomes more modern cryptography, which 398 00:20:40,470 --> 00:20:42,060 we're getting to in just a moment. 399 00:20:42,060 --> 00:20:43,920 There are also transposition ciphers, where 400 00:20:43,920 --> 00:20:46,470 instead of substituting one character for another, 401 00:20:46,470 --> 00:20:51,500 we simply use an algorithm to rearrange all the letters in some systematic way. 402 00:20:51,500 --> 00:20:55,980 And the defect there is that all the letters of our original plain text 403 00:20:55,980 --> 00:21:00,000 message are still there and all we need to do is unscramble them. 404 00:21:00,000 --> 00:21:01,740 And because there's an algorithm that was 405 00:21:01,740 --> 00:21:05,640 used to scramble them in the first place, 406 00:21:05,640 --> 00:21:07,950 there's got to be a way to undo it as well. 407 00:21:07,950 --> 00:21:12,560 With a little bit of trial and error, we can probably sort that out. 408 00:21:12,560 --> 00:21:17,450 Finally, the most egregious issue with these classical ciphers 409 00:21:17,450 --> 00:21:20,450 is, how do you distribute the key? 410 00:21:20,450 --> 00:21:25,750 How do you tell someone who you want to share information with? 411 00:21:25,750 --> 00:21:30,530 How do you tell your ally what the key is for the cipher 412 00:21:30,530 --> 00:21:32,850 that you are going to use? 413 00:21:32,850 --> 00:21:36,683 You can't encrypt it because if you encrypt the key, 414 00:21:36,683 --> 00:21:38,350 how will they know what the real key is? 
415 00:21:38,350 --> 00:21:40,630 If you send them a message and they 416 00:21:40,630 --> 00:21:43,600 don't know how to interpret it, or they see it and they interpret it 417 00:21:43,600 --> 00:21:46,690 as something else, that's not going to be helpful to you. 418 00:21:46,690 --> 00:21:51,110 You want them to see the key in the plain text, 419 00:21:51,110 --> 00:21:54,130 or, rather, in the clear. 420 00:21:54,130 --> 00:21:55,960 You want them to just have it. 421 00:21:55,960 --> 00:21:59,500 You don't want to encrypt it as you hand it to them. 422 00:21:59,500 --> 00:22:01,870 That doesn't do them any good. 423 00:22:01,870 --> 00:22:05,320 But if you're giving the key to your ally 424 00:22:05,320 --> 00:22:07,930 and your adversary is within earshot, or they 425 00:22:07,930 --> 00:22:11,560 have access to that same piece of paper because your ally carelessly throws it 426 00:22:11,560 --> 00:22:14,260 away and they can just pick it up, now all 427 00:22:14,260 --> 00:22:20,670 of a sudden all of your messages using basic ciphers are fairly insecure. 428 00:22:20,670 --> 00:22:25,110 But let's take a step forward into modern cryptography. 429 00:22:25,110 --> 00:22:28,410 Perhaps you've seen a screen that looks like this at some point 430 00:22:28,410 --> 00:22:32,100 when you're trying to log in to some system. 431 00:22:32,100 --> 00:22:36,910 "Enter your email and we'll email you a link to change your password." 432 00:22:36,910 --> 00:22:39,668 Well, why don't you just email me my password? 433 00:22:39,668 --> 00:22:41,710 You're going to give me a link to change it, 434 00:22:41,710 --> 00:22:45,055 but you must know it if I use my credentials to log in 435 00:22:45,055 --> 00:22:47,170 to your service on any given day. 436 00:22:47,170 --> 00:22:50,410 But OK, I guess, sure. 437 00:22:50,410 --> 00:22:54,940 The reason for this is actually security.
438 00:22:54,940 --> 00:22:59,650 So let's distinguish ciphers, which we've been talking about, from hashes. 439 00:22:59,650 --> 00:23:03,370 One of the most critical distinctions is 440 00:23:03,370 --> 00:23:07,130 that ciphers are generally reversible. 441 00:23:07,130 --> 00:23:09,177 You can undo what you did. 442 00:23:09,177 --> 00:23:12,010 That's the whole reason why it's important to share the key 443 00:23:12,010 --> 00:23:13,330 with your allies. 444 00:23:13,330 --> 00:23:17,050 But hashes are generally not reversible. 445 00:23:17,050 --> 00:23:20,210 Or certainly, they're not supposed to be reversible. 446 00:23:20,210 --> 00:23:23,140 And so it turns out, and we'll learn about this a little bit 447 00:23:23,140 --> 00:23:28,150 later, that when you log in to some service, if that service 448 00:23:28,150 --> 00:23:31,480 is doing a good job of protecting your data, 449 00:23:31,480 --> 00:23:35,350 the reason they can't just send you your password is because they actually 450 00:23:35,350 --> 00:23:36,800 don't know your password. 451 00:23:36,800 --> 00:23:39,820 And that might seem strange because clearly, there 452 00:23:39,820 --> 00:23:44,470 must be something going on: if I type in my password, then I get logged in. 453 00:23:44,470 --> 00:23:49,395 But a good service is one that does not store your password in its database. 454 00:23:49,395 --> 00:23:51,520 That's probably a good thing if you think about it. 455 00:23:51,520 --> 00:23:53,228 In case there was ever a data breach, you 456 00:23:53,228 --> 00:23:57,070 wouldn't want your password to be in their database. 457 00:23:57,070 --> 00:24:02,410 Instead, what they do is store a hash of your password in the database.
458 00:24:02,410 --> 00:24:06,250 And then when you provide your password to them, 459 00:24:06,250 --> 00:24:08,260 they run it through the same thing, 460 00:24:08,260 --> 00:24:12,040 called a hash function, which is just a generic idea for a function that 461 00:24:12,040 --> 00:24:18,310 takes any arbitrarily large amount of data and maps it to some other range 462 00:24:18,310 --> 00:24:20,080 or some other set of values. 463 00:24:20,080 --> 00:24:25,700 Now that might be an arbitrarily long string of information. 464 00:24:25,700 --> 00:24:30,340 It might be some fixed-length string where if I run my password through this, 465 00:24:30,340 --> 00:24:33,400 I'm going to get back something that is always 20 characters long. 466 00:24:33,400 --> 00:24:35,440 But it looks nothing like my original password. 467 00:24:35,440 --> 00:24:38,550 I've just made some weird manipulations to it. 468 00:24:38,550 --> 00:24:41,650 And that's what happens in log-in systems more generally: 469 00:24:41,650 --> 00:24:43,900 you log in to some service, you 470 00:24:43,900 --> 00:24:47,980 type in your password, and when that information is then submitted 471 00:24:47,980 --> 00:24:51,100 to the organization to check your log-in credentials, 472 00:24:51,100 --> 00:24:54,490 they run your password through that same hash function again. 473 00:24:54,490 --> 00:25:00,670 And if that value matches what they have in their database for you, 474 00:25:00,670 --> 00:25:04,540 that is how they know that you have provided the correct credentials. 475 00:25:04,540 --> 00:25:10,060 They're matching some mapping of your password to the one 476 00:25:10,060 --> 00:25:13,930 that they have stored, but they're not actually checking your actual password. 477 00:25:13,930 --> 00:25:16,700 And that should probably give you some sense of security.
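The store-a-hash, then re-hash-and-compare flow described above might look like the following sketch. The lecture only says "a hash"; this example swaps in PBKDF2 from Python's standard `hashlib` with a random per-user salt, which is a common real-world choice, and the iteration count of 200,000 is an illustrative assumption. For a fixed password and salt, the result is deterministic, which is what makes the comparison work.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest). The service stores both, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return salt, digest


def check_password(attempt, salt, stored_digest):
    """Hash the attempt the same way and compare it with what was stored."""
    _, digest = hash_password(attempt, salt)
    return hmac.compare_digest(digest, stored_digest)


# At sign-up: only the salt and digest go into the (hypothetical) database.
salt, stored = hash_password("correct horse battery staple")

# At log-in: the submitted password is hashed again and compared.
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("Correct horse battery staple", salt, stored))  # False
```

Because only `salt` and `stored` are kept, the service genuinely cannot email you your password; it can only tell whether a new attempt hashes to the same value.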
478 00:25:16,700 --> 00:25:20,890 And if you ever use a service where you end up having to click on that link 479 00:25:20,890 --> 00:25:24,045 and they actually send you your password, 480 00:25:24,045 --> 00:25:26,170 you probably don't want to use that service anymore 481 00:25:26,170 --> 00:25:33,100 because they're not taking strong enough precautions to protect your data. 482 00:25:33,100 --> 00:25:36,010 So as I said, once we have a password stored in the database, 483 00:25:36,010 --> 00:25:40,600 it is actually stored as a hash rather than as the password itself. 484 00:25:40,600 --> 00:25:46,720 The service should not be able to tell you what your password really is. 485 00:25:46,720 --> 00:25:48,670 So this idea of a hash function: what is it? 486 00:25:48,670 --> 00:25:52,420 Well, as I said, it's something that takes any arbitrary data 487 00:25:52,420 --> 00:25:56,140 (and eventually we'll get into hashing things like files and not just words 488 00:25:56,140 --> 00:25:58,690 or strings, but for now let's keep it to strings, strings 489 00:25:58,690 --> 00:26:00,940 being a sequence of characters or letters, like a word 490 00:26:00,940 --> 00:26:02,980 or a phrase or a sentence) 491 00:26:02,980 --> 00:26:05,630 and maps it to some other range. 492 00:26:05,630 --> 00:26:10,900 So we'll start out by just mapping a string, a set of letters, to a number. 493 00:26:10,900 --> 00:26:13,540 But it could be to a different string, a string that's 494 00:26:13,540 --> 00:26:15,950 always 10 characters long, and so on. 495 00:26:15,950 --> 00:26:19,060 So there are some properties that good hash functions have. 496 00:26:19,060 --> 00:26:21,100 Let's take a look at what some of these are. 497 00:26:21,100 --> 00:26:23,980 First, they should use only the data being hashed. 498 00:26:23,980 --> 00:26:26,230 There shouldn't be anything else that comes into play. 499 00:26:26,230 --> 00:26:28,480 They shouldn't be bringing in any outside information.
500 00:26:28,480 --> 00:26:31,030 It should rely exclusively on whatever data is 501 00:26:31,030 --> 00:26:34,690 being passed in to the hash function. 502 00:26:34,690 --> 00:26:36,970 They should also use all of the data being hashed. 503 00:26:36,970 --> 00:26:42,700 It becomes a bit less effective if every time I provide a word or a string 504 00:26:42,700 --> 00:26:48,400 to my hash function, I'm only using the first letter of that string, 505 00:26:48,400 --> 00:26:50,590 such that my hash function for every word 506 00:26:50,590 --> 00:26:52,900 or every string I provide that starts with A 507 00:26:52,900 --> 00:26:55,180 is going to return the same value. 508 00:26:55,180 --> 00:26:57,160 That's not terribly useful to me. 509 00:26:57,160 --> 00:27:00,370 I want to get a better distribution of values. 510 00:27:00,370 --> 00:27:02,590 Your hash function should be deterministic. 511 00:27:02,590 --> 00:27:06,640 And when we say deterministic, we mean no random elements to it. 512 00:27:06,640 --> 00:27:10,150 Oftentimes we think that random numbers are nice to jumble things up. 513 00:27:10,150 --> 00:27:12,820 But the problem is we want our hash function to always output 514 00:27:12,820 --> 00:27:16,580 the same value for the same inputs. 515 00:27:16,580 --> 00:27:19,120 So if I give you my password and hash it and I get 516 00:27:19,120 --> 00:27:21,498 some output, every time I provide my password 517 00:27:21,498 --> 00:27:23,290 and run it through that same hash function, 518 00:27:23,290 --> 00:27:25,780 I want to get the same output every time. 519 00:27:25,780 --> 00:27:30,010 And that's what sites rely on when they're using hashed passwords as part 520 00:27:30,010 --> 00:27:31,612 of the credentialing check. 521 00:27:31,612 --> 00:27:33,820 They're relying on the fact that they will always get 522 00:27:33,820 --> 00:27:36,340 the same output given the same input. 523 00:27:36,340 --> 00:27:38,470 So that's a requirement of a hash function. 
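Here is a small sketch contrasting a hash that looks only at the first letter with one that uses every character, in order, and nothing else. Both functions are hypothetical illustrations; the multiplier 31 and the 101 buckets in `all_letters_hash` are arbitrary choices, not a standard algorithm.

```python
def first_letter_hash(word):
    """A poor hash: it ignores everything except the first letter (A=1 ... Z=26)."""
    return ord(word[0].upper()) - ord("A") + 1


def all_letters_hash(word, buckets=101):
    """Uses every character, in order, and nothing but the input itself."""
    value = 0
    for ch in word.upper():
        value = (value * 31 + ord(ch)) % buckets
    return value


print(first_letter_hash("APPLE"), first_letter_hash("AXE"))  # 1 1
print(all_letters_hash("APPLE"), all_letters_hash("AXE"))    # 23 16
# Deterministic: the same input gives the same output on every call.
print(all_letters_hash("APPLE") == all_letters_hash("APPLE"))  # True
```

This sketch deliberately avoids Python's built-in `hash()` for strings: by default it is randomized per process, so the same word can hash differently from one run to the next, which would break a stored-password check.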
524 00:27:38,470 --> 00:27:42,640 Hash functions should uniformly distribute data. 525 00:27:42,640 --> 00:27:45,770 So oftentimes you're mapping these strings, 526 00:27:45,770 --> 00:27:47,560 let's say, to some set of values. 527 00:27:47,560 --> 00:27:49,810 Those could be numbers or, again, strings. 528 00:27:49,810 --> 00:27:52,660 You want to spread those out evenly, ideally, 529 00:27:52,660 --> 00:27:55,630 across all of the possible values that you have. 530 00:27:55,630 --> 00:28:00,580 You don't want everything to hash to 15 if your range is 0 to 100. 531 00:28:00,580 --> 00:28:02,890 You'd ideally like everything to be spread out such 532 00:28:02,890 --> 00:28:05,710 that there's an equal number of 0s, 1s, 99s, 533 00:28:05,710 --> 00:28:10,970 and so on, as we talked about a little bit when we discussed hash tables. 534 00:28:10,970 --> 00:28:14,990 Finally, we also want to be able to generate very different hash codes, 535 00:28:14,990 --> 00:28:18,140 very different values for very similar data. 536 00:28:18,140 --> 00:28:23,150 For example, LAW and LAWS should hash to two very different values. 537 00:28:23,150 --> 00:28:26,510 It would be ideal if a tiny bit of variation 538 00:28:26,510 --> 00:28:29,480 created a really dramatic ripple effect.
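The ripple effect can be observed directly with SHA-256 (one of the SHA-2 functions), available in Python's standard `hashlib`. This sketch hashes LAW and LAWS and counts how many of the 256 output bits differ; for a well-designed function you would expect roughly half to flip, though the exact count isn't guaranteed, so no output is shown here.

```python
import hashlib


def differing_bits(a, b):
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


law = hashlib.sha256(b"LAW").digest()
laws = hashlib.sha256(b"LAWS").digest()

print(law.hex())
print(laws.hex())
print(differing_bits(law, laws), "of", len(law) * 8, "bits differ")
```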
539 00:28:29,480 --> 00:28:31,970 And creating this really dramatic ripple effect 540 00:28:31,970 --> 00:28:35,780 is pretty key when we're talking about cryptographic hash functions, which 541 00:28:35,780 --> 00:28:38,540 we'll get to in a second. They form the basis of almost 542 00:28:38,540 --> 00:28:41,030 all modern cryptography, and of nearly everything 543 00:28:41,030 --> 00:28:47,270 we rely on when we think of security in the computational field. 544 00:28:47,270 --> 00:28:50,780 It's almost always relying on these hash functions being really, really 545 00:28:50,780 --> 00:28:55,400 good at making small changes have very dramatic ripple effects 546 00:28:55,400 --> 00:28:59,480 in the hash code or the hash value, the data that comes out 547 00:28:59,480 --> 00:29:02,080 of the hash function. 548 00:29:02,080 --> 00:29:04,880 So after all this talk about good hash functions, 549 00:29:04,880 --> 00:29:07,098 let's take a look at a pretty bad hash function. 550 00:29:07,098 --> 00:29:08,140 And we'll talk about why. 551 00:29:08,140 --> 00:29:12,850 We'll talk about one of its virtues, but some of its potential problems as well. 552 00:29:12,850 --> 00:29:15,447 So let's add up all of the ordinal positions 553 00:29:15,447 --> 00:29:17,030 of all the letters in the string being hashed. 554 00:29:17,030 --> 00:29:19,072 This ordinal position idea is exactly the same 555 00:29:19,072 --> 00:29:22,340 as we had a moment ago when we were talking about Caesar and Vigenere. 556 00:29:22,340 --> 00:29:26,320 So A is 1, B is 2, and so on. 557 00:29:26,320 --> 00:29:30,220 So for example, for a word like STAR, if we want to add up the ordinal positions 558 00:29:30,220 --> 00:29:34,510 of all of the letters in that word, we have S-T-A-R. 559 00:29:34,510 --> 00:29:38,960 That's 19 plus 20 plus 1 plus 18. 560 00:29:38,960 --> 00:29:42,490 If you do that math quickly, that ends up being 58.
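The "pretty bad" hash just described, adding up ordinal positions with A=1 through Z=26, fits in one line of Python. Skipping non-letters is an assumption made here so that spaces or punctuation don't break it.

```python
def ordinal_sum(word):
    """The lecture's 'pretty bad' hash: add up letter positions, A=1 through Z=26."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper() if "A" <= ch <= "Z")


print(ordinal_sum("STAR"))  # 58, that is, 19 + 20 + 1 + 18
```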
561 00:29:42,490 --> 00:29:45,550 So what is a good thing about this hash function? 562 00:29:45,550 --> 00:29:47,320 Well, it's not reversible. 563 00:29:47,320 --> 00:29:53,170 If I get a 58, I don't necessarily know that the input that I had there 564 00:29:53,170 --> 00:29:54,730 was STAR. 565 00:29:54,730 --> 00:29:57,300 It could have been any one of a whole variety of things. 566 00:29:57,300 --> 00:30:01,450 It could have been ARTS or RATS or TSAR or TARS 567 00:30:01,450 --> 00:30:06,460 or MULL or this whole random set of 29 Bs in a row. 568 00:30:06,460 --> 00:30:10,030 All of these things, when run through this really terrible hash 569 00:30:10,030 --> 00:30:13,840 function that I've defined here, all add up to 58 570 00:30:13,840 --> 00:30:16,150 when I follow the rules of this algorithm. 571 00:30:16,150 --> 00:30:20,090 So I never know what my input was given my output. 572 00:30:20,090 --> 00:30:21,130 That is a good thing. 573 00:30:21,130 --> 00:30:22,755 That is what a hash function should do. 574 00:30:22,755 --> 00:30:28,070 Hash functions, unlike ciphers, should not be reversible. 575 00:30:28,070 --> 00:30:33,200 But the problem that I have here is that I have a lot of collisions, right? 576 00:30:33,200 --> 00:30:37,960 There are a lot of different things that map to 58. 577 00:30:37,960 --> 00:30:40,880 And when we talked about collisions a little bit previously, 578 00:30:40,880 --> 00:30:43,580 we were talking about them in the context of a hash table. 579 00:30:43,580 --> 00:30:46,520 And collisions were OK in that context. 580 00:30:46,520 --> 00:30:48,860 We were just clustering things together. 581 00:30:48,860 --> 00:30:50,920 If they all happened to have the same hash value, 582 00:30:50,920 --> 00:30:52,625 we'd just put them in the same bucket.
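Grouping a few inputs by their ordinal-sum value, the way a hash table would put them into buckets, makes both points concrete: many inputs collide on 58, and given only the 58 there is no way to tell which input produced it. The candidate list is an illustrative choice.

```python
from collections import defaultdict


def ordinal_sum(word):
    """Same bad hash as before: A=1, B=2, ..., Z=26, added together."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper() if "A" <= ch <= "Z")


candidates = ["STAR", "ARTS", "RATS", "TSAR", "MULL", "B" * 29, "LAW", "LAWS"]
buckets = defaultdict(list)
for word in candidates:
    buckets[ordinal_sum(word)].append(word)

print(len(buckets[58]))  # 6 different inputs share one output
print(buckets[58][:5])   # ['STAR', 'ARTS', 'RATS', 'TSAR', 'MULL']
```

In a hash table those six words would simply share a bucket; for a cryptographic hash, the same situation would be a serious flaw.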
583 00:30:52,625 --> 00:30:54,500 When we're talking about cryptography though, 584 00:30:54,500 --> 00:30:59,870 when we start to get into relying on cryptography to keep our data secure, 585 00:30:59,870 --> 00:31:03,360 we can't afford collisions at all. 586 00:31:03,360 --> 00:31:07,400 In fact, we rely on the fact that it is so mathematically 587 00:31:07,400 --> 00:31:12,740 unlikely, nigh impossible, to have a collision in order for these things 588 00:31:12,740 --> 00:31:13,250 to work. 589 00:31:13,250 --> 00:31:16,760 And so collisions, when we're talking about cryptographic hash functions, 590 00:31:16,760 --> 00:31:19,630 are definitely not a good thing. 591 00:31:19,630 --> 00:31:23,850 So to recap, to check that a user gave us the correct password, if we're 592 00:31:23,850 --> 00:31:27,330 storing a hash of the password in the database rather than just storing 593 00:31:27,330 --> 00:31:30,870 the plain text password in the database (and hopefully no one is storing 594 00:31:30,870 --> 00:31:33,300 plain text passwords in a database), we 595 00:31:33,300 --> 00:31:37,530 run the password the user entered through the hash function. 596 00:31:37,530 --> 00:31:40,710 We get a hash value as an output, some string or some number 597 00:31:40,710 --> 00:31:42,670 or what have you. 598 00:31:42,670 --> 00:31:46,170 And if it matches the stored hash, odds are they entered the right password. 599 00:31:46,170 --> 00:31:51,510 Now I'm saying odds are because we can't be 100% sure. 600 00:31:51,510 --> 00:31:54,300 And we can never be 100% sure. 601 00:31:54,300 --> 00:31:57,420 We can be really, really, really sure, but there's always 602 00:31:57,420 --> 00:31:59,130 a chance of a collision. 603 00:31:59,130 --> 00:32:02,670 Even with the best designed hash functions, even 604 00:32:02,670 --> 00:32:05,125 with the best designed cryptographic hash functions, 605 00:32:05,125 --> 00:32:06,750 there's always a chance of a collision.
606 00:32:06,750 --> 00:32:09,780 But ideally, that chance is infinitesimally small. 607 00:32:09,780 --> 00:32:12,390 Very, very, very, very, very, very unlikely. 608 00:32:12,390 --> 00:32:16,860 So if the hash that comes out of this hash function matches, 609 00:32:16,860 --> 00:32:21,120 it's quite likely, like 99.9% plus likely, 610 00:32:21,120 --> 00:32:23,610 that they entered the correct password, that this is, in fact, 611 00:32:23,610 --> 00:32:29,220 the user whose credentials are being verified, and we should log them in. 612 00:32:29,220 --> 00:32:32,113 Much of modern cryptography is built on hashing. 613 00:32:32,113 --> 00:32:35,280 It's hashing that's quite a bit more clever, certainly, than the example 614 00:32:35,280 --> 00:32:37,890 that I just talked about a moment ago. 615 00:32:37,890 --> 00:32:42,235 Also, these algorithms tend not to work on a character-by-character basis. 616 00:32:42,235 --> 00:32:44,235 That's what the algorithm I just showed did, where 617 00:32:44,235 --> 00:32:45,390 I was adding up every single letter. 618 00:32:45,390 --> 00:32:47,220 I was looking at each one individually. 619 00:32:47,220 --> 00:32:49,800 These modern algorithms 620 00:32:49,800 --> 00:32:53,700 tend to take clusters of letters, pairs or triples or so on at a time, 621 00:32:53,700 --> 00:32:55,215 and maybe do even more things. 622 00:32:55,215 --> 00:32:57,840 They might rearrange the letters before they do things to them. 623 00:32:57,840 --> 00:33:02,490 So there are multiple layers going on with these algorithms.
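Python's `hashlib` gives a glimpse of the "clusters, not single characters" idea. SHA-256 internally processes its input in 64-byte blocks (exposed as `block_size`), and `update()` lets you feed data in arbitrary pieces and still get exactly the same digest as hashing it all at once. The sample data and the 4 KiB chunk size are arbitrary choices for this sketch.

```python
import hashlib

data = b"Any data at all: a word, a PDF, a video... " * 10_000

# Hash everything at once.
one_shot = hashlib.sha256(data).hexdigest()

# Or feed the same bytes in 4 KiB pieces, as you would when reading a large file.
incremental = hashlib.sha256()
for start in range(0, len(data), 4096):
    incremental.update(data[start:start + 4096])

print(hashlib.sha256().block_size)          # 64: SHA-256 processes 64-byte blocks
print(one_shot == incremental.hexdigest())  # True
```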
624 00:33:02,490 --> 00:33:05,530 And unlike some of the ones I've discussed earlier, 625 00:33:05,530 --> 00:33:10,050 most of these also have the property that given data of arbitrary size 626 00:33:10,050 --> 00:33:13,620 (and now we're starting to really expand our minds into not just words 627 00:33:13,620 --> 00:33:18,720 or strings, but also images, files, videos, documents, PDFs, 628 00:33:18,720 --> 00:33:23,100 and so on; anything can be run through a hash function to get a value), 629 00:33:23,100 --> 00:33:26,550 we're always going to get a string of bits, a bit string, that 630 00:33:26,550 --> 00:33:28,472 is always exactly the same size. 631 00:33:28,472 --> 00:33:30,180 So depending on the algorithm, maybe it's 632 00:33:30,180 --> 00:33:36,780 going to be a 160-bit long string, or a 256-bit long string. 633 00:33:36,780 --> 00:33:38,790 But our range is finite. 634 00:33:38,790 --> 00:33:42,480 For a 256-bit algorithm, it's always going to be exactly 256 bits. 635 00:33:42,480 --> 00:33:45,210 But the combination of those bits will be different, ideally, 636 00:33:45,210 --> 00:33:50,040 for every single piece of data we might throw at it, no matter what. 637 00:33:50,040 --> 00:33:53,970 OK, so let's expand our definition of a hash function 638 00:33:53,970 --> 00:33:57,180 to this idea of a cryptographic hash function. 639 00:33:57,180 --> 00:34:00,390 What properties should they have? 640 00:34:00,390 --> 00:34:05,880 They should be very difficult, very, very difficult, basically impossible 641 00:34:05,880 --> 00:34:06,930 to reverse. 642 00:34:06,930 --> 00:34:12,000 It should be computationally infeasible for anybody to undo the hash. 643 00:34:12,000 --> 00:34:15,060 That's pretty much the same as a regular hash function. 644 00:34:15,060 --> 00:34:17,940 We're just really hammering the point home when we say this here. 645 00:34:17,940 --> 00:34:19,560 They should still be deterministic.
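The fixed-size property is easy to check with Python's `hashlib`: hash an empty input, a short word, and about a megabyte of data, and the digest length never changes for a given algorithm. The algorithm list is a sample of what `hashlib` provides, including SHA-1, several SHA-2 sizes, and one SHA-3 variant.

```python
import hashlib

inputs = [b"", b"STAR", b"x" * 1_000_000]  # empty, short, and about 1 MB of data

for name in ["sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256"]:
    # Digest length in bits for each input; a set collapses duplicates.
    sizes = {len(hashlib.new(name, data).digest()) * 8 for data in inputs}
    print(name, sizes)
# sha1 {160}
# sha224 {224}
# sha256 {256}
# sha384 {384}
# sha512 {512}
# sha3_256 {256}
```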
646 00:34:19,560 --> 00:34:22,070 We don't want any random elements to it. 647 00:34:22,070 --> 00:34:23,940 We still want to hash a value and always 648 00:34:23,940 --> 00:34:28,469 get the same output no matter what if we run that same value through the hash 649 00:34:28,469 --> 00:34:31,949 function an arbitrary number of times. 650 00:34:31,949 --> 00:34:34,739 They should still generate very different hash codes 651 00:34:34,739 --> 00:34:36,239 for very similar data. 652 00:34:36,239 --> 00:34:38,370 We still want things to be spread out and we want 653 00:34:38,370 --> 00:34:42,020 minor changes to have dramatic effect. 654 00:34:42,020 --> 00:34:43,750 And they should never 655 00:34:43,750 --> 00:34:46,440 (and this is one of those words that computer scientists love) 656 00:34:46,440 --> 00:34:52,620 they should never allow two different sets of data to hash to the same value. 657 00:34:52,620 --> 00:34:56,830 Do you see a potential problem when we frame it in this way? 658 00:34:56,830 --> 00:35:00,150 When we say they should never be able to do that? 659 00:35:00,150 --> 00:35:02,910 We've already restricted ourselves to a finite range, right? 660 00:35:02,910 --> 00:35:10,530 I said a moment ago, maybe this hash function maps to 160-bit long strings. 661 00:35:10,530 --> 00:35:14,760 There are only so many combinations of 160 bits. 662 00:35:14,760 --> 00:35:18,960 Now that might be an unfathomably large number, but using the word never 663 00:35:18,960 --> 00:35:21,060 there becomes a bit dangerous. 664 00:35:21,060 --> 00:35:24,000 We can't really rely on that. 665 00:35:24,000 --> 00:35:26,940 And we'll see why this could potentially be a problem. 666 00:35:26,940 --> 00:35:31,088 This fixed-length string, by the way, is usually referred to as a digest 667 00:35:31,088 --> 00:35:31,755 in this context.
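Why "never" is dangerous can be shown with a toy: keep only the first 16 bits of SHA-1's 160-bit digest, so there are just 2^16 possible outputs. By the pigeonhole principle, 2^16 + 1 distinct inputs cannot all hash differently, and in practice a repeat usually appears after only a few hundred tries. The 16-bit truncation is an assumption made purely for illustration; brute force like this is hopeless against the full 160 bits, and it is not the technique used against real SHA-1.

```python
import hashlib


def tiny_hash(data, bits=16):
    """Keep only the first `bits` bits of SHA-1's 160-bit digest: a deliberately tiny range."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - bits)


def find_collision(bits=16):
    """Hash distinct inputs until two land on the same value."""
    seen = {}
    # 2**bits + 1 distinct inputs cannot all get different values (pigeonhole principle).
    for i in range(2 ** bits + 1):
        data = str(i).encode()
        value = tiny_hash(data, bits)
        if value in seen:
            return seen[value], data
        seen[value] = data
    raise AssertionError("unreachable: the pigeonhole principle guarantees a repeat")


a, b = find_collision()
print(a, b, tiny_hash(a) == tiny_hash(b))
```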
668 00:35:31,755 --> 00:35:34,650 When we start to talk about more modern cryptography techniques, 669 00:35:34,650 --> 00:35:36,720 the output of a cryptographic hash function 670 00:35:36,720 --> 00:35:40,365 is usually referred to as a digest. 671 00:35:40,365 --> 00:35:42,990 Let's take a look at one of these cryptographic hash functions. 672 00:35:42,990 --> 00:35:45,420 And certainly I'm not going to dive into the mathematics of it. 673 00:35:45,420 --> 00:35:46,800 I wouldn't be able to do it justice 674 00:35:46,800 --> 00:35:49,440 if I tried to explain the mathematics of it. 675 00:35:49,440 --> 00:35:52,080 But let's just take a look at some of the basics of this. 676 00:35:52,080 --> 00:35:53,510 So SHA-1. 677 00:35:53,510 --> 00:35:55,710 SHA-1 is quite a famous algorithm. 678 00:35:55,710 --> 00:36:00,900 It was designed by the National Security Agency in the mid-1990s. 679 00:36:00,900 --> 00:36:06,060 So these are really smart people who are tasked with working 680 00:36:06,060 --> 00:36:08,820 with things like military intelligence. 681 00:36:08,820 --> 00:36:14,870 These are people who are dedicating their lives to trying to protect data 682 00:36:14,870 --> 00:36:17,150 as best as they possibly can. 683 00:36:17,150 --> 00:36:20,240 Far more brilliant minds than I, for sure. 684 00:36:20,240 --> 00:36:22,700 And this hash function is published openly. 685 00:36:22,700 --> 00:36:27,290 With hash functions, there's actually this very strange dichotomy where 686 00:36:27,290 --> 00:36:30,200 you describe exactly how the function works, 687 00:36:30,200 --> 00:36:32,860 but it still should be irreversible.
688 00:36:32,860 --> 00:36:36,980 And this really becomes a question of incredibly complicated mathematics 689 00:36:36,980 --> 00:36:40,640 involved, such that even if you knew so many of the pieces going in, 690 00:36:40,640 --> 00:36:44,000 you still wouldn't be able to undo it, even if you tried. 691 00:36:44,000 --> 00:36:45,760 It's kind of amazing actually. 692 00:36:45,760 --> 00:36:50,970 SHA-1's digests are always 160 bits in length. 693 00:36:50,970 --> 00:36:53,810 So this is the 160-bit example I mentioned a moment ago. 694 00:36:53,810 --> 00:36:59,810 That means that there are 2 to the 160 different SHA-1 digests, which 695 00:36:59,810 --> 00:37:01,550 is a bit over 10 to the 48th power. 696 00:37:01,550 --> 00:37:06,380 And again, 2 to the 160 means for every single one of the 160 bits, 697 00:37:06,380 --> 00:37:09,290 that could be a 0 or a 1. 698 00:37:09,290 --> 00:37:15,540 So we have that, two options times two options times two options, 160 times. 699 00:37:15,540 --> 00:37:20,815 Just to try and make it fathomable, to understand how large this number is, 700 00:37:20,815 --> 00:37:22,440 let me try and paint a picture for you. 701 00:37:22,440 --> 00:37:30,270 So imagine that you are looking on Earth for a specific grain of sand. 702 00:37:30,270 --> 00:37:35,690 You're looking for one specific grain of sand on Earth. 703 00:37:35,690 --> 00:37:45,260 That is easier by far than having SHA-1 produce a collision where 704 00:37:45,260 --> 00:37:47,070 two particular values would map to the same thing. 705 00:37:47,070 --> 00:37:52,030 There are about 8 times 10 to the 18th grains of sand on Earth. 706 00:37:52,030 --> 00:37:53,860 So that's eight quintillion 707 00:37:53,860 --> 00:37:55,240 (I had to look up that word), 708 00:37:55,240 --> 00:37:57,460 eight quintillion grains of sand. 709 00:37:57,460 --> 00:37:59,768 So it's way easier to find the grain of sand on Earth 710 00:37:59,768 --> 00:38:01,060 than it is to have a collision.
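A quick arithmetic check of these numbers, using Python's exact integers. The sand figure is the lecture's own "eight quintillion" estimate, and squaring it models the planets-of-sand picture that follows. Comparing against 2^160 reflects the odds that two particular inputs share a SHA-1 digest.

```python
sha1_digests = 2 ** 160
grains_per_earth = 8 * 10 ** 18               # the lecture's "eight quintillion" estimate
grains_on_all_earths = grains_per_earth ** 2  # one Earth per grain of sand

print(f"{sha1_digests:.3e}")                         # 1.462e+48
print(f"{grains_on_all_earths:.1e}")                 # 6.4e+37
print(f"{sha1_digests / grains_on_all_earths:.1e}")  # 2.3e+10
```

So even eight quintillion Earths' worth of sand is roughly 23 billion times smaller than the number of possible SHA-1 digests.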
711 00:38:01,060 --> 00:38:04,060 In fact, we can go even further and imagine 712 00:38:04,060 --> 00:38:07,000 that every single one of those grains of sand 713 00:38:07,000 --> 00:38:12,780 was another planet Earth, each of which also had sand on it. 714 00:38:12,780 --> 00:38:16,590 So you have eight quintillion planet Earths. 715 00:38:16,590 --> 00:38:19,230 You're trying to find a specific grain of sand 716 00:38:19,230 --> 00:38:23,220 on one of those eight quintillion planets. 717 00:38:23,220 --> 00:38:29,900 That's still easier than having a collision with SHA-1. 718 00:38:29,900 --> 00:38:33,650 SHA-1 is such an important algorithm that it's actually 719 00:38:33,650 --> 00:38:36,717 one of the algorithms specified in federal standards 720 00:38:36,717 --> 00:38:39,050 for use by the government to protect information. 721 00:38:39,050 --> 00:38:41,540 There are others as well, but SHA-1 is listed 722 00:38:41,540 --> 00:38:48,120 by the National Institute of Standards and Technology as a standard algorithm. 723 00:38:48,120 --> 00:38:53,310 But there's a problem, which is that SHA-1 is broken. 724 00:38:53,310 --> 00:38:56,830 There's this clever website called SHAttered, shattered.io, 725 00:38:56,830 --> 00:39:02,800 from the research team that figured out how to create a collision intentionally. 726 00:39:02,800 --> 00:39:04,470 They created a collision. 727 00:39:04,470 --> 00:39:07,570 And intentionally creating a collision has the effect of basically saying, 728 00:39:07,570 --> 00:39:11,080 this cryptographic hash function is broken. 729 00:39:11,080 --> 00:39:15,070 And they have proven that there is a way that they can systematically 730 00:39:15,070 --> 00:39:17,230 generate collisions. 731 00:39:17,230 --> 00:39:19,330 So that's bad. 732 00:39:19,330 --> 00:39:22,430 And we'll see why that's bad in just a moment.
733 00:39:22,430 --> 00:39:24,280 But you can go to this URL, shattered.io, 734 00:39:24,280 --> 00:39:26,488 and read quite a bit about how the researchers did it. 735 00:39:26,488 --> 00:39:28,493 They explain it at different levels. 736 00:39:28,493 --> 00:39:31,660 So if you really want to dive into the technology and the mathematics of it, 737 00:39:31,660 --> 00:39:32,410 you're certainly welcome to. 738 00:39:32,410 --> 00:39:35,830 If you just want to understand it at a basic level and why this is a problem, 739 00:39:35,830 --> 00:39:38,080 I definitely encourage you to take a look at this site 740 00:39:38,080 --> 00:39:39,400 and read more about this. 741 00:39:39,400 --> 00:39:42,620 So what did these researchers do? 742 00:39:42,620 --> 00:39:45,860 They said, "It is now practically possible 743 00:39:45,860 --> 00:39:50,780 to craft two colliding PDF files and obtain a SHA-1 digital signature 744 00:39:50,780 --> 00:39:53,990 on the first PDF file, which can also be abused 745 00:39:53,990 --> 00:39:57,047 as a valid signature on the second PDF file." 746 00:39:57,047 --> 00:39:58,880 In short, what they're basically saying here 747 00:39:58,880 --> 00:40:02,210 is we were able to create two PDF files such 748 00:40:02,210 --> 00:40:07,760 that if you run them through the SHA-1 algorithm, the digest that you get 749 00:40:07,760 --> 00:40:09,560 is the same. 750 00:40:09,560 --> 00:40:10,930 Why is this potentially bad? 751 00:40:10,930 --> 00:40:13,890 752 00:40:13,890 --> 00:40:17,040 "For example, by crafting the two colliding PDF files 753 00:40:17,040 --> 00:40:20,310 as two rental agreements with different rent, 754 00:40:20,310 --> 00:40:22,530 it is possible to trick someone to create 755 00:40:22,530 --> 00:40:26,880 a valid signature for a high-rent contract 756 00:40:26,880 --> 00:40:30,150 by having him or her sign a low-rent contract."
757 00:40:30,150 --> 00:40:33,090 If you can take a PDF and twist it into anything 758 00:40:33,090 --> 00:40:36,225 you want it to be, but have a valid signature, 759 00:40:36,225 --> 00:40:41,870 a valid SHA-1 hash associated with it, that's not great. 760 00:40:41,870 --> 00:40:44,420 Now, alarm bells might start going off, because SHA-1 is still 761 00:40:44,420 --> 00:40:49,010 used quite extensively, even now. This SHAttered research result 762 00:40:49,010 --> 00:40:53,300 was released in 2017, but SHA-1 is still 763 00:40:53,300 --> 00:40:56,510 being used now, even after that. 764 00:40:56,510 --> 00:41:00,200 Before you panic though, it has not been broken that many times, 765 00:41:00,200 --> 00:41:01,220 and the researchers 766 00:41:01,220 --> 00:41:05,720 worked for two years to create this PDF collision. 767 00:41:05,720 --> 00:41:08,090 And they demonstrated a method for how to do it. 768 00:41:08,090 --> 00:41:11,120 It has still not happened that many times. 769 00:41:11,120 --> 00:41:14,270 Cryptographic hash functions, once they've demonstrated one collision, 770 00:41:14,270 --> 00:41:15,050 are broken. 771 00:41:15,050 --> 00:41:16,850 That is certainly true. 772 00:41:16,850 --> 00:41:21,050 But the actual effects of this have not yet really materialized. 773 00:41:21,050 --> 00:41:24,020 The computational power required to create this 774 00:41:24,020 --> 00:41:29,660 is well beyond the capabilities of most people, or even most syndicates. 775 00:41:29,660 --> 00:41:31,440 So no cause for alarm yet. 776 00:41:31,440 --> 00:41:36,320 But it does show that there is a limitation with SHA-1, 777 00:41:36,320 --> 00:41:39,980 and we still want to always be staying one step ahead. 778 00:41:39,980 --> 00:41:43,040 Just like when Julius Caesar's enemies figured out 779 00:41:43,040 --> 00:41:46,940 how to crack the Caesar Cipher, the goal was, we need to get one step ahead.
780 00:41:46,940 --> 00:41:49,490 As technologists, we always want to stay one step ahead 781 00:41:49,490 --> 00:41:52,670 to make sure that we are doing our best job protecting our data. 782 00:41:52,670 --> 00:41:54,420 And as lawyers, we want to make sure we're 783 00:41:54,420 --> 00:41:56,810 doing our best job protecting our clients' data 784 00:41:56,810 --> 00:42:00,220 against potential adversarial attacks. 785 00:42:00,220 --> 00:42:02,608 So as I mentioned, there are other standards 786 00:42:02,608 --> 00:42:05,650 that are in use by other organizations, including the federal government. 787 00:42:05,650 --> 00:42:09,700 SHA-1, as I mentioned, is just one of a few different options that they use. 788 00:42:09,700 --> 00:42:14,230 SHA-2 and SHA-3 are much more robust algorithms. 789 00:42:14,230 --> 00:42:16,870 They use more bits, basically, in their digest. 790 00:42:16,870 --> 00:42:20,740 So instead of 160 bits, you can have anywhere between 224 791 00:42:20,740 --> 00:42:22,270 and 512 bits. 792 00:42:22,270 --> 00:42:25,960 So it's a far larger domain, reducing the likelihood 793 00:42:25,960 --> 00:42:27,540 of a collision even further. 794 00:42:27,540 --> 00:42:32,530 Again, imagine how unlikely a collision was with 2 to the 160 possible digests. 795 00:42:32,530 --> 00:42:34,620 Now we make it even more so. 796 00:42:34,620 --> 00:42:41,050 512 bits, that's unfathomably large and difficult to duplicate. 797 00:42:41,050 --> 00:42:45,100 MD5 and MD6 are other cryptographic hash functions 798 00:42:45,100 --> 00:42:46,530 that you may encounter. 799 00:42:46,530 --> 00:42:50,860 MD5 in particular I've highlighted here in yellow because it's not actually 800 00:42:50,860 --> 00:42:53,350 considered secure anymore, but it's still very, very 801 00:42:53,350 --> 00:42:55,390 commonly used as a checksum. 802 00:42:55,390 --> 00:42:59,170 Basically, what we do is we run a file through MD5.
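For anyone who wants to see these digest sizes directly, here is a minimal sketch using Python's standard `hashlib` module (the choice of Python and the sample message are ours, not the lecture's). It reports how many bits each algorithm's digest has; every extra bit doubles the number of possible digests.

```python
import hashlib

# SHA-1 plus a few SHA-2 and SHA-3 variants available in Python's hashlib.
ALGORITHMS = ["sha1", "sha224", "sha256", "sha384", "sha512", "sha3_256", "sha3_512"]


def digest_bits(name: str) -> int:
    """Return the digest length, in bits, for a hashlib algorithm name."""
    return hashlib.new(name).digest_size * 8


if __name__ == "__main__":
    message = b"rental agreement"  # arbitrary sample input
    for name in ALGORITHMS:
        bits = digest_bits(name)
        preview = hashlib.new(name, message).hexdigest()[:16]
        # 2**bits is the number of distinct digests the algorithm can produce.
        print(f"{name:9} {bits:3} bits  2**{bits} possible digests  {preview}...")
```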
803 00:42:59,170 --> 00:43:02,260 Say we're a distributor of a file, and we 804 00:43:02,260 --> 00:43:04,390 want people to come download it from us, and they 805 00:43:04,390 --> 00:43:08,830 want to be able to trust our source. We might run our file through MD5 and say, 806 00:43:08,830 --> 00:43:14,560 if you run this file through MD5, the hash will be blah blah blah blah blah. 807 00:43:14,560 --> 00:43:17,680 And other people can then download the file and run it through MD5. 808 00:43:17,680 --> 00:43:20,347 It's usually a program that is available on computers for people 809 00:43:20,347 --> 00:43:24,190 to just run any arbitrary data through to get a hash result. 810 00:43:24,190 --> 00:43:27,790 And they can check, OK, the hash value of the file I received from this trusted 811 00:43:27,790 --> 00:43:32,150 source matches the hash value that I was told I would get, 812 00:43:32,150 --> 00:43:33,790 and so I will trust this. 813 00:43:33,790 --> 00:43:35,800 Compare that with perhaps getting that same software 814 00:43:35,800 --> 00:43:38,740 from some corner of the internet that you don't really trust. 815 00:43:38,740 --> 00:43:41,050 If you find the MD5 hash published by the trusted source 816 00:43:41,050 --> 00:43:45,562 does not match the hash of what you downloaded and thought was that same file, 817 00:43:45,562 --> 00:43:47,770 it's probably a sign that something has changed in it 818 00:43:47,770 --> 00:43:49,060 and you don't really want to-- 819 00:43:49,060 --> 00:43:52,060 you might want to be skeptical about trusting that file rather than just 820 00:43:52,060 --> 00:43:54,790 diving right into it. 821 00:43:54,790 --> 00:44:00,070 So what do we do that relies on cryptography on the internet today? 822 00:44:00,070 --> 00:44:03,580 Or, you know, just using our computers every day? 823 00:44:03,580 --> 00:44:04,570 Email.
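The checksum workflow described above can be sketched in Python with the standard `hashlib` module. This is a minimal illustration that assumes the distributor publishes a hex digest; the function names `file_checksum` and `matches_published` are hypothetical. MD5 is the default only to mirror the lecture. Since MD5 is no longer considered secure, passing `algorithm="sha256"` is the safer choice whenever the publisher offers a SHA-256 checksum.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, algorithm: str = "md5", chunk_size: int = 65536) -> str:
    """Hash a file in chunks (so large downloads fit in memory) and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "md5") -> bool:
    """Compare a downloaded file's digest with the digest the trusted source published."""
    return file_checksum(path, algorithm) == published_hex.strip().lower()


if __name__ == "__main__":
    # Stand-in for a downloaded file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"installer bytes")
        path = tmp.name
    published = file_checksum(path)             # what the distributor would publish
    print(matches_published(path, published))   # True
    with open(path, "ab") as f:                 # simulate tampering: append one byte
        f.write(b"!")
    print(matches_published(path, published))   # False
    os.remove(path)
```

Even a one-byte change produces a completely different digest, which is what makes the comparison useful.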
824 00:44:04,570 --> 00:44:07,210 Email relies pretty extensively on cryptography, 825 00:44:07,210 --> 00:44:11,897 particularly when we start to use secure email services, of which Gmail might 826 00:44:11,897 --> 00:44:14,230 not be considered one, but there are services out there, 827 00:44:14,230 --> 00:44:18,520 for example, ProtonMail and others, that do encrypt email completely 828 00:44:18,520 --> 00:44:19,930 from end to end. 829 00:44:19,930 --> 00:44:25,120 Much safer in terms of protecting one's communications. 830 00:44:25,120 --> 00:44:29,410 Similarly, you may be familiar with the mobile app Signal, which is 831 00:44:29,410 --> 00:44:33,760 used to encrypt text messages between two people on their phones, 832 00:44:33,760 --> 00:44:38,470 rather than email. 833 00:44:38,470 --> 00:44:41,200 Secure web browsing: you may be familiar with this distinction 834 00:44:41,200 --> 00:44:44,012 between HTTP and HTTPS. 835 00:44:44,012 --> 00:44:45,220 And if you're not, that's OK. 836 00:44:45,220 --> 00:44:47,887 We're going to be talking about that a little bit later as well. 837 00:44:47,887 --> 00:44:51,160 But you want to make sure that your web traffic is encrypted against people 838 00:44:51,160 --> 00:44:55,360 who are able to just monitor the network for all the traffic that is going by. 839 00:44:55,360 --> 00:45:00,640 You probably don't want your searches to be someone else's 840 00:45:00,640 --> 00:45:02,980 fodder for entertainment. 841 00:45:02,980 --> 00:45:03,740 VPNs. 842 00:45:03,740 --> 00:45:06,990 If you use a VPN, that's a great thing to do if you're traveling, for example, 843 00:45:06,990 --> 00:45:11,290 when you may be on less secure networks than you might find at your business 844 00:45:11,290 --> 00:45:16,210 or at home or at a university, for example.
845 00:45:16,210 --> 00:45:19,990 VPNs allow you to encrypt communications with a network, 846 00:45:19,990 --> 00:45:24,670 and also allow the network to act on your behalf so that 847 00:45:24,670 --> 00:45:28,450 your web traffic cannot be traced back to you directly, 848 00:45:28,450 --> 00:45:31,600 which might be advantageous in some situations as well. 849 00:45:31,600 --> 00:45:32,890 Document storage, too. 850 00:45:32,890 --> 00:45:37,330 So if you use services like Dropbox, for example, generally what 851 00:45:37,330 --> 00:45:40,960 Dropbox is going to do is break your document into pieces 852 00:45:40,960 --> 00:45:42,130 and encrypt those pieces. 853 00:45:42,130 --> 00:45:46,300 Rather than just storing the whole file writ large on some server somewhere 854 00:45:46,300 --> 00:45:49,210 in the cloud, it's going to encrypt it before it sends it over 855 00:45:49,210 --> 00:45:52,840 so that you have some more comfort that your data is being 856 00:45:52,840 --> 00:45:54,910 protected by these cloud services. 857 00:45:54,910 --> 00:45:58,510 And certainly, we're going to talk a bit more about what the cloud is 858 00:45:58,510 --> 00:46:01,720 and what cloud services are and what they can be used for a little bit 859 00:46:01,720 --> 00:46:05,763 later in the course as well. 860 00:46:05,763 --> 00:46:08,180 Hash functions and cryptographic hash functions are great, 861 00:46:08,180 --> 00:46:11,330 but they are well documented, and each one is fixed. 862 00:46:11,330 --> 00:46:13,460 There's only one version of SHA-1. 863 00:46:13,460 --> 00:46:15,950 There's only one version of SHA-3. 864 00:46:15,950 --> 00:46:17,585 And that is a limitation. 865 00:46:17,585 --> 00:46:20,820 Now it might not be a severe one, 866 00:46:20,820 --> 00:46:22,760 because they're pretty strong algorithms.
867 00:46:22,760 --> 00:46:27,230 But are there ways that we can improve our own cryptographic techniques 868 00:46:27,230 --> 00:46:30,380 if we're trying to protect data that we are receiving, 869 00:46:30,380 --> 00:46:32,240 data that we are sending, and so on? 870 00:46:32,240 --> 00:46:35,580 And that leads us to this idea of public-key cryptography, 871 00:46:35,580 --> 00:46:38,930 or public- and private-key cryptography, or asymmetric encryption. 872 00:46:38,930 --> 00:46:41,870 You'll hear these terms kind of used interchangeably. 873 00:46:41,870 --> 00:46:46,640 Let's start by talking about public-key cryptography by way of an analogy. 874 00:46:46,640 --> 00:46:50,510 We're going to go way back to arithmetic and algebra days here. 875 00:46:50,510 --> 00:46:52,820 So imagine we have something like this. 876 00:46:52,820 --> 00:46:58,220 We have 14 times 8 equals 112. 877 00:46:58,220 --> 00:47:01,460 Multiplication we can think of as a function. 878 00:47:01,460 --> 00:47:02,670 It is a function. 879 00:47:02,670 --> 00:47:08,720 If 14 is our input and our function is times 8, the result is 112. 880 00:47:08,720 --> 00:47:11,780 Now multiplication is not a hash function because it is reversible. 881 00:47:11,780 --> 00:47:17,060 I can take that 112, multiply it by 1/8, or equivalently divide by 8, 882 00:47:17,060 --> 00:47:18,750 and get back the original input. 883 00:47:18,750 --> 00:47:24,650 So multiplication is a function, but it is not a hash function. 884 00:47:24,650 --> 00:47:30,110 It is reversible because if we multiply any number x by some other nonzero number y, 885 00:47:30,110 --> 00:47:31,760 we get a result z. 886 00:47:31,760 --> 00:47:37,220 And we can undo that whole process by taking z, multiplying it by 1 over y, 887 00:47:37,220 --> 00:47:40,370 or the reciprocal of y, and getting back the original x. 888 00:47:40,370 --> 00:47:41,090 Reversible. 889 00:47:41,090 --> 00:47:44,620 Goes in both directions.
890 00:47:44,620 --> 00:47:47,740 Now let's take this function and kind of obscure it. 891 00:47:47,740 --> 00:47:54,040 I know for myself that the function I'm using is n times 8. 892 00:47:54,040 --> 00:47:58,600 Whatever I pass in is going to be multiplied by 8. 893 00:47:58,600 --> 00:48:01,240 But I don't tell you what that is. 894 00:48:01,240 --> 00:48:03,310 I don't tell my friends what that is. 895 00:48:03,310 --> 00:48:05,890 I just say, hey, if you want to send me a message, 896 00:48:05,890 --> 00:48:08,393 just run it through this function. 897 00:48:08,393 --> 00:48:10,810 So again, we're going to just use math as an example here. 898 00:48:10,810 --> 00:48:15,350 If my message is 14, I might say, f of 14-- 899 00:48:15,350 --> 00:48:18,700 and again, this is getting back to algebra, maybe a little bit 900 00:48:18,700 --> 00:48:20,100 back in the day-- 901 00:48:20,100 --> 00:48:23,020 f of 14 is 112. 902 00:48:23,020 --> 00:48:27,295 That function, f, is my public key, you might think. 903 00:48:27,295 --> 00:48:29,920 And you might say, having just gone through this whole example, 904 00:48:29,920 --> 00:48:32,620 that, well, it's pretty easy to undo that. 905 00:48:32,620 --> 00:48:35,620 If I know that 14 is the plain text and 112 is the cipher text, 906 00:48:35,620 --> 00:48:39,400 I can probably figure out that your function is n times 8. 907 00:48:39,400 --> 00:48:41,620 And so I've broken your encryption scheme. 908 00:48:41,620 --> 00:48:46,210 I have figured out how to reverse your cryptography. 909 00:48:46,210 --> 00:48:49,780 Well, it's true that n times 8 is certainly one function 910 00:48:49,780 --> 00:48:55,150 that I could use to turn that plain text, 14 in this example, 911 00:48:55,150 --> 00:48:58,700 into that cipher text, 112 in this example. 912 00:48:58,700 --> 00:49:01,330 But there are other ways that I can do it. 913 00:49:01,330 --> 00:49:05,300 My actual function could have been n times 10 minus 28.
914 00:49:05,300 --> 00:49:09,010 So 14 times 10 is 140, minus 28 is 112. 915 00:49:09,010 --> 00:49:11,230 And there are other contrived mathematical examples 916 00:49:11,230 --> 00:49:16,240 that I could continue to come up with pretty much ad infinitum to define 917 00:49:16,240 --> 00:49:20,440 ways to transform 14 into 112. 918 00:49:20,440 --> 00:49:26,140 So just because you see that 112, that doesn't mean you 919 00:49:26,140 --> 00:49:30,620 have figured out how to break my function. 920 00:49:30,620 --> 00:49:34,000 You haven't figured out what my encryption technique is. 921 00:49:34,000 --> 00:49:37,180 If all I say is, here's a black box that I would like you to feed an input 922 00:49:37,180 --> 00:49:45,250 into, then even if you see the output, you, or more worryingly an adversary 923 00:49:45,250 --> 00:49:50,180 who also sees that output, should not be able to, or in this case cannot, 924 00:49:50,180 --> 00:49:50,680 undo it. 925 00:49:50,680 --> 00:49:52,840 Because yes, I could have been using n times 8. 926 00:49:52,840 --> 00:49:57,890 I could have been using this crazy thing involving the square of n. 927 00:49:57,890 --> 00:50:01,780 And that's kind of the idea behind public-key cryptography. 928 00:50:01,780 --> 00:50:07,120 I am going to publicize that I have a function that can be used, 929 00:50:07,120 --> 00:50:10,450 but I'm not going to tell you what that function is, 930 00:50:10,450 --> 00:50:14,120 and I'm certainly not going to tell you how to reverse it. 931 00:50:14,120 --> 00:50:19,330 So public- and private-key cryptography actually rely on two functions 932 00:50:19,330 --> 00:50:21,640 that are designed to reverse each other. 933 00:50:21,640 --> 00:50:24,100 Earlier we talked about how hash functions 934 00:50:24,100 --> 00:50:26,380 are supposed to be irreversible.
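To make the point concrete, here is a tiny Python sketch with three candidate functions that all turn 14 into 112. The first two are the ones from the lecture; `n**2 - 84` is our own hypothetical stand-in for the "square of n" example on the slide, whose exact form isn't given here.

```python
# Three hypothetical "secret" functions. All of them map the plaintext 14
# to the ciphertext 112, so one input/output pair can't tell them apart.
candidates = {
    "n * 8": lambda n: n * 8,
    "n * 10 - 28": lambda n: n * 10 - 28,
    "n**2 - 84": lambda n: n ** 2 - 84,  # illustrative stand-in, not the slide's function
}

for name, f in candidates.items():
    # Same output for 14, different outputs for any other input, e.g. 3.
    print(f"{name:12} f(14) = {f(14):4}   f(3) = {f(3):4}")
```

One caveat: the analogy explains why a single input/output pair isn't enough. Real public-key systems such as RSA actually publish the encryption function itself; their security comes from how hard the function is to reverse without the private key, not from keeping the function hidden.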
935 00:50:26,380 --> 00:50:32,860 But the distinction here is that we are creating two functions, f and g, which 936 00:50:32,860 --> 00:50:35,150 are intended to reverse one another. 937 00:50:35,150 --> 00:50:38,720 So it's not that a single function is reversible; 938 00:50:38,720 --> 00:50:43,120 it's that we have two functions that, working together, create a circuit. 939 00:50:43,120 --> 00:50:47,800 If I take data and I run it through function f, I get some output. 940 00:50:47,800 --> 00:50:52,030 If I run that output through function g, I get back the original data. 941 00:50:52,030 --> 00:50:53,540 I have deciphered the information. 942 00:50:53,540 --> 00:50:55,040 And the same thing works in reverse. 943 00:50:55,040 --> 00:50:58,420 If I take some data and I run it through function g, 944 00:50:58,420 --> 00:51:02,180 I get some scrambled output that makes no sense. 945 00:51:02,180 --> 00:51:06,160 And if I run that scrambled output through function f, 946 00:51:06,160 --> 00:51:09,500 I get back the original data once again. 947 00:51:09,500 --> 00:51:13,870 Now the key is that-- pun intended-- the key is that one of these functions 948 00:51:13,870 --> 00:51:15,910 is public and the other one is private. 949 00:51:15,910 --> 00:51:19,600 One of them is available to everybody, and everybody uses 950 00:51:19,600 --> 00:51:22,300 that function to send you messages. 951 00:51:22,300 --> 00:51:25,540 If you want to send me a message using encryption, 952 00:51:25,540 --> 00:51:29,290 using public and private key encryption, you take the message 953 00:51:29,290 --> 00:51:32,950 and you use my public key to encrypt it, and you 954 00:51:32,950 --> 00:51:36,250 send me the result, the encrypted result. 955 00:51:36,250 --> 00:51:38,950 And I use my private key to decrypt it.
956 00:51:38,950 --> 00:51:41,020 And I am, ostensibly, the only person who 957 00:51:41,020 --> 00:51:43,810 has my private key, even though I've broadcast, 958 00:51:43,810 --> 00:51:47,650 made my public key widely available. 959 00:51:47,650 --> 00:51:52,300 Now the math that goes into this is well beyond the scope of a discussion 960 00:51:52,300 --> 00:51:53,920 that we're going to have here today. 961 00:51:53,920 --> 00:51:57,970 But basically, much of public-key cryptography 962 00:51:57,970 --> 00:52:00,760 involves the use of prime numbers, particularly 963 00:52:00,760 --> 00:52:02,517 very, very large prime numbers. 964 00:52:02,517 --> 00:52:05,350 And you're looking for prime numbers that have a particular pattern. 965 00:52:05,350 --> 00:52:07,942 And when I say "you're" looking for it, don't worry, 966 00:52:07,942 --> 00:52:09,400 you don't have to do this yourself. 967 00:52:09,400 --> 00:52:11,380 There are plenty of programs out there, implementing algorithms like RSA, 968 00:52:11,380 --> 00:52:14,140 a very popular one, that can be used to generate 969 00:52:14,140 --> 00:52:16,810 these public and private key pairs. 970 00:52:16,810 --> 00:52:22,330 And the amazing thing is that they can generate these pairs very quickly, 971 00:52:22,330 --> 00:52:26,080 yet it's almost impossible to break them or figure out 972 00:52:26,080 --> 00:52:28,390 what the underlying functions, or even in this case 973 00:52:28,390 --> 00:52:30,900 what the underlying two prime numbers are 974 00:52:30,900 --> 00:52:33,460 that are the foundation for your own encryption strategy. 975 00:52:33,460 --> 00:52:37,750 So it's pretty amazing that it's easy to define these functions 976 00:52:37,750 --> 00:52:43,757 and almost impossible to reverse engineer them, so to speak.
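The math is out of scope for the lecture, but for the curious, here is a minimal sketch of textbook RSA key generation in Python 3.8+. The primes 61 and 53 and the public exponent 17 are our illustrative assumptions, a classic classroom example rather than anything from the lecture. The sketch also shows why tiny primes are useless: the public modulus can be factored instantly by trial division. With primes hundreds of digits long, that factoring step is believed to be infeasible with known methods.

```python
import math


def make_toy_keypair(p: int, q: int, e: int = 17):
    """Textbook RSA key generation from two primes. Toy sizes only, NOT secure."""
    n = p * q                    # public modulus, shared by both keys
    phi = (p - 1) * (q - 1)      # Euler's totient of n
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)
    return (n, e), (n, d)        # (public key, private key)


def factor_by_trial_division(n: int):
    """Recover the two primes by brute force. Only feasible because n is tiny."""
    for candidate in range(2, math.isqrt(n) + 1):
        if n % candidate == 0:
            return candidate, n // candidate
    raise ValueError("n has no nontrivial factor")


if __name__ == "__main__":
    public, private = make_toy_keypair(61, 53)
    print("public key: ", public)                                 # (3233, 17)
    print("private key:", private)                                # (3233, 2753)
    print("factors of n:", factor_by_trial_division(public[0]))   # (53, 61)
```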
977 00:52:43,757 --> 00:52:46,840 So we start with a huge prime number, we find some other prime number that 978 00:52:46,840 --> 00:52:49,960 has a property, a special property related to it, 979 00:52:49,960 --> 00:52:54,190 and from those two numbers we generate two functions whose goal in life 980 00:52:54,190 --> 00:52:56,740 is to undo whatever the other one does. 981 00:52:56,740 --> 00:53:03,370 So f's job is to undo what g does, g's job is to undo what f does. 982 00:53:03,370 --> 00:53:05,490 And this is called a public and private key pair. 983 00:53:05,490 --> 00:53:10,180 So your public key is really some complicated mathematical function 984 00:53:10,180 --> 00:53:11,660 that does this work. 985 00:53:11,660 --> 00:53:14,260 And that function is represented as a very long string 986 00:53:14,260 --> 00:53:16,490 of numbers and letters. 987 00:53:16,490 --> 00:53:19,870 It looks just like a hash digest. 988 00:53:19,870 --> 00:53:23,620 But it's just a human representation, a readable representation 989 00:53:23,620 --> 00:53:25,330 of a mathematical function. 990 00:53:25,330 --> 00:53:27,860 And your private key is the same-- 991 00:53:27,860 --> 00:53:30,805 or rather, your private key is also a string of letters and numbers. 992 00:53:30,805 --> 00:53:32,680 It's not exactly the same as your public key, 993 00:53:32,680 --> 00:53:35,110 but it undoes the work that your public key does. 994 00:53:35,110 --> 00:53:39,970 And again, these keys are generated using an algorithm such as RSA. 995 00:53:39,970 --> 00:53:42,460 So let's take a look at exactly how we would 996 00:53:42,460 --> 00:53:46,690 go about doing some asymmetric encryption using 997 00:53:46,690 --> 00:53:48,350 public and private keys. 998 00:53:48,350 --> 00:53:50,560 So here we have some original data. 999 00:53:50,560 --> 00:53:52,660 It's a message perhaps that I want to send. 1000 00:53:52,660 --> 00:53:54,880 And I want to send it to you.
1001 00:53:54,880 --> 00:53:57,760 I want to send this message to you, but I don't 1002 00:53:57,760 --> 00:53:59,770 want to send it to you in the clear. 1003 00:53:59,770 --> 00:54:02,090 It's sensitive information. 1004 00:54:02,090 --> 00:54:04,675 I don't want to send it as plain text. 1005 00:54:04,675 --> 00:54:07,630 And I don't want to use a generic hash function 1006 00:54:07,630 --> 00:54:10,570 because if I use a generic hash function, like SHA for example, 1007 00:54:10,570 --> 00:54:11,398 it's irreversible. 1008 00:54:11,398 --> 00:54:13,690 You will not be able to figure out what I tried to say. 1009 00:54:13,690 --> 00:54:18,460 So instead, I take this original data and I use your public key. 1010 00:54:18,460 --> 00:54:22,030 Your public key, again, is just a mathematical-- a very complex-- 1011 00:54:22,030 --> 00:54:24,420 mathematical function. 1012 00:54:24,420 --> 00:54:30,090 So I take this data, I feed it into your public key, your public function, 1013 00:54:30,090 --> 00:54:33,660 and I get some garbled stuff out. 1014 00:54:33,660 --> 00:54:34,590 OK? 1015 00:54:34,590 --> 00:54:36,410 And this is what I send to you. 1016 00:54:36,410 --> 00:54:39,880 I send you this garbled stuff. 1017 00:54:39,880 --> 00:54:43,240 In order for you to figure out what the original message is, 1018 00:54:43,240 --> 00:54:44,785 you use your private key. 1019 00:54:44,785 --> 00:54:46,660 Not your public key-- your public key is what 1020 00:54:46,660 --> 00:54:49,900 I use to encipher the information-- but your private key, which 1021 00:54:49,900 --> 00:54:51,940 is known only to you, ideally. 1022 00:54:51,940 --> 00:54:55,090 It should not be distributed to others. 1023 00:54:55,090 --> 00:54:57,970 It undoes the work that your public key did.
1024 00:54:57,970 --> 00:55:00,790 And so if I give you the scrambled data and you 1025 00:55:00,790 --> 00:55:04,180 use your private key to try and decipher it, 1026 00:55:04,180 --> 00:55:07,045 you will get back that original data. 1027 00:55:07,045 --> 00:55:08,170 But here's the great thing. 1028 00:55:08,170 --> 00:55:11,560 No one else's private key will be able to do that. 1029 00:55:11,560 --> 00:55:16,750 If anybody other than you intercepts that message and they use their own private key 1030 00:55:16,750 --> 00:55:20,170 or they use your public key again, they will not 1031 00:55:20,170 --> 00:55:23,440 be able to decipher the message that I sent to you. 1032 00:55:23,440 --> 00:55:26,140 And so public and private keys are very interesting because they 1033 00:55:26,140 --> 00:55:28,470 create these pairs. 1034 00:55:28,470 --> 00:55:33,460 They're encryption schemes that are unique to two people, 1035 00:55:33,460 --> 00:55:35,620 or really even to one person. 1036 00:55:35,620 --> 00:55:37,450 If you were to send me a message back, you 1037 00:55:37,450 --> 00:55:41,960 would encrypt it using my public key. 1038 00:55:41,960 --> 00:55:45,370 You would then send me the encrypted, scrambled data 1039 00:55:45,370 --> 00:55:50,350 for your message. 1040 00:55:50,350 --> 00:55:53,350 I would then use my private key, which is not 1041 00:55:53,350 --> 00:55:55,420 known to you or, ideally, to anyone 1042 00:55:55,420 --> 00:55:58,900 else, to decipher what you sent me. 1043 00:55:58,900 --> 00:56:01,990 And I would get back the secret message, or the perhaps not-so-secret, 1044 00:56:01,990 --> 00:56:05,680 but sensitive message that you sent to me. 1045 00:56:05,680 --> 00:56:09,400 And so that's this idea of asymmetric encryption. 1046 00:56:09,400 --> 00:56:12,430 You can encrypt using someone's public key. 1047 00:56:12,430 --> 00:56:13,960 And anybody can do so.
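Here is a minimal Python sketch of the round trip just described, using textbook RSA. The key pair is a hypothetical toy one (p = 61, q = 53, e = 17). It is far too small for real use, and real systems also add padding. Messages are small integers because textbook RSA works on numbers smaller than the modulus n.

```python
# Hypothetical toy key pair for "you" (built from p = 61, q = 53, e = 17).
# Far too small to be secure; for illustration only.
YOUR_PUBLIC_KEY = (3233, 17)     # (n, e): safe to publish
YOUR_PRIVATE_KEY = (3233, 2753)  # (n, d): kept secret


def encrypt(message: int, public_key: tuple) -> int:
    """Anyone can do this: apply the recipient's public key."""
    n, e = public_key
    if not 0 <= message < n:
        raise ValueError("textbook RSA can only encrypt integers smaller than n")
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    """Only the key owner can do this: apply the matching private key."""
    n, d = private_key
    return pow(ciphertext, d, n)


if __name__ == "__main__":
    secret = 65                                    # the message, encoded as a number
    scrambled = encrypt(secret, YOUR_PUBLIC_KEY)   # what I actually send you
    print(scrambled)                               # 2790
    print(decrypt(scrambled, YOUR_PRIVATE_KEY))    # 65
    # Applying the public key a second time does not undo the first application.
    print(encrypt(scrambled, YOUR_PUBLIC_KEY) == secret)  # False
```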
1048 00:56:13,960 --> 00:56:17,560 And for that reason, you'll find that technically-minded people 1049 00:56:17,560 --> 00:56:20,890 sometimes post their public key literally on the internet, 1050 00:56:20,890 --> 00:56:24,160 so that anybody who wants to send them a message using a secure channel 1051 00:56:24,160 --> 00:56:27,070 can do so. 1052 00:56:27,070 --> 00:56:28,330 Programmers do this as well. 1053 00:56:28,330 --> 00:56:32,740 So if I'm doing some work using a tool called GitHub, a popular service 1054 00:56:32,740 --> 00:56:38,080 available online for sharing and posting source code, 1055 00:56:38,080 --> 00:56:42,220 if I want to send something from my computer to GitHub's servers 1056 00:56:42,220 --> 00:56:48,310 in the cloud, I might authenticate using a public key and private key encryption 1057 00:56:48,310 --> 00:56:52,042 scheme, so that I'm using their public key to send them 1058 00:56:52,042 --> 00:56:53,500 information, and they're decrypting it. 1059 00:56:53,500 --> 00:56:57,250 When they send information back to me, they're using my public key 1060 00:56:57,250 --> 00:56:59,710 and I use my private key to decrypt it. 1061 00:56:59,710 --> 00:57:02,260 It's actually part of-- 1062 00:57:02,260 --> 00:57:06,670 it's part of a communication strategy used by technically-minded folks. 1063 00:57:06,670 --> 00:57:09,850 And you're not restricted to having just one public and private key. 1064 00:57:09,850 --> 00:57:11,770 For example, I have one public and private key 1065 00:57:11,770 --> 00:57:14,860 that I use for secure email, I have one public and private key 1066 00:57:14,860 --> 00:57:19,120 that I would use for secure texting on my phone, 1067 00:57:19,120 --> 00:57:24,910 and I have one public and private key that I use for my GitHub repository. 1068 00:57:24,910 --> 00:57:29,110 So I have different sets and different combinations of these keys.
1069 00:57:29,110 --> 00:57:31,360 But the key is that-- the key, again, pun intended-- 1070 00:57:31,360 --> 00:57:36,430 is that the decryption can only be done by someone who has the matching private key, not 1071 00:57:36,430 --> 00:57:40,215 the public key, because only those two functions are reciprocals 1072 00:57:40,215 --> 00:57:40,840 of one another. 1073 00:57:40,840 --> 00:57:46,410 Each undoes the work that the other did in the first place. 1074 00:57:46,410 --> 00:57:49,320 But interestingly enough, that's not the only thing 1075 00:57:49,320 --> 00:57:52,330 we can do with public and private keys. 1076 00:57:52,330 --> 00:57:54,930 So instead of just encryption, we also have this idea 1077 00:57:54,930 --> 00:57:57,480 of a digital signature, which is different from an e-signature, 1078 00:57:57,480 --> 00:58:00,840 an e-signature just being the tracing of a pen typically 1079 00:58:00,840 --> 00:58:04,478 along some surface and just logging where all the pen strokes happen to be. 1080 00:58:04,478 --> 00:58:07,020 So we're talking about something much more complex than that. 1081 00:58:07,020 --> 00:58:08,978 We're talking about something cryptographically 1082 00:58:08,978 --> 00:58:10,800 based when we talk about digital signatures. 1083 00:58:10,800 --> 00:58:14,310 It's kind of the opposite of encryption. 1084 00:58:14,310 --> 00:58:17,040 And using someone's digital signature, you 1085 00:58:17,040 --> 00:58:22,050 can verify the authenticity of a document and verify, more precisely, 1086 00:58:22,050 --> 00:58:25,810 the authenticity of the sender of a document. 1087 00:58:25,810 --> 00:58:30,480 And we're going to explain this in great detail in just a moment, 1088 00:58:30,480 --> 00:58:34,720 but the basic idea is they're signing the document using their private key. 1089 00:58:34,720 --> 00:58:36,870 You still don't see what the key is.
1090 00:58:36,870 --> 00:58:39,270 And because these public and private key pairs 1091 00:58:39,270 --> 00:58:42,990 are specific to an individual person, if you 1092 00:58:42,990 --> 00:58:45,330 were able to verify that that document could only 1093 00:58:45,330 --> 00:58:49,140 have been signed using someone's private key, 1094 00:58:49,140 --> 00:58:53,640 then you have strong reason to believe that that person 1095 00:58:53,640 --> 00:58:58,350 is the person who signed the document, who sent the document, and so on. 1096 00:58:58,350 --> 00:59:04,020 Digital signatures are typically at least 256 bits long, 1097 00:59:04,020 --> 00:59:08,130 which means there are at least 2 to the 256th power possible digital signatures, 1098 00:59:08,130 --> 00:59:13,350 which makes the chance of a forgery effectively zero. 1099 00:59:13,350 --> 00:59:14,420 Again, I'm using this-- 1100 00:59:14,420 --> 00:59:18,420 I'm trying to avoid saying never because computer scientists don't like never. 1101 00:59:18,420 --> 00:59:24,270 But effectively, there is no chance of a forgery. 1102 00:59:24,270 --> 00:59:30,550 Now the process for how one verifies a digital signature 1103 00:59:30,550 --> 00:59:32,300 involves quite a few steps. 1104 00:59:32,300 --> 00:59:34,533 And I have a diagram here that I sourced from online. 1105 00:59:34,533 --> 00:59:36,450 And what I'd like us to do now is walk through 1106 00:59:36,450 --> 00:59:41,580 this process to hopefully give you an understanding of how these work 1107 00:59:41,580 --> 00:59:44,700 and how you might be able to rely on digital signatures.
1108 00:59:44,700 --> 00:59:49,230 And states and different entities are recognizing digital signatures 1109 00:59:49,230 --> 00:59:53,610 as a valid way to sign documents, but it really helps 1110 00:59:53,610 --> 00:59:57,000 to have a good understanding of them so that you, as an attorney, 1111 00:59:57,000 --> 01:00:02,590 are comfortable that this does represent a specific individual. 1112 01:00:02,590 --> 01:00:07,420 So let's take a look at how this process works. 1113 01:00:07,420 --> 01:00:10,040 So we start with data. 1114 01:00:10,040 --> 01:00:12,760 Data in this case is any document. 1115 01:00:12,760 --> 01:00:19,240 Perhaps it's a scanned, signed version of some PDF with somebody's actual ink 1116 01:00:19,240 --> 01:00:19,740 signature. 1117 01:00:19,740 --> 01:00:22,260 But again, the whole thing is just scanned. 1118 01:00:22,260 --> 01:00:24,480 The next step is to use a hash function. 1119 01:00:24,480 --> 01:00:27,570 The hash function that we use in this context could be any of several. 1120 01:00:27,570 --> 01:00:29,400 It could be SHA-1. 1121 01:00:29,400 --> 01:00:32,353 It could be something more complex. 1122 01:00:32,353 --> 01:00:34,770 In general, the hash function used here is not a secret, keyed function; 1123 01:00:34,770 --> 01:00:36,960 it's a publicly known one. 1124 01:00:36,960 --> 01:00:38,970 In this walkthrough it's MD5, though since MD5 is no longer considered secure, modern systems use something like SHA-256. 1125 01:00:38,970 --> 01:00:40,860 So something that anybody has access to. 1126 01:00:40,860 --> 01:00:44,310 And that's going to result in a hash, a set of zeros and ones. 1127 01:00:44,310 --> 01:00:48,818 In the case of MD5, the hash is 128 bits, usually written as 32 hexadecimal characters. 1128 01:00:48,818 --> 01:00:50,610 Now where things get very interesting is we 1129 01:00:50,610 --> 01:00:54,030 take that hash, that set of zeros and ones, 1130 01:00:54,030 --> 01:00:58,030 and we encrypt it using the signer's private key.
1131 01:00:58,030 --> 01:01:00,690 Remember, these functions are reciprocals of one another. 1132 01:01:00,690 --> 01:01:03,270 The public key can undo what the private key does, 1133 01:01:03,270 --> 01:01:06,360 and the private key can undo what the public key does. 1134 01:01:06,360 --> 01:01:11,220 Notice in this case we're still not sending anyone our private key. 1135 01:01:11,220 --> 01:01:14,080 We are just using our private key to encrypt something. 1136 01:01:14,080 --> 01:01:17,940 So we take this hash that we received from running our file through MD5, 1137 01:01:17,940 --> 01:01:22,050 we encrypt it using our private key, and we get some other result out of it. 1138 01:01:22,050 --> 01:01:27,150 This number that comes out of running the hash through our private key 1139 01:01:27,150 --> 01:01:29,550 is called the signature. 1140 01:01:29,550 --> 01:01:32,340 We then couple the two: when we send this off, 1141 01:01:32,340 --> 01:01:36,570 we send the signature plus the original document, 1142 01:01:36,570 --> 01:01:40,450 and together those make up a digitally signed document. 1143 01:01:40,450 --> 01:01:42,930 So that's the signing part of the process. 1144 01:01:42,930 --> 01:01:43,810 To recap: 1145 01:01:43,810 --> 01:01:45,540 We start with a file. 1146 01:01:45,540 --> 01:01:47,730 We run that file through a generic hash function. 1147 01:01:47,730 --> 01:01:49,560 Not our public and private keys, but something 1148 01:01:49,560 --> 01:01:51,900 that is generally pretty accessible. 1149 01:01:51,900 --> 01:01:55,800 We take that hash, we encrypt it using our private key 1150 01:01:55,800 --> 01:01:59,730 to get another value that looks similar, zeros and ones, 1151 01:01:59,730 --> 01:02:02,790 but a totally different pattern of zeros and ones.
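The signing steps can be sketched in Python with the same kind of hypothetical toy RSA key pair (p = 61, q = 53, e = 17). Two assumptions to flag. First, we use SHA-256 in place of the MD5 used in this walkthrough. Second, because the toy modulus is tiny, we reduce the hash modulo n. Real signature schemes use full-size keys and standardized padding, such as RSASSA-PSS from PKCS #1, so treat this as an illustration only.

```python
import hashlib

# Hypothetical toy key pair for the signer (p = 61, q = 53, e = 17). NOT secure.
SIGNER_PUBLIC_KEY = (3233, 17)     # (n, e)
SIGNER_PRIVATE_KEY = (3233, 2753)  # (n, d)


def document_hash(document: bytes, n: int) -> int:
    """Step 1: hash the document with a public, well-known hash function.

    SHA-256 stands in for MD5 here. Reducing mod n is a toy-only shortcut
    because n is tiny; real schemes pad the hash instead.
    """
    digest = hashlib.sha256(document).digest()
    return int.from_bytes(digest, "big") % n


def sign(document: bytes, private_key: tuple) -> int:
    """Step 2: 'encrypt' the hash with the signer's private key to get the signature."""
    n, d = private_key
    return pow(document_hash(document, n), d, n)


if __name__ == "__main__":
    contract = b"Monthly rent: $1,000"
    signature = sign(contract, SIGNER_PRIVATE_KEY)
    # What gets sent: the original document plus the signature, never the private key.
    signed_package = (contract, signature)
    print(signed_package)
```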
1152 01:02:02,790 --> 01:02:07,860 We attach the original document and the digital signature when we send it off, 1153 01:02:07,860 --> 01:02:11,220 and that is considered a digitally signed document. 1154 01:02:11,220 --> 01:02:15,810 Now the real crux is: how do you prove that I'm the person who 1155 01:02:15,810 --> 01:02:18,278 sent you this document, right? 1156 01:02:18,278 --> 01:02:20,070 If you're receiving something 1157 01:02:20,070 --> 01:02:22,028 that has a digital signature, which is supposed 1158 01:02:22,028 --> 01:02:25,560 to be as good as any other kind of signature 1159 01:02:25,560 --> 01:02:28,450 and is supposed to have legal effect, 1160 01:02:28,450 --> 01:02:32,400 how do we verify that the person who sent you the document 1161 01:02:32,400 --> 01:02:34,795 was actually the right one? 1162 01:02:34,795 --> 01:02:36,420 So then we go to the verification step. 1163 01:02:36,420 --> 01:02:41,310 So we start: we've now received this digitally signed data. 1164 01:02:41,310 --> 01:02:43,560 This is the same as this digitally signed data here 1165 01:02:43,560 --> 01:02:46,660 that was sent by the sender. 1166 01:02:46,660 --> 01:02:48,940 It contains two pieces of information. 1167 01:02:48,940 --> 01:02:53,070 We received the document, the original document, 1168 01:02:53,070 --> 01:02:55,170 and we received the signature. 1169 01:02:55,170 --> 01:02:57,270 And recall, again, that the signature is what 1170 01:02:57,270 --> 01:02:59,610 happens when we take the hash of the document 1171 01:02:59,610 --> 01:03:05,540 and run it through the sender's private key to get a result. 1172 01:03:05,540 --> 01:03:07,773 Now the interesting step here is remembering 1173 01:03:07,773 --> 01:03:10,440 that the public and private keys are reciprocals of one another. 1174 01:03:10,440 --> 01:03:14,570 So we can take this complicated signature 1175 01:03:14,570 --> 01:03:18,060 and we can use the public key, which, again, is publicly available.
1176 01:03:18,060 --> 01:03:23,240 Anybody should ostensibly have access to someone's public key, not 1177 01:03:23,240 --> 01:03:23,990 their private key. 1178 01:03:23,990 --> 01:03:27,350 And notice that the signer has never sent their private key. 1179 01:03:27,350 --> 01:03:29,360 They've only used it to encrypt some data, 1180 01:03:29,360 --> 01:03:31,100 but they never sent the private key. 1181 01:03:31,100 --> 01:03:33,620 The public key has always been available, though. 1182 01:03:33,620 --> 01:03:36,980 We take the signature, we run it through the public key function, 1183 01:03:36,980 --> 01:03:39,440 and we get a hash. 1184 01:03:39,440 --> 01:03:44,390 We take the data, the document, and we run it through MD5, 1185 01:03:44,390 --> 01:03:48,653 the same hash function that the sender was supposed to use, and we get a hash. 1186 01:03:48,653 --> 01:03:50,570 And we're checking to make sure that these two 1187 01:03:50,570 --> 01:03:53,420 hashes are equal to one another. 1188 01:03:53,420 --> 01:03:56,630 If they are equal to one another, that means the signature is valid. 1189 01:03:56,630 --> 01:04:01,330 Let's talk about why that would be the case. 1190 01:04:01,330 --> 01:04:07,210 If we take the MD5 of this file, the generic hash of this file, 1191 01:04:07,210 --> 01:04:11,100 and we encrypt it using our private key, we get some result, OK? 1192 01:04:11,100 --> 01:04:13,380 But the hash itself is very easy to calculate. 1193 01:04:13,380 --> 01:04:14,140 It's MD5. 1194 01:04:14,140 --> 01:04:18,280 We're taking a basic document, we're running it through a publicly known, 1195 01:04:18,280 --> 01:04:19,912 well-defined hash function. 1196 01:04:19,912 --> 01:04:22,870 Anybody who has access to this document and a program on their computer 1197 01:04:22,870 --> 01:04:26,680 called MD5 can literally run this document through it 1198 01:04:26,680 --> 01:04:27,890 and get this number. 1199 01:04:27,890 --> 01:04:30,430 This is not the tricky part.
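As a quick illustration of how public and repeatable this hashing step is, here is a minimal Python sketch using the standard library's `hashlib`. The document text and the helper name `md5_hex` are made up for illustration. MD5 appears only because it is the function used in this walkthrough; it is no longer considered collision-resistant, so modern signature schemes use hashes such as SHA-256 instead.

```python
import hashlib


def md5_hex(document: bytes) -> str:
    """Return the MD5 digest of a document as 32 hex characters (128 bits)."""
    return hashlib.md5(document).hexdigest()


# A hypothetical document; anyone holding the same bytes gets the same digest.
document = b"I will pay you $100 if you paint my house on Tuesday."

sender_side = md5_hex(document)
receiver_side = md5_hex(document)

print(sender_side)
print(sender_side == receiver_side)  # True: the hash is deterministic
print(md5_hex(b""))                  # d41d8cd98f00b204e9800998ecf8427e (MD5 of empty input)
```

Nothing secret is involved at this stage: the sender and the receiver compute the same digest independently, which is exactly why the hash alone proves nothing about who sent the document.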
1200 01:04:30,430 --> 01:04:36,160 We then take this hash, we encrypt it using our private key 1201 01:04:36,160 --> 01:04:38,500 to get some secret number. 1202 01:04:38,500 --> 01:04:40,590 The public key, though, will undo that. 1203 01:04:40,590 --> 01:04:43,990 Remember, the public and private keys are reciprocals of one another. 1204 01:04:43,990 --> 01:04:48,160 Whatever one does, the other one can undo. 1205 01:04:48,160 --> 01:04:54,650 And so only my public key will undo the work of my private key. 1206 01:04:54,650 --> 01:04:57,550 So if I take this value and I encrypt it using my private key, 1207 01:04:57,550 --> 01:05:00,430 and then I run the result through the public key, 1208 01:05:00,430 --> 01:05:04,630 I should get the original result again, the original MD5 hash. 1209 01:05:04,630 --> 01:05:08,080 And that's why we have to send the document as well, not 1210 01:05:08,080 --> 01:05:10,728 just the digital signature, the number that we 1211 01:05:10,728 --> 01:05:13,270 get by running the hash through our private key in the first place. 1212 01:05:13,270 --> 01:05:18,610 That way we have a way to validate that yes, this file has this checksum, 1213 01:05:18,610 --> 01:05:23,710 and the sender took that checksum, they ran it through their own private key, 1214 01:05:23,710 --> 01:05:26,770 and when I use their public key to undo it, 1215 01:05:26,770 --> 01:05:31,810 I get the same value. That isn't strictly a proof, 1216 01:05:31,810 --> 01:05:34,750 but it makes it very, very, very, very 1217 01:05:34,750 --> 01:05:39,640 likely that the person who claimed to have sent the document 1218 01:05:39,640 --> 01:05:42,100 is, in fact, the person who sent that document. 1219 01:05:42,100 --> 01:05:44,350 And so that's what digital signatures can be used for. 1220 01:05:44,350 --> 01:05:48,520 It is a mathematical, cryptographic way to verify 1221 01:05:48,520 --> 01:05:52,720 the identity of the sender of a document or an individual.
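The sign-then-verify round trip described above can be sketched with textbook RSA, where the private and public keys really are mathematical inverses of one another. This is a toy under stated assumptions: tiny textbook primes, no padding scheme, and SHA-256 swapped in for the MD5 used in this walkthrough. The names `digest`, `sign`, and `verify` are hypothetical helpers, not a library API.

```python
import hashlib

# Toy RSA key pair built from tiny textbook primes. Real keys use moduli of
# 2048 bits or more plus a padding scheme; these numbers are for illustration only.
P, Q = 61, 53
N = P * Q                          # public modulus: 3233
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent: 2753 (needs Python 3.8+)


def digest(document: bytes) -> int:
    """Hash the document, then reduce it mod N so the toy key can process it."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Sender: run the document's hash through the private key."""
    return pow(digest(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Receiver: undo the signature with the public key and compare it
    with a freshly computed hash of the received document."""
    return pow(signature, E, N) == digest(document)


document = b"Hypothetical contract text"
signature = sign(document)
print(verify(document, signature))            # True
print(verify(document, (signature + 1) % N))  # False: an altered signature fails
```

Because N is so small here, two different documents can easily reduce to the same value mod N; that is one reason real systems use very large keys together with a standardized padding scheme rather than raw RSA as shown.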
1222 01:05:52,720 --> 01:05:56,260 Or in whatever context you might be using or receiving digital signatures, 1223 01:05:56,260 --> 01:06:02,650 it is purely a verification step that is based entirely in mathematics. 1224 01:06:02,650 --> 01:06:05,590 There's one other potentially interesting use 1225 01:06:05,590 --> 01:06:09,040 of digital signatures that's also quite buzzy right now, 1226 01:06:09,040 --> 01:06:11,300 and that's blockchain technology. 1227 01:06:11,300 --> 01:06:13,450 And what is the blockchain? 1228 01:06:13,450 --> 01:06:18,430 Digital signatures are really key to knowing how the blockchain works 1229 01:06:18,430 --> 01:06:24,740 and why it is trusted as a decentralized source of information for individuals. 1230 01:06:24,740 --> 01:06:27,160 So understanding digital signatures means 1231 01:06:27,160 --> 01:06:30,880 you are in a position to understand blockchain. 1232 01:06:30,880 --> 01:06:34,610 And I use the term the blockchain here, but it really is a blockchain. 1233 01:06:34,610 --> 01:06:37,840 There's no such thing as the one blockchain. 1234 01:06:37,840 --> 01:06:40,930 There are many different ones; blockchain is just an idea that gets implemented. 1235 01:06:40,930 --> 01:06:43,940 Generally, we're hearing it in the context of a cryptocurrency, 1236 01:06:43,940 --> 01:06:49,150 but it does not need to be restricted to that, although cryptocurrencies are so widely 1237 01:06:49,150 --> 01:06:52,870 discussed in the media and have been dissected by so many researchers 1238 01:06:52,870 --> 01:06:55,450 that they provide an interesting vehicle, an interesting lens 1239 01:06:55,450 --> 01:06:57,760 through which to consider blockchain. 1240 01:06:57,760 --> 01:07:01,280 And so our example today is going to focus on Bitcoin. 1241 01:07:01,280 --> 01:07:04,000 It is the most well-documented of the cryptocurrencies.
1242 01:07:04,000 --> 01:07:07,560 It is the most well-documented implementation of the blockchain, 1243 01:07:07,560 --> 01:07:10,420 or among the most well-documented implementations. 1244 01:07:10,420 --> 01:07:13,240 But this is not specifically a lecture about Bitcoin. 1245 01:07:13,240 --> 01:07:19,240 We're just using Bitcoin as a lens through which to understand blockchain. 1246 01:07:19,240 --> 01:07:22,480 There's also an outside source that I strongly encourage you to look at. 1247 01:07:22,480 --> 01:07:27,850 This channel on YouTube provides interesting mathematical dissections 1248 01:07:27,850 --> 01:07:32,950 of topics, and they tackle blockchain and Bitcoin pretty extensively. 1249 01:07:32,950 --> 01:07:34,960 And this is an excellent supplementary resource 1250 01:07:34,960 --> 01:07:36,580 to consider if you're trying to dig into this 1251 01:07:36,580 --> 01:07:38,960 or understand it a little bit more, because in this video 1252 01:07:38,960 --> 01:07:42,250 I'm going to omit some of the more technical details for the sake 1253 01:07:42,250 --> 01:07:44,857 of, hopefully, broader understanding. 1254 01:07:44,857 --> 01:07:46,690 But if you want to dive into it more deeply, 1255 01:07:46,690 --> 01:07:49,720 this is a resource that I would recommend. 1256 01:07:49,720 --> 01:07:53,170 And I really like talking about Bitcoin in the context of blockchain 1257 01:07:53,170 --> 01:07:57,550 because it's actually sort of how I got started as an attorney. 1258 01:07:57,550 --> 01:08:01,690 When I graduated from law school, 1259 01:08:01,690 --> 01:08:06,040 I decided to go out on my own and start my own firm. 1260 01:08:06,040 --> 01:08:08,650 I live in a small town and so a lot of my early work 1261 01:08:08,650 --> 01:08:12,760 was doing estate plans, wills and such for individuals in my town, 1262 01:08:12,760 --> 01:08:13,810 getting to know them.
1263 01:08:13,810 --> 01:08:17,740 But I had studied technology-related law extensively in law school 1264 01:08:17,740 --> 01:08:21,100 and I really wanted to use it. 1265 01:08:21,100 --> 01:08:23,979 And a few years into my practice, I had a friend 1266 01:08:23,979 --> 01:08:28,960 who needed an estate plan prepared, and he asked if he could pay me in Bitcoin. 1267 01:08:28,960 --> 01:08:31,689 And I had no idea what that meant. 1268 01:08:31,689 --> 01:08:34,380 I didn't really know anything about Bitcoin at the time. 1269 01:08:34,380 --> 01:08:37,670 And I looked it up and thought it sounded interesting, 1270 01:08:37,670 --> 01:08:38,687 and so I said sure. 1271 01:08:38,687 --> 01:08:40,270 So I learned how to set up an account. 1272 01:08:40,270 --> 01:08:42,800 And it's also worth mentioning at the outset, 1273 01:08:42,800 --> 01:08:45,040 as we're talking about cryptocurrency, that you 1274 01:08:45,040 --> 01:08:47,979 don't need to understand how Bitcoin works to use Bitcoin. 1275 01:08:47,979 --> 01:08:50,470 You don't need to understand how the federal banking 1276 01:08:50,470 --> 01:08:54,040 system works to use a bank. 1277 01:08:54,040 --> 01:08:55,689 And the same is true here with Bitcoin. 1278 01:08:55,689 --> 01:09:00,859 But I ended up accepting a Bitcoin payment 1279 01:09:00,859 --> 01:09:03,560 by creating what's called a Bitcoin wallet. 1280 01:09:03,560 --> 01:09:07,850 I immediately sold the Bitcoin that I received and turned it into cash, so 1281 01:09:07,850 --> 01:09:11,968 that I could use it for more generic purposes. 1282 01:09:11,968 --> 01:09:14,510 And what I decided to do was send out a press release saying, 1283 01:09:14,510 --> 01:09:17,210 oh, I accept Bitcoin, because it was something that was novel 1284 01:09:17,210 --> 01:09:19,085 and I hadn't really heard that much about it.
1285 01:09:19,085 --> 01:09:23,029 And this got the attention of my local paper and companies 1286 01:09:23,029 --> 01:09:25,270 in the area that were technically minded as well. 1287 01:09:25,270 --> 01:09:28,819 And so Bitcoin sort of provided a forum 1288 01:09:28,819 --> 01:09:32,210 to meet new clients, which also allowed me to explore fields 1289 01:09:32,210 --> 01:09:34,850 of the law about which I am passionate. 1290 01:09:34,850 --> 01:09:39,800 So it's kind of an interesting segue to be able to share that with you now. 1291 01:09:39,800 --> 01:09:43,609 All right, so stepping back from Bitcoin again, to blockchain more broadly. 1292 01:09:43,609 --> 01:09:44,680 What is the blockchain? 1293 01:09:44,680 --> 01:09:47,180 It's very similar to something you've already learned about, 1294 01:09:47,180 --> 01:09:49,109 which is a linked list. 1295 01:09:49,109 --> 01:09:52,220 So recall that a linked list is a set of nodes, each of which 1296 01:09:52,220 --> 01:09:57,200 has connections forward and backward to other nodes in the chain. 1297 01:09:57,200 --> 01:09:58,530 They are linked together. 1298 01:09:58,530 --> 01:10:02,770 And similarly, with a blockchain, all of the blocks are chained together. 1299 01:10:02,770 --> 01:10:06,740 It's basically the same terminology slightly modified. 1300 01:10:06,740 --> 01:10:10,850 So a linked list is a set of nodes, each of which is connected to the one prior 1301 01:10:10,850 --> 01:10:12,652 and the one after it. 1302 01:10:12,652 --> 01:10:15,860 We learned about linked lists as having generally three pieces of information 1303 01:10:15,860 --> 01:10:18,950 associated with them-- a previous pointer, which is basically 1304 01:10:18,950 --> 01:10:23,660 a reference to the prior node, or in this case, the prior block; 1305 01:10:23,660 --> 01:10:26,990 we have the next pointer, which is a reference to the next node 1306 01:10:26,990 --> 01:10:31,010 or the next block; and we have data.
1307 01:10:31,010 --> 01:10:33,570 And in this case, the data is actually two different things. 1308 01:10:33,570 --> 01:10:35,330 There's the real data. 1309 01:10:35,330 --> 01:10:38,180 And again, in the context of a cryptocurrency blockchain 1310 01:10:38,180 --> 01:10:40,730 we're going to be talking about a list of transactions, 1311 01:10:40,730 --> 01:10:44,330 a numbered list of transactions from person A to person B, 1312 01:10:44,330 --> 01:10:47,240 each of those transactions being digitally signed such 1313 01:10:47,240 --> 01:10:50,900 that you can verify that the person who logs that transaction 1314 01:10:50,900 --> 01:10:53,450 is actually the one who made that transaction. 1315 01:10:53,450 --> 01:10:55,780 And there's also something called a proof of work. 1316 01:10:55,780 --> 01:10:57,530 And this proof of work is very interesting 1317 01:10:57,530 --> 01:11:01,510 because this is how Bitcoin ostensibly derives its authority. 1318 01:11:01,510 --> 01:11:07,130 There is no central controller of the Bitcoin currency, 1319 01:11:07,130 --> 01:11:09,530 and it is very decentralized. 1320 01:11:09,530 --> 01:11:11,420 And there needs to be some way for people 1321 01:11:11,420 --> 01:11:18,140 to agree as to what the true ledger is, or what the true set of transactions 1322 01:11:18,140 --> 01:11:19,610 that have happened is. 1323 01:11:19,610 --> 01:11:24,330 And the way that is done is by relying on something called the proof of work. 1324 01:11:24,330 --> 01:11:27,280 And we'll dive into that shortly as well. 1325 01:11:27,280 --> 01:11:31,770 So again, for cryptocurrencies, that data is a ledger of transactions, each of which 1326 01:11:31,770 --> 01:11:33,840 is digitally signed by the person who 1327 01:11:33,840 --> 01:11:37,050 made or initiated that transaction, using the digital signature 1328 01:11:37,050 --> 01:11:40,100 technique we've just discussed.
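A block holding a list of transactions, a proof-of-work number, and a link to the block before it might be modeled like this. This is a simplified sketch: the field names and the JSON serialization are assumptions made for readability, and the transactions are plain strings rather than digitally signed records. One difference from the doubly linked list analogy is worth noting. In Bitcoin, a block stores only a reference to the block before it (that block's hash), not a pointer to the next one, and real Bitcoin hashes a compact binary block header with double SHA-256 rather than JSON.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    prev_hash: str                                         # link back to the prior block
    transactions: List[str] = field(default_factory=list)  # the ledger data
    nonce: int = 0                                         # the proof of work, found by mining

    def block_hash(self) -> str:
        """Hash everything in the block, including the link to its predecessor."""
        payload = json.dumps([self.prev_hash, self.transactions, self.nonce])
        return hashlib.sha256(payload.encode()).hexdigest()


GENESIS_PREV = "0" * 64  # placeholder: the first block has no predecessor

genesis = Block(GENESIS_PREV, ["genesis"])
block1 = Block(genesis.block_hash(), ["Doug pays you $10"])
block2 = Block(block1.block_hash(), ["you pay Doug $3"])  # hypothetical transaction

for b in (genesis, block1, block2):
    print(b.block_hash()[:16], b.transactions)
```

Because each block's link is a hash of the previous block's full contents, the "pointer" doubles as a fingerprint: if an earlier block changes, the stored link in the block after it no longer matches.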
1329 01:11:40,100 --> 01:11:43,820 And that ledger is decentralized, which means that any time there's 1330 01:11:43,820 --> 01:11:47,210 a change, any time any transaction is recorded (in this case, 1331 01:11:47,210 --> 01:11:50,870 using Bitcoin, again, our lens through which to consider blockchain), 1332 01:11:50,870 --> 01:11:53,630 that message is broadcast out. 1333 01:11:53,630 --> 01:11:58,850 So if I make a transaction in Bitcoin, say I pay you $10, 1334 01:11:58,850 --> 01:12:02,720 I'm going to announce to everyone else who has a Bitcoin wallet 1335 01:12:02,720 --> 01:12:07,670 or who is monitoring the blockchain, the list of transactions: hey, 1336 01:12:07,670 --> 01:12:14,030 please add the following transaction to this list: Doug pays you $10. 1337 01:12:14,030 --> 01:12:18,800 And that is announced to everybody, everybody records it in their ledger, 1338 01:12:18,800 --> 01:12:22,250 and then some stuff is going to start happening. 1339 01:12:22,250 --> 01:12:25,270 But here is a potential issue. 1340 01:12:25,270 --> 01:12:28,330 How do you know that the blockchain is legitimate? 1341 01:12:28,330 --> 01:12:34,708 How do you know that your copy of what is being said is the truth? 1342 01:12:34,708 --> 01:12:37,750 How do you know that your copy of the blockchain is accurate with respect 1343 01:12:37,750 --> 01:12:40,300 to all other transactions that have happened? 1344 01:12:40,300 --> 01:12:42,200 Everybody else has their own copy as well. 1345 01:12:42,200 --> 01:12:43,180 It's decentralized. 1346 01:12:43,180 --> 01:12:46,930 Anybody who's using Bitcoin maintains 1347 01:12:46,930 --> 01:12:51,510 their own copy of the blockchain. 1348 01:12:51,510 --> 01:12:54,630 How do you defend against people modifying it? 1349 01:12:54,630 --> 01:12:57,440 1350 01:12:57,440 --> 01:12:59,580 That's a very interesting question.
1351 01:12:59,580 --> 01:13:02,820 The way that cryptocurrencies do it-- 1352 01:13:02,820 --> 01:13:04,610 and this is defined in the Bitcoin paper-- 1353 01:13:04,610 --> 01:13:07,260 is to assume 1354 01:13:07,260 --> 01:13:11,910 that the chain that has the most computational work put into it 1355 01:13:11,910 --> 01:13:13,470 is the true chain. 1356 01:13:13,470 --> 01:13:16,080 This decision is completely arbitrary. 1357 01:13:16,080 --> 01:13:19,290 There's no inherent reason why one chain must be favored over another. 1358 01:13:19,290 --> 01:13:23,910 But users of Bitcoin collectively had to agree on something 1359 01:13:23,910 --> 01:13:30,210 to say, in the event of a dispute over which person's chain is 1360 01:13:30,210 --> 01:13:34,380 the accurate, de facto definitive list of transactions, 1361 01:13:34,380 --> 01:13:38,370 we're going to go with the one that has been verified the most times. 1362 01:13:38,370 --> 01:13:41,970 And again, this word verified is sort of a sketchy word. 1363 01:13:41,970 --> 01:13:44,400 There's nothing inherently about proof of work 1364 01:13:44,400 --> 01:13:49,590 or anything else that proves that a transaction has taken place in the way 1365 01:13:49,590 --> 01:13:52,230 that we normally think of this term verified. 1366 01:13:52,230 --> 01:13:57,510 Rather, it is the collective standard to which we all agree to adhere: 1367 01:13:57,510 --> 01:13:58,730 that the 1368 01:13:58,730 --> 01:14:04,860 blockchain that has the most proof of work in it is the list. 1369 01:14:04,860 --> 01:14:08,820 That is just something we must subscribe to as users 1370 01:14:08,820 --> 01:14:10,650 and consumers of blockchain.
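The "most computational work" rule can be written down directly once each block's difficulty is known. A minimal sketch, assuming difficulty is expressed as the number of leading zero bits a block's hash had to have (a simplification; Bitcoin actually compares hashes against a numeric target) and using hypothetical helper names. Each copy of the chain is represented only by its list of per-block difficulties.

```python
from typing import List


def chain_work(difficulties: List[int]) -> int:
    """Approximate total work: a d-bit block takes about 2**d hash attempts on average."""
    return sum(2 ** d for d in difficulties)


def pick_chain(chains: List[List[int]]) -> List[int]:
    """Apply the agreed rule: the copy with the most accumulated work wins."""
    return max(chains, key=chain_work)


copies = [
    [30, 30, 30],  # three blocks:     3 * 2**30 attempts of work
    [30, 30],      # two blocks:       2 * 2**30
    [31, 31],      # two harder ones:  4 * 2**30
]
print(pick_chain(copies))  # [31, 31]
```

Note that the rule counts work rather than blocks, so in this example a shorter chain made of harder blocks wins over a longer one.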
1371 01:14:10,650 --> 01:14:14,760 Now how do we determine which copy of the blockchain 1372 01:14:14,760 --> 01:14:18,720 has had the most computational work put 1373 01:14:18,720 --> 01:14:20,190 into it? 1374 01:14:20,190 --> 01:14:23,100 Well, this is proof of work. 1375 01:14:23,100 --> 01:14:29,670 So proof of work is how the correct blockchain, out of all the copies 1376 01:14:29,670 --> 01:14:33,840 that are decentralized, is determined. 1377 01:14:33,840 --> 01:14:35,130 So recall how hashing works. 1378 01:14:35,130 --> 01:14:42,020 Hashing allows us to take any arbitrary data and run it through a hash function 1379 01:14:42,020 --> 01:14:44,480 and get an outcome. 1380 01:14:44,480 --> 01:14:49,220 And that outcome is going to be, let's say, 256 bits, each of those bits being, 1381 01:14:49,220 --> 01:14:52,930 of course, 0 or 1. 1382 01:14:52,930 --> 01:14:56,650 Now there are a lot of different combinations there. 1383 01:14:56,650 --> 01:15:00,460 But some of them will be very unusual. 1384 01:15:00,460 --> 01:15:05,110 And here is the way Bitcoin's blockchain works 1385 01:15:05,110 --> 01:15:08,450 to prove a particular block. 1386 01:15:08,450 --> 01:15:12,840 We are asking people who are oftentimes called miners-- 1387 01:15:12,840 --> 01:15:15,910 that's where this term comes from, because they are mining. 1388 01:15:15,910 --> 01:15:19,180 Ultimately, the reward for doing this proof of work is to receive Bitcoin 1389 01:15:19,180 --> 01:15:20,972 that are sort of generated out of thin air. 1390 01:15:20,972 --> 01:15:22,840 And so these people are termed miners. 1391 01:15:22,840 --> 01:15:28,300 But we are asking anyone who has a computer to hash the entire block. 1392 01:15:28,300 --> 01:15:31,000 So hash the entire list of transactions and the reference 1393 01:15:31,000 --> 01:15:32,720 to the previous block.
1394 01:15:32,720 --> 01:15:35,845 And remember, all of that is contained in a single node of this blockchain, 1395 01:15:35,845 --> 01:15:36,880 basically. 1396 01:15:36,880 --> 01:15:40,570 And we're looking for a highly unusual pattern. 1397 01:15:40,570 --> 01:15:44,500 We're looking for maybe the first 30 bits or the first 40 bits 1398 01:15:44,500 --> 01:15:47,540 to all be zeros. 1399 01:15:47,540 --> 01:15:49,230 That's really weird. 1400 01:15:49,230 --> 01:15:51,230 Like, that's a really difficult pattern to find. 1401 01:15:51,230 --> 01:15:53,060 And the only way to do it is to guess. 1402 01:15:53,060 --> 01:15:56,240 So you take this entire block, you attach a single number 1403 01:15:56,240 --> 01:15:58,880 to the end of it, like 1, then 2, then 3. 1404 01:15:58,880 --> 01:16:01,610 You can just count upward that way, guessing. 1405 01:16:01,610 --> 01:16:05,100 Each time, you hash that entire thing together: 1406 01:16:05,100 --> 01:16:09,490 do you eventually find a number that, when hashed with the block in this way, 1407 01:16:09,490 --> 01:16:13,170 produces this very, very unusual pattern? 1408 01:16:13,170 --> 01:16:16,680 If so, you just say, here's the number that I attached. 1409 01:16:16,680 --> 01:16:21,930 So let's say I took the entire block and I hashed it with 1410 01:16:21,930 --> 01:16:24,450 the number 12345, right? 1411 01:16:24,450 --> 01:16:29,130 It's very difficult to find a value that would 1412 01:16:29,130 --> 01:16:31,230 create this unusual pattern of zeros and ones, 1413 01:16:31,230 --> 01:16:33,900 in particular, 30 zeros in a row. 1414 01:16:33,900 --> 01:16:38,850 But it's really, really easy to verify that someone has done it.
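The guess-and-check search, and the single-hash check that anyone can run afterward, can be sketched as follows. Assumptions: the guess is appended to the block as a decimal string, SHA-256 is the hash, the helper names are hypothetical, and the difficulty is 16 leading zero bits rather than 30 so the demo finishes quickly in pure Python. Bitcoin's real scheme differs in detail: it double-hashes an 80-byte header containing a 32-bit nonce field and compares the result against a numeric target.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many of the digest's leading bits are 0."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def block_hash(block_data: bytes, nonce: int) -> bytes:
    """Hash the whole block with a guessed number appended to the end."""
    return hashlib.sha256(block_data + str(nonce).encode()).digest()


def mine(block_data: bytes, difficulty: int) -> int:
    """Slow part: guess 0, 1, 2, ... until the hash starts with enough zero bits.
    count() never runs out, so this loop only exits by returning."""
    for nonce in count():
        if leading_zero_bits(block_hash(block_data, nonce)) >= difficulty:
            return nonce


def check_proof(block_data: bytes, nonce: int, difficulty: int) -> bool:
    """Fast part: anyone can verify an announced number with a single hash."""
    return leading_zero_bits(block_hash(block_data, nonce)) >= difficulty


block = b"prev:0000;Doug pays you $10"  # hypothetical block contents
proof = mine(block, difficulty=16)       # about 65,536 guesses on average
print(proof, check_proof(block, proof, difficulty=16))
```

The asymmetry described above shows up directly: `mine` may loop tens of thousands of times, while `check_proof` always costs exactly one hash.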
1415 01:16:38,850 --> 01:16:43,140 To verify that someone has done it, all you need is for them to announce 1416 01:16:43,140 --> 01:16:46,877 the number that they used, 12345, as their proof of work-- 1417 01:16:46,877 --> 01:16:48,710 and that's what the proof of work really is, 1418 01:16:48,710 --> 01:16:51,740 it's the number that they used to find the pattern. 1419 01:16:51,740 --> 01:16:54,890 Once they announce that, you hash the block with that number, 1420 01:16:54,890 --> 01:16:59,270 and you can verify that, yes, the pattern really is 30 zeros in a row. 1421 01:16:59,270 --> 01:17:01,610 So they have proven it. 1422 01:17:01,610 --> 01:17:04,130 Now this is, again, kind of arbitrary. 1423 01:17:04,130 --> 01:17:05,090 Like, this seems weird. 1424 01:17:05,090 --> 01:17:08,210 Why are you spending all your time trying 1425 01:17:08,210 --> 01:17:12,530 to figure out a specific pattern that exists somewhere? 1426 01:17:12,530 --> 01:17:16,310 That is a question that I cannot answer other than to say that it is 1427 01:17:16,310 --> 01:17:22,775 the standard that people who have subscribed to Bitcoin have 1428 01:17:22,775 --> 01:17:23,900 just agreed to be bound by. 1429 01:17:23,900 --> 01:17:27,540 The person who finds this number 1430 01:17:27,540 --> 01:17:31,577 is proving the validity of all the transactions above it. 1431 01:17:31,577 --> 01:17:34,160 And this gets interesting when you think about somebody trying 1432 01:17:34,160 --> 01:17:37,160 to perpetrate a fraudulent transaction. 1433 01:17:37,160 --> 01:17:41,240 So imagine I'm trying to perpetrate a fraudulent transaction by initiating 1434 01:17:41,240 --> 01:17:43,850 a transaction that says, I'm going to pay you $100.
1435 01:17:43,850 --> 01:17:47,720 And I announce that to you, but I don't broadcast it 1436 01:17:47,720 --> 01:17:50,540 to everybody else who is maintaining 1437 01:17:50,540 --> 01:17:52,570 their own copy of the blockchain. 1438 01:17:52,570 --> 01:17:55,370 This is interesting because you think that I have spent $100, 1439 01:17:55,370 --> 01:17:58,370 and as far as you're concerned I have paid you $100, 1440 01:17:58,370 --> 01:18:00,440 but no one else is aware of that. 1441 01:18:00,440 --> 01:18:04,940 So no one else thinks that I have spent $100. 1442 01:18:04,940 --> 01:18:09,680 They all think I am $100 wealthier than I actually am. 1443 01:18:09,680 --> 01:18:13,280 The problem is that I then need to verify that block. 1444 01:18:13,280 --> 01:18:15,230 I need to verify that transaction. 1445 01:18:15,230 --> 01:18:18,260 So I append the transaction to my own copy of the blockchain. 1446 01:18:18,260 --> 01:18:20,907 The two of us are the only people 1447 01:18:20,907 --> 01:18:23,240 who have this version of the blockchain; 1448 01:18:23,240 --> 01:18:25,910 I didn't broadcast this transaction to everybody else, 1449 01:18:25,910 --> 01:18:28,220 so no one else knows about it. 1450 01:18:28,220 --> 01:18:31,070 In order for it to have a proof of work attached to it, 1451 01:18:31,070 --> 01:18:35,360 in order for it to be considered the valid chain, 1452 01:18:35,360 --> 01:18:39,110 I would need to prove that block. 1453 01:18:39,110 --> 01:18:44,300 I would need to find that secret number that, when hashed with the entire block, 1454 01:18:44,300 --> 01:18:50,040 produces a pattern of 30 consecutive zero bits before anybody else does. 1455 01:18:50,040 --> 01:18:55,270 So each guess has a 1 in 2 to the 30th power chance 1456 01:18:55,270 --> 01:18:59,650 because I'm looking for a pattern of 30 consecutive zeros.
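Treating each hash output as if its bits were independent fair coin flips (the usual idealization for a cryptographic hash), the odds for a single guess and the expected number of guesses work out as:

```latex
\Pr[\text{first 30 bits are all } 0] = \left(\tfrac{1}{2}\right)^{30} = \frac{1}{2^{30}},
\qquad
\mathbb{E}[\text{guesses until success}] = \frac{1}{2^{-30}} = 2^{30} = 1{,}073{,}741{,}824
```

The second figure is the mean of a geometric distribution, 1/p. Since honest participants are making these guesses in parallel, a lone attacker has to match all of that combined guessing, block after block, to keep a fraudulent chain in the lead.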
1457 01:18:59,650 --> 01:19:02,530 There's a 1 in 2 to the 30th power chance 1458 01:19:02,530 --> 01:19:04,312 that I'm going to find that pattern. 1459 01:19:04,312 --> 01:19:06,520 And I have to find that pattern before somebody else. 1460 01:19:06,520 --> 01:19:11,225 And in the meantime, other transactions are coming in. 1461 01:19:11,225 --> 01:19:13,600 Other people are broadcasting their transactions. 1462 01:19:13,600 --> 01:19:16,000 And I have to keep adding them to my ledger 1463 01:19:16,000 --> 01:19:19,840 and keep proving that work over and over and over, 1464 01:19:19,840 --> 01:19:23,920 all the while trying to stay ahead so that my fraudulent chain is 1465 01:19:23,920 --> 01:19:27,400 ultimately considered the correct blockchain. 1466 01:19:27,400 --> 01:19:30,730 Now you just can't beat the odds of that. 1467 01:19:30,730 --> 01:19:35,260 One malicious person trying to perpetrate a fraudulent transaction 1468 01:19:35,260 --> 01:19:38,140 using the blockchain, without controlling most of the network's computing power, cannot stay ahead. 1469 01:19:38,140 --> 01:19:42,130 They can't win the find-the-secret-number 1470 01:19:42,130 --> 01:19:45,010 game over and over and over and over. 1471 01:19:45,010 --> 01:19:48,880 Eventually, some other chain, which contains valid transactions, 1472 01:19:48,880 --> 01:19:54,190 will win out over my attempted fraudulent chain. 1473 01:19:54,190 --> 01:19:55,570 And mine will be disregarded. 1474 01:19:55,570 --> 01:20:00,568 Nobody will consider that to be a valid part of the chain anymore. 1475 01:20:00,568 --> 01:20:02,110 And so that's kind of how this works. 1476 01:20:02,110 --> 01:20:07,360 Again, the way they decide to resolve or verify is arbitrary. 1477 01:20:07,360 --> 01:20:09,790 There's nothing about this process that proves 1478 01:20:09,790 --> 01:20:12,670 that person A sent person B money.
1479 01:20:12,670 --> 01:20:15,160 It's just the consensus we have reached: well, 1480 01:20:15,160 --> 01:20:19,000 if people have gone through the effort to try and find these secret numbers, 1481 01:20:19,000 --> 01:20:23,830 and many different people are doing it, and this one chain is longer than 1482 01:20:23,830 --> 01:20:27,400 the others because it's been verified-- again, using this term verified-- 1483 01:20:27,400 --> 01:20:31,120 it's been proven with work over and over and over, we're 1484 01:20:31,120 --> 01:20:33,380 just going to agree that that's the right one. 1485 01:20:33,380 --> 01:20:34,987 So again, it's kind of strange. 1486 01:20:34,987 --> 01:20:37,570 And I do, again, refer you to that video that I shared earlier 1487 01:20:37,570 --> 01:20:39,862 to get into some of the more technical details of this, 1488 01:20:39,862 --> 01:20:42,430 which I'm glossing over a little bit here in this discussion. 1489 01:20:42,430 --> 01:20:46,060 But proof of work is basically the collective consensus 1490 01:20:46,060 --> 01:20:48,310 of blockchain users, or in this case specifically, 1491 01:20:48,310 --> 01:20:54,232 of Bitcoin users, on which transactions they are going to consider valid.
1492 01:20:54,232 --> 01:20:56,940 Now consider going back in time, as opposed 1493 01:20:56,940 --> 01:21:00,450 to looking forward and trying to add a new fraudulent transaction. 1494 01:21:00,450 --> 01:21:03,900 Suppose I try to modify a transaction from the past: 1495 01:21:03,900 --> 01:21:08,910 say there was a transaction that said you pay me $10, 1496 01:21:08,910 --> 01:21:12,870 and I maintain a copy of the blockchain, so I can go back in time 1497 01:21:12,870 --> 01:21:21,150 and technically modify that file, changing it to you pay me $100. Well, 1498 01:21:21,150 --> 01:21:23,430 because I've changed even the tiniest thing 1499 01:21:23,430 --> 01:21:26,340 and I'm hashing that block, when I hash it 1500 01:21:26,340 --> 01:21:29,700 with that secret number, I'm no longer getting that special pattern of 30 1501 01:21:29,700 --> 01:21:32,010 zeros in a row. 1502 01:21:32,010 --> 01:21:35,650 And so that kind of calls that transaction into question. 1503 01:21:35,650 --> 01:21:38,280 And because each block contains a reference 1504 01:21:38,280 --> 01:21:40,560 to the previous block, in the form of that block's hash, 1505 01:21:40,560 --> 01:21:46,770 it also invalidates every block that comes after it in that blockchain. 1506 01:21:46,770 --> 01:21:50,070 And so because of this weird technique we're 1507 01:21:50,070 --> 01:21:54,840 using of hashing blocks, hashing data, and looking for specific patterns, 1508 01:21:54,840 --> 01:21:58,680 and because, with any cryptographic hash function, the tiniest 1509 01:21:58,680 --> 01:22:03,270 change to the input creates a totally different output, 1510 01:22:03,270 --> 01:22:06,390 we actually are pretty well defended against people 1511 01:22:06,390 --> 01:22:10,575 who try to go back in time and make fraudulent transactions using 1512 01:22:10,575 --> 01:22:11,200 the blockchain.
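The knock-on effect of editing an old block can be seen in a small sketch. It is simplified for illustration: each block is a hypothetical `(prev_hash, data, stored_hash)` tuple, the helper names are made up, and the proof-of-work number is left out. In a real chain, an attacker who re-hashes a tampered block would also have to redo the proof of work for it and for every block after it, which is what makes rewriting history so expensive.

```python
import hashlib
from typing import List, Tuple

Block = Tuple[str, str, str]  # (prev_hash, data, stored_hash)
GENESIS_PREV = "0" * 64


def block_hash(prev_hash: str, data: str) -> str:
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()


def build_chain(records: List[str]) -> List[Block]:
    """Link each record to the hash of the block before it."""
    chain: List[Block] = []
    prev = GENESIS_PREV
    for data in records:
        h = block_hash(prev, data)
        chain.append((prev, data, h))
        prev = h
    return chain


def first_bad_block(chain: List[Block]) -> int:
    """Return the index of the first block that no longer checks out, or -1 if none."""
    prev = GENESIS_PREV
    for i, (stored_prev, data, stored_hash) in enumerate(chain):
        if stored_prev != prev or block_hash(stored_prev, data) != stored_hash:
            return i
        prev = stored_hash
    return -1


chain = build_chain(["you pay me $10", "Doug pays you $10", "you pay Doug $3"])
print(first_bad_block(chain))  # -1: the untouched chain is consistent

# Rewrite history without fixing the hash: block 0 fails immediately.
tampered = list(chain)
tampered[0] = (chain[0][0], "you pay me $100", chain[0][2])
print(first_bad_block(tampered))  # 0

# Recompute block 0's hash to hide the edit: now block 1's back-link breaks.
tampered[0] = (chain[0][0], "you pay me $100", block_hash(chain[0][0], "you pay me $100"))
print(first_bad_block(tampered))  # 1
```

Hiding the edit in one block just moves the failure to the next one, so the only way to make the altered chain consistent again is to rebuild everything after the change.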
1513 01:22:11,200 --> 01:22:16,440 So it's mathematical and it's quirky, but it does provide a clever way 1514 01:22:16,440 --> 01:22:19,260 to defend against that kind of thing, considering 1515 01:22:19,260 --> 01:22:21,097 we don't have a central authority to rely on 1516 01:22:21,097 --> 01:22:22,680 to adjudicate these kinds of disputes. 1517 01:22:22,680 --> 01:22:26,400 Collectively, we are not trusting one another; 1518 01:22:26,400 --> 01:22:30,270 we are agreeing to trust the mathematics of the blockchain in order 1519 01:22:30,270 --> 01:22:33,860 for it to succeed. 1520 01:22:33,860 --> 01:22:36,615 So as I mentioned, we can very easily verify the correctness 1521 01:22:36,615 --> 01:22:37,740 of someone's proof of work. 1522 01:22:37,740 --> 01:22:41,070 That proof of work is just the number that is hashed with the block 1523 01:22:41,070 --> 01:22:46,090 to produce the special pattern: 30 zeros, then some other bits, 1524 01:22:46,090 --> 01:22:47,250 and so on. 1525 01:22:47,250 --> 01:22:49,350 The longer a chain gets, the more and more likely 1526 01:22:49,350 --> 01:22:51,988 it is that all the transactions in it are "verified." 1527 01:22:51,988 --> 01:22:54,030 Again, I keep putting air quotes around that word 1528 01:22:54,030 --> 01:22:57,420 because it doesn't mean exactly 1529 01:22:57,420 --> 01:23:00,390 what we colloquially mean by verified. 1530 01:23:00,390 --> 01:23:03,090 It doesn't prove anything about the transaction itself, 1531 01:23:03,090 --> 01:23:06,660 just that we accept it as the standard. 1532 01:23:06,660 --> 01:23:09,630 We accept this as the de facto truth because of all the mathematical work 1533 01:23:09,630 --> 01:23:12,470 that has been put into it. 1534 01:23:12,470 --> 01:23:14,500 So the longer a chain gets, the more likely 1535 01:23:14,500 --> 01:23:17,890 it is that it consists of only verified, legitimate transactions.
1536 01:23:17,890 --> 01:23:22,470 But that brings up a question: what is a transaction? 1537 01:23:22,470 --> 01:23:26,320 A transaction is just an exchange between two people. 1538 01:23:26,320 --> 01:23:28,200 And if we broaden things out a bit, 1539 01:23:28,200 --> 01:23:33,720 we can almost think about a transaction as a contract. 1540 01:23:33,720 --> 01:23:37,950 I offer you $10 to do something on my behalf, 1541 01:23:37,950 --> 01:23:41,282 and assuming that we're intending for me to actually give you this $10, 1542 01:23:41,282 --> 01:23:43,740 and you're intending to actually do something on my behalf, 1543 01:23:43,740 --> 01:23:47,610 and the thing that you're doing for me is not illegal, 1544 01:23:47,610 --> 01:23:50,830 we've basically formed a contract. 1545 01:23:50,830 --> 01:23:54,120 And so while the blockchain for Bitcoin 1546 01:23:54,120 --> 01:23:57,940 is used to send money back and forth between people, 1547 01:23:57,940 --> 01:24:02,850 the data that goes into the data block of any blockchain is arbitrary. 1548 01:24:02,850 --> 01:24:07,230 And there's no reason why, instead of being a list of transactions, 1549 01:24:07,230 --> 01:24:10,410 that data couldn't be something much more significant. 1550 01:24:10,410 --> 01:24:14,570 There's no reason it couldn't be a digitally signed PDF 1551 01:24:14,570 --> 01:24:17,610 scan of a contract between two people. 1552 01:24:17,610 --> 01:24:22,830 There's no reason it can't be a message typed from me to you saying, 1553 01:24:22,830 --> 01:24:27,660 I will pay you $100 if you paint my house on Tuesday, 1554 01:24:27,660 --> 01:24:32,190 and you sending something back in that same chain saying, 1555 01:24:32,190 --> 01:24:35,640 I accept your offer for this payment. 1556 01:24:35,640 --> 01:24:36,720 I accept your offer. 1557 01:24:36,720 --> 01:24:40,560 I will paint your house on Tuesday in exchange for $100.
1558 01:24:40,560 --> 01:24:44,850 We've just formed a contract with no middleman at all. 1559 01:24:44,850 --> 01:24:46,710 We are announcing our intentions. 1560 01:24:46,710 --> 01:24:50,460 It is being recorded publicly in everybody's version of the blockchain. 1561 01:24:50,460 --> 01:24:53,460 It is verified, again, verified in the sense 1562 01:24:53,460 --> 01:24:59,458 that we collectively deem it to be accurate, rather than 1563 01:24:59,458 --> 01:25:02,250 proving that I definitely sent this, although the digital signatures 1564 01:25:02,250 --> 01:25:04,125 associated with these transactions do, again, 1565 01:25:04,125 --> 01:25:07,870 suggest that yes, I am the person who made this transaction, because I digitally 1566 01:25:07,870 --> 01:25:08,370 signed it. 1567 01:25:08,370 --> 01:25:12,510 If I do the same thing with a contract, if I send you an offer 1568 01:25:12,510 --> 01:25:16,330 and you accept, and both of those items are in the chain, 1569 01:25:16,330 --> 01:25:18,210 we arguably have formed a contract. 1570 01:25:18,210 --> 01:25:22,890 And that is what the blockchain associated with the Ethereum technology 1571 01:25:22,890 --> 01:25:24,840 is actually more akin to. 1572 01:25:24,840 --> 01:25:27,210 Bitcoin is fairly restricted in how it 1573 01:25:27,210 --> 01:25:31,860 approaches cryptocurrency and transactions between people. 1574 01:25:31,860 --> 01:25:34,120 Ethereum opens things up a little bit more. 1575 01:25:34,120 --> 01:25:36,750 And there are other blockchain technologies and other services 1576 01:25:36,750 --> 01:25:41,040 that rely on the blockchain in order to do things far 1577 01:25:41,040 --> 01:25:45,210 beyond what a cryptocurrency could do. 1578 01:25:45,210 --> 01:25:49,290 But all of these things are possible only because we rely 1579 01:25:49,290 --> 01:25:51,990 so extensively on cryptography. 1580 01:25:51,990 --> 01:25:57,420 We use computers to encrypt information and send it securely.
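The claim that a digital signature suggests who made a transaction can be illustrated with textbook RSA. This is a toy sketch, not what Bitcoin or Ethereum use (they use elliptic-curve signatures): the tiny primes 61 and 53 are a classic classroom example and are completely insecure, and reducing the SHA-256 digest modulo N is an assumption made only so the toy key can sign it.

```python
import hashlib

# Toy textbook-RSA key pair with tiny primes: insecure, for illustration only.
P, Q = 61, 53
N = P * Q                # public modulus (3233)
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent, coprime with PHI
D = pow(E, -1, PHI)      # private exponent (2753); modular inverse needs Python 3.8+


def digest_mod_n(message: str) -> int:
    """Hash the message and reduce it into the range the toy key can sign."""
    h = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N


def sign(message: str) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest_mod_n(message), D, N)


def verify(message: str, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest_mod_n(message)


offer = "I will pay you $100 if you paint my house on Tuesday."
sig = sign(offer)
print(verify(offer, sig))  # True
# A changed message should fail, barring a rare collision modulo this tiny N.
print(verify(offer + "!", sig))
```

As in the lecture, a valid signature shows that the private key signed this exact message; it says nothing about whether the house actually gets painted.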
1581 01:25:57,420 --> 01:26:00,030 And the mathematical unlikelihood of someone 1582 01:26:00,030 --> 01:26:02,880 being able to duplicate our work, or, certainly, to 1583 01:26:02,880 --> 01:26:06,090 reverse engineer this encryption, is what gives us 1584 01:26:06,090 --> 01:26:10,357 the confidence to make these transactions in the first place. 1585 01:26:10,357 --> 01:26:12,690 And so cryptography forms the basis of almost everything 1586 01:26:12,690 --> 01:26:17,220 that we do when we talk about security on a computer. 1587 01:26:17,220 --> 01:26:21,850 But ultimately, cryptography just relies on mathematics. 1588 01:26:21,850 --> 01:26:24,400 So the moral of the story is probably this. 1589 01:26:24,400 --> 01:26:26,512 You are probably not going to be implementing 1590 01:26:26,512 --> 01:26:27,970 your own version of the blockchain. 1591 01:26:27,970 --> 01:26:32,650 And really, you don't need to understand it completely in order to use it. 1592 01:26:32,650 --> 01:26:36,790 Like I said, you can use Bitcoin without knowing the mathematics of how Bitcoin 1593 01:26:36,790 --> 01:26:40,750 works, just like you can use a bank without knowing the minutiae of how 1594 01:26:40,750 --> 01:26:42,400 the banking system works. 1595 01:26:42,400 --> 01:26:46,090 The point of the blockchain is to remove a central authority. 1596 01:26:46,090 --> 01:26:50,320 We don't rely on one person or one entity or one government 1597 01:26:50,320 --> 01:26:54,610 to determine what has happened and what the transactions are, 1598 01:26:54,610 --> 01:26:55,690 as we do with a bank. 1599 01:26:55,690 --> 01:26:58,960 Your bank has a ledger of everybody's accounts. 1600 01:26:58,960 --> 01:27:03,130 With blockchain technology, we are decentralizing this and making it 1601 01:27:03,130 --> 01:27:05,800 so that everybody has access to all of the information at once, 1602 01:27:05,800 --> 01:27:09,850 and it is everybody's responsibility to keep that ledger accurate.
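How unlikely is it that someone duplicates a proof of work by luck? Under the lecture's simplified model, and assuming the hash output behaves like uniformly random bits, each attempt independently has the following chance of starting with k zero bits. The expected number of attempts before the first success (a geometric distribution) then follows, worked out here for the 30 zeros mentioned earlier:

```latex
\Pr[\text{first } k \text{ bits of the hash are } 0] = 2^{-k},
\qquad
\mathbb{E}[\text{attempts}] = \frac{1}{2^{-k}} = 2^{k},
\qquad
2^{30} = 1{,}073{,}741{,}824 .
```

Checking a claimed solution, by contrast, costs one hash regardless of k, which is the gap that makes the work expensive to produce but cheap to confirm.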
1603 01:27:09,850 --> 01:27:13,150 And because these ledgers rely so extensively on cryptography, 1604 01:27:13,150 --> 01:27:16,210 because this technology relies on cryptography, 1605 01:27:16,210 --> 01:27:18,940 we can use the power of cryptography, the fact 1606 01:27:18,940 --> 01:27:23,050 that things are very difficult to reverse engineer mathematically, 1607 01:27:23,050 --> 01:27:25,630 to verify that yes, 1608 01:27:25,630 --> 01:27:28,630 these are the things that have happened, these are the transactions that 1609 01:27:28,630 --> 01:27:33,420 have been logged, and everybody knows about it at the same time. 1610 01:27:33,420 --> 01:27:34,442
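Because everybody holds a copy of the ledger, anyone can independently recheck it. The sketch below shows one way a participant might do that: recompute every block's hash and confirm each block points at the hash of the one before it. It is a simplified illustration under assumptions: proof of work is omitted for brevity, and the block fields, JSON serialization, and all-zero genesis "previous hash" are hypothetical choices, not a real client's format.

```python
import hashlib
import json

GENESIS_PREVIOUS = "0" * 64  # assumed placeholder for the block before the first one


def block_hash(previous_hash: str, data) -> str:
    """Hash a block's contents together with the hash of the block before it."""
    payload = json.dumps({"previous_hash": previous_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries) -> list:
    """Append each entry as a new block linked to the previous one."""
    chain, previous = [], GENESIS_PREVIOUS
    for data in entries:
        h = block_hash(previous, data)
        chain.append({"previous_hash": previous, "data": data, "hash": h})
        previous = h
    return chain


def is_valid(chain) -> bool:
    """What any node can do on its own copy: recompute every hash and every link."""
    previous = GENESIS_PREVIOUS
    for block in chain:
        if block["previous_hash"] != previous:
            return False
        if block_hash(block["previous_hash"], block["data"]) != block["hash"]:
            return False
        previous = block["hash"]
    return True


ledger = build_chain(["Alice pays Bob 10", "Bob pays Carol 4", "Carol pays Dave 1"])
print(is_valid(ledger))  # True

# Quietly rewriting history in one copy is detected by anyone who rechecks it.
ledger[0]["data"] = "Alice pays Bob 1000"
print(is_valid(ledger))  # False
```

Forging a consistent rewrite would require recomputing every later hash, and, in a real chain, redoing every later proof of work, which is where the difficulty of reversing the mathematics does the protecting.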