WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.409 [MUSIC PLAYING] 00:00:17.560 --> 00:00:18.760 SPEAKER 1: Cryptography. 00:00:18.760 --> 00:00:20.713 What is it, and why is it important? 00:00:20.713 --> 00:00:23.380 We're going to answer those two questions in exactly that order. 00:00:23.380 --> 00:00:25.420 Let's start with what cryptography is. 00:00:25.420 --> 00:00:31.340 It's the art and science of obscuring, and ideally protecting, information. 00:00:31.340 --> 00:00:34.330 Now it's a science because there's math involved with it. 00:00:34.330 --> 00:00:37.457 It's pretty straightforward to manipulate characters in some way 00:00:37.457 --> 00:00:39.790 by adding some constant number to them or to change them 00:00:39.790 --> 00:00:42.130 in some systematic manner. 00:00:42.130 --> 00:00:47.380 But it's an art, because doing so in a way to defend against potential attacks 00:00:47.380 --> 00:00:50.650 is not as easy as it might first appear. 00:00:50.650 --> 00:00:54.640 There's a lot of guesswork and calculation 00:00:54.640 --> 00:00:59.080 that needs to go into play to find a really strong cipher. 00:00:59.080 --> 00:01:01.240 Cryptography gives us the opportunity to have 00:01:01.240 --> 00:01:04.930 a basic level of security against an adversary who might 00:01:04.930 --> 00:01:06.880 do bad things with the information. 00:01:06.880 --> 00:01:11.470 We usually contrast enciphered information 00:01:11.470 --> 00:01:14.350 with information that is presented in the clear, which 00:01:14.350 --> 00:01:17.410 is to say there's no protection surrounding it at all. 00:01:17.410 --> 00:01:21.940 And it's generally considered better to protect information using cryptography 00:01:21.940 --> 00:01:26.140 than to have information just freely available out there. 00:01:26.140 --> 00:01:28.840 Now a cipher, we're going to start by talking about cryptography 00:01:28.840 --> 00:01:29.920 sort of through history. 
00:01:29.920 --> 00:01:32.590 We'll lead up to more modern forms of cryptography, 00:01:32.590 --> 00:01:36.430 which are derived from more ancient forms of cryptography. 00:01:36.430 --> 00:01:40.180 But a cipher is one of the most fundamental forms of cryptography. 00:01:40.180 --> 00:01:41.950 And ciphers are algorithms. 00:01:41.950 --> 00:01:46.900 And recall that an algorithm is just a step-by-step set of instructions 00:01:46.900 --> 00:01:49.730 that we use to complete a task. 00:01:49.730 --> 00:01:54.610 And in this case, the task is to obscure or encipher information. 00:01:54.610 --> 00:01:59.890 And ciphers can also be used in reverse to unobscure, or decipher, 00:01:59.890 --> 00:02:04.420 that same information that was previously encoded or enciphered. 00:02:04.420 --> 00:02:07.330 Now there are many different ciphers out there 00:02:07.330 --> 00:02:10.509 that have varying levels of security potential. 00:02:10.509 --> 00:02:13.210 Some of the more ancient ciphers that we're going to start with 00:02:13.210 --> 00:02:16.300 should be [INAUDIBLE] be considered to have no security potential at all 00:02:16.300 --> 00:02:18.380 considering how easy they are to crack. 00:02:18.380 --> 00:02:22.220 But again, this leads into the more modern approach to cryptography, 00:02:22.220 --> 00:02:25.510 which is much more secure than some of these basic ones. 00:02:25.510 --> 00:02:29.350 And now let's start by imagining that we have possession of this device. 00:02:29.350 --> 00:02:32.680 Now if you're looking at this device and it seems somewhat familiar to you, 00:02:32.680 --> 00:02:35.800 it may be because you've recently seen the movie A Christmas Story, 00:02:35.800 --> 00:02:39.100 where Ralphie, the character there, obtains 00:02:39.100 --> 00:02:45.100 one of these, which is a Little Orphan Annie's Secret Society decoder pin. 
00:02:45.100 --> 00:02:50.140 And this decoder pin has a set of numbers going sequentially one 00:02:50.140 --> 00:02:52.660 through 26 around the inner edge, and a set 00:02:52.660 --> 00:02:55.690 of letters, which is not presented in any particular order, 00:02:55.690 --> 00:02:57.140 around the outer edge. 00:02:57.140 --> 00:03:01.090 And what would happen is the radio announcer would provide, 00:03:01.090 --> 00:03:02.800 set your pins to some combination. 00:03:02.800 --> 00:03:05.323 So line up one number with one letter. 00:03:05.323 --> 00:03:07.240 And then it would read off some secret message 00:03:07.240 --> 00:03:10.328 that, ostensibly, only individuals who possessed this pin, 00:03:10.328 --> 00:03:13.120 or many of the duplicate versions of this pin that were distributed 00:03:13.120 --> 00:03:16.690 to children around the country, could then decipher 00:03:16.690 --> 00:03:19.270 by taking the numbers that were given over the radio 00:03:19.270 --> 00:03:22.480 and transforming them back into letters so that it makes sense. 00:03:22.480 --> 00:03:24.430 So if you can, if you zoom in on this image, 00:03:24.430 --> 00:03:27.070 it might be a little difficult to see, but you 00:03:27.070 --> 00:03:31.810 can see that the 3 corresponds to the letter L, and the 4 corresponds to an M 00:03:31.810 --> 00:03:36.550 based on this particular setting of this decoder pin. 00:03:36.550 --> 00:03:40.660 So this is one potential, what we would call a substitution cipher, 00:03:40.660 --> 00:03:45.280 where we're changing, we're substituting a letter in this case for a number, 00:03:45.280 --> 00:03:47.740 and that number will henceforth represent that letter 00:03:47.740 --> 00:03:51.180 for the rest of this message. 00:03:51.180 --> 00:03:53.820 But what is the problem with this cipher? 
00:03:53.820 --> 00:03:56.760 Or more generally, when we think about issues in computer science 00:03:56.760 --> 00:04:00.540 where we have adversaries who are trying to penetrate some system, 00:04:00.540 --> 00:04:04.740 or break a code, or break in, or hack into anything, 00:04:04.740 --> 00:04:08.620 hack your password, we sometimes frame this in terms of asking the question, 00:04:08.620 --> 00:04:10.500 what is the attack vector? 00:04:10.500 --> 00:04:13.830 Where is the vulnerability that is potentially 00:04:13.830 --> 00:04:17.070 part of this particular cipher? 00:04:17.070 --> 00:04:22.440 And in this case, it's that anybody who has access to this pin 00:04:22.440 --> 00:04:26.100 is able to break any cipher that is made with this pin. 00:04:26.100 --> 00:04:29.790 And again, this pin was distributed pretty extensively in 1930s and 40s 00:04:29.790 --> 00:04:32.610 to children who listened to this very popular radio program. 00:04:32.610 --> 00:04:35.580 So these pins were in the hands of many people. 00:04:35.580 --> 00:04:37.740 And anybody who had access to the pin would 00:04:37.740 --> 00:04:39.700 be able to understand the message. 00:04:39.700 --> 00:04:43.380 And so that is, how we might frame this attack vector, 00:04:43.380 --> 00:04:48.930 is the key, in this case, the pin, which we will call a key for this purpose, 00:04:48.930 --> 00:04:50.110 is just very prevalent. 00:04:50.110 --> 00:04:54.510 It's pretty well known how to use this key and manipulate this key. 00:04:54.510 --> 00:04:58.470 A lot of people have access to that key. 00:04:58.470 --> 00:05:00.890 But that's just one example of a substitution cipher. 00:05:00.890 --> 00:05:04.755 We have many different examples of substitution ciphers that we could use. 
00:05:04.755 --> 00:05:07.130 Let's just take another very simple, straightforward one, 00:05:07.130 --> 00:05:10.610 which is imagine we have all of the letters of the alphabet 00:05:10.610 --> 00:05:13.880 and we're just going to assign the ordinal position of that letter 00:05:13.880 --> 00:05:15.150 as its cipher value. 00:05:15.150 --> 00:05:18.468 So with the secret society pin, there was this sort of random element 00:05:18.468 --> 00:05:19.010 to it, right? 00:05:19.010 --> 00:05:20.302 The letters were being skipped. 00:05:20.302 --> 00:05:24.240 There wasn't a rhyme or reason to them, although the numbers were sequential. 00:05:24.240 --> 00:05:25.760 Here let's just line up both. 00:05:25.760 --> 00:05:29.220 Let's use sequential letters and map them to their sequential numbers. 00:05:29.220 --> 00:05:33.210 So A becomes 1, B becomes 2, and so on. 00:05:33.210 --> 00:05:36.650 Both of these things are increasing linearly. 00:05:36.650 --> 00:05:39.020 Now you may recall that as computer scientists, 00:05:39.020 --> 00:05:42.230 we ordinarily start counting from zero rather than counting from one. 00:05:42.230 --> 00:05:46.730 I'm counting from one here because this mapping of A to 1 and Z to 26 00:05:46.730 --> 00:05:49.400 is much more familiar to us intuitively as humans, 00:05:49.400 --> 00:05:54.410 and I want to keep us grounded in this discussion of cryptography right now. 00:05:54.410 --> 00:05:59.150 But ordinarily, you might actually instead see this as 0 to 25, 0 being A, 00:05:59.150 --> 00:06:02.750 through Z being 25 as opposed to 1 through 26. 00:06:02.750 --> 00:06:05.060 But this cipher would work exactly the same 00:06:05.060 --> 00:06:08.090 and has roughly the same security potential 00:06:08.090 --> 00:06:11.690 as Annie's secret society cipher does. 
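The sequential mapping just described, A to 1 through Z to 26, can be sketched in a few lines of Python. This is only an illustration: the function names `letter_to_number` and `encode` are made up here and do not come from the lecture, and the sketch assumes the message uses only the 26 English letters.

```python
import string

# Hypothetical helper names; the lecture only describes the mapping itself.
def letter_to_number(letter: str) -> int:
    """Map A..Z to 1..26, counting from one as the lecture does."""
    return string.ascii_uppercase.index(letter.upper()) + 1

def encode(message: str) -> list[int]:
    """Encode the English letters of a message and skip everything else."""
    return [
        letter_to_number(ch)
        for ch in message
        if ch.upper() in string.ascii_uppercase
    ]

print(encode("HELLO"))  # [8, 5, 12, 12, 15]
```

Counting from zero instead would only mean dropping the `+ 1`; the cipher is no stronger either way.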
00:06:11.690 --> 00:06:17.210 And we can actually make this a little bit better because we are consistently 00:06:17.210 --> 00:06:20.180 increasing the letters, A through Z, and consistently increasing 00:06:20.180 --> 00:06:22.100 the numbers, 1 through 26. 00:06:22.100 --> 00:06:25.460 We could also, instead of just doing this direct mapping, 00:06:25.460 --> 00:06:27.260 we could rotate around. 00:06:27.260 --> 00:06:31.025 We could start the 1 somewhere else as opposed to being A. 00:06:31.025 --> 00:06:36.110 And now instead of having just one cipher where A maps to 1, B maps to 2, 00:06:36.110 --> 00:06:38.570 we have a variety of different ciphers, depending 00:06:38.570 --> 00:06:42.860 on where we decide we want to have our starting point. 00:06:42.860 --> 00:06:47.960 So for example, we might instead add two to every number. 00:06:47.960 --> 00:06:54.838 So instead of going from 1 to 26, we go from 3 to 28. 00:06:54.838 --> 00:06:55.630 Now think about it. 00:06:55.630 --> 00:06:58.300 If you're trying to break this cipher and you see patterns 00:06:58.300 --> 00:07:03.370 like this with all these numbers in them, what might jump out at you? 00:07:03.370 --> 00:07:07.630 Well, if you're used to seeing ciphers that are 1 through 26, for example, 00:07:07.630 --> 00:07:09.880 something where you don't see any 1s or 2s 00:07:09.880 --> 00:07:13.510 and suddenly you're seeing 27s and 28s potentially in the message that might 00:07:13.510 --> 00:07:18.490 be long enough to have, in this case, Ys or Zs in it 00:07:18.490 --> 00:07:21.010 might seem to you that this is slightly off. 00:07:21.010 --> 00:07:24.640 Like this cipher must be shifted in some way. 00:07:24.640 --> 00:07:26.690 Instead of being this straightforward line, 00:07:26.690 --> 00:07:29.050 there's some modification that's been made to it. 00:07:29.050 --> 00:07:31.992 That's kind of a tip off if you're trying to defend 00:07:31.992 --> 00:07:33.450 against somebody figuring that out. 
00:07:33.450 --> 00:07:37.390 And so instead of going 27, 28 at the end, 00:07:37.390 --> 00:07:40.180 we might instead wrap around the alphabet. 00:07:40.180 --> 00:07:45.070 Once we have exhausted the 26 possible values that we started with, 00:07:45.070 --> 00:07:49.810 the 26 letters of the alphabet, we might instead, once we have X is 26, 00:07:49.810 --> 00:07:54.490 say, well, instead of Y being 27, Y is 1 and Z is 2. 00:07:54.490 --> 00:07:59.170 And this is not a massive improvement on the security of this cipher. 00:07:59.170 --> 00:08:02.860 Like I said, it's still quite fragile and quite easy to break. 00:08:02.860 --> 00:08:06.550 But it doesn't give quite as much of a clue to a potential adversary 00:08:06.550 --> 00:08:11.440 as to how to crack it, how to decipher the message. 00:08:11.440 --> 00:08:14.320 And this can be done for any different value 00:08:14.320 --> 00:08:16.150 to obtain any number of different ciphers. 00:08:16.150 --> 00:08:18.190 Instead of going forward by two positions, 00:08:18.190 --> 00:08:21.550 we could add 20 to every letter's value, again, 00:08:21.550 --> 00:08:24.790 wrapping around the alphabet when we exhaust, 00:08:24.790 --> 00:08:28.900 when we get to 26, instead of having 27, 28, we would just reset at 1 00:08:28.900 --> 00:08:32.179 and continue on. 00:08:32.179 --> 00:08:35.100 But we can also add 26 to it. 00:08:35.100 --> 00:08:37.870 But that doesn't look very different than what we had before. 00:08:37.870 --> 00:08:42.669 And that's where this cipher's vulnerability comes into play. 00:08:42.669 --> 00:08:46.950 There's only 26 possible ways to rotate the alphabet 00:08:46.950 --> 00:08:50.940 while keeping the order of the letters preserved, right? 00:08:50.940 --> 00:08:55.000 Unless we start skipping A, D, G, and then, 00:08:55.000 --> 00:08:57.840 you know, rearranging the other letters in some other way. 
00:08:57.840 --> 00:09:00.840 If we want to keep everything straightforward in a line, 00:09:00.840 --> 00:09:04.440 again, wrapping around 26 when necessary, there's 00:09:04.440 --> 00:09:06.330 only 26 ways to do it. 00:09:06.330 --> 00:09:09.570 That is to say that shifting the alphabet forward by 26 00:09:09.570 --> 00:09:12.982 is exactly the same as shifting the alphabet forward by 0. 00:09:12.982 --> 00:09:14.190 And so that's our limitation. 00:09:14.190 --> 00:09:18.660 We have a very small number of, again, this word keys that can 00:09:18.660 --> 00:09:22.925 be used to decipher using this cipher. 00:09:22.925 --> 00:09:25.550 Now this is an example of something called a rotational cipher, 00:09:25.550 --> 00:09:27.870 and it's actually a rather famous rotational cipher 00:09:27.870 --> 00:09:30.030 known as the Caesar Cipher. 00:09:30.030 --> 00:09:32.970 It's attributed to Julius Caesar and was apparently used 00:09:32.970 --> 00:09:38.250 more than two millennia ago for him to encode messages to his troops 00:09:38.250 --> 00:09:39.480 on the line. 00:09:39.480 --> 00:09:41.250 And at the time, this was revolutionary. 00:09:41.250 --> 00:09:44.460 And generally what you're going to find with cryptography 00:09:44.460 --> 00:09:50.190 is there's just this pattern of breaking the mold and doing something new 00:09:50.190 --> 00:09:52.350 and trying to stay one step ahead. 00:09:52.350 --> 00:09:55.320 And oftentimes, other people will then catch up. 00:09:55.320 --> 00:09:58.740 And this cipher, which was once, you know, 00:09:58.740 --> 00:10:04.320 lauded as being a wonderful cipher, is no longer as strong 00:10:04.320 --> 00:10:06.120 as it once was thought to be. 
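The wrap-around rotation described above can be sketched as follows. The name `shifted_value` is an assumption made for this example, and the sketch keeps the lecture's convention of numbering letters 1 through 26 and takes a single English letter as input.

```python
def shifted_value(letter: str, shift: int) -> int:
    """Return the cipher number (1..26) for one letter under a given shift."""
    position = ord(letter.upper()) - ord("A") + 1  # A=1 ... Z=26
    return (position + shift - 1) % 26 + 1         # wrap back into 1..26

# With a shift of two, X lands on 26 and Y and Z wrap around to 1 and 2.
print([shifted_value(ch, 2) for ch in "AXYZ"])  # [3, 26, 1, 2]

# Shifting by 26 is the same as not shifting at all.
print(shifted_value("A", 26) == shifted_value("A", 0))  # True
```

The modulo step is what confines the output to 26 values, and it is also why there are only 26 distinct rotations.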
00:10:06.120 --> 00:10:09.270 And so we keep having to advance and improve and get ahead 00:10:09.270 --> 00:10:12.150 of it for whatever kind of adversary that is, whether that's 00:10:12.150 --> 00:10:16.110 a potential enemy on the battle line, as might have been the case with Julius 00:10:16.110 --> 00:10:20.280 Caesar, or whether that's a hacker who's trying to break into your system 00:10:20.280 --> 00:10:21.870 as might be the case today. 00:10:21.870 --> 00:10:25.062 And fortunately, again, we're not using Caesar Cipher today 00:10:25.062 --> 00:10:26.520 to encipher any of our information. 00:10:26.520 --> 00:10:28.410 We're using much more modern techniques. 00:10:28.410 --> 00:10:30.990 But these modern techniques evolved from seeing 00:10:30.990 --> 00:10:34.230 codes being created, ciphers being created and broken, 00:10:34.230 --> 00:10:36.210 and then having to be created anew to try 00:10:36.210 --> 00:10:39.220 and defend against new vulnerabilities that have been exposed. 00:10:39.220 --> 00:10:45.750 So like I said previously, very easy to decipher or to crack the Caesar Cipher, 00:10:45.750 --> 00:10:48.100 but at the time, very, very difficult. 00:10:48.100 --> 00:10:50.430 The limitation, again, limited number of keys. 00:10:50.430 --> 00:10:53.592 There's only 26 ways to rotate the alphabet for it to make sense. 00:10:53.592 --> 00:10:55.050 In the English alphabet, of course. 00:10:55.050 --> 00:10:56.460 If you're using a different alphabet, your 00:10:56.460 --> 00:10:58.252 number of keys might be different if you're 00:10:58.252 --> 00:10:59.940 using the same rotational approach. 00:10:59.940 --> 00:11:01.860 But the fundamental limitation is you are 00:11:01.860 --> 00:11:04.920 confined by how many letters are in your alphabet 00:11:04.920 --> 00:11:08.460 that you're using to encipher information. 00:11:08.460 --> 00:11:10.320 So let's take things one step further. 
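Because there are only 26 rotations, an adversary can simply try them all. Here is a rough sketch of that idea. It assumes the common letter-to-letter form of the Caesar Cipher (shifting B to D, rather than writing numbers), and the names `caesar_shift` and `brute_force` are invented for this example.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_shift(text: str, shift: int) -> str:
    """Rotate every English letter by `shift` places, wrapping around at Z."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)

def brute_force(ciphertext: str) -> list[tuple[int, str]]:
    """Undo all 26 possible shifts and return every candidate plaintext."""
    return [(key, caesar_shift(ciphertext, -key)) for key in range(26)]

secret = caesar_shift("HELLO", 2)
print(secret)  # JGNNQ
for key, guess in brute_force(secret):
    print(key, guess)  # one of these 26 lines reads HELLO
```

A person or a program only has to scan 26 candidates for the one that reads as English, which is why the small key count is the cipher's weak point.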
00:11:10.320 --> 00:11:15.088 What is an improvement that we might be able to make to Caesar? 00:11:15.088 --> 00:11:17.880 That would lead us to this idea potentially of the Vigenere Cipher. 00:11:17.880 --> 00:11:20.270 So Caesar had this limitation of there's one key 00:11:20.270 --> 00:11:23.970 and there's only 26 possible values for that key. 00:11:23.970 --> 00:11:27.650 What Vigenere Cipher does is it, instead of using a single key, 00:11:27.650 --> 00:11:29.000 uses multiple keys. 00:11:29.000 --> 00:11:31.490 Instead of picking a number to shift by, we're 00:11:31.490 --> 00:11:34.430 instead going to define a keyword. 00:11:34.430 --> 00:11:37.880 And we're going to use the letters of that keyword in sequence as we 00:11:37.880 --> 00:11:41.570 go to change what our key is at any given 00:11:41.570 --> 00:11:45.050 time, such that our enciphered message, instead of being enciphered using one 00:11:45.050 --> 00:11:48.295 key, might use three keys or five keys or 10 keys, 00:11:48.295 --> 00:11:50.420 depending on the length of the keyword that we use, 00:11:50.420 --> 00:11:54.230 if that keyword is three or five or 10 letters long. 00:11:54.230 --> 00:11:57.680 So this keyword becomes the interesting twist 00:11:57.680 --> 00:12:01.130 that made Caesar much more challenging for an adversary 00:12:01.130 --> 00:12:03.350 to crack by using different keys. 00:12:03.350 --> 00:12:06.320 Now let's walk through an example of how the Vigenere Cipher works 00:12:06.320 --> 00:12:10.220 because I think it makes more sense to see this visually rather than just 00:12:10.220 --> 00:12:11.810 discussing it verbally. 00:12:11.810 --> 00:12:15.260 So what we want to do here is encrypt the message HELLO 00:12:15.260 --> 00:12:17.000 using the keyword LAW. 00:12:17.000 --> 00:12:22.010 So here our message HELLO is what also might be called plain text. 00:12:22.010 --> 00:12:23.030 It is in the clear. 00:12:23.030 --> 00:12:23.980 It is not enciphered. 
00:12:23.980 --> 00:12:27.470 It is not hidden against any adversary. 00:12:27.470 --> 00:12:30.260 And our key is LAW. 00:12:30.260 --> 00:12:32.690 All right, so let's take a look at how we might do this. 00:12:32.690 --> 00:12:36.080 So it oftentimes helps, especially when trying to encipher or decipher 00:12:36.080 --> 00:12:40.430 using the Vigenere Cipher, to consider all 00:12:40.430 --> 00:12:44.567 of the inputs that go into determining the final outputted character. 00:12:44.567 --> 00:12:46.400 So we're going to take a look at plain text, 00:12:46.400 --> 00:12:47.570 and we're going to convert it, just like we did 00:12:47.570 --> 00:12:49.520 with Caesar, to its ordinal position. 00:12:49.520 --> 00:12:51.620 We're going to see where in the alphabet that is. 00:12:51.620 --> 00:12:52.578 Is it the first letter? 00:12:52.578 --> 00:12:53.110 Then it's 1. 00:12:53.110 --> 00:12:54.890 If it's the last letter, it's 26. 00:12:54.890 --> 00:12:56.190 And so on. 00:12:56.190 --> 00:12:59.405 We're going to do the exact same thing with each letter of our keyword. 00:12:59.405 --> 00:13:01.280 So we're going to take a look at the keyword, 00:13:01.280 --> 00:13:05.447 figure out what that letter's numerical correspondence would be. 00:13:05.447 --> 00:13:07.530 We're going to then add those two things together. 00:13:07.530 --> 00:13:10.458 If we go over 26, just as we did with the Caesar Cipher, 00:13:10.458 --> 00:13:13.250 we're going to wrap back around such that we're confining ourselves 00:13:13.250 --> 00:13:15.860 to that range of 1 through 26. 00:13:15.860 --> 00:13:18.890 And then we're going to take that number and transform it into a letter. 00:13:18.890 --> 00:13:23.180 So for example, if the result there is 2, we're going to change that into a B. 00:13:23.180 --> 00:13:26.940 And the reason for that is that B is the second letter of the alphabet. 
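The combining step just described can be written as a single formula. The symbols are introduced here for illustration and are not the lecture's notation: p is the ordinal position of the plain text letter, k is the ordinal position of the keyword letter, and c is the position of the resulting letter, all in the range 1 through 26.

```latex
c = \bigl((p + k - 1) \bmod 26\bigr) + 1,
\qquad p, k, c \in \{1, \dots, 26\}
```

When p + k is at most 26 this gives c = p + k, and when the sum goes over 26 it is the same as subtracting 26, which keeps the result in the range 1 through 26.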
00:13:26.940 --> 00:13:32.690 So let's walk through this with HELLO as our plain text and LAW as our key. 00:13:32.690 --> 00:13:36.110 So the first letter of our plain text is H, and the ordinal position 00:13:36.110 --> 00:13:37.920 of that H is 8. 00:13:37.920 --> 00:13:39.920 It is the eighth letter of the alphabet. 00:13:39.920 --> 00:13:43.820 We do the same thing with the first L for LAW, the first letter of LAW. 00:13:43.820 --> 00:13:46.590 L is 12, it's the 12th letter of the alphabet. 00:13:46.590 --> 00:13:49.700 So our next step is to add those two values, 8 and 12, together. 00:13:49.700 --> 00:13:51.042 We get 20. 00:13:51.042 --> 00:13:52.250 We don't need to wrap around. 00:13:52.250 --> 00:13:54.770 We didn't go over 26, so we're still OK. 00:13:54.770 --> 00:13:59.430 And the 20th letter of the alphabet is T. So the first step of this 00:13:59.430 --> 00:14:04.760 enciphering process with HELLO, using the Vigenere Cipher, using the key LAW, 00:14:04.760 --> 00:14:06.808 is to turn the H into a T. 00:14:06.808 --> 00:14:08.600 So we can do this again, we can take a look 00:14:08.600 --> 00:14:10.937 at the E, the second letter of our plain text. 00:14:10.937 --> 00:14:12.770 We use the second letter of our keyword now. 00:14:12.770 --> 00:14:14.103 So we're not using the same key. 00:14:14.103 --> 00:14:16.283 We're not using 12 over and over and over. 00:14:16.283 --> 00:14:17.450 We're using a different key. 00:14:17.450 --> 00:14:20.120 We're now using the A, the second letter of our keyword, 00:14:20.120 --> 00:14:21.300 whose ordinal position is 1. 00:14:21.300 --> 00:14:26.000 So 5 plus 1 is 6, and that results in F. 00:14:26.000 --> 00:14:31.670 Next, we use the first L of HELLO, and the W of LAW. 00:14:31.670 --> 00:14:35.740 So L is 12, W is the 23rd letter of the alphabet, we add those together, 00:14:35.740 --> 00:14:36.240 we're at 35. 00:14:36.240 --> 00:14:39.740 35 is not a legal value in terms of this cipher. 
00:14:39.740 --> 00:14:43.040 We are confined to 1 through 26. 00:14:43.040 --> 00:14:47.065 And so we just subtract 26 and we get down to 9, and now we have I. 00:14:47.065 --> 00:14:47.940 So now what do we do? 00:14:47.940 --> 00:14:50.960 We've exhausted our keyword, but we still 00:14:50.960 --> 00:14:53.497 have plain text that we need to encipher. 00:14:53.497 --> 00:14:55.580 Well, as you might expect, the logical thing to do 00:14:55.580 --> 00:14:59.785 is just go back to the beginning of the keyword and continue on. 00:14:59.785 --> 00:15:00.410 And so we will. 00:15:00.410 --> 00:15:05.208 So we'll use the L, the second L of our plain text, and the first L-- 00:15:05.208 --> 00:15:07.250 because we've now exhausted all of those letters, 00:15:07.250 --> 00:15:09.050 we have to go back to the beginning-- 00:15:09.050 --> 00:15:10.190 the L for LAW. 00:15:10.190 --> 00:15:12.290 12 plus 12 is 24. 00:15:12.290 --> 00:15:15.290 24, the 24th letter of the alphabet is X. 00:15:15.290 --> 00:15:19.100 And we do that finally as well for the O, advancing it one position, because 00:15:19.100 --> 00:15:23.840 of the A in LAW, to 16, and that is P. 00:15:23.840 --> 00:15:27.260 So ultimately, HELLO in this case becomes 00:15:27.260 --> 00:15:30.680 this random set of characters, TFIXP. 00:15:30.680 --> 00:15:34.280 And some advantages might also immediately jump out at you. 00:15:34.280 --> 00:15:37.910 With the Caesar Cipher, anytime we changed a letter, 00:15:37.910 --> 00:15:42.500 it always was that same letter every time we 00:15:42.500 --> 00:15:43.910 saw it in the enciphered message. 00:15:43.910 --> 00:15:47.390 So if we had a B and we were advancing everything by two characters, 00:15:47.390 --> 00:15:49.550 every B in the original message would always 00:15:49.550 --> 00:15:53.210 be a D because D comes two letters after B. 
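The whole walkthrough can be sketched in Python to check the arithmetic. This follows the lecture's one-based convention, where the key letter A shifts by 1; many textbook descriptions of the Vigenere Cipher count from zero instead, so A would shift by 0 there. The function names are made up for this example, and the sketch assumes both the message and the keyword contain only the letters A through Z.

```python
def to_number(letter: str) -> int:
    """A=1 ... Z=26, matching the lecture's numbering."""
    return ord(letter) - ord("A") + 1

def to_letter(number: int) -> str:
    """Inverse of to_number: 1=A ... 26=Z."""
    return chr(number - 1 + ord("A"))

def vigenere_encipher(plaintext: str, keyword: str) -> str:
    plaintext, keyword = plaintext.upper(), keyword.upper()
    result = []
    for i, ch in enumerate(plaintext):
        key_letter = keyword[i % len(keyword)]  # go back to the start when exhausted
        total = to_number(ch) + to_number(key_letter)
        if total > 26:
            total -= 26                         # wrap around into 1..26
        result.append(to_letter(total))
    return "".join(result)

def vigenere_decipher(ciphertext: str, keyword: str) -> str:
    ciphertext, keyword = ciphertext.upper(), keyword.upper()
    result = []
    for i, ch in enumerate(ciphertext):
        key_letter = keyword[i % len(keyword)]
        total = to_number(ch) - to_number(key_letter)
        if total < 1:
            total += 26                         # wrap the other way
        result.append(to_letter(total))
    return "".join(result)

print(vigenere_encipher("HELLO", "LAW"))  # TFIXP
print(vigenere_decipher("TFIXP", "LAW"))  # HELLO
```

Note that the two Ls in HELLO come out as I and X, because each one is combined with a different letter of the keyword.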
00:15:53.210 --> 00:15:58.205 So again, if our Caesar Cipher key is two, every time we see a B, 00:15:58.205 --> 00:16:02.570 it becomes a D, every time we have an A, it becomes a C, always. 00:16:02.570 --> 00:16:05.630 Here with the Vigenere Cipher, because we have different keys 00:16:05.630 --> 00:16:08.022 and we're rotating these keys differently, 00:16:08.022 --> 00:16:09.980 depending on which letter of the keyword we are on 00:16:09.980 --> 00:16:12.560 and which letter of the plain text we are on, 00:16:12.560 --> 00:16:15.680 those two Ls are not the same, right? 00:16:15.680 --> 00:16:19.640 Instead of H-E-L-L-O, we don't have some mapping. 00:16:19.640 --> 00:16:22.760 Those two Ls are I and X. They are not the same character. 00:16:22.760 --> 00:16:25.640 And so already we're seeing a bit more security here 00:16:25.640 --> 00:16:29.630 because there's not this potential to guess. 00:16:29.630 --> 00:16:33.530 Vigenere is also much more secure when you consider 00:16:33.530 --> 00:16:35.240 how many keys are available to you. 00:16:35.240 --> 00:16:39.050 With the Caesar Cipher we had 26 keys available to us. 00:16:39.050 --> 00:16:42.860 With the Vigenere Cipher we have 26 to the n keys, 00:16:42.860 --> 00:16:44.820 where n is the length of our keyword. 00:16:44.820 --> 00:16:47.930 So for example, if we're using a two letter long keyword, 00:16:47.930 --> 00:16:52.640 for example, AA or AB or all the way up, that leaves us with 26 squared, 00:16:52.640 --> 00:16:54.710 or 676 possibilities. 00:16:54.710 --> 00:16:57.560 Now if we extend to three letter keywords or four letter keywords, 00:16:57.560 --> 00:17:00.000 we're getting even more and more possibilities. 00:17:00.000 --> 00:17:02.870 And as we start to increase the number of possibilities, 00:17:02.870 --> 00:17:06.980 we start to really increase the difficulty for some adversary 00:17:06.980 --> 00:17:08.930 to figure out what the key is. 
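To get a feel for how quickly 26 to the n grows, here is a small sketch. The function name `vigenere_key_count` is an assumption for this example, and it counts every possible string of n letters as a keyword, as the lecture does.

```python
def vigenere_key_count(keyword_length: int) -> int:
    """Number of distinct keywords of a given length over a 26-letter alphabet."""
    return 26 ** keyword_length

# Compare against the Caesar Cipher, which has 26 keys in total.
for n in (1, 2, 3, 5, 10):
    print(n, vigenere_key_count(n))
```

A keyword of length one gives 26, the same as Caesar, and each extra letter multiplies the count by another 26.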
00:17:08.930 --> 00:17:11.089 And that's really the goal of cryptography, right? 00:17:11.089 --> 00:17:13.400 We want to be able to protect information 00:17:13.400 --> 00:17:18.000 and we want to defend that information from being determined by other people. 00:17:18.000 --> 00:17:22.430 So the more work we put into making more challenging keys, the more likely 00:17:22.430 --> 00:17:26.609 we are to be successful in our attempt to encipher information. 00:17:26.609 --> 00:17:29.240 So again, Vigenere much more of a secure cipher. 00:17:29.240 --> 00:17:31.460 It's still not secure and it's definitely 00:17:31.460 --> 00:17:33.590 not a cipher that is used today. 00:17:33.590 --> 00:17:39.980 There are computer programs that are capable of figuring out how to decipher 00:17:39.980 --> 00:17:42.980 using the Vigenere Cipher pretty well. 00:17:42.980 --> 00:17:45.230 But it's more secure than Caesar for sure 00:17:45.230 --> 00:17:52.510 because of its changing alphabets and its much larger number of keys. 00:17:52.510 --> 00:17:57.010 Let's go back to this decoder pin and think about another potential problem 00:17:57.010 --> 00:17:58.300 that we have. 00:17:58.300 --> 00:18:00.340 Now assume that your adversary is actually 00:18:00.340 --> 00:18:02.950 not a member of Annie's secret society. 00:18:02.950 --> 00:18:04.813 They don't have this pin. 00:18:04.813 --> 00:18:05.980 So that's already a step up. 00:18:05.980 --> 00:18:08.897 We previously had assumed that anybody who had the pin could crack it, 00:18:08.897 --> 00:18:09.820 and that's still true. 00:18:09.820 --> 00:18:14.800 But let's assume your adversary, lucky you, doesn't have this pin. 00:18:14.800 --> 00:18:21.633 Is there still a way that they would be able to crack the code without the pin? 00:18:21.633 --> 00:18:22.800 Think about it for a second. 
00:18:22.800 --> 00:18:26.660 Think about what the characteristics of the English language 00:18:26.660 --> 00:18:31.345 are that might help people figure out what this cipher is. 00:18:31.345 --> 00:18:33.720 Think about some unique features of the English language, 00:18:33.720 --> 00:18:38.370 such as one-letter words, like I and A, which might appear in the message. 00:18:38.370 --> 00:18:41.280 If you see a single letter word in a message, 00:18:41.280 --> 00:18:44.005 you're probably going to guess that it's either the letter I, 00:18:44.005 --> 00:18:47.130 and every time I see that character or that number I'm going to assume it's 00:18:47.130 --> 00:18:49.503 an I, or you're going to assume it's an A 00:18:49.503 --> 00:18:51.670 and you're going to try and plug in an A everywhere. 00:18:51.670 --> 00:18:55.620 And some trial and error might reveal some patterns that emerge. 00:18:55.620 --> 00:18:58.620 And there is a very prevalent pattern in the English language, 00:18:58.620 --> 00:19:01.920 which is that letters appear with a pretty regular frequency. 00:19:01.920 --> 00:19:05.790 Given any arbitrary text in the English language, 00:19:05.790 --> 00:19:10.020 it's pretty likely that the distribution of letters within that text 00:19:10.020 --> 00:19:14.700 is going to follow this pattern. Roughly 13% of the time, give or take, 00:19:14.700 --> 00:19:16.890 any arbitrary letter selected from a text 00:19:16.890 --> 00:19:23.890 is going to be the letter E. And only 1/10 of 1% of the time will it be a Z. 00:19:23.890 --> 00:19:26.460 And only 2/10 of 1% might it be a J. So there are some letters that 00:19:26.460 --> 00:19:28.980 appear very frequently and there are other letters that 00:19:28.980 --> 00:19:31.050 appear very infrequently. 
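Counting how often each symbol appears is easy to automate. The sketch below tallies letter frequencies in a piece of text; the function name `letter_frequencies` and the sample sentence are made up for illustration, and a real analysis would want a much longer text than this.

```python
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, share of all letters) pairs, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]

sample = "MEET ME NEAR THE GREEN TREE"
for letter, share in letter_frequencies(sample):
    print(letter, round(share, 2))
```

In this sample E comes out on top. A one-to-one substitution changes which symbol stands for E but not how often it appears, so the most frequent symbol in a long enciphered message is a good first guess for E.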
00:19:31.050 --> 00:19:34.682 And that is still a problem in this generic substitution cipher, 00:19:34.682 --> 00:19:37.140 even with the letters being scrambled, which seems at first 00:19:37.140 --> 00:19:40.698 blush to perhaps be much more secure than one where 00:19:40.698 --> 00:19:42.990 the letters are increasing sequentially and the numbers 00:19:42.990 --> 00:19:44.460 are increasing sequentially. 00:19:44.460 --> 00:19:46.943 Even this scattershot mapping of letters to numbers, 00:19:46.943 --> 00:19:49.110 as long as we're still confined to these two domains 00:19:49.110 --> 00:19:51.870 where we have A through Z and 1 through 26 00:19:51.870 --> 00:19:53.760 and there's always a mapping between them, 00:19:53.760 --> 00:19:56.580 whether they're ordered or not ordered, is still 00:19:56.580 --> 00:20:00.420 a problem, in the English language anyway, because of frequency analysis. 00:20:00.420 --> 00:20:02.820 These are actually very common puzzles. 00:20:02.820 --> 00:20:05.880 Humans might find it kind of tedious to try and solve these puzzles, 00:20:05.880 --> 00:20:09.540 but otherwise, this is well known as a cryptogram. 00:20:09.540 --> 00:20:12.570 You may, if you are the puzzling type, this type of puzzle 00:20:12.570 --> 00:20:13.950 is called a cryptogram. 00:20:13.950 --> 00:20:18.750 And this pattern is definitely something that is across all messages 00:20:18.750 --> 00:20:20.670 that appear in the English language. 00:20:20.670 --> 00:20:24.090 There are plenty of other ciphers that appear, that are used, 00:20:24.090 --> 00:20:25.890 that are more secure than any of these what 00:20:25.890 --> 00:20:27.960 we might call one-to-one ciphers, mapping 00:20:27.960 --> 00:20:32.130 a single character to a different character or to a number. 00:20:32.130 --> 00:20:35.010 There are some ciphers that substitute pairs or triples of characters 00:20:35.010 --> 00:20:35.510 at a time. 
00:20:35.510 --> 00:20:37.950 And these ciphers, again, form the basis for what 00:20:37.950 --> 00:20:40.470 eventually becomes more modern cryptography, which 00:20:40.470 --> 00:20:42.060 we're getting to in just a moment. 00:20:42.060 --> 00:20:43.920 There are also transposition ciphers, where 00:20:43.920 --> 00:20:46.470 instead of substituting one character for another, 00:20:46.470 --> 00:20:51.500 we simply use an algorithm to rearrange all the letters in some systematic way. 00:20:51.500 --> 00:20:55.980 And the defect there is that all the letters of our original plain text 00:20:55.980 --> 00:21:00.000 message are still there and all we need to do is unscramble them. 00:21:00.000 --> 00:21:01.740 And because there's an algorithm that was 00:21:01.740 --> 00:21:05.640 used to scramble them in the first place, 00:21:05.640 --> 00:21:07.950 there's got to be a way to undo it as well. 00:21:07.950 --> 00:21:12.560 With a little bit of trial and error, we can probably sort that out. 00:21:12.560 --> 00:21:17.450 Finally, the most egregious issue with these classical ciphers 00:21:17.450 --> 00:21:20.450 is, how do you distribute the key? 00:21:20.450 --> 00:21:25.750 How do you tell someone who you want to share information with? 00:21:25.750 --> 00:21:30.530 How do you tell your ally what the key is for the cipher 00:21:30.530 --> 00:21:32.850 that you are going to use? 00:21:32.850 --> 00:21:36.683 You can't encrypt it because if you encrypt the key, 00:21:36.683 --> 00:21:38.350 how will they know what the real key is? 00:21:38.350 --> 00:21:40.630 If you say, if you send them a message and they 00:21:40.630 --> 00:21:43.600 don't know how to interpret it, or they see it and they interpret it 00:21:43.600 --> 00:21:46.690 as something else, that's not going to be helpful to you. 00:21:46.690 --> 00:21:51.110 You want them to see the key in the plain text. 00:21:51.110 --> 00:21:54.130 You want them to see the key in the clear, rather. 
00:21:54.130 --> 00:21:55.960 You want them to just have it. 00:21:55.960 --> 00:21:59.500 You don't want to encrypt that as you hand it to them. 00:21:59.500 --> 00:22:01.870 That doesn't do them any good. 00:22:01.870 --> 00:22:05.320 But if you're giving the key to your ally 00:22:05.320 --> 00:22:07.930 and your adversary is within earshot, or they 00:22:07.930 --> 00:22:11.560 have access to that same piece of paper because your ally carelessly throws it 00:22:11.560 --> 00:22:14.260 away and they can just pick it up, now all 00:22:14.260 --> 00:22:20.670 of a sudden all of your messages using basic ciphers are fairly insecure. 00:22:20.670 --> 00:22:25.110 But let's take a step forward in modern cryptography. 00:22:25.110 --> 00:22:28.410 Perhaps you've seen a screen that looks like this at some point 00:22:28.410 --> 00:22:32.100 when you're trying to log in to some system. 00:22:32.100 --> 00:22:36.910 Enter your email and we'll email you a link to change your password. 00:22:36.910 --> 00:22:39.668 Well, why don't you just email me my password? 00:22:39.668 --> 00:22:41.710 Like you're going to give me a link to change it, 00:22:41.710 --> 00:22:45.055 you must know it if I use my credentials to log in 00:22:45.055 --> 00:22:47.170 to your service any given day. 00:22:47.170 --> 00:22:50.410 But OK, I guess, sure. 00:22:50.410 --> 00:22:54.940 The reason for this is actually a reason of security. 00:22:54.940 --> 00:22:59.650 So let's distinguish ciphers, which we've been talking about, from hashes. 00:22:59.650 --> 00:23:03.370 So one of the most critical distinctions is 00:23:03.370 --> 00:23:07.130 that ciphers are generally reversible. 00:23:07.130 --> 00:23:09.177 You can undo what you did. 00:23:09.177 --> 00:23:12.010 That's the whole reason why it's important to share with your allies 00:23:12.010 --> 00:23:13.330 the key. 00:23:13.330 --> 00:23:17.050 But hashes are generally not reversible. 
00:23:17.050 --> 00:23:20.210 Or certainly, they're not supposed to be reversible. 00:23:20.210 --> 00:23:23.140 And so it turns out, and we'll learn about this a little bit 00:23:23.140 --> 00:23:28.150 later, when you log in to some service, if that service 00:23:28.150 --> 00:23:31.480 is doing a good job of protecting your data, 00:23:31.480 --> 00:23:35.350 the reason they can't just send you your password is because they actually 00:23:35.350 --> 00:23:36.800 don't know your password. 00:23:36.800 --> 00:23:39.820 And that might seem strange because clearly, there 00:23:39.820 --> 00:23:44.470 must be something-- if I type in my password then I get logged in. 00:23:44.470 --> 00:23:49.395 But a good service is one that does not store your password in the database. 00:23:49.395 --> 00:23:51.520 That's probably a good thing if you think about it. 00:23:51.520 --> 00:23:53.228 In case there was ever a data breach, you 00:23:53.228 --> 00:23:57.070 wouldn't want your password to be in their database. 00:23:57.070 --> 00:24:02.410 Instead what they do is they store a hash of your password in the database. 00:24:02.410 --> 00:24:06.250 And then when you provide your password to them, 00:24:06.250 --> 00:24:08.260 they run that password through the same thing, 00:24:08.260 --> 00:24:12.040 called a hash function, which is just a generic idea for a function that 00:24:12.040 --> 00:24:18.310 takes any arbitrarily large amount of data and maps it to some other range 00:24:18.310 --> 00:24:20.080 or some other set of values. 00:24:20.080 --> 00:24:25.700 Now that might be an arbitrarily long string of information. 00:24:25.700 --> 00:24:30.340 It might be some fixed string where if I run my password through this, 00:24:30.340 --> 00:24:33.400 I'm going to get back something that is always 20 characters long. 00:24:33.400 --> 00:24:35.440 But it looks nothing like my original password. 00:24:35.440 --> 00:24:38.550 I've just made some weird manipulations to it.
00:24:38.550 --> 00:24:41.650 And that's what happens in log-in systems more generally 00:24:41.650 --> 00:24:43.900 is you will log in to some service, you'll 00:24:43.900 --> 00:24:47.980 type in your password, when that information is then submitted 00:24:47.980 --> 00:24:51.100 to the organization to check your log-in credentials, 00:24:51.100 --> 00:24:54.490 they will run your password through that same hash function again. 00:24:54.490 --> 00:25:00.670 And if that value matches what they have in their database for you, 00:25:00.670 --> 00:25:04.540 that is how they know that you have provided the correct credentials. 00:25:04.540 --> 00:25:10.060 They're mapping-- they're matching some mapping of your password to the one 00:25:10.060 --> 00:25:13.930 that they have stored, but they're not actually checking your actual password. 00:25:13.930 --> 00:25:16.700 And that should probably give you some sense of security. 00:25:16.700 --> 00:25:20.890 And if you ever use a service where you end up having to click on that link 00:25:20.890 --> 00:25:24.045 and they actually send you your password, 00:25:24.045 --> 00:25:26.170 you probably don't want to use that service anymore 00:25:26.170 --> 00:25:33.100 because they're not taking strong enough precautions to protect your data. 00:25:33.100 --> 00:25:36.010 So as I said, once we have a password stored in the database, 00:25:36.010 --> 00:25:40.600 it is actually stored as a hash rather than as the password itself. 00:25:40.600 --> 00:25:46.720 The service should not be able to tell you what your password really is. 00:25:46.720 --> 00:25:48.670 So this idea of a hash function-- what is it? 
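The log-in check described here can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular service works: the dictionary `stored_hashes` stands in for a database, the function names are hypothetical, and SHA-256 is simply an assumed choice of hash function. Real systems typically layer further protections on top of this basic idea.

```python
import hashlib
import hmac

# Hypothetical stand-in for a database table: username -> hash of the password.
stored_hashes = {}


def hash_password(password: str) -> str:
    """Map a password to a fixed-length hex string (SHA-256 is an assumed choice)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Only the hash is stored; the password itself is never written down.
    stored_hashes[username] = hash_password(password)


def check_login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, hash_password(password))


if __name__ == "__main__":
    register("alice", "correct horse")
    print(check_login("alice", "correct horse"))  # same input, same hash
    print(check_login("alice", "wrong guess"))
```

Because only the hash is kept, the service in this sketch has nothing it could email back to you except a link to set a new password.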
00:25:48.670 --> 00:25:52.420 Well, as I said, it's something that takes any arbitrary data-- 00:25:52.420 --> 00:25:56.140 and eventually we'll get into hashing things like files and not just words 00:25:56.140 --> 00:25:58.690 or strings, but for now let's keep it to strings, strings 00:25:58.690 --> 00:26:00.940 being a sequence of characters or letters, like a word 00:26:00.940 --> 00:26:02.980 or a phrase or a sentence-- 00:26:02.980 --> 00:26:05.630 and mapping it to some other range. 00:26:05.630 --> 00:26:10.900 So we'll start out by just mapping a string, a set of letters, to a number. 00:26:10.900 --> 00:26:13.540 But it could be to a different string, a string that's 00:26:13.540 --> 00:26:15.950 always 10 characters long, and so on. 00:26:15.950 --> 00:26:19.060 So there are some properties that good hash functions have. 00:26:19.060 --> 00:26:21.100 Let's take a look at what some of these are. 00:26:21.100 --> 00:26:23.980 So they should use only the data being hashed. 00:26:23.980 --> 00:26:26.230 There shouldn't be anything else that comes into play. 00:26:26.230 --> 00:26:28.480 They shouldn't be bringing in any outside information. 00:26:28.480 --> 00:26:31.030 It should rely exclusively on whatever data is 00:26:31.030 --> 00:26:34.690 being passed in to the hash function. 00:26:34.690 --> 00:26:36.970 They should also use all of the data being hashed. 00:26:36.970 --> 00:26:42.700 It becomes a bit less effective if every time I provide a word or a string 00:26:42.700 --> 00:26:48.400 to my hash function, I'm only using the first letter of that string, 00:26:48.400 --> 00:26:50.590 such that my hash function for every word 00:26:50.590 --> 00:26:52.900 or every string I provide that starts with A 00:26:52.900 --> 00:26:55.180 is going to return the same value. 00:26:55.180 --> 00:26:57.160 That's not terribly useful to me. 00:26:57.160 --> 00:27:00.370 I want to get a better distribution of values. 
00:27:00.370 --> 00:27:02.590 Your hash function should be deterministic. 00:27:02.590 --> 00:27:06.640 And when we say deterministic, we mean no random elements to it. 00:27:06.640 --> 00:27:10.150 Oftentimes we think that random numbers are nice to jumble things up. 00:27:10.150 --> 00:27:12.820 But the problem is we want our hash function to always output 00:27:12.820 --> 00:27:16.580 the same value for the same inputs. 00:27:16.580 --> 00:27:19.120 So if I give you my password and hash it and I get 00:27:19.120 --> 00:27:21.498 some output, every time I provide my password 00:27:21.498 --> 00:27:23.290 and run it through that same hash function, 00:27:23.290 --> 00:27:25.780 I want to get the same output every time. 00:27:25.780 --> 00:27:30.010 And that's what sites rely on when they're using hashed passwords as part 00:27:30.010 --> 00:27:31.612 of the credentialing check. 00:27:31.612 --> 00:27:33.820 They're relying on the fact that they will always get 00:27:33.820 --> 00:27:36.340 the same output given the same input. 00:27:36.340 --> 00:27:38.470 So that's a requirement of a hash function. 00:27:38.470 --> 00:27:42.640 Hash functions should uniformly distribute data. 00:27:42.640 --> 00:27:45.770 So oftentimes you're mapping these strings, 00:27:45.770 --> 00:27:47.560 let's say, to some set of values. 00:27:47.560 --> 00:27:49.810 Those could be numbers, again, those could be strings. 00:27:49.810 --> 00:27:52.660 You want to spread those out evenly, ideally, 00:27:52.660 --> 00:27:55.630 across all of the possible values that you have. 00:27:55.630 --> 00:28:00.580 You don't want everything to hash to 15 if your range is 0 to 100. 00:28:00.580 --> 00:28:02.890 You'd ideally like everything to be spread out such 00:28:02.890 --> 00:28:05.710 that there's an equal number of 0s, 1s, 99s, 00:28:05.710 --> 00:28:10.970 and so on, as we talked about a little bit when we discussed hash tables. 
00:28:10.970 --> 00:28:14.990 Finally, we also want to be able to generate very different hash codes, 00:28:14.990 --> 00:28:18.140 very different values for very similar data. 00:28:18.140 --> 00:28:23.150 For example, LAW and LAWS should hash to two very different values. 00:28:23.150 --> 00:28:26.510 That would be ideal if a tiny bit of variation 00:28:26.510 --> 00:28:29.480 created a really dramatic ripple effect. 00:28:29.480 --> 00:28:31.970 And creating this really dramatic ripple effect 00:28:31.970 --> 00:28:35.780 is pretty key when we're talking about cryptographic hash functions, which 00:28:35.780 --> 00:28:38.540 we'll get to in a second, which form the basis of almost 00:28:38.540 --> 00:28:41.030 all modern cryptography, which form the basis of everything 00:28:41.030 --> 00:28:47.270 that we do that we rely on when we think of security in the computational field. 00:28:47.270 --> 00:28:50.780 It's almost always relying on these hash functions being really, really 00:28:50.780 --> 00:28:55.400 good at making small changes have very dramatic ripple effects 00:28:55.400 --> 00:28:59.480 in the hash code or the hash value, the data that comes out 00:28:59.480 --> 00:29:02.080 of the hash function. 00:29:02.080 --> 00:29:04.880 So after all this talk about good hash functions, 00:29:04.880 --> 00:29:07.098 let's take a look at a pretty bad hash function. 00:29:07.098 --> 00:29:08.140 And we'll talk about why. 00:29:08.140 --> 00:29:12.850 We'll talk about one of its virtues, but some of its potential problems as well. 00:29:12.850 --> 00:29:15.447 So instead, let's add up all of the ordinal positions 00:29:15.447 --> 00:29:17.030 of all the letters in the string being hashed. 00:29:17.030 --> 00:29:19.072 So this ordinal position idea is exactly the same 00:29:19.072 --> 00:29:22.340 as we had a moment ago when we were talking about Caesar and Vigenere. 00:29:22.340 --> 00:29:26.320 So A is 1, B is 2, and so on.
00:29:26.320 --> 00:29:30.220 So for example, for a word like STAR, if we want to add up the ordinal positions 00:29:30.220 --> 00:29:34.510 of all of the letters in that word, we have S-T-A-R. 00:29:34.510 --> 00:29:38.960 That's 19 plus 20 plus 1 plus 18. 00:29:38.960 --> 00:29:42.490 So if you do that math quickly, that ends up being 58. 00:29:42.490 --> 00:29:45.550 So what is a good thing about this hash function? 00:29:45.550 --> 00:29:47.320 Well, it's not reversible. 00:29:47.320 --> 00:29:53.170 If I get a 58, I don't necessarily know that the input that I had there 00:29:53.170 --> 00:29:54.730 was STAR. 00:29:54.730 --> 00:29:57.300 It could have been any one of a whole variety of things. 00:29:57.300 --> 00:30:01.450 It could have been ARTS or RATS or SWAP or PAWS 00:30:01.450 --> 00:30:06.460 or WASP or MULL or this whole random set of 29 Bs in a row. 00:30:06.460 --> 00:30:10.030 All of these things, when run through this really terrible hash 00:30:10.030 --> 00:30:13.840 function that I've defined here, all add up to 58 00:30:13.840 --> 00:30:16.150 when I follow the rules of this algorithm. 00:30:16.150 --> 00:30:20.090 So I never know what my input was given my output. 00:30:20.090 --> 00:30:21.130 That is a good thing. 00:30:21.130 --> 00:30:22.755 That is what a hash function should do. 00:30:22.755 --> 00:30:28.070 Hash functions, unlike ciphers, should not be reversible. 00:30:28.070 --> 00:30:33.200 But the problem that I have here is that I have a lot of collisions, right? 00:30:33.200 --> 00:30:37.960 There are a lot of different things that map to 58. 00:30:37.960 --> 00:30:40.880 And when we talked about collisions a little bit previously, 00:30:40.880 --> 00:30:43.580 we were talking about them in the context of a hash table. 00:30:43.580 --> 00:30:46.520 And collisions were OK in that context. 00:30:46.520 --> 00:30:48.860 We were just clustering things together. 
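The add-up-the-positions hash is small enough to write out directly. The sketch below assumes Python, and the function name `ordinal_sum_hash` is made up for illustration.

```python
def ordinal_sum_hash(word: str) -> int:
    """Add up the alphabet positions (A=1, B=2, ..., Z=26) of the letters in `word`."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper() if "A" <= ch <= "Z")


if __name__ == "__main__":
    # Several different inputs that land on the same value as STAR.
    for word in ["STAR", "ARTS", "RATS", "MULL", "B" * 29]:
        print(word, ordinal_sum_hash(word))
```

STAR, ARTS, RATS, MULL, and 29 Bs all come out to 58, which shows both the virtue (the output does not reveal the input) and the flaw (collisions are trivial to find). By the same arithmetic, SWAP, PAWS, and WASP each sum to 59 (19 + 23 + 1 + 16), so they collide with one another rather than with STAR.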
00:30:48.860 --> 00:30:50.920 If they all happened to have the same hash value, 00:30:50.920 --> 00:30:52.625 we'll just put them in the same bucket. 00:30:52.625 --> 00:30:54.500 When we're talking about cryptography though, 00:30:54.500 --> 00:30:59.870 when we start to get into relying on cryptography to keep our data secure, 00:30:59.870 --> 00:31:03.360 we can't have collisions at all. 00:31:03.360 --> 00:31:07.400 In fact, pretty much we rely on the fact that it is so mathematically 00:31:07.400 --> 00:31:12.740 unlikely, nigh impossible to have a collision in order for these things 00:31:12.740 --> 00:31:13.250 to work. 00:31:13.250 --> 00:31:16.760 And so collisions, when we're talking about cryptographic hash functions, 00:31:16.760 --> 00:31:19.630 are definitely not a good thing. 00:31:19.630 --> 00:31:23.850 So to recap, to check that a user gave us the correct password, if we're 00:31:23.850 --> 00:31:27.330 storing a hash of the password in the database versus just storing 00:31:27.330 --> 00:31:30.870 the plain text password in the database, which hopefully no one is storing 00:31:30.870 --> 00:31:33.300 a plain text password in the database, we 00:31:33.300 --> 00:31:37.530 run the actual password, the real password through the hash function. 00:31:37.530 --> 00:31:40.710 We get a hash value as an output, some string or some number 00:31:40.710 --> 00:31:42.670 or what have you as the output. 00:31:42.670 --> 00:31:46.170 And if we get a match, odds are they entered the right password. 00:31:46.170 --> 00:31:51.510 Now I'm saying odds are because we can't be 100% sure. 00:31:51.510 --> 00:31:54.300 And we can never be 100% sure. 00:31:54.300 --> 00:31:57.420 We can be really, really, really sure, but there's always 00:31:57.420 --> 00:31:59.130 a chance of a collision.
00:31:59.130 --> 00:32:02.670 Even with the best designed hash functions, even 00:32:02.670 --> 00:32:05.125 with the best designed cryptographic hash functions, 00:32:05.125 --> 00:32:06.750 there's always a chance of a collision. 00:32:06.750 --> 00:32:09.780 But ideally, that chance is quite infinitesimal. 00:32:09.780 --> 00:32:12.390 Very, very, very, very, very, very unlikely. 00:32:12.390 --> 00:32:16.860 So odds are if we get this hash, comes out of this hash function, 00:32:16.860 --> 00:32:21.120 it's quite likely, like 99.9% plus likely 00:32:21.120 --> 00:32:23.610 that they entered the correct password, this is, in fact, 00:32:23.610 --> 00:32:29.220 the user whose credentials are being verified, and we should log them in. 00:32:29.220 --> 00:32:32.113 Modern cryptography is just hashing. 00:32:32.113 --> 00:32:35.280 It's just hashing that's quite a bit more clever, certainly than the example 00:32:35.280 --> 00:32:37.890 that I just talked about a moment ago. 00:32:37.890 --> 00:32:42.235 Also, these algorithms tend not to work on a character by character basis. 00:32:42.235 --> 00:32:44.235 It's the algorithm that I just did as well where 00:32:44.235 --> 00:32:45.390 I was adding up every single letter. 00:32:45.390 --> 00:32:47.220 I was looking at each one individually. 00:32:47.220 --> 00:32:49.800 They tend to take, these modern algorithms 00:32:49.800 --> 00:32:53.700 tend to take clusters of letters, pairs or triples or so on at a time, 00:32:53.700 --> 00:32:55.215 maybe do even more things. 00:32:55.215 --> 00:32:57.840 They might rearrange the letters before they do things to them. 00:32:57.840 --> 00:33:02.490 So there's multiple layers going on with these encryption algorithms. 
00:33:02.490 --> 00:33:05.530 And unlike some of the ones I've discussed earlier, 00:33:05.530 --> 00:33:10.050 most of these also have the property where given data of arbitrary size-- 00:33:10.050 --> 00:33:13.620 and now we're starting to really expand our minds into not just words 00:33:13.620 --> 00:33:18.720 or strings, but also images, files, videos, documents, PDFs, 00:33:18.720 --> 00:33:23.100 and so on; anything can be run through a hash function to get a value-- 00:33:23.100 --> 00:33:26.550 but we're always going to get a string of bits, a bit string, that 00:33:26.550 --> 00:33:28.472 is always exactly the same size. 00:33:28.472 --> 00:33:30.180 So depending on the algorithm, maybe it's 00:33:30.180 --> 00:33:36.780 going to be a 160-bit long string, or a 256-bit long string. 00:33:36.780 --> 00:33:38.790 But our range is finite. 00:33:38.790 --> 00:33:42.480 It's always going to be exactly 256 bits. 00:33:42.480 --> 00:33:45.210 But the combination of those bits will be different, ideally, 00:33:45.210 --> 00:33:50.040 for every single piece of data we might throw at it, no matter what. 00:33:50.040 --> 00:33:53.970 OK, so let's expand our definition of a hash function 00:33:53.970 --> 00:33:57.180 to this idea of a cryptographic hash function. 00:33:57.180 --> 00:34:00.390 What properties should they have? 00:34:00.390 --> 00:34:05.880 They should be very difficult, very, very difficult, basically impossible 00:34:05.880 --> 00:34:06.930 to reverse. 00:34:06.930 --> 00:34:12.000 It should be computationally impossible for anybody to undo the encryption. 00:34:12.000 --> 00:34:15.060 That's pretty much the same as a regular hash function. 00:34:15.060 --> 00:34:17.940 We're just really hammering the point home when we say this here. 00:34:17.940 --> 00:34:19.560 They should still be deterministic. 00:34:19.560 --> 00:34:22.070 We don't want any random elements to it. 
00:34:22.070 --> 00:34:23.940 We still want to hash a value and always 00:34:23.940 --> 00:34:28.469 get the same output no matter what if we run that same value through the hash 00:34:28.469 --> 00:34:31.949 function an arbitrary number of times. 00:34:31.949 --> 00:34:34.739 They should still generate very different hash codes 00:34:34.739 --> 00:34:36.239 for very similar data. 00:34:36.239 --> 00:34:38.370 We still want things to be spread out and we want 00:34:38.370 --> 00:34:42.020 minor changes to have dramatic effect. 00:34:42.020 --> 00:34:43.750 And they should never-- 00:34:43.750 --> 00:34:46.440 and this is one of those words that computer scientists love-- 00:34:46.440 --> 00:34:52.620 they should never allow two different sets of data to hash to the same value. 00:34:52.620 --> 00:34:56.830 Do you see a potential problem when we frame it in this way? 00:34:56.830 --> 00:35:00.150 When we say they should never be able to do that? 00:35:00.150 --> 00:35:02.910 We've already restricted ourselves to a finite domain, right? 00:35:02.910 --> 00:35:10.530 I said a moment ago, maybe this hash function maps to 160-bit long strings. 00:35:10.530 --> 00:35:14.760 There's only so many combinations of 160 bits. 00:35:14.760 --> 00:35:18.960 Now that might be an unfathomably large number, but using the word never 00:35:18.960 --> 00:35:21.060 there becomes a bit dangerous. 00:35:21.060 --> 00:35:24.000 We can't really rely on that. 00:35:24.000 --> 00:35:26.940 And we'll see why this could potentially be a problem. 00:35:26.940 --> 00:35:31.088 This static length string, by the way, is usually referred to as a digest 00:35:31.088 --> 00:35:31.755 in this context. 00:35:31.755 --> 00:35:34.650 When we start to talk about more modern cryptography techniques, 00:35:34.650 --> 00:35:36.720 the output of a cryptographic hash function 00:35:36.720 --> 00:35:40.365 is usually referred to as a digest.
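Those properties (same input gives the same output, fixed-length digest, small changes rippling through the whole result) can be observed with a real cryptographic hash function. The sketch below assumes Python's standard `hashlib` module and uses SHA-256 purely as an example; the helper names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 digest bits differ between two inputs."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


if __name__ == "__main__":
    print(sha256_hex(b"LAW"))
    print(sha256_hex(b"LAWS"))
    # The digest length stays the same however large the input is.
    print(len(sha256_hex(b"LAW")), len(sha256_hex(b"LAWS" * 100000)))
    print(differing_bits(b"LAW", b"LAWS"), "of 256 bits differ")
```

With a well-designed function, adding a single letter should flip something close to half of the output bits, which is the dramatic ripple effect the lecture describes.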
00:35:40.365 --> 00:35:42.990 Let's take a look at one of these cryptographic hash functions. 00:35:42.990 --> 00:35:45.420 And certainly I'm not going to dive into the mathematics of it. 00:35:45.420 --> 00:35:46.800 I wouldn't be able to explain the mathematics. 00:35:46.800 --> 00:35:49.440 I wouldn't be able to do it justice if I tried to explain the mathematics of it. 00:35:49.440 --> 00:35:52.080 But let's just take a look at some of the basics of this. 00:35:52.080 --> 00:35:53.510 So SHA-1. 00:35:53.510 --> 00:35:55.710 SHA-1 is quite a famous algorithm. 00:35:55.710 --> 00:36:00.900 It was designed by the National Security Agency in the mid-1990s. 00:36:00.900 --> 00:36:06.060 So these are really smart people who are tasked with working 00:36:06.060 --> 00:36:08.820 with things like military intelligence. 00:36:08.820 --> 00:36:14.870 These are people who are dedicating their lives to trying to protect data 00:36:14.870 --> 00:36:17.150 as best as they possibly can. 00:36:17.150 --> 00:36:20.240 Far more brilliant minds than I, for sure. 00:36:20.240 --> 00:36:22.700 And this hash function-- and this is a published paper. 00:36:22.700 --> 00:36:27.290 Hash functions tend to be, actually it's this very strange dichotomy where 00:36:27.290 --> 00:36:30.200 you describe exactly how the function works, 00:36:30.200 --> 00:36:32.860 but it still should be irreversible. 00:36:32.860 --> 00:36:36.980 And this just really becomes a question of incredibly complicated mathematics 00:36:36.980 --> 00:36:40.640 involved, such that even if you knew so many of the pieces going in, 00:36:40.640 --> 00:36:44.000 you still might not-- you still wouldn't be able to undo it, even if you tried. 00:36:44.000 --> 00:36:45.760 It's kind of amazing actually. 00:36:45.760 --> 00:36:50.970 SHA-1's digests are always 160 bits in length. 00:36:50.970 --> 00:36:53.810 So this is one of those ones I just said a moment ago. 
00:36:53.810 --> 00:36:59.810 That means that there are 2 to the 160 different SHA-1 digests, which 00:36:59.810 --> 00:37:01.550 is a bit over 10 to the 48th power. 00:37:01.550 --> 00:37:06.380 And again, 2 to 160 means for every single one of the 160 bits, 00:37:06.380 --> 00:37:09.290 that could be a 0 or a 1. 00:37:09.290 --> 00:37:15.540 So we have that, two options times two options times two options, 160 times. 00:37:15.540 --> 00:37:20.815 Just to try and make it fathomable, to understand how large this number is, 00:37:20.815 --> 00:37:22.440 let me try and paint a picture for you. 00:37:22.440 --> 00:37:30.270 So imagine that you are looking on Earth for a specific grain of sand. 00:37:30.270 --> 00:37:35.690 You're looking for one specific grain of sand on Earth. 00:37:35.690 --> 00:37:45.260 That is easier by far than trying to have SHA-1 have a collision where 00:37:45.260 --> 00:37:47.070 two values would map to the same thing. 00:37:47.070 --> 00:37:52.030 There's about 10 to the 18 grains of sand on Earth. 00:37:52.030 --> 00:37:53.860 So that's eight quintillion-- 00:37:53.860 --> 00:37:55.240 I had to look up that word-- 00:37:55.240 --> 00:37:57.460 eight quintillion grains of sand. 00:37:57.460 --> 00:37:59.768 So way easier to find the grain of sand on Earth 00:37:59.768 --> 00:38:01.060 than it is to have a collision. 00:38:01.060 --> 00:38:04.060 In fact, we go even further and say that imagine 00:38:04.060 --> 00:38:07.000 that every single one of those grains of sand 00:38:07.000 --> 00:38:12.780 was another planet Earth, each of which also had sand on it. 00:38:12.780 --> 00:38:16.590 So you have eight quintillion planet Earths. 00:38:16.590 --> 00:38:19.230 You're trying to find a specific grain of sand 00:38:19.230 --> 00:38:23.220 on one of those eight quintillion planets. 00:38:23.220 --> 00:38:29.900 It's still easier than trying to have a collision with SHA-1. 
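The sizes in this comparison are easy to check, since Python integers have no upper limit. This is just the arithmetic from the passage written out; the figure of eight quintillion grains of sand is the lecture's own estimate, and the variable names are made up.

```python
# Number of distinct 160-bit digests: two choices for each of 160 bits.
digest_count = 2 ** 160

# The lecture's estimate: about eight quintillion grains of sand on Earth.
sand_grains = 8 * 10 ** 18

# Every grain becomes its own Earth, each with that many grains again.
nested_sand_grains = sand_grains ** 2

if __name__ == "__main__":
    print(digest_count)
    print(len(str(digest_count)), "digits")
    print(digest_count > nested_sand_grains)
    print(digest_count // nested_sand_grains, "digests per nested grain of sand")
```

Even after nesting the grains of sand once, there are still billions of possible digests for every grain.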
00:38:29.900 --> 00:38:33.650 SHA-1 is such an important algorithm that it's actually 00:38:33.650 --> 00:38:36.717 one of the algorithms that is required in federal regulations 00:38:36.717 --> 00:38:39.050 to be used by the government for encrypting information. 00:38:39.050 --> 00:38:41.540 There are others as well, but SHA-1 is listed 00:38:41.540 --> 00:38:48.120 by the National Institute of Standards and Technology as a standard algorithm. 00:38:48.120 --> 00:38:53.310 But there's a problem, which is that SHA-1 is broken. 00:38:53.310 --> 00:38:56.830 And it has this clever website called SHAttered, shattered.io. 00:38:56.830 --> 00:39:02.800 So the research team figured out how to intentionally 00:39:02.800 --> 00:39:04.470 create a collision. 00:39:04.470 --> 00:39:07.570 And intentionally creating a collision has the effect of basically saying, 00:39:07.570 --> 00:39:11.080 this cryptographic hash function is broken. 00:39:11.080 --> 00:39:15.070 And they have proven that there is a way that they can systematically 00:39:15.070 --> 00:39:17.230 generate collisions. 00:39:17.230 --> 00:39:19.330 So that's bad. 00:39:19.330 --> 00:39:22.430 And we'll see why that's bad in just a moment. 00:39:22.430 --> 00:39:24.280 But you can go to this URL, shattered.io, 00:39:24.280 --> 00:39:26.488 and read quite a bit about how the researchers do it. 00:39:26.488 --> 00:39:28.493 They explain it in different levels. 00:39:28.493 --> 00:39:31.660 So if you really want to dive into the technology and the mathematics of it, 00:39:31.660 --> 00:39:32.410 you're certainly welcome to. 00:39:32.410 --> 00:39:35.830 If you just want to understand it at a base level and why this is a problem, 00:39:35.830 --> 00:39:38.080 I definitely encourage you to take a look at this site 00:39:38.080 --> 00:39:39.400 and read more about this. 00:39:39.400 --> 00:39:42.620 So what did these researchers do?
00:39:42.620 --> 00:39:45.860 So they said, It is now practically possible 00:39:45.860 --> 00:39:50.780 to craft two colliding PDF files and obtain a SHA-1 digital signature 00:39:50.780 --> 00:39:53.990 on the first PDF file, which can also be abused 00:39:53.990 --> 00:39:57.047 as a valid signature on the second PDF file. 00:39:57.047 --> 00:39:58.880 In short, what they're basically saying here 00:39:58.880 --> 00:40:02.210 is we were able to create two PDF files such 00:40:02.210 --> 00:40:07.760 that if I run them through the SHA-1 algorithm, the digest that I get 00:40:07.760 --> 00:40:09.560 is the same. 00:40:09.560 --> 00:40:10.930 Why is this potentially bad? 00:40:13.890 --> 00:40:17.040 For example, by crafting the two colliding PDF files 00:40:17.040 --> 00:40:20.310 as two rental agreements with different rent, 00:40:20.310 --> 00:40:22.530 it is possible to trick someone to create 00:40:22.530 --> 00:40:26.880 a valid signature for a high-rent contract 00:40:26.880 --> 00:40:30.150 by having him or her sign a low-rent contract. 00:40:30.150 --> 00:40:33.090 If you can take a PDF and twist it into anything 00:40:33.090 --> 00:40:36.225 you want it to be, but have a valid signature, 00:40:36.225 --> 00:40:41.870 a valid SHA hash associated with it, that's not great. 00:40:41.870 --> 00:40:44.420 Now before alarm bells start going off because SHA-1 is still 00:40:44.420 --> 00:40:49.010 used quite extensively, even now-- this SHAttered research result 00:40:49.010 --> 00:40:53.300 was released in 2017, but SHA-1 is still 00:40:53.300 --> 00:40:56.510 being used now, even so. 00:40:56.510 --> 00:41:00.200 Before you panic though, it has not been broken that many times, 00:41:00.200 --> 00:41:01.220 although they did very-- 00:41:01.220 --> 00:41:05.720 they worked for two years to create this PDF collision. 00:41:05.720 --> 00:41:08.090 And they demonstrated a method for how to do it.
00:41:08.090 --> 00:41:11.120 It has still not happened that many times. 00:41:11.120 --> 00:41:14.270 Cryptographic hash functions, once they've demonstrated one collision, 00:41:14.270 --> 00:41:15.050 are broken. 00:41:15.050 --> 00:41:16.850 That is certainly true. 00:41:16.850 --> 00:41:21.050 But the actual effects of this have not yet really materialized. 00:41:21.050 --> 00:41:24.020 The computational power required to create this 00:41:24.020 --> 00:41:29.660 is well beyond the capabilities of most people, or most syndicates even. 00:41:29.660 --> 00:41:31.440 So no cause for alarm yet. 00:41:31.440 --> 00:41:36.320 But it does show that there is a limitation with SHA-1, 00:41:36.320 --> 00:41:39.980 and we still want to always be staying one step ahead. 00:41:39.980 --> 00:41:43.040 Just like when Julius Caesar's enemies figured out 00:41:43.040 --> 00:41:46.940 how to crack the Caesar Cipher, the goal was, we need to get one step ahead. 00:41:46.940 --> 00:41:49.490 As technologists, we always want to stay one step ahead 00:41:49.490 --> 00:41:52.670 to make sure that we are doing our best job protecting our data. 00:41:52.670 --> 00:41:54.420 And as lawyers, we want to make sure we're 00:41:54.420 --> 00:41:56.810 doing our best job protecting our clients' data 00:41:56.810 --> 00:42:00.220 against potential adversarial attacks. 00:42:00.220 --> 00:42:02.608 So as I mentioned, there are other standards 00:42:02.608 --> 00:42:05.650 that are in use by other organizations, including the federal government. 00:42:05.650 --> 00:42:09.700 SHA-1, as I mentioned, is just one of a few different options that they use. 00:42:09.700 --> 00:42:14.230 SHA-2 and SHA-3 are much more robust algorithms. 00:42:14.230 --> 00:42:16.870 They use more bits, basically, in their digest. 00:42:16.870 --> 00:42:20.740 So instead of being 160 bits, you can have anywhere between 220 00:42:20.740 --> 00:42:22.270 and 500 or so bits. 
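The digest sizes mentioned here can be read straight from Python's standard `hashlib` module. This is a small sketch rather than anything from the lecture; the helper name `digest_bits` and the list `ALGORITHMS` are made up for illustration.

```python
import hashlib

# SHA-1 plus the fixed-length members of the SHA-2 and SHA-3 families in hashlib.
ALGORITHMS = [
    "sha1",
    "sha224", "sha256", "sha384", "sha512",
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",
]


def digest_bits(name: str) -> int:
    """Return the digest length, in bits, of the named hash algorithm."""
    # digest_size is reported in bytes, so multiply by 8.
    return hashlib.new(name).digest_size * 8


if __name__ == "__main__":
    for name in ALGORITHMS:
        print(f"{name}: {digest_bits(name)} bits")
```

The exact sizes behind "220 to 500 or so" are 224, 256, 384, and 512 bits, for both SHA-2 and SHA-3.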
00:42:22.270 --> 00:42:25.960 So a way larger space of digests, reducing the likelihood 00:42:25.960 --> 00:42:27.540 of a collision that much more. 00:42:27.540 --> 00:42:32.530 Again, imagine how unlikely it was with 2 to the 160. 00:42:32.530 --> 00:42:34.620 Now we make it even more so. 00:42:34.620 --> 00:42:41.050 512 bits, that's unfathomably large and difficult to duplicate. 00:42:41.050 --> 00:42:45.100 MD5 and MD6 are other cryptographic hash functions, or hash functions 00:42:45.100 --> 00:42:46.530 that you may encounter. 00:42:46.530 --> 00:42:50.860 MD5 in particular I've highlighted here in yellow because it's not actually 00:42:50.860 --> 00:42:53.350 considered secure anymore, but it's still very, very 00:42:53.350 --> 00:42:55.390 commonly used as a checksum. 00:42:55.390 --> 00:42:59.170 Basically, what we do is we run a file through MD5. 00:42:59.170 --> 00:43:02.260 And say we're a distributor of a file and we 00:43:02.260 --> 00:43:04.390 want people to come download our source, and they 00:43:04.390 --> 00:43:08.830 want to be able to trust our source, we might run our file through MD5 and say, 00:43:08.830 --> 00:43:14.560 if you run this file through MD5, the hash will be blah blah blah blah blah. 00:43:14.560 --> 00:43:17.680 And other people can then download the file and run it through MD5. 00:43:17.680 --> 00:43:20.347 It's usually a program that is available on computers for people 00:43:20.347 --> 00:43:24.190 to just run any arbitrary data through to get a hash result. 00:43:24.190 --> 00:43:27.790 And they can check, OK, the hash value that I received from this trusted 00:43:27.790 --> 00:43:32.150 source matches the hash value that I was told I would receive, 00:43:32.150 --> 00:43:33.790 and so I will trust this. 00:43:33.790 --> 00:43:35.800 Versus perhaps getting that same software 00:43:35.800 --> 00:43:38.740 from some corner of the internet that you don't really trust.
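The checksum routine described above can be sketched in a few lines of Python. The file contents here are made-up placeholder bytes, and the helper names `md5_hex` and `matches_published` are hypothetical. Some locked-down Python builds disable MD5, in which case another `hashlib` algorithm can be swapped in the same way.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()

# The distributor hashes the file once and publishes the result.
original = b"contents of the file the distributor published"
published_checksum = md5_hex(original)

def matches_published(downloaded: bytes) -> bool:
    """Hash whatever was downloaded and compare it with the published value."""
    return md5_hex(downloaded) == published_checksum

good = matches_published(original)
bad = matches_published(original + b" plus something extra")
print(published_checksum, good, bad)
```

A match only says the download is the same file the distributor hashed. Because MD5 collisions can be manufactured, it guards against accidental corruption much better than against a determined attacker.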
00:43:38.740 --> 00:43:41.050 If you find the MD5 hash of the trusted source 00:43:41.050 --> 00:43:45.562 does not match what you downloaded and what you thought was that same file, 00:43:45.562 --> 00:43:47.770 it's probably a sign that something has changed in it 00:43:47.770 --> 00:43:49.060 and you don't really want to-- 00:43:49.060 --> 00:43:52.060 you might want to be skeptical about trusting that file rather than just 00:43:52.060 --> 00:43:54.790 diving right into it. 00:43:54.790 --> 00:44:00.070 So what do we do that relies on cryptography on the internet today? 00:44:00.070 --> 00:44:03.580 Or you know, just using our computers every day. 00:44:03.580 --> 00:44:04.570 Email. 00:44:04.570 --> 00:44:07.210 Email relies pretty extensively on cryptography, 00:44:07.210 --> 00:44:11.897 particularly when we start to use secure email services, of which Gmail might 00:44:11.897 --> 00:44:14.230 not be considered one, but there are services out there, 00:44:14.230 --> 00:44:18.520 for example, ProtonMail and others, that do encrypt email completely 00:44:18.520 --> 00:44:19.930 from point to point. 00:44:19.930 --> 00:44:25.120 Much safer in terms of protecting one's communications. 00:44:25.120 --> 00:44:29.410 Similarly, you may be familiar with the mobile app Signal, which is also 00:44:29.410 --> 00:44:33.760 used to encrypt communications between two people, by text message 00:44:33.760 --> 00:44:38.470 rather than by email. 00:44:38.470 --> 00:44:41.200 Secure web browsing, you may be familiar with this distinction 00:44:41.200 --> 00:44:44.012 between HTTP and HTTPS. 00:44:44.012 --> 00:44:45.220 And if you're not, that's OK. 00:44:45.220 --> 00:44:47.887 We're going to be talking about that a little bit later as well.
00:44:47.887 --> 00:44:51.160 But you want to make sure that your web traffic is encrypted against people 00:44:51.160 --> 00:44:55.360 who are able to just monitor the network for all the traffic that is going by. 00:44:55.360 --> 00:45:00.640 You probably don't want your searches to be someone else's 00:45:00.640 --> 00:45:02.980 fodder for entertainment. 00:45:02.980 --> 00:45:03.740 VPNs. 00:45:03.740 --> 00:45:06.990 If you use a VPN, that's a great thing to do if you're traveling, for example, 00:45:06.990 --> 00:45:11.290 and you may be on less secure networks than you might find at your business 00:45:11.290 --> 00:45:16.210 or at home or at a university institution, for example. 00:45:16.210 --> 00:45:19.990 VPNs allow you to encrypt communications with a network, 00:45:19.990 --> 00:45:24.670 and also allow the network to pretend to do something on your behalf so that 00:45:24.670 --> 00:45:28.450 your web traffic cannot be traced back to you directly, 00:45:28.450 --> 00:45:31.600 which might be advantageous in some situations as well. 00:45:31.600 --> 00:45:32.890 Document storage as well. 00:45:32.890 --> 00:45:37.330 So if you use services like Dropbox, for example, generally what 00:45:37.330 --> 00:45:40.960 Dropbox is going to do is break your document into pieces 00:45:40.960 --> 00:45:42.130 and encrypt those pieces. 00:45:42.130 --> 00:45:46.300 Rather than just storing the whole file writ large in some server somewhere 00:45:46.300 --> 00:45:49.210 on the cloud, it's going to encrypt it before it sends it over 00:45:49.210 --> 00:45:52.840 so that you have some more comfort that your data is being 00:45:52.840 --> 00:45:54.910 protected by these cloud services. 00:45:54.910 --> 00:45:58.510 And certainly, we're going to talk a bit more about what the cloud is 00:45:58.510 --> 00:46:01.720 and what cloud services are and what they can be used for a little bit 00:46:01.720 --> 00:46:05.763 later in the course as well. 
00:46:05.763 --> 00:46:08.180 Hash functions and cryptographic hash functions are great, 00:46:08.180 --> 00:46:11.330 but they are well documented and there's only the one. 00:46:11.330 --> 00:46:13.460 There's only one version of SHA-1. 00:46:13.460 --> 00:46:15.950 There's only one version of SHA-3. 00:46:15.950 --> 00:46:17.585 And that is a limitation. 00:46:17.585 --> 00:46:20.820 Now it might not be a severe one because it's pretty strong. 00:46:20.820 --> 00:46:22.760 They're pretty strong algorithms. 00:46:22.760 --> 00:46:27.230 But are there ways that we can improve our own cryptographic techniques 00:46:27.230 --> 00:46:30.380 if we're trying to protect data that we are receiving, 00:46:30.380 --> 00:46:32.240 data that we are sending, and so on? 00:46:32.240 --> 00:46:35.580 And that leads to this idea of public-key cryptography, 00:46:35.580 --> 00:46:38.930 or public- and private-key cryptography, or asymmetric encryption. 00:46:38.930 --> 00:46:41.870 You'll hear these terms kind of used interchangeably. 00:46:41.870 --> 00:46:46.640 Let's start by talking about public-key cryptography by way of an analogy. 00:46:46.640 --> 00:46:50.510 We're going to go way back to arithmetic and algebra days here. 00:46:50.510 --> 00:46:52.820 So imagine we have something like this. 00:46:52.820 --> 00:46:58.220 We have 14 times 8 equals 112. 00:46:58.220 --> 00:47:01.460 Multiplication we can think of as a function. 00:47:01.460 --> 00:47:02.670 It is a function. 00:47:02.670 --> 00:47:08.720 If 14 is our input and our function is times 8, the result is 112. 00:47:08.720 --> 00:47:11.780 Now multiplication is not a hash function because it is reversible. 00:47:11.780 --> 00:47:17.060 I can take that 112, multiply it by 1/8, or equivalently divide by 8, 00:47:17.060 --> 00:47:18.750 and get back the original input. 00:47:18.750 --> 00:47:24.650 So multiplication is a function, but it is not a hash function.
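Here is that arithmetic as a small Python sketch. The function names `times_8` and `undo_times_8` are made up for illustration, and `Fraction` is used so that multiplying by 1/8 stays exact.

```python
from fractions import Fraction

def times_8(n):
    """The forward function from the example: multiply the input by 8."""
    return n * 8

def undo_times_8(z):
    """The reverse direction: multiply by 1/8, which is the same as dividing by 8."""
    return z * Fraction(1, 8)

scrambled = times_8(14)
recovered = undo_times_8(scrambled)
print(scrambled, recovered)
```

Knowing the function is enough to get from any output back to its input, which is exactly what a hash function is designed to prevent.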
00:47:24.650 --> 00:47:30.110 It is reversible because if we multiply any number x by some other number y, 00:47:30.110 --> 00:47:31.760 we get a result z. 00:47:31.760 --> 00:47:37.220 And we can undo that whole process by taking z, multiplying it by 1 over y, 00:47:37.220 --> 00:47:40.370 or the reciprocal of y, and getting back the original x. 00:47:40.370 --> 00:47:41.090 Reversible. 00:47:41.090 --> 00:47:44.620 Goes in both directions. 00:47:44.620 --> 00:47:47.740 Now let's take this function and kind of obscure it. 00:47:47.740 --> 00:47:54.040 We know for ourselves that this function that I'm using is n times 8. 00:47:54.040 --> 00:47:58.600 Whatever I pass in is going to be multiplied by 8. 00:47:58.600 --> 00:48:01.240 But I don't tell you what that is. 00:48:01.240 --> 00:48:03.310 I don't tell my friends what that is. 00:48:03.310 --> 00:48:05.890 I just say, hey, if you want to send me a message, 00:48:05.890 --> 00:48:08.393 just run it through this function. 00:48:08.393 --> 00:48:10.810 So again, we're going to just use math as an example here. 00:48:10.810 --> 00:48:15.350 If my message is 14, I might say, f of 14-- 00:48:15.350 --> 00:48:18.700 and again, this is getting back to algebra, maybe a little bit 00:48:18.700 --> 00:48:20.100 back in the day-- 00:48:20.100 --> 00:48:23.020 f of 14 is 112. 00:48:23.020 --> 00:48:27.295 That is my public key, you might think. 00:48:27.295 --> 00:48:29.920 And you might say, having just gone through this whole example, 00:48:29.920 --> 00:48:32.620 that, well, it's pretty easy to undo that. 00:48:32.620 --> 00:48:35.620 If I know that 14 is the plain text and 112 is the cipher text, 00:48:35.620 --> 00:48:39.400 I can probably figure out that your function is n times 8. 00:48:39.400 --> 00:48:41.620 And so I've broken your encryption scheme. 00:48:41.620 --> 00:48:46.210 I have figured out how to reverse your cryptography. 
00:48:46.210 --> 00:48:49.780 Well, it's true that n times 8 is certainly one function 00:48:49.780 --> 00:48:55.150 that I could use to turn that plain text, 14 in this example, 00:48:55.150 --> 00:48:58.700 into that cipher text, 112 in this example. 00:48:58.700 --> 00:49:01.330 But there are other ways that I can do it. 00:49:01.330 --> 00:49:05.300 My actual function could have been n times 10 minus 28. 00:49:05.300 --> 00:49:09.010 So 14 times 10 is 140, minus 28 is 112. 00:49:09.010 --> 00:49:11.230 And there are other contrived mathematical examples 00:49:11.230 --> 00:49:16.240 that I could continue to do pretty much ad infinitum to define 00:49:16.240 --> 00:49:20.440 ways to transform 14 into 112. 00:49:20.440 --> 00:49:26.140 So just because you see that 112, that doesn't mean you 00:49:26.140 --> 00:49:30.620 have figured out how to break my hash function. 00:49:30.620 --> 00:49:34.000 You haven't figured out what my encryption technique is. 00:49:34.000 --> 00:49:37.180 If all I say is, here's a black box that I would like you to feed an input 00:49:37.180 --> 00:49:45.250 into, even if you see the output, you, or really more concernedly an adversary 00:49:45.250 --> 00:49:50.180 who sees that output as well should not be able to, or cannot in this case, 00:49:50.180 --> 00:49:50.680 undo it. 00:49:50.680 --> 00:49:52.840 Because yes, I could have been using n times 8. 00:49:52.840 --> 00:49:57.890 I could have been using this crazy thing involving the square of n. 00:49:57.890 --> 00:50:01.780 And that's kind of the idea behind public-key cryptography. 00:50:01.780 --> 00:50:07.120 I am going to publicize that I have a function that can be used, 00:50:07.120 --> 00:50:10.450 but I'm not going to tell you what that function is, 00:50:10.450 --> 00:50:14.120 and I'm certainly not going to tell you how to reverse it. 
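A short Python sketch of that point: several different functions all send 14 to 112, so seeing one input and one output does not pin down which function was used. The first two functions come straight from the example above. The third, `square_minus_84`, is a made-up stand-in for the "crazy thing involving the square of n".

```python
def times_8(n):
    return n * 8

def times_10_minus_28(n):
    return n * 10 - 28

def square_minus_84(n):
    # Hypothetical candidate, chosen only because 14 * 14 - 84 is also 112.
    return n * n - 84

candidates = [times_8, times_10_minus_28, square_minus_84]

# All three agree on the one message an eavesdropper got to see...
outputs_for_14 = [f(14) for f in candidates]

# ...but they disagree as soon as the message changes.
outputs_for_15 = [f(15) for f in candidates]

print(outputs_for_14, outputs_for_15)
```

This is only an analogy. Real schemes stay safe even when an adversary sees many input and output pairs, which simple arithmetic like this would not survive.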
00:50:14.120 --> 00:50:19.330 So public- and private-key cryptography is actually two functions 00:50:19.330 --> 00:50:21.640 where the goal is for each to reverse the other. 00:50:21.640 --> 00:50:24.100 We talked about how hash functions 00:50:24.100 --> 00:50:26.380 are supposed to be irreversible. 00:50:26.380 --> 00:50:32.860 But the distinction here is that we are creating two functions, f and g, which 00:50:32.860 --> 00:50:35.150 are intended to reverse one another. 00:50:35.150 --> 00:50:38.720 So it's not that there is one single function that is reversible, 00:50:38.720 --> 00:50:43.120 it is that we have two functions that, working together, create a circuit. 00:50:43.120 --> 00:50:47.800 If I take data and I run it through function f, I get some output. 00:50:47.800 --> 00:50:52.030 If I run that output through function g, I get back the original data. 00:50:52.030 --> 00:50:53.540 I have deciphered the information. 00:50:53.540 --> 00:50:55.040 And the same thing works in reverse. 00:50:55.040 --> 00:50:58.420 If I take some data and I run it through function g, 00:50:58.420 --> 00:51:02.180 I get some scrambled output that makes no sense. 00:51:02.180 --> 00:51:06.160 And if I run that scrambled output through function f, 00:51:06.160 --> 00:51:09.500 I get back the original data once again. 00:51:09.500 --> 00:51:13.870 Now the key is that-- pun intended-- the key is that one of these functions 00:51:13.870 --> 00:51:15.910 is public and the other one is private. 00:51:15.910 --> 00:51:19.600 One of them is available to everybody, and everybody uses 00:51:19.600 --> 00:51:22.300 that function to send you messages. 00:51:22.300 --> 00:51:25.540 If you want to send me a message using encryption, 00:51:25.540 --> 00:51:29.290 using public and private key encryption, you take the message 00:51:29.290 --> 00:51:32.950 and you use my public key to encrypt it, and you 00:51:32.950 --> 00:51:36.250 send me the result, the encrypted result.
00:51:36.250 --> 00:51:38.950 And I use my private key to decrypt it. 00:51:38.950 --> 00:51:41.020 And I am, ostensibly, the only person who 00:51:41.020 --> 00:51:43.810 has my private key, even though I've broadcasted, 00:51:43.810 --> 00:51:47.650 made my public key widely available. 00:51:47.650 --> 00:51:52.300 Now the math that goes into this is well beyond the scope of a discussion 00:51:52.300 --> 00:51:53.920 that we're going to have here today. 00:51:53.920 --> 00:51:57.970 But basically, and most encryption, most cryptography 00:51:57.970 --> 00:52:00.760 involves the use of prime numbers, particularly 00:52:00.760 --> 00:52:02.517 very, very large prime numbers. 00:52:02.517 --> 00:52:05.350 And you're looking for prime numbers that have a particular pattern. 00:52:05.350 --> 00:52:07.942 And when I say "you're" looking for it, don't worry, 00:52:07.942 --> 00:52:09.400 you don't have to do this yourself. 00:52:09.400 --> 00:52:11.380 There are plenty of programs out there, RSA 00:52:11.380 --> 00:52:14.140 being a very popular one, that can be used to generate 00:52:14.140 --> 00:52:16.810 these public and private key pairs. 00:52:16.810 --> 00:52:22.330 But the amazing thing is that it can generate these pairs very quickly, 00:52:22.330 --> 00:52:26.080 but it's almost impossible to break or figure out 00:52:26.080 --> 00:52:28.390 what the underlying functions, or even in this case 00:52:28.390 --> 00:52:30.900 what the underlying two prime numbers are 00:52:30.900 --> 00:52:33.460 that are the foundation for your own encryption strategy. 00:52:33.460 --> 00:52:37.750 So it's pretty amazing that it's easy to define these functions 00:52:37.750 --> 00:52:43.757 and almost impossible to reverse engineer them, so to speak. 
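To give a feel for how two primes turn into a key pair, here is a toy Python sketch in the style of textbook RSA. Strictly speaking RSA is the name of the algorithm, and real tools that implement it use primes hundreds of digits long plus padding. The tiny primes 61 and 53, the exponent 17, and the helper `make_toy_keypair` are illustrative assumptions only. It needs Python 3.8 or later for `pow(e, -1, phi)`.

```python
from math import gcd

def make_toy_keypair(p: int, q: int, e: int = 17):
    """Build a textbook-style key pair from two primes (toy sizes, no padding)."""
    n = p * q                      # the modulus, shared by both keys
    phi = (p - 1) * (q - 1)        # easy to compute only if you know p and q
    if gcd(e, phi) != 1:
        raise ValueError("e must share no factor with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)            # private exponent: the inverse of e modulo phi
    return (e, n), (d, n)

public_key, private_key = make_toy_keypair(61, 53)
print("public:", public_key, "private:", private_key)
```

Generating the pair is quick, but recovering `d` from the public key means recovering the two primes from `n`. With toy numbers that is trivial. With primes of realistic size, no efficient method is publicly known.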
00:52:43.757 --> 00:52:46.840 So we start with a huge prime number, we find some other prime number that 00:52:46.840 --> 00:52:49.960 has a property, a special property related to it, 00:52:49.960 --> 00:52:54.190 and from those two numbers we generate two functions whose goal in life 00:52:54.190 --> 00:52:56.740 is to undo whatever the other one does. 00:52:56.740 --> 00:53:03.370 So f's job is to undo what g does, g's job is to undo what f does. 00:53:03.370 --> 00:53:05.490 And this is called a public and private key pair. 00:53:05.490 --> 00:53:10.180 So your public key is really some complicated function 00:53:10.180 --> 00:53:11.660 that does work. 00:53:11.660 --> 00:53:14.260 And that function is represented as a very long string 00:53:14.260 --> 00:53:16.490 of numbers and letters. 00:53:16.490 --> 00:53:19.870 It looks just like a hash digest. 00:53:19.870 --> 00:53:23.620 But it's just a human representation, a readable representation 00:53:23.620 --> 00:53:25.330 of a mathematical function. 00:53:25.330 --> 00:53:27.860 And your private key is the same-- 00:53:27.860 --> 00:53:30.805 or your private key is also a representation of letters and numbers. 00:53:30.805 --> 00:53:32.680 It's not exactly the same as your public key, 00:53:32.680 --> 00:53:35.110 but it undoes the work that your public key does. 00:53:35.110 --> 00:53:39.970 And again, these keys are generated using an algorithm called RSA. 00:53:39.970 --> 00:53:42.460 So let's take a look at exactly how we would 00:53:42.460 --> 00:53:46.690 go about doing some asymmetric encryption using 00:53:46.690 --> 00:53:48.350 public and private keys. 00:53:48.350 --> 00:53:50.560 So here we have some original data. 00:53:50.560 --> 00:53:52.660 It's a message perhaps that I want to send. 00:53:52.660 --> 00:53:54.880 And I want to send it to you. 00:53:54.880 --> 00:53:57.760 I want to send this message to you, but I don't 00:53:57.760 --> 00:53:59.770 want to send it to you in the clear.
00:53:59.770 --> 00:54:02.090 I don't want to, you know, it's sensitive information. 00:54:02.090 --> 00:54:04.675 I don't want to send it via plain text. 00:54:04.675 --> 00:54:07.630 And I don't want to use a generic hash function 00:54:07.630 --> 00:54:10.570 because if I use a generic hash function, like SHA for example, 00:54:10.570 --> 00:54:11.398 it's irreversible. 00:54:11.398 --> 00:54:13.690 You will not be able to figure out what I tried to say. 00:54:13.690 --> 00:54:18.460 So instead, I take this original data and I use your public key. 00:54:18.460 --> 00:54:22.030 Your public key, again, is just a mathematical-- a very complex-- 00:54:22.030 --> 00:54:24.420 mathematical function. 00:54:24.420 --> 00:54:30.090 So I take this data, I feed it into your public key, your public hash function, 00:54:30.090 --> 00:54:33.660 and I get some garbled stuff out. 00:54:33.660 --> 00:54:34.590 OK? 00:54:34.590 --> 00:54:36.410 And this is what I send to you. 00:54:36.410 --> 00:54:39.880 I send you this garbled stuff. 00:54:39.880 --> 00:54:43.240 In order for you to figure out what the original message is, 00:54:43.240 --> 00:54:44.785 you use your private key. 00:54:44.785 --> 00:54:46.660 Not your public key-- your public key is what 00:54:46.660 --> 00:54:49.900 I use to encipher the information-- but your private key, which 00:54:49.900 --> 00:54:51.940 is known only to you, hypothetically. 00:54:51.940 --> 00:54:55.090 It should not be distributed to others. 00:54:55.090 --> 00:54:57.970 It undoes the work that your public key did. 00:54:57.970 --> 00:55:00.790 And so if I give you the scrambled data and you 00:55:00.790 --> 00:55:04.180 use your private key to try and decipher it, 00:55:04.180 --> 00:55:07.045 you will get back that original data. 00:55:07.045 --> 00:55:08.170 But here's the great thing. 00:55:08.170 --> 00:55:11.560 No one else's private key will be able to do that. 
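The round trip just described can be sketched with toy textbook-RSA numbers. The values below (modulus 3233, public exponent 17, private exponent 2753) are a standard small example and are assumptions for illustration; real keys are vastly larger and real systems add padding. The message is just the number 65.

```python
# Toy key pair: both keys share the modulus N; only the exponents differ.
N = 3233          # 61 * 53
PUBLIC_E = 17     # published to everyone
PRIVATE_D = 2753  # known only to the recipient

def encrypt(message: int) -> int:
    """What the sender does, using the recipient's public key."""
    return pow(message, PUBLIC_E, N)

def decrypt(ciphertext: int) -> int:
    """What the recipient does, using their own private key."""
    return pow(ciphertext, PRIVATE_D, N)

message = 65
garbled = encrypt(message)                 # this is what travels over the network
recovered = decrypt(garbled)               # the private key undoes the public key
reencrypted = pow(garbled, PUBLIC_E, N)    # applying the public key again does not

print(garbled, recovered, reencrypted == message)
```

Only the private exponent gets back to 65. Running the garbled value through the public key a second time just scrambles it further.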
00:55:11.560 --> 00:55:16.750 If anybody intercepts that message other than you and they use their private key 00:55:16.750 --> 00:55:20.170 or they use your public key again, they will not 00:55:20.170 --> 00:55:23.440 be able to decipher the message that I sent to you. 00:55:23.440 --> 00:55:26.140 And so public and private keys are very interesting because they 00:55:26.140 --> 00:55:28.470 create these pairs. 00:55:28.470 --> 00:55:33.460 They're these unique encryption schemes that are unique to two people, 00:55:33.460 --> 00:55:35.620 or really even to one person. 00:55:35.620 --> 00:55:37.450 If you were to send me a message back, you 00:55:37.450 --> 00:55:41.960 would send me a message using my public key. 00:55:41.960 --> 00:55:45.370 You would then send me whatever the encrypted sort of scrambled data 00:55:45.370 --> 00:55:50.350 is for the message that you sent using my public key. 00:55:50.350 --> 00:55:53.350 I would then use my private key, which is not 00:55:53.350 --> 00:55:55.420 known to you or to, hypothetically, anyone 00:55:55.420 --> 00:55:58.900 else to decipher what you sent me. 00:55:58.900 --> 00:56:01.990 And I would get back the secret message, or the perhaps not-so-secret, 00:56:01.990 --> 00:56:05.680 but sensitive message that you sent to me. 00:56:05.680 --> 00:56:09.400 And so that's this idea of asymmetric encryption. 00:56:09.400 --> 00:56:12.430 You can encrypt using someone's public key. 00:56:12.430 --> 00:56:13.960 And anybody can do so. 00:56:13.960 --> 00:56:17.560 And for that reason, you'll often find technically-minded people will 00:56:17.560 --> 00:56:20.890 sometimes post their public key literally on the internet, 00:56:20.890 --> 00:56:24.160 such that anybody who wants to send them a message using a secure channel 00:56:24.160 --> 00:56:27.070 can do so. 00:56:27.070 --> 00:56:28.330 And programmers as well. 
00:56:28.330 --> 00:56:32.740 So if I'm doing some work using a tool called GitHub, a popular service 00:56:32.740 --> 00:56:38.080 available online for sharing and posting source code, 00:56:38.080 --> 00:56:42.220 if I want to send something from my computer to GitHub's servers 00:56:42.220 --> 00:56:48.310 in the cloud, I might authenticate using a public key and private key encryption 00:56:48.310 --> 00:56:52.042 scheme so that they see that I'm using their public key to send them 00:56:52.042 --> 00:56:53.500 information, they're decrypting it. 00:56:53.500 --> 00:56:57.250 When they send information back to me, they're using my public key 00:56:57.250 --> 00:56:59.710 and I use my private key to decrypt it. 00:56:59.710 --> 00:57:02.260 It's actually part of-- 00:57:02.260 --> 00:57:06.670 it's part of a communication strategy used by technically-minded folks. 00:57:06.670 --> 00:57:09.850 And you're not restricted to just having one public and private key. 00:57:09.850 --> 00:57:11.770 For example, I have one public and private key 00:57:11.770 --> 00:57:14.860 that I use for a secure email, I have one public and private key 00:57:14.860 --> 00:57:19.120 that I would use for secure texting on my phone, 00:57:19.120 --> 00:57:24.910 and I have one public and private key that I use for my GitHub repository. 00:57:24.910 --> 00:57:29.110 So I have different sets and different combinations of these keys. 00:57:29.110 --> 00:57:31.360 But the key is that-- the key, again, pun intended-- 00:57:31.360 --> 00:57:36.430 is that the decryption can only be done by someone who has the private key, not 00:57:36.430 --> 00:57:40.215 the public key, because only those two functions are reciprocals 00:57:40.215 --> 00:57:40.840 of one another. 00:57:40.840 --> 00:57:46.410 They undo the work that the other did in the first place. 
00:57:46.410 --> 00:57:49.320 But interestingly enough, that's not the only thing 00:57:49.320 --> 00:57:52.330 we can do with public and private keys. 00:57:52.330 --> 00:57:54.930 So instead of just encryption, we also have this idea 00:57:54.930 --> 00:57:57.480 of a digital signature, which is different than e-signature, 00:57:57.480 --> 00:58:00.840 an e-signature just being the tracing of a pen typically 00:58:00.840 --> 00:58:04.478 along some surface and just logging where all the pen strokes happen to be. 00:58:04.478 --> 00:58:07.020 So we're talking about something much more complex than that. 00:58:07.020 --> 00:58:08.978 We're talking about something cryptographically 00:58:08.978 --> 00:58:10.800 based when we talk about digital signature. 00:58:10.800 --> 00:58:14.310 It's kind of the opposite of encryption. 00:58:14.310 --> 00:58:17.040 And using someone's digital signature, you 00:58:17.040 --> 00:58:22.050 can verify the authenticity of a document and verify, more precisely, 00:58:22.050 --> 00:58:25.810 the authenticity of the sender of a document. 00:58:25.810 --> 00:58:30.480 And we're going to explain this in great detail in just a moment, 00:58:30.480 --> 00:58:34.720 but the basic idea is they're signing the document using their private key. 00:58:34.720 --> 00:58:36.870 You still don't see what the key is. 00:58:36.870 --> 00:58:39.270 And because these public and private key pairs 00:58:39.270 --> 00:58:42.990 are specific to an individual person, if you 00:58:42.990 --> 00:58:45.330 were able to verify that that document could only 00:58:45.330 --> 00:58:49.140 have been signed using someone's private key, 00:58:49.140 --> 00:58:53.640 then you have quite a serious belief that that person 00:58:53.640 --> 00:58:58.350 is the person who signed the document, who sent the document, and so on. 
00:58:58.350 --> 00:59:04.020 Digital signatures are 256 bits long pretty consistently, 00:59:04.020 --> 00:59:08.130 which means there are 2 to the 256th power distinct digital signatures, 00:59:08.130 --> 00:59:13.350 which makes the potential of a forgery effectively zero. 00:59:13.350 --> 00:59:14.420 Again, I'm using this-- 00:59:14.420 --> 00:59:18.420 I'm trying to avoid saying never because computer scientists don't like never. 00:59:18.420 --> 00:59:24.270 But effectively, there is no chance of a forgery. 00:59:24.270 --> 00:59:30.550 Now the process for how one verifies a digital signature is quite-- 00:59:30.550 --> 00:59:32.300 there's quite a few steps involved. 00:59:32.300 --> 00:59:34.533 And I have a diagram here that I sourced from online. 00:59:34.533 --> 00:59:36.450 And what I'd like us to do now is walk through 00:59:36.450 --> 00:59:41.580 this process to hopefully give you an understanding of how these work 00:59:41.580 --> 00:59:44.700 and how you might be able to rely on digital signatures. 00:59:44.700 --> 00:59:49.230 And states and different entities are recognizing digital signatures 00:59:49.230 --> 00:59:53.610 as a valid way to sign documents, but it really helps 00:59:53.610 --> 00:59:57.000 to have a good understanding of them such that you, as an attorney, 00:59:57.000 --> 01:00:02.590 are comfortable with the fact that this does represent a specific individual. 01:00:02.590 --> 01:00:07.420 So let's take a look at how this process works. 01:00:07.420 --> 01:00:10.040 So we start with data. 01:00:10.040 --> 01:00:12.760 Data in this case is any document. 01:00:12.760 --> 01:00:19.240 Perhaps it's a scanned, signed version of some PDF with somebody's actual ink 01:00:19.240 --> 01:00:19.740 signature. 01:00:19.740 --> 01:00:22.260 But again, the whole thing is just scanned. 01:00:22.260 --> 01:00:24.480 The next step is to use a hash function. 
01:00:24.480 --> 01:00:27.570 The hash function that we could use in this context could be anything. 01:00:27.570 --> 01:00:29.400 It could be SHA-1. 01:00:29.400 --> 01:00:32.353 It could be something very complex. 01:00:32.353 --> 01:00:34.770 In general, the hash function that's going to be used here 01:00:34.770 --> 01:00:36.960 is actually not a secret function. 01:00:36.960 --> 01:00:38.970 In this example it's going to be something like MD5. 01:00:38.970 --> 01:00:40.860 So something that anybody has access to. 01:00:40.860 --> 01:00:44.310 And that's going to result in a hash, a set of zeros and ones. 01:00:44.310 --> 01:00:48.818 In the case of MD5, it's going to be 128 bits, usually written as 32 hexadecimal characters. 01:00:48.818 --> 01:00:50.610 Now where things get very interesting is we 01:00:50.610 --> 01:00:54.030 take that hash, that set of zeros and ones, 01:00:54.030 --> 01:00:58.030 and we encrypt it using the signer's private key. 01:00:58.030 --> 01:01:00.690 Remember, these functions are reciprocals of one another. 01:01:00.690 --> 01:01:03.270 A public key can undo what the private key does, 01:01:03.270 --> 01:01:06.360 and the private key can undo what the public key does. 01:01:06.360 --> 01:01:11.220 Notice in this case we're still not sending anyone our private key. 01:01:11.220 --> 01:01:14.080 We are just using our private key to encrypt something. 01:01:14.080 --> 01:01:17.940 So we take this hash that we received from running our file through MD5, 01:01:17.940 --> 01:01:22.050 we encrypt it using our private key, and we get some other result out of it. 01:01:22.050 --> 01:01:27.150 This number that comes out of running the hash through our private key 01:01:27.150 --> 01:01:29.550 is called the signature. 01:01:29.550 --> 01:01:32.340 We then just couple that-- so when we send this off, 01:01:32.340 --> 01:01:36.570 we send the signature plus the original document, 01:01:36.570 --> 01:01:40.450 and that would be considered a digitally signed document.
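The signing steps above can be sketched in Python as follows. This is a toy under stated assumptions, not how a production library signs: the key is built from two well-known Mersenne primes (fine for a demo, a terrible choice for real keys), the public hash is SHA-256 from `hashlib` rather than MD5, the digest is reduced to fit under the small modulus, and there is no padding. It needs Python 3.8 or later.

```python
import hashlib

# Toy key pair built from two known primes (illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
PUBLIC_E = 65537
PRIVATE_D = pow(PUBLIC_E, -1, (P - 1) * (Q - 1))   # kept by the signer

def digest_as_int(document: bytes) -> int:
    """Step 1: run the document through a public hash function."""
    digest = hashlib.sha256(document).digest()
    return int.from_bytes(digest, "big") % N        # shrink to fit the toy modulus

def sign(document: bytes) -> int:
    """Step 2: transform the hash with the signer's private key."""
    return pow(digest_as_int(document), PRIVATE_D, N)

document = b"Scanned PDF bytes would go here"
signature = sign(document)

# Step 3: send both pieces together; the private key itself is never sent.
signed_package = (document, signature)
print(signature)
```

The recipient gets the document and one extra number, and nothing in that package reveals `PRIVATE_D`.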
01:01:40.450 --> 01:01:42.930 So that's the signing part of the process. 01:01:42.930 --> 01:01:43.810 That's where we go. 01:01:43.810 --> 01:01:45.540 We start with a file. 01:01:45.540 --> 01:01:47.730 We run that file through a generic hash function. 01:01:47.730 --> 01:01:49.560 Not our public and private keys, something 01:01:49.560 --> 01:01:51.900 that is generally pretty accessible. 01:01:51.900 --> 01:01:55.800 We take that hash, we encrypt it using our private key 01:01:55.800 --> 01:01:59.730 to get some other hash that looks similar, different zeros and ones, 01:01:59.730 --> 01:02:02.790 but totally different pattern of zeros and ones. 01:02:02.790 --> 01:02:07.860 We attach the original document and the digital signature when we send it off, 01:02:07.860 --> 01:02:11.220 and that is considered a digitally signed document. 01:02:11.220 --> 01:02:15.810 Now the real crux is how do you prove that I'm the person who 01:02:15.810 --> 01:02:18.278 sent you this document, right? 01:02:18.278 --> 01:02:20.070 If you want-- if you're receiving something 01:02:20.070 --> 01:02:22.028 that has a digital signature, which is supposed 01:02:22.028 --> 01:02:25.560 to be as good as any other kind of signature, 01:02:25.560 --> 01:02:28.450 it's supposed to have legal effect. 01:02:28.450 --> 01:02:32.400 How do we verify that that person who sent you the document 01:02:32.400 --> 01:02:34.795 was actually the correct one? 01:02:34.795 --> 01:02:36.420 So then we go to the verification step. 01:02:36.420 --> 01:02:41.310 So we start, we've now received this digitally signed data. 01:02:41.310 --> 01:02:43.560 This is the same as this digitally signed data here 01:02:43.560 --> 01:02:46.660 that was sent by the sender. 01:02:46.660 --> 01:02:48.940 We also received two pieces of information. 01:02:48.940 --> 01:02:53.070 We received the document, the original document, 01:02:53.070 --> 01:02:55.170 and we received the signature. 
01:02:55.170 --> 01:02:57.270 And recall, again, that the signature is what 01:02:57.270 --> 01:02:59.610 happens when we take the hash of the document 01:02:59.610 --> 01:03:05.540 and run it using our private key to get a result. 01:03:05.540 --> 01:03:07.773 Now the interesting step here is remembering 01:03:07.773 --> 01:03:10.440 that the public and private keys are reciprocals of one another. 01:03:10.440 --> 01:03:14.570 So we can take this complicated signature hash 01:03:14.570 --> 01:03:18.060 and we can use the public key, which, again, is publicly available. 01:03:18.060 --> 01:03:23.240 Anybody should ostensibly have access to someone's public key, not 01:03:23.240 --> 01:03:23.990 their private key. 01:03:23.990 --> 01:03:27.350 And notice that the signer has never sent their private key. 01:03:27.350 --> 01:03:29.360 They've only used it to encrypt some data, 01:03:29.360 --> 01:03:31.100 but they never sent the private key. 01:03:31.100 --> 01:03:33.620 The public key has always been available though. 01:03:33.620 --> 01:03:36.980 We take the signature, we run it through the public key function, 01:03:36.980 --> 01:03:39.440 and we get a hash. 01:03:39.440 --> 01:03:44.390 We take the data, the document, and we run it through MD5, 01:03:44.390 --> 01:03:48.653 the same hash function that the sender was supposed to use, and we get a hash. 01:03:48.653 --> 01:03:50.570 And we're checking to make sure that these two 01:03:50.570 --> 01:03:53.420 hashes are equal to one another. 01:03:53.420 --> 01:03:56.630 If they are equal to one another, that means the signature is valid. 01:03:56.630 --> 01:04:01.330 Let's talk about why that would be the case. 01:04:01.330 --> 01:04:07.210 If we use the MD5 of this file, the generic hash of this file, 01:04:07.210 --> 01:04:11.100 and we encrypt it using our private key, we get some result, OK? 01:04:11.100 --> 01:04:13.380 But this is very easy to calculate. 01:04:13.380 --> 01:04:14.140 It's MD5. 
01:04:14.140 --> 01:04:18.280 We're taking a basic document, we're running it through a publicly known, 01:04:18.280 --> 01:04:19.912 well-defined hash function. 01:04:19.912 --> 01:04:22.870 Anybody who has access to this document and a program on their computer 01:04:22.870 --> 01:04:26.680 called MD5 can literally run this document through it 01:04:26.680 --> 01:04:27.890 and get this number. 01:04:27.890 --> 01:04:30.430 This is not the tricky part of this. 01:04:30.430 --> 01:04:36.160 We then take this hash, we encrypt it using our private key 01:04:36.160 --> 01:04:38.500 to get some secret number. 01:04:38.500 --> 01:04:40.590 The public key though will undo that. 01:04:40.590 --> 01:04:43.990 Remember, the public and private keys are reciprocals of one another. 01:04:43.990 --> 01:04:48.160 Whatever one does, the other one can undo. 01:04:48.160 --> 01:04:54.650 And so only my public key will undo the work of my private key. 01:04:54.650 --> 01:04:57.550 So if I take this value and I encrypt it using my private key, 01:04:57.550 --> 01:05:00.430 and then I run this value through the public key, 01:05:00.430 --> 01:05:04.630 I should get the original result again, the original MD5 hash. 01:05:04.630 --> 01:05:08.080 And that's why we have to send the document as well, not 01:05:08.080 --> 01:05:10.728 just the digital signature, the numbers that we 01:05:10.728 --> 01:05:13.270 get by running it through our private key in the first place.
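The sign-then-verify round trip described above can be sketched in a few lines of Python. This is a toy illustration, not how a real signature library works: the key pair is a textbook RSA example built from the tiny primes 61 and 53, the names `digest`, `sign`, and `verify` are made up for this sketch, and the MD5 hash is reduced modulo the toy modulus so that it fits. Real systems use keys thousands of bits long, a vetted library, and a stronger hash than MD5.

```python
import hashlib

# Toy RSA key pair built from tiny textbook primes (illustrative only).
P, Q = 61, 53
N = P * Q          # 3233, the modulus shared by both keys
E = 17             # public exponent (anyone may know this)
D = 2753           # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def digest(document: bytes) -> int:
    """Hash the document with MD5 and shrink the result to fit the toy modulus."""
    full = int.from_bytes(hashlib.md5(document).digest(), "big")
    return full % N


def sign(document: bytes) -> int:
    """Sender: hash the document, then transform the hash with the private key."""
    return pow(digest(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Receiver: undo the signature with the public key and compare the hashes."""
    return pow(signature, E, N) == digest(document)


document = b"I will paint your house on Tuesday."
signature = sign(document)
print(verify(document, signature))            # the genuine signature checks out
print(verify(document, (signature + 1) % N))  # an altered signature does not
```

The receiver never touches `D`. Verification needs only the document, the signature, and the public values `E` and `N`, which is why the private key never has to be sent.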
01:05:13.270 --> 01:05:18.610 That way we have a way to validate that yes, this file has this checksum, 01:05:18.610 --> 01:05:23.710 and the sender took that checksum, they ran it through their own private key, 01:05:23.710 --> 01:05:26.770 and when I used their public key to undo it, 01:05:26.770 --> 01:05:31.810 I get the same value, which is effectively proving, but is, 01:05:31.810 --> 01:05:34.750 we'll term it as it's very, very, very, very 01:05:34.750 --> 01:05:39.640 likely that this person who claimed to have sent the document 01:05:39.640 --> 01:05:42.100 is, in fact, the person who sent that document. 01:05:42.100 --> 01:05:44.350 And so that's what digital signatures can be used for. 01:05:44.350 --> 01:05:48.520 It is a mathematical, cryptographic way to verify 01:05:48.520 --> 01:05:52.720 the identity of the sender of a document or an individual. 01:05:52.720 --> 01:05:56.260 Or in whatever context you might be using or receiving digital signatures, 01:05:56.260 --> 01:06:02.650 it is purely a verification step that is based entirely in mathematics. 01:06:02.650 --> 01:06:05.590 There's one other potentially interesting use 01:06:05.590 --> 01:06:09.040 of digital signatures that's also quite buzzy right now, 01:06:09.040 --> 01:06:11.300 and that's blockchain technology. 01:06:11.300 --> 01:06:13.450 And what is the blockchain? 01:06:13.450 --> 01:06:18.430 Digital signatures are really key to knowing how the blockchain works 01:06:18.430 --> 01:06:24.740 and why it is trusted as a decentralized source of information for individuals. 01:06:24.740 --> 01:06:27.160 So understanding digital signatures means 01:06:27.160 --> 01:06:30.880 you are in a position to understand blockchain. 01:06:30.880 --> 01:06:34.610 And I use here the term the blockchain, but it really is a blockchain. 01:06:34.610 --> 01:06:37.840 There's no such thing as the one blockchain. 
01:06:37.840 --> 01:06:40.930 There are many different-- this is just an idea that is implemented. 01:06:40.930 --> 01:06:43.940 Generally, we're hearing it in the context of a cryptocurrency, 01:06:43.940 --> 01:06:49.150 but it does not need to be restricted to that, although cryptocurrencies are so 01:06:49.150 --> 01:06:52.870 discussed in the media and have been dissected by so many researchers 01:06:52.870 --> 01:06:55.450 that they provide an interesting vehicle, an interesting lens 01:06:55.450 --> 01:06:57.760 through which to consider blockchain. 01:06:57.760 --> 01:07:01.280 And so our example today is going to focus on Bitcoin. 01:07:01.280 --> 01:07:04.000 It is the most well-documented of the cryptocurrencies. 01:07:04.000 --> 01:07:07.560 It is the most well-documented implementation of the blockchain, 01:07:07.560 --> 01:07:10.420 or among the most well-documented implementations. 01:07:10.420 --> 01:07:13.240 But this is not specifically a lecture about Bitcoin. 01:07:13.240 --> 01:07:19.240 We're just using Bitcoin as a lens through which to understand blockchain. 01:07:19.240 --> 01:07:22.480 There's also an outside source that I strongly encourage. 01:07:22.480 --> 01:07:27.850 This channel on YouTube provides interesting mathematical dissections 01:07:27.850 --> 01:07:32.950 of topics, and they tackle blockchain and Bitcoin pretty extensively. 01:07:32.950 --> 01:07:34.960 And this is an excellent supplementary resource 01:07:34.960 --> 01:07:36.580 to consider if you're trying to dig into this 01:07:36.580 --> 01:07:38.960 or understand it a little bit more, because in this video 01:07:38.960 --> 01:07:42.250 I'm going to omit some of the more technical details for the sake 01:07:42.250 --> 01:07:44.857 of, hopefully, broader understanding. 01:07:44.857 --> 01:07:46.690 But if you want to dive into it more deeply, 01:07:46.690 --> 01:07:49.720 this is a resource that I would recommend. 
01:07:49.720 --> 01:07:53.170 And I really like talking about Bitcoin in the context of blockchain 01:07:53.170 --> 01:07:57.550 because it's actually how I kind of got started almost as an attorney. 01:07:57.550 --> 01:08:01.690 When I was practicing, when I graduated from law school, 01:08:01.690 --> 01:08:06.040 I decided to go out on my own and start my own firm. 01:08:06.040 --> 01:08:08.650 I live in a small town and so a lot of my early work 01:08:08.650 --> 01:08:12.760 was doing estate plans, wills and such for individuals in my town, 01:08:12.760 --> 01:08:13.810 getting to know them. 01:08:13.810 --> 01:08:17.740 But I had studied extensively technology-related law in law school 01:08:17.740 --> 01:08:21.100 and I really wanted to use it. 01:08:21.100 --> 01:08:23.979 And a few years into my practice, I had a friend 01:08:23.979 --> 01:08:28.960 who needed an estate plan prepared, and he asked if he could pay me in Bitcoin. 01:08:28.960 --> 01:08:31.689 And I had no idea what that meant. 01:08:31.689 --> 01:08:34.380 I didn't really know anything about Bitcoin at the time. 01:08:34.380 --> 01:08:37.670 And I looked it up and thought it sounded interesting, 01:08:37.670 --> 01:08:38.687 and so I said sure. 01:08:38.687 --> 01:08:40.270 So I learned how to set up an account. 01:08:40.270 --> 01:08:42.800 And it's also worth mentioning at the outset, 01:08:42.800 --> 01:08:45.040 as we're talking about cryptocurrency, that you 01:08:45.040 --> 01:08:47.979 don't need to understand how Bitcoin works to use Bitcoin. 01:08:47.979 --> 01:08:50.470 You don't need to understand how the federal banking 01:08:50.470 --> 01:08:54.040 system works to use a bank. 01:08:54.040 --> 01:08:55.689 And the same is true here with Bitcoin. 01:08:55.689 --> 01:09:00.859 But I ended up accepting a Bitcoin payment 01:09:00.859 --> 01:09:03.560 by creating what's called a Bitcoin wallet.
01:09:03.560 --> 01:09:07.850 I immediately sold the Bitcoin that I received and turned it into cash, such 01:09:07.850 --> 01:09:11.968 that I could use it for more generic purposes. 01:09:11.968 --> 01:09:14.510 And what I decided to do was send out a press release saying, 01:09:14.510 --> 01:09:17.210 oh, I accept Bitcoin, because it was something that was novel 01:09:17.210 --> 01:09:19.085 and I hadn't really heard that much about it. 01:09:19.085 --> 01:09:23.029 And this got the attention of my local paper and companies 01:09:23.029 --> 01:09:25.270 in the area that were technically minded as well. 01:09:25.270 --> 01:09:28.819 And so Bitcoin sort of provided this forum 01:09:28.819 --> 01:09:32.210 to meet new clients that also allowed me to explore fields 01:09:32.210 --> 01:09:34.850 of the law about which I am passionate. 01:09:34.850 --> 01:09:39.800 So it's kind of an interesting segue to be able to share that with you now. 01:09:39.800 --> 01:09:43.609 All right, so stepping away from Bitcoin again more broadly to blockchain. 01:09:43.609 --> 01:09:44.680 What is the blockchain? 01:09:44.680 --> 01:09:47.180 It's very similar to something you've already learned about, 01:09:47.180 --> 01:09:49.109 which is a linked list. 01:09:49.109 --> 01:09:52.220 So recall that a linked list is a set of nodes, each of which 01:09:52.220 --> 01:09:57.200 have connections forward and backward to other nodes in the chain. 01:09:57.200 --> 01:09:58.530 They are linked together. 01:09:58.530 --> 01:10:02.770 And similarly, with a blockchain, all of the blocks are chained together. 01:10:02.770 --> 01:10:06.740 It's basically the same terminology slightly modified. 01:10:06.740 --> 01:10:10.850 So a linked list is a set of nodes, each of which is connected to the one prior 01:10:10.850 --> 01:10:12.652 and the one after it. 
01:10:12.652 --> 01:10:15.860 We learned about linked lists as having generally three pieces of information 01:10:15.860 --> 01:10:18.950 associated with them-- a previous pointer, which is basically 01:10:18.950 --> 01:10:23.660 a reference to the prior node, or in this case, the prior block; 01:10:23.660 --> 01:10:26.990 we have the next pointer, which is a reference to the next node 01:10:26.990 --> 01:10:31.010 or the next block; and we had data. 01:10:31.010 --> 01:10:33.570 And in this case, the data is actually two different things. 01:10:33.570 --> 01:10:35.330 There's the real data. 01:10:35.330 --> 01:10:38.180 And again, in the context of a cryptocurrency blockchain 01:10:38.180 --> 01:10:40.730 we're going to be talking about a list of transactions, 01:10:40.730 --> 01:10:44.330 a numbered list of transactions from person A to person B, 01:10:44.330 --> 01:10:47.240 each of those transactions being digitally signed such 01:10:47.240 --> 01:10:50.900 that you can verify that the person who logs that transaction 01:10:50.900 --> 01:10:53.450 is actually the one who made that transaction. 01:10:53.450 --> 01:10:55.780 And also, something called a proof of work. 01:10:55.780 --> 01:10:57.530 And this proof of work is very interesting 01:10:57.530 --> 01:11:01.510 because this is how Bitcoin ostensibly derives its authority. 01:11:01.510 --> 01:11:07.130 There is no central controller of the Bitcoin currency, 01:11:07.130 --> 01:11:09.530 and it is very decentralized. 01:11:09.530 --> 01:11:11.420 And there needs to be some way for people 01:11:11.420 --> 01:11:18.140 to agree as to what the true ledger is, or what the true set of transactions 01:11:18.140 --> 01:11:19.610 that have happened are. 01:11:19.610 --> 01:11:24.330 And the way that is done is by relying on something called the proof of work. 01:11:24.330 --> 01:11:27.280 And we'll dive into that shortly as well.
01:11:27.280 --> 01:11:31.770 So again, cryptocurrencies, that data is a ledger of transactions, each of which 01:11:31.770 --> 01:11:33.840 is digitally signed using the digital signature 01:11:33.840 --> 01:11:37.050 technique we've just discussed by the person who 01:11:37.050 --> 01:11:40.100 made or initiated that transaction. 01:11:40.100 --> 01:11:43.820 And that ledger is decentralized, which means that any time there's 01:11:43.820 --> 01:11:47.210 ever a change, any time any transaction is recorded, in this case, 01:11:47.210 --> 01:11:50.870 using Bitcoin, again, our lens through which to consider blockchain, 01:11:50.870 --> 01:11:53.630 that message is broadcast out. 01:11:53.630 --> 01:11:58.850 So if I make a transaction in Bitcoin, I pay you $10, 01:11:58.850 --> 01:12:02.720 I'm going to announce to everyone else who has a Bitcoin wallet 01:12:02.720 --> 01:12:07.670 or who is monitoring the blockchain, the list of transactions, hey, 01:12:07.670 --> 01:12:14.030 please add the following transaction to this list, Doug pays you $10. 01:12:14.030 --> 01:12:18.800 And that is announced to everybody, everybody records it in their ledger, 01:12:18.800 --> 01:12:22.250 and then some stuff is going to start happening. 01:12:22.250 --> 01:12:25.270 But here is a potential issue. 01:12:25.270 --> 01:12:28.330 How do you know that the blockchain is legitimate? 01:12:28.330 --> 01:12:34.708 How do you know that your copy of what is being said is the truth? 01:12:34.708 --> 01:12:37.750 How do you know that your copy of the blockchain is accurate with respect 01:12:37.750 --> 01:12:40.300 to all other transactions that have happened? 01:12:40.300 --> 01:12:42.200 Everybody else has their own copy as well. 01:12:42.200 --> 01:12:43.180 It's decentralized. 01:12:43.180 --> 01:12:46.930 We all maintain, anybody who's using Bitcoin maintains 01:12:46.930 --> 01:12:51.510 their own copy of the blockchain. 
01:12:51.510 --> 01:12:54.630 How do you defend against people modifying it? 01:12:57.440 --> 01:12:59.580 That's a very interesting question. 01:12:59.580 --> 01:13:02.820 The way that cryptocurrencies do it is to assume-- 01:13:02.820 --> 01:13:04.610 and this is defined in the Bitcoin paper-- 01:13:04.610 --> 01:13:07.260 the way the cryptocurrencies do it is to assume 01:13:07.260 --> 01:13:11.910 that the chain that has the most computational work put into it 01:13:11.910 --> 01:13:13.470 is the true chain. 01:13:13.470 --> 01:13:16.080 This decision is completely arbitrary. 01:13:16.080 --> 01:13:19.290 There's no reason why one needs to be vetted over the other. 01:13:19.290 --> 01:13:23.910 But something had to be agreed upon by, collectively, users of Bitcoin 01:13:23.910 --> 01:13:30.210 to say in the event of a dispute, between which person's chain is 01:13:30.210 --> 01:13:34.380 the accurate de facto definitive list of transactions? 01:13:34.380 --> 01:13:38.370 We're going to go with the one that has been verified the most times. 01:13:38.370 --> 01:13:41.970 And again, this word verified is sort of a sketchy word. 01:13:41.970 --> 01:13:44.400 There's nothing inherently about proof of work 01:13:44.400 --> 01:13:49.590 or anything else that proves that a transaction has taken place in the way 01:13:49.590 --> 01:13:52.230 that we normally think of this term verified. 01:13:52.230 --> 01:13:57.510 Rather it is the collective standard by which we all agree to adhere, 01:13:57.510 --> 01:13:58.730 that the person-- 01:13:58.730 --> 01:14:04.860 or that the blockchain that has the most proof of work in it is the list. 01:14:04.860 --> 01:14:08.820 That is just something we must subscribe to as users 01:14:08.820 --> 01:14:10.650 and consumers of blockchain. 
01:14:10.650 --> 01:14:14.760 Now how do we determine which blockchain has had the most computational work 01:14:14.760 --> 01:14:18.720 into it, which copy of the blockchain has had the most computational work put 01:14:18.720 --> 01:14:20.190 into it? 01:14:20.190 --> 01:14:23.100 Well, this is proof of work. 01:14:23.100 --> 01:14:29.670 So proof of work is how the correct blockchain of all the copies 01:14:29.670 --> 01:14:33.840 that are decentralized is determined. 01:14:33.840 --> 01:14:35.130 So recall how hashing works. 01:14:35.130 --> 01:14:42.020 Hashing allows us to take any arbitrary data and run it through a hash function 01:14:42.020 --> 01:14:44.480 and get an outcome. 01:14:44.480 --> 01:14:49.220 And that outcome is going to be, let's say 256 bits, each of those bits being, 01:14:49.220 --> 01:14:52.930 of course, 0 or 1. 01:14:52.930 --> 01:14:56.650 Now there's a lot of different combinations there. 01:14:56.650 --> 01:15:00.460 But some of them will be very unique. 01:15:00.460 --> 01:15:05.110 And the way Bitcoin works, Bitcoin's blockchain works 01:15:05.110 --> 01:15:08.450 is to prove a particular block. 01:15:08.450 --> 01:15:12.840 We are asking people who are oftentimes called miners-- 01:15:12.840 --> 01:15:15.910 that's where this term comes from because they are mining. 01:15:15.910 --> 01:15:19.180 Ultimately the reward for doing this proof of work is to receive Bitcoin 01:15:19.180 --> 01:15:20.972 that are sort of generated out of thin air. 01:15:20.972 --> 01:15:22.840 And so these people are termed miners. 01:15:22.840 --> 01:15:28.300 But we are asking anyone who has a computer to hash the entire block. 01:15:28.300 --> 01:15:31.000 So hash the entire list of transactions, the reference 01:15:31.000 --> 01:15:32.720 to the previous block and the next block. 01:15:32.720 --> 01:15:35.845 And remember, all of that is contained in a single node of this blockchain, 01:15:35.845 --> 01:15:36.880 basically.
01:15:36.880 --> 01:15:40.570 And we're looking for a highly unusual pattern. 01:15:40.570 --> 01:15:44.500 We're looking for maybe the first 30 bits or the first 40 bits 01:15:44.500 --> 01:15:47.540 to all be zeros. 01:15:47.540 --> 01:15:49.230 That's really weird. 01:15:49.230 --> 01:15:51.230 Like, that's a really difficult pattern to find. 01:15:51.230 --> 01:15:53.060 And the only way to do it is to guess. 01:15:53.060 --> 01:15:56.240 So you take this entire block, you attach a single piece of data 01:15:56.240 --> 01:15:58.880 to the bottom of it, like 1, 2, 3. 01:15:58.880 --> 01:16:01.610 You can just count in that way trying to guess. 01:16:01.610 --> 01:16:05.100 And if you hash that entire thing together, 01:16:05.100 --> 01:16:09.490 do you eventually find a block that, when hashed in this way, 01:16:09.490 --> 01:16:13.170 produces this very, very unique pattern? 01:16:13.170 --> 01:16:16.680 If so, you just say, here's the number that I attach. 01:16:16.680 --> 01:16:21.930 So let's say I took the entire block and I hashed it with 12345 01:16:21.930 --> 01:16:24.450 was the number, right? 01:16:24.450 --> 01:16:29.130 It's very difficult to find a value that would 01:16:29.130 --> 01:16:31.230 create this unique pattern of zeros and ones, 01:16:31.230 --> 01:16:33.900 in particular, zeros, 30 zeros in a row. 01:16:33.900 --> 01:16:38.850 But it's really, really easy to verify that someone has done it. 01:16:38.850 --> 01:16:43.140 To verify that someone has done it, all you have to do is if they announce 01:16:43.140 --> 01:16:46.877 the number that they used, 12345, as their proof of work-- 01:16:46.877 --> 01:16:48.710 and that's what the proof of work really is, 01:16:48.710 --> 01:16:51.740 it's that number that they use to figure it out-- 01:16:51.740 --> 01:16:54.890 if they announce that and you hash the block with that number, 01:16:54.890 --> 01:16:59.270 you can verify, yes, that pattern is actually 30 zeros in a row. 
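The guess-and-check search just described, and the much cheaper check that a guess is correct, might look roughly like the sketch below. It assumes SHA-256 as the 256-bit hash function and appends the guessed number to the block as text. The names `block_hash`, `is_valid`, and `mine` are invented for this example, and the target is lowered from 30 zero bits to 16 so that the search finishes quickly on an ordinary computer.

```python
import hashlib

ZERO_BITS = 16  # lowered from the lecture's 30 so the demo runs in well under a second


def block_hash(block: bytes, nonce: int) -> int:
    """SHA-256 of the block with the guessed number appended, as a 256-bit integer."""
    data = block + str(nonce).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def is_valid(block: bytes, nonce: int, zero_bits: int) -> bool:
    """Cheap check: are the first `zero_bits` bits of the hash all zero?"""
    return block_hash(block, nonce) >> (256 - zero_bits) == 0


def mine(block: bytes, zero_bits: int) -> int:
    """Expensive search: count 0, 1, 2, ... until some number produces the pattern."""
    nonce = 0
    while not is_valid(block, nonce, zero_bits):
        nonce += 1
    return nonce


block = b"Doug pays you $10"
nonce = mine(block, ZERO_BITS)                  # many hashes to find
print(nonce, is_valid(block, nonce, ZERO_BITS))  # one hash to check
```

Each extra zero bit doubles the expected number of guesses. At 16 bits that is about 65,536 hashes on average. At the 30 bits used in the lecture's example it is about 2^30, or roughly 1.07 billion, while checking an announced number still takes a single hash.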
01:16:59.270 --> 01:17:01.610 So I guess you have proven it. 01:17:01.610 --> 01:17:04.130 Now this is, again, kind of arbitrary. 01:17:04.130 --> 01:17:05.090 Like, this seems weird. 01:17:05.090 --> 01:17:08.210 Why are you spending all your time trying 01:17:08.210 --> 01:17:12.530 to figure out a specific pattern that exists somewhere? 01:17:12.530 --> 01:17:16.310 That is a question that I cannot answer other than to say that it is 01:17:16.310 --> 01:17:22.775 the standard by which people who have subscribed to the Bitcoin standard have 01:17:22.775 --> 01:17:23.900 just agreed to be bound by. 01:17:23.900 --> 01:17:27.540 The person who finds this number is probably the-- 01:17:27.540 --> 01:17:31.577 is proving the validity of all the transactions above it. 01:17:31.577 --> 01:17:34.160 And this gets interesting when you think about somebody trying 01:17:34.160 --> 01:17:37.160 to perpetrate a fraudulent transaction. 01:17:37.160 --> 01:17:41.240 So imagine I'm trying to perpetrate a fraudulent transaction by initiating 01:17:41.240 --> 01:17:43.850 a transaction that says, I'm going to pay you $100. 01:17:43.850 --> 01:17:47.720 And I announce that to you, but I don't broadcast it 01:17:47.720 --> 01:17:50.540 to everybody else who maintains the blocks, who are maintaining 01:17:50.540 --> 01:17:52.570 their own copies of blockchains. 01:17:52.570 --> 01:17:55.370 Which is interesting because you think that I have spent $100, 01:17:55.370 --> 01:17:58.370 and as far as you're concerned I have spent $100 to you, 01:17:58.370 --> 01:18:00.440 but no one else is aware of that. 01:18:00.440 --> 01:18:04.940 So no one else thinks that I have spent $100. 01:18:04.940 --> 01:18:09.680 They all think I am $100 wealthier than I actually am. 01:18:09.680 --> 01:18:13.280 The problem then arises that I need to verify that block. 01:18:13.280 --> 01:18:15.230 I need to verify that transaction.
01:18:15.230 --> 01:18:18.260 So I append the transaction to my own copy of the blockchain 01:18:18.260 --> 01:18:20.907 because I am the only person other than you-- 01:18:20.907 --> 01:18:23.240 the two of us maybe have these copies of the blockchain, 01:18:23.240 --> 01:18:25.910 but everybody else, I didn't broadcast this transaction 01:18:25.910 --> 01:18:28.220 so no one else knows about it. 01:18:28.220 --> 01:18:31.070 In order for it to have a proof of work attached to it, 01:18:31.070 --> 01:18:35.360 in order for it to be considered the valid chain, 01:18:35.360 --> 01:18:39.110 I would need to prove that block. 01:18:39.110 --> 01:18:44.300 I would need to find that secret number that when hashed with the entire block, 01:18:44.300 --> 01:18:50.040 produces a pattern of 30 consecutive zero bits before anybody else does. 01:18:50.040 --> 01:18:55.270 So that's a 1 in 2 to the 30th power chance 01:18:55.270 --> 01:18:59.650 because I'm looking for a pattern of 30 consecutive zeros. 01:18:59.650 --> 01:19:02.530 There's a 1 in 2 to the 30th power chance 01:19:02.530 --> 01:19:04.312 that I'm going to find that pattern. 01:19:04.312 --> 01:19:06.520 And I have to find that pattern before somebody else. 01:19:06.520 --> 01:19:11.225 And in the meantime, other transactions are coming in on my ledger. 01:19:11.225 --> 01:19:13.600 On my-- other people are broadcasting their transactions. 01:19:13.600 --> 01:19:16.000 And I have to keep adding them to my ledger 01:19:16.000 --> 01:19:19.840 and keep proving that work over and over and over, 01:19:19.840 --> 01:19:23.920 all the while trying to stay ahead so that my fraudulent transaction is 01:19:23.920 --> 01:19:27.400 considered ultimately the correct blockchain. 01:19:27.400 --> 01:19:30.730 Now the odd-- you just can't beat the odds of that.
01:19:30.730 --> 01:19:35.260 One malicious person trying to perpetrate a fraudulent transaction 01:19:35.260 --> 01:19:38.140 using the blockchain cannot stay ahead. 01:19:38.140 --> 01:19:42.130 They can't win the find the secret number 01:19:42.130 --> 01:19:45.010 game over and over and over and over. 01:19:45.010 --> 01:19:48.880 Eventually, some other chain, which contains valid transactions, 01:19:48.880 --> 01:19:54.190 will win out over my attempted fraudulent chain. 01:19:54.190 --> 01:19:55.570 And it will be disregarded. 01:19:55.570 --> 01:20:00.568 Nobody will consider that to be a valid part of the chain anymore. 01:20:00.568 --> 01:20:02.110 And so that's kind of how this works. 01:20:02.110 --> 01:20:07.360 Again, it's arbitrary the way they decide to resolve or verify. 01:20:07.360 --> 01:20:09.790 There's nothing about this process that proves 01:20:09.790 --> 01:20:12.670 that person A sent person B money. 01:20:12.670 --> 01:20:15.160 It's just the consensus that we have decided, well, 01:20:15.160 --> 01:20:19.000 if people have gone through the effort to try and find these secret numbers, 01:20:19.000 --> 01:20:23.830 and many different people are doing it, and this one chain is longer than 01:20:23.830 --> 01:20:27.400 the others because it's been verified-- again, using this term verified-- 01:20:27.400 --> 01:20:31.120 it's been proven with work over and over and over, we're 01:20:31.120 --> 01:20:33.380 just going to agree that that's the right one. 01:20:33.380 --> 01:20:34.987 So again, it's kind of strange. 01:20:34.987 --> 01:20:37.570 And I do, again, refer you to that video that I shared earlier 01:20:37.570 --> 01:20:39.862 to get into some of the more technical details of this, 01:20:39.862 --> 01:20:42.430 which I'm glossing over a little bit here in this discussion. 
01:20:42.430 --> 01:20:46.060 But proof of work is basically the collective consensus 01:20:46.060 --> 01:20:48.310 of blockchain users, or in this case specifically, 01:20:48.310 --> 01:20:54.232 of Bitcoin users, for which transactions they are going to consider valid. 01:20:54.232 --> 01:20:56.940 Because changing any one-- and if you go back in time, as opposed 01:20:56.940 --> 01:21:00.450 to trying to forward think I want to add a new fraudulent transaction, 01:21:00.450 --> 01:21:03.900 if you try and go back in time to modify a transaction from the past, 01:21:03.900 --> 01:21:08.910 say there was a transaction that was you pay me $10 01:21:08.910 --> 01:21:12.870 and I maintain a copy of the blockchain, so I can go back in time 01:21:12.870 --> 01:21:21.150 and modify that file, technically, I change it to you pay me $100, well, 01:21:21.150 --> 01:21:23.430 because I've changed even the tiniest thing 01:21:23.430 --> 01:21:26.340 and I'm hashing that block, that means that when I hash it 01:21:26.340 --> 01:21:29.700 with that secret number, I'm no longer getting that secret pattern of 30 01:21:29.700 --> 01:21:32.010 numbers, 30 zeros in a row. 01:21:32.010 --> 01:21:35.650 And so that kind of calls that transaction into question. 01:21:35.650 --> 01:21:38.280 It also, because each of those blocks contains a reference 01:21:38.280 --> 01:21:40.560 to the next block and the previous block, 01:21:40.560 --> 01:21:46.770 it also invalidates all of the other transactions in that blockchain. 
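To see why editing an old transaction is detectable, here is a minimal sketch of a hash-linked chain. It assumes, as Bitcoin does, that each block stores the hash of the block before it, and it leaves out proof of work entirely. The `Block` class, its fields, and the helper names are hypothetical and chosen only for this illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    data: str        # stands in for the block's list of transactions
    prev_hash: str   # hash of the block that comes before this one

    def digest(self) -> str:
        """SHA-256 over the link to the previous block plus this block's data."""
        return hashlib.sha256((self.prev_hash + self.data).encode()).hexdigest()


def build_chain(entries: list[str]) -> list[Block]:
    chain: list[Block] = []
    prev = "0" * 64  # placeholder link for the first block, which has no predecessor
    for entry in entries:
        block = Block(entry, prev)
        chain.append(block)
        prev = block.digest()
    return chain


def chain_is_consistent(chain: list[Block]) -> bool:
    """Every block must store the current hash of the block before it."""
    return all(later.prev_hash == earlier.digest()
               for earlier, later in zip(chain, chain[1:]))


chain = build_chain(["you pay me $10", "Alice pays Bob $5", "Bob pays Carol $2"])
print(chain_is_consistent(chain))   # True

chain[0].data = "you pay me $100"   # rewrite history in the first block
print(chain_is_consistent(chain))   # False
```

Changing one character in the first block changes its hash, so the link stored in the second block no longer matches. To hide the edit, the forger would have to recompute every later block, and in a real chain redo the proof of work for each of them as well.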
01:21:46.770 --> 01:21:50.070 And so because of this weird technique we're 01:21:50.070 --> 01:21:54.840 doing of hashing blocks, hashing data, trying to look for specific patterns, 01:21:54.840 --> 01:21:58.680 but realizing that any cryptographic hash function with the tiniest 01:21:58.680 --> 01:22:03.270 change to the input creates a totally different output, 01:22:03.270 --> 01:22:06.390 we actually are pretty well defended against people 01:22:06.390 --> 01:22:10.575 who try and go back in time and make fraudulent transactions using 01:22:10.575 --> 01:22:11.200 the blockchain. 01:22:11.200 --> 01:22:16.440 So it's mathematical and it's quirky, but it does provide a clever way 01:22:16.440 --> 01:22:19.260 to defend against that kind of thing, considering 01:22:19.260 --> 01:22:21.097 we don't have a central authority to rely on 01:22:21.097 --> 01:22:22.680 to adjudicate these kinds of disputes. 01:22:22.680 --> 01:22:26.400 We are collectively, not trusting one another enough, 01:22:26.400 --> 01:22:30.270 but agreeing to trust the mathematics of the blockchain in order 01:22:30.270 --> 01:22:33.860 for it to succeed. 01:22:33.860 --> 01:22:36.615 So as I mentioned, we can very easily verify the correctness 01:22:36.615 --> 01:22:37.740 of someone's proof of work. 01:22:37.740 --> 01:22:41.070 That proof of work is just the number that is hashed with the block 01:22:41.070 --> 01:22:46.090 to produce the secret pattern of 30 zeros and then some other bits, 01:22:46.090 --> 01:22:47.250 and so on. 01:22:47.250 --> 01:22:49.350 The longer a chain gets, the more and more likely 01:22:49.350 --> 01:22:51.988 it is that all the transactions in it are "verified." 01:22:51.988 --> 01:22:54.030 Again, I keep putting air quotes around that word 01:22:54.030 --> 01:22:57.420 because it doesn't mean in exactly the same way 01:22:57.420 --> 01:23:00.390 that we might consider verified colloquially to mean. 
01:23:00.390 --> 01:23:03.090 It doesn't prove anything about the transaction itself, 01:23:03.090 --> 01:23:06.660 just that we accept it as the standard. 01:23:06.660 --> 01:23:09.630 We accept this as the de facto truth because of all the mathematics 01:23:09.630 --> 01:23:12.470 that have been put into it. 01:23:12.470 --> 01:23:14.500 So the longer a chain gets, the more likely 01:23:14.500 --> 01:23:17.890 it is that it consists of only verified, legitimate transactions. 01:23:17.890 --> 01:23:22.470 But that brings up a question of, what is a transaction? 01:23:22.470 --> 01:23:26.320 A transaction is just an exchange between two people. 01:23:26.320 --> 01:23:28.200 And if we start to really spread things out, 01:23:28.200 --> 01:23:33.720 we can almost think about a transaction as a contract. 01:23:33.720 --> 01:23:37.950 I offer you $10 for you to do something on my behalf, 01:23:37.950 --> 01:23:41.282 and assuming that we're intending for me to actually give you these $10, 01:23:41.282 --> 01:23:43.740 and you're intending to actually do something on my behalf, 01:23:43.740 --> 01:23:47.610 and the thing that you're doing for me is not illegal, 01:23:47.610 --> 01:23:50.830 we've basically formed a contract. 01:23:50.830 --> 01:23:54.120 And so while Bitcoin can be used, the blockchain for Bitcoin 01:23:54.120 --> 01:23:57.940 can be used to send money back and forth between people, 01:23:57.940 --> 01:24:02.850 the data that goes into the data block of any blockchain is arbitrary. 01:24:02.850 --> 01:24:07.230 And there's no reason why, instead of being a list of transactions, 01:24:07.230 --> 01:24:10.410 that data couldn't be something much more significant than that. 01:24:10.410 --> 01:24:14.570 There's no reason it couldn't be a digitally signed PDF 01:24:14.570 --> 01:24:17.610 scan of a contract between two people. 
01:24:17.610 --> 01:24:22.830 There's no reason it can't be a message from me typed to you saying, 01:24:22.830 --> 01:24:27.660 I will pay you $100 if you paint my house on Tuesday, 01:24:27.660 --> 01:24:32.190 and you sending something back in that same chain saying, I will paint-- 01:24:32.190 --> 01:24:35.640 I accept your offer for this payment. 01:24:35.640 --> 01:24:36.720 I accept your offer. 01:24:36.720 --> 01:24:40.560 I will paint your house on Tuesday in exchange for $100. 01:24:40.560 --> 01:24:44.850 We've just formed a contract with no middleman at all. 01:24:44.850 --> 01:24:46.710 We are announcing our intentions. 01:24:46.710 --> 01:24:50.460 It is being recorded publicly in everybody's version of the blockchain. 01:24:50.460 --> 01:24:53.460 There is verified, again, verified in the sense 01:24:53.460 --> 01:24:59.458 that we collectively term to be accurate rather than 01:24:59.458 --> 01:25:02.250 proving that I definitely sent this although the digital signatures 01:25:02.250 --> 01:25:04.125 associated with these transactions do, again, 01:25:04.125 --> 01:25:07.870 suggest yes, I am the person who made this transaction because I digitally 01:25:07.870 --> 01:25:08.370 signed it. 01:25:08.370 --> 01:25:12.510 If I do the same thing with a contract, if I send you an offer 01:25:12.510 --> 01:25:16.330 and you accept, and both of those items are in the chain, 01:25:16.330 --> 01:25:18.210 we arguably have formed a contract. 01:25:18.210 --> 01:25:22.890 And that is what the blockchain associated with the Ethereum technology 01:25:22.890 --> 01:25:24.840 is actually more akin to. 01:25:24.840 --> 01:25:27.210 So Bitcoin is kind of restricted in how it 01:25:27.210 --> 01:25:31.860 approaches cryptocurrency and approaches transactions between people. 01:25:31.860 --> 01:25:34.120 And Ethereum opens up a little bit more. 
01:25:34.120 --> 01:25:36.750 And there are other blockchain technologies and other services 01:25:36.750 --> 01:25:41.040 that rely on the blockchain in order to do things far 01:25:41.040 --> 01:25:45.210 beyond what a cryptocurrency could do. 01:25:45.210 --> 01:25:49.290 But all these things are only possible because we rely on-- 01:25:49.290 --> 01:25:51.990 we rely so extensively on cryptography. 01:25:51.990 --> 01:25:57.420 We use computers to send information securely, encrypt information. 01:25:57.420 --> 01:26:00.030 And the mathematical unlikelihood of someone 01:26:00.030 --> 01:26:02.880 being able to duplicate our work, or certainly 01:26:02.880 --> 01:26:06.090 reverse engineer this encryption is what gives us 01:26:06.090 --> 01:26:10.357 the confidence to make these transactions in the first place. 01:26:10.357 --> 01:26:12.690 And so cryptography forms the basis of almost everything 01:26:12.690 --> 01:26:17.220 that we do when we talk about security on a computer. 01:26:17.220 --> 01:26:21.850 But ultimately, cryptography just relies on mathematics. 01:26:21.850 --> 01:26:24.400 So the moral of the story is probably this. 01:26:24.400 --> 01:26:26.512 You are probably not going to be implementing 01:26:26.512 --> 01:26:27.970 your own version of the blockchain. 01:26:27.970 --> 01:26:32.650 And really, you don't need to understand it completely in order to use it. 01:26:32.650 --> 01:26:36.790 Like I said, you can use Bitcoin without knowing the mathematics of how Bitcoin 01:26:36.790 --> 01:26:40.750 works, just like you can use a bank without knowing the minutiae of how 01:26:40.750 --> 01:26:42.400 the banking system works. 01:26:42.400 --> 01:26:46.090 The point of the blockchain is to remove a central authority. 01:26:46.090 --> 01:26:50.320 We don't rely on one person or one entity or one government 01:26:50.320 --> 01:26:54.610 to determine what has happened, what the transactions are 01:26:54.610 --> 01:26:55.690 like we do with a bank.
01:26:55.690 --> 01:26:58.960 Your bank has a ledger of everybody's accounts. 01:26:58.960 --> 01:27:03.130 With blockchain technology, we are decentralizing this and making it 01:27:03.130 --> 01:27:05.800 so that everybody has access to all of the information at once, 01:27:05.800 --> 01:27:09.850 and it is everybody's responsibility to keep that ledger accurate. 01:27:09.850 --> 01:27:13.150 And because these ledgers rely so extensively on cryptography, 01:27:13.150 --> 01:27:16.210 because this technology relies on cryptography, 01:27:16.210 --> 01:27:18.940 we can use the power of cryptography, the fact 01:27:18.940 --> 01:27:23.050 that things are very difficult to reverse engineer mathematically 01:27:23.050 --> 01:27:25.630 to verify that yes, these are the things, 01:27:25.630 --> 01:27:28.630 these are the things that have happened, these are the transactions that 01:27:28.630 --> 01:27:33.420 have been logged, and everybody knows about it at the same time.