WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.904 [MUSIC PLAYING] 00:00:17.047 --> 00:00:18.130 DAVID J. MALAN: All right. 00:00:18.130 --> 00:00:21.220 This is CS50's Introduction to Cybersecurity. 00:00:21.220 --> 00:00:25.450 My name is David Malan, and this week, we'll focus on securing data. 00:00:25.450 --> 00:00:27.987 Last week, recall, we focused on accounts, 00:00:27.987 --> 00:00:30.070 and particularly one of the mechanisms by which we 00:00:30.070 --> 00:00:33.880 protect our accounts is generally by way of these things called passwords. 00:00:33.880 --> 00:00:38.080 But we focused last time really on our having the responsibility 00:00:38.080 --> 00:00:39.640 to keep these things secure. 00:00:39.640 --> 00:00:41.650 And yet, there's another party involved whenever 00:00:41.650 --> 00:00:43.775 you have an account with a username and a password, 00:00:43.775 --> 00:00:46.240 and that's the server or app that is actually 00:00:46.240 --> 00:00:49.160 storing that password in some form long-term 00:00:49.160 --> 00:00:51.460 so that you can actually authenticate yourself-- 00:00:51.460 --> 00:00:53.980 that is, prove to this application or website 00:00:53.980 --> 00:00:56.110 that you are who you claim to be. 00:00:56.110 --> 00:01:00.010 Well, in the simplest form, perhaps these servers 00:01:00.010 --> 00:01:04.120 that are storing our usernames and passwords for which we have registered 00:01:04.120 --> 00:01:06.250 are maybe doing something very simple like this.
00:01:06.250 --> 00:01:10.210 For instance, if a website or app has two users at the moment, at least, 00:01:10.210 --> 00:01:12.670 Alice and Bob, and suppose for simplicity 00:01:12.670 --> 00:01:16.760 that Alice's password is apple and Bob's password is banana, 00:01:16.760 --> 00:01:20.390 you could imagine that website or app simply storing 00:01:20.390 --> 00:01:25.700 in a very simple text file these key-value pairs-- 00:01:25.700 --> 00:01:28.370 username, colon, password, new line. 00:01:28.370 --> 00:01:30.980 Username, colon, password, new line. 00:01:30.980 --> 00:01:33.020 And in fact, that's actually very commonly 00:01:33.020 --> 00:01:36.110 how passwords are stored on systems, at least certain operating 00:01:36.110 --> 00:01:39.350 systems like Linux, not necessarily as simply as this. 00:01:39.350 --> 00:01:42.290 They often have a little more information off to the right there, 00:01:42.290 --> 00:01:44.700 but in essence, it's the username and password. 00:01:44.700 --> 00:01:48.980 But it wouldn't be a good idea to store the passwords exactly like this. 00:01:48.980 --> 00:01:49.550 Why? 00:01:49.550 --> 00:01:53.030 Well, suppose that this website or this app and its database 00:01:53.030 --> 00:01:54.920 are somehow hacked by an adversary. 00:01:54.920 --> 00:01:58.310 If someone gains access to that file containing these usernames 00:01:58.310 --> 00:02:00.440 and passwords, well, at that point, they literally 00:02:00.440 --> 00:02:02.810 have everyone's username and password.
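To make the risk concrete, here is a minimal Python sketch of that naive scheme. It assumes a file of `username:password` lines like the one described above. The names `parse_password_file` and `login` are hypothetical and chosen only for illustration.

```python
# A deliberately insecure sketch: passwords stored in plaintext,
# one "username:password" pair per line, as in the lecture's toy example.
PLAINTEXT_FILE = """alice:apple
bob:banana
"""

def parse_password_file(text: str) -> dict[str, str]:
    """Parse 'username:password' lines into a dict (hypothetical helper)."""
    users = {}
    for line in text.splitlines():
        if line.strip():
            username, password = line.split(":", 1)
            users[username] = password
    return users

def login(users: dict[str, str], username: str, password: str) -> bool:
    """Naive check: compare the typed password with the stored plaintext."""
    return users.get(username) == password

users = parse_password_file(PLAINTEXT_FILE)
print(login(users, "alice", "apple"))  # True
print(login(users, "bob", "apple"))    # False
# Anyone who steals the file can read every password directly:
print(users)
```

Someone holding the stolen file needs no computation at all to learn every password. That is what makes credential stuffing so easy in this setup.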
00:02:02.810 --> 00:02:05.270 And we talked last time about attacks like credential 00:02:05.270 --> 00:02:09.620 stuffing whereby an adversary, once they know your username and password on one 00:02:09.620 --> 00:02:12.050 system, can try stuffing that username 00:02:12.050 --> 00:02:14.240 and password into other systems, other websites, 00:02:14.240 --> 00:02:18.620 other apps just in hopes that you are, unfortunately, using the same username 00:02:18.620 --> 00:02:21.000 and password elsewhere as well. 00:02:21.000 --> 00:02:23.810 So this is generally not a good thing if an adversary gets access 00:02:23.810 --> 00:02:26.360 to everyone's usernames and passwords. 00:02:26.360 --> 00:02:30.110 And even though, of course, in an ideal world, that would never happen, 00:02:30.110 --> 00:02:32.480 we, as the administrators, 00:02:32.480 --> 00:02:34.760 as the creators of this website or app, 00:02:34.760 --> 00:02:37.790 should probably do everything we can to at least minimize 00:02:37.790 --> 00:02:42.260 the fallout, the downsides, the damages that might result if, 00:02:42.260 --> 00:02:47.880 and, I daresay, when our database or this text file here is somehow compromised. 00:02:47.880 --> 00:02:49.880 So how might we go about doing that here? 00:02:49.880 --> 00:02:53.850 Rather than just storing apple and banana in clear text, 00:02:53.850 --> 00:02:57.020 so to speak, literally as the English words themselves, 00:02:57.020 --> 00:03:00.263 why don't we go ahead and employ a technique known as hashing? 00:03:00.263 --> 00:03:02.180 Now if you've studied computer science before, 00:03:02.180 --> 00:03:06.020 you might actually know this term in the context of hash tables and data 00:03:06.020 --> 00:03:06.650 structures.
00:03:06.650 --> 00:03:11.275 Well, it turns out the idea in this world of securing data is very similar, 00:03:11.275 --> 00:03:14.150 and in fact, this is a technique that's incredibly common for solving 00:03:14.150 --> 00:03:15.560 all sorts of problems. 00:03:15.560 --> 00:03:17.780 Well, what do we mean by hashing in this context? 00:03:17.780 --> 00:03:20.960 Hashing is the process of taking a password as input 00:03:20.960 --> 00:03:25.250 and somehow converting it to a so-called hash or hash value. 00:03:25.250 --> 00:03:27.980 Now these hash values don't look like English. 00:03:27.980 --> 00:03:32.240 They're usually strings of text that might have letters, might have numbers, 00:03:32.240 --> 00:03:35.060 and they're typically of some fixed length. 00:03:35.060 --> 00:03:39.170 And in this case here, we take our password as input, 00:03:39.170 --> 00:03:43.580 convert it via an algorithm or some code that we wrote 00:03:43.580 --> 00:03:45.680 into this hash value, 00:03:45.680 --> 00:03:50.400 and then store that hash value in that database of passwords instead. 00:03:50.400 --> 00:03:53.450 So here's a proverbial black box, and let's stipulate for the moment 00:03:53.450 --> 00:03:57.710 that I have no idea how hashing works, but I do know that this box can do it. 00:03:57.710 --> 00:03:59.630 So how do I think about this process? 00:03:59.630 --> 00:04:03.500 Well, generally speaking, there's going to be some input to this box. 00:04:03.500 --> 00:04:06.630 Ultimately, I want to get some output from that box. 00:04:06.630 --> 00:04:11.150 And what this box really represents is, in fact, a hash function.
00:04:11.150 --> 00:04:14.060 You can think of this as a device like some kind of machine; 00:04:14.060 --> 00:04:16.970 you can think of it like a program, some piece of software; 00:04:16.970 --> 00:04:20.560 or you can even think about it as a mathematical function that operates 00:04:20.560 --> 00:04:22.760 simply on numbers coming in as input. 00:04:22.760 --> 00:04:24.610 In fact, if you're mathematically inclined, 00:04:24.610 --> 00:04:28.540 though we won't use this syntax often, you can think of that hash function 00:04:28.540 --> 00:04:33.250 as being represented by f, you can think of the input as being represented by x, 00:04:33.250 --> 00:04:37.678 and you can think of the output of this process as being f of x, that is, f(x). 00:04:37.678 --> 00:04:39.970 If you're not familiar with that notation, that's fine, 00:04:39.970 --> 00:04:43.750 but this connects hashing directly to basic mathematics 00:04:43.750 --> 00:04:45.880 that you might encounter before long. 00:04:45.880 --> 00:04:49.930 But what we care about is passing into this black box a password 00:04:49.930 --> 00:04:53.440 and getting out a hash, and then storing that hash and not 00:04:53.440 --> 00:04:57.670 the password in our database or text file of usernames and passwords. 00:04:57.670 --> 00:04:59.600 So how might we go about doing this? 00:04:59.600 --> 00:05:03.760 Well, if I were to provide apple as an input to this hash function, 00:05:03.760 --> 00:05:06.910 let's think about the simplest hash function possible 00:05:06.910 --> 00:05:12.100 that doesn't output apple, but some representation of apple 00:05:12.100 --> 00:05:14.325 that I can eventually store in that database.
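In code, the black box is just a function `f` that maps an input string `x` to an output string `f(x)`. This sketch assumes SHA-256 from Python's standard `hashlib` module as the hash function. That is an illustrative choice, not necessarily the function that produced the values on the lecture's slides.

```python
import hashlib

def f(x: str) -> str:
    """Hash function f: takes a password x and returns its hash f(x) in hex.
    SHA-256 is an assumed stand-in for the lecture's black box."""
    return hashlib.sha256(x.encode("utf-8")).hexdigest()

for password in ["apple", "banana", "a much longer passphrase than either"]:
    digest = f(password)
    # The output is always 64 hex characters, however long the input is.
    print(password, "->", digest, len(digest))
```

Whatever the length of `x`, the digest has the same fixed length, which matches the description of hash values above.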
00:05:14.325 --> 00:05:17.450 So I'm going to propose very simply that maybe the simplest hash function 00:05:17.450 --> 00:05:20.330 we can come up with-- and indeed, if you've studied computer science 00:05:20.330 --> 00:05:23.000 or taken CS50 itself, you might recall that we 00:05:23.000 --> 00:05:27.300 can hash our inputs based on specific letters therein. 00:05:27.300 --> 00:05:29.480 So apple starts with A. So you know what? 00:05:29.480 --> 00:05:31.490 A is the first letter of the English alphabet. 00:05:31.490 --> 00:05:33.470 So I'm going to create a hash function here 00:05:33.470 --> 00:05:38.360 pictorially that outputs 1 whenever the input happens to start with an A, 00:05:38.360 --> 00:05:39.530 as does apple. 00:05:39.530 --> 00:05:41.990 Meanwhile, if we pass in banana, I'm going 00:05:41.990 --> 00:05:44.540 to have this hash function output 2 because B 00:05:44.540 --> 00:05:46.670 is the second letter of the English alphabet. 00:05:46.670 --> 00:05:51.500 And dot-dot-dot, we might get to cherry or other passwords as well that might 00:05:51.500 --> 00:05:53.240 output 3 and beyond. 00:05:53.240 --> 00:05:56.870 And you could imagine doing this for all letters of the English alphabet. 00:05:56.870 --> 00:05:59.870 Now unfortunately, this isn't the best hash function 00:05:59.870 --> 00:06:01.640 because it's fairly simplistic. 00:06:01.640 --> 00:06:05.180 And in fact, I can quickly think of some other fruits like avocados 00:06:05.180 --> 00:06:08.893 that also start with A and that would give me the same hash value. 00:06:08.893 --> 00:06:11.060 And that's actually a characteristic we'll come back 00:06:11.060 --> 00:06:15.830 to whereby when you hash values, there can actually be ambiguities, 00:06:15.830 --> 00:06:20.580 potentially, whereby two inputs might actually have the same output, 00:06:20.580 --> 00:06:23.800 and we'll consider eventually what the implications of that might be.
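The first-letter hash function proposed above fits in a few lines of Python. `first_letter_hash` is a hypothetical name, and the function is deliberately insecure. It only shows the mapping a to 1, b to 2, c to 3, and the collision between apple and avocado.

```python
def first_letter_hash(password: str) -> int:
    """Toy hash from the lecture: the position of the input's first letter
    in the English alphabet ('a' -> 1, 'b' -> 2, ..., 'z' -> 26).
    Not secure; for illustration only."""
    first = password[0].lower()
    if not ("a" <= first <= "z"):
        raise ValueError("toy hash only handles inputs starting with a letter")
    return ord(first) - ord("a") + 1

print(first_letter_hash("apple"))   # 1
print(first_letter_hash("banana"))  # 2
print(first_letter_hash("cherry"))  # 3
# Collision: two different inputs, same output.
print(first_letter_hash("avocado") == first_letter_hash("apple"))  # True
```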
00:06:23.800 --> 00:06:26.640 But for now, I dare say that's a little too simplistic. 00:06:26.640 --> 00:06:30.570 And what might be better than outputting 1 or 2 or 3 00:06:30.570 --> 00:06:34.320 is a little something more cryptic, because that's just too helpful. 00:06:34.320 --> 00:06:35.470 That's too much of a hint. 00:06:35.470 --> 00:06:38.280 If I see that your hash value is 1, I at least 00:06:38.280 --> 00:06:42.330 know that your password now clearly starts with an A, which means at best, 00:06:42.330 --> 00:06:46.570 I can do 1/26th the amount of work to figure out what it actually is. 00:06:46.570 --> 00:06:50.610 So we want these hashes generally to be a little weird-looking and really 00:06:50.610 --> 00:06:53.440 unguessable and not leak any information. 00:06:53.440 --> 00:06:57.960 So for instance, a very common older hash function for apple might actually 00:06:57.960 --> 00:07:05.580 output this-- ..ekWXa83dhiA with some mixed uppercase and lowercase letters 00:07:05.580 --> 00:07:06.180 therein. 00:07:06.180 --> 00:07:10.180 Now it looks weird, you probably can't and shouldn't see any kind of pattern 00:07:10.180 --> 00:07:10.680 in there. 00:07:10.680 --> 00:07:14.490 There is a fancy math formula that took as input apple 00:07:14.490 --> 00:07:18.420 and outputted as its hash value that string of text 00:07:18.420 --> 00:07:22.170 there, but in and of itself, it doesn't really leak any information 00:07:22.170 --> 00:07:24.270 like the number 1 or 2 or 3 would. 00:07:24.270 --> 00:07:25.920 So we've already made an improvement. 00:07:25.920 --> 00:07:28.510 Banana, meanwhile, would look like this. 00:07:28.510 --> 00:07:31.180 And cherry, meanwhile, would look like that. 00:07:31.180 --> 00:07:34.240 So notice that these values are indeed quite different. 
00:07:34.240 --> 00:07:37.320 So using this better hash function, which, I claim, doesn't just 00:07:37.320 --> 00:07:39.060 look at the first letter of the input, 00:07:39.060 --> 00:07:42.210 but looks at maybe all of the letters in the input-- 00:07:42.210 --> 00:07:46.380 C-H-E-R-R-Y in this case, we can probably come up with something more 00:07:46.380 --> 00:07:48.930 interesting, more cryptic-looking, if you will, 00:07:48.930 --> 00:07:50.550 like the values that we've just seen. 00:07:50.550 --> 00:07:54.750 So let me propose now that what we should do in our database of passwords 00:07:54.750 --> 00:07:59.400 is not store alice, apple, bob, banana, but rather 00:07:59.400 --> 00:08:03.400 store the hashes of apple and banana respectively. 00:08:03.400 --> 00:08:07.410 So in this password database, I'm going to store this instead. 00:08:07.410 --> 00:08:11.250 The exact same values that we just saw coming as outputs from that black box, 00:08:11.250 --> 00:08:13.920 but in this case now, I'm storing in my database 00:08:13.920 --> 00:08:18.900 of passwords usernames and hash values. 00:08:18.900 --> 00:08:21.260 Now why is this perhaps a good thing? 00:08:21.260 --> 00:08:23.750 Well, one, if someone now attacks this server 00:08:23.750 --> 00:08:28.220 and somehow gains access to all of these usernames and hashes, what they don't 00:08:28.220 --> 00:08:31.170 have is an entire list of passwords. 00:08:31.170 --> 00:08:35.510 So they can't quite as easily go about credential stuffing and figuring out 00:08:35.510 --> 00:08:39.710 whether the passwords in this database will give them access to accounts somewhere else. 00:08:39.710 --> 00:08:42.289 I'm at least creating some work for the adversary.
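Here is a minimal sketch of the change just proposed: keep the usernames, but replace each password with its hash before storing it. As before, SHA-256 via `hashlib` is an assumed stand-in for the lecture's unnamed hash function. There is no salt yet at this point in the story.

```python
import hashlib

def hash_password(password: str) -> str:
    # SHA-256 is an assumed stand-in for the lecture's unnamed hash function.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

plaintext = {"alice": "apple", "bob": "banana"}

# Store usernames and hash values only; the passwords themselves are not kept.
hashed_db = {user: hash_password(pw) for user, pw in plaintext.items()}

for user, digest in hashed_db.items():
    print(f"{user}:{digest}")
```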
00:08:42.289 --> 00:08:46.070 But at the same time, I feel like I've kind of broken the whole system 00:08:46.070 --> 00:08:49.610 because previously, presumably, when you log into a website or app 00:08:49.610 --> 00:08:53.150 and you type in your username and then you type in your password, what 00:08:53.150 --> 00:08:55.470 does the website or app probably do? 00:08:55.470 --> 00:08:58.190 Well, once that username and password are sent over the internet, 00:08:58.190 --> 00:09:00.950 typically to that server, well, the server probably 00:09:00.950 --> 00:09:04.760 compares what you typed in against the username in their database, 00:09:04.760 --> 00:09:07.640 or their text file, and the server compares 00:09:07.640 --> 00:09:10.790 what you typed in as your password against whatever 00:09:10.790 --> 00:09:12.410 password is in their database. 00:09:12.410 --> 00:09:13.790 But now we have a problem. 00:09:13.790 --> 00:09:16.940 You type in the username, and we do have the username 00:09:16.940 --> 00:09:18.200 still in the database. 00:09:18.200 --> 00:09:22.030 Case in point, Alice and Bob are still here. 00:09:22.030 --> 00:09:25.290 But what we don't have is apple and banana. 00:09:25.290 --> 00:09:27.600 We've replaced those altogether with hashes. 00:09:27.600 --> 00:09:30.840 So even if you type in-- or Alice types in apple, 00:09:30.840 --> 00:09:34.200 well, we don't want to compare A-P-P-L-E to this because it obviously 00:09:34.200 --> 00:09:37.200 doesn't match; and Bob's banana, we don't want to compare against this 00:09:37.200 --> 00:09:40.240 because it's not going to match; and so forth. 00:09:40.240 --> 00:09:41.830 So what can we do? 00:09:41.830 --> 00:09:46.560 Well, the way authentication typically works on the server side 00:09:46.560 --> 00:09:49.180 when using hashing is as follows.
00:09:49.180 --> 00:09:52.920 When you first create an account or register for this website or app, 00:09:52.920 --> 00:09:58.020 you type in, if you're Alice, Alice, Enter, and then apple, for instance, 00:09:58.020 --> 00:09:58.800 Enter. 00:09:58.800 --> 00:10:02.890 That username, Alice, and that password, apple, are sent to the server. 00:10:02.890 --> 00:10:06.270 But what the server does before saving the username and password 00:10:06.270 --> 00:10:10.300 is run that hash function on Alice's password, 00:10:10.300 --> 00:10:16.200 which is apple, convert it to this value, and store Alice's 00:10:16.200 --> 00:10:21.570 username and the hash of Alice's password only, and then throw away apple, 00:10:21.570 --> 00:10:23.640 delete it, forget it from memory. 00:10:23.640 --> 00:10:25.600 What happens next? 00:10:25.600 --> 00:10:29.400 Well, the next time Alice tries to log into this website-- 00:10:29.400 --> 00:10:33.270 maybe the next day, a week from then, a year from then for the second or third 00:10:33.270 --> 00:10:35.130 or more time, what happens? 00:10:35.130 --> 00:10:38.010 Well, Alice types in Alice as her username, hopefully 00:10:38.010 --> 00:10:41.880 apple as her password, hits Enter, those get sent to the server as usual, 00:10:41.880 --> 00:10:44.100 and obviously the server can't just compare 00:10:44.100 --> 00:10:46.770 username against username and password against password 00:10:46.770 --> 00:10:49.890 because it doesn't have the password in its database, so 00:10:49.890 --> 00:10:51.090 what can the server do? 00:10:51.090 --> 00:10:53.910 The server can repeat the very same process, 00:10:53.910 --> 00:10:57.870 taking Alice's password as typed, A-P-P-L-E, 00:10:57.870 --> 00:11:02.190 running it through the exact same hash function a day, a week, a year later, 00:11:02.190 --> 00:11:07.530 and then comparing that resulting hash value to whatever is stored in this 00:11:07.530 --> 00:11:09.510 text file or database.
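The registration and login steps above can be sketched as two functions over an in-memory dictionary that plays the role of the text file or database. `register`, `log_in`, and `database` are hypothetical names, and SHA-256 is again an assumed stand-in. The constant-time comparison with `hmac.compare_digest` is a common precaution, not something the lecture prescribes.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    # Assumed stand-in hash; the lecture does not name a specific function.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

database: dict[str, str] = {}  # username -> hash value (hypothetical store)

def register(username: str, password: str) -> None:
    """On sign-up: hash the password and store only the hash."""
    database[username] = hash_password(password)

def log_in(username: str, password: str) -> bool:
    """On later logins: hash what was typed and compare it to the stored hash."""
    stored = database.get(username)
    if stored is None:
        return False
    # compare_digest takes the same time whether or not the strings match early.
    return hmac.compare_digest(hash_password(password), stored)

register("alice", "apple")
print(log_in("alice", "apple"))    # True
print(log_in("alice", "avocado"))  # False
```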
00:11:09.510 --> 00:11:13.860 And now admittedly, we're creating a whole lot more work for ourselves, 00:11:13.860 --> 00:11:16.770 but it's not that big a deal because this is just a math function, 00:11:16.770 --> 00:11:19.800 or if you know how to program, it's just a few lines of code 00:11:19.800 --> 00:11:23.610 that you've written in software that converts passwords to hash values. 00:11:23.610 --> 00:11:26.670 And honestly, nowadays, you wouldn't even be writing most of this code 00:11:26.670 --> 00:11:29.910 yourself; you'd be using a library, third-party code that someone 00:11:29.910 --> 00:11:32.850 else, smarter than you, maybe, has written and gotten just right, 00:11:32.850 --> 00:11:36.060 no bugs or mistakes, so you're just relying on someone else's code 00:11:36.060 --> 00:11:37.770 anyway to achieve this goal. 00:11:37.770 --> 00:11:42.420 But the upside now, to be clear, is if this file is compromised somehow, 00:11:42.420 --> 00:11:45.570 the server is hacked into and this data is leaked, 00:11:45.570 --> 00:11:52.620 at least the adversary only knows the usernames on your system, not the actual passwords. 00:11:52.620 --> 00:11:55.620 And let me pause here and see if there are any questions on this technique 00:11:55.620 --> 00:12:01.220 of hashing for passwords specifically. 00:12:01.220 --> 00:12:04.430 STUDENT: You said yourself, we are using libraries 00:12:04.430 --> 00:12:09.920 more often than writing the hash functions ourselves if we are not 00:12:09.920 --> 00:12:13.610 taking the course on CS50. 00:12:13.610 --> 00:12:17.180 So then it's easy to hack these hashes, right? 00:12:17.180 --> 00:12:20.705 Because we can go through 10, 40, I don't 00:12:20.705 --> 00:12:24.620 know, hash functions that are available in the libraries, 00:12:24.620 --> 00:12:29.392 and then you can reverse the hash results, is that right? 00:12:29.392 --> 00:12:30.350 DAVID J. MALAN: Almost.
00:12:30.350 --> 00:12:32.960 You can do exactly what you described first, whereby 00:12:32.960 --> 00:12:37.940 you use the same library, the same code, to create hash values to then compare 00:12:37.940 --> 00:12:41.330 those against what's in the database, but generally, these hashes 00:12:41.330 --> 00:12:43.280 are not reversible, per se. 00:12:43.280 --> 00:12:46.130 You can compare them, but you can't reverse the process 00:12:46.130 --> 00:12:47.720 for reasons we'll come back to. 00:12:47.720 --> 00:12:49.310 But your intuition is right. 00:12:49.310 --> 00:12:51.710 And so really, the takeaway here is that we 00:12:51.710 --> 00:12:54.620 haven't made our system absolutely secure; 00:12:54.620 --> 00:12:56.940 we've made it relatively more secure. 00:12:56.940 --> 00:12:57.440 Why? 00:12:57.440 --> 00:13:01.100 Because we've increased the cost to the adversary, to the hacker. 00:13:01.100 --> 00:13:05.540 They now have to do more work to figure out what the actual passwords are 00:13:05.540 --> 00:13:07.430 if they want to benefit from this hack. 00:13:07.430 --> 00:13:10.610 So again, it just raises the bar; it does not 00:13:10.610 --> 00:13:12.680 necessarily keep the adversary out or even 00:13:12.680 --> 00:13:15.210 stop them from figuring out one person's password, 00:13:15.210 --> 00:13:17.220 but it might take them a lot more time, it 00:13:17.220 --> 00:13:21.030 might take them a lot more resources like server or cloud costs or money, 00:13:21.030 --> 00:13:25.680 or it might even heighten their risk before they're actually successful. 00:13:25.680 --> 00:13:28.950 How about one other question here on hashing? 00:13:28.950 --> 00:13:32.430 STUDENT: If the password is intercepted before-- 00:13:32.430 --> 00:13:34.950 after the website is hacked and the password 00:13:34.950 --> 00:13:40.950 is intercepted before it's encrypted, so wouldn't that pose a problem? 00:13:40.950 --> 00:13:42.660 DAVID J. MALAN: Yes, absolutely.
00:13:42.660 --> 00:13:44.040 Then all bets are off. 00:13:44.040 --> 00:13:45.840 Everything we just discussed is not useful 00:13:45.840 --> 00:13:49.320 at all if the adversary has actually intercepted the password 00:13:49.320 --> 00:13:50.758 before it has even been hashed. 00:13:50.758 --> 00:13:53.550 Now thankfully, there's going to be solutions to that problem, too, 00:13:53.550 --> 00:13:57.180 and we'll come to them today, but for now, focusing only on hashes, 00:13:57.180 --> 00:13:59.380 it solves one problem but not all. 00:13:59.380 --> 00:14:04.140 In fact, it turns out that those attacks we talked about last time with respect 00:14:04.140 --> 00:14:06.240 to our accounts are still possible. 00:14:06.240 --> 00:14:09.360 You can still use a dictionary, for instance, of English words, 00:14:09.360 --> 00:14:12.240 or better yet, a dictionary of English fruits, 00:14:12.240 --> 00:14:18.150 and you could, one fruit at a time, run each of those values as input 00:14:18.150 --> 00:14:20.550 into the same hash function, the library or code 00:14:20.550 --> 00:14:22.560 that you're using to achieve this, and then 00:14:22.560 --> 00:14:25.860 that's going to give you one hash value after another. 00:14:25.860 --> 00:14:28.170 And you could compare each of those hash values 00:14:28.170 --> 00:14:31.890 against whatever is in the database or the file of passwords 00:14:31.890 --> 00:14:35.940 that you, the hacker in this story, might have actually stolen somehow. 00:14:35.940 --> 00:14:37.920 You have to do more work though, because it's 00:14:37.920 --> 00:14:41.700 no longer as simple as just comparing apple against apple and banana 00:14:41.700 --> 00:14:42.420 against banana. 00:14:42.420 --> 00:14:44.680 You actually have to do some work. 00:14:44.680 --> 00:14:46.860 You have to do some computational work. 00:14:46.860 --> 00:14:50.100 And if the file is only a few values, of course, not a big deal. 
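Here is a sketch of the dictionary attack just described, run against a stolen table of unsalted hashes. The fruit list, the name `dictionary_attack`, and the use of SHA-256 are all illustrative assumptions. Each candidate word is hashed once and then compared with every stolen hash.

```python
import hashlib

def hash_password(password: str) -> str:
    # SHA-256 is an assumed stand-in for the site's hash function.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# What the adversary stole: usernames and unsalted hashes (built here for the demo).
stolen = {"alice": hash_password("apple"), "bob": hash_password("banana")}

# A hypothetical dictionary of guesses: a short list of fruits.
fruits = ["apple", "apricot", "avocado", "banana", "cherry"]

def dictionary_attack(stolen: dict[str, str], words: list[str]) -> dict[str, str]:
    """Hash each candidate word once and check it against every stolen hash."""
    cracked = {}
    for word in words:
        digest = hash_password(word)
        for user, stored in stolen.items():
            if digest == stored:
                cracked[user] = word
    return cracked

print(dictionary_attack(stolen, fruits))  # {'alice': 'apple', 'bob': 'banana'}
```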
00:14:50.100 --> 00:14:54.150 If it's thousands or millions of rows, it might actually take a lot more 00:14:54.150 --> 00:14:55.948 time, energy, and effort. 00:14:55.948 --> 00:14:58.740 So again, we're just raising the bar, but not keeping the adversary 00:14:58.740 --> 00:15:00.010 out altogether. 00:15:00.010 --> 00:15:02.770 And even if you don't have a dictionary available, 00:15:02.770 --> 00:15:06.030 and even if the passwords are not all fruits in English, 00:15:06.030 --> 00:15:10.080 well, you can still, as the adversary, resort to brute-force attacks. 00:15:10.080 --> 00:15:15.390 And you can try even the simplest of passwords like 0000 or maybe eight 00:15:15.390 --> 00:15:19.680 0's instead, and you can hash that and see what the resulting hash value is 00:15:19.680 --> 00:15:22.170 and compare that against what's in the database. 00:15:22.170 --> 00:15:28.620 Then you can try 00000001, hash that, compare that against 00:15:28.620 --> 00:15:31.380 what's in the database, and then move on to the next and the next, 00:15:31.380 --> 00:15:34.080 doing this not just for numbers, but for letters as well. 00:15:34.080 --> 00:15:38.400 A, A, A, A, A, A, A, A, A, hash that and compare. 00:15:38.400 --> 00:15:40.590 Eventually, apple will be on that list. 00:15:40.590 --> 00:15:42.682 Eventually, banana will be on that list. 00:15:42.682 --> 00:15:44.640 But there, too, the brute force attack is still 00:15:44.640 --> 00:15:46.120 going to take some amount of time. 00:15:46.120 --> 00:15:48.660 So it's just increasing the cost or the complexity 00:15:48.660 --> 00:15:51.810 for the adversary in this particular case. 00:15:51.810 --> 00:15:54.810 But there's yet another threat that's possible in the context now 00:15:54.810 --> 00:15:58.170 of the hashes, which is worth knowing about. 
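The brute-force attack can be sketched with `itertools.product`, which enumerates every string of a given length over an alphabet, in order. The function name `brute_force` and the choice of four-digit numeric passwords are assumptions made to keep the example fast.

```python
import hashlib
import itertools
import string
from typing import Optional

def hash_password(password: str) -> str:
    # SHA-256 is an assumed stand-in for the site's hash function.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def brute_force(target_hash: str, alphabet: str, length: int) -> Optional[str]:
    """Try every string of the given length over the alphabet, in order
    (0000, 0001, 0002, ...), until one hashes to target_hash."""
    for chars in itertools.product(alphabet, repeat=length):
        guess = "".join(chars)
        if hash_password(guess) == target_hash:
            return guess
    return None  # the password is not in this search space

stolen_hash = hash_password("1234")  # demo value standing in for a stolen hash
print(brute_force(stolen_hash, string.digits, 4))  # 1234
```

The cost grows quickly with length and alphabet size. Four digits means at most 10**4 = 10,000 guesses, and eight digits means 10**8 = 100,000,000. Eight lowercase letters means 26**8 = 208,827,064,576 guesses. That growth is the "amount of time" that brute force costs the adversary.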
00:15:58.170 --> 00:16:00.690 There's a term of art known as a rainbow table, which 00:16:00.690 --> 00:16:05.040 is a very beautiful way of saying that adversaries in advance 00:16:05.040 --> 00:16:09.150 might have already hashed all possible English words in a dictionary. 00:16:09.150 --> 00:16:13.400 Adversaries might have already hashed all possible passwords of length 4 00:16:13.400 --> 00:16:16.410 or 5 or 6 or 7 or 8 or something else. 00:16:16.410 --> 00:16:18.630 And maybe if they have a big enough hard drive, 00:16:18.630 --> 00:16:21.510 they might store a big table, like an Excel file 00:16:21.510 --> 00:16:25.950 or a CSV file of all of the words that they've tried, all of the passwords 00:16:25.950 --> 00:16:30.090 they've tried, and all of the hash values they've already computed. 00:16:30.090 --> 00:16:31.320 Then it's even easier. 00:16:31.320 --> 00:16:34.260 Then they don't even need to do a brute-force attack, per se, 00:16:34.260 --> 00:16:36.480 hashing and hashing and hashing and hashing. 00:16:36.480 --> 00:16:38.880 Then they can just compare, compare, compare. 00:16:38.880 --> 00:16:41.490 Because indeed, a rainbow table simply contains 00:16:41.490 --> 00:16:46.110 all of the passwords they've tried, all of the hash values they've generated, 00:16:46.110 --> 00:16:48.510 and so they just compare whatever hash value 00:16:48.510 --> 00:16:52.110 they stole against the hash values they've already computed. 00:16:52.110 --> 00:16:56.430 Now for certain hash functions, this threat of a rainbow table 00:16:56.430 --> 00:16:57.690 is just not feasible. 00:16:57.690 --> 00:17:03.060 You might need terabytes or petabytes of data, which means a lot of hard drives 00:17:03.060 --> 00:17:06.630 and a lot of money, so there are potential downward pressures 00:17:06.630 --> 00:17:09.690 on this kind of an attack, but it can certainly speed things up.
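The lecture's description of a rainbow table amounts to a lookup table from hash value to password, computed in advance, and this sketch builds that simpler structure. Rainbow tables in the strict sense store chains of hashes linked by reduction functions, which trades lookup time for storage. The underlying threat is the same. The four-digit keyspace and SHA-256 are assumptions.

```python
import hashlib
import itertools
import string

def hash_password(password: str) -> str:
    # SHA-256 is an assumed stand-in for the site's hash function.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Done once, in advance: hash every 4-digit PIN and remember hash -> password.
lookup_table = {
    hash_password("".join(digits)): "".join(digits)
    for digits in itertools.product(string.digits, repeat=4)
}

# Later, cracking a stolen hash is a single lookup rather than a new search.
stolen_hash = hash_password("0420")
print(lookup_table.get(stolen_hash))  # 0420
print(len(lookup_table))              # 10000
```

The table trades storage for speed. It grows with the keyspace, which is why covering long passwords can take the terabytes or petabytes mentioned above.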
00:17:09.690 --> 00:17:12.060 Certainly if you're pre-computing-- that is, 00:17:12.060 --> 00:17:14.930 pre-calculating some of the hashes for at least words 00:17:14.930 --> 00:17:17.900 in an English dictionary, and certainly some short list like all 00:17:17.900 --> 00:17:19.890 of the fruits in the world. 00:17:19.890 --> 00:17:21.980 But there's another problem that we might 00:17:21.980 --> 00:17:25.010 encounter on the server with regard to our passwords. 00:17:25.010 --> 00:17:28.369 Alice might have a password of apple, Bob might have a password of banana, 00:17:28.369 --> 00:17:34.460 but suppose that both Carol and Charlie have a password of cherry. 00:17:34.460 --> 00:17:38.030 And just by coincidence, they both chose the same password 00:17:38.030 --> 00:17:39.940 and are in this same database. 00:17:39.940 --> 00:17:42.950 Now we've already concluded, I think, that we definitely don't 00:17:42.950 --> 00:17:45.380 want to store the plaintext passwords. 00:17:45.380 --> 00:17:50.030 We don't want to store literally in the clear apple, banana, cherry, and cherry 00:17:50.030 --> 00:17:53.690 because this is just too easy for the adversary to do bad things with it. 00:17:53.690 --> 00:17:56.390 So we at least want to hash this, but here's 00:17:56.390 --> 00:18:00.800 where hashing can leak information, so to speak. 00:18:00.800 --> 00:18:02.900 If I go ahead and use the same function I've 00:18:02.900 --> 00:18:06.500 been using to hash apple and banana and now cherry, 00:18:06.500 --> 00:18:12.620 what do you notice about Carol's and Charlie's hash values? 00:18:12.620 --> 00:18:17.180 Curiously, but maybe not surprisingly, they're exactly the same. 00:18:17.180 --> 00:18:19.620 That's, after all, how functions typically work, 00:18:19.620 --> 00:18:21.740 be it in math or in software, in code. 
00:18:21.740 --> 00:18:25.370 If you pass the exact same input, unless there's some randomness going on, 00:18:25.370 --> 00:18:27.860 you're going to get the same output again and again. 00:18:27.860 --> 00:18:29.640 Now why is this a big deal? 00:18:29.640 --> 00:18:33.020 Well, if some adversary attacks this database and gains 00:18:33.020 --> 00:18:36.380 access to all of these usernames and hashes, 00:18:36.380 --> 00:18:40.640 we have leaked information in the sense that the adversary, just 00:18:40.640 --> 00:18:43.760 by glancing at this file, knows that, OK, I 00:18:43.760 --> 00:18:46.860 don't know what Carol's password is or what Charlie's password is, 00:18:46.860 --> 00:18:50.090 but I know it's the same password, and that alone 00:18:50.090 --> 00:18:54.350 might be enough information to figure out with higher probability what it is. 00:18:54.350 --> 00:18:56.430 Maybe Carol and Charlie are related. 00:18:56.430 --> 00:19:01.100 So maybe you focus on words or numbers that are common to both of them. 00:19:01.100 --> 00:19:05.360 Maybe there's some information that's implied by this if they both are-- 00:19:05.360 --> 00:19:08.090 they both like the same TV shows, they both like the same movies. 00:19:08.090 --> 00:19:12.840 You can try to find, in your mind, maybe the intersection of information that 00:19:12.840 --> 00:19:15.810 might lead you, with higher probability, to figure out, 00:19:15.810 --> 00:19:20.220 without brute force, even, what Carol's password is and Charlie's password is. 00:19:20.220 --> 00:19:24.090 So this is a common problem, and we only have four users in this database. 00:19:24.090 --> 00:19:25.770 You can imagine having many more. 00:19:25.770 --> 00:19:28.468 Odds are, some of us are going to have the same username-- not 00:19:28.468 --> 00:19:31.260 the same username, some of us are going to have the same passwords. 
00:19:31.260 --> 00:19:35.490 In fact, without raising your hands or admitting to this for the whole world 00:19:35.490 --> 00:19:43.530 to see, do any of you have a password of 1234 on some website or app? 00:19:43.530 --> 00:19:44.580 Maybe a little harder? 00:19:44.580 --> 00:19:48.030 12345678? 00:19:48.030 --> 00:19:49.500 Something very simple like this. 00:19:49.500 --> 00:19:51.600 Maybe it's an account you don't really care about. 00:19:51.600 --> 00:19:54.420 Well, that's a perfect example of where, if you 00:19:54.420 --> 00:19:57.810 have an account on the same system as someone else here in the classroom, 00:19:57.810 --> 00:20:01.950 you're going to have, in that database, presumably, the same hash values, 00:20:01.950 --> 00:20:06.540 and that alone might be enough information to leak and increase 00:20:06.540 --> 00:20:09.330 the probability that you, and not Alice or Bob, 00:20:09.330 --> 00:20:12.250 are actually compromised with respect to your account. 00:20:12.250 --> 00:20:13.540 So how can we fix this? 00:20:13.540 --> 00:20:16.500 Well, it turns out there's another technique in the world of data 00:20:16.500 --> 00:20:20.252 that we can use to perturb this process. 00:20:20.252 --> 00:20:21.960 And you can think of it metaphorically as 00:20:21.960 --> 00:20:25.450 sprinkling a little bit of salt on the hash function 00:20:25.450 --> 00:20:27.660 so as to change what its output is. 00:20:27.660 --> 00:20:31.090 It's not random, per se, but you are perturbing the output 00:20:31.090 --> 00:20:34.470 so that it's much less likely that two people with the same password 00:20:34.470 --> 00:20:36.970 are going to have the same hash value. 00:20:36.970 --> 00:20:38.190 So how does this work? 00:20:38.190 --> 00:20:42.370 Before, when we passed in cherry as our input, 00:20:42.370 --> 00:20:45.640 we got the same hash again and again.
00:20:45.640 --> 00:20:50.220 But let me propose that we modify our hash function to take two inputs now. 00:20:50.220 --> 00:20:55.260 Not just the password, but also a salt value, so to speak. 00:20:55.260 --> 00:20:58.740 A little bit of a sprinkling of, in this case, just two characters-- 00:20:58.740 --> 00:21:01.690 two numbers, two letters, or a combination thereof. 00:21:01.690 --> 00:21:04.380 Now this hash function that I'm describing is still 00:21:04.380 --> 00:21:06.660 going to output a hash value, but notice, 00:21:06.660 --> 00:21:09.840 it's different from the one before, and even if you don't quite remember 00:21:09.840 --> 00:21:11.940 what it was before, it was not this. 00:21:11.940 --> 00:21:15.790 But worth noting is that the output of this hash function 00:21:15.790 --> 00:21:18.130 now includes the salt itself. 00:21:18.130 --> 00:21:22.650 So the salt isn't something that's meant to be private or secret or secure; it's 00:21:22.650 --> 00:21:26.430 just sprinkled in there to make sure that whatever hash value comes out 00:21:26.430 --> 00:21:29.820 of this black box is a little bit different than if you 00:21:29.820 --> 00:21:32.920 had put a different salt value instead. 00:21:32.920 --> 00:21:38.190 So for instance, suppose that for Carol and for Charlie, 00:21:38.190 --> 00:21:39.870 we use different salts. 00:21:39.870 --> 00:21:40.920 And that's the idea. 00:21:40.920 --> 00:21:43.140 Different users should have different salt values 00:21:43.140 --> 00:21:45.540 just in case they choose the same passwords. 00:21:45.540 --> 00:21:48.690 So instead of 50 and cherry, suppose that Charlie 00:21:48.690 --> 00:21:51.930 uses a salt value of, say, 49. 00:21:51.930 --> 00:21:55.140 49 is not a number that Charlie or you or I have to pick. 00:21:55.140 --> 00:21:57.150 This is all done by the server automatically, 00:21:57.150 --> 00:22:00.940 picking two random characters like 4-9 or 5-0.
00:22:00.940 --> 00:22:02.190 But notice what just happened. 00:22:02.190 --> 00:22:07.830 If I rewind to cherry with a salt of 50, this was the hash value, the first two 00:22:07.830 --> 00:22:10.770 characters of which are the salt. If, though, 00:22:10.770 --> 00:22:16.110 I change the salt from 50 to 49, the hash changes completely, 00:22:16.110 --> 00:22:19.800 and the result is now prefixed with 49 instead of 50. 00:22:19.800 --> 00:22:24.390 This ensures that even if Carol and Charlie have the exact same password, 00:22:24.390 --> 00:22:28.380 there's no way I, the adversary, am going to know by looking at it. 00:22:28.380 --> 00:22:32.860 Because indeed, what ends up in the file now are these two values. 00:22:32.860 --> 00:22:37.110 One is prefixed with 50, one is prefixed with 49, and the rest of the hash values 00:22:37.110 --> 00:22:39.310 are clearly completely different. 00:22:39.310 --> 00:22:42.720 So again, the upside of this approach, where the hash function 00:22:42.720 --> 00:22:46.050 takes two inputs, the password and a salt, 00:22:46.050 --> 00:22:51.570 and then outputs one hash value, is that we're not leaking information 00:22:51.570 --> 00:22:53.040 except-- 00:22:53.040 --> 00:22:55.020 except-- so there is a corner case-- 00:22:55.020 --> 00:22:58.470 if by chance, by bad luck, the system chooses 00:22:58.470 --> 00:23:01.410 the same salt for both Carol and Charlie, 00:23:01.410 --> 00:23:03.990 yes, there might still be information leaked. 00:23:03.990 --> 00:23:06.480 And honestly, that may very well happen if you've 00:23:06.480 --> 00:23:09.270 got thousands or millions of users, since you're 00:23:09.270 --> 00:23:11.550 going to run out of two-character possibilities, 00:23:11.550 --> 00:23:13.010 and you're going to have to reuse salts. 00:23:13.010 --> 00:23:15.020 But the idea is that we're just trying to put 00:23:15.020 --> 00:23:20.240 downward pressure on the probability of being attacked successfully.
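Below is a minimal sketch of a two-character salt that is stored as a prefix of the output. It is not the function from the slides. `salted_hash`, `make_salt`, and `SALT_ALPHABET` are hypothetical names, and hashing `salt + password` with SHA-256 is an assumption made only for illustration. The 64-character salt alphabet mirrors the one classic Unix `crypt` used, which allows 64 × 64 = 4,096 possible salts. That small number is why salts eventually repeat once there are many users.

```python
import hashlib
import secrets
import string

# Assumed salt alphabet (64 characters), like the one classic Unix crypt used
SALT_ALPHABET = string.ascii_letters + string.digits + "./"

def make_salt() -> str:
    # The server, not the user, picks two random characters
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(2))

def salted_hash(password: str, salt: str) -> str:
    # Hash the salt together with the password, then prefix the salt
    # so the server can recover it later at login time
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return salt + digest

carol = salted_hash("cherry", "50")
charlie = salted_hash("cherry", "49")
print(carol[:2], charlie[:2])  # 50 49
print(carol == charlie)        # False: same password, different stored values
```

A single fast SHA-256 call keeps the sketch short. Production systems use a deliberately slow password-hashing function instead.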
00:23:20.240 --> 00:23:23.120 We're trying to equivalently raise the bar to the adversary 00:23:23.120 --> 00:23:28.160 so that they are not as likely to gain access to my data or, in turn, 00:23:28.160 --> 00:23:29.360 my account. 00:23:29.360 --> 00:23:34.650 Questions now on salting or hashing itself? 00:23:34.650 --> 00:23:35.940 STUDENT: Oh, I'm curious. 00:23:35.940 --> 00:23:37.562 Where do we store the salt? 00:23:37.562 --> 00:23:39.520 DAVID J. MALAN: So where do you store the salt? 00:23:39.520 --> 00:23:43.470 The salt is actually stored in the hash value itself, 00:23:43.470 --> 00:23:46.590 according to this algorithm, in the first two characters. 00:23:46.590 --> 00:23:50.580 And the value of storing the salt in the first two characters of the hash 00:23:50.580 --> 00:23:51.550 is as follows. 00:23:51.550 --> 00:23:56.310 The next time Carol logs in, she types in her username, Carol, and hits Enter. 00:23:56.310 --> 00:23:59.880 The server now knows, OK, I'm expecting a password from Carol, 00:23:59.880 --> 00:24:01.320 let's see what she types in. 00:24:01.320 --> 00:24:04.080 Suppose that she correctly types in cherry. 00:24:04.080 --> 00:24:06.720 Now the system is not storing cherry, so it's not 00:24:06.720 --> 00:24:08.910 going to compare literally what Carol typed in, 00:24:08.910 --> 00:24:14.160 but it is going to hash cherry. But first, the system is going to check, 00:24:14.160 --> 00:24:17.010 what is Carol's salt? 00:24:17.010 --> 00:24:21.060 And it's going to infer as much by looking at Carol's hash value 00:24:21.060 --> 00:24:24.013 and looking only at the first two characters by convention.
00:24:24.013 --> 00:24:27.180 Then what the server is going to do is take whatever Carol typed 00:24:27.180 --> 00:24:33.850 in, cherry, C-H-E-R-R-Y, pass in 50, 5-0, and then, hopefully, 00:24:33.850 --> 00:24:36.730 it's going to get back this same value here, 00:24:36.730 --> 00:24:38.480 this whole string in yellow. 00:24:38.480 --> 00:24:43.180 And if those match, then Carol will be considered authenticated. 00:24:43.180 --> 00:24:47.470 By contrast, if the username happens to be Charlie and Charlie hits Enter, 00:24:47.470 --> 00:24:50.560 then what the server is going to do is look at Charlie's hash value, 00:24:50.560 --> 00:24:53.680 grab the first two characters for Charlie's salt, 00:24:53.680 --> 00:24:57.340 use that salt and cherry as the input to the hash function, 00:24:57.340 --> 00:25:02.350 and hope that the result is Charlie's value, not Carol's. 00:25:02.350 --> 00:25:04.040 Really good question. 00:25:04.040 --> 00:25:07.120 Other questions on salting or hashing? 00:25:07.120 --> 00:25:09.790 STUDENT: Is there any sense in rehashing a password? 00:25:09.790 --> 00:25:14.770 So hashing it a first time to get a string, 00:25:14.770 --> 00:25:17.080 then rehashing it for a second string? 00:25:17.080 --> 00:25:19.210 Or is it just impractical? 00:25:19.210 --> 00:25:22.300 DAVID J. MALAN: No, you could certainly hash the value multiple times, 00:25:22.300 --> 00:25:25.430 but a good hash function should not require that of you. 00:25:25.430 --> 00:25:28.330 Especially more recent, modern hashes, one of which 00:25:28.330 --> 00:25:32.320 we'll look at in a moment, should have sufficiently calculated 00:25:32.320 --> 00:25:36.580 and proven characteristics that allow you to hash it just once 00:25:36.580 --> 00:25:39.430 and get a seemingly random string 00:25:39.430 --> 00:25:41.800 that represents whatever that input is.
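The login flow just described can be sketched as follows, reusing the same hypothetical two-character-prefix scheme: a SHA-256 stand-in, not the lecture's actual function. The steps are: read the first two characters of the stored value as the salt, hash the typed password with that salt, and compare the result with the stored value. `hmac.compare_digest` is a real standard-library function. Using it for the final comparison is a common precaution so the check doesn't reveal how many leading characters matched.

```python
import hashlib
import hmac

def salted_hash(password: str, salt: str) -> str:
    # Hypothetical scheme: two-character salt prefix + SHA-256 hex digest
    return salt + hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

def verify(stored: str, attempt: str) -> bool:
    salt = stored[:2]                     # by convention, the salt is the first two characters
    candidate = salted_hash(attempt, salt)
    return hmac.compare_digest(candidate, stored)  # constant-time string comparison

database = {
    "carol": salted_hash("cherry", "50"),
    "charlie": salted_hash("cherry", "49"),
}
print(verify(database["carol"], "cherry"))    # True
print(verify(database["charlie"], "cherry"))  # True
print(verify(database["carol"], "banana"))    # False
```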
00:25:41.800 --> 00:25:44.320 And here, too, is where I should emphasize 00:25:44.320 --> 00:25:47.980 that when it comes to this world of hashing and salting 00:25:47.980 --> 00:25:51.670 and today's other topics ultimately, these are not wheels 00:25:51.670 --> 00:25:54.640 that you or I should be reinventing, 00:25:54.640 --> 00:25:57.880 unless you are the researcher or the company that's actually 00:25:57.880 --> 00:26:02.320 developing the algorithms, stress-testing them, analyzing them theoretically 00:26:02.320 --> 00:26:05.890 and practically. So often in industry or the real world, 00:26:05.890 --> 00:26:10.300 when people like you and me invent our own systems for storing information, 00:26:10.300 --> 00:26:13.000 we just haven't spent nearly as much time 00:26:13.000 --> 00:26:16.120 or we're just not nearly as sharp as some of the security researchers 00:26:16.120 --> 00:26:18.680 out there who have really given this some thought. 00:26:18.680 --> 00:26:22.510 So when it comes to all things security-- and let me get on my soapbox 00:26:22.510 --> 00:26:26.200 here and say, you and I should not be solving these problems unless it is 00:26:26.200 --> 00:26:29.710 your full-time job or calling in life. 00:26:29.710 --> 00:26:31.930 There are just too many corner cases unless you're 00:26:31.930 --> 00:26:35.000 collaborating with a smart team. 00:26:35.000 --> 00:26:35.500 All right. 00:26:35.500 --> 00:26:40.230 With that said, here is what hashes generally 00:26:40.230 --> 00:26:41.870 look like nowadays in practice. 00:26:41.870 --> 00:26:43.620 For the sake of discussion, I deliberately 00:26:43.620 --> 00:26:48.240 chose a fairly simple hash function that was using a fairly short salt, 00:26:48.240 --> 00:26:52.260 just two characters, and a fairly short hash value as output.
00:26:52.260 --> 00:26:57.540 Here, in a smaller font, no less, is how Alice's and Bob's and Carol's 00:26:57.540 --> 00:27:00.180 and Charlie's passwords would probably be 00:27:00.180 --> 00:27:03.720 stored nowadays using a more recent, modern hash function 00:27:03.720 --> 00:27:06.900 that, notice, by the sheer length of the text on the screen, 00:27:06.900 --> 00:27:09.210 outputs a much larger value. 00:27:09.210 --> 00:27:12.030 If you're familiar from computer science with the notion of bits, 00:27:12.030 --> 00:27:15.420 0's and 1's that are used to store information in systems, 00:27:15.420 --> 00:27:19.350 these hash values use many more bits, many more 0's and 1's. 00:27:19.350 --> 00:27:23.910 You and I as humans are seeing them as alphabetical letters and as numbers, 00:27:23.910 --> 00:27:27.300 but underneath the hood, these are just more and more 0's and 1's 00:27:27.300 --> 00:27:31.470 that the computer is storing, which means it's much, much less likely 00:27:31.470 --> 00:27:33.840 that someone who steals this kind of file 00:27:33.840 --> 00:27:37.260 is going to be able to figure out efficiently what 00:27:37.260 --> 00:27:38.920 those original passwords were. 00:27:38.920 --> 00:27:41.500 And you can see, too, that for both Carol and Charlie, 00:27:41.500 --> 00:27:43.900 even though their passwords are still cherry, 00:27:43.900 --> 00:27:47.980 these two strings along the bottom look completely different, 00:27:47.980 --> 00:27:49.720 except in one location here. 00:27:49.720 --> 00:27:52.720 It turns out that the scheme a lot of systems have adopted 00:27:52.720 --> 00:27:56.080 is that if you look between dollar signs at the beginning of what 00:27:56.080 --> 00:28:01.330 seems to be the hash value, you'll see a code like y or y or y or y 00:28:01.330 --> 00:28:03.520 or other numbers or letters as well.
00:28:03.520 --> 00:28:07.780 That's a little cheat sheet that tells the system exactly what hash function 00:28:07.780 --> 00:28:10.037 was used to generate the rest of it. 00:28:10.037 --> 00:28:12.370 And that's in the documentation that you can read online 00:28:12.370 --> 00:28:14.720 for any number of hash functions. 00:28:14.720 --> 00:28:18.850 So that's just to say, when you create an account on some new website or app, 00:28:18.850 --> 00:28:22.840 if they are doing things well in a manner consistent with best practices 00:28:22.840 --> 00:28:27.100 and they are being mindful of your security, they are probably in a file 00:28:27.100 --> 00:28:29.890 or in a database or some other mechanism storing 00:28:29.890 --> 00:28:34.780 values that look quite like these based on whatever password you actually 00:28:34.780 --> 00:28:35.605 typed in. 00:28:35.605 --> 00:28:38.980 In fact, just to give you a sense of how easy or difficult 00:28:38.980 --> 00:28:43.690 it might be to crack passwords-- that is, figure out what they are based only 00:28:43.690 --> 00:28:47.080 on these hashes-- in the case of our first hash function, 00:28:47.080 --> 00:28:50.470 whereby we had a fairly short hash value being output 00:28:50.470 --> 00:28:52.810 with or without the salt, it turns out there are 00:28:52.810 --> 00:28:56.480 18 quintillion possible hash values. 00:28:56.480 --> 00:28:57.640 Now that's a lot. 00:28:57.640 --> 00:29:00.580 That's bigger than last time's quadrillion value. 00:29:00.580 --> 00:29:04.880 But with enough time, enough money, and enough cloud computing, 00:29:04.880 --> 00:29:07.810 those early hash functions can be broken. 00:29:07.810 --> 00:29:10.060 That is, with enough time and energy, you can probably 00:29:10.060 --> 00:29:11.860 figure out what someone's password is.
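Here is some back-of-the-envelope arithmetic behind "18 quintillion," as a rough sketch. The bit sizes are assumptions, not stated in the lecture. A 64-bit output has 2^64 ≈ 18.4 quintillion possible values, which matches the figure above. The "much longer" hashes are assumed here to be 256 bits. The guessing rate of one billion per second per machine is also hypothetical. The calculation counts the whole output space, so an attacker who tries common passwords first may succeed far sooner.

```python
# Assumptions, not from the lecture: 64-bit short hash, 256-bit long hash,
# and a hypothetical rate of one billion guesses per second per machine.
SHORT_BITS = 64
LONG_BITS = 256
GUESSES_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

short_space = 2 ** SHORT_BITS
long_space = 2 ** LONG_BITS

def years_to_enumerate(space: int, machines: int = 1) -> float:
    """Worst-case years to try every possible hash value at the assumed rate."""
    return space / (GUESSES_PER_SECOND * machines) / SECONDS_PER_YEAR

print(f"{short_space:,}")                      # 18,446,744,073,709,551,616
print(round(years_to_enumerate(short_space)))  # 585 years on one machine
print(years_to_enumerate(short_space, 1_000))  # well under a year with 1,000 machines
print(f"{years_to_enumerate(long_space, 1_000_000):.1e}")  # astronomically many years
```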
00:29:11.860 --> 00:29:14.650 If you fast forward to the other strings that I showed you 00:29:14.650 --> 00:29:18.080 on the screen, the much longer ones that use more bits, so to speak, 00:29:18.080 --> 00:29:22.510 then you have this many possible hash values nowadays. 00:29:22.510 --> 00:29:25.000 And I actually did look up how to pronounce this, 00:29:25.000 --> 00:29:28.300 but based on reading it on my screen, I wasn't actually sure 00:29:28.300 --> 00:29:31.915 how to say the word since this is a really big number that my mathematician 00:29:31.915 --> 00:29:33.790 colleagues could do a better job pronouncing. 00:29:33.790 --> 00:29:36.040 But given how many digits are on the screen, 00:29:36.040 --> 00:29:38.350 given how many commas are on the screen here, 00:29:38.350 --> 00:29:40.360 this is a really big number such that you 00:29:40.360 --> 00:29:45.730 and I probably don't need to worry about an adversary using brute force to figure 00:29:45.730 --> 00:29:49.660 out, even by the end of time, 00:29:49.660 --> 00:29:52.660 what the corresponding password might be, unless there 00:29:52.660 --> 00:29:55.510 are other weaknesses in the system. 00:29:55.510 --> 00:29:57.370 Now, speaking of weaknesses, 00:29:57.370 --> 00:30:00.130 has anyone ever forgotten their password? 00:30:00.130 --> 00:30:00.980 Yes, of course. 00:30:00.980 --> 00:30:04.630 But have you ever gone to a website or app, clicked that link that says, 00:30:04.630 --> 00:30:09.130 Forgot Password, question mark, in hopes of getting an email of some sort 00:30:09.130 --> 00:30:10.840 so that you can reset the password? 00:30:10.840 --> 00:30:13.840 I mean, odds are, almost everyone here has experienced that. 00:30:13.840 --> 00:30:17.830 But has anyone ever clicked on that link and gotten back 00:30:17.830 --> 00:30:22.030 an email that actually contains your password 00:30:22.030 --> 00:30:24.520 so that you're just immediately reminded what it is?
00:30:24.520 --> 00:30:26.110 I'm seeing a few nods of the head. 00:30:26.110 --> 00:30:28.520 You can copy-paste it, then, into the website. 00:30:28.520 --> 00:30:30.880 Do not use that website anymore. 00:30:30.880 --> 00:30:36.250 That is evidence of-- that is a symptom of a website or application not 00:30:36.250 --> 00:30:38.630 practicing best practices. 00:30:38.630 --> 00:30:39.160 Why? 00:30:39.160 --> 00:30:43.420 Well, if it is the case that the website can email you your password, 00:30:43.420 --> 00:30:46.930 that means they can see and they know what your password is. 00:30:46.930 --> 00:30:49.270 That means this database, this text file we've 00:30:49.270 --> 00:30:52.270 been talking about is probably vulnerable to some hacker 00:30:52.270 --> 00:30:55.360 eventually getting into it and stealing all of those usernames 00:30:55.360 --> 00:30:57.730 and passwords in the clear, no less. 00:30:57.730 --> 00:30:59.830 Because recall what these hashes are. 00:30:59.830 --> 00:31:02.230 They're generally meant to be irreversible. 00:31:02.230 --> 00:31:04.780 When you take as input apple, banana, and cherry, 00:31:04.780 --> 00:31:08.800 the output looks completely different with no obvious relationship to what 00:31:08.800 --> 00:31:11.590 those original passwords actually were. 00:31:11.590 --> 00:31:14.170 And so if that's what's being stored in the database, 00:31:14.170 --> 00:31:18.610 the company who made that website, the person who made that website or app, 00:31:18.610 --> 00:31:21.370 they should not be able to reverse that process either, 00:31:21.370 --> 00:31:23.470 otherwise surely, the adversary can. 
00:31:23.470 --> 00:31:27.100 So it is the case, and I've experienced this myself, often 00:31:27.100 --> 00:31:30.730 with smaller shops or companies that maybe haven't really 00:31:30.730 --> 00:31:33.310 invested a lot of time or care into their website, 00:31:33.310 --> 00:31:37.120 that if they are able to email you your original password, 00:31:37.120 --> 00:31:39.670 it is, by definition, not secure. 00:31:39.670 --> 00:31:43.120 And it's certainly not up to today's standards; it's just too easy 00:31:43.120 --> 00:31:44.480 for it to be compromised. 00:31:44.480 --> 00:31:46.752 So minimally, maybe stop using that service 00:31:46.752 --> 00:31:49.210 and make sure you're not using that password anywhere else. 00:31:49.210 --> 00:31:52.840 Maximally, maybe send them a note explaining your concern 00:31:52.840 --> 00:31:55.420 and maybe linking them to some reference online-- 00:31:55.420 --> 00:32:01.270 maybe this video-- in which you explain why you have that concern. 00:32:01.270 --> 00:32:07.630 Questions, then, on forgetting passwords or hashing or salting? 00:32:07.630 --> 00:32:13.190 STUDENT: So as you said, some companies may not be practicing these hashes 00:32:13.190 --> 00:32:15.790 and may be practicing something very bad. 00:32:15.790 --> 00:32:19.870 So if I were, let's say, a company and I-- 00:32:19.870 --> 00:32:24.670 because of my practices, I had a leak of passwords and all the data, 00:32:24.670 --> 00:32:30.400 do I as a company have any obligations or responsibility for what 00:32:30.400 --> 00:32:35.320 happened since I have all the customers' data and all their passwords, 00:32:35.320 --> 00:32:38.982 do I have any obligations or responsibilities? 00:32:38.982 --> 00:32:41.190 DAVID J. MALAN: It's a really good, a noble question. 00:32:41.190 --> 00:32:45.000 The answer to that, ethically, is probably yes, you do, quite simply.
00:32:45.000 --> 00:32:48.000 However, the more nuanced answer is that it's probably 00:32:48.000 --> 00:32:51.780 going to depend on the industry that you're in, the country that you're in, 00:32:51.780 --> 00:32:55.380 and any regulatory requirements that your company faces which might 00:32:55.380 --> 00:32:58.420 oblige you to report out in that way. 00:32:58.420 --> 00:33:02.220 So I would read up on the context that's specific to you. 00:33:02.220 --> 00:33:08.100 And I will say, unfortunately, it is not that common in the world, I dare say, 00:33:08.100 --> 00:33:11.370 that companies document and detail publicly 00:33:11.370 --> 00:33:13.638 when there have been security exploits. 00:33:13.638 --> 00:33:15.930 They might announce that something indeed has happened, 00:33:15.930 --> 00:33:19.140 but it is rare that companies will go into any amount of detail. 00:33:19.140 --> 00:33:22.800 Now this is understandable because, one, they're already embarrassed, 00:33:22.800 --> 00:33:26.520 if not in legal or financial trouble because of what has happened 00:33:26.520 --> 00:33:29.370 already, and two, they probably, typically, don't 00:33:29.370 --> 00:33:33.960 want to provide other adversaries-- other future attackers-- with more 00:33:33.960 --> 00:33:38.490 information about their systems and the weaknesses that those systems have. 00:33:38.490 --> 00:33:42.360 The downside, of course, societally, is that each of us 00:33:42.360 --> 00:33:45.420 might be secretly getting attacked in ways we didn't 00:33:45.420 --> 00:33:49.350 expect, learning things that would be ideal to share 00:33:49.350 --> 00:33:51.160 with others in the world, yet keeping those lessons to ourselves.
00:33:51.160 --> 00:33:54.420 This itself is actually a big question in the world of cybersecurity, 00:33:54.420 --> 00:33:57.000 just how much and how often to share, especially when 00:33:57.000 --> 00:33:59.400 you discover a bug or a mistake in someone's system: 00:33:59.400 --> 00:34:02.130 do you tell them privately, or do you tell the world publicly? 00:34:02.130 --> 00:34:06.540 These are ethical questions that we'll touch on indeed in the coming days 00:34:06.540 --> 00:34:07.880 as well. 00:34:07.880 --> 00:34:12.568 Allow me to propose that separate from these concerns here, 00:34:12.568 --> 00:34:14.610 we can come back to some of those recommendations 00:34:14.610 --> 00:34:17.940 that we started the class with from this, the National Institute 00:34:17.940 --> 00:34:19.380 of Standards and Technology. 00:34:19.380 --> 00:34:22.800 Here is one other quote we did not share last time. 00:34:22.800 --> 00:34:25.230 A recommendation from NIST is that "Verifiers 00:34:25.230 --> 00:34:28.110 SHALL store memorized secrets in a form that 00:34:28.110 --> 00:34:30.030 is resistant to offline attacks. 00:34:30.030 --> 00:34:33.690 Memorized secrets SHALL be salted and hashed 00:34:33.690 --> 00:34:36.960 using a suitable one-way key derivation function. 00:34:36.960 --> 00:34:41.199 Their purpose is to make each password guessing trial 00:34:41.199 --> 00:34:45.639 by an attacker who has obtained a password hash file expensive, 00:34:45.639 --> 00:34:48.400 and therefore, the cost of a guessing attack 00:34:48.400 --> 00:34:51.610 high or prohibitive." 00:34:51.610 --> 00:34:54.040 So when I refer to best practices, I'm really 00:34:54.040 --> 00:34:58.030 referring to actual documentation like this, either from the United States 00:34:58.030 --> 00:35:00.370 or from other countries or companies.
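One way to follow the NIST recommendation quoted above is to use a slow, salted key derivation function (KDF) instead of a single fast hash. The sketch below uses PBKDF2, which Python's standard library provides as `hashlib.pbkdf2_hmac`, with a fresh random 16-byte salt for each password. Several details are illustrative assumptions, not settings from the lecture or from NIST: the iteration count, the salt size, and the `scheme$cost$salt$hash` storage layout. That layout mimics the dollar-sign convention seen earlier and is hypothetical. The iteration count is the "expensive" knob: raising it makes every guess costlier for an attacker.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative cost factor; tune to your hardware and current guidance

def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = secrets.token_bytes(16)  # a fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Hypothetical storage layout: scheme$cost$salt$hash
    return f"pbkdf2_sha256${iterations}${salt.hex()}${key.hex()}"

def check_password(stored: str, attempt: str) -> bool:
    scheme, iterations, salt_hex, key_hex = stored.split("$")
    if scheme != "pbkdf2_sha256":
        raise ValueError(f"unknown scheme: {scheme}")
    key = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"),
                              bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(key.hex(), key_hex)

carol = hash_password("cherry")
charlie = hash_password("cherry")
print(carol != charlie)                  # True: random salts make the stored values differ
print(check_password(carol, "cherry"))   # True
print(check_password(carol, "banana"))   # False
```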
00:35:00.370 --> 00:35:03.940 There are indeed these best practices, and among our goals 00:35:03.940 --> 00:35:06.550 for this class is to expose you to some of those, 00:35:06.550 --> 00:35:10.630 both on the consumer side-- you and me as individual computer users, 00:35:10.630 --> 00:35:13.360 but also on the corporate or the academic side 00:35:13.360 --> 00:35:15.220 as well as to what you should be doing when 00:35:15.220 --> 00:35:19.000 you are in a position of being responsible for someone else's data as 00:35:19.000 --> 00:35:19.610 well. 00:35:19.610 --> 00:35:22.690 Now as for the actual hash functions to use nowadays, 00:35:22.690 --> 00:35:26.140 these are just some of them that are generally recommended nowadays 00:35:26.140 --> 00:35:29.170 that can be categorized as SHA-2 and SHA-3. 00:35:29.170 --> 00:35:32.920 These refer to fairly sophisticated mathematical functions that 00:35:32.920 --> 00:35:36.910 take as input, typically, a password, or some input more generally, 00:35:36.910 --> 00:35:40.060 and then output a hash value thereof. 00:35:40.060 --> 00:35:43.060 There are other algorithms, too, that can even 00:35:43.060 --> 00:35:49.370 be used to verify the authenticity and integrity of messages as well. 00:35:49.370 --> 00:35:53.620 In fact, today, we'll also focus on how we can use primitives like these 00:35:53.620 --> 00:35:57.490 to ensure that data was not actually changed in transit when you sent it 00:35:57.490 --> 00:36:00.040 over the internet from one person to another. 00:36:00.040 --> 00:36:03.280 But ultimately, what we've been focusing on and what you've seen on this list 00:36:03.280 --> 00:36:07.330 here are what are generally known as one-way hash functions. 
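Python's standard `hashlib` module includes members of both families named above: `sha256` from SHA-2 and `sha3_256` from SHA-3. This short sketch uses arbitrary example inputs to show the property described next. However long the input is, each of these hashes returns a fixed-length output of 256 bits, which is 64 hexadecimal characters.

```python
import hashlib

def digest_lengths(text: str) -> tuple[int, int]:
    """Return the hex-digest lengths of SHA-256 (SHA-2) and SHA3-256 (SHA-3)."""
    data = text.encode("utf-8")
    return (len(hashlib.sha256(data).hexdigest()),
            len(hashlib.sha3_256(data).hexdigest()))

inputs = ["a", "cherry", "correct horse battery staple" * 100]
for text in inputs:
    # Input lengths vary widely, but both digests are always 64 hex characters
    print(len(text), digest_lengths(text))
```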
00:36:07.330 --> 00:36:09.790 That is, these are mathematical functions, 00:36:09.790 --> 00:36:12.370 or, in the context of programming, these are 00:36:12.370 --> 00:36:15.760 functions written in code, languages like Python 00:36:15.760 --> 00:36:20.020 or otherwise, that take as input a string of arbitrary length. 00:36:20.020 --> 00:36:23.590 That is, a password that's this long, maybe this long, maybe this long, 00:36:23.590 --> 00:36:25.900 but what's key to these cryptographic functions 00:36:25.900 --> 00:36:29.680 is they output a hash value of fixed length 00:36:29.680 --> 00:36:33.610 that is always this many bytes or characters or this many bytes 00:36:33.610 --> 00:36:34.330 or characters. 00:36:34.330 --> 00:36:37.670 That is, it doesn't matter how short or how long the password is, 00:36:37.670 --> 00:36:42.740 these cryptographic, these one-way hash functions are one-way in the sense 00:36:42.740 --> 00:36:46.055 that they take a potentially infinite domain, if you 00:36:46.055 --> 00:36:51.500 know this term for mathematics, and condense it into a finite range. 00:36:51.500 --> 00:36:55.190 That is, a huge number of values, all possible passwords in the world, 00:36:55.190 --> 00:36:58.400 to just a finite list of possible hash values. 00:36:58.400 --> 00:37:01.130 It might be a long list of possible hash values, 00:37:01.130 --> 00:37:03.530 but indeed, no matter how long a string of text 00:37:03.530 --> 00:37:06.020 is, if it's of some fixed length-- 00:37:06.020 --> 00:37:08.870 16 characters, 32 characters, something else, 00:37:08.870 --> 00:37:11.880 there's only a finite number of those values. 00:37:11.880 --> 00:37:13.910 Now there's an implication of this. 
00:37:13.910 --> 00:37:19.100 When you take a really large input space or domain mathematically 00:37:19.100 --> 00:37:24.200 and map it to a smaller finite range, so to speak, 00:37:24.200 --> 00:37:28.700 mathematically, it turns out that if you do try to reverse the process, 00:37:28.700 --> 00:37:33.750 there will be multiple inputs that yield the same output. 00:37:33.750 --> 00:37:34.970 Think about it this way. 00:37:34.970 --> 00:37:37.930 If you've got 100 possible passwords in the world, 00:37:37.930 --> 00:37:41.350 but you only have 10 possible hash values-- 00:37:41.350 --> 00:37:44.790 so 100 passwords, 10 hash values, you have 00:37:44.790 --> 00:37:49.320 to figure out how to put all of those passwords into 10 buckets, so to speak. 00:37:49.320 --> 00:37:52.870 So surely, some of those passwords are going to be in the same bucket. 00:37:52.870 --> 00:37:54.870 Think about it in terms of the English alphabet. 00:37:54.870 --> 00:37:59.790 If we stuck with that original hash function where A was 1, B was 2, 00:37:59.790 --> 00:38:05.580 C was 3, presumably Z was 26, there's more than one fruit 00:38:05.580 --> 00:38:06.780 that starts with a-- 00:38:06.780 --> 00:38:09.040 apple, avocado, and so forth. 00:38:09.040 --> 00:38:13.020 So there, too, you are going to have multiple fruits mapping 00:38:13.020 --> 00:38:17.880 to the same finite range of values, hash values 1 through 26. 00:38:17.880 --> 00:38:21.780 What that means is that if an adversary, or even you, the owner of the system, 00:38:21.780 --> 00:38:24.270 look at that hash value and see the number 1, 00:38:24.270 --> 00:38:30.390 you don't know if the password was apple or avocado or some other word that 00:38:30.390 --> 00:38:34.260 started with A. And so that's what we mean by one-way hash functions. 00:38:34.260 --> 00:38:39.660 You cannot reliably reverse the process by any means and know definitively what 00:38:39.660 --> 00:38:41.340 the original input is. 
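Here is a quick sketch of the toy hash described above, where A is 1, B is 2, and so on up to Z at 26, based only on the first letter. The fruit list is arbitrary. Several inputs land in the same bucket, so seeing the value 1 can't tell you whether the input was apple or avocado. This is the pigeonhole effect behind "one-way." `first_letter_hash` is a hypothetical name, and this toy function is for illustration only, never for real passwords.

```python
import string
from collections import defaultdict

def first_letter_hash(word: str) -> int:
    # Toy hash: A=1, B=2, ..., Z=26, using only the first letter
    return string.ascii_lowercase.index(word[0].lower()) + 1

fruits = ["apple", "avocado", "banana", "blueberry", "cherry"]
buckets = defaultdict(list)
for fruit in fruits:
    buckets[first_letter_hash(fruit)].append(fruit)

print(dict(buckets))  # {1: ['apple', 'avocado'], 2: ['banana', 'blueberry'], 3: ['cherry']}
print(buckets[1])     # given only the hash 1, both of these are candidates
```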
00:38:41.340 --> 00:38:42.660 Now there is a catch. 00:38:42.660 --> 00:38:45.480 That technically means on some systems, it 00:38:45.480 --> 00:38:50.070 might be possible to log in with apple or avocado, or more 00:38:50.070 --> 00:38:54.960 generally, your actual password and some other seemingly random password that 00:38:54.960 --> 00:38:58.740 might make no sense to you, but just because mathematically it 00:38:58.740 --> 00:39:03.190 has the same hash value, that password, too, might let you into the system. 00:39:03.190 --> 00:39:07.500 But the idea is, especially as we're using really large numbers of bits, 00:39:07.500 --> 00:39:11.910 really long hash values, the probability of you or me figuring 00:39:11.910 --> 00:39:14.580 out or an adversary even guessing what that other hash 00:39:14.580 --> 00:39:17.430 value or what those other inputs-- 00:39:17.430 --> 00:39:22.328 passwords might be is just so small that we tend not to worry about it as well. 00:39:22.328 --> 00:39:24.120 The algorithms we've looked at on the board 00:39:24.120 --> 00:39:27.690 here are also known as cryptographic hash functions, which 00:39:27.690 --> 00:39:31.200 means they have utility in the world of cryptography 00:39:31.200 --> 00:39:34.920 where the world of cryptography is all about the practice and the study 00:39:34.920 --> 00:39:36.900 of securing data. 00:39:36.900 --> 00:39:41.400 Securing data while in transit from one point to another or while 00:39:41.400 --> 00:39:43.210 at rest on your own system. 00:39:43.210 --> 00:39:45.900 Let's go ahead here and take a five-minute break, 00:39:45.900 --> 00:39:50.130 and when we come back, we'll explore precisely that world of cryptography 00:39:50.130 --> 00:39:52.580 with respect to our data. 00:39:52.580 --> 00:39:53.570 All right. 00:39:53.570 --> 00:39:54.350 We're back. 
00:39:54.350 --> 00:39:57.920 And indeed, cryptography is all about the practice and study 00:39:57.920 --> 00:40:01.610 of securing our data, particularly when we want to transmit it 00:40:01.610 --> 00:40:03.300 from one person to another. 00:40:03.300 --> 00:40:07.370 So cryptography can be broken down into a couple of different categories, one 00:40:07.370 --> 00:40:08.570 of which is codes. 00:40:08.570 --> 00:40:12.860 And codes are not the type of code that you might write in Python or the like. 00:40:12.860 --> 00:40:15.260 They have nothing to do with software; rather, a code is 00:40:15.260 --> 00:40:17.720 a mapping between what we'll call code words 00:40:17.720 --> 00:40:21.980 and the actual message or true reading that those words represent. 00:40:21.980 --> 00:40:25.530 Here, for instance, is an actual book from over 100 years ago 00:40:25.530 --> 00:40:28.850 that was used to map these code words in the left column 00:40:28.850 --> 00:40:32.330 to these, indeed, messages or true readings on the right. 00:40:32.330 --> 00:40:35.180 The idea is that if one party wanted 00:40:35.180 --> 00:40:37.640 to send a secure message to another party, 00:40:37.640 --> 00:40:39.710 they wouldn't just write it out in plain English. 00:40:39.710 --> 00:40:40.210 Why? 00:40:40.210 --> 00:40:43.430 Because if that message, written on a piece of paper or parchment, 00:40:43.430 --> 00:40:46.460 were intercepted by another human, that other human, 00:40:46.460 --> 00:40:48.410 assuming they, too, know English, could just 00:40:48.410 --> 00:40:51.500 read the actual message, the so-called plaintext. 00:40:51.500 --> 00:40:55.890 In a code, though, you can convert the words 00:40:55.890 --> 00:41:00.870 that you want to say to code words that don't necessarily make any sense 00:41:00.870 --> 00:41:03.360 to someone who's intercepted the message, in and of itself, 00:41:03.360 --> 00:41:05.670 unless they, too, have this book.
00:41:05.670 --> 00:41:08.430 Now you can imagine this being a fairly time-consuming process 00:41:08.430 --> 00:41:12.180 because when the recipient receives that message, unless they've memorized 00:41:12.180 --> 00:41:15.120 all of these pages, these code words and the meanings thereof, 00:41:15.120 --> 00:41:19.530 they have to do quite a bit of work flipping through their copy of the book 00:41:19.530 --> 00:41:21.750 in order to figure out what that message is. 00:41:21.750 --> 00:41:24.150 But the fact that they have a copy of the book, too, 00:41:24.150 --> 00:41:28.410 is a potential threat because if one party or another had their code 00:41:28.410 --> 00:41:33.450 book stolen, then any of the messages they've sent can now be decoded, 00:41:33.450 --> 00:41:36.180 so to speak, by looking them up retrospectively. 00:41:36.180 --> 00:41:38.910 And so, too, could any future messages, if the owners of the book 00:41:38.910 --> 00:41:41.760 don't realize that code book has been taken, 00:41:41.760 --> 00:41:43.890 be translated. 00:41:43.890 --> 00:41:46.320 Not to mention the fact that it's fairly cumbersome. 00:41:46.320 --> 00:41:48.960 This alone is page 187. 00:41:48.960 --> 00:41:51.720 And so that's quite a lot of codes and quite a bit of work 00:41:51.720 --> 00:41:54.240 just to achieve this layer of indirection. 00:41:54.240 --> 00:41:57.090 But there are some terms of art here that are worth knowing, 00:41:57.090 --> 00:41:59.460 and that you might actually use in everyday contexts, 00:41:59.460 --> 00:42:01.690 but not necessarily for the same purpose. 00:42:01.690 --> 00:42:03.910 So encode, what do we mean by that? 00:42:03.910 --> 00:42:06.420 It means taking a plaintext message, 00:42:06.420 --> 00:42:10.230 be it in English or any human language, and taking that as input 00:42:10.230 --> 00:42:12.930 and producing as output codetext.
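A code book works like a lookup table, so a dictionary can model encoding and its inverse, decoding, as a small sketch. The entries below are invented for illustration and do not come from the historical book in the lecture. Decoding simply reads the same book in the other direction. This is why anyone who steals a copy can decode every message, past or future.

```python
# Hypothetical code book: invented entries, not the historical book from the lecture
CODE_BOOK = {
    "attack at dawn": "BLUEBIRD",
    "hold your position": "TEAPOT",
    "send supplies": "MEADOW",
}
# Decoding uses the same book, read in the other direction
REVERSE_BOOK = {code: phrase for phrase, code in CODE_BOOK.items()}

def encode(phrases: list[str]) -> list[str]:
    # Plaintext phrases in, codetext out
    return [CODE_BOOK[p] for p in phrases]

def decode(code_words: list[str]) -> list[str]:
    # Codetext in, original plaintext phrases out
    return [REVERSE_BOOK[w] for w in code_words]

codetext = encode(["hold your position", "send supplies"])
print(codetext)          # ['TEAPOT', 'MEADOW']
print(decode(codetext))  # ['hold your position', 'send supplies']
```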
00:42:12.930 --> 00:42:15.660 So the codetext might be a short succinct 00:42:15.660 --> 00:42:19.020 sequence of words that might actually be English words, 00:42:19.020 --> 00:42:22.083 but they're not meant to mean what they normally mean. 00:42:22.083 --> 00:42:24.000 They're meant to be looked up in the code book 00:42:24.000 --> 00:42:27.160 to figure out what the message is actually trying to say. 00:42:27.160 --> 00:42:29.910 Meanwhile, decode, as you might expect, is the opposite. 00:42:29.910 --> 00:42:33.810 You take as input the codetext that you have received as the recipient, 00:42:33.810 --> 00:42:38.010 you use that same code book to look up the code words 00:42:38.010 --> 00:42:41.100 and figure out what the actual message is in order 00:42:41.100 --> 00:42:45.420 to get the original plaintext, be it in English or any other human language 00:42:45.420 --> 00:42:47.320 that the code book is designed for. 00:42:47.320 --> 00:42:51.780 But there's an alternative to codes, if only because those code books can 00:42:51.780 --> 00:42:56.015 get very cumbersome indeed, they can be taken and compromised and the like. 00:42:56.015 --> 00:42:58.140 So it's not necessarily the best system in that you 00:42:58.140 --> 00:43:01.230 need to physically keep something like that secure, let alone 00:43:01.230 --> 00:43:03.210 do so efficiently when converting. 00:43:03.210 --> 00:43:05.850 So there are also what we'll call ciphers. 00:43:05.850 --> 00:43:09.220 And ciphers are more algorithmic in nature. 00:43:09.220 --> 00:43:11.910 So if you have taken a computer science or a programming course, 00:43:11.910 --> 00:43:16.320 you already have the predisposition to thinking algorithmically and taking 00:43:16.320 --> 00:43:20.040 a big problem and breaking it down into smaller pieces 00:43:20.040 --> 00:43:23.520 and then applying some kind of logic, sometimes again and again, 00:43:23.520 --> 00:43:25.300 in order to solve some problem. 
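To make the encode and decode steps concrete, here is a minimal Python sketch of a code book as a lookup table. The code words and their meanings are hypothetical, invented only for illustration. A real code book like the one shown in lecture would have thousands of entries. Both directions depend entirely on having the same book.

```python
# A toy code book: code words map to their "true readings".
# These entries are hypothetical, invented for illustration only.
CODE_BOOK = {
    "ALBATROSS": "ATTACK AT DAWN",
    "BUTTERCUP": "RETREAT",
    "CANDLE": "SEND MORE SUPPLIES",
}

# Reverse mapping used when encoding: true reading -> code word.
REVERSE_BOOK = {meaning: word for word, meaning in CODE_BOOK.items()}


def encode(phrases: list) -> list:
    """Replace each plaintext phrase with its code word (the codetext)."""
    return [REVERSE_BOOK[phrase] for phrase in phrases]


def decode(code_words: list) -> list:
    """Look up each code word in the same book to recover the plaintext."""
    return [CODE_BOOK[word] for word in code_words]


message = ["SEND MORE SUPPLIES", "ATTACK AT DAWN"]
codetext = encode(message)
print(codetext)          # ['CANDLE', 'ALBATROSS']
print(decode(codetext))  # ['SEND MORE SUPPLIES', 'ATTACK AT DAWN']
```

Anyone who steals `CODE_BOOK` can run `decode` too. This is exactly the weakness described above.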
00:43:25.300 --> 00:43:28.170 So ciphers focus on exactly that. 00:43:28.170 --> 00:43:30.780 They don't necessarily focus on words or phrases. 00:43:30.780 --> 00:43:34.620 They might focus on individual letters instead, or even bits 00:43:34.620 --> 00:43:37.115 in the context of today's computers. 00:43:37.115 --> 00:43:39.240 And you might actually have 00:43:39.240 --> 00:43:41.140 seen ciphers in popular culture. 00:43:41.140 --> 00:43:46.380 So here, for instance, is just one frame from a famous film known as A Christmas 00:43:46.380 --> 00:43:47.850 Story, at least here in the US. 00:43:47.850 --> 00:43:51.780 It plays all day long on a couple of TV channels 00:43:51.780 --> 00:43:55.230 around Christmas time, but this here is Ralphie, 00:43:55.230 --> 00:43:59.280 one of the main characters in the movie, and in his hands 00:43:59.280 --> 00:44:04.710 is this secret decoder pin that he tried so hard to get through the mail, 00:44:04.710 --> 00:44:09.090 and the secret decoder pin was from Little Orphan Annie herself. 00:44:09.090 --> 00:44:13.560 And what it does is mechanically implement a cipher, 00:44:13.560 --> 00:44:17.620 converting one letter to a number and back. 00:44:17.620 --> 00:44:20.610 But the thing twists left and right so that you can actually 00:44:20.610 --> 00:44:22.720 figure out what the mapping might be. 00:44:22.720 --> 00:44:26.100 So this is more of a cipher because it's operating at a lower level-- 00:44:26.100 --> 00:44:30.330 not on entire words or phrases, but one letter at a time. 00:44:30.330 --> 00:44:32.940 And it's a repeatable process that Ralphie, in this case, 00:44:32.940 --> 00:44:37.110 can apply again and again to all of the letters of the secret message.
00:44:37.110 --> 00:44:39.398 In World War II, the Germans, for instance, 00:44:39.398 --> 00:44:41.940 had the Enigma machine that you might have read about or seen 00:44:41.940 --> 00:44:45.180 depicted in films, and this was a mechanical implementation 00:44:45.180 --> 00:44:47.160 of this same idea of a cipher. 00:44:47.160 --> 00:44:51.538 But instead of using mathematics or gears turning just this way and that, 00:44:51.538 --> 00:44:52.705 it was much more mechanical. 00:44:52.705 --> 00:44:55.270 It used rotors and lights and the like, 00:44:55.270 --> 00:44:57.730 but it, too, was implementing a cipher and could 00:44:57.730 --> 00:45:00.430 be configured with different inputs in order 00:45:00.430 --> 00:45:03.670 to influence exactly what the output would be. 00:45:03.670 --> 00:45:07.330 But that, too, is a physical device, and we'll focus here for the most part, 00:45:07.330 --> 00:45:10.600 though, on things more digital, things that you can ultimately, for instance, 00:45:10.600 --> 00:45:15.100 nowadays implement much more readily and much more scalably in software. 00:45:15.100 --> 00:45:17.470 But the words we'll use are pretty much the same. 00:45:17.470 --> 00:45:21.760 To encipher a message means to take that message in English or any other 00:45:21.760 --> 00:45:25.780 language, or so-called plaintext, and convert it, not surprisingly, 00:45:25.780 --> 00:45:29.530 to ciphertext as output. Meanwhile, the reverse-- 00:45:29.530 --> 00:45:33.280 or rather, an equivalent term here that you might know as well is to encrypt. 00:45:33.280 --> 00:45:38.050 Same idea, synonyms for our purposes, plaintext to ciphertext. 00:45:38.050 --> 00:45:39.580 To encipher or to encrypt.
00:45:39.580 --> 00:45:43.240 Nowadays, encrypt is probably the more common of those terms. 00:45:43.240 --> 00:45:46.300 Meanwhile, decipher would be the opposite of that, 00:45:46.300 --> 00:45:49.960 to actually take the ciphertext that someone else has sent to you, 00:45:49.960 --> 00:45:54.110 run it through an algorithm or cipher, and get back the plaintext. 00:45:54.110 --> 00:45:57.200 Meanwhile, decrypt would be a synonym for that term, which 00:45:57.200 --> 00:46:00.770 refers to exactly the same process of taking ciphertext as input 00:46:00.770 --> 00:46:03.300 and producing plaintext as output. 00:46:03.300 --> 00:46:06.320 So how do we configure these ciphers so that you and I 00:46:06.320 --> 00:46:10.940 can use the same algorithm but customize them, not only with our own messages, 00:46:10.940 --> 00:46:14.930 but also with our own settings so that just because you and I might 00:46:14.930 --> 00:46:18.260 want to send the same plaintext doesn't mean that the ciphertext has 00:46:18.260 --> 00:46:20.000 to actually be identical? 00:46:20.000 --> 00:46:22.460 And indeed, in the world of cryptography, 00:46:22.460 --> 00:46:27.350 it's strongly recommended that you and I use public and well-documented, 00:46:27.350 --> 00:46:30.020 well-tried-and-tested algorithms, 00:46:30.020 --> 00:46:35.300 but that we keep one piece of information secret so that our use of that cipher, 00:46:35.300 --> 00:46:37.730 that algorithm, is specific to us. 00:46:37.730 --> 00:46:40.670 And this customization, this configuration, 00:46:40.670 --> 00:46:42.590 is what we generally call a key. 00:46:42.590 --> 00:46:47.270 Now, much like a physical key to a lock on an actual door to your home, 00:46:47.270 --> 00:46:51.320 a key is what unlocks the capabilities of this cipher, 00:46:51.320 --> 00:46:55.430 but it's a key that needs to be known and used not only by you, typically, 00:46:55.430 --> 00:46:57.420 but also by the recipient.
00:46:57.420 --> 00:46:59.930 That way, by having copies of the same key, 00:46:59.930 --> 00:47:03.470 you can not only encrypt or encipher messages, 00:47:03.470 --> 00:47:07.190 but you can also decrypt or decipher them, too. 00:47:07.190 --> 00:47:08.690 Now what are these keys in practice? 00:47:08.690 --> 00:47:10.890 In the virtual world, they're not physical objects, 00:47:10.890 --> 00:47:13.160 but really just big numbers. 00:47:13.160 --> 00:47:16.130 And often, there's some mathematical significance to these numbers, 00:47:16.130 --> 00:47:19.070 and sometimes those numbers don't even look like numbers. 00:47:19.070 --> 00:47:21.920 They might be presented on your phone or your laptop 00:47:21.920 --> 00:47:24.620 or desktop as letters of an alphabet 00:47:24.620 --> 00:47:26.600 and maybe even with some punctuation, too. 00:47:26.600 --> 00:47:29.753 But at the end of the day, they're really just numbers, or, of course, 00:47:29.753 --> 00:47:31.670 if you know a bit of computer science already, 00:47:31.670 --> 00:47:33.410 they're really just 0's and 1's. 00:47:33.410 --> 00:47:36.170 But it's perhaps helpful to think about them metaphorically as 00:47:36.170 --> 00:47:38.030 akin to these physical keys. 00:47:38.030 --> 00:47:40.400 Now how are these keys actually used? 00:47:40.400 --> 00:47:43.370 Well, within the world of cryptography, there 00:47:43.370 --> 00:47:45.470 are different types of encryption. 00:47:45.470 --> 00:47:49.520 And the first we'll look at is known as secret key cryptography. 00:47:49.520 --> 00:47:53.150 The presumption is that the security of your data 00:47:53.150 --> 00:47:56.540 relies on the secrecy of some key. 00:47:56.540 --> 00:48:00.860 So if A wants to send a message to B, then A and B 00:48:00.860 --> 00:48:05.210 must keep secret whatever key they are using to configure 00:48:05.210 --> 00:48:06.660 their choice of algorithms.
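A key is "really just a number," and it can help to see one key displayed in several forms. This sketch assumes a 128-bit key, a common size chosen here only for illustration. It uses Python's standard `secrets` and `base64` modules, and the printed values differ on every run.

```python
import base64
import secrets

# Generate a random 128-bit (16-byte) key; the size is an illustrative assumption.
key = secrets.token_bytes(16)

# The same key viewed three ways.
as_int = int.from_bytes(key, "big")               # one big integer
as_hex = key.hex()                                # hexadecimal digits
as_b64 = base64.b64encode(key).decode("ascii")    # letters, digits, and punctuation

print(as_int)  # some integer between 0 and 2**128 - 1
print(as_hex)  # 32 hexadecimal characters
print(as_b64)  # 24 characters, which may include "+", "/", and "=" padding
```

All three lines print the same underlying 0's and 1's. Only the presentation changes.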
00:48:06.660 --> 00:48:08.310 So what do we mean by that? 00:48:08.310 --> 00:48:10.880 Well, secret key cryptography, specifically 00:48:10.880 --> 00:48:13.280 in the context of encryption and scrambling data, 00:48:13.280 --> 00:48:16.460 is also known as symmetric key encryption because 00:48:16.460 --> 00:48:21.500 both A and B in this story are going to use the exact same key. 00:48:21.500 --> 00:48:24.860 And we'll contrast this in just a bit with asymmetric key encryption, 00:48:24.860 --> 00:48:26.910 which solves other problems as well. 00:48:26.910 --> 00:48:29.390 So let's consider the process of encryption, 00:48:29.390 --> 00:48:32.720 much like the process of hashing, as being this black box. 00:48:32.720 --> 00:48:37.610 Somehow or other, this black box is going to encrypt information for me. 00:48:37.610 --> 00:48:40.940 It takes as input my plaintext and hopefully produces as output 00:48:40.940 --> 00:48:45.110 my ciphertext, which I can actually send over the internet or some other channel 00:48:45.110 --> 00:48:47.040 to a recipient. 00:48:47.040 --> 00:48:50.480 So in the context, then, of secret key encryption, 00:48:50.480 --> 00:48:52.490 the picture looks a little something like this. 00:48:52.490 --> 00:48:55.430 Not only do you pass as input to the algorithm 00:48:55.430 --> 00:48:59.130 your plaintext message in English or any other human language, 00:48:59.130 --> 00:49:00.500 you also pass a key. 00:49:00.500 --> 00:49:04.700 And for now, just think of that key as a number that you and the other person 00:49:04.700 --> 00:49:07.010 have somehow agreed upon in advance. 00:49:07.010 --> 00:49:10.280 That algorithm, then, will ultimately output the ciphertext.
00:49:10.280 --> 00:49:13.850 And to be clear, the motivation for that key 00:49:13.850 --> 00:49:18.380 is to ensure that if I and you and you and you and you are all 00:49:18.380 --> 00:49:21.170 using the exact same encryption algorithm, 00:49:21.170 --> 00:49:23.750 it's not going to be obvious if and when we're 00:49:23.750 --> 00:49:26.450 sending the exact same messages because that, 00:49:26.450 --> 00:49:30.710 too, per our discussion of passwords, would leak information. 00:49:30.710 --> 00:49:33.410 Maybe you don't care about the information being leaked, 00:49:33.410 --> 00:49:37.760 but it's probably not a good thing if, just because someone else is getting 00:49:37.760 --> 00:49:41.660 the same message, an adversary is more likely to be able to 00:49:41.660 --> 00:49:46.190 infer what it is you sent because the ciphertext just so happens 00:49:46.190 --> 00:49:47.250 to look the same. 00:49:47.250 --> 00:49:51.290 We want our ciphertext to be unique to each of our transmissions. 00:49:51.290 --> 00:49:54.890 So, let's consider a very simple example. 00:49:54.890 --> 00:49:59.155 Suppose that the message I want to send is as short as the capital letter 00:49:59.155 --> 00:50:04.300 A, and suppose that the key that I want to use is as simple as the number 1. 00:50:04.300 --> 00:50:06.970 These are not best practices, but we'll 00:50:06.970 --> 00:50:08.530 use them for the sake of discussion. 00:50:08.530 --> 00:50:11.890 Let me propose that the simplest algorithm I can perhaps think of 00:50:11.890 --> 00:50:17.110 is actually one that would take A as input and 1 as input and output B. 00:50:17.110 --> 00:50:19.280 And you can perhaps infer where this is going. 00:50:19.280 --> 00:50:24.880 If I instead provide B as input and 1 as input for the plaintext and key 00:50:24.880 --> 00:50:27.580 respectively, then the output is C.
00:50:27.580 --> 00:50:30.520 So believe it or not, in yesteryear, Julius Caesar 00:50:30.520 --> 00:50:36.580 was known to use an algorithm like this. This algorithm, the Caesar cipher, 00:50:36.580 --> 00:50:39.190 is what's generally known as a rotational cipher, 00:50:39.190 --> 00:50:42.640 because you're rotating the letters of the English alphabet. 00:50:42.640 --> 00:50:46.880 A becomes B, B becomes C. And I bet if we continue this logic, 00:50:46.880 --> 00:50:50.240 we can wrap around so that Z becomes A as well. 00:50:50.240 --> 00:50:52.850 Now this, of course, is being applied at the moment 00:50:52.850 --> 00:50:55.460 to very short messages that are not that useful. 00:50:55.460 --> 00:50:58.850 Sending A or B or C is not particularly useful in general, 00:50:58.850 --> 00:51:02.630 but it's demonstrating how we can encipher or encrypt 00:51:02.630 --> 00:51:05.930 our plaintext into our ciphertext. 00:51:05.930 --> 00:51:09.170 However, when someone receives this message, 00:51:09.170 --> 00:51:13.610 they need to know not only what algorithm I used to encrypt it-- 00:51:13.610 --> 00:51:17.720 in this case, the Caesar cipher or a rotational cipher more generally-- 00:51:17.720 --> 00:51:20.120 but also what the key is. 00:51:20.120 --> 00:51:22.100 And the key might not be as simple as 1. 00:51:22.100 --> 00:51:24.740 Here, for instance, is an example of 13. 00:51:24.740 --> 00:51:27.620 If your key is 13 and your plaintext is A, 00:51:27.620 --> 00:51:33.507 then your ciphertext should be N, because that is 13 places away from A, 00:51:33.507 --> 00:51:38.000 and so now the algorithm seems a little less obvious. 00:51:38.000 --> 00:51:41.630 A key of 13 is also the basis of something that's long been known on the internet 00:51:41.630 --> 00:51:46.400 as ROT13 for R-O-T-1-3-- rotate 13 places.
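Here is a minimal Python sketch of the rotational (Caesar) cipher just described. It assumes the plaintext is made of capital letters A through Z and lets anything else, such as spaces or punctuation, pass through unchanged. The `% 26` is what makes Z wrap around to A. The last two lines also illustrate the earlier point about keys: the same plaintext under different keys yields different ciphertext.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def caesar_encrypt(plaintext: str, key: int) -> str:
    """Rotate each capital letter forward by `key` places, wrapping Z around to A."""
    result = []
    for ch in plaintext:
        if ch in ALPHABET:
            index = (ALPHABET.index(ch) + key) % 26  # % 26 handles the wraparound
            result.append(ALPHABET[index])
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)


print(caesar_encrypt("A", 1))   # B
print(caesar_encrypt("B", 1))   # C
print(caesar_encrypt("Z", 1))   # A
print(caesar_encrypt("A", 13))  # N

# Same plaintext, different keys, different ciphertexts.
print(caesar_encrypt("HELLO", 3))  # KHOOR
print(caesar_encrypt("HELLO", 7))  # OLSSV
```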
00:51:46.400 --> 00:51:49.940 It's a very popular way of scrambling information, 00:51:49.940 --> 00:51:52.610 but not in a way that's intended to be secure. 00:51:52.610 --> 00:51:55.790 Historically, it was often used for things like movie spoilers online. 00:51:55.790 --> 00:51:59.570 If you wanted to hide a spoiler before there was CSS and blurring 00:51:59.570 --> 00:52:01.670 effects on websites and whatnot, you could just 00:52:01.670 --> 00:52:04.610 scramble it so it looks completely encrypted, 00:52:04.610 --> 00:52:07.310 but it's very easy for someone else, even with a click of a button, 00:52:07.310 --> 00:52:08.930 to just decrypt it. 00:52:08.930 --> 00:52:17.810 However, I would recommend that you not use a key of 26. Why? 00:52:17.810 --> 00:52:21.440 Well, at least in English, there are only 26 letters of the alphabet, capital A 00:52:21.440 --> 00:52:22.820 through capital Z in this case. 00:52:22.820 --> 00:52:26.690 So a key of 26 is going to output for your ciphertext 00:52:26.690 --> 00:52:29.600 the exact same thing as your plaintext. 00:52:29.600 --> 00:52:35.060 So there's another joke on the internet whereby ROT26 is twice 00:52:35.060 --> 00:52:39.780 as secure as ROT13 because 13 times 2 is 26, 00:52:39.780 --> 00:52:43.010 and obviously, that's not the case. 00:52:43.010 --> 00:52:47.600 Now of course, this particular algorithm and keys of this small size, 00:52:47.600 --> 00:52:49.790 1 through 26, are not at all secure. 00:52:49.790 --> 00:52:50.300 Why? 00:52:50.300 --> 00:52:53.390 Well, honestly, I don't even need a computer to crack this cipher.
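Here is a short sketch of the ROT13 and ROT26 points, again assuming capital letters only. Because 13 + 13 = 26, applying ROT13 twice returns the original text. That is why the same click could both hide and reveal a spoiler. A key of 26 changes nothing at all. The function name `rotate` and the sample spoiler are hypothetical.

```python
import string

UPPER = string.ascii_uppercase


def rotate(text: str, key: int) -> str:
    """Rotational cipher over A-Z, built from a translation table."""
    k = key % 26
    shifted = UPPER[k:] + UPPER[:k]  # e.g., key 1 maps A->B, ..., Z->A
    return text.translate(str.maketrans(UPPER, shifted))


spoiler = "THE BUTLER DID IT"  # hypothetical spoiler text
hidden = rotate(spoiler, 13)

print(hidden)                          # GUR OHGYRE QVQ VG
print(rotate(hidden, 13))              # THE BUTLER DID IT
print(rotate(spoiler, 26) == spoiler)  # True
```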
00:52:53.390 --> 00:52:55.880 I can probably take out a piece of paper and pencil 00:52:55.880 --> 00:53:00.260 and just try all possible numbers from 1 to 25-- 00:53:00.260 --> 00:53:02.450 I don't even need to waste my time with 26-- 00:53:02.450 --> 00:53:06.650 and just figure out via brute force what key someone might have used 00:53:06.650 --> 00:53:08.900 to send a message using this algorithm. 00:53:08.900 --> 00:53:13.430 Even if the message is longer than a single letter and the cipher operates on every individual letter 00:53:13.430 --> 00:53:14.240 of their message, 00:53:14.240 --> 00:53:18.320 it wouldn't take me that long to figure this out by brute force by hand. 00:53:18.320 --> 00:53:21.650 And with code, my gosh, I could probably write some Python code 00:53:21.650 --> 00:53:23.850 that does it even faster than that. 00:53:23.850 --> 00:53:27.740 So here on the screen is some ciphertext that I created in advance. 00:53:27.740 --> 00:53:30.710 And I'll stipulate that this ciphertext was enciphered 00:53:30.710 --> 00:53:33.710 using that same rotational cipher, but I'm not 00:53:33.710 --> 00:53:36.890 going to tell you just yet what key I actually used. 00:53:36.890 --> 00:53:40.530 It was originally an English message in all capital letters. 00:53:40.530 --> 00:53:43.850 So the task at hand now is to decrypt this, I dare say. 00:53:43.850 --> 00:53:47.810 Whether you are the intended recipient of the message or, maybe maliciously, 00:53:47.810 --> 00:53:51.560 you've intercepted my transmission with this message, 00:53:51.560 --> 00:53:54.260 now you're trying to brute force your way through by trying, 00:53:54.260 --> 00:53:58.430 as I can tell from some heads going down and some scribbling, 1 or 2 or 3. 00:53:58.430 --> 00:54:01.805 I bet we could also brute force our way through this algorithm, but how? 00:54:01.805 --> 00:54:03.980 How does the decrypting process work? 00:54:03.980 --> 00:54:06.440 It's really just the same thing in reverse.
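The brute-force attack described here can be sketched in a few lines of Python. The code decrypts with every key from 1 to 25, and a reader then looks for the candidate that reads as English. The ciphertext is a made-up example, not the one shown in lecture. The helper names are hypothetical, and picking out the right candidate is left to a human.

```python
import string

UPPER = string.ascii_uppercase


def shift_back(text: str, key: int) -> str:
    """Undo a rotational cipher by rotating each capital letter back `key` places."""
    return "".join(
        UPPER[(UPPER.index(ch) - key) % 26] if ch in UPPER else ch for ch in text
    )


def brute_force(ciphertext: str) -> dict:
    """Try every meaningful key, 1 through 25, and map each key to its candidate."""
    return {key: shift_back(ciphertext, key) for key in range(1, 26)}


ciphertext = "KHOOR ZRUOG"  # hypothetical example, not the one from lecture
for key, candidate in brute_force(ciphertext).items():
    print(key, candidate)
# Scanning the 25 lines, key 3 is the one that yields "HELLO WORLD".
```

With only 25 candidates, a person can spot the answer at a glance. A more automated version might score each candidate against an English word list.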
00:54:06.440 --> 00:54:10.340 If this now is our picture and you have ciphertext as your input, 00:54:10.340 --> 00:54:13.340 you should be able to pass the same key as input-- 00:54:13.340 --> 00:54:17.060 1, for instance, or 13, or, for no good reason, 00:54:17.060 --> 00:54:20.010 26, and get the plaintext back out. 00:54:20.010 --> 00:54:23.060 But of course, the decryption algorithm is indeed the opposite, 00:54:23.060 --> 00:54:25.640 because you don't want to add one position 00:54:25.640 --> 00:54:32.150 or two positions or three positions; you want to subtract 1 or 2 or 3 or 13. 00:54:32.150 --> 00:54:34.470 You want to go in reverse, so to speak. 00:54:34.470 --> 00:54:40.160 And so, if I were to pass in B as the ciphertext and 1 as the key, 00:54:40.160 --> 00:54:44.360 well, the decrypted plaintext should, of course, be A. 00:54:44.360 --> 00:54:47.060 And that now holds for all of the other letters of the alphabet, 00:54:47.060 --> 00:54:50.450 assuming I'm reversing this process in order to decrypt. 00:54:50.450 --> 00:54:54.030 And now, I'll let you glance at the screen here for just a moment 00:54:54.030 --> 00:54:58.140 and see if you yourselves can figure out 00:54:58.140 --> 00:55:02.410 what this ciphertext is trying to say. 00:55:02.410 --> 00:55:05.410 And if you like the idea of figuring this out, 00:55:05.410 --> 00:55:08.800 if you want to get better at this particular skill, 00:55:08.800 --> 00:55:13.060 you are an aspiring cryptanalyst, I dare say, focusing 00:55:13.060 --> 00:55:14.590 on this world of cryptanalysis. 00:55:14.590 --> 00:55:18.670 And this itself is a job, I dare say, particularly with governments, 00:55:18.670 --> 00:55:23.180 trying to decrypt messages that others have encrypted. 00:55:23.180 --> 00:55:26.590 Now hopefully the world is using more secure algorithms 00:55:26.590 --> 00:55:28.540 than these simple rotational ciphers. 00:55:28.540 --> 00:55:30.010 And what do I mean by secure?
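Decryption is the same rotation run in reverse with the same key. Here is a minimal sketch under the same capital-letters-only assumption. Subtracting the key modulo 26 lets A wrap back around to Z, since Python's `%` always returns a non-negative result for a positive modulus.

```python
import string

UPPER = string.ascii_uppercase


def caesar_encrypt(plaintext: str, key: int) -> str:
    """Rotate each capital letter forward by `key` places."""
    return "".join(
        UPPER[(UPPER.index(ch) + key) % 26] if ch in UPPER else ch for ch in plaintext
    )


def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Same key, opposite direction: rotate each capital letter back `key` places."""
    return "".join(
        UPPER[(UPPER.index(ch) - key) % 26] if ch in UPPER else ch for ch in ciphertext
    )


print(caesar_decrypt("B", 1))   # A
print(caesar_decrypt("A", 1))   # Z (wraps around backward)
print(caesar_decrypt("N", 13))  # A

# Round trip: decrypting with the same key recovers the plaintext.
print(caesar_decrypt(caesar_encrypt("ATTACK AT DAWN", 7), 7))  # ATTACK AT DAWN
```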
00:55:30.010 --> 00:55:35.260 Hopefully they're using keys that are much bigger than small numbers like 1 00:55:35.260 --> 00:55:36.400 through 25. 00:55:36.400 --> 00:55:41.200 Hopefully they're using much, much, much larger numbers, many more bits, if only 00:55:41.200 --> 00:55:47.200 so that it takes you and me, when we try to apply cryptanalysis to ciphertext, 00:55:47.200 --> 00:55:52.600 it takes us way, way longer than this particular algorithm alone. 00:55:52.600 --> 00:55:55.060 Now I don't want to keep you in suspense, 00:55:55.060 --> 00:55:58.490 but I also don't want to spoil this if you'd like to try your hand at this. 00:55:58.490 --> 00:56:03.380 So go ahead and close your eyes if you don't want to see the answer to this, 00:56:03.380 --> 00:56:05.960 or I suppose you can just look away from your screen. 00:56:05.960 --> 00:56:10.538 But in five seconds, I'll reveal what the plaintext actually is-- 00:56:10.538 --> 00:56:12.830 and some of you, if you've seen that movie I mentioned, 00:56:12.830 --> 00:56:15.360 will know immediately why this is the way it is, 00:56:15.360 --> 00:56:19.310 but otherwise, you might just see this as an advertisement of sorts. 00:56:19.310 --> 00:56:20.150 So here we go. 00:56:20.150 --> 00:56:26.110 Your chance to close your eyes in 5, 4, 3, 2, 1. 00:56:34.090 --> 00:56:37.090 From some faces, some of you have seen this movie around the holidays, 00:56:37.090 --> 00:56:40.450 but now, I've taken it off the screen and we'll move on now 00:56:40.450 --> 00:56:41.920 with some actual algorithms. 00:56:41.920 --> 00:56:45.430 If you'd like to come back on replay and actually see what the answer is, 00:56:45.430 --> 00:56:47.450 we'll, of course, leave it on-demand. 00:56:47.450 --> 00:56:49.840 So what are some of the actual algorithms 00:56:49.840 --> 00:56:53.680 used nowadays for encryption that are best practices? 
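To put "much, much larger numbers" in perspective, this back-of-the-envelope calculation estimates how long it would take to try every key of a given size. The rate of one billion guesses per second is an assumption chosen for illustration, not a measured figure. Whatever the real rate, each added bit doubles the number of possible keys.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25
GUESSES_PER_SECOND = 1_000_000_000  # assumption: one billion guesses per second


def years_to_try_all(key_bits: int) -> float:
    """Worst-case years to try every possible key of `key_bits` bits at the assumed rate."""
    return 2**key_bits / GUESSES_PER_SECOND / SECONDS_PER_YEAR


# The Caesar cipher has only 25 useful keys.
print(f"Caesar cipher: 25 keys, about {25 / GUESSES_PER_SECOND:.1e} seconds")

for bits in (32, 56, 128):
    print(f"{bits}-bit key: {2**bits:.3e} keys, about {years_to_try_all(bits):.3e} years")
```

At this assumed rate, a 56-bit key takes a couple of years to exhaust. A 128-bit key takes on the order of 10^22 years.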
00:56:53.680 --> 00:56:57.370 This rotational cipher that I described earlier, Caesar's simple one, 00:56:57.370 --> 00:56:58.810 is not to be recommended. 00:56:58.810 --> 00:57:01.480 It's wonderful for demonstration's sake and discussion's sake, 00:57:01.480 --> 00:57:05.240 but it's not something you should be using in practice unless, for instance, 00:57:05.240 --> 00:57:08.440 you're in, say, middle school trying to send a message on a piece of paper 00:57:08.440 --> 00:57:11.140 among your classmates and are worried 00:57:11.140 --> 00:57:14.050 that the teacher might intercept it, and the teacher probably 00:57:14.050 --> 00:57:18.223 doesn't have the instinct or the care to actually 00:57:18.223 --> 00:57:20.890 brute force their way through it and figure out what the key is. 00:57:20.890 --> 00:57:23.680 But that's the level of security you're getting with something 00:57:23.680 --> 00:57:25.030 like that rotational cipher. 00:57:25.030 --> 00:57:29.290 But in the real world, with our phones and desktops and laptops today, 00:57:29.290 --> 00:57:33.130 the algorithms generally used are AES or Triple DES, both of which 00:57:33.130 --> 00:57:36.020 are popular algorithms that have been vetted by the world 00:57:36.020 --> 00:57:41.150 and are very commonly used as secret key encryption ciphers 00:57:41.150 --> 00:57:44.090 or symmetric key encryption ciphers, which, to be clear, 00:57:44.090 --> 00:57:47.030 require that both the sender and the receiver 00:57:47.030 --> 00:57:50.120 know and use the exact same key.
00:57:50.120 --> 00:57:52.250 And for our purposes today, let me just stipulate 00:57:52.250 --> 00:57:55.070 that the mathematics of these two and other algorithms 00:57:55.070 --> 00:57:58.560 is much more sophisticated and documented in textbooks, 00:57:58.560 --> 00:58:02.660 and, therefore, it's much harder for the adversary 00:58:02.660 --> 00:58:08.450 to figure out, as we did by trying 25 different keys, what the actual key in use 00:58:08.450 --> 00:58:10.270 might be. 00:58:10.270 --> 00:58:17.260 Questions now about secret key cryptography or any of the primitives 00:58:17.260 --> 00:58:19.390 we've just discussed? 00:58:19.390 --> 00:58:22.320 STUDENT: So is it possible that if someone hacks the-- like 00:58:22.320 --> 00:58:26.280 gets to know about the hash value-- the hash function of a company that it 00:58:26.280 --> 00:58:29.520 is using, he might be able to use the hash values 00:58:29.520 --> 00:58:34.577 and use-- like find a reverse function and then get the passwords for that? 00:58:34.577 --> 00:58:35.910 DAVID J. MALAN: A good question. 00:58:35.910 --> 00:58:39.830 I wouldn't worry mathematically about someone reversing the hash functions, 00:58:39.830 --> 00:58:43.220 if only because with all of the ones that are in popular use 00:58:43.220 --> 00:58:47.540 today in modern systems, there are a lot of smart mathematicians, computer 00:58:47.540 --> 00:58:52.010 scientists, professionals who have vetted, if not proven mathematically, 00:58:52.010 --> 00:58:54.750 that these things work as expected. 00:58:54.750 --> 00:59:00.380 However, if the passwords that have been hashed are relatively easy to guess, 00:59:00.380 --> 00:59:03.500 or if the adversary just gets lucky with whatever technique 00:59:03.500 --> 00:59:07.430 they are using, it is absolutely possible to find at least a password, 00:59:07.430 --> 00:59:10.880 an input that maps to that hash value, but often 00:59:10.880 --> 00:59:12.960 not without significant effort.
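The answer above can be illustrated with a dictionary attack. The adversary never reverses the hash function. They simply hash likely guesses and compare the results with the stolen hash. This sketch uses unsalted SHA-256 from Python's `hashlib` only for simplicity; real systems should store passwords with salted, deliberately slow schemes such as bcrypt, scrypt, or Argon2. The guess list and helper names are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_hex(password: str) -> str:
    """Hash a password with SHA-256 and return the hex digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Pretend an adversary has stolen this hash; the password behind it is weak.
stolen_hash = sha256_hex("00000000")

# A tiny, hypothetical guess list; real attackers use lists with millions of entries.
guesses = ["password", "123456", "qwerty", "00000000", "letmein"]


def dictionary_attack(target: str, candidates: list) -> Optional[str]:
    """Hash each candidate and return the first one whose hash matches the target."""
    for guess in candidates:
        if sha256_hex(guess) == target:
            return guess
    return None  # no guess matched


print(dictionary_attack(stolen_hash, guesses))  # 00000000
```

This attack succeeds because the password is easy to guess. A long, random password would never appear in the list, so the attack would return `None`.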
00:59:12.960 --> 00:59:15.920 And so generally, a company does not want to, 00:59:15.920 --> 00:59:19.820 should not try to keep proprietary or secret what 00:59:19.820 --> 00:59:22.970 hash function they're using, what encryption algorithm they're using. 00:59:22.970 --> 00:59:25.580 If anything, I dare say, it should be reassuring 00:59:25.580 --> 00:59:30.140 to the public if and when companies are using best practices and de facto 00:59:30.140 --> 00:59:32.750 standards, all of these algorithms are designed 00:59:32.750 --> 00:59:36.300 to keep secret not the algorithm itself, which literally can be found 00:59:36.300 --> 00:59:40.410 in like university textbooks nowadays and on Wikipedia and beyond, 00:59:40.410 --> 00:59:44.130 but rather, to keep secret the thing that's designed to be secret, 00:59:44.130 --> 00:59:45.360 which is the key. 00:59:45.360 --> 00:59:49.260 And now, if you're using too small of a key like I did originally, 00:59:49.260 --> 00:59:52.000 well, then you're just using the algorithm poorly, perhaps. 00:59:52.000 --> 00:59:54.000 But so long as you're adhering to best practices 00:59:54.000 --> 00:59:57.360 and picking a really big, recommended-sized key, 00:59:57.360 --> 01:00:01.470 then things mathematically should be trustworthy. 01:00:01.470 --> 01:00:05.070 STUDENT: For an attacker, rather than like basically cracking a hash 01:00:05.070 --> 01:00:08.310 or cracking an algorithm, wouldn't it be easier 01:00:08.310 --> 01:00:12.750 to just try and access the basic server database 01:00:12.750 --> 01:00:16.140 and access the hash function like generated code? 01:00:16.140 --> 01:00:20.340 So rather, access how the specific algorithm works. 01:00:20.340 --> 01:00:24.930 That way, they can basically just reverse-engineer it? 01:00:24.930 --> 01:00:27.970 DAVID J. MALAN: Everything you described is possible. 
01:00:27.970 --> 01:00:31.470 However, I would push back on this assumption 01:00:31.470 --> 01:00:36.810 that the company should try to keep its hash algorithm secure or hidden. 01:00:36.810 --> 01:00:38.970 You should trust in the mathematics of what 01:00:38.970 --> 01:00:41.710 we're discussing today, both in the context of hashes 01:00:41.710 --> 01:00:43.350 and in the context of encryption. 01:00:43.350 --> 01:00:48.210 And I've pulled back up on the screen here the number of possible hashes 01:00:48.210 --> 01:00:53.470 that exist when using one of the most modern standards for hashing passwords. 01:00:53.470 --> 01:00:55.920 This is such a big number-- 01:00:55.920 --> 01:00:58.600 I dare say, I don't remember how many atoms are in the universe, 01:00:58.600 --> 01:01:01.410 but I'm going to guess it's fewer than this, maybe. 01:01:01.410 --> 01:01:07.440 The idea is, intuitively, that if the search space of possible hash values 01:01:07.440 --> 01:01:11.430 or the search space of possible keys is so darn big, 01:01:11.430 --> 01:01:14.370 both you and I, not to speak darkly, are going 01:01:14.370 --> 01:01:18.600 to be dead before the attacker actually figures out what 01:01:18.600 --> 01:01:22.960 that password or that hash actually is. 01:01:22.960 --> 01:01:24.840 So that's generally the presumption. 01:01:24.840 --> 01:01:28.620 Most of what we do today in terms of security all boils down 01:01:28.620 --> 01:01:33.840 to probabilities and trying to drive the probability of being exploited way, 01:01:33.840 --> 01:01:40.300 way, way down. Even so, if your password is still 00000000, 01:01:40.300 --> 01:01:44.100 it doesn't matter if there are this many or more possibilities if the adversary 01:01:44.100 --> 01:01:45.790 tries that one first. 01:01:45.790 --> 01:01:49.950 So keeping algorithms secret, keeping ciphers secret, 01:01:49.950 --> 01:01:52.270 is generally not best practice.
01:01:52.270 --> 01:01:55.710 You should be trusting that the math and the probabilities 01:01:55.710 --> 01:02:00.120 will protect your data if you are using these algorithms correctly. 01:02:00.120 --> 01:02:03.240 And how about one more question before we resume? 01:02:03.240 --> 01:02:07.470 STUDENT: How cipher work with word? 01:02:07.470 --> 01:02:12.190 Not number, like with words, how it work? 01:02:12.190 --> 01:02:22.890 How we can cipher-- or cryptograph like our latest with words, not the number, 01:02:22.890 --> 01:02:25.440 how it can be work? 01:02:25.440 --> 01:02:28.660 DAVID J. MALAN: OK, so if your key is a word and not a number, 01:02:28.660 --> 01:02:32.160 let me first say that generally when it comes to encryption, 01:02:32.160 --> 01:02:34.470 the keys are not words. 01:02:34.470 --> 01:02:37.800 These are not passwords, they're not meant to be used in quite the same way. 01:02:37.800 --> 01:02:41.830 These keys are generally generated by the computer for you, 01:02:41.830 --> 01:02:45.750 and so as such, they're just random numbers for the most part. 01:02:45.750 --> 01:02:50.910 With that said, even if it is a word like apple, there are ways-- 01:02:50.910 --> 01:02:53.220 and you would learn this in a class like CS50 01:02:53.220 --> 01:02:58.750 itself-- to convert a word to the underlying numeric representation. 01:02:58.750 --> 01:03:00.900 There's a system called ASCII or Unicode. 01:03:00.900 --> 01:03:04.290 So capital A is actually the number 65 in most systems. 01:03:04.290 --> 01:03:06.000 Capital B is the number 66. 01:03:06.000 --> 01:03:07.680 But we can go one level deeper. 01:03:07.680 --> 01:03:11.910 There's actually a pattern of 0's and 1's that represent A's and B's and C's 01:03:11.910 --> 01:03:15.540 and so forth, so we can convert everything in the world of computers 01:03:15.540 --> 01:03:16.820 to numbers. 01:03:16.820 --> 01:03:20.680 And for that, let me encourage you to take CS50x online. 
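The conversion from words to numbers described in this answer can be sketched with Python's built-in `ord` and `format`. Each character maps to its Unicode code point, which is the same as its ASCII value for these letters. Each number in turn has a binary pattern of 0's and 1's. The whole word can even be treated as one big integer.

```python
word = "apple"

codes = [ord(ch) for ch in word]                # one number per character
bits = [format(code, "08b") for code in codes]  # each number as 8 binary digits
as_one_number = int.from_bytes(word.encode("ascii"), "big")  # whole word as one integer

print(ord("A"), ord("B"))  # 65 66
print(codes)               # [97, 112, 112, 108, 101]
print(bits[0])             # 01100001
print(as_one_number)
```

Lowercase "a" is 97, not 65, so "apple" and "APPLE" become different numbers.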
01:03:20.680 --> 01:03:24.340 So that, then, is secret key cryptography 01:03:24.340 --> 01:03:28.300 or symmetric key cryptography, but it doesn't solve all of our problems, 01:03:28.300 --> 01:03:31.360 because I've taken for granted throughout this whole discussion 01:03:31.360 --> 01:03:36.100 that the sender and the receiver have a shared secret between them. 01:03:36.100 --> 01:03:40.090 Whether it's a simple key like 1 or 2 or 13-- 01:03:40.090 --> 01:03:43.330 hopefully not 26-- or hopefully some much bigger value. 01:03:43.330 --> 01:03:46.460 But there's kind of a chicken and the egg problem there, 01:03:46.460 --> 01:03:50.800 so to speak, in English whereby how do you actually establish 01:03:50.800 --> 01:03:57.980 a shared secret between parties A and B if A and B have never talked before, 01:03:57.980 --> 01:03:58.690 in fact? 01:03:58.690 --> 01:04:01.510 So for instance, if you're visiting Amazon.com 01:04:01.510 --> 01:04:05.950 for the first time, a popular e-commerce website, or gmail.com for your email, 01:04:05.950 --> 01:04:08.437 ideally, and you probably know this already 01:04:08.437 --> 01:04:10.270 from just living in the real world nowadays, 01:04:10.270 --> 01:04:15.040 ideally you want that connection to Amazon or Gmail to be encrypted, 01:04:15.040 --> 01:04:16.690 to be scrambled in some way. 01:04:16.690 --> 01:04:17.200 Why? 01:04:17.200 --> 01:04:19.887 Well, you don't want your password being stolen by someone. 01:04:19.887 --> 01:04:22.720 You don't want your credit card number being intercepted by someone. 01:04:22.720 --> 01:04:25.490 You don't want your personal emails being read by other people. 01:04:25.490 --> 01:04:29.380 So it stands to reason that encryption is generally a good thing. 01:04:29.380 --> 01:04:31.300 And you've seen this, perhaps, in the URL bar 01:04:31.300 --> 01:04:36.700 via something called HTTPS where the S literally is meant to mean Secure. 
01:04:36.700 --> 01:04:41.020 But odds are, you don't know anyone personally at amazon.com 01:04:41.020 --> 01:04:43.840 and you don't know anyone personally at gmail.com. 01:04:43.840 --> 01:04:47.890 So what key are you going to use to communicate securely 01:04:47.890 --> 01:04:51.580 with these websites, not to mention new websites that don't even exist today 01:04:51.580 --> 01:04:56.320 but might come online tomorrow, how do you establish a shared secret 01:04:56.320 --> 01:04:57.890 with someone else? 01:04:57.890 --> 01:05:02.500 So that's a fundamental gotcha or caveat with symmetric key 01:05:02.500 --> 01:05:05.350 or secret key encryption, is that it assumes 01:05:05.350 --> 01:05:09.940 that you have a shared secret between you and the other person. 01:05:09.940 --> 01:05:11.950 But the chicken and the egg scenario comes 01:05:11.950 --> 01:05:15.940 in whereby the only way to establish a shared secret 01:05:15.940 --> 01:05:18.610 would be to send it to the other person securely, 01:05:18.610 --> 01:05:21.430 but if you can't communicate securely, you can't even 01:05:21.430 --> 01:05:23.150 send them the secret you want to use. 01:05:23.150 --> 01:05:26.020 So you're caught in this deadlock. 01:05:26.020 --> 01:05:28.900 Thankfully, thanks to math, there are ways 01:05:28.900 --> 01:05:32.200 that we can solve this, too, via not symmetric key cryptography, 01:05:32.200 --> 01:05:37.420 but public key cryptography, otherwise known as asymmetric key cryptography. 01:05:37.420 --> 01:05:40.870 And among the algorithms here might be these, something called Diffie-Hellman, 01:05:40.870 --> 01:05:43.700 MQV, RSA, and others as well. 01:05:43.700 --> 01:05:47.710 And I dare say, on this list, maybe RSA is among the most well-known. 01:05:47.710 --> 01:05:50.800 It's perhaps an acronym you've actually seen in the wild. 
01:05:50.800 --> 01:05:53.380 Now what do we mean by public key cryptography, 01:05:53.380 --> 01:05:56.410 or more specifically, public key encryption? 01:05:56.410 --> 01:05:58.960 Well, in the world of public key encryption, 01:05:58.960 --> 01:06:02.470 or asymmetric key encryption, the asymmetry 01:06:02.470 --> 01:06:07.060 is implying that you actually don't use one key between the two people 01:06:07.060 --> 01:06:10.660 A and B. You actually use two keys. 01:06:10.660 --> 01:06:14.200 In the world of public key encryption, everyone in the world 01:06:14.200 --> 01:06:17.230 has both a public key and a private key. 01:06:17.230 --> 01:06:19.560 And these two are just really big numbers. 01:06:19.560 --> 01:06:23.290 There is a mathematical relationship between these numbers, the public key 01:06:23.290 --> 01:06:25.390 and the private key, but that's a relationship 01:06:25.390 --> 01:06:27.940 that your phone or your laptop or your desktop 01:06:27.940 --> 01:06:30.910 figures out when generating these values for you. 01:06:30.910 --> 01:06:35.770 So unlike our previous discussion of passwords, which you and I as humans do 01:06:35.770 --> 01:06:39.040 choose and memorize or store in our password managers, 01:06:39.040 --> 01:06:42.100 when it comes to keys, these are generally, 01:06:42.100 --> 01:06:45.490 in the world of public key cryptography, generated for you. 01:06:45.490 --> 01:06:48.970 And as the name suggests, the whole purpose of these keys 01:06:48.970 --> 01:06:52.720 is to tell the whole world if you want what your public key is. 01:06:52.720 --> 01:06:54.460 It is not in any way secret. 01:06:54.460 --> 01:06:59.050 You can literally email it out, you can put it in the signature of every email, 01:06:59.050 --> 01:07:01.450 you can post it on your website, on social media. 01:07:01.450 --> 01:07:05.500 The whole point of the public key is to make it, indeed, public. 
01:07:05.500 --> 01:07:10.270 But, suffice it to say, the private key should be kept secret by you, 01:07:10.270 --> 01:07:13.150 private by you on your own device. 01:07:13.150 --> 01:07:15.430 That should never be shared with anyone else. 01:07:15.430 --> 01:07:19.230 But the cool thing about public key cryptography and the mathematics 01:07:19.230 --> 01:07:23.130 underlying it is that if you share your public key 01:07:23.130 --> 01:07:27.540 with someone else on the internet, they can use that public key 01:07:27.540 --> 01:07:30.870 to encrypt a message and then send it to you over email 01:07:30.870 --> 01:07:33.100 or chat or any other technology. 01:07:33.100 --> 01:07:36.180 And if you had to guess, what is the only key 01:07:36.180 --> 01:07:40.170 in the world that can decrypt a message that has 01:07:40.170 --> 01:07:42.880 been encrypted with your public key? 01:07:42.880 --> 01:07:46.470 The only key in the world that can decrypt 01:07:46.470 --> 01:07:51.420 a message that has been encrypted with your public key is your private key. 01:07:51.420 --> 01:07:54.810 That's what the mathematical relationship ultimately does for you. 01:07:54.810 --> 01:07:57.540 So, pictorially here, if this is our algorithm that 01:07:57.540 --> 01:07:59.800 implements this idea of public key encryption, 01:07:59.800 --> 01:08:01.890 let's see what the inputs and outputs should be. 01:08:01.890 --> 01:08:04.860 If the goal is to send a message to you and you 01:08:04.860 --> 01:08:07.890 have shared with the world your public key, whoever is sending you 01:08:07.890 --> 01:08:13.260 this message uses your public key, their plaintext message, and out of that 01:08:13.260 --> 01:08:15.150 comes ciphertext. 01:08:15.150 --> 01:08:18.850 That, then, is how asymmetric key encryption works. 
01:08:18.850 --> 01:08:21.939 Meanwhile, when you receive that message, 01:08:21.939 --> 01:08:26.080 you can use your own private key and the ciphertext you've just 01:08:26.080 --> 01:08:28.720 received to get back the plaintext. 01:08:28.720 --> 01:08:30.880 And this is what we mean by asymmetric. 01:08:30.880 --> 01:08:35.649 Unlike secret key cryptography or symmetric key cryptography, where 01:08:35.649 --> 01:08:39.460 you're using the same key back and forth, plus 1 or minus 1 01:08:39.460 --> 01:08:44.170 in the case of the rotational cipher, with asymmetric encryption, 01:08:44.170 --> 01:08:49.990 you are using one key for the encryption process and another key for the decryption 01:08:49.990 --> 01:08:50.689 process. 01:08:50.689 --> 01:08:52.960 So that's what's fundamentally different. 01:08:52.960 --> 01:08:55.870 RSA is one of the most popular algorithms for this. 01:08:55.870 --> 01:08:58.450 The browsers you probably use every day are probably 01:08:58.450 --> 01:09:01.450 using some variant of RSA underneath the hood. 01:09:01.450 --> 01:09:03.910 We won't get into great detail about the mathematics, 01:09:03.910 --> 01:09:06.850 but one of the most important details about RSA 01:09:06.850 --> 01:09:10.210 is that it relies on really big prime numbers. 01:09:10.210 --> 01:09:15.069 In fact, in a nutshell, what happens with RSA is that your computer or your phone 01:09:15.069 --> 01:09:18.220 chooses a really big prime number called p. 01:09:18.220 --> 01:09:21.384 It then chooses a really big other prime number called q. 01:09:21.384 --> 01:09:25.479 Then it multiplies them together to get a new value, which we'll call n. 01:09:25.479 --> 01:09:30.220 And it uses that value n in the resulting mathematics 01:09:30.220 --> 01:09:33.850 that the algorithm's authors came up with, dot-dot-dot.
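The "dot-dot-dot" hides how the rest of the key is derived from p and q. As a hedged sketch, here is the standard textbook recipe in Python (3.8+ for `pow(e, -1, phi)`); the lecture does not spell these steps out, so treat them as an assumption: compute n = p * q, compute (p - 1) * (q - 1), pick a public exponent e, and solve for the private exponent d. The primes 61 and 53 are deliberately tiny so the numbers stay readable, and `make_toy_rsa_keys` is a hypothetical helper.

```python
from math import gcd


def make_toy_rsa_keys(p: int, q: int, e: int = 65537):
    """Textbook RSA key generation from two distinct primes p and q.

    Returns (public_key, private_key) as (e, n) and (d, n).
    Toy only: real keys use randomly chosen primes hundreds of digits long.
    """
    n = p * q                   # the modulus, shared by both keys
    phi = (p - 1) * (q - 1)     # Euler's totient of n
    if gcd(e, phi) != 1:
        raise ValueError("e must share no factors with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)         # private exponent: inverse of e modulo phi
    return (e, n), (d, n)


public_key, private_key = make_toy_rsa_keys(61, 53, e=17)
print(public_key)   # (17, 3233)
print(private_key)  # (2753, 3233)
```

Real libraries also pick their primes at random and add padding when encrypting, so this is purely an illustration of the arithmetic.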
01:09:33.850 --> 01:09:37.569 The presumption here is that when you take a really big prime number 01:09:37.569 --> 01:09:40.870 and multiply it by a really big other prime number, 01:09:40.870 --> 01:09:45.609 it is really hard to figure out from the product of those numbers 01:09:45.609 --> 01:09:48.819 what the original p and q were. 01:09:48.819 --> 01:09:50.950 And if you're a little hazy on prime numbers, 01:09:50.950 --> 01:09:55.840 a prime is a whole number greater than 1 that can only be divided evenly by itself and 1. 01:09:55.840 --> 01:09:59.620 And indeed, we can use those, coming up with two big ones 01:09:59.620 --> 01:10:03.700 and multiplying them together in order to get this value n that is subsequently 01:10:03.700 --> 01:10:05.680 used in the rest of the mathematics. 01:10:05.680 --> 01:10:07.450 What are the rest of those mathematics? 01:10:07.450 --> 01:10:08.470 In essence, this. 01:10:08.470 --> 01:10:10.930 And these will be the scariest-looking formulas you perhaps 01:10:10.930 --> 01:10:12.730 see over the course of this class. 01:10:12.730 --> 01:10:18.520 The value n I just described is ultimately used to divide values. 01:10:18.520 --> 01:10:22.330 If you're unfamiliar with mod here, in this context it means 01:10:22.330 --> 01:10:24.350 to take the remainder after dividing by some value. 01:10:24.350 --> 01:10:25.480 So what are we doing? 01:10:25.480 --> 01:10:29.800 Here is a quick summary of how encryption and decryption work 01:10:29.800 --> 01:10:30.760 with RSA. 01:10:30.760 --> 01:10:35.230 If you have some message m that you want to send to another person, 01:10:35.230 --> 01:10:39.130 and you have somehow, via the dot-dot-dot process 01:10:39.130 --> 01:10:45.500 I alluded to earlier, come up with your own public key e there.
01:10:45.500 --> 01:10:47.500 Well then, someone can take their message, 01:10:47.500 --> 01:10:53.140 encrypt it by raising that message to the power of e, the exponent of e, 01:10:53.140 --> 01:10:56.440 and then divide it, divide it, divide it, divide it by n 01:10:56.440 --> 01:11:00.550 and figure out what the remainder is when dividing by n. 01:11:00.550 --> 01:11:03.760 That then gives you a value called c for ciphertext. 01:11:03.760 --> 01:11:08.170 When you then receive that message c, you can use your private key, 01:11:08.170 --> 01:11:12.340 known here as d, and you raise the ciphertext, 01:11:12.340 --> 01:11:16.480 its numeric value, to the power of d-- that is, the exponent in d, and you 01:11:16.480 --> 01:11:19.330 divide, divide, divide by n in order to figure out 01:11:19.330 --> 01:11:22.990 that remainder, which will give you back the original message. 01:11:22.990 --> 01:11:26.510 Now that is a significant oversimplification of what's going on, 01:11:26.510 --> 01:11:28.750 but that's the essence of the algorithm. 01:11:28.750 --> 01:11:32.200 It has to do with picking two very large prime numbers, 01:11:32.200 --> 01:11:35.140 multiplying them together to get that value n, 01:11:35.140 --> 01:11:39.850 and then using n as well as other values that, dot-dot-dot, are generated 01:11:39.850 --> 01:11:46.150 by the algorithm for you, e and d, in order to encrypt and decrypt messages 01:11:46.150 --> 01:11:47.080 ultimately. 01:11:47.080 --> 01:11:49.750 And this is what's generally known as modular arithmetic. 01:11:49.750 --> 01:11:52.000 It involves lots of division and division and division 01:11:52.000 --> 01:11:53.750 in order to come up with these remainders, 01:11:53.750 --> 01:11:59.020 but ultimately, it is a very secure way to asymmetrically share information 01:11:59.020 --> 01:12:02.410 without having to agree on one shared key in advance, 01:12:02.410 --> 01:12:06.130 but rather, using a public and a private key instead. 
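The two formulas just described, c = m^e mod n to encrypt and m = c^d mod n to decrypt, can be sketched in Python with the built-in three-argument `pow`, which does modular exponentiation efficiently instead of literally dividing over and over. This assumes the classic textbook toy key n = 61 * 53 = 3233, e = 17, d = 2753 (far too small to be secure) and a message that has already been turned into a number smaller than n.

```python
# Toy RSA key (textbook example): n = 61 * 53, public exponent E, private exponent D.
N, E, D = 3233, 17, 2753


def rsa_encrypt(m: int, e: int = E, n: int = N) -> int:
    """Encrypt: c = m^e mod n. Requires 0 <= m < n."""
    if not 0 <= m < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(m, e, n)


def rsa_decrypt(c: int, d: int = D, n: int = N) -> int:
    """Decrypt: m = c^d mod n."""
    return pow(c, d, n)


message = 65                    # e.g., the ASCII code for "A"
ciphertext = rsa_encrypt(message)
print(ciphertext)               # 2790
print(rsa_decrypt(ciphertext))  # 65
```

This is "textbook" RSA with no padding. Real systems add randomized padding such as OAEP before exponentiating, because unpadded RSA always maps the same message to the same ciphertext.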
01:12:06.130 --> 01:12:10.480 Now there are other techniques that come with this world 01:12:10.480 --> 01:12:14.590 of public key cryptography, and another technique is that of key exchange. 01:12:14.590 --> 01:12:18.210 So by contrast, if you do actually want to establish 01:12:18.210 --> 01:12:22.320 some kind of shared secret, there are alternative algorithms 01:12:22.320 --> 01:12:24.850 that different humans have invented over the years. 01:12:24.850 --> 01:12:27.550 So there are alternatives to one algorithm or another, 01:12:27.550 --> 01:12:29.670 and one of these alternatives is actually 01:12:29.670 --> 01:12:33.910 called Diffie-Hellman, named after another pair of authors here. 01:12:33.910 --> 01:12:37.530 So here is the essence of the mathematics for this algorithm, 01:12:37.530 --> 01:12:40.470 the goal of which is indeed key exchange: 01:12:40.470 --> 01:12:45.090 to figure out, using fancy mathematics, how both A and B can come up 01:12:45.090 --> 01:12:49.590 with the same value that they can then use as a shared secret, 01:12:49.590 --> 01:12:52.980 but without anyone who intercepts any of their messages 01:12:52.980 --> 01:12:57.750 being able to figure out what that shared value, that shared secret, is. 01:12:57.750 --> 01:12:59.830 So what's the essence of the math here? 01:12:59.830 --> 01:13:02.910 Well, you first pick a value g, which is called a generator. 01:13:02.910 --> 01:13:04.740 It can be as simple as the number 2. 01:13:04.740 --> 01:13:07.860 And you pick a big prime number, call it p here. 01:13:07.860 --> 01:13:10.170 And those are agreed upon in advance. 01:13:10.170 --> 01:13:15.090 Meanwhile, person A, say Alice, picks her own private key, little a, 01:13:15.090 --> 01:13:18.600 which is another really big number, and then she does this math: g 01:13:18.600 --> 01:13:20.550 to the power of a mod p. 01:13:20.550 --> 01:13:23.940 And again, mod refers to taking the remainder after dividing by some value.
01:13:23.940 --> 01:13:28.890 Meanwhile, B, or Bob, still uses the same g, still uses the same p, 01:13:28.890 --> 01:13:34.500 picks his own private key, little b, and raises g to the power of b modulo p, 01:13:34.500 --> 01:13:36.990 and that gives him back this value capital 01:13:36.990 --> 01:13:41.910 B, whereas Alice had capital A. Then it turns out that Alice and Bob can 01:13:41.910 --> 01:13:44.310 send those values across the internet-- 01:13:44.310 --> 01:13:50.010 capital A one way, capital B the other way, and thanks to some fancy modular arithmetic 01:13:50.010 --> 01:13:54.360 here, too, Alice can take Bob's capital B value and raise it 01:13:54.360 --> 01:13:58.680 to the power of her private little a value, mod p, which effectively gives you 01:13:58.680 --> 01:14:02.010 g to the power of a times b mod p. 01:14:02.010 --> 01:14:05.790 Bob, meanwhile, can take Alice's capital A value that was sent to him, 01:14:05.790 --> 01:14:10.590 raise it to the power of his private key b, and then mod p, 01:14:10.590 --> 01:14:13.140 that is, calculate the remainder with respect to p. 01:14:13.140 --> 01:14:16.530 And it's totally fine if these mathematics 01:14:16.530 --> 01:14:18.330 are uncomfortable for you, whoo! 01:14:18.330 --> 01:14:22.980 Just know that, thanks to some basic principles of mathematics, 01:14:22.980 --> 01:14:27.930 the end result is that both Alice and Bob have the exact same value-- 01:14:27.930 --> 01:14:30.270 we'll call it s for shared secret-- 01:14:30.270 --> 01:14:35.310 even though the value never went across the internet in its entirety.
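The exchange just described can be sketched in Python (3.9+) with toy-sized numbers. Assumptions, none of them from the lecture: p is the prime 4294967291 (the largest prime below 2^32, far too small for real use), g = 2 as suggested above, and the private values come from Python's `secrets` module; `dh_keypair` is a hypothetical helper. Real deployments use standardized groups of 2048 bits or more, or elliptic curves.

```python
import secrets

# Publicly agreed-upon parameters (toy size).
P = 4294967291  # a prime: the largest prime below 2**32
G = 2           # the base g


def dh_keypair(p: int = P, g: int = G) -> tuple[int, int]:
    """Pick a private exponent and return (private, public) with public = g^private mod p."""
    private = secrets.randbelow(p - 3) + 2  # random integer in [2, p - 2]
    public = pow(g, private, p)
    return private, public


a, A = dh_keypair()  # Alice keeps little a secret and sends capital A
b, B = dh_keypair()  # Bob keeps little b secret and sends capital B

s_alice = pow(B, a, P)   # Alice computes B^a mod p
s_bob = pow(A, b, P)     # Bob computes A^b mod p
print(s_alice == s_bob)  # True
```

Only `A` and `B` would ever cross the network; `a`, `b`, and the shared `s` stay on each side.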
01:14:35.310 --> 01:14:38.850 Alice sent part of it this way, Bob sent part of it this way, 01:14:38.850 --> 01:14:43.200 but because Alice and Bob held on to private values, the little a 01:14:43.200 --> 01:14:46.320 and the little b, which they kept to themselves, they're 01:14:46.320 --> 01:14:49.590 able to do these mathematics that ensure that they both came up 01:14:49.590 --> 01:14:53.850 with the same value, even though you or I, if we intercepted 01:14:53.850 --> 01:14:57.240 any one of those messages, could not figure out what it is. 01:14:57.240 --> 01:14:59.670 And now that they have a shared secret s, 01:14:59.670 --> 01:15:02.880 they can use it as the key for any of those other symmetric 01:15:02.880 --> 01:15:04.710 ciphers we talked about earlier. 01:15:04.710 --> 01:15:09.210 AES I put on the board briefly, triple DES I put on the board briefly. 01:15:09.210 --> 01:15:12.090 Heck, we could even use this in a rotational cipher 01:15:12.090 --> 01:15:15.870 if we really wanted to, but that's not, indeed, best practice. 01:15:15.870 --> 01:15:19.410 So again, don't worry so much about focusing on the mathematics, 01:15:19.410 --> 01:15:22.710 but if you were to take a higher-level class in theoretical computer science, 01:15:22.710 --> 01:15:26.550 these are intellectual rabbit holes that you could go down to better understand 01:15:26.550 --> 01:15:27.690 how the software works. 01:15:27.690 --> 01:15:30.480 And as for my comments earlier about not trying 01:15:30.480 --> 01:15:33.690 to invent your own cryptographic functions, 01:15:33.690 --> 01:15:35.280 this is the kind of reason why. 01:15:35.280 --> 01:15:37.980 This is the degree of sophistication that you and I take 01:15:37.980 --> 01:15:41.520 for granted in our phones, our laptops, and desktops, 01:15:41.520 --> 01:15:44.250 and it has been vetted by industry and academics alike.
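Why both sides end up with the same s can be written out in a few lines. Writing a and b for the private values and A and B for the values actually sent over the internet, the key fact is that reducing mod p at an intermediate step does not change the final remainder, so the exponents combine as usual:

```latex
\begin{aligned}
A &= g^{a} \bmod p, \qquad B = g^{b} \bmod p \\
s_{\text{Alice}} &= B^{a} \bmod p = \left(g^{b}\right)^{a} \bmod p = g^{ab} \bmod p \\
s_{\text{Bob}}   &= A^{b} \bmod p = \left(g^{a}\right)^{b} \bmod p = g^{ab} \bmod p
\end{aligned}
```

An eavesdropper sees g, p, A, and B, but recovering g^{ab} mod p from those alone is believed to be infeasible when p and g are well chosen and large.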
01:15:44.250 --> 01:15:47.550 Generally, best practice is to rely on standards 01:15:47.550 --> 01:15:50.040 that have been tried and tested rather than 01:15:50.040 --> 01:15:53.760 try to come up with your own creative cryptosystem, so to speak, 01:15:53.760 --> 01:15:57.780 that may very well have faults that you yourself do not know about. 01:15:57.780 --> 01:16:01.170 And the icing on the cake, if you're curious as 01:16:01.170 --> 01:16:04.860 to the underlying mathematics, is that this is ultimately the value 01:16:04.860 --> 01:16:10.320 Alice and Bob are both calculating: g to the power of a times b mod p. 01:16:10.320 --> 01:16:13.560 But more on that in a higher-level mathematics course if indeed 01:16:13.560 --> 01:16:14.460 of interest. 01:16:14.460 --> 01:16:17.160 How about one final building block that you 01:16:17.160 --> 01:16:19.960 get from this world of public key cryptography, 01:16:19.960 --> 01:16:23.110 and this is one that's going to be increasingly omnipresent, 01:16:23.110 --> 01:16:25.680 I do think, in our world, especially as we move away 01:16:25.680 --> 01:16:28.950 from very archaic paper-and-pencil signatures 01:16:28.950 --> 01:16:31.290 that you might write with a pen on paper 01:16:31.290 --> 01:16:35.050 and move instead to what we'll call digital signatures. 01:16:35.050 --> 01:16:38.850 It turns out that once you're comfortable with the idea 01:16:38.850 --> 01:16:42.600 of public key cryptography generally involving a public key 01:16:42.600 --> 01:16:46.050 and a private key, the first of which is literally public, 01:16:46.050 --> 01:16:49.710 you can share it with the world; the second of which is meant to be private, 01:16:49.710 --> 01:16:50.910 kept only to you.
01:16:50.910 --> 01:16:53.880 And if you can take at face value my claim 01:16:53.880 --> 01:16:56.160 that through appropriate mathematics, there's 01:16:56.160 --> 01:16:59.310 a relationship possible between these two numbers, 01:16:59.310 --> 01:17:02.850 that whereas one can encrypt data, the other can decrypt, 01:17:02.850 --> 01:17:06.000 even if you don't care to get into the specifics of the mathematics, 01:17:06.000 --> 01:17:09.750 but you just agree that, OK, that sounds reasonable to me, 01:17:09.750 --> 01:17:15.090 that that math can work, we can now use that building block 01:17:15.090 --> 01:17:18.860 of a public key and a private key to solve other problems as well. 01:17:18.860 --> 01:17:21.620 Not just encrypt messages from point A to point B 01:17:21.620 --> 01:17:25.670 and back, but rather, to sign information, sign documents, 01:17:25.670 --> 01:17:30.150 even, and say, yes, this was signed by David or someone else. 01:17:30.150 --> 01:17:31.520 So how does this work? 01:17:31.520 --> 01:17:33.680 In the world of digital signatures, here's 01:17:33.680 --> 01:17:36.260 a few more acronyms of algorithms that are commonly 01:17:36.260 --> 01:17:39.230 used even though we'll continue to simplify them in our discussion. 01:17:39.230 --> 01:17:43.550 DSA, ECDSA, RSA, and others can be used to give you 01:17:43.550 --> 01:17:48.000 the ability to sign documents or other pieces of information digitally. 01:17:48.000 --> 01:17:51.230 So what does it mean to sign something digitally? 01:17:51.230 --> 01:17:53.930 It's not at all like this with a unique signature, 01:17:53.930 --> 01:17:56.340 it's all mathematics involved. 01:17:56.340 --> 01:18:00.500 So, here, then, might be our algorithm for digitally signing 01:18:00.500 --> 01:18:02.660 some document or piece of information. 01:18:02.660 --> 01:18:06.260 And I claim that the input to this process is a message. 
01:18:06.260 --> 01:18:09.050 A letter that you've written, a contract that you want to sign, 01:18:09.050 --> 01:18:11.855 something that you want to put your digital signature on. 01:18:11.855 --> 01:18:16.000 And the output of this first step is going to be a hash of that message. 01:18:16.000 --> 01:18:18.810 So we can use any number of the hash functions 01:18:18.810 --> 01:18:23.610 we talked about earlier that take as input an arbitrary-length 01:18:23.610 --> 01:18:27.780 input, like a message, a document, an essay, a contract, 01:18:27.780 --> 01:18:32.050 and produce as output a fixed-length hash value. 01:18:32.050 --> 01:18:34.650 So we've seen that and we've stipulated that this is indeed 01:18:34.650 --> 01:18:38.010 possible, similar in spirit to our password discussion earlier. 01:18:38.010 --> 01:18:40.800 You can even do it for larger inputs than passwords. 01:18:40.800 --> 01:18:43.570 You can do it for entire documents as well. 01:18:43.570 --> 01:18:47.400 Once you have that hash, here's how you digitally sign the document. 01:18:47.400 --> 01:18:54.600 You pass your private key as input, along with the hash value 01:18:54.600 --> 01:18:58.620 you just computed a moment ago, into the digital signature algorithm, 01:18:58.620 --> 01:19:01.993 and the output of that process is a signature. 01:19:01.993 --> 01:19:04.410 So if you think about this intuitively, what are we doing? 01:19:04.410 --> 01:19:07.020 Well, we're taking an arbitrary-sized document. 01:19:07.020 --> 01:19:09.420 Maybe it's a letter that you've written, maybe it's 01:19:09.420 --> 01:19:12.690 a contract that you need to sign, and it might be short 01:19:12.690 --> 01:19:14.250 or it might be really long. 01:19:14.250 --> 01:19:17.440 Here's where the value of cryptographic hash functions comes in.
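The fixed-length property is easy to see with Python's standard `hashlib` module. This sketch assumes SHA-256 as the hash function (one common choice; the lecture doesn't name a specific one here): an 8-byte note and a 280,000-byte document both hash to 256 bits, shown as 64 hexadecimal characters.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


short_doc = b"I agree."
long_doc = b"This contract is very long. " * 10_000  # 280,000 bytes

print(len(short_doc), len(sha256_hex(short_doc)))  # 8 64
print(len(long_doc), len(sha256_hex(long_doc)))    # 280000 64
```

That fixed size is what lets a signature algorithm work on a small, predictable number no matter how long the original document is.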
01:19:17.440 --> 01:19:19.830 Recall that a cryptographic hash function, by definition, 01:19:19.830 --> 01:19:25.240 takes an arbitrary-sized input and reduces it to a fixed-sized output. 01:19:25.240 --> 01:19:27.120 So it doesn't matter how big the original 01:19:27.120 --> 01:19:31.200 was, you can distill it into a distinct representation that's shorter. 01:19:31.200 --> 01:19:35.280 So, per this diagram, if you take that hash value 01:19:35.280 --> 01:19:39.330 and you encrypt it with your private key, what we say 01:19:39.330 --> 01:19:41.880 is that the output of that process, which 01:19:41.880 --> 01:19:45.330 is just a really big number or some sequence of weird-looking text, 01:19:45.330 --> 01:19:47.800 is your digital signature. 01:19:47.800 --> 01:19:50.820 Now this is a little weird because what we're doing now 01:19:50.820 --> 01:19:53.280 is the opposite of public key encryption. 01:19:53.280 --> 01:19:56.040 With public key encryption, remember, someone else 01:19:56.040 --> 01:19:59.010 used your public key to encrypt a message to you 01:19:59.010 --> 01:20:02.800 and you used your private key to decrypt it. 01:20:02.800 --> 01:20:06.960 But in the case of digital signatures, the story gets flipped upside-down. 01:20:06.960 --> 01:20:11.100 You use your private key and a hash of your message 01:20:11.100 --> 01:20:15.150 to digitally sign your document and the output of that is a signature-- again, 01:20:15.150 --> 01:20:17.040 a number or some string of text. 01:20:17.040 --> 01:20:20.940 And you send that signature to the recipient saying, this 01:20:20.940 --> 01:20:24.970 is my digital signature, you can verify it now if you so choose. 01:20:24.970 --> 01:20:25.950 And they should. 01:20:25.950 --> 01:20:29.160 So that invites the question, well, how does the recipient 01:20:29.160 --> 01:20:31.200 verify your digital signature? 
01:20:31.200 --> 01:20:34.530 How do they know that this weird-looking sequence of characters or numbers 01:20:34.530 --> 01:20:36.570 actually was signed by you? 01:20:36.570 --> 01:20:41.230 Well, recall that you have not only a private key, but a public key as well. 01:20:41.230 --> 01:20:44.640 And that public key is accessible to everyone, including that recipient. 01:20:44.640 --> 01:20:46.830 And so, what happens is this. 01:20:46.830 --> 01:20:51.100 When that recipient gets your document and your digital signature, 01:20:51.100 --> 01:20:55.710 so to speak, they probably want to and should verify the digital signature 01:20:55.710 --> 01:20:59.410 to confirm that, yes, you signed off on that document or contract. 01:20:59.410 --> 01:21:01.300 So what does that box look like? 01:21:01.300 --> 01:21:05.220 Well, they have received not only the document itself, the so-called message, 01:21:05.220 --> 01:21:07.318 they've also received your digital signature. 01:21:07.318 --> 01:21:08.610 So you've sent them two things. 01:21:08.610 --> 01:21:11.770 And the digital signature, you can think of it like a human signature, 01:21:11.770 --> 01:21:14.130 but it's, of course, a big number or a string of text. 01:21:14.130 --> 01:21:17.290 But they've sent you two things-- the document and that signature. 01:21:17.290 --> 01:21:18.220 So what do you do? 01:21:18.220 --> 01:21:20.640 You take the document you've received and you run it 01:21:20.640 --> 01:21:22.800 through the exact same publicly available hash 01:21:22.800 --> 01:21:24.820 function, because the document might be long, 01:21:24.820 --> 01:21:28.650 so you want to collapse it into a short hash representation 01:21:28.650 --> 01:21:31.180 thereof, just like our use of passwords. 01:21:31.180 --> 01:21:35.010 So that you can just do easily, no private information involved. 01:21:35.010 --> 01:21:36.420 But then what do you do? 
01:21:36.420 --> 01:21:42.480 You then take the public key of the person who signed this document, you 01:21:42.480 --> 01:21:45.780 take the signature that they claim is their signature, 01:21:45.780 --> 01:21:50.430 and you decrypt their signature with their public key. 01:21:50.430 --> 01:21:58.780 If the signature is genuine, that should output the exact same hash that you just calculated. 01:21:58.780 --> 01:22:04.333 So to summarize, the message itself, the document in this story, is public. 01:22:04.333 --> 01:22:07.500 It's not encrypted, it's not something you really worry about being private. 01:22:07.500 --> 01:22:09.480 What you really care about in this story is 01:22:09.480 --> 01:22:11.830 that it was signed by a specific person. 01:22:11.830 --> 01:22:15.130 So if that message, that document, is available to both the sender 01:22:15.130 --> 01:22:20.890 and the receiver, both of them do this first process of hashing the message, 01:22:20.890 --> 01:22:24.610 hashing the document just to get some succinct representation thereof. 01:22:24.610 --> 01:22:26.560 So it's not this big, it's this big. 01:22:26.560 --> 01:22:28.580 That makes the math quicker and easier. 01:22:28.580 --> 01:22:33.670 However, upon receiving not only that message, which 01:22:33.670 --> 01:22:37.660 they just hashed, but also your claimed digital signature, 01:22:37.660 --> 01:22:42.640 the recipient then tries to decrypt your signature using your public key. 01:22:42.640 --> 01:22:45.460 And here, too, just as the private key can 01:22:45.460 --> 01:22:48.460 reverse the encryption done by a public key, 01:22:48.460 --> 01:22:53.060 so can the public key reverse the encryption done by a private key.
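The full sign-then-verify round trip can be sketched as toy "textbook" RSA in Python (3.8+). Assumptions, none of them from the lecture: SHA-256 is the hash; the key is built from two well-known Mersenne primes, 2^89 - 1 and 2^107 - 1, so it is public knowledge and offers no security at all; and the hash is simply reduced mod n. Real schemes such as RSA-PSS, DSA, or ECDSA add padding or randomness and use proper key sizes. `toy_hash`, `toy_sign`, and `toy_verify` are hypothetical helpers.

```python
import hashlib

# Toy key from two well-known Mersenne primes (NOT secret, NOT secure).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent


def toy_hash(message: bytes) -> int:
    """SHA-256 the message and turn the digest into an integer below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def toy_sign(message: bytes) -> int:
    """Signer: 'encrypt' the hash with the PRIVATE exponent D."""
    return pow(toy_hash(message), D, N)


def toy_verify(message: bytes, signature: int) -> bool:
    """Recipient: hash the message, 'decrypt' the signature with the PUBLIC exponent E, compare."""
    return pow(signature, E, N) == toy_hash(message)


document = b"I, David, agree to these terms."
signature = toy_sign(document)
print(toy_verify(document, signature))                            # True
print(toy_verify(b"I, David, agree to other terms.", signature))  # False: the hashes differ
```

Note that the document itself is never encrypted here; only its hash is transformed, which matches the point that the message in this story is public.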
01:22:53.060 --> 01:22:58.300 So if the recipient mathematically gets the exact same hash 01:22:58.300 --> 01:23:01.660 after decrypting what you sent them, it must be the case 01:23:01.660 --> 01:23:04.870 mathematically that the only person in the world who 01:23:04.870 --> 01:23:07.600 could have signed this document is, in fact, you 01:23:07.600 --> 01:23:09.670 because they have your public key. 01:23:09.670 --> 01:23:11.920 And maybe some third party, some registry, 01:23:11.920 --> 01:23:15.200 some company has said, yes, that is David Malan's public key, 01:23:15.200 --> 01:23:16.410 you can trust that. 01:23:16.410 --> 01:23:20.900 And so, if David Malan's private key has not been compromised, 01:23:20.900 --> 01:23:26.720 you can trust that any signature that you can decrypt with my public key 01:23:26.720 --> 01:23:31.190 must have been encrypted with my private key. 01:23:31.190 --> 01:23:34.670 And it takes a while, I think, for these ideas, and certainly the mathematics 01:23:34.670 --> 01:23:36.770 to sink in, but for now, if you just trust 01:23:36.770 --> 01:23:39.770 that there's two big numbers in the world, one public, one private, 01:23:39.770 --> 01:23:43.580 there's a mathematical relationship between them such that one can reverse 01:23:43.580 --> 01:23:46.130 the effects of the other in either direction, 01:23:46.130 --> 01:23:49.790 we humans can use this now not only to secure 01:23:49.790 --> 01:23:53.060 our messages per our discussion of encryption, 01:23:53.060 --> 01:23:56.210 we can also use it to authenticate messages 01:23:56.210 --> 01:24:01.160 and attest, yes, this came from David Malan or did not. 
01:24:01.160 --> 01:24:03.980 And unlike a human signature on a piece of paper, 01:24:03.980 --> 01:24:07.940 which can obviously just be photographed, duplicated, traced over, 01:24:07.940 --> 01:24:13.070 the security of digital signatures relies on keeping your private key private, 01:24:13.070 --> 01:24:16.280 and that notion does not exist in the world of human signatures, 01:24:16.280 --> 01:24:20.300 and so in that sense, digital signatures are objectively better 01:24:20.300 --> 01:24:24.020 than our old-form human ones. 01:24:24.020 --> 01:24:25.400 Questions now? 01:24:25.400 --> 01:24:29.540 And I know that's a lot, and it's OK if it didn't all go down at once. 01:24:29.540 --> 01:24:34.130 Questions on digital signatures, public key encryption or decryption, 01:24:34.130 --> 01:24:36.080 or anything prior? 01:24:36.080 --> 01:24:39.650 STUDENT: Would these public and private keys be attributed to, what, 01:24:39.650 --> 01:24:41.177 your IP address? 01:24:41.177 --> 01:24:42.510 DAVID J. MALAN: A good question. 01:24:42.510 --> 01:24:43.677 To what are they attributed? 01:24:43.677 --> 01:24:45.650 Typically not to your IP address. 01:24:45.650 --> 01:24:49.700 They are typically stored in a registry, like a central registry that 01:24:49.700 --> 01:24:53.050 knows that this is Vlad's public key, this is David's public key and so 01:24:53.050 --> 01:24:53.550 forth. 01:24:53.550 --> 01:24:56.630 And it relies on a system of trust and transitivity. 01:24:56.630 --> 01:25:01.940 So if you trust this third-party company that is storing all of our public keys, 01:25:01.940 --> 01:25:05.690 then you can trust whoever "they" are, in turn, trusting. 01:25:05.690 --> 01:25:07.370 Or it can be more distributed. 01:25:07.370 --> 01:25:09.470 Your public key can literally be distributed 01:25:09.470 --> 01:25:10.825 in the footer of your emails. 01:25:10.825 --> 01:25:12.200 It can be posted on your website.
01:25:12.200 --> 01:25:14.670 It can be on your LinkedIn profile or the like. 01:25:14.670 --> 01:25:17.690 And so long as other people in the world trust 01:25:17.690 --> 01:25:21.200 your emails or your website or LinkedIn, they 01:25:21.200 --> 01:25:23.940 can trust that that is, in fact, your public key. 01:25:23.940 --> 01:25:26.910 So there are different ways to implement that system of trust. 01:25:26.910 --> 01:25:28.740 Other questions? 01:25:28.740 --> 01:25:33.840 STUDENT: Hashing uses a mathematical function and encryption uses 01:25:33.840 --> 01:25:36.270 a mathematical function plus a key. 01:25:36.270 --> 01:25:42.960 Like the Caesar cipher basically uses the simple function plus the key. 01:25:42.960 --> 01:25:44.788 Is that analogy correct? 01:25:44.788 --> 01:25:46.330 DAVID J. MALAN: Yes, that is correct. 01:25:46.330 --> 01:25:49.500 And if it helps you-- this is an oversimplification, 01:25:49.500 --> 01:25:54.100 but it's generally helpful, I think, to think of hashing as one-way. 01:25:54.100 --> 01:26:00.570 So you can only convert a value to a hash value but not the opposite. 01:26:00.570 --> 01:26:04.590 But encryption is like two-way-- 01:26:04.590 --> 01:26:06.660 it's reversible hashing, so to speak. 01:26:06.660 --> 01:26:10.770 The output still looks weird and random, but you can undo the process. 01:26:10.770 --> 01:26:14.460 And here's one way to think about this: in the world of hashing, 01:26:14.460 --> 01:26:18.510 I claim that you can take an infinite domain, 01:26:18.510 --> 01:26:22.200 like any possible message you want to send, and convert it 01:26:22.200 --> 01:26:23.820 to a finite range-- 01:26:23.820 --> 01:26:27.450 for instance, all A-words could have a hash value of 1, 01:26:27.450 --> 01:26:29.940 all B-words could have a hash value of 2.
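The student's analogy and the "A-words hash to 1, B-words hash to 2" example can be made concrete with two toy Python functions, both hypothetical and for illustration only: a first-letter "hash" that throws information away, and a letter-rotation "encryption" whose key lets you undo it. Both assume lowercase English words (a through z).

```python
def first_letter_hash(word: str) -> int:
    """Toy one-way 'hash': a-words -> 1, b-words -> 2, ..., z-words -> 26."""
    return ord(word[0].lower()) - ord("a") + 1


def rotate(word: str, key: int) -> str:
    """Toy reversible 'encryption': shift each lowercase letter by key, wrapping z -> a."""
    return "".join(chr((ord(ch) - ord("a") + key) % 26 + ord("a")) for ch in word)


# Hashing: different inputs collide, so the output can't tell you which word it was.
print(first_letter_hash("apple"), first_letter_hash("avocado"))  # 1 1

# Encryption: with the key, the original comes back exactly.
ciphertext = rotate("apple", 3)
print(ciphertext)              # dssoh
print(rotate(ciphertext, -3))  # apple
```

Real cryptographic hashes have vastly more possible outputs than 26, but they share the same property: many inputs map to each output, so the output alone can't be reversed.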
01:26:29.940 --> 01:26:34.590 That simple example already captures the reality 01:26:34.590 --> 01:26:38.820 that if you only have the hash value 1 or 2, 01:26:38.820 --> 01:26:41.550 you have no idea what the original input is. 01:26:41.550 --> 01:26:44.490 And it doesn't matter how hard I try, I'm never going to figure it out 01:26:44.490 --> 01:26:48.930 because it could be apple or avocado or something else that starts with A. 01:26:48.930 --> 01:26:54.030 So in that sense, one-way hashing throws away information such 01:26:54.030 --> 01:26:55.800 that it's not recoverable. 01:26:55.800 --> 01:26:58.780 But encryption does the opposite. 01:26:58.780 --> 01:27:02.430 It would be pretty useless if encryption threw away information 01:27:02.430 --> 01:27:06.360 because the whole point of encryption is to secure messages and information 01:27:06.360 --> 01:27:07.300 we want to send. 01:27:07.300 --> 01:27:13.530 So encryption is reversible; hashing, in general, is not. 01:27:13.530 --> 01:27:17.880 And, as you know, the key, no pun intended, to encryption 01:27:17.880 --> 01:27:21.900 is necessary so that you can reverse the process in a way that 01:27:21.900 --> 01:27:24.147 remains secret to other people. 01:27:24.147 --> 01:27:26.730 How about one more question, and then we'll take a short break 01:27:26.730 --> 01:27:28.890 and then we'll come back and wrap up. 01:27:28.890 --> 01:27:32.520 STUDENT: Is there any possibility to spoof the signatures? 01:27:32.520 --> 01:27:34.290 DAVID J. MALAN: Short answer, no. 01:27:34.290 --> 01:27:38.280 So long as you are using a standard that we believe 01:27:38.280 --> 01:27:43.080 to be correct and not compromised, so long as your private key has not 01:27:43.080 --> 01:27:47.310 been stolen by someone or no one's taken it off of your phone or your computer, 01:27:47.310 --> 01:27:50.280 they should not-- it should not be possible to forge it.
01:27:50.280 --> 01:27:55.500 The probability is so, so, so low, it should be the least of your concerns; 01:27:55.500 --> 01:27:57.080 that's the idea. 01:27:57.080 --> 01:28:00.640 Now it turns out, there is yet one other application 01:28:00.640 --> 01:28:05.720 of this world of public key cryptography that solves a problem from last time. 01:28:05.720 --> 01:28:09.790 Recall that we ended our first class on a note of emphasizing 01:28:09.790 --> 01:28:14.870 that passwords and password managers can improve our security if used properly, 01:28:14.870 --> 01:28:18.310 but there's another technology that's becoming increasingly available. 01:28:18.310 --> 01:28:21.070 And it's colloquially called passkeys. 01:28:21.070 --> 01:28:23.230 Or more technically, it's an implementation 01:28:23.230 --> 01:28:25.540 of a standard called Web Authentication. 01:28:25.540 --> 01:28:27.970 And it turns out that these passkeys, which 01:28:27.970 --> 01:28:31.480 are available on certain platforms and certain websites and will soon be 01:28:31.480 --> 01:28:34.870 available on ever more of them, they, too, 01:28:34.870 --> 01:28:38.530 rely on public and private keys as follows. 01:28:38.530 --> 01:28:41.620 And thankfully now, as fancy as the mathematics 01:28:41.620 --> 01:28:44.650 we're alluding to today sounds, there really are only two ways 01:28:44.650 --> 01:28:46.570 to use these public and private keys-- 01:28:46.570 --> 01:28:50.930 to either encrypt with one and decrypt with the other or vice versa. 01:28:50.930 --> 01:28:53.170 So we have just a fairly basic building block 01:28:53.170 --> 01:28:55.670 that we can use in one direction or another. 01:28:55.670 --> 01:28:57.650 So how do passkeys work?
01:28:57.650 --> 01:29:00.170 In the near future, as you will find, when 01:29:00.170 --> 01:29:02.420 you go to certain websites or applications, 01:29:02.420 --> 01:29:07.130 you probably will not be prompted as frequently to type in a username 01:29:07.130 --> 01:29:09.770 and pick a password, which is to say, you 01:29:09.770 --> 01:29:11.870 don't have to generate a hard-to-guess password, 01:29:11.870 --> 01:29:14.037 you don't have to memorize a hard-to-guess password. 01:29:14.037 --> 01:29:17.330 You don't even have to store a hard-to-guess password in a password 01:29:17.330 --> 01:29:21.650 manager because passkeys eliminate passwords. 01:29:21.650 --> 01:29:26.270 They move us more toward a world of passwordless accounts. 01:29:26.270 --> 01:29:27.890 Now how can that be? 01:29:27.890 --> 01:29:30.650 Because up until now, we've been using usernames and passwords 01:29:30.650 --> 01:29:32.360 to authenticate ourselves. 01:29:32.360 --> 01:29:35.600 Well, it turns out, we humans have been getting really good at this math, 01:29:35.600 --> 01:29:37.580 even if it doesn't feel like it today; we've 01:29:37.580 --> 01:29:39.620 been getting really good at using mathematics 01:29:39.620 --> 01:29:41.460 to solve these problems as well. 01:29:41.460 --> 01:29:43.220 So imagine the following scenario. 01:29:43.220 --> 01:29:46.550 When you go to a website or app in the future, 01:29:46.550 --> 01:29:49.430 rather than being prompted to create a username and password, 01:29:49.430 --> 01:29:52.220 you'll just be prompted to create a passkey. 01:29:52.220 --> 01:29:56.210 What that means is your laptop or desktop or phone will probably 01:29:56.210 --> 01:29:58.520 prompt you with some form of factor.
01:29:58.520 --> 01:30:02.540 They'll ask you for your fingerprint or they'll ask you for a scan of your face 01:30:02.540 --> 01:30:06.140 or maybe a PIN code, a short number that you type in just 01:30:06.140 --> 01:30:08.210 to demonstrate with high probability that you 01:30:08.210 --> 01:30:11.240 are authorized to be using this device and creating this account. 01:30:11.240 --> 01:30:14.840 What then will your device and the website do? 01:30:14.840 --> 01:30:19.280 Your device will generate a public key and a private key 01:30:19.280 --> 01:30:22.670 just for that one website or app. 01:30:22.670 --> 01:30:29.870 Your device will send the public key to that new website, along with your user 01:30:29.870 --> 01:30:32.690 ID or username, some identifying information 01:30:32.690 --> 01:30:35.180 so that they know you're David or someone else. 01:30:35.180 --> 01:30:37.130 But you don't send a password. 01:30:37.130 --> 01:30:41.480 You only send to the website or app your public key. 01:30:41.480 --> 01:30:43.850 And you keep private, within your browser 01:30:43.850 --> 01:30:47.660 or some other piece of software, your corresponding private key. 01:30:47.660 --> 01:30:52.280 And to be clear, this public-private key pair is used only for this one website. 01:30:52.280 --> 01:30:54.770 You'll do this repeatedly, but automatically 01:30:54.770 --> 01:30:57.810 for every other website in the world in this model. 01:30:57.810 --> 01:31:01.640 So what happens when you don't want to register for that website, which 01:31:01.640 --> 01:31:04.700 you've just done, but you want to log into it tomorrow, 01:31:04.700 --> 01:31:06.470 next week, or next year?
01:31:06.470 --> 01:31:08.840 Well, assuming you still have that same device 01:31:08.840 --> 01:31:11.210 or you're using some kind of cloud service 01:31:11.210 --> 01:31:15.380 that synchronizes all of your passkeys, your public and private keys, 01:31:15.380 --> 01:31:16.590 across devices-- 01:31:16.590 --> 01:31:19.220 so you haven't lost these passkeys-- here's 01:31:19.220 --> 01:31:22.970 how you would log in to the website tomorrow, next week, or next year. 01:31:22.970 --> 01:31:26.750 When you visit, the website would send you a challenge, 01:31:26.750 --> 01:31:28.820 and a challenge is like some little message. 01:31:28.820 --> 01:31:31.280 It's like a number or a word or a phrase. 01:31:31.280 --> 01:31:33.890 It's some piece of randomly generated data 01:31:33.890 --> 01:31:37.070 that the website wants you to digitally sign. 01:31:37.070 --> 01:31:39.200 Well, how do you digitally sign information? 01:31:39.200 --> 01:31:42.290 I proposed earlier that you can use your private key 01:31:42.290 --> 01:31:47.060 and pass that key and that challenge, which is just a random input given 01:31:47.060 --> 01:31:51.500 to you by the website, into your digital signature algorithm, this black box. 01:31:51.500 --> 01:31:54.380 And the output of that, as before, is your signature. 01:31:54.380 --> 01:31:55.890 And what does your device do? 01:31:55.890 --> 01:32:00.720 It sends that signature for that challenge to the website. 01:32:00.720 --> 01:32:03.187 And if you followed along earlier well enough, 01:32:03.187 --> 01:32:05.270 you might now realize where we're going with this. 01:32:05.270 --> 01:32:11.000 How does the website now verify that that is, in fact, your signature? 01:32:11.000 --> 01:32:15.950 That this did come from David's device and not some adversary online?
01:32:15.950 --> 01:32:19.380 The website, because it stored your public key yesterday, 01:32:19.380 --> 01:32:22.670 last week, or last year, 01:32:22.670 --> 01:32:26.330 will use that public key to decrypt your signature 01:32:26.330 --> 01:32:31.650 using the same algorithm to get back hopefully the same challenge value. 01:32:31.650 --> 01:32:35.240 And if the output of this verification process 01:32:35.240 --> 01:32:39.650 matches the challenge the website sent you a second before, 01:32:39.650 --> 01:32:42.140 it must be the case mathematically that you 01:32:42.140 --> 01:32:44.870 are, in fact, who you claim to be because it 01:32:44.870 --> 01:32:48.840 was your device that registered for this website a day, a week, 01:32:48.840 --> 01:32:50.520 a year ago as well. 01:32:50.520 --> 01:32:53.600 So again, if we trust in the mathematics here 01:32:53.600 --> 01:32:58.150 and we trust that these algorithms allow us to encrypt information and decrypt 01:32:58.150 --> 01:33:01.570 it using a public key and private key, or conversely, 01:33:01.570 --> 01:33:06.640 a private key and public key, we can, with very, very high confidence, 01:33:06.640 --> 01:33:09.820 probabilistically say, yes, this is David Malan, 01:33:09.820 --> 01:33:12.380 I'm going to allow him back into this account. 01:33:12.380 --> 01:33:15.460 So what's the implication of this passwordless world that 01:33:15.460 --> 01:33:18.970 uses passkeys, or Web Authentication more technically? 01:33:18.970 --> 01:33:21.700 It means that we're getting out of the business, potentially, 01:33:21.700 --> 01:33:24.340 as a society of having to remember dozens 01:33:24.340 --> 01:33:28.360 or hundreds or thousands of different passwords for all of our accounts.
01:33:28.360 --> 01:33:33.640 It does require, though, that we don't lose the device or the devices that 01:33:33.640 --> 01:33:37.480 registered for these websites or apps, but again, increasingly, 01:33:37.480 --> 01:33:41.890 the world's cloud providers, whether it's Apple or Microsoft 01:33:41.890 --> 01:33:45.040 or Google or others, can presumably synchronize 01:33:45.040 --> 01:33:47.980 your passkeys across devices, and we'll conclude ultimately today 01:33:47.980 --> 01:33:51.910 by talking about how they can be synchronized securely, even 01:33:51.910 --> 01:33:56.570 without Google and Microsoft and Apple knowing what your own passkeys are, so 01:33:56.570 --> 01:33:59.720 long as they provide us with a certain technical guarantee. 01:33:59.720 --> 01:34:03.650 So the upside of this is we can move away from passwords, 01:34:03.650 --> 01:34:08.720 and you can even share these passkeys with other people if you so choose. 01:34:08.720 --> 01:34:12.590 The catch is, right now, they're not omnipresently 01:34:12.590 --> 01:34:14.660 available on every website out there. 01:34:14.660 --> 01:34:17.690 It's probably going to take some time for the world to come on board, 01:34:17.690 --> 01:34:20.450 but I do dare say, in the coming weeks, months, and years, 01:34:20.450 --> 01:34:23.550 you will see passkeys increasingly offered to you. 01:34:23.550 --> 01:34:25.550 And so indeed, the next time you visit a website 01:34:25.550 --> 01:34:28.490 that asks you, hey, do you want to register with your fingerprint 01:34:28.490 --> 01:34:30.800 or with your face or with a PIN code, 01:34:30.800 --> 01:34:33.710 and you're never even asked for a password, odds are 01:34:33.710 --> 01:34:37.650 it's using this passkey technology instead.
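The registration and login flow described above can be sketched as a challenge-response exchange. This is a toy under loud assumptions: it uses textbook RSA with tiny random primes so that verifying really is "decrypt the signature with the public key and check for the challenge," matching the model in this lecture, and the `Website` class and its methods are hypothetical. Real passkeys follow the WebAuthn standard, which signs more than just the challenge and uses vetted signature schemes (commonly ECDSA) with full-size keys, never raw textbook RSA.

```python
import math
import secrets

# TOY ONLY: textbook RSA with tiny primes, so "verify" literally means
# "decrypt the signature with the public key and compare to the challenge".


def _is_prime(k: int) -> bool:
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def _random_prime(lo: int = 1_000, hi: int = 5_000) -> int:
    while True:
        k = lo + secrets.randbelow(hi - lo)
        if _is_prime(k):
            return k


def make_keypair(e: int = 17):
    """Generate a fresh (public, private) key pair, e.g. one per website."""
    while True:
        p, q = _random_prime(), _random_prime()
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            n = p * q
            return (n, e), (n, pow(e, -1, phi))  # pow(x, -1, m) needs Python 3.8+


def sign(private_key, challenge: int) -> int:
    n, d = private_key
    return pow(challenge, d, n)


def verify(public_key, challenge: int, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == challenge


class Website:
    """Hypothetical site: stores usernames and public keys, never passwords."""

    def __init__(self):
        self.public_keys = {}
        self.pending = {}

    def register(self, username: str, public_key) -> None:
        self.public_keys[username] = public_key

    def start_login(self, username: str) -> int:
        n, _ = self.public_keys[username]
        challenge = 2 + secrets.randbelow(n - 2)  # random value in [2, n)
        self.pending[username] = challenge
        return challenge

    def finish_login(self, username: str, signature: int) -> bool:
        challenge = self.pending.pop(username)  # each challenge is used only once
        return verify(self.public_keys[username], challenge, signature)


# Registration: the device makes a key pair just for this site and sends only the public key.
public_key, private_key = make_keypair()
site = Website()
site.register("david", public_key)

# Later login: the site sends a challenge; the device signs it with its private key.
challenge = site.start_login("david")
print(site.finish_login("david", sign(private_key, challenge)))  # True
```

Because each challenge is fresh and used once, replaying an old signature does not help an adversary, and an altered signature fails verification. With keys this small, though, anyone could factor `n` and recover the private key, which is why real systems use full-size keys.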
01:34:37.650 --> 01:34:40.350 Well, let's go ahead and take one more five-minute break here, 01:34:40.350 --> 01:34:43.070 and when we come back, we'll talk about securing data 01:34:43.070 --> 01:34:47.720 as it's moving back and forth and sitting on our own systems. 01:34:47.720 --> 01:34:49.580 All right, so we are back. 01:34:49.580 --> 01:34:53.000 And allow me to claim that we now have a bunch of ways 01:34:53.000 --> 01:34:57.360 to hash data and also encrypt data and now also decrypt data. 01:34:57.360 --> 01:34:59.570 So how can we use these building blocks to solve 01:34:59.570 --> 01:35:01.700 some other perhaps familiar problems? 01:35:01.700 --> 01:35:04.590 Well, there's this notion of encryption in transit, 01:35:04.590 --> 01:35:08.120 which is a fancy way of saying that you and I probably prefer nowadays 01:35:08.120 --> 01:35:11.630 that our data be encrypted whenever it's traveling from point A 01:35:11.630 --> 01:35:15.710 to point B. Whether that point B is Amazon.com, Gmail.com, 01:35:15.710 --> 01:35:18.740 WhatsApp, or any other service that we're communicating with, 01:35:18.740 --> 01:35:23.060 we ideally want no one in between us-- some machine in the middle, so 01:35:23.060 --> 01:35:26.030 to speak-- to be able to get at that same data. 01:35:26.030 --> 01:35:28.610 Because in particular, what you should be worried about 01:35:28.610 --> 01:35:32.780 is a scenario like this: if Alice is trying to communicate with Bob, 01:35:32.780 --> 01:35:35.750 you might worry that there's some eavesdropper, so to speak, 01:35:35.750 --> 01:35:38.270 named Eve between Alice and Bob. 01:35:38.270 --> 01:35:40.970 And maybe this is via wires nowadays on the internet. 01:35:40.970 --> 01:35:42.860 Maybe it's somehow wirelessly. 01:35:42.860 --> 01:35:46.940 Maybe Eve actually represents a company that Alice and Bob 01:35:46.940 --> 01:35:51.150 are communicating through, like Gmail or Outlook or the like.
01:35:51.150 --> 01:35:54.980 So encryption in transit, though, is important to distinguish 01:35:54.980 --> 01:35:56.780 from other forms of encryption. 01:35:56.780 --> 01:35:59.660 In particular here, Alice might very well 01:35:59.660 --> 01:36:03.380 have an encrypted connection not to an eavesdropper, per se, but just 01:36:03.380 --> 01:36:05.000 a third party like Gmail. 01:36:05.000 --> 01:36:07.400 So assume that Eve here is Gmail. 01:36:07.400 --> 01:36:10.640 And meanwhile, Bob, when checking his email account, 01:36:10.640 --> 01:36:14.570 has an encrypted connection to Eve as well, which, in this story now, 01:36:14.570 --> 01:36:15.470 is Gmail. 01:36:15.470 --> 01:36:18.830 So Alice has a secure connection to Gmail and Bob 01:36:18.830 --> 01:36:21.330 has a secure connection to Gmail as well, 01:36:21.330 --> 01:36:26.720 but that does not necessarily mean that Alice has a secure connection to Bob. 01:36:26.720 --> 01:36:31.250 Security does not really work through transitivity, so to speak. 01:36:31.250 --> 01:36:34.490 This might very well mean that the data is only 01:36:34.490 --> 01:36:39.860 encrypted while in transit from A to E and from B to E, 01:36:39.860 --> 01:36:43.310 but that doesn't mean that Eve, or Gmail in this story, 01:36:43.310 --> 01:36:46.040 can't be reading all of Alice's and Bob's emails. 01:36:46.040 --> 01:36:49.850 And indeed, that is technically possible on Google's end. 01:36:49.850 --> 01:36:53.900 They, of course, run all of the servers that your Gmail accounts might be on. 01:36:53.900 --> 01:36:57.110 There's probably nothing technically stopping them 01:36:57.110 --> 01:36:59.030 from reading anything and everything. 01:36:59.030 --> 01:37:00.650 Now hopefully they have policies. 01:37:00.650 --> 01:37:05.330 Hopefully very few humans actually have the privileges or the authorization 01:37:05.330 --> 01:37:07.310 to even do anything close to that.
01:37:07.310 --> 01:37:11.390 But technically speaking, just because Alice has a secure connection to Gmail 01:37:11.390 --> 01:37:13.730 and Bob has a secure connection to Gmail, 01:37:13.730 --> 01:37:16.760 that doesn't mean that their communications will 01:37:16.760 --> 01:37:22.620 be encrypted entirely between A and B. And there are lots of examples of this 01:37:22.620 --> 01:37:23.120 as well. 01:37:23.120 --> 01:37:25.610 Take Zoom, for instance: when it comes to video conferencing, 01:37:25.610 --> 01:37:27.890 you might have an encrypted connection to Zoom, 01:37:27.890 --> 01:37:29.930 I might have an encrypted connection to Zoom. 01:37:29.930 --> 01:37:34.610 That does not necessarily mean that Zoom couldn't be Eve in this story, 01:37:34.610 --> 01:37:39.350 listening and watching everything that we're saying while video conferencing 01:37:39.350 --> 01:37:40.050 as well. 01:37:40.050 --> 01:37:44.600 So encryption in transit is good in that it at least keeps random people out 01:37:44.600 --> 01:37:48.380 of the picture because they don't have access to these encrypted channels, 01:37:48.380 --> 01:37:51.560 but if there is this third party, this machine in the middle 01:37:51.560 --> 01:37:55.700 or company in the middle, that party might still have access to data that we 01:37:55.700 --> 01:37:58.020 do not want them to have access to. 01:37:58.020 --> 01:38:00.770 So what, then, is a stronger alternative? 01:38:00.770 --> 01:38:06.590 Increasingly possible, increasingly available, and something you as a user 01:38:06.590 --> 01:38:09.590 should be looking for with greater frequency is what 01:38:09.590 --> 01:38:11.940 we would call end-to-end encryption.
01:38:11.940 --> 01:38:14.780 This is a stronger guarantee whereby you can 01:38:14.780 --> 01:38:19.430 trust that Alice's connection to Bob is, in fact, secure 01:38:19.430 --> 01:38:25.460 even if-- though it's not pictured here-- there are 1, 2, 3, 4 machines in the middle, 01:38:25.460 --> 01:38:28.520 companies in the middle, eavesdroppers in the middle. 01:38:28.520 --> 01:38:32.900 If you use encryption properly end-to-end, 01:38:32.900 --> 01:38:38.750 you can ensure that the only thing Eve or Google or Zoom can see 01:38:38.750 --> 01:38:43.250 is just your ciphertext, the seemingly random strings of text 01:38:43.250 --> 01:38:47.810 or 0's and 1's that represent your encrypted data, but without your key, 01:38:47.810 --> 01:38:51.240 they have no idea what that data actually is. 01:38:51.240 --> 01:38:54.170 So end-to-end encryption isn't necessarily in most 01:38:54.170 --> 01:38:55.320 companies' best interest. 01:38:55.320 --> 01:38:55.820 Why? 01:38:55.820 --> 01:38:59.440 Well, companies like Google presumably tend to mine our data, 01:38:59.440 --> 01:39:01.960 whether it's for advertising purposes or otherwise. 01:39:01.960 --> 01:39:05.610 And so it's sometimes in companies' interest 01:39:05.610 --> 01:39:08.880 to keep your data secure on their servers, but still 01:39:08.880 --> 01:39:10.990 in a way that they have access to it. 01:39:10.990 --> 01:39:13.750 Now that might not be comfortable for you. 01:39:13.750 --> 01:39:15.330 And so there are alternatives.
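To contrast encryption in transit with end-to-end encryption, here is a toy Python simulation in which a relay server stands in for Eve (or Gmail, or Zoom). Each function appends to `server_log` whatever the server can see. The one-time-pad style `xor` helper and the function names are illustrative assumptions, and the sketch simply assumes each pair of parties already shares a pad; how keys are actually agreed upon is out of scope here.

```python
import secrets


def xor(data: bytes, pad: bytes) -> bytes:
    """One-time-pad style XOR: the same call both encrypts and decrypts."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(b ^ k for b, k in zip(data, pad))


def send_with_transit_encryption(message: bytes, server_log: list) -> bytes:
    """Alice -> server and server -> Bob are each encrypted, but separately."""
    alice_server_pad = secrets.token_bytes(len(message))  # shared by Alice and server
    server_bob_pad = secrets.token_bytes(len(message))    # shared by server and Bob
    hop1 = xor(message, alice_server_pad)                 # encrypted on the wire
    at_server = xor(hop1, alice_server_pad)               # server decrypts hop 1...
    server_log.append(at_server)                          # ...so it sees the plaintext
    hop2 = xor(at_server, server_bob_pad)                 # re-encrypted for Bob
    return xor(hop2, server_bob_pad)                      # Bob decrypts


def send_end_to_end(message: bytes, server_log: list) -> bytes:
    """Only Alice and Bob share the pad; the server just forwards ciphertext."""
    alice_bob_pad = secrets.token_bytes(len(message))
    ciphertext = xor(message, alice_bob_pad)
    server_log.append(ciphertext)                         # all the server ever sees
    return xor(ciphertext, alice_bob_pad)                 # Bob decrypts


server_log = []
message = b"Meet me at noon"
print(send_with_transit_encryption(message, server_log))  # b'Meet me at noon'
print(server_log[-1] == message)                          # True: the server read it
print(send_end_to_end(message, server_log))               # b'Meet me at noon'
print(server_log[-1])                                     # random-looking ciphertext
```

In both cases Bob receives the message intact; the difference is only in what lands in `server_log`, which is exactly the distinction between the two kinds of encryption.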
01:39:15.330 --> 01:39:18.720 For instance, iMessage for Apple users and WhatsApp 01:39:18.720 --> 01:39:23.370 internationally are known in particular for offering end-to-end encryption, 01:39:23.370 --> 01:39:27.300 which, if implemented truthfully and technically correctly, 01:39:27.300 --> 01:39:29.880 should guarantee that even though your messages might 01:39:29.880 --> 01:39:33.720 be going through WhatsApp servers, no employee at WhatsApp 01:39:33.720 --> 01:39:36.300 can actually see your messages because they're encrypted 01:39:36.300 --> 01:39:39.660 all the way from A to B, even though they're 01:39:39.660 --> 01:39:42.060 going through a potential eavesdropper. 01:39:42.060 --> 01:39:45.570 But that depends on exactly what form of encryption you're using, 01:39:45.570 --> 01:39:47.880 and if it's not end-to-end, it might only 01:39:47.880 --> 01:39:51.690 be encrypted in transit such that Eve, that eavesdropper, 01:39:51.690 --> 01:39:54.040 might indeed have access to the data. 01:39:54.040 --> 01:39:56.850 So as to how you can use end-to-end encryption, 01:39:56.850 --> 01:40:00.570 it's an option that a service must provide to you, so 01:40:00.570 --> 01:40:02.760 you must choose services that offer it. 01:40:02.760 --> 01:40:05.370 It's not necessarily something that's always available, 01:40:05.370 --> 01:40:09.720 but it is increasingly available in different software. 01:40:09.720 --> 01:40:12.960 So let's now consider a fairly mundane operation, 01:40:12.960 --> 01:40:16.440 but one that has implications for these same technologies and solutions. 01:40:16.440 --> 01:40:19.500 That is, deleting a file, be it on your Mac or your PC 01:40:19.500 --> 01:40:22.090 or your phone or some other device. 01:40:22.090 --> 01:40:24.390 Now where is data stored on your devices?
01:40:24.390 --> 01:40:26.500 Well generally, it might be in a device like this, 01:40:26.500 --> 01:40:29.220 a large, somewhat older hard drive that 01:40:29.220 --> 01:40:31.920 can store lots and lots of files and folders, 01:40:31.920 --> 01:40:34.620 or perhaps something smaller known as a solid state 01:40:34.620 --> 01:40:37.860 drive that might store information entirely digitally 01:40:37.860 --> 01:40:39.495 without any moving parts. 01:40:39.495 --> 01:40:41.370 And even smaller might be something like this 01:40:41.370 --> 01:40:44.820 that you carry around, like a USB stick, and these are even smaller nowadays, 01:40:44.820 --> 01:40:48.000 too, and similarly store some data digitally. 01:40:48.000 --> 01:40:51.540 Now how do we go about deleting files from a computer or any 01:40:51.540 --> 01:40:52.500 of these devices? 01:40:52.500 --> 01:40:56.370 Well, you typically click it and drag it somewhere, or maybe you right-click it 01:40:56.370 --> 01:40:59.370 or maybe you tap and drag it to some trash or the like. 01:40:59.370 --> 01:41:02.700 There are any number of user interface mechanisms for deleting files, 01:41:02.700 --> 01:41:06.240 but let's consider for our purposes what happens underneath the hood. 01:41:06.240 --> 01:41:09.780 So let me stipulate that your hard drive, your solid state 01:41:09.780 --> 01:41:12.540 drive, your USB stick ultimately just contains 01:41:12.540 --> 01:41:17.760 a whole bunch of 0's and 1's, and those 0's and 1's represent your files 01:41:17.760 --> 01:41:18.910 and folders. 01:41:18.910 --> 01:41:22.620 So when you go about deleting a file, by dragging it 01:41:22.620 --> 01:41:26.220 to the Recycle Bin on Windows, or dragging it to the Trash 01:41:26.220 --> 01:41:29.520 Can on macOS, what actually happens? 01:41:29.520 --> 01:41:33.240 Well, it turns out, not anything at all, really.
01:41:33.240 --> 01:41:37.590 When you recycle a file on Windows or when you trash a file on macOS, 01:41:37.590 --> 01:41:42.000 it doesn't actually get deleted in the sense that you and I might expect. 01:41:42.000 --> 01:41:43.920 By delete, I mean it's gone. 01:41:43.920 --> 01:41:45.780 I don't want to be able to find it anywhere. 01:41:45.780 --> 01:41:47.190 OK, wait a minute, though. 01:41:47.190 --> 01:41:49.920 Of course, we all know by now, at least on computers, 01:41:49.920 --> 01:41:53.380 you at least have to empty the Recycle Bin or empty the Trash Can. 01:41:53.380 --> 01:41:55.360 So OK, maybe I missed that step. 01:41:55.360 --> 01:41:58.140 But even then, contrary to what you might expect, 01:41:58.140 --> 01:42:02.910 emptying the Recycle Bin, emptying the Trash Can also does not generally 01:42:02.910 --> 01:42:04.260 delete the data. 01:42:04.260 --> 01:42:06.510 And here's where I'd, again, emphasize, wait a minute, 01:42:06.510 --> 01:42:10.890 when I delete a file, I want it gone, removed from my computer altogether. 01:42:10.890 --> 01:42:16.050 But what macOS and Windows and operating systems in general tend to do instead, 01:42:16.050 --> 01:42:19.080 even when you empty the Recycle Bin or Trash Can, 01:42:19.080 --> 01:42:23.880 is not actually get rid of the file, per se; they just forget where it is. 01:42:23.880 --> 01:42:26.100 Somewhere in the computer's storage, there's 01:42:26.100 --> 01:42:29.310 like a spreadsheet of sorts, some kind of database or table 01:42:29.310 --> 01:42:32.520 with at least two columns, one of which has the name of your file 01:42:32.520 --> 01:42:35.400 or the location of your file, the other of which 01:42:35.400 --> 01:42:40.770 has some kind of reference to which 0's and 1's on your actual computer 01:42:40.770 --> 01:42:43.200 implement that specific file.
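The two-column table just described can be modeled with a hypothetical `ToyDisk` class: a flat array of bytes plus a directory table mapping each file name to where its bytes live. This is a deliberately simplified sketch, not how any real file system lays out data, but it shows why an ordinary delete, which only forgets the table entry, leaves the bytes themselves behind.

```python
class ToyDisk:
    """Hypothetical, simplified drive: raw bytes plus a directory table that
    maps each file name to (start, length) of its bytes."""

    def __init__(self, size: int = 64):
        self.blocks = bytearray(size)  # the raw 0's and 1's, grouped into bytes
        self.directory = {}            # name -> (start, length)
        self.next_free = 0

    def write(self, name: str, data: bytes) -> None:
        start = self.next_free
        if start + len(data) > len(self.blocks):
            raise OSError("toy disk is full")
        self.blocks[start:start + len(data)] = data
        self.directory[name] = (start, len(data))
        self.next_free += len(data)

    def read(self, name: str) -> bytes:
        start, length = self.directory[name]
        return bytes(self.blocks[start:start + length])

    def delete(self, name: str) -> None:
        """An ordinary delete: forget where the file is, leave its bytes alone."""
        del self.directory[name]


disk = ToyDisk()
disk.write("secret.txt", b"my bank PIN is 1234")
disk.delete("secret.txt")
print("secret.txt" in disk.directory)  # False: the file looks gone
print(b"PIN is 1234" in disk.blocks)   # True: its bytes are still there
```

For simplicity this toy never hands freed space to new files; on a real system, later writes may gradually overwrite some of those leftover bytes, but until they do, the data can still be found by scanning the raw storage.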
01:42:43.200 --> 01:42:46.650 Maybe these 0's and 1's are for one file, these 0's and 1's are 01:42:46.650 --> 01:42:48.160 for another file, and so forth. 01:42:48.160 --> 01:42:50.970 So somewhere, your computer is keeping track of what 01:42:50.970 --> 01:42:53.410 is where physically on your computer. 01:42:53.410 --> 01:42:56.160 But when you delete a file by emptying the Trash or Recycle Bin, 01:42:56.160 --> 01:42:58.670 the computer just, eh, forgets where it is. 01:42:58.670 --> 01:43:02.970 And more importantly, it frees up the space so it can be used later. 01:43:02.970 --> 01:43:04.260 So what do I mean by that? 01:43:04.260 --> 01:43:07.410 Well, suppose I do go ahead and delete a file 01:43:07.410 --> 01:43:09.980 and empty the Recycle Bin or Trash Can, and suppose 01:43:09.980 --> 01:43:15.530 that these yellow 0's and 1's represent the file that I no longer care about. 01:43:15.530 --> 01:43:19.080 Well, what's actually going to happen underneath the hood, so to speak, 01:43:19.080 --> 01:43:19.940 of the computer? 01:43:19.940 --> 01:43:24.020 Well eventually, some of those yellow 0's and 1's might just 01:43:24.020 --> 01:43:26.030 get reused for other files. 01:43:26.030 --> 01:43:29.630 In other words, these 0's and 1's highlighted in yellow 01:43:29.630 --> 01:43:32.390 represent a file that used to be there, but is not anymore. 01:43:32.390 --> 01:43:36.320 That is equivalent to saying some other file can now use those same 01:43:36.320 --> 01:43:37.260 0's and 1's. 01:43:37.260 --> 01:43:41.510 And so here are some random 0's and 1's that maybe overwrite some of the file, 01:43:41.510 --> 01:43:42.680 but not all of it. 01:43:42.680 --> 01:43:45.590 Notice, there's still a bunch of yellow 0's and 1's here 01:43:45.590 --> 01:43:48.180 in my depiction of my computer. 01:43:48.180 --> 01:43:53.510 So it turns out that over time, yes, your file will probably 01:43:53.510 --> 01:43:55.100 actually get deleted.
01:43:55.100 --> 01:43:56.270 What do I mean by that? 01:43:56.270 --> 01:44:00.830 Eventually those 0's and 1's will be repurposed, changed from 1 to 0, 01:44:00.830 --> 01:44:05.190 changed from 0 to 1 such that your file, for all intents and purposes, 01:44:05.190 --> 01:44:09.530 is actually gone, because that space has been repurposed altogether. 01:44:09.530 --> 01:44:12.020 But notice, at least at this point in time, 01:44:12.020 --> 01:44:15.770 and shortly after you delete a file, even if you've created or downloaded 01:44:15.770 --> 01:44:18.740 new files, there might still be parts of your files 01:44:18.740 --> 01:44:23.900 around, which means that for that sensitive Word document or Excel file or images 01:44:23.900 --> 01:44:27.020 that you had on your computer, there might still be remnants of them, 01:44:27.020 --> 01:44:29.820 even just a few lines from any of those. 01:44:29.820 --> 01:44:33.170 So you should realize that deleting a file doesn't really get rid of it 01:44:33.170 --> 01:44:35.480 in the way you might expect or hope. 01:44:35.480 --> 01:44:39.320 To do that, you need slightly better practices. 01:44:39.320 --> 01:44:41.180 Now what do I mean by this? 01:44:41.180 --> 01:44:44.750 Secure deletion is another beast altogether. 01:44:44.750 --> 01:44:48.500 And typically when we delete files, they're not deleted securely. 01:44:48.500 --> 01:44:51.740 They're not deleted typically in a way that you would hope. 01:44:51.740 --> 01:44:55.790 So secure deletion does what you might really hope for: get rid of this file 01:44:55.790 --> 01:44:56.400 altogether. 01:44:56.400 --> 01:44:59.210 So if we go back to the original contents of my computer 01:44:59.210 --> 01:45:02.180 with all of these 0's and 1's here, and suppose 01:45:02.180 --> 01:45:05.300 that I want to delete this file here at the top of the screen, 01:45:05.300 --> 01:45:10.188 in an extreme ideal world, those 0's and 1's would just be gone.
01:45:10.188 --> 01:45:11.480 Like that's pretty darn secure. 01:45:11.480 --> 01:45:15.080 Those bits, those 0's and 1's, they don't even exist anymore. 01:45:15.080 --> 01:45:18.890 Now this is probably not the best way to securely delete information 01:45:18.890 --> 01:45:23.090 because if I just got rid of those 0's and 1's somehow, like my hard drive 01:45:23.090 --> 01:45:25.610 is getting like literally smaller and smaller 01:45:25.610 --> 01:45:29.270 in terms of how much stuff I can put on it if I don't have as many bits 01:45:29.270 --> 01:45:30.860 or 0's and 1's available. 01:45:30.860 --> 01:45:32.990 So that's probably not the best long-term solution 01:45:32.990 --> 01:45:34.040 because it's expensive. 01:45:34.040 --> 01:45:36.750 It's like getting rid of some of my capacity. 01:45:36.750 --> 01:45:41.630 So we don't actually do that, but how might we securely delete a file? 01:45:41.630 --> 01:45:46.040 I don't think we want to just wait and hope that those 0's and 1's eventually 01:45:46.040 --> 01:45:49.160 get reused by the system because we might still 01:45:49.160 --> 01:45:52.500 be left with some remnants which might not be ideal. 01:45:52.500 --> 01:45:56.450 So what we can do when securely deleting a file is something like this-- 01:45:56.450 --> 01:46:00.250 change all of the 0's and 1's that we don't care about anymore or want, 01:46:00.250 --> 01:46:01.790 change them all to 0's. 01:46:01.790 --> 01:46:05.900 And this will effectively securely delete the file 01:46:05.900 --> 01:46:09.200 because now the 1's that were previously there 01:46:09.200 --> 01:46:12.620 that represented some piece of information are just completely gone. 01:46:12.620 --> 01:46:15.230 Or equivalently, I could change them all to 1's. 01:46:15.230 --> 01:46:18.080 Or I could even change it to random 0's and 1's. 
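The overwrite idea can be sketched in Python as below. The helper names and the three patterns (all 0's, all 1's, random) are illustrative assumptions that mirror the options just described. The sketch assumes that rewriting a file in place lands on the same physical storage, which is not guaranteed on solid state drives (which spread writes around to limit wear) or on file systems that keep snapshots or copy-on-write versions, so treat it as an illustration of the principle rather than a guaranteed secure-deletion tool.

```python
import os


def overwrite_file(path: str, pattern: str = "zeros") -> None:
    """Overwrite every byte of an existing file in place."""
    size = os.path.getsize(path)
    if pattern == "zeros":
        data = bytes(size)               # all 0 bits
    elif pattern == "ones":
        data = b"\xff" * size            # all 1 bits
    elif pattern == "random":
        data = os.urandom(size)          # random 0's and 1's
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")
    with open(path, "r+b") as f:         # r+b: write over existing bytes, no truncation
        f.write(data)
        f.flush()
        os.fsync(f.fileno())             # ask the OS to push the bytes to the device


def secure_delete(path: str, pattern: str = "zeros") -> None:
    """Overwrite the file's contents first, then remove its directory entry."""
    overwrite_file(path, pattern)
    os.remove(path)


with open("toy_secret.txt", "wb") as f:
    f.write(b"my bank PIN is 1234")
secure_delete("toy_secret.txt", pattern="random")
print(os.path.exists("toy_secret.txt"))  # False
```

The order matters: overwriting first means that by the time the directory entry is forgotten, the bytes it pointed to no longer hold the original data.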
01:46:18.080 --> 01:46:20.990 The point is, to securely delete a file, you 01:46:20.990 --> 01:46:26.360 should change all of the 0's and 1's to at least some other pattern 01:46:26.360 --> 01:46:28.640 so that the file is effectively gone. 01:46:28.640 --> 01:46:31.820 Now how can you use this to your benefit? 01:46:31.820 --> 01:46:34.040 Well, some operating systems nowadays support 01:46:34.040 --> 01:46:38.640 what's called full-disk encryption, and this is good for a number of reasons. 01:46:38.640 --> 01:46:41.790 Full-disk encryption, if you enable that feature, 01:46:41.790 --> 01:46:46.430 is actually a specific incarnation of an idea known as encryption at rest. 01:46:46.430 --> 01:46:49.910 Encryption in transit refers, of course, to your data going back and forth 01:46:49.910 --> 01:46:52.280 from point A to point B. Encryption at rest 01:46:52.280 --> 01:46:56.240 means the data is just sitting there on your device, in your pocket, or on your lap 01:46:56.240 --> 01:47:00.900 or on your desktop, sitting unused, maybe on or off. 01:47:00.900 --> 01:47:04.700 So when it comes to full-disk encryption or encryption at rest, 01:47:04.700 --> 01:47:09.170 you ideally want all of your data somehow encrypted on your Mac, 01:47:09.170 --> 01:47:11.420 on your PC, on your phone. 01:47:11.420 --> 01:47:14.540 And only when you log in with your password or maybe 01:47:14.540 --> 01:47:19.740 your fingerprint or your face should that data be decrypted automatically-- 01:47:19.740 --> 01:47:23.240 and this can happen pretty darn fast nowadays with modern hardware-- 01:47:23.240 --> 01:47:25.850 so that you can actually 01:47:25.850 --> 01:47:28.620 use it and interact with that device. 01:47:28.620 --> 01:47:31.220 So why is this advantageous?
01:47:31.220 --> 01:47:34.490 Well, one, if your device gets stolen, so long 01:47:34.490 --> 01:47:37.520 as you're not logged into it, so long as it's locked, 01:47:37.520 --> 01:47:41.060 so long as the lid is closed, so long as it's unplugged or any other number 01:47:41.060 --> 01:47:45.920 of scenarios, at least if someone takes your laptop from the table in Starbucks 01:47:45.920 --> 01:47:48.980 or the cafe, well, hopefully, if you have 01:47:48.980 --> 01:47:51.635 a good password or good biometrics, they're 01:47:51.635 --> 01:47:53.510 not going to be able to get any of your data. 01:47:53.510 --> 01:47:56.190 They can maybe delete all of your data, and they can 01:47:56.190 --> 01:47:59.790 sell your computer, they can use your computer, but they probably, 01:47:59.790 --> 01:48:03.180 if you're practicing best practices, don't have access 01:48:03.180 --> 01:48:04.660 to the data that's on the system. 01:48:04.660 --> 01:48:05.160 Why? 01:48:05.160 --> 01:48:09.470 Because it's completely encrypted at rest, and if they don't know your password, 01:48:09.470 --> 01:48:11.970 they don't have your fingerprint, and they don't have your face, 01:48:11.970 --> 01:48:14.470 they should not be able to decrypt that data. 01:48:14.470 --> 01:48:17.790 So in other words, if this is my unencrypted data, 01:48:17.790 --> 01:48:20.910 the way I want it and need it when I'm using my computer, 01:48:20.910 --> 01:48:25.590 full-disk encryption, at rest, would change my entire computer 01:48:25.590 --> 01:48:26.610 to look random. 01:48:26.610 --> 01:48:30.630 These are random 0's and 1's now that I generated by using, 01:48:30.630 --> 01:48:34.020 for instance, my password or my fingerprint or my face. 01:48:34.020 --> 01:48:37.350 And this is what your hard drive or your solid-state drive 01:48:37.350 --> 01:48:41.910 should look like when the lid is closed, when the power is off.
01:48:41.910 --> 01:48:46.020 When you are logged out of it, it should be random 0's and 1's. 01:48:46.020 --> 01:48:49.080 And the upside of this now is that, again, 01:48:49.080 --> 01:48:53.910 if it's stolen while in this state, there's no data to be used 01:48:53.910 --> 01:48:56.890 by the adversary because it looks like random 0's and 1's. 01:48:56.890 --> 01:49:00.010 Better yet, if you deliberately want to get rid of the device 01:49:00.010 --> 01:49:02.710 because you want to trade it in for resale value, 01:49:02.710 --> 01:49:04.720 because you want to donate it to someone else, 01:49:04.720 --> 01:49:06.880 because you want to sell it to someone online, 01:49:06.880 --> 01:49:09.820 when using full-disk encryption, the upside 01:49:09.820 --> 01:49:14.390 is that so long as you had a really hard-to-guess password, your data is, 01:49:14.390 --> 01:49:17.800 for all intents and purposes, securely deleted already. 01:49:17.800 --> 01:49:21.040 Because unless the new buyer figures out or knows 01:49:21.040 --> 01:49:24.190 your password or has your same fingerprint or has your same face, 01:49:24.190 --> 01:49:26.870 they're not going to be able to access any of your data anyway. 01:49:26.870 --> 01:49:31.300 And this is important nowadays because, with modern hardware, 01:49:31.300 --> 01:49:36.970 even if you want to change all of the 0's and 1's to all 0's or all 1's 01:49:36.970 --> 01:49:42.080 or all random data, it turns out that today's hardware can fail over time. 01:49:42.080 --> 01:49:47.860 So even little USB sticks or solid-state drives over time can kind of wear out.
01:49:47.860 --> 01:49:49.930 But they're smart enough, thanks to software 01:49:49.930 --> 01:49:53.920 known as firmware inside of them, that as soon as the device realizes, wait a minute, 01:49:53.920 --> 01:49:56.770 those bits over there aren't working properly anymore, 01:49:56.770 --> 01:50:02.200 the device might not let you change them to all 0's or all 1's or random 0's 01:50:02.200 --> 01:50:03.260 and 1's anymore. 01:50:03.260 --> 01:50:06.350 It might just leave them as is forever. 01:50:06.350 --> 01:50:09.070 Which is to say, it's even more important to start 01:50:09.070 --> 01:50:11.740 using full-disk encryption, encryption at rest, 01:50:11.740 --> 01:50:14.620 when you first get a device because that way, 01:50:14.620 --> 01:50:18.040 you can trust that even if parts of the device degrade over time, 01:50:18.040 --> 01:50:20.560 all of the data that's there and has been there 01:50:20.560 --> 01:50:25.900 was at least encrypted with one of your passwords or one of your biometrics 01:50:25.900 --> 01:50:26.800 in the past. 01:50:26.800 --> 01:50:30.700 So this is the kind of feature to look for in your Mac, your PC, or your phone 01:50:30.700 --> 01:50:33.580 to ensure that it is somehow enabled. 01:50:33.580 --> 01:50:36.160 Thankfully, once you log back in with your password, 01:50:36.160 --> 01:50:38.860 it goes back to the original data and you can use it. 01:50:38.860 --> 01:50:42.190 Of course, then, an implication of this best practice 01:50:42.190 --> 01:50:45.250 is that if you lose your laptop or your phone 01:50:45.250 --> 01:50:48.820 or your desktop's password, or your fingerprint somehow changes, 01:50:48.820 --> 01:50:51.400 or your face sufficiently changes, you might be locked out 01:50:51.400 --> 01:50:54.310 of all of your data, too, but again, that's 01:50:54.310 --> 01:50:59.480 just another example of this trade-off between usability and security.
01:50:59.480 --> 01:51:02.320 Now a downside, an evil side, to full-disk encryption 01:51:02.320 --> 01:51:06.200 is ransomware, which is how adversaries are monetizing attacks. 01:51:06.200 --> 01:51:09.460 It's not uncommon nowadays for hackers, for adversaries, 01:51:09.460 --> 01:51:12.160 when they get into a system, whether it's your laptop 01:51:12.160 --> 01:51:16.330 or, for instance, a corporate network, or in some cases, hospital 01:51:16.330 --> 01:51:21.220 systems or a city's own computer networks, not just to do damage 01:51:21.220 --> 01:51:24.280 or something like spam or cryptocurrency mining, 01:51:24.280 --> 01:51:30.850 but to actually encrypt all of the data on these systems they somehow 01:51:30.850 --> 01:51:32.500 accessed online. 01:51:32.500 --> 01:51:33.220 Why? 01:51:33.220 --> 01:51:36.610 Well, if they encrypt all of the data, they can then ask for a ransom 01:51:36.610 --> 01:51:39.670 and say, listen, if you don't give me this many bitcoins, 01:51:39.670 --> 01:51:44.480 I'm not going to give you the key that I used to encrypt your data. 01:51:44.480 --> 01:51:47.590 And if you poke around online, there have been many examples of this, 01:51:47.590 --> 01:51:51.190 unfortunately, where hackers have gotten into systems that were not 01:51:51.190 --> 01:51:55.460 very well-protected, and all of the data therein was encrypted, 01:51:55.460 --> 01:51:58.450 and this is an opportunity for the adversaries 01:51:58.450 --> 01:52:02.290 to try to extort, say, financial gain from a situation 01:52:02.290 --> 01:52:07.360 by then only handing you the keys, if ever, once you've actually paid up.
01:52:07.360 --> 01:52:10.000 And there, too, there's the risk, as in any ransom scenario, 01:52:10.000 --> 01:52:14.240 that you never know if they're going to give you the proper key in the end, 01:52:14.240 --> 01:52:17.800 but this is increasingly a concern for municipalities, for companies, 01:52:17.800 --> 01:52:19.340 for universities, and the like. 01:52:19.340 --> 01:52:22.090 So just as we have some upsides here, there, 01:52:22.090 --> 01:52:24.740 too, is this trade-off in what you can do. 01:52:24.740 --> 01:52:27.820 And lastly, we thought we'd end on a note about the future 01:52:27.820 --> 01:52:29.560 because this is a topic that will come up 01:52:29.560 --> 01:52:33.290 and has come up over time, this topic of quantum computing. 01:52:33.290 --> 01:52:35.800 So for those less familiar, we've been talking a lot 01:52:35.800 --> 01:52:38.290 about bits, 0's and 1's, today, and at the end 01:52:38.290 --> 01:52:41.140 of the day, that's how today's computer systems are implemented. 01:52:41.140 --> 01:52:44.950 Patterns of 0's and 1's represent numbers and letters and colors 01:52:44.950 --> 01:52:47.320 and videos and sounds and everything. 01:52:47.320 --> 01:52:50.260 Today we've been discussing data more generally. 01:52:50.260 --> 01:52:56.950 Now typically, in our world today, a bit, a binary digit, can either be a 0 01:52:56.950 --> 01:53:02.020 or it can be a 1, as per the diagram we had on the screen in these examples. 01:53:02.020 --> 01:53:04.090 Either a 0 or a 1. 01:53:04.090 --> 01:53:08.950 In the world of quantum computing, thanks to some very fancy physics 01:53:08.950 --> 01:53:12.670 and quantum mechanics in particular, it is possible, 01:53:12.670 --> 01:53:17.380 it seems, physically, for us to implement the idea of bits a little bit 01:53:17.380 --> 01:53:20.030 differently using quantum techniques.
01:53:20.030 --> 01:53:26.080 And there's this idea of not just a bit, but a quantum bit, or qubit, whose power 01:53:26.080 --> 01:53:28.900 derives from the reality that physically, you 01:53:28.900 --> 01:53:33.550 can implement a qubit in such a way that it is representing both a 0 01:53:33.550 --> 01:53:37.130 and a 1 at the exact same time. 01:53:37.130 --> 01:53:39.970 So it can be not just in one state, so to speak, 01:53:39.970 --> 01:53:44.000 one condition at once, but two states at once. 01:53:44.000 --> 01:53:47.740 And if you have two qubits, they can be in four states at once. 01:53:47.740 --> 01:53:50.530 If you have three, they can be in eight states at once. 01:53:50.530 --> 01:53:55.270 If you have 32 of them, they can be in roughly 4 billion states at once. 01:53:55.270 --> 01:53:57.270 Now what's the implication of this? 01:53:57.270 --> 01:53:59.240 Well, when we talk about cryptography, when 01:53:59.240 --> 01:54:02.870 we talk about hashing, when we talk about just very large numbers 01:54:02.870 --> 01:54:05.900 and trying to figure out via brute force or some other mechanism 01:54:05.900 --> 01:54:12.530 what some input to a function was, if you have exponentially more computing 01:54:12.530 --> 01:54:15.560 capabilities by being able to do not just one thing 01:54:15.560 --> 01:54:20.520 at a time with individual bits, but two or four or eight or 4 01:54:20.520 --> 01:54:23.540 billion things at once, it stands to reason 01:54:23.540 --> 01:54:27.920 that if adversaries have access to quantum computing before you 01:54:27.920 --> 01:54:31.700 and I do, then all of the security you and I now rely on 01:54:31.700 --> 01:54:35.990 and that we've talked about today could suddenly become insecure.
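The arithmetic behind those numbers is that n qubits span 2 to the n classical bit patterns, so 32 qubits span 4,294,967,296. As a small sketch (the helper names are hypothetical), the snippet below computes that count and, for comparison, a rough number of guesses for brute-forcing a k-bit key. One hedge worth stating: measuring a register of qubits yields just one pattern, so for generic brute-force search the best-known quantum speedup (Grover's algorithm) is roughly a square root, on the order of 2^(k/2) queries instead of 2^k tries, rather than checking every key at once. Much larger speedups are known only for specific problems, such as the factoring and discrete-logarithm math behind RSA and elliptic-curve cryptography (Shor's algorithm).

```python
def basis_states(n_qubits: int) -> int:
    """Hypothetical helper: how many distinct n-bit patterns n qubits span."""
    return 2 ** n_qubits


def brute_force_guesses(key_bits: int, quantum: bool = False) -> int:
    """Hypothetical helper: rough tries needed to find a key_bits-bit key.

    Classical exhaustive search: about 2**key_bits tries in the worst case.
    Grover's algorithm: on the order of 2**(key_bits / 2) queries.
    Constant factors are ignored, and odd key sizes are rounded down.
    """
    return 2 ** (key_bits // 2) if quantum else 2 ** key_bits


for n in (1, 2, 3, 32):
    print(f"{n} qubit(s): {basis_states(n):,} states")
# 1 qubit(s): 2 states
# 2 qubit(s): 4 states
# 3 qubit(s): 8 states
# 32 qubit(s): 4,294,967,296 states
```

Under this rough model, a 128-bit key that needs about 2^128 classical tries would need on the order of 2^64 Grover queries, which is one commonly cited reason for moving to longer symmetric keys.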
01:54:35.990 --> 01:54:38.120 Because we're trusting right now that it's just 01:54:38.120 --> 01:54:40.340 going to take the adversary a lot, a lot, 01:54:40.340 --> 01:54:42.590 a lot of time, maybe money, maybe resources, 01:54:42.590 --> 01:54:44.870 maybe risk to attack our accounts. 01:54:44.870 --> 01:54:49.170 But if they have exponentially more resources than you and me, 01:54:49.170 --> 01:54:51.830 then our data really is at risk. 01:54:51.830 --> 01:54:56.410 And all of the mathematics we've been trusting would need to be hardened. 01:54:56.410 --> 01:55:00.370 Now hopefully you and I will have access to quantum computing at the same time 01:55:00.370 --> 01:55:03.110 as, or ideally before, all of these adversaries, 01:55:03.110 --> 01:55:06.040 so hopefully our algorithms for securing information 01:55:06.040 --> 01:55:08.990 will continue to evolve along with these technologies. 01:55:08.990 --> 01:55:11.980 So this isn't necessarily something you need to worry about for now. 01:55:11.980 --> 01:55:15.640 Indeed, I think after today, we have more than enough to worry about. 01:55:15.640 --> 01:55:17.510 So for today, that's all. 01:55:17.510 --> 01:55:20.160 We'll see you next time.