[MUSIC PLAYING] BRIAN YU: Welcome back, everyone, to Web Programming with Python and JavaScript. And today we're going to look at things from a different perspective. So we've spent the past several weeks working on designing and building and programming web applications using Python and JavaScript. We've talked about using frameworks like Flask and Django in order to actually write the code that will run on our web servers and then writing JavaScript code that runs on the client side inside of a user's browser in order to allow for additional functionality to happen. But the focus today is going to be less about actually writing the web applications and more about what happens after you've written those web applications, and you want to take your web application and deploy it to the internet. After you've written it, after you've tested it, what concerns have to be considered when we think about taking that web application and then deploying it? And the main focus of today is going to be all about scalability, this idea that a web application might work well if just a couple of users are using it. But what happens when the web application starts to get popular as more people start to use it? And as your application starts to have to deal with multiple different people potentially accessing your data at the same time and trying to use your application at the same time, what sorts of considerations do you need to take into account when that starts to happen? And so we can begin with a simple picture. We might imagine that this diagram here represents your web server. And when a user comes along, that user is going to be connecting to that web server somehow. They're going to connect to the web server. Your server, which might be running Flask or Django or some other web framework, is going to need to process that request, figure out what sort of response to present back to the user, and then deliver that response back to the user. But a server can only do finitely many things per second. How do we typically measure how many things a server can do in a given amount of time? Any idea what the metric for that usually is? So the standard metric for that is a unit of measurement called hertz, which represents the number of calculations that a computer can do in a given second. Or more commonly, as we hear nowadays, gigahertz, or billions of computations per second, which are very simple computations like adding two numbers together or checking whether or not a number is equal to zero. Simple calculations like that, amassed over billions and billions of computations, are generally the way we'll measure how many things a server can do in a given amount of time, like a period of one second, for instance. And so a given server can only do a finite number of things in a given amount of time-- in a given second, for instance-- which means that there is only a finite number of users that a server could potentially respond to in a given second. And so if a server can only respond to 100 users in a given second, what happens when user number 101 comes along and tries to make a request to the server in that same second? How is the server going to deal with that? And these issues surrounding scalability are the issues that we're going to be exploring today. What happens when it's not just one user trying to connect to our server but potentially many users that are all trying to connect to our server at the same time? So what are some ideas for how we might deal with this situation? 
Our server is a finite machine that can only deal with so many users per second. And suddenly we find that our web application's gotten popular enough that we have more than that number of users trying to access our application at the same time. What might we want to do about that? AUDIENCE: You could add more memory and resources to the actual server to make it a beefier kind of machine. BRIAN YU: Yeah, great. We could add more resources to the server we have. We can add more memory to the server. We can, in other words, try and make the server faster, increase the processing power of that server. And so this is something we might-- well, first of all, before we get there, I'll talk a little bit about benchmarking. So benchmarking is something you'll probably want to do first, this process of figuring out just how much your server can actually handle. Your server has a maximum capacity, but you might not know upfront just what that capacity is, just how many users your server can handle. And it's probably not a good idea to go about waiting until that server hits the capacity, until you've reached the point where your server can no longer handle any more users, before you realize, oh, yeah, that's what the capacity is. And so benchmarking is something you'll likely want to do first in order to load test or stress test, as it's often called-- testing those servers in order to make sure you know what the limit is. And once you know that, then you can start to think about what to do if you were to ever exceed that limit. And so the idea that was brought up here is something that we might call vertical scaling-- this idea that if our server as it is now isn't good enough, isn't performant enough in order to handle all of the users that might be coming in order to use our web application, then what we might want to do is scale that server up and make it a larger server, for instance, that is able to have more processing capacity, that's able to operate faster, that has more memory, for instance, that can then allow it to handle that additional capacity. So you might imagine that if this is our server and this is its connection and we realize that more and more connections are going to start coming in, what do we need to do? We can vertically scale that server, make it more performant by adding more memory, for instance, to that server in order to allow it to respond to that sort of thing. So what are the drawbacks or limitations of vertical scaling? Where might we go wrong with this process, or why is it not a perfect solution to all of our problems when it comes to scalability? AUDIENCE: Well, it's not, I mean, I guess, for lack of a better word, it's not very scalable because you just have this one machine that you're trying to make bigger and bigger. At some point, it's probably going to get really expensive or impossible. BRIAN YU: Yeah, sure. So this is maybe not as scalable as we would like. The idea might be that eventually we're going to hit a point where it's going to be impossible to just keep getting a bigger and bigger server because with a single server, wherever we're getting the server from, there's probably a maximum processing power they can put inside of a single server. And so we're eventually going to hit some sort of limit on vertical scaling where our servers can only get so powerful inside of just a single server. So what might we do then? 
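To make the benchmarking idea mentioned above concrete, here is a minimal sketch in Python of the kind of load test you might run against your own server before deploying. The URL, request count, and worker count are placeholders, and a real load test would typically use a dedicated tool rather than a script like this.

```python
# Minimal load-testing sketch: fire N concurrent requests at a URL and
# measure how many complete per second. All numbers here are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:5000/"   # hypothetical server under test
N_REQUESTS = 200
N_WORKERS = 20

def fetch(_):
    response = requests.get(URL)
    return response.status_code

start = time.time()
with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    statuses = list(pool.map(fetch, range(N_REQUESTS)))
elapsed = time.time() - start

ok = sum(1 for status in statuses if status == 200)
print(f"{ok}/{N_REQUESTS} succeeded in {elapsed:.2f}s "
      f"({N_REQUESTS / elapsed:.1f} requests/second)")
```

Running something like this against a staging copy of the application gives a rough sense of the capacity limit the lecture is describing, before real users ever hit it.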
How can we-- if we still need to scale our application, still need to deal with more users that are all trying to access the application at the same time, and we can't just keep growing this one server, what do we do in that situation? AUDIENCE: Get another server. BRIAN YU: Get another server. Great. And so if this is called vertical scaling, this idea of taking our existing server and adding more processing power to it in order to make it more performant, then adding more servers is what we might call horizontal scaling. The idea there being that if we had a single server previously and now we want to be able to handle more load coming from more places, then instead of just having one server, maybe we think about splitting this up now into two different servers where each of the servers is able to handle users, able to process requests, and deal with users that are coming in in that sense as well. But what problems might arise now, now that we have two servers that we're trying to run our web application on? AUDIENCE: You still want one database, so if they're trying to write to the same database, something like that. There might be a race condition. BRIAN YU: Great. So one potential concern is what happens with the data. We have a database somewhere, right? It might be a PostgreSQL database, like we used in project one, for instance, where both of these servers need to somehow access that database. And maybe they're accessing the database at the same time, and concerns might arise there as well. And we'll talk about how to deal with scaling our databases later on today as well. What else might happen? That's certainly something that might come up. What initial challenge might come up if a user now tries to access my web application? AUDIENCE: They don't know which server to go to. BRIAN YU: Great. The user doesn't really know which server to go to. We somehow need to have some way of figuring out if a user comes in, do we send them to this server over here or this server over there. So how do we address that problem? And so oftentimes this is addressed through another piece of hardware that sits in between the user and the server, which we call a load balancer. And the load balancer's job is effectively to solve that very problem, to wait for a user to come in. And the load balancer simply is going to try and detect, when the user comes in, what should happen. Should we send the user to this server, or should we send the user to that server? And the load balancer needs to make those decisions. So when the user comes in, it sends them to either one server or to another server. So how might a load balancer make those decisions now? So somehow the load balancer needs to decide. User comes in. Do we send them to server A or server B? What strategies or algorithms might a load balancer want to employ in order to determine which of the different servers to send the user to? Maybe it's going to be only two, as in this diagram. But maybe in the case of an even larger web application, we have scaled it up to more than two servers. There are three, four, five, or even more servers, and the load balancer needs to decide which one the user should go to. And there are many potential answers. But what are some possibilities for what the load balancer could be doing here? AUDIENCE: I've heard of round robining. BRIAN YU: Round robining. Great. So that if I have five different servers, we take the first user, send them to server one. Take the second user, send them to server two. 
Third user goes to server three, then four, then five. And when the next one comes in, we can send him back to one, just sort of alternating and circling between all of the possible servers that we have. It's certainly an option. Other choices that we might have? Yeah. AUDIENCE: Probably communicate with the server, see if it's busy. And then if it's busy, then don't send anything to it, something like that. BRIAN YU: Yeah, so we can try potentially communicating with the servers maybe. Server number one has a lot of users on it right now, and server number two doesn't have very many. If we could somehow get the servers to tell us what the current load is, how many users are currently using either one of those servers, then maybe our load balancer can be intelligent about that and figure out, well, if server number two doesn't have very many users using it right now, then we may as well direct more traffic there in order to help to, as the load balancer name might imply, trying to balance out the load on each of these two servers so that no one is facing a lot more in terms of resource usage than the other server is. Other ideas that we might throw out there? OK. So those are some of the basic strategies that might come into play when we think about load balancing. One very simple option might just be random choice where just, when the user comes in, you effectively flip a coin. If it's heads, send them to server A. If it's tails, send them to server B, where we just try to randomly and evenly distribute people. Round robin is certainly an option, where you circle amongst the servers that you do have. And then you have this idea of fewest connections, where you check the servers and figure out which one has the least load and try to send the user that comes in to the server that has the least load at that particular time. And what might be some of the drawbacks or benefits of these compared to each other? If fewest connections seems to make sense, where if server A is less busy than server B, then it makes sense to send the user to server A, why might we-- what might be a drawback of that approach compared to a random choice or a round robin-like approach? What are the trade-offs that we face when making that decision? AUDIENCE: It can depend on what people are actually doing. So even though there may be few connections on one server, there may be seven people that are actually using a lot of the server's resources for something. BRIAN YU: Sure. The number of users that are using a particular server might not be a perfect proxy for how much load that server is actually facing. Because if there are a hundred users on server one but they're really just looking at a couple static pages and aren't doing anything very computationally intensive, but people on server B, there are fewer of them but they're really doing more work, then maybe we would prefer to send someone to server A instead, for instance. So number of users or number of connections might not be the perfect way of measuring how much activity is going on in the servers. And you can imagine that we might try and make our load balancing algorithms more sophisticated or more complex by trying to figure out, well, really just how much is the load on each of these and figure out what would really make more sense. But then what sorts of issues start to come up there? What's the trade-off that we face there? Yeah. AUDIENCE: Well, now load balancing's going to become expensive [INAUDIBLE].. BRIAN YU: Great. 
Now load balancing starts to become more expensive. We want the user to be able to get a fast response from server A or server B, but we've now introduced this intermediary piece of hardware, this load balancer, that's going to have to spend time calculating and processing which of these two servers is actually going to be the better server to send the user to. And it's going to take time, latency. It's going to take some computational power in order to figure out where to ultimately send that user. And so there's definitely that trade-off as well, whereas in a random choice or round robin type model, we can save a lot of that computational energy by not worrying about which of these servers is more busy or less busy at any given time and just send the user to a particular server without needing to do those sorts of computations. And so in practice, there's no one best solution to these problems. But it's good to be thinking about different ways in which your load balancer might be operating in order to think about what algorithm you might want to use depending on the specific needs of your web application. But in general, when we deal with load balancing, if we think of this idea of a user trying to access your website, with every request, that request first goes to the load balancer before it goes to the web server. And at the load balancer stage, the load balancer makes a decision about send the user to server A or send the user to server B. What problems might occur with just that model? Even if you don't worry about which specific algorithm the load balancer is using to determine where to send the user each time, what could go wrong? AUDIENCE: Some users might be doing more than others on a server. BRIAN YU: Sure. So certainly some users might be doing more than others on a server. And in particular, when we think about what users are doing on a server, the user is oftentimes not just going to one page and letting it be at that. A user might be trying to access a page more than one time or going to multiple different pages on the same web application, for instance. You might imagine on an e-commerce site like eBay or Amazon, for instance, a user might be adding things to their shopping cart and looking at other pages and adding new things to their shopping cart and interacting with a web page in multiple different ways, making multiple requests. And what could go wrong now is every time a user makes a request, the load balancer is making a new decision about send to server A or send to server B. AUDIENCE: Yeah, that would be really bad. So the load balancer would have to have some kind of session awareness, I guess. Right? So it sends somebody to one server and it just keeps sending that same person there. BRIAN YU: Right. So a problem might occur where, with just some basic algorithm like this where on every request we make a decision, we don't have any sort of session awareness. A user comes into the web application and is sent to server A, and we now store the contents of their shopping cart on server A. And the user clicks on another page. And then the load balancer this time-- either because of a random choice, a round robin, or because server B now has the fewest connections-- decides to send that user to server B instead. That new server doesn't have the same session data that the original server did. And so maybe now the user's shopping cart's totally empty, for instance. 
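To make the strategies discussed above concrete, here is a toy sketch in Python of random choice, round robin, and fewest connections. The server names and connection counts are made up, and a real load balancer implements this inside its own configuration rather than in application code like this.

```python
# Toy sketch of three load-balancing strategies: random choice,
# round robin, and fewest connections. Servers and counts are made up.
import itertools
import random

servers = ["server_a", "server_b", "server_c"]

# Random choice: effectively flip a coin (or roll a die) per request.
def choose_random():
    return random.choice(servers)

# Round robin: cycle through the servers in a fixed order.
_rotation = itertools.cycle(servers)
def choose_round_robin():
    return next(_rotation)

# Fewest connections: pick whichever server currently has the least load.
active_connections = {"server_a": 12, "server_b": 3, "server_c": 7}
def choose_fewest_connections():
    return min(active_connections, key=active_connections.get)

print(choose_random(), choose_round_robin(), choose_fewest_connections())
```

Note that each of these makes a fresh decision on every request, which is exactly how the empty-shopping-cart problem just described can arise.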
And so by introducing this attempted benefit of splitting the server into two parts, horizontally scaling into a server A and server B, we now need to worry about when the user comes to server A the first time, what should happen when they come back the second time. Maybe we do want the user to go back to server A again. So this brings in the idea of session-aware load balancing-- this idea that when we load balance, it's often going to be a good idea to make sure that our load-balancing algorithm is somehow session-aware, that it knows that when a user comes back to the site, they should be directed potentially to the same server. And that's this first idea of sticky sessions-- that if a user comes to the web application the first time and is directed to server A, then when the user comes back for a second time, even if random choice chose server B or even if, based on looking at the number of connections, server B is less loaded than server A, and we would normally send the user to server B, we still want to send that user back to server A because that's the server that they were on previously. That's where all their session information is. And so if we want to make sure that the contents of the user's shopping cart are preserved, for instance, then we'd want to continually send the user back to server A each time. So that's the idea of sticky sessions. How else might we deal with the problem of session-aware load balancing? Maybe some of these additional bullets can give you ideas as to how we might deal with that. So another possibility here is storing the sessions, actually, in the database. So it's possible that if right now we're just storing the session information on the server, then when we split things up into two different servers, server A and server B, any session information on server A isn't accessible on server B. And so one possibility is to store session information inside of a database, a database that all of the servers, both server A and server B, have access to. And if you do an approach like that where we store information about our sessions inside of a database, rather than just storing them inside of server A or server B, then the benefit there is that no matter which server the user is sent to, as long as we have a way of taking that user and identifying which session information in the database actually belongs to them, then we can extract that session information out of the database regardless of which server the user went to. So what would be a drawback of that approach? Why might we not want to store session information in the database? AUDIENCE: Then you have to scale your database too. BRIAN YU: Certainly. Then we start to get into issues of database scalability, and we'll talk about database availability too. And there's also other-- any time we're introducing additional hardware, additional servers that are in play when we're dealing with issues of scalability, then we start to incur time costs, that if originally the session was stored on the server and now it's stored elsewhere, now there's still this communication time that needs to happen, this additional latency that gets added any time we're trying to access information. And, finally, you might imagine that we could store the session not in our web server at all and rather use client-side sessions, storing any information related to the session actually inside the client. 
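One way a load balancer might implement the sticky-sessions idea is to hash something stable about the client, such as a session ID or the client's IP address, so the same client always maps to the same server. This is only a sketch of one possible approach, not necessarily how any particular load balancer does it; many instead use a cookie, as comes up later in the lecture.

```python
# Sketch of sticky sessions via hashing: the same client identifier
# always maps to the same server. The identifier used is an assumption.
import hashlib

servers = ["server_a", "server_b"]

def sticky_choice(client_id):
    # Hash the client identifier and use it to pick a server index.
    digest = hashlib.sha256(client_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client lands on the same server on every request.
print(sticky_choice("session-abc123"))
print(sticky_choice("session-abc123"))  # prints the same server again
```

A drawback of this particular scheme is that if the number of servers changes, as with the autoscaling discussed later, the mapping shifts and users can end up on a different server anyway.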
And oftentimes these client-side sessions are done through cookies, where web servers can just take cookies and send them to the user, where the cookie stores all of the information related to the session, so that you on your computer are actually storing all of the information about what's in your shopping cart. And that cookie is sent along with every web request you make. So if you make another web request, it doesn't matter which server you're sent to. Your web request inside of the cookie contains all of the information that is associated with that particular session. And what might be a drawback there? AUDIENCE: Can that be some sort of attack on the server where multiple people start using the same cookie to overload the server? BRIAN YU: Great question. So there's potentially adversarial ways that this could be used, that if someone else is sending the same cookie, then the server might still just accept it and assume that it's the same person. So it might come in from different directions. And, certainly, trying to overload a server is something we'll talk about when we get to the next topic, which is all about security and how to think about security in our web applications. As we begin to scale them larger and larger, these security issues start to become more and more pressing. So those are definitely issues to be aware of as well. So lots of different ways, ultimately, to deal with these problems of making sure that our load balancer is session-aware, making sure that when the user comes back, they're consistently directed either to one place or another or at least have some mechanism in place for making sure that any session information-- the contents of the shopping cart or any notes that they've written in an application-- gets saved when they go to another page, when they make another HTTP request to the web server. Questions on anything so far about load balancing or how we might do any of this? OK. So what drawbacks might come about with regards to just horizontal scaling? That we say, all right, we expect that our web application will need five web servers, for instance, in order to deal with traffic on a typical day. And so now we load balance using some of these session-aware tools, of deciding between any of these five potential servers that we need to send users to. And how might that work? What problems could come up with that model? So one thing that I'll talk about briefly that sort of gets at this idea is the idea that when we define a finite number of servers-- and, say, there are going to be five servers here, and when a request comes in, it's going to go to one of those five servers-- well, you never really know what might happen the next day. The five servers might be the typical amount that you would need in order to deal with all of the users that might come in on a given day. But you might imagine that some web applications probably get more traffic at some times of the day than other times of the day or even some periods of the year as compared to other periods of the year. You, for instance, might imagine that a shopping website like Amazon or other online shopping websites perhaps get more traffic when it comes to the holiday season than when it comes to other times of the year. 
Or you might imagine that a newspaper website, for instance, after a big presidential election or breaking news event, maybe a lot more people are accessing that newspaper website as opposed to during other times of the year when fewer people are looking at the news. And so the amount of traffic that comes into a web application could vary depending on the time of day, depending on the time of the year, depending on random events that happen from time to time. And so how might a web application deal with that? Of course, just a finite number of servers might not be the best solution because potentially if you underestimate the maximum amount of traffic you might need to handle, then you might get more users than your servers are able to handle. And on the flip side, if you just err on the side of too many servers and just have a lot of servers expecting that in the worst case you might use all of them, then there is some waste of resources here, that you're paying, likely, for all of these different servers that are running when in reality you probably don't need that many. And so autoscaling is a tool that many cloud computing services now offer in order to make it such that the number of servers that you're actually using can scale depending upon traffic, that if more and more traffic comes in, we can scale up the horizontal scaling of your web application in order to allow for more web servers to be added in those times. So we might start with only two web servers, but if more traffic were to come along, we can add another web server. And the load balancer knows that now there are three web servers. And if traffic increases even more, we can continue to scale our web application. And most cloud computing services, like Amazon Web Services, that offer these load balancing services and autoscaling services, can allow you to specify here's the minimum number of servers that I want and here's the maximum number of servers that I want and allow the load balancer to then just make those decisions about do we need to add another server or not. And you can add criteria for once we reach a certain threshold of the number of users that are trying to access the site, then it might be a good time to increase the scale. And if, after a period of high usage, your website traffic begins to die down and you don't need four servers anymore, it can scale back down. It can add new servers when they're needed in order to adjust based on the demand, based on the number of users that are trying to use your web application. Your web application can make those decisions about whether or not we need to increase the number of servers or decrease the number of servers. Questions about any of that so far? Yes. AUDIENCE: If you use AWS, do they take care of the load balancer? Do they provide it? Or is that something that you [INAUDIBLE]? BRIAN YU: Great question. So the question is about how AWS actually does this. So AWS offers a number of different services. Amazon Web Services is just one of the more popular cloud computing services used in order to run servers like these on the internet. And we'll talk a little bit about that in just a moment, actually. But one of those services is a service that effectively will allow you to define this autoscaling group for yourself in order to say, here are the minimum and maximum number of servers. And Amazon takes care of the process of having a load balancer decide where to send different users and when to add new servers and when not to. 
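A toy sketch of the kind of rule an autoscaling group applies is below. The thresholds, minimum, and maximum here are arbitrary placeholders, and this is not AWS's actual API; with a cloud provider you configure values like these rather than writing the logic yourself.

```python
# Toy autoscaling rule: add a server when average utilization is high,
# remove one when it is low, staying within a configured min and max.
MIN_SERVERS = 2
MAX_SERVERS = 10
SCALE_UP_AT = 0.75    # utilization above which to add a server
SCALE_DOWN_AT = 0.25  # utilization below which to remove a server

def desired_server_count(current_count, average_utilization):
    if average_utilization > SCALE_UP_AT and current_count < MAX_SERVERS:
        return current_count + 1
    if average_utilization < SCALE_DOWN_AT and current_count > MIN_SERVERS:
        return current_count - 1
    return current_count

print(desired_server_count(3, 0.9))   # 4: traffic is high, so scale up
print(desired_server_count(3, 0.1))   # 2: traffic died down, so scale back
```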
And other cloud computing providers like Microsoft Azure, for instance, they all have very similar tools and technologies that allow you to implement this sort of thing without you needing to really worry about that. And that's all part of this new big movement towards cloud computing, that in the past, when writing a web application and deploying it and running a web application for your business, for instance, you might have needed to own the servers yourselves, physically have the servers inside of your company. Nowadays, with cloud computing, this is effectively just a means to allow you to rent computing power stored in the cloud, stored in someone else's servers, whether it belongs to Microsoft or Amazon or someone else, and therefore allow you to use those resources. And so what might be a benefit of this idea of cloud computing, of using resources from elsewhere instead of needing to use servers that are local to wherever you're working? AUDIENCE: If you're a small shop, then you don't need to worry about maintaining servers, having IT people. BRIAN YU: Great. So from a practical perspective, it's you need-- normally, we need IT people. You'd have to maintain your own servers, whereas with the cloud system, it's typically just a rental based on the number of hours of usage of the server. And Amazon or Microsoft or whoever takes care of making sure that the servers are running, of maintaining the servers, of dealing with any problems that might come up. And so, certainly, there are practical benefits that make it logistically more feasible now to use cloud computing as the means of running a web application rather than having to physically own your own server. So now we have this system in place where we're trying to scale our web application. We've talked about vertical scaling where we just add more computing power to one particular server. And then we spent some time talking about horizontal scaling and the issues that come into play when suddenly, instead of just having a single server, we've had to split things up across multiple different servers. And that added challenges of what do we now do about sessions. It added challenges of now we need this additional piece of hardware, this load balancer here, which is making decisions on a frequent basis of where to send user one, two, three, or four, and then dealing with how to scale those servers, as we have to potentially increase or decrease the number of servers that we have depending on load. What happens now if we have four servers and one of the servers goes offline? It just stops working. What could go wrong now? What might the load balancer do? Yeah. AUDIENCE: Getting your server's replacement. BRIAN YU: So, certainly, the end goal would be to get this fixed, to try to repair the server, reboot, restart it, get it back online, or replace the server if really there's something physically wrong with the server. But in the meantime, what could go wrong? AUDIENCE: If a user had an active session with that server, then they might lose data or something like that. BRIAN YU: Great. So one potential problem is that if there were session data that was stored only on this one server, now if a user comes back, that session data is no longer accessible potentially. 
And so we talked about possible solutions, and we might deal with that either by storing the session inside of the client, so it doesn't matter what server they end up at, or storing the session data inside of a database somewhere such that it doesn't matter, again, which server the user is sent to. They can still retain that information. But if we have this idea of sticky sessions in place, where the user goes to the load balancer, and if they were in server A before, they get sent back to server A, and if they were in server B before, they get sent back to server B. What could happen is that a user comes along, hits the load balancer, and the load balancer asks which server they were at last time, and the user was at server B. And they try to get sent back to this server, but the server is now offline. So somehow we need a way for our load balancer to know whether or not these servers are operational, whether or not it makes sense to send the user to one of the servers. So how might we do that? How might we solve this problem of we need our load balancer to know which of the servers are online so it knows where to send users? And we definitely don't want to be sending the user to a server that's no longer running or no longer operational. Yeah. AUDIENCE: Just ping the server to see if you get a response. BRIAN YU: Ping the server, see if you get a response. Certainly. And so one variant on this idea that's often used when it comes towards dealing with these servers is this idea of having each server give off a heartbeat, just a signal that they produce every so often, where the signals are received by the load balancer. And the load balancer knows if it's hearing those heartbeats, then the servers are operational. And if too long goes by without hearing one of those heartbeats from this server, for instance, then the load balancer can reasonably guess that maybe that server is no longer operational. Maybe we should no longer be sending users to that server in particular. And that brings with it its own design decisions of how frequent you want those heartbeats to be. Certainly, if they're more frequent, you're getting a more up-to-date sense of whether the servers are running. And you know more instantaneously when a server potentially goes offline. And if those heartbeats are less frequent, then maybe you're saving on energy because you no longer need to continuously compute whether or not you're receiving all of these heartbeats coming from all the different servers. And so, again, with all of the decisions that we make in scalability, there's not necessarily one correct decision that this is the right way to do a load balancer. But there are trade-offs with each of the decisions that we make with regards to how many servers we have, with regards to the algorithm that we choose for our load balancer, with regards to how we choose to decide whether or not these servers are offline. And when a server is offline, we need to put some thought into how do we, then, from the perspective of the load balancer, decide that we're no longer going to be sending users to that server that's now offline. And so all those concerns start to come up when we start to deal with this idea of trying to scale our web application. Questions about any of that so far? OK. We'll go ahead and take a break right now. 
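A minimal sketch of the heartbeat bookkeeping just described, from the load balancer's point of view, might look like the following. The 10-second timeout is an arbitrary choice that illustrates the frequency trade-off mentioned above.

```python
# Sketch of heartbeat tracking: each server reports in periodically, and a
# server not heard from within the timeout is treated as offline.
import time

HEARTBEAT_TIMEOUT = 10  # seconds without a heartbeat before assuming failure

last_heartbeat = {}  # server name -> time of most recent heartbeat

def record_heartbeat(server):
    last_heartbeat[server] = time.time()

def healthy_servers():
    now = time.time()
    return [s for s, t in last_heartbeat.items() if now - t < HEARTBEAT_TIMEOUT]

record_heartbeat("server_a")
record_heartbeat("server_b")
# ... later, only route users to servers returned by healthy_servers()
print(healthy_servers())
```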
And we'll come back later and talk about some other concerns that come about with regards to scalability, including talking about databases and what happens here with this image so far that we only have servers now. But what happens if we start to integrate databases into the mix? And how do we deal with scalability there as well? So before the break, we were talking about how we would go about scaling applications, either via vertical scaling or horizontal scaling. And when we were talking about horizontal scaling, we talked about this idea of splitting up and rather than just having one server, having multiple different servers with a load balancer that can then decide whether to send the user to server A or whether to send the user to server B. What we didn't quite talk about was how the load balancer is able to implement this idea of sticky sessions, the idea of when the user comes along, if they were at server A last time, we want to send them back to server A. And if they were at server B last time, we want to send them to server B. So what are some ways by which we can actually make that happen? How can the load balancer consistently send the same user to the same server every time in order to make sure that if the user's shopping cart's on server A, that we don't inadvertently send the user to server B and they lose all the content of their shopping cart data, for instance? What might be some ways of doing that? AUDIENCE: Could it have its own session tracking? It could send the person a cookie or something or-- BRIAN YU: Good. It could send the person a cookie, for instance. Great. So inside of the cookie, maybe-- the cookie that the load balancer can set when the response goes back to the user is one that determines or that has some information inside of it that says users should go to server A, for instance, or user should go to server B. And so these cookies are often a very useful way, whether it's for the server or for the load balancer, of giving information to the client that when the client tries to make another request, that information is still there. And we talked about before the idea that one possible way of implementing this idea of making sure that the session stays consistent regardless of what happens with the horizontal scaling is to actually store session information inside of the cookie. And this is something that Flask actually does by default. So we've been using Flask for a while now in order to write web applications that have sessions that are storing information about the user. And by default, Flask will use what's called a signed cookie, this idea that when the user has their session information, we're just going to put that session information inside of a cookie. But what might be the problem of just taking all the session information, putting it inside of a cookie, and then just using that as the way via which users are interacting with sessions on your web application? Your session, for instance, might just be a Python dictionary, you imagine, that contains the user's user ID and maybe the information of what's currently inside that user's shopping cart. And if that's just inside of a cookie that gets sent back and forth between the server and the client, what could go wrong there? AUDIENCE: Are there limits on how much you can fit in the cookie? 
BRIAN YU: So, certainly, the size of the cookie is something to bear in mind, that a cookie could potentially-- as the cookies get larger and larger, now you have to start to worry about cookie, the size of the cookie, and the amount of latency it'll take to send that consistently back and forth between the client and the server. Are there any security concerns we can think of that could come up if all we're doing is just sending-- if the server sends back the session information that contains a user ID and what's inside the cart, and we just expect that when the user sends back that cookie, that will be the information that the server knows is what's contained inside the user's session. AUDIENCE: I suppose somebody could steal your cookie, and then they would have access to whatever you have access to [INAUDIBLE]. BRIAN YU: Sure. Certainly, someone could steal the cookie. And if they were able to steal the cookie and gain access to that cookie, then they would have access to your entire account. They could log in as you, and they could see the contents of whatever was in your cart at that time. What about even if you didn't have access to someone else's cookie? Can you imagine a world where in this very simple-- not very secure-- example where we're just sending the cookie back and forth, where things could go wrong, where you could still get access to someone else's account? So if we're just relying on the contents of the cookie for-- yeah. Go ahead. AUDIENCE: Take that cookie and send it yourself so you can pretend to be [INAUDIBLE]. BRIAN YU: Great. So you could try and pretend to be someone else, effectively. If you were able to take the cookie and change what the value of the user ID is for instance and try and send it back, then that potentially is an attack vector by which you could trick the server into thinking that you are someone that you're not. And so one way that Flask tries to get around this is by signing the cookies. And so if you want to use Flask signed cookies, you'll have to include a private key inside of the web application, which is just going to be a long string of characters that only the web application should know and shouldn't be accessible to users. And, effectively, every time Flask sends you a cookie, it's going to sign that cookie, add a signature, where that signature is going to be generated based on a combination of the contents of the session itself and of what the private key is in order to generate a signature that shouldn't be or should be reasonably-- should be difficult for anyone to be able to predict or figure out such that you can know with confidence when you get back that session, Flask can treat it as a checksum, effectively, in order to determine, in fact, that this cookie did come from this user. It is, in fact, a valid, genuine cookie, and they can trust the information inside of it. But, certainly, with the issues we talked about with regards to cookies and the potential for them to be intercepted and used, we might not want to use that as our method, which is why in the Flask applications we've been building, if you've noticed up at the top, we've set some application settings inside of the Flask app variable that actually say that when we're using these sessions, rather than use cookies as their means for storing sessions, we've been using sessions that are actually stored on the file system of the server itself as your way of tracking the sessions. 
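To illustrate what signing a cookie means, here is a rough sketch of the idea using Python's hmac module: the signature is computed from the session contents plus a secret key, so the server can detect tampering. This is just the concept; Flask's actual signed session cookies are produced through the itsdangerous library, not code like this.

```python
# Conceptual sketch of a signed cookie: payload plus an HMAC signature
# derived from a secret key. Tampering with the payload breaks the check.
import hashlib
import hmac
import json

SECRET_KEY = "keep-this-private"  # placeholder; never hard-code a real key

def sign(session_data):
    payload = json.dumps(session_data).encode()
    signature = hmac.new(SECRET_KEY.encode(), payload, hashlib.sha256).hexdigest()
    return payload, signature

def verify(payload, signature):
    expected = hmac.new(SECRET_KEY.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

payload, signature = sign({"user_id": 28, "cart": ["book"]})
print(verify(payload, signature))                        # True: untampered
print(verify(payload.replace(b"28", b"29"), signature))  # False: user ID changed
```

And the server-side, file system sessions mentioned at the end of the paragraph are configured with settings along these lines, using the Flask-Session extension as in the course's starter code (treat the exact keys as a sketch of that configuration):

```python
from flask import Flask
from flask_session import Session

app = Flask(__name__)
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_TYPE"] = "filesystem"  # store session data on the server's disk
Session(app)
```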
And, in fact, if you were to ever use sessions on Flask, using those file system sessions, and shut off the Flask server or even just-- even if you didn't shut off the Flask server, you could look at the contents of the sessions directory in order to take a look at what the sessions actually look like, which can be interesting to explore if you want to get a sense for what's going on inside of the sessions. And so we spent some time today talking about trying to scale up these servers. But one thing we've come back to a number of times is databases and what happens when we're trying to store data, whether it's session data that we might be trying to store inside of a database. Or maybe it's just that our application uses a database, whether it's in project one or project three, where we've wanted to store books or food orders inside of a database. What happens when multiple different servers are trying to access that same database? And now we start to get into this issue of trying to scale up our databases. So we might imagine that-- we'll take the same picture. We've got a load balancer. We've got two servers. And now we also want those servers interacting and communicating with a database somewhere, where those servers are communicating with the database. What can go wrong now in this picture with this model? AUDIENCE: Well, too much load on your database maybe. BRIAN YU: Yeah. Too much load on the database. You might imagine that if the reason we went from one server to two servers was because a single server wasn't enough in order to handle all the load, all the traffic coming into that one server, then if we have all the load from both servers that are all trying to talk to the same database at the same time, we might be potentially overloading that database. And that might become unmanageable. What else could go wrong? AUDIENCE: Database server could go down. BRIAN YU: The database server could go down. Great. That's another thing that could happen. So we've talked about this idea that when we were scaling our servers and had multiple different servers, if one server goes down, no big deal. So long as the load balancer knows that server two is the server that went down, it can redirect all the traffic to servers one, three, four, and five. But here we see that this is what we might call a single point of failure, a place inside of our diagram of all the hardware that's going on where if this one thing fails, then the entire web application breaks. Right? If the database fails, nothing else in the application is going to be able to work, assuming the web application is relying on the data in the database to work. Whereas this server, for instance, wouldn't be a single point of failure because if this server goes down, then the load balancer can just direct all users and all traffic to this server over here, which can still access the database. And for that matter, the load balancer itself is also another single point of failure. If the load balancer goes down, then suddenly we have no way of directing users to various different servers. And so we might think that there might be ways that we want to have multiple load balancers, for instance, in order to try to address that problem of avoiding having single points of failure. But what we're going to focus on now is on ways to make this database scaling more manageable. 
That as more and more data starts to come into our database, we might start to see slower queries because if we have millions upon millions of rows in our database, it might take longer and longer in order to query that database in order to get the data that we want. So how do we, as we scale up our applications, begin to deal with that? And so the first topic we're going to talk about is database partitioning, the idea that if we have database tables that are large, either large in the number of rows or large in the number of columns, then trying to query information from those big tables can start to get complicated. And it can start to become time-consuming. That if we have large tables, it's going to take more and more time in order to query them. And so database partitioning is going to represent the idea that if we have data inside of our database, we can often split up that data into multiple different parts-- into multiple different tables, for instance-- in order to better allow ourselves to deal with more manageable units, to have queries on those tables run more efficiently and more quickly, and in that sense help us as we begin to scale up our web application. And so one form of database partitioning we've actually already seen. It's called vertical database partitioning. And the idea of vertical database partitioning-- if you remember this from way back in one of the earlier weeks of the lecture-- is the idea that in vertical database partitioning we're going to separate our table into multiple different tables by decreasing the number of columns in those tables, by separating things out such that some columns are going to be put into a different table than other columns. So if we recall, this original table of flights, which we were keeping track of when we were first trying to think about SQL and how we might organize data inside of a database, we have each flight having an ID number, an origin, an origin airport code, a destination, the destination code, and the duration of the flight, for instance. And what we did when we were first talking about SQL and the idea of designing tables was to use foreign keys as our way of what we'll now call vertically partitioning this database. And instead of storing all the data like this inside of a big flights table that might be expensive to query, we can split it up into two different tables. Split it up into a locations table where each location just has its independent code and the name. And we can split it up into a flights table where each flight, rather than have all of those columns as before, now only has four columns. It's got an ID column. It's got a number that represents the origin, an origin_id, where origin_id one corresponds to location number one in the locations table. And, likewise, we have a destination_id, where destination_id four corresponds to this particular location in London, and, finally, a duration. And so we factored out some of the columns in order to create tables that have fewer columns and are, therefore, more manageable and might be easier to query in some sense. And so this is vertical partitioning, and it's something we've already seen before. 
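Here is a rough sketch of that vertically partitioned schema, using SQLite from Python's standard library purely for illustration (the course's projects used PostgreSQL, and the exact column names are an assumption based on the description above).

```python
# Vertical partitioning sketch: locations factored out of flights and
# referenced by foreign keys, using an in-memory SQLite database.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE locations (
        id INTEGER PRIMARY KEY,
        code TEXT NOT NULL,
        name TEXT NOT NULL
    );
    CREATE TABLE flights (
        id INTEGER PRIMARY KEY,
        origin_id INTEGER NOT NULL REFERENCES locations,
        destination_id INTEGER NOT NULL REFERENCES locations,
        duration INTEGER NOT NULL
    );
""")

db.execute("INSERT INTO locations (id, code, name) "
           "VALUES (1, 'NYC', 'New York'), (4, 'LHR', 'London')")
db.execute("INSERT INTO flights (id, origin_id, destination_id, duration) "
           "VALUES (1, 1, 4, 415)")

# Queries now join the two narrower tables back together when needed.
for origin, destination, duration in db.execute("""
    SELECT o.name, d.name, f.duration
    FROM flights f
    JOIN locations o ON f.origin_id = o.id
    JOIN locations d ON f.destination_id = d.id
"""):
    print(origin, destination, duration)  # New York London 415
```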
But there's another form of partitioning as well, which, as you might guess, is called horizontal partitioning. And that might look something like this. If our table of flights is just getting too long, getting too big to query, where we're consistently having to run queries on the set of flights-- looking for all flights that are going from New York to San Francisco, for example-- and those queries are starting to take a long time, we might horizontally partition our table. In horizontal partitioning, rather than change the number of columns to have fewer columns in each of our tables, we're just going to split up the rows of our table such that, rather than have a table that has 2,000 rows, for instance, we might have two different tables where we put 1,000 rows in one table and 1,000 rows in another table. And so we might take this idea of the flights table and really split it up into two different tables, a domestic flights table and an international flights table, where each one of these tables contains the same columns. It's still going to have an ID, an origin, a destination, and a duration. It's just that we're going to split up the flights into those two independent tables. What benefit do we get by doing this? What advantage do we get by taking the flights table and partitioning it into two different tables, a domestic and an international table, that we didn't have with just the flights table? Yeah. AUDIENCE: You're going through fewer rows, so if you split the table in half, you're spending half the time [INAUDIBLE] to the database. BRIAN YU: Great. So the big benefit is that our queries are faster, that if I'm trying to query a domestic flight, I now only need to search through this domestic flights table. And I don't need to worry about searching through however many international flights there might be, and so my queries can become faster. And horizontal partitioning oftentimes isn't just with two tables. You might split things up into many, many different tables. You might imagine that if you have a database that's keeping track of different people's addresses and locations inside of the country, that you might split things up into having 50 different tables-- one for each of the US states, where if you're trying to find someone who you know lives in Oregon, for example, you can just query that table and ignore the tables that have to do with anyone else, thereby speeding up that query. What drawbacks, though, come with this approach of horizontally partitioning our data into multiple different tables rather than keeping it all inside of the same table? Yeah. AUDIENCE: It seems like your code would have to get more complicated because you have to know which table to look in for what. BRIAN YU: Sure. So there's some code complexity that gets added here, that we need to now know before we query. We can't just say query the flights table. We need to have some mechanism for knowing, yeah, should we query the domestic flights table, or should we query the international flights table. And maybe that in itself is going to be an expensive process, which is why oftentimes it's good, if you're going to do any sort of horizontal database partitioning, to give some thought as to how you're partitioning that data, making sure that it's a way that you're going to be able to quickly and easily figure out which table you need to query as opposed to having to spend a long time trying to figure out which of the tables to query before you actually do. 
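Below is a sketch of that "which table do I query?" decision, again using in-memory SQLite for illustration. The table names follow the lecture's domestic/international example, but the rule used to route a query (checking whether both airports are in a toy set of US airports) is an assumption made up for the sketch.

```python
# Sketch of routing a query to the right horizontal partition.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE domestic_flights (id INTEGER PRIMARY KEY, origin TEXT,
                                   destination TEXT, duration INTEGER);
    CREATE TABLE international_flights (id INTEGER PRIMARY KEY, origin TEXT,
                                        destination TEXT, duration INTEGER);
""")
db.execute("INSERT INTO domestic_flights VALUES (1, 'New York', 'San Francisco', 360)")

US_AIRPORTS = {"New York", "San Francisco", "Boston"}  # toy data

def table_for(origin, destination):
    # The application now has to know which partition to look in.
    if origin in US_AIRPORTS and destination in US_AIRPORTS:
        return "domestic_flights"
    return "international_flights"

def flights_between(origin, destination):
    table = table_for(origin, destination)  # table name comes from our own code
    return db.execute(
        f"SELECT id, duration FROM {table} WHERE origin = ? AND destination = ?",
        (origin, destination),
    ).fetchall()

print(flights_between("New York", "San Francisco"))  # only touches domestic_flights
```

Notice that a query for every flight leaving New York, domestic and international alike, would now have to hit both tables, which is one of the drawbacks discussed below.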
Other potential drawbacks of this approach? AUDIENCE: If you make a schema change to one, you have to do it to the others. BRIAN YU: Great. So schema changes now become a little more of a headache, that if I'm changing the schema for this flights table, now I suddenly need to worry about-- now, instead of just changing one table, I need to update both tables in order to reflect those changes. Other things that could go wrong with this approach or trade-offs that we have to sacrifice in order to get the benefit of this additional query speed? We'll put it this way-- yeah. Go ahead. AUDIENCE: Maybe validation. You might have duplicates in multiple tables. BRIAN YU: Great. So as soon as we start to deal with multiple tables, then there's potential for invalid data, and you might need to worry about making sure that the tables are matching up. You don't want there to be a domestic flight in the international flights table, for instance. And these are the things that you have to start to worry about. When might a query actually be slower in this approach as opposed to the approach with just the single flights table? AUDIENCE: You need to bring all that data back together again somehow if you want to process [INAUDIBLE]. BRIAN YU: Yeah. Sure. Any time you would need to bring data from all these tables together, now your query is actually going to be slower because we have to first query this table and then in a separate query, query the other table. So you might imagine if I wanted a listing of all of the flights leaving a New York City airport, then suddenly I need to worry about not just the domestic flights that are leaving but also the international flights that are leaving. I might need to query both of those tables independently in order to get the information that I can then display. And that might take longer querying two tables than it would with just one. So oftentimes, when horizontally partitioning data, it's a good idea to think about how you're partitioning things: you don't want to partition things in ways where you'll often need information that's in different partitions. You'll often want to partition things in a way such that, with relative frequency, you'll only be querying for things from a single one of those individual horizontal partitions. So there's some design thinking and design decisions that have to go into play as you think about which one of the partitions to look in and how you're going to actually partition that data. Another term you might hear here with regards to scaling databases is database sharding, the idea that, rather than take a single table and split it up into two tables on the same database server, we might actually have multiple database servers where I store domestic flights on one database server and international flights on another database server. What might be a benefit of that? Where I have two independent servers, one of which contains some of the data, the domestic flights, and one of which contains the international flights, as opposed to having them in just two tables on the same server. AUDIENCE: It's not a single point of failure anymore. BRIAN YU: Not a single point of failure, that if I happen to-- if the international database happens to go down, I still have access to the domestic flights. What about that example that I gave before of I want to get all of the flights that are leaving San Francisco airport? AUDIENCE: Yeah. 
So maybe you could process the data faster because each server's going to process its own table [INAUDIBLE]. BRIAN YU: Great. Now I can have some concurrency, that I can query both of the database servers and say, give me all the flights that are leaving San Francisco. And I can have the domestic server running and the international server running simultaneously and then giving me back those results. And so maybe that will help to offset what might initially have been a longer query in order to query these two separate tables. But, of course, it's still going to mean now that I have to deal with the fact that my data is located in different places. And if I ever want to do a SQL join, for instance, if I'm trying to join multiple tables together, now the fact that the tables are located on different servers, that's going to come at a time cost as well. And so as we think about database design and which servers your tables should go on, all of these are things that should come into play. And they're considerations that are going to change depending on the specific needs of your application, depending on how frequently you're going to be accessing one type of data as opposed to another. There are trade-offs to think about and things that you'll have to weigh as you go about making those decisions. Questions about anything with regards to database partitioning, splitting data up? All right. So database partitioning, splitting up data, may help to make data more manageable, and it may help to speed up queries. But it doesn't fully solve that single point of failure problem, the problem of we have two servers that are both trying to talk to the same database that has all the data. Maybe I've partitioned the data to make our queries faster, and so maybe our database can start to handle more and more users than it could before. But we're still dealing with the possibility now that we have a single point of failure where that database can fail, and suddenly nothing's going to work. And we're still dealing with the possibility that we might overload the database if we have 10 different servers that are all trying to access the same database. So what might we do now? AUDIENCE: Wouldn't there be a database backup system somewhere? BRIAN YU: Sure. A database backup system would be a great idea. And we'll often call this database replication, the idea that we don't just want one copy of our data. Maybe we want multiple copies of our data, two or even three different copies of the same database that we can, therefore, help to distribute load across. In the same way that we could distribute load between different servers, we can distribute load between the databases as well. What problems start to come up now that we're duplicating our database? Now we have three different copies of the database. How do we deal with it? AUDIENCE: You're going to get some servers' data-- or databases, yeah, that won't match up. If you have three databases, one database might have more recent data than the other one. BRIAN YU: Great. So now we're dealing with the possibility that server data might not match up with each other. If I have three different databases, what happens if I update one database? What happens to the other two databases, for instance? What happens to that data? So how might we resolve those sorts of problems? That problem of we need to make sure that our three databases are all in sync with each other. 
But, of course, it's still going to mean that I have to deal with the fact that my data is located in different places. And if I ever want to do a SQL join, for instance-- if I'm trying to join multiple tables together-- the fact that the tables are located on different servers is going to come at a time cost as well. And so as we think about database design and which servers your tables should go on, all of these are things that should come into play. And they're considerations that are going to change depending on the specific needs of your application and how frequently you're going to be accessing one type of data as opposed to another. There are trade-offs to think about and things that you'll have to weigh as you go about making those decisions. Questions about anything with regards to database partitioning, splitting data up? All right. So database partitioning, splitting data up, may help to make data more manageable, and it may help to speed up queries. But it doesn't fully solve that single point of failure problem, the problem that we have two servers that are both trying to talk to the same database that has all the data. Maybe I've partitioned the data to make our queries faster, and so maybe our database can start to handle more users than it could before. But we're still dealing with the possibility that we have a single point of failure where that database can fail, and suddenly nothing's going to work. And we're still dealing with the possibility that we might overload the database if we have 10 different servers that are all trying to access the same database. So what might we do now? AUDIENCE: Wouldn't there be a database backup system somewhere? BRIAN YU: Sure. A database backup system would be a great idea. And we'll often call this database replication, the idea that we don't just want one copy of our data. Maybe we want multiple copies of our data-- two or even three different copies of the same database that we can, therefore, help to distribute load across. In the same way that we could distribute load between different servers, we can distribute load between the databases as well. What problems start to come up now that we're duplicating our database? Now we have three different copies of the database. How do we deal with that? AUDIENCE: You're going to get some servers' data-- or databases-- that don't match up. If you have three databases, one database might have more recent data than the other one. BRIAN YU: Great. So now we're dealing with the possibility that the servers' data might not match up with each other. If I have three different databases, what happens when I update one database? What happens to the other two databases, for instance? What happens to that data? So how might we resolve those sorts of problems-- the problem that we need to make sure that our three databases are all in sync with each other? We don't want to have one database have some data and another database not have that data, because then the user's experience will change depending on which of the databases they happen to access. AUDIENCE: I mean, it seems like the database servers are going to have to speak to each other, lock records, update records, so do the same thing that was done to them [INAUDIBLE] the other database servers. BRIAN YU: Great. So there's going to need to be some sort of communication between the servers. And so we'll look at a couple of different models for database replication that are quite common in order to try and deal with these problems: single-primary replication and multi-primary replication. In the single-primary replication model, what we have is, as the name might imply, a single database called our primary database, which would be this database right here. This primary database you can treat like the single database that we had before: you can read data from it, and you can write data to it. And we also have these two databases over here, which we're going to call secondary databases. The idea of the secondary databases is that you can only read data from a secondary database. You can never update or write to a secondary database. You can only ever write-- meaning update a row, add a row, or delete a row-- to the primary database. You can select all the data you want from the other databases, but you can't update, add, or delete rows there. So what's missing from this picture now? What needs to happen any time a write to this database happens? AUDIENCE: [INAUDIBLE] BRIAN YU: Great. We need this update mechanism: whenever we write to our primary database, the primary database needs to update each of the secondary databases, tell them that new data has been added, removed, or updated in some way, in order to make sure that those databases reflect those changes. And so under this model, we're able to implement this idea of replicating the databases and keeping them in sync, because we're only ever able to make changes on this one database over here. And when we do, that database is going to update the secondary databases and make sure they're aware of those changes.
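One way to picture how application code participates in this model is a small routing sketch. Nothing here is tied to a particular library, and the hostnames are made up; the idea is simply that every write goes to the one primary, while reads can be spread across the read-only secondaries.

```python
# Hedged sketch of single-primary routing: writes go to the primary,
# reads are spread across the secondary (read-only) replicas.
import random

PRIMARY = "primary-db.example.internal"      # hypothetical hostname
SECONDARIES = [
    "replica-1.example.internal",            # hypothetical hostnames
    "replica-2.example.internal",
]

def choose_database(query):
    # Anything that modifies data must run on the primary.
    if query.strip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
        return PRIMARY
    # Plain SELECTs can be served by any secondary.
    return random.choice(SECONDARIES)

print(choose_database("SELECT * FROM flights"))             # one of the replicas
print(choose_database("INSERT INTO flights VALUES (...)"))  # the primary
```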
What are some drawbacks of this model? AUDIENCE: Timing. BRIAN YU: Timing, certainly. We might deal with potential race conditions here: if we write some data to this database and we try to read it from some other database before there has been time for that update to happen, that can potentially cause problems. AUDIENCE: It seems like in general reading is going to be pretty good. But writing is what's going to really take a hit, especially because the server that you're writing to has to use its resources to write to the other databases. BRIAN YU: Great. So this model seems pretty good if we're doing a lot of reading from our database. You might imagine that, depending upon the web application, this might be a really good model. Imagine something like a blog or a news website, where most of the time, when someone's accessing the site, they're just reading the stories that are published there. It's not like they're adding new stories constantly-- stories are added far less often than they're read, so reads are happening a lot more frequently than writes. Then this might be a perfectly acceptable model, where we have a lot of databases that we can read from but only one database that we can actually write to, because writes are less common. But if writes are more common, this model starts to look not as good. We still have a single point of failure: if this primary database breaks down, suddenly we're not able to do writes to the database at all. And if a lot of people are trying to write to the database at the same time, we're not able to distribute that load, because the only place where we can do writes is on this primary database over here and not on the secondary databases. And so a solution to that, rather than just using single-primary replication, will be multi-primary replication, which, as the name might suggest, is where, instead of having a single primary database and some number of secondary databases, we have multiple different primary databases, and you can read and write data to each of them. So now it's not just reads that can be distributed across a number of different servers-- it's writes as well. We can add rows, delete rows, and update rows on any of the servers in this multi-primary replication model. So what's the catch? Why might this be more of a challenge, or what are the trade-offs here? AUDIENCE: It seems like if you drew all your arrows that were updating in the back-end there-- BRIAN YU: If we draw all the arrows-- AUDIENCE: So it's like, what is the difference of having three versus one? What would it start to look like? BRIAN YU: Yeah. Sure. So, certainly, once we start to draw all the arrows, all the updates that have to happen between all the different databases, going in both directions-- server one needs to update server three, and three needs to update one-- this picture starts to get complicated. And it starts to introduce potential problems. Certainly, one is that as we have more and more databases, we need more and more of these updates happening between them. And what other problems can come up now that we have all of these different updates that are all trying to update each other? Yeah. AUDIENCE: If someone is trying to write to two databases at the same time, then you might have duplicate information that doesn't match up. BRIAN YU: Great. What happens if two users are trying to make updates to two different databases at the same time, and both databases register those updates and then try to update each other? There are a number of different ways that these conflicts can come about. One is a primary key conflict. Imagine there are 27 users inside of a users table, and one user registers on this database while another user registers on that database at the same time. Well, the user over here gets added to the users table: there were 27 users before, so this new user is going to be user number 28. And over here, this database also sees that there are 27 users, so it's also going to say, this is now user number 28. Now we have two different users that both have ID number 28. And so when all the updates happen and the databases try to sync up with each other, we're going to run into problems, because now we have two rows that have the same ID field. And that's not allowable, because our ID field, presumably, is supposed to be unique. So that's one potential problem.
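The lecture doesn't prescribe a fix for this particular conflict, but for context, one common technique in multi-primary setups is to replace sequential counters with globally unique identifiers, so that two primaries can accept registrations independently without ever handing out the same ID. A minimal, hypothetical sketch:

```python
# Hypothetical sketch: generate IDs that are unique across servers instead
# of relying on each database's own auto-incrementing counter.
import uuid

def new_user_id():
    # A random UUID is, for practical purposes, collision-free even when
    # two primary databases are assigning IDs at the same time.
    return str(uuid.uuid4())

print(new_user_id())  # e.g. 'c0ffee12-...' rather than 28 on both servers
```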
Other potential problems include two different databases trying to update the same row at the same time. If both are trying to change the duration of a flight, for instance, and one wants to change it to 120 minutes while the other is trying to change it to 150 minutes, which one of those databases should we listen to? And all sorts of other problems could come up. If someone tries to delete a row at the same time that someone else is trying to edit that same row, should the edit take precedence over the delete and keep the row? Or do we delete it and ignore the edit? All of these are conflicts that, ultimately, whatever multi-primary replication system you're using needs to have rules for dealing with-- some systematic way of saying, all right, if these two edits happen at the same time, here is the mechanism for resolving them. Maybe if they're editing different columns, then it's fine: just apply both updates to that row. But if they're editing the same column of the same row, then maybe check the time at which each edit happened and go with the more recent one. And so there are any number of different rules that come into play, and they can get increasingly complex or sophisticated. But the idea is that the additional complexity we face with multi-primary replication is that we need some mechanism for resolving those conflicts. We need some way of saying, if two databases are trying to perform updates and those updates conflict with each other, how should we deal with them? And we need rules for how to do that.
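Here's a hedged sketch of exactly the kind of rule just described: apply both edits if they touch different columns, and fall back to "last write wins" if they touch the same column. The dictionary shape of an update is invented purely for illustration; real replication systems represent changes very differently.

```python
# Hypothetical sketch of a conflict-resolution rule for two concurrent
# updates to the same row in a multi-primary setup.
def resolve(update_a, update_b):
    # Each update is assumed to look like:
    # {"column": "duration", "value": 120, "timestamp": 1700000000.0}
    if update_a["column"] != update_b["column"]:
        return [update_a, update_b]  # different columns: keep both edits
    # Same column of the same row: "last write wins" by timestamp.
    return [max(update_a, update_b, key=lambda u: u["timestamp"])]

kept = resolve(
    {"column": "duration", "value": 120, "timestamp": 1700000000.0},
    {"column": "duration", "value": 150, "timestamp": 1700000005.0},
)
print(kept)  # keeps only the 150-minute update, the more recent of the two
```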
Questions about either single-primary or multi-primary replication, or the idea of database replication in general? OK. So one more topic that we'll talk about with regards to trying to scale up data is caching. And this is something that becomes very useful as data begins to scale. This is all about trying to avoid spending too much time doing things that we've already done. So the idea of caching is taking data and information and storing it in some temporary place for use later. You might imagine that on The New York Times website, for instance, the home page of The New York Times probably isn't changing too much from one second to the next. Sure, after some number of minutes, a new article might go up, and the front page might change. But if you load the page and then refresh the page, the page that you get is, in all likelihood, going to be the exact same page. And it probably wouldn't make a whole lot of sense, then, for The New York Times to go to its database every time someone requests the front page, look up the most popular recent articles, the latest images, and the trending news, and regenerate that whole page-- and to do that for every user, on every single request. And so what might be a good idea is introducing some idea of caching: saving what the front page looked like, such that if a user comes back in a couple of seconds and the page hasn't changed, we can go ahead and just present the same page. So there are multiple different ways by which caching can happen. Caching can exist on the client side and the server side, and we'll look at both of them. We'll start with client-side caching, which is something you might already have some familiarity with: if you've been working with JavaScript files in Project 2, for instance, and you've made edits to your JavaScript file, and then you check your web page, what sometimes happens? Yeah. AUDIENCE: You still get the old JavaScript. BRIAN YU: You still get the old JavaScript file. Right. Even though you've made changes to your JavaScript file and you've saved those changes, when you refresh the page in your web browser or go back to the page that's supposed to have the new JavaScript, you still get the old JavaScript. And the reason for that is client-side caching. Your web browser has saved the old JavaScript file, and it's just assuming that that file probably hasn't changed. Therefore, rather than go through the additional time expense of asking the server to send the JavaScript file and waiting to get it back, it just looks locally in its own cache, which is faster to access, and uses that JavaScript file instead. And while that might be an annoying case of caching, in practice it's actually quite helpful whenever we have some resource that's going to persist for some amount of time-- something we want to be kept inside of the cache because it's probably not going to change too often. And so inside of an HTTP response, when the web server responds back to the user, it presents the body of the response, which contains the page to actually load. But the server also responds with HTTP headers-- information about the response that the client web browser, whether it's Chrome or Safari or something else, knows how to interpret and understand. And one of those headers might be this Cache-Control header. In the most basic case, what the Cache-Control header is allowed to do is set a maximum age for the page. In other words, it specifies that after this number of seconds-- in this case, one day, I believe-- if you're requesting the page again, you should actually go see whether something has changed. But within this amount of time, the page probably hasn't changed, so don't worry about trying to access it again if you're loading the same page-- just use the cached version of it. And so by putting a line like this inside of your HTTP headers-- and web frameworks like Flask and Django have ways of letting you edit what goes into the headers; you can look at their documentation for how to do that-- you can say to the web browser, go ahead and save this page in the cache for a day or so, such that if you come back in a couple of hours, there's no need to contact the server again, which would just add unnecessary load. Just go ahead and load that same page.
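As a concrete illustration, here's a minimal Flask sketch of setting that header on a response. The route and page content are made up; the only point is the Cache-Control: max-age=86400 header, which tells the browser it may reuse this response for up to a day (86,400 seconds).

```python
# Minimal Flask sketch: mark a response as cacheable by the browser for one day.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/")
def index():
    response = make_response("<h1>Welcome to the front page!</h1>")
    response.headers["Cache-Control"] = "max-age=86400"  # 86,400 seconds = 1 day
    return response

if __name__ == "__main__":
    app.run()
```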
So what problems can happen here with caching on the client side, with having the web browser cache the page? Yeah. AUDIENCE: If the page changes sooner, then you wouldn't know. BRIAN YU: Yeah. Sure. Certainly, if the page changes sooner than this amount of time, then when the user tries to go back to that page, there's a good chance that they're still going to get the old page-- it will load very quickly, but whatever the newer version was, they're not going to see it. And, certainly, there are ways around this. You can hard refresh the page, which usually is going to bypass the cache and really try to access the page by actually talking to the server. So that's something you can do as well. But what about the case where this is saying you can cache the page for a day, and the next day the page hasn't changed-- or three days later the page still hasn't changed? Under this model, we would still go back to the server and say, a day's up, so I can't use the cached page anymore; I need to go to the server and ask for it again. But imagine it's some big file-- a video or some other large file that might take a long time to download. It wouldn't make a whole lot of sense to redownload it again and again just because the cache expired. So what's a way that we might be able to enforce this idea that the server can send new data if there's been a change but doesn't need to otherwise? Yeah. AUDIENCE: Have some ID, so that every time the server makes a change it can, for example, increment that ID, and then just use the headers first to see if your headers match up, and if so, don't [INAUDIBLE]. BRIAN YU: Great. So we can use some kind of identifier that's associated with the resource, whether it's a web page or a video or something else, where any time that resource is updated, we update that header. And in HTTP, we call that header an ETag, or entity tag, which can just be a long hexadecimal sequence-- a sequence of numbers and characters-- that is uniquely associated with a particular version of a resource, such that if the resource gets updated-- I update the page, or I update the video-- then this ETag is going to change. So now how can we use the ETag to implement caching, to implement the idea that I don't need to redownload the page every time? What can I do? Yeah. AUDIENCE: Every time you're doing a GET request to a server, send the ETag that you have, and then the server, if it matches up, will tell you there's no need to reload. Otherwise, it will send you the new file. BRIAN YU: Great. So when the user is trying to request the page, the user can send along the ETag with the request and say, here is the ETag, here's the version of the resource that I have. And the server can look at that ETag and ask, does this match up with the latest version of the resource that I have on the server? And if it does match up, then rather than send the whole contents of the page or the whole video again, the server can just respond with a 304 status code, which stands for Not Modified, just to say there's been no change to the content you're trying to request. Go ahead and just use your cached version-- it's still fresh, and it's not stale, as we'll often say with regards to caching. And the result is that the response can happen quickly, the server doesn't get too loaded, and the client knows the ETag is the same, so the resource is the same, and it can just use the version in the cache. Of course, on the flip side of things, if the user sends along an ETag with the request saying, I'm requesting this page, and here's the ETag from the last time that I visited, and the page has changed, then the server detects that this ETag is different from the ETag of the latest version of the resource. Now the server knows that we need to give the user a fresh copy of that resource, and the server can do that processing, get the resource, and deliver it to the user.
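Here's a hedged Flask sketch of that exchange. The page contents and the way the ETag is computed (a hash of the body) are just illustrative; the point is the comparison against the If-None-Match header that the browser sends, and the 304 response when nothing has changed.

```python
# Hedged Flask sketch of ETag-based caching: return 304 Not Modified when the
# client's cached version is still current, otherwise send a fresh copy.
import hashlib
from flask import Flask, make_response, request

app = Flask(__name__)

def render_front_page():
    return "<h1>Front page</h1>"  # stand-in for the real, expensive rendering

@app.route("/")
def index():
    body = render_front_page()
    etag = hashlib.sha256(body.encode()).hexdigest()  # changes whenever the page changes

    # The browser sends the ETag of its cached copy in the If-None-Match header.
    if request.headers.get("If-None-Match") == etag:
        return "", 304  # cached copy is still fresh; no body needed

    response = make_response(body)
    response.headers["ETag"] = etag
    return response
```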
And so this client-side caching serves two benefits, really. Number one, it's faster for the user: from the user's perspective, the resource often loads faster because it's loading from their own computer as opposed to having to be transferred over the internet from a server to the client. And on the other side, it helps from the load perspective: if you have hundreds of users who are all trying to access your server and your database at the same time, any time you can tell some subset of those users, you don't really need to access the server or the database-- you can just use a version of the site you already have-- that's going to be a benefit to you. That's less load on your website, and that's going to help you as you think about scaling it. Questions about client-side caching? All right. So let's talk about server-side caching, which is another place where caches can be. Caches can exist all throughout this entire process, whether they're large or small, and they can be located in different places. One thing we didn't mention yet: imagine that you have some cache that's working for your local network, for instance-- your computer and other computers that are all connected to the same network. What could go wrong with something like Cache-Control or the ETag? When might you not want a page to be cached? AUDIENCE: Security. BRIAN YU: Security reasons, sure. You might imagine that there's a difference between public pages and private pages. If facebook.com, for instance, were something that were just consistently cached, and I visited Facebook and saw my news feed, I wouldn't want someone else on my network, or someone else using my computer, to go to Facebook on their own account and see the same content that I just saw because it was pulling from the cache. And so inside this Cache-Control header, you additionally have the option of specifying whether you want the page to be cached publicly, meaning any cache can store it, or privately, meaning only the individual user's own browser should cache it-- which is what you want for pages that require authentication. So there are additional settings inside of Cache-Control that you can use to make sure the cache is behaving the way that you want it to behave. We won't go into too many of the details here, but know that you have that kind of control and flexibility over the cache just by setting it inside of the headers of the HTTP response that you're sending back to the user.
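For instance, a personalized page might be marked private so that only the user's own browser, and not a shared cache on the network, is allowed to store it. Another minimal Flask sketch, with a made-up route and page content:

```python
# Minimal Flask sketch: a personalized page marked "private" so shared caches
# won't store it, with a short max-age for the browser's own cache.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/feed")
def feed():
    response = make_response("<h1>Your personalized news feed</h1>")
    response.headers["Cache-Control"] = "private, max-age=300"  # 5 minutes
    return response
```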
But back to server-side caching. The idea of server-side caching is that, instead of having the cache stored locally on the user's computer inside of Chrome or Safari or whatever web browser they're using, we can add to our model here: in addition to having servers that are all talking to the database, all of the servers are now also connected to a cache. Now, of course, we have a whole bunch of new single points of failure here-- our database is a single point of failure, and so is our cache. But why might we want to add a cache to this picture? It certainly complicates the picture. What benefit do we get from it? Yeah, sure. AUDIENCE: So if something is requested a lot, then you can write it to the cache so you don't have to hit the database, because the cache is faster than the database. BRIAN YU: Great. So the cache is likely going to be faster than the database in certain respects, usually because if we're trying to render something complicated-- like the front page of The New York Times, or imagine Amazon has a page that shows the most popular books-- it might take a fair amount of energy and computational resources to query the most popular books from the database. Right? There's some algorithm involved whereby it's going to look at books that have been purchased frequently and recently. So it might need to look at orders and look at different books, and multiple different tables might have to be queried in order to figure out the top 10 most popular books. And that's going to be an expensive database operation. Whereas once you've gotten those 10 most popular books the first time, one thing you can do is take all that information, put it inside the cache, and store it there, such that on future requests, if a user comes by within the next couple of seconds and says, I want to see the Amazon home page and what the 10 most popular books are, then rather than repeat those queries and go back to the database, we can just look to the cache, where we've stored-- potentially in a file somewhere-- the 10 most popular books, and use that cached information to display the page back to the user. So what drawbacks come up there? What are the trade-offs we face when we do that? AUDIENCE: You need to take care of when to update the cache. BRIAN YU: Great. So any time we're dealing with a cache, we always have these issues of cache invalidation. What happens when data inside of the database is more recent than data inside of the cache? How do we deal with that? And there are multiple ways that we could. What are some ideas for dealing with the problem where maybe the 10 most popular books in the cache are no longer valid, because a bunch of people bought book number 11 and now that's the new 10th most popular book? What's in the cache is no longer valid. Strategies? There are multiple. Yeah. AUDIENCE: So you don't care about the reads. But if someone writes to the database, then you can update the cache [INAUDIBLE]. BRIAN YU: Great. So we could add logic that says that when we're writing to the database-- if we place a new order-- then we should also make sure that the cache gets updated: we invalidate any old information in the cache and get rid of it, such that the next time a user makes a request, we compute it anew. Depending on the system that you have and what types of reads and writes you're doing, that may or may not be feasible. In the case of the 10 most popular books, you probably don't want to invalidate the cache every single time anyone purchases a book. But, certainly, you can think of heuristics that we might employ in order to help make that process easier.
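To make those two pieces concrete, here's a hedged sketch of a server-side cache: reads go through the cache, and a write that could change the result throws the cached entry away. The in-memory dictionary stands in for whatever cache you'd actually use (Redis, memcached, a file on disk), and the query functions are placeholders.

```python
# Hedged sketch: cache-aside reads plus invalidate-on-write for a
# "10 most popular books" result.
CACHE = {}  # stand-in for Redis, memcached, or a file-based cache

def popular_books():
    if "popular_books" not in CACHE:
        CACHE["popular_books"] = expensive_ranking_query()  # cache miss
    return CACHE["popular_books"]                           # cache hit

def place_order(book_id):
    save_order_to_database(book_id)
    # The ranking may have changed, so discard the cached copy;
    # the next read will recompute it.
    CACHE.pop("popular_books", None)

def expensive_ranking_query():
    return ["Book A", "Book B", "Book C"]  # placeholder for the real multi-table query

def save_order_to_database(book_id):
    pass  # placeholder for the real INSERT
```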
How else might we implement cache invalidation, this idea that if we have data in the cache, then at some point that data is no longer going to be valid? AUDIENCE: Couldn't you do a similar thing with the ID? Like, the cache could store an ID for a particular set of data. And so then when somebody requests that data from the cache, it checks the database to see if it needs to get an updated version. BRIAN YU: Great. So we could have some mechanism via which the server checks the database-- using some identifier that updates, for instance-- to decide whether we actually need to rerun the full query or whether we can just go to the cache. And maybe that check is much less expensive on the database: it's not performing the full query, just a quick check to see whether there's anything we might need to change; otherwise, we can still go to the cache. That's certainly an option too. And there are many other ways that we could potentially deal with the problem of cache invalidation. One common way is just to effectively ignore the problem: set an expiration time on the cache and say, this list of the 10 most popular books will expire after 12 hours, for instance. And it's probably not a big deal if a new book comes into the top 10 and the list isn't updated right away. If you don't care that much and you're OK with the cache being a little bit out of date, then that's fine, so long as you have some sort of expiration time on the cache to say, after X number of minutes or hours or days, we should invalidate it and check the database again to see what the latest information is.
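A minimal sketch of that expiration-time approach, with the same caveat that the dictionary is only a stand-in for a real cache: each entry remembers when it was stored, and it's reused only while it's younger than the time-to-live.

```python
# Minimal sketch: time-based cache expiration ("accept slightly stale data").
import time

CACHE = {}
TTL_SECONDS = 12 * 60 * 60  # expire entries after 12 hours

def get_cached(key, compute):
    entry = CACHE.get(key)
    if entry and time.time() - entry["stored_at"] < TTL_SECONDS:
        return entry["value"]  # still fresh enough; skip the database entirely
    value = compute()          # expired or missing: recompute and re-store
    CACHE[key] = {"value": value, "stored_at": time.time()}
    return value

# Example usage, with a placeholder standing in for the expensive query:
books = get_cached("popular_books", lambda: ["Book A", "Book B", "Book C"])
```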
And so the big takeaway from all of this-- whether we're talking about caching, about database scalability in terms of partitioning or replicating the data, or about how we're going to load balance our servers, whether we're scaling them vertically or horizontally or some combination of all of these things-- is that there are trade-offs with each of those decisions. And we have to make these design decisions based on the needs of our particular system and our particular web application: whether there are more writes than reads, and what sorts of operations are commonly happening. And so one of the goals today was really to get across the idea that, with all of these different moving parts, we can think critically about the design decisions we make-- how we choose to design our system in a way that is scalable based on the specific needs of our system or our web application. Questions about any of those things so far? Yeah. AUDIENCE: Yeah. I have a question about, basically, the cache. Shouldn't the cache be memory on the server? Or is it actually its own hardware [INAUDIBLE]? BRIAN YU: Great question. So what form does the cache actually take? Certainly, there are a bunch of different forms that that cache can take. Oftentimes we might have a smaller cache that's actually physically located on the server, where we wouldn't need to talk to something external. But there are other cases where you might want an external cache. Well, what's one benefit that an external cache gives you instead of just storing a cache on one server? AUDIENCE: More space [INAUDIBLE]. BRIAN YU: More space, certainly. So an external cache might be able to store large amounts of data, and usually it's just going to be, basically, hard disk storage that you can access easily and very quickly. Amazon Web Services, for instance, offers S3, which is effectively a service that's just a big hard drive in the cloud where you can store files, and it's often used for caching purposes. What's another benefit of using an external cache located on some separate hard drive somewhere-- one that's not within any one of the servers but that all of the servers talk to? We talked about the drawback, which is that it takes longer to talk to the cache. But what's a benefit? AUDIENCE: One primary source. BRIAN YU: Yeah. It's one primary source that all of the servers have access to: if server number one has cached the 10 most popular books and someone on server two tries to access the 10 most popular books, they can use that same cache. Whereas if all the caches are only stored on the individual servers, each of those servers needs to independently generate and maintain its own cache, with all the issues that come with that. In reality, though, most web applications that begin to scale larger and larger have many different caches. They will have caches on the server for quicker things that need to happen, and maybe a separate cache to deal with larger files that need to be cached, for instance. And they'll use some combination of all of these caching techniques in order to get the best of both worlds, ideally: quick access on the server to the things we need quickly, and an off-site cache for information that we want all the servers to be able to access. So usually you'll see some combination of all of these things in practice as real web applications begin to scale. Yep. AUDIENCE: Question online. What would be a good way of estimating the number of servers, databases, load balancers, and caches that you would need for an application? BRIAN YU: Great question. So what's a good way to estimate how much you would actually need? We talked a little bit at the very beginning about benchmarking, about the process of testing to see how much load the server can actually take. And there are a number of different pieces of software that you can use in order to perform that benchmarking-- I believe ApacheBench is one common piece of benchmarking software that you can use.
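As a toy illustration of what a tool like that measures-- this is not a replacement for ApacheBench or any real load-testing tool-- here's a sketch that fires a batch of concurrent requests at a URL and reports requests per second. The URL and the request counts are made up, and you'd point something like this only at a test deployment you own.

```python
# Toy benchmarking sketch: send concurrent requests and measure throughput.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/"  # hypothetical server under test
TOTAL_REQUESTS = 200
CONCURRENCY = 20

def fetch(_):
    with urllib.request.urlopen(URL) as response:
        response.read()

start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    list(pool.map(fetch, range(TOTAL_REQUESTS)))
elapsed = time.time() - start

print(f"{TOTAL_REQUESTS / elapsed:.1f} requests per second")
```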
And, also, one good strategy, if you're using cloud computing, is to look to the documentation of wherever you're getting those servers from, which will likely include information about the processing power and the memory of those machines. In the AWS case, for instance, one of the more popular server tools is EC2, Elastic Compute Cloud, which is just the service that AWS offers that lets you effectively rent servers in the cloud and run servers like these that might be connected to a load balancer, for instance. And they come in different sizes, with different names-- different letters and numbers, like smalls and mediums and larges-- and you can look on their website to see what each one of those servers means and how much computing power it has. Using that, you can begin to gauge which one you need for your particular purposes. All right. So that was just a brief introduction to some of the high-level concepts that come into play as you think about how to scale your application. When I come back next time, we'll be talking about security in much the same way: as we take our applications, begin to scale them, deploy them to the internet, and have them used by many different users, how do we make sure that our software is secure from adversaries that might be trying to attack the website, for instance? And what considerations go into that? And so that's it for today, for Web Programming with Python and JavaScript. Thank you all so much.