WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.500 [MUSIC PLAYING] 00:00:17.457 --> 00:00:19.290 BRIAN YU: All right, welcome back, everyone, 00:00:19.290 --> 00:00:21.570 to Web Programming with Python and JavaScript. 00:00:21.570 --> 00:00:25.750 And for our final topic, we're going to explore scalability and security. 00:00:25.750 --> 00:00:28.470 So far in the class, we've been building web applications. 00:00:28.470 --> 00:00:31.635 And we've been building web applications that work on our own computer. 00:00:31.635 --> 00:00:33.510 But if we want to take those web applications 00:00:33.510 --> 00:00:36.000 and deploy them to the world so people all across the internet 00:00:36.000 --> 00:00:37.958 can begin to use them, then we're going to need 00:00:37.958 --> 00:00:40.980 to host our web application on some sort of web server-- 00:00:40.980 --> 00:00:44.192 some dedicated piece of hardware that is listening for web requests 00:00:44.192 --> 00:00:46.650 and responding to them with the response that we would like 00:00:46.650 --> 00:00:48.660 for our web application to deliver. 00:00:48.660 --> 00:00:51.030 And when we do so, this introduces a whole bunch 00:00:51.030 --> 00:00:54.338 of interesting issues surrounding scalability and security. 00:00:54.338 --> 00:00:56.130 So we'll take a look at these issues today, 00:00:56.130 --> 00:00:59.970 beginning with problems concerning scalability-- what those problems are 00:00:59.970 --> 00:01:02.650 and how we might go about addressing them. 00:01:02.650 --> 00:01:04.410 So when we deploy our web applications, we 00:01:04.410 --> 00:01:06.720 deploy them by putting them onto a web server 00:01:06.720 --> 00:01:08.970 that I'm, here, just representing with this rectangle. 00:01:08.970 --> 00:01:12.840 But all the server is is some dedicated computer, some piece of hardware that 00:01:12.840 --> 00:01:14.620 is listening for incoming requests. 
00:01:14.620 --> 00:01:18.750 So we'll draw this line to represent an incoming web request from a user. 00:01:18.750 --> 00:01:21.660 The server takes that request and responds to it. 00:01:21.660 --> 00:01:23.880 But ultimately, our web application isn't just 00:01:23.880 --> 00:01:25.530 going to be servicing one user. 00:01:25.530 --> 00:01:28.080 If it becomes popular, it might have many users 00:01:28.080 --> 00:01:31.560 that are all trying to connect to that server at the same time. 00:01:31.560 --> 00:01:34.790 And as multiple people start to connect to that server at the same time, 00:01:34.790 --> 00:01:37.560 here is where we start to deal with issues of scalability. 00:01:37.560 --> 00:01:41.040 A single computer or a single server can only service so many users 00:01:41.040 --> 00:01:42.273 at any given time. 00:01:42.273 --> 00:01:44.190 And so, therefore, we need to think in advance 00:01:44.190 --> 00:01:47.640 about how we're going to deal with those issues of scale. 00:01:47.640 --> 00:01:49.920 But the first question, before we even get there, 00:01:49.920 --> 00:01:52.320 is where these servers actually exist. 00:01:52.320 --> 00:01:56.010 And nowadays, there are two main options for where these servers can exist. 00:01:56.010 --> 00:02:00.210 These servers can be on the cloud or they can be on premise. 00:02:00.210 --> 00:02:02.400 And on-premise servers, you might imagine 00:02:02.400 --> 00:02:05.160 is if a company is running their own web application. 00:02:05.160 --> 00:02:08.340 On-premise servers are servers that are inside of the company's walls. 00:02:08.340 --> 00:02:10.710 The company owns the physical servers, maybe 00:02:10.710 --> 00:02:12.840 on some server racks inside of a room. 00:02:12.840 --> 00:02:14.970 And therefore, they have very direct control 00:02:14.970 --> 00:02:17.940 over all of the servers-- exactly what kind of servers they are, 00:02:17.940 --> 00:02:19.830 exactly what software is running on them. 
00:02:19.830 --> 00:02:23.280 They can go and physically look at the servers and debug them, if need be, 00:02:23.280 --> 00:02:25.830 in order to make sure that any issues are dealt with. 00:02:25.830 --> 00:02:28.170 But increasingly, we're starting to move into a world 00:02:28.170 --> 00:02:31.170 where cloud computing is becoming increasingly popular. 00:02:31.170 --> 00:02:35.190 In cloud computing, rather than have dedicated servers that are on premise, 00:02:35.190 --> 00:02:37.290 we have servers that are somewhere in the cloud 00:02:37.290 --> 00:02:40.950 where cloud computing companies like Amazon, or Google, or Microsoft 00:02:40.950 --> 00:02:42.720 are able to run their own servers. 00:02:42.720 --> 00:02:46.860 And we simply use those servers that are provided by those third parties, 00:02:46.860 --> 00:02:50.130 whether it's Amazon, or Google, or Microsoft, or someone else. 00:02:50.130 --> 00:02:51.330 And there are trade offs. 00:02:51.330 --> 00:02:54.950 With cloud computing, we no longer have as direct control over the machines 00:02:54.950 --> 00:02:56.700 themselves because they're not on premise. 00:02:56.700 --> 00:02:59.190 We can't physically manipulate those computers. 00:02:59.190 --> 00:03:01.620 But we have the advantage of not having to worry 00:03:01.620 --> 00:03:05.070 about dealing with physical objects that are inside 00:03:05.070 --> 00:03:08.280 of the premise of the company whose servers we'd like to run code for. 00:03:08.280 --> 00:03:10.770 When it's on the cloud, everything is managed externally 00:03:10.770 --> 00:03:14.205 by some other company, and we can simply use the servers that we need to. 00:03:14.205 --> 00:03:16.830 And we'll see that this lends itself to other benefits as well. 
00:03:16.830 --> 00:03:20.490 As we might need more servers, as we start to get more sophisticated web 00:03:20.490 --> 00:03:24.120 applications that need more users, these cloud-computing companies 00:03:24.120 --> 00:03:26.220 can allow us to create web applications that 00:03:26.220 --> 00:03:29.280 are able to scale across multiple different servers 00:03:29.280 --> 00:03:31.910 as we start to get more and more users. 00:03:31.910 --> 00:03:35.460 But we'll discuss those issues of scale as we get to them. 00:03:35.460 --> 00:03:37.890 The question we need to ask after we have these servers-- 00:03:37.890 --> 00:03:40.348 whether they're servers that are on premise or servers that 00:03:40.348 --> 00:03:42.240 are operating somewhere in the cloud-- 00:03:42.240 --> 00:03:47.328 is, how many users can the server actually service at any given time? 00:03:47.328 --> 00:03:48.370 And that's going to vary. 00:03:48.370 --> 00:03:51.300 It's going to vary based on the size of the server, the computing 00:03:51.300 --> 00:03:52.470 power of the server. 00:03:52.470 --> 00:03:56.250 And it's going to be dependent upon how long it takes to process 00:03:56.250 --> 00:03:58.110 any particular user's request. 00:03:58.110 --> 00:04:00.420 If user requests are quite expensive, it might 00:04:00.420 --> 00:04:03.870 mean that there are fewer users that can be serviced at any given time. 00:04:03.870 --> 00:04:05.880 And it's for that reason that a helpful tool 00:04:05.880 --> 00:04:08.850 is to do some kind of benchmarking, some process of trying 00:04:08.850 --> 00:04:12.630 to do some analysis on how many users a server can actually 00:04:12.630 --> 00:04:14.730 be handling at any particular time. 00:04:14.730 --> 00:04:16.950 And there are numerous different tools that allow 00:04:16.950 --> 00:04:18.779 us to do this kind of benchmarking. 
00:04:18.779 --> 00:04:22.470 Apache Bench, otherwise known as ab, is a popular tool 00:04:22.470 --> 00:04:24.250 for doing this kind of thing. 00:04:24.250 --> 00:04:28.290 But benchmarking is going to be useful so that we know how many users one 00:04:28.290 --> 00:04:29.550 particular server can handle. 00:04:29.550 --> 00:04:31.290 Maybe it can handle 50 users. 00:04:31.290 --> 00:04:32.700 Maybe it can handle 100 users. 00:04:32.700 --> 00:04:35.160 Maybe it can handle more at any given time. 00:04:35.160 --> 00:04:37.830 But ultimately, it's going to be some finite limit. 00:04:37.830 --> 00:04:40.680 Every computer just has some finite amount of resources, 00:04:40.680 --> 00:04:42.030 and servers are no exception. 00:04:42.030 --> 00:04:45.360 There's going to be some number of users after which the server is not 00:04:45.360 --> 00:04:47.020 going to be able to handle it. 00:04:47.020 --> 00:04:48.850 So what do we do in that situation? 00:04:48.850 --> 00:04:53.130 What do we do if our server can only handle 100 users at any given time, 00:04:53.130 --> 00:04:58.020 but 101 users are trying to use our web application at the same time? 00:04:58.020 --> 00:04:59.440 Something needs to change. 00:04:59.440 --> 00:05:01.740 We need to deal with some sort of scaling 00:05:01.740 --> 00:05:04.500 to make sure that our web application can scale. 00:05:04.500 --> 00:05:07.770 And there are a couple of different types of scaling that we can try. 00:05:07.770 --> 00:05:10.530 One approach is to do what's called vertical scaling, which 00:05:10.530 --> 00:05:12.780 might be the simplest way you could imagine scaling. 00:05:12.780 --> 00:05:15.900 If this server is not good enough for handling the number of users 00:05:15.900 --> 00:05:18.890 that we need it to handle, well, just get a bigger server. 
00:05:18.890 --> 00:05:21.260 In vertical scaling, we just take the server 00:05:21.260 --> 00:05:23.930 and get a bigger server, a more powerful server, 00:05:23.930 --> 00:05:26.480 a server that can handle more users at any given time. 00:05:26.480 --> 00:05:27.730 It's going to cost more. 00:05:27.730 --> 00:05:29.480 But if we need it to handle more users, we 00:05:29.480 --> 00:05:33.110 can just get a bigger server to be able to deal with that problem. 00:05:33.110 --> 00:05:34.607 This approach is fairly simple. 00:05:34.607 --> 00:05:37.190 It just involves swapping out one server for another, one that 00:05:37.190 --> 00:05:39.410 can handle more users concurrently. 00:05:39.410 --> 00:05:40.830 But it also has drawbacks. 00:05:40.830 --> 00:05:44.330 There is some limit to how big the server can be, to how many users 00:05:44.330 --> 00:05:47.390 any physical one server is going to be able to handle because there's 00:05:47.390 --> 00:05:50.870 a physical limitation on what is the biggest, fastest, most powerful 00:05:50.870 --> 00:05:53.310 server we could possibly get. 00:05:53.310 --> 00:05:55.970 So when vertical scaling ends up not being enough, 00:05:55.970 --> 00:05:59.720 an alternative-- as you might imagine-- is what's known as horizontal scaling. 00:05:59.720 --> 00:06:01.970 And the idea behind horizontal scaling is 00:06:01.970 --> 00:06:06.560 that, when one server isn't enough to be able to service all of the users that 00:06:06.560 --> 00:06:10.070 might be trying to use a web application at the same time, well, 00:06:10.070 --> 00:06:13.010 then we can take the approach of saying, well, rather than just using 00:06:13.010 --> 00:06:17.840 one server, let's go ahead and split it up into two different servers. 00:06:17.840 --> 00:06:21.420 We now have two servers that are both running the web application. 
00:06:21.420 --> 00:06:24.980 And now, effectively, we've been able to double the number of users 00:06:24.980 --> 00:06:26.600 that this web application can handle. 00:06:26.600 --> 00:06:29.690 Rather than just a single server that can service 100 users, 00:06:29.690 --> 00:06:33.200 if we have two of them, now we can service 200 users at any given time 00:06:33.200 --> 00:06:37.670 if you imagine 100 of them using server A over here and 100 of them 00:06:37.670 --> 00:06:40.460 using server B over there. 00:06:40.460 --> 00:06:44.220 But this then lends itself to some other questions that we have to answer, 00:06:44.220 --> 00:06:47.630 which is, how do these servers get their users in the first place? 00:06:47.630 --> 00:06:50.450 When a user requests a web page, how does that user 00:06:50.450 --> 00:06:54.140 get directed either to server A or to server B? 00:06:54.140 --> 00:06:57.980 It seems that they need some way to make that decision in order to decide 00:06:57.980 --> 00:07:00.690 whether to go one direction or another. 00:07:00.690 --> 00:07:04.010 And it's for that reason that we might introduce another piece of hardware 00:07:04.010 --> 00:07:05.240 into this picture. 00:07:05.240 --> 00:07:09.070 And that additional piece of hardware is what we might call a load balancer. 00:07:09.070 --> 00:07:11.510 And a load balancer is just another piece of hardware 00:07:11.510 --> 00:07:14.910 that is going to sit in front of these servers, so to speak. 00:07:14.910 --> 00:07:17.660 In other words, when a user makes a request to a web page, 00:07:17.660 --> 00:07:21.170 rather than immediately getting that request to one of these web servers, 00:07:21.170 --> 00:07:25.250 the request is first going to go through this load balancer 00:07:25.250 --> 00:07:27.800 where the request first comes into the load balancer. 
00:07:27.800 --> 00:07:31.160 And the load balancer then decides whether to send that request to server 00:07:31.160 --> 00:07:35.330 A or to send that request to server B. And this process 00:07:35.330 --> 00:07:38.300 is likely less expensive than actually dealing with and processing 00:07:38.300 --> 00:07:39.330 that request. 00:07:39.330 --> 00:07:42.440 So the load balancer is effectively just acting as a dispatcher. 00:07:42.440 --> 00:07:44.310 It waits for those requests to come in. 00:07:44.310 --> 00:07:46.670 And when the requests do come in, the load balancer 00:07:46.670 --> 00:07:49.628 directs those requests either to go to one server or to another. 00:07:49.628 --> 00:07:52.670 And you might imagine the story where we have more than just two servers. 00:07:52.670 --> 00:07:54.260 Maybe we have many servers. 00:07:54.260 --> 00:07:56.660 And the load balancer is just going to balance 00:07:56.660 --> 00:07:59.030 between all of those different servers. 00:07:59.030 --> 00:08:02.570 And this process of deciding which server to send a request to 00:08:02.570 --> 00:08:05.840 is known as load balancing, which is what the load balancer is ultimately 00:08:05.840 --> 00:08:06.618 doing. 00:08:06.618 --> 00:08:09.410 And there are various different methods that you might use in order 00:08:09.410 --> 00:08:11.042 to perform this load balancing. 00:08:11.042 --> 00:08:13.250 So you might imagine thinking about this intuitively. 00:08:13.250 --> 00:08:16.490 How would the load balancer decide, given some request, 00:08:16.490 --> 00:08:19.220 whether to send the request to this server 00:08:19.220 --> 00:08:22.910 or to send the request to some other server instead? 00:08:22.910 --> 00:08:26.120 And there are many different approaches that our load balancer might take. 00:08:26.120 --> 00:08:27.440 And here are just a couple. 00:08:27.440 --> 00:08:30.230 Random choice might be the simplest of options. 
00:08:30.230 --> 00:08:34.480 Given a user that shows up and tries to make a request to our web server, 00:08:34.480 --> 00:08:36.620 the load balancer first takes a look at the user 00:08:36.620 --> 00:08:40.497 and just randomly assigns them to one of the various different servers 00:08:40.497 --> 00:08:42.080 that might be processing that request. 00:08:42.080 --> 00:08:46.340 If there are 10 different servers, it randomly chooses among those 10 servers 00:08:46.340 --> 00:08:50.030 to decide which of them is going to be servicing that request. 00:08:50.030 --> 00:08:52.020 This has the advantage of being very simple. 00:08:52.020 --> 00:08:53.300 It's just a quick calculation. 00:08:53.300 --> 00:08:56.330 The computers can pretty readily generate random numbers. 00:08:56.330 --> 00:08:58.310 And based on that random number, the computer 00:08:58.310 --> 00:09:02.720 can dispatch the user to one server or to another server. 00:09:02.720 --> 00:09:06.620 But it might not be the best option because, if we happen to get unlucky, 00:09:06.620 --> 00:09:10.190 we might end up with many more users on one server than another. 00:09:10.190 --> 00:09:12.890 Or we might end up with servers that are entirely 00:09:12.890 --> 00:09:15.230 unused if it just so happens that we don't end up 00:09:15.230 --> 00:09:17.300 randomly selecting that server. 00:09:17.300 --> 00:09:20.780 Now, in practice with many users that are all using this load balancer, all 00:09:20.780 --> 00:09:24.260 being dispatched, odds are high that eventually all of them will be used. 00:09:24.260 --> 00:09:26.837 But it might not be a totally even distribution. 00:09:26.837 --> 00:09:28.670 And so for that reason, another approach you 00:09:28.670 --> 00:09:32.570 might take is round-robin approach where the approach is, instead, 00:09:32.570 --> 00:09:36.650 for the very first user, go ahead and assign that user to server number one. 
00:09:36.650 --> 00:09:38.840 For the next user, assign them to server number two. 00:09:38.840 --> 00:09:40.760 And maybe, if there are five servers, you say, 00:09:40.760 --> 00:09:44.150 the third user goes to server three, user four goes to server four, 00:09:44.150 --> 00:09:47.420 user five goes to server five, and then user six 00:09:47.420 --> 00:09:49.070 goes back to server number one. 00:09:49.070 --> 00:09:51.257 You basically rotate going one through five. 00:09:51.257 --> 00:09:53.840 And then, once you've assigned someone to each of the servers, 00:09:53.840 --> 00:09:55.760 you go back to the beginning. 00:09:55.760 --> 00:09:59.360 This is also a relatively easy thing to implement because you can simply just 00:09:59.360 --> 00:10:01.520 keep count somewhere in the load balancer 00:10:01.520 --> 00:10:04.730 saying, what was the most recent server that I assigned a user to? 00:10:04.730 --> 00:10:07.550 And the next time a request comes in, go ahead and assign it 00:10:07.550 --> 00:10:09.710 to the next server, and the next server after that, 00:10:09.710 --> 00:10:12.220 effectively doing a round-robin style approach 00:10:12.220 --> 00:10:16.040 where you go through all the servers once before going through the servers 00:10:16.040 --> 00:10:17.140 again. 00:10:17.140 --> 00:10:19.750 Now, this might seem better than random choice in the sense 00:10:19.750 --> 00:10:23.230 that it's going to more equitably decide whether to assign 00:10:23.230 --> 00:10:26.710 any particular request to any particular server. 00:10:26.710 --> 00:10:29.110 But it also suffers from certain problems. 00:10:29.110 --> 00:10:31.510 Round robin might be great, but if some requests 00:10:31.510 --> 00:10:34.975 take longer than other requests, we might also get unlucky, 00:10:34.975 --> 00:10:36.850 and the requests that are taking longer might 00:10:36.850 --> 00:10:40.160 end up all going to one of the servers as opposed to another server. 
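The random-choice and round-robin strategies described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the server names and the function names `random_choice` and `round_robin` are hypothetical, and a real load balancer would forward network traffic rather than return a string.

```python
import itertools
import random

# Hypothetical server names; the lecture only refers to servers one through five.
SERVERS = ["server1", "server2", "server3", "server4", "server5"]


def random_choice(servers, rng=random):
    """Pick any server with equal probability."""
    return rng.choice(servers)


def round_robin(servers):
    """Return a function that hands out servers in rotating order."""
    rotation = itertools.cycle(servers)
    return lambda: next(rotation)


pick_next = round_robin(SERVERS)
# Six requests in a row: the sixth wraps back around to the first server.
first_six = [pick_next() for _ in range(6)]
```

Round robin only needs to remember its position in the rotation, which is why it is about as cheap as random choice while spreading requests more evenly.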
00:10:40.160 --> 00:10:43.310 So there are other approaches that we might want to go to as well-- 00:10:43.310 --> 00:10:45.880 for example, something like fewest connections 00:10:45.880 --> 00:10:50.430 where the approach there is to say, go ahead, and when a user makes a request, 00:10:50.430 --> 00:10:53.050 the load balancer should pick which of the servers 00:10:53.050 --> 00:10:57.370 currently has the fewest active connections from other users 00:10:57.370 --> 00:11:01.060 and other requests that are currently connected to those servers instead. 00:11:01.060 --> 00:11:04.120 And by choosing the server that happens to have the fewest connections, 00:11:04.120 --> 00:11:07.330 you're probably going to do a better job of trying to balance out 00:11:07.330 --> 00:11:09.340 between all of the various different requests 00:11:09.340 --> 00:11:12.220 that might be happening inside of your web application. 00:11:12.220 --> 00:11:15.220 And while this might do a better job, there are trade offs here as well. 00:11:15.220 --> 00:11:18.700 It might be more expensive, for example, to compute which of the servers 00:11:18.700 --> 00:11:21.310 happens to have the fewest number of connections, 00:11:21.310 --> 00:11:24.880 whereas it's much easier just to say, choose a server at random 00:11:24.880 --> 00:11:29.740 or to do the round-robin style approach of just 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 00:11:29.740 --> 00:11:32.590 again, and again, and again. 00:11:32.590 --> 00:11:36.410 But all of these approaches naively have yet another problem, 00:11:36.410 --> 00:11:38.030 which has to do with sessions. 00:11:38.030 --> 00:11:40.150 And you'll recall that sessions we used whenever 00:11:40.150 --> 00:11:44.110 we wanted to store information about the user's current interaction 00:11:44.110 --> 00:11:45.220 with the web application. 
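As a rough sketch of the fewest-connections strategy described above, the balancer can keep a count of active connections per server and pick the smallest. The class name `LeastConnectionsBalancer` and its `acquire` and `release` methods are assumptions made for illustration, not the API of any real load balancer.

```python
class LeastConnectionsBalancer:
    """Toy balancer that tracks how many requests each server is handling."""

    def __init__(self, servers):
        # Every server starts with zero active connections.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Choose the server with the fewest active connections."""
        # Scanning every server is the extra cost the lecture mentions.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call this when a request finishes so the count stays accurate."""
        self.active[server] -= 1
```

Unlike round robin, this approach needs to know when each request finishes, which is part of why it is more expensive to run.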
00:11:45.220 --> 00:11:46.780 When you log into a website-- 00:11:46.780 --> 00:11:50.300 you log into your email, or you log into Amazon, for example-- 00:11:50.300 --> 00:11:53.740 and then you come back to that website or visit another page on that website-- 00:11:53.740 --> 00:11:56.470 make another request, for example-- 00:11:56.470 --> 00:11:59.800 it's not the case that you have to sign in yet again, that the web browser has 00:11:59.800 --> 00:12:01.720 totally forgotten who you are. 00:12:01.720 --> 00:12:04.450 When I go back to my mail account, or when I go back to Amazon 00:12:04.450 --> 00:12:08.205 for a second time, my mail account or Amazon remembers me from the last time 00:12:08.205 --> 00:12:08.830 that I visited. 00:12:08.830 --> 00:12:13.060 I have some sort of session where it's keeping track of who is logged in, 00:12:13.060 --> 00:12:15.670 maybe information about what I've been doing on the page, 00:12:15.670 --> 00:12:18.790 and allows me to continue interacting with the web application, 00:12:18.790 --> 00:12:21.880 even if I'm making multiple requests. 00:12:21.880 --> 00:12:24.310 And this, you might imagine, could be a problem 00:12:24.310 --> 00:12:26.440 for this type of load balancing. 00:12:26.440 --> 00:12:31.630 If I have multiple different servers, imagine if I try to log into a website. 00:12:31.630 --> 00:12:34.990 And the first time I make a request, I'm directed to server number one. 00:12:34.990 --> 00:12:37.690 And I'm now logged in on server number one. 00:12:37.690 --> 00:12:39.400 But then I make another request. 00:12:39.400 --> 00:12:41.162 I'm directed back to the load balancer. 00:12:41.162 --> 00:12:43.120 And maybe the load balancer, this time, decides 00:12:43.120 --> 00:12:45.310 to send me to server number two. 
00:12:45.310 --> 00:12:48.190 But if the session is stored in server number one somewhere-- 00:12:48.190 --> 00:12:51.010 server number one remembers who I am and what I'm doing-- 00:12:51.010 --> 00:12:54.282 then server number two is not going to know who I am. 00:12:54.282 --> 00:12:56.740 And therefore, it's not going to remember that I've already 00:12:56.740 --> 00:12:58.660 logged into this web application. 00:12:58.660 --> 00:13:01.710 And as a result, I might be prompted to log in again. 00:13:01.710 --> 00:13:04.630 And if I go make another request, and I end up on yet another server, 00:13:04.630 --> 00:13:07.580 I might be logged out again and have to log in for a third time. 00:13:07.580 --> 00:13:11.590 So the problem comes about when our load balancing happens, 00:13:11.590 --> 00:13:14.290 but we're not doing so in a session-aware way-- 00:13:14.290 --> 00:13:18.310 that our load balancer isn't caring about when a user visits the page 00:13:18.310 --> 00:13:22.300 and then visits another page on the same web application again-- 00:13:22.300 --> 00:13:25.720 because we want to remember information from the previous time 00:13:25.720 --> 00:13:27.475 that the user was here. 00:13:27.475 --> 00:13:28.850 So how can we solve this problem? 00:13:28.850 --> 00:13:30.820 How can we make sure that, when we do this load 00:13:30.820 --> 00:13:33.010 balancing across multiple different servers, 00:13:33.010 --> 00:13:34.795 that we do so in a session-aware way? 00:13:34.795 --> 00:13:36.670 Well, there are multiple different approaches 00:13:36.670 --> 00:13:39.310 to session-aware load balancing. 00:13:39.310 --> 00:13:42.610 One approach is this general idea known as sticky sessions 00:13:42.610 --> 00:13:46.150 where the idea is that, when I come back to the load balancer, 00:13:46.150 --> 00:13:49.940 the load balancer will remember what server I was sent to last time 00:13:49.940 --> 00:13:52.210 and send me there yet again. 
00:13:52.210 --> 00:13:54.670 So for example, if I log into a website once, 00:13:54.670 --> 00:13:57.490 and I'm directed to server number two, for example, then 00:13:57.490 --> 00:14:00.130 the next time I visit this web application, 00:14:00.130 --> 00:14:03.520 even if I should be directed to server three or four according 00:14:03.520 --> 00:14:07.600 to random choice or according to fewest connections or any of these other load 00:14:07.600 --> 00:14:09.700 balancing methods, the load balancer should 00:14:09.700 --> 00:14:12.310 remember that, last time I came to this site, 00:14:12.310 --> 00:14:14.240 I got directed to server number two. 00:14:14.240 --> 00:14:16.210 And so this time, the load balancer is going 00:14:16.210 --> 00:14:18.550 to direct me to server number two yet again. 00:14:18.550 --> 00:14:22.000 That way, server number two, which contains information about my session, 00:14:22.000 --> 00:14:25.000 is going to see me again and remember who it is that I am. 00:14:25.000 --> 00:14:28.180 And it's not going to make me log in again into the exact same website 00:14:28.180 --> 00:14:30.570 for a second time, for example. 00:14:30.570 --> 00:14:33.280 And so sticky sessions are one way of dealing with this problem. 00:14:33.280 --> 00:14:35.363 But again, with all of these approaches-- and this 00:14:35.363 --> 00:14:38.410 will be a recurring theme as we talk about scalability and security-- 00:14:38.410 --> 00:14:39.730 there are trade offs here. 00:14:39.730 --> 00:14:44.200 A trade-off of sticky sessions is that it's possible that one of these servers 00:14:44.200 --> 00:14:47.950 is going to end up getting far more load than another if one server happens 00:14:47.950 --> 00:14:50.620 to have a lot of users that keep coming back to the website 00:14:50.620 --> 00:14:52.390 and keep requesting additional pages. 
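The sticky-session idea described above can be sketched as a balancer that remembers its earlier decisions. This is a minimal illustration under assumptions: `client_id` is a hypothetical identifier (in practice it might come from a cookie or an IP address), and new clients are assigned by round robin only for the sake of the example.

```python
import itertools


class StickyBalancer:
    """Toy balancer that sends a returning client to the same server."""

    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)
        self._assigned = {}  # hypothetical client id -> server

    def route(self, client_id):
        # A client seen for the first time gets the next server in rotation.
        if client_id not in self._assigned:
            self._assigned[client_id] = next(self._rotation)
        # A returning client always gets the server chosen earlier.
        return self._assigned[client_id]
```

Because earlier assignments never change, a server whose clients keep returning stays busy while others sit idle, which is the uneven load the lecture warns about.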
00:14:52.390 --> 00:14:54.940 But other servers might have 00:14:54.940 --> 00:14:58.010 had users that decided not to come back, for example. 00:14:58.010 --> 00:15:01.390 And so there's a difference in utilization where some of our servers 00:15:01.390 --> 00:15:03.880 might be more heavily utilized than other servers, 00:15:03.880 --> 00:15:07.580 and we're not doing a very good job of balancing between them. 00:15:07.580 --> 00:15:11.980 And so another approach is to store sessions inside of a database 00:15:11.980 --> 00:15:15.580 rather than store information about sessions inside of the servers 00:15:15.580 --> 00:15:18.730 themselves, where, if I get directed to another server, 00:15:18.730 --> 00:15:20.710 that other server doesn't remember who I am, 00:15:20.710 --> 00:15:24.310 doesn't remember information about my interaction with this website. 00:15:24.310 --> 00:15:27.890 If we instead choose to store sessions inside of a database-- 00:15:27.890 --> 00:15:31.210 and, in particular, inside of a database that all of the servers 00:15:31.210 --> 00:15:33.100 have the ability to access-- 00:15:33.100 --> 00:15:36.400 well, then it doesn't matter which of the servers I get directed to 00:15:36.400 --> 00:15:39.370 and which server the load balancer decides to send me to 00:15:39.370 --> 00:15:42.310 because, regardless of which server I end up getting sent to, 00:15:42.310 --> 00:15:44.235 the session information is in the database. 00:15:44.235 --> 00:15:46.360 And each of the servers can connect to the database 00:15:46.360 --> 00:15:49.390 to find out who I am, to find out whether I've logged into the site 00:15:49.390 --> 00:15:52.660 already, and therefore is able to recognize me. 00:15:52.660 --> 00:15:54.670 And so that might be one approach as well. 00:15:54.670 --> 00:15:57.702 Another approach is to store sessions on the client side. 
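To illustrate storing sessions in a database that every server can reach, here is a small sketch. The classes `SharedSessionStore` and `AppServer`, the session id, and the username are all hypothetical, and an in-memory dictionary stands in for a real database.

```python
class SharedSessionStore:
    """Stand-in for a database that every server can reach."""

    def __init__(self):
        self._rows = {}

    def save(self, session_id, data):
        self._rows[session_id] = dict(data)

    def load(self, session_id):
        return self._rows.get(session_id)


class AppServer:
    """Toy web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, username):
        self.store.save(session_id, {"user": username})

    def current_user(self, session_id):
        session = self.store.load(session_id)
        return session["user"] if session else None


store = SharedSessionStore()
server_one = AppServer("server1", store)
server_two = AppServer("server2", store)

# Log in through server one, then make the next request to server two.
server_one.login("abc123", "brian")
```

Since both servers read from the same store, server two recognizes a user who logged in through server one, so the load balancer is free to send each request anywhere.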
00:15:57.702 --> 00:15:59.410 We've talked a little bit about this idea 00:15:59.410 --> 00:16:03.100 of cookies, where the web server can set a cookie so 00:16:03.100 --> 00:16:06.460 that your web browser is able to present that cookie the next time 00:16:06.460 --> 00:16:09.020 it makes a request to the same web application. 00:16:09.020 --> 00:16:12.430 And inside this cookie, you can store a whole bunch of information, including 00:16:12.430 --> 00:16:14.000 information about the session. 00:16:14.000 --> 00:16:16.690 You might, inside of a cookie, store information 00:16:16.690 --> 00:16:19.340 about what user is currently logged in, for example, 00:16:19.340 --> 00:16:21.500 or other session-related information. 00:16:21.500 --> 00:16:23.080 But here, too, there are drawbacks. 00:16:23.080 --> 00:16:25.750 If you're not careful, someone could manipulate that cookie 00:16:25.750 --> 00:16:27.380 and maybe pretend to be someone else. 00:16:27.380 --> 00:16:29.230 And so for that reason, you might want to do 00:16:29.230 --> 00:16:32.020 some encryption or some kind of signing to make sure 00:16:32.020 --> 00:16:35.832 that you can't fake a cookie and pretend to be someone that you're not. 00:16:35.832 --> 00:16:37.540 But another concern is that, as you start 00:16:37.540 --> 00:16:40.130 to store more and more information inside of these cookies, 00:16:40.130 --> 00:16:43.540 these cookies keep getting sent back and forth between the server and the client 00:16:43.540 --> 00:16:45.250 every time a request is made. 00:16:45.250 --> 00:16:48.040 That can start to get expensive, too-- more and more information 00:16:48.040 --> 00:16:52.090 passing back and forth between the client and the server. 00:16:52.090 --> 00:16:54.580 So lots of possible approaches-- no one approach 00:16:54.580 --> 00:16:57.040 that is necessarily the right approach or the best approach 00:16:57.040 --> 00:16:58.270 to use in all cases. 
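One way to make a client-side session cookie hard to fake, as discussed above, is to attach a signature that only the server can compute. The sketch below uses an HMAC from Python's standard library; the secret key, the cookie layout `value.signature`, and the function names `sign` and `verify` are assumptions for illustration. Note that signing detects tampering but does not hide the cookie's contents, which would require encryption.

```python
import hashlib
import hmac

# Hypothetical secret; a real application would load this from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(value: str) -> str:
    """Return the value with an HMAC-SHA256 signature appended."""
    signature = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{signature}"


def verify(cookie: str):
    """Return the original value if the signature matches, otherwise None."""
    value, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if hmac.compare_digest(signature.encode(), expected.encode()):
        return value
    return None
```

A client who edits the value without knowing the secret cannot produce a matching signature, so the server rejects the altered cookie.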
00:16:58.270 --> 00:17:00.850 But things to be aware of-- things to think about 00:17:00.850 --> 00:17:03.520 as we begin to deal with these issues of scale, of making 00:17:03.520 --> 00:17:07.270 sure we have multiple servers that are available for usage in case we do 00:17:07.270 --> 00:17:07.869 need them. 00:17:07.869 --> 00:17:10.930 But also making sure that, when we do so, we don't break the user 00:17:10.930 --> 00:17:14.920 experience-- we don't result in a situation where a user is logged in 00:17:14.920 --> 00:17:18.160 but then, suddenly, isn't logged in at all. 00:17:18.160 --> 00:17:21.460 And so horizontal scaling gives us this kind of capacity-- 00:17:21.460 --> 00:17:24.760 the ability to have multiple different servers, all of which 00:17:24.760 --> 00:17:27.880 can be dealing with user requests and responding to those user requests 00:17:27.880 --> 00:17:28.890 as well. 00:17:28.890 --> 00:17:34.240 But a reasonable question to ask is, how many of those servers do we need? 00:17:34.240 --> 00:17:36.850 Now, we can use benchmarking to try to estimate this. 00:17:36.850 --> 00:17:40.190 If we have an estimate of how many users are going to be on our website 00:17:40.190 --> 00:17:42.430 at any given time, we can benchmark and see 00:17:42.430 --> 00:17:46.420 how many users can be handled by a single server and extrapolate, 00:17:46.420 --> 00:17:49.330 based on that information, to infer how many servers we 00:17:49.330 --> 00:17:52.000 might need in our web application to be able to service 00:17:52.000 --> 00:17:53.650 all of these different users. 00:17:53.650 --> 00:17:56.680 But it might be the case that our web application doesn't always 00:17:56.680 --> 00:17:58.540 have the same number of users. 00:17:58.540 --> 00:18:01.660 Maybe, sometimes, there are going to be far more users than at other times. 
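The extrapolation described above is a simple rounding-up division. As a sketch, assuming a benchmark has already produced a per-server capacity (the function name `servers_needed` is hypothetical):

```python
import math


def servers_needed(expected_users, users_per_server):
    """Estimate how many servers are required for a given number of users."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    # Round up: even one user beyond capacity requires another server.
    return math.ceil(expected_users / users_per_server)
```

With the figures used earlier in the lecture, a server that handles 100 users facing 101 simultaneous users gives `servers_needed(101, 100)`, which is 2.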
00:18:01.660 --> 00:18:05.140 You might imagine, for example, that in a news organization's website-- 00:18:05.140 --> 00:18:07.690 like the web application for a newspaper-- 00:18:07.690 --> 00:18:09.720 when there's breaking news, some big story, 00:18:09.720 --> 00:18:11.470 there's going to be a lot more people that 00:18:11.470 --> 00:18:15.380 are all trying to access the website at the same time than at other times. 00:18:15.380 --> 00:18:18.310 So one approach might be, consider the maximum. 00:18:18.310 --> 00:18:20.650 What is the most number of users that ever 00:18:20.650 --> 00:18:23.620 might be trying to use our web application at any given time? 00:18:23.620 --> 00:18:26.830 And choose a number of servers based on that maximum so that, 00:18:26.830 --> 00:18:28.960 no matter how high the number of users get, 00:18:28.960 --> 00:18:32.800 we will have enough servers to be able to service all of those users. 00:18:32.800 --> 00:18:35.560 But that's probably not a great economical choice 00:18:35.560 --> 00:18:39.250 if, in the vast majority of cases, there will be far fewer users. 00:18:39.250 --> 00:18:42.625 In that case, you're going to have a lot of servers that are underutilized-- 00:18:42.625 --> 00:18:45.250 where you don't need that many servers, but you're still paying 00:18:45.250 --> 00:18:47.770 for the electricity, for keeping all of them running-- 00:18:47.770 --> 00:18:50.740 which might not be an ideal choice either. 00:18:50.740 --> 00:18:52.120 So one solution to this-- 00:18:52.120 --> 00:18:54.970 quite popular, especially in this world of cloud computing-- 00:18:54.970 --> 00:18:58.660 is the idea of autoscaling where you can have an autoscaler 00:18:58.660 --> 00:19:03.460 to say that, you know what, let's start with, for example, two servers. 
00:19:03.460 --> 00:19:05.470 But if there's enough traffic to the website, 00:19:05.470 --> 00:19:07.678 if enough people are making requests to the website-- 00:19:07.678 --> 00:19:10.360 maybe it's a peak time where people are using the website-- 00:19:10.360 --> 00:19:11.830 go ahead and scale up. 00:19:11.830 --> 00:19:15.880 Go ahead and add a third server where now our load balancer can balance 00:19:15.880 --> 00:19:18.100 between all three of those servers. 00:19:18.100 --> 00:19:20.710 And if even more traffic ends up coming to the website-- 00:19:20.710 --> 00:19:24.280 more users are trying to use this application all at the same time-- 00:19:24.280 --> 00:19:27.160 well, then we can go ahead and add a fourth server as well. 00:19:27.160 --> 00:19:28.660 And we can continue to do that. 00:19:28.660 --> 00:19:31.510 Most autoscalers will let you configure, for example, 00:19:31.510 --> 00:19:34.480 a minimum number of servers and a maximum number of servers. 00:19:34.480 --> 00:19:37.420 And dependent on how many users happen to be using your web 00:19:37.420 --> 00:19:40.300 application at any given time, the autoscaler 00:19:40.300 --> 00:19:44.410 can scale up or scale down, adding new servers as more users come 00:19:44.410 --> 00:19:47.410 to the website, removing servers as fewer users are 00:19:47.410 --> 00:19:49.870 using the website as well. 00:19:49.870 --> 00:19:52.425 And so this can be a nice solution to this problem of scale 00:19:52.425 --> 00:19:55.050 where you don't have to worry about how many servers there are. 00:19:55.050 --> 00:19:57.580 It just autoscales entirely on its own. 00:19:57.580 --> 00:19:59.080 Now, there are trade offs here, too. 00:19:59.080 --> 00:20:01.250 This auto scaling process might take time. 
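The scaling rule described above can be sketched in a few lines. This is a simplified assumption about how an autoscaler might pick a server count, not how any particular cloud provider does it; `desired_servers` and its parameters are hypothetical names, and `users_per_server` stands in for the per-server capacity found by benchmarking.

```python
import math


def desired_servers(active_users: int, users_per_server: int,
                    min_servers: int = 2, max_servers: int = 10) -> int:
    """Return how many servers to run, clamped to the configured range."""
    # Servers needed to cover the current load, rounded up.
    needed = math.ceil(active_users / users_per_server)
    # Never go below the minimum or above the maximum.
    return max(min_servers, min(max_servers, needed))
```

With a benchmarked capacity of 100 users per server, 250 active users would call for three servers, while zero users would still keep the minimum of two running.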
00:20:01.250 --> 00:20:05.260 And if a lot of users all come into your website all at the exact same time, 00:20:05.260 --> 00:20:08.350 well, it's going to take some time to be able to add 00:20:08.350 --> 00:20:10.630 all of these additional servers to start them up. 00:20:10.630 --> 00:20:13.700 And so there might be some trade offs there, too, 00:20:13.700 --> 00:20:17.330 where you might not be able to service all of the users immediately. 00:20:17.330 --> 00:20:19.380 And another problem worth thinking about is, 00:20:19.380 --> 00:20:21.510 as you add more and more of these servers, 00:20:21.510 --> 00:20:23.877 you introduce opportunities for failure. 00:20:23.877 --> 00:20:25.710 Now, it's better than having a single server 00:20:25.710 --> 00:20:29.490 where, if that single server fails, now suddenly the entire web application 00:20:29.490 --> 00:20:30.390 doesn't work at all. 00:20:30.390 --> 00:20:33.240 That's what we generally call a single point of failure-- 00:20:33.240 --> 00:20:37.410 a single place where, if it fails, the entire system is going to be broken. 00:20:37.410 --> 00:20:39.720 One advantage of having multiple servers is 00:20:39.720 --> 00:20:43.530 that we no longer have a single server that acts as a point of failure. 00:20:43.530 --> 00:20:46.140 If one of the servers goes down then, ideally, 00:20:46.140 --> 00:20:49.780 our load balancer should be able to know, based on that information, 00:20:49.780 --> 00:20:53.370 to no longer send a request to that particular server-- to, 00:20:53.370 --> 00:20:58.470 instead, balance the load across the remaining three servers instead. 00:20:58.470 --> 00:21:00.640 Now, there's an interesting question there as well, 00:21:00.640 --> 00:21:04.200 which is, how does the load balancer know that this server is 00:21:04.200 --> 00:21:05.450 no longer responding? 
00:21:05.450 --> 00:21:07.200 For some reason, it has some sort of error 00:21:07.200 --> 00:21:09.763 that it's not able to process requests appropriately. 00:21:09.763 --> 00:21:11.680 Well, there are multiple ways you can do this. 00:21:11.680 --> 00:21:15.090 But one of the most common is what's simply known as a heartbeat where, 00:21:15.090 --> 00:21:18.240 effectively, every so often, every some number of seconds, 00:21:18.240 --> 00:21:20.700 the load balancer pings all of the servers-- 00:21:20.700 --> 00:21:23.280 just sends a quick request to all the servers. 00:21:23.280 --> 00:21:26.250 And all of the servers are supposed to respond back. 00:21:26.250 --> 00:21:29.010 And using that information, the load balancer 00:21:29.010 --> 00:21:31.920 knows a little bit about the latency of each of the servers-- 00:21:31.920 --> 00:21:34.920 how long it took for the server to respond to the request. 00:21:34.920 --> 00:21:37.440 But also, it can get information about whether or not 00:21:37.440 --> 00:21:39.450 the server is functioning properly. 00:21:39.450 --> 00:21:42.157 If one of the servers doesn't respond to the ping, 00:21:42.157 --> 00:21:44.490 well, then the load balancer knows that there's probably 00:21:44.490 --> 00:21:47.640 something wrong with the server, that we probably shouldn't be directing 00:21:47.640 --> 00:21:50.570 more users to that server at all. 00:21:50.570 --> 00:21:53.730 And so this can solve for the problem of a single point of failure 00:21:53.730 --> 00:21:57.570 by allowing ourselves multiple servers where, if any one of the servers fails, 00:21:57.570 --> 00:22:00.450 the load balancer learns about that via heartbeat 00:22:00.450 --> 00:22:03.540 and then, based on that information, can begin to redirect traffic 00:22:03.540 --> 00:22:05.847 to the other servers instead. 
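The heartbeat bookkeeping described above might look something like the following sketch. It assumes the load balancer records a timestamp each time a server answers a ping and treats a server as down once no answer has arrived within a timeout; the class and method names are hypothetical, and the network ping itself is left out.

```python
class HeartbeatMonitor:
    """Tracks when each server last answered a ping."""

    def __init__(self, servers, timeout=10.0):
        self.timeout = timeout
        # None means the server has never responded.
        self.last_seen = {server: None for server in servers}

    def record_response(self, server, now):
        """Call this whenever a server answers a heartbeat ping."""
        self.last_seen[server] = now

    def healthy_servers(self, now):
        """Return the servers that responded within the timeout window."""
        return [
            server
            for server, seen in self.last_seen.items()
            if seen is not None and now - seen <= self.timeout
        ]
```

The load balancer would then route requests only to the servers returned by `healthy_servers`, using whatever clock value it passes as `now`.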
00:22:05.847 --> 00:22:08.430 Now, one thing you might notice is that, even in this picture, 00:22:08.430 --> 00:22:11.970 now the load balancer appears to be like a single point of failure 00:22:11.970 --> 00:22:14.460 where, if the load balancer happens to fail, well, now 00:22:14.460 --> 00:22:16.668 nothing is going to work because the load balancer is 00:22:16.668 --> 00:22:18.810 the one responsible for directing traffic to all 00:22:18.810 --> 00:22:20.190 of the various different servers. 00:22:20.190 --> 00:22:23.790 And so even though there is no single server that is a point of failure, 00:22:23.790 --> 00:22:27.370 this load balancer also appears to be a single point of failure. 00:22:27.370 --> 00:22:28.540 And that's definitely true. 00:22:28.540 --> 00:22:31.470 And you might imagine instead having multiple load balancers 00:22:31.470 --> 00:22:35.310 where, if one load balancer goes down, another load balancer can swoop in, 00:22:35.310 --> 00:22:39.000 acting as a hot spare where it picks up all of the traffic that was originally 00:22:39.000 --> 00:22:40.650 going to the first load balancer. 00:22:40.650 --> 00:22:44.550 And if it ever goes down, a second one is ready to take its place. 00:22:44.550 --> 00:22:47.700 And it might also be doing this kind of heartbeat process-- checking up 00:22:47.700 --> 00:22:48.845 on the first load balancer. 00:22:48.845 --> 00:22:51.970 And if all goes well, the second load balancer doesn't have to do anything. 00:22:51.970 --> 00:22:54.490 But if the first load balancer ever were to fail, 00:22:54.490 --> 00:22:56.640 well, then the second load balancer can step in 00:22:56.640 --> 00:22:59.700 and begin servicing those requests, directing them to all 00:22:59.700 --> 00:23:01.840 of these individual servers as well.
00:23:01.840 --> 00:23:02.705 And so there, too-- 00:23:02.705 --> 00:23:05.580 another opportunity to think about where the single points of failure 00:23:05.580 --> 00:23:09.300 are and thinking about how we might address the single points of failure 00:23:09.300 --> 00:23:12.330 in order to make sure that our web applications are scalable. 00:23:12.330 --> 00:23:14.820 So that then deals with issues about how we might 00:23:14.820 --> 00:23:17.070 go about scaling up these servers. 00:23:17.070 --> 00:23:20.340 But ultimately, the servers are not the entirety of the story. 00:23:20.340 --> 00:23:22.350 Inside of our applications, we've mostly been 00:23:22.350 --> 00:23:25.918 writing web applications that interact and deal with data in some way. 00:23:25.918 --> 00:23:28.710 And there are multiple different databases that we've talked about. 00:23:28.710 --> 00:23:30.900 SQLite has been the default one that Django 00:23:30.900 --> 00:23:34.200 provides to us, which just stores data inside of a file. 00:23:34.200 --> 00:23:36.020 But as we begin to grow our applications, 00:23:36.020 --> 00:23:39.270 if we want to begin to scale them, it's quite popular and quite common 00:23:39.270 --> 00:23:41.530 to put databases entirely somewhere separate-- 00:23:41.530 --> 00:23:44.340 to have a separate database server running somewhere else where 00:23:44.340 --> 00:23:46.800 the servers are all communicating with that database, 00:23:46.800 --> 00:23:50.550 whether we're running MySQL, or Postgres, or some other database system 00:23:50.550 --> 00:23:51.750 instead. 00:23:51.750 --> 00:23:55.410 And all of the servers then have access to that database. 00:23:55.410 --> 00:23:57.990 And so there, too, are considerations that we 00:23:57.990 --> 00:24:00.420 need to take into account-- issues of how it is that we 00:24:00.420 --> 00:24:03.840 go about scaling up these databases.
00:24:03.840 --> 00:24:06.960 In this picture, for example, you might imagine a load balancer 00:24:06.960 --> 00:24:08.730 that is communicating with two servers. 00:24:08.730 --> 00:24:10.950 But both of those servers, for example, need 00:24:10.950 --> 00:24:13.200 to be communicating with this database. 00:24:13.200 --> 00:24:16.140 And much like any server can only handle some number of requests, 00:24:16.140 --> 00:24:19.380 some number of users at any given time, databases, too, 00:24:19.380 --> 00:24:23.280 can only handle some number of requests, some concurrent number of connections 00:24:23.280 --> 00:24:24.250 at any given time. 00:24:24.250 --> 00:24:26.130 And so we need to begin to think about issues 00:24:26.130 --> 00:24:30.120 of how it is that we scale these databases as well in order to be 00:24:30.120 --> 00:24:33.330 able to handle more and more users. 00:24:33.330 --> 00:24:35.580 Now, one approach, the first thing we might try to do, 00:24:35.580 --> 00:24:38.160 is something called database partitioning-- effectively, 00:24:38.160 --> 00:24:42.270 splitting up what is a big data set into multiple different parts 00:24:42.270 --> 00:24:43.470 to that data set. 00:24:43.470 --> 00:24:46.560 And we've already seen some examples of database partitioning. 00:24:46.560 --> 00:24:49.890 We've seen one example where-- for example, when we talked about SQL, 00:24:49.890 --> 00:24:53.130 we looked at a table of flights where each flight had an origin 00:24:53.130 --> 00:24:57.840 city, the origin city's airport code, the destination city, the destination 00:24:57.840 --> 00:25:00.120 city's airport code, and some number of minutes, 00:25:00.120 --> 00:25:02.850 the duration for that particular flight. 00:25:02.850 --> 00:25:05.820 And we decided that storing all of this data in a single table 00:25:05.820 --> 00:25:07.590 probably wasn't the best idea. 
00:25:07.590 --> 00:25:10.170 And instead, we wanted to split that data up 00:25:10.170 --> 00:25:13.380 in a type of partitioning where, instead, we said, all right, let's just 00:25:13.380 --> 00:25:16.230 have one table that will have all of the airports. 00:25:16.230 --> 00:25:20.440 And so each airport gets its own row inside of this airports table. 00:25:20.440 --> 00:25:22.640 And we also had another table which was just 00:25:22.640 --> 00:25:26.270 the flights table which, rather than storing all of those columns, 00:25:26.270 --> 00:25:28.820 just mapped two airports to each other. 00:25:28.820 --> 00:25:32.660 With any given flight, it has an origin ID, meaning which object, 00:25:32.660 --> 00:25:36.800 which row in the airports table represents the origin of the flight, 00:25:36.800 --> 00:25:39.680 and then which row in the airports table is 00:25:39.680 --> 00:25:42.860 going to represent the destination for that flight. 00:25:42.860 --> 00:25:45.530 So we took one table and effectively split it up 00:25:45.530 --> 00:25:49.940 into multiple tables, each of which ultimately had fewer columns. 00:25:49.940 --> 00:25:52.850 And this might be something we call the vertical partitioning 00:25:52.850 --> 00:25:56.810 of a database where, instead of just having single big long tables, 00:25:56.810 --> 00:25:59.420 we split them up into multiple tables, each 00:25:59.420 --> 00:26:01.820 of which has fewer columns that are able to represent 00:26:01.820 --> 00:26:03.497 data in a more relational way. 00:26:03.497 --> 00:26:05.330 And that's something we've seen before, too.
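The airports and flights split described above can be sketched with Python's built-in `sqlite3` module. The column names and the sample row are illustrative assumptions rather than the exact schema from the earlier lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each airport is stored once, in its own row.
    CREATE TABLE airports (
        id INTEGER PRIMARY KEY,
        code TEXT NOT NULL,
        city TEXT NOT NULL
    );
    -- A flight only references airports by ID.
    CREATE TABLE flights (
        id INTEGER PRIMARY KEY,
        origin_id INTEGER NOT NULL REFERENCES airports(id),
        destination_id INTEGER NOT NULL REFERENCES airports(id),
        duration INTEGER NOT NULL
    );
    INSERT INTO airports (id, code, city)
        VALUES (1, 'JFK', 'New York'), (2, 'LHR', 'London');
    INSERT INTO flights (origin_id, destination_id, duration)
        VALUES (1, 2, 415);
""")

# Joining the two tables recovers the original wide row.
rows = conn.execute("""
    SELECT origin.city, destination.city, flights.duration
    FROM flights
    JOIN airports AS origin ON flights.origin_id = origin.id
    JOIN airports AS destination ON flights.destination_id = destination.id
""").fetchall()
```

The join at the end shows the cost of the split: reading the full flight description now touches both tables.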
00:26:05.330 --> 00:26:07.460 But in addition to vertical partitioning, 00:26:07.460 --> 00:26:11.090 we can also do horizontal partitioning where the idea there 00:26:11.090 --> 00:26:13.340 is that we take a table and just split it up 00:26:13.340 --> 00:26:17.390 into multiple tables that are all storing effectively the same data, 00:26:17.390 --> 00:26:19.380 but split up into different data sets. 00:26:19.380 --> 00:26:22.520 So the same type of data, but just in different tables-- 00:26:22.520 --> 00:26:25.100 where we might have originally had a flights table, 00:26:25.100 --> 00:26:28.490 and instead we split it up into a domestic flights table 00:26:28.490 --> 00:26:30.380 and an international flights table. 00:26:30.380 --> 00:26:32.870 Each of these tables still has the exact same columns. 00:26:32.870 --> 00:26:34.555 They still have a destination column. 00:26:34.555 --> 00:26:35.930 They still have an origin column. 00:26:35.930 --> 00:26:38.250 They still have a duration column, for example. 00:26:38.250 --> 00:26:41.210 But we've just now taken the data that used to be in one table 00:26:41.210 --> 00:26:46.040 and split up that data into two or more different tables instead-- 00:26:46.040 --> 00:26:49.940 one for all the domestic flights, one for all the international flights. 00:26:49.940 --> 00:26:52.370 And the advantage there is that we no longer 00:26:52.370 --> 00:26:55.760 need to search through the entirety of the data set if we're just looking 00:26:55.760 --> 00:26:57.780 for one domestic flight, for example. 00:26:57.780 --> 00:27:00.680 If you know the flight you're looking for is a domestic flight, 00:27:00.680 --> 00:27:04.820 well, then it can be more efficient to just search the domestic flights table 00:27:04.820 --> 00:27:08.270 and not bother searching through the international flights table.
00:27:08.270 --> 00:27:11.300 And so if we're intelligent about how we choose to take a table 00:27:11.300 --> 00:27:14.540 and split it up into multiple different tables, the effect of that 00:27:14.540 --> 00:27:16.880 is that we can often improve the efficiency 00:27:16.880 --> 00:27:19.190 of our searches, the efficiency of our operations, 00:27:19.190 --> 00:27:21.830 because we're dealing with multiple smaller tables 00:27:21.830 --> 00:27:24.320 where these operations can come faster. 00:27:24.320 --> 00:27:27.350 One drawback though is that, as we begin to split data 00:27:27.350 --> 00:27:31.250 across multiple different tables, it becomes more expensive if ever we 00:27:31.250 --> 00:27:33.980 need to join this data back together and connect 00:27:33.980 --> 00:27:36.290 all the domestic and international flights running 00:27:36.290 --> 00:27:37.790 separate queries on each. 00:27:37.790 --> 00:27:40.010 And so in that case, we'll want to think about trying 00:27:40.010 --> 00:27:42.710 to separate our data in such a way that, generally, we're 00:27:42.710 --> 00:27:46.750 only going to need to deal with one table or the other at any given time. 00:27:46.750 --> 00:27:49.280 And so domestic and international might be a reasonable way 00:27:49.280 --> 00:27:52.970 to split up our flights table because maybe, most of the time, our airport 00:27:52.970 --> 00:27:54.860 just cares about searching domestic flights 00:27:54.860 --> 00:27:56.630 if we know we're looking for one kind of flight, 00:27:56.630 --> 00:27:59.030 or just cares about searching for international flights 00:27:59.030 --> 00:28:01.405 if there are different people or different computers that 00:28:01.405 --> 00:28:05.090 are going to handle each of those different types of systems. 
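Here is a small sketch of the domestic and international split, again using `sqlite3`. The table names `flights_domestic` and `flights_international`, the helper functions, and the sample flights are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both partitions share exactly the same columns.
for table in ("flights_domestic", "flights_international"):
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY, origin TEXT, destination TEXT, duration INTEGER)"
    )


def table_for(is_domestic):
    """Pick the partition that holds this kind of flight."""
    return "flights_domestic" if is_domestic else "flights_international"


def add_flight(origin, destination, duration, is_domestic):
    conn.execute(
        f"INSERT INTO {table_for(is_domestic)} (origin, destination, duration) "
        "VALUES (?, ?, ?)",
        (origin, destination, duration),
    )


def find_flights(origin, is_domestic):
    """Search only one partition, which is the common, cheap case."""
    return conn.execute(
        f"SELECT origin, destination, duration FROM {table_for(is_domestic)} "
        "WHERE origin = ?",
        (origin,),
    ).fetchall()


def all_flights():
    """The expensive case: both partitions must be read and combined."""
    return conn.execute(
        "SELECT origin, destination, duration FROM flights_domestic "
        "UNION ALL "
        "SELECT origin, destination, duration FROM flights_international"
    ).fetchall()


add_flight("Boston", "Chicago", 150, is_domestic=True)
add_flight("Boston", "London", 400, is_domestic=False)
```

`find_flights` touches a single smaller table, while `all_flights` shows the drawback mentioned above: recombining the partitions requires a query against each one.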
00:28:05.090 --> 00:28:08.630 And so partitioning our database can sometimes help with issues of scale 00:28:08.630 --> 00:28:11.480 by making it faster to search through large amounts of data 00:28:11.480 --> 00:28:14.480 and being able to represent data a little bit more cleanly. 00:28:14.480 --> 00:28:17.840 But it still seems to represent a single point of failure-- 00:28:17.840 --> 00:28:22.850 that we have multiple servers now that are all connected to the same database. 00:28:22.850 --> 00:28:24.890 And there, again, is a single point of failure. 00:28:24.890 --> 00:28:27.353 If the database fails for some reason, well now, 00:28:27.353 --> 00:28:29.270 suddenly, none of our web application is going 00:28:29.270 --> 00:28:31.940 to work because all of those servers are all 00:28:31.940 --> 00:28:35.180 connected to that exact same database. 00:28:35.180 --> 00:28:36.980 And so it's for that reason that we might-- 00:28:36.980 --> 00:28:39.230 just as we tried to add more servers in order 00:28:39.230 --> 00:28:42.530 to solve the problem of a single point of failure with our servers, 00:28:42.530 --> 00:28:45.410 we might also try database replication. 00:28:45.410 --> 00:28:48.860 Rather than just have a single database in our web application, 00:28:48.860 --> 00:28:50.870 in order to guard against potential failure, 00:28:50.870 --> 00:28:54.410 we might replicate our database-- have multiple different databases 00:28:54.410 --> 00:28:59.297 and, therefore, reduce the likelihood that our application entirely fails. 00:28:59.297 --> 00:29:01.130 And there are a couple of approaches that we 00:29:01.130 --> 00:29:03.020 can use for database replication. 00:29:03.020 --> 00:29:06.800 Two of the most common are what are known as single-primary replication 00:29:06.800 --> 00:29:09.190 and multi-primary replication. 00:29:09.190 --> 00:29:11.760 And in single-primary database replication, 00:29:11.760 --> 00:29:14.040 we have multiple different databases. 
00:29:14.040 --> 00:29:17.930 But one of those databases is considered to be the primary database. 00:29:17.930 --> 00:29:20.510 And what we mean by a primary database is a database 00:29:20.510 --> 00:29:22.310 to which we can both read data-- 00:29:22.310 --> 00:29:24.560 meaning select rows from the table-- 00:29:24.560 --> 00:29:27.350 but also write data, meaning insert rows, 00:29:27.350 --> 00:29:31.200 or update rows, or delete rows to any of those tables. 00:29:31.200 --> 00:29:34.070 So in single-primary replication, we have a single database 00:29:34.070 --> 00:29:36.260 where we can both read and write. 00:29:36.260 --> 00:29:38.680 And we have some number of other databases-- in this case, 00:29:38.680 --> 00:29:40.100 two other databases-- 00:29:40.100 --> 00:29:41.900 from which we can only read data. 00:29:41.900 --> 00:29:44.220 So we can get data from those databases. 00:29:44.220 --> 00:29:48.560 But we can't update, or insert, or delete from those databases. 00:29:48.560 --> 00:29:52.490 And now we need some mechanism to make sure that all of these databases 00:29:52.490 --> 00:29:53.750 are kept in sync. 00:29:53.750 --> 00:29:57.620 And ultimately, what that means is that, any time the database changes, 00:29:57.620 --> 00:29:59.660 all of the databases are informed. 00:29:59.660 --> 00:30:02.390 Now, the only database that can change is our primary one. 00:30:02.390 --> 00:30:04.250 This is the only one that can be written to, 00:30:04.250 --> 00:30:06.740 the only one that allows for the data to change. 00:30:06.740 --> 00:30:08.180 The others are read only. 00:30:08.180 --> 00:30:12.170 So anytime this primary database updates or changes in some way, 00:30:12.170 --> 00:30:16.540 it needs to inform the other databases of that update. 00:30:16.540 --> 00:30:18.920 And so it informs the other databases of that update. 
00:30:18.920 --> 00:30:21.230 And now all of the databases are kept in sync 00:30:21.230 --> 00:30:23.960 where, if you try and run a query on any of these databases 00:30:23.960 --> 00:30:25.910 to select and get some information, you'll 00:30:25.910 --> 00:30:30.440 get the same results from all of these various different databases. 00:30:30.440 --> 00:30:32.990 Now, the single-primary approach has some drawbacks. 00:30:32.990 --> 00:30:36.950 It has the drawback of only one of these databases can be written to. 00:30:36.950 --> 00:30:38.750 So if you have a lot of users that are all 00:30:38.750 --> 00:30:42.550 trying to write data to the database at the exact same time, 00:30:42.550 --> 00:30:44.360 well, there might be some issues here where 00:30:44.360 --> 00:30:46.370 this one database is going to be carrying 00:30:46.370 --> 00:30:49.100 all of that load for all of the people that might be trying 00:30:49.100 --> 00:30:51.860 to update and change that database. 00:30:51.860 --> 00:30:54.140 And it also has a slightly smaller version 00:30:54.140 --> 00:30:57.140 of the same problem of a single point of failure. 00:30:57.140 --> 00:31:00.770 There is no longer a single point of failure for reading from that data. 00:31:00.770 --> 00:31:03.750 If you want to read from the data, and one of the databases goes out, 00:31:03.750 --> 00:31:07.340 you can read data from any of the other databases, and they'll work just fine. 00:31:07.340 --> 00:31:10.670 But it does have the drawback that, if this database fails, 00:31:10.670 --> 00:31:13.040 if our primary database fails, well, then 00:31:13.040 --> 00:31:14.750 we're no longer able to write data. 00:31:14.750 --> 00:31:17.150 If we want to update data inside of our database, 00:31:17.150 --> 00:31:19.910 this one database is no longer going to be operational. 00:31:19.910 --> 00:31:24.673 And none of the other databases are going to allow us to write new changes. 
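The read and write routing in single-primary replication can be modeled with plain dictionaries standing in for databases. This is a toy sketch under the assumption that the primary pushes every change to its replicas immediately; real database systems replicate over a network and may lag. The class name and methods are hypothetical.

```python
import itertools


class SinglePrimaryCluster:
    """One writable primary plus read-only replicas kept in sync."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Rotate reads across the replicas to spread the load.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        """All writes go to the primary, which then informs every replica."""
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        """Reads can be served by any replica."""
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)
```

Every write passes through `self.primary`, which is why the primary carries the whole write load and remains a single point of failure for writes.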
00:31:24.673 --> 00:31:27.840 So there are a couple of approaches we can use to try to solve this problem. 00:31:27.840 --> 00:31:31.145 One approach though is, instead of having a single-primary database-- 00:31:31.145 --> 00:31:33.950 a single database to which we can read and write-- 00:31:33.950 --> 00:31:36.610 to use a multi-primary approach. 00:31:36.610 --> 00:31:40.160 And in the multi-primary approach, we have multiple databases, all of which 00:31:40.160 --> 00:31:41.810 we can read and write to. 00:31:41.810 --> 00:31:44.230 We can select rows from all the databases. 00:31:44.230 --> 00:31:48.780 And we can insert, update, and delete rows in all of these databases as well. 00:31:48.780 --> 00:31:52.050 But now the synchronization process becomes a little bit trickier. 00:31:52.050 --> 00:31:54.050 And here, now, is the trade-off-- that now we've 00:31:54.050 --> 00:31:55.850 replicated the number of reads and writes 00:31:55.850 --> 00:31:59.870 we can do by having many databases to which we can read data and write data. 00:31:59.870 --> 00:32:02.870 But anytime any of these databases changes, 00:32:02.870 --> 00:32:07.695 every database needs to inform all of the other databases of those updates. 00:32:07.695 --> 00:32:10.070 And that's, certainly, going to take some amount of time. 00:32:10.070 --> 00:32:13.160 It introduces some complexity into our system as well. 00:32:13.160 --> 00:32:16.550 And it also introduces the possibility for conflicts. 00:32:16.550 --> 00:32:19.550 You might imagine situations where, if two people are editing 00:32:19.550 --> 00:32:21.830 similar data at the same time, you might run 00:32:21.830 --> 00:32:24.080 into a number of different types of conflicts. 00:32:24.080 --> 00:32:27.560 So one type of conflict, for example, would be an update conflict.
00:32:27.560 --> 00:32:30.170 If I tried to edit one row in one database, 00:32:30.170 --> 00:32:34.040 and someone else tries to edit the same row in another database, when they sync 00:32:34.040 --> 00:32:36.230 up with each other via this update process, 00:32:36.230 --> 00:32:38.600 our database system needs some way to decide 00:32:38.600 --> 00:32:42.200 how it's going to resolve those various different updates. 00:32:42.200 --> 00:32:44.880 Another conflict might be a uniqueness conflict. 00:32:44.880 --> 00:32:46.907 We've seen, in the case of databases in SQL 00:32:46.907 --> 00:32:48.740 that, when we're designing our tables, I can 00:32:48.740 --> 00:32:51.980 specify that this particular field should be a unique field-- 00:32:51.980 --> 00:32:56.030 a common one being the ID field, for example, where every single row is 00:32:56.030 --> 00:32:58.100 going to have its own unique ID. 00:32:58.100 --> 00:33:01.670 Well, what happens if two people try to insert data at the same time 00:33:01.670 --> 00:33:03.350 into two different databases? 00:33:03.350 --> 00:33:07.610 They're each given a unique ID, but it's the same ID on both of the databases, 00:33:07.610 --> 00:33:11.240 because neither database knows that the other database has added a new row yet. 00:33:11.240 --> 00:33:14.540 So when they sync back up, we might run into a uniqueness conflict 00:33:14.540 --> 00:33:18.290 where two different databases have assigned the same exact ID 00:33:18.290 --> 00:33:19.730 to multiple different entries. 00:33:19.730 --> 00:33:23.117 So we need some way to be able to resolve those conflicts as well. 00:33:23.117 --> 00:33:24.950 And there are many other conflicts you might 00:33:24.950 --> 00:33:28.340 imagine trying to deal with-- one example being, for instance, delete 00:33:28.340 --> 00:33:31.430 conflicts, where one person tries to delete a row 00:33:31.430 --> 00:33:33.710 and another person tries to update that row.
00:33:33.710 --> 00:33:35.278 Well, which should take precedence? 00:33:35.278 --> 00:33:36.320 Should we update the row? 00:33:36.320 --> 00:33:37.610 Should we delete the row? 00:33:37.610 --> 00:33:41.450 We need some way to be able to make those decisions because there 00:33:41.450 --> 00:33:45.150 is some latency between when a change is made to a database 00:33:45.150 --> 00:33:48.600 and when that database is able to communicate with another database. 00:33:48.600 --> 00:33:51.290 So these issues of scale, these issues of synchronization 00:33:51.290 --> 00:33:53.330 are always going to come up as we start to deal 00:33:53.330 --> 00:33:56.970 with programs that are interacting with more and more of this kind of data. 00:33:56.970 --> 00:33:59.810 And as a result, we need to design more and more sophisticated 00:33:59.810 --> 00:34:04.040 systems that are able to deal with those issues of scale. 00:34:04.040 --> 00:34:09.139 Now, ultimately, we'd ideally like to reduce the number of different database 00:34:09.139 --> 00:34:10.130 servers that we have. 00:34:10.130 --> 00:34:12.692 Every additional database server is going to cost time. 00:34:12.692 --> 00:34:13.900 It's going to cost resources. 00:34:13.900 --> 00:34:17.060 It costs money in terms of keeping all of these servers running. 00:34:17.060 --> 00:34:20.960 And so, ideally, we'd like not to have to talk to this database 00:34:20.960 --> 00:34:22.590 if we don't need to. 00:34:22.590 --> 00:34:26.360 So you might imagine, for example, a news organization's website, something 00:34:26.360 --> 00:34:28.275 like the front page of the New York Times. 00:34:28.275 --> 00:34:30.650 If you go to the home page of the New York Times website, 00:34:30.650 --> 00:34:33.230 it displays all of the day's headlines with images 00:34:33.230 --> 00:34:36.860 and with information about what each of the stories are about, for example. 
00:34:36.860 --> 00:34:39.983 And you might imagine that the way they're doing something like this 00:34:39.983 --> 00:34:41.900 is that they have some kind of database that's 00:34:41.900 --> 00:34:43.670 storing all of these news articles. 00:34:43.670 --> 00:34:46.040 And when you visit the front page of the New York Times, 00:34:46.040 --> 00:34:48.290 it's going to do some kind of database query-- 00:34:48.290 --> 00:34:51.500 selecting all of the recent top headlines, for example-- 00:34:51.500 --> 00:34:56.460 and rendering all of that information in an HTML page that you can see. 00:34:56.460 --> 00:34:57.930 And that would certainly work. 00:34:57.930 --> 00:35:00.440 But if a lot of people are all requesting the front page 00:35:00.440 --> 00:35:04.670 at the same time, well, it probably doesn't make all that much sense 00:35:04.670 --> 00:35:08.390 if the web application, every time, is making a database query, getting 00:35:08.390 --> 00:35:13.040 the latest articles, and then displaying that information to all of the users 00:35:13.040 --> 00:35:16.130 because the articles might not be changing all that frequently. 00:35:16.130 --> 00:35:18.440 If one person makes a request one second, 00:35:18.440 --> 00:35:21.710 and another person makes the same request half a second later, 00:35:21.710 --> 00:35:26.150 it probably is not going to be useful to re-request all of the information 00:35:26.150 --> 00:35:29.450 from the database, regenerate that template yet again, because it's 00:35:29.450 --> 00:35:33.050 an expensive process of requesting data from the database, of generating 00:35:33.050 --> 00:35:33.800 that template. 00:35:33.800 --> 00:35:36.710 We'd, ideally, like some way of dealing with that problem. 00:35:36.710 --> 00:35:40.040 And the way we can deal with that problem is some form of caching. 
00:35:40.040 --> 00:35:44.300 And caching refers to a whole bunch of different types of ideas and tools 00:35:44.300 --> 00:35:47.660 that we can use at various different places inside of our system. 00:35:47.660 --> 00:35:50.390 But in general, when we're talking about caching, 00:35:50.390 --> 00:35:54.680 we're talking about storing a saved version of some information in a way 00:35:54.680 --> 00:35:58.340 that we can access it more quickly so that we don't need to continue making 00:35:58.340 --> 00:36:00.720 requests to a database, for example. 00:36:00.720 --> 00:36:02.930 And so there are a number of ways we can do caching. 00:36:02.930 --> 00:36:07.010 One way we can do caching is on the client side via client-side caching 00:36:07.010 --> 00:36:08.850 where the idea is that your browser-- 00:36:08.850 --> 00:36:11.030 whether it's Safari, or Chrome, or something else-- 00:36:11.030 --> 00:36:13.700 is able to cache data, store information, 00:36:13.700 --> 00:36:17.070 so that the browser doesn't need to re-request the same information 00:36:17.070 --> 00:36:19.050 the next time it visits the page. 00:36:19.050 --> 00:36:21.680 For example, if you request a page and it loads an image-- 00:36:21.680 --> 00:36:23.210 on the page, for example-- 00:36:23.210 --> 00:36:25.850 and you reload the page, well, your web browser 00:36:25.850 --> 00:36:28.760 might try and make a request again for the exact same image 00:36:28.760 --> 00:36:30.020 and then display it to you. 00:36:30.020 --> 00:36:33.500 But an alternative might be that your web browser could just 00:36:33.500 --> 00:36:35.960 save a copy of the image inside of a cache 00:36:35.960 --> 00:36:40.280 to locally store a version of the image so that, the next time 00:36:40.280 --> 00:36:42.860 that the user makes a request to the website, the user 00:36:42.860 --> 00:36:45.410 doesn't need to reload that entire image. 
00:36:45.410 --> 00:36:48.650 And that might be true of entire web pages and web resources-- 00:36:48.650 --> 00:36:51.770 that if there is some page that doesn't change very often then, 00:36:51.770 --> 00:36:55.850 if the web browser just stores a cached, a saved version of that page, 00:36:55.850 --> 00:36:58.340 then the next time the user goes to their web browser, 00:36:58.340 --> 00:37:03.020 tries to access that page, rather than re-request to the server and make a new 00:37:03.020 --> 00:37:06.440 request that the server needs to respond to, if the browser has that page 00:37:06.440 --> 00:37:09.530 cached, the browser can just display the cached-- 00:37:09.530 --> 00:37:13.830 saved-- version of the page, saving the need to talk to the server at all. 00:37:13.830 --> 00:37:16.970 So this can certainly help to reduce the load on any given server. 00:37:16.970 --> 00:37:20.360 If users are caching information inside of the web browser, 00:37:20.360 --> 00:37:22.480 it makes the experience faster for the user 00:37:22.480 --> 00:37:24.980 because they can see the information immediately rather than 00:37:24.980 --> 00:37:28.070 need to make a request and wait for a response to come back. 00:37:28.070 --> 00:37:30.140 And it's good for the server because the server 00:37:30.140 --> 00:37:33.740 doesn't need to be dealing with as many requests if some of those requests 00:37:33.740 --> 00:37:35.160 are getting cached. 00:37:35.160 --> 00:37:37.400 And so one approach to trying to do this is 00:37:37.400 --> 00:37:42.290 by adding this inside of the headers of an HTTP response. 
00:37:42.290 --> 00:37:44.960 When your web server responds to some request, 00:37:44.960 --> 00:37:48.770 the web server can include a line like this inside of the response-- 00:37:48.770 --> 00:37:53.210 something like Cache-Control: max-age=86400-- 00:37:53.210 --> 00:37:56.330 in effect, specifying the number of seconds 00:37:56.330 --> 00:37:58.850 that you should cache this resource for. 00:37:58.850 --> 00:38:02.510 And if I try to access this page 10 seconds later, 00:38:02.510 --> 00:38:04.910 well, that's less than 86,400. 00:38:04.910 --> 00:38:08.600 So rather than reload and re-request the entire page, 00:38:08.600 --> 00:38:11.390 we're just going to use the version of the page that happens 00:38:11.390 --> 00:38:13.750 to be cached inside of the web browser. 00:38:13.750 --> 00:38:16.250 And so this has several advantages, that we've talked about, 00:38:16.250 --> 00:38:19.640 in terms of reducing the amount of time it takes to see the content of the page 00:38:19.640 --> 00:38:23.570 because it's already saved and reducing the load on any particular server. 00:38:23.570 --> 00:38:25.040 But it also has drawbacks. 00:38:25.040 --> 00:38:29.180 If, for example, the resource changes within this amount of time-- 00:38:29.180 --> 00:38:32.240 maybe in 60 seconds, the page has changed-- 00:38:32.240 --> 00:38:35.120 if I try and load the page again, well, then 00:38:35.120 --> 00:38:37.400 if it's loading the cached version of the page, 00:38:37.400 --> 00:38:40.400 I might be seeing an outdated version of a web page. 00:38:40.400 --> 00:38:42.470 I'm seeing an older version of the web page 00:38:42.470 --> 00:38:45.320 because my web browser just so happens to have 00:38:45.320 --> 00:38:47.570 that particular resource cached. 00:38:47.570 --> 00:38:49.610 And this might be true of a web page. 00:38:49.610 --> 00:38:53.630 It's especially true of other static resources, things like CSS files 00:38:53.630 --> 00:38:54.760 or JavaScript files.
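As a rough sketch of the freshness check a browser might make with that max-age value: the function names here (parse_max_age, is_fresh) are made up for illustration, and a real browser follows many more rules than this.

```python
def parse_max_age(cache_control):
    """Return the max-age value (in seconds) from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(stored_at, cache_control, now):
    """Decide whether a copy cached at time `stored_at` can still be reused at time `now`.

    Both times are plain seconds; this is a simplification of what browsers do.
    """
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return False  # no max-age given, so this sketch never reuses the copy
    return (now - stored_at) < max_age


# Cached at t=0 with a one-day lifetime, then requested again 10 seconds later.
reuse_after_10s = is_fresh(0, "max-age=86400", 10)
# Requested again after more than a day has passed.
reuse_after_a_day = is_fresh(0, "max-age=86400", 86401)
```

The drawback described above is visible here too: nothing in this check looks at whether the resource actually changed on the server, only at how much time has passed.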
00:38:54.760 --> 00:38:58.860 The CSS of a web page probably doesn't change all that often. 00:38:58.860 --> 00:39:02.120 And so, as a result, it's pretty natural that your web browser-- 00:39:02.120 --> 00:39:05.870 rather than request the exact same CSS files again, and again, and again-- 00:39:05.870 --> 00:39:08.650 might just save a copy of those CSS files, 00:39:08.650 --> 00:39:12.380 cache them, such that it's able to just reuse the cached version. 00:39:12.380 --> 00:39:14.690 But if the website were to update their CSS, 00:39:14.690 --> 00:39:16.355 you might not see the latest changes. 00:39:16.355 --> 00:39:18.230 And you might have experienced this yourself. 00:39:18.230 --> 00:39:21.410 If you're working on your own web applications, when you change your CSS 00:39:21.410 --> 00:39:23.270 and refresh the page, you might not always 00:39:23.270 --> 00:39:27.900 see those changes reflected if your web browser is caching those results. 00:39:27.900 --> 00:39:30.710 And so, in most web browsers, you can do a hard refresh 00:39:30.710 --> 00:39:33.740 to say, ignore whatever is in the cache, and actually go out 00:39:33.740 --> 00:39:36.030 and make a new request and get some new data. 00:39:36.030 --> 00:39:38.810 But ultimately, if you don't do that, you're 00:39:38.810 --> 00:39:42.230 subject to this cache control where the web browser is going to say, 00:39:42.230 --> 00:39:44.750 unless this number of seconds has elapsed, 00:39:44.750 --> 00:39:48.500 we're going to reuse the existing version of the page. 00:39:48.500 --> 00:39:51.590 And so an alternative to this approach-- and this approach certainly works 00:39:51.590 --> 00:39:52.670 and is quite popular-- 00:39:52.670 --> 00:39:56.950 we can add to this approach by adding what's known as ETag. 
00:39:56.950 --> 00:40:00.290 An ETag for a resource-- like a CSS file, or an image, 00:40:00.290 --> 00:40:01.590 or a JavaScript file-- 00:40:01.590 --> 00:40:04.190 is just some unique sequence of characters 00:40:04.190 --> 00:40:07.610 that identifies a particular version of a resource, 00:40:07.610 --> 00:40:11.300 that identifies a particular version of a CSS file or a JavaScript file, 00:40:11.300 --> 00:40:12.930 for example. 00:40:12.930 --> 00:40:14.840 And what this allows a program to do-- 00:40:14.840 --> 00:40:16.010 like a web browser-- 00:40:16.010 --> 00:40:18.230 is that, when a web browser requests a resource-- 00:40:18.230 --> 00:40:21.410 makes a request for a CSS file or a JavaScript file-- 00:40:21.410 --> 00:40:22.370 they get it back. 00:40:22.370 --> 00:40:25.760 And they get its associated ETag value, so I 00:40:25.760 --> 00:40:28.310 know that this is the value that is associated 00:40:28.310 --> 00:40:31.040 with this version of the CSS file. 00:40:31.040 --> 00:40:35.720 And if the web server were ever to change that CSS file, replace it 00:40:35.720 --> 00:40:41.820 with a new updated CSS file, the corresponding ETag will also change. 00:40:41.820 --> 00:40:43.650 So why is this helpful? 00:40:43.650 --> 00:40:46.730 Well, it means that if I am trying to decide, should I 00:40:46.730 --> 00:40:50.070 load a new version of the resource or not, 00:40:50.070 --> 00:40:53.510 should I try and make another request to get the latest version of the CSS, 00:40:53.510 --> 00:40:55.970 what I can do first is just ask for, what 00:40:55.970 --> 00:40:59.660 is the ETag value, the short sequence that can be answered very quickly? 00:40:59.660 --> 00:41:02.090 Very quickly, we can just respond and say, 00:41:02.090 --> 00:41:05.360 you know what, if the ETag value is the same as what I remembered 00:41:05.360 --> 00:41:07.850 from last time, well, then I don't need to get 00:41:07.850 --> 00:41:10.340 a whole new version of that resource. 
00:41:10.340 --> 00:41:13.070 And so this is quite common, too, that a web browser will say, 00:41:13.070 --> 00:41:15.110 hey, let me request this resource. 00:41:15.110 --> 00:41:19.200 But I already have a version of the resource with this particular ETag. 00:41:19.200 --> 00:41:24.110 So if that ETag is still the ETag for the most recent version of a particular 00:41:24.110 --> 00:41:26.450 resource-- like a CSS or JavaScript file-- 00:41:26.450 --> 00:41:30.650 then no need for the web server to send a new version of that file. 00:41:30.650 --> 00:41:33.650 Just go ahead and respond and say, the version you have-- that one 00:41:33.650 --> 00:41:34.920 works-- totally fine. 00:41:34.920 --> 00:41:38.280 But if there is a new version, well, then the web server can respond with 00:41:38.280 --> 00:41:41.130 the new asset-- the new CSS file, for example-- 00:41:41.130 --> 00:41:43.430 but also the new ETag value. 00:41:43.430 --> 00:41:46.160 So these two approaches can work in concert with each other. 00:41:46.160 --> 00:41:49.220 You can say, go ahead and cache this for some number of seconds 00:41:49.220 --> 00:41:51.020 so that, for some number of seconds, you're 00:41:51.020 --> 00:41:54.680 not going to ever request a new version of that resource. 00:41:54.680 --> 00:41:57.710 But even if you do ask for a new version of the resource 00:41:57.710 --> 00:41:59.900 after this number of seconds has elapsed, 00:41:59.900 --> 00:42:02.390 if the ETag value hasn't updated, then no 00:42:02.390 --> 00:42:06.090 need to redownload a whole new version of a particular file. 00:42:06.090 --> 00:42:08.750 You can just reuse the version that happens 00:42:08.750 --> 00:42:10.890 to be cached already in the browser. 00:42:10.890 --> 00:42:14.270 So caching in the browser can be an incredibly powerful tool 00:42:14.270 --> 00:42:17.000 for trying to speed up these requests, for trying to reduce 00:42:17.000 --> 00:42:19.070 the load on any particular server. 
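In HTTP, this exchange is usually done with the browser sending the ETag it remembers in an If-None-Match request header, and the server answering 304 Not Modified when nothing has changed. Here is a minimal sketch of the server side of that, assuming the ETag is derived from a hash of the file contents (one common choice, not the only one); make_etag and respond are hypothetical names.

```python
import hashlib


def make_etag(body):
    """Build an ETag from the content itself, so it changes whenever the content does."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a request for `body`.

    `if_none_match` is the ETag the browser says it already has, if any.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        # The browser's copy is current: send no body at all.
        return 304, {"ETag": etag}, b""
    # First request, or the resource changed: send the full body and its ETag.
    return 200, {"ETag": etag}, body


css_v1 = b"body { color: black; }"
css_v2 = b"body { color: navy; }"

first_status, first_headers, _ = respond(css_v1)
remembered = first_headers["ETag"]

unchanged_status, _, unchanged_payload = respond(css_v1, remembered)
changed_status, changed_headers, changed_payload = respond(css_v2, remembered)
```

The short ETag comparison replaces a full download whenever the file is unchanged, which is the saving described above.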
00:42:19.070 --> 00:42:21.290 But the client side is not the only place 00:42:21.290 --> 00:42:23.510 where we can begin to do this kind of caching. 00:42:23.510 --> 00:42:26.330 We also have the ability to do server-side caching. 00:42:26.330 --> 00:42:30.560 And in server-side caching, we're going to introduce to our picture the notion 00:42:30.560 --> 00:42:31.940 of a cache-- 00:42:31.940 --> 00:42:34.160 that we have these multiple servers that are all 00:42:34.160 --> 00:42:35.720 communicating with the database. 00:42:35.720 --> 00:42:38.300 But these servers can also communicate with a cache-- 00:42:38.300 --> 00:42:41.360 someplace where we've stored information that we 00:42:41.360 --> 00:42:46.340 might want to reuse later rather than have to do all of that recalculation. 00:42:46.340 --> 00:42:49.280 And Django, it turns out, has an entire cache framework, 00:42:49.280 --> 00:42:51.530 a whole host of features that Django offers 00:42:51.530 --> 00:42:54.860 that allow us to leverage this ability to use the cache 00:42:54.860 --> 00:42:56.470 to be able to speed up requests. 00:42:56.470 --> 00:42:59.150 So there are per-view caches where you can 00:42:59.150 --> 00:43:02.720 specify a cache on a particular view to say that, rather than run 00:43:02.720 --> 00:43:05.540 through all this Python code every time someone makes 00:43:05.540 --> 00:43:09.410 a request to this particular view, instead, 00:43:09.410 --> 00:43:14.150 just cache the view so that, for the next 30 seconds or 30 minutes, 00:43:14.150 --> 00:43:16.940 the next time someone tries to visit the same view, 00:43:16.940 --> 00:43:19.910 go ahead and just reuse the results of the last time 00:43:19.910 --> 00:43:21.665 that that view was loaded. 00:43:21.665 --> 00:43:23.540 And this can work not just for a single view. 00:43:23.540 --> 00:43:25.657 It can work for fragments inside of a template. 00:43:25.657 --> 00:43:27.740 Your template might have multiple different parts.
00:43:27.740 --> 00:43:31.190 On your web page, you might render the navigation bar, and the sidebar, 00:43:31.190 --> 00:43:33.800 and the footer, maybe based on information about today 00:43:33.800 --> 00:43:36.050 that might change the next day. 00:43:36.050 --> 00:43:38.510 But if you expect that the sidebar of your page 00:43:38.510 --> 00:43:41.570 is not going to change very often within the same minute 00:43:41.570 --> 00:43:43.820 or within the same hour, well, then you might imagine 00:43:43.820 --> 00:43:46.910 caching that part of the template so that, the next time 00:43:46.910 --> 00:43:49.160 that Django tries to load that entire template, 00:43:49.160 --> 00:43:52.550 it doesn't need to recalculate how to generate the sidebar for your website. 00:43:52.550 --> 00:43:56.330 It just knows that we can use the same version of the sidebar 00:43:56.330 --> 00:43:59.786 from the last time that we loaded this website instead. 00:43:59.786 --> 00:44:03.600 And Django also gives you access to a lower level cache API 00:44:03.600 --> 00:44:07.080 where, for any information that you might want to cache and store for use 00:44:07.080 --> 00:44:10.140 later, you can save that information inside of the cache. 00:44:10.140 --> 00:44:12.180 You make an expensive database query that 00:44:12.180 --> 00:44:15.360 takes a couple of milliseconds or a couple of seconds to process. 00:44:15.360 --> 00:44:17.760 You can save those results inside of a cache 00:44:17.760 --> 00:44:20.550 to make it easier to access that same data if ever you 00:44:20.550 --> 00:44:22.930 try to get access to that again. 00:44:22.930 --> 00:44:26.430 So caching allows us to be able to deal with these issues of scale 00:44:26.430 --> 00:44:29.910 by reducing load on our servers, but also on our databases.
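To make the idea of saving an expensive query result concrete, here is a small stand-in written in plain Python. It is not Django's cache framework: SimpleCache, top_headlines, and run_query are hypothetical names, and the get and set methods only loosely mirror the shape of a key-value cache with a timeout.

```python
import time


class SimpleCache:
    """A tiny in-memory cache with per-entry expiry (illustrative stand-in only)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping

    def set(self, key, value, timeout):
        """Store `value` under `key` for `timeout` seconds."""
        self._store[key] = (value, self._clock() + timeout)

    def get(self, key, default=None):
        """Return the stored value, or `default` if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return default
        return value


def top_headlines(cache, run_query, timeout=30):
    """Return headlines from the cache, running the expensive query only on a miss."""
    headlines = cache.get("top_headlines")
    if headlines is None:
        headlines = run_query()
        cache.set("top_headlines", headlines, timeout)
    return headlines


# Demonstration with a fake clock and a fake "database query" that counts its calls.
fake_now = [0.0]
query_calls = []


def fake_query():
    query_calls.append(1)
    return ["Headline A", "Headline B"]


demo_cache = SimpleCache(clock=lambda: fake_now[0])
first = top_headlines(demo_cache, fake_query)
second = top_headlines(demo_cache, fake_query)   # served from the cache
calls_before_expiry = len(query_calls)
fake_now[0] = 31.0                               # more than 30 seconds later
third = top_headlines(demo_cache, fake_query)    # cache expired, query runs again
calls_after_expiry = len(query_calls)
```

Two requests half a second apart share one query; only after the timeout does the database get asked again.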
00:44:29.910 --> 00:44:33.330 Rather than need to talk to the database every single time we 00:44:33.330 --> 00:44:36.750 make a new request for a particular web application, 00:44:36.750 --> 00:44:39.060 we can just reuse information that happens 00:44:39.060 --> 00:44:42.930 to be in the cache to allow our web applications to become even more 00:44:42.930 --> 00:44:44.350 scalable. 00:44:44.350 --> 00:44:48.000 So that then was a look at some issues concerning scalability. 00:44:48.000 --> 00:44:50.580 And we'll next turn our attention to security-- 00:44:50.580 --> 00:44:53.610 trying to make sure that, as we build our web applications, as we deploy 00:44:53.610 --> 00:44:56.370 our web applications and more users start to use them, 00:44:56.370 --> 00:44:58.290 we want to make sure that they're secure. 00:44:58.290 --> 00:45:00.570 And there are a whole bunch of security considerations 00:45:00.570 --> 00:45:03.170 to take into account across all of the topics 00:45:03.170 --> 00:45:04.650 that we've looked at in the course. 00:45:04.650 --> 00:45:06.525 We've looked at a number of different topics. 00:45:06.525 --> 00:45:09.400 And with each of them, there are security vulnerabilities. 00:45:09.400 --> 00:45:12.720 There are ideas to be mindful of when it comes towards making sure 00:45:12.720 --> 00:45:14.580 that our applications are secure. 00:45:14.580 --> 00:45:18.420 And we can begin our story, in fact, by talking about Git and version control. 00:45:18.420 --> 00:45:20.370 Git is all about trying to make sure we're 00:45:20.370 --> 00:45:22.860 able to keep track of different versions of our code. 00:45:22.860 --> 00:45:24.780 And one thing that goes hand-in-hand with Git 00:45:24.780 --> 00:45:27.480 is this idea of open-source software. 
00:45:27.480 --> 00:45:30.930 On websites like GitHub and other services that host Git repositories, 00:45:30.930 --> 00:45:33.930 increasingly, a lot of software is becoming open source 00:45:33.930 --> 00:45:38.190 where anyone can see and contribute to the source code of an application. 00:45:38.190 --> 00:45:40.868 And this is great in the sense that it allows for many people 00:45:40.868 --> 00:45:42.660 to be able to collaborate and work together 00:45:42.660 --> 00:45:46.590 in order to try to find bugs that might exist inside of a web application. 00:45:46.590 --> 00:45:48.810 But it also comes with drawbacks-- drawbacks 00:45:48.810 --> 00:45:51.333 where, if there is a bug in the application, 00:45:51.333 --> 00:45:54.000 now someone who's looking through the source code of our program 00:45:54.000 --> 00:45:56.250 might be able to spot that bug. 00:45:56.250 --> 00:45:58.920 Or you might imagine that, because Git keeps 00:45:58.920 --> 00:46:01.830 track of different versions of our code every time 00:46:01.830 --> 00:46:04.050 we make a commit to our repository, you have 00:46:04.050 --> 00:46:07.110 to be very careful when it comes towards credentials or things that 00:46:07.110 --> 00:46:08.910 might leak inside of the source code. 00:46:08.910 --> 00:46:12.600 You generally never want to put passwords or any secure information 00:46:12.600 --> 00:46:15.990 inside of the Git repository because the Git repository could 00:46:15.990 --> 00:46:19.000 be shared with other people and might be open to anyone to look at. 
00:46:19.000 --> 00:46:22.200 And so those are security considerations to be mindful of there as 00:46:22.200 --> 00:46:25.920 well-- that if you make a commit, and accidentally make a commit to your code 00:46:25.920 --> 00:46:29.610 where you expose those credentials, you might remove those credentials 00:46:29.610 --> 00:46:32.160 and commit again so the latest version of your program 00:46:32.160 --> 00:46:34.140 doesn't have those credentials in it. 00:46:34.140 --> 00:46:36.540 But someone who has access to the Git repository 00:46:36.540 --> 00:46:39.150 has access not just to the latest version of your code, 00:46:39.150 --> 00:46:41.110 but to every version of your code. 00:46:41.110 --> 00:46:43.650 And that person could, theoretically, go back 00:46:43.650 --> 00:46:46.770 through the history of the repository and find the commit 00:46:46.770 --> 00:46:51.040 where the credentials were exposed and see those credentials as well. 00:46:51.040 --> 00:46:54.270 So while Git is a very powerful tool, it's also one to be mindful of. 00:46:54.270 --> 00:46:57.840 Any change you make could potentially get saved inside of a commit-- 00:46:57.840 --> 00:47:00.690 could potentially, therefore, be accessed later on. 00:47:00.690 --> 00:47:04.380 And so if ever credentials are exposed inside of the repository, 00:47:04.380 --> 00:47:07.260 you want to make sure to wipe out all of those previous commits 00:47:07.260 --> 00:47:09.690 and not just make some new commit in order 00:47:09.690 --> 00:47:13.740 to try and hide the previous credentials that can be exposed because they can 00:47:13.740 --> 00:47:17.010 still be retrieved if someone goes back through the history 00:47:17.010 --> 00:47:19.300 of any particular repository. 00:47:19.300 --> 00:47:23.025 And so that, then, was a look at some issues that might surround Git.
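One common way to keep a password out of the repository in the first place is to read it from the environment when the program runs, rather than writing it into the source. This is a minimal sketch of that idea; the variable name DATABASE_PASSWORD and the function get_database_password are assumptions made up for this example.

```python
import os


def get_database_password(environ=os.environ):
    """Read the password from the environment so it never appears in committed code.

    `environ` is a parameter only so the behavior can be shown with a plain dict.
    """
    password = environ.get("DATABASE_PASSWORD")
    if not password:
        # Failing loudly is safer than silently falling back to a hard-coded value.
        raise RuntimeError("DATABASE_PASSWORD is not set")
    return password


# Demonstration with a stand-in environment instead of the real one.
example_password = get_database_password({"DATABASE_PASSWORD": "s3cret"})
```

Since the value lives outside the code, no commit, old or new, ever contains it.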
00:47:23.025 --> 00:47:24.900 We also talked at the beginning of the course 00:47:24.900 --> 00:47:28.110 about HTML, and about what it is that we can use with HTML, 00:47:28.110 --> 00:47:32.040 and how we can use this language in order to design the structure of a web 00:47:32.040 --> 00:47:36.150 page, in order to decide where all of the paragraphs are going to be, 00:47:36.150 --> 00:47:38.070 what tables are going to be on the page. 00:47:38.070 --> 00:47:40.710 We talked about links and how we can use anchor tags 00:47:40.710 --> 00:47:42.960 to link one page to another page. 00:47:42.960 --> 00:47:47.640 Now, one concern is this type of attack known as a phishing attack with HTML. 00:47:47.640 --> 00:47:49.830 And a phishing attack really just comes down 00:47:49.830 --> 00:47:53.100 to a little bit of HTML that looks like this-- very easy to write, 00:47:53.100 --> 00:47:57.690 where I have an anchor tag that is going to direct the user to URL 1. 00:47:57.690 --> 00:48:01.860 But it looks like it directs the user to URL 2. 00:48:01.860 --> 00:48:03.930 So what might an example of this be? 00:48:03.930 --> 00:48:05.380 All right, so we'll take a look. 00:48:05.380 --> 00:48:09.280 I'll go ahead and open up link.html. 00:48:09.280 --> 00:48:11.770 And in link.html, I have a website that I've written 00:48:11.770 --> 00:48:13.950 that appears to have a link to Google. 00:48:13.950 --> 00:48:16.030 But if I click on that link, I'm suddenly 00:48:16.030 --> 00:48:19.162 directed to this course's website, for example. 00:48:19.162 --> 00:48:20.120 So how did that happen? 00:48:20.120 --> 00:48:20.953 Why did that happen? 00:48:20.953 --> 00:48:22.670 It seems like it's linking to Google.
00:48:22.670 --> 00:48:26.290 Well, if you look at the code, if I go ahead and open up link.html, 00:48:26.290 --> 00:48:31.360 we'll see that here I have an anchor tag that actually links to the course 00:48:31.360 --> 00:48:34.150 website but appears to be linking-- the text 00:48:34.150 --> 00:48:37.900 that the user sees appears that it is linking instead to Google. 00:48:37.900 --> 00:48:41.360 And so this is a very common attack vector, especially in emails, 00:48:41.360 --> 00:48:41.980 for example. 00:48:41.980 --> 00:48:45.040 You might see an email that tells you to click on a particular link. 00:48:45.040 --> 00:48:48.070 But that link takes you to somewhere else entirely instead. 00:48:48.070 --> 00:48:50.380 And as a result, someone might inadvertently 00:48:50.380 --> 00:48:54.010 share their bank account credentials or other sensitive information. 00:48:54.010 --> 00:48:57.220 And so here, too, something to be mindful of as you interact with the web, 00:48:57.220 --> 00:49:00.490 maybe not necessarily on your own website, but in other websites 00:49:00.490 --> 00:49:03.940 that you might interact with, just to be mindful about where links are actually 00:49:03.940 --> 00:49:04.580 taking you. 00:49:04.580 --> 00:49:07.300 And most web browsers, if you hover over a link, 00:49:07.300 --> 00:49:09.400 will show you where that link might actually 00:49:09.400 --> 00:49:12.010 be directing you to because it might be different than what 00:49:12.010 --> 00:49:17.930 the text of that particular anchor tag might appear to link you to instead. 
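The mismatch between what a link says and where it goes can also be checked by a program. This is a crude sketch using Python's standard HTML parser: it flags anchor tags whose visible text looks like a web address on a different host than the href. The names LinkCollector, host_of, and suspicious_links are hypothetical, the example.edu address is a placeholder, and the heuristic is deliberately simple (for instance, it would treat www.google.com and google.com as different hosts).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every anchor tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(value):
    """Return the lowercased host of a URL, accepting bare hosts like 'google.com'."""
    if "//" not in value:
        value = "//" + value
    return (urlparse(value).hostname or "").lower()


def suspicious_links(html):
    """Return links whose text looks like an address on a different host than the href."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href, text in collector.links:
        looks_like_address = "." in text and " " not in text
        if looks_like_address and host_of(text) != host_of(href):
            flagged.append((href, text))
    return flagged


page = (
    '<a href="https://example.edu/course">https://google.com</a>'
    '<a href="https://google.com/">google.com</a>'
    '<a href="https://example.edu/course">Click here</a>'
)
flagged_links = suspicious_links(page)
```

Only the first link is flagged: its text names one host while its href points at another, which is the same thing hovering over the link reveals.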
00:49:17.930 --> 00:49:21.017 So HTML has all these various different vulnerabilities 00:49:21.017 --> 00:49:24.100 where, because you can just decide what you want the structure of the page 00:49:24.100 --> 00:49:26.710 to be, it leaves open the possibility that someone 00:49:26.710 --> 00:49:29.770 might try to trick you into thinking that you were going to a page 00:49:29.770 --> 00:49:31.420 that you're not actually on. 00:49:31.420 --> 00:49:34.150 And this problem is more widespread because anyone 00:49:34.150 --> 00:49:36.580 can look at the HTML for any page. 00:49:36.580 --> 00:49:38.950 HTML comes back from the server. 00:49:38.950 --> 00:49:42.310 And therefore, the web browser has access to all of that HTML 00:49:42.310 --> 00:49:46.270 and can use that HTML in order to render a page, for example. 00:49:46.270 --> 00:49:49.150 And this leaves open other vulnerabilities, too. 00:49:49.150 --> 00:49:54.760 For example, let me go ahead and go to bankofamerica.com, just 00:49:54.760 --> 00:49:55.900 Bank of America's website. 00:49:55.900 --> 00:49:57.850 You can go to any other website instead. 00:49:57.850 --> 00:50:01.600 If I wanted to create a fake version of Bank of America's website, 00:50:01.600 --> 00:50:03.820 for example, to trick people into thinking 00:50:03.820 --> 00:50:05.740 they're going to Bank of America's website 00:50:05.740 --> 00:50:08.950 when really they're going to my website, well, then what I can do 00:50:08.950 --> 00:50:11.420 is just go ahead and view the source of this page. 00:50:11.420 --> 00:50:13.940 I go ahead and view page source. 00:50:13.940 --> 00:50:17.990 And here is all of the HTML for Bank of America's website. 00:50:17.990 --> 00:50:21.410 And nothing then stops me from copying all this content, 00:50:21.410 --> 00:50:27.440 going into an HTML file, and creating a new file that I'll just call bank.html. 
00:50:27.440 --> 00:50:31.350 And I'll go ahead and paste in the contents of that HTML file, 00:50:31.350 --> 00:50:34.700 so here, then, is all of Bank of America's HTML. 00:50:34.700 --> 00:50:37.190 And now, if I open up bank.html-- 00:50:37.190 --> 00:50:39.920 that HTML file that I have now written, but really 00:50:39.920 --> 00:50:42.320 just copied from Bank of America-- 00:50:42.320 --> 00:50:43.730 I open it up. 00:50:43.730 --> 00:50:47.000 And now here, on my page, is a web page that 00:50:47.000 --> 00:50:48.680 appears to look like Bank of America. 00:50:48.680 --> 00:50:51.170 It's using all of Bank of America's HTML. 00:50:51.170 --> 00:50:56.130 But instead, it is my HTML page and not, actually, Bank of America. 00:50:56.130 --> 00:51:00.350 And so you might imagine combining these to create an even more concerning 00:51:00.350 --> 00:51:03.050 attack vector where, instead of linking to google.com, 00:51:03.050 --> 00:51:06.461 let me try and link to bankofamerica.com. 00:51:06.461 --> 00:51:12.170 But where I'm actually going to link to is bank.html, my version 00:51:12.170 --> 00:51:14.180 of Bank of America's website. 00:51:14.180 --> 00:51:18.170 Now, if I open up link.html, here appears 00:51:18.170 --> 00:51:20.900 to be a link that links me to Bank of America. 00:51:20.900 --> 00:51:23.180 If I click on that link, I get to a page that 00:51:23.180 --> 00:51:25.250 looks like Bank of America's website. 00:51:25.250 --> 00:51:27.260 But it's not Bank of America's website. 00:51:27.260 --> 00:51:30.490 It's my bank.html file that I have written. 00:51:30.490 --> 00:51:33.140 It just so happens to look like Bank of America's website 00:51:33.140 --> 00:51:36.620 because I copied all of that underlying HTML. 00:51:36.620 --> 00:51:39.860 So HTML has the ability to describe the structure of our web page.
00:51:39.860 --> 00:51:43.790 But anytime you're writing this HTML, it's good to be mindful of the fact 00:51:43.790 --> 00:51:48.110 that anyone can copy your HTML, could theoretically pretend to be you. 00:51:48.110 --> 00:51:50.090 These are security vulnerabilities that are 00:51:50.090 --> 00:51:53.240 worth bearing in mind as we start to develop web applications 00:51:53.240 --> 00:51:56.910 and interact with web applications as well. 00:51:56.910 --> 00:52:01.070 So ultimately, we used HTML in the context of designing web applications 00:52:01.070 --> 00:52:02.960 using Django, a framework. 00:52:02.960 --> 00:52:05.690 And how exactly, then, did these web frameworks 00:52:05.690 --> 00:52:10.250 work in terms of creating these web servers that are listening for requests 00:52:10.250 --> 00:52:12.650 and that are responding to those requests? 00:52:12.650 --> 00:52:14.390 Well, ultimately, much of the internet is 00:52:14.390 --> 00:52:17.930 based around this idea of a client communicating with a server or, more 00:52:17.930 --> 00:52:20.420 generally, any one computer communicating 00:52:20.420 --> 00:52:23.810 with another computer using HTTP and, in particular, 00:52:23.810 --> 00:52:28.618 HTTPS, a more secure version of the HTTP protocol. 00:52:28.618 --> 00:52:31.160 And so you imagine that what these protocols are really about 00:52:31.160 --> 00:52:34.200 is how information gets from one person to another 00:52:34.200 --> 00:52:36.110 and what we're storing with that information. 00:52:36.110 --> 00:52:39.680 We have one computer trying to communicate with some other computer. 00:52:39.680 --> 00:52:42.440 And in order to do so, information is generally 00:52:42.440 --> 00:52:45.020 going to flow through these routers. 
00:52:45.020 --> 00:52:47.270 You might imagine information going back and forth 00:52:47.270 --> 00:52:49.610 between one computer and another computer, 00:52:49.610 --> 00:52:53.540 going through these intermediate routers along the way. 00:52:53.540 --> 00:52:56.390 And as a result, one thing to be cautious about 00:52:56.390 --> 00:52:58.400 is, how do you know that this information that's 00:52:58.400 --> 00:53:02.390 getting passed back and forth is getting passed back and forth securely? 00:53:02.390 --> 00:53:05.150 Ideally, when I send a message to another computer-- 00:53:05.150 --> 00:53:07.190 I'm sending an email to someone else, I'm 00:53:07.190 --> 00:53:09.800 sending a message, I'm making a request to a website that 00:53:09.800 --> 00:53:13.130 might contain sensitive information, like my bank account, for example-- 00:53:13.130 --> 00:53:17.030 I don't want it so that any intercepting router that is taking my request 00:53:17.030 --> 00:53:18.260 and passing it along-- 00:53:18.260 --> 00:53:21.170 I don't want those routers to be able to look at that request 00:53:21.170 --> 00:53:24.950 and see the contents of my email or the contents of what password 00:53:24.950 --> 00:53:27.620 I happen to be sending across the web or not. 00:53:27.620 --> 00:53:31.005 Ideally, I'd like for this information to be encrypted. 00:53:31.005 --> 00:53:33.380 And so here, we'll talk a little bit about cryptography-- 00:53:33.380 --> 00:53:35.450 this process of trying to make sure that I 00:53:35.450 --> 00:53:37.850 am able to communicate with some other person 00:53:37.850 --> 00:53:42.860 without some eavesdropper in the middle being able to intercept that message. 
00:53:42.860 --> 00:53:45.555 Obviously, if I just take a plain text version 00:53:45.555 --> 00:53:47.930 of the message I'm trying to send and just literally take 00:53:47.930 --> 00:53:51.560 the text of the message I'm trying to send and effectively pass it along 00:53:51.560 --> 00:53:53.660 across the internet, well, then anyone who 00:53:53.660 --> 00:53:57.430 is able to see that message is going to know what the text of that message is. 00:53:57.430 --> 00:53:59.420 And so I want to do some kind of encryption, 00:53:59.420 --> 00:54:02.900 some way of encrypting that message so that someone along the way 00:54:02.900 --> 00:54:06.230 won't be able to do that decryption if a router in the middle 00:54:06.230 --> 00:54:09.408 or someone in the middle is able to intercept that message. 00:54:09.408 --> 00:54:11.450 And so the first approach we'll look at is what's 00:54:11.450 --> 00:54:14.030 known as secret-key cryptography. 00:54:14.030 --> 00:54:19.160 In secret-key cryptography, I have not just the plaintext, but some key, 00:54:19.160 --> 00:54:23.600 some secret piece of information that can be used in order to encrypt 00:54:23.600 --> 00:54:25.550 or decrypt information. 00:54:25.550 --> 00:54:29.600 And so I'll use both the key and the plaintext 00:54:29.600 --> 00:54:33.710 to generate what's known as the ciphertext, the encrypted version 00:54:33.710 --> 00:54:35.690 of the message I'm trying to send. 00:54:35.690 --> 00:54:39.080 And then, instead of sending the plaintext 00:54:39.080 --> 00:54:41.540 across the internet to the other person, I 00:54:41.540 --> 00:54:44.870 might instead want to just send the ciphertext across the internet 00:54:44.870 --> 00:54:48.050 to the other person so that I'm not sending the plain version 00:54:48.050 --> 00:54:49.700 of the message across the internet. 00:54:49.700 --> 00:54:51.560 So the ciphertext goes across. 00:54:51.560 --> 00:54:54.270 And the other person will also need the key. 
00:54:54.270 --> 00:54:57.835 Now, if the other person has both the ciphertext and the key, 00:54:57.835 --> 00:54:59.960 well, then using that information, the other person 00:54:59.960 --> 00:55:02.960 can use the key to decrypt the ciphertext 00:55:02.960 --> 00:55:05.800 and obtain the original plaintext. 00:55:05.800 --> 00:55:10.340 And this key is what we might call a symmetric key encryption and decryption 00:55:10.340 --> 00:55:10.840 key. 00:55:10.840 --> 00:55:13.820 You use the key in order to encrypt messages. 00:55:13.820 --> 00:55:17.600 And you use the same key in order to do the decryption process. 00:55:17.600 --> 00:55:21.050 And as long as both I and the person I'm communicating with both have access 00:55:21.050 --> 00:55:25.760 to that key, well, then we'll be able to encrypt messages and decrypt messages. 00:55:25.760 --> 00:55:28.610 And someone who just has the ciphertext but not the key 00:55:28.610 --> 00:55:33.160 likely won't be able to figure out what that original message was. 00:55:33.160 --> 00:55:36.370 But there's a problem here, especially in the context of the internet. 00:55:36.370 --> 00:55:41.500 And that is that both I and the other person need to have access to this key. 00:55:41.500 --> 00:55:45.320 The key is what I use to do the encryption and the decryption. 00:55:45.320 --> 00:55:48.978 And I can't just send the key across the internet to the other person 00:55:48.978 --> 00:55:51.520 because, if I do that, well, then someone in the middle who's 00:55:51.520 --> 00:55:54.130 intercepting all of my requests could intercept 00:55:54.130 --> 00:55:56.740 both the ciphertext and the key. 00:55:56.740 --> 00:56:00.670 And therefore, they would be able to decrypt the message because they 00:56:00.670 --> 00:56:03.260 have both the ciphertext and the key. 
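To see the "same key encrypts and decrypts" property in action, here is a toy example that combines each byte of the message with a byte of the key using XOR. This is only an illustration of the symmetric idea and is not a secure cipher; the function name xor_bytes and the sample message are made up for this sketch.

```python
import secrets


def xor_bytes(data, key):
    """XOR each byte of `data` with the key, repeating the key as needed.

    Applying this twice with the same key gives back the original data,
    which is why one function serves for both encryption and decryption.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Both parties must already share this key through some other, private channel.
shared_key = secrets.token_bytes(16)

plaintext = b"meet me at noon"
ciphertext = xor_bytes(plaintext, shared_key)   # what travels across the internet
recovered = xor_bytes(ciphertext, shared_key)   # what the other person computes
```

Anyone who intercepts both the ciphertext and shared_key can run the same last line, which is exactly the problem with sending the key across the internet.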
00:56:03.260 --> 00:56:07.090 Now, if I were able to go to another person in person and exchange 00:56:07.090 --> 00:56:10.390 this secret key in secret, well, then this scheme 00:56:10.390 --> 00:56:12.490 might work, because we both have the key. 00:56:12.490 --> 00:56:16.360 And I didn't share the key publicly with anyone who might intercept the message. 00:56:16.360 --> 00:56:18.970 Only I and the other person had the key. 00:56:18.970 --> 00:56:21.157 But in general, when communicating on the internet, 00:56:21.157 --> 00:56:22.990 you're not communicating with servers you've 00:56:22.990 --> 00:56:25.210 necessarily communicated with before. 00:56:25.210 --> 00:56:27.880 I might be trying to make a request to a new website. 00:56:27.880 --> 00:56:32.770 And we somehow still need to agree on a system where I can encrypt messages 00:56:32.770 --> 00:56:35.110 but only the other person on the other side 00:56:35.110 --> 00:56:38.990 is able to decrypt those messages instead. 00:56:38.990 --> 00:56:42.460 So this kind of cryptography-- probably not great 00:56:42.460 --> 00:56:47.300 for trying to initially try and create a secure connection on the internet. 00:56:47.300 --> 00:56:49.810 And for that reason, a major advancement in cryptography 00:56:49.810 --> 00:56:54.970 that allows for the internet to work is this notion of public-key cryptography. 00:56:54.970 --> 00:56:56.890 In secret-key cryptography, it's important 00:56:56.890 --> 00:57:00.280 that the key is secret because, if the key were known by everyone, well, 00:57:00.280 --> 00:57:03.040 then anyone would be able to decrypt messages. 00:57:03.040 --> 00:57:06.730 In public-key cryptography, we're able to create a secure encryption 00:57:06.730 --> 00:57:09.790 system where the key is allowed to be public, 00:57:09.790 --> 00:57:11.980 or one of the keys, as we'll soon see. 
00:57:11.980 --> 00:57:16.030 And the idea here is that we're using two keys instead of just one-- 00:57:16.030 --> 00:57:20.072 that we have both a public key and what's known as a private key. 00:57:20.072 --> 00:57:22.030 The private key-- your private key is something 00:57:22.030 --> 00:57:25.840 you should not share with other people to keep the encryption scheme secure. 00:57:25.840 --> 00:57:30.340 But the public key is one that is OK to share with other people. 00:57:30.340 --> 00:57:34.150 And the distinction between the two is that the public key will be 00:57:34.150 --> 00:57:36.640 used in order to encrypt information. 00:57:36.640 --> 00:57:40.090 And the private key will be used to decrypt information 00:57:40.090 --> 00:57:41.870 that was encrypted by the public. 00:57:41.870 --> 00:57:44.620 And the public key and the private key are mathematically related. 00:57:44.620 --> 00:57:47.287 And there are a couple of ways that we might imagine doing that. 00:57:47.287 --> 00:57:51.160 But the idea now is that, if I want to communicate with another person, 00:57:51.160 --> 00:57:54.100 that person sends me their public key. 00:57:54.100 --> 00:57:56.890 And it's OK for the public key to travel across the internet. 00:57:56.890 --> 00:58:01.000 Anyone is allowed to see the public key because the public key is only 00:58:01.000 --> 00:58:03.610 used for encrypting that data. 00:58:03.610 --> 00:58:06.610 So I can then take the plaintext and the public key 00:58:06.610 --> 00:58:11.350 and use that to generate the ciphertext, the encrypted version of the message 00:58:11.350 --> 00:58:13.930 that I am trying to send across the internet. 00:58:13.930 --> 00:58:16.960 And then I send the ciphertext to the other person 00:58:16.960 --> 00:58:18.640 with whom I'm trying to communicate. 
00:58:18.640 --> 00:58:24.080 And the other person now, using the ciphertext, then uses the private key-- 00:58:24.080 --> 00:58:26.800 the private key that they did not share, and the private key 00:58:26.800 --> 00:58:29.710 that has the ability to decrypt information that 00:58:29.710 --> 00:58:32.600 was encrypted using the public key. 00:58:32.600 --> 00:58:35.800 So using a combination of the ciphertext and the private key, 00:58:35.800 --> 00:58:38.830 the person I'm communicating with can decrypt that information 00:58:38.830 --> 00:58:43.070 and get back whatever the original plaintext of that information 00:58:43.070 --> 00:58:44.360 happened to be. 00:58:44.360 --> 00:58:46.630 And so this, then, is how we can do a lot 00:58:46.630 --> 00:58:48.430 of this communication on the internet. 00:58:48.430 --> 00:58:50.830 By using this public-private key pair, we 00:58:50.830 --> 00:58:53.560 can say, use the public key to do the encrypting, 00:58:53.560 --> 00:58:55.690 use the private key to do the decrypting. 00:58:55.690 --> 00:58:58.690 And now two computers that have never interacted with each other 00:58:58.690 --> 00:59:00.970 before, without having the opportunity to meet, 00:59:00.970 --> 00:59:04.630 to exchange some secret information, can use a technique like this 00:59:04.630 --> 00:59:07.060 in order to securely communicate with each other-- 00:59:07.060 --> 00:59:10.300 to send a message back and forth without anyone in the middle 00:59:10.300 --> 00:59:15.140 being able to intercept the message and identify what the message is about. 00:59:15.140 --> 00:59:18.310 And once you have this ability, the ability to communicate with another 00:59:18.310 --> 00:59:21.730 secretly, well, then you can imagine agreeing on some secret key 00:59:21.730 --> 00:59:25.780 and then using secret-key encryption to be able to encrypt and decrypt messages 00:59:25.780 --> 00:59:26.470 as well. 
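A rough sketch of the public-key and private-key relationship described above, in Python. It assumes textbook RSA with tiny primes, which is one possible scheme and is not named in the lecture; the numbers and the function names `encrypt` and `decrypt` are illustrative. Real systems use much larger keys plus padding, normally through a vetted library.

```python
# Toy key generation with tiny primes (real keys are vastly larger).
p, q = 61, 53
n = p * q                  # modulus, shared by both keys
phi = (p - 1) * (q - 1)    # used only while generating the keys
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)        # safe to send across the internet
private_key = (d, n)       # never shared


def encrypt(message, key):
    """Encrypt an integer smaller than the modulus with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Recover the original integer with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
```

Someone who sees only `public_key` and `ciphertext` would have to factor `n` to find `d`, which is impractical at real key sizes.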
00:59:26.470 --> 00:59:28.262 And so that's an approach that you can also 00:59:28.262 --> 00:59:31.460 take when trying to communicate with other people across the internet. 00:59:31.460 --> 00:59:34.950 But this idea of encryption is what allows for HTTPS, 00:59:34.950 --> 00:59:39.190 the secure version of the HTTP protocol, to actually work to make sure that-- 00:59:39.190 --> 00:59:42.690 when you are communicating with your bank's website, for example-- 00:59:42.690 --> 00:59:46.300 that someone along the way won't be able to intercept that information 00:59:46.300 --> 00:59:48.770 and identify what it is that you're communicating about 00:59:48.770 --> 00:59:51.090 and, instead, only has the encrypted version 00:59:51.090 --> 00:59:55.720 of the information and a public key with which they can encrypt information, 00:59:55.720 --> 00:59:57.850 but not a private key that can ultimately 00:59:57.850 --> 01:00:02.150 be used in order to decrypt information as well. 01:00:02.150 --> 01:00:05.920 And so that then is how we might allow for this kind of secure communication 01:00:05.920 --> 01:00:09.010 on the internet and allow our web applications to be secure. 01:00:09.010 --> 01:00:12.130 But in addition to our web applications just listening for requests 01:00:12.130 --> 01:00:14.180 and then providing some sort of response, 01:00:14.180 --> 01:00:17.560 our web applications were also dealing with data. 01:00:17.560 --> 01:00:19.720 We introduced the idea of SQL data tables 01:00:19.720 --> 01:00:22.240 where we had tables of data with rows and columns 01:00:22.240 --> 01:00:23.950 that are representing information. 01:00:23.950 --> 01:00:26.980 And we've also created web applications in this course where 01:00:26.980 --> 01:00:28.900 we've had applications that have users. 01:00:28.900 --> 01:00:32.940 Users sign in with a user name and a password, for example. 
01:00:32.940 --> 01:00:35.450 And so how might we represent that information 01:00:35.450 --> 01:00:37.100 about users and their passwords? 01:00:37.100 --> 01:00:41.070 Well, one way would be just stored inside of a table like this. 01:00:41.070 --> 01:00:42.410 Here's a table of users. 01:00:42.410 --> 01:00:44.210 Every user has an ID. 01:00:44.210 --> 01:00:47.490 They have a user name, and they have a password. 01:00:47.490 --> 01:00:50.750 But this turns out to be an incredibly insecure way 01:00:50.750 --> 01:00:53.090 to store passwords-- to be storing passwords 01:00:53.090 --> 01:00:56.120 in what might be called plaintext, just to literally store 01:00:56.120 --> 01:00:58.040 the passwords inside of a database. 01:00:58.040 --> 01:01:01.910 And we should never do this in practice because of the security vulnerabilities 01:01:01.910 --> 01:01:03.090 associated with it. 01:01:03.090 --> 01:01:06.680 If ever someone were to, unauthorized, get access to this database, 01:01:06.680 --> 01:01:10.140 they would be able to see all of the passwords for all of the users. 01:01:10.140 --> 01:01:13.010 So if this database ever leaked for whatever reason, suddenly 01:01:13.010 --> 01:01:14.852 all of these passwords are now known. 01:01:14.852 --> 01:01:16.310 And this kind of thing does happen. 01:01:16.310 --> 01:01:19.460 If companies are not careful about how they represent user names 01:01:19.460 --> 01:01:22.380 and passwords inside of their databases, and if ever there's 01:01:22.380 --> 01:01:27.040 some sort of database leak, suddenly a whole bunch of passwords 01:01:27.040 --> 01:01:29.008 could potentially be compromised. 
01:01:29.008 --> 01:01:31.300 And it's for that reason that the recommended approach, 01:01:31.300 --> 01:01:34.060 rather than store an actual password, is to store 01:01:34.060 --> 01:01:38.740 a hashed version of the same password using a hash function where 01:01:38.740 --> 01:01:41.680 a hash function, in this context, is some function that 01:01:41.680 --> 01:01:46.630 takes a password of input and outputs some hash-- 01:01:46.630 --> 01:01:49.540 some sequence of characters and numbers, in this case-- 01:01:49.540 --> 01:01:51.850 that represents that particular password, 01:01:51.850 --> 01:01:53.650 a hashed version of the password. 01:01:53.650 --> 01:01:55.870 But the important thing about this hash function 01:01:55.870 --> 01:01:58.120 is that it's a one-way hash function. 01:01:58.120 --> 01:02:01.750 From the password, you can get to the sequence of letters and numbers. 01:02:01.750 --> 01:02:04.480 But it is very, very difficult to go the other way around 01:02:04.480 --> 01:02:09.490 to use this information to figure out what the original password actually 01:02:09.490 --> 01:02:10.240 was. 01:02:10.240 --> 01:02:12.940 And so what this means is that the companies won't actually 01:02:12.940 --> 01:02:18.550 know what any particular user's password is when a user tries to log in. 01:02:18.550 --> 01:02:21.760 What we'll do is take their password that they're trying to log in with. 01:02:21.760 --> 01:02:25.090 We'll hash it and compare that hash against the hash 01:02:25.090 --> 01:02:27.580 that we've stored in the database. 01:02:27.580 --> 01:02:31.030 If the hashes match up, that means the user probably typed in their password 01:02:31.030 --> 01:02:33.130 correctly and, therefore, we can sign the user in. 01:02:33.130 --> 01:02:35.830 And otherwise, that's a sign that the user did not 01:02:35.830 --> 01:02:38.270 type their password in correctly. 
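The hash-and-compare login check described above can be sketched with Python's standard library. This assumes PBKDF2 with a per-user random salt; the salt, the iteration count, and the function names are assumptions added for the example and are not stated in the lecture.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(attempt, salt, stored_digest):
    """Hash the login attempt the same way and compare the two hashes."""
    _, attempt_digest = hash_password(attempt, salt)
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(attempt_digest, stored_digest)


salt, stored = hash_password("12345")
```

With this in place, `verify_password("12345", salt, stored)` succeeds while any other attempt fails, and the database row holds nothing from which the password can be read back directly.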
01:02:38.270 --> 01:02:40.330 So this, then, is the reason why companies-- 01:02:40.330 --> 01:02:42.670 if they're obeying these best practices-- usually 01:02:42.670 --> 01:02:44.740 can't tell you what your password actually 01:02:44.740 --> 01:02:46.810 is if you forget your password. 01:02:46.810 --> 01:02:49.930 If you forget your password, the company will let you reset your password. 01:02:49.930 --> 01:02:52.242 They can update the data inside of the table. 01:02:52.242 --> 01:02:53.950 But the company won't be able to tell you 01:02:53.950 --> 01:02:57.760 what your password actually is because the company doesn't know your password. 01:02:57.760 --> 01:03:00.460 The company only knows some hashed version 01:03:00.460 --> 01:03:04.970 of the password, some result of passing that password through a hash function. 01:03:04.970 --> 01:03:07.870 And as a result, they're able to know whether you 01:03:07.870 --> 01:03:10.600 logged in successfully or not with the correct credentials 01:03:10.600 --> 01:03:14.000 without actually knowing what your password actually is. 01:03:14.000 --> 01:03:15.940 And so this is another area where you might 01:03:15.940 --> 01:03:19.270 imagine that, if you're not careful about how you're storing this data, 01:03:19.270 --> 01:03:22.360 it could be a security vulnerability inside of your program 01:03:22.360 --> 01:03:26.220 where, if ever that data is leaked, passwords suddenly become known. 01:03:26.220 --> 01:03:29.890 And there are other more subtle ways that web applications could potentially 01:03:29.890 --> 01:03:32.410 leak information that you, as the web developer, 01:03:32.410 --> 01:03:34.330 need to decide if you're OK with or not. 01:03:34.330 --> 01:03:37.570 Imagine a website, for example, where you do have a place where you can say, 01:03:37.570 --> 01:03:39.700 if you forgot your password, you can be sent 01:03:39.700 --> 01:03:43.173 to a place where you can reset your password, for example. 
01:03:43.173 --> 01:03:46.090 You might imagine that, if you type in your email address, click Reset 01:03:46.090 --> 01:03:49.270 Password, you might get a message like, all right, password reset email 01:03:49.270 --> 01:03:50.530 has been sent. 01:03:50.530 --> 01:03:54.070 But you might imagine typing in an email address and getting something like, 01:03:54.070 --> 01:03:57.400 error, there is no user with that email address. 01:03:57.400 --> 01:04:00.250 And here, again, is a potential security vulnerability 01:04:00.250 --> 01:04:02.320 in terms of leaked information. 01:04:02.320 --> 01:04:06.340 This page that just seems to send you an email if you forgot your password is 01:04:06.340 --> 01:04:10.720 now leaking information about which users happened to have accounts 01:04:10.720 --> 01:04:14.140 on your website and which users do not because all someone needs to do 01:04:14.140 --> 01:04:18.100 is type in an email address and find out whether it results in an error or not 01:04:18.100 --> 01:04:22.310 in order to know whether a user happens to have an account on the website 01:04:22.310 --> 01:04:22.810 or not. 01:04:22.810 --> 01:04:24.685 And maybe that's not a big deal if that's not 01:04:24.685 --> 01:04:26.170 something you care about securing. 01:04:26.170 --> 01:04:30.160 But if it's a website where you do care about making sure 01:04:30.160 --> 01:04:32.650 that, if someone has an account or doesn't have an account, 01:04:32.650 --> 01:04:35.350 that information is kept private and secure only to the user, 01:04:35.350 --> 01:04:37.630 unless they want to share it, well, then this type 01:04:37.630 --> 01:04:40.570 of page, this type of interface with the database 01:04:40.570 --> 01:04:43.570 could potentially be leaking that kind of information. 01:04:43.570 --> 01:04:46.120 And information can be leaked in all sorts of different ways. 
01:04:46.120 --> 01:04:48.700 You can even leak information just based on the time 01:04:48.700 --> 01:04:52.780 it takes for the database to be able to respond to a particular request. 01:04:52.780 --> 01:04:55.450 You might imagine, if you make a request about a user, 01:04:55.450 --> 01:04:58.180 and it takes longer to respond, that might tell you 01:04:58.180 --> 01:05:01.150 something about the number of database queries it needs to run 01:05:01.150 --> 01:05:04.210 or the amount of information that's stored about that user as opposed 01:05:04.210 --> 01:05:06.200 to if a request takes less time. 01:05:06.200 --> 01:05:09.850 So even something like how many milliseconds it takes for a web server 01:05:09.850 --> 01:05:13.780 to respond to a request can reveal or leak information 01:05:13.780 --> 01:05:16.720 about the data that is stored inside of the database. 01:05:16.720 --> 01:05:19.750 And there have been examples of researchers who actually try and see 01:05:19.750 --> 01:05:23.702 what information they can get just from looking at these kinds of information. 01:05:23.702 --> 01:05:25.660 It doesn't seem like it would leak information, 01:05:25.660 --> 01:05:29.580 but it might actually reveal information as well. 01:05:29.580 --> 01:05:32.740 Now, another concern when dealing with SQL and databases we've talked about 01:05:32.740 --> 01:05:34.707 is the context of SQL injection-- 01:05:34.707 --> 01:05:36.790 this threat where, if you're not careful about how 01:05:36.790 --> 01:05:40.090 it is that you run your SQL code, you could inadvertently 01:05:40.090 --> 01:05:43.390 end up executing code that you don't mean to be executing. 01:05:43.390 --> 01:05:46.390 Situations like here-- we're in a username and password field. 
01:05:46.390 --> 01:05:48.010 We've seen this example before-- 01:05:48.010 --> 01:05:50.620 where, if a user tries to log in, you might imagine a query 01:05:50.620 --> 01:05:53.200 like this is run selecting from the users table 01:05:53.200 --> 01:05:57.190 where username equals whatever was typed in as the username and password 01:05:57.190 --> 01:05:59.800 equals whatever was typed in as the password. 01:05:59.800 --> 01:06:04.200 And we saw how, for a normal user-- someone who types in harry and 12345 01:06:04.200 --> 01:06:06.970 as their username and password-- 01:06:06.970 --> 01:06:09.380 that this type of query works just fine. 01:06:09.380 --> 01:06:11.890 But if a hacker tries to log into a website 01:06:11.890 --> 01:06:15.520 and maybe includes a double quotation mark and two hyphens, 01:06:15.520 --> 01:06:18.640 for example, where two hyphens mean a comment in SQL, 01:06:18.640 --> 01:06:22.760 and we were to literally substitute these values into our SQL queries, 01:06:22.760 --> 01:06:27.010 well, then you might end up substituting hacker, a quotation mark, and two hyphens, 01:06:27.010 --> 01:06:30.100 creating a comment that ignores the rest of this query, 01:06:30.100 --> 01:06:33.640 effectively ignoring any kind of password checking that we might 01:06:33.640 --> 01:06:35.560 want our web application to be doing. 01:06:35.560 --> 01:06:37.390 So this, too-- another vulnerability that 01:06:37.390 --> 01:06:40.570 comes about whenever we're dealing with executing 01:06:40.570 --> 01:06:42.520 SQL code inside of a database. 01:06:42.520 --> 01:06:44.860 And in order to deal with this, we want to make sure 01:06:44.860 --> 01:06:48.640 that we're escaping any of these potentially dangerous characters that 01:06:48.640 --> 01:06:50.710 might show up inside of our SQL queries. 01:06:50.710 --> 01:06:52.870 And Django's models do this for us.
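The injection just described, and one way to avoid it when running SQL directly, can be sketched with Python's built-in sqlite3 module. The table layout, the sample row, and the function names are assumptions for illustration. The sketch quotes values with single quotes, so the attack string uses a single quote where the lecture's example uses a double quotation mark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
# Plaintext password only to keep the sketch short; store a hash in practice.
conn.execute("INSERT INTO users (username, password) VALUES (?, ?)", ("hacker", "s3cret"))


def login_unsafe(username, password):
    """Vulnerable: user input is pasted straight into the SQL text."""
    query = (
        "SELECT * FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone() is not None


def login_safe(username, password):
    """Parameterized: the driver passes the values separately from the SQL."""
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None


# Closes the quote early, then comments out the password check.
attack_username = "hacker'--"
```

Calling `login_unsafe(attack_username, "anything")` logs in without the password, while `login_safe` treats the whole attack string as an ordinary username that matches no row.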
01:06:52.870 --> 01:06:56.980 When we do these kinds of queries using Django, saying .objects.filter 01:06:56.980 --> 01:07:00.580 to be able to filter out for only certain versions of a particular model, 01:07:00.580 --> 01:07:04.330 it is going to take care of the process of making sure that it's not subject 01:07:04.330 --> 01:07:06.770 to these kinds of SQL injection attacks. 01:07:06.770 --> 01:07:09.340 But if ever you're writing a web application that is directly 01:07:09.340 --> 01:07:12.070 executing SQL code, which you might imagine doing, 01:07:12.070 --> 01:07:14.080 you do want to be careful about making sure 01:07:14.080 --> 01:07:16.240 that you're not exposing the application to be 01:07:16.240 --> 01:07:20.070 vulnerable to these kinds of threats as well. 01:07:20.070 --> 01:07:21.920 So those, then, are potential threats that come 01:07:21.920 --> 01:07:24.935 about when we're just talking about what's happening on the server. 01:07:24.935 --> 01:07:26.810 But we also can think about what might happen 01:07:26.810 --> 01:07:28.700 when we're interacting with other servers-- 01:07:28.700 --> 01:07:31.380 when we're interacting with APIs, for example. 01:07:31.380 --> 01:07:33.770 So we talked about JavaScript and using JavaScript 01:07:33.770 --> 01:07:37.400 to be able to make additional requests to APIs or to other services that 01:07:37.400 --> 01:07:40.302 are able to return back with certain types of information. 01:07:40.302 --> 01:07:42.260 And with APIs, there are a number of techniques 01:07:42.260 --> 01:07:46.040 that we can use in APIs to allow them to be more scalable, to allow 01:07:46.040 --> 01:07:48.290 them to be more secure. 01:07:48.290 --> 01:07:50.780 One is this notion of rate limiting where 01:07:50.780 --> 01:07:52.940 we might want to make sure that no user is 01:07:52.940 --> 01:07:56.480 able to make more than a certain number of requests to an API 01:07:56.480 --> 01:07:59.000 in any particular amount of time.
01:07:59.000 --> 01:08:01.130 This is in response to a security threat that 01:08:01.130 --> 01:08:03.440 has to do with the scalability of a system, which 01:08:03.440 --> 01:08:06.560 is known as a DOS or Denial of Service Attack where, 01:08:06.560 --> 01:08:09.920 effectively, if you just make a whole bunch of requests to a single server 01:08:09.920 --> 01:08:13.543 over, and over, and over again, you could potentially shut down that system 01:08:13.543 --> 01:08:15.710 because you're making so many requests that it's not 01:08:15.710 --> 01:08:19.050 able to handle that many requests all at the same time. 01:08:19.050 --> 01:08:22.310 And for that reason, because it's so easy to make an API request-- 01:08:22.310 --> 01:08:27.170 you can do so using just a single line of Python or JavaScript, for example-- 01:08:27.170 --> 01:08:29.840 APIs will often institute some kind of rate 01:08:29.840 --> 01:08:32.960 limiting to limit the number of requests you can make so that you're not 01:08:32.960 --> 01:08:35.630 going to overwhelm the server or overwhelm the database that 01:08:35.630 --> 01:08:39.080 needs to be queried in order to respond to those requests. 01:08:39.080 --> 01:08:42.229 And so this kind of limiting might work as well. 01:08:42.229 --> 01:08:45.800 APIs might also want to add some kind of route authentication. 01:08:45.800 --> 01:08:49.527 You might not want everybody to access the same data via an API. 01:08:49.527 --> 01:08:51.319 Maybe there's some sort of permission model 01:08:51.319 --> 01:08:54.800 where only certain users are able to access certain pieces of data 01:08:54.800 --> 01:08:55.880 from the API. 
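The rate limiting idea described above can be sketched as a small in-memory limiter. This is one possible design (a sliding window per user), not something the lecture specifies; the class name, the `user_id` key, and the injectable `clock` parameter are assumptions for the example.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per user within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                   # injectable so it can be tested
        self.history = defaultdict(deque)    # user_id -> recent request times

    def allow(self, user_id):
        now = self.clock()
        timestamps = self.history[user_id]
        # Forget requests that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False                     # over the limit, reject
        timestamps.append(now)
        return True
```

An API handler would call `allow()` with the caller's identity (for example, an API key) before doing any work and return an error such as HTTP 429 when it returns `False`. A deployment with several servers would need shared storage rather than this per-process dictionary.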
01:08:55.880 --> 01:09:00.290 So you might imagine that a user needs to have an API key, for example-- 01:09:00.290 --> 01:09:03.830 effectively, a password that they need to pass around anytime 01:09:03.830 --> 01:09:06.710 they're making an API request to your API 01:09:06.710 --> 01:09:09.140 and that allows you to then be able to look at that key 01:09:09.140 --> 01:09:12.390 and verify that they are who they say they are. 01:09:12.390 --> 01:09:16.010 Now, with those API keys comes other potential security vulnerabilities 01:09:16.010 --> 01:09:17.090 to be mindful of. 01:09:17.090 --> 01:09:21.290 One is that, just as you should never be putting passwords inside of your source 01:09:21.290 --> 01:09:23.899 code-- inside of your Git repository, for example-- 01:09:23.899 --> 01:09:27.290 you likewise generally shouldn't be putting your API keys 01:09:27.290 --> 01:09:31.700 inside of your web applications as well, inside of the source code of those web 01:09:31.700 --> 01:09:34.069 applications, because then anyone who has access 01:09:34.069 --> 01:09:36.020 to the source code for the web application 01:09:36.020 --> 01:09:38.960 can see what your API key is, could then use 01:09:38.960 --> 01:09:42.439 the API key to pretend to be you and, therefore, get access 01:09:42.439 --> 01:09:46.609 to potential API routes that they should not be able to access. 01:09:46.609 --> 01:09:50.930 One common solution to this is to use what are known as environment variables 01:09:50.930 --> 01:09:55.190 where, effectively, you in your program say that your API key is not 01:09:55.190 --> 01:09:59.220 going to be some predetermined string that is in the text of your program 01:09:59.220 --> 01:10:03.170 but instead is going to be drawn from the environment in which the program is 01:10:03.170 --> 01:10:04.040 being run. 
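Reading an API key from the environment, as described above, might look like the following in Python. The variable name `API_KEY` and the helper `get_api_key` are assumptions chosen for the example.

```python
import os


def get_api_key(name="API_KEY"):
    """Read the key from the process environment instead of the source code."""
    key = os.environ.get(name)
    if not key:
        # Fail loudly at startup rather than sending unauthenticated requests.
        raise RuntimeError(f"Set the {name} environment variable before starting the server")
    return key
```

On the server, the variable would be set outside the repository, for example with `export API_KEY=...` in the shell or through the hosting provider's configuration, so the key never appears in version control.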
01:10:04.040 --> 01:10:07.430 And then, on the server, when you're running the web application, 01:10:07.430 --> 01:10:11.000 you'll first make sure the server has all of those environment variables set 01:10:11.000 --> 01:10:16.400 correctly so that, rather than have the API key actually in the source 01:10:16.400 --> 01:10:20.570 code of the program, the API key is simply in the environment on the server 01:10:20.570 --> 01:10:22.340 where the web application is running. 01:10:22.340 --> 01:10:25.370 And the server can just draw that information from the environment 01:10:25.370 --> 01:10:29.720 so that it knows what the API key should be without the API key 01:10:29.720 --> 01:10:34.590 actually having to be inside of the web application source code itself. 01:10:34.590 --> 01:10:36.470 And so as we begin to deal with APIs, you 01:10:36.470 --> 01:10:40.070 might notice that many APIs will require you to have an API key. 01:10:40.070 --> 01:10:42.170 And often, it's for these sorts of reasons-- 01:10:42.170 --> 01:10:45.310 to make sure that we're able to authenticate users effectively 01:10:45.310 --> 01:10:48.560 and also to make sure that we're able to limit users to make sure that they're 01:10:48.560 --> 01:10:51.140 not making too many requests to the server 01:10:51.140 --> 01:10:54.170 or to the database at any particular time. 01:10:54.170 --> 01:10:57.440 But this, then, starts to get us into other potential vulnerabilities-- 01:10:57.440 --> 01:11:00.470 in particular, vulnerabilities concerning JavaScript. 01:11:00.470 --> 01:11:02.600 JavaScript, again, is a programming language 01:11:02.600 --> 01:11:05.840 that we use in order to write code that runs inside of our web browser-- 01:11:05.840 --> 01:11:08.730 a browser like Chrome, or Safari, or something like that. 01:11:08.730 --> 01:11:14.210 And as a result, JavaScript has a lot of power to manipulate things on the page. 01:11:14.210 --> 01:11:16.220 It can simulate the clicking of buttons. 
01:11:16.220 --> 01:11:20.120 It can change the content of what happens to be on any particular page. 01:11:20.120 --> 01:11:22.370 And as a result, there are many, many vulnerabilities 01:11:22.370 --> 01:11:26.750 that come about when it comes to thinking about JavaScript. 01:11:26.750 --> 01:11:30.750 And one such vulnerability is this notion of cross-site scripting-- 01:11:30.750 --> 01:11:33.380 that, in general, when on your web application, 01:11:33.380 --> 01:11:37.760 you only want JavaScript to run if you, yourself have written it. 01:11:37.760 --> 01:11:39.830 Cross-site scripting is a potential threat 01:11:39.830 --> 01:11:45.050 where someone else might be able to get JavaScript code to run on your website 01:11:45.050 --> 01:11:48.890 when it's JavaScript code that someone else wrote instead of you, yourself. 01:11:48.890 --> 01:11:51.710 And this is a potential vulnerability because, if someone else can 01:11:51.710 --> 01:11:55.280 write the JavaScript code, they can manipulate the contents of what 01:11:55.280 --> 01:11:56.830 happens to be on your website. 01:11:56.830 --> 01:11:59.300 They can potentially manipulate the user experience 01:11:59.300 --> 01:12:02.260 to get a result that is not, actually, desired. 01:12:02.260 --> 01:12:06.860 So let's go ahead and take a look at one example of cross-site scripting. 01:12:06.860 --> 01:12:09.770 All right, so I've prepared a web application in advance-- 01:12:09.770 --> 01:12:14.900 it's called security-- inside of which is a single Django app called XSS, 01:12:14.900 --> 01:12:16.590 for Cross-Site Scripting. 01:12:16.590 --> 01:12:19.670 And inside of here, we'll first take a look at the URLs. 01:12:19.670 --> 01:12:24.290 So there's a single URL that just allows us to provide any path. 01:12:24.290 --> 01:12:27.330 And then it's going to load the Index view. 01:12:27.330 --> 01:12:31.910 And on the Index view, we're going to display an HttpResponse.
01:12:31.910 --> 01:12:35.210 It says, here was the path that just happened to be requested. 01:12:35.210 --> 01:12:37.910 So you might imagine this is a simplified version of what 01:12:37.910 --> 01:12:41.240 you might see on other websites, for example, where websites might show you 01:12:41.240 --> 01:12:45.170 on any particular page what path you're on in order to get to that page, 01:12:45.170 --> 01:12:49.610 some indication of where you are inside of this web application. 01:12:49.610 --> 01:12:53.150 So I'll go ahead and cd into security and run the server-- 01:12:53.150 --> 01:12:57.640 python manage.py runserver. 01:12:57.640 --> 01:12:59.320 So I am now running the server. 01:12:59.320 --> 01:13:06.420 And now I'll go ahead and go into my web application, /hello, for example. 01:13:06.420 --> 01:13:09.570 And so what I see here is the requested path hello, 01:13:09.570 --> 01:13:11.230 which is what I would expect it to be. 01:13:11.230 --> 01:13:13.960 I can change it to something else, like hi. 01:13:13.960 --> 01:13:15.270 So here's requested path hi. 01:13:15.270 --> 01:13:17.760 Here's hi/2, for example. 01:13:17.760 --> 01:13:20.430 Whatever page I visit, it gives me a page 01:13:20.430 --> 01:13:23.190 that says, requested path, and then whatever 01:13:23.190 --> 01:13:25.770 path I happened to be visiting. 01:13:25.770 --> 01:13:29.520 But watch what happens if I try and visit this URL instead. 01:13:29.520 --> 01:13:39.600 I'm going to visit the URL /, then an opening script tag, alert('hi'), and then a closing script tag. 01:13:39.600 --> 01:13:40.650 So I run it. 01:13:40.650 --> 01:13:44.990 And suddenly, an alert shows up on my page that says, hi. 01:13:44.990 --> 01:13:45.850 And I press OK. 01:13:45.850 --> 01:13:47.790 And it says, all right, requested path. 01:13:47.790 --> 01:13:49.680 That alert was a JavaScript alert. 01:13:49.680 --> 01:13:53.250 It was JavaScript code running on my web application.
01:13:53.250 --> 01:13:56.940 But it was not code that was JavaScript code inside of my web application. 01:13:56.940 --> 01:14:00.150 It was someone else who wrote based on the URL 01:14:00.150 --> 01:14:03.780 to run particular JavaScript on my particular page. 01:14:03.780 --> 01:14:06.120 And so someone linked to my web application 01:14:06.120 --> 01:14:09.000 and passed in this script tag as part of the URL. 01:14:09.000 --> 01:14:12.840 Someone who clicked on that link might have been taken to my web application 01:14:12.840 --> 01:14:17.630 but ultimately had JavaScript run that was created by someone else. 01:14:17.630 --> 01:14:19.980 And that, ultimately, is potentially dangerous. 01:14:19.980 --> 01:14:22.440 It leaves open the possibility that someone else 01:14:22.440 --> 01:14:24.990 could run JavaScript code on my page. 01:14:24.990 --> 01:14:27.300 And it might not just be something like a script. 01:14:27.300 --> 01:14:29.940 You might imagine someone not just displaying an alert, 01:14:29.940 --> 01:14:33.720 but modifying something inside of the DOM-- changing the contents of the web 01:14:33.720 --> 01:14:36.960 page, making API requests, doing other types of tasks 01:14:36.960 --> 01:14:39.870 that you can do using JavaScript inside of a web browser 01:14:39.870 --> 01:14:44.580 that, ultimately, leave my page open to potential security vulnerabilities. 01:14:44.580 --> 01:14:47.580 And so these are cases where it's important to be mindful of when you're 01:14:47.580 --> 01:14:51.720 designing these pages, if ever there is a possibility that someone could inject 01:14:51.720 --> 01:14:54.630 their own JavaScript into your page somehow, 01:14:54.630 --> 01:14:57.780 you'll want to either detect that or escape it in some way. 01:14:57.780 --> 01:15:02.025 Or take other precautions to make sure that this kind of cross-site scripting 01:15:02.025 --> 01:15:03.150 isn't going to be possible. 
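Escaping, one of the precautions mentioned above, can be sketched with Python's standard `html.escape`. The two render functions below are hypothetical stand-ins for the demo's view, which echoes the requested path into the response; they are not the code shown in the lecture.

```python
from html import escape


def render_path_unsafe(path):
    """Echoes the path as-is, so any markup in it becomes part of the page."""
    return f"Requested Path: {path}"


def render_path_safe(path):
    """Escapes &, <, >, and quotes so the browser shows the text instead of running it."""
    return f"Requested Path: {escape(path)}"


attack = "<script>alert('hi')</script>"
unsafe_html = render_path_unsafe(attack)   # contains a live <script> tag
safe_html = render_path_safe(attack)       # contains only harmless entities
```

Django's template engine applies this kind of escaping to variables by default, so rendering the path through a template rather than building the response string by hand would also avoid the alert.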
01:15:03.150 --> 01:15:06.240 You might imagine that, in a messaging application-- for example, 01:15:06.240 --> 01:15:07.740 if you're messaging back and forth-- 01:15:07.740 --> 01:15:10.282 you don't want it to be the case that, if you message someone 01:15:10.282 --> 01:15:13.260 else some JavaScript code that, when they receive it, 01:15:13.260 --> 01:15:16.380 that code actually ends up running as some JavaScript that 01:15:16.380 --> 01:15:18.210 runs on that particular page. 01:15:18.210 --> 01:15:20.450 You want to be sure to escape that information so 01:15:20.450 --> 01:15:22.830 that they just see the text of the JavaScript code 01:15:22.830 --> 01:15:25.430 but that the code isn't actually executed. 01:15:25.430 --> 01:15:28.140 And this is a similar threat to that threat of SQL injection. 01:15:28.140 --> 01:15:30.480 It all comes back to the idea of not wanting 01:15:30.480 --> 01:15:33.120 to allow someone else to be able to inject 01:15:33.120 --> 01:15:35.280 their own code into your program. 01:15:35.280 --> 01:15:39.540 You don't want someone else to be able to inject SQL code into the queries you 01:15:39.540 --> 01:15:40.770 run on your database. 01:15:40.770 --> 01:15:44.640 And you don't want someone to be able to inject JavaScript code into your web 01:15:44.640 --> 01:15:49.850 page because that leaves open potential security vulnerabilities as well. 01:15:49.850 --> 01:15:51.882 One type of security vulnerability that Django 01:15:51.882 --> 01:15:54.590 is quite good at defending against is one that we've seen before, 01:15:54.590 --> 01:15:57.470 but we'll explore in more detail how it might work. 01:15:57.470 --> 01:16:00.530 And it's this idea of cross-site request forgery where 01:16:00.530 --> 01:16:05.270 you fake a request to a website when you didn't intend to actually make 01:16:05.270 --> 01:16:07.020 a request to that website. 
01:16:07.020 --> 01:16:10.830 So you might imagine that, if your bank, for example, 01:16:10.830 --> 01:16:12.982 had a URL that allowed you to transfer money 01:16:12.982 --> 01:16:14.690 from one person to another person-- we've 01:16:14.690 --> 01:16:16.430 talked about this idea a little bit. 01:16:16.430 --> 01:16:20.480 But imagine now how you could implement this if it really was just a URL. 01:16:20.480 --> 01:16:24.740 You could go to /transfer and say, as GET parameters, 01:16:24.740 --> 01:16:26.060 who am I transferring money to? 01:16:26.060 --> 01:16:27.950 And what is the amount that I'm transferring? 01:16:27.950 --> 01:16:32.120 Then someone else on some other website could, in the body of their page, 01:16:32.120 --> 01:16:35.270 just have a link where that link says, click here. 01:16:35.270 --> 01:16:37.460 And it links to yourbank.com, or whatever 01:16:37.460 --> 01:16:41.390 your bank is, transferring money to me in this amount. 01:16:41.390 --> 01:16:44.720 And if some user unknowingly just clicked on that link not knowing 01:16:44.720 --> 01:16:46.640 where it would take them, this website might 01:16:46.640 --> 01:16:49.640 be able to forge a request to the bank-- make 01:16:49.640 --> 01:16:52.070 it seem like the user had gone to the bank 01:16:52.070 --> 01:16:54.350 and tried to initiate some kind of transfer 01:16:54.350 --> 01:16:56.360 and, ultimately, tried to transfer money. 01:16:56.360 --> 01:16:59.330 And it doesn't even necessarily need to be in a link. 01:16:59.330 --> 01:17:03.230 How else might you get some new request to happen inside of the web browser? 01:17:03.230 --> 01:17:05.690 You might imagine-- though it might seem a bit strange-- 01:17:05.690 --> 01:17:08.450 to put this inside of an image. 01:17:08.450 --> 01:17:13.250 Image source, the source of the image, is this particular URL-- 01:17:13.250 --> 01:17:14.493 the bank's transfer page. 01:17:14.493 --> 01:17:16.160 Now, that doesn't really make any sense.
01:17:16.160 --> 01:17:17.840 The transfer page is not an image. 01:17:17.840 --> 01:17:19.340 But it doesn't matter. 01:17:19.340 --> 01:17:24.590 All an image tag is going to do is try to make a request to this source URL 01:17:24.590 --> 01:17:28.527 to get that image and then try to display it in the user's web browser. 01:17:28.527 --> 01:17:31.610 But the first part is what's important-- the fact that this source ends up 01:17:31.610 --> 01:17:33.650 being requested by the web browser. 01:17:33.650 --> 01:17:36.380 Without the user having to click on or do anything, 01:17:36.380 --> 01:17:40.850 they might try and request from yourbank.com/transfer this particular 01:17:40.850 --> 01:17:45.500 request, which might initiate some sort of bank transfer without the user even 01:17:45.500 --> 01:17:46.580 realizing it. 01:17:46.580 --> 01:17:49.160 And it's for that reason that we generally suggest that, 01:17:49.160 --> 01:17:54.560 anytime you're creating a website that is going to allow for the manipulation 01:17:54.560 --> 01:17:57.500 of some kind of state-- that allows for some change to happen, 01:17:57.500 --> 01:17:59.210 something like transferring money-- 01:17:59.210 --> 01:18:02.450 you don't want that to be a GET request, something that you could just 01:18:02.450 --> 01:18:06.515 load in an image or load by clicking on a link that takes you to another page. 01:18:06.515 --> 01:18:08.390 You don't want that to happen because then it 01:18:08.390 --> 01:18:12.350 makes it very easy for someone else to fake a request to your page 01:18:12.350 --> 01:18:16.790 by just creating an image or linking to, somehow, a website, 01:18:16.790 --> 01:18:20.005 transferring funds from one user to another.
01:18:20.005 --> 01:18:22.130 So a solution to this-- and we've talked about it-- 01:18:22.130 --> 01:18:24.920 is that, generally, we only want POST requests 01:18:24.920 --> 01:18:27.860 to be able to manipulate something inside of the database, 01:18:27.860 --> 01:18:32.330 to be able to actually initiate a transfer from one user to another user. 01:18:32.330 --> 01:18:35.210 But even then, this is not perfectly secure. 01:18:35.210 --> 01:18:38.660 You could still be tricked into submitting a POST request. 01:18:38.660 --> 01:18:42.320 Imagine an adversarial website that had a form like this-- 01:18:42.320 --> 01:18:47.120 a form whose action was yourbank.com/transfer and whose method was 01:18:47.120 --> 01:18:48.200 POST. 01:18:48.200 --> 01:18:52.370 And now here-- two input fields whose type is hidden, meaning you 01:18:52.370 --> 01:18:55.040 won't actually be able to see those input fields when 01:18:55.040 --> 01:18:56.420 the user is looking at the page. 01:18:56.420 --> 01:18:59.090 They'd only know about it if they inspected the source 01:18:59.090 --> 01:19:03.120 code of this particular HTML page. 01:19:03.120 --> 01:19:05.550 Here, there's a hidden input whose name is to, 01:19:05.550 --> 01:19:07.840 meaning the person I'd like to transfer money to. 01:19:07.840 --> 01:19:10.470 Here is the amount, the value that I would like to transfer. 01:19:10.470 --> 01:19:14.153 And all the user is going to see is a button that says, click here. 01:19:14.153 --> 01:19:17.320 They're not going to see either of the input fields, because they're hidden. 01:19:17.320 --> 01:19:19.740 But if they do click the Click Here button, well, then 01:19:19.740 --> 01:19:22.950 suddenly they're going to be submitting a POST request to the bank 01:19:22.950 --> 01:19:25.525 and initiating some transfer when they didn't intend to.
01:19:25.525 --> 01:19:28.650 Now, maybe this seems like, oh, it's not a big deal, because the user still 01:19:28.650 --> 01:19:29.850 needs to click a button. 01:19:29.850 --> 01:19:31.767 And the user shouldn't be clicking on a button 01:19:31.767 --> 01:19:33.990 if they don't know what the button is going to do. 01:19:33.990 --> 01:19:38.280 Well, for one, it's probably reasonable to imagine that an adversary might 01:19:38.280 --> 01:19:41.010 embed this button inside of a page where it looks totally 01:19:41.010 --> 01:19:42.820 safe to be able to click on a button. 01:19:42.820 --> 01:19:45.960 But moreover, the user doesn't even need to click on it in order 01:19:45.960 --> 01:19:47.010 to submit the form. 01:19:47.010 --> 01:19:49.170 We can just add a little bit of JavaScript. 01:19:49.170 --> 01:19:52.710 You might imagine that an adversary could do something like this. 01:19:52.710 --> 01:19:55.560 Add an onload attribute to the body that says, 01:19:55.560 --> 01:19:59.250 when the body of the page is done loading, go to document.forms-- 01:19:59.250 --> 01:20:01.680 meaning all of the forms for this web page. 01:20:01.680 --> 01:20:04.590 Get the first one, and submit it. 01:20:04.590 --> 01:20:06.320 Submit the form. 01:20:06.320 --> 01:20:09.450 And what that's going to do is, even without the user doing anything-- 01:20:09.450 --> 01:20:12.330 even without the user clicking on the Click Here button-- 01:20:12.330 --> 01:20:15.420 as soon as this page is loaded, this form is going to submit, 01:20:15.420 --> 01:20:19.050 submitting a POST request to the bank, and attempting to transfer funds 01:20:19.050 --> 01:20:21.120 from one user to another user. 01:20:21.120 --> 01:20:23.760 And so this is what we might call a cross-site request 01:20:23.760 --> 01:20:29.220 forgery where some adversarial website has forged a request to our website. 01:20:29.220 --> 01:20:32.870 And ideally, we wouldn't like for that to be able to happen.
01:20:32.870 --> 01:20:35.030 So how do we guard against this? 01:20:35.030 --> 01:20:39.780 Well, what Django allows us to do and a very common approach is to add a CSRF 01:20:39.780 --> 01:20:42.390 token-- a Cross-Site Request Forgery token-- 01:20:42.390 --> 01:20:46.320 that is going to be regenerated for every session 01:20:46.320 --> 01:20:48.740 such that, only if that token is present, 01:20:48.740 --> 01:20:51.610 will the transfer be able to go through. 01:20:51.610 --> 01:20:57.360 So on our website, we can include the CSRF token inside of this HTML form 01:20:57.360 --> 01:21:00.510 and, as a result, make sure that we're able to transfer money only 01:21:00.510 --> 01:21:02.650 when the CSRF token is present. 01:21:02.650 --> 01:21:05.220 But if some other website tries to forge a request, 01:21:05.220 --> 01:21:07.710 they won't know what the CSRF token should be 01:21:07.710 --> 01:21:09.840 because it changes for every session. 01:21:09.840 --> 01:21:14.730 And therefore, they won't be able to actually forge a request from one user 01:21:14.730 --> 01:21:16.510 to another. 01:21:16.510 --> 01:21:19.590 So all across the various different tools and technologies 01:21:19.590 --> 01:21:20.340 we've been using-- 01:21:20.340 --> 01:21:25.710 Python, HTTP, Django, HTML in terms of creating these web 01:21:25.710 --> 01:21:27.990 applications using JavaScript, and the APIs 01:21:27.990 --> 01:21:29.460 that we might be interacting with-- 01:21:29.460 --> 01:21:31.710 there are security considerations all throughout. 01:21:31.710 --> 01:21:33.623 We've only touched on a couple of them here. 
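To make the CSRF token idea described above concrete, here is a minimal sketch in Python. It is not Django's actual implementation. The in-memory `_session_tokens` store and the function names `issue_csrf_token` and `is_valid_transfer_request` are hypothetical, chosen only for this example. The sketch generates one random token per session and accepts a transfer only when the request is a POST that carries that session's token.

```python
import hmac
import secrets

# Hypothetical in-memory session store: session id -> CSRF token.
# A real application would keep this in its session backend.
_session_tokens = {}


def issue_csrf_token(session_id):
    """Create a fresh random token for this session and remember it."""
    token = secrets.token_urlsafe(32)
    _session_tokens[session_id] = token
    return token


def is_valid_transfer_request(method, session_id, submitted_token):
    """Accept only POST requests that carry the session's own token."""
    # State-changing actions should never be reachable through GET.
    if method != "POST":
        return False
    expected = _session_tokens.get(session_id)
    if expected is None or submitted_token is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


# The real site embeds the token in its own form, so its request passes.
token = issue_csrf_token("session-abc")
print(is_valid_transfer_request("POST", "session-abc", token))
# A forged request has to guess the token, so it is rejected.
print(is_valid_transfer_request("POST", "session-abc", "guessed-token"))
```

In Django itself, the usual way to get this protection is to place the `{% csrf_token %}` template tag inside the form and leave the CSRF middleware enabled, rather than writing a check like this by hand.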
01:21:33.623 --> 01:21:36.540 But it just goes to show how it's important to be mindful as you think 01:21:36.540 --> 01:21:39.790 about the practice of web programming, thinking about what you're going to add 01:21:39.790 --> 01:21:42.960 to your web applications and what features your web application supports, 01:21:42.960 --> 01:21:46.260 to think about what the potential vulnerabilities there are as well-- 01:21:46.260 --> 01:21:49.920 how someone might exploit your web application in order to do something 01:21:49.920 --> 01:21:51.690 with it that they probably shouldn't. 01:21:51.690 --> 01:21:54.450 And as you take your web applications from applications 01:21:54.450 --> 01:21:57.015 that are just running on your own local computer 01:21:57.015 --> 01:21:59.940 to applications that are running in some web server 01:21:59.940 --> 01:22:02.130 that many people are starting to use, these 01:22:02.130 --> 01:22:04.420 are the types of questions to start to be asking. 01:22:04.420 --> 01:22:07.740 How can you make sure that your web application is scalable? 01:22:07.740 --> 01:22:11.740 How can you make sure that your web application is secure? 01:22:11.740 --> 01:22:15.392 So now that we've explored that-- a lot of web programming-- what comes next? 01:22:15.392 --> 01:22:17.850 In this course, we've explored a number of different tools, 01:22:17.850 --> 01:22:19.470 and technologies, and languages. 01:22:19.470 --> 01:22:21.540 But there are many other web frameworks and ways 01:22:21.540 --> 01:22:23.850 you can build web applications as well. 01:22:23.850 --> 01:22:26.220 We spent most of our time looking at the Django web 01:22:26.220 --> 01:22:27.580 framework, written in Python. 01:22:27.580 --> 01:22:29.430 But you can use other programming languages 01:22:29.430 --> 01:22:31.560 to build web applications as well. 
01:22:31.560 --> 01:22:34.980 Express.js, for example, is a very popular JavaScript framework 01:22:34.980 --> 01:22:36.480 for building web applications. 01:22:36.480 --> 01:22:41.390 Ruby on Rails is a popular server-side web framework built using Ruby. 01:22:41.390 --> 01:22:43.020 And there are many others as well. 01:22:43.020 --> 01:22:44.730 And there are also client-side frameworks 01:22:44.730 --> 01:22:48.540 used primarily with JavaScript to be able to build user interfaces. 01:22:48.540 --> 01:22:51.750 We've seen a little bit of React to build dynamic and interactive user 01:22:51.750 --> 01:22:52.620 interfaces. 01:22:52.620 --> 01:22:56.490 Other popular client-side frameworks include AngularJS, and Vue.js, 01:22:56.490 --> 01:22:58.343 and a number of others as well. 01:22:58.343 --> 01:23:00.510 And then, once you've built these web applications-- 01:23:00.510 --> 01:23:03.600 using any of these server-side frameworks and client-side frameworks-- 01:23:03.600 --> 01:23:06.360 then you might imagine wanting to take these applications 01:23:06.360 --> 01:23:07.645 and deploy them to the web. 01:23:07.645 --> 01:23:10.020 And to do that, there are a number of ways we can do this 01:23:10.020 --> 01:23:13.950 as well-- a number of different services including Amazon Web Services, AWS, 01:23:13.950 --> 01:23:17.730 Google Cloud, and Microsoft Azure that can be used in order to deploy 01:23:17.730 --> 01:23:19.530 these web applications. 01:23:19.530 --> 01:23:22.320 Heroku is a service that uses AWS and tries 01:23:22.320 --> 01:23:26.100 to simplify the process of making it easier to deploy your web applications. 01:23:26.100 --> 01:23:29.340 And if your web application is really just static-- it's just HTML, 01:23:29.340 --> 01:23:33.300 and CSS, and JavaScript-- well, then you can use something like GitHub Pages 01:23:33.300 --> 01:23:37.945 to be able to host a web application for free on GitHub's own servers instead.
01:23:37.945 --> 01:23:41.070 And there are many other ways you can imagine deploying web applications as 01:23:41.070 --> 01:23:43.395 well-- different services that you can use in order 01:23:43.395 --> 01:23:46.020 to take the web applications that you have been building or web 01:23:46.020 --> 01:23:47.940 applications you might build in the future 01:23:47.940 --> 01:23:52.870 and make them available on the internet for others to be able to use as well. 01:23:52.870 --> 01:23:56.550 So as we look back on the various topics within web programming we've explored, 01:23:56.550 --> 01:23:58.690 we've seen a lot of tools and technologies 01:23:58.690 --> 01:24:02.760 we can use that we can leverage in order to build interesting web applications. 01:24:02.760 --> 01:24:06.930 We started by taking a closer look at HTML and CSS, 01:24:06.930 --> 01:24:10.080 diving into how we can use that to describe the structure of our page, 01:24:10.080 --> 01:24:12.210 and then taking advantage of tools like Sass 01:24:12.210 --> 01:24:15.570 that allow us to generate CSS that allows for much more 01:24:15.570 --> 01:24:18.270 complex styling for our website that would have been much more 01:24:18.270 --> 01:24:21.090 difficult to do with just CSS alone. 01:24:21.090 --> 01:24:24.240 As we started to build larger web applications, we took a look at Git-- 01:24:24.240 --> 01:24:26.610 version control tools that we can use in order 01:24:26.610 --> 01:24:29.370 to make sure that we keep track of versions and changes we 01:24:29.370 --> 01:24:33.240 make to our code, allowing multiple people to collaborate on a project 01:24:33.240 --> 01:24:34.547 simultaneously. 01:24:34.547 --> 01:24:37.380 We then took a look at Python, looking at various different features 01:24:37.380 --> 01:24:40.697 that the language offered-- functions, and conditions, and loops, 01:24:40.697 --> 01:24:42.780 as we've seen in many other programming languages.
01:24:42.780 --> 01:24:45.210 But also object-oriented programming-- the ability 01:24:45.210 --> 01:24:47.700 to represent objects, and methods, and functions 01:24:47.700 --> 01:24:49.950 that operate on those particular objects, which 01:24:49.950 --> 01:24:53.940 prove especially powerful in the context of dealing with data inside of our web 01:24:53.940 --> 01:24:55.380 applications. 01:24:55.380 --> 01:24:58.500 Django was the example of a web framework written in Python 01:24:58.500 --> 01:25:00.510 that we used to very quickly be able to start up 01:25:00.510 --> 01:25:04.500 a web application, that's able to listen for requests, and make responses. 01:25:04.500 --> 01:25:06.600 Django has a whole lot of features built in that 01:25:06.600 --> 01:25:10.072 really make it easy to get started with building a web application. 01:25:10.072 --> 01:25:12.030 And in particular, it makes it easy for writing 01:25:12.030 --> 01:25:14.260 web applications that deal with data. 01:25:14.260 --> 01:25:16.860 So Django allows us the ability to build models 01:25:16.860 --> 01:25:20.760 that interact with SQL without us having to actually write any SQL code. 01:25:20.760 --> 01:25:25.320 Django can generate the SQL for us just using these models and migrations that 01:25:25.320 --> 01:25:29.020 allow us to continually apply changes that we make to our database. 01:25:29.020 --> 01:25:33.330 As we add new tables, add and modify existing fields on those tables, 01:25:33.330 --> 01:25:36.065 Django can take care of all of that. 01:25:36.065 --> 01:25:38.190 After that, as you'll recall, we took our attention 01:25:38.190 --> 01:25:40.440 towards the second of the main programming languages 01:25:40.440 --> 01:25:44.950 in the course, JavaScript, which has a lot of uses and is very, very popular. 
01:25:44.950 --> 01:25:46.920 But we primarily use it on the client side 01:25:46.920 --> 01:25:50.460 to be able to build interesting user interfaces-- using JavaScript 01:25:50.460 --> 01:25:52.680 to manipulate the DOM, the structure of the page, 01:25:52.680 --> 01:25:54.930 to change what it is the user sees. 01:25:54.930 --> 01:25:56.850 And also to add event handling-- so that when 01:25:56.850 --> 01:25:59.880 the user clicks on a button, when the user hovers over something, when 01:25:59.880 --> 01:26:02.550 the user interacts with the page in some sort of way, 01:26:02.550 --> 01:26:04.590 our code is able to respond to it. 01:26:04.590 --> 01:26:09.540 And we saw React, a client-side framework that uses JavaScript in order 01:26:09.540 --> 01:26:13.470 to allow us to create really interesting and interactive user interfaces 01:26:13.470 --> 01:26:15.893 with not all that much code at all. 01:26:15.893 --> 01:26:18.060 And then, finally, in these last couple of lectures, 01:26:18.060 --> 01:26:21.350 we've been looking at some best practices-- how we can design tests, 01:26:21.350 --> 01:26:23.520 tests that test the server, but also the client 01:26:23.520 --> 01:26:25.800 to make sure that our code is working appropriately, 01:26:25.800 --> 01:26:28.860 and also some industry practices like continuous integration 01:26:28.860 --> 01:26:31.140 and continuous delivery that just help to make sure 01:26:31.140 --> 01:26:34.740 that, as we make changes to our code, we're able to deploy and deliver them 01:26:34.740 --> 01:26:37.050 rapidly and effectively and make sure that we're 01:26:37.050 --> 01:26:39.630 able to make incremental changes to our code base 01:26:39.630 --> 01:26:42.460 rather than need to wait on longer release cycles.
01:26:42.460 --> 01:26:44.520 And then finally, today, we've been talking 01:26:44.520 --> 01:26:47.820 about issues about scalability and security, especially important 01:26:47.820 --> 01:26:50.880 as we begin to take our applications and move them to the web. 01:26:50.880 --> 01:26:53.562 We want to make sure that these applications are scalable, 01:26:53.562 --> 01:26:55.770 that they're able to handle multiple different users, 01:26:55.770 --> 01:26:57.720 and also to make sure that they're secure-- 01:26:57.720 --> 01:27:01.050 that we're not exposing ourselves to potential vulnerabilities like someone 01:27:01.050 --> 01:27:05.370 who might inject SQL or inject JavaScript code into our pages 01:27:05.370 --> 01:27:08.730 or who might try to access some data that they're not supposed to access. 01:27:08.730 --> 01:27:12.420 We want to make sure that, when we go about designing these web applications, 01:27:12.420 --> 01:27:17.330 we're able to do so in a scalable and, ultimately, in a secure way. 01:27:17.330 --> 01:27:19.080 So hopefully, you enjoyed this exploration 01:27:19.080 --> 01:27:21.747 into the world of web programming with Python and JavaScript. 01:27:21.747 --> 01:27:23.580 Best of luck with the web programs that you, 01:27:23.580 --> 01:27:26.130 yourself might build with the tools we've seen here today, 01:27:26.130 --> 01:27:29.310 and also other tools that are inspired by or use similar tools 01:27:29.310 --> 01:27:32.130 and techniques and ideas as the things that we've ultimately 01:27:32.130 --> 01:27:32.880 talked about here. 01:27:32.880 --> 01:27:35.672 A big thanks to the course's teaching staff and the production team 01:27:35.672 --> 01:27:37.255 for making this entire class possible. 01:27:37.255 --> 01:27:39.130 I look forward to seeing the web applications 01:27:39.130 --> 01:27:40.620 that you might go on to create. 01:27:40.620 --> 01:27:45.110 This was Web Programming with Python and JavaScript.