WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:03.500
[MUSIC PLAYING]

00:00:17.457 --> 00:00:19.290
BRIAN YU: All right,
welcome back, everyone,

00:00:19.290 --> 00:00:21.570
to Web Programming with
Python and JavaScript.

00:00:21.570 --> 00:00:25.750
And for our final topic, we're going
to explore scalability and security.

00:00:25.750 --> 00:00:28.470
So far in the class, we've
been building web applications.

00:00:28.470 --> 00:00:31.635
And we've been building web applications
that work on our own computer.

00:00:31.635 --> 00:00:33.510
But if we want to take
those web applications

00:00:33.510 --> 00:00:36.000
and deploy them to the world so
people all across the internet

00:00:36.000 --> 00:00:37.958
can begin to use them,
then we're going to need

00:00:37.958 --> 00:00:40.980
to host our web application
on some sort of web server--

00:00:40.980 --> 00:00:44.192
some dedicated piece of hardware
that is listening for web requests

00:00:44.192 --> 00:00:46.650
and responding to them with
the response that we would like

00:00:46.650 --> 00:00:48.660
for our web application to deliver.

00:00:48.660 --> 00:00:51.030
And when we do so, this
introduces a whole bunch

00:00:51.030 --> 00:00:54.338
of interesting issues surrounding
scalability and security.

00:00:54.338 --> 00:00:56.130
So we'll take a look
at these issues today,

00:00:56.130 --> 00:00:59.970
beginning with problems concerning
scalability-- what those problems are

00:00:59.970 --> 00:01:02.650
and how we might go
about addressing them.

00:01:02.650 --> 00:01:04.410
So when we deploy our
web applications, we

00:01:04.410 --> 00:01:06.720
deploy them by putting
them onto a web server

00:01:06.720 --> 00:01:08.970
that I'm, here, just
representing with this rectangle.

00:01:08.970 --> 00:01:12.840
But all the server is is some dedicated
computer, some piece of hardware that

00:01:12.840 --> 00:01:14.620
is listening for incoming requests.

00:01:14.620 --> 00:01:18.750
So we'll draw this line to represent
an incoming web request from a user.

00:01:18.750 --> 00:01:21.660
The server takes that
request and responds to it.

00:01:21.660 --> 00:01:23.880
But ultimately, our web
application isn't just

00:01:23.880 --> 00:01:25.530
going to be servicing one user.

00:01:25.530 --> 00:01:28.080
If it becomes popular,
it might have many users

00:01:28.080 --> 00:01:31.560
that are all trying to connect
to that server at the same time.

00:01:31.560 --> 00:01:34.790
And as multiple people start to connect
to that server at the same time,

00:01:34.790 --> 00:01:37.560
here is where we start to deal
with issues of scalability.

00:01:37.560 --> 00:01:41.040
A single computer or a single server
can only service so many users

00:01:41.040 --> 00:01:42.273
at any given time.

00:01:42.273 --> 00:01:44.190
And so, therefore, we
need to think in advance

00:01:44.190 --> 00:01:47.640
about how we're going to deal
with those issues of scale.

00:01:47.640 --> 00:01:49.920
But the first question,
before we even get there,

00:01:49.920 --> 00:01:52.320
is where these servers actually exist.

00:01:52.320 --> 00:01:56.010
And nowadays, there are two main options
for where these servers can exist.

00:01:56.010 --> 00:02:00.210
These servers can be on the
cloud or they can be on premise.

00:02:00.210 --> 00:02:02.400
And on-premise servers,
you might imagine

00:02:02.400 --> 00:02:05.160
is if a company is running
their own web application.

00:02:05.160 --> 00:02:08.340
On-premise servers are servers that
are inside of the company's walls.

00:02:08.340 --> 00:02:10.710
The company owns the
physical servers, maybe

00:02:10.710 --> 00:02:12.840
on some server racks inside of a room.

00:02:12.840 --> 00:02:14.970
And therefore, they
have very direct control

00:02:14.970 --> 00:02:17.940
over all of the servers-- exactly
what kind of servers they are,

00:02:17.940 --> 00:02:19.830
exactly what software
is running on them.

00:02:19.830 --> 00:02:23.280
They can go and physically look at the
servers and debug them, if need be,

00:02:23.280 --> 00:02:25.830
in order to make sure that
any issues are dealt with.

00:02:25.830 --> 00:02:28.170
But increasingly, we're
starting to move into a world

00:02:28.170 --> 00:02:31.170
where cloud computing is
becoming increasingly popular.

00:02:31.170 --> 00:02:35.190
In cloud computing, rather than have
dedicated servers that are on premise,

00:02:35.190 --> 00:02:37.290
we have servers that are
somewhere in the cloud

00:02:37.290 --> 00:02:40.950
where cloud computing companies
like Amazon, or Google, or Microsoft

00:02:40.950 --> 00:02:42.720
are able to run their own servers.

00:02:42.720 --> 00:02:46.860
And we simply use those servers that
are provided by those third parties,

00:02:46.860 --> 00:02:50.130
whether it's Amazon, or Google,
or Microsoft, or someone else.

00:02:50.130 --> 00:02:51.330
And there are trade-offs.

00:02:51.330 --> 00:02:54.950
With cloud computing, we no longer have
as direct control over the machines

00:02:54.950 --> 00:02:56.700
themselves because
they're not on premise.

00:02:56.700 --> 00:02:59.190
We can't physically
manipulate those computers.

00:02:59.190 --> 00:03:01.620
But we have the advantage
of not having to worry

00:03:01.620 --> 00:03:05.070
about dealing with physical
objects that are inside

00:03:05.070 --> 00:03:08.280
of the premise of the company whose
servers we'd like to run code for.

00:03:08.280 --> 00:03:10.770
When it's on the cloud,
everything is managed externally

00:03:10.770 --> 00:03:14.205
by some other company, and we can
simply use the servers that we need to.

00:03:14.205 --> 00:03:16.830
And we'll see that this lends
itself to other benefits as well.

00:03:16.830 --> 00:03:20.490
As we might need more servers, as we
start to get more sophisticated web

00:03:20.490 --> 00:03:24.120
applications that need more users,
these cloud-computing companies

00:03:24.120 --> 00:03:26.220
can allow us to create
web applications that

00:03:26.220 --> 00:03:29.280
are able to scale across
multiple different servers

00:03:29.280 --> 00:03:31.910
as we start to get more and more users.

00:03:31.910 --> 00:03:35.460
But we'll discuss those issues
of scale as we get to them.

00:03:35.460 --> 00:03:37.890
The question we need to ask
after we have these servers--

00:03:37.890 --> 00:03:40.348
whether they're servers that
are on premise or servers that

00:03:40.348 --> 00:03:42.240
are operating somewhere in the cloud--

00:03:42.240 --> 00:03:47.328
is, how many users can the server
actually service at any given time?

00:03:47.328 --> 00:03:48.370
And that's going to vary.

00:03:48.370 --> 00:03:51.300
It's going to vary based on the
size of the server, the computing

00:03:51.300 --> 00:03:52.470
power of the server.

00:03:52.470 --> 00:03:56.250
And it's going to be dependent
upon how long it takes to process

00:03:56.250 --> 00:03:58.110
any particular user's request.

00:03:58.110 --> 00:04:00.420
If user requests are
quite expensive, it might

00:04:00.420 --> 00:04:03.870
mean that there are fewer users that
can be serviced at any given time.

00:04:03.870 --> 00:04:05.880
And it's for that reason
that a helpful tool

00:04:05.880 --> 00:04:08.850
is to do some kind of benchmarking,
some process of trying

00:04:08.850 --> 00:04:12.630
to do some analysis on how many
users a server can actually

00:04:12.630 --> 00:04:14.730
be handling at any particular time.

00:04:14.730 --> 00:04:16.950
And there are numerous
different tools that allow

00:04:16.950 --> 00:04:18.779
us to do this kind of benchmarking.

00:04:18.779 --> 00:04:22.470
ApacheBench, otherwise
known as ab, is a popular tool

00:04:22.470 --> 00:04:24.250
for doing this kind of thing.

00:04:24.250 --> 00:04:28.290
But benchmarking is going to be useful
so that we know how many users one

00:04:28.290 --> 00:04:29.550
particular server can handle.

00:04:29.550 --> 00:04:31.290
Maybe it can handle 50 users.

00:04:31.290 --> 00:04:32.700
Maybe it can handle 100 users.

00:04:32.700 --> 00:04:35.160
Maybe it can handle
more at any given time.

00:04:35.160 --> 00:04:37.830
But ultimately, it's going
to be some finite limit.

00:04:37.830 --> 00:04:40.680
Every computer just has some
finite amount of resources,

00:04:40.680 --> 00:04:42.030
and servers are no exception.

00:04:42.030 --> 00:04:45.360
There's going to be some number of
users after which the server is not

00:04:45.360 --> 00:04:47.020
going to be able to handle it.

00:04:47.020 --> 00:04:48.850
So what do we do in that situation?

00:04:48.850 --> 00:04:53.130
What do we do if our server can only
handle 100 users at any given time,

00:04:53.130 --> 00:04:58.020
but 101 users are trying to use our
web application at the same time?

00:04:58.020 --> 00:04:59.440
Something needs to change.

00:04:59.440 --> 00:05:01.740
We need to deal with
some sort of scaling

00:05:01.740 --> 00:05:04.500
to make sure that our web
application can scale.

00:05:04.500 --> 00:05:07.770
And there are a couple of different
types of scaling that we can try.

00:05:07.770 --> 00:05:10.530
One approach is to do what's
called vertical scaling, which

00:05:10.530 --> 00:05:12.780
might be the simplest way
you could imagine scaling.

00:05:12.780 --> 00:05:15.900
If this server is not good enough
for handling the number of users

00:05:15.900 --> 00:05:18.890
that we need it to handle,
well, just get a bigger server.

00:05:18.890 --> 00:05:21.260
In vertical scaling,
we just take the server

00:05:21.260 --> 00:05:23.930
and get a bigger server,
a more powerful server,

00:05:23.930 --> 00:05:26.480
a server that can handle
more users at any given time.

00:05:26.480 --> 00:05:27.730
It's going to cost more.

00:05:27.730 --> 00:05:29.480
But if we need it to
handle more users, we

00:05:29.480 --> 00:05:33.110
can just get a bigger server to
be able to deal with that problem.

00:05:33.110 --> 00:05:34.607
This approach is fairly simple.

00:05:34.607 --> 00:05:37.190
It just involves swapping out
one server for another, one that

00:05:37.190 --> 00:05:39.410
can handle more users concurrently.

00:05:39.410 --> 00:05:40.830
But it also has drawbacks.

00:05:40.830 --> 00:05:44.330
There is some limit to how big the
server can be, to how many users

00:05:44.330 --> 00:05:47.390
any physical one server is going to
be able to handle because there's

00:05:47.390 --> 00:05:50.870
a physical limitation on what is
the biggest, fastest, most powerful

00:05:50.870 --> 00:05:53.310
server we could possibly get.

00:05:53.310 --> 00:05:55.970
So when vertical scaling
ends up not being enough,

00:05:55.970 --> 00:05:59.720
an alternative-- as you might imagine--
is what's known as horizontal scaling.

00:05:59.720 --> 00:06:01.970
And the idea behind
horizontal scaling is

00:06:01.970 --> 00:06:06.560
that, when one server isn't enough to
be able to service all of the users that

00:06:06.560 --> 00:06:10.070
might be trying to use a web
application at the same time, well,

00:06:10.070 --> 00:06:13.010
then we can take the approach of
saying, well, rather than just using

00:06:13.010 --> 00:06:17.840
one server, let's go ahead and split
it up into two different servers.

00:06:17.840 --> 00:06:21.420
We now have two servers that are
both running the web application.

00:06:21.420 --> 00:06:24.980
And now, effectively, we've been
able to double the number of users

00:06:24.980 --> 00:06:26.600
that this web application can handle.

00:06:26.600 --> 00:06:29.690
Rather than just a single server
that can service 100 users,

00:06:29.690 --> 00:06:33.200
if we have two of them, now we can
service 200 users at any given time

00:06:33.200 --> 00:06:37.670
if you imagine 100 of them using
server A over here and 100 of them

00:06:37.670 --> 00:06:40.460
using server B over there.

00:06:40.460 --> 00:06:44.220
But this then lends itself to some
other questions that we have to answer,

00:06:44.220 --> 00:06:47.630
which is, how do these servers get
their users in the first place?

00:06:47.630 --> 00:06:50.450
When a user requests a web
page, how does that user

00:06:50.450 --> 00:06:54.140
get directed either to
server A or to server B?

00:06:54.140 --> 00:06:57.980
It seems that they need some way to
make that decision in order to decide

00:06:57.980 --> 00:07:00.690
whether to go one direction or another.

00:07:00.690 --> 00:07:04.010
And it's for that reason that we might
introduce another piece of hardware

00:07:04.010 --> 00:07:05.240
into this picture.

00:07:05.240 --> 00:07:09.070
And that additional piece of hardware
is what we might call a load balancer.

00:07:09.070 --> 00:07:11.510
And a load balancer is just
another piece of hardware

00:07:11.510 --> 00:07:14.910
that is going to sit in front
of these servers, so to speak.

00:07:14.910 --> 00:07:17.660
In other words, when a user
makes a request to a web page,

00:07:17.660 --> 00:07:21.170
rather than immediately getting that
request to one of these web servers,

00:07:21.170 --> 00:07:25.250
the request is first going to
go through this load balancer

00:07:25.250 --> 00:07:27.800
where the request first
comes into the load balancer.

00:07:27.800 --> 00:07:31.160
And the load balancer then decides
whether to send that request to server

00:07:31.160 --> 00:07:35.330
A or to send that request to
server B. And this process

00:07:35.330 --> 00:07:38.300
is likely less expensive than
actually dealing with and processing

00:07:38.300 --> 00:07:39.330
that request.

00:07:39.330 --> 00:07:42.440
So the load balancer is effectively
just acting as a dispatcher.

00:07:42.440 --> 00:07:44.310
It waits for those requests to come in.

00:07:44.310 --> 00:07:46.670
And when the requests do
come in, the load balancer

00:07:46.670 --> 00:07:49.628
directs those requests either to
go to one server or to another.

00:07:49.628 --> 00:07:52.670
And you might imagine the story where
we have more than just two servers.

00:07:52.670 --> 00:07:54.260
Maybe we have many servers.

00:07:54.260 --> 00:07:56.660
And the load balancer
is just going to balance

00:07:56.660 --> 00:07:59.030
between all of those different servers.

00:07:59.030 --> 00:08:02.570
And this process of deciding
which server to send a request to

00:08:02.570 --> 00:08:05.840
is known as load balancing, which is
what the load balancer is ultimately

00:08:05.840 --> 00:08:06.618
doing.

00:08:06.618 --> 00:08:09.410
And there are various different
methods that you might use in order

00:08:09.410 --> 00:08:11.042
to perform this load balancing.

00:08:11.042 --> 00:08:13.250
So you might imagine thinking
about this intuitively.

00:08:13.250 --> 00:08:16.490
How would the load balancer
decide, given some request,

00:08:16.490 --> 00:08:19.220
should we send the request to
this router, to this server,

00:08:19.220 --> 00:08:22.910
or should we send the request
to some other server instead?

00:08:22.910 --> 00:08:26.120
And there are many different approaches
that our load balancer might take.

00:08:26.120 --> 00:08:27.440
And here are just a couple.

00:08:27.440 --> 00:08:30.230
Random choice might be
the simplest of options.

00:08:30.230 --> 00:08:34.480
Given a user that shows up and tries
to make a request to our web server,

00:08:34.480 --> 00:08:36.620
the load balancer first
takes a look at the user

00:08:36.620 --> 00:08:40.497
and just randomly assigns them to
one of the various different servers

00:08:40.497 --> 00:08:42.080
that might be processing that request.

00:08:42.080 --> 00:08:46.340
If there are 10 different servers, it
randomly chooses among those 10 servers

00:08:46.340 --> 00:08:50.030
to decide which of them is going
to be servicing that request.

00:08:50.030 --> 00:08:52.020
This has the advantage
of being very simple.

00:08:52.020 --> 00:08:53.300
It's just a quick calculation.

00:08:53.300 --> 00:08:56.330
The computers can pretty
readily generate random numbers.

00:08:56.330 --> 00:08:58.310
And based on that random
number, the computer

00:08:58.310 --> 00:09:02.720
can dispatch the user to one
server or to another server.

00:09:02.720 --> 00:09:06.620
But it might not be the best option
because, if we happen to get unlucky,

00:09:06.620 --> 00:09:10.190
we might end up with many more
users on one server than another.

00:09:10.190 --> 00:09:12.890
Or we might end up with
servers that are entirely

00:09:12.890 --> 00:09:15.230
unused if it just so
happens that we don't end up

00:09:15.230 --> 00:09:17.300
randomly selecting that server.

00:09:17.300 --> 00:09:20.780
Now, in practice with many users that
are all using this load balancer, all

00:09:20.780 --> 00:09:24.260
being dispatched, odds are high that
eventually all of them will be used.

00:09:24.260 --> 00:09:26.837
But it might not be a
totally even distribution.
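
NOTE
Editor's sketch, not part of the lecture: a minimal Python illustration of random-choice
load balancing as described above. The server names and the pick_random helper are
hypothetical, and the seeded generator is only there to make the simulation repeatable.
```python
import random
from collections import Counter
# Hypothetical backend names; any list of servers would do.
SERVERS = ["server-1", "server-2", "server-3"]
def pick_random(servers, rng=random):
    """Choose one backend uniformly at random for a single request."""
    return rng.choice(servers)
# Simulate 1,000 incoming requests and count where each one lands.
rng = random.Random(0)
counts = Counter(pick_random(SERVERS, rng) for _ in range(1000))
print(counts)  # totals are usually close to each other, but rarely identical
```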

00:09:26.837 --> 00:09:28.670
And so for that reason,
another approach you

00:09:28.670 --> 00:09:32.570
might take is a round-robin approach
where the approach is, instead,

00:09:32.570 --> 00:09:36.650
for the very first user, go ahead and
assign that user to server number one.

00:09:36.650 --> 00:09:38.840
For the next user, assign
them to server number two.

00:09:38.840 --> 00:09:40.760
And maybe, if there are
five servers, you say,

00:09:40.760 --> 00:09:44.150
the third user goes to server three,
user four goes to server four,

00:09:44.150 --> 00:09:47.420
user five goes to server
five, and then user six

00:09:47.420 --> 00:09:49.070
goes back to server number one.

00:09:49.070 --> 00:09:51.257
You basically rotate
going one through five.

00:09:51.257 --> 00:09:53.840
And then, once you've assigned
someone to each of the servers,

00:09:53.840 --> 00:09:55.760
you go back to the beginning.

00:09:55.760 --> 00:09:59.360
This is also a relatively easy thing to
implement because you can simply just

00:09:59.360 --> 00:10:01.520
keep count somewhere
in the load balancer

00:10:01.520 --> 00:10:04.730
saying, what was the most recent
server that I assigned a user to?

00:10:04.730 --> 00:10:07.550
And the next time a request
comes in, go ahead and assign it

00:10:07.550 --> 00:10:09.710
to the next server, and
the next server after that,

00:10:09.710 --> 00:10:12.220
effectively doing a
round-robin style approach

00:10:12.220 --> 00:10:16.040
where you go through all the servers
once before going through the servers

00:10:16.040 --> 00:10:17.140
again.
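
NOTE
Editor's sketch, not part of the lecture: one way the round-robin rotation could be
written in Python. The RoundRobinBalancer class and server names are hypothetical;
itertools.cycle plays the role of the counter that remembers the most recent server.
```python
from itertools import cycle
class RoundRobinBalancer:
    """Hand out servers in a fixed rotation: 1, 2, ..., n, then back to 1."""
    def __init__(self, servers):
        self._rotation = cycle(servers)  # keeps track of the position in the rotation
    def pick(self):
        return next(self._rotation)
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3", "server-4", "server-5"])
assigned = [balancer.pick() for _ in range(6)]
print(assigned[-1])  # the sixth user wraps around to server-1
```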

00:10:17.140 --> 00:10:19.750
Now, this might seem better
than random choice in the sense

00:10:19.750 --> 00:10:23.230
that it's going to more equitably
decide whether to assign

00:10:23.230 --> 00:10:26.710
any particular request
to any particular server.

00:10:26.710 --> 00:10:29.110
But it also suffers
from certain problems.

00:10:29.110 --> 00:10:31.510
Round robin might be
great, but if some requests

00:10:31.510 --> 00:10:34.975
take longer than other requests,
we might also get unlucky,

00:10:34.975 --> 00:10:36.850
and the requests that
are taking longer might

00:10:36.850 --> 00:10:40.160
end up all going to one of the
servers as opposed to another server.

00:10:40.160 --> 00:10:43.310
So there are other approaches that
we might want to go to as well--

00:10:43.310 --> 00:10:45.880
for example, something
like fewest connections

00:10:45.880 --> 00:10:50.430
where the approach there is to say, go
ahead, and when a user makes a request,

00:10:50.430 --> 00:10:53.050
the load balancer should
pick which of the servers

00:10:53.050 --> 00:10:57.370
currently has the fewest active
connections from other users

00:10:57.370 --> 00:11:01.060
and other requests that are currently
connected to those servers instead.

00:11:01.060 --> 00:11:04.120
And by choosing the server that
happens to have the fewest connections,

00:11:04.120 --> 00:11:07.330
you're probably going to do a
better job of trying to balance out

00:11:07.330 --> 00:11:09.340
between all of the
various different requests

00:11:09.340 --> 00:11:12.220
that might be happening inside
of your web application.
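
NOTE
Editor's sketch, not part of the lecture: a minimal Python version of fewest-connections
balancing. The class and method names are hypothetical, and the connection counts are
tracked in memory by the balancer itself, which is an assumption about how it learns them.
```python
class FewestConnectionsBalancer:
    """Send each new request to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}
    def acquire(self):
        # min() scans every server, which costs more than a random or rotating pick.
        server = min(self.active, key=self.active.get)  # ties go to the first listed
        self.active[server] += 1
        return server
    def release(self, server):
        # Called when a request finishes and its connection closes.
        self.active[server] -= 1
lb = FewestConnectionsBalancer(["server-1", "server-2"])
first = lb.acquire()   # both idle, so the first listed server is chosen
second = lb.acquire()  # server-2 now has fewer connections than server-1
lb.release(second)     # the request on server-2 finishes
third = lb.acquire()   # server-2 is chosen again because it is the least busy
```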

00:11:12.220 --> 00:11:15.220
And while this might do a better job,
there are trade-offs here as well.

00:11:15.220 --> 00:11:18.700
It might be more expensive, for
example, to compute which of the servers

00:11:18.700 --> 00:11:21.310
happens to have the fewest
number of connections,

00:11:21.310 --> 00:11:24.880
whereas it's much easier just to
say, choose a server at random

00:11:24.880 --> 00:11:29.740
or to do the round-robin style approach
of just 1, 2, 3, 4, 5, 1, 2, 3, 4, 5,

00:11:29.740 --> 00:11:32.590
again, and again, and again.

00:11:32.590 --> 00:11:36.410
But all of these approaches
naively have yet another problem,

00:11:36.410 --> 00:11:38.030
which has to do with sessions.

00:11:38.030 --> 00:11:40.150
And you'll recall that
sessions we used whenever

00:11:40.150 --> 00:11:44.110
we wanted to store information
about the user's current interaction

00:11:44.110 --> 00:11:45.220
with the web application.

00:11:45.220 --> 00:11:46.780
When you log into a website--

00:11:46.780 --> 00:11:50.300
you log into your email, or you
log into Amazon, for example--

00:11:50.300 --> 00:11:53.740
and then you come back to that website
or visit another page on that website--

00:11:53.740 --> 00:11:56.470
make another request, for example--

00:11:56.470 --> 00:11:59.800
it's not the case that you have to sign
in yet again, that the web browser has

00:11:59.800 --> 00:12:01.720
totally forgotten who you are.

00:12:01.720 --> 00:12:04.450
When I go back to my mail account,
or when I go back to Amazon

00:12:04.450 --> 00:12:08.205
for a second time, my mail account or
Amazon remembers me from the last time

00:12:08.205 --> 00:12:08.830
that I visited.

00:12:08.830 --> 00:12:13.060
I have some sort of session where it's
keeping track of who is logged in,

00:12:13.060 --> 00:12:15.670
maybe information about what
I've been doing on the page,

00:12:15.670 --> 00:12:18.790
and allows me to continue
interacting with the web application,

00:12:18.790 --> 00:12:21.880
even if I'm making multiple requests.

00:12:21.880 --> 00:12:24.310
And this, you might
imagine, could be a problem

00:12:24.310 --> 00:12:26.440
for this type of load balancing.

00:12:26.440 --> 00:12:31.630
If I have multiple different servers,
imagine if I try to log into a website.

00:12:31.630 --> 00:12:34.990
And the first time I make a request,
I'm directed to server number one.

00:12:34.990 --> 00:12:37.690
And I'm now logged in
on server number one.

00:12:37.690 --> 00:12:39.400
But then I make another request.

00:12:39.400 --> 00:12:41.162
I'm directed back to the load balancer.

00:12:41.162 --> 00:12:43.120
And maybe the load
balancer, this time, decides

00:12:43.120 --> 00:12:45.310
to send me to server number two.

00:12:45.310 --> 00:12:48.190
But if the session is stored in
server number one somewhere--

00:12:48.190 --> 00:12:51.010
server number one remembers
who I am and what I'm doing--

00:12:51.010 --> 00:12:54.282
then server number two is
not going to know who I am.

00:12:54.282 --> 00:12:56.740
And therefore, it's not going
to remember that I've already

00:12:56.740 --> 00:12:58.660
logged into this web application.

00:12:58.660 --> 00:13:01.710
And as a result, I might be
prompted to log in again.

00:13:01.710 --> 00:13:04.630
And if I go make another request,
and I end up on yet another server,

00:13:04.630 --> 00:13:07.580
I might be logged out again and
have to log in for a third time.

00:13:07.580 --> 00:13:11.590
So the problem comes about when
our load balancing happens,

00:13:11.590 --> 00:13:14.290
but we're not doing so
in a session-aware way--

00:13:14.290 --> 00:13:18.310
that our load balancer isn't caring
about when a user visits the page

00:13:18.310 --> 00:13:22.300
and then visits another page on
the same web application again--

00:13:22.300 --> 00:13:25.720
because we want to remember
information from the previous time

00:13:25.720 --> 00:13:27.475
that the user was here.

00:13:27.475 --> 00:13:28.850
So how can we solve this problem?

00:13:28.850 --> 00:13:30.820
How can we make sure
that, when we do this load

00:13:30.820 --> 00:13:33.010
balancing across multiple
different servers,

00:13:33.010 --> 00:13:34.795
that we do so in a session-aware way?

00:13:34.795 --> 00:13:36.670
Well, there are multiple
different approaches

00:13:36.670 --> 00:13:39.310
to session-aware load balancing.

00:13:39.310 --> 00:13:42.610
One approach is this general
idea known as sticky sessions

00:13:42.610 --> 00:13:46.150
where the idea is that, when I
come back to the load balancer,

00:13:46.150 --> 00:13:49.940
the load balancer will remember
what server I was sent to last time

00:13:49.940 --> 00:13:52.210
and send me there yet again.

00:13:52.210 --> 00:13:54.670
So for example, if I
log into a website once,

00:13:54.670 --> 00:13:57.490
and I'm directed to server
number two, for example, then

00:13:57.490 --> 00:14:00.130
the next time I visit
this web application,

00:14:00.130 --> 00:14:03.520
even if I should be directed to
server three or four according

00:14:03.520 --> 00:14:07.600
to random choice or according to fewest
connections or any of these other load

00:14:07.600 --> 00:14:09.700
balancing methods, the
load balancer should

00:14:09.700 --> 00:14:12.310
remember that, last time
I came to this site,

00:14:12.310 --> 00:14:14.240
I got directed to server number two.

00:14:14.240 --> 00:14:16.210
And so this time, the
load balancer is going

00:14:16.210 --> 00:14:18.550
to direct me to server
number two yet again.

00:14:18.550 --> 00:14:22.000
That way, server number two, which
contains information about my session,

00:14:22.000 --> 00:14:25.000
is going to see me again and
remember who it is that I am.

00:14:25.000 --> 00:14:28.180
And it's not going to make me log
in again into the exact same website

00:14:28.180 --> 00:14:30.570
for a second time, for example.
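
NOTE
Editor's sketch, not part of the lecture: sticky sessions layered on top of a round-robin
rotation, in Python. The StickyBalancer class and the client_id argument are hypothetical;
in practice the identifier might come from a cookie or the client's address.
```python
from itertools import cycle
class StickyBalancer:
    """Remember which server each client was sent to and reuse it on later visits."""
    def __init__(self, servers):
        self._rotation = cycle(servers)
        self._assigned = {}  # client id mapped to the server chosen on first visit
    def pick(self, client_id):
        if client_id not in self._assigned:
            # First visit: fall back to the underlying balancing method.
            self._assigned[client_id] = next(self._rotation)
        return self._assigned[client_id]
lb = StickyBalancer(["server-1", "server-2", "server-3"])
alice_first = lb.pick("alice")
bob_first = lb.pick("bob")
alice_again = lb.pick("alice")  # same server as Alice's first request
```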

00:14:30.570 --> 00:14:33.280
And so sticky sessions are one
way of dealing with this problem.

00:14:33.280 --> 00:14:35.363
But again, with all of
these approaches-- and this

00:14:35.363 --> 00:14:38.410
will be a recurring theme as we talk
about scalability and security--

00:14:38.410 --> 00:14:39.730
there are trade-offs here.

00:14:39.730 --> 00:14:44.200
A trade-off of sticky sessions is that
it's possible that one of these servers

00:14:44.200 --> 00:14:47.950
is going to end up getting far more
load than another if one server happens

00:14:47.950 --> 00:14:50.620
to have a lot of users that
keep coming back to the website

00:14:50.620 --> 00:14:52.390
and keep requesting additional pages.

00:14:52.390 --> 00:14:54.940
But other pages, other
servers might have

00:14:54.940 --> 00:14:58.010
had users that decided not
to come back, for example.

00:14:58.010 --> 00:15:01.390
And so there's a difference in
utilization where some of our servers

00:15:01.390 --> 00:15:03.880
might be more heavily
utilized than other servers,

00:15:03.880 --> 00:15:07.580
and we're not doing a very good
job of balancing between them.

00:15:07.580 --> 00:15:11.980
And so one approach is to store
sessions inside of the database

00:15:11.980 --> 00:15:15.580
rather than store information
about sessions inside of the servers

00:15:15.580 --> 00:15:18.730
themselves, where, if I get
directed to another server,

00:15:18.730 --> 00:15:20.710
that other server doesn't
remember who I am,

00:15:20.710 --> 00:15:24.310
doesn't remember information about
my interaction with this website.

00:15:24.310 --> 00:15:27.890
If we instead choose to store
sessions inside of a database--

00:15:27.890 --> 00:15:31.210
and, in particular, inside of a
database that all of the servers

00:15:31.210 --> 00:15:33.100
have the ability to access--

00:15:33.100 --> 00:15:36.400
well, then it doesn't matter which
of the servers I get directed to

00:15:36.400 --> 00:15:39.370
and which server the load
balancer decides to send me to

00:15:39.370 --> 00:15:42.310
because, regardless of which
server I end up getting sent to,

00:15:42.310 --> 00:15:44.235
the session information
is in the database.

00:15:44.235 --> 00:15:46.360
And each of the servers
can connect to the database

00:15:46.360 --> 00:15:49.390
to find out who I am, to find out
whether I've logged into the site

00:15:49.390 --> 00:15:52.660
already, and therefore
is able to recognize me.
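
NOTE
Editor's sketch, not part of the lecture: sessions kept in a shared database, in Python.
An in-memory SQLite database stands in for a database that every server can reach, and
the table layout, log_in, and current_user are hypothetical names chosen for illustration.
```python
import sqlite3
# Stand-in for the shared database; a real deployment would use a database server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY, username TEXT)")
def log_in(session_id, username):
    """Runs on whichever server happens to handle the login request."""
    db.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)", (session_id, username))
    db.commit()
def current_user(session_id):
    """Runs on whichever server handles a later request from the same user."""
    row = db.execute(
        "SELECT username FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None
log_in("abc123", "alice")      # handled by, say, server number one
user = current_user("abc123")  # handled by, say, server number two
print(user)
```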

00:15:52.660 --> 00:15:54.670
And so that might be
one approach as well.

00:15:54.670 --> 00:15:57.702
Another approach is to store
sessions on the client side.

00:15:57.702 --> 00:15:59.410
We've talked a little
bit about this idea

00:15:59.410 --> 00:16:03.100
of cookies, which can be stored where
the web browser can set a cookie so

00:16:03.100 --> 00:16:06.460
that your web browser is able to
present that cookie the next time

00:16:06.460 --> 00:16:09.020
it makes a request to
the same web application.

00:16:09.020 --> 00:16:12.430
And inside this cookie, you can store
a whole bunch of information, including

00:16:12.430 --> 00:16:14.000
information about the session.

00:16:14.000 --> 00:16:16.690
You might, inside of a
cookie, store information

00:16:16.690 --> 00:16:19.340
about what user is currently
logged in, for example,

00:16:19.340 --> 00:16:21.500
or other session-related information.

00:16:21.500 --> 00:16:23.080
But here, too, there are drawbacks.

00:16:23.080 --> 00:16:25.750
If you're not careful, someone
could manipulate that cookie

00:16:25.750 --> 00:16:27.380
and maybe pretend to be someone else.

00:16:27.380 --> 00:16:29.230
And so for that reason,
you might want to do

00:16:29.230 --> 00:16:32.020
some encryption or some
kind of signing to make sure

00:16:32.020 --> 00:16:35.832
that you can't fake a cookie and
pretend to be someone that you're not.
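
NOTE
Editor's sketch, not part of the lecture: signing a cookie value with an HMAC so that
tampering can be detected, using Python's standard hmac module. SECRET_KEY, sign, and
verify are hypothetical names. Signing detects changes but does not hide the contents.
```python
import hashlib
import hmac
SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical; known only to the servers
def sign(value):
    """Append an HMAC-SHA256 signature to the cookie value."""
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"
def verify(cookie):
    """Return the original value if the signature matches, otherwise None."""
    value, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return value if hmac.compare_digest(mac.encode(), expected.encode()) else None
cookie = sign("user=alice")
tampered = cookie.replace("alice", "admin")
print(verify(cookie))    # the untouched cookie is accepted
print(verify(tampered))  # the edited cookie is rejected
```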

00:16:35.832 --> 00:16:37.540
But another concern
is that, as you start

00:16:37.540 --> 00:16:40.130
to store more and more information
inside of these cookies,

00:16:40.130 --> 00:16:43.540
these cookies keep getting sent back and
forth between the server and the client

00:16:43.540 --> 00:16:45.250
every time a request is made.

00:16:45.250 --> 00:16:48.040
That can start to get expensive,
too-- more and more information

00:16:48.040 --> 00:16:52.090
passing back and forth between
the client and between the server.

00:16:52.090 --> 00:16:54.580
So lots of possible
approaches-- no one approach

00:16:54.580 --> 00:16:57.040
that is necessarily the right
approach or the best approach

00:16:57.040 --> 00:16:58.270
to use in all cases.

00:16:58.270 --> 00:17:00.850
But things to be aware
of-- things to think about

00:17:00.850 --> 00:17:03.520
as we begin to deal with these
issues of scale, of making

00:17:03.520 --> 00:17:07.270
sure we have multiple servers that
are available for usage in case we do

00:17:07.270 --> 00:17:07.869
need it.

00:17:07.869 --> 00:17:10.930
But also making sure that, when
we do so, we don't break the user

00:17:10.930 --> 00:17:14.920
experience-- we don't result in a
situation where a user is logged in

00:17:14.920 --> 00:17:18.160
but then, suddenly,
isn't logged in at all.

00:17:18.160 --> 00:17:21.460
And so horizontal scaling gives
us this kind of capacity--

00:17:21.460 --> 00:17:24.760
the ability to have multiple
different servers, all of which

00:17:24.760 --> 00:17:27.880
can be dealing with user requests
and responding to those user requests

00:17:27.880 --> 00:17:28.890
as well.

00:17:28.890 --> 00:17:34.240
But a reasonable question to ask is,
how many of those servers do we need?

00:17:34.240 --> 00:17:36.850
Now, we can use benchmarking
to try to estimate this.

00:17:36.850 --> 00:17:40.190
If we have an estimate of how many
users are going to be on our website

00:17:40.190 --> 00:17:42.430
at any given time, we
can benchmark and see

00:17:42.430 --> 00:17:46.420
how many users can be handled by
a single server and extrapolate,

00:17:46.420 --> 00:17:49.330
based on that information,
to infer how many servers we

00:17:49.330 --> 00:17:52.000
might need in our web
application to be able to service

00:17:52.000 --> 00:17:53.650
all of these different users.
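
NOTE
The extrapolation described here can be sketched as a small calculation.
This is only an illustration: the function name and the example figures
(10,000 expected users, 1,500 users per server) are assumptions, not
numbers from the lecture.
```python
import math
def servers_needed(expected_users: int, users_per_server: int) -> int:
    """Extrapolate from a single-server benchmark to a server count."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    # Round up: a fraction of a server still means one more machine.
    return max(1, math.ceil(expected_users / users_per_server))
print(servers_needed(10_000, 1_500))
```
A real estimate would also leave headroom, since a benchmark rarely
matches production traffic exactly.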

00:17:53.650 --> 00:17:56.680
But it might be the case that our
web application doesn't always

00:17:56.680 --> 00:17:58.540
have the same number of users.

00:17:58.540 --> 00:18:01.660
Maybe, sometimes, there are going to
be far more users than another time.

00:18:01.660 --> 00:18:05.140
You might imagine, for example, that
in a news organization's website--

00:18:05.140 --> 00:18:07.690
like the web application
for a newspaper--

00:18:07.690 --> 00:18:09.720
when there's breaking
news, some big story,

00:18:09.720 --> 00:18:11.470
there's going to be a
lot more people that

00:18:11.470 --> 00:18:15.380
are all trying to access the website
at the same time than at other times.

00:18:15.380 --> 00:18:18.310
So one approach might
be, consider the maximum.

00:18:18.310 --> 00:18:20.650
What is the greatest number
of users that ever

00:18:20.650 --> 00:18:23.620
might be trying to use our web
application at any given time?

00:18:23.620 --> 00:18:26.830
And choose a number of servers
based on that maximum so that,

00:18:26.830 --> 00:18:28.960
no matter how high the
number of users gets,

00:18:28.960 --> 00:18:32.800
we will have enough servers to be
able to service all of those users.

00:18:32.800 --> 00:18:35.560
But that's probably not
a great economical choice

00:18:35.560 --> 00:18:39.250
if, in the vast majority of cases,
there will be far fewer users.

00:18:39.250 --> 00:18:42.625
In that case, you're going to have a
lot of servers that are underutilized--

00:18:42.625 --> 00:18:45.250
where you don't need that many
servers, but you're still paying

00:18:45.250 --> 00:18:47.770
for the electricity, for
keeping all of them running--

00:18:47.770 --> 00:18:50.740
which might not be an
ideal choice either.

00:18:50.740 --> 00:18:52.120
So one solution to this--

00:18:52.120 --> 00:18:54.970
quite popular, especially in
this world of cloud computing--

00:18:54.970 --> 00:18:58.660
is the idea of autoscaling
where you can have an autoscaler

00:18:58.660 --> 00:19:03.460
to say that, you know what, let's
start with, for example, two servers.

00:19:03.460 --> 00:19:05.470
But if there's enough
traffic to the website,

00:19:05.470 --> 00:19:07.678
if enough people are making
requests to the website--

00:19:07.678 --> 00:19:10.360
maybe it's a peak time where
people are using the website--

00:19:10.360 --> 00:19:11.830
go ahead and scale up.

00:19:11.830 --> 00:19:15.880
Go ahead and add a third server where
now our load balancer can balance

00:19:15.880 --> 00:19:18.100
between all three of those servers.

00:19:18.100 --> 00:19:20.710
And if even more traffic ends
up coming to the website--

00:19:20.710 --> 00:19:24.280
more users are trying to use this
application all at the same time--

00:19:24.280 --> 00:19:27.160
well, then we can go ahead and
add a fourth server as well.

00:19:27.160 --> 00:19:28.660
And we can continue to do that.

00:19:28.660 --> 00:19:31.510
Most autoscalers will let
you configure, for example,

00:19:31.510 --> 00:19:34.480
a minimum number of servers and
a maximum number of servers.

00:19:34.480 --> 00:19:37.420
And dependent on how many users
happen to be using your web

00:19:37.420 --> 00:19:40.300
application at any given
time, the autoscaler

00:19:40.300 --> 00:19:44.410
can scale up or scale down, adding
new servers as more users come

00:19:44.410 --> 00:19:47.410
to the website, removing
servers as fewer users are

00:19:47.410 --> 00:19:49.870
using the website as well.
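
NOTE
A minimal sketch of the scaling decision, assuming the autoscaler knows
the current number of users and a per-server capacity. The function and
parameter names are hypothetical; real autoscalers usually act on metrics
such as CPU load or request rate instead.
```python
import math
def desired_servers(current_users: int, users_per_server: int,
                    min_servers: int = 2, max_servers: int = 10) -> int:
    """Pick a server count for the current load, clamped to [min, max]."""
    needed = math.ceil(current_users / users_per_server)
    return max(min_servers, min(max_servers, needed))
# A quiet period, a peak, and a spike beyond the configured maximum.
for users in (500, 3_500, 50_000):
    print(users, desired_servers(users, users_per_server=1_000))
```
The clamp is what the configured minimum and maximum buy you: the pool
never shrinks below two servers and never grows past ten.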

00:19:49.870 --> 00:19:52.425
And so this can be a nice
solution to this problem of scale

00:19:52.425 --> 00:19:55.050
where you don't have to worry
about how many servers there are.

00:19:55.050 --> 00:19:57.580
It just autoscales entirely on its own.

00:19:57.580 --> 00:19:59.080
Now, there are trade-offs here, too.

00:19:59.080 --> 00:20:01.250
This autoscaling
process might take time.

00:20:01.250 --> 00:20:05.260
And if a lot of users all come into
your website all at the exact same time,

00:20:05.260 --> 00:20:08.350
well, it's going to take
some time to be able to add

00:20:08.350 --> 00:20:10.630
all of these additional
servers to start them up.

00:20:10.630 --> 00:20:13.700
And so there might be some
trade-offs there, too,

00:20:13.700 --> 00:20:17.330
where you might not be able to
service all of the users immediately.

00:20:17.330 --> 00:20:19.380
And another problem
worth thinking about is,

00:20:19.380 --> 00:20:21.510
as you add more and
more of these servers,

00:20:21.510 --> 00:20:23.877
you introduce opportunities for failure.

00:20:23.877 --> 00:20:25.710
Now, it's better than
having a single server

00:20:25.710 --> 00:20:29.490
where, if that single server fails,
now suddenly the entire web application

00:20:29.490 --> 00:20:30.390
doesn't work at all.

00:20:30.390 --> 00:20:33.240
That's what we generally call
a single point of failure--

00:20:33.240 --> 00:20:37.410
a single place where, if it fails, the
entire system is going to be broken.

00:20:37.410 --> 00:20:39.720
One advantage of having
multiple servers is

00:20:39.720 --> 00:20:43.530
that we no longer have a single server
that acts as a point of failure.

00:20:43.530 --> 00:20:46.140
If one of the servers
goes down then, ideally,

00:20:46.140 --> 00:20:49.780
our load balancer should be able
to know, based on that information,

00:20:49.780 --> 00:20:53.370
to no longer send a request to
that particular server-- to,

00:20:53.370 --> 00:20:58.470
instead, balance the load across
the remaining three servers.

00:20:58.470 --> 00:21:00.640
Now, there's an interesting
question there as well,

00:21:00.640 --> 00:21:04.200
which is, how does the load
balancer know that this server is

00:21:04.200 --> 00:21:05.450
no longer responding?

00:21:05.450 --> 00:21:07.200
For some reason, it
has some sort of error

00:21:07.200 --> 00:21:09.763
that it's not able to process
requests appropriately.

00:21:09.763 --> 00:21:11.680
Well, there are multiple
ways you can do this.

00:21:11.680 --> 00:21:15.090
But one of the most common is what's
simply known as a heartbeat where,

00:21:15.090 --> 00:21:18.240
effectively, every so often,
every some number of seconds,

00:21:18.240 --> 00:21:20.700
the load balancer pings
all of the servers--

00:21:20.700 --> 00:21:23.280
just sends a quick request
to all the servers.

00:21:23.280 --> 00:21:26.250
And all of the servers are
supposed to respond back.

00:21:26.250 --> 00:21:29.010
And using that information,
the load balancer

00:21:29.010 --> 00:21:31.920
knows a little bit about the
latency of each of the servers--

00:21:31.920 --> 00:21:34.920
how long it took for the server
to respond to the request.

00:21:34.920 --> 00:21:37.440
But also, it can get
information about whether or not

00:21:37.440 --> 00:21:39.450
the server is functioning properly.

00:21:39.450 --> 00:21:42.157
If one of the servers
doesn't respond to the ping,

00:21:42.157 --> 00:21:44.490
well, then the load balancer
knows that there's probably

00:21:44.490 --> 00:21:47.640
something wrong with the server, that
we probably shouldn't be directing

00:21:47.640 --> 00:21:50.570
more users to that server at all.

00:21:50.570 --> 00:21:53.730
And so this can solve for the
problem of a single point of failure

00:21:53.730 --> 00:21:57.570
by allowing ourselves multiple servers
where, if any one of the servers fails,

00:21:57.570 --> 00:22:00.450
the load balancer learns
about that via heartbeat

00:22:00.450 --> 00:22:03.540
and then, based on that information,
can begin to redirect traffic

00:22:03.540 --> 00:22:05.847
to the other servers instead.
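
NOTE
One way to picture the heartbeat check, as a sketch only. Each server is
modeled as a hypothetical ping function that returns True when healthy;
a real load balancer would send a network request with a timeout.
```python
from typing import Callable, Dict, List
def healthy_servers(servers: Dict[str, Callable[[], bool]]) -> List[str]:
    """Ping every server once; keep only the ones that answer."""
    alive = []
    for name, ping in servers.items():
        try:
            if ping():
                alive.append(name)
        except Exception:
            # A ping that raises (for example, a timeout) is a missed heartbeat.
            pass
    return alive
def down() -> bool:
    raise TimeoutError("no response")
pool = {"server-1": lambda: True, "server-2": down, "server-3": lambda: True}
print(healthy_servers(pool))
```
Traffic would then be balanced only across the names this returns, and
the check would be repeated every few seconds.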

00:22:05.847 --> 00:22:08.430
Now, one thing you might notice
is that, even in this picture,

00:22:08.430 --> 00:22:11.970
now the load balancer appears to
be like a single point of failure

00:22:11.970 --> 00:22:14.460
where, if the load balancer
happens to fail, well, now

00:22:14.460 --> 00:22:16.668
nothing is going to work
because the load balancer is

00:22:16.668 --> 00:22:18.810
the one responsible for
directing traffic to all

00:22:18.810 --> 00:22:20.190
of the various different servers.

00:22:20.190 --> 00:22:23.790
And so even though there is no single
server that is a point of failure,

00:22:23.790 --> 00:22:27.370
this load balancer also appears
to be a single point of failure.

00:22:27.370 --> 00:22:28.540
And that's definitely true.

00:22:28.540 --> 00:22:31.470
And you might imagine instead
having multiple load balancers

00:22:31.470 --> 00:22:35.310
where one load balancer goes down,
another load balancer can swoop in,

00:22:35.310 --> 00:22:39.000
acting as a hot spare where it picks up
all of the traffic that was originally

00:22:39.000 --> 00:22:40.650
going to the first load balancer.

00:22:40.650 --> 00:22:44.550
And if it ever goes down, a second
one is ready to take its place.

00:22:44.550 --> 00:22:47.700
And it might also be doing this kind
of heartbeat process-- checking up

00:22:47.700 --> 00:22:48.845
on the first load balancer.

00:22:48.845 --> 00:22:51.970
And if all goes well, the second load
balancer doesn't have to do anything.

00:22:51.970 --> 00:22:54.490
But if the first load
balancer ever were to fail,

00:22:54.490 --> 00:22:56.640
well, then the second
load balancer can step in

00:22:56.640 --> 00:22:59.700
and begin servicing those
requests, directing them to all

00:22:59.700 --> 00:23:01.840
of these individual servers as well.
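
NOTE
The hot-spare idea can be sketched as picking the first load balancer
that still passes its heartbeat. The names "lb-primary" and "lb-spare"
and the is_alive callback are assumptions for illustration.
```python
from typing import Callable, Sequence
def active_balancer(names: Sequence[str], is_alive: Callable[[str], bool]) -> str:
    """Return the first load balancer, in priority order, that is responding."""
    for name in names:
        if is_alive(name):
            return name
    raise RuntimeError("no load balancer is responding")
status = {"lb-primary": False, "lb-spare": True}
print(active_balancer(["lb-primary", "lb-spare"], lambda name: status[name]))
```
While the primary is healthy the spare does nothing; it only takes over
once the primary stops answering.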

00:23:01.840 --> 00:23:02.705
And so there, too--

00:23:02.705 --> 00:23:05.580
another opportunity to think about
where the single points of failure

00:23:05.580 --> 00:23:09.300
are and thinking about how we might
address the single points of failure

00:23:09.300 --> 00:23:12.330
in order to make sure that our
web applications are scalable.

00:23:12.330 --> 00:23:14.820
So that then deals with
issues about how we might

00:23:14.820 --> 00:23:17.070
go about scaling up these servers.

00:23:17.070 --> 00:23:20.340
But ultimately, the servers are
not the entirety of the story.

00:23:20.340 --> 00:23:22.350
Inside of our applications,
we've mostly been

00:23:22.350 --> 00:23:25.918
writing web applications that interact
and deal with data in some way.

00:23:25.918 --> 00:23:28.710
And there are multiple different
databases that we've talked about.

00:23:28.710 --> 00:23:30.900
SQLite has been the
default one that Django

00:23:30.900 --> 00:23:34.200
provides to us, which just
stores data inside of a file.

00:23:34.200 --> 00:23:36.020
But as we begin to
grow our applications,

00:23:36.020 --> 00:23:39.270
if we want to begin to scale them,
it's quite popular and quite common

00:23:39.270 --> 00:23:41.530
to put databases entirely
somewhere separate--

00:23:41.530 --> 00:23:44.340
to have a separate database server
running somewhere else where

00:23:44.340 --> 00:23:46.800
the servers are all
communicating with that database,

00:23:46.800 --> 00:23:50.550
whether we're running MySQL, or
Postgres, or some other database system

00:23:50.550 --> 00:23:51.750
instead.

00:23:51.750 --> 00:23:55.410
And all of the servers then
have access to that database.

00:23:55.410 --> 00:23:57.990
And so there, too, are
considerations that we

00:23:57.990 --> 00:24:00.420
need to take into account--
issues of how it is that we

00:24:00.420 --> 00:24:03.840
go about scaling up these databases.

00:24:03.840 --> 00:24:06.960
In this picture, for example,
you might imagine a load balancer

00:24:06.960 --> 00:24:08.730
that is communicating with two servers.

00:24:08.730 --> 00:24:10.950
But both of those
servers, for example, need

00:24:10.950 --> 00:24:13.200
to be communicating with this database.

00:24:13.200 --> 00:24:16.140
And much like any server can only
handle some number of requests,

00:24:16.140 --> 00:24:19.380
some number of users at any
given time, databases, too,

00:24:19.380 --> 00:24:23.280
can only handle some number of requests,
some concurrent number of connections

00:24:23.280 --> 00:24:24.250
at any given time.

00:24:24.250 --> 00:24:26.130
And so we need to begin
to think about issues

00:24:26.130 --> 00:24:30.120
of how it is that we scale these
databases as well in order to be

00:24:30.120 --> 00:24:33.330
able to handle more and more users.

00:24:33.330 --> 00:24:35.580
Now, one approach, the first
thing we might try to do,

00:24:35.580 --> 00:24:38.160
is something called database
partitioning-- effectively,

00:24:38.160 --> 00:24:42.270
splitting up what is a big data
set into multiple different parts

00:24:42.270 --> 00:24:43.470
to that data set.

00:24:43.470 --> 00:24:46.560
And we've already seen some
examples of database partitioning.

00:24:46.560 --> 00:24:49.890
We've seen one example where-- for
example, when we talked about SQL,

00:24:49.890 --> 00:24:53.130
we looked at a table of flights
where each flight had an origin

00:24:53.130 --> 00:24:57.840
city, the origin city's airport code,
the destination city, the destination

00:24:57.840 --> 00:25:00.120
city's airport code, and
some number of minutes,

00:25:00.120 --> 00:25:02.850
the duration for that particular flight.

00:25:02.850 --> 00:25:05.820
And we decided that storing all
of this data in a single table

00:25:05.820 --> 00:25:07.590
probably wasn't the best idea.

00:25:07.590 --> 00:25:10.170
And instead, we wanted
to split that data up

00:25:10.170 --> 00:25:13.380
in a type of partitioning where,
instead, we said, all right, let's just

00:25:13.380 --> 00:25:16.230
have one table that will
have all of the airports.

00:25:16.230 --> 00:25:20.440
And so each airport gets its own
row inside of this airports table.

00:25:20.440 --> 00:25:22.640
And we also had another
table which was just

00:25:22.640 --> 00:25:26.270
the flights table which, rather
than storing all of those columns,

00:25:26.270 --> 00:25:28.820
just mapped two airports to each other.

00:25:28.820 --> 00:25:32.660
With any given flight, it has an
origin ID, meaning which object,

00:25:32.660 --> 00:25:36.800
which row in the airports
table represents the origin of the flight,

00:25:36.800 --> 00:25:39.680
and then which row in
the airports table is

00:25:39.680 --> 00:25:42.860
going to represent the
destination for that flight.

00:25:42.860 --> 00:25:45.530
So we took one table and
effectively split it up

00:25:45.530 --> 00:25:49.940
into multiple tables, each of
which ultimately had fewer columns.

00:25:49.940 --> 00:25:52.850
And this might be something we
call the vertical partitioning

00:25:52.850 --> 00:25:56.810
of a database where, instead of
just having single big long tables,

00:25:56.810 --> 00:25:59.420
we split them up into
multiple tables, each

00:25:59.420 --> 00:26:01.820
of which have fewer columns
that are able to represent

00:26:01.820 --> 00:26:03.497
data in a more relational way.

00:26:03.497 --> 00:26:05.330
And that's something
we've seen before, too.
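
NOTE
A sketch of the airports/flights split using Python's built-in sqlite3
module and an in-memory database. The column names and the sample rows
are assumptions chosen for illustration.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airports (
        id INTEGER PRIMARY KEY,
        code TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE flights (
        id INTEGER PRIMARY KEY,
        origin_id INTEGER NOT NULL REFERENCES airports(id),
        destination_id INTEGER NOT NULL REFERENCES airports(id),
        duration INTEGER NOT NULL
    );
    INSERT INTO airports (id, code, city) VALUES (1, 'JFK', 'New York'), (2, 'LHR', 'London');
    INSERT INTO flights (origin_id, destination_id, duration) VALUES (1, 2, 415);
""")
# Joining the two narrow tables recovers the original wide row.
row = conn.execute("""
    SELECT o.city, o.code, d.city, d.code, f.duration
    FROM flights f
    JOIN airports o ON o.id = f.origin_id
    JOIN airports d ON d.id = f.destination_id
""").fetchone()
print(row)
```
Each airport's city and code are stored once, and a flight refers to
them by ID instead of repeating them.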

00:26:05.330 --> 00:26:07.460
But in addition to
vertical partitioning,

00:26:07.460 --> 00:26:11.090
we can also do horizontal
partitioning where the idea there

00:26:11.090 --> 00:26:13.340
is that we take a table
and just split it up

00:26:13.340 --> 00:26:17.390
into multiple tables that are all
storing effectively the same data,

00:26:17.390 --> 00:26:19.380
but split up into different data sets.

00:26:19.380 --> 00:26:22.520
So the same type of data, but
just in different tables--

00:26:22.520 --> 00:26:25.100
where we might have originally
had a flights table,

00:26:25.100 --> 00:26:28.490
and instead we split it up
into a domestic flights table

00:26:28.490 --> 00:26:30.380
and an international flights table.

00:26:30.380 --> 00:26:32.870
Each of these tables still
has the exact same columns.

00:26:32.870 --> 00:26:34.555
They still have a destination column.

00:26:34.555 --> 00:26:35.930
They still have an origin column.

00:26:35.930 --> 00:26:38.250
They still have a duration
column, for example.

00:26:38.250 --> 00:26:41.210
But we've just now taken the
data that used to be in one table

00:26:41.210 --> 00:26:46.040
and split up that data into two or
more different tables instead--

00:26:46.040 --> 00:26:49.940
one for all the domestic flights, one
for all the international flights.

00:26:49.940 --> 00:26:52.370
And the advantage there
is that we no longer

00:26:52.370 --> 00:26:55.760
need to search through the entirety
of the data set if we're just looking

00:26:55.760 --> 00:26:57.780
for one domestic flight, for example.

00:26:57.780 --> 00:27:00.680
If you know the flight you're
looking for is a domestic flight,

00:27:00.680 --> 00:27:04.820
well, then it can be more efficient to
just search the domestic flights table

00:27:04.820 --> 00:27:08.270
and not bother searching through
the international flights table.

00:27:08.270 --> 00:27:11.300
And so if we're intelligent about
how we choose to take a table

00:27:11.300 --> 00:27:14.540
and split it up into multiple
different tables, the effect of that

00:27:14.540 --> 00:27:16.880
is that we can often
improve the efficiency

00:27:16.880 --> 00:27:19.190
of our searches, the
efficiency of our operations,

00:27:19.190 --> 00:27:21.830
because we're dealing with
multiple smaller tables

00:27:21.830 --> 00:27:24.320
where these operations can come faster.
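
NOTE
A sketch of horizontal partitioning with sqlite3: two tables with the
same columns, and a helper that routes each query to one of them. The
table names flights_domestic and flights_international and the helper
functions are hypothetical.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
columns = "(id INTEGER PRIMARY KEY, origin TEXT, destination TEXT, duration INTEGER)"
conn.execute(f"CREATE TABLE flights_domestic {columns}")
conn.execute(f"CREATE TABLE flights_international {columns}")
def table_for(is_international: bool) -> str:
    """Choose the partition; only these two fixed names are ever returned."""
    return "flights_international" if is_international else "flights_domestic"
def add_flight(origin: str, destination: str, duration: int, is_international: bool) -> None:
    conn.execute(
        f"INSERT INTO {table_for(is_international)} (origin, destination, duration) VALUES (?, ?, ?)",
        (origin, destination, duration),
    )
def find_flights(origin: str, is_international: bool) -> list:
    # Only one of the two smaller tables is searched.
    query = f"SELECT origin, destination, duration FROM {table_for(is_international)} WHERE origin = ?"
    return conn.execute(query, (origin,)).fetchall()
add_flight("JFK", "LAX", 360, is_international=False)
add_flight("JFK", "LHR", 415, is_international=True)
print(find_flights("JFK", is_international=False))
```
A search that needs both kinds of flight has to query both tables and
combine the results, which is the cost of this kind of split.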

00:27:24.320 --> 00:27:27.350
One drawback though is that,
as we begin to split data

00:27:27.350 --> 00:27:31.250
across multiple different tables,
it becomes more expensive if ever we

00:27:31.250 --> 00:27:33.980
need to join this data
back together and connect

00:27:33.980 --> 00:27:36.290
all the domestic and
international flights, running

00:27:36.290 --> 00:27:37.790
separate queries on each.

00:27:37.790 --> 00:27:40.010
And so in that case, we'll
want to think about trying

00:27:40.010 --> 00:27:42.710
to separate our data in such
a way that, generally, we're

00:27:42.710 --> 00:27:46.750
only going to need to deal with one
table or the other at any given time.

00:27:46.750 --> 00:27:49.280
And so domestic and international
might be a reasonable way

00:27:49.280 --> 00:27:52.970
to split up our flights table because
maybe, most of the time, our airport

00:27:52.970 --> 00:27:54.860
just cares about
searching domestic flights

00:27:54.860 --> 00:27:56.630
if we know we're looking
for one kind of flight,

00:27:56.630 --> 00:27:59.030
or just cares about searching
for international flights

00:27:59.030 --> 00:28:01.405
if there are different people
or different computers that

00:28:01.405 --> 00:28:05.090
are going to handle each of
those different types of systems.

00:28:05.090 --> 00:28:08.630
And so partitioning our database can
sometimes help with issues of scale

00:28:08.630 --> 00:28:11.480
by making it faster to search
through large amounts of data

00:28:11.480 --> 00:28:14.480
and being able to represent
data a little bit more cleanly.

00:28:14.480 --> 00:28:17.840
But it still seems to represent
a single point of failure--

00:28:17.840 --> 00:28:22.850
that we have multiple servers now that
are all connected to the same database.

00:28:22.850 --> 00:28:24.890
And there, again, is a
single point of failure.

00:28:24.890 --> 00:28:27.353
If the database fails for
some reason, well now,

00:28:27.353 --> 00:28:29.270
suddenly, none of our
web application is going

00:28:29.270 --> 00:28:31.940
to work because all of
those servers are all

00:28:31.940 --> 00:28:35.180
connected to that exact same database.

00:28:35.180 --> 00:28:36.980
And so it's for that
reason that we might--

00:28:36.980 --> 00:28:39.230
just as we tried to add
more servers in order

00:28:39.230 --> 00:28:42.530
to solve the problem of a single
point of failure with our servers,

00:28:42.530 --> 00:28:45.410
we might also try database replication.

00:28:45.410 --> 00:28:48.860
Rather than just have a single
database in our web application,

00:28:48.860 --> 00:28:50.870
in order to guard against
potential failure,

00:28:50.870 --> 00:28:54.410
we might replicate our database--
have multiple different databases

00:28:54.410 --> 00:28:59.297
and, therefore, reduce the likelihood
that our application entirely fails.

00:28:59.297 --> 00:29:01.130
And there are a couple
of approaches that we

00:29:01.130 --> 00:29:03.020
can use for database replication.

00:29:03.020 --> 00:29:06.800
Two of the most common are what are
known as single-primary replication

00:29:06.800 --> 00:29:09.190
and multi-primary replication.

00:29:09.190 --> 00:29:11.760
And in single-primary
database replication,

00:29:11.760 --> 00:29:14.040
we have multiple different databases.

00:29:14.040 --> 00:29:17.930
But one of those databases is
considered to be the primary database.

00:29:17.930 --> 00:29:20.510
And what we mean by a primary
database is a database

00:29:20.510 --> 00:29:22.310
to which we can both read data--

00:29:22.310 --> 00:29:24.560
meaning select rows from the table--

00:29:24.560 --> 00:29:27.350
but also write data,
meaning insert rows,

00:29:27.350 --> 00:29:31.200
or update rows, or delete
rows to any of those tables.

00:29:31.200 --> 00:29:34.070
So in single-primary replication,
we have a single database

00:29:34.070 --> 00:29:36.260
where we can both read and write.

00:29:36.260 --> 00:29:38.680
And we have some number of
other databases-- in this case,

00:29:38.680 --> 00:29:40.100
two other databases--

00:29:40.100 --> 00:29:41.900
from which we can only read data.

00:29:41.900 --> 00:29:44.220
So we can get data from those databases.

00:29:44.220 --> 00:29:48.560
But we can't update, or insert,
or delete from those databases.

00:29:48.560 --> 00:29:52.490
And now we need some mechanism to
make sure that all of these databases

00:29:52.490 --> 00:29:53.750
are kept in sync.

00:29:53.750 --> 00:29:57.620
And ultimately, what that means is
that, any time the database changes,

00:29:57.620 --> 00:29:59.660
all of the databases are informed.

00:29:59.660 --> 00:30:02.390
Now, the only database that
can change is our primary one.

00:30:02.390 --> 00:30:04.250
This is the only one
that can be written to,

00:30:04.250 --> 00:30:06.740
the only one that allows
for the data to change.

00:30:06.740 --> 00:30:08.180
The others are read only.

00:30:08.180 --> 00:30:12.170
So anytime this primary database
updates or changes in some way,

00:30:12.170 --> 00:30:16.540
it needs to inform the other
databases of that update.

00:30:16.540 --> 00:30:18.920
And so it informs the other
databases of that update.

00:30:18.920 --> 00:30:21.230
And now all of the
databases are kept in sync

00:30:21.230 --> 00:30:23.960
where, if you try and run a
query on any of these databases

00:30:23.960 --> 00:30:25.910
to select and get some
information, you'll

00:30:25.910 --> 00:30:30.440
get the same results from all of
these various different databases.
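
NOTE
A toy model of single-primary replication, with in-memory dictionaries
standing in for databases. The class names are made up, and replicas
are updated immediately here, whereas real replication takes some time.
```python
class Database:
    """A toy in-memory database: a dict of row ID to value."""
    def __init__(self) -> None:
        self.rows = {}
class SinglePrimaryCluster:
    def __init__(self, replica_count: int) -> None:
        self.primary = Database()
        self.replicas = [Database() for _ in range(replica_count)]
        self._next_read = 0
    def write(self, row_id: int, value: str) -> None:
        # Only the primary accepts writes...
        self.primary.rows[row_id] = value
        # ...and it then informs every replica of the change.
        for replica in self.replicas:
            replica.rows[row_id] = value
    def read(self, row_id: int):
        # Reads rotate across the read-only replicas.
        replica = self.replicas[self._next_read % len(self.replicas)]
        self._next_read += 1
        return replica.rows.get(row_id)
cluster = SinglePrimaryCluster(replica_count=2)
cluster.write(1, "New York to London")
print(cluster.read(1), cluster.read(1))
```
Both reads return the same value even though they are served by
different replicas, because the primary pushed the write to each one.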

00:30:30.440 --> 00:30:32.990
Now, the single-primary
approach has some drawbacks.

00:30:32.990 --> 00:30:36.950
It has the drawback that only one of
these databases can be written to.

00:30:36.950 --> 00:30:38.750
So if you have a lot
of users that are all

00:30:38.750 --> 00:30:42.550
trying to write data to the
database at the exact same time,

00:30:42.550 --> 00:30:44.360
well, there might be
some issues here where

00:30:44.360 --> 00:30:46.370
this one database is
going to be carrying

00:30:46.370 --> 00:30:49.100
all of that load for all of
the people that might be trying

00:30:49.100 --> 00:30:51.860
to update and change that database.

00:30:51.860 --> 00:30:54.140
And it also has a
slightly smaller version

00:30:54.140 --> 00:30:57.140
of the same problem of a
single point of failure.

00:30:57.140 --> 00:31:00.770
There is no longer a single point of
failure for reading from that data.

00:31:00.770 --> 00:31:03.750
If you want to read from the data,
and one of the databases goes out,

00:31:03.750 --> 00:31:07.340
you can read data from any of the other
databases, and they'll work just fine.

00:31:07.340 --> 00:31:10.670
But it does have the drawback
that, if this database fails,

00:31:10.670 --> 00:31:13.040
if our primary database
fails, well, then

00:31:13.040 --> 00:31:14.750
we're no longer able to write data.

00:31:14.750 --> 00:31:17.150
If we want to update data
inside of our database,

00:31:17.150 --> 00:31:19.910
this one database is no longer
going to be operational.

00:31:19.910 --> 00:31:24.673
And none of the other databases are
going to allow us to write new changes.

00:31:24.673 --> 00:31:27.840
So there are a couple of approaches we
can use to try to solve this problem.

00:31:27.840 --> 00:31:31.145
One approach though is, instead of
having a single-primary database--

00:31:31.145 --> 00:31:33.950
a single database to which
we can read and write--

00:31:33.950 --> 00:31:36.610
to use a multi-primary approach.

00:31:36.610 --> 00:31:40.160
And in the multi-primary approach, we
have multiple databases, all of which

00:31:40.160 --> 00:31:41.810
we can read and write to.

00:31:41.810 --> 00:31:44.230
We can select rows
from all the databases.

00:31:44.230 --> 00:31:48.780
And we can insert, update, and delete
rows to all of these databases as well.

00:31:48.780 --> 00:31:52.050
But now the synchronization process
becomes a little bit trickier.

00:31:52.050 --> 00:31:54.050
And here, now, is the
trade-off-- that now we've

00:31:54.050 --> 00:31:55.850
replicated the number
of reads and writes

00:31:55.850 --> 00:31:59.870
we can do by having many databases to
which we can read data and write data.

00:31:59.870 --> 00:32:02.870
But anytime any of
these databases changes,

00:32:02.870 --> 00:32:07.695
every database needs to inform all of
the other databases of those updates.

00:32:07.695 --> 00:32:10.070
And that's, certainly, going
to take some amount of time.

00:32:10.070 --> 00:32:13.160
It introduces some complexity
into our system as well.

00:32:13.160 --> 00:32:16.550
And it also introduces the
possibility for conflicts.

00:32:16.550 --> 00:32:19.550
You might imagine situations
where, if two people are editing

00:32:19.550 --> 00:32:21.830
similar data at the
same time, you might run

00:32:21.830 --> 00:32:24.080
into a number of different
types of conflicts.

00:32:24.080 --> 00:32:27.560
So one type of conflict, for
example, would be an update conflict.

00:32:27.560 --> 00:32:30.170
If I tried to edit one
row in one database,

00:32:30.170 --> 00:32:34.040
and someone else tries to edit the same
row in another database, when they sync

00:32:34.040 --> 00:32:36.230
up with each other via
this update process,

00:32:36.230 --> 00:32:38.600
our database system
needs some way to decide

00:32:38.600 --> 00:32:42.200
how it's going to resolve those
various different updates.
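
NOTE
The lecture does not say which rule a database should use here. One
possible policy, shown purely as an assumption, is "last writer wins":
keep whichever edit carries the later timestamp. The Version type and
the sample values are hypothetical.
```python
from dataclasses import dataclass
@dataclass
class Version:
    value: str
    timestamp: float  # when the edit was made, as reported by the writer
def resolve(local: Version, remote: Version) -> Version:
    """Last writer wins: keep whichever edit carries the later timestamp."""
    return remote if remote.timestamp > local.timestamp else local
mine = Version("duration = 415", timestamp=100.0)
theirs = Version("duration = 420", timestamp=100.5)
print(resolve(mine, theirs).value)
```
This rule is simple but silently discards the earlier edit, and it
depends on the databases' clocks agreeing, so other systems choose
different strategies.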

00:32:42.200 --> 00:32:44.880
Another conflict might
be a uniqueness conflict.

00:32:44.880 --> 00:32:46.907
We've seen, in the case
of databases in SQL

00:32:46.907 --> 00:32:48.740
that, when we're designing
our tables, I can

00:32:48.740 --> 00:32:51.980
specify that this particular
field should be a unique field--

00:32:51.980 --> 00:32:56.030
common one being the ID field, for
example, where every single row is

00:32:56.030 --> 00:32:58.100
going to have its own unique ID.

00:32:58.100 --> 00:33:01.670
Well, what happens if two people
try to insert data at the same time

00:33:01.670 --> 00:33:03.350
into two different databases?

00:33:03.350 --> 00:33:07.610
They're each given a unique ID, but it's
the same ID on both of the databases,

00:33:07.610 --> 00:33:11.240
because neither database knows that the
other database has added a new row yet.

00:33:11.240 --> 00:33:14.540
So when they sync back up, we might
run into a uniqueness conflict

00:33:14.540 --> 00:33:18.290
where two different databases
have assigned the same exact ID

00:33:18.290 --> 00:33:19.730
to multiple different entries.

00:33:19.730 --> 00:33:23.117
So we need some way to be able to
resolve those conflicts as well.
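
NOTE
The uniqueness conflict can be reproduced with two toy primaries that
each hand out auto-incrementing IDs on their own. The Primary class and
the sample rows are assumptions for illustration.
```python
class Primary:
    """A toy primary that assigns auto-incrementing IDs independently."""
    def __init__(self) -> None:
        self.rows = {}
        self.next_id = 1
    def insert(self, value: str) -> int:
        row_id = self.next_id
        self.rows[row_id] = value
        self.next_id += 1
        return row_id
a, b = Primary(), Primary()
# Two inserts land on different primaries before either has synced.
id_a = a.insert("flight to Paris")
id_b = b.insert("flight to Tokyo")
# IDs present on both sides but holding different data are in conflict.
conflicts = [i for i in a.rows.keys() & b.rows.keys() if a.rows[i] != b.rows[i]]
print(id_a, id_b, conflicts)
```
Both rows receive ID 1, so the sync step has to detect the clash and
decide how to renumber or otherwise reconcile them.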

00:33:23.117 --> 00:33:24.950
And there are many other
conflicts you might

00:33:24.950 --> 00:33:28.340
imagine trying to deal with-- one
example being, for instance, delete

00:33:28.340 --> 00:33:31.430
conflicts, where one person
tries to delete a row

00:33:31.430 --> 00:33:33.710
and another person tries
to update that row.

00:33:33.710 --> 00:33:35.278
Well, which should take precedence?

00:33:35.278 --> 00:33:36.320
Should we update the row?

00:33:36.320 --> 00:33:37.610
Should we delete the row?

00:33:37.610 --> 00:33:41.450
We need some way to be able to
make those decisions because there

00:33:41.450 --> 00:33:45.150
is some latency between when
a change is made to a database

00:33:45.150 --> 00:33:48.600
and when that database is able to
communicate with another database.

00:33:48.600 --> 00:33:51.290
So these issues of scale,
these issues of synchronization

00:33:51.290 --> 00:33:53.330
are always going to come
up as we start to deal

00:33:53.330 --> 00:33:56.970
with programs that are interacting with
more and more of this kind of data.

00:33:56.970 --> 00:33:59.810
And as a result, we need to
design more and more sophisticated

00:33:59.810 --> 00:34:04.040
systems that are able to deal
with those issues of scale.

00:34:04.040 --> 00:34:09.139
Now, ultimately, we'd ideally like to
reduce the number of different database

00:34:09.139 --> 00:34:10.130
servers that we have.

00:34:10.130 --> 00:34:12.692
Every additional database
server is going to cost time.

00:34:12.692 --> 00:34:13.900
It's going to cost resources.

00:34:13.900 --> 00:34:17.060
It costs money in terms of keeping
all of these servers running.

00:34:17.060 --> 00:34:20.960
And so, ideally, we'd like not
to have to talk to this database

00:34:20.960 --> 00:34:22.590
if we don't need to.

00:34:22.590 --> 00:34:26.360
So you might imagine, for example, a
news organization's website, something

00:34:26.360 --> 00:34:28.275
like the front page
of the New York Times.

00:34:28.275 --> 00:34:30.650
If you go to the home page of
the New York Times website,

00:34:30.650 --> 00:34:33.230
it displays all of the
day's headlines with images

00:34:33.230 --> 00:34:36.860
and with information about what each
of the stories are about, for example.

00:34:36.860 --> 00:34:39.983
And you might imagine that the way
they're doing something like this

00:34:39.983 --> 00:34:41.900
is that they have some
kind of database that's

00:34:41.900 --> 00:34:43.670
storing all of these news articles.

00:34:43.670 --> 00:34:46.040
And when you visit the front
page of the New York Times,

00:34:46.040 --> 00:34:48.290
it's going to do some
kind of database query--

00:34:48.290 --> 00:34:51.500
selecting all of the recent
top headlines, for example--

00:34:51.500 --> 00:34:56.460
and rendering all of that information
in an HTML page that you can see.

00:34:56.460 --> 00:34:57.930
And that would certainly work.

00:34:57.930 --> 00:35:00.440
But if a lot of people are
all requesting the front page

00:35:00.440 --> 00:35:04.670
at the same time, well, it probably
doesn't make all that much sense

00:35:04.670 --> 00:35:08.390
if the web application, every time,
is making a database query, getting

00:35:08.390 --> 00:35:13.040
the latest articles, and then displaying
that information to all of the users

00:35:13.040 --> 00:35:16.130
because the articles might not
be changing all that frequently.

00:35:16.130 --> 00:35:18.440
If one person makes
a request one second,

00:35:18.440 --> 00:35:21.710
and another person makes the
same request half a second later,

00:35:21.710 --> 00:35:26.150
it probably is not going to be useful
to re-request all of the information

00:35:26.150 --> 00:35:29.450
from the database, regenerate that
template yet again, because it's

00:35:29.450 --> 00:35:33.050
an expensive process of requesting
data from the database, of generating

00:35:33.050 --> 00:35:33.800
that template.

00:35:33.800 --> 00:35:36.710
We'd, ideally, like some way
of dealing with that problem.

00:35:36.710 --> 00:35:40.040
And the way we can deal with that
problem is some form of caching.

00:35:40.040 --> 00:35:44.300
And caching refers to a whole bunch
of different types of ideas and tools

00:35:44.300 --> 00:35:47.660
that we can use at various different
places inside of our system.

00:35:47.660 --> 00:35:50.390
But in general, when we're
talking about caching,

00:35:50.390 --> 00:35:54.680
we're talking about storing a saved
version of some information in a way

00:35:54.680 --> 00:35:58.340
that we can access it more quickly so
that we don't need to continue making

00:35:58.340 --> 00:36:00.720
requests to a database, for example.

00:36:00.720 --> 00:36:02.930
And so there are a number
of ways we can do caching.

00:36:02.930 --> 00:36:07.010
One way we can do caching is on the
client side via client-side caching

00:36:07.010 --> 00:36:08.850
where the idea is that your browser--

00:36:08.850 --> 00:36:11.030
whether it's Safari, or
Chrome, or something else--

00:36:11.030 --> 00:36:13.700
is able to cache data,
store information,

00:36:13.700 --> 00:36:17.070
so that the browser doesn't need
to re-request the same information

00:36:17.070 --> 00:36:19.050
the next time it visits the page.

00:36:19.050 --> 00:36:21.680
For example, if you request a
page and it loads an image--

00:36:21.680 --> 00:36:23.210
on the page, for example--

00:36:23.210 --> 00:36:25.850
and you reload the page,
well, your web browser

00:36:25.850 --> 00:36:28.760
might try and make a request
again for the exact same image

00:36:28.760 --> 00:36:30.020
and then display it to you.

00:36:30.020 --> 00:36:33.500
But an alternative might be
that your web browser could just

00:36:33.500 --> 00:36:35.960
save a copy of the
image inside of a cache

00:36:35.960 --> 00:36:40.280
to locally store a version of
the image so that, the next time

00:36:40.280 --> 00:36:42.860
that the user makes a request
to the website, the user

00:36:42.860 --> 00:36:45.410
doesn't need to reload
that entire image.

00:36:45.410 --> 00:36:48.650
And that might be true of entire
web pages and web resources--

00:36:48.650 --> 00:36:51.770
that if there is some page that
doesn't change very often then,

00:36:51.770 --> 00:36:55.850
if the web browser just stores a
cached, a saved version of that page,

00:36:55.850 --> 00:36:58.340
then the next time the user
goes to their web browser,

00:36:58.340 --> 00:37:03.020
tries to access that page, rather than
re-request to the server and make a new

00:37:03.020 --> 00:37:06.440
request that the server needs to
respond to, if the browser has that page

00:37:06.440 --> 00:37:09.530
cached, the browser can
just display the cached--

00:37:09.530 --> 00:37:13.830
saved-- version of the page, saving
the need to talk to the server at all.

00:37:13.830 --> 00:37:16.970
So this can certainly help to
reduce the load on any given server.

00:37:16.970 --> 00:37:20.360
If users are caching information
inside of the web browser,

00:37:20.360 --> 00:37:22.480
it makes the experience
faster for the user

00:37:22.480 --> 00:37:24.980
because they can see the
information immediately rather than

00:37:24.980 --> 00:37:28.070
need to make a request and wait
for a response to come back.

00:37:28.070 --> 00:37:30.140
And it's good for the
server because the server

00:37:30.140 --> 00:37:33.740
doesn't need to be dealing with as
many requests if some of those requests

00:37:33.740 --> 00:37:35.160
are getting cached.

00:37:35.160 --> 00:37:37.400
And so one approach to
trying to do this is

00:37:37.400 --> 00:37:42.290
by adding this inside of the
headers of an HTTP response.

00:37:42.290 --> 00:37:44.960
When your web server
responds to some requests,

00:37:44.960 --> 00:37:48.770
the web server can include a line
like this inside of the response--

00:37:48.770 --> 00:37:53.210
something like Cache-Control:
max-age=86400--

00:37:53.210 --> 00:37:56.330
in effect, specifying
the number of seconds

00:37:56.330 --> 00:37:58.850
that you should cache this resource for.

00:37:58.850 --> 00:38:02.510
So if I try to access
this page 10 seconds later,

00:38:02.510 --> 00:38:04.910
well, that's less than 86,400.

00:38:04.910 --> 00:38:08.600
So rather than reload and
re-request the entire page,

00:38:08.600 --> 00:38:11.390
we're just going to use the
version of the page that happens

00:38:11.390 --> 00:38:13.750
to be cached inside of the web browser.
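
NOTE
A minimal sketch of the freshness check just described, assuming the browser remembers when it stored each response. The helper names parse_max_age and is_fresh are hypothetical, and real browsers handle many more Cache-Control directives than this.
```python
import re
from typing import Optional
def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None
def is_fresh(stored_at: float, now: float, cache_control: str) -> bool:
    """True if a copy stored at `stored_at` may still be reused at time `now`."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return False  # No max-age given: this sketch simply re-requests.
    return (now - stored_at) < max_age
header = "max-age=86400"  # 86,400 seconds is one day
print(is_fresh(stored_at=0, now=10, cache_control=header))     # 10 seconds later
print(is_fresh(stored_at=0, now=90000, cache_control=header))  # more than a day later
```
The first call reports that the cached copy can be reused; the second reports that it has expired and a new request is needed.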

00:38:13.750 --> 00:38:16.250
And so this has several advantages,
that we've talked about,

00:38:16.250 --> 00:38:19.640
in terms of reducing the amount of time
it takes to see the content of the page

00:38:19.640 --> 00:38:23.570
because it's already saved and reducing
the load on any particular server.

00:38:23.570 --> 00:38:25.040
But it also has drawbacks.

00:38:25.040 --> 00:38:29.180
If, for example, the resource
changes within this amount of time--

00:38:29.180 --> 00:38:32.240
maybe in 60 seconds,
the page has changed--

00:38:32.240 --> 00:38:35.120
if I try and load the
page again, well, then

00:38:35.120 --> 00:38:37.400
if it's loading the cached
version of the page,

00:38:37.400 --> 00:38:40.400
I might be seeing an outdated
version of a web page.

00:38:40.400 --> 00:38:42.470
I'm seeing an older
version of the web page

00:38:42.470 --> 00:38:45.320
because my web browser
just so happens to have

00:38:45.320 --> 00:38:47.570
that particular resource cached.

00:38:47.570 --> 00:38:49.610
And this might be true of a web page.

00:38:49.610 --> 00:38:53.630
It's especially true of other static
resources, things like CSS files

00:38:53.630 --> 00:38:54.760
or JavaScript files.

00:38:54.760 --> 00:38:58.860
The CSS of a web page probably
doesn't change all that often.

00:38:58.860 --> 00:39:02.120
And so, as a result, it's pretty
natural that your web browser--

00:39:02.120 --> 00:39:05.870
rather than request the exact same CSS
files again, and again, and again--

00:39:05.870 --> 00:39:08.650
might just save a copy
of those CSS files,

00:39:08.650 --> 00:39:12.380
cache them, such that it's able
to just reuse the cached version.

00:39:12.380 --> 00:39:14.690
But if the website were
to update their CSS,

00:39:14.690 --> 00:39:16.355
you might not see the latest changes.

00:39:16.355 --> 00:39:18.230
And you might have
experienced this yourself.

00:39:18.230 --> 00:39:21.410
If you're working on your own web
applications, when you change your CSS

00:39:21.410 --> 00:39:23.270
and refresh the page,
you might not always

00:39:23.270 --> 00:39:27.900
see those changes reflected if your
web browser is caching those results.

00:39:27.900 --> 00:39:30.710
And so, in most web browsers,
you can do a hard refresh

00:39:30.710 --> 00:39:33.740
to say, ignore whatever is in
the cache, and actually go out

00:39:33.740 --> 00:39:36.030
and make a new request
and get some new data.

00:39:36.030 --> 00:39:38.810
But ultimately, if you
don't do that, you're

00:39:38.810 --> 00:39:42.230
subject to this cache control where
the web browser is going to say,

00:39:42.230 --> 00:39:44.750
unless this number of
seconds has elapsed,

00:39:44.750 --> 00:39:48.500
we're going to reuse the
existing version of the page.

00:39:48.500 --> 00:39:51.590
And so an alternative to this approach--
and this approach certainly works

00:39:51.590 --> 00:39:52.670
and is quite popular--

00:39:52.670 --> 00:39:56.950
we can add to this approach by
adding what's known as an ETag.

00:39:56.950 --> 00:40:00.290
An ETag for a resource--
like a CSS file, or an image,

00:40:00.290 --> 00:40:01.590
or a JavaScript file--

00:40:01.590 --> 00:40:04.190
is just some unique
sequence of characters

00:40:04.190 --> 00:40:07.610
that identifies a particular
version of a resource,

00:40:07.610 --> 00:40:11.300
that identifies a particular version
of a CSS file or a JavaScript file,

00:40:11.300 --> 00:40:12.930
for example.

00:40:12.930 --> 00:40:14.840
And what this allows a program to do--

00:40:14.840 --> 00:40:16.010
like a web browser--

00:40:16.010 --> 00:40:18.230
is that, when a web browser
requests a resource--

00:40:18.230 --> 00:40:21.410
makes a request for a CSS
file or a JavaScript file--

00:40:21.410 --> 00:40:22.370
they get it back.

00:40:22.370 --> 00:40:25.760
And they get its
associated ETag value, so I

00:40:25.760 --> 00:40:28.310
know that this is the
value that is associated

00:40:28.310 --> 00:40:31.040
with this version of the CSS file.

00:40:31.040 --> 00:40:35.720
And if the web server were ever to
change that CSS file, replace it

00:40:35.720 --> 00:40:41.820
with a new updated CSS file, the
corresponding ETag will also change.

00:40:41.820 --> 00:40:43.650
So why is this helpful?

00:40:43.650 --> 00:40:46.730
Well, it means that if I am
trying to decide, should I

00:40:46.730 --> 00:40:50.070
load a new version of
the resource or not,

00:40:50.070 --> 00:40:53.510
should I try and make another request
to get the latest version of the CSS,

00:40:53.510 --> 00:40:55.970
what I can do first
is just ask for, what

00:40:55.970 --> 00:40:59.660
is the ETag value, the short sequence
that can be answered very quickly?

00:40:59.660 --> 00:41:02.090
Very quickly, we can
just respond and say,

00:41:02.090 --> 00:41:05.360
you know what, if the ETag value
is the same as what I remembered

00:41:05.360 --> 00:41:07.850
from last time, well,
then I don't need to get

00:41:07.850 --> 00:41:10.340
a whole new version of that resource.

00:41:10.340 --> 00:41:13.070
And so this is quite common,
too, that a web browser will say,

00:41:13.070 --> 00:41:15.110
hey, let me request this resource.

00:41:15.110 --> 00:41:19.200
But I already have a version of the
resource with this particular ETag.

00:41:19.200 --> 00:41:24.110
So if that ETag is still the ETag for
the most recent version of a particular

00:41:24.110 --> 00:41:26.450
resource-- like a CSS
or JavaScript file--

00:41:26.450 --> 00:41:30.650
then no need for the web server to
send a new version of that file.

00:41:30.650 --> 00:41:33.650
Just go ahead and respond and say,
the version you have-- that one

00:41:33.650 --> 00:41:34.920
works-- totally fine.

00:41:34.920 --> 00:41:38.280
But if there is a new version, well,
then the web server can respond with

00:41:38.280 --> 00:41:41.130
the new asset-- the new
CSS file, for example--

00:41:41.130 --> 00:41:43.430
but also the new ETag value.

00:41:43.430 --> 00:41:46.160
So these two approaches can
work in concert with each other.

00:41:46.160 --> 00:41:49.220
You can say, go ahead and cache
this for some number of seconds

00:41:49.220 --> 00:41:51.020
so that, for some number
of seconds, you're

00:41:51.020 --> 00:41:54.680
not going to ever request a
new version of that resource.

00:41:54.680 --> 00:41:57.710
But even if you do ask for a
new version of the resource

00:41:57.710 --> 00:41:59.900
after this number of
seconds has elapsed,

00:41:59.900 --> 00:42:02.390
if the ETag value
hasn't updated, then no

00:42:02.390 --> 00:42:06.090
need to redownload a whole new
version of a particular file.

00:42:06.090 --> 00:42:08.750
You can just reuse the
version that happens

00:42:08.750 --> 00:42:10.890
to be cached already in the browser.
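
NOTE
A rough sketch of that ETag exchange, under the assumption that the server derives the tag by hashing the file's contents (one common choice, not the only one). In HTTP the browser sends the tag it remembers in an If-None-Match header and the server answers 304 Not Modified when it still matches. The functions make_etag and respond are hypothetical names for illustration.
```python
import hashlib
from typing import Optional, Tuple
def make_etag(content: bytes) -> str:
    """Derive a tag from the content, so any change to the file changes the tag."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'
def respond(content: bytes, if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, body) for a request that may carry a known ETag."""
    etag = make_etag(content)
    headers = {"ETag": etag, "Cache-Control": "max-age=86400"}
    if if_none_match == etag:
        return 304, headers, b""  # Not Modified: the browser's copy is still good.
    return 200, headers, content  # Send the full resource along with its tag.
css_v1 = b"body { color: black; }"
status, headers, body = respond(css_v1, None)                # first visit
status2, _, body2 = respond(css_v1, headers["ETag"])         # revalidation, unchanged
css_v2 = b"body { color: navy; }"
status3, headers3, body3 = respond(css_v2, headers["ETag"])  # file was updated
print(status, status2, status3)
```
Only the first and third requests carry the full file; the second gets back an empty body because the tag still matches.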

00:42:10.890 --> 00:42:14.270
So caching in the browser can
be an incredibly powerful tool

00:42:14.270 --> 00:42:17.000
for trying to speed up these
requests, for trying to reduce

00:42:17.000 --> 00:42:19.070
the load on any particular server.

00:42:19.070 --> 00:42:21.290
But the client side
is not the only place

00:42:21.290 --> 00:42:23.510
where we can begin to
do this kind of caching.

00:42:23.510 --> 00:42:26.330
We also have the ability
to do server-side caching.

00:42:26.330 --> 00:42:30.560
And in server-side caching, we're going
to introduce to our picture the notion

00:42:30.560 --> 00:42:31.940
of a cache--

00:42:31.940 --> 00:42:34.160
that we have these multiple
servers that are all

00:42:34.160 --> 00:42:35.720
communicating with the database.

00:42:35.720 --> 00:42:38.300
But these servers can also
communicate with a cache--

00:42:38.300 --> 00:42:41.360
someplace where we've
stored information that we

00:42:41.360 --> 00:42:46.340
might want to reuse later rather than
have to do all of that recalculation.

00:42:46.340 --> 00:42:49.280
And Django, it turns out, has
an entire cache framework,

00:42:49.280 --> 00:42:51.530
a whole host of features
that Django offers

00:42:51.530 --> 00:42:54.860
that allow us to leverage
this ability to use the cache

00:42:54.860 --> 00:42:56.470
to be able to speed up requests.

00:42:56.470 --> 00:42:59.150
So there are per-view
caches where you can

00:42:59.150 --> 00:43:02.720
specify a cache on a particular
view to say that, rather than run

00:43:02.720 --> 00:43:05.540
through all this Python code
every time someone makes

00:43:05.540 --> 00:43:09.410
a request to this
particular view, instead,

00:43:09.410 --> 00:43:14.150
just cache the view so that, for
the next 30 seconds or 30 minutes,

00:43:14.150 --> 00:43:16.940
the next time someone tries
to visit the same view,

00:43:16.940 --> 00:43:19.910
go ahead and just reuse the
results of the last time

00:43:19.910 --> 00:43:21.665
that that view was loaded.
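
NOTE
To make the per-view idea concrete, here is a plain-Python sketch of a decorator that reuses a view's last result for a fixed number of seconds. Django's own version of this is the cache_page decorator in django.views.decorators.cache; the cache_view decorator and front_page view below are hypothetical stand-ins so the example runs without Django.
```python
import time
from functools import wraps
def cache_view(timeout_seconds, clock=time.monotonic):
    """Hypothetical decorator: reuse a view's last result until it is too old."""
    def decorator(view):
        saved = {}  # Maps request path to (expiry time, rendered response).
        @wraps(view)
        def wrapper(path):
            now = clock()
            entry = saved.get(path)
            if entry is not None and now < entry[0]:
                return entry[1]  # Still fresh: skip the view entirely.
            response = view(path)
            saved[path] = (now + timeout_seconds, response)
            return response
        return wrapper
    return decorator
query_count = 0
@cache_view(timeout_seconds=30)
def front_page(path):
    """Stand-in for a view that queries the database and renders a template."""
    global query_count
    query_count += 1
    return f"<h1>Headlines (rendered {query_count} time(s))</h1>"
front_page("/")
front_page("/")
print(query_count)
```
Two requests arrive, but the view body, and so the pretend database query, runs only once within the 30-second window.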

00:43:21.665 --> 00:43:23.540
And this can work not
just for a single view.

00:43:23.540 --> 00:43:25.657
It can work for fragments
inside of a template.

00:43:25.657 --> 00:43:27.740
Your template might have
multiple different parts.

00:43:27.740 --> 00:43:31.190
On your web page, you might render
the navigation bar, and the sidebar,

00:43:31.190 --> 00:43:33.800
and the footer, maybe based
on information about today

00:43:33.800 --> 00:43:36.050
that might change the next day.

00:43:36.050 --> 00:43:38.510
But if you expect that
the sidebar of your page

00:43:38.510 --> 00:43:41.570
is not going to change very
often within the same minute

00:43:41.570 --> 00:43:43.820
or within the same hour,
well, then you might imagine

00:43:43.820 --> 00:43:46.910
caching that part of the
template so that, the next time

00:43:46.910 --> 00:43:49.160
that Django tries to load
that entire template,

00:43:49.160 --> 00:43:52.550
it doesn't need to recalculate how to
generate the sidebar for your website.

00:43:52.550 --> 00:43:56.330
It just knows that we can use
the same version of the sidebar

00:43:56.330 --> 00:43:59.786
from the last time that we
loaded this website instead.

00:43:59.786 --> 00:44:03.600
And Django also gives you access
to a lower-level cache API

00:44:03.600 --> 00:44:07.080
where, for any information that you
might want to cache and store for use

00:44:07.080 --> 00:44:10.140
later, you can save that
information inside of the cache.

00:44:10.140 --> 00:44:12.180
You make an expensive
database query that

00:44:12.180 --> 00:44:15.360
takes a couple of milliseconds or
a couple of seconds to process.

00:44:15.360 --> 00:44:17.760
You can save those
results inside of a cache

00:44:17.760 --> 00:44:20.550
to make it easier to access
that same data if ever you

00:44:20.550 --> 00:44:22.930
try to get access to that again.
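
NOTE
A small sketch of that check-the-cache-first pattern. Django's low-level API exposes calls of a similar shape, cache.get(key) and cache.set(key, value, timeout) from django.core.cache; the SimpleCache class and slow_headline_query function here are hypothetical, in-memory stand-ins rather than Django's implementation.
```python
import time
class SimpleCache:
    """Hypothetical in-memory cache with get/set and a per-entry timeout in seconds."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}
    def set(self, key, value, timeout):
        self._entries[key] = (self._clock() + timeout, value)
    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._entries.pop(key, None)  # Drop missing or expired entries.
            return default
        return entry[1]
def slow_headline_query():
    """Stand-in for an expensive database query."""
    time.sleep(0.01)
    return ["Headline A", "Headline B"]
cache = SimpleCache()
def top_headlines():
    headlines = cache.get("top_headlines")
    if headlines is None:
        headlines = slow_headline_query()          # Cache miss: do the real work.
        cache.set("top_headlines", headlines, 60)  # Keep the result for 60 seconds.
    return headlines
print(top_headlines())  # first call runs the query
print(top_headlines())  # second call is served from the cache
```
Both calls return the same list, but only the first one pays for the slow query.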

00:44:22.930 --> 00:44:26.430
So caching allows us to be able
to deal with these issues of scale

00:44:26.430 --> 00:44:29.910
by reducing load on our servers,
but also on our databases.

00:44:29.910 --> 00:44:33.330
Rather than need to talk to the
database every single time we

00:44:33.330 --> 00:44:36.750
make a new request for a
particular web application,

00:44:36.750 --> 00:44:39.060
we can just reuse
information that happens

00:44:39.060 --> 00:44:42.930
to be in the cache to allow our web
applications to become even more

00:44:42.930 --> 00:44:44.350
scalable.

00:44:44.350 --> 00:44:48.000
So that then was a look at some
issues concerning scalability.

00:44:48.000 --> 00:44:50.580
And we'll next turn our
attention to security--

00:44:50.580 --> 00:44:53.610
trying to make sure that, as we build
our web applications, as we deploy

00:44:53.610 --> 00:44:56.370
our web applications and
more users start to use them,

00:44:56.370 --> 00:44:58.290
we want to make sure
that they're secure.

00:44:58.290 --> 00:45:00.570
And there are a whole bunch
of security considerations

00:45:00.570 --> 00:45:03.170
to take into account
across all of the topics

00:45:03.170 --> 00:45:04.650
that we've looked at in the course.

00:45:04.650 --> 00:45:06.525
We've looked at a number
of different topics.

00:45:06.525 --> 00:45:09.400
And with each of them, there
are security vulnerabilities.

00:45:09.400 --> 00:45:12.720
There are ideas to be mindful of
when it comes to making sure

00:45:12.720 --> 00:45:14.580
that our applications are secure.

00:45:14.580 --> 00:45:18.420
And we can begin our story, in fact, by
talking about Git and version control.

00:45:18.420 --> 00:45:20.370
Git is all about trying
to make sure we're

00:45:20.370 --> 00:45:22.860
able to keep track of
different versions of our code.

00:45:22.860 --> 00:45:24.780
And one thing that goes
hand-in-hand with Git

00:45:24.780 --> 00:45:27.480
is this idea of open-source software.

00:45:27.480 --> 00:45:30.930
On websites like GitHub and other
services that host Git repositories,

00:45:30.930 --> 00:45:33.930
increasingly, a lot of software
is becoming open source

00:45:33.930 --> 00:45:38.190
where anyone can see and contribute
to the source code of an application.

00:45:38.190 --> 00:45:40.868
And this is great in the sense
that it allows for many people

00:45:40.868 --> 00:45:42.660
to be able to collaborate
and work together

00:45:42.660 --> 00:45:46.590
in order to try to find bugs that might
exist inside of a web application.

00:45:46.590 --> 00:45:48.810
But it also comes with
drawbacks-- drawbacks

00:45:48.810 --> 00:45:51.333
where, if there is a
bug in the application,

00:45:51.333 --> 00:45:54.000
now someone who's looking through
the source code of our program

00:45:54.000 --> 00:45:56.250
might be able to spot that bug.

00:45:56.250 --> 00:45:58.920
Or you might imagine
that, because Git keeps

00:45:58.920 --> 00:46:01.830
track of different versions
of our code every time

00:46:01.830 --> 00:46:04.050
we make a commit to our
repository, you have

00:46:04.050 --> 00:46:07.110
to be very careful when it comes
to credentials or things that

00:46:07.110 --> 00:46:08.910
might leak inside of the source code.

00:46:08.910 --> 00:46:12.600
You generally never want to put
passwords or any secure information

00:46:12.600 --> 00:46:15.990
inside of the Git repository
because the Git repository could

00:46:15.990 --> 00:46:19.000
be shared with other people and
might be open to anyone to look at.

00:46:19.000 --> 00:46:22.200
And so those are security
considerations to be mindful of there as

00:46:22.200 --> 00:46:25.920
well-- that if you make a commit, and
accidentally make a commit to your code

00:46:25.920 --> 00:46:29.610
where you expose those credentials,
you might remove those credentials

00:46:29.610 --> 00:46:32.160
and commit again so the
latest version of your program

00:46:32.160 --> 00:46:34.140
doesn't have those credentials in it.

00:46:34.140 --> 00:46:36.540
But someone who has access
to the Git repository

00:46:36.540 --> 00:46:39.150
has access not just to the
latest version of your code,

00:46:39.150 --> 00:46:41.110
but to every version of your code.

00:46:41.110 --> 00:46:43.650
And that person could,
theoretically, go back

00:46:43.650 --> 00:46:46.770
through the history of the
repository and find the commit

00:46:46.770 --> 00:46:51.040
where the credentials were exposed
and see those credentials as well.

00:46:51.040 --> 00:46:54.270
So while Git is a very powerful
tool, it's also one to be mindful of.

00:46:54.270 --> 00:46:57.840
Any change you make could potentially
get saved inside of a commit--

00:46:57.840 --> 00:47:00.690
could potentially, therefore,
be accessed later on.

00:47:00.690 --> 00:47:04.380
And so if ever credentials are
exposed inside of the repository,

00:47:04.380 --> 00:47:07.260
you want to make sure to wipe
out all of those previous commits

00:47:07.260 --> 00:47:09.690
and not just make some
new commit in order

00:47:09.690 --> 00:47:13.740
to try and hide the previous credentials
that can be exposed because they can

00:47:13.740 --> 00:47:17.010
still be retrieved if someone
goes back through the history

00:47:17.010 --> 00:47:19.300
of any particular repository.
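
NOTE
A toy model of why a follow-up commit does not hide a leaked credential. This is not how Git stores data; it assumes each commit is simply a snapshot of file contents, and the commits_exposing function and the sample history are hypothetical.
```python
from typing import Dict, List
Snapshot = Dict[str, str]  # file name mapped to file contents
def commits_exposing(history: List[Snapshot], secret: str) -> List[int]:
    """Return the indexes of every commit whose files contain `secret`."""
    return [
        index
        for index, files in enumerate(history)
        if any(secret in text for text in files.values())
    ]
history = [
    {"settings.py": "DEBUG = True"},
    {"settings.py": "DEBUG = True\nPASSWORD = 'example-password'"},  # accidental commit
    {"settings.py": "DEBUG = True"},                                  # follow-up "fix"
]
latest = history[-1]
print("example-password" in latest["settings.py"])
print(commits_exposing(history, "example-password"))
```
The latest snapshot no longer contains the password, yet searching the history still finds the commit where it was exposed.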

00:47:19.300 --> 00:47:23.025
And so that, then, was a look at
some issues that might surround Git.

00:47:23.025 --> 00:47:24.900
We also talked at the
beginning of the course

00:47:24.900 --> 00:47:28.110
about HTML, and about what it
is that we can use with HTML,

00:47:28.110 --> 00:47:32.040
and how we can use this language in
order to design the structure of a web

00:47:32.040 --> 00:47:36.150
page, in order to decide where all
of the paragraphs are going to be,

00:47:36.150 --> 00:47:38.070
what tables are going to be on the page.

00:47:38.070 --> 00:47:40.710
We talked about links and
how we can use anchor tags

00:47:40.710 --> 00:47:42.960
to link one page to another page.

00:47:42.960 --> 00:47:47.640
Now, one concern is this type of attack
known as a phishing attack with HTML.

00:47:47.640 --> 00:47:49.830
And a phishing attack
really just comes down

00:47:49.830 --> 00:47:53.100
to a little bit of HTML that looks
like this-- very easy to write,

00:47:53.100 --> 00:47:57.690
where I have an anchor tag that is
going to direct the user to URL 1.

00:47:57.690 --> 00:48:01.860
But it looks like it
directs the user to URL 2.

00:48:01.860 --> 00:48:03.930
So what might an example of this be?

00:48:03.930 --> 00:48:05.380
All right, so we'll take a look.

00:48:05.380 --> 00:48:09.280
I'll go ahead and open up link.html.

00:48:09.280 --> 00:48:11.770
And in link.html, I have a
website that I've written

00:48:11.770 --> 00:48:13.950
that appears to have a link to Google.

00:48:13.950 --> 00:48:16.030
But if I click on that
link, I'm suddenly

00:48:16.030 --> 00:48:19.162
directed to this course's
website, for example.

00:48:19.162 --> 00:48:20.120
So how did that happen?

00:48:20.120 --> 00:48:20.953
Why did that happen?

00:48:20.953 --> 00:48:22.670
It seems like it's linking to Google.

00:48:22.670 --> 00:48:26.290
Well, if you look at the code, if
I go ahead and open up link.html,

00:48:26.290 --> 00:48:31.360
we'll see that here I have an anchor
tag that actually links to the course

00:48:31.360 --> 00:48:34.150
website but appears to
be linking-- the text

00:48:34.150 --> 00:48:37.900
that the user sees appears that
it is linking instead to Google.

00:48:37.900 --> 00:48:41.360
And so this is a very common attack
vector, especially in emails,

00:48:41.360 --> 00:48:41.980
for example.

00:48:41.980 --> 00:48:45.040
You might see an email that tells
you to click on a particular link.

00:48:45.040 --> 00:48:48.070
But that link takes you to
somewhere else entirely instead.

00:48:48.070 --> 00:48:50.380
And as a result, someone
might inadvertently

00:48:50.380 --> 00:48:54.010
share their bank account credentials
or other sensitive information.

00:48:54.010 --> 00:48:57.220
And so here, too, something to be mindful
of as you interact with the web,

00:48:57.220 --> 00:49:00.490
maybe not necessarily on your own
website, but in other websites

00:49:00.490 --> 00:49:03.940
that you might interact with, just to be
mindful about where links are actually

00:49:03.940 --> 00:49:04.580
taking you.

00:49:04.580 --> 00:49:07.300
And most web browsers,
if you hover over a link,

00:49:07.300 --> 00:49:09.400
will show you where
that link might actually

00:49:09.400 --> 00:49:12.010
be directing you to because it
might be different than what

00:49:12.010 --> 00:49:17.930
the text of that particular anchor tag
might appear to link you to instead.
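
NOTE
One way to automate that hover check, sketched with Python's standard html.parser module. It flags anchor tags whose visible text looks like one host while the href points somewhere else. The LinkAuditor class and the host_of and suspicious_links helpers are hypothetical, and the heuristic is deliberately simple, so it will miss many real phishing tricks.
```python
from html.parser import HTMLParser
from urllib.parse import urlparse
class LinkAuditor(HTMLParser):
    """Collect (href, visible text) pairs for every anchor tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []
    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
def host_of(url: str, scheme_optional: bool = False) -> str:
    """Return the host named by a URL, or an empty string if there is none."""
    if scheme_optional and "//" not in url:
        url = "//" + url  # Let visible text such as google.com parse as a host.
    return (urlparse(url).hostname or "").lower()
def suspicious_links(html: str):
    """Return (text, href) pairs where the text names a different host than the href."""
    auditor = LinkAuditor()
    auditor.feed(html)
    flagged = []
    for href, text in auditor.links:
        shown = host_of(text, scheme_optional=True)
        actual = host_of(href)
        if "." in shown and shown != actual:
            flagged.append((text, href))
    return flagged
page = '<a href="https://example.org/login">https://www.google.com</a>'
print(suspicious_links(page))
```
The sample link claims to go to Google but actually points at example.org, so it is reported.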

00:49:17.930 --> 00:49:21.017
So HTML has all these various
different vulnerabilities

00:49:21.017 --> 00:49:24.100
where, because you can just decide
what you want the structure of the page

00:49:24.100 --> 00:49:26.710
to be, it leaves open the
possibility that someone

00:49:26.710 --> 00:49:29.770
might try to trick you into thinking
that you were going to a page

00:49:29.770 --> 00:49:31.420
that you're not actually on.

00:49:31.420 --> 00:49:34.150
And this problem is more
widespread because anyone

00:49:34.150 --> 00:49:36.580
can look at the HTML for any page.

00:49:36.580 --> 00:49:38.950
HTML comes back from the server.

00:49:38.950 --> 00:49:42.310
And therefore, the web browser
has access to all of that HTML

00:49:42.310 --> 00:49:46.270
and can use that HTML in order
to render a page, for example.

00:49:46.270 --> 00:49:49.150
And this leaves open other
vulnerabilities, too.

00:49:49.150 --> 00:49:54.760
For example, let me go ahead and
go to bankofamerica.com, just

00:49:54.760 --> 00:49:55.900
Bank of America's website.

00:49:55.900 --> 00:49:57.850
You can go to any other website instead.

00:49:57.850 --> 00:50:01.600
If I wanted to create a fake version
of Bank of America's website,

00:50:01.600 --> 00:50:03.820
for example, to trick
people into thinking

00:50:03.820 --> 00:50:05.740
they're going to Bank
of America's website

00:50:05.740 --> 00:50:08.950
when really they're going to my
website, well, then what I can do

00:50:08.950 --> 00:50:11.420
is just go ahead and view
the source of this page.

00:50:11.420 --> 00:50:13.940
I go ahead and view page source.

00:50:13.940 --> 00:50:17.990
And here is all of the HTML
for Bank of America's website.

00:50:17.990 --> 00:50:21.410
And nothing then stops me
from copying all this content,

00:50:21.410 --> 00:50:27.440
going into an HTML file, and creating a
new file that I'll just call bank.html.

00:50:27.440 --> 00:50:31.350
And I'll go ahead and paste in
the contents of that HTML file,

00:50:31.350 --> 00:50:34.700
so here, then, is all of
Bank of America's HTML.

00:50:34.700 --> 00:50:37.190
And now, if I open up bank.html--

00:50:37.190 --> 00:50:39.920
that HTML file that I have
now written, but really

00:50:39.920 --> 00:50:42.320
just copied from Bank of America--

00:50:42.320 --> 00:50:43.730
I open it up.

00:50:43.730 --> 00:50:47.000
And now here, on my
page, is a web page that

00:50:47.000 --> 00:50:48.680
appears to look like Bank of America.

00:50:48.680 --> 00:50:51.170
It's using all of Bank
of America's HTML.

00:50:51.170 --> 00:50:56.130
But instead, it is my HTML page
and not, actually, Bank of America.

00:50:56.130 --> 00:51:00.350
And so you might imagine combining
these to create an even more concerning

00:51:00.350 --> 00:51:03.050
attack vector where, instead
of linking to google.com,

00:51:03.050 --> 00:51:06.461
let me try and link
to bankofamerica.com.

00:51:06.461 --> 00:51:12.170
But where I'm actually going to
link to is bank.html, my version

00:51:12.170 --> 00:51:14.180
of Bank of America's website.

00:51:14.180 --> 00:51:18.170
Now, if I open up
link.html, here appears

00:51:18.170 --> 00:51:20.900
to be a link that links
me to Bank of America.

00:51:20.900 --> 00:51:23.180
If I click on that link,
I get to a page that

00:51:23.180 --> 00:51:25.250
looks like Bank of America's website.

00:51:25.250 --> 00:51:27.260
But it's not Bank of America's website.

00:51:27.260 --> 00:51:30.490
It's my bank.html file
that I have written.

00:51:30.490 --> 00:51:33.140
It just so happens to look
like Bank of America's website

00:51:33.140 --> 00:51:36.620
because I copied all of
that underlying HTML.

00:51:36.620 --> 00:51:39.860
So HTML has the ability to describe
the structure of our web page.

00:51:39.860 --> 00:51:43.790
But anytime you're writing this HTML,
it's good to be mindful of the fact

00:51:43.790 --> 00:51:48.110
that anyone can copy your HTML, could
theoretically pretend to be you.

00:51:48.110 --> 00:51:50.090
These are security
vulnerabilities that are

00:51:50.090 --> 00:51:53.240
worth bearing in mind as we
start to develop web applications

00:51:53.240 --> 00:51:56.910
and interact with web
applications as well.

00:51:56.910 --> 00:52:01.070
So ultimately, we used HTML in the
context of designing web applications

00:52:01.070 --> 00:52:02.960
using Django, a framework.

00:52:02.960 --> 00:52:05.690
And how exactly, then,
did these web frameworks

00:52:05.690 --> 00:52:10.250
work in terms of creating these web
servers that are listening for requests

00:52:10.250 --> 00:52:12.650
and that are responding
to those requests?

00:52:12.650 --> 00:52:14.390
Well, ultimately, much
of the internet is

00:52:14.390 --> 00:52:17.930
based around this idea of a client
communicating with a server or, more

00:52:17.930 --> 00:52:20.420
generally, any one
computer communicating

00:52:20.420 --> 00:52:23.810
with another computer using
HTTP and, in particular,

00:52:23.810 --> 00:52:28.618
HTTPS, a more secure version
of the HTTP protocol.

00:52:28.618 --> 00:52:31.160
And so you imagine that what
these protocols are really about

00:52:31.160 --> 00:52:34.200
is how information gets
from one person to another

00:52:34.200 --> 00:52:36.110
and what we're storing
with that information.

00:52:36.110 --> 00:52:39.680
We have one computer trying to
communicate with some other computer.

00:52:39.680 --> 00:52:42.440
And in order to do so,
information is generally

00:52:42.440 --> 00:52:45.020
going to flow through these routers.

00:52:45.020 --> 00:52:47.270
You might imagine information
going back and forth

00:52:47.270 --> 00:52:49.610
between one computer
and another computer,

00:52:49.610 --> 00:52:53.540
going through these intermediate
routers along the way.

00:52:53.540 --> 00:52:56.390
And as a result, one
thing to be cautious about

00:52:56.390 --> 00:52:58.400
is, how do you know that
this information that's

00:52:58.400 --> 00:53:02.390
getting passed back and forth is
getting passed back and forth securely?

00:53:02.390 --> 00:53:05.150
Ideally, when I send a
message to another computer--

00:53:05.150 --> 00:53:07.190
I'm sending an email
to someone else, I'm

00:53:07.190 --> 00:53:09.800
sending a message, I'm making
a request to a website that

00:53:09.800 --> 00:53:13.130
might contain sensitive information,
like my bank account, for example--

00:53:13.130 --> 00:53:17.030
I don't want it so that any intercepting
router that is taking my request

00:53:17.030 --> 00:53:18.260
and passing it along--

00:53:18.260 --> 00:53:21.170
I don't want those routers to
be able to look at that request

00:53:21.170 --> 00:53:24.950
and see the contents of my email
or the contents of what password

00:53:24.950 --> 00:53:27.620
I happen to be sending
across the web or not.

00:53:27.620 --> 00:53:31.005
Ideally, I'd like for this
information to be encrypted.

00:53:31.005 --> 00:53:33.380
And so here, we'll talk a
little bit about cryptography--

00:53:33.380 --> 00:53:35.450
this process of trying
to make sure that I

00:53:35.450 --> 00:53:37.850
am able to communicate
with some other person

00:53:37.850 --> 00:53:42.860
without some eavesdropper in the middle
being able to intercept that message.

00:53:42.860 --> 00:53:45.555
Obviously, if I just
take a plain text version

00:53:45.555 --> 00:53:47.930
of the message I'm trying to
send and just literally take

00:53:47.930 --> 00:53:51.560
the text of the message I'm trying
to send and effectively pass it along

00:53:51.560 --> 00:53:53.660
across the internet,
well, then anyone who

00:53:53.660 --> 00:53:57.430
is able to see that message is going to
know what the text of that message is.

00:53:57.430 --> 00:53:59.420
And so I want to do
some kind of encryption,

00:53:59.420 --> 00:54:02.900
some way of encrypting that message
so that someone along the way

00:54:02.900 --> 00:54:06.230
won't be able to do that decryption
if a router in the middle

00:54:06.230 --> 00:54:09.408
or someone in the middle is
able to intercept that message.

00:54:09.408 --> 00:54:11.450
And so the first approach
we'll look at is what's

00:54:11.450 --> 00:54:14.030
known as secret-key cryptography.

00:54:14.030 --> 00:54:19.160
In secret-key cryptography, I have
not just the plaintext, but some key,

00:54:19.160 --> 00:54:23.600
some secret piece of information
that can be used in order to encrypt

00:54:23.600 --> 00:54:25.550
or decrypt information.

00:54:25.550 --> 00:54:29.600
And so I'll use both the
key and the plaintext

00:54:29.600 --> 00:54:33.710
to generate what's known as the
ciphertext, the encrypted version

00:54:33.710 --> 00:54:35.690
of the message I'm trying to send.

00:54:35.690 --> 00:54:39.080
And then, instead of
sending the plaintext

00:54:39.080 --> 00:54:41.540
across the internet
to the other person, I

00:54:41.540 --> 00:54:44.870
might instead want to just send
the ciphertext across the internet

00:54:44.870 --> 00:54:48.050
to the other person so that I'm
not sending the plain version

00:54:48.050 --> 00:54:49.700
of the message across the internet.

00:54:49.700 --> 00:54:51.560
So the ciphertext goes across.

00:54:51.560 --> 00:54:54.270
And the other person
will also need the key.

00:54:54.270 --> 00:54:57.835
Now, if the other person has
both the ciphertext and the key,

00:54:57.835 --> 00:54:59.960
well, then using that
information, the other person

00:54:59.960 --> 00:55:02.960
can use the key to
decrypt the ciphertext

00:55:02.960 --> 00:55:05.800
and obtain the original plaintext.

00:55:05.800 --> 00:55:10.340
And this key is what we might call a
symmetric key encryption and decryption

00:55:10.340 --> 00:55:10.840
key.

00:55:10.840 --> 00:55:13.820
You use the key in order
to encrypt messages.

00:55:13.820 --> 00:55:17.600
And you use the same key in order
to do the decryption process.

00:55:17.600 --> 00:55:21.050
And as long as both I and the person
I'm communicating with both have access

00:55:21.050 --> 00:55:25.760
to that key, well, then we'll be able to
encrypt messages and decrypt messages.

00:55:25.760 --> 00:55:28.610
And someone who just has the
ciphertext but not the key

00:55:28.610 --> 00:55:33.160
likely won't be able to figure out
what that original message was.
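
To make the secret-key idea concrete, here is a minimal sketch in Python. It assumes a toy XOR cipher (a one-time pad) rather than any cipher the lecture names, and the function names are hypothetical. The point is only that the same key both encrypts and decrypts; a real application would rely on a vetted cryptography library instead.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Return `length` random bytes to use as a one-time shared secret key."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte.

    XOR is its own inverse, so this one function both encrypts and decrypts.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"Transfer $100 to Alice"
key = generate_key(len(plaintext))      # must be shared secretly with the receiver
ciphertext = xor_bytes(plaintext, key)  # this is what travels across the internet
recovered = xor_bytes(ciphertext, key)  # the receiver applies the same key
```

Anyone holding only `ciphertext` sees random-looking bytes, while anyone holding both `ciphertext` and `key` can recover the message.
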

00:55:33.160 --> 00:55:36.370
But there's a problem here, especially
in the context of the internet.

00:55:36.370 --> 00:55:41.500
And that is that both I and the other
person need to have access to this key.

00:55:41.500 --> 00:55:45.320
The key is what I use to do the
encryption and the decryption.

00:55:45.320 --> 00:55:48.978
And I can't just send the key across
the internet to the other person

00:55:48.978 --> 00:55:51.520
because, if I do that, well,
then someone in the middle who's

00:55:51.520 --> 00:55:54.130
intercepting all of my
requests could intercept

00:55:54.130 --> 00:55:56.740
both the ciphertext and the key.

00:55:56.740 --> 00:56:00.670
And therefore, they would be able
to decrypt the message because they

00:56:00.670 --> 00:56:03.260
have both the ciphertext and the key.

00:56:03.260 --> 00:56:07.090
Now, if I were able to go to another
person in person and exchange

00:56:07.090 --> 00:56:10.390
this secret key in secret,
well, then this scheme

00:56:10.390 --> 00:56:12.490
might work, because
we both have the key.

00:56:12.490 --> 00:56:16.360
And I didn't share the key publicly with
anyone who might intercept the message.

00:56:16.360 --> 00:56:18.970
Only I and the other person had the key.

00:56:18.970 --> 00:56:21.157
But in general, when
communicating on the internet,

00:56:21.157 --> 00:56:22.990
you're not communicating
with servers you've

00:56:22.990 --> 00:56:25.210
necessarily communicated with before.

00:56:25.210 --> 00:56:27.880
I might be trying to make
a request to a new website.

00:56:27.880 --> 00:56:32.770
And we somehow still need to agree on
a system where I can encrypt messages

00:56:32.770 --> 00:56:35.110
but only the other
person on the other side

00:56:35.110 --> 00:56:38.990
is able to decrypt
those messages instead.

00:56:38.990 --> 00:56:42.460
So this kind of cryptography--
probably not great

00:56:42.460 --> 00:56:47.300
for initially trying to create
a secure connection on the internet.

00:56:47.300 --> 00:56:49.810
And for that reason, a major
advancement in cryptography

00:56:49.810 --> 00:56:54.970
that allows for the internet to work is
this notion of public-key cryptography.

00:56:54.970 --> 00:56:56.890
In secret-key cryptography,
it's important

00:56:56.890 --> 00:57:00.280
that the key is secret because, if
the key were known by everyone, well,

00:57:00.280 --> 00:57:03.040
then anyone would be
able to decrypt messages.

00:57:03.040 --> 00:57:06.730
In public-key cryptography, we're
able to create a secure encryption

00:57:06.730 --> 00:57:09.790
system where the key is
allowed to be public,

00:57:09.790 --> 00:57:11.980
or one of the keys, as we'll soon see.

00:57:11.980 --> 00:57:16.030
And the idea here is that we're
using two keys instead of just one--

00:57:16.030 --> 00:57:20.072
that we have both a public key
and what's known as a private key.

00:57:20.072 --> 00:57:22.030
The private key-- your
private key is something

00:57:22.030 --> 00:57:25.840
you should not share with other people
to keep the encryption scheme secure.

00:57:25.840 --> 00:57:30.340
But the public key is one that
is OK to share with other people.

00:57:30.340 --> 00:57:34.150
And the distinction between the
two is that the public key will be

00:57:34.150 --> 00:57:36.640
used in order to encrypt information.

00:57:36.640 --> 00:57:40.090
And the private key will be
used to decrypt information

00:57:40.090 --> 00:57:41.870
that was encrypted by the public key.

00:57:41.870 --> 00:57:44.620
And the public key and the private
key are mathematically related.

00:57:44.620 --> 00:57:47.287
And there are a couple of ways
that we might imagine doing that.

00:57:47.287 --> 00:57:51.160
But the idea now is that, if I want
to communicate with another person,

00:57:51.160 --> 00:57:54.100
that person sends me their public key.

00:57:54.100 --> 00:57:56.890
And it's OK for the public key
to travel across the internet.

00:57:56.890 --> 00:58:01.000
Anyone is allowed to see the public
key because the public key is only

00:58:01.000 --> 00:58:03.610
used for encrypting that data.

00:58:03.610 --> 00:58:06.610
So I can then take the
plaintext and the public key

00:58:06.610 --> 00:58:11.350
and use that to generate the ciphertext,
the encrypted version of the message

00:58:11.350 --> 00:58:13.930
that I am trying to send
across the internet.

00:58:13.930 --> 00:58:16.960
And then I send the
ciphertext to the other person

00:58:16.960 --> 00:58:18.640
with whom I'm trying to communicate.

00:58:18.640 --> 00:58:24.080
And the other person now, using the
ciphertext, then uses the private key--

00:58:24.080 --> 00:58:26.800
the private key that they did
not share, and the private key

00:58:26.800 --> 00:58:29.710
that has the ability to
decrypt information that

00:58:29.710 --> 00:58:32.600
was encrypted using the public key.

00:58:32.600 --> 00:58:35.800
So using a combination of the
ciphertext and the private key,

00:58:35.800 --> 00:58:38.830
the person I'm communicating
with can decrypt that information

00:58:38.830 --> 00:58:43.070
and get back whatever the original
plaintext of that information

00:58:43.070 --> 00:58:44.360
happened to be.
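
As a rough illustration of a public-private key pair, the sketch below uses textbook RSA with tiny primes. RSA is an assumption here, since the lecture does not name a specific algorithm, and numbers this small offer no security at all. The names `encrypt` and `decrypt` are hypothetical; the sketch only shows that what the public key locks, the private key unlocks.

```python
# Toy key generation with tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)    # used only while generating the keys
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent: inverse of e mod phi (Python 3.8+)

public_key = (e, n)        # safe to send across the internet
private_key = (d, n)       # never shared


def encrypt(message: int, key: tuple) -> int:
    """Encrypt an integer smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Recover the original integer with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
```

An eavesdropper who sees `public_key` and `ciphertext` would still need `d`, which in a real system is infeasible to derive because `n` is far too large to factor.
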

00:58:44.360 --> 00:58:46.630
And so this, then, is
how we can do a lot

00:58:46.630 --> 00:58:48.430
of this communication on the internet.

00:58:48.430 --> 00:58:50.830
By using this
public-private key pair, we

00:58:50.830 --> 00:58:53.560
can say, use the public
key to do the encrypting,

00:58:53.560 --> 00:58:55.690
use the private key
to do the decrypting.

00:58:55.690 --> 00:58:58.690
And now two computers that have
never interacted with each other

00:58:58.690 --> 00:59:00.970
before, without having
the opportunity to meet,

00:59:00.970 --> 00:59:04.630
to exchange some secret information,
can use a technique like this

00:59:04.630 --> 00:59:07.060
in order to securely
communicate with each other--

00:59:07.060 --> 00:59:10.300
to send a message back and forth
without anyone in the middle

00:59:10.300 --> 00:59:15.140
being able to intercept the message
and identify what the message is about.

00:59:15.140 --> 00:59:18.310
And once you have this ability, the
ability to communicate with another party

00:59:18.310 --> 00:59:21.730
secretly, well, then you can
imagine agreeing on some secret key

00:59:21.730 --> 00:59:25.780
and then using secret-key encryption to
be able to encrypt and decrypt messages

00:59:25.780 --> 00:59:26.470
as well.

00:59:26.470 --> 00:59:28.262
And so that's an approach
that you can also

00:59:28.262 --> 00:59:31.460
take when trying to communicate with
other people across the internet.

00:59:31.460 --> 00:59:34.950
But this idea of encryption
is what allows for HTTPS,

00:59:34.950 --> 00:59:39.190
the secure version of the HTTP protocol,
to actually work to make sure that--

00:59:39.190 --> 00:59:42.690
when you are communicating with
your bank's website, for example--

00:59:42.690 --> 00:59:46.300
that someone along the way won't be
able to intercept that information

00:59:46.300 --> 00:59:48.770
and identify what it is that
you're communicating about

00:59:48.770 --> 00:59:51.090
and, instead, only has
the encrypted version

00:59:51.090 --> 00:59:55.720
of the information and a public key
with which they can encrypt information,

00:59:55.720 --> 00:59:57.850
but not a private key
that can ultimately

00:59:57.850 --> 01:00:02.150
be used in order to decrypt
information as well.

01:00:02.150 --> 01:00:05.920
And so that then is how we might allow
for this kind of secure communication

01:00:05.920 --> 01:00:09.010
on the internet and allow our
web applications to be secure.

01:00:09.010 --> 01:00:12.130
But in addition to our web applications
just listening for requests

01:00:12.130 --> 01:00:14.180
and then providing
some sort of response,

01:00:14.180 --> 01:00:17.560
our web applications were
also dealing with data.

01:00:17.560 --> 01:00:19.720
We introduced the idea
of SQL data tables

01:00:19.720 --> 01:00:22.240
where we had tables of
data with rows and columns

01:00:22.240 --> 01:00:23.950
that are representing information.

01:00:23.950 --> 01:00:26.980
And we've also created web
applications in this course where

01:00:26.980 --> 01:00:28.900
we've had applications that have users.

01:00:28.900 --> 01:00:32.940
Users sign in with a user name
and a password, for example.

01:00:32.940 --> 01:00:35.450
And so how might we
represent that information

01:00:35.450 --> 01:00:37.100
about users and their passwords?

01:00:37.100 --> 01:00:41.070
Well, one way would be to just store it
inside of a table like this.

01:00:41.070 --> 01:00:42.410
Here's a table of users.

01:00:42.410 --> 01:00:44.210
Every user has an ID.

01:00:44.210 --> 01:00:47.490
They have a user name,
and they have a password.

01:00:47.490 --> 01:00:50.750
But this turns out to be
an incredibly insecure way

01:00:50.750 --> 01:00:53.090
to store passwords--
to be storing passwords

01:00:53.090 --> 01:00:56.120
in what might be called
plaintext, just to literally store

01:00:56.120 --> 01:00:58.040
the passwords inside of a database.

01:00:58.040 --> 01:01:01.910
And we should never do this in practice
because of the security vulnerabilities

01:01:01.910 --> 01:01:03.090
associated with it.

01:01:03.090 --> 01:01:06.680
If someone were ever to get
unauthorized access to this database,

01:01:06.680 --> 01:01:10.140
they would be able to see all of
the passwords for all of the users.

01:01:10.140 --> 01:01:13.010
So if this database ever leaked
for whatever reason, suddenly

01:01:13.010 --> 01:01:14.852
all of these passwords are now known.

01:01:14.852 --> 01:01:16.310
And this kind of thing does happen.

01:01:16.310 --> 01:01:19.460
If companies are not careful about
how they represent user names

01:01:19.460 --> 01:01:22.380
and passwords inside of their
databases, and if ever there's

01:01:22.380 --> 01:01:27.040
some sort of database leak,
suddenly a whole bunch of passwords

01:01:27.040 --> 01:01:29.008
could potentially be compromised.

01:01:29.008 --> 01:01:31.300
And it's for that reason that
the recommended approach,

01:01:31.300 --> 01:01:34.060
rather than store an actual
password, is to store

01:01:34.060 --> 01:01:38.740
a hashed version of the same
password using a hash function where

01:01:38.740 --> 01:01:41.680
a hash function, in this
context, is some function that

01:01:41.680 --> 01:01:46.630
takes a password as input
and outputs some hash--

01:01:46.630 --> 01:01:49.540
some sequence of characters
and numbers, in this case--

01:01:49.540 --> 01:01:51.850
that represents that
particular password,

01:01:51.850 --> 01:01:53.650
a hashed version of the password.

01:01:53.650 --> 01:01:55.870
But the important thing
about this hash function

01:01:55.870 --> 01:01:58.120
is that it's a one-way hash function.

01:01:58.120 --> 01:02:01.750
From the password, you can get to
the sequence of letters and numbers.

01:02:01.750 --> 01:02:04.480
But it is very, very difficult
to go the other way around

01:02:04.480 --> 01:02:09.490
to use this information to figure out
what the original password actually

01:02:09.490 --> 01:02:10.240
was.

01:02:10.240 --> 01:02:12.940
And so what this means is that
the companies won't actually

01:02:12.940 --> 01:02:18.550
know what any particular user's
password is when a user tries to log in.

01:02:18.550 --> 01:02:21.760
What we'll do is take their password
that they're trying to log in with.

01:02:21.760 --> 01:02:25.090
We'll hash it and compare
that hash against the hash

01:02:25.090 --> 01:02:27.580
that we've stored in the database.

01:02:27.580 --> 01:02:31.030
If the hashes match up, that means the
user probably typed in their password

01:02:31.030 --> 01:02:33.130
correctly and, therefore,
we can sign the user in.

01:02:33.130 --> 01:02:35.830
And otherwise, that's a
sign that the user did not

01:02:35.830 --> 01:02:38.270
type their password in correctly.
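
The hash-and-compare login flow described here might be sketched as follows. This assumes Python's standard `hashlib.pbkdf2_hmac` as the hash function and adds a per-user random salt, which the lecture does not mention but which is common practice. The record layout and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str, salt: bytes) -> bytes:
    """One-way hash: easy to compute, very hard to reverse."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_record(password: str) -> dict:
    """Build what the database would store: a salt and a hash, never the password."""
    salt = secrets.token_bytes(16)
    return {"salt": salt, "hash": hash_password(password, salt)}


def check_password(record: dict, attempt: str) -> bool:
    """Hash the login attempt and compare it with the stored hash."""
    candidate = hash_password(attempt, record["salt"])
    # compare_digest takes the same time whether or not the hashes match early on
    return hmac.compare_digest(candidate, record["hash"])


record = create_record("12345")
```

Calling `check_password(record, "12345")` succeeds and any other attempt fails, yet nothing in `record` reveals the original password.
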

01:02:38.270 --> 01:02:40.330
So this, then, is the
reason why companies--

01:02:40.330 --> 01:02:42.670
if they're obeying these
best practices-- usually

01:02:42.670 --> 01:02:44.740
can't tell you what
your password actually

01:02:44.740 --> 01:02:46.810
is if you forget your password.

01:02:46.810 --> 01:02:49.930
If you forget your password, the company
will let you reset your password.

01:02:49.930 --> 01:02:52.242
They can update the data
inside of the table.

01:02:52.242 --> 01:02:53.950
But the company won't
be able to tell you

01:02:53.950 --> 01:02:57.760
what your password actually is because
the company doesn't know your password.

01:02:57.760 --> 01:03:00.460
The company only knows
some hashed version

01:03:00.460 --> 01:03:04.970
of the password, some result of passing
that password through a hash function.

01:03:04.970 --> 01:03:07.870
And as a result, they're
able to know whether you

01:03:07.870 --> 01:03:10.600
logged in successfully or not
with the correct credentials

01:03:10.600 --> 01:03:14.000
without actually knowing what
your password actually is.

01:03:14.000 --> 01:03:15.940
And so this is another
area where you might

01:03:15.940 --> 01:03:19.270
imagine that, if you're not careful
about how you're storing this data,

01:03:19.270 --> 01:03:22.360
it could be a security
vulnerability inside of your program

01:03:22.360 --> 01:03:26.220
where, if ever that data is leaked,
passwords suddenly become known.

01:03:26.220 --> 01:03:29.890
And there are other more subtle ways
that web applications could potentially

01:03:29.890 --> 01:03:32.410
leak information that
you, as the web developer,

01:03:32.410 --> 01:03:34.330
need to decide if you're OK with or not.

01:03:34.330 --> 01:03:37.570
Imagine a website, for example, where
you do have a place where you can say,

01:03:37.570 --> 01:03:39.700
if you forgot your
password, you can be sent

01:03:39.700 --> 01:03:43.173
to a place where you can reset
your password, for example.

01:03:43.173 --> 01:03:46.090
You might imagine that, if you type
in your email address, click Reset

01:03:46.090 --> 01:03:49.270
Password, you might get a message
like, all right, password reset email

01:03:49.270 --> 01:03:50.530
has been sent.

01:03:50.530 --> 01:03:54.070
But you might imagine typing in an email
address and getting something like,

01:03:54.070 --> 01:03:57.400
error, there is no user
with that email address.

01:03:57.400 --> 01:04:00.250
And here, again, is a potential
security vulnerability

01:04:00.250 --> 01:04:02.320
in terms of leaked information.

01:04:02.320 --> 01:04:06.340
This page that just seems to send you
an email if you forgot your password is

01:04:06.340 --> 01:04:10.720
now leaking information about which
users happened to have accounts

01:04:10.720 --> 01:04:14.140
on your website and which users do
not because all someone needs to do

01:04:14.140 --> 01:04:18.100
is type in an email address and find out
whether it results in an error or not

01:04:18.100 --> 01:04:22.310
in order to know whether a user happens
to have an account on the website

01:04:22.310 --> 01:04:22.810
or not.

01:04:22.810 --> 01:04:24.685
And maybe that's not a
big deal if that's not

01:04:24.685 --> 01:04:26.170
something you care about securing.

01:04:26.170 --> 01:04:30.160
But if it's a website where
you do care about making sure

01:04:30.160 --> 01:04:32.650
that, if someone has an account
or doesn't have an account,

01:04:32.650 --> 01:04:35.350
that information is kept private
and secure only to the user,

01:04:35.350 --> 01:04:37.630
unless they want to share
it, well, then this type

01:04:37.630 --> 01:04:40.570
of page, this type of
interface with the database

01:04:40.570 --> 01:04:43.570
could potentially be leaking
that kind of information.
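
One way to avoid this particular leak is to give the same response whether or not the account exists. The sketch below assumes a hypothetical in-memory set of emails standing in for the users table and a stub in place of real email delivery.

```python
# Hypothetical stand-ins for a users table and an email service.
REGISTERED_EMAILS = {"harry@example.com"}
SENT_EMAILS = []

RESET_MESSAGE = "If an account exists for that address, a reset email has been sent."


def send_reset_email(email: str) -> None:
    """Stub: a real application would send an actual email here."""
    SENT_EMAILS.append(email)


def request_password_reset(email: str) -> str:
    """Only email registered users, but answer everyone identically."""
    if email in REGISTERED_EMAILS:
        send_reset_email(email)
    return RESET_MESSAGE
```

Because the visible response never changes, typing in an address no longer tells a visitor whether that address has an account. The response time could still differ between the two cases, so this closes only the most obvious channel.
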

01:04:43.570 --> 01:04:46.120
And information can be leaked
in all sorts of different ways.

01:04:46.120 --> 01:04:48.700
You can even leak information
just based on the time

01:04:48.700 --> 01:04:52.780
it takes for the database to be able
to respond to a particular request.

01:04:52.780 --> 01:04:55.450
You might imagine, if you
make a request about a user,

01:04:55.450 --> 01:04:58.180
and it takes longer to
respond, that might tell you

01:04:58.180 --> 01:05:01.150
something about the number of
database queries it needs to run

01:05:01.150 --> 01:05:04.210
or the amount of information that's
stored about that user as opposed

01:05:04.210 --> 01:05:06.200
to if a request takes less time.

01:05:06.200 --> 01:05:09.850
So even something like how many
milliseconds it takes for a web server

01:05:09.850 --> 01:05:13.780
to respond to a request can
reveal or leak information

01:05:13.780 --> 01:05:16.720
about the data that is stored
inside of the database.

01:05:16.720 --> 01:05:19.750
And there have been examples of
researchers who actually try and see

01:05:19.750 --> 01:05:23.702
what information they can get just from
looking at this kind of information.

01:05:23.702 --> 01:05:25.660
It doesn't seem like it
would leak information,

01:05:25.660 --> 01:05:29.580
but it might actually
reveal information as well.

01:05:29.580 --> 01:05:32.740
Now, another concern when dealing with
SQL and databases we've talked about

01:05:32.740 --> 01:05:34.707
is the concept of SQL injection--

01:05:34.707 --> 01:05:36.790
this threat where, if
you're not careful about how

01:05:36.790 --> 01:05:40.090
it is that you run your SQL
code, you could inadvertently

01:05:40.090 --> 01:05:43.390
end up executing code that you
don't mean to be executing.

01:05:43.390 --> 01:05:46.390
Situations like here, where we have
a username and password field.

01:05:46.390 --> 01:05:48.010
We've seen this example before--

01:05:48.010 --> 01:05:50.620
where, if a user tries to log
in, you might imagine a query

01:05:50.620 --> 01:05:53.200
like this is run selecting
from the users table

01:05:53.200 --> 01:05:57.190
where user name equals whatever was
typed in as the user name and password

01:05:57.190 --> 01:05:59.800
equals whatever was
typed in as the password.

01:05:59.800 --> 01:06:04.200
And we saw how, for a normal user--
someone who types in harry and 12345

01:06:04.200 --> 01:06:06.970
as their username and password--

01:06:06.970 --> 01:06:09.380
that this type of query works just fine.

01:06:09.380 --> 01:06:11.890
But if a hacker tries
to log into a website

01:06:11.890 --> 01:06:15.520
and maybe includes a double
quotation mark and two hyphens,

01:06:15.520 --> 01:06:18.640
for example, where two
hyphens mean a comment in SQL,

01:06:18.640 --> 01:06:22.760
and we were to literally substitute
these values into our SQL queries,

01:06:22.760 --> 01:06:27.010
well, then you might end up
substituting hacker"--,

01:06:27.010 --> 01:06:30.100
creating a comment that
ignores the rest of this query,

01:06:30.100 --> 01:06:33.640
effectively ignoring any kind of
password checking that we might

01:06:33.640 --> 01:06:35.560
want our web application to be doing.

01:06:35.560 --> 01:06:37.390
So this, too-- another
vulnerability that

01:06:37.390 --> 01:06:40.570
comes about whenever we're
dealing with executing

01:06:40.570 --> 01:06:42.520
SQL code inside of a database.

01:06:42.520 --> 01:06:44.860
And in order to deal with
this, we want to make sure

01:06:44.860 --> 01:06:48.640
that we're escaping any of these
potentially dangerous characters that

01:06:48.640 --> 01:06:50.710
might show up inside of our SQL queries.

01:06:50.710 --> 01:06:52.870
And Django's models do this for us.

01:06:52.870 --> 01:06:56.980
When we do these kinds of queries
using Django, saying .objects.filter,

01:06:56.980 --> 01:07:00.580
to be able to filter out for only
certain versions of a particular model,

01:07:00.580 --> 01:07:04.330
it is going to take care of the process
of making sure that it's not subject

01:07:04.330 --> 01:07:06.770
to these kinds of SQL injection attacks.

01:07:06.770 --> 01:07:09.340
But if ever you're writing a
web application that is directly

01:07:09.340 --> 01:07:12.070
executing SQL code, which
you might imagine doing,

01:07:12.070 --> 01:07:14.080
you do want to be
careful about making sure

01:07:14.080 --> 01:07:16.240
that you're not exposing
the application to be

01:07:16.240 --> 01:07:20.070
vulnerable to these
kinds of threats as well.
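
For code that does run SQL directly, the difference between substituting values into the query text and passing them as parameters can be sketched with Python's built-in `sqlite3` module. The table contents and function names are hypothetical, the passwords are stored in plaintext only to mirror the query from the lecture, and the sketch quotes strings with single quotes, so the injected input is `harry'--` rather than the double-quote form.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one example user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("harry", "12345"))
    return conn


def vulnerable_login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """UNSAFE: user input is pasted straight into the SQL text."""
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None


def safe_login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Safer: placeholders keep user input as data, never as SQL code."""
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None


conn = make_db()
attack = "harry'--"  # closes the string, then comments out the password check
```

With this input, `vulnerable_login(conn, attack, "")` logs in without a password, while `safe_login(conn, attack, "")` simply finds no user with that literal name.
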

01:07:20.070 --> 01:07:21.920
So those, then, are potential
threats that come

01:07:21.920 --> 01:07:24.935
about when we're just talking about
what's happening on the server.

01:07:24.935 --> 01:07:26.810
But we also can think
about what might happen

01:07:26.810 --> 01:07:28.700
when we're interacting
with other servers--

01:07:28.700 --> 01:07:31.380
when we're interacting
with APIs, for example.

01:07:31.380 --> 01:07:33.770
So we talked about JavaScript
and using JavaScript

01:07:33.770 --> 01:07:37.400
to be able to make additional requests
to APIs or to other services that

01:07:37.400 --> 01:07:40.302
are able to return back with
certain types of information.

01:07:40.302 --> 01:07:42.260
And with APIs, there are
a number of techniques

01:07:42.260 --> 01:07:46.040
that we can use in APIs to allow
them to be more scalable, to allow

01:07:46.040 --> 01:07:48.290
them to be more secure.

01:07:48.290 --> 01:07:50.780
One is this notion of
rate limiting where

01:07:50.780 --> 01:07:52.940
we might want to make
sure that no user is

01:07:52.940 --> 01:07:56.480
able to make more than a certain
number of requests to an API

01:07:56.480 --> 01:07:59.000
in any particular amount of time.

01:07:59.000 --> 01:08:01.130
This is in response to
a security threat that

01:08:01.130 --> 01:08:03.440
has to do with the
scalability of a system, which

01:08:03.440 --> 01:08:06.560
is known as a DoS, or denial-of-service,
attack where,

01:08:06.560 --> 01:08:09.920
effectively, if you just make a whole
bunch of requests to a single server

01:08:09.920 --> 01:08:13.543
over, and over, and over again, you
could potentially shut down that system

01:08:13.543 --> 01:08:15.710
because you're making so
many requests that it's not

01:08:15.710 --> 01:08:19.050
able to handle that many
requests all at the same time.

01:08:19.050 --> 01:08:22.310
And for that reason, because it's
so easy to make an API request--

01:08:22.310 --> 01:08:27.170
you can do so using just a single line
of Python or JavaScript, for example--

01:08:27.170 --> 01:08:29.840
APIs will often institute
some kind of rate

01:08:29.840 --> 01:08:32.960
limiting to limit the number of
requests you can make so that you're not

01:08:32.960 --> 01:08:35.630
going to overwhelm the server
or overwhelm the database that

01:08:35.630 --> 01:08:39.080
needs to be queried in order
to respond to those requests.

01:08:39.080 --> 01:08:42.229
And so this kind of
limiting might work as well.
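
A rate limiter of this kind might look roughly like the sketch below, which assumes a sliding-window policy kept in memory for a single server process. The class name, the limits, and the injectable clock are all hypothetical choices; a deployment with several servers would typically need shared storage instead.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_requests` per user within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                   # injectable so it can be tested
        self._history = defaultdict(deque)   # user id -> recent request times

    def allow(self, user_id: str) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = self.clock()
        history = self._history[user_id]
        # Forget requests that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True
```

A view would call `allow(api_key)` before doing any database work and respond with an error, commonly HTTP 429, whenever it returns `False`.
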

01:08:42.229 --> 01:08:45.800
APIs might also want to add some
kind of route authentication.

01:08:45.800 --> 01:08:49.527
You might not want everybody to
access the same data via an API.

01:08:49.527 --> 01:08:51.319
Maybe there's some sort
of permission model

01:08:51.319 --> 01:08:54.800
where only certain users are able
to access certain pieces of data

01:08:54.800 --> 01:08:55.880
from the API.

01:08:55.880 --> 01:09:00.290
So you might imagine that a user needs
to have an API key, for example--

01:09:00.290 --> 01:09:03.830
effectively, a password that
they need to pass around anytime

01:09:03.830 --> 01:09:06.710
they're making an API
request to your API

01:09:06.710 --> 01:09:09.140
and that allows you to then
be able to look at that key

01:09:09.140 --> 01:09:12.390
and verify that they are
who they say they are.

01:09:12.390 --> 01:09:16.010
Now, with those API keys comes other
potential security vulnerabilities

01:09:16.010 --> 01:09:17.090
to be mindful of.

01:09:17.090 --> 01:09:21.290
One is that, just as you should never be
putting passwords inside of your source

01:09:21.290 --> 01:09:23.899
code-- inside of your Git
repository, for example--

01:09:23.899 --> 01:09:27.290
you likewise generally shouldn't
be putting your API keys

01:09:27.290 --> 01:09:31.700
inside of your web applications as well,
inside of the source code of those web

01:09:31.700 --> 01:09:34.069
applications, because
then anyone who has access

01:09:34.069 --> 01:09:36.020
to the source code for
the web application

01:09:36.020 --> 01:09:38.960
can see what your API
key is, could then use

01:09:38.960 --> 01:09:42.439
the API key to pretend to be
you and, therefore, get access

01:09:42.439 --> 01:09:46.609
to potential API routes that they
should not be able to access.

01:09:46.609 --> 01:09:50.930
One common solution to this is to use
what are known as environment variables

01:09:50.930 --> 01:09:55.190
where, effectively, you in your
program say that your API key is not

01:09:55.190 --> 01:09:59.220
going to be some predetermined string
that is in the text of your program

01:09:59.220 --> 01:10:03.170
but instead is going to be drawn from
the environment in which the program is

01:10:03.170 --> 01:10:04.040
being run.

01:10:04.040 --> 01:10:07.430
And then, on the server, when
you're running the web application,

01:10:07.430 --> 01:10:11.000
you'll first make sure the server has
all of those environment variables set

01:10:11.000 --> 01:10:16.400
correctly so that, rather than have
the API key actually in the source

01:10:16.400 --> 01:10:20.570
code of the program, the API key is
simply in the environment on the server

01:10:20.570 --> 01:10:22.340
where the web application is running.

01:10:22.340 --> 01:10:25.370
And the server can just draw that
information from the environment

01:10:25.370 --> 01:10:29.720
so that it knows what the API
key should be without the API key

01:10:29.720 --> 01:10:34.590
actually having to be inside of the
web application source code itself.
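
Reading a key from the environment can be sketched in a few lines of Python. The variable name `API_KEY` and the helper name are assumptions for illustration; the only real API used is `os.environ`.

```python
import os


def get_api_key(name: str = "API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(name)
    if not key:
        # Fail loudly at startup rather than sending requests without a key.
        raise RuntimeError(f"environment variable {name} is not set")
    return key


# On the server, the variable is set outside the source code, for example:
#   export API_KEY="..."
# and the application then calls get_api_key() when it needs the value.
```

Because the key lives only in the server's environment, the source code can be committed to a Git repository without exposing it.
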

01:10:34.590 --> 01:10:36.470
And so as we begin to
deal with APIs, you

01:10:36.470 --> 01:10:40.070
might notice that many APIs will
require you to have an API key.

01:10:40.070 --> 01:10:42.170
And often, it's for
these sorts of reasons--

01:10:42.170 --> 01:10:45.310
to make sure that we're able to
authenticate users effectively

01:10:45.310 --> 01:10:48.560
and also to make sure that we're able
to limit users to make sure that they're

01:10:48.560 --> 01:10:51.140
not making too many
requests to the server

01:10:51.140 --> 01:10:54.170
or to the database at
any particular time.

01:10:54.170 --> 01:10:57.440
But this, then, starts to get us into
other potential vulnerabilities--

01:10:57.440 --> 01:11:00.470
in particular, vulnerabilities
concerning JavaScript.

01:11:00.470 --> 01:11:02.600
JavaScript, again, is
a programming language

01:11:02.600 --> 01:11:05.840
that we use in order to write code
that runs inside of our web browser--

01:11:05.840 --> 01:11:08.730
a browser like Chrome, or
Safari, or something like that.

01:11:08.730 --> 01:11:14.210
And as a result, JavaScript has a lot of
power to manipulate things on the page.

01:11:14.210 --> 01:11:16.220
It can simulate the clicking of buttons.

01:11:16.220 --> 01:11:20.120
It can change the content of what
happens to be on any particular page.

01:11:20.120 --> 01:11:22.370
And as a result, there are
many, many vulnerabilities

01:11:22.370 --> 01:11:26.750
that come about when it comes
to thinking about JavaScript.

01:11:26.750 --> 01:11:30.750
And one such vulnerability is this
notion of cross-site scripting--

01:11:30.750 --> 01:11:33.380
that, in general, when
on your web application,

01:11:33.380 --> 01:11:37.760
you only want JavaScript to run
if you, yourself have written it.

01:11:37.760 --> 01:11:39.830
Cross-site scripting
is a potential threat

01:11:39.830 --> 01:11:45.050
where someone else might be able to get
JavaScript code to run on your website

01:11:45.050 --> 01:11:48.890
when it's JavaScript code that someone
else wrote instead of you, yourself.

01:11:48.890 --> 01:11:51.710
And this is a potential vulnerability
because, if someone else can

01:11:51.710 --> 01:11:55.280
write the JavaScript code, they
can manipulate the contents of what

01:11:55.280 --> 01:11:56.830
happens to be on your website.

01:11:56.830 --> 01:11:59.300
They can potentially
manipulate the user experience

01:11:59.300 --> 01:12:02.260
to get a result that is
not, actually, desired.

01:12:02.260 --> 01:12:06.860
So let's go ahead and take a look at
one example of cross-site scripting.

01:12:06.860 --> 01:12:09.770
All right, so I've prepared a
web application in advance--

01:12:09.770 --> 01:12:14.900
it's called security-- inside of which
is a single Django app called XSS,

01:12:14.900 --> 01:12:16.590
for Cross-Site Scripting.

01:12:16.590 --> 01:12:19.670
And inside of here, we'll
first take a look at the URLs.

01:12:19.670 --> 01:12:24.290
So there's a single URL that just
allows us to provide any path.

01:12:24.290 --> 01:12:27.330
And then it's going to
load the Index view.

01:12:27.330 --> 01:12:31.910
And on the Index view, we're
going to display an HTTP response.

01:12:31.910 --> 01:12:35.210
It says, here was the path that
just happened to be requested.

01:12:35.210 --> 01:12:37.910
So you might imagine this is
a simplified version of what

01:12:37.910 --> 01:12:41.240
you might see on other websites, for
example, where websites might show you

01:12:41.240 --> 01:12:45.170
on any particular page what path
you're on in order to get to that page,

01:12:45.170 --> 01:12:49.610
some indication of where you are
inside of this web application.

01:12:49.610 --> 01:12:53.150
So I'll go ahead and cd into
security and run the server--

01:12:53.150 --> 01:12:57.640
python manage.py runserver.

01:12:57.640 --> 01:12:59.320
So I am now running the server.

01:12:59.320 --> 01:13:06.420
And now I'll go ahead and go into my
web application, /hello, for example.

01:13:06.420 --> 01:13:09.570
And so what I see here is
the requested path hello,

01:13:09.570 --> 01:13:11.230
which is what I would expect it to be.

01:13:11.230 --> 01:13:13.960
I can change it to
something else, like hi.

01:13:13.960 --> 01:13:15.270
So here's requested path hi.

01:13:15.270 --> 01:13:17.760
Here's hi/2, for example.

01:13:17.760 --> 01:13:20.430
Whatever page I visit,
it gives me a page

01:13:20.430 --> 01:13:23.190
that says, requested
path, and then whatever

01:13:23.190 --> 01:13:25.770
path I happened to be visiting.

01:13:25.770 --> 01:13:29.520
But watch what happens if I
try and visit this URL instead.

01:13:29.520 --> 01:13:39.600
I'm going to visit URL /script
alert hi, and then end script.

01:13:39.600 --> 01:13:40.650
So I run it.

01:13:40.650 --> 01:13:44.990
And suddenly, an alert shows
up on my page that says, hi.

01:13:44.990 --> 01:13:45.850
And I press OK.

01:13:45.850 --> 01:13:47.790
And it says, all right, requested path.

01:13:47.790 --> 01:13:49.680
That alert was a JavaScript alert.

01:13:49.680 --> 01:13:53.250
It was JavaScript code
running on my web application.

01:13:53.250 --> 01:13:56.940
But it was not code that was JavaScript
code inside of my web application.

01:13:56.940 --> 01:14:00.150
It was someone else who
wrote based on the URL

01:14:00.150 --> 01:14:03.780
to run particular JavaScript
on my particular page.

01:14:03.780 --> 01:14:06.120
And so someone linked
to my web application

01:14:06.120 --> 01:14:09.000
and passed in this script
tag as part of the URL.

01:14:09.000 --> 01:14:12.840
Someone who clicked on that link might
have been taken to my web application

01:14:12.840 --> 01:14:17.630
but ultimately had JavaScript run
that was created by someone else.
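
To make that concrete, here's a minimal sketch of a view like the one in the demo. It's plain Python rather than the actual Django code from the lecture; the function name `index` and the HTML wrapper around the text are assumptions for illustration. What matters is that the path goes into the response unescaped.

```python
def index(path: str) -> str:
    """Hypothetical stand-in for the Index view: echoes the path into HTML as-is."""
    return f"<html><body>Requested Path: {path}</body></html>"


# An ordinary path is displayed as text.
print(index("hello"))

# A path containing a script tag comes back as a real script tag,
# which a browser would go ahead and execute.
print(index("<script>alert('hi')</script>"))
```

The second response contains the script tag exactly as it was supplied in the URL, which is why the alert appears.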

01:14:17.630 --> 01:14:19.980
And that, ultimately, is
potentially dangerous.

01:14:19.980 --> 01:14:22.440
It leaves open the
possibility that someone else

01:14:22.440 --> 01:14:24.990
could run JavaScript code on my page.

01:14:24.990 --> 01:14:27.300
And it might not just be
something like a script.

01:14:27.300 --> 01:14:29.940
You might imagine someone
not just displaying an alert,

01:14:29.940 --> 01:14:33.720
but modifying something inside of the
DOM-- changing the contents of the web

01:14:33.720 --> 01:14:36.960
page, making API requests,
doing other types of tasks

01:14:36.960 --> 01:14:39.870
that you can do using JavaScript
inside of a web browser

01:14:39.870 --> 01:14:44.580
that, ultimately, leave my page open
to potential security vulnerabilities.

01:14:44.580 --> 01:14:47.580
And so these are cases where it's
important to be mindful of when you're

01:14:47.580 --> 01:14:51.720
designing these pages, if ever there is
a possibility that someone could inject

01:14:51.720 --> 01:14:54.630
their own JavaScript
into your page somehow,

01:14:54.630 --> 01:14:57.780
you'll want to either detect
that or escape it in some way.

01:14:57.780 --> 01:15:02.025
Or take other precautions to make sure
that this kind of cross-site scripting

01:15:02.025 --> 01:15:03.150
isn't going to be possible.

01:15:03.150 --> 01:15:06.240
You might imagine that, in a
messaging application-- for example,

01:15:06.240 --> 01:15:07.740
if you're messaging back and forth--

01:15:07.740 --> 01:15:10.282
you don't want it to be the case
that, if you message someone

01:15:10.282 --> 01:15:13.260
else some JavaScript code
that, when they receive it,

01:15:13.260 --> 01:15:16.380
that code actually ends up
running as some JavaScript that

01:15:16.380 --> 01:15:18.210
runs on that particular page.

01:15:18.210 --> 01:15:20.450
You want to be sure to
escape that information so

01:15:20.450 --> 01:15:22.830
that they just see the
text of the JavaScript code

01:15:22.830 --> 01:15:25.430
but that the code isn't
actually executed.
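
One way to do that escaping, as a rough sketch, is Python's standard `html.escape` function. The function name `index_escaped` and the HTML wrapper are assumptions for illustration, not code from the lecture.

```python
from html import escape


def index_escaped(path: str) -> str:
    """Hypothetical view that escapes the path before placing it into HTML."""
    return f"<html><body>Requested Path: {escape(path)}</body></html>"


page = index_escaped("<script>alert('hi')</script>")

# The angle brackets are now HTML entities, so the browser shows the
# text of the script instead of running it.
print(page)
```

Django templates escape variables like this by default, so rendering user input through a template rather than building the HTML by hand is usually the safer choice.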

01:15:25.430 --> 01:15:28.140
And this is a similar threat to
that threat of SQL injection.

01:15:28.140 --> 01:15:30.480
It all comes back to
the idea of not wanting

01:15:30.480 --> 01:15:33.120
to allow someone else
to be able to inject

01:15:33.120 --> 01:15:35.280
their own code into your program.

01:15:35.280 --> 01:15:39.540
You don't want someone else to be able
to inject SQL code into the queries you

01:15:39.540 --> 01:15:40.770
run on your database.

01:15:40.770 --> 01:15:44.640
And you don't want someone to be able
to inject JavaScript code into your web

01:15:44.640 --> 01:15:49.850
page because that leaves open potential
security vulnerabilities as well.

01:15:49.850 --> 01:15:51.882
One type of security
vulnerability that Django

01:15:51.882 --> 01:15:54.590
is quite good at defending against
is one that we've seen before,

01:15:54.590 --> 01:15:57.470
but we'll explore in more
detail how it might work.

01:15:57.470 --> 01:16:00.530
And it's this idea of
cross-site request forgery where

01:16:00.530 --> 01:16:05.270
you fake a request to a website when
you didn't intend to actually make

01:16:05.270 --> 01:16:07.020
a request to that website.

01:16:07.020 --> 01:16:10.830
So you might imagine that,
if your bank, for example,

01:16:10.830 --> 01:16:12.982
had a URL that allowed
you to transfer money

01:16:12.982 --> 01:16:14.690
from one person to
another person-- we've

01:16:14.690 --> 01:16:16.430
talked about this idea a little bit.

01:16:16.430 --> 01:16:20.480
But imagine now how you could implement
this if it really was just a URL.

01:16:20.480 --> 01:16:24.740
You could go to /transfer
and say, as GET parameters,

01:16:24.740 --> 01:16:26.060
who am I transferring money to?

01:16:26.060 --> 01:16:27.950
And what is the amount
that I'm transferring?

01:16:27.950 --> 01:16:32.120
Then someone else on some other website
could, in the body of their page,

01:16:32.120 --> 01:16:35.270
just have a link where
that link says, click here.

01:16:35.270 --> 01:16:37.460
And it links to yourbank.com, or whatever
bank.com, or whatever

01:16:37.460 --> 01:16:41.390
your bank is, transferring
money to me in this amount.

01:16:41.390 --> 01:16:44.720
And if some user unknowingly just
clicked on that link not knowing

01:16:44.720 --> 01:16:46.640
where it would take
them, this website might

01:16:46.640 --> 01:16:49.640
be able to forge a
request to the bank-- make

01:16:49.640 --> 01:16:52.070
it seem like the user
had gone to the bank

01:16:52.070 --> 01:16:54.350
and tried to initiate
some kind of transfer

01:16:54.350 --> 01:16:56.360
and, ultimately, tried
to transfer money.

01:16:56.360 --> 01:16:59.330
And it doesn't even necessarily
need to be in a link.

01:16:59.330 --> 01:17:03.230
How else might you get some new request
to happen inside of the web browser?

01:17:03.230 --> 01:17:05.690
You might imagine-- though
it might seem a bit strange--

01:17:05.690 --> 01:17:08.450
to put this inside of an image.

01:17:08.450 --> 01:17:13.250
Image source, the source of the
image, is this particular URL--

01:17:13.250 --> 01:17:14.493
the bank's transfer page.

01:17:14.493 --> 01:17:16.160
Now, that doesn't really make any sense.

01:17:16.160 --> 01:17:17.840
The transfer page is not an image.

01:17:17.840 --> 01:17:19.340
But it doesn't matter.

01:17:19.340 --> 01:17:24.590
All an image tag is going to do is try
to make a request to this source URL

01:17:24.590 --> 01:17:28.527
to get that image and then try to
display it in the user's web browser.

01:17:28.527 --> 01:17:31.610
But the first part is what's important--
the fact that this source ends up

01:17:31.610 --> 01:17:33.650
being requested by the web browser.

01:17:33.650 --> 01:17:36.380
Without the user having to
click on or do anything,

01:17:36.380 --> 01:17:40.850
they might try and request from
yourbank.com/transfer this particular

01:17:40.850 --> 01:17:45.500
request, which might initiate some sort
of bank transfer without the user even

01:17:45.500 --> 01:17:46.580
realizing it.
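
Here's a small sketch of what the adversary's link and image might look like, built in Python so we can see the exact URL. The parameter names `to` and `amount` follow the lecture; the `https` scheme, the recipient `mallory`, and the amount are assumptions for illustration.

```python
from html import escape
from urllib.parse import urlencode

# Placeholder bank URL from the lecture; the https scheme is an assumption.
BANK_TRANSFER_URL = "https://yourbank.com/transfer"


def forged_transfer_url(to: str, amount: int) -> str:
    """Build a transfer URL that carries the recipient and amount as GET parameters."""
    query = urlencode({"to": to, "amount": amount})
    return f"{BANK_TRANSFER_URL}?{query}"


# Hypothetical recipient and amount chosen by the adversary.
url = forged_transfer_url("mallory", 100)

# A link needs a click; an image is requested as soon as the page loads.
link = f'<a href="{escape(url)}">Click Here!</a>'
image = f'<img src="{escape(url)}">'

print(url)
print(link)
print(image)
```

Either piece of HTML could sit on any other website. The browser makes the request to the bank no matter which page the tag lives on.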

01:17:46.580 --> 01:17:49.160
And it's for that reason that
we generally suggest that,

01:17:49.160 --> 01:17:54.560
anytime you're creating a website that
is going to allow for the manipulation

01:17:54.560 --> 01:17:57.500
of some kind of state-- that
allows for some change to happen,

01:17:57.500 --> 01:17:59.210
something like transferring money--

01:17:59.210 --> 01:18:02.450
you don't want that to be a GET
request, something that you could just

01:18:02.450 --> 01:18:06.515
load in an image or load by clicking on
a link that takes you to another page.

01:18:06.515 --> 01:18:08.390
You don't want that to
happen because then it

01:18:08.390 --> 01:18:12.350
makes it very easy for someone
else to fake a request to your page

01:18:12.350 --> 01:18:16.790
by just creating an image or
linking to, somehow, a website,

01:18:16.790 --> 01:18:20.005
transferring funds from
one user to another.

01:18:20.005 --> 01:18:22.130
So a solution to this--
and we've talked about it--

01:18:22.130 --> 01:18:24.920
is that, generally, we
only want POST requests

01:18:24.920 --> 01:18:27.860
to be able to manipulate
something inside of the database,

01:18:27.860 --> 01:18:32.330
to be able to actually initiate a
transfer from one user to another user.
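
As a rough sketch of that rule, a transfer handler can refuse anything that isn't a POST. This is plain Python, not Django; the function name `transfer`, its arguments, and the response text are assumptions for illustration.

```python
from typing import Dict, Tuple


def transfer(method: str, params: Dict[str, str]) -> Tuple[int, str]:
    """Hypothetical handler: only a POST request is allowed to change state."""
    if method != "POST":
        # A link or an image tag can only produce a GET, so it stops here.
        return 405, "Method Not Allowed"
    return 200, f"Transferred {params['amount']} to {params['to']}"


print(transfer("GET", {"to": "mallory", "amount": "100"}))
print(transfer("POST", {"to": "alice", "amount": "10"}))
```

In a Django view, the same check is typically written against `request.method`.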

01:18:32.330 --> 01:18:35.210
But even then, this is
not perfectly secure.

01:18:35.210 --> 01:18:38.660
You could still be tricked
into submitting a POST request.

01:18:38.660 --> 01:18:42.320
Imagine an adversarial website
that had a form like this--

01:18:42.320 --> 01:18:47.120
a form whose action was
yourbank.com/transfer and whose method was

01:18:47.120 --> 01:18:48.200
POST.

01:18:48.200 --> 01:18:52.370
And now here-- two input fields
whose type is hidden, meaning you

01:18:52.370 --> 01:18:55.040
won't actually be able to
see those input fields when

01:18:55.040 --> 01:18:56.420
the user is looking at the page.

01:18:56.420 --> 01:18:59.090
They'd only know about it
if they inspected the source

01:18:59.090 --> 01:19:03.120
code of this particular HTML page.

01:19:03.120 --> 01:19:05.550
Here, there's a hidden
input whose name is to,

01:19:05.550 --> 01:19:07.840
meaning the person I'd
like to transfer money to.

01:19:07.840 --> 01:19:10.470
Here is the amount, the value
that I would like to transfer.

01:19:10.470 --> 01:19:14.153
And all the user is going to see
is a button that says, click here.

01:19:14.153 --> 01:19:17.320
They're not going to see either of the
input fields, because they're hidden.

01:19:17.320 --> 01:19:19.740
But if they do click the
Click Here button, well, then

01:19:19.740 --> 01:19:22.950
suddenly they're going to be
submitting a POST request to the bank

01:19:22.950 --> 01:19:25.525
and initiating some transfer
when they didn't intend to.

01:19:25.525 --> 01:19:28.650
Now, maybe this seems like, oh, it's
not a big deal, because the user still

01:19:28.650 --> 01:19:29.850
needs to click a button.

01:19:29.850 --> 01:19:31.767
And the user shouldn't
be clicking on a button

01:19:31.767 --> 01:19:33.990
if they don't know what
the button is going to do.

01:19:33.990 --> 01:19:38.280
Well, for one, it's probably reasonable
to imagine that an adversary might

01:19:38.280 --> 01:19:41.010
embed this button inside of
a page where it looks totally

01:19:41.010 --> 01:19:42.820
safe to be able to click on a button.

01:19:42.820 --> 01:19:45.960
But moreover, the user doesn't
even need to click on it in order

01:19:45.960 --> 01:19:47.010
to submit the form.

01:19:47.010 --> 01:19:49.170
We can just add a little
bit of JavaScript.

01:19:49.170 --> 01:19:52.710
You might imagine that an adversary
could do something like this.

01:19:52.710 --> 01:19:55.560
Add an onload attribute
to the body that says,

01:19:55.560 --> 01:19:59.250
when the body of the page is done
loading, go to document.forms--

01:19:59.250 --> 01:20:01.680
meaning all of the
forms for this web page.

01:20:01.680 --> 01:20:04.590
Get the first one, and submit it.

01:20:04.590 --> 01:20:06.320
Submit the form.

01:20:06.320 --> 01:20:09.450
And what that's going to do is, even
without the user doing anything--

01:20:09.450 --> 01:20:12.330
even without the user clicking
on the Click Here button--

01:20:12.330 --> 01:20:15.420
as soon as this page is loaded,
this form is going to submit,

01:20:15.420 --> 01:20:19.050
submitting a POST request to the
bank, and attempting to transfer funds

01:20:19.050 --> 01:20:21.120
from one user to another user.
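
Here's an approximate reconstruction of that adversarial page, held in a Python string so we can inspect it the way a curious user might view the page source. The field names `to` and `amount` follow the lecture; the `https` scheme, the values, and the `HiddenInputFinder` helper are assumptions for illustration.

```python
from html.parser import HTMLParser

# Approximation of the adversary's page: the form submits itself on load.
ATTACK_PAGE = """
<body onload="document.forms[0].submit()">
    <form action="https://yourbank.com/transfer" method="post">
        <input type="hidden" name="to" value="mallory">
        <input type="hidden" name="amount" value="100">
        <input type="submit" value="Click Here!">
    </form>
</body>
"""


class HiddenInputFinder(HTMLParser):
    """Hypothetical helper that collects the hidden inputs in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            self.hidden[attributes["name"]] = attributes["value"]


finder = HiddenInputFinder()
finder.feed(ATTACK_PAGE)

# These fields never appear on screen, but they are sent with the form.
print(finder.hidden)
```

The user only ever sees the Click Here button, while the hidden fields carry the recipient and the amount.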

01:20:21.120 --> 01:20:23.760
And so this is what we might
call a cross-site request

01:20:23.760 --> 01:20:29.220
forgery where some adversarial website
has forged a request to our website.

01:20:29.220 --> 01:20:32.870
And ideally, we wouldn't like
for that to be able to happen.

01:20:32.870 --> 01:20:35.030
So how do we guard against this?

01:20:35.030 --> 01:20:39.780
Well, what Django allows us to do and
a very common approach is to add a CSRF

01:20:39.780 --> 01:20:42.390
token-- a Cross-Site
Request Forgery token--

01:20:42.390 --> 01:20:46.320
that is going to be
regenerated for every session

01:20:46.320 --> 01:20:48.740
such that, only if
that token is present,

01:20:48.740 --> 01:20:51.610
will the transfer be able to go through.

01:20:51.610 --> 01:20:57.360
So on our website, we can include the
CSRF token inside of this HTML form

01:20:57.360 --> 01:21:00.510
and, as a result, make sure that
we're able to transfer money only

01:21:00.510 --> 01:21:02.650
when the CSRF token is present.

01:21:02.650 --> 01:21:05.220
But if some other website
tries to forge a request,

01:21:05.220 --> 01:21:07.710
they won't know what
the CSRF token should be

01:21:07.710 --> 01:21:09.840
because it changes for every session.

01:21:09.840 --> 01:21:14.730
And therefore, they won't be able to
actually forge a request from one user

01:21:14.730 --> 01:21:16.510
to another.
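
The idea can be sketched in a few lines of plain Python. This is a simplified illustration, not how Django implements it internally; the names `new_csrf_token`, `is_valid`, and the dictionary-based session and form data are assumptions. In a Django template, you'd get this protection by putting `{% csrf_token %}` inside the form.

```python
import hmac
import secrets
from typing import Optional


def new_csrf_token() -> str:
    """Generate an unpredictable token to store in the user's session."""
    return secrets.token_hex(32)


def is_valid(session_token: str, submitted_token: Optional[str]) -> bool:
    """Accept the request only if the submitted token matches the session's token."""
    if submitted_token is None:
        return False
    # compare_digest compares in constant time to avoid leaking information.
    return hmac.compare_digest(session_token, submitted_token)


# Hypothetical session for a logged-in user.
session = {"csrf_token": new_csrf_token()}

# Our own form includes the token; a forged form has no way of knowing it.
genuine = {"to": "alice", "amount": "10", "csrf_token": session["csrf_token"]}
forged = {"to": "mallory", "amount": "100"}

print(is_valid(session["csrf_token"], genuine.get("csrf_token")))
print(is_valid(session["csrf_token"], forged.get("csrf_token")))
```

The genuine form passes the check and the forged one fails, because the adversarial website can't read the token stored for the user's session.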

01:21:16.510 --> 01:21:19.590
So all across the various
different tools and technologies

01:21:19.590 --> 01:21:20.340
we've been using--

01:21:20.340 --> 01:21:25.710
Python, HTTP, Django, HTML in
terms of creating these web

01:21:25.710 --> 01:21:27.990
applications using
JavaScript, and the APIs

01:21:27.990 --> 01:21:29.460
that we might be interacting with--

01:21:29.460 --> 01:21:31.710
there are security
considerations all throughout.

01:21:31.710 --> 01:21:33.623
We've only touched on
a couple of them here.

01:21:33.623 --> 01:21:36.540
But it just goes to show how it's
important to be mindful as you think

01:21:36.540 --> 01:21:39.790
about the practice of web programming,
thinking about what you're going to add

01:21:39.790 --> 01:21:42.960
to your web applications and what
features your web application supports,

01:21:42.960 --> 01:21:46.260
to think about what the potential
vulnerabilities there are as well--

01:21:46.260 --> 01:21:49.920
how someone might exploit your web
application in order to do something

01:21:49.920 --> 01:21:51.690
with it that they probably shouldn't.

01:21:51.690 --> 01:21:54.450
And as you take your web
applications from applications

01:21:54.450 --> 01:21:57.015
that are just running on
your own local computer

01:21:57.015 --> 01:21:59.940
to applications that are
running in some web server

01:21:59.940 --> 01:22:02.130
that many people are
starting to use, these

01:22:02.130 --> 01:22:04.420
are the types of questions
to start to be asking.

01:22:04.420 --> 01:22:07.740
How can you make sure that your
web application is scalable?

01:22:07.740 --> 01:22:11.740
How can you make sure that
your web application is secure?

01:22:11.740 --> 01:22:15.392
So now that we've explored that-- a lot
of web programming-- what comes next?

01:22:15.392 --> 01:22:17.850
In this course, we've explored
a number of different tools,

01:22:17.850 --> 01:22:19.470
and technologies, and languages.

01:22:19.470 --> 01:22:21.540
But there are many other
web frameworks and ways

01:22:21.540 --> 01:22:23.850
you can build web applications as well.

01:22:23.850 --> 01:22:26.220
We spent most of our time
looking at the Django web

01:22:26.220 --> 01:22:27.580
framework, written in Python.

01:22:27.580 --> 01:22:29.430
But you can use other
programming languages

01:22:29.430 --> 01:22:31.560
to build web applications as well.

01:22:31.560 --> 01:22:34.980
Express.js, for example, is a
very popular JavaScript framework

01:22:34.980 --> 01:22:36.480
for building web applications.

01:22:36.480 --> 01:22:41.390
Ruby on Rails is a popular server-side
web framework built using Ruby.

01:22:41.390 --> 01:22:43.020
And there are many others as well.

01:22:43.020 --> 01:22:44.730
And there are also
client-side frameworks

01:22:44.730 --> 01:22:48.540
used primarily with JavaScript to
be able to build user interfaces.

01:22:48.540 --> 01:22:51.750
We've seen a little bit of React to
build dynamic and interactive user

01:22:51.750 --> 01:22:52.620
interfaces.

01:22:52.620 --> 01:22:56.490
Other popular client-side frameworks
include AngularJS, and Vue.js,

01:22:56.490 --> 01:22:58.343
and a number of others as well.

01:22:58.343 --> 01:23:00.510
And then, once you've built
these web applications--

01:23:00.510 --> 01:23:03.600
using any of these server-side
frameworks and client-side frameworks--

01:23:03.600 --> 01:23:06.360
then you might imagine wanting
to take these applications

01:23:06.360 --> 01:23:07.645
and deploy them to the web.

01:23:07.645 --> 01:23:10.020
And to do that, there are a
number of ways we can do this

01:23:10.020 --> 01:23:13.950
as well-- a number of different services
including Amazon Web Services, AWS,

01:23:13.950 --> 01:23:17.730
Google Cloud, and Microsoft Azure
that can be used in order to deploy

01:23:17.730 --> 01:23:19.530
these web applications.

01:23:19.530 --> 01:23:22.320
Heroku is a service that
uses AWS and tries

01:23:22.320 --> 01:23:26.100
to simplify the process of making it
easier to deploy your web applications.

01:23:26.100 --> 01:23:29.340
And if your web application is
really just static-- it's just HTML,

01:23:29.340 --> 01:23:33.300
and CSS, and JavaScript-- well, then
you can use something like GitHub Pages

01:23:33.300 --> 01:23:37.945
to be able to host a web application for
free on GitHub's own servers instead.

01:23:37.945 --> 01:23:41.070
And there are many other ways you can
imagine deploying web applications as

01:23:41.070 --> 01:23:43.395
well-- different services
that you can use in order

01:23:43.395 --> 01:23:46.020
to take the web applications that
you have been building or web

01:23:46.020 --> 01:23:47.940
applications you might
build in the future

01:23:47.940 --> 01:23:52.870
and make them available on the internet
for others to be able to use as well.

01:23:52.870 --> 01:23:56.550
So as we look back on the various topics
within web programming we've explored,

01:23:56.550 --> 01:23:58.690
we've seen a lot of
tools and technologies

01:23:58.690 --> 01:24:02.760
we can use that we can leverage in order
to build interesting web applications.

01:24:02.760 --> 01:24:06.930
We started by taking a
closer look at HTML and CSS,

01:24:06.930 --> 01:24:10.080
diving into how we can use that to
describe the structure of our page,

01:24:10.080 --> 01:24:12.210
and then taking advantage
of tools like Sass

01:24:12.210 --> 01:24:15.570
that allow us to generate
CSS that allows for much more

01:24:15.570 --> 01:24:18.270
complex styling for our website
that would have been much more

01:24:18.270 --> 01:24:21.090
difficult to do with just CSS alone.

01:24:21.090 --> 01:24:24.240
As we started to build larger web
applications, we took a look at Git--

01:24:24.240 --> 01:24:26.610
version control tools
that we can use in order

01:24:26.610 --> 01:24:29.370
to make sure that we keep track
of versions and changes we

01:24:29.370 --> 01:24:33.240
make to our code, allowing multiple
people to collaborate on a project

01:24:33.240 --> 01:24:34.547
simultaneously.

01:24:34.547 --> 01:24:37.380
We then took a look at Python,
looking at various different features

01:24:37.380 --> 01:24:40.697
that the language offered--
functions, and conditions, and loops,

01:24:40.697 --> 01:24:42.780
as we've seen in many other
programming languages.

01:24:42.780 --> 01:24:45.210
But also object-oriented
programming-- the ability

01:24:45.210 --> 01:24:47.700
to represent objects, and
methods, and functions

01:24:47.700 --> 01:24:49.950
that operate on those
particular objects, which

01:24:49.950 --> 01:24:53.940
prove especially powerful in the context
of dealing with data inside of our web

01:24:53.940 --> 01:24:55.380
applications.

01:24:55.380 --> 01:24:58.500
Django was the example of a
web framework written in Python

01:24:58.500 --> 01:25:00.510
that we used to very
quickly be able to start up

01:25:00.510 --> 01:25:04.500
a web application, that's able to
listen for requests, and make responses.

01:25:04.500 --> 01:25:06.600
Django has a whole lot
of features built in that

01:25:06.600 --> 01:25:10.072
really make it easy to get started
with building a web application.

01:25:10.072 --> 01:25:12.030
And in particular, it
makes it easy for writing

01:25:12.030 --> 01:25:14.260
web applications that deal with data.

01:25:14.260 --> 01:25:16.860
So Django allows us the
ability to build models

01:25:16.860 --> 01:25:20.760
that interact with SQL without us
having to actually write any SQL code.

01:25:20.760 --> 01:25:25.320
Django can generate the SQL for us just
using these models and migrations that

01:25:25.320 --> 01:25:29.020
allow us to continually apply
changes that we make to our database.

01:25:29.020 --> 01:25:33.330
As we add new tables, add and modify
existing fields on those tables,

01:25:33.330 --> 01:25:36.065
Django can take care of all of that.

01:25:36.065 --> 01:25:38.190
After that, as you'll
recall, we turned our attention

01:25:38.190 --> 01:25:40.440
towards the second of the
main programming languages

01:25:40.440 --> 01:25:44.950
in the course, JavaScript, which has a
lot of uses and is very, very popular.

01:25:44.950 --> 01:25:46.920
But we primarily use
it on the client side

01:25:46.920 --> 01:25:50.460
to be able to build interesting
user interfaces-- using JavaScript

01:25:50.460 --> 01:25:52.680
to manipulate the DOM,
the structure of the page,

01:25:52.680 --> 01:25:54.930
to change what it is the user sees.

01:25:54.930 --> 01:25:56.850
And also to add event
handling-- so that when

01:25:56.850 --> 01:25:59.880
the user clicks on a button, when
the user hovers over something, when

01:25:59.880 --> 01:26:02.550
the user interacts with the
page in some sort of way,

01:26:02.550 --> 01:26:04.590
our code is able to respond to it.

01:26:04.590 --> 01:26:09.540
And we saw React, a client-side
framework that uses JavaScript in order

01:26:09.540 --> 01:26:13.470
to allow us to create really interesting
and interactive user interfaces

01:26:13.470 --> 01:26:15.893
with not all that much code at all.

01:26:15.893 --> 01:26:18.060
And then, finally, in these
last couple of lectures,

01:26:18.060 --> 01:26:21.350
we've been looking at some best
practices-- how we can design tests,

01:26:21.350 --> 01:26:23.520
tests that test the server,
but also the client

01:26:23.520 --> 01:26:25.800
to make sure that our code
is working appropriately,

01:26:25.800 --> 01:26:28.860
and also some industry practices
like continuous integration

01:26:28.860 --> 01:26:31.140
and continuous delivery
that just help to make sure

01:26:31.140 --> 01:26:34.740
that, as we make changes to our code,
we're able to deploy and deliver them

01:26:34.740 --> 01:26:37.050
rapidly and effectively
and make sure that we're

01:26:37.050 --> 01:26:39.630
able to make incremental
changes to our code base

01:26:39.630 --> 01:26:42.460
rather than need to wait
on longer release cycles.

01:26:42.460 --> 01:26:44.520
And then finally, today,
we've been talking

01:26:44.520 --> 01:26:47.820
about issues about scalability
and security, especially important

01:26:47.820 --> 01:26:50.880
as we begin to take our applications
and move them to the web.

01:26:50.880 --> 01:26:53.562
We want to make sure that these
applications are scalable,

01:26:53.562 --> 01:26:55.770
that they're able to handle
multiple different users,

01:26:55.770 --> 01:26:57.720
and also to make sure
that they're secure--

01:26:57.720 --> 01:27:01.050
that we're not exposing ourselves to
potential vulnerabilities like someone

01:27:01.050 --> 01:27:05.370
who might inject SQL or inject
JavaScript code into our pages

01:27:05.370 --> 01:27:08.730
or who might try to access some data
that they're not supposed to access.

01:27:08.730 --> 01:27:12.420
We want to make sure that, when we go
about designing these web applications,

01:27:12.420 --> 01:27:17.330
we're able to do so in a scalable
and, ultimately, in a secure way.

01:27:17.330 --> 01:27:19.080
So hopefully, you
enjoyed this exploration

01:27:19.080 --> 01:27:21.747
into the world of web programming
with Python and JavaScript.

01:27:21.747 --> 01:27:23.580
Best of luck with the
web programs that you,

01:27:23.580 --> 01:27:26.130
yourself might build with the
tools we've seen here today,

01:27:26.130 --> 01:27:29.310
and also other tools that are
inspired by or use similar tools

01:27:29.310 --> 01:27:32.130
and techniques and ideas as the
things that we've ultimately

01:27:32.130 --> 01:27:32.880
talked about here.

01:27:32.880 --> 01:27:35.672
A big thanks to the course's teaching
staff and the production team

01:27:35.672 --> 01:27:37.255
for making this entire class possible.

01:27:37.255 --> 01:27:39.130
I look forward to seeing
the web applications

01:27:39.130 --> 01:27:40.620
that you might go on to create.

01:27:40.620 --> 01:27:45.110
This was Web Programming
with Python and JavaScript.