1
00:00:00,000 --> 00:00:17,213


2
00:00:17,213 --> 00:00:20,380
DOUG LLOYD: Now that we know a bit more
about the internet and how it works,

3
00:00:20,380 --> 00:00:23,200
let's reintroduce the subject of
security with this new context.

4
00:00:23,200 --> 00:00:26,100
And let's start by talking
about Git and GitHub.

5
00:00:26,100 --> 00:00:28,540
Recall that Git and GitHub
are technologies that

6
00:00:28,540 --> 00:00:31,990
are used by programmers
to version control

7
00:00:31,990 --> 00:00:34,690
their software, which basically
gives them the ability

8
00:00:34,690 --> 00:00:39,010
to save code to an internet-based
repository in case of some failure

9
00:00:39,010 --> 00:00:41,830
locally, they have a backup
place to put it, but also

10
00:00:41,830 --> 00:00:43,750
keep track of all the
changes they've made

11
00:00:43,750 --> 00:00:46,120
and possibly go back in
time in case they produce

12
00:00:46,120 --> 00:00:48,460
a version of code that is broken.

13
00:00:48,460 --> 00:00:50,440
GitHub has some great
advantages, but it also

14
00:00:50,440 --> 00:00:53,110
has some potential disadvantages
because of this structure

15
00:00:53,110 --> 00:00:54,590
of being able to go back in time.

16
00:00:54,590 --> 00:00:58,180
So for example, imagine that what we
have is an initial commit, and commit

17
00:00:58,180 --> 00:01:01,828
is just GitHub parlance
for a set of code

18
00:01:01,828 --> 00:01:03,370
that you are sending to the internet.

19
00:01:03,370 --> 00:01:07,720
So I've decided to take file A, file B,
and file C in their current versions.

20
00:01:07,720 --> 00:01:12,190
I've saved them using control S or
command S literally on my machine,

21
00:01:12,190 --> 00:01:14,800
and I want to send those
versions to GitHub to be

22
00:01:14,800 --> 00:01:17,410
stored permanently or semi-permanently.

23
00:01:17,410 --> 00:01:19,900
You would package those up
in what's called a commit

24
00:01:19,900 --> 00:01:23,560
and then push that code to GitHub
where it would then be visible online.

25
00:01:23,560 --> 00:01:25,270
And this would be packaged as a commit.

26
00:01:25,270 --> 00:01:29,860
And all the files that we view on
GitHub are tracked in terms of commits.

27
00:01:29,860 --> 00:01:31,450
And commits chain together.

28
00:01:31,450 --> 00:01:34,210
And we've seen this idea of
chaining in the past when we've

29
00:01:34,210 --> 00:01:36,600
discussed linked lists, for example.

30
00:01:36,600 --> 00:01:39,100
So every commit knows about the
one that comes after it once

31
00:01:39,100 --> 00:01:43,810
that commit is eventually pushed as well
as all of the ones that preceded it.

32
00:01:43,810 --> 00:01:47,110
So imagine we have an initial
commit where we post some code

33
00:01:47,110 --> 00:01:49,870
and then we write some more--
we make some more changes.

34
00:01:49,870 --> 00:01:52,510
We perhaps update our
database in such a way

35
00:01:52,510 --> 00:01:57,790
where when we post or push-- excuse
me-- our second commit to GitHub,

36
00:01:57,790 --> 00:02:00,460
we accidentally expose
the database credentials.

37
00:02:00,460 --> 00:02:03,250
So perhaps someone
inadvertently typed the password

38
00:02:03,250 --> 00:02:06,760
for how to access the database into
some Python code that would then

39
00:02:06,760 --> 00:02:09,639
be used to access that database.

40
00:02:09,639 --> 00:02:10,930
That's not a good thing.

41
00:02:10,930 --> 00:02:13,833
And maybe somebody quickly realized
it and said, you know what?

42
00:02:13,833 --> 00:02:15,250
We need to get this off of GitHub.

43
00:02:15,250 --> 00:02:16,570
It is a source repository.

44
00:02:16,570 --> 00:02:17,920
It's available online.

45
00:02:17,920 --> 00:02:22,390
And so they push a third commit to
GitHub that deletes those credentials.

46
00:02:22,390 --> 00:02:26,740
It stores them somewhere else that's not
going to be saved on this repository.

47
00:02:26,740 --> 00:02:29,977
But have we actually solved the problem?

48
00:02:29,977 --> 00:02:31,810
And you can probably
imagine that the answer

49
00:02:31,810 --> 00:02:34,930
is no, because we have this
idea of version control

50
00:02:34,930 --> 00:02:39,700
where every past iteration
of all of these files

51
00:02:39,700 --> 00:02:43,840
is stored still on GitHub such that, if
I needed to, I could go back in time.

52
00:02:43,840 --> 00:02:48,220
So even though I attempted to
solve the security crisis I just

53
00:02:48,220 --> 00:02:52,360
created for myself by
introducing a new commit that

54
00:02:52,360 --> 00:02:54,520
removes the credentials
from those files such that,

55
00:02:54,520 --> 00:02:57,070
if I'm looking just at the most
recent version of the files,

56
00:02:57,070 --> 00:02:58,147
I don't see it anymore.

57
00:02:58,147 --> 00:02:59,980
I still have the ability
to go back in time,

58
00:02:59,980 --> 00:03:03,790
so this doesn't actually
solve the problem.
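To see why the cleanup commit does not help, here is a toy Python model of a commit chain. It is a sketch only: the `Commit` class, the file name `db.py`, and the password string are all made up for illustration, and each commit here simply records a snapshot of files plus a link to its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Hypothetical stand-in for a commit: a snapshot plus a parent link."""
    message: str
    files: dict
    parent: Optional["Commit"] = None


def history(tip):
    """Walk from the newest commit back to the first one."""
    while tip is not None:
        yield tip
        tip = tip.parent


def leaked_in(tip, secret):
    """Return the messages of every commit whose snapshot contains the secret."""
    return [c.message for c in history(tip)
            if any(secret in text for text in c.files.values())]


first = Commit("initial commit", {"db.py": "connect()"})
second = Commit("expose credentials", {"db.py": "connect(password='hunter2')"}, first)
third = Commit("remove credentials", {"db.py": "connect(password=load_secret())"}, second)

# The newest snapshot is clean, but the chain still contains the leak.
print("hunter2" in third.files["db.py"])   # False
print(leaked_in(third, "hunter2"))         # ['expose credentials']
```

Anyone who can walk the chain back one step can still read the old snapshot, which is exactly the "go back in time" feature working as designed.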

59
00:03:03,790 --> 00:03:05,800
See, one of the interesting
things about GitHub

60
00:03:05,800 --> 00:03:08,230
is the model that is used for it.

61
00:03:08,230 --> 00:03:10,120
At the very beginning
of GitHub's existence,

62
00:03:10,120 --> 00:03:14,260
it relied pretty extensively on
this idea of you sign up for free,

63
00:03:14,260 --> 00:03:16,030
you get a free account
for GitHub, and you

64
00:03:16,030 --> 00:03:20,170
have a limited number of private
repositories, repositories that are not

65
00:03:20,170 --> 00:03:24,250
publicly viewable or searchable, and
you could pay to have more of them

66
00:03:24,250 --> 00:03:25,930
if you wanted to.

67
00:03:25,930 --> 00:03:29,650
But the majority of your
repositories, assuming

68
00:03:29,650 --> 00:03:33,610
you did not opt into a paid
account, were public, which

69
00:03:33,610 --> 00:03:37,720
meant anybody on the internet could
search them using GitHub's search tool,

70
00:03:37,720 --> 00:03:40,600
or using even a regular
search engine such as Google,

71
00:03:40,600 --> 00:03:42,790
could just look for something.

72
00:03:42,790 --> 00:03:46,990
And if your GitHub repositories happen
to match what that person searched

73
00:03:46,990 --> 00:03:49,660
or specifically, if you're looking
within GitHub's search feature,

74
00:03:49,660 --> 00:03:52,620
if a user is looking for
specific lines of code,

75
00:03:52,620 --> 00:03:56,138
anything in a public
repository, it is available.

76
00:03:56,138 --> 00:03:58,180
Now, GitHub has recently
changed to a model where

77
00:03:58,180 --> 00:04:01,720
there are more private repo--
or there's a higher limit

78
00:04:01,720 --> 00:04:04,840
on the number of private repositories
that somebody could have.

79
00:04:04,840 --> 00:04:10,090
But this was part of GitHub's
design to really encourage

80
00:04:10,090 --> 00:04:13,780
developers and programmers to sort of
create this open source community where

81
00:04:13,780 --> 00:04:18,310
anybody could view someone else's
code, and in GitHub parlance,

82
00:04:18,310 --> 00:04:21,670
fork their code, which basically
means to take their entire repository

83
00:04:21,670 --> 00:04:26,830
or collection of files and copy it
into their own GitHub repository

84
00:04:26,830 --> 00:04:29,760
to perhaps make changes
or suggest changes,

85
00:04:29,760 --> 00:04:33,040
pushing those back into the
code base with the idea being

86
00:04:33,040 --> 00:04:35,810
that it would make the
entire community better.

87
00:04:35,810 --> 00:04:38,680
A side effect, of
course, is that items get

88
00:04:38,680 --> 00:04:43,360
revealed when we do so because of this
public repository setup we have here.

89
00:04:43,360 --> 00:04:47,200
So GitHub is great in terms
of its ability for programmers

90
00:04:47,200 --> 00:04:49,930
to refer to materials on the internet.

91
00:04:49,930 --> 00:04:52,750
They don't have to rely on their
own local machines to store code.

92
00:04:52,750 --> 00:04:57,070
It allows people to work
from multiple workstations,

93
00:04:57,070 --> 00:04:59,590
similar to how Dropbox or
Google Drive, for example,

94
00:04:59,590 --> 00:05:02,470
might allow you to access
files from different machines.

95
00:05:02,470 --> 00:05:04,970
You don't have to be on a
specific machine to access a file,

96
00:05:04,970 --> 00:05:08,500
as we used to have to do before
these cloud-based document storage

97
00:05:08,500 --> 00:05:10,060
services existed.

98
00:05:10,060 --> 00:05:12,310
And it encourages collaboration.

99
00:05:12,310 --> 00:05:16,390
For example, if you and I were to
collaborate on a GitHub repository,

100
00:05:16,390 --> 00:05:20,000
I could push changes to that
repository that you could then pull.

101
00:05:20,000 --> 00:05:22,750
And we could then be working
off of the same code base again.

102
00:05:22,750 --> 00:05:25,690
We sort of have this central repo--

103
00:05:25,690 --> 00:05:28,630
central area where we share
our code with one another.

104
00:05:28,630 --> 00:05:30,580
And we can each
individually make changes

105
00:05:30,580 --> 00:05:33,520
and incorporate one another's
changes into the final products.

106
00:05:33,520 --> 00:05:38,110
So we're always working off
of the same base of material.

107
00:05:38,110 --> 00:05:40,210
The side effect, though,
again, is this material

108
00:05:40,210 --> 00:05:44,260
is generally public unless you have
opted into a private repository where

109
00:05:44,260 --> 00:05:46,450
you have specific
individuals who are logged

110
00:05:46,450 --> 00:05:49,990
in with their GitHub
accounts who want to share.

111
00:05:49,990 --> 00:05:52,420
So is there a way to solve
this problem, though, of we

112
00:05:52,420 --> 00:05:55,087
accidentally expose our
credentials in a public repository?

113
00:05:55,087 --> 00:05:56,920
Of course, if we're in
a private repository,

114
00:05:56,920 --> 00:05:58,220
this might not be as alarming.

115
00:05:58,220 --> 00:05:59,920
It's still probably not something you--

116
00:05:59,920 --> 00:06:03,130
it shouldn't be encouraged
to have credentials

117
00:06:03,130 --> 00:06:07,480
for anything stored anywhere, whether
public or private, on the internet.

118
00:06:07,480 --> 00:06:08,830
It's a little riskier.

119
00:06:08,830 --> 00:06:12,402
But is there a way to get rid of this or
to prevent this problem from happening?

120
00:06:12,402 --> 00:06:14,860
And fortunately, there are a
number of different safeguards

121
00:06:14,860 --> 00:06:17,680
specific to Git and
GitHub that we can use

122
00:06:17,680 --> 00:06:22,240
to prevent the accidental leakage
of information, so to speak.

123
00:06:22,240 --> 00:06:25,330
So for example, one way we can handle
this is using a program or utility

124
00:06:25,330 --> 00:06:27,340
called git-secrets.

125
00:06:27,340 --> 00:06:31,000
git-secrets works by looking for
what's called a regular expression.

126
00:06:31,000 --> 00:06:33,640
And a regular expression is
computer science parlance

127
00:06:33,640 --> 00:06:37,600
for a particular formation of
a string, so a certain number

128
00:06:37,600 --> 00:06:41,360
of characters, a certain number of
digit characters, maybe some punctuation

129
00:06:41,360 --> 00:06:41,860
marks.

130
00:06:41,860 --> 00:06:46,360
You can say, I'm looking for
strings that match this idea.

131
00:06:46,360 --> 00:06:49,630
And you can express this idea
where this idea is all capital

132
00:06:49,630 --> 00:06:52,900
letters, all lowercase letters, this
many numbers, and this many punctuation

133
00:06:52,900 --> 00:06:55,750
marks, and so on using this tool
called a regular expression.

134
00:06:55,750 --> 00:06:59,410
But git-secrets contains a list
of these regular expressions

135
00:06:59,410 --> 00:07:02,710
and will warn you when you are
about to make a commit, when you're

136
00:07:02,710 --> 00:07:05,650
about to push code or send
code to GitHub to be stored

137
00:07:05,650 --> 00:07:10,030
in its online repository that you have
a string that matches this pattern

138
00:07:10,030 --> 00:07:11,950
that you wanted me to warn you about.

139
00:07:11,950 --> 00:07:15,190
And so be sure before
you commit this code

140
00:07:15,190 --> 00:07:19,600
and push this code that you
actually intend to send this up

141
00:07:19,600 --> 00:07:23,380
to GitHub, because it may be that this
matches a password string that you're

142
00:07:23,380 --> 00:07:24,560
trying to avoid.

143
00:07:24,560 --> 00:07:27,580
So that's an interesting tool
that can be used for that.
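The pattern-matching idea can be sketched in a few lines of Python. This is not the tool's actual rule set: the two patterns below and the function name `find_suspect_strings` are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns, made up for this sketch.
PATTERNS = [
    re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    re.compile(r"\b[A-Z0-9]{20}\b"),  # 20 capital letters or digits in a row
]


def find_suspect_strings(text):
    """Return (line_number, line) pairs that match any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in PATTERNS):
            hits.append((number, line.strip()))
    return hits


code = 'db = connect(user="app")\npassword = "hunter2"\nprint("ready")\n'
for number, line in find_suspect_strings(code):
    print(f"warning: line {number} looks like a credential: {line}")
```

A check like this only warns. It is still up to the person committing to decide whether the flagged string really is a secret.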

144
00:07:27,580 --> 00:07:31,150
You also want to consider
limiting third party app access.

145
00:07:31,150 --> 00:07:35,930
GitHub accounts are actually very
common to use as other forms of login,

146
00:07:35,930 --> 00:07:36,770
for example.

147
00:07:36,770 --> 00:07:39,190
So there's a standard
on the internet called

148
00:07:39,190 --> 00:07:42,190
OAuth which allows you to use,
for example, your Facebook

149
00:07:42,190 --> 00:07:44,977
account or your Google account
to log into other services.

150
00:07:44,977 --> 00:07:47,560
Perhaps you've encountered this
in your own experience working

151
00:07:47,560 --> 00:07:49,510
with different services on the internet.

152
00:07:49,510 --> 00:07:54,010
Instead of creating a login for site x,
you could use your Facebook or Google

153
00:07:54,010 --> 00:07:58,150
login, or, in many instances as
well, your GitHub login to do so.

154
00:07:58,150 --> 00:08:01,610
When you do so, though, you are
allowing that third party application,

155
00:08:01,610 --> 00:08:07,090
someone that's not GitHub, the ability
to use and access your GitHub identity

156
00:08:07,090 --> 00:08:08,120
or credential.

157
00:08:08,120 --> 00:08:12,640
And so you should be very careful with
not only GitHub but other services

158
00:08:12,640 --> 00:08:17,560
as well, thinking about whether you
want that other service to have access

159
00:08:17,560 --> 00:08:21,940
to your GitHub, or Facebook, or Google
account information to use it even just

160
00:08:21,940 --> 00:08:23,380
for authentication.

161
00:08:23,380 --> 00:08:26,320
It's a good idea to try and
limit how much third party app

162
00:08:26,320 --> 00:08:30,340
access you're giving to other services.

163
00:08:30,340 --> 00:08:33,520
Another tool is to use
something called a commit hook.

164
00:08:33,520 --> 00:08:36,460
Now, a commit hook is just a
fancy term for a short program

165
00:08:36,460 --> 00:08:42,070
or set of instructions that executes
when a commit is pushed to GitHub.

166
00:08:42,070 --> 00:08:44,740
So for example, many
of the course websites

167
00:08:44,740 --> 00:08:48,490
that we use here at Harvard
for CS50 are GitHub-based,

168
00:08:48,490 --> 00:08:52,030
which means that when we want to change
the content on the course website,

169
00:08:52,030 --> 00:08:56,350
we update some HTML, or Python,
or JavaScript files, we push those

170
00:08:56,350 --> 00:09:01,000
to GitHub, and that triggers a commit
hook where basically that commit

171
00:09:01,000 --> 00:09:04,570
hook copies those files
into our web server,

172
00:09:04,570 --> 00:09:07,420
runs some tests on them to make
sure that there are no errors in them.

173
00:09:07,420 --> 00:09:10,390
For example, if we wrote some
JavaScript or Python that was breaking,

174
00:09:10,390 --> 00:09:15,250
it had a bug in it, we'd rather
not deploy that bug so to speak.

175
00:09:15,250 --> 00:09:17,710
We wouldn't want the
broken version of the code

176
00:09:17,710 --> 00:09:21,190
to replace the currently
working website.

177
00:09:21,190 --> 00:09:23,750
And so a commit hook can be
used to do testing as well.

178
00:09:23,750 --> 00:09:26,170
And then once all the
tests pass, we then

179
00:09:26,170 --> 00:09:28,300
are able to activate those
files on the web server

180
00:09:28,300 --> 00:09:29,890
and the changes have happened.

181
00:09:29,890 --> 00:09:32,530
So we're using GitHub
to store the changes

182
00:09:32,530 --> 00:09:35,650
that we want to make on our
site, the HTML, the Python,

183
00:09:35,650 --> 00:09:37,870
the JavaScript changes
that we want to make.

184
00:09:37,870 --> 00:09:41,650
And then we're using this commit
hook, a set of instructions,

185
00:09:41,650 --> 00:09:45,340
to copy them over and actually
deploy those changes to the website

186
00:09:45,340 --> 00:09:48,430
once we've verified that we
haven't made anything break.

187
00:09:48,430 --> 00:09:52,210
You can also use commit hooks, for
example, to check for passwords

188
00:09:52,210 --> 00:09:56,830
and have it warn you if you have
perhaps leaked a credential.

189
00:09:56,830 --> 00:10:00,040
And then you can undo
that with a technique

190
00:10:00,040 --> 00:10:02,480
that we'll see in just a moment.

191
00:10:02,480 --> 00:10:06,250
Another thing that you can do when
using GitHub to protect or verify

192
00:10:06,250 --> 00:10:09,180
your identity is to use an SSH key.

193
00:10:09,180 --> 00:10:12,653
SSH keys are a special form
of a public and private key.

194
00:10:12,653 --> 00:10:15,070
In this case, it's really not
used for encryption, though.

195
00:10:15,070 --> 00:10:17,535
It's actually used as identification.

196
00:10:17,535 --> 00:10:19,410
And so this idea of
digital signatures, which

197
00:10:19,410 --> 00:10:22,860
you may recall from a few lectures
ago, comes back into play.

198
00:10:22,860 --> 00:10:27,600
Whenever I use an SSH key to push
my code to GitHub, what happens

199
00:10:27,600 --> 00:10:33,150
is I also digitally sign the
commit when I send it up.

200
00:10:33,150 --> 00:10:36,870
And so before that commit
gets posted to GitHub,

201
00:10:36,870 --> 00:10:40,200
GitHub verifies this by
checking my public key

202
00:10:40,200 --> 00:10:43,230
and verifying, using the mathematics
that we've seen in the past,

203
00:10:43,230 --> 00:10:46,650
that, yes, only Doug
could have sent this to me

204
00:10:46,650 --> 00:10:53,160
because only Doug's public key will
unscramble this set of zeros and ones

205
00:10:53,160 --> 00:10:57,180
that I received that only could have
then been created by his private key.

206
00:10:57,180 --> 00:10:59,550
These two things are
reciprocal of one another.

207
00:10:59,550 --> 00:11:01,980
So we can use SSH keys
and digital signatures

208
00:11:01,980 --> 00:11:05,850
as an identity verification
scheme as well for GitHub

209
00:11:05,850 --> 00:11:08,430
as we might be able to for
mailing documents, or sending

210
00:11:08,430 --> 00:11:11,160
documents, or something like that.
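The sign-then-verify idea can be shown with textbook RSA on tiny numbers. This is a toy under loud assumptions: the primes 61 and 53 and the exponent 17 are chosen only so the arithmetic is readable, real keys are thousands of bits long, and real systems add padding and rely on vetted libraries.

```python
import hashlib

# Toy key pair (assumed values, far too small for real use).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Shrink a message to a number below N so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(number: int) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(number, D, N)


def verify(number: int, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == number


commit = b"update course homepage"
signature = sign(digest(commit))
print(verify(digest(commit), signature))             # True
print(verify(65, sign(65)), verify(66, sign(65)))    # True False
```

The last line shows the "reciprocal" property: a signature made for the value 65 checks out against 65 and fails against anything else.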

211
00:11:11,160 --> 00:11:15,300
Now, imagine we have posted
the credentials accidentally.

212
00:11:15,300 --> 00:11:17,130
Is there a way to get rid of them?

213
00:11:17,130 --> 00:11:18,930
GitHub does track our entire history.

214
00:11:18,930 --> 00:11:20,430
But what if we do make a mistake?

215
00:11:20,430 --> 00:11:22,410
Human beings are fallible.

216
00:11:22,410 --> 00:11:25,980
And so there is a way to
actually eliminate the history.

217
00:11:25,980 --> 00:11:29,697
And that is using a
command called git rebase.

218
00:11:29,697 --> 00:11:32,280
So let's go back to the illustration
we had a moment ago where

219
00:11:32,280 --> 00:11:34,250
we have several different commits.

220
00:11:34,250 --> 00:11:37,210
And I've added a fourth commit here
just for purposes of illustration.

221
00:11:37,210 --> 00:11:38,960
So our first commit
and our second commit,

222
00:11:38,960 --> 00:11:42,180
and then it's after that that we
expose the credentials accidentally,

223
00:11:42,180 --> 00:11:47,010
and then we have a fourth commit where
we actually delete that mistake that we

224
00:11:47,010 --> 00:11:48,300
had previously made.

225
00:11:48,300 --> 00:11:51,810
When we use git
rebase, the idea is we want

226
00:11:51,810 --> 00:11:54,370
to delete a portion of the history.

227
00:11:54,370 --> 00:11:56,120
Now, deleting a portion
of the history has

228
00:11:56,120 --> 00:11:59,075
a side effect of any changes
that I made here or here.

229
00:11:59,075 --> 00:12:01,950
In this illustration, we're going
to get rid of the last two commits.

230
00:12:01,950 --> 00:12:05,460
Any changes that I've made besides
accidentally exposing the credentials

231
00:12:05,460 --> 00:12:07,170
are also going to be destroyed.

232
00:12:07,170 --> 00:12:11,220
And so it's going to be incumbent
on us to make sure to copy and save

233
00:12:11,220 --> 00:12:15,150
the changes we actually want to preserve
in case we've done more than just

234
00:12:15,150 --> 00:12:16,530
expose the credentials.

235
00:12:16,530 --> 00:12:19,170
And then we'll have to make a
new commit in this new history

236
00:12:19,170 --> 00:12:23,100
we create so that we can still preserve
those changes that we want to make.

237
00:12:23,100 --> 00:12:25,620
But let's say, other
than the credentials,

238
00:12:25,620 --> 00:12:27,900
I didn't actually do anything else.

239
00:12:27,900 --> 00:12:33,330
One thing I could do is rebase or
set as a new start point, basically,

240
00:12:33,330 --> 00:12:36,190
this second commit as
the end of the chain.

241
00:12:36,190 --> 00:12:40,590
So instead of going all the way to here
and having that preserved ad infinitum,

242
00:12:40,590 --> 00:12:44,430
I want to just get rid of everything
after the second commit.

243
00:12:44,430 --> 00:12:45,300
And I can do that.

244
00:12:45,300 --> 00:12:49,110
And then those commits are no
longer remembered by GitHub.

245
00:12:49,110 --> 00:12:52,110
And once the next
commit I make goes here,

246
00:12:52,110 --> 00:12:56,760
right after second commit as opposed
to imagining a fifth one there

247
00:12:56,760 --> 00:12:59,580
right after credentials
being removed, those commits

248
00:12:59,580 --> 00:13:03,570
are, for all intents and
purposes on GitHub, forgotten.
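Here is a toy Python picture of that rewrite, not the real command. The `Commit` class and the `branch` variable are hypothetical; the point is only that once the branch points at the second commit again, walking the chain from the branch never reaches the two later commits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Hypothetical commit: a message plus a link to its parent."""
    message: str
    parent: Optional["Commit"] = None


def messages(tip):
    """List commit messages from the newest commit back to the first."""
    out = []
    while tip is not None:
        out.append(tip.message)
        tip = tip.parent
    return out


first = Commit("first commit")
second = Commit("second commit", first)
exposed = Commit("credentials exposed", second)
removed = Commit("credentials removed", exposed)

branch = removed
print(messages(branch))
# ['credentials removed', 'credentials exposed', 'second commit', 'first commit']

branch = second                                    # drop everything after the second commit
branch = Commit("redo the real changes", branch)   # the next commit goes here
print(messages(branch))
# ['redo the real changes', 'second commit', 'first commit']
```

With real Git, the rewritten branch also has to be force-pushed before the copy on the server changes, and a credential that was ever public is still best treated as compromised and replaced.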

249
00:13:03,570 --> 00:13:06,330
And finally, one more thing
that we can do when using GitHub

250
00:13:06,330 --> 00:13:09,420
is to mandate the use of
two-factor authentication.

251
00:13:09,420 --> 00:13:12,810
Recall we've discussed two-factor
authentication a little bit previously.

252
00:13:12,810 --> 00:13:16,890
And the idea is that you
have a backup mechanism

253
00:13:16,890 --> 00:13:19,650
to prevent unauthorized login.

254
00:13:19,650 --> 00:13:21,720
And the two factors in
two-factor authentication

255
00:13:21,720 --> 00:13:26,520
are not two passwords, because those
are fundamentally quite similar.

256
00:13:26,520 --> 00:13:29,850
The idea is that you want to have
something that you know, for example,

257
00:13:29,850 --> 00:13:33,150
a password-- that's usually very
commonly one of the two factors

258
00:13:33,150 --> 00:13:35,220
in two-factor authentication--

259
00:13:35,220 --> 00:13:37,590
and something that you
have, the thought being

260
00:13:37,590 --> 00:13:42,900
that an adversary is incredibly unlikely
to have both things at the same time.

261
00:13:42,900 --> 00:13:45,120
They may know your
password, but they probably

262
00:13:45,120 --> 00:13:49,320
don't have your cell phone,
for example, or your RSA key.

263
00:13:49,320 --> 00:13:54,360
They may have stolen your phone or
they may have stolen your RSA key,

264
00:13:54,360 --> 00:13:57,390
but they probably don't
also know your password.

265
00:13:57,390 --> 00:14:00,690
And so the idea is that this provides
an additional level of defense

266
00:14:00,690 --> 00:14:04,080
against potential hacking,
or breaking into accounts,

267
00:14:04,080 --> 00:14:06,660
or unauthorized behavior in
accounts that you obviously

268
00:14:06,660 --> 00:14:08,190
don't want to happen.

269
00:14:08,190 --> 00:14:11,562
Now, an RSA key, if you're unfamiliar,
is something that looks like this.

270
00:14:11,562 --> 00:14:13,020
There are different versions of them.

271
00:14:13,020 --> 00:14:14,437
They've sort of evolved over time.

272
00:14:14,437 --> 00:14:18,660
This one is actually a
combined RSA key and USB drive.

273
00:14:18,660 --> 00:14:22,020
And inside the window
here of the RSA key

274
00:14:22,020 --> 00:14:26,010
is a six digit number that just
changes every 60 seconds or so.

275
00:14:26,010 --> 00:14:28,900
So when you are given one
of these, for example,

276
00:14:28,900 --> 00:14:32,310
perhaps at a firm or a business,
it is assigned to you specifically.

277
00:14:32,310 --> 00:14:35,530
There's a server that
your IT team will have

278
00:14:35,530 --> 00:14:39,960
set up that maps the serial number
on the back of this RSA key

279
00:14:39,960 --> 00:14:42,120
to your employee ID, for example.

280
00:14:42,120 --> 00:14:47,010
But they otherwise don't know what the
number currently on the RSA key is.

281
00:14:47,010 --> 00:14:51,840
They only know who owns it, who is
physically in possession of it, which

282
00:14:51,840 --> 00:14:53,210
employee ID it maps to.

283
00:14:53,210 --> 00:14:54,990
And every 60 seconds
it changes according

284
00:14:54,990 --> 00:14:59,430
to some mathematical algorithm that
is built into the key that generates

285
00:14:59,430 --> 00:15:02,190
numbers in a pseudorandom way.

286
00:15:02,190 --> 00:15:05,490
And after 60 seconds, that code
will change into something else.

287
00:15:05,490 --> 00:15:10,130
And you'll need to actually have
the key on you to complete a login.
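Hardware tokens often run proprietary algorithms, so as a stand-in here is a sketch of TOTP (RFC 6238), the public scheme that many authenticator apps use: both sides share a secret and derive a short code from the current time window. The secret below and the 60-second default are illustrative assumptions (30 seconds is the more common window).

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, now: float, step: int = 60, digits: int = 6) -> str:
    """Derive a short numeric code from a shared secret and the time window."""
    counter = int(now // step)                       # which window we are in
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                          # dynamic truncation
    number = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


secret = b"12345678901234567890"    # published test secret, not a real key
print(totp(secret, now=59, step=30))    # 287082
print(totp(secret, now=time.time()))    # changes once per window
```

Because the code depends on both the secret and the clock, a server holding the same secret can recompute it, while someone who only saw an old code cannot reuse it in a later window.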

288
00:15:10,130 --> 00:15:12,810
If an RSA key is being
used to secure a login such

289
00:15:12,810 --> 00:15:15,483
that you need to enter a
password and your RSA key value,

290
00:15:15,483 --> 00:15:16,650
you would need to have both.

291
00:15:16,650 --> 00:15:19,872
No other employee's RSA key--
well, hypothetically, I

292
00:15:19,872 --> 00:15:21,830
guess there's a one in
a million chance that it

293
00:15:21,830 --> 00:15:24,705
would happen to be randomly showing
the same number at the same time.

294
00:15:24,705 --> 00:15:28,100
But no other employee's RSA
key could be used to log in.

295
00:15:28,100 --> 00:15:30,690
Only yours could be used to log in.

296
00:15:30,690 --> 00:15:32,690
Now, there are several
different tools out there

297
00:15:32,690 --> 00:15:35,810
that can be used to provide
two-factor authentication services.

298
00:15:35,810 --> 00:15:39,628
And there's really no technical
reason not to use these services.

299
00:15:39,628 --> 00:15:42,170
You'll find them as applications
on cell phones, most likely.

300
00:15:42,170 --> 00:15:46,310
And you'll find ones like this, Google
Authenticator, Authy, Duo Mobile.

301
00:15:46,310 --> 00:15:47,360
There are lots of others.

302
00:15:47,360 --> 00:15:50,390
And if you don't want to use one
of those applications specifically,

303
00:15:50,390 --> 00:15:53,210
many services also just allow
you to receive a text message

304
00:15:53,210 --> 00:15:54,902
from the service itself.

305
00:15:54,902 --> 00:15:56,860
And you'll just get that
via SMS on your phone,

306
00:15:56,860 --> 00:16:00,470
so still on your phone, just not
tied to a specific application.

307
00:16:00,470 --> 00:16:05,690
And while there's no technical reason
to avoid two-factor authentication,

308
00:16:05,690 --> 00:16:08,600
there is sort of this
social friction surrounding

309
00:16:08,600 --> 00:16:13,580
two-factor authentication in that human
beings tend to find it annoying, right?

310
00:16:13,580 --> 00:16:15,860
It used to be username,
password, you're logged in.

311
00:16:15,860 --> 00:16:16,920
It's pretty quick.

312
00:16:16,920 --> 00:16:19,630
Now it's username, password, you
get brought to another screen,

313
00:16:19,630 --> 00:16:22,880
you're asked to enter a six-digit code,
or maybe in some advanced applications

314
00:16:22,880 --> 00:16:26,390
you get a push notification sent to
your device that you have to unlock

315
00:16:26,390 --> 00:16:28,970
and then hit OK on the device.

316
00:16:28,970 --> 00:16:31,280
And people just find that inconvenient.

317
00:16:31,280 --> 00:16:34,400
We haven't yet reached
this point culturally

318
00:16:34,400 --> 00:16:39,440
where two-factor
authentication is the norm.

319
00:16:39,440 --> 00:16:43,610
And so it's sort of a linchpin
when we talk about security

320
00:16:43,610 --> 00:16:49,400
in the internet context, is human
beings being the limiting factor

321
00:16:49,400 --> 00:16:51,980
for how secure we can be.

322
00:16:51,980 --> 00:16:56,810
We have the technology to take
steps to protect ourselves,

323
00:16:56,810 --> 00:16:59,360
but we don't feel compelled to do so.

324
00:16:59,360 --> 00:17:03,260
And we'll see this pattern reemerge
in a few other places today.

325
00:17:03,260 --> 00:17:06,315
But just know that that
is why perhaps you're

326
00:17:06,315 --> 00:17:08,690
not seeing so much adoption
of two-factor authentication.

327
00:17:08,690 --> 00:17:11,480
It's not that it's technically
infeasible to do so.

328
00:17:11,480 --> 00:17:14,900
It's just that we just
find it annoying to do so,

329
00:17:14,900 --> 00:17:19,401
and so we don't adopt it as
aggressively as perhaps we should.

330
00:17:19,401 --> 00:17:21,109
Now let's discuss a
type of attack that

331
00:17:21,109 --> 00:17:24,109
occurs on the internet with
unfortunate regularity,

332
00:17:24,109 --> 00:17:27,270
and that is the idea of a
denial of service attack.

333
00:17:27,270 --> 00:17:29,450
Now, the idea behind
these attacks is basically

334
00:17:29,450 --> 00:17:32,000
to cripple the
infrastructure of a website.

335
00:17:32,000 --> 00:17:34,460
Now, the reason for
this might be financial.

336
00:17:34,460 --> 00:17:36,050
You want to try and sabotage somebody.

337
00:17:36,050 --> 00:17:39,380
There might be other motivations,
distraction, for example,

338
00:17:39,380 --> 00:17:42,380
by tying up their resources,
trying to stop the attack.

339
00:17:42,380 --> 00:17:44,510
It opens up another avenue
to do something else,

340
00:17:44,510 --> 00:17:46,077
to perhaps steal information.

341
00:17:46,077 --> 00:17:48,410
There are many different
motivations for why they do this.

342
00:17:48,410 --> 00:17:51,020
And some of them are
honestly just boredom or fun.

343
00:17:51,020 --> 00:17:54,140
Amateur hackers sometimes
think it's fun to just initiate

344
00:17:54,140 --> 00:17:57,110
a denial of service attack
against an entity that

345
00:17:57,110 --> 00:17:59,870
is not prepared to handle it.

346
00:17:59,870 --> 00:18:02,480
Now, in the associated
materials for this course,

347
00:18:02,480 --> 00:18:06,380
we provided an article called Making
Cyberspace Safe for Democracy, which

348
00:18:06,380 --> 00:18:08,870
we really do encourage you
to take a look at, read,

349
00:18:08,870 --> 00:18:10,597
and discuss with your group.

350
00:18:10,597 --> 00:18:12,680
But I also want to take a
little bit of time right

351
00:18:12,680 --> 00:18:15,590
now just to talk about
this article in particular

352
00:18:15,590 --> 00:18:18,680
and draw your attention
to some areas of concern

353
00:18:18,680 --> 00:18:21,710
or some areas that might
lead to more discussion.

354
00:18:21,710 --> 00:18:25,070
Now, the biggest of
these is that these attacks

355
00:18:25,070 --> 00:18:28,875
tend not to be taken very seriously
by people when they hear about them.

356
00:18:28,875 --> 00:18:31,250
You'll occasionally hear about
these attacks in the news,

357
00:18:31,250 --> 00:18:33,350
denial of service
attacks, or their cousin,

358
00:18:33,350 --> 00:18:35,930
distributed denial of service attacks.

359
00:18:35,930 --> 00:18:39,800
But culturally, again,
us being humans and sort

360
00:18:39,800 --> 00:18:42,650
of neglecting some of the
real security concerns here,

361
00:18:42,650 --> 00:18:44,420
we don't think of it as an attack.

362
00:18:44,420 --> 00:18:48,740
And that's maybe because of how we
hear about other kinds of attacks

363
00:18:48,740 --> 00:18:52,340
on the news that seem more
physically devastating,

364
00:18:52,340 --> 00:18:55,310
that have more real consequences.

365
00:18:55,310 --> 00:19:00,860
And it makes it hard to have a serious
conversation about cyber attacks

366
00:19:00,860 --> 00:19:06,650
because there's this friction that we
face trying to get people to understand

367
00:19:06,650 --> 00:19:08,600
that these are meaningful and real.

368
00:19:08,600 --> 00:19:12,530
And in particular, these
attacks are kind of insidious.

369
00:19:12,530 --> 00:19:17,355
They're really easy to execute
without much difficulty at all,

370
00:19:17,355 --> 00:19:20,480
especially against a small business
that might be running its own server as

371
00:19:20,480 --> 00:19:22,640
opposed to relying on a cloud service.

372
00:19:22,640 --> 00:19:29,150
A pretty top-of-the-line, commercially
available machine might be able

373
00:19:29,150 --> 00:19:33,200
to execute a denial of service
or DoS attack on its own.

374
00:19:33,200 --> 00:19:37,310
It doesn't even require
exceptional resources.

375
00:19:37,310 --> 00:19:41,450
Now, when we start to attack mid-sized
companies, or larger companies

376
00:19:41,450 --> 00:19:45,110
or entities, one single computer
from one single IP address

377
00:19:45,110 --> 00:19:47,480
is not typically going to be enough.

378
00:19:47,480 --> 00:19:52,730
And so instead, you would have a
distributed denial of service attack.

379
00:19:52,730 --> 00:19:54,620
In a distributed denial
of service attack,

380
00:19:54,620 --> 00:19:58,070
there is still generally one core
hacker, or one collective group

381
00:19:58,070 --> 00:19:59,960
of hackers or adversaries
that are trying

382
00:19:59,960 --> 00:20:03,647
to penetrate some company's defenses.

383
00:20:03,647 --> 00:20:05,480
But they can't do it
with their own machine.

384
00:20:05,480 --> 00:20:08,210
And so what they do is create
something called a botnet.

385
00:20:08,210 --> 00:20:09,890
Perhaps you've heard this term before.

386
00:20:09,890 --> 00:20:12,590
A botnet basically
happens, or is created,

387
00:20:12,590 --> 00:20:17,103
when hackers or adversaries
distribute worms or viruses sort of

388
00:20:17,103 --> 00:20:17,770
surreptitiously.

389
00:20:17,770 --> 00:20:19,700
Perhaps they packaged
them into some download.

390
00:20:19,700 --> 00:20:22,780
People don't notice anything
about the worm or anything

391
00:20:22,780 --> 00:20:25,750
about this program that has been
covertly installed on their machine.

392
00:20:25,750 --> 00:20:30,010
It doesn't do anything in
particular until it is activated.

393
00:20:30,010 --> 00:20:32,500
And then it becomes
an agent or a zombie--

394
00:20:32,500 --> 00:20:34,930
sometimes you'll hear
it termed that as well--

395
00:20:34,930 --> 00:20:36,400
controlled by the hackers.

396
00:20:36,400 --> 00:20:39,130
And so all of a sudden
the adversaries gain

397
00:20:39,130 --> 00:20:42,190
control of many different
devices, hundreds or thousands

398
00:20:42,190 --> 00:20:46,450
or tens of thousands, or even
more in some of the bigger attacks

399
00:20:46,450 --> 00:20:50,602
that have happened, basically
turning these computers--

400
00:20:50,602 --> 00:20:52,310
rendering all of them
under their control

401
00:20:52,310 --> 00:20:55,130
and being able to direct them to
take whatever action they want.

402
00:20:55,130 --> 00:20:58,870
And in particular, in the case of a
distributed denial of service attack,

403
00:20:58,870 --> 00:21:03,190
all of these computers are
going to make web requests

404
00:21:03,190 --> 00:21:07,810
to the same server or same
website, because that's the idea.

405
00:21:07,810 --> 00:21:09,180
You have so many requests.

406
00:21:09,180 --> 00:21:10,930
With distributed denial
of service attacks

407
00:21:10,930 --> 00:21:13,972
or just regular denial of service
attacks, it's just a question of scale,

408
00:21:13,972 --> 00:21:15,610
really.

409
00:21:15,610 --> 00:21:18,430
We're hitting those servers
with so many web requests.

410
00:21:18,430 --> 00:21:19,390
I want to access this.

411
00:21:19,390 --> 00:21:22,210
I want to access this, hundreds,
thousands, tens of thousands

412
00:21:22,210 --> 00:21:26,110
of these requests a second such that
the computer can't possibly-- the server

413
00:21:26,110 --> 00:21:28,210
can't possibly field
all of these inquiries

414
00:21:28,210 --> 00:21:33,010
that are coming and trying to give these
requests the data they're asking for.
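
To make the scale argument concrete, here is a rough sketch in Python. The arrival and processing rates are made-up numbers, and `backlog_after` is a hypothetical helper for illustration, not a model of any real server.

```python
def backlog_after(seconds, incoming_per_sec, capacity_per_sec):
    """Return how many requests are still waiting after `seconds`.

    Assumes a fixed arrival rate and a fixed processing rate (both hypothetical).
    """
    backlog = 0
    for _ in range(seconds):
        backlog += incoming_per_sec                 # new requests arrive
        backlog -= min(backlog, capacity_per_sec)   # server handles what it can
    return backlog

# Normal load: the server keeps up, so nothing piles up.
print(backlog_after(60, incoming_per_sec=200, capacity_per_sec=500))
# Flood from a botnet: the queue grows every second.
print(backlog_after(60, incoming_per_sec=20_000, capacity_per_sec=500))
```

The only difference between the two calls is the arrival rate, which is the sense in which DoS versus DDoS is a question of scale.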

415
00:21:33,010 --> 00:21:35,425
Ultimately, that would
eventually, after enough time,

416
00:21:35,425 --> 00:21:38,300
result in the server just crashing,
throwing up its hands and saying,

417
00:21:38,300 --> 00:21:39,430
I don't know what to do.

418
00:21:39,430 --> 00:21:41,388
I can't possibly process
all of these requests.

419
00:21:41,388 --> 00:21:45,010
But by tying it up in
this way, the adversary

420
00:21:45,010 --> 00:21:49,840
has succeeded in damaging the
infrastructure of the server.

421
00:21:49,840 --> 00:21:52,960
It's either denied the server
the ability to process customers

422
00:21:52,960 --> 00:21:55,840
and payments or it's just
taken down the entire website

423
00:21:55,840 --> 00:21:58,840
so there's no information available
about the company anymore to anybody

424
00:21:58,840 --> 00:22:01,630
who's trying to look it up.

425
00:22:01,630 --> 00:22:04,990
These attacks are actually
really, really common.

426
00:22:04,990 --> 00:22:06,910
There are some surveys
that have been out that

427
00:22:06,910 --> 00:22:12,292
assess that roughly one sixth to one
third of average-sized businesses that

428
00:22:12,292 --> 00:22:14,500
are part of this tech survey
that goes out every year

429
00:22:14,500 --> 00:22:20,680
suffer some sort of DoS attack in
a given year, so 16% to 35% or so

430
00:22:20,680 --> 00:22:23,910
of businesses, which is a lot of
businesses when you think about it.

431
00:22:23,910 --> 00:22:25,660
And these attacks are
usually quite small,

432
00:22:25,660 --> 00:22:27,610
and they're certainly not newsworthy.

433
00:22:27,610 --> 00:22:28,870
They might last a few minutes.

434
00:22:28,870 --> 00:22:30,190
They might last a few hours.

435
00:22:30,190 --> 00:22:31,690
But they're enough to be disruptive.

436
00:22:31,690 --> 00:22:32,898
They're certainly noteworthy.

437
00:22:32,898 --> 00:22:36,310
And they're something to
avoid if it's possible.

438
00:22:36,310 --> 00:22:41,660
Cloud computing has made
this problem kind of worse.

439
00:22:41,660 --> 00:22:45,190
And the reason for this is that,
in a cloud computing context,

440
00:22:45,190 --> 00:22:47,980
your server that is
running your business

441
00:22:47,980 --> 00:22:50,350
is not physically
located on your premises.

442
00:22:50,350 --> 00:22:54,270
It was often the case that when
a business would run a website

443
00:22:54,270 --> 00:23:00,430
or would run their business, they
would have a server room that

444
00:23:00,430 --> 00:23:03,790
had the software that was
necessary to run their website

445
00:23:03,790 --> 00:23:07,060
or to run whatever software-based
services they provided.

446
00:23:07,060 --> 00:23:10,415
And it was all local to that business.

447
00:23:10,415 --> 00:23:12,980
No one else could possibly be affected.

448
00:23:12,980 --> 00:23:15,070
But in a cloud computing
context, we are generally

449
00:23:15,070 --> 00:23:20,860
renting server space and server power
from an entity such as Amazon Web

450
00:23:20,860 --> 00:23:24,790
Services, or Google Cloud Services,
or some other large provider where

451
00:23:24,790 --> 00:23:30,460
it might be that 10, 20, 50, depending
on the size of the business in question

452
00:23:30,460 --> 00:23:31,510
here--

453
00:23:31,510 --> 00:23:35,920
multiple businesses are sharing
the same physical resources,

454
00:23:35,920 --> 00:23:37,990
and they're sharing
the same server space,

455
00:23:37,990 --> 00:23:41,260
such that if any one
of those 50, let's say,

456
00:23:41,260 --> 00:23:44,950
businesses is targeted
by hackers or adversaries

457
00:23:44,950 --> 00:23:49,570
for a denial of service attack, that
might actually, as collateral damage,

458
00:23:49,570 --> 00:23:52,390
take out the other 49 businesses.

459
00:23:52,390 --> 00:23:54,400
They weren't even part of the attack.

460
00:23:54,400 --> 00:23:55,930
But cloud computing is--

461
00:23:55,930 --> 00:23:57,820
we've heard about it
as it's a great thing.

462
00:23:57,820 --> 00:24:00,640
It allows us to scale
out our websites, make it

463
00:24:00,640 --> 00:24:02,800
so that we can handle more customers.

464
00:24:02,800 --> 00:24:06,280
It takes away the problem of
security, web-based security,

465
00:24:06,280 --> 00:24:11,090
because we're outsourcing that to the
cloud provider to give that to us.

466
00:24:11,090 --> 00:24:15,490
But it now introduces this new problem
of, if we're all sharing the resources

467
00:24:15,490 --> 00:24:18,790
and any one of us gets
attacked, then all of us

468
00:24:18,790 --> 00:24:21,760
lose the ability to access
those resources and use them,

469
00:24:21,760 --> 00:24:24,550
which might cause all of
our organizations to suffer

470
00:24:24,550 --> 00:24:28,090
the consequences of one single attack.

471
00:24:28,090 --> 00:24:30,700
This collateral damage
can get even worse

472
00:24:30,700 --> 00:24:33,050
when you think about servers that are--

473
00:24:33,050 --> 00:24:38,590
or businesses whose service
is providing the internet, OK?

474
00:24:38,590 --> 00:24:40,970
So a very common example of
this, or a noteworthy example

475
00:24:40,970 --> 00:24:44,260
of this, happened in 2016
with a service called

476
00:24:44,260 --> 00:24:49,480
Dyn, D-Y-N. Dyn is a
DNS service provider,

477
00:24:49,480 --> 00:24:52,390
DNS being the domain name system.

478
00:24:52,390 --> 00:25:00,450
And the idea there is to map the things
like www.google.com to its IP address.

479
00:25:00,450 --> 00:25:02,950
Because in order to actually
access anything on the internet

480
00:25:02,950 --> 00:25:06,140
or to have a communication with anyone,
you need to know their IP address.

481
00:25:06,140 --> 00:25:09,220
And as human beings, we tend
not to actually remember

482
00:25:09,220 --> 00:25:14,020
what some website's IP address is, much
like we may not recall a certain phone

483
00:25:14,020 --> 00:25:14,590
number.

484
00:25:14,590 --> 00:25:17,170
But if it has a mnemonic
attached to it-- so for example,

485
00:25:17,170 --> 00:25:20,530
you know back in the day we had
1-800-COLLECT for collect calls.

486
00:25:20,530 --> 00:25:25,750
If you forgot the number, the
literal digits of that phone number,

487
00:25:25,750 --> 00:25:29,290
you could still remember the idea of
it because you had this mnemonic device

488
00:25:29,290 --> 00:25:30,760
to help remind you.

489
00:25:30,760 --> 00:25:35,110
Domain names, www.whatever.com,
are just mnemonic devices

490
00:25:35,110 --> 00:25:37,570
that we use to refer to an IP address.

491
00:25:37,570 --> 00:25:41,770
And DNS servers provide
this service to us.

492
00:25:41,770 --> 00:25:46,990
Dyn is one of the major DNS
providers for the internet overall.

493
00:25:46,990 --> 00:25:49,630
And if a denial of service
attack, or in this case

494
00:25:49,630 --> 00:25:53,800
it was certainly a distributed denial of
service attack because it was enormous,

495
00:25:53,800 --> 00:25:58,480
goes after pinging the IP address
or hitting that server over

496
00:25:58,480 --> 00:26:03,070
and over and over, then it is unable
to field requests from anyone else,

497
00:26:03,070 --> 00:26:06,880
because it's just getting pummeled by
all of these requests from some botnet

498
00:26:06,880 --> 00:26:11,250
that some adversary or collective
of adversaries has taken control of.

499
00:26:11,250 --> 00:26:13,990
The collateral damage
here is that no one can

500
00:26:13,990 --> 00:26:17,110
map a domain name to
an IP address, which

501
00:26:17,110 --> 00:26:19,720
means no one can visit
any of these websites

502
00:26:19,720 --> 00:26:24,250
unless you happen to know at the
outset what the IP address of any given

503
00:26:24,250 --> 00:26:24,850
website was.

504
00:26:24,850 --> 00:26:27,243
If you knew the IP address,
this wasn't a problem.

505
00:26:27,243 --> 00:26:29,410
You could just still directly
go to that IP address.

506
00:26:29,410 --> 00:26:31,000
That's not the kind of attack here.

507
00:26:31,000 --> 00:26:33,460
But the attack instead
tied up the ability

508
00:26:33,460 --> 00:26:38,410
to translate these mnemonic
names into numbers.
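
The name-to-number translation described above can be sketched as a simple lookup table. This is only a toy: `RECORDS`, `resolve`, and the `resolver_up` flag are hypothetical, and the addresses are placeholders from the range reserved for documentation, not real DNS records.

```python
# Toy name-to-address table. The names and addresses are placeholders
# (192.0.2.0/24 is reserved for documentation), not real DNS records.
RECORDS = {
    "www.example.com": "192.0.2.10",
    "shop.example.com": "192.0.2.20",
}

def resolve(name, resolver_up=True):
    """Map a domain name to an IP address, or return None if that fails."""
    if not resolver_up:
        return None          # resolver is overwhelmed: no answer comes back
    return RECORDS.get(name)

print(resolve("www.example.com"))                     # lookup succeeds
print(resolve("www.example.com", resolver_up=False))  # lookup fails during the attack
```

Notice that the website itself is untouched in the second call. Anyone who already knows the address can still reach it; only the translation step is gone.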

509
00:26:38,410 --> 00:26:42,400
And as you can see,
Dyn was a DNS-- or is

510
00:26:42,400 --> 00:26:45,490
a DNS provider for much of the
eastern half of the United States

511
00:26:45,490 --> 00:26:48,842
as well as the Pacific
Northwest and California.

512
00:26:48,842 --> 00:26:50,800
And if you think about
what kinds of businesses

513
00:26:50,800 --> 00:26:53,950
are headquartered in
the Pacific Northwest

514
00:26:53,950 --> 00:26:58,810
and in California and in the
New York area, for example,

515
00:26:58,810 --> 00:27:01,060
you probably see that some
major, major services,

516
00:27:01,060 --> 00:27:03,435
including GitHub, which we've
already talked about today,

517
00:27:03,435 --> 00:27:06,190
but also Facebook and others--

518
00:27:06,190 --> 00:27:09,940
Harvard University's website was
also taken down for several hours.

519
00:27:09,940 --> 00:27:12,320
This attack lasted about 10
hours, so quite prolonged.

520
00:27:12,320 --> 00:27:15,810
It really did a lot
of damage on that day.

521
00:27:15,810 --> 00:27:18,310
It really crippled the ability
of people to use the internet

522
00:27:18,310 --> 00:27:22,420
for a long period of time,
so kind of very interesting.

523
00:27:22,420 --> 00:27:28,330
This article also talks a bit about
how the United States government has

524
00:27:28,330 --> 00:27:31,450
decided to-- or legislature--

525
00:27:31,450 --> 00:27:35,293
handle these kinds of issues,
computer-based attacks.

526
00:27:35,293 --> 00:27:37,460
It takes a look at the
Computer Fraud and Abuse

527
00:27:37,460 --> 00:27:41,290
Act, which is codified at 18 USC 1030.

528
00:27:41,290 --> 00:27:47,020
And this is really the only computer
crimes, general computer crimes,

529
00:27:47,020 --> 00:27:49,990
law that is on the books
and talks about what

530
00:27:49,990 --> 00:27:53,710
it means to be a protected computer.

531
00:27:53,710 --> 00:27:57,430
And you'll be interested to know
perhaps that any computer pretty much is

532
00:27:57,430 --> 00:27:58,780
a protected computer.

533
00:27:58,780 --> 00:28:02,320
The law specifically calls out
government computers as well as

534
00:28:02,320 --> 00:28:04,990
any computer that may be
involved in interstate commerce,

535
00:28:04,990 --> 00:28:08,200
which is you can imagine
anybody who uses the internet,

536
00:28:08,200 --> 00:28:11,030
their computer then falls
under the ambit of this act.

537
00:28:11,030 --> 00:28:13,030
So it's another interesting
thing to take a look

538
00:28:13,030 --> 00:28:20,320
at if you're interested in how we
deal with processing or prosecuting

539
00:28:20,320 --> 00:28:23,020
violations of computer-based crimes.

540
00:28:23,020 --> 00:28:26,330
All of it is actually sort of dealt
with in the Computer Fraud and Abuse

541
00:28:26,330 --> 00:28:29,500
Act, which is not terribly long
and hasn't been updated extensively

542
00:28:29,500 --> 00:28:32,150
since the 1980s other than
some small amendments.

543
00:28:32,150 --> 00:28:34,150
So it's kind of interesting
that we have not yet

544
00:28:34,150 --> 00:28:38,440
gotten to the point where we
are defining and prosecuting

545
00:28:38,440 --> 00:28:42,400
specific types of computer crime,
even though we've begun to figure out

546
00:28:42,400 --> 00:28:47,620
different types of computer crimes,
such as DoS attacks, such as phishing,

547
00:28:47,620 --> 00:28:49,370
and so on.

548
00:28:49,370 --> 00:28:52,690
Now, hypothetically, a simple
denial of service attack

549
00:28:52,690 --> 00:28:53,950
should be pretty easy to stop.

550
00:28:53,950 --> 00:28:59,230
And the reason for that is that there's
only one person making the attack.

551
00:28:59,230 --> 00:29:03,130
All requests, recall, that happen
over the internet happen via HTTP.

552
00:29:03,130 --> 00:29:07,585
And the IP layer beneath HTTP requires that
the sender's IP address

553
00:29:07,585 --> 00:29:09,460
be part of that envelope
that gets sent over,

554
00:29:09,460 --> 00:29:12,880
such that the server who wants to
respond to the client, or the sender,

555
00:29:12,880 --> 00:29:13,980
can just reference.

556
00:29:13,980 --> 00:29:14,980
It's the return address.

557
00:29:14,980 --> 00:29:17,438
You need to be able to know
where to send the data back to.

558
00:29:17,438 --> 00:29:19,680
And so any request that is coming from--

559
00:29:19,680 --> 00:29:21,430
there are thousands
of requests that might

560
00:29:21,430 --> 00:29:23,680
be coming from a single IP address.

561
00:29:23,680 --> 00:29:27,490
If you see that happening, you can
just decide as a server in the software

562
00:29:27,490 --> 00:29:31,570
to stop accepting requests
from that address.
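
A minimal sketch of that idea in Python, assuming the server keeps a list of the source addresses it saw in some recent time window. The threshold, the addresses, and the helper names are all hypothetical.

```python
from collections import Counter

def find_ips_to_block(request_ips, limit):
    """Return the set of source IPs that sent more than `limit` requests."""
    counts = Counter(request_ips)
    return {ip for ip, n in counts.items() if n > limit}

# Hypothetical log of source addresses seen in one time window.
window = ["198.51.100.7"] * 5000 + ["192.0.2.10", "192.0.2.11", "192.0.2.10"]
blocked = find_ips_to_block(window, limit=1000)

def accept(ip):
    """Decide whether to serve a request from `ip`."""
    return ip not in blocked

print(blocked)
print(accept("192.0.2.10"), accept("198.51.100.7"))
```

This works only because one address stands out from the rest, which is exactly the property a distributed attack takes away.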

563
00:29:31,570 --> 00:29:34,360
DDoS attacks, distributed
denial of service attacks,

564
00:29:34,360 --> 00:29:36,160
are much harder to stop.

565
00:29:36,160 --> 00:29:40,390
And it's exactly because of the fact
that there is not a single source.

566
00:29:40,390 --> 00:29:42,880
If there's a single source,
again, we would just completely

567
00:29:42,880 --> 00:29:48,250
stop accepting any requests of
any type from that computer.

568
00:29:48,250 --> 00:29:51,370
However, because we have so many
different computers to contend with,

569
00:29:51,370 --> 00:29:54,010
the options to handle this
are a bit more limited.

570
00:29:54,010 --> 00:29:57,400
There are some techniques for
averting them or stopping them

571
00:29:57,400 --> 00:30:01,960
once they are detected, however,
the first of which is firewalling.

572
00:30:01,960 --> 00:30:04,270
So the idea of a firewall
is we are only going

573
00:30:04,270 --> 00:30:06,700
to allow requests of a certain type.

574
00:30:06,700 --> 00:30:08,950
We're going to allow
them from any IP address,

575
00:30:08,950 --> 00:30:11,950
but we're only going to
accept them into this port.

576
00:30:11,950 --> 00:30:15,880
Recall that TCP/IP gives us the
ability to say this service

577
00:30:15,880 --> 00:30:19,390
comes in via this port, so HTTP
requests come in via port 80.

578
00:30:19,390 --> 00:30:24,360
HTTPS requests come in via port 443.

579
00:30:24,360 --> 00:30:27,030
So imagine a distributed
denial of service attack

580
00:30:27,030 --> 00:30:33,100
where typically the site would expect
to be receiving requests on HTTPS.

581
00:30:33,100 --> 00:30:37,650
It generally only uses
secured HTTP in order

582
00:30:37,650 --> 00:30:40,300
to process whatever
requests are coming in.

583
00:30:40,300 --> 00:30:44,160
So it's expecting to receive
a lot of traffic on port 443.

584
00:30:44,160 --> 00:30:47,970
And then all of a sudden a
distributed denial of service attack

585
00:30:47,970 --> 00:30:51,930
begins and it's receiving
lots of requests on port 80.

586
00:30:51,930 --> 00:30:55,440
One way to stop that attack before
it starts to tie up resources

587
00:30:55,440 --> 00:30:57,540
is to just put a
firewall up and say, I'm

588
00:30:57,540 --> 00:31:00,210
not actually going to accept
any requests on port 80.

589
00:31:00,210 --> 00:31:03,650
And this may have a side effect of
denying certain legitimate requests

590
00:31:03,650 --> 00:31:04,710
from getting through.

591
00:31:04,710 --> 00:31:07,920
But since the vast majority of the
traffic that I receive on the site

592
00:31:07,920 --> 00:31:12,805
comes in via HTTPS on port 443,
that's a small price to pay.

593
00:31:12,805 --> 00:31:15,180
I'd rather just allow the
legitimate requests to come in.

594
00:31:15,180 --> 00:31:17,140
So that's one technique.
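
The port rule just described might look roughly like this in Python. Real firewalls are configured in the operating system or on network hardware, so this is only an illustration of the decision being made; `ALLOWED_PORTS` and `firewall_allows` are hypothetical names.

```python
ALLOWED_PORTS = {443}   # HTTPS only; port 80 (HTTP) is closed during the attack

def firewall_allows(source_ip, dest_port):
    """Accept traffic from any IP address, but only on an allowed port."""
    return dest_port in ALLOWED_PORTS

# Hypothetical (source address, destination port) pairs.
packets = [("203.0.113.5", 80), ("198.51.100.9", 443), ("203.0.113.6", 80)]
passed = [p for p in packets if firewall_allows(*p)]
print(passed)
```

The source address is deliberately ignored here: the filter is by port, not by sender, which is why it still helps when the senders are many.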

595
00:31:17,140 --> 00:31:19,950
Another technique is
something called sinkholing.

596
00:31:19,950 --> 00:31:22,350
And it's exactly what
you probably think it is.

597
00:31:22,350 --> 00:31:24,860
So a sinkhole, as you
probably know, is a hole

598
00:31:24,860 --> 00:31:26,610
in the ground that
swallows everything up.

599
00:31:26,610 --> 00:31:32,730
And a sinkhole in a digital context is
a big black hole, basically, for data.

600
00:31:32,730 --> 00:31:34,890
It's just going to swallow
up every single request

601
00:31:34,890 --> 00:31:36,960
and just not allow any of them out.

602
00:31:36,960 --> 00:31:39,962
So this would, again, stop
the denial of service attack

603
00:31:39,962 --> 00:31:41,670
because it's just
taking all the requests

604
00:31:41,670 --> 00:31:44,190
and basically throwing
them in the trash.

605
00:31:44,190 --> 00:31:48,120
This won't take down the website of
the company that's being attacked,

606
00:31:48,120 --> 00:31:49,590
so that's a good thing.

607
00:31:49,590 --> 00:31:52,590
But it's also not going to allow
any legitimate traffic of any type

608
00:31:52,590 --> 00:31:54,460
through, so that might be a bad thing.

609
00:31:54,460 --> 00:31:56,460
But depending on the
length of the attack, if it

610
00:31:56,460 --> 00:31:59,520
seems like it's going to be
short, if the requests trickle off

611
00:31:59,520 --> 00:32:02,670
and stop because the attackers
realize, we're not making any progress,

612
00:32:02,670 --> 00:32:04,020
we're not actually doing--

613
00:32:04,020 --> 00:32:06,510
we're not getting the results
that we had hoped for,

614
00:32:06,510 --> 00:32:08,490
then perhaps they would give up.

615
00:32:08,490 --> 00:32:11,903
Then the sinkhole could be
stopped and regular traffic

616
00:32:11,903 --> 00:32:13,320
could start to flow through again.

617
00:32:13,320 --> 00:32:16,590
So a sinkhole is basically just
take all the traffic that comes in

618
00:32:16,590 --> 00:32:18,665
and just throw it in the trash.
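
As a sketch, a sinkhole is just a switch that decides whether incoming requests are answered or silently dropped. The request labels and the `handle` function below are hypothetical.

```python
def handle(request, sinkhole_active, served, discarded):
    """Route a request either to the real site or into the sinkhole."""
    if sinkhole_active:
        discarded.append(request)   # swallowed: no response is ever sent
    else:
        served.append(request)

served, discarded = [], []
# While the sinkhole is on, everything is dropped, legitimate or not.
for req in ["attack-1", "attack-2", "legit-1"]:
    handle(req, sinkhole_active=True, served=served, discarded=discarded)
# Once the attack trickles off, the sinkhole is lifted.
handle("legit-2", sinkhole_active=False, served=served, discarded=discarded)
print(served, discarded)
```

The legitimate request that arrived during the attack ends up in the discarded list too, which is the trade-off with this technique.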

619
00:32:18,665 --> 00:32:20,665
And then finally, another
technique we could use

620
00:32:20,665 --> 00:32:22,950
is something called packet analysis.

621
00:32:22,950 --> 00:32:27,390
So again, HTTP, we know,
is how requests are made via the web.

622
00:32:27,390 --> 00:32:30,120
And we learned a little
bit that we have headers

623
00:32:30,120 --> 00:32:33,060
that are packaged alongside
those HTTP packets

624
00:32:33,060 --> 00:32:38,010
that say where the request originated
from and where it's going to.

625
00:32:38,010 --> 00:32:40,440
There's a whole lot of
other metadata as well.

626
00:32:40,440 --> 00:32:44,250
You'll know, for example, what type
of browser the individual is using

627
00:32:44,250 --> 00:32:46,290
and what operating system
perhaps they are using

628
00:32:46,290 --> 00:32:50,950
and where, as in sort of a
geographical generalization, are they.

629
00:32:50,950 --> 00:32:52,440
Are they in the US Northeast?

630
00:32:52,440 --> 00:32:55,350
Are they in South America and so on?

631
00:32:55,350 --> 00:32:59,160
Instead of deciding to restrict
traffic via specific ports

632
00:32:59,160 --> 00:33:03,540
or just restrict all traffic, we could
still allow all traffic to come in

633
00:33:03,540 --> 00:33:06,460
but inspect all of the
packets as they come in.

634
00:33:06,460 --> 00:33:09,060
So for example, perhaps most
of the traffic on our site we

635
00:33:09,060 --> 00:33:11,650
are expecting to come from the--

636
00:33:11,650 --> 00:33:13,400
just because I used
that example already--

637
00:33:13,400 --> 00:33:14,700
US Northeast.

638
00:33:14,700 --> 00:33:16,650
And then all of a sudden
we are experiencing

639
00:33:16,650 --> 00:33:20,640
tons of packets coming in that have IP
addresses that all seem to be based--

640
00:33:20,640 --> 00:33:24,050
or they have, as part of
their packets, information

641
00:33:24,050 --> 00:33:25,800
that says that they're
from South America,

642
00:33:25,800 --> 00:33:29,790
or they're from the US West Coast, or
somewhere else that we don't expect.

643
00:33:29,790 --> 00:33:32,430
We can decide, after taking
a quick look at that packet

644
00:33:32,430 --> 00:33:36,240
and analyzing those individual
headers, that I'm not

645
00:33:36,240 --> 00:33:39,240
going to accept any
packets from that location.

646
00:33:39,240 --> 00:33:42,970
The ones that match locations
I'm expecting, I'll let through.

647
00:33:42,970 --> 00:33:45,948
And this, again, might prevent certain
customers from getting through,

648
00:33:45,948 --> 00:33:48,990
certain legitimate customers who might
actually be based in South America

649
00:33:48,990 --> 00:33:50,460
from getting through.

650
00:33:50,460 --> 00:33:54,980
But in general, it's going to
block most of the damaging traffic.
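
Here is a small Python sketch of that kind of inspection. It assumes each packet has already been labeled with a `region` field, which is a simplification for illustration: in practice a location is usually estimated from the source IP address rather than read from the packet. All names and values are hypothetical.

```python
EXPECTED_REGIONS = {"US-Northeast"}

def inspect(packet):
    """Let a packet through only if its (assumed) region label is expected."""
    return packet.get("region") in EXPECTED_REGIONS

traffic = [
    {"src": "192.0.2.10", "region": "US-Northeast", "user_agent": "Firefox"},
    {"src": "203.0.113.5", "region": "South-America", "user_agent": "bot"},
    {"src": "203.0.113.6", "region": "US-West", "user_agent": "bot"},
]
allowed = [p["src"] for p in traffic if inspect(p)]
print(allowed)
```

The same pattern works for any of the metadata mentioned above, such as the browser or operating system, by changing what `inspect` looks at.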

651
00:33:54,980 --> 00:33:57,900
DDoS attacks are really
frustrating for companies

652
00:33:57,900 --> 00:34:01,470
because they really
can do a lot of damage.

653
00:34:01,470 --> 00:34:04,480
Usually the resources of the
company will eventually-- especially

654
00:34:04,480 --> 00:34:08,280
if they're cloud-based and they rely
on their cloud provider to help them

655
00:34:08,280 --> 00:34:12,290
scale up, usually the resources
of the company being attacked

656
00:34:12,290 --> 00:34:14,699
are enough to eventually
overwhelm and stop

657
00:34:14,699 --> 00:34:18,780
the attacker who usually has a
much more limited set of resources.

658
00:34:18,780 --> 00:34:22,570
But again, depending on the type of
business being attacked in this way--

659
00:34:22,570 --> 00:34:25,580
again, think of the example
of Dyn, the DNS provider.

660
00:34:25,580 --> 00:34:27,330
The ramifications for
one of these attacks

661
00:34:27,330 --> 00:34:31,350
can be really quite severe and
really quite annoying and costly

662
00:34:31,350 --> 00:34:34,480
for a business that suffers it.

663
00:34:34,480 --> 00:34:38,050
So we just talked about
HTTP and HTTPS a moment ago

664
00:34:38,050 --> 00:34:40,050
when we were talking about
firewalling, allowing

665
00:34:40,050 --> 00:34:42,790
some traffic on some of the
ports but not other ports,

666
00:34:42,790 --> 00:34:47,290
so maybe allowing HTTPS
traffic but not HTTP traffic.

667
00:34:47,290 --> 00:34:51,120
Let's take a look at these two
technologies in a bit more detail.

668
00:34:51,120 --> 00:34:54,330
So HTTP, again, is the
hypertext transfer protocol.

669
00:34:54,330 --> 00:34:58,530
It is how hypertext or web pages
are transmitted over the internet.

670
00:34:58,530 --> 00:35:04,530
If I am a client and I make a
request to you for some HTML content,

671
00:35:04,530 --> 00:35:08,130
then you as a server would
send a response back to me,

672
00:35:08,130 --> 00:35:11,550
and then I would be able to see
the page that I had requested.

673
00:35:11,550 --> 00:35:17,090
And every HTTP request has a specific
format at the beginning of it.

674
00:35:17,090 --> 00:35:24,560
For example, we might see something
like this, GET /execed HTTP/1.1, host:

675
00:35:24,560 --> 00:35:25,790
law.harvard.edu.

676
00:35:25,790 --> 00:35:28,670
Let's just quickly pick these
apart again one more time.

677
00:35:28,670 --> 00:35:31,910
If you see GET at the
beginning of an HTTP request,

678
00:35:31,910 --> 00:35:36,680
it means please fetch or get
for me, literally, this page.

679
00:35:36,680 --> 00:35:40,970
The page I'm requesting
specifically is /execed.

680
00:35:40,970 --> 00:35:46,520
And the host that I'm asking it from
is, in this case, law.harvard.edu.

681
00:35:46,520 --> 00:35:50,690
So basically what I'm saying
here is please fetch for me,

682
00:35:50,690 --> 00:35:54,120
or retrieve for me, the
HTML content that comprises

683
00:35:54,120 --> 00:36:00,410
http://law.harvard.edu/execed.

684
00:36:00,410 --> 00:36:05,990
And specifically I'm doing this
using HTTP protocol version 1.1.

685
00:36:05,990 --> 00:36:08,270
We're still using
version 1.1 even though I

686
00:36:08,270 --> 00:36:13,250
believe HTTP/2 was standardized
back in 2015.

687
00:36:13,250 --> 00:36:17,030
And basically this is just
HTTP's way of identifying

688
00:36:17,030 --> 00:36:19,040
how you're asking the question.

689
00:36:19,040 --> 00:36:23,540
So it's similar to me making a
request and saying, oh, by the way,

690
00:36:23,540 --> 00:36:26,690
the rest of this request is written
in French, or, oh, by the way,

691
00:36:26,690 --> 00:36:29,630
the rest of this request
is written in Spanish.

692
00:36:29,630 --> 00:36:32,750
It's more like here are
the parameters that you

693
00:36:32,750 --> 00:36:35,150
should expect to see
because this request is

694
00:36:35,150 --> 00:36:39,540
in version 1.1, which differed
non-trivially from version 1.0.

695
00:36:39,540 --> 00:36:45,590
So it's just an identifier for how
exactly we are formatting our request.
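
To see how the pieces of that request fit together, here is a minimal Python sketch that pulls them apart. It handles only this simple shape of request, and `parse_request` is a hypothetical helper, not part of any library.

```python
raw = "GET /execed HTTP/1.1\r\nHost: law.harvard.edu\r\n\r\n"

def parse_request(text):
    """Split a minimal HTTP request into method, path, version, and headers."""
    head = text.split("\r\n\r\n", 1)[0]      # everything before the blank line
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

method, path, version, headers = parse_request(raw)
print(method, path, version, headers["host"])
print("http://" + headers["host"] + path)
```

The last line rebuilds the address by joining the host and the path, which is the same combination spelled out above.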

696
00:36:45,590 --> 00:36:47,950
But HTTP is not encrypted.

697
00:36:47,950 --> 00:36:51,232
And so if we think about
making a request to a server,

698
00:36:51,232 --> 00:36:52,940
if we're the client
on the left and we're

699
00:36:52,940 --> 00:36:56,120
making a request to a server on the
right, it might go something like this.

700
00:36:56,120 --> 00:37:00,530
Because the odds are pretty low
that, if we're making a request,

701
00:37:00,530 --> 00:37:03,350
we are so close to the
server that would serve

702
00:37:03,350 --> 00:37:05,660
that request to us that
it wouldn't need to hop

703
00:37:05,660 --> 00:37:07,480
through any routers along the way.

704
00:37:07,480 --> 00:37:09,410
Remember, routers,
their purpose in life is

705
00:37:09,410 --> 00:37:11,260
to send traffic in the right direction.

706
00:37:11,260 --> 00:37:13,350
And they contain a table
of information that says,

707
00:37:13,350 --> 00:37:15,800
oh, if I'm making a request
to some server over there,

708
00:37:15,800 --> 00:37:18,920
then the best path is to go here,
and then I'll send it over there,

709
00:37:18,920 --> 00:37:20,890
and then it will send it there.

710
00:37:20,890 --> 00:37:23,480
Their job is to optimize
and find the best path

711
00:37:23,480 --> 00:37:26,370
to get the request to
where it needs to be.

712
00:37:26,370 --> 00:37:31,145
So if I'm initiating a request
to, as the client, the server,

713
00:37:31,145 --> 00:37:33,020
it's going to first go
through router A who's

714
00:37:33,020 --> 00:37:35,760
going to say, OK, I'm going to
move it closer to the server

715
00:37:35,760 --> 00:37:38,960
so that it receives that request,
goes to router B, goes to router C.

716
00:37:38,960 --> 00:37:41,900
And eventually router C perhaps
is close enough to the server

717
00:37:41,900 --> 00:37:45,380
that it can just hand
off the request directly.
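
The hand-off from router to router can be sketched as a table of next hops. This is a toy with one fixed path; real routers hold much larger tables and update them as conditions change. `NEXT_HOP` and `trace` are hypothetical names.

```python
# Hypothetical next-hop table: for each node, where to forward traffic
# destined for "server". Names mirror routers A, B, and C in the example.
NEXT_HOP = {
    "client": "router A",
    "router A": "router B",
    "router B": "router C",
    "router C": "server",
}

def trace(start, destination):
    """Follow next hops from `start` until `destination` is reached."""
    path = [start]
    while path[-1] != destination:
        path.append(NEXT_HOP[path[-1]])
    return path

print(" -> ".join(trace("client", "server")))
```

No single router knows the whole route. Each one only knows the next place to send the request.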

718
00:37:45,380 --> 00:37:48,568
The server's then going to get
that request, read it as HTTP/1.1,

719
00:37:48,568 --> 00:37:51,860
look at all the other metadata inside of
the request to see if there's anything

720
00:37:51,860 --> 00:37:55,030
else that it's being asked for, and
then it's going to send the information

721
00:37:55,030 --> 00:37:55,530
back.

722
00:37:55,530 --> 00:37:57,620
And in this example
I'm having it go back

723
00:37:57,620 --> 00:38:00,860
exactly through the same chain
of routers but in reverse.

724
00:38:00,860 --> 00:38:02,540
But in reality, that might be different.

725
00:38:02,540 --> 00:38:04,430
It might not go through
the exact same three

726
00:38:04,430 --> 00:38:06,620
routers in this example in reverse.

727
00:38:06,620 --> 00:38:12,110
It might actually go from C to A to
B, back to A depending on traffic

728
00:38:12,110 --> 00:38:14,780
that's happening on the network
and how congested things are

729
00:38:14,780 --> 00:38:19,310
and whether there might be a new path
that is better in the amount of time

730
00:38:19,310 --> 00:38:23,210
it took to process the
request that I asked for.

731
00:38:23,210 --> 00:38:25,880
But remember, HTTP, not secured.

732
00:38:25,880 --> 00:38:26,720
Not encrypted.

733
00:38:26,720 --> 00:38:29,000
This is plain,
over-the-air communication.

734
00:38:29,000 --> 00:38:33,560
We saw previously, when we
took a look at a screenshot

735
00:38:33,560 --> 00:38:36,530
from a tool called
Wireshark, that it's not

736
00:38:36,530 --> 00:38:41,420
that difficult on an unsecured network
using an unsecured protocol to read,

737
00:38:41,420 --> 00:38:44,150
literally, the contents of
those packets going to and from.

738
00:38:44,150 --> 00:38:46,320
So that's a vulnerability here for sure.

739
00:38:46,320 --> 00:38:48,980
Another vulnerability is
any one of these computers

740
00:38:48,980 --> 00:38:51,060
along the way could be compromised.

741
00:38:51,060 --> 00:38:54,320
So for example, router
A perhaps was infected

742
00:38:54,320 --> 00:38:57,510
by somebody who-- a router
is just a computer as well.

743
00:38:57,510 --> 00:39:00,200
So perhaps it was
infected by an adversary

744
00:39:00,200 --> 00:39:03,950
with some worm that will eventually
make it part of some botnet,

745
00:39:03,950 --> 00:39:07,580
and it'll eventually start
spamming some server somewhere.

746
00:39:07,580 --> 00:39:11,960
If router A is compromised in such a
way that an adversary can just read all

747
00:39:11,960 --> 00:39:14,010
the traffic that flows
through it-- and again,

748
00:39:14,010 --> 00:39:17,780
we're sending all of our traffic
in an unencrypted fashion--

749
00:39:17,780 --> 00:39:21,230
then we have another security
loophole to deal with.

750
00:39:21,230 --> 00:39:27,440
So HTTPS resolves this problem
by securing or encrypting

751
00:39:27,440 --> 00:39:32,150
all of the communications
between a client and a server.

752
00:39:32,150 --> 00:39:33,762
So HTTP requests go to one port.

753
00:39:33,762 --> 00:39:34,970
We talked about that already.

754
00:39:34,970 --> 00:39:36,950
They go to port 80 by convention.

755
00:39:36,950 --> 00:39:40,790
HTTPS requests go to port
443 by convention.
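
The port convention just described can be sketched in a few lines of Python. This is only an illustration: the helper name port_for and the example.com URLs are made up for this note, while the constants HTTP_PORT and HTTPS_PORT come from Python's standard http.client module.

```python
import http.client
from urllib.parse import urlsplit

# Conventional ports: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {
    "http": http.client.HTTP_PORT,
    "https": http.client.HTTPS_PORT,
}


def port_for(url: str) -> int:
    """Return the port a client would connect to for this URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.port is not None:
        # An explicit port in the URL overrides the convention.
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(port_for("http://example.com/"))
print(port_for("https://example.com/"))
```

The ports are only a convention, so a URL that names a port explicitly, such as https://example.com:8443/, is sent there instead.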

756
00:39:40,790 --> 00:39:44,840
In order for HTTPS to
work, the server is

757
00:39:44,840 --> 00:39:52,100
responsible for providing or possessing
a valid what's called an SSL or TLS

758
00:39:52,100 --> 00:39:52,670
certificate.

759
00:39:52,670 --> 00:39:55,550
SSL is actually a
deprecated technology now.

760
00:39:55,550 --> 00:39:58,070
It's been subsumed into TLS.

761
00:39:58,070 --> 00:40:01,580
But typically these things are still
referred to as SSL certificates.

762
00:40:01,580 --> 00:40:04,430
And perhaps you've seen a
screen that looks like this when

763
00:40:04,430 --> 00:40:05,990
you're trying to visit some website.

764
00:40:05,990 --> 00:40:08,240
You get a warning that your
connection is not private.

765
00:40:08,240 --> 00:40:10,970
And at the very end of
that warning, you are

766
00:40:10,970 --> 00:40:13,640
informed that the cert date is invalid.

767
00:40:13,640 --> 00:40:18,900
Basically this just means that
their SSL certificate has expired.

768
00:40:18,900 --> 00:40:21,510
Now, what is an SSL certificate?

769
00:40:21,510 --> 00:40:27,000
So there are services that work
alongside the internet called

770
00:40:27,000 --> 00:40:28,020
certificate authorities.

771
00:40:28,020 --> 00:40:32,520
And like GlobalSign, for example,
from whom I borrowed the screenshots--

772
00:40:32,520 --> 00:40:35,280
GoDaddy, who is also a very
popular domain name provider,

773
00:40:35,280 --> 00:40:37,780
is also a certificate authority.

774
00:40:37,780 --> 00:40:42,600
And what they do is they verify
that a particular website owns

775
00:40:42,600 --> 00:40:44,270
a particular private key--

776
00:40:44,270 --> 00:40:48,230
or excuse me, a particular public key
which has a corresponding private key.

777
00:40:48,230 --> 00:40:49,980
And the way they do
that is they digitally

778
00:40:49,980 --> 00:40:51,928
sign something to the
certificate authority.

779
00:40:51,928 --> 00:40:54,720
The certificate authority then goes
through those exact same checks

780
00:40:54,720 --> 00:40:56,595
that we've seen before
for digital signatures

781
00:40:56,595 --> 00:40:59,460
to verify that, yes, this
person must own this public key.

782
00:40:59,460 --> 00:41:03,810
And the idea for this
is we're trusting that,

783
00:41:03,810 --> 00:41:06,750
when I send a communication
to you as the website

784
00:41:06,750 --> 00:41:12,120
owner using the public key that you
say is yours, then it really is yours.

785
00:41:12,120 --> 00:41:16,110
There really is somebody out
there or some third party

786
00:41:16,110 --> 00:41:19,530
that we've decided to collectively
trust, the certificate authority, who

787
00:41:19,530 --> 00:41:20,670
is going to verify this.
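
As a rough illustration of a client leaning on certificate authorities, here is a minimal sketch using Python's standard ssl module. It only inspects the default client settings and makes no network connection; the variable names are chosen for this note.

```python
import ssl

# A default client-side context trusts the certificate authorities
# installed on this machine.
context = ssl.create_default_context()

# By default the server must present a certificate that chains to a trusted CA...
certificate_required = context.verify_mode == ssl.CERT_REQUIRED

# ...and the certificate must match the host name being visited.
hostname_checked = context.check_hostname

print(certificate_required, hostname_checked)
```

A connection made with this context is refused if the certificate is expired or is not vouched for by a trusted authority, which is the situation behind the browser warning described earlier.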

788
00:41:20,670 --> 00:41:23,100
Now, why does this matter?

789
00:41:23,100 --> 00:41:27,570
Why do we need to verify that someone's
public key is what they say it is?

790
00:41:27,570 --> 00:41:31,032
Well, it turns out that this
idea of asymmetric encryption,

791
00:41:31,032 --> 00:41:33,990
or public and private key cryptography
that we've previously discussed,

792
00:41:33,990 --> 00:41:38,520
does form part of the core of HTTPS.

793
00:41:38,520 --> 00:41:43,200
But as we'll see in a moment, we don't
actually use public and private keys

794
00:41:43,200 --> 00:41:47,100
to communicate except at the very,
very beginning of our interaction

795
00:41:47,100 --> 00:41:52,680
with some site when we are using HTTPS.

796
00:41:52,680 --> 00:41:56,370
So the way this really
happens underneath the hood

797
00:41:56,370 --> 00:42:00,780
is via the Secure Sockets Layer, SSL,
which is now known as the Transport

798
00:42:00,780 --> 00:42:02,950
Layer Security protocol overall.

799
00:42:02,950 --> 00:42:06,270
There's other things that are folded
into it, but SSL is part of it.

800
00:42:06,270 --> 00:42:09,210
And this is what happens.

801
00:42:09,210 --> 00:42:14,970
When I am requesting a page from
you, and you are the server,

802
00:42:14,970 --> 00:42:18,540
and I am requesting this
via HTTPS, I am going

803
00:42:18,540 --> 00:42:22,800
to initially make a request using
the public key that I believe

804
00:42:22,800 --> 00:42:24,780
is yours because the
certificate authority has

805
00:42:24,780 --> 00:42:30,395
vouched for you, saying that I would
like to make an encrypted request.

806
00:42:30,395 --> 00:42:32,520
And I don't want to send
that request over the air.

807
00:42:32,520 --> 00:42:34,145
I don't want to send that in the clear.

808
00:42:34,145 --> 00:42:37,110
I want to send it to you using the
encryption that you say is yours.

809
00:42:37,110 --> 00:42:41,160
So I send a request to you,
encrypting it using your public key.

810
00:42:41,160 --> 00:42:42,180
You receive the request.

811
00:42:42,180 --> 00:42:45,150
You decrypt it using your private key.

812
00:42:45,150 --> 00:42:48,900
You see, OK, I see now that Doug
wants to initiate a request with me,

813
00:42:48,900 --> 00:42:51,300
and you're going to fulfill the request.

814
00:42:51,300 --> 00:42:53,610
But you're also going
to do one other thing.

815
00:42:53,610 --> 00:42:57,420
You're going to set a key.

816
00:42:57,420 --> 00:43:00,270
And you're going to
send me back a key, not

817
00:43:00,270 --> 00:43:04,322
your public or private key, a different
key, alongside the request that I made.

818
00:43:04,322 --> 00:43:06,780
And you're going to send it
back to me using my public key.

819
00:43:06,780 --> 00:43:10,620
So the initial volley of communications
back and forth between us

820
00:43:10,620 --> 00:43:13,230
is the same as any other
encrypted communication

821
00:43:13,230 --> 00:43:16,140
using public and private keys
that we've previously seen.

822
00:43:16,140 --> 00:43:18,270
I send a message to you
using your public key.

823
00:43:18,270 --> 00:43:20,040
You decrypt it using your private key.

824
00:43:20,040 --> 00:43:26,340
You respond to me using my public key,
and I decrypt it using my private key.

825
00:43:26,340 --> 00:43:28,260
But this is really slow.

826
00:43:28,260 --> 00:43:34,780
If we're just having communications back
and forth via mail or even via text,

827
00:43:34,780 --> 00:43:39,210
the difference of a few
milliseconds is immaterial.

828
00:43:39,210 --> 00:43:41,450
We don't really notice it.

829
00:43:41,450 --> 00:43:44,757
But on the web, we do
notice it, especially

830
00:43:44,757 --> 00:43:46,590
if we're making multiple
requests or there's

831
00:43:46,590 --> 00:43:49,680
multiple packets going back and
forth and every single one of them

832
00:43:49,680 --> 00:43:51,520
needs to be encrypted.

833
00:43:51,520 --> 00:43:55,650
So beyond this initial volley,
public and private key encryption

834
00:43:55,650 --> 00:44:01,360
is no longer used,
because it's too slow.

835
00:44:01,360 --> 00:44:03,610
We would notice it if we did.

836
00:44:03,610 --> 00:44:09,150
Instead, as I mentioned, the server
is going to respond with a key.

837
00:44:09,150 --> 00:44:11,205
And that key is the key to a cipher.

838
00:44:11,205 --> 00:44:14,910
And we've talked about ciphers before
and we know that they are reversible.

839
00:44:14,910 --> 00:44:19,350
The particular cipher in question
here is something called AES.

840
00:44:19,350 --> 00:44:20,520
But it is just a cipher.

841
00:44:20,520 --> 00:44:21,960
It is reversible.

842
00:44:21,960 --> 00:44:24,360
And the key that you
receive is the key that you

843
00:44:24,360 --> 00:44:28,410
are supposed to use to decrypt
all future communications.

844
00:44:28,410 --> 00:44:30,060
This key is called the session key.

845
00:44:30,060 --> 00:44:33,360
And you use it to decrypt
all future communications

846
00:44:33,360 --> 00:44:37,230
and use it to encrypt all future
communications to the server

847
00:44:37,230 --> 00:44:40,350
until the session,
so-called, is terminated.

848
00:44:40,350 --> 00:44:43,320
And the session is basically
as long as you're on the site

849
00:44:43,320 --> 00:44:46,770
and you haven't logged
out or closed the window.

850
00:44:46,770 --> 00:44:48,240
That is the idea of a session.

851
00:44:48,240 --> 00:44:53,685
It is one singular
experience with a page

852
00:44:53,685 --> 00:44:57,750
or with a set of pages that are
all part of the same domain name.

853
00:44:57,750 --> 00:45:00,960
We're just going to use a cipher for
the rest of the time that we talk.
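
The session-key idea can be sketched with a toy example. Loud caveat: this is NOT AES and it is not secure. It is a stand-in built from SHA-256 and XOR so that it runs with nothing but the Python standard library, and the names keystream and toy_cipher are invented for this note. It shows only the property the lecture relies on: one shared key, and the same fast, reversible operation in both directions.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice undoes it."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# The server picks a fresh key for this session only.
session_key = secrets.token_bytes(32)

message = b"GET /account HTTP/1.1"
ciphertext = toy_cipher(session_key, message)    # what travels over the network
recovered = toy_cipher(session_key, ciphertext)  # the other side, same key

print(recovered == message)
```

Because the key is thrown away when the session ends, a leaked key exposes only that one session's traffic.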

854
00:45:00,960 --> 00:45:03,932
Now, this may seem
insecure for reasons we've

855
00:45:03,932 --> 00:45:05,640
talked about when we
talked about ciphers

856
00:45:05,640 --> 00:45:07,470
and how they are inherently flawed.

857
00:45:07,470 --> 00:45:10,470
Recall that when we were talking about
some of the really early ciphers,

858
00:45:10,470 --> 00:45:13,090
those are classic ciphers
like Caesar and Vigenere,

859
00:45:13,090 --> 00:45:14,430
those are very easy to break.

860
00:45:14,430 --> 00:45:17,630
AES is much more complex than that.

861
00:45:17,630 --> 00:45:22,080
And the other upside is that
this key, like I mentioned,

862
00:45:22,080 --> 00:45:23,910
is only good for a session.

863
00:45:23,910 --> 00:45:29,040
So in the unlikely event that the server
chooses a bad key, for example, if we

864
00:45:29,040 --> 00:45:32,490
think about it as if it was Caesar,
if they choose a key of zero,

865
00:45:32,490 --> 00:45:35,240
which would be a very bad key, one
that doesn't actually

866
00:45:35,240 --> 00:45:40,113
shift the letters at all, even
if the key is compromised,

867
00:45:40,113 --> 00:45:41,780
it's only good for a particular session.

868
00:45:41,780 --> 00:45:44,240
That's not a very long amount of time.

869
00:45:44,240 --> 00:45:47,240
But the upside is the
ability to encipher

870
00:45:47,240 --> 00:45:49,520
and decipher information is much faster.

871
00:45:49,520 --> 00:45:53,390
If it's reversible, it's pretty quick
to do some mathematical manipulation

872
00:45:53,390 --> 00:45:57,140
and transform it into something
that looks obscured and gibberish

873
00:45:57,140 --> 00:45:59,240
and to undo that as well.

874
00:45:59,240 --> 00:46:03,020
And so even though public
and private keys are--

875
00:46:03,020 --> 00:46:05,780
we consider effectively
unbreakable, like to the point

876
00:46:05,780 --> 00:46:10,040
of it's mathematically untenable
to crack a message using

877
00:46:10,040 --> 00:46:11,510
public and private key encryption.

878
00:46:11,510 --> 00:46:16,010
We don't rely on it for SSL because
it is impractical to actually expect

879
00:46:16,010 --> 00:46:17,450
communications to go that slowly.

880
00:46:17,450 --> 00:46:19,610
And so we do fall back on these ciphers.

881
00:46:19,610 --> 00:46:24,260
And that really is when you're using
secured encrypted communication

882
00:46:24,260 --> 00:46:26,270
via HTTPS.

883
00:46:26,270 --> 00:46:27,980
You're just relying
on a cipher that just

884
00:46:27,980 --> 00:46:31,700
happens to be a very, very fancy
cipher that should hypothetically

885
00:46:31,700 --> 00:46:36,060
be very difficult to figure
out the key to as well.

886
00:46:36,060 --> 00:46:40,280
You may have also seen a few changes
in your browser, especially recently.

887
00:46:40,280 --> 00:46:42,170
This screenshot shows
a couple of changes

888
00:46:42,170 --> 00:46:48,080
that are designed to warn you when
you are not using HTTPS encryption.

889
00:46:48,080 --> 00:46:51,980
And it's not necessary to use
HTTPS for every interaction you

890
00:46:51,980 --> 00:46:53,480
have on the internet.

891
00:46:53,480 --> 00:46:56,750
For example, if you are going to a
site that is purely informational,

892
00:46:56,750 --> 00:47:00,900
it's just static content, it's just a
list of information, there's no login,

893
00:47:00,900 --> 00:47:05,190
there's no buying, there's no clicking
on things that might then get tracked,

894
00:47:05,190 --> 00:47:08,280
for example, it's not really
necessary to use HTTPS.

895
00:47:08,280 --> 00:47:11,630
So don't be necessarily
alarmed if you visit a site

896
00:47:11,630 --> 00:47:14,180
and you're warned it's not secure.

897
00:47:14,180 --> 00:47:17,480
We're told that over time this will
turn red and become perhaps even

898
00:47:17,480 --> 00:47:19,950
more concerning as more
versions of this come out

899
00:47:19,950 --> 00:47:23,850
and as more and more adopters
of HTTPS exist as well.

900
00:47:23,850 --> 00:47:25,850
But you're going to start
getting notifications.

901
00:47:25,850 --> 00:47:27,725
And you may have seen
these as well in green.

902
00:47:27,725 --> 00:47:29,870
If you are using HTTPS and
you log into something,

903
00:47:29,870 --> 00:47:33,120
you'll see a little lock icon here
and you'll be told that it is secure.

904
00:47:33,120 --> 00:47:35,570
And again, this is just
because human beings

905
00:47:35,570 --> 00:47:40,460
tend not to be as concerned
about their digital privacy

906
00:47:40,460 --> 00:47:43,430
and their digital security
when using the internet.

907
00:47:43,430 --> 00:47:48,260
And now the technology is
trying to provide clues and tips

908
00:47:48,260 --> 00:47:54,880
to entice you to be more
concerned about these things.

909
00:47:54,880 --> 00:47:57,330
Now let's take a look
at a couple of attacks

910
00:47:57,330 --> 00:47:59,640
that are derived from
things we typically consider

911
00:47:59,640 --> 00:48:02,130
to be advantages of using the internet.

912
00:48:02,130 --> 00:48:07,050
The first of these is the idea
of cross-site scripting, XSS.

913
00:48:07,050 --> 00:48:09,450
We've previously discussed
this idea of the distinction

914
00:48:09,450 --> 00:48:11,700
between server-side code
and client-side code.

915
00:48:11,700 --> 00:48:14,400
Client-side code, recall, is
something that runs locally

916
00:48:14,400 --> 00:48:16,710
on our computer where
our browser, for example,

917
00:48:16,710 --> 00:48:19,380
is expected to interpret
and execute that code.

918
00:48:19,380 --> 00:48:22,000
Server-side code is run on the server.

919
00:48:22,000 --> 00:48:25,060
And when we get
information from a server,

920
00:48:25,060 --> 00:48:27,630
we're not getting back
the actual lines of code.

921
00:48:27,630 --> 00:48:31,028
We're getting back the output of that
code having run in the first place.

922
00:48:31,028 --> 00:48:34,320
So for example, there might be some code
on the server, some Python code or PHP

923
00:48:34,320 --> 00:48:38,220
code that generates HTML for us.

924
00:48:38,220 --> 00:48:42,570
The actual Python or PHP code in this
example would be server-side code.

925
00:48:42,570 --> 00:48:44,430
We don't actually ever see that code.

926
00:48:44,430 --> 00:48:46,890
We only see the output of that code.

927
00:48:46,890 --> 00:48:50,550
A cross-site script
vulnerability exists when

928
00:48:50,550 --> 00:48:57,180
an adversary is able to trick a client's
browser to run something locally.

929
00:48:57,180 --> 00:49:01,860
And it will do something that
presumably the person, the client,

930
00:49:01,860 --> 00:49:04,965
didn't actually intend to do.

931
00:49:04,965 --> 00:49:07,590
Let's take a look at an example
of this using a very simple web

932
00:49:07,590 --> 00:49:09,150
server called Flask.

933
00:49:09,150 --> 00:49:10,575
We have here some Python code.

934
00:49:10,575 --> 00:49:13,200
And don't be too worried if this
doesn't all make sense to you.

935
00:49:13,200 --> 00:49:20,050
It's just a pretty short, simple
web server that does two things.

936
00:49:20,050 --> 00:49:22,170
So this is just some
bookkeeping stuff in Flask.

937
00:49:22,170 --> 00:49:26,460
And Flask is a package of Python
that is used to create web servers.

938
00:49:26,460 --> 00:49:29,100
This web server has two
things, though, that it does.

939
00:49:29,100 --> 00:49:34,350
The first is when I visit
slash on my web server--

940
00:49:34,350 --> 00:49:36,750
so let's say this is Doug's site.

941
00:49:36,750 --> 00:49:41,912
If I go to dougssite.com/, which you may
not actually explicitly type anymore

942
00:49:41,912 --> 00:49:43,620
but most browsers just
add it, slash just

943
00:49:43,620 --> 00:49:47,730
means the root page of your server.

944
00:49:47,730 --> 00:49:50,430
I'm going to call the following
function whose name happens

945
00:49:50,430 --> 00:49:52,440
to be called index in this case.

946
00:49:52,440 --> 00:49:53,970
Return hello world.

947
00:49:53,970 --> 00:49:58,770
And what this basically means
is if I visit dougspage.com/,

948
00:49:58,770 --> 00:50:05,730
what I receive is an HTML page
whose content is just hello world.

949
00:50:05,730 --> 00:50:09,060
So it's just an HTML file
that says hello world.

950
00:50:09,060 --> 00:50:11,730
Again, this code here
is all server-side code.

951
00:50:11,730 --> 00:50:14,130
You don't actually see this code.

952
00:50:14,130 --> 00:50:18,933
You only see the output of this
code, which is this here, this HTML.

953
00:50:18,933 --> 00:50:21,100
It's just a simple string
in this case, but it would

954
00:50:21,100 --> 00:50:25,080
be interpreted by the browser as HTML.

955
00:50:25,080 --> 00:50:27,920
If, however, I get a 404--

956
00:50:27,920 --> 00:50:31,470
a 404 is a not found error. It means
the page I requested doesn't exist.

957
00:50:31,470 --> 00:50:35,370
And since I've only defined the
behavior for literally one page,

958
00:50:35,370 --> 00:50:41,790
slash the index page of my server, then
I want to call this function not found.

959
00:50:41,790 --> 00:50:46,590
Return not found plus whatever
page I tried to visit.

960
00:50:46,590 --> 00:50:50,550
So it basically is another very simple
page, much like hello world here,

961
00:50:50,550 --> 00:50:53,980
where instead of saying hello
world, it says not found.

962
00:50:53,980 --> 00:50:57,560
And then it also concatenates onto
the very end of that whatever page

963
00:50:57,560 --> 00:50:59,760
I tried to visit.

964
00:50:59,760 --> 00:51:03,960
This is a major cross-site
scripting vulnerability.

965
00:51:03,960 --> 00:51:05,640
And let's see why.

966
00:51:05,640 --> 00:51:10,920
Let's imagine I go to
/foo, so dougspage.com/foo.

967
00:51:10,920 --> 00:51:14,130
Recall that our error handler function,
which I've reproduced down here,

968
00:51:14,130 --> 00:51:17,330
will return not found /foo.

969
00:51:17,330 --> 00:51:18,330
Seems pretty reasonable.

970
00:51:18,330 --> 00:51:22,260
It seems like the behavior I
expected or intended to have happen.

971
00:51:22,260 --> 00:51:24,970
But what about if I go
to a page like this one?

972
00:51:24,970 --> 00:51:29,490
So this is what I literally type in the
browser, dougspage.com/ angle bracket,

973
00:51:29,490 --> 00:51:36,450
script, angle bracket alert(hi)
and then a closed script tag there.

974
00:51:36,450 --> 00:51:42,770
This script here, script
here, looks a lot like HTML.

975
00:51:42,770 --> 00:51:47,640
And in fact, when the browser sees
this, it will interpret it as HTML.

976
00:51:47,640 --> 00:51:53,340
And so I will get returned by visiting
this page not found and then everything

977
00:51:53,340 --> 00:51:57,150
here except for the
leading slash, which means

978
00:51:57,150 --> 00:52:02,550
that when I receive this and my
client is interpreting the HTML,

979
00:52:02,550 --> 00:52:05,502
I'm going to generate an alert.

980
00:52:05,502 --> 00:52:06,210
What is an alert?

981
00:52:06,210 --> 00:52:09,025
Well, if you've ever gone to a
website and had a pop-up box display

982
00:52:09,025 --> 00:52:11,400
some information, you have to
click OK or click X to make

983
00:52:11,400 --> 00:52:13,590
it go away, that's what an alert is.

984
00:52:13,590 --> 00:52:16,350
So I visit this page on
my website, I've actually

985
00:52:16,350 --> 00:52:21,330
tricked my browser into
giving me a JavaScript alert,

986
00:52:21,330 --> 00:52:23,850
or I've tricked whoever
visits this page's browser

987
00:52:23,850 --> 00:52:26,070
to give me a JavaScript alert.

988
00:52:26,070 --> 00:52:29,980
So that's probably not
exactly a good thing.
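
The flaw can be sketched without Flask at all. The function below is a hypothetical stand-in for the error handler described above, not the code from the slide: it simply glues whatever path was requested onto the page it sends back.

```python
def not_found(requested_path: str) -> str:
    """Build the body of the 404 page (deliberately vulnerable, for illustration)."""
    # The requested path is concatenated into the page unchanged.
    return "Not Found: " + requested_path


harmless = not_found("foo")
malicious = not_found("<script>alert('hi')</script>")

print(harmless)
print(malicious)
```

The second page contains a real script tag, so a browser rendering it as HTML would run the alert rather than display the text.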

989
00:52:29,980 --> 00:52:33,540
But it can get a little bit
more nefarious than that.

990
00:52:33,540 --> 00:52:36,670
Let's instead imagine-- instead
of having this be on my server,

991
00:52:36,670 --> 00:52:41,250
it might be easier to imagine it
like this, that this is what I wrote.

992
00:52:41,250 --> 00:52:45,698
This script tag here's what I wrote
into my Facebook profile, for example.

993
00:52:45,698 --> 00:52:48,240
So Facebook gives you the ability
to write a short little bio

994
00:52:48,240 --> 00:52:49,500
about yourself.

995
00:52:49,500 --> 00:52:54,927
Let's imagine that my bio was this
script document.write, image source,

996
00:52:54,927 --> 00:52:56,760
and then I have a hacker
URL and everything.

997
00:52:56,760 --> 00:52:58,760
And imagine that I own hacker URL.

998
00:52:58,760 --> 00:53:04,800
So I own hacker URL and I wrote
this in my Facebook profile.

999
00:53:04,800 --> 00:53:08,010
Assuming that Facebook did not
defend against cross-site scripting

1000
00:53:08,010 --> 00:53:11,740
attacks, which they do, but
assuming that they did not,

1001
00:53:11,740 --> 00:53:15,540
anytime somebody visited
my profile, their browser

1002
00:53:15,540 --> 00:53:19,810
would be forced to contend
with this script tag here.

1003
00:53:19,810 --> 00:53:20,310
Why?

1004
00:53:20,310 --> 00:53:22,590
Because they're trying
to visit my profile page.

1005
00:53:22,590 --> 00:53:26,610
My profile page contains
literally these characters which

1006
00:53:26,610 --> 00:53:29,540
are going to be interpreted as HTML.

1007
00:53:29,540 --> 00:53:33,990
And it's going to add document.write--
that's a JavaScript way of saying add

1008
00:53:33,990 --> 00:53:38,490
the following line in addition
to the HTML of the page--

1009
00:53:38,490 --> 00:53:44,700
image source equals hacker
url?cookie= and then document.cookie.

1010
00:53:44,700 --> 00:53:48,210
So imagine that I, again,
control hacker URL.

1011
00:53:48,210 --> 00:53:50,730
Presumably, as somebody
who is running a website,

1012
00:53:50,730 --> 00:53:54,810
I also maintain logs of every time
somebody tries to access my website,

1013
00:53:54,810 --> 00:53:57,960
what page on my site
they're trying to visit.

1014
00:53:57,960 --> 00:54:00,690
If somebody goes to my Facebook
profile and executes this,

1015
00:54:00,690 --> 00:54:06,270
I'm going to get notified via my hacker
URL logs that somebody has tried to go

1016
00:54:06,270 --> 00:54:12,560
to that page ?cookie=
and then document.cookie.

1017
00:54:12,560 --> 00:54:14,910
Now, document.cookie in
this case, because this

1018
00:54:14,910 --> 00:54:21,670
exists on my Facebook profile, is
an individual's cookie for Facebook.

1019
00:54:21,670 --> 00:54:24,000
So here what I am
doing-- again, Facebook

1020
00:54:24,000 --> 00:54:26,310
does defend against
cross-site scripting attacks,

1021
00:54:26,310 --> 00:54:28,230
so this can't actually
happen on Facebook.

1022
00:54:28,230 --> 00:54:31,980
But assuming that they did not
defend against them adequately,

1023
00:54:31,980 --> 00:54:36,210
what I'm basically doing
is getting told via my log

1024
00:54:36,210 --> 00:54:38,520
that somebody tried to
visit some page on my URL,

1025
00:54:38,520 --> 00:54:41,400
but the page that they
tried to visit, I'm

1026
00:54:41,400 --> 00:54:46,170
plugging in and basically stealing
the cookie that they use for Facebook.

1027
00:54:46,170 --> 00:54:48,873
And a cookie, recall, is
sort of like a hand stamp.

1028
00:54:48,873 --> 00:54:50,790
It's basically me, instead
of having to re-log

1029
00:54:50,790 --> 00:54:53,602
into Facebook every time I want
to use it, going up to Facebook

1030
00:54:53,602 --> 00:54:54,310
and saying, here.

1031
00:54:54,310 --> 00:54:56,070
You've already verified my identity.

1032
00:54:56,070 --> 00:54:59,040
Just take a look at
this, and you get let in.

1033
00:54:59,040 --> 00:55:04,920
And now I hypothetically know
someone else's Facebook cookie.

1034
00:55:04,920 --> 00:55:07,890
And if I was clever, I
could try and use that

1035
00:55:07,890 --> 00:55:12,060
to change what my Facebook cookie
is to that person's Facebook cookie.

1036
00:55:12,060 --> 00:55:17,220
And then suddenly I'm able to log in
and view their profile and act as them.

1037
00:55:17,220 --> 00:55:19,290
This image tag here
is just a clever trick

1038
00:55:19,290 --> 00:55:24,150
because the idea is that it's trying
to pull some resource from my site.

1039
00:55:24,150 --> 00:55:25,060
It doesn't exist.

1040
00:55:25,060 --> 00:55:27,270
I don't have a list of all
the cookies on Facebook.

1041
00:55:27,270 --> 00:55:32,040
But I'm being told that somebody is
trying to access this URL on my site.

1042
00:55:32,040 --> 00:55:34,950
So the image tag is just
sort of a trick to force

1043
00:55:34,950 --> 00:55:38,760
it to log something on my hacker URL.

1044
00:55:38,760 --> 00:55:43,170
But the idea here is that I would
be able to steal somebody's Facebook

1045
00:55:43,170 --> 00:55:47,610
cookie where this attack's
not well-defended against.

1046
00:55:47,610 --> 00:55:51,960
So what techniques can we
use either for our own sites

1047
00:55:51,960 --> 00:55:55,980
when we are running to avoid
cross-site scripting vulnerabilities

1048
00:55:55,980 --> 00:56:01,270
or to protect against cross-site
scripting vulnerabilities?

1049
00:56:01,270 --> 00:56:04,770
The first technique that we can
use is to sanitize, so to speak,

1050
00:56:04,770 --> 00:56:08,400
all of the inputs that
come in to our page.

1051
00:56:08,400 --> 00:56:10,610
So let's take a look at how
exactly we might do this.

1052
00:56:10,610 --> 00:56:13,500
So it turns out that
there are things called

1053
00:56:13,500 --> 00:56:19,080
HTML entities, which are other ways of
representing certain characters in HTML

1054
00:56:19,080 --> 00:56:22,950
that might be considered special or
control characters, so things like,

1055
00:56:22,950 --> 00:56:26,460
for example, this or this.

1056
00:56:26,460 --> 00:56:29,610
Typically, when a browser
sees a character left

1057
00:56:29,610 --> 00:56:31,770
angle bracket or right
angle bracket, it's

1058
00:56:31,770 --> 00:56:37,740
going to automatically interpret that as
some HTML that it should then process.

1059
00:56:37,740 --> 00:56:39,930
So in the example I just
showed a moment ago,

1060
00:56:39,930 --> 00:56:44,130
I was using the fact that whenever
it sees angle brackets with script

1061
00:56:44,130 --> 00:56:47,050
around it, they're going to
try and interpret whatever

1062
00:56:47,050 --> 00:56:49,470
is between those tags as a script.

1063
00:56:49,470 --> 00:56:52,920
One way for me to prevent that
from being interpreted as a script

1064
00:56:52,920 --> 00:56:58,800
is to call this or call this something
else other than just left angle bracket

1065
00:56:58,800 --> 00:57:00,130
and right angle bracket.

1066
00:57:00,130 --> 00:57:03,780
And it turns out that there are these
things called HTML entities that

1067
00:57:03,780 --> 00:57:08,250
can be used to refer to
these characters instead,

1068
00:57:08,250 --> 00:57:13,440
such that if I sanitize
my input in such a way

1069
00:57:13,440 --> 00:57:20,278
that every time somebody literally
typed the character left angle bracket,

1070
00:57:20,278 --> 00:57:23,070
I had written some code that
automatically took that and changed it

1071
00:57:23,070 --> 00:57:25,470
into ampersand lt;.

1072
00:57:25,470 --> 00:57:29,440
And then every time somebody
wrote a greater than character,

1073
00:57:29,440 --> 00:57:35,670
or right angle bracket, I changed
that in the code to ampersand gt;.

1074
00:57:35,670 --> 00:57:40,170
Then when my page was responsible for
processing or interpreting something,

1075
00:57:40,170 --> 00:57:44,640
it wouldn't interpret this-- it would
still display this character as a left

1076
00:57:44,640 --> 00:57:47,580
angle bracket or less than-- that's
what the lt stands for here--

1077
00:57:47,580 --> 00:57:49,290
or a right angle bracket, greater than.

1078
00:57:49,290 --> 00:57:52,210
That's what the gt stands for there.

1079
00:57:52,210 --> 00:57:55,960
It would literally just show those
characters and not treat them as HTML.

1080
00:57:55,960 --> 00:58:00,030
So that's the idea of what it means
to sanitize input when we're talking

1081
00:58:00,030 --> 00:58:04,510
about HTML entities, for example.
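
In Python, this kind of sanitizing can be sketched with the standard library's html.escape function. The handler name not_found_safe is hypothetical and stands in for the error handler from the earlier example.

```python
import html


def not_found_safe(requested_path: str) -> str:
    """Build the 404 page body with the user-supplied path escaped."""
    # html.escape turns < into &lt; and > into &gt; (and also escapes &
    # and quotes), so the browser shows the characters instead of
    # treating them as markup.
    return "Not Found: " + html.escape(requested_path)


page = not_found_safe("<script>alert('hi')</script>")
print(page)
```

The visitor now sees the literal text of the script tag on the page, and nothing is executed.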

1082
00:58:04,510 --> 00:58:08,160
Another thing that we could do is
just disable JavaScript entirely.

1083
00:58:08,160 --> 00:58:10,290
This would have some
upsides and some downsides.

1084
00:58:10,290 --> 00:58:13,440
The upside is you're pretty protected
against cross-site scripting

1085
00:58:13,440 --> 00:58:17,820
vulnerabilities because they're usually
going to be introduced via JavaScript.

1086
00:58:17,820 --> 00:58:20,100
The downside is JavaScript
is pretty convenient.

1087
00:58:20,100 --> 00:58:20,670
It's nice.

1088
00:58:20,670 --> 00:58:22,770
It makes for a better user experience.

1089
00:58:22,770 --> 00:58:24,930
Sometimes there might
be parts of our page

1090
00:58:24,930 --> 00:58:29,040
that just don't work if
JavaScript is completely disabled,

1091
00:58:29,040 --> 00:58:30,540
and so trade-offs there.

1092
00:58:30,540 --> 00:58:33,360
You're protecting yourself,
but you might be doing

1093
00:58:33,360 --> 00:58:37,050
other sorts of non-material damage.

1094
00:58:37,050 --> 00:58:40,142
Or we could decide to just handle
the JavaScript in a special way.

1095
00:58:40,142 --> 00:58:41,850
So for example, we
might not allow what's

1096
00:58:41,850 --> 00:58:44,940
called inline JavaScript, for
example, like the script tags

1097
00:58:44,940 --> 00:58:46,470
that I just showed a moment ago.

1098
00:58:46,470 --> 00:58:50,010
But we might allow JavaScript
written in separate JavaScript files

1099
00:58:50,010 --> 00:58:52,870
which can also be linked
into your HTML pages.

1100
00:58:52,870 --> 00:58:56,280
So those would be allowed, but inline
JavaScript, like what we just saw,

1101
00:58:56,280 --> 00:58:57,690
would not be allowed.

1102
00:58:57,690 --> 00:59:01,890
We could sandbox the JavaScript and
run it separately somewhere else first

1103
00:59:01,890 --> 00:59:06,210
to see if it does something weird,
and if it doesn't do something weird,

1104
00:59:06,210 --> 00:59:08,580
then allow it to be displayed.

1105
00:59:08,580 --> 00:59:12,390
We could also execute the
content security policy.

1106
00:59:12,390 --> 00:59:15,570
Content security policy
is another header

1107
00:59:15,570 --> 00:59:20,370
that we can add to our HTML
pages or HTTP responses.

1108
00:59:20,370 --> 00:59:22,350
And we can define certain
behavior to happen

1109
00:59:22,350 --> 00:59:25,800
such that we'll allow certain lines or
certain types of JavaScript through

1110
00:59:25,800 --> 00:59:28,167
but not others.
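To make the header idea concrete, here is a small Python sketch that attaches a Content-Security-Policy header to a dictionary of response headers. The helper name with_csp and the plain dictionary standing in for an HTTP response are assumptions for illustration; a real web framework would set the header through its own response object.

```python
def with_csp(headers):
    """Return a copy of the response headers with a restrictive script policy."""
    secured = dict(headers)
    # script-src 'self' permits scripts loaded from files on the page's own
    # origin; because 'unsafe-inline' is absent, inline <script> blocks are blocked.
    secured["Content-Security-Policy"] = "script-src 'self'"
    return secured


response_headers = with_csp({"Content-Type": "text/html; charset=utf-8"})
print(response_headers["Content-Security-Policy"])
```

This matches the trade-off described above: linked JavaScript files keep working, while inline script is refused by the browser.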

1111
00:59:28,167 --> 00:59:30,000
Now, there's another
type of attack that can

1112
00:59:30,000 --> 00:59:34,800
be used that relies heavily on the fact
that we use cookies so extensively,

1113
00:59:34,800 --> 00:59:40,650
and that is a cross-site
request forgery, or a CSRF.

1114
00:59:40,650 --> 00:59:43,680
Now, cross-site scripting
attacks generally

1115
00:59:43,680 --> 00:59:48,840
involve receiving some content
and the client's browser

1116
00:59:48,840 --> 00:59:53,610
being tricked into doing something
locally that it didn't want to do.

1117
00:59:53,610 --> 00:59:58,170
In a CSRF request, or
CSRF attack, rather,

1118
00:59:58,170 --> 01:00:02,430
the trick is we're relying
on the fact that there

1119
01:00:02,430 --> 01:00:04,980
is a cookie that can
be exploited to make

1120
01:00:04,980 --> 01:00:11,595
an outbound request, an outbound HTTP
request that we did not intend to make.

1121
01:00:11,595 --> 01:00:13,470
And again, this relies
extensively on cookies

1122
01:00:13,470 --> 01:00:18,300
because they are this shorthand,
short-form way to log into something.

1123
01:00:18,300 --> 01:00:22,230
And we can make a fraudulent
request appear legitimate

1124
01:00:22,230 --> 01:00:24,480
if we can rely on someone's cookie.

1125
01:00:24,480 --> 01:00:28,110
Now, again, if you ever use
a cloud service for example,

1126
01:00:28,110 --> 01:00:31,560
they're going to have CSRF
defenses built into them.

1127
01:00:31,560 --> 01:00:33,780
This is really if you're
building a simple site

1128
01:00:33,780 --> 01:00:35,368
and you don't defend against this.

1129
01:00:35,368 --> 01:00:38,160
Flask, for example, does not defend
against this particularly well,

1130
01:00:38,160 --> 01:00:40,568
but Flask is a very simple
web framework for servers.

1131
01:00:40,568 --> 01:00:43,110
They're generally going to be
much more complicated than that

1132
01:00:43,110 --> 01:00:46,620
and have much more additional
functionality to be more featureful.

1133
01:00:46,620 --> 01:00:48,840
So let's walk through what
these cross-site request

1134
01:00:48,840 --> 01:00:50,280
forgeries might look like.

1135
01:00:50,280 --> 01:00:53,820
And for context, let's imagine
that I send you an email

1136
01:00:53,820 --> 01:00:56,137
asking you to click on some URL.

1137
01:00:56,137 --> 01:00:57,720
So you're going to click on this link.

1138
01:00:57,720 --> 01:00:59,820
It's going to redirect you to some page.

1139
01:00:59,820 --> 01:01:02,310
Maybe that page looks
something like this.

1140
01:01:02,310 --> 01:01:04,470
It's pretty simple,
not much going on here.

1141
01:01:04,470 --> 01:01:05,320
I have a body.

1142
01:01:05,320 --> 01:01:07,500
And inside of it I have one more link.

1143
01:01:07,500 --> 01:01:15,422
And the link is
http://hackbank.com/transfer?to=doug&amt=500.

1144
01:01:15,422 --> 01:01:18,630
Now, perhaps you don't hover over it
and see the link at the beginning of it.

1145
01:01:18,630 --> 01:01:20,960
But maybe you are a
customer of Hack Bank.

1146
01:01:20,960 --> 01:01:24,480
And maybe I know that you're a customer
of Hack Bank such that if you click

1147
01:01:24,480 --> 01:01:28,290
on this link and if you happen to be
logged in, and if you happen to have

1148
01:01:28,290 --> 01:01:32,730
your cookie set for hackbank.com, and
this was the way that they actually

1149
01:01:32,730 --> 01:01:37,650
executed transfers, by having you go
to /transfer and say to whom you want

1150
01:01:37,650 --> 01:01:40,200
to send money and in what amount--

1151
01:01:40,200 --> 01:01:42,938
And fortunately, most banks
don't actually do this.

1152
01:01:42,938 --> 01:01:46,230
Usually, if you're going to do something
that manipulates the database, as this

1153
01:01:46,230 --> 01:01:48,938
would, because it's going to be
transferring some amount of money

1154
01:01:48,938 --> 01:01:51,930
somewhere that would be
via HTTP POST request--

1155
01:01:51,930 --> 01:01:55,530
this is just a straightforward
GET request I'm making here.
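To see why a plain GET link is enough to carry the whole instruction, here is a short Python sketch that takes such a URL apart the way a server would. The exact URL shape, with to and amt as query parameter names, is an assumption based on how the link is read aloud.

```python
from urllib.parse import urlparse, parse_qs

# Assumed shape of the transfer link; "to" and "amt" are illustrative names.
url = "http://hackbank.com/transfer?to=doug&amt=500"

parts = urlparse(url)
params = parse_qs(parts.query)

# Everything the server needs is in the URL itself; the browser adds the
# victim's hackbank.com cookie automatically when the request is sent.
print(parts.path)
print(params)
```

Nothing in the request distinguishes a deliberate click on the bank's own site from a click on a link planted in an email.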

1156
01:01:55,530 --> 01:01:57,722
If you were logged in,
though, to Hack Bank,

1157
01:01:57,722 --> 01:01:59,430
or if your cookie
for Hack Bank was set

1158
01:01:59,430 --> 01:02:03,555
and you clicked on this link,
hypothetically, a transfer of $500--

1159
01:02:03,555 --> 01:02:05,430
again, assuming that
this was how you did it,

1160
01:02:05,430 --> 01:02:07,740
you specified a person and
you specified an amount--

1161
01:02:07,740 --> 01:02:13,288
would be transferred from your
account to presumably my account.

1162
01:02:13,288 --> 01:02:15,330
That's probably not
something you intended to do.

1163
01:02:15,330 --> 01:02:18,867
So that would be an example of why
this is a cross-site request forgery.

1164
01:02:18,867 --> 01:02:19,950
It's a legitimate request.

1165
01:02:19,950 --> 01:02:23,130
It appears that you intended to
do this because it came from you.

1166
01:02:23,130 --> 01:02:24,330
It's using your cookie.

1167
01:02:24,330 --> 01:02:28,090
But you didn't actually
intend for it to happen.

1168
01:02:28,090 --> 01:02:29,460
Here's another example.

1169
01:02:29,460 --> 01:02:32,260
You click on the link in my email
and you get brought to this page.

1170
01:02:32,260 --> 01:02:35,250
So there's not actually even a
second link to click anymore.

1171
01:02:35,250 --> 01:02:37,410
Now it's just trying to load an image.

1172
01:02:37,410 --> 01:02:40,660
Now, looking at this URL, we can
tell there's not an image there.

1173
01:02:40,660 --> 01:02:43,920
It doesn't end in .jpeg
or .png or the like.

1174
01:02:43,920 --> 01:02:45,540
It's the same URL as before.

1175
01:02:45,540 --> 01:02:49,397
But my browser sees image source
equals something and says,

1176
01:02:49,397 --> 01:02:51,480
well, I'm at least going
to try and go to that URL

1177
01:02:51,480 --> 01:02:55,040
and see if there is an
image there to load for you.

1178
01:02:55,040 --> 01:02:57,710
Again, you just click on
the link in the email.

1179
01:02:57,710 --> 01:03:00,140
This page loads.

1180
01:03:00,140 --> 01:03:03,320
My browser tries to go to this
page, or your browser in this case

1181
01:03:03,320 --> 01:03:06,230
tries to go to this page
to load the image there.

1182
01:03:06,230 --> 01:03:10,910
But in so doing, it's, again,
executing this unintended transfer,

1183
01:03:10,910 --> 01:03:14,750
relying on your cookie at hackbank.com.

1184
01:03:14,750 --> 01:03:17,120
Another example of this might be a form.

1185
01:03:17,120 --> 01:03:20,120
So again, it appears that you
click on the link in the email.

1186
01:03:20,120 --> 01:03:23,870
You get brought to a form that now
just has a button at the bottom of it

1187
01:03:23,870 --> 01:03:24,892
that says Click Here.

1188
01:03:24,892 --> 01:03:26,600
And the reason it just
has a button, even

1189
01:03:26,600 --> 01:03:31,990
though there's other stuff written, is
that those first two fields are hidden.

1190
01:03:31,990 --> 01:03:35,000
They are type equals hidden,
which means you wouldn't actually

1191
01:03:35,000 --> 01:03:37,040
see them when you load your browser.

1192
01:03:37,040 --> 01:03:40,160
Now, contrast this, for
example, with a field

1193
01:03:40,160 --> 01:03:43,340
whose type is text, which you might
see if you're doing a straightforward

1194
01:03:43,340 --> 01:03:44,090
login.

1195
01:03:44,090 --> 01:03:48,020
You would type characters in and
see the actual characters appear.

1196
01:03:48,020 --> 01:03:50,660
That's text versus a password
field where you would

1197
01:03:50,660 --> 01:03:52,580
type characters in and see all stars.

1198
01:03:52,580 --> 01:03:55,640
It would visually
obscure what you typed.

1199
01:03:55,640 --> 01:03:58,760
The action of this
form, or so to say where

1200
01:03:58,760 --> 01:04:02,313
the form-- what happens when you click
on the Submit button at the bottom

1201
01:04:02,313 --> 01:04:03,230
is the same as before.

1202
01:04:03,230 --> 01:04:06,140
It's hackbank.com/transfer.

1203
01:04:06,140 --> 01:04:07,970
And then I'm using
these parameters here;

1204
01:04:07,970 --> 01:04:13,550
to Doug, the amount of $500, Click Here.

1205
01:04:13,550 --> 01:04:17,090
Now notice also I actually am using a
POST request

1206
01:04:17,090 --> 01:04:19,500
to try to initiate this
transfer, again, assuming

1207
01:04:19,500 --> 01:04:24,380
that this was how Hack Bank structured
transfer requests in this way.

1208
01:04:24,380 --> 01:04:27,650
So if you clicked here and this
was otherwise validly structured

1209
01:04:27,650 --> 01:04:31,340
and you were logged in, or your
cookie was valid for Hack Bank,

1210
01:04:31,340 --> 01:04:33,800
then this would initiate
a transfer of $500.

1211
01:04:33,800 --> 01:04:37,850
And I can play another similar trick to
what I did a moment ago with the image

1212
01:04:37,850 --> 01:04:43,070
by doing something like this
where, when the page is loaded,

1213
01:04:43,070 --> 01:04:44,435
instantly submit this form.

1214
01:04:44,435 --> 01:04:46,310
So you don't even have
to click here anymore.

1215
01:04:46,310 --> 01:04:47,630
It's just going to go
through the document,

1216
01:04:47,630 --> 01:04:50,780
document being JavaScript's way of
referring to the entire web page,

1217
01:04:50,780 --> 01:04:53,600
find the first form,
form zeros, assuming

1218
01:04:53,600 --> 01:04:57,380
this is the first form on
the page, and just submit it.

1219
01:04:57,380 --> 01:04:59,840
Doesn't matter what else is going on.

1220
01:04:59,840 --> 01:05:00,860
Just submit this form.

1221
01:05:00,860 --> 01:05:06,110
That would also initiate a transfer if
you clicked on that link from my email.

1222
01:05:06,110 --> 01:05:10,010
So a quick summary of these
two different types of attacks.

1223
01:05:10,010 --> 01:05:12,740
Cross-site scripting
attacks, the adversary

1224
01:05:12,740 --> 01:05:16,940
tricks you into executing code on
your browser to do something locally

1225
01:05:16,940 --> 01:05:19,070
that you probably did not intend.

1226
01:05:19,070 --> 01:05:22,280
And a cross-site request
forgery, something

1227
01:05:22,280 --> 01:05:27,320
that appears to be a legitimate
request from your browser

1228
01:05:27,320 --> 01:05:31,220
because it's relying on cookies, you're
ostensibly logged in in that way,

1229
01:05:31,220 --> 01:05:35,670
but you don't actually
mean to make that request.
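One common defense, which the lecture mentions only in passing as something larger frameworks build in, is a per-session CSRF token. The Python sketch below is a minimal illustration under assumptions: the session is a plain dictionary, and the function names issue_csrf_token and is_request_allowed are made up for this example.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create a random token, remember it in the session, and return it
    so the site can embed it in its own forms as a hidden field."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_request_allowed(session, submitted_token):
    """Accept a state-changing request only if it carries the session's token."""
    expected = session.get("csrf_token")
    if not expected or not submitted_token:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, submitted_token)


session = {}  # stand-in for a server-side session keyed by the user's cookie
token = issue_csrf_token(session)
print(is_request_allowed(session, token))  # form served by the real site
print(is_request_allowed(session, None))   # forged request without the token
```

The forged pages described above can make the browser send the cookie, but they have no way to read the token, so the request is rejected.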

1230
01:05:35,670 --> 01:05:37,670
Now let's talk about a
couple of vulnerabilities

1231
01:05:37,670 --> 01:05:40,340
that exist in the context
of a database, which I

1232
01:05:40,340 --> 01:05:42,600
know you've discussed recently as well.

1233
01:05:42,600 --> 01:05:46,170
So imagine that I have a
table of users on my database

1234
01:05:46,170 --> 01:05:49,580
that looks like this, that each of them
has an ID number, they have a username,

1235
01:05:49,580 --> 01:05:51,170
and they have a password.

1236
01:05:51,170 --> 01:05:53,630
Now, the obvious
vulnerability here is I really

1237
01:05:53,630 --> 01:05:57,800
shouldn't be storing my users'
passwords like this in the clear.

1238
01:05:57,800 --> 01:06:01,370
If somebody were to ever hack and
get a hold of this database file,

1239
01:06:01,370 --> 01:06:03,020
that's really, really bad.

1240
01:06:03,020 --> 01:06:08,740
I am not taking best practices to
protect my customers' information.

1241
01:06:08,740 --> 01:06:09,990
So I want to avoid doing that.

1242
01:06:09,990 --> 01:06:14,060
So instead what I might do, as we've
discussed, is hash their passwords,

1243
01:06:14,060 --> 01:06:17,540
run them through some hash function
so that when they're actually stored,

1244
01:06:17,540 --> 01:06:19,880
they get stored looking
something like this.

1245
01:06:19,880 --> 01:06:23,120
You have no idea what the
original password was.

1246
01:06:23,120 --> 01:06:25,050
And because it's a
hash, it's irreversible.

1247
01:06:25,050 --> 01:06:28,280
You should not be able
to undo what I did

1248
01:06:28,280 --> 01:06:30,390
when I ran through the hash function.

1249
01:06:30,390 --> 01:06:33,560
But there's actually still
a vulnerability here.

1250
01:06:33,560 --> 01:06:35,840
And the vulnerability
here is not technical.

1251
01:06:35,840 --> 01:06:38,570
It's human again.

1252
01:06:38,570 --> 01:06:41,785
And the vulnerability that
exists here is that we see--

1253
01:06:41,785 --> 01:06:43,910
we're using a hash function,
so it's deterministic.

1254
01:06:43,910 --> 01:06:47,300
When we pass some data through it, we're
going to get the same output every time

1255
01:06:47,300 --> 01:06:48,810
we pass data through it.

1256
01:06:48,810 --> 01:06:53,900
And two of our users, Charlie
and Eric, have the same hash.

1257
01:06:53,900 --> 01:06:56,390
We saw this makes sense,
because if we go back a moment,

1258
01:06:56,390 --> 01:06:59,840
they also had the same actual password
when it was stored in plain text.

1259
01:06:59,840 --> 01:07:03,530
We've gone out of our way to try and
defend against that by hashing it.

1260
01:07:03,530 --> 01:07:06,860
But somebody who gets a hold of
this database file, for example,

1261
01:07:06,860 --> 01:07:11,750
they hack into it, they get it, they'll
see two people have the same password.

1262
01:07:11,750 --> 01:07:14,540
And maybe this is a very
small subset of my user base.

1263
01:07:14,540 --> 01:07:17,150
And maybe there's hundreds
of thousands of people.

1264
01:07:17,150 --> 01:07:20,720
And maybe 10% of them
all have the same hash.

1265
01:07:20,720 --> 01:07:26,670
Well, again, human beings, we are not
the best at defending our own stuff.

1266
01:07:26,670 --> 01:07:29,090
It's a sad truth that
the most common password

1267
01:07:29,090 --> 01:07:32,997
is password followed by some of these
other examples we had a second ago.

1268
01:07:32,997 --> 01:07:34,580
All of these are pretty bad passwords.

1269
01:07:34,580 --> 01:07:38,990
They're all on the list of some of
the most commonly used passwords

1270
01:07:38,990 --> 01:07:42,920
for all services, which means
that if you see a hash like this,

1271
01:07:42,920 --> 01:07:45,620
it doesn't matter that
we have taken steps

1272
01:07:45,620 --> 01:07:49,130
to protect our users against this.

1273
01:07:49,130 --> 01:07:55,700
If we see a hash like this many, many
times in our database, a clever hacker,

1274
01:07:55,700 --> 01:07:58,732
a clever adversary
might think, oh, well,

1275
01:07:58,732 --> 01:08:00,440
I'm seeing this password
10% of the time,

1276
01:08:00,440 --> 01:08:04,400
so I'm going to guess that Charlie's
password for the service is 12345

1277
01:08:04,400 --> 01:08:05,330
and they're wrong.

1278
01:08:05,330 --> 01:08:08,480
And then they'll maybe try abcdef
and they're wrong, and then maybe try

1279
01:08:08,480 --> 01:08:10,520
password and they're right.

1280
01:08:10,520 --> 01:08:13,910
And then all of a sudden every
time they see that hash, they

1281
01:08:13,910 --> 01:08:18,090
can assume that the password is password
for every single one of those users.
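The guessing process just described can be sketched in a few lines of Python. SHA-256 is an assumed stand-in for whatever hash function the site uses, and the user frank with his password is hypothetical; Charlie and Eric sharing the password "password" follows the example in the talk.

```python
import hashlib


def h(password):
    """Deterministic hash: the same input always produces the same digest."""
    return hashlib.sha256(password.encode()).hexdigest()


# Stand-in for the stolen table: usernames mapped to stored hashes.
stolen = {
    "charlie": h("password"),
    "eric": h("password"),
    "frank": h("x7#Tq9!vLm"),  # hypothetical user with an uncommon password
}

# The adversary hashes a short list of popular passwords and looks for matches.
common_guesses = ["12345", "abcdef", "password"]
cracked = {}
for guess in common_guesses:
    digest = h(guess)
    for user, stored in stolen.items():
        if stored == digest:
            cracked[user] = guess

print(cracked)
```

One correct guess exposes every account that shares the hash, while the account with the uncommon password stays out of the result.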

1282
01:08:18,090 --> 01:08:24,960
So again, nothing we can do as
technologists to solve this problem.

1283
01:08:24,960 --> 01:08:29,510
This is really just
getting folks to understand

1284
01:08:29,510 --> 01:08:33,276
that using different passwords,
using non-standard passwords,

1285
01:08:33,276 --> 01:08:34,109
is really important.

1286
01:08:34,109 --> 01:08:37,067
That's why we talked about password
managers and maybe not even knowing

1287
01:08:37,067 --> 01:08:41,160
your own passwords in a prior lecture.

1288
01:08:41,160 --> 01:08:45,140
There's another problem that can exist,
though, with databases, in particular,

1289
01:08:45,140 --> 01:08:47,120
when we see screens like this.

1290
01:08:47,120 --> 01:08:51,560
So this is a contrived login screen
that has a username and password

1291
01:08:51,560 --> 01:08:55,220
field and a Forgot Password
button whose purpose in life

1292
01:08:55,220 --> 01:08:59,149
is, if you type in your
email address and you--

1293
01:08:59,149 --> 01:09:01,189
which is the username
in this case, and you

1294
01:09:01,189 --> 01:09:05,510
have the Forgot Password box
checked, and you try and click login,

1295
01:09:05,510 --> 01:09:09,418
instead of actually logging you in,
it's going to email you, hopefully,

1296
01:09:09,418 --> 01:09:11,960
a link to your password, not
your actual password for reasons

1297
01:09:11,960 --> 01:09:14,970
we previously discussed as well.

1298
01:09:14,970 --> 01:09:20,640
But what if when we click
on this button we see this?

1299
01:09:20,640 --> 01:09:22,310
OK.

1300
01:09:22,310 --> 01:09:25,520
We've emailed you a link
to change your password.

1301
01:09:25,520 --> 01:09:29,660
Does that seem inherently problematic?

1302
01:09:29,660 --> 01:09:30,479
Perhaps not.

1303
01:09:30,479 --> 01:09:34,600
But what about if you see this as well?

1304
01:09:34,600 --> 01:09:37,100
Somebody might see this if
they're logged in as well.

1305
01:09:37,100 --> 01:09:40,490
Sorry, no user with that email address.

1306
01:09:40,490 --> 01:09:44,870
Does that perhaps seem problematic
when you compare it against this?

1307
01:09:44,870 --> 01:09:48,350
This is an example of something
called information leakage.

1308
01:09:48,350 --> 01:09:51,710
Perhaps an adversary has
hacked some other database

1309
01:09:51,710 --> 01:09:55,040
where folks were not being
as secure with credentials.

1310
01:09:55,040 --> 01:09:58,970
And so they have a whole set of email
addresses mapped to credentials.

1311
01:09:58,970 --> 01:10:02,570
And because human beings tend
to reuse the same credentials

1312
01:10:02,570 --> 01:10:06,650
on multiple different services,
they are trying different services

1313
01:10:06,650 --> 01:10:09,170
that they believe that
these users might also

1314
01:10:09,170 --> 01:10:13,550
use using those same username
and password combinations.

1315
01:10:13,550 --> 01:10:18,860
If this is the way that we field these
types of forgot password inquiries,

1316
01:10:18,860 --> 01:10:22,130
we're revealing some
information potentially.

1317
01:10:22,130 --> 01:10:27,650
If Alice is a user, we're now
saying, yes, Alice is a user of this.

1318
01:10:27,650 --> 01:10:29,300
Try this password.

1319
01:10:29,300 --> 01:10:34,490
If we get something like this, then
the adversary might not bother trying.

1320
01:10:34,490 --> 01:10:37,820
They've realized, oh, Alice
is not a user of this service.

1321
01:10:37,820 --> 01:10:41,720
And even if they're not trying to hack
into it, if we do something like this,

1322
01:10:41,720 --> 01:10:45,230
we're also telling that adversary
quite a bit about Alice.

1323
01:10:45,230 --> 01:10:49,340
Now we know Alice uses this service,
and this service, and this service,

1324
01:10:49,340 --> 01:10:50,600
and not this service.

1325
01:10:50,600 --> 01:10:54,050
And they can sort of create a
picture of who Alice might be.

1326
01:10:54,050 --> 01:11:00,398
They're sort of using her digital
footprint to understand more about her.

1327
01:11:00,398 --> 01:11:03,190
A better response in this case
might be to say something like this,

1328
01:11:03,190 --> 01:11:04,550
request received.

1329
01:11:04,550 --> 01:11:07,702
If you're in our system, you'll receive
an email with instructions shortly.

1330
01:11:07,702 --> 01:11:09,410
That's not tipping
our hand either way as

1331
01:11:09,410 --> 01:11:12,890
to whether the user is in the
database or not in the database.

1332
01:11:12,890 --> 01:11:15,860
No information leakage here,
and generally a better way

1333
01:11:15,860 --> 01:11:19,610
to protect our customers' privacy.
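A handler that behaves this way might look like the following Python sketch. The function name forgot_password, the set of registered addresses, and the send_reset_link callback are all assumptions for illustration.

```python
GENERIC_REPLY = (
    "Request received. If you're in our system, "
    "you'll receive an email with instructions shortly."
)


def forgot_password(email, registered_emails, send_reset_link):
    """Send a reset link only to known users, but reply identically either way."""
    if email in registered_emails:
        send_reset_link(email)
    return GENERIC_REPLY


sent = []  # records which addresses would have been emailed
users = {"alice@example.com"}
reply_known = forgot_password("alice@example.com", users, sent.append)
reply_unknown = forgot_password("mallory@example.com", users, sent.append)
print(reply_known == reply_unknown)
```

Because both calls return the same text, the page reveals nothing about whether an address is in the database.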

1334
01:11:19,610 --> 01:11:22,850
Now, that's not the only problem
that we can have with databases.

1335
01:11:22,850 --> 01:11:25,610
We've alluded to this
idea of SQL injection.

1336
01:11:25,610 --> 01:11:28,100
And there's this comic that makes the rounds quite a bit
gets the rounds quite a bit

1337
01:11:28,100 --> 01:11:30,620
when we talk about SQL injection
from a web comic called

1338
01:11:30,620 --> 01:11:35,240
XKCD that involves a SQL injection
attack, which is basically

1339
01:11:35,240 --> 01:11:39,080
providing some information that--

1340
01:11:39,080 --> 01:11:42,670
or providing some text or some query
that we want to make to a database

1341
01:11:42,670 --> 01:11:46,690
where that query actually
does something unintended.

1342
01:11:46,690 --> 01:11:50,700
It actually itself is SQL as opposed
to just plugging in some parameter,

1343
01:11:50,700 --> 01:11:53,750
like what is your name, and then
searching the database for that name.

1344
01:11:53,750 --> 01:11:55,708
Instead of giving you my
name, I might give you

1345
01:11:55,708 --> 01:11:58,040
something that is actually
a SQL query that's

1346
01:11:58,040 --> 01:12:01,050
going to be executed that
you don't want me to execute.

1347
01:12:01,050 --> 01:12:03,750
So let's see an example
of how this might work.

1348
01:12:03,750 --> 01:12:07,800
So here's another simple
username and password field.

1349
01:12:07,800 --> 01:12:11,580
And in this example, I've written my
password field poorly intentionally

1350
01:12:11,580 --> 01:12:14,000
for purposes of the example
so that it will actually

1351
01:12:14,000 --> 01:12:16,970
show you the text that is
typed as opposed to showing

1352
01:12:16,970 --> 01:12:19,640
you stars like a password field should.

1353
01:12:19,640 --> 01:12:23,300
So this is something that the user
sees when they access my site.

1354
01:12:23,300 --> 01:12:26,718
And perhaps on the back end in the
server-side code, inside of Python

1355
01:12:26,718 --> 01:12:29,510
somewhere I have written a SQL
query that looks like the following.

1356
01:12:29,510 --> 01:12:35,540
When the login button is clicked,
execute the following SQL query.

1357
01:12:35,540 --> 01:12:40,040
SELECT star from users where
username equals uname--

1358
01:12:40,040 --> 01:12:45,230
and uname here in yellow referring
to whatever was typed in this box--

1359
01:12:45,230 --> 01:12:48,050
and password equals
pword, where, again, pword

1360
01:12:48,050 --> 01:12:51,140
is referring to whatever
was typed in this box.

1361
01:12:51,140 --> 01:12:54,120
So we're doing a SQL query
to select star from users,

1362
01:12:54,120 --> 01:12:57,360
get all of the information
from the users table

1363
01:12:57,360 --> 01:13:01,170
where the username equals
whatever they typed in that box

1364
01:13:01,170 --> 01:13:05,560
and the password equals
whatever they typed in that box.

1365
01:13:05,560 --> 01:13:07,410
And so, for example,
if I have somebody who

1366
01:13:07,410 --> 01:13:09,810
logs in with the username
Alice and the password

1367
01:13:09,810 --> 01:13:14,580
12345, what the query would actually
look like with these values plugged

1368
01:13:14,580 --> 01:13:19,920
into it might look something like this;
SELECT star from users where username

1369
01:13:19,920 --> 01:13:25,200
equals Alice and password equals 12345.

1370
01:13:25,200 --> 01:13:30,420
If there is nobody with username Alice
or Alice's password is not 12345,

1371
01:13:30,420 --> 01:13:31,770
then this will fail.

1372
01:13:31,770 --> 01:13:34,890
Both of those conditions
need to be true.

1373
01:13:34,890 --> 01:13:37,890
But what about this?

1374
01:13:37,890 --> 01:13:46,800
Someone whose username is hacker and
their password is 1' or '1' equals '1.

1375
01:13:46,800 --> 01:13:49,800


1376
01:13:49,800 --> 01:13:51,848
That looks pretty weird.

1377
01:13:51,848 --> 01:13:53,640
And the reason that
that looks pretty weird

1378
01:13:53,640 --> 01:13:57,390
is because this is an
attempt to inject SQL,

1379
01:13:57,390 --> 01:14:02,820
to trick SQL into doing something that
is presumably not intended by the code

1380
01:14:02,820 --> 01:14:04,050
that we wrote.

1381
01:14:04,050 --> 01:14:07,980
Now, it probably helps to take a
look at it plugging the data in

1382
01:14:07,980 --> 01:14:11,580
to see what exactly this is going to do.

1383
01:14:11,580 --> 01:14:16,270
SELECT star from users where
username equals hacker or--

1384
01:14:16,270 --> 01:14:23,190
excuse me, and password equals
'1' or and so on and so on.

1385
01:14:23,190 --> 01:14:26,880


1386
01:14:26,880 --> 01:14:30,180
Maybe I do have a person whose
username actually is hacker,

1387
01:14:30,180 --> 01:14:33,000
but that's probably not their password.

1388
01:14:33,000 --> 01:14:34,050
That doesn't matter.

1389
01:14:34,050 --> 01:14:37,350
I'm still going to be
able to log in if I

1390
01:14:37,350 --> 01:14:39,140
have somebody whose username is hacker.

1391
01:14:39,140 --> 01:14:41,850
And the reason for that
is because of this or.

1392
01:14:41,850 --> 01:14:45,780
I have sort of short circuited
the end of the SQL query.

1393
01:14:45,780 --> 01:14:50,370
I have this quote mark that demarcates
the end of what the user presumably

1394
01:14:50,370 --> 01:14:51,780
typed in.

1395
01:14:51,780 --> 01:14:54,660
But I've actually literally
typed those into my password

1396
01:14:54,660 --> 01:14:59,060
to trick SQL such that if
hacker's password equals 1,

1397
01:14:59,060 --> 01:15:03,420
it just happens to literally be the
character 1, OK, I have succeeded.

1398
01:15:03,420 --> 01:15:05,250
I guess that's a really
bad password, and I

1399
01:15:05,250 --> 01:15:08,100
shouldn't be able to log in that way, but maybe that is the case
way, but maybe that is the case

1400
01:15:08,100 --> 01:15:09,060
and I'm able to log in.

1401
01:15:09,060 --> 01:15:13,560
But even if not, this
other thing is true.

1402
01:15:13,560 --> 01:15:18,660
'1' does equal '1'.

1403
01:15:18,660 --> 01:15:23,030
So as long as somebody whose username
is hacker exists in the database,

1404
01:15:23,030 --> 01:15:27,330
I am now able to log in as
hacker because this is true.

1405
01:15:27,330 --> 01:15:29,230
This part's probably not true, right?

1406
01:15:29,230 --> 01:15:31,860
It's unlikely that their password is 1.

1407
01:15:31,860 --> 01:15:36,960
Regardless of what their password
is, this part actually is true.

1408
01:15:36,960 --> 01:15:40,200
It's a very simple SQL injection attack.

1409
01:15:40,200 --> 01:15:44,490
I'm basically logging in as someone
who I'm presumably not supposed

1410
01:15:44,490 --> 01:15:48,780
to be able to log in as, but it
illustrates the kind of thing

1411
01:15:48,780 --> 01:15:50,550
that could happen.

1412
01:15:50,550 --> 01:15:54,450
You are allowing people
to bypass logins.
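The bypass can be reproduced against a throwaway in-memory SQLite database. This is a sketch under assumptions: the table layout, the stored password, and the function name vulnerable_login are invented for the demonstration, and the password is kept in plain text only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
conn.execute("INSERT INTO users (username, password) VALUES ('hacker', 'correct horse')")


def vulnerable_login(uname, pword):
    """Build the query by pasting user input straight into the SQL text."""
    query = f"SELECT * FROM users WHERE username = '{uname}' AND password = '{pword}'"
    return conn.execute(query).fetchall()


# A wrong password matches nothing.
print(len(vulnerable_login("hacker", "wrong guess")))

# The injected quotes turn the condition into ... AND password = '1' OR '1' = '1'.
# AND binds tighter than OR, so the trailing '1' = '1' makes the WHERE clause
# true for every row and the query returns data despite the wrong password.
print(len(vulnerable_login("hacker", "1' OR '1' = '1")))
```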

1413
01:15:54,450 --> 01:15:59,100
Now, it could get worse if your
database administrator username

1414
01:15:59,100 --> 01:16:01,710
is admin or something very common.

1415
01:16:01,710 --> 01:16:04,683
The default for this is typically admin.

1416
01:16:04,683 --> 01:16:06,600
This would potentially
give people the ability

1417
01:16:06,600 --> 01:16:08,760
to be database
administrators, that they're

1418
01:16:08,760 --> 01:16:14,370
able to execute exactly this
kind of trick on the admin user.

1419
01:16:14,370 --> 01:16:16,830
Now they have administrative
access to your database, which

1420
01:16:16,830 --> 01:16:19,580
means they can do things like
manipulate the data in the database,

1421
01:16:19,580 --> 01:16:23,350
change things, add things, delete things
that you don't want to have deleted.

1422
01:16:23,350 --> 01:16:28,170
And in the case of a database,
deletion is pretty permanent.

1423
01:16:28,170 --> 01:16:32,580
You can't undo a delete most
of the time in a database

1424
01:16:32,580 --> 01:16:35,890
the way you might be
able to do with other files.

1425
01:16:35,890 --> 01:16:38,430
Now, are there techniques to
avoid this kind of attack?

1426
01:16:38,430 --> 01:16:40,108
Fortunately, there are.

1427
01:16:40,108 --> 01:16:42,900
Right now I'd just like to
take a look at a very simple Python

1428
01:16:42,900 --> 01:16:45,720
program that replicates
the kind of thing

1429
01:16:45,720 --> 01:16:50,080
that one could do in a more
robust, more complex SQL situation.

1430
01:16:50,080 --> 01:16:52,080
So let's pull up a program
here where we're just

1431
01:16:52,080 --> 01:16:54,870
simulating this idea
of a SQL injection just

1432
01:16:54,870 --> 01:17:00,230
to show you how it's not that
difficult to defend against it.
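One widely used defense, separate from the quote substitution the demo goes on to show, is a parameterized query, where the database driver receives the SQL text and the user's values separately. The sketch below uses Python's sqlite3 module with an in-memory database; the table contents and the name safer_login are assumptions for illustration, and the plain-text password is there only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("hacker", "correct horse"))


def safer_login(uname, pword):
    """Pass the values as parameters so they are never parsed as SQL."""
    rows = conn.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (uname, pword),
    ).fetchall()
    return len(rows) > 0


print(safer_login("hacker", "correct horse"))   # genuine credentials
print(safer_login("hacker", "1' OR '1' = '1"))  # injection attempt
```

With placeholders, the injection string is compared as an ordinary password value, so it simply fails to match.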

1433
01:17:00,230 --> 01:17:03,840
So let's pull up the code
here in this file login.py.

1434
01:17:03,840 --> 01:17:06,060
So there's not that much going on here.

1435
01:17:06,060 --> 01:17:07,950
I have x equals input username.

1436
01:17:07,950 --> 01:17:10,920
So x, recall, is a Python variable.

1437
01:17:10,920 --> 01:17:14,460
And input username is basically going
to prompt the user with the string

1438
01:17:14,460 --> 01:17:17,405
username and then expect them
to type something after that.

1439
01:17:17,405 --> 01:17:19,530
And then we do exactly the
same thing with password

1440
01:17:19,530 --> 01:17:21,270
except storing the result there in y.

1441
01:17:21,270 --> 01:17:24,000
So whatever the user types after
username will get stored in x.

1442
01:17:24,000 --> 01:17:27,270
Whatever they type after
password will get stored in y.

1443
01:17:27,270 --> 01:17:29,030
And then here I'm just going to print.

1444
01:17:29,030 --> 01:17:33,310
And in the SQL context, this would be
the query that actually gets executed.

1445
01:17:33,310 --> 01:17:35,610
So imagine that that's
what's happening instead.

1446
01:17:35,610 --> 01:17:39,850
SELECT star from users where username
equals and then this symbol here,

1447
01:17:39,850 --> 01:17:40,350
'{x}'.

1448
01:17:40,350 --> 01:17:44,180


1449
01:17:44,180 --> 01:17:46,680
What I'm doing here is just
using a Python-formatted string.

1450
01:17:46,680 --> 01:17:48,560
That's what this f
here-- it's not a typo--

1451
01:17:48,560 --> 01:17:51,810
at the beginning means, is I'm going to
plug in whatever the person, the user,

1452
01:17:51,810 --> 01:17:55,640
typed at the first prompt,
which I stored in x here,

1453
01:17:55,640 --> 01:17:59,933
and whatever the user typed at the
second prompt, which I stored in y there.

1454
01:17:59,933 --> 01:18:01,600
So let's actually just run this program.

1455
01:18:01,600 --> 01:18:03,980
So let's pop open here for a second.

1456
01:18:03,980 --> 01:18:07,780
The name of this program is
login.py, so I'm going to type python

1457
01:18:07,780 --> 01:18:10,880
login.py, Enter.

1458
01:18:10,880 --> 01:18:13,290
Username, Doug.

1459
01:18:13,290 --> 01:18:16,308
Password, 12345.

1460
01:18:16,308 --> 01:18:19,600
And then the query, hypothetically, that
would get executed if I constructed it

1461
01:18:19,600 --> 01:18:22,480
in this way is SELECT star
from users where username

1462
01:18:22,480 --> 01:18:25,210
equals Doug and password equals 12345.

1463
01:18:25,210 --> 01:18:26,320
Seems reasonable.
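
The program described above can be sketched roughly as follows. This is a reconstruction from the narration, not the actual login.py: the function names build_query and main and the exact prompt text are assumptions, and the query is only printed, never run against a database.

```python
# Hypothetical reconstruction of login.py based on the narration.

def build_query(x: str, y: str) -> str:
    """Paste the user's input straight into the SQL text with an f-string."""
    return f"SELECT * FROM users WHERE username = '{x}' AND password = '{y}'"


def main() -> None:
    # Prompt text is assumed; the lecture only says the prompts are
    # "username" and "password".
    x = input("Username: ")
    y = input("Password: ")
    print(build_query(x, y))

# Call main() to be prompted interactively.
```

With Doug and 12345 as input, build_query returns the query read out above, with each value wrapped in single quotes.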

1464
01:18:26,320 --> 01:18:30,130
But if I try and do the adversarial
thing that I did a moment ago,

1465
01:18:30,130 --> 01:18:38,380
username equals Doug, password
equals 1' or '1' equals '1, not

1466
01:18:38,380 --> 01:18:42,850
a final single quote, and I hit
Enter, then I end up with SELECT star

1467
01:18:42,850 --> 01:18:49,865
from users where username equals Doug
and password equals 1 or 1 equals 1.

1468
01:18:49,865 --> 01:18:52,000
And the latter part of that is true.

1469
01:18:52,000 --> 01:18:53,890
The former part is false.

1470
01:18:53,890 --> 01:18:56,860
But it's good enough that
I would be able to log in

1471
01:18:56,860 --> 01:18:59,650
if I did something like that.
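
To see why that input would be good enough to log in, here is a small sketch that actually runs the string-built query. It uses Python's built-in sqlite3 module with a throwaway in-memory database; the table layout, the stored password, and the function name unsafe_login are all made up for illustration.

```python
import sqlite3

# Throwaway in-memory database; the table and its one row are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('Doug', 'correct-password')")


def unsafe_login(username: str, password: str) -> list:
    """Run a query built by string formatting (the vulnerable pattern)."""
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return conn.execute(query).fetchall()


# An ordinary wrong guess matches no rows.
wrong_guess = unsafe_login("Doug", "12345")

# The injected input closes the quote early and appends OR '1' = '1',
# which is true for every row, so the row comes back anyway.
injected = unsafe_login("Doug", "1' OR '1' = '1")

print(wrong_guess)
print(injected)
```

AND binds more tightly than OR in SQL, so the condition reads as (username and password match) OR ('1' = '1'), and the second half alone makes it true.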

1472
01:18:59,650 --> 01:19:02,200
But we want to try and get around that.

1473
01:19:02,200 --> 01:19:05,200
So now let's take a look at a second
file that might solve this problem.

1474
01:19:05,200 --> 01:19:11,380
So I'm going to open up
login2.py in my editor here.

1475
01:19:11,380 --> 01:19:15,610
So now it starts out exactly the same,
x equals something, y equals something.

1476
01:19:15,610 --> 01:19:18,640
But I'm making a pretty
basic substitution.

1477
01:19:18,640 --> 01:19:23,020
I'm replacing every single quote
that I see with a double quote.

1478
01:19:23,020 --> 01:19:25,050
So I'm replacing every
instance of single quote,

1479
01:19:25,050 --> 01:19:26,800
and I have to preface
it with a backslash.

1480
01:19:26,800 --> 01:19:30,160
Because notice I'm actually using
single quotes to identify the character.

1481
01:19:30,160 --> 01:19:33,880
It just so happens that it's to indicate
that I'm trying to substitute something

1482
01:19:33,880 --> 01:19:35,350
which I'm putting in single quotes.

1483
01:19:35,350 --> 01:19:38,440
The thing I'm trying to substitute
actually is a single quote,

1484
01:19:38,440 --> 01:19:42,130
and so I need to put a
backslash in front of it

1485
01:19:42,130 --> 01:19:44,440
to escape that character
such that it actually

1486
01:19:44,440 --> 01:19:48,310
gets treated as a single quotation
mark character as opposed

1487
01:19:48,310 --> 01:19:50,308
to some special Python--

1488
01:19:50,308 --> 01:19:52,850
Python's not going to try and
interpret it in some other way.

1489
01:19:52,850 --> 01:19:56,890
So I want to replace every instance of
a single quote in x with a double quote,

1490
01:19:56,890 --> 01:20:00,010
and I want to replace every
instance of a single quote in y

1491
01:20:00,010 --> 01:20:01,030
with a double quote.

1492
01:20:01,030 --> 01:20:02,650
Now, why do I want to do that?

1493
01:20:02,650 --> 01:20:07,240
Because notice in my
actual Python string here

1494
01:20:07,240 --> 01:20:12,670
I'm using single quotes to set
off the variables for purposes

1495
01:20:12,670 --> 01:20:14,290
of SQL's interpretation of them.

1496
01:20:14,290 --> 01:20:16,520
So where the user name
equals this string,

1497
01:20:16,520 --> 01:20:18,830
I'm using single quotes to do that.

1498
01:20:18,830 --> 01:20:23,920
So if my username or my password
also contained single quotation mark

1499
01:20:23,920 --> 01:20:27,430
characters, when SQL
was interpreting it,

1500
01:20:27,430 --> 01:20:32,080
it might think that the next single
quote character it sees is the end.

1501
01:20:32,080 --> 01:20:34,300
I'm done with what I've prompted.

1502
01:20:34,300 --> 01:20:37,420
And that's exactly how I tricked
it in the previous example.

1503
01:20:37,420 --> 01:20:40,930
I used that first single quote,
which seemed kind of random and out

1504
01:20:40,930 --> 01:20:44,380
of nowhere, to trick SQL into
thinking I'm done with this.

1505
01:20:44,380 --> 01:20:48,850
Then I used the keyword OR, so I'm back
now in SQL and not in some string

1506
01:20:48,850 --> 01:20:52,570
that I'm searching for, and then I
would continue this trick going forward.

1507
01:20:52,570 --> 01:20:55,732
So this is designed to
eliminate all the single quotes,

1508
01:20:55,732 --> 01:20:57,940
because the single quotes
mean something very special

1509
01:20:57,940 --> 01:21:01,510
in the context of my SQL query itself.

1510
01:21:01,510 --> 01:21:06,610
If you're actually using SQL
libraries that are tied into Python,

1511
01:21:06,610 --> 01:21:11,108
the ability to replace things is
much more robust than this example.

1512
01:21:11,108 --> 01:21:12,900
But even this very
simple example where I'm

1513
01:21:12,900 --> 01:21:16,480
doing just this very basic
substitution is good enough

1514
01:21:16,480 --> 01:21:20,390
to get around the injection
attack that we just looked at.

1515
01:21:20,390 --> 01:21:23,350
So this is now in login2.py.

1516
01:21:23,350 --> 01:21:24,520
Let's do this.

1517
01:21:24,520 --> 01:21:26,895
Let's run python login2.py.

1518
01:21:26,895 --> 01:21:28,270
And we'll start out the same way.

1519
01:21:28,270 --> 01:21:30,890
We'll do Doug and 12345.

1520
01:21:30,890 --> 01:21:32,895
And it appears that nothing has changed.

1521
01:21:32,895 --> 01:21:35,020
The behavior is otherwise
identical because I'm not

1522
01:21:35,020 --> 01:21:36,730
trying to do any tricks like that.

1523
01:21:36,730 --> 01:21:41,440
SELECT star from users where username
equals Doug and password equals 12345.

1524
01:21:41,440 --> 01:21:45,250
But if I now try that same
trick that I did a moment ago,

1525
01:21:45,250 --> 01:21:55,090
so password is 1' or '1'
equals '1 and I hit Enter,

1526
01:21:55,090 --> 01:21:59,020
now I'm not subject to that same SQL
injection anymore because I'm trying

1527
01:21:59,020 --> 01:22:02,800
to select all the information from the
users table where the username is Doug

1528
01:22:02,800 --> 01:22:03,970
and the password equals--

1529
01:22:03,970 --> 01:22:06,950
And notice that here is
the first single quote.

1530
01:22:06,950 --> 01:22:08,440
Here is the second one.

1531
01:22:08,440 --> 01:22:11,770
So it's thinking that entire
thing now is the password.

1532
01:22:11,770 --> 01:22:20,468
Only if my password is
literally 1" or "1" equals "1,

1533
01:22:20,468 --> 01:22:22,010
then I would be literally logging in.

1534
01:22:22,010 --> 01:22:23,980
If that happened to be my
password, this would work.

1535
01:22:23,980 --> 01:22:25,150
But otherwise I've escaped.

1536
01:22:25,150 --> 01:22:28,630
I've stopped the adversary
from being able to leverage

1537
01:22:28,630 --> 01:22:33,080
a simple trick like this
to break in to my database

1538
01:22:33,080 --> 01:22:34,930
when perhaps they're
not supposed to do so.

1539
01:22:34,930 --> 01:22:41,140
And again, in actual SQL injection
defense, the substitutions that we make

1540
01:22:41,140 --> 01:22:42,640
are much more complicated than this.

1541
01:22:42,640 --> 01:22:45,932
We're not just looking for single quote
characters and double quote characters,

1542
01:22:45,932 --> 01:22:48,610
but we're considering semicolons
or any other special characters

1543
01:22:48,610 --> 01:22:51,460
that SQL would interpret
as part of a statement.

1544
01:22:51,460 --> 01:22:53,900
We can escape those out so
that users could literally

1545
01:22:53,900 --> 01:22:59,720
use single quotes or semicolons
or the like in their passwords

1546
01:22:59,720 --> 01:23:03,160
without necessarily compromising
the integrity of the entire database

1547
01:23:03,160 --> 01:23:04,510
overall.
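
One common way libraries let users keep quotes and semicolons in their passwords is a parameterized query, which is a different technique from the character substitution shown in the lecture: the values travel separately from the SQL text instead of being rewritten. The sketch below uses Python's built-in sqlite3 module; the table, the sample password, and the function name login are invented for illustration.

```python
import sqlite3

# Throwaway in-memory database; the table and its one row are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
# A password that itself contains a single quote and a semicolon.
conn.execute("INSERT INTO users VALUES (?, ?)", ("Doug", "it's;fine"))


def login(username: str, password: str) -> list:
    """Look the user up with ? placeholders instead of string formatting."""
    # The values are passed separately from the SQL text, so quotes and
    # semicolons inside them are treated as plain data, not as SQL.
    cursor = conn.execute(
        "SELECT username FROM users WHERE username = ? AND password = ?",
        (username, password),
    )
    return cursor.fetchall()


attack = login("Doug", "1' OR '1' = '1")   # matches nothing
legit = login("Doug", "it's;fine")         # matches the real row

print(attack)
print(legit)
```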

1548
01:23:04,510 --> 01:23:08,480
So we've taken a look at several of
the most common, most obvious ways

1549
01:23:08,480 --> 01:23:11,180
that an adversary might be
able to extract information

1550
01:23:11,180 --> 01:23:13,910
either from a business or an individual.

1551
01:23:13,910 --> 01:23:17,660
And these ways are kind of
attention-getting in some context.

1552
01:23:17,660 --> 01:23:19,880
But let's focus now-- let's
go back and bring things

1553
01:23:19,880 --> 01:23:22,280
full circle to something
I've mentioned many times,

1554
01:23:22,280 --> 01:23:28,400
which is humans are the core fatal
flaw in all of these security things

1555
01:23:28,400 --> 01:23:29,800
that we're dealing with here.

1556
01:23:29,800 --> 01:23:31,800
And so let's bring things
full circle by talking

1557
01:23:31,800 --> 01:23:34,220
about phishing, what phishing is.

1558
01:23:34,220 --> 01:23:39,140
So phishing is just an attempt
by an adversary to prey upon us

1559
01:23:39,140 --> 01:23:45,440
and our unfortunate general ignorance
of basic security protocols.

1560
01:23:45,440 --> 01:23:47,900
So it's just an attempt
to socially engineer,

1561
01:23:47,900 --> 01:23:49,730
basically, information out of someone.

1562
01:23:49,730 --> 01:23:52,460
You pretend to be
someone that you are not.

1563
01:23:52,460 --> 01:23:54,710
And if you do so
convincingly enough, you

1564
01:23:54,710 --> 01:23:58,190
might be able to extract
information about that person.

1565
01:23:58,190 --> 01:24:01,053
Now, phishing you'll also see
in other contexts that are--

1566
01:24:01,053 --> 01:24:03,470
computer scientists like to
be clever with their wordplay.

1567
01:24:03,470 --> 01:24:06,800
You'll see things like netting, which
is basically a phishing attack that

1568
01:24:06,800 --> 01:24:08,780
launches against many
people at once, hoping

1569
01:24:08,780 --> 01:24:11,060
they'll be able to get one or two.

1570
01:24:11,060 --> 01:24:13,400
There's spear phishing,
which is a phishing

1571
01:24:13,400 --> 01:24:17,240
attack that targets one specific person
trying to get information from them.

1572
01:24:17,240 --> 01:24:20,090
And then there's whaling,
which is a phishing attack that

1573
01:24:20,090 --> 01:24:23,330
is targeted against somebody who is
perceived to have a lot of information

1574
01:24:23,330 --> 01:24:25,413
or whose information is
particularly valuable such

1575
01:24:25,413 --> 01:24:28,820
that you'd be phishing
for some big whale.

1576
01:24:28,820 --> 01:24:31,730
Now, one of the most obvious and
easy types of phishing attack

1577
01:24:31,730 --> 01:24:32,900
looks like this.

1578
01:24:32,900 --> 01:24:35,450
It's a simple URL substitution.

1579
01:24:35,450 --> 01:24:39,590
This is how we can write a link in HTML.

1580
01:24:39,590 --> 01:24:43,480
A is the HTML tag for anchor,
which we use for hyperlinks.

1581
01:24:43,480 --> 01:24:46,460
Href is where we are going to.

1582
01:24:46,460 --> 01:24:50,660
And then we also have the ability to
specify some text at the end of that.

1583
01:24:50,660 --> 01:24:54,830
These two items do not have
to match, as you can see here.

1584
01:24:54,830 --> 01:25:02,750
I can say we're going to URL2
but actually send you to URL1.

1585
01:25:02,750 --> 01:25:08,420
This is an incredibly common way
to get information from somebody.

1586
01:25:08,420 --> 01:25:12,830
They think they're going one place but
they're actually going someplace else.

1587
01:25:12,830 --> 01:25:16,430
And to show you, as a very
basic example, just how easy it

1588
01:25:16,430 --> 01:25:21,560
is to potentially trick somebody into
going somewhere they're not supposed to

1589
01:25:21,560 --> 01:25:25,220
and potentially then
revealing credentials as well,

1590
01:25:25,220 --> 01:25:28,580
let's just take a simple
example here with Facebook.

1591
01:25:28,580 --> 01:25:31,970
And why don't we just take a moment
to build our own version of Facebook

1592
01:25:31,970 --> 01:25:36,410
and see if we can't get somebody to
potentially reveal information to us?

1593
01:25:36,410 --> 01:25:38,750
So let's imagine that I
have acquired some domain

1594
01:25:38,750 --> 01:25:41,390
name that's really
similar to Facebook.com,

1595
01:25:41,390 --> 01:25:44,150
like it's off by one character.

1596
01:25:44,150 --> 01:25:45,350
It's a common typo.

1597
01:25:45,350 --> 01:25:48,198
For example fs maybe is a common thing.

1598
01:25:48,198 --> 01:25:49,990
People mistype the A
or something like that

1599
01:25:49,990 --> 01:25:54,800
that would be really not necessarily
obvious to somebody at the outset.

1600
01:25:54,800 --> 01:25:59,240
One way that I might be able to just
take advantage of somebody's thinking

1601
01:25:59,240 --> 01:26:01,670
that they're logging into
Facebook is to make a page that

1602
01:26:01,670 --> 01:26:05,150
looks exactly the same as Facebook.

1603
01:26:05,150 --> 01:26:07,640
That's actually not
very difficult to do.

1604
01:26:07,640 --> 01:26:09,680
All you have to do is
open up Facebook here.

1605
01:26:09,680 --> 01:26:14,720
And because its HTML is available
to me, I can right click on it,

1606
01:26:14,720 --> 01:26:18,530
view page source, take
a second to load here--

1607
01:26:18,530 --> 01:26:20,480
Facebook is a pretty big site--

1608
01:26:20,480 --> 01:26:27,080
and then I can just control A, copy,
select all, copy all of the content,

1609
01:26:27,080 --> 01:26:33,500
and paste this in to my
index.html, and we will save.

1610
01:26:33,500 --> 01:26:36,140


1611
01:26:36,140 --> 01:26:40,970
And then we'll head back
into our terminal here,

1612
01:26:40,970 --> 01:26:45,170
and I will start Chrome on
the file index.html, which

1613
01:26:45,170 --> 01:26:49,400
is the file that I literally just
saved my Facebook information in.

1614
01:26:49,400 --> 01:26:51,040
So start Chrome index.html.

1615
01:26:51,040 --> 01:26:53,360
You'll notice that it
brings me to this URL

1616
01:26:53,360 --> 01:26:56,670
here, which is the file
for where I currently live,

1617
01:26:56,670 --> 01:26:58,310
or where this file currently lives.

1618
01:26:58,310 --> 01:27:00,920
And this page looks like Facebook,
except for the fact that,

1619
01:27:00,920 --> 01:27:04,220
when I log in, I then
get redirected back

1620
01:27:04,220 --> 01:27:07,370
to something that actually is Facebook
and is not something that I control.

1621
01:27:07,370 --> 01:27:10,820
But at the outset, my page
here at the very beginning

1622
01:27:10,820 --> 01:27:14,810
looks identical to Facebook.

1623
01:27:14,810 --> 01:27:16,790
Now, the trick here
would be to do something

1624
01:27:16,790 --> 01:27:20,780
so that the user would provide
information here in the email box

1625
01:27:20,780 --> 01:27:24,397
and then here in the password field
such that when they click Login,

1626
01:27:24,397 --> 01:27:26,480
I might be able to get
that information from them.

1627
01:27:26,480 --> 01:27:30,500
Maybe I just am waiting to
capture their information.

1628
01:27:30,500 --> 01:27:35,450
So the next step for me might be to go
back into my random set of stuff here.

1629
01:27:35,450 --> 01:27:38,570
There's a lot of random code
that we don't really care about.

1630
01:27:38,570 --> 01:27:41,030
But the one thing I do care
about is what happens when

1631
01:27:41,030 --> 01:27:43,790
somebody clicks on this Login button.

1632
01:27:43,790 --> 01:27:45,590
That is interesting to me.

1633
01:27:45,590 --> 01:27:48,230
So I'm going to go through
this and just do control F,

1634
01:27:48,230 --> 01:27:51,968
control F just being
find, the string login.

1635
01:27:51,968 --> 01:27:54,260
That's the text that's
literally written on the button,

1636
01:27:54,260 --> 01:27:55,843
so hopefully I'll find that somewhere.

1637
01:27:55,843 --> 01:27:58,160
I'm told I have eight results.

1638
01:27:58,160 --> 01:27:59,990
So this is, if I just
kind of look around

1639
01:27:59,990 --> 01:28:01,698
for context to try
and figure out where I

1640
01:28:01,698 --> 01:28:05,660
am in the code, the title of
something, so that's probably not it.

1641
01:28:05,660 --> 01:28:07,180
So I don't want to go there.

1642
01:28:07,180 --> 01:28:10,640
Create an account or login,
not quite what I'm looking for.

1643
01:28:10,640 --> 01:28:12,620
So go to the next one.

1644
01:28:12,620 --> 01:28:15,890
OK, here we go, input
value equals login.

1645
01:28:15,890 --> 01:28:18,680
So now I found an input
that is called login.

1646
01:28:18,680 --> 01:28:22,110
So this is presumably a button
that's presumably part of some form.

1647
01:28:22,110 --> 01:28:25,820
So if I scroll up a little
bit higher, hopefully I

1648
01:28:25,820 --> 01:28:29,570
will find a form, which I do, form ID.

1649
01:28:29,570 --> 01:28:30,920
And it has an action.

1650
01:28:30,920 --> 01:28:34,040
The action is to go to
this particular page,

1651
01:28:34,040 --> 01:28:37,310
facebook.com/login/ and so on and so on.

1652
01:28:37,310 --> 01:28:39,820
But maybe I want to
send it somewhere else.

1653
01:28:39,820 --> 01:28:44,000
So if I replace this entire URL with
where I actually want to send the user,

1654
01:28:44,000 --> 01:28:46,160
where maybe I'm going to
capture their information,

1655
01:28:46,160 --> 01:28:49,220
maybe I'll store this in login.html.

1656
01:28:49,220 --> 01:28:51,140
And so that's what's
going to come in here.

1657
01:28:51,140 --> 01:28:56,210
And then we'll save the file such
that our changes have been captured.

1658
01:28:56,210 --> 01:28:58,370
So presumably what should
happen is now, when

1659
01:28:58,370 --> 01:29:02,420
you click on the Login
button in my fake Facebook,

1660
01:29:02,420 --> 01:29:08,000
you instead get redirected to login.html
rather than the Facebook actual login

1661
01:29:08,000 --> 01:29:10,458
as we saw just a moment ago.

1662
01:29:10,458 --> 01:29:11,250
So let's try again.

1663
01:29:11,250 --> 01:29:14,870
We'll go back here to
our fake Facebook page.

1664
01:29:14,870 --> 01:29:18,880
We will refresh so that
we get our new content.

1665
01:29:18,880 --> 01:29:20,860
Remember, we just
changed the HTML content,

1666
01:29:20,860 --> 01:29:23,900
so we actually need to reload
it so that our browser has it.

1667
01:29:23,900 --> 01:29:31,250
And we'll type in abc@cs50.net and then
some password here and click Login,

1668
01:29:31,250 --> 01:29:32,990
and we get redirected here.

1669
01:29:32,990 --> 01:29:35,630
Sorry, we are unable to
log you in at this time.

1670
01:29:35,630 --> 01:29:38,270
But notice we're still
in a file that I created.

1671
01:29:38,270 --> 01:29:41,973
I didn't show you login.html, but
that's exactly what I put there.

1672
01:29:41,973 --> 01:29:44,390
Now, I'm not actually going
to phish for information here.

1673
01:29:44,390 --> 01:29:46,370
And I'm going to do something
that would arguably vio--

1674
01:29:46,370 --> 01:29:48,100
even though I'm using
fake data here, I'm

1675
01:29:48,100 --> 01:29:50,808
not going to do something that
would violate the terms of service

1676
01:29:50,808 --> 01:29:54,500
or get myself in trouble by actually
attempting to do some phishing here.

1677
01:29:54,500 --> 01:29:58,070
But imagine instead of some HTML
I had some Python code that was

1678
01:29:58,070 --> 01:30:00,740
able to read the data from that field.

1679
01:30:00,740 --> 01:30:02,840
We saw that a moment ago
with passwords, right?

1680
01:30:02,840 --> 01:30:06,860
We know that the possibility exists
that if the user types something

1681
01:30:06,860 --> 01:30:10,850
into a field, we have the
ability to extract it.

1682
01:30:10,850 --> 01:30:13,340
What I could do here is very simple.

1683
01:30:13,340 --> 01:30:18,200
I could just read those two fields where
they typed a username and a password

1684
01:30:18,200 --> 01:30:20,032
but then display this content.

1685
01:30:20,032 --> 01:30:22,490
Perhaps it's been the case that
you've gone to some website

1686
01:30:22,490 --> 01:30:26,300
and seen, oh, yeah, sorry, the server
can't handle this request right now,

1687
01:30:26,300 --> 01:30:28,820
or something along those lines.

1688
01:30:28,820 --> 01:30:30,650
And you maybe think nothing of it.

1689
01:30:30,650 --> 01:30:33,530
Or maybe I even would then have
a link here that says, try again.

1690
01:30:33,530 --> 01:30:35,870
And if you click Try Again,
it would bring you back

1691
01:30:35,870 --> 01:30:39,860
to Facebook's actual login where you
would then enter your credentials

1692
01:30:39,860 --> 01:30:42,560
and try again and perhaps
think everything was fine.

1693
01:30:42,560 --> 01:30:46,520
But if on this login page I had
extracted your username and password

1694
01:30:46,520 --> 01:30:49,120
by tricking you into thinking
you were logging into Facebook,

1695
01:30:49,120 --> 01:30:51,203
and then maybe I save those
in some file somewhere

1696
01:30:51,203 --> 01:30:54,882
and then just display this to you,
you think, ah, they just had an error.

1697
01:30:54,882 --> 01:30:56,090
Things are a little bit busy.

1698
01:30:56,090 --> 01:30:57,050
I'll try again.

1699
01:30:57,050 --> 01:30:58,910
And when you try again, it works.

1700
01:30:58,910 --> 01:31:00,770
It's really that easy.

1701
01:31:00,770 --> 01:31:05,600
And the way to avoid phishing
expeditions, so to speak,

1702
01:31:05,600 --> 01:31:07,530
is just to be mindful
of what you're doing.

1703
01:31:07,530 --> 01:31:11,000
Take a look at the URL bar to
make sure that you're on the page

1704
01:31:11,000 --> 01:31:12,983
that you think you're on.
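
The same URL bar check can be expressed in code. This is a minimal sketch using Python's standard urllib.parse module; the function name is_expected_site and the sample URLs are made up, and a real check would need to handle more cases, such as lookalike Unicode characters.

```python
from urllib.parse import urlparse


def is_expected_site(url: str, expected_domain: str) -> bool:
    """Return True only if the URL's host is the expected domain or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    expected = expected_domain.lower()
    # Compare whole labels from the right, so "facebook.com.evil.example"
    # does not pass as "facebook.com".
    return host == expected or host.endswith("." + expected)


print(is_expected_site("https://www.facebook.com/login", "facebook.com"))
print(is_expected_site("https://www.facebok.com/login", "facebook.com"))
print(is_expected_site("file:///home/user/index.html", "facebook.com"))
```

A page opened from a local file, like the copied index.html in the demonstration, has no host at all, so it fails the check.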

1705
01:31:12,983 --> 01:31:14,900
Hopefully you've come
away now with a bit more

1706
01:31:14,900 --> 01:31:16,775
of an understanding of
cybersecurity and some

1707
01:31:16,775 --> 01:31:19,700
of the best practices that
are put in place to deal

1708
01:31:19,700 --> 01:31:21,740
with potential cybersecurity threats.

1709
01:31:21,740 --> 01:31:24,320
Now it's incumbent upon
us to use the technology

1710
01:31:24,320 --> 01:31:28,130
that we have available to help us
protect ourselves from ourselves,

1711
01:31:28,130 --> 01:31:33,020
but not only ourselves and our own data,
but also working to protect our clients

1712
01:31:33,020 --> 01:31:35,200
and their data as well.

1713
01:31:35,200 --> 01:31:36,533