1
00:00:00,000 --> 00:00:02,097


2
00:00:02,097 --> 00:00:03,930
SAMMY: Well, thanks for
watching my seminar.

3
00:00:03,930 --> 00:00:05,930
My name is Sammy Mara.

4
00:00:05,930 --> 00:00:09,860
I am a senior studying computer
science here at Harvard.

5
00:00:09,860 --> 00:00:12,180
And P versus NP is what I'm
going to be talking about.

6
00:00:12,180 --> 00:00:15,450
It is the greatest unsolved
problem in computer science,

7
00:00:15,450 --> 00:00:19,170
and maybe all of math, I think.

8
00:00:19,170 --> 00:00:21,520
And it's something that has
fascinated me for a while.

9
00:00:21,520 --> 00:00:24,180
So I think it's a natural
extension of the CS50,

10
00:00:24,180 --> 00:00:26,840
and I hope you learn something from it.

11
00:00:26,840 --> 00:00:29,340
We'll go into some of
the details, as well as

12
00:00:29,340 --> 00:00:33,630
the big picture of what is P versus NP.

13
00:00:33,630 --> 00:00:34,490
And yeah.

14
00:00:34,490 --> 00:00:37,440
So let's get started here.

15
00:00:37,440 --> 00:00:41,100
This is just to show that P versus NP
is sort of like a relevant problem,

16
00:00:41,100 --> 00:00:42,750
and it shows up in pop culture.

17
00:00:42,750 --> 00:00:46,470
This is Homer Simpson in
some matrix void thing,

18
00:00:46,470 --> 00:00:50,340
and you can see P equals NP
floating in the background there.

19
00:00:50,340 --> 00:00:53,970
And then in Futurama-- I guess that's
also a Matt Groening production--

20
00:00:53,970 --> 00:00:58,640
P and NP in separate books, sort of
implying that P is not equal to NP.

21
00:00:58,640 --> 00:01:02,880


22
00:01:02,880 --> 00:01:05,610
Really, the question is whether
or not P is equal to NP,

23
00:01:05,610 --> 00:01:09,280
and we'll get into what
that means in just a second.

24
00:01:09,280 --> 00:01:12,520
But if you're watching
this and you thinking hey,

25
00:01:12,520 --> 00:01:16,740
I want to solve P versus NP for my
final project, got some bad news.

26
00:01:16,740 --> 00:01:20,550
It's probably not the best idea.

27
00:01:20,550 --> 00:01:25,080
Don't want to discourage you, but a lot
of really, really famous, smart people,

28
00:01:25,080 --> 00:01:28,890
much smarter than me, have tried to
solve this problem for a long time.

29
00:01:28,890 --> 00:01:33,409
And its applications
are very far-reaching,

30
00:01:33,409 --> 00:01:35,700
and I think we can learn a
lot from it, but the problem

31
00:01:35,700 --> 00:01:37,237
may not be solved for a long time.

32
00:01:37,237 --> 00:01:39,750


33
00:01:39,750 --> 00:01:45,450
And yeah, why should we care about
the theory of computer science?

34
00:01:45,450 --> 00:01:49,670
And in CS50, you learn a lot about
the running time of algorithms.

35
00:01:49,670 --> 00:01:52,260
And we talk a lot about
how algorithms work

36
00:01:52,260 --> 00:01:55,920
and what are the best ways
to solve certain problems.

37
00:01:55,920 --> 00:01:58,980
But theory of computer
science sort of goes

38
00:01:58,980 --> 00:02:01,210
into how can we extrapolate from there?

39
00:02:01,210 --> 00:02:05,970
And what are the fundamental things
about algorithms that make them

40
00:02:05,970 --> 00:02:09,960
similar to one another and different
from one another, as well as

41
00:02:09,960 --> 00:02:13,450
how to model the computer
in its simplest form?

42
00:02:13,450 --> 00:02:17,070
That's something that Alan
Turing did, way back when,

43
00:02:17,070 --> 00:02:20,119
and he became famous for that.

44
00:02:20,119 --> 00:02:22,410
Everyone is-- I mean, not
everyone, but a lot of people

45
00:02:22,410 --> 00:02:24,035
are familiar with like Turing machines.

46
00:02:24,035 --> 00:02:28,180
That's sort of his theoretical
model of a computer.

47
00:02:28,180 --> 00:02:28,980
And yeah.

48
00:02:28,980 --> 00:02:32,070
It helps us solve problems and
use solutions from one problem

49
00:02:32,070 --> 00:02:36,840
to solve other problems,
so it's pretty profound.

50
00:02:36,840 --> 00:02:39,900
And yeah.

51
00:02:39,900 --> 00:02:44,002
So I have a little more motivation in
answering this question of P versus NP.

52
00:02:44,002 --> 00:02:45,960
And I still haven't really
told you what it is.

53
00:02:45,960 --> 00:02:49,380
But solving this problem
would essentially

54
00:02:49,380 --> 00:02:53,280
help us with questions like are
algorithms invented or discovered?

55
00:02:53,280 --> 00:02:56,191
Is there some fast
algorithm for solving Sudoku

56
00:02:56,191 --> 00:02:57,690
that we just haven't discovered yet?

57
00:02:57,690 --> 00:03:00,780
Or did it just exist in the ether
and we have discovered it yet?

58
00:03:00,780 --> 00:03:05,370
Or are we just like too dumb to
discover it, but like definitely exists?

59
00:03:05,370 --> 00:03:08,360
Or does it just not exist at all?

60
00:03:08,360 --> 00:03:10,980
And yeah, this question is-- if
that wasn't motivation enough,

61
00:03:10,980 --> 00:03:12,990
it's a million dollar
question in the sense

62
00:03:12,990 --> 00:03:16,560
that the Clay Institute-- It's one
of Clay Institute's Millennium Prize

63
00:03:16,560 --> 00:03:21,600
problems, which means there is
$1 million for whoever person

64
00:03:21,600 --> 00:03:25,600
can solve it and prove
it, one way or the other.

65
00:03:25,600 --> 00:03:30,780
Because what do computer scientists
love more than computers?

66
00:03:30,780 --> 00:03:33,380
Yeah, money.

67
00:03:33,380 --> 00:03:37,140
And I'm hoping that a CS50
student will be the one

68
00:03:37,140 --> 00:03:39,180
to solve this problem one day.

69
00:03:39,180 --> 00:03:41,860
So I hope you can--

70
00:03:41,860 --> 00:03:44,430
So what is the question of P versus NP?

71
00:03:44,430 --> 00:03:50,430
And it's really, are problem that
are easy to check also easy to solve?

72
00:03:50,430 --> 00:03:55,200
And that's sort of a very, very,
very high-level thought-- way

73
00:03:55,200 --> 00:03:55,980
to think about it.

74
00:03:55,980 --> 00:03:58,639
But what is this word
easy mean in this context?

75
00:03:58,639 --> 00:04:00,180
What does it mean to check a problem?

76
00:04:00,180 --> 00:04:02,160
What does it mean to solve a problem?

77
00:04:02,160 --> 00:04:04,795
Hoping to go into all of
that in just a minute.

78
00:04:04,795 --> 00:04:06,336
And hopefully that will become clear.

79
00:04:06,336 --> 00:04:09,010


80
00:04:09,010 --> 00:04:13,410
So yeah, this problem in
particular has a long history.

81
00:04:13,410 --> 00:04:16,260
Back in the mid 1960s,
when computers were still

82
00:04:16,260 --> 00:04:19,410
the size of cabinets
and small houses, people

83
00:04:19,410 --> 00:04:22,980
were still trying to find algorithms
to solve basically all the world's

84
00:04:22,980 --> 00:04:27,060
problems, like factoring and
the Traveling Salesman Problem,

85
00:04:27,060 --> 00:04:30,860
and determining if a number is
prime, and problems like these.

86
00:04:30,860 --> 00:04:33,510
But some problems they
had slow algorithms for,

87
00:04:33,510 --> 00:04:36,450
they were able to solve them,
but it just took a long time.

88
00:04:36,450 --> 00:04:39,690
And for some algorithms, like find the
greatest common divisor and sorting,

89
00:04:39,690 --> 00:04:42,240
we have fast algorithms like merge sort.

90
00:04:42,240 --> 00:04:45,660
That's, worst case, running
a prime of, I think, n log n.

91
00:04:45,660 --> 00:04:46,220
I'm a CI.

92
00:04:46,220 --> 00:04:48,360
I should know that.

93
00:04:48,360 --> 00:04:50,340
So things like that just
have fast algorithms.

94
00:04:50,340 --> 00:04:55,080
But for some problems, we
just have no no fast solution.

95
00:04:55,080 --> 00:04:58,440
And then in 1965, people
came up with a solution

96
00:04:58,440 --> 00:05:02,040
to the discrete Fourier
transform, and suddenly, that

97
00:05:02,040 --> 00:05:04,500
was in the fast category.

98
00:05:04,500 --> 00:05:07,590
And for things like determining
if a number is prime,

99
00:05:07,590 --> 00:05:13,560
that took until 2002 for someone
to find an algorithm to do that,

100
00:05:13,560 --> 00:05:15,540
until it could be in this fast category.

101
00:05:15,540 --> 00:05:17,940
And I'll explain what a
fast algorithm is versus

102
00:05:17,940 --> 00:05:20,889
a slow algorithm in just a second.

103
00:05:20,889 --> 00:05:23,180
But for a lot of these
algorithms, there were just no--

104
00:05:23,180 --> 00:05:25,805
for a lot of these problems,
there were just no such algorithms

105
00:05:25,805 --> 00:05:28,800
that were able to be solved quickly.

106
00:05:28,800 --> 00:05:34,590
And it didn't seem like we would ever
find a fast solution to these problem.

107
00:05:34,590 --> 00:05:39,120
And so that is really the question,
is will we ever find fast solutions

108
00:05:39,120 --> 00:05:41,640
to these problems?

109
00:05:41,640 --> 00:05:44,670
And actually, the
answer is very unclear.

110
00:05:44,670 --> 00:05:48,390
Although, I think one of the most
important moments for this problem

111
00:05:48,390 --> 00:05:56,070
was a theorem that Steven Cook of
MIT and Leonid Levin came up with.

112
00:05:56,070 --> 00:05:57,960
And it basically said
that a lot of problems

113
00:05:57,960 --> 00:06:00,390
that we face in computer
science are all connected

114
00:06:00,390 --> 00:06:02,460
by some sort of core difficulty.

115
00:06:02,460 --> 00:06:04,700
And what that means is
that if you could solve--

116
00:06:04,700 --> 00:06:07,450
if you could have a fast solution
to one algorithm or one problem,

117
00:06:07,450 --> 00:06:10,020
you could actually
use the same algorithm

118
00:06:10,020 --> 00:06:13,170
to solve very different,
seemingly different problem.

119
00:06:13,170 --> 00:06:17,490
And that was a good way to
categorize these problems,

120
00:06:17,490 --> 00:06:22,320
and we learned a lot from that about
complexity and general complexity

121
00:06:22,320 --> 00:06:27,280
theory being the fields of
categorizing these problems.

122
00:06:27,280 --> 00:06:33,240
So I think that was a
really important moment.

123
00:06:33,240 --> 00:06:33,740
Yeah.

124
00:06:33,740 --> 00:06:40,320
So let's go into a little more detail
here about what P is, what is NP,

125
00:06:40,320 --> 00:06:43,870
why are they fighting, and so forth.

126
00:06:43,870 --> 00:06:49,310
But at the very base of it,
P and NP are sets of problem.

127
00:06:49,310 --> 00:06:52,620
Normally in math, we think about
sets of numbers or sets of elements

128
00:06:52,620 --> 00:06:54,020
or sets of words or whatever.

129
00:06:54,020 --> 00:06:56,460
But now we're talking
about sets of problems.

130
00:06:56,460 --> 00:06:59,010
And the way these problems
are defined is whether or not

131
00:06:59,010 --> 00:07:01,230
they have these fast algorithms.

132
00:07:01,230 --> 00:07:06,960
So some problems, like the problem
in P have known fast algorithms.

133
00:07:06,960 --> 00:07:14,880
And some problems that are in NP are
at least able to be checked quickly.

134
00:07:14,880 --> 00:07:18,330
And definitely, if a
problem is P, it is in NP

135
00:07:18,330 --> 00:07:21,360
because if you can solve
a problem very quickly,

136
00:07:21,360 --> 00:07:25,110
it's also easy to check it
because you just solve it again.

137
00:07:25,110 --> 00:07:30,520
And that will all become clear,
hopefully, in the next few slides.

138
00:07:30,520 --> 00:07:34,710
But yeah, so let's talk first a
little more in-depth about what is P?

139
00:07:34,710 --> 00:07:40,470
The set polynomial time, which
is basically the set of problems

140
00:07:40,470 --> 00:07:44,010
where there exists a polynomial time
algorithm that generates a solution.

141
00:07:44,010 --> 00:07:46,290
That might be a little
bit confusing at first.

142
00:07:46,290 --> 00:07:50,850
But in CS50, hopefully, you learned
a little bit about big O notation

143
00:07:50,850 --> 00:07:53,940
or worst case algorithm runtime.

144
00:07:53,940 --> 00:07:57,270
And what this means is that these
are the types of problem that can be

145
00:07:57,270 --> 00:07:59,670
solved in a polynomial amount of time.

146
00:07:59,670 --> 00:08:04,260
That could be like 3n
to the 5th plus 5, where

147
00:08:04,260 --> 00:08:10,140
n is the size of the
problem by some definition.

148
00:08:10,140 --> 00:08:12,990
So these problems include
finding the GCD of two numbers.

149
00:08:12,990 --> 00:08:15,390
We have a fast algorithm for that.

150
00:08:15,390 --> 00:08:19,170
It's been known for lots of--
many, many years due to Euclid.

151
00:08:19,170 --> 00:08:20,420
Excuse me.

152
00:08:20,420 --> 00:08:24,270
And you can see this is the
Euclidean algorithm right there.

153
00:08:24,270 --> 00:08:27,210
And it just is very deterministic.

154
00:08:27,210 --> 00:08:29,850
You just put in input,
and you can slowly

155
00:08:29,850 --> 00:08:33,330
boil down the problem into a solution.

156
00:08:33,330 --> 00:08:36,274
Linear programming is basically
like a set of equations.

157
00:08:36,274 --> 00:08:38,190
Let's not worry about
that too much right now,

158
00:08:38,190 --> 00:08:42,062
but that's another problem
that is well known to be in P.

159
00:08:42,062 --> 00:08:45,270
And determining if a number is prime,
that actually-- as I mentioned earlier,

160
00:08:45,270 --> 00:08:47,790
that result wasn't found until 2002.

161
00:08:47,790 --> 00:08:52,980
But that is also in the
P family of problems.

162
00:08:52,980 --> 00:08:56,610
And multiplication, as you
probably all know, the algorithm

163
00:08:56,610 --> 00:08:58,166
for multiplying two numbers.

164
00:08:58,166 --> 00:08:59,790
You just sort of go through one by one.

165
00:08:59,790 --> 00:09:03,550
And that's, at least-- if you multiply
every digit by every other digit,

166
00:09:03,550 --> 00:09:05,070
that's sort of like n squared.

167
00:09:05,070 --> 00:09:06,111
And then you add them up.

168
00:09:06,111 --> 00:09:07,620
So it's at most n cubed.

169
00:09:07,620 --> 00:09:11,010
But crucially, these albums--
excuse me-- algorithms

170
00:09:11,010 --> 00:09:14,820
aren't like 2 to the n
or something exponential.

171
00:09:14,820 --> 00:09:22,380
So that's what makes them
in this category of P. Yes.

172
00:09:22,380 --> 00:09:28,530
And yeah, so let's move on a
little bit to the set NP, which

173
00:09:28,530 --> 00:09:31,920
stands for nondeterministic
polynomial-time time, which

174
00:09:31,920 --> 00:09:36,570
is sort of like a way of saying,
well, they have polynomial algorithms

175
00:09:36,570 --> 00:09:38,970
to solve them, but
they're nondeterministic,

176
00:09:38,970 --> 00:09:42,300
which means you just sort
of have to guess or use

177
00:09:42,300 --> 00:09:44,910
an infinite number of computers
before you can solve them

178
00:09:44,910 --> 00:09:46,710
in a polynomial amount of time.

179
00:09:46,710 --> 00:09:50,070
But the way these are defined
formally-- and before,

180
00:09:50,070 --> 00:09:53,190
we talked about problems
that can be solved quickly.

181
00:09:53,190 --> 00:09:56,370
Now, we're talking about problems
that can be checked quickly.

182
00:09:56,370 --> 00:09:59,609
And what that means is it's
a set of decision problems--

183
00:09:59,609 --> 00:10:02,400
and I'll explain what that means
in just a second-- for which there

184
00:10:02,400 --> 00:10:06,480
exists a polynomial algorithm to
check if a solution is correct.

185
00:10:06,480 --> 00:10:09,840
So before, we were saying I'm
giving you a problem; tell me

186
00:10:09,840 --> 00:10:16,290
a solution and some
definition of the solution.

187
00:10:16,290 --> 00:10:19,380
And now I'm saying here's
a problem and a solution,

188
00:10:19,380 --> 00:10:20,940
and I want you to check the solution.

189
00:10:20,940 --> 00:10:25,890
So does that checking take
a polynomial amount of time?

190
00:10:25,890 --> 00:10:30,000
And so these problems
are-- I guess problems

191
00:10:30,000 --> 00:10:34,710
included in this set are like Sudoku
because if I give you a Sudoku board

192
00:10:34,710 --> 00:10:36,959
and said this is solved,
how fast would it

193
00:10:36,959 --> 00:10:38,500
be for you to check if it was solved?

194
00:10:38,500 --> 00:10:40,166
It actually wouldn't take you that long.

195
00:10:40,166 --> 00:10:44,550
You'd probably just go through
every row, column, and nine

196
00:10:44,550 --> 00:10:47,700
by-- excuse me-- three
by three square to check.

197
00:10:47,700 --> 00:10:50,040
And that's what n cubed.

198
00:10:50,040 --> 00:10:52,980
But crucially, it's not
something like 9 to the n, which

199
00:10:52,980 --> 00:10:55,066
would be an exponential amount of time.

200
00:10:55,066 --> 00:10:57,800


201
00:10:57,800 --> 00:11:00,800
And in other problems
like factoring, it's

202
00:11:00,800 --> 00:11:04,060
easy to check if a factor or
a factorization is correct.

203
00:11:04,060 --> 00:11:05,990
You just multiply the
numbers, and then you

204
00:11:05,990 --> 00:11:08,440
can see if that
factorization is correct.

205
00:11:08,440 --> 00:11:13,370
A much harder problem to
factor the numbers, by the way.

206
00:11:13,370 --> 00:11:18,970
And Traveling Sales problem's
an example of famous NP problem

207
00:11:18,970 --> 00:11:22,217
where you're given a graph
with vertices and edges.

208
00:11:22,217 --> 00:11:24,800
And we'll get into a little more
about graphs in just a second

209
00:11:24,800 --> 00:11:28,720
because those turn out to be
really relevant to NP problems

210
00:11:28,720 --> 00:11:31,130
and thinking about them.

211
00:11:31,130 --> 00:11:34,520
But again, multiplication-- I thought
we just said that was a P problem.

212
00:11:34,520 --> 00:11:36,800
Well, if something's
in P, it's also in NP

213
00:11:36,800 --> 00:11:39,860
because if I said,
well, what's 51 times 3?

214
00:11:39,860 --> 00:11:43,130
If I said 51 times 3 is 153,
well, how would you know?

215
00:11:43,130 --> 00:11:47,010
You'd probably just check it
yourself and multiply yourself.

216
00:11:47,010 --> 00:11:49,490
And of course, that's, as
we described, a P algorithm.

217
00:11:49,490 --> 00:11:53,570
So that has a P algorithm to check
whether or not my solution is correct.

218
00:11:53,570 --> 00:11:55,570
That's how you know it's in NP.

219
00:11:55,570 --> 00:11:59,950
So we've talked about P.
We've talked about NP.

220
00:11:59,950 --> 00:12:02,900
So now, we have a couple
examples here, and maybe give you

221
00:12:02,900 --> 00:12:06,410
a better grasp of how to think
about different problems.

222
00:12:06,410 --> 00:12:10,670
So sorting a list, is that in P or NP?

223
00:12:10,670 --> 00:12:14,930
Well, we talked about sorting and
the running time of those in CS50.

224
00:12:14,930 --> 00:12:18,290
Hopefully, you know that
that's, for a lot of sorts,

225
00:12:18,290 --> 00:12:23,300
that's n log n or n squared,
I think, for a quick sort.

226
00:12:23,300 --> 00:12:25,940
But those all have
polynomial time algorithms,

227
00:12:25,940 --> 00:12:29,630
so those would be in P, also in NP.

228
00:12:29,630 --> 00:12:32,220
Multiplication, we just
described P algorithm, right?

229
00:12:32,220 --> 00:12:35,420
It's something like n cubed.

230
00:12:35,420 --> 00:12:38,600
And then given a set of vertices
and edges, is the graph connected?

231
00:12:38,600 --> 00:12:41,290
Is that P or NP?

232
00:12:41,290 --> 00:12:43,670
Well, you just start at
a point, and you traverse

233
00:12:43,670 --> 00:12:46,970
until you have seen all the edges.

234
00:12:46,970 --> 00:12:48,970
If you haven't seen--
excuse me, all the points.

235
00:12:48,970 --> 00:12:55,430
If you haven't seen all of the vertices,
then you know it is not well connected.

236
00:12:55,430 --> 00:12:57,800
And another example
would be a Rubik's cube.

237
00:12:57,800 --> 00:12:59,570
First of all, is that in NP?

238
00:12:59,570 --> 00:13:04,111
That is, is there a way to check
if the Rubik's cube is solved?

239
00:13:04,111 --> 00:13:04,610
Well, yeah.

240
00:13:04,610 --> 00:13:05,580
It's pretty easy.

241
00:13:05,580 --> 00:13:08,150
You just check if all
the faces are solved.

242
00:13:08,150 --> 00:13:09,200
But is it in P?

243
00:13:09,200 --> 00:13:13,070
Is there a polynomial time algorithm
to actually solve the Rubik's cube?

244
00:13:13,070 --> 00:13:15,650
That is actually a little
harder to think about,

245
00:13:15,650 --> 00:13:22,587
but it turns out it is actually--
computers can do that fairly fast.

246
00:13:22,587 --> 00:13:24,170
And what about the best move in chess?

247
00:13:24,170 --> 00:13:25,370
Is that in NP?

248
00:13:25,370 --> 00:13:28,640
Well, if I told you that this
was the best move in chess,

249
00:13:28,640 --> 00:13:31,550
could you check that it's
the best move in chess?

250
00:13:31,550 --> 00:13:32,570
Probably not.

251
00:13:32,570 --> 00:13:35,120
I think you'd have to go
through a more calculation

252
00:13:35,120 --> 00:13:37,670
to determine if that was the best move.

253
00:13:37,670 --> 00:13:40,630
And that problem, actually,
is not in NP or P.

254
00:13:40,630 --> 00:13:43,220
That's in its own class
called exponential time,

255
00:13:43,220 --> 00:13:48,820
where it takes exponential time to
even check if a solution is correct.

256
00:13:48,820 --> 00:13:51,620
And then there's problems
like-- a classic NP

257
00:13:51,620 --> 00:13:56,090
problem is like the subset sum problem,
which means given a set of integers,

258
00:13:56,090 --> 00:13:59,900
say, I'm asking do they
sum to negative-- is there

259
00:13:59,900 --> 00:14:02,130
a subset that sums to negative 1?

260
00:14:02,130 --> 00:14:05,980
And in this case, you
could just look at it.

261
00:14:05,980 --> 00:14:08,210
And if I gave you-- sorry,
if I gave you a subset

262
00:14:08,210 --> 00:14:11,085
and then said these sub to minus 1,
it would be really easy to check.

263
00:14:11,085 --> 00:14:14,530
You just sum them up,
and they sum to minus 1.

264
00:14:14,530 --> 00:14:17,230
But solving that problem
actually isn't very easy.

265
00:14:17,230 --> 00:14:19,430
And there's some sort
of exponential checking

266
00:14:19,430 --> 00:14:24,950
that you have to do to determine
if there is such a subset.

267
00:14:24,950 --> 00:14:28,400
And if you think about how many
subsets there are, 2 to the n

268
00:14:28,400 --> 00:14:31,310
subsets, I believe, where
there's n members of a set.

269
00:14:31,310 --> 00:14:35,900
So you have to check a lot, and that's
a good indicator that it is not in P.

270
00:14:35,900 --> 00:14:41,290
But in this case, it is, as we showed,
in NP because it's easy to check.

271
00:14:41,290 --> 00:14:45,050


272
00:14:45,050 --> 00:14:49,930
So I was talking to you
before about NP completeness.

273
00:14:49,930 --> 00:14:53,210
And this is sort of one of
the most important concepts

274
00:14:53,210 --> 00:14:57,289
to understand when talking
about P versus NP problem.

275
00:14:57,289 --> 00:15:00,580
Basically, what this means-- and this is
related to the Cook-Levin theorem that

276
00:15:00,580 --> 00:15:03,510
was talking about earlier.

277
00:15:03,510 --> 00:15:06,320
It essentially says that all
of these problems-- and this

278
00:15:06,320 --> 00:15:08,610
is an example of a
bunch of NP problems--

279
00:15:08,610 --> 00:15:10,090
they're all sort of connected.

280
00:15:10,090 --> 00:15:13,610
Where, if you have a solution,
say, to the graph coloring problem,

281
00:15:13,610 --> 00:15:17,540
you can use it to form a solution
to the clique cover problem

282
00:15:17,540 --> 00:15:21,530
or to the hitting set problem or to
the yada, yada-- you can basically

283
00:15:21,530 --> 00:15:23,600
form one solution into the other.

284
00:15:23,600 --> 00:15:29,220
And that's really powerful because if
someone one day finds a polynomial time

285
00:15:29,220 --> 00:15:31,370
algorithm to do one of
these problems, that

286
00:15:31,370 --> 00:15:34,940
means they can find a polynomial time
algorithm to do all of these problems

287
00:15:34,940 --> 00:15:39,119
because they're all sort of related,
and they can be converted to one another

288
00:15:39,119 --> 00:15:39,660
very quickly.

289
00:15:39,660 --> 00:15:42,190


290
00:15:42,190 --> 00:15:44,580
And I sort of glossed
over detail there, which

291
00:15:44,580 --> 00:15:49,180
is how fast can these algorithms
solutions be converted to one another?

292
00:15:49,180 --> 00:15:50,910
And that has to do with reduction.

293
00:15:50,910 --> 00:15:53,560
And I just want to talk
about this for a second

294
00:15:53,560 --> 00:15:57,600
because this is one of the
more core concepts that

295
00:15:57,600 --> 00:15:59,236
is involved when proving these things.

296
00:15:59,236 --> 00:16:01,110
And this is actually
what computer scientists

297
00:16:01,110 --> 00:16:04,980
have to do when they talk about--
are these problems NP complete?

298
00:16:04,980 --> 00:16:08,440
Are these problems related
by some core difficulty?

299
00:16:08,440 --> 00:16:10,260
And what it means is
basically a reduction

300
00:16:10,260 --> 00:16:13,260
is a polynomial time algorithm that
converts a solution from one problem

301
00:16:13,260 --> 00:16:15,190
to the solution of another problem.

302
00:16:15,190 --> 00:16:19,331
And I'm going to be talking a little bit
about three particular computer science

303
00:16:19,331 --> 00:16:21,330
related problems, which
are the independents set

304
00:16:21,330 --> 00:16:24,450
problem, the vertex cover
problem, and the clique problem.

305
00:16:24,450 --> 00:16:29,040
Independent set problem
has to do with-- basically,

306
00:16:29,040 --> 00:16:31,470
given a graph like the one
I'm showing in the slide,

307
00:16:31,470 --> 00:16:37,630
are there are a set of k vertices, such
that there are no 2 of those vertices

308
00:16:37,630 --> 00:16:40,290
are touching or related by an edge?

309
00:16:40,290 --> 00:16:43,690
The vertex cover problem
means can you give me

310
00:16:43,690 --> 00:16:51,090
a minimum set of vertices
that covers all the edges?

311
00:16:51,090 --> 00:16:53,420
That is, that all of the
edges are-- excuse me.

312
00:16:53,420 --> 00:16:58,570
All of the vertices are touching
every one of the edges in the graph.

313
00:16:58,570 --> 00:17:02,040
And then the clique problem-- I don't
know if it's "click" or "kleek."

314
00:17:02,040 --> 00:17:04,079
Some people say "kleek."

315
00:17:04,079 --> 00:17:06,510
But it's related to the
colloquial sense of the word

316
00:17:06,510 --> 00:17:09,569
clique, which is like a friend group.

317
00:17:09,569 --> 00:17:13,050
Is there a friend group,
essentially, in this graph

318
00:17:13,050 --> 00:17:18,609
where every single person in the graph,
every person represented by a vertex,

319
00:17:18,609 --> 00:17:21,700
is connected to every single
other vertex in the clique?

320
00:17:21,700 --> 00:17:23,920
And in this case, you
can see there is actually

321
00:17:23,920 --> 00:17:28,280
a 5 clique in the bottom
left of these 5 vertexes

322
00:17:28,280 --> 00:17:30,500
are all connected to each other.

323
00:17:30,500 --> 00:17:32,920
And so that makes them
what's called a 5 clique.

324
00:17:32,920 --> 00:17:35,711
So these are the three problems
that I'm going to go into in detail

325
00:17:35,711 --> 00:17:36,640
just a second.

326
00:17:36,640 --> 00:17:40,670
And I'm going to show you that they're
actually all sort of the same problem.

327
00:17:40,670 --> 00:17:44,330
And solving one can actually
give you a solution to another.

328
00:17:44,330 --> 00:17:48,510
So let's take the example of this graph.

329
00:17:48,510 --> 00:17:52,540
Again, graphs generally is a
set of vertices, excuse me,

330
00:17:52,540 --> 00:17:56,220
and edges connecting them.

331
00:17:56,220 --> 00:17:58,140
And this is an example
of an independent set.

332
00:17:58,140 --> 00:17:59,280
So how do we get this?

333
00:17:59,280 --> 00:18:04,740
If I just said to you give me 4 of
these points, such that no 2 of them

334
00:18:04,740 --> 00:18:10,781
are related by an edge, then
this would be the solution.

335
00:18:10,781 --> 00:18:11,280
Great.

336
00:18:11,280 --> 00:18:12,580
So how long does that take?

337
00:18:12,580 --> 00:18:14,440
Well, if it were a big
graph, it would have

338
00:18:14,440 --> 00:18:16,460
taken some exponential amount of time.

339
00:18:16,460 --> 00:18:19,180
There's no polynomial
time algorithm to do that.

340
00:18:19,180 --> 00:18:21,940
But if we are able to
get a solution, we can

341
00:18:21,940 --> 00:18:24,070
use it to solve some other problems.

342
00:18:24,070 --> 00:18:26,610
And in this case, we can
use the independent set

343
00:18:26,610 --> 00:18:28,630
to solve the vertex cover problem.

344
00:18:28,630 --> 00:18:29,380
How do we do that?

345
00:18:29,380 --> 00:18:33,940
Well, we just consider all of
the vertexes in the compliment.

346
00:18:33,940 --> 00:18:38,490
So basically, all of the vertex
that weren't in the independent set,

347
00:18:38,490 --> 00:18:40,860
consider those to be
in the vertex cover.

348
00:18:40,860 --> 00:18:44,860
And these have the property
that every single vertex

349
00:18:44,860 --> 00:18:47,550
is related to every other edge.

350
00:18:47,550 --> 00:18:50,860
I don't know if I'm
explaining that correctly.

351
00:18:50,860 --> 00:18:55,860
Every vertex in the vertex cover
is touching every single edge.

352
00:18:55,860 --> 00:18:59,510
And in this case, you
can hopefully see that.

353
00:18:59,510 --> 00:19:02,520
And so we just created,
out of nothingness,

354
00:19:02,520 --> 00:19:05,410
a solution to vertex cover from
a solution to independent set.

355
00:19:05,410 --> 00:19:08,710
And we can even take it one
step further with clique.

356
00:19:08,710 --> 00:19:12,810
The way you do this is you consider
the points an independent set,

357
00:19:12,810 --> 00:19:15,160
or the graph of independent
set, and then take

358
00:19:15,160 --> 00:19:18,480
the complement of the edges,
which means that every edge that

359
00:19:18,480 --> 00:19:20,550
doesn't exist already, you fill it in.

360
00:19:20,550 --> 00:19:23,650
And every edge that did
exist, you get rid of it.

361
00:19:23,650 --> 00:19:26,200
And what you're left with is this graph.

362
00:19:26,200 --> 00:19:29,380
And all of the points that were
originally an independent set

363
00:19:29,380 --> 00:19:34,110
are now a clique-- in this case,
a 4 clique-- for this new graph.

364
00:19:34,110 --> 00:19:39,450
So it took a long time to create
a solution to independent set,

365
00:19:39,450 --> 00:19:40,710
an exponential amount of time.

366
00:19:40,710 --> 00:19:43,120
But now that we have that,
we're able to create solutions

367
00:19:43,120 --> 00:19:45,550
to other similar problems.

368
00:19:45,550 --> 00:19:49,020
And it turns out there's a wide
range of problems that are related,

369
00:19:49,020 --> 00:19:53,680
even problems not really seemingly
related to graphs at all.

370
00:19:53,680 --> 00:19:59,240
So this is what's called this
the satisfiability problem.

371
00:19:59,240 --> 00:20:05,110
Given a clause, or basically a logic
clause, where each of these variables--

372
00:20:05,110 --> 00:20:09,160
I don't you can see where
I'm pointing in the video,

373
00:20:09,160 --> 00:20:13,840
but these are all basically variables
that are either true or false.

374
00:20:13,840 --> 00:20:17,550
And so what the satisfiability
question is asking

375
00:20:17,550 --> 00:20:20,830
is whether or not this clause
has a truth assignment.

376
00:20:20,830 --> 00:20:26,980
That is, there exists an a and b,
such that this all becomes true.

377
00:20:26,980 --> 00:20:33,800
And for those who don't know, these
little downwards carrots are or's.

378
00:20:33,800 --> 00:20:35,260
And the upward carrots are ands.

379
00:20:35,260 --> 00:20:37,010
And the little-- I
don't what you call it.

380
00:20:37,010 --> 00:20:41,279
The little sideways dashes
are nots, so the opposite.

381
00:20:41,279 --> 00:20:42,320
If it's true, it's false.

382
00:20:42,320 --> 00:20:45,200
If it's false, it's true.

383
00:20:45,200 --> 00:20:48,170
So the question is does this
have a truth assignment?

384
00:20:48,170 --> 00:20:52,610
And maybe you can just eyeball it and
say, well, actually, if A is true,

385
00:20:52,610 --> 00:20:56,070
and B is false, and C is true,
then this statement can be true.

386
00:20:56,070 --> 00:20:58,290
And that indeed, that is the case.

387
00:20:58,290 --> 00:21:01,130
So how do we reduce-- how
do we take that solution

388
00:21:01,130 --> 00:21:05,270
and turn it into a solution to,
say, the vertex cover problem?

389
00:21:05,270 --> 00:21:09,320
In this case, all we have
to do is create this graph.

390
00:21:09,320 --> 00:21:12,295
And there is an algorithm
to create this graph.

391
00:21:12,295 --> 00:21:15,170
And what it does is you basically--
I don't know if you can see this,

392
00:21:15,170 --> 00:21:18,950
but every clause of
the original statement

393
00:21:18,950 --> 00:21:25,400
up here turns into 2 variables
at the bottom of this graph.

394
00:21:25,400 --> 00:21:30,380
So you have AB turning into AB
down here, not A, not B, not C

395
00:21:30,380 --> 00:21:32,520
turning into these 3 points down here.

396
00:21:32,520 --> 00:21:36,200
And then not ABC turning into not ABC.

397
00:21:36,200 --> 00:21:41,700
And then finally, you just take
every a and its complement up here.

398
00:21:41,700 --> 00:21:46,710
Now, let's say we have the
solution, A, not BECAUSE.

399
00:21:46,710 --> 00:21:53,720
Then basically, you want to color
these, A not BC on the top line.

400
00:21:53,720 --> 00:21:57,890
And then you want to connect
every single element to itself.

401
00:21:57,890 --> 00:22:03,680
So not B goes to not B. B goes to
B. A goes to A. Not C to not C.

402
00:22:03,680 --> 00:22:05,240
And that's that.

403
00:22:05,240 --> 00:22:09,618
The point is that basically--
oh, did I miss something?

404
00:22:09,618 --> 00:22:12,210


405
00:22:12,210 --> 00:22:17,570
So then you basically color these
based on-- there is a relationship.

406
00:22:17,570 --> 00:22:19,157
I'm actually blanking on it right now.

407
00:22:19,157 --> 00:22:20,990
But basically, there's
a way to color these.

408
00:22:20,990 --> 00:22:24,530
And basically when you
color the A, not BC,

409
00:22:24,530 --> 00:22:29,090
and then B, not A, not CBA on the
bottom, that creates a vertex cover.

410
00:22:29,090 --> 00:22:31,790
And so you've used a solution,
you've used an algorithm,

411
00:22:31,790 --> 00:22:35,060
that creates a solution
from the satisfiability

412
00:22:35,060 --> 00:22:39,200
into the vertex cover that
is actually very powerful.

413
00:22:39,200 --> 00:22:41,140
And that's what's called a reduction.

414
00:22:41,140 --> 00:22:44,120
I know that was a little bit involved.

415
00:22:44,120 --> 00:22:48,510
I'm still trying to figure
out how these are related.

416
00:22:48,510 --> 00:22:56,660
But basically, A, not BC--
yeah, so I think what it is,

417
00:22:56,660 --> 00:23:00,740
is if it's A is colored
here, then you color all the

418
00:23:00,740 --> 00:23:02,980
not A's in this bottom graph.

419
00:23:02,980 --> 00:23:05,190
And if not B is colored
here, then you colored

420
00:23:05,190 --> 00:23:06,570
all the B's in this bottom graph.

421
00:23:06,570 --> 00:23:10,600
And again, if you color C here,
then the not C is colored here.

422
00:23:10,600 --> 00:23:14,539
And I think using that algorithm, you
can create a solution to vertex cover.

423
00:23:14,539 --> 00:23:16,205
Why is this the solution a vertex cover?

424
00:23:16,205 --> 00:23:20,450
So this is one thing I didn't mention,
is that all of the colored things

425
00:23:20,450 --> 00:23:23,720
are covering every single edge.

426
00:23:23,720 --> 00:23:26,600
That is, every single blue
node here is connected

427
00:23:26,600 --> 00:23:28,670
to every single edge in this graph.

428
00:23:28,670 --> 00:23:31,340
Also, as we discussed
before, vertex cover

429
00:23:31,340 --> 00:23:35,180
is related to independent
set in the compliment.

430
00:23:35,180 --> 00:23:38,390
So if you consider all the
white nodes in this graph,

431
00:23:38,390 --> 00:23:43,190
they form a independent set where no
2 of the white nodes are touching.

432
00:23:43,190 --> 00:23:45,710
So now we've created solutions
to two of these problems

433
00:23:45,710 --> 00:23:48,140
from a seemingly completely
unrelated problem,

434
00:23:48,140 --> 00:23:51,240
which is the satisfiability problem.

435
00:23:51,240 --> 00:23:56,090
And that's actually really powerful when
considering these type of NP problems.

436
00:23:56,090 --> 00:23:58,089
Sorry, that was a little bit wordy.

437
00:23:58,089 --> 00:23:59,880
Hopefully, you got
something out of that's.

438
00:23:59,880 --> 00:24:03,620
Let's go on to more fun things
like the Turing machine.

439
00:24:03,620 --> 00:24:05,660
Way more fun.

440
00:24:05,660 --> 00:24:11,010
Of course, this is due to Alan Turing,
the man who needs no introduction.

441
00:24:11,010 --> 00:24:13,070
But basically, this was
his theoretical model

442
00:24:13,070 --> 00:24:17,000
of a computer in its simplest
form, which is basically

443
00:24:17,000 --> 00:24:18,900
just something that runs an algorithm.

444
00:24:18,900 --> 00:24:22,580
I think this is sort of--
this image helps to see that.

445
00:24:22,580 --> 00:24:28,250
It's a machine that runs an algorithm
on an infinite tape of 1s and 0s.

446
00:24:28,250 --> 00:24:32,990
And that is a computer
at its simplest form.

447
00:24:32,990 --> 00:24:36,170
And Alan Turing saying that that
mathematical functions on numbers

448
00:24:36,170 --> 00:24:40,040
can be just as well computed by a
Turing machine, that was basically what

449
00:24:40,040 --> 00:24:44,940
was concluded in what's called
the Church-Turing thesis.

450
00:24:44,940 --> 00:24:48,650
And I like to think of this
as basically a reduction

451
00:24:48,650 --> 00:24:51,590
of a computer to its simplest
abilities, the ability

452
00:24:51,590 --> 00:24:54,000
to read from memory, the
ability to write from memory.

453
00:24:54,000 --> 00:24:55,291
And that's really all you need.

454
00:24:55,291 --> 00:24:59,930
You could interpret this Turing
machine of just 1s and 0s

455
00:24:59,930 --> 00:25:02,270
as like blocks of 8 bits.

456
00:25:02,270 --> 00:25:05,270
And then you could have bytes, say.

457
00:25:05,270 --> 00:25:08,330
And then you can work in letters,
as opposed to just 1s and 0s.

458
00:25:08,330 --> 00:25:11,480
And so I think reduction is just a
concept that comes up a lot in computer

459
00:25:11,480 --> 00:25:12,965
science, so it's good to know.

460
00:25:12,965 --> 00:25:16,460
And not just in the sense of
problems to problems, but also,

461
00:25:16,460 --> 00:25:19,424
for example, computers and
things that solve those problems.

462
00:25:19,424 --> 00:25:22,550


463
00:25:22,550 --> 00:25:25,974
It's worth noting, there are
also other theoretical ways

464
00:25:25,974 --> 00:25:27,140
of thinking about computers.

465
00:25:27,140 --> 00:25:32,107
Today, we have quantum computers,
which is just another theoretical model

466
00:25:32,107 --> 00:25:32,690
of a computer.

467
00:25:32,690 --> 00:25:36,830
And that has different ramifications
for the P versus NP problem.

468
00:25:36,830 --> 00:25:40,670
And the problem kind of
breaks down in that sense.

469
00:25:40,670 --> 00:25:43,840
But at least for Turing's
notion of computers,

470
00:25:43,840 --> 00:25:45,090
this is what we'll talk about.

471
00:25:45,090 --> 00:25:47,770


472
00:25:47,770 --> 00:25:51,720
Yeah, so what is the
resolution to this problem?

473
00:25:51,720 --> 00:25:52,870
Is P NP?

474
00:25:52,870 --> 00:25:54,790
Is P not NP?

475
00:25:54,790 --> 00:25:57,720
Are these problems-- is there
a problem that's not in NP

476
00:25:57,720 --> 00:26:00,950
but-- excuse me-- in NP
but definitely not in P?

477
00:26:00,950 --> 00:26:02,440
And the answer is pretty unclear.

478
00:26:02,440 --> 00:26:05,590
But there's a lot of
smart people, not like me,

479
00:26:05,590 --> 00:26:10,870
but smarter than me, who think that
there's definitely not a solution.

480
00:26:10,870 --> 00:26:13,840
Because people have spent a lot of
time thinking about this question.

481
00:26:13,840 --> 00:26:15,790
And for a lot of these
NP complete problems,

482
00:26:15,790 --> 00:26:17,800
they just can't find
efficient algorithms.

483
00:26:17,800 --> 00:26:20,300
I think this comic sums that up nicely.

484
00:26:20,300 --> 00:26:23,290


485
00:26:23,290 --> 00:26:28,120
So it's very much seems like P is not
NP, so I hate to break your all hearts.

486
00:26:28,120 --> 00:26:30,670
But a lot of people
think that P is not NP.

487
00:26:30,670 --> 00:26:33,790
There are some people who think P is NP.

488
00:26:33,790 --> 00:26:37,980
And then there are the people who have
different conceptions of the problem

489
00:26:37,980 --> 00:26:39,970
like, well, it's a badly worded problem.

490
00:26:39,970 --> 00:26:45,970
Any of the definitions of P and NP
are unrelated in their definitions.

491
00:26:45,970 --> 00:26:49,720
So there's a lot of conceptions,
but most computer scientists

492
00:26:49,720 --> 00:26:53,335
think it is a well-formed question
and that P is not equal to NP.

493
00:26:53,335 --> 00:26:56,800


494
00:26:56,800 --> 00:27:01,720
So how would one go about proving
this question of the relatedness

495
00:27:01,720 --> 00:27:05,789
of algorithms solving problems.

496
00:27:05,789 --> 00:27:08,830
And it turns out, it's just not easy
because the way we define these sets

497
00:27:08,830 --> 00:27:11,450
is by the existence
of algorithms, right?

498
00:27:11,450 --> 00:27:16,600
So to prove P is NP actually is
"easy" because all you have to do

499
00:27:16,600 --> 00:27:22,210
is find one polynomial time
algorithm for a NP complete problem.

500
00:27:22,210 --> 00:27:25,420
And then immediately, the
whole class collapses into P.

501
00:27:25,420 --> 00:27:28,090
That is, because if you
had a polynomial time

502
00:27:28,090 --> 00:27:30,070
algorithm to solve an
NP complete problem,

503
00:27:30,070 --> 00:27:32,560
you could solve all of the
other NP complete problems

504
00:27:32,560 --> 00:27:38,650
because we told you that they're all
related by some core completeness.

505
00:27:38,650 --> 00:27:42,010
And then to prove P is not
equal to NP is much harder,

506
00:27:42,010 --> 00:27:47,920
in the sense that you have to prove
that an algorithm doesn't exist.

507
00:27:47,920 --> 00:27:51,010
And that is actually kind of difficult
to solve because in the past,

508
00:27:51,010 --> 00:27:53,110
we've just invented new
algorithms, and it just

509
00:27:53,110 --> 00:27:56,740
took enough time before someone was
like, oh, well we can do it this way.

510
00:27:56,740 --> 00:28:02,140
And we don't have proof that an
algorithm doesn't exist in P.

511
00:28:02,140 --> 00:28:06,460
It's only until we just find an
algorithm in P that we can be sure that

512
00:28:06,460 --> 00:28:11,440
that wasn't-- excuse me--
was in NP and not in P.

513
00:28:11,440 --> 00:28:15,920
So that is inherently not an easy task.

514
00:28:15,920 --> 00:28:19,360
And I think that's why that makes
this problem so fascinating and so

515
00:28:19,360 --> 00:28:24,130
frustrating for a lot of smart people.

516
00:28:24,130 --> 00:28:27,820
I think one of the most interesting
ways to think about this problem

517
00:28:27,820 --> 00:28:30,990
is in terms of what if P did equal NP?

518
00:28:30,990 --> 00:28:35,500
And I think this is like
a counter-- or excuse me,

519
00:28:35,500 --> 00:28:39,790
like the-- I don't know what the
word is-- contradictive argument,

520
00:28:39,790 --> 00:28:42,610
where you say, well,
what if P did equal NP?

521
00:28:42,610 --> 00:28:47,320
And if P did equal NP, then the
world would be a lot different.

522
00:28:47,320 --> 00:28:51,280
If P was NP, then every problem
that would be easy to check

523
00:28:51,280 --> 00:28:53,080
would also be easy to solve.

524
00:28:53,080 --> 00:28:55,090
What does that mean?

525
00:28:55,090 --> 00:28:56,800
Actually, I don't know.

526
00:28:56,800 --> 00:29:01,300
I didn't realize this when I first
started to research this problem,

527
00:29:01,300 --> 00:29:06,170
but RSA encryption, which
is like basically the basis

528
00:29:06,170 --> 00:29:08,420
of a lot of our modern
encryption, would be

529
00:29:08,420 --> 00:29:11,230
very easy to crack because
it's based on factoring.

530
00:29:11,230 --> 00:29:13,540
It's based on factoring
large, large numbers.

531
00:29:13,540 --> 00:29:15,760
And if those numbers
were easy to factor,

532
00:29:15,760 --> 00:29:21,790
they would be very easy
to basically decrypt

533
00:29:21,790 --> 00:29:25,700
a lot of messages that are passed
online and things like that.

534
00:29:25,700 --> 00:29:29,420
But of course, we're relying on
the fact that P is not equal to NP,

535
00:29:29,420 --> 00:29:34,480
and that there is no algorithm
that exists currently to do that.

536
00:29:34,480 --> 00:29:37,570
Another ramification via
artificial intelligence systems

537
00:29:37,570 --> 00:29:42,050
would be making huge leaps almost
overnight because a proof to P

538
00:29:42,050 --> 00:29:46,210
equals NP would also tell us
something fundamental about solving

539
00:29:46,210 --> 00:29:50,690
certain problems that AI people solve.

540
00:29:50,690 --> 00:29:52,870
I don't want to go into
details too much there.

541
00:29:52,870 --> 00:29:56,440
But another ramification,
for example, would

542
00:29:56,440 --> 00:29:59,140
be that the economy would be
perfectly efficient, which

543
00:29:59,140 --> 00:30:06,760
means that if there were any arbitrage
between two different-- well,

544
00:30:06,760 --> 00:30:10,670
arbitrage works by taking
advantage of the different price

545
00:30:10,670 --> 00:30:13,390
discrepancies in a graph.

546
00:30:13,390 --> 00:30:18,710
And if those are easy to detect-- that
is, if there were a polynomial time

547
00:30:18,710 --> 00:30:22,120
algorithm to detect them,
then it would immediately

548
00:30:22,120 --> 00:30:25,480
be easy to find these
arbitrage opportunities, which

549
00:30:25,480 --> 00:30:29,350
is sort of a very bad thing.

550
00:30:29,350 --> 00:30:32,860
And then, one other
thing would be-- a lot

551
00:30:32,860 --> 00:30:36,790
of people say that proving
things, or mathematical proofs,

552
00:30:36,790 --> 00:30:39,730
is such an NP problem
because, well, it's

553
00:30:39,730 --> 00:30:41,380
easy to check if a proof is correct.

554
00:30:41,380 --> 00:30:42,710
You just follow the logic.

555
00:30:42,710 --> 00:30:46,960
It's much harder and it's a creative
task to actually create a proof.

556
00:30:46,960 --> 00:30:50,650
And so we might be
able to write programs

557
00:30:50,650 --> 00:30:54,086
that make mathematical
proofs if P did equal

558
00:30:54,086 --> 00:31:00,610
NP, which would just like be
crazy, and that would just

559
00:31:00,610 --> 00:31:03,190
make the whole world explode.

560
00:31:03,190 --> 00:31:06,040
The world would be fundamentally
different than the way we currently

561
00:31:06,040 --> 00:31:08,210
think it is.

562
00:31:08,210 --> 00:31:11,690
And a little more philosophy here,
which is one of the fun things

563
00:31:11,690 --> 00:31:16,510
I like to think about, is proving
things-- well we talked about it.

564
00:31:16,510 --> 00:31:20,770
It is sort of like an NP problem
because you can just follow the logic,

565
00:31:20,770 --> 00:31:22,750
and it's easy to check.

566
00:31:22,750 --> 00:31:24,890
Is comedy an NP problem?

567
00:31:24,890 --> 00:31:28,690
If I tell a joke and you laugh
at it, then it's really easy

568
00:31:28,690 --> 00:31:30,710
to check that that was a funny joke.

569
00:31:30,710 --> 00:31:33,440
Much harder task to write a funny joke.

570
00:31:33,440 --> 00:31:36,440
I know, because I do stand
up, and enough people

571
00:31:36,440 --> 00:31:38,890
have sat in silence at my jokes.

572
00:31:38,890 --> 00:31:41,690
I know it's a very hard thing to do.

573
00:31:41,690 --> 00:31:43,510
Music.

574
00:31:43,510 --> 00:31:44,690
Appreciation of music.

575
00:31:44,690 --> 00:31:45,790
Is it easy thing to check?

576
00:31:45,790 --> 00:31:46,390
Well, yeah.

577
00:31:46,390 --> 00:31:49,060
If you just listen to something,
you can kind of intuitively

578
00:31:49,060 --> 00:31:51,130
tell that that is a nice sound.

579
00:31:51,130 --> 00:31:56,380
But it is actually much different
much more difficult to create music,

580
00:31:56,380 --> 00:31:58,780
to create good music.

581
00:31:58,780 --> 00:32:01,840
These are all sort of
philosophical arguments.

582
00:32:01,840 --> 00:32:06,190
These aren't really-- the actual
definitions of P versus NP

583
00:32:06,190 --> 00:32:11,260
that we're exploring have a much more
well-defined relationship to Turing

584
00:32:11,260 --> 00:32:13,460
machines and what are called languages.

585
00:32:13,460 --> 00:32:15,940
But when you think very high
level, the ramifications

586
00:32:15,940 --> 00:32:21,640
are endless for any sort
of creative pursuit.

587
00:32:21,640 --> 00:32:25,046
And Scott Aaronson of MIT once
said that, "If P equaled NP,

588
00:32:25,046 --> 00:32:27,920
then the world would be a profoundly
different place than we normally

589
00:32:27,920 --> 00:32:28,586
assume it to be.

590
00:32:28,586 --> 00:32:30,880
There would be no special
value in creative leaps.

591
00:32:30,880 --> 00:32:33,610
There would be no fundamental
gap between solving a problem

592
00:32:33,610 --> 00:32:35,890
and recognizing the
solution once it's found.

593
00:32:35,890 --> 00:32:38,380
Everyone who can appreciate
a symphony would be Mozart,

594
00:32:38,380 --> 00:32:41,470
and everyone who could follow a
step-by-step argument would be Gauss."

595
00:32:41,470 --> 00:32:45,390
And I think that tells you something
about the crazy ramifications

596
00:32:45,390 --> 00:32:48,625
of a proof that P did equal to NP.

597
00:32:48,625 --> 00:32:52,780


598
00:32:52,780 --> 00:32:53,989
Yeah, so why should you care?

599
00:32:53,989 --> 00:32:56,738
Hopefully, you're sitting here and
being like, well, that's great,

600
00:32:56,738 --> 00:32:58,240
but I have a final product to do.

601
00:32:58,240 --> 00:33:00,210
And this doesn't really help at all.

602
00:33:00,210 --> 00:33:04,750
I don't think many of you are going to
be doing theoretical proofs or anything

603
00:33:04,750 --> 00:33:06,670
like that or talking about theory.

604
00:33:06,670 --> 00:33:08,120
Although, I think you should.

605
00:33:08,120 --> 00:33:12,130
We talk in CS50 about running
time of different algorithms.

606
00:33:12,130 --> 00:33:16,960
And that is very relevant because
there could be a faster way

607
00:33:16,960 --> 00:33:18,920
to do the problems that
you're thinking about.

608
00:33:18,920 --> 00:33:20,429
You just don't know yet.

609
00:33:20,429 --> 00:33:22,720
And that's why theoretical
computer scientists research

610
00:33:22,720 --> 00:33:26,170
the fastest way to do certain things.

611
00:33:26,170 --> 00:33:29,669
And that's why reductions have
shown us-- reductions from NP

612
00:33:29,669 --> 00:33:32,210
complete problems have shown us
that like if a problem exists

613
00:33:32,210 --> 00:33:34,690
of this problem, a problem
that you're trying to solve

614
00:33:34,690 --> 00:33:36,610
might actually have already been solved.

615
00:33:36,610 --> 00:33:41,690
Or another other thing is that, well,
if you assume P doesn't equal to NP,

616
00:33:41,690 --> 00:33:46,840
something you're trying to do
might not be that easy to do.

617
00:33:46,840 --> 00:33:50,380
And if that wasn't motivation
again for why you should care,

618
00:33:50,380 --> 00:33:52,420
well, it's a million
dollar question, right?

619
00:33:52,420 --> 00:33:58,810
And there's nothing that computer
scientists like more than computers

620
00:33:58,810 --> 00:34:00,130
except for money, right?

621
00:34:00,130 --> 00:34:02,440
So I think that should
be motivation enough

622
00:34:02,440 --> 00:34:06,140
for you to be interested
in the P versus NP problem.

623
00:34:06,140 --> 00:34:09,020
It is something that is
really interesting to me.

624
00:34:09,020 --> 00:34:13,027
I know I took you through the weeds a
little bit with the actual reductions,

625
00:34:13,027 --> 00:34:14,360
but I hope that was interesting.

626
00:34:14,360 --> 00:34:16,270
And that's really what
computer scientists do

627
00:34:16,270 --> 00:34:21,110
when they think about these types of
problems, is reduce problems to another

628
00:34:21,110 --> 00:34:25,690
and sort of relate these algorithms.

629
00:34:25,690 --> 00:34:31,026
And so I think that's about all I have.

630
00:34:31,026 --> 00:34:31,900
But I hope you enjoy.

631
00:34:31,900 --> 00:34:32,733
Here are some links.

632
00:34:32,733 --> 00:34:37,449
Just the P versus NP page is a
collection of the status and history

633
00:34:37,449 --> 00:34:38,650
of the problem.

634
00:34:38,650 --> 00:34:40,340
And this is just a video I really like.

635
00:34:40,340 --> 00:34:43,989
It explains all these things very well.

636
00:34:43,989 --> 00:34:45,219
That's all I have.

637
00:34:45,219 --> 00:34:49,210
I hope that was enjoyable for you.

638
00:34:49,210 --> 00:34:55,030
And for everyone out there
enjoying it on YouTube, I guess,

639
00:34:55,030 --> 00:34:57,380
I hope you could take
something away from this.

640
00:34:57,380 --> 00:35:02,090
And yeah, P versus NP.

641
00:35:02,090 --> 00:35:03,910
It's pretty cool.

642
00:35:03,910 --> 00:35:06,141