1
00:00:00,000 --> 00:00:02,097
2
00:00:02,097 --> 00:00:03,930
SAMMY: Well, thanks for
watching my seminar.
3
00:00:03,930 --> 00:00:05,930
My name is Sammy Mara.
4
00:00:05,930 --> 00:00:09,860
I am a senior studying computer
science here at Harvard.
5
00:00:09,860 --> 00:00:12,180
And P versus NP is what I'm
going to be talking about.
6
00:00:12,180 --> 00:00:15,450
It is the greatest unsolved
problem in computer science,
7
00:00:15,450 --> 00:00:19,170
and maybe all of math, I think.
8
00:00:19,170 --> 00:00:21,520
And it's something that has
fascinated me for a while.
9
00:00:21,520 --> 00:00:24,180
So I think it's a natural
extension of the CS50,
10
00:00:24,180 --> 00:00:26,840
and I hope you learn something from it.
11
00:00:26,840 --> 00:00:29,340
We'll go into some of
the details, as well as
12
00:00:29,340 --> 00:00:33,630
the big picture of what is P versus NP.
13
00:00:33,630 --> 00:00:34,490
And yeah.
14
00:00:34,490 --> 00:00:37,440
So let's get started here.
15
00:00:37,440 --> 00:00:41,100
This is just to show that P versus NP
is sort of like a relevant problem,
16
00:00:41,100 --> 00:00:42,750
and it shows up in pop culture.
17
00:00:42,750 --> 00:00:46,470
This is Homer Simpson in
some matrix void thing,
18
00:00:46,470 --> 00:00:50,340
and you can see P equals NP
floating in the background there.
19
00:00:50,340 --> 00:00:53,970
And then in Futurama-- I guess that's
also a Matt Groening production--
20
00:00:53,970 --> 00:00:58,640
P and NP in separate books, sort of
implying that P is not equal to NP.
21
00:00:58,640 --> 00:01:02,880
22
00:01:02,880 --> 00:01:05,610
Really, the question is whether
or not P is equal to NP,
23
00:01:05,610 --> 00:01:09,280
and we'll get into what
that means in just a second.
24
00:01:09,280 --> 00:01:12,520
But if you're watching
this and you thinking hey,
25
00:01:12,520 --> 00:01:16,740
I want to solve P versus NP for my
final project, got some bad news.
26
00:01:16,740 --> 00:01:20,550
It's probably not the best idea.
27
00:01:20,550 --> 00:01:25,080
Don't want to discourage you, but a lot
of really, really famous, smart people,
28
00:01:25,080 --> 00:01:28,890
much smarter than me, have tried to
solve this problem for a long time.
29
00:01:28,890 --> 00:01:33,409
And its applications
are very far-reaching,
30
00:01:33,409 --> 00:01:35,700
and I think we can learn a
lot from it, but the problem
31
00:01:35,700 --> 00:01:37,237
may not be solved for a long time.
32
00:01:37,237 --> 00:01:39,750
33
00:01:39,750 --> 00:01:45,450
And yeah, why should we care about
the theory of computer science?
34
00:01:45,450 --> 00:01:49,670
And in CS50, you learn a lot about
the running time of algorithms.
35
00:01:49,670 --> 00:01:52,260
And we talk a lot about
how algorithms work
36
00:01:52,260 --> 00:01:55,920
and what are the best ways
to solve certain problems.
37
00:01:55,920 --> 00:01:58,980
But theory of computer
science sort of goes
38
00:01:58,980 --> 00:02:01,210
into how can we extrapolate from there?
39
00:02:01,210 --> 00:02:05,970
And what are the fundamental things
about algorithms that make them
40
00:02:05,970 --> 00:02:09,960
similar to one another and different
from one another, as well as
41
00:02:09,960 --> 00:02:13,450
how to model the computer
in its simplest form?
42
00:02:13,450 --> 00:02:17,070
That's something that Alan
Turing did, way back when,
43
00:02:17,070 --> 00:02:20,119
and he became famous for that.
44
00:02:20,119 --> 00:02:22,410
Everyone is-- I mean, not
everyone, but a lot of people
45
00:02:22,410 --> 00:02:24,035
are familiar with like Turing machines.
46
00:02:24,035 --> 00:02:28,180
That's sort of his theoretical
model of a computer.
47
00:02:28,180 --> 00:02:28,980
And yeah.
48
00:02:28,980 --> 00:02:32,070
It helps us solve problems and
use solutions from one problem
49
00:02:32,070 --> 00:02:36,840
to solve other problems,
so it's pretty profound.
50
00:02:36,840 --> 00:02:39,900
And yeah.
51
00:02:39,900 --> 00:02:44,002
So I have a little more motivation in
answering this question of P versus NP.
52
00:02:44,002 --> 00:02:45,960
And I still haven't really
told you what it is.
53
00:02:45,960 --> 00:02:49,380
But solving this problem
would essentially
54
00:02:49,380 --> 00:02:53,280
help us with questions like are
algorithms invented or discovered?
55
00:02:53,280 --> 00:02:56,191
Is there some fast
algorithm for solving Sudoku
56
00:02:56,191 --> 00:02:57,690
that we just haven't discovered yet?
57
00:02:57,690 --> 00:03:00,780
Or did it just exist in the ether
and we have discovered it yet?
58
00:03:00,780 --> 00:03:05,370
Or are we just like too dumb to
discover it, but like definitely exists?
59
00:03:05,370 --> 00:03:08,360
Or does it just not exist at all?
60
00:03:08,360 --> 00:03:10,980
And yeah, this question is-- if
that wasn't motivation enough,
61
00:03:10,980 --> 00:03:12,990
it's a million dollar
question in the sense
62
00:03:12,990 --> 00:03:16,560
that the Clay Institute-- It's one
of Clay Institute's Millennium Prize
63
00:03:16,560 --> 00:03:21,600
problems, which means there is
$1 million for whoever person
64
00:03:21,600 --> 00:03:25,600
can solve it and prove
it, one way or the other.
65
00:03:25,600 --> 00:03:30,780
Because what do computer scientists
love more than computers?
66
00:03:30,780 --> 00:03:33,380
Yeah, money.
67
00:03:33,380 --> 00:03:37,140
And I'm hoping that a CS50
student will be the one
68
00:03:37,140 --> 00:03:39,180
to solve this problem one day.
69
00:03:39,180 --> 00:03:41,860
So I hope you can--
70
00:03:41,860 --> 00:03:44,430
So what is the question of P versus NP?
71
00:03:44,430 --> 00:03:50,430
And it's really, are problem that
are easy to check also easy to solve?
72
00:03:50,430 --> 00:03:55,200
And that's sort of a very, very,
very high-level thought-- way
73
00:03:55,200 --> 00:03:55,980
to think about it.
74
00:03:55,980 --> 00:03:58,639
But what is this word
easy mean in this context?
75
00:03:58,639 --> 00:04:00,180
What does it mean to check a problem?
76
00:04:00,180 --> 00:04:02,160
What does it mean to solve a problem?
77
00:04:02,160 --> 00:04:04,795
Hoping to go into all of
that in just a minute.
78
00:04:04,795 --> 00:04:06,336
And hopefully that will become clear.
79
00:04:06,336 --> 00:04:09,010
80
00:04:09,010 --> 00:04:13,410
So yeah, this problem in
particular has a long history.
81
00:04:13,410 --> 00:04:16,260
Back in the mid 1960s,
when computers were still
82
00:04:16,260 --> 00:04:19,410
the size of cabinets
and small houses, people
83
00:04:19,410 --> 00:04:22,980
were still trying to find algorithms
to solve basically all the world's
84
00:04:22,980 --> 00:04:27,060
problems, like factoring and
the Traveling Salesman Problem,
85
00:04:27,060 --> 00:04:30,860
and determining if a number is
prime, and problems like these.
86
00:04:30,860 --> 00:04:33,510
But some problems they
had slow algorithms for,
87
00:04:33,510 --> 00:04:36,450
they were able to solve them,
but it just took a long time.
88
00:04:36,450 --> 00:04:39,690
And for some algorithms, like find the
greatest common divisor and sorting,
89
00:04:39,690 --> 00:04:42,240
we have fast algorithms like merge sort.
90
00:04:42,240 --> 00:04:45,660
That's, worst case, running
a prime of, I think, n log n.
91
00:04:45,660 --> 00:04:46,220
I'm a CI.
92
00:04:46,220 --> 00:04:48,360
I should know that.
93
00:04:48,360 --> 00:04:50,340
So things like that just
have fast algorithms.
94
00:04:50,340 --> 00:04:55,080
But for some problems, we
just have no no fast solution.
95
00:04:55,080 --> 00:04:58,440
And then in 1965, people
came up with a solution
96
00:04:58,440 --> 00:05:02,040
to the discrete Fourier
transform, and suddenly, that
97
00:05:02,040 --> 00:05:04,500
was in the fast category.
98
00:05:04,500 --> 00:05:07,590
And for things like determining
if a number is prime,
99
00:05:07,590 --> 00:05:13,560
that took until 2002 for someone
to find an algorithm to do that,
100
00:05:13,560 --> 00:05:15,540
until it could be in this fast category.
101
00:05:15,540 --> 00:05:17,940
And I'll explain what a
fast algorithm is versus
102
00:05:17,940 --> 00:05:20,889
a slow algorithm in just a second.
103
00:05:20,889 --> 00:05:23,180
But for a lot of these
algorithms, there were just no--
104
00:05:23,180 --> 00:05:25,805
for a lot of these problems,
there were just no such algorithms
105
00:05:25,805 --> 00:05:28,800
that were able to be solved quickly.
106
00:05:28,800 --> 00:05:34,590
And it didn't seem like we would ever
find a fast solution to these problem.
107
00:05:34,590 --> 00:05:39,120
And so that is really the question,
is will we ever find fast solutions
108
00:05:39,120 --> 00:05:41,640
to these problems?
109
00:05:41,640 --> 00:05:44,670
And actually, the
answer is very unclear.
110
00:05:44,670 --> 00:05:48,390
Although, I think one of the most
important moments for this problem
111
00:05:48,390 --> 00:05:56,070
was a theorem that Steven Cook of
MIT and Leonid Levin came up with.
112
00:05:56,070 --> 00:05:57,960
And it basically said
that a lot of problems
113
00:05:57,960 --> 00:06:00,390
that we face in computer
science are all connected
114
00:06:00,390 --> 00:06:02,460
by some sort of core difficulty.
115
00:06:02,460 --> 00:06:04,700
And what that means is
that if you could solve--
116
00:06:04,700 --> 00:06:07,450
if you could have a fast solution
to one algorithm or one problem,
117
00:06:07,450 --> 00:06:10,020
you could actually
use the same algorithm
118
00:06:10,020 --> 00:06:13,170
to solve very different,
seemingly different problem.
119
00:06:13,170 --> 00:06:17,490
And that was a good way to
categorize these problems,
120
00:06:17,490 --> 00:06:22,320
and we learned a lot from that about
complexity and general complexity
121
00:06:22,320 --> 00:06:27,280
theory being the fields of
categorizing these problems.
122
00:06:27,280 --> 00:06:33,240
So I think that was a
really important moment.
123
00:06:33,240 --> 00:06:33,740
Yeah.
124
00:06:33,740 --> 00:06:40,320
So let's go into a little more detail
here about what P is, what is NP,
125
00:06:40,320 --> 00:06:43,870
why are they fighting, and so forth.
126
00:06:43,870 --> 00:06:49,310
But at the very base of it,
P and NP are sets of problem.
127
00:06:49,310 --> 00:06:52,620
Normally in math, we think about
sets of numbers or sets of elements
128
00:06:52,620 --> 00:06:54,020
or sets of words or whatever.
129
00:06:54,020 --> 00:06:56,460
But now we're talking
about sets of problems.
130
00:06:56,460 --> 00:06:59,010
And the way these problems
are defined is whether or not
131
00:06:59,010 --> 00:07:01,230
they have these fast algorithms.
132
00:07:01,230 --> 00:07:06,960
So some problems, like the problem
in P have known fast algorithms.
133
00:07:06,960 --> 00:07:14,880
And some problems that are in NP are
at least able to be checked quickly.
134
00:07:14,880 --> 00:07:18,330
And definitely, if a
problem is P, it is in NP
135
00:07:18,330 --> 00:07:21,360
because if you can solve
a problem very quickly,
136
00:07:21,360 --> 00:07:25,110
it's also easy to check it
because you just solve it again.
137
00:07:25,110 --> 00:07:30,520
And that will all become clear,
hopefully, in the next few slides.
138
00:07:30,520 --> 00:07:34,710
But yeah, so let's talk first a
little more in-depth about what is P?
139
00:07:34,710 --> 00:07:40,470
The set polynomial time, which
is basically the set of problems
140
00:07:40,470 --> 00:07:44,010
where there exists a polynomial time
algorithm that generates a solution.
141
00:07:44,010 --> 00:07:46,290
That might be a little
bit confusing at first.
142
00:07:46,290 --> 00:07:50,850
But in CS50, hopefully, you learned
a little bit about big O notation
143
00:07:50,850 --> 00:07:53,940
or worst case algorithm runtime.
144
00:07:53,940 --> 00:07:57,270
And what this means is that these
are the types of problem that can be
145
00:07:57,270 --> 00:07:59,670
solved in a polynomial amount of time.
146
00:07:59,670 --> 00:08:04,260
That could be like 3n
to the 5th plus 5, where
147
00:08:04,260 --> 00:08:10,140
n is the size of the
problem by some definition.
148
00:08:10,140 --> 00:08:12,990
So these problems include
finding the GCD of two numbers.
149
00:08:12,990 --> 00:08:15,390
We have a fast algorithm for that.
150
00:08:15,390 --> 00:08:19,170
It's been known for lots of--
many, many years due to Euclid.
151
00:08:19,170 --> 00:08:20,420
Excuse me.
152
00:08:20,420 --> 00:08:24,270
And you can see this is the
Euclidean algorithm right there.
153
00:08:24,270 --> 00:08:27,210
And it just is very deterministic.
154
00:08:27,210 --> 00:08:29,850
You just put in input,
and you can slowly
155
00:08:29,850 --> 00:08:33,330
boil down the problem into a solution.
156
00:08:33,330 --> 00:08:36,274
Linear programming is basically
like a set of equations.
157
00:08:36,274 --> 00:08:38,190
Let's not worry about
that too much right now,
158
00:08:38,190 --> 00:08:42,062
but that's another problem
that is well known to be in P.
159
00:08:42,062 --> 00:08:45,270
And determining if a number is prime,
that actually-- as I mentioned earlier,
160
00:08:45,270 --> 00:08:47,790
that result wasn't found until 2002.
161
00:08:47,790 --> 00:08:52,980
But that is also in the
P family of problems.
162
00:08:52,980 --> 00:08:56,610
And multiplication, as you
probably all know, the algorithm
163
00:08:56,610 --> 00:08:58,166
for multiplying two numbers.
164
00:08:58,166 --> 00:08:59,790
You just sort of go through one by one.
165
00:08:59,790 --> 00:09:03,550
And that's, at least-- if you multiply
every digit by every other digit,
166
00:09:03,550 --> 00:09:05,070
that's sort of like n squared.
167
00:09:05,070 --> 00:09:06,111
And then you add them up.
168
00:09:06,111 --> 00:09:07,620
So it's at most n cubed.
169
00:09:07,620 --> 00:09:11,010
But crucially, these albums--
excuse me-- algorithms
170
00:09:11,010 --> 00:09:14,820
aren't like 2 to the n
or something exponential.
171
00:09:14,820 --> 00:09:22,380
So that's what makes them
in this category of P. Yes.
172
00:09:22,380 --> 00:09:28,530
And yeah, so let's move on a
little bit to the set NP, which
173
00:09:28,530 --> 00:09:31,920
stands for nondeterministic
polynomial-time time, which
174
00:09:31,920 --> 00:09:36,570
is sort of like a way of saying,
well, they have polynomial algorithms
175
00:09:36,570 --> 00:09:38,970
to solve them, but
they're nondeterministic,
176
00:09:38,970 --> 00:09:42,300
which means you just sort
of have to guess or use
177
00:09:42,300 --> 00:09:44,910
an infinite number of computers
before you can solve them
178
00:09:44,910 --> 00:09:46,710
in a polynomial amount of time.
179
00:09:46,710 --> 00:09:50,070
But the way these are defined
formally-- and before,
180
00:09:50,070 --> 00:09:53,190
we talked about problems
that can be solved quickly.
181
00:09:53,190 --> 00:09:56,370
Now, we're talking about problems
that can be checked quickly.
182
00:09:56,370 --> 00:09:59,609
And what that means is it's
a set of decision problems--
183
00:09:59,609 --> 00:10:02,400
and I'll explain what that means
in just a second-- for which there
184
00:10:02,400 --> 00:10:06,480
exists a polynomial algorithm to
check if a solution is correct.
185
00:10:06,480 --> 00:10:09,840
So before, we were saying I'm
giving you a problem; tell me
186
00:10:09,840 --> 00:10:16,290
a solution and some
definition of the solution.
187
00:10:16,290 --> 00:10:19,380
And now I'm saying here's
a problem and a solution,
188
00:10:19,380 --> 00:10:20,940
and I want you to check the solution.
189
00:10:20,940 --> 00:10:25,890
So does that checking take
a polynomial amount of time?
190
00:10:25,890 --> 00:10:30,000
And so these problems
are-- I guess problems
191
00:10:30,000 --> 00:10:34,710
included in this set are like Sudoku
because if I give you a Sudoku board
192
00:10:34,710 --> 00:10:36,959
and said this is solved,
how fast would it
193
00:10:36,959 --> 00:10:38,500
be for you to check if it was solved?
194
00:10:38,500 --> 00:10:40,166
It actually wouldn't take you that long.
195
00:10:40,166 --> 00:10:44,550
You'd probably just go through
every row, column, and nine
196
00:10:44,550 --> 00:10:47,700
by-- excuse me-- three
by three square to check.
197
00:10:47,700 --> 00:10:50,040
And that's what n cubed.
198
00:10:50,040 --> 00:10:52,980
But crucially, it's not
something like 9 to the n, which
199
00:10:52,980 --> 00:10:55,066
would be an exponential amount of time.
200
00:10:55,066 --> 00:10:57,800
201
00:10:57,800 --> 00:11:00,800
And in other problems
like factoring, it's
202
00:11:00,800 --> 00:11:04,060
easy to check if a factor or
a factorization is correct.
203
00:11:04,060 --> 00:11:05,990
You just multiply the
numbers, and then you
204
00:11:05,990 --> 00:11:08,440
can see if that
factorization is correct.
205
00:11:08,440 --> 00:11:13,370
A much harder problem to
factor the numbers, by the way.
206
00:11:13,370 --> 00:11:18,970
And Traveling Sales problem's
an example of famous NP problem
207
00:11:18,970 --> 00:11:22,217
where you're given a graph
with vertices and edges.
208
00:11:22,217 --> 00:11:24,800
And we'll get into a little more
about graphs in just a second
209
00:11:24,800 --> 00:11:28,720
because those turn out to be
really relevant to NP problems
210
00:11:28,720 --> 00:11:31,130
and thinking about them.
211
00:11:31,130 --> 00:11:34,520
But again, multiplication-- I thought
we just said that was a P problem.
212
00:11:34,520 --> 00:11:36,800
Well, if something's
in P, it's also in NP
213
00:11:36,800 --> 00:11:39,860
because if I said,
well, what's 51 times 3?
214
00:11:39,860 --> 00:11:43,130
If I said 51 times 3 is 153,
well, how would you know?
215
00:11:43,130 --> 00:11:47,010
You'd probably just check it
yourself and multiply yourself.
216
00:11:47,010 --> 00:11:49,490
And of course, that's, as
we described, a P algorithm.
217
00:11:49,490 --> 00:11:53,570
So that has a P algorithm to check
whether or not my solution is correct.
218
00:11:53,570 --> 00:11:55,570
That's how you know it's in NP.
219
00:11:55,570 --> 00:11:59,950
So we've talked about P.
We've talked about NP.
220
00:11:59,950 --> 00:12:02,900
So now, we have a couple
examples here, and maybe give you
221
00:12:02,900 --> 00:12:06,410
a better grasp of how to think
about different problems.
222
00:12:06,410 --> 00:12:10,670
So sorting a list, is that in P or NP?
223
00:12:10,670 --> 00:12:14,930
Well, we talked about sorting and
the running time of those in CS50.
224
00:12:14,930 --> 00:12:18,290
Hopefully, you know that
that's, for a lot of sorts,
225
00:12:18,290 --> 00:12:23,300
that's n log n or n squared,
I think, for a quick sort.
226
00:12:23,300 --> 00:12:25,940
But those all have
polynomial time algorithms,
227
00:12:25,940 --> 00:12:29,630
so those would be in P, also in NP.
228
00:12:29,630 --> 00:12:32,220
Multiplication, we just
described P algorithm, right?
229
00:12:32,220 --> 00:12:35,420
It's something like n cubed.
230
00:12:35,420 --> 00:12:38,600
And then given a set of vertices
and edges, is the graph connected?
231
00:12:38,600 --> 00:12:41,290
Is that P or NP?
232
00:12:41,290 --> 00:12:43,670
Well, you just start at
a point, and you traverse
233
00:12:43,670 --> 00:12:46,970
until you have seen all the edges.
234
00:12:46,970 --> 00:12:48,970
If you haven't seen--
excuse me, all the points.
235
00:12:48,970 --> 00:12:55,430
If you haven't seen all of the vertices,
then you know it is not well connected.
236
00:12:55,430 --> 00:12:57,800
And another example
would be a Rubik's cube.
237
00:12:57,800 --> 00:12:59,570
First of all, is that in NP?
238
00:12:59,570 --> 00:13:04,111
That is, is there a way to check
if the Rubik's cube is solved?
239
00:13:04,111 --> 00:13:04,610
Well, yeah.
240
00:13:04,610 --> 00:13:05,580
It's pretty easy.
241
00:13:05,580 --> 00:13:08,150
You just check if all
the faces are solved.
242
00:13:08,150 --> 00:13:09,200
But is it in P?
243
00:13:09,200 --> 00:13:13,070
Is there a polynomial time algorithm
to actually solve the Rubik's cube?
244
00:13:13,070 --> 00:13:15,650
That is actually a little
harder to think about,
245
00:13:15,650 --> 00:13:22,587
but it turns out it is actually--
computers can do that fairly fast.
246
00:13:22,587 --> 00:13:24,170
And what about the best move in chess?
247
00:13:24,170 --> 00:13:25,370
Is that in NP?
248
00:13:25,370 --> 00:13:28,640
Well, if I told you that this
was the best move in chess,
249
00:13:28,640 --> 00:13:31,550
could you check that it's
the best move in chess?
250
00:13:31,550 --> 00:13:32,570
Probably not.
251
00:13:32,570 --> 00:13:35,120
I think you'd have to go
through a more calculation
252
00:13:35,120 --> 00:13:37,670
to determine if that was the best move.
253
00:13:37,670 --> 00:13:40,630
And that problem, actually,
is not in NP or P.
254
00:13:40,630 --> 00:13:43,220
That's in its own class
called exponential time,
255
00:13:43,220 --> 00:13:48,820
where it takes exponential time to
even check if a solution is correct.
256
00:13:48,820 --> 00:13:51,620
And then there's problems
like-- a classic NP
257
00:13:51,620 --> 00:13:56,090
problem is like the subset sum problem,
which means given a set of integers,
258
00:13:56,090 --> 00:13:59,900
say, I'm asking do they
sum to negative-- is there
259
00:13:59,900 --> 00:14:02,130
a subset that sums to negative 1?
260
00:14:02,130 --> 00:14:05,980
And in this case, you
could just look at it.
261
00:14:05,980 --> 00:14:08,210
And if I gave you-- sorry,
if I gave you a subset
262
00:14:08,210 --> 00:14:11,085
and then said these sub to minus 1,
it would be really easy to check.
263
00:14:11,085 --> 00:14:14,530
You just sum them up,
and they sum to minus 1.
264
00:14:14,530 --> 00:14:17,230
But solving that problem
actually isn't very easy.
265
00:14:17,230 --> 00:14:19,430
And there's some sort
of exponential checking
266
00:14:19,430 --> 00:14:24,950
that you have to do to determine
if there is such a subset.
267
00:14:24,950 --> 00:14:28,400
And if you think about how many
subsets there are, 2 to the n
268
00:14:28,400 --> 00:14:31,310
subsets, I believe, where
there's n members of a set.
269
00:14:31,310 --> 00:14:35,900
So you have to check a lot, and that's
a good indicator that it is not in P.
270
00:14:35,900 --> 00:14:41,290
But in this case, it is, as we showed,
in NP because it's easy to check.
271
00:14:41,290 --> 00:14:45,050
272
00:14:45,050 --> 00:14:49,930
So I was talking to you
before about NP completeness.
273
00:14:49,930 --> 00:14:53,210
And this is sort of one of
the most important concepts
274
00:14:53,210 --> 00:14:57,289
to understand when talking
about P versus NP problem.
275
00:14:57,289 --> 00:15:00,580
Basically, what this means-- and this is
related to the Cook-Levin theorem that
276
00:15:00,580 --> 00:15:03,510
was talking about earlier.
277
00:15:03,510 --> 00:15:06,320
It essentially says that all
of these problems-- and this
278
00:15:06,320 --> 00:15:08,610
is an example of a
bunch of NP problems--
279
00:15:08,610 --> 00:15:10,090
they're all sort of connected.
280
00:15:10,090 --> 00:15:13,610
Where, if you have a solution,
say, to the graph coloring problem,
281
00:15:13,610 --> 00:15:17,540
you can use it to form a solution
to the clique cover problem
282
00:15:17,540 --> 00:15:21,530
or to the hitting set problem or to
the yada, yada-- you can basically
283
00:15:21,530 --> 00:15:23,600
form one solution into the other.
284
00:15:23,600 --> 00:15:29,220
And that's really powerful because if
someone one day finds a polynomial time
285
00:15:29,220 --> 00:15:31,370
algorithm to do one of
these problems, that
286
00:15:31,370 --> 00:15:34,940
means they can find a polynomial time
algorithm to do all of these problems
287
00:15:34,940 --> 00:15:39,119
because they're all sort of related,
and they can be converted to one another
288
00:15:39,119 --> 00:15:39,660
very quickly.
289
00:15:39,660 --> 00:15:42,190
290
00:15:42,190 --> 00:15:44,580
And I sort of glossed
over detail there, which
291
00:15:44,580 --> 00:15:49,180
is how fast can these algorithms
solutions be converted to one another?
292
00:15:49,180 --> 00:15:50,910
And that has to do with reduction.
293
00:15:50,910 --> 00:15:53,560
And I just want to talk
about this for a second
294
00:15:53,560 --> 00:15:57,600
because this is one of the
more core concepts that
295
00:15:57,600 --> 00:15:59,236
is involved when proving these things.
296
00:15:59,236 --> 00:16:01,110
And this is actually
what computer scientists
297
00:16:01,110 --> 00:16:04,980
have to do when they talk about--
are these problems NP complete?
298
00:16:04,980 --> 00:16:08,440
Are these problems related
by some core difficulty?
299
00:16:08,440 --> 00:16:10,260
And what it means is
basically a reduction
300
00:16:10,260 --> 00:16:13,260
is a polynomial time algorithm that
converts a solution from one problem
301
00:16:13,260 --> 00:16:15,190
to the solution of another problem.
302
00:16:15,190 --> 00:16:19,331
And I'm going to be talking a little bit
about three particular computer science
303
00:16:19,331 --> 00:16:21,330
related problems, which
are the independents set
304
00:16:21,330 --> 00:16:24,450
problem, the vertex cover
problem, and the clique problem.
305
00:16:24,450 --> 00:16:29,040
Independent set problem
has to do with-- basically,
306
00:16:29,040 --> 00:16:31,470
given a graph like the one
I'm showing in the slide,
307
00:16:31,470 --> 00:16:37,630
are there are a set of k vertices, such
that there are no 2 of those vertices
308
00:16:37,630 --> 00:16:40,290
are touching or related by an edge?
309
00:16:40,290 --> 00:16:43,690
The vertex cover problem
means can you give me
310
00:16:43,690 --> 00:16:51,090
a minimum set of vertices
that covers all the edges?
311
00:16:51,090 --> 00:16:53,420
That is, that all of the
edges are-- excuse me.
312
00:16:53,420 --> 00:16:58,570
All of the vertices are touching
every one of the edges in the graph.
313
00:16:58,570 --> 00:17:02,040
And then the clique problem-- I don't
know if it's "click" or "kleek."
314
00:17:02,040 --> 00:17:04,079
Some people say "kleek."
315
00:17:04,079 --> 00:17:06,510
But it's related to the
colloquial sense of the word
316
00:17:06,510 --> 00:17:09,569
clique, which is like a friend group.
317
00:17:09,569 --> 00:17:13,050
Is there a friend group,
essentially, in this graph
318
00:17:13,050 --> 00:17:18,609
where every single person in the graph,
every person represented by a vertex,
319
00:17:18,609 --> 00:17:21,700
is connected to every single
other vertex in the clique?
320
00:17:21,700 --> 00:17:23,920
And in this case, you
can see there is actually
321
00:17:23,920 --> 00:17:28,280
a 5 clique in the bottom
left of these 5 vertexes
322
00:17:28,280 --> 00:17:30,500
are all connected to each other.
323
00:17:30,500 --> 00:17:32,920
And so that makes them
what's called a 5 clique.
324
00:17:32,920 --> 00:17:35,711
So these are the three problems
that I'm going to go into in detail
325
00:17:35,711 --> 00:17:36,640
just a second.
326
00:17:36,640 --> 00:17:40,670
And I'm going to show you that they're
actually all sort of the same problem.
327
00:17:40,670 --> 00:17:44,330
And solving one can actually
give you a solution to another.
328
00:17:44,330 --> 00:17:48,510
So let's take the example of this graph.
329
00:17:48,510 --> 00:17:52,540
Again, graphs generally is a
set of vertices, excuse me,
330
00:17:52,540 --> 00:17:56,220
and edges connecting them.
331
00:17:56,220 --> 00:17:58,140
And this is an example
of an independent set.
332
00:17:58,140 --> 00:17:59,280
So how do we get this?
333
00:17:59,280 --> 00:18:04,740
If I just said to you give me 4 of
these points, such that no 2 of them
334
00:18:04,740 --> 00:18:10,781
are related by an edge, then
this would be the solution.
335
00:18:10,781 --> 00:18:11,280
Great.
336
00:18:11,280 --> 00:18:12,580
So how long does that take?
337
00:18:12,580 --> 00:18:14,440
Well, if it were a big
graph, it would have
338
00:18:14,440 --> 00:18:16,460
taken some exponential amount of time.
339
00:18:16,460 --> 00:18:19,180
There's no polynomial
time algorithm to do that.
340
00:18:19,180 --> 00:18:21,940
But if we are able to
get a solution, we can
341
00:18:21,940 --> 00:18:24,070
use it to solve some other problems.
342
00:18:24,070 --> 00:18:26,610
And in this case, we can
use the independent set
343
00:18:26,610 --> 00:18:28,630
to solve the vertex cover problem.
344
00:18:28,630 --> 00:18:29,380
How do we do that?
345
00:18:29,380 --> 00:18:33,940
Well, we just consider all of
the vertexes in the compliment.
346
00:18:33,940 --> 00:18:38,490
So basically, all of the vertex
that weren't in the independent set,
347
00:18:38,490 --> 00:18:40,860
consider those to be
in the vertex cover.
348
00:18:40,860 --> 00:18:44,860
And these have the property
that every single vertex
349
00:18:44,860 --> 00:18:47,550
is related to every other edge.
350
00:18:47,550 --> 00:18:50,860
I don't know if I'm
explaining that correctly.
351
00:18:50,860 --> 00:18:55,860
Every vertex in the vertex cover
is touching every single edge.
352
00:18:55,860 --> 00:18:59,510
And in this case, you
can hopefully see that.
353
00:18:59,510 --> 00:19:02,520
And so we just created,
out of nothingness,
354
00:19:02,520 --> 00:19:05,410
a solution to vertex cover from
a solution to independent set.
355
00:19:05,410 --> 00:19:08,710
And we can even take it one
step further with clique.
356
00:19:08,710 --> 00:19:12,810
The way you do this is you consider
the points an independent set,
357
00:19:12,810 --> 00:19:15,160
or the graph of independent
set, and then take
358
00:19:15,160 --> 00:19:18,480
the complement of the edges,
which means that every edge that
359
00:19:18,480 --> 00:19:20,550
doesn't exist already, you fill it in.
360
00:19:20,550 --> 00:19:23,650
And every edge that did
exist, you get rid of it.
361
00:19:23,650 --> 00:19:26,200
And what you're left with is this graph.
362
00:19:26,200 --> 00:19:29,380
And all of the points that were
originally an independent set
363
00:19:29,380 --> 00:19:34,110
are now a clique-- in this case,
a 4 clique-- for this new graph.
364
00:19:34,110 --> 00:19:39,450
So it took a long time to create
a solution to independent set,
365
00:19:39,450 --> 00:19:40,710
an exponential amount of time.
366
00:19:40,710 --> 00:19:43,120
But now that we have that,
we're able to create solutions
367
00:19:43,120 --> 00:19:45,550
to other similar problems.
368
00:19:45,550 --> 00:19:49,020
And it turns out there's a wide
range of problems that are related,
369
00:19:49,020 --> 00:19:53,680
even problems not really seemingly
related to graphs at all.
370
00:19:53,680 --> 00:19:59,240
So this is what's called this
the satisfiability problem.
371
00:19:59,240 --> 00:20:05,110
Given a clause, or basically a logic
clause, where each of these variables--
372
00:20:05,110 --> 00:20:09,160
I don't you can see where
I'm pointing in the video,
373
00:20:09,160 --> 00:20:13,840
but these are all basically variables
that are either true or false.
374
00:20:13,840 --> 00:20:17,550
And so what the satisfiability
question is asking
375
00:20:17,550 --> 00:20:20,830
is whether or not this clause
has a truth assignment.
376
00:20:20,830 --> 00:20:26,980
That is, there exists an a and b,
such that this all becomes true.
377
00:20:26,980 --> 00:20:33,800
And for those who don't know, these
little downwards carrots are or's.
378
00:20:33,800 --> 00:20:35,260
And the upward carrots are ands.
379
00:20:35,260 --> 00:20:37,010
And the little-- I
don't what you call it.
380
00:20:37,010 --> 00:20:41,279
The little sideways dashes
are nots, so the opposite.
381
00:20:41,279 --> 00:20:42,320
If it's true, it's false.
382
00:20:42,320 --> 00:20:45,200
If it's false, it's true.
383
00:20:45,200 --> 00:20:48,170
So the question is does this
have a truth assignment?
384
00:20:48,170 --> 00:20:52,610
And maybe you can just eyeball it and
say, well, actually, if A is true,
385
00:20:52,610 --> 00:20:56,070
and B is false, and C is true,
then this statement can be true.
386
00:20:56,070 --> 00:20:58,290
And that indeed, that is the case.
387
00:20:58,290 --> 00:21:01,130
So how do we reduce-- how
do we take that solution
388
00:21:01,130 --> 00:21:05,270
and turn it into a solution to,
say, the vertex cover problem?
389
00:21:05,270 --> 00:21:09,320
In this case, all we have
to do is create this graph.
390
00:21:09,320 --> 00:21:12,295
And there is an algorithm
to create this graph.
391
00:21:12,295 --> 00:21:15,170
And what it does is you basically--
I don't know if you can see this,
392
00:21:15,170 --> 00:21:18,950
but every clause of
the original statement
393
00:21:18,950 --> 00:21:25,400
up here turns into 2 variables
at the bottom of this graph.
394
00:21:25,400 --> 00:21:30,380
So you have AB turning into AB
down here, not A, not B, not C
395
00:21:30,380 --> 00:21:32,520
turning into these 3 points down here.
396
00:21:32,520 --> 00:21:36,200
And then not ABC turning into not ABC.
397
00:21:36,200 --> 00:21:41,700
And then finally, you just take
every a and its complement up here.
398
00:21:41,700 --> 00:21:46,710
Now, let's say we have the
solution, A, not BECAUSE.
399
00:21:46,710 --> 00:21:53,720
Then basically, you want to color
these, A not BC on the top line.
400
00:21:53,720 --> 00:21:57,890
And then you want to connect
every single element to itself.
401
00:21:57,890 --> 00:22:03,680
So not B goes to not B. B goes to
B. A goes to A. Not C to not C.
402
00:22:03,680 --> 00:22:05,240
And that's that.
403
00:22:05,240 --> 00:22:09,618
The point is that basically--
oh, did I miss something?
404
00:22:09,618 --> 00:22:12,210
405
00:22:12,210 --> 00:22:17,570
So then you basically color these
based on-- there is a relationship.
406
00:22:17,570 --> 00:22:19,157
I'm actually blanking on it right now.
407
00:22:19,157 --> 00:22:20,990
But basically, there's
a way to color these.
408
00:22:20,990 --> 00:22:24,530
And basically when you
color the A, not BC,
409
00:22:24,530 --> 00:22:29,090
and then B, not A, not CBA on the
bottom, that creates a vertex cover.
410
00:22:29,090 --> 00:22:31,790
And so you've used a solution,
you've used an algorithm,
411
00:22:31,790 --> 00:22:35,060
that creates a solution
from the satisfiability
412
00:22:35,060 --> 00:22:39,200
into the vertex cover that
is actually very powerful.
413
00:22:39,200 --> 00:22:41,140
And that's what's called a reduction.
414
00:22:41,140 --> 00:22:44,120
I know that was a little bit involved.
415
00:22:44,120 --> 00:22:48,510
I'm still trying to figure
out how these are related.
416
00:22:48,510 --> 00:22:56,660
But basically, A, not BC--
yeah, so I think what it is,
417
00:22:56,660 --> 00:23:00,740
is if it's A is colored
here, then you color all the
418
00:23:00,740 --> 00:23:02,980
not A's in this bottom graph.
419
00:23:02,980 --> 00:23:05,190
And if not B is colored
here, then you colored
420
00:23:05,190 --> 00:23:06,570
all the B's in this bottom graph.
421
00:23:06,570 --> 00:23:10,600
And again, if you color C here,
then the not C is colored here.
422
00:23:10,600 --> 00:23:14,539
And I think using that algorithm, you
can create a solution to vertex cover.
423
00:23:14,539 --> 00:23:16,205
Why is this the solution a vertex cover?
424
00:23:16,205 --> 00:23:20,450
So this is one thing I didn't mention,
is that all of the colored things
425
00:23:20,450 --> 00:23:23,720
are covering every single edge.
426
00:23:23,720 --> 00:23:26,600
That is, every single blue
node here is connected
427
00:23:26,600 --> 00:23:28,670
to every single edge in this graph.
428
00:23:28,670 --> 00:23:31,340
Also, as we discussed
before, vertex cover
429
00:23:31,340 --> 00:23:35,180
is related to independent
set in the compliment.
430
00:23:35,180 --> 00:23:38,390
So if you consider all the
white nodes in this graph,
431
00:23:38,390 --> 00:23:43,190
they form a independent set where no
2 of the white nodes are touching.
432
00:23:43,190 --> 00:23:45,710
So now we've created solutions
to two of these problems
433
00:23:45,710 --> 00:23:48,140
from a seemingly completely
unrelated problem,
434
00:23:48,140 --> 00:23:51,240
which is the satisfiability problem.
435
00:23:51,240 --> 00:23:56,090
And that's actually really powerful when
considering these type of NP problems.
436
00:23:56,090 --> 00:23:58,089
Sorry, that was a little bit wordy.
437
00:23:58,089 --> 00:23:59,880
Hopefully, you got
something out of that's.
438
00:23:59,880 --> 00:24:03,620
Let's go on to more fun things
like the Turing machine.
439
00:24:03,620 --> 00:24:05,660
Way more fun.
440
00:24:05,660 --> 00:24:11,010
Of course, this is due to Alan Turing,
the man who needs no introduction.
441
00:24:11,010 --> 00:24:13,070
But basically, this was
his theoretical model
442
00:24:13,070 --> 00:24:17,000
of a computer in its simplest
form, which is basically
443
00:24:17,000 --> 00:24:18,900
just something that runs an algorithm.
444
00:24:18,900 --> 00:24:22,580
I think this is sort of--
this image helps to see that.
445
00:24:22,580 --> 00:24:28,250
It's a machine that runs an algorithm
on an infinite tape of 1s and 0s.
446
00:24:28,250 --> 00:24:32,990
And that is a computer
at its simplest form.
447
00:24:32,990 --> 00:24:36,170
And Alan Turing saying that that
mathematical functions on numbers
448
00:24:36,170 --> 00:24:40,040
can be just as well computed by a
Turing machine, that was basically what
449
00:24:40,040 --> 00:24:44,940
was concluded in what's called
the Church-Turing thesis.
450
00:24:44,940 --> 00:24:48,650
And I like to think of this
as basically a reduction
451
00:24:48,650 --> 00:24:51,590
of a computer to its simplest
abilities, the ability
452
00:24:51,590 --> 00:24:54,000
to read from memory, the
ability to write from memory.
453
00:24:54,000 --> 00:24:55,291
And that's really all you need.
454
00:24:55,291 --> 00:24:59,930
You could interpret this Turing
machine of just 1s and 0s
455
00:24:59,930 --> 00:25:02,270
as like blocks of 8 bits.
456
00:25:02,270 --> 00:25:05,270
And then you could have bytes, say.
457
00:25:05,270 --> 00:25:08,330
And then you can work in letters,
as opposed to just 1s and 0s.
458
00:25:08,330 --> 00:25:11,480
And so I think reduction is just a
concept that comes up a lot in computer
459
00:25:11,480 --> 00:25:12,965
science, so it's good to know.
460
00:25:12,965 --> 00:25:16,460
And not just in the sense of
problems to problems, but also,
461
00:25:16,460 --> 00:25:19,424
for example, computers and
things that solve those problems.
462
00:25:19,424 --> 00:25:22,550
463
00:25:22,550 --> 00:25:25,974
It's worth noting, there are
also other theoretical ways
464
00:25:25,974 --> 00:25:27,140
of thinking about computers.
465
00:25:27,140 --> 00:25:32,107
Today, we have quantum computers,
which is just another theoretical model
466
00:25:32,107 --> 00:25:32,690
of a computer.
467
00:25:32,690 --> 00:25:36,830
And that has different ramifications
for the P versus NP problem.
468
00:25:36,830 --> 00:25:40,670
And the problem kind of
breaks down in that sense.
469
00:25:40,670 --> 00:25:43,840
But at least for Turing's
notion of computers,
470
00:25:43,840 --> 00:25:45,090
this is what we'll talk about.
471
00:25:45,090 --> 00:25:47,770
472
00:25:47,770 --> 00:25:51,720
Yeah, so what is the
resolution to this problem?
473
00:25:51,720 --> 00:25:52,870
Is P NP?
474
00:25:52,870 --> 00:25:54,790
Is P not NP?
475
00:25:54,790 --> 00:25:57,720
Are these problems-- is there
a problem that's not in NP
476
00:25:57,720 --> 00:26:00,950
but-- excuse me-- in NP
but definitely not in P?
477
00:26:00,950 --> 00:26:02,440
And the answer is pretty unclear.
478
00:26:02,440 --> 00:26:05,590
But there's a lot of
smart people, not like me,
479
00:26:05,590 --> 00:26:10,870
but smarter than me, who think that
there's definitely not a solution.
480
00:26:10,870 --> 00:26:13,840
Because people have spent a lot of
time thinking about this question.
481
00:26:13,840 --> 00:26:15,790
And for a lot of these
NP complete problems,
482
00:26:15,790 --> 00:26:17,800
they just can't find
efficient algorithms.
483
00:26:17,800 --> 00:26:20,300
I think this comic sums that up nicely.
484
00:26:20,300 --> 00:26:23,290
485
00:26:23,290 --> 00:26:28,120
So it's very much seems like P is not
NP, so I hate to break your all hearts.
486
00:26:28,120 --> 00:26:30,670
But a lot of people
think that P is not NP.
487
00:26:30,670 --> 00:26:33,790
There are some people who think P is NP.
488
00:26:33,790 --> 00:26:37,980
And then there are the people who have
different conceptions of the problem
489
00:26:37,980 --> 00:26:39,970
like, well, it's a badly worded problem.
490
00:26:39,970 --> 00:26:45,970
Any of the definitions of P and NP
are unrelated in their definitions.
491
00:26:45,970 --> 00:26:49,720
So there's a lot of conceptions,
but most computer scientists
492
00:26:49,720 --> 00:26:53,335
think it is a well-formed question
and that P is not equal to NP.
493
00:26:53,335 --> 00:26:56,800
494
00:26:56,800 --> 00:27:01,720
So how would one go about proving
this question of the relatedness
495
00:27:01,720 --> 00:27:05,789
of algorithms solving problems.
496
00:27:05,789 --> 00:27:08,830
And it turns out, it's just not easy
because the way we define these sets
497
00:27:08,830 --> 00:27:11,450
is by the existence
of algorithms, right?
498
00:27:11,450 --> 00:27:16,600
So to prove P is NP actually is
"easy" because all you have to do
499
00:27:16,600 --> 00:27:22,210
is find one polynomial time
algorithm for a NP complete problem.
500
00:27:22,210 --> 00:27:25,420
And then immediately, the
whole class collapses into P.
501
00:27:25,420 --> 00:27:28,090
That is, because if you
had a polynomial time
502
00:27:28,090 --> 00:27:30,070
algorithm to solve an
NP complete problem,
503
00:27:30,070 --> 00:27:32,560
you could solve all of the
other NP complete problems
504
00:27:32,560 --> 00:27:38,650
because we told you that they're all
related by some core completeness.
505
00:27:38,650 --> 00:27:42,010
And then to prove P is not
equal to NP is much harder,
506
00:27:42,010 --> 00:27:47,920
in the sense that you have to prove
that an algorithm doesn't exist.
507
00:27:47,920 --> 00:27:51,010
And that is actually kind of difficult
to solve because in the past,
508
00:27:51,010 --> 00:27:53,110
we've just invented new
algorithms, and it just
509
00:27:53,110 --> 00:27:56,740
took enough time before someone was
like, oh, well we can do it this way.
510
00:27:56,740 --> 00:28:02,140
And we don't have proof that an
algorithm doesn't exist in P.
511
00:28:02,140 --> 00:28:06,460
It's only until we just find an
algorithm in P that we can be sure that
512
00:28:06,460 --> 00:28:11,440
that wasn't-- excuse me--
was in NP and not in P.
513
00:28:11,440 --> 00:28:15,920
So that is inherently not an easy task.
514
00:28:15,920 --> 00:28:19,360
And I think that's why that makes
this problem so fascinating and so
515
00:28:19,360 --> 00:28:24,130
frustrating for a lot of smart people.
516
00:28:24,130 --> 00:28:27,820
I think one of the most interesting
ways to think about this problem
517
00:28:27,820 --> 00:28:30,990
is in terms of what if P did equal NP?
518
00:28:30,990 --> 00:28:35,500
And I think this is like
a counter-- or excuse me,
519
00:28:35,500 --> 00:28:39,790
like the-- I don't know what the
word is-- contradictive argument,
520
00:28:39,790 --> 00:28:42,610
where you say, well,
what if P did equal NP?
521
00:28:42,610 --> 00:28:47,320
And if P did equal NP, then the
world would be a lot different.
522
00:28:47,320 --> 00:28:51,280
If P was NP, then every problem
that would be easy to check
523
00:28:51,280 --> 00:28:53,080
would also be easy to solve.
524
00:28:53,080 --> 00:28:55,090
What does that mean?
525
00:28:55,090 --> 00:28:56,800
Actually, I don't know.
526
00:28:56,800 --> 00:29:01,300
I didn't realize this when I first
started to research this problem,
527
00:29:01,300 --> 00:29:06,170
but RSA encryption, which
is like basically the basis
528
00:29:06,170 --> 00:29:08,420
of a lot of our modern
encryption, would be
529
00:29:08,420 --> 00:29:11,230
very easy to crack because
it's based on factoring.
530
00:29:11,230 --> 00:29:13,540
It's based on factoring
large, large numbers.
531
00:29:13,540 --> 00:29:15,760
And if those numbers
were easy to factor,
532
00:29:15,760 --> 00:29:21,790
they would be very easy
to basically decrypt
533
00:29:21,790 --> 00:29:25,700
a lot of messages that are passed
online and things like that.
534
00:29:25,700 --> 00:29:29,420
But of course, we're relying on
the fact that P is not equal to NP,
535
00:29:29,420 --> 00:29:34,480
and that there is no algorithm
that exists currently to do that.
536
00:29:34,480 --> 00:29:37,570
Another ramification via
artificial intelligence systems
537
00:29:37,570 --> 00:29:42,050
would be making huge leaps almost
overnight because a proof to P
538
00:29:42,050 --> 00:29:46,210
equals NP would also tell us
something fundamental about solving
539
00:29:46,210 --> 00:29:50,690
certain problems that AI people solve.
540
00:29:50,690 --> 00:29:52,870
I don't want to go into
details too much there.
541
00:29:52,870 --> 00:29:56,440
But another ramification,
for example, would
542
00:29:56,440 --> 00:29:59,140
be that the economy would be
perfectly efficient, which
543
00:29:59,140 --> 00:30:06,760
means that if there were any arbitrage
between two different-- well,
544
00:30:06,760 --> 00:30:10,670
arbitrage works by taking
advantage of the different price
545
00:30:10,670 --> 00:30:13,390
discrepancies in a graph.
546
00:30:13,390 --> 00:30:18,710
And if those are easy to detect-- that
is, if there were a polynomial time
547
00:30:18,710 --> 00:30:22,120
algorithm to detect them,
then it would immediately
548
00:30:22,120 --> 00:30:25,480
be easy to find these
arbitrage opportunities, which
549
00:30:25,480 --> 00:30:29,350
is sort of a very bad thing.
550
00:30:29,350 --> 00:30:32,860
And then, one other
thing would be-- a lot
551
00:30:32,860 --> 00:30:36,790
of people say that proving
things, or mathematical proofs,
552
00:30:36,790 --> 00:30:39,730
is such an NP problem
because, well, it's
553
00:30:39,730 --> 00:30:41,380
easy to check if a proof is correct.
554
00:30:41,380 --> 00:30:42,710
You just follow the logic.
555
00:30:42,710 --> 00:30:46,960
It's much harder and it's a creative
task to actually create a proof.
556
00:30:46,960 --> 00:30:50,650
And so we might be
able to write programs
557
00:30:50,650 --> 00:30:54,086
that make mathematical
proofs if P did equal
558
00:30:54,086 --> 00:31:00,610
NP, which would just like be
crazy, and that would just
559
00:31:00,610 --> 00:31:03,190
make the whole world explode.
560
00:31:03,190 --> 00:31:06,040
The world would be fundamentally
different than the way we currently
561
00:31:06,040 --> 00:31:08,210
think it is.
562
00:31:08,210 --> 00:31:11,690
And a little more philosophy here,
which is one of the fun things
563
00:31:11,690 --> 00:31:16,510
I like to think about, is proving
things-- well we talked about it.
564
00:31:16,510 --> 00:31:20,770
It is sort of like an NP problem
because you can just follow the logic,
565
00:31:20,770 --> 00:31:22,750
and it's easy to check.
566
00:31:22,750 --> 00:31:24,890
Is comedy an NP problem?
567
00:31:24,890 --> 00:31:28,690
If I tell a joke and you laugh
at it, then it's really easy
568
00:31:28,690 --> 00:31:30,710
to check that that was a funny joke.
569
00:31:30,710 --> 00:31:33,440
Much harder task to write a funny joke.
570
00:31:33,440 --> 00:31:36,440
I know, because I do stand
up, and enough people
571
00:31:36,440 --> 00:31:38,890
have sat in silence at my jokes.
572
00:31:38,890 --> 00:31:41,690
I know it's a very hard thing to do.
573
00:31:41,690 --> 00:31:43,510
Music.
574
00:31:43,510 --> 00:31:44,690
Appreciation of music.
575
00:31:44,690 --> 00:31:45,790
Is it easy thing to check?
576
00:31:45,790 --> 00:31:46,390
Well, yeah.
577
00:31:46,390 --> 00:31:49,060
If you just listen to something,
you can kind of intuitively
578
00:31:49,060 --> 00:31:51,130
tell that that is a nice sound.
579
00:31:51,130 --> 00:31:56,380
But it is actually much different
much more difficult to create music,
580
00:31:56,380 --> 00:31:58,780
to create good music.
581
00:31:58,780 --> 00:32:01,840
These are all sort of
philosophical arguments.
582
00:32:01,840 --> 00:32:06,190
These aren't really-- the actual
definitions of P versus NP
583
00:32:06,190 --> 00:32:11,260
that we're exploring have a much more
well-defined relationship to Turing
584
00:32:11,260 --> 00:32:13,460
machines and what are called languages.
585
00:32:13,460 --> 00:32:15,940
But when you think very high
level, the ramifications
586
00:32:15,940 --> 00:32:21,640
are endless for any sort
of creative pursuit.
587
00:32:21,640 --> 00:32:25,046
And Scott Aaronson of MIT once
said that, "If P equaled NP,
588
00:32:25,046 --> 00:32:27,920
then the world would be a profoundly
different place than we normally
589
00:32:27,920 --> 00:32:28,586
assume it to be.
590
00:32:28,586 --> 00:32:30,880
There would be no special
value in creative leaps.
591
00:32:30,880 --> 00:32:33,610
There would be no fundamental
gap between solving a problem
592
00:32:33,610 --> 00:32:35,890
and recognizing the
solution once it's found.
593
00:32:35,890 --> 00:32:38,380
Everyone who can appreciate
a symphony would be Mozart,
594
00:32:38,380 --> 00:32:41,470
and everyone who could follow a
step-by-step argument would be Gauss."
595
00:32:41,470 --> 00:32:45,390
And I think that tells you something
about the crazy ramifications
596
00:32:45,390 --> 00:32:48,625
of a proof that P did equal to NP.
597
00:32:48,625 --> 00:32:52,780
598
00:32:52,780 --> 00:32:53,989
Yeah, so why should you care?
599
00:32:53,989 --> 00:32:56,738
Hopefully, you're sitting here and
being like, well, that's great,
600
00:32:56,738 --> 00:32:58,240
but I have a final product to do.
601
00:32:58,240 --> 00:33:00,210
And this doesn't really help at all.
602
00:33:00,210 --> 00:33:04,750
I don't think many of you are going to
be doing theoretical proofs or anything
603
00:33:04,750 --> 00:33:06,670
like that or talking about theory.
604
00:33:06,670 --> 00:33:08,120
Although, I think you should.
605
00:33:08,120 --> 00:33:12,130
We talk in CS50 about running
time of different algorithms.
606
00:33:12,130 --> 00:33:16,960
And that is very relevant because
there could be a faster way
607
00:33:16,960 --> 00:33:18,920
to do the problems that
you're thinking about.
608
00:33:18,920 --> 00:33:20,429
You just don't know yet.
609
00:33:20,429 --> 00:33:22,720
And that's why theoretical
computer scientists research
610
00:33:22,720 --> 00:33:26,170
the fastest way to do certain things.
611
00:33:26,170 --> 00:33:29,669
And that's why reductions have
shown us-- reductions from NP
612
00:33:29,669 --> 00:33:32,210
complete problems have shown us
that like if a problem exists
613
00:33:32,210 --> 00:33:34,690
of this problem, a problem
that you're trying to solve
614
00:33:34,690 --> 00:33:36,610
might actually have already been solved.
615
00:33:36,610 --> 00:33:41,690
Or another other thing is that, well,
if you assume P doesn't equal to NP,
616
00:33:41,690 --> 00:33:46,840
something you're trying to do
might not be that easy to do.
617
00:33:46,840 --> 00:33:50,380
And if that wasn't motivation
again for why you should care,
618
00:33:50,380 --> 00:33:52,420
well, it's a million
dollar question, right?
619
00:33:52,420 --> 00:33:58,810
And there's nothing that computer
scientists like more than computers
620
00:33:58,810 --> 00:34:00,130
except for money, right?
621
00:34:00,130 --> 00:34:02,440
So I think that should
be motivation enough
622
00:34:02,440 --> 00:34:06,140
for you to be interested
in the P versus NP problem.
623
00:34:06,140 --> 00:34:09,020
It is something that is
really interesting to me.
624
00:34:09,020 --> 00:34:13,027
I know I took you through the weeds a
little bit with the actual reductions,
625
00:34:13,027 --> 00:34:14,360
but I hope that was interesting.
626
00:34:14,360 --> 00:34:16,270
And that's really what
computer scientists do
627
00:34:16,270 --> 00:34:21,110
when they think about these types of
problems, is reduce problems to another
628
00:34:21,110 --> 00:34:25,690
and sort of relate these algorithms.
629
00:34:25,690 --> 00:34:31,026
And so I think that's about all I have.
630
00:34:31,026 --> 00:34:31,900
But I hope you enjoy.
631
00:34:31,900 --> 00:34:32,733
Here are some links.
632
00:34:32,733 --> 00:34:37,449
Just the P versus NP page is a
collection of the status and history
633
00:34:37,449 --> 00:34:38,650
of the problem.
634
00:34:38,650 --> 00:34:40,340
And this is just a video I really like.
635
00:34:40,340 --> 00:34:43,989
It explains all these things very well.
636
00:34:43,989 --> 00:34:45,219
That's all I have.
637
00:34:45,219 --> 00:34:49,210
I hope that was enjoyable for you.
638
00:34:49,210 --> 00:34:55,030
And for everyone out there
enjoying it on YouTube, I guess,
639
00:34:55,030 --> 00:34:57,380
I hope you could take
something away from this.
640
00:34:57,380 --> 00:35:02,090
And yeah, P versus NP.
641
00:35:02,090 --> 00:35:03,910
It's pretty cool.
642
00:35:03,910 --> 00:35:06,141