1 00:00:00,000 --> 00:00:02,097 2 00:00:02,097 --> 00:00:03,930 SAMMY: Well, thanks for watching my seminar. 3 00:00:03,930 --> 00:00:05,930 My name is Sammy Mara. 4 00:00:05,930 --> 00:00:09,860 I am a senior studying computer science here at Harvard. 5 00:00:09,860 --> 00:00:12,180 And P versus NP is what I'm going to be talking about. 6 00:00:12,180 --> 00:00:15,450 It is the greatest unsolved problem in computer science, 7 00:00:15,450 --> 00:00:19,170 and maybe all of math, I think. 8 00:00:19,170 --> 00:00:21,520 And it's something that has fascinated me for a while. 9 00:00:21,520 --> 00:00:24,180 So I think it's a natural extension of the CS50, 10 00:00:24,180 --> 00:00:26,840 and I hope you learn something from it. 11 00:00:26,840 --> 00:00:29,340 We'll go into some of the details, as well as 12 00:00:29,340 --> 00:00:33,630 the big picture of what is P versus NP. 13 00:00:33,630 --> 00:00:34,490 And yeah. 14 00:00:34,490 --> 00:00:37,440 So let's get started here. 15 00:00:37,440 --> 00:00:41,100 This is just to show that P versus NP is sort of like a relevant problem, 16 00:00:41,100 --> 00:00:42,750 and it shows up in pop culture. 17 00:00:42,750 --> 00:00:46,470 This is Homer Simpson in some matrix void thing, 18 00:00:46,470 --> 00:00:50,340 and you can see P equals NP floating in the background there. 19 00:00:50,340 --> 00:00:53,970 And then in Futurama-- I guess that's also a Matt Groening production-- 20 00:00:53,970 --> 00:00:58,640 P and NP in separate books, sort of implying that P is not equal to NP. 21 00:00:58,640 --> 00:01:02,880 22 00:01:02,880 --> 00:01:05,610 Really, the question is whether or not P is equal to NP, 23 00:01:05,610 --> 00:01:09,280 and we'll get into what that means in just a second. 24 00:01:09,280 --> 00:01:12,520 But if you're watching this and you thinking hey, 25 00:01:12,520 --> 00:01:16,740 I want to solve P versus NP for my final project, got some bad news. 26 00:01:16,740 --> 00:01:20,550 It's probably not the best idea. 27 00:01:20,550 --> 00:01:25,080 Don't want to discourage you, but a lot of really, really famous, smart people, 28 00:01:25,080 --> 00:01:28,890 much smarter than me, have tried to solve this problem for a long time. 29 00:01:28,890 --> 00:01:33,409 And its applications are very far-reaching, 30 00:01:33,409 --> 00:01:35,700 and I think we can learn a lot from it, but the problem 31 00:01:35,700 --> 00:01:37,237 may not be solved for a long time. 32 00:01:37,237 --> 00:01:39,750 33 00:01:39,750 --> 00:01:45,450 And yeah, why should we care about the theory of computer science? 34 00:01:45,450 --> 00:01:49,670 And in CS50, you learn a lot about the running time of algorithms. 35 00:01:49,670 --> 00:01:52,260 And we talk a lot about how algorithms work 36 00:01:52,260 --> 00:01:55,920 and what are the best ways to solve certain problems. 37 00:01:55,920 --> 00:01:58,980 But theory of computer science sort of goes 38 00:01:58,980 --> 00:02:01,210 into how can we extrapolate from there? 39 00:02:01,210 --> 00:02:05,970 And what are the fundamental things about algorithms that make them 40 00:02:05,970 --> 00:02:09,960 similar to one another and different from one another, as well as 41 00:02:09,960 --> 00:02:13,450 how to model the computer in its simplest form? 42 00:02:13,450 --> 00:02:17,070 That's something that Alan Turing did, way back when, 43 00:02:17,070 --> 00:02:20,119 and he became famous for that. 44 00:02:20,119 --> 00:02:22,410 Everyone is-- I mean, not everyone, but a lot of people 45 00:02:22,410 --> 00:02:24,035 are familiar with like Turing machines. 46 00:02:24,035 --> 00:02:28,180 That's sort of his theoretical model of a computer. 47 00:02:28,180 --> 00:02:28,980 And yeah. 48 00:02:28,980 --> 00:02:32,070 It helps us solve problems and use solutions from one problem 49 00:02:32,070 --> 00:02:36,840 to solve other problems, so it's pretty profound. 50 00:02:36,840 --> 00:02:39,900 And yeah. 51 00:02:39,900 --> 00:02:44,002 So I have a little more motivation in answering this question of P versus NP. 52 00:02:44,002 --> 00:02:45,960 And I still haven't really told you what it is. 53 00:02:45,960 --> 00:02:49,380 But solving this problem would essentially 54 00:02:49,380 --> 00:02:53,280 help us with questions like are algorithms invented or discovered? 55 00:02:53,280 --> 00:02:56,191 Is there some fast algorithm for solving Sudoku 56 00:02:56,191 --> 00:02:57,690 that we just haven't discovered yet? 57 00:02:57,690 --> 00:03:00,780 Or did it just exist in the ether and we have discovered it yet? 58 00:03:00,780 --> 00:03:05,370 Or are we just like too dumb to discover it, but like definitely exists? 59 00:03:05,370 --> 00:03:08,360 Or does it just not exist at all? 60 00:03:08,360 --> 00:03:10,980 And yeah, this question is-- if that wasn't motivation enough, 61 00:03:10,980 --> 00:03:12,990 it's a million dollar question in the sense 62 00:03:12,990 --> 00:03:16,560 that the Clay Institute-- It's one of Clay Institute's Millennium Prize 63 00:03:16,560 --> 00:03:21,600 problems, which means there is $1 million for whoever person 64 00:03:21,600 --> 00:03:25,600 can solve it and prove it, one way or the other. 65 00:03:25,600 --> 00:03:30,780 Because what do computer scientists love more than computers? 66 00:03:30,780 --> 00:03:33,380 Yeah, money. 67 00:03:33,380 --> 00:03:37,140 And I'm hoping that a CS50 student will be the one 68 00:03:37,140 --> 00:03:39,180 to solve this problem one day. 69 00:03:39,180 --> 00:03:41,860 So I hope you can-- 70 00:03:41,860 --> 00:03:44,430 So what is the question of P versus NP? 71 00:03:44,430 --> 00:03:50,430 And it's really, are problem that are easy to check also easy to solve? 72 00:03:50,430 --> 00:03:55,200 And that's sort of a very, very, very high-level thought-- way 73 00:03:55,200 --> 00:03:55,980 to think about it. 74 00:03:55,980 --> 00:03:58,639 But what is this word easy mean in this context? 75 00:03:58,639 --> 00:04:00,180 What does it mean to check a problem? 76 00:04:00,180 --> 00:04:02,160 What does it mean to solve a problem? 77 00:04:02,160 --> 00:04:04,795 Hoping to go into all of that in just a minute. 78 00:04:04,795 --> 00:04:06,336 And hopefully that will become clear. 79 00:04:06,336 --> 00:04:09,010 80 00:04:09,010 --> 00:04:13,410 So yeah, this problem in particular has a long history. 81 00:04:13,410 --> 00:04:16,260 Back in the mid 1960s, when computers were still 82 00:04:16,260 --> 00:04:19,410 the size of cabinets and small houses, people 83 00:04:19,410 --> 00:04:22,980 were still trying to find algorithms to solve basically all the world's 84 00:04:22,980 --> 00:04:27,060 problems, like factoring and the Traveling Salesman Problem, 85 00:04:27,060 --> 00:04:30,860 and determining if a number is prime, and problems like these. 86 00:04:30,860 --> 00:04:33,510 But some problems they had slow algorithms for, 87 00:04:33,510 --> 00:04:36,450 they were able to solve them, but it just took a long time. 88 00:04:36,450 --> 00:04:39,690 And for some algorithms, like find the greatest common divisor and sorting, 89 00:04:39,690 --> 00:04:42,240 we have fast algorithms like merge sort. 90 00:04:42,240 --> 00:04:45,660 That's, worst case, running a prime of, I think, n log n. 91 00:04:45,660 --> 00:04:46,220 I'm a CI. 92 00:04:46,220 --> 00:04:48,360 I should know that. 93 00:04:48,360 --> 00:04:50,340 So things like that just have fast algorithms. 94 00:04:50,340 --> 00:04:55,080 But for some problems, we just have no no fast solution. 95 00:04:55,080 --> 00:04:58,440 And then in 1965, people came up with a solution 96 00:04:58,440 --> 00:05:02,040 to the discrete Fourier transform, and suddenly, that 97 00:05:02,040 --> 00:05:04,500 was in the fast category. 98 00:05:04,500 --> 00:05:07,590 And for things like determining if a number is prime, 99 00:05:07,590 --> 00:05:13,560 that took until 2002 for someone to find an algorithm to do that, 100 00:05:13,560 --> 00:05:15,540 until it could be in this fast category. 101 00:05:15,540 --> 00:05:17,940 And I'll explain what a fast algorithm is versus 102 00:05:17,940 --> 00:05:20,889 a slow algorithm in just a second. 103 00:05:20,889 --> 00:05:23,180 But for a lot of these algorithms, there were just no-- 104 00:05:23,180 --> 00:05:25,805 for a lot of these problems, there were just no such algorithms 105 00:05:25,805 --> 00:05:28,800 that were able to be solved quickly. 106 00:05:28,800 --> 00:05:34,590 And it didn't seem like we would ever find a fast solution to these problem. 107 00:05:34,590 --> 00:05:39,120 And so that is really the question, is will we ever find fast solutions 108 00:05:39,120 --> 00:05:41,640 to these problems? 109 00:05:41,640 --> 00:05:44,670 And actually, the answer is very unclear. 110 00:05:44,670 --> 00:05:48,390 Although, I think one of the most important moments for this problem 111 00:05:48,390 --> 00:05:56,070 was a theorem that Steven Cook of MIT and Leonid Levin came up with. 112 00:05:56,070 --> 00:05:57,960 And it basically said that a lot of problems 113 00:05:57,960 --> 00:06:00,390 that we face in computer science are all connected 114 00:06:00,390 --> 00:06:02,460 by some sort of core difficulty. 115 00:06:02,460 --> 00:06:04,700 And what that means is that if you could solve-- 116 00:06:04,700 --> 00:06:07,450 if you could have a fast solution to one algorithm or one problem, 117 00:06:07,450 --> 00:06:10,020 you could actually use the same algorithm 118 00:06:10,020 --> 00:06:13,170 to solve very different, seemingly different problem. 119 00:06:13,170 --> 00:06:17,490 And that was a good way to categorize these problems, 120 00:06:17,490 --> 00:06:22,320 and we learned a lot from that about complexity and general complexity 121 00:06:22,320 --> 00:06:27,280 theory being the fields of categorizing these problems. 122 00:06:27,280 --> 00:06:33,240 So I think that was a really important moment. 123 00:06:33,240 --> 00:06:33,740 Yeah. 124 00:06:33,740 --> 00:06:40,320 So let's go into a little more detail here about what P is, what is NP, 125 00:06:40,320 --> 00:06:43,870 why are they fighting, and so forth. 126 00:06:43,870 --> 00:06:49,310 But at the very base of it, P and NP are sets of problem. 127 00:06:49,310 --> 00:06:52,620 Normally in math, we think about sets of numbers or sets of elements 128 00:06:52,620 --> 00:06:54,020 or sets of words or whatever. 129 00:06:54,020 --> 00:06:56,460 But now we're talking about sets of problems. 130 00:06:56,460 --> 00:06:59,010 And the way these problems are defined is whether or not 131 00:06:59,010 --> 00:07:01,230 they have these fast algorithms. 132 00:07:01,230 --> 00:07:06,960 So some problems, like the problem in P have known fast algorithms. 133 00:07:06,960 --> 00:07:14,880 And some problems that are in NP are at least able to be checked quickly. 134 00:07:14,880 --> 00:07:18,330 And definitely, if a problem is P, it is in NP 135 00:07:18,330 --> 00:07:21,360 because if you can solve a problem very quickly, 136 00:07:21,360 --> 00:07:25,110 it's also easy to check it because you just solve it again. 137 00:07:25,110 --> 00:07:30,520 And that will all become clear, hopefully, in the next few slides. 138 00:07:30,520 --> 00:07:34,710 But yeah, so let's talk first a little more in-depth about what is P? 139 00:07:34,710 --> 00:07:40,470 The set polynomial time, which is basically the set of problems 140 00:07:40,470 --> 00:07:44,010 where there exists a polynomial time algorithm that generates a solution. 141 00:07:44,010 --> 00:07:46,290 That might be a little bit confusing at first. 142 00:07:46,290 --> 00:07:50,850 But in CS50, hopefully, you learned a little bit about big O notation 143 00:07:50,850 --> 00:07:53,940 or worst case algorithm runtime. 144 00:07:53,940 --> 00:07:57,270 And what this means is that these are the types of problem that can be 145 00:07:57,270 --> 00:07:59,670 solved in a polynomial amount of time. 146 00:07:59,670 --> 00:08:04,260 That could be like 3n to the 5th plus 5, where 147 00:08:04,260 --> 00:08:10,140 n is the size of the problem by some definition. 148 00:08:10,140 --> 00:08:12,990 So these problems include finding the GCD of two numbers. 149 00:08:12,990 --> 00:08:15,390 We have a fast algorithm for that. 150 00:08:15,390 --> 00:08:19,170 It's been known for lots of-- many, many years due to Euclid. 151 00:08:19,170 --> 00:08:20,420 Excuse me. 152 00:08:20,420 --> 00:08:24,270 And you can see this is the Euclidean algorithm right there. 153 00:08:24,270 --> 00:08:27,210 And it just is very deterministic. 154 00:08:27,210 --> 00:08:29,850 You just put in input, and you can slowly 155 00:08:29,850 --> 00:08:33,330 boil down the problem into a solution. 156 00:08:33,330 --> 00:08:36,274 Linear programming is basically like a set of equations. 157 00:08:36,274 --> 00:08:38,190 Let's not worry about that too much right now, 158 00:08:38,190 --> 00:08:42,062 but that's another problem that is well known to be in P. 159 00:08:42,062 --> 00:08:45,270 And determining if a number is prime, that actually-- as I mentioned earlier, 160 00:08:45,270 --> 00:08:47,790 that result wasn't found until 2002. 161 00:08:47,790 --> 00:08:52,980 But that is also in the P family of problems. 162 00:08:52,980 --> 00:08:56,610 And multiplication, as you probably all know, the algorithm 163 00:08:56,610 --> 00:08:58,166 for multiplying two numbers. 164 00:08:58,166 --> 00:08:59,790 You just sort of go through one by one. 165 00:08:59,790 --> 00:09:03,550 And that's, at least-- if you multiply every digit by every other digit, 166 00:09:03,550 --> 00:09:05,070 that's sort of like n squared. 167 00:09:05,070 --> 00:09:06,111 And then you add them up. 168 00:09:06,111 --> 00:09:07,620 So it's at most n cubed. 169 00:09:07,620 --> 00:09:11,010 But crucially, these albums-- excuse me-- algorithms 170 00:09:11,010 --> 00:09:14,820 aren't like 2 to the n or something exponential. 171 00:09:14,820 --> 00:09:22,380 So that's what makes them in this category of P. Yes. 172 00:09:22,380 --> 00:09:28,530 And yeah, so let's move on a little bit to the set NP, which 173 00:09:28,530 --> 00:09:31,920 stands for nondeterministic polynomial-time time, which 174 00:09:31,920 --> 00:09:36,570 is sort of like a way of saying, well, they have polynomial algorithms 175 00:09:36,570 --> 00:09:38,970 to solve them, but they're nondeterministic, 176 00:09:38,970 --> 00:09:42,300 which means you just sort of have to guess or use 177 00:09:42,300 --> 00:09:44,910 an infinite number of computers before you can solve them 178 00:09:44,910 --> 00:09:46,710 in a polynomial amount of time. 179 00:09:46,710 --> 00:09:50,070 But the way these are defined formally-- and before, 180 00:09:50,070 --> 00:09:53,190 we talked about problems that can be solved quickly. 181 00:09:53,190 --> 00:09:56,370 Now, we're talking about problems that can be checked quickly. 182 00:09:56,370 --> 00:09:59,609 And what that means is it's a set of decision problems-- 183 00:09:59,609 --> 00:10:02,400 and I'll explain what that means in just a second-- for which there 184 00:10:02,400 --> 00:10:06,480 exists a polynomial algorithm to check if a solution is correct. 185 00:10:06,480 --> 00:10:09,840 So before, we were saying I'm giving you a problem; tell me 186 00:10:09,840 --> 00:10:16,290 a solution and some definition of the solution. 187 00:10:16,290 --> 00:10:19,380 And now I'm saying here's a problem and a solution, 188 00:10:19,380 --> 00:10:20,940 and I want you to check the solution. 189 00:10:20,940 --> 00:10:25,890 So does that checking take a polynomial amount of time? 190 00:10:25,890 --> 00:10:30,000 And so these problems are-- I guess problems 191 00:10:30,000 --> 00:10:34,710 included in this set are like Sudoku because if I give you a Sudoku board 192 00:10:34,710 --> 00:10:36,959 and said this is solved, how fast would it 193 00:10:36,959 --> 00:10:38,500 be for you to check if it was solved? 194 00:10:38,500 --> 00:10:40,166 It actually wouldn't take you that long. 195 00:10:40,166 --> 00:10:44,550 You'd probably just go through every row, column, and nine 196 00:10:44,550 --> 00:10:47,700 by-- excuse me-- three by three square to check. 197 00:10:47,700 --> 00:10:50,040 And that's what n cubed. 198 00:10:50,040 --> 00:10:52,980 But crucially, it's not something like 9 to the n, which 199 00:10:52,980 --> 00:10:55,066 would be an exponential amount of time. 200 00:10:55,066 --> 00:10:57,800 201 00:10:57,800 --> 00:11:00,800 And in other problems like factoring, it's 202 00:11:00,800 --> 00:11:04,060 easy to check if a factor or a factorization is correct. 203 00:11:04,060 --> 00:11:05,990 You just multiply the numbers, and then you 204 00:11:05,990 --> 00:11:08,440 can see if that factorization is correct. 205 00:11:08,440 --> 00:11:13,370 A much harder problem to factor the numbers, by the way. 206 00:11:13,370 --> 00:11:18,970 And Traveling Sales problem's an example of famous NP problem 207 00:11:18,970 --> 00:11:22,217 where you're given a graph with vertices and edges. 208 00:11:22,217 --> 00:11:24,800 And we'll get into a little more about graphs in just a second 209 00:11:24,800 --> 00:11:28,720 because those turn out to be really relevant to NP problems 210 00:11:28,720 --> 00:11:31,130 and thinking about them. 211 00:11:31,130 --> 00:11:34,520 But again, multiplication-- I thought we just said that was a P problem. 212 00:11:34,520 --> 00:11:36,800 Well, if something's in P, it's also in NP 213 00:11:36,800 --> 00:11:39,860 because if I said, well, what's 51 times 3? 214 00:11:39,860 --> 00:11:43,130 If I said 51 times 3 is 153, well, how would you know? 215 00:11:43,130 --> 00:11:47,010 You'd probably just check it yourself and multiply yourself. 216 00:11:47,010 --> 00:11:49,490 And of course, that's, as we described, a P algorithm. 217 00:11:49,490 --> 00:11:53,570 So that has a P algorithm to check whether or not my solution is correct. 218 00:11:53,570 --> 00:11:55,570 That's how you know it's in NP. 219 00:11:55,570 --> 00:11:59,950 So we've talked about P. We've talked about NP. 220 00:11:59,950 --> 00:12:02,900 So now, we have a couple examples here, and maybe give you 221 00:12:02,900 --> 00:12:06,410 a better grasp of how to think about different problems. 222 00:12:06,410 --> 00:12:10,670 So sorting a list, is that in P or NP? 223 00:12:10,670 --> 00:12:14,930 Well, we talked about sorting and the running time of those in CS50. 224 00:12:14,930 --> 00:12:18,290 Hopefully, you know that that's, for a lot of sorts, 225 00:12:18,290 --> 00:12:23,300 that's n log n or n squared, I think, for a quick sort. 226 00:12:23,300 --> 00:12:25,940 But those all have polynomial time algorithms, 227 00:12:25,940 --> 00:12:29,630 so those would be in P, also in NP. 228 00:12:29,630 --> 00:12:32,220 Multiplication, we just described P algorithm, right? 229 00:12:32,220 --> 00:12:35,420 It's something like n cubed. 230 00:12:35,420 --> 00:12:38,600 And then given a set of vertices and edges, is the graph connected? 231 00:12:38,600 --> 00:12:41,290 Is that P or NP? 232 00:12:41,290 --> 00:12:43,670 Well, you just start at a point, and you traverse 233 00:12:43,670 --> 00:12:46,970 until you have seen all the edges. 234 00:12:46,970 --> 00:12:48,970 If you haven't seen-- excuse me, all the points. 235 00:12:48,970 --> 00:12:55,430 If you haven't seen all of the vertices, then you know it is not well connected. 236 00:12:55,430 --> 00:12:57,800 And another example would be a Rubik's cube. 237 00:12:57,800 --> 00:12:59,570 First of all, is that in NP? 238 00:12:59,570 --> 00:13:04,111 That is, is there a way to check if the Rubik's cube is solved? 239 00:13:04,111 --> 00:13:04,610 Well, yeah. 240 00:13:04,610 --> 00:13:05,580 It's pretty easy. 241 00:13:05,580 --> 00:13:08,150 You just check if all the faces are solved. 242 00:13:08,150 --> 00:13:09,200 But is it in P? 243 00:13:09,200 --> 00:13:13,070 Is there a polynomial time algorithm to actually solve the Rubik's cube? 244 00:13:13,070 --> 00:13:15,650 That is actually a little harder to think about, 245 00:13:15,650 --> 00:13:22,587 but it turns out it is actually-- computers can do that fairly fast. 246 00:13:22,587 --> 00:13:24,170 And what about the best move in chess? 247 00:13:24,170 --> 00:13:25,370 Is that in NP? 248 00:13:25,370 --> 00:13:28,640 Well, if I told you that this was the best move in chess, 249 00:13:28,640 --> 00:13:31,550 could you check that it's the best move in chess? 250 00:13:31,550 --> 00:13:32,570 Probably not. 251 00:13:32,570 --> 00:13:35,120 I think you'd have to go through a more calculation 252 00:13:35,120 --> 00:13:37,670 to determine if that was the best move. 253 00:13:37,670 --> 00:13:40,630 And that problem, actually, is not in NP or P. 254 00:13:40,630 --> 00:13:43,220 That's in its own class called exponential time, 255 00:13:43,220 --> 00:13:48,820 where it takes exponential time to even check if a solution is correct. 256 00:13:48,820 --> 00:13:51,620 And then there's problems like-- a classic NP 257 00:13:51,620 --> 00:13:56,090 problem is like the subset sum problem, which means given a set of integers, 258 00:13:56,090 --> 00:13:59,900 say, I'm asking do they sum to negative-- is there 259 00:13:59,900 --> 00:14:02,130 a subset that sums to negative 1? 260 00:14:02,130 --> 00:14:05,980 And in this case, you could just look at it. 261 00:14:05,980 --> 00:14:08,210 And if I gave you-- sorry, if I gave you a subset 262 00:14:08,210 --> 00:14:11,085 and then said these sub to minus 1, it would be really easy to check. 263 00:14:11,085 --> 00:14:14,530 You just sum them up, and they sum to minus 1. 264 00:14:14,530 --> 00:14:17,230 But solving that problem actually isn't very easy. 265 00:14:17,230 --> 00:14:19,430 And there's some sort of exponential checking 266 00:14:19,430 --> 00:14:24,950 that you have to do to determine if there is such a subset. 267 00:14:24,950 --> 00:14:28,400 And if you think about how many subsets there are, 2 to the n 268 00:14:28,400 --> 00:14:31,310 subsets, I believe, where there's n members of a set. 269 00:14:31,310 --> 00:14:35,900 So you have to check a lot, and that's a good indicator that it is not in P. 270 00:14:35,900 --> 00:14:41,290 But in this case, it is, as we showed, in NP because it's easy to check. 271 00:14:41,290 --> 00:14:45,050 272 00:14:45,050 --> 00:14:49,930 So I was talking to you before about NP completeness. 273 00:14:49,930 --> 00:14:53,210 And this is sort of one of the most important concepts 274 00:14:53,210 --> 00:14:57,289 to understand when talking about P versus NP problem. 275 00:14:57,289 --> 00:15:00,580 Basically, what this means-- and this is related to the Cook-Levin theorem that 276 00:15:00,580 --> 00:15:03,510 was talking about earlier. 277 00:15:03,510 --> 00:15:06,320 It essentially says that all of these problems-- and this 278 00:15:06,320 --> 00:15:08,610 is an example of a bunch of NP problems-- 279 00:15:08,610 --> 00:15:10,090 they're all sort of connected. 280 00:15:10,090 --> 00:15:13,610 Where, if you have a solution, say, to the graph coloring problem, 281 00:15:13,610 --> 00:15:17,540 you can use it to form a solution to the clique cover problem 282 00:15:17,540 --> 00:15:21,530 or to the hitting set problem or to the yada, yada-- you can basically 283 00:15:21,530 --> 00:15:23,600 form one solution into the other. 284 00:15:23,600 --> 00:15:29,220 And that's really powerful because if someone one day finds a polynomial time 285 00:15:29,220 --> 00:15:31,370 algorithm to do one of these problems, that 286 00:15:31,370 --> 00:15:34,940 means they can find a polynomial time algorithm to do all of these problems 287 00:15:34,940 --> 00:15:39,119 because they're all sort of related, and they can be converted to one another 288 00:15:39,119 --> 00:15:39,660 very quickly. 289 00:15:39,660 --> 00:15:42,190 290 00:15:42,190 --> 00:15:44,580 And I sort of glossed over detail there, which 291 00:15:44,580 --> 00:15:49,180 is how fast can these algorithms solutions be converted to one another? 292 00:15:49,180 --> 00:15:50,910 And that has to do with reduction. 293 00:15:50,910 --> 00:15:53,560 And I just want to talk about this for a second 294 00:15:53,560 --> 00:15:57,600 because this is one of the more core concepts that 295 00:15:57,600 --> 00:15:59,236 is involved when proving these things. 296 00:15:59,236 --> 00:16:01,110 And this is actually what computer scientists 297 00:16:01,110 --> 00:16:04,980 have to do when they talk about-- are these problems NP complete? 298 00:16:04,980 --> 00:16:08,440 Are these problems related by some core difficulty? 299 00:16:08,440 --> 00:16:10,260 And what it means is basically a reduction 300 00:16:10,260 --> 00:16:13,260 is a polynomial time algorithm that converts a solution from one problem 301 00:16:13,260 --> 00:16:15,190 to the solution of another problem. 302 00:16:15,190 --> 00:16:19,331 And I'm going to be talking a little bit about three particular computer science 303 00:16:19,331 --> 00:16:21,330 related problems, which are the independents set 304 00:16:21,330 --> 00:16:24,450 problem, the vertex cover problem, and the clique problem. 305 00:16:24,450 --> 00:16:29,040 Independent set problem has to do with-- basically, 306 00:16:29,040 --> 00:16:31,470 given a graph like the one I'm showing in the slide, 307 00:16:31,470 --> 00:16:37,630 are there are a set of k vertices, such that there are no 2 of those vertices 308 00:16:37,630 --> 00:16:40,290 are touching or related by an edge? 309 00:16:40,290 --> 00:16:43,690 The vertex cover problem means can you give me 310 00:16:43,690 --> 00:16:51,090 a minimum set of vertices that covers all the edges? 311 00:16:51,090 --> 00:16:53,420 That is, that all of the edges are-- excuse me. 312 00:16:53,420 --> 00:16:58,570 All of the vertices are touching every one of the edges in the graph. 313 00:16:58,570 --> 00:17:02,040 And then the clique problem-- I don't know if it's "click" or "kleek." 314 00:17:02,040 --> 00:17:04,079 Some people say "kleek." 315 00:17:04,079 --> 00:17:06,510 But it's related to the colloquial sense of the word 316 00:17:06,510 --> 00:17:09,569 clique, which is like a friend group. 317 00:17:09,569 --> 00:17:13,050 Is there a friend group, essentially, in this graph 318 00:17:13,050 --> 00:17:18,609 where every single person in the graph, every person represented by a vertex, 319 00:17:18,609 --> 00:17:21,700 is connected to every single other vertex in the clique? 320 00:17:21,700 --> 00:17:23,920 And in this case, you can see there is actually 321 00:17:23,920 --> 00:17:28,280 a 5 clique in the bottom left of these 5 vertexes 322 00:17:28,280 --> 00:17:30,500 are all connected to each other. 323 00:17:30,500 --> 00:17:32,920 And so that makes them what's called a 5 clique. 324 00:17:32,920 --> 00:17:35,711 So these are the three problems that I'm going to go into in detail 325 00:17:35,711 --> 00:17:36,640 just a second. 326 00:17:36,640 --> 00:17:40,670 And I'm going to show you that they're actually all sort of the same problem. 327 00:17:40,670 --> 00:17:44,330 And solving one can actually give you a solution to another. 328 00:17:44,330 --> 00:17:48,510 So let's take the example of this graph. 329 00:17:48,510 --> 00:17:52,540 Again, graphs generally is a set of vertices, excuse me, 330 00:17:52,540 --> 00:17:56,220 and edges connecting them. 331 00:17:56,220 --> 00:17:58,140 And this is an example of an independent set. 332 00:17:58,140 --> 00:17:59,280 So how do we get this? 333 00:17:59,280 --> 00:18:04,740 If I just said to you give me 4 of these points, such that no 2 of them 334 00:18:04,740 --> 00:18:10,781 are related by an edge, then this would be the solution. 335 00:18:10,781 --> 00:18:11,280 Great. 336 00:18:11,280 --> 00:18:12,580 So how long does that take? 337 00:18:12,580 --> 00:18:14,440 Well, if it were a big graph, it would have 338 00:18:14,440 --> 00:18:16,460 taken some exponential amount of time. 339 00:18:16,460 --> 00:18:19,180 There's no polynomial time algorithm to do that. 340 00:18:19,180 --> 00:18:21,940 But if we are able to get a solution, we can 341 00:18:21,940 --> 00:18:24,070 use it to solve some other problems. 342 00:18:24,070 --> 00:18:26,610 And in this case, we can use the independent set 343 00:18:26,610 --> 00:18:28,630 to solve the vertex cover problem. 344 00:18:28,630 --> 00:18:29,380 How do we do that? 345 00:18:29,380 --> 00:18:33,940 Well, we just consider all of the vertexes in the compliment. 346 00:18:33,940 --> 00:18:38,490 So basically, all of the vertex that weren't in the independent set, 347 00:18:38,490 --> 00:18:40,860 consider those to be in the vertex cover. 348 00:18:40,860 --> 00:18:44,860 And these have the property that every single vertex 349 00:18:44,860 --> 00:18:47,550 is related to every other edge. 350 00:18:47,550 --> 00:18:50,860 I don't know if I'm explaining that correctly. 351 00:18:50,860 --> 00:18:55,860 Every vertex in the vertex cover is touching every single edge. 352 00:18:55,860 --> 00:18:59,510 And in this case, you can hopefully see that. 353 00:18:59,510 --> 00:19:02,520 And so we just created, out of nothingness, 354 00:19:02,520 --> 00:19:05,410 a solution to vertex cover from a solution to independent set. 355 00:19:05,410 --> 00:19:08,710 And we can even take it one step further with clique. 356 00:19:08,710 --> 00:19:12,810 The way you do this is you consider the points an independent set, 357 00:19:12,810 --> 00:19:15,160 or the graph of independent set, and then take 358 00:19:15,160 --> 00:19:18,480 the complement of the edges, which means that every edge that 359 00:19:18,480 --> 00:19:20,550 doesn't exist already, you fill it in. 360 00:19:20,550 --> 00:19:23,650 And every edge that did exist, you get rid of it. 361 00:19:23,650 --> 00:19:26,200 And what you're left with is this graph. 362 00:19:26,200 --> 00:19:29,380 And all of the points that were originally an independent set 363 00:19:29,380 --> 00:19:34,110 are now a clique-- in this case, a 4 clique-- for this new graph. 364 00:19:34,110 --> 00:19:39,450 So it took a long time to create a solution to independent set, 365 00:19:39,450 --> 00:19:40,710 an exponential amount of time. 366 00:19:40,710 --> 00:19:43,120 But now that we have that, we're able to create solutions 367 00:19:43,120 --> 00:19:45,550 to other similar problems. 368 00:19:45,550 --> 00:19:49,020 And it turns out there's a wide range of problems that are related, 369 00:19:49,020 --> 00:19:53,680 even problems not really seemingly related to graphs at all. 370 00:19:53,680 --> 00:19:59,240 So this is what's called this the satisfiability problem. 371 00:19:59,240 --> 00:20:05,110 Given a clause, or basically a logic clause, where each of these variables-- 372 00:20:05,110 --> 00:20:09,160 I don't you can see where I'm pointing in the video, 373 00:20:09,160 --> 00:20:13,840 but these are all basically variables that are either true or false. 374 00:20:13,840 --> 00:20:17,550 And so what the satisfiability question is asking 375 00:20:17,550 --> 00:20:20,830 is whether or not this clause has a truth assignment. 376 00:20:20,830 --> 00:20:26,980 That is, there exists an a and b, such that this all becomes true. 377 00:20:26,980 --> 00:20:33,800 And for those who don't know, these little downwards carrots are or's. 378 00:20:33,800 --> 00:20:35,260 And the upward carrots are ands. 379 00:20:35,260 --> 00:20:37,010 And the little-- I don't what you call it. 380 00:20:37,010 --> 00:20:41,279 The little sideways dashes are nots, so the opposite. 381 00:20:41,279 --> 00:20:42,320 If it's true, it's false. 382 00:20:42,320 --> 00:20:45,200 If it's false, it's true. 383 00:20:45,200 --> 00:20:48,170 So the question is does this have a truth assignment? 384 00:20:48,170 --> 00:20:52,610 And maybe you can just eyeball it and say, well, actually, if A is true, 385 00:20:52,610 --> 00:20:56,070 and B is false, and C is true, then this statement can be true. 386 00:20:56,070 --> 00:20:58,290 And that indeed, that is the case. 387 00:20:58,290 --> 00:21:01,130 So how do we reduce-- how do we take that solution 388 00:21:01,130 --> 00:21:05,270 and turn it into a solution to, say, the vertex cover problem? 389 00:21:05,270 --> 00:21:09,320 In this case, all we have to do is create this graph. 390 00:21:09,320 --> 00:21:12,295 And there is an algorithm to create this graph. 391 00:21:12,295 --> 00:21:15,170 And what it does is you basically-- I don't know if you can see this, 392 00:21:15,170 --> 00:21:18,950 but every clause of the original statement 393 00:21:18,950 --> 00:21:25,400 up here turns into 2 variables at the bottom of this graph. 394 00:21:25,400 --> 00:21:30,380 So you have AB turning into AB down here, not A, not B, not C 395 00:21:30,380 --> 00:21:32,520 turning into these 3 points down here. 396 00:21:32,520 --> 00:21:36,200 And then not ABC turning into not ABC. 397 00:21:36,200 --> 00:21:41,700 And then finally, you just take every a and its complement up here. 398 00:21:41,700 --> 00:21:46,710 Now, let's say we have the solution, A, not BECAUSE. 399 00:21:46,710 --> 00:21:53,720 Then basically, you want to color these, A not BC on the top line. 400 00:21:53,720 --> 00:21:57,890 And then you want to connect every single element to itself. 401 00:21:57,890 --> 00:22:03,680 So not B goes to not B. B goes to B. A goes to A. Not C to not C. 402 00:22:03,680 --> 00:22:05,240 And that's that. 403 00:22:05,240 --> 00:22:09,618 The point is that basically-- oh, did I miss something? 404 00:22:09,618 --> 00:22:12,210 405 00:22:12,210 --> 00:22:17,570 So then you basically color these based on-- there is a relationship. 406 00:22:17,570 --> 00:22:19,157 I'm actually blanking on it right now. 407 00:22:19,157 --> 00:22:20,990 But basically, there's a way to color these. 408 00:22:20,990 --> 00:22:24,530 And basically when you color the A, not BC, 409 00:22:24,530 --> 00:22:29,090 and then B, not A, not CBA on the bottom, that creates a vertex cover. 410 00:22:29,090 --> 00:22:31,790 And so you've used a solution, you've used an algorithm, 411 00:22:31,790 --> 00:22:35,060 that creates a solution from the satisfiability 412 00:22:35,060 --> 00:22:39,200 into the vertex cover that is actually very powerful. 413 00:22:39,200 --> 00:22:41,140 And that's what's called a reduction. 414 00:22:41,140 --> 00:22:44,120 I know that was a little bit involved. 415 00:22:44,120 --> 00:22:48,510 I'm still trying to figure out how these are related. 416 00:22:48,510 --> 00:22:56,660 But basically, A, not BC-- yeah, so I think what it is, 417 00:22:56,660 --> 00:23:00,740 is if it's A is colored here, then you color all the 418 00:23:00,740 --> 00:23:02,980 not A's in this bottom graph. 419 00:23:02,980 --> 00:23:05,190 And if not B is colored here, then you colored 420 00:23:05,190 --> 00:23:06,570 all the B's in this bottom graph. 421 00:23:06,570 --> 00:23:10,600 And again, if you color C here, then the not C is colored here. 422 00:23:10,600 --> 00:23:14,539 And I think using that algorithm, you can create a solution to vertex cover. 423 00:23:14,539 --> 00:23:16,205 Why is this the solution a vertex cover? 424 00:23:16,205 --> 00:23:20,450 So this is one thing I didn't mention, is that all of the colored things 425 00:23:20,450 --> 00:23:23,720 are covering every single edge. 426 00:23:23,720 --> 00:23:26,600 That is, every single blue node here is connected 427 00:23:26,600 --> 00:23:28,670 to every single edge in this graph. 428 00:23:28,670 --> 00:23:31,340 Also, as we discussed before, vertex cover 429 00:23:31,340 --> 00:23:35,180 is related to independent set in the compliment. 430 00:23:35,180 --> 00:23:38,390 So if you consider all the white nodes in this graph, 431 00:23:38,390 --> 00:23:43,190 they form a independent set where no 2 of the white nodes are touching. 432 00:23:43,190 --> 00:23:45,710 So now we've created solutions to two of these problems 433 00:23:45,710 --> 00:23:48,140 from a seemingly completely unrelated problem, 434 00:23:48,140 --> 00:23:51,240 which is the satisfiability problem. 435 00:23:51,240 --> 00:23:56,090 And that's actually really powerful when considering these type of NP problems. 436 00:23:56,090 --> 00:23:58,089 Sorry, that was a little bit wordy. 437 00:23:58,089 --> 00:23:59,880 Hopefully, you got something out of that's. 438 00:23:59,880 --> 00:24:03,620 Let's go on to more fun things like the Turing machine. 439 00:24:03,620 --> 00:24:05,660 Way more fun. 440 00:24:05,660 --> 00:24:11,010 Of course, this is due to Alan Turing, the man who needs no introduction. 441 00:24:11,010 --> 00:24:13,070 But basically, this was his theoretical model 442 00:24:13,070 --> 00:24:17,000 of a computer in its simplest form, which is basically 443 00:24:17,000 --> 00:24:18,900 just something that runs an algorithm. 444 00:24:18,900 --> 00:24:22,580 I think this is sort of-- this image helps to see that. 445 00:24:22,580 --> 00:24:28,250 It's a machine that runs an algorithm on an infinite tape of 1s and 0s. 446 00:24:28,250 --> 00:24:32,990 And that is a computer at its simplest form. 447 00:24:32,990 --> 00:24:36,170 And Alan Turing saying that that mathematical functions on numbers 448 00:24:36,170 --> 00:24:40,040 can be just as well computed by a Turing machine, that was basically what 449 00:24:40,040 --> 00:24:44,940 was concluded in what's called the Church-Turing thesis. 450 00:24:44,940 --> 00:24:48,650 And I like to think of this as basically a reduction 451 00:24:48,650 --> 00:24:51,590 of a computer to its simplest abilities, the ability 452 00:24:51,590 --> 00:24:54,000 to read from memory, the ability to write from memory. 453 00:24:54,000 --> 00:24:55,291 And that's really all you need. 454 00:24:55,291 --> 00:24:59,930 You could interpret this Turing machine of just 1s and 0s 455 00:24:59,930 --> 00:25:02,270 as like blocks of 8 bits. 456 00:25:02,270 --> 00:25:05,270 And then you could have bytes, say. 457 00:25:05,270 --> 00:25:08,330 And then you can work in letters, as opposed to just 1s and 0s. 458 00:25:08,330 --> 00:25:11,480 And so I think reduction is just a concept that comes up a lot in computer 459 00:25:11,480 --> 00:25:12,965 science, so it's good to know. 460 00:25:12,965 --> 00:25:16,460 And not just in the sense of problems to problems, but also, 461 00:25:16,460 --> 00:25:19,424 for example, computers and things that solve those problems. 462 00:25:19,424 --> 00:25:22,550 463 00:25:22,550 --> 00:25:25,974 It's worth noting, there are also other theoretical ways 464 00:25:25,974 --> 00:25:27,140 of thinking about computers. 465 00:25:27,140 --> 00:25:32,107 Today, we have quantum computers, which is just another theoretical model 466 00:25:32,107 --> 00:25:32,690 of a computer. 467 00:25:32,690 --> 00:25:36,830 And that has different ramifications for the P versus NP problem. 468 00:25:36,830 --> 00:25:40,670 And the problem kind of breaks down in that sense. 469 00:25:40,670 --> 00:25:43,840 But at least for Turing's notion of computers, 470 00:25:43,840 --> 00:25:45,090 this is what we'll talk about. 471 00:25:45,090 --> 00:25:47,770 472 00:25:47,770 --> 00:25:51,720 Yeah, so what is the resolution to this problem? 473 00:25:51,720 --> 00:25:52,870 Is P NP? 474 00:25:52,870 --> 00:25:54,790 Is P not NP? 475 00:25:54,790 --> 00:25:57,720 Are these problems-- is there a problem that's not in NP 476 00:25:57,720 --> 00:26:00,950 but-- excuse me-- in NP but definitely not in P? 477 00:26:00,950 --> 00:26:02,440 And the answer is pretty unclear. 478 00:26:02,440 --> 00:26:05,590 But there's a lot of smart people, not like me, 479 00:26:05,590 --> 00:26:10,870 but smarter than me, who think that there's definitely not a solution. 480 00:26:10,870 --> 00:26:13,840 Because people have spent a lot of time thinking about this question. 481 00:26:13,840 --> 00:26:15,790 And for a lot of these NP complete problems, 482 00:26:15,790 --> 00:26:17,800 they just can't find efficient algorithms. 483 00:26:17,800 --> 00:26:20,300 I think this comic sums that up nicely. 484 00:26:20,300 --> 00:26:23,290 485 00:26:23,290 --> 00:26:28,120 So it's very much seems like P is not NP, so I hate to break your all hearts. 486 00:26:28,120 --> 00:26:30,670 But a lot of people think that P is not NP. 487 00:26:30,670 --> 00:26:33,790 There are some people who think P is NP. 488 00:26:33,790 --> 00:26:37,980 And then there are the people who have different conceptions of the problem 489 00:26:37,980 --> 00:26:39,970 like, well, it's a badly worded problem. 490 00:26:39,970 --> 00:26:45,970 Any of the definitions of P and NP are unrelated in their definitions. 491 00:26:45,970 --> 00:26:49,720 So there's a lot of conceptions, but most computer scientists 492 00:26:49,720 --> 00:26:53,335 think it is a well-formed question and that P is not equal to NP. 493 00:26:53,335 --> 00:26:56,800 494 00:26:56,800 --> 00:27:01,720 So how would one go about proving this question of the relatedness 495 00:27:01,720 --> 00:27:05,789 of algorithms solving problems. 496 00:27:05,789 --> 00:27:08,830 And it turns out, it's just not easy because the way we define these sets 497 00:27:08,830 --> 00:27:11,450 is by the existence of algorithms, right? 498 00:27:11,450 --> 00:27:16,600 So to prove P is NP actually is "easy" because all you have to do 499 00:27:16,600 --> 00:27:22,210 is find one polynomial time algorithm for a NP complete problem. 500 00:27:22,210 --> 00:27:25,420 And then immediately, the whole class collapses into P. 501 00:27:25,420 --> 00:27:28,090 That is, because if you had a polynomial time 502 00:27:28,090 --> 00:27:30,070 algorithm to solve an NP complete problem, 503 00:27:30,070 --> 00:27:32,560 you could solve all of the other NP complete problems 504 00:27:32,560 --> 00:27:38,650 because we told you that they're all related by some core completeness. 505 00:27:38,650 --> 00:27:42,010 And then to prove P is not equal to NP is much harder, 506 00:27:42,010 --> 00:27:47,920 in the sense that you have to prove that an algorithm doesn't exist. 507 00:27:47,920 --> 00:27:51,010 And that is actually kind of difficult to solve because in the past, 508 00:27:51,010 --> 00:27:53,110 we've just invented new algorithms, and it just 509 00:27:53,110 --> 00:27:56,740 took enough time before someone was like, oh, well we can do it this way. 510 00:27:56,740 --> 00:28:02,140 And we don't have proof that an algorithm doesn't exist in P. 511 00:28:02,140 --> 00:28:06,460 It's only until we just find an algorithm in P that we can be sure that 512 00:28:06,460 --> 00:28:11,440 that wasn't-- excuse me-- was in NP and not in P. 513 00:28:11,440 --> 00:28:15,920 So that is inherently not an easy task. 514 00:28:15,920 --> 00:28:19,360 And I think that's why that makes this problem so fascinating and so 515 00:28:19,360 --> 00:28:24,130 frustrating for a lot of smart people. 516 00:28:24,130 --> 00:28:27,820 I think one of the most interesting ways to think about this problem 517 00:28:27,820 --> 00:28:30,990 is in terms of what if P did equal NP? 518 00:28:30,990 --> 00:28:35,500 And I think this is like a counter-- or excuse me, 519 00:28:35,500 --> 00:28:39,790 like the-- I don't know what the word is-- contradictive argument, 520 00:28:39,790 --> 00:28:42,610 where you say, well, what if P did equal NP? 521 00:28:42,610 --> 00:28:47,320 And if P did equal NP, then the world would be a lot different. 522 00:28:47,320 --> 00:28:51,280 If P was NP, then every problem that would be easy to check 523 00:28:51,280 --> 00:28:53,080 would also be easy to solve. 524 00:28:53,080 --> 00:28:55,090 What does that mean? 525 00:28:55,090 --> 00:28:56,800 Actually, I don't know. 526 00:28:56,800 --> 00:29:01,300 I didn't realize this when I first started to research this problem, 527 00:29:01,300 --> 00:29:06,170 but RSA encryption, which is like basically the basis 528 00:29:06,170 --> 00:29:08,420 of a lot of our modern encryption, would be 529 00:29:08,420 --> 00:29:11,230 very easy to crack because it's based on factoring. 530 00:29:11,230 --> 00:29:13,540 It's based on factoring large, large numbers. 531 00:29:13,540 --> 00:29:15,760 And if those numbers were easy to factor, 532 00:29:15,760 --> 00:29:21,790 they would be very easy to basically decrypt 533 00:29:21,790 --> 00:29:25,700 a lot of messages that are passed online and things like that. 534 00:29:25,700 --> 00:29:29,420 But of course, we're relying on the fact that P is not equal to NP, 535 00:29:29,420 --> 00:29:34,480 and that there is no algorithm that exists currently to do that. 536 00:29:34,480 --> 00:29:37,570 Another ramification via artificial intelligence systems 537 00:29:37,570 --> 00:29:42,050 would be making huge leaps almost overnight because a proof to P 538 00:29:42,050 --> 00:29:46,210 equals NP would also tell us something fundamental about solving 539 00:29:46,210 --> 00:29:50,690 certain problems that AI people solve. 540 00:29:50,690 --> 00:29:52,870 I don't want to go into details too much there. 541 00:29:52,870 --> 00:29:56,440 But another ramification, for example, would 542 00:29:56,440 --> 00:29:59,140 be that the economy would be perfectly efficient, which 543 00:29:59,140 --> 00:30:06,760 means that if there were any arbitrage between two different-- well, 544 00:30:06,760 --> 00:30:10,670 arbitrage works by taking advantage of the different price 545 00:30:10,670 --> 00:30:13,390 discrepancies in a graph. 546 00:30:13,390 --> 00:30:18,710 And if those are easy to detect-- that is, if there were a polynomial time 547 00:30:18,710 --> 00:30:22,120 algorithm to detect them, then it would immediately 548 00:30:22,120 --> 00:30:25,480 be easy to find these arbitrage opportunities, which 549 00:30:25,480 --> 00:30:29,350 is sort of a very bad thing. 550 00:30:29,350 --> 00:30:32,860 And then, one other thing would be-- a lot 551 00:30:32,860 --> 00:30:36,790 of people say that proving things, or mathematical proofs, 552 00:30:36,790 --> 00:30:39,730 is such an NP problem because, well, it's 553 00:30:39,730 --> 00:30:41,380 easy to check if a proof is correct. 554 00:30:41,380 --> 00:30:42,710 You just follow the logic. 555 00:30:42,710 --> 00:30:46,960 It's much harder and it's a creative task to actually create a proof. 556 00:30:46,960 --> 00:30:50,650 And so we might be able to write programs 557 00:30:50,650 --> 00:30:54,086 that make mathematical proofs if P did equal 558 00:30:54,086 --> 00:31:00,610 NP, which would just like be crazy, and that would just 559 00:31:00,610 --> 00:31:03,190 make the whole world explode. 560 00:31:03,190 --> 00:31:06,040 The world would be fundamentally different than the way we currently 561 00:31:06,040 --> 00:31:08,210 think it is. 562 00:31:08,210 --> 00:31:11,690 And a little more philosophy here, which is one of the fun things 563 00:31:11,690 --> 00:31:16,510 I like to think about, is proving things-- well we talked about it. 564 00:31:16,510 --> 00:31:20,770 It is sort of like an NP problem because you can just follow the logic, 565 00:31:20,770 --> 00:31:22,750 and it's easy to check. 566 00:31:22,750 --> 00:31:24,890 Is comedy an NP problem? 567 00:31:24,890 --> 00:31:28,690 If I tell a joke and you laugh at it, then it's really easy 568 00:31:28,690 --> 00:31:30,710 to check that that was a funny joke. 569 00:31:30,710 --> 00:31:33,440 Much harder task to write a funny joke. 570 00:31:33,440 --> 00:31:36,440 I know, because I do stand up, and enough people 571 00:31:36,440 --> 00:31:38,890 have sat in silence at my jokes. 572 00:31:38,890 --> 00:31:41,690 I know it's a very hard thing to do. 573 00:31:41,690 --> 00:31:43,510 Music. 574 00:31:43,510 --> 00:31:44,690 Appreciation of music. 575 00:31:44,690 --> 00:31:45,790 Is it easy thing to check? 576 00:31:45,790 --> 00:31:46,390 Well, yeah. 577 00:31:46,390 --> 00:31:49,060 If you just listen to something, you can kind of intuitively 578 00:31:49,060 --> 00:31:51,130 tell that that is a nice sound. 579 00:31:51,130 --> 00:31:56,380 But it is actually much different much more difficult to create music, 580 00:31:56,380 --> 00:31:58,780 to create good music. 581 00:31:58,780 --> 00:32:01,840 These are all sort of philosophical arguments. 582 00:32:01,840 --> 00:32:06,190 These aren't really-- the actual definitions of P versus NP 583 00:32:06,190 --> 00:32:11,260 that we're exploring have a much more well-defined relationship to Turing 584 00:32:11,260 --> 00:32:13,460 machines and what are called languages. 585 00:32:13,460 --> 00:32:15,940 But when you think very high level, the ramifications 586 00:32:15,940 --> 00:32:21,640 are endless for any sort of creative pursuit. 587 00:32:21,640 --> 00:32:25,046 And Scott Aaronson of MIT once said that, "If P equaled NP, 588 00:32:25,046 --> 00:32:27,920 then the world would be a profoundly different place than we normally 589 00:32:27,920 --> 00:32:28,586 assume it to be. 590 00:32:28,586 --> 00:32:30,880 There would be no special value in creative leaps. 591 00:32:30,880 --> 00:32:33,610 There would be no fundamental gap between solving a problem 592 00:32:33,610 --> 00:32:35,890 and recognizing the solution once it's found. 593 00:32:35,890 --> 00:32:38,380 Everyone who can appreciate a symphony would be Mozart, 594 00:32:38,380 --> 00:32:41,470 and everyone who could follow a step-by-step argument would be Gauss." 595 00:32:41,470 --> 00:32:45,390 And I think that tells you something about the crazy ramifications 596 00:32:45,390 --> 00:32:48,625 of a proof that P did equal to NP. 597 00:32:48,625 --> 00:32:52,780 598 00:32:52,780 --> 00:32:53,989 Yeah, so why should you care? 599 00:32:53,989 --> 00:32:56,738 Hopefully, you're sitting here and being like, well, that's great, 600 00:32:56,738 --> 00:32:58,240 but I have a final product to do. 601 00:32:58,240 --> 00:33:00,210 And this doesn't really help at all. 602 00:33:00,210 --> 00:33:04,750 I don't think many of you are going to be doing theoretical proofs or anything 603 00:33:04,750 --> 00:33:06,670 like that or talking about theory. 604 00:33:06,670 --> 00:33:08,120 Although, I think you should. 605 00:33:08,120 --> 00:33:12,130 We talk in CS50 about running time of different algorithms. 606 00:33:12,130 --> 00:33:16,960 And that is very relevant because there could be a faster way 607 00:33:16,960 --> 00:33:18,920 to do the problems that you're thinking about. 608 00:33:18,920 --> 00:33:20,429 You just don't know yet. 609 00:33:20,429 --> 00:33:22,720 And that's why theoretical computer scientists research 610 00:33:22,720 --> 00:33:26,170 the fastest way to do certain things. 611 00:33:26,170 --> 00:33:29,669 And that's why reductions have shown us-- reductions from NP 612 00:33:29,669 --> 00:33:32,210 complete problems have shown us that like if a problem exists 613 00:33:32,210 --> 00:33:34,690 of this problem, a problem that you're trying to solve 614 00:33:34,690 --> 00:33:36,610 might actually have already been solved. 615 00:33:36,610 --> 00:33:41,690 Or another other thing is that, well, if you assume P doesn't equal to NP, 616 00:33:41,690 --> 00:33:46,840 something you're trying to do might not be that easy to do. 617 00:33:46,840 --> 00:33:50,380 And if that wasn't motivation again for why you should care, 618 00:33:50,380 --> 00:33:52,420 well, it's a million dollar question, right? 619 00:33:52,420 --> 00:33:58,810 And there's nothing that computer scientists like more than computers 620 00:33:58,810 --> 00:34:00,130 except for money, right? 621 00:34:00,130 --> 00:34:02,440 So I think that should be motivation enough 622 00:34:02,440 --> 00:34:06,140 for you to be interested in the P versus NP problem. 623 00:34:06,140 --> 00:34:09,020 It is something that is really interesting to me. 624 00:34:09,020 --> 00:34:13,027 I know I took you through the weeds a little bit with the actual reductions, 625 00:34:13,027 --> 00:34:14,360 but I hope that was interesting. 626 00:34:14,360 --> 00:34:16,270 And that's really what computer scientists do 627 00:34:16,270 --> 00:34:21,110 when they think about these types of problems, is reduce problems to another 628 00:34:21,110 --> 00:34:25,690 and sort of relate these algorithms. 629 00:34:25,690 --> 00:34:31,026 And so I think that's about all I have. 630 00:34:31,026 --> 00:34:31,900 But I hope you enjoy. 631 00:34:31,900 --> 00:34:32,733 Here are some links. 632 00:34:32,733 --> 00:34:37,449 Just the P versus NP page is a collection of the status and history 633 00:34:37,449 --> 00:34:38,650 of the problem. 634 00:34:38,650 --> 00:34:40,340 And this is just a video I really like. 635 00:34:40,340 --> 00:34:43,989 It explains all these things very well. 636 00:34:43,989 --> 00:34:45,219 That's all I have. 637 00:34:45,219 --> 00:34:49,210 I hope that was enjoyable for you. 638 00:34:49,210 --> 00:34:55,030 And for everyone out there enjoying it on YouTube, I guess, 639 00:34:55,030 --> 00:34:57,380 I hope you could take something away from this. 640 00:34:57,380 --> 00:35:02,090 And yeah, P versus NP. 641 00:35:02,090 --> 00:35:03,910 It's pretty cool. 642 00:35:03,910 --> 00:35:06,141