WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:02.475 [MUSIC PLAYING] 00:00:49.805 --> 00:00:52.970 DAVID MALAN: This is CS50, Harvard University's introduction 00:00:52.970 --> 00:00:56.450 to the intellectual enterprises of computer science-- 00:00:56.450 --> 00:00:59.100 over here, computer science. 00:00:59.100 --> 00:01:02.990 So today we are joined by CS50's own Brian Yu. 00:01:02.990 --> 00:01:06.740 This is an unusual week, a look at artificial intelligence or AI. 00:01:06.740 --> 00:01:09.980 You might recall that some weeks ago when we first introduced Python, 00:01:09.980 --> 00:01:13.160 we talked about writing programs that answered you 00:01:13.160 --> 00:01:17.060 by saying hello if you said hello, or by saying goodbye if you said goodbye. 00:01:17.060 --> 00:01:20.450 But those programs were all implemented with if conditions and else ifs 00:01:20.450 --> 00:01:23.330 and else ifs and so that really wasn't artificial intelligence. 00:01:23.330 --> 00:01:25.747 If you wanted to have a whole conversation with a computer 00:01:25.747 --> 00:01:29.630 program like that, that would be a huge number of ifs and else ifs just 00:01:29.630 --> 00:01:31.970 to anticipate all of the things the human might say. 00:01:31.970 --> 00:01:34.100 So we can do better, and indeed humans have 00:01:34.100 --> 00:01:36.710 been doing better in this field of artificial intelligence. 00:01:36.710 --> 00:01:40.280 And I'm so pleased to say that Brian is here to lead us along that way. 00:01:40.280 --> 00:01:42.705 Now, CS50's own, Brian Yu. 00:01:42.705 --> 00:01:43.830 BRIAN YU: Thanks very much. 00:01:43.830 --> 00:01:44.955 Welcome, everyone, to CS50. 
00:01:44.955 --> 00:01:47.780 And today as David alluded to, the topic of discussion 00:01:47.780 --> 00:01:50.960 is artificial intelligence, which is all about taking our computers 00:01:50.960 --> 00:01:53.180 and trying to make them intelligent somehow, 00:01:53.180 --> 00:01:55.460 trying to get them to be able to act in a way 00:01:55.460 --> 00:01:57.800 that's somewhat rational, somewhat human and that 00:01:57.800 --> 00:01:59.490 could take a number of different forms. 00:01:59.490 --> 00:02:03.235 One example of artificial intelligence, for example, might be game play. 00:02:03.235 --> 00:02:05.360 You might be familiar with the game of tic-tac-toe, 00:02:05.360 --> 00:02:07.160 where you've got this three by three grid 00:02:07.160 --> 00:02:10.380 and X and O take turns trying to get three in a row. 00:02:10.380 --> 00:02:12.860 And if, for example X started the game and played 00:02:12.860 --> 00:02:15.860 in the middle square of this grid and then it was O's turn 00:02:15.860 --> 00:02:20.000 and O played in the top, it turns out that at this point in the game 00:02:20.000 --> 00:02:21.980 X has a very strategic move. 00:02:21.980 --> 00:02:25.220 And a human that's very good at the game or a computer that can maybe 00:02:25.220 --> 00:02:27.200 try and figure out how to play this game well, 00:02:27.200 --> 00:02:29.510 might make an intelligent move like playing 00:02:29.510 --> 00:02:31.680 in the top right corner for example. 00:02:31.680 --> 00:02:34.020 And if X plays in the top right corner, well then 00:02:34.020 --> 00:02:36.440 O is going to need to play in the bottom left corner 00:02:36.440 --> 00:02:39.600 in order to block X from getting three in a row. 00:02:39.600 --> 00:02:42.590 And here maybe you can see one of a couple of possible good moves 00:02:42.590 --> 00:02:45.980 that X has now, but X could play a move like moving 00:02:45.980 --> 00:02:48.290 in the right square over there. 
00:02:48.290 --> 00:02:52.220 And now O is in a tricky position, X has one way that could potentially 00:02:52.220 --> 00:02:55.190 win the game horizontally, and another way they could potentially 00:02:55.190 --> 00:02:56.480 win the game vertically. 00:02:56.480 --> 00:02:59.030 And so O is going to have to block one of those ways, 00:02:59.030 --> 00:03:01.640 and maybe they go to block horizontally but then 00:03:01.640 --> 00:03:05.610 X is going to win no matter what just by playing in that bottom right corner. 00:03:05.610 --> 00:03:09.020 And so a human playing this game could reason through the game thinking 00:03:09.020 --> 00:03:12.680 about how the opponent might respond and how X would respond in turn 00:03:12.680 --> 00:03:15.800 and a computer might try to do the same thing for a game as simple 00:03:15.800 --> 00:03:18.890 as tic-tac-toe or a game even more complex. 00:03:18.890 --> 00:03:21.800 But AI goes beyond just game play, you might see examples 00:03:21.800 --> 00:03:24.470 like handwriting recognition, where nowadays computers 00:03:24.470 --> 00:03:28.070 are pretty good at taking human handwritten text, which can be messy, 00:03:28.070 --> 00:03:30.020 which is different from person to person, 00:03:30.020 --> 00:03:34.640 and somehow figuring out with pretty high accuracy exactly what characters 00:03:34.640 --> 00:03:36.440 the human was actually writing. 00:03:36.440 --> 00:03:39.560 AI comes up in areas like spam detection. 00:03:39.560 --> 00:03:42.020 Maybe in your email inbox you have your inbox 00:03:42.020 --> 00:03:45.620 and your spam email usually gets sorted into a separate folder 00:03:45.620 --> 00:03:49.040 where you might get a whole bunch of emails coming into your email inbox 00:03:49.040 --> 00:03:52.920 and somehow your computer is able to figure out with reasonable accuracy, 00:03:52.920 --> 00:03:56.030 which emails are good and which emails are spam.
00:03:56.030 --> 00:03:58.160 Now of course, the computer is not perfect at this. 00:03:58.160 --> 00:04:01.670 There are false positives, where the computer thinks that an email might 00:04:01.670 --> 00:04:03.320 be spam when it isn't actually. 00:04:03.320 --> 00:04:06.950 And there are false negatives too, where a spam email might accidentally 00:04:06.950 --> 00:04:09.800 end up in your inbox because the computer doesn't catch it, 00:04:09.800 --> 00:04:12.350 but those sorts of false positives and false negatives 00:04:12.350 --> 00:04:14.540 are the types of things that AI researchers are 00:04:14.540 --> 00:04:18.118 working on trying to reduce to make these systems more and more accurate. 00:04:18.118 --> 00:04:20.660 You see these kinds of systems show up as well if you've ever 00:04:20.660 --> 00:04:23.460 been on a video watching website, like YouTube or Netflix, 00:04:23.460 --> 00:04:27.350 for example, where you have watched a whole bunch of videos or TV shows 00:04:27.350 --> 00:04:31.400 or movies and then software behind Netflix or behind YouTube 00:04:31.400 --> 00:04:35.090 is able to give you recommendations, suggest other videos that you 00:04:35.090 --> 00:04:39.050 might be interested in watching based on the things that you've watched already. 00:04:39.050 --> 00:04:41.300 And in more recent years, AI has gotten pretty good 00:04:41.300 --> 00:04:45.470 at doing even more sophisticated things, things like generating data. 00:04:45.470 --> 00:04:48.650 Take a look at these two images, for example, of people 00:04:48.650 --> 00:04:50.600 and see if you notice anything strange. 00:04:50.600 --> 00:04:52.940 See if either of these people look strange to you. 00:04:52.940 --> 00:04:58.130 In fact, can you figure out which of these two images is not a real person? 00:04:58.130 --> 00:05:02.180 That is to say, a computer-generated person that looks like it's human 00:05:02.180 --> 00:05:04.260 but is not actually a photo of a real person. 
00:05:04.260 --> 00:05:06.635 So you can look at these two images carefully, maybe look 00:05:06.635 --> 00:05:09.380 at the eyes and the hair and the mouth and the nose 00:05:09.380 --> 00:05:11.750 and see if you can figure out which one it is. 00:05:11.750 --> 00:05:15.140 It turns out neither of these two images are images of real people. 00:05:15.140 --> 00:05:19.070 They're both computer generated, not photos of real people but a computer 00:05:19.070 --> 00:05:22.910 has been trained to generate images of people that look like real people that 00:05:22.910 --> 00:05:25.910 could fool someone into thinking that it's a real person, 00:05:25.910 --> 00:05:29.367 but it's all just AI generated information. 00:05:29.367 --> 00:05:31.700 So today we'll take a look at all of those ideas, how it 00:05:31.700 --> 00:05:33.620 is that artificial intelligence works. 00:05:33.620 --> 00:05:37.010 And ultimately one of the big takeaways is that AI is not just 00:05:37.010 --> 00:05:39.620 one algorithm or one principle, it's really 00:05:39.620 --> 00:05:43.310 a collection, a category of approaches to problem solving that can all 00:05:43.310 --> 00:05:46.400 be used to try and solve some of these problems of trying 00:05:46.400 --> 00:05:49.160 to make computers intelligent. 00:05:49.160 --> 00:05:51.350 So let's begin with one of the first areas 00:05:51.350 --> 00:05:53.750 where we might want to make our AI work and that's 00:05:53.750 --> 00:05:55.610 in the area of decision making. 00:05:55.610 --> 00:05:57.770 Very frequently we want to train a computer 00:05:57.770 --> 00:05:59.540 to be able to make a decision well. 00:05:59.540 --> 00:06:03.530 That decision might be deciding if an email is spam or not spam, 00:06:03.530 --> 00:06:06.320 or deciding whether or not to recommend a video to you, 00:06:06.320 --> 00:06:10.132 or it could be deciding what action to take in a game, for example. 
00:06:10.132 --> 00:06:11.840 So let's take a simple game, maybe you've 00:06:11.840 --> 00:06:15.722 played a game like this before where you control this paddle along the bottom 00:06:15.722 --> 00:06:18.680 and you're trying to bounce this ball in order to hit all of the bricks 00:06:18.680 --> 00:06:19.730 along the top. 00:06:19.730 --> 00:06:22.550 Imagine if you were trying to program a computer 00:06:22.550 --> 00:06:25.610 to be able to play this game strategically, to play this game well 00:06:25.610 --> 00:06:30.410 and the computer observed the ball move that way. 00:06:30.410 --> 00:06:34.190 So the ball is moving that way, what should you program the computer to do? 00:06:34.190 --> 00:06:37.220 Well, it makes logical sense that if the ball's moving to the left, then 00:06:37.220 --> 00:06:40.190 you should program the computer to move the paddle to the left as well, 00:06:40.190 --> 00:06:44.230 to try and catch that ball before it falls through the ground. 00:06:44.230 --> 00:06:47.360 And so you could take that kind of logic and encode it 00:06:47.360 --> 00:06:51.140 into a computer program using what we might call a decision tree. 00:06:51.140 --> 00:06:54.110 Decision trees are just a way of representing a decision 00:06:54.110 --> 00:06:56.670 that a computer might make by asking questions 00:06:56.670 --> 00:07:00.290 and depending on the answers to those questions we might ask another question 00:07:00.290 --> 00:07:02.400 or make some sort of decision. 00:07:02.400 --> 00:07:05.330 So for this game of the paddle, we could create a decision tree 00:07:05.330 --> 00:07:07.550 by asking a question like this, we could ask, 00:07:07.550 --> 00:07:10.280 is the ball to the left of the paddle? 
00:07:10.280 --> 00:07:13.910 If the answer to that question is yes, well then the action we should take 00:07:13.910 --> 00:07:17.360 is we should move the paddle to the left because the ball is moving left, 00:07:17.360 --> 00:07:19.640 the paddle should move left as well. 00:07:19.640 --> 00:07:22.610 If the answer to the question is no, well then we 00:07:22.610 --> 00:07:25.650 need to maybe ask another question, we might ask a question like, 00:07:25.650 --> 00:07:27.890 is the ball to the right of the paddle? 00:07:27.890 --> 00:07:29.862 And if the answer to that question is yes, 00:07:29.862 --> 00:07:32.070 then we'll go ahead and move the paddle to the right. 00:07:32.070 --> 00:07:34.272 And if the answer to the question is no, well that 00:07:34.272 --> 00:07:36.230 means the ball is not to the left of the paddle 00:07:36.230 --> 00:07:39.410 and it's not to the right of the paddle, so we may as well just not 00:07:39.410 --> 00:07:42.540 move the paddle at all in that case. 00:07:42.540 --> 00:07:45.140 So this again is that decision tree that can 00:07:45.140 --> 00:07:48.380 allow us to make these choices about what it is that our AI should 00:07:48.380 --> 00:07:52.398 do in this situation and we could take that decision tree and turn it 00:07:52.398 --> 00:07:55.190 into a kind of pseudocode, something that might look like something 00:07:55.190 --> 00:07:57.350 you would write in C or in Python. 00:07:57.350 --> 00:08:00.470 I might say while the game is ongoing, if the ball is 00:08:00.470 --> 00:08:03.620 to the left of the paddle, go ahead and move the paddle to the left. 00:08:03.620 --> 00:08:07.310 Else if the ball is to the right of the paddle, move the paddle to the right. 00:08:07.310 --> 00:08:10.800 And else if neither of those are true, then don't move the paddle at all. 
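The decision tree Brian just described could be sketched as a small Python function. This is a hypothetical sketch, not code from the lecture; the coordinate names ball_x and paddle_x are assumptions about how the game state might be represented:

```python
def paddle_move(ball_x: float, paddle_x: float) -> int:
    """Follow the decision tree: ask whether the ball is to the left
    of the paddle, then whether it is to the right.
    Returns -1 (move left), 1 (move right), or 0 (don't move)."""
    if ball_x < paddle_x:       # Is the ball to the left of the paddle?
        return -1               # Yes: move the paddle to the left.
    elif ball_x > paddle_x:     # Is the ball to the right of the paddle?
        return 1                # Yes: move the paddle to the right.
    else:
        return 0                # Neither: don't move the paddle at all.
```

The while-the-game-is-ongoing loop from the pseudocode would simply call this function once per frame.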
00:08:10.800 --> 00:08:12.890 So one of the advantages of this decision tree 00:08:12.890 --> 00:08:15.290 is that it translates quite nicely to the conditions 00:08:15.290 --> 00:08:18.600 that you're familiar with from the world of programming. 00:08:18.600 --> 00:08:21.300 So let's give this a try with something a little more complex. 00:08:21.300 --> 00:08:23.520 Let's take the game of tic-tac-toe for example. 00:08:23.520 --> 00:08:27.320 And imagine you were trying to program an AI to strategically play 00:08:27.320 --> 00:08:28.760 the game of tic-tac-toe. 00:08:28.760 --> 00:08:33.679 What questions might you ask in your decision tree, what yes or no questions 00:08:33.679 --> 00:08:37.377 might you want to ask to create an AI that can play this game well, 00:08:37.377 --> 00:08:38.960 that can play this game strategically? 00:08:38.960 --> 00:08:40.820 And maybe I'll take a volunteer if anyone 00:08:40.820 --> 00:08:45.020 wants to suggest one possible question I might want to ask of my AI 00:08:45.020 --> 00:08:48.660 as I'm trying to teach this AI how to play this game? 00:08:48.660 --> 00:08:52.830 Any thoughts for questions that I might ask in this situation? 00:08:52.830 --> 00:08:55.850 And let's go to-- 00:08:55.850 --> 00:08:59.390 let's see, let's try [? Vasily ?] if you have an idea? 00:08:59.390 --> 00:09:02.630 AUDIENCE: Maybe let's ask what is the probability 00:09:02.630 --> 00:09:04.243 of winning given a certain move? 00:09:04.243 --> 00:09:04.910 BRIAN YU: Great. 00:09:04.910 --> 00:09:05.870 And what would winning mean? 00:09:05.870 --> 00:09:08.578 How could you detect what it would mean to win, like what are you 00:09:08.578 --> 00:09:10.850 going to look for on the board? 00:09:10.850 --> 00:09:15.193 AUDIENCE: Three consecutive yeah, Xs or circles. 00:09:15.193 --> 00:09:17.360 BRIAN YU: Great, you're looking for some opportunity 00:09:17.360 --> 00:09:19.970 for like three consecutive Xs or three consecutive Os. 
00:09:19.970 --> 00:09:22.490 So one of the questions you might ask is checking 00:09:22.490 --> 00:09:26.842 to see whether or not you could get three in a row maybe on the next turn. 00:09:26.842 --> 00:09:29.300 If you already have two in a row and there's an empty space 00:09:29.300 --> 00:09:32.175 that could be an opportunity for something that we might want to try. 00:09:32.175 --> 00:09:35.720 Any other questions we might want to ask as part of our strategic decision 00:09:35.720 --> 00:09:38.390 making if we're trying to play this game of tic-tac-toe, 00:09:38.390 --> 00:09:41.510 any thoughts about other things we could try, 00:09:41.510 --> 00:09:44.870 other types of things we should be looking for or asking 00:09:44.870 --> 00:09:47.180 as we're training our artificial intelligence, as we're 00:09:47.180 --> 00:09:50.690 trying to program it to play this game strategically? 00:09:50.690 --> 00:09:53.770 Let's go to Santiago, you have an idea? 00:09:53.770 --> 00:09:54.730 AUDIENCE: Hello, Brian. 00:09:54.730 --> 00:09:58.788 If there is any possibility to win, the other player? 00:09:58.788 --> 00:10:00.080 BRIAN YU: For the other player? 00:10:00.080 --> 00:10:01.760 Yeah, so you might want to check the other player to see 00:10:01.760 --> 00:10:03.365 if they have some possibility to win. 00:10:03.365 --> 00:10:06.240 And if they have some possibility to win you could try to block them. 00:10:06.240 --> 00:10:08.032 So those are great questions you could ask. 00:10:08.032 --> 00:10:11.000 And we could start to formulate a decision tree for tic-tac-toe 00:10:11.000 --> 00:10:12.140 based on those questions. 00:10:12.140 --> 00:10:15.980 I might start by asking, can I get three in a row on this turn? 00:10:15.980 --> 00:10:19.050 And if the answer is yes, well, then my action is pretty straightforward. 00:10:19.050 --> 00:10:22.670 I should play in the square that will get me to three in a row. 
00:10:22.670 --> 00:10:25.130 If the answer to the question is no, well then I 00:10:25.130 --> 00:10:27.230 might ask Santiago's question, I might ask, 00:10:27.230 --> 00:10:30.800 can my opponent get three in a row on their next turn? 00:10:30.800 --> 00:10:33.128 And if the answer to that is yes, well then, we better 00:10:33.128 --> 00:10:35.670 play in the square to block them from getting three in a row, 00:10:35.670 --> 00:10:39.770 otherwise they're going to win the game if we don't block them now. 00:10:39.770 --> 00:10:41.690 If the answer to this question is no though, 00:10:41.690 --> 00:10:43.850 if I can't get three in a row in this turn 00:10:43.850 --> 00:10:46.640 and my opponent can't get three in a row in the next turn, 00:10:46.640 --> 00:10:48.405 well now it's a little bit less clear. 00:10:48.405 --> 00:10:50.780 You could maybe try and come up with additional questions 00:10:50.780 --> 00:10:53.750 you might ask but these two questions are very clear 00:10:53.750 --> 00:10:56.470 and the rest maybe a little bit less obvious but ultimately we 00:10:56.470 --> 00:10:58.970 don't want to be doing any of this as we're starting to make 00:10:58.970 --> 00:11:00.740 our AI more and more intelligent. 00:11:00.740 --> 00:11:02.960 As David was alluding to earlier, ultimately 00:11:02.960 --> 00:11:05.930 rather than us have to program every condition into the computer, 00:11:05.930 --> 00:11:08.390 telling it what to do in every situation, 00:11:08.390 --> 00:11:11.960 we'd like for the computer to be smart enough to just figure out on its own, 00:11:11.960 --> 00:11:14.480 analyze on its own all of the possibilities 00:11:14.480 --> 00:11:16.490 and figure out what move it should make. 00:11:16.490 --> 00:11:18.350 And ultimately we want our computer to make 00:11:18.350 --> 00:11:22.970 the optimal decision, the best possible decision when playing this game. 
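The two questions in that decision tree (can I get three in a row this turn, can my opponent get three in a row next turn) both come down to scanning the board for a line that has two matching marks and one empty square. A minimal sketch, assuming a hypothetical board representation of nine cells holding 'X', 'O', or None:

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winning_square(board, player):
    """Return the index of a square that would give `player` three
    in a row this turn, or None if no such square exists."""
    for a, b, c in LINES:
        cells = [board[a], board[b], board[c]]
        if cells.count(player) == 2 and cells.count(None) == 1:
            # The one empty cell in this line completes three in a row.
            return (a, b, c)[cells.index(None)]
    return None
```

The decision tree would first check winning_square(board, me) to take a win, then winning_square(board, opponent) to block one.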
00:11:22.970 --> 00:11:25.880 And so to do so we can introduce our first algorithm 00:11:25.880 --> 00:11:28.220 in artificial intelligence that we'll look at today 00:11:28.220 --> 00:11:30.020 and that algorithm is called minimax. 00:11:30.020 --> 00:11:32.540 Minimax is a game playing algorithm that's 00:11:32.540 --> 00:11:35.300 useful any time you have two competing players that 00:11:35.300 --> 00:11:38.540 are trying to play some sort of competitive game like tic-tac-toe, 00:11:38.540 --> 00:11:40.620 but it'll work for other games as well. 00:11:40.620 --> 00:11:43.460 And ultimately as we've seen all throughout CS50, 00:11:43.460 --> 00:11:47.082 we need some way of representing this game inside of the computer. 00:11:47.082 --> 00:11:49.790 And ultimately you'll recall way back from the beginning of CS50, 00:11:49.790 --> 00:11:53.150 we've been trying to take everything and represent it just with numbers 00:11:53.150 --> 00:11:54.920 and we can do the same thing with games. 00:11:54.920 --> 00:11:58.970 As far as our AI is concerned, there are only three possible outcomes 00:11:58.970 --> 00:12:01.250 of the game that our AI is going to care about, 00:12:01.250 --> 00:12:04.880 either O wins or X wins and X gets three in a row, 00:12:04.880 --> 00:12:08.690 or it's possible that neither side wins, that it ends up being a tie. 00:12:08.690 --> 00:12:12.500 And so what we're going to do is take each of these three possible outcomes 00:12:12.500 --> 00:12:14.645 and assign a number to each of those outcomes. 00:12:14.645 --> 00:12:16.520 We'll say O winning, we're just going to say, 00:12:16.520 --> 00:12:19.890 let's call that negative one, X winning, we'll call that one. 00:12:19.890 --> 00:12:22.458 And a tie, well that's somewhere in between the two. 00:12:22.458 --> 00:12:24.000 So we'll go ahead and call that zero. 00:12:24.000 --> 00:12:25.920 Nobody wins that game.
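The outcome encoding just described (X winning is one, O winning is negative one, a tie is zero) could be written as a small scoring helper for finished boards. A sketch, again assuming a hypothetical board of nine cells holding 'X', 'O', or None:

```python
def score(board):
    """Map a finished tic-tac-toe board to the lecture's encoding:
    1 if X has three in a row, -1 if O does, 0 for a tie."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return 1 if board[a] == 'X' else -1
    return 0    # No three in a row anywhere: nobody wins that game.
```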
00:12:25.920 --> 00:12:29.310 And now each player has a goal, some objective that we 00:12:29.310 --> 00:12:31.440 can quantify using these numbers. 00:12:31.440 --> 00:12:34.620 The X player, who we might also call the max player, 00:12:34.620 --> 00:12:39.150 aims to maximize the score, what's best for X is a score of one, 00:12:39.150 --> 00:12:40.780 meaning X is going to win. 00:12:40.780 --> 00:12:43.350 But if X can't win, well then tying the game 00:12:43.350 --> 00:12:46.790 is zero, that's better than losing the game which is negative one. 00:12:46.790 --> 00:12:48.540 Meanwhile the O player we're going to call 00:12:48.540 --> 00:12:52.120 the min player, the minimizing player, their goal is to minimize the score. 00:12:52.120 --> 00:12:54.730 Negative one is the best outcome for the O player, 00:12:54.730 --> 00:12:57.750 but if negative one can't happen, if O can't win, well, 00:12:57.750 --> 00:12:59.970 then tying the game is still better than X 00:12:59.970 --> 00:13:04.050 winning because X winning would mean our minimizing player, player O, 00:13:04.050 --> 00:13:06.430 is ultimately going to lose the game. 00:13:06.430 --> 00:13:10.680 And so we can take every board in tic-tac-toe and assign a score to it, 00:13:10.680 --> 00:13:13.680 it's either one or zero or negative one. 00:13:13.680 --> 00:13:17.640 So this board for example, this game is over, X has already won the game 00:13:17.640 --> 00:13:21.210 and because X has won the game, we're going to give that a value of one. 00:13:21.210 --> 00:13:26.490 One corresponds to X winning, negative one to O winning, zero for a tie. 00:13:26.490 --> 00:13:28.740 So let's now consider this game board. 00:13:28.740 --> 00:13:32.130 This game board isn't over yet, but we can still assign a score to it. 00:13:32.130 --> 00:13:36.690 Let's assume that it's X's turn, what score would we give this board? 
00:13:36.690 --> 00:13:38.730 Well, when we're giving each board a score, 00:13:38.730 --> 00:13:42.360 we're considering what would happen if both players were playing optimally. 00:13:42.360 --> 00:13:46.200 In other words, if both players are playing the best possible moves 00:13:46.200 --> 00:13:48.900 and in this case, if X is playing the best possible move, 00:13:48.900 --> 00:13:52.410 well then X is going to play in this square to get three in a row 00:13:52.410 --> 00:13:57.420 and so this board also has a value of one because if X plays their best move, 00:13:57.420 --> 00:14:01.610 ultimately X is going to win the game, that has a value of one. 00:14:01.610 --> 00:14:04.860 So let's take a look at another board, maybe one that's a little bit trickier. 00:14:04.860 --> 00:14:08.870 Now in this board let's assume it's O's turn, so O gets to make a move 00:14:08.870 --> 00:14:10.620 and then because there's one empty square, 00:14:10.620 --> 00:14:14.010 X will get to make a move after that, what value 00:14:14.010 --> 00:14:15.257 would you give to this board? 00:14:15.257 --> 00:14:16.590 Maybe we'll ask for a volunteer. 00:14:16.590 --> 00:14:19.350 Remember the possibilities are one if X is 00:14:19.350 --> 00:14:22.140 going to win, negative one if O is going to win, 00:14:22.140 --> 00:14:24.090 and a zero if it's going to be a tie. 00:14:24.090 --> 00:14:29.280 If both players play the best moves and it's O's turn right now, 00:14:29.280 --> 00:14:32.310 what value would you give to this board and why? 00:14:32.310 --> 00:14:35.160 Maybe let's go to Santiago, go ahead. 00:14:35.160 --> 00:14:38.220 AUDIENCE: Yeah, I would say that the value would be zero 00:14:38.220 --> 00:14:41.970 because if both players play their best move no one in the end 00:14:41.970 --> 00:14:46.145 will win because O will block X and then there would be nothing left to do. 00:14:46.145 --> 00:14:46.770 BRIAN YU: Yeah. 00:14:46.770 --> 00:14:47.980 Absolutely correct.
00:14:47.980 --> 00:14:51.570 It's certainly possible that X could win because X has two in a row, 00:14:51.570 --> 00:14:53.080 they could get three in a row. 00:14:53.080 --> 00:14:55.380 But if both players play their best moves, 00:14:55.380 --> 00:14:57.615 well then O is going to block here and then 00:14:57.615 --> 00:14:59.490 X is going to play in the upper left and it's 00:14:59.490 --> 00:15:02.790 going to be a tie, which means this board is going to have a value of zero. 00:15:02.790 --> 00:15:04.890 And the way a computer might figure that out 00:15:04.890 --> 00:15:07.380 is by reasoning it through exactly the same way we did, 00:15:07.380 --> 00:15:09.910 by considering both of the possible options. 00:15:09.910 --> 00:15:13.500 If I want to know the value for this board where it's O's turn, 00:15:13.500 --> 00:15:16.470 well then I'm going to consider both of the possible options. 00:15:16.470 --> 00:15:19.920 O has two choices, O could play in the top left 00:15:19.920 --> 00:15:23.220 or O could block X by playing in the bottom there 00:15:23.220 --> 00:15:26.490 and the computer would consider both of those possible choices 00:15:26.490 --> 00:15:30.800 and try and figure out what value each of those boards has. 00:15:30.800 --> 00:15:33.120 So what happens if O plays in the top left? 00:15:33.120 --> 00:15:37.860 Well, then X is going to play in the bottom and X is going to win the game. 00:15:37.860 --> 00:15:40.500 X winning the game, that has a value of one, 00:15:40.500 --> 00:15:43.787 remember X winning, that's the value of one, which means this board also 00:15:43.787 --> 00:15:45.120 is going to have a value of one. 00:15:45.120 --> 00:15:47.487 So X wins, that's not great. 
00:15:47.487 --> 00:15:49.320 So let's consider the other possible choice, 00:15:49.320 --> 00:15:53.130 O could also have blocked X by playing in the bottom here 00:15:53.130 --> 00:15:56.610 and in that case, that's going to lead to X playing in the upper left, 00:15:56.610 --> 00:16:00.270 it's the only other possible option, that board has a value of zero, 00:16:00.270 --> 00:16:02.040 it's a tie since nobody won. 00:16:02.040 --> 00:16:05.100 Which means this board also has a value of zero. 00:16:05.100 --> 00:16:08.490 And now the minimizing player is going to look at these two options. 00:16:08.490 --> 00:16:11.520 If I play in the top left, that's going to be a value of one, 00:16:11.520 --> 00:16:13.050 that's not good for me. 00:16:13.050 --> 00:16:16.980 If I play in the bottom, that's going to be a value of zero, so that's better. 00:16:16.980 --> 00:16:19.680 And so ultimately we can conclude that this board up 00:16:19.680 --> 00:16:24.150 top as Santiago correctly stated, is also going to have a value of zero. 00:16:24.150 --> 00:16:28.050 If both players play their best moves it's ultimately going to be a tie. 00:16:28.050 --> 00:16:30.330 And this is what we may call a game tree, where 00:16:30.330 --> 00:16:33.450 you're exploring all of the possible branches, all of the ways 00:16:33.450 --> 00:16:34.590 this game could go. 00:16:34.590 --> 00:16:37.255 And from here, we're two moves away from the end of the game. 00:16:37.255 --> 00:16:40.380 But you could back up and consider what would it look like three moves away 00:16:40.380 --> 00:16:41.400 from the end of the game. 00:16:41.400 --> 00:16:43.525 And you'd end up with a bigger tree because you now 00:16:43.525 --> 00:16:45.870 need to explore even more possibilities if you're 00:16:45.870 --> 00:16:50.280 trying to figure out what the value of any particular game board is. 
00:16:50.280 --> 00:16:52.380 And so that's the way the minimax algorithm works, 00:16:52.380 --> 00:16:55.110 consider all of the possible moves you could make 00:16:55.110 --> 00:16:59.337 and then recursively in a sense, consider if I make my best move, what's 00:16:59.337 --> 00:17:01.170 my opponent going to do in response and then 00:17:01.170 --> 00:17:03.130 what could I do in response to that? 00:17:03.130 --> 00:17:05.790 And we could formulate that as a little bit of pseudocode 00:17:05.790 --> 00:17:10.290 by saying if the player is X, well then for all of the possible moves 00:17:10.290 --> 00:17:12.750 let's go ahead and calculate a score for that board. 00:17:12.750 --> 00:17:15.359 The same way we were just doing, by recursively following 00:17:15.359 --> 00:17:18.119 all the possible moves, let's get a score for that board 00:17:18.119 --> 00:17:21.839 and let's choose the move with the highest score. 00:17:21.839 --> 00:17:24.900 And otherwise, if the player is O, well, then we'll do the same thing. 00:17:24.900 --> 00:17:28.319 For all the possible moves, we'll calculate a score for that board, 00:17:28.319 --> 00:17:31.570 but then we'll choose the move with the lowest score. 00:17:31.570 --> 00:17:34.710 So the X player is picking the move that maximizes the score, 00:17:34.710 --> 00:17:38.100 the O player is picking the move that minimizes the score. 00:17:38.100 --> 00:17:40.590 And using this approach you can create an AI 00:17:40.590 --> 00:17:42.690 that can play a game like tic-tac-toe perfectly. 00:17:42.690 --> 00:17:46.440 It'll never lose the game if you've programmed it correctly. 00:17:46.440 --> 00:17:49.710 And this works not only for tic-tac-toe but for other games as well. 00:17:49.710 --> 00:17:52.500 But do you see any problems with this approach? 
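The pseudocode Brian describes could be fleshed out into a working minimax for tic-tac-toe. A minimal sketch, assuming a hypothetical nine-cell board of 'X', 'O', or None; the function returns both the board's value and the best move, using the lecture's scores of 1 for X winning, -1 for O winning, and 0 for a tie:

```python
def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) for `player`, assuming optimal play.
    X maximizes the score, O minimizes it."""
    w = winner(board)
    if w == 'X':
        return 1, None
    if w == 'O':
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None          # Board full with no winner: a tie.
    other = 'O' if player == 'X' else 'X'
    results = []
    for m in moves:
        board[m] = player
        s, _ = minimax(board, other)   # Recurse: opponent replies optimally.
        board[m] = None                # Undo the move before trying the next.
        results.append((s, m))
    # Tuples compare by score first, so X takes the highest-scoring
    # move and O takes the lowest-scoring one.
    return max(results) if player == 'X' else min(results)
```

On the boards from the lecture this finds the winning square for X, and for O it prefers the blocking move that forces the tie.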
00:17:52.500 --> 00:17:56.010 This approach of considering all of my possible moves, then 00:17:56.010 --> 00:17:58.980 all of my opponent's possible responses to those moves, and then 00:17:58.980 --> 00:18:02.400 all of my possible responses to those moves recursively 00:18:02.400 --> 00:18:04.230 until we get down to the end of the game. 00:18:04.230 --> 00:18:06.180 Any possible problems that might arise? 00:18:06.180 --> 00:18:08.088 Let's go to Sophia maybe. 00:18:08.088 --> 00:18:11.130 AUDIENCE: It can take a really long time to like explore all the branches 00:18:11.130 --> 00:18:12.780 and actually come to a conclusion. 00:18:12.780 --> 00:18:14.655 BRIAN YU: Yeah, the amount of time it's going 00:18:14.655 --> 00:18:16.710 to take to explore all of those possible moves, 00:18:16.710 --> 00:18:19.740 it could be quite large depending on how complex the game is. 00:18:19.740 --> 00:18:23.220 For a game like tic-tac-toe, maybe think about how many possible games 00:18:23.220 --> 00:18:24.450 of tic-tac-toe there are. 00:18:24.450 --> 00:18:30.210 It turns out there are about 255,000 possible games of tic-tac-toe, which 00:18:30.210 --> 00:18:32.190 seems like a lot, but computers are pretty fast 00:18:32.190 --> 00:18:35.460 and computers can make it through all 255,000 00:18:35.460 --> 00:18:38.790 of these possible tic-tac-toe games usually in a matter of seconds 00:18:38.790 --> 00:18:40.500 on most modern computers. 00:18:40.500 --> 00:18:43.110 But if you imagine a game like chess for example, 00:18:43.110 --> 00:18:46.870 known for being a fairly complex game, how many possible games of chess 00:18:46.870 --> 00:18:47.370 are there? 00:18:47.370 --> 00:18:51.690 If there are 255,000 possible games of tic-tac-toe, 00:18:51.690 --> 00:18:53.680 how many possible games of chess are there? 
00:18:53.680 --> 00:18:57.360 Well, it turns out that only after the first four moves, 00:18:57.360 --> 00:19:00.060 if you only look at the first four moves of chess, 00:19:00.060 --> 00:19:04.170 there are 288 billion possible chess games, which 00:19:04.170 --> 00:19:06.288 is a lot for any computer to try to deal with 00:19:06.288 --> 00:19:07.830 and that's only the first four moves. 00:19:07.830 --> 00:19:11.610 If you consider the entire game of chess, nobody knows the answer exactly, 00:19:11.610 --> 00:19:15.150 but people estimate that 10 to the 29,000 00:19:15.150 --> 00:19:18.750 is probably a lower bound on the number of possible chess games. 00:19:18.750 --> 00:19:21.090 The actual number is probably higher than that. 00:19:21.090 --> 00:19:25.080 That is far too many for any computer to be able to reasonably get through 00:19:25.080 --> 00:19:26.440 in any amount of time. 00:19:26.440 --> 00:19:28.470 Computers as they exist now are not going 00:19:28.470 --> 00:19:31.650 to be able to consider all of the possible moves going 00:19:31.650 --> 00:19:34.050 all the way to the end of the game in order to figure out 00:19:34.050 --> 00:19:35.383 what the score for any board is. 00:19:35.383 --> 00:19:38.640 Which is why a game like chess is much harder for an AI 00:19:38.640 --> 00:19:40.530 to play than a game like tic-tac-toe. 00:19:40.530 --> 00:19:41.820 But it's not impossible. 00:19:41.820 --> 00:19:43.830 And in fact, the best computer chess players 00:19:43.830 --> 00:19:47.740 are far better than the best human chess players at this point. 00:19:47.740 --> 00:19:50.670 So what could we do, what maybe improvement could we make 00:19:50.670 --> 00:19:52.710 or what change could we make to the algorithm 00:19:52.710 --> 00:19:56.730 so that our AI can still play a game like chess even though there 00:19:56.730 --> 00:19:59.057 are so many additional possible moves? 
00:19:59.057 --> 00:20:01.140 There are a lot of possible answers but any ideas, 00:20:01.140 --> 00:20:04.080 anyone want to offer a thought or a suggestion 00:20:04.080 --> 00:20:07.770 as to how we might address this problem? 00:20:07.770 --> 00:20:11.160 There are so many moves to consider, too many for a computer 00:20:11.160 --> 00:20:15.000 to ever reasonably try to attempt. 00:20:15.000 --> 00:20:18.680 Any thoughts on something we could try? 00:20:18.680 --> 00:20:19.700 Let's go to Kurt. 00:20:19.700 --> 00:20:21.320 Yeah, you have an idea? 00:20:21.320 --> 00:20:24.140 AUDIENCE: Yeah, just if there was some way along the way 00:20:24.140 --> 00:20:27.590 to maybe assign some point values where you could, below some threshold maybe 00:20:27.590 --> 00:20:29.572 you could like discard those paths? 00:20:29.572 --> 00:20:32.780 BRIAN YU: Yeah, we want some way to be able to more intelligently not explore 00:20:32.780 --> 00:20:36.140 all of the paths, but somehow like discard some of the paths 00:20:36.140 --> 00:20:38.810 or not consider some of them or stop ourselves somewhere. 00:20:38.810 --> 00:20:40.850 So that rather than consider all of the 10 00:20:40.850 --> 00:20:45.140 to the 29,000 possible games of chess, we just consider fewer of them. 00:20:45.140 --> 00:20:49.137 Exploring fewer possible games, exploring fewer possible levels 00:20:49.137 --> 00:20:51.470 and so there are many variants on the minimax algorithm. 00:20:51.470 --> 00:20:55.280 One of the most common being what's called depth-limited minimax, where 00:20:55.280 --> 00:20:59.510 in depth-limited minimax, we consider what's called an evaluation function. 00:20:59.510 --> 00:21:01.940 Rather than follow the game until the very end, 00:21:01.940 --> 00:21:04.190 we follow the game some number of moves.
00:21:04.190 --> 00:21:07.010 In chess maybe you're going 15, 16 moves, but not all the way 00:21:07.010 --> 00:21:10.520 to the end of the game and then you're just asking a question like all right, 00:21:10.520 --> 00:21:13.063 at this point in the game, who seems likely to win? 00:21:13.063 --> 00:21:15.230 Even if we're not going to calculate it all the way, 00:21:15.230 --> 00:21:17.660 you can maybe make some judgment by counting up 00:21:17.660 --> 00:21:20.750 how many pieces of what value each side happens to have. 00:21:20.750 --> 00:21:23.570 And you might come up with other strategies for different games. 00:21:23.570 --> 00:21:26.990 But this now is where we need to start being even more 00:21:26.990 --> 00:21:30.770 intelligent, to thinking about how can we come up with a good evaluation 00:21:30.770 --> 00:21:34.940 function that's able to take a complex game and estimate who 00:21:34.940 --> 00:21:38.080 is potentially likely to win. 00:21:38.080 --> 00:21:40.690 So that then is minimax, an example of an algorithm that 00:21:40.690 --> 00:21:42.580 can be used to play games. 00:21:42.580 --> 00:21:46.060 And I'll pause here for any questions about minimax or game 00:21:46.060 --> 00:21:48.400 playing artificial intelligence before we 00:21:48.400 --> 00:21:52.280 move on to an entirely different type of artificial intelligence. 00:21:52.280 --> 00:21:53.265 Yeah, Sophia, go ahead. 00:21:53.265 --> 00:21:53.890 AUDIENCE: Yeah. 00:21:53.890 --> 00:21:56.230 I just have a question, like this algorithm 00:21:56.230 --> 00:21:58.837 is kind of following like how you would think about the game, 00:21:58.837 --> 00:22:01.420 but are there any other algorithms that like don't necessarily 00:22:01.420 --> 00:22:04.715 look like step by step evaluations? 00:22:04.715 --> 00:22:07.840 BRIAN YU: Yeah, there are definitely other algorithms and other approaches. 
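[The minimax idea discussed above, applied to tic-tac-toe, which is small enough to search exhaustively, might look like the following sketch in Python. The board representation and function names here are illustrative, not CS50's distribution code.]

```python
# A minimal minimax sketch for tic-tac-toe (illustrative, not CS50's own code).
# The board is a list of 9 cells: "X", "O", or None (empty).
# "X" tries to maximize the score, "O" tries to minimize it.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move): +1 if X can force a win, -1 if O can, 0 for a draw."""
    w = winner(board)
    if w == "X":
        return 1, None
    if w == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None                      # board full with no winner: a draw
    best = None
    for m in moves:
        board[m] = player                   # try this move...
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[m] = None                     # ...then undo it and try the next
        if (best is None
                or (player == "X" and score > best[0])
                or (player == "O" and score < best[0])):
            best = (score, m)
    return best
```

[A depth-limited variant, as described in the lecture, would additionally stop recursing after some fixed number of moves and return the estimate from an evaluation function instead of searching to the end of the game.]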
00:22:07.840 --> 00:22:10.120 In fact, right now what we've given an example of 00:22:10.120 --> 00:22:12.880 is really just an example of an exhaustive search, 00:22:12.880 --> 00:22:14.980 searching for all of the possible different moves 00:22:14.980 --> 00:22:16.880 and then making a judgment based on that. 00:22:16.880 --> 00:22:20.350 We'll see some examples of more intelligent algorithms or algorithms 00:22:20.350 --> 00:22:22.043 that can learn later today in fact. 00:22:22.043 --> 00:22:24.460 And we'll take a look at some of those other possibilities 00:22:24.460 --> 00:22:25.850 and how we might apply them. 00:22:25.850 --> 00:22:29.830 And so this then is really an example of what we might call a search algorithm. 00:22:29.830 --> 00:22:32.440 A search algorithm is just a type of algorithm 00:22:32.440 --> 00:22:35.710 where we're looking for some solution or some answer. 00:22:35.710 --> 00:22:38.200 It could be looking for the best move to make in a game 00:22:38.200 --> 00:22:41.170 or you might imagine a search problem more abstractly of something 00:22:41.170 --> 00:22:43.180 like finding your way through a maze. 00:22:43.180 --> 00:22:46.270 You're trying to get from one point in a maze to another point in a maze 00:22:46.270 --> 00:22:49.690 and you have to make decisions about which way to turn and how to go. 00:22:49.690 --> 00:22:51.730 And the real world application of this might 00:22:51.730 --> 00:22:53.380 be something like driving directions. 00:22:53.380 --> 00:22:55.610 If you go into Google Maps for example. 00:22:55.610 --> 00:22:58.600 And you type where you are and where you're trying to get to, 00:22:58.600 --> 00:23:02.470 Google Maps pretty quickly can give you a fairly optimal route, some path 00:23:02.470 --> 00:23:05.630 to take which way to turn and when, when to make which decision, that 00:23:05.630 --> 00:23:07.880 will get you from one point to another. 
00:23:07.880 --> 00:23:10.150 This is an example of a search problem, Google Maps 00:23:10.150 --> 00:23:13.030 needs to somehow search through all of the possible routes you 00:23:13.030 --> 00:23:17.260 can take to figure out how to get you to the place that you're going. 00:23:17.260 --> 00:23:20.290 And so we could come up with an algorithm for trying to do that. 00:23:20.290 --> 00:23:24.490 Imagine we have a maze like this, we're trying to get from point A to point B 00:23:24.490 --> 00:23:26.440 but of course, we have some walls here, these 00:23:26.440 --> 00:23:29.050 grayed out squares are walls that we can't cross through, 00:23:29.050 --> 00:23:33.050 but we'd still like to try and find some path that will get us from point A 00:23:33.050 --> 00:23:33.550 to point B. 00:23:33.550 --> 00:23:36.490 There are a number of algorithms that you might try. 00:23:36.490 --> 00:23:39.790 And you could think about if you were solving this maze as a person, 00:23:39.790 --> 00:23:40.720 how would you do this? 00:23:40.720 --> 00:23:43.303 How would you decide what decision to make, whether to go left 00:23:43.303 --> 00:23:44.418 or whether to go right? 00:23:44.418 --> 00:23:46.210 But as far as the computer is concerned, we 00:23:46.210 --> 00:23:48.130 need to give it a very precise algorithm. 00:23:48.130 --> 00:23:50.860 And one of the simplest algorithms we might come up with 00:23:50.860 --> 00:23:53.200 is one called depth-first search. 00:23:53.200 --> 00:23:56.650 And the way that depth-first search would navigate its way through a maze 00:23:56.650 --> 00:24:00.010 is to say, the AI is going to just follow a path 00:24:00.010 --> 00:24:03.410 and if ever the AI needs to make a choice, it reaches a fork in the road, 00:24:03.410 --> 00:24:06.250 so to speak, where it could go left, or it could go right, 00:24:06.250 --> 00:24:07.875 it's just going to choose one randomly. 
00:24:07.875 --> 00:24:10.667 It doesn't know which way is better, so it's going to pick one path 00:24:10.667 --> 00:24:11.900 and it's going to try it. 00:24:11.900 --> 00:24:13.700 And if ever it hits a dead end, it reaches 00:24:13.700 --> 00:24:15.910 some wall where it can't make any more progress, 00:24:15.910 --> 00:24:20.150 then our AI is just going to back up and it's going to try another path instead. 00:24:20.150 --> 00:24:22.900 So I can show you what that looks like, we're starting at point A, 00:24:22.900 --> 00:24:26.170 we're trying to get to point B. And the way our AI might work 00:24:26.170 --> 00:24:28.935 is that we might start by just following one square after another. 00:24:28.935 --> 00:24:31.060 Initially, we don't have much choice in the matter, 00:24:31.060 --> 00:24:34.480 we haven't reached any branching points or any forks in the road. 00:24:34.480 --> 00:24:38.260 But at this point right here, now we have a decision, we could go left 00:24:38.260 --> 00:24:39.580 or we could go right. 00:24:39.580 --> 00:24:43.900 And as far as depth-first search, otherwise known as DFS is concerned, 00:24:43.900 --> 00:24:45.400 DFS doesn't know which way to go. 00:24:45.400 --> 00:24:48.400 It doesn't know whether to go left, it doesn't know whether to go right, 00:24:48.400 --> 00:24:49.720 so we pick one randomly. 00:24:49.720 --> 00:24:51.440 We might choose to go left for example. 00:24:51.440 --> 00:24:52.960 So we're going to go left and we're going to keep 00:24:52.960 --> 00:24:55.510 following the path until we get to another fork in the road. 00:24:55.510 --> 00:24:58.450 We could go left or right, DFS doesn't know which to pick, 00:24:58.450 --> 00:25:00.820 so we're going to choose randomly, maybe we'll go left 00:25:00.820 --> 00:25:02.620 and now we hit a dead end. 00:25:02.620 --> 00:25:06.470 At that point our algorithm knows this was not a good path to take. 
00:25:06.470 --> 00:25:09.640 So it's going to back up to the latest decision point 00:25:09.640 --> 00:25:11.010 and make a different choice. 00:25:11.010 --> 00:25:14.260 So it's going to back up to right here and say, all right, I tried going left, 00:25:14.260 --> 00:25:15.170 it didn't work. 00:25:15.170 --> 00:25:16.880 Let's try going right instead. 00:25:16.880 --> 00:25:19.433 So it's going to go right, it'll hit a dead end, 00:25:19.433 --> 00:25:22.600 realize that OK, this was not a good path to take, it just led to dead ends, 00:25:22.600 --> 00:25:23.810 it didn't go anywhere. 00:25:23.810 --> 00:25:26.547 So instead of going left, let's try going right. 00:25:26.547 --> 00:25:29.380 So we'll try this path, maybe we'll try going up but hit a dead end, 00:25:29.380 --> 00:25:30.880 so we'll try going to the right. 00:25:30.880 --> 00:25:33.688 Here again, we hit a decision, we're going to make some choice. 00:25:33.688 --> 00:25:35.980 And again, we're just going to repeatedly hit dead ends 00:25:35.980 --> 00:25:42.470 and try new paths until eventually we make our way to the destination. 00:25:42.470 --> 00:25:44.740 And so that is depth-first search, and as long 00:25:44.740 --> 00:25:47.500 as there are like a finite number of squares, 00:25:47.500 --> 00:25:51.530 this algorithm is eventually going to find a path to the destination 00:25:51.530 --> 00:25:52.660 if such a path exists. 00:25:52.660 --> 00:25:55.810 Like if there is a way to get from point A to point B, 00:25:55.810 --> 00:25:59.282 then this algorithm will eventually find it because it tries something 00:25:59.282 --> 00:26:01.240 and if we reach a dead end then we try it again 00:26:01.240 --> 00:26:06.190 and we keep doing that until eventually we find our way to the destination. 00:26:06.190 --> 00:26:08.290 But this algorithm isn't great. 00:26:08.290 --> 00:26:11.570 There are certainly some problems, some maybe room for improvement. 
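[The depth-first search just described — follow one path, and back up to the latest decision point on a dead end — can be sketched in Python as below. The grid maze here is a made-up example, not the one on screen in lecture.]

```python
# A sketch of depth-first search (DFS) on a small grid maze.
# "#" marks a wall; any other character is an open square.

def dfs(maze, start, goal):
    """Return some path from start to goal as a list of (row, col), or None."""
    rows, cols = len(maze), len(maze[0])
    visited = set()

    def explore(cell):
        if cell == goal:
            return [cell]
        visited.add(cell)
        r, c = cell
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:  # up, down, left, right
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and maze[nxt[0]][nxt[1]] != "#" and nxt not in visited):
                rest = explore(nxt)        # follow this path...
                if rest is not None:       # ...and back up if it dead-ends
                    return [cell] + rest
        return None                        # dead end: backtrack

    return explore(start)

# A hypothetical maze: start at A = (0, 0), goal at B = (2, 3).
MAZE = ["A.#.",
        ".#..",
        "...B"]
```

[Because each square is marked visited, the search never loops forever, so on a finite maze it finds some path whenever one exists — just not necessarily the shortest one.]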
00:26:11.570 --> 00:26:15.850 So maybe I'll ask all of you, what problems do you see with this approach? 00:26:15.850 --> 00:26:17.830 Maybe reasons why this algorithm might not 00:26:17.830 --> 00:26:22.030 be ideal, areas where we might be able to improve upon this algorithm 00:26:22.030 --> 00:26:24.280 as it stands right now? 00:26:24.280 --> 00:26:25.720 This again is depth-first search. 00:26:25.720 --> 00:26:28.390 Let's go to Joy. 00:26:28.390 --> 00:26:30.223 AUDIENCE: I think it is very time consuming. 00:26:30.223 --> 00:26:31.932 BRIAN YU: Yeah, it's very time consuming. 00:26:31.932 --> 00:26:33.820 And it's time consuming in a number of ways. 00:26:33.820 --> 00:26:37.640 In one sense, we're exploring a lot of paths that really don't lead anywhere 00:26:37.640 --> 00:26:40.730 in the sense that we explored all of this area 00:26:40.730 --> 00:26:43.550 but ultimately that didn't help us towards trying to find the goal. 00:26:43.550 --> 00:26:47.570 And it's also time consuming in the route that it ultimately finds, 00:26:47.570 --> 00:26:52.190 like if you imagine using Google Maps, for example, to navigate 00:26:52.190 --> 00:26:54.830 our way from one place to another, it's likely 00:26:54.830 --> 00:26:57.680 going to be the case that you want to find an efficient route, 00:26:57.680 --> 00:27:00.140 you want like the fastest way to get from where 00:27:00.140 --> 00:27:01.545 you are to where you want to go. 00:27:01.545 --> 00:27:04.670 And if Google Maps were to give you like a long and winding route with lots 00:27:04.670 --> 00:27:07.340 of detours that got you to the destination but took 00:27:07.340 --> 00:27:09.410 much longer than it needed to, that's probably 00:27:09.410 --> 00:27:11.510 not the best user experience for you. 00:27:11.510 --> 00:27:15.450 And depth-first search unfortunately could run into just that situation. 
00:27:15.450 --> 00:27:19.155 Imagine a maze like this, where from point A, we could go up 00:27:19.155 --> 00:27:21.530 or we could go to the right, we don't know which to take, 00:27:21.530 --> 00:27:24.270 so depth-first search will just randomly choose. 00:27:24.270 --> 00:27:27.470 We might choose to go up and maybe we'll go right 00:27:27.470 --> 00:27:30.020 and we'll find our way to the destination. 00:27:30.020 --> 00:27:33.710 Sure, depth-first search was able to find us a path from A to B, 00:27:33.710 --> 00:27:35.960 but remember that often what we want to do 00:27:35.960 --> 00:27:37.850 is we want to make the optimal decision. 00:27:37.850 --> 00:27:41.905 This is a correct path for getting from A to B, but it's not the best path. 00:27:41.905 --> 00:27:44.030 If we're looking for the shortest path from A to B, 00:27:44.030 --> 00:27:46.460 well that's going to be this one, it's going 00:27:46.460 --> 00:27:50.537 to be this path here that goes from A to B there. 00:27:50.537 --> 00:27:52.370 So we'd like some way to fix that, we'd like 00:27:52.370 --> 00:27:55.820 an algorithm that is able to find us the shortest path, 00:27:55.820 --> 00:27:57.990 not just finding us any path. 00:27:57.990 --> 00:28:01.610 And so for that, we can use an algorithm called breadth-first search, otherwise 00:28:01.610 --> 00:28:03.770 known as BFS. 00:28:03.770 --> 00:28:06.800 And the way this algorithm works is that instead of whenever we 00:28:06.800 --> 00:28:09.800 reach a fork in the road, pick one until we hit a dead end 00:28:09.800 --> 00:28:11.120 and then pick the other. 
00:28:11.120 --> 00:28:14.570 What breadth-first search is going to do is whenever we hit a fork in the road, 00:28:14.570 --> 00:28:16.820 let's take both paths, let's take one step on the left 00:28:16.820 --> 00:28:19.487 and then one step on the right and then another step on the left 00:28:19.487 --> 00:28:21.590 and another step on the right and effectively just 00:28:21.590 --> 00:28:23.683 search outwards from the starting point. 00:28:23.683 --> 00:28:26.600 We're going to start from the beginning and look for all of the places 00:28:26.600 --> 00:28:29.870 we can get to by taking one step away from A, 00:28:29.870 --> 00:28:32.870 and then everywhere we can get to taking two steps away from A, 00:28:32.870 --> 00:28:35.760 and then three steps away from A, so on and so forth. 00:28:35.760 --> 00:28:37.760 So we're always looking at the things that 00:28:37.760 --> 00:28:41.850 are nearby before we look at the things that are further away. 00:28:41.850 --> 00:28:45.548 So the way this might work is that from A we're going to take one step up 00:28:45.548 --> 00:28:47.840 and then we're going to search one square to the right. 00:28:47.840 --> 00:28:51.090 And then we'll look a second square on this direction and then a second square 00:28:51.090 --> 00:28:54.080 along the path to the right and then a third square and a third square 00:28:54.080 --> 00:28:56.810 and we're going to repeat that process, effectively alternating 00:28:56.810 --> 00:28:59.540 between all of the different options that we have 00:28:59.540 --> 00:29:03.470 until eventually we find this optimal path from A to B. 00:29:03.470 --> 00:29:06.050 And because we're looking for the shorter paths 00:29:06.050 --> 00:29:09.290 before we look for those longer paths, eventually we're 00:29:09.290 --> 00:29:12.958 going to find that shortest possible path. 00:29:12.958 --> 00:29:14.750 Now there are still problems with this too. 
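[The breadth-first search just described — explore everything one step away, then two steps, and so on — is a sketch away in Python. Using a queue (first in, first out) for the frontier is what produces that nearby-before-far-away order; again the maze is an invented example.]

```python
from collections import deque

# A sketch of breadth-first search (BFS) on a grid maze. Because the frontier
# is a queue, all squares n steps from the start are explored before any square
# n + 1 steps away, so the first path found to the goal is a shortest one.

def bfs(maze, start, goal):
    rows, cols = len(maze), len(maze[0])
    frontier = deque([start])
    came_from = {start: None}            # also serves as the visited set
    while frontier:
        cell = frontier.popleft()        # oldest first: nearby before far away
        if cell == goal:                 # reconstruct the path by walking back
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and maze[nxt[0]][nxt[1]] != "#" and nxt not in came_from):
                came_from[nxt] = cell
                frontier.append(nxt)
    return None

# A hypothetical maze with a wall in the middle; A = (0, 0), B = (2, 2).
MAZE = ["A..",
        ".#.",
        "..B"]
```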
00:29:14.750 --> 00:29:17.600 As Joy pointed out, these algorithms have a tendency 00:29:17.600 --> 00:29:20.310 to explore a lot of paths unnecessarily. 00:29:20.310 --> 00:29:22.650 Let's go back to that original maze for example 00:29:22.650 --> 00:29:24.980 and consider what would breadth-first search 00:29:24.980 --> 00:29:27.350 do if I presented it with this maze. 00:29:27.350 --> 00:29:29.480 We'll consider what might happen, we might 00:29:29.480 --> 00:29:32.250 go until we reach the first decision point right here, 00:29:32.250 --> 00:29:33.840 we could go left or right. 00:29:33.840 --> 00:29:36.260 And remember, depth-first search picked one 00:29:36.260 --> 00:29:38.060 and just followed it until a dead end. 00:29:38.060 --> 00:29:40.953 What breadth-first search is going to do is it's going to pick both. 00:29:40.953 --> 00:29:42.870 It's going to go to the left and to the right. 00:29:42.870 --> 00:29:45.412 And then to the left and to the right and basically alternate 00:29:45.412 --> 00:29:48.830 between all of those until we reach another decision point and then 00:29:48.830 --> 00:29:50.920 it's going to consider all of those possibilities. 00:29:50.920 --> 00:29:55.550 Let's go left or left here and right and consider these possibilities too 00:29:55.550 --> 00:29:57.170 and it's going to keep exploring. 00:29:57.170 --> 00:29:59.780 Any time we reach any decision point, it's 00:29:59.780 --> 00:30:05.030 going to explore this way and that way, this way and that way, 00:30:05.030 --> 00:30:06.290 over and over again. 00:30:06.290 --> 00:30:08.090 And ultimately, if we repeat this process 00:30:08.090 --> 00:30:11.360 and consider what breadth-first search is looking for, 00:30:11.360 --> 00:30:16.010 sure it's going to find me the shortest and in this case, the only possible 00:30:16.010 --> 00:30:20.870 path to get from A to B but it took such a long time to be able to do so. 
00:30:20.870 --> 00:30:22.980 It looked at so many different squares, in fact, 00:30:22.980 --> 00:30:25.340 it looked at all of the squares in order to do 00:30:25.340 --> 00:30:29.940 something as simple as find the shortest path to get from one point to another. 00:30:29.940 --> 00:30:33.800 And so here we can try to start to be a little bit more intelligent. 00:30:33.800 --> 00:30:37.520 What we'd ideally like to do is that when we reach this first decision 00:30:37.520 --> 00:30:41.570 point, go left or go right, most humans if you gave them this maze 00:30:41.570 --> 00:30:44.270 and told you to try to solve it at this decision point, 00:30:44.270 --> 00:30:48.050 you wouldn't just pick randomly, you would choose to go to the right 00:30:48.050 --> 00:30:50.927 because you know the goal is somewhere to the right 00:30:50.927 --> 00:30:52.760 and so it's probably the case that we should 00:30:52.760 --> 00:30:57.050 follow that direction if we're trying to find our way to the end of the maze. 00:30:57.050 --> 00:30:59.900 And so these algorithms we've looked at so far, 00:30:59.900 --> 00:31:02.090 depth-first search and breadth-first search 00:31:02.090 --> 00:31:05.330 are examples of what we might call uninformed search. 00:31:05.330 --> 00:31:08.060 They're not using any specialized knowledge about the problem. 00:31:08.060 --> 00:31:10.018 They don't really know anything about the maze. 00:31:10.018 --> 00:31:12.860 They're just basically blindly guessing and hoping 00:31:12.860 --> 00:31:15.680 that eventually we make our way to the solution. 00:31:15.680 --> 00:31:19.670 But in AI we can improve upon this by using an informed search. 00:31:19.670 --> 00:31:23.660 An informed search uses something that we know about the problem 00:31:23.660 --> 00:31:27.740 to try and improve the way we search, to allow us to search a little bit more 00:31:27.740 --> 00:31:28.930 effectively. 
00:31:28.930 --> 00:31:30.680 And to do so we're often going to make use 00:31:30.680 --> 00:31:34.070 of what's known as a heuristic, some way of estimating how 00:31:34.070 --> 00:31:36.740 good a particular state is going to be. 00:31:36.740 --> 00:31:40.120 So in this maze for example, if I'm trying to get from A to B, 00:31:40.120 --> 00:31:43.000 a heuristic would allow me to answer a question like, 00:31:43.000 --> 00:31:48.100 if I see point C over here and point D over there, which one of those points 00:31:48.100 --> 00:31:49.210 would I rather be at? 00:31:49.210 --> 00:31:53.710 Which one is better for trying to find our way to the destination? 00:31:53.710 --> 00:31:56.467 And between C and D, breadth-first search and depth-first search, 00:31:56.467 --> 00:31:58.300 they have no knowledge of which one of those 00:31:58.300 --> 00:32:01.330 is going to be better, as far as it's concerned every square is just 00:32:01.330 --> 00:32:04.750 a square, but if you're looking at this as a maze, 00:32:04.750 --> 00:32:07.930 most people would intuitively tell you, well, D is better. 00:32:07.930 --> 00:32:11.300 Even if I don't know exactly how to get to the destination, 00:32:11.300 --> 00:32:14.140 if I could be at either C or D, I'd probably 00:32:14.140 --> 00:32:19.340 pick D because it just looks like it's closer to that destination. 00:32:19.340 --> 00:32:21.640 And so the heuristic we could use is one that's 00:32:21.640 --> 00:32:23.800 often known as the Manhattan distance. 00:32:23.800 --> 00:32:26.410 The Manhattan distance between any two squares 00:32:26.410 --> 00:32:29.890 effectively says ignore the walls, ignore all the boundaries, 00:32:29.890 --> 00:32:34.180 just consider how many squares like in this case up and to the right 00:32:34.180 --> 00:32:38.480 would I have to go to get from where I am to the destination. 
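[The Manhattan distance described above is a one-line computation: ignore the walls and count rows plus columns between a square and the goal.]

```python
# Manhattan distance heuristic: ignore the walls entirely and just count how
# many squares up/down plus left/right separate a cell from the goal.

def manhattan(cell, goal):
    (r1, c1), (r2, c2) = cell, goal
    return abs(r1 - r2) + abs(c1 - c2)
```

[It is only an estimate — a square can score well here and still be walled off from the goal — but it is cheap to compute, which is exactly what a heuristic should be.]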
00:32:38.480 --> 00:32:41.470 And so for C, we would go up this many squares and then all the way 00:32:41.470 --> 00:32:44.290 to the right, whereas from D doesn't matter whether you go up 00:32:44.290 --> 00:32:47.560 or to the right first, but I would go right four squares and then up two. 00:32:47.560 --> 00:32:50.440 D as far as Manhattan distance is concerned 00:32:50.440 --> 00:32:56.620 is much closer to the goal than C and so I would rather be at D than at C. 00:32:56.620 --> 00:32:59.830 And as soon as we have that notion of like between any two choices, 00:32:59.830 --> 00:33:02.140 which one of those would I rather be at? 00:33:02.140 --> 00:33:05.470 That gives us a way to improve upon the random guessing 00:33:05.470 --> 00:33:06.680 that the other algorithms do. 00:33:06.680 --> 00:33:09.430 Remember the depth-first search when it reached a fork in the road 00:33:09.430 --> 00:33:12.160 and it could go left or right, it didn't know which to pick, 00:33:12.160 --> 00:33:14.110 so it just randomly picked one. 00:33:14.110 --> 00:33:17.710 Now that we have a heuristic like this, we can make an informed choice. 00:33:17.710 --> 00:33:21.070 We can say, I don't know whether left or right is the correct solution, 00:33:21.070 --> 00:33:25.180 but right is probably going to be better than left because of this heuristic. 00:33:25.180 --> 00:33:28.108 And so I can make those sorts of judgments. 00:33:28.108 --> 00:33:30.400 And so for that, we'll take a look at another algorithm 00:33:30.400 --> 00:33:33.130 known as greedy best-first search. 00:33:33.130 --> 00:33:35.470 So in greedy best-first search, what we're going to do 00:33:35.470 --> 00:33:40.030 is consider for each of the squares what its heuristic value is according 00:33:40.030 --> 00:33:42.050 to the Manhattan distance in this case. 00:33:42.050 --> 00:33:45.310 So this square for example, is one away from the goal, 00:33:45.310 --> 00:33:47.170 so we've labeled it with the number one.
00:33:47.170 --> 00:33:49.200 This square meanwhile is two away from our goal, 00:33:49.200 --> 00:33:51.742 so we're labeling it with the number two, this is three away, 00:33:51.742 --> 00:33:52.747 this is four away. 00:33:52.747 --> 00:33:54.580 You'll notice though that we're ignoring any 00:33:54.580 --> 00:33:56.998 of the walls, any of the boundaries, because those 00:33:56.998 --> 00:33:58.540 are going to be difficult to compute. 00:33:58.540 --> 00:34:00.160 We want something efficient. 00:34:00.160 --> 00:34:02.980 This square here, even though in order to get to the goal 00:34:02.980 --> 00:34:06.208 we have to follow this winding path to go around all the boundaries, 00:34:06.208 --> 00:34:07.750 we're still giving it a score of two. 00:34:07.750 --> 00:34:11.389 We want something efficient, right now, it just looks like it's two away. 00:34:11.389 --> 00:34:14.949 So it's not a perfect heuristic, the heuristic is just an estimate 00:34:14.949 --> 00:34:17.080 but we're using it as an approximation that's 00:34:17.080 --> 00:34:20.630 going to help us as we go about this search process. 00:34:20.630 --> 00:34:24.460 And so what we'll do is we'll start the same way, starting from point A looking 00:34:24.460 --> 00:34:28.150 at all of the squares we can get to until we reach our first decision 00:34:28.150 --> 00:34:29.120 point. 00:34:29.120 --> 00:34:33.040 So here we could choose to go left or we could choose to go right 00:34:33.040 --> 00:34:36.370 and in this case, going left according to the heuristic, 00:34:36.370 --> 00:34:40.239 this square is 13 away from the goal and this one over here 00:34:40.239 --> 00:34:41.900 is 11 away from the goal. 00:34:41.900 --> 00:34:44.650 So between those two, being 11 away from the goal, 00:34:44.650 --> 00:34:46.280 that sounds a whole lot better. 00:34:46.280 --> 00:34:49.969 So greedy best-first search is going to make the choice to go to the right. 
00:34:49.969 --> 00:34:53.980 So we'll go 11, we'll keep following until we reach the next decision point. 00:34:53.980 --> 00:34:57.730 Here from the eight we have two choices, up or to the right. 00:34:57.730 --> 00:35:01.090 As far as the heuristic is concerned both of these are equivalent, 00:35:01.090 --> 00:35:05.680 going up we're seven squares away, going to the right we're six plus one. 00:35:05.680 --> 00:35:07.610 That's a total of seven squares away. 00:35:07.610 --> 00:35:11.020 So here even greedy best-first search sometimes needs to pick randomly. 00:35:11.020 --> 00:35:14.830 If both squares have the same heuristic value, we just have to make a choice. 00:35:14.830 --> 00:35:16.895 And maybe we make a bad choice and hit a dead end 00:35:16.895 --> 00:35:18.520 but then we can just try the other one. 00:35:18.520 --> 00:35:21.070 So seven, six, we're going to keep following. 00:35:21.070 --> 00:35:25.660 Here we have another decision point, we can go down or we can go to the right, 00:35:25.660 --> 00:35:28.150 but the one to the right has a smaller heuristic value. 00:35:28.150 --> 00:35:29.817 It's a six instead of an eight. 00:35:29.817 --> 00:35:31.900 So we're always going to try and pick the one that 00:35:31.900 --> 00:35:33.610 looks like it's going to be closer. 00:35:33.610 --> 00:35:36.370 So we'll pick the six, we'll go to the five. 00:35:36.370 --> 00:35:38.210 And here we reach one more decision point, 00:35:38.210 --> 00:35:40.780 we can go up which has a heuristic of four 00:35:40.780 --> 00:35:43.060 and we can go down which has a heuristic of six. 00:35:43.060 --> 00:35:46.120 So even here because the four is the smaller value, 00:35:46.120 --> 00:35:48.657 we're going to go up even though you, the human can 00:35:48.657 --> 00:35:50.740 see we're ultimately going to run into a dead end, 00:35:50.740 --> 00:35:52.198 the computer doesn't know that yet. 
00:35:52.198 --> 00:35:54.880 The computer just has to judge based on the heuristic what 00:35:54.880 --> 00:35:57.280 it thinks is the best thing to do but eventually 00:35:57.280 --> 00:36:00.550 it's going to run into that wall, realize that it can't make any progress 00:36:00.550 --> 00:36:05.500 and then go back down here and will follow that path until ultimately we 00:36:05.500 --> 00:36:06.860 arrive at the solution. 00:36:06.860 --> 00:36:10.990 And so here we arrived at the same solution that breadth-first search did, 00:36:10.990 --> 00:36:13.390 but we didn't need to consider all of the squares. 00:36:13.390 --> 00:36:15.790 We could ignore all of these off to the left, 00:36:15.790 --> 00:36:18.430 we could ignore all of these down below just 00:36:18.430 --> 00:36:21.310 by being a little bit smarter about what choice we make. 00:36:21.310 --> 00:36:25.210 Instead of just choosing randomly we make an informed choice 00:36:25.210 --> 00:36:27.770 based on that heuristic. 00:36:27.770 --> 00:36:30.800 I'll pause here for any questions then about the algorithms 00:36:30.800 --> 00:36:34.760 we've looked at so far, depth-first search, breadth-first search, and now 00:36:34.760 --> 00:36:38.630 an informed algorithm, greedy best-first search. 00:36:38.630 --> 00:36:40.580 Any questions about these algorithms? 00:36:40.580 --> 00:36:45.100 Yeah, let's go to let's see, Sofia? 00:36:45.100 --> 00:36:48.710 AUDIENCE: In like the decision to go randomly 00:36:48.710 --> 00:36:53.420 is it possible to like go further and see if there's like lesser values, 00:36:53.420 --> 00:36:55.610 like go one more step? 00:36:55.610 --> 00:36:59.180 BRIAN YU: You certainly could like maybe try and look ahead a couple of steps 00:36:59.180 --> 00:37:01.040 and look at what might be following it. 00:37:01.040 --> 00:37:04.400 And that might offer numbers that would improve upon the situation. 00:37:04.400 --> 00:37:07.860 But even this algorithm is not perfect. 
00:37:07.860 --> 00:37:10.580 In fact, when you're just looking at these heuristic values 00:37:10.580 --> 00:37:13.820 and following the heuristic values, sometimes the heuristic values 00:37:13.820 --> 00:37:17.240 will lead you in a good direction, like sometimes in this case, 00:37:17.240 --> 00:37:19.280 we ultimately, we made a couple of wrong turns 00:37:19.280 --> 00:37:21.260 but ultimately we kind of found our way. 00:37:21.260 --> 00:37:23.280 But that's not always going to be the case. 00:37:23.280 --> 00:37:27.380 There are some cases where because the heuristic is really just an estimate, 00:37:27.380 --> 00:37:29.720 the heuristic can sometimes be wrong. 00:37:29.720 --> 00:37:31.940 Take a maze like this for example, if you 00:37:31.940 --> 00:37:34.280 were to follow greedy best-first search, you 00:37:34.280 --> 00:37:38.240 would go 16, 15, 14, 13, 12, at this point 00:37:38.240 --> 00:37:40.970 you could decide either the 11 or the 13. 00:37:40.970 --> 00:37:43.340 And greedy best-first search would look ahead and say, 00:37:43.340 --> 00:37:46.550 this path looks pretty good and it's going to-- and none of these values 00:37:46.550 --> 00:37:48.860 are any bigger than this 12 and so it would 00:37:48.860 --> 00:37:51.920 find this path that takes you from A to B. 00:37:51.920 --> 00:37:55.220 But even that path isn't actually the optimal path to take. 00:37:55.220 --> 00:37:58.880 The optimal path to take, the shortest path between A and B 00:37:58.880 --> 00:38:02.900 is actually this one here, which looks a little counterintuitive 00:38:02.900 --> 00:38:05.120 and the reason we didn't catch it is because it 00:38:05.120 --> 00:38:09.470 involves like going to the left first and then winding around. 
00:38:09.470 --> 00:38:11.510 And normally according to this heuristic, 00:38:11.510 --> 00:38:13.430 we usually don't want to be going to the left 00:38:13.430 --> 00:38:16.490 because we know that our goal, point B where we're trying to get to, 00:38:16.490 --> 00:38:17.560 it's off to the right. 00:38:17.560 --> 00:38:20.840 So even if you're looking at these heuristic values and looking ahead, 00:38:20.840 --> 00:38:23.120 this algorithm might not be perfect. 00:38:23.120 --> 00:38:26.505 And so we might try to improve upon this algorithm even more. 00:38:26.505 --> 00:38:28.880 And so for one final search algorithm that I'll show you, 00:38:28.880 --> 00:38:31.130 the most complex one that we've seen yet, 00:38:31.130 --> 00:38:33.860 is an algorithm known as A* search. 00:38:33.860 --> 00:38:38.210 A* search tries to build upon the ideas of greedy best-first search, which 00:38:38.210 --> 00:38:43.070 is to say we're going to use a heuristic to try and intelligently make choices, 00:38:43.070 --> 00:38:45.840 but we're going to try and solve the problem we just saw, 00:38:45.840 --> 00:38:48.800 which is that greedy best-first search isn't always optimal. 00:38:48.800 --> 00:38:51.230 When it just takes into account the heuristic, 00:38:51.230 --> 00:38:53.360 it's not always going to make the best choice 00:38:53.360 --> 00:38:57.620 because we might end up choosing a path that's ultimately longer. 00:38:57.620 --> 00:39:00.140 What A* search is going to try to realize 00:39:00.140 --> 00:39:03.230 is that if we've made it like all the way down here 00:39:03.230 --> 00:39:06.320 and now we're at a spot where we could be like 11 squares away 00:39:06.320 --> 00:39:11.960 from the goal, that's OK but honestly, being 13 squares away from the goal 00:39:11.960 --> 00:39:14.630 much sooner is probably better.
00:39:14.630 --> 00:39:17.750 So we can't just take into account the heuristic value, 00:39:17.750 --> 00:39:21.440 we should also take into account how far have we gone already. 00:39:21.440 --> 00:39:23.960 If I've already traveled many squares and I 00:39:23.960 --> 00:39:26.510 find myself pretty close to the goal, I would rather 00:39:26.510 --> 00:39:30.330 have traveled fewer squares and find myself close to the goal. 00:39:30.330 --> 00:39:33.470 And so A* search is going to try to combine these two 00:39:33.470 --> 00:39:37.190 pieces of information, combine the heuristic information that we have seen 00:39:37.190 --> 00:39:40.220 here, and also combine how many steps have 00:39:40.220 --> 00:39:45.680 I taken so far because that factors into the ultimate optimal solution too. 00:39:45.680 --> 00:39:47.682 And so here's how A* search is going to work. 00:39:47.682 --> 00:39:49.640 It's going to be the exact same idea as before, 00:39:49.640 --> 00:39:53.120 just looking at the heuristic value, but instead of only considering 00:39:53.120 --> 00:39:56.780 the heuristic value, we're going to consider taking the heuristic value 00:39:56.780 --> 00:40:00.620 and adding to it how many steps we've already taken. 00:40:00.620 --> 00:40:02.070 So we take our first step. 00:40:02.070 --> 00:40:06.740 And now we've taken one step and we're 16 squares away from the goal 00:40:06.740 --> 00:40:08.930 according to the heuristic, for a total of 17.
00:40:08.930 --> 00:40:12.740 And then we take our second step and we're 15 squares away from the goal 00:40:12.740 --> 00:40:15.740 and we take our third step and we're 14 squares away from the goal, 00:40:15.740 --> 00:40:20.570 and we keep going until we reach a decision point after five squares, 00:40:20.570 --> 00:40:23.880 we're now 12 away from the goal and now we have a choice, 00:40:23.880 --> 00:40:27.290 we could go six squares and be 11 away from the goal 00:40:27.290 --> 00:40:31.770 or we could go six squares and be 13 away from the goal, which is worse. 00:40:31.770 --> 00:40:33.720 So we're going to make the decision to go up. 00:40:33.720 --> 00:40:36.260 So for now, it doesn't seem like we've done any better, 00:40:36.260 --> 00:40:38.525 we haven't found the optimal solution just yet. 00:40:38.525 --> 00:40:40.400 But notice what will happen, eventually if we 00:40:40.400 --> 00:40:44.180 follow this path for long enough we'll end up at a point 00:40:44.180 --> 00:40:47.360 where we've made 14 steps and we're estimated 00:40:47.360 --> 00:40:51.560 to be five away from the goal, one, two, three, four, five, ignoring the walls. 00:40:51.560 --> 00:40:55.850 And now we have a choice, we could be six squares away from the goal 00:40:55.850 --> 00:41:01.860 after having walked 15 steps, so 15 plus six that's a total of 21, 00:41:01.860 --> 00:41:04.790 or this option is still available to us. 00:41:04.790 --> 00:41:08.570 We could be six squares away from the start, 00:41:08.570 --> 00:41:13.430 but be 13 away from the goal, six plus 13, that's a total of 19 00:41:13.430 --> 00:41:17.660 and that 19 is a smaller number than this 21. 00:41:17.660 --> 00:41:22.740 So this 19 is ultimately a better choice as far as A* is concerned.
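[That walkthrough of A* can be condensed into code. This is again a hedged sketch under the same assumed grid representation as before, not the lecture's own implementation: the only change from greedy best-first search is that the frontier is ordered by g + h, the number of steps taken so far plus the heuristic estimate of the distance remaining.]

```python
import heapq

def a_star(start, goal, walls, width, height):
    """A* search: expand the square minimizing g (steps taken so far)
    plus h (heuristic estimate of steps remaining)."""
    def h(cell):
        # Manhattan distance, ignoring walls.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Frontier entries: (g + h, g, cell, path so far).
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to reach each square
    while frontier:
        _, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height and nxt not in walls:
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    heapq.heappush(
                        frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists
```

[Because Manhattan distance never overestimates the true remaining distance, this version returns a genuinely shortest path, which is the optimality guarantee discussed next.]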
00:41:22.740 --> 00:41:25.850 So by combining information about how far we've traveled 00:41:25.850 --> 00:41:29.163 and how far is left to go, we can make a more intelligent choice, 00:41:29.163 --> 00:41:31.580 we can say you know what, let's go ahead and try this path 00:41:31.580 --> 00:41:35.070 and A* then will ultimately be able to find its way 00:41:35.070 --> 00:41:38.030 to get from the starting point ultimately to the goal. 00:41:38.030 --> 00:41:42.230 And this algorithm then relies upon having a good heuristic function 00:41:42.230 --> 00:41:44.110 and it just so happens that you can prove 00:41:44.110 --> 00:41:46.420 that if you pick a good heuristic function, 00:41:46.420 --> 00:41:48.070 this algorithm will be optimal. 00:41:48.070 --> 00:41:52.900 It will always find the shortest path from point A to point B 00:41:52.900 --> 00:41:56.200 and it's not going to get stuck like greedy best-first search might 00:41:56.200 --> 00:41:58.510 on a path that isn't actually optimal. 00:41:58.510 --> 00:42:00.970 And so there are many of these sorts of search algorithms 00:42:00.970 --> 00:42:04.660 that are designed to try and find intelligent ways to find 00:42:04.660 --> 00:42:09.110 a solution to some problem, to navigate its way through some landscape. 00:42:09.110 --> 00:42:11.590 And so I'll pause here for any questions then 00:42:11.590 --> 00:42:14.230 about search algorithms in general. 00:42:14.230 --> 00:42:17.470 We've taken a look at what we call classical search, navigating 00:42:17.470 --> 00:42:20.888 our way through a maze for example, or finding driving directions as well 00:42:20.888 --> 00:42:23.680 as what we might call adversarial search, where there's an opponent 00:42:23.680 --> 00:42:27.310 and you're trying to search for the best move in a game of tic-tac-toe 00:42:27.310 --> 00:42:28.592 or a game of chess. 00:42:28.592 --> 00:42:30.550 DAVID MALAN: Brian, one question from the chat.
00:42:30.550 --> 00:42:32.560 Does the assignment of these heuristic values 00:42:32.560 --> 00:42:35.632 also take a lot of time for the computer or is that automatic? 00:42:35.632 --> 00:42:37.840 BRIAN YU: Ideally you want to find a heuristic that's 00:42:37.840 --> 00:42:40.420 going to be efficient to calculate. 00:42:40.420 --> 00:42:44.170 So if the time it takes to calculate the heuristic takes a long time, then yeah, 00:42:44.170 --> 00:42:46.490 it could be the case that this would be even worse. 00:42:46.490 --> 00:42:48.657 But ideally, you want to look for a heuristic that's 00:42:48.657 --> 00:42:50.500 going to be very quick to calculate and this 00:42:50.500 --> 00:42:52.840 is why sometimes when we're looking at heuristics, 00:42:52.840 --> 00:42:56.845 we're going to ignore some of the details that make this more complex. 00:42:56.845 --> 00:42:58.720 Like when we're calculating these heuristics, 00:42:58.720 --> 00:43:02.170 we're ignoring the walls because having to deal with the walls is 00:43:02.170 --> 00:43:03.220 going to make it even-- 00:43:03.220 --> 00:43:05.800 it's going to make it take longer and longer amounts of time 00:43:05.800 --> 00:43:07.605 to be able to figure out these values. 00:43:07.605 --> 00:43:09.730 And so by ignoring the walls, we can pretty quickly 00:43:09.730 --> 00:43:12.400 calculate just like x, y coordinate wise, 00:43:12.400 --> 00:43:16.840 how many coordinate squares are we away from that destination. 00:43:16.840 --> 00:43:19.990 Also, I've been labeling all of the squares 00:43:19.990 --> 00:43:22.480 in this grid with their heuristic value, just so you can 00:43:22.480 --> 00:43:24.610 see what those heuristic values are. 
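[The kind of cheap heuristic being described, assuming squares are (x, y) coordinate pairs, is just the Manhattan distance. Ignoring the walls is what keeps it constant-time per square:]

```python
def manhattan(cell, goal):
    """Manhattan-distance heuristic: walls are ignored, so the estimate
    is just coordinate differences and takes O(1) time per square."""
    (x1, y1), (x2, y2) = cell, goal
    return abs(x1 - x2) + abs(y1 - y2)
```

[A square three columns and four rows from the goal gets a heuristic value of 7, regardless of any walls in between.]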
00:43:24.610 --> 00:43:28.600 In practice, if our search algorithm never touches a square, 00:43:28.600 --> 00:43:30.760 like it never touches any of these squares, 00:43:30.760 --> 00:43:33.500 it's not going to bother computing heuristic values for them 00:43:33.500 --> 00:43:35.500 because it's not relevant to the search process. 00:43:35.500 --> 00:43:38.200 So it'll never actually calculate the heuristic values 00:43:38.200 --> 00:43:40.420 for all of these squares, which will save time too. 00:43:40.420 --> 00:43:43.928 I have just shown them to you visually so that you can see all of the squares 00:43:43.928 --> 00:43:45.720 and what numbers would be assigned to them. 00:43:45.720 --> 00:43:47.110 But yeah, really great question. 00:43:50.550 --> 00:43:52.550 All right and, Kurt, was there another question? 00:43:57.020 --> 00:43:58.770 AUDIENCE: Yeah, I was just wondering if, I 00:43:58.770 --> 00:44:02.370 mean, the example that you showed it's using like a map 00:44:02.370 --> 00:44:04.350 to actually search through a map, so it kind of 00:44:04.350 --> 00:44:07.140 like maps directly onto sort of the concept 00:44:07.140 --> 00:44:10.090 but like could this also be used for like textual search 00:44:10.090 --> 00:44:12.840 or other kinds of searches through like different kinds of problem 00:44:12.840 --> 00:44:14.112 spaces or no? 00:44:14.112 --> 00:44:17.070 BRIAN YU: Yeah, this can be used for a lot of different problem spaces. 
00:44:17.070 --> 00:44:19.080 For text there are usually different approaches 00:44:19.080 --> 00:44:21.210 in the world of like natural language processing 00:44:21.210 --> 00:44:24.180 but you can use these kinds of search algorithms any time 00:44:24.180 --> 00:44:26.820 you have some computer which we'll usually 00:44:26.820 --> 00:44:29.490 call like an agent, something that's making decisions, that 00:44:29.490 --> 00:44:33.060 has to make certain decisions at certain points, 00:44:33.060 --> 00:44:36.060 even if those decisions aren't like geographic decisions about which 00:44:36.060 --> 00:44:40.140 way to walk, for example, as long as it's making some decision that moves it 00:44:40.140 --> 00:44:44.820 into like a different position, you can often apply these types of algorithms 00:44:44.820 --> 00:44:46.660 in order to solve problems. 00:44:46.660 --> 00:44:49.770 And so these algorithms are not just good for maze solving, 00:44:49.770 --> 00:44:52.538 they can be used in other domains too. 00:44:52.538 --> 00:44:54.330 So ultimately what I want to move on to now 00:44:54.330 --> 00:44:58.110 is less about just coming up with these algorithms that let the AI figure out 00:44:58.110 --> 00:45:00.330 exactly what to do right away, but looking 00:45:00.330 --> 00:45:03.360 at a type of artificial intelligence that you've probably 00:45:03.360 --> 00:45:04.980 heard of called machine learning. 00:45:04.980 --> 00:45:08.130 And machine learning is all about trying to take our AI 00:45:08.130 --> 00:45:11.520 and trying to get it to learn, learn somehow from data 00:45:11.520 --> 00:45:13.200 or learn from experience. 00:45:13.200 --> 00:45:15.333 Much the same way that humans might learn, 00:45:15.333 --> 00:45:17.250 that we learn from looking at our environment, 00:45:17.250 --> 00:45:20.700 we learn from our surroundings, we learn from our experiences, 00:45:20.700 --> 00:45:23.530 and we get something out of those experiences.
00:45:23.530 --> 00:45:25.860 And so one type of machine learning is known 00:45:25.860 --> 00:45:27.750 as reinforcement learning, where you learn 00:45:27.750 --> 00:45:29.820 from positive or negative rewards. 00:45:29.820 --> 00:45:32.880 The computer does something well and it gets a reward, 00:45:32.880 --> 00:45:36.030 the computer does something poorly, it gets some sort of punishment 00:45:36.030 --> 00:45:38.890 and the computer tries to learn from that. 00:45:38.890 --> 00:45:41.070 Let's imagine for example, a very simple game 00:45:41.070 --> 00:45:43.440 that we might want our computer to play, the computer 00:45:43.440 --> 00:45:45.950 is going to try to navigate its way through these tiles 00:45:45.950 --> 00:45:47.700 and it's going to try and get to the goal, 00:45:47.700 --> 00:45:51.240 similar to what we've seen before, but this time the computer 00:45:51.240 --> 00:45:53.040 doesn't know where the obstacles are. 00:45:53.040 --> 00:45:56.430 Let's imagine that there are some obstacles in its way highlighted 00:45:56.430 --> 00:46:00.120 in red here and if our computer agent, this yellow dot here, 00:46:00.120 --> 00:46:04.200 ever hits one of those obstacles, the computer loses this game. 00:46:04.200 --> 00:46:06.750 So it hits an obstacle, the computer loses but the computer 00:46:06.750 --> 00:46:08.940 doesn't know where the obstacles are. 00:46:08.940 --> 00:46:12.098 I'm showing them to you visually but the computer has no idea. 00:46:12.098 --> 00:46:14.140 How would the computer try to solve this problem? 00:46:14.140 --> 00:46:16.050 How would you try to solve the problem? 00:46:16.050 --> 00:46:19.980 Well, all it can do it first is randomly make a choice, randomly guess. 00:46:19.980 --> 00:46:24.000 Like let's choose to go to the right, OK we hit an obstacle that was no good. 00:46:24.000 --> 00:46:26.100 But the computer now can learn from experience. 
00:46:26.100 --> 00:46:30.420 As long as it knows once it hits the obstacle that was a bad outcome, well 00:46:30.420 --> 00:46:33.870 now in the future the computer can learn, I better not do that anymore, 00:46:33.870 --> 00:46:37.215 rather than go to the right, let me try something else, I'll try moving up. 00:46:37.215 --> 00:46:39.090 All right, that seemed to have worked better, 00:46:39.090 --> 00:46:41.790 let me try making another action and let's try another one, 00:46:41.790 --> 00:46:43.440 maybe I'll go down this time. 00:46:43.440 --> 00:46:46.260 OK that led to an obstacle, so the computer learns from that too. 00:46:46.260 --> 00:46:49.770 It learns that was a bad thing to try, so let's try something else. 00:46:49.770 --> 00:46:52.110 Maybe we try going up this time, that too 00:46:52.110 --> 00:46:55.697 leads to an obstacle, that was no good, so maybe we'll try going to the right. 00:46:55.697 --> 00:46:57.780 That seemed OK, maybe we'll go to the right again. 00:46:57.780 --> 00:46:59.850 All right, that led to another obstacle. 00:46:59.850 --> 00:47:02.310 And so over and over it's just making mistakes 00:47:02.310 --> 00:47:04.870 and we let the computer make those mistakes 00:47:04.870 --> 00:47:07.020 but every time it makes a mistake, the computer 00:47:07.020 --> 00:47:08.353 is learning something from that. 00:47:08.353 --> 00:47:11.820 It's learning what to do or what not to do the next time it 00:47:11.820 --> 00:47:13.660 goes through the same process. 00:47:13.660 --> 00:47:17.340 And so in the future, it can navigate its way around and eventually 00:47:17.340 --> 00:47:20.560 with enough trial and error, it can find its way to the goal. 00:47:20.560 --> 00:47:22.560 And once it's found its way to the goal, it'll 00:47:22.560 --> 00:47:27.180 remember that too, it'll know I now know exactly what sequence of actions 00:47:27.180 --> 00:47:30.190 will take me from the starting point to the goal. 
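[That trial-and-error loop can be sketched in code. This is a simplified illustration of the idea, not the demo's actual program: the agent moves randomly, records every (square, move) pair that was "punished" by hitting an obstacle, avoids those pairs on later attempts, and returns the first full sequence of squares that reaches the goal.]

```python
import random

def learn_path(start, goal, obstacles, width, height, tries=500, seed=0):
    """Trial-and-error learning: wander randomly, remember which
    (square, move) pairs hit an obstacle, and keep retrying until a
    complete path from start to goal is found."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    bad = set()  # (square, move) pairs the agent has learned to avoid
    for _ in range(tries):
        square, path = start, [start]
        for _ in range(width * height * 10):  # cap steps per attempt
            options = [(dx, dy) for dx, dy in moves
                       if (square, (dx, dy)) not in bad
                       and 0 <= square[0] + dx < width
                       and 0 <= square[1] + dy < height]
            if not options:
                break  # every known move here fails; start a new attempt
            dx, dy = rng.choice(options)
            nxt = (square[0] + dx, square[1] + dy)
            if nxt in obstacles:
                bad.add((square, (dx, dy)))  # "punishment": never repeat this
                break
            square = nxt
            path.append(nxt)
            if square == goal:
                return path  # the remembered sequence of actions
    return None
```

[Note that the first path found this way is remembered but is not necessarily the shortest one, which is exactly the limitation raised next.]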
00:47:30.190 --> 00:47:35.160 And so in the future I can just keep doing that again and again and again. 00:47:35.160 --> 00:47:37.710 So that then is an example of reinforcement learning, 00:47:37.710 --> 00:47:40.698 but even with this example here do you see any potential problems 00:47:40.698 --> 00:47:41.490 with this approach? 00:47:41.490 --> 00:47:43.560 Limitations or downsides to what we've just 00:47:43.560 --> 00:47:50.020 done here, reasons why this AI might not be perfect, any thoughts? 00:47:50.020 --> 00:47:51.565 Let's go to [? Lexleen. ?] 00:47:51.565 --> 00:47:53.940 AUDIENCE: It might not be taking the most efficient path. 00:47:53.940 --> 00:47:56.460 BRIAN YU: Yeah, absolutely because it's randomly trying, 00:47:56.460 --> 00:47:58.560 eventually it will find its way to the goal 00:47:58.560 --> 00:48:01.680 but it might not necessarily find the best path because here there 00:48:01.680 --> 00:48:04.630 was a faster path, there was a faster way to get to the goal, 00:48:04.630 --> 00:48:08.340 and that was to say once you get here go up instead 00:48:08.340 --> 00:48:10.470 and that will lead to a more efficient path. 00:48:10.470 --> 00:48:12.810 But because our AI is learning, like whenever it 00:48:12.810 --> 00:48:14.670 reaches the goal it learns to do that. 00:48:14.670 --> 00:48:17.170 When it doesn't reach the goal, it learns not to do that. 00:48:17.170 --> 00:48:22.823 Our AI doesn't have that ability to find that better path in the future. 00:48:22.823 --> 00:48:24.240 And so we could improve upon this. 00:48:24.240 --> 00:48:28.380 And this is what we call a trade off between exploring and exploiting 00:48:28.380 --> 00:48:29.520 in artificial intelligence. 00:48:29.520 --> 00:48:31.980 We want our AI to do both of these things.
00:48:31.980 --> 00:48:36.360 We want AI to exploit the knowledge it already has, it knows what to do, 00:48:36.360 --> 00:48:39.270 it knows what not to do, we want it to use that information, 00:48:39.270 --> 00:48:42.060 but we also want it to explore a little bit. 00:48:42.060 --> 00:48:44.940 We want it to sometimes try some new action 00:48:44.940 --> 00:48:47.190 that it hasn't tried before because maybe 00:48:47.190 --> 00:48:51.120 that'll be even better than the things I've already tried in the past. 00:48:51.120 --> 00:48:53.670 So so far our AI in the example just now, 00:48:53.670 --> 00:48:57.890 was only ever exploiting the knowledge it already has but it was never 00:48:57.890 --> 00:48:59.700 exploring something new. 00:48:59.700 --> 00:49:01.835 And so often when we're training AI, we want 00:49:01.835 --> 00:49:04.170 it to find some sort of nice balance between the two. 00:49:04.170 --> 00:49:07.850 We want it to make good choices, but occasionally take a risk, 00:49:07.850 --> 00:49:11.720 try something else, see if maybe we can find a better solution. 00:49:11.720 --> 00:49:15.920 And so one strategy for doing this is known as the epsilon-greedy approach 00:49:15.920 --> 00:49:19.850 to trying to solve these problems, where we assign a value, epsilon, 00:49:19.850 --> 00:49:23.240 which is equal to some proportion, some probability 00:49:23.240 --> 00:49:26.250 that our computer is just going to make a random choice. 
00:49:26.250 --> 00:49:29.360 And so we might say if we generate a random number 00:49:29.360 --> 00:49:33.440 and it's less than epsilon, which in this case happens like 10% of the time, 00:49:33.440 --> 00:49:35.600 then let's go ahead and make a random move, 00:49:35.600 --> 00:49:38.900 rather than make an intelligent, smart move just pick a move randomly 00:49:38.900 --> 00:49:41.678 and the rest of the time, 90% of the time, 00:49:41.678 --> 00:49:44.720 make the move that we know to be the best, the one with the highest value 00:49:44.720 --> 00:49:46.250 we know of so far. 00:49:46.250 --> 00:49:48.680 And this will often result in a nice balance. 00:49:48.680 --> 00:49:51.680 90% of the time our AI will make good choices 00:49:51.680 --> 00:49:54.890 that it knows to be good choices, but 10% of the time 00:49:54.890 --> 00:49:56.810 it'll make a new choice, something random 00:49:56.810 --> 00:49:58.790 and maybe it'll stumble across something bad 00:49:58.790 --> 00:50:01.130 and know to avoid that in the future, but maybe it'll 00:50:01.130 --> 00:50:06.180 come across something better and know to do that better in the future as well. 00:50:06.180 --> 00:50:08.660 And so I'll show you a real live demo of this actually. 00:50:08.660 --> 00:50:11.930 Years back, the Italian Institute of Technology 00:50:11.930 --> 00:50:15.080 was working through an example of trying to do something just like this, 00:50:15.080 --> 00:50:18.010 but before we move on I think there's a question from the chat? 00:50:18.010 --> 00:50:20.180 DAVID MALAN: Brian, someone asks, is this related 00:50:20.180 --> 00:50:22.490 to genetic algorithms, this approach? 00:50:22.490 --> 00:50:25.730 BRIAN YU: Genetic algorithms are another type of this type of learning. 00:50:25.730 --> 00:50:28.010 We're actually going to talk about genetic algorithms in just 00:50:28.010 --> 00:50:28.510 a moment. 00:50:28.510 --> 00:50:30.240 So we'll get there very shortly.
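[The epsilon-greedy rule just described fits in a few lines of Python. The value estimates below are made-up numbers purely for illustration, not anything from the demo:]

```python
import random

def epsilon_greedy(values, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection: with probability epsilon pick a
    random action (explore); otherwise pick the action whose estimated
    value is highest so far (exploit)."""
    actions = list(values)
    if rng.random() < epsilon:          # e.g. 10% of the time: explore
        return rng.choice(actions)
    return max(actions, key=values.get)  # 90% of the time: exploit

# Hypothetical value estimates the agent has learned for four moves.
estimates = {"up": 0.8, "down": 0.1, "left": 0.3, "right": 0.5}
```

[With epsilon at 0.1, roughly nine calls in ten return "up" here, and the remaining calls pick any move at random, occasionally discovering something better.]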
00:50:30.240 --> 00:50:33.290 But yes, definitely something related. 00:50:33.290 --> 00:50:35.660 So the Italian Institute of Technology a couple of years 00:50:35.660 --> 00:50:39.167 back tried to teach a robot how to flip pancakes. 00:50:39.167 --> 00:50:41.750 Something that you might have done before or seen someone else 00:50:41.750 --> 00:50:46.430 do before but in practice, it's difficult to encode in a robot 00:50:46.430 --> 00:50:47.570 exactly how to do that. 00:50:47.570 --> 00:50:51.200 Exactly what moves to make, exactly how to move its robotic muscle to be 00:50:51.200 --> 00:50:52.940 able to flip a pancake effectively. 00:50:52.940 --> 00:50:57.440 So rather than try to program every last movement into the computer, 00:50:57.440 --> 00:51:00.290 they just let the robot learn via reinforcement learning. 00:51:00.290 --> 00:51:02.960 Every time it flipped a pancake incorrectly it 00:51:02.960 --> 00:51:05.300 learned what not to do in the future and every time it 00:51:05.300 --> 00:51:08.730 flipped a pancake correctly, it learned what to do in the future. 00:51:08.730 --> 00:51:11.760 And so I'll go ahead and pull up an example of this now. 00:51:11.760 --> 00:51:15.500 And so what we're going to take a look at here is an artificial pancake. 00:51:15.500 --> 00:51:18.950 And initially, the human researcher shows 00:51:18.950 --> 00:51:20.630 the robot what success looks like. 00:51:20.630 --> 00:51:22.940 The robot needs to know what is success and what 00:51:22.940 --> 00:51:24.980 is failure so the human demonstrates. 00:51:24.980 --> 00:51:27.620 Here is what it looks like to actually do the pancake flipping 00:51:27.620 --> 00:51:30.050 by demonstrating exactly what motion to make 00:51:30.050 --> 00:51:32.390 and what it feels like to successfully flip a pancake 00:51:32.390 --> 00:51:33.860 and then we let the robot try it. 00:51:36.722 --> 00:51:38.930 All right, that was his first try, not too successful.
00:51:38.930 --> 00:51:40.270 We'll try it again. 00:51:40.270 --> 00:51:43.607 Here's after three tries. 00:51:43.607 --> 00:51:44.940 OK, not great but it's learning. 00:51:44.940 --> 00:51:48.023 Every time it does something wrong it learns what not to do in the future. 00:51:48.023 --> 00:51:50.280 Here's five tries, here's 10 tries, not great. 00:51:50.280 --> 00:51:59.580 And now after 15 tries or so, or 11 tries I guess, we'll see what happens. 00:51:59.580 --> 00:52:00.630 OK so not great, right? 00:52:00.630 --> 00:52:04.980 It takes a while to learn these kinds of techniques to learn, here's 15 tries, 00:52:04.980 --> 00:52:08.950 all right but finally after enough tries once you do enough practice with this. 00:52:08.950 --> 00:52:11.370 Here's after 50 tries and we now actually 00:52:11.370 --> 00:52:14.940 have a robot that has learned to successfully perform this task, 00:52:14.940 --> 00:52:17.615 not because human programmers told it exactly how to do so, 00:52:17.615 --> 00:52:20.490 but because it learned from that failure and learned from experience. 00:52:20.490 --> 00:52:23.400 And once it knows how, it knows exactly what to do in the future. 00:52:23.400 --> 00:52:27.090 It can continually replicate that behavior over and over again 00:52:27.090 --> 00:52:30.385 once it's trained to be able to perform that task. 00:52:33.450 --> 00:52:39.580 So that then is how you might take advantage of this idea of reinforcement 00:52:39.580 --> 00:52:42.250 learning but someone in the chat brought up another approach 00:52:42.250 --> 00:52:45.850 to this type of thing known as genetic algorithms or genetic learning.
00:52:45.850 --> 00:52:48.070 And this is where a lot of machine learning 00:52:48.070 --> 00:52:52.600 has taken inspiration from the human brain, how humans learn 00:52:52.600 --> 00:52:55.180 and how nature works and because nature has managed 00:52:55.180 --> 00:52:57.730 to evolve intelligent beings, why couldn't we 00:52:57.730 --> 00:53:00.320 try to do the same thing in a computer as well? 00:53:00.320 --> 00:53:04.300 And so in nature, you have generations of populations that evolve over time, 00:53:04.300 --> 00:53:06.790 where the most fit organisms survive and are 00:53:06.790 --> 00:53:10.960 able to evolve and mutate and change over time to become better and better 00:53:10.960 --> 00:53:12.800 at adapting to their environment. 00:53:12.800 --> 00:53:15.460 And so one strategy, one approach to trying 00:53:15.460 --> 00:53:20.830 to build intelligent machines is rather than program one algorithm that 00:53:20.830 --> 00:53:23.020 is really good at performing a task, let's 00:53:23.020 --> 00:53:27.190 just create a lot of algorithms that are really bad at performing the task 00:53:27.190 --> 00:53:29.120 because that's much easier to do. 00:53:29.120 --> 00:53:32.170 But we'll let them evolve, we'll let them try some task 00:53:32.170 --> 00:53:35.540 and after they do it, they won't do very well 00:53:35.540 --> 00:53:37.570 but we'll take the ones that did the best 00:53:37.570 --> 00:53:40.450 and replicate them and mutate them and let them try again. 00:53:40.450 --> 00:53:42.940 And then repeat this generation after generation, 00:53:42.940 --> 00:53:46.090 eliminating all of the computer programs that don't perform well 00:53:46.090 --> 00:53:48.732 but duplicating and mutating the ones that do. 00:53:48.732 --> 00:53:51.440 The pseudocode for which might look a little something like this. 
00:53:51.440 --> 00:53:55.030 We're going to start by making an initial generation of candidates 00:53:55.030 --> 00:53:59.170 randomly, where each candidate is some program designed to try and solve 00:53:59.170 --> 00:54:01.705 some task but rather than program it intelligently 00:54:01.705 --> 00:54:03.580 the way you've been doing all semester, we're 00:54:03.580 --> 00:54:06.880 just going to program them randomly to just make random choices 00:54:06.880 --> 00:54:10.690 and we're going to repeat this process until eventually they're successful. 00:54:10.690 --> 00:54:14.050 We're going to for each one of these candidates calculate its fitness, 00:54:14.050 --> 00:54:18.767 calculate how good is it at performing the task that it's designed to do. 00:54:18.767 --> 00:54:21.100 And then we're going to remove the least fit candidates. 00:54:21.100 --> 00:54:23.330 Maybe take the half of them that didn't do very well, 00:54:23.330 --> 00:54:26.750 eliminate those but take the ones that did do well 00:54:26.750 --> 00:54:30.610 and make a new generation from the remaining candidates, duplicate them, 00:54:30.610 --> 00:54:33.430 make a few mutations to them randomly just to change things up 00:54:33.430 --> 00:54:37.810 and to see how things work in order to try to create a better generation. 00:54:37.810 --> 00:54:41.920 And over time as we repeat this process, in much the same way that evolution 00:54:41.920 --> 00:54:44.710 is able to continually produce better and better organisms that 00:54:44.710 --> 00:54:48.490 are more and more fit, we can do the same thing with our algorithms 00:54:48.490 --> 00:54:51.280 too, producing better and better algorithms over time. 00:54:51.280 --> 00:54:53.470 I have another demo to show you of this. 
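[That pseudocode can be turned into a runnable sketch. The candidates here are just lists of bits and the fitness function simply counts 1s (a standard toy problem, not the lecture's cars), but the loop of scoring fitness, removing the least fit, and duplicating and mutating the survivors follows the generations idea just described.]

```python
import random

def evolve(fitness, length, pop_size=20, generations=100, seed=0):
    """Genetic-algorithm sketch: make random initial candidates, then
    repeatedly score fitness, drop the least fit half, and refill the
    population with mutated copies of the survivors."""
    rng = random.Random(seed)
    # Each candidate is a list of random bits standing in for a program's choices.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]  # remove the least fit candidates
        children = []
        for parent in survivors:                # duplicate each survivor...
            child = parent[:]
            i = rng.randrange(length)
            child[i] = 1 - child[i]             # ...with one random mutation
            children.append(child)
        population = survivors + children       # the next generation
    return max(population, key=fitness)

# Toy fitness: count the 1 bits; evolution should drive this toward all 1s.
best = evolve(fitness=sum, length=12)
```

[Because the best candidates are kept unmutated, fitness never decreases generation to generation here, though individual mutated children can be worse, just like the cars in the demo.]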
00:54:53.470 --> 00:54:57.910 Here is a simulation of some self-driving cars that 00:54:57.910 --> 00:55:02.230 have not been programmed how to drive but are starting out driving entirely 00:55:02.230 --> 00:55:04.450 randomly. Each of these rectangles you see 00:55:04.450 --> 00:55:07.150 is one example of a virtual self-driving car. 00:55:07.150 --> 00:55:09.940 Each of the little crosshairs you see, the little Xs, 00:55:09.940 --> 00:55:11.960 are sensors that the car has access to. 00:55:11.960 --> 00:55:16.450 So the car has access to data about how far away in any given direction 00:55:16.450 --> 00:55:19.390 particular obstacles are and what these cars are trying 00:55:19.390 --> 00:55:23.500 to learn is how to drive through some sort of environment 00:55:23.500 --> 00:55:25.248 while not crashing into anything. 00:55:25.248 --> 00:55:27.040 And initially they didn't do very well, you 00:55:27.040 --> 00:55:29.650 notice they were crashing almost immediately but now we're 00:55:29.650 --> 00:55:32.158 about six generations in, seven generations in, 00:55:32.158 --> 00:55:33.950 they're starting to do a little bit better. 00:55:33.950 --> 00:55:38.178 They're starting to get a sense for like turning when you run into a wall. 00:55:38.178 --> 00:55:39.970 Even when they get to new environments they 00:55:39.970 --> 00:55:43.137 haven't necessarily seen before, they're starting to do a little bit better. 00:55:43.137 --> 00:55:45.095 But of course, they're still not great, they're 00:55:45.095 --> 00:55:46.450 still crashing quite frequently. 00:55:46.450 --> 00:55:50.470 Sometimes a generation does even worse than the generation before it, 00:55:50.470 --> 00:55:52.720 because it's not always going to be the case that when 00:55:52.720 --> 00:55:55.780 you make random mutations those mutations are going to be 00:55:55.780 --> 00:55:56.840 better. 00:55:56.840 --> 00:56:00.020 Sometimes the mutations actually end up being a little bit worse.
00:56:00.020 --> 00:56:02.860 And so they might move backwards a little bit. 00:56:02.860 --> 00:56:05.320 But over time, generation after generation, 00:56:05.320 --> 00:56:09.490 you're hopefully seeing that now 10 generations in, these cars in general 00:56:09.490 --> 00:56:11.890 are doing a whole lot better than they were before 00:56:11.890 --> 00:56:15.250 and every generation we're taking whichever cars made it the furthest, 00:56:15.250 --> 00:56:19.600 duplicating those, eliminating the rest, and moving forward. 00:56:19.600 --> 00:56:21.520 Is there a question in the chat? 00:56:21.520 --> 00:56:23.770 DAVID MALAN: Yeah, with regard to the pancakes, 00:56:23.770 --> 00:56:27.220 how did the robot know what was wrong with the previous pancake flips? 00:56:27.220 --> 00:56:30.550 In the case of the cars how does the car know that it erred? 00:56:30.550 --> 00:56:33.800 BRIAN YU: Yeah, so whenever you're doing anything with reinforcement learning, 00:56:33.800 --> 00:56:36.040 whether it's the pancakes or these robots here, 00:56:36.040 --> 00:56:39.910 the programmer still needs to tell the AI what does success look like 00:56:39.910 --> 00:56:41.540 and what does failure look like. 00:56:41.540 --> 00:56:45.550 So in the pancake example, we trained the pancake flipper 00:56:45.550 --> 00:56:49.795 to be able to know that when you're flipping this pancake when it's first 00:56:49.795 --> 00:56:51.920 taught what does it look like when it's successful. 00:56:51.920 --> 00:56:53.837 So it gets a feel for that and it was probably 00:56:53.837 --> 00:56:57.470 also told that if the pancake falls, that's not a good thing for example. 00:56:57.470 --> 00:57:00.770 And that's not something that the AI should try to do. 
00:57:00.770 --> 00:57:02.740 And in this case, I assume these cars have 00:57:02.740 --> 00:57:06.580 been programmed to be told that if you crash into something, that is not 00:57:06.580 --> 00:57:10.030 good, which the car presumably can detect after it's crashed into something. 00:57:10.030 --> 00:57:14.042 And so it probably also has some sense of how far it was able to drive, 00:57:14.042 --> 00:57:16.750 such that we could duplicate the ones that did drive the furthest 00:57:16.750 --> 00:57:19.840 and not the ones that didn't. And let's see how this car does; 00:57:19.840 --> 00:57:22.330 it was so close to the end, but didn't quite make it. 00:57:22.330 --> 00:57:27.030 Maybe give it one more generation and we'll see how this generation does. 00:57:27.030 --> 00:57:29.075 And all right, so it's navigating these turns; 00:57:29.075 --> 00:57:31.200 it looks like a whole bunch didn't make it, but as long 00:57:31.200 --> 00:57:33.000 as we get one successful one, we can learn 00:57:33.000 --> 00:57:35.020 from that successful one in the future. 00:57:35.020 --> 00:57:39.930 Here's the ending; it looks like most crashed, but all right, it 00:57:39.930 --> 00:57:42.990 looks like one car was finally able to make it to the end 00:57:42.990 --> 00:57:46.680 and was able to successfully learn that task of how to navigate 00:57:46.680 --> 00:57:48.540 through this environment of mazes. 00:57:48.540 --> 00:57:52.500 So I'll pause here for questions, then, about reinforcement learning 00:57:52.500 --> 00:57:56.520 and how these ideas might have worked. 00:57:56.520 --> 00:57:58.660 And, Samuel, did you have a question? 00:57:58.660 --> 00:58:00.210 AUDIENCE: Yes. 00:58:00.210 --> 00:58:05.490 So with the genetic algorithm, the car one specifically, 00:58:05.490 --> 00:58:12.300 are all the cars learning from each other as each one crashes?
00:58:12.300 --> 00:58:15.240 BRIAN YU: It's not so much that the cars are learning from each other 00:58:15.240 --> 00:58:18.510 as that we're generating new cars based on the cars that 00:58:18.510 --> 00:58:20.160 were previously successful. 00:58:20.160 --> 00:58:22.890 The idea being that if we run 10 cars and see 00:58:22.890 --> 00:58:25.890 how far they go, we take the five that went the furthest, 00:58:25.890 --> 00:58:28.593 we eliminate the other five that didn't make it very far, 00:58:28.593 --> 00:58:30.510 and then we duplicate the ones that 00:58:30.510 --> 00:58:34.260 did do well, such that in the future, hopefully this new generation of cars 00:58:34.260 --> 00:58:35.790 will make it a little bit further. 00:58:35.790 --> 00:58:37.628 And generation after generation, the hope 00:58:37.628 --> 00:58:40.170 is that we're able to make it a little further along the road, 00:58:40.170 --> 00:58:42.900 until eventually, as you saw, 15 generations in, 00:58:42.900 --> 00:58:47.290 we were able to get some cars that could perform the task successfully. 00:58:47.290 --> 00:58:48.990 Let's go now to Josiah. 00:58:48.990 --> 00:58:56.790 AUDIENCE: Is the car specifically learning just from the track? 00:58:56.790 --> 00:59:03.030 I mean, when you change the track, do we need another, I mean, 00:59:03.030 --> 00:59:06.063 do we need to start from zero again or? 00:59:06.063 --> 00:59:09.230 BRIAN YU: Yeah, so if the track were different, would it have to learn again? 00:59:09.230 --> 00:59:10.130 Hopefully not. 00:59:10.130 --> 00:59:12.620 Hopefully what the cars were learning 00:59:12.620 --> 00:59:14.880 in this case was based on sensor data, 00:59:14.880 --> 00:59:18.380 like how far away a wall is in any direction, and which way to turn. 00:59:18.380 --> 00:59:20.990 And the goal is for this type of approach to generalize.
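The selection process Brian describes, run a generation of cars, keep the half that drove the furthest, duplicate them with random mutations, and repeat, can be sketched in Python. Everything here is a toy stand-in: the genome of five steering parameters and the `fitness` function that simulates "distance driven" are invented for illustration.

```python
import random

random.seed(0)  # reproducible toy run

def fitness(genome):
    # Made-up stand-in for "how far the car drove": the closer each
    # parameter is to an (invented) ideal value of 0.5, the further it gets.
    return -sum((g - 0.5) ** 2 for g in genome)

def evolve(population, generations=30, mutation_rate=0.1):
    for _ in range(generations):
        # Rank the cars by how far they made it
        population.sort(key=fitness, reverse=True)
        survivors = population[:len(population) // 2]  # keep the best half
        children = []
        for parent in survivors:
            # Duplicate each survivor, randomly mutating some parameters
            child = [g + random.uniform(-0.1, 0.1)
                     if random.random() < mutation_rate else g
                     for g in parent]
            children.append(child)
        population = survivors + children  # eliminate the rest
    return max(population, key=fitness)

# Start with cars that drive entirely randomly
population = [[random.random() for _ in range(5)] for _ in range(10)]
initial_best = max(population, key=fitness)
best = evolve(population)
```

Because this sketch keeps each generation's best car unchanged, its fitness never moves backwards; in the demo, where mutations touch more of the population, a generation can come out worse than the one before it, just as Brian noted.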
00:59:20.990 --> 00:59:24.260 And I mean, hopefully actual self-driving cars in the real world 00:59:24.260 --> 00:59:25.640 are not trained this way. 00:59:25.640 --> 00:59:28.610 But you would hope that as they're trained on sample environments, 00:59:28.610 --> 00:59:30.800 that when you put them in a different environment 00:59:30.800 --> 00:59:32.840 they would be able to generalize their knowledge 00:59:32.840 --> 00:59:34.490 to be able to apply similar techniques. 00:59:34.490 --> 00:59:37.790 That if in a real world setting you take a car and put it on a road 00:59:37.790 --> 00:59:39.650 that it's never seen before, hopefully it's 00:59:39.650 --> 00:59:42.620 seen enough examples of things that are like it, sensor data 00:59:42.620 --> 00:59:46.630 that it recognizes, that it's able to make an intelligent decision based 00:59:46.630 --> 00:59:47.130 on that. 00:59:47.130 --> 00:59:49.760 So yeah, the hope is and the goal in many cases with AI 00:59:49.760 --> 00:59:52.910 is to be able to generalize this knowledge beyond just 00:59:52.910 --> 00:59:53.765 a specific example. 00:59:56.480 --> 00:59:58.360 Any other questions? 00:59:58.360 --> 01:00:02.730 Yeah, let's go to [? Yagonch. ?] 01:00:02.730 --> 01:00:07.380 AUDIENCE: Yeah, so can these algorithms ever reach like a bottleneck, 01:00:07.380 --> 01:00:11.730 just like in real life evolution, there are branches 01:00:11.730 --> 01:00:14.550 and some branches just reach a bottleneck. 01:00:14.550 --> 01:00:16.450 So is that possible here [INAUDIBLE]? 01:00:16.450 --> 01:00:19.460 BRIAN YU: Yes, it's definitely possible that there might be-- 01:00:19.460 --> 01:00:22.800 the algorithms might end up converging to something that seems pretty good 01:00:22.800 --> 01:00:25.410 and it doesn't seem like any mutations are doing any better, 01:00:25.410 --> 01:00:27.540 but it turns out a totally different algorithm 01:00:27.540 --> 01:00:29.100 might actually do better than that. 
01:00:29.100 --> 01:00:31.230 That's what we will often call a local maximum 01:00:31.230 --> 01:00:33.060 in the context of artificial intelligence, 01:00:33.060 --> 01:00:36.420 where there is some better approach, there is some better algorithm 01:00:36.420 --> 01:00:37.990 but we can't necessarily find it. 01:00:37.990 --> 01:00:40.840 There are other strategies for trying to deal with that problem. 01:00:40.840 --> 01:00:43.863 But it is definitely a challenge we're thinking about. 01:00:43.863 --> 01:00:44.780 And one more question. 01:00:44.780 --> 01:00:45.783 Let's go to Sophia. 01:00:45.783 --> 01:00:48.450 AUDIENCE: I have a question about how the fitness is calculated. 01:00:48.450 --> 01:00:51.990 Like in both of these cases, is it like [INAUDIBLE] motors that 01:00:51.990 --> 01:00:54.872 are running or here like the distance? 01:00:54.872 --> 01:00:57.330 BRIAN YU: Yeah, in the case of the pancake, it was probably 01:00:57.330 --> 01:01:00.030 like a binary of just like did the pancake end up 01:01:00.030 --> 01:01:02.520 landing in the pan or not, was our way of calculating 01:01:02.520 --> 01:01:04.730 the fitness of that particular example. 01:01:04.730 --> 01:01:07.230 In the case of the cars, we would probably calculate fitness 01:01:07.230 --> 01:01:11.190 based on distance traveled, whichever cars ended up traveling the furthest, 01:01:11.190 --> 01:01:13.380 that would be our definition of fitness and that's 01:01:13.380 --> 01:01:15.690 really the part the programmer needs to be involved in. 01:01:15.690 --> 01:01:17.610 The programmer needs to decide, what does it 01:01:17.610 --> 01:01:21.720 mean to be most fit, what does a bad example look like and using that, 01:01:21.720 --> 01:01:25.320 once the computer knows what success looks like and what failure looks like, 01:01:25.320 --> 01:01:28.620 then it's able to learn from that to be able to do better in the future. 
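The local-maximum idea can be shown with a minimal hill-climbing sketch in Python. The two-peak `landscape` function is invented for this example: an algorithm that only ever steps uphill gets stuck on the smaller peak if it starts near it, even though a taller peak exists elsewhere.

```python
def landscape(x):
    # An invented fitness landscape: a small peak at x=2 (height 1)
    # and a taller global peak at x=8 (height 5).
    if x < 5:
        return -(x - 2) ** 2 + 1
    return -(x - 8) ** 2 + 5

def hill_climb(x, step=0.1):
    # Keep stepping uphill until neither neighbor is better.
    while True:
        neighbor = max(x - step, x + step, key=landscape)
        if landscape(neighbor) <= landscape(x):
            return x
        x = neighbor

stuck = hill_climb(0.0)   # climbs to the smaller peak near x=2 and stops
better = hill_climb(6.0)  # a different start reaches the taller peak near x=8
```

Techniques like random restarts (trying several starting points, as the second call does) are among the strategies for dealing with this problem that Brian alludes to.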
01:01:31.300 --> 01:01:33.160 All right, so that was genetic algorithms, 01:01:33.160 --> 01:01:38.340 which is one example of computers that are able to learn from the way 01:01:38.340 --> 01:01:41.428 that humans learn, learning from the way that nature works for example, 01:01:41.428 --> 01:01:43.470 but computer scientists didn't really stop there. 01:01:43.470 --> 01:01:46.870 There are other examples that we can do to add to this as well. 01:01:46.870 --> 01:01:50.850 And so one other example of using this idea of reinforcement learning 01:01:50.850 --> 01:01:55.140 and using genetic algorithms might be in video recommendations for example, 01:01:55.140 --> 01:01:58.350 where you could have some watch history and the way 01:01:58.350 --> 01:02:01.590 that an algorithm like YouTube or Netflix might suggest videos for you 01:02:01.590 --> 01:02:04.320 to watch is via this reinforcement process, 01:02:04.320 --> 01:02:08.247 that it will just try suggesting videos to you and learn from experience. 01:02:08.247 --> 01:02:10.830 If it suggests a video to you that you like, that you click on 01:02:10.830 --> 01:02:12.538 that, you watch all the way through, well 01:02:12.538 --> 01:02:14.890 then the algorithm learns in the future, let's recommend 01:02:14.890 --> 01:02:16.140 more of those sorts of things. 01:02:16.140 --> 01:02:18.690 That's like the car traveling very far and so we 01:02:18.690 --> 01:02:20.520 learn to do more of that in the future. 01:02:20.520 --> 01:02:22.937 If they recommend a video to you and you don't click on it 01:02:22.937 --> 01:02:25.585 you don't ever watch it, well then it's probably not 01:02:25.585 --> 01:02:27.460 going to recommend that to you in the future. 01:02:27.460 --> 01:02:31.390 And it's probably going to learn from that experience as well. 01:02:31.390 --> 01:02:35.130 And so this is one example of computer science learning from nature, 01:02:35.130 --> 01:02:36.780 learning from the way that humans are. 
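The click/no-click feedback loop described here can be sketched as a bare-bones score-keeping recommender in Python. The category names and the one-point reward and penalty are assumptions made up for this sketch, not how YouTube's or Netflix's actual systems work.

```python
def update_score(scores, category, clicked):
    # Reinforce what the user engages with; back off what they ignore.
    change = 1 if clicked else -1
    scores[category] = scores.get(category, 0) + change

def recommend(scores):
    # Suggest whichever category has earned the highest score so far.
    return max(scores, key=scores.get)

scores = {}
update_score(scores, "cooking", clicked=True)   # watched all the way through
update_score(scores, "cooking", clicked=True)
update_score(scores, "golf", clicked=False)     # never clicked
```

After these three interactions, `recommend(scores)` would suggest more cooking videos, mirroring the learn-from-experience loop in the text.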
01:02:36.780 --> 01:02:40.080 Another place that's happened too is by looking at the human brain. 01:02:40.080 --> 01:02:43.410 The human brain consists of neurons, and those neurons are connected 01:02:43.410 --> 01:02:45.960 to one another and pass information from one to another; 01:02:45.960 --> 01:02:49.380 electrical signals flow through one neuron to another neuron, 01:02:49.380 --> 01:02:51.960 and that's how brains are able to make these very 01:02:51.960 --> 01:02:54.700 sophisticated kinds of computations. 01:02:54.700 --> 01:02:58.710 And this is what we might call a neural network, a collection of these neurons. 01:02:58.710 --> 01:03:01.920 And one area that AI has been looking into, one especially gaining 01:03:01.920 --> 01:03:05.760 in popularity recently, has been trying to develop artificial neural networks. 01:03:05.760 --> 01:03:08.700 Instead of using a biological neuron, we just 01:03:08.700 --> 01:03:11.850 use what we might call an artificial neuron or a unit. 01:03:11.850 --> 01:03:14.880 And you can think of this unit as just storing some value, 01:03:14.880 --> 01:03:18.360 like some electrical signal, for example, like you might find in the human brain, 01:03:18.360 --> 01:03:21.960 now just in a digital format. And these artificial neurons, 01:03:21.960 --> 01:03:24.360 these units, can be connected to each other 01:03:24.360 --> 01:03:28.140 in a sort of input-translates-into-output format, 01:03:28.140 --> 01:03:32.910 where this arrow represents some sort of calculation, some calculation that 01:03:32.910 --> 01:03:36.000 is taking this value on the left and transforming it 01:03:36.000 --> 01:03:38.010 into that value on the right. 01:03:38.010 --> 01:03:42.180 And neural networks are usually not just one input unit and one output unit, 01:03:42.180 --> 01:03:43.410 but they can be more complex.
01:03:43.410 --> 01:03:46.470 You might have a neural network with two different inputs, each of which 01:03:46.470 --> 01:03:49.350 connects to an output or even more sophisticated neural networks 01:03:49.350 --> 01:03:53.070 like this one, where you have multiple layers of these units that are all 01:03:53.070 --> 01:03:55.710 connected to each other, each of these arrows performing 01:03:55.710 --> 01:03:57.630 some sort of calculation and if you've ever 01:03:57.630 --> 01:04:00.810 heard the term deep learning, this is often what that's referring to, 01:04:00.810 --> 01:04:04.650 this idea of deep neural networks with many layers, each of which 01:04:04.650 --> 01:04:06.140 is performing calculations. 01:04:06.140 --> 01:04:09.060 And ultimately using some linear algebra, 01:04:09.060 --> 01:04:11.160 these neural networks are able to figure out 01:04:11.160 --> 01:04:16.170 exactly how to tune these calculations to translate input into some output. 01:04:16.170 --> 01:04:19.560 If you give a neural network enough data it can learn from that data, 01:04:19.560 --> 01:04:22.950 it can learn exactly how to set these various different arrows 01:04:22.950 --> 01:04:25.890 to be able to calculate some task, to be able to translate 01:04:25.890 --> 01:04:28.690 some input into some output. 01:04:28.690 --> 01:04:32.640 So for example, that might take the form of handwriting recognition. 01:04:32.640 --> 01:04:35.910 How exactly do we train computers to be able to learn 01:04:35.910 --> 01:04:39.390 how to recognize handwriting given all the different types of handwriting? 01:04:39.390 --> 01:04:42.390 Well, one way to try to do that is by using a neural network, 01:04:42.390 --> 01:04:45.930 setting up a network of all of these neurons and all of these connections 01:04:45.930 --> 01:04:49.290 and then you give to that neural network a whole bunch of data. 
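The idea that every arrow performs some calculation can be made concrete with a tiny forward pass in Python. The weights, biases, and the ReLU activation below are arbitrary values chosen for the sketch, not a trained network; real training would tune them from data.

```python
def forward(inputs, layers):
    # Each layer is a list of units; each unit is a (weights, bias) pair.
    # Every "arrow" in the diagram corresponds to one weight here.
    values = inputs
    for layer in layers:
        values = [
            # Weighted sum plus bias, passed through a ReLU activation
            max(0.0, sum(w * v for w, v in zip(weights, values)) + bias)
            for weights, bias in layer
        ]
    return values

# An invented network: 2 inputs -> 2 hidden units -> 1 output
layers = [
    [([1.0, -1.0], 0.0), ([0.5, 0.5], 0.0)],  # hidden layer
    [([1.0, 2.0], 0.0)],                      # output layer
]
output = forward([3.0, 1.0], layers)
```

Each layer is just a matrix-vector multiply plus a nonlinearity, which is the linear algebra the lecture mentions; "deep learning" stacks many such layers.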
01:04:49.290 --> 01:04:51.090 I give to the neural network a whole bunch 01:04:51.090 --> 01:04:54.970 of already existing handwritten digits, each of which is labeled. 01:04:54.970 --> 01:04:57.853 So the computer knows which image corresponds to which digit. 01:04:57.853 --> 01:05:00.270 This data set you're looking at here is a very famous data 01:05:00.270 --> 01:05:03.720 set of handwritten digits called the MNIST data set, 01:05:03.720 --> 01:05:08.130 and given all of this data the computer can start to train the neural network 01:05:08.130 --> 01:05:10.740 and start to figure out exactly what calculations 01:05:10.740 --> 01:05:13.110 to run at each layer of the neural network 01:05:13.110 --> 01:05:16.020 to be able to translate the input into the output, 01:05:16.020 --> 01:05:19.830 to be able to translate like a screenshot of what looks 01:05:19.830 --> 01:05:23.070 like a handwritten number eight, that we all could tell is the number eight 01:05:23.070 --> 01:05:26.130 but might be difficult for us to describe exactly how a computer could 01:05:26.130 --> 01:05:28.260 figure that out, but via the neural network, 01:05:28.260 --> 01:05:30.810 by training the neural network on all of the sample data, 01:05:30.810 --> 01:05:33.900 it's able to learn some sort of function that 01:05:33.900 --> 01:05:36.780 can translate this input, this handwritten digit 01:05:36.780 --> 01:05:39.850 into the output, what the actual digit happens to be. 01:05:39.850 --> 01:05:42.570 And these neural networks have proved to be incredibly versatile 01:05:42.570 --> 01:05:44.430 and we won't get into all the math here because it 01:05:44.430 --> 01:05:47.305 does get a little bit more complex, but it's used all over the place. 
01:05:47.305 --> 01:05:50.290 It can be used for email spam detection, where 01:05:50.290 --> 01:05:53.370 if you give the computer a whole bunch of data, a whole bunch of emails, 01:05:53.370 --> 01:05:56.040 some of which are labeled spam and some of which are not, 01:05:56.040 --> 01:05:57.390 it can learn a function. 01:05:57.390 --> 01:06:01.200 It can learn exactly how to tune that neural network to be able to predict, 01:06:01.200 --> 01:06:04.510 for any given email, whether it's spam or not. 01:06:04.510 --> 01:06:07.260 And really the key factor here to making these networks work 01:06:07.260 --> 01:06:09.520 is having access to large amounts of data. 01:06:09.520 --> 01:06:12.540 And this is why a lot of companies now are doing a lot of work 01:06:12.540 --> 01:06:15.330 and trying to get a lot of data and use that data in training 01:06:15.330 --> 01:06:18.940 their machine learning models, it's because the more data that you have, 01:06:18.940 --> 01:06:22.650 the better you can make these algorithms because the better you can tune them 01:06:22.650 --> 01:06:25.050 to all the different types of inputs there might be, 01:06:25.050 --> 01:06:27.300 to help to make them even more accurate in the future, 01:06:27.300 --> 01:06:30.330 to be able to test these networks to see how well they work. 01:06:30.330 --> 01:06:33.420 And every time you're interacting with websites online, 01:06:33.420 --> 01:06:35.820 you're often helping to provide some of that data. 01:06:35.820 --> 01:06:38.380 Every time you go to your email app for example 01:06:38.380 --> 01:06:42.270 and you mark an email as spam, that email app might be learning from that, 01:06:42.270 --> 01:06:45.730 learning OK this type of email, that's an example of a spam email. 01:06:45.730 --> 01:06:48.120 And so it learns in the future to do a better job 01:06:48.120 --> 01:06:49.740 of trying to do that classification. 
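A minimal version of the spam idea, assuming a single artificial neuron (logistic regression) trained by gradient descent rather than a full neural network. The three features and the handful of labeled emails are made up for illustration; a real spam filter would use far more features and far more data.

```python
import math

def predict(weights, bias, features):
    # Squash the weighted sum into a 0-to-1 "probability of spam"
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))

def train(emails, labels, steps=2000, rate=0.5):
    weights = [0.0] * len(emails[0])
    bias = 0.0
    for _ in range(steps):
        for features, label in zip(emails, labels):
            # Nudge the weights to shrink the prediction error
            error = predict(weights, bias, features) - label
            weights = [w - rate * error * x for w, x in zip(weights, features)]
            bias -= rate * error
    return weights, bias

# Invented features: [mentions money?, all-caps subject?, from a known contact?]
emails = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam
weights, bias = train(emails, labels)
```

Marking a message as spam, as described above, simply adds one more labeled example to this kind of training data.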
01:06:49.740 --> 01:06:52.148 And likewise, every time an email is marked as spam 01:06:52.148 --> 01:06:53.940 and you have to tell the computer, you know 01:06:53.940 --> 01:06:56.523 what, that wasn't spam the computer is learning from that too. 01:06:56.523 --> 01:07:01.110 It's more data that the computer can use to help to inform its algorithm 01:07:01.110 --> 01:07:02.640 and the way that it's working. 01:07:02.640 --> 01:07:07.290 And so for one final example, we can take a look at how images like this 01:07:07.290 --> 01:07:08.430 were actually generated. 01:07:08.430 --> 01:07:11.370 How could a computer possibly get an image like this 01:07:11.370 --> 01:07:14.850 and generate it and make something that looks very much like a real person, 01:07:14.850 --> 01:07:17.692 even though it's not actually a real person? 01:07:17.692 --> 01:07:19.650 Well, it's done using the exact same technique, 01:07:19.650 --> 01:07:23.220 using a neural network that learns how to translate inputs 01:07:23.220 --> 01:07:25.910 into outputs by having access to a large amount of data. 01:07:25.910 --> 01:07:29.800 In this case, having access to many, many photos of real people, 01:07:29.800 --> 01:07:33.150 so the computer can start to get a sense for what a real person looks like, 01:07:33.150 --> 01:07:35.400 it can start to train the network in that way. 01:07:35.400 --> 01:07:38.490 And then rather than build the entire person all at once, 01:07:38.490 --> 01:07:39.960 build it up step by step. 01:07:39.960 --> 01:07:43.920 A computer generating a photo like this, this is a pretty complicated task, 01:07:43.920 --> 01:07:45.550 it's pretty difficult to do. 01:07:45.550 --> 01:07:48.690 But you know what's easier is generating that. 
01:07:48.690 --> 01:07:51.570 That is 16 pixels. It doesn't look like a person at all, 01:07:51.570 --> 01:07:53.580 but this a computer could do pretty readily, 01:07:53.580 --> 01:07:57.630 just generate what seems to be a whole bunch of random-looking pixels. 01:07:57.630 --> 01:08:01.080 But then you would train the computer to add a little bit of detail 01:08:01.080 --> 01:08:04.660 to this image, to be able to learn, if this were a real image, 01:08:04.660 --> 01:08:08.130 how would I add a little more detail to it to make it look a little bit more 01:08:08.130 --> 01:08:09.600 like what a person would look like? 01:08:09.600 --> 01:08:12.540 And it does so again by having access to large amounts of data, 01:08:12.540 --> 01:08:16.540 many, many photos of people, so it can begin to learn from that experience. 01:08:16.540 --> 01:08:19.050 So the computer learns how to take this image 01:08:19.050 --> 01:08:22.540 and turn it into this, for example. 01:08:22.540 --> 01:08:24.540 It still doesn't really look like a person, but it 01:08:24.540 --> 01:08:25.957 looks a little more like a person. 01:08:25.957 --> 01:08:27.210 It's got a higher resolution. 01:08:27.210 --> 01:08:30.420 You can maybe make out that there is some hair here and a face here. 01:08:30.420 --> 01:08:35.310 And then the algorithm learns how to take an image like this, an 8 by 8 grid, 01:08:35.310 --> 01:08:37.560 and turn it into a 16 by 16 grid. 01:08:37.560 --> 01:08:40.080 Now this, I mean, it still doesn't look like a person, 01:08:40.080 --> 01:08:41.580 but it looks a little more accurate. 01:08:41.580 --> 01:08:44.460 And over time, we can follow these steps one 01:08:44.460 --> 01:08:46.380 after another, adding more and more detail.
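The grid-doubling mechanics can be sketched in Python with plain nearest-neighbor upscaling, where each pixel simply becomes a 2x2 block. The real generative model learns what new detail to add at each doubling; this sketch only copies pixels, so it illustrates the resolution steps and nothing more.

```python
def double_resolution(grid):
    # Each pixel becomes a 2x2 block: duplicate every pixel within
    # a row, then duplicate the row itself.
    result = []
    for row in grid:
        stretched = [pixel for pixel in row for _ in range(2)]
        result.append(stretched)
        result.append(stretched[:])
    return result

tiny = [[0, 255], [255, 0]]       # a 2x2 "image" of grayscale values
bigger = double_resolution(tiny)  # now 4x4
huge = double_resolution(bigger)  # now 8x8, and so on up to a full image
```

Repeating this step, with a learned model deciding what the new pixels should look like instead of copying, is how a tiny random grid can grow into a convincing face.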
01:08:46.380 --> 01:08:51.210 Until eventually, the computer is able to generate an entire image that really 01:08:51.210 --> 01:08:53.880 looks like a person, that's very hard to distinguish just 01:08:53.880 --> 01:08:56.160 by this input to output process. 01:08:56.160 --> 01:08:58.649 Learning some mapping from inputs to outputs 01:08:58.649 --> 01:09:02.189 by having access to a whole lot of data. 01:09:02.189 --> 01:09:05.430 So before we wrap up here I'll pause for any final questions 01:09:05.430 --> 01:09:08.910 about artificial intelligence, any of the algorithms 01:09:08.910 --> 01:09:13.670 we looked at for searching, for learning or anything like that? 01:09:13.670 --> 01:09:14.660 Any final questions? 01:09:14.660 --> 01:09:16.590 Yeah, Sophia, go ahead. 01:09:16.590 --> 01:09:18.899 AUDIENCE: I had a question about like with the fitness, 01:09:18.899 --> 01:09:22.115 how is all the I guess, like data-- you mentioned they get regenerated 01:09:22.115 --> 01:09:23.990 like the successful ones, but I was wondering 01:09:23.990 --> 01:09:27.965 like how is all that data sort like what the success was exactly, 01:09:27.965 --> 01:09:30.770 but fitness is just like a score it's hard to know 01:09:30.770 --> 01:09:35.373 like exactly what decisions it made that said it was successful. 01:09:35.373 --> 01:09:38.540 BRIAN YU: Yeah, and that's actually one of the trickier things about machine 01:09:38.540 --> 01:09:41.060 learning, that we can train these machine learning models 01:09:41.060 --> 01:09:43.220 to be very, very good but it's not always 01:09:43.220 --> 01:09:47.310 immediately apparent to us like what it was doing in order to be successful. 01:09:47.310 --> 01:09:50.007 And this is an active area of research within machine learning, 01:09:50.007 --> 01:09:52.340 including some faculty at Harvard that work on this too, 01:09:52.340 --> 01:09:55.490 which is the interpretability of machine learning results. 
01:09:55.490 --> 01:09:59.000 Like the algorithm becomes very, very good at recommending a video to you 01:09:59.000 --> 01:10:02.120 or generating an image of a person, but it's hard for a person 01:10:02.120 --> 01:10:04.910 to look at that machine learning model and understand 01:10:04.910 --> 01:10:06.830 how it arrived at that conclusion. 01:10:06.830 --> 01:10:09.805 Ultimately, in many cases, people just throw their hands up and say, 01:10:09.805 --> 01:10:11.930 we don't really care how the algorithm is doing it, 01:10:11.930 --> 01:10:14.030 as long as the algorithm is doing it successfully 01:10:14.030 --> 01:10:16.695 and eventually produces good results, we'll 01:10:16.695 --> 01:10:19.070 just take the ones that are doing the best and use those, 01:10:19.070 --> 01:10:23.120 even if we don't necessarily understand exactly how those algorithms are 01:10:23.120 --> 01:10:23.660 working. 01:10:23.660 --> 01:10:26.000 And that's definitely an area of concern for some people 01:10:26.000 --> 01:10:29.960 and an area of research for others that are looking into these types of tools 01:10:29.960 --> 01:10:32.510 and technologies. 01:10:32.510 --> 01:10:34.255 Is there a question in the chat maybe? 01:10:34.255 --> 01:10:36.380 DAVID MALAN: A final question from the chat, Brian. 01:10:36.380 --> 01:10:39.890 Every time I choose the parts of the picture that contain traffic lights 01:10:39.890 --> 01:10:45.398 or crosswalks to prove I am not a robot, am I training Google's driverless car? 01:10:45.398 --> 01:10:47.690 BRIAN YU: Maybe not Google's necessarily, but certainly 01:10:47.690 --> 01:10:49.400 a lot of times when you're doing that type of thing. 01:10:49.400 --> 01:10:52.170 I mean, in one part it's to verify that you are in fact human. 01:10:52.170 --> 01:10:54.003 That is one of the purposes of those things, 01:10:54.003 --> 01:10:57.200 to make sure that robots aren't signing up for websites, for example. 
01:10:57.200 --> 01:11:00.770 But certainly it could be just giving more examples of labeled data. 01:11:00.770 --> 01:11:03.080 That oftentimes what machine learning models rely on 01:11:03.080 --> 01:11:06.770 is labeled data where you have like a digit, a handwritten digit, 01:11:06.770 --> 01:11:10.190 and you have a label of "this is the number 2," or "this is the number 8." 01:11:10.190 --> 01:11:13.880 And so the computer can then use those digits along with those numbers 01:11:13.880 --> 01:11:17.960 in order to figure out how to learn how to take handwritten digits 01:11:17.960 --> 01:11:20.570 and convert it to individual numbers. 01:11:20.570 --> 01:11:23.420 And so having access to that kind of labeled data ultimately 01:11:23.420 --> 01:11:25.910 ends up being really, really valuable. 01:11:25.910 --> 01:11:28.700 All right, so that's it for artificial intelligence for today. 01:11:28.700 --> 01:11:32.900 Thank you all so much and we'll see you next time. 01:11:32.900 --> 01:11:35.650 [MUSIC PLAYING]