WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:17.000 [MUSIC] 00:00:17.870 --> 00:00:18.710 BRIAN YU: All right. 00:00:18.710 --> 00:00:22.280 Welcome, everyone, to an Introduction to Artificial Intelligence with Python. 00:00:22.280 --> 00:00:23.480 My name is Brian Yu. 00:00:23.480 --> 00:00:26.540 And in this class, we'll explore some of the ideas, and techniques, 00:00:26.540 --> 00:00:30.450 and algorithms that are at the foundation of artificial intelligence. 00:00:30.450 --> 00:00:34.160 Now, artificial intelligence covers a wide variety of types of techniques. 00:00:34.160 --> 00:00:36.110 Anytime you see a computer do something that 00:00:36.110 --> 00:00:39.290 appears to be intelligent or rational in some way, 00:00:39.290 --> 00:00:41.510 like recognizing someone's face in a photo, 00:00:41.510 --> 00:00:44.030 or being able to play a game better than people can, 00:00:44.030 --> 00:00:47.120 or being able to understand human language when we talk to our phones 00:00:47.120 --> 00:00:50.370 and they understand what we mean and are able to respond back to us, 00:00:50.370 --> 00:00:54.060 these are all examples of AI, or artificial intelligence. 00:00:54.060 --> 00:00:58.580 And in this class we'll explore some of the ideas that make that AI possible. 00:00:58.580 --> 00:01:00.790 So we'll begin our conversations with search. 00:01:00.790 --> 00:01:02.540 The problem of, we have an AI and we would 00:01:02.540 --> 00:01:06.150 like the AI to be able to search for solutions to some kind of problem, 00:01:06.150 --> 00:01:07.700 no matter what that problem might be. 00:01:07.700 --> 00:01:11.300 Whether it's trying to get driving directions from point A to point B, 00:01:11.300 --> 00:01:13.100 or trying to figure out how to play a game, 00:01:13.100 --> 00:01:16.580 given a tic-tac-toe game, for example, figuring out what move 00:01:16.580 --> 00:01:17.960 it ought to make. 
00:01:17.960 --> 00:01:20.330 After that, we'll take a look at knowledge. 00:01:20.330 --> 00:01:23.480 Ideally, we want our AI to be able to know information, 00:01:23.480 --> 00:01:25.400 to be able to represent that information, 00:01:25.400 --> 00:01:28.580 and more importantly, to be able to draw inferences from that information. 00:01:28.580 --> 00:01:32.760 To be able to use the information it knows and draw additional conclusions. 00:01:32.760 --> 00:01:37.130 So we'll talk about how AI can be programmed in order to do just that. 00:01:37.130 --> 00:01:39.350 Then we'll explore the topic of uncertainty. 00:01:39.350 --> 00:01:43.100 Talking about ideas of, what happens if a computer isn't sure about a fact 00:01:43.100 --> 00:01:45.950 but maybe is only sure with a certain probability? 00:01:45.950 --> 00:01:48.500 So we'll talk about some of the ideas behind probability 00:01:48.500 --> 00:01:51.530 and how computers can begin to deal with uncertain events 00:01:51.530 --> 00:01:55.980 in order to be a little bit more intelligent in that sense, as well. 00:01:55.980 --> 00:01:58.550 After that, we'll turn our attention to optimization. 00:01:58.550 --> 00:02:01.970 Problems of when the computer is trying to optimize for some sort of goal, 00:02:01.970 --> 00:02:03.770 especially in a situation where there might 00:02:03.770 --> 00:02:06.380 be multiple ways that a computer might solve a problem, 00:02:06.380 --> 00:02:09.770 but we're looking for a better way or, potentially, the best way 00:02:09.770 --> 00:02:11.550 if that's at all possible. 00:02:11.550 --> 00:02:14.510 Then we'll take a look at machine learning, or learning more generally. 
00:02:14.510 --> 00:02:16.700 In looking at how when we have access to data 00:02:16.700 --> 00:02:20.300 our computers can be programmed to be quite intelligent by learning from data 00:02:20.300 --> 00:02:23.810 and learning from experience, being able to perform a task better and better 00:02:23.810 --> 00:02:25.770 based on greater access to data. 00:02:25.770 --> 00:02:28.520 So your email, for example, where your email inbox somehow 00:02:28.520 --> 00:02:32.180 knows which of your emails are good emails and which of your emails are spam. 00:02:32.180 --> 00:02:34.430 These are all examples of computers being 00:02:34.430 --> 00:02:38.420 able to learn from past experiences and past data. 00:02:38.420 --> 00:02:41.690 We'll take a look, too, at how computers are able to draw inspiration 00:02:41.690 --> 00:02:44.930 from human intelligence, looking at the structure of the human brain 00:02:44.930 --> 00:02:48.780 and how neural networks can be a computer analog to that sort of idea. 00:02:48.780 --> 00:02:51.830 And how, by taking advantage of a certain type of structure of a computer 00:02:51.830 --> 00:02:53.930 program, we can write neural networks that 00:02:53.930 --> 00:02:57.060 are able to perform tasks very, very effectively. 00:02:57.060 --> 00:02:59.420 And then finally, we'll turn our attention to language. 00:02:59.420 --> 00:03:02.820 Not programming languages, but human languages that we speak every day. 00:03:02.820 --> 00:03:04.700 And taking a look at the challenges that come 00:03:04.700 --> 00:03:07.880 about as a computer tries to understand natural language 00:03:07.880 --> 00:03:09.980 and how it is some of the natural language 00:03:09.980 --> 00:03:12.860 processing that occurs in modern artificial intelligence 00:03:12.860 --> 00:03:14.990 can actually work. 00:03:14.990 --> 00:03:17.630 But today we'll begin our conversation with search. 
00:03:17.630 --> 00:03:19.550 This problem of trying to figure out what 00:03:19.550 --> 00:03:22.790 to do when we have some sort of situation that the computer is in, 00:03:22.790 --> 00:03:25.880 some sort of environment that an agent is in, so to speak. 00:03:25.880 --> 00:03:29.720 And we would like for that agent to be able to somehow look for a solution 00:03:29.720 --> 00:03:31.200 to that problem. 00:03:31.200 --> 00:03:34.250 Now, these problems can come in any number of different types of formats. 00:03:34.250 --> 00:03:37.090 One example, for instance, might be something like this classic 15 00:03:37.090 --> 00:03:38.930 puzzle with the sliding tiles that you might 00:03:38.930 --> 00:03:41.930 have seen, where you're trying to slide the tiles in order to make sure 00:03:41.930 --> 00:03:43.730 that all the numbers line up in order. 00:03:43.730 --> 00:03:46.700 This is an example of what you might call a search problem. 00:03:46.700 --> 00:03:50.360 The 15 puzzle begins in an initially mixed up state 00:03:50.360 --> 00:03:53.480 and we need some way of finding moves to make in order to return 00:03:53.480 --> 00:03:55.520 the puzzle to its solved state. 00:03:55.520 --> 00:03:58.190 But there are similar problems that you can frame in other ways. 00:03:58.190 --> 00:04:00.390 Trying to find your way through a maze, for example, 00:04:00.390 --> 00:04:02.150 is another example of a search problem. 00:04:02.150 --> 00:04:05.720 You begin in one place, you have some goal of where you're trying to get to, 00:04:05.720 --> 00:04:08.930 and you need to figure out the correct sequence of actions that will take you 00:04:08.930 --> 00:04:11.460 from that initial state to the goal. 
00:04:11.460 --> 00:04:13.460 And while this is a little bit abstract, anytime 00:04:13.460 --> 00:04:15.440 we talk about maze solving in this class, 00:04:15.440 --> 00:04:17.990 you can translate it to something a little more real world, 00:04:17.990 --> 00:04:19.820 something like driving directions. 00:04:19.820 --> 00:04:22.520 If you ever wonder how Google Maps is able to figure out what 00:04:22.520 --> 00:04:25.550 is the best way for you to get from point A to point B 00:04:25.550 --> 00:04:28.860 and what turns to make, at what time, depending on traffic, for example. 00:04:28.860 --> 00:04:31.250 It's often some sort of search algorithm. 00:04:31.250 --> 00:04:34.370 You have an AI that is trying to get from an initial position 00:04:34.370 --> 00:04:38.060 to some sort of goal by taking some sequence of actions. 00:04:38.060 --> 00:04:40.640 So we'll start our conversations today by thinking 00:04:40.640 --> 00:04:42.590 about these types of search problems and what 00:04:42.590 --> 00:04:46.250 goes in to solving a search problem like this in order for an AI 00:04:46.250 --> 00:04:48.290 to be able to find a good solution. 00:04:48.290 --> 00:04:50.210 In order to do so, though, we're going to need 00:04:50.210 --> 00:04:52.580 to introduce a little bit of terminology, some of which 00:04:52.580 --> 00:04:53.690 I've already used. 00:04:53.690 --> 00:04:56.730 But the first term we'll need to think about is an agent. 00:04:56.730 --> 00:04:59.950 An agent is just some entity that perceives its environment, 00:04:59.950 --> 00:05:02.120 it somehow is able to perceive the things around it, 00:05:02.120 --> 00:05:04.530 and act on that environment in some way. 00:05:04.530 --> 00:05:06.240 So in the case of the driving directions, 00:05:06.240 --> 00:05:08.960 your agent might be some representation of a car that 00:05:08.960 --> 00:05:11.180 is trying to figure out what actions to take in order 00:05:11.180 --> 00:05:12.740 to arrive at a destination. 
00:05:12.740 --> 00:05:15.440 In the case of the 15 puzzle with the sliding tiles, 00:05:15.440 --> 00:05:19.250 the agent might be the AI or the person that is trying to solve that puzzle, 00:05:19.250 --> 00:05:24.090 trying to figure out what tiles to move in order to get to that solution. 00:05:24.090 --> 00:05:26.680 Next, we introduce the idea of a state. 00:05:26.680 --> 00:05:31.140 A state is just some configuration of the agent in its environment. 00:05:31.140 --> 00:05:34.880 So in the 15 puzzle, for example, any state might be any one of these three 00:05:34.880 --> 00:05:35.570 for example. 00:05:35.570 --> 00:05:38.570 A state is just some configuration of the tiles. 00:05:38.570 --> 00:05:40.490 Each of these states is different and is going 00:05:40.490 --> 00:05:42.830 to require a slightly different solution. 00:05:42.830 --> 00:05:46.400 A different sequence of actions will be needed in each one of these in order 00:05:46.400 --> 00:05:49.730 to get from this initial state to the goal, which 00:05:49.730 --> 00:05:51.500 is where we're trying to get. 00:05:51.500 --> 00:05:52.460 The initial state then. 00:05:52.460 --> 00:05:53.210 What is that? 00:05:53.210 --> 00:05:56.090 The initial state is just the state where the agent begins. 00:05:56.090 --> 00:05:58.880 It is one such state where we're going to start from 00:05:58.880 --> 00:06:02.010 and this is going to be the starting point for our search algorithm, 00:06:02.010 --> 00:06:02.690 so to speak. 00:06:02.690 --> 00:06:04.710 We're going to begin with this initial state 00:06:04.710 --> 00:06:08.180 and then start to reason about it, to think about what actions might we apply 00:06:08.180 --> 00:06:11.870 to that initial state in order to figure out how to get from the beginning 00:06:11.870 --> 00:06:16.850 to the end, from the initial position to whatever our goal happens to be. 00:06:16.850 --> 00:06:19.670 And how do we make our way from that initial position to the goal? 
00:06:19.670 --> 00:06:22.100 Well ultimately, it's via taking actions. 00:06:22.100 --> 00:06:25.570 Actions are just choices that we can make in any given state. 00:06:25.570 --> 00:06:29.180 And in AI, we're always going to try to formalize these ideas a little bit 00:06:29.180 --> 00:06:31.850 more precisely such that we could program them a little bit more 00:06:31.850 --> 00:06:33.440 mathematically, so to speak. 00:06:33.440 --> 00:06:37.580 So this will be a recurring theme and we can more precisely define actions 00:06:37.580 --> 00:06:38.840 as a function. 00:06:38.840 --> 00:06:41.570 We're going to effectively define a function called actions 00:06:41.570 --> 00:06:46.490 that takes an input S, where S is going to be some state that exists inside 00:06:46.490 --> 00:06:50.720 of our environment, and actions of S is going to take the state as input 00:06:50.720 --> 00:06:54.140 and return as output the set of all actions that 00:06:54.140 --> 00:06:56.540 can be executed in that state. 00:06:56.540 --> 00:07:00.170 And so it's possible that some actions are only valid in certain states 00:07:00.170 --> 00:07:01.640 and not in other states. 00:07:01.640 --> 00:07:04.340 And we'll see examples of that soon, too. 00:07:04.340 --> 00:07:06.530 So in the case of the 15 puzzle, for example, 00:07:06.530 --> 00:07:09.010 there are generally going to be four possible actions 00:07:09.010 --> 00:07:10.760 that we can do most of the time. 00:07:10.760 --> 00:07:13.250 We can slide a tile to the right, slide a tile to the left, 00:07:13.250 --> 00:07:16.280 slide a tile up, or slide a tile down, for example. 00:07:16.280 --> 00:07:19.950 And those are going to be the actions that are available to us. 00:07:19.950 --> 00:07:23.000 So somehow our AI, our program, needs some encoding 00:07:23.000 --> 00:07:26.150 of the state, which is often going to be in some numerical format, 00:07:26.150 --> 00:07:28.050 and some encoding of these actions. 
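[Editor's note: the actions function described above can be sketched in Python. This is a minimal illustration, assuming a sliding-tile state is stored as a tuple of row-tuples with 0 marking the blank square; the function and action names here are made up for the example, not taken from the lecture.]

```python
def find_blank(state):
    """Return the (row, col) position of the blank tile, marked 0."""
    for r, row in enumerate(state):
        for c, value in enumerate(row):
            if value == 0:
                return r, c

def actions(state):
    """Return the set of all actions executable in `state`.

    Each action names the direction a neighboring tile slides
    into the blank square; near an edge, fewer actions are valid.
    """
    r, c = find_blank(state)
    size = len(state)
    moves = set()
    if c < size - 1:
        moves.add("left")   # tile right of the blank slides left
    if c > 0:
        moves.add("right")  # tile left of the blank slides right
    if r < size - 1:
        moves.add("up")     # tile below the blank slides up
    if r > 0:
        moves.add("down")   # tile above the blank slides down
    return moves
```

With the blank in a corner only two actions are valid, which is exactly the point made above: the available actions depend on the state.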
00:07:28.050 --> 00:07:31.100 But it also needs some encoding of the relationship between these things, 00:07:31.100 --> 00:07:34.610 how do the states and actions relate to one another? 00:07:34.610 --> 00:07:36.950 And in order to do that, we'll introduce to our AI 00:07:36.950 --> 00:07:40.430 a transition model, which will be a description of what state 00:07:40.430 --> 00:07:45.120 we get after we perform some available action in some other state. 00:07:45.120 --> 00:07:47.540 And again, we can be a little bit more precise about this, 00:07:47.540 --> 00:07:50.820 define this transition model a little bit more formally, again, 00:07:50.820 --> 00:07:51.770 as a function. 00:07:51.770 --> 00:07:54.010 The function is going to be a function called result, 00:07:54.010 --> 00:07:56.210 that this time takes two inputs. 00:07:56.210 --> 00:07:59.070 Input number one is S, some state. 00:07:59.070 --> 00:08:02.270 And input number two is A, some action. 00:08:02.270 --> 00:08:04.450 And the output of this function result is 00:08:04.450 --> 00:08:09.770 going to give us the state that we get after we perform action A in state 00:08:09.770 --> 00:08:13.790 S. So let's take a look at an example to see more precisely what this actually 00:08:13.790 --> 00:08:14.580 means. 00:08:14.580 --> 00:08:18.000 Here's an example of a state of the 15 puzzle, for example. 00:08:18.000 --> 00:08:21.600 And here's an example of an action, sliding a tile to the right. 00:08:21.600 --> 00:08:24.840 What happens if we pass these as inputs to the result function? 00:08:24.840 --> 00:08:29.270 Again, the result function takes this board, this state, as its first input. 00:08:29.270 --> 00:08:31.730 And it takes an action as a second input. 00:08:31.730 --> 00:08:34.100 And of course, here, I'm describing things visually so 00:08:34.100 --> 00:08:37.010 that you can see visually what the state is and what the action is. 
00:08:37.010 --> 00:08:39.410 In a computer, you might represent one of these actions 00:08:39.410 --> 00:08:41.570 as just some number that represents the action. 00:08:41.570 --> 00:08:43.370 Or if you're familiar with enums that allow 00:08:43.370 --> 00:08:46.430 you to enumerate multiple possibilities, it might be something like that. 00:08:46.430 --> 00:08:50.530 And the state might just be represented as an array, or two dimensional array, 00:08:50.530 --> 00:08:52.400 of all of these numbers that exist. 00:08:52.400 --> 00:08:55.540 But here we're going to show it visually just so you can see it. 00:08:55.540 --> 00:08:59.450 When we take this state and this action, pass it into the result function, 00:08:59.450 --> 00:09:01.310 the output is a new state. 00:09:01.310 --> 00:09:04.910 The state we get after we take a tile and slide it to the right, and this 00:09:04.910 --> 00:09:06.500 is the state we get as a result. 00:09:06.500 --> 00:09:09.710 If we had a different action and a different state, for example, 00:09:09.710 --> 00:09:11.450 and passed that into the result function, 00:09:11.450 --> 00:09:13.470 we'd get a different answer altogether. 00:09:13.470 --> 00:09:15.710 So the result function needs to take care 00:09:15.710 --> 00:09:20.090 of figuring out how to take a state and take an action and get what results. 00:09:20.090 --> 00:09:22.760 And this is going to be our transition model that 00:09:22.760 --> 00:09:27.450 describes how it is that states and actions are related to each other. 
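[Editor's note: putting the two representations just mentioned together, a transition model for a sliding-tile puzzle might look like the sketch below. It assumes the same illustrative encoding as before, a square board stored as a tuple of row-tuples with 0 as the blank, and string names for the four actions; none of these choices come from the lecture itself.]

```python
def result(state, action):
    """Transition model: the state reached by performing `action`
    in `state`. An action names the direction a neighboring tile
    slides into the blank square (0)."""
    # locate the blank square
    r, c = next((r, c) for r, row in enumerate(state)
                for c, v in enumerate(row) if v == 0)
    # offset of the tile that slides into the blank
    dr, dc = {"left": (0, 1), "right": (0, -1),
              "up": (1, 0), "down": (-1, 0)}[action]
    tr, tc = r + dr, c + dc
    if not (0 <= tr < len(state) and 0 <= tc < len(state)):
        raise ValueError("action not valid in this state")
    # swap the blank with that tile, returning a new immutable state
    board = [list(row) for row in state]
    board[r][c], board[tr][tc] = board[tr][tc], board[r][c]
    return tuple(tuple(row) for row in board)
```

Passing a different state or a different action returns a different board, just as described: the function encodes how states and actions relate.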
00:09:27.450 --> 00:09:30.290 If we take this transition model and think about it more generally 00:09:30.290 --> 00:09:34.850 and across the entire problem, we can form what we might call a state space, 00:09:34.850 --> 00:09:38.120 the set of all of the states we can get from the initial state 00:09:38.120 --> 00:09:41.960 via any sequence of actions, by taking zero or one or two or more 00:09:41.960 --> 00:09:45.050 actions in addition to that, so we could draw a diagram 00:09:45.050 --> 00:09:46.400 that looks something like this. 00:09:46.400 --> 00:09:49.640 Where every state is represented here by a game board. 00:09:49.640 --> 00:09:52.850 And there are arrows that connect every state to every other state we 00:09:52.850 --> 00:09:54.740 can get to from that state. 00:09:54.740 --> 00:09:57.800 And the state space is much larger than what you see just here. 00:09:57.800 --> 00:10:01.920 This is just a sample of what the state space might actually look like. 00:10:01.920 --> 00:10:04.340 And, in general, across many search problems, 00:10:04.340 --> 00:10:07.280 whether they're this particular 15 puzzle or driving directions 00:10:07.280 --> 00:10:10.710 or something else, the state space is going to look something like this. 00:10:10.710 --> 00:10:15.110 We have individual states and arrows that are connecting them. 00:10:15.110 --> 00:10:17.000 And oftentimes, just for simplicity, we'll 00:10:17.000 --> 00:10:19.670 simplify our representation of this entire thing 00:10:19.670 --> 00:10:24.560 as a graph, some sequence of nodes and edges that connect nodes. 00:10:24.560 --> 00:10:28.640 But you can think of this more abstract representation as the exact same idea. 00:10:28.640 --> 00:10:30.560 Each of these little circles, or nodes, is 00:10:30.560 --> 00:10:33.860 going to represent one of the states inside of our problem. 
00:10:33.860 --> 00:10:35.930 And the arrows here represent the actions 00:10:35.930 --> 00:10:38.300 that we can take in any particular state, 00:10:38.300 --> 00:10:44.180 taking us from one particular state to another state, for example. 00:10:44.180 --> 00:10:44.760 All right. 00:10:44.760 --> 00:10:48.510 So now we have this idea of nodes that are representing these states, 00:10:48.510 --> 00:10:50.890 actions that can take us from one state to another, 00:10:50.890 --> 00:10:53.370 and a transition model that defines what happens 00:10:53.370 --> 00:10:55.420 after we take a particular action. 00:10:55.420 --> 00:10:57.210 So the next step we need to figure out is 00:10:57.210 --> 00:11:00.570 how we know when the AI is done solving the problem. 00:11:00.570 --> 00:11:03.540 The AI needs some way to know when it gets to the goal, 00:11:03.540 --> 00:11:04.990 that it's found the goal. 00:11:04.990 --> 00:11:08.100 So the next thing we'll need to encode into our artificial intelligence 00:11:08.100 --> 00:11:13.380 is a goal test, some way to determine whether a given state is a goal state. 00:11:13.380 --> 00:11:16.740 In the case of something like driving directions, it might be pretty easy. 00:11:16.740 --> 00:11:19.080 If you're in a state that corresponds to whatever 00:11:19.080 --> 00:11:21.610 the user typed in as their intended destination, well, 00:11:21.610 --> 00:11:23.160 then you know you're in a goal state. 00:11:23.160 --> 00:11:25.320 In the 15 puzzle, it might be checking the numbers 00:11:25.320 --> 00:11:27.160 to make sure they're all in ascending order. 00:11:27.160 --> 00:11:29.880 But the AI needs some way to encode whether or not 00:11:29.880 --> 00:11:32.340 any state it happens to be in is a goal. 00:11:32.340 --> 00:11:34.650 And some problems might have one goal, like a maze 00:11:34.650 --> 00:11:37.260 where you have one initial position and one ending position 00:11:37.260 --> 00:11:38.470 and that's the goal. 
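[Editor's note: the goal test for the 15 puzzle, checking that the numbers are in ascending order, is short enough to write out. A minimal sketch, again assuming the illustrative tuple-of-rows encoding with 0 as the blank:]

```python
def goal_test(state):
    """True if the tiles read 1, 2, ..., n*n - 1 in order,
    with the blank (0) in the final position."""
    flat = [v for row in state for v in row]
    return flat == list(range(1, len(flat))) + [0]
```

Any board at all can be handed to this function, which is the point of a goal test: it answers, for a given state, "are we done?"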
00:11:38.470 --> 00:11:40.320 In other more complex problems, you might 00:11:40.320 --> 00:11:43.890 imagine that there are multiple possible goals, that there are multiple ways 00:11:43.890 --> 00:11:45.070 to solve a problem. 00:11:45.070 --> 00:11:47.940 And we might not care which one the computer finds as 00:11:47.940 --> 00:11:51.330 long as it does find a particular goal. 00:11:51.330 --> 00:11:54.900 However, sometimes a computer doesn't just care about finding a goal, 00:11:54.900 --> 00:11:57.710 but finding a goal well, or one with a low cost. 00:11:57.710 --> 00:11:59.550 And it's for that reason that the last piece 00:11:59.550 --> 00:12:02.400 of terminology that we use to define these search problems 00:12:02.400 --> 00:12:04.710 is something called a path cost. 00:12:04.710 --> 00:12:07.170 You might imagine that in the case of driving directions, 00:12:07.170 --> 00:12:10.680 it would be pretty annoying if I said I wanted directions from point A 00:12:10.680 --> 00:12:14.310 to point B, and the route that Google Maps gave me was a long route with lots 00:12:14.310 --> 00:12:17.220 of detours that were unnecessary, that took longer than it should 00:12:17.220 --> 00:12:19.530 have for me to get to that destination. 00:12:19.530 --> 00:12:22.410 And it's for that reason that when we're formulating search problems, 00:12:22.410 --> 00:12:27.480 we'll often give every path some sort of numerical cost, some number telling us 00:12:27.480 --> 00:12:30.730 how expensive it is to take this particular option. 00:12:30.730 --> 00:12:34.530 And then tell our AI that instead of just finding a solution, 00:12:34.530 --> 00:12:36.990 some way of getting from the initial state to the goal, 00:12:36.990 --> 00:12:40.830 we'd really like to find one that minimizes this path cost, that 00:12:40.830 --> 00:12:44.370 is less expensive, or takes less time, or minimizes 00:12:44.370 --> 00:12:46.500 some other numerical value. 
00:12:46.500 --> 00:12:49.770 We can represent this graphically, if we take a look at this graph again. 00:12:49.770 --> 00:12:52.740 And imagine that each of these arrows, each of these actions 00:12:52.740 --> 00:12:55.560 that we can take from one state to another state, 00:12:55.560 --> 00:12:57.720 has some sort of number associated with it, 00:12:57.720 --> 00:13:01.170 that number being the path cost of this particular action where 00:13:01.170 --> 00:13:03.480 some of the costs for any particular action 00:13:03.480 --> 00:13:07.660 might be more expensive than the cost for some other action, for example. 00:13:07.660 --> 00:13:10.120 Although this will only happen in some sorts of problems. 00:13:10.120 --> 00:13:12.540 In other problems we can simplify the diagram 00:13:12.540 --> 00:13:16.750 and just assume that the cost of any particular action is the same. 00:13:16.750 --> 00:13:19.440 And this is probably the case in something like the 15 puzzle, 00:13:19.440 --> 00:13:22.410 for example, where it doesn't really make a difference whether I'm 00:13:22.410 --> 00:13:23.880 moving right or moving left. 00:13:23.880 --> 00:13:26.940 The only thing that matters is the total number of steps 00:13:26.940 --> 00:13:31.650 that I have to take to get from point A to point B. And each of those steps 00:13:31.650 --> 00:13:32.880 is of equal cost. 00:13:32.880 --> 00:13:37.270 We can just assume it's some constant cost, like one. 00:13:37.270 --> 00:13:39.380 And so this now forms the basis for what we 00:13:39.380 --> 00:13:41.720 might consider to be a search problem. 
00:13:41.720 --> 00:13:44.960 A search problem has some sort of initial state, some place where 00:13:44.960 --> 00:13:47.420 we begin, some sort of action that we can take 00:13:47.420 --> 00:13:50.300 or multiple actions that we can take in any given state, 00:13:50.300 --> 00:13:52.850 and it has a transition model, some way of defining 00:13:52.850 --> 00:13:56.570 what happens when we go from one state and take one action, 00:13:56.570 --> 00:13:59.280 what state do we end up with as a result. 00:13:59.280 --> 00:14:02.450 In addition to that, we need some goal test to know whether or not 00:14:02.450 --> 00:14:03.650 we've reached a goal. 00:14:03.650 --> 00:14:07.940 And then we need a path cost function that tells us for any particular path, 00:14:07.940 --> 00:14:11.690 by following some sequence of actions, how expensive is that path. 00:14:11.690 --> 00:14:14.360 What is its cost in terms of money, or time, 00:14:14.360 --> 00:14:18.470 or some other resource that we are trying to minimize our usage of. 00:14:18.470 --> 00:14:22.370 The goal, ultimately, is to find a solution, where a solution in this case 00:14:22.370 --> 00:14:26.090 is just some sequence of actions that will take us from the initial state 00:14:26.090 --> 00:14:27.200 to the goal state. 00:14:27.200 --> 00:14:31.700 And, ideally, we'd like to find not just any solution, but the optimal solution, 00:14:31.700 --> 00:14:35.510 which is a solution that has the lowest path cost among all 00:14:35.510 --> 00:14:36.950 of the possible solutions. 00:14:36.950 --> 00:14:39.590 And in some cases, there might be multiple optimal solutions, 00:14:39.590 --> 00:14:41.600 but an optimal solution just means that there 00:14:41.600 --> 00:14:46.430 is no way that we could have done better in terms of finding that solution. 00:14:46.430 --> 00:14:47.870 So now we've defined the problem. 
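[Editor's note: the five pieces just listed, initial state, actions, transition model, goal test, and path cost, can be bundled into one object. The sketch below does this for an explicitly listed weighted graph; the class name, the edge data, and the convention that an action simply names the neighbor to move to are all illustrative choices, not the lecture's own code.]

```python
class GraphSearchProblem:
    """A search problem over an explicit weighted graph.

    `edges` maps each state to a dict {neighbor: step cost};
    the action "move to neighbor n" leads to state n.
    """
    def __init__(self, edges, initial, goal):
        self.edges = edges
        self.initial = initial    # the state where the agent begins
        self.goal = goal

    def actions(self, state):
        # available actions are the neighbors we can move to
        return set(self.edges.get(state, {}))

    def result(self, state, action):
        # transition model: moving to a neighbor puts us there
        return action

    def goal_test(self, state):
        return state == self.goal

    def path_cost(self, path):
        # total cost of following a sequence of states
        return sum(self.edges[a][b] for a, b in zip(path, path[1:]))

# hypothetical road map: edge weights stand in for travel times
roads = {"A": {"B": 5}, "B": {"C": 2, "D": 4}, "C": {"E": 3}}
problem = GraphSearchProblem(roads, "A", "E")
```

An optimal solution is then any path from `problem.initial` to a state passing `goal_test` whose `path_cost` is minimal among all solutions.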
00:14:47.870 --> 00:14:49.990 And now we need to begin to figure out how 00:14:49.990 --> 00:14:53.050 it is that we're going to solve this kind of search problem. 00:14:53.050 --> 00:14:55.270 And in order to do so, you'll probably imagine 00:14:55.270 --> 00:14:58.810 that our computer is going to need to represent a whole bunch of data 00:14:58.810 --> 00:15:00.160 about this particular problem. 00:15:00.160 --> 00:15:03.080 We need to represent data about where we are in the problem. 00:15:03.080 --> 00:15:06.490 And we might need to be considering multiple different options at once. 00:15:06.490 --> 00:15:09.850 And oftentimes when we're trying to package a whole bunch of data related 00:15:09.850 --> 00:15:12.850 to a state together, we'll do so using a data structure 00:15:12.850 --> 00:15:14.630 that we're going to call a node. 00:15:14.630 --> 00:15:16.510 A node is a data structure that is just going 00:15:16.510 --> 00:15:19.110 to keep track of a variety of different values, 00:15:19.110 --> 00:15:21.430 and specifically in the case of a search problem, 00:15:21.430 --> 00:15:24.640 it's going to keep track of these four values in particular. 00:15:24.640 --> 00:15:28.570 Every node is going to keep track of a state, the state we're currently on. 00:15:28.570 --> 00:15:31.540 And every node is also going to keep track of a parent. 00:15:31.540 --> 00:15:34.420 A parent being the state before us, or the node 00:15:34.420 --> 00:15:37.660 that we used in order to get to this current state. 00:15:37.660 --> 00:15:39.830 And this is going to be relevant because eventually, 00:15:39.830 --> 00:15:42.740 once we reach the goal node, once we get to the end, 00:15:42.740 --> 00:15:47.210 we want to know what sequence of actions we used in order to get to that goal. 
00:15:47.210 --> 00:15:50.140 And the way we'll know that is by looking at these parents 00:15:50.140 --> 00:15:53.890 to keep track of what led us to the goal, and what led us to that state, 00:15:53.890 --> 00:15:56.710 and what led us to the state before that, so on and so forth, 00:15:56.710 --> 00:15:59.350 backtracking our way to the beginning so that we 00:15:59.350 --> 00:16:01.990 know the entire sequence of actions we needed in order 00:16:01.990 --> 00:16:04.730 to get from the beginning to the end. 00:16:04.730 --> 00:16:07.900 The node is also going to keep track of what action we took in order to get 00:16:07.900 --> 00:16:10.120 from the parent to the current state. 00:16:10.120 --> 00:16:13.700 And the node is also going to keep track of a path cost. 00:16:13.700 --> 00:16:16.270 In other words, it's going to keep track of the number that 00:16:16.270 --> 00:16:20.440 represents how long it took to get from the initial state to the state 00:16:20.440 --> 00:16:22.080 that we currently happen to be at. 00:16:22.080 --> 00:16:24.790 And we'll see why this is relevant as we start to talk about some 00:16:24.790 --> 00:16:28.030 of the optimizations that we can make in terms of these search problems more 00:16:28.030 --> 00:16:29.080 generally. 00:16:29.080 --> 00:16:31.000 So this is the data structure that we're going 00:16:31.000 --> 00:16:32.920 to use in order to solve the problem. 00:16:32.920 --> 00:16:35.050 And now let's talk about the approach, how might we 00:16:35.050 --> 00:16:37.850 actually begin to solve the problem? 00:16:37.850 --> 00:16:39.940 Well, as you might imagine, what we're going to do 00:16:39.940 --> 00:16:42.190 is we're going to start at one particular state 00:16:42.190 --> 00:16:44.700 and we're just going to explore from there. 
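[Editor's note: the node data structure with its four values, and the backtracking through parents described above, can be sketched directly. The names are illustrative, not from the lecture:]

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Bookkeeping for search: the four values a node tracks."""
    state: object
    parent: Optional["Node"]  # the node that generated this one
    action: object            # action taken to get here from the parent
    path_cost: float          # cost from the initial state to this state

def solution(node):
    """Backtrack through parents to recover the sequence of actions
    that led from the initial state to this node."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))  # beginning-to-end order
```

Once a goal node is found, calling `solution` on it yields exactly the sequence of actions the search took, which is why each node remembers its parent.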
00:16:44.700 --> 00:16:46.720 The intuition is that from a given state, 00:16:46.720 --> 00:16:48.760 we have multiple options that we could take, 00:16:48.760 --> 00:16:50.800 and we're going to explore those options. 00:16:50.800 --> 00:16:54.490 And once we explore those options, we'll find that more options than that 00:16:54.490 --> 00:16:56.290 are going to make themselves available. 00:16:56.290 --> 00:16:59.080 And we're going to consider all of the available options 00:16:59.080 --> 00:17:03.310 to be stored inside of a single data structure that we'll call the frontier. 00:17:03.310 --> 00:17:05.710 The frontier is going to represent all of the things 00:17:05.710 --> 00:17:10.440 that we could explore next, that we haven't yet explored or visited. 00:17:10.440 --> 00:17:12.610 So in our approach, we're going to begin this search 00:17:12.610 --> 00:17:17.020 algorithm by starting with a frontier that just contains one state. 00:17:17.020 --> 00:17:20.470 The frontier is going to contain the initial state because at the beginning, 00:17:20.470 --> 00:17:21.970 that's the only state we know about. 00:17:21.970 --> 00:17:24.330 That is the only state that exists. 00:17:24.330 --> 00:17:27.790 And then our search algorithm is effectively going to follow a loop. 00:17:27.790 --> 00:17:31.390 We're going to repeat some process again and again and again. 00:17:31.390 --> 00:17:35.320 The first thing we're going to do is if the frontier is empty, 00:17:35.320 --> 00:17:36.610 then there's no solution. 00:17:36.610 --> 00:17:39.310 And we can report that there is no way to get to the goal. 00:17:39.310 --> 00:17:40.560 And that's certainly possible. 00:17:40.560 --> 00:17:42.730 There are certain types of problems that an AI might 00:17:42.730 --> 00:17:46.940 try to explore and realize that there is no way to solve that problem. 00:17:46.940 --> 00:17:49.610 And that's useful information for humans to know, as well. 
00:17:49.610 --> 00:17:53.560 So if ever the frontier is empty, that means there's nothing left to explore, 00:17:53.560 --> 00:17:57.160 and we haven't yet found a solution so there is no solution. 00:17:57.160 --> 00:17:59.140 There's nothing left to explore. 00:17:59.140 --> 00:18:03.020 Otherwise what we'll do is we'll remove a node from the frontier. 00:18:03.020 --> 00:18:04.990 So right now at the beginning, the frontier 00:18:04.990 --> 00:18:07.870 just contains one node representing the initial state. 00:18:07.870 --> 00:18:09.550 But over time, the frontier might grow. 00:18:09.550 --> 00:18:11.080 It might contain multiple states. 00:18:11.080 --> 00:18:15.730 And so here we're just going to remove a single node from that frontier. 00:18:15.730 --> 00:18:19.000 If that node happens to be a goal, then we found a solution. 00:18:19.000 --> 00:18:22.390 So we remove a node from the frontier and ask ourselves, is this the goal? 00:18:22.390 --> 00:18:25.540 And we do that by applying the goal test that we talked about earlier, 00:18:25.540 --> 00:18:29.080 asking if we're at the destination or asking if all the numbers of the 15 00:18:29.080 --> 00:18:31.250 puzzle happen to be in order. 00:18:31.250 --> 00:18:33.860 So if the node contains the goal, we found a solution. 00:18:33.860 --> 00:18:34.360 Great. 00:18:34.360 --> 00:18:35.900 We're done. 00:18:35.900 --> 00:18:40.820 And otherwise, what we'll need to do is we'll need to expand the node. 00:18:40.820 --> 00:18:42.940 And this is a term in artificial intelligence. 00:18:42.940 --> 00:18:46.870 To expand the node just means to look at all of the neighbors of that node. 00:18:46.870 --> 00:18:49.630 In other words, consider all of the possible actions 00:18:49.630 --> 00:18:52.830 that I could take from the state that this node is representing 00:18:52.830 --> 00:18:55.270 and what nodes could I get to from there. 
00:18:55.270 --> 00:18:57.520 We're going to take all of those nodes, the next nodes 00:18:57.520 --> 00:19:00.220 that I can get to from this current one I'm looking at, 00:19:00.220 --> 00:19:02.320 and add those to the frontier. 00:19:02.320 --> 00:19:04.550 And then we'll repeat this process. 00:19:04.550 --> 00:19:07.840 So at a very high level, the idea is we start with a frontier that 00:19:07.840 --> 00:19:09.400 contains the initial state. 00:19:09.400 --> 00:19:12.190 And we're constantly removing a node from the frontier, 00:19:12.190 --> 00:19:16.150 looking at where we can get to next, and adding those nodes to the frontier, 00:19:16.150 --> 00:19:19.090 repeating this process over and over until either we remove 00:19:19.090 --> 00:19:22.270 a node from the frontier and it contains a goal, meaning we've solved 00:19:22.270 --> 00:19:23.320 the problem. 00:19:23.320 --> 00:19:26.950 Or we run into a situation where the frontier is empty, at which point 00:19:26.950 --> 00:19:29.610 we're left with no solution. 00:19:29.610 --> 00:19:31.610 So let's actually try and take the pseudocode, 00:19:31.610 --> 00:19:36.470 put it into practice by taking a look at an example of a sample search problem. 00:19:36.470 --> 00:19:38.270 So right here I have a sample graph. 00:19:38.270 --> 00:19:42.470 A is connected to B via this action, B is connected to nodes C and D, C 00:19:42.470 --> 00:19:46.220 is connected to E, D is connected to F. And what I'd like to do 00:19:46.220 --> 00:19:52.760 is have my AI find a path from A to E. We want to get from this initial state 00:19:52.760 --> 00:19:54.980 to this goal state. 00:19:54.980 --> 00:19:56.440 So how are we going to do that? 00:19:56.440 --> 00:19:58.490 Well, we're going to start with the frontier that 00:19:58.490 --> 00:19:59.760 contains the initial state. 00:19:59.760 --> 00:20:01.550 This is going to represent our frontier. 
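Before walking through this by hand, here's how that loop might be sketched in Python. This is a rough illustration only, not the lecture's own code: the `graph` dictionary stands in for the sample graph, and the function and variable names are assumptions.

```python
# A minimal sketch of the basic search loop described above.
# The graph mirrors the sample: A -> B, B -> C and D, C -> E, D -> F.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"], "E": [], "F": []}

def search(initial, goal):
    # Start with a frontier that contains just the initial state.
    frontier = [initial]
    while True:
        # If the frontier is empty, there is no solution.
        if not frontier:
            return None
        # Remove a node from the frontier; which one is an open choice.
        # Here, arbitrarily, the last one. (No explored set yet; that
        # refinement comes later.)
        node = frontier.pop()
        # If the node is a goal, we've found a solution.
        if node == goal:
            return node
        # Otherwise, expand the node: add its neighbors to the frontier.
        frontier.extend(graph[node])

print(search("A", "E"))  # E
```

Running this on the sample graph finds E; on a graph with no route to the goal, the frontier eventually empties and the function reports no solution.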
00:20:01.550 --> 00:20:05.090 So our frontier, initially, will just contain A, that initial state 00:20:05.090 --> 00:20:06.650 where we're going to begin. 00:20:06.650 --> 00:20:08.480 And now we'll repeat this process. 00:20:08.480 --> 00:20:10.460 If the frontier is empty, no solution. 00:20:10.460 --> 00:20:12.920 That's not a problem because the frontier is not empty. 00:20:12.920 --> 00:20:17.080 So we'll remove a node from the frontier as the one to consider next. 00:20:17.080 --> 00:20:18.720 There is only one node in the frontier. 00:20:18.720 --> 00:20:20.810 So we'll go ahead and remove it from the frontier. 00:20:20.810 --> 00:20:25.490 But now A, this initial node, this is the node we're currently considering. 00:20:25.490 --> 00:20:26.600 We follow the next step. 00:20:26.600 --> 00:20:29.270 We ask ourselves, is this node the goal? 00:20:29.270 --> 00:20:29.880 No, it's not. 00:20:29.880 --> 00:20:30.770 A is not the goal. 00:20:30.770 --> 00:20:32.220 E is the goal. 00:20:32.220 --> 00:20:33.870 So we don't return the solution. 00:20:33.870 --> 00:20:37.160 So instead, we go to this last step, expand the node 00:20:37.160 --> 00:20:39.950 and add the resulting nodes to the frontier. 00:20:39.950 --> 00:20:41.070 What does that mean? 00:20:41.070 --> 00:20:45.050 Well, it means take this state A and consider where we could get to next. 00:20:45.050 --> 00:20:47.390 And after A what we could get to next is only 00:20:47.390 --> 00:20:51.110 B. So that's what we get when we expand A. We find B. 00:20:51.110 --> 00:20:53.080 And we add B to the frontier. 00:20:53.080 --> 00:20:56.300 And now B is in the frontier and we repeat the process again. 00:20:56.300 --> 00:20:57.050 We say, all right. 00:20:57.050 --> 00:20:58.280 The frontier is not empty. 00:20:58.280 --> 00:21:00.440 So let's remove B from the frontier. 00:21:00.440 --> 00:21:02.300 B is now the node that we're considering. 00:21:02.300 --> 00:21:04.160 We ask ourselves, is B the goal? 
00:21:04.160 --> 00:21:05.120 No, it's not. 00:21:05.120 --> 00:21:10.010 So we go ahead and expand B and add its resulting nodes to the frontier. 00:21:10.010 --> 00:21:11.660 What happens when we expand B? 00:21:11.660 --> 00:21:14.750 In other words, what nodes can we get to from B? 00:21:14.750 --> 00:21:17.180 Well, we can get to C and D. So we'll go ahead 00:21:17.180 --> 00:21:19.100 and add C and D to the frontier. 00:21:19.100 --> 00:21:21.500 And now we have two nodes in the frontier, C and D. 00:21:21.500 --> 00:21:23.170 And we repeat the process again. 00:21:23.170 --> 00:21:25.220 We remove a node from the frontier, for now we'll 00:21:25.220 --> 00:21:27.260 do so arbitrarily just by picking C. 00:21:27.260 --> 00:21:30.550 We'll see later why choosing which node you remove from the frontier 00:21:30.550 --> 00:21:32.840 is actually quite an important part of the algorithm. 00:21:32.840 --> 00:21:36.230 But for now I'll arbitrarily remove C, say it's not the goal, 00:21:36.230 --> 00:21:39.260 so we'll add E, the next one, to the frontier. 00:21:39.260 --> 00:21:41.420 Then let's say I remove E from the frontier. 00:21:41.420 --> 00:21:45.620 And now I'm currently looking at state E. Is that a goal state? 00:21:45.620 --> 00:21:48.410 It is because I'm trying to find a path from A to E. 00:21:48.410 --> 00:21:49.760 So I would return the goal. 00:21:49.760 --> 00:21:52.520 And that, now, would be the solution, that I'm now 00:21:52.520 --> 00:21:57.290 able to return the solution and I found a path from A to E. 00:21:57.290 --> 00:22:00.710 So this is the general idea, the general approach of this search algorithm, 00:22:00.710 --> 00:22:04.220 to follow these steps constantly removing nodes from the frontier 00:22:04.220 --> 00:22:06.540 until we're able to find a solution. 00:22:06.540 --> 00:22:08.510 So the next question you might reasonably ask 00:22:08.510 --> 00:22:10.610 is, what could go wrong here? 
00:22:10.610 --> 00:22:14.090 What are the potential problems with an approach like this? 00:22:14.090 --> 00:22:18.090 And here's one example of a problem that could arise from this sort of approach. 00:22:18.090 --> 00:22:21.980 Imagine this same graph, same as before, with one change. 00:22:21.980 --> 00:22:25.070 The change being, now, instead of just an arrow from A to B, 00:22:25.070 --> 00:22:29.250 we also have an arrow from B to A, meaning we can go in both directions. 00:22:29.250 --> 00:22:31.580 And this is true in something like the 15 puzzle 00:22:31.580 --> 00:22:33.560 where when I slide a tile to the right, I 00:22:33.560 --> 00:22:36.980 could then slide a tile to the left to get back to the original position. 00:22:36.980 --> 00:22:39.530 I could go back and forth between A and B. 00:22:39.530 --> 00:22:42.920 And that's what these double arrows symbolize, the idea that from one state 00:22:42.920 --> 00:22:45.310 I can get to another and then I can get back. 00:22:45.310 --> 00:22:47.600 And that's true in many search problems. 00:22:47.600 --> 00:22:50.990 What's going to happen if I try to apply the same approach now? 00:22:50.990 --> 00:22:53.280 Well, I'll begin with A, same as before. 00:22:53.280 --> 00:22:55.160 And I'll remove A from the frontier. 00:22:55.160 --> 00:22:59.420 And then I'll consider where I can get to from A. And after A, the only place 00:22:59.420 --> 00:23:02.360 I can get to is B, so B goes into the frontier. 00:23:02.360 --> 00:23:03.410 Then I'll say, all right. 00:23:03.410 --> 00:23:06.200 Let's take a look at B. That's the only thing left in the frontier. 00:23:06.200 --> 00:23:08.150 Where can I get to from B? 00:23:08.150 --> 00:23:12.410 Before it was just C and D, but now because of that reverse arrow, 00:23:12.410 --> 00:23:17.950 I can get to A or C or D. So all three A, C, and D. All of those 00:23:17.950 --> 00:23:19.070 now go into the frontier. 
00:23:19.070 --> 00:23:23.300 They are places I can get to from B. And now I remove one from the frontier, 00:23:23.300 --> 00:23:25.870 and, you know, maybe I'm unlucky and maybe I pick A. 00:23:25.870 --> 00:23:27.820 And now I'm looking at A again. 00:23:27.820 --> 00:23:31.490 And I consider where can I get to from A. And from A, well I can get to B. 00:23:31.490 --> 00:23:34.080 And now we start to see the problem, that if I'm not careful, 00:23:34.080 --> 00:23:37.040 I go from A to B and then back to A and then to B again. 00:23:37.040 --> 00:23:39.380 And I could be going in this infinite loop where I never 00:23:39.380 --> 00:23:43.250 make any progress because I'm constantly just going back and forth between two 00:23:43.250 --> 00:23:45.390 states that I've already seen. 00:23:45.390 --> 00:23:46.730 So what is the solution to this? 00:23:46.730 --> 00:23:48.980 We need some way to deal with this problem. 00:23:48.980 --> 00:23:50.900 And the way that we can deal with this problem 00:23:50.900 --> 00:23:54.600 is by somehow keeping track of what we've already explored. 00:23:54.600 --> 00:23:58.150 And the logic is going to be, well, if we've already explored the state, 00:23:58.150 --> 00:23:59.630 there's no reason to go back to it. 00:23:59.630 --> 00:24:01.700 Once we've explored a state, don't go back to it, 00:24:01.700 --> 00:24:03.860 don't bother adding it to the frontier. 00:24:03.860 --> 00:24:05.570 There's no need to. 00:24:05.570 --> 00:24:07.640 So here is going to be our revised approach, 00:24:07.640 --> 00:24:10.450 a better way to approach this sort of search problem. 00:24:10.450 --> 00:24:14.030 And it's going to look very similar just with a couple of modifications. 00:24:14.030 --> 00:24:16.850 We'll start with a frontier that contains the initial state. 00:24:16.850 --> 00:24:18.110 Same as before. 
00:24:18.110 --> 00:24:21.260 But now we'll start with another data structure, 00:24:21.260 --> 00:24:24.290 which would just be a set of nodes that we've already explored. 00:24:24.290 --> 00:24:25.880 So what are the states we've explored? 00:24:25.880 --> 00:24:27.330 Initially, it's empty. 00:24:27.330 --> 00:24:29.710 We have an empty explored set. 00:24:29.710 --> 00:24:30.930 And now we repeat. 00:24:30.930 --> 00:24:33.140 If the frontier is empty, no solution. 00:24:33.140 --> 00:24:34.400 Same as before. 00:24:34.400 --> 00:24:36.460 We remove a node from the frontier, we check 00:24:36.460 --> 00:24:38.510 to see if it's a goal state, return the solution. 00:24:38.510 --> 00:24:40.670 None of this is any different so far. 00:24:40.670 --> 00:24:42.650 But now, what we're going to do is we're going 00:24:42.650 --> 00:24:45.800 to add the node to the explored set. 00:24:45.800 --> 00:24:49.790 So if it happens to be the case that we remove a node from the frontier 00:24:49.790 --> 00:24:52.420 and it's not the goal, we'll add it to the explored set 00:24:52.420 --> 00:24:54.170 so that we know we've already explored it. 00:24:54.170 --> 00:24:58.010 We don't need to go back to it again if it happens to come up later. 00:24:58.010 --> 00:25:00.440 And then the final step, we expand the node 00:25:00.440 --> 00:25:03.120 and we add the resulting nodes to the frontier. 00:25:03.120 --> 00:25:05.960 But before we just always added the resulting nodes to the frontier, 00:25:05.960 --> 00:25:08.270 we're going to be a little cleverer about it this time. 00:25:08.270 --> 00:25:10.970 We're only going to add the nodes to the frontier 00:25:10.970 --> 00:25:14.120 if they aren't already in the frontier and if they 00:25:14.120 --> 00:25:16.790 aren't already in the explored set. 
00:25:16.790 --> 00:25:19.340 So we'll check both the frontier and the explored set, 00:25:19.340 --> 00:25:22.520 make sure that the node isn't already in one of those two, 00:25:22.520 --> 00:25:25.700 and so long as it isn't, then we'll go ahead and add to the frontier 00:25:25.700 --> 00:25:27.590 but not otherwise. 00:25:27.590 --> 00:25:29.390 And so that revised approach is ultimately 00:25:29.390 --> 00:25:31.220 what's going to help make sure that we don't 00:25:31.220 --> 00:25:34.210 go back and forth between two nodes. 00:25:34.210 --> 00:25:36.420 Now the one point that I've kind of glossed over here 00:25:36.420 --> 00:25:40.770 so far is this step here, removing a node from the frontier. 00:25:40.770 --> 00:25:44.770 Before I just chose arbitrarily, like let's just remove a node and that's it. 00:25:44.770 --> 00:25:46.710 But it turns out it's actually quite important 00:25:46.710 --> 00:25:49.860 how we decide to structure our frontier, how we add them, 00:25:49.860 --> 00:25:51.840 and how we remove our nodes. 00:25:51.840 --> 00:25:53.370 The frontier is a data structure. 00:25:53.370 --> 00:25:55.960 And we need to make a choice about in what order 00:25:55.960 --> 00:25:58.080 are we going to be removing elements? 00:25:58.080 --> 00:26:01.600 And one of the simplest data structures for adding and removing elements 00:26:01.600 --> 00:26:03.300 is something called a stack. 00:26:03.300 --> 00:26:07.960 And a stack is a data structure that is a last in, first out data type. 00:26:07.960 --> 00:26:10.920 Which means the last thing that I add to the frontier 00:26:10.920 --> 00:26:14.860 is going to be the first thing that I remove from the frontier. 00:26:14.860 --> 00:26:18.720 So the most recent thing to go into the stack, or the frontier in this case, 00:26:18.720 --> 00:26:21.660 is going to be the node that I explore. 
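Putting those revised steps together, here's one way the loop with an explored set might be sketched in Python. The names and the simple list-based frontier are illustrative assumptions, not the lecture's code, and the `remove` function is left as a parameter precisely because the removal order is the choice just mentioned.

```python
# Sketch of the revised search loop with an explored set (assumed names).
# Note the B -> A arrow: this graph has the back-and-forth problem.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["E"], "D": ["F"], "E": [], "F": []}

def search(initial, goal, remove):
    frontier = [initial]          # frontier starts with the initial state
    explored = set()              # explored set starts empty
    while True:
        if not frontier:          # empty frontier means no solution
            return None
        node = remove(frontier)   # remove some node from the frontier
        if node == goal:          # goal test
            return node
        explored.add(node)        # mark this node as explored
        for neighbor in graph[node]:
            # Only add states not already in the frontier or explored set.
            if neighbor not in frontier and neighbor not in explored:
                frontier.append(neighbor)

# Even with the reverse arrow, the explored set prevents infinite looping.
print(search("A", "E", remove=lambda f: f.pop()))  # E
```

Whether `remove` takes the last element (a stack) or the first (a queue), the explored set guarantees the loop never revisits a state it has already seen.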
00:26:21.660 --> 00:26:25.170 So let's see what happens if I apply this stack based approach 00:26:25.170 --> 00:26:29.850 to something like this problem, finding a path from A to E. 00:26:29.850 --> 00:26:30.910 What's going to happen? 00:26:30.910 --> 00:26:33.340 Well, again we'll start with A. And we'll say, all right. 00:26:33.340 --> 00:26:34.950 Let's go ahead and look at A first. 00:26:34.950 --> 00:26:38.970 And then, notice this time, we've added A to the explored set. 00:26:38.970 --> 00:26:41.790 A is something we've now explored, we have this data structure 00:26:41.790 --> 00:26:43.320 that's keeping track. 00:26:43.320 --> 00:26:46.620 We then say from A we can get to B. And all right. 00:26:46.620 --> 00:26:48.040 From B what can we do? 00:26:48.040 --> 00:26:52.230 Well from B, we can explore B and get to both C and D. 00:26:52.230 --> 00:26:57.030 So we added C and then D. So now when we explore a node, 00:26:57.030 --> 00:27:00.390 we're going to treat the frontier as a stack, last in, first out. 00:27:00.390 --> 00:27:04.490 D was the last one to come in so we'll go ahead and explore that next. 00:27:04.490 --> 00:27:06.450 And say, all right, where can we get to from D? 00:27:06.450 --> 00:27:08.640 Well we can get to F. And so, all right. 00:27:08.640 --> 00:27:11.220 We'll put F into the frontier. 00:27:11.220 --> 00:27:13.800 And now because the frontier is a stack, F 00:27:13.800 --> 00:27:16.630 is the most recent thing that's gone in the stack. 00:27:16.630 --> 00:27:18.000 So F is what we'll explore next. 00:27:18.000 --> 00:27:19.960 We'll explore F and say, all right. 00:27:19.960 --> 00:27:21.600 Where can we get to from F? 00:27:21.600 --> 00:27:24.970 Well, we can't get anywhere so nothing gets added to the frontier. 00:27:24.970 --> 00:27:27.870 So now what was the new most recent thing added to the frontier? 00:27:27.870 --> 00:27:30.480 Well it's now C, the only thing left in the frontier. 
00:27:30.480 --> 00:27:34.100 We'll explore that from which we can say, all right, from C we can get to E. 00:27:34.100 --> 00:27:35.380 So E goes into the frontier. 00:27:35.380 --> 00:27:36.510 And then we say, all right. 00:27:36.510 --> 00:27:39.220 Let's look at E and E is now the solution. 00:27:39.220 --> 00:27:41.740 And now we've solved the problem. 00:27:41.740 --> 00:27:43.710 So when we treat the frontier like a stack, 00:27:43.710 --> 00:27:47.670 a last in, first out data structure, that's the result we get. 00:27:47.670 --> 00:27:52.650 We go from A to B to D to F, and then we sort of backed up and went down 00:27:52.650 --> 00:27:56.190 to C and then E. And it's important to get a visual sense for how 00:27:56.190 --> 00:27:57.360 this algorithm is working. 00:27:57.360 --> 00:27:59.550 We went very deep in this search tree, so 00:27:59.550 --> 00:28:02.670 to speak, all the way until the bottom where we hit a dead end. 00:28:02.670 --> 00:28:06.230 And then we effectively backed up and explored this other route 00:28:06.230 --> 00:28:07.740 that we didn't try before. 00:28:07.740 --> 00:28:10.530 And it's this going very deep in the search tree idea, 00:28:10.530 --> 00:28:13.800 the way the algorithm ends up working when we use a stack, 00:28:13.800 --> 00:28:18.210 that we call this version of the algorithm depth first search. 00:28:18.210 --> 00:28:20.340 Depth first search is the search algorithm 00:28:20.340 --> 00:28:23.840 where we always explore the deepest node in the frontier. 00:28:23.840 --> 00:28:27.000 We keep going deeper and deeper through our search tree. 00:28:27.000 --> 00:28:31.660 And then if we hit a dead end, we back up and we try something else instead. 00:28:31.660 --> 00:28:34.670 But depth first search is just one of the possible search options 00:28:34.670 --> 00:28:35.780 that we could use. 
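That A, B, D, F, C, E order can be checked with a small sketch. The graph and names here are illustrative assumptions; the point is only that a stack-style frontier produces the deep-then-back-up order just described.

```python
# Sketch: explore the sample graph treating the frontier as a stack (LIFO)
# and record the order in which nodes get explored.
# Graph assumed from the example: A -> B, B -> C and D, C -> E, D -> F.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"], "E": [], "F": []}

def depth_first_order(initial):
    frontier = [initial]
    explored = []
    while frontier:
        node = frontier.pop()   # last in, first out
        explored.append(node)
        for neighbor in graph[node]:
            if neighbor not in frontier and neighbor not in explored:
                frontier.append(neighbor)
    return explored

print(depth_first_order("A"))  # ['A', 'B', 'D', 'F', 'C', 'E']
```

The printed order matches the walkthrough: deep down through B, D, F, then back up to C and finally E.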
00:28:35.780 --> 00:28:38.120 It turns out that there is another algorithm called 00:28:38.120 --> 00:28:41.150 breadth first search, which behaves very similarly to depth 00:28:41.150 --> 00:28:43.040 first search with one difference. 00:28:43.040 --> 00:28:46.820 Instead of always exploring the deepest node in the search tree the way 00:28:46.820 --> 00:28:48.920 the depth first search does, breadth first search 00:28:48.920 --> 00:28:53.370 is always going to explore the shallowest node in the frontier. 00:28:53.370 --> 00:28:54.360 So what does that mean? 00:28:54.360 --> 00:28:56.900 Well, it means that instead of using a stack, which 00:28:56.900 --> 00:29:00.650 depth first search, or DFS, used where the most recent item added 00:29:00.650 --> 00:29:06.350 to the frontier is the one we'll explore next, in breadth first search, or BFS, 00:29:06.350 --> 00:29:11.120 we'll instead use a queue where a queue is a first in, first out data 00:29:11.120 --> 00:29:14.660 type, where the very first thing we add to the frontier is the first one 00:29:14.660 --> 00:29:15.470 we'll explore. 00:29:15.470 --> 00:29:17.380 And they effectively form a line or a queue, 00:29:17.380 --> 00:29:23.270 where the earlier you arrive in the frontier, the earlier you get explored. 00:29:23.270 --> 00:29:26.600 So what would that mean for the same exact problem finding a path from A 00:29:26.600 --> 00:29:27.950 to E? 00:29:27.950 --> 00:29:30.200 Well we start with A, same as before. 00:29:30.200 --> 00:29:33.410 Then we'll go ahead and have explored A, and say, where can we get to from A? 00:29:33.410 --> 00:29:36.080 Well, from A we can get to B. Same as before. 00:29:36.080 --> 00:29:37.340 From B, same as before. 00:29:37.340 --> 00:29:41.070 We can get to C and D so C and D get added to the frontier. 00:29:41.070 --> 00:29:43.460 This time, though, we added C to the frontier 00:29:43.460 --> 00:29:46.670 before D so we'll explore C first. 
00:29:46.670 --> 00:29:48.350 So C gets explored. 00:29:48.350 --> 00:29:50.020 And from C, where can we get to? 00:29:50.020 --> 00:29:51.470 Well, we can get to E. 00:29:51.470 --> 00:29:53.690 So E gets added to the frontier. 00:29:53.690 --> 00:29:58.290 But because D was added to the frontier before E, we'll look at D next. 00:29:58.290 --> 00:30:00.590 So we'll explore D and say, where can we get to from D? 00:30:00.590 --> 00:30:03.860 We can get to F. And only then will we say, all right. 00:30:03.860 --> 00:30:07.550 Now we can get to E. And so what breadth first search, or BFS, 00:30:07.550 --> 00:30:12.290 did is we started here, we looked at both C and D, 00:30:12.290 --> 00:30:14.240 and then we looked at E. Effectively we're 00:30:14.240 --> 00:30:16.790 looking at things one away from the initial state, 00:30:16.790 --> 00:30:18.620 then two away from the initial state. 00:30:18.620 --> 00:30:22.500 And only then, things that are three away from the initial state. 00:30:22.500 --> 00:30:25.970 Unlike depth first search, which just went as deep as possible 00:30:25.970 --> 00:30:28.700 into the search tree until it hit a dead end and then, 00:30:28.700 --> 00:30:30.920 ultimately, had to back up. 00:30:30.920 --> 00:30:33.320 So these now are two different search algorithms 00:30:33.320 --> 00:30:35.800 that we could apply in order to try and solve a problem. 00:30:35.800 --> 00:30:37.850 And let's take a look at how these would actually 00:30:37.850 --> 00:30:41.910 work in practice with something like maze solving, for example. 00:30:41.910 --> 00:30:43.370 So here's an example of a maze. 00:30:43.370 --> 00:30:46.580 These empty cells represent places where our agent can move. 00:30:46.580 --> 00:30:49.340 These darkened gray cells represent walls 00:30:49.340 --> 00:30:51.050 that the agent can't pass through. 
00:30:51.050 --> 00:30:53.750 And, ultimately, our agent, our AI, is going 00:30:53.750 --> 00:30:56.590 to try to find a way to get from position A 00:30:56.590 --> 00:31:00.680 to position B via some sequence of actions, where those actions are left, 00:31:00.680 --> 00:31:02.930 right, up, and down. 00:31:02.930 --> 00:31:05.420 What will depth first search do in this case? 00:31:05.420 --> 00:31:08.270 Well depth first search will just follow one path. 00:31:08.270 --> 00:31:11.590 If it reaches a fork in the road where it has multiple different options, 00:31:11.590 --> 00:31:14.200 depth first search is just, in this case, going to choose one. 00:31:14.200 --> 00:31:15.560 There isn't a real preference. 00:31:15.560 --> 00:31:19.280 But it's going to keep following one until it hits a dead end. 00:31:19.280 --> 00:31:21.620 And when it hits a dead end, depth first search 00:31:21.620 --> 00:31:24.800 effectively goes back to the last decision point 00:31:24.800 --> 00:31:26.330 and tries the other path. 00:31:26.330 --> 00:31:28.760 Fully exhausting this entire path and when 00:31:28.760 --> 00:31:30.890 it realizes that, OK, the goal is not here, 00:31:30.890 --> 00:31:32.750 then it turns its attention to this path. 00:31:32.750 --> 00:31:34.610 It goes as deep as possible. 00:31:34.610 --> 00:31:39.050 When it hits a dead end, it backs up and then tries this other path, 00:31:39.050 --> 00:31:42.110 keeps going as deep as possible down one particular path, 00:31:42.110 --> 00:31:45.470 and when it realizes that that's a dead end, then it'll back up. 00:31:45.470 --> 00:31:48.230 And then ultimately find its way to the goal. 00:31:48.230 --> 00:31:51.680 And maybe you got lucky and maybe you made a different choice earlier on, 00:31:51.680 --> 00:31:54.590 but ultimately this is how depth first search is going to work. 00:31:54.590 --> 00:31:56.900 It's going to keep following until it hits a dead end. 
00:31:56.900 --> 00:32:00.860 And when it hits a dead end, it backs up and looks for a different solution. 00:32:00.860 --> 00:32:02.570 And so one thing you might reasonably ask 00:32:02.570 --> 00:32:05.010 is, is this algorithm always going to work? 00:32:05.010 --> 00:32:09.200 Will it always actually find a way to get from the initial state to the goal? 00:32:09.200 --> 00:32:11.420 And it turns out that as long as our maze 00:32:11.420 --> 00:32:14.630 is finite, as long as there are finitely many spaces where 00:32:14.630 --> 00:32:15.890 we can travel, then yes. 00:32:15.890 --> 00:32:19.900 Depth first search is going to find a solution because eventually it 00:32:19.900 --> 00:32:21.290 will just explore everything. 00:32:21.290 --> 00:32:24.590 If the maze happens to be infinite and there's an infinite state space, which 00:32:24.590 --> 00:32:28.350 does exist in certain types of problems, then it's a slightly different story. 00:32:28.350 --> 00:32:30.950 But as long as our maze has finitely many squares, 00:32:30.950 --> 00:32:33.020 we're going to find a solution. 00:32:33.020 --> 00:32:34.940 The next question, though, that we want to ask 00:32:34.940 --> 00:32:37.040 is, is it going to be a good solution? 00:32:37.040 --> 00:32:39.740 Is it the optimal solution that we can find? 00:32:39.740 --> 00:32:42.190 And the answer there is not necessarily. 00:32:42.190 --> 00:32:44.030 And let's take a look at an example of that. 00:32:44.030 --> 00:32:48.890 In this maze, for example, we're again trying to find our way from A to B. 00:32:48.890 --> 00:32:51.470 And you notice here there are multiple possible solutions. 00:32:51.470 --> 00:32:54.560 We could go this way, or we could go up in order 00:32:54.560 --> 00:32:57.350 to make our way from A to B. Now if we're lucky, 00:32:57.350 --> 00:33:01.040 depth first search will choose this way and get to B. 
But there's no reason, 00:33:01.040 --> 00:33:03.140 necessarily, why depth first search would choose 00:33:03.140 --> 00:33:05.440 between going up or going to the right. 00:33:05.440 --> 00:33:08.270 It's sort of an arbitrary decision point because both 00:33:08.270 --> 00:33:10.420 are going to be added to the frontier. 00:33:10.420 --> 00:33:13.310 And ultimately, if we get unlucky, depth first search 00:33:13.310 --> 00:33:15.950 might choose to explore this path first because it's just 00:33:15.950 --> 00:33:17.430 a random choice at this point. 00:33:17.430 --> 00:33:20.570 It will explore, explore, explore, and it'll eventually 00:33:20.570 --> 00:33:24.110 find the goal, this particular path, when in actuality there 00:33:24.110 --> 00:33:25.010 was a better path. 00:33:25.010 --> 00:33:28.940 There was a more optimal solution that used fewer steps, 00:33:28.940 --> 00:33:32.660 assuming we're measuring the cost of a solution based on the number of steps 00:33:32.660 --> 00:33:33.930 that we need to take. 00:33:33.930 --> 00:33:36.680 So depth first search, if we're unlucky, might end up 00:33:36.680 --> 00:33:41.880 not finding the best solution when a better solution is available. 00:33:41.880 --> 00:33:44.370 So that's DFS, depth first search. 00:33:44.370 --> 00:33:47.300 How does BFS, or breadth first search, compare? 00:33:47.300 --> 00:33:49.550 How would it work in this particular situation? 00:33:49.550 --> 00:33:52.490 Well the algorithm is going to look very different visually 00:33:52.490 --> 00:33:54.800 in terms of how BFS explores. 00:33:54.800 --> 00:33:57.800 Because BFS looks at shallower nodes first, 00:33:57.800 --> 00:34:01.850 the idea is going to be BFS will first look at all of the nodes that 00:34:01.850 --> 00:34:04.190 are one away from the initial state. 00:34:04.190 --> 00:34:06.050 Look here and look here, for example. 00:34:06.050 --> 00:34:10.590 Just at the two nodes that are immediately next to this initial state. 
00:34:10.590 --> 00:34:12.470 Then it will explore nodes that are two away, 00:34:12.470 --> 00:34:15.050 looking at this state and that state, for example. 00:34:15.050 --> 00:34:18.140 Then it will explore nodes that are three away, this state and that state. 00:34:18.140 --> 00:34:22.190 Whereas depth first search just picked one path and kept following it, 00:34:22.190 --> 00:34:24.210 breadth first search on the other hand, is 00:34:24.210 --> 00:34:27.570 taking the option of exploring all of the possible paths 00:34:27.570 --> 00:34:30.150 kind of at the same time, bouncing back between them, 00:34:30.150 --> 00:34:32.520 looking deeper and deeper at each one, but making 00:34:32.520 --> 00:34:35.400 sure to explore the shallower ones or the ones that are 00:34:35.400 --> 00:34:38.060 closer to the initial state earlier. 00:34:38.060 --> 00:34:41.190 So we'll keep following this pattern, looking at things that are four away, 00:34:41.190 --> 00:34:43.170 looking at things that are five away, looking 00:34:43.170 --> 00:34:48.230 at things that are six away, until eventually we make our way to the goal. 00:34:48.230 --> 00:34:51.090 And in this case, it's true we had to explore some states that 00:34:51.090 --> 00:34:52.880 ultimately didn't lead us anywhere. 00:34:52.880 --> 00:34:56.290 But the path that we found to the goal was the optimal path. 00:34:56.290 --> 00:34:59.850 This is the shortest way that we could get to the goal. 00:34:59.850 --> 00:35:02.970 And so, what might happen then in a larger maze? 00:35:02.970 --> 00:35:04.850 Well let's take a look at something like this 00:35:04.850 --> 00:35:06.860 and how breadth first search is going to behave. 00:35:06.860 --> 00:35:08.690 Well, breadth first search, again, will just 00:35:08.690 --> 00:35:11.270 keep following the states until it reaches a decision point. 00:35:11.270 --> 00:35:13.460 It could go either left or right. 
00:35:13.460 --> 00:35:16.850 And while DFS just picked one and kept following 00:35:16.850 --> 00:35:21.620 that until it hit a dead end, BFS on the other hand, will explore both. 00:35:21.620 --> 00:35:23.720 It'll say, look at this node, then this node, 00:35:23.720 --> 00:35:27.510 and I'll look at this node, then that node, so on and so forth. 00:35:27.510 --> 00:35:31.850 And when it hits a decision point here, rather than picking one, left or right, 00:35:31.850 --> 00:35:36.350 and exploring that path, it will again explore both, alternating between them, 00:35:36.350 --> 00:35:37.380 going deeper and deeper. 00:35:37.380 --> 00:35:41.570 We'll explore here, and then maybe here and here, and then keep going. 00:35:41.570 --> 00:35:44.780 We'll explore here and slowly make our way, as you can visually 00:35:44.780 --> 00:35:46.490 see, further and further out. 00:35:46.490 --> 00:35:48.200 Once we get to this decision point, we'll 00:35:48.200 --> 00:35:52.790 explore both up and down until, ultimately, we 00:35:52.790 --> 00:35:55.570 make our way to the goal. 00:35:55.570 --> 00:35:58.250 And what you'll notice is, yes, breadth first search 00:35:58.250 --> 00:36:02.630 did find our way from A to B by following this particular path. 00:36:02.630 --> 00:36:06.410 But it needed to explore a lot of states in order to do so. 00:36:06.410 --> 00:36:09.440 And so we see some tradeoff here between DFS and BFS. 00:36:09.440 --> 00:36:13.430 That in DFS there may be some cases where there is some memory savings, 00:36:13.430 --> 00:36:16.280 as compared to a breadth first approach where 00:36:16.280 --> 00:36:19.210 breadth first search, in this case, had to explore a lot of states. 00:36:19.210 --> 00:36:22.470 But maybe that won't always be the case. 00:36:22.470 --> 00:36:24.920 So now let's actually turn our attention to some code. 
00:36:24.920 --> 00:36:26.720 And look at the code that we could actually 00:36:26.720 --> 00:36:30.410 write in order to implement something like depth first search or breadth 00:36:30.410 --> 00:36:34.910 first search in the context of solving a maze, for example. 00:36:34.910 --> 00:36:37.340 So I'll go ahead and go into my terminal. 00:36:37.340 --> 00:36:41.240 And what I have here inside of maze.py is an implementation 00:36:41.240 --> 00:36:43.640 of this same idea of maze solving. 00:36:43.640 --> 00:36:46.700 I've defined a class called Node that in this case 00:36:46.700 --> 00:36:49.620 is keeping track of the state, the parent, in other words 00:36:49.620 --> 00:36:51.860 the state before the state, and the action. 00:36:51.860 --> 00:36:54.110 In this case, we're not keeping track of the path cost 00:36:54.110 --> 00:36:56.780 because we can calculate the cost of the path at the end 00:36:56.780 --> 00:37:00.890 after we found our way from the initial state to the goal. 00:37:00.890 --> 00:37:05.540 In addition to this, I've defined a class called StackFrontier. 00:37:05.540 --> 00:37:08.780 And if you're unfamiliar with a class, a class is a way for me 00:37:08.780 --> 00:37:11.930 to define a way to generate objects in Python. 00:37:11.930 --> 00:37:16.070 It refers to an idea of object oriented programming where the idea here 00:37:16.070 --> 00:37:18.740 is that I would like to create an object that is 00:37:18.740 --> 00:37:20.900 able to store all of my frontier data. 00:37:20.900 --> 00:37:23.030 And I would like to have functions, otherwise known 00:37:23.030 --> 00:37:27.410 as methods, on that object that I can use to manipulate the object. 00:37:27.410 --> 00:37:31.130 And so what's going on here, if you're unfamiliar with the syntax, 00:37:31.130 --> 00:37:34.430 is I have a function that initially creates a frontier 00:37:34.430 --> 00:37:36.410 that I'm going to represent using a list. 
00:37:36.410 --> 00:37:39.810 And initially my frontier is represented by the empty list. 00:37:39.810 --> 00:37:42.870 There's nothing in my frontier to begin with. 00:37:42.870 --> 00:37:46.020 I have an add function that adds something to the frontier, 00:37:46.020 --> 00:37:49.260 by appending it to the end of the list. 00:37:49.260 --> 00:37:53.400 I have a function that checks if the frontier contains a particular state. 00:37:53.400 --> 00:37:55.990 I have an empty function that checks if the frontier is empty. 00:37:55.990 --> 00:37:58.830 If the frontier is empty, that just means the length of the frontier 00:37:58.830 --> 00:38:00.180 is zero. 00:38:00.180 --> 00:38:03.310 And then I have a function for removing something from the frontier. 00:38:03.310 --> 00:38:06.150 I can't remove something from the frontier if the frontier is empty. 00:38:06.150 --> 00:38:07.770 So I check for that first. 00:38:07.770 --> 00:38:10.710 But otherwise, if the frontier isn't empty, 00:38:10.710 --> 00:38:13.940 recall that I'm implementing this frontier as a stack, 00:38:13.940 --> 00:38:17.200 a last in, first out data structure. 00:38:17.200 --> 00:38:19.590 Which means the last thing I add to the frontier, 00:38:19.590 --> 00:38:21.570 in other words, the last thing in the list, 00:38:21.570 --> 00:38:25.750 is the item that I should remove from this frontier. 00:38:25.750 --> 00:38:30.190 So what you'll see here is I remove the last item of the list. 00:38:30.190 --> 00:38:33.160 And if you index into a Python list with negative one, 00:38:33.160 --> 00:38:35.050 that gets you the last item in the list. 00:38:35.050 --> 00:38:37.630 Since zero is the first item, negative one 00:38:37.630 --> 00:38:41.490 kind of wraps around and gets you to the last item in the list. 00:38:41.490 --> 00:38:43.310 So we take that last item.
00:38:43.310 --> 00:38:46.710 We call that node, and we update the frontier here on line 28 to say, 00:38:46.710 --> 00:38:50.000 go ahead and remove that node from the frontier. 00:38:50.000 --> 00:38:54.860 And then we return the node as a result. So this class here effectively 00:38:54.860 --> 00:38:57.050 implements the idea of a frontier. 00:38:57.050 --> 00:38:59.690 It gives me a way to add something to a frontier and a way 00:38:59.690 --> 00:39:03.130 to remove something from the frontier as a stack. 00:39:03.130 --> 00:39:06.830 I've also, just for good measure, implemented an alternative version 00:39:06.830 --> 00:39:09.500 of the same thing called a QueueFrontier. 00:39:09.500 --> 00:39:13.190 Which, in parentheses you'll see here, it inherits from StackFrontier, 00:39:13.190 --> 00:39:16.640 meaning it's going to do all the same things that the StackFrontier did, 00:39:16.640 --> 00:39:19.440 except the way we remove a node from the frontier 00:39:19.440 --> 00:39:21.090 is going to be slightly different. 00:39:21.090 --> 00:39:24.240 Instead of removing from the end of the list the way we would in a stack, 00:39:24.240 --> 00:39:26.810 we're instead going to remove from the beginning of the list. 00:39:26.810 --> 00:39:31.590 self.frontier[0] will get me the first node in the frontier, 00:39:31.590 --> 00:39:32.870 the first one that was added. 00:39:32.870 --> 00:39:37.550 And that is going to be the one that we return in the case of a queue. 00:39:37.550 --> 00:39:40.340 Under here I have a definition of a class called Maze. 00:39:40.340 --> 00:39:45.050 This is going to handle the process of taking a maze-like text 00:39:45.050 --> 00:39:47.400 file and figuring out how to solve it.
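Those two frontier classes might look something like this — a compact sketch of what was just described, which may differ in details from the actual maze.py:

```python
# A sketch of the two frontier classes described above. StackFrontier
# removes the most recently added node (last in, first out), which is
# what depth-first search needs; QueueFrontier inherits everything and
# only overrides remove to take the earliest node (first in, first out).
class StackFrontier:
    def __init__(self):
        self.frontier = []  # initially, the frontier is an empty list

    def add(self, node):
        self.frontier.append(node)  # append to the end of the list

    def contains_state(self, state):
        return any(node.state == state for node in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[-1]            # [-1] is the last item in the list
        self.frontier = self.frontier[:-1]  # drop it from the frontier
        return node


class QueueFrontier(StackFrontier):
    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[0]             # [0] is the first item in the list
        self.frontier = self.frontier[1:]   # drop it from the frontier
        return node
```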
00:39:47.400 --> 00:39:50.480 So we'll take as input a text file that looks something 00:39:50.480 --> 00:39:53.810 like this, for example, where we see hash marks that are here representing 00:39:53.810 --> 00:39:57.830 walls and I have the character A representing the starting position, 00:39:57.830 --> 00:40:01.740 and the character B representing the ending position. 00:40:01.740 --> 00:40:04.790 And you can take a look at the code for parsing this text file right now. 00:40:04.790 --> 00:40:06.320 That's the less interesting part. 00:40:06.320 --> 00:40:09.380 The more interesting part is this solve function here, 00:40:09.380 --> 00:40:11.510 where the solve function is going to figure out 00:40:11.510 --> 00:40:15.230 how to actually get from point A to point B. 00:40:15.230 --> 00:40:18.260 And here we see an implementation of the exact same idea 00:40:18.260 --> 00:40:19.900 we saw from a moment ago. 00:40:19.900 --> 00:40:21.740 We're going to keep track of how many states 00:40:21.740 --> 00:40:24.520 we've explored just so we can report that data later. 00:40:24.520 --> 00:40:29.770 But I start with a node that represents just the start state. 00:40:29.770 --> 00:40:34.040 And I start with a frontier that in this case is a stack frontier. 00:40:34.040 --> 00:40:36.170 And given that I'm treating my frontier as a stack, 00:40:36.170 --> 00:40:40.160 you might imagine that the algorithm I'm using here is now depth first search. 00:40:40.160 --> 00:40:45.230 Because depth first search or DFS uses a stack as its data structure. 00:40:45.230 --> 00:40:50.350 And initially, this frontier is just going to contain the start state. 00:40:50.350 --> 00:40:53.390 We initialize an explored set that initially is empty. 00:40:53.390 --> 00:40:55.420 There's nothing we've explored so far. 00:40:55.420 --> 00:40:59.980 And now here's our loop, that notion of repeating something again and again. 
00:40:59.980 --> 00:41:03.760 First, we check if the frontier is empty by calling that empty function that we 00:41:03.760 --> 00:41:05.890 saw the implementation of a moment ago. 00:41:05.890 --> 00:41:08.080 And if the frontier is indeed empty, we'll 00:41:08.080 --> 00:41:11.650 go ahead and raise an exception, or a Python error, to say, sorry, 00:41:11.650 --> 00:41:15.230 there is no solution to this problem. 00:41:15.230 --> 00:41:18.630 Otherwise, we'll go ahead and remove a node from the frontier, 00:41:18.630 --> 00:41:22.960 by calling frontier.remove, and update the number of states we've explored. 00:41:22.960 --> 00:41:25.360 Because now we've explored one additional state, 00:41:25.360 --> 00:41:30.100 we say self.num_explored += 1, adding one to the number of states 00:41:30.100 --> 00:41:31.880 we've explored. 00:41:31.880 --> 00:41:34.430 Once we remove a node from the frontier, recall 00:41:34.430 --> 00:41:38.330 that the next step is to see whether or not it's the goal, the goal test. 00:41:38.330 --> 00:41:40.850 And in the case of the maze, the goal test is pretty easy. 00:41:40.850 --> 00:41:45.110 I check to see whether the state of the node is equal to the goal. 00:41:45.110 --> 00:41:47.090 Initially, when I set up the maze, I set up 00:41:47.090 --> 00:41:49.890 a value called goal, which is a property of the maze, 00:41:49.890 --> 00:41:53.300 so I can just check to see if the node's state is actually the goal. 00:41:53.300 --> 00:41:56.060 And if it is the goal, then what I want to do 00:41:56.060 --> 00:41:59.450 is backtrack my way towards figuring out what actions 00:41:59.450 --> 00:42:02.360 I took in order to get to this goal. 00:42:02.360 --> 00:42:03.470 And how do I do that? 00:42:03.470 --> 00:42:06.140 Well, recall that every node stores its parent-- 00:42:06.140 --> 00:42:09.110 the node that came before it that we used to get to this node-- 00:42:09.110 --> 00:42:11.690 and also the action used in order to get there.
00:42:11.690 --> 00:42:13.700 So I can create this loop where I'm constantly 00:42:13.700 --> 00:42:17.000 just looking at the parent of every node and keeping 00:42:17.000 --> 00:42:20.600 track, for all of the parents, what action I took to get from the parent 00:42:20.600 --> 00:42:21.890 to this. 00:42:21.890 --> 00:42:25.250 So this loop is going to keep repeating this process of looking through all 00:42:25.250 --> 00:42:28.670 of the parent nodes until we get back to the initial state, which 00:42:28.670 --> 00:42:32.940 has no parent, where node.parent is going to be equal to none. 00:42:32.940 --> 00:42:35.240 As I do so, I'm going to be building up the list of all 00:42:35.240 --> 00:42:38.030 of the actions that I'm following and the list of all of the cells 00:42:38.030 --> 00:42:39.590 that are part of the solution. 00:42:39.590 --> 00:42:42.020 But I'll reverse them because when I build it 00:42:42.020 --> 00:42:44.930 up going from the goal back to the initial state, 00:42:44.930 --> 00:42:48.020 I'm building the sequence of actions from the goal to the initial state, 00:42:48.020 --> 00:42:50.900 but I want to reverse them in order to get the sequence of actions 00:42:50.900 --> 00:42:53.630 from the initial state to the goal. 00:42:53.630 --> 00:42:57.210 And that is, ultimately, going to be the solution. 00:42:57.210 --> 00:43:01.280 So all of that happens if the current state is equal to the goal. 00:43:01.280 --> 00:43:03.290 And otherwise, if it's not the goal, well, 00:43:03.290 --> 00:43:06.860 then I'll go ahead and add this state to the explored set to say, 00:43:06.860 --> 00:43:08.240 I've explored this state now. 00:43:08.240 --> 00:43:11.510 No need to go back to it if I come across it in the future. 00:43:11.510 --> 00:43:14.750 And then, this logic here implements the idea 00:43:14.750 --> 00:43:16.820 of adding neighbors to the frontier. 00:43:16.820 --> 00:43:18.650 I'm saying, look at all of my neighbors. 
00:43:18.650 --> 00:43:21.530 And I implemented a function called neighbors that you can take a look at. 00:43:21.530 --> 00:43:23.690 And for each of those neighbors, I'm going to check, 00:43:23.690 --> 00:43:25.850 is the state already in the frontier? 00:43:25.850 --> 00:43:28.400 Is the state already in the explored set? 00:43:28.400 --> 00:43:32.600 And if it's not in either of those, then I'll go ahead and add this new child 00:43:32.600 --> 00:43:33.950 node-- this new node-- 00:43:33.950 --> 00:43:35.230 to the frontier. 00:43:35.230 --> 00:43:37.610 So there's a fair amount of syntax here, but the key here 00:43:37.610 --> 00:43:39.920 is not to understand all the nuances of the syntax, 00:43:39.920 --> 00:43:42.740 though feel free to take a closer look at this file on your own 00:43:42.740 --> 00:43:44.660 to get a sense for how it is working. 00:43:44.660 --> 00:43:48.140 But the key is to see how this is an implementation of the same pseudocode, 00:43:48.140 --> 00:43:53.060 the same idea that we were describing a moment ago on the screen when we were 00:43:53.060 --> 00:43:55.220 looking at the steps that we might follow in order 00:43:55.220 --> 00:43:57.720 to solve this kind of search problem. 00:43:57.720 --> 00:43:59.540 So now let's actually see this in action. 00:43:59.540 --> 00:44:05.540 I'll go ahead and run maze.py on maze1.txt, for example. 00:44:05.540 --> 00:44:09.320 And what we'll see is here we have a printout of what the maze initially 00:44:09.320 --> 00:44:10.370 looked like. 00:44:10.370 --> 00:44:13.010 And then here, down below, is after we've solved it. 00:44:13.010 --> 00:44:17.980 We had to explore 11 states in order to do it, and we found a path from A to B. 
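Putting those pieces together, the solve loop walked through here can be sketched as follows. This is a simplified, self-contained illustration rather than the exact maze.py code: a generic neighbors(state) function (an assumption of this sketch) stands in for the Maze class's neighbor logic, and compact versions of the Node and frontier classes are repeated so the example runs on its own.

```python
# Compact versions of the classes described earlier, repeated here so
# this sketch stands alone.
class Node:
    def __init__(self, state, parent, action):
        self.state, self.parent, self.action = state, parent, action


class StackFrontier:
    def __init__(self):
        self.frontier = []

    def add(self, node):
        self.frontier.append(node)

    def contains_state(self, state):
        return any(n.state == state for n in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        return self.frontier.pop()      # last in, first out -> DFS


class QueueFrontier(StackFrontier):
    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        return self.frontier.pop(0)     # first in, first out -> BFS


def solve(start, goal, neighbors, Frontier=StackFrontier):
    """Search for a path from start to goal.

    neighbors(state) should yield (action, next_state) pairs.
    Frontier picks the algorithm: StackFrontier for depth-first
    search, QueueFrontier for breadth-first search.
    """
    num_explored = 0

    # Start with a frontier that contains just the initial state,
    # and an explored set that is initially empty.
    frontier = Frontier()
    frontier.add(Node(state=start, parent=None, action=None))
    explored = set()

    while True:
        # If nothing is left in the frontier, there is no solution.
        if frontier.empty():
            raise Exception("no solution")

        # Remove a node from the frontier and count it as explored.
        node = frontier.remove()
        num_explored += 1

        # Goal test: if this node is the goal, backtrack through the
        # parents to collect the actions and cells, then reverse them
        # so they run from the initial state to the goal.
        if node.state == goal:
            actions, cells = [], []
            while node.parent is not None:
                actions.append(node.action)
                cells.append(node.state)
                node = node.parent
            actions.reverse()
            cells.reverse()
            return actions, cells, num_explored

        # Otherwise, mark the state as explored and add each neighbor
        # that is in neither the frontier nor the explored set.
        explored.add(node.state)
        for action, state in neighbors(node.state):
            if not frontier.contains_state(state) and state not in explored:
                frontier.add(Node(state=state, parent=node, action=action))
```

The only difference between the two algorithms in this sketch is which frontier class gets passed in; everything else about the loop stays the same.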
00:44:17.980 --> 00:44:21.110 And in this program, I just happened to generate a graphical representation 00:44:21.110 --> 00:44:22.040 of this, as well-- 00:44:22.040 --> 00:44:25.400 so I can open up maze.png, which is generated by this program-- 00:44:25.400 --> 00:44:28.820 that shows you where, in the darker color here, the wall is. 00:44:28.820 --> 00:44:30.860 Red is the initial state, green is the goal, 00:44:30.860 --> 00:44:32.930 and yellow is the path that was followed. 00:44:32.930 --> 00:44:37.230 We found a path from the initial state to the goal. 00:44:37.230 --> 00:44:40.050 But now let's take a look at a more sophisticated maze 00:44:40.050 --> 00:44:42.120 to see what might happen instead. 00:44:42.120 --> 00:44:47.010 Let's look now at maze2.txt, where now here we have a much larger maze. 00:44:47.010 --> 00:44:50.280 Again, we're trying to find our way from point A to point B, 00:44:50.280 --> 00:44:53.520 but now you might imagine that depth-first search might not be so lucky. 00:44:53.520 --> 00:44:56.010 It might not get to the goal on the first try. 00:44:56.010 --> 00:44:58.560 It might have to follow one path, then backtrack 00:44:58.560 --> 00:45:02.100 and explore something else a little bit later. 00:45:02.100 --> 00:45:03.230 So let's try this. 00:45:03.230 --> 00:45:08.930 Run python maze.py maze2.txt, this time trying it on this other maze. 00:45:08.930 --> 00:45:12.140 And now depth-first search is able to find a solution. 00:45:12.140 --> 00:45:16.040 Here, as indicated by the stars, is a way to get from A to B. 00:45:16.040 --> 00:45:19.430 And we can represent this visually by opening up this maze. 00:45:19.430 --> 00:45:20.810 Here's what that maze looks like. 00:45:20.810 --> 00:45:24.860 And highlighted in yellow is the path that was found from the initial state 00:45:24.860 --> 00:45:26.300 to the goal. 00:45:26.300 --> 00:45:31.340 But how many states did we have to explore before we found that path?
00:45:31.340 --> 00:45:34.610 Well, recall that, in my program, I was keeping track of the number of states 00:45:34.610 --> 00:45:36.530 that we've explored so far. 00:45:36.530 --> 00:45:40.280 And so I can go back to the terminal and see that, all right, in order 00:45:40.280 --> 00:45:46.110 to solve this problem, we had to explore 399 different states. 00:45:46.110 --> 00:45:48.860 And in fact, if I make one small modification to the program 00:45:48.860 --> 00:45:51.950 and tell the program, at the end when we output this image, 00:45:51.950 --> 00:45:55.160 to show the explored states-- I add an argument called show_explored. 00:45:55.160 --> 00:45:57.980 And if I set show_explored equal to true 00:45:57.980 --> 00:46:02.720 and rerun this program, python maze.py, running it on maze2, 00:46:02.720 --> 00:46:06.320 and then I open the maze, what you'll see here, highlighted in red, 00:46:06.320 --> 00:46:10.610 are all of the states that had to be explored to get from the initial state 00:46:10.610 --> 00:46:11.510 to the goal. 00:46:11.510 --> 00:46:15.140 Depth-first search, or DFS, didn't find its way to the goal right away. 00:46:15.140 --> 00:46:18.170 It made a choice to first explore this direction. 00:46:18.170 --> 00:46:19.970 And when it explored this direction, it had 00:46:19.970 --> 00:46:22.280 to follow every conceivable path, all the way 00:46:22.280 --> 00:46:24.680 to the very end, even this long and winding one, 00:46:24.680 --> 00:46:27.440 in order to realize that, you know what, that's a dead end. 00:46:27.440 --> 00:46:29.720 And instead, the program needed to backtrack. 00:46:29.720 --> 00:46:32.660 After going this direction, it must have gone this direction. 00:46:32.660 --> 00:46:35.490 It got lucky here by just not choosing this path.
00:46:35.490 --> 00:46:39.290 But it got unlucky here, exploring this direction, exploring a bunch of states 00:46:39.290 --> 00:46:41.150 that it didn't need to and then, likewise, 00:46:41.150 --> 00:46:43.490 exploring all of this top part of the graph 00:46:43.490 --> 00:46:46.320 when it probably didn't need to do that either. 00:46:46.320 --> 00:46:49.070 So all in all, depth-first search here really 00:46:49.070 --> 00:46:52.970 not performing optimally, or probably exploring more states than it needs to. 00:46:52.970 --> 00:46:56.600 It finds an optimal solution, the best path to the goal, 00:46:56.600 --> 00:46:59.510 but the number of states needed to explore in order to do so, 00:46:59.510 --> 00:47:03.060 the number of steps I had to take, that was much higher. 00:47:03.060 --> 00:47:04.070 So let's compare. 00:47:04.070 --> 00:47:09.060 How would Breadth-First Search, or BFS, do on this exact same maze instead? 00:47:09.060 --> 00:47:11.630 And in order to do so, it's a very easy change. 00:47:11.630 --> 00:47:16.550 The algorithm for DFS and BFS is identical with the exception 00:47:16.550 --> 00:47:20.950 of what data structure we use to represent the frontier. 00:47:20.950 --> 00:47:23.840 That in DFS I used a stack frontier-- 00:47:23.840 --> 00:47:25.880 last in, first out-- 00:47:25.880 --> 00:47:30.380 whereas in BFS, I'm going to use a queue frontier-- first in, 00:47:30.380 --> 00:47:33.260 first out, where the first thing I add to the frontier 00:47:33.260 --> 00:47:35.570 is the first thing that I remove. 00:47:35.570 --> 00:47:40.670 So I'll go back to the terminal, rerun this program on the same maze, 00:47:40.670 --> 00:47:45.280 and now you'll see that the number of states we had to explore was only 77, 00:47:45.280 --> 00:47:49.140 as compared to almost 400 when we used depth-first search. 00:47:49.140 --> 00:47:50.330 And we can see exactly why. 
00:47:50.330 --> 00:47:54.980 We can see what happened if we open up maze.png now and take a look. 00:47:54.980 --> 00:47:59.540 Again, the yellow highlight is the solution that breadth-first search found, 00:47:59.540 --> 00:48:03.020 which, incidentally, is the same solution that depth-first search found. 00:48:03.020 --> 00:48:07.110 They're both finding the best solution, but notice all the white unexplored 00:48:07.110 --> 00:48:07.610 cells. 00:48:07.610 --> 00:48:10.700 There were far fewer states that needed to be explored 00:48:10.700 --> 00:48:14.980 in order to make our way to the goal, because breadth-first search operates 00:48:14.980 --> 00:48:15.980 a little more shallowly. 00:48:15.980 --> 00:48:19.070 It's exploring things that are close to the initial state 00:48:19.070 --> 00:48:22.170 without exploring things that are further away. 00:48:22.170 --> 00:48:25.220 So if the goal is not too far away, then breadth-first search 00:48:25.220 --> 00:48:27.950 can actually behave quite effectively on a maze that 00:48:27.950 --> 00:48:30.870 looks a little something like this. 00:48:30.870 --> 00:48:35.750 Now, in this case, both BFS and DFS ended up finding the same solution, 00:48:35.750 --> 00:48:37.680 but that won't always be the case. 00:48:37.680 --> 00:48:43.390 And in fact, let's take a look at one more example, for instance, maze3.txt. 00:48:43.390 --> 00:48:46.970 In maze3.txt, notice that here there are multiple ways 00:48:46.970 --> 00:48:49.190 that you could get from A to B. 00:48:49.190 --> 00:48:52.040 It's a relatively small maze, but let's look at what happens. 00:48:52.040 --> 00:48:55.670 If I use-- and I'll go ahead and turn off show_explored, so 00:48:55.670 --> 00:48:58.390 we just see the solution. 00:48:58.390 --> 00:49:04.590 If I use BFS, breadth-first search, to solve maze3.txt, 00:49:04.590 --> 00:49:06.420 well, then we find a solution. 00:49:06.420 --> 00:49:09.530 And if I open up the maze, here's the solution that we found.
00:49:09.530 --> 00:49:10.610 It is the optimal one. 00:49:10.610 --> 00:49:13.700 With just four steps, we can get from the initial state 00:49:13.700 --> 00:49:17.090 to where the goal happens to be. 00:49:17.090 --> 00:49:21.890 But what happens if we try to use depth-first search, or DFS, instead? 00:49:21.890 --> 00:49:26.540 Well, again, I'll go back up to my QueueFrontier, where a queue frontier means 00:49:26.540 --> 00:49:28.850 that we're using breadth-first search. 00:49:28.850 --> 00:49:32.120 And I'll change it to a StackFrontier, which means that now we'll 00:49:32.120 --> 00:49:34.850 be using depth-first search. 00:49:34.850 --> 00:49:37.910 I'll rerun python maze.py. 00:49:37.910 --> 00:49:40.490 And now you'll see that we find a solution, 00:49:40.490 --> 00:49:42.980 but it is not the optimal solution. 00:49:42.980 --> 00:49:45.640 This, instead, is what our algorithm finds. 00:49:45.640 --> 00:49:48.140 And maybe depth-first search would have found the optimal solution. 00:49:48.140 --> 00:49:51.410 It's possible, but it's not guaranteed. If we just 00:49:51.410 --> 00:49:55.280 happen to be unlucky-- if we choose this state instead of that state-- 00:49:55.280 --> 00:49:58.280 then depth-first search might find a longer route to get 00:49:58.280 --> 00:50:01.260 from the initial state to the goal. 00:50:01.260 --> 00:50:04.280 So we do see some trade-offs here, where depth-first search might not 00:50:04.280 --> 00:50:06.240 find the optimal solution. 00:50:06.240 --> 00:50:09.080 So at that point, it seems like breadth-first search is pretty good. 00:50:09.080 --> 00:50:12.920 Is that the best we can do, where it's going to find us the optimal solution 00:50:12.920 --> 00:50:15.620 and we don't have to worry about situations where 00:50:15.620 --> 00:50:20.270 we might end up finding a longer path to the solution than what actually exists?
00:50:20.270 --> 00:50:23.150 Well, in a case where the goal is far away from the initial state-- 00:50:23.150 --> 00:50:26.780 and we might have to take lots of steps in order to get from the initial state 00:50:26.780 --> 00:50:27.620 to the goal-- 00:50:27.620 --> 00:50:31.220 what ended up happening is that this algorithm, BFS, ended up 00:50:31.220 --> 00:50:35.480 exploring basically the entire graph, having to go through the entire maze 00:50:35.480 --> 00:50:39.800 in order to find its way from the initial state to the goal state. 00:50:39.800 --> 00:50:41.960 What we'd ultimately like is for our algorithm 00:50:41.960 --> 00:50:44.320 to be a little bit more intelligent. 00:50:44.320 --> 00:50:46.310 Now, what would it mean for our algorithm 00:50:46.310 --> 00:50:49.820 to be a little bit more intelligent, in this case? 00:50:49.820 --> 00:50:52.490 Well, let's look back to where breadth-first search might 00:50:52.490 --> 00:50:54.290 have been able to make a different decision 00:50:54.290 --> 00:50:57.570 and consider human intuition in this process, as well. 00:50:57.570 --> 00:51:01.640 Like, what might a human do when solving this maze that is different than what 00:51:01.640 --> 00:51:04.490 BFS ultimately chose to do? 00:51:04.490 --> 00:51:07.610 Well, the very first decision point that BFS made 00:51:07.610 --> 00:51:11.420 was right here, after it made five steps and ended up 00:51:11.420 --> 00:51:13.340 in a position where it had a fork in the road. 00:51:13.340 --> 00:51:15.210 It could either go left or it could go right. 00:51:15.210 --> 00:51:17.460 In these initial couple of steps, there was no choice. 00:51:17.460 --> 00:51:20.790 There was only one action that could be taken from each of those states. 00:51:20.790 --> 00:51:23.030 And so the search algorithm did the only thing 00:51:23.030 --> 00:51:25.010 that any search algorithm could do, which 00:51:25.010 --> 00:51:28.400 is keep following one state after the next.
00:51:28.400 --> 00:51:31.670 But this decision point is where things get a little bit interesting. 00:51:31.670 --> 00:51:34.850 Depth-first search, that very first search algorithm we looked at, 00:51:34.850 --> 00:51:38.750 chose to say, let's pick one path and exhaust that path, 00:51:38.750 --> 00:51:42.560 see if anything that way has the goal, and if not, then let's 00:51:42.560 --> 00:51:43.800 try the other way. 00:51:43.800 --> 00:51:46.580 Breadth-first search took the alternative approach of saying, 00:51:46.580 --> 00:51:47.210 you know what? 00:51:47.210 --> 00:51:51.510 Let's explore things that are shallow, close to us first, look left and right, 00:51:51.510 --> 00:51:53.960 then back left and back right, so on and so forth, 00:51:53.960 --> 00:51:58.640 alternating between our options in the hopes of finding something nearby. 00:51:58.640 --> 00:52:02.390 But ultimately, what might a human do if confronted with a situation like this 00:52:02.390 --> 00:52:04.250 of go left or go right? 00:52:04.250 --> 00:52:07.010 Well, a human might visually see that, all right, 00:52:07.010 --> 00:52:11.360 I'm trying to get to state B, which is way up there, and going right just 00:52:11.360 --> 00:52:13.360 feels like it's closer to the goal. 00:52:13.360 --> 00:52:15.140 Like, it feels like going right should be 00:52:15.140 --> 00:52:17.870 better than going left because I'm making progress 00:52:17.870 --> 00:52:19.380 towards getting to that goal. 00:52:19.380 --> 00:52:22.340 Now, of course, there are a couple of assumptions that I'm making here. 00:52:22.340 --> 00:52:25.220 I'm making the assumption that we can represent 00:52:25.220 --> 00:52:27.080 this grid as, like, a two-dimensional grid, 00:52:27.080 --> 00:52:28.880 where I know the coordinates of everything. 00:52:28.880 --> 00:52:33.940 I know that A is in coordinate 0,0, and B is in some other coordinate pair. 
00:52:33.940 --> 00:52:37.070 And I know what coordinate I'm at now, so I can calculate that, yeah, going 00:52:37.070 --> 00:52:39.420 this way, that is closer to the goal. 00:52:39.420 --> 00:52:42.170 And that might be a reasonable assumption for some types of search 00:52:42.170 --> 00:52:44.060 problems, but maybe not in others. 00:52:44.060 --> 00:52:46.670 But for now, we'll go ahead and assume 00:52:46.670 --> 00:52:51.590 that I know what my current coordinate pair is, and I know the x,y coordinate 00:52:51.590 --> 00:52:53.680 of the goal that I'm trying to get to. 00:52:53.680 --> 00:52:56.540 And in this situation, I'd like an algorithm that 00:52:56.540 --> 00:52:59.090 is a little bit more intelligent and somehow knows 00:52:59.090 --> 00:53:02.180 that I should be making progress towards the goal, 00:53:02.180 --> 00:53:05.720 and this is probably the way to do that because, in a maze, 00:53:05.720 --> 00:53:08.480 moving in the coordinate direction of the goal 00:53:08.480 --> 00:53:11.880 is usually, though not always, a good thing. 00:53:11.880 --> 00:53:14.840 And so here we draw a distinction between two different types of search 00:53:14.840 --> 00:53:19.040 algorithms-- uninformed search and informed search. 00:53:19.040 --> 00:53:23.330 Uninformed search algorithms are algorithms like DFS and BFS, 00:53:23.330 --> 00:53:25.200 the two algorithms that we just looked at, 00:53:25.200 --> 00:53:29.540 which are search strategies that don't use any problem-specific knowledge 00:53:29.540 --> 00:53:31.400 to be able to solve the problem. 00:53:31.400 --> 00:53:34.610 DFS and BFS didn't really care about the structure 00:53:34.610 --> 00:53:38.270 of the maze or anything about the way that a maze is laid out in order 00:53:38.270 --> 00:53:39.330 to solve the problem.
00:53:39.330 --> 00:53:42.490 They just look at the actions available and choose from those actions, 00:53:42.490 --> 00:53:45.290 and it doesn't matter whether it's a maze or some other problem. 00:53:45.290 --> 00:53:48.020 The solution, or the way that it tries to solve the problem, 00:53:48.020 --> 00:53:51.320 is really fundamentally going to be the same. 00:53:51.320 --> 00:53:53.030 What we're going to take a look at now is 00:53:53.030 --> 00:53:55.370 an improvement upon uninformed search. 00:53:55.370 --> 00:53:57.830 We're going to take a look at informed search. 00:53:57.830 --> 00:54:00.440 Informed search algorithms are search strategies that 00:54:00.440 --> 00:54:05.800 use knowledge specific to the problem to be able to better find a solution. 00:54:05.800 --> 00:54:09.270 And in the case of a maze, this problem-specific knowledge 00:54:09.270 --> 00:54:11.270 is something like: being in a square 00:54:11.270 --> 00:54:14.420 that is geographically closer to the goal 00:54:14.420 --> 00:54:19.700 is better than being in a square that is geographically further away. 00:54:19.700 --> 00:54:23.330 And this is something we can only know by thinking about this problem 00:54:23.330 --> 00:54:27.820 and reasoning about what knowledge might be helpful for our AI agent 00:54:27.820 --> 00:54:30.060 to know a little something about. 00:54:30.060 --> 00:54:32.440 There are a number of different types of informed search. 00:54:32.440 --> 00:54:35.140 Specifically, first, we're going to look at a particular type 00:54:35.140 --> 00:54:39.550 of search algorithm called greedy best-first search.
00:54:39.550 --> 00:54:42.700 Greedy best-first search, often abbreviated GBFS, 00:54:42.700 --> 00:54:45.970 is a search algorithm that, instead of expanding the deepest node, 00:54:45.970 --> 00:54:49.660 like DFS, or the shallowest node, like BFS, 00:54:49.660 --> 00:54:52.390 is always going to expand the node 00:54:52.390 --> 00:54:55.990 that it thinks is closest to the goal. 00:54:55.990 --> 00:54:59.650 Now, the search algorithm isn't going to know for sure whether it is the closest 00:54:59.650 --> 00:55:02.650 thing to the goal, because if we knew what was closest to the goal 00:55:02.650 --> 00:55:05.100 all the time, then we would already have a solution. 00:55:05.100 --> 00:55:07.150 Like, if we had the knowledge of what is close to the goal, 00:55:07.150 --> 00:55:10.570 we could just follow those steps in order to get from the initial position 00:55:10.570 --> 00:55:11.870 to the solution. 00:55:11.870 --> 00:55:14.650 But if we don't know the solution-- meaning we don't know exactly 00:55:14.650 --> 00:55:16.450 what's closest to the goal-- 00:55:16.450 --> 00:55:19.120 instead, we can use an estimate of what's 00:55:19.120 --> 00:55:22.180 closest to the goal, otherwise known as a heuristic-- 00:55:22.180 --> 00:55:25.910 just some way of estimating whether or not we're close to the goal. 00:55:25.910 --> 00:55:29.800 And we'll do so using a heuristic function, conventionally called h(n), 00:55:29.800 --> 00:55:34.540 that takes a state as input and returns our estimate of how close we 00:55:34.540 --> 00:55:36.860 are to the goal. 00:55:36.860 --> 00:55:39.040 So what might this heuristic function actually 00:55:39.040 --> 00:55:42.100 look like in the case of a maze-solving algorithm? 00:55:42.100 --> 00:55:45.490 Where we're trying to solve a maze, what does a heuristic look like?
00:55:45.490 --> 00:55:48.910 Well, the heuristic needs to answer a question, like, between these two 00:55:48.910 --> 00:55:51.770 cells, C and D, which one is better? 00:55:51.770 --> 00:55:55.640 Which one would I rather be in if I'm trying to find my way to the goal? 00:55:55.640 --> 00:55:58.640 Well, any human could probably look at this and tell you, you know what? 00:55:58.640 --> 00:56:00.280 D looks like it's better. 00:56:00.280 --> 00:56:03.550 Even if the maze is convoluted and you haven't thought about all the walls, 00:56:03.550 --> 00:56:05.410 D is probably better. 00:56:05.410 --> 00:56:06.710 And why is D better? 00:56:06.710 --> 00:56:09.730 Well, because if you ignore the walls-- let's just pretend the walls 00:56:09.730 --> 00:56:14.290 don't exist for a moment and relax the problem, so to speak-- 00:56:14.290 --> 00:56:18.670 D, just in terms of coordinate pairs, is closer to this goal. 00:56:18.670 --> 00:56:21.700 It's fewer steps that I would need to take to get to the goal, 00:56:21.700 --> 00:56:24.160 as compared to C, even if you ignore the walls. 00:56:24.160 --> 00:56:29.160 If you just know the x,y coordinate of C, and the x,y coordinate of the goal, 00:56:29.160 --> 00:56:31.450 and likewise, you know the x,y coordinate of D, 00:56:31.450 --> 00:56:35.770 you can calculate that D, just geographically, ignoring the walls, 00:56:35.770 --> 00:56:37.110 looks like it's better. 00:56:37.110 --> 00:56:39.700 And so this is the heuristic function that we're going to use, 00:56:39.700 --> 00:56:42.880 and it's something called the Manhattan distance, one specific type 00:56:42.880 --> 00:56:46.690 of heuristic, where the heuristic is the number of squares vertically 00:56:46.690 --> 00:56:49.270 and horizontally-- so not 00:56:49.270 --> 00:56:53.320 allowing myself to go diagonally, just moving either up, down, left, or right.
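That Manhattan distance heuristic can be written as a small helper function — a sketch assuming states and goals are (x, y) coordinate pairs:

```python
# Manhattan distance heuristic, as described above: ignore the walls
# and count how many vertical plus horizontal steps separate a cell
# from the goal (no diagonal moves allowed).
def manhattan_distance(state, goal):
    x1, y1 = state
    x2, y2 = goal
    return abs(x1 - x2) + abs(y1 - y2)
```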
00:56:53.320 --> 00:56:58.030 How many steps do I need to take to get from each of these cells to the goal? 00:56:58.030 --> 00:57:00.910 Well, as it turns out, D is much closer. 00:57:00.910 --> 00:57:01.870 There are fewer steps. 00:57:01.870 --> 00:57:05.740 It only takes six steps in order to get to that goal. 00:57:05.740 --> 00:57:07.670 Again, here we're ignoring the walls. 00:57:07.670 --> 00:57:09.790 We've relaxed the problem a little bit. 00:57:09.790 --> 00:57:12.510 We're just concerned with, if you do the math, 00:57:12.510 --> 00:57:14.470 subtract the x values from each other and the y 00:57:14.470 --> 00:57:18.010 values from each other, what is our estimate of how far we are away? 00:57:18.010 --> 00:57:22.970 We can estimate that D is closer to the goal than C is. 00:57:22.970 --> 00:57:24.800 And so now we have an approach. 00:57:24.800 --> 00:57:27.890 We have a way of picking which node to remove from the frontier. 00:57:27.890 --> 00:57:29.870 And at each stage in our algorithm, we're 00:57:29.870 --> 00:57:31.580 going to remove a node from the frontier. 00:57:31.580 --> 00:57:34.820 We're going to explore the node that has the smallest 00:57:34.820 --> 00:57:37.970 value for this heuristic function, the smallest 00:57:37.970 --> 00:57:40.740 Manhattan distance to the goal. 00:57:40.740 --> 00:57:42.510 And so what would this actually look like? 00:57:42.510 --> 00:57:45.320 Well, let me first label this graph, label this maze, 00:57:45.320 --> 00:57:48.050 with a number representing the value of this heuristic 00:57:48.050 --> 00:57:51.860 function, the value of the Manhattan distance from any of these cells. 00:57:51.860 --> 00:57:55.070 So from this cell, for example, we're one away from the goal. 00:57:55.070 --> 00:57:56.920 From this cell, we're two away from the goal. 00:57:56.920 --> 00:57:58.400 Three away, four away.
00:57:58.400 --> 00:58:02.210 Here we're five away, because we have to go one to the right and then four up. 00:58:02.210 --> 00:58:05.850 From somewhere like here, the Manhattan distance is 2. 00:58:05.850 --> 00:58:08.330 We're only two squares away from the goal, 00:58:08.330 --> 00:58:10.730 geographically, even though in practice we're 00:58:10.730 --> 00:58:13.910 going to have to take a longer path, but we don't know that yet. 00:58:13.910 --> 00:58:16.760 The heuristic is just some easy way to estimate 00:58:16.760 --> 00:58:18.380 how far we are away from the goal. 00:58:18.380 --> 00:58:21.380 And maybe our heuristic is overly optimistic. 00:58:21.380 --> 00:58:23.660 It thinks that, yeah, we're only two steps away, 00:58:23.660 --> 00:58:27.450 when in practice, when you consider the walls, it might be more steps. 00:58:27.450 --> 00:58:31.400 So the important thing here is that the heuristic isn't a guarantee 00:58:31.400 --> 00:58:33.290 of how many steps it's going to take. 00:58:33.290 --> 00:58:34.910 It is estimating. 00:58:34.910 --> 00:58:36.920 It's an attempt at trying to approximate. 00:58:36.920 --> 00:58:40.760 And it does seem generally the case that the squares that look closer 00:58:40.760 --> 00:58:43.910 to the goal have smaller values for the heuristic function 00:58:43.910 --> 00:58:46.960 than squares that are further away. 00:58:46.960 --> 00:58:52.110 So now, using greedy best-first search, what might this algorithm actually do? 00:58:52.110 --> 00:58:55.270 Well, again, for these first five steps, there's not much of a choice. 00:58:55.270 --> 00:58:57.610 We start in this initial state, A. And we say, all right. 00:58:57.610 --> 00:59:00.300 We have to explore these five states. 00:59:00.300 --> 00:59:01.890 But now we have a decision point. 00:59:01.890 --> 00:59:04.590 Now we have a choice between going left and going right.
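The Manhattan distance just described is simple to compute. Here is a minimal sketch in Python, assuming each maze cell is represented as an (x, y) coordinate pair; the function name manhattan_distance is illustrative, not from the course's distribution code:

```python
def manhattan_distance(cell, goal):
    """Estimate of the cost from cell to goal, ignoring walls.

    Counts the squares we'd have to move horizontally plus vertically
    (no diagonal moves) -- the "relaxed" problem where walls don't exist.
    """
    (x1, y1), (x2, y2) = cell, goal
    return abs(x1 - x2) + abs(y1 - y2)
```

So a cell at (0, 0) with the goal at (2, 4) gets a heuristic value of 2 + 4 = 6, no matter how many walls actually sit in between.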
00:59:04.590 --> 00:59:08.760 And before, when DFS and BFS would just pick arbitrarily because it just 00:59:08.760 --> 00:59:11.610 depends on the order you throw these two nodes into the frontier-- 00:59:11.610 --> 00:59:15.240 and we didn't specify what order you put them into the frontier, only the order 00:59:15.240 --> 00:59:17.250 you take them out. 00:59:17.250 --> 00:59:20.580 Here we can look at 13 and 11 and say that, all right, 00:59:20.580 --> 00:59:24.630 this square is a distance of 11 away from the goal, 00:59:24.630 --> 00:59:27.300 according to our heuristic, according to our estimate. 00:59:27.300 --> 00:59:31.560 And this one we estimate to be 13 away from the goal. 00:59:31.560 --> 00:59:34.650 So between those two options, between these two choices, 00:59:34.650 --> 00:59:36.150 I'd rather have the 11. 00:59:36.150 --> 00:59:40.240 I'd rather be 11 steps away from the goal, so I'll go to the right. 00:59:40.240 --> 00:59:44.340 We're able to make an informed decision because we know a little something more 00:59:44.340 --> 00:59:45.670 about this problem. 00:59:45.670 --> 00:59:47.820 So then we keep following 10, 9, 8-- 00:59:47.820 --> 00:59:49.470 between the two sevens. 00:59:49.470 --> 00:59:51.850 We don't really have much of a way to know between those. 00:59:51.850 --> 00:59:53.880 So then we do just have to make an arbitrary choice. 00:59:53.880 --> 00:59:54.630 And you know what? 00:59:54.630 --> 00:59:55.560 Maybe we choose wrong. 00:59:55.560 --> 01:00:00.110 But that's OK because now we can still say, all right, let's try this seven. 01:00:00.110 --> 01:00:01.820 We say seven, six. 01:00:01.820 --> 01:00:03.990 We have to make this choice even though it increases 01:00:03.990 --> 01:00:05.640 the value of the heuristic function. 01:00:05.640 --> 01:00:08.970 But now we have another decision point between six and eight. 
01:00:08.970 --> 01:00:10.290 And between those two-- 01:00:10.290 --> 01:00:13.380 and really, we're also considering the 13, but that's much higher. 01:00:13.380 --> 01:00:16.200 Between six, eight, and 13, well, the six 01:00:16.200 --> 01:00:18.900 is the smallest value, so we'd rather take the six. 01:00:18.900 --> 01:00:22.440 We're able to make an informed decision that going this way to the right 01:00:22.440 --> 01:00:24.840 is probably better than going that way. 01:00:24.840 --> 01:00:25.680 So we turn this way. 01:00:25.680 --> 01:00:26.850 We go to five. 01:00:26.850 --> 01:00:29.190 And now we find a decision point where we'll actually 01:00:29.190 --> 01:00:31.200 make a decision that we might not want to make, 01:00:31.200 --> 01:00:34.320 but there's unfortunately not too much of a way around this. 01:00:34.320 --> 01:00:35.670 We see four and six. 01:00:35.670 --> 01:00:37.620 Four looks closer to the goal, right? 01:00:37.620 --> 01:00:40.170 It's going up, and the goal is further up. 01:00:40.170 --> 01:00:43.710 So we end up taking that route, which ultimately leads us to a dead end. 01:00:43.710 --> 01:00:46.960 But that's OK because we can still say, all right, now let's try the six, 01:00:46.960 --> 01:00:51.380 and now follow this route that will ultimately lead us to the goal. 01:00:51.380 --> 01:00:54.510 And so this now is how greedy best-first search might 01:00:54.510 --> 01:00:56.760 try to approach this problem, by saying whenever 01:00:56.760 --> 01:01:00.060 we have a decision between multiple nodes that we could explore, 01:01:00.060 --> 01:01:04.290 let's explore the node that has the smallest value of h(n), 01:01:04.290 --> 01:01:09.210 this heuristic function that is estimating how far I have to go. 01:01:09.210 --> 01:01:11.010 And it just so happens that, in this case, 01:01:11.010 --> 01:01:15.010 we end up doing better, in terms of the number of states we needed to explore, 01:01:15.010 --> 01:01:16.630 than BFS needed to. 
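The frontier discipline just walked through -- always remove the node with the smallest h(n) -- can be sketched in a few lines of Python. This is only a sketch, not the course's own source code: the names greedy_best_first_search, neighbors, and h are my own, states are assumed to be hashable tuples, and neighbors(state) is assumed to yield the legal successor states:

```python
import heapq

def greedy_best_first_search(start, goal, neighbors, h):
    """Greedy best-first search: always expand the frontier node with
    the smallest heuristic value h(n). Finds *a* path if one exists,
    but not necessarily the shortest one."""
    frontier = [(h(start), start)]     # min-heap ordered by h(n)
    came_from = {start: None}          # parent pointers; also the explored set
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == goal:
            path = []                  # reconstruct by walking parents back
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in came_from:
                came_from[nxt] = state
                heapq.heappush(frontier, (h(nxt), nxt))
    return None
```

With the Manhattan distance as h, ties on the heap break arbitrarily -- which matches the "arbitrary choice between the two sevens" situation in the maze above.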
01:01:16.630 --> 01:01:19.950 BFS explored all of this section and all of that section. 01:01:19.950 --> 01:01:22.620 But we were able to eliminate that by taking advantage 01:01:22.620 --> 01:01:26.040 of this heuristic, this knowledge about how close we 01:01:26.040 --> 01:01:30.330 are to the goal or some estimate of that idea. 01:01:30.330 --> 01:01:31.450 So this seems much better. 01:01:31.450 --> 01:01:33.150 So wouldn't we always prefer an algorithm 01:01:33.150 --> 01:01:36.870 like this over an algorithm like breadth-first search? 01:01:36.870 --> 01:01:37.850 Well, maybe. 01:01:37.850 --> 01:01:39.810 One thing to take into consideration is that we 01:01:39.810 --> 01:01:42.030 need to come up with a good heuristic. 01:01:42.030 --> 01:01:45.720 How good the heuristic is is going to affect how good this algorithm is. 01:01:45.720 --> 01:01:49.740 And coming up with a good heuristic can oftentimes be challenging. 01:01:49.740 --> 01:01:51.450 But the other thing to consider is to ask 01:01:51.450 --> 01:01:54.630 the question, just as we did with the prior two algorithms, 01:01:54.630 --> 01:01:56.580 is this algorithm optimal? 01:01:56.580 --> 01:02:02.280 Will it always find the shortest path from the initial state to the goal? 01:02:02.280 --> 01:02:06.180 And to answer that question, let's take a look at this example for a moment. 01:02:06.180 --> 01:02:07.570 Take a look at this example. 01:02:07.570 --> 01:02:10.170 Again, we're trying to get from A to B, and again, I've 01:02:10.170 --> 01:02:13.260 labeled each of the cells with their Manhattan distance 01:02:13.260 --> 01:02:16.320 from the goal, the number of squares up and to the right 01:02:16.320 --> 01:02:20.500 you would need to travel in order to get from that square to the goal. 01:02:20.500 --> 01:02:23.400 And let's think about, would greedy best-first search 01:02:23.400 --> 01:02:29.400 that always picks the smallest number end up finding the optimal solution? 
01:02:29.400 --> 01:02:33.520 What is the shortest solution, and would this algorithm find it? 01:02:33.520 --> 01:02:38.190 And the important thing to realize is that right here is the decision point. 01:02:38.190 --> 01:02:40.710 We're estimated to be 12 away from the goal. 01:02:40.710 --> 01:02:42.310 And we have two choices. 01:02:42.310 --> 01:02:45.690 We can go to the left, which we estimate to be 13 away from the goal, 01:02:45.690 --> 01:02:49.710 or we can go up, where we estimate it to be 11 away from the goal. 01:02:49.710 --> 01:02:53.640 And between those two, greedy best-first search is going to say, 01:02:53.640 --> 01:02:57.000 the 11 looks better than the 13. 01:02:57.000 --> 01:02:59.540 And in doing so, greedy best-first search 01:02:59.540 --> 01:03:02.820 will end up finding this path to the goal. 01:03:02.820 --> 01:03:04.980 But it turns out this path is not optimal. 01:03:04.980 --> 01:03:07.470 There is a way to get to the goal using fewer steps. 01:03:07.470 --> 01:03:12.420 And it's actually this way, this way that ultimately involves fewer steps, 01:03:12.420 --> 01:03:15.900 even though it meant at this moment choosing the worse 01:03:15.900 --> 01:03:19.770 option between the two-- or what we estimated to be the worse option, based 01:03:19.770 --> 01:03:21.150 on the heuristic. 01:03:21.150 --> 01:03:23.910 And so this is what we mean by this is a greedy algorithm. 01:03:23.910 --> 01:03:26.460 It's making the best decision, locally. 01:03:26.460 --> 01:03:28.800 At this decision point, it looks like it's better 01:03:28.800 --> 01:03:31.450 to go here than it is to go to the 13. 01:03:31.450 --> 01:03:34.040 But in the big picture, it's not necessarily optimal, 01:03:34.040 --> 01:03:37.200 that it might find a solution when in actuality there 01:03:37.200 --> 01:03:40.150 was a better solution available. 01:03:40.150 --> 01:03:43.200 So we would like some way to solve this problem.
01:03:43.200 --> 01:03:45.840 We like the idea of this heuristic, of being 01:03:45.840 --> 01:03:50.080 able to estimate the path, the distance between us and the goal, 01:03:50.080 --> 01:03:52.290 and that helps us to be able to make better decisions 01:03:52.290 --> 01:03:55.890 and to eliminate having to search through entire parts of the state 01:03:55.890 --> 01:03:57.070 space. 01:03:57.070 --> 01:04:00.160 But we would like to modify the algorithm so that we can achieve 01:04:00.160 --> 01:04:02.650 optimality, so that it can be optimal. 01:04:02.650 --> 01:04:04.060 And what is the way to do this? 01:04:04.060 --> 01:04:05.790 What is the intuition here? 01:04:05.790 --> 01:04:08.310 Well, let's take a look at this problem. 01:04:08.310 --> 01:04:11.070 In this initial problem, greedy best-first search 01:04:11.070 --> 01:04:14.170 found this solution here, this long path. 01:04:14.170 --> 01:04:17.310 And the reason why it wasn't great is because, yes, the heuristic numbers 01:04:17.310 --> 01:04:21.180 went down pretty low, but later on they started to build back up. 01:04:21.180 --> 01:04:25.780 They built back up-- 8, 9, 10, 11-- all the way up to 12, in this case. 01:04:25.780 --> 01:04:29.290 And so how might we go about trying to improve this algorithm? 01:04:29.290 --> 01:04:32.400 Well, one thing that we might realize is that, if we 01:04:32.400 --> 01:04:35.170 go all the way through this algorithm, through this path, 01:04:35.170 --> 01:04:39.340 and we end up going to the 12, and we've had to take this many steps-- like, 01:04:39.340 --> 01:04:42.640 who knows how many steps that is-- just to get to this 12, 01:04:42.640 --> 01:04:48.160 we could have also, as an alternative, taken far fewer steps, just six steps, 01:04:48.160 --> 01:04:50.170 and ended up at this 13 here. 01:04:50.170 --> 01:04:53.680 And yes, 13 is more than 12, so it looks like it's not as good, 01:04:53.680 --> 01:04:55.460 but it required far fewer steps.
01:04:55.460 --> 01:04:55.960 Right? 01:04:55.960 --> 01:04:59.530 It only took six steps to get to this 13 versus many more steps 01:04:59.530 --> 01:05:01.000 to get to this 12. 01:05:01.000 --> 01:05:04.180 And while greedy best-first search says, oh, well, 12 is better than 13 01:05:04.180 --> 01:05:07.850 so pick the 12, we might more intelligently say, 01:05:07.850 --> 01:05:10.810 I'd rather be somewhere that heuristically 01:05:10.810 --> 01:05:16.030 looks like it takes slightly longer if I can get there much more quickly. 01:05:16.030 --> 01:05:18.940 And we're going to encode that idea, this general idea, 01:05:18.940 --> 01:05:23.010 into a more formal algorithm known as A* search. 01:05:23.010 --> 01:05:25.270 A* search is going to solve this problem by, 01:05:25.270 --> 01:05:27.970 instead of just considering the heuristic, 01:05:27.970 --> 01:05:32.720 also considering how long it took us to get to any particular state. 01:05:32.720 --> 01:05:35.650 So the distinction is greedy best-first search, if I am in a state 01:05:35.650 --> 01:05:38.440 right now, the only thing I care about is 01:05:38.440 --> 01:05:42.070 what is the estimated distance, the heuristic value, between me 01:05:42.070 --> 01:05:43.000 and the goal. 01:05:43.000 --> 01:05:45.640 Whereas A* search will take into consideration 01:05:45.640 --> 01:05:46.870 two pieces of information. 01:05:46.870 --> 01:05:51.130 It'll take into consideration, how far do I estimate I am from the goal, 01:05:51.130 --> 01:05:55.030 but also how far did I have to travel in order to get here? 01:05:55.030 --> 01:05:57.490 Because that is relevant, too. 01:05:57.490 --> 01:06:00.760 So A* search is a search algorithm that expands the node with the lowest 01:06:00.760 --> 01:06:04.060 value of g(n) plus h(n).
01:06:04.060 --> 01:06:07.810 h(n) is that same heuristic that we were talking about a moment ago that's going 01:06:07.810 --> 01:06:12.640 to vary based on the problem, but g(n) is going to be the cost to reach 01:06:12.640 --> 01:06:13.420 the node-- 01:06:13.420 --> 01:06:19.380 how many steps I had to take, in this case, to get to my current position. 01:06:19.380 --> 01:06:22.070 So what does that search algorithm look like in practice? 01:06:22.070 --> 01:06:23.630 Well, let's take a look. 01:06:23.630 --> 01:06:25.130 Again, we've got the same maze. 01:06:25.130 --> 01:06:28.010 And again, I've labeled them with their Manhattan distance. 01:06:28.010 --> 01:06:32.060 This value is the h(n) value, the heuristic estimate 01:06:32.060 --> 01:06:36.240 of how far each of these squares is away from the goal. 01:06:36.240 --> 01:06:38.370 But now, as we begin to explore states, we 01:06:38.370 --> 01:06:41.370 care not just about this heuristic value but also 01:06:41.370 --> 01:06:45.650 about g(n), the number of steps I had to take in order to get there. 01:06:45.650 --> 01:06:48.060 And I care about summing those two numbers together. 01:06:48.060 --> 01:06:49.230 So what does that look like? 01:06:49.230 --> 01:06:52.850 On this very first step, I have taken one step. 01:06:52.850 --> 01:06:56.130 And now I am estimated to be 16 steps away from the goal. 01:06:56.130 --> 01:06:59.190 So the total value here is 17. 01:06:59.190 --> 01:07:00.280 Then I take one more step. 01:07:00.280 --> 01:07:02.010 I've now taken two steps. 01:07:02.010 --> 01:07:04.780 And I estimate myself to be 15 away from the goal-- 01:07:04.780 --> 01:07:06.740 again, a total value of 17. 01:07:06.740 --> 01:07:08.230 Now I've taken three steps. 01:07:08.230 --> 01:07:11.440 And I'm estimated to be 14 away from the goal, so on and so forth. 01:07:11.440 --> 01:07:13.750 Four steps, an estimate of 13. 01:07:13.750 --> 01:07:15.820 Five steps, estimate of 12. 
01:07:15.820 --> 01:07:17.980 And now, here's a decision point. 01:07:17.980 --> 01:07:22.720 I could either have taken six steps with a heuristic of 13 01:07:22.720 --> 01:07:26.440 for a total of 19, or I could have taken six steps 01:07:26.440 --> 01:07:31.640 with a heuristic of 11 for a total estimate of 17. 01:07:31.640 --> 01:07:35.230 So between 19 and 17, I'd rather take the 17-- 01:07:35.230 --> 01:07:37.170 the 6 plus 11. 01:07:37.170 --> 01:07:39.170 So so far, no different than what we saw before. 01:07:39.170 --> 01:07:42.160 We're still taking this option because it appears to be better. 01:07:42.160 --> 01:07:45.160 And I keep taking this option because it appears to be better. 01:07:45.160 --> 01:07:49.430 But it's right about here that things get a little bit different. 01:07:49.430 --> 01:07:55.630 Now I could have taken 15 steps, with an estimated distance of 6 from the goal. 01:07:55.630 --> 01:07:58.750 So 15 plus 6, total value of 21. 01:07:58.750 --> 01:08:01.870 Alternatively, I could have taken six steps-- 01:08:01.870 --> 01:08:04.690 because this was five steps, so this is six steps-- 01:08:04.690 --> 01:08:07.350 with an estimated distance of 13 to the goal. 01:08:07.350 --> 01:08:08.650 So 6 plus 13-- 01:08:08.650 --> 01:08:10.180 that's 19. 01:08:10.180 --> 01:08:14.140 So here we would evaluate g(n) plus h(n) to be 19-- 01:08:14.140 --> 01:08:20.440 6 plus 13-- whereas here, we would be 15 plus 6, or 21. 01:08:20.440 --> 01:08:23.780 And so the intuition is, 19 is less than 21, so pick here. 01:08:23.780 --> 01:08:29.190 But the idea is ultimately I'd rather have taken fewer steps to get to a 13 01:08:29.190 --> 01:08:32.610 than have taken 15 steps and be at a six, 01:08:32.610 --> 01:08:35.410 because it means I've had to take more steps in order to get there. 01:08:35.410 --> 01:08:38.520 Maybe there's a better path this way.
01:08:38.520 --> 01:08:41.050 So instead we'll explore this route. 01:08:41.050 --> 01:08:43.840 Now if we go one more-- this is seven steps plus 14, 01:08:43.840 --> 01:08:46.840 is 21, so between those two it's sort of a toss-up. 01:08:46.840 --> 01:08:48.970 We might end up exploring that one anyways. 01:08:48.970 --> 01:08:53.270 But after that, as the heuristic values up that path start to get bigger 01:08:53.270 --> 01:08:55.600 and the heuristic values down this path start to get smaller, 01:08:55.600 --> 01:08:59.100 you'll find that we'll actually keep exploring down this path. 01:08:59.100 --> 01:09:02.290 And you can do the math to see that at every decision point, 01:09:02.290 --> 01:09:06.810 A* search is going to make a choice based on the sum of how many steps 01:09:06.810 --> 01:09:09.700 it took me to get to my current position and then 01:09:09.700 --> 01:09:13.180 how far I estimate I am from the goal. 01:09:13.180 --> 01:09:15.760 So while we did have to explore some of these states, 01:09:15.760 --> 01:09:20.470 the ultimate solution we found was, in fact, an optimal solution. 01:09:20.470 --> 01:09:24.790 It did find us the quickest possible way to get from the initial state 01:09:24.790 --> 01:09:25.800 to the goal. 01:09:25.800 --> 01:09:29.800 And it turns out that A* is an optimal search algorithm under certain 01:09:29.800 --> 01:09:31.390 conditions. 01:09:31.390 --> 01:09:35.860 So the conditions are h(n), my heuristic, needs to be admissible. 01:09:35.860 --> 01:09:37.990 What does it mean for a heuristic to be admissible? 01:09:37.990 --> 01:09:42.700 Well, a heuristic is admissible if it never overestimates the true cost. 01:09:42.700 --> 01:09:46.260 h(n) always needs to either get it exactly right 01:09:46.260 --> 01:09:50.520 in terms of how far away I am, or it needs to underestimate.
01:09:50.520 --> 01:09:54.600 So we saw an example from before where the heuristic value was much smaller 01:09:54.600 --> 01:09:56.350 than the actual cost it would take. 01:09:56.350 --> 01:09:57.710 That's totally fine. 01:09:57.710 --> 01:10:00.100 But the heuristic value should never overestimate. 01:10:00.100 --> 01:10:04.520 It should never think that I'm further away from the goal than I actually am. 01:10:04.520 --> 01:10:07.630 And meanwhile, to make a stronger statement, h of n 01:10:07.630 --> 01:10:09.880 also needs to be consistent. 01:10:09.880 --> 01:10:11.800 And what does it mean for it to be consistent? 01:10:11.800 --> 01:10:14.500 Mathematically, it means that for every node, which 01:10:14.500 --> 01:10:19.120 we'll call n, and successor, the node after me, that I'll call n prime, 01:10:19.120 --> 01:10:24.220 where it takes a cost of c to make that step, the heuristic value of n 01:10:24.220 --> 01:10:26.560 needs to be less than or equal to the heuristic 01:10:26.560 --> 01:10:29.060 value of n prime plus the cost. 01:10:29.060 --> 01:10:31.360 So it's a lot of math, but in words, what it ultimately 01:10:31.360 --> 01:10:34.870 means is that if I am here at this state right now, 01:10:34.870 --> 01:10:39.250 the heuristic value from me to the goal shouldn't be more than the heuristic 01:10:39.250 --> 01:10:44.080 value of my successor, the next place I could go to, plus however much 01:10:44.080 --> 01:10:48.560 it would cost me to just make that step, from one step to the next step. 01:10:48.560 --> 01:10:52.480 And so this is just making sure that my heuristic is consistent between all 01:10:52.480 --> 01:10:54.010 of these steps that I might take. 01:10:54.010 --> 01:10:58.540 So as long as this is true, then A* search is going to find me an optimal 01:10:58.540 --> 01:10:59.470 solution. 
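Putting g(n) and h(n) together, the algorithm just described can be sketched in Python. As before, this is an illustrative sketch rather than the course's distribution code: the names a_star_search, neighbors, and h are assumptions, every step is assumed to cost 1 (as in the maze), and the optimality guarantee depends on h being admissible and consistent, as just discussed:

```python
import heapq

def a_star_search(start, goal, neighbors, h):
    """A* search: expand the frontier node with the lowest g(n) + h(n),
    where g(n) is the cost already paid to reach n (steps taken here)
    and h(n) is the heuristic estimate of the remaining cost.
    With an admissible, consistent h, the returned path is optimal."""
    frontier = [(h(start), 0, start)]   # (g + h, g, state)
    came_from = {start: None}
    best_g = {start: 0}                 # cheapest known cost to each state
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            path = []                   # reconstruct by walking parents back
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for nxt in neighbors(state):
            new_g = g + 1               # every move costs 1 in the maze
            if nxt not in best_g or new_g < best_g[nxt]:
                best_g[nxt] = new_g
                came_from[nxt] = state
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None
```

The only structural difference from the greedy sketch is the priority: (g + h) instead of h alone, plus the best_g bookkeeping so that a state is re-queued if we later find a cheaper way to reach it.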
01:10:59.470 --> 01:11:02.770 And this is where much of the challenge of solving these search problems can 01:11:02.770 --> 01:11:05.980 sometimes come in, that A* search is an algorithm that is known, 01:11:05.980 --> 01:11:07.960 and you could write the code fairly easily. 01:11:07.960 --> 01:11:11.260 But it's choosing the heuristic that can be the interesting challenge. 01:11:11.260 --> 01:11:13.540 The better the heuristic is, the better I'll 01:11:13.540 --> 01:11:16.870 be able to solve the problem, and the fewer states that I'll have to explore. 01:11:16.870 --> 01:11:20.200 And I need to make sure that the heuristic satisfies 01:11:20.200 --> 01:11:22.540 these particular constraints. 01:11:22.540 --> 01:11:25.900 So all in all, these are some of the examples of search algorithms 01:11:25.900 --> 01:11:26.650 that might work. 01:11:26.650 --> 01:11:29.080 And certainly, there are many more than just this. 01:11:29.080 --> 01:11:32.590 A*, for example, does have a tendency to use quite a bit of memory, 01:11:32.590 --> 01:11:37.570 so there are alternative approaches to A* that ultimately use less memory than 01:11:37.570 --> 01:11:40.200 this version of A* happens to use. 01:11:40.200 --> 01:11:43.660 And there are other search algorithms that are optimized for other cases 01:11:43.660 --> 01:11:45.560 as well. 01:11:45.560 --> 01:11:48.460 But now, so far, we've only been looking at search algorithms 01:11:48.460 --> 01:11:50.860 where there's one agent. 01:11:50.860 --> 01:11:53.470 I am trying to find a solution to a problem. 01:11:53.470 --> 01:11:55.900 I am trying to navigate my way through a maze. 01:11:55.900 --> 01:11:57.880 I am trying to solve a 15 puzzle. 01:11:57.880 --> 01:12:02.190 I am trying to find driving directions from point A to point B. 
01:12:02.190 --> 01:12:04.420 Sometimes in search situations, though, we'll 01:12:04.420 --> 01:12:07.600 enter an adversarial situation where I am 01:12:07.600 --> 01:12:10.010 an agent trying to make intelligent decisions, 01:12:10.010 --> 01:12:13.090 and there is someone else who is fighting against me, so to speak, 01:12:13.090 --> 01:12:16.570 that has an opposite objective, someone where I am trying to succeed, 01:12:16.570 --> 01:12:19.180 someone else that wants me to fail. 01:12:19.180 --> 01:12:23.680 And this is most popular in something like a game, a game like tic-tac-toe, 01:12:23.680 --> 01:12:26.350 where we've got this 3-by-3 grid, and X and O 01:12:26.350 --> 01:12:30.130 take turns either writing an X or an O in any one of these squares. 01:12:30.130 --> 01:12:33.610 And the goal is to get three X's in a row, if you're the X player, 01:12:33.610 --> 01:12:36.610 or three O's in a row, if you're the O player. 01:12:36.610 --> 01:12:40.390 And computers have gotten quite good at playing games, tic-tac-toe very easily, 01:12:40.390 --> 01:12:42.370 but even more complex games. 01:12:42.370 --> 01:12:46.570 And so you might imagine, what does an intelligent decision in a game look 01:12:46.570 --> 01:12:47.290 like? 01:12:47.290 --> 01:12:51.130 So maybe X makes an initial move in the middle, and O plays up here. 01:12:51.130 --> 01:12:54.340 What does an intelligent move for X now become? 01:12:54.340 --> 01:12:56.390 Where should you move if you were X? 01:12:56.390 --> 01:12:58.670 And it turns out there are a couple of possibilities. 01:12:58.670 --> 01:13:01.120 But if an AI is playing this game optimally, 01:13:01.120 --> 01:13:04.180 then the AI might play somewhere like the upper right, where 01:13:04.180 --> 01:13:07.800 in this situation, O has the opposite objective of X. 
01:13:07.800 --> 01:13:11.770 X is trying to win the game, to get three in a row diagonally here, 01:13:11.770 --> 01:13:15.410 and O, with the opposite objective, is trying to stop that. 01:13:15.410 --> 01:13:17.860 And so O is going to place here, to try to block. 01:13:17.860 --> 01:13:20.260 But now, X has a pretty clever move. 01:13:20.260 --> 01:13:25.180 X can make a move like this, where now X has two possible ways that X 01:13:25.180 --> 01:13:26.080 can win the game. 01:13:26.080 --> 01:13:29.050 X could win the game by getting three in a row across here, 01:13:29.050 --> 01:13:32.380 or X could win the game by getting three in a row vertically this way. 01:13:32.380 --> 01:13:34.610 So it doesn't matter where O makes their next move. 01:13:34.610 --> 01:13:38.170 O could play here, for example, blocking the three in a row horizontally, 01:13:38.170 --> 01:13:43.200 but then X is going to win the game by getting three in a row vertically. 01:13:43.200 --> 01:13:45.010 And so there's a fair amount of reasoning 01:13:45.010 --> 01:13:48.220 that's going on here in order for the computer to be able to solve a problem. 01:13:48.220 --> 01:13:51.520 And it's similar in spirit to the problems we've looked at so far. 01:13:51.520 --> 01:13:54.580 There are actions, there's some sort of state of the board, 01:13:54.580 --> 01:13:57.160 and some transition from one state to the next, 01:13:57.160 --> 01:14:00.730 but it's different in the sense that this is now not just a classical search 01:14:00.730 --> 01:14:04.900 problem, but an adversarial search problem, that I am the X player, 01:14:04.900 --> 01:14:07.090 trying to find the best moves to make, but I 01:14:07.090 --> 01:14:10.640 know that there is some adversary that is trying to stop me. 01:14:10.640 --> 01:14:14.800 So we need some sort of algorithm to deal with these adversarial types 01:14:14.800 --> 01:14:16.440 of search situations.
01:14:16.440 --> 01:14:18.400 And the algorithm we're going to take a look at 01:14:18.400 --> 01:14:20.860 is an algorithm called Minimax, which works 01:14:20.860 --> 01:14:24.820 very well for these deterministic games, where there are two players. 01:14:24.820 --> 01:14:28.120 It can work for other types of games as well, but we'll look right now at games 01:14:28.120 --> 01:14:31.990 where I make a move, then my opponent makes a move, and I am trying to win, 01:14:31.990 --> 01:14:34.270 and my opponent is trying to win, also. 01:14:34.270 --> 01:14:38.100 Or in other words, my opponent is trying to get me to lose. 01:14:38.100 --> 01:14:40.900 And so what do we need in order to make this algorithm work? 01:14:40.900 --> 01:14:44.800 Well, anytime we try and translate this human concept, of playing a game, 01:14:44.800 --> 01:14:47.380 winning, and losing, to a computer, we want 01:14:47.380 --> 01:14:50.230 to translate it into terms that the computer can understand. 01:14:50.230 --> 01:14:53.740 And ultimately, the computer really just understands numbers. 01:14:53.740 --> 01:14:56.890 And so we want some way of translating a game of X's and O's 01:14:56.890 --> 01:15:00.490 on a grid to something numerical, something the computer can understand. 01:15:00.490 --> 01:15:04.300 The computer doesn't normally understand notions of win or lose, 01:15:04.300 --> 01:15:08.360 but it does understand the concept of bigger and smaller. 01:15:08.360 --> 01:15:12.100 And so what we might do is, we might take each of the possible ways 01:15:12.100 --> 01:15:17.110 that a tic-tac-toe game can unfold and assign a value, or a utility, 01:15:17.110 --> 01:15:19.060 to each one of those possible ways. 01:15:19.060 --> 01:15:21.790 And in a tic-tac-toe game, and in many types of games, 01:15:21.790 --> 01:15:23.800 there are three possible outcomes. 01:15:23.800 --> 01:15:28.210 The outcomes are, O wins, X wins, or nobody wins.
01:15:28.210 --> 01:15:32.440 So player one wins, player two wins, or nobody wins. 01:15:32.440 --> 01:15:36.670 And for now, let's go ahead and assign each of these possible outcomes 01:15:36.670 --> 01:15:37.870 a different value. 01:15:37.870 --> 01:15:39.100 We'll say O winning-- 01:15:39.100 --> 01:15:41.230 that'll have a value of negative 1. 01:15:41.230 --> 01:15:43.630 Nobody winning-- that'll have a value of 0. 01:15:43.630 --> 01:15:46.700 And X winning-- that will have a value of 1. 01:15:46.700 --> 01:15:50.860 So we've just assigned numbers to each of these three possible outcomes. 01:15:50.860 --> 01:15:53.480 And now, we have two players. 01:15:53.480 --> 01:15:56.260 We have the X player and the O player. 01:15:56.260 --> 01:16:00.290 And we're going to go ahead and call the X player the max player. 01:16:00.290 --> 01:16:03.140 And we'll call the O player the min player. 01:16:03.140 --> 01:16:05.920 And the reason why is because in the Minimax algorithm, 01:16:05.920 --> 01:16:11.380 the max player, which in this case is X, is aiming to maximize the score. 01:16:11.380 --> 01:16:14.660 These are the possible options for the score, negative 1, 0, and 1. 01:16:14.660 --> 01:16:18.400 X wants to maximize the score, meaning if at all possible, 01:16:18.400 --> 01:16:21.980 X would like this situation where X wins the game. 01:16:21.980 --> 01:16:23.590 And we give it a score of 1. 01:16:23.590 --> 01:16:27.010 But if this isn't possible, if X needs to choose between these two 01:16:27.010 --> 01:16:31.930 options, negative 1 meaning O winning, or 0 meaning nobody winning, 01:16:31.930 --> 01:16:37.060 X would rather that nobody wins, score of 0, than a score of negative 1, 01:16:37.060 --> 01:16:38.360 O winning. 01:16:38.360 --> 01:16:41.080 So this notion of winning and losing ultimately 01:16:41.080 --> 01:16:45.410 has been reduced mathematically to just this idea of, try and maximize 01:16:45.410 --> 01:16:46.070 the score.
01:16:46.070 --> 01:16:49.920 The X player always wants the score to be bigger. 01:16:49.920 --> 01:16:52.880 And on the flip side, the min player, in this case, O, 01:16:52.880 --> 01:16:54.620 is aiming to minimize the score. 01:16:54.620 --> 01:16:59.510 The O player wants the score to be as small as possible. 01:16:59.510 --> 01:17:02.820 So now we've taken this game of X's and O's and winning and losing 01:17:02.820 --> 01:17:04.990 and turned it into something mathematical, something 01:17:04.990 --> 01:17:09.560 where X is trying to maximize the score, O is trying to minimize the score. 01:17:09.560 --> 01:17:11.590 Let's now look at all of the parts of the game 01:17:11.590 --> 01:17:14.650 that we need in order to encode it in an AI 01:17:14.650 --> 01:17:18.730 so that an AI can play a game like tic-tac-toe. 01:17:18.730 --> 01:17:20.770 So the game is going to need a couple of things. 01:17:20.770 --> 01:17:23.350 We'll need some sort of initial state, that we'll in this case 01:17:23.350 --> 01:17:27.520 call S0, which is how the game begins, like an empty tic-tac-toe board, 01:17:27.520 --> 01:17:28.750 for example. 01:17:28.750 --> 01:17:32.410 We'll also need a function called player, 01:17:32.410 --> 01:17:37.030 where the player function is going to take as input a state, here represented 01:17:37.030 --> 01:17:41.050 by S. And the output of the player function is going to be, 01:17:41.050 --> 01:17:43.450 which player's turn is it? 01:17:43.450 --> 01:17:46.360 We need to be able to give a tic-tac-toe board to the computer, 01:17:46.360 --> 01:17:50.440 run it through a function, and that function tells us whose turn it is. 01:17:50.440 --> 01:17:52.870 We'll need some notion of actions that we can take. 01:17:52.870 --> 01:17:54.980 We'll see examples of that in just a moment. 01:17:54.980 --> 01:17:57.910 We need some notion of a transition model-- same as before. 
01:17:57.910 --> 01:18:00.160 If I have a state, and I take an action, I 01:18:00.160 --> 01:18:03.040 need to know what results as a consequence of it. 01:18:03.040 --> 01:18:05.810 I need some way of knowing when the game is over. 01:18:05.810 --> 01:18:07.900 So this is equivalent to kind of like a goal test, 01:18:07.900 --> 01:18:10.330 but I need some terminal test, some way to check 01:18:10.330 --> 01:18:14.230 to see if a state is a terminal state, where a terminal state means 01:18:14.230 --> 01:18:15.280 the game is over. 01:18:15.280 --> 01:18:20.050 In the classic game of tic-tac-toe, a terminal state means either someone has 01:18:20.050 --> 01:18:23.380 gotten three in a row, or all of the squares of the tic-tac-toe board are 01:18:23.380 --> 01:18:24.040 filled. 01:18:24.040 --> 01:18:26.610 Either of those conditions makes it a terminal state. 01:18:26.610 --> 01:18:28.570 In a game of chess, it might be something like, 01:18:28.570 --> 01:18:31.660 when there is checkmate, or if checkmate is no longer possible, 01:18:31.660 --> 01:18:34.370 that becomes a terminal state. 01:18:34.370 --> 01:18:38.410 And then finally we'll need a utility function, a function that takes a state 01:18:38.410 --> 01:18:41.890 and gives us a numerical value for that terminal state, some way of saying, 01:18:41.890 --> 01:18:44.530 if X wins the game, that has a value of 1. 01:18:44.530 --> 01:18:47.050 If O has won the game, that has the value of negative 1. 01:18:47.050 --> 01:18:50.350 If nobody has won the game, that has a value of 0. 01:18:50.350 --> 01:18:52.790 So let's take a look at each of these in turn. 01:18:52.790 --> 01:18:57.070 The initial state, we can just represent in tic-tac-toe as the empty game board. 01:18:57.070 --> 01:18:58.460 This is where we begin. 01:18:58.460 --> 01:19:01.030 It's the place from which we begin this search. 01:19:01.030 --> 01:19:03.540 And again, I'll be representing these things visually. 
01:19:03.540 --> 01:19:05.410 But you can imagine this really just being 01:19:05.410 --> 01:19:10.120 an array, or a two-dimensional array, of all of these possible squares. 01:19:10.120 --> 01:19:13.510 Then we need the player function that, again, takes a state 01:19:13.510 --> 01:19:15.250 and tells us whose turn it is. 01:19:15.250 --> 01:19:18.670 Assuming X makes the first move, if I have an empty game board, 01:19:18.670 --> 01:19:21.510 then my player function is going to return X. 01:19:21.510 --> 01:19:25.120 And if I have a game board where X has made a move, then my player function is 01:19:25.120 --> 01:19:28.840 going to return O. The player function takes a tic-tac-toe game board 01:19:28.840 --> 01:19:32.170 and tells us whose turn it is. 01:19:32.170 --> 01:19:34.870 Next up, we'll consider the actions function. 01:19:34.870 --> 01:19:39.040 The actions function, much like it did in classical search, takes a state 01:19:39.040 --> 01:19:41.860 and gives us the set of all of the possible actions 01:19:41.860 --> 01:19:44.390 we can take in that state. 01:19:44.390 --> 01:19:49.360 So let's imagine it's O's turn to move in a game board that looks like this. 01:19:49.360 --> 01:19:52.120 What happens when we pass it into the actions function? 01:19:52.120 --> 01:19:55.990 So the actions function takes this state of the game as input, 01:19:55.990 --> 01:19:59.860 and the output is a set of possible actions. It's a set of-- 01:19:59.860 --> 01:20:03.580 I could move in the upper left, or I could move in the bottom middle. 01:20:03.580 --> 01:20:06.490 Those are the two possible action choices that I have 01:20:06.490 --> 01:20:10.190 when I begin in this particular state. 01:20:10.190 --> 01:20:13.100 Now, just as before, when we have states and actions, 01:20:13.100 --> 01:20:15.470 we need some sort of transition model to tell us, 01:20:15.470 --> 01:20:19.500 when we take this action in the state, what is the new state that we get? 
01:20:19.500 --> 01:20:22.430 And here, we define that using the result function that takes 01:20:22.430 --> 01:20:25.460 a state as input, as well as an action. 01:20:25.460 --> 01:20:28.490 And when we apply the result function to this state, 01:20:28.490 --> 01:20:33.110 saying, let's let O move in this upper left corner, the new state we get 01:20:33.110 --> 01:20:35.840 is this resulting state, where O is in the upper-left corner. 01:20:35.840 --> 01:20:38.880 And now, this seems obvious to someone who knows how to play tic-tac-toe. 01:20:38.880 --> 01:20:40.830 Of course, you play in the upper left corner-- 01:20:40.830 --> 01:20:41.870 that's the board you get. 01:20:41.870 --> 01:20:45.060 But all of this information needs to be encoded into the AI. 01:20:45.060 --> 01:20:47.570 The AI doesn't know how to play tic-tac-toe 01:20:47.570 --> 01:20:51.170 until you tell the AI how the rules of tic-tac-toe work. 01:20:51.170 --> 01:20:53.630 And this function, defining the function here, 01:20:53.630 --> 01:20:57.050 allows us to tell the AI how this game actually works 01:20:57.050 --> 01:21:01.200 and how actions actually affect the outcome of the game. 01:21:01.200 --> 01:21:03.590 So the AI needs to know how the game works. 01:21:03.590 --> 01:21:06.080 The AI also needs to know when the game is over. 01:21:06.080 --> 01:21:09.320 We do that by defining a function called terminal that takes as input 01:21:09.320 --> 01:21:13.200 a state S, such that if we take a game that is not yet over, 01:21:13.200 --> 01:21:16.130 pass it into the terminal function, the output is false. 01:21:16.130 --> 01:21:17.540 The game is not over. 01:21:17.540 --> 01:21:20.690 But if we take a game that is over, because X has gotten three 01:21:20.690 --> 01:21:24.360 in a row along that diagonal, pass that into the terminal function, 01:21:24.360 --> 01:21:28.920 then the output is going to be true, because the game now is, in fact, over. 
01:21:28.920 --> 01:21:31.930 And finally, we've told the AI how the game works 01:21:31.930 --> 01:21:35.180 in terms of what moves can be made and what happens when you make those moves. 01:21:35.180 --> 01:21:37.290 We've told the AI when the game is over. 01:21:37.290 --> 01:21:41.270 Now we need to tell the AI what the value of each of those states is. 01:21:41.270 --> 01:21:45.170 And we do that by defining this utility function, that takes a state, S, 01:21:45.170 --> 01:21:48.840 and tells us the score or the utility of that state. 01:21:48.840 --> 01:21:52.760 So again, we said that if X wins the game, that utility is a value of 1, 01:21:52.760 --> 01:21:57.350 whereas if O wins the game, then the utility of that is negative 1. 01:21:57.350 --> 01:22:00.200 And the AI needs to know, for each of these terminal states 01:22:00.200 --> 01:22:04.800 where the game is over, what is the utility of that state? 01:22:04.800 --> 01:22:08.390 So if I give you a game board like this, where the game is, in fact, over, 01:22:08.390 --> 01:22:12.710 and I ask the AI to tell me what the value of that state is, it could do so. 01:22:12.710 --> 01:22:15.830 The value of the state is 1. 01:22:15.830 --> 01:22:20.240 Where things get interesting, though, is if the game is not yet over. 01:22:20.240 --> 01:22:21.900 Let's imagine a game board like this. 01:22:21.900 --> 01:22:23.330 We're in the middle of the game. 01:22:23.330 --> 01:22:25.850 It's O's turn to make a move. 01:22:25.850 --> 01:22:27.980 So how do we know it's O's turn to make a move? 01:22:27.980 --> 01:22:30.110 We can calculate that, using the player function. 01:22:30.110 --> 01:22:33.260 We can say, player of S, pass in the state. 01:22:33.260 --> 01:22:36.240 O is the answer, so we know it's O's turn to move. 01:22:36.240 --> 01:22:40.790 And now, what is the value of this board, and what action should O take? 01:22:40.790 --> 01:22:41.960 Well, that's going to depend. 
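Before working out that value, the game-definition pieces described so far (the initial state S0, player, actions, the result transition model, the terminal test, and utility) can be collected into a short Python sketch. This is a hedged illustration rather than the course's distribution code: the 9-cell tuple board encoding and the `winner` helper are assumptions made for the example.

```python
# A minimal sketch of the game-definition functions described above,
# assuming a tic-tac-toe board encoded as a tuple of 9 cells, each
# "X", "O", or None (this encoding is an assumption for illustration).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def initial_state():
    # S0: the empty tic-tac-toe board
    return (None,) * 9

def player(board):
    # X moves first, so it's X's turn whenever the mark counts are equal
    xs = sum(cell == "X" for cell in board)
    os = sum(cell == "O" for cell in board)
    return "X" if xs == os else "O"

def actions(board):
    # every empty square is a possible move
    return {i for i, cell in enumerate(board) if cell is None}

def result(board, action):
    # transition model: the current player marks the chosen square
    new = list(board)
    new[action] = player(board)
    return tuple(new)

def winner(board):
    # helper: return "X" or "O" if someone has three in a row, else None
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal(board):
    # the game is over on three in a row or a full board
    return winner(board) is not None or None not in board

def utility(board):
    # X winning is 1, O winning is -1, a tie is 0 (meaningful on terminal states)
    return {"X": 1, "O": -1, None: 0}[winner(board)]
```

For instance, `player(initial_state())` returns `"X"` on the empty board, mirroring the assumption that X moves first.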
01:22:41.960 --> 01:22:43.760 We have to do some calculation here. 01:22:43.760 --> 01:22:47.450 And this is where the Minimax algorithm really comes in. 01:22:47.450 --> 01:22:50.840 Recall that X is trying to maximize the score, which means 01:22:50.840 --> 01:22:53.720 that O is trying to minimize the score. 01:22:53.720 --> 01:22:58.870 O would like to minimize the total value that we get at the end of the game. 01:22:58.870 --> 01:23:01.310 And because this game isn't over yet, we don't really 01:23:01.310 --> 01:23:04.820 know just yet what the value of this game board is. 01:23:04.820 --> 01:23:08.290 We have to do some calculation in order to figure that out. 01:23:08.290 --> 01:23:10.430 So how do we do that kind of calculation? 01:23:10.430 --> 01:23:13.040 Well, in order to do so, we're going to consider, 01:23:13.040 --> 01:23:15.680 just as we might in a classical search situation, 01:23:15.680 --> 01:23:20.160 what actions could happen next, and what states will that take us to? 01:23:20.160 --> 01:23:22.070 And it turns out that in this position, there 01:23:22.070 --> 01:23:26.030 are only two open squares, which means there are only two open places where 01:23:26.030 --> 01:23:28.640 O can make a move. 01:23:28.640 --> 01:23:31.080 O could either make a move in the upper left, 01:23:31.080 --> 01:23:34.160 or O can make a move in the bottom middle. 01:23:34.160 --> 01:23:36.980 And Minimax doesn't know right out of the box which of those moves 01:23:36.980 --> 01:23:40.520 is going to be better, so it's going to consider both. 01:23:40.520 --> 01:23:42.320 But now we run into the same situation. 01:23:42.320 --> 01:23:45.170 Now I have two more game boards, neither of which is over. 01:23:45.170 --> 01:23:46.660 What happens next? 01:23:46.660 --> 01:23:48.410 And now it's in this sense that Minimax is 01:23:48.410 --> 01:23:50.660 what we'll call a recursive algorithm. 
01:23:50.660 --> 01:23:54.800 It's going to now repeat the exact same process, although now 01:23:54.800 --> 01:23:57.640 considering it from the opposite perspective. 01:23:57.640 --> 01:24:01.280 It's as if I am now going to put myself-- if I am the O player, 01:24:01.280 --> 01:24:05.540 I'm going to put myself in my opponent's shoes, my opponent as the X player, 01:24:05.540 --> 01:24:10.010 and consider, what would my opponent do if they were in this position? 01:24:10.010 --> 01:24:14.090 What would my opponent do, the X player, if they were in that position? 01:24:14.090 --> 01:24:15.240 And what would then happen? 01:24:15.240 --> 01:24:18.260 Well, the other player, my opponent, the X player, 01:24:18.260 --> 01:24:21.200 is trying to maximize the score, whereas I am trying 01:24:21.200 --> 01:24:23.240 to minimize the score as the O player. 01:24:23.240 --> 01:24:27.550 So X is trying to find the maximum possible value that they can get. 01:24:27.550 --> 01:24:29.390 And so what's going to happen? 01:24:29.390 --> 01:24:32.780 Well, from this board position, X only has one choice. 01:24:32.780 --> 01:24:35.600 X is going to play here, and they're going to get three in a row. 01:24:35.600 --> 01:24:37.910 And we know that that board, X winning-- 01:24:37.910 --> 01:24:39.530 that has a value of 1. 01:24:39.530 --> 01:24:43.220 If X wins the game, the value of that game board is 1. 01:24:43.220 --> 01:24:48.680 And so from this position, if this state can only ever lead to this state, 01:24:48.680 --> 01:24:52.640 it's the only possible option, and this state has a value of 1, 01:24:52.640 --> 01:24:57.230 then the maximum possible value that the X player can get from this game board 01:24:57.230 --> 01:24:58.890 is also 1 from here. 01:24:58.890 --> 01:25:01.670 The only place we can get is to a game with the value of 1, 01:25:01.670 --> 01:25:05.260 so this game board also has a value of 1. 01:25:05.260 --> 01:25:07.640 Now we consider this one over here. 
01:25:07.640 --> 01:25:08.860 What's going to happen now? 01:25:08.860 --> 01:25:10.460 Well, X needs to make a move. 01:25:10.460 --> 01:25:13.510 The only move X can make is in the upper left, so X will go there. 01:25:13.510 --> 01:25:15.350 And in this game, no one wins the game. 01:25:15.350 --> 01:25:16.940 Nobody has three in a row. 01:25:16.940 --> 01:25:19.600 So the value of that game board is 0. 01:25:19.600 --> 01:25:20.890 Nobody's won. 01:25:20.890 --> 01:25:24.880 And so again, by the same logic, if from this board position, the only place 01:25:24.880 --> 01:25:27.760 we can get to is a board where the value is 0, 01:25:27.760 --> 01:25:31.290 then this state must also have a value of 0. 01:25:31.290 --> 01:25:35.320 And now here comes the choice part, the idea of trying to minimize. 01:25:35.320 --> 01:25:38.970 I, as the O player, now know that if I make this choice, 01:25:38.970 --> 01:25:43.140 moving in the upper left, that is going to result in a game with a value of 1, 01:25:43.140 --> 01:25:45.240 assuming everyone plays optimally. 01:25:45.240 --> 01:25:47.040 And if I instead play in the lower middle, 01:25:47.040 --> 01:25:50.250 choose this fork in the road, that is going to result in a game board 01:25:50.250 --> 01:25:51.510 with a value of 0. 01:25:51.510 --> 01:25:52.750 I have two options. 01:25:52.750 --> 01:25:56.380 I have a 1 and a 0 to choose from, and I need to pick. 01:25:56.380 --> 01:25:59.520 And as the min player, I would rather choose the option 01:25:59.520 --> 01:26:00.940 with the minimum value. 01:26:00.940 --> 01:26:03.090 So whenever a player has multiple choices, 01:26:03.090 --> 01:26:06.030 the min player will choose the option with the smallest value. 01:26:06.030 --> 01:26:08.700 The max player will choose the option with the largest value. 01:26:08.700 --> 01:26:11.470 Between the 1 and the 0, the 0 is smaller, 01:26:11.470 --> 01:26:14.740 meaning I'd rather tie the game than lose the game. 
01:26:14.740 --> 01:26:18.090 And so this game board, we'll say, also has a value of 0, 01:26:18.090 --> 01:26:22.290 because if I am playing optimally, I will pick this fork in the road. 01:26:22.290 --> 01:26:25.290 I'll place my O here to block X's three in a row. 01:26:25.290 --> 01:26:28.080 X will move in the upper left, and the game will be over, 01:26:28.080 --> 01:26:30.180 and no one will have won the game. 01:26:30.180 --> 01:26:34.260 So this is now the logic of Minimax, to consider all of the possible options 01:26:34.260 --> 01:26:37.140 that I can take, all of the actions that I can take, 01:26:37.140 --> 01:26:39.390 and then to put myself in my opponent's shoes. 01:26:39.390 --> 01:26:42.930 I decide what move I'm going to make now by considering what move 01:26:42.930 --> 01:26:44.880 my opponent will make on the next turn. 01:26:44.880 --> 01:26:48.230 And to do that, I consider what move I would make on the turn after that, 01:26:48.230 --> 01:26:52.560 so on and so forth, until I get all the way down to the end of the game, 01:26:52.560 --> 01:26:55.050 to one of these so-called terminal states. 01:26:55.050 --> 01:26:57.480 In fact, this very decision point, where I 01:26:57.480 --> 01:27:00.630 am trying to decide, as the O player, what move to make, 01:27:00.630 --> 01:27:04.770 might have just been a part of the logic that the X player, my opponent, 01:27:04.770 --> 01:27:06.360 was using the move before me. 01:27:06.360 --> 01:27:09.150 This might be part of some larger tree where 01:27:09.150 --> 01:27:11.580 X is trying to make a move in this situation 01:27:11.580 --> 01:27:13.830 and needs to pick between three different options 01:27:13.830 --> 01:27:16.440 in order to make a decision about what happens. 
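This put-yourself-in-the-opponent's-shoes reasoning is naturally recursive, and a tiny sketch over an abstract game tree captures it: an integer stands for a terminal state (its own utility), and a list's elements are the states its actions lead to. The tree below is a hedged encoding of the endgame just described, where each of O's two choices leaves X a single forced reply, worth 1 and 0 respectively.

```python
def value(node, maximizing):
    """Minimax value of an abstract game-tree node (a minimal sketch)."""
    if isinstance(node, int):        # terminal state: the leaf is its utility
        return node
    # recurse from the opposite perspective for the opponent's turn
    child_values = [value(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# O (the min player) chooses between two states; in each, X (max) has
# exactly one forced move, leading to utilities 1 (X wins) and 0 (tie).
endgame = [[1],    # O plays the upper left -> X completes three in a row
           [0]]    # O blocks the bottom middle -> the game ends in a tie
print(value(endgame, maximizing=False))  # prints 0: O prefers the tie
```

The same function handles deeper trees unchanged, since each level simply flips the `maximizing` flag.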
01:27:16.440 --> 01:27:19.260 And the further and further away we are from the end of the game, 01:27:19.260 --> 01:27:23.460 the deeper this tree has to go, because every level in this tree 01:27:23.460 --> 01:27:27.750 is going to correspond to one move, one move or action that I take, 01:27:27.750 --> 01:27:32.350 one move or action that my opponent takes, in order to decide what happens. 01:27:32.350 --> 01:27:35.970 And in fact, it turns out that if I am the X player in this position, 01:27:35.970 --> 01:27:38.630 and I recursively do the logic and see I have a choice-- 01:27:38.630 --> 01:27:43.170 three choices, in fact, one of which leads to a value of 0, if I play here, 01:27:43.170 --> 01:27:45.900 and if everyone plays optimally, the game will be a tie. 01:27:45.900 --> 01:27:51.000 If I play here, then O is going to win, and I'll lose, playing optimally. 01:27:51.000 --> 01:27:53.610 Or here, where I, the X player, can win-- 01:27:53.610 --> 01:27:57.230 well, between a score of 0 and negative 1 and 1, 01:27:57.230 --> 01:27:59.070 I'd rather pick the board with a value of 1, 01:27:59.070 --> 01:28:01.270 because that's the maximum value I can get. 01:28:01.270 --> 01:28:05.550 And so this board would also have a maximum value of 1. 01:28:05.550 --> 01:28:07.810 And so this tree can get very, very deep, 01:28:07.810 --> 01:28:11.400 especially as the game starts to have more and more moves. 01:28:11.400 --> 01:28:13.470 And this logic works not just for tic-tac-toe, 01:28:13.470 --> 01:28:16.950 but any of these sorts of games where I make a move, my opponent makes a move, 01:28:16.950 --> 01:28:20.460 and ultimately, we have these adversarial objectives. 01:28:20.460 --> 01:28:24.340 And we can simplify the diagram into a diagram that looks like this. 
01:28:24.340 --> 01:28:27.240 This is a more abstract version of the Minimax tree, 01:28:27.240 --> 01:28:30.450 where these are each states, but I'm no longer representing them as exactly 01:28:30.450 --> 01:28:31.830 like tic-tac-toe boards. 01:28:31.830 --> 01:28:35.690 This is just representing some generic game that might be tic-tac-toe, 01:28:35.690 --> 01:28:38.130 might be some other game altogether. 01:28:38.130 --> 01:28:40.590 Any of these green arrows that are pointing up-- 01:28:40.590 --> 01:28:42.700 that represents a maximizing state. 01:28:42.700 --> 01:28:45.290 I would like the score to be as big as possible. 01:28:45.290 --> 01:28:47.430 And any of these red arrows pointing down-- 01:28:47.430 --> 01:28:50.580 those are minimizing states, where the player is the min player, 01:28:50.580 --> 01:28:54.190 and they are trying to make the score as small as possible. 01:28:54.190 --> 01:28:58.200 So if you imagine in this situation, I am the maximizing player, this player 01:28:58.200 --> 01:29:00.550 here, and I have three choices-- 01:29:00.550 --> 01:29:04.170 one choice gives me a score of 5, one choice gives me a score of 3, 01:29:04.170 --> 01:29:06.310 and one choice gives me a score of 9. 01:29:06.310 --> 01:29:10.050 Well, then, between those three choices, my best option 01:29:10.050 --> 01:29:14.430 is to choose this 9 over here, the score that maximizes my options out 01:29:14.430 --> 01:29:15.960 of all three options. 01:29:15.960 --> 01:29:18.870 And so I can give this state a value of 9, 01:29:18.870 --> 01:29:21.210 because among my three options, that is the best 01:29:21.210 --> 01:29:24.330 choice that I have available to me. 01:29:24.330 --> 01:29:25.800 So that's my decision now. 01:29:25.800 --> 01:29:29.580 You can imagine it's like one move away from the end of the game. 01:29:29.580 --> 01:29:31.670 But then you could also ask a reasonable question. 
01:29:31.670 --> 01:29:35.300 What might my opponent do two moves away from the end of the game? 01:29:35.300 --> 01:29:37.040 My opponent is the minimizing player. 01:29:37.040 --> 01:29:39.810 They are trying to make the score as small as possible. 01:29:39.810 --> 01:29:43.690 Imagine what would have happened if they had to pick which choice to make. 01:29:43.690 --> 01:29:47.370 One choice leads us to this state, where I, the maximizing player, 01:29:47.370 --> 01:29:50.840 am going to opt for 9, the biggest score that I can get. 01:29:50.840 --> 01:29:55.130 And one leads to this state, where I, the maximizing player, 01:29:55.130 --> 01:29:58.980 would choose 8, which is then the largest score that I can get. 01:29:58.980 --> 01:30:02.780 Now, the minimizing player, forced to choose between a 9 or an 8, 01:30:02.780 --> 01:30:07.100 is going to choose the smallest possible score, which in this case is an 8. 01:30:07.100 --> 01:30:09.450 And that is, then, how this process would unfold. 01:30:09.450 --> 01:30:11.660 But the minimizing player, in this case, considers 01:30:11.660 --> 01:30:14.180 both of their options, and then all of the options 01:30:14.180 --> 01:30:17.590 that would happen as a result of that. 01:30:17.590 --> 01:30:21.420 So this now is a general picture of what the Minimax algorithm looks like. 01:30:21.420 --> 01:30:24.700 Let's now try to formalize it using a little bit of pseudocode. 01:30:24.700 --> 01:30:27.720 So what exactly is happening in the Minimax algorithm? 01:30:27.720 --> 01:30:31.500 Well, given a state, S, we need to decide what should happen. 01:30:31.500 --> 01:30:34.980 The max player-- if it's the max player's turn, then 01:30:34.980 --> 01:30:39.540 max is going to pick an action, A, in actions of S. Recall 01:30:39.540 --> 01:30:42.090 that actions is a function that takes a state 01:30:42.090 --> 01:30:44.970 and gives me back all of the possible actions that I can take. 
01:30:44.970 --> 01:30:48.770 It tells me all of the moves that are possible. 01:30:48.770 --> 01:30:50.610 The max player is going to specifically pick 01:30:50.610 --> 01:30:53.970 an action, A, in the set of actions that gives me 01:30:53.970 --> 01:31:01.420 the highest value of min value of result of S and A. So what does that mean? 01:31:01.420 --> 01:31:04.350 Well, it means that I want to make the option that gives me 01:31:04.350 --> 01:31:07.830 the highest score of all of the actions, A. 01:31:07.830 --> 01:31:09.630 But what score is that going to have? 01:31:09.630 --> 01:31:12.780 To calculate that, I need to know what my opponent, the min player, 01:31:12.780 --> 01:31:18.300 is going to do if they try to minimize the value of the state that results. 01:31:18.300 --> 01:31:22.260 So we say, what state results after I take this action, 01:31:22.260 --> 01:31:24.570 and what happens when the min player tries 01:31:24.570 --> 01:31:27.570 to minimize the value of that state? 01:31:27.570 --> 01:31:30.260 I consider that for all of my possible options. 01:31:30.260 --> 01:31:32.850 And after I've considered that for all of my possible options, 01:31:32.850 --> 01:31:36.770 I pick the action, A, that has the highest value. 01:31:36.770 --> 01:31:40.090 Likewise, the min player is going to do the same thing, but backwards. 01:31:40.090 --> 01:31:43.180 They're also going to consider, what are all of the possible actions they 01:31:43.180 --> 01:31:44.800 can take if it's their turn? 01:31:44.800 --> 01:31:47.740 And they're going to pick the action, A, that has the smallest 01:31:47.740 --> 01:31:50.220 possible value of all the options. 
01:31:50.220 --> 01:31:53.430 And the way they know what the smallest possible value of all the options is, 01:31:53.430 --> 01:31:57.490 is by considering what the max player is going to do, 01:31:57.490 --> 01:32:01.720 by saying, what's the result of applying this action to the current state, 01:32:01.720 --> 01:32:03.640 and then, what would the max player try to do? 01:32:03.640 --> 01:32:08.000 What value would the max player calculate for that particular state? 01:32:08.000 --> 01:32:11.260 So everyone makes their decision based on trying to estimate 01:32:11.260 --> 01:32:13.600 what the other person would do. 01:32:13.600 --> 01:32:15.930 And now we need to turn our attention to these two 01:32:15.930 --> 01:32:18.540 functions, maxValue and minValue. 01:32:18.540 --> 01:32:21.690 How do you actually calculate the value of a state 01:32:21.690 --> 01:32:24.270 if you're trying to maximize its value, and how do you 01:32:24.270 --> 01:32:27.690 calculate the value of a state if you're trying to minimize the value? 01:32:27.690 --> 01:32:30.750 If you can do that, then we have an entire implementation 01:32:30.750 --> 01:32:32.920 of this Minimax algorithm. 01:32:32.920 --> 01:32:33.610 So let's try it. 01:32:33.610 --> 01:32:36.480 Let's try and implement this maxValue function 01:32:36.480 --> 01:32:40.630 that takes a state and returns as output the value of that state 01:32:40.630 --> 01:32:43.860 if I'm trying to maximize the value of the state. 01:32:43.860 --> 01:32:46.860 Well, the first thing I can check for is to see if the game is over, 01:32:46.860 --> 01:32:48.100 because if the game is over-- 01:32:48.100 --> 01:32:50.820 in other words, if the state is a terminal state-- 01:32:50.820 --> 01:32:51.990 then this is easy. 01:32:51.990 --> 01:32:54.930 I already have this utility function that tells me 01:32:54.930 --> 01:32:56.280 what the value of the board is. 01:32:56.280 --> 01:32:58.660 If the game is over, I just check, did X win? 
01:32:58.660 --> 01:32:59.160 Did O win? 01:32:59.160 --> 01:33:00.150 Is that a tie? 01:33:00.150 --> 01:33:04.200 And the utility function just knows what the value of the state is. 01:33:04.200 --> 01:33:06.430 What's trickier is if the game isn't over, 01:33:06.430 --> 01:33:09.240 because then I need to do this recursive reasoning about thinking, 01:33:09.240 --> 01:33:12.630 what is my opponent going to do on the next move? 01:33:12.630 --> 01:33:15.830 Then I want to calculate the value of this state, 01:33:15.830 --> 01:33:19.060 and I want the value of the state to be as high as possible. 01:33:19.060 --> 01:33:21.900 And I'll keep track of that value in a variable called v. 01:33:21.900 --> 01:33:24.570 And if I want the value to be as high as possible, 01:33:24.570 --> 01:33:27.330 I need to give v an initial value. 01:33:27.330 --> 01:33:31.500 And initially, I'll just go ahead and set it to be as low as possible, 01:33:31.500 --> 01:33:34.650 because I don't know what options are available to me yet. 01:33:34.650 --> 01:33:38.610 So initially, I'll set v equal to negative infinity, which 01:33:38.610 --> 01:33:40.620 seems a little bit strange, but the idea here 01:33:40.620 --> 01:33:43.680 is, I want the value initially to be as low as possible, 01:33:43.680 --> 01:33:46.230 because as I consider my actions, I'm always 01:33:46.230 --> 01:33:50.220 going to try and do better than v. And if I set v to negative infinity, 01:33:50.220 --> 01:33:52.850 I know I can always do better than that. 01:33:52.850 --> 01:33:54.960 So now I consider my actions. 01:33:54.960 --> 01:33:56.710 And this is going to be some kind of loop, 01:33:56.710 --> 01:34:00.070 where for every action in actions of state-- 01:34:00.070 --> 01:34:02.830 recall, actions is a function that takes my state 01:34:02.830 --> 01:34:06.650 and gives me all the possible actions that I can use in that state. 
01:34:06.650 --> 01:34:11.570 So for each one of those actions, I want to compare it to v and say, 01:34:11.570 --> 01:34:18.290 all right, v is going to be equal to the maximum of v and this expression. 01:34:18.290 --> 01:34:20.090 So what is this expression? 01:34:20.090 --> 01:34:24.640 Well, first it is, get the result of taking the action in the state, 01:34:24.640 --> 01:34:28.250 and then get the min value of that. 01:34:28.250 --> 01:34:31.090 In other words, let's say, I want to find out 01:34:31.090 --> 01:34:34.220 from that state what is the best that the min player can do, 01:34:34.220 --> 01:34:36.430 because they are going to try and minimize the score. 01:34:36.430 --> 01:34:40.210 So whatever the resulting score is of the min value of that state, 01:34:40.210 --> 01:34:43.870 compare it to my current best value, and just pick the maximum of those two, 01:34:43.870 --> 01:34:46.460 because I am trying to maximize the value. 01:34:46.460 --> 01:34:48.550 In short, what these three lines of code are doing 01:34:48.550 --> 01:34:52.460 is going through all of my possible actions and asking the question, 01:34:52.460 --> 01:34:57.880 how do I maximize the score, given what my opponent is going to try to do? 01:34:57.880 --> 01:35:00.640 After this entire loop, I can just return v, 01:35:00.640 --> 01:35:04.070 and that is now the value of that particular state. 01:35:04.070 --> 01:35:07.910 And for the min player, it's the exact opposite of this, the same logic, 01:35:07.910 --> 01:35:08.930 just backwards. 01:35:08.930 --> 01:35:10.910 To calculate the minimum value of a state, 01:35:10.910 --> 01:35:12.770 first we check if it's a terminal state. 01:35:12.770 --> 01:35:14.960 If it is, we return its utility. 01:35:14.960 --> 01:35:19.280 Otherwise, we're going to now try to minimize the value of the state, 01:35:19.280 --> 01:35:21.290 given all of my possible actions. 
01:35:21.290 --> 01:35:24.800 So I need an initial value for v, the value of the state. 01:35:24.800 --> 01:35:28.430 And initially, I'll set it to infinity, because I know it can always 01:35:28.430 --> 01:35:30.390 get something less than infinity. 01:35:30.390 --> 01:35:33.920 So by starting with v equals infinity, I make sure that the very first action 01:35:33.920 --> 01:35:34.580 I find-- 01:35:34.580 --> 01:35:37.550 that will be less than this value of v. 01:35:37.550 --> 01:35:38.810 And then I do the same thing-- 01:35:38.810 --> 01:35:41.630 loop over all of my possible actions, and for each 01:35:41.630 --> 01:35:45.740 of the results that we could get when the max player makes their decision, 01:35:45.740 --> 01:35:49.160 let's take the minimum of that and the current value of v. 01:35:49.160 --> 01:35:53.210 So after all is said and done, I get the smallest possible value of v, 01:35:53.210 --> 01:35:56.480 which I then return back to the user. 01:35:56.480 --> 01:35:59.020 So that, in effect, is the pseudocode for Minimax. 01:35:59.020 --> 01:36:01.990 That is how we take a game and figure out what the best move to make 01:36:01.990 --> 01:36:06.490 is by recursively using these maxValue and minValue functions, where 01:36:06.490 --> 01:36:10.060 maxValue calls minValue, minValue calls maxValue, back 01:36:10.060 --> 01:36:13.540 and forth, all the way until we reach a terminal state, at which point 01:36:13.540 --> 01:36:18.750 our algorithm can simply return the utility of that particular state. 01:36:18.750 --> 01:36:20.590 What you might imagine is that this is going 01:36:20.590 --> 01:36:23.770 to start to be a long process, especially as games start 01:36:23.770 --> 01:36:28.060 to get more complex, as we start to add more moves and more possible options 01:36:28.060 --> 01:36:30.710 and games that might last quite a bit longer. 01:36:30.710 --> 01:36:34.360 So the next question to ask is, what sort of optimizations can we make here? 
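Before turning to optimizations, the pseudocode can be collected into one possible Python transliteration. The `max_value`/`min_value` loops below mirror the pseudocode line for line; to keep the sketch self-contained, it runs on a toy tree game rather than full tic-tac-toe, so the `terminal`, `utility`, `actions`, and `result` definitions here (an integer is a terminal state carrying its own score; a list's elements are successor states) are stand-in assumptions, not the lecture's code.

```python
import math

# Stand-in game-definition functions for a toy tree game (assumed encoding).
def terminal(state):
    return isinstance(state, int)     # a leaf is a terminal state

def utility(state):
    return state                      # a terminal leaf carries its own score

def actions(state):
    return range(len(state))          # one action per child

def result(state, action):
    return state[action]              # transition model: follow that child

def max_value(state):
    if terminal(state):
        return utility(state)
    v = -math.inf                     # start as low as possible
    for action in actions(state):
        # the min player replies optimally to each of my options
        v = max(v, min_value(result(state, action)))
    return v

def min_value(state):
    if terminal(state):
        return utility(state)
    v = math.inf                      # start as high as possible
    for action in actions(state):
        v = min(v, max_value(result(state, action)))
    return v

def minimax(state, to_move):
    """Pick the action whose resulting state is best against optimal play."""
    if to_move == "max":
        return max(actions(state), key=lambda a: min_value(result(state, a)))
    return min(actions(state), key=lambda a: max_value(result(state, a)))
```

The mutual recursion is exactly the back-and-forth described above: `max_value` calls `min_value` on each successor and vice versa, bottoming out at terminal states.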
01:36:34.360 --> 01:36:37.780 How can we do better in order to use less space 01:36:37.780 --> 01:36:42.110 or take less time to be able to solve this kind of problem? 01:36:42.110 --> 01:36:44.740 And we'll take a look at a couple of possible optimizations. 01:36:44.740 --> 01:36:47.330 But for one, we'll take a look at this example. 01:36:47.330 --> 01:36:49.850 Again, we're turning to these up arrows and down arrows. 01:36:49.850 --> 01:36:54.070 Let's imagine that I now am the max player, this green arrow. 01:36:54.070 --> 01:36:57.260 I am trying to make the score as high as possible. 01:36:57.260 --> 01:37:00.370 And this is an easy game, where there are just two moves. 01:37:00.370 --> 01:37:02.980 I make a move, one of these three options, 01:37:02.980 --> 01:37:05.890 and then my opponent makes a move, one of these three options, 01:37:05.890 --> 01:37:07.290 based on what move I make. 01:37:07.290 --> 01:37:10.310 And as a result, we get some value. 01:37:10.310 --> 01:37:13.450 Let's look at the order in which I do these calculations 01:37:13.450 --> 01:37:16.810 and figure out if there are any optimizations I might be able to make 01:37:16.810 --> 01:37:18.760 to this calculation process. 01:37:18.760 --> 01:37:21.200 I'm going to have to look at these states one at a time. 01:37:21.200 --> 01:37:23.740 So let's say I start here on the left and say, all right, now 01:37:23.740 --> 01:37:28.300 I'm going to consider, what will the min player, my opponent, try to do here? 01:37:28.300 --> 01:37:31.810 Well, the min player is going to look at all three of their possible actions 01:37:31.810 --> 01:37:34.270 and look at their value, because these are terminal states. 01:37:34.270 --> 01:37:35.430 They're the end of the game. 01:37:35.430 --> 01:37:38.830 And so they'll see, all right, this node is a value of 4, value of 8, 01:37:38.830 --> 01:37:40.410 value of 5. 01:37:40.410 --> 01:37:42.800 And the min player is going to say, well, all right. 
01:37:42.800 --> 01:37:45.790 Between these three options, 4, 8, and 5, 01:37:45.790 --> 01:37:48.070 I'll take the smallest one. I'll take the 4. 01:37:48.070 --> 01:37:50.610 So this state now has a value of 4. 01:37:50.610 --> 01:37:54.060 Then I, as the max player, say, all right, if I take this action, 01:37:54.060 --> 01:37:55.170 it will have a value of 4. 01:37:55.170 --> 01:37:57.330 That's the best that I can do, because the min player 01:37:57.330 --> 01:37:59.810 is going to try and minimize my score. 01:37:59.810 --> 01:38:01.270 So now, what if I take this option? 01:38:01.270 --> 01:38:02.620 We'll explore this next. 01:38:02.620 --> 01:38:06.250 And now I explore what the min player would do if I choose this action. 01:38:06.250 --> 01:38:09.350 And the min player is going to say, all right, what are the three options? 01:38:09.350 --> 01:38:14.440 The min player has options between 9, 3, and 7, and so 3 01:38:14.440 --> 01:38:16.510 is the smallest among 9, 3, and 7. 01:38:16.510 --> 01:38:19.640 So we'll go ahead and say this state has a value of 3. 01:38:19.640 --> 01:38:20.980 So now I, as the max player-- 01:38:20.980 --> 01:38:23.350 I have now explored two of my three options. 01:38:23.350 --> 01:38:27.390 I know that one of my options will guarantee me a score of 4, at least, 01:38:27.390 --> 01:38:31.100 and one of my options will guarantee me a score of 3. 01:38:31.100 --> 01:38:34.160 And now I consider my third option and say, all right, what happens here? 01:38:34.160 --> 01:38:35.910 Same exact logic-- the min player is going 01:38:35.910 --> 01:38:38.120 to look at these three states, 2, 4, and 6, 01:38:38.120 --> 01:38:42.480 say the minimum possible option is 2, so the min player wants the 2. 01:38:42.480 --> 01:38:45.780 Now I, as the max player, have calculated all of the information 01:38:45.780 --> 01:38:49.220 by looking two layers deep, by looking at all of these nodes. 
01:38:49.220 --> 01:38:52.770 And I can now say, between the 4, the 3, and the 2, you know what? 01:38:52.770 --> 01:38:55.620 I'd rather take the 4, because if I choose 01:38:55.620 --> 01:38:58.200 this option, if my opponent plays optimally, 01:38:58.200 --> 01:39:01.620 they will try and get me to the 4, but that's the best I can do. 01:39:01.620 --> 01:39:04.170 I can't guarantee a higher score, because if I 01:39:04.170 --> 01:39:07.740 pick either of these two options, I might get a 3, or I might get a 2. 01:39:07.740 --> 01:39:10.620 And it's true that down here is a 9, and that's 01:39:10.620 --> 01:39:12.390 the highest score of any of the scores. 01:39:12.390 --> 01:39:14.230 So I might be tempted to say, you know what? 01:39:14.230 --> 01:39:17.400 Maybe I should take this option, because I might get the 9. 01:39:17.400 --> 01:39:19.980 But if the min player is playing intelligently, 01:39:19.980 --> 01:39:22.650 if they're making the best moves at each possible option 01:39:22.650 --> 01:39:26.370 they have when they get to make a choice, I'll be left with a 3, 01:39:26.370 --> 01:39:28.470 whereas I could, playing optimally, 01:39:28.470 --> 01:39:31.720 have guaranteed that I would get the 4. 01:39:31.720 --> 01:39:33.600 So that doesn't affect the logic that I would 01:39:33.600 --> 01:39:38.910 use as a Minimax player trying to maximize my score from that node there. 01:39:38.910 --> 01:39:41.310 But it turns out, that took quite a bit of computation 01:39:41.310 --> 01:39:42.390 for me to figure that out. 01:39:42.390 --> 01:39:45.690 I had to reason through all of these nodes in order to draw this conclusion. 01:39:45.690 --> 01:39:48.780 And this is for a pretty simple game, where I have three choices, 01:39:48.780 --> 01:39:52.270 my opponent has three choices, and then the game's over. 01:39:52.270 --> 01:39:55.070 So what I'd like to do is come up with some way to optimize this. 
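The reasoning just walked through is nothing more than nested minimums followed by one maximum. With the leaf values from this example (4, 8, 5 on the left; 9, 3, 7 in the middle; 2, 4, 6 on the right), it can be checked in a few lines:

```python
# Leaf utilities for each of my three actions; each inner list
# holds the options the min player would then choose among.
leaves = [[4, 8, 5], [9, 3, 7], [2, 4, 6]]

# The min player takes the smallest value in each branch...
branch_values = [min(options) for options in leaves]

# ...and I, the max player, take the branch whose value is largest.
best_value = max(branch_values)

print(branch_values, best_value)  # [4, 3, 2] 4
```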
01:39:55.070 --> 01:39:58.990 Maybe I don't need to do all of this calculation to still reach 01:39:58.990 --> 01:40:00.450 the conclusion that, you know what? 01:40:00.450 --> 01:40:01.950 This action to the left-- 01:40:01.950 --> 01:40:03.810 that's the best that I could do. 01:40:03.810 --> 01:40:07.290 Let's go ahead and try again and try and be a little more intelligent 01:40:07.290 --> 01:40:10.150 about how I go about doing this. 01:40:10.150 --> 01:40:12.320 So first, I start the exact same way. 01:40:12.320 --> 01:40:14.160 I don't know what to do initially, so I just 01:40:14.160 --> 01:40:18.930 have to consider one of the options and consider what the min player might do. 01:40:18.930 --> 01:40:21.540 Min has three options, 4, 8, and 5. 01:40:21.540 --> 01:40:25.500 And between those three options, min says, 4 is the best they can do, 01:40:25.500 --> 01:40:28.390 because they want to try to minimize the score. 01:40:28.390 --> 01:40:31.960 Now, I, the max player, will consider my second option, 01:40:31.960 --> 01:40:36.730 making this move here and considering what my opponent would do in response. 01:40:36.730 --> 01:40:38.470 What will the min player do? 01:40:38.470 --> 01:40:41.560 Well, the min player is going to, from that state, look at their options. 01:40:41.560 --> 01:40:42.710 And they would say, all right. 01:40:42.710 --> 01:40:45.980 9 is an option, 3 is an option. 01:40:45.980 --> 01:40:48.220 And if I am doing the math from this initial state, 01:40:48.220 --> 01:40:51.400 doing all this calculation, when I see a 3, 01:40:51.400 --> 01:40:54.250 that should immediately be a red flag for me, 01:40:54.250 --> 01:40:56.890 because when I see a 3 down here at this state, 01:40:56.890 --> 01:41:01.870 I know that the value of this state is going to be at most 3. 
01:41:01.870 --> 01:41:04.630 It's going to be 3 or something less than 3, 01:41:04.630 --> 01:41:08.050 even though I haven't yet looked at this last action or even further actions 01:41:08.050 --> 01:41:10.870 if there were more actions that could be taken here. 01:41:10.870 --> 01:41:11.900 How do I know that? 01:41:11.900 --> 01:41:16.000 Well, I know that the min player is going to try to minimize my score. 01:41:16.000 --> 01:41:20.350 And if they see a 3, the only way this could be something other than a 3 01:41:20.350 --> 01:41:24.520 is if this remaining thing that I haven't yet looked at is less than 3, 01:41:24.520 --> 01:41:28.810 which means there is no way for this value to be anything more than 3, 01:41:28.810 --> 01:41:31.390 because the min player can already guarantee a 3, 01:41:31.390 --> 01:41:34.930 and they are trying to minimize my score. 01:41:34.930 --> 01:41:36.380 So what does that tell me? 01:41:36.380 --> 01:41:38.740 Well, it tells me that if I choose this action, 01:41:38.740 --> 01:41:43.240 my score is going to be 3, or maybe even less than 3, if I'm unlucky. 01:41:43.240 --> 01:41:47.500 But I already know that this action will guarantee me a 4. 01:41:47.500 --> 01:41:51.230 And so given that I know that this action guarantees me a score of 4, 01:41:51.230 --> 01:41:54.130 and this action means I can't do better than 3, 01:41:54.130 --> 01:41:56.290 if I'm trying to maximize my options, there 01:41:56.290 --> 01:41:59.290 is no need for me to consider this triangle here. 01:41:59.290 --> 01:42:01.990 There is no value, no number that could go here, 01:42:01.990 --> 01:42:04.760 that would change my mind between these two options. 01:42:04.760 --> 01:42:08.020 I'm always going to opt for this path that gets me a 4, 01:42:08.020 --> 01:42:11.260 as opposed to this path, where the best I can do is a 3, 01:42:11.260 --> 01:42:13.600 if my opponent plays optimally. 
01:42:13.600 --> 01:42:16.850 And this is going to be true for all of the future states that I look at, too. 01:42:16.850 --> 01:42:19.470 If I look over here, at what the min player might do over here, 01:42:19.470 --> 01:42:24.510 if I see that this state is a 2, I know that this state is at most a 2, 01:42:24.510 --> 01:42:28.500 because the only way this value could be something other than 2 01:42:28.500 --> 01:42:31.820 is if one of these remaining states is less than a 2, 01:42:31.820 --> 01:42:34.600 and so the min player would opt for that instead. 01:42:34.600 --> 01:42:37.470 So even without looking at these remaining states, 01:42:37.470 --> 01:42:42.600 I, as the maximizing player, can know that choosing this path to the left 01:42:42.600 --> 01:42:47.280 is going to be better than choosing either of those two paths to the right, 01:42:47.280 --> 01:42:51.810 because this one can't be better than 3, this one can't be better than 2, 01:42:51.810 --> 01:42:56.550 and so 4 in this case is the best that I can do. 01:42:56.550 --> 01:42:59.580 And I can say now that this state has a value of 4. 01:42:59.580 --> 01:43:01.680 So in order to do this type of calculation, 01:43:01.680 --> 01:43:04.980 I was doing a little bit more bookkeeping, keeping track of things, 01:43:04.980 --> 01:43:08.480 keeping track all the time of, what is the best that I can do, 01:43:08.480 --> 01:43:11.400 what is the worst that I can do, and for each of these states, saying, 01:43:11.400 --> 01:43:15.300 all right, well, if I already know that I can get a 4, 01:43:15.300 --> 01:43:18.060 then if the best I can do at this state is a 3, 01:43:18.060 --> 01:43:19.770 no reason for me to consider it. 01:43:19.770 --> 01:43:25.020 I can effectively prune this leaf and anything below it from the tree. 01:43:25.020 --> 01:43:28.440 And it's for that reason this approach, this optimization to Minimax, 01:43:28.440 --> 01:43:30.510 is called alpha-beta pruning. 
01:43:30.510 --> 01:43:32.400 Alpha and beta stand for these two values 01:43:32.400 --> 01:43:34.950 that you'll have to keep track of, the best you can do so far 01:43:34.950 --> 01:43:36.600 and the worst you can do so far. 01:43:36.600 --> 01:43:41.170 And pruning is the idea of, if I have a big, long, deep search tree, 01:43:41.170 --> 01:43:43.650 I might be able to search it more efficiently if I don't 01:43:43.650 --> 01:43:47.070 need to search through everything, if I can remove some of the nodes 01:43:47.070 --> 01:43:52.170 to try and optimize the way that I look through this entire search space. 01:43:52.170 --> 01:43:55.500 So alpha-beta pruning can definitely save us a lot of time 01:43:55.500 --> 01:43:59.430 as we go about the search process by making our searches more efficient. 01:43:59.430 --> 01:44:03.750 But even then, it's still not great as games get more complex. 01:44:03.750 --> 01:44:06.990 Tic-tac-toe, fortunately, is a relatively simple game, 01:44:06.990 --> 01:44:09.820 and we might reasonably ask a question like, 01:44:09.820 --> 01:44:13.640 how many total possible tic-tac-toe games are there? 01:44:13.640 --> 01:44:14.600 You can think about it. 01:44:14.600 --> 01:44:17.640 You can try and estimate, how many moves are there at any given point? 01:44:17.640 --> 01:44:19.500 How many moves long can the game last? 01:44:19.500 --> 01:44:26.250 It turns out there are about 255,000 possible tic-tac-toe games that 01:44:26.250 --> 01:44:27.780 can be played. 01:44:27.780 --> 01:44:30.240 But compare that to a more complex game, something 01:44:30.240 --> 01:44:32.060 like a game of chess, for example-- 01:44:32.060 --> 01:44:35.950 far more pieces, far more moves, games that last much longer. 01:44:35.950 --> 01:44:38.940 How many total possible chess games could there be? 
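One common way to sketch that bookkeeping in Python: alpha (the best value the max player can guarantee so far) and beta (the best value the min player can guarantee so far) are threaded through the recursion, and a branch is abandoned as soon as it provably can't change the decision. The `terminal`, `utility`, `actions`, and `result` helpers bundled in the `game` dictionary are hypothetical stand-ins for whatever the game actually provides.

```python
import math

def ab_max_value(state, game, alpha=-math.inf, beta=math.inf):
    """Max player's value of `state`, pruning branches that can't matter."""
    if game["terminal"](state):
        return game["utility"](state)
    v = -math.inf
    for action in game["actions"](state):
        v = max(v, ab_min_value(game["result"](state, action), game, alpha, beta))
        if v >= beta:          # the min player above would never allow this branch
            return v           # prune: remaining actions can't change the outcome
        alpha = max(alpha, v)  # max can now guarantee at least v
    return v

def ab_min_value(state, game, alpha=-math.inf, beta=math.inf):
    """Min player's value of `state`, the mirror image of ab_max_value."""
    if game["terminal"](state):
        return game["utility"](state)
    v = math.inf
    for action in game["actions"](state):
        v = min(v, ab_max_value(game["result"](state, action), game, alpha, beta))
        if v <= alpha:         # the max player above already has something this good
            return v           # prune
        beta = min(beta, v)    # min can now hold max down to at most v
    return v
```

On the example tree above, this evaluates only 6 of the 9 leaves and still returns the same value, 4.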
01:44:38.940 --> 01:44:41.100 It turns out that after just four moves each, 01:44:41.100 --> 01:44:44.100 four moves by the white player, four moves by the black player, 01:44:44.100 --> 01:44:47.280 that there are 288 billion possible chess 01:44:47.280 --> 01:44:51.300 games that can result from that situation, after just four moves each. 01:44:51.300 --> 01:44:52.330 And going even further. 01:44:52.330 --> 01:44:55.380 If you look at entire chess games and how many possible chess games there 01:44:55.380 --> 01:44:58.560 could be as a result there, there are more than 10 01:44:58.560 --> 01:45:02.820 to the 29,000 possible chess games, far more chess games 01:45:02.820 --> 01:45:04.410 than could ever be considered. 01:45:04.410 --> 01:45:08.010 And this is a pretty big problem for the Minimax algorithm, because the Minimax 01:45:08.010 --> 01:45:12.150 algorithm starts with an initial state, considers all the possible actions 01:45:12.150 --> 01:45:15.570 and all the possible actions after that, all the way 01:45:15.570 --> 01:45:18.510 until we get to the end of the game. 01:45:18.510 --> 01:45:20.640 And that's going to be a problem if the computer is 01:45:20.640 --> 01:45:23.100 going to need to look through this many states, which 01:45:23.100 --> 01:45:28.680 is far more than any computer could ever do in any reasonable amount of time. 01:45:28.680 --> 01:45:30.900 So what do we do in order to solve this problem? 01:45:30.900 --> 01:45:32.980 Instead of looking through all these states, which 01:45:32.980 --> 01:45:36.420 is totally intractable for a computer, we need some better approach. 01:45:36.420 --> 01:45:39.600 And it turns out that better approach generally takes the form of something 01:45:39.600 --> 01:45:42.000 called depth-limited Minimax. 
01:45:42.000 --> 01:45:44.740 Where normally Minimax is depth-unlimited-- 01:45:44.740 --> 01:45:47.220 we just keep going, layer after layer, move after move, 01:45:47.220 --> 01:45:48.930 until we get to the end of the game-- 01:45:48.930 --> 01:45:51.920 depth-limited Minimax is instead going to say, you know what? 01:45:51.920 --> 01:45:53.760 After a certain number of moves-- maybe I'll 01:45:53.760 --> 01:45:57.390 look 10 moves ahead, maybe I'll look 12 moves ahead, but after that point, 01:45:57.390 --> 01:46:00.350 I'm going to stop and not consider additional moves that 01:46:00.350 --> 01:46:02.190 might come after that, just because it would 01:46:02.190 --> 01:46:07.970 be computationally intractable to consider all of those possible options. 01:46:07.970 --> 01:46:10.730 But what do we do after we get 10 or 12 moves deep, 01:46:10.730 --> 01:46:13.840 and we arrive at a situation where the game's not over? 01:46:13.840 --> 01:46:18.110 Minimax still needs a way to assign a score to that game board or game state 01:46:18.110 --> 01:46:20.390 to figure out what its current value is, which 01:46:20.390 --> 01:46:22.760 is easy to do if the game is over, but not so 01:46:22.760 --> 01:46:25.400 easy to do if the game is not yet over. 01:46:25.400 --> 01:46:27.950 So in order to do that, we need to add one additional feature 01:46:27.950 --> 01:46:31.400 to depth-limited Minimax called an evaluation function, 01:46:31.400 --> 01:46:33.170 which is just some function that is going 01:46:33.170 --> 01:46:38.000 to estimate the expected utility of a game from a given state. 01:46:38.000 --> 01:46:41.000 So in a game like chess, if you imagine that a game value of 1 01:46:41.000 --> 01:46:45.950 means white wins, negative 1 means black wins, 0 means it's a draw, 01:46:45.950 --> 01:46:51.260 then you might imagine that a score of 0.8 means white is very likely to win, 01:46:51.260 --> 01:46:53.000 though certainly not guaranteed. 
01:46:53.000 --> 01:46:56.450 And you would have an evaluation function that estimates 01:46:56.450 --> 01:46:59.450 how good the game state happens to be. 01:46:59.450 --> 01:47:02.720 And depending on how good that evaluation function is, 01:47:02.720 --> 01:47:06.050 that is ultimately what's going to constrain how good the AI is. 01:47:06.050 --> 01:47:09.080 The better the AI is at estimating how good 01:47:09.080 --> 01:47:12.440 or how bad any particular game state is, the better the AI 01:47:12.440 --> 01:47:14.660 is going to be able to play that game. 01:47:14.660 --> 01:47:16.910 If the evaluation function is worse and not as good 01:47:16.910 --> 01:47:19.670 at estimating what the expected utility is, 01:47:19.670 --> 01:47:21.680 then it's going to be a whole lot harder. 01:47:21.680 --> 01:47:25.130 And you can imagine trying to come up with these evaluation functions. 01:47:25.130 --> 01:47:27.890 In chess, for example, you might write an evaluation function 01:47:27.890 --> 01:47:30.230 based on how many pieces you have, as compared 01:47:30.230 --> 01:47:32.540 to how many pieces your opponent has, because each one 01:47:32.540 --> 01:47:35.240 has a value in your evaluation function. 01:47:35.240 --> 01:47:36.950 It probably needs to be a little bit more 01:47:36.950 --> 01:47:40.400 complicated than that to consider other possible situations that 01:47:40.400 --> 01:47:42.090 might arise as well. 
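As a sketch of how those two ideas fit together: a depth counter cuts the recursion off, and at the cutoff a heuristic evaluation stands in for the true utility. The `game` helpers here are hypothetical, and the piece-counting evaluation is deliberately crude, exactly as cautioned above; a serious chess engine would use something far richer.

```python
import math

def depth_limited_value(state, game, depth, maximizing):
    """Minimax value of `state`, but cut off after `depth` plies;
    below the cutoff, a heuristic evaluation replaces the true utility."""
    if game["terminal"](state):
        return game["utility"](state)
    if depth == 0:
        return game["evaluate"](state)  # an estimate, not an exact value
    children = (depth_limited_value(game["result"](state, a), game,
                                    depth - 1, not maximizing)
                for a in game["actions"](state))
    return max(children) if maximizing else min(children)

def material_evaluation(white_pieces, black_pieces):
    """A crude chess-style evaluation: compare total piece values,
    squashed into (-1, 1) so it is comparable to win/draw/loss utilities."""
    value = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # conventional piece values
    score = (sum(value.get(p, 0) for p in white_pieces)
             - sum(value.get(p, 0) for p in black_pieces))
    return math.tanh(score / 10)  # positive favors white, negative favors black
```

With a large enough depth this reduces to ordinary Minimax; with depth 0 it simply trusts the evaluation function, which is why the quality of that function constrains the quality of play.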
01:47:42.090 --> 01:47:44.240 And there are many other variants on Minimax 01:47:44.240 --> 01:47:47.570 that add additional features in order to help it perform better 01:47:47.570 --> 01:47:50.330 under these larger and more computationally intractable 01:47:50.330 --> 01:47:54.620 situations, where we couldn't possibly explore all of the possible moves, 01:47:54.620 --> 01:47:57.230 so we need to figure out how to use evaluation 01:47:57.230 --> 01:48:01.370 functions and other techniques to be able to play these games, ultimately, 01:48:01.370 --> 01:48:02.360 better. 01:48:02.360 --> 01:48:05.480 But this now was a look at this kind of adversarial search, these search 01:48:05.480 --> 01:48:08.870 problems where we have situations where I am trying 01:48:08.870 --> 01:48:11.360 to play against some sort of opponent. 01:48:11.360 --> 01:48:13.880 And these search problems show up all over the place 01:48:13.880 --> 01:48:15.560 throughout artificial intelligence. 01:48:15.560 --> 01:48:18.710 We've been talking a lot today about more classical search problems, 01:48:18.710 --> 01:48:22.080 like trying to find directions from one location to another. 01:48:22.080 --> 01:48:25.590 But anytime an AI is faced with trying to make a decision like, 01:48:25.590 --> 01:48:28.460 what do I do now in order to do something that is rational, 01:48:28.460 --> 01:48:31.220 or do something that is intelligent, or trying to play a game, 01:48:31.220 --> 01:48:33.830 like figuring out what move to make, these sort of algorithms 01:48:33.830 --> 01:48:35.700 can really come in handy. 01:48:35.700 --> 01:48:38.420 It turns out that for tic-tac-toe, the solution is pretty simple, 01:48:38.420 --> 01:48:39.600 because it's a small game. 
01:48:39.600 --> 01:48:42.470 XKCD has famously put together a webcomic 01:48:42.470 --> 01:48:45.620 where it will tell you exactly what optimal move 01:48:45.620 --> 01:48:48.290 to make, no matter what your opponent happens to do. 01:48:48.290 --> 01:48:50.510 This type of thing is not quite as possible 01:48:50.510 --> 01:48:52.790 for a much larger game like checkers or chess, 01:48:52.790 --> 01:48:55.400 for example, where chess is totally computationally 01:48:55.400 --> 01:48:57.920 intractable for most computers to be able to explore 01:48:57.920 --> 01:48:59.330 all the possible states. 01:48:59.330 --> 01:49:03.650 So we really need our AI to be far more intelligent about how 01:49:03.650 --> 01:49:05.850 it goes about trying to deal with these problems 01:49:05.850 --> 01:49:08.180 and how it goes about taking this environment 01:49:08.180 --> 01:49:10.010 that it finds itself in and ultimately 01:49:10.010 --> 01:49:12.710 searching for one of these solutions. 01:49:12.710 --> 01:49:15.710 So this, then, was a look at search and artificial intelligence. 01:49:15.710 --> 01:49:17.630 Next time we'll take a look at knowledge, 01:49:17.630 --> 01:49:21.230 thinking about how it is that our AIs are able to know information, reason 01:49:21.230 --> 01:49:25.190 about that information, and draw conclusions, all in our look at AI 01:49:25.190 --> 01:49:26.750 and the principles behind it. 01:49:26.750 --> 01:49:28.500 We'll see you next time.