[MUSIC]

BRIAN YU: All right. Welcome, everyone, to an Introduction to Artificial Intelligence with Python. My name is Brian Yu. And in this class, we'll explore some of the ideas, and techniques, and algorithms that are at the foundation of artificial intelligence. Now, artificial intelligence covers a wide variety of types of techniques. Anytime you see a computer do something that appears to be intelligent or rational in some way, like recognizing someone's face in a photo, or being able to play a game better than people can, or being able to understand human language when we talk to our phones and they understand what we mean and are able to respond back to us, these are all examples of AI, or artificial intelligence. And in this class, we'll explore some of the ideas that make that AI possible.

So we'll begin our conversations with search: the problem of, we have an AI, and we would like the AI to be able to search for solutions to some kind of problem, no matter what that problem might be. Whether it's trying to get driving directions from point A to point B, or trying to figure out how to play a game, given a tic-tac-toe game, for example, and figuring out what move it ought to make.

After that, we'll take a look at knowledge. Ideally, we want our AI to be able to know information, to be able to represent that information, and, more importantly, to be able to draw inferences from that information, to be able to use the information it knows and draw additional conclusions. So we'll talk about how AI can be programmed in order to do just that.

Then we'll explore the topic of uncertainty, talking about what happens if a computer isn't sure about a fact but maybe is only sure with a certain probability. So we'll talk about some of the ideas behind probability and how computers can begin to deal with uncertain events in order to be a little bit more intelligent in that sense, as well.

After that, we'll turn our attention to optimization.
These are problems where the computer is trying to optimize for some sort of goal, especially in a situation where there might be multiple ways that a computer might solve a problem, but we're looking for a better way or, potentially, the best way, if that's at all possible.

Then we'll take a look at machine learning, or learning more generally, looking at how, when we have access to data, our computers can be programmed to be quite intelligent by learning from data and learning from experience, being able to perform a task better and better based on greater access to data. Your email, for example, where your email inbox somehow knows which of your emails are good emails and which of your emails are spam. These are all examples of computers being able to learn from past experiences and past data.

We'll take a look, too, at how computers are able to draw inspiration from human intelligence, looking at the structure of the human brain and how neural networks can be a computer analog to that sort of idea, and how, by taking advantage of a certain type of structure in a computer program, we can write neural networks that are able to perform tasks very, very effectively.

And then finally, we'll turn our attention to language. Not programming languages, but the human languages that we speak every day, taking a look at the challenges that come about as a computer tries to understand natural language, and at how some of the natural language processing that occurs in modern artificial intelligence actually works.

But today we'll begin our conversation with search: this problem of trying to figure out what to do when we have some sort of situation that the computer is in, some sort of environment that an agent is in, so to speak, and we would like for that agent to be able to somehow look for a solution to that problem. Now, these problems can come in any number of different formats. One example, for instance, might be something like this classic 15 puzzle with the sliding tiles that you might have seen, where you're trying to slide the tiles in order to make sure that all the numbers line up in order. This is an example of what you might call a search problem.
The 15 puzzle begins in an initially mixed-up state, and we need some way of finding moves to make in order to return the puzzle to its solved state. But there are similar problems that you can frame in other ways. Trying to find your way through a maze, for example, is another example of a search problem. You begin in one place, you have some goal of where you're trying to get to, and you need to figure out the correct sequence of actions that will take you from that initial state to the goal.

And while this is a little bit abstract, anytime we talk about maze solving in this class, you can translate it to something a little more real world, something like driving directions. If you've ever wondered how Google Maps is able to figure out what is the best way for you to get from point A to point B, and what turns to make, at what time, depending on traffic, for example, it's often some sort of search algorithm. You have an AI that is trying to get from an initial position to some sort of goal by taking some sequence of actions. So we'll start our conversations today by thinking about these types of search problems and what goes into solving a search problem like this in order for an AI to be able to find a good solution.

In order to do so, though, we're going to need to introduce a little bit of terminology, some of which I've already used. The first term we'll need to think about is an agent. An agent is just some entity that perceives its environment, that somehow is able to perceive the things around it, and acts on that environment in some way. So in the case of the driving directions, your agent might be some representation of a car that is trying to figure out what actions to take in order to arrive at a destination. In the case of the 15 puzzle with the sliding tiles, the agent might be the AI or the person that is trying to solve that puzzle, trying to figure out what tiles to move in order to get to that solution.

Next, we introduce the idea of a state. A state is just some configuration of the agent in its environment. In the 15 puzzle, for example, a state might be any one of these three, for example.
A state is just some configuration of the tiles. Each of these states is different and is going to require a slightly different solution; a different sequence of actions will be needed in each one in order to get from that initial state to the goal, which is where we're trying to get.

The initial state, then. What is that? The initial state is just the state where the agent begins. It is the state where we're going to start from, and this is going to be the starting point for our search algorithm, so to speak. We're going to begin with this initial state and then start to reason about it, to think about what actions we might apply to that initial state in order to figure out how to get from the beginning to the end, from the initial position to whatever our goal happens to be.

And how do we make our way from that initial position to the goal? Well, ultimately, it's via taking actions. Actions are just choices that we can make in any given state. And in AI, we're always going to try to formalize these ideas a little bit more precisely, such that we could program them a little bit more mathematically, so to speak. So this will be a recurring theme, and we can more precisely define actions as a function. We're going to effectively define a function called actions that takes an input s, where s is going to be some state that exists inside of our environment, and actions(s) is going to take that state as input and return as output the set of all actions that can be executed in that state. It's possible that some actions are only valid in certain states and not in other states, and we'll see examples of that soon, too.

So in the case of the 15 puzzle, for example, there are generally going to be four possible actions that we can take most of the time: we can slide a tile to the right, slide a tile to the left, slide a tile up, or slide a tile down, for example. Those are going to be the actions that are available to us. So somehow our AI, our program, needs some encoding of the state, which is often going to be in some numerical format, and some encoding of these actions.
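As a concrete illustration, here is one way a 15 puzzle state and its actions function might be written in Python. This is only a sketch under my own assumptions: the board is a tuple of tuples with 0 standing in for the empty space, and the action names are hypothetical, not part of any prescribed implementation.

```python
# A hypothetical encoding: the state is a tuple of tuples of numbers,
# with 0 representing the empty space that tiles slide into.
EXAMPLE_STATE = (
    ( 1,  2,  3,  4),
    ( 5,  6,  7,  8),
    ( 9, 10, 11, 12),
    (13, 14,  0, 15),
)


def actions(state):
    """Return the set of actions that can be executed in this state.

    Each action is named for the direction a neighboring tile slides
    into the empty space.
    """
    # Find the row and column of the empty space (the 0).
    row, col = next(
        (r, c)
        for r, row_values in enumerate(state)
        for c, value in enumerate(row_values)
        if value == 0
    )
    size = len(state)
    available = set()
    if col < size - 1:
        available.add("slide left")   # the tile to the right of the blank moves left
    if col > 0:
        available.add("slide right")  # the tile to the left of the blank moves right
    if row < size - 1:
        available.add("slide up")     # the tile below the blank moves up
    if row > 0:
        available.add("slide down")   # the tile above the blank moves down
    return available
```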
But it also needs some encoding of the relationship between these things: how do the states and actions relate to one another? In order to do that, we'll introduce to our AI a transition model, which will be a description of what state we get after we perform some available action in some other state. And again, we can be a little bit more precise about this and define the transition model a little bit more formally, again, as a function. The function is going to be a function called result, which this time takes two inputs: input number one is s, some state, and input number two is a, some action. And the output of this function result is going to be the state that we get after we perform action a in state s.

So let's take a look at an example to see more precisely what this actually means. Here's an example of a state of the 15 puzzle, and here's an example of an action, sliding a tile to the right. What happens if we pass these as inputs to the result function? Again, the result function takes this board, this state, as its first input, and it takes an action as its second input. And of course, here, I'm describing things visually so that you can see what the state is and what the action is. In a computer, you might represent one of these actions as just some number that represents the action, or, if you're familiar with enums that allow you to enumerate multiple possibilities, it might be something like that. And the state might just be represented as an array, or a two-dimensional array, of all of these numbers that exist. But here we're going to show it visually just so you can see it.

When we take this state and this action and pass them into the result function, the output is a new state: the state we get after we take a tile and slide it to the right, and this is the state we get as a result. If we had a different action and a different state, for example, and passed those into the result function, we'd get a different answer altogether. So the result function needs to take care of figuring out how to take a state and take an action and get what results.
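Continuing the hypothetical encoding from the sketch above, a transition model for the 15 puzzle might look something like this. Again, this is an illustrative sketch rather than a definitive implementation.

```python
def result(state, action):
    """Return the state we get after performing `action` in `state`.

    Uses the same hypothetical encoding as before: a tuple of tuples
    with 0 as the empty space, and actions named by the direction a
    neighboring tile slides into that space.
    """
    # Locate the empty space.
    row, col = next(
        (r, c)
        for r, row_values in enumerate(state)
        for c, value in enumerate(row_values)
        if value == 0
    )
    # Which neighboring tile trades places with the empty space?
    offsets = {
        "slide left":  (0, +1),   # the tile to the right of the blank
        "slide right": (0, -1),   # the tile to the left of the blank
        "slide up":    (+1, 0),   # the tile below the blank
        "slide down":  (-1, 0),   # the tile above the blank
    }
    dr, dc = offsets[action]
    tile_row, tile_col = row + dr, col + dc
    # Build a new board with the blank and that tile swapped,
    # leaving the original state untouched.
    board = [list(r) for r in state]
    board[row][col], board[tile_row][tile_col] = board[tile_row][tile_col], 0
    return tuple(tuple(r) for r in board)
```

For instance, with this sketch, result(EXAMPLE_STATE, "slide right") hands back a new board whose bottom row reads 13, blank, 14, 15, while EXAMPLE_STATE itself is left unchanged.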
And this result function is our transition model, describing how states and actions are related to each other. If we take this transition model and think about it more generally, across the entire problem, we can form what we might call a state space: the set of all of the states we can get to from the initial state via any sequence of actions, by taking zero or one or two or more actions. So we could draw a diagram that looks something like this, where every state is represented by a game board, and there are arrows that connect every state to every other state we can get to from that state. And the state space is much larger than what you see here; this is just a sample of what the state space might actually look like.

In general, across many search problems, whether they're this particular 15 puzzle or driving directions or something else, the state space is going to look something like this: we have individual states and arrows that are connecting them. And oftentimes, just for simplicity, we'll simplify our representation of this entire thing as a graph, some set of nodes and edges that connect nodes. But you can think of this more abstract representation as the exact same idea. Each of these little circles, or nodes, is going to represent one of the states inside of our problem, and the arrows represent the actions that we can take in any particular state, taking us from one particular state to another state.
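In code, a small state space like this can be written down very compactly. For instance, the little sample graph we'll walk through in a few minutes, with states A through F, could be encoded as an adjacency mapping, with each state listing the states that one action can take us to. This is just an illustrative encoding of that example, not anything prescribed:

```python
# Each state maps to the list of states reachable from it in one action.
GRAPH = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["F"],
    "E": [],
    "F": [],
}
```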
All right. So now we have this idea of nodes that are representing these states, actions that can take us from one state to another, and a transition model that defines what happens after we take a particular action. The next step we need to figure out is how we know when the AI is done solving the problem. The AI needs some way to know when it gets to the goal, that it's found the goal. So the next thing we'll need to encode into our artificial intelligence is a goal test: some way to determine whether a given state is a goal state.

In the case of something like driving directions, it might be pretty easy. If you're in a state that corresponds to whatever the user typed in as their intended destination, well, then you know you're in a goal state. In the 15 puzzle, it might be checking the numbers to make sure they're all in ascending order. But the AI needs some way to encode whether or not any state it happens to be in is a goal. And some problems might have one goal, like a maze where you have one initial position and one ending position, and that's the goal. In other, more complex problems, you might imagine that there are multiple possible goals, that there are multiple ways to solve a problem, and we might not care which one the computer finds as long as it does find a particular goal.
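For the 15 puzzle, that check is easy to write down. Continuing the hypothetical tuple-of-tuples encoding from earlier, and assuming the solved board has the blank in the last cell:

```python
def goal_test(state):
    """True if the tiles read 1 through 15 in order, with the blank (0) in the last cell."""
    flattened = [value for row in state for value in row]
    return flattened == list(range(1, 16)) + [0]
```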
However, sometimes a computer doesn't just care about finding a goal, but about finding a goal well, or one with a low cost. And it's for that reason that the last piece of terminology that we use to define these search problems is something called a path cost. You might imagine that in the case of driving directions, it would be pretty annoying if I said I wanted directions from point A to point B, and the route that Google Maps gave me was a long route with lots of unnecessary detours that took longer than it should have for me to get to that destination. And it's for that reason that when we're formulating search problems, we'll often give every path some sort of numerical cost, some number telling us how expensive it is to take this particular option, and then tell our AI that, instead of just finding a solution, some way of getting from the initial state to the goal, we'd really like to find one that minimizes this path cost: one that is less expensive, or takes less time, or minimizes some other numerical value.

We can represent this graphically if we take a look at this graph again and imagine that each of these arrows, each of these actions that we can take from one state to another state, has some sort of number associated with it, that number being the path cost of this particular action, where the cost of one particular action might be more expensive than the cost of some other action. Although this will only happen in some sorts of problems. In other problems, we can simplify the diagram and just assume that the cost of any particular action is the same. And this is probably the case in something like the 15 puzzle, for example, where it doesn't really make a difference whether I'm moving right or moving left; the only thing that matters is the total number of steps that I have to take to get from point A to point B, and each of those steps is of equal cost. We can just assume it's some constant cost, like one.

And so this now forms the basis for what we might consider to be a search problem. A search problem has some sort of initial state, some place where we begin; some sort of action that we can take, or multiple actions that we can take, in any given state; and it has a transition model, some way of defining what happens when we go from one state and take one action, what state we end up with as a result. In addition to that, we need some goal test to know whether or not we've reached a goal, and then we need a path cost function that tells us, for any particular path, by following some sequence of actions, how expensive that path is: what its cost is in terms of money, or time, or some other resource that we are trying to minimize our usage of. The goal, ultimately, is to find a solution, where a solution in this case is just some sequence of actions that will take us from the initial state to the goal state. And, ideally, we'd like to find not just any solution, but the optimal solution, which is a solution that has the lowest path cost among all of the possible solutions. In some cases, there might be multiple optimal solutions, but an optimal solution just means that there is no way that we could have done better in terms of finding that solution.
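One common way to bundle all of those ingredients together in code is a small abstract class: the initial state, the actions function, the transition model, the goal test, and a cost for each step. What follows is only a sketch of that idea under my own naming, not an interface the course prescribes:

```python
from abc import ABC, abstractmethod


class SearchProblem(ABC):
    """A hypothetical bundle of the ingredients that define a search problem."""

    initial_state = None  # where the agent begins; set by each concrete problem

    @abstractmethod
    def actions(self, state):
        """Return the set of actions that can be executed in `state`."""

    @abstractmethod
    def result(self, state, action):
        """The transition model: the state reached by doing `action` in `state`."""

    @abstractmethod
    def goal_test(self, state):
        """Return True if `state` is a goal state."""

    def step_cost(self, state, action):
        """Cost of taking `action` in `state` (a constant 1 by default, as in the 15 puzzle)."""
        return 1
```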
So now we've defined the problem, and now we need to begin to figure out how it is that we're going to solve this kind of search problem. In order to do so, you can probably imagine that our computer is going to need to represent a whole bunch of data about this particular problem. We need to represent data about where we are in the problem, and we might need to be considering multiple different options at once. And oftentimes, when we're trying to package a whole bunch of data related to a state together, we'll do so using a data structure that we're going to call a node. A node is a data structure that is just going to keep track of a variety of different values, and specifically, in the case of a search problem, it's going to keep track of these four values in particular.

Every node is going to keep track of a state, the state we're currently on. And every node is also going to keep track of a parent, the parent being the state before us, or the node that we used in order to get to this current state. This is going to be relevant because, eventually, once we reach the goal node, once we get to the end, we want to know what sequence of actions we used in order to get to that goal. And the way we'll know that is by looking at these parents to keep track of what led us to the goal, and what led us to that state, and what led us to the state before that, and so on and so forth, backtracking our way to the beginning so that we know the entire sequence of actions we needed in order to get from the beginning to the end. The node is also going to keep track of what action we took in order to get from the parent to the current state. And the node is also going to keep track of a path cost; in other words, it's going to keep track of the number that represents how long it took to get from the initial state to the state that we currently happen to be at. We'll see why this is relevant as we start to talk about some of the optimizations that we can make in terms of these search problems more generally. So this is the data structure that we're going to use in order to solve the problem.
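In Python, that node might be nothing more than a small record holding those four values. Here's a minimal sketch, with field names of my own choosing:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """The bookkeeping we keep for every state considered during the search."""
    state: Any                 # the configuration this node represents
    parent: Optional["Node"]   # the node we expanded in order to reach this one
    action: Any                # the action taken from the parent to get here
    path_cost: float           # total cost of the path from the initial state to here
```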
And now let's talk about the approach: how might we actually begin to solve the problem? Well, as you might imagine, what we're going to do is start at one particular state, and we're just going to explore from there. The intuition is that from a given state, we have multiple options that we could take, and we're going to explore those options. And once we explore those options, we'll find that more options than that are going to make themselves available. We're going to consider all of the available options to be stored inside of a single data structure that we'll call the frontier. The frontier is going to represent all of the things that we could explore next, that we haven't yet explored or visited.

So in our approach, we're going to begin this search algorithm by starting with a frontier that just contains one state. The frontier is going to contain the initial state, because at the beginning, that's the only state we know about; that is the only state that exists. And then our search algorithm is effectively going to follow a loop; we're going to repeat some process again and again and again.

The first thing we're going to do is: if the frontier is empty, then there's no solution, and we can report that there is no way to get to the goal. And that's certainly possible. There are certain types of problems that an AI might try to explore and realize that there is no way to solve that problem, and that's useful information for humans to know, as well. So if ever the frontier is empty, that means there's nothing left to explore, and we haven't yet found a solution, so there is no solution; there's nothing left to explore.

Otherwise, what we'll do is we'll remove a node from the frontier. Right now, at the beginning, the frontier just contains one node representing the initial state, but over time, the frontier might grow; it might contain multiple states. So here we're just going to remove a single node from that frontier. If that node happens to be a goal, then we've found a solution. So we remove a node from the frontier and ask ourselves, is this the goal? And we do that by applying the goal test that we talked about earlier, asking if we're at the destination, or asking if all the numbers of the 15 puzzle happen to be in order. If the node contains the goal, we've found a solution. Great. We're done. And otherwise, what we'll need to do is expand the node. And this is a term in artificial intelligence.
To expand the node just means to look at all of the neighbors of that node; in other words, to consider all of the possible actions that I could take from the state that this node is representing, and ask what nodes I could get to from there. We're going to take all of those nodes, the next nodes that we can get to from this current one we're looking at, and add those to the frontier. And then we'll repeat this process.

So, at a very high level, the idea is: we start with a frontier that contains the initial state, and we're constantly removing a node from the frontier, looking at where we can get to next, and adding those nodes to the frontier, repeating this process over and over until either we remove a node from the frontier and it contains a goal, meaning we've solved the problem, or we run into a situation where the frontier is empty, at which point we're left with no solution.
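Written out in Python, that loop might look like the sketch below. It leans on the hypothetical SearchProblem and Node sketches from earlier, and it deliberately leaves the choice of which node to remove unexamined (here, simply the last one added); both of those points come up again shortly.

```python
def search(problem):
    """A first, naive version of the search loop described above."""
    # Start with a frontier that contains just the initial state.
    frontier = [Node(state=problem.initial_state, parent=None, action=None, path_cost=0)]

    while True:
        # If the frontier is empty, there is no solution.
        if not frontier:
            return None

        # Remove a node from the frontier (which one we remove turns out to matter).
        node = frontier.pop()

        # If this node contains a goal state, rebuild the solution by
        # walking back up through the parents.
        if problem.goal_test(node.state):
            actions_taken = []
            while node.parent is not None:
                actions_taken.append(node.action)
                node = node.parent
            return list(reversed(actions_taken))

        # Otherwise, expand the node and add the resulting nodes to the frontier.
        for action in problem.actions(node.state):
            child = Node(
                state=problem.result(node.state, action),
                parent=node,
                action=action,
                path_cost=node.path_cost + problem.step_cost(node.state, action),
            )
            frontier.append(child)
```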
So let's actually try to take that pseudocode and put it into practice by taking a look at an example of a sample search problem. Right here, I have a sample graph: A is connected to B via this action, B is connected to nodes C and D, C is connected to E, and D is connected to F. And what I'd like to do is have my AI find a path from A to E; we want to get from this initial state to this goal state.

So how are we going to do that? Well, we're going to start with a frontier that contains the initial state. This is going to represent our frontier, so our frontier, initially, will just contain A, that initial state where we're going to begin. And now we'll repeat this process. If the frontier is empty, there's no solution. That's not a problem here, because the frontier is not empty. So we'll remove a node from the frontier as the one to consider next. There is only one node in the frontier, so we'll go ahead and remove it, and now A, this initial node, is the node we're currently considering. We follow the next step: we ask ourselves, is this node the goal? No, it's not. A is not the goal; E is the goal. So we don't return the solution.

So instead, we go to the last step: expand the node and add the resulting nodes to the frontier. What does that mean? Well, it means take this state A and consider where we could get to next. And after A, the only place we can get to next is B. So that's what we get when we expand A: we find B, and we add B to the frontier. Now B is in the frontier, and we repeat the process again. We say, all right, the frontier is not empty, so let's remove B from the frontier. B is now the node that we're considering. We ask ourselves, is B the goal? No, it's not. So we go ahead and expand B and add its resulting nodes to the frontier. What happens when we expand B? In other words, what nodes can we get to from B? Well, we can get to C and D, so we'll go ahead and add C and D to the frontier. And now we have two nodes in the frontier, C and D.

We repeat the process again. We remove a node from the frontier; for now, we'll do so arbitrarily, just by picking C. We'll see later how choosing which node you remove from the frontier is actually quite an important part of the algorithm. But for now, I'll arbitrarily remove C, say it's not the goal, and add E, the next one, to the frontier. Then let's say I remove E from the frontier, and now I'm currently looking at state E. Is that a goal state? It is, because I'm trying to find a path from A to E. So I would return the goal, and that, now, would be the solution: I'm now able to return the solution, and I've found a path from A to E.

So this is the general idea, the general approach of this search algorithm: to follow these steps, constantly removing nodes from the frontier, until we're able to find a solution.
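That walkthrough is essentially what the naive sketch from above does if we wrap the little adjacency map in a problem object. This is purely illustrative, reusing the hypothetical GRAPH, SearchProblem, Node, and search pieces sketched earlier:

```python
class GraphProblem(SearchProblem):
    """Treat the small adjacency-map graph as a search problem."""

    def __init__(self, graph, start, goal):
        self.graph = graph
        self.initial_state = start
        self.goal = goal

    def actions(self, state):
        # Here an "action" is simply the name of the neighboring state we move to.
        return self.graph[state]

    def result(self, state, action):
        return action

    def goal_test(self, state):
        return state == self.goal


print(search(GraphProblem(GRAPH, "A", "E")))  # should print ['B', 'C', 'E']
```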
So the next question you might reasonably ask is, what could go wrong here? What are the potential problems with an approach like this? Here's one example of a problem that could arise from this sort of approach. Imagine this same graph, same as before, with one change: instead of just an arrow from A to B, we also have an arrow from B to A, meaning we can go in both directions. And this is true in something like the 15 puzzle, where when I slide a tile to the right, I could then slide a tile to the left to get back to the original position. I could go back and forth between A and B, and that's what these double arrows symbolize: the idea that from one state I can get to another, and then I can get back. And that's true in many search problems.

What's going to happen if I try to apply the same approach now? Well, I'll begin with A, same as before, and I'll remove A from the frontier. Then I'll consider where I can get to from A, and after A, the only place I can get to is B, so B goes into the frontier. Then I'll say, all right, let's take a look at B; that's the only thing left in the frontier. Where can I get to from B? Before, it was just C and D, but now, because of that reverse arrow, I can get to A or C or D. So all three, A, C, and D, now go into the frontier; they are places I can get to from B. And now I remove one from the frontier, and, you know, maybe I'm unlucky and maybe I pick A. Now I'm looking at A again, and I consider where I can get to from A, and from A, well, I can get to B. And now we start to see the problem: if I'm not careful, I go from A to B, and then back to A, and then to B again, and I could be going in an infinite loop where I never make any progress, because I'm constantly just going back and forth between two states that I've already seen.

So what is the solution to this? We need some way to deal with this problem, and the way we can deal with it is by somehow keeping track of what we've already explored. The logic is going to be: well, if we've already explored the state, there's no reason to go back to it. Once we've explored a state, don't go back to it, don't bother adding it to the frontier; there's no need to.

So here is going to be our revised approach, a better way to approach this sort of search problem. It's going to look very similar, just with a couple of modifications. We'll start with a frontier that contains the initial state, same as before.
But now we'll also start with another data structure, which will just be a set of the states we've already explored. So what are the states we've explored? Initially, it's empty; we have an empty explored set. And now we repeat. If the frontier is empty, no solution, same as before. We remove a node from the frontier, we check to see if it's a goal state, and if so, we return the solution. None of this is any different so far. But now, what we're going to do is add the node to the explored set. So if it happens to be the case that we remove a node from the frontier and it's not the goal, we'll add it to the explored set so that we know we've already explored it; we don't need to go back to it again if it happens to come up later. And then, in the final step, we expand the node and we add the resulting nodes to the frontier. But whereas before we just always added the resulting nodes to the frontier, we're going to be a little cleverer about it this time: we're only going to add the nodes to the frontier if they aren't already in the frontier and if they aren't already in the explored set. So we'll check both the frontier and the explored set, make sure that the node isn't already in one of those two, and so long as it isn't, then we'll go ahead and add it to the frontier, but not otherwise. And that revised approach is ultimately what's going to help make sure that we don't go back and forth between two nodes.
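Here is the same search sketch from before, revised with those two modifications: an explored set, and the check before adding children to the frontier. I've also pulled the "which node do we remove?" decision out into a parameter, since, as the walkthrough hinted, that choice is about to matter a great deal. As before, this is a sketch built on the hypothetical pieces above, not the definitive implementation.

```python
def search(problem, remove_from_frontier):
    """The revised search loop: same as before, plus an explored set.

    `remove_from_frontier` is any function that takes the frontier
    (a list of Nodes) and removes and returns one of them.
    """
    frontier = [Node(state=problem.initial_state, parent=None, action=None, path_cost=0)]
    explored = set()  # states we've already expanded

    while True:
        if not frontier:
            return None  # frontier empty: no solution

        node = remove_from_frontier(frontier)

        if problem.goal_test(node.state):
            # Walk back through the parents to recover the sequence of actions.
            actions_taken = []
            while node.parent is not None:
                actions_taken.append(node.action)
                node = node.parent
            return list(reversed(actions_taken))

        # New: remember that we've explored this state.
        explored.add(node.state)

        for action in problem.actions(node.state):
            child_state = problem.result(node.state, action)
            # New: only add states that aren't already in the frontier or the explored set.
            already_in_frontier = any(n.state == child_state for n in frontier)
            if child_state not in explored and not already_in_frontier:
                frontier.append(Node(
                    state=child_state,
                    parent=node,
                    action=action,
                    path_cost=node.path_cost + problem.step_cost(node.state, action),
                ))
```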
Now, the one point that I've kind of glossed over so far is this step here: removing a node from the frontier. Before, I just chose arbitrarily: let's just remove a node, and that's it. But it turns out it's actually quite important how we decide to structure our frontier, how we add our nodes, and how we remove them. The frontier is a data structure, and we need to make a choice about the order in which we're going to be removing elements. One of the simplest data structures for adding and removing elements is something called a stack, and a stack is a last-in, first-out data type, which means the last thing that I add to the frontier is going to be the first thing that I remove from the frontier. So the most recent thing to go into the stack, or the frontier in this case, is going to be the node that I explore next.

So let's see what happens if I apply this stack-based approach to something like this problem, finding a path from A to E. What's going to happen? Well, again, we'll start with A, and we'll say, all right, let's go ahead and look at A first. And then, notice, this time we've added A to the explored set; A is something we've now explored, and we have this data structure that's keeping track. We then say, from A we can get to B. All right, from B, what can we do? Well, from B, we can explore B and get to both C and D. So we added C and then D. Now, when we explore a node, we're going to treat the frontier as a stack: last in, first out. D was the last one to come in, so we'll go ahead and explore that next and say, all right, where can we get to from D? Well, we can get to F, and so, all right, we'll put F into the frontier. And now, because the frontier is a stack, F is the most recent thing that's gone into the stack, so F is what we'll explore next. We'll explore F and say, all right, where can we get to from F? Well, we can't get anywhere, so nothing gets added to the frontier. So now, what is the most recent thing added to the frontier? Well, it's now C, the only thing left in the frontier. We'll explore that and say, all right, from C we can get to E, so E goes into the frontier. And then we say, all right, let's look at E, and E is now the solution. We've solved the problem.

So when we treat the frontier like a stack, a last-in, first-out data structure, that's the result we get. We go from A to B to D to F, and then we sort of back up and go down to C and then E. And it's important to get a visual sense for how this algorithm is working. We went very deep in this search tree, so to speak, all the way until the bottom, where we hit a dead end.
And then we effectively backed up and explored this other route that we didn't try before. And it's this idea of going very deep in the search tree, the way the algorithm ends up working when we use a stack, that leads us to call this version of the algorithm depth-first search. Depth-first search is the search algorithm where we always explore the deepest node in the frontier. We keep going deeper and deeper through our search tree, and then, if we hit a dead end, we back up and we try something else instead.

But depth-first search is just one of the possible search options that we could use. It turns out that there is another algorithm, called breadth-first search, which behaves very similarly to depth-first search, with one difference. Instead of always exploring the deepest node in the search tree the way depth-first search does, breadth-first search is always going to explore the shallowest node in the frontier. So what does that mean? Well, it means that instead of using a stack, which depth-first search, or DFS, used, where the most recent item added to the frontier is the one we'll explore next, breadth-first search, or BFS, will instead use a queue, where a queue is a first-in, first-out data type: the very first thing we add to the frontier is the first one we'll explore. The nodes effectively form a line, or a queue, where the earlier you arrive in the frontier, the earlier you get explored.
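In the revised sketch from earlier, this is the only thing that changes between the two algorithms: the function we pass in to decide which node leaves the frontier. A minimal illustration, again assuming the hypothetical search, GRAPH, and GraphProblem pieces sketched above:

```python
# Depth-first search: the frontier behaves like a stack (last in, first out).
def remove_last(frontier):
    return frontier.pop()      # take the node that was added most recently


# Breadth-first search: the frontier behaves like a queue (first in, first out).
def remove_first(frontier):
    return frontier.pop(0)     # take the node that has been waiting the longest


dfs_path = search(GraphProblem(GRAPH, "A", "E"), remove_last)
bfs_path = search(GraphProblem(GRAPH, "A", "E"), remove_first)
```

In practice, popping from the front of a Python list is linear-time work, so a real implementation would likely reach for collections.deque (or a dedicated frontier class) for the queue case; a plain list just keeps the two versions symmetric in this sketch.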
So what would that mean for the same exact problem, finding a path from A to E? Well, we start with A, same as before. Then we'll go ahead and mark A as explored and say, where can we get to from A? Well, from A we can get to B, same as before. From B, same as before, we can get to C and D, so C and D get added to the frontier. This time, though, we added C to the frontier before D, so we'll explore C first. So C gets explored, and from C, where can we get to? Well, we can get to E, so E gets added to the frontier. But because D arrived in the frontier before E, we'll look at D next. So we'll explore D and say, where can we get to from D? We can get to F. And only then will we say, all right, now we can get to E.

And so what breadth-first search, or BFS, did is we started here, we looked at both C and D, and then we looked at E. Effectively, we're looking at things one away from the initial state, then two away from the initial state, and only then things that are three away from the initial state. Unlike depth-first search, which just went as deep as possible into the search tree until it hit a dead end and then, ultimately, had to back up.

So these now are two different search algorithms that we could apply in order to try to solve a problem. Let's take a look at how these would actually work in practice with something like maze solving, for example. So here's an example of a maze. These empty cells represent places where our agent can move; these darkened gray cells represent walls that the agent can't pass through. And, ultimately, our agent, our AI, is going to try to find a way to get from position A to position B via some sequence of actions, where those actions are left, right, up, and down.

What will depth-first search do in this case? Well, depth-first search will just follow one path. If it reaches a fork in the road where it has multiple different options, depth-first search is, in this case, just going to choose one; there isn't a real preference. But it's going to keep following one until it hits a dead end. And when it hits a dead end, depth-first search effectively goes back to the last decision point and tries the other path, fully exhausting this entire path, and when it realizes that, OK, the goal is not here, it turns its attention to the other path. It goes as deep as possible; when it hits a dead end, it backs up and then tries the other path, keeps going as deep as possible down one particular path, and when it realizes that that's a dead end, it'll back up and then ultimately find its way to the goal. Maybe we would have gotten lucky had we made a different choice earlier on, but ultimately this is how depth-first search is going to work: it's going to keep following one path until it hits a dead end.
688 00:31:56,900 --> 00:32:00,860 And when it hits a dead end, it backs up and looks for a different solution. 689 00:32:00,860 --> 00:32:02,570 And so one thing you might reasonably ask 690 00:32:02,570 --> 00:32:05,010 is, is this algorithm always going to work? 691 00:32:05,010 --> 00:32:09,200 Will it always actually find a way to get from the initial state to the goal? 692 00:32:09,200 --> 00:32:11,420 And it turns out that as long as our maze 693 00:32:11,420 --> 00:32:14,630 is finite, as long as there are finitely many spaces where 694 00:32:14,630 --> 00:32:15,890 we can travel, then yes. 695 00:32:15,890 --> 00:32:19,900 Depth first search is going to find a solution because eventually it 696 00:32:19,900 --> 00:32:21,290 will just explore everything. 697 00:32:21,290 --> 00:32:24,590 If the maze happens to be infinite and there's an infinite state space, which 698 00:32:24,590 --> 00:32:28,350 does exist in certain types of problems, then it's a slightly different story. 699 00:32:28,350 --> 00:32:30,950 But as long as our maze has finitely many squares, 700 00:32:30,950 --> 00:32:33,020 we're going to find a solution. 701 00:32:33,020 --> 00:32:34,940 The next question, though, that we want to ask 702 00:32:34,940 --> 00:32:37,040 is, is it going to be a good solution? 703 00:32:37,040 --> 00:32:39,740 Is it the optimal solution that we can find? 704 00:32:39,740 --> 00:32:42,190 And the answer there is not necessarily. 705 00:32:42,190 --> 00:32:44,030 And let's take a look at an example of that. 706 00:32:44,030 --> 00:32:48,890 In this maze, for example, we're again trying to find our way from A to B. 707 00:32:48,890 --> 00:32:51,470 And you notice here there are multiple possible solutions. 708 00:32:51,470 --> 00:32:54,560 We could go this way, or we could go up in order 709 00:32:54,560 --> 00:32:57,350 to make our way from A to B. Now if we're lucky, 710 00:32:57,350 --> 00:33:01,040 depth first search will choose this way and get to B. But there's no reason, 711 00:33:01,040 --> 00:33:03,140 necessarily, why depth first search would choose 712 00:33:03,140 --> 00:33:05,440 between going up or going to the right. 713 00:33:05,440 --> 00:33:08,270 It's sort of an arbitrary decision point because both 714 00:33:08,270 --> 00:33:10,420 are going to be added to the frontier. 715 00:33:10,420 --> 00:33:13,310 And ultimately, if we get unlucky, depth first search 716 00:33:13,310 --> 00:33:15,950 might choose to explore this path first because it's just 717 00:33:15,950 --> 00:33:17,430 a random choice at this point. 718 00:33:17,430 --> 00:33:20,570 It will explore, explore, explore, and it'll eventually 719 00:33:20,570 --> 00:33:24,110 find the goal along this particular path, when in actuality there 720 00:33:24,110 --> 00:33:25,010 was a better path. 721 00:33:25,010 --> 00:33:28,940 There was a more optimal solution that used fewer steps, 722 00:33:28,940 --> 00:33:32,660 assuming we're measuring the cost of a solution based on the number of steps 723 00:33:32,660 --> 00:33:33,930 that we need to take. 724 00:33:33,930 --> 00:33:36,680 So depth first search, if we're unlucky, might end up 725 00:33:36,680 --> 00:33:41,880 not finding the best solution when a better solution is available. 726 00:33:41,880 --> 00:33:44,370 So if that's DFS, depth first search, 727 00:33:44,370 --> 00:33:47,300 how does BFS, or breadth first search, compare? 728 00:33:47,300 --> 00:33:49,550 How would it work in this particular situation?
729 00:33:49,550 --> 00:33:52,490 Well the algorithm is going to look very different visually 730 00:33:52,490 --> 00:33:54,800 in terms of how BFS explores. 731 00:33:54,800 --> 00:33:57,800 Because BFS looks at shallower nodes first, 732 00:33:57,800 --> 00:34:01,850 the idea is going to be BFS will first look at all of the nodes that 733 00:34:01,850 --> 00:34:04,190 are one away from the initial state. 734 00:34:04,190 --> 00:34:06,050 Look here and look here, for example. 735 00:34:06,050 --> 00:34:10,590 Just at the two nodes that are immediately next to this initial state. 736 00:34:10,590 --> 00:34:12,470 Then it will explore nodes that are two away, 737 00:34:12,470 --> 00:34:15,050 looking at this state and that state, for example. 738 00:34:15,050 --> 00:34:18,140 Then it will explore nodes that are three away, this state and that state. 739 00:34:18,140 --> 00:34:22,190 Whereas depth first search just picked one path and kept following it, 740 00:34:22,190 --> 00:34:24,210 breadth first search, on the other hand, is 741 00:34:24,210 --> 00:34:27,570 taking the option of exploring all of the possible paths 742 00:34:27,570 --> 00:34:30,150 kind of at the same time, bouncing back between them, 743 00:34:30,150 --> 00:34:32,520 looking deeper and deeper at each one, but making 744 00:34:32,520 --> 00:34:35,400 sure to explore the shallower ones or the ones that are 745 00:34:35,400 --> 00:34:38,060 closer to the initial state earlier. 746 00:34:38,060 --> 00:34:41,190 So we'll keep following this pattern, looking at things that are four away, 747 00:34:41,190 --> 00:34:43,170 looking at things that are five away, looking 748 00:34:43,170 --> 00:34:48,230 at things that are six away, until eventually we make our way to the goal. 749 00:34:48,230 --> 00:34:51,090 And in this case, it's true we had to explore some states that 750 00:34:51,090 --> 00:34:52,880 ultimately didn't lead us anywhere. 751 00:34:52,880 --> 00:34:56,290 But the path that we found to the goal was the optimal path. 752 00:34:56,290 --> 00:34:59,850 This is the shortest way that we could get to the goal. 753 00:34:59,850 --> 00:35:02,970 And so, what might happen then in a larger maze? 754 00:35:02,970 --> 00:35:04,850 Well let's take a look at something like this 755 00:35:04,850 --> 00:35:06,860 and how breadth first search is going to behave. 756 00:35:06,860 --> 00:35:08,690 Well, breadth first search, again, will just 757 00:35:08,690 --> 00:35:11,270 keep following the states until it reaches a decision point. 758 00:35:11,270 --> 00:35:13,460 It could go either left or right. 759 00:35:13,460 --> 00:35:16,850 And while DFS just picked one and kept following 760 00:35:16,850 --> 00:35:21,620 that until it hit a dead end, BFS, on the other hand, will explore both. 761 00:35:21,620 --> 00:35:23,720 It'll say, look at this node, then this node, 762 00:35:23,720 --> 00:35:27,510 and I'll look at this node, then that node, so on and so forth. 763 00:35:27,510 --> 00:35:31,850 And when it hits a decision point here, rather than pick one, left or right, 764 00:35:31,850 --> 00:35:36,350 and explore that path, it will again explore both, alternating between them, 765 00:35:36,350 --> 00:35:37,380 going deeper and deeper. 766 00:35:37,380 --> 00:35:41,570 It'll explore here, and then maybe here and here, and then keep going. 767 00:35:41,570 --> 00:35:44,780 Explore here and slowly make our way, as you can visually 768 00:35:44,780 --> 00:35:46,490 see, further and further out.
769 00:35:46,490 --> 00:35:48,200 Once we get to this decision point, we'll 770 00:35:48,200 --> 00:35:52,790 explore both up and down until, ultimately, we 771 00:35:52,790 --> 00:35:55,570 make our way to the goal. 772 00:35:55,570 --> 00:35:58,250 And what you'll notice is, yes, breadth first search 773 00:35:58,250 --> 00:36:02,630 did find our way from A to B by following this particular path. 774 00:36:02,630 --> 00:36:06,410 But it needed to explore a lot of states in order to do so. 775 00:36:06,410 --> 00:36:09,440 And so we see some trade-off here between DFS and BFS. 776 00:36:09,440 --> 00:36:13,430 That in DFS there may be some cases where there is some memory savings, 777 00:36:13,430 --> 00:36:16,280 as compared to a breadth first approach where 778 00:36:16,280 --> 00:36:19,210 breadth first search, in this case, had to explore a lot of states. 779 00:36:19,210 --> 00:36:22,470 But maybe that won't always be the case. 780 00:36:22,470 --> 00:36:24,920 So now let's actually turn our attention to some code. 781 00:36:24,920 --> 00:36:26,720 And look at the code that we could actually 782 00:36:26,720 --> 00:36:30,410 write in order to implement something like depth first search or breadth 783 00:36:30,410 --> 00:36:34,910 first search in the context of solving a maze, for example. 784 00:36:34,910 --> 00:36:37,340 So I'll go ahead and go into my terminal. 785 00:36:37,340 --> 00:36:41,240 And what I have here inside of maze.py is an implementation 786 00:36:41,240 --> 00:36:43,640 of this same idea of maze solving. 787 00:36:43,640 --> 00:36:46,700 I've defined a class called node that in this case 788 00:36:46,700 --> 00:36:49,620 is keeping track of the state, the parent, in other words 789 00:36:49,620 --> 00:36:51,860 the state before this state, and the action. 790 00:36:51,860 --> 00:36:54,110 In this case, we're not keeping track of the path cost 791 00:36:54,110 --> 00:36:56,780 because we can calculate the cost of the path at the end 792 00:36:56,780 --> 00:37:00,890 after we found our way from the initial state to the goal. 793 00:37:00,890 --> 00:37:05,540 In addition to this, I've defined a class called a stack frontier. 794 00:37:05,540 --> 00:37:08,780 And if you're unfamiliar with a class, a class is a way for me 795 00:37:08,780 --> 00:37:11,930 to define a way to generate objects in Python. 796 00:37:11,930 --> 00:37:16,070 It refers to an idea of object oriented programming where the idea here 797 00:37:16,070 --> 00:37:18,740 is that I would like to create an object that is 798 00:37:18,740 --> 00:37:20,900 able to store all of my frontier data. 799 00:37:20,900 --> 00:37:23,030 And I would like to have functions, otherwise known 800 00:37:23,030 --> 00:37:27,410 as methods on that object, that I can use to manipulate the object. 801 00:37:27,410 --> 00:37:31,130 And so what's going on here, if you're unfamiliar with the syntax, 802 00:37:31,130 --> 00:37:34,430 is I have a function that initially creates a frontier 803 00:37:34,430 --> 00:37:36,410 that I'm going to represent using a list. 804 00:37:36,410 --> 00:37:39,810 And initially my frontier is represented by the empty list. 805 00:37:39,810 --> 00:37:42,870 There's nothing in my frontier to begin with. 806 00:37:42,870 --> 00:37:46,020 I have an add function that adds something to the frontier, 807 00:37:46,020 --> 00:37:49,260 by appending it to the end of the list. 808 00:37:49,260 --> 00:37:53,400 I have a function that checks if the frontier contains a particular state.
809 00:37:53,400 --> 00:37:55,990 I have an empty function that checks if the frontier is empty. 810 00:37:55,990 --> 00:37:58,830 If the frontier is empty, that just means the length of the frontier 811 00:37:58,830 --> 00:38:00,180 is zero. 812 00:38:00,180 --> 00:38:03,310 And then I have a function for removing something from the frontier. 813 00:38:03,310 --> 00:38:06,150 I can't remove something from the frontier if the frontier is empty. 814 00:38:06,150 --> 00:38:07,770 So I check for that first. 815 00:38:07,770 --> 00:38:10,710 But otherwise, if the frontier isn't empty, 816 00:38:10,710 --> 00:38:13,940 recall that I'm implementing this frontier as a stack, 817 00:38:13,940 --> 00:38:17,200 a last in, first out data structure. 818 00:38:17,200 --> 00:38:19,590 Which means the last thing I add to the frontier, 819 00:38:19,590 --> 00:38:21,570 in other words, the last thing in the list, 820 00:38:21,570 --> 00:38:25,750 is the item that I should remove from this frontier. 821 00:38:25,750 --> 00:38:30,190 So what you'll see here is I remove the last item of the list. 822 00:38:30,190 --> 00:38:33,160 And if you index into a Python list with negative one, 823 00:38:33,160 --> 00:38:35,050 that gets you the last item in the list. 824 00:38:35,050 --> 00:38:37,630 Since zero is the first item, negative one 825 00:38:37,630 --> 00:38:41,490 kind of wraps around and gets you to the last item in the list. 826 00:38:41,490 --> 00:38:43,310 So we grab that node. 827 00:38:43,310 --> 00:38:46,710 We call it node, and we update the frontier here on line 28 to say, 828 00:38:46,710 --> 00:38:50,000 go ahead and remove that node that you just took from the frontier. 829 00:38:50,000 --> 00:38:54,860 And then we return the node as a result. So this class here effectively 830 00:38:54,860 --> 00:38:57,050 implements the idea of a frontier. 831 00:38:57,050 --> 00:38:59,690 It gives me a way to add something to a frontier and a way 832 00:38:59,690 --> 00:39:03,130 to remove something from the frontier as a stack. 833 00:39:03,130 --> 00:39:06,830 I've also, just for good measure, implemented an alternative version 834 00:39:06,830 --> 00:39:09,500 of the same thing called a queue frontier. 835 00:39:09,500 --> 00:39:13,190 Which, in parentheses you'll see here, it inherits from a stack frontier, 836 00:39:13,190 --> 00:39:16,640 meaning it's going to do all the same things that the stack frontier did, 837 00:39:16,640 --> 00:39:19,440 except the way we remove a node from the frontier 838 00:39:19,440 --> 00:39:21,090 is going to be slightly different. 839 00:39:21,090 --> 00:39:24,240 Instead of removing from the end of the list the way we would in a stack, 840 00:39:24,240 --> 00:39:26,810 we're instead going to remove from the beginning of the list. 841 00:39:26,810 --> 00:39:31,590 self.frontier[0] will get me the first node in the frontier, 842 00:39:31,590 --> 00:39:32,870 the first one that was added. 843 00:39:32,870 --> 00:39:37,550 And that is going to be the one that we return in the case of a queue. 844 00:39:37,550 --> 00:39:40,340 Under here I have a definition of a class called maze. 845 00:39:40,340 --> 00:39:45,050 This is going to handle the process of taking a sequence, a maze-like text 846 00:39:45,050 --> 00:39:47,400 file, and figuring out how to solve it.
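For reference, a minimal sketch of the node and frontier classes just described might look like the following. This is one possible rendering of that description in Python, not necessarily line-for-line the code in maze.py:

    class Node():
        def __init__(self, state, parent, action):
            self.state = state    # the current state
            self.parent = parent  # the node that came before this one
            self.action = action  # the action taken to get here

    class StackFrontier():
        def __init__(self):
            self.frontier = []    # nothing in the frontier to begin with

        def add(self, node):
            self.frontier.append(node)   # append to the end of the list

        def contains_state(self, state):
            return any(node.state == state for node in self.frontier)

        def empty(self):
            return len(self.frontier) == 0

        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            node = self.frontier[-1]            # last in, first out
            self.frontier = self.frontier[:-1]
            return node

    class QueueFrontier(StackFrontier):
        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            node = self.frontier[0]             # first in, first out
            self.frontier = self.frontier[1:]
            return node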
847 00:39:47,400 --> 00:39:50,480 So we'll take as input a text file that looks something 848 00:39:50,480 --> 00:39:53,810 like this, for example, where we see hash marks here representing 849 00:39:53,810 --> 00:39:57,830 walls, and I have the character A representing the starting position, 850 00:39:57,830 --> 00:40:01,740 and the character B representing the ending position. 851 00:40:01,740 --> 00:40:04,790 And you can take a look at the code for parsing this text file right now. 852 00:40:04,790 --> 00:40:06,320 That's the less interesting part. 853 00:40:06,320 --> 00:40:09,380 The more interesting part is this solve function here, 854 00:40:09,380 --> 00:40:11,510 where the solve function is going to figure out 855 00:40:11,510 --> 00:40:15,230 how to actually get from point A to point B. 856 00:40:15,230 --> 00:40:18,260 And here we see an implementation of the exact same idea 857 00:40:18,260 --> 00:40:19,900 we saw from a moment ago. 858 00:40:19,900 --> 00:40:21,740 We're going to keep track of how many states 859 00:40:21,740 --> 00:40:24,520 we've explored just so we can report that data later. 860 00:40:24,520 --> 00:40:29,770 But I start with a node that represents just the start state. 861 00:40:29,770 --> 00:40:34,040 And I start with a frontier that in this case is a stack frontier. 862 00:40:34,040 --> 00:40:36,170 And given that I'm treating my frontier as a stack, 863 00:40:36,170 --> 00:40:40,160 you might imagine that the algorithm I'm using here is now depth first search. 864 00:40:40,160 --> 00:40:45,230 Because depth first search or DFS uses a stack as its data structure. 865 00:40:45,230 --> 00:40:50,350 And initially, this frontier is just going to contain the start state. 866 00:40:50,350 --> 00:40:53,390 We initialize an explored set that initially is empty. 867 00:40:53,390 --> 00:40:55,420 There's nothing we've explored so far. 868 00:40:55,420 --> 00:40:59,980 And now here's our loop, that notion of repeating something again and again. 869 00:40:59,980 --> 00:41:03,760 First, we check if the frontier is empty by calling that empty function that we 870 00:41:03,760 --> 00:41:05,890 saw the implementation of a moment ago. 871 00:41:05,890 --> 00:41:08,080 And if the frontier is indeed empty, we'll 872 00:41:08,080 --> 00:41:11,650 go ahead and raise an exception, or a Python error, to say, sorry. 873 00:41:11,650 --> 00:41:15,230 There is no solution to this problem. 874 00:41:15,230 --> 00:41:18,630 Otherwise, we'll go ahead and remove a node from the frontier, 875 00:41:18,630 --> 00:41:22,960 by calling frontier.remove, and update the number of states we've explored. 876 00:41:22,960 --> 00:41:25,360 Because now we've explored one additional state, 877 00:41:25,360 --> 00:41:30,100 we say self.num_explored += 1, adding one to the number of states 878 00:41:30,100 --> 00:41:31,880 we've explored. 879 00:41:31,880 --> 00:41:34,430 Once we remove a node from the frontier, recall 880 00:41:34,430 --> 00:41:38,330 that the next step is to see whether or not it's the goal, the goal test. 881 00:41:38,330 --> 00:41:40,850 And in the case of the maze, the goal is pretty easy. 882 00:41:40,850 --> 00:41:45,110 I check to see whether the state of the node is equal to the goal. 883 00:41:45,110 --> 00:41:47,090 Initially when I set up the maze, I set up 884 00:41:47,090 --> 00:41:49,890 this value called goal, which is a property of the maze, 885 00:41:49,890 --> 00:41:53,300 so I can just check to see if the node is actually the goal.
886 00:41:53,300 --> 00:41:56,060 And if it is the goal, then what I want to do 887 00:41:56,060 --> 00:41:59,450 is backtrack my way towards figuring out what actions 888 00:41:59,450 --> 00:42:02,360 I took in order to get to this goal. 889 00:42:02,360 --> 00:42:03,470 And how do I do that? 890 00:42:03,470 --> 00:42:06,140 We'll recall that every node stores its parent-- 891 00:42:06,140 --> 00:42:09,110 the node that came before it that we used to get to this node-- 892 00:42:09,110 --> 00:42:11,690 and also the action used in order to get there. 893 00:42:11,690 --> 00:42:13,700 So I can create this loop where I'm constantly 894 00:42:13,700 --> 00:42:17,000 just looking at the parent of every node and keeping 895 00:42:17,000 --> 00:42:20,600 track, for all of the parents, what action I took to get from the parent 896 00:42:20,600 --> 00:42:21,890 to this. 897 00:42:21,890 --> 00:42:25,250 So this loop is going to keep repeating this process of looking through all 898 00:42:25,250 --> 00:42:28,670 of the parent nodes until we get back to the initial state, which 899 00:42:28,670 --> 00:42:32,940 has no parent, where node.parent is going to be equal to none. 900 00:42:32,940 --> 00:42:35,240 As I do so, I'm going to be building up the list of all 901 00:42:35,240 --> 00:42:38,030 of the actions that I'm following and the list of all of the cells 902 00:42:38,030 --> 00:42:39,590 that are part of the solution. 903 00:42:39,590 --> 00:42:42,020 But I'll reverse them because when I build it 904 00:42:42,020 --> 00:42:44,930 up going from the goal back to the initial state, 905 00:42:44,930 --> 00:42:48,020 I'm building the sequence of actions from the goal to the initial state, 906 00:42:48,020 --> 00:42:50,900 but I want to reverse them in order to get the sequence of actions 907 00:42:50,900 --> 00:42:53,630 from the initial state to the goal. 908 00:42:53,630 --> 00:42:57,210 And that is, ultimately, going to be the solution. 909 00:42:57,210 --> 00:43:01,280 So all of that happens if the current state is equal to the goal. 910 00:43:01,280 --> 00:43:03,290 And otherwise, if it's not the goal, well, 911 00:43:03,290 --> 00:43:06,860 then I'll go ahead and add this state to the explored set to say, 912 00:43:06,860 --> 00:43:08,240 I've explored this state now. 913 00:43:08,240 --> 00:43:11,510 No need to go back to it if I come across it in the future. 914 00:43:11,510 --> 00:43:14,750 And then, this logic here implements the idea 915 00:43:14,750 --> 00:43:16,820 of adding neighbors to the frontier. 916 00:43:16,820 --> 00:43:18,650 I'm saying, look at all of my neighbors. 917 00:43:18,650 --> 00:43:21,530 And I implemented a function called neighbors that you can take a look at. 918 00:43:21,530 --> 00:43:23,690 And for each of those neighbors, I'm going to check, 919 00:43:23,690 --> 00:43:25,850 is the state already in the frontier? 920 00:43:25,850 --> 00:43:28,400 Is the state already in the explored set? 921 00:43:28,400 --> 00:43:32,600 And if it's not in either of those, then I'll go ahead and add this new child 922 00:43:32,600 --> 00:43:33,950 node-- this new node-- 923 00:43:33,950 --> 00:43:35,230 to the frontier. 924 00:43:35,230 --> 00:43:37,610 So there's a fair amount of syntax here, but the key here 925 00:43:37,610 --> 00:43:39,920 is not to understand all the nuances of the syntax, 926 00:43:39,920 --> 00:43:42,740 though feel free to take a closer look at this file on your own 927 00:43:42,740 --> 00:43:44,660 to get a sense for how it is working. 
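For reference, a condensed sketch of that solve loop, consistent with the steps just described, might look like this. Names such as self.start, self.goal, and the neighbors helper are assumed from context here, as this is a method on the Maze class:

    def solve(self):
        """Find a solution to the maze, if one exists."""
        self.num_explored = 0

        # Start with a frontier that contains only the initial state.
        start = Node(state=self.start, parent=None, action=None)
        frontier = StackFrontier()      # swap in QueueFrontier() for BFS
        frontier.add(start)

        # Nothing has been explored yet.
        self.explored = set()

        while True:
            # If the frontier is empty, there is no path to the goal.
            if frontier.empty():
                raise Exception("no solution")

            # Remove a node from the frontier and count it as explored.
            node = frontier.remove()
            self.num_explored += 1

            # Goal test: walk back up the chain of parents to recover the
            # sequence of actions and cells, then reverse it so it runs
            # from the initial state to the goal.
            if node.state == self.goal:
                actions, cells = [], []
                while node.parent is not None:
                    actions.append(node.action)
                    cells.append(node.state)
                    node = node.parent
                actions.reverse()
                cells.reverse()
                self.solution = (actions, cells)
                return

            # Otherwise mark the state explored and add new neighbors
            # that are in neither the frontier nor the explored set.
            self.explored.add(node.state)
            for action, state in self.neighbors(node.state):
                if not frontier.contains_state(state) and state not in self.explored:
                    frontier.add(Node(state=state, parent=node, action=action))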
928 00:43:44,660 --> 00:43:48,140 But the key is to see how this is an implementation of the same pseudocode, 929 00:43:48,140 --> 00:43:53,060 the same idea that we were describing a moment ago on the screen when we were 930 00:43:53,060 --> 00:43:55,220 looking at the steps that we might follow in order 931 00:43:55,220 --> 00:43:57,720 to solve this kind of search problem. 932 00:43:57,720 --> 00:43:59,540 So now let's actually see this in action. 933 00:43:59,540 --> 00:44:05,540 I'll go ahead and run maze.py on maze1.txt, for example. 934 00:44:05,540 --> 00:44:09,320 And what we'll see is here we have a printout of what the maze initially 935 00:44:09,320 --> 00:44:10,370 looked like. 936 00:44:10,370 --> 00:44:13,010 And then here, down below, is after we've solved it. 937 00:44:13,010 --> 00:44:17,980 We had to explore 11 states in order to do it, and we found a path from A to B. 938 00:44:17,980 --> 00:44:21,110 And in this program, I just happened to generate a graphical representation 939 00:44:21,110 --> 00:44:22,040 of this, as well-- 940 00:44:22,040 --> 00:44:25,400 so I can open up maze.png, which is generated by this program-- 941 00:44:25,400 --> 00:44:28,820 that shows you where, in the darker color here, the wall is. 942 00:44:28,820 --> 00:44:30,860 Red is the initial state, green is the goal, 943 00:44:30,860 --> 00:44:32,930 and yellow is the path that was followed. 944 00:44:32,930 --> 00:44:37,230 We found a path from the initial state to the goal. 945 00:44:37,230 --> 00:44:40,050 But now let's take a look at a more sophisticated maze 946 00:44:40,050 --> 00:44:42,120 to see what might happen instead. 947 00:44:42,120 --> 00:44:47,010 Let's look now at maze2.txt, where now here we have a much larger maze. 948 00:44:47,010 --> 00:44:50,280 Again, we're trying to find our way from point A to point B, 949 00:44:50,280 --> 00:44:53,520 but now you might imagine that depth-first search might not be so lucky. 950 00:44:53,520 --> 00:44:56,010 It might not get the goal on the first try. 951 00:44:56,010 --> 00:44:58,560 It might have to follow one path then backtrack 952 00:44:58,560 --> 00:45:02,100 and explore something else a little bit later. 953 00:45:02,100 --> 00:45:03,230 So let's try this. 954 00:45:03,230 --> 00:45:08,930 Run python maze.py on maze2.txt, this time trying it on this other maze. 955 00:45:08,930 --> 00:45:12,140 And now depth-first search is able to find a solution. 956 00:45:12,140 --> 00:45:16,040 Here, as indicated by the stars, is a way to get from A to B. 957 00:45:16,040 --> 00:45:19,430 And we can represent this visually by opening up this maze. 958 00:45:19,430 --> 00:45:20,810 Here's what that maze looks like. 959 00:45:20,810 --> 00:45:24,860 And highlighted in yellow is the path that was found from the initial state 960 00:45:24,860 --> 00:45:26,300 to the goal. 961 00:45:26,300 --> 00:45:31,340 But how many states did we have to explore before we found that path? 962 00:45:31,340 --> 00:45:34,610 Well, recall that, in my program, I was keeping track of the number of states 963 00:45:34,610 --> 00:45:36,530 that we've explored so far. 964 00:45:36,530 --> 00:45:40,280 And so I can go back to the terminal and see that, all right, in order 965 00:45:40,280 --> 00:45:46,110 to solve this problem, we had to explore 399 different states.
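A plausible driver for running the solver from the terminal like this might look as follows. It assumes the Maze class exposes solve(), a num_explored counter, and an image-writing method roughly along these lines; the exact method names here are guesses for illustration:

    import sys

    if __name__ == "__main__":
        if len(sys.argv) != 2:
            sys.exit("Usage: python maze.py maze.txt")
        m = Maze(sys.argv[1])       # e.g. maze1.txt or maze2.txt
        m.solve()
        print("States explored:", m.num_explored)
        m.output_image("maze.png")  # hypothetical name for the image writer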
966 00:45:46,110 --> 00:45:48,860 And in fact, I can make one small modification to the program 967 00:45:48,860 --> 00:45:51,950 and tell the program, at the end when we output this image, to use 968 00:45:51,950 --> 00:45:55,160 an argument I added called "show explored". 969 00:45:55,160 --> 00:45:57,980 And if I set "show explored" equal to true 970 00:45:57,980 --> 00:46:02,720 and rerun this program, python maze.py, running it on maze2, 971 00:46:02,720 --> 00:46:06,320 and then I open the maze, what you'll see here, highlighted in red, 972 00:46:06,320 --> 00:46:10,610 are all of the states that had to be explored to get from the initial state 973 00:46:10,610 --> 00:46:11,510 to the goal. 974 00:46:11,510 --> 00:46:15,140 Depth-First Search, or DFS, didn't find its way to the goal right away. 975 00:46:15,140 --> 00:46:18,170 It made a choice to first explore this direction. 976 00:46:18,170 --> 00:46:19,970 And when it explored this direction, it had 977 00:46:19,970 --> 00:46:22,280 to follow every conceivable path, all the way 978 00:46:22,280 --> 00:46:24,680 to the very end, even this long and winding one, 979 00:46:24,680 --> 00:46:27,440 in order to realize that, you know what, that's a dead end. 980 00:46:27,440 --> 00:46:29,720 And instead, the program needed to backtrack. 981 00:46:29,720 --> 00:46:32,660 After going this direction, it must have gone this direction. 982 00:46:32,660 --> 00:46:35,490 It got lucky here by just not choosing this path. 983 00:46:35,490 --> 00:46:39,290 But it got unlucky here, exploring this direction, exploring a bunch of states 984 00:46:39,290 --> 00:46:41,150 that it didn't need to and then, likewise, 985 00:46:41,150 --> 00:46:43,490 exploring all of this top part of the graph 986 00:46:43,490 --> 00:46:46,320 when it probably didn't need to do that either. 987 00:46:46,320 --> 00:46:49,070 So all in all, depth-first search here is really 988 00:46:49,070 --> 00:46:52,970 not performing optimally, probably exploring more states than it needs to. 989 00:46:52,970 --> 00:46:56,600 It finds an optimal solution, the best path to the goal, 990 00:46:56,600 --> 00:46:59,510 but the number of states it needed to explore in order to do so, 991 00:46:59,510 --> 00:47:03,060 the number of steps I had to take, that was much higher. 992 00:47:03,060 --> 00:47:04,070 So let's compare. 993 00:47:04,070 --> 00:47:09,060 How would Breadth-First Search, or BFS, do on this exact same maze instead? 994 00:47:09,060 --> 00:47:11,630 And in order to do so, it's a very easy change. 995 00:47:11,630 --> 00:47:16,550 The algorithm for DFS and BFS is identical with the exception 996 00:47:16,550 --> 00:47:20,950 of what data structure we use to represent the frontier. 997 00:47:20,950 --> 00:47:23,840 That in DFS I used a stack frontier-- 998 00:47:23,840 --> 00:47:25,880 last in, first out-- 999 00:47:25,880 --> 00:47:30,380 whereas in BFS, I'm going to use a queue frontier-- first in, 1000 00:47:30,380 --> 00:47:33,260 first out, where the first thing I add to the frontier 1001 00:47:33,260 --> 00:47:35,570 is the first thing that I remove. 1002 00:47:35,570 --> 00:47:40,670 So I'll go back to the terminal, rerun this program on the same maze, 1003 00:47:40,670 --> 00:47:45,280 and now you'll see that the number of states we had to explore was only 77, 1004 00:47:45,280 --> 00:47:49,140 as compared to almost 400 when we used depth-first search. 1005 00:47:49,140 --> 00:47:50,330 And we can see exactly why.
1006 00:47:50,330 --> 00:47:54,980 We can see what happened if we open up maze.png now and take a look. 1007 00:47:54,980 --> 00:47:59,540 Again, the yellow highlight is the solution that breadth-first search found, 1008 00:47:59,540 --> 00:48:03,020 which, incidentally, is the same solution that depth-first search found. 1009 00:48:03,020 --> 00:48:07,110 They're both finding the best solution, but notice all the white unexplored 1010 00:48:07,110 --> 00:48:07,610 cells. 1011 00:48:07,610 --> 00:48:10,700 There were far fewer states that needed to be explored 1012 00:48:10,700 --> 00:48:14,980 in order to make our way to the goal, because breadth-first search operates 1013 00:48:14,980 --> 00:48:15,980 a little more shallowly. 1014 00:48:15,980 --> 00:48:19,070 It's exploring things that are close to the initial state 1015 00:48:19,070 --> 00:48:22,170 without exploring things that are further away. 1016 00:48:22,170 --> 00:48:25,220 So if the goal is not too far away, then breadth-first search 1017 00:48:25,220 --> 00:48:27,950 can actually behave quite effectively on a maze that 1018 00:48:27,950 --> 00:48:30,870 looks a little something like this. 1019 00:48:30,870 --> 00:48:35,750 Now, in this case, both BFS and DFS ended up finding the same solution, 1020 00:48:35,750 --> 00:48:37,680 but that won't always be the case. 1021 00:48:37,680 --> 00:48:43,390 And in fact, let's take a look at one more example, for instance, maze3.txt. 1022 00:48:43,390 --> 00:48:46,970 In maze3.txt, notice that here there are multiple ways 1023 00:48:46,970 --> 00:48:49,190 that you could get from A to B. 1024 00:48:49,190 --> 00:48:52,040 It's a relatively small maze, but let's look at what happens. 1025 00:48:52,040 --> 00:48:55,670 If I use-- and I'll go ahead and turn off "show explored" so 1026 00:48:55,670 --> 00:48:58,390 we just see the solution. 1027 00:48:58,390 --> 00:49:04,590 If I use BFS, breadth-first search, to solve maze3.txt, 1028 00:49:04,590 --> 00:49:06,420 well, then we find a solution. 1029 00:49:06,420 --> 00:49:09,530 And if I open up the maze, here's the solution that we found. 1030 00:49:09,530 --> 00:49:10,610 It is the optimal one. 1031 00:49:10,610 --> 00:49:13,700 With just four steps, we can get from the initial state 1032 00:49:13,700 --> 00:49:17,090 to what the goal happens to be. 1033 00:49:17,090 --> 00:49:21,890 But what happens if we try to use depth-first search, or DFS, instead? 1034 00:49:21,890 --> 00:49:26,540 Well, again, I'll go back up to my queue frontier, where queue frontier means 1035 00:49:26,540 --> 00:49:28,850 that we're using breadth-first search. 1036 00:49:28,850 --> 00:49:32,120 And I'll change it to a stack frontier, which means that now we'll 1037 00:49:32,120 --> 00:49:34,850 be using depth-first search. 1038 00:49:34,850 --> 00:49:37,910 I'll rerun python maze.py. 1039 00:49:37,910 --> 00:49:40,490 And now you'll see that we find a solution, 1040 00:49:40,490 --> 00:49:42,980 but it is not the optimal solution. 1041 00:49:42,980 --> 00:49:45,640 This, instead, is what our algorithm finds. 1042 00:49:45,640 --> 00:49:48,140 And maybe depth-first search would have found that shorter solution. 1043 00:49:48,140 --> 00:49:51,410 It's possible, but it's not guaranteed; if we just 1044 00:49:51,410 --> 00:49:55,280 happen to be unlucky, if we choose this state instead of that state, 1045 00:49:55,280 --> 00:49:58,280 then depth-first search might find a longer route to get 1046 00:49:58,280 --> 00:50:01,260 from the initial state to the goal.
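In code, the change being described here is literally a one-line swap inside the solve function, something like:

    # Breadth-first search: first in, first out
    frontier = QueueFrontier()

    # Depth-first search: last in, first out
    frontier = StackFrontier()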
1047 00:50:01,260 --> 00:50:04,280 So we do see some trade-offs here where depth-first search might not 1048 00:50:04,280 --> 00:50:06,240 find the optimal solution. 1049 00:50:06,240 --> 00:50:09,080 So at that point, it seems like breadth-first search is pretty good. 1050 00:50:09,080 --> 00:50:12,920 Is that the best we can do, where it's going to find us the optimal solution 1051 00:50:12,920 --> 00:50:15,620 and we don't have to worry about situations where 1052 00:50:15,620 --> 00:50:20,270 we might end up finding a longer path to the solution than what actually exists? 1053 00:50:20,270 --> 00:50:23,150 Where the goal is far away from the initial state-- 1054 00:50:23,150 --> 00:50:26,780 and we might have to take lots of steps in order to get from the initial state 1055 00:50:26,780 --> 00:50:27,620 to the goal-- 1056 00:50:27,620 --> 00:50:31,220 what ended up happening, is that this algorithm, BFS, ended up 1057 00:50:31,220 --> 00:50:35,480 exploring basically the entire graph, having to go through the entire maze 1058 00:50:35,480 --> 00:50:39,800 in order to find its way from the initial state to the goal state. 1059 00:50:39,800 --> 00:50:41,960 What we'd ultimately like is for our algorithm 1060 00:50:41,960 --> 00:50:44,320 to be a little bit more intelligent. 1061 00:50:44,320 --> 00:50:46,310 And now what would it mean for our algorithm 1062 00:50:46,310 --> 00:50:49,820 to be a little bit more intelligent, in this case? 1063 00:50:49,820 --> 00:50:52,490 Well, let's look back to where breadth-first search might 1064 00:50:52,490 --> 00:50:54,290 have been able to make a different decision 1065 00:50:54,290 --> 00:50:57,570 and consider human intuition in this process, as well. 1066 00:50:57,570 --> 00:51:01,640 Like, what might a human do when solving this maze that is different than what 1067 00:51:01,640 --> 00:51:04,490 BFS ultimately chose to do? 1068 00:51:04,490 --> 00:51:07,610 Well, the very first decision point that BFS made 1069 00:51:07,610 --> 00:51:11,420 was right here, when it made five steps and ended up 1070 00:51:11,420 --> 00:51:13,340 in a position where it had a fork in the road. 1071 00:51:13,340 --> 00:51:15,210 It could either go left or it could go right. 1072 00:51:15,210 --> 00:51:17,460 In these initial couple of steps, there was no choice. 1073 00:51:17,460 --> 00:51:20,790 There was only one action that could be taken from each of those states. 1074 00:51:20,790 --> 00:51:23,030 And so the search algorithm did the only thing 1075 00:51:23,030 --> 00:51:25,010 that any search algorithm could do, which 1076 00:51:25,010 --> 00:51:28,400 is keep following that state after the next state. 1077 00:51:28,400 --> 00:51:31,670 But this decision point is where things get a little bit interesting. 1078 00:51:31,670 --> 00:51:34,850 Depth-first search, that very first search algorithm we looked at, 1079 00:51:34,850 --> 00:51:38,750 chose to say, let's pick one path and exhaust that path, 1080 00:51:38,750 --> 00:51:42,560 see if anything that way has the goal, and if not, then let's 1081 00:51:42,560 --> 00:51:43,800 try the other way. 1082 00:51:43,800 --> 00:51:46,580 Breadth-first search took the alternative approach of saying, 1083 00:51:46,580 --> 00:51:47,210 you know what? 
1084 00:51:47,210 --> 00:51:51,510 Let's explore things that are shallow, close to us first, look left and right, 1085 00:51:51,510 --> 00:51:53,960 then back left and back right, so on and so forth, 1086 00:51:53,960 --> 00:51:58,640 alternating between our options in the hopes of finding something nearby. 1087 00:51:58,640 --> 00:52:02,390 But ultimately, what might a human do if confronted with a situation like this 1088 00:52:02,390 --> 00:52:04,250 of go left or go right? 1089 00:52:04,250 --> 00:52:07,010 Well, a human might visually see that, all right, 1090 00:52:07,010 --> 00:52:11,360 I'm trying to get to state B, which is way up there, and going right just 1091 00:52:11,360 --> 00:52:13,360 feels like it's closer to the goal. 1092 00:52:13,360 --> 00:52:15,140 Like, it feels like going right should be 1093 00:52:15,140 --> 00:52:17,870 better than going left because I'm making progress 1094 00:52:17,870 --> 00:52:19,380 towards getting to that goal. 1095 00:52:19,380 --> 00:52:22,340 Now, of course, there are a couple of assumptions that I'm making here. 1096 00:52:22,340 --> 00:52:25,220 I'm making the assumption that we can represent 1097 00:52:25,220 --> 00:52:27,080 this grid as, like, a two-dimensional grid, 1098 00:52:27,080 --> 00:52:28,880 where I know the coordinates of everything. 1099 00:52:28,880 --> 00:52:33,940 I know that A is in coordinate 0,0, and B is in some other coordinate pair. 1100 00:52:33,940 --> 00:52:37,070 And I know what coordinate I'm at now, so I can calculate that, yeah, going 1101 00:52:37,070 --> 00:52:39,420 this way, that is closer to the goal. 1102 00:52:39,420 --> 00:52:42,170 And that might be a reasonable assumption for some types of search 1103 00:52:42,170 --> 00:52:44,060 problems but maybe not in others. 1104 00:52:44,060 --> 00:52:46,670 But for now, we'll go ahead and assume that-- 1105 00:52:46,670 --> 00:52:51,590 that I know what my current coordinate pair and I know the coordinate x,y 1106 00:52:51,590 --> 00:52:53,680 of the goal that I'm trying to get to. 1107 00:52:53,680 --> 00:52:56,540 And in this situation, I'd like an algorithm that 1108 00:52:56,540 --> 00:52:59,090 is a little bit more intelligent and somehow knows 1109 00:52:59,090 --> 00:53:02,180 that I should be making progress towards the goal, 1110 00:53:02,180 --> 00:53:05,720 and this is probably the way to do that because, in a maze, 1111 00:53:05,720 --> 00:53:08,480 moving in the coordinate direction of the goal 1112 00:53:08,480 --> 00:53:11,880 is usually, though not always, a good thing. 1113 00:53:11,880 --> 00:53:14,840 And so here we draw a distinction between two different types of search 1114 00:53:14,840 --> 00:53:19,040 algorithms-- uninformed search and informed search. 1115 00:53:19,040 --> 00:53:23,330 Uninformed search algorithms are algorithms like DFS and BFS, 1116 00:53:23,330 --> 00:53:25,200 the two algorithms that we just looked at, 1117 00:53:25,200 --> 00:53:29,540 which are search strategies that don't use any problem specific knowledge 1118 00:53:29,540 --> 00:53:31,400 to be able to solve the problem. 1119 00:53:31,400 --> 00:53:34,610 DFS and BFS didn't really care about the structure 1120 00:53:34,610 --> 00:53:38,270 of the maze or anything about the way that a maze is in order 1121 00:53:38,270 --> 00:53:39,330 to solve the problem. 
1122 00:53:39,330 --> 00:53:42,490 They just look at the actions available and choose from those actions, 1123 00:53:42,490 --> 00:53:45,290 and it doesn't matter whether it's a maze or some other problem. 1124 00:53:45,290 --> 00:53:48,020 The solution, or the way that it tries to solve the problem, 1125 00:53:48,020 --> 00:53:51,320 is really fundamentally going to be the same. 1126 00:53:51,320 --> 00:53:53,030 What we're going to take a look at now is 1127 00:53:53,030 --> 00:53:55,370 an improvement upon uninformed search. 1128 00:53:55,370 --> 00:53:57,830 We're going to take a look at informed search. 1129 00:53:57,830 --> 00:54:00,440 Informed search algorithms are going to be search strategies that 1130 00:54:00,440 --> 00:54:05,800 use knowledge specific to the problem to be able to better find a solution. 1131 00:54:05,800 --> 00:54:09,270 And in the case of a maze, this problem specific knowledge 1132 00:54:09,270 --> 00:54:11,270 is something like, if I'm in a square 1133 00:54:11,270 --> 00:54:14,420 that is geographically closer to the goal, that 1134 00:54:14,420 --> 00:54:19,700 is better than being in a square that is geographically further away. 1135 00:54:19,700 --> 00:54:23,330 And this is something we can only know by thinking about this problem 1136 00:54:23,330 --> 00:54:27,820 and reasoning about what knowledge might be helpful for our AI agent 1137 00:54:27,820 --> 00:54:30,060 to know a little something about. 1138 00:54:30,060 --> 00:54:32,440 There are a number of different types of informed search. 1139 00:54:32,440 --> 00:54:35,140 Specifically, first, we're going to look at a particular type 1140 00:54:35,140 --> 00:54:39,550 of search algorithm called greedy best-first search. 1141 00:54:39,550 --> 00:54:42,700 Greedy best-first search, often abbreviated GBFS, 1142 00:54:42,700 --> 00:54:45,970 is a search algorithm that, instead of expanding the deepest node, 1143 00:54:45,970 --> 00:54:49,660 like DFS, or the shallowest node, like BFS, 1144 00:54:49,660 --> 00:54:52,390 is always going to expand the node 1145 00:54:52,390 --> 00:54:55,990 that it thinks is closest to the goal. 1146 00:54:55,990 --> 00:54:59,650 Now, the search algorithm isn't going to know for sure whether it is the closest 1147 00:54:59,650 --> 00:55:02,650 thing to the goal, because if we knew what was closest to the goal 1148 00:55:02,650 --> 00:55:05,100 all the time, then we would already have a solution. 1149 00:55:05,100 --> 00:55:07,150 Like, if we had the knowledge of what is close to the goal, 1150 00:55:07,150 --> 00:55:10,570 we could just follow those steps in order to get from the initial position 1151 00:55:10,570 --> 00:55:11,870 to the solution. 1152 00:55:11,870 --> 00:55:14,650 But if we don't know the solution-- meaning we don't know exactly 1153 00:55:14,650 --> 00:55:16,450 what's closest to the goal-- 1154 00:55:16,450 --> 00:55:19,120 instead, we can use an estimate of what's 1155 00:55:19,120 --> 00:55:22,180 closest to the goal, otherwise known as a heuristic-- 1156 00:55:22,180 --> 00:55:25,910 just some way of estimating whether or not we're close to the goal. 1157 00:55:25,910 --> 00:55:29,800 And we'll do so using a heuristic function, conventionally called h(n), 1158 00:55:29,800 --> 00:55:34,540 that takes a state as input and returns our estimate of how close we 1159 00:55:34,540 --> 00:55:36,860 are to the goal.
1160 00:55:36,860 --> 00:55:39,040 So what might this heuristic function actually 1161 00:55:39,040 --> 00:55:42,100 look like in the case of a maze-solving algorithm? 1162 00:55:42,100 --> 00:55:45,490 Where we're trying to solve a maze, what does a heuristic look like? 1163 00:55:45,490 --> 00:55:48,910 Well, the heuristic needs to answer a question, like between these two 1164 00:55:48,910 --> 00:55:51,770 cells, C and D, which one is better? 1165 00:55:51,770 --> 00:55:55,640 Which one would I rather be in if I'm trying to find my way to the goal? 1166 00:55:55,640 --> 00:55:58,640 Well, any human could probably look at this and tell you, you know what? 1167 00:55:58,640 --> 00:56:00,280 D looks like it's better. 1168 00:56:00,280 --> 00:56:03,550 Even if the maze is convoluted and you haven't thought about all the walls, 1169 00:56:03,550 --> 00:56:05,410 D is probably better. 1170 00:56:05,410 --> 00:56:06,710 And why is D better? 1171 00:56:06,710 --> 00:56:09,730 Well, because if you ignore the walls-- let's just pretend the walls 1172 00:56:09,730 --> 00:56:14,290 don't exist for a moment and relax the problem, so to speak-- 1173 00:56:14,290 --> 00:56:18,670 D, just in terms of coordinate pairs, is closer to this goal. 1174 00:56:18,670 --> 00:56:21,700 It's fewer steps that I would need to take to get to the goal, 1175 00:56:21,700 --> 00:56:24,160 as compared to C, even if you ignore the walls. 1176 00:56:24,160 --> 00:56:29,160 If you just know the x,y coordinate of C, and the x,y coordinate of the goal, 1177 00:56:29,160 --> 00:56:31,450 and likewise, you know the x,y coordinate of D, 1178 00:56:31,450 --> 00:56:35,770 you can calculate that D, just geographically, ignoring the walls, 1179 00:56:35,770 --> 00:56:37,110 looks like it's better. 1180 00:56:37,110 --> 00:56:39,700 And so this is the heuristic function that we're going to use, 1181 00:56:39,700 --> 00:56:42,880 and it's something called the Manhattan distance, one specific type 1182 00:56:42,880 --> 00:56:46,690 of heuristic, where the heuristic asks: how many squares vertically 1183 00:56:46,690 --> 00:56:49,270 and horizontally-- so not 1184 00:56:49,270 --> 00:56:53,320 allowing myself to go diagonally, just either up or down or left or right-- 1185 00:56:53,320 --> 00:56:58,030 do I need to take to get from each of these cells to the goal? 1186 00:56:58,030 --> 00:57:00,910 Well, as it turns out, D is much closer. 1187 00:57:00,910 --> 00:57:01,870 There are fewer steps. 1188 00:57:01,870 --> 00:57:05,740 It only needs to take six steps in order to get to that goal. 1189 00:57:05,740 --> 00:57:07,670 Again, here we're ignoring the walls. 1190 00:57:07,670 --> 00:57:09,790 We've relaxed the problem a little bit. 1191 00:57:09,790 --> 00:57:12,510 We're just concerned with, if you do the math, 1192 00:57:12,510 --> 00:57:14,470 subtract the x values from each other and the y 1193 00:57:14,470 --> 00:57:18,010 values from each other, what is our estimate of how far we are away? 1194 00:57:18,010 --> 00:57:22,970 We can estimate that D is closer to the goal than C is. 1195 00:57:22,970 --> 00:57:24,800 And so now we have an approach. 1196 00:57:24,800 --> 00:57:27,890 We have a way of picking which node to remove from the frontier. 1197 00:57:27,890 --> 00:57:29,870 And at each stage in our algorithm, we're 1198 00:57:29,870 --> 00:57:31,580 going to remove a node from the frontier.
1199 00:57:31,580 --> 00:57:34,820 We're going to explore the node that has the smallest 1200 00:57:34,820 --> 00:57:37,970 value for this heuristic function, the node that has the smallest 1201 00:57:37,970 --> 00:57:40,740 Manhattan distance to the goal. 1202 00:57:40,740 --> 00:57:42,510 And so what would this actually look like? 1203 00:57:42,510 --> 00:57:45,320 Well, let me first label this graph, label this maze, 1204 00:57:45,320 --> 00:57:48,050 with a number representing the value of this heuristic 1205 00:57:48,050 --> 00:57:51,860 function, the value of the Manhattan distance from any of these cells. 1206 00:57:51,860 --> 00:57:55,070 So from this cell, for example, we're one away from the goal. 1207 00:57:55,070 --> 00:57:56,920 From this cell, we're two away from the goal. 1208 00:57:56,920 --> 00:57:58,400 Three away, four away. 1209 00:57:58,400 --> 00:58:02,210 Here we're five away, because we have to go one to the right and then four up. 1210 00:58:02,210 --> 00:58:05,850 From somewhere like here, the Manhattan distance is 2. 1211 00:58:05,850 --> 00:58:08,330 We're only two squares away from the goal, 1212 00:58:08,330 --> 00:58:10,730 geographically, even though in practice we're 1213 00:58:10,730 --> 00:58:13,910 going to have to take a longer path, but we don't know that yet. 1214 00:58:13,910 --> 00:58:16,760 The heuristic is just some easy way to estimate 1215 00:58:16,760 --> 00:58:18,380 how far we are away from the goal. 1216 00:58:18,380 --> 00:58:21,380 And maybe our heuristic is overly optimistic. 1217 00:58:21,380 --> 00:58:23,660 It thinks that, yeah, we're only two steps away, 1218 00:58:23,660 --> 00:58:27,450 when in practice, when you consider the walls, it might be more steps. 1219 00:58:27,450 --> 00:58:31,400 So the important thing here is that the heuristic isn't a guarantee 1220 00:58:31,400 --> 00:58:33,290 of how many steps it's going to take. 1221 00:58:33,290 --> 00:58:34,910 It is estimating. 1222 00:58:34,910 --> 00:58:36,920 It's an attempt at trying to approximate. 1223 00:58:36,920 --> 00:58:40,760 And it does seem generally the case that the squares that look closer 1224 00:58:40,760 --> 00:58:43,910 to the goal have smaller values for the heuristic function 1225 00:58:43,910 --> 00:58:46,960 than squares that are further away. 1226 00:58:46,960 --> 00:58:52,110 So now, using greedy best-first search, what might this algorithm actually do? 1227 00:58:52,110 --> 00:58:55,270 Well, again, for these first five steps, there's not much of a choice. 1228 00:58:55,270 --> 00:58:57,610 We started at this initial state, A. And we say, all right. 1229 00:58:57,610 --> 00:59:00,300 We have to explore these five states. 1230 00:59:00,300 --> 00:59:01,890 But now we have a decision point. 1231 00:59:01,890 --> 00:59:04,590 Now we have a choice between going left and going right. 1232 00:59:04,590 --> 00:59:08,760 And before, DFS and BFS would just pick arbitrarily, because it just 1233 00:59:08,760 --> 00:59:11,610 depends on the order you throw these two nodes into the frontier-- 1234 00:59:11,610 --> 00:59:15,240 and we didn't specify what order you put them into the frontier, only the order 1235 00:59:15,240 --> 00:59:17,250 you take them out. 1236 00:59:17,250 --> 00:59:20,580 Here we can look at 13 and 11 and say that, all right, 1237 00:59:20,580 --> 00:59:24,630 this square is a distance of 11 away from the goal, 1238 00:59:24,630 --> 00:59:27,300 according to our heuristic, according to our estimate.
1239 00:59:27,300 --> 00:59:31,560 And this one we estimate to be 13 away from the goal. 1240 00:59:31,560 --> 00:59:34,650 So between those two options, between these two choices, 1241 00:59:34,650 --> 00:59:36,150 I'd rather have the 11. 1242 00:59:36,150 --> 00:59:40,240 I'd rather be 11 steps away from the goal, so I'll go to the right. 1243 00:59:40,240 --> 00:59:44,340 We're able to make an informed decision because we know a little something more 1244 00:59:44,340 --> 00:59:45,670 about this problem. 1245 00:59:45,670 --> 00:59:47,820 So then we keep following 10, 9, 8-- 1246 00:59:47,820 --> 00:59:49,470 between the two sevens. 1247 00:59:49,470 --> 00:59:51,850 We don't really have much of a way to know between those. 1248 00:59:51,850 --> 00:59:53,880 So then we do just have to make an arbitrary choice. 1249 00:59:53,880 --> 00:59:54,630 And you know what? 1250 00:59:54,630 --> 00:59:55,560 Maybe we choose wrong. 1251 00:59:55,560 --> 01:00:00,110 But that's OK because now we can still say, all right, let's try this seven. 1252 01:00:00,110 --> 01:00:01,820 We say seven, six. 1253 01:00:01,820 --> 01:00:03,990 We have to make this choice even though it increases 1254 01:00:03,990 --> 01:00:05,640 the value of the heuristic function. 1255 01:00:05,640 --> 01:00:08,970 But now we have another decision point between six and eight. 1256 01:00:08,970 --> 01:00:10,290 And between those two-- 1257 01:00:10,290 --> 01:00:13,380 and really, we're also considering the 13, but that's much higher. 1258 01:00:13,380 --> 01:00:16,200 Between six, eight, and 13, well, the six 1259 01:00:16,200 --> 01:00:18,900 is the smallest value, so we'd rather take the six. 1260 01:00:18,900 --> 01:00:22,440 We're able to make an informed decision that going this way to the right 1261 01:00:22,440 --> 01:00:24,840 is probably better than going that way. 1262 01:00:24,840 --> 01:00:25,680 So we turn this way. 1263 01:00:25,680 --> 01:00:26,850 We go to five. 1264 01:00:26,850 --> 01:00:29,190 And now we find a decision point where we'll actually 1265 01:00:29,190 --> 01:00:31,200 make a decision that we might not want to make, 1266 01:00:31,200 --> 01:00:34,320 but there's unfortunately not too much of a way around this. 1267 01:00:34,320 --> 01:00:35,670 We see four and six. 1268 01:00:35,670 --> 01:00:37,620 Four looks closer to the goal, right? 1269 01:00:37,620 --> 01:00:40,170 It's going up, and the goal is further up. 1270 01:00:40,170 --> 01:00:43,710 So we end up taking that route, which ultimately leads us to a dead end. 1271 01:00:43,710 --> 01:00:46,960 But that's OK because we can still say, all right, now let's try the six, 1272 01:00:46,960 --> 01:00:51,380 and now follow this route that will ultimately lead us to the goal. 1273 01:00:51,380 --> 01:00:54,510 And so this now is how greedy best-first search might 1274 01:00:54,510 --> 01:00:56,760 try to approach this problem, by saying whenever 1275 01:00:56,760 --> 01:01:00,060 we have a decision between multiple nodes that we could explore, 1276 01:01:00,060 --> 01:01:04,290 let's explore the node that has the smallest value of h(n), 1277 01:01:04,290 --> 01:01:09,210 this heuristic function that is estimating how far I have to go. 1278 01:01:09,210 --> 01:01:11,010 And it just so happens that, in this case, 1279 01:01:11,010 --> 01:01:15,010 we end up doing better, in terms of the number of states we needed to explore, 1280 01:01:15,010 --> 01:01:16,630 than BFS needed to. 
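As an aside, that Manhattan distance heuristic and the greedy choice it drives are short to sketch in Python, assuming each state is an (x, y) coordinate pair and building on the StackFrontier sketch from earlier; this is an illustration, not the course's exact code:

    def h(state, goal):
        # Manhattan distance: vertical steps plus horizontal steps, ignoring walls.
        x1, y1 = state
        x2, y2 = goal
        return abs(x1 - x2) + abs(y1 - y2)

    class GreedyFrontier(StackFrontier):
        # Always remove the node whose state we estimate to be closest to the goal.
        def __init__(self, goal):
            super().__init__()
            self.goal = goal

        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            node = min(self.frontier, key=lambda n: h(n.state, self.goal))
            self.frontier.remove(node)
            return node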
1281 01:01:16,630 --> 01:01:19,950 BFS explored all of this section and all of that section. 1282 01:01:19,950 --> 01:01:22,620 But we were able to eliminate that by taking advantage 1283 01:01:22,620 --> 01:01:26,040 of this heuristic, this knowledge about how close we 1284 01:01:26,040 --> 01:01:30,330 are to the goal or some estimate of that idea. 1285 01:01:30,330 --> 01:01:31,450 So this seems much better. 1286 01:01:31,450 --> 01:01:33,150 So wouldn't we always prefer an algorithm 1287 01:01:33,150 --> 01:01:36,870 like this over an algorithm like breadth-first search? 1288 01:01:36,870 --> 01:01:37,850 Well, maybe. 1289 01:01:37,850 --> 01:01:39,810 One thing to take into consideration is that we 1290 01:01:39,810 --> 01:01:42,030 need to come up with a good heuristic. 1291 01:01:42,030 --> 01:01:45,720 How good the heuristic is is going to affect how good this algorithm is. 1292 01:01:45,720 --> 01:01:49,740 And coming up with a good heuristic can oftentimes be challenging. 1293 01:01:49,740 --> 01:01:51,450 But the other thing to consider is to ask 1294 01:01:51,450 --> 01:01:54,630 the question, just as we did with the prior two algorithms, 1295 01:01:54,630 --> 01:01:56,580 is this algorithm optimal? 1296 01:01:56,580 --> 01:02:02,280 Will it always find the shortest path from the initial state to the goal? 1297 01:02:02,280 --> 01:02:06,180 And to answer that question, let's take a look at this example for a moment. 1298 01:02:06,180 --> 01:02:07,570 Take a look at this example. 1299 01:02:07,570 --> 01:02:10,170 Again, we're trying to get from A to B, and again, I've 1300 01:02:10,170 --> 01:02:13,260 labeled each of the cells with their Manhattan distance 1301 01:02:13,260 --> 01:02:16,320 from the goal, the number of squares up and to the right 1302 01:02:16,320 --> 01:02:20,500 you would need to travel in order to get from that square to the goal. 1303 01:02:20,500 --> 01:02:23,400 And let's think about, would greedy best-first search 1304 01:02:23,400 --> 01:02:29,400 that always picks the smallest number end up finding the optimal solution? 1305 01:02:29,400 --> 01:02:33,520 What is the shortest solution, and would this algorithm find it? 1306 01:02:33,520 --> 01:02:38,190 And the important thing to realize is that right here is the decision point. 1307 01:02:38,190 --> 01:02:40,710 We're estimated to be 12 away from the goal. 1308 01:02:40,710 --> 01:02:42,310 And we have two choices. 1309 01:02:42,310 --> 01:02:45,690 We can go to the left, which we estimate to be 13 away from the goal, 1310 01:02:45,690 --> 01:02:49,710 or we can go up, where we estimate it to be 11 away from the goal. 1311 01:02:49,710 --> 01:02:53,640 And between those two, greedy best-first search is going to say, 1312 01:02:53,640 --> 01:02:57,000 the 11 looks better than the 13. 1313 01:02:57,000 --> 01:02:59,540 And in doing so, greedy best-first search 1314 01:02:59,540 --> 01:03:02,820 will end up finding this path to the goal. 1315 01:03:02,820 --> 01:03:04,980 But it turns out this path is not optimal. 1316 01:03:04,980 --> 01:03:07,470 There is a way to get to the goal using fewer steps. 1317 01:03:07,470 --> 01:03:12,420 And it's actually this way, this way that ultimately involved fewer steps, 1318 01:03:12,420 --> 01:03:15,900 even though it meant at this moment choosing the worst 1319 01:03:15,900 --> 01:03:19,770 option between the two-- or what we estimated to be the worst option, based 1320 01:03:19,770 --> 01:03:21,150 on the heuristic.
1321 01:03:21,150 --> 01:03:23,910 And so this is what we mean by this is a greedy algorithm. 1322 01:03:23,910 --> 01:03:26,460 It's making the best decision, locally. 1323 01:03:26,460 --> 01:03:28,800 At this decision point, it looks like it's better 1324 01:03:28,800 --> 01:03:31,450 to go here than it is to go to the 13. 1325 01:03:31,450 --> 01:03:34,040 But in the big picture, it's not necessarily optimal, 1326 01:03:34,040 --> 01:03:37,200 that it might find a solution when in actuality there 1327 01:03:37,200 --> 01:03:40,150 was a better solution available. 1328 01:03:40,150 --> 01:03:43,200 So we would like some way to solve this problem. 1329 01:03:43,200 --> 01:03:45,840 We like the idea of this heuristic, of being 1330 01:03:45,840 --> 01:03:50,080 able to estimate the path, the distance between us and the goal, 1331 01:03:50,080 --> 01:03:52,290 and that helps us to be able to make better decisions 1332 01:03:52,290 --> 01:03:55,890 and to eliminate having to search through entire parts of the state 1333 01:03:55,890 --> 01:03:57,070 space. 1334 01:03:57,070 --> 01:04:00,160 But we would like to modify the algorithm so that we can achieve 1335 01:04:00,160 --> 01:04:02,650 optimality, so that it can be optimal. 1336 01:04:02,650 --> 01:04:04,060 And what is the way to do this? 1337 01:04:04,060 --> 01:04:05,790 What is the intuition here? 1338 01:04:05,790 --> 01:04:08,310 Well, let's take a look at this problem. 1339 01:04:08,310 --> 01:04:11,070 In this initial problem, greedy best-first search 1340 01:04:11,070 --> 01:04:14,170 found this solution here, this long path. 1341 01:04:14,170 --> 01:04:17,310 And the reason why it wasn't great is because, yes, the heuristic numbers 1342 01:04:17,310 --> 01:04:21,180 went down pretty low, but later on, and they started to build back up. 1343 01:04:21,180 --> 01:04:25,780 They built back 8, 9, 10, 11-- all the way up to 12, in this case. 1344 01:04:25,780 --> 01:04:29,290 And so how might we go about trying to improve this algorithm? 1345 01:04:29,290 --> 01:04:32,400 Well, one thing that we might realize is that, if we 1346 01:04:32,400 --> 01:04:35,170 go all the way through this algorithm, through this path, 1347 01:04:35,170 --> 01:04:39,340 and we end up going to the 12, and we've had to take this many steps-- like, 1348 01:04:39,340 --> 01:04:42,640 who knows how many steps that is-- just to get to this 12, 1349 01:04:42,640 --> 01:04:48,160 we could have also, as an alternative, taken much fewer steps, just six steps, 1350 01:04:48,160 --> 01:04:50,170 and ended up at this 13 here. 1351 01:04:50,170 --> 01:04:53,680 And yes, 13 is more than 12, so it looks like it's not as good, 1352 01:04:53,680 --> 01:04:55,460 but it required far fewer steps. 1353 01:04:55,460 --> 01:04:55,960 Right? 1354 01:04:55,960 --> 01:04:59,530 It only took six steps to get to this 13 versus many more steps 1355 01:04:59,530 --> 01:05:01,000 to get to this 12. 1356 01:05:01,000 --> 01:05:04,180 And while greedy best-first search says, oh, well, 12 is better than 13 1357 01:05:04,180 --> 01:05:07,850 so pick the 12, we might more intelligently say, 1358 01:05:07,850 --> 01:05:10,810 I'd rather be somewhere that heuristically 1359 01:05:10,810 --> 01:05:16,030 looks like it takes slightly longer if I can get there much more quickly. 1360 01:05:16,030 --> 01:05:18,940 And we're going to encode that idea, this general idea, 1361 01:05:18,940 --> 01:05:23,010 into a more formal algorithm known as A star search. 
1362 01:05:23,010 --> 01:05:25,270 A star search is going to solve this problem by, 1363 01:05:25,270 --> 01:05:27,970 instead of just considering the heuristic, 1364 01:05:27,970 --> 01:05:32,720 also considering how long it took us to get to any particular state. 1365 01:05:32,720 --> 01:05:35,650 So the distinction is greedy best-first search, if I am in a state 1366 01:05:35,650 --> 01:05:38,440 right now, the only thing I care about is 1367 01:05:38,440 --> 01:05:42,070 what is the estimated distance, the heuristic value, between me 1368 01:05:42,070 --> 01:05:43,000 and the goal. 1369 01:05:43,000 --> 01:05:45,640 Whereas A star search will take into consideration 1370 01:05:45,640 --> 01:05:46,870 two pieces of information. 1371 01:05:46,870 --> 01:05:51,130 It'll take into consideration, how far do I estimate I am from the goal, 1372 01:05:51,130 --> 01:05:55,030 but also how far did I have to travel in order to get here? 1373 01:05:55,030 --> 01:05:57,490 Because that is relevant, too. 1374 01:05:57,490 --> 01:06:00,760 So this search algorithm works by expanding the node with the lowest 1375 01:06:00,760 --> 01:06:04,060 value of g(n) plus h(n). 1376 01:06:04,060 --> 01:06:07,810 h(n) is that same heuristic that we were talking about a moment ago that's going 1377 01:06:07,810 --> 01:06:12,640 to vary based on the problem, but g(n) is going to be the cost to reach 1378 01:06:12,640 --> 01:06:13,420 the node-- 1379 01:06:13,420 --> 01:06:19,380 how many steps I had to take, in this case, to get to my current position. 1380 01:06:19,380 --> 01:06:22,070 So what does that search algorithm look like in practice? 1381 01:06:22,070 --> 01:06:23,630 Well, let's take a look. 1382 01:06:23,630 --> 01:06:25,130 Again, we've got the same maze. 1383 01:06:25,130 --> 01:06:28,010 And again, I've labeled them with their Manhattan distance. 1384 01:06:28,010 --> 01:06:32,060 This value is the h(n) value, the heuristic estimate 1385 01:06:32,060 --> 01:06:36,240 of how far each of these squares is away from the goal. 1386 01:06:36,240 --> 01:06:38,370 But now, as we begin to explore states, we 1387 01:06:38,370 --> 01:06:41,370 care not just about this heuristic value but also 1388 01:06:41,370 --> 01:06:45,650 about g(n), the number of steps I had to take in order to get there. 1389 01:06:45,650 --> 01:06:48,060 And I care about summing those two numbers together. 1390 01:06:48,060 --> 01:06:49,230 So what does that look like? 1391 01:06:49,230 --> 01:06:52,850 On this very first step, I have taken one step. 1392 01:06:52,850 --> 01:06:56,130 And now I am estimated to be 16 steps away from the goal. 1393 01:06:56,130 --> 01:06:59,190 So the total value here is 17. 1394 01:06:59,190 --> 01:07:00,280 Then I take one more step. 1395 01:07:00,280 --> 01:07:02,010 I've now taken two steps. 1396 01:07:02,010 --> 01:07:04,780 And I estimate myself to be 15 away from the goal-- 1397 01:07:04,780 --> 01:07:06,740 again, a total value of 17. 1398 01:07:06,740 --> 01:07:08,230 Now I've taken three steps. 1399 01:07:08,230 --> 01:07:11,440 And I'm estimated to be 14 away from the goal, so on and so forth. 1400 01:07:11,440 --> 01:07:13,750 Four steps, an estimate of 13. 1401 01:07:13,750 --> 01:07:15,820 Five steps, estimate of 12. 1402 01:07:15,820 --> 01:07:17,980 And now, here's a decision point.
1403 01:07:17,980 --> 01:07:22,720 I could either have taken six steps with a heuristic of 13 1404 01:07:22,720 --> 01:07:26,440 for a total of 19, or I could have taken six steps 1405 01:07:26,440 --> 01:07:31,640 with a heuristic of 11 for a total estimate of 17. 1406 01:07:31,640 --> 01:07:35,230 So between 19 and 17, I'd rather take the 17-- 1407 01:07:35,230 --> 01:07:37,170 the 6 plus 11. 1408 01:07:37,170 --> 01:07:39,170 So so far, no different than what we saw before. 1409 01:07:39,170 --> 01:07:42,160 We're still taking this option because it appears to be better. 1410 01:07:42,160 --> 01:07:45,160 And I keep taking this option because it appears to be better. 1411 01:07:45,160 --> 01:07:49,430 But it's right about here that things get a little bit different. 1412 01:07:49,430 --> 01:07:55,630 Now I could be 15 steps in, with an estimated distance of 6 from the goal. 1413 01:07:55,630 --> 01:07:58,750 So 15 plus 6, total value of 21. 1414 01:07:58,750 --> 01:08:01,870 Alternatively, I could be six steps in-- 1415 01:08:01,870 --> 01:08:04,690 because that was five steps, so this is six steps-- 1416 01:08:04,690 --> 01:08:07,350 with a heuristic estimate of 13. 1417 01:08:07,350 --> 01:08:08,650 So 6 plus 13-- 1418 01:08:08,650 --> 01:08:10,180 that's 19. 1419 01:08:10,180 --> 01:08:14,140 So here we would evaluate g(n) plus h(n) to be 19-- 1420 01:08:14,140 --> 01:08:20,440 6 plus 13-- whereas here, we would be 15 plus 6, or 21. 1421 01:08:20,440 --> 01:08:23,780 And so the intuition is, 19 less than 21, pick here. 1422 01:08:23,780 --> 01:08:29,190 But the idea is ultimately I'd rather have taken fewer steps to get to a 13 1423 01:08:29,190 --> 01:08:32,610 than have taken 15 steps and be at a 6, 1424 01:08:32,610 --> 01:08:35,410 because it means I've had to take more steps in order to get there. 1425 01:08:35,410 --> 01:08:38,520 Maybe there's a better path this way. 1426 01:08:38,520 --> 01:08:41,050 So instead we'll explore this route. 1427 01:08:41,050 --> 01:08:43,840 Now if we go one more-- this is seven steps plus 14, 1428 01:08:43,840 --> 01:08:46,840 is 21, so between those two it's sort of a toss up. 1429 01:08:46,840 --> 01:08:48,970 We might end up exploring that one anyways. 1430 01:08:48,970 --> 01:08:53,270 But after that, as the totals along that other path start to get bigger 1431 01:08:53,270 --> 01:08:55,600 and the heuristic values down this path start to get smaller, 1432 01:08:55,600 --> 01:08:59,100 you'll find that we'll actually keep exploring down this path. 1433 01:08:59,100 --> 01:09:02,290 And you can do the math to see that at every decision point, 1434 01:09:02,290 --> 01:09:06,810 A star search is going to make a choice based on the sum of how many steps 1435 01:09:06,810 --> 01:09:09,700 it took me to get to my current position and then 1436 01:09:09,700 --> 01:09:13,180 how far I estimate I am from the goal. 1437 01:09:13,180 --> 01:09:15,760 So while we did have to explore some of these states, 1438 01:09:15,760 --> 01:09:20,470 the ultimate solution we found was, in fact, an optimal solution. 1439 01:09:20,470 --> 01:09:24,790 It did find us the quickest possible way to get from the initial state 1440 01:09:24,790 --> 01:09:25,800 to the goal. 1441 01:09:25,800 --> 01:09:29,800 And it turns out that A* is an optimal search algorithm under certain 1442 01:09:29,800 --> 01:09:31,390 conditions.
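To make that bookkeeping concrete before turning to those conditions, here is a minimal Python sketch of how the A* frontier might be managed. The neighbors and heuristic functions and the per-step costs are illustrative assumptions, not the course's distribution code; dropping the new_g term from the priority would turn this back into greedy best-first search.

import heapq

def a_star(start, goal, neighbors, heuristic):
    # A* search: always expand the node with the smallest g(n) + h(n).
    counter = 0                       # tie-breaker so the heap never compares states
    frontier = [(heuristic(start), counter, start, 0, None)]
    came_from = {}                    # expanded state -> the state we reached it from
    best_g = {start: 0}               # cheapest cost found so far to reach each state

    while frontier:
        _, _, state, g, parent = heapq.heappop(frontier)
        if state in came_from:
            continue                  # already expanded along a cheaper path
        came_from[state] = parent
        if state == goal:             # rebuild the path by walking parents back to start
            path = [state]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return list(reversed(path))
        for next_state, step_cost in neighbors(state):
            new_g = g + step_cost
            if new_g < best_g.get(next_state, float("inf")):
                best_g[next_state] = new_g
                counter += 1
                heapq.heappush(frontier, (new_g + heuristic(next_state),
                                          counter, next_state, new_g, state))
    return None                       # no path from start to goal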
1443 01:09:31,390 --> 01:09:35,860 So the conditions are h of n, my heuristic, needs to be admissible. 1444 01:09:35,860 --> 01:09:37,990 What does it mean for a heuristic to be admissible? 1445 01:09:37,990 --> 01:09:42,700 Well, a heuristic is admissible if it never overestimates the true cost. 1446 01:09:42,700 --> 01:09:46,260 h of n always needs to either get it exactly right 1447 01:09:46,260 --> 01:09:50,520 in terms of how far away I am, or it needs to underestimate. 1448 01:09:50,520 --> 01:09:54,600 So we saw an example from before where the heuristic value was much smaller 1449 01:09:54,600 --> 01:09:56,350 than the actual cost it would take. 1450 01:09:56,350 --> 01:09:57,710 That's totally fine. 1451 01:09:57,710 --> 01:10:00,100 But the heuristic value should never overestimate. 1452 01:10:00,100 --> 01:10:04,520 It should never think that I'm further away from the goal than I actually am. 1453 01:10:04,520 --> 01:10:07,630 And meanwhile, to make a stronger statement, h of n 1454 01:10:07,630 --> 01:10:09,880 also needs to be consistent. 1455 01:10:09,880 --> 01:10:11,800 And what does it mean for it to be consistent? 1456 01:10:11,800 --> 01:10:14,500 Mathematically, it means that for every node, which 1457 01:10:14,500 --> 01:10:19,120 we'll call n, and successor, the node after me, that I'll call n prime, 1458 01:10:19,120 --> 01:10:24,220 where it takes a cost of c to make that step, the heuristic value of n 1459 01:10:24,220 --> 01:10:26,560 needs to be less than or equal to the heuristic 1460 01:10:26,560 --> 01:10:29,060 value of n prime plus the cost. 1461 01:10:29,060 --> 01:10:31,360 So it's a lot of math, but in words, what it ultimately 1462 01:10:31,360 --> 01:10:34,870 means is that if I am here at this state right now, 1463 01:10:34,870 --> 01:10:39,250 the heuristic value from me to the goal shouldn't be more than the heuristic 1464 01:10:39,250 --> 01:10:44,080 value of my successor, the next place I could go to, plus however much 1465 01:10:44,080 --> 01:10:48,560 it would cost me to just make that step, from one step to the next step. 1466 01:10:48,560 --> 01:10:52,480 And so this is just making sure that my heuristic is consistent between all 1467 01:10:52,480 --> 01:10:54,010 of these steps that I might take. 1468 01:10:54,010 --> 01:10:58,540 So as long as this is true, then A* search is going to find me an optimal 1469 01:10:58,540 --> 01:10:59,470 solution. 1470 01:10:59,470 --> 01:11:02,770 And this is where much of the challenge of solving these search problems can 1471 01:11:02,770 --> 01:11:05,980 sometimes come in, that A* search is an algorithm that is known, 1472 01:11:05,980 --> 01:11:07,960 and you could write the code fairly easily. 1473 01:11:07,960 --> 01:11:11,260 But it's choosing the heuristic that can be the interesting challenge. 1474 01:11:11,260 --> 01:11:13,540 The better the heuristic is, the better I'll 1475 01:11:13,540 --> 01:11:16,870 be able to solve the problem, and the fewer states that I'll have to explore. 1476 01:11:16,870 --> 01:11:20,200 And I need to make sure that the heuristic satisfies 1477 01:11:20,200 --> 01:11:22,540 these particular constraints. 1478 01:11:22,540 --> 01:11:25,900 So all in all, these are some of the examples of search algorithms 1479 01:11:25,900 --> 01:11:26,650 that might work. 1480 01:11:26,650 --> 01:11:29,080 And certainly, there are many more than just this.
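In symbols, the two conditions on the heuristic from a moment ago can be summarized as follows, writing h*(n) for the true cost from n to the goal and c(n, a, n') for the cost of the step from n to a successor n' (this notation is just shorthand for this recap):

admissible:  $h(n) \le h^{*}(n)$  for every node $n$
consistent:  $h(n) \le c(n, a, n') + h(n')$  for every node $n$ and successor $n'$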
1481 01:11:29,080 --> 01:11:32,590 A*, for example, does have a tendency to use quite a bit of memory, 1482 01:11:32,590 --> 01:11:37,570 so there are alternative approaches to A* that ultimately use less memory than 1483 01:11:37,570 --> 01:11:40,200 this version of A* happens to use. 1484 01:11:40,200 --> 01:11:43,660 And there are other search algorithms that are optimized for other cases 1485 01:11:43,660 --> 01:11:45,560 as well. 1486 01:11:45,560 --> 01:11:48,460 But now, so far, we've only been looking at search algorithms 1487 01:11:48,460 --> 01:11:50,860 where there's one agent. 1488 01:11:50,860 --> 01:11:53,470 I am trying to find a solution to a problem. 1489 01:11:53,470 --> 01:11:55,900 I am trying to navigate my way through a maze. 1490 01:11:55,900 --> 01:11:57,880 I am trying to solve a 15 puzzle. 1491 01:11:57,880 --> 01:12:02,190 I am trying to find driving directions from point A to point B. 1492 01:12:02,190 --> 01:12:04,420 Sometimes in search situations, though, we'll 1493 01:12:04,420 --> 01:12:07,600 enter an adversarial situation where I am 1494 01:12:07,600 --> 01:12:10,010 an agent trying to make intelligent decisions, 1495 01:12:10,010 --> 01:12:13,090 and there is someone else who is fighting against me, so to speak, 1496 01:12:13,090 --> 01:12:16,570 that has an opposite objective, someone where I am trying to succeed, 1497 01:12:16,570 --> 01:12:19,180 someone else that wants me to fail. 1498 01:12:19,180 --> 01:12:23,680 And this is most popular in something like a game, a game like tic-tac-toe, 1499 01:12:23,680 --> 01:12:26,350 where we've got this 3-by-3 grid, and X and O 1500 01:12:26,350 --> 01:12:30,130 take turns either writing an X or an O in any one of these squares. 1501 01:12:30,130 --> 01:12:33,610 And the goal is to get three X's in a row, if you're the X player, 1502 01:12:33,610 --> 01:12:36,610 or three O's in a row, if you're the O player. 1503 01:12:36,610 --> 01:12:40,390 And computers have gotten quite good at playing games, tic-tac-toe very easily, 1504 01:12:40,390 --> 01:12:42,370 but even more complex games. 1505 01:12:42,370 --> 01:12:46,570 And so you might imagine, what does an intelligent decision in a game look 1506 01:12:46,570 --> 01:12:47,290 like? 1507 01:12:47,290 --> 01:12:51,130 So maybe X makes an initial move in the middle, and O plays up here. 1508 01:12:51,130 --> 01:12:54,340 What does an intelligent move for X now become? 1509 01:12:54,340 --> 01:12:56,390 Where should you move if you were X? 1510 01:12:56,390 --> 01:12:58,670 And it turns out there are a couple of possibilities. 1511 01:12:58,670 --> 01:13:01,120 But if an AI is playing this game optimally, 1512 01:13:01,120 --> 01:13:04,180 then the AI might play somewhere like the upper right, where 1513 01:13:04,180 --> 01:13:07,800 in this situation, O has the opposite objective of X. 1514 01:13:07,800 --> 01:13:11,770 X is trying to win the game, to get three in a row diagonally here, 1515 01:13:11,770 --> 01:13:15,410 and O is trying to stop that objective, opposite of the objective. 1516 01:13:15,410 --> 01:13:17,860 And so O is going to place here, to try to block. 1517 01:13:17,860 --> 01:13:20,260 But now, X has a pretty clever move. 1518 01:13:20,260 --> 01:13:25,180 X can make a move, like this where now X has two possible ways that X 1519 01:13:25,180 --> 01:13:26,080 can win the game. 
1520 01:13:26,080 --> 01:13:29,050 X could win the game by getting three in a row across here, 1521 01:13:29,050 --> 01:13:32,380 or X could win the game by getting three in a row vertically this way. 1522 01:13:32,380 --> 01:13:34,610 So it doesn't matter where O makes their next move. 1523 01:13:34,610 --> 01:13:38,170 O could play here, for example, blocking the three in a row horizontally, 1524 01:13:38,170 --> 01:13:43,200 but then X is going to win the game by getting a three in a row vertically. 1525 01:13:43,200 --> 01:13:45,010 And so there's a fair amount of reasoning 1526 01:13:45,010 --> 01:13:48,220 that's going on here in order for the computer to be able to solve a problem. 1527 01:13:48,220 --> 01:13:51,520 And it's similar in spirit to the problems we've looked at so far. 1528 01:13:51,520 --> 01:13:54,580 There are actions, there's some sort of state of the board, 1529 01:13:54,580 --> 01:13:57,160 and some transition from one action to the next, 1530 01:13:57,160 --> 01:14:00,730 but it's different in the sense that this is now not just a classical search 1531 01:14:00,730 --> 01:14:04,900 problem, but an adversarial search problem, that I am the X player, 1532 01:14:04,900 --> 01:14:07,090 trying to find the best moves to make, but I 1533 01:14:07,090 --> 01:14:10,640 know that there is some adversary that is trying to stop me. 1534 01:14:10,640 --> 01:14:14,800 So we need some sort of algorithm to deal with these adversarial type 1535 01:14:14,800 --> 01:14:16,440 of search situations. 1536 01:14:16,440 --> 01:14:18,400 And the algorithm we're going to take a look at 1537 01:14:18,400 --> 01:14:20,860 is an algorithm called Minimax, which works 1538 01:14:20,860 --> 01:14:24,820 very well for these deterministic games, where there are two players. 1539 01:14:24,820 --> 01:14:28,120 It can work for other types of games as well, but we'll look right now at games 1540 01:14:28,120 --> 01:14:31,990 where I make a move, that my opponent makes a move, and I am trying to win, 1541 01:14:31,990 --> 01:14:34,270 and my opponent is trying to win, also. 1542 01:14:34,270 --> 01:14:38,100 Or in other words, my opponent is trying to get me to lose. 1543 01:14:38,100 --> 01:14:40,900 And so what do we need in order to make this algorithm work? 1544 01:14:40,900 --> 01:14:44,800 Well, anytime we try and translate this human concept, of playing a game, 1545 01:14:44,800 --> 01:14:47,380 winning, and losing, to a computer, we want 1546 01:14:47,380 --> 01:14:50,230 to translate it in terms that the computer can understand. 1547 01:14:50,230 --> 01:14:53,740 And ultimately, the computer really just understands numbers. 1548 01:14:53,740 --> 01:14:56,890 And so we want some way of translating a game of X's and O's 1549 01:14:56,890 --> 01:15:00,490 on a grid to something numerical, something the computer can understand. 1550 01:15:00,490 --> 01:15:04,300 The computer doesn't normally understand notions of win or lose, 1551 01:15:04,300 --> 01:15:08,360 but it does understand the concept of bigger and smaller. 1552 01:15:08,360 --> 01:15:12,100 And so what we might yet do is, we might take each of the possible ways 1553 01:15:12,100 --> 01:15:17,110 that a tic-tac-toe game can unfold and assign a value, or a utility, 1554 01:15:17,110 --> 01:15:19,060 to each one of those possible ways. 1555 01:15:19,060 --> 01:15:21,790 And in a tic-tac-toe game, and in many types of games, 1556 01:15:21,790 --> 01:15:23,800 there are three possible outcomes. 
1557 01:15:23,800 --> 01:15:28,210 The outcomes are, O wins, X wins, or nobody wins. 1558 01:15:28,210 --> 01:15:32,440 So player one wins, player two wins, or nobody wins. 1559 01:15:32,440 --> 01:15:36,670 And for now, let's go ahead and assign each of these possible outcomes 1560 01:15:36,670 --> 01:15:37,870 a different value. 1561 01:15:37,870 --> 01:15:39,100 We'll say O winning-- 1562 01:15:39,100 --> 01:15:41,230 that'll have a value of negative 1. 1563 01:15:41,230 --> 01:15:43,630 Nobody winning-- that'll have a value of 0. 1564 01:15:43,630 --> 01:15:46,700 And X winning-- that will have a value of 1. 1565 01:15:46,700 --> 01:15:50,860 So we've just assigned numbers to each of these three possible outcomes. 1566 01:15:50,860 --> 01:15:53,480 And now, we have two players. 1567 01:15:53,480 --> 01:15:56,260 We have the X player and the O player. 1568 01:15:56,260 --> 01:16:00,290 And we're going to go ahead and call the X player the max player. 1569 01:16:00,290 --> 01:16:03,140 And we'll call the O player the min player. 1570 01:16:03,140 --> 01:16:05,920 And the reason why is because in the Minimax algorithm, 1571 01:16:05,920 --> 01:16:11,380 the max player, which in this case is X, is aiming to maximize the score. 1572 01:16:11,380 --> 01:16:14,660 These are the possible options for the score, negative 1, 0, and 1. 1573 01:16:14,660 --> 01:16:18,400 X wants to maximize the score, meaning if at all possible, 1574 01:16:18,400 --> 01:16:21,980 X would like this situation where X wins the game. 1575 01:16:21,980 --> 01:16:23,590 And we give it a score of 1. 1576 01:16:23,590 --> 01:16:27,010 But if this isn't possible, if X needs to choose between these two 1577 01:16:27,010 --> 01:16:31,930 options, negative 1 meaning O winning, or 0 meaning nobody winning, 1578 01:16:31,930 --> 01:16:37,060 X would rather that nobody wins, score of 0, than a score of negative 1, 1579 01:16:37,060 --> 01:16:38,360 O winning. 1580 01:16:38,360 --> 01:16:41,080 So this notion of winning and losing in time 1581 01:16:41,080 --> 01:16:45,410 has been reduced mathematically to just this idea of, try and maximize 1582 01:16:45,410 --> 01:16:46,070 the score. 1583 01:16:46,070 --> 01:16:49,920 The X player always wants the score to be bigger. 1584 01:16:49,920 --> 01:16:52,880 And on the flip side, the min player, in this case, O, 1585 01:16:52,880 --> 01:16:54,620 is aiming to minimize the score. 1586 01:16:54,620 --> 01:16:59,510 The O player wants the score to be as small as possible. 1587 01:16:59,510 --> 01:17:02,820 So now we've taken this game of X's and O's and winning and losing 1588 01:17:02,820 --> 01:17:04,990 and turned it into something mathematical, something 1589 01:17:04,990 --> 01:17:09,560 where X is trying to maximize the score, O is trying to minimize the score. 1590 01:17:09,560 --> 01:17:11,590 Let's now look at all of the parts of the game 1591 01:17:11,590 --> 01:17:14,650 that we need in order to encode it in an AI 1592 01:17:14,650 --> 01:17:18,730 so that an AI can play a game like tic-tac-toe. 1593 01:17:18,730 --> 01:17:20,770 So the game is going to need a couple of things. 1594 01:17:20,770 --> 01:17:23,350 We'll need some sort of initial state, that we'll in this case 1595 01:17:23,350 --> 01:17:27,520 call S0, which is how the game begins, like an empty tic-tac-toe board, 1596 01:17:27,520 --> 01:17:28,750 for example. 
1597 01:17:28,750 --> 01:17:32,410 We'll also need a function called player, 1598 01:17:32,410 --> 01:17:37,030 where the player function is going to take as input a state, here represented 1599 01:17:37,030 --> 01:17:41,050 by S. And the output of the player function is going to be, 1600 01:17:41,050 --> 01:17:43,450 which player's turn is it? 1601 01:17:43,450 --> 01:17:46,360 We need to be able to give a tic-tac-toe board to the computer, 1602 01:17:46,360 --> 01:17:50,440 run it through a function, and that function tells us whose turn it is. 1603 01:17:50,440 --> 01:17:52,870 We'll need some notion of actions that we can take. 1604 01:17:52,870 --> 01:17:54,980 We'll see examples of that in just a moment. 1605 01:17:54,980 --> 01:17:57,910 We need some notion of a transition model-- same as before. 1606 01:17:57,910 --> 01:18:00,160 If I have a state, and I take an action, I 1607 01:18:00,160 --> 01:18:03,040 need to know what results as a consequence of it. 1608 01:18:03,040 --> 01:18:05,810 I need some way of knowing when the game is over. 1609 01:18:05,810 --> 01:18:07,900 So this is equivalent to kind of like a goal test, 1610 01:18:07,900 --> 01:18:10,330 but I need some terminal test, some way to check 1611 01:18:10,330 --> 01:18:14,230 to see if a state is a terminal state, where a terminal state means 1612 01:18:14,230 --> 01:18:15,280 the game is over. 1613 01:18:15,280 --> 01:18:20,050 In the classic game of tic-tac-toe , a terminal state means either someone has 1614 01:18:20,050 --> 01:18:23,380 gotten three in a row, or all of the squares of the tic-tac-toe board are 1615 01:18:23,380 --> 01:18:24,040 filled. 1616 01:18:24,040 --> 01:18:26,610 Either of those conditions make it a terminal state. 1617 01:18:26,610 --> 01:18:28,570 In a game of chess, it might be something like, 1618 01:18:28,570 --> 01:18:31,660 when there is checkmate, or if checkmate is no longer possible, 1619 01:18:31,660 --> 01:18:34,370 that becomes a terminal state. 1620 01:18:34,370 --> 01:18:38,410 And then finally we'll need a utility function, a function that takes a state 1621 01:18:38,410 --> 01:18:41,890 and gives us a numerical value for that terminal state, some way of saying, 1622 01:18:41,890 --> 01:18:44,530 if X wins the game, that has a value of 1. 1623 01:18:44,530 --> 01:18:47,050 If O has won the game, that has the value of negative 1. 1624 01:18:47,050 --> 01:18:50,350 If nobody has won the game, that has a value of 0. 1625 01:18:50,350 --> 01:18:52,790 So let's take a look at each of these in turn. 1626 01:18:52,790 --> 01:18:57,070 The initial state, we can just represent in tic-tac-toe as the empty game board. 1627 01:18:57,070 --> 01:18:58,460 This is where we begin. 1628 01:18:58,460 --> 01:19:01,030 It's the place from which we begin this search. 1629 01:19:01,030 --> 01:19:03,540 And again, I'll be representing these things visually. 1630 01:19:03,540 --> 01:19:05,410 But you can imagine this really just being 1631 01:19:05,410 --> 01:19:10,120 an array, or a two-dimensional array, of all of these possible squares. 1632 01:19:10,120 --> 01:19:13,510 Then we need the player function that, again, takes a state 1633 01:19:13,510 --> 01:19:15,250 and tells us whose turn it is. 
1634 01:19:15,250 --> 01:19:18,670 Assuming X makes the first move, if I have an empty game board, 1635 01:19:18,670 --> 01:19:21,510 then my player function is going to return X. 1636 01:19:21,510 --> 01:19:25,120 And if I have a game board where X has made a move, then my player function is 1637 01:19:25,120 --> 01:19:28,840 going to return O. The player function takes a tic-tac-toe game board 1638 01:19:28,840 --> 01:19:32,170 and tells us whose turn it is. 1639 01:19:32,170 --> 01:19:34,870 Next up, we'll consider the actions function. 1640 01:19:34,870 --> 01:19:39,040 The actions function, much like it did in classical search, takes a state 1641 01:19:39,040 --> 01:19:41,860 and gives us the set of all of the possible actions 1642 01:19:41,860 --> 01:19:44,390 we can take in that state. 1643 01:19:44,390 --> 01:19:49,360 So let's imagine it's O's turn to move in a game board that looks like this. 1644 01:19:49,360 --> 01:19:52,120 What happens when we pass it into the actions function? 1645 01:19:52,120 --> 01:19:55,990 So the actions function takes this state of the game as input, 1646 01:19:55,990 --> 01:19:59,860 and the output is a set of possible actions. It's the set: 1647 01:19:59,860 --> 01:20:03,580 I could move in the upper left, or I could move in the bottom middle. 1648 01:20:03,580 --> 01:20:06,490 Those are the two possible action choices that I have 1649 01:20:06,490 --> 01:20:10,190 when I begin in this particular state. 1650 01:20:10,190 --> 01:20:13,100 Now, just as before, when we have states and actions, 1651 01:20:13,100 --> 01:20:15,470 we need some sort of transition model to tell us, 1652 01:20:15,470 --> 01:20:19,500 when we take this action in the state, what is the new state that we get? 1653 01:20:19,500 --> 01:20:22,430 And here, we define that using the result function that takes 1654 01:20:22,430 --> 01:20:25,460 a state as input, as well as an action. 1655 01:20:25,460 --> 01:20:28,490 And when we apply the result function to this state, 1656 01:20:28,490 --> 01:20:33,110 saying, let's let O move in this upper left corner, the new state we get 1657 01:20:33,110 --> 01:20:35,840 is this resulting state, where O is in the upper-left corner. 1658 01:20:35,840 --> 01:20:38,880 And now, this seems obvious to someone who knows how to play tic-tac-toe. 1659 01:20:38,880 --> 01:20:40,830 Of course, you play in the upper left corner-- 1660 01:20:40,830 --> 01:20:41,870 that's the board you get. 1661 01:20:41,870 --> 01:20:45,060 But all of this information needs to be encoded into the AI. 1662 01:20:45,060 --> 01:20:47,570 The AI doesn't know how to play tic-tac-toe 1663 01:20:47,570 --> 01:20:51,170 until you tell the AI how the rules of tic-tac-toe work. 1664 01:20:51,170 --> 01:20:53,630 And this result function, defined here, 1665 01:20:53,630 --> 01:20:57,050 allows us to tell the AI how this game actually works 1666 01:20:57,050 --> 01:21:01,200 and how actions actually affect the outcome of the game. 1667 01:21:01,200 --> 01:21:03,590 So the AI needs to know how the game works. 1668 01:21:03,590 --> 01:21:06,080 The AI also needs to know when the game is over. 1669 01:21:06,080 --> 01:21:09,320 We do that by defining a function called terminal that takes as input 1670 01:21:09,320 --> 01:21:13,200 a state S, such that if we take a game that is not yet over, 1671 01:21:13,200 --> 01:21:16,130 pass it into the terminal function, the output is false. 1672 01:21:16,130 --> 01:21:17,540 The game is not over.
1673 01:21:17,540 --> 01:21:20,690 But if we take a game that is over, because X has gotten three 1674 01:21:20,690 --> 01:21:24,360 in a row along that diagonal, pass that into the terminal function, 1675 01:21:24,360 --> 01:21:28,920 then the output is going to be true, because the game now is, in fact, over. 1676 01:21:28,920 --> 01:21:31,930 And finally, we've told the AI how the game works 1677 01:21:31,930 --> 01:21:35,180 in terms of what moves can be made and what happens when you make those moves. 1678 01:21:35,180 --> 01:21:37,290 We've told the AI when the game is over. 1679 01:21:37,290 --> 01:21:41,270 Now we need to tell the AI what the value of each of those states is. 1680 01:21:41,270 --> 01:21:45,170 And we do that by defining this utility function, that takes a state, S, 1681 01:21:45,170 --> 01:21:48,840 and tells us the score or the utility of that state. 1682 01:21:48,840 --> 01:21:52,760 So again, we said that if X wins the game, that utility is a value of 1, 1683 01:21:52,760 --> 01:21:57,350 whereas if O wins the game, then the utility of that is negative 1. 1684 01:21:57,350 --> 01:22:00,200 And the AI needs to know, for each of these terminal states 1685 01:22:00,200 --> 01:22:04,800 where the game is over, what is the utility of that state? 1686 01:22:04,800 --> 01:22:08,390 So I can give you a game board like this, where the game is, in fact, over, 1687 01:22:08,390 --> 01:22:12,710 and I ask the AI to tell me what the value of that state is, it could do so. 1688 01:22:12,710 --> 01:22:15,830 The value of the state is 1. 1689 01:22:15,830 --> 01:22:20,240 Where things get interesting, though, is if the game is not yet over. 1690 01:22:20,240 --> 01:22:21,900 Let's imagine a game board like this. 1691 01:22:21,900 --> 01:22:23,330 We're in the middle of the game. 1692 01:22:23,330 --> 01:22:25,850 It's O's turn to make a move. 1693 01:22:25,850 --> 01:22:27,980 So how do we know it's O's turn to make a move? 1694 01:22:27,980 --> 01:22:30,110 We can calculate that, using the player function. 1695 01:22:30,110 --> 01:22:33,260 We can say, player of S, pass in the state. 1696 01:22:33,260 --> 01:22:36,240 O is the answer, so we know it's O's turn to move. 1697 01:22:36,240 --> 01:22:40,790 And now, what is the value of this board, and what action should O take? 1698 01:22:40,790 --> 01:22:41,960 Well that's going to depend. 1699 01:22:41,960 --> 01:22:43,760 We have to do some calculation here. 1700 01:22:43,760 --> 01:22:47,450 And this is where the Minimax algorithm really comes in. 1701 01:22:47,450 --> 01:22:50,840 Recall that X is trying to maximize the score, which means 1702 01:22:50,840 --> 01:22:53,720 that O is trying to minimize the score. 1703 01:22:53,720 --> 01:22:58,870 O would like to minimize the total value that we get at the end of the game. 1704 01:22:58,870 --> 01:23:01,310 And because this game isn't over yet, we don't really 1705 01:23:01,310 --> 01:23:04,820 know just yet what the value of this game board is. 1706 01:23:04,820 --> 01:23:08,290 We have to do some calculation in order to figure that out. 1707 01:23:08,290 --> 01:23:10,430 So how do we do that kind of calculation? 1708 01:23:10,430 --> 01:23:13,040 Well, in order to do so, we're going to consider, 1709 01:23:13,040 --> 01:23:15,680 just as we might in a classical search situation, 1710 01:23:15,680 --> 01:23:20,160 what actions could happen next, and what states will that take us to? 
1711 01:23:20,160 --> 01:23:22,070 And it turns out that in this position, there 1712 01:23:22,070 --> 01:23:26,030 are only two open squares, which means there are only two open places where 1713 01:23:26,030 --> 01:23:28,640 O can make a move. 1714 01:23:28,640 --> 01:23:31,080 O could either make a move in the upper left, 1715 01:23:31,080 --> 01:23:34,160 or O can make a move in the bottom middle. 1716 01:23:34,160 --> 01:23:36,980 And Minimax doesn't know right out of the box which of those moves 1717 01:23:36,980 --> 01:23:40,520 is going to be better, so it's going to consider both. 1718 01:23:40,520 --> 01:23:42,320 But now we run into the same situation. 1719 01:23:42,320 --> 01:23:45,170 Now I have two more game boards, neither of which is over. 1720 01:23:45,170 --> 01:23:46,660 What happens next? 1721 01:23:46,660 --> 01:23:48,410 And now it's in this sense that Minimax is 1722 01:23:48,410 --> 01:23:50,660 what we'll call a recursive algorithm. 1723 01:23:50,660 --> 01:23:54,800 It's going to now repeat the exact same process, although now 1724 01:23:54,800 --> 01:23:57,640 considering it from the opposite perspective. 1725 01:23:57,640 --> 01:24:01,280 It's as if I am now going to put myself-- if I am the O player, 1726 01:24:01,280 --> 01:24:05,540 I'm going to put myself in my opponent's shoes, my opponent as the X player, 1727 01:24:05,540 --> 01:24:10,010 and consider, what would my opponent do if they were in this position? 1728 01:24:10,010 --> 01:24:14,090 What would my opponent do, the X player, if they were in that position? 1729 01:24:14,090 --> 01:24:15,240 And what would then happen? 1730 01:24:15,240 --> 01:24:18,260 Well, the other player, my opponent, the X player, 1731 01:24:18,260 --> 01:24:21,200 is trying to maximize the score, whereas I am trying 1732 01:24:21,200 --> 01:24:23,240 to minimize the score as the O player. 1733 01:24:23,240 --> 01:24:27,550 So X is trying to find the maximum possible value that they can get. 1734 01:24:27,550 --> 01:24:29,390 And so what's going to happen? 1735 01:24:29,390 --> 01:24:32,780 Well, from this board position, X only has one choice. 1736 01:24:32,780 --> 01:24:35,600 X is going to play here, and they're going to get three in a row. 1737 01:24:35,600 --> 01:24:37,910 And we know that that board, X winning-- 1738 01:24:37,910 --> 01:24:39,530 that has a value of 1. 1739 01:24:39,530 --> 01:24:43,220 If X wins the game, the value of that game board is 1. 1740 01:24:43,220 --> 01:24:48,680 And so from this position, if this state can only ever lead to this state, 1741 01:24:48,680 --> 01:24:52,640 it's the only possible option, and this state has a value of 1, 1742 01:24:52,640 --> 01:24:57,230 then the maximum possible value that the X player can get from this game board 1743 01:24:57,230 --> 01:24:58,890 is also 1 from here. 1744 01:24:58,890 --> 01:25:01,670 The only place we can get is to a game with the value of 1, 1745 01:25:01,670 --> 01:25:05,260 so this game board also has a value of 1. 1746 01:25:05,260 --> 01:25:07,640 Now we consider this one over here. 1747 01:25:07,640 --> 01:25:08,860 What's going to happen now? 1748 01:25:08,860 --> 01:25:10,460 Well, X needs to make a move. 1749 01:25:10,460 --> 01:25:13,510 The only move X can make is in the upper left, so X will go there. 1750 01:25:13,510 --> 01:25:15,350 And in this game, no one wins the game. 1751 01:25:15,350 --> 01:25:16,940 Nobody has three in a row. 1752 01:25:16,940 --> 01:25:19,600 So the value of that game board is 0. 
1753 01:25:19,600 --> 01:25:20,890 Nobody's won. 1754 01:25:20,890 --> 01:25:24,880 And so again, by the same logic, if from this board position, the only place 1755 01:25:24,880 --> 01:25:27,760 we can get to is a board where the value is 0, 1756 01:25:27,760 --> 01:25:31,290 then this state must also have a value of 0. 1757 01:25:31,290 --> 01:25:35,320 And now here comes the choice part, the idea of trying to minimize. 1758 01:25:35,320 --> 01:25:38,970 I, as the O player, now know that if I make this choice, 1759 01:25:38,970 --> 01:25:43,140 moving in the upper left, that is going to result in a game with a value of 1, 1760 01:25:43,140 --> 01:25:45,240 assuming everyone plays optimally. 1761 01:25:45,240 --> 01:25:47,040 And if I instead play in the lower middle, 1762 01:25:47,040 --> 01:25:50,250 choose this fork in the road, that is going to result in a game board 1763 01:25:50,250 --> 01:25:51,510 with a value of 0. 1764 01:25:51,510 --> 01:25:52,750 I have two options. 1765 01:25:52,750 --> 01:25:56,380 I have a 1 and a 0 to choose from, and I need to pick. 1766 01:25:56,380 --> 01:25:59,520 And as the min player, I would rather choose the option 1767 01:25:59,520 --> 01:26:00,940 with the minimum value. 1768 01:26:00,940 --> 01:26:03,090 So whenever a player has multiple choices, 1769 01:26:03,090 --> 01:26:06,030 the min player will choose the option with the smallest value. 1770 01:26:06,030 --> 01:26:08,700 The max player will choose the option with the largest value. 1771 01:26:08,700 --> 01:26:11,470 Between the 1 and the 0, the 0 is smaller, 1772 01:26:11,470 --> 01:26:14,740 meaning I'd rather tie the game than lose the game. 1773 01:26:14,740 --> 01:26:18,090 And so this game board, we'll say, also has a value of 0, 1774 01:26:18,090 --> 01:26:22,290 because if I am playing optimally, I will pick this fork in the road. 1775 01:26:22,290 --> 01:26:25,290 I'll place my O here to block X's three in a row. 1776 01:26:25,290 --> 01:26:28,080 X will move in the upper left, and the game will be over, 1777 01:26:28,080 --> 01:26:30,180 and no one will have won the game. 1778 01:26:30,180 --> 01:26:34,260 So this is now the logic of Minimax, to consider all of the possible options 1779 01:26:34,260 --> 01:26:37,140 that I can take, all of the actions that I can take, 1780 01:26:37,140 --> 01:26:39,390 and then to put myself in my opponent's shoes. 1781 01:26:39,390 --> 01:26:42,930 I decide what move I'm going to make now by considering what move 1782 01:26:42,930 --> 01:26:44,880 my opponent will make on the next turn. 1783 01:26:44,880 --> 01:26:48,230 And to do that, I consider what move I would make on the turn after that, 1784 01:26:48,230 --> 01:26:52,560 so on and so forth, until I get all the way down to the end of the game, 1785 01:26:52,560 --> 01:26:55,050 to one of these so-called terminal states. 1786 01:26:55,050 --> 01:26:57,480 In fact, this very decision point, where I 1787 01:26:57,480 --> 01:27:00,630 am trying to decide, as the O player, what move to make, 1788 01:27:00,630 --> 01:27:04,770 might have just been a part of the logic that the X player, my opponent, 1789 01:27:04,770 --> 01:27:06,360 was using the move before me. 1790 01:27:06,360 --> 01:27:09,150 This might be part of some larger tree where 1791 01:27:09,150 --> 01:27:11,580 X is trying to make a move in this situation 1792 01:27:11,580 --> 01:27:13,830 and needs to pick between three different options 1793 01:27:13,830 --> 01:27:16,440 in order to make a decision about what should happen.
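Before going deeper into that tree, here is a rough Python sketch of the game-specific pieces enumerated earlier (initial state, player, actions, result, terminal, and utility) for tic-tac-toe. The board representation, a list of three rows whose cells hold "X", "O", or None, and all of the names here are illustrative assumptions, not the course's distribution code.

X, O, EMPTY = "X", "O", None

def initial_state():
    # S0: the empty three-by-three board.
    return [[EMPTY] * 3 for _ in range(3)]

def player(board):
    # X moves first, so it is X's turn whenever the two counts are equal.
    xs = sum(row.count(X) for row in board)
    os = sum(row.count(O) for row in board)
    return X if xs == os else O

def actions(board):
    # Every empty square is a legal move.
    return {(i, j) for i in range(3) for j in range(3) if board[i][j] is EMPTY}

def result(board, action):
    # Transition model: copy the board and place the current player's mark.
    i, j = action
    new_board = [row[:] for row in board]
    new_board[i][j] = player(board)
    return new_board

def winner(board):
    # Check every row, column, and diagonal for three of the same mark.
    lines = board + [list(col) for col in zip(*board)]
    lines += [[board[k][k] for k in range(3)],
              [board[k][2 - k] for k in range(3)]]
    for line in lines:
        if line[0] is not EMPTY and line.count(line[0]) == 3:
            return line[0]
    return None

def terminal(board):
    # The game is over if someone has three in a row or the board is full.
    return winner(board) is not None or not actions(board)

def utility(board):
    # 1 if X has won, -1 if O has won, 0 otherwise.
    return {X: 1, O: -1}.get(winner(board), 0)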
1794 01:27:16,440 --> 01:27:19,260 And the further and further away we are from the end of the game, 1795 01:27:19,260 --> 01:27:23,460 the deeper this tree has to go, because every level in this tree 1796 01:27:23,460 --> 01:27:27,750 is going to correspond to one move, one move or action that I take, 1797 01:27:27,750 --> 01:27:32,350 one move or action that my opponent takes, in order to decide what happens. 1798 01:27:32,350 --> 01:27:35,970 And in fact, it turns out that if I am the X player in this position, 1799 01:27:35,970 --> 01:27:38,630 and I recursively do the logic and see I have a choice-- 1800 01:27:38,630 --> 01:27:43,170 three choices, in fact, one of which leads to a value of 0, if I play here, 1801 01:27:43,170 --> 01:27:45,900 and if everyone plays optimally, the game will be a tie. 1802 01:27:45,900 --> 01:27:51,000 If I play here, then O is going to win, and I'll lose, playing optimally. 1803 01:27:51,000 --> 01:27:53,610 Or here, where I, the X player, can win-- 1804 01:27:53,610 --> 01:27:57,230 well, between a score of 0 and negative 1 and 1, 1805 01:27:57,230 --> 01:27:59,070 I'd rather pick the board with a value of 1, 1806 01:27:59,070 --> 01:28:01,270 because that's the maximum value I can get. 1807 01:28:01,270 --> 01:28:05,550 And so this board would also have a maximum value of 1. 1808 01:28:05,550 --> 01:28:07,810 And so this tree can get very, very deep, 1809 01:28:07,810 --> 01:28:11,400 especially as the game starts to have more and more moves. 1810 01:28:11,400 --> 01:28:13,470 And this logic works not just for tic-tac-toe, 1811 01:28:13,470 --> 01:28:16,950 but any of these sorts of games where I make a move, my opponent makes a move, 1812 01:28:16,950 --> 01:28:20,460 and ultimately, we have these adversarial objectives. 1813 01:28:20,460 --> 01:28:24,340 And we can simplify the diagram into a diagram that looks like this. 1814 01:28:24,340 --> 01:28:27,240 This is a more abstract version of the Minimax tree, 1815 01:28:27,240 --> 01:28:30,450 where these are each states, but I'm no longer representing them as exactly 1816 01:28:30,450 --> 01:28:31,830 like tic-tac-toe boards. 1817 01:28:31,830 --> 01:28:35,690 This is just representing some generic game that might be tic-tac-toe, 1818 01:28:35,690 --> 01:28:38,130 might be some other game altogether. 1819 01:28:38,130 --> 01:28:40,590 Any of these green arrows that are pointing up-- 1820 01:28:40,590 --> 01:28:42,700 that represents a maximizing state. 1821 01:28:42,700 --> 01:28:45,290 I would like the score to be as big as possible. 1822 01:28:45,290 --> 01:28:47,430 And any of these red arrows pointing down-- 1823 01:28:47,430 --> 01:28:50,580 those are minimizing states, where the player is the min player, 1824 01:28:50,580 --> 01:28:54,190 and they are trying to make the score as small as possible. 1825 01:28:54,190 --> 01:28:58,200 So if you imagine in this situation, I am the maximizing player, this player 1826 01:28:58,200 --> 01:29:00,550 here, and I have three choices-- 1827 01:29:00,550 --> 01:29:04,170 one choice gives me a score of 5, one choice gives me a score of 3, 1828 01:29:04,170 --> 01:29:06,310 and one choice gives me a score of 9. 1829 01:29:06,310 --> 01:29:10,050 Well, then, between those three choices, my best option 1830 01:29:10,050 --> 01:29:14,430 is to choose this 9 over here, the score that maximizes my options out 1831 01:29:14,430 --> 01:29:15,960 of all the three options. 
1832 01:29:15,960 --> 01:29:18,870 And so I can give this state a value of 9, 1833 01:29:18,870 --> 01:29:21,210 because among my three options, that is the best 1834 01:29:21,210 --> 01:29:24,330 choice that I have available to me. 1835 01:29:24,330 --> 01:29:25,800 So that's my decision now. 1836 01:29:25,800 --> 01:29:29,580 You imagine it's like one move away from the end of the game. 1837 01:29:29,580 --> 01:29:31,670 But then you could also ask a reasonable question. 1838 01:29:31,670 --> 01:29:35,300 What might my opponent do two moves away from the end of the game? 1839 01:29:35,300 --> 01:29:37,040 My opponent is the minimizing player. 1840 01:29:37,040 --> 01:29:39,810 They are trying to make the score as small as possible. 1841 01:29:39,810 --> 01:29:43,690 Imagine what would have happened if they had to pick which choice to make. 1842 01:29:43,690 --> 01:29:47,370 One choice leads us to this state, where I, the maximizing player, 1843 01:29:47,370 --> 01:29:50,840 am going to opt for 9, the biggest score that I can get. 1844 01:29:50,840 --> 01:29:55,130 And one leads to this state, where I, the maximizing player, 1845 01:29:55,130 --> 01:29:58,980 would choose 8, which is then the largest score than I can get. 1846 01:29:58,980 --> 01:30:02,780 Now, the minimizing player, forced to choose between a 9 or an 8, 1847 01:30:02,780 --> 01:30:07,100 is going to choose the smallest possible score, which in this case is an 8. 1848 01:30:07,100 --> 01:30:09,450 And that is, then, how this process would unfold. 1849 01:30:09,450 --> 01:30:11,660 But the minimizing player, in this case, considers 1850 01:30:11,660 --> 01:30:14,180 both of their options, and then all of the options 1851 01:30:14,180 --> 01:30:17,590 that would happen as a result of that. 1852 01:30:17,590 --> 01:30:21,420 So this now is a general picture of what the Minimax algorithm looks like. 1853 01:30:21,420 --> 01:30:24,700 Let's now try to formalize it using a little bit of pseudocode. 1854 01:30:24,700 --> 01:30:27,720 So what exactly is happening in the Minimax algorithm? 1855 01:30:27,720 --> 01:30:31,500 Well, given a state, S, we need to decide what to happen. 1856 01:30:31,500 --> 01:30:34,980 The max player-- if it's the max player's turn, then 1857 01:30:34,980 --> 01:30:39,540 max is going to pick an action, A, in actions of S. Recall 1858 01:30:39,540 --> 01:30:42,090 that actions is a function that takes a state 1859 01:30:42,090 --> 01:30:44,970 and gives me back all of the possible actions that I can take. 1860 01:30:44,970 --> 01:30:48,770 It tells me all of the moves that are possible. 1861 01:30:48,770 --> 01:30:50,610 The max player is going to specifically pick 1862 01:30:50,610 --> 01:30:53,970 an action, A, in the set of actions that gives me 1863 01:30:53,970 --> 01:31:01,420 the highest value of min value of result of S and A. So what does that mean? 1864 01:31:01,420 --> 01:31:04,350 Well, it means that I want to make the option that gives me 1865 01:31:04,350 --> 01:31:07,830 the highest score of all of the actions, A. 1866 01:31:07,830 --> 01:31:09,630 But what score is that going to have? 1867 01:31:09,630 --> 01:31:12,780 To calculate that, I need to know what my opponent, the min player, 1868 01:31:12,780 --> 01:31:18,300 is going to do if they try to minimize the value of the state that results. 
1869 01:31:18,300 --> 01:31:22,260 So we say, what state results after I take this action, 1870 01:31:22,260 --> 01:31:24,570 and what happens when the min player tries 1871 01:31:24,570 --> 01:31:27,570 to minimize the value of that state? 1872 01:31:27,570 --> 01:31:30,260 I consider that for all of my possible options. 1873 01:31:30,260 --> 01:31:32,850 And after I've considered that for all of my possible options, 1874 01:31:32,850 --> 01:31:36,770 I pick the action, A, that has the highest value. 1875 01:31:36,770 --> 01:31:40,090 Likewise, the min player is going to do the same thing, but backwards. 1876 01:31:40,090 --> 01:31:43,180 They're also going to consider, what are all of the possible actions they 1877 01:31:43,180 --> 01:31:44,800 can take if it's their turn? 1878 01:31:44,800 --> 01:31:47,740 And they're going to pick the action, A, that has the smallest 1879 01:31:47,740 --> 01:31:50,220 possible value of all the options. 1880 01:31:50,220 --> 01:31:53,430 And the way they know what the smallest possible value of all the options is, 1881 01:31:53,430 --> 01:31:57,490 is by considering what the max player is going to do, 1882 01:31:57,490 --> 01:32:01,720 by saying, what's the result of applying this action to the current state, 1883 01:32:01,720 --> 01:32:03,640 and then, what would the max player try to do? 1884 01:32:03,640 --> 01:32:08,000 What value would the max player calculate for that particular state? 1885 01:32:08,000 --> 01:32:11,260 So everyone makes their decision based on trying to estimate 1886 01:32:11,260 --> 01:32:13,600 what the other person would do. 1887 01:32:13,600 --> 01:32:15,930 And now we need to turn our attention to these two 1888 01:32:15,930 --> 01:32:18,540 functions, maxValue and minValue. 1889 01:32:18,540 --> 01:32:21,690 How do you actually calculate the value of a state 1890 01:32:21,690 --> 01:32:24,270 if you're trying to maximize its value, and how do you 1891 01:32:24,270 --> 01:32:27,690 calculate the value of a state if you're trying to minimize the value? 1892 01:32:27,690 --> 01:32:30,750 If you can do that, then we have an entire implementation 1893 01:32:30,750 --> 01:32:32,920 of this Minimax algorithm. 1894 01:32:32,920 --> 01:32:33,610 So let's try it. 1895 01:32:33,610 --> 01:32:36,480 Let's try and implement this maxValue function 1896 01:32:36,480 --> 01:32:40,630 that takes a state and returns as output the value of that state 1897 01:32:40,630 --> 01:32:43,860 if I'm trying to maximize the value of the state. 1898 01:32:43,860 --> 01:32:46,860 Well, the first thing I can check for is to see if the game is over, 1899 01:32:46,860 --> 01:32:48,100 because if the game is over-- 1900 01:32:48,100 --> 01:32:50,820 in other words, if the state is a terminal state-- 1901 01:32:50,820 --> 01:32:51,990 then this is easy. 1902 01:32:51,990 --> 01:32:54,930 I already have this utility function that tells me 1903 01:32:54,930 --> 01:32:56,280 what the value of the board is. 1904 01:32:56,280 --> 01:32:58,660 If the game is over, I just check, did X win? 1905 01:32:58,660 --> 01:32:59,160 Did O win? 1906 01:32:59,160 --> 01:33:00,150 Is that a tie? 1907 01:33:00,150 --> 01:33:04,200 And the utility function just knows what the value of the state is. 1908 01:33:04,200 --> 01:33:06,430 What's trickier is if the game isn't over, 1909 01:33:06,430 --> 01:33:09,240 because then I need to do this recursive reasoning about thinking, 1910 01:33:09,240 --> 01:33:12,630 what is my opponent going to do on the next move? 
1911 01:33:12,630 --> 01:33:15,830 Then I want to calculate the value of this state, 1912 01:33:15,830 --> 01:33:19,060 and I want the value of the state to be as high as possible. 1913 01:33:19,060 --> 01:33:21,900 And I'll keep track of that value in a variable called v. 1914 01:33:21,900 --> 01:33:24,570 And if I want the value to be as high as possible, 1915 01:33:24,570 --> 01:33:27,330 I need to give v an initial value. 1916 01:33:27,330 --> 01:33:31,500 And initially, I'll just go ahead and set it to be as low as possible, 1917 01:33:31,500 --> 01:33:34,650 because I don't know what options are available to me yet. 1918 01:33:34,650 --> 01:33:38,610 So initially, I'll set v equal to negative infinity, which 1919 01:33:38,610 --> 01:33:40,620 seems a little bit strange, but the idea here 1920 01:33:40,620 --> 01:33:43,680 is, I want the value initially to be as low as possible, 1921 01:33:43,680 --> 01:33:46,230 because as I consider my actions, I'm always 1922 01:33:46,230 --> 01:33:50,220 going to try and do better than v. And if I set v to negative infinity, 1923 01:33:50,220 --> 01:33:52,850 I know I can always do better than that. 1924 01:33:52,850 --> 01:33:54,960 So now I consider my actions. 1925 01:33:54,960 --> 01:33:56,710 And this is going to be some kind of loop, 1926 01:33:56,710 --> 01:34:00,070 where for every action in actions of state-- 1927 01:34:00,070 --> 01:34:02,830 recall, actions is a function that takes my state 1928 01:34:02,830 --> 01:34:06,650 and gives me all the possible actions that I can use in that state. 1929 01:34:06,650 --> 01:34:11,570 So for each one of those actions, I want to compare it to v and say, 1930 01:34:11,570 --> 01:34:18,290 all right, v is going to be equal to the maximum of v and this expression. 1931 01:34:18,290 --> 01:34:20,090 So what is this expression? 1932 01:34:20,090 --> 01:34:24,640 Well, first it is, get the result of taking the action and the state, 1933 01:34:24,640 --> 01:34:28,250 and then get the min value of that. 1934 01:34:28,250 --> 01:34:31,090 In other words, let's say, I want to find out 1935 01:34:31,090 --> 01:34:34,220 from that state what is the best that the min player can do, 1936 01:34:34,220 --> 01:34:36,430 because they are going to try and minimize the score. 1937 01:34:36,430 --> 01:34:40,210 So whatever the resulting score is of the min value of that state, 1938 01:34:40,210 --> 01:34:43,870 compare it to my current best value, and just pick the maximum of those two, 1939 01:34:43,870 --> 01:34:46,460 because I am trying to maximize the value. 1940 01:34:46,460 --> 01:34:48,550 In short, what these three lines of code are doing 1941 01:34:48,550 --> 01:34:52,460 are going through all of my possible actions and asking the question, 1942 01:34:52,460 --> 01:34:57,880 how do I maximize the score, given what my opponent is going to try to do? 1943 01:34:57,880 --> 01:35:00,640 After this entire loop, I can just return v, 1944 01:35:00,640 --> 01:35:04,070 and that is now the value of that particular state. 1945 01:35:04,070 --> 01:35:07,910 And for the min player, it's the exact opposite of this, the same logic, 1946 01:35:07,910 --> 01:35:08,930 just backwards. 1947 01:35:08,930 --> 01:35:10,910 To calculate the minimum value of a state, 1948 01:35:10,910 --> 01:35:12,770 first we check if it's a terminal state. 1949 01:35:12,770 --> 01:35:14,960 If it is, we return its utility. 
1950 01:35:14,960 --> 01:35:19,280 Otherwise, we're going to now try to minimize the value of the state, 1951 01:35:19,280 --> 01:35:21,290 given all of my possible actions. 1952 01:35:21,290 --> 01:35:24,800 So I need an initial value for v, the value of the state. 1953 01:35:24,800 --> 01:35:28,430 And initially, I'll set it to infinity, because I know it can always 1954 01:35:28,430 --> 01:35:30,390 get something less than infinity. 1955 01:35:30,390 --> 01:35:33,920 So by starting with v equals infinity, I make sure that the very first action 1956 01:35:33,920 --> 01:35:34,580 I find-- 1957 01:35:34,580 --> 01:35:37,550 that will be less than this value of v. 1958 01:35:37,550 --> 01:35:38,810 And then I do the same thing-- 1959 01:35:38,810 --> 01:35:41,630 loop over all of my possible actions, and for each 1960 01:35:41,630 --> 01:35:45,740 of the results that we could get when the max player makes their decision, 1961 01:35:45,740 --> 01:35:49,160 let's take the minimum of that and the current value of v. 1962 01:35:49,160 --> 01:35:53,210 So after all is said and done I get the smallest possible value of v, 1963 01:35:53,210 --> 01:35:56,480 that I then return back to the user. 1964 01:35:56,480 --> 01:35:59,020 So that, in effect, is the pseudocode for Minimax. 1965 01:35:59,020 --> 01:36:01,990 That is how we take a game and figure out what the best move to make 1966 01:36:01,990 --> 01:36:06,490 is by recursively using these maxValue and minValue functions, where 1967 01:36:06,490 --> 01:36:10,060 maxValue calls minValue, minValue calls maxValue, back 1968 01:36:10,060 --> 01:36:13,540 and forth, all the way until we reach a terminal state, at which point 1969 01:36:13,540 --> 01:36:18,750 our algorithm can simply return the utility of that particular state. 1970 01:36:18,750 --> 01:36:20,590 What you might imagine is that this is going 1971 01:36:20,590 --> 01:36:23,770 to start to be a long process, especially as games start 1972 01:36:23,770 --> 01:36:28,060 to get more complex, as we start to add more moves and more possible options 1973 01:36:28,060 --> 01:36:30,710 and games that might last quite a bit longer. 1974 01:36:30,710 --> 01:36:34,360 So the next question to ask is, what sort of optimizations can we make here? 1975 01:36:34,360 --> 01:36:37,780 How can we do better in order to use less space 1976 01:36:37,780 --> 01:36:42,110 or take less time to be able to solve this kind of problem? 1977 01:36:42,110 --> 01:36:44,740 And we'll take a look at a couple of possible optimizations. 1978 01:36:44,740 --> 01:36:47,330 But for one, we'll take a look at this example. 1979 01:36:47,330 --> 01:36:49,850 Again, we're turning to these up arrows and down arrows. 1980 01:36:49,850 --> 01:36:54,070 Let's imagine that I now am the max player, this green arrow. 1981 01:36:54,070 --> 01:36:57,260 I am trying to make the score as high as possible. 1982 01:36:57,260 --> 01:37:00,370 And this is an easy game, where there are just two moves. 1983 01:37:00,370 --> 01:37:02,980 I make a move, one of these three options, 1984 01:37:02,980 --> 01:37:05,890 and then my opponent makes a move, one of these three options, 1985 01:37:05,890 --> 01:37:07,290 based on what move I make. 1986 01:37:07,290 --> 01:37:10,310 And as a result, we get some value. 
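Putting that pseudocode together, a minimal Python sketch of Minimax might look like this, reusing the player, actions, result, terminal, and utility functions sketched above for tic-tac-toe and treating X as the max player. It is a sketch of the idea just described, not the course's distribution code.

def minimax(state):
    # Returns the best action for whichever player's turn it is.
    if terminal(state):
        return None
    if player(state) == X:   # the max player picks the action with the highest min_value
        return max(actions(state), key=lambda action: min_value(result(state, action)))
    else:                    # the min player picks the action with the lowest max_value
        return min(actions(state), key=lambda action: max_value(result(state, action)))

def max_value(state):
    # Value of a state if the max player moves next and everyone plays optimally.
    if terminal(state):
        return utility(state)
    v = float("-inf")
    for action in actions(state):
        v = max(v, min_value(result(state, action)))
    return v

def min_value(state):
    # Value of a state if the min player moves next and everyone plays optimally.
    if terminal(state):
        return utility(state)
    v = float("inf")
    for action in actions(state):
        v = min(v, max_value(result(state, action)))
    return v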
1987 01:37:10,310 --> 01:37:13,450 Let's look at the order in which I do these calculations 1988 01:37:13,450 --> 01:37:16,810 and figure out if there are any optimizations I might be able to make 1989 01:37:16,810 --> 01:37:18,760 to this calculation process. 1990 01:37:18,760 --> 01:37:21,200 I'm going to have to look at these states one at a time. 1991 01:37:21,200 --> 01:37:23,740 So let's say I start here on the left and say, all right, now 1992 01:37:23,740 --> 01:37:28,300 I'm going to consider, what will the min player, my opponent, try to do here? 1993 01:37:28,300 --> 01:37:31,810 Well, the min player is going to look at all three of their possible actions 1994 01:37:31,810 --> 01:37:34,270 and look at their value, because these are terminal states. 1995 01:37:34,270 --> 01:37:35,430 They're the end of the game. 1996 01:37:35,430 --> 01:37:38,830 And so they'll see, all right, this node is a value of 4, value of 8, 1997 01:37:38,830 --> 01:37:40,410 value of 5. 1998 01:37:40,410 --> 01:37:42,800 And the min player is going to say, well, all right. 1999 01:37:42,800 --> 01:37:45,790 Between these three options, 4, 8, and 5, 2000 01:37:45,790 --> 01:37:48,070 I'll take the smallest one I'll take the 4. 2001 01:37:48,070 --> 01:37:50,610 So this state now has a value of 4. 2002 01:37:50,610 --> 01:37:54,060 Then I as the max player say, all right, if I take this action, 2003 01:37:54,060 --> 01:37:55,170 it will have a value of 4. 2004 01:37:55,170 --> 01:37:57,330 That's the best that I can do, because min player 2005 01:37:57,330 --> 01:37:59,810 is going to try and minimize my score. 2006 01:37:59,810 --> 01:38:01,270 So now, what if I take this option? 2007 01:38:01,270 --> 01:38:02,620 We'll explore this next. 2008 01:38:02,620 --> 01:38:06,250 And now I explore what the min player would do if I choose this action. 2009 01:38:06,250 --> 01:38:09,350 And the min player is going to say, all right, what are the three options? 2010 01:38:09,350 --> 01:38:14,440 The min player has options between 9, 3, and 7, and so 3 2011 01:38:14,440 --> 01:38:16,510 is the smallest among 9, 3, and 7. 2012 01:38:16,510 --> 01:38:19,640 So we'll go ahead and say this state has a value of 3. 2013 01:38:19,640 --> 01:38:20,980 So now I, as the max player-- 2014 01:38:20,980 --> 01:38:23,350 I have now explored two of my three options. 2015 01:38:23,350 --> 01:38:27,390 I know that one of my options will guarantee me a score of 4, at least, 2016 01:38:27,390 --> 01:38:31,100 and one of my options will guarantee me a score of 3. 2017 01:38:31,100 --> 01:38:34,160 And now I consider my third option and say, all right, what happens here? 2018 01:38:34,160 --> 01:38:35,910 Same exact logic-- the min player is going 2019 01:38:35,910 --> 01:38:38,120 to look at these three states, 2, 4, and 6, 2020 01:38:38,120 --> 01:38:42,480 say the minimum possible option is 2, so the min player wants the two. 2021 01:38:42,480 --> 01:38:45,780 Now I, as the max player, have calculated all of the information 2022 01:38:45,780 --> 01:38:49,220 by looking two layers deep, by looking at all of these nodes. 2023 01:38:49,220 --> 01:38:52,770 And I can now say, between the 4, the 3, and the 2, you know what? 2024 01:38:52,770 --> 01:38:55,620 I'd rather take the 4, because if I choose 2025 01:38:55,620 --> 01:38:58,200 this option, if my opponent plays optimally, 2026 01:38:58,200 --> 01:39:01,620 they will try and get me to the 4, but that's the best I can do. 
2027 01:39:01,620 --> 01:39:04,170 I can't guarantee a higher score, because if I 2028 01:39:04,170 --> 01:39:07,740 pick either of these two options, I might get a 3, or I might get a 2. 2029 01:39:07,740 --> 01:39:10,620 And it's true that down here is a 9, and that's 2030 01:39:10,620 --> 01:39:12,390 the highest score of any of the scores. 2031 01:39:12,390 --> 01:39:14,230 So I might be tempted to say, you know what? 2032 01:39:14,230 --> 01:39:17,400 Maybe I should take this option, because I might get the 9. 2033 01:39:17,400 --> 01:39:19,980 But if the min player is playing intelligently, 2034 01:39:19,980 --> 01:39:22,650 if they're making the best moves at each possible option 2035 01:39:22,650 --> 01:39:26,370 they have when they get to make a choice, I'll be left with a 3, 2036 01:39:26,370 --> 01:39:28,470 whereas, playing optimally, I could 2037 01:39:28,470 --> 01:39:31,720 have guaranteed that I would get the 4. 2038 01:39:31,720 --> 01:39:33,600 So that 9 doesn't affect the logic that I would 2039 01:39:33,600 --> 01:39:38,910 use as a Minimax player trying to maximize my score from that node there. 2040 01:39:38,910 --> 01:39:41,310 But it turns out, that took quite a bit of computation 2041 01:39:41,310 --> 01:39:42,390 for me to figure that out. 2042 01:39:42,390 --> 01:39:45,690 I had to reason through all of these nodes in order to draw this conclusion. 2043 01:39:45,690 --> 01:39:48,780 And this is for a pretty simple game, where I have three choices, 2044 01:39:48,780 --> 01:39:52,270 my opponent has three choices, and then the game's over. 2045 01:39:52,270 --> 01:39:55,070 So what I'd like to do is come up with some way to optimize this. 2046 01:39:55,070 --> 01:39:58,990 Maybe I don't need to do all of this calculation to still reach 2047 01:39:58,990 --> 01:40:00,450 the conclusion that, you know what? 2048 01:40:00,450 --> 01:40:01,950 This action to the left-- 2049 01:40:01,950 --> 01:40:03,810 that's the best that I could do. 2050 01:40:03,810 --> 01:40:07,290 Let's go ahead and try again and try and be a little more intelligent 2051 01:40:07,290 --> 01:40:10,150 about how I go about doing this. 2052 01:40:10,150 --> 01:40:12,320 So first, I start the exact same way. 2053 01:40:12,320 --> 01:40:14,160 I don't know what to do initially, so I just 2054 01:40:14,160 --> 01:40:18,930 have to consider one of the options and consider what the min player might do. 2055 01:40:18,930 --> 01:40:21,540 Min has three options, 4, 8, and 5. 2056 01:40:21,540 --> 01:40:25,500 And between those three options, min says, 4 is the best they can do, 2057 01:40:25,500 --> 01:40:28,390 because they want to try to minimize the score. 2058 01:40:28,390 --> 01:40:31,960 Now, I, the max player, will consider my second option, 2059 01:40:31,960 --> 01:40:36,730 making this move here and considering what my opponent would do in response. 2060 01:40:36,730 --> 01:40:38,470 What will the min player do? 2061 01:40:38,470 --> 01:40:41,560 Well, the min player is going to, from that state, look at their options. 2062 01:40:41,560 --> 01:40:42,710 And I would say, all right. 2063 01:40:42,710 --> 01:40:45,980 9 is an option, 3 is an option.
2064 01:40:45,980 --> 01:40:48,220 And if I am doing the math from this initial state, 2065 01:40:48,220 --> 01:40:51,400 doing all this calculation, when I see a 3, 2066 01:40:51,400 --> 01:40:54,250 that should immediately be a red flag for me, 2067 01:40:54,250 --> 01:40:56,890 because when I see a 3 down here at this state, 2068 01:40:56,890 --> 01:41:01,870 I know that the value of this state is going to be at most 3. 2069 01:41:01,870 --> 01:41:04,630 It's going to be 3 or something less than 3, 2070 01:41:04,630 --> 01:41:08,050 even though I haven't yet looked at this last action or even further actions 2071 01:41:08,050 --> 01:41:10,870 if there were more actions that could be taken here. 2072 01:41:10,870 --> 01:41:11,900 How do I know that? 2073 01:41:11,900 --> 01:41:16,000 Well, I know that the min player is going to try to minimize my score. 2074 01:41:16,000 --> 01:41:20,350 And if they see a 3, the only way this could be something other than a 3 2075 01:41:20,350 --> 01:41:24,520 is if this remaining thing that I haven't yet looked at is less than 3, 2076 01:41:24,520 --> 01:41:28,810 which means there is no way for this value to be anything more than 3, 2077 01:41:28,810 --> 01:41:31,390 because the min player can already guarantee a 3, 2078 01:41:31,390 --> 01:41:34,930 and they are trying to minimize my score. 2079 01:41:34,930 --> 01:41:36,380 So what does that tell me? 2080 01:41:36,380 --> 01:41:38,740 Well, it tells me that if I choose this action, 2081 01:41:38,740 --> 01:41:43,240 my score is going to be 3, or maybe even less than 3, if I'm unlucky. 2082 01:41:43,240 --> 01:41:47,500 But I already know that this action will guarantee me a 4. 2083 01:41:47,500 --> 01:41:51,230 And so given that I know that this action guarantees me a score of 4, 2084 01:41:51,230 --> 01:41:54,130 and this action means I can't do better than 3, 2085 01:41:54,130 --> 01:41:56,290 if I'm trying to maximize my options, there 2086 01:41:56,290 --> 01:41:59,290 is no need for me to consider this triangle here. 2087 01:41:59,290 --> 01:42:01,990 There is no value, no number that could go here, 2088 01:42:01,990 --> 01:42:04,760 that would change my mind between these two options. 2089 01:42:04,760 --> 01:42:08,020 I'm always going to opt for this path that gets me a 4, 2090 01:42:08,020 --> 01:42:11,260 as opposed to this path, where the best I can do is a 3, 2091 01:42:11,260 --> 01:42:13,600 if my opponent plays optimally. 2092 01:42:13,600 --> 01:42:16,850 And this is going to be true for all of the future states that I look at, too. 2093 01:42:16,850 --> 01:42:19,470 So if I look over here, at what the min player might do over here, 2094 01:42:19,470 --> 01:42:24,510 if I see that this state is a 2, I know that this state is at most a 2, 2095 01:42:24,510 --> 01:42:28,500 because the only way this value could be something other than 2 2096 01:42:28,500 --> 01:42:31,820 is if one of these remaining states is less than a 2, 2097 01:42:31,820 --> 01:42:34,600 and so the min player would opt for that instead.
2098 01:42:34,600 --> 01:42:37,470 So even without looking at these remaining states, 2099 01:42:37,470 --> 01:42:42,600 I, as the maximizing player, can know that choosing this path to the left 2100 01:42:42,600 --> 01:42:47,280 is going to be better than choosing either of those two paths to the right, 2101 01:42:47,280 --> 01:42:51,810 because this one can't be better than 3, this one can't be better than 2, 2102 01:42:51,810 --> 01:42:56,550 and so 4 in this case is the best that I can do. 2103 01:42:56,550 --> 01:42:59,580 And I can say now that this state has a value of 4. 2104 01:42:59,580 --> 01:43:01,680 So in order to do this type of calculation, 2105 01:43:01,680 --> 01:43:04,980 I was doing a little bit more bookkeeping, keeping track of things, 2106 01:43:04,980 --> 01:43:08,480 keeping track all the time of, what is the best that I can do, 2107 01:43:08,480 --> 01:43:11,400 what is the worst that I can do, and for each of these states, saying, 2108 01:43:11,400 --> 01:43:15,300 all right, well, if I already know that I can get a 4, 2109 01:43:15,300 --> 01:43:18,060 then if the best I can do at this state is a 3, 2110 01:43:18,060 --> 01:43:19,770 no reason for me to consider it. 2111 01:43:19,770 --> 01:43:25,020 I can effectively prune this leaf and anything below it from the tree. 2112 01:43:25,020 --> 01:43:28,440 And it's for that reason this approach, this optimization to Minimax, 2113 01:43:28,440 --> 01:43:30,510 is called alpha-beta pruning. 2114 01:43:30,510 --> 01:43:32,400 Alpha and beta stand for these two values 2115 01:43:32,400 --> 01:43:34,950 that you'll have to keep track of, the best you can do so far 2116 01:43:34,950 --> 01:43:36,600 and the worst you can do so far. 2117 01:43:36,600 --> 01:43:41,170 And pruning is the idea of, if I have a big, long, deep search tree, 2118 01:43:41,170 --> 01:43:43,650 I might be able to search it more efficiently if I don't 2119 01:43:43,650 --> 01:43:47,070 need to search through everything, if I can remove some of the nodes 2120 01:43:47,070 --> 01:43:52,170 to try and optimize the way that I look through this entire search space. 2121 01:43:52,170 --> 01:43:55,500 So alpha-beta pruning can definitely save us a lot of time 2122 01:43:55,500 --> 01:43:59,430 as we go about the search process by making our searches more efficient. 2123 01:43:59,430 --> 01:44:03,750 But even then, it's still not great as games get more complex. 2124 01:44:03,750 --> 01:44:06,990 Tic-tac-toe, fortunately, is a relatively simple game, 2125 01:44:06,990 --> 01:44:09,820 and we might reasonably ask a question like, 2126 01:44:09,820 --> 01:44:13,640 how many total possible tic-tac-toe games are there? 2127 01:44:13,640 --> 01:44:14,600 You can think about it. 2128 01:44:14,600 --> 01:44:17,640 You can try and estimate, how many moves are there at any given point? 2129 01:44:17,640 --> 01:44:19,500 How many moves long can the game last? 2130 01:44:19,500 --> 01:44:26,250 It turns out there are about 255,000 possible tic-tac-toe games that 2131 01:44:26,250 --> 01:44:27,780 can be played. 2132 01:44:27,780 --> 01:44:30,240 But compare that to a more complex game, something 2133 01:44:30,240 --> 01:44:32,060 like a game of chess, for example-- 2134 01:44:32,060 --> 01:44:35,950 far more pieces, far more moves, games that last much longer. 2135 01:44:35,950 --> 01:44:38,940 How many total possible chess games could there be? 
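Before getting to those numbers, here is one way the bookkeeping just described, the alpha-beta idea, could be folded into the earlier max_value and min_value sketch. This is an illustrative sketch rather than the lecture's own code: alpha tracks the best value the max player is already assured of along the current path, beta tracks the lowest value the min player is already assured of, and the game helpers terminal, utility, actions, and result are again assumed.

    import math

    def max_value(state, alpha=-math.inf, beta=math.inf):
        if terminal(state):
            return utility(state)
        v = -math.inf
        for action in actions(state):
            v = max(v, min_value(result(state, action), alpha, beta))
            if v >= beta:
                # The min player above is already guaranteed something no
                # bigger than beta, so they would never let play reach a
                # state worth v: prune the remaining actions.
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha=-math.inf, beta=math.inf):
        if terminal(state):
            return utility(state)
        v = math.inf
        for action in actions(state):
            v = min(v, max_value(result(state, action), alpha, beta))
            if v <= alpha:
                # The max player above can already guarantee at least alpha
                # elsewhere, so nothing in this branch can change their mind:
                # prune the remaining actions.
                return v
            beta = min(beta, v)
        return v

On the small example above, once the left branch has established a guaranteed 4 (alpha = 4), the middle branch is cut off as soon as the 3 appears, so the 7 is never examined, and the right branch is cut off as soon as the 2 appears, so its remaining values are never examined, which is exactly the shortcut just described.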
2136 01:44:38,940 --> 01:44:41,100 It turns out that after just four moves each, 2137 01:44:41,100 --> 01:44:44,100 four moves by the white player, four moves by the black player, 2138 01:44:44,100 --> 01:44:47,280 there are 288 billion possible chess 2139 01:44:47,280 --> 01:44:51,300 games that can result from that situation, after just four moves each. 2140 01:44:51,300 --> 01:44:52,330 And going even further, 2141 01:44:52,330 --> 01:44:55,380 if you look at entire chess games and how many possible chess games there 2142 01:44:55,380 --> 01:44:58,560 could be, there are more than 10 2143 01:44:58,560 --> 01:45:02,820 to the 29,000 possible chess games, far more chess games 2144 01:45:02,820 --> 01:45:04,410 than could ever be considered. 2145 01:45:04,410 --> 01:45:08,010 And this is a pretty big problem for the Minimax algorithm, because the Minimax 2146 01:45:08,010 --> 01:45:12,150 algorithm starts with an initial state, considers all the possible actions 2147 01:45:12,150 --> 01:45:15,570 and all the possible actions after that, all the way 2148 01:45:15,570 --> 01:45:18,510 until we get to the end of the game. 2149 01:45:18,510 --> 01:45:20,640 And that's going to be a problem if the computer is 2150 01:45:20,640 --> 01:45:23,100 going to need to look through this many states, which 2151 01:45:23,100 --> 01:45:28,680 is far more than any computer could ever do in any reasonable amount of time. 2152 01:45:28,680 --> 01:45:30,900 So what do we do in order to solve this problem? 2153 01:45:30,900 --> 01:45:32,980 Instead of looking through all these states, which 2154 01:45:32,980 --> 01:45:36,420 is totally intractable for a computer, we need some better approach. 2155 01:45:36,420 --> 01:45:39,600 And it turns out that better approach generally takes the form of something 2156 01:45:39,600 --> 01:45:42,000 called depth-limited Minimax. 2157 01:45:42,000 --> 01:45:44,740 Whereas normally Minimax is depth-unlimited-- 2158 01:45:44,740 --> 01:45:47,220 we just keep going, layer after layer, move after move, 2159 01:45:47,220 --> 01:45:48,930 until we get to the end of the game-- 2160 01:45:48,930 --> 01:45:51,920 depth-limited Minimax is instead going to say, you know what? 2161 01:45:51,920 --> 01:45:53,760 After a certain number of moves-- maybe I'll 2162 01:45:53,760 --> 01:45:57,390 look 10 moves ahead, maybe I'll look 12 moves ahead, but after that point, 2163 01:45:57,390 --> 01:46:00,350 I'm going to stop and not consider additional moves that 2164 01:46:00,350 --> 01:46:02,190 might come after that, just because it would 2165 01:46:02,190 --> 01:46:07,970 be computationally intractable to consider all of those possible options. 2166 01:46:07,970 --> 01:46:10,730 But what do we do after we get 10 or 12 moves deep, 2167 01:46:10,730 --> 01:46:13,840 and we arrive at a situation where the game's not over? 2168 01:46:13,840 --> 01:46:18,110 Minimax still needs a way to assign a score to that game board or game state 2169 01:46:18,110 --> 01:46:20,390 to figure out what its current value is, which 2170 01:46:20,390 --> 01:46:22,760 is easy to do if the game is over, but not so 2171 01:46:22,760 --> 01:46:25,400 easy to do if the game is not yet over.
2172 01:46:25,400 --> 01:46:27,950 So in order to do that, we need to add one additional feature 2173 01:46:27,950 --> 01:46:31,400 to depth-limited Minimax called an evaluation function, 2174 01:46:31,400 --> 01:46:33,170 which is just some function that is going 2175 01:46:33,170 --> 01:46:38,000 to estimate the expected utility of a game from a given state. 2176 01:46:38,000 --> 01:46:41,000 So in a game like chess, if you imagine that a game value of 1 2177 01:46:41,000 --> 01:46:45,950 means white wins, negative 1 means black wins, 0 means it's a draw, 2178 01:46:45,950 --> 01:46:51,260 then you might imagine that a score of 0.8 means white is very likely to win, 2179 01:46:51,260 --> 01:46:53,000 though certainly not guaranteed. 2180 01:46:53,000 --> 01:46:56,450 And you would have an evaluation function that estimates 2181 01:46:56,450 --> 01:46:59,450 how good the game state happens to be. 2182 01:46:59,450 --> 01:47:02,720 And how good that evaluation function is, 2183 01:47:02,720 --> 01:47:06,050 that is ultimately what's going to constrain how good the AI is. 2184 01:47:06,050 --> 01:47:09,080 The better the AI is at estimating how good 2185 01:47:09,080 --> 01:47:12,440 or how bad any particular game state is, the better the AI 2186 01:47:12,440 --> 01:47:14,660 is going to be able to play that game. 2187 01:47:14,660 --> 01:47:16,910 If the evaluation function is worse, and not as good 2188 01:47:16,910 --> 01:47:19,670 at estimating what the expected utility is, 2189 01:47:19,670 --> 01:47:21,680 then it's going to be a whole lot harder. 2190 01:47:21,680 --> 01:47:25,130 And you can imagine trying to come up with these evaluation functions. 2191 01:47:25,130 --> 01:47:27,890 In chess, for example, you might write an evaluation function 2192 01:47:27,890 --> 01:47:30,230 based on how many pieces you have, as compared 2193 01:47:30,230 --> 01:47:32,540 to how many pieces your opponent has, because each one 2194 01:47:32,540 --> 01:47:35,240 has a value in your evaluation function. 2195 01:47:35,240 --> 01:47:36,950 It probably needs to be a little bit more 2196 01:47:36,950 --> 01:47:40,400 complicated than that to consider other possible situations that 2197 01:47:40,400 --> 01:47:42,090 might arise as well. 2198 01:47:42,090 --> 01:47:44,240 And there are many other variants on Minimax 2199 01:47:44,240 --> 01:47:47,570 that add additional features in order to help it perform better 2200 01:47:47,570 --> 01:47:50,330 under these larger and more computationally intractable 2201 01:47:50,330 --> 01:47:54,620 situations, where we couldn't possibly explore all of the possible moves, 2202 01:47:54,620 --> 01:47:57,230 so we need to figure out how to use evaluation 2203 01:47:57,230 --> 01:48:01,370 functions and other techniques to be able to play these games, ultimately, 2204 01:48:01,370 --> 01:48:02,360 better. 2205 01:48:02,360 --> 01:48:05,480 But this, then, was a look at this kind of adversarial search, these search 2206 01:48:05,480 --> 01:48:08,870 problems where we have situations where I am trying 2207 01:48:08,870 --> 01:48:11,360 to play against some sort of opponent. 2208 01:48:11,360 --> 01:48:13,880 And these search problems show up all over the place 2209 01:48:13,880 --> 01:48:15,560 throughout artificial intelligence. 2210 01:48:15,560 --> 01:48:18,710 We've been talking a lot today about more classical search problems, 2211 01:48:18,710 --> 01:48:22,080 like trying to find directions from one location to another.
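Putting those last two ideas together, a depth limit plus an evaluation function, a sketch of depth-limited Minimax might look something like the following. Here depth counts how many more plies we are willing to search, and evaluate is an assumed, game-specific function of the kind just described: for chess, perhaps a weighted count of each side's remaining pieces scaled into the range from -1 to 1.

    import math

    def max_value(state, depth):
        if terminal(state):
            return utility(state)
        if depth == 0:
            # Out of lookahead: estimate how promising this position is
            # rather than searching to the end of the game.
            return evaluate(state)
        v = -math.inf
        for action in actions(state):
            v = max(v, min_value(result(state, action), depth - 1))
        return v

    def min_value(state, depth):
        if terminal(state):
            return utility(state)
        if depth == 0:
            # Same cutoff for the min player.
            return evaluate(state)
        v = math.inf
        for action in actions(state):
            v = min(v, max_value(result(state, action), depth - 1))
        return v

Everything game-specific about "how good is this position?" then lives inside evaluate, which is why the quality of that estimate is what ultimately bounds how well the AI can play.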
2212 01:48:22,080 --> 01:48:25,590 But anytime an AI is faced with trying to make a decision like, 2213 01:48:25,590 --> 01:48:28,460 what do I do now in order to do something that is rational, 2214 01:48:28,460 --> 01:48:31,220 or do something that is intelligent, or trying to play a game, 2215 01:48:31,220 --> 01:48:33,830 like figuring out what move to make, these sorts of algorithms 2216 01:48:33,830 --> 01:48:35,700 can really come in handy. 2217 01:48:35,700 --> 01:48:38,420 It turns out that for tic-tac-toe, the solution is pretty simple, 2218 01:48:38,420 --> 01:48:39,600 because it's a small game. 2219 01:48:39,600 --> 01:48:42,470 XKCD has famously put together a webcomic 2220 01:48:42,470 --> 01:48:45,620 that will tell you exactly the optimal move 2221 01:48:45,620 --> 01:48:48,290 to make, no matter what your opponent happens to do. 2222 01:48:48,290 --> 01:48:50,510 This type of thing is not quite as possible 2223 01:48:50,510 --> 01:48:52,790 for a much larger game like checkers or chess, 2224 01:48:52,790 --> 01:48:55,400 for example, where it's totally computationally 2225 01:48:55,400 --> 01:48:57,920 intractable for most computers to be able to explore 2226 01:48:57,920 --> 01:48:59,330 all the possible states. 2227 01:48:59,330 --> 01:49:03,650 So we really need our AIs to be far more intelligent about how 2228 01:49:03,650 --> 01:49:05,850 they go about trying to deal with these problems 2229 01:49:05,850 --> 01:49:08,180 and how they go about taking this environment 2230 01:49:08,180 --> 01:49:10,010 that they find themselves in and ultimately 2231 01:49:10,010 --> 01:49:12,710 searching for one of these solutions. 2232 01:49:12,710 --> 01:49:15,710 So this, then, was a look at search in artificial intelligence. 2233 01:49:15,710 --> 01:49:17,630 Next time we'll take a look at knowledge, 2234 01:49:17,630 --> 01:49:21,230 thinking about how it is that our AIs are able to know information, reason 2235 01:49:21,230 --> 01:49:25,190 about that information, and draw conclusions, all in our look at AI 2236 01:49:25,190 --> 01:49:26,750 and the principles behind it. 2237 01:49:26,750 --> 01:49:28,500 We'll see you next time.