WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:17.000 [MUSIC] 00:00:17.870 --> 00:00:18.710 BRIAN YU: All right. 00:00:18.710 --> 00:00:22.280 Welcome, everyone, to an Introduction to Artificial Intelligence with Python. 00:00:22.280 --> 00:00:23.480 My name is Brian Yu. 00:00:23.480 --> 00:00:26.540 And in this class, we'll explore some of the ideas, and techniques, 00:00:26.540 --> 00:00:30.450 and algorithms that are at the foundation of artificial intelligence. 00:00:30.450 --> 00:00:34.160 Now, artificial intelligence covers a wide variety of types of techniques. 00:00:34.160 --> 00:00:36.110 Anytime you see a computer do something that 00:00:36.110 --> 00:00:39.290 appears to be intelligent or rational in some way, 00:00:39.290 --> 00:00:41.510 like recognizing someone's face in a photo, 00:00:41.510 --> 00:00:44.030 or being able to play a game better than people can, 00:00:44.030 --> 00:00:47.120 or being able to understand human language when we talk to our phones 00:00:47.120 --> 00:00:50.370 and they understand what we mean and are able to respond back to us, 00:00:50.370 --> 00:00:54.060 these are all examples of AI, or artificial intelligence. 00:00:54.060 --> 00:00:58.580 And in this class we'll explore some of the ideas that make that AI possible. 00:00:58.580 --> 00:01:00.790 So we'll begin our conversations with search. 00:01:00.790 --> 00:01:02.540 The problem of, we have an AI and we would 00:01:02.540 --> 00:01:06.150 like the AI to be able to search for solutions to some kind of problem, 00:01:06.150 --> 00:01:07.700 no matter what that problem might be. 00:01:07.700 --> 00:01:11.300 Whether it's trying to get driving directions from point A to point B, 00:01:11.300 --> 00:01:13.100 or trying to figure out how to play a game, 00:01:13.100 --> 00:01:16.580 given a tic-tac-toe game, for example, figuring out what move 00:01:16.580 --> 00:01:17.960 it ought to make. 
00:01:17.960 --> 00:01:20.330 After that, we'll take a look at knowledge. 00:01:20.330 --> 00:01:23.480 Ideally, we want our AI to be able to know information, 00:01:23.480 --> 00:01:25.400 to be able to represent that information, 00:01:25.400 --> 00:01:28.580 and more importantly, to be able to draw inferences from that information. 00:01:28.580 --> 00:01:32.760 To be able to use the information it knows and draw additional conclusions. 00:01:32.760 --> 00:01:37.130 So we'll talk about how AI can be programmed in order to do just that. 00:01:37.130 --> 00:01:39.350 Then we'll explore the topic of uncertainty. 00:01:39.350 --> 00:01:43.100 Talking about ideas of, what happens if a computer isn't sure about a fact 00:01:43.100 --> 00:01:45.950 but maybe is only sure with a certain probability? 00:01:45.950 --> 00:01:48.500 So we'll talk about some of the ideas behind probability 00:01:48.500 --> 00:01:51.530 and how computers can begin to deal with uncertain events 00:01:51.530 --> 00:01:55.980 in order to be a little bit more intelligent in that sense, as well. 00:01:55.980 --> 00:01:58.550 After that, we'll turn our attention to optimization. 00:01:58.550 --> 00:02:01.970 Problems of when the computer is trying to optimize for some sort of goal, 00:02:01.970 --> 00:02:03.770 especially in a situation where there might 00:02:03.770 --> 00:02:06.380 be multiple ways that a computer might solve a problem, 00:02:06.380 --> 00:02:09.770 but we're looking for a better way or, potentially, the best way 00:02:09.770 --> 00:02:11.550 if that's at all possible. 00:02:11.550 --> 00:02:14.510 Then we'll take a look at machine learning, or learning more generally. 
00:02:14.510 --> 00:02:16.700 In looking at how when we have access to data 00:02:16.700 --> 00:02:20.300 our computers can be programmed to be quite intelligent by learning from data 00:02:20.300 --> 00:02:23.810 and learning from experience, being able to perform a task better and better 00:02:23.810 --> 00:02:25.770 based on greater access to data. 00:02:25.770 --> 00:02:28.520 So your email, for example, where your email inbox somehow 00:02:28.520 --> 00:02:32.180 knows which of your emails are good emails and which of your emails are spam. 00:02:32.180 --> 00:02:34.430 These are all examples of computers being 00:02:34.430 --> 00:02:38.420 able to learn from past experiences and past data. 00:02:38.420 --> 00:02:41.690 We'll take a look, too, at how computers are able to draw inspiration 00:02:41.690 --> 00:02:44.930 from human intelligence, looking at the structure of the human brain 00:02:44.930 --> 00:02:48.780 and how neural networks can be a computer analog to that sort of idea. 00:02:48.780 --> 00:02:51.830 And how, by taking advantage of a certain type of structure of a computer 00:02:51.830 --> 00:02:53.930 program, we can write neural networks that 00:02:53.930 --> 00:02:57.060 are able to perform tasks very, very effectively. 00:02:57.060 --> 00:02:59.420 And then finally, we'll turn our attention to language. 00:02:59.420 --> 00:03:02.820 Not programming languages, but human languages that we speak every day. 00:03:02.820 --> 00:03:04.700 And taking a look at the challenges that come 00:03:04.700 --> 00:03:07.880 about as a computer tries to understand natural language 00:03:07.880 --> 00:03:09.980 and how it is some of the natural language 00:03:09.980 --> 00:03:12.860 processing that occurs in modern artificial intelligence 00:03:12.860 --> 00:03:14.990 can actually work. 00:03:14.990 --> 00:03:17.630 But today we'll begin our conversation with search. 
00:03:17.630 --> 00:03:19.550 This problem of trying to figure out what 00:03:19.550 --> 00:03:22.790 to do when we have some sort of situation that the computer is in, 00:03:22.790 --> 00:03:25.880 some sort of environment that an agent is in, so to speak. 00:03:25.880 --> 00:03:29.720 And we would like for that agent to be able to somehow look for a solution 00:03:29.720 --> 00:03:31.200 to that problem. 00:03:31.200 --> 00:03:34.250 Now, these problems can come in any number of different types of formats. 00:03:34.250 --> 00:03:37.090 One example, for instance, might be something like this classic 15 00:03:37.090 --> 00:03:38.930 puzzle with the sliding tiles that you might 00:03:38.930 --> 00:03:41.930 have seen, where you're trying to slide the tiles in order to make sure 00:03:41.930 --> 00:03:43.730 that all the numbers line up in order. 00:03:43.730 --> 00:03:46.700 This is an example of what you might call a search problem. 00:03:46.700 --> 00:03:50.360 The 15 puzzle begins in an initially mixed up state 00:03:50.360 --> 00:03:53.480 and we need some way of finding moves to make in order to return 00:03:53.480 --> 00:03:55.520 the puzzle to its solved state. 00:03:55.520 --> 00:03:58.190 But there are similar problems that you can frame in other ways. 00:03:58.190 --> 00:04:00.390 Trying to find your way through a maze, for example, 00:04:00.390 --> 00:04:02.150 is another example of a search problem. 00:04:02.150 --> 00:04:05.720 You begin in one place, you have some goal of where you're trying to get to, 00:04:05.720 --> 00:04:08.930 and you need to figure out the correct sequence of actions that will take you 00:04:08.930 --> 00:04:11.460 from that initial state to the goal. 
00:04:11.460 --> 00:04:13.460 And while this is a little bit abstract, anytime 00:04:13.460 --> 00:04:15.440 we talk about maze solving in this class, 00:04:15.440 --> 00:04:17.990 you can translate it to something a little more real world, 00:04:17.990 --> 00:04:19.820 something like driving directions. 00:04:19.820 --> 00:04:22.520 If you ever wonder how Google Maps is able to figure out what 00:04:22.520 --> 00:04:25.550 is the best way for you to get from point A to point B 00:04:25.550 --> 00:04:28.860 and what turns to make, at what time, depending on traffic, for example. 00:04:28.860 --> 00:04:31.250 It's often some sort of search algorithm. 00:04:31.250 --> 00:04:34.370 You have an AI that is trying to get from an initial position 00:04:34.370 --> 00:04:38.060 to some sort of goal by taking some sequence of actions. 00:04:38.060 --> 00:04:40.640 So we'll start our conversations today by thinking 00:04:40.640 --> 00:04:42.590 about these types of search problems and what 00:04:42.590 --> 00:04:46.250 goes in to solving a search problem like this in order for an AI 00:04:46.250 --> 00:04:48.290 to be able to find a good solution. 00:04:48.290 --> 00:04:50.210 In order to do so, though, we're going to need 00:04:50.210 --> 00:04:52.580 to introduce a little bit of terminology, some of which 00:04:52.580 --> 00:04:53.690 I've already used. 00:04:53.690 --> 00:04:56.730 But the first term we'll need to think about is an agent. 00:04:56.730 --> 00:04:59.950 An agent is just some entity that perceives its environment, 00:04:59.950 --> 00:05:02.120 it somehow is able to perceive the things around it, 00:05:02.120 --> 00:05:04.530 and act on that environment in some way. 00:05:04.530 --> 00:05:06.240 So in the case of the driving directions, 00:05:06.240 --> 00:05:08.960 your agent might be some representation of a car that 00:05:08.960 --> 00:05:11.180 is trying to figure out what actions to take in order 00:05:11.180 --> 00:05:12.740 to arrive at a destination. 
00:05:12.740 --> 00:05:15.440 In the case of the 15 puzzle with the sliding tiles, 00:05:15.440 --> 00:05:19.250 the agent might be the AI or the person that is trying to solve that puzzle, 00:05:19.250 --> 00:05:24.090 trying to figure out what tiles to move in order to get to that solution. 00:05:24.090 --> 00:05:26.680 Next, we introduce the idea of a state. 00:05:26.680 --> 00:05:31.140 A state is just some configuration of the agent in its environment. 00:05:31.140 --> 00:05:34.880 So in the 15 puzzle, for example, any state might be any one of these three 00:05:34.880 --> 00:05:35.570 for example. 00:05:35.570 --> 00:05:38.570 A state is just some configuration of the tiles. 00:05:38.570 --> 00:05:40.490 Each of these states is different and is going 00:05:40.490 --> 00:05:42.830 to require a slightly different solution. 00:05:42.830 --> 00:05:46.400 A different sequence of actions will be needed in each one of these in order 00:05:46.400 --> 00:05:49.730 to get from this initial state to the goal, which 00:05:49.730 --> 00:05:51.500 is where we're trying to get. 00:05:51.500 --> 00:05:52.460 The initial state then. 00:05:52.460 --> 00:05:53.210 What is that? 00:05:53.210 --> 00:05:56.090 The initial state is just the state where the agent begins. 00:05:56.090 --> 00:05:58.880 It is one such state where we're going to start from 00:05:58.880 --> 00:06:02.010 and this is going to be the starting point for our search algorithm, 00:06:02.010 --> 00:06:02.690 so to speak. 00:06:02.690 --> 00:06:04.710 We're going to begin with this initial state 00:06:04.710 --> 00:06:08.180 and then start to reason about it, to think about what actions might we apply 00:06:08.180 --> 00:06:11.870 to that initial state in order to figure out how to get from the beginning 00:06:11.870 --> 00:06:16.850 to the end, from the initial position to whatever our goal happens to be. 00:06:16.850 --> 00:06:19.670 And how do we make our way from that initial position to the goal? 
00:06:19.670 --> 00:06:22.100 Well ultimately, it's via taking actions. 00:06:22.100 --> 00:06:25.570 Actions are just choices that we can make in any given state. 00:06:25.570 --> 00:06:29.180 And in AI, we're always going to try to formalize these ideas a little bit 00:06:29.180 --> 00:06:31.850 more precisely such that we could program them a little bit more 00:06:31.850 --> 00:06:33.440 mathematically, so to speak. 00:06:33.440 --> 00:06:37.580 So this will be a recurring theme and we can more precisely define actions 00:06:37.580 --> 00:06:38.840 as a function. 00:06:38.840 --> 00:06:41.570 We're going to effectively define a function called actions 00:06:41.570 --> 00:06:46.490 that takes an input S, where S is going to be some state that exists inside 00:06:46.490 --> 00:06:50.720 of our environment, and actions of S is going to take the state as input 00:06:50.720 --> 00:06:54.140 and return as output the set of all actions that 00:06:54.140 --> 00:06:56.540 can be executed in that state. 00:06:56.540 --> 00:07:00.170 And so it's possible that some actions are only valid in certain states 00:07:00.170 --> 00:07:01.640 and not in other states. 00:07:01.640 --> 00:07:04.340 And we'll see examples of that soon, too. 00:07:04.340 --> 00:07:06.530 So in the case of the 15 puzzle, for example, 00:07:06.530 --> 00:07:09.010 there are generally going to be four possible actions 00:07:09.010 --> 00:07:10.760 that we can do most of the time. 00:07:10.760 --> 00:07:13.250 We can slide a tile to the right, slide a tile to the left, 00:07:13.250 --> 00:07:16.280 slide a tile up, or slide a tile down, for example. 00:07:16.280 --> 00:07:19.950 And those are going to be the actions that are available to us. 00:07:19.950 --> 00:07:23.000 So somehow our AI, our program, needs some encoding 00:07:23.000 --> 00:07:26.150 of the state, which is often going to be in some numerical format, 00:07:26.150 --> 00:07:28.050 and some encoding of these actions. 
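[Editor's note: the actions function described above can be sketched in Python. This is a minimal illustration, assuming a sliding-tile state is stored as a tuple of row-tuples with 0 marking the blank square; the function and action names here are made up for the example, not taken from the lecture.]

```python
def find_blank(state):
    """Return the (row, col) position of the blank tile, marked 0."""
    for r, row in enumerate(state):
        for c, value in enumerate(row):
            if value == 0:
                return r, c

def actions(state):
    """Return the set of all actions executable in `state`.

    Each action names the direction a neighboring tile slides
    into the blank square; near an edge, fewer actions are valid.
    """
    r, c = find_blank(state)
    size = len(state)
    moves = set()
    if c < size - 1:
        moves.add("left")   # tile right of the blank slides left
    if c > 0:
        moves.add("right")  # tile left of the blank slides right
    if r < size - 1:
        moves.add("up")     # tile below the blank slides up
    if r > 0:
        moves.add("down")   # tile above the blank slides down
    return moves
```

With the blank in a corner only two actions are valid, which is exactly the point made above: the available actions depend on the state.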
00:07:28.050 --> 00:07:31.100 But it also needs some encoding of the relationship between these things, 00:07:31.100 --> 00:07:34.610 how do the states and actions relate to one another? 00:07:34.610 --> 00:07:36.950 And in order to do that, we'll introduce to our AI 00:07:36.950 --> 00:07:40.430 a transition model, which will be a description of what state 00:07:40.430 --> 00:07:45.120 we get after we perform some available action in some other state. 00:07:45.120 --> 00:07:47.540 And again, we can be a little bit more precise about this, 00:07:47.540 --> 00:07:50.820 define this transition model a little bit more formally, again, 00:07:50.820 --> 00:07:51.770 as a function. 00:07:51.770 --> 00:07:54.010 The function is going to be a function called result, 00:07:54.010 --> 00:07:56.210 that this time takes two inputs. 00:07:56.210 --> 00:07:59.070 Input number one is S, some state. 00:07:59.070 --> 00:08:02.270 And input number two is A, some action. 00:08:02.270 --> 00:08:04.450 And the output of this function result is 00:08:04.450 --> 00:08:09.770 going to give us the state that we get after we perform action A in state 00:08:09.770 --> 00:08:13.790 S. So let's take a look at an example to see more precisely what this actually 00:08:13.790 --> 00:08:14.580 means. 00:08:14.580 --> 00:08:18.000 Here's an example of a state of the 15 puzzle, for example. 00:08:18.000 --> 00:08:21.600 And here's an example of an action, sliding a tile to the right. 00:08:21.600 --> 00:08:24.840 What happens if we pass these as inputs to the result function? 00:08:24.840 --> 00:08:29.270 Again, the result function takes this board, this state, as its first input. 00:08:29.270 --> 00:08:31.730 And it takes an action as a second input. 00:08:31.730 --> 00:08:34.100 And of course, here, I'm describing things visually so 00:08:34.100 --> 00:08:37.010 that you can see visually what the state is and what the action is. 
00:08:37.010 --> 00:08:39.410 In a computer, you might represent one of these actions 00:08:39.410 --> 00:08:41.570 as just some number that represents the action. 00:08:41.570 --> 00:08:43.370 Or if you're familiar with enums that allow 00:08:43.370 --> 00:08:46.430 you to enumerate multiple possibilities, it might be something like that. 00:08:46.430 --> 00:08:50.530 And the state might just be represented as an array, or two dimensional array, 00:08:50.530 --> 00:08:52.400 of all of these numbers that exist. 00:08:52.400 --> 00:08:55.540 But here we're going to show it visually just so you can see it. 00:08:55.540 --> 00:08:59.450 When we take this state and this action, pass it into the result function, 00:08:59.450 --> 00:09:01.310 the output is a new state. 00:09:01.310 --> 00:09:04.910 The state we get after we take a tile and slide it to the right, and this 00:09:04.910 --> 00:09:06.500 is the state we get as a result. 00:09:06.500 --> 00:09:09.710 If we had a different action and a different state, for example, 00:09:09.710 --> 00:09:11.450 and passed that into the result function, 00:09:11.450 --> 00:09:13.470 we'd get a different answer altogether. 00:09:13.470 --> 00:09:15.710 So the result function needs to take care 00:09:15.710 --> 00:09:20.090 of figuring out how to take a state and take an action and get what results. 00:09:20.090 --> 00:09:22.760 And this is going to be our transition model that 00:09:22.760 --> 00:09:27.450 describes how it is that states and actions are related to each other. 
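[Editor's note: putting the two representations just mentioned together, a transition model for a sliding-tile puzzle might look like the sketch below. It assumes the same illustrative encoding as before, a square board stored as a tuple of row-tuples with 0 as the blank, and string names for the four actions; none of these choices come from the lecture itself.]

```python
def result(state, action):
    """Transition model: the state reached by performing `action`
    in `state`. An action names the direction a neighboring tile
    slides into the blank square (0)."""
    # locate the blank square
    r, c = next((r, c) for r, row in enumerate(state)
                for c, v in enumerate(row) if v == 0)
    # offset of the tile that slides into the blank
    dr, dc = {"left": (0, 1), "right": (0, -1),
              "up": (1, 0), "down": (-1, 0)}[action]
    tr, tc = r + dr, c + dc
    if not (0 <= tr < len(state) and 0 <= tc < len(state)):
        raise ValueError("action not valid in this state")
    # swap the blank with that tile, returning a new immutable state
    board = [list(row) for row in state]
    board[r][c], board[tr][tc] = board[tr][tc], board[r][c]
    return tuple(tuple(row) for row in board)
```

Passing a different state or a different action returns a different board, just as described: the function encodes how states and actions relate.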
00:09:27.450 --> 00:09:30.290 If we take this transition model and think about it more generally 00:09:30.290 --> 00:09:34.850 and across the entire problem, we can form what we might call a state space, 00:09:34.850 --> 00:09:38.120 the set of all of the states we can get from the initial state 00:09:38.120 --> 00:09:41.960 via any sequence of actions, by taking zero or one or two or more 00:09:41.960 --> 00:09:45.050 actions in addition to that, so we could draw a diagram 00:09:45.050 --> 00:09:46.400 that looks something like this. 00:09:46.400 --> 00:09:49.640 Where every state is represented here by a game board. 00:09:49.640 --> 00:09:52.850 And there are arrows that connect every state to every other state we 00:09:52.850 --> 00:09:54.740 can get to from that state. 00:09:54.740 --> 00:09:57.800 And the state space is much larger than what you see just here. 00:09:57.800 --> 00:10:01.920 This is just a sample of what the state space might actually look like. 00:10:01.920 --> 00:10:04.340 And, in general, across many search problems, 00:10:04.340 --> 00:10:07.280 whether they're this particular 15 puzzle or driving directions 00:10:07.280 --> 00:10:10.710 or something else, the state space is going to look something like this. 00:10:10.710 --> 00:10:15.110 We have individual states and arrows that are connecting them. 00:10:15.110 --> 00:10:17.000 And oftentimes, just for simplicity, we'll 00:10:17.000 --> 00:10:19.670 simplify our representation of this entire thing 00:10:19.670 --> 00:10:24.560 as a graph, some sequence of nodes and edges that connect nodes. 00:10:24.560 --> 00:10:28.640 But you can think of this more abstract representation as the exact same idea. 00:10:28.640 --> 00:10:30.560 Each of these little circles, or nodes, is 00:10:30.560 --> 00:10:33.860 going to represent one of the states inside of our problem. 
00:10:33.860 --> 00:10:35.930 And the arrows here represent the actions 00:10:35.930 --> 00:10:38.300 that we can take in any particular state, 00:10:38.300 --> 00:10:44.180 taking us from one particular state to another state, for example. 00:10:44.180 --> 00:10:44.760 All right. 00:10:44.760 --> 00:10:48.510 So now we have this idea of nodes that are representing these states, 00:10:48.510 --> 00:10:50.890 actions that can take us from one state to another, 00:10:50.890 --> 00:10:53.370 and a transition model that defines what happens 00:10:53.370 --> 00:10:55.420 after we take a particular action. 00:10:55.420 --> 00:10:57.210 So the next step we need to figure out is 00:10:57.210 --> 00:11:00.570 how we know when the AI is done solving the problem. 00:11:00.570 --> 00:11:03.540 The AI needs some way to know when it gets to the goal, 00:11:03.540 --> 00:11:04.990 that it's found the goal. 00:11:04.990 --> 00:11:08.100 So the next thing we'll need to encode into our artificial intelligence 00:11:08.100 --> 00:11:13.380 is a goal test, some way to determine whether a given state is a goal state. 00:11:13.380 --> 00:11:16.740 In the case of something like driving directions, it might be pretty easy. 00:11:16.740 --> 00:11:19.080 If you're in a state that corresponds to whatever 00:11:19.080 --> 00:11:21.610 the user typed in as their intended destination, well, 00:11:21.610 --> 00:11:23.160 then you know you're in a goal state. 00:11:23.160 --> 00:11:25.320 In the 15 puzzle, it might be checking the numbers 00:11:25.320 --> 00:11:27.160 to make sure they're all in ascending order. 00:11:27.160 --> 00:11:29.880 But the AI needs some way to encode whether or not 00:11:29.880 --> 00:11:32.340 any state it happens to be in is a goal. 00:11:32.340 --> 00:11:34.650 And some problems might have one goal, like a maze 00:11:34.650 --> 00:11:37.260 where you have one initial position and one ending position 00:11:37.260 --> 00:11:38.470 and that's the goal. 
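[Editor's note: the goal test for the 15 puzzle, checking that the numbers are in ascending order, is short enough to write out. A minimal sketch, again assuming the illustrative tuple-of-rows encoding with 0 as the blank:]

```python
def goal_test(state):
    """True if the tiles read 1, 2, ..., n*n - 1 in order,
    with the blank (0) in the final position."""
    flat = [v for row in state for v in row]
    return flat == list(range(1, len(flat))) + [0]
```

Any board at all can be handed to this function, which is the point of a goal test: it answers, for a given state, "are we done?"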
00:11:38.470 --> 00:11:40.320 In other more complex problems, you might 00:11:40.320 --> 00:11:43.890 imagine that there are multiple possible goals, that there are multiple ways 00:11:43.890 --> 00:11:45.070 to solve a problem. 00:11:45.070 --> 00:11:47.940 And we might not care which one the computer finds as 00:11:47.940 --> 00:11:51.330 long as it does find a particular goal. 00:11:51.330 --> 00:11:54.900 However, sometimes a computer doesn't just care about finding a goal, 00:11:54.900 --> 00:11:57.710 but finding a goal well, or one with a low cost. 00:11:57.710 --> 00:11:59.550 And it's for that reason that the last piece 00:11:59.550 --> 00:12:02.400 of terminology that we use to define these search problems 00:12:02.400 --> 00:12:04.710 is something called a path cost. 00:12:04.710 --> 00:12:07.170 You might imagine that in the case of driving directions, 00:12:07.170 --> 00:12:10.680 it would be pretty annoying if I said I wanted directions from point A 00:12:10.680 --> 00:12:14.310 to point B, and the route that Google Maps gave me was a long route with lots 00:12:14.310 --> 00:12:17.220 of detours that were unnecessary, that took longer than it should 00:12:17.220 --> 00:12:19.530 have for me to get to that destination. 00:12:19.530 --> 00:12:22.410 And it's for that reason that when we're formulating search problems, 00:12:22.410 --> 00:12:27.480 we'll often give every path some sort of numerical cost, some number telling us 00:12:27.480 --> 00:12:30.730 how expensive it is to take this particular option. 00:12:30.730 --> 00:12:34.530 And then tell our AI that instead of just finding a solution, 00:12:34.530 --> 00:12:36.990 some way of getting from the initial state to the goal, 00:12:36.990 --> 00:12:40.830 we'd really like to find one that minimizes this path cost, that 00:12:40.830 --> 00:12:44.370 is less expensive, or takes less time, or minimizes 00:12:44.370 --> 00:12:46.500 some other numerical value. 
00:12:46.500 --> 00:12:49.770 We can represent this graphically, if we take a look at this graph again. 00:12:49.770 --> 00:12:52.740 And imagine that each of these arrows, each of these actions 00:12:52.740 --> 00:12:55.560 that we can take from one state to another state, 00:12:55.560 --> 00:12:57.720 has some sort of number associated with it, 00:12:57.720 --> 00:13:01.170 that number being the path cost of this particular action where 00:13:01.170 --> 00:13:03.480 some of the costs for any particular action 00:13:03.480 --> 00:13:07.660 might be more expensive than the cost for some other action, for example. 00:13:07.660 --> 00:13:10.120 Although this will only happen in some sorts of problems. 00:13:10.120 --> 00:13:12.540 In other problems we can simplify the diagram 00:13:12.540 --> 00:13:16.750 and just assume that the cost of any particular action is the same. 00:13:16.750 --> 00:13:19.440 And this is probably the case in something like the 15 puzzle, 00:13:19.440 --> 00:13:22.410 for example, where it doesn't really make a difference whether I'm 00:13:22.410 --> 00:13:23.880 moving right or moving left. 00:13:23.880 --> 00:13:26.940 The only thing that matters is the total number of steps 00:13:26.940 --> 00:13:31.650 that I have to take to get from point A to point B. And each of those steps 00:13:31.650 --> 00:13:32.880 is of equal cost. 00:13:32.880 --> 00:13:37.270 We can just assume it's some constant cost, like one. 00:13:37.270 --> 00:13:39.380 And so this now forms the basis for what we 00:13:39.380 --> 00:13:41.720 might consider to be a search problem. 
00:13:41.720 --> 00:13:44.960 A search problem has some sort of initial state, some place where 00:13:44.960 --> 00:13:47.420 we begin, some sort of action that we can take 00:13:47.420 --> 00:13:50.300 or multiple actions that we can take in any given state, 00:13:50.300 --> 00:13:52.850 and it has a transition model, some way of defining 00:13:52.850 --> 00:13:56.570 what happens when we go from one state and take one action, 00:13:56.570 --> 00:13:59.280 what state do we end up with as a result. 00:13:59.280 --> 00:14:02.450 In addition to that, we need some goal test to know whether or not 00:14:02.450 --> 00:14:03.650 we've reached a goal. 00:14:03.650 --> 00:14:07.940 And then we need a path cost function that tells us for any particular path, 00:14:07.940 --> 00:14:11.690 by following some sequence of actions, how expensive is that path. 00:14:11.690 --> 00:14:14.360 What is its cost in terms of money, or time, 00:14:14.360 --> 00:14:18.470 or some other resource that we are trying to minimize our usage of. 00:14:18.470 --> 00:14:22.370 The goal, ultimately, is to find a solution, where a solution in this case 00:14:22.370 --> 00:14:26.090 is just some sequence of actions that will take us from the initial state 00:14:26.090 --> 00:14:27.200 to the goal state. 00:14:27.200 --> 00:14:31.700 And, ideally, we'd like to find not just any solution, but the optimal solution, 00:14:31.700 --> 00:14:35.510 which is a solution that has the lowest path cost among all 00:14:35.510 --> 00:14:36.950 of the possible solutions. 00:14:36.950 --> 00:14:39.590 And in some cases, there might be multiple optimal solutions, 00:14:39.590 --> 00:14:41.600 but an optimal solution just means that there 00:14:41.600 --> 00:14:46.430 is no way that we could have done better in terms of finding that solution. 00:14:46.430 --> 00:14:47.870 So now we've defined the problem. 
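[Editor's note: the five pieces just listed, initial state, actions, transition model, goal test, and path cost, can be bundled into one object. The sketch below does this for an explicitly listed weighted graph; the class name, the edge data, and the convention that an action simply names the neighbor to move to are all illustrative choices, not the lecture's own code.]

```python
class GraphSearchProblem:
    """A search problem over an explicit weighted graph.

    `edges` maps each state to a dict {neighbor: step cost};
    the action "move to neighbor n" leads to state n.
    """
    def __init__(self, edges, initial, goal):
        self.edges = edges
        self.initial = initial    # the state where the agent begins
        self.goal = goal

    def actions(self, state):
        # available actions are the neighbors we can move to
        return set(self.edges.get(state, {}))

    def result(self, state, action):
        # transition model: moving to a neighbor puts us there
        return action

    def goal_test(self, state):
        return state == self.goal

    def path_cost(self, path):
        # total cost of following a sequence of states
        return sum(self.edges[a][b] for a, b in zip(path, path[1:]))

# hypothetical road map: edge weights stand in for travel times
roads = {"A": {"B": 5}, "B": {"C": 2, "D": 4}, "C": {"E": 3}}
problem = GraphSearchProblem(roads, "A", "E")
```

An optimal solution is then any path from `problem.initial` to a state passing `goal_test` whose `path_cost` is minimal among all solutions.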
00:14:47.870 --> 00:14:49.990 And now we need to begin to figure out how 00:14:49.990 --> 00:14:53.050 it is that we're going to solve this kind of search problem. 00:14:53.050 --> 00:14:55.270 And in order to do so, you'll probably imagine 00:14:55.270 --> 00:14:58.810 that our computer is going to need to represent a whole bunch of data 00:14:58.810 --> 00:15:00.160 about this particular problem. 00:15:00.160 --> 00:15:03.080 We need to represent data about where we are in the problem. 00:15:03.080 --> 00:15:06.490 And we might need to be considering multiple different options at once. 00:15:06.490 --> 00:15:09.850 And oftentimes when we're trying to package a whole bunch of data related 00:15:09.850 --> 00:15:12.850 to a state together, we'll do so using a data structure 00:15:12.850 --> 00:15:14.630 that we're going to call a node. 00:15:14.630 --> 00:15:16.510 A node is a data structure that is just going 00:15:16.510 --> 00:15:19.110 to keep track of a variety of different values, 00:15:19.110 --> 00:15:21.430 and specifically in the case of a search problem, 00:15:21.430 --> 00:15:24.640 it's going to keep track of these four values in particular. 00:15:24.640 --> 00:15:28.570 Every node is going to keep track of a state, the state we're currently on. 00:15:28.570 --> 00:15:31.540 And every node is also going to keep track of a parent. 00:15:31.540 --> 00:15:34.420 A parent being the state before us, or the node 00:15:34.420 --> 00:15:37.660 that we used in order to get to this current state. 00:15:37.660 --> 00:15:39.830 And this is going to be relevant because eventually, 00:15:39.830 --> 00:15:42.740 once we reach the goal node, once we get to the end, 00:15:42.740 --> 00:15:47.210 we want to know what sequence of actions we used in order to get to that goal. 
00:15:47.210 --> 00:15:50.140 And the way we'll know that is by looking at these parents 00:15:50.140 --> 00:15:53.890 to keep track of what led us to the goal, and what led us to that state, 00:15:53.890 --> 00:15:56.710 and what led us to the state before that, so on and so forth, 00:15:56.710 --> 00:15:59.350 backtracking our way to the beginning so that we 00:15:59.350 --> 00:16:01.990 know the entire sequence of actions we needed in order 00:16:01.990 --> 00:16:04.730 to get from the beginning to the end. 00:16:04.730 --> 00:16:07.900 The node is also going to keep track of what action we took in order to get 00:16:07.900 --> 00:16:10.120 from the parent to the current state. 00:16:10.120 --> 00:16:13.700 And the node is also going to keep track of a path cost. 00:16:13.700 --> 00:16:16.270 In other words, it's going to keep track of the number that 00:16:16.270 --> 00:16:20.440 represents how long it took to get from the initial state to the state 00:16:20.440 --> 00:16:22.080 that we currently happen to be at. 00:16:22.080 --> 00:16:24.790 And we'll see why this is relevant as we start to talk about some 00:16:24.790 --> 00:16:28.030 of the optimizations that we can make in terms of these search problems more 00:16:28.030 --> 00:16:29.080 generally. 00:16:29.080 --> 00:16:31.000 So this is the data structure that we're going 00:16:31.000 --> 00:16:32.920 to use in order to solve the problem. 00:16:32.920 --> 00:16:35.050 And now let's talk about the approach, how might we 00:16:35.050 --> 00:16:37.850 actually begin to solve the problem? 00:16:37.850 --> 00:16:39.940 Well, as you might imagine, what we're going to do 00:16:39.940 --> 00:16:42.190 is we're going to start at one particular state 00:16:42.190 --> 00:16:44.700 and we're just going to explore from there. 
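[Editor's note: the node data structure with its four values, and the backtracking through parents described above, can be sketched directly. The names are illustrative, not from the lecture:]

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Bookkeeping for search: the four values a node tracks."""
    state: object
    parent: Optional["Node"]  # the node that generated this one
    action: object            # action taken to get here from the parent
    path_cost: float          # cost from the initial state to this state

def solution(node):
    """Backtrack through parents to recover the sequence of actions
    that led from the initial state to this node."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))  # beginning-to-end order
```

Once a goal node is found, calling `solution` on it yields exactly the sequence of actions the search took, which is why each node remembers its parent.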
00:16:44.700 --> 00:16:46.720 The intuition is that from a given state, 00:16:46.720 --> 00:16:48.760 we have multiple options that we could take, 00:16:48.760 --> 00:16:50.800 and we're going to explore those options. 00:16:50.800 --> 00:16:54.490 And once we explore those options, we'll find that more options than that 00:16:54.490 --> 00:16:56.290 are going to make themselves available. 00:16:56.290 --> 00:16:59.080 And we're going to consider all of the available options 00:16:59.080 --> 00:17:03.310 to be stored inside of a single data structure that we'll call the frontier. 00:17:03.310 --> 00:17:05.710 The frontier is going to represent all of the things 00:17:05.710 --> 00:17:10.440 that we could explore next, that we haven't yet explored or visited. 00:17:10.440 --> 00:17:12.610 So in our approach, we're going to begin this search 00:17:12.610 --> 00:17:17.020 algorithm by starting with a frontier that just contains one state. 00:17:17.020 --> 00:17:20.470 The frontier is going to contain the initial state because at the beginning, 00:17:20.470 --> 00:17:21.970 that's the only state we know about. 00:17:21.970 --> 00:17:24.330 That is the only state that exists. 00:17:24.330 --> 00:17:27.790 And then our search algorithm is effectively going to follow a loop. 00:17:27.790 --> 00:17:31.390 We're going to repeat some process again and again and again. 00:17:31.390 --> 00:17:35.320 The first thing we're going to do is if the frontier is empty, 00:17:35.320 --> 00:17:36.610 then there's no solution. 00:17:36.610 --> 00:17:39.310 And we can report that there is no way to get to the goal. 00:17:39.310 --> 00:17:40.560 And that's certainly possible. 00:17:40.560 --> 00:17:42.730 There are certain types of problems that an AI might 00:17:42.730 --> 00:17:46.940 try to explore and realize that there is no way to solve that problem. 00:17:46.940 --> 00:17:49.610 And that's useful information for humans to know, as well. 
00:17:49.610 --> 00:17:53.560 So if ever the frontier is empty, that means there's nothing left to explore, 00:17:53.560 --> 00:17:57.160 and we haven't yet found a solution so there is no solution. 00:17:57.160 --> 00:17:59.140 There's nothing left to explore. 00:17:59.140 --> 00:18:03.020 Otherwise what we'll do is we'll remove a node from the frontier. 00:18:03.020 --> 00:18:04.990 So right now at the beginning, the frontier 00:18:04.990 --> 00:18:07.870 just contains one node representing the initial state. 00:18:07.870 --> 00:18:09.550 But over time, the frontier might grow. 00:18:09.550 --> 00:18:11.080 It might contain multiple states. 00:18:11.080 --> 00:18:15.730 And so here we're just going to remove a single node from that frontier. 00:18:15.730 --> 00:18:19.000 If that node happens to be a goal, then we found a solution. 00:18:19.000 --> 00:18:22.390 So we remove a node from the frontier and ask ourselves, is this the goal? 00:18:22.390 --> 00:18:25.540 And we do that by applying the goal test that we talked about earlier, 00:18:25.540 --> 00:18:29.080 asking if we're at the destination or asking if all the numbers of the 15 00:18:29.080 --> 00:18:31.250 puzzle happen to be in order. 00:18:31.250 --> 00:18:33.860 So if the node contains the goal, we found a solution. 00:18:33.860 --> 00:18:34.360 Great. 00:18:34.360 --> 00:18:35.900 We're done. 00:18:35.900 --> 00:18:40.820 And otherwise, what we'll need to do is we'll need to expand the node. 00:18:40.820 --> 00:18:42.940 And this is a term in artificial intelligence. 00:18:42.940 --> 00:18:46.870 To expand the node just means to look at all of the neighbors of that node. 00:18:46.870 --> 00:18:49.630 In other words, consider all of the possible actions 00:18:49.630 --> 00:18:52.830 that I could take from the state that this node is representing 00:18:52.830 --> 00:18:55.270 and what nodes could I get to from there. 
00:18:55.270 --> 00:18:57.520 We're going to take all of those nodes, the next nodes 00:18:57.520 --> 00:19:00.220 that I can get to from this current one I'm looking at, 00:19:00.220 --> 00:19:02.320 and add those to the frontier. 00:19:02.320 --> 00:19:04.550 And then we'll repeat this process. 00:19:04.550 --> 00:19:07.840 So at a very high level, the idea is we start with a frontier that 00:19:07.840 --> 00:19:09.400 contains the initial state. 00:19:09.400 --> 00:19:12.190 And we're constantly removing a node from the frontier, 00:19:12.190 --> 00:19:16.150 looking at where we can get to next, and adding those nodes to the frontier, 00:19:16.150 --> 00:19:19.090 repeating this process over and over until either we remove 00:19:19.090 --> 00:19:22.270 a node from the frontier and it contains a goal, meaning we've solved 00:19:22.270 --> 00:19:23.320 the problem. 00:19:23.320 --> 00:19:26.950 Or we run into a situation where the frontier is empty, at which point 00:19:26.950 --> 00:19:29.610 we're left with no solution. 00:19:29.610 --> 00:19:31.610 So let's actually try and take the pseudocode, 00:19:31.610 --> 00:19:36.470 put it into practice by taking a look at an example of a sample search problem. 00:19:36.470 --> 00:19:38.270 So right here I have a sample graph. 00:19:38.270 --> 00:19:42.470 A is connected to B via this action, B is connected to nodes C and D, C 00:19:42.470 --> 00:19:46.220 is connected to E, D is connected to F. And what I'd like to do 00:19:46.220 --> 00:19:52.760 is have my AI find a path from A to E. We want to get from this initial state 00:19:52.760 --> 00:19:54.980 to this goal state. 00:19:54.980 --> 00:19:56.440 So how are we going to do that? 00:19:56.440 --> 00:19:58.490 Well, we're going to start with the frontier that 00:19:58.490 --> 00:19:59.760 contains the initial state. 00:19:59.760 --> 00:20:01.550 This is going to represent our frontier. 
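Before walking through this by hand, here's how that loop might be sketched in Python. This is a rough illustration only, not the lecture's own code: the `graph` dictionary stands in for the sample graph, and the function and variable names are assumptions.

```python
# A minimal sketch of the basic search loop described above.
# The graph mirrors the sample: A -> B, B -> C and D, C -> E, D -> F.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"], "E": [], "F": []}

def search(initial, goal):
    # Start with a frontier that contains just the initial state.
    frontier = [initial]
    while True:
        # If the frontier is empty, there is no solution.
        if not frontier:
            return None
        # Remove a node from the frontier; which one is an open choice.
        # Here, arbitrarily, the last one. (No explored set yet; that
        # refinement comes later.)
        node = frontier.pop()
        # If the node is a goal, we've found a solution.
        if node == goal:
            return node
        # Otherwise, expand the node: add its neighbors to the frontier.
        frontier.extend(graph[node])

print(search("A", "E"))  # E
```

Running this on the sample graph finds E; on a graph with no route to the goal, the frontier eventually empties and the function reports no solution.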
00:20:01.550 --> 00:20:05.090 So our frontier, initially, will just contain A, that initial state 00:20:05.090 --> 00:20:06.650 where we're going to begin. 00:20:06.650 --> 00:20:08.480 And now we'll repeat this process. 00:20:08.480 --> 00:20:10.460 If the frontier is empty, no solution. 00:20:10.460 --> 00:20:12.920 That's not a problem because the frontier is not empty. 00:20:12.920 --> 00:20:17.080 So we'll remove a node from the frontier as the one to consider next. 00:20:17.080 --> 00:20:18.720 There is only one node in the frontier. 00:20:18.720 --> 00:20:20.810 So we'll go ahead and remove it from the frontier. 00:20:20.810 --> 00:20:25.490 But now A, this initial node, this is the node we're currently considering. 00:20:25.490 --> 00:20:26.600 We follow the next step. 00:20:26.600 --> 00:20:29.270 We ask ourselves, is this node the goal? 00:20:29.270 --> 00:20:29.880 No, it's not. 00:20:29.880 --> 00:20:30.770 A is not the goal. 00:20:30.770 --> 00:20:32.220 E is the goal. 00:20:32.220 --> 00:20:33.870 So we don't return the solution. 00:20:33.870 --> 00:20:37.160 So instead, we go to this last step, expand the node 00:20:37.160 --> 00:20:39.950 and add the resulting nodes to the frontier. 00:20:39.950 --> 00:20:41.070 What does that mean? 00:20:41.070 --> 00:20:45.050 Well, it means take this state A and consider where we could get to next. 00:20:45.050 --> 00:20:47.390 And after A what we could get to next is only 00:20:47.390 --> 00:20:51.110 B. So that's what we get when we expand A. We find B. 00:20:51.110 --> 00:20:53.080 And we add B to the frontier. 00:20:53.080 --> 00:20:56.300 And now B is in the frontier and we repeat the process again. 00:20:56.300 --> 00:20:57.050 We say, all right. 00:20:57.050 --> 00:20:58.280 The frontier is not empty. 00:20:58.280 --> 00:21:00.440 So let's remove B from the frontier. 00:21:00.440 --> 00:21:02.300 B is now the node that we're considering. 00:21:02.300 --> 00:21:04.160 We ask ourselves, is B the goal? 
00:21:04.160 --> 00:21:05.120 No, it's not. 00:21:05.120 --> 00:21:10.010 So we go ahead and expand B and add its resulting nodes to the frontier. 00:21:10.010 --> 00:21:11.660 What happens when we expand B? 00:21:11.660 --> 00:21:14.750 In other words, what nodes can we get to from B? 00:21:14.750 --> 00:21:17.180 Well, we can get to C and D. So we'll go ahead 00:21:17.180 --> 00:21:19.100 and add C and D to the frontier. 00:21:19.100 --> 00:21:21.500 And now we have two nodes in the frontier, C and D. 00:21:21.500 --> 00:21:23.170 And we repeat the process again. 00:21:23.170 --> 00:21:25.220 We remove a node from the frontier, for now we'll 00:21:25.220 --> 00:21:27.260 do so arbitrarily just by picking C. 00:21:27.260 --> 00:21:30.550 We'll see later why choosing which node you remove from the frontier 00:21:30.550 --> 00:21:32.840 is actually quite an important part of the algorithm. 00:21:32.840 --> 00:21:36.230 But for now I'll arbitrarily remove C, say it's not the goal, 00:21:36.230 --> 00:21:39.260 so we'll add E, the next one, to the frontier. 00:21:39.260 --> 00:21:41.420 Then let's say I remove E from the frontier. 00:21:41.420 --> 00:21:45.620 And now I'm currently looking at state E. Is that a goal state? 00:21:45.620 --> 00:21:48.410 It is because I'm trying to find a path from A to E. 00:21:48.410 --> 00:21:49.760 So I would return the goal. 00:21:49.760 --> 00:21:52.520 And that, now, would be the solution, that I'm now 00:21:52.520 --> 00:21:57.290 able to return the solution and I found a path from A to E. 00:21:57.290 --> 00:22:00.710 So this is the general idea, the general approach of this search algorithm, 00:22:00.710 --> 00:22:04.220 to follow these steps constantly removing nodes from the frontier 00:22:04.220 --> 00:22:06.540 until we're able to find a solution. 00:22:06.540 --> 00:22:08.510 So the next question you might reasonably ask 00:22:08.510 --> 00:22:10.610 is, what could go wrong here? 
00:22:10.610 --> 00:22:14.090 What are the potential problems with an approach like this? 00:22:14.090 --> 00:22:18.090 And here's one example of a problem that could arise from this sort of approach. 00:22:18.090 --> 00:22:21.980 Imagine this same graph, same as before, with one change. 00:22:21.980 --> 00:22:25.070 The change being, now, instead of just an arrow from A to B, 00:22:25.070 --> 00:22:29.250 we also have an arrow from B to A, meaning we can go in both directions. 00:22:29.250 --> 00:22:31.580 And this is true in something like the 15 puzzle 00:22:31.580 --> 00:22:33.560 where when I slide a tile to the right, I 00:22:33.560 --> 00:22:36.980 could then slide a tile to the left to get back to the original position. 00:22:36.980 --> 00:22:39.530 I could go back and forth between A and B. 00:22:39.530 --> 00:22:42.920 And that's what these double arrows symbolize, the idea that from one state 00:22:42.920 --> 00:22:45.310 I can get to another and then I can get back. 00:22:45.310 --> 00:22:47.600 And that's true in many search problems. 00:22:47.600 --> 00:22:50.990 What's going to happen if I try to apply the same approach now? 00:22:50.990 --> 00:22:53.280 Well, I'll begin with A, same as before. 00:22:53.280 --> 00:22:55.160 And I'll remove A from the frontier. 00:22:55.160 --> 00:22:59.420 And then I'll consider where I can get to from A. And after A, the only place 00:22:59.420 --> 00:23:02.360 I can get to is B, so B goes into the frontier. 00:23:02.360 --> 00:23:03.410 Then I'll say, all right. 00:23:03.410 --> 00:23:06.200 Let's take a look at B. That's the only thing left in the frontier. 00:23:06.200 --> 00:23:08.150 Where can I get to from B? 00:23:08.150 --> 00:23:12.410 Before it was just C and D, but now because of that reverse arrow, 00:23:12.410 --> 00:23:17.950 I can get to A or C or D. So all three A, C, and D. All of those 00:23:17.950 --> 00:23:19.070 now go into the frontier. 
00:23:19.070 --> 00:23:23.300 They are places I can get to from B. And now I remove one from the frontier, 00:23:23.300 --> 00:23:25.870 and, you know, maybe I'm unlucky and maybe I pick A. 00:23:25.870 --> 00:23:27.820 And now I'm looking at A again. 00:23:27.820 --> 00:23:31.490 And I consider where can I get to from A. And from A, well I can get to B. 00:23:31.490 --> 00:23:34.080 And now we start to see the problem, that if I'm not careful, 00:23:34.080 --> 00:23:37.040 I go from A to B and then back to A and then to B again. 00:23:37.040 --> 00:23:39.380 And I could be going in this infinite loop where I never 00:23:39.380 --> 00:23:43.250 make any progress because I'm constantly just going back and forth between two 00:23:43.250 --> 00:23:45.390 states that I've already seen. 00:23:45.390 --> 00:23:46.730 So what is the solution to this? 00:23:46.730 --> 00:23:48.980 We need some way to deal with this problem. 00:23:48.980 --> 00:23:50.900 And the way that we can deal with this problem 00:23:50.900 --> 00:23:54.600 is by somehow keeping track of what we've already explored. 00:23:54.600 --> 00:23:58.150 And the logic is going to be, well, if we've already explored the state, 00:23:58.150 --> 00:23:59.630 there's no reason to go back to it. 00:23:59.630 --> 00:24:01.700 Once we've explored a state, don't go back to it, 00:24:01.700 --> 00:24:03.860 don't bother adding it to the frontier. 00:24:03.860 --> 00:24:05.570 There's no need to. 00:24:05.570 --> 00:24:07.640 So here is going to be our revised approach, 00:24:07.640 --> 00:24:10.450 a better way to approach this sort of search problem. 00:24:10.450 --> 00:24:14.030 And it's going to look very similar just with a couple of modifications. 00:24:14.030 --> 00:24:16.850 We'll start with a frontier that contains the initial state. 00:24:16.850 --> 00:24:18.110 Same as before. 
00:24:18.110 --> 00:24:21.260 But now we'll start with another data structure, 00:24:21.260 --> 00:24:24.290 which would just be a set of nodes that we've already explored. 00:24:24.290 --> 00:24:25.880 So what are the states we've explored? 00:24:25.880 --> 00:24:27.330 Initially, it's empty. 00:24:27.330 --> 00:24:29.710 We have an empty explored set. 00:24:29.710 --> 00:24:30.930 And now we repeat. 00:24:30.930 --> 00:24:33.140 If the frontier is empty, no solution. 00:24:33.140 --> 00:24:34.400 Same as before. 00:24:34.400 --> 00:24:36.460 We remove a node from the frontier, we check 00:24:36.460 --> 00:24:38.510 to see if it's a goal state, return the solution. 00:24:38.510 --> 00:24:40.670 None of this is any different so far. 00:24:40.670 --> 00:24:42.650 But now, what we're going to do is we're going 00:24:42.650 --> 00:24:45.800 to add the node to the explored set. 00:24:45.800 --> 00:24:49.790 So if it happens to be the case that we remove a node from the frontier 00:24:49.790 --> 00:24:52.420 and it's not the goal, we'll add it to the explored set 00:24:52.420 --> 00:24:54.170 so that we know we've already explored it. 00:24:54.170 --> 00:24:58.010 We don't need to go back to it again if it happens to come up later. 00:24:58.010 --> 00:25:00.440 And then the final step, we expand the node 00:25:00.440 --> 00:25:03.120 and we add the resulting nodes to the frontier. 00:25:03.120 --> 00:25:05.960 But before we just always added the resulting nodes to the frontier, 00:25:05.960 --> 00:25:08.270 we're going to be a little cleverer about it this time. 00:25:08.270 --> 00:25:10.970 We're only going to add the nodes to the frontier 00:25:10.970 --> 00:25:14.120 if they aren't already in the frontier and if they 00:25:14.120 --> 00:25:16.790 aren't already in the explored set. 
00:25:16.790 --> 00:25:19.340 So we'll check both the frontier and the explored set, 00:25:19.340 --> 00:25:22.520 make sure that the node isn't already in one of those two, 00:25:22.520 --> 00:25:25.700 and so long as it isn't, then we'll go ahead and add to the frontier 00:25:25.700 --> 00:25:27.590 but not otherwise. 00:25:27.590 --> 00:25:29.390 And so that revised approach is ultimately 00:25:29.390 --> 00:25:31.220 what's going to help make sure that we don't 00:25:31.220 --> 00:25:34.210 go back and forth between two nodes. 00:25:34.210 --> 00:25:36.420 Now the one point that I've kind of glossed over here 00:25:36.420 --> 00:25:40.770 so far is this step here, removing a node from the frontier. 00:25:40.770 --> 00:25:44.770 Before I just chose arbitrarily, like let's just remove a node and that's it. 00:25:44.770 --> 00:25:46.710 But it turns out it's actually quite important 00:25:46.710 --> 00:25:49.860 how we decide to structure our frontier, how we add them, 00:25:49.860 --> 00:25:51.840 and how we remove our nodes. 00:25:51.840 --> 00:25:53.370 The frontier is a data structure. 00:25:53.370 --> 00:25:55.960 And we need to make a choice about in what order 00:25:55.960 --> 00:25:58.080 are we going to be removing elements? 00:25:58.080 --> 00:26:01.600 And one of the simplest data structures for adding and removing elements 00:26:01.600 --> 00:26:03.300 is something called a stack. 00:26:03.300 --> 00:26:07.960 And a stack is a data structure that is a last in, first out data type. 00:26:07.960 --> 00:26:10.920 Which means the last thing that I add to the frontier 00:26:10.920 --> 00:26:14.860 is going to be the first thing that I remove from the frontier. 00:26:14.860 --> 00:26:18.720 So the most recent thing to go into the stack, or the frontier in this case, 00:26:18.720 --> 00:26:21.660 is going to be the node that I explore. 
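Putting those revised steps together, here's one way the loop with an explored set might be sketched in Python. The names and the simple list-based frontier are illustrative assumptions, not the lecture's code, and the `remove` function is left as a parameter precisely because the removal order is the choice just mentioned.

```python
# Sketch of the revised search loop with an explored set (assumed names).
# Note the B -> A arrow: this graph has the back-and-forth problem.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["E"], "D": ["F"], "E": [], "F": []}

def search(initial, goal, remove):
    frontier = [initial]          # frontier starts with the initial state
    explored = set()              # explored set starts empty
    while True:
        if not frontier:          # empty frontier means no solution
            return None
        node = remove(frontier)   # remove some node from the frontier
        if node == goal:          # goal test
            return node
        explored.add(node)        # mark this node as explored
        for neighbor in graph[node]:
            # Only add states not already in the frontier or explored set.
            if neighbor not in frontier and neighbor not in explored:
                frontier.append(neighbor)

# Even with the reverse arrow, the explored set prevents infinite looping.
print(search("A", "E", remove=lambda f: f.pop()))  # E
```

Whether `remove` takes the last element (a stack) or the first (a queue), the explored set guarantees the loop never revisits a state it has already seen.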
00:26:21.660 --> 00:26:25.170 So let's see what happens if I apply this stack based approach 00:26:25.170 --> 00:26:29.850 to something like this problem, finding a path from A to E. 00:26:29.850 --> 00:26:30.910 What's going to happen? 00:26:30.910 --> 00:26:33.340 Well, again we'll start with A. And we'll say, all right. 00:26:33.340 --> 00:26:34.950 Let's go ahead and look at A first. 00:26:34.950 --> 00:26:38.970 And then, notice this time, we've added A to the explored set. 00:26:38.970 --> 00:26:41.790 A is something we've now explored, we have this data structure 00:26:41.790 --> 00:26:43.320 that's keeping track. 00:26:43.320 --> 00:26:46.620 We then say from A we can get to B. And all right. 00:26:46.620 --> 00:26:48.040 From B what can we do? 00:26:48.040 --> 00:26:52.230 Well from B, we can explore B and get to both C and D. 00:26:52.230 --> 00:26:57.030 So we added C and then D. So now when we explore a node, 00:26:57.030 --> 00:27:00.390 we're going to treat the frontier as a stack, last in, first out. 00:27:00.390 --> 00:27:04.490 D was the last one to come in so we'll go ahead and explore that next. 00:27:04.490 --> 00:27:06.450 And say, all right, where can we get to from D? 00:27:06.450 --> 00:27:08.640 Well we can get to F. And so, all right. 00:27:08.640 --> 00:27:11.220 We'll put F into the frontier. 00:27:11.220 --> 00:27:13.800 And now because the frontier is a stack, F 00:27:13.800 --> 00:27:16.630 is the most recent thing that's gone in the stack. 00:27:16.630 --> 00:27:18.000 So F is what we'll explore next. 00:27:18.000 --> 00:27:19.960 We'll explore F and say, all right. 00:27:19.960 --> 00:27:21.600 Where can we get to from F? 00:27:21.600 --> 00:27:24.970 Well, we can't get anywhere so nothing gets added to the frontier. 00:27:24.970 --> 00:27:27.870 So now what was the new most recent thing added to the frontier? 00:27:27.870 --> 00:27:30.480 Well it's now C, the only thing left in the frontier. 
00:27:30.480 --> 00:27:34.100 We'll explore that from which we can say, all right, from C we can get to E. 00:27:34.100 --> 00:27:35.380 So E goes into the frontier. 00:27:35.380 --> 00:27:36.510 And then we say, all right. 00:27:36.510 --> 00:27:39.220 Let's look at E and E is now the solution. 00:27:39.220 --> 00:27:41.740 And now we've solved the problem. 00:27:41.740 --> 00:27:43.710 So when we treat the frontier like a stack, 00:27:43.710 --> 00:27:47.670 a last in, first out data structure, that's the result we get. 00:27:47.670 --> 00:27:52.650 We go from A to B to D to F, and then we sort of backed up and went down 00:27:52.650 --> 00:27:56.190 to C and then E. And it's important to get a visual sense for how 00:27:56.190 --> 00:27:57.360 this algorithm is working. 00:27:57.360 --> 00:27:59.550 We went very deep in this search tree, so 00:27:59.550 --> 00:28:02.670 to speak, all the way until the bottom where we hit a dead end. 00:28:02.670 --> 00:28:06.230 And then we effectively backed up and explored this other route 00:28:06.230 --> 00:28:07.740 that we didn't try before. 00:28:07.740 --> 00:28:10.530 And it's this going very deep in the search tree idea, 00:28:10.530 --> 00:28:13.800 the way the algorithm ends up working when we use a stack, 00:28:13.800 --> 00:28:18.210 that we call this version of the algorithm depth first search. 00:28:18.210 --> 00:28:20.340 Depth first search is the search algorithm 00:28:20.340 --> 00:28:23.840 where we always explore the deepest node in the frontier. 00:28:23.840 --> 00:28:27.000 We keep going deeper and deeper through our search tree. 00:28:27.000 --> 00:28:31.660 And then if we hit a dead end, we back up and we try something else instead. 00:28:31.660 --> 00:28:34.670 But depth first search is just one of the possible search options 00:28:34.670 --> 00:28:35.780 that we could use. 
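That A, B, D, F, C, E order can be checked with a small sketch. The graph and names here are illustrative assumptions; the point is only that a stack-style frontier produces the deep-then-back-up order just described.

```python
# Sketch: explore the sample graph treating the frontier as a stack (LIFO)
# and record the order in which nodes get explored.
# Graph assumed from the example: A -> B, B -> C and D, C -> E, D -> F.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"], "E": [], "F": []}

def depth_first_order(initial):
    frontier = [initial]
    explored = []
    while frontier:
        node = frontier.pop()   # last in, first out
        explored.append(node)
        for neighbor in graph[node]:
            if neighbor not in frontier and neighbor not in explored:
                frontier.append(neighbor)
    return explored

print(depth_first_order("A"))  # ['A', 'B', 'D', 'F', 'C', 'E']
```

The printed order matches the walkthrough: deep down through B, D, F, then back up to C and finally E.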
00:28:35.780 --> 00:28:38.120 It turns out that there is another algorithm called 00:28:38.120 --> 00:28:41.150 breadth first search, which behaves very similarly to depth 00:28:41.150 --> 00:28:43.040 first search with one difference. 00:28:43.040 --> 00:28:46.820 Instead of always exploring the deepest node in the search tree the way 00:28:46.820 --> 00:28:48.920 the depth first search does, breadth first search 00:28:48.920 --> 00:28:53.370 is always going to explore the shallowest node in the frontier. 00:28:53.370 --> 00:28:54.360 So what does that mean? 00:28:54.360 --> 00:28:56.900 Well, it means that instead of using a stack, which 00:28:56.900 --> 00:29:00.650 depth first search, or DFS, used where the most recent item added 00:29:00.650 --> 00:29:06.350 to the frontier is the one we'll explore next, in breadth first search, or BFS, 00:29:06.350 --> 00:29:11.120 we'll instead use a queue where a queue is a first in, first out data 00:29:11.120 --> 00:29:14.660 type, where the very first thing we add to the frontier is the first one 00:29:14.660 --> 00:29:15.470 we'll explore. 00:29:15.470 --> 00:29:17.380 And they effectively form a line or a queue, 00:29:17.380 --> 00:29:23.270 where the earlier you arrive in the frontier, the earlier you get explored. 00:29:23.270 --> 00:29:26.600 So what would that mean for the same exact problem finding a path from A 00:29:26.600 --> 00:29:27.950 to E? 00:29:27.950 --> 00:29:30.200 Well we start with A, same as before. 00:29:30.200 --> 00:29:33.410 Then we'll go ahead and have explored A, and say, where can we get to from A? 00:29:33.410 --> 00:29:36.080 Well, from A we can get to B. Same as before. 00:29:36.080 --> 00:29:37.340 From B, same as before. 00:29:37.340 --> 00:29:41.070 We can get to C and D so C and D get added to the frontier. 00:29:41.070 --> 00:29:43.460 This time, though, we added C to the frontier 00:29:43.460 --> 00:29:46.670 before D so we'll explore C first. 
00:29:46.670 --> 00:29:48.350 So C gets explored. 00:29:48.350 --> 00:29:50.020 And from C, where can we get to? 00:29:50.020 --> 00:29:51.470 Well, we can get to E. 00:29:51.470 --> 00:29:53.690 So E gets added to the frontier. 00:29:53.690 --> 00:29:58.290 But because D was added to the frontier before E, we'll look at D next. 00:29:58.290 --> 00:30:00.590 So we'll explore D and say, where can we get to from D? 00:30:00.590 --> 00:30:03.860 We can get to F. And only then will we say, all right. 00:30:03.860 --> 00:30:07.550 Now we can get to E. And so what breadth first search, or BFS, 00:30:07.550 --> 00:30:12.290 did is we started here, we looked at both C and D, 00:30:12.290 --> 00:30:14.240 and then we looked at E. Effectively we're 00:30:14.240 --> 00:30:16.790 looking at things one away from the initial state, 00:30:16.790 --> 00:30:18.620 then two away from the initial state. 00:30:18.620 --> 00:30:22.500 And only then, things that are three away from the initial state. 00:30:22.500 --> 00:30:25.970 Unlike depth first search, which just went as deep as possible 00:30:25.970 --> 00:30:28.700 into the search tree until it hit a dead end and then, 00:30:28.700 --> 00:30:30.920 ultimately, had to back up. 00:30:30.920 --> 00:30:33.320 So these now are two different search algorithms 00:30:33.320 --> 00:30:35.800 that we could apply in order to try and solve a problem. 00:30:35.800 --> 00:30:37.850 And let's take a look at how these would actually 00:30:37.850 --> 00:30:41.910 work in practice with something like maze solving, for example. 00:30:41.910 --> 00:30:43.370 So here's an example of a maze. 00:30:43.370 --> 00:30:46.580 These empty cells represent places where our agent can move. 00:30:46.580 --> 00:30:49.340 These darkened gray cells represent walls 00:30:49.340 --> 00:30:51.050 that the agent can't pass through. 
00:30:51.050 --> 00:30:53.750 And, ultimately, our agent, our AI, is going 00:30:53.750 --> 00:30:56.590 to try to find a way to get from position A 00:30:56.590 --> 00:31:00.680 to position B via some sequence of actions, where those actions are left, 00:31:00.680 --> 00:31:02.930 right, up, and down. 00:31:02.930 --> 00:31:05.420 What will depth first search do in this case? 00:31:05.420 --> 00:31:08.270 Well depth first search will just follow one path. 00:31:08.270 --> 00:31:11.590 If it reaches a fork in the road where it has multiple different options, 00:31:11.590 --> 00:31:14.200 depth first search is just, in this case, going to choose one. 00:31:14.200 --> 00:31:15.560 There isn't a real preference. 00:31:15.560 --> 00:31:19.280 But it's going to keep following one until it hits a dead end. 00:31:19.280 --> 00:31:21.620 And when it hits a dead end, depth first search 00:31:21.620 --> 00:31:24.800 effectively goes back to the last decision point 00:31:24.800 --> 00:31:26.330 and tries the other path. 00:31:26.330 --> 00:31:28.760 Fully exhausting this entire path and when 00:31:28.760 --> 00:31:30.890 it realizes that, OK, the goal is not here, 00:31:30.890 --> 00:31:32.750 then it turns its attention to this path. 00:31:32.750 --> 00:31:34.610 It goes as deep as possible. 00:31:34.610 --> 00:31:39.050 When it hits a dead end, it backs up and then tries this other path, 00:31:39.050 --> 00:31:42.110 keeps going as deep as possible down one particular path, 00:31:42.110 --> 00:31:45.470 and when it realizes that that's a dead end, then it'll back up. 00:31:45.470 --> 00:31:48.230 And then ultimately find its way to the goal. 00:31:48.230 --> 00:31:51.680 And maybe you got lucky and maybe you made a different choice earlier on, 00:31:51.680 --> 00:31:54.590 but ultimately this is how depth first search is going to work. 00:31:54.590 --> 00:31:56.900 It's going to keep following until it hits a dead end. 
00:31:56.900 --> 00:32:00.860 And when it hits a dead end, it backs up and looks for a different solution. 00:32:00.860 --> 00:32:02.570 And so one thing you might reasonably ask 00:32:02.570 --> 00:32:05.010 is, is this algorithm always going to work? 00:32:05.010 --> 00:32:09.200 Will it always actually find a way to get from the initial state to the goal? 00:32:09.200 --> 00:32:11.420 And it turns out that as long as our maze 00:32:11.420 --> 00:32:14.630 is finite, as long as there are finitely many spaces where 00:32:14.630 --> 00:32:15.890 we can travel, then yes. 00:32:15.890 --> 00:32:19.900 Depth first search is going to find a solution because eventually it 00:32:19.900 --> 00:32:21.290 will just explore everything. 00:32:21.290 --> 00:32:24.590 If the maze happens to be infinite and there's an infinite state space, which 00:32:24.590 --> 00:32:28.350 does exist in certain types of problems, then it's a slightly different story. 00:32:28.350 --> 00:32:30.950 But as long as our maze has finitely many squares, 00:32:30.950 --> 00:32:33.020 we're going to find a solution. 00:32:33.020 --> 00:32:34.940 The next question, though, that we want to ask 00:32:34.940 --> 00:32:37.040 is, is it going to be a good solution? 00:32:37.040 --> 00:32:39.740 Is it the optimal solution that we can find? 00:32:39.740 --> 00:32:42.190 And the answer there is not necessarily. 00:32:42.190 --> 00:32:44.030 And let's take a look at an example of that. 00:32:44.030 --> 00:32:48.890 In this maze, for example, we're again trying to find our way from A to B. 00:32:48.890 --> 00:32:51.470 And you notice here there are multiple possible solutions. 00:32:51.470 --> 00:32:54.560 We could go this way, or we could go up in order 00:32:54.560 --> 00:32:57.350 to make our way from A to B. Now if we're lucky, 00:32:57.350 --> 00:33:01.040 depth first search will choose this way and get to B. 
But there's no reason, 00:33:01.040 --> 00:33:03.140 necessarily, why depth first search would choose 00:33:03.140 --> 00:33:05.440 between going up or going to the right. 00:33:05.440 --> 00:33:08.270 It's sort of an arbitrary decision point because both 00:33:08.270 --> 00:33:10.420 are going to be added to the frontier. 00:33:10.420 --> 00:33:13.310 And ultimately, if we get unlucky, depth first search 00:33:13.310 --> 00:33:15.950 might choose to explore this path first because it's just 00:33:15.950 --> 00:33:17.430 a random choice at this point. 00:33:17.430 --> 00:33:20.570 It will explore, explore, explore, and it'll eventually 00:33:20.570 --> 00:33:24.110 find the goal, this particular path, when in actuality there 00:33:24.110 --> 00:33:25.010 was a better path. 00:33:25.010 --> 00:33:28.940 There was a more optimal solution that used fewer steps, 00:33:28.940 --> 00:33:32.660 assuming we're measuring the cost of a solution based on the number of steps 00:33:32.660 --> 00:33:33.930 that we need to take. 00:33:33.930 --> 00:33:36.680 So depth first search, if we're unlucky, might end up 00:33:36.680 --> 00:33:41.880 not finding the best solution when a better solution is available. 00:33:41.880 --> 00:33:44.370 So that's DFS, depth first search. 00:33:44.370 --> 00:33:47.300 How does BFS, or breadth first search, compare? 00:33:47.300 --> 00:33:49.550 How would it work in this particular situation? 00:33:49.550 --> 00:33:52.490 Well the algorithm is going to look very different visually 00:33:52.490 --> 00:33:54.800 in terms of how BFS explores. 00:33:54.800 --> 00:33:57.800 Because BFS looks at shallower nodes first, 00:33:57.800 --> 00:34:01.850 the idea is going to be BFS will first look at all of the nodes that 00:34:01.850 --> 00:34:04.190 are one away from the initial state. 00:34:04.190 --> 00:34:06.050 Look here and look here, for example. 00:34:06.050 --> 00:34:10.590 Just at the two nodes that are immediately next to this initial state. 
00:34:10.590 --> 00:34:12.470 Then it will explore nodes that are two away, 00:34:12.470 --> 00:34:15.050 looking at this state and that state, for example. 00:34:15.050 --> 00:34:18.140 Then it will explore nodes that are three away, this state and that state. 00:34:18.140 --> 00:34:22.190 Whereas depth first search just picked one path and kept following it, 00:34:22.190 --> 00:34:24.210 breadth first search on the other hand, is 00:34:24.210 --> 00:34:27.570 taking the option of exploring all of the possible paths 00:34:27.570 --> 00:34:30.150 kind of at the same time, bouncing back between them, 00:34:30.150 --> 00:34:32.520 looking deeper and deeper at each one, but making 00:34:32.520 --> 00:34:35.400 sure to explore the shallower ones or the ones that are 00:34:35.400 --> 00:34:38.060 closer to the initial state earlier. 00:34:38.060 --> 00:34:41.190 So we'll keep following this pattern, looking at things that are four away, 00:34:41.190 --> 00:34:43.170 looking at things that are five away, looking 00:34:43.170 --> 00:34:48.230 at things that are six away, until eventually we make our way to the goal. 00:34:48.230 --> 00:34:51.090 And in this case, it's true we had to explore some states that 00:34:51.090 --> 00:34:52.880 ultimately didn't lead us anywhere. 00:34:52.880 --> 00:34:56.290 But the path that we found to the goal was the optimal path. 00:34:56.290 --> 00:34:59.850 This is the shortest way that we could get to the goal. 00:34:59.850 --> 00:35:02.970 And so, what might happen then in a larger maze? 00:35:02.970 --> 00:35:04.850 Well let's take a look at something like this 00:35:04.850 --> 00:35:06.860 and how breadth first search is going to behave. 00:35:06.860 --> 00:35:08.690 Well, breadth first search, again, will just 00:35:08.690 --> 00:35:11.270 keep following the states until it reaches a decision point. 00:35:11.270 --> 00:35:13.460 It could go either left or right. 
00:35:13.460 --> 00:35:16.850 And while DFS just picked one and kept following 00:35:16.850 --> 00:35:21.620 that until it hit a dead end, BFS on the other hand, will explore both. 00:35:21.620 --> 00:35:23.720 It'll say, look at this node, then this node, 00:35:23.720 --> 00:35:27.510 and I'll look at this node, then that node, so on and so forth. 00:35:27.510 --> 00:35:31.850 And when it hits a decision point here, rather than picking one, left or right, 00:35:31.850 --> 00:35:36.350 and exploring that path, it will again explore both, alternating between them, 00:35:36.350 --> 00:35:37.380 going deeper and deeper. 00:35:37.380 --> 00:35:41.570 We'll explore here, and then maybe here and here, and then keep going. 00:35:41.570 --> 00:35:44.780 We'll explore here and slowly make our way, as you can visually 00:35:44.780 --> 00:35:46.490 see, further and further out. 00:35:46.490 --> 00:35:48.200 Once we get to this decision point, we'll 00:35:48.200 --> 00:35:52.790 explore both up and down until, ultimately, we 00:35:52.790 --> 00:35:55.570 make our way to the goal. 00:35:55.570 --> 00:35:58.250 And what you'll notice is, yes, breadth first search 00:35:58.250 --> 00:36:02.630 did find our way from A to B by following this particular path. 00:36:02.630 --> 00:36:06.410 But it needed to explore a lot of states in order to do so. 00:36:06.410 --> 00:36:09.440 And so we see some tradeoff here between DFS and BFS. 00:36:09.440 --> 00:36:13.430 That in DFS there may be some cases where there is some memory savings, 00:36:13.430 --> 00:36:16.280 as compared to a breadth first approach where 00:36:16.280 --> 00:36:19.210 breadth first search, in this case, had to explore a lot of states. 00:36:19.210 --> 00:36:22.470 But maybe that won't always be the case. 00:36:22.470 --> 00:36:24.920 So now let's actually turn our attention to some code. 
00:36:24.920 --> 00:36:26.720 And look at the code that we could actually 00:36:26.720 --> 00:36:30.410 write in order to implement something like depth first search or breadth 00:36:30.410 --> 00:36:34.910 first search in the context of solving a maze, for example. 00:36:34.910 --> 00:36:37.340 So I'll go ahead and go into my terminal. 00:36:37.340 --> 00:36:41.240 And what I have here inside of maze.py is an implementation 00:36:41.240 --> 00:36:43.640 of this same idea of maze solving. 00:36:43.640 --> 00:36:46.700 I've defined a class called Node that in this case 00:36:46.700 --> 00:36:49.620 is keeping track of the state, the parent, in other words 00:36:49.620 --> 00:36:51.860 the state before the state, and the action. 00:36:51.860 --> 00:36:54.110 In this case, we're not keeping track of the path cost 00:36:54.110 --> 00:36:56.780 because we can calculate the cost of the path at the end 00:36:56.780 --> 00:37:00.890 after we found our way from the initial state to the goal. 00:37:00.890 --> 00:37:05.540 In addition to this, I've defined a class called StackFrontier. 00:37:05.540 --> 00:37:08.780 And if you're unfamiliar with a class, a class is a way for me 00:37:08.780 --> 00:37:11.930 to define a way to generate objects in Python. 00:37:11.930 --> 00:37:16.070 It refers to an idea of object oriented programming where the idea here 00:37:16.070 --> 00:37:18.740 is that I would like to create an object that is 00:37:18.740 --> 00:37:20.900 able to store all of my frontier data. 00:37:20.900 --> 00:37:23.030 And I would like to have functions, otherwise known 00:37:23.030 --> 00:37:27.410 as methods, on that object that I can use to manipulate the object. 00:37:27.410 --> 00:37:31.130 And so what's going on here, if you're unfamiliar with the syntax, 00:37:31.130 --> 00:37:34.430 is I have a function that initially creates a frontier 00:37:34.430 --> 00:37:36.410 that I'm going to represent using a list. 
00:37:36.410 --> 00:37:39.810 And initially my frontier is represented by the empty list. 00:37:39.810 --> 00:37:42.870 There's nothing in my frontier to begin with. 00:37:42.870 --> 00:37:46.020 I have an add function that adds something to the frontier, 00:37:46.020 --> 00:37:49.260 by appending it to the end of the list. 00:37:49.260 --> 00:37:53.400 I have a function that checks if the frontier contains a particular state. 00:37:53.400 --> 00:37:55.990 I have an empty function that checks if the frontier is empty. 00:37:55.990 --> 00:37:58.830 If the frontier is empty, that just means the length of the frontier 00:37:58.830 --> 00:38:00.180 is zero. 00:38:00.180 --> 00:38:03.310 And then I have a function for removing something from the frontier. 00:38:03.310 --> 00:38:06.150 I can't remove something from the frontier if the frontier is empty. 00:38:06.150 --> 00:38:07.770 So I check for that first. 00:38:07.770 --> 00:38:10.710 But otherwise, if the frontier isn't empty, 00:38:10.710 --> 00:38:13.940 recall that I'm implementing this frontier as a stack, 00:38:13.940 --> 00:38:17.200 a last in, first out data structure. 00:38:17.200 --> 00:38:19.590 Which means the last thing I add to the frontier, 00:38:19.590 --> 00:38:21.570 in other words, the last thing in the list, 00:38:21.570 --> 00:38:25.750 is the item that I should remove from this frontier. 00:38:25.750 --> 00:38:30.190 So what you'll see here is I remove the last item of the list. 00:38:30.190 --> 00:38:33.160 And if you index into a Python list with negative one, 00:38:33.160 --> 00:38:35.050 that gets you the last item in the list. 00:38:35.050 --> 00:38:37.630 Since zero is the first item, negative one 00:38:37.630 --> 00:38:41.490 kind of wraps around and gets you to the last item in the list. 00:38:41.490 --> 00:38:43.310 So we take that last item.
00:38:43.310 --> 00:38:46.710 We call that node, and we update the frontier here on line 28 to say, 00:38:46.710 --> 00:38:50.000 go ahead and remove that node from the frontier. 00:38:50.000 --> 00:38:54.860 And then we return the node as a result. So this class here effectively 00:38:54.860 --> 00:38:57.050 implements the idea of a frontier. 00:38:57.050 --> 00:38:59.690 It gives me a way to add something to a frontier and a way 00:38:59.690 --> 00:39:03.130 to remove something from the frontier as a stack. 00:39:03.130 --> 00:39:06.830 I've also, just for good measure, implemented an alternative version 00:39:06.830 --> 00:39:09.500 of the same thing called a QueueFrontier. 00:39:09.500 --> 00:39:13.190 Which, in parentheses you'll see here, it inherits from StackFrontier, 00:39:13.190 --> 00:39:16.640 meaning it's going to do all the same things that the StackFrontier did, 00:39:16.640 --> 00:39:19.440 except the way we remove a node from the frontier 00:39:19.440 --> 00:39:21.090 is going to be slightly different. 00:39:21.090 --> 00:39:24.240 Instead of removing from the end of the list the way we would in a stack, 00:39:24.240 --> 00:39:26.810 we're instead going to remove from the beginning of the list. 00:39:26.810 --> 00:39:31.590 self.frontier[0] will get me the first node in the frontier, 00:39:31.590 --> 00:39:32.870 the first one that was added. 00:39:32.870 --> 00:39:37.550 And that is going to be the one that we return in the case of a queue. 00:39:37.550 --> 00:39:40.340 Under here I have a definition of a class called Maze. 00:39:40.340 --> 00:39:45.050 This is going to handle the process of taking a maze-like text 00:39:45.050 --> 00:39:47.400 file and figuring out how to solve it.
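Those two frontier classes might look something like this — a compact sketch of what was just described, which may differ in details from the actual maze.py:

```python
# A sketch of the two frontier classes described above. StackFrontier
# removes the most recently added node (last in, first out), which is
# what depth-first search needs; QueueFrontier inherits everything and
# only overrides remove to take the earliest node (first in, first out).
class StackFrontier:
    def __init__(self):
        self.frontier = []  # initially, the frontier is an empty list

    def add(self, node):
        self.frontier.append(node)  # append to the end of the list

    def contains_state(self, state):
        return any(node.state == state for node in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[-1]            # [-1] is the last item in the list
        self.frontier = self.frontier[:-1]  # drop it from the frontier
        return node


class QueueFrontier(StackFrontier):
    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[0]             # [0] is the first item in the list
        self.frontier = self.frontier[1:]   # drop it from the frontier
        return node
```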
00:39:47.400 --> 00:39:50.480 So we'll take as input a text file that looks something 00:39:50.480 --> 00:39:53.810 like this, for example, where we see hash marks that are here representing 00:39:53.810 --> 00:39:57.830 walls and I have the character A representing the starting position, 00:39:57.830 --> 00:40:01.740 and the character B representing the ending position. 00:40:01.740 --> 00:40:04.790 And you can take a look at the code for parsing this text file right now. 00:40:04.790 --> 00:40:06.320 That's the less interesting part. 00:40:06.320 --> 00:40:09.380 The more interesting part is this solve function here, 00:40:09.380 --> 00:40:11.510 where the solve function is going to figure out 00:40:11.510 --> 00:40:15.230 how to actually get from point A to point B. 00:40:15.230 --> 00:40:18.260 And here we see an implementation of the exact same idea 00:40:18.260 --> 00:40:19.900 we saw from a moment ago. 00:40:19.900 --> 00:40:21.740 We're going to keep track of how many states 00:40:21.740 --> 00:40:24.520 we've explored just so we can report that data later. 00:40:24.520 --> 00:40:29.770 But I start with a node that represents just the start state. 00:40:29.770 --> 00:40:34.040 And I start with a frontier that in this case is a stack frontier. 00:40:34.040 --> 00:40:36.170 And given that I'm treating my frontier as a stack, 00:40:36.170 --> 00:40:40.160 you might imagine that the algorithm I'm using here is now depth first search. 00:40:40.160 --> 00:40:45.230 Because depth first search or DFS uses a stack as its data structure. 00:40:45.230 --> 00:40:50.350 And initially, this frontier is just going to contain the start state. 00:40:50.350 --> 00:40:53.390 We initialize an explored set that initially is empty. 00:40:53.390 --> 00:40:55.420 There's nothing we've explored so far. 00:40:55.420 --> 00:40:59.980 And now here's our loop, that notion of repeating something again and again. 
00:40:59.980 --> 00:41:03.760 First, we check if the frontier is empty by calling that empty function that we 00:41:03.760 --> 00:41:05.890 saw the implementation of a moment ago. 00:41:05.890 --> 00:41:08.080 And if the frontier is indeed empty, we'll 00:41:08.080 --> 00:41:11.650 go ahead and raise an exception, or a Python error, to say, sorry, 00:41:11.650 --> 00:41:15.230 there is no solution to this problem. 00:41:15.230 --> 00:41:18.630 Otherwise, we'll go ahead and remove a node from the frontier, 00:41:18.630 --> 00:41:22.960 by calling frontier.remove, and update the number of states we've explored. 00:41:22.960 --> 00:41:25.360 Because now we've explored one additional state, 00:41:25.360 --> 00:41:30.100 we say self.num_explored += 1, adding one to the number of states 00:41:30.100 --> 00:41:31.880 we've explored. 00:41:31.880 --> 00:41:34.430 Once we remove a node from the frontier, recall 00:41:34.430 --> 00:41:38.330 that the next step is to see whether or not it's the goal, the goal test. 00:41:38.330 --> 00:41:40.850 And in the case of the maze, the goal test is pretty easy. 00:41:40.850 --> 00:41:45.110 I check to see whether the state of the node is equal to the goal. 00:41:45.110 --> 00:41:47.090 Initially, when I set up the maze, I set up 00:41:47.090 --> 00:41:49.890 a value called goal, which is a property of the maze, 00:41:49.890 --> 00:41:53.300 so I can just check to see if the node's state is actually the goal. 00:41:53.300 --> 00:41:56.060 And if it is the goal, then what I want to do 00:41:56.060 --> 00:41:59.450 is backtrack my way towards figuring out what actions 00:41:59.450 --> 00:42:02.360 I took in order to get to this goal. 00:42:02.360 --> 00:42:03.470 And how do I do that? 00:42:03.470 --> 00:42:06.140 Well, recall that every node stores its parent-- 00:42:06.140 --> 00:42:09.110 the node that came before it that we used to get to this node-- 00:42:09.110 --> 00:42:11.690 and also the action used in order to get there.
00:42:11.690 --> 00:42:13.700 So I can create this loop where I'm constantly 00:42:13.700 --> 00:42:17.000 just looking at the parent of every node and keeping 00:42:17.000 --> 00:42:20.600 track, for all of the parents, what action I took to get from the parent 00:42:20.600 --> 00:42:21.890 to this. 00:42:21.890 --> 00:42:25.250 So this loop is going to keep repeating this process of looking through all 00:42:25.250 --> 00:42:28.670 of the parent nodes until we get back to the initial state, which 00:42:28.670 --> 00:42:32.940 has no parent, where node.parent is going to be equal to none. 00:42:32.940 --> 00:42:35.240 As I do so, I'm going to be building up the list of all 00:42:35.240 --> 00:42:38.030 of the actions that I'm following and the list of all of the cells 00:42:38.030 --> 00:42:39.590 that are part of the solution. 00:42:39.590 --> 00:42:42.020 But I'll reverse them because when I build it 00:42:42.020 --> 00:42:44.930 up going from the goal back to the initial state, 00:42:44.930 --> 00:42:48.020 I'm building the sequence of actions from the goal to the initial state, 00:42:48.020 --> 00:42:50.900 but I want to reverse them in order to get the sequence of actions 00:42:50.900 --> 00:42:53.630 from the initial state to the goal. 00:42:53.630 --> 00:42:57.210 And that is, ultimately, going to be the solution. 00:42:57.210 --> 00:43:01.280 So all of that happens if the current state is equal to the goal. 00:43:01.280 --> 00:43:03.290 And otherwise, if it's not the goal, well, 00:43:03.290 --> 00:43:06.860 then I'll go ahead and add this state to the explored set to say, 00:43:06.860 --> 00:43:08.240 I've explored this state now. 00:43:08.240 --> 00:43:11.510 No need to go back to it if I come across it in the future. 00:43:11.510 --> 00:43:14.750 And then, this logic here implements the idea 00:43:14.750 --> 00:43:16.820 of adding neighbors to the frontier. 00:43:16.820 --> 00:43:18.650 I'm saying, look at all of my neighbors. 
00:43:18.650 --> 00:43:21.530 And I implemented a function called neighbors that you can take a look at. 00:43:21.530 --> 00:43:23.690 And for each of those neighbors, I'm going to check, 00:43:23.690 --> 00:43:25.850 is the state already in the frontier? 00:43:25.850 --> 00:43:28.400 Is the state already in the explored set? 00:43:28.400 --> 00:43:32.600 And if it's not in either of those, then I'll go ahead and add this new child 00:43:32.600 --> 00:43:33.950 node-- this new node-- 00:43:33.950 --> 00:43:35.230 to the frontier. 00:43:35.230 --> 00:43:37.610 So there's a fair amount of syntax here, but the key here 00:43:37.610 --> 00:43:39.920 is not to understand all the nuances of the syntax, 00:43:39.920 --> 00:43:42.740 though feel free to take a closer look at this file on your own 00:43:42.740 --> 00:43:44.660 to get a sense for how it is working. 00:43:44.660 --> 00:43:48.140 But the key is to see how this is an implementation of the same pseudocode, 00:43:48.140 --> 00:43:53.060 the same idea that we were describing a moment ago on the screen when we were 00:43:53.060 --> 00:43:55.220 looking at the steps that we might follow in order 00:43:55.220 --> 00:43:57.720 to solve this kind of search problem. 00:43:57.720 --> 00:43:59.540 So now let's actually see this in action. 00:43:59.540 --> 00:44:05.540 I'll go ahead and run maze.py on maze1.txt, for example. 00:44:05.540 --> 00:44:09.320 And what we'll see is here we have a printout of what the maze initially 00:44:09.320 --> 00:44:10.370 looked like. 00:44:10.370 --> 00:44:13.010 And then here, down below, is after we've solved it. 00:44:13.010 --> 00:44:17.980 We had to explore 11 states in order to do it, and we found a path from A to B. 
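Putting those pieces together, the solve loop walked through here can be sketched as follows. This is a simplified, self-contained illustration rather than the exact maze.py code: a generic neighbors(state) function (an assumption of this sketch) stands in for the Maze class's neighbor logic, and compact versions of the Node and frontier classes are repeated so the example runs on its own.

```python
# Compact versions of the classes described earlier, repeated here so
# this sketch stands alone.
class Node:
    def __init__(self, state, parent, action):
        self.state, self.parent, self.action = state, parent, action


class StackFrontier:
    def __init__(self):
        self.frontier = []

    def add(self, node):
        self.frontier.append(node)

    def contains_state(self, state):
        return any(n.state == state for n in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        return self.frontier.pop()      # last in, first out -> DFS


class QueueFrontier(StackFrontier):
    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        return self.frontier.pop(0)     # first in, first out -> BFS


def solve(start, goal, neighbors, Frontier=StackFrontier):
    """Search for a path from start to goal.

    neighbors(state) should yield (action, next_state) pairs.
    Frontier picks the algorithm: StackFrontier for depth-first
    search, QueueFrontier for breadth-first search.
    """
    num_explored = 0

    # Start with a frontier that contains just the initial state,
    # and an explored set that is initially empty.
    frontier = Frontier()
    frontier.add(Node(state=start, parent=None, action=None))
    explored = set()

    while True:
        # If nothing is left in the frontier, there is no solution.
        if frontier.empty():
            raise Exception("no solution")

        # Remove a node from the frontier and count it as explored.
        node = frontier.remove()
        num_explored += 1

        # Goal test: if this node is the goal, backtrack through the
        # parents to collect the actions and cells, then reverse them
        # so they run from the initial state to the goal.
        if node.state == goal:
            actions, cells = [], []
            while node.parent is not None:
                actions.append(node.action)
                cells.append(node.state)
                node = node.parent
            actions.reverse()
            cells.reverse()
            return actions, cells, num_explored

        # Otherwise, mark the state as explored and add each neighbor
        # that is in neither the frontier nor the explored set.
        explored.add(node.state)
        for action, state in neighbors(node.state):
            if not frontier.contains_state(state) and state not in explored:
                frontier.add(Node(state=state, parent=node, action=action))
```

The only difference between the two algorithms in this sketch is which frontier class gets passed in; everything else about the loop stays the same.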
00:44:17.980 --> 00:44:21.110 And in this program, I just happened to generate a graphical representation 00:44:21.110 --> 00:44:22.040 of this, as well-- 00:44:22.040 --> 00:44:25.400 so I can open up maze.png, which is generated by this program-- 00:44:25.400 --> 00:44:28.820 that shows you where, in the darker color here, the wall is. 00:44:28.820 --> 00:44:30.860 Red is the initial state, green is the goal, 00:44:30.860 --> 00:44:32.930 and yellow is the path that was followed. 00:44:32.930 --> 00:44:37.230 We found a path from the initial state to the goal. 00:44:37.230 --> 00:44:40.050 But now let's take a look at a more sophisticated maze 00:44:40.050 --> 00:44:42.120 to see what might happen instead. 00:44:42.120 --> 00:44:47.010 Let's look now at maze2.txt, where now here we have a much larger maze. 00:44:47.010 --> 00:44:50.280 Again, we're trying to find our way from point A to point B, 00:44:50.280 --> 00:44:53.520 but now you might imagine that depth-first search might not be so lucky. 00:44:53.520 --> 00:44:56.010 It might not get to the goal on the first try. 00:44:56.010 --> 00:44:58.560 It might have to follow one path, then backtrack 00:44:58.560 --> 00:45:02.100 and explore something else a little bit later. 00:45:02.100 --> 00:45:03.230 So let's try this. 00:45:03.230 --> 00:45:08.930 Run python maze.py maze2.txt, this time trying it on this other maze. 00:45:08.930 --> 00:45:12.140 And now depth-first search is able to find a solution. 00:45:12.140 --> 00:45:16.040 Here, as indicated by the stars, is a way to get from A to B. 00:45:16.040 --> 00:45:19.430 And we can represent this visually by opening up this maze. 00:45:19.430 --> 00:45:20.810 Here's what that maze looks like. 00:45:20.810 --> 00:45:24.860 And highlighted in yellow is the path that was found from the initial state 00:45:24.860 --> 00:45:26.300 to the goal. 00:45:26.300 --> 00:45:31.340 But how many states did we have to explore before we found that path?
00:45:31.340 --> 00:45:34.610 Well, recall that, in my program, I was keeping track of the number of states 00:45:34.610 --> 00:45:36.530 that we've explored so far. 00:45:36.530 --> 00:45:40.280 And so I can go back to the terminal and see that, all right, in order 00:45:40.280 --> 00:45:46.110 to solve this problem, we had to explore 399 different states. 00:45:46.110 --> 00:45:48.860 And in fact, if I make one small modification to the program 00:45:48.860 --> 00:45:51.950 and tell the program, at the end when we output this image, 00:45:51.950 --> 00:45:55.160 to show the explored states-- I add an argument called show_explored. 00:45:55.160 --> 00:45:57.980 And if I set show_explored equal to true 00:45:57.980 --> 00:46:02.720 and rerun this program, python maze.py, running it on maze2, 00:46:02.720 --> 00:46:06.320 and then I open the maze, what you'll see here, highlighted in red, 00:46:06.320 --> 00:46:10.610 are all of the states that had to be explored to get from the initial state 00:46:10.610 --> 00:46:11.510 to the goal. 00:46:11.510 --> 00:46:15.140 Depth-first search, or DFS, didn't find its way to the goal right away. 00:46:15.140 --> 00:46:18.170 It made a choice to first explore this direction. 00:46:18.170 --> 00:46:19.970 And when it explored this direction, it had 00:46:19.970 --> 00:46:22.280 to follow every conceivable path, all the way 00:46:22.280 --> 00:46:24.680 to the very end, even this long and winding one, 00:46:24.680 --> 00:46:27.440 in order to realize that, you know what, that's a dead end. 00:46:27.440 --> 00:46:29.720 And instead, the program needed to backtrack. 00:46:29.720 --> 00:46:32.660 After going this direction, it must have gone this direction. 00:46:32.660 --> 00:46:35.490 It got lucky here by just not choosing this path.
00:46:35.490 --> 00:46:39.290 But it got unlucky here, exploring this direction, exploring a bunch of states 00:46:39.290 --> 00:46:41.150 that it didn't need to and then, likewise, 00:46:41.150 --> 00:46:43.490 exploring all of this top part of the graph 00:46:43.490 --> 00:46:46.320 when it probably didn't need to do that either. 00:46:46.320 --> 00:46:49.070 So all in all, depth-first search here really 00:46:49.070 --> 00:46:52.970 not performing optimally, or probably exploring more states than it needs to. 00:46:52.970 --> 00:46:56.600 It finds an optimal solution, the best path to the goal, 00:46:56.600 --> 00:46:59.510 but the number of states needed to explore in order to do so, 00:46:59.510 --> 00:47:03.060 the number of steps I had to take, that was much higher. 00:47:03.060 --> 00:47:04.070 So let's compare. 00:47:04.070 --> 00:47:09.060 How would Breadth-First Search, or BFS, do on this exact same maze instead? 00:47:09.060 --> 00:47:11.630 And in order to do so, it's a very easy change. 00:47:11.630 --> 00:47:16.550 The algorithm for DFS and BFS is identical with the exception 00:47:16.550 --> 00:47:20.950 of what data structure we use to represent the frontier. 00:47:20.950 --> 00:47:23.840 That in DFS I used a stack frontier-- 00:47:23.840 --> 00:47:25.880 last in, first out-- 00:47:25.880 --> 00:47:30.380 whereas in BFS, I'm going to use a queue frontier-- first in, 00:47:30.380 --> 00:47:33.260 first out, where the first thing I add to the frontier 00:47:33.260 --> 00:47:35.570 is the first thing that I remove. 00:47:35.570 --> 00:47:40.670 So I'll go back to the terminal, rerun this program on the same maze, 00:47:40.670 --> 00:47:45.280 and now you'll see that the number of states we had to explore was only 77, 00:47:45.280 --> 00:47:49.140 as compared to almost 400 when we used depth-first search. 00:47:49.140 --> 00:47:50.330 And we can see exactly why. 
00:47:50.330 --> 00:47:54.980 We can see what happened if we open up maze.png now and take a look. 00:47:54.980 --> 00:47:59.540 Again, the yellow highlight is the solution that breadth-first search found, 00:47:59.540 --> 00:48:03.020 which, incidentally, is the same solution that depth-first search found. 00:48:03.020 --> 00:48:07.110 They're both finding the best solution, but notice all the white unexplored 00:48:07.110 --> 00:48:07.610 cells. 00:48:07.610 --> 00:48:10.700 There were far fewer states that needed to be explored 00:48:10.700 --> 00:48:14.980 in order to make our way to the goal, because breadth-first search operates 00:48:14.980 --> 00:48:15.980 a little more shallowly. 00:48:15.980 --> 00:48:19.070 It's exploring things that are close to the initial state 00:48:19.070 --> 00:48:22.170 without exploring things that are further away. 00:48:22.170 --> 00:48:25.220 So if the goal is not too far away, then breadth-first search 00:48:25.220 --> 00:48:27.950 can actually behave quite effectively on a maze that 00:48:27.950 --> 00:48:30.870 looks a little something like this. 00:48:30.870 --> 00:48:35.750 Now, in this case, both BFS and DFS ended up finding the same solution, 00:48:35.750 --> 00:48:37.680 but that won't always be the case. 00:48:37.680 --> 00:48:43.390 And in fact, let's take a look at one more example, for instance, maze3.txt. 00:48:43.390 --> 00:48:46.970 In maze3.txt, notice that here there are multiple ways 00:48:46.970 --> 00:48:49.190 that you could get from A to B. 00:48:49.190 --> 00:48:52.040 It's a relatively small maze, but let's look at what happens. 00:48:52.040 --> 00:48:55.670 If I use-- and I'll go ahead and turn off show_explored, so 00:48:55.670 --> 00:48:58.390 we just see the solution. 00:48:58.390 --> 00:49:04.590 If I use BFS, breadth-first search, to solve maze3.txt, 00:49:04.590 --> 00:49:06.420 well, then we find a solution. 00:49:06.420 --> 00:49:09.530 And if I open up the maze, here's the solution that we found.
00:49:09.530 --> 00:49:10.610 It is the optimal one. 00:49:10.610 --> 00:49:13.700 With just four steps, we can get from the initial state 00:49:13.700 --> 00:49:17.090 to where the goal happens to be. 00:49:17.090 --> 00:49:21.890 But what happens if we try to use depth-first search, or DFS, instead? 00:49:21.890 --> 00:49:26.540 Well, again, I'll go back up to my QueueFrontier, where a queue frontier means 00:49:26.540 --> 00:49:28.850 that we're using breadth-first search. 00:49:28.850 --> 00:49:32.120 And I'll change it to a StackFrontier, which means that now we'll 00:49:32.120 --> 00:49:34.850 be using depth-first search. 00:49:34.850 --> 00:49:37.910 I'll rerun python maze.py. 00:49:37.910 --> 00:49:40.490 And now you'll see that we find a solution, 00:49:40.490 --> 00:49:42.980 but it is not the optimal solution. 00:49:42.980 --> 00:49:45.640 This, instead, is what our algorithm finds. 00:49:45.640 --> 00:49:48.140 And maybe depth-first search would have found the optimal solution. 00:49:48.140 --> 00:49:51.410 It's possible, but it's not guaranteed. If we just 00:49:51.410 --> 00:49:55.280 happen to be unlucky-- if we choose this state instead of that state-- 00:49:55.280 --> 00:49:58.280 then depth-first search might find a longer route to get 00:49:58.280 --> 00:50:01.260 from the initial state to the goal. 00:50:01.260 --> 00:50:04.280 So we do see some trade-offs here, where depth-first search might not 00:50:04.280 --> 00:50:06.240 find the optimal solution. 00:50:06.240 --> 00:50:09.080 So at that point, it seems like breadth-first search is pretty good. 00:50:09.080 --> 00:50:12.920 Is that the best we can do, where it's going to find us the optimal solution 00:50:12.920 --> 00:50:15.620 and we don't have to worry about situations where 00:50:15.620 --> 00:50:20.270 we might end up finding a longer path to the solution than what actually exists?
00:50:20.270 --> 00:50:23.150 Well, in a case where the goal is far away from the initial state-- 00:50:23.150 --> 00:50:26.780 and we might have to take lots of steps in order to get from the initial state 00:50:26.780 --> 00:50:27.620 to the goal-- 00:50:27.620 --> 00:50:31.220 what ended up happening is that this algorithm, BFS, ended up 00:50:31.220 --> 00:50:35.480 exploring basically the entire graph, having to go through the entire maze 00:50:35.480 --> 00:50:39.800 in order to find its way from the initial state to the goal state. 00:50:39.800 --> 00:50:41.960 What we'd ultimately like is for our algorithm 00:50:41.960 --> 00:50:44.320 to be a little bit more intelligent. 00:50:44.320 --> 00:50:46.310 Now, what would it mean for our algorithm 00:50:46.310 --> 00:50:49.820 to be a little bit more intelligent, in this case? 00:50:49.820 --> 00:50:52.490 Well, let's look back to where breadth-first search might 00:50:52.490 --> 00:50:54.290 have been able to make a different decision 00:50:54.290 --> 00:50:57.570 and consider human intuition in this process, as well. 00:50:57.570 --> 00:51:01.640 Like, what might a human do when solving this maze that is different than what 00:51:01.640 --> 00:51:04.490 BFS ultimately chose to do? 00:51:04.490 --> 00:51:07.610 Well, the very first decision point that BFS made 00:51:07.610 --> 00:51:11.420 was right here, after it made five steps and ended up 00:51:11.420 --> 00:51:13.340 in a position where it had a fork in the road. 00:51:13.340 --> 00:51:15.210 It could either go left or it could go right. 00:51:15.210 --> 00:51:17.460 In these initial couple of steps, there was no choice. 00:51:17.460 --> 00:51:20.790 There was only one action that could be taken from each of those states. 00:51:20.790 --> 00:51:23.030 And so the search algorithm did the only thing 00:51:23.030 --> 00:51:25.010 that any search algorithm could do, which 00:51:25.010 --> 00:51:28.400 is keep following one state after the next.
00:51:28.400 --> 00:51:31.670 But this decision point is where things get a little bit interesting. 00:51:31.670 --> 00:51:34.850 Depth-first search, that very first search algorithm we looked at, 00:51:34.850 --> 00:51:38.750 chose to say, let's pick one path and exhaust that path, 00:51:38.750 --> 00:51:42.560 see if anything that way has the goal, and if not, then let's 00:51:42.560 --> 00:51:43.800 try the other way. 00:51:43.800 --> 00:51:46.580 Breadth-first search took the alternative approach of saying, 00:51:46.580 --> 00:51:47.210 you know what? 00:51:47.210 --> 00:51:51.510 Let's explore things that are shallow, close to us first, look left and right, 00:51:51.510 --> 00:51:53.960 then back left and back right, so on and so forth, 00:51:53.960 --> 00:51:58.640 alternating between our options in the hopes of finding something nearby. 00:51:58.640 --> 00:52:02.390 But ultimately, what might a human do if confronted with a situation like this 00:52:02.390 --> 00:52:04.250 of go left or go right? 00:52:04.250 --> 00:52:07.010 Well, a human might visually see that, all right, 00:52:07.010 --> 00:52:11.360 I'm trying to get to state B, which is way up there, and going right just 00:52:11.360 --> 00:52:13.360 feels like it's closer to the goal. 00:52:13.360 --> 00:52:15.140 Like, it feels like going right should be 00:52:15.140 --> 00:52:17.870 better than going left because I'm making progress 00:52:17.870 --> 00:52:19.380 towards getting to that goal. 00:52:19.380 --> 00:52:22.340 Now, of course, there are a couple of assumptions that I'm making here. 00:52:22.340 --> 00:52:25.220 I'm making the assumption that we can represent 00:52:25.220 --> 00:52:27.080 this grid as, like, a two-dimensional grid, 00:52:27.080 --> 00:52:28.880 where I know the coordinates of everything. 00:52:28.880 --> 00:52:33.940 I know that A is in coordinate 0,0, and B is in some other coordinate pair. 
00:52:33.940 --> 00:52:37.070 And I know what coordinate I'm at now, so I can calculate that, yeah, going 00:52:37.070 --> 00:52:39.420 this way, that is closer to the goal. 00:52:39.420 --> 00:52:42.170 And that might be a reasonable assumption for some types of search 00:52:42.170 --> 00:52:44.060 problems, but maybe not in others. 00:52:44.060 --> 00:52:46.670 But for now, we'll go ahead and assume 00:52:46.670 --> 00:52:51.590 that I know what my current coordinate pair is, and I know the x,y coordinate 00:52:51.590 --> 00:52:53.680 of the goal that I'm trying to get to. 00:52:53.680 --> 00:52:56.540 And in this situation, I'd like an algorithm that 00:52:56.540 --> 00:52:59.090 is a little bit more intelligent and somehow knows 00:52:59.090 --> 00:53:02.180 that I should be making progress towards the goal, 00:53:02.180 --> 00:53:05.720 and this is probably the way to do that because, in a maze, 00:53:05.720 --> 00:53:08.480 moving in the coordinate direction of the goal 00:53:08.480 --> 00:53:11.880 is usually, though not always, a good thing. 00:53:11.880 --> 00:53:14.840 And so here we draw a distinction between two different types of search 00:53:14.840 --> 00:53:19.040 algorithms-- uninformed search and informed search. 00:53:19.040 --> 00:53:23.330 Uninformed search algorithms are algorithms like DFS and BFS, 00:53:23.330 --> 00:53:25.200 the two algorithms that we just looked at, 00:53:25.200 --> 00:53:29.540 which are search strategies that don't use any problem-specific knowledge 00:53:29.540 --> 00:53:31.400 to be able to solve the problem. 00:53:31.400 --> 00:53:34.610 DFS and BFS didn't really care about the structure 00:53:34.610 --> 00:53:38.270 of the maze or anything about the way that a maze is laid out in order 00:53:38.270 --> 00:53:39.330 to solve the problem.
00:53:39.330 --> 00:53:42.490 They just look at the actions available and choose from those actions, 00:53:42.490 --> 00:53:45.290 and it doesn't matter whether it's a maze or some other problem. 00:53:45.290 --> 00:53:48.020 The solution, or the way that it tries to solve the problem, 00:53:48.020 --> 00:53:51.320 is really fundamentally going to be the same. 00:53:51.320 --> 00:53:53.030 What we're going to take a look at now is 00:53:53.030 --> 00:53:55.370 an improvement upon uninformed search. 00:53:55.370 --> 00:53:57.830 We're going to take a look at informed search. 00:53:57.830 --> 00:54:00.440 Informed search algorithms are search strategies that 00:54:00.440 --> 00:54:05.800 use knowledge specific to the problem to be able to better find a solution. 00:54:05.800 --> 00:54:09.270 And in the case of a maze, this problem-specific knowledge 00:54:09.270 --> 00:54:11.270 is something like: being in a square 00:54:11.270 --> 00:54:14.420 that is geographically closer to the goal 00:54:14.420 --> 00:54:19.700 is better than being in a square that is geographically further away. 00:54:19.700 --> 00:54:23.330 And this is something we can only know by thinking about this problem 00:54:23.330 --> 00:54:27.820 and reasoning about what knowledge might be helpful for our AI agent 00:54:27.820 --> 00:54:30.060 to know a little something about. 00:54:30.060 --> 00:54:32.440 There are a number of different types of informed search. 00:54:32.440 --> 00:54:35.140 Specifically, first, we're going to look at a particular type 00:54:35.140 --> 00:54:39.550 of search algorithm called greedy best-first search.
00:54:39.550 --> 00:54:42.700 Greedy best-first search, often abbreviated GBFS, 00:54:42.700 --> 00:54:45.970 is a search algorithm that, instead of expanding the deepest node, 00:54:45.970 --> 00:54:49.660 like DFS, or the shallowest node, like BFS, 00:54:49.660 --> 00:54:52.390 is always going to expand the node 00:54:52.390 --> 00:54:55.990 that it thinks is closest to the goal. 00:54:55.990 --> 00:54:59.650 Now, the search algorithm isn't going to know for sure whether it is the closest 00:54:59.650 --> 00:55:02.650 thing to the goal, because if we knew what was closest to the goal 00:55:02.650 --> 00:55:05.100 all the time, then we would already have a solution. 00:55:05.100 --> 00:55:07.150 Like, if we had the knowledge of what is close to the goal, 00:55:07.150 --> 00:55:10.570 we could just follow those steps in order to get from the initial position 00:55:10.570 --> 00:55:11.870 to the solution. 00:55:11.870 --> 00:55:14.650 But if we don't know the solution-- meaning we don't know exactly 00:55:14.650 --> 00:55:16.450 what's closest to the goal-- 00:55:16.450 --> 00:55:19.120 instead, we can use an estimate of what's 00:55:19.120 --> 00:55:22.180 closest to the goal, otherwise known as a heuristic-- 00:55:22.180 --> 00:55:25.910 just some way of estimating whether or not we're close to the goal. 00:55:25.910 --> 00:55:29.800 And we'll do so using a heuristic function, conventionally called h(n), 00:55:29.800 --> 00:55:34.540 that takes a state as input and returns our estimate of how close we 00:55:34.540 --> 00:55:36.860 are to the goal. 00:55:36.860 --> 00:55:39.040 So what might this heuristic function actually 00:55:39.040 --> 00:55:42.100 look like in the case of a maze-solving algorithm? 00:55:42.100 --> 00:55:45.490 Where we're trying to solve a maze, what does a heuristic look like?
00:55:45.490 --> 00:55:48.910 Well, the heuristic needs to answer a question, like, between these two 00:55:48.910 --> 00:55:51.770 cells, C and D, which one is better? 00:55:51.770 --> 00:55:55.640 Which one would I rather be in if I'm trying to find my way to the goal? 00:55:55.640 --> 00:55:58.640 Well, any human could probably look at this and tell you, you know what? 00:55:58.640 --> 00:56:00.280 D looks like it's better. 00:56:00.280 --> 00:56:03.550 Even if the maze is convoluted and you haven't thought about all the walls, 00:56:03.550 --> 00:56:05.410 D is probably better. 00:56:05.410 --> 00:56:06.710 And why is D better? 00:56:06.710 --> 00:56:09.730 Well, because if you ignore the walls-- let's just pretend the walls 00:56:09.730 --> 00:56:14.290 don't exist for a moment and relax the problem, so to speak-- 00:56:14.290 --> 00:56:18.670 D, just in terms of coordinate pairs, is closer to this goal. 00:56:18.670 --> 00:56:21.700 It's fewer steps that I would need to take to get to the goal, 00:56:21.700 --> 00:56:24.160 as compared to C, even if you ignore the walls. 00:56:24.160 --> 00:56:29.160 If you just know the x,y coordinate of C, and the x,y coordinate of the goal, 00:56:29.160 --> 00:56:31.450 and likewise, you know the x,y coordinate of D, 00:56:31.450 --> 00:56:35.770 you can calculate that D, just geographically, ignoring the walls, 00:56:35.770 --> 00:56:37.110 looks like it's better. 00:56:37.110 --> 00:56:39.700 And so this is the heuristic function that we're going to use, 00:56:39.700 --> 00:56:42.880 and it's something called the Manhattan distance, one specific type 00:56:42.880 --> 00:56:46.690 of heuristic, where the heuristic is the number of squares vertically 00:56:46.690 --> 00:56:49.270 and horizontally-- so not 00:56:49.270 --> 00:56:53.320 allowing myself to go diagonally, just moving either up, down, left, or right.
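That Manhattan distance heuristic can be written as a small helper function — a sketch assuming states and goals are (x, y) coordinate pairs:

```python
# Manhattan distance heuristic, as described above: ignore the walls
# and count how many vertical plus horizontal steps separate a cell
# from the goal (no diagonal moves allowed).
def manhattan_distance(state, goal):
    x1, y1 = state
    x2, y2 = goal
    return abs(x1 - x2) + abs(y1 - y2)
```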
00:56:53.320 --> 00:56:58.030 How many steps do I need to take to get from each of these cells to the goal? 00:56:58.030 --> 00:57:00.910 Well, as it turns out, D is much closer. 00:57:00.910 --> 00:57:01.870 There are fewer steps. 00:57:01.870 --> 00:57:05.740 It only takes six steps in order to get to that goal. 00:57:05.740 --> 00:57:07.670 Again, here we're ignoring the walls. 00:57:07.670 --> 00:57:09.790 We've relaxed the problem a little bit. 00:57:09.790 --> 00:57:12.510 We're just concerned with, if you do the math, 00:57:12.510 --> 00:57:14.470 subtract the x values from each other and the y 00:57:14.470 --> 00:57:18.010 values from each other, what is our estimate of how far we are away? 00:57:18.010 --> 00:57:22.970 We can estimate that D is closer to the goal than C is. 00:57:22.970 --> 00:57:24.800 And so now we have an approach. 00:57:24.800 --> 00:57:27.890 We have a way of picking which node to remove from the frontier. 00:57:27.890 --> 00:57:29.870 And at each stage in our algorithm, we're 00:57:29.870 --> 00:57:31.580 going to remove a node from the frontier. 00:57:31.580 --> 00:57:34.820 We're going to explore the node that has the smallest 00:57:34.820 --> 00:57:37.970 value for this heuristic function, the smallest 00:57:37.970 --> 00:57:40.740 Manhattan distance to the goal. 00:57:40.740 --> 00:57:42.510 And so what would this actually look like? 00:57:42.510 --> 00:57:45.320 Well, let me first label this graph, label this maze, 00:57:45.320 --> 00:57:48.050 with a number representing the value of this heuristic 00:57:48.050 --> 00:57:51.860 function, the value of the Manhattan distance from any of these cells. 00:57:51.860 --> 00:57:55.070 So from this cell, for example, we're one away from the goal. 00:57:55.070 --> 00:57:56.920 From this cell, we're two away from the goal. 00:57:56.920 --> 00:57:58.400 Three away, four away.
00:57:58.400 --> 00:58:02.210 Here we're five away, because we have to go one to the right and then four up. 00:58:02.210 --> 00:58:05.850 From somewhere like here, the Manhattan distance is 2. 00:58:05.850 --> 00:58:08.330 We're only two squares away from the goal, 00:58:08.330 --> 00:58:10.730 geographically, even though in practice we're 00:58:10.730 --> 00:58:13.910 going to have to take a longer path, but we don't know that yet. 00:58:13.910 --> 00:58:16.760 The heuristic is just some easy way to estimate 00:58:16.760 --> 00:58:18.380 how far we are away from the goal. 00:58:18.380 --> 00:58:21.380 And maybe our heuristic is overly optimistic. 00:58:21.380 --> 00:58:23.660 It thinks that, yeah, we're only two steps away, 00:58:23.660 --> 00:58:27.450 when in practice, when you consider the walls, it might be more steps. 00:58:27.450 --> 00:58:31.400 So the important thing here is that the heuristic isn't a guarantee 00:58:31.400 --> 00:58:33.290 of how many steps it's going to take. 00:58:33.290 --> 00:58:34.910 It is estimating. 00:58:34.910 --> 00:58:36.920 It's an attempt at trying to approximate. 00:58:36.920 --> 00:58:40.760 And it does seem generally the case that the squares that look closer 00:58:40.760 --> 00:58:43.910 to the goal have smaller values for the heuristic function 00:58:43.910 --> 00:58:46.960 than squares that are further away. 00:58:46.960 --> 00:58:52.110 So now, using greedy best-first search, what might this algorithm actually do? 00:58:52.110 --> 00:58:55.270 Well, again, for these first five steps, there's not much of a choice. 00:58:55.270 --> 00:58:57.610 We start in this initial state, A. And we say, all right. 00:58:57.610 --> 00:59:00.300 We have to explore these five states. 00:59:00.300 --> 00:59:01.890 But now we have a decision point. 00:59:01.890 --> 00:59:04.590 Now we have a choice between going left and going right.
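The Manhattan distance just described is simple to compute. Here is a minimal sketch in Python, assuming each maze cell is represented as an (x, y) coordinate pair; the function name manhattan_distance is illustrative, not from the course's distribution code:

```python
def manhattan_distance(cell, goal):
    """Estimate of the cost from cell to goal, ignoring walls.

    Counts the squares we'd have to move horizontally plus vertically
    (no diagonal moves) -- the "relaxed" problem where walls don't exist.
    """
    (x1, y1), (x2, y2) = cell, goal
    return abs(x1 - x2) + abs(y1 - y2)
```

So a cell at (0, 0) with the goal at (2, 4) gets a heuristic value of 2 + 4 = 6, no matter how many walls actually sit in between.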
00:59:04.590 --> 00:59:08.760 And before, when DFS and BFS would just pick arbitrarily because it just 00:59:08.760 --> 00:59:11.610 depends on the order you throw these two nodes into the frontier-- 00:59:11.610 --> 00:59:15.240 and we didn't specify what order you put them into the frontier, only the order 00:59:15.240 --> 00:59:17.250 you take them out. 00:59:17.250 --> 00:59:20.580 Here we can look at 13 and 11 and say that, all right, 00:59:20.580 --> 00:59:24.630 this square is a distance of 11 away from the goal, 00:59:24.630 --> 00:59:27.300 according to our heuristic, according to our estimate. 00:59:27.300 --> 00:59:31.560 And this one we estimate to be 13 away from the goal. 00:59:31.560 --> 00:59:34.650 So between those two options, between these two choices, 00:59:34.650 --> 00:59:36.150 I'd rather have the 11. 00:59:36.150 --> 00:59:40.240 I'd rather be 11 steps away from the goal, so I'll go to the right. 00:59:40.240 --> 00:59:44.340 We're able to make an informed decision because we know a little something more 00:59:44.340 --> 00:59:45.670 about this problem. 00:59:45.670 --> 00:59:47.820 So then we keep following 10, 9, 8-- 00:59:47.820 --> 00:59:49.470 between the two sevens. 00:59:49.470 --> 00:59:51.850 We don't really have much of a way to know between those. 00:59:51.850 --> 00:59:53.880 So then we do just have to make an arbitrary choice. 00:59:53.880 --> 00:59:54.630 And you know what? 00:59:54.630 --> 00:59:55.560 Maybe we choose wrong. 00:59:55.560 --> 01:00:00.110 But that's OK because now we can still say, all right, let's try this seven. 01:00:00.110 --> 01:00:01.820 We say seven, six. 01:00:01.820 --> 01:00:03.990 We have to make this choice even though it increases 01:00:03.990 --> 01:00:05.640 the value of the heuristic function. 01:00:05.640 --> 01:00:08.970 But now we have another decision point between six and eight. 
01:00:08.970 --> 01:00:10.290 And between those two-- 01:00:10.290 --> 01:00:13.380 and really, we're also considering the 13, but that's much higher. 01:00:13.380 --> 01:00:16.200 Between six, eight, and 13, well, the six 01:00:16.200 --> 01:00:18.900 is the smallest value, so we'd rather take the six. 01:00:18.900 --> 01:00:22.440 We're able to make an informed decision that going this way to the right 01:00:22.440 --> 01:00:24.840 is probably better than going that way. 01:00:24.840 --> 01:00:25.680 So we turn this way. 01:00:25.680 --> 01:00:26.850 We go to five. 01:00:26.850 --> 01:00:29.190 And now we find a decision point where we'll actually 01:00:29.190 --> 01:00:31.200 make a decision that we might not want to make, 01:00:31.200 --> 01:00:34.320 but there's unfortunately not too much of a way around this. 01:00:34.320 --> 01:00:35.670 We see four and six. 01:00:35.670 --> 01:00:37.620 Four looks closer to the goal, right? 01:00:37.620 --> 01:00:40.170 It's going up, and the goal is further up. 01:00:40.170 --> 01:00:43.710 So we end up taking that route, which ultimately leads us to a dead end. 01:00:43.710 --> 01:00:46.960 But that's OK because we can still say, all right, now let's try the six, 01:00:46.960 --> 01:00:51.380 and now follow this route that will ultimately lead us to the goal. 01:00:51.380 --> 01:00:54.510 And so this now is how greedy best-first search might 01:00:54.510 --> 01:00:56.760 try to approach this problem, by saying whenever 01:00:56.760 --> 01:01:00.060 we have a decision between multiple nodes that we could explore, 01:01:00.060 --> 01:01:04.290 let's explore the node that has the smallest value of h(n), 01:01:04.290 --> 01:01:09.210 this heuristic function that is estimating how far I have to go. 01:01:09.210 --> 01:01:11.010 And it just so happens that, in this case, 01:01:11.010 --> 01:01:15.010 we end up doing better, in terms of the number of states we needed to explore, 01:01:15.010 --> 01:01:16.630 than BFS needed to. 
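The frontier discipline just walked through -- always remove the node with the smallest h(n) -- can be sketched in a few lines of Python. This is only a sketch, not the course's own source code: the names greedy_best_first_search, neighbors, and h are my own, states are assumed to be hashable tuples, and neighbors(state) is assumed to yield the legal successor states:

```python
import heapq

def greedy_best_first_search(start, goal, neighbors, h):
    """Greedy best-first search: always expand the frontier node with
    the smallest heuristic value h(n). Finds *a* path if one exists,
    but not necessarily the shortest one."""
    frontier = [(h(start), start)]     # min-heap ordered by h(n)
    came_from = {start: None}          # parent pointers; also the explored set
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == goal:
            path = []                  # reconstruct by walking parents back
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in came_from:
                came_from[nxt] = state
                heapq.heappush(frontier, (h(nxt), nxt))
    return None
```

With the Manhattan distance as h, ties on the heap break arbitrarily -- which matches the "arbitrary choice between the two sevens" situation in the maze above.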
01:01:16.630 --> 01:01:19.950 BFS explored all of this section and all of that section. 01:01:19.950 --> 01:01:22.620 But we were able to eliminate that by taking advantage 01:01:22.620 --> 01:01:26.040 of this heuristic, this knowledge about how close we 01:01:26.040 --> 01:01:30.330 are to the goal or some estimate of that idea. 01:01:30.330 --> 01:01:31.450 So this seems much better. 01:01:31.450 --> 01:01:33.150 So wouldn't we always prefer an algorithm 01:01:33.150 --> 01:01:36.870 like this over an algorithm like breadth-first search? 01:01:36.870 --> 01:01:37.850 Well, maybe. 01:01:37.850 --> 01:01:39.810 One thing to take into consideration is that we 01:01:39.810 --> 01:01:42.030 need to come up with a good heuristic. 01:01:42.030 --> 01:01:45.720 How good the heuristic is is going to affect how good this algorithm is. 01:01:45.720 --> 01:01:49.740 And coming up with a good heuristic can oftentimes be challenging. 01:01:49.740 --> 01:01:51.450 But the other thing to consider is to ask 01:01:51.450 --> 01:01:54.630 the question, just as we did with the prior two algorithms, 01:01:54.630 --> 01:01:56.580 is this algorithm optimal? 01:01:56.580 --> 01:02:02.280 Will it always find the shortest path from the initial state to the goal? 01:02:02.280 --> 01:02:06.180 And to answer that question, let's take a look at this example for a moment. 01:02:06.180 --> 01:02:07.570 Take a look at this example. 01:02:07.570 --> 01:02:10.170 Again, we're trying to get from A to B, and again, I've 01:02:10.170 --> 01:02:13.260 labeled each of the cells with their Manhattan distance 01:02:13.260 --> 01:02:16.320 from the goal, the number of squares up and to the right 01:02:16.320 --> 01:02:20.500 you would need to travel in order to get from that square to the goal. 01:02:20.500 --> 01:02:23.400 And let's think about, would greedy best-first search 01:02:23.400 --> 01:02:29.400 that always picks the smallest number end up finding the optimal solution? 
01:02:29.400 --> 01:02:33.520 What is the shortest solution, and would this algorithm find it? 01:02:33.520 --> 01:02:38.190 And the important thing to realize is that right here is the decision point. 01:02:38.190 --> 01:02:40.710 We're estimated to be 12 away from the goal. 01:02:40.710 --> 01:02:42.310 And we have two choices. 01:02:42.310 --> 01:02:45.690 We can go to the left, which we estimate to be 13 away from the goal, 01:02:45.690 --> 01:02:49.710 or we can go up, where we estimate it to be 11 away from the goal. 01:02:49.710 --> 01:02:53.640 And between those two, greedy best-first search is going to say, 01:02:53.640 --> 01:02:57.000 the 11 looks better than the 13. 01:02:57.000 --> 01:02:59.540 And in doing so, greedy best-first search 01:02:59.540 --> 01:03:02.820 will end up finding this path to the goal. 01:03:02.820 --> 01:03:04.980 But it turns out this path is not optimal. 01:03:04.980 --> 01:03:07.470 There is a way to get to the goal using fewer steps. 01:03:07.470 --> 01:03:12.420 And it's actually this way, this way that ultimately involves fewer steps, 01:03:12.420 --> 01:03:15.900 even though it meant at this moment choosing the worse 01:03:15.900 --> 01:03:19.770 option between the two-- or what we estimated to be the worse option, based 01:03:19.770 --> 01:03:21.150 on the heuristic. 01:03:21.150 --> 01:03:23.910 And so this is what we mean by this is a greedy algorithm. 01:03:23.910 --> 01:03:26.460 It's making the best decision, locally. 01:03:26.460 --> 01:03:28.800 At this decision point, it looks like it's better 01:03:28.800 --> 01:03:31.450 to go here than it is to go to the 13. 01:03:31.450 --> 01:03:34.040 But in the big picture, it's not necessarily optimal, 01:03:34.040 --> 01:03:37.200 that it might find a solution when in actuality there 01:03:37.200 --> 01:03:40.150 was a better solution available. 01:03:40.150 --> 01:03:43.200 So we would like some way to solve this problem.
01:03:43.200 --> 01:03:45.840 We like the idea of this heuristic, of being 01:03:45.840 --> 01:03:50.080 able to estimate the path, the distance between us and the goal, 01:03:50.080 --> 01:03:52.290 and that helps us to be able to make better decisions 01:03:52.290 --> 01:03:55.890 and to eliminate having to search through entire parts of the state 01:03:55.890 --> 01:03:57.070 space. 01:03:57.070 --> 01:04:00.160 But we would like to modify the algorithm so that we can achieve 01:04:00.160 --> 01:04:02.650 optimality, so that it can be optimal. 01:04:02.650 --> 01:04:04.060 And what is the way to do this? 01:04:04.060 --> 01:04:05.790 What is the intuition here? 01:04:05.790 --> 01:04:08.310 Well, let's take a look at this problem. 01:04:08.310 --> 01:04:11.070 In this initial problem, greedy best-first search 01:04:11.070 --> 01:04:14.170 found this solution here, this long path. 01:04:14.170 --> 01:04:17.310 And the reason why it wasn't great is because, yes, the heuristic numbers 01:04:17.310 --> 01:04:21.180 went down pretty low, but later on they started to build back up. 01:04:21.180 --> 01:04:25.780 They built back up-- 8, 9, 10, 11-- all the way up to 12, in this case. 01:04:25.780 --> 01:04:29.290 And so how might we go about trying to improve this algorithm? 01:04:29.290 --> 01:04:32.400 Well, one thing that we might realize is that, if we 01:04:32.400 --> 01:04:35.170 go all the way through this algorithm, through this path, 01:04:35.170 --> 01:04:39.340 and we end up going to the 12, and we've had to take this many steps-- like, 01:04:39.340 --> 01:04:42.640 who knows how many steps that is-- just to get to this 12, 01:04:42.640 --> 01:04:48.160 we could have also, as an alternative, taken far fewer steps, just six steps, 01:04:48.160 --> 01:04:50.170 and ended up at this 13 here. 01:04:50.170 --> 01:04:53.680 And yes, 13 is more than 12, so it looks like it's not as good, 01:04:53.680 --> 01:04:55.460 but it required far fewer steps.
01:04:55.460 --> 01:04:55.960 Right? 01:04:55.960 --> 01:04:59.530 It only took six steps to get to this 13 versus many more steps 01:04:59.530 --> 01:05:01.000 to get to this 12. 01:05:01.000 --> 01:05:04.180 And while greedy best-first search says, oh, well, 12 is better than 13 01:05:04.180 --> 01:05:07.850 so pick the 12, we might more intelligently say, 01:05:07.850 --> 01:05:10.810 I'd rather be somewhere that heuristically 01:05:10.810 --> 01:05:16.030 looks like it takes slightly longer if I can get there much more quickly. 01:05:16.030 --> 01:05:18.940 And we're going to encode that idea, this general idea, 01:05:18.940 --> 01:05:23.010 into a more formal algorithm known as A* search. 01:05:23.010 --> 01:05:25.270 A* search is going to solve this problem by, 01:05:25.270 --> 01:05:27.970 instead of just considering the heuristic, 01:05:27.970 --> 01:05:32.720 also considering how long it took us to get to any particular state. 01:05:32.720 --> 01:05:35.650 So the distinction is greedy best-first search, if I am in a state 01:05:35.650 --> 01:05:38.440 right now, the only thing I care about is 01:05:38.440 --> 01:05:42.070 what is the estimated distance, the heuristic value, between me 01:05:42.070 --> 01:05:43.000 and the goal. 01:05:43.000 --> 01:05:45.640 Whereas A* search will take into consideration 01:05:45.640 --> 01:05:46.870 two pieces of information. 01:05:46.870 --> 01:05:51.130 It'll take into consideration, how far do I estimate I am from the goal, 01:05:51.130 --> 01:05:55.030 but also how far did I have to travel in order to get here? 01:05:55.030 --> 01:05:57.490 Because that is relevant, too. 01:05:57.490 --> 01:06:00.760 So A* search is a search algorithm that expands the node with the lowest 01:06:00.760 --> 01:06:04.060 value of g(n) plus h(n).
01:06:04.060 --> 01:06:07.810 h(n) is that same heuristic that we were talking about a moment ago that's going 01:06:07.810 --> 01:06:12.640 to vary based on the problem, but g(n) is going to be the cost to reach 01:06:12.640 --> 01:06:13.420 the node-- 01:06:13.420 --> 01:06:19.380 how many steps I had to take, in this case, to get to my current position. 01:06:19.380 --> 01:06:22.070 So what does that search algorithm look like in practice? 01:06:22.070 --> 01:06:23.630 Well, let's take a look. 01:06:23.630 --> 01:06:25.130 Again, we've got the same maze. 01:06:25.130 --> 01:06:28.010 And again, I've labeled them with their Manhattan distance. 01:06:28.010 --> 01:06:32.060 This value is the h(n) value, the heuristic estimate 01:06:32.060 --> 01:06:36.240 of how far each of these squares is away from the goal. 01:06:36.240 --> 01:06:38.370 But now, as we begin to explore states, we 01:06:38.370 --> 01:06:41.370 care not just about this heuristic value but also 01:06:41.370 --> 01:06:45.650 about g(n), the number of steps I had to take in order to get there. 01:06:45.650 --> 01:06:48.060 And I care about summing those two numbers together. 01:06:48.060 --> 01:06:49.230 So what does that look like? 01:06:49.230 --> 01:06:52.850 On this very first step, I have taken one step. 01:06:52.850 --> 01:06:56.130 And now I am estimated to be 16 steps away from the goal. 01:06:56.130 --> 01:06:59.190 So the total value here is 17. 01:06:59.190 --> 01:07:00.280 Then I take one more step. 01:07:00.280 --> 01:07:02.010 I've now taken two steps. 01:07:02.010 --> 01:07:04.780 And I estimate myself to be 15 away from the goal-- 01:07:04.780 --> 01:07:06.740 again, a total value of 17. 01:07:06.740 --> 01:07:08.230 Now I've taken three steps. 01:07:08.230 --> 01:07:11.440 And I'm estimated to be 14 away from the goal, so on and so forth. 01:07:11.440 --> 01:07:13.750 Four steps, an estimate of 13. 01:07:13.750 --> 01:07:15.820 Five steps, estimate of 12. 
01:07:15.820 --> 01:07:17.980 And now, here's a decision point. 01:07:17.980 --> 01:07:22.720 I could either have taken six steps with a heuristic of 13 01:07:22.720 --> 01:07:26.440 for a total of 19, or I could have taken six steps 01:07:26.440 --> 01:07:31.640 with a heuristic of 11 for a total estimate of 17. 01:07:31.640 --> 01:07:35.230 So between 19 and 17, I'd rather take the 17-- 01:07:35.230 --> 01:07:37.170 the 6 plus 11. 01:07:37.170 --> 01:07:39.170 So so far, no different than what we saw before. 01:07:39.170 --> 01:07:42.160 We're still taking this option because it appears to be better. 01:07:42.160 --> 01:07:45.160 And I keep taking this option because it appears to be better. 01:07:45.160 --> 01:07:49.430 But it's right about here that things get a little bit different. 01:07:49.430 --> 01:07:55.630 Now I could have taken 15 steps, with an estimated distance of 6 from the goal. 01:07:55.630 --> 01:07:58.750 So 15 plus 6, total value of 21. 01:07:58.750 --> 01:08:01.870 Alternatively, I could have taken six steps-- 01:08:01.870 --> 01:08:04.690 because this was five steps, so this is six steps-- 01:08:04.690 --> 01:08:07.350 with an estimated distance of 13 to the goal. 01:08:07.350 --> 01:08:08.650 So 6 plus 13-- 01:08:08.650 --> 01:08:10.180 that's 19. 01:08:10.180 --> 01:08:14.140 So here we would evaluate g(n) plus h(n) to be 19-- 01:08:14.140 --> 01:08:20.440 6 plus 13-- whereas here, we would be 15 plus 6, or 21. 01:08:20.440 --> 01:08:23.780 And so the intuition is, 19 is less than 21, so pick here. 01:08:23.780 --> 01:08:29.190 But the idea is ultimately I'd rather have taken fewer steps to get to a 13 01:08:29.190 --> 01:08:32.610 than have taken 15 steps and be at a six, 01:08:32.610 --> 01:08:35.410 because it means I've had to take more steps in order to get there. 01:08:35.410 --> 01:08:38.520 Maybe there's a better path this way.
01:08:38.520 --> 01:08:41.050 So instead we'll explore this route. 01:08:41.050 --> 01:08:43.840 Now if we go one more-- this is seven steps plus 14, 01:08:43.840 --> 01:08:46.840 is 21, so between those two it's sort of a toss-up. 01:08:46.840 --> 01:08:48.970 We might end up exploring that one anyways. 01:08:48.970 --> 01:08:53.270 But after that, as the heuristic values up that path start to get bigger 01:08:53.270 --> 01:08:55.600 and the heuristic values down this path start to get smaller, 01:08:55.600 --> 01:08:59.100 you'll find that we'll actually keep exploring down this path. 01:08:59.100 --> 01:09:02.290 And you can do the math to see that at every decision point, 01:09:02.290 --> 01:09:06.810 A* search is going to make a choice based on the sum of how many steps 01:09:06.810 --> 01:09:09.700 it took me to get to my current position and then 01:09:09.700 --> 01:09:13.180 how far I estimate I am from the goal. 01:09:13.180 --> 01:09:15.760 So while we did have to explore some of these states, 01:09:15.760 --> 01:09:20.470 the ultimate solution we found was, in fact, an optimal solution. 01:09:20.470 --> 01:09:24.790 It did find us the quickest possible way to get from the initial state 01:09:24.790 --> 01:09:25.800 to the goal. 01:09:25.800 --> 01:09:29.800 And it turns out that A* is an optimal search algorithm under certain 01:09:29.800 --> 01:09:31.390 conditions. 01:09:31.390 --> 01:09:35.860 So the conditions are h(n), my heuristic, needs to be admissible. 01:09:35.860 --> 01:09:37.990 What does it mean for a heuristic to be admissible? 01:09:37.990 --> 01:09:42.700 Well, a heuristic is admissible if it never overestimates the true cost. 01:09:42.700 --> 01:09:46.260 h(n) always needs to either get it exactly right 01:09:46.260 --> 01:09:50.520 in terms of how far away I am, or it needs to underestimate.
01:09:50.520 --> 01:09:54.600 So we saw an example from before where the heuristic value was much smaller 01:09:54.600 --> 01:09:56.350 than the actual cost it would take. 01:09:56.350 --> 01:09:57.710 That's totally fine. 01:09:57.710 --> 01:10:00.100 But the heuristic value should never overestimate. 01:10:00.100 --> 01:10:04.520 It should never think that I'm further away from the goal than I actually am. 01:10:04.520 --> 01:10:07.630 And meanwhile, to make a stronger statement, h of n 01:10:07.630 --> 01:10:09.880 also needs to be consistent. 01:10:09.880 --> 01:10:11.800 And what does it mean for it to be consistent? 01:10:11.800 --> 01:10:14.500 Mathematically, it means that for every node, which 01:10:14.500 --> 01:10:19.120 we'll call n, and successor, the node after me, that I'll call n prime, 01:10:19.120 --> 01:10:24.220 where it takes a cost of c to make that step, the heuristic value of n 01:10:24.220 --> 01:10:26.560 needs to be less than or equal to the heuristic 01:10:26.560 --> 01:10:29.060 value of n prime plus the cost. 01:10:29.060 --> 01:10:31.360 So it's a lot of math, but in words, what it ultimately 01:10:31.360 --> 01:10:34.870 means is that if I am here at this state right now, 01:10:34.870 --> 01:10:39.250 the heuristic value from me to the goal shouldn't be more than the heuristic 01:10:39.250 --> 01:10:44.080 value of my successor, the next place I could go to, plus however much 01:10:44.080 --> 01:10:48.560 it would cost me to just make that step, from one step to the next step. 01:10:48.560 --> 01:10:52.480 And so this is just making sure that my heuristic is consistent between all 01:10:52.480 --> 01:10:54.010 of these steps that I might take. 01:10:54.010 --> 01:10:58.540 So as long as this is true, then A* search is going to find me an optimal 01:10:58.540 --> 01:10:59.470 solution. 
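Putting g(n) and h(n) together, the algorithm just described can be sketched in Python. As before, this is an illustrative sketch rather than the course's distribution code: the names a_star_search, neighbors, and h are assumptions, every step is assumed to cost 1 (as in the maze), and the optimality guarantee depends on h being admissible and consistent, as just discussed:

```python
import heapq

def a_star_search(start, goal, neighbors, h):
    """A* search: expand the frontier node with the lowest g(n) + h(n),
    where g(n) is the cost already paid to reach n (steps taken here)
    and h(n) is the heuristic estimate of the remaining cost.
    With an admissible, consistent h, the returned path is optimal."""
    frontier = [(h(start), 0, start)]   # (g + h, g, state)
    came_from = {start: None}
    best_g = {start: 0}                 # cheapest known cost to each state
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            path = []                   # reconstruct by walking parents back
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for nxt in neighbors(state):
            new_g = g + 1               # every move costs 1 in the maze
            if nxt not in best_g or new_g < best_g[nxt]:
                best_g[nxt] = new_g
                came_from[nxt] = state
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt))
    return None
```

The only structural difference from the greedy sketch is the priority: (g + h) instead of h alone, plus the best_g bookkeeping so that a state is re-queued if we later find a cheaper way to reach it.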
01:10:59.470 --> 01:11:02.770 And this is where much of the challenge of solving these search problems can 01:11:02.770 --> 01:11:05.980 sometimes come in, that A* search is an algorithm that is known, 01:11:05.980 --> 01:11:07.960 and you could write the code fairly easily. 01:11:07.960 --> 01:11:11.260 But it's choosing the heuristic that can be the interesting challenge. 01:11:11.260 --> 01:11:13.540 The better the heuristic is, the better I'll 01:11:13.540 --> 01:11:16.870 be able to solve the problem, and the fewer states that I'll have to explore. 01:11:16.870 --> 01:11:20.200 And I need to make sure that the heuristic satisfies 01:11:20.200 --> 01:11:22.540 these particular constraints. 01:11:22.540 --> 01:11:25.900 So all in all, these are some of the examples of search algorithms 01:11:25.900 --> 01:11:26.650 that might work. 01:11:26.650 --> 01:11:29.080 And certainly, there are many more than just this. 01:11:29.080 --> 01:11:32.590 A*, for example, does have a tendency to use quite a bit of memory, 01:11:32.590 --> 01:11:37.570 so there are alternative approaches to A* that ultimately use less memory than 01:11:37.570 --> 01:11:40.200 this version of A* happens to use. 01:11:40.200 --> 01:11:43.660 And there are other search algorithms that are optimized for other cases 01:11:43.660 --> 01:11:45.560 as well. 01:11:45.560 --> 01:11:48.460 But now, so far, we've only been looking at search algorithms 01:11:48.460 --> 01:11:50.860 where there's one agent. 01:11:50.860 --> 01:11:53.470 I am trying to find a solution to a problem. 01:11:53.470 --> 01:11:55.900 I am trying to navigate my way through a maze. 01:11:55.900 --> 01:11:57.880 I am trying to solve a 15 puzzle. 01:11:57.880 --> 01:12:02.190 I am trying to find driving directions from point A to point B. 
01:12:02.190 --> 01:12:04.420 Sometimes in search situations, though, we'll 01:12:04.420 --> 01:12:07.600 enter an adversarial situation where I am 01:12:07.600 --> 01:12:10.010 an agent trying to make intelligent decisions, 01:12:10.010 --> 01:12:13.090 and there is someone else who is fighting against me, so to speak, 01:12:13.090 --> 01:12:16.570 that has an opposite objective, someone where I am trying to succeed, 01:12:16.570 --> 01:12:19.180 someone else that wants me to fail. 01:12:19.180 --> 01:12:23.680 And this is most popular in something like a game, a game like tic-tac-toe, 01:12:23.680 --> 01:12:26.350 where we've got this 3-by-3 grid, and X and O 01:12:26.350 --> 01:12:30.130 take turns either writing an X or an O in any one of these squares. 01:12:30.130 --> 01:12:33.610 And the goal is to get three X's in a row, if you're the X player, 01:12:33.610 --> 01:12:36.610 or three O's in a row, if you're the O player. 01:12:36.610 --> 01:12:40.390 And computers have gotten quite good at playing games, tic-tac-toe very easily, 01:12:40.390 --> 01:12:42.370 but even more complex games. 01:12:42.370 --> 01:12:46.570 And so you might imagine, what does an intelligent decision in a game look 01:12:46.570 --> 01:12:47.290 like? 01:12:47.290 --> 01:12:51.130 So maybe X makes an initial move in the middle, and O plays up here. 01:12:51.130 --> 01:12:54.340 What does an intelligent move for X now become? 01:12:54.340 --> 01:12:56.390 Where should you move if you were X? 01:12:56.390 --> 01:12:58.670 And it turns out there are a couple of possibilities. 01:12:58.670 --> 01:13:01.120 But if an AI is playing this game optimally, 01:13:01.120 --> 01:13:04.180 then the AI might play somewhere like the upper right, where 01:13:04.180 --> 01:13:07.800 in this situation, O has the opposite objective of X. 
01:13:07.800 --> 01:13:11.770 X is trying to win the game, to get three in a row diagonally here, 01:13:11.770 --> 01:13:15.410 and O, with the opposite objective, is trying to stop that. 01:13:15.410 --> 01:13:17.860 And so O is going to place here, to try to block. 01:13:17.860 --> 01:13:20.260 But now, X has a pretty clever move. 01:13:20.260 --> 01:13:25.180 X can make a move like this, where now X has two possible ways that X 01:13:25.180 --> 01:13:26.080 can win the game. 01:13:26.080 --> 01:13:29.050 X could win the game by getting three in a row across here, 01:13:29.050 --> 01:13:32.380 or X could win the game by getting three in a row vertically this way. 01:13:32.380 --> 01:13:34.610 So it doesn't matter where O makes their next move. 01:13:34.610 --> 01:13:38.170 O could play here, for example, blocking the three in a row horizontally, 01:13:38.170 --> 01:13:43.200 but then X is going to win the game by getting three in a row vertically. 01:13:43.200 --> 01:13:45.010 And so there's a fair amount of reasoning 01:13:45.010 --> 01:13:48.220 that's going on here in order for the computer to be able to solve a problem. 01:13:48.220 --> 01:13:51.520 And it's similar in spirit to the problems we've looked at so far. 01:13:51.520 --> 01:13:54.580 There are actions, there's some sort of state of the board, 01:13:54.580 --> 01:13:57.160 and some transition from one state to the next, 01:13:57.160 --> 01:14:00.730 but it's different in the sense that this is now not just a classical search 01:14:00.730 --> 01:14:04.900 problem, but an adversarial search problem, that I am the X player, 01:14:04.900 --> 01:14:07.090 trying to find the best moves to make, but I 01:14:07.090 --> 01:14:10.640 know that there is some adversary that is trying to stop me. 01:14:10.640 --> 01:14:14.800 So we need some sort of algorithm to deal with these adversarial types 01:14:14.800 --> 01:14:16.440 of search situations.
01:14:16.440 --> 01:14:18.400 And the algorithm we're going to take a look at 01:14:18.400 --> 01:14:20.860 is an algorithm called Minimax, which works 01:14:20.860 --> 01:14:24.820 very well for these deterministic games, where there are two players. 01:14:24.820 --> 01:14:28.120 It can work for other types of games as well, but we'll look right now at games 01:14:28.120 --> 01:14:31.990 where I make a move, then my opponent makes a move, and I am trying to win, 01:14:31.990 --> 01:14:34.270 and my opponent is trying to win, also. 01:14:34.270 --> 01:14:38.100 Or in other words, my opponent is trying to get me to lose. 01:14:38.100 --> 01:14:40.900 And so what do we need in order to make this algorithm work? 01:14:40.900 --> 01:14:44.800 Well, anytime we try and translate this human concept, of playing a game, 01:14:44.800 --> 01:14:47.380 winning, and losing, to a computer, we want 01:14:47.380 --> 01:14:50.230 to translate it into terms that the computer can understand. 01:14:50.230 --> 01:14:53.740 And ultimately, the computer really just understands numbers. 01:14:53.740 --> 01:14:56.890 And so we want some way of translating a game of X's and O's 01:14:56.890 --> 01:15:00.490 on a grid to something numerical, something the computer can understand. 01:15:00.490 --> 01:15:04.300 The computer doesn't normally understand notions of win or lose, 01:15:04.300 --> 01:15:08.360 but it does understand the concept of bigger and smaller. 01:15:08.360 --> 01:15:12.100 And so what we might do is, we might take each of the possible ways 01:15:12.100 --> 01:15:17.110 that a tic-tac-toe game can unfold and assign a value, or a utility, 01:15:17.110 --> 01:15:19.060 to each one of those possible ways. 01:15:19.060 --> 01:15:21.790 And in a tic-tac-toe game, and in many types of games, 01:15:21.790 --> 01:15:23.800 there are three possible outcomes. 01:15:23.800 --> 01:15:28.210 The outcomes are, O wins, X wins, or nobody wins.
01:15:28.210 --> 01:15:32.440 So player one wins, player two wins, or nobody wins. 01:15:32.440 --> 01:15:36.670 And for now, let's go ahead and assign each of these possible outcomes 01:15:36.670 --> 01:15:37.870 a different value. 01:15:37.870 --> 01:15:39.100 We'll say O winning-- 01:15:39.100 --> 01:15:41.230 that'll have a value of negative 1. 01:15:41.230 --> 01:15:43.630 Nobody winning-- that'll have a value of 0. 01:15:43.630 --> 01:15:46.700 And X winning-- that will have a value of 1. 01:15:46.700 --> 01:15:50.860 So we've just assigned numbers to each of these three possible outcomes. 01:15:50.860 --> 01:15:53.480 And now, we have two players. 01:15:53.480 --> 01:15:56.260 We have the X player and the O player. 01:15:56.260 --> 01:16:00.290 And we're going to go ahead and call the X player the max player. 01:16:00.290 --> 01:16:03.140 And we'll call the O player the min player. 01:16:03.140 --> 01:16:05.920 And the reason why is because in the Minimax algorithm, 01:16:05.920 --> 01:16:11.380 the max player, which in this case is X, is aiming to maximize the score. 01:16:11.380 --> 01:16:14.660 These are the possible options for the score, negative 1, 0, and 1. 01:16:14.660 --> 01:16:18.400 X wants to maximize the score, meaning if at all possible, 01:16:18.400 --> 01:16:21.980 X would like this situation where X wins the game. 01:16:21.980 --> 01:16:23.590 And we give it a score of 1. 01:16:23.590 --> 01:16:27.010 But if this isn't possible, if X needs to choose between these two 01:16:27.010 --> 01:16:31.930 options, negative 1 meaning O winning, or 0 meaning nobody winning, 01:16:31.930 --> 01:16:37.060 X would rather that nobody wins, score of 0, than a score of negative 1, 01:16:37.060 --> 01:16:38.360 O winning. 01:16:38.360 --> 01:16:41.080 So this notion of winning and losing ultimately 01:16:41.080 --> 01:16:45.410 has been reduced mathematically to just this idea of, try and maximize 01:16:45.410 --> 01:16:46.070 the score.
01:16:46.070 --> 01:16:49.920 The X player always wants the score to be bigger. 01:16:49.920 --> 01:16:52.880 And on the flip side, the min player, in this case, O, 01:16:52.880 --> 01:16:54.620 is aiming to minimize the score. 01:16:54.620 --> 01:16:59.510 The O player wants the score to be as small as possible. 01:16:59.510 --> 01:17:02.820 So now we've taken this game of X's and O's and winning and losing 01:17:02.820 --> 01:17:04.990 and turned it into something mathematical, something 01:17:04.990 --> 01:17:09.560 where X is trying to maximize the score, O is trying to minimize the score. 01:17:09.560 --> 01:17:11.590 Let's now look at all of the parts of the game 01:17:11.590 --> 01:17:14.650 that we need in order to encode it in an AI 01:17:14.650 --> 01:17:18.730 so that an AI can play a game like tic-tac-toe. 01:17:18.730 --> 01:17:20.770 So the game is going to need a couple of things. 01:17:20.770 --> 01:17:23.350 We'll need some sort of initial state, that we'll in this case 01:17:23.350 --> 01:17:27.520 call S0, which is how the game begins, like an empty tic-tac-toe board, 01:17:27.520 --> 01:17:28.750 for example. 01:17:28.750 --> 01:17:32.410 We'll also need a function called player, 01:17:32.410 --> 01:17:37.030 where the player function is going to take as input a state, here represented 01:17:37.030 --> 01:17:41.050 by S. And the output of the player function is going to be, 01:17:41.050 --> 01:17:43.450 which player's turn is it? 01:17:43.450 --> 01:17:46.360 We need to be able to give a tic-tac-toe board to the computer, 01:17:46.360 --> 01:17:50.440 run it through a function, and that function tells us whose turn it is. 01:17:50.440 --> 01:17:52.870 We'll need some notion of actions that we can take. 01:17:52.870 --> 01:17:54.980 We'll see examples of that in just a moment. 01:17:54.980 --> 01:17:57.910 We need some notion of a transition model-- same as before. 
01:17:57.910 --> 01:18:00.160 If I have a state, and I take an action, I 01:18:00.160 --> 01:18:03.040 need to know what results as a consequence of it. 01:18:03.040 --> 01:18:05.810 I need some way of knowing when the game is over. 01:18:05.810 --> 01:18:07.900 So this is equivalent to kind of like a goal test, 01:18:07.900 --> 01:18:10.330 but I need some terminal test, some way to check 01:18:10.330 --> 01:18:14.230 to see if a state is a terminal state, where a terminal state means 01:18:14.230 --> 01:18:15.280 the game is over. 01:18:15.280 --> 01:18:20.050 In the classic game of tic-tac-toe, a terminal state means either someone has 01:18:20.050 --> 01:18:23.380 gotten three in a row, or all of the squares of the tic-tac-toe board are 01:18:23.380 --> 01:18:24.040 filled. 01:18:24.040 --> 01:18:26.610 Either of those conditions makes it a terminal state. 01:18:26.610 --> 01:18:28.570 In a game of chess, it might be something like, 01:18:28.570 --> 01:18:31.660 when there is checkmate, or if checkmate is no longer possible, 01:18:31.660 --> 01:18:34.370 that becomes a terminal state. 01:18:34.370 --> 01:18:38.410 And then finally we'll need a utility function, a function that takes a state 01:18:38.410 --> 01:18:41.890 and gives us a numerical value for that terminal state, some way of saying, 01:18:41.890 --> 01:18:44.530 if X wins the game, that has a value of 1. 01:18:44.530 --> 01:18:47.050 If O has won the game, that has the value of negative 1. 01:18:47.050 --> 01:18:50.350 If nobody has won the game, that has a value of 0. 01:18:50.350 --> 01:18:52.790 So let's take a look at each of these in turn. 01:18:52.790 --> 01:18:57.070 The initial state, we can just represent in tic-tac-toe as the empty game board. 01:18:57.070 --> 01:18:58.460 This is where we begin. 01:18:58.460 --> 01:19:01.030 It's the place from which we begin this search. 01:19:01.030 --> 01:19:03.540 And again, I'll be representing these things visually. 
01:19:03.540 --> 01:19:05.410 But you can imagine this really just being 01:19:05.410 --> 01:19:10.120 an array, or a two-dimensional array, of all of these possible squares. 01:19:10.120 --> 01:19:13.510 Then we need the player function that, again, takes a state 01:19:13.510 --> 01:19:15.250 and tells us whose turn it is. 01:19:15.250 --> 01:19:18.670 Assuming X makes the first move, if I have an empty game board, 01:19:18.670 --> 01:19:21.510 then my player function is going to return X. 01:19:21.510 --> 01:19:25.120 And if I have a game board where X has made a move, then my player function is 01:19:25.120 --> 01:19:28.840 going to return O. The player function takes a tic-tac-toe game board 01:19:28.840 --> 01:19:32.170 and tells us whose turn it is. 01:19:32.170 --> 01:19:34.870 Next up, we'll consider the actions function. 01:19:34.870 --> 01:19:39.040 The actions function, much like it did in classical search, takes a state 01:19:39.040 --> 01:19:41.860 and gives us the set of all of the possible actions 01:19:41.860 --> 01:19:44.390 we can take in that state. 01:19:44.390 --> 01:19:49.360 So let's imagine it's O's turn to move in a game board that looks like this. 01:19:49.360 --> 01:19:52.120 What happens when we pass it into the actions function? 01:19:52.120 --> 01:19:55.990 So the actions function takes this state of the game as input, 01:19:55.990 --> 01:19:59.860 and the output is a set of possible actions. It's a set of-- 01:19:59.860 --> 01:20:03.580 I could move in the upper left, or I could move in the bottom middle. 01:20:03.580 --> 01:20:06.490 Those are the two possible action choices that I have 01:20:06.490 --> 01:20:10.190 when I begin in this particular state. 01:20:10.190 --> 01:20:13.100 Now, just as before, when we have states and actions, 01:20:13.100 --> 01:20:15.470 we need some sort of transition model to tell us, 01:20:15.470 --> 01:20:19.500 when we take this action in the state, what is the new state that we get? 
01:20:19.500 --> 01:20:22.430 And here, we define that using the result function that takes 01:20:22.430 --> 01:20:25.460 a state as input, as well as an action. 01:20:25.460 --> 01:20:28.490 And when we apply the result function to this state, 01:20:28.490 --> 01:20:33.110 saying, let's let O move in this upper left corner, the new state we get 01:20:33.110 --> 01:20:35.840 is this resulting state, where O is in the upper-left corner. 01:20:35.840 --> 01:20:38.880 And now, this seems obvious to someone who knows how to play tic-tac-toe. 01:20:38.880 --> 01:20:40.830 Of course, you play in the upper left corner-- 01:20:40.830 --> 01:20:41.870 that's the board you get. 01:20:41.870 --> 01:20:45.060 But all of this information needs to be encoded into the AI. 01:20:45.060 --> 01:20:47.570 The AI doesn't know how to play tic-tac-toe 01:20:47.570 --> 01:20:51.170 until you tell the AI how the rules of tic-tac-toe work. 01:20:51.170 --> 01:20:53.630 And this function, defining the function here, 01:20:53.630 --> 01:20:57.050 allows us to tell the AI how this game actually works 01:20:57.050 --> 01:21:01.200 and how actions actually affect the outcome of the game. 01:21:01.200 --> 01:21:03.590 So the AI needs to know how the game works. 01:21:03.590 --> 01:21:06.080 The AI also needs to know when the game is over. 01:21:06.080 --> 01:21:09.320 We do that by defining a function called terminal that takes as input 01:21:09.320 --> 01:21:13.200 a state S, such that if we take a game that is not yet over, 01:21:13.200 --> 01:21:16.130 pass it into the terminal function, the output is false. 01:21:16.130 --> 01:21:17.540 The game is not over. 01:21:17.540 --> 01:21:20.690 But if we take a game that is over, because X has gotten three 01:21:20.690 --> 01:21:24.360 in a row along that diagonal, pass that into the terminal function, 01:21:24.360 --> 01:21:28.920 then the output is going to be true, because the game now is, in fact, over. 
01:21:28.920 --> 01:21:31.930 And finally, we've told the AI how the game works 01:21:31.930 --> 01:21:35.180 in terms of what moves can be made and what happens when you make those moves. 01:21:35.180 --> 01:21:37.290 We've told the AI when the game is over. 01:21:37.290 --> 01:21:41.270 Now we need to tell the AI what the value of each of those states is. 01:21:41.270 --> 01:21:45.170 And we do that by defining this utility function, that takes a state, S, 01:21:45.170 --> 01:21:48.840 and tells us the score or the utility of that state. 01:21:48.840 --> 01:21:52.760 So again, we said that if X wins the game, that utility is a value of 1, 01:21:52.760 --> 01:21:57.350 whereas if O wins the game, then the utility of that is negative 1. 01:21:57.350 --> 01:22:00.200 And the AI needs to know, for each of these terminal states 01:22:00.200 --> 01:22:04.800 where the game is over, what is the utility of that state? 01:22:04.800 --> 01:22:08.390 So if I give you a game board like this, where the game is, in fact, over, 01:22:08.390 --> 01:22:12.710 and I ask the AI to tell me what the value of that state is, it could do so. 01:22:12.710 --> 01:22:15.830 The value of the state is 1. 01:22:15.830 --> 01:22:20.240 Where things get interesting, though, is if the game is not yet over. 01:22:20.240 --> 01:22:21.900 Let's imagine a game board like this. 01:22:21.900 --> 01:22:23.330 We're in the middle of the game. 01:22:23.330 --> 01:22:25.850 It's O's turn to make a move. 01:22:25.850 --> 01:22:27.980 So how do we know it's O's turn to make a move? 01:22:27.980 --> 01:22:30.110 We can calculate that, using the player function. 01:22:30.110 --> 01:22:33.260 We can say, player of S, pass in the state. 01:22:33.260 --> 01:22:36.240 O is the answer, so we know it's O's turn to move. 01:22:36.240 --> 01:22:40.790 And now, what is the value of this board, and what action should O take? 01:22:40.790 --> 01:22:41.960 Well, that's going to depend. 
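Before working out that value, the game-definition pieces described so far (the initial state S0, player, actions, the result transition model, the terminal test, and utility) can be collected into a short Python sketch. This is a hedged illustration rather than the course's distribution code: the 9-cell tuple board encoding and the `winner` helper are assumptions made for the example.

```python
# A minimal sketch of the game-definition functions described above,
# assuming a tic-tac-toe board encoded as a tuple of 9 cells, each
# "X", "O", or None (this encoding is an assumption for illustration).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def initial_state():
    # S0: the empty tic-tac-toe board
    return (None,) * 9

def player(board):
    # X moves first, so it's X's turn whenever the mark counts are equal
    xs = sum(cell == "X" for cell in board)
    os = sum(cell == "O" for cell in board)
    return "X" if xs == os else "O"

def actions(board):
    # every empty square is a possible move
    return {i for i, cell in enumerate(board) if cell is None}

def result(board, action):
    # transition model: the current player marks the chosen square
    new = list(board)
    new[action] = player(board)
    return tuple(new)

def winner(board):
    # helper: return "X" or "O" if someone has three in a row, else None
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal(board):
    # the game is over on three in a row or a full board
    return winner(board) is not None or None not in board

def utility(board):
    # X winning is 1, O winning is -1, a tie is 0 (meaningful on terminal states)
    return {"X": 1, "O": -1, None: 0}[winner(board)]
```

For instance, `player(initial_state())` returns `"X"` on the empty board, mirroring the assumption that X moves first.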
01:22:41.960 --> 01:22:43.760 We have to do some calculation here. 01:22:43.760 --> 01:22:47.450 And this is where the Minimax algorithm really comes in. 01:22:47.450 --> 01:22:50.840 Recall that X is trying to maximize the score, which means 01:22:50.840 --> 01:22:53.720 that O is trying to minimize the score. 01:22:53.720 --> 01:22:58.870 O would like to minimize the total value that we get at the end of the game. 01:22:58.870 --> 01:23:01.310 And because this game isn't over yet, we don't really 01:23:01.310 --> 01:23:04.820 know just yet what the value of this game board is. 01:23:04.820 --> 01:23:08.290 We have to do some calculation in order to figure that out. 01:23:08.290 --> 01:23:10.430 So how do we do that kind of calculation? 01:23:10.430 --> 01:23:13.040 Well, in order to do so, we're going to consider, 01:23:13.040 --> 01:23:15.680 just as we might in a classical search situation, 01:23:15.680 --> 01:23:20.160 what actions could happen next, and what states will that take us to? 01:23:20.160 --> 01:23:22.070 And it turns out that in this position, there 01:23:22.070 --> 01:23:26.030 are only two open squares, which means there are only two open places where 01:23:26.030 --> 01:23:28.640 O can make a move. 01:23:28.640 --> 01:23:31.080 O could either make a move in the upper left, 01:23:31.080 --> 01:23:34.160 or O can make a move in the bottom middle. 01:23:34.160 --> 01:23:36.980 And Minimax doesn't know right out of the box which of those moves 01:23:36.980 --> 01:23:40.520 is going to be better, so it's going to consider both. 01:23:40.520 --> 01:23:42.320 But now we run into the same situation. 01:23:42.320 --> 01:23:45.170 Now I have two more game boards, neither of which is over. 01:23:45.170 --> 01:23:46.660 What happens next? 01:23:46.660 --> 01:23:48.410 And now it's in this sense that Minimax is 01:23:48.410 --> 01:23:50.660 what we'll call a recursive algorithm. 
01:23:50.660 --> 01:23:54.800 It's going to now repeat the exact same process, although now 01:23:54.800 --> 01:23:57.640 considering it from the opposite perspective. 01:23:57.640 --> 01:24:01.280 It's as if I am now going to put myself-- if I am the O player, 01:24:01.280 --> 01:24:05.540 I'm going to put myself in my opponent's shoes, my opponent as the X player, 01:24:05.540 --> 01:24:10.010 and consider, what would my opponent do if they were in this position? 01:24:10.010 --> 01:24:14.090 What would my opponent do, the X player, if they were in that position? 01:24:14.090 --> 01:24:15.240 And what would then happen? 01:24:15.240 --> 01:24:18.260 Well, the other player, my opponent, the X player, 01:24:18.260 --> 01:24:21.200 is trying to maximize the score, whereas I am trying 01:24:21.200 --> 01:24:23.240 to minimize the score as the O player. 01:24:23.240 --> 01:24:27.550 So X is trying to find the maximum possible value that they can get. 01:24:27.550 --> 01:24:29.390 And so what's going to happen? 01:24:29.390 --> 01:24:32.780 Well, from this board position, X only has one choice. 01:24:32.780 --> 01:24:35.600 X is going to play here, and they're going to get three in a row. 01:24:35.600 --> 01:24:37.910 And we know that that board, X winning-- 01:24:37.910 --> 01:24:39.530 that has a value of 1. 01:24:39.530 --> 01:24:43.220 If X wins the game, the value of that game board is 1. 01:24:43.220 --> 01:24:48.680 And so from this position, if this state can only ever lead to this state, 01:24:48.680 --> 01:24:52.640 it's the only possible option, and this state has a value of 1, 01:24:52.640 --> 01:24:57.230 then the maximum possible value that the X player can get from this game board 01:24:57.230 --> 01:24:58.890 is also 1 from here. 01:24:58.890 --> 01:25:01.670 The only place we can get is to a game with the value of 1, 01:25:01.670 --> 01:25:05.260 so this game board also has a value of 1. 01:25:05.260 --> 01:25:07.640 Now we consider this one over here. 
01:25:07.640 --> 01:25:08.860 What's going to happen now? 01:25:08.860 --> 01:25:10.460 Well, X needs to make a move. 01:25:10.460 --> 01:25:13.510 The only move X can make is in the upper left, so X will go there. 01:25:13.510 --> 01:25:15.350 And in this game, no one wins the game. 01:25:15.350 --> 01:25:16.940 Nobody has three in a row. 01:25:16.940 --> 01:25:19.600 So the value of that game board is 0. 01:25:19.600 --> 01:25:20.890 Nobody's won. 01:25:20.890 --> 01:25:24.880 And so again, by the same logic, if from this board position, the only place 01:25:24.880 --> 01:25:27.760 we can get to is a board where the value is 0, 01:25:27.760 --> 01:25:31.290 then this state must also have a value of 0. 01:25:31.290 --> 01:25:35.320 And now here comes the choice part, the idea of trying to minimize. 01:25:35.320 --> 01:25:38.970 I, as the O player, now know that if I make this choice, 01:25:38.970 --> 01:25:43.140 moving in the upper left, that is going to result in a game with a value of 1, 01:25:43.140 --> 01:25:45.240 assuming everyone plays optimally. 01:25:45.240 --> 01:25:47.040 And if I instead play in the lower middle, 01:25:47.040 --> 01:25:50.250 choose this fork in the road, that is going to result in a game board 01:25:50.250 --> 01:25:51.510 with a value of 0. 01:25:51.510 --> 01:25:52.750 I have two options. 01:25:52.750 --> 01:25:56.380 I have a 1 and a 0 to choose from, and I need to pick. 01:25:56.380 --> 01:25:59.520 And as the min player, I would rather choose the option 01:25:59.520 --> 01:26:00.940 with the minimum value. 01:26:00.940 --> 01:26:03.090 So whenever a player has multiple choices, 01:26:03.090 --> 01:26:06.030 the min player will choose the option with the smallest value. 01:26:06.030 --> 01:26:08.700 The max player will choose the option with the largest value. 01:26:08.700 --> 01:26:11.470 Between the 1 and the 0, the 0 is smaller, 01:26:11.470 --> 01:26:14.740 meaning I'd rather tie the game than lose the game. 
01:26:14.740 --> 01:26:18.090 And so this game board, we'll say, also has a value of 0, 01:26:18.090 --> 01:26:22.290 because if I am playing optimally, I will pick this fork in the road. 01:26:22.290 --> 01:26:25.290 I'll place my O here to block X's three in a row. 01:26:25.290 --> 01:26:28.080 X will move in the upper left, and the game will be over, 01:26:28.080 --> 01:26:30.180 and no one will have won the game. 01:26:30.180 --> 01:26:34.260 So this is now the logic of Minimax, to consider all of the possible options 01:26:34.260 --> 01:26:37.140 that I can take, all of the actions that I can take, 01:26:37.140 --> 01:26:39.390 and then to put myself in my opponent's shoes. 01:26:39.390 --> 01:26:42.930 I decide what move I'm going to make now by considering what move 01:26:42.930 --> 01:26:44.880 my opponent will make on the next turn. 01:26:44.880 --> 01:26:48.230 And to do that, I consider what move I would make on the turn after that, 01:26:48.230 --> 01:26:52.560 so on and so forth, until I get all the way down to the end of the game, 01:26:52.560 --> 01:26:55.050 to one of these so-called terminal states. 01:26:55.050 --> 01:26:57.480 In fact, this very decision point, where I 01:26:57.480 --> 01:27:00.630 am trying to decide, as the O player, what move to make, 01:27:00.630 --> 01:27:04.770 might have just been a part of the logic that the X player, my opponent, 01:27:04.770 --> 01:27:06.360 was using the move before me. 01:27:06.360 --> 01:27:09.150 This might be part of some larger tree where 01:27:09.150 --> 01:27:11.580 X is trying to make a move in this situation 01:27:11.580 --> 01:27:13.830 and needs to pick between three different options 01:27:13.830 --> 01:27:16.440 in order to make a decision about what happens. 
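This put-yourself-in-the-opponent's-shoes reasoning is naturally recursive, and a tiny sketch over an abstract game tree captures it: an integer stands for a terminal state (its own utility), and a list's elements are the states its actions lead to. The tree below is a hedged encoding of the endgame just described, where each of O's two choices leaves X a single forced reply, worth 1 and 0 respectively.

```python
def value(node, maximizing):
    """Minimax value of an abstract game-tree node (a minimal sketch)."""
    if isinstance(node, int):        # terminal state: the leaf is its utility
        return node
    # recurse from the opposite perspective for the opponent's turn
    child_values = [value(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# O (the min player) chooses between two states; in each, X (max) has
# exactly one forced move, leading to utilities 1 (X wins) and 0 (tie).
endgame = [[1],    # O plays the upper left -> X completes three in a row
           [0]]    # O blocks the bottom middle -> the game ends in a tie
print(value(endgame, maximizing=False))  # prints 0: O prefers the tie
```

The same function handles deeper trees unchanged, since each level simply flips the `maximizing` flag.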
01:27:16.440 --> 01:27:19.260 And the further and further away we are from the end of the game, 01:27:19.260 --> 01:27:23.460 the deeper this tree has to go, because every level in this tree 01:27:23.460 --> 01:27:27.750 is going to correspond to one move, one move or action that I take, 01:27:27.750 --> 01:27:32.350 one move or action that my opponent takes, in order to decide what happens. 01:27:32.350 --> 01:27:35.970 And in fact, it turns out that if I am the X player in this position, 01:27:35.970 --> 01:27:38.630 and I recursively do the logic and see I have a choice-- 01:27:38.630 --> 01:27:43.170 three choices, in fact, one of which leads to a value of 0, if I play here, 01:27:43.170 --> 01:27:45.900 and if everyone plays optimally, the game will be a tie. 01:27:45.900 --> 01:27:51.000 If I play here, then O is going to win, and I'll lose, playing optimally. 01:27:51.000 --> 01:27:53.610 Or here, where I, the X player, can win-- 01:27:53.610 --> 01:27:57.230 well, between a score of 0 and negative 1 and 1, 01:27:57.230 --> 01:27:59.070 I'd rather pick the board with a value of 1, 01:27:59.070 --> 01:28:01.270 because that's the maximum value I can get. 01:28:01.270 --> 01:28:05.550 And so this board would also have a maximum value of 1. 01:28:05.550 --> 01:28:07.810 And so this tree can get very, very deep, 01:28:07.810 --> 01:28:11.400 especially as the game starts to have more and more moves. 01:28:11.400 --> 01:28:13.470 And this logic works not just for tic-tac-toe, 01:28:13.470 --> 01:28:16.950 but any of these sorts of games where I make a move, my opponent makes a move, 01:28:16.950 --> 01:28:20.460 and ultimately, we have these adversarial objectives. 01:28:20.460 --> 01:28:24.340 And we can simplify the diagram into a diagram that looks like this. 
01:28:24.340 --> 01:28:27.240 This is a more abstract version of the Minimax tree, 01:28:27.240 --> 01:28:30.450 where these are each states, but I'm no longer representing them as exactly 01:28:30.450 --> 01:28:31.830 like tic-tac-toe boards. 01:28:31.830 --> 01:28:35.690 This is just representing some generic game that might be tic-tac-toe, 01:28:35.690 --> 01:28:38.130 might be some other game altogether. 01:28:38.130 --> 01:28:40.590 Any of these green arrows that are pointing up-- 01:28:40.590 --> 01:28:42.700 that represents a maximizing state. 01:28:42.700 --> 01:28:45.290 I would like the score to be as big as possible. 01:28:45.290 --> 01:28:47.430 And any of these red arrows pointing down-- 01:28:47.430 --> 01:28:50.580 those are minimizing states, where the player is the min player, 01:28:50.580 --> 01:28:54.190 and they are trying to make the score as small as possible. 01:28:54.190 --> 01:28:58.200 So if you imagine in this situation, I am the maximizing player, this player 01:28:58.200 --> 01:29:00.550 here, and I have three choices-- 01:29:00.550 --> 01:29:04.170 one choice gives me a score of 5, one choice gives me a score of 3, 01:29:04.170 --> 01:29:06.310 and one choice gives me a score of 9. 01:29:06.310 --> 01:29:10.050 Well, then, between those three choices, my best option 01:29:10.050 --> 01:29:14.430 is to choose this 9 over here, the score that maximizes my options out 01:29:14.430 --> 01:29:15.960 of all three options. 01:29:15.960 --> 01:29:18.870 And so I can give this state a value of 9, 01:29:18.870 --> 01:29:21.210 because among my three options, that is the best 01:29:21.210 --> 01:29:24.330 choice that I have available to me. 01:29:24.330 --> 01:29:25.800 So that's my decision now. 01:29:25.800 --> 01:29:29.580 You can imagine it's like one move away from the end of the game. 01:29:29.580 --> 01:29:31.670 But then you could also ask a reasonable question. 
01:29:31.670 --> 01:29:35.300 What might my opponent do two moves away from the end of the game? 01:29:35.300 --> 01:29:37.040 My opponent is the minimizing player. 01:29:37.040 --> 01:29:39.810 They are trying to make the score as small as possible. 01:29:39.810 --> 01:29:43.690 Imagine what would have happened if they had to pick which choice to make. 01:29:43.690 --> 01:29:47.370 One choice leads us to this state, where I, the maximizing player, 01:29:47.370 --> 01:29:50.840 am going to opt for 9, the biggest score that I can get. 01:29:50.840 --> 01:29:55.130 And one leads to this state, where I, the maximizing player, 01:29:55.130 --> 01:29:58.980 would choose 8, which is then the largest score that I can get. 01:29:58.980 --> 01:30:02.780 Now, the minimizing player, forced to choose between a 9 or an 8, 01:30:02.780 --> 01:30:07.100 is going to choose the smallest possible score, which in this case is an 8. 01:30:07.100 --> 01:30:09.450 And that is, then, how this process would unfold. 01:30:09.450 --> 01:30:11.660 But the minimizing player, in this case, considers 01:30:11.660 --> 01:30:14.180 both of their options, and then all of the options 01:30:14.180 --> 01:30:17.590 that would happen as a result of that. 01:30:17.590 --> 01:30:21.420 So this now is a general picture of what the Minimax algorithm looks like. 01:30:21.420 --> 01:30:24.700 Let's now try to formalize it using a little bit of pseudocode. 01:30:24.700 --> 01:30:27.720 So what exactly is happening in the Minimax algorithm? 01:30:27.720 --> 01:30:31.500 Well, given a state, S, we need to decide what should happen. 01:30:31.500 --> 01:30:34.980 The max player-- if it's the max player's turn, then 01:30:34.980 --> 01:30:39.540 max is going to pick an action, A, in actions of S. Recall 01:30:39.540 --> 01:30:42.090 that actions is a function that takes a state 01:30:42.090 --> 01:30:44.970 and gives me back all of the possible actions that I can take. 
01:30:44.970 --> 01:30:48.770 It tells me all of the moves that are possible. 01:30:48.770 --> 01:30:50.610 The max player is going to specifically pick 01:30:50.610 --> 01:30:53.970 an action, A, in the set of actions that gives me 01:30:53.970 --> 01:31:01.420 the highest value of min value of result of S and A. So what does that mean? 01:31:01.420 --> 01:31:04.350 Well, it means that I want to make the option that gives me 01:31:04.350 --> 01:31:07.830 the highest score of all of the actions, A. 01:31:07.830 --> 01:31:09.630 But what score is that going to have? 01:31:09.630 --> 01:31:12.780 To calculate that, I need to know what my opponent, the min player, 01:31:12.780 --> 01:31:18.300 is going to do if they try to minimize the value of the state that results. 01:31:18.300 --> 01:31:22.260 So we say, what state results after I take this action, 01:31:22.260 --> 01:31:24.570 and what happens when the min player tries 01:31:24.570 --> 01:31:27.570 to minimize the value of that state? 01:31:27.570 --> 01:31:30.260 I consider that for all of my possible options. 01:31:30.260 --> 01:31:32.850 And after I've considered that for all of my possible options, 01:31:32.850 --> 01:31:36.770 I pick the action, A, that has the highest value. 01:31:36.770 --> 01:31:40.090 Likewise, the min player is going to do the same thing, but backwards. 01:31:40.090 --> 01:31:43.180 They're also going to consider, what are all of the possible actions they 01:31:43.180 --> 01:31:44.800 can take if it's their turn? 01:31:44.800 --> 01:31:47.740 And they're going to pick the action, A, that has the smallest 01:31:47.740 --> 01:31:50.220 possible value of all the options. 
01:31:50.220 --> 01:31:53.430 And the way they know what the smallest possible value of all the options is, 01:31:53.430 --> 01:31:57.490 is by considering what the max player is going to do, 01:31:57.490 --> 01:32:01.720 by saying, what's the result of applying this action to the current state, 01:32:01.720 --> 01:32:03.640 and then, what would the max player try to do? 01:32:03.640 --> 01:32:08.000 What value would the max player calculate for that particular state? 01:32:08.000 --> 01:32:11.260 So everyone makes their decision based on trying to estimate 01:32:11.260 --> 01:32:13.600 what the other person would do. 01:32:13.600 --> 01:32:15.930 And now we need to turn our attention to these two 01:32:15.930 --> 01:32:18.540 functions, maxValue and minValue. 01:32:18.540 --> 01:32:21.690 How do you actually calculate the value of a state 01:32:21.690 --> 01:32:24.270 if you're trying to maximize its value, and how do you 01:32:24.270 --> 01:32:27.690 calculate the value of a state if you're trying to minimize the value? 01:32:27.690 --> 01:32:30.750 If you can do that, then we have an entire implementation 01:32:30.750 --> 01:32:32.920 of this Minimax algorithm. 01:32:32.920 --> 01:32:33.610 So let's try it. 01:32:33.610 --> 01:32:36.480 Let's try and implement this maxValue function 01:32:36.480 --> 01:32:40.630 that takes a state and returns as output the value of that state 01:32:40.630 --> 01:32:43.860 if I'm trying to maximize the value of the state. 01:32:43.860 --> 01:32:46.860 Well, the first thing I can check for is to see if the game is over, 01:32:46.860 --> 01:32:48.100 because if the game is over-- 01:32:48.100 --> 01:32:50.820 in other words, if the state is a terminal state-- 01:32:50.820 --> 01:32:51.990 then this is easy. 01:32:51.990 --> 01:32:54.930 I already have this utility function that tells me 01:32:54.930 --> 01:32:56.280 what the value of the board is. 01:32:56.280 --> 01:32:58.660 If the game is over, I just check, did X win? 
01:32:58.660 --> 01:32:59.160 Did O win? 01:32:59.160 --> 01:33:00.150 Is that a tie? 01:33:00.150 --> 01:33:04.200 And the utility function just knows what the value of the state is. 01:33:04.200 --> 01:33:06.430 What's trickier is if the game isn't over, 01:33:06.430 --> 01:33:09.240 because then I need to do this recursive reasoning about thinking, 01:33:09.240 --> 01:33:12.630 what is my opponent going to do on the next move? 01:33:12.630 --> 01:33:15.830 Then I want to calculate the value of this state, 01:33:15.830 --> 01:33:19.060 and I want the value of the state to be as high as possible. 01:33:19.060 --> 01:33:21.900 And I'll keep track of that value in a variable called v. 01:33:21.900 --> 01:33:24.570 And if I want the value to be as high as possible, 01:33:24.570 --> 01:33:27.330 I need to give v an initial value. 01:33:27.330 --> 01:33:31.500 And initially, I'll just go ahead and set it to be as low as possible, 01:33:31.500 --> 01:33:34.650 because I don't know what options are available to me yet. 01:33:34.650 --> 01:33:38.610 So initially, I'll set v equal to negative infinity, which 01:33:38.610 --> 01:33:40.620 seems a little bit strange, but the idea here 01:33:40.620 --> 01:33:43.680 is, I want the value initially to be as low as possible, 01:33:43.680 --> 01:33:46.230 because as I consider my actions, I'm always 01:33:46.230 --> 01:33:50.220 going to try and do better than v. And if I set v to negative infinity, 01:33:50.220 --> 01:33:52.850 I know I can always do better than that. 01:33:52.850 --> 01:33:54.960 So now I consider my actions. 01:33:54.960 --> 01:33:56.710 And this is going to be some kind of loop, 01:33:56.710 --> 01:34:00.070 where for every action in actions of state-- 01:34:00.070 --> 01:34:02.830 recall, actions is a function that takes my state 01:34:02.830 --> 01:34:06.650 and gives me all the possible actions that I can use in that state. 
01:34:06.650 --> 01:34:11.570 So for each one of those actions, I want to compare it to v and say, 01:34:11.570 --> 01:34:18.290 all right, v is going to be equal to the maximum of v and this expression. 01:34:18.290 --> 01:34:20.090 So what is this expression? 01:34:20.090 --> 01:34:24.640 Well, first it is, get the result of taking the action in the state, 01:34:24.640 --> 01:34:28.250 and then get the min value of that. 01:34:28.250 --> 01:34:31.090 In other words, let's say, I want to find out 01:34:31.090 --> 01:34:34.220 from that state what is the best that the min player can do, 01:34:34.220 --> 01:34:36.430 because they are going to try and minimize the score. 01:34:36.430 --> 01:34:40.210 So whatever the resulting score is of the min value of that state, 01:34:40.210 --> 01:34:43.870 compare it to my current best value, and just pick the maximum of those two, 01:34:43.870 --> 01:34:46.460 because I am trying to maximize the value. 01:34:46.460 --> 01:34:48.550 In short, what these three lines of code are doing 01:34:48.550 --> 01:34:52.460 is going through all of my possible actions and asking the question, 01:34:52.460 --> 01:34:57.880 how do I maximize the score, given what my opponent is going to try to do? 01:34:57.880 --> 01:35:00.640 After this entire loop, I can just return v, 01:35:00.640 --> 01:35:04.070 and that is now the value of that particular state. 01:35:04.070 --> 01:35:07.910 And for the min player, it's the exact opposite of this, the same logic, 01:35:07.910 --> 01:35:08.930 just backwards. 01:35:08.930 --> 01:35:10.910 To calculate the minimum value of a state, 01:35:10.910 --> 01:35:12.770 first we check if it's a terminal state. 01:35:12.770 --> 01:35:14.960 If it is, we return its utility. 01:35:14.960 --> 01:35:19.280 Otherwise, we're going to now try to minimize the value of the state, 01:35:19.280 --> 01:35:21.290 given all of my possible actions. 
01:35:21.290 --> 01:35:24.800 So I need an initial value for v, the value of the state. 01:35:24.800 --> 01:35:28.430 And initially, I'll set it to infinity, because I know it can always 01:35:28.430 --> 01:35:30.390 get something less than infinity. 01:35:30.390 --> 01:35:33.920 So by starting with v equals infinity, I make sure that the very first action 01:35:33.920 --> 01:35:34.580 I find-- 01:35:34.580 --> 01:35:37.550 that will be less than this value of v. 01:35:37.550 --> 01:35:38.810 And then I do the same thing-- 01:35:38.810 --> 01:35:41.630 loop over all of my possible actions, and for each 01:35:41.630 --> 01:35:45.740 of the results that we could get when the max player makes their decision, 01:35:45.740 --> 01:35:49.160 let's take the minimum of that and the current value of v. 01:35:49.160 --> 01:35:53.210 So after all is said and done, I get the smallest possible value of v, 01:35:53.210 --> 01:35:56.480 which I then return back to the user. 01:35:56.480 --> 01:35:59.020 So that, in effect, is the pseudocode for Minimax. 01:35:59.020 --> 01:36:01.990 That is how we take a game and figure out what the best move to make 01:36:01.990 --> 01:36:06.490 is by recursively using these maxValue and minValue functions, where 01:36:06.490 --> 01:36:10.060 maxValue calls minValue, minValue calls maxValue, back 01:36:10.060 --> 01:36:13.540 and forth, all the way until we reach a terminal state, at which point 01:36:13.540 --> 01:36:18.750 our algorithm can simply return the utility of that particular state. 01:36:18.750 --> 01:36:20.590 What you might imagine is that this is going 01:36:20.590 --> 01:36:23.770 to start to be a long process, especially as games start 01:36:23.770 --> 01:36:28.060 to get more complex, as we start to add more moves and more possible options 01:36:28.060 --> 01:36:30.710 and games that might last quite a bit longer. 01:36:30.710 --> 01:36:34.360 So the next question to ask is, what sort of optimizations can we make here? 
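Before turning to optimizations, the pseudocode can be collected into one possible Python transliteration. The `max_value`/`min_value` loops below mirror the pseudocode line for line; to keep the sketch self-contained, it runs on a toy tree game rather than full tic-tac-toe, so the `terminal`, `utility`, `actions`, and `result` definitions here (an integer is a terminal state carrying its own score; a list's elements are successor states) are stand-in assumptions, not the lecture's code.

```python
import math

# Stand-in game-definition functions for a toy tree game (assumed encoding).
def terminal(state):
    return isinstance(state, int)     # a leaf is a terminal state

def utility(state):
    return state                      # a terminal leaf carries its own score

def actions(state):
    return range(len(state))          # one action per child

def result(state, action):
    return state[action]              # transition model: follow that child

def max_value(state):
    if terminal(state):
        return utility(state)
    v = -math.inf                     # start as low as possible
    for action in actions(state):
        # the min player replies optimally to each of my options
        v = max(v, min_value(result(state, action)))
    return v

def min_value(state):
    if terminal(state):
        return utility(state)
    v = math.inf                      # start as high as possible
    for action in actions(state):
        v = min(v, max_value(result(state, action)))
    return v

def minimax(state, to_move):
    """Pick the action whose resulting state is best against optimal play."""
    if to_move == "max":
        return max(actions(state), key=lambda a: min_value(result(state, a)))
    return min(actions(state), key=lambda a: max_value(result(state, a)))
```

The mutual recursion is exactly the back-and-forth described above: `max_value` calls `min_value` on each successor and vice versa, bottoming out at terminal states.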
01:36:34.360 --> 01:36:37.780 How can we do better in order to use less space 01:36:37.780 --> 01:36:42.110 or take less time to be able to solve this kind of problem? 01:36:42.110 --> 01:36:44.740 And we'll take a look at a couple of possible optimizations. 01:36:44.740 --> 01:36:47.330 But for one, we'll take a look at this example. 01:36:47.330 --> 01:36:49.850 Again, we're turning to these up arrows and down arrows. 01:36:49.850 --> 01:36:54.070 Let's imagine that I now am the max player, this green arrow. 01:36:54.070 --> 01:36:57.260 I am trying to make the score as high as possible. 01:36:57.260 --> 01:37:00.370 And this is an easy game, where there are just two moves. 01:37:00.370 --> 01:37:02.980 I make a move, one of these three options, 01:37:02.980 --> 01:37:05.890 and then my opponent makes a move, one of these three options, 01:37:05.890 --> 01:37:07.290 based on what move I make. 01:37:07.290 --> 01:37:10.310 And as a result, we get some value. 01:37:10.310 --> 01:37:13.450 Let's look at the order in which I do these calculations 01:37:13.450 --> 01:37:16.810 and figure out if there are any optimizations I might be able to make 01:37:16.810 --> 01:37:18.760 to this calculation process. 01:37:18.760 --> 01:37:21.200 I'm going to have to look at these states one at a time. 01:37:21.200 --> 01:37:23.740 So let's say I start here on the left and say, all right, now 01:37:23.740 --> 01:37:28.300 I'm going to consider, what will the min player, my opponent, try to do here? 01:37:28.300 --> 01:37:31.810 Well, the min player is going to look at all three of their possible actions 01:37:31.810 --> 01:37:34.270 and look at their value, because these are terminal states. 01:37:34.270 --> 01:37:35.430 They're the end of the game. 01:37:35.430 --> 01:37:38.830 And so they'll see, all right, this node is a value of 4, value of 8, 01:37:38.830 --> 01:37:40.410 value of 5. 01:37:40.410 --> 01:37:42.800 And the min player is going to say, well, all right. 
01:37:42.800 --> 01:37:45.790 Between these three options, 4, 8, and 5, 01:37:45.790 --> 01:37:48.070 I'll take the smallest one. I'll take the 4. 01:37:48.070 --> 01:37:50.610 So this state now has a value of 4. 01:37:50.610 --> 01:37:54.060 Then I, as the max player, say, all right, if I take this action, 01:37:54.060 --> 01:37:55.170 it will have a value of 4. 01:37:55.170 --> 01:37:57.330 That's the best that I can do, because the min player 01:37:57.330 --> 01:37:59.810 is going to try and minimize my score. 01:37:59.810 --> 01:38:01.270 So now, what if I take this option? 01:38:01.270 --> 01:38:02.620 We'll explore this next. 01:38:02.620 --> 01:38:06.250 And now I explore what the min player would do if I choose this action. 01:38:06.250 --> 01:38:09.350 And the min player is going to say, all right, what are the three options? 01:38:09.350 --> 01:38:14.440 The min player has options between 9, 3, and 7, and so 3 01:38:14.440 --> 01:38:16.510 is the smallest among 9, 3, and 7. 01:38:16.510 --> 01:38:19.640 So we'll go ahead and say this state has a value of 3. 01:38:19.640 --> 01:38:20.980 So now I, as the max player-- 01:38:20.980 --> 01:38:23.350 I have now explored two of my three options. 01:38:23.350 --> 01:38:27.390 I know that one of my options will guarantee me a score of 4, at least, 01:38:27.390 --> 01:38:31.100 and one of my options will guarantee me a score of 3. 01:38:31.100 --> 01:38:34.160 And now I consider my third option and say, all right, what happens here? 01:38:34.160 --> 01:38:35.910 Same exact logic-- the min player is going 01:38:35.910 --> 01:38:38.120 to look at these three states, 2, 4, and 6, 01:38:38.120 --> 01:38:42.480 say the minimum possible option is 2, so the min player wants the 2. 01:38:42.480 --> 01:38:45.780 Now I, as the max player, have calculated all of the information 01:38:45.780 --> 01:38:49.220 by looking two layers deep, by looking at all of these nodes. 
01:38:49.220 --> 01:38:52.770 And I can now say, between the 4, the 3, and the 2, you know what? 01:38:52.770 --> 01:38:55.620 I'd rather take the 4, because if I choose 01:38:55.620 --> 01:38:58.200 this option, if my opponent plays optimally, 01:38:58.200 --> 01:39:01.620 they will try and get me to the 4, but that's the best I can do. 01:39:01.620 --> 01:39:04.170 I can't guarantee a higher score, because if I 01:39:04.170 --> 01:39:07.740 pick either of these two options, I might get a 3, or I might get a 2. 01:39:07.740 --> 01:39:10.620 And it's true that down here is a 9, and that's 01:39:10.620 --> 01:39:12.390 the highest score of any of the scores. 01:39:12.390 --> 01:39:14.230 So I might be tempted to say, you know what? 01:39:14.230 --> 01:39:17.400 Maybe I should take this option, because I might get the 9. 01:39:17.400 --> 01:39:19.980 But if the min player is playing intelligently, 01:39:19.980 --> 01:39:22.650 if they're making the best moves at each possible option 01:39:22.650 --> 01:39:26.370 they have when they get to make a choice, I'll be left with a 3, 01:39:26.370 --> 01:39:28.470 whereas I could, playing optimally, 01:39:28.470 --> 01:39:31.720 have guaranteed that I would get the 4. 01:39:31.720 --> 01:39:33.600 So that doesn't affect the logic that I would 01:39:33.600 --> 01:39:38.910 use as a Minimax player trying to maximize my score from that node there. 01:39:38.910 --> 01:39:41.310 But it turns out, that took quite a bit of computation 01:39:41.310 --> 01:39:42.390 for me to figure that out. 01:39:42.390 --> 01:39:45.690 I had to reason through all of these nodes in order to draw this conclusion. 01:39:45.690 --> 01:39:48.780 And this is for a pretty simple game, where I have three choices, 01:39:48.780 --> 01:39:52.270 my opponent has three choices, and then the game's over. 01:39:52.270 --> 01:39:55.070 So what I'd like to do is come up with some way to optimize this. 
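The reasoning just walked through is nothing more than nested minimums followed by one maximum. With the leaf values from this example (4, 8, 5 on the left; 9, 3, 7 in the middle; 2, 4, 6 on the right), it can be checked in a few lines:

```python
# Leaf utilities for each of my three actions; each inner list
# holds the options the min player would then choose among.
leaves = [[4, 8, 5], [9, 3, 7], [2, 4, 6]]

# The min player takes the smallest value in each branch...
branch_values = [min(options) for options in leaves]

# ...and I, the max player, take the branch whose value is largest.
best_value = max(branch_values)

print(branch_values, best_value)  # [4, 3, 2] 4
```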
01:39:55.070 --> 01:39:58.990 Maybe I don't need to do all of this calculation to still reach 01:39:58.990 --> 01:40:00.450 the conclusion that, you know what? 01:40:00.450 --> 01:40:01.950 This action to the left-- 01:40:01.950 --> 01:40:03.810 that's the best that I could do. 01:40:03.810 --> 01:40:07.290 Let's go ahead and try again and try and be a little more intelligent 01:40:07.290 --> 01:40:10.150 about how I go about doing this. 01:40:10.150 --> 01:40:12.320 So first, I start the exact same way. 01:40:12.320 --> 01:40:14.160 I don't know what to do initially, so I just 01:40:14.160 --> 01:40:18.930 have to consider one of the options and consider what the min player might do. 01:40:18.930 --> 01:40:21.540 Min has three options, 4, 8, and 5. 01:40:21.540 --> 01:40:25.500 And between those three options, min says, 4 is the best they can do, 01:40:25.500 --> 01:40:28.390 because they want to try to minimize the score. 01:40:28.390 --> 01:40:31.960 Now, I, the max player, will consider my second option, 01:40:31.960 --> 01:40:36.730 making this move here and considering what my opponent would do in response. 01:40:36.730 --> 01:40:38.470 What will the min player do? 01:40:38.470 --> 01:40:41.560 Well, the min player is going to, from that state, look at their options. 01:40:41.560 --> 01:40:42.710 And they would say, all right. 01:40:42.710 --> 01:40:45.980 9 is an option, 3 is an option. 01:40:45.980 --> 01:40:48.220 And if I am doing the math from this initial state, 01:40:48.220 --> 01:40:51.400 doing all this calculation, when I see a 3, 01:40:51.400 --> 01:40:54.250 that should immediately be a red flag for me, 01:40:54.250 --> 01:40:56.890 because when I see a 3 down here at this state, 01:40:56.890 --> 01:41:01.870 I know that the value of this state is going to be at most 3. 
01:41:01.870 --> 01:41:04.630 It's going to be 3 or something less than 3, 01:41:04.630 --> 01:41:08.050 even though I haven't yet looked at this last action or even further actions 01:41:08.050 --> 01:41:10.870 if there were more actions that could be taken here. 01:41:10.870 --> 01:41:11.900 How do I know that? 01:41:11.900 --> 01:41:16.000 Well, I know that the min player is going to try to minimize my score. 01:41:16.000 --> 01:41:20.350 And if they see a 3, the only way this could be something other than a 3 01:41:20.350 --> 01:41:24.520 is if this remaining thing that I haven't yet looked at is less than 3, 01:41:24.520 --> 01:41:28.810 which means there is no way for this value to be anything more than 3, 01:41:28.810 --> 01:41:31.390 because the min player can already guarantee a 3, 01:41:31.390 --> 01:41:34.930 and they are trying to minimize my score. 01:41:34.930 --> 01:41:36.380 So what does that tell me? 01:41:36.380 --> 01:41:38.740 Well, it tells me that if I choose this action, 01:41:38.740 --> 01:41:43.240 my score is going to be 3, or maybe even less than 3, if I'm unlucky. 01:41:43.240 --> 01:41:47.500 But I already know that this action will guarantee me a 4. 01:41:47.500 --> 01:41:51.230 And so given that I know that this action guarantees me a score of 4, 01:41:51.230 --> 01:41:54.130 and this action means I can't do better than 3, 01:41:54.130 --> 01:41:56.290 if I'm trying to maximize my options, there 01:41:56.290 --> 01:41:59.290 is no need for me to consider this triangle here. 01:41:59.290 --> 01:42:01.990 There is no value, no number that could go here, 01:42:01.990 --> 01:42:04.760 that would change my mind between these two options. 01:42:04.760 --> 01:42:08.020 I'm always going to opt for this path that gets me a 4, 01:42:08.020 --> 01:42:11.260 as opposed to this path, where the best I can do is a 3, 01:42:11.260 --> 01:42:13.600 if my opponent plays optimally. 
01:42:13.600 --> 01:42:16.850 And this is going to be true for all of the future states that I look at, too. 01:42:16.850 --> 01:42:19.470 If I look over here, at what the min player might do over here, 01:42:19.470 --> 01:42:24.510 if I see that this state is a 2, I know that this state is at most a 2, 01:42:24.510 --> 01:42:28.500 because the only way this value could be something other than 2 01:42:28.500 --> 01:42:31.820 is if one of these remaining states is less than a 2, 01:42:31.820 --> 01:42:34.600 and so the min player would opt for that instead. 01:42:34.600 --> 01:42:37.470 So even without looking at these remaining states, 01:42:37.470 --> 01:42:42.600 I, as the maximizing player, can know that choosing this path to the left 01:42:42.600 --> 01:42:47.280 is going to be better than choosing either of those two paths to the right, 01:42:47.280 --> 01:42:51.810 because this one can't be better than 3, this one can't be better than 2, 01:42:51.810 --> 01:42:56.550 and so 4 in this case is the best that I can do. 01:42:56.550 --> 01:42:59.580 And I can say now that this state has a value of 4. 01:42:59.580 --> 01:43:01.680 So in order to do this type of calculation, 01:43:01.680 --> 01:43:04.980 I was doing a little bit more bookkeeping, keeping track of things, 01:43:04.980 --> 01:43:08.480 keeping track all the time of, what is the best that I can do, 01:43:08.480 --> 01:43:11.400 what is the worst that I can do, and for each of these states, saying, 01:43:11.400 --> 01:43:15.300 all right, well, if I already know that I can get a 4, 01:43:15.300 --> 01:43:18.060 then if the best I can do at this state is a 3, 01:43:18.060 --> 01:43:19.770 no reason for me to consider it. 01:43:19.770 --> 01:43:25.020 I can effectively prune this leaf and anything below it from the tree. 01:43:25.020 --> 01:43:28.440 And it's for that reason this approach, this optimization to Minimax, 01:43:28.440 --> 01:43:30.510 is called alpha-beta pruning. 
01:43:30.510 --> 01:43:32.400 Alpha and beta stand for these two values 01:43:32.400 --> 01:43:34.950 that you'll have to keep track of, the best you can do so far 01:43:34.950 --> 01:43:36.600 and the worst you can do so far. 01:43:36.600 --> 01:43:41.170 And pruning is the idea of, if I have a big, long, deep search tree, 01:43:41.170 --> 01:43:43.650 I might be able to search it more efficiently if I don't 01:43:43.650 --> 01:43:47.070 need to search through everything, if I can remove some of the nodes 01:43:47.070 --> 01:43:52.170 to try and optimize the way that I look through this entire search space. 01:43:52.170 --> 01:43:55.500 So alpha-beta pruning can definitely save us a lot of time 01:43:55.500 --> 01:43:59.430 as we go about the search process by making our searches more efficient. 01:43:59.430 --> 01:44:03.750 But even then, it's still not great as games get more complex. 01:44:03.750 --> 01:44:06.990 Tic-tac-toe, fortunately, is a relatively simple game, 01:44:06.990 --> 01:44:09.820 and we might reasonably ask a question like, 01:44:09.820 --> 01:44:13.640 how many total possible tic-tac-toe games are there? 01:44:13.640 --> 01:44:14.600 You can think about it. 01:44:14.600 --> 01:44:17.640 You can try and estimate, how many moves are there at any given point? 01:44:17.640 --> 01:44:19.500 How many moves long can the game last? 01:44:19.500 --> 01:44:26.250 It turns out there are about 255,000 possible tic-tac-toe games that 01:44:26.250 --> 01:44:27.780 can be played. 01:44:27.780 --> 01:44:30.240 But compare that to a more complex game, something 01:44:30.240 --> 01:44:32.060 like a game of chess, for example-- 01:44:32.060 --> 01:44:35.950 far more pieces, far more moves, games that last much longer. 01:44:35.950 --> 01:44:38.940 How many total possible chess games could there be? 
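One common way to sketch that bookkeeping in Python: alpha (the best value the max player can guarantee so far) and beta (the best value the min player can guarantee so far) are threaded through the recursion, and a branch is abandoned as soon as it provably can't change the decision. The `terminal`, `utility`, `actions`, and `result` helpers bundled in the `game` dictionary are hypothetical stand-ins for whatever the game actually provides.

```python
import math

def ab_max_value(state, game, alpha=-math.inf, beta=math.inf):
    """Max player's value of `state`, pruning branches that can't matter."""
    if game["terminal"](state):
        return game["utility"](state)
    v = -math.inf
    for action in game["actions"](state):
        v = max(v, ab_min_value(game["result"](state, action), game, alpha, beta))
        if v >= beta:          # the min player above would never allow this branch
            return v           # prune: remaining actions can't change the outcome
        alpha = max(alpha, v)  # max can now guarantee at least v
    return v

def ab_min_value(state, game, alpha=-math.inf, beta=math.inf):
    """Min player's value of `state`, the mirror image of ab_max_value."""
    if game["terminal"](state):
        return game["utility"](state)
    v = math.inf
    for action in game["actions"](state):
        v = min(v, ab_max_value(game["result"](state, action), game, alpha, beta))
        if v <= alpha:         # the max player above already has something this good
            return v           # prune
        beta = min(beta, v)    # min can now hold max down to at most v
    return v
```

On the example tree above, this evaluates only 6 of the 9 leaves and still returns the same value, 4.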
01:44:38.940 --> 01:44:41.100 It turns out that after just four moves each, 01:44:41.100 --> 01:44:44.100 four moves by the white player, four moves by the black player, 01:44:44.100 --> 01:44:47.280 that there are 288 billion possible chess 01:44:47.280 --> 01:44:51.300 games that can result from that situation, after just four moves each. 01:44:51.300 --> 01:44:52.330 And going even further. 01:44:52.330 --> 01:44:55.380 If you look at entire chess games and how many possible chess games there 01:44:55.380 --> 01:44:58.560 could be as a result there, there are more than 10 01:44:58.560 --> 01:45:02.820 to the 29,000 possible chess games, far more chess games 01:45:02.820 --> 01:45:04.410 than could ever be considered. 01:45:04.410 --> 01:45:08.010 And this is a pretty big problem for the Minimax algorithm, because the Minimax 01:45:08.010 --> 01:45:12.150 algorithm starts with an initial state, considers all the possible actions 01:45:12.150 --> 01:45:15.570 and all the possible actions after that, all the way 01:45:15.570 --> 01:45:18.510 until we get to the end of the game. 01:45:18.510 --> 01:45:20.640 And that's going to be a problem if the computer is 01:45:20.640 --> 01:45:23.100 going to need to look through this many states, which 01:45:23.100 --> 01:45:28.680 is far more than any computer could ever do in any reasonable amount of time. 01:45:28.680 --> 01:45:30.900 So what do we do in order to solve this problem? 01:45:30.900 --> 01:45:32.980 Instead of looking through all these states, which 01:45:32.980 --> 01:45:36.420 is totally intractable for a computer, we need some better approach. 01:45:36.420 --> 01:45:39.600 And it turns out that better approach generally takes the form of something 01:45:39.600 --> 01:45:42.000 called depth-limited Minimax. 
01:45:42.000 --> 01:45:44.740 Where normally Minimax is depth-unlimited-- 01:45:44.740 --> 01:45:47.220 we just keep going, layer after layer, move after move, 01:45:47.220 --> 01:45:48.930 until we get to the end of the game-- 01:45:48.930 --> 01:45:51.920 depth-limited Minimax is instead going to say, you know what? 01:45:51.920 --> 01:45:53.760 After a certain number of moves-- maybe I'll 01:45:53.760 --> 01:45:57.390 look 10 moves ahead, maybe I'll look 12 moves ahead, but after that point, 01:45:57.390 --> 01:46:00.350 I'm going to stop and not consider additional moves that 01:46:00.350 --> 01:46:02.190 might come after that, just because it would 01:46:02.190 --> 01:46:07.970 be computationally intractable to consider all of those possible options. 01:46:07.970 --> 01:46:10.730 But what do we do after we get 10 or 12 moves deep, 01:46:10.730 --> 01:46:13.840 and we arrive at a situation where the game's not over? 01:46:13.840 --> 01:46:18.110 Minimax still needs a way to assign a score to that game board or game state 01:46:18.110 --> 01:46:20.390 to figure out what its current value is, which 01:46:20.390 --> 01:46:22.760 is easy to do if the game is over, but not so 01:46:22.760 --> 01:46:25.400 easy to do if the game is not yet over. 01:46:25.400 --> 01:46:27.950 So in order to do that, we need to add one additional feature 01:46:27.950 --> 01:46:31.400 to depth-limited Minimax called an evaluation function, 01:46:31.400 --> 01:46:33.170 which is just some function that is going 01:46:33.170 --> 01:46:38.000 to estimate the expected utility of a game from a given state. 01:46:38.000 --> 01:46:41.000 So in a game like chess, if you imagine that a game value of 1 01:46:41.000 --> 01:46:45.950 means white wins, negative 1 means black wins, 0 means it's a draw, 01:46:45.950 --> 01:46:51.260 then you might imagine that a score of 0.8 means white is very likely to win, 01:46:51.260 --> 01:46:53.000 though certainly not guaranteed. 
01:46:53.000 --> 01:46:56.450 And you would have an evaluation function that estimates 01:46:56.450 --> 01:46:59.450 how good the game state happens to be. 01:46:59.450 --> 01:47:02.720 And depending on how good that evaluation function is, 01:47:02.720 --> 01:47:06.050 that is ultimately what's going to constrain how good the AI is. 01:47:06.050 --> 01:47:09.080 The better the AI is at estimating how good 01:47:09.080 --> 01:47:12.440 or how bad any particular game state is, the better the AI 01:47:12.440 --> 01:47:14.660 is going to be able to play that game. 01:47:14.660 --> 01:47:16.910 If the evaluation function is worse and not as good 01:47:16.910 --> 01:47:19.670 at estimating what the expected utility is, 01:47:19.670 --> 01:47:21.680 then it's going to be a whole lot harder. 01:47:21.680 --> 01:47:25.130 And you can imagine trying to come up with these evaluation functions. 01:47:25.130 --> 01:47:27.890 In chess, for example, you might write an evaluation function 01:47:27.890 --> 01:47:30.230 based on how many pieces you have, as compared 01:47:30.230 --> 01:47:32.540 to how many pieces your opponent has, because each one 01:47:32.540 --> 01:47:35.240 has a value in your evaluation function. 01:47:35.240 --> 01:47:36.950 It probably needs to be a little bit more 01:47:36.950 --> 01:47:40.400 complicated than that to consider other possible situations that 01:47:40.400 --> 01:47:42.090 might arise as well. 
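As a sketch of how those two ideas fit together: a depth counter cuts the recursion off, and at the cutoff a heuristic evaluation stands in for the true utility. The `game` helpers here are hypothetical, and the piece-counting evaluation is deliberately crude, exactly as cautioned above; a serious chess engine would use something far richer.

```python
import math

def depth_limited_value(state, game, depth, maximizing):
    """Minimax value of `state`, but cut off after `depth` plies;
    below the cutoff, a heuristic evaluation replaces the true utility."""
    if game["terminal"](state):
        return game["utility"](state)
    if depth == 0:
        return game["evaluate"](state)  # an estimate, not an exact value
    children = (depth_limited_value(game["result"](state, a), game,
                                    depth - 1, not maximizing)
                for a in game["actions"](state))
    return max(children) if maximizing else min(children)

def material_evaluation(white_pieces, black_pieces):
    """A crude chess-style evaluation: compare total piece values,
    squashed into (-1, 1) so it is comparable to win/draw/loss utilities."""
    value = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # conventional piece values
    score = (sum(value.get(p, 0) for p in white_pieces)
             - sum(value.get(p, 0) for p in black_pieces))
    return math.tanh(score / 10)  # positive favors white, negative favors black
```

With a large enough depth this reduces to ordinary Minimax; with depth 0 it simply trusts the evaluation function, which is why the quality of that function constrains the quality of play.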
01:47:42.090 --> 01:47:44.240 And there are many other variants on Minimax 01:47:44.240 --> 01:47:47.570 that add additional features in order to help it perform better 01:47:47.570 --> 01:47:50.330 under these larger and more computationally intractable 01:47:50.330 --> 01:47:54.620 situations, where we couldn't possibly explore all of the possible moves, 01:47:54.620 --> 01:47:57.230 so we need to figure out how to use evaluation 01:47:57.230 --> 01:48:01.370 functions and other techniques to be able to play these games, ultimately, 01:48:01.370 --> 01:48:02.360 better. 01:48:02.360 --> 01:48:05.480 But this now was a look at this kind of adversarial search, these search 01:48:05.480 --> 01:48:08.870 problems where we have situations where I am trying 01:48:08.870 --> 01:48:11.360 to play against some sort of opponent. 01:48:11.360 --> 01:48:13.880 And these search problems show up all over the place 01:48:13.880 --> 01:48:15.560 throughout artificial intelligence. 01:48:15.560 --> 01:48:18.710 We've been talking a lot today about more classical search problems, 01:48:18.710 --> 01:48:22.080 like trying to find directions from one location to another. 01:48:22.080 --> 01:48:25.590 But anytime an AI is faced with trying to make a decision like, 01:48:25.590 --> 01:48:28.460 what do I do now in order to do something that is rational, 01:48:28.460 --> 01:48:31.220 or do something that is intelligent, or trying to play a game, 01:48:31.220 --> 01:48:33.830 like figuring out what move to make, these sort of algorithms 01:48:33.830 --> 01:48:35.700 can really come in handy. 01:48:35.700 --> 01:48:38.420 It turns out that for tic-tac-toe, the solution is pretty simple, 01:48:38.420 --> 01:48:39.600 because it's a small game. 
01:48:39.600 --> 01:48:42.470 XKCD has famously put together a webcomic 01:48:42.470 --> 01:48:45.620 where it will tell you exactly what optimal move 01:48:45.620 --> 01:48:48.290 to make, no matter what your opponent happens to do. 01:48:48.290 --> 01:48:50.510 This type of thing is not quite as possible 01:48:50.510 --> 01:48:52.790 for a much larger game like checkers or chess, 01:48:52.790 --> 01:48:55.400 for example, where chess is totally computationally 01:48:55.400 --> 01:48:57.920 intractable for most computers to be able to explore 01:48:57.920 --> 01:48:59.330 all the possible states. 01:48:59.330 --> 01:49:03.650 So we really need our AI to be far more intelligent about how 01:49:03.650 --> 01:49:05.850 it goes about trying to deal with these problems 01:49:05.850 --> 01:49:08.180 and how it goes about taking this environment 01:49:08.180 --> 01:49:10.010 that it finds itself in and ultimately 01:49:10.010 --> 01:49:12.710 searching for one of these solutions. 01:49:12.710 --> 01:49:15.710 So this, then, was a look at search and artificial intelligence. 01:49:15.710 --> 01:49:17.630 Next time we'll take a look at knowledge, 01:49:17.630 --> 01:49:21.230 thinking about how it is that our AIs are able to know information, reason 01:49:21.230 --> 01:49:25.190 about that information, and draw conclusions, all in our look at AI 01:49:25.190 --> 01:49:26.750 and the principles behind it. 01:49:26.750 --> 01:49:28.500 We'll see you next time.