[MUSIC]

BRIAN YU: All right. Welcome, everyone, to an Introduction to Artificial Intelligence with Python. My name is Brian Yu. And in this class, we'll explore some of the ideas, and techniques, and algorithms that are at the foundation of artificial intelligence. Now, artificial intelligence covers a wide variety of types of techniques. Anytime you see a computer do something that appears to be intelligent or rational in some way, like recognizing someone's face in a photo, or being able to play a game better than people can, or being able to understand human language when we talk to our phones and they understand what we mean and are able to respond back to us, these are all examples of AI, or artificial intelligence. And in this class, we'll explore some of the ideas that make that AI possible.

So we'll begin our conversations with search: the problem of, we have an AI, and we would like the AI to be able to search for solutions to some kind of problem, no matter what that problem might be. Whether it's trying to get driving directions from point A to point B, or trying to figure out how to play a game, given a tic-tac-toe game, for example, and figuring out what move it ought to make.

After that, we'll take a look at knowledge. Ideally, we want our AI to be able to know information, to be able to represent that information, and, more importantly, to be able to draw inferences from that information, to be able to use the information it knows and draw additional conclusions. So we'll talk about how AI can be programmed in order to do just that.

Then we'll explore the topic of uncertainty, talking about what happens if a computer isn't sure about a fact but maybe is only sure with a certain probability. So we'll talk about some of the ideas behind probability and how computers can begin to deal with uncertain events in order to be a little bit more intelligent in that sense, as well.

After that, we'll turn our attention to optimization.
These are problems where the computer is trying to optimize for some sort of goal, especially in a situation where there might be multiple ways that a computer might solve a problem, but we're looking for a better way or, potentially, the best way, if that's at all possible.

Then we'll take a look at machine learning, or learning more generally, looking at how, when we have access to data, our computers can be programmed to be quite intelligent by learning from data and learning from experience, being able to perform a task better and better based on greater access to data. Your email, for example, where your email inbox somehow knows which of your emails are good emails and which of your emails are spam. These are all examples of computers being able to learn from past experiences and past data.

We'll take a look, too, at how computers are able to draw inspiration from human intelligence, looking at the structure of the human brain and how neural networks can be a computer analog to that sort of idea, and how, by taking advantage of a certain type of structure in a computer program, we can write neural networks that are able to perform tasks very, very effectively.

And then finally, we'll turn our attention to language. Not programming languages, but the human languages that we speak every day, taking a look at the challenges that come about as a computer tries to understand natural language, and at how some of the natural language processing that occurs in modern artificial intelligence actually works.

But today we'll begin our conversation with search: this problem of trying to figure out what to do when we have some sort of situation that the computer is in, some sort of environment that an agent is in, so to speak, and we would like for that agent to be able to somehow look for a solution to that problem. Now, these problems can come in any number of different formats. One example, for instance, might be something like this classic 15 puzzle with the sliding tiles that you might have seen, where you're trying to slide the tiles in order to make sure that all the numbers line up in order. This is an example of what you might call a search problem.
The 15 puzzle begins in an initially mixed-up state, and we need some way of finding moves to make in order to return the puzzle to its solved state. But there are similar problems that you can frame in other ways. Trying to find your way through a maze, for example, is another example of a search problem. You begin in one place, you have some goal of where you're trying to get to, and you need to figure out the correct sequence of actions that will take you from that initial state to the goal.

And while this is a little bit abstract, anytime we talk about maze solving in this class, you can translate it to something a little more real world, something like driving directions. If you've ever wondered how Google Maps is able to figure out what is the best way for you to get from point A to point B, and what turns to make, at what time, depending on traffic, for example, it's often some sort of search algorithm. You have an AI that is trying to get from an initial position to some sort of goal by taking some sequence of actions. So we'll start our conversations today by thinking about these types of search problems and what goes into solving a search problem like this in order for an AI to be able to find a good solution.

In order to do so, though, we're going to need to introduce a little bit of terminology, some of which I've already used. The first term we'll need to think about is an agent. An agent is just some entity that perceives its environment, that somehow is able to perceive the things around it, and acts on that environment in some way. So in the case of the driving directions, your agent might be some representation of a car that is trying to figure out what actions to take in order to arrive at a destination. In the case of the 15 puzzle with the sliding tiles, the agent might be the AI or the person that is trying to solve that puzzle, trying to figure out what tiles to move in order to get to that solution.

Next, we introduce the idea of a state. A state is just some configuration of the agent in its environment. In the 15 puzzle, for example, a state might be any one of these three, for example.
A state is just some configuration of the tiles. Each of these states is different and is going to require a slightly different solution; a different sequence of actions will be needed in each one in order to get from that initial state to the goal, which is where we're trying to get.

The initial state, then. What is that? The initial state is just the state where the agent begins. It is the state where we're going to start from, and this is going to be the starting point for our search algorithm, so to speak. We're going to begin with this initial state and then start to reason about it, to think about what actions we might apply to that initial state in order to figure out how to get from the beginning to the end, from the initial position to whatever our goal happens to be.

And how do we make our way from that initial position to the goal? Well, ultimately, it's via taking actions. Actions are just choices that we can make in any given state. And in AI, we're always going to try to formalize these ideas a little bit more precisely, such that we could program them a little bit more mathematically, so to speak. So this will be a recurring theme, and we can more precisely define actions as a function. We're going to effectively define a function called actions that takes an input s, where s is going to be some state that exists inside of our environment, and actions(s) is going to take that state as input and return as output the set of all actions that can be executed in that state. It's possible that some actions are only valid in certain states and not in other states, and we'll see examples of that soon, too.

So in the case of the 15 puzzle, for example, there are generally going to be four possible actions that we can take most of the time: we can slide a tile to the right, slide a tile to the left, slide a tile up, or slide a tile down, for example. Those are going to be the actions that are available to us. So somehow our AI, our program, needs some encoding of the state, which is often going to be in some numerical format, and some encoding of these actions.
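As a concrete illustration, here is one way a 15 puzzle state and its actions function might be written in Python. This is only a sketch under my own assumptions: the board is a tuple of tuples with 0 standing in for the empty space, and the action names are hypothetical, not part of any prescribed implementation.

```python
# A hypothetical encoding: the state is a tuple of tuples of numbers,
# with 0 representing the empty space that tiles slide into.
EXAMPLE_STATE = (
    ( 1,  2,  3,  4),
    ( 5,  6,  7,  8),
    ( 9, 10, 11, 12),
    (13, 14,  0, 15),
)


def actions(state):
    """Return the set of actions that can be executed in this state.

    Each action is named for the direction a neighboring tile slides
    into the empty space.
    """
    # Find the row and column of the empty space (the 0).
    row, col = next(
        (r, c)
        for r, row_values in enumerate(state)
        for c, value in enumerate(row_values)
        if value == 0
    )
    size = len(state)
    available = set()
    if col < size - 1:
        available.add("slide left")   # the tile to the right of the blank moves left
    if col > 0:
        available.add("slide right")  # the tile to the left of the blank moves right
    if row < size - 1:
        available.add("slide up")     # the tile below the blank moves up
    if row > 0:
        available.add("slide down")   # the tile above the blank moves down
    return available
```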
But it also needs some encoding of the relationship between these things: how do the states and actions relate to one another? In order to do that, we'll introduce to our AI a transition model, which will be a description of what state we get after we perform some available action in some other state. And again, we can be a little bit more precise about this and define the transition model a little bit more formally, again, as a function. The function is going to be a function called result, which this time takes two inputs: input number one is s, some state, and input number two is a, some action. And the output of this function result is going to be the state that we get after we perform action a in state s.

So let's take a look at an example to see more precisely what this actually means. Here's an example of a state of the 15 puzzle, and here's an example of an action, sliding a tile to the right. What happens if we pass these as inputs to the result function? Again, the result function takes this board, this state, as its first input, and it takes an action as its second input. And of course, here, I'm describing things visually so that you can see what the state is and what the action is. In a computer, you might represent one of these actions as just some number that represents the action, or, if you're familiar with enums that allow you to enumerate multiple possibilities, it might be something like that. And the state might just be represented as an array, or a two-dimensional array, of all of these numbers that exist. But here we're going to show it visually just so you can see it.

When we take this state and this action and pass them into the result function, the output is a new state: the state we get after we take a tile and slide it to the right, and this is the state we get as a result. If we had a different action and a different state, for example, and passed those into the result function, we'd get a different answer altogether. So the result function needs to take care of figuring out how to take a state and take an action and get what results.
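Continuing the hypothetical encoding from the sketch above, a transition model for the 15 puzzle might look something like this. Again, this is an illustrative sketch rather than a definitive implementation.

```python
def result(state, action):
    """Return the state we get after performing `action` in `state`.

    Uses the same hypothetical encoding as before: a tuple of tuples
    with 0 as the empty space, and actions named by the direction a
    neighboring tile slides into that space.
    """
    # Locate the empty space.
    row, col = next(
        (r, c)
        for r, row_values in enumerate(state)
        for c, value in enumerate(row_values)
        if value == 0
    )
    # Which neighboring tile trades places with the empty space?
    offsets = {
        "slide left":  (0, +1),   # the tile to the right of the blank
        "slide right": (0, -1),   # the tile to the left of the blank
        "slide up":    (+1, 0),   # the tile below the blank
        "slide down":  (-1, 0),   # the tile above the blank
    }
    dr, dc = offsets[action]
    tile_row, tile_col = row + dr, col + dc
    # Build a new board with the blank and that tile swapped,
    # leaving the original state untouched.
    board = [list(r) for r in state]
    board[row][col], board[tile_row][tile_col] = board[tile_row][tile_col], 0
    return tuple(tuple(r) for r in board)
```

For instance, with this sketch, result(EXAMPLE_STATE, "slide right") hands back a new board whose bottom row reads 13, blank, 14, 15, while EXAMPLE_STATE itself is left unchanged.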
And this result function is our transition model, describing how states and actions are related to each other. If we take this transition model and think about it more generally, across the entire problem, we can form what we might call a state space: the set of all of the states we can get to from the initial state via any sequence of actions, by taking zero or one or two or more actions. So we could draw a diagram that looks something like this, where every state is represented by a game board, and there are arrows that connect every state to every other state we can get to from that state. And the state space is much larger than what you see here; this is just a sample of what the state space might actually look like.

In general, across many search problems, whether they're this particular 15 puzzle or driving directions or something else, the state space is going to look something like this: we have individual states and arrows that are connecting them. And oftentimes, just for simplicity, we'll simplify our representation of this entire thing as a graph, some set of nodes and edges that connect nodes. But you can think of this more abstract representation as the exact same idea. Each of these little circles, or nodes, is going to represent one of the states inside of our problem, and the arrows represent the actions that we can take in any particular state, taking us from one particular state to another state.
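In code, a small state space like this can be written down very compactly. For instance, the little sample graph we'll walk through in a few minutes, with states A through F, could be encoded as an adjacency mapping, with each state listing the states that one action can take us to. This is just an illustrative encoding of that example, not anything prescribed:

```python
# Each state maps to the list of states reachable from it in one action.
GRAPH = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["F"],
    "E": [],
    "F": [],
}
```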
All right. So now we have this idea of nodes that are representing these states, actions that can take us from one state to another, and a transition model that defines what happens after we take a particular action. The next step we need to figure out is how we know when the AI is done solving the problem. The AI needs some way to know when it gets to the goal, that it's found the goal. So the next thing we'll need to encode into our artificial intelligence is a goal test: some way to determine whether a given state is a goal state.

In the case of something like driving directions, it might be pretty easy. If you're in a state that corresponds to whatever the user typed in as their intended destination, well, then you know you're in a goal state. In the 15 puzzle, it might be checking the numbers to make sure they're all in ascending order. But the AI needs some way to encode whether or not any state it happens to be in is a goal. And some problems might have one goal, like a maze where you have one initial position and one ending position, and that's the goal. In other, more complex problems, you might imagine that there are multiple possible goals, that there are multiple ways to solve a problem, and we might not care which one the computer finds as long as it does find a particular goal.
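For the 15 puzzle, that check is easy to write down. Continuing the hypothetical tuple-of-tuples encoding from earlier, and assuming the solved board has the blank in the last cell:

```python
def goal_test(state):
    """True if the tiles read 1 through 15 in order, with the blank (0) in the last cell."""
    flattened = [value for row in state for value in row]
    return flattened == list(range(1, 16)) + [0]
```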
However, sometimes a computer doesn't just care about finding a goal, but about finding a goal well, or one with a low cost. And it's for that reason that the last piece of terminology that we use to define these search problems is something called a path cost. You might imagine that in the case of driving directions, it would be pretty annoying if I said I wanted directions from point A to point B, and the route that Google Maps gave me was a long route with lots of unnecessary detours that took longer than it should have for me to get to that destination. And it's for that reason that when we're formulating search problems, we'll often give every path some sort of numerical cost, some number telling us how expensive it is to take this particular option, and then tell our AI that, instead of just finding a solution, some way of getting from the initial state to the goal, we'd really like to find one that minimizes this path cost: one that is less expensive, or takes less time, or minimizes some other numerical value.

We can represent this graphically if we take a look at this graph again and imagine that each of these arrows, each of these actions that we can take from one state to another state, has some sort of number associated with it, that number being the path cost of this particular action, where the cost of one particular action might be more expensive than the cost of some other action. Although this will only happen in some sorts of problems. In other problems, we can simplify the diagram and just assume that the cost of any particular action is the same. And this is probably the case in something like the 15 puzzle, for example, where it doesn't really make a difference whether I'm moving right or moving left; the only thing that matters is the total number of steps that I have to take to get from point A to point B, and each of those steps is of equal cost. We can just assume it's some constant cost, like one.

And so this now forms the basis for what we might consider to be a search problem. A search problem has some sort of initial state, some place where we begin; some sort of action that we can take, or multiple actions that we can take, in any given state; and it has a transition model, some way of defining what happens when we go from one state and take one action, what state we end up with as a result. In addition to that, we need some goal test to know whether or not we've reached a goal, and then we need a path cost function that tells us, for any particular path, by following some sequence of actions, how expensive that path is: what its cost is in terms of money, or time, or some other resource that we are trying to minimize our usage of. The goal, ultimately, is to find a solution, where a solution in this case is just some sequence of actions that will take us from the initial state to the goal state. And, ideally, we'd like to find not just any solution, but the optimal solution, which is a solution that has the lowest path cost among all of the possible solutions. In some cases, there might be multiple optimal solutions, but an optimal solution just means that there is no way that we could have done better in terms of finding that solution.
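One common way to bundle all of those ingredients together in code is a small abstract class: the initial state, the actions function, the transition model, the goal test, and a cost for each step. What follows is only a sketch of that idea under my own naming, not an interface the course prescribes:

```python
from abc import ABC, abstractmethod


class SearchProblem(ABC):
    """A hypothetical bundle of the ingredients that define a search problem."""

    initial_state = None  # where the agent begins; set by each concrete problem

    @abstractmethod
    def actions(self, state):
        """Return the set of actions that can be executed in `state`."""

    @abstractmethod
    def result(self, state, action):
        """The transition model: the state reached by doing `action` in `state`."""

    @abstractmethod
    def goal_test(self, state):
        """Return True if `state` is a goal state."""

    def step_cost(self, state, action):
        """Cost of taking `action` in `state` (a constant 1 by default, as in the 15 puzzle)."""
        return 1
```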
So now we've defined the problem, and now we need to begin to figure out how it is that we're going to solve this kind of search problem. In order to do so, you can probably imagine that our computer is going to need to represent a whole bunch of data about this particular problem. We need to represent data about where we are in the problem, and we might need to be considering multiple different options at once. And oftentimes, when we're trying to package a whole bunch of data related to a state together, we'll do so using a data structure that we're going to call a node. A node is a data structure that is just going to keep track of a variety of different values, and specifically, in the case of a search problem, it's going to keep track of these four values in particular.

Every node is going to keep track of a state, the state we're currently on. And every node is also going to keep track of a parent, the parent being the state before us, or the node that we used in order to get to this current state. This is going to be relevant because, eventually, once we reach the goal node, once we get to the end, we want to know what sequence of actions we used in order to get to that goal. And the way we'll know that is by looking at these parents to keep track of what led us to the goal, and what led us to that state, and what led us to the state before that, and so on and so forth, backtracking our way to the beginning so that we know the entire sequence of actions we needed in order to get from the beginning to the end. The node is also going to keep track of what action we took in order to get from the parent to the current state. And the node is also going to keep track of a path cost; in other words, it's going to keep track of the number that represents how long it took to get from the initial state to the state that we currently happen to be at. We'll see why this is relevant as we start to talk about some of the optimizations that we can make in terms of these search problems more generally. So this is the data structure that we're going to use in order to solve the problem.
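In Python, that node might be nothing more than a small record holding those four values. Here's a minimal sketch, with field names of my own choosing:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """The bookkeeping we keep for every state considered during the search."""
    state: Any                 # the configuration this node represents
    parent: Optional["Node"]   # the node we expanded in order to reach this one
    action: Any                # the action taken from the parent to get here
    path_cost: float           # total cost of the path from the initial state to here
```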
And now let's talk about the approach: how might we actually begin to solve the problem? Well, as you might imagine, what we're going to do is start at one particular state, and we're just going to explore from there. The intuition is that from a given state, we have multiple options that we could take, and we're going to explore those options. And once we explore those options, we'll find that more options than that are going to make themselves available. We're going to consider all of the available options to be stored inside of a single data structure that we'll call the frontier. The frontier is going to represent all of the things that we could explore next, that we haven't yet explored or visited.

So in our approach, we're going to begin this search algorithm by starting with a frontier that just contains one state. The frontier is going to contain the initial state, because at the beginning, that's the only state we know about; that is the only state that exists. And then our search algorithm is effectively going to follow a loop; we're going to repeat some process again and again and again.

The first thing we're going to do is: if the frontier is empty, then there's no solution, and we can report that there is no way to get to the goal. And that's certainly possible. There are certain types of problems that an AI might try to explore and realize that there is no way to solve that problem, and that's useful information for humans to know, as well. So if ever the frontier is empty, that means there's nothing left to explore, and we haven't yet found a solution, so there is no solution; there's nothing left to explore.

Otherwise, what we'll do is we'll remove a node from the frontier. Right now, at the beginning, the frontier just contains one node representing the initial state, but over time, the frontier might grow; it might contain multiple states. So here we're just going to remove a single node from that frontier. If that node happens to be a goal, then we've found a solution. So we remove a node from the frontier and ask ourselves, is this the goal? And we do that by applying the goal test that we talked about earlier, asking if we're at the destination, or asking if all the numbers of the 15 puzzle happen to be in order. If the node contains the goal, we've found a solution. Great. We're done. And otherwise, what we'll need to do is expand the node. And this is a term in artificial intelligence.
To expand the node just means to look at all of the neighbors of that node; in other words, to consider all of the possible actions that I could take from the state that this node is representing, and ask what nodes I could get to from there. We're going to take all of those nodes, the next nodes that we can get to from this current one we're looking at, and add those to the frontier. And then we'll repeat this process.

So, at a very high level, the idea is: we start with a frontier that contains the initial state, and we're constantly removing a node from the frontier, looking at where we can get to next, and adding those nodes to the frontier, repeating this process over and over until either we remove a node from the frontier and it contains a goal, meaning we've solved the problem, or we run into a situation where the frontier is empty, at which point we're left with no solution.
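Written out in Python, that loop might look like the sketch below. It leans on the hypothetical SearchProblem and Node sketches from earlier, and it deliberately leaves the choice of which node to remove unexamined (here, simply the last one added); both of those points come up again shortly.

```python
def search(problem):
    """A first, naive version of the search loop described above."""
    # Start with a frontier that contains just the initial state.
    frontier = [Node(state=problem.initial_state, parent=None, action=None, path_cost=0)]

    while True:
        # If the frontier is empty, there is no solution.
        if not frontier:
            return None

        # Remove a node from the frontier (which one we remove turns out to matter).
        node = frontier.pop()

        # If this node contains a goal state, rebuild the solution by
        # walking back up through the parents.
        if problem.goal_test(node.state):
            actions_taken = []
            while node.parent is not None:
                actions_taken.append(node.action)
                node = node.parent
            return list(reversed(actions_taken))

        # Otherwise, expand the node and add the resulting nodes to the frontier.
        for action in problem.actions(node.state):
            child = Node(
                state=problem.result(node.state, action),
                parent=node,
                action=action,
                path_cost=node.path_cost + problem.step_cost(node.state, action),
            )
            frontier.append(child)
```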
So let's actually try to take that pseudocode and put it into practice by taking a look at an example of a sample search problem. Right here, I have a sample graph: A is connected to B via this action, B is connected to nodes C and D, C is connected to E, and D is connected to F. And what I'd like to do is have my AI find a path from A to E; we want to get from this initial state to this goal state.

So how are we going to do that? Well, we're going to start with a frontier that contains the initial state. This is going to represent our frontier, so our frontier, initially, will just contain A, that initial state where we're going to begin. And now we'll repeat this process. If the frontier is empty, there's no solution. That's not a problem here, because the frontier is not empty. So we'll remove a node from the frontier as the one to consider next. There is only one node in the frontier, so we'll go ahead and remove it, and now A, this initial node, is the node we're currently considering. We follow the next step: we ask ourselves, is this node the goal? No, it's not. A is not the goal; E is the goal. So we don't return the solution.

So instead, we go to the last step: expand the node and add the resulting nodes to the frontier. What does that mean? Well, it means take this state A and consider where we could get to next. And after A, the only place we can get to next is B. So that's what we get when we expand A: we find B, and we add B to the frontier. Now B is in the frontier, and we repeat the process again. We say, all right, the frontier is not empty, so let's remove B from the frontier. B is now the node that we're considering. We ask ourselves, is B the goal? No, it's not. So we go ahead and expand B and add its resulting nodes to the frontier. What happens when we expand B? In other words, what nodes can we get to from B? Well, we can get to C and D, so we'll go ahead and add C and D to the frontier. And now we have two nodes in the frontier, C and D.

We repeat the process again. We remove a node from the frontier; for now, we'll do so arbitrarily, just by picking C. We'll see later how choosing which node you remove from the frontier is actually quite an important part of the algorithm. But for now, I'll arbitrarily remove C, say it's not the goal, and add E, the next one, to the frontier. Then let's say I remove E from the frontier, and now I'm currently looking at state E. Is that a goal state? It is, because I'm trying to find a path from A to E. So I would return the goal, and that, now, would be the solution: I'm now able to return the solution, and I've found a path from A to E.

So this is the general idea, the general approach of this search algorithm: to follow these steps, constantly removing nodes from the frontier, until we're able to find a solution.
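That walkthrough is essentially what the naive sketch from above does if we wrap the little adjacency map in a problem object. This is purely illustrative, reusing the hypothetical GRAPH, SearchProblem, Node, and search pieces sketched earlier:

```python
class GraphProblem(SearchProblem):
    """Treat the small adjacency-map graph as a search problem."""

    def __init__(self, graph, start, goal):
        self.graph = graph
        self.initial_state = start
        self.goal = goal

    def actions(self, state):
        # Here an "action" is simply the name of the neighboring state we move to.
        return self.graph[state]

    def result(self, state, action):
        return action

    def goal_test(self, state):
        return state == self.goal


print(search(GraphProblem(GRAPH, "A", "E")))  # should print ['B', 'C', 'E']
```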
So the next question you might reasonably ask is, what could go wrong here? What are the potential problems with an approach like this? Here's one example of a problem that could arise from this sort of approach. Imagine this same graph, same as before, with one change: instead of just an arrow from A to B, we also have an arrow from B to A, meaning we can go in both directions. And this is true in something like the 15 puzzle, where when I slide a tile to the right, I could then slide a tile to the left to get back to the original position. I could go back and forth between A and B, and that's what these double arrows symbolize: the idea that from one state I can get to another, and then I can get back. And that's true in many search problems.

What's going to happen if I try to apply the same approach now? Well, I'll begin with A, same as before, and I'll remove A from the frontier. Then I'll consider where I can get to from A, and after A, the only place I can get to is B, so B goes into the frontier. Then I'll say, all right, let's take a look at B; that's the only thing left in the frontier. Where can I get to from B? Before, it was just C and D, but now, because of that reverse arrow, I can get to A or C or D. So all three, A, C, and D, now go into the frontier; they are places I can get to from B. And now I remove one from the frontier, and, you know, maybe I'm unlucky and maybe I pick A. Now I'm looking at A again, and I consider where I can get to from A, and from A, well, I can get to B. And now we start to see the problem: if I'm not careful, I go from A to B, and then back to A, and then to B again, and I could be going in an infinite loop where I never make any progress, because I'm constantly just going back and forth between two states that I've already seen.

So what is the solution to this? We need some way to deal with this problem, and the way we can deal with it is by somehow keeping track of what we've already explored. The logic is going to be: well, if we've already explored the state, there's no reason to go back to it. Once we've explored a state, don't go back to it, don't bother adding it to the frontier; there's no need to.

So here is going to be our revised approach, a better way to approach this sort of search problem. It's going to look very similar, just with a couple of modifications. We'll start with a frontier that contains the initial state, same as before.
But now we'll also start with another data structure, which will just be a set of the states we've already explored. So what are the states we've explored? Initially, it's empty; we have an empty explored set. And now we repeat. If the frontier is empty, no solution, same as before. We remove a node from the frontier, we check to see if it's a goal state, and if so, we return the solution. None of this is any different so far. But now, what we're going to do is add the node to the explored set. So if it happens to be the case that we remove a node from the frontier and it's not the goal, we'll add it to the explored set so that we know we've already explored it; we don't need to go back to it again if it happens to come up later. And then, in the final step, we expand the node and we add the resulting nodes to the frontier. But whereas before we just always added the resulting nodes to the frontier, we're going to be a little cleverer about it this time: we're only going to add the nodes to the frontier if they aren't already in the frontier and if they aren't already in the explored set. So we'll check both the frontier and the explored set, make sure that the node isn't already in one of those two, and so long as it isn't, then we'll go ahead and add it to the frontier, but not otherwise. And that revised approach is ultimately what's going to help make sure that we don't go back and forth between two nodes.
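Here is the same search sketch from before, revised with those two modifications: an explored set, and the check before adding children to the frontier. I've also pulled the "which node do we remove?" decision out into a parameter, since, as the walkthrough hinted, that choice is about to matter a great deal. As before, this is a sketch built on the hypothetical pieces above, not the definitive implementation.

```python
def search(problem, remove_from_frontier):
    """The revised search loop: same as before, plus an explored set.

    `remove_from_frontier` is any function that takes the frontier
    (a list of Nodes) and removes and returns one of them.
    """
    frontier = [Node(state=problem.initial_state, parent=None, action=None, path_cost=0)]
    explored = set()  # states we've already expanded

    while True:
        if not frontier:
            return None  # frontier empty: no solution

        node = remove_from_frontier(frontier)

        if problem.goal_test(node.state):
            # Walk back through the parents to recover the sequence of actions.
            actions_taken = []
            while node.parent is not None:
                actions_taken.append(node.action)
                node = node.parent
            return list(reversed(actions_taken))

        # New: remember that we've explored this state.
        explored.add(node.state)

        for action in problem.actions(node.state):
            child_state = problem.result(node.state, action)
            # New: only add states that aren't already in the frontier or the explored set.
            already_in_frontier = any(n.state == child_state for n in frontier)
            if child_state not in explored and not already_in_frontier:
                frontier.append(Node(
                    state=child_state,
                    parent=node,
                    action=action,
                    path_cost=node.path_cost + problem.step_cost(node.state, action),
                ))
```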
Now, the one point that I've kind of glossed over so far is this step here: removing a node from the frontier. Before, I just chose arbitrarily: let's just remove a node, and that's it. But it turns out it's actually quite important how we decide to structure our frontier, how we add our nodes, and how we remove them. The frontier is a data structure, and we need to make a choice about the order in which we're going to be removing elements. One of the simplest data structures for adding and removing elements is something called a stack, and a stack is a last-in, first-out data type, which means the last thing that I add to the frontier is going to be the first thing that I remove from the frontier. So the most recent thing to go into the stack, or the frontier in this case, is going to be the node that I explore next.

So let's see what happens if I apply this stack-based approach to something like this problem, finding a path from A to E. What's going to happen? Well, again, we'll start with A, and we'll say, all right, let's go ahead and look at A first. And then, notice, this time we've added A to the explored set; A is something we've now explored, and we have this data structure that's keeping track. We then say, from A we can get to B. All right, from B, what can we do? Well, from B, we can explore B and get to both C and D. So we added C and then D. Now, when we explore a node, we're going to treat the frontier as a stack: last in, first out. D was the last one to come in, so we'll go ahead and explore that next and say, all right, where can we get to from D? Well, we can get to F, and so, all right, we'll put F into the frontier. And now, because the frontier is a stack, F is the most recent thing that's gone into the stack, so F is what we'll explore next. We'll explore F and say, all right, where can we get to from F? Well, we can't get anywhere, so nothing gets added to the frontier. So now, what is the most recent thing added to the frontier? Well, it's now C, the only thing left in the frontier. We'll explore that and say, all right, from C we can get to E, so E goes into the frontier. And then we say, all right, let's look at E, and E is now the solution. We've solved the problem.

So when we treat the frontier like a stack, a last-in, first-out data structure, that's the result we get. We go from A to B to D to F, and then we sort of back up and go down to C and then E. And it's important to get a visual sense for how this algorithm is working. We went very deep in this search tree, so to speak, all the way until the bottom, where we hit a dead end.
And then we effectively backed up and explored this other route that we didn't try before. And it's this idea of going very deep in the search tree, the way the algorithm ends up working when we use a stack, that leads us to call this version of the algorithm depth-first search. Depth-first search is the search algorithm where we always explore the deepest node in the frontier. We keep going deeper and deeper through our search tree, and then, if we hit a dead end, we back up and we try something else instead.

But depth-first search is just one of the possible search options that we could use. It turns out that there is another algorithm, called breadth-first search, which behaves very similarly to depth-first search, with one difference. Instead of always exploring the deepest node in the search tree the way depth-first search does, breadth-first search is always going to explore the shallowest node in the frontier. So what does that mean? Well, it means that instead of using a stack, which depth-first search, or DFS, used, where the most recent item added to the frontier is the one we'll explore next, breadth-first search, or BFS, will instead use a queue, where a queue is a first-in, first-out data type: the very first thing we add to the frontier is the first one we'll explore. The nodes effectively form a line, or a queue, where the earlier you arrive in the frontier, the earlier you get explored.
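In the revised sketch from earlier, this is the only thing that changes between the two algorithms: the function we pass in to decide which node leaves the frontier. A minimal illustration, again assuming the hypothetical search, GRAPH, and GraphProblem pieces sketched above:

```python
# Depth-first search: the frontier behaves like a stack (last in, first out).
def remove_last(frontier):
    return frontier.pop()      # take the node that was added most recently


# Breadth-first search: the frontier behaves like a queue (first in, first out).
def remove_first(frontier):
    return frontier.pop(0)     # take the node that has been waiting the longest


dfs_path = search(GraphProblem(GRAPH, "A", "E"), remove_last)
bfs_path = search(GraphProblem(GRAPH, "A", "E"), remove_first)
```

In practice, popping from the front of a Python list is linear-time work, so a real implementation would likely reach for collections.deque (or a dedicated frontier class) for the queue case; a plain list just keeps the two versions symmetric in this sketch.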
So what would that mean for the same exact problem, finding a path from A to E? Well, we start with A, same as before. Then we'll go ahead and mark A as explored and say, where can we get to from A? Well, from A we can get to B, same as before. From B, same as before, we can get to C and D, so C and D get added to the frontier. This time, though, we added C to the frontier before D, so we'll explore C first. So C gets explored, and from C, where can we get to? Well, we can get to E, so E gets added to the frontier. But because D arrived in the frontier before E, we'll look at D next. So we'll explore D and say, where can we get to from D? We can get to F. And only then will we say, all right, now we can get to E.

And so what breadth-first search, or BFS, did is we started here, we looked at both C and D, and then we looked at E. Effectively, we're looking at things one away from the initial state, then two away from the initial state, and only then things that are three away from the initial state. Unlike depth-first search, which just went as deep as possible into the search tree until it hit a dead end and then, ultimately, had to back up.

So these now are two different search algorithms that we could apply in order to try to solve a problem. Let's take a look at how these would actually work in practice with something like maze solving, for example. So here's an example of a maze. These empty cells represent places where our agent can move; these darkened gray cells represent walls that the agent can't pass through. And, ultimately, our agent, our AI, is going to try to find a way to get from position A to position B via some sequence of actions, where those actions are left, right, up, and down.

What will depth-first search do in this case? Well, depth-first search will just follow one path. If it reaches a fork in the road where it has multiple different options, depth-first search is, in this case, just going to choose one; there isn't a real preference. But it's going to keep following one until it hits a dead end. And when it hits a dead end, depth-first search effectively goes back to the last decision point and tries the other path, fully exhausting this entire path, and when it realizes that, OK, the goal is not here, it turns its attention to the other path. It goes as deep as possible; when it hits a dead end, it backs up and then tries the other path, keeps going as deep as possible down one particular path, and when it realizes that that's a dead end, it'll back up and then ultimately find its way to the goal. Maybe we would have gotten lucky had we made a different choice earlier on, but ultimately this is how depth-first search is going to work: it's going to keep following one path until it hits a dead end.
688 00:31:56,900 --> 00:32:00,860 And when it hits a dead end, it backs up and looks for a different solution. 689 00:32:00,860 --> 00:32:02,570 And so one thing you might reasonably ask 690 00:32:02,570 --> 00:32:05,010 is, is this algorithm always going to work? 691 00:32:05,010 --> 00:32:09,200 Will it always actually find a way to get from the initial state to the goal? 692 00:32:09,200 --> 00:32:11,420 And it turns out that as long as our maze 693 00:32:11,420 --> 00:32:14,630 is finite, as long as there are finitely many spaces where 694 00:32:14,630 --> 00:32:15,890 we can travel, then yes. 695 00:32:15,890 --> 00:32:19,900 Depth first search is going to find a solution because eventually it 696 00:32:19,900 --> 00:32:21,290 will just explore everything. 697 00:32:21,290 --> 00:32:24,590 If the maze happens to be infinite and there's an infinite state space, which 698 00:32:24,590 --> 00:32:28,350 does exist in certain types of problems, then it's a slightly different story. 699 00:32:28,350 --> 00:32:30,950 But as long as our maze has finitely many squares, 700 00:32:30,950 --> 00:32:33,020 we're going to find a solution. 701 00:32:33,020 --> 00:32:34,940 The next question, though, that we want to ask 702 00:32:34,940 --> 00:32:37,040 is, is it going to be a good solution? 703 00:32:37,040 --> 00:32:39,740 Is it the optimal solution that we can find? 704 00:32:39,740 --> 00:32:42,190 And the answer there is not necessarily. 705 00:32:42,190 --> 00:32:44,030 And let's take a look at an example of that. 706 00:32:44,030 --> 00:32:48,890 In this maze, for example, we're again trying to find our way from A to B. 707 00:32:48,890 --> 00:32:51,470 And you notice here there are multiple possible solutions. 708 00:32:51,470 --> 00:32:54,560 We could go this way, or we could go up in order 709 00:32:54,560 --> 00:32:57,350 to make our way from A to B. Now if we're lucky, 710 00:32:57,350 --> 00:33:01,040 depth first search will choose this way and get to B. But there's no reason, 711 00:33:01,040 --> 00:33:03,140 necessarily, why depth first search would choose 712 00:33:03,140 --> 00:33:05,440 between going up or going to the right. 713 00:33:05,440 --> 00:33:08,270 It's sort of an arbitrary decision point because both 714 00:33:08,270 --> 00:33:10,420 are going to be added to the frontier. 715 00:33:10,420 --> 00:33:13,310 And ultimately, if we get unlucky, depth first search 716 00:33:13,310 --> 00:33:15,950 might choose to explore this path first because it's just 717 00:33:15,950 --> 00:33:17,430 a random choice at this point. 718 00:33:17,430 --> 00:33:20,570 It will explore, explore, explore, and it'll eventually 719 00:33:20,570 --> 00:33:24,110 find the goal along this particular path, when in actuality there 720 00:33:24,110 --> 00:33:25,010 was a better path. 721 00:33:25,010 --> 00:33:28,940 There was a more optimal solution that used fewer steps, 722 00:33:28,940 --> 00:33:32,660 assuming we're measuring the cost of a solution based on the number of steps 723 00:33:32,660 --> 00:33:33,930 that we need to take. 724 00:33:33,930 --> 00:33:36,680 So depth first search, if we're unlucky, might end up 725 00:33:36,680 --> 00:33:41,880 not finding the best solution when a better solution is available. 726 00:33:41,880 --> 00:33:44,370 So if that's DFS, depth first search, 727 00:33:44,370 --> 00:33:47,300 how does BFS, or breadth first search, compare? 728 00:33:47,300 --> 00:33:49,550 How would it work in this particular situation?
729 00:33:49,550 --> 00:33:52,490 Well the algorithm is going to look very different visually 730 00:33:52,490 --> 00:33:54,800 in terms of how BFS explores. 731 00:33:54,800 --> 00:33:57,800 Because BFS looks at shallower nodes first, 732 00:33:57,800 --> 00:34:01,850 the idea is going to be BFS will first look at all of the nodes that 733 00:34:01,850 --> 00:34:04,190 are one away from the initial state. 734 00:34:04,190 --> 00:34:06,050 Look here and look here, for example. 735 00:34:06,050 --> 00:34:10,590 Just at the two nodes that are immediately next to this initial state. 736 00:34:10,590 --> 00:34:12,470 Then it will explore nodes that are two away, 737 00:34:12,470 --> 00:34:15,050 looking at this state and that state, for example. 738 00:34:15,050 --> 00:34:18,140 Then it will explore nodes that are three away, this state and that state. 739 00:34:18,140 --> 00:34:22,190 Whereas depth first search just picked one path and kept following it, 740 00:34:22,190 --> 00:34:24,210 breadth first search, on the other hand, is 741 00:34:24,210 --> 00:34:27,570 taking the option of exploring all of the possible paths 742 00:34:27,570 --> 00:34:30,150 kind of at the same time, bouncing back between them, 743 00:34:30,150 --> 00:34:32,520 looking deeper and deeper at each one, but making 744 00:34:32,520 --> 00:34:35,400 sure to explore the shallower ones or the ones that are 745 00:34:35,400 --> 00:34:38,060 closer to the initial state earlier. 746 00:34:38,060 --> 00:34:41,190 So we'll keep following this pattern, looking at things that are four away, 747 00:34:41,190 --> 00:34:43,170 looking at things that are five away, looking 748 00:34:43,170 --> 00:34:48,230 at things that are six away, until eventually we make our way to the goal. 749 00:34:48,230 --> 00:34:51,090 And in this case, it's true we had to explore some states that 750 00:34:51,090 --> 00:34:52,880 ultimately didn't lead us anywhere. 751 00:34:52,880 --> 00:34:56,290 But the path that we found to the goal was the optimal path. 752 00:34:56,290 --> 00:34:59,850 This is the shortest way that we could get to the goal. 753 00:34:59,850 --> 00:35:02,970 And so, what might happen then in a larger maze? 754 00:35:02,970 --> 00:35:04,850 Well let's take a look at something like this 755 00:35:04,850 --> 00:35:06,860 and how breadth first search is going to behave. 756 00:35:06,860 --> 00:35:08,690 Well, breadth first search, again, will just 757 00:35:08,690 --> 00:35:11,270 keep following the states until it reaches a decision point. 758 00:35:11,270 --> 00:35:13,460 It could go either left or right. 759 00:35:13,460 --> 00:35:16,850 And while DFS just picked one and kept following 760 00:35:16,850 --> 00:35:21,620 that until it hit a dead end, BFS, on the other hand, will explore both. 761 00:35:21,620 --> 00:35:23,720 It'll say, look at this node, then this node, 762 00:35:23,720 --> 00:35:27,510 and I'll look at this node, then that node, so on and so forth. 763 00:35:27,510 --> 00:35:31,850 And when it hits a decision point here, rather than pick one, left or right, 764 00:35:31,850 --> 00:35:36,350 and explore that path, it will again explore both, alternating between them, 765 00:35:36,350 --> 00:35:37,380 going deeper and deeper. 766 00:35:37,380 --> 00:35:41,570 It'll explore here, and then maybe here and here, and then keep going. 767 00:35:41,570 --> 00:35:44,780 Explore here and slowly make our way, as you can visually 768 00:35:44,780 --> 00:35:46,490 see, further and further out.
769 00:35:46,490 --> 00:35:48,200 Once we get to this decision point, we'll 770 00:35:48,200 --> 00:35:52,790 explore both up and down until, ultimately, we 771 00:35:52,790 --> 00:35:55,570 make our way to the goal. 772 00:35:55,570 --> 00:35:58,250 And what you'll notice is, yes, breadth first search 773 00:35:58,250 --> 00:36:02,630 did find our way from A to B by following this particular path. 774 00:36:02,630 --> 00:36:06,410 But it needed to explore a lot of states in order to do so. 775 00:36:06,410 --> 00:36:09,440 And so we see some trade-off here between DFS and BFS. 776 00:36:09,440 --> 00:36:13,430 That in DFS there may be some cases where there is some memory savings, 777 00:36:13,430 --> 00:36:16,280 as compared to a breadth first approach where 778 00:36:16,280 --> 00:36:19,210 breadth first search, in this case, had to explore a lot of states. 779 00:36:19,210 --> 00:36:22,470 But maybe that won't always be the case. 780 00:36:22,470 --> 00:36:24,920 So now let's actually turn our attention to some code. 781 00:36:24,920 --> 00:36:26,720 And look at the code that we could actually 782 00:36:26,720 --> 00:36:30,410 write in order to implement something like depth first search or breadth 783 00:36:30,410 --> 00:36:34,910 first search in the context of solving a maze, for example. 784 00:36:34,910 --> 00:36:37,340 So I'll go ahead and go into my terminal. 785 00:36:37,340 --> 00:36:41,240 And what I have here inside of maze.py is an implementation 786 00:36:41,240 --> 00:36:43,640 of this same idea of maze solving. 787 00:36:43,640 --> 00:36:46,700 I've defined a class called node that in this case 788 00:36:46,700 --> 00:36:49,620 is keeping track of the state, the parent, in other words 789 00:36:49,620 --> 00:36:51,860 the state before this state, and the action. 790 00:36:51,860 --> 00:36:54,110 In this case, we're not keeping track of the path cost 791 00:36:54,110 --> 00:36:56,780 because we can calculate the cost of the path at the end 792 00:36:56,780 --> 00:37:00,890 after we found our way from the initial state to the goal. 793 00:37:00,890 --> 00:37:05,540 In addition to this, I've defined a class called a stack frontier. 794 00:37:05,540 --> 00:37:08,780 And if you're unfamiliar with a class, a class is a way for me 795 00:37:08,780 --> 00:37:11,930 to define a way to generate objects in Python. 796 00:37:11,930 --> 00:37:16,070 It refers to an idea of object oriented programming where the idea here 797 00:37:16,070 --> 00:37:18,740 is that I would like to create an object that is 798 00:37:18,740 --> 00:37:20,900 able to store all of my frontier data. 799 00:37:20,900 --> 00:37:23,030 And I would like to have functions, otherwise known 800 00:37:23,030 --> 00:37:27,410 as methods on that object, that I can use to manipulate the object. 801 00:37:27,410 --> 00:37:31,130 And so what's going on here, if you're unfamiliar with the syntax, 802 00:37:31,130 --> 00:37:34,430 is I have a function that initially creates a frontier 803 00:37:34,430 --> 00:37:36,410 that I'm going to represent using a list. 804 00:37:36,410 --> 00:37:39,810 And initially my frontier is represented by the empty list. 805 00:37:39,810 --> 00:37:42,870 There's nothing in my frontier to begin with. 806 00:37:42,870 --> 00:37:46,020 I have an add function that adds something to the frontier, 807 00:37:46,020 --> 00:37:49,260 by appending it to the end of the list. 808 00:37:49,260 --> 00:37:53,400 I have a function that checks if the frontier contains a particular state.
809 00:37:53,400 --> 00:37:55,990 I have an empty function that checks if the frontier is empty. 810 00:37:55,990 --> 00:37:58,830 If the frontier is empty, that just means the length of the frontier 811 00:37:58,830 --> 00:38:00,180 is zero. 812 00:38:00,180 --> 00:38:03,310 And then I have a function for removing something from the frontier. 813 00:38:03,310 --> 00:38:06,150 I can't remove something from the frontier if the frontier is empty. 814 00:38:06,150 --> 00:38:07,770 So I check for that first. 815 00:38:07,770 --> 00:38:10,710 But otherwise, if the frontier isn't empty, 816 00:38:10,710 --> 00:38:13,940 recall that I'm implementing this frontier as a stack, 817 00:38:13,940 --> 00:38:17,200 a last in, first out data structure. 818 00:38:17,200 --> 00:38:19,590 Which means the last thing I add to the frontier, 819 00:38:19,590 --> 00:38:21,570 in other words, the last thing in the list, 820 00:38:21,570 --> 00:38:25,750 is the item that I should remove from this frontier. 821 00:38:25,750 --> 00:38:30,190 So what you'll see here is I remove the last item of the list. 822 00:38:30,190 --> 00:38:33,160 And if you index into a Python list with negative one, 823 00:38:33,160 --> 00:38:35,050 that gets you the last item in the list. 824 00:38:35,050 --> 00:38:37,630 Since zero is the first item, negative one 825 00:38:37,630 --> 00:38:41,490 kind of wraps around and gets you to the last item in the list. 826 00:38:41,490 --> 00:38:43,310 So we grab that node. 827 00:38:43,310 --> 00:38:46,710 We call it node, and we update the frontier here on line 28 to say, 828 00:38:46,710 --> 00:38:50,000 go ahead and remove that node that you just took from the frontier. 829 00:38:50,000 --> 00:38:54,860 And then we return the node as a result. So this class here effectively 830 00:38:54,860 --> 00:38:57,050 implements the idea of a frontier. 831 00:38:57,050 --> 00:38:59,690 It gives me a way to add something to a frontier and a way 832 00:38:59,690 --> 00:39:03,130 to remove something from the frontier as a stack. 833 00:39:03,130 --> 00:39:06,830 I've also, just for good measure, implemented an alternative version 834 00:39:06,830 --> 00:39:09,500 of the same thing called a queue frontier. 835 00:39:09,500 --> 00:39:13,190 Which, in parentheses you'll see here, it inherits from a stack frontier, 836 00:39:13,190 --> 00:39:16,640 meaning it's going to do all the same things that the stack frontier did, 837 00:39:16,640 --> 00:39:19,440 except the way we remove a node from the frontier 838 00:39:19,440 --> 00:39:21,090 is going to be slightly different. 839 00:39:21,090 --> 00:39:24,240 Instead of removing from the end of the list the way we would in a stack, 840 00:39:24,240 --> 00:39:26,810 we're instead going to remove from the beginning of the list. 841 00:39:26,810 --> 00:39:31,590 self.frontier[0] will get me the first node in the frontier, 842 00:39:31,590 --> 00:39:32,870 the first one that was added. 843 00:39:32,870 --> 00:39:37,550 And that is going to be the one that we return in the case of a queue. 844 00:39:37,550 --> 00:39:40,340 Under here I have a definition of a class called maze. 845 00:39:40,340 --> 00:39:45,050 This is going to handle the process of taking a sequence, a maze-like text 846 00:39:45,050 --> 00:39:47,400 file, and figuring out how to solve it.
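For reference, a minimal sketch of the node and frontier classes just described might look like the following. This is one possible rendering of that description in Python, not necessarily line-for-line the code in maze.py:

    class Node():
        def __init__(self, state, parent, action):
            self.state = state    # the current state
            self.parent = parent  # the node that came before this one
            self.action = action  # the action taken to get here

    class StackFrontier():
        def __init__(self):
            self.frontier = []    # nothing in the frontier to begin with

        def add(self, node):
            self.frontier.append(node)   # append to the end of the list

        def contains_state(self, state):
            return any(node.state == state for node in self.frontier)

        def empty(self):
            return len(self.frontier) == 0

        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            node = self.frontier[-1]            # last in, first out
            self.frontier = self.frontier[:-1]
            return node

    class QueueFrontier(StackFrontier):
        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            node = self.frontier[0]             # first in, first out
            self.frontier = self.frontier[1:]
            return node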
847 00:39:47,400 --> 00:39:50,480 So we'll take as input a text file that looks something 848 00:39:50,480 --> 00:39:53,810 like this, for example, where we see hash marks here representing 849 00:39:53,810 --> 00:39:57,830 walls, and I have the character A representing the starting position, 850 00:39:57,830 --> 00:40:01,740 and the character B representing the ending position. 851 00:40:01,740 --> 00:40:04,790 And you can take a look at the code for parsing this text file right now. 852 00:40:04,790 --> 00:40:06,320 That's the less interesting part. 853 00:40:06,320 --> 00:40:09,380 The more interesting part is this solve function here, 854 00:40:09,380 --> 00:40:11,510 where the solve function is going to figure out 855 00:40:11,510 --> 00:40:15,230 how to actually get from point A to point B. 856 00:40:15,230 --> 00:40:18,260 And here we see an implementation of the exact same idea 857 00:40:18,260 --> 00:40:19,900 we saw from a moment ago. 858 00:40:19,900 --> 00:40:21,740 We're going to keep track of how many states 859 00:40:21,740 --> 00:40:24,520 we've explored just so we can report that data later. 860 00:40:24,520 --> 00:40:29,770 But I start with a node that represents just the start state. 861 00:40:29,770 --> 00:40:34,040 And I start with a frontier that in this case is a stack frontier. 862 00:40:34,040 --> 00:40:36,170 And given that I'm treating my frontier as a stack, 863 00:40:36,170 --> 00:40:40,160 you might imagine that the algorithm I'm using here is now depth first search. 864 00:40:40,160 --> 00:40:45,230 Because depth first search or DFS uses a stack as its data structure. 865 00:40:45,230 --> 00:40:50,350 And initially, this frontier is just going to contain the start state. 866 00:40:50,350 --> 00:40:53,390 We initialize an explored set that initially is empty. 867 00:40:53,390 --> 00:40:55,420 There's nothing we've explored so far. 868 00:40:55,420 --> 00:40:59,980 And now here's our loop, that notion of repeating something again and again. 869 00:40:59,980 --> 00:41:03,760 First, we check if the frontier is empty by calling that empty function that we 870 00:41:03,760 --> 00:41:05,890 saw the implementation of a moment ago. 871 00:41:05,890 --> 00:41:08,080 And if the frontier is indeed empty, we'll 872 00:41:08,080 --> 00:41:11,650 go ahead and raise an exception, or a Python error, to say, sorry. 873 00:41:11,650 --> 00:41:15,230 There is no solution to this problem. 874 00:41:15,230 --> 00:41:18,630 Otherwise, we'll go ahead and remove a node from the frontier, 875 00:41:18,630 --> 00:41:22,960 by calling frontier.remove, and update the number of states we've explored. 876 00:41:22,960 --> 00:41:25,360 Because now we've explored one additional state, 877 00:41:25,360 --> 00:41:30,100 we say self.num_explored += 1, adding one to the number of states 878 00:41:30,100 --> 00:41:31,880 we've explored. 879 00:41:31,880 --> 00:41:34,430 Once we remove a node from the frontier, recall 880 00:41:34,430 --> 00:41:38,330 that the next step is to see whether or not it's the goal, the goal test. 881 00:41:38,330 --> 00:41:40,850 And in the case of the maze, the goal is pretty easy. 882 00:41:40,850 --> 00:41:45,110 I check to see whether the state of the node is equal to the goal. 883 00:41:45,110 --> 00:41:47,090 Initially when I set up the maze, I set up 884 00:41:47,090 --> 00:41:49,890 this value called goal, which is a property of the maze, 885 00:41:49,890 --> 00:41:53,300 so I can just check to see if the node is actually the goal.
886 00:41:53,300 --> 00:41:56,060 And if it is the goal, then what I want to do 887 00:41:56,060 --> 00:41:59,450 is backtrack my way towards figuring out what actions 888 00:41:59,450 --> 00:42:02,360 I took in order to get to this goal. 889 00:42:02,360 --> 00:42:03,470 And how do I do that? 890 00:42:03,470 --> 00:42:06,140 We'll recall that every node stores its parent-- 891 00:42:06,140 --> 00:42:09,110 the node that came before it that we used to get to this node-- 892 00:42:09,110 --> 00:42:11,690 and also the action used in order to get there. 893 00:42:11,690 --> 00:42:13,700 So I can create this loop where I'm constantly 894 00:42:13,700 --> 00:42:17,000 just looking at the parent of every node and keeping 895 00:42:17,000 --> 00:42:20,600 track, for all of the parents, what action I took to get from the parent 896 00:42:20,600 --> 00:42:21,890 to this. 897 00:42:21,890 --> 00:42:25,250 So this loop is going to keep repeating this process of looking through all 898 00:42:25,250 --> 00:42:28,670 of the parent nodes until we get back to the initial state, which 899 00:42:28,670 --> 00:42:32,940 has no parent, where node.parent is going to be equal to none. 900 00:42:32,940 --> 00:42:35,240 As I do so, I'm going to be building up the list of all 901 00:42:35,240 --> 00:42:38,030 of the actions that I'm following and the list of all of the cells 902 00:42:38,030 --> 00:42:39,590 that are part of the solution. 903 00:42:39,590 --> 00:42:42,020 But I'll reverse them because when I build it 904 00:42:42,020 --> 00:42:44,930 up going from the goal back to the initial state, 905 00:42:44,930 --> 00:42:48,020 I'm building the sequence of actions from the goal to the initial state, 906 00:42:48,020 --> 00:42:50,900 but I want to reverse them in order to get the sequence of actions 907 00:42:50,900 --> 00:42:53,630 from the initial state to the goal. 908 00:42:53,630 --> 00:42:57,210 And that is, ultimately, going to be the solution. 909 00:42:57,210 --> 00:43:01,280 So all of that happens if the current state is equal to the goal. 910 00:43:01,280 --> 00:43:03,290 And otherwise, if it's not the goal, well, 911 00:43:03,290 --> 00:43:06,860 then I'll go ahead and add this state to the explored set to say, 912 00:43:06,860 --> 00:43:08,240 I've explored this state now. 913 00:43:08,240 --> 00:43:11,510 No need to go back to it if I come across it in the future. 914 00:43:11,510 --> 00:43:14,750 And then, this logic here implements the idea 915 00:43:14,750 --> 00:43:16,820 of adding neighbors to the frontier. 916 00:43:16,820 --> 00:43:18,650 I'm saying, look at all of my neighbors. 917 00:43:18,650 --> 00:43:21,530 And I implemented a function called neighbors that you can take a look at. 918 00:43:21,530 --> 00:43:23,690 And for each of those neighbors, I'm going to check, 919 00:43:23,690 --> 00:43:25,850 is the state already in the frontier? 920 00:43:25,850 --> 00:43:28,400 Is the state already in the explored set? 921 00:43:28,400 --> 00:43:32,600 And if it's not in either of those, then I'll go ahead and add this new child 922 00:43:32,600 --> 00:43:33,950 node-- this new node-- 923 00:43:33,950 --> 00:43:35,230 to the frontier. 924 00:43:35,230 --> 00:43:37,610 So there's a fair amount of syntax here, but the key here 925 00:43:37,610 --> 00:43:39,920 is not to understand all the nuances of the syntax, 926 00:43:39,920 --> 00:43:42,740 though feel free to take a closer look at this file on your own 927 00:43:42,740 --> 00:43:44,660 to get a sense for how it is working. 
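For reference, a condensed sketch of that solve loop, consistent with the steps just described, might look like this. Names such as self.start, self.goal, and the neighbors helper are assumed from context here, as this is a method on the Maze class:

    def solve(self):
        """Find a solution to the maze, if one exists."""
        self.num_explored = 0

        # Start with a frontier that contains only the initial state.
        start = Node(state=self.start, parent=None, action=None)
        frontier = StackFrontier()      # swap in QueueFrontier() for BFS
        frontier.add(start)

        # Nothing has been explored yet.
        self.explored = set()

        while True:
            # If the frontier is empty, there is no path to the goal.
            if frontier.empty():
                raise Exception("no solution")

            # Remove a node from the frontier and count it as explored.
            node = frontier.remove()
            self.num_explored += 1

            # Goal test: walk back up the chain of parents to recover the
            # sequence of actions and cells, then reverse it so it runs
            # from the initial state to the goal.
            if node.state == self.goal:
                actions, cells = [], []
                while node.parent is not None:
                    actions.append(node.action)
                    cells.append(node.state)
                    node = node.parent
                actions.reverse()
                cells.reverse()
                self.solution = (actions, cells)
                return

            # Otherwise mark the state explored and add new neighbors
            # that are in neither the frontier nor the explored set.
            self.explored.add(node.state)
            for action, state in self.neighbors(node.state):
                if not frontier.contains_state(state) and state not in self.explored:
                    frontier.add(Node(state=state, parent=node, action=action))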
928 00:43:44,660 --> 00:43:48,140 But the key is to see how this is an implementation of the same pseudocode, 929 00:43:48,140 --> 00:43:53,060 the same idea that we were describing a moment ago on the screen when we were 930 00:43:53,060 --> 00:43:55,220 looking at the steps that we might follow in order 931 00:43:55,220 --> 00:43:57,720 to solve this kind of search problem. 932 00:43:57,720 --> 00:43:59,540 So now let's actually see this in action. 933 00:43:59,540 --> 00:44:05,540 I'll go ahead and run maze.py on maze1.txt, for example. 934 00:44:05,540 --> 00:44:09,320 And what we'll see is here we have a printout of what the maze initially 935 00:44:09,320 --> 00:44:10,370 looked like. 936 00:44:10,370 --> 00:44:13,010 And then here, down below, is after we've solved it. 937 00:44:13,010 --> 00:44:17,980 We had to explore 11 states in order to do it, and we found a path from A to B. 938 00:44:17,980 --> 00:44:21,110 And in this program, I just happened to generate a graphical representation 939 00:44:21,110 --> 00:44:22,040 of this, as well-- 940 00:44:22,040 --> 00:44:25,400 so I can open up maze.png, which is generated by this program-- 941 00:44:25,400 --> 00:44:28,820 that shows you where, in the darker color here, the wall is. 942 00:44:28,820 --> 00:44:30,860 Red is the initial state, green is the goal, 943 00:44:30,860 --> 00:44:32,930 and yellow is the path that was followed. 944 00:44:32,930 --> 00:44:37,230 We found a path from the initial state to the goal. 945 00:44:37,230 --> 00:44:40,050 But now let's take a look at a more sophisticated maze 946 00:44:40,050 --> 00:44:42,120 to see what might happen instead. 947 00:44:42,120 --> 00:44:47,010 Let's look now at maze2.txt, where now here we have a much larger maze. 948 00:44:47,010 --> 00:44:50,280 Again, we're trying to find our way from point A to point B, 949 00:44:50,280 --> 00:44:53,520 but now you might imagine that depth-first search might not be so lucky. 950 00:44:53,520 --> 00:44:56,010 It might not get the goal on the first try. 951 00:44:56,010 --> 00:44:58,560 It might have to follow one path then backtrack 952 00:44:58,560 --> 00:45:02,100 and explore something else a little bit later. 953 00:45:02,100 --> 00:45:03,230 So let's try this. 954 00:45:03,230 --> 00:45:08,930 Run python maze.py on maze2.txt, this time trying it on this other maze. 955 00:45:08,930 --> 00:45:12,140 And now depth-first search is able to find a solution. 956 00:45:12,140 --> 00:45:16,040 Here, as indicated by the stars, is a way to get from A to B. 957 00:45:16,040 --> 00:45:19,430 And we can represent this visually by opening up this maze. 958 00:45:19,430 --> 00:45:20,810 Here's what that maze looks like. 959 00:45:20,810 --> 00:45:24,860 And highlighted in yellow is the path that was found from the initial state 960 00:45:24,860 --> 00:45:26,300 to the goal. 961 00:45:26,300 --> 00:45:31,340 But how many states did we have to explore before we found that path? 962 00:45:31,340 --> 00:45:34,610 Well, recall that, in my program, I was keeping track of the number of states 963 00:45:34,610 --> 00:45:36,530 that we've explored so far. 964 00:45:36,530 --> 00:45:40,280 And so I can go back to the terminal and see that, all right, in order 965 00:45:40,280 --> 00:45:46,110 to solve this problem, we had to explore 399 different states.
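A plausible driver for running the solver from the terminal like this might look as follows. It assumes the Maze class exposes solve(), a num_explored counter, and an image-writing method roughly along these lines; the exact method names here are guesses for illustration:

    import sys

    if __name__ == "__main__":
        if len(sys.argv) != 2:
            sys.exit("Usage: python maze.py maze.txt")
        m = Maze(sys.argv[1])       # e.g. maze1.txt or maze2.txt
        m.solve()
        print("States explored:", m.num_explored)
        m.output_image("maze.png")  # hypothetical name for the image writer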
966 00:45:46,110 --> 00:45:48,860 And in fact, I can make one small modification to the program 967 00:45:48,860 --> 00:45:51,950 and tell the program, at the end when we output this image, to use 968 00:45:51,950 --> 00:45:55,160 an argument I added called "show explored". 969 00:45:55,160 --> 00:45:57,980 And if I set "show explored" equal to true 970 00:45:57,980 --> 00:46:02,720 and rerun this program, python maze.py, running it on maze2, 971 00:46:02,720 --> 00:46:06,320 and then I open the maze, what you'll see here, highlighted in red, 972 00:46:06,320 --> 00:46:10,610 are all of the states that had to be explored to get from the initial state 973 00:46:10,610 --> 00:46:11,510 to the goal. 974 00:46:11,510 --> 00:46:15,140 Depth-First Search, or DFS, didn't find its way to the goal right away. 975 00:46:15,140 --> 00:46:18,170 It made a choice to first explore this direction. 976 00:46:18,170 --> 00:46:19,970 And when it explored this direction, it had 977 00:46:19,970 --> 00:46:22,280 to follow every conceivable path, all the way 978 00:46:22,280 --> 00:46:24,680 to the very end, even this long and winding one, 979 00:46:24,680 --> 00:46:27,440 in order to realize that, you know what, that's a dead end. 980 00:46:27,440 --> 00:46:29,720 And instead, the program needed to backtrack. 981 00:46:29,720 --> 00:46:32,660 After going this direction, it must have gone this direction. 982 00:46:32,660 --> 00:46:35,490 It got lucky here by just not choosing this path. 983 00:46:35,490 --> 00:46:39,290 But it got unlucky here, exploring this direction, exploring a bunch of states 984 00:46:39,290 --> 00:46:41,150 that it didn't need to and then, likewise, 985 00:46:41,150 --> 00:46:43,490 exploring all of this top part of the graph 986 00:46:43,490 --> 00:46:46,320 when it probably didn't need to do that either. 987 00:46:46,320 --> 00:46:49,070 So all in all, depth-first search here is really 988 00:46:49,070 --> 00:46:52,970 not performing optimally, probably exploring more states than it needs to. 989 00:46:52,970 --> 00:46:56,600 It finds an optimal solution, the best path to the goal, 990 00:46:56,600 --> 00:46:59,510 but the number of states it needed to explore in order to do so, 991 00:46:59,510 --> 00:47:03,060 the number of steps I had to take, that was much higher. 992 00:47:03,060 --> 00:47:04,070 So let's compare. 993 00:47:04,070 --> 00:47:09,060 How would Breadth-First Search, or BFS, do on this exact same maze instead? 994 00:47:09,060 --> 00:47:11,630 And in order to do so, it's a very easy change. 995 00:47:11,630 --> 00:47:16,550 The algorithm for DFS and BFS is identical with the exception 996 00:47:16,550 --> 00:47:20,950 of what data structure we use to represent the frontier. 997 00:47:20,950 --> 00:47:23,840 That in DFS I used a stack frontier-- 998 00:47:23,840 --> 00:47:25,880 last in, first out-- 999 00:47:25,880 --> 00:47:30,380 whereas in BFS, I'm going to use a queue frontier-- first in, 1000 00:47:30,380 --> 00:47:33,260 first out, where the first thing I add to the frontier 1001 00:47:33,260 --> 00:47:35,570 is the first thing that I remove. 1002 00:47:35,570 --> 00:47:40,670 So I'll go back to the terminal, rerun this program on the same maze, 1003 00:47:40,670 --> 00:47:45,280 and now you'll see that the number of states we had to explore was only 77, 1004 00:47:45,280 --> 00:47:49,140 as compared to almost 400 when we used depth-first search. 1005 00:47:49,140 --> 00:47:50,330 And we can see exactly why.
1006 00:47:50,330 --> 00:47:54,980 We can see what happened if we open up maze.png now and take a look. 1007 00:47:54,980 --> 00:47:59,540 Again, the yellow highlight is the solution that breadth-first search found, 1008 00:47:59,540 --> 00:48:03,020 which, incidentally, is the same solution that depth-first search found. 1009 00:48:03,020 --> 00:48:07,110 They're both finding the best solution, but notice all the white unexplored 1010 00:48:07,110 --> 00:48:07,610 cells. 1011 00:48:07,610 --> 00:48:10,700 There were far fewer states that needed to be explored 1012 00:48:10,700 --> 00:48:14,980 in order to make our way to the goal, because breadth-first search operates 1013 00:48:14,980 --> 00:48:15,980 a little more shallowly. 1014 00:48:15,980 --> 00:48:19,070 It's exploring things that are close to the initial state 1015 00:48:19,070 --> 00:48:22,170 without exploring things that are further away. 1016 00:48:22,170 --> 00:48:25,220 So if the goal is not too far away, then breadth-first search 1017 00:48:25,220 --> 00:48:27,950 can actually behave quite effectively on a maze that 1018 00:48:27,950 --> 00:48:30,870 looks a little something like this. 1019 00:48:30,870 --> 00:48:35,750 Now, in this case, both BFS and DFS ended up finding the same solution, 1020 00:48:35,750 --> 00:48:37,680 but that won't always be the case. 1021 00:48:37,680 --> 00:48:43,390 And in fact, let's take a look at one more example, for instance, maze3.txt. 1022 00:48:43,390 --> 00:48:46,970 In maze3.txt, notice that here there are multiple ways 1023 00:48:46,970 --> 00:48:49,190 that you could get from A to B. 1024 00:48:49,190 --> 00:48:52,040 It's a relatively small maze, but let's look at what happens. 1025 00:48:52,040 --> 00:48:55,670 If I use-- and I'll go ahead and turn off "show explored" so 1026 00:48:55,670 --> 00:48:58,390 we just see the solution. 1027 00:48:58,390 --> 00:49:04,590 If I use BFS, breadth-first search, to solve maze3.txt, 1028 00:49:04,590 --> 00:49:06,420 well, then we find a solution. 1029 00:49:06,420 --> 00:49:09,530 And if I open up the maze, here's the solution that we found. 1030 00:49:09,530 --> 00:49:10,610 It is the optimal one. 1031 00:49:10,610 --> 00:49:13,700 With just four steps, we can get from the initial state 1032 00:49:13,700 --> 00:49:17,090 to what the goal happens to be. 1033 00:49:17,090 --> 00:49:21,890 But what happens if we try to use depth-first search, or DFS, instead? 1034 00:49:21,890 --> 00:49:26,540 Well, again, I'll go back up to my queue frontier, where queue frontier means 1035 00:49:26,540 --> 00:49:28,850 that we're using breadth-first search. 1036 00:49:28,850 --> 00:49:32,120 And I'll change it to a stack frontier, which means that now we'll 1037 00:49:32,120 --> 00:49:34,850 be using depth-first search. 1038 00:49:34,850 --> 00:49:37,910 I'll rerun python maze.py. 1039 00:49:37,910 --> 00:49:40,490 And now you'll see that we find a solution, 1040 00:49:40,490 --> 00:49:42,980 but it is not the optimal solution. 1041 00:49:42,980 --> 00:49:45,640 This, instead, is what our algorithm finds. 1042 00:49:45,640 --> 00:49:48,140 And maybe depth-first search would have found that shorter solution. 1043 00:49:48,140 --> 00:49:51,410 It's possible, but it's not guaranteed; if we just 1044 00:49:51,410 --> 00:49:55,280 happen to be unlucky, if we choose this state instead of that state, 1045 00:49:55,280 --> 00:49:58,280 then depth-first search might find a longer route to get 1046 00:49:58,280 --> 00:50:01,260 from the initial state to the goal.
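In code, the change being described here is literally a one-line swap inside the solve function, something like:

    # Breadth-first search: first in, first out
    frontier = QueueFrontier()

    # Depth-first search: last in, first out
    frontier = StackFrontier()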
1047 00:50:01,260 --> 00:50:04,280 So we do see some trade-offs here where depth-first search might not 1048 00:50:04,280 --> 00:50:06,240 find the optimal solution. 1049 00:50:06,240 --> 00:50:09,080 So at that point, it seems like breadth-first search is pretty good. 1050 00:50:09,080 --> 00:50:12,920 Is that the best we can do, where it's going to find us the optimal solution 1051 00:50:12,920 --> 00:50:15,620 and we don't have to worry about situations where 1052 00:50:15,620 --> 00:50:20,270 we might end up finding a longer path to the solution than what actually exists? 1053 00:50:20,270 --> 00:50:23,150 Where the goal is far away from the initial state-- 1054 00:50:23,150 --> 00:50:26,780 and we might have to take lots of steps in order to get from the initial state 1055 00:50:26,780 --> 00:50:27,620 to the goal-- 1056 00:50:27,620 --> 00:50:31,220 what ended up happening, is that this algorithm, BFS, ended up 1057 00:50:31,220 --> 00:50:35,480 exploring basically the entire graph, having to go through the entire maze 1058 00:50:35,480 --> 00:50:39,800 in order to find its way from the initial state to the goal state. 1059 00:50:39,800 --> 00:50:41,960 What we'd ultimately like is for our algorithm 1060 00:50:41,960 --> 00:50:44,320 to be a little bit more intelligent. 1061 00:50:44,320 --> 00:50:46,310 And now what would it mean for our algorithm 1062 00:50:46,310 --> 00:50:49,820 to be a little bit more intelligent, in this case? 1063 00:50:49,820 --> 00:50:52,490 Well, let's look back to where breadth-first search might 1064 00:50:52,490 --> 00:50:54,290 have been able to make a different decision 1065 00:50:54,290 --> 00:50:57,570 and consider human intuition in this process, as well. 1066 00:50:57,570 --> 00:51:01,640 Like, what might a human do when solving this maze that is different than what 1067 00:51:01,640 --> 00:51:04,490 BFS ultimately chose to do? 1068 00:51:04,490 --> 00:51:07,610 Well, the very first decision point that BFS made 1069 00:51:07,610 --> 00:51:11,420 was right here, when it made five steps and ended up 1070 00:51:11,420 --> 00:51:13,340 in a position where it had a fork in the road. 1071 00:51:13,340 --> 00:51:15,210 It could either go left or it could go right. 1072 00:51:15,210 --> 00:51:17,460 In these initial couple of steps, there was no choice. 1073 00:51:17,460 --> 00:51:20,790 There was only one action that could be taken from each of those states. 1074 00:51:20,790 --> 00:51:23,030 And so the search algorithm did the only thing 1075 00:51:23,030 --> 00:51:25,010 that any search algorithm could do, which 1076 00:51:25,010 --> 00:51:28,400 is keep following that state after the next state. 1077 00:51:28,400 --> 00:51:31,670 But this decision point is where things get a little bit interesting. 1078 00:51:31,670 --> 00:51:34,850 Depth-first search, that very first search algorithm we looked at, 1079 00:51:34,850 --> 00:51:38,750 chose to say, let's pick one path and exhaust that path, 1080 00:51:38,750 --> 00:51:42,560 see if anything that way has the goal, and if not, then let's 1081 00:51:42,560 --> 00:51:43,800 try the other way. 1082 00:51:43,800 --> 00:51:46,580 Breadth-first search took the alternative approach of saying, 1083 00:51:46,580 --> 00:51:47,210 you know what? 
1084 00:51:47,210 --> 00:51:51,510 Let's explore things that are shallow, close to us first, look left and right, 1085 00:51:51,510 --> 00:51:53,960 then back left and back right, so on and so forth, 1086 00:51:53,960 --> 00:51:58,640 alternating between our options in the hopes of finding something nearby. 1087 00:51:58,640 --> 00:52:02,390 But ultimately, what might a human do if confronted with a situation like this 1088 00:52:02,390 --> 00:52:04,250 of go left or go right? 1089 00:52:04,250 --> 00:52:07,010 Well, a human might visually see that, all right, 1090 00:52:07,010 --> 00:52:11,360 I'm trying to get to state B, which is way up there, and going right just 1091 00:52:11,360 --> 00:52:13,360 feels like it's closer to the goal. 1092 00:52:13,360 --> 00:52:15,140 Like, it feels like going right should be 1093 00:52:15,140 --> 00:52:17,870 better than going left because I'm making progress 1094 00:52:17,870 --> 00:52:19,380 towards getting to that goal. 1095 00:52:19,380 --> 00:52:22,340 Now, of course, there are a couple of assumptions that I'm making here. 1096 00:52:22,340 --> 00:52:25,220 I'm making the assumption that we can represent 1097 00:52:25,220 --> 00:52:27,080 this grid as, like, a two-dimensional grid, 1098 00:52:27,080 --> 00:52:28,880 where I know the coordinates of everything. 1099 00:52:28,880 --> 00:52:33,940 I know that A is in coordinate 0,0, and B is in some other coordinate pair. 1100 00:52:33,940 --> 00:52:37,070 And I know what coordinate I'm at now, so I can calculate that, yeah, going 1101 00:52:37,070 --> 00:52:39,420 this way, that is closer to the goal. 1102 00:52:39,420 --> 00:52:42,170 And that might be a reasonable assumption for some types of search 1103 00:52:42,170 --> 00:52:44,060 problems but maybe not in others. 1104 00:52:44,060 --> 00:52:46,670 But for now, we'll go ahead and assume that-- 1105 00:52:46,670 --> 00:52:51,590 that I know what my current coordinate pair and I know the coordinate x,y 1106 00:52:51,590 --> 00:52:53,680 of the goal that I'm trying to get to. 1107 00:52:53,680 --> 00:52:56,540 And in this situation, I'd like an algorithm that 1108 00:52:56,540 --> 00:52:59,090 is a little bit more intelligent and somehow knows 1109 00:52:59,090 --> 00:53:02,180 that I should be making progress towards the goal, 1110 00:53:02,180 --> 00:53:05,720 and this is probably the way to do that because, in a maze, 1111 00:53:05,720 --> 00:53:08,480 moving in the coordinate direction of the goal 1112 00:53:08,480 --> 00:53:11,880 is usually, though not always, a good thing. 1113 00:53:11,880 --> 00:53:14,840 And so here we draw a distinction between two different types of search 1114 00:53:14,840 --> 00:53:19,040 algorithms-- uninformed search and informed search. 1115 00:53:19,040 --> 00:53:23,330 Uninformed search algorithms are algorithms like DFS and BFS, 1116 00:53:23,330 --> 00:53:25,200 the two algorithms that we just looked at, 1117 00:53:25,200 --> 00:53:29,540 which are search strategies that don't use any problem specific knowledge 1118 00:53:29,540 --> 00:53:31,400 to be able to solve the problem. 1119 00:53:31,400 --> 00:53:34,610 DFS and BFS didn't really care about the structure 1120 00:53:34,610 --> 00:53:38,270 of the maze or anything about the way that a maze is in order 1121 00:53:38,270 --> 00:53:39,330 to solve the problem. 
1122 00:53:39,330 --> 00:53:42,490 They just look at the actions available and choose from those actions, 1123 00:53:42,490 --> 00:53:45,290 and it doesn't matter whether it's a maze or some other problem. 1124 00:53:45,290 --> 00:53:48,020 The solution, or the way that it tries to solve the problem, 1125 00:53:48,020 --> 00:53:51,320 is really fundamentally going to be the same. 1126 00:53:51,320 --> 00:53:53,030 What we're going to take a look at now is 1127 00:53:53,030 --> 00:53:55,370 an improvement upon uninformed search. 1128 00:53:55,370 --> 00:53:57,830 We're going to take a look at informed search. 1129 00:53:57,830 --> 00:54:00,440 Informed search algorithms are going to be search strategies that 1130 00:54:00,440 --> 00:54:05,800 use knowledge specific to the problem to be able to better find a solution. 1131 00:54:05,800 --> 00:54:09,270 And in the case of a maze, this problem specific knowledge 1132 00:54:09,270 --> 00:54:11,270 is something like, if I'm in a square 1133 00:54:11,270 --> 00:54:14,420 that is geographically closer to the goal, that 1134 00:54:14,420 --> 00:54:19,700 is better than being in a square that is geographically further away. 1135 00:54:19,700 --> 00:54:23,330 And this is something we can only know by thinking about this problem 1136 00:54:23,330 --> 00:54:27,820 and reasoning about what knowledge might be helpful for our AI agent 1137 00:54:27,820 --> 00:54:30,060 to know a little something about. 1138 00:54:30,060 --> 00:54:32,440 There are a number of different types of informed search. 1139 00:54:32,440 --> 00:54:35,140 Specifically, first, we're going to look at a particular type 1140 00:54:35,140 --> 00:54:39,550 of search algorithm called greedy best-first search. 1141 00:54:39,550 --> 00:54:42,700 Greedy best-first search, often abbreviated GBFS, 1142 00:54:42,700 --> 00:54:45,970 is a search algorithm that, instead of expanding the deepest node, 1143 00:54:45,970 --> 00:54:49,660 like DFS, or the shallowest node, like BFS, 1144 00:54:49,660 --> 00:54:52,390 is always going to expand the node 1145 00:54:52,390 --> 00:54:55,990 that it thinks is closest to the goal. 1146 00:54:55,990 --> 00:54:59,650 Now, the search algorithm isn't going to know for sure whether it is the closest 1147 00:54:59,650 --> 00:55:02,650 thing to the goal, because if we knew what was closest to the goal 1148 00:55:02,650 --> 00:55:05,100 all the time, then we would already have a solution. 1149 00:55:05,100 --> 00:55:07,150 Like, if we had the knowledge of what is close to the goal, 1150 00:55:07,150 --> 00:55:10,570 we could just follow those steps in order to get from the initial position 1151 00:55:10,570 --> 00:55:11,870 to the solution. 1152 00:55:11,870 --> 00:55:14,650 But if we don't know the solution-- meaning we don't know exactly 1153 00:55:14,650 --> 00:55:16,450 what's closest to the goal-- 1154 00:55:16,450 --> 00:55:19,120 instead, we can use an estimate of what's 1155 00:55:19,120 --> 00:55:22,180 closest to the goal, otherwise known as a heuristic-- 1156 00:55:22,180 --> 00:55:25,910 just some way of estimating whether or not we're close to the goal. 1157 00:55:25,910 --> 00:55:29,800 And we'll do so using a heuristic function, conventionally called h(n), 1158 00:55:29,800 --> 00:55:34,540 that takes a state as input and returns our estimate of how close we 1159 00:55:34,540 --> 00:55:36,860 are to the goal.
1160 00:55:36,860 --> 00:55:39,040 So what might this heuristic function actually 1161 00:55:39,040 --> 00:55:42,100 look like in the case of a maze-solving algorithm? 1162 00:55:42,100 --> 00:55:45,490 Where we're trying to solve a maze, what does a heuristic look like? 1163 00:55:45,490 --> 00:55:48,910 Well, the heuristic needs to answer a question, like between these two 1164 00:55:48,910 --> 00:55:51,770 cells, C and D, which one is better? 1165 00:55:51,770 --> 00:55:55,640 Which one would I rather be in if I'm trying to find my way to the goal? 1166 00:55:55,640 --> 00:55:58,640 Well, any human could probably look at this and tell you, you know what? 1167 00:55:58,640 --> 00:56:00,280 D looks like it's better. 1168 00:56:00,280 --> 00:56:03,550 Even if the maze is convoluted and you haven't thought about all the walls, 1169 00:56:03,550 --> 00:56:05,410 D is probably better. 1170 00:56:05,410 --> 00:56:06,710 And why is D better? 1171 00:56:06,710 --> 00:56:09,730 Well, because if you ignore the walls-- let's just pretend the walls 1172 00:56:09,730 --> 00:56:14,290 don't exist for a moment and relax the problem, so to speak-- 1173 00:56:14,290 --> 00:56:18,670 D, just in terms of coordinate pairs, is closer to this goal. 1174 00:56:18,670 --> 00:56:21,700 It's fewer steps that I would need to take to get to the goal, 1175 00:56:21,700 --> 00:56:24,160 as compared to C, even if you ignore the walls. 1176 00:56:24,160 --> 00:56:29,160 If you just know the x,y coordinate of C, and the x,y coordinate of the goal, 1177 00:56:29,160 --> 00:56:31,450 and likewise, you know the x,y coordinate of D, 1178 00:56:31,450 --> 00:56:35,770 you can calculate that D, just geographically, ignoring the walls, 1179 00:56:35,770 --> 00:56:37,110 looks like it's better. 1180 00:56:37,110 --> 00:56:39,700 And so this is the heuristic function that we're going to use, 1181 00:56:39,700 --> 00:56:42,880 and it's something called the Manhattan distance, one specific type 1182 00:56:42,880 --> 00:56:46,690 of heuristic, where the heuristic asks: how many squares vertically 1183 00:56:46,690 --> 00:56:49,270 and horizontally-- so not 1184 00:56:49,270 --> 00:56:53,320 allowing myself to go diagonally, just either up or down or left or right-- 1185 00:56:53,320 --> 00:56:58,030 do I need to take to get from each of these cells to the goal? 1186 00:56:58,030 --> 00:57:00,910 Well, as it turns out, D is much closer. 1187 00:57:00,910 --> 00:57:01,870 There are fewer steps. 1188 00:57:01,870 --> 00:57:05,740 It only needs to take six steps in order to get to that goal. 1189 00:57:05,740 --> 00:57:07,670 Again, here we're ignoring the walls. 1190 00:57:07,670 --> 00:57:09,790 We've relaxed the problem a little bit. 1191 00:57:09,790 --> 00:57:12,510 We're just concerned with, if you do the math, 1192 00:57:12,510 --> 00:57:14,470 subtract the x values from each other and the y 1193 00:57:14,470 --> 00:57:18,010 values from each other, what is our estimate of how far we are away? 1194 00:57:18,010 --> 00:57:22,970 We can estimate that D is closer to the goal than C is. 1195 00:57:22,970 --> 00:57:24,800 And so now we have an approach. 1196 00:57:24,800 --> 00:57:27,890 We have a way of picking which node to remove from the frontier. 1197 00:57:27,890 --> 00:57:29,870 And at each stage in our algorithm, we're 1198 00:57:29,870 --> 00:57:31,580 going to remove a node from the frontier.
1199 00:57:31,580 --> 00:57:34,820 We're going to explore the node that has the smallest 1200 00:57:34,820 --> 00:57:37,970 value for this heuristic function, the node that has the smallest 1201 00:57:37,970 --> 00:57:40,740 Manhattan distance to the goal. 1202 00:57:40,740 --> 00:57:42,510 And so what would this actually look like? 1203 00:57:42,510 --> 00:57:45,320 Well, let me first label this graph, label this maze, 1204 00:57:45,320 --> 00:57:48,050 with a number representing the value of this heuristic 1205 00:57:48,050 --> 00:57:51,860 function, the value of the Manhattan distance from any of these cells. 1206 00:57:51,860 --> 00:57:55,070 So from this cell, for example, we're one away from the goal. 1207 00:57:55,070 --> 00:57:56,920 From this cell, we're two away from the goal. 1208 00:57:56,920 --> 00:57:58,400 Three away, four away. 1209 00:57:58,400 --> 00:58:02,210 Here we're five away, because we have to go one to the right and then four up. 1210 00:58:02,210 --> 00:58:05,850 From somewhere like here, the Manhattan distance is 2. 1211 00:58:05,850 --> 00:58:08,330 We're only two squares away from the goal, 1212 00:58:08,330 --> 00:58:10,730 geographically, even though in practice we're 1213 00:58:10,730 --> 00:58:13,910 going to have to take a longer path, but we don't know that yet. 1214 00:58:13,910 --> 00:58:16,760 The heuristic is just some easy way to estimate 1215 00:58:16,760 --> 00:58:18,380 how far we are away from the goal. 1216 00:58:18,380 --> 00:58:21,380 And maybe our heuristic is overly optimistic. 1217 00:58:21,380 --> 00:58:23,660 It thinks that, yeah, we're only two steps away, 1218 00:58:23,660 --> 00:58:27,450 when in practice, when you consider the walls, it might be more steps. 1219 00:58:27,450 --> 00:58:31,400 So the important thing here is that the heuristic isn't a guarantee 1220 00:58:31,400 --> 00:58:33,290 of how many steps it's going to take. 1221 00:58:33,290 --> 00:58:34,910 It is estimating. 1222 00:58:34,910 --> 00:58:36,920 It's an attempt at trying to approximate. 1223 00:58:36,920 --> 00:58:40,760 And it does seem generally the case that the squares that look closer 1224 00:58:40,760 --> 00:58:43,910 to the goal have smaller values for the heuristic function 1225 00:58:43,910 --> 00:58:46,960 than squares that are further away. 1226 00:58:46,960 --> 00:58:52,110 So now, using greedy best-first search, what might this algorithm actually do? 1227 00:58:52,110 --> 00:58:55,270 Well, again, for these first five steps, there's not much of a choice. 1228 00:58:55,270 --> 00:58:57,610 We started at this initial state, A. And we say, all right. 1229 00:58:57,610 --> 00:59:00,300 We have to explore these five states. 1230 00:59:00,300 --> 00:59:01,890 But now we have a decision point. 1231 00:59:01,890 --> 00:59:04,590 Now we have a choice between going left and going right. 1232 00:59:04,590 --> 00:59:08,760 And before, DFS and BFS would just pick arbitrarily, because it just 1233 00:59:08,760 --> 00:59:11,610 depends on the order you throw these two nodes into the frontier-- 1234 00:59:11,610 --> 00:59:15,240 and we didn't specify what order you put them into the frontier, only the order 1235 00:59:15,240 --> 00:59:17,250 you take them out. 1236 00:59:17,250 --> 00:59:20,580 Here we can look at 13 and 11 and say that, all right, 1237 00:59:20,580 --> 00:59:24,630 this square is a distance of 11 away from the goal, 1238 00:59:24,630 --> 00:59:27,300 according to our heuristic, according to our estimate.
1239 00:59:27,300 --> 00:59:31,560 And this one we estimate to be 13 away from the goal. 1240 00:59:31,560 --> 00:59:34,650 So between those two options, between these two choices, 1241 00:59:34,650 --> 00:59:36,150 I'd rather have the 11. 1242 00:59:36,150 --> 00:59:40,240 I'd rather be 11 steps away from the goal, so I'll go to the right. 1243 00:59:40,240 --> 00:59:44,340 We're able to make an informed decision because we know a little something more 1244 00:59:44,340 --> 00:59:45,670 about this problem. 1245 00:59:45,670 --> 00:59:47,820 So then we keep following 10, 9, 8-- 1246 00:59:47,820 --> 00:59:49,470 between the two sevens. 1247 00:59:49,470 --> 00:59:51,850 We don't really have much of a way to know between those. 1248 00:59:51,850 --> 00:59:53,880 So then we do just have to make an arbitrary choice. 1249 00:59:53,880 --> 00:59:54,630 And you know what? 1250 00:59:54,630 --> 00:59:55,560 Maybe we choose wrong. 1251 00:59:55,560 --> 01:00:00,110 But that's OK because now we can still say, all right, let's try this seven. 1252 01:00:00,110 --> 01:00:01,820 We say seven, six. 1253 01:00:01,820 --> 01:00:03,990 We have to make this choice even though it increases 1254 01:00:03,990 --> 01:00:05,640 the value of the heuristic function. 1255 01:00:05,640 --> 01:00:08,970 But now we have another decision point between six and eight. 1256 01:00:08,970 --> 01:00:10,290 And between those two-- 1257 01:00:10,290 --> 01:00:13,380 and really, we're also considering the 13, but that's much higher. 1258 01:00:13,380 --> 01:00:16,200 Between six, eight, and 13, well, the six 1259 01:00:16,200 --> 01:00:18,900 is the smallest value, so we'd rather take the six. 1260 01:00:18,900 --> 01:00:22,440 We're able to make an informed decision that going this way to the right 1261 01:00:22,440 --> 01:00:24,840 is probably better than going that way. 1262 01:00:24,840 --> 01:00:25,680 So we turn this way. 1263 01:00:25,680 --> 01:00:26,850 We go to five. 1264 01:00:26,850 --> 01:00:29,190 And now we find a decision point where we'll actually 1265 01:00:29,190 --> 01:00:31,200 make a decision that we might not want to make, 1266 01:00:31,200 --> 01:00:34,320 but there's unfortunately not too much of a way around this. 1267 01:00:34,320 --> 01:00:35,670 We see four and six. 1268 01:00:35,670 --> 01:00:37,620 Four looks closer to the goal, right? 1269 01:00:37,620 --> 01:00:40,170 It's going up, and the goal is further up. 1270 01:00:40,170 --> 01:00:43,710 So we end up taking that route, which ultimately leads us to a dead end. 1271 01:00:43,710 --> 01:00:46,960 But that's OK because we can still say, all right, now let's try the six, 1272 01:00:46,960 --> 01:00:51,380 and now follow this route that will ultimately lead us to the goal. 1273 01:00:51,380 --> 01:00:54,510 And so this now is how greedy best-first search might 1274 01:00:54,510 --> 01:00:56,760 try to approach this problem, by saying whenever 1275 01:00:56,760 --> 01:01:00,060 we have a decision between multiple nodes that we could explore, 1276 01:01:00,060 --> 01:01:04,290 let's explore the node that has the smallest value of h(n), 1277 01:01:04,290 --> 01:01:09,210 this heuristic function that is estimating how far I have to go. 1278 01:01:09,210 --> 01:01:11,010 And it just so happens that, in this case, 1279 01:01:11,010 --> 01:01:15,010 we end up doing better, in terms of the number of states we needed to explore, 1280 01:01:15,010 --> 01:01:16,630 than BFS needed to. 
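As an aside, that Manhattan distance heuristic and the greedy choice it drives are short to sketch in Python, assuming each state is an (x, y) coordinate pair and building on the StackFrontier sketch from earlier; this is an illustration, not the course's exact code:

    def h(state, goal):
        # Manhattan distance: vertical steps plus horizontal steps, ignoring walls.
        x1, y1 = state
        x2, y2 = goal
        return abs(x1 - x2) + abs(y1 - y2)

    class GreedyFrontier(StackFrontier):
        # Always remove the node whose state we estimate to be closest to the goal.
        def __init__(self, goal):
            super().__init__()
            self.goal = goal

        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            node = min(self.frontier, key=lambda n: h(n.state, self.goal))
            self.frontier.remove(node)
            return node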
1281 01:01:16,630 --> 01:01:19,950 BFS explored all of this section and all of that section. 1282 01:01:19,950 --> 01:01:22,620 But we were able to eliminate that by taking advantage 1283 01:01:22,620 --> 01:01:26,040 of this heuristic, this knowledge about how close we 1284 01:01:26,040 --> 01:01:30,330 are to the goal or some estimate of that idea. 1285 01:01:30,330 --> 01:01:31,450 So this seems much better. 1286 01:01:31,450 --> 01:01:33,150 So wouldn't we always prefer an algorithm 1287 01:01:33,150 --> 01:01:36,870 like this over an algorithm like breadth-first search? 1288 01:01:36,870 --> 01:01:37,850 Well, maybe. 1289 01:01:37,850 --> 01:01:39,810 One thing to take into consideration is that we 1290 01:01:39,810 --> 01:01:42,030 need to come up with a good heuristic. 1291 01:01:42,030 --> 01:01:45,720 How good the heuristic is is going to affect how good this algorithm is. 1292 01:01:45,720 --> 01:01:49,740 And coming up with a good heuristic can oftentimes be challenging. 1293 01:01:49,740 --> 01:01:51,450 But the other thing to consider is to ask 1294 01:01:51,450 --> 01:01:54,630 the question, just as we did with the prior two algorithms, 1295 01:01:54,630 --> 01:01:56,580 is this algorithm optimal? 1296 01:01:56,580 --> 01:02:02,280 Will it always find the shortest path from the initial state to the goal? 1297 01:02:02,280 --> 01:02:06,180 And to answer that question, let's take a look at this example for a moment. 1298 01:02:06,180 --> 01:02:07,570 Take a look at this example. 1299 01:02:07,570 --> 01:02:10,170 Again, we're trying to get from A to B, and again, I've 1300 01:02:10,170 --> 01:02:13,260 labeled each of the cells with their Manhattan distance 1301 01:02:13,260 --> 01:02:16,320 from the goal, the number of squares up and to the right 1302 01:02:16,320 --> 01:02:20,500 you would need to travel in order to get from that square to the goal. 1303 01:02:20,500 --> 01:02:23,400 And let's think about, would greedy best-first search 1304 01:02:23,400 --> 01:02:29,400 that always picks the smallest number end up finding the optimal solution? 1305 01:02:29,400 --> 01:02:33,520 What is the shortest solution, and would this algorithm find it? 1306 01:02:33,520 --> 01:02:38,190 And the important thing to realize is that right here is the decision point. 1307 01:02:38,190 --> 01:02:40,710 We're estimated to be 12 away from the goal. 1308 01:02:40,710 --> 01:02:42,310 And we have two choices. 1309 01:02:42,310 --> 01:02:45,690 We can go to the left, which we estimate to be 13 away from the goal, 1310 01:02:45,690 --> 01:02:49,710 or we can go up, where we estimate it to be 11 away from the goal. 1311 01:02:49,710 --> 01:02:53,640 And between those two, greedy best-first search is going to say, 1312 01:02:53,640 --> 01:02:57,000 the 11 looks better than the 13. 1313 01:02:57,000 --> 01:02:59,540 And in doing so, greedy best-first search 1314 01:02:59,540 --> 01:03:02,820 will end up finding this path to the goal. 1315 01:03:02,820 --> 01:03:04,980 But it turns out this path is not optimal. 1316 01:03:04,980 --> 01:03:07,470 There is a way to get to the goal using fewer steps. 1317 01:03:07,470 --> 01:03:12,420 And it's actually this way, this way that ultimately involved fewer steps, 1318 01:03:12,420 --> 01:03:15,900 even though it meant at this moment choosing the worst 1319 01:03:15,900 --> 01:03:19,770 option between the two-- or what we estimated to be the worst option, based 1320 01:03:19,770 --> 01:03:21,150 on the heuristic.
1321 01:03:21,150 --> 01:03:23,910 And so this is what we mean by this is a greedy algorithm. 1322 01:03:23,910 --> 01:03:26,460 It's making the best decision, locally. 1323 01:03:26,460 --> 01:03:28,800 At this decision point, it looks like it's better 1324 01:03:28,800 --> 01:03:31,450 to go here than it is to go to the 13. 1325 01:03:31,450 --> 01:03:34,040 But in the big picture, it's not necessarily optimal, 1326 01:03:34,040 --> 01:03:37,200 that it might find a solution when in actuality there 1327 01:03:37,200 --> 01:03:40,150 was a better solution available. 1328 01:03:40,150 --> 01:03:43,200 So we would like some way to solve this problem. 1329 01:03:43,200 --> 01:03:45,840 We like the idea of this heuristic, of being 1330 01:03:45,840 --> 01:03:50,080 able to estimate the path, the distance between us and the goal, 1331 01:03:50,080 --> 01:03:52,290 and that helps us to be able to make better decisions 1332 01:03:52,290 --> 01:03:55,890 and to eliminate having to search through entire parts of the state 1333 01:03:55,890 --> 01:03:57,070 space. 1334 01:03:57,070 --> 01:04:00,160 But we would like to modify the algorithm so that we can achieve 1335 01:04:00,160 --> 01:04:02,650 optimality, so that it can be optimal. 1336 01:04:02,650 --> 01:04:04,060 And what is the way to do this? 1337 01:04:04,060 --> 01:04:05,790 What is the intuition here? 1338 01:04:05,790 --> 01:04:08,310 Well, let's take a look at this problem. 1339 01:04:08,310 --> 01:04:11,070 In this initial problem, greedy best-first search 1340 01:04:11,070 --> 01:04:14,170 found this solution here, this long path. 1341 01:04:14,170 --> 01:04:17,310 And the reason why it wasn't great is because, yes, the heuristic numbers 1342 01:04:17,310 --> 01:04:21,180 went down pretty low, but later on, and they started to build back up. 1343 01:04:21,180 --> 01:04:25,780 They built back 8, 9, 10, 11-- all the way up to 12, in this case. 1344 01:04:25,780 --> 01:04:29,290 And so how might we go about trying to improve this algorithm? 1345 01:04:29,290 --> 01:04:32,400 Well, one thing that we might realize is that, if we 1346 01:04:32,400 --> 01:04:35,170 go all the way through this algorithm, through this path, 1347 01:04:35,170 --> 01:04:39,340 and we end up going to the 12, and we've had to take this many steps-- like, 1348 01:04:39,340 --> 01:04:42,640 who knows how many steps that is-- just to get to this 12, 1349 01:04:42,640 --> 01:04:48,160 we could have also, as an alternative, taken much fewer steps, just six steps, 1350 01:04:48,160 --> 01:04:50,170 and ended up at this 13 here. 1351 01:04:50,170 --> 01:04:53,680 And yes, 13 is more than 12, so it looks like it's not as good, 1352 01:04:53,680 --> 01:04:55,460 but it required far fewer steps. 1353 01:04:55,460 --> 01:04:55,960 Right? 1354 01:04:55,960 --> 01:04:59,530 It only took six steps to get to this 13 versus many more steps 1355 01:04:59,530 --> 01:05:01,000 to get to this 12. 1356 01:05:01,000 --> 01:05:04,180 And while greedy best-first search says, oh, well, 12 is better than 13 1357 01:05:04,180 --> 01:05:07,850 so pick the 12, we might more intelligently say, 1358 01:05:07,850 --> 01:05:10,810 I'd rather be somewhere that heuristically 1359 01:05:10,810 --> 01:05:16,030 looks like it takes slightly longer if I can get there much more quickly. 1360 01:05:16,030 --> 01:05:18,940 And we're going to encode that idea, this general idea, 1361 01:05:18,940 --> 01:05:23,010 into a more formal algorithm known as A star search. 
1362 01:05:23,010 --> 01:05:25,270 A star search is going to solve this problem by, 1363 01:05:25,270 --> 01:05:27,970 instead of just considering the heuristic, 1364 01:05:27,970 --> 01:05:32,720 also considering how long it took us to get to any particular state. 1365 01:05:32,720 --> 01:05:35,650 So the distinction is greedy best-first search, if I am in a state 1366 01:05:35,650 --> 01:05:38,440 right now, the only thing I care about is 1367 01:05:38,440 --> 01:05:42,070 what is the estimated distance, the heuristic value, between me 1368 01:05:42,070 --> 01:05:43,000 and the goal. 1369 01:05:43,000 --> 01:05:45,640 Whereas A star search will take into consideration 1370 01:05:45,640 --> 01:05:46,870 two pieces of information. 1371 01:05:46,870 --> 01:05:51,130 It'll take into consideration, how far do I estimate I am from the goal, 1372 01:05:51,130 --> 01:05:55,030 but also how far did I have to travel in order to get here? 1373 01:05:55,030 --> 01:05:57,490 Because that is relevant, too. 1374 01:05:57,490 --> 01:06:00,760 So this search algorithm works by expanding the node with the lowest 1375 01:06:00,760 --> 01:06:04,060 value of g(n) plus h(n). 1376 01:06:04,060 --> 01:06:07,810 h(n) is that same heuristic that we were talking about a moment ago that's going 1377 01:06:07,810 --> 01:06:12,640 to vary based on the problem, but g(n) is going to be the cost to reach 1378 01:06:12,640 --> 01:06:13,420 the node-- 1379 01:06:13,420 --> 01:06:19,380 how many steps I had to take, in this case, to get to my current position. 1380 01:06:19,380 --> 01:06:22,070 So what does that search algorithm look like in practice? 1381 01:06:22,070 --> 01:06:23,630 Well, let's take a look. 1382 01:06:23,630 --> 01:06:25,130 Again, we've got the same maze. 1383 01:06:25,130 --> 01:06:28,010 And again, I've labeled them with their Manhattan distance. 1384 01:06:28,010 --> 01:06:32,060 This value is the h(n) value, the heuristic estimate 1385 01:06:32,060 --> 01:06:36,240 of how far each of these squares is away from the goal. 1386 01:06:36,240 --> 01:06:38,370 But now, as we begin to explore states, we 1387 01:06:38,370 --> 01:06:41,370 care not just about this heuristic value but also 1388 01:06:41,370 --> 01:06:45,650 about g(n), the number of steps I had to take in order to get there. 1389 01:06:45,650 --> 01:06:48,060 And I care about summing those two numbers together. 1390 01:06:48,060 --> 01:06:49,230 So what does that look like? 1391 01:06:49,230 --> 01:06:52,850 On this very first step, I have taken one step. 1392 01:06:52,850 --> 01:06:56,130 And now I am estimated to be 16 steps away from the goal. 1393 01:06:56,130 --> 01:06:59,190 So the total value here is 17. 1394 01:06:59,190 --> 01:07:00,280 Then I take one more step. 1395 01:07:00,280 --> 01:07:02,010 I've now taken two steps. 1396 01:07:02,010 --> 01:07:04,780 And I estimate myself to be 15 away from the goal-- 1397 01:07:04,780 --> 01:07:06,740 again, a total value of 17. 1398 01:07:06,740 --> 01:07:08,230 Now I've taken three steps. 1399 01:07:08,230 --> 01:07:11,440 And I'm estimated to be 14 away from the goal, so on and so forth. 1400 01:07:11,440 --> 01:07:13,750 Four steps, an estimate of 13. 1401 01:07:13,750 --> 01:07:15,820 Five steps, estimate of 12. 1402 01:07:15,820 --> 01:07:17,980 And now, here's a decision point.
1403 01:07:17,980 --> 01:07:22,720 I could either have taken six steps with a heuristic of 13 1404 01:07:22,720 --> 01:07:26,440 for a total of 19, or I could have taken six steps 1405 01:07:26,440 --> 01:07:31,640 with a heuristic of 11 for a total estimate of 17. 1406 01:07:31,640 --> 01:07:35,230 So between 19 and 17, I'd rather take the 17-- 1407 01:07:35,230 --> 01:07:37,170 the 6 plus 11. 1408 01:07:37,170 --> 01:07:39,170 So so far, no different than what we saw before. 1409 01:07:39,170 --> 01:07:42,160 We're still taking this option because it appears to be better. 1410 01:07:42,160 --> 01:07:45,160 And I keep taking this option because it appears to be better. 1411 01:07:45,160 --> 01:07:49,430 But it's right about here that things get a little bit different. 1412 01:07:49,430 --> 01:07:55,630 Now I could be 15 steps in, with an estimated distance of 6 from the goal. 1413 01:07:55,630 --> 01:07:58,750 So 15 plus 6, total value of 21. 1414 01:07:58,750 --> 01:08:01,870 Alternatively, I could be six steps in-- 1415 01:08:01,870 --> 01:08:04,690 because that was five steps, so this is six steps-- 1416 01:08:04,690 --> 01:08:07,350 with a heuristic estimate of 13. 1417 01:08:07,350 --> 01:08:08,650 So 6 plus 13-- 1418 01:08:08,650 --> 01:08:10,180 that's 19. 1419 01:08:10,180 --> 01:08:14,140 So here we would evaluate g(n) plus h(n) to be 19-- 1420 01:08:14,140 --> 01:08:20,440 6 plus 13-- whereas here, we would be 15 plus 6, or 21. 1421 01:08:20,440 --> 01:08:23,780 And so the intuition is, 19 less than 21, pick here. 1422 01:08:23,780 --> 01:08:29,190 But the idea is ultimately I'd rather have taken fewer steps to get to a 13 1423 01:08:29,190 --> 01:08:32,610 than have taken 15 steps and be at a 6, 1424 01:08:32,610 --> 01:08:35,410 because it means I've had to take more steps in order to get there. 1425 01:08:35,410 --> 01:08:38,520 Maybe there's a better path this way. 1426 01:08:38,520 --> 01:08:41,050 So instead we'll explore this route. 1427 01:08:41,050 --> 01:08:43,840 Now if we go one more-- this is seven steps plus 14, 1428 01:08:43,840 --> 01:08:46,840 is 21, so between those two it's sort of a toss up. 1429 01:08:46,840 --> 01:08:48,970 We might end up exploring that one anyways. 1430 01:08:48,970 --> 01:08:53,270 But after that, as the totals along that other path start to get bigger 1431 01:08:53,270 --> 01:08:55,600 and the heuristic values down this path start to get smaller, 1432 01:08:55,600 --> 01:08:59,100 you'll find that we'll actually keep exploring down this path. 1433 01:08:59,100 --> 01:09:02,290 And you can do the math to see that at every decision point, 1434 01:09:02,290 --> 01:09:06,810 A star search is going to make a choice based on the sum of how many steps 1435 01:09:06,810 --> 01:09:09,700 it took me to get to my current position and then 1436 01:09:09,700 --> 01:09:13,180 how far I estimate I am from the goal. 1437 01:09:13,180 --> 01:09:15,760 So while we did have to explore some of these states, 1438 01:09:15,760 --> 01:09:20,470 the ultimate solution we found was, in fact, an optimal solution. 1439 01:09:20,470 --> 01:09:24,790 It did find us the quickest possible way to get from the initial state 1440 01:09:24,790 --> 01:09:25,800 to the goal. 1441 01:09:25,800 --> 01:09:29,800 And it turns out that A* is an optimal search algorithm under certain 1442 01:09:29,800 --> 01:09:31,390 conditions.
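To make that bookkeeping concrete before turning to those conditions, here is a minimal Python sketch of how the A* frontier might be managed. The neighbors and heuristic functions and the per-step costs are illustrative assumptions, not the course's distribution code; dropping the new_g term from the priority would turn this back into greedy best-first search.

import heapq

def a_star(start, goal, neighbors, heuristic):
    # A* search: always expand the node with the smallest g(n) + h(n).
    counter = 0                       # tie-breaker so the heap never compares states
    frontier = [(heuristic(start), counter, start, 0, None)]
    came_from = {}                    # expanded state -> the state we reached it from
    best_g = {start: 0}               # cheapest cost found so far to reach each state

    while frontier:
        _, _, state, g, parent = heapq.heappop(frontier)
        if state in came_from:
            continue                  # already expanded along a cheaper path
        came_from[state] = parent
        if state == goal:             # rebuild the path by walking parents back to start
            path = [state]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return list(reversed(path))
        for next_state, step_cost in neighbors(state):
            new_g = g + step_cost
            if new_g < best_g.get(next_state, float("inf")):
                best_g[next_state] = new_g
                counter += 1
                heapq.heappush(frontier, (new_g + heuristic(next_state),
                                          counter, next_state, new_g, state))
    return None                       # no path from start to goal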
1443 01:09:31,390 --> 01:09:35,860 So the conditions are h of n, my heuristic, needs to be admissible. 1444 01:09:35,860 --> 01:09:37,990 What does it mean for a heuristic to be admissible? 1445 01:09:37,990 --> 01:09:42,700 Well, a heuristic is admissible if it never overestimates the true cost. 1446 01:09:42,700 --> 01:09:46,260 h of n always needs to either get it exactly right 1447 01:09:46,260 --> 01:09:50,520 in terms of how far away I am, or it needs to underestimate. 1448 01:09:50,520 --> 01:09:54,600 So we saw an example from before where the heuristic value was much smaller 1449 01:09:54,600 --> 01:09:56,350 than the actual cost it would take. 1450 01:09:56,350 --> 01:09:57,710 That's totally fine. 1451 01:09:57,710 --> 01:10:00,100 But the heuristic value should never overestimate. 1452 01:10:00,100 --> 01:10:04,520 It should never think that I'm further away from the goal than I actually am. 1453 01:10:04,520 --> 01:10:07,630 And meanwhile, to make a stronger statement, h of n 1454 01:10:07,630 --> 01:10:09,880 also needs to be consistent. 1455 01:10:09,880 --> 01:10:11,800 And what does it mean for it to be consistent? 1456 01:10:11,800 --> 01:10:14,500 Mathematically, it means that for every node, which 1457 01:10:14,500 --> 01:10:19,120 we'll call n, and successor, the node after me, that I'll call n prime, 1458 01:10:19,120 --> 01:10:24,220 where it takes a cost of c to make that step, the heuristic value of n 1459 01:10:24,220 --> 01:10:26,560 needs to be less than or equal to the heuristic 1460 01:10:26,560 --> 01:10:29,060 value of n prime plus the cost. 1461 01:10:29,060 --> 01:10:31,360 So it's a lot of math, but in words, what it ultimately 1462 01:10:31,360 --> 01:10:34,870 means is that if I am here at this state right now, 1463 01:10:34,870 --> 01:10:39,250 the heuristic value from me to the goal shouldn't be more than the heuristic 1464 01:10:39,250 --> 01:10:44,080 value of my successor, the next place I could go to, plus however much 1465 01:10:44,080 --> 01:10:48,560 it would cost me to just make that step, from one step to the next step. 1466 01:10:48,560 --> 01:10:52,480 And so this is just making sure that my heuristic is consistent between all 1467 01:10:52,480 --> 01:10:54,010 of these steps that I might take. 1468 01:10:54,010 --> 01:10:58,540 So as long as this is true, then A* search is going to find me an optimal 1469 01:10:58,540 --> 01:10:59,470 solution. 1470 01:10:59,470 --> 01:11:02,770 And this is where much of the challenge of solving these search problems can 1471 01:11:02,770 --> 01:11:05,980 sometimes come in, that A* search is an algorithm that is known, 1472 01:11:05,980 --> 01:11:07,960 and you could write the code fairly easily. 1473 01:11:07,960 --> 01:11:11,260 But it's choosing the heuristic that can be the interesting challenge. 1474 01:11:11,260 --> 01:11:13,540 The better the heuristic is, the better I'll 1475 01:11:13,540 --> 01:11:16,870 be able to solve the problem, and the fewer states that I'll have to explore. 1476 01:11:16,870 --> 01:11:20,200 And I need to make sure that the heuristic satisfies 1477 01:11:20,200 --> 01:11:22,540 these particular constraints. 1478 01:11:22,540 --> 01:11:25,900 So all in all, these are some of the examples of search algorithms 1479 01:11:25,900 --> 01:11:26,650 that might work. 1480 01:11:26,650 --> 01:11:29,080 And certainly, there are many more than just this.
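In symbols, the two conditions on the heuristic from a moment ago can be summarized as follows, writing h*(n) for the true cost from n to the goal and c(n, a, n') for the cost of the step from n to a successor n' (this notation is just shorthand for this recap):

admissible:  $h(n) \le h^{*}(n)$  for every node $n$
consistent:  $h(n) \le c(n, a, n') + h(n')$  for every node $n$ and successor $n'$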
1481 01:11:29,080 --> 01:11:32,590 A*, for example, does have a tendency to use quite a bit of memory, 1482 01:11:32,590 --> 01:11:37,570 so there are alternative approaches to A* that ultimately use less memory than 1483 01:11:37,570 --> 01:11:40,200 this version of A* happens to use. 1484 01:11:40,200 --> 01:11:43,660 And there are other search algorithms that are optimized for other cases 1485 01:11:43,660 --> 01:11:45,560 as well. 1486 01:11:45,560 --> 01:11:48,460 But now, so far, we've only been looking at search algorithms 1487 01:11:48,460 --> 01:11:50,860 where there's one agent. 1488 01:11:50,860 --> 01:11:53,470 I am trying to find a solution to a problem. 1489 01:11:53,470 --> 01:11:55,900 I am trying to navigate my way through a maze. 1490 01:11:55,900 --> 01:11:57,880 I am trying to solve a 15 puzzle. 1491 01:11:57,880 --> 01:12:02,190 I am trying to find driving directions from point A to point B. 1492 01:12:02,190 --> 01:12:04,420 Sometimes in search situations, though, we'll 1493 01:12:04,420 --> 01:12:07,600 enter an adversarial situation where I am 1494 01:12:07,600 --> 01:12:10,010 an agent trying to make intelligent decisions, 1495 01:12:10,010 --> 01:12:13,090 and there is someone else who is fighting against me, so to speak, 1496 01:12:13,090 --> 01:12:16,570 that has an opposite objective, someone where I am trying to succeed, 1497 01:12:16,570 --> 01:12:19,180 someone else that wants me to fail. 1498 01:12:19,180 --> 01:12:23,680 And this is most popular in something like a game, a game like tic-tac-toe, 1499 01:12:23,680 --> 01:12:26,350 where we've got this 3-by-3 grid, and X and O 1500 01:12:26,350 --> 01:12:30,130 take turns either writing an X or an O in any one of these squares. 1501 01:12:30,130 --> 01:12:33,610 And the goal is to get three X's in a row, if you're the X player, 1502 01:12:33,610 --> 01:12:36,610 or three O's in a row, if you're the O player. 1503 01:12:36,610 --> 01:12:40,390 And computers have gotten quite good at playing games, tic-tac-toe very easily, 1504 01:12:40,390 --> 01:12:42,370 but even more complex games. 1505 01:12:42,370 --> 01:12:46,570 And so you might imagine, what does an intelligent decision in a game look 1506 01:12:46,570 --> 01:12:47,290 like? 1507 01:12:47,290 --> 01:12:51,130 So maybe X makes an initial move in the middle, and O plays up here. 1508 01:12:51,130 --> 01:12:54,340 What does an intelligent move for X now become? 1509 01:12:54,340 --> 01:12:56,390 Where should you move if you were X? 1510 01:12:56,390 --> 01:12:58,670 And it turns out there are a couple of possibilities. 1511 01:12:58,670 --> 01:13:01,120 But if an AI is playing this game optimally, 1512 01:13:01,120 --> 01:13:04,180 then the AI might play somewhere like the upper right, where 1513 01:13:04,180 --> 01:13:07,800 in this situation, O has the opposite objective of X. 1514 01:13:07,800 --> 01:13:11,770 X is trying to win the game, to get three in a row diagonally here, 1515 01:13:11,770 --> 01:13:15,410 and O is trying to stop that objective, opposite of the objective. 1516 01:13:15,410 --> 01:13:17,860 And so O is going to place here, to try to block. 1517 01:13:17,860 --> 01:13:20,260 But now, X has a pretty clever move. 1518 01:13:20,260 --> 01:13:25,180 X can make a move, like this where now X has two possible ways that X 1519 01:13:25,180 --> 01:13:26,080 can win the game. 
1520 01:13:26,080 --> 01:13:29,050 X could win the game by getting three in a row across here, 1521 01:13:29,050 --> 01:13:32,380 or X could win the game by getting three in a row vertically this way. 1522 01:13:32,380 --> 01:13:34,610 So it doesn't matter where O makes their next move. 1523 01:13:34,610 --> 01:13:38,170 O could play here, for example, blocking the three in a row horizontally, 1524 01:13:38,170 --> 01:13:43,200 but then X is going to win the game by getting a three in a row vertically. 1525 01:13:43,200 --> 01:13:45,010 And so there's a fair amount of reasoning 1526 01:13:45,010 --> 01:13:48,220 that's going on here in order for the computer to be able to solve a problem. 1527 01:13:48,220 --> 01:13:51,520 And it's similar in spirit to the problems we've looked at so far. 1528 01:13:51,520 --> 01:13:54,580 There are actions, there's some sort of state of the board, 1529 01:13:54,580 --> 01:13:57,160 and some transition from one action to the next, 1530 01:13:57,160 --> 01:14:00,730 but it's different in the sense that this is now not just a classical search 1531 01:14:00,730 --> 01:14:04,900 problem, but an adversarial search problem, that I am the X player, 1532 01:14:04,900 --> 01:14:07,090 trying to find the best moves to make, but I 1533 01:14:07,090 --> 01:14:10,640 know that there is some adversary that is trying to stop me. 1534 01:14:10,640 --> 01:14:14,800 So we need some sort of algorithm to deal with these adversarial type 1535 01:14:14,800 --> 01:14:16,440 of search situations. 1536 01:14:16,440 --> 01:14:18,400 And the algorithm we're going to take a look at 1537 01:14:18,400 --> 01:14:20,860 is an algorithm called Minimax, which works 1538 01:14:20,860 --> 01:14:24,820 very well for these deterministic games, where there are two players. 1539 01:14:24,820 --> 01:14:28,120 It can work for other types of games as well, but we'll look right now at games 1540 01:14:28,120 --> 01:14:31,990 where I make a move, that my opponent makes a move, and I am trying to win, 1541 01:14:31,990 --> 01:14:34,270 and my opponent is trying to win, also. 1542 01:14:34,270 --> 01:14:38,100 Or in other words, my opponent is trying to get me to lose. 1543 01:14:38,100 --> 01:14:40,900 And so what do we need in order to make this algorithm work? 1544 01:14:40,900 --> 01:14:44,800 Well, anytime we try and translate this human concept, of playing a game, 1545 01:14:44,800 --> 01:14:47,380 winning, and losing, to a computer, we want 1546 01:14:47,380 --> 01:14:50,230 to translate it in terms that the computer can understand. 1547 01:14:50,230 --> 01:14:53,740 And ultimately, the computer really just understands numbers. 1548 01:14:53,740 --> 01:14:56,890 And so we want some way of translating a game of X's and O's 1549 01:14:56,890 --> 01:15:00,490 on a grid to something numerical, something the computer can understand. 1550 01:15:00,490 --> 01:15:04,300 The computer doesn't normally understand notions of win or lose, 1551 01:15:04,300 --> 01:15:08,360 but it does understand the concept of bigger and smaller. 1552 01:15:08,360 --> 01:15:12,100 And so what we might yet do is, we might take each of the possible ways 1553 01:15:12,100 --> 01:15:17,110 that a tic-tac-toe game can unfold and assign a value, or a utility, 1554 01:15:17,110 --> 01:15:19,060 to each one of those possible ways. 1555 01:15:19,060 --> 01:15:21,790 And in a tic-tac-toe game, and in many types of games, 1556 01:15:21,790 --> 01:15:23,800 there are three possible outcomes. 
1557 01:15:23,800 --> 01:15:28,210 The outcomes are, O wins, X wins, or nobody wins. 1558 01:15:28,210 --> 01:15:32,440 So player one wins, player two wins, or nobody wins. 1559 01:15:32,440 --> 01:15:36,670 And for now, let's go ahead and assign each of these possible outcomes 1560 01:15:36,670 --> 01:15:37,870 a different value. 1561 01:15:37,870 --> 01:15:39,100 We'll say O winning-- 1562 01:15:39,100 --> 01:15:41,230 that'll have a value of negative 1. 1563 01:15:41,230 --> 01:15:43,630 Nobody winning-- that'll have a value of 0. 1564 01:15:43,630 --> 01:15:46,700 And X winning-- that will have a value of 1. 1565 01:15:46,700 --> 01:15:50,860 So we've just assigned numbers to each of these three possible outcomes. 1566 01:15:50,860 --> 01:15:53,480 And now, we have two players. 1567 01:15:53,480 --> 01:15:56,260 We have the X player and the O player. 1568 01:15:56,260 --> 01:16:00,290 And we're going to go ahead and call the X player the max player. 1569 01:16:00,290 --> 01:16:03,140 And we'll call the O player the min player. 1570 01:16:03,140 --> 01:16:05,920 And the reason why is because in the Minimax algorithm, 1571 01:16:05,920 --> 01:16:11,380 the max player, which in this case is X, is aiming to maximize the score. 1572 01:16:11,380 --> 01:16:14,660 These are the possible options for the score, negative 1, 0, and 1. 1573 01:16:14,660 --> 01:16:18,400 X wants to maximize the score, meaning if at all possible, 1574 01:16:18,400 --> 01:16:21,980 X would like this situation where X wins the game. 1575 01:16:21,980 --> 01:16:23,590 And we give it a score of 1. 1576 01:16:23,590 --> 01:16:27,010 But if this isn't possible, if X needs to choose between these two 1577 01:16:27,010 --> 01:16:31,930 options, negative 1 meaning O winning, or 0 meaning nobody winning, 1578 01:16:31,930 --> 01:16:37,060 X would rather that nobody wins, score of 0, than a score of negative 1, 1579 01:16:37,060 --> 01:16:38,360 O winning. 1580 01:16:38,360 --> 01:16:41,080 So this notion of winning and losing in time 1581 01:16:41,080 --> 01:16:45,410 has been reduced mathematically to just this idea of, try and maximize 1582 01:16:45,410 --> 01:16:46,070 the score. 1583 01:16:46,070 --> 01:16:49,920 The X player always wants the score to be bigger. 1584 01:16:49,920 --> 01:16:52,880 And on the flip side, the min player, in this case, O, 1585 01:16:52,880 --> 01:16:54,620 is aiming to minimize the score. 1586 01:16:54,620 --> 01:16:59,510 The O player wants the score to be as small as possible. 1587 01:16:59,510 --> 01:17:02,820 So now we've taken this game of X's and O's and winning and losing 1588 01:17:02,820 --> 01:17:04,990 and turned it into something mathematical, something 1589 01:17:04,990 --> 01:17:09,560 where X is trying to maximize the score, O is trying to minimize the score. 1590 01:17:09,560 --> 01:17:11,590 Let's now look at all of the parts of the game 1591 01:17:11,590 --> 01:17:14,650 that we need in order to encode it in an AI 1592 01:17:14,650 --> 01:17:18,730 so that an AI can play a game like tic-tac-toe. 1593 01:17:18,730 --> 01:17:20,770 So the game is going to need a couple of things. 1594 01:17:20,770 --> 01:17:23,350 We'll need some sort of initial state, that we'll in this case 1595 01:17:23,350 --> 01:17:27,520 call S0, which is how the game begins, like an empty tic-tac-toe board, 1596 01:17:27,520 --> 01:17:28,750 for example. 
1597 01:17:28,750 --> 01:17:32,410 We'll also need a function called player, 1598 01:17:32,410 --> 01:17:37,030 where the player function is going to take as input a state, here represented 1599 01:17:37,030 --> 01:17:41,050 by S. And the output of the player function is going to be, 1600 01:17:41,050 --> 01:17:43,450 which player's turn is it? 1601 01:17:43,450 --> 01:17:46,360 We need to be able to give a tic-tac-toe board to the computer, 1602 01:17:46,360 --> 01:17:50,440 run it through a function, and that function tells us whose turn it is. 1603 01:17:50,440 --> 01:17:52,870 We'll need some notion of actions that we can take. 1604 01:17:52,870 --> 01:17:54,980 We'll see examples of that in just a moment. 1605 01:17:54,980 --> 01:17:57,910 We need some notion of a transition model-- same as before. 1606 01:17:57,910 --> 01:18:00,160 If I have a state, and I take an action, I 1607 01:18:00,160 --> 01:18:03,040 need to know what results as a consequence of it. 1608 01:18:03,040 --> 01:18:05,810 I need some way of knowing when the game is over. 1609 01:18:05,810 --> 01:18:07,900 So this is equivalent to kind of like a goal test, 1610 01:18:07,900 --> 01:18:10,330 but I need some terminal test, some way to check 1611 01:18:10,330 --> 01:18:14,230 to see if a state is a terminal state, where a terminal state means 1612 01:18:14,230 --> 01:18:15,280 the game is over. 1613 01:18:15,280 --> 01:18:20,050 In the classic game of tic-tac-toe , a terminal state means either someone has 1614 01:18:20,050 --> 01:18:23,380 gotten three in a row, or all of the squares of the tic-tac-toe board are 1615 01:18:23,380 --> 01:18:24,040 filled. 1616 01:18:24,040 --> 01:18:26,610 Either of those conditions make it a terminal state. 1617 01:18:26,610 --> 01:18:28,570 In a game of chess, it might be something like, 1618 01:18:28,570 --> 01:18:31,660 when there is checkmate, or if checkmate is no longer possible, 1619 01:18:31,660 --> 01:18:34,370 that becomes a terminal state. 1620 01:18:34,370 --> 01:18:38,410 And then finally we'll need a utility function, a function that takes a state 1621 01:18:38,410 --> 01:18:41,890 and gives us a numerical value for that terminal state, some way of saying, 1622 01:18:41,890 --> 01:18:44,530 if X wins the game, that has a value of 1. 1623 01:18:44,530 --> 01:18:47,050 If O has won the game, that has the value of negative 1. 1624 01:18:47,050 --> 01:18:50,350 If nobody has won the game, that has a value of 0. 1625 01:18:50,350 --> 01:18:52,790 So let's take a look at each of these in turn. 1626 01:18:52,790 --> 01:18:57,070 The initial state, we can just represent in tic-tac-toe as the empty game board. 1627 01:18:57,070 --> 01:18:58,460 This is where we begin. 1628 01:18:58,460 --> 01:19:01,030 It's the place from which we begin this search. 1629 01:19:01,030 --> 01:19:03,540 And again, I'll be representing these things visually. 1630 01:19:03,540 --> 01:19:05,410 But you can imagine this really just being 1631 01:19:05,410 --> 01:19:10,120 an array, or a two-dimensional array, of all of these possible squares. 1632 01:19:10,120 --> 01:19:13,510 Then we need the player function that, again, takes a state 1633 01:19:13,510 --> 01:19:15,250 and tells us whose turn it is. 
1634 01:19:15,250 --> 01:19:18,670 Assuming X makes the first move, if I have an empty game board, 1635 01:19:18,670 --> 01:19:21,510 then my player function is going to return X. 1636 01:19:21,510 --> 01:19:25,120 And if I have a game board where X has made a move, then my player function is 1637 01:19:25,120 --> 01:19:28,840 going to return O. The player function takes a tic-tac-toe game board 1638 01:19:28,840 --> 01:19:32,170 and tells us whose turn it is. 1639 01:19:32,170 --> 01:19:34,870 Next up, we'll consider the actions function. 1640 01:19:34,870 --> 01:19:39,040 The actions function, much like it did in classical search, takes a state 1641 01:19:39,040 --> 01:19:41,860 and gives us the set of all of the possible actions 1642 01:19:41,860 --> 01:19:44,390 we can take in that state. 1643 01:19:44,390 --> 01:19:49,360 So let's imagine it's O's turn to move in a game board that looks like this. 1644 01:19:49,360 --> 01:19:52,120 What happens when we pass it into the actions function? 1645 01:19:52,120 --> 01:19:55,990 So the actions function takes this state of the game as input, 1646 01:19:55,990 --> 01:19:59,860 and the output is a set of possible actions. It's the set: 1647 01:19:59,860 --> 01:20:03,580 I could move in the upper left, or I could move in the bottom middle. 1648 01:20:03,580 --> 01:20:06,490 Those are the two possible action choices that I have 1649 01:20:06,490 --> 01:20:10,190 when I begin in this particular state. 1650 01:20:10,190 --> 01:20:13,100 Now, just as before, when we have states and actions, 1651 01:20:13,100 --> 01:20:15,470 we need some sort of transition model to tell us, 1652 01:20:15,470 --> 01:20:19,500 when we take this action in the state, what is the new state that we get? 1653 01:20:19,500 --> 01:20:22,430 And here, we define that using the result function that takes 1654 01:20:22,430 --> 01:20:25,460 a state as input, as well as an action. 1655 01:20:25,460 --> 01:20:28,490 And when we apply the result function to this state, 1656 01:20:28,490 --> 01:20:33,110 saying, let's let O move in this upper left corner, the new state we get 1657 01:20:33,110 --> 01:20:35,840 is this resulting state, where O is in the upper-left corner. 1658 01:20:35,840 --> 01:20:38,880 And now, this seems obvious to someone who knows how to play tic-tac-toe. 1659 01:20:38,880 --> 01:20:40,830 Of course, you play in the upper left corner-- 1660 01:20:40,830 --> 01:20:41,870 that's the board you get. 1661 01:20:41,870 --> 01:20:45,060 But all of this information needs to be encoded into the AI. 1662 01:20:45,060 --> 01:20:47,570 The AI doesn't know how to play tic-tac-toe 1663 01:20:47,570 --> 01:20:51,170 until you tell the AI how the rules of tic-tac-toe work. 1664 01:20:51,170 --> 01:20:53,630 And this result function, defined here, 1665 01:20:53,630 --> 01:20:57,050 allows us to tell the AI how this game actually works 1666 01:20:57,050 --> 01:21:01,200 and how actions actually affect the outcome of the game. 1667 01:21:01,200 --> 01:21:03,590 So the AI needs to know how the game works. 1668 01:21:03,590 --> 01:21:06,080 The AI also needs to know when the game is over. 1669 01:21:06,080 --> 01:21:09,320 We do that by defining a function called terminal that takes as input 1670 01:21:09,320 --> 01:21:13,200 a state S, such that if we take a game that is not yet over, 1671 01:21:13,200 --> 01:21:16,130 pass it into the terminal function, the output is false. 1672 01:21:16,130 --> 01:21:17,540 The game is not over.
1673 01:21:17,540 --> 01:21:20,690 But if we take a game that is over, because X has gotten three 1674 01:21:20,690 --> 01:21:24,360 in a row along that diagonal, pass that into the terminal function, 1675 01:21:24,360 --> 01:21:28,920 then the output is going to be true, because the game now is, in fact, over. 1676 01:21:28,920 --> 01:21:31,930 And finally, we've told the AI how the game works 1677 01:21:31,930 --> 01:21:35,180 in terms of what moves can be made and what happens when you make those moves. 1678 01:21:35,180 --> 01:21:37,290 We've told the AI when the game is over. 1679 01:21:37,290 --> 01:21:41,270 Now we need to tell the AI what the value of each of those states is. 1680 01:21:41,270 --> 01:21:45,170 And we do that by defining this utility function, that takes a state, S, 1681 01:21:45,170 --> 01:21:48,840 and tells us the score or the utility of that state. 1682 01:21:48,840 --> 01:21:52,760 So again, we said that if X wins the game, that utility is a value of 1, 1683 01:21:52,760 --> 01:21:57,350 whereas if O wins the game, then the utility of that is negative 1. 1684 01:21:57,350 --> 01:22:00,200 And the AI needs to know, for each of these terminal states 1685 01:22:00,200 --> 01:22:04,800 where the game is over, what is the utility of that state? 1686 01:22:04,800 --> 01:22:08,390 So I can give you a game board like this, where the game is, in fact, over, 1687 01:22:08,390 --> 01:22:12,710 and I ask the AI to tell me what the value of that state is, it could do so. 1688 01:22:12,710 --> 01:22:15,830 The value of the state is 1. 1689 01:22:15,830 --> 01:22:20,240 Where things get interesting, though, is if the game is not yet over. 1690 01:22:20,240 --> 01:22:21,900 Let's imagine a game board like this. 1691 01:22:21,900 --> 01:22:23,330 We're in the middle of the game. 1692 01:22:23,330 --> 01:22:25,850 It's O's turn to make a move. 1693 01:22:25,850 --> 01:22:27,980 So how do we know it's O's turn to make a move? 1694 01:22:27,980 --> 01:22:30,110 We can calculate that, using the player function. 1695 01:22:30,110 --> 01:22:33,260 We can say, player of S, pass in the state. 1696 01:22:33,260 --> 01:22:36,240 O is the answer, so we know it's O's turn to move. 1697 01:22:36,240 --> 01:22:40,790 And now, what is the value of this board, and what action should O take? 1698 01:22:40,790 --> 01:22:41,960 Well that's going to depend. 1699 01:22:41,960 --> 01:22:43,760 We have to do some calculation here. 1700 01:22:43,760 --> 01:22:47,450 And this is where the Minimax algorithm really comes in. 1701 01:22:47,450 --> 01:22:50,840 Recall that X is trying to maximize the score, which means 1702 01:22:50,840 --> 01:22:53,720 that O is trying to minimize the score. 1703 01:22:53,720 --> 01:22:58,870 O would like to minimize the total value that we get at the end of the game. 1704 01:22:58,870 --> 01:23:01,310 And because this game isn't over yet, we don't really 1705 01:23:01,310 --> 01:23:04,820 know just yet what the value of this game board is. 1706 01:23:04,820 --> 01:23:08,290 We have to do some calculation in order to figure that out. 1707 01:23:08,290 --> 01:23:10,430 So how do we do that kind of calculation? 1708 01:23:10,430 --> 01:23:13,040 Well, in order to do so, we're going to consider, 1709 01:23:13,040 --> 01:23:15,680 just as we might in a classical search situation, 1710 01:23:15,680 --> 01:23:20,160 what actions could happen next, and what states will that take us to? 
1711 01:23:20,160 --> 01:23:22,070 And it turns out that in this position, there 1712 01:23:22,070 --> 01:23:26,030 are only two open squares, which means there are only two open places where 1713 01:23:26,030 --> 01:23:28,640 O can make a move. 1714 01:23:28,640 --> 01:23:31,080 O could either make a move in the upper left, 1715 01:23:31,080 --> 01:23:34,160 or O can make a move in the bottom middle. 1716 01:23:34,160 --> 01:23:36,980 And Minimax doesn't know right out of the box which of those moves 1717 01:23:36,980 --> 01:23:40,520 is going to be better, so it's going to consider both. 1718 01:23:40,520 --> 01:23:42,320 But now we run into the same situation. 1719 01:23:42,320 --> 01:23:45,170 Now I have two more game boards, neither of which is over. 1720 01:23:45,170 --> 01:23:46,660 What happens next? 1721 01:23:46,660 --> 01:23:48,410 And now it's in this sense that Minimax is 1722 01:23:48,410 --> 01:23:50,660 what we'll call a recursive algorithm. 1723 01:23:50,660 --> 01:23:54,800 It's going to now repeat the exact same process, although now 1724 01:23:54,800 --> 01:23:57,640 considering it from the opposite perspective. 1725 01:23:57,640 --> 01:24:01,280 It's as if I am now going to put myself-- if I am the O player, 1726 01:24:01,280 --> 01:24:05,540 I'm going to put myself in my opponent's shoes, my opponent as the X player, 1727 01:24:05,540 --> 01:24:10,010 and consider, what would my opponent do if they were in this position? 1728 01:24:10,010 --> 01:24:14,090 What would my opponent do, the X player, if they were in that position? 1729 01:24:14,090 --> 01:24:15,240 And what would then happen? 1730 01:24:15,240 --> 01:24:18,260 Well, the other player, my opponent, the X player, 1731 01:24:18,260 --> 01:24:21,200 is trying to maximize the score, whereas I am trying 1732 01:24:21,200 --> 01:24:23,240 to minimize the score as the O player. 1733 01:24:23,240 --> 01:24:27,550 So X is trying to find the maximum possible value that they can get. 1734 01:24:27,550 --> 01:24:29,390 And so what's going to happen? 1735 01:24:29,390 --> 01:24:32,780 Well, from this board position, X only has one choice. 1736 01:24:32,780 --> 01:24:35,600 X is going to play here, and they're going to get three in a row. 1737 01:24:35,600 --> 01:24:37,910 And we know that that board, X winning-- 1738 01:24:37,910 --> 01:24:39,530 that has a value of 1. 1739 01:24:39,530 --> 01:24:43,220 If X wins the game, the value of that game board is 1. 1740 01:24:43,220 --> 01:24:48,680 And so from this position, if this state can only ever lead to this state, 1741 01:24:48,680 --> 01:24:52,640 it's the only possible option, and this state has a value of 1, 1742 01:24:52,640 --> 01:24:57,230 then the maximum possible value that the X player can get from this game board 1743 01:24:57,230 --> 01:24:58,890 is also 1 from here. 1744 01:24:58,890 --> 01:25:01,670 The only place we can get is to a game with the value of 1, 1745 01:25:01,670 --> 01:25:05,260 so this game board also has a value of 1. 1746 01:25:05,260 --> 01:25:07,640 Now we consider this one over here. 1747 01:25:07,640 --> 01:25:08,860 What's going to happen now? 1748 01:25:08,860 --> 01:25:10,460 Well, X needs to make a move. 1749 01:25:10,460 --> 01:25:13,510 The only move X can make is in the upper left, so X will go there. 1750 01:25:13,510 --> 01:25:15,350 And in this game, no one wins the game. 1751 01:25:15,350 --> 01:25:16,940 Nobody has three in a row. 1752 01:25:16,940 --> 01:25:19,600 So the value of that game board is 0. 
1753 01:25:19,600 --> 01:25:20,890 Nobody's won. 1754 01:25:20,890 --> 01:25:24,880 And so again, by the same logic, if from this board position, the only place 1755 01:25:24,880 --> 01:25:27,760 we can get to is a board where the value is 0, 1756 01:25:27,760 --> 01:25:31,290 then this state must also have a value of 0. 1757 01:25:31,290 --> 01:25:35,320 And now here comes the choice part, the idea of trying to minimize. 1758 01:25:35,320 --> 01:25:38,970 I, as the O player, now know that if I make this choice, 1759 01:25:38,970 --> 01:25:43,140 moving in the upper left, that is going to result in a game with a value of 1, 1760 01:25:43,140 --> 01:25:45,240 assuming everyone plays optimally. 1761 01:25:45,240 --> 01:25:47,040 And if I instead play in the lower middle, 1762 01:25:47,040 --> 01:25:50,250 choose this fork in the road, that is going to result in a game board 1763 01:25:50,250 --> 01:25:51,510 with a value of 0. 1764 01:25:51,510 --> 01:25:52,750 I have two options. 1765 01:25:52,750 --> 01:25:56,380 I have a 1 and a 0 to choose from, and I need to pick. 1766 01:25:56,380 --> 01:25:59,520 And as the min player, I would rather choose the option 1767 01:25:59,520 --> 01:26:00,940 with the minimum value. 1768 01:26:00,940 --> 01:26:03,090 So whenever a player has multiple choices, 1769 01:26:03,090 --> 01:26:06,030 the min player will choose the option with the smallest value. 1770 01:26:06,030 --> 01:26:08,700 The max player will choose the option with the largest value. 1771 01:26:08,700 --> 01:26:11,470 Between the 1 and the 0, the 0 is smaller, 1772 01:26:11,470 --> 01:26:14,740 meaning I'd rather tie the game than lose the game. 1773 01:26:14,740 --> 01:26:18,090 And so this game board, we'll say, also has a value of 0, 1774 01:26:18,090 --> 01:26:22,290 because if I am playing optimally, I will pick this fork in the road. 1775 01:26:22,290 --> 01:26:25,290 I'll place my O here to block X's three in a row. 1776 01:26:25,290 --> 01:26:28,080 X will move in the upper left, and the game will be over, 1777 01:26:28,080 --> 01:26:30,180 and no one will have won the game. 1778 01:26:30,180 --> 01:26:34,260 So this is now the logic of Minimax, to consider all of the possible options 1779 01:26:34,260 --> 01:26:37,140 that I can take, all of the actions that I can take, 1780 01:26:37,140 --> 01:26:39,390 and then to put myself in my opponent's shoes. 1781 01:26:39,390 --> 01:26:42,930 I decide what move I'm going to make now by considering what move 1782 01:26:42,930 --> 01:26:44,880 my opponent will make on the next turn. 1783 01:26:44,880 --> 01:26:48,230 And to do that, I consider what move I would make on the turn after that, 1784 01:26:48,230 --> 01:26:52,560 so on and so forth, until I get all the way down to the end of the game, 1785 01:26:52,560 --> 01:26:55,050 to one of these so-called terminal states. 1786 01:26:55,050 --> 01:26:57,480 In fact, this very decision point, where I 1787 01:26:57,480 --> 01:27:00,630 am trying to decide, as the O player, what move to make, 1788 01:27:00,630 --> 01:27:04,770 might have just been a part of the logic that the X player, my opponent, 1789 01:27:04,770 --> 01:27:06,360 was using the move before me. 1790 01:27:06,360 --> 01:27:09,150 This might be part of some larger tree where 1791 01:27:09,150 --> 01:27:11,580 X is trying to make a move in this situation 1792 01:27:11,580 --> 01:27:13,830 and needs to pick between three different options 1793 01:27:13,830 --> 01:27:16,440 in order to make a decision about what should happen.
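Before going deeper into that tree, here is a rough Python sketch of the game-specific pieces enumerated earlier (initial state, player, actions, result, terminal, and utility) for tic-tac-toe. The board representation, a list of three rows whose cells hold "X", "O", or None, and all of the names here are illustrative assumptions, not the course's distribution code.

X, O, EMPTY = "X", "O", None

def initial_state():
    # S0: the empty three-by-three board.
    return [[EMPTY] * 3 for _ in range(3)]

def player(board):
    # X moves first, so it is X's turn whenever the two counts are equal.
    xs = sum(row.count(X) for row in board)
    os = sum(row.count(O) for row in board)
    return X if xs == os else O

def actions(board):
    # Every empty square is a legal move.
    return {(i, j) for i in range(3) for j in range(3) if board[i][j] is EMPTY}

def result(board, action):
    # Transition model: copy the board and place the current player's mark.
    i, j = action
    new_board = [row[:] for row in board]
    new_board[i][j] = player(board)
    return new_board

def winner(board):
    # Check every row, column, and diagonal for three of the same mark.
    lines = board + [list(col) for col in zip(*board)]
    lines += [[board[k][k] for k in range(3)],
              [board[k][2 - k] for k in range(3)]]
    for line in lines:
        if line[0] is not EMPTY and line.count(line[0]) == 3:
            return line[0]
    return None

def terminal(board):
    # The game is over if someone has three in a row or the board is full.
    return winner(board) is not None or not actions(board)

def utility(board):
    # 1 if X has won, -1 if O has won, 0 otherwise.
    return {X: 1, O: -1}.get(winner(board), 0)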
1794 01:27:16,440 --> 01:27:19,260 And the further and further away we are from the end of the game, 1795 01:27:19,260 --> 01:27:23,460 the deeper this tree has to go, because every level in this tree 1796 01:27:23,460 --> 01:27:27,750 is going to correspond to one move, one move or action that I take, 1797 01:27:27,750 --> 01:27:32,350 one move or action that my opponent takes, in order to decide what happens. 1798 01:27:32,350 --> 01:27:35,970 And in fact, it turns out that if I am the X player in this position, 1799 01:27:35,970 --> 01:27:38,630 and I recursively do the logic and see I have a choice-- 1800 01:27:38,630 --> 01:27:43,170 three choices, in fact, one of which leads to a value of 0, if I play here, 1801 01:27:43,170 --> 01:27:45,900 and if everyone plays optimally, the game will be a tie. 1802 01:27:45,900 --> 01:27:51,000 If I play here, then O is going to win, and I'll lose, playing optimally. 1803 01:27:51,000 --> 01:27:53,610 Or here, where I, the X player, can win-- 1804 01:27:53,610 --> 01:27:57,230 well, between a score of 0 and negative 1 and 1, 1805 01:27:57,230 --> 01:27:59,070 I'd rather pick the board with a value of 1, 1806 01:27:59,070 --> 01:28:01,270 because that's the maximum value I can get. 1807 01:28:01,270 --> 01:28:05,550 And so this board would also have a maximum value of 1. 1808 01:28:05,550 --> 01:28:07,810 And so this tree can get very, very deep, 1809 01:28:07,810 --> 01:28:11,400 especially as the game starts to have more and more moves. 1810 01:28:11,400 --> 01:28:13,470 And this logic works not just for tic-tac-toe, 1811 01:28:13,470 --> 01:28:16,950 but any of these sorts of games where I make a move, my opponent makes a move, 1812 01:28:16,950 --> 01:28:20,460 and ultimately, we have these adversarial objectives. 1813 01:28:20,460 --> 01:28:24,340 And we can simplify the diagram into a diagram that looks like this. 1814 01:28:24,340 --> 01:28:27,240 This is a more abstract version of the Minimax tree, 1815 01:28:27,240 --> 01:28:30,450 where these are each states, but I'm no longer representing them as exactly 1816 01:28:30,450 --> 01:28:31,830 like tic-tac-toe boards. 1817 01:28:31,830 --> 01:28:35,690 This is just representing some generic game that might be tic-tac-toe, 1818 01:28:35,690 --> 01:28:38,130 might be some other game altogether. 1819 01:28:38,130 --> 01:28:40,590 Any of these green arrows that are pointing up-- 1820 01:28:40,590 --> 01:28:42,700 that represents a maximizing state. 1821 01:28:42,700 --> 01:28:45,290 I would like the score to be as big as possible. 1822 01:28:45,290 --> 01:28:47,430 And any of these red arrows pointing down-- 1823 01:28:47,430 --> 01:28:50,580 those are minimizing states, where the player is the min player, 1824 01:28:50,580 --> 01:28:54,190 and they are trying to make the score as small as possible. 1825 01:28:54,190 --> 01:28:58,200 So if you imagine in this situation, I am the maximizing player, this player 1826 01:28:58,200 --> 01:29:00,550 here, and I have three choices-- 1827 01:29:00,550 --> 01:29:04,170 one choice gives me a score of 5, one choice gives me a score of 3, 1828 01:29:04,170 --> 01:29:06,310 and one choice gives me a score of 9. 1829 01:29:06,310 --> 01:29:10,050 Well, then, between those three choices, my best option 1830 01:29:10,050 --> 01:29:14,430 is to choose this 9 over here, the score that maximizes my options out 1831 01:29:14,430 --> 01:29:15,960 of all the three options. 
1832 01:29:15,960 --> 01:29:18,870 And so I can give this state a value of 9, 1833 01:29:18,870 --> 01:29:21,210 because among my three options, that is the best 1834 01:29:21,210 --> 01:29:24,330 choice that I have available to me. 1835 01:29:24,330 --> 01:29:25,800 So that's my decision now. 1836 01:29:25,800 --> 01:29:29,580 You imagine it's like one move away from the end of the game. 1837 01:29:29,580 --> 01:29:31,670 But then you could also ask a reasonable question. 1838 01:29:31,670 --> 01:29:35,300 What might my opponent do two moves away from the end of the game? 1839 01:29:35,300 --> 01:29:37,040 My opponent is the minimizing player. 1840 01:29:37,040 --> 01:29:39,810 They are trying to make the score as small as possible. 1841 01:29:39,810 --> 01:29:43,690 Imagine what would have happened if they had to pick which choice to make. 1842 01:29:43,690 --> 01:29:47,370 One choice leads us to this state, where I, the maximizing player, 1843 01:29:47,370 --> 01:29:50,840 am going to opt for 9, the biggest score that I can get. 1844 01:29:50,840 --> 01:29:55,130 And one leads to this state, where I, the maximizing player, 1845 01:29:55,130 --> 01:29:58,980 would choose 8, which is then the largest score than I can get. 1846 01:29:58,980 --> 01:30:02,780 Now, the minimizing player, forced to choose between a 9 or an 8, 1847 01:30:02,780 --> 01:30:07,100 is going to choose the smallest possible score, which in this case is an 8. 1848 01:30:07,100 --> 01:30:09,450 And that is, then, how this process would unfold. 1849 01:30:09,450 --> 01:30:11,660 But the minimizing player, in this case, considers 1850 01:30:11,660 --> 01:30:14,180 both of their options, and then all of the options 1851 01:30:14,180 --> 01:30:17,590 that would happen as a result of that. 1852 01:30:17,590 --> 01:30:21,420 So this now is a general picture of what the Minimax algorithm looks like. 1853 01:30:21,420 --> 01:30:24,700 Let's now try to formalize it using a little bit of pseudocode. 1854 01:30:24,700 --> 01:30:27,720 So what exactly is happening in the Minimax algorithm? 1855 01:30:27,720 --> 01:30:31,500 Well, given a state, S, we need to decide what to happen. 1856 01:30:31,500 --> 01:30:34,980 The max player-- if it's the max player's turn, then 1857 01:30:34,980 --> 01:30:39,540 max is going to pick an action, A, in actions of S. Recall 1858 01:30:39,540 --> 01:30:42,090 that actions is a function that takes a state 1859 01:30:42,090 --> 01:30:44,970 and gives me back all of the possible actions that I can take. 1860 01:30:44,970 --> 01:30:48,770 It tells me all of the moves that are possible. 1861 01:30:48,770 --> 01:30:50,610 The max player is going to specifically pick 1862 01:30:50,610 --> 01:30:53,970 an action, A, in the set of actions that gives me 1863 01:30:53,970 --> 01:31:01,420 the highest value of min value of result of S and A. So what does that mean? 1864 01:31:01,420 --> 01:31:04,350 Well, it means that I want to make the option that gives me 1865 01:31:04,350 --> 01:31:07,830 the highest score of all of the actions, A. 1866 01:31:07,830 --> 01:31:09,630 But what score is that going to have? 1867 01:31:09,630 --> 01:31:12,780 To calculate that, I need to know what my opponent, the min player, 1868 01:31:12,780 --> 01:31:18,300 is going to do if they try to minimize the value of the state that results. 
1869 01:31:18,300 --> 01:31:22,260 So we say, what state results after I take this action, 1870 01:31:22,260 --> 01:31:24,570 and what happens when the min player tries 1871 01:31:24,570 --> 01:31:27,570 to minimize the value of that state? 1872 01:31:27,570 --> 01:31:30,260 I consider that for all of my possible options. 1873 01:31:30,260 --> 01:31:32,850 And after I've considered that for all of my possible options, 1874 01:31:32,850 --> 01:31:36,770 I pick the action, A, that has the highest value. 1875 01:31:36,770 --> 01:31:40,090 Likewise, the min player is going to do the same thing, but backwards. 1876 01:31:40,090 --> 01:31:43,180 They're also going to consider, what are all of the possible actions they 1877 01:31:43,180 --> 01:31:44,800 can take if it's their turn? 1878 01:31:44,800 --> 01:31:47,740 And they're going to pick the action, A, that has the smallest 1879 01:31:47,740 --> 01:31:50,220 possible value of all the options. 1880 01:31:50,220 --> 01:31:53,430 And the way they know what the smallest possible value of all the options is, 1881 01:31:53,430 --> 01:31:57,490 is by considering what the max player is going to do, 1882 01:31:57,490 --> 01:32:01,720 by saying, what's the result of applying this action to the current state, 1883 01:32:01,720 --> 01:32:03,640 and then, what would the max player try to do? 1884 01:32:03,640 --> 01:32:08,000 What value would the max player calculate for that particular state? 1885 01:32:08,000 --> 01:32:11,260 So everyone makes their decision based on trying to estimate 1886 01:32:11,260 --> 01:32:13,600 what the other person would do. 1887 01:32:13,600 --> 01:32:15,930 And now we need to turn our attention to these two 1888 01:32:15,930 --> 01:32:18,540 functions, maxValue and minValue. 1889 01:32:18,540 --> 01:32:21,690 How do you actually calculate the value of a state 1890 01:32:21,690 --> 01:32:24,270 if you're trying to maximize its value, and how do you 1891 01:32:24,270 --> 01:32:27,690 calculate the value of a state if you're trying to minimize the value? 1892 01:32:27,690 --> 01:32:30,750 If you can do that, then we have an entire implementation 1893 01:32:30,750 --> 01:32:32,920 of this Minimax algorithm. 1894 01:32:32,920 --> 01:32:33,610 So let's try it. 1895 01:32:33,610 --> 01:32:36,480 Let's try and implement this maxValue function 1896 01:32:36,480 --> 01:32:40,630 that takes a state and returns as output the value of that state 1897 01:32:40,630 --> 01:32:43,860 if I'm trying to maximize the value of the state. 1898 01:32:43,860 --> 01:32:46,860 Well, the first thing I can check for is to see if the game is over, 1899 01:32:46,860 --> 01:32:48,100 because if the game is over-- 1900 01:32:48,100 --> 01:32:50,820 in other words, if the state is a terminal state-- 1901 01:32:50,820 --> 01:32:51,990 then this is easy. 1902 01:32:51,990 --> 01:32:54,930 I already have this utility function that tells me 1903 01:32:54,930 --> 01:32:56,280 what the value of the board is. 1904 01:32:56,280 --> 01:32:58,660 If the game is over, I just check, did X win? 1905 01:32:58,660 --> 01:32:59,160 Did O win? 1906 01:32:59,160 --> 01:33:00,150 Is that a tie? 1907 01:33:00,150 --> 01:33:04,200 And the utility function just knows what the value of the state is. 1908 01:33:04,200 --> 01:33:06,430 What's trickier is if the game isn't over, 1909 01:33:06,430 --> 01:33:09,240 because then I need to do this recursive reasoning about thinking, 1910 01:33:09,240 --> 01:33:12,630 what is my opponent going to do on the next move? 
1911 01:33:12,630 --> 01:33:15,830 Then I want to calculate the value of this state, 1912 01:33:15,830 --> 01:33:19,060 and I want the value of the state to be as high as possible. 1913 01:33:19,060 --> 01:33:21,900 And I'll keep track of that value in a variable called v. 1914 01:33:21,900 --> 01:33:24,570 And if I want the value to be as high as possible, 1915 01:33:24,570 --> 01:33:27,330 I need to give v an initial value. 1916 01:33:27,330 --> 01:33:31,500 And initially, I'll just go ahead and set it to be as low as possible, 1917 01:33:31,500 --> 01:33:34,650 because I don't know what options are available to me yet. 1918 01:33:34,650 --> 01:33:38,610 So initially, I'll set v equal to negative infinity, which 1919 01:33:38,610 --> 01:33:40,620 seems a little bit strange, but the idea here 1920 01:33:40,620 --> 01:33:43,680 is, I want the value initially to be as low as possible, 1921 01:33:43,680 --> 01:33:46,230 because as I consider my actions, I'm always 1922 01:33:46,230 --> 01:33:50,220 going to try and do better than v. And if I set v to negative infinity, 1923 01:33:50,220 --> 01:33:52,850 I know I can always do better than that. 1924 01:33:52,850 --> 01:33:54,960 So now I consider my actions. 1925 01:33:54,960 --> 01:33:56,710 And this is going to be some kind of loop, 1926 01:33:56,710 --> 01:34:00,070 where for every action in actions of state-- 1927 01:34:00,070 --> 01:34:02,830 recall, actions is a function that takes my state 1928 01:34:02,830 --> 01:34:06,650 and gives me all the possible actions that I can use in that state. 1929 01:34:06,650 --> 01:34:11,570 So for each one of those actions, I want to compare it to v and say, 1930 01:34:11,570 --> 01:34:18,290 all right, v is going to be equal to the maximum of v and this expression. 1931 01:34:18,290 --> 01:34:20,090 So what is this expression? 1932 01:34:20,090 --> 01:34:24,640 Well, first it is, get the result of taking the action and the state, 1933 01:34:24,640 --> 01:34:28,250 and then get the min value of that. 1934 01:34:28,250 --> 01:34:31,090 In other words, let's say, I want to find out 1935 01:34:31,090 --> 01:34:34,220 from that state what is the best that the min player can do, 1936 01:34:34,220 --> 01:34:36,430 because they are going to try and minimize the score. 1937 01:34:36,430 --> 01:34:40,210 So whatever the resulting score is of the min value of that state, 1938 01:34:40,210 --> 01:34:43,870 compare it to my current best value, and just pick the maximum of those two, 1939 01:34:43,870 --> 01:34:46,460 because I am trying to maximize the value. 1940 01:34:46,460 --> 01:34:48,550 In short, what these three lines of code are doing 1941 01:34:48,550 --> 01:34:52,460 are going through all of my possible actions and asking the question, 1942 01:34:52,460 --> 01:34:57,880 how do I maximize the score, given what my opponent is going to try to do? 1943 01:34:57,880 --> 01:35:00,640 After this entire loop, I can just return v, 1944 01:35:00,640 --> 01:35:04,070 and that is now the value of that particular state. 1945 01:35:04,070 --> 01:35:07,910 And for the min player, it's the exact opposite of this, the same logic, 1946 01:35:07,910 --> 01:35:08,930 just backwards. 1947 01:35:08,930 --> 01:35:10,910 To calculate the minimum value of a state, 1948 01:35:10,910 --> 01:35:12,770 first we check if it's a terminal state. 1949 01:35:12,770 --> 01:35:14,960 If it is, we return its utility. 
1950 01:35:14,960 --> 01:35:19,280 Otherwise, we're going to now try to minimize the value of the state, 1951 01:35:19,280 --> 01:35:21,290 given all of my possible actions. 1952 01:35:21,290 --> 01:35:24,800 So I need an initial value for v, the value of the state. 1953 01:35:24,800 --> 01:35:28,430 And initially, I'll set it to infinity, because I know it can always 1954 01:35:28,430 --> 01:35:30,390 get something less than infinity. 1955 01:35:30,390 --> 01:35:33,920 So by starting with v equals infinity, I make sure that the very first action 1956 01:35:33,920 --> 01:35:34,580 I find-- 1957 01:35:34,580 --> 01:35:37,550 that will be less than this value of v. 1958 01:35:37,550 --> 01:35:38,810 And then I do the same thing-- 1959 01:35:38,810 --> 01:35:41,630 loop over all of my possible actions, and for each 1960 01:35:41,630 --> 01:35:45,740 of the results that we could get when the max player makes their decision, 1961 01:35:45,740 --> 01:35:49,160 let's take the minimum of that and the current value of v. 1962 01:35:49,160 --> 01:35:53,210 So after all is said and done I get the smallest possible value of v, 1963 01:35:53,210 --> 01:35:56,480 that I then return back to the user. 1964 01:35:56,480 --> 01:35:59,020 So that, in effect, is the pseudocode for Minimax. 1965 01:35:59,020 --> 01:36:01,990 That is how we take a game and figure out what the best move to make 1966 01:36:01,990 --> 01:36:06,490 is by recursively using these maxValue and minValue functions, where 1967 01:36:06,490 --> 01:36:10,060 maxValue calls minValue, minValue calls maxValue, back 1968 01:36:10,060 --> 01:36:13,540 and forth, all the way until we reach a terminal state, at which point 1969 01:36:13,540 --> 01:36:18,750 our algorithm can simply return the utility of that particular state. 1970 01:36:18,750 --> 01:36:20,590 What you might imagine is that this is going 1971 01:36:20,590 --> 01:36:23,770 to start to be a long process, especially as games start 1972 01:36:23,770 --> 01:36:28,060 to get more complex, as we start to add more moves and more possible options 1973 01:36:28,060 --> 01:36:30,710 and games that might last quite a bit longer. 1974 01:36:30,710 --> 01:36:34,360 So the next question to ask is, what sort of optimizations can we make here? 1975 01:36:34,360 --> 01:36:37,780 How can we do better in order to use less space 1976 01:36:37,780 --> 01:36:42,110 or take less time to be able to solve this kind of problem? 1977 01:36:42,110 --> 01:36:44,740 And we'll take a look at a couple of possible optimizations. 1978 01:36:44,740 --> 01:36:47,330 But for one, we'll take a look at this example. 1979 01:36:47,330 --> 01:36:49,850 Again, we're turning to these up arrows and down arrows. 1980 01:36:49,850 --> 01:36:54,070 Let's imagine that I now am the max player, this green arrow. 1981 01:36:54,070 --> 01:36:57,260 I am trying to make the score as high as possible. 1982 01:36:57,260 --> 01:37:00,370 And this is an easy game, where there are just two moves. 1983 01:37:00,370 --> 01:37:02,980 I make a move, one of these three options, 1984 01:37:02,980 --> 01:37:05,890 and then my opponent makes a move, one of these three options, 1985 01:37:05,890 --> 01:37:07,290 based on what move I make. 1986 01:37:07,290 --> 01:37:10,310 And as a result, we get some value. 
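Putting that pseudocode together, a minimal Python sketch of Minimax might look like this, reusing the player, actions, result, terminal, and utility functions sketched above for tic-tac-toe and treating X as the max player. It is a sketch of the idea just described, not the course's distribution code.

def minimax(state):
    # Returns the best action for whichever player's turn it is.
    if terminal(state):
        return None
    if player(state) == X:   # the max player picks the action with the highest min_value
        return max(actions(state), key=lambda action: min_value(result(state, action)))
    else:                    # the min player picks the action with the lowest max_value
        return min(actions(state), key=lambda action: max_value(result(state, action)))

def max_value(state):
    # Value of a state if the max player moves next and everyone plays optimally.
    if terminal(state):
        return utility(state)
    v = float("-inf")
    for action in actions(state):
        v = max(v, min_value(result(state, action)))
    return v

def min_value(state):
    # Value of a state if the min player moves next and everyone plays optimally.
    if terminal(state):
        return utility(state)
    v = float("inf")
    for action in actions(state):
        v = min(v, max_value(result(state, action)))
    return v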
1987 01:37:10,310 --> 01:37:13,450 Let's look at the order in which I do these calculations 1988 01:37:13,450 --> 01:37:16,810 and figure out if there are any optimizations I might be able to make 1989 01:37:16,810 --> 01:37:18,760 to this calculation process. 1990 01:37:18,760 --> 01:37:21,200 I'm going to have to look at these states one at a time. 1991 01:37:21,200 --> 01:37:23,740 So let's say I start here on the left and say, all right, now 1992 01:37:23,740 --> 01:37:28,300 I'm going to consider, what will the min player, my opponent, try to do here? 1993 01:37:28,300 --> 01:37:31,810 Well, the min player is going to look at all three of their possible actions 1994 01:37:31,810 --> 01:37:34,270 and look at their value, because these are terminal states. 1995 01:37:34,270 --> 01:37:35,430 They're the end of the game. 1996 01:37:35,430 --> 01:37:38,830 And so they'll see, all right, this node is a value of 4, value of 8, 1997 01:37:38,830 --> 01:37:40,410 value of 5. 1998 01:37:40,410 --> 01:37:42,800 And the min player is going to say, well, all right. 1999 01:37:42,800 --> 01:37:45,790 Between these three options, 4, 8, and 5, 2000 01:37:45,790 --> 01:37:48,070 I'll take the smallest one I'll take the 4. 2001 01:37:48,070 --> 01:37:50,610 So this state now has a value of 4. 2002 01:37:50,610 --> 01:37:54,060 Then I as the max player say, all right, if I take this action, 2003 01:37:54,060 --> 01:37:55,170 it will have a value of 4. 2004 01:37:55,170 --> 01:37:57,330 That's the best that I can do, because min player 2005 01:37:57,330 --> 01:37:59,810 is going to try and minimize my score. 2006 01:37:59,810 --> 01:38:01,270 So now, what if I take this option? 2007 01:38:01,270 --> 01:38:02,620 We'll explore this next. 2008 01:38:02,620 --> 01:38:06,250 And now I explore what the min player would do if I choose this action. 2009 01:38:06,250 --> 01:38:09,350 And the min player is going to say, all right, what are the three options? 2010 01:38:09,350 --> 01:38:14,440 The min player has options between 9, 3, and 7, and so 3 2011 01:38:14,440 --> 01:38:16,510 is the smallest among 9, 3, and 7. 2012 01:38:16,510 --> 01:38:19,640 So we'll go ahead and say this state has a value of 3. 2013 01:38:19,640 --> 01:38:20,980 So now I, as the max player-- 2014 01:38:20,980 --> 01:38:23,350 I have now explored two of my three options. 2015 01:38:23,350 --> 01:38:27,390 I know that one of my options will guarantee me a score of 4, at least, 2016 01:38:27,390 --> 01:38:31,100 and one of my options will guarantee me a score of 3. 2017 01:38:31,100 --> 01:38:34,160 And now I consider my third option and say, all right, what happens here? 2018 01:38:34,160 --> 01:38:35,910 Same exact logic-- the min player is going 2019 01:38:35,910 --> 01:38:38,120 to look at these three states, 2, 4, and 6, 2020 01:38:38,120 --> 01:38:42,480 say the minimum possible option is 2, so the min player wants the two. 2021 01:38:42,480 --> 01:38:45,780 Now I, as the max player, have calculated all of the information 2022 01:38:45,780 --> 01:38:49,220 by looking two layers deep, by looking at all of these nodes. 2023 01:38:49,220 --> 01:38:52,770 And I can now say, between the 4, the 3, and the 2, you know what? 2024 01:38:52,770 --> 01:38:55,620 I'd rather take the 4, because if I choose 2025 01:38:55,620 --> 01:38:58,200 this option, if my opponent plays optimally, 2026 01:38:58,200 --> 01:39:01,620 they will try and get me to the 4, but that's the best I can do. 
2027 01:39:01,620 --> 01:39:04,170 I can't guarantee a higher score, because if I 2028 01:39:04,170 --> 01:39:07,740 pick either of these two options, I might get a 3, or I might get a 2. 2029 01:39:07,740 --> 01:39:10,620 And it's true that down here is a 9, and that's 2030 01:39:10,620 --> 01:39:12,390 the highest score of any of the scores. 2031 01:39:12,390 --> 01:39:14,230 So I might be tempted to say, you know what? 2032 01:39:14,230 --> 01:39:17,400 Maybe I should take this option, because I might get the 9. 2033 01:39:17,400 --> 01:39:19,980 But if the min player is playing intelligently, 2034 01:39:19,980 --> 01:39:22,650 if they're making the best moves at each possible option 2035 01:39:22,650 --> 01:39:26,370 they have when they get to make a choice, I'll be left with a 3, 2036 01:39:26,370 --> 01:39:28,470 whereas, playing optimally, I could 2037 01:39:28,470 --> 01:39:31,720 have guaranteed that I would get the 4. 2038 01:39:31,720 --> 01:39:33,600 So that 9 doesn't affect the logic that I would 2039 01:39:33,600 --> 01:39:38,910 use as a Minimax player trying to maximize my score from that node there. 2040 01:39:38,910 --> 01:39:41,310 But it turns out, that took quite a bit of computation 2041 01:39:41,310 --> 01:39:42,390 for me to figure that out. 2042 01:39:42,390 --> 01:39:45,690 I had to reason through all of these nodes in order to draw this conclusion. 2043 01:39:45,690 --> 01:39:48,780 And this is for a pretty simple game, where I have three choices, 2044 01:39:48,780 --> 01:39:52,270 my opponent has three choices, and then the game's over. 2045 01:39:52,270 --> 01:39:55,070 So what I'd like to do is come up with some way to optimize this. 2046 01:39:55,070 --> 01:39:58,990 Maybe I don't need to do all of this calculation to still reach 2047 01:39:58,990 --> 01:40:00,450 the conclusion that, you know what? 2048 01:40:00,450 --> 01:40:01,950 This action to the left-- 2049 01:40:01,950 --> 01:40:03,810 that's the best that I could do. 2050 01:40:03,810 --> 01:40:07,290 Let's go ahead and try again and try and be a little more intelligent 2051 01:40:07,290 --> 01:40:10,150 about how I go about doing this. 2052 01:40:10,150 --> 01:40:12,320 So first, I start the exact same way. 2053 01:40:12,320 --> 01:40:14,160 I don't know what to do initially, so I just 2054 01:40:14,160 --> 01:40:18,930 have to consider one of the options and consider what the min player might do. 2055 01:40:18,930 --> 01:40:21,540 Min has three options, 4, 8, and 5. 2056 01:40:21,540 --> 01:40:25,500 And between those three options, min says, 4 is the best they can do, 2057 01:40:25,500 --> 01:40:28,390 because they want to try to minimize the score. 2058 01:40:28,390 --> 01:40:31,960 Now, I, the max player, will consider my second option, 2059 01:40:31,960 --> 01:40:36,730 making this move here and considering what my opponent would do in response. 2060 01:40:36,730 --> 01:40:38,470 What will the min player do? 2061 01:40:38,470 --> 01:40:41,560 Well, the min player is going to, from that state, look at their options. 2062 01:40:41,560 --> 01:40:42,710 And I would say, all right. 2063 01:40:42,710 --> 01:40:45,980 9 is an option, 3 is an option.
2064 01:40:45,980 --> 01:40:48,220 And if I am doing the math from this initial state, 2065 01:40:48,220 --> 01:40:51,400 doing all this calculation, when I see a 3, 2066 01:40:51,400 --> 01:40:54,250 that should immediately be a red flag for me, 2067 01:40:54,250 --> 01:40:56,890 because when I see a 3 down here at this state, 2068 01:40:56,890 --> 01:41:01,870 I know that the value of this state is going to be at most 3. 2069 01:41:01,870 --> 01:41:04,630 It's going to be 3 or something less than 3, 2070 01:41:04,630 --> 01:41:08,050 even though I haven't yet looked at this last action or even further actions 2071 01:41:08,050 --> 01:41:10,870 if there were more actions that could be taken here. 2072 01:41:10,870 --> 01:41:11,900 How do I know that? 2073 01:41:11,900 --> 01:41:16,000 Well, I know that the min player is going to try to minimize my score. 2074 01:41:16,000 --> 01:41:20,350 And if they see a 3, the only way this could be something other than a 3 2075 01:41:20,350 --> 01:41:24,520 is if this remaining thing that I haven't yet looked at is less than 3, 2076 01:41:24,520 --> 01:41:28,810 which means there is no way for this value to be anything more than 3, 2077 01:41:28,810 --> 01:41:31,390 because the min player can already guarantee a 3, 2078 01:41:31,390 --> 01:41:34,930 and they are trying to minimize my score. 2079 01:41:34,930 --> 01:41:36,380 So what does that tell me? 2080 01:41:36,380 --> 01:41:38,740 Well, it tells me that if I choose this action, 2081 01:41:38,740 --> 01:41:43,240 my score is going to be 3, or maybe even less than 3, if I'm unlucky. 2082 01:41:43,240 --> 01:41:47,500 But I already know that this action will guarantee me a 4. 2083 01:41:47,500 --> 01:41:51,230 And so given that I know that this action guarantees me a score of 4, 2084 01:41:51,230 --> 01:41:54,130 and this action means I can't do better than 3, 2085 01:41:54,130 --> 01:41:56,290 if I'm trying to maximize my options, there 2086 01:41:56,290 --> 01:41:59,290 is no need for me to consider this triangle here. 2087 01:41:59,290 --> 01:42:01,990 There is no value, no number that could go here, 2088 01:42:01,990 --> 01:42:04,760 that would change my mind between these two options. 2089 01:42:04,760 --> 01:42:08,020 I'm always going to opt for this path that gets me a 4, 2090 01:42:08,020 --> 01:42:11,260 as opposed to this path, where the best I can do is a 3, 2091 01:42:11,260 --> 01:42:13,600 if my opponent plays optimally. 2092 01:42:13,600 --> 01:42:16,850 And this is going to be true for all of the future states that I look at, too. 2093 01:42:16,850 --> 01:42:19,470 So if I look over here, at what the min player might do over here, 2094 01:42:19,470 --> 01:42:24,510 if I see that this state is a 2, I know that this state is at most a 2, 2095 01:42:24,510 --> 01:42:28,500 because the only way this value could be something other than 2 2096 01:42:28,500 --> 01:42:31,820 is if one of these remaining states is less than a 2, 2097 01:42:31,820 --> 01:42:34,600 and so the min player would opt for that instead.
2098 01:42:34,600 --> 01:42:37,470 So even without looking at these remaining states, 2099 01:42:37,470 --> 01:42:42,600 I, as the maximizing player, can know that choosing this path to the left 2100 01:42:42,600 --> 01:42:47,280 is going to be better than choosing either of those two paths to the right, 2101 01:42:47,280 --> 01:42:51,810 because this one can't be better than 3, this one can't be better than 2, 2102 01:42:51,810 --> 01:42:56,550 and so 4 in this case is the best that I can do. 2103 01:42:56,550 --> 01:42:59,580 And I can say now that this state has a value of 4. 2104 01:42:59,580 --> 01:43:01,680 So in order to do this type of calculation, 2105 01:43:01,680 --> 01:43:04,980 I was doing a little bit more bookkeeping, keeping track of things, 2106 01:43:04,980 --> 01:43:08,480 keeping track all the time of, what is the best that I can do, 2107 01:43:08,480 --> 01:43:11,400 what is the worst that I can do, and for each of these states, saying, 2108 01:43:11,400 --> 01:43:15,300 all right, well, if I already know that I can get a 4, 2109 01:43:15,300 --> 01:43:18,060 then if the best I can do at this state is a 3, 2110 01:43:18,060 --> 01:43:19,770 no reason for me to consider it. 2111 01:43:19,770 --> 01:43:25,020 I can effectively prune this leaf and anything below it from the tree. 2112 01:43:25,020 --> 01:43:28,440 And it's for that reason this approach, this optimization to Minimax, 2113 01:43:28,440 --> 01:43:30,510 is called alpha-beta pruning. 2114 01:43:30,510 --> 01:43:32,400 Alpha and beta stand for these two values 2115 01:43:32,400 --> 01:43:34,950 that you'll have to keep track of, the best you can do so far 2116 01:43:34,950 --> 01:43:36,600 and the worst you can do so far. 2117 01:43:36,600 --> 01:43:41,170 And pruning is the idea of, if I have a big, long, deep search tree, 2118 01:43:41,170 --> 01:43:43,650 I might be able to search it more efficiently if I don't 2119 01:43:43,650 --> 01:43:47,070 need to search through everything, if I can remove some of the nodes 2120 01:43:47,070 --> 01:43:52,170 to try and optimize the way that I look through this entire search space. 2121 01:43:52,170 --> 01:43:55,500 So alpha-beta pruning can definitely save us a lot of time 2122 01:43:55,500 --> 01:43:59,430 as we go about the search process by making our searches more efficient. 2123 01:43:59,430 --> 01:44:03,750 But even then, it's still not great as games get more complex. 2124 01:44:03,750 --> 01:44:06,990 Tic-tac-toe, fortunately, is a relatively simple game, 2125 01:44:06,990 --> 01:44:09,820 and we might reasonably ask a question like, 2126 01:44:09,820 --> 01:44:13,640 how many total possible tic-tac-toe games are there? 2127 01:44:13,640 --> 01:44:14,600 You can think about it. 2128 01:44:14,600 --> 01:44:17,640 You can try and estimate, how many moves are there at any given point? 2129 01:44:17,640 --> 01:44:19,500 How many moves long can the game last? 2130 01:44:19,500 --> 01:44:26,250 It turns out there are about 255,000 possible tic-tac-toe games that 2131 01:44:26,250 --> 01:44:27,780 can be played. 2132 01:44:27,780 --> 01:44:30,240 But compare that to a more complex game, something 2133 01:44:30,240 --> 01:44:32,060 like a game of chess, for example-- 2134 01:44:32,060 --> 01:44:35,950 far more pieces, far more moves, games that last much longer. 2135 01:44:35,950 --> 01:44:38,940 How many total possible chess games could there be? 
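Before getting to those numbers, here is one way the bookkeeping just described, the alpha-beta idea, could be folded into the earlier max_value and min_value sketch. This is an illustrative sketch rather than the lecture's own code: alpha tracks the best value the max player is already assured of along the current path, beta tracks the lowest value the min player is already assured of, and the game helpers terminal, utility, actions, and result are again assumed.

    import math

    def max_value(state, alpha=-math.inf, beta=math.inf):
        if terminal(state):
            return utility(state)
        v = -math.inf
        for action in actions(state):
            v = max(v, min_value(result(state, action), alpha, beta))
            if v >= beta:
                # The min player above is already guaranteed something no
                # bigger than beta, so they would never let play reach a
                # state worth v: prune the remaining actions.
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha=-math.inf, beta=math.inf):
        if terminal(state):
            return utility(state)
        v = math.inf
        for action in actions(state):
            v = min(v, max_value(result(state, action), alpha, beta))
            if v <= alpha:
                # The max player above can already guarantee at least alpha
                # elsewhere, so nothing in this branch can change their mind:
                # prune the remaining actions.
                return v
            beta = min(beta, v)
        return v

On the small example above, once the left branch has established a guaranteed 4 (alpha = 4), the middle branch is cut off as soon as the 3 appears, so the 7 is never examined, and the right branch is cut off as soon as the 2 appears, so its remaining values are never examined, which is exactly the shortcut just described.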
2136 01:44:38,940 --> 01:44:41,100 It turns out that after just four moves each, 2137 01:44:41,100 --> 01:44:44,100 four moves by the white player, four moves by the black player, 2138 01:44:44,100 --> 01:44:47,280 there are 288 billion possible chess 2139 01:44:47,280 --> 01:44:51,300 games that can result from that situation, after just four moves each. 2140 01:44:51,300 --> 01:44:52,330 And going even further, 2141 01:44:52,330 --> 01:44:55,380 if you look at entire chess games and how many possible chess games there 2142 01:44:55,380 --> 01:44:58,560 could be, there are more than 10 2143 01:44:58,560 --> 01:45:02,820 to the 29,000 possible chess games, far more chess games 2144 01:45:02,820 --> 01:45:04,410 than could ever be considered. 2145 01:45:04,410 --> 01:45:08,010 And this is a pretty big problem for the Minimax algorithm, because the Minimax 2146 01:45:08,010 --> 01:45:12,150 algorithm starts with an initial state, considers all the possible actions 2147 01:45:12,150 --> 01:45:15,570 and all the possible actions after that, all the way 2148 01:45:15,570 --> 01:45:18,510 until we get to the end of the game. 2149 01:45:18,510 --> 01:45:20,640 And that's going to be a problem if the computer is 2150 01:45:20,640 --> 01:45:23,100 going to need to look through this many states, which 2151 01:45:23,100 --> 01:45:28,680 is far more than any computer could ever do in any reasonable amount of time. 2152 01:45:28,680 --> 01:45:30,900 So what do we do in order to solve this problem? 2153 01:45:30,900 --> 01:45:32,980 Instead of looking through all these states, which 2154 01:45:32,980 --> 01:45:36,420 is totally intractable for a computer, we need some better approach. 2155 01:45:36,420 --> 01:45:39,600 And it turns out that better approach generally takes the form of something 2156 01:45:39,600 --> 01:45:42,000 called depth-limited Minimax. 2157 01:45:42,000 --> 01:45:44,740 Whereas normally Minimax is depth-unlimited-- 2158 01:45:44,740 --> 01:45:47,220 we just keep going, layer after layer, move after move, 2159 01:45:47,220 --> 01:45:48,930 until we get to the end of the game-- 2160 01:45:48,930 --> 01:45:51,920 depth-limited Minimax is instead going to say, you know what? 2161 01:45:51,920 --> 01:45:53,760 After a certain number of moves-- maybe I'll 2162 01:45:53,760 --> 01:45:57,390 look 10 moves ahead, maybe I'll look 12 moves ahead, but after that point, 2163 01:45:57,390 --> 01:46:00,350 I'm going to stop and not consider additional moves that 2164 01:46:00,350 --> 01:46:02,190 might come after that, just because it would 2165 01:46:02,190 --> 01:46:07,970 be computationally intractable to consider all of those possible options. 2166 01:46:07,970 --> 01:46:10,730 But what do we do after we get 10 or 12 moves deep, 2167 01:46:10,730 --> 01:46:13,840 and we arrive at a situation where the game's not over? 2168 01:46:13,840 --> 01:46:18,110 Minimax still needs a way to assign a score to that game board or game state 2169 01:46:18,110 --> 01:46:20,390 to figure out what its current value is, which 2170 01:46:20,390 --> 01:46:22,760 is easy to do if the game is over, but not so 2171 01:46:22,760 --> 01:46:25,400 easy to do if the game is not yet over.
2172 01:46:25,400 --> 01:46:27,950 So in order to do that, we need to add one additional feature 2173 01:46:27,950 --> 01:46:31,400 to depth-limited Minimax called an evaluation function, 2174 01:46:31,400 --> 01:46:33,170 which is just some function that is going 2175 01:46:33,170 --> 01:46:38,000 to estimate the expected utility of a game from a given state. 2176 01:46:38,000 --> 01:46:41,000 So in a game like chess, if you imagine that a game value of 1 2177 01:46:41,000 --> 01:46:45,950 means white wins, negative 1 means black wins, 0 means it's a draw, 2178 01:46:45,950 --> 01:46:51,260 then you might imagine that a score of 0.8 means white is very likely to win, 2179 01:46:51,260 --> 01:46:53,000 though certainly not guaranteed. 2180 01:46:53,000 --> 01:46:56,450 And you would have an evaluation function that estimates 2181 01:46:56,450 --> 01:46:59,450 how good the game state happens to be. 2182 01:46:59,450 --> 01:47:02,720 And how good that evaluation function is, 2183 01:47:02,720 --> 01:47:06,050 that is ultimately what's going to constrain how good the AI is. 2184 01:47:06,050 --> 01:47:09,080 The better the AI is at estimating how good 2185 01:47:09,080 --> 01:47:12,440 or how bad any particular game state is, the better the AI 2186 01:47:12,440 --> 01:47:14,660 is going to be able to play that game. 2187 01:47:14,660 --> 01:47:16,910 If the evaluation function is worse, and not as good 2188 01:47:16,910 --> 01:47:19,670 at estimating what the expected utility is, 2189 01:47:19,670 --> 01:47:21,680 then it's going to be a whole lot harder. 2190 01:47:21,680 --> 01:47:25,130 And you can imagine trying to come up with these evaluation functions. 2191 01:47:25,130 --> 01:47:27,890 In chess, for example, you might write an evaluation function 2192 01:47:27,890 --> 01:47:30,230 based on how many pieces you have, as compared 2193 01:47:30,230 --> 01:47:32,540 to how many pieces your opponent has, because each one 2194 01:47:32,540 --> 01:47:35,240 has a value in your evaluation function. 2195 01:47:35,240 --> 01:47:36,950 It probably needs to be a little bit more 2196 01:47:36,950 --> 01:47:40,400 complicated than that to consider other possible situations that 2197 01:47:40,400 --> 01:47:42,090 might arise as well. 2198 01:47:42,090 --> 01:47:44,240 And there are many other variants on Minimax 2199 01:47:44,240 --> 01:47:47,570 that add additional features in order to help it perform better 2200 01:47:47,570 --> 01:47:50,330 under these larger and more computationally intractable 2201 01:47:50,330 --> 01:47:54,620 situations, where we couldn't possibly explore all of the possible moves, 2202 01:47:54,620 --> 01:47:57,230 so we need to figure out how to use evaluation 2203 01:47:57,230 --> 01:48:01,370 functions and other techniques to be able to play these games, ultimately, 2204 01:48:01,370 --> 01:48:02,360 better. 2205 01:48:02,360 --> 01:48:05,480 But this, then, was a look at this kind of adversarial search, these search 2206 01:48:05,480 --> 01:48:08,870 problems where we have situations where I am trying 2207 01:48:08,870 --> 01:48:11,360 to play against some sort of opponent. 2208 01:48:11,360 --> 01:48:13,880 And these search problems show up all over the place 2209 01:48:13,880 --> 01:48:15,560 throughout artificial intelligence. 2210 01:48:15,560 --> 01:48:18,710 We've been talking a lot today about more classical search problems, 2211 01:48:18,710 --> 01:48:22,080 like trying to find directions from one location to another.
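Putting those last two ideas together, a depth limit plus an evaluation function, a sketch of depth-limited Minimax might look something like the following. Here depth counts how many more plies we are willing to search, and evaluate is an assumed, game-specific function of the kind just described: for chess, perhaps a weighted count of each side's remaining pieces scaled into the range from -1 to 1.

    import math

    def max_value(state, depth):
        if terminal(state):
            return utility(state)
        if depth == 0:
            # Out of lookahead: estimate how promising this position is
            # rather than searching to the end of the game.
            return evaluate(state)
        v = -math.inf
        for action in actions(state):
            v = max(v, min_value(result(state, action), depth - 1))
        return v

    def min_value(state, depth):
        if terminal(state):
            return utility(state)
        if depth == 0:
            # Same cutoff for the min player.
            return evaluate(state)
        v = math.inf
        for action in actions(state):
            v = min(v, max_value(result(state, action), depth - 1))
        return v

Everything game-specific about "how good is this position?" then lives inside evaluate, which is why the quality of that estimate is what ultimately bounds how well the AI can play.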
2212 01:48:22,080 --> 01:48:25,590 But anytime an AI is faced with trying to make a decision like, 2213 01:48:25,590 --> 01:48:28,460 what do I do now in order to do something that is rational, 2214 01:48:28,460 --> 01:48:31,220 or do something that is intelligent, or trying to play a game, 2215 01:48:31,220 --> 01:48:33,830 like figuring out what move to make, these sorts of algorithms 2216 01:48:33,830 --> 01:48:35,700 can really come in handy. 2217 01:48:35,700 --> 01:48:38,420 It turns out that for tic-tac-toe, the solution is pretty simple, 2218 01:48:38,420 --> 01:48:39,600 because it's a small game. 2219 01:48:39,600 --> 01:48:42,470 XKCD has famously put together a webcomic 2220 01:48:42,470 --> 01:48:45,620 that will tell you exactly the optimal move 2221 01:48:45,620 --> 01:48:48,290 to make, no matter what your opponent happens to do. 2222 01:48:48,290 --> 01:48:50,510 This type of thing is not quite as possible 2223 01:48:50,510 --> 01:48:52,790 for a much larger game like checkers or chess, 2224 01:48:52,790 --> 01:48:55,400 for example, where it's totally computationally 2225 01:48:55,400 --> 01:48:57,920 intractable for most computers to be able to explore 2226 01:48:57,920 --> 01:48:59,330 all the possible states. 2227 01:48:59,330 --> 01:49:03,650 So we really need our AIs to be far more intelligent about how 2228 01:49:03,650 --> 01:49:05,850 they go about trying to deal with these problems 2229 01:49:05,850 --> 01:49:08,180 and how they go about taking this environment 2230 01:49:08,180 --> 01:49:10,010 that they find themselves in and ultimately 2231 01:49:10,010 --> 01:49:12,710 searching for one of these solutions. 2232 01:49:12,710 --> 01:49:15,710 So this, then, was a look at search in artificial intelligence. 2233 01:49:15,710 --> 01:49:17,630 Next time we'll take a look at knowledge, 2234 01:49:17,630 --> 01:49:21,230 thinking about how it is that our AIs are able to know information, reason 2235 01:49:21,230 --> 01:49:25,190 about that information, and draw conclusions, all in our look at AI 2236 01:49:25,190 --> 01:49:26,750 and the principles behind it. 2237 01:49:26,750 --> 01:49:28,500 We'll see you next time.