WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:02.475 [MUSIC PLAYING] 00:00:49.805 --> 00:00:52.970 DAVID MALAN: This is CS50, Harvard University's introduction 00:00:52.970 --> 00:00:56.450 to the intellectual enterprises of computer science-- 00:00:56.450 --> 00:00:59.100 over here, computer science. 00:00:59.100 --> 00:01:02.990 So today we are joined by CS50's own Brian Yu. 00:01:02.990 --> 00:01:06.740 This is an unusual week, a look at artificial intelligence or AI. 00:01:06.740 --> 00:01:09.980 You might recall that some weeks ago when we first introduced Python, 00:01:09.980 --> 00:01:13.160 we talked about writing programs that answered you 00:01:13.160 --> 00:01:17.060 by saying hello if you said hello, or by saying goodbye if you said goodbye. 00:01:17.060 --> 00:01:20.450 But those programs were all implemented with if conditions and else ifs 00:01:20.450 --> 00:01:23.330 and else ifs and so that really wasn't artificial intelligence. 00:01:23.330 --> 00:01:25.747 If you wanted to have a whole conversation with a computer 00:01:25.747 --> 00:01:29.630 program like that, that would be a huge number of ifs and else ifs just 00:01:29.630 --> 00:01:31.970 to anticipate all of the things the human might say. 00:01:31.970 --> 00:01:34.100 So we can do better, and indeed humans have 00:01:34.100 --> 00:01:36.710 been doing better in this field of artificial intelligence. 00:01:36.710 --> 00:01:40.280 And I'm so pleased to say that Brian is here to lead us along that way. 00:01:40.280 --> 00:01:42.705 Now, CS50's own, Brian Yu. 00:01:42.705 --> 00:01:43.830 BRIAN YU: Thanks very much. 00:01:43.830 --> 00:01:44.955 Welcome, everyone, to CS50. 
00:01:44.955 --> 00:01:47.780 And today as David alluded to, the topic of discussion 00:01:47.780 --> 00:01:50.960 is artificial intelligence, which is all about taking our computers 00:01:50.960 --> 00:01:53.180 and trying to make them intelligent somehow, 00:01:53.180 --> 00:01:55.460 trying to get them to be able to act in a way 00:01:55.460 --> 00:01:57.800 that's somewhat rational, somewhat human and that 00:01:57.800 --> 00:01:59.490 could take a number of different forms. 00:01:59.490 --> 00:02:03.235 One example of artificial intelligence, for example, might be game play. 00:02:03.235 --> 00:02:05.360 You might be familiar with the game of tic-tac-toe, 00:02:05.360 --> 00:02:07.160 where you've got this three by three grid 00:02:07.160 --> 00:02:10.380 and X and O take turns trying to get three in a row. 00:02:10.380 --> 00:02:12.860 And if, for example X started the game and played 00:02:12.860 --> 00:02:15.860 in the middle square of this grid and then it was O's turn 00:02:15.860 --> 00:02:20.000 and O played in the top, it turns out that at this point in the game 00:02:20.000 --> 00:02:21.980 X has a very strategic move. 00:02:21.980 --> 00:02:25.220 And a human that's very good at the game or a computer that can maybe 00:02:25.220 --> 00:02:27.200 try and figure out how to play this game well, 00:02:27.200 --> 00:02:29.510 might make an intelligent move like playing 00:02:29.510 --> 00:02:31.680 in the top right corner for example. 00:02:31.680 --> 00:02:34.020 And if X plays in the top right corner, well then 00:02:34.020 --> 00:02:36.440 O is going to need to play in the bottom left corner 00:02:36.440 --> 00:02:39.600 in order to block X from getting three in a row. 00:02:39.600 --> 00:02:42.590 And here maybe you can see one of a couple of possible good moves 00:02:42.590 --> 00:02:45.980 that X has now, but X could play a move like moving 00:02:45.980 --> 00:02:48.290 in the right square over there. 
00:02:48.290 --> 00:02:52.220 And now O is in a tricky position, X has one way that could potentially 00:02:52.220 --> 00:02:55.190 win the game horizontally, and another way they could potentially 00:02:55.190 --> 00:02:56.480 win the game vertically. 00:02:56.480 --> 00:02:59.030 And so O is going to have to block one of those ways, 00:02:59.030 --> 00:03:01.640 and maybe they go to block horizontally but then 00:03:01.640 --> 00:03:05.610 X is going to win no matter what just by playing in that bottom right corner. 00:03:05.610 --> 00:03:09.020 And so a human playing this game could reason through the game thinking 00:03:09.020 --> 00:03:12.680 about how the opponent might respond and how X would respond in turn 00:03:12.680 --> 00:03:15.800 and a computer might try to do the same thing for a game as simple 00:03:15.800 --> 00:03:18.890 as tic-tac-toe or a game even more complex. 00:03:18.890 --> 00:03:21.800 But AI goes beyond just game play, you might see examples 00:03:21.800 --> 00:03:24.470 like handwriting recognition, where nowadays computers 00:03:24.470 --> 00:03:28.070 are pretty good at taking human handwritten text, which can be messy, 00:03:28.070 --> 00:03:30.020 which is different from person to person, 00:03:30.020 --> 00:03:34.640 and somehow figuring out with pretty high accuracy exactly what characters 00:03:34.640 --> 00:03:36.440 the human was actually writing. 00:03:36.440 --> 00:03:39.560 AI comes up in areas like spam detection. 00:03:39.560 --> 00:03:42.020 Maybe in your email inbox you have your inbox 00:03:42.020 --> 00:03:45.620 and your spam email usually gets sorted into a separate folder 00:03:45.620 --> 00:03:49.040 where you might get a whole bunch of emails coming into your email inbox 00:03:49.040 --> 00:03:52.920 and somehow your computer is able to figure out with reasonable accuracy, 00:03:52.920 --> 00:03:56.030 which emails are good and which emails are spam.
00:03:56.030 --> 00:03:58.160 Now of course, the computer is not perfect at this. 00:03:58.160 --> 00:04:01.670 There are false positives, where the computer thinks that an email might 00:04:01.670 --> 00:04:03.320 be spam when it isn't actually. 00:04:03.320 --> 00:04:06.950 And there are false negatives too, where a spam email might accidentally 00:04:06.950 --> 00:04:09.800 end up in your inbox because the computer doesn't catch it, 00:04:09.800 --> 00:04:12.350 but those sorts of false positives and false negatives 00:04:12.350 --> 00:04:14.540 are the types of things that AI researchers are 00:04:14.540 --> 00:04:18.118 working on trying to reduce to make these systems more and more accurate. 00:04:18.118 --> 00:04:20.660 You see these kinds of systems show up as well if you've ever 00:04:20.660 --> 00:04:23.460 been on a video watching website, like YouTube or Netflix, 00:04:23.460 --> 00:04:27.350 for example, where you have watched a whole bunch of videos or TV shows 00:04:27.350 --> 00:04:31.400 or movies and then software behind Netflix or behind YouTube 00:04:31.400 --> 00:04:35.090 is able to give you recommendations, suggest other videos that you 00:04:35.090 --> 00:04:39.050 might be interested in watching based on the things that you've watched already. 00:04:39.050 --> 00:04:41.300 And in more recent years, AI has gotten pretty good 00:04:41.300 --> 00:04:45.470 at doing even more sophisticated things, things like generating data. 00:04:45.470 --> 00:04:48.650 Take a look at these two images, for example, of people 00:04:48.650 --> 00:04:50.600 and see if you notice anything strange. 00:04:50.600 --> 00:04:52.940 See if either of these people look strange to you. 00:04:52.940 --> 00:04:58.130 In fact, can you figure out which of these two images is not a real person? 00:04:58.130 --> 00:05:02.180 That is to say, a computer-generated person that looks like it's human 00:05:02.180 --> 00:05:04.260 but is not actually a photo of a real person. 
00:05:04.260 --> 00:05:06.635 So you can look at these two images carefully, maybe look 00:05:06.635 --> 00:05:09.380 at the eyes and the hair and the mouth and the nose 00:05:09.380 --> 00:05:11.750 and see if you can figure out which one it is. 00:05:11.750 --> 00:05:15.140 It turns out neither of these two images are images of real people. 00:05:15.140 --> 00:05:19.070 They're both computer generated, not photos of real people but a computer 00:05:19.070 --> 00:05:22.910 has been trained to generate images of people that look like real people that 00:05:22.910 --> 00:05:25.910 could fool someone into thinking that it's a real person, 00:05:25.910 --> 00:05:29.367 but it's all just AI generated information. 00:05:29.367 --> 00:05:31.700 So today we'll take a look at all of those ideas, how it 00:05:31.700 --> 00:05:33.620 is that artificial intelligence works. 00:05:33.620 --> 00:05:37.010 And ultimately one of the big takeaways is that AI is not just 00:05:37.010 --> 00:05:39.620 one algorithm or one principle, it's really 00:05:39.620 --> 00:05:43.310 a collection, a category of approaches to problem solving that can all 00:05:43.310 --> 00:05:46.400 be used to try and solve some of these problems of trying 00:05:46.400 --> 00:05:49.160 to make computers intelligent. 00:05:49.160 --> 00:05:51.350 So let's begin with one of the first areas 00:05:51.350 --> 00:05:53.750 where we might want to make our AI work and that's 00:05:53.750 --> 00:05:55.610 in the area of decision making. 00:05:55.610 --> 00:05:57.770 Very frequently we want to train a computer 00:05:57.770 --> 00:05:59.540 to be able to make a decision well. 00:05:59.540 --> 00:06:03.530 That decision might be deciding if an email is spam or not spam, 00:06:03.530 --> 00:06:06.320 or deciding whether or not to recommend a video to you, 00:06:06.320 --> 00:06:10.132 or it could be deciding what action to take in a game, for example. 
00:06:10.132 --> 00:06:11.840 So let's take a simple game, maybe you've 00:06:11.840 --> 00:06:15.722 played a game like this before where you control this paddle along the bottom 00:06:15.722 --> 00:06:18.680 and you're trying to bounce this ball in order to hit all of the bricks 00:06:18.680 --> 00:06:19.730 along the top. 00:06:19.730 --> 00:06:22.550 Imagine if you were trying to program a computer 00:06:22.550 --> 00:06:25.610 to be able to play this game strategically, to play this game well 00:06:25.610 --> 00:06:30.410 and the computer observed the ball move that way. 00:06:30.410 --> 00:06:34.190 So the ball is moving that way, what should you program the computer to do? 00:06:34.190 --> 00:06:37.220 Well, it makes logical sense that if the ball's moving to the left, then 00:06:37.220 --> 00:06:40.190 you should program the computer to move the paddle to the left as well, 00:06:40.190 --> 00:06:44.230 to try and catch that ball before it falls through the ground. 00:06:44.230 --> 00:06:47.360 And so you could take that kind of logic and encode it 00:06:47.360 --> 00:06:51.140 into a computer program using what we might call a decision tree. 00:06:51.140 --> 00:06:54.110 Decision trees are just a way of representing a decision 00:06:54.110 --> 00:06:56.670 that a computer might make by asking questions 00:06:56.670 --> 00:07:00.290 and depending on the answers to those questions we might ask another question 00:07:00.290 --> 00:07:02.400 or make some sort of decision. 00:07:02.400 --> 00:07:05.330 So for this game of the paddle, we could create a decision tree 00:07:05.330 --> 00:07:07.550 by asking a question like this, we could ask, 00:07:07.550 --> 00:07:10.280 is the ball to the left of the paddle? 
00:07:10.280 --> 00:07:13.910 If the answer to that question is yes, well then the action we should take 00:07:13.910 --> 00:07:17.360 is we should move the paddle to the left because the ball is moving left, 00:07:17.360 --> 00:07:19.640 the paddle should move left as well. 00:07:19.640 --> 00:07:22.610 If the answer to the question is no, well then we 00:07:22.610 --> 00:07:25.650 need to maybe ask another question, we might ask a question like, 00:07:25.650 --> 00:07:27.890 is the ball to the right of the paddle? 00:07:27.890 --> 00:07:29.862 And if the answer to that question is yes, 00:07:29.862 --> 00:07:32.070 then we'll go ahead and move the paddle to the right. 00:07:32.070 --> 00:07:34.272 And if the answer to the question is no, well that 00:07:34.272 --> 00:07:36.230 means the ball is not to the left of the paddle 00:07:36.230 --> 00:07:39.410 and it's not to the right of the paddle, so we may as well just not 00:07:39.410 --> 00:07:42.540 move the paddle at all in that case. 00:07:42.540 --> 00:07:45.140 So this again is that decision tree that can 00:07:45.140 --> 00:07:48.380 allow us to make these choices about what it is that our AI should 00:07:48.380 --> 00:07:52.398 do in this situation and we could take that decision tree and turn it 00:07:52.398 --> 00:07:55.190 into a kind of pseudocode, something that might look like something 00:07:55.190 --> 00:07:57.350 you would write in C or in Python. 00:07:57.350 --> 00:08:00.470 I might say while the game is ongoing, if the ball is 00:08:00.470 --> 00:08:03.620 to the left of the paddle, go ahead and move the paddle to the left. 00:08:03.620 --> 00:08:07.310 Else if the ball is to the right of the paddle, move the paddle to the right. 00:08:07.310 --> 00:08:10.800 And else if neither of those are true, then don't move the paddle at all. 
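The decision tree Brian just described could be sketched as a small Python function. This is a hypothetical sketch, not code from the lecture; the coordinate names ball_x and paddle_x are assumptions about how the game state might be represented:

```python
def paddle_move(ball_x: float, paddle_x: float) -> int:
    """Follow the decision tree: ask whether the ball is to the left
    of the paddle, then whether it is to the right.
    Returns -1 (move left), 1 (move right), or 0 (don't move)."""
    if ball_x < paddle_x:       # Is the ball to the left of the paddle?
        return -1               # Yes: move the paddle to the left.
    elif ball_x > paddle_x:     # Is the ball to the right of the paddle?
        return 1                # Yes: move the paddle to the right.
    else:
        return 0                # Neither: don't move the paddle at all.
```

The while-the-game-is-ongoing loop from the pseudocode would simply call this function once per frame.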
00:08:10.800 --> 00:08:12.890 So one of the advantages of this decision tree 00:08:12.890 --> 00:08:15.290 is that it translates quite nicely to the conditions 00:08:15.290 --> 00:08:18.600 that you're familiar with from the world of programming. 00:08:18.600 --> 00:08:21.300 So let's give this a try with something a little more complex. 00:08:21.300 --> 00:08:23.520 Let's take the game of tic-tac-toe for example. 00:08:23.520 --> 00:08:27.320 And imagine you were trying to program an AI to strategically play 00:08:27.320 --> 00:08:28.760 the game of tic-tac-toe. 00:08:28.760 --> 00:08:33.679 What questions might you ask in your decision tree, what yes or no questions 00:08:33.679 --> 00:08:37.377 might you want to ask to create an AI that can play this game well, 00:08:37.377 --> 00:08:38.960 that can play this game strategically? 00:08:38.960 --> 00:08:40.820 And maybe I'll take a volunteer if anyone 00:08:40.820 --> 00:08:45.020 wants to suggest one possible question I might want to ask of my AI 00:08:45.020 --> 00:08:48.660 as I'm trying to teach this AI how to play this game? 00:08:48.660 --> 00:08:52.830 Any thoughts for questions that I might ask in this situation? 00:08:52.830 --> 00:08:55.850 And let's go to-- 00:08:55.850 --> 00:08:59.390 let's see, let's try [? Vasily ?] if you have an idea? 00:08:59.390 --> 00:09:02.630 AUDIENCE: Maybe let's ask what is the probability 00:09:02.630 --> 00:09:04.243 of winning given a certain move? 00:09:04.243 --> 00:09:04.910 BRIAN YU: Great. 00:09:04.910 --> 00:09:05.870 And what would winning mean? 00:09:05.870 --> 00:09:08.578 How could you detect what it would mean to win, like what are you 00:09:08.578 --> 00:09:10.850 going to look for on the board? 00:09:10.850 --> 00:09:15.193 AUDIENCE: Three consecutive yeah, Xs or circles. 00:09:15.193 --> 00:09:17.360 BRIAN YU: Great, you're looking for some opportunity 00:09:17.360 --> 00:09:19.970 for like three consecutive Xs or three consecutive Os. 
00:09:19.970 --> 00:09:22.490 So one of the questions you might ask is checking 00:09:22.490 --> 00:09:26.842 to see whether or not you could get three in a row maybe on the next turn. 00:09:26.842 --> 00:09:29.300 If you already have two in a row and there's an empty space 00:09:29.300 --> 00:09:32.175 that could be an opportunity for something that we might want to try. 00:09:32.175 --> 00:09:35.720 Any other questions we might want to ask as part of our strategic decision 00:09:35.720 --> 00:09:38.390 making if we're trying to play this game of tic-tac-toe, 00:09:38.390 --> 00:09:41.510 any thoughts about other things we could try, 00:09:41.510 --> 00:09:44.870 other types of things we should be looking for or asking 00:09:44.870 --> 00:09:47.180 as we're training our artificial intelligence, as we're 00:09:47.180 --> 00:09:50.690 trying to program it to play this game strategically? 00:09:50.690 --> 00:09:53.770 Let's go to Santiago, you have an idea? 00:09:53.770 --> 00:09:54.730 AUDIENCE: Hello, Brian. 00:09:54.730 --> 00:09:58.788 If there is any possibility to win, the other player? 00:09:58.788 --> 00:10:00.080 BRIAN YU: For the other player? 00:10:00.080 --> 00:10:01.760 Yeah, so you might want to check the other player to see 00:10:01.760 --> 00:10:03.365 if they have some possibility to win. 00:10:03.365 --> 00:10:06.240 And if they have some possibility to win you could try to block them. 00:10:06.240 --> 00:10:08.032 So those are great questions you could ask. 00:10:08.032 --> 00:10:11.000 And we could start to formulate a decision tree for tic-tac-toe 00:10:11.000 --> 00:10:12.140 based on those questions. 00:10:12.140 --> 00:10:15.980 I might start by asking, can I get three in a row on this turn? 00:10:15.980 --> 00:10:19.050 And if the answer is yes, well, then my action is pretty straightforward. 00:10:19.050 --> 00:10:22.670 I should play in the square that will get me to three in a row. 
00:10:22.670 --> 00:10:25.130 If the answer to the question is no, well then I 00:10:25.130 --> 00:10:27.230 might ask Santiago's question, I might ask, 00:10:27.230 --> 00:10:30.800 can my opponent get three in a row on their next turn? 00:10:30.800 --> 00:10:33.128 And if the answer to that is yes, well then, we better 00:10:33.128 --> 00:10:35.670 play in the square to block them from getting three in a row, 00:10:35.670 --> 00:10:39.770 otherwise they're going to win the game if we don't block them now. 00:10:39.770 --> 00:10:41.690 If the answer to this question is no though, 00:10:41.690 --> 00:10:43.850 if I can't get three in a row in this turn 00:10:43.850 --> 00:10:46.640 and my opponent can't get three in a row in the next turn, 00:10:46.640 --> 00:10:48.405 well now it's a little bit less clear. 00:10:48.405 --> 00:10:50.780 You could maybe try and come up with additional questions 00:10:50.780 --> 00:10:53.750 you might ask but these two questions are very clear 00:10:53.750 --> 00:10:56.470 and the rest maybe a little bit less obvious but ultimately we 00:10:56.470 --> 00:10:58.970 don't want to be doing any of this as we're starting to make 00:10:58.970 --> 00:11:00.740 our AI more and more intelligent. 00:11:00.740 --> 00:11:02.960 As David was alluding to earlier, ultimately 00:11:02.960 --> 00:11:05.930 rather than us have to program every condition into the computer, 00:11:05.930 --> 00:11:08.390 telling it what to do in every situation, 00:11:08.390 --> 00:11:11.960 we'd like for the computer to be smart enough to just figure out on its own, 00:11:11.960 --> 00:11:14.480 analyze on its own all of the possibilities 00:11:14.480 --> 00:11:16.490 and figure out what move it should make. 00:11:16.490 --> 00:11:18.350 And ultimately we want our computer to make 00:11:18.350 --> 00:11:22.970 the optimal decision, the best possible decision when playing this game. 
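The two questions in that decision tree (can I get three in a row this turn, can my opponent get three in a row next turn) both come down to scanning the board for a line that has two matching marks and one empty square. A minimal sketch, assuming a hypothetical board representation of nine cells holding 'X', 'O', or None:

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winning_square(board, player):
    """Return the index of a square that would give `player` three
    in a row this turn, or None if no such square exists."""
    for a, b, c in LINES:
        cells = [board[a], board[b], board[c]]
        if cells.count(player) == 2 and cells.count(None) == 1:
            # The one empty cell in this line completes three in a row.
            return (a, b, c)[cells.index(None)]
    return None
```

The decision tree would first check winning_square(board, me) to take a win, then winning_square(board, opponent) to block one.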
00:11:22.970 --> 00:11:25.880 And so to do so we can introduce our first algorithm 00:11:25.880 --> 00:11:28.220 in artificial intelligence that we'll look at today 00:11:28.220 --> 00:11:30.020 and that algorithm is called minimax. 00:11:30.020 --> 00:11:32.540 Minimax is a game playing algorithm that's 00:11:32.540 --> 00:11:35.300 useful any time you have two competing players that 00:11:35.300 --> 00:11:38.540 are trying to play some sort of competitive game like tic-tac-toe, 00:11:38.540 --> 00:11:40.620 but it'll work for other games as well. 00:11:40.620 --> 00:11:43.460 And ultimately as we've seen all throughout CS50, 00:11:43.460 --> 00:11:47.082 we need some way of representing this game inside of the computer. 00:11:47.082 --> 00:11:49.790 And ultimately you'll recall way back from the beginning of CS50, 00:11:49.790 --> 00:11:53.150 we've been trying to take everything and represent it just with numbers 00:11:53.150 --> 00:11:54.920 and we can do the same thing with games. 00:11:54.920 --> 00:11:58.970 As far as our AI is concerned, there are only three possible outcomes 00:11:58.970 --> 00:12:01.250 of the game that our AI is going to care about, 00:12:01.250 --> 00:12:04.880 either O wins or X wins and X gets three in a row, 00:12:04.880 --> 00:12:08.690 or it's possible that neither side wins, that it ends up being a tie. 00:12:08.690 --> 00:12:12.500 And so what we're going to do is take each of these three possible outcomes 00:12:12.500 --> 00:12:14.645 and assign a number to each of those outcomes. 00:12:14.645 --> 00:12:16.520 We'll say O winning, we're just going to say, 00:12:16.520 --> 00:12:19.890 let's call that negative one, X winning, we'll call that one. 00:12:19.890 --> 00:12:22.458 And a tie, well that's somewhere in between the two. 00:12:22.458 --> 00:12:24.000 So we'll go ahead and call that zero. 00:12:24.000 --> 00:12:25.920 Nobody wins that game.
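The outcome encoding just described (X winning is one, O winning is negative one, a tie is zero) could be written as a small scoring helper for finished boards. A sketch, again assuming a hypothetical board of nine cells holding 'X', 'O', or None:

```python
def score(board):
    """Map a finished tic-tac-toe board to the lecture's encoding:
    1 if X has three in a row, -1 if O does, 0 for a tie."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return 1 if board[a] == 'X' else -1
    return 0    # No three in a row anywhere: nobody wins that game.
```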
00:12:25.920 --> 00:12:29.310 And now each player has a goal, some objective that we 00:12:29.310 --> 00:12:31.440 can quantify using these numbers. 00:12:31.440 --> 00:12:34.620 The X player, who we might also call the max player, 00:12:34.620 --> 00:12:39.150 aims to maximize the score, what's best for X is a score of one, 00:12:39.150 --> 00:12:40.780 meaning X is going to win. 00:12:40.780 --> 00:12:43.350 But if X can't win, well then tying the game 00:12:43.350 --> 00:12:46.790 is zero, that's better than losing the game which is negative one. 00:12:46.790 --> 00:12:48.540 Meanwhile the O player we're going to call 00:12:48.540 --> 00:12:52.120 the min player, the minimizing player, their goal is to minimize the score. 00:12:52.120 --> 00:12:54.730 Negative one is the best outcome for the O player, 00:12:54.730 --> 00:12:57.750 but if negative one can't happen, if O can't win, well, 00:12:57.750 --> 00:12:59.970 then tying the game is still better than X 00:12:59.970 --> 00:13:04.050 winning because X winning would mean our minimizing player, player O, 00:13:04.050 --> 00:13:06.430 is ultimately going to lose the game. 00:13:06.430 --> 00:13:10.680 And so we can take every board in tic-tac-toe and assign a score to it, 00:13:10.680 --> 00:13:13.680 it's either one or zero or negative one. 00:13:13.680 --> 00:13:17.640 So this board for example, this game is over, X has already won the game 00:13:17.640 --> 00:13:21.210 and because X has won the game, we're going to give that a value of one. 00:13:21.210 --> 00:13:26.490 One corresponds to X winning, negative one to O winning, zero for a tie. 00:13:26.490 --> 00:13:28.740 So let's now consider this game board. 00:13:28.740 --> 00:13:32.130 This game board isn't over yet, but we can still assign a score to it. 00:13:32.130 --> 00:13:36.690 Let's assume that it's X's turn, what score would we give this board? 
00:13:36.690 --> 00:13:38.730 Well, when we're giving each board a score, 00:13:38.730 --> 00:13:42.360 we're considering what would happen if both players were playing optimally. 00:13:42.360 --> 00:13:46.200 In other words, if both players are playing the best possible moves 00:13:46.200 --> 00:13:48.900 and in this case, if X is playing the best possible move, 00:13:48.900 --> 00:13:52.410 well then X is going to play in this square to get three in a row 00:13:52.410 --> 00:13:57.420 and so this board also has a value of one because if X plays their best move, 00:13:57.420 --> 00:14:01.610 ultimately X is going to win the game, that has a value of one. 00:14:01.610 --> 00:14:04.860 So let's take a look at another board, maybe one that's a little bit trickier. 00:14:04.860 --> 00:14:08.870 Now in this board let's assume it's O's turn, so O gets to make a move 00:14:08.870 --> 00:14:10.620 and then because there's one empty square, 00:14:10.620 --> 00:14:14.010 X will get to make a move after that, what value 00:14:14.010 --> 00:14:15.257 would you give to this board? 00:14:15.257 --> 00:14:16.590 Maybe we'll ask for a volunteer. 00:14:16.590 --> 00:14:19.350 Remember the possibilities are one if X is 00:14:19.350 --> 00:14:22.140 going to win, negative one if O is going to win, 00:14:22.140 --> 00:14:24.090 and a zero if it's going to be a tie. 00:14:24.090 --> 00:14:29.280 If both players play the best moves and it's O's turn right now, 00:14:29.280 --> 00:14:32.310 what value would you give to this board and why? 00:14:32.310 --> 00:14:35.160 Maybe let's go to Santiago, go ahead. 00:14:35.160 --> 00:14:38.220 AUDIENCE: Yeah, I would say that the value would be zero 00:14:38.220 --> 00:14:41.970 because if both players play their best move no one in the end 00:14:41.970 --> 00:14:46.145 will win because O will block X and then there would be nothing left to do. 00:14:46.145 --> 00:14:46.770 BRIAN YU: Yeah. 00:14:46.770 --> 00:14:47.980 Absolutely correct.
00:14:47.980 --> 00:14:51.570 It's certainly possible that X could win because X has two in a row, 00:14:51.570 --> 00:14:53.080 they could get three in a row. 00:14:53.080 --> 00:14:55.380 But if both players play their best moves, 00:14:55.380 --> 00:14:57.615 well then O is going to block here and then 00:14:57.615 --> 00:14:59.490 X is going to play in the upper left and it's 00:14:59.490 --> 00:15:02.790 going to be a tie, which means this board is going to have a value of zero. 00:15:02.790 --> 00:15:04.890 And the way a computer might figure that out 00:15:04.890 --> 00:15:07.380 is by reasoning it through exactly the same way we did, 00:15:07.380 --> 00:15:09.910 by considering both of the possible options. 00:15:09.910 --> 00:15:13.500 If I want to know the value for this board where it's O's turn, 00:15:13.500 --> 00:15:16.470 well then I'm going to consider both of the possible options. 00:15:16.470 --> 00:15:19.920 O has two choices, O could play in the top left 00:15:19.920 --> 00:15:23.220 or O could block X by playing in the bottom there 00:15:23.220 --> 00:15:26.490 and the computer would consider both of those possible choices 00:15:26.490 --> 00:15:30.800 and try and figure out what value each of those boards has. 00:15:30.800 --> 00:15:33.120 So what happens if O plays in the top left? 00:15:33.120 --> 00:15:37.860 Well, then X is going to play in the bottom and X is going to win the game. 00:15:37.860 --> 00:15:40.500 X winning the game, that has a value of one, 00:15:40.500 --> 00:15:43.787 remember X winning, that's the value of one, which means this board also 00:15:43.787 --> 00:15:45.120 is going to have a value of one. 00:15:45.120 --> 00:15:47.487 So X wins, that's not great. 
00:15:47.487 --> 00:15:49.320 So let's consider the other possible choice, 00:15:49.320 --> 00:15:53.130 O could also have blocked X by playing in the bottom here 00:15:53.130 --> 00:15:56.610 and in that case, that's going to lead to X playing in the upper left, 00:15:56.610 --> 00:16:00.270 it's the only other possible option, that board has a value of zero, 00:16:00.270 --> 00:16:02.040 it's a tie since nobody won. 00:16:02.040 --> 00:16:05.100 Which means this board also has a value of zero. 00:16:05.100 --> 00:16:08.490 And now the minimizing player is going to look at these two options. 00:16:08.490 --> 00:16:11.520 If I play in the top left, that's going to be a value of one, 00:16:11.520 --> 00:16:13.050 that's not good for me. 00:16:13.050 --> 00:16:16.980 If I play in the bottom, that's going to be a value of zero, so that's better. 00:16:16.980 --> 00:16:19.680 And so ultimately we can conclude that this board up 00:16:19.680 --> 00:16:24.150 top as Santiago correctly stated, is also going to have a value of zero. 00:16:24.150 --> 00:16:28.050 If both players play their best moves it's ultimately going to be a tie. 00:16:28.050 --> 00:16:30.330 And this is what we may call a game tree, where 00:16:30.330 --> 00:16:33.450 you're exploring all of the possible branches, all of the ways 00:16:33.450 --> 00:16:34.590 this game could go. 00:16:34.590 --> 00:16:37.255 And from here, we're two moves away from the end of the game. 00:16:37.255 --> 00:16:40.380 But you could back up and consider what would it look like three moves away 00:16:40.380 --> 00:16:41.400 from the end of the game. 00:16:41.400 --> 00:16:43.525 And you'd end up with a bigger tree because you now 00:16:43.525 --> 00:16:45.870 need to explore even more possibilities if you're 00:16:45.870 --> 00:16:50.280 trying to figure out what the value of any particular game board is. 
00:16:50.280 --> 00:16:52.380 And so that's the way the minimax algorithm works, 00:16:52.380 --> 00:16:55.110 consider all of the possible moves you could make 00:16:55.110 --> 00:16:59.337 and then recursively in a sense, consider if I make my best move, what's 00:16:59.337 --> 00:17:01.170 my opponent going to do in response and then 00:17:01.170 --> 00:17:03.130 what could I do in response to that? 00:17:03.130 --> 00:17:05.790 And we could formulate that as a little bit of pseudocode 00:17:05.790 --> 00:17:10.290 by saying if the player is X, well then for all of the possible moves 00:17:10.290 --> 00:17:12.750 let's go ahead and calculate a score for that board. 00:17:12.750 --> 00:17:15.359 The same way we were just doing, by recursively following 00:17:15.359 --> 00:17:18.119 all the possible moves, let's get a score for that board 00:17:18.119 --> 00:17:21.839 and let's choose the move with the highest score. 00:17:21.839 --> 00:17:24.900 And otherwise, if the player is O, well, then we'll do the same thing. 00:17:24.900 --> 00:17:28.319 For all the possible moves, we'll calculate a score for that board, 00:17:28.319 --> 00:17:31.570 but then we'll choose the move with the lowest score. 00:17:31.570 --> 00:17:34.710 So the X player is picking the move that maximizes the score, 00:17:34.710 --> 00:17:38.100 the O player is picking the move that minimizes the score. 00:17:38.100 --> 00:17:40.590 And using this approach you can create an AI 00:17:40.590 --> 00:17:42.690 that can play a game like tic-tac-toe perfectly. 00:17:42.690 --> 00:17:46.440 It'll never lose the game if you've programmed it correctly. 00:17:46.440 --> 00:17:49.710 And this works not only for tic-tac-toe but for other games as well. 00:17:49.710 --> 00:17:52.500 But do you see any problems with this approach? 
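The pseudocode Brian describes could be fleshed out into a working minimax for tic-tac-toe. A minimal sketch, assuming a hypothetical nine-cell board of 'X', 'O', or None; the function returns both the board's value and the best move, using the lecture's scores of 1 for X winning, -1 for O winning, and 0 for a tie:

```python
def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) for `player`, assuming optimal play.
    X maximizes the score, O minimizes it."""
    w = winner(board)
    if w == 'X':
        return 1, None
    if w == 'O':
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None          # Board full with no winner: a tie.
    other = 'O' if player == 'X' else 'X'
    results = []
    for m in moves:
        board[m] = player
        s, _ = minimax(board, other)   # Recurse: opponent replies optimally.
        board[m] = None                # Undo the move before trying the next.
        results.append((s, m))
    # Tuples compare by score first, so X takes the highest-scoring
    # move and O takes the lowest-scoring one.
    return max(results) if player == 'X' else min(results)
```

On the boards from the lecture this finds the winning square for X, and for O it prefers the blocking move that forces the tie.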
00:17:52.500 --> 00:17:56.010 This approach of considering all of my possible moves, then 00:17:56.010 --> 00:17:58.980 all of my opponent's possible responses to those moves, and then 00:17:58.980 --> 00:18:02.400 all of my possible responses to those moves recursively 00:18:02.400 --> 00:18:04.230 until we get down to the end of the game. 00:18:04.230 --> 00:18:06.180 Any possible problems that might arise? 00:18:06.180 --> 00:18:08.088 Let's go to Sophia maybe. 00:18:08.088 --> 00:18:11.130 AUDIENCE: It can take a really long time to like explore all the branches 00:18:11.130 --> 00:18:12.780 and actually come to a conclusion. 00:18:12.780 --> 00:18:14.655 BRIAN YU: Yeah, the amount of time it's going 00:18:14.655 --> 00:18:16.710 to take to explore all of those possible moves, 00:18:16.710 --> 00:18:19.740 it could be quite large depending on how complex the game is. 00:18:19.740 --> 00:18:23.220 For a game like tic-tac-toe, maybe think about how many possible games 00:18:23.220 --> 00:18:24.450 of tic-tac-toe there are. 00:18:24.450 --> 00:18:30.210 It turns out there are about 255,000 possible games of tic-tac-toe, which 00:18:30.210 --> 00:18:32.190 seems like a lot, but computers are pretty fast 00:18:32.190 --> 00:18:35.460 and computers can make it through all 255,000 00:18:35.460 --> 00:18:38.790 of these possible tic-tac-toe games usually in a matter of seconds 00:18:38.790 --> 00:18:40.500 on most modern computers. 00:18:40.500 --> 00:18:43.110 But if you imagine a game like chess for example, 00:18:43.110 --> 00:18:46.870 known for being a fairly complex game, how many possible games of chess 00:18:46.870 --> 00:18:47.370 are there? 00:18:47.370 --> 00:18:51.690 If there are 255,000 possible games of tic-tac-toe, 00:18:51.690 --> 00:18:53.680 how many possible games of chess are there? 
00:18:53.680 --> 00:18:57.360 Well, it turns out that only after the first four moves, 00:18:57.360 --> 00:19:00.060 if you only look at the first four moves of chess, 00:19:00.060 --> 00:19:04.170 there are 288 billion possible chess games, which 00:19:04.170 --> 00:19:06.288 is a lot for any computer to try to deal with 00:19:06.288 --> 00:19:07.830 and that's only the first four moves. 00:19:07.830 --> 00:19:11.610 If you consider the entire game of chess, nobody knows the answer exactly, 00:19:11.610 --> 00:19:15.150 but people estimate that 10 to the 29,000 00:19:15.150 --> 00:19:18.750 is probably a lower bound on the number of possible chess games. 00:19:18.750 --> 00:19:21.090 The actual number is probably higher than that. 00:19:21.090 --> 00:19:25.080 That is far too many for any computer to be able to reasonably get through 00:19:25.080 --> 00:19:26.440 in any amount of time. 00:19:26.440 --> 00:19:28.470 Computers as they exist now are not going 00:19:28.470 --> 00:19:31.650 to be able to consider all of the possible moves going 00:19:31.650 --> 00:19:34.050 all the way to the end of the game in order to figure out 00:19:34.050 --> 00:19:35.383 what the score for any board is. 00:19:35.383 --> 00:19:38.640 Which is why a game like chess is much harder for an AI 00:19:38.640 --> 00:19:40.530 to play than a game like tic-tac-toe. 00:19:40.530 --> 00:19:41.820 But it's not impossible. 00:19:41.820 --> 00:19:43.830 And in fact, the best computer chess players 00:19:43.830 --> 00:19:47.740 are far better than the best human chess players at this point. 00:19:47.740 --> 00:19:50.670 So what could we do, what maybe improvement could we make 00:19:50.670 --> 00:19:52.710 or what change could we make to the algorithm 00:19:52.710 --> 00:19:56.730 so that our AI can still play a game like chess even though there 00:19:56.730 --> 00:19:59.057 are so many additional possible moves? 
00:19:59.057 --> 00:20:01.140 There are a lot of possible answers but any ideas, 00:20:01.140 --> 00:20:04.080 anyone want to offer a thought or a suggestion 00:20:04.080 --> 00:20:07.770 as to how we might address this problem? 00:20:07.770 --> 00:20:11.160 There are so many moves to consider, too many for a computer 00:20:11.160 --> 00:20:15.000 to ever reasonably try to attempt. 00:20:15.000 --> 00:20:18.680 Any thoughts on something we could try? 00:20:18.680 --> 00:20:19.700 Let's go to Kurt. 00:20:19.700 --> 00:20:21.320 Yeah, you have an idea? 00:20:21.320 --> 00:20:24.140 AUDIENCE: Yeah, just if there was some way along the way 00:20:24.140 --> 00:20:27.590 to maybe assign some point values where you could, below some threshold maybe 00:20:27.590 --> 00:20:29.572 you could like discard those paths? 00:20:29.572 --> 00:20:32.780 BRIAN YU: Yeah, we want some way to be able to more intelligently not explore 00:20:32.780 --> 00:20:36.140 all of the paths, but somehow like discard some of the paths 00:20:36.140 --> 00:20:38.810 or not consider some of them or stop ourselves somewhere. 00:20:38.810 --> 00:20:40.850 So that rather than consider all of the 10 00:20:40.850 --> 00:20:45.140 to the 29,000 possible games of chess, we just consider fewer of them. 00:20:45.140 --> 00:20:49.137 Exploring fewer possible games, exploring fewer possible levels 00:20:49.137 --> 00:20:51.470 and so there are many variants on the minimax algorithm. 00:20:51.470 --> 00:20:55.280 One of the most common being what's called depth-limited minimax, where 00:20:55.280 --> 00:20:59.510 in depth-limited minimax, we consider what's called an evaluation function. 00:20:59.510 --> 00:21:01.940 Rather than follow the game until the very end, 00:21:01.940 --> 00:21:04.190 we follow the game some number of moves.
00:21:04.190 --> 00:21:07.010 In chess maybe you're going 15, 16 moves, but not all the way 00:21:07.010 --> 00:21:10.520 to the end of the game and then you're just asking a question like all right, 00:21:10.520 --> 00:21:13.063 at this point in the game, who seems likely to win? 00:21:13.063 --> 00:21:15.230 Even if we're not going to calculate it all the way, 00:21:15.230 --> 00:21:17.660 you can maybe make some judgment by counting up 00:21:17.660 --> 00:21:20.750 how many pieces of what value each side happens to have. 00:21:20.750 --> 00:21:23.570 And you might come up with other strategies for different games. 00:21:23.570 --> 00:21:26.990 But this now is where we need to start being even more 00:21:26.990 --> 00:21:30.770 intelligent, to thinking about how can we come up with a good evaluation 00:21:30.770 --> 00:21:34.940 function that's able to take a complex game and estimate who 00:21:34.940 --> 00:21:38.080 is potentially likely to win. 00:21:38.080 --> 00:21:40.690 So that then is minimax, an example of an algorithm that 00:21:40.690 --> 00:21:42.580 can be used to play games. 00:21:42.580 --> 00:21:46.060 And I'll pause here for any questions about minimax or game 00:21:46.060 --> 00:21:48.400 playing artificial intelligence before we 00:21:48.400 --> 00:21:52.280 move on to an entirely different type of artificial intelligence. 00:21:52.280 --> 00:21:53.265 Yeah, Sophia, go ahead. 00:21:53.265 --> 00:21:53.890 AUDIENCE: Yeah. 00:21:53.890 --> 00:21:56.230 I just have a question, like this algorithm 00:21:56.230 --> 00:21:58.837 is kind of following like how you would think about the game, 00:21:58.837 --> 00:22:01.420 but are there any other algorithms that like don't necessarily 00:22:01.420 --> 00:22:04.715 look like step by step evaluations? 00:22:04.715 --> 00:22:07.840 BRIAN YU: Yeah, there are definitely other algorithms and other approaches. 
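[The minimax idea discussed above, applied to tic-tac-toe, which is small enough to search exhaustively, might look like the following sketch in Python. The board representation and function names here are illustrative, not CS50's distribution code.]

```python
# A minimal minimax sketch for tic-tac-toe (illustrative, not CS50's own code).
# The board is a list of 9 cells: "X", "O", or None (empty).
# "X" tries to maximize the score, "O" tries to minimize it.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move): +1 if X can force a win, -1 if O can, 0 for a draw."""
    w = winner(board)
    if w == "X":
        return 1, None
    if w == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None                      # board full with no winner: a draw
    best = None
    for m in moves:
        board[m] = player                   # try this move...
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[m] = None                     # ...then undo it and try the next
        if (best is None
                or (player == "X" and score > best[0])
                or (player == "O" and score < best[0])):
            best = (score, m)
    return best
```

[A depth-limited variant, as described in the lecture, would additionally stop recursing after some fixed number of moves and return the estimate from an evaluation function instead of searching to the end of the game.]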
00:22:07.840 --> 00:22:10.120 In fact, right now what we've given an example of 00:22:10.120 --> 00:22:12.880 is really just an example of an exhaustive search, 00:22:12.880 --> 00:22:14.980 searching for all of the possible different moves 00:22:14.980 --> 00:22:16.880 and then making a judgment based on that. 00:22:16.880 --> 00:22:20.350 We'll see some examples of more intelligent algorithms or algorithms 00:22:20.350 --> 00:22:22.043 that can learn later today in fact. 00:22:22.043 --> 00:22:24.460 And we'll take a look at some of those other possibilities 00:22:24.460 --> 00:22:25.850 and how we might apply them. 00:22:25.850 --> 00:22:29.830 And so this then is really an example of what we might call a search algorithm. 00:22:29.830 --> 00:22:32.440 A search algorithm is just a type of algorithm 00:22:32.440 --> 00:22:35.710 where we're looking for some solution or some answer. 00:22:35.710 --> 00:22:38.200 It could be looking for the best move to make in a game 00:22:38.200 --> 00:22:41.170 or you might imagine a search problem more abstractly of something 00:22:41.170 --> 00:22:43.180 like finding your way through a maze. 00:22:43.180 --> 00:22:46.270 You're trying to get from one point in a maze to another point in a maze 00:22:46.270 --> 00:22:49.690 and you have to make decisions about which way to turn and how to go. 00:22:49.690 --> 00:22:51.730 And the real world application of this might 00:22:51.730 --> 00:22:53.380 be something like driving directions. 00:22:53.380 --> 00:22:55.610 If you go into Google Maps for example. 00:22:55.610 --> 00:22:58.600 And you type where you are and where you're trying to get to, 00:22:58.600 --> 00:23:02.470 Google Maps pretty quickly can give you a fairly optimal route, some path 00:23:02.470 --> 00:23:05.630 to take which way to turn and when, when to make which decision, that 00:23:05.630 --> 00:23:07.880 will get you from one point to another. 
00:23:07.880 --> 00:23:10.150 This is an example of a search problem, Google Maps 00:23:10.150 --> 00:23:13.030 needs to somehow search through all of the possible routes you 00:23:13.030 --> 00:23:17.260 can take to figure out how to get you to the place that you're going. 00:23:17.260 --> 00:23:20.290 And so we could come up with an algorithm for trying to do that. 00:23:20.290 --> 00:23:24.490 Imagine we have a maze like this, we're trying to get from point A to point B 00:23:24.490 --> 00:23:26.440 but of course, we have some walls here, these 00:23:26.440 --> 00:23:29.050 grayed out squares are walls that we can't cross through, 00:23:29.050 --> 00:23:33.050 but we'd still like to try and find some path that will get us from point A 00:23:33.050 --> 00:23:33.550 to point B. 00:23:33.550 --> 00:23:36.490 There are a number of algorithms that you might try. 00:23:36.490 --> 00:23:39.790 And you could think about if you were solving this maze as a person, 00:23:39.790 --> 00:23:40.720 how would you do this? 00:23:40.720 --> 00:23:43.303 How would you decide what decision to make, whether to go left 00:23:43.303 --> 00:23:44.418 or whether to go right? 00:23:44.418 --> 00:23:46.210 But as far as the computer is concerned, we 00:23:46.210 --> 00:23:48.130 need to give it a very precise algorithm. 00:23:48.130 --> 00:23:50.860 And one of the simplest algorithms we might come up with 00:23:50.860 --> 00:23:53.200 is one called depth-first search. 00:23:53.200 --> 00:23:56.650 And the way that depth-first search would navigate its way through a maze 00:23:56.650 --> 00:24:00.010 is to say, the AI is going to just follow a path 00:24:00.010 --> 00:24:03.410 and if ever the AI needs to make a choice, it reaches a fork in the road, 00:24:03.410 --> 00:24:06.250 so to speak, where it could go left, or it could go right, 00:24:06.250 --> 00:24:07.875 it's just going to choose one randomly. 
00:24:07.875 --> 00:24:10.667 It doesn't know which way is better, so it's going to pick one path 00:24:10.667 --> 00:24:11.900 and it's going to try it. 00:24:11.900 --> 00:24:13.700 And if ever it hits a dead end, it reaches 00:24:13.700 --> 00:24:15.910 some wall where it can't make any more progress, 00:24:15.910 --> 00:24:20.150 then our AI is just going to back up and it's going to try another path instead. 00:24:20.150 --> 00:24:22.900 So I can show you what that looks like, we're starting at point A, 00:24:22.900 --> 00:24:26.170 we're trying to get to point B. And the way our AI might work 00:24:26.170 --> 00:24:28.935 is that we might start by just following one square after another. 00:24:28.935 --> 00:24:31.060 Initially, we don't have much choice in the matter, 00:24:31.060 --> 00:24:34.480 we haven't reached any branching points or any forks in the road. 00:24:34.480 --> 00:24:38.260 But at this point right here, now we have a decision, we could go left 00:24:38.260 --> 00:24:39.580 or we could go right. 00:24:39.580 --> 00:24:43.900 And as far as depth-first search, otherwise known as DFS is concerned, 00:24:43.900 --> 00:24:45.400 DFS doesn't know which way to go. 00:24:45.400 --> 00:24:48.400 It doesn't know whether to go left, it doesn't know whether to go right, 00:24:48.400 --> 00:24:49.720 so we pick one randomly. 00:24:49.720 --> 00:24:51.440 We might choose to go left for example. 00:24:51.440 --> 00:24:52.960 So we're going to go left and we're going to keep 00:24:52.960 --> 00:24:55.510 following the path until we get to another fork in the road. 00:24:55.510 --> 00:24:58.450 We could go left or right, DFS doesn't know which to pick, 00:24:58.450 --> 00:25:00.820 so we're going to choose randomly, maybe we'll go left 00:25:00.820 --> 00:25:02.620 and now we hit a dead end. 00:25:02.620 --> 00:25:06.470 At that point our algorithm knows this was not a good path to take. 
00:25:06.470 --> 00:25:09.640 So it's going to back up to the latest decision point 00:25:09.640 --> 00:25:11.010 and make a different choice. 00:25:11.010 --> 00:25:14.260 So it's going to back up to right here and say, all right, I tried going left, 00:25:14.260 --> 00:25:15.170 it didn't work. 00:25:15.170 --> 00:25:16.880 Let's try going right instead. 00:25:16.880 --> 00:25:19.433 So it's going to go right, it'll hit a dead end, 00:25:19.433 --> 00:25:22.600 realize that OK, this was not a good path to take, it just led to dead ends, 00:25:22.600 --> 00:25:23.810 it didn't go anywhere. 00:25:23.810 --> 00:25:26.547 So instead of going left, let's try going right. 00:25:26.547 --> 00:25:29.380 So we'll try this path, maybe we'll try going up but hit a dead end, 00:25:29.380 --> 00:25:30.880 so we'll try going to the right. 00:25:30.880 --> 00:25:33.688 Here again, we hit a decision, we're going to make some choice. 00:25:33.688 --> 00:25:35.980 And again, we're just going to repeatedly hit dead ends 00:25:35.980 --> 00:25:42.470 and try new paths until eventually we make our way to the destination. 00:25:42.470 --> 00:25:44.740 And so that is depth-first search, and as long 00:25:44.740 --> 00:25:47.500 as there are like a finite number of squares, 00:25:47.500 --> 00:25:51.530 this algorithm is eventually going to find a path to the destination 00:25:51.530 --> 00:25:52.660 if such a path exists. 00:25:52.660 --> 00:25:55.810 Like if there is a way to get from point A to point B, 00:25:55.810 --> 00:25:59.282 then this algorithm will eventually find it because it tries something 00:25:59.282 --> 00:26:01.240 and if we reach a dead end then we try it again 00:26:01.240 --> 00:26:06.190 and we keep doing that until eventually we find our way to the destination. 00:26:06.190 --> 00:26:08.290 But this algorithm isn't great. 00:26:08.290 --> 00:26:11.570 There are certainly some problems, some maybe room for improvement. 
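[The depth-first search just described — follow one path, and back up to the latest decision point on a dead end — can be sketched in Python as below. The grid maze here is a made-up example, not the one on screen in lecture.]

```python
# A sketch of depth-first search (DFS) on a small grid maze.
# "#" marks a wall; any other character is an open square.

def dfs(maze, start, goal):
    """Return some path from start to goal as a list of (row, col), or None."""
    rows, cols = len(maze), len(maze[0])
    visited = set()

    def explore(cell):
        if cell == goal:
            return [cell]
        visited.add(cell)
        r, c = cell
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:  # up, down, left, right
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and maze[nxt[0]][nxt[1]] != "#" and nxt not in visited):
                rest = explore(nxt)        # follow this path...
                if rest is not None:       # ...and back up if it dead-ends
                    return [cell] + rest
        return None                        # dead end: backtrack

    return explore(start)

# A hypothetical maze: start at A = (0, 0), goal at B = (2, 3).
MAZE = ["A.#.",
        ".#..",
        "...B"]
```

[Because each square is marked visited, the search never loops forever, so on a finite maze it finds some path whenever one exists — just not necessarily the shortest one.]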
00:26:11.570 --> 00:26:15.850 So maybe I'll ask all of you, what problems do you see with this approach? 00:26:15.850 --> 00:26:17.830 Maybe reasons why this algorithm might not 00:26:17.830 --> 00:26:22.030 be ideal, areas where we might be able to improve upon this algorithm 00:26:22.030 --> 00:26:24.280 as it stands right now? 00:26:24.280 --> 00:26:25.720 This again is depth-first search. 00:26:25.720 --> 00:26:28.390 Let's go to Joy. 00:26:28.390 --> 00:26:30.223 AUDIENCE: I think it is very time consuming. 00:26:30.223 --> 00:26:31.932 BRIAN YU: Yeah, it's very time consuming. 00:26:31.932 --> 00:26:33.820 And it's time consuming in a number of ways. 00:26:33.820 --> 00:26:37.640 In one sense, we're exploring a lot of paths that really don't lead anywhere 00:26:37.640 --> 00:26:40.730 in the sense that we explored all of this area 00:26:40.730 --> 00:26:43.550 but ultimately that didn't help us towards trying to find the goal. 00:26:43.550 --> 00:26:47.570 And it's also time consuming in the route that it ultimately finds, 00:26:47.570 --> 00:26:52.190 like if you imagine using Google Maps, for example, to navigate 00:26:52.190 --> 00:26:54.830 our way from one place to another, it's likely 00:26:54.830 --> 00:26:57.680 going to be the case that you want to find an efficient route, 00:26:57.680 --> 00:27:00.140 you want like the fastest way to get from where 00:27:00.140 --> 00:27:01.545 you are to where you want to go. 00:27:01.545 --> 00:27:04.670 And if Google Maps were to give you like a long and winding route with lots 00:27:04.670 --> 00:27:07.340 of detours that got you to the destination but took 00:27:07.340 --> 00:27:09.410 much longer than it needed to, that's probably 00:27:09.410 --> 00:27:11.510 not the best user experience for you. 00:27:11.510 --> 00:27:15.450 And depth-first search unfortunately could run into just that situation. 
00:27:15.450 --> 00:27:19.155 Imagine a maze like this, where from point A, we could go up 00:27:19.155 --> 00:27:21.530 or we could go to the right, we don't know which to take, 00:27:21.530 --> 00:27:24.270 so depth-first search will just randomly choose. 00:27:24.270 --> 00:27:27.470 We might choose to go up and maybe we'll go right 00:27:27.470 --> 00:27:30.020 and we'll find our way to the destination. 00:27:30.020 --> 00:27:33.710 Sure, depth-first search was able to find us a path from A to B, 00:27:33.710 --> 00:27:35.960 but remember that often what we want to do 00:27:35.960 --> 00:27:37.850 is we want to make the optimal decision. 00:27:37.850 --> 00:27:41.905 This is a correct path for getting from A to B, but it's not the best path. 00:27:41.905 --> 00:27:44.030 If we're looking for the shortest path from A to B, 00:27:44.030 --> 00:27:46.460 well that's going to be this one, it's going 00:27:46.460 --> 00:27:50.537 to be this path here that goes from A to B there. 00:27:50.537 --> 00:27:52.370 So we'd like some way to fix that, we'd like 00:27:52.370 --> 00:27:55.820 an algorithm that is able to find us the shortest path, 00:27:55.820 --> 00:27:57.990 not just finding us any path. 00:27:57.990 --> 00:28:01.610 And so for that, we can use an algorithm called breadth-first search, otherwise 00:28:01.610 --> 00:28:03.770 known as BFS. 00:28:03.770 --> 00:28:06.800 And the way this algorithm works is that instead of whenever we 00:28:06.800 --> 00:28:09.800 reach a fork in the road, pick one until we hit a dead end 00:28:09.800 --> 00:28:11.120 and then pick the other. 
00:28:11.120 --> 00:28:14.570 What breadth-first search is going to do is whenever we hit a fork in the road, 00:28:14.570 --> 00:28:16.820 let's take both paths, let's take one step on the left 00:28:16.820 --> 00:28:19.487 and then one step on the right and then another step on the left 00:28:19.487 --> 00:28:21.590 and another step on the right and effectively just 00:28:21.590 --> 00:28:23.683 search outwards from the starting point. 00:28:23.683 --> 00:28:26.600 We're going to start from the beginning and look for all of the places 00:28:26.600 --> 00:28:29.870 we can get to by taking one step away from A, 00:28:29.870 --> 00:28:32.870 and then everywhere we can get to taking two steps away from A, 00:28:32.870 --> 00:28:35.760 and then three steps away from A, so on and so forth. 00:28:35.760 --> 00:28:37.760 So we're always looking at the things that 00:28:37.760 --> 00:28:41.850 are nearby before we look at the things that are further away. 00:28:41.850 --> 00:28:45.548 So the way this might work is that from A we're going to take one step up 00:28:45.548 --> 00:28:47.840 and then we're going to search one square to the right. 00:28:47.840 --> 00:28:51.090 And then we'll look a second square on this direction and then a second square 00:28:51.090 --> 00:28:54.080 along the path to the right and then a third square and a third square 00:28:54.080 --> 00:28:56.810 and we're going to repeat that process, effectively alternating 00:28:56.810 --> 00:28:59.540 between all of the different options that we have 00:28:59.540 --> 00:29:03.470 until eventually we find this optimal path from A to B. 00:29:03.470 --> 00:29:06.050 And because we're looking for the shorter paths 00:29:06.050 --> 00:29:09.290 before we look for those longer paths, eventually we're 00:29:09.290 --> 00:29:12.958 going to find that shortest possible path. 00:29:12.958 --> 00:29:14.750 Now there are still problems with this too. 
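[The breadth-first search just described — explore everything one step away, then two steps, and so on — is a sketch away in Python. Using a queue (first in, first out) for the frontier is what produces that nearby-before-far-away order; again the maze is an invented example.]

```python
from collections import deque

# A sketch of breadth-first search (BFS) on a grid maze. Because the frontier
# is a queue, all squares n steps from the start are explored before any square
# n + 1 steps away, so the first path found to the goal is a shortest one.

def bfs(maze, start, goal):
    rows, cols = len(maze), len(maze[0])
    frontier = deque([start])
    came_from = {start: None}            # also serves as the visited set
    while frontier:
        cell = frontier.popleft()        # oldest first: nearby before far away
        if cell == goal:                 # reconstruct the path by walking back
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and maze[nxt[0]][nxt[1]] != "#" and nxt not in came_from):
                came_from[nxt] = cell
                frontier.append(nxt)
    return None

# A hypothetical maze with a wall in the middle; A = (0, 0), B = (2, 2).
MAZE = ["A..",
        ".#.",
        "..B"]
```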
00:29:14.750 --> 00:29:17.600 As Joy pointed out, these algorithms have a tendency 00:29:17.600 --> 00:29:20.310 to explore a lot of paths unnecessarily. 00:29:20.310 --> 00:29:22.650 Let's go back to that original maze for example 00:29:22.650 --> 00:29:24.980 and consider what would breadth-first search 00:29:24.980 --> 00:29:27.350 do if I presented it with this maze. 00:29:27.350 --> 00:29:29.480 We'll consider what might happen, we might 00:29:29.480 --> 00:29:32.250 go until we reach the first decision point right here, 00:29:32.250 --> 00:29:33.840 we could go left or right. 00:29:33.840 --> 00:29:36.260 And remember, depth-first search picked one 00:29:36.260 --> 00:29:38.060 and just followed it until a dead end. 00:29:38.060 --> 00:29:40.953 What breadth-first search is going to do is it's going to pick both. 00:29:40.953 --> 00:29:42.870 It's going to go to the left and to the right. 00:29:42.870 --> 00:29:45.412 And then to the left and to the right and basically alternate 00:29:45.412 --> 00:29:48.830 between all of those until we reach another decision point and then 00:29:48.830 --> 00:29:50.920 it's going to consider all of those possibilities. 00:29:50.920 --> 00:29:55.550 Let's go left or left here and right and consider these possibilities too 00:29:55.550 --> 00:29:57.170 and it's going to keep exploring. 00:29:57.170 --> 00:29:59.780 Any time we reach any decision point, it's 00:29:59.780 --> 00:30:05.030 going to explore this way and that way, this way and that way, 00:30:05.030 --> 00:30:06.290 over and over again. 00:30:06.290 --> 00:30:08.090 And ultimately, if we repeat this process 00:30:08.090 --> 00:30:11.360 and consider what breadth-first search is looking for, 00:30:11.360 --> 00:30:16.010 sure it's going to find me the shortest and in this case, the only possible 00:30:16.010 --> 00:30:20.870 path to get from A to B but it took such a long time to be able to do so. 
00:30:20.870 --> 00:30:22.980 It looked at so many different squares, in fact, 00:30:22.980 --> 00:30:25.340 it looked at all of the squares in order to do 00:30:25.340 --> 00:30:29.940 something as simple as find the shortest path to get from one point to another. 00:30:29.940 --> 00:30:33.800 And so here we can try to start to be a little bit more intelligent. 00:30:33.800 --> 00:30:37.520 What we'd ideally like to do is that when we reach this first decision 00:30:37.520 --> 00:30:41.570 point, go left or go right, most humans if you gave them this maze 00:30:41.570 --> 00:30:44.270 and told you to try to solve it at this decision point, 00:30:44.270 --> 00:30:48.050 you wouldn't just pick randomly, you would choose to go to the right 00:30:48.050 --> 00:30:50.927 because you know the goal is somewhere to the right 00:30:50.927 --> 00:30:52.760 and so it's probably the case that we should 00:30:52.760 --> 00:30:57.050 follow that direction if we're trying to find our way to the end of the maze. 00:30:57.050 --> 00:30:59.900 And so these algorithms we've looked at so far, 00:30:59.900 --> 00:31:02.090 depth-first search and breadth-first search 00:31:02.090 --> 00:31:05.330 are examples of what we might call uninformed search. 00:31:05.330 --> 00:31:08.060 They're not using any specialized knowledge about the problem. 00:31:08.060 --> 00:31:10.018 They don't really know anything about the maze. 00:31:10.018 --> 00:31:12.860 They're just basically blindly guessing and hoping 00:31:12.860 --> 00:31:15.680 that eventually we make our way to the solution. 00:31:15.680 --> 00:31:19.670 But in AI we can improve upon this by using an informed search. 00:31:19.670 --> 00:31:23.660 An informed search uses something that we know about the problem 00:31:23.660 --> 00:31:27.740 to try and improve the way we search, to allow us to search a little bit more 00:31:27.740 --> 00:31:28.930 effectively. 
00:31:28.930 --> 00:31:30.680 And to do so we're often going to make use 00:31:30.680 --> 00:31:34.070 of what's known as a heuristic, some way of estimating how 00:31:34.070 --> 00:31:36.740 good a particular state is going to be. 00:31:36.740 --> 00:31:40.120 So in this maze for example, if I'm trying to get from A to B, 00:31:40.120 --> 00:31:43.000 a heuristic would allow me to answer a question like, 00:31:43.000 --> 00:31:48.100 if I see point C over here and point D over there, which one of those points 00:31:48.100 --> 00:31:49.210 would I rather be at? 00:31:49.210 --> 00:31:53.710 Which one is better for trying to find our way to the destination? 00:31:53.710 --> 00:31:56.467 And between C and D, breadth-first search and depth-first search, 00:31:56.467 --> 00:31:58.300 they have no knowledge of which one of those 00:31:58.300 --> 00:32:01.330 is going to be better, as far as it's concerned every square is just 00:32:01.330 --> 00:32:04.750 a square, but if you're looking at this as a maze, 00:32:04.750 --> 00:32:07.930 most people would intuitively tell you, well, D is better. 00:32:07.930 --> 00:32:11.300 Even if I don't know exactly how to get to the destination, 00:32:11.300 --> 00:32:14.140 if I could be at either C or D, I'd probably 00:32:14.140 --> 00:32:19.340 pick D because it just looks like it's closer to that destination. 00:32:19.340 --> 00:32:21.640 And so the heuristic we could use is one that's 00:32:21.640 --> 00:32:23.800 often known as the Manhattan distance. 00:32:23.800 --> 00:32:26.410 The Manhattan distance between any two squares 00:32:26.410 --> 00:32:29.890 effectively says ignore the walls, ignore all the boundaries, 00:32:29.890 --> 00:32:34.180 just consider how many squares like in this case up and to the right 00:32:34.180 --> 00:32:38.480 would I have to go to get from where I am to the destination. 
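[The Manhattan distance described above is a one-line computation: ignore the walls and count rows plus columns between a square and the goal.]

```python
# Manhattan distance heuristic: ignore the walls entirely and just count how
# many squares up/down plus left/right separate a cell from the goal.

def manhattan(cell, goal):
    (r1, c1), (r2, c2) = cell, goal
    return abs(r1 - r2) + abs(c1 - c2)
```

[It is only an estimate — a square can score well here and still be walled off from the goal — but it is cheap to compute, which is exactly what a heuristic should be.]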
00:32:38.480 --> 00:32:41.470 And so for C, we would go up this many squares and then all the way 00:32:41.470 --> 00:32:44.290 to the right, whereas from D doesn't matter whether you go up 00:32:44.290 --> 00:32:47.560 or to the right first, but I would go right four squares and then up two. 00:32:47.560 --> 00:32:50.440 D as far as Manhattan distance is concerned 00:32:50.440 --> 00:32:56.620 is much closer to the goal than C and so I would rather be at D than at C. 00:32:56.620 --> 00:32:59.830 And as soon as we have that notion of like between any two choices, 00:32:59.830 --> 00:33:02.140 which one of those would I rather be at? 00:33:02.140 --> 00:33:05.470 That gives us a way to improve upon the random guessing 00:33:05.470 --> 00:33:06.680 that the other algorithms do. 00:33:06.680 --> 00:33:09.430 Remember the depth-first search when it reached a fork in the road 00:33:09.430 --> 00:33:12.160 and it could go left or right, it didn't know which to pick, 00:33:12.160 --> 00:33:14.110 so it just randomly picked one. 00:33:14.110 --> 00:33:17.710 Now that we have a heuristic like this, we can make an informed choice. 00:33:17.710 --> 00:33:21.070 We can say, I don't know whether left or right is the correct solution, 00:33:21.070 --> 00:33:25.180 but right is probably going to be better than left because of this heuristic. 00:33:25.180 --> 00:33:28.108 And so I can make those sorts of judgments. 00:33:28.108 --> 00:33:30.400 And so for that, we'll take a look at another algorithm 00:33:30.400 --> 00:33:33.130 known as greedy best-first search. 00:33:33.130 --> 00:33:35.470 So in greedy best-first search, what we're going to do 00:33:35.470 --> 00:33:40.030 is consider for each of the squares what its heuristic value is according 00:33:40.030 --> 00:33:42.050 to the Manhattan distance in this case. 00:33:42.050 --> 00:33:45.310 So this square for example, is one away from the goal, 00:33:45.310 --> 00:33:47.170 so we've labeled it with the number one.
00:33:47.170 --> 00:33:49.200 This square meanwhile is two away from our goal, 00:33:49.200 --> 00:33:51.742 so we're labeling it with the number two, this is three away, 00:33:51.742 --> 00:33:52.747 this is four away. 00:33:52.747 --> 00:33:54.580 You'll notice though that we're ignoring any 00:33:54.580 --> 00:33:56.998 of the walls, any of the boundaries, because those 00:33:56.998 --> 00:33:58.540 are going to be difficult to compute. 00:33:58.540 --> 00:34:00.160 We want something efficient. 00:34:00.160 --> 00:34:02.980 This square here, even though in order to get to the goal 00:34:02.980 --> 00:34:06.208 we have to follow this winding path to go around all the boundaries, 00:34:06.208 --> 00:34:07.750 we're still giving it a score of two. 00:34:07.750 --> 00:34:11.389 We want something efficient, right now, it just looks like it's two away. 00:34:11.389 --> 00:34:14.949 So it's not a perfect heuristic, the heuristic is just an estimate 00:34:14.949 --> 00:34:17.080 but we're using it as an approximation that's 00:34:17.080 --> 00:34:20.630 going to help us as we go about this search process. 00:34:20.630 --> 00:34:24.460 And so what we'll do is we'll start the same way, starting from point A looking 00:34:24.460 --> 00:34:28.150 at all of the squares we can get to until we reach our first decision 00:34:28.150 --> 00:34:29.120 point. 00:34:29.120 --> 00:34:33.040 So here we could choose to go left or we could choose to go right 00:34:33.040 --> 00:34:36.370 and in this case, going left according to the heuristic, 00:34:36.370 --> 00:34:40.239 this square is 13 away from the goal and this one over here 00:34:40.239 --> 00:34:41.900 is 11 away from the goal. 00:34:41.900 --> 00:34:44.650 So between those two, being 11 away from the goal, 00:34:44.650 --> 00:34:46.280 that sounds a whole lot better. 00:34:46.280 --> 00:34:49.969 So greedy best-first search is going to make the choice to go to the right. 
00:34:49.969 --> 00:34:53.980 So we'll go 11, we'll keep following until we reach the next decision point. 00:34:53.980 --> 00:34:57.730 Here from the eight we have two choices, up or to the right. 00:34:57.730 --> 00:35:01.090 As far as the heuristic is concerned both of these are equivalent, 00:35:01.090 --> 00:35:05.680 going up we're seven squares away, going to the right we're six plus one. 00:35:05.680 --> 00:35:07.610 That's a total of seven squares away. 00:35:07.610 --> 00:35:11.020 So here even greedy best-first search sometimes needs to pick randomly. 00:35:11.020 --> 00:35:14.830 If both squares have the same heuristic value, we just have to make a choice. 00:35:14.830 --> 00:35:16.895 And maybe we make a bad choice and hit a dead end 00:35:16.895 --> 00:35:18.520 but then we can just try the other one. 00:35:18.520 --> 00:35:21.070 So seven, six, we're going to keep following. 00:35:21.070 --> 00:35:25.660 Here we have another decision point, we can go down or we can go to the right, 00:35:25.660 --> 00:35:28.150 but the one to the right has a smaller heuristic value. 00:35:28.150 --> 00:35:29.817 It's a six instead of an eight. 00:35:29.817 --> 00:35:31.900 So we're always going to try and pick the one that 00:35:31.900 --> 00:35:33.610 looks like it's going to be closer. 00:35:33.610 --> 00:35:36.370 So we'll pick the six, we'll go to the five. 00:35:36.370 --> 00:35:38.210 And here we reach one more decision point, 00:35:38.210 --> 00:35:40.780 we can go up which has a heuristic of four 00:35:40.780 --> 00:35:43.060 and we can go down which has a heuristic of six. 00:35:43.060 --> 00:35:46.120 So even here because the four is the smaller value, 00:35:46.120 --> 00:35:48.657 we're going to go up even though you, the human can 00:35:48.657 --> 00:35:50.740 see we're ultimately going to run into a dead end, 00:35:50.740 --> 00:35:52.198 the computer doesn't know that yet. 
00:35:52.198 --> 00:35:54.880 The computer just has to judge based on the heuristic what 00:35:54.880 --> 00:35:57.280 it thinks is the best thing to do but eventually 00:35:57.280 --> 00:36:00.550 it's going to run into that wall, realize that it can't make any progress 00:36:00.550 --> 00:36:05.500 and then go back down here and will follow that path until ultimately we 00:36:05.500 --> 00:36:06.860 arrive at the solution. 00:36:06.860 --> 00:36:10.990 And so here we arrived at the same solution that breadth-first search did, 00:36:10.990 --> 00:36:13.390 but we didn't need to consider all of the squares. 00:36:13.390 --> 00:36:15.790 We could ignore all of these off to the left, 00:36:15.790 --> 00:36:18.430 we could ignore all of these down below just 00:36:18.430 --> 00:36:21.310 by being a little bit smarter about what choice we make. 00:36:21.310 --> 00:36:25.210 Instead of just choosing randomly we make an informed choice 00:36:25.210 --> 00:36:27.770 based on that heuristic. 00:36:27.770 --> 00:36:30.800 I'll pause here for any questions then about the algorithms 00:36:30.800 --> 00:36:34.760 we've looked at so far, depth-first search, breadth-first search, and now 00:36:34.760 --> 00:36:38.630 an informed algorithm, greedy best-first search. 00:36:38.630 --> 00:36:40.580 Any questions about these algorithms? 00:36:40.580 --> 00:36:45.100 Yeah, let's go to let's see, Sofia? 00:36:45.100 --> 00:36:48.710 AUDIENCE: In like the decision to go randomly 00:36:48.710 --> 00:36:53.420 is it possible to like go further and see if there's like lesser values, 00:36:53.420 --> 00:36:55.610 like go one more step? 00:36:55.610 --> 00:36:59.180 BRIAN YU: You certainly could like maybe try and look ahead a couple of steps 00:36:59.180 --> 00:37:01.040 and look at what might be following it. 00:37:01.040 --> 00:37:04.400 And that might offer numbers that would improve upon the situation. 00:37:04.400 --> 00:37:07.860 But even this algorithm is not perfect. 
00:37:07.860 --> 00:37:10.580 In fact, when you're just looking at these heuristic values 00:37:10.580 --> 00:37:13.820 and following the heuristic values, sometimes the heuristic values 00:37:13.820 --> 00:37:17.240 will lead you in a good direction, like sometimes in this case, 00:37:17.240 --> 00:37:19.280 we ultimately, we made a couple of wrong turns 00:37:19.280 --> 00:37:21.260 but ultimately we kind of found our way. 00:37:21.260 --> 00:37:23.280 But that's not always going to be the case. 00:37:23.280 --> 00:37:27.380 There are some cases where because the heuristic is really just an estimate, 00:37:27.380 --> 00:37:29.720 the heuristic can sometimes be wrong. 00:37:29.720 --> 00:37:31.940 Take a maze like this for example, if you 00:37:31.940 --> 00:37:34.280 were to follow greedy best-first search, you 00:37:34.280 --> 00:37:38.240 would go 16, 15, 14, 13, 12, at this point 00:37:38.240 --> 00:37:40.970 you could decide either the 11 or the 13. 00:37:40.970 --> 00:37:43.340 And greedy best-first search would look ahead and say, 00:37:43.340 --> 00:37:46.550 this path looks pretty good and it's going to-- and none of these values 00:37:46.550 --> 00:37:48.860 are any bigger than this 12 and so it would 00:37:48.860 --> 00:37:51.920 find this path that takes you from A to B. 00:37:51.920 --> 00:37:55.220 But even that path isn't actually the optimal path to take. 00:37:55.220 --> 00:37:58.880 The optimal path to take, the shortest path between A and B 00:37:58.880 --> 00:38:02.900 is actually this one here, which looks a little counterintuitive 00:38:02.900 --> 00:38:05.120 and the reason we didn't catch it is because it 00:38:05.120 --> 00:38:09.470 involves like going to the left first and then winding around. 
00:38:09.470 --> 00:38:11.510 And normally according to this heuristic, 00:38:11.510 --> 00:38:13.430 we usually don't want to be going to the left 00:38:13.430 --> 00:38:16.490 because we know that our goal, point B where we're trying to get to, 00:38:16.490 --> 00:38:17.560 it's off to the right. 00:38:17.560 --> 00:38:20.840 So even if you're looking at these heuristic values and looking ahead, 00:38:20.840 --> 00:38:23.120 this algorithm might not be perfect. 00:38:23.120 --> 00:38:26.505 And so we might try to improve upon this algorithm even more. 00:38:26.505 --> 00:38:28.880 And so for one final search algorithm that I'll show you, 00:38:28.880 --> 00:38:31.130 the most complex one that we've seen yet, 00:38:31.130 --> 00:38:33.860 is an algorithm known as A* search. 00:38:33.860 --> 00:38:38.210 A* search tries to build upon the ideas of greedy best-first search, which 00:38:38.210 --> 00:38:43.070 is to say we're going to use a heuristic to try and intelligently make choices, 00:38:43.070 --> 00:38:45.840 but we're going to try and solve the problem we just saw, 00:38:45.840 --> 00:38:48.800 which is that greedy best-first search isn't always optimal. 00:38:48.800 --> 00:38:51.230 When it just takes into account the heuristic, 00:38:51.230 --> 00:38:53.360 it's not always going to make the best choice 00:38:53.360 --> 00:38:57.620 because we might end up choosing a path that's ultimately longer. 00:38:57.620 --> 00:39:00.140 What A* search is going to try to realize 00:39:00.140 --> 00:39:03.230 is that if we've made it like all the way down here 00:39:03.230 --> 00:39:06.320 and now we're at a spot where we could be like 11 squares away 00:39:06.320 --> 00:39:11.960 from the goal, that's OK but honestly, being 13 squares away from the goal 00:39:11.960 --> 00:39:14.630 much sooner is probably better.
00:39:14.630 --> 00:39:17.750 So we can't just take into account the heuristic value, 00:39:17.750 --> 00:39:21.440 we should also take into account how far have we gone already. 00:39:21.440 --> 00:39:23.960 If I've already traveled many squares and I 00:39:23.960 --> 00:39:26.510 find myself pretty close to the goal, I would rather 00:39:26.510 --> 00:39:30.330 have traveled fewer squares and find myself close to the goal. 00:39:30.330 --> 00:39:33.470 And so A* search is going to try to combine these two 00:39:33.470 --> 00:39:37.190 pieces of information, combine the heuristic information that we have seen 00:39:37.190 --> 00:39:40.220 here, and also combine how many steps have 00:39:40.220 --> 00:39:45.680 I taken so far because that factors into the ultimate optimal solution too. 00:39:45.680 --> 00:39:47.682 And so here's how A* search is going to work. 00:39:47.682 --> 00:39:49.640 It's going to be the exact same idea as before, 00:39:49.640 --> 00:39:53.120 just looking at the heuristic value, but instead of only considering 00:39:53.120 --> 00:39:56.780 the heuristic value, we're going to consider taking the heuristic value 00:39:56.780 --> 00:40:00.620 and adding to it how many steps we've already taken. 00:40:00.620 --> 00:40:02.070 So we take our first step. 00:40:02.070 --> 00:40:06.740 And now we've taken one step and we're 16 squares away from the goal 00:40:06.740 --> 00:40:08.930 according to the heuristic, for a total of 17.
00:40:08.930 --> 00:40:12.740 And then we take our second step and we're 15 squares away from the goal 00:40:12.740 --> 00:40:15.740 and we take our third step and we're 14 squares away from the goal, 00:40:15.740 --> 00:40:20.570 and we keep going until we reach a decision point after five squares, 00:40:20.570 --> 00:40:23.880 we're now 12 away from the goal and now we have a choice, 00:40:23.880 --> 00:40:27.290 we could go six squares and be 11 away from the goal 00:40:27.290 --> 00:40:31.770 or we could go six squares and be 13 away from the goal, which is worse. 00:40:31.770 --> 00:40:33.720 So we're going to make the decision to go up. 00:40:33.720 --> 00:40:36.260 So for now, it doesn't seem like we've done any better, 00:40:36.260 --> 00:40:38.525 we haven't found the optimal solution just yet. 00:40:38.525 --> 00:40:40.400 But notice what will happen, eventually if we 00:40:40.400 --> 00:40:44.180 follow this path for long enough we'll end up at a point 00:40:44.180 --> 00:40:47.360 where we've made 14 steps and we're estimated 00:40:47.360 --> 00:40:51.560 to be five away from the goal, one, two, three, four, five, ignoring the walls. 00:40:51.560 --> 00:40:55.850 And now we have a choice, we could be six squares away from the goal 00:40:55.850 --> 00:41:01.860 after having walked 15 steps, so 15 plus six that's a total of 21, 00:41:01.860 --> 00:41:04.790 or this option is still available to us. 00:41:04.790 --> 00:41:08.570 We could be six squares away from the start, 00:41:08.570 --> 00:41:13.430 but be 13 away from the goal, six plus 13, that's a total of 19 00:41:13.430 --> 00:41:17.660 and that 19 is a smaller number than this 21. 00:41:17.660 --> 00:41:22.740 So this 19 is ultimately a better choice as far as A* is concerned.
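[That walkthrough of A* can be condensed into code. This is again a hedged sketch under the same assumed grid representation as before, not the lecture's own implementation: the only change from greedy best-first search is that the frontier is ordered by g + h, the number of steps taken so far plus the heuristic estimate of the distance remaining.]

```python
import heapq

def a_star(start, goal, walls, width, height):
    """A* search: expand the square minimizing g (steps taken so far)
    plus h (heuristic estimate of steps remaining)."""
    def h(cell):
        # Manhattan distance, ignoring walls.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Frontier entries: (g + h, g, cell, path so far).
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to reach each square
    while frontier:
        _, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height and nxt not in walls:
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    heapq.heappush(
                        frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists
```

[Because Manhattan distance never overestimates the true remaining distance, this version returns a genuinely shortest path, which is the optimality guarantee discussed next.]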
00:41:22.740 --> 00:41:25.850 So by combining information about how far we've traveled 00:41:25.850 --> 00:41:29.163 and how far is left to go, we can make a more intelligent choice, 00:41:29.163 --> 00:41:31.580 we can say you know what, let's go ahead and try this path 00:41:31.580 --> 00:41:35.070 and A* then will ultimately be able to find its way 00:41:35.070 --> 00:41:38.030 to get from the starting point ultimately to the goal. 00:41:38.030 --> 00:41:42.230 And this algorithm then relies upon having a good heuristic function 00:41:42.230 --> 00:41:44.110 and it just so happens that you can prove 00:41:44.110 --> 00:41:46.420 that if you pick a good heuristic function, 00:41:46.420 --> 00:41:48.070 this algorithm will be optimal. 00:41:48.070 --> 00:41:52.900 It will always find the shortest path from point A to point B 00:41:52.900 --> 00:41:56.200 and it's not going to get stuck like greedy best-first search might 00:41:56.200 --> 00:41:58.510 on a path that isn't actually optimal. 00:41:58.510 --> 00:42:00.970 And so there are many of these sorts of search algorithms 00:42:00.970 --> 00:42:04.660 that are designed to try and find intelligent ways to find 00:42:04.660 --> 00:42:09.110 a solution to some problem, to navigate its way through some landscape. 00:42:09.110 --> 00:42:11.590 And so I'll pause here for any questions then 00:42:11.590 --> 00:42:14.230 about search algorithms in general. 00:42:14.230 --> 00:42:17.470 We've taken a look at what we call classical search, navigating 00:42:17.470 --> 00:42:20.888 our way through a maze for example, or finding driving directions as well 00:42:20.888 --> 00:42:23.680 as what we might call adversarial search, where there's an opponent 00:42:23.680 --> 00:42:27.310 and you're trying to search for the best move in a game of tic-tac-toe 00:42:27.310 --> 00:42:28.592 or a game of chess. 00:42:28.592 --> 00:42:30.550 DAVID MALAN: Brian, one question from the chat.
00:42:30.550 --> 00:42:32.560 Does the assignment of these heuristic values 00:42:32.560 --> 00:42:35.632 also take a lot of time for the computer or is that automatic? 00:42:35.632 --> 00:42:37.840 BRIAN YU: Ideally you want to find a heuristic that's 00:42:37.840 --> 00:42:40.420 going to be efficient to calculate. 00:42:40.420 --> 00:42:44.170 So if the time it takes to calculate the heuristic takes a long time, then yeah, 00:42:44.170 --> 00:42:46.490 it could be the case that this would be even worse. 00:42:46.490 --> 00:42:48.657 But ideally, you want to look for a heuristic that's 00:42:48.657 --> 00:42:50.500 going to be very quick to calculate and this 00:42:50.500 --> 00:42:52.840 is why sometimes when we're looking at heuristics, 00:42:52.840 --> 00:42:56.845 we're going to ignore some of the details that make this more complex. 00:42:56.845 --> 00:42:58.720 Like when we're calculating these heuristics, 00:42:58.720 --> 00:43:02.170 we're ignoring the walls because having to deal with the walls is 00:43:02.170 --> 00:43:03.220 going to make it even-- 00:43:03.220 --> 00:43:05.800 it's going to make it take longer and longer amounts of time 00:43:05.800 --> 00:43:07.605 to be able to figure out these values. 00:43:07.605 --> 00:43:09.730 And so by ignoring the walls, we can pretty quickly 00:43:09.730 --> 00:43:12.400 calculate just like x, y coordinate wise, 00:43:12.400 --> 00:43:16.840 how many coordinate squares are we away from that destination. 00:43:16.840 --> 00:43:19.990 Also, I've been labeling all of the squares 00:43:19.990 --> 00:43:22.480 in this grid with their heuristic value, just so you can 00:43:22.480 --> 00:43:24.610 see what those heuristic values are. 
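[The kind of cheap heuristic being described, assuming squares are (x, y) coordinate pairs, is just the Manhattan distance. Ignoring the walls is what keeps it constant-time per square:]

```python
def manhattan(cell, goal):
    """Manhattan-distance heuristic: walls are ignored, so the estimate
    is just coordinate differences and takes O(1) time per square."""
    (x1, y1), (x2, y2) = cell, goal
    return abs(x1 - x2) + abs(y1 - y2)
```

[A square three columns and four rows from the goal gets a heuristic value of 7, regardless of any walls in between.]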
00:43:24.610 --> 00:43:28.600 In practice, if our search algorithm never touches a square, 00:43:28.600 --> 00:43:30.760 like it never touches any of these squares, 00:43:30.760 --> 00:43:33.500 it's not going to bother computing heuristic values for them 00:43:33.500 --> 00:43:35.500 because it's not relevant to the search process. 00:43:35.500 --> 00:43:38.200 So it'll never actually calculate the heuristic values 00:43:38.200 --> 00:43:40.420 for all of these squares, which will save time too. 00:43:40.420 --> 00:43:43.928 I have just shown them to you visually so that you can see all of the squares 00:43:43.928 --> 00:43:45.720 and what numbers would be assigned to them. 00:43:45.720 --> 00:43:47.110 But yeah, really great question. 00:43:50.550 --> 00:43:52.550 All right and, Kurt, was there another question? 00:43:57.020 --> 00:43:58.770 AUDIENCE: Yeah, I was just wondering if, I 00:43:58.770 --> 00:44:02.370 mean, the example that you showed it's using like a map 00:44:02.370 --> 00:44:04.350 to actually search through a map, so it kind of 00:44:04.350 --> 00:44:07.140 like maps directly onto sort of the concept 00:44:07.140 --> 00:44:10.090 but like could this also be used for like textual search 00:44:10.090 --> 00:44:12.840 or other kinds of searches through like different kinds of problem 00:44:12.840 --> 00:44:14.112 spaces or no? 00:44:14.112 --> 00:44:17.070 BRIAN YU: Yeah, this can be used for a lot of different problem spaces. 
00:44:17.070 --> 00:44:19.080 For text there are usually different approaches 00:44:19.080 --> 00:44:21.210 in the world of like natural language processing 00:44:21.210 --> 00:44:24.180 but you can use these kinds of search algorithms any time 00:44:24.180 --> 00:44:26.820 you have some computer which we'll usually 00:44:26.820 --> 00:44:29.490 call like an agent, something that's making decisions, that 00:44:29.490 --> 00:44:33.060 has to make certain decisions at certain points, 00:44:33.060 --> 00:44:36.060 even if those decisions aren't like geographic decisions about which 00:44:36.060 --> 00:44:40.140 way to walk, for example, as long as it's making some decision that moves it 00:44:40.140 --> 00:44:44.820 into like a different position, you can often apply these types of algorithms 00:44:44.820 --> 00:44:46.660 in order to solve problems. 00:44:46.660 --> 00:44:49.770 And so these algorithms are not just good for maze solving, 00:44:49.770 --> 00:44:52.538 they can be used in other domains too. 00:44:52.538 --> 00:44:54.330 So ultimately what I want to move on to now 00:44:54.330 --> 00:44:58.110 is less about just coming up with these algorithms that let the AI figure out 00:44:58.110 --> 00:45:00.330 exactly what to do right away, but looking 00:45:00.330 --> 00:45:03.360 at a type of artificial intelligence that you've probably 00:45:03.360 --> 00:45:04.980 heard of called machine learning. 00:45:04.980 --> 00:45:08.130 And machine learning is all about trying to take our AI 00:45:08.130 --> 00:45:11.520 and trying to get it to learn, learn somehow from data 00:45:11.520 --> 00:45:13.200 or learn from experience. 00:45:13.200 --> 00:45:15.333 Much the same way that humans might learn, 00:45:15.333 --> 00:45:17.250 that we learn from looking at our environment, 00:45:17.250 --> 00:45:20.700 we learn from our surroundings, we learn from our experiences, 00:45:20.700 --> 00:45:23.530 and we get something out of those experiences.
00:45:23.530 --> 00:45:25.860 And so one type of machine learning is known 00:45:25.860 --> 00:45:27.750 as reinforcement learning, where you learn 00:45:27.750 --> 00:45:29.820 from positive or negative rewards. 00:45:29.820 --> 00:45:32.880 The computer does something well and it gets a reward, 00:45:32.880 --> 00:45:36.030 the computer does something poorly, it gets some sort of punishment 00:45:36.030 --> 00:45:38.890 and the computer tries to learn from that. 00:45:38.890 --> 00:45:41.070 Let's imagine for example, a very simple game 00:45:41.070 --> 00:45:43.440 that we might want our computer to play, the computer 00:45:43.440 --> 00:45:45.950 is going to try to navigate its way through these tiles 00:45:45.950 --> 00:45:47.700 and it's going to try and get to the goal, 00:45:47.700 --> 00:45:51.240 similar to what we've seen before, but this time the computer 00:45:51.240 --> 00:45:53.040 doesn't know where the obstacles are. 00:45:53.040 --> 00:45:56.430 Let's imagine that there are some obstacles in its way highlighted 00:45:56.430 --> 00:46:00.120 in red here and if our computer agent, this yellow dot here, 00:46:00.120 --> 00:46:04.200 ever hits one of those obstacles, the computer loses this game. 00:46:04.200 --> 00:46:06.750 So it hits an obstacle, the computer loses but the computer 00:46:06.750 --> 00:46:08.940 doesn't know where the obstacles are. 00:46:08.940 --> 00:46:12.098 I'm showing them to you visually but the computer has no idea. 00:46:12.098 --> 00:46:14.140 How would the computer try to solve this problem? 00:46:14.140 --> 00:46:16.050 How would you try to solve the problem? 00:46:16.050 --> 00:46:19.980 Well, all it can do it first is randomly make a choice, randomly guess. 00:46:19.980 --> 00:46:24.000 Like let's choose to go to the right, OK we hit an obstacle that was no good. 00:46:24.000 --> 00:46:26.100 But the computer now can learn from experience. 
00:46:26.100 --> 00:46:30.420 As long as it knows once it hits the obstacle that was a bad outcome, well 00:46:30.420 --> 00:46:33.870 now in the future the computer can learn, I better not do that anymore, 00:46:33.870 --> 00:46:37.215 rather than go to the right, let me try something else, I'll try moving up. 00:46:37.215 --> 00:46:39.090 All right, that seemed to have worked better, 00:46:39.090 --> 00:46:41.790 let me try making another action and let's try another one, 00:46:41.790 --> 00:46:43.440 maybe I'll go down this time. 00:46:43.440 --> 00:46:46.260 OK that led to an obstacle, so the computer learns from that too. 00:46:46.260 --> 00:46:49.770 It learns that was a bad thing to try, so let's try something else. 00:46:49.770 --> 00:46:52.110 Maybe we try going up this time, that too 00:46:52.110 --> 00:46:55.697 leads to an obstacle, that was no good, so maybe we'll try going to the right. 00:46:55.697 --> 00:46:57.780 That seemed OK, maybe we'll go to the right again. 00:46:57.780 --> 00:46:59.850 All right, that led to another obstacle. 00:46:59.850 --> 00:47:02.310 And so over and over it's just making mistakes 00:47:02.310 --> 00:47:04.870 and we let the computer make those mistakes 00:47:04.870 --> 00:47:07.020 but every time it makes a mistake, the computer 00:47:07.020 --> 00:47:08.353 is learning something from that. 00:47:08.353 --> 00:47:11.820 It's learning what to do or what not to do the next time it 00:47:11.820 --> 00:47:13.660 goes through the same process. 00:47:13.660 --> 00:47:17.340 And so in the future, it can navigate its way around and eventually 00:47:17.340 --> 00:47:20.560 with enough trial and error, it can find its way to the goal. 00:47:20.560 --> 00:47:22.560 And once it's found its way to the goal, it'll 00:47:22.560 --> 00:47:27.180 remember that too, it'll know I now know exactly what sequence of actions 00:47:27.180 --> 00:47:30.190 will take me from the starting point to the goal. 
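[That trial-and-error loop can be sketched in code. This is a simplified illustration of the idea, not the demo's actual program: the agent moves randomly, records every (square, move) pair that was "punished" by hitting an obstacle, avoids those pairs on later attempts, and returns the first full sequence of squares that reaches the goal.]

```python
import random

def learn_path(start, goal, obstacles, width, height, tries=500, seed=0):
    """Trial-and-error learning: wander randomly, remember which
    (square, move) pairs hit an obstacle, and keep retrying until a
    complete path from start to goal is found."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    bad = set()  # (square, move) pairs the agent has learned to avoid
    for _ in range(tries):
        square, path = start, [start]
        for _ in range(width * height * 10):  # cap steps per attempt
            options = [(dx, dy) for dx, dy in moves
                       if (square, (dx, dy)) not in bad
                       and 0 <= square[0] + dx < width
                       and 0 <= square[1] + dy < height]
            if not options:
                break  # every known move here fails; start a new attempt
            dx, dy = rng.choice(options)
            nxt = (square[0] + dx, square[1] + dy)
            if nxt in obstacles:
                bad.add((square, (dx, dy)))  # "punishment": never repeat this
                break
            square = nxt
            path.append(nxt)
            if square == goal:
                return path  # the remembered sequence of actions
    return None
```

[Note that the first path found this way is remembered but is not necessarily the shortest one, which is exactly the limitation raised next.]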
00:47:30.190 --> 00:47:35.160 And so in the future I can just keep doing that again and again and again. 00:47:35.160 --> 00:47:37.710 So that then is an example of reinforcement learning, 00:47:37.710 --> 00:47:40.698 but even with this example here do you see any potential problems 00:47:40.698 --> 00:47:41.490 with this approach? 00:47:41.490 --> 00:47:43.560 Limitations or downsides to what we've just 00:47:43.560 --> 00:47:50.020 done here, reasons why this AI might not be perfect, any thoughts? 00:47:50.020 --> 00:47:51.565 Let's go to [? Lexleen. ?] 00:47:51.565 --> 00:47:53.940 AUDIENCE: It might not be taking the most efficient path. 00:47:53.940 --> 00:47:56.460 BRIAN YU: Yeah, absolutely because it's randomly trying, 00:47:56.460 --> 00:47:58.560 eventually it will find its way to the goal 00:47:58.560 --> 00:48:01.680 but it might not necessarily find the best path because here there 00:48:01.680 --> 00:48:04.630 was a faster path, there was a faster way to get to the goal, 00:48:04.630 --> 00:48:08.340 and that was to say once you get here go up instead 00:48:08.340 --> 00:48:10.470 and that will lead to a more efficient path. 00:48:10.470 --> 00:48:12.810 But because our AI is learning, like whenever it 00:48:12.810 --> 00:48:14.670 reaches the goal it learns to do that. 00:48:14.670 --> 00:48:17.170 When it doesn't reach the goal, it learns not to do that. 00:48:17.170 --> 00:48:22.823 Our AI doesn't have that ability to find that better path in the future. 00:48:22.823 --> 00:48:24.240 And so we could improve upon this. 00:48:24.240 --> 00:48:28.380 And this is what we call a trade off between exploring and exploiting 00:48:28.380 --> 00:48:29.520 in artificial intelligence. 00:48:29.520 --> 00:48:31.980 We want our AI to do both of these things.
00:48:31.980 --> 00:48:36.360 We want AI to exploit the knowledge it already has, it knows what to do, 00:48:36.360 --> 00:48:39.270 it knows what not to do, we want it to use that information, 00:48:39.270 --> 00:48:42.060 but we also want it to explore a little bit. 00:48:42.060 --> 00:48:44.940 We want it to sometimes try some new action 00:48:44.940 --> 00:48:47.190 that it hasn't tried before because maybe 00:48:47.190 --> 00:48:51.120 that'll be even better than the things I've already tried in the past. 00:48:51.120 --> 00:48:53.670 So so far our AI in the example just now, 00:48:53.670 --> 00:48:57.890 was only ever exploiting the knowledge it already has but it was never 00:48:57.890 --> 00:48:59.700 exploring something new. 00:48:59.700 --> 00:49:01.835 And so often when we're training AI, we want 00:49:01.835 --> 00:49:04.170 it to find some sort of nice balance between the two. 00:49:04.170 --> 00:49:07.850 We want it to make good choices, but occasionally take a risk, 00:49:07.850 --> 00:49:11.720 try something else, see if maybe we can find a better solution. 00:49:11.720 --> 00:49:15.920 And so one strategy for doing this is known as the epsilon-greedy approach 00:49:15.920 --> 00:49:19.850 to trying to solve these problems, where we assign a value, epsilon, 00:49:19.850 --> 00:49:23.240 which is equal to some proportion, some probability 00:49:23.240 --> 00:49:26.250 that our computer is just going to make a random choice. 
00:49:26.250 --> 00:49:29.360 And so we might say if we generate a random number 00:49:29.360 --> 00:49:33.440 and it's less than epsilon, which in this case happens like 10% of the time, 00:49:33.440 --> 00:49:35.600 then let's go ahead and make a random move, 00:49:35.600 --> 00:49:38.900 rather than make an intelligent, smart move just pick a move randomly 00:49:38.900 --> 00:49:41.678 and the rest of the time, 90% of the time, 00:49:41.678 --> 00:49:44.720 make the move that we know to be the best, the one with the highest value 00:49:44.720 --> 00:49:46.250 we know of so far. 00:49:46.250 --> 00:49:48.680 And this will often result in a nice balance. 00:49:48.680 --> 00:49:51.680 90% of the time our AI will make good choices 00:49:51.680 --> 00:49:54.890 that it knows to be good choices, but 10% of the time 00:49:54.890 --> 00:49:56.810 it'll make a new choice, something random 00:49:56.810 --> 00:49:58.790 and maybe it'll stumble across something bad 00:49:58.790 --> 00:50:01.130 and know to avoid that in the future, but maybe it'll 00:50:01.130 --> 00:50:06.180 come across something better and know to do that better in the future as well. 00:50:06.180 --> 00:50:08.660 And so I'll show you a real live demo of this actually. 00:50:08.660 --> 00:50:11.930 Years back, the Italian Institute of Technology 00:50:11.930 --> 00:50:15.080 was working through an example of trying to do something just like this, 00:50:15.080 --> 00:50:18.010 but before we move on I think there's a question from the chat? 00:50:18.010 --> 00:50:20.180 DAVID MALAN: Brian, someone asks, is this related 00:50:20.180 --> 00:50:22.490 to genetic algorithms, this approach? 00:50:22.490 --> 00:50:25.730 BRIAN YU: Genetic algorithms are another type of this type of learning. 00:50:25.730 --> 00:50:28.010 We're actually going to talk about genetic algorithms in just 00:50:28.010 --> 00:50:28.510 a moment. 00:50:28.510 --> 00:50:30.240 So we'll get there very shortly.
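[The epsilon-greedy rule just described fits in a few lines of Python. The value estimates below are made-up numbers purely for illustration, not anything from the demo:]

```python
import random

def epsilon_greedy(values, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection: with probability epsilon pick a
    random action (explore); otherwise pick the action whose estimated
    value is highest so far (exploit)."""
    actions = list(values)
    if rng.random() < epsilon:          # e.g. 10% of the time: explore
        return rng.choice(actions)
    return max(actions, key=values.get)  # 90% of the time: exploit

# Hypothetical value estimates the agent has learned for four moves.
estimates = {"up": 0.8, "down": 0.1, "left": 0.3, "right": 0.5}
```

[With epsilon at 0.1, roughly nine calls in ten return "up" here, and the remaining calls pick any move at random, occasionally discovering something better.]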
00:50:30.240 --> 00:50:33.290 But yes, definitely something related. 00:50:33.290 --> 00:50:35.660 So the Italian Institute of Technology a couple of years 00:50:35.660 --> 00:50:39.167 back tried to teach a robot how to flip pancakes. 00:50:39.167 --> 00:50:41.750 Something that you might have done before or seen someone else 00:50:41.750 --> 00:50:46.430 do before but in practice, it's difficult to encode in a robot 00:50:46.430 --> 00:50:47.570 exactly how to do that. 00:50:47.570 --> 00:50:51.200 Exactly what moves to make, exactly how to move its robotic muscle to be 00:50:51.200 --> 00:50:52.940 able to flip a pancake effectively. 00:50:52.940 --> 00:50:57.440 So rather than try to program every last movement into the computer, 00:50:57.440 --> 00:51:00.290 they just let the robot learn via reinforcement learning. 00:51:00.290 --> 00:51:02.960 Every time it flipped a pancake incorrectly it 00:51:02.960 --> 00:51:05.300 learned what not to do in the future and every time it 00:51:05.300 --> 00:51:08.730 flipped a pancake correctly, it learned what to do in the future. 00:51:08.730 --> 00:51:11.760 And so I'll go ahead and pull up an example of this now. 00:51:11.760 --> 00:51:15.500 And so what we're going to take a look at here is an artificial pancake. 00:51:15.500 --> 00:51:18.950 And initially, the human researcher shows 00:51:18.950 --> 00:51:20.630 the robot what success looks like. 00:51:20.630 --> 00:51:22.940 The robot needs to know what is success and what 00:51:22.940 --> 00:51:24.980 is failure so the human demonstrates. 00:51:24.980 --> 00:51:27.620 Here is what it looks like to actually do the pancake flipping 00:51:27.620 --> 00:51:30.050 by demonstrating exactly what motion to make 00:51:30.050 --> 00:51:32.390 and what it feels like to successfully flip a pancake 00:51:32.390 --> 00:51:33.860 and then we let the robot try it. 00:51:36.722 --> 00:51:38.930 All right, that was his first try, not too successful.
00:51:38.930 --> 00:51:40.270 We'll try it again. 00:51:40.270 --> 00:51:43.607 Here's after three tries. 00:51:43.607 --> 00:51:44.940 OK, not great but it's learning. 00:51:44.940 --> 00:51:48.023 Every time it does something wrong it learns what not to do in the future. 00:51:48.023 --> 00:51:50.280 Here's five tries, here's 10 tries, not great. 00:51:50.280 --> 00:51:59.580 And now after 15 tries or so, or 11 tries I guess, we'll see what happens. 00:51:59.580 --> 00:52:00.630 OK so not great, right? 00:52:00.630 --> 00:52:04.980 It takes a while to learn these kinds of techniques to learn, here's 15 tries, 00:52:04.980 --> 00:52:08.950 all right but finally after enough tries once you do enough practice with this. 00:52:08.950 --> 00:52:11.370 Here's after 50 tries and we now actually 00:52:11.370 --> 00:52:14.940 have a robot that has learned to successfully perform this task, 00:52:14.940 --> 00:52:17.615 not because human programmers told it exactly how to do so, 00:52:17.615 --> 00:52:20.490 but because it learned from that failure and learned from experience. 00:52:20.490 --> 00:52:23.400 And once it knows how, it knows exactly what to do in the future. 00:52:23.400 --> 00:52:27.090 It can continually replicate that behavior over and over again 00:52:27.090 --> 00:52:30.385 once it's trained to be able to perform that task. 00:52:33.450 --> 00:52:39.580 So that then is how you might take advantage of this idea of reinforcement 00:52:39.580 --> 00:52:42.250 learning but someone in the chat brought up another approach 00:52:42.250 --> 00:52:45.850 to this type of thing known as genetic algorithms or genetic learning.
00:52:45.850 --> 00:52:48.070 And this is where a lot of machine learning 00:52:48.070 --> 00:52:52.600 has taken inspiration from the human brain, how humans learn 00:52:52.600 --> 00:52:55.180 and how nature works and because nature has managed 00:52:55.180 --> 00:52:57.730 to evolve intelligent beings, why couldn't we 00:52:57.730 --> 00:53:00.320 try to do the same thing in a computer as well? 00:53:00.320 --> 00:53:04.300 And so in nature, you have generations of populations that evolve over time, 00:53:04.300 --> 00:53:06.790 where the most fit organisms survive and are 00:53:06.790 --> 00:53:10.960 able to evolve and mutate and change over time to become better and better 00:53:10.960 --> 00:53:12.800 at adapting to their environment. 00:53:12.800 --> 00:53:15.460 And so one strategy, one approach to trying 00:53:15.460 --> 00:53:20.830 to build intelligent machines is rather than program one algorithm that 00:53:20.830 --> 00:53:23.020 is really good at performing a task, let's 00:53:23.020 --> 00:53:27.190 just create a lot of algorithms that are really bad at performing the task 00:53:27.190 --> 00:53:29.120 because that's much easier to do. 00:53:29.120 --> 00:53:32.170 But we'll let them evolve, we'll let them try some task 00:53:32.170 --> 00:53:35.540 and after they do it, they won't do very well 00:53:35.540 --> 00:53:37.570 but we'll take the ones that did the best 00:53:37.570 --> 00:53:40.450 and replicate them and mutate them and let them try again. 00:53:40.450 --> 00:53:42.940 And then repeat this generation after generation, 00:53:42.940 --> 00:53:46.090 eliminating all of the computer programs that don't perform well 00:53:46.090 --> 00:53:48.732 but duplicating and mutating the ones that do. 00:53:48.732 --> 00:53:51.440 The pseudocode for which might look a little something like this. 
00:53:51.440 --> 00:53:55.030 We're going to start by making an initial generation of candidates 00:53:55.030 --> 00:53:59.170 randomly, where each candidate is some program designed to try and solve 00:53:59.170 --> 00:54:01.705 some task but rather than program it intelligently 00:54:01.705 --> 00:54:03.580 the way you've been doing all semester, we're 00:54:03.580 --> 00:54:06.880 just going to program them randomly to just make random choices 00:54:06.880 --> 00:54:10.690 and we're going to repeat this process until eventually they're successful. 00:54:10.690 --> 00:54:14.050 We're going to for each one of these candidates calculate its fitness, 00:54:14.050 --> 00:54:18.767 calculate how good is it at performing the task that it's designed to do. 00:54:18.767 --> 00:54:21.100 And then we're going to remove the least fit candidates. 00:54:21.100 --> 00:54:23.330 Maybe take the half of them that didn't do very well, 00:54:23.330 --> 00:54:26.750 eliminate those but take the ones that did do well 00:54:26.750 --> 00:54:30.610 and make a new generation from the remaining candidates, duplicate them, 00:54:30.610 --> 00:54:33.430 make a few mutations to them randomly just to change things up 00:54:33.430 --> 00:54:37.810 and to see how things work in order to try to create a better generation. 00:54:37.810 --> 00:54:41.920 And over time as we repeat this process, in much the same way that evolution 00:54:41.920 --> 00:54:44.710 is able to continually produce better and better organisms that 00:54:44.710 --> 00:54:48.490 are more and more fit, we can do the same thing with our algorithms 00:54:48.490 --> 00:54:51.280 too, producing better and better algorithms over time. 00:54:51.280 --> 00:54:53.470 I have another demo to show you of this. 
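[That pseudocode can be turned into a runnable sketch. The candidates here are just lists of bits and the fitness function simply counts 1s (a standard toy problem, not the lecture's cars), but the loop of scoring fitness, removing the least fit, and duplicating and mutating the survivors follows the generations idea just described.]

```python
import random

def evolve(fitness, length, pop_size=20, generations=100, seed=0):
    """Genetic-algorithm sketch: make random initial candidates, then
    repeatedly score fitness, drop the least fit half, and refill the
    population with mutated copies of the survivors."""
    rng = random.Random(seed)
    # Each candidate is a list of random bits standing in for a program's choices.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]  # remove the least fit candidates
        children = []
        for parent in survivors:                # duplicate each survivor...
            child = parent[:]
            i = rng.randrange(length)
            child[i] = 1 - child[i]             # ...with one random mutation
            children.append(child)
        population = survivors + children       # the next generation
    return max(population, key=fitness)

# Toy fitness: count the 1 bits; evolution should drive this toward all 1s.
best = evolve(fitness=sum, length=12)
```

[Because the best candidates are kept unmutated, fitness never decreases generation to generation here, though individual mutated children can be worse, just like the cars in the demo.]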
00:54:53.470 --> 00:54:57.910 Here is a simulation of some self-driving cars that 00:54:57.910 --> 00:55:02.230 have not been programmed how to drive but are starting out driving entirely 00:55:02.230 --> 00:55:04.450 randomly. Each of these rectangles you see 00:55:04.450 --> 00:55:07.150 is one example of a virtual self-driving car. 00:55:07.150 --> 00:55:09.940 Each of the little crosshairs you see, the little Xs, 00:55:09.940 --> 00:55:11.960 are sensors that the car has access to. 00:55:11.960 --> 00:55:16.450 So the car has access to data about how far away in any given direction 00:55:16.450 --> 00:55:19.390 particular obstacles are and what these cars are trying 00:55:19.390 --> 00:55:23.500 to learn is how to drive through some sort of environment 00:55:23.500 --> 00:55:25.248 while not crashing into anything. 00:55:25.248 --> 00:55:27.040 And initially they didn't do very well, you 00:55:27.040 --> 00:55:29.650 notice they were crashing almost immediately but now we're 00:55:29.650 --> 00:55:32.158 about six generations in, seven generations in, 00:55:32.158 --> 00:55:33.950 they're starting to do a little bit better. 00:55:33.950 --> 00:55:38.178 They're starting to get a sense for like turning when you run into a wall. 00:55:38.178 --> 00:55:39.970 Even when they get to new environments they 00:55:39.970 --> 00:55:43.137 haven't necessarily seen before, they're starting to do a little bit better. 00:55:43.137 --> 00:55:45.095 But of course, they're still not great, they're 00:55:45.095 --> 00:55:46.450 still crashing quite frequently. 00:55:46.450 --> 00:55:50.470 Sometimes a generation does even worse than the generation before it, 00:55:50.470 --> 00:55:52.720 because it's not always going to be the case that when 00:55:52.720 --> 00:55:55.780 you make random mutations those mutations are going to be 00:55:55.780 --> 00:55:56.840 better. 00:55:56.840 --> 00:56:00.020 Sometimes the mutations actually end up being a little bit worse.
00:56:00.020 --> 00:56:02.860 And so they might move backwards a little bit. 00:56:02.860 --> 00:56:05.320 But over time, generation after generation, 00:56:05.320 --> 00:56:09.490 you're hopefully seeing that now 10 generations in, these cars in general 00:56:09.490 --> 00:56:11.890 are doing a whole lot better than they were before 00:56:11.890 --> 00:56:15.250 and every generation we're taking whichever cars made it the furthest, 00:56:15.250 --> 00:56:19.600 duplicating those, eliminating the rest, and moving forward. 00:56:19.600 --> 00:56:21.520 Is there a question in the chat? 00:56:21.520 --> 00:56:23.770 DAVID MALAN: Yeah, with regard to the pancakes, 00:56:23.770 --> 00:56:27.220 how did the robot know what was wrong with the previous pancake flips? 00:56:27.220 --> 00:56:30.550 In the case of the cars how does the car know that it erred? 00:56:30.550 --> 00:56:33.800 BRIAN YU: Yeah, so whenever you're doing anything with reinforcement learning, 00:56:33.800 --> 00:56:36.040 whether it's the pancakes or these robots here, 00:56:36.040 --> 00:56:39.910 the programmer still needs to tell the AI what does success look like 00:56:39.910 --> 00:56:41.540 and what does failure look like. 00:56:41.540 --> 00:56:45.550 So in the pancake example, we trained the pancake flipper 00:56:45.550 --> 00:56:49.795 to be able to know that when you're flipping this pancake when it's first 00:56:49.795 --> 00:56:51.920 taught what does it look like when it's successful. 00:56:51.920 --> 00:56:53.837 So it gets a feel for that and it was probably 00:56:53.837 --> 00:56:57.470 also told that if the pancake falls, that's not a good thing for example. 00:56:57.470 --> 00:57:00.770 And that's not something that the AI should try to do. 
00:57:00.770 --> 00:57:02.740 And in this case, I assume these cars have 00:57:02.740 --> 00:57:06.580 been programmed to be told that if you crash into something, that is not 00:57:06.580 --> 00:57:10.030 good, which the car presumably can detect after it's crashed into something. 00:57:10.030 --> 00:57:14.042 And so it probably also has some sense of how far it was able to drive, 00:57:14.042 --> 00:57:16.750 such that we could duplicate the ones that did drive the furthest 00:57:16.750 --> 00:57:19.840 and not the ones that didn't. And let's see how this car does; 00:57:19.840 --> 00:57:22.330 it was so close to the end, but didn't quite make it. 00:57:22.330 --> 00:57:27.030 Maybe give it one more generation and we'll see how this generation does. 00:57:27.030 --> 00:57:29.075 And all right, so it's navigating these turns; 00:57:29.075 --> 00:57:31.200 it looks like a whole bunch didn't make it, but as long 00:57:31.200 --> 00:57:33.000 as we get one successful one, we can learn 00:57:33.000 --> 00:57:35.020 from that successful one in the future. 00:57:35.020 --> 00:57:39.930 Here's the ending; it looks like most crashed, but all right, it 00:57:39.930 --> 00:57:42.990 looks like one car was finally able to make it to the end 00:57:42.990 --> 00:57:46.680 and was able to successfully learn that task of how to navigate 00:57:46.680 --> 00:57:48.540 through this environment of mazes. 00:57:48.540 --> 00:57:52.500 So I'll pause here for questions, then, about reinforcement learning 00:57:52.500 --> 00:57:56.520 and how these ideas might have worked. 00:57:56.520 --> 00:57:58.660 And, Samuel, did you have a question? 00:57:58.660 --> 00:58:00.210 AUDIENCE: Yes. 00:58:00.210 --> 00:58:05.490 So with the genetic algorithm, the car one specifically, 00:58:05.490 --> 00:58:12.300 are all the cars learning from each other as each one crashes?
00:58:12.300 --> 00:58:15.240 BRIAN YU: It's not so much that the cars are learning from each other 00:58:15.240 --> 00:58:18.510 as that we're generating new cars based on the cars that 00:58:18.510 --> 00:58:20.160 were previously successful. 00:58:20.160 --> 00:58:22.890 The idea being that if we run 10 cars and see 00:58:22.890 --> 00:58:25.890 how far they go, we take the five that went the furthest, 00:58:25.890 --> 00:58:28.593 we eliminate the other five that didn't make it very far, 00:58:28.593 --> 00:58:30.510 and then we duplicate the ones that 00:58:30.510 --> 00:58:34.260 did do well, such that in the future, hopefully this new generation of cars 00:58:34.260 --> 00:58:35.790 will make it a little bit further. 00:58:35.790 --> 00:58:37.628 And generation after generation, the hope 00:58:37.628 --> 00:58:40.170 is that we're able to make it a little further along the road, 00:58:40.170 --> 00:58:42.900 until eventually, as you saw, 15 generations in, 00:58:42.900 --> 00:58:47.290 we were able to get some cars that could perform the task successfully. 00:58:47.290 --> 00:58:48.990 Let's go now to Josiah. 00:58:48.990 --> 00:58:56.790 AUDIENCE: Is the car specifically learning just from the track? 00:58:56.790 --> 00:59:03.030 I mean, when you change the track, do we need another, I mean, 00:59:03.030 --> 00:59:06.063 do we need to start from zero again or? 00:59:06.063 --> 00:59:09.230 BRIAN YU: Yeah, so if the track were different, would it have to learn again? 00:59:09.230 --> 00:59:10.130 Hopefully not. 00:59:10.130 --> 00:59:12.620 Hopefully what the cars were learning 00:59:12.620 --> 00:59:14.880 in this case was based on sensor data, 00:59:14.880 --> 00:59:18.380 like how far away a wall is in any direction, and which way to turn. 00:59:18.380 --> 00:59:20.990 And the goal is for this type of approach to generalize.
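The selection process Brian describes, run a generation of cars, keep the half that drove the furthest, duplicate them with random mutations, and repeat, can be sketched in Python. Everything here is a toy stand-in: the genome of five steering parameters and the `fitness` function that simulates "distance driven" are invented for illustration.

```python
import random

random.seed(0)  # reproducible toy run

def fitness(genome):
    # Made-up stand-in for "how far the car drove": the closer each
    # parameter is to an (invented) ideal value of 0.5, the further it gets.
    return -sum((g - 0.5) ** 2 for g in genome)

def evolve(population, generations=30, mutation_rate=0.1):
    for _ in range(generations):
        # Rank the cars by how far they made it
        population.sort(key=fitness, reverse=True)
        survivors = population[:len(population) // 2]  # keep the best half
        children = []
        for parent in survivors:
            # Duplicate each survivor, randomly mutating some parameters
            child = [g + random.uniform(-0.1, 0.1)
                     if random.random() < mutation_rate else g
                     for g in parent]
            children.append(child)
        population = survivors + children  # eliminate the rest
    return max(population, key=fitness)

# Start with cars that drive entirely randomly
population = [[random.random() for _ in range(5)] for _ in range(10)]
initial_best = max(population, key=fitness)
best = evolve(population)
```

Because this sketch keeps each generation's best car unchanged, its fitness never moves backwards; in the demo, where mutations touch more of the population, a generation can come out worse than the one before it, just as Brian noted.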
00:59:20.990 --> 00:59:24.260 And I mean, hopefully actual self-driving cars in the real world 00:59:24.260 --> 00:59:25.640 are not trained this way. 00:59:25.640 --> 00:59:28.610 But you would hope that as they're trained on sample environments, 00:59:28.610 --> 00:59:30.800 that when you put them in a different environment 00:59:30.800 --> 00:59:32.840 they would be able to generalize their knowledge 00:59:32.840 --> 00:59:34.490 to be able to apply similar techniques. 00:59:34.490 --> 00:59:37.790 That if in a real world setting you take a car and put it on a road 00:59:37.790 --> 00:59:39.650 that it's never seen before, hopefully it's 00:59:39.650 --> 00:59:42.620 seen enough examples of things that are like it, sensor data 00:59:42.620 --> 00:59:46.630 that it recognizes, that it's able to make an intelligent decision based 00:59:46.630 --> 00:59:47.130 on that. 00:59:47.130 --> 00:59:49.760 So yeah, the hope is and the goal in many cases with AI 00:59:49.760 --> 00:59:52.910 is to be able to generalize this knowledge beyond just 00:59:52.910 --> 00:59:53.765 a specific example. 00:59:56.480 --> 00:59:58.360 Any other questions? 00:59:58.360 --> 01:00:02.730 Yeah, let's go to [? Yagonch. ?] 01:00:02.730 --> 01:00:07.380 AUDIENCE: Yeah, so can these algorithms ever reach like a bottleneck, 01:00:07.380 --> 01:00:11.730 just like in real life evolution, there are branches 01:00:11.730 --> 01:00:14.550 and some branches just reach a bottleneck. 01:00:14.550 --> 01:00:16.450 So is that possible here [INAUDIBLE]? 01:00:16.450 --> 01:00:19.460 BRIAN YU: Yes, it's definitely possible that there might be-- 01:00:19.460 --> 01:00:22.800 the algorithms might end up converging to something that seems pretty good 01:00:22.800 --> 01:00:25.410 and it doesn't seem like any mutations are doing any better, 01:00:25.410 --> 01:00:27.540 but it turns out a totally different algorithm 01:00:27.540 --> 01:00:29.100 might actually do better than that. 
01:00:29.100 --> 01:00:31.230 That's what we will often call a local maximum 01:00:31.230 --> 01:00:33.060 in the context of artificial intelligence, 01:00:33.060 --> 01:00:36.420 where there is some better approach, there is some better algorithm 01:00:36.420 --> 01:00:37.990 but we can't necessarily find it. 01:00:37.990 --> 01:00:40.840 There are other strategies for trying to deal with that problem. 01:00:40.840 --> 01:00:43.863 But it is definitely a challenge we're thinking about. 01:00:43.863 --> 01:00:44.780 And one more question. 01:00:44.780 --> 01:00:45.783 Let's go to Sophia. 01:00:45.783 --> 01:00:48.450 AUDIENCE: I have a question about how the fitness is calculated. 01:00:48.450 --> 01:00:51.990 Like in both of these cases, is it like [INAUDIBLE] motors that 01:00:51.990 --> 01:00:54.872 are running or here like the distance? 01:00:54.872 --> 01:00:57.330 BRIAN YU: Yeah, in the case of the pancake, it was probably 01:00:57.330 --> 01:01:00.030 like a binary of just like did the pancake end up 01:01:00.030 --> 01:01:02.520 landing in the pan or not, was our way of calculating 01:01:02.520 --> 01:01:04.730 the fitness of that particular example. 01:01:04.730 --> 01:01:07.230 In the case of the cars, we would probably calculate fitness 01:01:07.230 --> 01:01:11.190 based on distance traveled, whichever cars ended up traveling the furthest, 01:01:11.190 --> 01:01:13.380 that would be our definition of fitness and that's 01:01:13.380 --> 01:01:15.690 really the part the programmer needs to be involved in. 01:01:15.690 --> 01:01:17.610 The programmer needs to decide, what does it 01:01:17.610 --> 01:01:21.720 mean to be most fit, what does a bad example look like and using that, 01:01:21.720 --> 01:01:25.320 once the computer knows what success looks like and what failure looks like, 01:01:25.320 --> 01:01:28.620 then it's able to learn from that to be able to do better in the future. 
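The local-maximum idea can be shown with a minimal hill-climbing sketch in Python. The two-peak `landscape` function is invented for this example: an algorithm that only ever steps uphill gets stuck on the smaller peak if it starts near it, even though a taller peak exists elsewhere.

```python
def landscape(x):
    # An invented fitness landscape: a small peak at x=2 (height 1)
    # and a taller global peak at x=8 (height 5).
    if x < 5:
        return -(x - 2) ** 2 + 1
    return -(x - 8) ** 2 + 5

def hill_climb(x, step=0.1):
    # Keep stepping uphill until neither neighbor is better.
    while True:
        neighbor = max(x - step, x + step, key=landscape)
        if landscape(neighbor) <= landscape(x):
            return x
        x = neighbor

stuck = hill_climb(0.0)   # climbs to the smaller peak near x=2 and stops
better = hill_climb(6.0)  # a different start reaches the taller peak near x=8
```

Techniques like random restarts (trying several starting points, as the second call does) are among the strategies for dealing with this problem that Brian alludes to.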
01:01:31.300 --> 01:01:33.160 All right, so that was genetic algorithms, 01:01:33.160 --> 01:01:38.340 which is one example of computers that are able to learn from the way 01:01:38.340 --> 01:01:41.428 that humans learn, learning from the way that nature works for example, 01:01:41.428 --> 01:01:43.470 but computer scientists didn't really stop there. 01:01:43.470 --> 01:01:46.870 There are other examples that we can do to add to this as well. 01:01:46.870 --> 01:01:50.850 And so one other example of using this idea of reinforcement learning 01:01:50.850 --> 01:01:55.140 and using genetic algorithms might be in video recommendations for example, 01:01:55.140 --> 01:01:58.350 where you could have some watch history and the way 01:01:58.350 --> 01:02:01.590 that an algorithm like YouTube or Netflix might suggest videos for you 01:02:01.590 --> 01:02:04.320 to watch is via this reinforcement process, 01:02:04.320 --> 01:02:08.247 that it will just try suggesting videos to you and learn from experience. 01:02:08.247 --> 01:02:10.830 If it suggests a video to you that you like, that you click on 01:02:10.830 --> 01:02:12.538 that, you watch all the way through, well 01:02:12.538 --> 01:02:14.890 then the algorithm learns in the future, let's recommend 01:02:14.890 --> 01:02:16.140 more of those sorts of things. 01:02:16.140 --> 01:02:18.690 That's like the car traveling very far and so we 01:02:18.690 --> 01:02:20.520 learn to do more of that in the future. 01:02:20.520 --> 01:02:22.937 If they recommend a video to you and you don't click on it 01:02:22.937 --> 01:02:25.585 you don't ever watch it, well then it's probably not 01:02:25.585 --> 01:02:27.460 going to recommend that to you in the future. 01:02:27.460 --> 01:02:31.390 And it's probably going to learn from that experience as well. 01:02:31.390 --> 01:02:35.130 And so this is one example of computer science learning from nature, 01:02:35.130 --> 01:02:36.780 learning from the way that humans are. 
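The click/no-click feedback loop described here can be sketched as a bare-bones score-keeping recommender in Python. The category names and the one-point reward and penalty are assumptions made up for this sketch, not how YouTube's or Netflix's actual systems work.

```python
def update_score(scores, category, clicked):
    # Reinforce what the user engages with; back off what they ignore.
    change = 1 if clicked else -1
    scores[category] = scores.get(category, 0) + change

def recommend(scores):
    # Suggest whichever category has earned the highest score so far.
    return max(scores, key=scores.get)

scores = {}
update_score(scores, "cooking", clicked=True)   # watched all the way through
update_score(scores, "cooking", clicked=True)
update_score(scores, "golf", clicked=False)     # never clicked
```

After these three interactions, `recommend(scores)` would suggest more cooking videos, mirroring the learn-from-experience loop in the text.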
01:02:36.780 --> 01:02:40.080 Another place that's happened too is by looking at the human brain. 01:02:40.080 --> 01:02:43.410 The human brain consists of neurons, and those neurons are connected 01:02:43.410 --> 01:02:45.960 to one another and pass information from one to another; 01:02:45.960 --> 01:02:49.380 electrical signals flow through one neuron to another neuron, 01:02:49.380 --> 01:02:51.960 and that's how brains are able to make these very 01:02:51.960 --> 01:02:54.700 sophisticated kinds of computations. 01:02:54.700 --> 01:02:58.710 And this is what we might call a neural network, a collection of these neurons. 01:02:58.710 --> 01:03:01.920 And one area that AI has been looking into, one especially gaining 01:03:01.920 --> 01:03:05.760 in popularity recently, has been trying to develop artificial neural networks. 01:03:05.760 --> 01:03:08.700 Instead of using a biological neuron, we just 01:03:08.700 --> 01:03:11.850 use what we might call an artificial neuron or a unit. 01:03:11.850 --> 01:03:14.880 And you can think of this unit as just storing some value, 01:03:14.880 --> 01:03:18.360 like some electrical signal, for example, like you might find in the human brain, 01:03:18.360 --> 01:03:21.960 now just in a digital format. And these artificial neurons, 01:03:21.960 --> 01:03:24.360 these units, can be connected to each other 01:03:24.360 --> 01:03:28.140 in a sort of input-translates-into-output format, 01:03:28.140 --> 01:03:32.910 where this arrow represents some sort of calculation, some calculation that 01:03:32.910 --> 01:03:36.000 is taking this value on the left and transforming it 01:03:36.000 --> 01:03:38.010 into that value on the right. 01:03:38.010 --> 01:03:42.180 And neural networks are usually not just one input unit and one output unit, 01:03:42.180 --> 01:03:43.410 but they can be more complex.
01:03:43.410 --> 01:03:46.470 You might have a neural network with two different inputs, each of which 01:03:46.470 --> 01:03:49.350 connects to an output or even more sophisticated neural networks 01:03:49.350 --> 01:03:53.070 like this one, where you have multiple layers of these units that are all 01:03:53.070 --> 01:03:55.710 connected to each other, each of these arrows performing 01:03:55.710 --> 01:03:57.630 some sort of calculation and if you've ever 01:03:57.630 --> 01:04:00.810 heard the term deep learning, this is often what that's referring to, 01:04:00.810 --> 01:04:04.650 this idea of deep neural networks with many layers, each of which 01:04:04.650 --> 01:04:06.140 is performing calculations. 01:04:06.140 --> 01:04:09.060 And ultimately using some linear algebra, 01:04:09.060 --> 01:04:11.160 these neural networks are able to figure out 01:04:11.160 --> 01:04:16.170 exactly how to tune these calculations to translate input into some output. 01:04:16.170 --> 01:04:19.560 If you give a neural network enough data it can learn from that data, 01:04:19.560 --> 01:04:22.950 it can learn exactly how to set these various different arrows 01:04:22.950 --> 01:04:25.890 to be able to calculate some task, to be able to translate 01:04:25.890 --> 01:04:28.690 some input into some output. 01:04:28.690 --> 01:04:32.640 So for example, that might take the form of handwriting recognition. 01:04:32.640 --> 01:04:35.910 How exactly do we train computers to be able to learn 01:04:35.910 --> 01:04:39.390 how to recognize handwriting given all the different types of handwriting? 01:04:39.390 --> 01:04:42.390 Well, one way to try to do that is by using a neural network, 01:04:42.390 --> 01:04:45.930 setting up a network of all of these neurons and all of these connections 01:04:45.930 --> 01:04:49.290 and then you give to that neural network a whole bunch of data. 
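The idea that every arrow performs some calculation can be made concrete with a tiny forward pass in Python. The weights, biases, and the ReLU activation below are arbitrary values chosen for the sketch, not a trained network; real training would tune them from data.

```python
def forward(inputs, layers):
    # Each layer is a list of units; each unit is a (weights, bias) pair.
    # Every "arrow" in the diagram corresponds to one weight here.
    values = inputs
    for layer in layers:
        values = [
            # Weighted sum plus bias, passed through a ReLU activation
            max(0.0, sum(w * v for w, v in zip(weights, values)) + bias)
            for weights, bias in layer
        ]
    return values

# An invented network: 2 inputs -> 2 hidden units -> 1 output
layers = [
    [([1.0, -1.0], 0.0), ([0.5, 0.5], 0.0)],  # hidden layer
    [([1.0, 2.0], 0.0)],                      # output layer
]
output = forward([3.0, 1.0], layers)
```

Each layer is just a matrix-vector multiply plus a nonlinearity, which is the linear algebra the lecture mentions; "deep learning" stacks many such layers.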
01:04:49.290 --> 01:04:51.090 I give to the neural network a whole bunch 01:04:51.090 --> 01:04:54.970 of already existing handwritten digits, each of which is labeled. 01:04:54.970 --> 01:04:57.853 So the computer knows which image corresponds to which digit. 01:04:57.853 --> 01:05:00.270 This data set you're looking at here is a very famous data 01:05:00.270 --> 01:05:03.720 set of handwritten digits called the MNIST data set, 01:05:03.720 --> 01:05:08.130 and given all of this data the computer can start to train the neural network 01:05:08.130 --> 01:05:10.740 and start to figure out exactly what calculations 01:05:10.740 --> 01:05:13.110 to run at each layer of the neural network 01:05:13.110 --> 01:05:16.020 to be able to translate the input into the output, 01:05:16.020 --> 01:05:19.830 to be able to translate like a screenshot of what looks 01:05:19.830 --> 01:05:23.070 like a handwritten number eight, that we all could tell is the number eight 01:05:23.070 --> 01:05:26.130 but might be difficult for us to describe exactly how a computer could 01:05:26.130 --> 01:05:28.260 figure that out, but via the neural network, 01:05:28.260 --> 01:05:30.810 by training the neural network on all of the sample data, 01:05:30.810 --> 01:05:33.900 it's able to learn some sort of function that 01:05:33.900 --> 01:05:36.780 can translate this input, this handwritten digit 01:05:36.780 --> 01:05:39.850 into the output, what the actual digit happens to be. 01:05:39.850 --> 01:05:42.570 And these neural networks have proved to be incredibly versatile 01:05:42.570 --> 01:05:44.430 and we won't get into all the math here because it 01:05:44.430 --> 01:05:47.305 does get a little bit more complex, but it's used all over the place. 
01:05:47.305 --> 01:05:50.290 It can be used for email spam detection, where 01:05:50.290 --> 01:05:53.370 if you give the computer a whole bunch of data, a whole bunch of emails, 01:05:53.370 --> 01:05:56.040 some of which are labeled spam and some of which are not, 01:05:56.040 --> 01:05:57.390 it can learn a function. 01:05:57.390 --> 01:06:01.200 It can learn exactly how to tune that neural network to be able to predict, 01:06:01.200 --> 01:06:04.510 for any given email, whether it's spam or not. 01:06:04.510 --> 01:06:07.260 And really the key factor here to making these networks work 01:06:07.260 --> 01:06:09.520 is having access to large amounts of data. 01:06:09.520 --> 01:06:12.540 And this is why a lot of companies now are doing a lot of work 01:06:12.540 --> 01:06:15.330 and trying to get a lot of data and use that data in training 01:06:15.330 --> 01:06:18.940 their machine learning models, it's because the more data that you have, 01:06:18.940 --> 01:06:22.650 the better you can make these algorithms because the better you can tune them 01:06:22.650 --> 01:06:25.050 to all the different types of inputs there might be, 01:06:25.050 --> 01:06:27.300 to help to make them even more accurate in the future, 01:06:27.300 --> 01:06:30.330 to be able to test these networks to see how well they work. 01:06:30.330 --> 01:06:33.420 And every time you're interacting with websites online, 01:06:33.420 --> 01:06:35.820 you're often helping to provide some of that data. 01:06:35.820 --> 01:06:38.380 Every time you go to your email app for example 01:06:38.380 --> 01:06:42.270 and you mark an email as spam, that email app might be learning from that, 01:06:42.270 --> 01:06:45.730 learning OK this type of email, that's an example of a spam email. 01:06:45.730 --> 01:06:48.120 And so it learns in the future to do a better job 01:06:48.120 --> 01:06:49.740 of trying to do that classification. 
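A minimal version of the spam idea, assuming a single artificial neuron (logistic regression) trained by gradient descent rather than a full neural network. The three features and the handful of labeled emails are made up for illustration; a real spam filter would use far more features and far more data.

```python
import math

def predict(weights, bias, features):
    # Squash the weighted sum into a 0-to-1 "probability of spam"
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))

def train(emails, labels, steps=2000, rate=0.5):
    weights = [0.0] * len(emails[0])
    bias = 0.0
    for _ in range(steps):
        for features, label in zip(emails, labels):
            # Nudge the weights to shrink the prediction error
            error = predict(weights, bias, features) - label
            weights = [w - rate * error * x for w, x in zip(weights, features)]
            bias -= rate * error
    return weights, bias

# Invented features: [mentions money?, all-caps subject?, from a known contact?]
emails = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam
weights, bias = train(emails, labels)
```

Marking a message as spam, as described above, simply adds one more labeled example to this kind of training data.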
01:06:49.740 --> 01:06:52.148 And likewise, every time an email is marked as spam 01:06:52.148 --> 01:06:53.940 and you have to tell the computer, you know 01:06:53.940 --> 01:06:56.523 what, that wasn't spam the computer is learning from that too. 01:06:56.523 --> 01:07:01.110 It's more data that the computer can use to help to inform its algorithm 01:07:01.110 --> 01:07:02.640 and the way that it's working. 01:07:02.640 --> 01:07:07.290 And so for one final example, we can take a look at how images like this 01:07:07.290 --> 01:07:08.430 were actually generated. 01:07:08.430 --> 01:07:11.370 How could a computer possibly get an image like this 01:07:11.370 --> 01:07:14.850 and generate it and make something that looks very much like a real person, 01:07:14.850 --> 01:07:17.692 even though it's not actually a real person? 01:07:17.692 --> 01:07:19.650 Well, it's done using the exact same technique, 01:07:19.650 --> 01:07:23.220 using a neural network that learns how to translate inputs 01:07:23.220 --> 01:07:25.910 into outputs by having access to a large amount of data. 01:07:25.910 --> 01:07:29.800 In this case, having access to many, many photos of real people, 01:07:29.800 --> 01:07:33.150 so the computer can start to get a sense for what a real person looks like, 01:07:33.150 --> 01:07:35.400 it can start to train the network in that way. 01:07:35.400 --> 01:07:38.490 And then rather than build the entire person all at once, 01:07:38.490 --> 01:07:39.960 build it up step by step. 01:07:39.960 --> 01:07:43.920 A computer generating a photo like this, this is a pretty complicated task, 01:07:43.920 --> 01:07:45.550 it's pretty difficult to do. 01:07:45.550 --> 01:07:48.690 But you know what's easier is generating that. 
01:07:48.690 --> 01:07:51.570 That is 16 pixels. It doesn't look like a person at all, 01:07:51.570 --> 01:07:53.580 but this a computer could do pretty readily, 01:07:53.580 --> 01:07:57.630 just generate what seems to be a whole bunch of random-looking pixels. 01:07:57.630 --> 01:08:01.080 But then you would train the computer to add a little bit of detail 01:08:01.080 --> 01:08:04.660 to this image, to be able to learn, if this were a real image, 01:08:04.660 --> 01:08:08.130 how would I add a little more detail to it to make it look a little bit more 01:08:08.130 --> 01:08:09.600 like what a person would look like? 01:08:09.600 --> 01:08:12.540 And it does so again by having access to large amounts of data, 01:08:12.540 --> 01:08:16.540 many, many photos of people, so it can begin to learn from that experience. 01:08:16.540 --> 01:08:19.050 So the computer learns how to take this image 01:08:19.050 --> 01:08:22.540 and turn it into this, for example. 01:08:22.540 --> 01:08:24.540 It still doesn't really look like a person, but it 01:08:24.540 --> 01:08:25.957 looks a little more like a person. 01:08:25.957 --> 01:08:27.210 It's got a higher resolution. 01:08:27.210 --> 01:08:30.420 You can maybe make out that there is some hair here and a face here. 01:08:30.420 --> 01:08:35.310 And then the algorithm learns how to take an image like this, an 8 by 8 grid, 01:08:35.310 --> 01:08:37.560 and turn it into a 16 by 16 grid. 01:08:37.560 --> 01:08:40.080 Now this, I mean, it still doesn't look like a person, 01:08:40.080 --> 01:08:41.580 but it looks a little more accurate. 01:08:41.580 --> 01:08:44.460 And over time, we can follow these steps one 01:08:44.460 --> 01:08:46.380 after another, adding more and more detail.
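The grid-doubling mechanics can be sketched in Python with plain nearest-neighbor upscaling, where each pixel simply becomes a 2x2 block. The real generative model learns what new detail to add at each doubling; this sketch only copies pixels, so it illustrates the resolution steps and nothing more.

```python
def double_resolution(grid):
    # Each pixel becomes a 2x2 block: duplicate every pixel within
    # a row, then duplicate the row itself.
    result = []
    for row in grid:
        stretched = [pixel for pixel in row for _ in range(2)]
        result.append(stretched)
        result.append(stretched[:])
    return result

tiny = [[0, 255], [255, 0]]       # a 2x2 "image" of grayscale values
bigger = double_resolution(tiny)  # now 4x4
huge = double_resolution(bigger)  # now 8x8, and so on up to a full image
```

Repeating this step, with a learned model deciding what the new pixels should look like instead of copying, is how a tiny random grid can grow into a convincing face.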
01:08:46.380 --> 01:08:51.210 Until eventually, the computer is able to generate an entire image that really 01:08:51.210 --> 01:08:53.880 looks like a person, that's very hard to distinguish just 01:08:53.880 --> 01:08:56.160 by this input to output process. 01:08:56.160 --> 01:08:58.649 Learning some mapping from inputs to outputs 01:08:58.649 --> 01:09:02.189 by having access to a whole lot of data. 01:09:02.189 --> 01:09:05.430 So before we wrap up here I'll pause for any final questions 01:09:05.430 --> 01:09:08.910 about artificial intelligence, any of the algorithms 01:09:08.910 --> 01:09:13.670 we looked at for searching, for learning or anything like that? 01:09:13.670 --> 01:09:14.660 Any final questions? 01:09:14.660 --> 01:09:16.590 Yeah, Sophia, go ahead. 01:09:16.590 --> 01:09:18.899 AUDIENCE: I had a question about like with the fitness, 01:09:18.899 --> 01:09:22.115 how is all the I guess, like data-- you mentioned they get regenerated 01:09:22.115 --> 01:09:23.990 like the successful ones, but I was wondering 01:09:23.990 --> 01:09:27.965 like how is all that data sort like what the success was exactly, 01:09:27.965 --> 01:09:30.770 but fitness is just like a score it's hard to know 01:09:30.770 --> 01:09:35.373 like exactly what decisions it made that said it was successful. 01:09:35.373 --> 01:09:38.540 BRIAN YU: Yeah, and that's actually one of the trickier things about machine 01:09:38.540 --> 01:09:41.060 learning, that we can train these machine learning models 01:09:41.060 --> 01:09:43.220 to be very, very good but it's not always 01:09:43.220 --> 01:09:47.310 immediately apparent to us like what it was doing in order to be successful. 01:09:47.310 --> 01:09:50.007 And this is an active area of research within machine learning, 01:09:50.007 --> 01:09:52.340 including some faculty at Harvard that work on this too, 01:09:52.340 --> 01:09:55.490 which is the interpretability of machine learning results. 
01:09:55.490 --> 01:09:59.000 Like the algorithm becomes very, very good at recommending a video to you 01:09:59.000 --> 01:10:02.120 or generating an image of a person, but it's hard for a person 01:10:02.120 --> 01:10:04.910 to look at that machine learning model and understand 01:10:04.910 --> 01:10:06.830 how it arrived at that conclusion. 01:10:06.830 --> 01:10:09.805 Ultimately, in many cases, people just throw their hands up and say, 01:10:09.805 --> 01:10:11.930 we don't really care how the algorithm is doing it, 01:10:11.930 --> 01:10:14.030 as long as the algorithm is doing it successfully 01:10:14.030 --> 01:10:16.695 and eventually produces good results, we'll 01:10:16.695 --> 01:10:19.070 just take the ones that are doing the best and use those, 01:10:19.070 --> 01:10:23.120 even if we don't necessarily understand exactly how those algorithms are 01:10:23.120 --> 01:10:23.660 working. 01:10:23.660 --> 01:10:26.000 And that's definitely an area of concern for some people 01:10:26.000 --> 01:10:29.960 and an area of research for others that are looking into these types of tools 01:10:29.960 --> 01:10:32.510 and technologies. 01:10:32.510 --> 01:10:34.255 Is there a question in the chat maybe? 01:10:34.255 --> 01:10:36.380 DAVID MALAN: A final question from the chat, Brian. 01:10:36.380 --> 01:10:39.890 Every time I choose the parts of the picture that contain traffic lights 01:10:39.890 --> 01:10:45.398 or crosswalks to prove I am not a robot, am I training Google's driverless car? 01:10:45.398 --> 01:10:47.690 BRIAN YU: Maybe not Google's necessarily, but certainly 01:10:47.690 --> 01:10:49.400 a lot of times when you're doing that type of thing. 01:10:49.400 --> 01:10:52.170 I mean, in one part it's to verify that you are in fact human. 01:10:52.170 --> 01:10:54.003 That is one of the purposes of those things, 01:10:54.003 --> 01:10:57.200 to make sure that robots aren't signing up for websites, for example. 
01:10:57.200 --> 01:11:00.770 But certainly it could be just giving more examples of labeled data. 01:11:00.770 --> 01:11:03.080 That oftentimes what machine learning models rely on 01:11:03.080 --> 01:11:06.770 is labeled data where you have like a digit, a handwritten digit, 01:11:06.770 --> 01:11:10.190 and you have a label of "this is the number 2," or "this is the number 8." 01:11:10.190 --> 01:11:13.880 And so the computer can then use those digits along with those numbers 01:11:13.880 --> 01:11:17.960 in order to figure out how to learn how to take handwritten digits 01:11:17.960 --> 01:11:20.570 and convert it to individual numbers. 01:11:20.570 --> 01:11:23.420 And so having access to that kind of labeled data ultimately 01:11:23.420 --> 01:11:25.910 ends up being really, really valuable. 01:11:25.910 --> 01:11:28.700 All right, so that's it for artificial intelligence for today. 01:11:28.700 --> 01:11:32.900 Thank you all so much and we'll see you next time. 01:11:32.900 --> 01:11:35.650 [MUSIC PLAYING]