1 00:00:00,000 --> 00:00:03,486 [MUSIC PLAYING] 2 00:00:03,486 --> 00:01:07,345 3 00:01:07,345 --> 00:01:10,960 TOM CRUISE: I'm going to show you some magic. 4 00:01:10,960 --> 00:01:12,250 It's the real thing. 5 00:01:12,250 --> 00:01:14,420 [LAUGHTER] 6 00:01:14,420 --> 00:01:24,340 I mean, it's all the real thing. 7 00:01:24,340 --> 00:01:26,270 [LAUGHTER] 8 00:01:26,270 --> 00:01:27,410 DAVID J. MALAN: All right. 9 00:01:27,410 --> 00:01:30,950 This is CS50, Harvard University's Introduction 10 00:01:30,950 --> 00:01:33,140 to the Intellectual Enterprises of Computer Science 11 00:01:33,140 --> 00:01:34,430 and the Art of Programming. 12 00:01:34,430 --> 00:01:37,760 My name is David Malan, and this is our family-friendly introduction 13 00:01:37,760 --> 00:01:41,780 to artificial intelligence or AI, which seems to be everywhere these days. 14 00:01:41,780 --> 00:01:45,140 But first, a word on these rubber ducks, which your students 15 00:01:45,140 --> 00:01:46,487 might have had for some time. 16 00:01:46,487 --> 00:01:49,320 Within the world of computer science, and programming in particular, 17 00:01:49,320 --> 00:01:52,145 there's this notion of rubber duck debugging or rubber ducking-- 18 00:01:52,145 --> 00:01:57,080 --whereby in the absence of a colleague, a friend, a family member, a teaching 19 00:01:57,080 --> 00:02:00,120 fellow who might be able to answer your questions about your code, 20 00:02:00,120 --> 00:02:02,210 especially when it's not working, ideally you 21 00:02:02,210 --> 00:02:04,940 might have at least a rubber duck or really any inanimate 22 00:02:04,940 --> 00:02:07,550 object on your desk with whom to talk. 23 00:02:07,550 --> 00:02:11,243 And the idea is, that in expressing your logic, talking through your problems, 24 00:02:11,243 --> 00:02:13,160 even though the duck doesn't actually respond, 25 00:02:13,160 --> 00:02:16,250 invariably, you hear eventually the illogic in your thoughts 26 00:02:16,250 --> 00:02:18,110 and the proverbial light bulb goes off. 27 00:02:18,110 --> 00:02:20,900 Now, for students online for some time, CS50 28 00:02:20,900 --> 00:02:23,370 has had a digital version thereof, whereby 29 00:02:23,370 --> 00:02:25,945 in the programming environment that CS50 students use, 30 00:02:25,945 --> 00:02:29,070 for the past several years, if they don't have a rubber duck on their desk, 31 00:02:29,070 --> 00:02:30,790 they can pull up this interface here. 32 00:02:30,790 --> 00:02:32,850 And if they begin a conversation like, I'm 33 00:02:32,850 --> 00:02:35,850 hoping you can help me solve some problem, up until recently, 34 00:02:35,850 --> 00:02:39,640 CS50's virtual rubber duck would simply quack once, twice, 35 00:02:39,640 --> 00:02:41,010 or three times in total. 36 00:02:41,010 --> 00:02:43,380 But we have anecdotal evidence that alone 37 00:02:43,380 --> 00:02:47,010 was enough to get students to realize what it is they were doing wrong. 38 00:02:47,010 --> 00:02:51,090 But of course, more recently has this duck and so many other ducks, 39 00:02:51,090 --> 00:02:53,340 so to speak, around the world, come to life really. 40 00:02:53,340 --> 00:02:56,310 And your students have been using artificial intelligence 41 00:02:56,310 --> 00:03:00,090 in some form within CS50 as a virtual teaching assistant. 
42 00:03:00,090 --> 00:03:02,130 And what we'll do today, is reveal not only 43 00:03:02,130 --> 00:03:05,370 how we've been using and leveraging AI within CS50, 44 00:03:05,370 --> 00:03:10,530 but also how AI itself works, and to prepare you better for the years ahead. 45 00:03:10,530 --> 00:03:14,910 So last year around this time, like DALL-E 2 and image generation 46 00:03:14,910 --> 00:03:15,870 were all of the rage. 47 00:03:15,870 --> 00:03:18,600 You might have played with this, whereby you can type in some keywords and boom, 48 00:03:18,600 --> 00:03:20,640 you have a dynamically generated image. 49 00:03:20,640 --> 00:03:24,240 Similar tools are like Midjourney, which gives you even more realistic 3D 50 00:03:24,240 --> 00:03:24,960 imagery. 51 00:03:24,960 --> 00:03:27,840 And within that world of image generation, 52 00:03:27,840 --> 00:03:32,370 there were nonetheless some tells, like an observant viewer could tell 53 00:03:32,370 --> 00:03:34,768 that this was probably generated by AI. 54 00:03:34,768 --> 00:03:36,810 And in fact, a few months ago, The New York Times 55 00:03:36,810 --> 00:03:38,470 took a look at some of these tools. 56 00:03:38,470 --> 00:03:41,550 And so, for instance, here is a sequence of images 57 00:03:41,550 --> 00:03:44,350 that at least at left, isn't all that implausible that this 58 00:03:44,350 --> 00:03:45,600 might be an actual photograph. 59 00:03:45,600 --> 00:03:48,000 But in fact, all three of these are AI-generated. 60 00:03:48,000 --> 00:03:50,910 And for some time, there was a certain tell. 61 00:03:50,910 --> 00:03:54,600 Like AI up until recently, really wasn't really good at the finer details, 62 00:03:54,600 --> 00:03:57,120 like the fingers are not quite right. 63 00:03:57,120 --> 00:03:58,950 And so you could have that sort of hint. 64 00:03:58,950 --> 00:04:01,470 But I dare say, AI is getting even better and better, 65 00:04:01,470 --> 00:04:04,420 such that it's getting harder to discern these kinds of things. 66 00:04:04,420 --> 00:04:06,930 So if you haven't already, go ahead and take out your phone 67 00:04:06,930 --> 00:04:08,190 if you have one with you. 68 00:04:08,190 --> 00:04:11,680 And if you'd like to partake, scan this barcode here, 69 00:04:11,680 --> 00:04:13,830 which will lead you to a URL. 70 00:04:13,830 --> 00:04:17,339 And on your screen, you'll have an opportunity in a moment to buzz in. 71 00:04:17,339 --> 00:04:20,310 If my colleague, Rongxin, wouldn't mind joining me up here on stage. 72 00:04:20,310 --> 00:04:22,560 We'll ask you a sequence of questions and see just how 73 00:04:22,560 --> 00:04:25,480 prepared you are for this coming world of AI. 74 00:04:25,480 --> 00:04:27,823 So for instance, once you've got this here, 75 00:04:27,823 --> 00:04:29,490 code scanned, if you don't, that's fine. 76 00:04:29,490 --> 00:04:32,880 You can play along at home or alongside the person next to you. 77 00:04:32,880 --> 00:04:34,920 Here are two images. 78 00:04:34,920 --> 00:04:38,400 And my question for you is, which of these two images, left 79 00:04:38,400 --> 00:04:42,610 or right, was generated by AI? 80 00:04:42,610 --> 00:04:49,740 Which of these two was generated by AI, left or right? 81 00:04:49,740 --> 00:04:51,780 And I think Rongxin, we can flip over and see 82 00:04:51,780 --> 00:04:53,970 as the responses start to come in. 83 00:04:53,970 --> 00:04:58,740 So far, we're about 20% saying left, 70 plus percent saying right. 
84 00:04:58,740 --> 00:05:02,272 3%, 4%, comfortably admitting unsure, and that's fine. 85 00:05:02,272 --> 00:05:04,230 Let's wait for a few more responses to come in, 86 00:05:04,230 --> 00:05:06,837 though I think the right-hand folks have it. 87 00:05:06,837 --> 00:05:09,420 And let's go ahead and flip back and see what the solution is. 88 00:05:09,420 --> 00:05:14,020 In this case, it was, in fact, the right-hand side that was AI-generated. 89 00:05:14,020 --> 00:05:15,127 So, that's great. 90 00:05:15,127 --> 00:05:17,460 I'm not sure what it means that we figured this one out, 91 00:05:17,460 --> 00:05:19,350 but let's try one more here. 92 00:05:19,350 --> 00:05:22,558 So let me propose that we consider now these two images. 93 00:05:22,558 --> 00:05:23,350 It's the same code. 94 00:05:23,350 --> 00:05:25,680 So if you still have your phone up, you don't need to scan again. 95 00:05:25,680 --> 00:05:27,250 It's going to be the same URL here. 96 00:05:27,250 --> 00:05:28,650 But just in case you closed it. 97 00:05:28,650 --> 00:05:30,990 Let's take a look now at these two images. 98 00:05:30,990 --> 00:05:35,040 Which of these, left or right, was AI-generated? 99 00:05:35,040 --> 00:05:38,802 Left or right this time? 100 00:05:38,802 --> 00:05:41,010 Rongxin, should we take a look at how it's coming in? 101 00:05:41,010 --> 00:05:42,570 Oh, it's a little closer this time. 102 00:05:42,570 --> 00:05:44,540 Left or right? 103 00:05:44,540 --> 00:05:46,830 Right's losing a little ground, maybe as people 104 00:05:46,830 --> 00:05:48,930 are changing their answers to left. 105 00:05:48,930 --> 00:05:52,510 More people are unsure this time, which is somewhat revealing. 106 00:05:52,510 --> 00:05:54,790 Let's give folks another second or two. 107 00:05:54,790 --> 00:05:57,200 And Rongxin, should we flip back? 108 00:05:57,200 --> 00:06:00,760 The answer is actually a trick question, since they were both AI. 109 00:06:00,760 --> 00:06:04,120 So most of you, most of you were, in fact, right. 110 00:06:04,120 --> 00:06:08,150 But if you take a glance at this, AI is getting really, really good. 111 00:06:08,150 --> 00:06:13,220 And so this is just a taste of the images that we might see down the line. 112 00:06:13,220 --> 00:06:16,930 And in fact, that video with which we began, 113 00:06:16,930 --> 00:06:20,440 Tom Cruise, as you might have gleaned, was not, in fact, Tom Cruise. 114 00:06:20,440 --> 00:06:22,810 That was an example of a deepfake, a video that 115 00:06:22,810 --> 00:06:26,500 was synthesized, whereby a different human was acting out those motions, 116 00:06:26,500 --> 00:06:31,660 saying those words, but software, artificial intelligence-inspired 117 00:06:31,660 --> 00:06:35,380 software was mutating the actual image and faking this video. 118 00:06:35,380 --> 00:06:38,950 So it's all fun and games for now as we tinker with these kinds of examples, 119 00:06:38,950 --> 00:06:43,000 but suffice it to say, as we've begun to discuss in classes like this already, 120 00:06:43,000 --> 00:06:46,240 disinformation is only going to become more challenging in a world where 121 00:06:46,240 --> 00:06:47,920 it's not just text, but it's imagery. 122 00:06:47,920 --> 00:06:49,452 And all the more, soon video. 123 00:06:49,452 --> 00:06:51,910 But for today, we'll focus really on the fundamentals, what 124 00:06:51,910 --> 00:06:56,230 it is that's enabling technologies like these, and even more familiarly, text 125 00:06:56,230 --> 00:06:57,970 generation, which is all the rage. 
126 00:06:57,970 --> 00:07:01,240 And in fact, it seems just a few months ago, probably everyone in this room 127 00:07:01,240 --> 00:07:04,030 started to hear about tools like ChatGPT. 128 00:07:04,030 --> 00:07:06,800 So we thought we'd do one final exercise here as a group. 129 00:07:06,800 --> 00:07:08,800 And this was another piece in The New York Times 130 00:07:08,800 --> 00:07:11,590 where they asked the audience, "Did a fourth grader write this? 131 00:07:11,590 --> 00:07:12,850 Or the new chatbot?" 132 00:07:12,850 --> 00:07:15,640 So another opportunity to assess your discerning skills. 133 00:07:15,640 --> 00:07:16,450 So same URL. 134 00:07:16,450 --> 00:07:19,840 So if you still have your phone open and that same interface open, 135 00:07:19,840 --> 00:07:21,470 you're in the right place. 136 00:07:21,470 --> 00:07:25,480 And here, we'll take a final stab at two essays of sorts. 137 00:07:25,480 --> 00:07:30,020 Which of these essays was written by AI? 138 00:07:30,020 --> 00:07:32,260 Essay 1 or Essay 2? 139 00:07:32,260 --> 00:07:34,450 And as folks buzz in, I'll read the first. 140 00:07:34,450 --> 00:07:35,020 Essay 1. 141 00:07:35,020 --> 00:07:37,870 I like to bring a yummy sandwich and a cold juice box for lunch. 142 00:07:37,870 --> 00:07:41,860 Sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips. 143 00:07:41,860 --> 00:07:46,090 As we eat, we chat and laugh and catch up on each other's day, dot, dot, dot. 144 00:07:46,090 --> 00:07:46,690 Essay 2. 145 00:07:46,690 --> 00:07:49,243 My mother packs me a sandwich, a drink, fruit, and a treat. 146 00:07:49,243 --> 00:07:51,910 When I get in the lunchroom, I find an empty table and sit there 147 00:07:51,910 --> 00:07:52,930 and I eat my lunch. 148 00:07:52,930 --> 00:07:54,820 My friends come and sit down with me. 149 00:07:54,820 --> 00:07:55,790 Dot, dot, dot. 150 00:07:55,790 --> 00:07:57,550 Rongxin, should we see what folks think? 151 00:07:57,550 --> 00:08:03,040 It looks like most of you think that Essay 1 was generated by AI. 152 00:08:03,040 --> 00:08:09,010 And in fact, if we flip back to the answer here, it was, in fact, Essay 1. 153 00:08:09,010 --> 00:08:13,060 So it's great that we now already have seemingly this discerning eye, 154 00:08:13,060 --> 00:08:15,880 but let me perhaps deflate that enthusiasm 155 00:08:15,880 --> 00:08:20,120 by saying it's only going to get harder to discern one from the other. 156 00:08:20,120 --> 00:08:23,680 And we're really now on the bleeding edge of what's soon to be possible. 157 00:08:23,680 --> 00:08:25,990 But most everyone in this room has probably by now 158 00:08:25,990 --> 00:08:31,450 seen, tried, certainly heard of ChatGPT, which is all about textual generation. 159 00:08:31,450 --> 00:08:34,210 Within CS50 and within academia more generally, 160 00:08:34,210 --> 00:08:37,690 have we been thinking about, talking about, whether and how 161 00:08:37,690 --> 00:08:39,023 to use these kinds of technologies. 162 00:08:39,023 --> 00:08:42,148 And if the students in the room haven't told the family members in the room 163 00:08:42,148 --> 00:08:45,010 already, this here is an excerpt from CS50's own syllabus this year, 164 00:08:45,010 --> 00:08:48,730 whereby we have deemed tools like ChatGPT in their current form, 165 00:08:48,730 --> 00:08:49,808 just too helpful. 
166 00:08:49,808 --> 00:08:51,850 Sort of like an overzealous friend in school, 167 00:08:51,850 --> 00:08:55,520 who just wants to give you all of the answers instead of leading you to them. 168 00:08:55,520 --> 00:09:00,760 And so we simply prohibit by policy using AI-based software, 169 00:09:00,760 --> 00:09:05,200 such as ChatGPT, third-party tools like GitHub Copilot, Bing Chat, and others 170 00:09:05,200 --> 00:09:08,920 that suggest or complete answers to questions or lines of code. 171 00:09:08,920 --> 00:09:13,510 But it would seem reactionary to take away what technology surely has 172 00:09:13,510 --> 00:09:15,400 some potential upsides for education. 173 00:09:15,400 --> 00:09:18,460 And so within CS50 this semester, as well as this past summer, 174 00:09:18,460 --> 00:09:22,300 have we allowed students to use CS50's own AI-based software, which 175 00:09:22,300 --> 00:09:24,490 is, in effect, as we'll discuss, built on top 176 00:09:24,490 --> 00:09:27,700 of these third-party tools, ChatGPT from OpenAI, 177 00:09:27,700 --> 00:09:29,440 companies like Microsoft and beyond. 178 00:09:29,440 --> 00:09:33,820 And in fact, what students can now use, is this brought-to-life CS50 duck, 179 00:09:33,820 --> 00:09:37,270 or DDB, Duck Debugger, within a website of our own, 180 00:09:37,270 --> 00:09:41,230 CS50 AI, and another that your students now know as cs50.dev. 181 00:09:41,230 --> 00:09:43,210 So students are using it, but in a way where 182 00:09:43,210 --> 00:09:46,120 we have tempered the enthusiasm of what might otherwise 183 00:09:46,120 --> 00:09:48,370 be an overly helpful duck to model it more 184 00:09:48,370 --> 00:09:50,480 akin to a good teacher, a good teaching fellow, 185 00:09:50,480 --> 00:09:54,140 who might guide you to the answers, but not simply hand them outright. 186 00:09:54,140 --> 00:09:57,170 So what does that actually mean, and in what form does this duck come? 187 00:09:57,170 --> 00:09:59,960 Well, architecturally, for those of you with engineering backgrounds that 188 00:09:59,960 --> 00:10:02,293 might be curious as to how this is actually implemented, 189 00:10:02,293 --> 00:10:06,260 if a student here in the class has a question, virtually in this case, 190 00:10:06,260 --> 00:10:10,820 they somehow ask these questions of this central web application, cs50.ai. 191 00:10:10,820 --> 00:10:13,760 But we, in turn, have built much of our own logic 192 00:10:13,760 --> 00:10:18,050 on top of third-party services, known as APIs, application programming 193 00:10:18,050 --> 00:10:20,780 interfaces, features that other companies provide 194 00:10:20,780 --> 00:10:22,530 that people like us can use. 195 00:10:22,530 --> 00:10:25,250 So they are doing really a lot of the heavy lifting, 196 00:10:25,250 --> 00:10:27,380 these so-called large language models. 197 00:10:27,380 --> 00:10:30,350 But we, too, have information that is not in these models yet. 198 00:10:30,350 --> 00:10:32,720 For instance, the words that came out of my mouth 199 00:10:32,720 --> 00:10:36,500 just last week when we had a lecture on some other topic, not to mention all 200 00:10:36,500 --> 00:10:39,270 of the past lectures and homework assignments from this year. 
201 00:10:39,270 --> 00:10:41,510 So we have our own vector database locally 202 00:10:41,510 --> 00:10:44,570 via which we can search for more recent information, 203 00:10:44,570 --> 00:10:47,900 and then hand some of that information into these models, which you might 204 00:10:47,900 --> 00:10:51,870 recall, at least for OpenAI, are cut off as of 2021 as 205 00:10:51,870 --> 00:10:54,240 of now, to make the information even more current. 206 00:10:54,240 --> 00:10:56,590 So architecturally, that's sort of the flow. 207 00:10:56,590 --> 00:10:58,980 But for now, I thought I'd share at a higher level what 208 00:10:58,980 --> 00:11:01,440 it is your students are already familiar with, 209 00:11:01,440 --> 00:11:04,230 and what will soon be more broadly available to our own students 210 00:11:04,230 --> 00:11:05,650 online as well. 211 00:11:05,650 --> 00:11:08,190 So what we focused on is, what's generally 212 00:11:08,190 --> 00:11:11,820 now known as prompt engineering, which isn't really a technical phrase, 213 00:11:11,820 --> 00:11:14,500 because it's not so much engineering in the traditional sense. 214 00:11:14,500 --> 00:11:16,650 It really is just English, what we are largely 215 00:11:16,650 --> 00:11:20,520 doing when it comes to giving the AI the personality 216 00:11:20,520 --> 00:11:22,800 of a good teacher or a good duck. 217 00:11:22,800 --> 00:11:26,460 So what we're doing, is giving it what's known as a system prompt nowadays, 218 00:11:26,460 --> 00:11:31,020 whereby we write some English sentences, send those English sentences to OpenAI 219 00:11:31,020 --> 00:11:34,560 or Microsoft, that sort of teaches it how to behave. 220 00:11:34,560 --> 00:11:36,930 Not just using its own knowledge out of the box, 221 00:11:36,930 --> 00:11:40,290 but coercing it to behave a little more educationally constructively. 222 00:11:40,290 --> 00:11:42,720 And so for instance, a representative snippet 223 00:11:42,720 --> 00:11:44,622 of English that we provide to these services 224 00:11:44,622 --> 00:11:46,080 looks a little something like this. 225 00:11:46,080 --> 00:11:50,600 Quote, unquote, "You are a friendly and supportive teaching assistant for CS50. 226 00:11:50,600 --> 00:11:52,520 You are also a rubber duck. 227 00:11:52,520 --> 00:11:57,080 You answer student questions only about CS50 and the field of computer science, 228 00:11:57,080 --> 00:11:59,900 do not answer questions about unrelated topics. 229 00:11:59,900 --> 00:12:02,060 Do not provide full answers to problem sets, 230 00:12:02,060 --> 00:12:04,130 as this would violate academic honesty." 231 00:12:04,130 --> 00:12:07,610 And so in essence, and you can do this manually with ChatGPT, 232 00:12:07,610 --> 00:12:09,990 you can tell it or ask it how to behave. 233 00:12:09,990 --> 00:12:11,910 We, essentially, are doing this automatically, 234 00:12:11,910 --> 00:12:14,240 so that it doesn't just hand answers out of the box 235 00:12:14,240 --> 00:12:16,310 and knows a little something more about us. 236 00:12:16,310 --> 00:12:19,310 There's also in this world of AI right now the notion of a user 237 00:12:19,310 --> 00:12:21,380 prompt versus that system prompt. 238 00:12:21,380 --> 00:12:25,060 And the user prompt, in our case, is essentially the student's own question. 
239 00:12:25,060 --> 00:12:29,630 I have a question about x, or I have a problem with my code here in y, 240 00:12:29,630 --> 00:12:32,720 so we pass to those same APIs, students' own questions 241 00:12:32,720 --> 00:12:34,670 as part of this so-called user prompt. 242 00:12:34,670 --> 00:12:37,490 Just so you're familiar now with some of the vernacular of late. 243 00:12:37,490 --> 00:12:39,200 Now, the programming environment that students 244 00:12:39,200 --> 00:12:41,575 have been using this whole year is known as Visual Studio 245 00:12:41,575 --> 00:12:45,260 Code, a popular open source, free product, that most-- 246 00:12:45,260 --> 00:12:47,450 so many engineers around the world now use. 247 00:12:47,450 --> 00:12:50,580 But we've instrumented it to be a little more course-specific 248 00:12:50,580 --> 00:12:55,830 with some course-specific features that make learning within this environment 249 00:12:55,830 --> 00:12:57,900 all the easier. 250 00:12:57,900 --> 00:12:59,220 It lives at cs50.dev. 251 00:12:59,220 --> 00:13:02,370 And as students in this room know, that as of now, 252 00:13:02,370 --> 00:13:04,650 the virtual duck lives within this environment 253 00:13:04,650 --> 00:13:07,540 and can do things like explain highlighted lines of code. 254 00:13:07,540 --> 00:13:10,560 So here, for instance, is a screenshot of this programming environment. 255 00:13:10,560 --> 00:13:14,550 Here is some arcane looking code in a language called C, that we've just 256 00:13:14,550 --> 00:13:16,082 left behind us in the class. 257 00:13:16,082 --> 00:13:19,290 And suppose that you don't understand what one or more of these lines of code 258 00:13:19,290 --> 00:13:19,790 do. 259 00:13:19,790 --> 00:13:23,580 Students can now highlight those lines, right-click or Control click on it, 260 00:13:23,580 --> 00:13:26,440 select explain highlighted code, and voila, 261 00:13:26,440 --> 00:13:32,040 they see a ChatGPT-like explanation of that very code within a second or so, 262 00:13:32,040 --> 00:13:35,100 that no human has typed out, but that's been dynamically generated 263 00:13:35,100 --> 00:13:36,660 based on this code. 264 00:13:36,660 --> 00:13:39,450 Other things that the duck can now do for students 265 00:13:39,450 --> 00:13:42,960 is advise students on how to improve their code style, the aesthetics, 266 00:13:42,960 --> 00:13:44,260 the formatting thereof. 267 00:13:44,260 --> 00:13:47,280 And so for instance, here is similar code in a language called C. 268 00:13:47,280 --> 00:13:48,990 And I'll stipulate that it's very messy. 269 00:13:48,990 --> 00:13:51,840 Everything is left-aligned instead of nicely indented, 270 00:13:51,840 --> 00:13:53,490 so it looks a little more structured. 271 00:13:53,490 --> 00:13:54,870 Students can now click a button. 272 00:13:54,870 --> 00:13:56,820 They'll see at the right-hand side in green 273 00:13:56,820 --> 00:13:58,650 how their code should ideally look. 274 00:13:58,650 --> 00:14:01,470 And if they're not quite sure what those changes are or why, 275 00:14:01,470 --> 00:14:03,150 they can click on, explain changes. 276 00:14:03,150 --> 00:14:06,180 And similarly, the duck advises them on how and why 277 00:14:06,180 --> 00:14:08,970 to turn their not great code into greater code, 278 00:14:08,970 --> 00:14:11,250 from left to right respectively. 
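Putting those pieces together, here is a minimal sketch, in Python, of how a system prompt, some retrieved course context, and a student's user prompt might be combined into one API call. It assumes the openai Python package; the model name is illustrative, and search_course_notes is a hypothetical stand-in for the vector-database lookup described earlier, not CS50's actual implementation.

from openai import OpenAI  # assumes the `openai` package is installed

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a friendly and supportive teaching assistant for CS50. "
    "You are also a rubber duck. Answer student questions only about "
    "CS50 and computer science. Do not provide full answers to problem "
    "sets, as this would violate academic honesty."
)

def search_course_notes(question: str) -> str:
    # Hypothetical stand-in for the local vector-database search that would
    # fetch recent lecture or homework material relevant to the question.
    return "Relevant excerpt from a recent lecture would go here."

def ask_duck(question: str) -> str:
    # Combine the system prompt (how to behave), retrieved context (recent
    # course information), and the user prompt (the student's own question).
    context = search_course_notes(question)
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT + "\n\nContext:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_duck("What is Flask, exactly?"))

The division of labor mirrors the description above: the system prompt teaches the model how to behave, while each student's question arrives as the user prompt.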
279 00:14:11,250 --> 00:14:15,450 More compellingly and more generalizable beyond CS50 and beyond computer 280 00:14:15,450 --> 00:14:19,080 science, is AI's ability to answer most of the questions 281 00:14:19,080 --> 00:14:20,820 that students might now ask online. 282 00:14:20,820 --> 00:14:24,540 And we've been doing asynchronous Q&A for years via various mobile or web 283 00:14:24,540 --> 00:14:25,710 applications and the like. 284 00:14:25,710 --> 00:14:28,680 But to date, it has been humans, myself included, 285 00:14:28,680 --> 00:14:30,780 responding to all of those questions. 286 00:14:30,780 --> 00:14:34,650 Now the duck has an opportunity to chime in, generally within three seconds, 287 00:14:34,650 --> 00:14:37,260 because we've integrated it into an online Q&A tool 288 00:14:37,260 --> 00:14:40,960 that students in CS50 and elsewhere across Harvard have long used. 289 00:14:40,960 --> 00:14:44,370 So here's an anonymized screenshot of a question from an actual student, 290 00:14:44,370 --> 00:14:47,370 but written here as John Harvard, who asked this summer, 291 00:14:47,370 --> 00:14:50,150 in the summer version of CS50, what is flask exactly? 292 00:14:50,150 --> 00:14:51,920 So fairly definitional question. 293 00:14:51,920 --> 00:14:55,250 And here is what the duck spit out, thanks to that architecture 294 00:14:55,250 --> 00:14:56,510 I described before. 295 00:14:56,510 --> 00:14:59,210 I'll stipulate that this is correct, but it is mostly 296 00:14:59,210 --> 00:15:02,820 a definition, akin to what Google or Bing could already give you last year. 297 00:15:02,820 --> 00:15:04,940 But here's a more nuanced question, for instance, 298 00:15:04,940 --> 00:15:06,800 from another anonymized student. 299 00:15:06,800 --> 00:15:10,160 In this question here, the student's including an error message 300 00:15:10,160 --> 00:15:11,000 that they're seeing. 301 00:15:11,000 --> 00:15:12,650 They're asking about that. 302 00:15:12,650 --> 00:15:15,890 And they're asking a little more broadly and qualitatively, is there 303 00:15:15,890 --> 00:15:19,640 a more efficient way to write this code, a question that really is best 304 00:15:19,640 --> 00:15:21,620 answered based on experience. 305 00:15:21,620 --> 00:15:25,130 Here, I'll stipulate that the duck responded with this answer, which 306 00:15:25,130 --> 00:15:26,480 is actually pretty darn good. 307 00:15:26,480 --> 00:15:29,630 Not only responding in English, but with some sample starter code 308 00:15:29,630 --> 00:15:31,430 that would make sense in this context. 309 00:15:31,430 --> 00:15:34,580 And at the bottom it's worth noting, because none of this technology 310 00:15:34,580 --> 00:15:37,850 is perfect just yet, it's still indeed very bleeding edge, 311 00:15:37,850 --> 00:15:41,960 and so what we have chosen to do within CS50 is include disclaimers, like this. 312 00:15:41,960 --> 00:15:44,090 I am an experimental bot, quack. 313 00:15:44,090 --> 00:15:46,820 Do not assume that my reply is accurate unless you see that it's 314 00:15:46,820 --> 00:15:50,040 been endorsed by humans, quack. 315 00:15:50,040 --> 00:15:53,160 And in fact, at top right, the mechanism we've been using in this tool 316 00:15:53,160 --> 00:15:54,510 is usually within minutes. 
317 00:15:54,510 --> 00:15:57,690 A human, whether it's a teaching fellow, a course assistant, or myself, 318 00:15:57,690 --> 00:16:00,990 will click on a button like this to signal to our human students 319 00:16:00,990 --> 00:16:05,130 that yes, like the duck is spot on here, or we have an opportunity, as always, 320 00:16:05,130 --> 00:16:07,020 to chime in with our own responses. 321 00:16:07,020 --> 00:16:09,770 Frankly, that disclaimer, that button, will soon I do think 322 00:16:09,770 --> 00:16:11,770 go away, as the software gets better and better. 323 00:16:11,770 --> 00:16:14,367 But for now, that's how we're modulating exactly 324 00:16:14,367 --> 00:16:16,200 what students' expectations might be when it 325 00:16:16,200 --> 00:16:19,395 comes to correctness or incorrectness. 326 00:16:19,395 --> 00:16:22,020 It's common too in programming, to see a lot of error messages, 327 00:16:22,020 --> 00:16:24,210 certainly when you're learning first-hand. 328 00:16:24,210 --> 00:16:26,820 A lot of these error messages are arcane, confusing, 329 00:16:26,820 --> 00:16:29,310 certainly to students, versus the people who wrote them. 330 00:16:29,310 --> 00:16:31,170 Soon students will see a box like this. 331 00:16:31,170 --> 00:16:34,050 Whenever one of their terminal window programs errs, 332 00:16:34,050 --> 00:16:39,120 they'll be assisted too with English-like, TF-like support when 333 00:16:39,120 --> 00:16:42,212 it comes to explaining what it is that went wrong with that command. 334 00:16:42,212 --> 00:16:43,920 And ultimately, what this is really doing 335 00:16:43,920 --> 00:16:45,900 for students in our own experience already, 336 00:16:45,900 --> 00:16:49,830 is providing them really with virtual office hours, 24/7, 337 00:16:49,830 --> 00:16:52,560 which is actually quite compelling in a university environment, 338 00:16:52,560 --> 00:16:55,110 where students' schedules are already tightly packed, 339 00:16:55,110 --> 00:16:58,270 be it with academics, their extracurriculars, athletics, and the like-- 340 00:16:58,270 --> 00:17:00,180 --and they might have enough time to dive 341 00:17:00,180 --> 00:17:03,510 into a homework assignment, maybe eight hours even, for something sizable. 342 00:17:03,510 --> 00:17:06,390 But if they hit that wall a couple of hours in, yeah, 343 00:17:06,390 --> 00:17:10,020 they can go to office hours or they can ask a question asynchronously online, 344 00:17:10,020 --> 00:17:13,020 but it's really not optimal, in-the-moment support 345 00:17:13,020 --> 00:17:15,150 that we can now provide all the more effectively, 346 00:17:15,150 --> 00:17:17,170 we hope, through software, as well. 347 00:17:17,170 --> 00:17:18,089 So if you're curious, 348 00:17:18,089 --> 00:17:20,797 even if you're not a technophile yourself, anyone on the internet 349 00:17:20,797 --> 00:17:24,000 can go to cs50.ai and experiment with this user interface. 350 00:17:24,000 --> 00:17:29,940 This one here actually resembles ChatGPT itself, but it's specific to CS50. 351 00:17:29,940 --> 00:17:31,980 And here again is just a sequence of screenshots 352 00:17:31,980 --> 00:17:33,930 that I'll stipulate for today's purposes, 353 00:17:33,930 --> 00:17:37,920 are pretty darn good, akin to what I myself or a teaching fellow would reply 354 00:17:37,920 --> 00:17:41,100 in answer to a student's question, in this case, 355 00:17:41,100 --> 00:17:42,930 about their particular code. 356 00:17:42,930 --> 00:17:45,240 And ultimately, it's really aspirational. 
357 00:17:45,240 --> 00:17:49,320 The goal here ultimately is to really approximate a one-to-one teacher 358 00:17:49,320 --> 00:17:52,950 to student ratio, which despite all of the resources we within CS50, 359 00:17:52,950 --> 00:17:56,070 we within Harvard and places like Yale have, 360 00:17:56,070 --> 00:17:58,650 we certainly have never had enough resources 361 00:17:58,650 --> 00:18:00,690 to approximate what might really be ideal, 362 00:18:00,690 --> 00:18:04,050 which is more of an apprenticeship model, a mentorship, whereby it's just 363 00:18:04,050 --> 00:18:06,145 you and that teacher working one-to-one. 364 00:18:06,145 --> 00:18:09,270 Now we still have humans, and the goal is not to reduce that human support, 365 00:18:09,270 --> 00:18:14,220 but to focus it all the more consciously on the students who would benefit most 366 00:18:14,220 --> 00:18:17,100 from some in-person one-to-one support versus students 367 00:18:17,100 --> 00:18:21,433 who would happily take it at any hour of the day more digitally, online. 368 00:18:21,433 --> 00:18:23,850 And in fact, we're still in the process of evaluating just 369 00:18:23,850 --> 00:18:25,560 how well or not well all of this works. 370 00:18:25,560 --> 00:18:28,800 But based on our summer experiment alone with about 70 students 371 00:18:28,800 --> 00:18:31,770 a few months back, one student wrote us at term's end it-- 372 00:18:31,770 --> 00:18:33,660 --"felt like having a personal tutor. 373 00:18:33,660 --> 00:18:37,830 I love how AI bots will answer questions without ego and without judgment. 374 00:18:37,830 --> 00:18:40,260 Generally entertaining even the stupidest of questions 375 00:18:40,260 --> 00:18:42,690 without treating them like they're stupid. 376 00:18:42,690 --> 00:18:47,550 It has, as one could expect," ironically, "an inhuman level 377 00:18:47,550 --> 00:18:48,450 of patience." 378 00:18:48,450 --> 00:18:51,870 And so I thought that's telling as to how even one student is 379 00:18:51,870 --> 00:18:54,490 perceiving these new possibilities. 380 00:18:54,490 --> 00:18:56,610 So let's consider now more academically what 381 00:18:56,610 --> 00:18:58,920 it is that's enabling those kinds of tools, not just 382 00:18:58,920 --> 00:19:02,370 within CS50, within computer science, but really, the world more generally. 383 00:19:02,370 --> 00:19:04,078 What the whole world's been talking about 384 00:19:04,078 --> 00:19:06,270 is generative artificial intelligence. 385 00:19:06,270 --> 00:19:09,630 AI that can generate images, generate text, and sort of 386 00:19:09,630 --> 00:19:12,820 mimic the behavior of what we think of as human. 387 00:19:12,820 --> 00:19:14,240 So what does that really mean? 388 00:19:14,240 --> 00:19:15,990 Well, let's start really at the beginning. 389 00:19:15,990 --> 00:19:19,170 Artificial intelligence is actually a technique, a technology, 390 00:19:19,170 --> 00:19:21,510 a subject that's actually been with us for some time, 391 00:19:21,510 --> 00:19:26,460 but it really was the introduction of this very user-friendly interface known 392 00:19:26,460 --> 00:19:28,230 as ChatGPT, 393 00:19:28,230 --> 00:19:31,440 and some of the more recent academic work over really just the past five 394 00:19:31,440 --> 00:19:35,010 or six years, that really allowed us to take a massive leap forward, 395 00:19:35,010 --> 00:19:38,520 it would seem, technologically, as to what these things can now do. 396 00:19:38,520 --> 00:19:40,330 So what is artificial intelligence? 
397 00:19:40,330 --> 00:19:43,410 It's been with us for some time, and it's honestly, so omnipresent, 398 00:19:43,410 --> 00:19:45,690 that we take it for granted nowadays. 399 00:19:45,690 --> 00:19:48,330 Gmail, Outlook, have gotten really good at spam detection. 400 00:19:48,330 --> 00:19:50,020 If you haven't checked your spam folder in a while, 401 00:19:50,020 --> 00:19:52,000 that's testament to just how good they seem 402 00:19:52,000 --> 00:19:54,758 to be at getting it out of your inbox. 403 00:19:54,758 --> 00:19:57,050 Handwriting recognition has been with us for some time. 404 00:19:57,050 --> 00:19:59,380 I dare say, it, too, is only getting better and better 405 00:19:59,380 --> 00:20:02,920 the more the software is able to adapt to different handwriting 406 00:20:02,920 --> 00:20:04,270 styles, such as this. 407 00:20:04,270 --> 00:20:06,940 Recommendation histories and the like, whether you're 408 00:20:06,940 --> 00:20:09,190 using Netflix or any other service, have gotten 409 00:20:09,190 --> 00:20:12,580 better and better at recommending things you might like based on things 410 00:20:12,580 --> 00:20:14,920 you have liked, and maybe based on things 411 00:20:14,920 --> 00:20:18,190 other people who like the same thing as you might have liked. 412 00:20:18,190 --> 00:20:20,560 And suffice it to say, there's no one at Netflix 413 00:20:20,560 --> 00:20:22,780 akin to the old VHS stores of yesteryear, 414 00:20:22,780 --> 00:20:26,590 who are recommending to you specifically what movie you might like. 415 00:20:26,590 --> 00:20:31,330 And there's no code, no algorithm that says, if they like x, then recommend y, 416 00:20:31,330 --> 00:20:34,762 else recommend z, because there's just too many movies, too many people, too 417 00:20:34,762 --> 00:20:36,220 many different tastes in the world. 418 00:20:36,220 --> 00:20:40,000 So AI is increasingly sort of looking for patterns that might not even 419 00:20:40,000 --> 00:20:42,700 be obvious to us humans, and dynamically figuring out 420 00:20:42,700 --> 00:20:46,750 what might be good for me, for you or you, or anyone else. 421 00:20:46,750 --> 00:20:50,402 Siri, Google Assistant, Alexa, any of these voice recognition tools 422 00:20:50,402 --> 00:20:51,610 that are answering questions. 423 00:20:51,610 --> 00:20:54,918 That, too, suffice it to say, is all powered by AI. 424 00:20:54,918 --> 00:20:58,210 But let's start with something a little simpler than any of those applications. 425 00:20:58,210 --> 00:21:01,522 And this is one of the first arcade games from yesteryear known as Pong. 426 00:21:01,522 --> 00:21:02,980 And it's sort of like table tennis. 427 00:21:02,980 --> 00:21:05,440 And the person on the left can move their paddle up and down. 428 00:21:05,440 --> 00:21:07,000 Person on the right can do the same. 429 00:21:07,000 --> 00:21:09,970 And the goal is to get the ball past the other person, 430 00:21:09,970 --> 00:21:13,960 or conversely, make sure it hits your paddle and bounces back. 431 00:21:13,960 --> 00:21:17,440 Well, somewhat simpler than this insofar as it can be one player, 432 00:21:17,440 --> 00:21:19,275 is another Atari game from yesteryear known 433 00:21:19,275 --> 00:21:21,400 as Breakout, whereby you're essentially just trying 434 00:21:21,400 --> 00:21:24,460 to bang the ball against the bricks to get more and more points 435 00:21:24,460 --> 00:21:26,320 and get rid of all of those bricks. 
436 00:21:26,320 --> 00:21:28,960 But all of us in this room probably have a human instinct 437 00:21:28,960 --> 00:21:32,800 for how to win this game, or at least how to play this game. 438 00:21:32,800 --> 00:21:36,430 For instance, if the ball pictured here back in the '80s 439 00:21:36,430 --> 00:21:41,530 as a single red dot just left the paddle, pictured here as a red line, 440 00:21:41,530 --> 00:21:43,990 where is the ball presumably going to go next? 441 00:21:43,990 --> 00:21:47,410 And in turn, which direction should I slide my paddle? 442 00:21:47,410 --> 00:21:49,900 To the left or to the right? 443 00:21:49,900 --> 00:21:51,630 So presumably, to the left. 444 00:21:51,630 --> 00:21:54,690 And we all have an eye for what seemed to be the digital physics of that. 445 00:21:54,690 --> 00:21:57,540 And indeed, that would then be an algorithm, sort of step 446 00:21:57,540 --> 00:21:59,890 by step instructions for solving some problem. 447 00:21:59,890 --> 00:22:03,120 So how can we now translate that human intuition to what we describe more 448 00:22:03,120 --> 00:22:04,780 as artificial intelligence? 449 00:22:04,780 --> 00:22:07,290 Not nearly as sophisticated as those other applications, 450 00:22:07,290 --> 00:22:09,000 but we'll indeed, start with some basics. 451 00:22:09,000 --> 00:22:12,960 You might know from economics or strategic thinking or computer science, 452 00:22:12,960 --> 00:22:15,640 this idea of a decision tree that allows you to decide, 453 00:22:15,640 --> 00:22:19,060 should I go this way or this way when it comes to making a decision. 454 00:22:19,060 --> 00:22:22,440 So let's consider how we could draw a picture to represent even something 455 00:22:22,440 --> 00:22:24,180 simplistic like Breakout. 456 00:22:24,180 --> 00:22:28,290 Well, if the ball is left of the paddle, is a question or a Boolean expression 457 00:22:28,290 --> 00:22:29,940 I might ask myself in code. 458 00:22:29,940 --> 00:22:34,500 If yes, then I should move my paddle left, as most everyone just said. 459 00:22:34,500 --> 00:22:37,960 Else, if the ball is not left of paddle, what do I want to do? 460 00:22:37,960 --> 00:22:39,537 Well, I want to ask a question. 461 00:22:39,537 --> 00:22:41,370 I don't want to just instinctively go right. 462 00:22:41,370 --> 00:22:44,010 I want to check, is the ball to the right of the paddle, 463 00:22:44,010 --> 00:22:47,730 and if yes, well, then yes, go ahead and move the paddle right. 464 00:22:47,730 --> 00:22:50,180 But there is a third situation, which is-- 465 00:22:50,180 --> 00:22:51,163 AUDIENCE: [INAUDIBLE] 466 00:22:51,163 --> 00:22:52,080 DAVID J. MALAN: Right. 467 00:22:52,080 --> 00:22:53,920 Like, don't move, it's coming right at you. 468 00:22:53,920 --> 00:22:55,260 So that would be the third scenario here. 469 00:22:55,260 --> 00:22:58,140 No, it's not to the right or to the left, so just don't move the paddle. 470 00:22:58,140 --> 00:23:00,660 You got lucky, and it's coming, for instance, straight down. 471 00:23:00,660 --> 00:23:04,170 So Breakout is fairly straightforward when it comes to an algorithm. 472 00:23:04,170 --> 00:23:07,200 And we can actually translate this as any CS50 student now could, 473 00:23:07,200 --> 00:23:11,400 to code or pseudocode, sort of English-like code that's independent 474 00:23:11,400 --> 00:23:15,280 of Java, C, C++ and all of the programming languages of today. 
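Concretely, and purely as an illustration, that decision tree might translate into a short, self-contained Python sketch like this one, in which the one-dimensional positions and "frames" are hypothetical stand-ins for a real game engine:

# Hypothetical, self-contained sketch of the paddle-moving decision tree;
# real Breakout would also track a bouncing ball, bricks, and collisions.
ball_x = 2     # the ball's horizontal position
paddle_x = 7   # the paddle's horizontal position

for frame in range(10):       # simulate a few frames of play
    if ball_x < paddle_x:     # is the ball left of the paddle?
        paddle_x -= 1         # then move the paddle left
    elif ball_x > paddle_x:   # is the ball right of the paddle?
        paddle_x += 1         # then move the paddle right
    # else: the ball is coming straight down, so don't move at all

print(paddle_x)  # the paddle has tracked the ball to position 2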
475 00:23:15,280 --> 00:23:17,940 So in English pseudocode, while a game is 476 00:23:17,940 --> 00:23:22,230 ongoing, if the ball is left of paddle, I should move paddle left. 477 00:23:22,230 --> 00:23:26,460 Else if ball is right of the paddle, it should say paddle, that's a bug, 478 00:23:26,460 --> 00:23:29,520 not intended today, move paddle right. 479 00:23:29,520 --> 00:23:31,710 Else, don't move the paddle. 480 00:23:31,710 --> 00:23:35,910 So that, too, represents a translation of this intuition to code 481 00:23:35,910 --> 00:23:37,200 that's very deterministic. 482 00:23:37,200 --> 00:23:40,830 You can anticipate all possible scenarios captured in code. 483 00:23:40,830 --> 00:23:43,890 And frankly, this should be the most boring game of Breakout, 484 00:23:43,890 --> 00:23:47,250 because the paddle should just perfectly play this game, assuming 485 00:23:47,250 --> 00:23:49,770 there's no variables or randomness when it comes to speed 486 00:23:49,770 --> 00:23:53,590 or angles or the like, which real world games certainly try to introduce. 487 00:23:53,590 --> 00:23:55,570 But let's consider another game from yesteryear 488 00:23:55,570 --> 00:23:58,570 that you might play with your kids today or you did yourself growing up. 489 00:23:58,570 --> 00:23:59,590 Here's tic-tac-toe. 490 00:23:59,590 --> 00:24:02,860 And for those unfamiliar, the goal is to get three O's in a row 491 00:24:02,860 --> 00:24:07,180 or three X's in a row, vertically, horizontally, or diagonally. 492 00:24:07,180 --> 00:24:09,970 So suppose it's now X's turn. 493 00:24:09,970 --> 00:24:12,250 If you've played tic-tac-toe, most of you 494 00:24:12,250 --> 00:24:16,060 probably just have an immediate instinct as to where X should probably go, 495 00:24:16,060 --> 00:24:18,970 so that it doesn't lose instantaneously. 496 00:24:18,970 --> 00:24:22,690 But let's consider in the more general case, how do you solve tic-tac-toe. 497 00:24:22,690 --> 00:24:25,360 Frankly, if you're in the habit of losing tic-tac-toe, 498 00:24:25,360 --> 00:24:27,255 but you're not trying to lose tic-tac-toe, 499 00:24:27,255 --> 00:24:28,630 you're actually playing it wrong. 500 00:24:28,630 --> 00:24:31,920 Like, you should minimally be able to always force a tie in tic-tac-toe. 501 00:24:31,920 --> 00:24:34,420 And better yet, you should be able to beat the other person. 502 00:24:34,420 --> 00:24:37,550 So hopefully, everyone now will soon walk away with this strategy. 503 00:24:37,550 --> 00:24:41,020 So how can we borrow inspiration from those same decision trees 504 00:24:41,020 --> 00:24:43,100 and do something similar here? 505 00:24:43,100 --> 00:24:47,620 So if you, the player, ask yourself, can I get three in a row on this turn? 506 00:24:47,620 --> 00:24:51,970 Well, if yes, then you should do that and play the X in that position. 507 00:24:51,970 --> 00:24:53,980 Play in the square to get three in a row. 508 00:24:53,980 --> 00:24:54,820 Straight forward. 509 00:24:54,820 --> 00:24:58,330 If you can't get three in a row in this turn, you should ask another question. 510 00:24:58,330 --> 00:25:01,660 Can my opponent get three in a row in their next turn? 511 00:25:01,660 --> 00:25:06,220 Because then you better preempt that by moving into that position. 512 00:25:06,220 --> 00:25:10,810 Play in the square to block opponent's three in a row. 513 00:25:10,810 --> 00:25:13,428 What if though, that's not the case, right? 514 00:25:13,428 --> 00:25:15,970 What if there aren't even that many X's and O's on the board? 
515 00:25:15,970 --> 00:25:17,887 If you're in the habit of just kind of playing 516 00:25:17,887 --> 00:25:21,940 randomly, like you might not be playing optimally as a good AI could. 517 00:25:21,940 --> 00:25:24,430 So if no, it's kind of a question mark. 518 00:25:24,430 --> 00:25:26,685 In fact, there's probably more to this tree, 519 00:25:26,685 --> 00:25:28,810 because we could think through, what if I go there. 520 00:25:28,810 --> 00:25:30,977 Wait a minute, what if I go there or there or there? 521 00:25:30,977 --> 00:25:34,510 You can start to think a few steps ahead as a computer could do much better even 522 00:25:34,510 --> 00:25:35,540 than us humans. 523 00:25:35,540 --> 00:25:37,388 So suppose, for instance, it's O's turn. 524 00:25:37,388 --> 00:25:39,430 Now those of you who are very good at tic-tac-toe 525 00:25:39,430 --> 00:25:40,870 might have an instinct for where to go. 526 00:25:40,870 --> 00:25:42,953 But this is an even harder problem, it would seem. 527 00:25:42,953 --> 00:25:45,370 I could go in eight possible places if I'm O. 528 00:25:45,370 --> 00:25:49,570 But let's try to break that down more algorithmically, as an AI would. 529 00:25:49,570 --> 00:25:53,830 And let's recognize, too, that with games in particular, one of the reasons 530 00:25:53,830 --> 00:25:58,330 that AI was so early adopted in these games, playing the CPU, 531 00:25:58,330 --> 00:26:02,020 is that games really lend themselves to defining them, 532 00:26:02,020 --> 00:26:04,120 even if it takes the fun out of it, mathematically. 533 00:26:04,120 --> 00:26:07,600 Defining them in terms of inputs and outputs, maybe paddle moving 534 00:26:07,600 --> 00:26:10,040 left or right, ball moving up or down. 535 00:26:10,040 --> 00:26:13,090 You can really quantize it at a very boring low level. 536 00:26:13,090 --> 00:26:16,060 But that lends itself then to solving it optimally. 537 00:26:16,060 --> 00:26:19,630 And in fact, with most games, the goal is to maximize or maybe 538 00:26:19,630 --> 00:26:21,790 minimize some math function, right? 539 00:26:21,790 --> 00:26:24,910 Most games, if you have scores, the goal is to maximize your score, 540 00:26:24,910 --> 00:26:26,750 and indeed, get a high score. 541 00:26:26,750 --> 00:26:31,510 So games lend themselves to a nice translation to mathematics, 542 00:26:31,510 --> 00:26:33,410 and in turn here, AI solutions. 543 00:26:33,410 --> 00:26:37,690 So one of the first algorithms one might learn in a class on algorithms 544 00:26:37,690 --> 00:26:39,490 and on artificial intelligence is something 545 00:26:39,490 --> 00:26:41,860 called minimax, which alludes to this idea of trying 546 00:26:41,860 --> 00:26:46,060 to minimize and/or maximize something as your function, your goal. 547 00:26:46,060 --> 00:26:49,890 And it actually derives its inspiration from these same decision trees 548 00:26:49,890 --> 00:26:51,140 that we've been talking about. 549 00:26:51,140 --> 00:26:52,390 But first, a definition. 550 00:26:52,390 --> 00:26:55,210 Here are three representative tic-tac-toe boards. 551 00:26:55,210 --> 00:26:58,570 Here is one in which O has clearly won, per the green. 552 00:26:58,570 --> 00:27:01,537 Here is one in which X has clearly won, per the green. 553 00:27:01,537 --> 00:27:03,620 And this one in the middle just represents a draw. 554 00:27:03,620 --> 00:27:06,662 Now, there's a bunch of other ways that tic-tac-toe could end, but here's 555 00:27:06,662 --> 00:27:08,050 just three representative ones. 
556 00:27:08,050 --> 00:27:10,223 But let's make tic-tac-toe even more boring 557 00:27:10,223 --> 00:27:11,890 than it might have always struck you as. 558 00:27:11,890 --> 00:27:15,130 Let's propose that this kind of configuration 559 00:27:15,130 --> 00:27:17,230 should have a score of negative 1. 560 00:27:17,230 --> 00:27:19,030 If O wins, it's a negative 1. 561 00:27:19,030 --> 00:27:21,340 If X wins, it's a positive 1. 562 00:27:21,340 --> 00:27:23,350 And if no one wins, we'll call it a 0. 563 00:27:23,350 --> 00:27:27,280 We need some way of talking about and reasoning about which of these outcomes 564 00:27:27,280 --> 00:27:28,520 is better than the other. 565 00:27:28,520 --> 00:27:31,450 And what's simpler than 0, 1 and negative 1? 566 00:27:31,450 --> 00:27:33,760 So the goal though, of X, it would seem, is 567 00:27:33,760 --> 00:27:38,530 to maximize its score, but the goal of O is to minimize its score. 568 00:27:38,530 --> 00:27:42,400 So X is really trying to get positive 1, O is really trying to get negative 1. 569 00:27:42,400 --> 00:27:46,610 And no one really wants 0, but that's better than losing to the other person. 570 00:27:46,610 --> 00:27:49,900 So we have now a way to define what it means to win or lose. 571 00:27:49,900 --> 00:27:52,790 Well, now we can employ a strategy here. 572 00:27:52,790 --> 00:27:56,210 Here, just as a quick check, what would the score be of this board? 573 00:27:56,210 --> 00:27:58,020 Just so everyone's on the same page. 574 00:27:58,020 --> 00:27:58,520 AUDIENCE: 1. 575 00:27:58,520 --> 00:28:02,000 DAVID J. MALAN: Or, so 1, because X has won and we just stipulated arbitrarily, 576 00:28:02,000 --> 00:28:04,190 this means that this board has a value of 1. 577 00:28:04,190 --> 00:28:06,740 Now let's put it into a more interesting context. 578 00:28:06,740 --> 00:28:09,320 Here, a game has been played for a few moves already. 579 00:28:09,320 --> 00:28:10,890 There's two spots left. 580 00:28:10,890 --> 00:28:12,590 No one has won just yet. 581 00:28:12,590 --> 00:28:14,982 And suppose that it's O's turn now. 582 00:28:14,982 --> 00:28:17,690 Now, everyone probably has an instinct already as to where to go, 583 00:28:17,690 --> 00:28:20,510 but let's try to break this down more algorithmically. 584 00:28:20,510 --> 00:28:22,430 So what is the value of this board? 585 00:28:22,430 --> 00:28:25,430 Well, we don't know yet, because no one has won, 586 00:28:25,430 --> 00:28:28,440 so let's consider what could happen next. 587 00:28:28,440 --> 00:28:31,310 So we can draw this actually as a tree, as before. 588 00:28:31,310 --> 00:28:33,470 Here, for instance, is what might happen if O 589 00:28:33,470 --> 00:28:35,270 goes into the top left-hand corner. 590 00:28:35,270 --> 00:28:39,830 And here's what might happen if O goes into the bottom middle spot instead. 591 00:28:39,830 --> 00:28:42,530 We should ask ourselves, what's the value of this board, what's 592 00:28:42,530 --> 00:28:43,530 the value of this board? 593 00:28:43,530 --> 00:28:46,340 Because if O's purpose in life is to minimize its score, 594 00:28:46,340 --> 00:28:49,850 it's going to go left or right based on whichever yields the smallest number. 595 00:28:49,850 --> 00:28:51,390 Negative 1, ideally. 596 00:28:51,390 --> 00:28:55,230 But we're still not sure yet, because we don't have definitions for boards 597 00:28:55,230 --> 00:28:56,770 with holes in them like this. 598 00:28:56,770 --> 00:28:58,380 So what could happen next here? 
599 00:28:58,380 --> 00:29:00,480 Well, it's obviously going to be X's turn next. 600 00:29:00,480 --> 00:29:05,080 So if X moves, unfortunately, X has won in this configuration. 601 00:29:05,080 --> 00:29:08,980 We can now conclude that the value of this board is what number? 602 00:29:08,980 --> 00:29:09,480 AUDIENCE: 1. 603 00:29:09,480 --> 00:29:10,620 DAVID J. MALAN: So 1. 604 00:29:10,620 --> 00:29:14,970 And because there's only one way to reach this board, by transitivity, 605 00:29:14,970 --> 00:29:19,080 you might as well think of the value of this previous board as also 1, 606 00:29:19,080 --> 00:29:21,760 because no matter what, it's going to lead to that same outcome. 607 00:29:21,760 --> 00:29:25,890 And so the value of this board is actually still to be determined, 608 00:29:25,890 --> 00:29:28,440 because we don't know if O is going to want to go with the 1, 609 00:29:28,440 --> 00:29:30,600 and probably not, because that means X wins. 610 00:29:30,600 --> 00:29:32,520 But let's see what the value of this board is. 611 00:29:32,520 --> 00:29:36,370 Well, suppose that indeed, X goes in that top left corner here. 612 00:29:36,370 --> 00:29:39,540 What's the value of this board here? 613 00:29:39,540 --> 00:29:41,140 0, because no one has won. 614 00:29:41,140 --> 00:29:43,390 There's no X's or O's three in a row. 615 00:29:43,390 --> 00:29:45,000 So the value of this board is 0. 616 00:29:45,000 --> 00:29:47,140 There's only one way logically to get there, 617 00:29:47,140 --> 00:29:50,190 so we might as well think of the value of this board as also 0. 618 00:29:50,190 --> 00:29:53,100 And so now, what's the value of this board? 619 00:29:53,100 --> 00:29:56,370 Well, if we started the story by thinking about O's turn, 620 00:29:56,370 --> 00:30:01,860 O's purpose is the min in minimax, then which move is O going to make? 621 00:30:01,860 --> 00:30:05,030 Go to the left or go to the right? 622 00:30:05,030 --> 00:30:06,800 O is probably going to go to the right 623 00:30:06,800 --> 00:30:10,880 and make the move that leads to, whoops, that leads to this board, 624 00:30:10,880 --> 00:30:15,200 because even though O can't win in this configuration, at least X didn't win. 625 00:30:15,200 --> 00:30:19,190 So it's minimized its score relatively, even though it's not a clean win. 626 00:30:19,190 --> 00:30:21,500 Now, this is all fine and good for a configuration 627 00:30:21,500 --> 00:30:23,243 of the board that's like almost done. 628 00:30:23,243 --> 00:30:24,410 There's only two moves left. 629 00:30:24,410 --> 00:30:25,770 The game's about to end. 630 00:30:25,770 --> 00:30:27,830 But if you kind of expand in your mind's eye, 631 00:30:27,830 --> 00:30:30,810 how did we get to this branch of the decision tree, 632 00:30:30,810 --> 00:30:34,010 if we rewind one step where there's three possible moves, 633 00:30:34,010 --> 00:30:36,260 frankly, the decision tree is a lot bigger. 634 00:30:36,260 --> 00:30:39,350 If we rewind further in your mind's eye and have four moves 635 00:30:39,350 --> 00:30:41,760 left or five moves or all nine moves left, 636 00:30:41,760 --> 00:30:43,550 imagine just zooming out, out, and out. 637 00:30:43,550 --> 00:30:46,940 This is becoming a massive, massive tree of decisions. 638 00:30:46,940 --> 00:30:51,110 Now, even so, here is that same subtree, the same decision tree 639 00:30:51,110 --> 00:30:51,860 we just looked at. 
640 00:30:51,860 --> 00:30:54,050 This is the exact same thing, but I shrunk the font so 641 00:30:54,050 --> 00:30:55,760 that it appears here on the screen here. 642 00:30:55,760 --> 00:30:59,660 But over here, we have what could happen if instead, 643 00:30:59,660 --> 00:31:03,680 it's actually X's turn, because we're one move prior. 644 00:31:03,680 --> 00:31:06,420 There's a bunch of different moves X could now make, too. 645 00:31:06,420 --> 00:31:08,350 So what is the implication of this? 646 00:31:08,350 --> 00:31:12,930 Well, most humans are not thinking through tic-tac-toe to this extreme. 647 00:31:12,930 --> 00:31:15,780 And frankly, most of us probably just don't have the mental capacity 648 00:31:15,780 --> 00:31:18,360 to think about going left and then right and then left and then right. 649 00:31:18,360 --> 00:31:18,860 Right? 650 00:31:18,860 --> 00:31:20,610 This is not how people play tic-tac-toe. 651 00:31:20,610 --> 00:31:23,190 Like, we're not using that much memory, so to speak. 652 00:31:23,190 --> 00:31:26,010 But a computer can handle that, and computers 653 00:31:26,010 --> 00:31:27,850 can play tic-tac-toe optimally. 654 00:31:27,850 --> 00:31:30,360 So if you're beating a computer at tic-tac-toe, like, 655 00:31:30,360 --> 00:31:31,770 it's not implemented very well. 656 00:31:31,770 --> 00:31:36,420 It's not following this very logical, deterministic minimax algorithm. 657 00:31:36,420 --> 00:31:40,470 But this is where now AI is no longer as simple as just 658 00:31:40,470 --> 00:31:42,570 doing what these decision trees say. 659 00:31:42,570 --> 00:31:45,780 In the context of tic-tac-toe, here's how we might translate this 660 00:31:45,780 --> 00:31:46,870 to code, for instance. 661 00:31:46,870 --> 00:31:49,830 If player is X, for each possible move, calculate 662 00:31:49,830 --> 00:31:52,200 a score for the board, as we were doing verbally, 663 00:31:52,200 --> 00:31:54,600 and then choose the move with the highest score. 664 00:31:54,600 --> 00:31:57,420 Because X's goal is to maximize its score. 665 00:31:57,420 --> 00:32:00,090 If the player is O, though, for each possible move, 666 00:32:00,090 --> 00:32:02,010 calculate a score for the board, and then 667 00:32:02,010 --> 00:32:04,210 choose the move with the lowest score. 668 00:32:04,210 --> 00:32:06,600 So that's a distillation of that verbal walkthrough 669 00:32:06,600 --> 00:32:10,290 into what CS50 students know now as code, or at least pseudocode. 670 00:32:10,290 --> 00:32:15,120 But the problem with games, not so much tic-tac-toe, but other more 671 00:32:15,120 --> 00:32:16,650 sophisticated games is this. 672 00:32:16,650 --> 00:32:19,890 Does anyone want to ballpark how many possible ways there 673 00:32:19,890 --> 00:32:22,940 are to play tic-tac-toe? 674 00:32:22,940 --> 00:32:26,180 Paper, pencil, two human children, how many different ways? 675 00:32:26,180 --> 00:32:30,893 How long could you keep them occupied playing tic-tac-toe in different ways? 676 00:32:30,893 --> 00:32:33,310 If you actually think through, how big does this tree get, 677 00:32:33,310 --> 00:32:36,160 how many leaves are there on this decision tree, like how many 678 00:32:36,160 --> 00:32:42,520 different directions, well, if you're thinking 255,168, you are correct. 679 00:32:42,520 --> 00:32:44,980 And now most of us in our lifetime have probably not 680 00:32:44,980 --> 00:32:47,180 played tic-tac-toe that many times. 
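That distillation translates fairly directly into a short recursive function. Here is a minimal sketch, assuming a board is represented as a Python list of nine cells, each "X", "O", or None:

def winner(board):
    # Return "X" or "O" if either has three in a row, else None.
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    # Return (score, move): +1 if X can force a win, -1 if O can, 0 for a draw.
    w = winner(board)
    if w == "X":
        return 1, None
    if w == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None  # board full with no winner: a draw
    best = None
    for move in moves:
        board[move] = player                  # try the move...
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[move] = None                    # ...then undo it
        if (best is None
                or (player == "X" and score > best[0])
                or (player == "O" and score < best[0])):
            best = (score, move)
    return best

print(minimax([None] * 9, "X"))  # (0, 0): with best play, tic-tac-toe is a draw

A nearly identical recursion that merely counts, rather than scores, each way a game can finish does indeed arrive at that 255,168 figure.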
681 00:32:47,180 --> 00:32:49,660 So think about how many games you've been missing out on. 682 00:32:49,660 --> 00:32:53,230 There are different decisions you could have been making all these years. 683 00:32:53,230 --> 00:32:57,380 Now, that's a big number, but honestly, that's not a big number for a computer. 684 00:32:57,380 --> 00:33:01,420 That's a few megabytes of memory maybe, to keep all of that in mind 685 00:33:01,420 --> 00:33:06,160 and implement that kind of code in C or Java or C++ or something else. 686 00:33:06,160 --> 00:33:08,990 But other games are much more complicated. 687 00:33:08,990 --> 00:33:11,860 And the games that you and I might play as we get older, 688 00:33:11,860 --> 00:33:13,330 they include maybe chess. 689 00:33:13,330 --> 00:33:17,560 And if you think about chess with only the first four moves, back and forth 690 00:33:17,560 --> 00:33:19,750 four times, so only four moves. 691 00:33:19,750 --> 00:33:21,430 That's not even a very long game. 692 00:33:21,430 --> 00:33:23,830 Anyone want a ballpark how many different ways 693 00:33:23,830 --> 00:33:28,390 there are to begin a game of chess with four moves back and forth? 694 00:33:28,390 --> 00:33:31,490 695 00:33:31,490 --> 00:33:34,300 This is evidence as to why chess is apparently so hard. 696 00:33:34,300 --> 00:33:40,030 288 million ways, which is why when you are really good at chess, 697 00:33:40,030 --> 00:33:41,680 you are really good at chess. 698 00:33:41,680 --> 00:33:44,350 Because apparently, you either have an intuition for 699 00:33:44,350 --> 00:33:47,950 or a mind for thinking, it would seem, so many more steps ahead 700 00:33:47,950 --> 00:33:48,860 than your opponent. 701 00:33:48,860 --> 00:33:50,777 And don't get us started on something like Go. 702 00:33:50,777 --> 00:33:55,570 266 quintillion ways to play Go's first four moves. 703 00:33:55,570 --> 00:33:59,110 So at this point, we just can't pull out our Mac, our PC, 704 00:33:59,110 --> 00:34:03,190 certainly not our phone, to solve optimally games like chess and Go, 705 00:34:03,190 --> 00:34:05,323 because we don't have big enough CPUs. 706 00:34:05,323 --> 00:34:06,490 We don't have enough memory. 707 00:34:06,490 --> 00:34:09,610 We don't have enough years in our lifetimes for the computers 708 00:34:09,610 --> 00:34:11,110 to crunch all of those numbers. 709 00:34:11,110 --> 00:34:14,230 And thus was born a different form of AI that's 710 00:34:14,230 --> 00:34:18,520 more inspired by finding patterns more dynamically, 711 00:34:18,520 --> 00:34:22,239 learning from data, as opposed to being told by humans, here 712 00:34:22,239 --> 00:34:25,070 is the code via which to solve this problem. 713 00:34:25,070 --> 00:34:28,330 So machine learning is a subset of artificial intelligence 714 00:34:28,330 --> 00:34:30,980 that tries instead to get machines to learn 715 00:34:30,980 --> 00:34:35,900 what they should do without being so coached step by step by step by humans 716 00:34:35,900 --> 00:34:36,409 here. 717 00:34:36,409 --> 00:34:39,500 Reinforcement learning, for instance, is one such example thereof, 718 00:34:39,500 --> 00:34:41,690 where, in reinforcement learning, you sort of wait 719 00:34:41,690 --> 00:34:44,480 for the computer or maybe a robot to maybe just get 720 00:34:44,480 --> 00:34:46,380 better and better and better at things. 721 00:34:46,380 --> 00:34:48,710 And as it does, you reward it with a reward function. 722 00:34:48,710 --> 00:34:50,960 Give it plus 1 every time it does something well.
723 00:34:50,960 --> 00:34:51,830 And maybe minus 1. 724 00:34:51,830 --> 00:34:54,080 You punish it any time it does something poorly. 725 00:34:54,080 --> 00:35:00,110 And if you simply program this AI or this robot to maximize its score, 726 00:35:00,110 --> 00:35:02,390 never mind minimizing, maximize its score, 727 00:35:02,390 --> 00:35:05,570 ideally, it should repeat behaviors that got it plus 1. 728 00:35:05,570 --> 00:35:07,820 It should decrease the frequency with which it does 729 00:35:07,820 --> 00:35:09,710 bad behaviors that got it negative 1. 730 00:35:09,710 --> 00:35:12,080 And you can reinforce this kind of learning. 731 00:35:12,080 --> 00:35:15,230 In fact, I have here one demonstration. 732 00:35:15,230 --> 00:35:18,380 Could a student come on up who does not think 733 00:35:18,380 --> 00:35:20,960 they are particularly coordinated? 734 00:35:20,960 --> 00:35:24,020 If-- OK, wow, you're being nominated by your friends. 735 00:35:24,020 --> 00:35:24,950 Come on up. 736 00:35:24,950 --> 00:35:26,283 Come on up. 737 00:35:26,283 --> 00:35:28,598 [LAUGHTER] 738 00:35:28,598 --> 00:35:29,530 739 00:35:29,530 --> 00:35:31,720 Their hands went up instantly for you. 740 00:35:31,720 --> 00:35:34,260 741 00:35:34,260 --> 00:35:36,290 OK, what is your name? 742 00:35:36,290 --> 00:35:37,420 AMAKA: My name's Amaka. 743 00:35:37,420 --> 00:35:39,130 DAVID J. MALAN: Amaka, do you want to introduce yourself to the world? 744 00:35:39,130 --> 00:35:40,330 AMAKA: Hi, my name is Amaka. 745 00:35:40,330 --> 00:35:42,250 I am a first year in Holworthy. 746 00:35:42,250 --> 00:35:43,667 I'm planning to concentrate in CS. 747 00:35:43,667 --> 00:35:44,750 DAVID J. MALAN: Wonderful. 748 00:35:44,750 --> 00:35:45,550 Nice to see you. 749 00:35:45,550 --> 00:35:46,690 Come on over here. 750 00:35:46,690 --> 00:35:49,540 [APPLAUSE] 751 00:35:49,540 --> 00:35:52,900 So, yes, oh, no, it's sort of like a game show here. 752 00:35:52,900 --> 00:35:57,520 We have a pan here with what appears to be something pancake-like. 753 00:35:57,520 --> 00:36:00,970 And we'd like to teach you how to flip a pancake, 754 00:36:00,970 --> 00:36:04,250 so that when you gesture upward, the pancake should flip around 755 00:36:04,250 --> 00:36:05,900 as though you cooked the other side. 756 00:36:05,900 --> 00:36:09,400 So we're going to reward you verbally with plus 1 or minus 1. 757 00:36:09,400 --> 00:36:11,980 758 00:36:11,980 --> 00:36:13,450 Minus 1. 759 00:36:13,450 --> 00:36:15,470 Minus 1. 760 00:36:15,470 --> 00:36:17,050 OK, plus 1! 761 00:36:17,050 --> 00:36:19,690 Plus 1, so do more of that. 762 00:36:19,690 --> 00:36:20,920 Minus 1. 763 00:36:20,920 --> 00:36:22,840 Minus 1. 764 00:36:22,840 --> 00:36:23,890 Minus 1. 765 00:36:23,890 --> 00:36:25,150 Do less of that. 766 00:36:25,150 --> 00:36:27,370 [LAUGHTER] 767 00:36:27,370 --> 00:36:28,517 AUDIENCE: Great, great. 768 00:36:28,517 --> 00:36:29,600 DAVID J. MALAN: All right! 769 00:36:29,600 --> 00:36:30,655 A big round of applause. 770 00:36:30,655 --> 00:36:32,890 [APPLAUSE] 771 00:36:32,890 --> 00:36:33,670 Thank you. 772 00:36:33,670 --> 00:36:37,340 We've been in the habit of handing out Super Mario Brothers Oreos this year, 773 00:36:37,340 --> 00:36:39,220 so thank you for participating. 774 00:36:39,220 --> 00:36:41,600 [APPLAUSE] 775 00:36:41,600 --> 00:36:43,030 776 00:36:43,030 --> 00:36:46,590 So, this is actually a good example of an opportunity 777 00:36:46,590 --> 00:36:47,940 for reinforcement learning. 
778 00:36:47,940 --> 00:36:51,310 And wonderfully, a researcher has posted a video that we thought we'd share. 779 00:36:51,310 --> 00:36:53,060 It's about a minute and a half long, where 780 00:36:53,060 --> 00:36:57,570 you can watch a robot now do exactly what our wonderful human volunteer here 781 00:36:57,570 --> 00:36:59,050 just attempted as well. 782 00:36:59,050 --> 00:37:01,560 So let me go ahead and play this on the screen 783 00:37:01,560 --> 00:37:05,380 and give you a sense of what the human and the robot are doing together. 784 00:37:05,380 --> 00:37:08,790 So their pancake looks a little similar there. 785 00:37:08,790 --> 00:37:12,360 The human here is going to first sort of train the robot what 786 00:37:12,360 --> 00:37:14,190 to do by showing it some gestures. 787 00:37:14,190 --> 00:37:16,360 But there's no one right way to do this. 788 00:37:16,360 --> 00:37:19,660 But the human seems to know how to do it pretty well in this case, 789 00:37:19,660 --> 00:37:23,040 and so it's trying to give the machine examples 790 00:37:23,040 --> 00:37:24,990 of how to flip a pancake successfully. 791 00:37:24,990 --> 00:37:27,810 But now, this is the very first trial. 792 00:37:27,810 --> 00:37:28,560 OK, look familiar? 793 00:37:28,560 --> 00:37:30,300 You're in good company. 794 00:37:30,300 --> 00:37:32,652 After three trials. 795 00:37:32,652 --> 00:37:33,456 [CLANG] 796 00:37:33,456 --> 00:37:34,260 [PLOP] 797 00:37:34,260 --> 00:37:36,020 OK. 798 00:37:36,020 --> 00:37:36,520 [CLANG] 799 00:37:36,520 --> 00:37:37,410 [PLOP] 800 00:37:37,410 --> 00:37:39,060 OK. 801 00:37:39,060 --> 00:37:42,690 Now 10 tries. 802 00:37:42,690 --> 00:37:46,020 There's the human picking up the pancake. 803 00:37:46,020 --> 00:37:48,780 After 11 trials-- 804 00:37:48,780 --> 00:37:49,680 [CLANG] 805 00:37:49,680 --> 00:37:51,930 [PLOP] 806 00:37:51,930 --> 00:37:54,270 And meanwhile, there's presumably a human coding this, 807 00:37:54,270 --> 00:38:00,090 in the sense that someone is saying good job or bad job, plus 1 or minus 1. 808 00:38:00,090 --> 00:38:03,870 20 trials. 809 00:38:03,870 --> 00:38:07,440 Here now we'll see how the computer knows what it's even doing. 810 00:38:07,440 --> 00:38:10,720 There's just a mapping to some kind of XYZ coordinate system. 811 00:38:10,720 --> 00:38:13,260 So the robot can quantize what it is it's doing. 812 00:38:13,260 --> 00:38:14,100 Nice! 813 00:38:14,100 --> 00:38:16,447 To do more of one thing, less of another. 814 00:38:16,447 --> 00:38:18,780 And you're just seeing a visualization in the background 815 00:38:18,780 --> 00:38:21,720 of those digitized movements. 816 00:38:21,720 --> 00:38:28,020 And so now, after 50 some odd trials, the robot, too, has got it spot on. 817 00:38:28,020 --> 00:38:30,420 And it should be able to repeat this again and again 818 00:38:30,420 --> 00:38:33,000 and again, in order to keep flipping this pancake. 819 00:38:33,000 --> 00:38:36,360 So, our human volunteer-- wonderfully, it took you even fewer trials. 820 00:38:36,360 --> 00:38:38,340 But this is an example then, to be clear, 821 00:38:38,340 --> 00:38:40,800 of what we'd call reinforcement learning, 822 00:38:40,800 --> 00:38:44,725 whereby you're reinforcing a behavior you want or negatively reinforcing. 823 00:38:44,725 --> 00:38:46,600 That is, punishing a behavior that you don't. 824 00:38:46,600 --> 00:38:48,350 Here's another example that brings us back 825 00:38:48,350 --> 00:38:51,850 into the realm of games a little bit, but in a very abstract way.
826 00:38:51,850 --> 00:38:53,918 If we were playing a game like The Floor Is Lava, 827 00:38:53,918 --> 00:38:56,710 where you're only supposed to step certain places so that you don't 828 00:38:56,710 --> 00:38:59,585 fall straight in the lava pit or something like that and lose a point 829 00:38:59,585 --> 00:39:02,920 or lose a life, each of these squares might represent a position. 830 00:39:02,920 --> 00:39:06,470 This yellow dot might represent the human player that can go up, down, 831 00:39:06,470 --> 00:39:08,240 left or right within this world. 832 00:39:08,240 --> 00:39:11,170 I'm revealing to the whole audience where the lava pits are. 833 00:39:11,170 --> 00:39:13,930 But the goal for this yellow dot is to get to green. 834 00:39:13,930 --> 00:39:17,530 But the yellow dot, as in any good game, does not have this bird's eye view 835 00:39:17,530 --> 00:39:19,930 and does not know from the get-go exactly where to go. 836 00:39:19,930 --> 00:39:22,040 It's going to have to try some trial and error. 837 00:39:22,040 --> 00:39:25,300 But if we, the programmers, maybe reinforce good behavior 838 00:39:25,300 --> 00:39:28,810 or punish bad behavior, we can teach this yellow dot, 839 00:39:28,810 --> 00:39:31,550 without giving it step by step, up, down, 840 00:39:31,550 --> 00:39:34,600 left, right instructions, what behaviors to repeat 841 00:39:34,600 --> 00:39:36,460 and what behaviors not to repeat. 842 00:39:36,460 --> 00:39:38,665 So, for instance, suppose the robot moves right. 843 00:39:38,665 --> 00:39:39,520 Ah, that was bad. 844 00:39:39,520 --> 00:39:42,610 You fell in the lava already, so we'll use a bit of computer memory 845 00:39:42,610 --> 00:39:45,100 to draw a thicker red line there. 846 00:39:45,100 --> 00:39:46,220 Don't do that again. 847 00:39:46,220 --> 00:39:47,830 So, negative 1, so to speak. 848 00:39:47,830 --> 00:39:49,780 Maybe the yellow dot moves up next time. 849 00:39:49,780 --> 00:39:53,290 We can reward that behavior by not drawing any walls 850 00:39:53,290 --> 00:39:54,580 and allowing it to go again. 851 00:39:54,580 --> 00:39:57,970 It's making pretty good progress, but, oh, darn it, it took a right turn 852 00:39:57,970 --> 00:39:59,230 and now fell into the lava. 853 00:39:59,230 --> 00:40:01,490 But let's use a bit more of the computer's memory 854 00:40:01,490 --> 00:40:04,750 and keep track of the, OK, do not do that thing anymore. 855 00:40:04,750 --> 00:40:07,270 Maybe the next time the human dot goes this way. 856 00:40:07,270 --> 00:40:09,370 Oh, we want to punish that behavior, so we'll 857 00:40:09,370 --> 00:40:11,140 remember as much with that red line. 858 00:40:11,140 --> 00:40:15,040 But now we're starting to make progress until, oh, now we hit this one. 859 00:40:15,040 --> 00:40:18,340 And eventually, even though the yellow dot, much like our human, 860 00:40:18,340 --> 00:40:22,780 much like our pancake flipping robot had to try again and again and again, 861 00:40:22,780 --> 00:40:26,710 after enough trials, it's going to start to realize what behaviors it should 862 00:40:26,710 --> 00:40:28,880 repeat and which ones it shouldn't. 863 00:40:28,880 --> 00:40:32,740 And so in this case, maybe it finally makes its way up to the green dot. 864 00:40:32,740 --> 00:40:35,050 And just to recap, once it finds that path, 865 00:40:35,050 --> 00:40:38,620 now it can remember it forever as with these green thicker lines.
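A rough sketch of that trial-and-error memory in Python (the grid layout, rewards, and helper names here are invented for illustration, not from any CS50 problem set):

```python
import random

# A toy "floor is lava" grid, invented for illustration:
# "." is safe floor, "L" is lava, "G" is the goal; start at top-left.
GRID = ["...L",
        ".L..",
        "L..L",
        "...G"]
ROWS, COLS = len(GRID), len(GRID[0])
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

bad = set()  # (position, move) pairs we've been punished for: the red lines

def try_episode():
    """One walk through the grid; returns the path if we reach the goal."""
    pos, path = (0, 0), [(0, 0)]
    while True:
        options = [m for m in MOVES if (pos, m) not in bad]
        if not options:
            return None  # boxed in on this run; start over
        move = random.choice(options)
        dr, dc = MOVES[move]
        r, c = pos[0] + dr, pos[1] + dc
        if not (0 <= r < ROWS and 0 <= c < COLS) or GRID[r][c] == "L":
            bad.add((pos, move))  # minus 1: remember never to do this again
            return None
        pos = (r, c)
        path.append(pos)
        if GRID[r][c] == "G":
            return path  # plus 1: a path worth remembering

# Keep trying episodes until one reaches the goal.
path = None
while path is None:
    path = try_episode()
print(path)
```

Each episode either adds a "red line" to the bad set or ends at the goal with a path worth remembering, which is the green-lines idea in miniature.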
866 00:40:38,620 --> 00:40:41,470 Any time you want to leave this map, any time you get really good 867 00:40:41,470 --> 00:40:44,650 at the Nintendo game, you follow that same path again and again, 868 00:40:44,650 --> 00:40:46,420 so you don't fall into the lava. 869 00:40:46,420 --> 00:40:51,160 But an astute human observer might realize that, yes, this is correct. 870 00:40:51,160 --> 00:40:53,590 It's getting out of this so-called maze. 871 00:40:53,590 --> 00:40:56,315 But what is suboptimal or bad about this solution? 872 00:40:56,315 --> 00:40:56,815 Sure. 873 00:40:56,815 --> 00:40:58,513 AUDIENCE: It's taking a really long time. 874 00:40:58,513 --> 00:40:59,900 It's not the most efficient way to get there. 875 00:40:59,900 --> 00:41:00,500 DAVID J. MALAN: Exactly. 876 00:41:00,500 --> 00:41:01,792 It's taking a really long time. 877 00:41:01,792 --> 00:41:04,190 An inefficient way to get there, because I dare say, 878 00:41:04,190 --> 00:41:07,280 if we just tried a different path occasionally, 879 00:41:07,280 --> 00:41:11,480 maybe we could get lucky and get to the exit quicker. 880 00:41:11,480 --> 00:41:14,930 And maybe that means we get a higher score or we get rewarded even more. 881 00:41:14,930 --> 00:41:18,140 So within a lot of artificial intelligence algorithms, 882 00:41:18,140 --> 00:41:21,230 there's this idea of exploring versus exploiting, 883 00:41:21,230 --> 00:41:26,000 whereby you should occasionally, yes, exploit the knowledge you already have. 884 00:41:26,000 --> 00:41:28,010 And in fact, frequently exploit that knowledge. 885 00:41:28,010 --> 00:41:30,260 But occasionally you know what you should probably do, 886 00:41:30,260 --> 00:41:31,550 is explore just a little bit. 887 00:41:31,550 --> 00:41:34,550 Take a left instead of a right and see if it leads you to the solution 888 00:41:34,550 --> 00:41:35,390 even more quickly. 889 00:41:35,390 --> 00:41:37,620 And you might find a better and better solution. 890 00:41:37,620 --> 00:41:40,100 So here mathematically is how we might think of this. 891 00:41:40,100 --> 00:41:44,690 10% of the time we might say that epsilon, just some variable, sort 892 00:41:44,690 --> 00:41:47,780 of a sprinkling of salt into the algorithm here, epsilon 893 00:41:47,780 --> 00:41:49,320 will be like 10% of the time. 894 00:41:49,320 --> 00:41:54,512 So if my robot or my player picks a random number that's less than 10%, 895 00:41:54,512 --> 00:41:55,970 it's going to make a random move. 896 00:41:55,970 --> 00:41:59,270 Go left instead of right, even if you really typically go right. 897 00:41:59,270 --> 00:42:01,650 Otherwise, go ahead and make the move with the highest value, 898 00:42:01,650 --> 00:42:03,090 as we've learned over time. 899 00:42:03,090 --> 00:42:06,420 And what the robot might learn then, is that we could actually 900 00:42:06,420 --> 00:42:10,290 go via this path, which gets us to the exit faster. 901 00:42:10,290 --> 00:42:13,313 We get a higher score, we do it in less time, it's a win-win. 902 00:42:13,313 --> 00:42:15,480 Frankly, this really resonates with me, because I've 903 00:42:15,480 --> 00:42:19,068 been in the habit, as maybe some of you are, when you go to a restaurant maybe 904 00:42:19,068 --> 00:42:21,360 that you really like, you find a dish you really like-- 905 00:42:21,360 --> 00:42:24,120 --I will never again know what other dishes that restaurant 906 00:42:24,120 --> 00:42:28,440 offers, because I'm locally optimally happy with the dish I've chosen.
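That coin flip between exploring and exploiting is only a few lines of code. A minimal sketch, assuming some table of per-move values has already been learned (the particular values below are made up):

```python
import random

EPSILON = 0.10  # explore 10% of the time

def choose_move(moves, value):
    """Epsilon-greedy: usually exploit the best-known move,
    occasionally explore a random one."""
    if random.random() < EPSILON:
        return random.choice(moves)            # explore: try anything
    return max(moves, key=lambda m: value[m])  # exploit: best known move

# Hypothetical learned values, for illustration only:
value = {"left": 0.2, "right": 0.7, "up": 0.4, "down": 0.1}
print(choose_move(list(value), value))  # usually "right", sometimes random
```

In restaurant terms, exploiting is ordering the usual; exploring is gambling on something new.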
907 00:42:28,440 --> 00:42:31,800 And I will never know if there's an even better dish at that restaurant 908 00:42:31,800 --> 00:42:34,320 unless again, I sort of sprinkle a little bit of epsilon, 909 00:42:34,320 --> 00:42:38,730 a little bit of randomness into my game playing, my dining out. 910 00:42:38,730 --> 00:42:41,640 The catch, of course, though, is that I might be punished. 911 00:42:41,640 --> 00:42:45,360 I might, therefore, be less happy if I pick something and I don't like it. 912 00:42:45,360 --> 00:42:48,120 So there's this tension between exploring and exploiting. 913 00:42:48,120 --> 00:42:50,700 But in general in computer science, and especially in AI, 914 00:42:50,700 --> 00:42:53,220 adding a little bit of randomness, especially over time, 915 00:42:53,220 --> 00:42:56,320 can, in fact, yield better and better outcomes. 916 00:42:56,320 --> 00:42:59,400 But now there's this notion all the more of deep learning, 917 00:42:59,400 --> 00:43:02,910 whereby you're trying to infer, to detect patterns, 918 00:43:02,910 --> 00:43:06,120 figure out how to solve problems, even if the AI has never 919 00:43:06,120 --> 00:43:10,170 seen those problems before, and even if there's no human there to reinforce 920 00:43:10,170 --> 00:43:12,720 behavior positively or negatively. 921 00:43:12,720 --> 00:43:15,390 Maybe it's just too complex of a problem for a human 922 00:43:15,390 --> 00:43:18,415 to stand alongside the robot and say, good or bad job. 923 00:43:18,415 --> 00:43:20,790 So deep learning is actually very much related 924 00:43:20,790 --> 00:43:24,210 to what you might know as neural networks, inspired by human physiology, 925 00:43:24,210 --> 00:43:26,580 whereby inside of our brains and elsewhere in our body, 926 00:43:26,580 --> 00:43:28,372 there's lots of these neurons here that can 927 00:43:28,372 --> 00:43:30,480 send electrical signals to make movements 928 00:43:30,480 --> 00:43:32,220 happen from brain to extremities. 929 00:43:32,220 --> 00:43:35,520 You might have two of these via which signals can 930 00:43:35,520 --> 00:43:37,810 be transmitted over a larger distance. 931 00:43:37,810 --> 00:43:41,760 And so computer scientists for some time have drawn inspiration 932 00:43:41,760 --> 00:43:46,560 from these neurons to create in software, what we call neural networks. 933 00:43:46,560 --> 00:43:49,240 Whereby, there's inputs to these networks 934 00:43:49,240 --> 00:43:52,230 and there's outputs from these networks that represent inputs 935 00:43:52,230 --> 00:43:54,450 to problems and solutions thereto. 936 00:43:54,450 --> 00:43:56,910 So let me abstract away the more biological diagrams 937 00:43:56,910 --> 00:44:00,970 with just circles that represent nodes, or neurons, in this case. 938 00:44:00,970 --> 00:44:03,450 This we would call in CS50, the input. 939 00:44:03,450 --> 00:44:05,520 This is what we would call the output. 940 00:44:05,520 --> 00:44:08,680 But this is a very simplistic, a very simple neural network. 941 00:44:08,680 --> 00:44:11,760 This might be more common, whereby the network, the AI 942 00:44:11,760 --> 00:44:15,900 takes two inputs to a problem and tries to give you one solution. 943 00:44:15,900 --> 00:44:17,760 Well, let's make this more real.
944 00:44:17,760 --> 00:44:20,760 For instance, suppose that at the-- 945 00:44:20,760 --> 00:44:23,970 suppose that just for the sake of discussion, here is like a grid 946 00:44:23,970 --> 00:44:27,180 that you might see in math class, with a y-axis and an x-axis, vertically 947 00:44:27,180 --> 00:44:28,620 and horizontally respectively. 948 00:44:28,620 --> 00:44:31,980 Suppose there's a couple of blue and red dots in that world. 949 00:44:31,980 --> 00:44:34,890 And suppose that our goal, computationally, 950 00:44:34,890 --> 00:44:40,020 is to predict whether a dot is going to be blue or red, based 951 00:44:40,020 --> 00:44:42,960 on its position within that coordinate system. 952 00:44:42,960 --> 00:44:45,002 And maybe this represents some real world notion. 953 00:44:45,002 --> 00:44:47,502 Maybe it's something like rain that we're trying to predict. 954 00:44:47,502 --> 00:44:49,920 But we're doing it more simply with colors right now. 955 00:44:49,920 --> 00:44:53,010 So here's my y-axis, here's my x-axis, and effectively, 956 00:44:53,010 --> 00:44:55,740 my neural network you can think of conceptually as this. 957 00:44:55,740 --> 00:44:58,393 It's some kind of implementation of software 958 00:44:58,393 --> 00:45:00,060 where there's two inputs to the problem. 959 00:45:00,060 --> 00:45:01,990 Give me an x, give me a y value. 960 00:45:01,990 --> 00:45:06,540 And this neural network will output red or blue as its prediction. 961 00:45:06,540 --> 00:45:08,790 Well, how does it know whether to predict red or blue, 962 00:45:08,790 --> 00:45:12,030 especially if no human has painstakingly written code 963 00:45:12,030 --> 00:45:15,360 to say when you see a dot here, conclude that it's red. 964 00:45:15,360 --> 00:45:17,490 When you see a dot here, conclude that it's blue. 965 00:45:17,490 --> 00:45:21,160 How can an AI just learn dynamically to solve problems? 966 00:45:21,160 --> 00:45:23,460 Well, what might be a reasonable heuristic here? 967 00:45:23,460 --> 00:45:26,757 Honestly, this is probably a first approximation that's pretty good. 968 00:45:26,757 --> 00:45:29,340 If anything's to the left of that line, let the neural network 969 00:45:29,340 --> 00:45:30,630 conclude that it's going to be blue. 970 00:45:30,630 --> 00:45:32,010 And if it's to the right of the line, let 971 00:45:32,010 --> 00:45:33,593 it conclude that it's going to be red. 972 00:45:33,593 --> 00:45:36,690 Until such time as there's more training data, 973 00:45:36,690 --> 00:45:40,203 more real world data that gets us to rethink our assumptions. 974 00:45:40,203 --> 00:45:42,120 So for instance, if there's a third dot there, 975 00:45:42,120 --> 00:45:44,830 uh-oh, clearly a straight line is not sufficient. 976 00:45:44,830 --> 00:45:48,960 So maybe it's more of a diagonal line that splits the blue from the red world 977 00:45:48,960 --> 00:45:49,600 here. 978 00:45:49,600 --> 00:45:51,660 Meanwhile, here's even more dots. 979 00:45:51,660 --> 00:45:53,580 And it's actually getting harder now. 980 00:45:53,580 --> 00:45:55,230 Like, this line is still pretty good. 981 00:45:55,230 --> 00:45:56,610 Most of the blue is up here. 982 00:45:56,610 --> 00:45:58,240 Most of the red is down here. 983 00:45:58,240 --> 00:46:02,100 And this is why, if we fast forward to today, you know, AI is often very good, 984 00:46:02,100 --> 00:46:04,630 but not perfect at solving problems. 
985 00:46:04,630 --> 00:46:07,890 But what is it we're looking at here, and what is this neural network really 986 00:46:07,890 --> 00:46:09,250 trying to figure out? 987 00:46:09,250 --> 00:46:12,870 Well, again, at the risk of taking some fun out of red and blue dots, 988 00:46:12,870 --> 00:46:16,890 you can think of this neural network as indeed having these neurons, which 989 00:46:16,890 --> 00:46:19,590 represent inputs here and outputs here. 990 00:46:19,590 --> 00:46:22,200 And then what's happening inside of the computer's memory, 991 00:46:22,200 --> 00:46:26,320 is that it's trying to figure out what the weight of this arrow or edge 992 00:46:26,320 --> 00:46:26,820 should be. 993 00:46:26,820 --> 00:46:29,132 What the weight of this arrow or edge should be. 994 00:46:29,132 --> 00:46:30,840 And maybe there's another variable there, 995 00:46:30,840 --> 00:46:33,910 like plus or minus c that just tweaks the prediction. 996 00:46:33,910 --> 00:46:37,540 So x and y are literally going to be numbers in this scenario. 997 00:46:37,540 --> 00:46:40,890 And the output of this neural network ideally is just true or false. 998 00:46:40,890 --> 00:46:42,310 Is it red or blue? 999 00:46:42,310 --> 00:46:45,330 So it's sort of a binary state, as we discuss a lot in CS50. 1000 00:46:45,330 --> 00:46:47,987 So here too, to take the fun out of the pretty picture, 1001 00:46:47,987 --> 00:46:50,070 it's really just like a high school math function. 1002 00:46:50,070 --> 00:46:53,160 What the neural network in this example is trying to figure out, 1003 00:46:53,160 --> 00:46:57,540 is what formula of the form ax plus by plus c 1004 00:46:57,540 --> 00:46:59,680 is going to be arbitrarily greater than 0? 1005 00:46:59,680 --> 00:47:02,150 And if so, let's conclude that the dot is red 1006 00:47:02,150 --> 00:47:05,140 if you get back a positive result. If you don't, let's 1007 00:47:05,140 --> 00:47:08,558 conclude that the dot is going to be blue instead. 1008 00:47:08,558 --> 00:47:10,600 So really what you're trying to do, is figure out 1009 00:47:10,600 --> 00:47:13,000 dynamically what numbers do we have to tweak, 1010 00:47:13,000 --> 00:47:15,100 these parameters inside of the neural network 1011 00:47:15,100 --> 00:47:18,220 that just give us the answer we want based on all of this data? 1012 00:47:18,220 --> 00:47:22,180 More generally though, this would be really representative of deep learning. 1013 00:47:22,180 --> 00:47:24,490 It's not as simple as input, input, output. 1014 00:47:24,490 --> 00:47:27,140 There's actually a lot of these nodes, these neurons. 1015 00:47:27,140 --> 00:47:28,360 There's a lot of these edges. 1016 00:47:28,360 --> 00:47:30,812 There's a lot of numbers and math going on that, 1017 00:47:30,812 --> 00:47:33,520 frankly, even the computer scientists using these neural networks 1018 00:47:33,520 --> 00:47:36,760 don't necessarily know what they even mean or represent. 1019 00:47:36,760 --> 00:47:39,910 It just happens to be that when you crunch the numbers with all 1020 00:47:39,910 --> 00:47:44,140 of these parameters in place, you get the answer that you want, 1021 00:47:44,140 --> 00:47:46,190 at least most of the time. 1022 00:47:46,190 --> 00:47:48,280 So that's essentially the intuition behind that. 1023 00:47:48,280 --> 00:47:51,340 And you can apply it to very real world, if mundane applications.
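Here is that high school math function as a sketch in Python, with a crude perceptron-style training loop added to show what tweaking the parameters can look like (the data points and learning rate are made up for illustration, and real networks use fancier training than this):

```python
# One artificial "neuron" classifying a dot as red or blue from its (x, y)
# position: compute ax + by + c and check which side of 0 it lands on.
a, b, c = 0.0, 0.0, 0.0  # the weights (and bias) to be learned

def predict(x, y):
    return "red" if a * x + b * y + c > 0 else "blue"

# Toy labeled examples: +1 means red, -1 means blue.
data = [((2.0, 0.5), +1), ((3.0, 1.0), +1),
        ((0.5, 2.0), -1), ((1.0, 3.0), -1)]

# The classic perceptron update: whenever an example is misclassified,
# nudge the weights a little toward the right answer.
for _ in range(100):
    for (x, y), label in data:
        guess = 1 if a * x + b * y + c > 0 else -1
        if guess != label:
            a += 0.1 * label * x
            b += 0.1 * label * y
            c += 0.1 * label

print(predict(2.5, 0.5))  # "red": below and right of the learned line
print(predict(0.5, 2.5))  # "blue": above and left of the learned line
```

And the inputs needn't be x's and y's on a grid: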
1024 00:47:51,340 --> 00:47:55,000 Given today's humidity, given today's pressure, yes or no, 1025 00:47:55,000 --> 00:47:56,275 should there be rainfall? 1026 00:47:56,275 --> 00:47:58,150 And maybe there is some mathematical function 1027 00:47:58,150 --> 00:48:01,120 that based on years of training data, we can 1028 00:48:01,120 --> 00:48:03,490 infer what that prediction should be. 1029 00:48:03,490 --> 00:48:04,090 Another one. 1030 00:48:04,090 --> 00:48:07,120 Given this amount of advertising in this month, 1031 00:48:07,120 --> 00:48:09,480 what should our sales be for that year? 1032 00:48:09,480 --> 00:48:11,230 Should they be up, or should they be down? 1033 00:48:11,230 --> 00:48:13,130 Sorry, for that particular month. 1034 00:48:13,130 --> 00:48:16,090 So real world problems map readily when you can break them down 1035 00:48:16,090 --> 00:48:20,320 into inputs and a binary output often, or some kind of output 1036 00:48:20,320 --> 00:48:24,250 where you want the thing to figure out based on past data what 1037 00:48:24,250 --> 00:48:26,650 its prediction should be. 1038 00:48:26,650 --> 00:48:30,250 So that brings us back to generative artificial intelligence, which 1039 00:48:30,250 --> 00:48:34,760 isn't just about solving problems, but really generating literally images, 1040 00:48:34,760 --> 00:48:38,680 texts, even videos, that again, increasingly resemble 1041 00:48:38,680 --> 00:48:41,920 what we humans might otherwise output ourselves. 1042 00:48:41,920 --> 00:48:45,370 And within the world of generative artificial intelligence, 1043 00:48:45,370 --> 00:48:48,310 do we have, of course, these same images that we saw before, 1044 00:48:48,310 --> 00:48:51,340 the same text that we saw before, and more generally, things 1045 00:48:51,340 --> 00:48:55,870 like ChatGPT, which are really examples of what we now call large language 1046 00:48:55,870 --> 00:48:56,560 models. 1047 00:48:56,560 --> 00:48:59,020 These sort of massive neural networks that 1048 00:48:59,020 --> 00:49:02,590 have so many inputs and so many neurons implemented 1049 00:49:02,590 --> 00:49:06,280 in software, that essentially represent all of the patterns 1050 00:49:06,280 --> 00:49:09,850 that the software has discovered by being fed massive amounts of input. 1051 00:49:09,850 --> 00:49:13,180 Think of it as like the entire textual content of the internet. 1052 00:49:13,180 --> 00:49:16,180 Think of it as the entire content of courses like CS50 1053 00:49:16,180 --> 00:49:18,280 that may very well be out there on the internet. 1054 00:49:18,280 --> 00:49:21,610 And even though these AIs, these large language models 1055 00:49:21,610 --> 00:49:25,240 haven't been told how to behave, they're really 1056 00:49:25,240 --> 00:49:28,210 inferring from all of these examples, for better 1057 00:49:28,210 --> 00:49:31,310 or for worse, how to make predictions. 1058 00:49:31,310 --> 00:49:34,840 So here, for instance, from 2017, just a few years back, 1059 00:49:34,840 --> 00:49:38,110 is a seminal paper from Google that introduced what we now 1060 00:49:38,110 --> 00:49:40,210 know as a transformer architecture. 
1061 00:49:40,210 --> 00:49:43,690 And this introduced this idea of attention values, whereby 1062 00:49:43,690 --> 00:49:46,900 they propose that given an English sentence, for instance, or really 1063 00:49:46,900 --> 00:49:51,460 any human sentence, you try to assign numbers, not unlike our past exercises, 1064 00:49:51,460 --> 00:49:55,780 to each of the words, each of the inputs that speaks to its relationship 1065 00:49:55,780 --> 00:49:56,930 with other words. 1066 00:49:56,930 --> 00:49:59,720 So if there's a high relationship between two words in a sentence, 1067 00:49:59,720 --> 00:50:01,310 they would have high attention values. 1068 00:50:01,310 --> 00:50:04,720 And if maybe it's a preposition or an article, like the or the like, 1069 00:50:04,720 --> 00:50:06,890 maybe those attention values are lower. 1070 00:50:06,890 --> 00:50:09,070 And by encoding the world in that way, do 1071 00:50:09,070 --> 00:50:14,230 we begin to detect patterns that allow us to predict things like words, 1072 00:50:14,230 --> 00:50:15,440 that is, generate text. 1073 00:50:15,440 --> 00:50:19,150 So for instance, up until a few years ago, completing this sentence 1074 00:50:19,150 --> 00:50:21,310 was actually pretty hard for a lot of AI. 1075 00:50:21,310 --> 00:50:25,180 So for instance here, Massachusetts is a state in the New England region 1076 00:50:25,180 --> 00:50:26,860 of the Northeastern United States. 1077 00:50:26,860 --> 00:50:29,500 It borders on the Atlantic Ocean to the east. 1078 00:50:29,500 --> 00:50:32,180 The state's capital is dot, dot, dot. 1079 00:50:32,180 --> 00:50:34,910 Now, you might think that this is relatively straightforward. 1080 00:50:34,910 --> 00:50:37,480 It's like just handing you a softball type question. 1081 00:50:37,480 --> 00:50:41,290 But historically within the world of AI, this word, state, 1082 00:50:41,290 --> 00:50:44,907 was so relatively far away from the proper noun 1083 00:50:44,907 --> 00:50:46,990 that it's actually referring back to, that we just 1084 00:50:46,990 --> 00:50:50,170 didn't have computational models that took in that holistic picture, 1085 00:50:50,170 --> 00:50:52,702 that frankly, we humans are much better at. 1086 00:50:52,702 --> 00:50:54,910 If you would ask this question a little more quickly, 1087 00:50:54,910 --> 00:50:57,260 a little more immediately, you might have gotten a better response. 1088 00:50:57,260 --> 00:50:59,610 But this is, I daresay, why chatbots in the past have been 1089 00:50:59,610 --> 00:51:01,945 so bad in the form of customer service and the like, 1090 00:51:01,945 --> 00:51:04,320 because they're not really taking all of the context into 1091 00:51:04,320 --> 00:51:07,470 account that we humans might be inclined to provide. 1092 00:51:07,470 --> 00:51:09,750 What's going on underneath the hood? 1093 00:51:09,750 --> 00:51:14,220 Without escalating things too quickly, what an artificial intelligence 1094 00:51:14,220 --> 00:51:16,650 nowadays, these large language models might do, 1095 00:51:16,650 --> 00:51:21,360 is break down the user's input, your input into ChatGPT 1096 00:51:21,360 --> 00:51:22,950 into the individual words. 1097 00:51:22,950 --> 00:51:26,790 We might then take into account the order of those words. 1098 00:51:26,790 --> 00:51:29,400 Massachusetts is first, is is last. 1099 00:51:29,400 --> 00:51:33,050 We might further encode each of those words using a standard way.
1100 00:51:33,050 --> 00:51:34,800 And there's different algorithms for this, 1101 00:51:34,800 --> 00:51:37,050 but you come up with what are called embeddings. 1102 00:51:37,050 --> 00:51:40,170 That is to say, you can use one of those APIs 1103 00:51:40,170 --> 00:51:43,500 I talked about earlier, or even software running on your own computers, 1104 00:51:43,500 --> 00:51:46,140 to come up with a mathematical representation 1105 00:51:46,140 --> 00:51:47,940 of the word, Massachusetts. 1106 00:51:47,940 --> 00:51:50,190 And Rongxin kindly did this for us last night. 1107 00:51:50,190 --> 00:51:57,000 These are the 1,536 floating point values that OpenAI uses 1108 00:51:57,000 --> 00:51:59,880 to represent the word, Massachusetts. 1109 00:51:59,880 --> 00:52:02,010 And this is to say, and you should not understand 1110 00:52:02,010 --> 00:52:04,380 anything you are looking at on the screen, nor do I, 1111 00:52:04,380 --> 00:52:07,170 but this is now a mathematical representation 1112 00:52:07,170 --> 00:52:10,320 of the input that can be compared against 1113 00:52:10,320 --> 00:52:12,660 the mathematical representations of other inputs 1114 00:52:12,660 --> 00:52:15,420 in order to find proximity semantically. 1115 00:52:15,420 --> 00:52:20,130 Words that somehow have relationships or correlations with each other 1116 00:52:20,130 --> 00:52:22,890 that helps the AI ultimately predict what 1117 00:52:22,890 --> 00:52:25,990 should the next word out of its mouth be, so to speak. 1118 00:52:25,990 --> 00:52:28,380 So in a case like this, these lines 1119 00:52:28,380 --> 00:52:30,630 represent all of those attention values. 1120 00:52:30,630 --> 00:52:32,880 And thicker lines mean there's more attention given 1121 00:52:32,880 --> 00:52:34,140 from one word to another. 1122 00:52:34,140 --> 00:52:35,730 Thinner lines mean the opposite. 1123 00:52:35,730 --> 00:52:40,770 And those inputs are ultimately fed into a large neural network, 1124 00:52:40,770 --> 00:52:43,870 where you have inputs on the left, outputs on the right. 1125 00:52:43,870 --> 00:52:46,380 And in this particular case, the hope is to get out 1126 00:52:46,380 --> 00:52:52,200 a single word, which is the capital of Massachusetts, Boston itself, whereby somehow, 1127 00:52:52,200 --> 00:52:55,950 the neural network and the humans behind it at OpenAI, Microsoft, Google, 1128 00:52:55,950 --> 00:52:59,490 or elsewhere, have sort of crunched so many numbers by training 1129 00:52:59,490 --> 00:53:03,040 these models on so much data, that it figured out what all of those weights 1130 00:53:03,040 --> 00:53:06,670 are, what the biases are, so as to influence mathematically 1131 00:53:06,670 --> 00:53:08,710 the output therefrom. 1132 00:53:08,710 --> 00:53:13,270 So that is all underneath the hood of what students now 1133 00:53:13,270 --> 00:53:15,460 perceive as this adorable rubber duck. 1134 00:53:15,460 --> 00:53:20,150 But underneath it all is certainly a lot of domain knowledge. 1135 00:53:20,150 --> 00:53:23,570 And CS50, by nature of being OpenCourseWare for the past many years, 1136 00:53:23,570 --> 00:53:26,050 is fortunate to actually be part of the model, 1137 00:53:26,050 --> 00:53:28,880 as might be any other content that's freely available online. 1138 00:53:28,880 --> 00:53:31,570 And so that certainly helps benefit the answers 1139 00:53:31,570 --> 00:53:34,150 when it comes to asking CS50 specific questions. 1140 00:53:34,150 --> 00:53:36,403 That said, it's not perfect.
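To give a flavor of what finding proximity semantically means computationally: embeddings are commonly compared with cosine similarity. A sketch with tiny made-up vectors (real embeddings, like the 1,536 values shown on screen, would come back from an embeddings API rather than be typed by hand):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1.0 means the
    inputs are close in the embedding space, i.e., related in meaning."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Made-up 4-dimensional stand-ins; real embeddings have hundreds or
# thousands of dimensions (e.g., the 1,536 values for "Massachusetts").
massachusetts = [0.8, 0.1, 0.3, 0.4]
boston        = [0.7, 0.2, 0.3, 0.5]
pancake       = [0.1, 0.9, 0.0, 0.2]

print(cosine_similarity(massachusetts, boston))   # relatively high
print(cosine_similarity(massachusetts, pancake))  # relatively low
```

Even with all of that machinery, though, the model is still only predicting plausible next words from numbers like these.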
1141 00:53:36,403 --> 00:53:38,320 And you might have heard of what are currently 1142 00:53:38,320 --> 00:53:43,540 called hallucinations, where ChatGPT and similar tools just make stuff up. 1143 00:53:43,540 --> 00:53:45,340 And it sounds very confident. 1144 00:53:45,340 --> 00:53:47,673 And you can sometimes call it out, whereby 1145 00:53:47,673 --> 00:53:49,090 you can say, no, that's not right. 1146 00:53:49,090 --> 00:53:51,610 And it will playfully apologize and say, oh, I'm sorry. 1147 00:53:51,610 --> 00:53:56,560 But it made up some statement, because it was probabilistically 1148 00:53:56,560 --> 00:53:59,840 something that could be said, even if it's just not correct. 1149 00:53:59,840 --> 00:54:02,650 Now, allow me to propose that this kind of problem 1150 00:54:02,650 --> 00:54:05,230 is going to get less and less frequent. 1151 00:54:05,230 --> 00:54:07,480 And so as the models evolve and our techniques evolve, 1152 00:54:07,480 --> 00:54:08,983 this will be less of an issue. 1153 00:54:08,983 --> 00:54:10,900 But I thought it would be fun to end on a note 1154 00:54:10,900 --> 00:54:13,510 that a former colleague shared just the other day, which 1155 00:54:13,510 --> 00:54:16,780 was this old poem by Shel Silverstein, another something 1156 00:54:16,780 --> 00:54:18,580 from our past childhood perhaps. 1157 00:54:18,580 --> 00:54:23,800 And this was from 1981, a poem called "Homework Machine," which perhaps 1158 00:54:23,800 --> 00:54:26,980 foretold where we are now in 2023. 1159 00:54:26,980 --> 00:54:30,940 "The homework machine, oh, the homework machine, most perfect contraption 1160 00:54:30,940 --> 00:54:32,320 that's ever been seen. 1161 00:54:32,320 --> 00:54:35,770 Just put in your homework, then drop in a dime, snap on the switch, 1162 00:54:35,770 --> 00:54:41,380 and in ten seconds time, your homework comes out quick and clean as can be. 1163 00:54:41,380 --> 00:54:46,240 Here it is, 9 plus 4, and the answer is 3. 1164 00:54:46,240 --> 00:54:47,590 3? 1165 00:54:47,590 --> 00:54:48,820 Oh, me. 1166 00:54:48,820 --> 00:54:52,210 I guess it's not as perfect as I thought it would be." 1167 00:54:52,210 --> 00:54:55,330 So, quite foretelling, sure. 1168 00:54:55,330 --> 00:54:58,220 [APPLAUSE] 1169 00:54:58,220 --> 00:55:01,130 Quite foretelling, indeed. 1170 00:55:01,130 --> 00:55:04,910 Though, for all this and more, the family members in the audience 1171 00:55:04,910 --> 00:55:08,810 are welcome to take CS50 yourselves online at cs50edx.org. 1172 00:55:08,810 --> 00:55:10,700 For all of today and so much more, allow me 1173 00:55:10,700 --> 00:55:15,140 to thank Brian, Rongxin, Sophie, Andrew, Patrick, Charlie, CS50's whole team. 1174 00:55:15,140 --> 00:55:18,920 If you are a family member here headed to lunch with CS50's team, 1175 00:55:18,920 --> 00:55:22,190 please look for Cameron holding a rubber duck above her head. 1176 00:55:22,190 --> 00:55:24,300 Thank you so much for joining us today. 1177 00:55:24,300 --> 00:55:25,670 This was CS50. 1178 00:55:25,670 --> 00:55:27,170 [APPLAUSE] 1179 00:55:27,170 --> 00:55:30,520 [MUSIC PLAYING] 1180 00:55:30,520 --> 00:55:57,000