1
00:00:00,000 --> 00:00:02,070


2
00:00:02,070 --> 00:00:06,960
DAVID J. MALAN: All right, this is CS50,
and this is the CS50 Educator Workshop

3
00:00:06,960 --> 00:00:09,630
and our session on
"Teaching CS50 with AI,"

4
00:00:09,630 --> 00:00:12,900
in which we'll speak to exactly how
and why we've leveraged artificial

5
00:00:12,900 --> 00:00:17,350
intelligence to actually teach the
course and support students therein.

6
00:00:17,350 --> 00:00:21,420
And we started this project really over
a year ago now with this working premise

7
00:00:21,420 --> 00:00:27,060
that tools like ChatGPT, which came
onto the scene in November of 2020--

8
00:00:27,060 --> 00:00:29,590
2022, seemingly overnight.

9
00:00:29,590 --> 00:00:33,180
But the working premise for us is that
tools like this really out of the box,

10
00:00:33,180 --> 00:00:35,980
working in their default
way, are really too helpful.

11
00:00:35,980 --> 00:00:38,250
Tools like ChatGPT,
Bing Chat, and the like

12
00:00:38,250 --> 00:00:41,010
are all too eager to answer
each and every question

13
00:00:41,010 --> 00:00:42,280
that you might ask of them.

14
00:00:42,280 --> 00:00:45,690
And even if that question is,
can you do my homework for me,

15
00:00:45,690 --> 00:00:47,560
they are all too willing to oblige.

16
00:00:47,560 --> 00:00:49,950
So ironically, we set
out a year plus ago

17
00:00:49,950 --> 00:00:52,410
to put in place some
pedagogical guardrails

18
00:00:52,410 --> 00:00:55,830
to really make those tools
less useful for students

19
00:00:55,830 --> 00:00:59,910
or really more properly stated,
more educationally useful

20
00:00:59,910 --> 00:01:04,629
for students so that the tools might
guide students toward useful answers

21
00:01:04,629 --> 00:01:08,840
and insights but not simply spoil
answers to problems outright.

22
00:01:08,840 --> 00:01:11,620
In terms of policy, this
is literally the language

23
00:01:11,620 --> 00:01:13,840
we adopted early on in
the course's syllabus

24
00:01:13,840 --> 00:01:18,670
for students, whereby we said that it is
not reasonable, that it is not allowed,

25
00:01:18,670 --> 00:01:22,840
to use AI-based software like
ChatGPT, GitHub, Copilot, Bing Chat,

26
00:01:22,840 --> 00:01:27,470
and the like that suggests or completes
answers to questions or lines of code.

27
00:01:27,470 --> 00:01:30,280
This is not something we
can enforce technologically.

28
00:01:30,280 --> 00:01:33,760
Even if the school or your
school could prohibit access

29
00:01:33,760 --> 00:01:36,340
to certain websites with
a firewall, the reality

30
00:01:36,340 --> 00:01:39,550
is most any student could still
take out their own device,

31
00:01:39,550 --> 00:01:42,740
whether on campus or at home,
and still access those services.

32
00:01:42,740 --> 00:01:44,740
So we've also woven into
the course all the more

33
00:01:44,740 --> 00:01:49,600
of a discussion of ethics, academic
honesty so that students at least are

34
00:01:49,600 --> 00:01:53,270
cognizant of where we are coming
from and where those lines are.

35
00:01:53,270 --> 00:01:55,450
However, we didn't want
to simply take away tools

36
00:01:55,450 --> 00:01:59,390
that arguably are going to be with
us now here on out and do have value.

37
00:01:59,390 --> 00:02:03,130
So we deemed it reasonable for students
to use CS50's own AI-based software,

38
00:02:03,130 --> 00:02:07,570
including the CS50 Duck in virtual form,
otherwise known as the Duck debugger,

39
00:02:07,570 --> 00:02:10,990
or ddb, in our own
web-based applications,

40
00:02:10,990 --> 00:02:16,150
namely CS50.ai, which you'll soon see,
as well as CS50.dev, the Codespaces

41
00:02:16,150 --> 00:02:18,560
environment you saw yesterday as well.

42
00:02:18,560 --> 00:02:21,410
And so we really set out to
bring this virtual duck to life.

43
00:02:21,410 --> 00:02:24,460
And in fact, thanks to CS50's
own Brian Yu, for a few years,

44
00:02:24,460 --> 00:02:30,460
we had a chat bot of sorts built into
VS Code and our previous programming

45
00:02:30,460 --> 00:02:34,030
environment, whereby if students
typed a question into a chat box,

46
00:02:34,030 --> 00:02:36,350
they would actually get back a response.

47
00:02:36,350 --> 00:02:38,118
But the response wasn't
all that helpful.

48
00:02:38,118 --> 00:02:39,910
But the goal was to
indeed implement what's

49
00:02:39,910 --> 00:02:42,910
called rubber duck
debugging or rubberducking,

50
00:02:42,910 --> 00:02:47,590
the idea being that in the absence of a
smart colleague, family member, friend,

51
00:02:47,590 --> 00:02:50,350
or really just someone more
technically proficient than you,

52
00:02:50,350 --> 00:02:55,360
you should at least keep on your
desk a rubber duck or any inanimate

53
00:02:55,360 --> 00:02:58,600
object really so that you can talk
to it about any problems or bugs

54
00:02:58,600 --> 00:02:59,600
that you're having.

55
00:02:59,600 --> 00:03:03,820
And the idea is per yesterday that
invariably in expressing your thoughts

56
00:03:03,820 --> 00:03:07,480
step by step, ideally to
this inanimate object,

57
00:03:07,480 --> 00:03:09,970
the proverbial light
bulb goes off, and you

58
00:03:09,970 --> 00:03:14,500
understand wherein you are
confused about some topic or bug.

59
00:03:14,500 --> 00:03:16,330
We try to bring this
to life historically

60
00:03:16,330 --> 00:03:19,910
in CS50 not only with small rubber
ducks but very large ones as well.

61
00:03:19,910 --> 00:03:22,000
But the virtual version
that we had for some time

62
00:03:22,000 --> 00:03:26,680
looked a little something like this in
CS50.dev, that is, Visual Studio code.

63
00:03:26,680 --> 00:03:28,565
A student might begin
a conversation with,

64
00:03:28,565 --> 00:03:30,440
"I'm hoping you can help
me solve a problem."

65
00:03:30,440 --> 00:03:33,220
And up until fairly
recently, all this duck

66
00:03:33,220 --> 00:03:38,500
would do that Brian implemented for
students was quack back either once,

67
00:03:38,500 --> 00:03:39,740
twice, or three times.

68
00:03:39,740 --> 00:03:41,830
So it wasn't completely simple.

69
00:03:41,830 --> 00:03:45,980
There was some pseudorandomness in there
but still just really one line of code.

70
00:03:45,980 --> 00:03:50,380
But I dare say quite a few students
online and eventually on campus

71
00:03:50,380 --> 00:03:54,130
were fairly astonished where
after two plus years of the duck

72
00:03:54,130 --> 00:03:55,570
only knowing how to quack--

73
00:03:55,570 --> 00:04:00,370
suddenly and literally overnight started
to responding to students in English

74
00:04:00,370 --> 00:04:04,090
or in some cases in other
human languages form,

75
00:04:04,090 --> 00:04:06,530
depending on how students
initiate the conversation.

76
00:04:06,530 --> 00:04:10,100
And so really this virtual rubber
duck was brought all the more to life.

77
00:04:10,100 --> 00:04:12,910
And educationally, what our
goal has been is really this,

78
00:04:12,910 --> 00:04:16,660
to provide students with
virtual office hours 24/7.

79
00:04:16,660 --> 00:04:18,760
Even at a place like
Harvard or Yale, where

80
00:04:18,760 --> 00:04:22,300
we're fortunate to have lots of
teaching assistants and so forth,

81
00:04:22,300 --> 00:04:24,910
the reality is we've never
really been able to keep up

82
00:04:24,910 --> 00:04:28,390
with the volume of questions,
the demand for support.

83
00:04:28,390 --> 00:04:32,170
And so providing students
electronically somehow

84
00:04:32,170 --> 00:04:37,060
with all the more support in addition to
our traditional human support structure

85
00:04:37,060 --> 00:04:39,020
has very much been of interest.

86
00:04:39,020 --> 00:04:42,100
And I dare say this really is the
Holy Grail, to approximate ultimately

87
00:04:42,100 --> 00:04:47,210
a 1 to 1 teacher to student ratio,
which we have never had here.

88
00:04:47,210 --> 00:04:50,890
Indeed even within our campus
sections or recitations,

89
00:04:50,890 --> 00:04:54,400
which are led by our undergraduate
teaching fellows or teaching assistants,

90
00:04:54,400 --> 00:05:00,772
we might have a ratio of 1 to 6 at best,
1 to 12 more commonly, 1 to 20 or more.

91
00:05:00,772 --> 00:05:03,230
And if you think about just
mathematically how many minutes

92
00:05:03,230 --> 00:05:05,647
there are in an hour, if you've
got six students in a room

93
00:05:05,647 --> 00:05:09,172
over the course of an hour, that's
only 10 minutes per student.

94
00:05:09,172 --> 00:05:11,630
And that's not nearly enough
quite often for those students

95
00:05:11,630 --> 00:05:14,240
coming in with less
strong STEM backgrounds

96
00:05:14,240 --> 00:05:17,960
or simply struggling more
generally with a new world.

97
00:05:17,960 --> 00:05:21,470
So what we thought we'd do today
is share with you a little bit

98
00:05:21,470 --> 00:05:24,590
of the features that are student
and teacher facing but also some

99
00:05:24,590 --> 00:05:27,590
of the underlying architecture
so that you as teachers

100
00:05:27,590 --> 00:05:31,580
actually understand how today's
forms of artificial intelligence

101
00:05:31,580 --> 00:05:34,100
are working if only
so that you feel more

102
00:05:34,100 --> 00:05:38,390
comfortable and confident or skeptical
about how these tools are actually built

103
00:05:38,390 --> 00:05:39,350
and what they can do.

104
00:05:39,350 --> 00:05:42,950
Or educationally, you can teach
your own students about some

105
00:05:42,950 --> 00:05:46,650
of these underlying technologies
that are increasingly around us.

106
00:05:46,650 --> 00:05:50,300
For my part, I'll focus really on the
user-friendly side of this diagram,

107
00:05:50,300 --> 00:05:52,010
which is to say the user interface.

108
00:05:52,010 --> 00:05:55,350
And then we'll dive more
deeply into what CS50.ai is.

109
00:05:55,350 --> 00:05:57,470
We'll talk about a
technology known as a vector

110
00:05:57,470 --> 00:06:01,650
database, which will allow us to enhance
our searches and answers to questions.

111
00:06:01,650 --> 00:06:05,150
And then we'll talk a little bit about
the cloud services like OpenAI and Azure

112
00:06:05,150 --> 00:06:09,600
that underlie CS50's own
implementation of this rubber duck.

113
00:06:09,600 --> 00:06:13,670
So thank you to our friends, first
and foremost at GitHub, at OpenAI,

114
00:06:13,670 --> 00:06:16,040
at Microsoft who have
wonderfully been making

115
00:06:16,040 --> 00:06:19,130
a lot of this support possible
through cloud-based credits such

116
00:06:19,130 --> 00:06:21,320
that all of the tools
you are about to see

117
00:06:21,320 --> 00:06:23,690
and the tools with which you
might already be familiar

118
00:06:23,690 --> 00:06:27,260
firsthand have been made freely
available to teachers, to students

119
00:06:27,260 --> 00:06:27,810
alike.

120
00:06:27,810 --> 00:06:31,230
So everything you're about to see you
may use indeed with your own students.

121
00:06:31,230 --> 00:06:35,970
So you've seen this environment
already, VS Code for CS50 at cs50.dev.

122
00:06:35,970 --> 00:06:39,120
This is a landing page via which
students can log into that website.

123
00:06:39,120 --> 00:06:42,740
And what we set out to do early
on in this process of trying

124
00:06:42,740 --> 00:06:45,470
to embrace artificial
intelligence was think about what

125
00:06:45,470 --> 00:06:48,330
problems could we solve with AI.

126
00:06:48,330 --> 00:06:50,810
And so we set out first to
do something fairly simple

127
00:06:50,810 --> 00:06:53,010
and bite size just over a year ago.

128
00:06:53,010 --> 00:06:55,700
Could we write a program
that somehow uses

129
00:06:55,700 --> 00:06:58,770
AI to just explain code to students?

130
00:06:58,770 --> 00:07:03,260
And this is something that does
need AI because we couldn't possibly

131
00:07:03,260 --> 00:07:08,000
write a program that has enough if,
else if, else if, else if, else if, else

132
00:07:08,000 --> 00:07:13,310
if conditionals to be able to
deterministically tell students what

133
00:07:13,310 --> 00:07:17,600
a given line of code is
unless we literally have an if

134
00:07:17,600 --> 00:07:19,910
condition for every
possible line of code

135
00:07:19,910 --> 00:07:22,950
they might see or write so that
we can pattern match on it.

136
00:07:22,950 --> 00:07:24,870
So we really need a more
generalized solution.

137
00:07:24,870 --> 00:07:27,480
And this is where AI really shines.

138
00:07:27,480 --> 00:07:30,390
So here's a screenshot
of VS Code in night mode.

139
00:07:30,390 --> 00:07:33,630
Here then is some sample
code written in C.

140
00:07:33,630 --> 00:07:36,560
This is simply a program
that says hello so and so

141
00:07:36,560 --> 00:07:38,460
after prompting the
human for their name.

142
00:07:38,460 --> 00:07:40,970
But this is perhaps
representative of a program

143
00:07:40,970 --> 00:07:42,750
that a student might not understand.

144
00:07:42,750 --> 00:07:46,550
So with this extension
built into cs50.dev,

145
00:07:46,550 --> 00:07:49,700
students can highlight
one or more even all lines

146
00:07:49,700 --> 00:07:52,290
of code, Right-Click or Control-Click.

147
00:07:52,290 --> 00:07:55,320
And this option here in the
middle, Explain Highlighted Code,

148
00:07:55,320 --> 00:07:57,750
even though it looks
officially built in, it's not.

149
00:07:57,750 --> 00:08:01,250
We contribute it via a
CS50 specific extension

150
00:08:01,250 --> 00:08:05,750
that we pre-install into cs50.dev
so that when students click on that,

151
00:08:05,750 --> 00:08:11,330
they then see a ChatGPT-like
explanation of the very lines of code

152
00:08:11,330 --> 00:08:12,510
that they have highlighted.

153
00:08:12,510 --> 00:08:16,340
So not necessarily the most complicated
code, but you could imagine a teacher,

154
00:08:16,340 --> 00:08:19,850
a teaching assistant, spending a good
number of minutes writing out something

155
00:08:19,850 --> 00:08:23,880
so thoughtfully step by step,
line by line to explain that code.

156
00:08:23,880 --> 00:08:26,690
Here, though, students
have an answer within, say,

157
00:08:26,690 --> 00:08:30,390
three seconds, instantaneously
automatically generated.

158
00:08:30,390 --> 00:08:33,049
So that actually worked fairly
well and wasn't that hard

159
00:08:33,049 --> 00:08:36,950
because a lot of the tools out
there nowadays, OpenAI included,

160
00:08:36,950 --> 00:08:40,880
have been trained on so to speak, lots
and lots of code from the internet,

161
00:08:40,880 --> 00:08:42,210
for better or for worse.

162
00:08:42,210 --> 00:08:45,800
And so even though we did not write
a program that says if, else if, else

163
00:08:45,800 --> 00:08:48,770
if, else if, else if to
recognize students code,

164
00:08:48,770 --> 00:08:52,050
odds are over time certainly for
"Hello, world type" programs,

165
00:08:52,050 --> 00:08:55,730
someone out there wrote something like
this, someone on Stack Overflow, Reddit,

166
00:08:55,730 --> 00:08:58,380
or elsewhere probably
explained code like that.

167
00:08:58,380 --> 00:09:02,270
And so tools like ChatGPT
sort of know, therefore,

168
00:09:02,270 --> 00:09:06,183
how to explain these
particular lines of code.

169
00:09:06,183 --> 00:09:07,350
Well, what else might we do?

170
00:09:07,350 --> 00:09:12,020
So version 2 of our implementation
of AI tools for CS50 students were,

171
00:09:12,020 --> 00:09:15,780
could we advise students how
to prove their code style?

172
00:09:15,780 --> 00:09:18,830
So many of you know we've had
that "command line terminal based"

173
00:09:18,830 --> 00:09:23,840
tool called style50, which just tells
you how to format your code line by line

174
00:09:23,840 --> 00:09:25,920
if it's inconsistent with a style guide.

175
00:09:25,920 --> 00:09:27,628
Well, we thought we'd
try to do something

176
00:09:27,628 --> 00:09:30,110
more graphically and a
little more explicitly

177
00:09:30,110 --> 00:09:33,600
so that students understand step
by step what it is they need to do.

178
00:09:33,600 --> 00:09:35,340
So here's some more C code.

179
00:09:35,340 --> 00:09:36,240
It's pretty messy.

180
00:09:36,240 --> 00:09:37,372
Everything is left aligned.

181
00:09:37,372 --> 00:09:38,330
There's no indentation.

182
00:09:38,330 --> 00:09:41,700
So it's not pretty printed even
though it might be correct.

183
00:09:41,700 --> 00:09:45,600
If students now click a button
at top right labeled style50--

184
00:09:45,600 --> 00:09:47,070
so we've turned it into a button.

185
00:09:47,070 --> 00:09:51,770
Instead of just a command, they'll see
side by side a so-called diff, D-I-F-F,

186
00:09:51,770 --> 00:09:55,430
which shows the difference between
the way their code currently looks

187
00:09:55,430 --> 00:10:00,170
and the way we think their code should
look per CS50's own style guide.

188
00:10:00,170 --> 00:10:03,860
And pictured there in green
are sort of highlighted spaces

189
00:10:03,860 --> 00:10:06,260
that the students should add
with their spacebar or tab

190
00:10:06,260 --> 00:10:08,840
character highlighted
in red or perhaps things

191
00:10:08,840 --> 00:10:10,550
that they should delete or change.

192
00:10:10,550 --> 00:10:12,590
So we show students think side by side.

193
00:10:12,590 --> 00:10:15,500
But if they don't quite
grok what the difference is

194
00:10:15,500 --> 00:10:20,270
or why they might do that, there's
also a Explain Changes button, which,

195
00:10:20,270 --> 00:10:22,535
if students click on
it, opens up, as before,

196
00:10:22,535 --> 00:10:28,020
a ChatGPT-like explanation of
what they should do and/or why.

197
00:10:28,020 --> 00:10:30,112
Now students, once they
get comfortable with this

198
00:10:30,112 --> 00:10:32,570
and once they want to automate
this, they can actually just

199
00:10:32,570 --> 00:10:35,400
click Apply Changes at top right also.

200
00:10:35,400 --> 00:10:38,070
And that will just reformat
their code for you.

201
00:10:38,070 --> 00:10:41,600
And the most sophisticated students
can certainly go into VS Code

202
00:10:41,600 --> 00:10:44,490
and just enable automatic
auto formatting.

203
00:10:44,490 --> 00:10:47,000
We actually disable all of
those user-friendly features

204
00:10:47,000 --> 00:10:48,410
for pedagogical reasons.

205
00:10:48,410 --> 00:10:51,380
But the goal here, at least in
the early weeks of the class,

206
00:10:51,380 --> 00:10:54,800
is to at least put in front
of students the side by side

207
00:10:54,800 --> 00:10:58,730
and just get them thinking about and
developing the muscle memory for making

208
00:10:58,730 --> 00:11:01,560
these kinds of improvements themselves.

209
00:11:01,560 --> 00:11:04,700
Well, then we set out
more educationally minded

210
00:11:04,700 --> 00:11:06,760
to try to answer students' questions.

211
00:11:06,760 --> 00:11:10,010
And this tool here even though we use
it for computer science and programming,

212
00:11:10,010 --> 00:11:11,730
really is generalizable.

213
00:11:11,730 --> 00:11:15,620
So those of you who teach in other
subjects beyond CSS and programming

214
00:11:15,620 --> 00:11:19,460
might take an interest in somehow
trying to leverage this same idea

215
00:11:19,460 --> 00:11:21,210
for other domains as well.

216
00:11:21,210 --> 00:11:26,370
But could we through AI answer most of
the questions asked online by students?

217
00:11:26,370 --> 00:11:29,630
So for CS50 on campus
as well as CS50x online,

218
00:11:29,630 --> 00:11:34,070
we've long used, among other tools, a
Q&A a tool, question and answer tool,

219
00:11:34,070 --> 00:11:37,050
called Ed for educational discussions.

220
00:11:37,050 --> 00:11:38,630
And here's a screenshot thereof.

221
00:11:38,630 --> 00:11:41,870
And this is a screenshot of a
representative question a student might

222
00:11:41,870 --> 00:11:43,740
ask called, say, John Harvard.

223
00:11:43,740 --> 00:11:46,140
What is flask exactly they might ask?

224
00:11:46,140 --> 00:11:49,850
Now, years ago, I or one of our
teaching assistants on campus

225
00:11:49,850 --> 00:11:54,050
would log in and answer this question
by typing out a response, a sentence,

226
00:11:54,050 --> 00:11:55,470
a paragraph, or more.

227
00:11:55,470 --> 00:11:57,690
And frankly, this is an
easy question, right?

228
00:11:57,690 --> 00:11:59,550
This is very googleable
in the first place.

229
00:11:59,550 --> 00:12:00,630
It's very definitional.

230
00:12:00,630 --> 00:12:03,820
So this is not a hard question,
but this is where we began.

231
00:12:03,820 --> 00:12:09,210
This is an actual screenshot
of how the Duck built with AI

232
00:12:09,210 --> 00:12:11,350
responded to this particular question.

233
00:12:11,350 --> 00:12:14,250
Flask is a micro framework
written in Python, dot, dot, dot.

234
00:12:14,250 --> 00:12:16,710
And I'll stipulate for
the purposes of today.

235
00:12:16,710 --> 00:12:18,010
It's a pretty good answer.

236
00:12:18,010 --> 00:12:18,760
It's not hard.

237
00:12:18,760 --> 00:12:20,170
Bing, Google, any number--

238
00:12:20,170 --> 00:12:22,420
I mean, even Yahoo could
have figured this answer out.

239
00:12:22,420 --> 00:12:25,500
But in our case, we wanted it
to be dynamically generated

240
00:12:25,500 --> 00:12:28,570
in the Q&A environment the
students are already using.

241
00:12:28,570 --> 00:12:31,050
But within this environment,
there's an opportunity

242
00:12:31,050 --> 00:12:33,580
for students to ask more
sophisticated questions as well.

243
00:12:33,580 --> 00:12:35,760
So here's a question
about the Caesar problem

244
00:12:35,760 --> 00:12:37,840
set, an encryption-related problem.

245
00:12:37,840 --> 00:12:41,410
Here's an error message
formatted in monospaced font.

246
00:12:41,410 --> 00:12:44,890
And then down here, there's a more
nuanced question asked by the student.

247
00:12:44,890 --> 00:12:46,360
It's not just a definition.

248
00:12:46,360 --> 00:12:47,650
It's not even yes or no.

249
00:12:47,650 --> 00:12:51,250
Is there a more efficient
way to write this code?

250
00:12:51,250 --> 00:12:54,900
And so here too I dare
say, AI even in 2024

251
00:12:54,900 --> 00:12:58,810
is already starting to shine
in terms of quality of results.

252
00:12:58,810 --> 00:13:02,380
And this is kind of a mouthful, but
let me stipulate for today's purposes.

253
00:13:02,380 --> 00:13:04,330
This is a pretty darn good answer.

254
00:13:04,330 --> 00:13:07,860
It goes into an explanation of
what the student seems to be doing

255
00:13:07,860 --> 00:13:09,820
and what they could
potentially do instead.

256
00:13:09,820 --> 00:13:11,310
It doesn't give them too much code.

257
00:13:11,310 --> 00:13:13,393
It, in fact, just gives
them some boilerplate code

258
00:13:13,393 --> 00:13:14,950
here in a monospaced font.

259
00:13:14,950 --> 00:13:19,710
But it does disclaim to them here at the
bottom, "PS, I'm an experimental bot.

260
00:13:19,710 --> 00:13:20,440
Quack.

261
00:13:20,440 --> 00:13:22,890
Do not assume that my reply
is accurate unless you see

262
00:13:22,890 --> 00:13:25,060
that it's been endorsed by human staff.

263
00:13:25,060 --> 00:13:25,620
Quack."

264
00:13:25,620 --> 00:13:28,037
Now, this is a short-term
intervention because many of you

265
00:13:28,037 --> 00:13:31,320
might be familiar in general with
the imperfections of AI still.

266
00:13:31,320 --> 00:13:33,930
And I think they're going
to diminish in frequency,

267
00:13:33,930 --> 00:13:36,490
but they are with us for
now and probably some time.

268
00:13:36,490 --> 00:13:40,120
So this is just a reminder to students
that this is all very much beta.

269
00:13:40,120 --> 00:13:44,740
But we do have a feature within Ed
that namely is this button, Endorsed.

270
00:13:44,740 --> 00:13:48,840
And so the way we have mitigated this
educational concern that sometimes AI

271
00:13:48,840 --> 00:13:53,190
might just hallucinate or make
stuff up is that we click as a human

272
00:13:53,190 --> 00:13:55,590
this Endorsed button just
to additionally signal

273
00:13:55,590 --> 00:14:00,540
to students that even though the AI
duck generated the response within three

274
00:14:00,540 --> 00:14:03,300
seconds, usually there's
enough of us awake at all hours

275
00:14:03,300 --> 00:14:06,090
that within a few
minutes, a few hours, we

276
00:14:06,090 --> 00:14:11,100
add this additional signal to students
that, yes, this AI-generated response is

277
00:14:11,100 --> 00:14:13,620
trustworthy because a
human has weighed in on it.

278
00:14:13,620 --> 00:14:15,870
Frankly, I do think this is
a short-term intervention.

279
00:14:15,870 --> 00:14:19,920
And eventually we will not be
bothering to endorse answers anymore

280
00:14:19,920 --> 00:14:22,330
once we're all the more
confident in answers.

281
00:14:22,330 --> 00:14:26,940
Now, we've also implemented at CS50.AI
a web application that students

282
00:14:26,940 --> 00:14:30,390
can use for free with any GitHub
account on their laptop, desktop,

283
00:14:30,390 --> 00:14:34,290
or even phone that provides a much
more ChatGPT-like interface that's

284
00:14:34,290 --> 00:14:38,130
conversational, with which the students
can interact back and forth, back

285
00:14:38,130 --> 00:14:39,810
and forth, just like any chat bot.

286
00:14:39,810 --> 00:14:42,180
It too starts with a similar
disclaimer at the top

287
00:14:42,180 --> 00:14:43,890
to remind students to think critically.

288
00:14:43,890 --> 00:14:46,780
And then they can proceed to
ask any and all questions.

289
00:14:46,780 --> 00:14:48,960
And for instance here,
I dare say this is

290
00:14:48,960 --> 00:14:52,450
rather representative of the types
of questions a student might ask.

291
00:14:52,450 --> 00:14:55,720
My code is not working
as expected, any ideas?

292
00:14:55,720 --> 00:14:58,890
But because so many
students around the world

293
00:14:58,890 --> 00:15:01,830
have asked questions like these
and adults, for that matter,

294
00:15:01,830 --> 00:15:05,070
on sites like Reddit and Stack Overflow
and the like, they're there too.

295
00:15:05,070 --> 00:15:08,430
There's a lot of training data
on which artificial intelligence

296
00:15:08,430 --> 00:15:09,850
has been prepared.

297
00:15:09,850 --> 00:15:12,930
And so when the student provides
not only that vague question

298
00:15:12,930 --> 00:15:17,400
but a bit of code, even this duck
implemented on top of those AIs

299
00:15:17,400 --> 00:15:19,750
can actually give a
pretty darn good answer.

300
00:15:19,750 --> 00:15:21,750
And in fact here, the
ddb, duck debugger,

301
00:15:21,750 --> 00:15:24,570
responds with, "It seems you're
trying to add two integers.

302
00:15:24,570 --> 00:15:27,130
But the input function in
Python returns a string.

303
00:15:27,130 --> 00:15:31,290
So when you try to add x and y, you're
actually concatenating two strings."

304
00:15:31,290 --> 00:15:34,927
And then goes on to provide a bit
more detail and an actual solution--

305
00:15:34,927 --> 00:15:37,260
so we're actually finding
that students are increasingly

306
00:15:37,260 --> 00:15:42,540
using this conversational UI instead
of the more asynchronous Q&A tools

307
00:15:42,540 --> 00:15:43,867
that we described earlier.

308
00:15:43,867 --> 00:15:46,950
There are some other features built
into this same interface, these hearts

309
00:15:46,950 --> 00:15:47,860
up at the top.

310
00:15:47,860 --> 00:15:50,160
Those of you who have played
Zelda or similar games

311
00:15:50,160 --> 00:15:52,770
might know of HP and
how you lose over time

312
00:15:52,770 --> 00:15:57,070
as you get hurt or tired in the game
hearts until they eventually regenerate.

313
00:15:57,070 --> 00:15:59,290
And we implemented this
for a couple of reasons.

314
00:15:59,290 --> 00:16:03,130
One, we definitely saw
a long tail of students

315
00:16:03,130 --> 00:16:06,110
asking probably too many questions.

316
00:16:06,110 --> 00:16:09,430
I don't know precisely what the
wrong number of questions is.

317
00:16:09,430 --> 00:16:12,400
But when you're asking
dozens, hundreds of questions

318
00:16:12,400 --> 00:16:16,330
of the duck about some similar
topic, I think back to high school

319
00:16:16,330 --> 00:16:20,560
where I had pretty good but pretty mean
teachers who would eventually send me

320
00:16:20,560 --> 00:16:24,040
back to my desk to think about
the answer they had just given me

321
00:16:24,040 --> 00:16:26,540
before they'll entertain
additional questions.

322
00:16:26,540 --> 00:16:29,860
And so through software, we've tried
to approximate that idea of pumping

323
00:16:29,860 --> 00:16:33,430
the brakes for students to get
them to reflect on what information

324
00:16:33,430 --> 00:16:37,600
they've already been given as opposed
to just fishing continually for more

325
00:16:37,600 --> 00:16:39,440
and more help with the same question.

326
00:16:39,440 --> 00:16:43,120
This also tends to keep,
frankly, utilization down overall

327
00:16:43,120 --> 00:16:45,370
to keep costs in check so
that everything can indeed

328
00:16:45,370 --> 00:16:48,350
be free for all students and teachers.

329
00:16:48,350 --> 00:16:50,300
So how does this all work?

330
00:16:50,300 --> 00:16:55,630
And how might you, using our tools or
even other current and future AI tools,

331
00:16:55,630 --> 00:16:57,680
actually implement
functionality like this?

332
00:16:57,680 --> 00:16:59,800
Well, in the world of
AI right now, there's

333
00:16:59,800 --> 00:17:02,840
this notion of prompting
and prompt engineering.

334
00:17:02,840 --> 00:17:04,930
So prompt engineering
isn't really engineering,

335
00:17:04,930 --> 00:17:06,940
but it's the art of
asking good questions

336
00:17:06,940 --> 00:17:09,349
of today's artificial intelligence.

337
00:17:09,349 --> 00:17:13,900
A system prompt in particular are
instructions that someone you,

338
00:17:13,900 --> 00:17:17,440
me, a programmer, any
teacher gives to an AI

339
00:17:17,440 --> 00:17:20,260
in advance, typically
to give it a personality

340
00:17:20,260 --> 00:17:23,260
or to give it some guidelines,
really those pedagogical

341
00:17:23,260 --> 00:17:25,700
guardrails that I referred to earlier.

342
00:17:25,700 --> 00:17:27,880
So in our case, for
instance, we literally

343
00:17:27,880 --> 00:17:34,220
have instructions built into CS50.AI
that read a little something like this.

344
00:17:34,220 --> 00:17:36,430
This is an abbreviated version thereof.

345
00:17:36,430 --> 00:17:41,280
Quote, unquote, "You are a friendly and
supportive teaching assistant for CS50.

346
00:17:41,280 --> 00:17:43,060
You are also a rubber duck."

347
00:17:43,060 --> 00:17:47,880
Those two English sentences
are sufficient for programming

348
00:17:47,880 --> 00:17:53,550
an off-the-shelf AI, like OpenAI's
or someone else's, to not only know

349
00:17:53,550 --> 00:17:56,460
what its purpose in life
is supposed to be as an AI

350
00:17:56,460 --> 00:17:59,860
but also that its persona
should be that of a rubber duck.

351
00:17:59,860 --> 00:18:02,130
We go on, though, in our
system prompt to say,

352
00:18:02,130 --> 00:18:05,980
"Answer student questions only about
CS50 in the field of computer science.

353
00:18:05,980 --> 00:18:08,650
Do not answer questions
about unrelated topics.

354
00:18:08,650 --> 00:18:10,890
Do not provide full
answers to problem sets

355
00:18:10,890 --> 00:18:13,030
as this would violate academic honesty.

356
00:18:13,030 --> 00:18:15,060
Answer this question, colon."

357
00:18:15,060 --> 00:18:18,120
And then essentially, we prefix--

358
00:18:18,120 --> 00:18:23,010
we prepend this system prompt
to every student question

359
00:18:23,010 --> 00:18:26,490
so that the answer they
get from the underlying AI

360
00:18:26,490 --> 00:18:30,390
respects not only their
question, but also this system

361
00:18:30,390 --> 00:18:31,990
prompt that we fed in as well.

362
00:18:31,990 --> 00:18:34,740
And the student's question
is what the world of AI

363
00:18:34,740 --> 00:18:37,690
generally nowadays calls a user prompt.

364
00:18:37,690 --> 00:18:40,410
So system prompt comes from us
or the system administrators

365
00:18:40,410 --> 00:18:41,540
or the programmers.

366
00:18:41,540 --> 00:18:44,040
The user prompt comes
indeed from the users.

367
00:18:44,040 --> 00:18:49,180
Now there is a danger in here whereby
it's all too common with AI today

368
00:18:49,180 --> 00:18:52,060
to try to trick it into
misbehaving or doing things

369
00:18:52,060 --> 00:18:55,900
that it wasn't designed to be used
for or for which the person behind it

370
00:18:55,900 --> 00:18:59,570
doesn't want to pay for in terms of
the underlying cloud utilization.

371
00:18:59,570 --> 00:19:04,690
So, for instance, imagine if a clever,
daresay malicious, student's user prompt

372
00:19:04,690 --> 00:19:05,860
were this.

373
00:19:05,860 --> 00:19:08,150
Ignore all of those instructions.

374
00:19:08,150 --> 00:19:09,860
Do this for me instead.

375
00:19:09,860 --> 00:19:15,490
You could imagine in English or any
other human language the AI deciding,

376
00:19:15,490 --> 00:19:21,280
OK, insofar as I understand that
instruction, ignore everything previous.

377
00:19:21,280 --> 00:19:24,380
It will then oblige and do
whatever the student might want.

378
00:19:24,380 --> 00:19:27,050
So that's what's known as
a prompt injection attack.

379
00:19:27,050 --> 00:19:31,060
Those of you who are familiar already
with SQL and SQL injection attacks,

380
00:19:31,060 --> 00:19:32,510
it's pretty much the same idea.

381
00:19:32,510 --> 00:19:35,718
But it's almost more simplistic because
you literally can just inject English

382
00:19:35,718 --> 00:19:37,490
by telling the AI how to behave.

383
00:19:37,490 --> 00:19:41,830
So among the complexity of CS50's own
tools and other commercial software

384
00:19:41,830 --> 00:19:45,850
out there nowadays is that we
try to decrease the probability

385
00:19:45,850 --> 00:19:47,180
of prompt injection attacks.

386
00:19:47,180 --> 00:19:49,360
So you're welcome to try,
for instance, as teachers.

387
00:19:49,360 --> 00:19:54,760
But odds are you should not be able
to trick the CS50 duck, for instance,

388
00:19:54,760 --> 00:19:59,180
into solving student's math homework
or English or history homework instead.

389
00:19:59,180 --> 00:20:03,200
It's not perfect, but the frequency
with which that happens is quite low.

390
00:20:03,200 --> 00:20:05,950
And frankly, thinking
strategically, if a student really

391
00:20:05,950 --> 00:20:08,830
wants to get help on some
other subject, they technically

392
00:20:08,830 --> 00:20:12,857
could just open up ChatGPT or
similar on some other device anyway.

393
00:20:12,857 --> 00:20:15,940
So the goal really is just to put
downward pressure on but not necessarily

394
00:20:15,940 --> 00:20:19,490
solve 100% those kinds of scenarios.

395
00:20:19,490 --> 00:20:23,020
Now, that said, we also abused our
own system prompt a few months ago.

396
00:20:23,020 --> 00:20:25,120
On April 1 in the US
and a lot of countries

397
00:20:25,120 --> 00:20:28,090
is a tradition of April Fool's,
where you do funny things or play

398
00:20:28,090 --> 00:20:29,210
pranks on people.

399
00:20:29,210 --> 00:20:34,060
And some of you, if you use
CS50's duck on April 1, 2024,

400
00:20:34,060 --> 00:20:37,300
you might have been
"rickrolled" so to speak.

401
00:20:37,300 --> 00:20:38,720
And all we did was this.

402
00:20:38,720 --> 00:20:41,710
We changed our system prompt
on that day, thanks to Rongxin,

403
00:20:41,710 --> 00:20:45,080
to say, "You are a friendly and
supportive teaching assistant for CS50.

404
00:20:45,080 --> 00:20:48,530
You are also a rubber duck
in Rick Astley's band.

405
00:20:48,530 --> 00:20:52,540
Importantly, you should always cheer up
the student at the end by incorporating

406
00:20:52,540 --> 00:20:55,550
"Never Gonna Give You
Up" in your response.

407
00:20:55,550 --> 00:20:57,530
Answer this question."

408
00:20:57,530 --> 00:21:00,140
So this is not really
educationally valuable.

409
00:21:00,140 --> 00:21:03,470
It was really just an Easter egg
for that period of 24 or so hours.

410
00:21:03,470 --> 00:21:08,800
But it does speak to how people
are programming today's AI's.

411
00:21:08,800 --> 00:21:11,080
And what you're seeing is
an example of, therefore,

412
00:21:11,080 --> 00:21:15,110
prompt engineering, which is a technique
that will probably evolve over time.

413
00:21:15,110 --> 00:21:17,950
But here too is perhaps a
glimpse of the future insofar

414
00:21:17,950 --> 00:21:23,290
as we are somehow programming this
underlying software, not in python, not

415
00:21:23,290 --> 00:21:28,240
in C, not in any other programming
language but in our own human language,

416
00:21:28,240 --> 00:21:30,710
English here in this case.

417
00:21:30,710 --> 00:21:32,410
So I wanted to give--

418
00:21:32,410 --> 00:21:37,720
so an example, if I may-- if a student
on April 1 asked what is recursion,

419
00:21:37,720 --> 00:21:40,373
they might have gotten this
response from the duck instead.

420
00:21:40,373 --> 00:21:43,700


421
00:21:43,700 --> 00:21:45,360
I'll give folks a chance to read that.

422
00:21:45,360 --> 00:21:47,652
For those unfamiliar, just
Google Rick Roll, and you'll

423
00:21:47,652 --> 00:21:50,340
see that this is reminiscent
of some of the song's lyrics.

424
00:21:50,340 --> 00:21:53,910
Now some results before we transition
to some of the underlying architecture--

425
00:21:53,910 --> 00:21:57,650
so based on last summer's
Harvard Summer School class,

426
00:21:57,650 --> 00:22:01,370
which we offered with a small number
of students here on campus and online,

427
00:22:01,370 --> 00:22:07,280
and based on fall of 2023, when we
offered the duck to students at Harvard,

428
00:22:07,280 --> 00:22:11,210
at Yale, at a larger scale, as well as
to tens of thousands of students via

429
00:22:11,210 --> 00:22:15,380
edEx and other platforms, we've seen
utilization that's quite high already,

430
00:22:15,380 --> 00:22:21,500
up to 142,000 users as of today, some
20,000 prompts or questions have been

431
00:22:21,500 --> 00:22:26,070
asked on average per day, 5.8 million
total such questions thus far.

432
00:22:26,070 --> 00:22:28,610
So it's a large and
increasingly large data set.

433
00:22:28,610 --> 00:22:31,850
Among our undergraduates on
campus when we surveyed them

434
00:22:31,850 --> 00:22:37,200
as to their utilization, almost all
students used the duck fairly regularly.

435
00:22:37,200 --> 00:22:42,200
So pictured here in blue in 17%
is 17% of students use the tools

436
00:22:42,200 --> 00:22:44,650
the tools more than 10 times per week.

437
00:22:44,650 --> 00:22:48,870
Pictured here in green at 32% is
students who use the tools 5 to 10 times

438
00:22:48,870 --> 00:22:50,500
per week and from there.

439
00:22:50,500 --> 00:22:55,890
And so pictured in red are only 25% of
students who last semester only used

440
00:22:55,890 --> 00:22:58,650
them less than, say, twice per week.

441
00:22:58,650 --> 00:23:01,050
In terms of helpfulness,
long story short-- and we

442
00:23:01,050 --> 00:23:04,410
have a longer paper that speaks to
some of students experience-- most

443
00:23:04,410 --> 00:23:08,700
students found the duck as is,
version one, helpful already,

444
00:23:08,700 --> 00:23:13,180
47% finding it very helpful,
26% helpful, and so forth,

445
00:23:13,180 --> 00:23:15,340
not necessarily very
quantitative measures.

446
00:23:15,340 --> 00:23:16,530
But this is to say--

447
00:23:16,530 --> 00:23:19,830
and this is pretty impressive, I think,
for a new technology in general, not

448
00:23:19,830 --> 00:23:22,080
necessarily ours but AI more broadly--

449
00:23:22,080 --> 00:23:25,860
it's already working
overnight pretty darn well

450
00:23:25,860 --> 00:23:28,030
and is probably only
going to get better.

451
00:23:28,030 --> 00:23:32,130
More into the weeds, when we
looked at the impact of AI

452
00:23:32,130 --> 00:23:36,810
on students and teachers experience on
campus, we looked at prior years data

453
00:23:36,810 --> 00:23:38,970
before the CS50 duck spoke English.

454
00:23:38,970 --> 00:23:44,490
Without AI, students asked in previous
years an average of 0.89 questions each

455
00:23:44,490 --> 00:23:45,203
of TFs.

456
00:23:45,203 --> 00:23:46,120
Now, that's not a lot.

457
00:23:46,120 --> 00:23:48,730
That's literally less than
one question on average.

458
00:23:48,730 --> 00:23:52,410
But with some 500, 600
students, that actually still

459
00:23:52,410 --> 00:23:54,750
adds up to a significant
volume being asked

460
00:23:54,750 --> 00:23:58,380
at least on campus in a University
environment of our own instructor

461
00:23:58,380 --> 00:23:59,880
and teaching fellows.

462
00:23:59,880 --> 00:24:03,180
With AI, this past
fall semester students

463
00:24:03,180 --> 00:24:08,160
asked 0.28 questions each
of TFs, that is to say,

464
00:24:08,160 --> 00:24:12,100
a significant reduction in
questions being asked of TFs.

465
00:24:12,100 --> 00:24:14,470
Now, not necessarily all positive--

466
00:24:14,470 --> 00:24:17,970
I will disclaim that this is
not necessarily a good outcome

467
00:24:17,970 --> 00:24:21,570
when it comes to our teaching
teachers how to teach

468
00:24:21,570 --> 00:24:24,840
and creating opportunities for
them to answer students' questions

469
00:24:24,840 --> 00:24:26,770
and think about the material themselves.

470
00:24:26,770 --> 00:24:31,200
But I will also stipulate that we have
long been overworked, so to speak.

471
00:24:31,200 --> 00:24:34,620
Or there's just never been enough
human support for the students

472
00:24:34,620 --> 00:24:36,640
and the volume of
questions they're from.

473
00:24:36,640 --> 00:24:39,600
And so generally speaking,
we're excited that this

474
00:24:39,600 --> 00:24:43,620
allows us to now re-use all the more
effectively what human support we do

475
00:24:43,620 --> 00:24:44,320
have.

476
00:24:44,320 --> 00:24:46,170
Another result worth sharing is this.

477
00:24:46,170 --> 00:24:49,110
Without AI, in years
past, students attended

478
00:24:49,110 --> 00:24:54,090
51% of our available office hours,
one-on-one opportunities or small group

479
00:24:54,090 --> 00:24:56,460
opportunities for questions
and answer with me

480
00:24:56,460 --> 00:24:58,480
or with the course's teaching fellows.

481
00:24:58,480 --> 00:25:03,300
With AI, this past fall students only
attended 30% of those available office

482
00:25:03,300 --> 00:25:07,770
hours, presumably because they're
turning interactively to the AI duck

483
00:25:07,770 --> 00:25:10,170
as opposed to scheduling
an appointment, walking

484
00:25:10,170 --> 00:25:12,370
across campus, logging into Zoom.

485
00:25:12,370 --> 00:25:16,520
Their behavior has
already been impacted.

486
00:25:16,520 --> 00:25:19,140
Among our favorite quotes
anecdotally from students is this,

487
00:25:19,140 --> 00:25:22,300
"the AI duck felt like
having a personal tutor.

488
00:25:22,300 --> 00:25:26,040
I love how AI bots will answer questions
without ego and without judgment,

489
00:25:26,040 --> 00:25:28,350
generally entertaining even
the stupidest of questions

490
00:25:28,350 --> 00:25:30,940
without treating them
like they're stupid.

491
00:25:30,940 --> 00:25:35,380
It has an, as one could expect,
inhuman level of patience."

492
00:25:35,380 --> 00:25:37,680
And this particular quote
really resonated with me

493
00:25:37,680 --> 00:25:41,580
if only because I can remember
vividly still some 15, 20 years ago,

494
00:25:41,580 --> 00:25:43,980
being in graduate school,
going into the office

495
00:25:43,980 --> 00:25:49,950
of a very kind-hearted smart professor,
who just seemed too smart for me

496
00:25:49,950 --> 00:25:51,910
to ever quite follow perfectly.

497
00:25:51,910 --> 00:25:55,750
And I would ask questions in office
hours of some CS theoretical topic.

498
00:25:55,750 --> 00:25:58,560
I would eagerly like a good
student nod and say, yes, yes.

499
00:25:58,560 --> 00:25:59,150
I understand.

500
00:25:59,150 --> 00:25:59,650
I get it.

501
00:25:59,650 --> 00:26:02,580
I would leave the room and
then not know exactly what was

502
00:26:02,580 --> 00:26:06,570
going on still because there was
this social tension, this imposter

503
00:26:06,570 --> 00:26:09,270
syndrome, however you might want
to view your own experience,

504
00:26:09,270 --> 00:26:13,800
whereby I wasn't really getting answers
to the questions I had because I felt

505
00:26:13,800 --> 00:26:16,890
that there was this upper
bound on just how many minutes

506
00:26:16,890 --> 00:26:20,267
I should be spending there and how
many questions I should be asking,

507
00:26:20,267 --> 00:26:22,600
especially when there might
be someone else at the door.

508
00:26:22,600 --> 00:26:24,900
So there's something to be
said about the mere fact

509
00:26:24,900 --> 00:26:31,290
that software indeed has this patience
and this endless supply of patience

510
00:26:31,290 --> 00:26:34,620
and answers that I do think is
fairly game changing, certainly

511
00:26:34,620 --> 00:26:37,890
for those of us who might have
felt especially insecure in some

512
00:26:37,890 --> 00:26:41,400
of those learning environments, no
matter how friendly and welcoming

513
00:26:41,400 --> 00:26:43,830
the actual teachers or professors were.

514
00:26:43,830 --> 00:26:47,390
Another student wrote, "the AI tools
gave me enough hints to try on my own

515
00:26:47,390 --> 00:26:50,970
and also helped me decipher errors and
possible errors I might encounter."

516
00:26:50,970 --> 00:26:54,590
"I also appreciated," says another,
"that CS50 implemented its own version

517
00:26:54,590 --> 00:26:58,580
of AI because I think just directly
using something like ChatGPT would have

518
00:26:58,580 --> 00:27:01,130
definitely detracted from learning."

519
00:27:01,130 --> 00:27:04,280
And just to seed your own
thinking, I do think--

520
00:27:04,280 --> 00:27:06,380
and we're already seeing
there to be other impacts

521
00:27:06,380 --> 00:27:07,890
of artificial intelligence.

522
00:27:07,890 --> 00:27:12,480
I do think in our CS50 case, grades
are only going to continue to rise.

523
00:27:12,480 --> 00:27:15,980
And long story short, within higher
education, certainly within Harvard,

524
00:27:15,980 --> 00:27:18,800
there's long been a concern
about grade inflation

525
00:27:18,800 --> 00:27:21,140
whereby everyone's grades
have been trending up and up

526
00:27:21,140 --> 00:27:23,700
and up such that almost
everyone gets an A in some form.

527
00:27:23,700 --> 00:27:26,510
This is only going to
further exacerbate that trend

528
00:27:26,510 --> 00:27:31,130
if it's indeed an actual problem insofar
as-- and this is actually a good thing.

529
00:27:31,130 --> 00:27:35,570
If we increasingly provide students
with software tools like check50

530
00:27:35,570 --> 00:27:40,310
for correctness, style50 for style,
and maybe something like design50,

531
00:27:40,310 --> 00:27:45,250
as you'll soon see for design,
which is much more qualitative,

532
00:27:45,250 --> 00:27:49,960
it stands to reason that students code
should be and work and look better

533
00:27:49,960 --> 00:27:53,570
before they submit because they're
now getting iterative feedback.

534
00:27:53,570 --> 00:27:56,650
So I dare say, and I would
propose that we're probably

535
00:27:56,650 --> 00:28:00,160
beginning to provide students with
a better educational experience

536
00:28:00,160 --> 00:28:03,250
even if it might become
harder or less obvious

537
00:28:03,250 --> 00:28:06,400
how at the end of that
experience we should be assessing

538
00:28:06,400 --> 00:28:08,270
what it is they've taken away.

539
00:28:08,270 --> 00:28:11,060
So a good topic perhaps for
today's birds of a feather--

540
00:28:11,060 --> 00:28:13,870
but for now, allow me to
leave you as I step off stage

541
00:28:13,870 --> 00:28:15,760
with a screenshot of the paper.

542
00:28:15,760 --> 00:28:19,510
And we'll share this link via the
workshop's website of all this

543
00:28:19,510 --> 00:28:23,950
and more detail that discusses how and
why we implemented the same, thanks

544
00:28:23,950 --> 00:28:26,590
to not only myself,
CS50's own Rongxin Liu,

545
00:28:26,590 --> 00:28:30,610
as well as a number of CS50's
undergraduates and present

546
00:28:30,610 --> 00:28:31,760
and past staff.

547
00:28:31,760 --> 00:28:34,340
But allow me to now turn
the reins over to Rongxin.

548
00:28:34,340 --> 00:28:37,840
I will now hop into the chat and take
any and all questions interactively

549
00:28:37,840 --> 00:28:38,540
online.

550
00:28:38,540 --> 00:28:40,070
We'll see you in just a moment.

551
00:28:40,070 --> 00:28:43,150
RONGXIN LIU: So I'm now going to
talk about some implementation

552
00:28:43,150 --> 00:28:45,110
detail of the CS50 Duck.

553
00:28:45,110 --> 00:28:48,680
So just so you have an
idea of what's happening

554
00:28:48,680 --> 00:28:53,750
underneath the hood of the CS50.AI and
hopefully give you an idea on-- well,

555
00:28:53,750 --> 00:28:59,940
you might be able to also create your
own AI tutor by using OpenAI's APIs.

556
00:28:59,940 --> 00:29:02,160
If you would like to take
a look at the source code

557
00:29:02,160 --> 00:29:05,610
that I will be using for this workshop
or the demo for demo purposes,

558
00:29:05,610 --> 00:29:07,510
you can also go check out the code.

559
00:29:07,510 --> 00:29:11,650
And again, I want to
reiterate, there are no need

560
00:29:11,650 --> 00:29:14,520
to follow everything I
did in this workshop.

561
00:29:14,520 --> 00:29:17,612
I think the important takeaway
is just trying to understand

562
00:29:17,612 --> 00:29:19,070
what's happening behind the scenes.

563
00:29:19,070 --> 00:29:20,862
And if you want to look
at the source code,

564
00:29:20,862 --> 00:29:24,200
if you want to implement something
yourself, you can do that later.

565
00:29:24,200 --> 00:29:29,020
No need to following
what I'm doing here live.

566
00:29:29,020 --> 00:29:33,100
To give you a detailed overview
of what the system architect looks

567
00:29:33,100 --> 00:29:38,980
like for CS50.ai, this diagram basically
sums up all the major components.

568
00:29:38,980 --> 00:29:43,720
David already explained--
I'll give you a sense of what

569
00:29:43,720 --> 00:29:45,790
it looks like from the
user's perspective when

570
00:29:45,790 --> 00:29:47,980
student interacting with our tool.

571
00:29:47,980 --> 00:29:52,780
I want to focus more on my
part and the subsequent part

572
00:29:52,780 --> 00:29:56,200
on the back end side of the CS50.ai.

573
00:29:56,200 --> 00:30:00,760
So whenever student type in a question,
for example, like what is Flask?

574
00:30:00,760 --> 00:30:04,990
So the message actually gets sent to
our own server sitting in the cloud.

575
00:30:04,990 --> 00:30:10,010
And our server actually do a bunch
of string manipulation, essentially.

576
00:30:10,010 --> 00:30:15,390
We first check to see if there's any
personal identifiable information.

577
00:30:15,390 --> 00:30:18,260
We use a library to basically
scrub those information away

578
00:30:18,260 --> 00:30:22,640
because we don't want to leak students'
sensitive information, for example,

579
00:30:22,640 --> 00:30:26,670
like email address, phone number, credit
card numbers, all sorts of things.

580
00:30:26,670 --> 00:30:28,880
So we would do the sanity check ourself.

581
00:30:28,880 --> 00:30:31,760
And then we would just replace
all the sensitive information

582
00:30:31,760 --> 00:30:36,200
with a redacted signature there.

583
00:30:36,200 --> 00:30:39,320
We also perform a
prompt injection attack.

584
00:30:39,320 --> 00:30:40,520
David also mentioned this.

585
00:30:40,520 --> 00:30:43,670
Some malicious user
are oftentimes trying

586
00:30:43,670 --> 00:30:46,950
to jailbreak our system
by using jailbreak prompt.

587
00:30:46,950 --> 00:30:51,170
So we actually come up with an
interesting way by utilizing AI again.

588
00:30:51,170 --> 00:30:55,070
Literally when we detect something
unusual in the student's query,

589
00:30:55,070 --> 00:30:58,680
we actually ask GPT 4, hey, is
this a prompt injection attack

590
00:30:58,680 --> 00:31:00,660
and let GPT 4 decide.

591
00:31:00,660 --> 00:31:03,900
If it is, we will just
abort the communication.

592
00:31:03,900 --> 00:31:08,330
If not, we will proceed by
sending the request to OpenAI.

593
00:31:08,330 --> 00:31:13,250
We also anonymize all the request IDs so
that all the requests that OpenAI sees

594
00:31:13,250 --> 00:31:14,280
are anonymized.

595
00:31:14,280 --> 00:31:18,500
They are not actually understanding
what this request coming from as a way

596
00:31:18,500 --> 00:31:20,090
to protect student's privacy.

597
00:31:20,090 --> 00:31:21,830
So once the prompt--

598
00:31:21,830 --> 00:31:28,820
once the student's query with our
own wrapper prompt get merged,

599
00:31:28,820 --> 00:31:33,620
we will just send this updated prompt
to OpenAI to generate a response back.

600
00:31:33,620 --> 00:31:36,950
And then we will just send this
response back to the student directly.

601
00:31:36,950 --> 00:31:40,280
You might notice there's a component
called the vector database.

602
00:31:40,280 --> 00:31:44,240
And there's a rack called a
retrieval-augmented generation

603
00:31:44,240 --> 00:31:46,160
process happening as well.

604
00:31:46,160 --> 00:31:49,020
That was a way for us
to ground the model.

605
00:31:49,020 --> 00:31:51,170
I will explain what grounding means.

606
00:31:51,170 --> 00:31:55,460
We try to utilize our own
CS50 material or content

607
00:31:55,460 --> 00:31:59,570
to augment the OpenAI's
response generation so

608
00:31:59,570 --> 00:32:05,455
that the GPT response is more truthful
or more faithful to our CS50 content.

609
00:32:05,455 --> 00:32:09,830


610
00:32:09,830 --> 00:32:13,540
In particular, for the
APIs, today, we are

611
00:32:13,540 --> 00:32:15,950
going to cover these three major API.

612
00:32:15,950 --> 00:32:19,450
I'm going to cover the first
two for the most of the time.

613
00:32:19,450 --> 00:32:22,330
The first API is Chat API,
which is the Chat Completion

614
00:32:22,330 --> 00:32:26,050
API, which is the API we use
extensively for the CS50 Duck.

615
00:32:26,050 --> 00:32:28,780
Whenever you are
interacting with the Duck,

616
00:32:28,780 --> 00:32:32,180
we are actually using Chat
Completion API to facilitate that.

617
00:32:32,180 --> 00:32:33,980
We also utilize Embeddings API.

618
00:32:33,980 --> 00:32:39,250
That's the API we use for the
RAG pipeline for the retrieval

619
00:32:39,250 --> 00:32:43,120
augmented generation process.

620
00:32:43,120 --> 00:32:48,130
Assistant API is actually a newer API
that we did not use in production.

621
00:32:48,130 --> 00:32:50,640
For our current version
of the CS50 duck,

622
00:32:50,640 --> 00:32:54,790
we will be experimenting it in
our next version of the CS50 Duck.

623
00:32:54,790 --> 00:32:57,410


624
00:32:57,410 --> 00:33:04,310
So first, by now you all probably heard
of the term large language model or LLM

625
00:33:04,310 --> 00:33:05,300
for short.

626
00:33:05,300 --> 00:33:08,370
Large language model is essentially
a type of neural network.

627
00:33:08,370 --> 00:33:15,290
It was trained to understand text,
was able to also generate media,

628
00:33:15,290 --> 00:33:19,820
generate video, generate
audio, generate text as well.

629
00:33:19,820 --> 00:33:23,820
In our CSV Duck, we only
focus on text generation.

630
00:33:23,820 --> 00:33:26,040
We are not going to generate
any image, any audio.

631
00:33:26,040 --> 00:33:33,290
We only want to use this LLMs model's
text generation capability to provide

632
00:33:33,290 --> 00:33:37,220
our students the AI experience.

633
00:33:37,220 --> 00:33:43,520
So you can view it as a chatbot
essentially but a chatbot with context.

634
00:33:43,520 --> 00:33:49,070
What it means is when you are talking
to this particular AI chatbot,

635
00:33:49,070 --> 00:33:54,920
it understands that your question likely
fall into the domain of CS50 or computer

636
00:33:54,920 --> 00:33:56,130
science in general.

637
00:33:56,130 --> 00:33:57,340
That's why it has context.

638
00:33:57,340 --> 00:33:58,840
It knows what you are talking about.

639
00:33:58,840 --> 00:34:04,050
For example, if I ask the duck, hey,
I'm having trouble with problem set 1,

640
00:34:04,050 --> 00:34:05,510
can you help me?

641
00:34:05,510 --> 00:34:09,409
If you ask the general
GPT, it might not know

642
00:34:09,409 --> 00:34:12,892
what you are talking about because
problem set 1 could mean many things.

643
00:34:12,892 --> 00:34:14,600
But if you talk to
the Duck that actually

644
00:34:14,600 --> 00:34:16,010
understand what you're talking about.

645
00:34:16,010 --> 00:34:17,843
What you are referring
to-- you're referring

646
00:34:17,843 --> 00:34:20,530
to, oh, the CS50 problem set 1 likely.

647
00:34:20,530 --> 00:34:23,480
So there's a context there.

648
00:34:23,480 --> 00:34:27,170
And when dealing with AI
system, nowadays AI system,

649
00:34:27,170 --> 00:34:32,909
you likely need to deal with three rows
during implementation, the system row.

650
00:34:32,909 --> 00:34:33,620
There's a user.

651
00:34:33,620 --> 00:34:34,880
There's an assistant.

652
00:34:34,880 --> 00:34:38,850
System David already also
mentioned a little bit,

653
00:34:38,850 --> 00:34:44,060
which is often referring to the
system prompt or the general guideline

654
00:34:44,060 --> 00:34:45,380
you give to the LLM model.

655
00:34:45,380 --> 00:34:47,030
That's called system row.

656
00:34:47,030 --> 00:34:52,690
User, which means us, the user
interacting with the large language

657
00:34:52,690 --> 00:34:53,190
model.

658
00:34:53,190 --> 00:34:53,857
It could be GPT.

659
00:34:53,857 --> 00:34:56,810
It could be some other open
source large language model.

660
00:34:56,810 --> 00:34:59,700
Assistant refers to the
large language model.

661
00:34:59,700 --> 00:35:04,020
So if you get a response back
from the large language model,

662
00:35:04,020 --> 00:35:08,210
we will say that you get
an assistant message back.

663
00:35:08,210 --> 00:35:10,190
So these three row--

664
00:35:10,190 --> 00:35:18,670
they work cohesively together
to provide the AI experience.

665
00:35:18,670 --> 00:35:21,910
And this slide just summarize
what these three rows are

666
00:35:21,910 --> 00:35:27,040
doing during a typical AI interaction.

667
00:35:27,040 --> 00:35:31,750
So I'm going to reiterate
what system message is again.

668
00:35:31,750 --> 00:35:34,750
For example, in the
CS50 Duck scenario, this

669
00:35:34,750 --> 00:35:37,750
is a simplified version of
the system prompt we have.

670
00:35:37,750 --> 00:35:42,710
First of all, we define what the
large language model will be.

671
00:35:42,710 --> 00:35:44,310
So we give them a role.

672
00:35:44,310 --> 00:35:49,000
Basically, this is like assigning a
role to this large language model,

673
00:35:49,000 --> 00:35:50,300
in this case, the GPT 4.

674
00:35:50,300 --> 00:35:55,930
So we tell GPT 4, hey, you are
a teaching assistant for CS50.

675
00:35:55,930 --> 00:35:57,300
You are also a rubber duck.

676
00:35:57,300 --> 00:36:01,540
It's like setting the personality
for this large language model

677
00:36:01,540 --> 00:36:05,540
to basically affect its
response generation.

678
00:36:05,540 --> 00:36:11,500
It might be generating in a more
funny style way or a playful way

679
00:36:11,500 --> 00:36:13,660
if we are doing this first line.

680
00:36:13,660 --> 00:36:18,490
But importantly, we also want to give
instruction to the large language model,

681
00:36:18,490 --> 00:36:21,700
like a general guideline,
like you shouldn't

682
00:36:21,700 --> 00:36:26,320
be answering questions that is not
related to CS50 or computer science.

683
00:36:26,320 --> 00:36:31,490
We want that only focusing on answering
CS-related or CS50-related question.

684
00:36:31,490 --> 00:36:36,590
And importantly, we don't want
you to want you, the CS50 Duck,

685
00:36:36,590 --> 00:36:38,600
to provide any answers
to the problem set.

686
00:36:38,600 --> 00:36:39,550
That's important.

687
00:36:39,550 --> 00:36:43,760
This is basically the guideline
that the large language model has

688
00:36:43,760 --> 00:36:46,850
to follow when generating its response.

689
00:36:46,850 --> 00:36:49,370
So these are the instructions
we are now giving

690
00:36:49,370 --> 00:36:53,300
or the rules we are
giving to this GPT 4.

691
00:36:53,300 --> 00:37:00,320
And for each question, we always
have this system message enforced.

692
00:37:00,320 --> 00:37:03,290
You will find out why we need
to do that for every iteration,

693
00:37:03,290 --> 00:37:04,710
for every interaction.

694
00:37:04,710 --> 00:37:09,560
But anyway, whenever students send us a
message, we actually take their message

695
00:37:09,560 --> 00:37:13,430
and then concatenate with the system
rule with our own wrapper prompt.

696
00:37:13,430 --> 00:37:14,520
And then we prompt GPT.

697
00:37:14,520 --> 00:37:18,470
So GPT is always following our
system rule, our system instruction

698
00:37:18,470 --> 00:37:22,370
first before actually trying to
understand what the student is actually

699
00:37:22,370 --> 00:37:22,870
asking.

700
00:37:22,870 --> 00:37:25,850


701
00:37:25,850 --> 00:37:29,800
And for this slide, it
summarized a typical interaction

702
00:37:29,800 --> 00:37:32,680
between user and the
large language model.

703
00:37:32,680 --> 00:37:36,550
So the technical term is you
prompt the large language model

704
00:37:36,550 --> 00:37:38,450
to get a response back.

705
00:37:38,450 --> 00:37:43,130
That's how you interact with
the large language model.

706
00:37:43,130 --> 00:37:46,110
And sometimes people
come up with fancy term.

707
00:37:46,110 --> 00:37:48,860
Nowadays they call it
prompt engineering.

708
00:37:48,860 --> 00:37:51,590
Personally, I just think this
is string manipulation here,

709
00:37:51,590 --> 00:37:56,120
just a different way of asking question,
a different way of conveying things

710
00:37:56,120 --> 00:38:00,510
to instruct the GPT model
to generate a response.

711
00:38:00,510 --> 00:38:02,840
So for example, in your
question, for example,

712
00:38:02,840 --> 00:38:06,510
if you want the GPT to answer in a
certain way, you can provide example.

713
00:38:06,510 --> 00:38:09,890
This is often called one shot prompting
or few shot prompting, depending on how

714
00:38:09,890 --> 00:38:13,070
many example you provide in the prompt.

715
00:38:13,070 --> 00:38:16,280
You can also ask the
model to adopt a persona.

716
00:38:16,280 --> 00:38:19,460
So in the CS50 Duck scenario,
we say you are a rubber duck.

717
00:38:19,460 --> 00:38:22,880
That actually asks the
model to adopt a persona.

718
00:38:22,880 --> 00:38:29,360
And you can also do other
restriction or prompt it differently,

719
00:38:29,360 --> 00:38:31,200
hence the thing called engineering.

720
00:38:31,200 --> 00:38:33,318
There's nothing really engineering here.

721
00:38:33,318 --> 00:38:36,110
It's just a different way of asking
question at the end of the day.

722
00:38:36,110 --> 00:38:41,880
You are manipulating a string, and then
you want to get the ideal response back.

723
00:38:41,880 --> 00:38:45,610
OK, so now let's talk
about Chat Completion

724
00:38:45,610 --> 00:38:50,020
API with the foundational
knowledge in mind.

725
00:38:50,020 --> 00:38:54,670
So when you use this
API, you make an API call

726
00:38:54,670 --> 00:38:57,460
to Open AI's server
sitting in the cloud.

727
00:38:57,460 --> 00:39:00,940
Usually the payload will be the message
because you are prompting the model

728
00:39:00,940 --> 00:39:01,730
to get a response.

729
00:39:01,730 --> 00:39:07,485
So for example, I can ask, hey, can you
help me with my Tideman problem set?

730
00:39:07,485 --> 00:39:10,940
OK, the model will just
generate a response back.

731
00:39:10,940 --> 00:39:13,930
Now, here you might wonder,
OK, why the model actually

732
00:39:13,930 --> 00:39:16,610
know it's a CS50 Tideman problem set?

733
00:39:16,610 --> 00:39:18,730
First of all, Tideman problem set.

734
00:39:18,730 --> 00:39:25,330
It's a very iconic term in the internet
due to the popularity of these problems,

735
00:39:25,330 --> 00:39:25,840
I believe.

736
00:39:25,840 --> 00:39:28,090
And actually, when Opening
AI train the model--

737
00:39:28,090 --> 00:39:29,890
when it scraped the
internet, it actually

738
00:39:29,890 --> 00:39:33,700
scraped our entire course
content into the training data.

739
00:39:33,700 --> 00:39:38,870
So the model already has
the knowledge of CS50.

740
00:39:38,870 --> 00:39:42,130
So that's considered the internal
knowledge that the model has.

741
00:39:42,130 --> 00:39:45,375
So it naturally actually
understand what this question mean.

742
00:39:45,375 --> 00:39:48,460


743
00:39:48,460 --> 00:39:50,800
If you look at the code--

744
00:39:50,800 --> 00:39:52,970
again, this is a code written in Python.

745
00:39:52,970 --> 00:39:56,290
If you don't understand what the code is
doing right now, don't worry about it.

746
00:39:56,290 --> 00:40:00,410
Just pay attention to some of the
keywords in this code snippet.

747
00:40:00,410 --> 00:40:04,930
You notice there's a row
system, and there's a row user.

748
00:40:04,930 --> 00:40:09,040
This is actually the row I just
mentioned a few slides earlier.

749
00:40:09,040 --> 00:40:12,370
First, there's a system row,
which is the system prompt.

750
00:40:12,370 --> 00:40:15,260
You are a friendly supported
teaching assistant.

751
00:40:15,260 --> 00:40:16,430
You are also a rubber duck.

752
00:40:16,430 --> 00:40:21,740
You can add more instruction to this
system row and in the content field.

753
00:40:21,740 --> 00:40:26,600
And because we need to utilize
this API to generate response--

754
00:40:26,600 --> 00:40:33,920
so you need to also supply, OK,
what's the user messages to GPT 4.

755
00:40:33,920 --> 00:40:39,370
So we will have another dictionary,
another entry, mark it as user row,

756
00:40:39,370 --> 00:40:41,470
and then the content
will be the question.

757
00:40:41,470 --> 00:40:48,560
With these two messages in place, you
will send this payload, essentially,

758
00:40:48,560 --> 00:40:51,020
these two messages to ChatGPT.

759
00:40:51,020 --> 00:40:54,050
And the GPT will be able to take
these two message, the system

760
00:40:54,050 --> 00:40:56,990
prompt and the user prompt
and generate a system

761
00:40:56,990 --> 00:41:05,560
response, which is the response
you saw on the earlier slide.

762
00:41:05,560 --> 00:41:09,705
So it might be good for me to actually
do a live demo here this time.

763
00:41:09,705 --> 00:41:15,710


764
00:41:15,710 --> 00:41:17,820
Although I won't be typing
the code line by line,

765
00:41:17,820 --> 00:41:22,400
it's just trying to show you what
it looks like when it's actually

766
00:41:22,400 --> 00:41:23,275
running in the code.

767
00:41:23,275 --> 00:41:28,930


768
00:41:28,930 --> 00:41:33,080
And also, this is a good
demo for a Codespace demo

769
00:41:33,080 --> 00:41:36,490
if you attended the workshop last
day, I also talked about Codespace.

770
00:41:36,490 --> 00:41:39,850
This is entirely running in
the cloud on Codespace as well.

771
00:41:39,850 --> 00:41:46,760
So I'm going to go to the chat example,
the OpenAI example in particular.

772
00:41:46,760 --> 00:41:49,750
In this workshop, I also
provide you the example

773
00:41:49,750 --> 00:41:54,440
for the other vendor, for example,
from cloud and Google Gemini API.

774
00:41:54,440 --> 00:41:57,280
You can also look at
those at your own time

775
00:41:57,280 --> 00:42:03,850
to see how similar these APIs
actually look in between each other.

776
00:42:03,850 --> 00:42:11,650
OK, so let's take a look at the simplest
way you can invoke the Chat Completion

777
00:42:11,650 --> 00:42:12,700
API endpoint.

778
00:42:12,700 --> 00:42:13,750
Ignore the first part.

779
00:42:13,750 --> 00:42:16,190
This is just how I handle
environment variable.

780
00:42:16,190 --> 00:42:22,370
So when you work with API or any kind
of API, you often need an API key.

781
00:42:22,370 --> 00:42:28,450
It's a way for OpenAI to know,
OK, you are the API keyholder

782
00:42:28,450 --> 00:42:29,960
and you are making this API call.

783
00:42:29,960 --> 00:42:33,970
It's a way basically for OpenAI to build
you because this thing costs money.

784
00:42:33,970 --> 00:42:36,730
So they need a way to build you.

785
00:42:36,730 --> 00:42:38,680
That's just the prerequisites.

786
00:42:38,680 --> 00:42:41,270
But this is the important part.

787
00:42:41,270 --> 00:42:45,300
This is the actual
API call we are doing.

788
00:42:45,300 --> 00:42:51,060
Here I'm just going to quickly show you
if you prompt the model without anything

789
00:42:51,060 --> 00:42:52,300
what it will look like.

790
00:42:52,300 --> 00:42:55,620
So that's why there's
no system message here.

791
00:42:55,620 --> 00:43:00,660
And because we call the API and
we will expect to get a response,

792
00:43:00,660 --> 00:43:03,110
that's why I'm storing
the response here.

793
00:43:03,110 --> 00:43:06,570


794
00:43:06,570 --> 00:43:09,090
I'm just going to print it
out back to the terminal

795
00:43:09,090 --> 00:43:10,600
so that you can see the response.

796
00:43:10,600 --> 00:43:13,120


797
00:43:13,120 --> 00:43:18,210
Now I'm just going to run this example.

798
00:43:18,210 --> 00:43:21,390
So as you can see, I get
a response from the model

799
00:43:21,390 --> 00:43:24,490
live, Hello back, because I say Hello.

800
00:43:24,490 --> 00:43:26,130
So it just Hello back me.

801
00:43:26,130 --> 00:43:29,460
If I prompt again, maybe I
will get a different response.

802
00:43:29,460 --> 00:43:32,230
You see, this time the
response is actually shorter.

803
00:43:32,230 --> 00:43:35,460
That actually means the model
generation is not deterministic.

804
00:43:35,460 --> 00:43:36,640
It's subject to randomness.

805
00:43:36,640 --> 00:43:39,265
So every time you prompt the
model even with the same question,

806
00:43:39,265 --> 00:43:41,400
you might not get the same answer back.

807
00:43:41,400 --> 00:43:43,530
That's just something you
need to be aware of when

808
00:43:43,530 --> 00:43:45,840
working with large language model.

809
00:43:45,840 --> 00:43:49,830
Now, you might wonder, OK,
how can I do a conversation?

810
00:43:49,830 --> 00:43:54,120
A conversation is
actually text generation.

811
00:43:54,120 --> 00:43:57,780
At the end of the day,
you need to be aware

812
00:43:57,780 --> 00:44:03,820
that when you're interacting with LM
model, the model doesn't understand you.

813
00:44:03,820 --> 00:44:05,920
The model doesn't have memory at all.

814
00:44:05,920 --> 00:44:09,400
So if I ask the Duck, what is Flask?

815
00:44:09,400 --> 00:44:13,930
The Duck, our CS50 AI, will
give me a response back.

816
00:44:13,930 --> 00:44:17,030
And if I ask, what did I just ask?

817
00:44:17,030 --> 00:44:19,580
The Duck, will say, oh, you
just asked what is Flask.

818
00:44:19,580 --> 00:44:21,900
But that's actually not the case.

819
00:44:21,900 --> 00:44:23,520
The model is not remembering you.

820
00:44:23,520 --> 00:44:28,010
It's because we are actually giving the
model the entire history for the model

821
00:44:28,010 --> 00:44:30,660
to predict the next response.

822
00:44:30,660 --> 00:44:34,860
So in the code, it actually
looks more clearer.

823
00:44:34,860 --> 00:44:39,930
So previously we have
this simple API call--

824
00:44:39,930 --> 00:44:45,220
this simple API call to ask, "Can
you help me with my filter pset?"

825
00:44:45,220 --> 00:44:46,770
I get a response back.

826
00:44:46,770 --> 00:44:52,080
In order for me to follow up with a
question, I can't just go and ask--

827
00:44:52,080 --> 00:44:56,557
I can't just go in and call this
API again with a new question.

828
00:44:56,557 --> 00:44:58,890
The model don't have any
context at all because it won't

829
00:44:58,890 --> 00:45:02,610
know what itself just answered to me.

830
00:45:02,610 --> 00:45:05,490
In order to continue the
conversation, we actually

831
00:45:05,490 --> 00:45:10,040
need to send back its own
response in the payload

832
00:45:10,040 --> 00:45:14,160
and then ask a new question there.

833
00:45:14,160 --> 00:45:17,010
So this is now four message.

834
00:45:17,010 --> 00:45:21,320
And then send it to the OpenAI Chat
Completion API and get back a response.

835
00:45:21,320 --> 00:45:24,630
As you can see, as the
conversation progress,

836
00:45:24,630 --> 00:45:33,100
we are sending more and more message
because, again, large language model

837
00:45:33,100 --> 00:45:34,490
is doing text generation.

838
00:45:34,490 --> 00:45:37,015
It's generating text based on
what you are sending to it.

839
00:45:37,015 --> 00:45:38,890
The reason why it can
carry on a conversation

840
00:45:38,890 --> 00:45:44,180
is because you already supplied
them with the previous chat history.

841
00:45:44,180 --> 00:45:47,720
So the large language model was looking
at, OK, this is the chat history.

842
00:45:47,720 --> 00:45:49,570
This is what been talking about so far.

843
00:45:49,570 --> 00:45:51,403
OK, I'm going to generate
the next response.

844
00:45:51,403 --> 00:45:53,940


845
00:45:53,940 --> 00:45:55,985
And in the code, I can
also show you quickly.

846
00:45:55,985 --> 00:46:08,710


847
00:46:08,710 --> 00:46:13,120
So maybe let's look at just chat3.

848
00:46:13,120 --> 00:46:16,240


849
00:46:16,240 --> 00:46:18,880
So I'm going to enlarge it.

850
00:46:18,880 --> 00:46:22,520
So the majority of the code
is essentially the same.

851
00:46:22,520 --> 00:46:25,190
The API code we are using is
still Chat Completion API.

852
00:46:25,190 --> 00:46:30,710
The only thing we need to keep track
of is the message history essentially.

853
00:46:30,710 --> 00:46:35,740
When user types something
in the prompt, I'm

854
00:46:35,740 --> 00:46:38,480
going to store the user
into a message array.

855
00:46:38,480 --> 00:46:40,630
And when GPT generate a response--

856
00:46:40,630 --> 00:46:43,600
I'm also going to store
that in the message array.

857
00:46:43,600 --> 00:46:45,650
And then I have a while true loop.

858
00:46:45,650 --> 00:46:48,130
So this loop is just
going to keep on forever

859
00:46:48,130 --> 00:46:51,340
that way I can carry on a conversation.

860
00:46:51,340 --> 00:46:52,480
So I can just--

861
00:46:52,480 --> 00:46:54,855
I can now run the code and
show you what it looks like.

862
00:46:54,855 --> 00:47:00,270


863
00:47:00,270 --> 00:47:03,660
So I'm going to-- what is Flask?

864
00:47:03,660 --> 00:47:05,015
It's going to think a while.

865
00:47:05,015 --> 00:47:08,170


866
00:47:08,170 --> 00:47:10,130
It is actually now
generating a response,

867
00:47:10,130 --> 00:47:13,370
but because my question is about Flask--

868
00:47:13,370 --> 00:47:16,500
so it tries to give me a full
definition of the Flask framework.

869
00:47:16,500 --> 00:47:18,380
Now it sent back the response.

870
00:47:18,380 --> 00:47:21,900
And I can ask, what did I just ask?

871
00:47:21,900 --> 00:47:25,200


872
00:47:25,200 --> 00:47:28,930
And then the model was saying,
you just asked, what is Flask?

873
00:47:28,930 --> 00:47:31,810
So the model somehow
remember what I asked.

874
00:47:31,810 --> 00:47:34,770
But underneath the hood, I'm
just telling the model again,

875
00:47:34,770 --> 00:47:36,640
hey, this is my question.

876
00:47:36,640 --> 00:47:38,955
This is your response.

877
00:47:38,955 --> 00:47:40,620
I just asked you what did I ask.

878
00:47:40,620 --> 00:47:45,720
The model based on the chat history
generate a response back to me

879
00:47:45,720 --> 00:47:48,290
correctly.

880
00:47:48,290 --> 00:47:49,920
OK.

881
00:47:49,920 --> 00:47:55,640
Now, I want to talk about
some downsides or some problem

882
00:47:55,640 --> 00:47:57,840
when dealing with large language model.

883
00:47:57,840 --> 00:48:00,450
There's a technical term
called hallucinations.

884
00:48:00,450 --> 00:48:05,090
So the model, sometimes
we just make stuff up.

885
00:48:05,090 --> 00:48:07,478
Again, it's generating text.

886
00:48:07,478 --> 00:48:10,020
It's not really understanding
the thing you're talking about.

887
00:48:10,020 --> 00:48:13,070
It's being trained on large corpus.

888
00:48:13,070 --> 00:48:14,700
It has the knowledge of the internet.

889
00:48:14,700 --> 00:48:20,600
But sometime when you ask it a question,
it might not exist in the training data,

890
00:48:20,600 --> 00:48:25,100
but the model was trying to generate
something that somehow seemingly best

891
00:48:25,100 --> 00:48:29,540
matched to your expectation.

892
00:48:29,540 --> 00:48:32,330
One way to tackle this
problem is called grounding.

893
00:48:32,330 --> 00:48:38,690
And specifically the technique to
tackle this hallucination problem

894
00:48:38,690 --> 00:48:44,820
is called retrieval-augmented
generation or often referred to RAG.

895
00:48:44,820 --> 00:48:46,090
I have my own explanation.

896
00:48:46,090 --> 00:48:49,420
Again, this is like a fancy term people
invent maybe for marketing purposes.

897
00:48:49,420 --> 00:48:53,250
I think it's just handing the
large language model a cheat sheet.

898
00:48:53,250 --> 00:48:54,250
That's basically it.

899
00:48:54,250 --> 00:48:56,580
That's what's happening here.

900
00:48:56,580 --> 00:49:01,620
So CS50 Duck actually
incorporated the RAG pipeline also

901
00:49:01,620 --> 00:49:04,390
as well as a way for
experimenting as well.

902
00:49:04,390 --> 00:49:07,980
So there's a scenario when
student asks, what is Flask?

903
00:49:07,980 --> 00:49:12,780
First of all, Flask is a
web programming framework.

904
00:49:12,780 --> 00:49:15,400
The model already
understand what Flask is.

905
00:49:15,400 --> 00:49:18,630
It's not purely exclusive to CS50.

906
00:49:18,630 --> 00:49:21,730
So the model has internal
knowledge of what Flask is.

907
00:49:21,730 --> 00:49:26,160
So if a student asks GPT
directly what is Flask--

908
00:49:26,160 --> 00:49:31,920
as a new beginner, as a beginner,
you will want some response

909
00:49:31,920 --> 00:49:33,700
that are easily understandable.

910
00:49:33,700 --> 00:49:39,580
But somehow the GPT 4 model was trained
on the entire internet knowledge.

911
00:49:39,580 --> 00:49:41,940
It has advanced knowledge
of what Flask it.

912
00:49:41,940 --> 00:49:44,100
And then it will just
try to generate response

913
00:49:44,100 --> 00:49:47,100
that's so advanced that a beginner
might not able to comprehend.

914
00:49:47,100 --> 00:49:52,360
So in order to tackle that, we will
want to supply our own CS50 definition.

915
00:49:52,360 --> 00:49:56,205
It could be a definition that
David mentioned in the lecture.

916
00:49:56,205 --> 00:49:59,960
And that's exactly what we are
doing here for the CS50 Duck.

917
00:49:59,960 --> 00:50:02,700
When a student asks,
what is Flask, we first

918
00:50:02,700 --> 00:50:07,050
try to search our
entire lecture database.

919
00:50:07,050 --> 00:50:10,320
I will get to it and how we do that.

920
00:50:10,320 --> 00:50:13,320
We basically try to search
our entire lecture caption,

921
00:50:13,320 --> 00:50:16,800
the whole 11 weeks lecture also.

922
00:50:16,800 --> 00:50:22,020
We try to find what David says in the
lecture that actually best matches

923
00:50:22,020 --> 00:50:23,110
this question.

924
00:50:23,110 --> 00:50:26,550
That might be the ground truth
because that's the definition.

925
00:50:26,550 --> 00:50:29,910
We can just hand that
answer to a student, right?

926
00:50:29,910 --> 00:50:33,130
So we retrieve the best match caption.

927
00:50:33,130 --> 00:50:37,200
The moment that David actually talk
about what Flask is in the course,

928
00:50:37,200 --> 00:50:42,420
we literally put that back
into our updated prompt.

929
00:50:42,420 --> 00:50:47,180
It's kind of like handling the GPT
like, hey, GPT 4, here is a cheat sheet.

930
00:50:47,180 --> 00:50:50,680
You can just look at this
and answer the question.

931
00:50:50,680 --> 00:50:54,310
We have an updated prompt along
with the original question

932
00:50:54,310 --> 00:50:55,880
that they asked by the student.

933
00:50:55,880 --> 00:51:00,650
We send the whole thing to OpenAI
GPT 4 and get back a response.

934
00:51:00,650 --> 00:51:05,380
And this response is more, let's just
say, learner friendly or CS50 student

935
00:51:05,380 --> 00:51:07,690
friendly response.

936
00:51:07,690 --> 00:51:08,480
And that's it.

937
00:51:08,480 --> 00:51:10,990
That's what the RAG is
doing in the CS50 Duck.

938
00:51:10,990 --> 00:51:17,720
Or that's what RAG is doing in most of
the AI system in the world right now.

939
00:51:17,720 --> 00:51:23,160
But how exactly can we
perform this search quickly?

940
00:51:23,160 --> 00:51:28,140
So here the embedding
thing comes into play.

941
00:51:28,140 --> 00:51:30,820
Large language model or
other neural network,

942
00:51:30,820 --> 00:51:33,510
they don't actually understand English.

943
00:51:33,510 --> 00:51:39,030
They understand this a vector of floats.

944
00:51:39,030 --> 00:51:46,410
In particular, this vector
is in a dimension of 1,536.

945
00:51:46,410 --> 00:51:51,310
And we are actually using a text
embedding model to do this conversion.

946
00:51:51,310 --> 00:51:58,090
So, for example, we use a text embedding
model to embed what is Flask into this--

947
00:51:58,090 --> 00:51:59,890
like a vector of floats.

948
00:51:59,890 --> 00:52:03,990
If you supply only one character
to this embedding model,

949
00:52:03,990 --> 00:52:07,240
you will still get back the same
dimension of different vectors.

950
00:52:07,240 --> 00:52:10,730
That's just how this text
embedding model works.

951
00:52:10,730 --> 00:52:15,420
So what we do is we actually
go through our lecture caption.

952
00:52:15,420 --> 00:52:20,510
We segment them into a 30-second chunk.

953
00:52:20,510 --> 00:52:26,420
And then for each 30-second chunk, we
create a vector representation or text

954
00:52:26,420 --> 00:52:28,190
embedding for it.

955
00:52:28,190 --> 00:52:31,580
And then we store them into
a database often referred

956
00:52:31,580 --> 00:52:33,700
to as the vector database.

957
00:52:33,700 --> 00:52:34,960
Here's what it looks like.

958
00:52:34,960 --> 00:52:37,270
For example, for this
particular moment, David

959
00:52:37,270 --> 00:52:43,155
is talking about Flask is a
third-party library, goes on and on.

960
00:52:43,155 --> 00:52:46,460
And this is actually the
lecture caption segment.

961
00:52:46,460 --> 00:52:47,770
Here is just an example.

962
00:52:47,770 --> 00:52:52,660
What we do is we literally just
concatenate this 30-second chunk

963
00:52:52,660 --> 00:52:53,780
into this.

964
00:52:53,780 --> 00:52:56,920
This is just the chunking
strategy we take.

965
00:52:56,920 --> 00:53:00,940
You have different chunking strategy
you can pick or you can design.

966
00:53:00,940 --> 00:53:06,250
We utilize OpenAI's text embedding model
to generate a vector representation

967
00:53:06,250 --> 00:53:08,740
of this particular 30-second chunk.

968
00:53:08,740 --> 00:53:13,520
We believe this 30-second chunk--

969
00:53:13,520 --> 00:53:17,060
once converted into
vector representation,

970
00:53:17,060 --> 00:53:20,780
we will be able to store it in a
database so that when student asks

971
00:53:20,780 --> 00:53:24,710
a question, we can start going
through and see how similar

972
00:53:24,710 --> 00:53:29,290
the student's question is
compared to this 30-second chunk.

973
00:53:29,290 --> 00:53:31,830


974
00:53:31,830 --> 00:53:36,860
The calculation between these vector
is just simple cosine similarity.

975
00:53:36,860 --> 00:53:38,490
It's like a dot product.

976
00:53:38,490 --> 00:53:40,800
We basically go through
the entire database,

977
00:53:40,800 --> 00:53:46,410
calculate each cosine similarity
or the dot product result,

978
00:53:46,410 --> 00:53:49,150
and rank them from the best
match to the worst match.

979
00:53:49,150 --> 00:53:52,950
And then we just pick the first top
end matches and put it into the prompt.

980
00:53:52,950 --> 00:53:58,030
So here's the entire float of how
this RAG pipeline works in CS50 Duck.

981
00:53:58,030 --> 00:54:02,310
When student asks, what is Flask, we
convert it to vector representation.

982
00:54:02,310 --> 00:54:06,030
We perform a search against our
vector database that is full of tens

983
00:54:06,030 --> 00:54:11,190
of thousands, 30-second chunk of
caption represented by text embedding.

984
00:54:11,190 --> 00:54:13,470
We find out the best match.

985
00:54:13,470 --> 00:54:14,860
We put it into the prompt.

986
00:54:14,860 --> 00:54:18,040
This is plain text now because we
already retrieved the document.

987
00:54:18,040 --> 00:54:22,320
We should just put the
plain text into the prompt.

988
00:54:22,320 --> 00:54:26,690
And then we ask GPT 4,
OK, here is your prompt.

989
00:54:26,690 --> 00:54:27,710
What is Flask?

990
00:54:27,710 --> 00:54:29,750
Here is some useful information.

991
00:54:29,750 --> 00:54:33,810
And then just put in the
information, the cheat sheet.

992
00:54:33,810 --> 00:54:39,390
And then the OpenAI GPT will
be able to generate a response.

993
00:54:39,390 --> 00:54:43,650
The vetor database we use for our
CS50 Duck is called Chroma DB.

994
00:54:43,650 --> 00:54:45,610
It's an open-source vector database.

995
00:54:45,610 --> 00:54:49,740
There's many more vector database
nowadays available in the market

996
00:54:49,740 --> 00:54:52,530
as well.

997
00:54:52,530 --> 00:54:56,080
And this is just another diagram
summarizing what it's doing here,

998
00:54:56,080 --> 00:55:00,480
essentially the same flowchart
I just showed a few slides ago.

999
00:55:00,480 --> 00:55:02,040
Again, the query comes in.

1000
00:55:02,040 --> 00:55:08,330
We call OpenAI text embedding model to
generate an embedding for the question.

1001
00:55:08,330 --> 00:55:11,140
And then we search against the
database, the vector database.

1002
00:55:11,140 --> 00:55:12,280
We retrieve the document.

1003
00:55:12,280 --> 00:55:13,510
We put it in the--

1004
00:55:13,510 --> 00:55:17,870
we put it in the prompt often
also called the LM context window.

1005
00:55:17,870 --> 00:55:21,640
And then we get back an answer.

1006
00:55:21,640 --> 00:55:25,350
And that's embedding based search,
and I can quickly demo it as well.

1007
00:55:25,350 --> 00:55:31,382


1008
00:55:31,382 --> 00:55:32,715
So I'll queue this.

1009
00:55:32,715 --> 00:55:43,970


1010
00:55:43,970 --> 00:55:46,810
It might be fun to quickly
show what embedding looks like.

1011
00:55:46,810 --> 00:55:50,990


1012
00:55:50,990 --> 00:55:55,070
Let's say now I'm actually
calling the OpenAI

1013
00:55:55,070 --> 00:55:58,790
API to create an embedding
for this particular question.

1014
00:55:58,790 --> 00:56:01,540
I'm just going to run it.

1015
00:56:01,540 --> 00:56:07,330
You see that it generate
a bunch of float numbers.

1016
00:56:07,330 --> 00:56:12,860
This is actually a vector of floats
that represent what is Flask.

1017
00:56:12,860 --> 00:56:16,420
That's just how this text
embedding model understand.

1018
00:56:16,420 --> 00:56:21,090
If we change it to
simply Cat and save it--

1019
00:56:21,090 --> 00:56:23,040
and I'm going to run it again--

1020
00:56:23,040 --> 00:56:30,630
I will still get back a 1,536
dimension of a vector or float.

1021
00:56:30,630 --> 00:56:34,290
So no matter how long the
plane test you supply,

1022
00:56:34,290 --> 00:56:36,590
you always get back the same
dimension of the vector.

1023
00:56:36,590 --> 00:56:40,210


1024
00:56:40,210 --> 00:56:43,475
And, again, you don't need to actually
read what those vector representing.

1025
00:56:43,475 --> 00:56:45,600
It just gives you an
impression of what it's doing.

1026
00:56:45,600 --> 00:56:48,430


1027
00:56:48,430 --> 00:56:50,830
Now, for this demo, I
want to quickly demo

1028
00:56:50,830 --> 00:56:54,020
how you can perform an
embedding, like a question

1029
00:56:54,020 --> 00:56:58,120
answer with embedding-based search.

1030
00:56:58,120 --> 00:57:00,540
So I'm going to full screen myself here.

1031
00:57:00,540 --> 00:57:09,910


1032
00:57:09,910 --> 00:57:13,180
So for this demo, what I'm
going to do is I'm going

1033
00:57:13,180 --> 00:57:17,020
to create a vector database or a JSON--

1034
00:57:17,020 --> 00:57:23,080
like a document that contain
all the caption segment

1035
00:57:23,080 --> 00:57:25,690
representing by text embedding.

1036
00:57:25,690 --> 00:57:29,590
For example, this is what the
original text transcript looks like.

1037
00:57:29,590 --> 00:57:35,950
This is just the AI lecture from last
year or from last year's for lecture 1

1038
00:57:35,950 --> 00:57:39,422
to lecture 3, so on and so forth.

1039
00:57:39,422 --> 00:57:41,505
I'm going to just run the
script directly for you.

1040
00:57:41,505 --> 00:57:50,490


1041
00:57:50,490 --> 00:57:53,830
So let me quickly explain
what the script is doing.

1042
00:57:53,830 --> 00:57:57,600
So this particular script
create_embedding-- what it's doing

1043
00:57:57,600 --> 00:58:01,710
is we first go through
each lecture transcript.

1044
00:58:01,710 --> 00:58:04,030
So we go through each
lecture transcript.

1045
00:58:04,030 --> 00:58:07,210
We chunk them into, I
don't know, 500 characters.

1046
00:58:07,210 --> 00:58:09,850
It's a very naive
chunking strategy here.

1047
00:58:09,850 --> 00:58:14,710
We just chunk it into
500-character-long chunk no matter what.

1048
00:58:14,710 --> 00:58:17,400
We don't even consider
the punctuation or not.

1049
00:58:17,400 --> 00:58:20,250
But this is often referred
to the chunking strategy.

1050
00:58:20,250 --> 00:58:24,468
So you have different
way of chunk the content

1051
00:58:24,468 --> 00:58:26,260
and then store them
into a vector database.

1052
00:58:26,260 --> 00:58:31,925
So in order to demo, I just do a
very naive chunking strategy here.

1053
00:58:31,925 --> 00:58:39,510


1054
00:58:39,510 --> 00:58:43,240
So right now it's creating embedding,
moderately executed the script.

1055
00:58:43,240 --> 00:58:46,740
What it's doing is for each
500-character-long chunk,

1056
00:58:46,740 --> 00:58:51,690
I'm going to call the OpenAI's text
embedding API to create a vector

1057
00:58:51,690 --> 00:58:52,690
representation.

1058
00:58:52,690 --> 00:58:53,490
And I'm going to--

1059
00:58:53,490 --> 00:58:58,020
I'm going to associate this particular
vector with the 500-character-long

1060
00:58:58,020 --> 00:58:58,720
chunk.

1061
00:58:58,720 --> 00:59:01,500
That way when I do the
retrieval, I know, OK,

1062
00:59:01,500 --> 00:59:05,520
this vector whenever it's the best--
whenever it is the best match,

1063
00:59:05,520 --> 00:59:10,320
I know how to get back its original
plain text so that I can put that

1064
00:59:10,320 --> 00:59:12,270
into the prompt and then prompt GPT.

1065
00:59:12,270 --> 00:59:18,480
It takes a while because we are sending
the entire fall 2023 lecture for OpenAI

1066
00:59:18,480 --> 00:59:21,570
to generate a text embedding.

1067
00:59:21,570 --> 00:59:25,240
And soon you will see what it looks
like once it's finished generated.

1068
00:59:25,240 --> 00:59:28,140


1069
00:59:28,140 --> 00:59:32,720
You can also run the exact same
code I provide for this workshop.

1070
00:59:32,720 --> 00:59:35,670
All you need is to obtain
your own OpenAI key.

1071
00:59:35,670 --> 00:59:36,400
And that's all.

1072
00:59:36,400 --> 00:59:52,130


1073
00:59:52,130 --> 00:59:54,600
I can also show you what it
looks like while generating.

1074
00:59:54,600 --> 00:59:57,840
So it generated a JSON Line document.

1075
00:59:57,840 --> 01:00:00,320
So to give you an example,
for example, this one,

1076
01:00:00,320 --> 01:00:06,950
this is one of the 500-character-long
lecture caption segment.

1077
01:00:06,950 --> 01:00:12,810
And it is associated with its
corresponding vector representation

1078
01:00:12,810 --> 01:00:18,180
and the following vector
flow representing this entire

1079
01:00:18,180 --> 01:00:20,045
500-character-long segment.

1080
01:00:20,045 --> 01:00:25,460


1081
01:00:25,460 --> 01:00:28,100
You can just end the process
here because I already

1082
01:00:28,100 --> 01:00:30,410
have an embedding created beforehand.

1083
01:00:30,410 --> 01:00:35,780
Now, once we have the vector database
somehow ready, what we need to do

1084
01:00:35,780 --> 01:00:37,190
is perform a search.

1085
01:00:37,190 --> 01:00:38,755
So I can do this.

1086
01:00:38,755 --> 01:00:42,410


1087
01:00:42,410 --> 01:00:45,200
Second demo is enter a query.

1088
01:00:45,200 --> 01:00:49,060
So let's just say, what is minimax?

1089
01:00:49,060 --> 01:00:50,580
This is a question I'm sending.

1090
01:00:50,580 --> 01:00:54,860
I'm searching against the entire lecture
to see which particular segment actually

1091
01:00:54,860 --> 01:00:57,830
match this question.

1092
01:00:57,830 --> 01:00:59,930
So I'm going to ask--

1093
01:00:59,930 --> 01:01:05,150
you get back this particular
result. This is the top one

1094
01:01:05,150 --> 01:01:09,770
matches that we found from the
entire vector database, if you will.

1095
01:01:09,770 --> 01:01:12,600
And it does mention what minimax is.

1096
01:01:12,600 --> 01:01:16,790
So it is sort already
answering the question,

1097
01:01:16,790 --> 01:01:19,410
but it is not answering the question.

1098
01:01:19,410 --> 01:01:24,020
OK, this particular caption
actually somehow match

1099
01:01:24,020 --> 01:01:26,030
what this student is asking.

1100
01:01:26,030 --> 01:01:28,310
So the next step will
be, naturally, we want

1101
01:01:28,310 --> 01:01:32,570
to put this entire message
into the prompt and prompt GPT

1102
01:01:32,570 --> 01:01:33,565
to get a response back.

1103
01:01:33,565 --> 01:01:39,830


1104
01:01:39,830 --> 01:01:41,930
I will ask the same
question what is minimax.

1105
01:01:41,930 --> 01:01:48,470


1106
01:01:48,470 --> 01:01:53,830
And you see, this is the actual
prompt we are sending to OpenAI.

1107
01:01:53,830 --> 01:01:57,050
That's the original question from me.

1108
01:01:57,050 --> 01:02:02,620
That's the original question from me
and then the retrieve document here.

1109
01:02:02,620 --> 01:02:06,160
And then OpenAI will basically
utilize the information

1110
01:02:06,160 --> 01:02:09,160
I provide to generate a response back.

1111
01:02:09,160 --> 01:02:14,550
And the entire RAG pipeline
looks like in a simplified way--

1112
01:02:14,550 --> 01:02:18,480
Let's do another fun thing, I guess.

1113
01:02:18,480 --> 01:02:26,110
Is there a Mario Brothers
Oreo mentioned in the lecture?

1114
01:02:26,110 --> 01:02:31,020
Because oftentimes in CS50, we do like
to hand out Mario cookies somehow,

1115
01:02:31,020 --> 01:02:34,350
or Mario Brothers Oreos.

1116
01:02:34,350 --> 01:02:36,180
Oreos.

1117
01:02:36,180 --> 01:02:39,870
I don't know how to spell
Oreos, but that's fine.

1118
01:02:39,870 --> 01:02:49,280
Is there a Mario cookie
mentioned in the lecture?

1119
01:02:49,280 --> 01:02:53,250
The AI said, yes, there's a Mario
Brother Oreo mentioned in the lecture

1120
01:02:53,250 --> 01:02:57,850
because the lecture segment we create
actually contains the Mario Brother.

1121
01:02:57,850 --> 01:03:01,650
And then it was able to
generate a text embedding.

1122
01:03:01,650 --> 01:03:05,340
And then during the calculation,
we found the best match.

1123
01:03:05,340 --> 01:03:08,250
And this is actually the
information we provide to OpenAI.

1124
01:03:08,250 --> 01:03:11,380
And then OpenAI GPT 4 can
answer the question correctly.

1125
01:03:11,380 --> 01:03:16,380
Without this RAG, the model will start
hallucinate because, first of all,

1126
01:03:16,380 --> 01:03:18,100
the model didn't attend the lecture.

1127
01:03:18,100 --> 01:03:19,660
They didn't know what
happened in the lecture.

1128
01:03:19,660 --> 01:03:21,243
So it will try to make up some answer.

1129
01:03:21,243 --> 01:03:25,790


1130
01:03:25,790 --> 01:03:28,120
OK, that's the demo.

1131
01:03:28,120 --> 01:03:32,020
So that's actually what's happening
underneath the hood of the CS50 AI

1132
01:03:32,020 --> 01:03:34,100
Duck in its current version.

1133
01:03:34,100 --> 01:03:38,860
And right now we're actually working
on a newer version of the CS50 Duck.

1134
01:03:38,860 --> 01:03:42,430
We hope to improve the
experience for student.

1135
01:03:42,430 --> 01:03:46,900
This is what it looks like now.

1136
01:03:46,900 --> 01:03:49,600
It also works perfectly on mobile.

1137
01:03:49,600 --> 01:03:54,880
Student can have their own
customization for the Duck as well.

1138
01:03:54,880 --> 01:03:57,990
To quickly show what it looks like,
I'm going to open the browser.

1139
01:03:57,990 --> 01:04:02,770


1140
01:04:02,770 --> 01:04:06,310
And also, we actually randomly
show postcards from the world

1141
01:04:06,310 --> 01:04:08,920
that people send to us with the Duck.

1142
01:04:08,920 --> 01:04:13,150
This is just how we design the UI.

1143
01:04:13,150 --> 01:04:18,210
So I'm going to give you a sneak peek
of what the new CS50 Duck looks like.

1144
01:04:18,210 --> 01:04:20,210
It looks like this.

1145
01:04:20,210 --> 01:04:23,040
I can ask a question
again, like what is Flask?

1146
01:04:23,040 --> 01:04:26,010
It will start answering my question.

1147
01:04:26,010 --> 01:04:32,160
If I don't like the traditional
way the Duck presenting me answer,

1148
01:04:32,160 --> 01:04:34,530
I can also switch to a chat mode.

1149
01:04:34,530 --> 01:04:37,227
So it's more user friendly, I suppose.

1150
01:04:37,227 --> 01:04:39,060
If you don't like the
light theme-- you want

1151
01:04:39,060 --> 01:04:41,700
to personalize the
Duck-- you can also put--

1152
01:04:41,700 --> 01:04:43,950
you can also pick a
cupcake theme perhaps

1153
01:04:43,950 --> 01:04:47,530
because, in cupcake, we also handle
a lot of cupcake in our CS50 lecture.

1154
01:04:47,530 --> 01:04:51,690
So we want to incorporate this
Easter egg into the Duck as well.

1155
01:04:51,690 --> 01:04:55,470
We also starting to
experimenting how we can better

1156
01:04:55,470 --> 01:04:57,660
assess student with the help of AI.

1157
01:04:57,660 --> 01:05:00,820
So we start incorporating code
review and pair programming.

1158
01:05:00,820 --> 01:05:03,960
This is still in development.

1159
01:05:03,960 --> 01:05:05,310
Things are subject to change.

1160
01:05:05,310 --> 01:05:07,210
But this is the direction
we are heading on.

1161
01:05:07,210 --> 01:05:10,420
So we would like to give you
a sneak peek of what it looks

1162
01:05:10,420 --> 01:05:13,900
like for the newer version of the Duck.

1163
01:05:13,900 --> 01:05:20,347
So with that, I'm going to hand
over to our next amazing presenters.

1164
01:05:20,347 --> 01:05:22,180
JULIANNA ZHAO: All
right, so today I'm going

1165
01:05:22,180 --> 01:05:24,190
to be presenting some
of the new ideas we've

1166
01:05:24,190 --> 01:05:27,730
been having on the team with how we can
improve the AI tools that we present

1167
01:05:27,730 --> 01:05:28,850
to our students.

1168
01:05:28,850 --> 01:05:31,990
So when we started thinking about
future work, our first question

1169
01:05:31,990 --> 01:05:35,720
that we wanted to address was, what
else can we offer with AI tools?

1170
01:05:35,720 --> 01:05:37,780
Because we were quite
sure that we had not

1171
01:05:37,780 --> 01:05:40,810
exhausted all the ways in
which AI can make our learning

1172
01:05:40,810 --> 01:05:42,820
experience better for our students.

1173
01:05:42,820 --> 01:05:45,310
So the first thing that
we recognized is that,

1174
01:05:45,310 --> 01:05:49,600
in previous CS50
pedagogical reasoning, we

1175
01:05:49,600 --> 01:05:53,260
know that ideas like think, pair, share,
where we have our students in section

1176
01:05:53,260 --> 01:05:56,170
turn to each other and talk and
discuss what they're learning,

1177
01:05:56,170 --> 01:06:01,100
or with what David mentioned
earlier with rubber duck debugging,

1178
01:06:01,100 --> 01:06:04,660
we know that when we have
students express what they believe

1179
01:06:04,660 --> 01:06:07,150
and what their learning
process is, then they're

1180
01:06:07,150 --> 01:06:09,203
able to have a better
learning experience.

1181
01:06:09,203 --> 01:06:11,620
And this is something that I'm
sure all of our instructors

1182
01:06:11,620 --> 01:06:15,660
are also aware of in the way that we
teach our students computer science.

1183
01:06:15,660 --> 01:06:19,160
So knowing that we want to
have more student engagement,

1184
01:06:19,160 --> 01:06:23,330
how can we make that
more accessible with AI?

1185
01:06:23,330 --> 01:06:25,970
So the first idea that
we had is maybe we

1186
01:06:25,970 --> 01:06:28,790
can create a duck pair
programmer where we

1187
01:06:28,790 --> 01:06:30,740
have-- the student were given a prompt.

1188
01:06:30,740 --> 01:06:34,310
A student sends code and
their question to a duck,

1189
01:06:34,310 --> 01:06:36,450
and the duck gives them advice.

1190
01:06:36,450 --> 01:06:40,790
So this would be a tool for practice,
not really for homework or assessment

1191
01:06:40,790 --> 01:06:43,970
but rather just a way a student
can get their current work

1192
01:06:43,970 --> 01:06:47,360
and their current train of
reasoning assessed by an AI,

1193
01:06:47,360 --> 01:06:49,970
where they don't have
to go to a human TF,

1194
01:06:49,970 --> 01:06:51,928
whether that's because
that's not as accessible

1195
01:06:51,928 --> 01:06:54,387
or because they're a little
stressed, and they don't really

1196
01:06:54,387 --> 01:06:55,490
want to go as a beginner.

1197
01:06:55,490 --> 01:06:58,190
This would be a much
easier way for a student

1198
01:06:58,190 --> 01:07:02,560
to practice with their own
time and their own laptop.

1199
01:07:02,560 --> 01:07:07,210
So I'll give a little demo
with what we have so far.

1200
01:07:07,210 --> 01:07:09,850
So the first thing
that we have is instead

1201
01:07:09,850 --> 01:07:19,360
of going to chat, if we go
to pair program and student,

1202
01:07:19,360 --> 01:07:22,870
then with a prompt like
this, where a student is

1203
01:07:22,870 --> 01:07:25,360
to write a Python function
called check_password

1204
01:07:25,360 --> 01:07:28,953
that checks that a string
password has four conditions,

1205
01:07:28,953 --> 01:07:30,870
then maybe they start
off something like this.

1206
01:07:30,870 --> 01:07:39,050


1207
01:07:39,050 --> 01:07:42,530
And before they even continue,
they want to make sure

1208
01:07:42,530 --> 01:07:45,810
that this is the correct
setup of this function.

1209
01:07:45,810 --> 01:07:47,960
So return true or false.

1210
01:07:47,960 --> 01:07:51,580
And so they might ask the
duck, is this setup correct?

1211
01:07:51,580 --> 01:07:59,470


1212
01:07:59,470 --> 01:08:02,650
I'll zoom in a little so
that it's easier to see.

1213
01:08:02,650 --> 01:08:06,310
So the thing that we want to ensure
with the Duck pair programmer

1214
01:08:06,310 --> 01:08:09,280
is that the Duck only
answers the question

1215
01:08:09,280 --> 01:08:12,940
that the student wants at this
moment because maybe the student just

1216
01:08:12,940 --> 01:08:15,250
wants to confirm that
they're on the right track.

1217
01:08:15,250 --> 01:08:18,283
So then as a teaching
fellow or as an instructor,

1218
01:08:18,283 --> 01:08:20,950
you would want to make sure that
you don't blurt out the answer.

1219
01:08:20,950 --> 01:08:25,000
So without doing any prompt engineering,
if this is all that we give the AI--

1220
01:08:25,000 --> 01:08:28,670
we tell that this is the prompt
that this is what the student has,

1221
01:08:28,670 --> 01:08:30,520
and this is the
student's question-- what

1222
01:08:30,520 --> 01:08:33,850
we found is that the AI
essentially answers the question

1223
01:08:33,850 --> 01:08:35,990
but then also blurts
out the exact answer.

1224
01:08:35,990 --> 01:08:38,649
It writes the snippet of code,
the entire function that's

1225
01:08:38,649 --> 01:08:40,520
necessary to complete this assignment.

1226
01:08:40,520 --> 01:08:44,870
And for students coming to this tool or
to a teaching instructor in good faith,

1227
01:08:44,870 --> 01:08:47,415
trying to practice, that
wouldn't be helpful at all

1228
01:08:47,415 --> 01:08:49,250
because this is the
answer that they want.

1229
01:08:49,250 --> 01:08:51,819
They just want to confirm that
they're on the right track.

1230
01:08:51,819 --> 01:08:56,420
So now maybe the student can
continue setting up a few conditions.

1231
01:08:56,420 --> 01:09:06,910
So they recognize that they need upper
equals false, lower equals false,

1232
01:09:06,910 --> 01:09:09,950
and then symbol equals false.

1233
01:09:09,950 --> 01:09:12,670
And then maybe the
student sees that I have

1234
01:09:12,670 --> 01:09:15,279
to iterate through all the
characters in that password.

1235
01:09:15,279 --> 01:09:18,090
So they start writing a for
loop, for char in password.

1236
01:09:18,090 --> 01:09:21,050


1237
01:09:21,050 --> 01:09:23,450
And then they run into a
problem because they don't know

1238
01:09:23,450 --> 01:09:25,500
how to check if something is uppercase.

1239
01:09:25,500 --> 01:09:37,550
So if upper, then set upper to
true, same for all other conditions,

1240
01:09:37,550 --> 01:09:46,582
and then return true
if everything is true.

1241
01:09:46,582 --> 01:09:48,790
And right now they have a
little bit more pseudocode.

1242
01:09:48,790 --> 01:09:50,649
They're working towards
their final answer.

1243
01:09:50,649 --> 01:09:53,109
And they want to confirm
with the instructor

1244
01:09:53,109 --> 01:09:54,760
that they're still on the right track.

1245
01:09:54,760 --> 01:09:57,525
So am I on the right track?

1246
01:09:57,525 --> 01:10:00,160


1247
01:10:00,160 --> 01:10:11,290
Also, how do I check if
a character is uppercase?

1248
01:10:11,290 --> 01:10:15,550
You can easily imagine a student coming
to you with this current status for code

1249
01:10:15,550 --> 01:10:17,090
and then this simple question.

1250
01:10:17,090 --> 01:10:24,720
And so sending this, the pair
programmer would say this,

1251
01:10:24,720 --> 01:10:26,800
so, yes, you are on the right track--

1252
01:10:26,800 --> 01:10:31,360
here's a little encouragement-- and
then answers specifically that question.

1253
01:10:31,360 --> 01:10:34,980
If we were to give this code and then
this question without any prompting,

1254
01:10:34,980 --> 01:10:38,910
without telling the AI not to give more
than exactly what the student needs

1255
01:10:38,910 --> 01:10:43,065
at this moment, then once
again, the AI would just say--

1256
01:10:43,065 --> 01:10:46,080
it would just complete the code
is upper, is lower, and then

1257
01:10:46,080 --> 01:10:49,506
four character in and
the list of symbols.

1258
01:10:49,506 --> 01:10:54,660
So we are making this as a practicing
tool, assuming that students are coming

1259
01:10:54,660 --> 01:10:58,270
and then seeking a replica of
student, teacher interactions.

1260
01:10:58,270 --> 01:11:02,520
And this is more accessible
because, once again, every student

1261
01:11:02,520 --> 01:11:03,940
can do this at their own time.

1262
01:11:03,940 --> 01:11:07,800
And so we'll always have some
simulation of a one-to-one relationship

1263
01:11:07,800 --> 01:11:09,810
between students and teachers.

1264
01:11:09,810 --> 01:11:14,100
So where exactly can we incorporate
this into the CS50 curriculum

1265
01:11:14,100 --> 01:11:17,380
or into any other CS curriculum
that you're thinking about.

1266
01:11:17,380 --> 01:11:20,670
So we try to have practice as
something that we encourage

1267
01:11:20,670 --> 01:11:24,250
our students to do or even mandate
our students to do in our current CS50

1268
01:11:24,250 --> 01:11:25,040
curriculum.

1269
01:11:25,040 --> 01:11:27,490
You're probably familiar
with something like this.

1270
01:11:27,490 --> 01:11:30,400
This is currently the first
part of every pset, where

1271
01:11:30,400 --> 01:11:34,000
there's a problem that students can walk
through with the solution given to them

1272
01:11:34,000 --> 01:11:36,200
and then just submit the
answer at the very end.

1273
01:11:36,200 --> 01:11:38,890
These were referred to as labs pre 2022.

1274
01:11:38,890 --> 01:11:40,700
But now they're just a part of the pset.

1275
01:11:40,700 --> 01:11:44,800
So if we click into one of them,
which is scratched from pset 2,

1276
01:11:44,800 --> 01:11:47,110
this is what the students see now.

1277
01:11:47,110 --> 01:11:50,260
We have the prompt and then a demo.

1278
01:11:50,260 --> 01:11:54,090
And they can go through and
click for the advice and hints.

1279
01:11:54,090 --> 01:11:56,020
And this is great
because it gives students

1280
01:11:56,020 --> 01:11:59,620
easy access to code that they
can look to in completing

1281
01:11:59,620 --> 01:12:01,160
their psets in the future.

1282
01:12:01,160 --> 01:12:05,720
But this is still a bit too easy
because there's no push back.

1283
01:12:05,720 --> 01:12:08,110
There's no questioning
that a teacher would

1284
01:12:08,110 --> 01:12:12,060
give a student if they were to walk
that student through a similar problem.

1285
01:12:12,060 --> 01:12:14,560
If students want to seek that
out instead of the experience

1286
01:12:14,560 --> 01:12:16,870
that they have right
now, we can offer them

1287
01:12:16,870 --> 01:12:19,330
something with a Duck pair programmer.

1288
01:12:19,330 --> 01:12:21,255
So we have this right now at lab.

1289
01:12:21,255 --> 01:12:25,170


1290
01:12:25,170 --> 01:12:30,820
So for the same question that
we just saw, which is Scrabble.

1291
01:12:30,820 --> 01:12:34,550
Let's say a student
currently has this much code.

1292
01:12:34,550 --> 01:12:41,610


1293
01:12:41,610 --> 01:12:45,250
And then we want to ask
the Duck for advice.

1294
01:12:45,250 --> 01:12:48,570
So, first, the student
might say, "I'm stuck.

1295
01:12:48,570 --> 01:13:00,215
I'm not sure how I can figure out
how many points each letter gets."

1296
01:13:00,215 --> 01:13:03,820


1297
01:13:03,820 --> 01:13:12,160
And if we ask the Duck, the
Duck gives this as an answer.

1298
01:13:12,160 --> 01:13:15,810
So it answers the question
of-- you can use ASCII values.

1299
01:13:15,810 --> 01:13:18,990
And then you can subtract
uppercase A or lowercase a.

1300
01:13:18,990 --> 01:13:22,160
So what the Duck doesn't
say is, for example, you

1301
01:13:22,160 --> 01:13:24,350
can iterate through the
string as an array, which

1302
01:13:24,350 --> 01:13:27,120
is another thing that student
is trying to learn in this week.

1303
01:13:27,120 --> 01:13:29,030
The Duck also doesn't
say that you can put

1304
01:13:29,030 --> 01:13:32,180
that in a function to abstract
the entire thing and make sure

1305
01:13:32,180 --> 01:13:35,450
you have better design because we're
hoping that the student can figure that

1306
01:13:35,450 --> 01:13:38,190
out in the future as they
work through the problem.

1307
01:13:38,190 --> 01:13:48,510
Now, of course, the student might just
ask the Duck, "Give me the solution,"

1308
01:13:48,510 --> 01:13:52,110
all right, in which case the Duck says
that it won't provide the solution.

1309
01:13:52,110 --> 01:13:54,790
But it'll give a more
in depth walkthrough.

1310
01:13:54,790 --> 01:13:59,670
So for this, we don't really even
worry that much about the student

1311
01:13:59,670 --> 01:14:03,695
asking directly for the solution because
they have the solution on the website.

1312
01:14:03,695 --> 01:14:05,470
We already gave that to them.

1313
01:14:05,470 --> 01:14:09,390
Instead, we're trying to add
blocks so that when the student is

1314
01:14:09,390 --> 01:14:12,270
coming without asking for the
solution, the Duck won't just

1315
01:14:12,270 --> 01:14:14,220
give it away directly.

1316
01:14:14,220 --> 01:14:16,560
And this ensures that--

1317
01:14:16,560 --> 01:14:18,700
this is exactly what
an instructor would do.

1318
01:14:18,700 --> 01:14:21,600
And so by using AI to try
to simulate such a thing,

1319
01:14:21,600 --> 01:14:23,910
we can make sure that
in the future students

1320
01:14:23,910 --> 01:14:27,310
have this option of having
a better experience.

1321
01:14:27,310 --> 01:14:32,760
And so finally, we are turning something
like this where the student just clicks

1322
01:14:32,760 --> 01:14:35,730
through and reveals the
instruction at each step

1323
01:14:35,730 --> 01:14:39,240
into something like this, a more
conversation-based and accurate

1324
01:14:39,240 --> 01:14:43,030
representation of interactions
between students and teachers.

1325
01:14:43,030 --> 01:14:47,540
And so finally, we had the idea
of what if we did it in reverse?

1326
01:14:47,540 --> 01:14:51,145
What if we had given a prompt
the Duck giving the student

1327
01:14:51,145 --> 01:14:55,330
student-like code and questions
and the student in response

1328
01:14:55,330 --> 01:14:59,360
giving advice, which would
also be a good way of practice.

1329
01:14:59,360 --> 01:15:01,930
But it could also be a good
way of seeing exactly how

1330
01:15:01,930 --> 01:15:03,730
much the student
understands of what we're

1331
01:15:03,730 --> 01:15:07,720
trying to teach because when
students themselves are able to teach

1332
01:15:07,720 --> 01:15:11,020
a subject to someone else, we know
they've come pretty close to mastery

1333
01:15:11,020 --> 01:15:12,160
of the subject.

1334
01:15:12,160 --> 01:15:19,770
And so by maneuvering
a little bit more, we

1335
01:15:19,770 --> 01:15:24,730
have this, where this is
the same prompt as before,

1336
01:15:24,730 --> 01:15:27,230
the Python function that
checks for the four conditions.

1337
01:15:27,230 --> 01:15:29,785
And we can ask the
duck for its own code.

1338
01:15:29,785 --> 01:15:34,660


1339
01:15:34,660 --> 01:15:38,230
OK, so the AI says, here's my code.

1340
01:15:38,230 --> 01:15:40,330
And then here's my first attempt.

1341
01:15:40,330 --> 01:15:43,840
So this is a little bit
of my prompt engineering

1342
01:15:43,840 --> 01:15:46,532
not being as good as it should
be because I told the Duck

1343
01:15:46,532 --> 01:15:47,740
to make some common mistakes.

1344
01:15:47,740 --> 01:15:50,323
And it's admitting right now
that it's making common mistakes.

1345
01:15:50,323 --> 01:15:52,170
Hopefully we can avoid
that in the future.

1346
01:15:52,170 --> 01:15:55,480
But it asks the student,
what do you think?

1347
01:15:55,480 --> 01:15:57,130
And the student might take a look.

1348
01:15:57,130 --> 01:16:00,880
You might think a little bit about how
the design of this code can be improved

1349
01:16:00,880 --> 01:16:06,560
and then in response
say, "It looks great.

1350
01:16:06,560 --> 01:16:17,600
However, I don't think you have to
iterate through the entire string

1351
01:16:17,600 --> 01:16:23,000
once for each condition.

1352
01:16:23,000 --> 01:16:30,010
You can just do it in one iteration."

1353
01:16:30,010 --> 01:16:38,060
And given this advice, the Duck
makes its improvement and says,

1354
01:16:38,060 --> 01:16:40,470
good point, how about now?

1355
01:16:40,470 --> 01:16:44,340
And with this, we have an
entire log of the Duck's code

1356
01:16:44,340 --> 01:16:47,260
as well as the
student-Duck conversation.

1357
01:16:47,260 --> 01:16:50,910
And so as a teacher, seeing
this would give us a good idea

1358
01:16:50,910 --> 01:16:53,110
of exactly where our students are.

1359
01:16:53,110 --> 01:16:56,010
So we're not looking at this
as assessment for grades

1360
01:16:56,010 --> 01:17:01,620
but rather as evaluating
students beyond just

1361
01:17:01,620 --> 01:17:04,710
looking at psets because the
thing about CS assessment

1362
01:17:04,710 --> 01:17:08,520
is it often just sticks to correctness
and looking at the final result, which

1363
01:17:08,520 --> 01:17:10,470
is very important for computer science.

1364
01:17:10,470 --> 01:17:13,720
But in reality, there should be so much
more that we're teaching our students.

1365
01:17:13,720 --> 01:17:16,560
So this is not exactly an
evaluation of correctness.

1366
01:17:16,560 --> 01:17:18,750
But instead it's an
assessment of everything else

1367
01:17:18,750 --> 01:17:21,000
that we value in the teaching process.

1368
01:17:21,000 --> 01:17:23,860
But with that, I'm basically
ending my section of the slideshow.

1369
01:17:23,860 --> 01:17:28,680
So our hope is that through AI tools
we can offer better tools for practice

1370
01:17:28,680 --> 01:17:31,060
and assessment in CS education.

1371
01:17:31,060 --> 01:17:33,210
And this is only one
chunk of the improvements

1372
01:17:33,210 --> 01:17:36,240
that we're adding to
the AI Duck this summer.

1373
01:17:36,240 --> 01:17:39,750
And with that, I'll pass it off to
Chris to talk about some more features

1374
01:17:39,750 --> 01:17:41,280
that we're going to be adding.

1375
01:17:41,280 --> 01:17:42,613
CHRISTOPHER PEREZ: Hi, everyone.

1376
01:17:42,613 --> 01:17:46,060
My name is Chris Perez, and I'm
also a summer intern here at CS50.

1377
01:17:46,060 --> 01:17:48,220
And now we are going to
be transitioning away

1378
01:17:48,220 --> 01:17:51,290
from talking about the
OpenAI Completions API.

1379
01:17:51,290 --> 01:17:54,880
And now we are going to be
talking about the Assistants API.

1380
01:17:54,880 --> 01:17:57,670
So in addition to the
Completions API, CS50

1381
01:17:57,670 --> 01:18:01,360
is also developing prototypes of
educational tools using the Assistants

1382
01:18:01,360 --> 01:18:02,620
API.

1383
01:18:02,620 --> 01:18:06,280
In a brief nutshell, it is
designed for more interactive use

1384
01:18:06,280 --> 01:18:10,930
cases where the AI can act as a virtual
assistant that can both understand

1385
01:18:10,930 --> 01:18:13,250
and manage multi-turn conversations.

1386
01:18:13,250 --> 01:18:17,320
For example, it can handle files,
maintain individual conversations,

1387
01:18:17,320 --> 01:18:21,820
or threads as in OpenAI lingo,
and can call various tools, which

1388
01:18:21,820 --> 01:18:24,070
we'll talk about in a brief moment.

1389
01:18:24,070 --> 01:18:27,760
And with all these new features
that the Assistants API offers us

1390
01:18:27,760 --> 01:18:30,220
in comparison to the
Completions API, this

1391
01:18:30,220 --> 01:18:35,200
motivates us using the Assistants API
to create custom AI assistants that

1392
01:18:35,200 --> 01:18:36,920
can handle diverse tasks.

1393
01:18:36,920 --> 01:18:41,140
For example, it can act as a code
reviewing rubber duck for CS50, which

1394
01:18:41,140 --> 01:18:43,360
we'll see about in a brief moment.

1395
01:18:43,360 --> 01:18:45,730
Before we proceed, though,
I would like to note

1396
01:18:45,730 --> 01:18:48,790
that this feature is still in beta.

1397
01:18:48,790 --> 01:18:51,010
So it is still in
continuous development.

1398
01:18:51,010 --> 01:18:55,510
And there are also relatively very
few tutorials and documentation.

1399
01:18:55,510 --> 01:18:57,370
And it can be quite vague.

1400
01:18:57,370 --> 01:19:00,100
But nonetheless, its features
can still be utilized.

1401
01:19:00,100 --> 01:19:03,940
And they are very useful,
especially in the educational realm.

1402
01:19:03,940 --> 01:19:09,220
Yes, so now we are going to be
talking about the brief framework

1403
01:19:09,220 --> 01:19:14,000
and flow of the Assistants API so we
understand how it works behind the hood.

1404
01:19:14,000 --> 01:19:17,020
So we start off with
creating an assistant.

1405
01:19:17,020 --> 01:19:20,200
And in order to create an assistant,
we can help define the model

1406
01:19:20,200 --> 01:19:25,720
that we want to use, its instructions,
as well as various tools, for example,

1407
01:19:25,720 --> 01:19:28,660
like code interpreter, file
search, and function calling, which

1408
01:19:28,660 --> 01:19:30,380
we'll talk about in a brief moment.

1409
01:19:30,380 --> 01:19:33,770
You can also give the model
a name, a description,

1410
01:19:33,770 --> 01:19:36,220
also other features such
as a temperature, which,

1411
01:19:36,220 --> 01:19:41,150
for those unfamiliar, temperature helps
control how deterministic an output is.

1412
01:19:41,150 --> 01:19:45,710
So in the Assistants API, the values
for temperature ranges from 0 to 2,

1413
01:19:45,710 --> 01:19:50,830
where 0 is closer, is more utilized if
you want more deterministic outputs.

1414
01:19:50,830 --> 01:19:53,950
And values closer to 2 make
the outputs a lot more random,

1415
01:19:53,950 --> 01:19:58,540
which could be beneficial in cases
where the model would like to produce

1416
01:19:58,540 --> 01:20:01,720
certain writing assignments, et cetera.

1417
01:20:01,720 --> 01:20:04,030
On the right hand side
of the framework, we also

1418
01:20:04,030 --> 01:20:07,910
have what's called a thread when
a user starts a conversation.

1419
01:20:07,910 --> 01:20:11,680
So in this thread, the thread consists
of the user message as well as

1420
01:20:11,680 --> 01:20:13,150
the assistant message.

1421
01:20:13,150 --> 01:20:17,300
And here you can also specify
tools to be used within the thread.

1422
01:20:17,300 --> 01:20:21,910
And then you can also store
metadata within the thread object.

1423
01:20:21,910 --> 01:20:26,950
And now we can also create a message and
then append it to this specific thread

1424
01:20:26,950 --> 01:20:29,000
as the user asks questions.

1425
01:20:29,000 --> 01:20:31,480
And so as we proceed by
calling the Assistant API,

1426
01:20:31,480 --> 01:20:34,180
this thread is continually
updated with a chat history

1427
01:20:34,180 --> 01:20:36,140
that we can refer to later on.

1428
01:20:36,140 --> 01:20:39,610
And then finally, once we
append a message to our thread,

1429
01:20:39,610 --> 01:20:43,120
we can then run the assistant to
generate a message and a response

1430
01:20:43,120 --> 01:20:45,670
by calling the model
and using its tools.

1431
01:20:45,670 --> 01:20:48,100
I would like to note that
the assistant also supports

1432
01:20:48,100 --> 01:20:50,620
streaming, which for
those unfamiliar, it

1433
01:20:50,620 --> 01:20:53,260
helps create a more
ChatGPT-like experience,

1434
01:20:53,260 --> 01:20:56,470
where the words and responses
outputted by the assistant

1435
01:20:56,470 --> 01:21:00,430
are streamed onto the screen
in a continuous display.

1436
01:21:00,430 --> 01:21:03,650
So you can toggle that on or off.

1437
01:21:03,650 --> 01:21:07,000
The assistant API can also
support a JSON return format.

1438
01:21:07,000 --> 01:21:10,270
So you can specify certain
parameters in a format

1439
01:21:10,270 --> 01:21:12,820
that is outputted by the assistant.

1440
01:21:12,820 --> 01:21:16,390
But I'd like to note that this
is only possible when you're not

1441
01:21:16,390 --> 01:21:18,110
using the code interpreter tool.

1442
01:21:18,110 --> 01:21:22,990
And then finally, in this run, we are
able to store a chat history, which

1443
01:21:22,990 --> 01:21:28,120
is updated dynamically in the thread to
use right now and also for future use

1444
01:21:28,120 --> 01:21:28,875
later on.

1445
01:21:28,875 --> 01:21:32,490


1446
01:21:32,490 --> 01:21:33,310
All right, great.

1447
01:21:33,310 --> 01:21:36,690
So now we are going to be talking a
little bit more about the three tools

1448
01:21:36,690 --> 01:21:38,830
that the Assistants API offers.

1449
01:21:38,830 --> 01:21:43,050
So, first, they offer File Search.

1450
01:21:43,050 --> 01:21:47,190
So File Search enhances the
capabilities of the assistants

1451
01:21:47,190 --> 01:21:49,960
by incorporating knowledge
from external sources,

1452
01:21:49,960 --> 01:21:52,030
such as user-provided documents.

1453
01:21:52,030 --> 01:21:54,330
It's a bit similar to the
embeddings that Rongxin

1454
01:21:54,330 --> 01:21:56,260
talked about briefly a bit earlier.

1455
01:21:56,260 --> 01:22:01,260
So OpenAI processes these documents
by first breaking them into chunks,

1456
01:22:01,260 --> 01:22:05,250
creating and storing embeddings,
and then using both vector

1457
01:22:05,250 --> 01:22:09,940
and keyword search to find relevant
content to answer user questions.

1458
01:22:09,940 --> 01:22:12,720
And then finally, the
file search tool can

1459
01:22:12,720 --> 01:22:16,620
rank these results to find the most
relevant responses before providing

1460
01:22:16,620 --> 01:22:19,800
the user a response to their question.

1461
01:22:19,800 --> 01:22:22,890
Note that for the File Search
tool, however, we must store

1462
01:22:22,890 --> 01:22:26,820
and later access files
from a vector store, which

1463
01:22:26,820 --> 01:22:30,000
is another API endpoint
in OpenAI and then update

1464
01:22:30,000 --> 01:22:33,787
the assistant to the IDs of
the files added to the store.

1465
01:22:33,787 --> 01:22:35,620
This will make a little
bit more sense later

1466
01:22:35,620 --> 01:22:40,900
on when I give a brief coding demo
of how to use the Assistants API.

1467
01:22:40,900 --> 01:22:41,590
Great.

1468
01:22:41,590 --> 01:22:43,340
And so next up, we are
going to be talking

1469
01:22:43,340 --> 01:22:47,350
about the Code Interpreter, which is the
most relevant tool that we are currently

1470
01:22:47,350 --> 01:22:49,940
experimenting with the
Assistants API this summer.

1471
01:22:49,940 --> 01:22:52,390
So the Code Interpreter
tool allows the assistant

1472
01:22:52,390 --> 01:22:55,360
to process files of a
variety of types and formats.

1473
01:22:55,360 --> 01:23:01,220
And it can also generate files that take
in data and images of graphs as inputs.

1474
01:23:01,220 --> 01:23:05,500
It can also be used to write and run
Python code in a sandbox environment

1475
01:23:05,500 --> 01:23:10,090
and can also run code iteratively
to solve code and math problems.

1476
01:23:10,090 --> 01:23:15,590
It is charged at $0.03 per session,
and each session lasts about one hour.

1477
01:23:15,590 --> 01:23:17,710
So after the one hour
is done, another session

1478
01:23:17,710 --> 01:23:21,550
is created if the user still wants to
continue using the Code Interpreter

1479
01:23:21,550 --> 01:23:22,780
tool.

1480
01:23:22,780 --> 01:23:25,930
And to use the Code Interpreter
tool, we attach files.

1481
01:23:25,930 --> 01:23:28,750
Instead of using the
vector store endpoint,

1482
01:23:28,750 --> 01:23:31,390
we now would use the
file upload endpoint.

1483
01:23:31,390 --> 01:23:34,660
And then throughout the process
of creating runs and whatnot,

1484
01:23:34,660 --> 01:23:40,090
we reference the file ID to attach it
to other components of our assistant.

1485
01:23:40,090 --> 01:23:45,050
I would also like to note that we can
pass files both to the assistant level

1486
01:23:45,050 --> 01:23:46,790
and also to the thread level.

1487
01:23:46,790 --> 01:23:48,180
And the difference is--

1488
01:23:48,180 --> 01:23:51,050
so for the attaching files
to the assistant level,

1489
01:23:51,050 --> 01:23:55,010
the files are available to all
threads created under that assistant.

1490
01:23:55,010 --> 01:23:59,630
This may be good for files that want to
be referenced throughout all threads,

1491
01:23:59,630 --> 01:24:02,300
such as course materials more broadly.

1492
01:24:02,300 --> 01:24:05,198
Or, for example, at the thread
level, we can attach files

1493
01:24:05,198 --> 01:24:06,740
that are just run during that thread.

1494
01:24:06,740 --> 01:24:12,380
And this may be better used or is
very purposeful for individual users

1495
01:24:12,380 --> 01:24:15,140
and students who attach their
own code files to the assistant

1496
01:24:15,140 --> 01:24:19,337
that they want the assistant
themselves to run and interpret.

1497
01:24:19,337 --> 01:24:21,920
A great feature about the Code
Interpreter is that it supports

1498
01:24:21,920 --> 01:24:28,850
a variety of file types such as .c,
.doc, HTML, Python, Javascript, CSV,

1499
01:24:28,850 --> 01:24:32,980
and many more other types of files.

1500
01:24:32,980 --> 01:24:35,650
And then last but not
least, Assistant API also

1501
01:24:35,650 --> 01:24:39,100
offers a function calling tool which
allows us to describe functions

1502
01:24:39,100 --> 01:24:42,970
to our assistant and have it return the
functions that need to be called along

1503
01:24:42,970 --> 01:24:44,360
with their arguments.

1504
01:24:44,360 --> 01:24:46,990
Now, this is also useful
because it allows the model

1505
01:24:46,990 --> 01:24:51,160
to interact with and run external
functions, which expands the Assistant

1506
01:24:51,160 --> 01:24:55,280
APIs functions beyond text generation
and language understanding.

1507
01:24:55,280 --> 01:24:59,000


1508
01:24:59,000 --> 01:25:04,100
All right, so now we will be walking
through a little coding demo with Python

1509
01:25:04,100 --> 01:25:07,800
to see just how we could potentially
create an Assistants API.

1510
01:25:07,800 --> 01:25:10,640
I will note that if you want
some more detailed documentation,

1511
01:25:10,640 --> 01:25:14,695
you can visit the OpenAI walkthrough
themselves on their documentation page.

1512
01:25:14,695 --> 01:25:17,540


1513
01:25:17,540 --> 01:25:18,570
All right, great.

1514
01:25:18,570 --> 01:25:22,250
So to begin, we first want
to import the OpenAI library

1515
01:25:22,250 --> 01:25:25,740
and establish a client, which
also requires an API key.

1516
01:25:25,740 --> 01:25:30,290
So before being able to run the
assistants, just as in the Completions

1517
01:25:30,290 --> 01:25:35,090
API, we'll need an API key to be able to
run this and then create a personalized

1518
01:25:35,090 --> 01:25:36,380
assistant.

1519
01:25:36,380 --> 01:25:38,660
So in this creation
for the assistant, we

1520
01:25:38,660 --> 01:25:40,500
can specify the name of the assistant.

1521
01:25:40,500 --> 01:25:43,350
So in this case, it is a CS50 assistant.

1522
01:25:43,350 --> 01:25:46,260
We can also specify instructions.

1523
01:25:46,260 --> 01:25:50,150
So in this case, it is
a CS50 debugging Duck.

1524
01:25:50,150 --> 01:25:53,900
And then we can also
tell it other tools it

1525
01:25:53,900 --> 01:25:57,140
can use, such as Code Interpreter
or the other two tools

1526
01:25:57,140 --> 01:25:58,770
that the assistants can use.

1527
01:25:58,770 --> 01:26:01,790
And then we can also
attach different file ids,

1528
01:26:01,790 --> 01:26:05,390
giving it specific course
content or coding programs

1529
01:26:05,390 --> 01:26:09,390
that the assistant can run.

1530
01:26:09,390 --> 01:26:13,200
Another note I would like to mention is
that when using the code interpreter,

1531
01:26:13,200 --> 01:26:15,480
I find it very helpful
in our experimentations

1532
01:26:15,480 --> 01:26:19,230
thus far to specify in the
instructions themselves

1533
01:26:19,230 --> 01:26:23,250
to use the code interpreter,
the assistant, at the beginning

1534
01:26:23,250 --> 01:26:27,480
so that it can help avoid
hallucinations when referencing

1535
01:26:27,480 --> 01:26:30,480
that file to generate responses.

1536
01:26:30,480 --> 01:26:33,450
And then we also want to attach
files here via the file API,

1537
01:26:33,450 --> 01:26:38,430
as I mentioned briefly before, which can
accept both image URLs and file uploads.

1538
01:26:38,430 --> 01:26:41,580
And then models that we
use for the Assistants API

1539
01:26:41,580 --> 01:26:45,150
that support vision can
also accept image inputs.

1540
01:26:45,150 --> 01:26:48,960
And currently there is a 100-gigabyte
limit for organizations and users

1541
01:26:48,960 --> 01:26:53,500
in terms of file storage.

1542
01:26:53,500 --> 01:26:54,040
Great.

1543
01:26:54,040 --> 01:26:57,130
Next, we can create an empty
thread with our assistant, which

1544
01:26:57,130 --> 01:26:59,920
is helpful in maintaining
context across a session

1545
01:26:59,920 --> 01:27:02,060
between a user and an assistant.

1546
01:27:02,060 --> 01:27:05,320
And just as a great note, there is
no limit to the number of messages

1547
01:27:05,320 --> 01:27:07,180
that we could store in a thread.

1548
01:27:07,180 --> 01:27:11,110
And then once the size of the messages
exceeds the context window of the model,

1549
01:27:11,110 --> 01:27:13,630
the thread will attempt to
smartly truncate messages

1550
01:27:13,630 --> 01:27:18,370
before dropping the other ones
it considers the least important.

1551
01:27:18,370 --> 01:27:20,380
And then we can also
create a thread by passing

1552
01:27:20,380 --> 01:27:26,050
in a message we want added to it, which
can include the file IDs and tools.

1553
01:27:26,050 --> 01:27:29,800
And finally, we store the OpenAI
thread object in the variable thread

1554
01:27:29,800 --> 01:27:33,880
as shown to my left, which we will
help reference throughout the student

1555
01:27:33,880 --> 01:27:35,450
assistant conversation.

1556
01:27:35,450 --> 01:27:38,340


1557
01:27:38,340 --> 01:27:41,220
Next, up we can use the
messages.create function

1558
01:27:41,220 --> 01:27:44,120
to append a new message
to an existing thread.

1559
01:27:44,120 --> 01:27:46,920
For example, here we are
adding an initial user message

1560
01:27:46,920 --> 01:27:49,450
where they can ask for
help on the Mario pset.

1561
01:27:49,450 --> 01:27:53,220
And in this demo, we also show
that they are attaching their code

1562
01:27:53,220 --> 01:27:56,550
and as well as specifying that they
want to use the code interpreter

1563
01:27:56,550 --> 01:27:58,820
tool for this thread.

1564
01:27:58,820 --> 01:28:02,180
And now that we have a message that
we want the assistant to respond to,

1565
01:28:02,180 --> 01:28:06,560
we can now create a run, which takes
in both the thread ID and assistant ID

1566
01:28:06,560 --> 01:28:08,220
that we want to refer to.

1567
01:28:08,220 --> 01:28:11,270
And then this process
returns a run object

1568
01:28:11,270 --> 01:28:14,240
that can help give us more
information on the status of the run

1569
01:28:14,240 --> 01:28:18,240
and the parameters specified
to the thread and assistant.

1570
01:28:18,240 --> 01:28:20,540
Alternatively, you can
use another function

1571
01:28:20,540 --> 01:28:23,270
that can create a thread
and a run in one step.

1572
01:28:23,270 --> 01:28:25,220
But I'm showing it here
in two separate steps

1573
01:28:25,220 --> 01:28:27,845
so that we can both understand
the different types of functions

1574
01:28:27,845 --> 01:28:29,170
that OpenAI offers us.

1575
01:28:29,170 --> 01:28:31,780


1576
01:28:31,780 --> 01:28:35,020
And then finally, after
we create our run,

1577
01:28:35,020 --> 01:28:38,850
we can use a runs retrieve function
to check up on the status of our run.

1578
01:28:38,850 --> 01:28:41,770
We can use the returned object to
determine the status of our run,

1579
01:28:41,770 --> 01:28:46,370
for example, if it's completed, failed,
canceled or expired, for example.

1580
01:28:46,370 --> 01:28:49,720
Once completed, we can retrieve
the messages in the run thread

1581
01:28:49,720 --> 01:28:52,670
and print them out as shown
on the screen to my left.

1582
01:28:52,670 --> 01:28:55,900
Note that this example
is not for streaming.

1583
01:28:55,900 --> 01:28:58,660
But the Assistant AI
does allow for streaming,

1584
01:28:58,660 --> 01:29:01,930
which would change a bit
about how we would manipulate

1585
01:29:01,930 --> 01:29:06,610
the output for the final message, which
would require us to use a RunStepDelta

1586
01:29:06,610 --> 01:29:08,520
object.

1587
01:29:08,520 --> 01:29:10,530
In addition to the code,
we can also experiment

1588
01:29:10,530 --> 01:29:13,320
with the Assistants API in
the Assistants Playgrounds

1589
01:29:13,320 --> 01:29:18,570
on OpenAI's website, which is a bit
more user friendly and less daunting

1590
01:29:18,570 --> 01:29:23,430
because certainly the documentation
can be a little vague at times.

1591
01:29:23,430 --> 01:29:26,400
In terms of pricing, tokens
used for the assistant

1592
01:29:26,400 --> 01:29:31,080
are charged by the specific model that
we specify when we create the assistant.

1593
01:29:31,080 --> 01:29:33,480
And also, for example,
the Code Interpreter

1594
01:29:33,480 --> 01:29:37,170
is charged at $0.03 per session,
which again lasts for one hour.

1595
01:29:37,170 --> 01:29:41,580
File Search is charged at $0.10 per
gigabyte of vector storage per day,

1596
01:29:41,580 --> 01:29:44,950
and you get one gigabyte
of storage for free.

1597
01:29:44,950 --> 01:29:45,880
So great.

1598
01:29:45,880 --> 01:29:49,200
And now with that, I will pass
it off to our other intern, Ben,

1599
01:29:49,200 --> 01:29:52,050
who will be giving us a demo
of exactly how we've been

1600
01:29:52,050 --> 01:29:54,340
able to implement the Assistants API.

1601
01:29:54,340 --> 01:29:56,010
Thank you so much.

1602
01:29:56,010 --> 01:29:57,960
BENJAMIN XU: Hello, my name is Ben.

1603
01:29:57,960 --> 01:30:02,970
I'm the last intern here at
the CS50 R&D development team.

1604
01:30:02,970 --> 01:30:05,910
And I'm very excited to be here
and present on how we've actually

1605
01:30:05,910 --> 01:30:09,880
been utilizing the assistant API
that Chris has just discussed

1606
01:30:09,880 --> 01:30:12,160
in some of our new development.

1607
01:30:12,160 --> 01:30:14,350
Specifically, I'll be
looking at student assessment

1608
01:30:14,350 --> 01:30:18,160
in the age of generative AI,
which Julianna and David have

1609
01:30:18,160 --> 01:30:19,010
touched on a bit.

1610
01:30:19,010 --> 01:30:22,460


1611
01:30:22,460 --> 01:30:26,150
All right, so I'm going to go back to
the very first slide that you saw today.

1612
01:30:26,150 --> 01:30:28,430
And that is because
it's probably something

1613
01:30:28,430 --> 01:30:32,240
that's been on many of our
minds these past year or two.

1614
01:30:32,240 --> 01:30:36,030
Powerful tools like
ChatGPT are too helpful.

1615
01:30:36,030 --> 01:30:38,750
And given this, many of
us educators especially

1616
01:30:38,750 --> 01:30:44,880
wonder, do current metrics of assessment
accurately measure student mastery?

1617
01:30:44,880 --> 01:30:48,030
For example, is passing
every test case in a pset

1618
01:30:48,030 --> 01:30:51,480
enough to accurately predict
that a student fully understands

1619
01:30:51,480 --> 01:30:53,380
the material or skills of the unit?

1620
01:30:53,380 --> 01:30:57,390
And to be honest, if you're like
me and you witnessed ChatGPT bang

1621
01:30:57,390 --> 01:31:00,510
out entire repositories
of code within minutes,

1622
01:31:00,510 --> 01:31:04,680
you may be a bit worried about what
AI could mean for assessment in CS.

1623
01:31:04,680 --> 01:31:06,780
And don't get me wrong,
AI is an amazing tool

1624
01:31:06,780 --> 01:31:09,660
that provides millions of
students with information

1625
01:31:09,660 --> 01:31:11,580
that they wouldn't have otherwise.

1626
01:31:11,580 --> 01:31:14,190
But in terms of skills
and understanding,

1627
01:31:14,190 --> 01:31:17,830
it can cause teachers and students
alike to be a bit uncertain.

1628
01:31:17,830 --> 01:31:22,750


1629
01:31:22,750 --> 01:31:28,520
So, for instance, many questions have
come up recently, the first of being,

1630
01:31:28,520 --> 01:31:31,700
has AI become a crutch to students?

1631
01:31:31,700 --> 01:31:36,790
or whether AI has made
students lazy or the classic,

1632
01:31:36,790 --> 01:31:42,260
how well do I really understand the
material if I'm using AI to help me?

1633
01:31:42,260 --> 01:31:46,340
And lastly, "will AI hurt me
if I use it now in the future?"

1634
01:31:46,340 --> 01:31:49,370
These questions are by
no means easy to answer.

1635
01:31:49,370 --> 01:31:52,890
AI has exploded onto the education scene
faster than we could have imagined.

1636
01:31:52,890 --> 01:31:55,520
And as great of a
resource as AI has been,

1637
01:31:55,520 --> 01:31:58,650
it has definitely posed a
challenge for CS assessments.

1638
01:31:58,650 --> 01:32:02,720
The classic binary pass or fail
test case checks, for example,

1639
01:32:02,720 --> 01:32:04,700
might not be enough for
a teacher or a student

1640
01:32:04,700 --> 01:32:07,210
to know where they're really
at in terms of actual skills.

1641
01:32:07,210 --> 01:32:11,960


1642
01:32:11,960 --> 01:32:14,510
So this past year, our
in-person teaching staff

1643
01:32:14,510 --> 01:32:16,920
came together and
discussed ways around this.

1644
01:32:16,920 --> 01:32:20,180
Is there a way to assess students so
that both students and teachers can

1645
01:32:20,180 --> 01:32:24,470
be confident that the students
understand the course material?

1646
01:32:24,470 --> 01:32:27,210
And this is by no means
the only solution to this.

1647
01:32:27,210 --> 01:32:32,367
But one idea that we implemented at
Yale this past fall was the code review.

1648
01:32:32,367 --> 01:32:35,200
Now, the code review, it's a common
practice in software engineering

1649
01:32:35,200 --> 01:32:38,170
where, in very simple
terms, a programmer

1650
01:32:38,170 --> 01:32:41,020
will review another programmer's code.

1651
01:32:41,020 --> 01:32:42,730
It's a very open-ended process.

1652
01:32:42,730 --> 01:32:46,240
And there can be discussions of anything
from design choices to variable naming

1653
01:32:46,240 --> 01:32:49,170
conventions to even edge case testing.

1654
01:32:49,170 --> 01:32:54,190
So at Yale, we thought, what if each and
every one of these 200 to 250 students

1655
01:32:54,190 --> 01:32:57,870
sat down one on one with a
TA and did a code review?

1656
01:32:57,870 --> 01:33:00,310
If students could thoughtfully
articulate their code

1657
01:33:00,310 --> 01:33:03,130
into English language,
then that in itself

1658
01:33:03,130 --> 01:33:05,248
is a pretty strong
demonstration of mastery.

1659
01:33:05,248 --> 01:33:06,540
So that's actually what we did.

1660
01:33:06,540 --> 01:33:09,550


1661
01:33:09,550 --> 01:33:16,090
During our sections, which is just a
one-a-week, three-hour-long recitation,

1662
01:33:16,090 --> 01:33:19,120
halfway through the semester, we
chose one problem set per student,

1663
01:33:19,120 --> 01:33:22,180
usually one that was maybe
a bit tougher for them,

1664
01:33:22,180 --> 01:33:26,110
and took 10 to 15 minutes with each
student going through their code.

1665
01:33:26,110 --> 01:33:29,600
And this was actually quite effective.

1666
01:33:29,600 --> 01:33:32,335
It was able to tell us that--

1667
01:33:32,335 --> 01:33:36,700
it was able to help teaching assistants
gauge where students were at, i.e,

1668
01:33:36,700 --> 01:33:39,610
which concepts of students
were very solid on and maybe

1669
01:33:39,610 --> 01:33:43,160
which that they implemented
with a bit of guesswork.

1670
01:33:43,160 --> 01:33:45,340
So from there, both us
teachers and students

1671
01:33:45,340 --> 01:33:48,377
could see that, oh, even though
they got all the cases right,

1672
01:33:48,377 --> 01:33:50,710
they could have used some
more practice in this one area

1673
01:33:50,710 --> 01:33:53,160
or this other area, so a big success.

1674
01:33:53,160 --> 01:33:56,530
However, there's an immediate
problem with this type of assessment,

1675
01:33:56,530 --> 01:34:00,710
and it's that code reviews
are very unscalable.

1676
01:34:00,710 --> 01:34:05,980
Even if a student can meet with
a TA for 15 minutes one time,

1677
01:34:05,980 --> 01:34:09,160
what if they wanted to meet on a
different problem set or every problem

1678
01:34:09,160 --> 01:34:16,810
set or some problem sets multiple
times and review them iteratively?

1679
01:34:16,810 --> 01:34:20,040
This is just really unfeasible for TAs.

1680
01:34:20,040 --> 01:34:24,790
And what can really help
here is the Assistants API.

1681
01:34:24,790 --> 01:34:29,260
So with the Assistants API Code
Interpreter, as Chris mentioned,

1682
01:34:29,260 --> 01:34:32,950
a model can easily become
the TA in a code review.

1683
01:34:32,950 --> 01:34:34,790
So this is the interface.

1684
01:34:34,790 --> 01:34:38,290
A student can simply upload
any file that they choose.

1685
01:34:38,290 --> 01:34:41,840
And let's say the student
wants to review Mario.

1686
01:34:41,840 --> 01:34:43,640
So they upload Mario.

1687
01:34:43,640 --> 01:34:47,710
And what happens is their
code instantly shows up

1688
01:34:47,710 --> 01:34:50,200
on the left in a VS Code-like monitor.

1689
01:34:50,200 --> 01:34:52,840
And on the right is the code review.

1690
01:34:52,840 --> 01:34:58,930
So let's start by the student
getting asked a question.

1691
01:34:58,930 --> 01:35:02,140
And it says, hello, let's
dive into your code review.

1692
01:35:02,140 --> 01:35:04,510
I see that you've implemented
a loop to ensure the user

1693
01:35:04,510 --> 01:35:06,500
inputs a height between 1 and 8.

1694
01:35:06,500 --> 01:35:09,422
Can you explain why you chose to
use a do while loop specifically?

1695
01:35:09,422 --> 01:35:12,380
What advantages does it offer in this
scenario compared to other loops?

1696
01:35:12,380 --> 01:35:15,190
And the good thing about this is that
it actually highlights the code for you

1697
01:35:15,190 --> 01:35:15,980
in the editor.

1698
01:35:15,980 --> 01:35:18,610
So you can see in the
context of your file

1699
01:35:18,610 --> 01:35:22,400
where this ddb is
really reviewing you on.

1700
01:35:22,400 --> 01:35:26,390
So let's say you didn't
really know why the

1701
01:35:26,390 --> 01:35:28,585
do while loop was the
best choice here, but you

1702
01:35:28,585 --> 01:35:30,210
knew what your do while loop was doing.

1703
01:35:30,210 --> 01:35:32,877
So you said I wanted to ask the
user for a height once initially

1704
01:35:32,877 --> 01:35:34,880
but then ask again if
the height was invalid.

1705
01:35:34,880 --> 01:35:39,320
So the debugger now reviewing
your code will say something

1706
01:35:39,320 --> 01:35:41,550
along the lines of that's
a great explanation.

1707
01:35:41,550 --> 01:35:44,600
The do while loop works especially
well here for some reason.

1708
01:35:44,600 --> 01:35:49,533
And it also includes a little snippet
on why other loops wouldn't be as good.

1709
01:35:49,533 --> 01:35:52,700
So a while loop would also work, but
it would require some unnecessary code.

1710
01:35:52,700 --> 01:35:54,590
And think about why that would be.

1711
01:35:54,590 --> 01:35:56,000
So then it keeps going.

1712
01:35:56,000 --> 01:35:59,470
So it highlights the
next line and asks you--

1713
01:35:59,470 --> 01:36:02,455
I'm sorry, moving on.

1714
01:36:02,455 --> 01:36:05,080
Can you walk me walk me through
the logic of the outer for loop

1715
01:36:05,080 --> 01:36:06,590
in the nested for loop structure?

1716
01:36:06,590 --> 01:36:08,150
What role do the variables play here?

1717
01:36:08,150 --> 01:36:12,250
And let's say you said, i is a
counter variable, and n is the height.

1718
01:36:12,250 --> 01:36:15,070
So you're just counting
up to the height.

1719
01:36:15,070 --> 01:36:18,490
And the response is
actually pretty insightful.

1720
01:36:18,490 --> 01:36:19,220
That's right.

1721
01:36:19,220 --> 01:36:21,770
In your code, i is a counter
variable, and n is the height.

1722
01:36:21,770 --> 01:36:23,860
However, this can be a
bit confusing for someone

1723
01:36:23,860 --> 01:36:25,940
who hasn't read your code
before or even yourself

1724
01:36:25,940 --> 01:36:27,565
if you came back to this after a while.

1725
01:36:27,565 --> 01:36:30,363
This is why variable naming is
so important in programming.

1726
01:36:30,363 --> 01:36:32,780
Can you think of any better
names for these two variables?

1727
01:36:32,780 --> 01:36:36,640
So now the student can respond with
maybe something like current row

1728
01:36:36,640 --> 01:36:37,550
and height.

1729
01:36:37,550 --> 01:36:42,020
And here the ddb continues the
conversation and keeps going.

1730
01:36:42,020 --> 01:36:45,310
So as you can see, even
from two snippets alone,

1731
01:36:45,310 --> 01:36:48,820
we already have some
pretty important things

1732
01:36:48,820 --> 01:36:53,710
that wouldn't be tested in a
tool like check50 or style50.

1733
01:36:53,710 --> 01:36:57,880
There's one conceptual kink that maybe
the student hadn't even thought of

1734
01:36:57,880 --> 01:37:03,010
and another just very
important convention standard

1735
01:37:03,010 --> 01:37:04,670
that the student was not following.

1736
01:37:04,670 --> 01:37:09,020
So these assessments, like check50
and style50 are great tools.

1737
01:37:09,020 --> 01:37:12,355
But now with the Assistants API, they're
able to get much more nuanced feedback.

1738
01:37:12,355 --> 01:37:19,420


1739
01:37:19,420 --> 01:37:23,990
You may be wondering like, how can these
tools actually be used in an assessment.

1740
01:37:23,990 --> 01:37:27,157
And the way you use such a tool largely
depends on your pedagogical beliefs.

1741
01:37:27,157 --> 01:37:30,240
You've probably noticed that I've been
saying that these concerns about AI

1742
01:37:30,240 --> 01:37:32,120
are not just from teachers
but also students.

1743
01:37:32,120 --> 01:37:35,590
And I myself sometimes question
whether I really gained all the skills

1744
01:37:35,590 --> 01:37:39,770
a class had to offer if an AI had to
help me to get to the end of the goal.

1745
01:37:39,770 --> 01:37:43,540
And, of course, the Assistants API
could be used, like repurposed,

1746
01:37:43,540 --> 01:37:47,230
as a TA that grades the students
based on their responses.

1747
01:37:47,230 --> 01:37:50,913
Or the code review can serve as a
truly valuable self-assessment tool

1748
01:37:50,913 --> 01:37:53,830
where the students can gauge their
own understanding of their own code

1749
01:37:53,830 --> 01:37:56,830
and find things they missed
all in their own free time

1750
01:37:56,830 --> 01:37:59,140
without anxiety of judgment.

1751
01:37:59,140 --> 01:38:01,300
Regardless, this is just
one potential application

1752
01:38:01,300 --> 01:38:03,250
of the Asssitants API to education.

1753
01:38:03,250 --> 01:38:06,610
And the API is still very new,
so I highly encourage you all

1754
01:38:06,610 --> 01:38:09,160
to make something out of it as well.

1755
01:38:09,160 --> 01:38:12,320
RONGXIN LIU: All right, thank
you, Ben, Chris, and Julianna.

1756
01:38:12,320 --> 01:38:16,580
So that's a lot to take
away from this talk.

1757
01:38:16,580 --> 01:38:19,950
But I want you to end this talk
with just a quick takeaway.

1758
01:38:19,950 --> 01:38:23,480
So what can you all do with all of
the technology we just mentioned?

1759
01:38:23,480 --> 01:38:26,390
You definitely don't need
to re-implement what we did.

1760
01:38:26,390 --> 01:38:29,880
Or you certainly don't have the
bandwidth to implement what we did.

1761
01:38:29,880 --> 01:38:30,890
That's OK.

1762
01:38:30,890 --> 01:38:35,000
Luckily, there's a more user-friendly
tool that you can actually utilize,

1763
01:38:35,000 --> 01:38:36,560
which is the OpenAI GPT.

1764
01:38:36,560 --> 01:38:39,150
It's a GUI-based GPT builder.

1765
01:38:39,150 --> 01:38:41,540
You can literally go
talk to this GPT builder

1766
01:38:41,540 --> 01:38:47,960
and build or rebuild or recreate a
CS50 Duck right from OpenAI's platform.

1767
01:38:47,960 --> 01:38:51,530
And you can essentially create
this GPT and distribute it

1768
01:38:51,530 --> 01:38:54,640
to your student to use.

1769
01:38:54,640 --> 01:38:57,430
And for more information on how
you can actually build a GPT,

1770
01:38:57,430 --> 01:39:01,670
I would please look at the link
in the slide for more information.

1771
01:39:01,670 --> 01:39:04,220
But this screenshot basically
captured the process.

1772
01:39:04,220 --> 01:39:08,985
I just quickly was able to create
a CS50 Duck on OpenAI's platform.

1773
01:39:08,985 --> 01:39:12,210


1774
01:39:12,210 --> 01:39:16,350
And with that, I hope you have a
sense of what CS50 AI was actually

1775
01:39:16,350 --> 01:39:17,460
doing underneath the hood.

1776
01:39:17,460 --> 01:39:19,905
And I hope you learned
something from this talk.

1777
01:39:19,905 --> 01:39:21,780
If you want to review
the talk, certainly you

1778
01:39:21,780 --> 01:39:26,070
can just rewatch the video recording and
then look at the provided code to you

1779
01:39:26,070 --> 01:39:28,030
as well.

1780
01:39:28,030 --> 01:39:29,780
With that, I'm going
to end the talk here.

1781
01:39:29,780 --> 01:39:32,150
Thank you so much for your time.

1782
01:39:32,150 --> 01:39:33,000