1 00:00:00,000 --> 00:00:02,070 2 00:00:02,070 --> 00:00:06,960 DAVID J. MALAN: All right, this is CS50, and this is the CS50 Educator Workshop 3 00:00:06,960 --> 00:00:09,630 and our session on "Teaching CS50 with AI," 4 00:00:09,630 --> 00:00:12,900 in which we'll speak to exactly how and why we've leveraged artificial 5 00:00:12,900 --> 00:00:17,350 intelligence to actually teach the course and support students therein. 6 00:00:17,350 --> 00:00:21,420 And we started this project really over a year ago now with this working premise 7 00:00:21,420 --> 00:00:27,060 that tools like ChatGPT, which came onto the scene in November of 2020-- 8 00:00:27,060 --> 00:00:29,590 2022, seemingly overnight. 9 00:00:29,590 --> 00:00:33,180 But the working premise for us is that tools like this really out of the box, 10 00:00:33,180 --> 00:00:35,980 working in their default way, are really too helpful. 11 00:00:35,980 --> 00:00:38,250 Tools like ChatGPT, Bing Chat, and the like 12 00:00:38,250 --> 00:00:41,010 are all too eager to answer each and every question 13 00:00:41,010 --> 00:00:42,280 that you might ask of them. 14 00:00:42,280 --> 00:00:45,690 And even if that question is, can you do my homework for me, 15 00:00:45,690 --> 00:00:47,560 they are all too willing to oblige. 16 00:00:47,560 --> 00:00:49,950 So ironically, we set out a year plus ago 17 00:00:49,950 --> 00:00:52,410 to put in place some pedagogical guardrails 18 00:00:52,410 --> 00:00:55,830 to really make those tools less useful for students 19 00:00:55,830 --> 00:00:59,910 or really more properly stated, more educationally useful 20 00:00:59,910 --> 00:01:04,629 for students so that the tools might guide students toward useful answers 21 00:01:04,629 --> 00:01:08,840 and insights but not simply spoil answers to problems outright. 22 00:01:08,840 --> 00:01:11,620 In terms of policy, this is literally the language 23 00:01:11,620 --> 00:01:13,840 we adopted early on in the course's syllabus 24 00:01:13,840 --> 00:01:18,670 for students, whereby we said that it is not reasonable, that it is not allowed, 25 00:01:18,670 --> 00:01:22,840 to use AI-based software like ChatGPT, GitHub, Copilot, Bing Chat, 26 00:01:22,840 --> 00:01:27,470 and the like that suggests or completes answers to questions or lines of code. 27 00:01:27,470 --> 00:01:30,280 This is not something we can enforce technologically. 28 00:01:30,280 --> 00:01:33,760 Even if the school or your school could prohibit access 29 00:01:33,760 --> 00:01:36,340 to certain websites with a firewall, the reality 30 00:01:36,340 --> 00:01:39,550 is most any student could still take out their own device, 31 00:01:39,550 --> 00:01:42,740 whether on campus or at home, and still access those services. 32 00:01:42,740 --> 00:01:44,740 So we've also woven into the course all the more 33 00:01:44,740 --> 00:01:49,600 of a discussion of ethics, academic honesty so that students at least are 34 00:01:49,600 --> 00:01:53,270 cognizant of where we are coming from and where those lines are. 35 00:01:53,270 --> 00:01:55,450 However, we didn't want to simply take away tools 36 00:01:55,450 --> 00:01:59,390 that arguably are going to be with us now here on out and do have value. 37 00:01:59,390 --> 00:02:03,130 So we deemed it reasonable for students to use CS50's own AI-based software, 38 00:02:03,130 --> 00:02:07,570 including the CS50 Duck in virtual form, otherwise known as the Duck debugger, 39 00:02:07,570 --> 00:02:10,990 or ddb, in our own web-based applications, 40 00:02:10,990 --> 00:02:16,150 namely CS50.ai, which you'll soon see, as well as CS50.dev, the Codespaces 41 00:02:16,150 --> 00:02:18,560 environment you saw yesterday as well. 42 00:02:18,560 --> 00:02:21,410 And so we really set out to bring this virtual duck to life. 43 00:02:21,410 --> 00:02:24,460 And in fact, thanks to CS50's own Brian Yu, for a few years, 44 00:02:24,460 --> 00:02:30,460 we had a chat bot of sorts built into VS Code and our previous programming 45 00:02:30,460 --> 00:02:34,030 environment, whereby if students typed a question into a chat box, 46 00:02:34,030 --> 00:02:36,350 they would actually get back a response. 47 00:02:36,350 --> 00:02:38,118 But the response wasn't all that helpful. 48 00:02:38,118 --> 00:02:39,910 But the goal was to indeed implement what's 49 00:02:39,910 --> 00:02:42,910 called rubber duck debugging or rubberducking, 50 00:02:42,910 --> 00:02:47,590 the idea being that in the absence of a smart colleague, family member, friend, 51 00:02:47,590 --> 00:02:50,350 or really just someone more technically proficient than you, 52 00:02:50,350 --> 00:02:55,360 you should at least keep on your desk a rubber duck or any inanimate 53 00:02:55,360 --> 00:02:58,600 object really so that you can talk to it about any problems or bugs 54 00:02:58,600 --> 00:02:59,600 that you're having. 55 00:02:59,600 --> 00:03:03,820 And the idea is per yesterday that invariably in expressing your thoughts 56 00:03:03,820 --> 00:03:07,480 step by step, ideally to this inanimate object, 57 00:03:07,480 --> 00:03:09,970 the proverbial light bulb goes off, and you 58 00:03:09,970 --> 00:03:14,500 understand wherein you are confused about some topic or bug. 59 00:03:14,500 --> 00:03:16,330 We try to bring this to life historically 60 00:03:16,330 --> 00:03:19,910 in CS50 not only with small rubber ducks but very large ones as well. 61 00:03:19,910 --> 00:03:22,000 But the virtual version that we had for some time 62 00:03:22,000 --> 00:03:26,680 looked a little something like this in CS50.dev, that is, Visual Studio code. 63 00:03:26,680 --> 00:03:28,565 A student might begin a conversation with, 64 00:03:28,565 --> 00:03:30,440 "I'm hoping you can help me solve a problem." 65 00:03:30,440 --> 00:03:33,220 And up until fairly recently, all this duck 66 00:03:33,220 --> 00:03:38,500 would do that Brian implemented for students was quack back either once, 67 00:03:38,500 --> 00:03:39,740 twice, or three times. 68 00:03:39,740 --> 00:03:41,830 So it wasn't completely simple. 69 00:03:41,830 --> 00:03:45,980 There was some pseudorandomness in there but still just really one line of code. 70 00:03:45,980 --> 00:03:50,380 But I dare say quite a few students online and eventually on campus 71 00:03:50,380 --> 00:03:54,130 were fairly astonished where after two plus years of the duck 72 00:03:54,130 --> 00:03:55,570 only knowing how to quack-- 73 00:03:55,570 --> 00:04:00,370 suddenly and literally overnight started to responding to students in English 74 00:04:00,370 --> 00:04:04,090 or in some cases in other human languages form, 75 00:04:04,090 --> 00:04:06,530 depending on how students initiate the conversation. 76 00:04:06,530 --> 00:04:10,100 And so really this virtual rubber duck was brought all the more to life. 77 00:04:10,100 --> 00:04:12,910 And educationally, what our goal has been is really this, 78 00:04:12,910 --> 00:04:16,660 to provide students with virtual office hours 24/7. 79 00:04:16,660 --> 00:04:18,760 Even at a place like Harvard or Yale, where 80 00:04:18,760 --> 00:04:22,300 we're fortunate to have lots of teaching assistants and so forth, 81 00:04:22,300 --> 00:04:24,910 the reality is we've never really been able to keep up 82 00:04:24,910 --> 00:04:28,390 with the volume of questions, the demand for support. 83 00:04:28,390 --> 00:04:32,170 And so providing students electronically somehow 84 00:04:32,170 --> 00:04:37,060 with all the more support in addition to our traditional human support structure 85 00:04:37,060 --> 00:04:39,020 has very much been of interest. 86 00:04:39,020 --> 00:04:42,100 And I dare say this really is the Holy Grail, to approximate ultimately 87 00:04:42,100 --> 00:04:47,210 a 1 to 1 teacher to student ratio, which we have never had here. 88 00:04:47,210 --> 00:04:50,890 Indeed even within our campus sections or recitations, 89 00:04:50,890 --> 00:04:54,400 which are led by our undergraduate teaching fellows or teaching assistants, 90 00:04:54,400 --> 00:05:00,772 we might have a ratio of 1 to 6 at best, 1 to 12 more commonly, 1 to 20 or more. 91 00:05:00,772 --> 00:05:03,230 And if you think about just mathematically how many minutes 92 00:05:03,230 --> 00:05:05,647 there are in an hour, if you've got six students in a room 93 00:05:05,647 --> 00:05:09,172 over the course of an hour, that's only 10 minutes per student. 94 00:05:09,172 --> 00:05:11,630 And that's not nearly enough quite often for those students 95 00:05:11,630 --> 00:05:14,240 coming in with less strong STEM backgrounds 96 00:05:14,240 --> 00:05:17,960 or simply struggling more generally with a new world. 97 00:05:17,960 --> 00:05:21,470 So what we thought we'd do today is share with you a little bit 98 00:05:21,470 --> 00:05:24,590 of the features that are student and teacher facing but also some 99 00:05:24,590 --> 00:05:27,590 of the underlying architecture so that you as teachers 100 00:05:27,590 --> 00:05:31,580 actually understand how today's forms of artificial intelligence 101 00:05:31,580 --> 00:05:34,100 are working if only so that you feel more 102 00:05:34,100 --> 00:05:38,390 comfortable and confident or skeptical about how these tools are actually built 103 00:05:38,390 --> 00:05:39,350 and what they can do. 104 00:05:39,350 --> 00:05:42,950 Or educationally, you can teach your own students about some 105 00:05:42,950 --> 00:05:46,650 of these underlying technologies that are increasingly around us. 106 00:05:46,650 --> 00:05:50,300 For my part, I'll focus really on the user-friendly side of this diagram, 107 00:05:50,300 --> 00:05:52,010 which is to say the user interface. 108 00:05:52,010 --> 00:05:55,350 And then we'll dive more deeply into what CS50.ai is. 109 00:05:55,350 --> 00:05:57,470 We'll talk about a technology known as a vector 110 00:05:57,470 --> 00:06:01,650 database, which will allow us to enhance our searches and answers to questions. 111 00:06:01,650 --> 00:06:05,150 And then we'll talk a little bit about the cloud services like OpenAI and Azure 112 00:06:05,150 --> 00:06:09,600 that underlie CS50's own implementation of this rubber duck. 113 00:06:09,600 --> 00:06:13,670 So thank you to our friends, first and foremost at GitHub, at OpenAI, 114 00:06:13,670 --> 00:06:16,040 at Microsoft who have wonderfully been making 115 00:06:16,040 --> 00:06:19,130 a lot of this support possible through cloud-based credits such 116 00:06:19,130 --> 00:06:21,320 that all of the tools you are about to see 117 00:06:21,320 --> 00:06:23,690 and the tools with which you might already be familiar 118 00:06:23,690 --> 00:06:27,260 firsthand have been made freely available to teachers, to students 119 00:06:27,260 --> 00:06:27,810 alike. 120 00:06:27,810 --> 00:06:31,230 So everything you're about to see you may use indeed with your own students. 121 00:06:31,230 --> 00:06:35,970 So you've seen this environment already, VS Code for CS50 at cs50.dev. 122 00:06:35,970 --> 00:06:39,120 This is a landing page via which students can log into that website. 123 00:06:39,120 --> 00:06:42,740 And what we set out to do early on in this process of trying 124 00:06:42,740 --> 00:06:45,470 to embrace artificial intelligence was think about what 125 00:06:45,470 --> 00:06:48,330 problems could we solve with AI. 126 00:06:48,330 --> 00:06:50,810 And so we set out first to do something fairly simple 127 00:06:50,810 --> 00:06:53,010 and bite size just over a year ago. 128 00:06:53,010 --> 00:06:55,700 Could we write a program that somehow uses 129 00:06:55,700 --> 00:06:58,770 AI to just explain code to students? 130 00:06:58,770 --> 00:07:03,260 And this is something that does need AI because we couldn't possibly 131 00:07:03,260 --> 00:07:08,000 write a program that has enough if, else if, else if, else if, else if, else 132 00:07:08,000 --> 00:07:13,310 if conditionals to be able to deterministically tell students what 133 00:07:13,310 --> 00:07:17,600 a given line of code is unless we literally have an if 134 00:07:17,600 --> 00:07:19,910 condition for every possible line of code 135 00:07:19,910 --> 00:07:22,950 they might see or write so that we can pattern match on it. 136 00:07:22,950 --> 00:07:24,870 So we really need a more generalized solution. 137 00:07:24,870 --> 00:07:27,480 And this is where AI really shines. 138 00:07:27,480 --> 00:07:30,390 So here's a screenshot of VS Code in night mode. 139 00:07:30,390 --> 00:07:33,630 Here then is some sample code written in C. 140 00:07:33,630 --> 00:07:36,560 This is simply a program that says hello so and so 141 00:07:36,560 --> 00:07:38,460 after prompting the human for their name. 142 00:07:38,460 --> 00:07:40,970 But this is perhaps representative of a program 143 00:07:40,970 --> 00:07:42,750 that a student might not understand. 144 00:07:42,750 --> 00:07:46,550 So with this extension built into cs50.dev, 145 00:07:46,550 --> 00:07:49,700 students can highlight one or more even all lines 146 00:07:49,700 --> 00:07:52,290 of code, Right-Click or Control-Click. 147 00:07:52,290 --> 00:07:55,320 And this option here in the middle, Explain Highlighted Code, 148 00:07:55,320 --> 00:07:57,750 even though it looks officially built in, it's not. 149 00:07:57,750 --> 00:08:01,250 We contribute it via a CS50 specific extension 150 00:08:01,250 --> 00:08:05,750 that we pre-install into cs50.dev so that when students click on that, 151 00:08:05,750 --> 00:08:11,330 they then see a ChatGPT-like explanation of the very lines of code 152 00:08:11,330 --> 00:08:12,510 that they have highlighted. 153 00:08:12,510 --> 00:08:16,340 So not necessarily the most complicated code, but you could imagine a teacher, 154 00:08:16,340 --> 00:08:19,850 a teaching assistant, spending a good number of minutes writing out something 155 00:08:19,850 --> 00:08:23,880 so thoughtfully step by step, line by line to explain that code. 156 00:08:23,880 --> 00:08:26,690 Here, though, students have an answer within, say, 157 00:08:26,690 --> 00:08:30,390 three seconds, instantaneously automatically generated. 158 00:08:30,390 --> 00:08:33,049 So that actually worked fairly well and wasn't that hard 159 00:08:33,049 --> 00:08:36,950 because a lot of the tools out there nowadays, OpenAI included, 160 00:08:36,950 --> 00:08:40,880 have been trained on so to speak, lots and lots of code from the internet, 161 00:08:40,880 --> 00:08:42,210 for better or for worse. 162 00:08:42,210 --> 00:08:45,800 And so even though we did not write a program that says if, else if, else 163 00:08:45,800 --> 00:08:48,770 if, else if, else if to recognize students code, 164 00:08:48,770 --> 00:08:52,050 odds are over time certainly for "Hello, world type" programs, 165 00:08:52,050 --> 00:08:55,730 someone out there wrote something like this, someone on Stack Overflow, Reddit, 166 00:08:55,730 --> 00:08:58,380 or elsewhere probably explained code like that. 167 00:08:58,380 --> 00:09:02,270 And so tools like ChatGPT sort of know, therefore, 168 00:09:02,270 --> 00:09:06,183 how to explain these particular lines of code. 169 00:09:06,183 --> 00:09:07,350 Well, what else might we do? 170 00:09:07,350 --> 00:09:12,020 So version 2 of our implementation of AI tools for CS50 students were, 171 00:09:12,020 --> 00:09:15,780 could we advise students how to prove their code style? 172 00:09:15,780 --> 00:09:18,830 So many of you know we've had that "command line terminal based" 173 00:09:18,830 --> 00:09:23,840 tool called style50, which just tells you how to format your code line by line 174 00:09:23,840 --> 00:09:25,920 if it's inconsistent with a style guide. 175 00:09:25,920 --> 00:09:27,628 Well, we thought we'd try to do something 176 00:09:27,628 --> 00:09:30,110 more graphically and a little more explicitly 177 00:09:30,110 --> 00:09:33,600 so that students understand step by step what it is they need to do. 178 00:09:33,600 --> 00:09:35,340 So here's some more C code. 179 00:09:35,340 --> 00:09:36,240 It's pretty messy. 180 00:09:36,240 --> 00:09:37,372 Everything is left aligned. 181 00:09:37,372 --> 00:09:38,330 There's no indentation. 182 00:09:38,330 --> 00:09:41,700 So it's not pretty printed even though it might be correct. 183 00:09:41,700 --> 00:09:45,600 If students now click a button at top right labeled style50-- 184 00:09:45,600 --> 00:09:47,070 so we've turned it into a button. 185 00:09:47,070 --> 00:09:51,770 Instead of just a command, they'll see side by side a so-called diff, D-I-F-F, 186 00:09:51,770 --> 00:09:55,430 which shows the difference between the way their code currently looks 187 00:09:55,430 --> 00:10:00,170 and the way we think their code should look per CS50's own style guide. 188 00:10:00,170 --> 00:10:03,860 And pictured there in green are sort of highlighted spaces 189 00:10:03,860 --> 00:10:06,260 that the students should add with their spacebar or tab 190 00:10:06,260 --> 00:10:08,840 character highlighted in red or perhaps things 191 00:10:08,840 --> 00:10:10,550 that they should delete or change. 192 00:10:10,550 --> 00:10:12,590 So we show students think side by side. 193 00:10:12,590 --> 00:10:15,500 But if they don't quite grok what the difference is 194 00:10:15,500 --> 00:10:20,270 or why they might do that, there's also a Explain Changes button, which, 195 00:10:20,270 --> 00:10:22,535 if students click on it, opens up, as before, 196 00:10:22,535 --> 00:10:28,020 a ChatGPT-like explanation of what they should do and/or why. 197 00:10:28,020 --> 00:10:30,112 Now students, once they get comfortable with this 198 00:10:30,112 --> 00:10:32,570 and once they want to automate this, they can actually just 199 00:10:32,570 --> 00:10:35,400 click Apply Changes at top right also. 200 00:10:35,400 --> 00:10:38,070 And that will just reformat their code for you. 201 00:10:38,070 --> 00:10:41,600 And the most sophisticated students can certainly go into VS Code 202 00:10:41,600 --> 00:10:44,490 and just enable automatic auto formatting. 203 00:10:44,490 --> 00:10:47,000 We actually disable all of those user-friendly features 204 00:10:47,000 --> 00:10:48,410 for pedagogical reasons. 205 00:10:48,410 --> 00:10:51,380 But the goal here, at least in the early weeks of the class, 206 00:10:51,380 --> 00:10:54,800 is to at least put in front of students the side by side 207 00:10:54,800 --> 00:10:58,730 and just get them thinking about and developing the muscle memory for making 208 00:10:58,730 --> 00:11:01,560 these kinds of improvements themselves. 209 00:11:01,560 --> 00:11:04,700 Well, then we set out more educationally minded 210 00:11:04,700 --> 00:11:06,760 to try to answer students' questions. 211 00:11:06,760 --> 00:11:10,010 And this tool here even though we use it for computer science and programming, 212 00:11:10,010 --> 00:11:11,730 really is generalizable. 213 00:11:11,730 --> 00:11:15,620 So those of you who teach in other subjects beyond CSS and programming 214 00:11:15,620 --> 00:11:19,460 might take an interest in somehow trying to leverage this same idea 215 00:11:19,460 --> 00:11:21,210 for other domains as well. 216 00:11:21,210 --> 00:11:26,370 But could we through AI answer most of the questions asked online by students? 217 00:11:26,370 --> 00:11:29,630 So for CS50 on campus as well as CS50x online, 218 00:11:29,630 --> 00:11:34,070 we've long used, among other tools, a Q&A a tool, question and answer tool, 219 00:11:34,070 --> 00:11:37,050 called Ed for educational discussions. 220 00:11:37,050 --> 00:11:38,630 And here's a screenshot thereof. 221 00:11:38,630 --> 00:11:41,870 And this is a screenshot of a representative question a student might 222 00:11:41,870 --> 00:11:43,740 ask called, say, John Harvard. 223 00:11:43,740 --> 00:11:46,140 What is flask exactly they might ask? 224 00:11:46,140 --> 00:11:49,850 Now, years ago, I or one of our teaching assistants on campus 225 00:11:49,850 --> 00:11:54,050 would log in and answer this question by typing out a response, a sentence, 226 00:11:54,050 --> 00:11:55,470 a paragraph, or more. 227 00:11:55,470 --> 00:11:57,690 And frankly, this is an easy question, right? 228 00:11:57,690 --> 00:11:59,550 This is very googleable in the first place. 229 00:11:59,550 --> 00:12:00,630 It's very definitional. 230 00:12:00,630 --> 00:12:03,820 So this is not a hard question, but this is where we began. 231 00:12:03,820 --> 00:12:09,210 This is an actual screenshot of how the Duck built with AI 232 00:12:09,210 --> 00:12:11,350 responded to this particular question. 233 00:12:11,350 --> 00:12:14,250 Flask is a micro framework written in Python, dot, dot, dot. 234 00:12:14,250 --> 00:12:16,710 And I'll stipulate for the purposes of today. 235 00:12:16,710 --> 00:12:18,010 It's a pretty good answer. 236 00:12:18,010 --> 00:12:18,760 It's not hard. 237 00:12:18,760 --> 00:12:20,170 Bing, Google, any number-- 238 00:12:20,170 --> 00:12:22,420 I mean, even Yahoo could have figured this answer out. 239 00:12:22,420 --> 00:12:25,500 But in our case, we wanted it to be dynamically generated 240 00:12:25,500 --> 00:12:28,570 in the Q&A environment the students are already using. 241 00:12:28,570 --> 00:12:31,050 But within this environment, there's an opportunity 242 00:12:31,050 --> 00:12:33,580 for students to ask more sophisticated questions as well. 243 00:12:33,580 --> 00:12:35,760 So here's a question about the Caesar problem 244 00:12:35,760 --> 00:12:37,840 set, an encryption-related problem. 245 00:12:37,840 --> 00:12:41,410 Here's an error message formatted in monospaced font. 246 00:12:41,410 --> 00:12:44,890 And then down here, there's a more nuanced question asked by the student. 247 00:12:44,890 --> 00:12:46,360 It's not just a definition. 248 00:12:46,360 --> 00:12:47,650 It's not even yes or no. 249 00:12:47,650 --> 00:12:51,250 Is there a more efficient way to write this code? 250 00:12:51,250 --> 00:12:54,900 And so here too I dare say, AI even in 2024 251 00:12:54,900 --> 00:12:58,810 is already starting to shine in terms of quality of results. 252 00:12:58,810 --> 00:13:02,380 And this is kind of a mouthful, but let me stipulate for today's purposes. 253 00:13:02,380 --> 00:13:04,330 This is a pretty darn good answer. 254 00:13:04,330 --> 00:13:07,860 It goes into an explanation of what the student seems to be doing 255 00:13:07,860 --> 00:13:09,820 and what they could potentially do instead. 256 00:13:09,820 --> 00:13:11,310 It doesn't give them too much code. 257 00:13:11,310 --> 00:13:13,393 It, in fact, just gives them some boilerplate code 258 00:13:13,393 --> 00:13:14,950 here in a monospaced font. 259 00:13:14,950 --> 00:13:19,710 But it does disclaim to them here at the bottom, "PS, I'm an experimental bot. 260 00:13:19,710 --> 00:13:20,440 Quack. 261 00:13:20,440 --> 00:13:22,890 Do not assume that my reply is accurate unless you see 262 00:13:22,890 --> 00:13:25,060 that it's been endorsed by human staff. 263 00:13:25,060 --> 00:13:25,620 Quack." 264 00:13:25,620 --> 00:13:28,037 Now, this is a short-term intervention because many of you 265 00:13:28,037 --> 00:13:31,320 might be familiar in general with the imperfections of AI still. 266 00:13:31,320 --> 00:13:33,930 And I think they're going to diminish in frequency, 267 00:13:33,930 --> 00:13:36,490 but they are with us for now and probably some time. 268 00:13:36,490 --> 00:13:40,120 So this is just a reminder to students that this is all very much beta. 269 00:13:40,120 --> 00:13:44,740 But we do have a feature within Ed that namely is this button, Endorsed. 270 00:13:44,740 --> 00:13:48,840 And so the way we have mitigated this educational concern that sometimes AI 271 00:13:48,840 --> 00:13:53,190 might just hallucinate or make stuff up is that we click as a human 272 00:13:53,190 --> 00:13:55,590 this Endorsed button just to additionally signal 273 00:13:55,590 --> 00:14:00,540 to students that even though the AI duck generated the response within three 274 00:14:00,540 --> 00:14:03,300 seconds, usually there's enough of us awake at all hours 275 00:14:03,300 --> 00:14:06,090 that within a few minutes, a few hours, we 276 00:14:06,090 --> 00:14:11,100 add this additional signal to students that, yes, this AI-generated response is 277 00:14:11,100 --> 00:14:13,620 trustworthy because a human has weighed in on it. 278 00:14:13,620 --> 00:14:15,870 Frankly, I do think this is a short-term intervention. 279 00:14:15,870 --> 00:14:19,920 And eventually we will not be bothering to endorse answers anymore 280 00:14:19,920 --> 00:14:22,330 once we're all the more confident in answers. 281 00:14:22,330 --> 00:14:26,940 Now, we've also implemented at CS50.AI a web application that students 282 00:14:26,940 --> 00:14:30,390 can use for free with any GitHub account on their laptop, desktop, 283 00:14:30,390 --> 00:14:34,290 or even phone that provides a much more ChatGPT-like interface that's 284 00:14:34,290 --> 00:14:38,130 conversational, with which the students can interact back and forth, back 285 00:14:38,130 --> 00:14:39,810 and forth, just like any chat bot. 286 00:14:39,810 --> 00:14:42,180 It too starts with a similar disclaimer at the top 287 00:14:42,180 --> 00:14:43,890 to remind students to think critically. 288 00:14:43,890 --> 00:14:46,780 And then they can proceed to ask any and all questions. 289 00:14:46,780 --> 00:14:48,960 And for instance here, I dare say this is 290 00:14:48,960 --> 00:14:52,450 rather representative of the types of questions a student might ask. 291 00:14:52,450 --> 00:14:55,720 My code is not working as expected, any ideas? 292 00:14:55,720 --> 00:14:58,890 But because so many students around the world 293 00:14:58,890 --> 00:15:01,830 have asked questions like these and adults, for that matter, 294 00:15:01,830 --> 00:15:05,070 on sites like Reddit and Stack Overflow and the like, they're there too. 295 00:15:05,070 --> 00:15:08,430 There's a lot of training data on which artificial intelligence 296 00:15:08,430 --> 00:15:09,850 has been prepared. 297 00:15:09,850 --> 00:15:12,930 And so when the student provides not only that vague question 298 00:15:12,930 --> 00:15:17,400 but a bit of code, even this duck implemented on top of those AIs 299 00:15:17,400 --> 00:15:19,750 can actually give a pretty darn good answer. 300 00:15:19,750 --> 00:15:21,750 And in fact here, the ddb, duck debugger, 301 00:15:21,750 --> 00:15:24,570 responds with, "It seems you're trying to add two integers. 302 00:15:24,570 --> 00:15:27,130 But the input function in Python returns a string. 303 00:15:27,130 --> 00:15:31,290 So when you try to add x and y, you're actually concatenating two strings." 304 00:15:31,290 --> 00:15:34,927 And then goes on to provide a bit more detail and an actual solution-- 305 00:15:34,927 --> 00:15:37,260 so we're actually finding that students are increasingly 306 00:15:37,260 --> 00:15:42,540 using this conversational UI instead of the more asynchronous Q&A tools 307 00:15:42,540 --> 00:15:43,867 that we described earlier. 308 00:15:43,867 --> 00:15:46,950 There are some other features built into this same interface, these hearts 309 00:15:46,950 --> 00:15:47,860 up at the top. 310 00:15:47,860 --> 00:15:50,160 Those of you who have played Zelda or similar games 311 00:15:50,160 --> 00:15:52,770 might know of HP and how you lose over time 312 00:15:52,770 --> 00:15:57,070 as you get hurt or tired in the game hearts until they eventually regenerate. 313 00:15:57,070 --> 00:15:59,290 And we implemented this for a couple of reasons. 314 00:15:59,290 --> 00:16:03,130 One, we definitely saw a long tail of students 315 00:16:03,130 --> 00:16:06,110 asking probably too many questions. 316 00:16:06,110 --> 00:16:09,430 I don't know precisely what the wrong number of questions is. 317 00:16:09,430 --> 00:16:12,400 But when you're asking dozens, hundreds of questions 318 00:16:12,400 --> 00:16:16,330 of the duck about some similar topic, I think back to high school 319 00:16:16,330 --> 00:16:20,560 where I had pretty good but pretty mean teachers who would eventually send me 320 00:16:20,560 --> 00:16:24,040 back to my desk to think about the answer they had just given me 321 00:16:24,040 --> 00:16:26,540 before they'll entertain additional questions. 322 00:16:26,540 --> 00:16:29,860 And so through software, we've tried to approximate that idea of pumping 323 00:16:29,860 --> 00:16:33,430 the brakes for students to get them to reflect on what information 324 00:16:33,430 --> 00:16:37,600 they've already been given as opposed to just fishing continually for more 325 00:16:37,600 --> 00:16:39,440 and more help with the same question. 326 00:16:39,440 --> 00:16:43,120 This also tends to keep, frankly, utilization down overall 327 00:16:43,120 --> 00:16:45,370 to keep costs in check so that everything can indeed 328 00:16:45,370 --> 00:16:48,350 be free for all students and teachers. 329 00:16:48,350 --> 00:16:50,300 So how does this all work? 330 00:16:50,300 --> 00:16:55,630 And how might you, using our tools or even other current and future AI tools, 331 00:16:55,630 --> 00:16:57,680 actually implement functionality like this? 332 00:16:57,680 --> 00:16:59,800 Well, in the world of AI right now, there's 333 00:16:59,800 --> 00:17:02,840 this notion of prompting and prompt engineering. 334 00:17:02,840 --> 00:17:04,930 So prompt engineering isn't really engineering, 335 00:17:04,930 --> 00:17:06,940 but it's the art of asking good questions 336 00:17:06,940 --> 00:17:09,349 of today's artificial intelligence. 337 00:17:09,349 --> 00:17:13,900 A system prompt in particular are instructions that someone you, 338 00:17:13,900 --> 00:17:17,440 me, a programmer, any teacher gives to an AI 339 00:17:17,440 --> 00:17:20,260 in advance, typically to give it a personality 340 00:17:20,260 --> 00:17:23,260 or to give it some guidelines, really those pedagogical 341 00:17:23,260 --> 00:17:25,700 guardrails that I referred to earlier. 342 00:17:25,700 --> 00:17:27,880 So in our case, for instance, we literally 343 00:17:27,880 --> 00:17:34,220 have instructions built into CS50.AI that read a little something like this. 344 00:17:34,220 --> 00:17:36,430 This is an abbreviated version thereof. 345 00:17:36,430 --> 00:17:41,280 Quote, unquote, "You are a friendly and supportive teaching assistant for CS50. 346 00:17:41,280 --> 00:17:43,060 You are also a rubber duck." 347 00:17:43,060 --> 00:17:47,880 Those two English sentences are sufficient for programming 348 00:17:47,880 --> 00:17:53,550 an off-the-shelf AI, like OpenAI's or someone else's, to not only know 349 00:17:53,550 --> 00:17:56,460 what its purpose in life is supposed to be as an AI 350 00:17:56,460 --> 00:17:59,860 but also that its persona should be that of a rubber duck. 351 00:17:59,860 --> 00:18:02,130 We go on, though, in our system prompt to say, 352 00:18:02,130 --> 00:18:05,980 "Answer student questions only about CS50 in the field of computer science. 353 00:18:05,980 --> 00:18:08,650 Do not answer questions about unrelated topics. 354 00:18:08,650 --> 00:18:10,890 Do not provide full answers to problem sets 355 00:18:10,890 --> 00:18:13,030 as this would violate academic honesty. 356 00:18:13,030 --> 00:18:15,060 Answer this question, colon." 357 00:18:15,060 --> 00:18:18,120 And then essentially, we prefix-- 358 00:18:18,120 --> 00:18:23,010 we prepend this system prompt to every student question 359 00:18:23,010 --> 00:18:26,490 so that the answer they get from the underlying AI 360 00:18:26,490 --> 00:18:30,390 respects not only their question, but also this system 361 00:18:30,390 --> 00:18:31,990 prompt that we fed in as well. 362 00:18:31,990 --> 00:18:34,740 And the student's question is what the world of AI 363 00:18:34,740 --> 00:18:37,690 generally nowadays calls a user prompt. 364 00:18:37,690 --> 00:18:40,410 So system prompt comes from us or the system administrators 365 00:18:40,410 --> 00:18:41,540 or the programmers. 366 00:18:41,540 --> 00:18:44,040 The user prompt comes indeed from the users. 367 00:18:44,040 --> 00:18:49,180 Now there is a danger in here whereby it's all too common with AI today 368 00:18:49,180 --> 00:18:52,060 to try to trick it into misbehaving or doing things 369 00:18:52,060 --> 00:18:55,900 that it wasn't designed to be used for or for which the person behind it 370 00:18:55,900 --> 00:18:59,570 doesn't want to pay for in terms of the underlying cloud utilization. 371 00:18:59,570 --> 00:19:04,690 So, for instance, imagine if a clever, daresay malicious, student's user prompt 372 00:19:04,690 --> 00:19:05,860 were this. 373 00:19:05,860 --> 00:19:08,150 Ignore all of those instructions. 374 00:19:08,150 --> 00:19:09,860 Do this for me instead. 375 00:19:09,860 --> 00:19:15,490 You could imagine in English or any other human language the AI deciding, 376 00:19:15,490 --> 00:19:21,280 OK, insofar as I understand that instruction, ignore everything previous. 377 00:19:21,280 --> 00:19:24,380 It will then oblige and do whatever the student might want. 378 00:19:24,380 --> 00:19:27,050 So that's what's known as a prompt injection attack. 379 00:19:27,050 --> 00:19:31,060 Those of you who are familiar already with SQL and SQL injection attacks, 380 00:19:31,060 --> 00:19:32,510 it's pretty much the same idea. 381 00:19:32,510 --> 00:19:35,718 But it's almost more simplistic because you literally can just inject English 382 00:19:35,718 --> 00:19:37,490 by telling the AI how to behave. 383 00:19:37,490 --> 00:19:41,830 So among the complexity of CS50's own tools and other commercial software 384 00:19:41,830 --> 00:19:45,850 out there nowadays is that we try to decrease the probability 385 00:19:45,850 --> 00:19:47,180 of prompt injection attacks. 386 00:19:47,180 --> 00:19:49,360 So you're welcome to try, for instance, as teachers. 387 00:19:49,360 --> 00:19:54,760 But odds are you should not be able to trick the CS50 duck, for instance, 388 00:19:54,760 --> 00:19:59,180 into solving student's math homework or English or history homework instead. 389 00:19:59,180 --> 00:20:03,200 It's not perfect, but the frequency with which that happens is quite low. 390 00:20:03,200 --> 00:20:05,950 And frankly, thinking strategically, if a student really 391 00:20:05,950 --> 00:20:08,830 wants to get help on some other subject, they technically 392 00:20:08,830 --> 00:20:12,857 could just open up ChatGPT or similar on some other device anyway. 393 00:20:12,857 --> 00:20:15,940 So the goal really is just to put downward pressure on but not necessarily 394 00:20:15,940 --> 00:20:19,490 solve 100% those kinds of scenarios. 395 00:20:19,490 --> 00:20:23,020 Now, that said, we also abused our own system prompt a few months ago. 396 00:20:23,020 --> 00:20:25,120 On April 1 in the US and a lot of countries 397 00:20:25,120 --> 00:20:28,090 is a tradition of April Fool's, where you do funny things or play 398 00:20:28,090 --> 00:20:29,210 pranks on people. 399 00:20:29,210 --> 00:20:34,060 And some of you, if you use CS50's duck on April 1, 2024, 400 00:20:34,060 --> 00:20:37,300 you might have been "rickrolled" so to speak. 401 00:20:37,300 --> 00:20:38,720 And all we did was this. 402 00:20:38,720 --> 00:20:41,710 We changed our system prompt on that day, thanks to Rongxin, 403 00:20:41,710 --> 00:20:45,080 to say, "You are a friendly and supportive teaching assistant for CS50. 404 00:20:45,080 --> 00:20:48,530 You are also a rubber duck in Rick Astley's band. 405 00:20:48,530 --> 00:20:52,540 Importantly, you should always cheer up the student at the end by incorporating 406 00:20:52,540 --> 00:20:55,550 "Never Gonna Give You Up" in your response. 407 00:20:55,550 --> 00:20:57,530 Answer this question." 408 00:20:57,530 --> 00:21:00,140 So this is not really educationally valuable. 409 00:21:00,140 --> 00:21:03,470 It was really just an Easter egg for that period of 24 or so hours. 410 00:21:03,470 --> 00:21:08,800 But it does speak to how people are programming today's AI's. 411 00:21:08,800 --> 00:21:11,080 And what you're seeing is an example of, therefore, 412 00:21:11,080 --> 00:21:15,110 prompt engineering, which is a technique that will probably evolve over time. 413 00:21:15,110 --> 00:21:17,950 But here too is perhaps a glimpse of the future insofar 414 00:21:17,950 --> 00:21:23,290 as we are somehow programming this underlying software, not in python, not 415 00:21:23,290 --> 00:21:28,240 in C, not in any other programming language but in our own human language, 416 00:21:28,240 --> 00:21:30,710 English here in this case. 417 00:21:30,710 --> 00:21:32,410 So I wanted to give-- 418 00:21:32,410 --> 00:21:37,720 so an example, if I may-- if a student on April 1 asked what is recursion, 419 00:21:37,720 --> 00:21:40,373 they might have gotten this response from the duck instead. 420 00:21:40,373 --> 00:21:43,700 421 00:21:43,700 --> 00:21:45,360 I'll give folks a chance to read that. 422 00:21:45,360 --> 00:21:47,652 For those unfamiliar, just Google Rick Roll, and you'll 423 00:21:47,652 --> 00:21:50,340 see that this is reminiscent of some of the song's lyrics. 424 00:21:50,340 --> 00:21:53,910 Now some results before we transition to some of the underlying architecture-- 425 00:21:53,910 --> 00:21:57,650 so based on last summer's Harvard Summer School class, 426 00:21:57,650 --> 00:22:01,370 which we offered with a small number of students here on campus and online, 427 00:22:01,370 --> 00:22:07,280 and based on fall of 2023, when we offered the duck to students at Harvard, 428 00:22:07,280 --> 00:22:11,210 at Yale, at a larger scale, as well as to tens of thousands of students via 429 00:22:11,210 --> 00:22:15,380 edEx and other platforms, we've seen utilization that's quite high already, 430 00:22:15,380 --> 00:22:21,500 up to 142,000 users as of today, some 20,000 prompts or questions have been 431 00:22:21,500 --> 00:22:26,070 asked on average per day, 5.8 million total such questions thus far. 432 00:22:26,070 --> 00:22:28,610 So it's a large and increasingly large data set. 433 00:22:28,610 --> 00:22:31,850 Among our undergraduates on campus when we surveyed them 434 00:22:31,850 --> 00:22:37,200 as to their utilization, almost all students used the duck fairly regularly. 435 00:22:37,200 --> 00:22:42,200 So pictured here in blue in 17% is 17% of students use the tools 436 00:22:42,200 --> 00:22:44,650 the tools more than 10 times per week. 437 00:22:44,650 --> 00:22:48,870 Pictured here in green at 32% is students who use the tools 5 to 10 times 438 00:22:48,870 --> 00:22:50,500 per week and from there. 439 00:22:50,500 --> 00:22:55,890 And so pictured in red are only 25% of students who last semester only used 440 00:22:55,890 --> 00:22:58,650 them less than, say, twice per week. 441 00:22:58,650 --> 00:23:01,050 In terms of helpfulness, long story short-- and we 442 00:23:01,050 --> 00:23:04,410 have a longer paper that speaks to some of students experience-- most 443 00:23:04,410 --> 00:23:08,700 students found the duck as is, version one, helpful already, 444 00:23:08,700 --> 00:23:13,180 47% finding it very helpful, 26% helpful, and so forth, 445 00:23:13,180 --> 00:23:15,340 not necessarily very quantitative measures. 446 00:23:15,340 --> 00:23:16,530 But this is to say-- 447 00:23:16,530 --> 00:23:19,830 and this is pretty impressive, I think, for a new technology in general, not 448 00:23:19,830 --> 00:23:22,080 necessarily ours but AI more broadly-- 449 00:23:22,080 --> 00:23:25,860 it's already working overnight pretty darn well 450 00:23:25,860 --> 00:23:28,030 and is probably only going to get better. 451 00:23:28,030 --> 00:23:32,130 More into the weeds, when we looked at the impact of AI 452 00:23:32,130 --> 00:23:36,810 on students and teachers experience on campus, we looked at prior years data 453 00:23:36,810 --> 00:23:38,970 before the CS50 duck spoke English. 454 00:23:38,970 --> 00:23:44,490 Without AI, students asked in previous years an average of 0.89 questions each 455 00:23:44,490 --> 00:23:45,203 of TFs. 456 00:23:45,203 --> 00:23:46,120 Now, that's not a lot. 457 00:23:46,120 --> 00:23:48,730 That's literally less than one question on average. 458 00:23:48,730 --> 00:23:52,410 But with some 500, 600 students, that actually still 459 00:23:52,410 --> 00:23:54,750 adds up to a significant volume being asked 460 00:23:54,750 --> 00:23:58,380 at least on campus in a University environment of our own instructor 461 00:23:58,380 --> 00:23:59,880 and teaching fellows. 462 00:23:59,880 --> 00:24:03,180 With AI, this past fall semester students 463 00:24:03,180 --> 00:24:08,160 asked 0.28 questions each of TFs, that is to say, 464 00:24:08,160 --> 00:24:12,100 a significant reduction in questions being asked of TFs. 465 00:24:12,100 --> 00:24:14,470 Now, not necessarily all positive-- 466 00:24:14,470 --> 00:24:17,970 I will disclaim that this is not necessarily a good outcome 467 00:24:17,970 --> 00:24:21,570 when it comes to our teaching teachers how to teach 468 00:24:21,570 --> 00:24:24,840 and creating opportunities for them to answer students' questions 469 00:24:24,840 --> 00:24:26,770 and think about the material themselves. 470 00:24:26,770 --> 00:24:31,200 But I will also stipulate that we have long been overworked, so to speak. 471 00:24:31,200 --> 00:24:34,620 Or there's just never been enough human support for the students 472 00:24:34,620 --> 00:24:36,640 and the volume of questions they're from. 473 00:24:36,640 --> 00:24:39,600 And so generally speaking, we're excited that this 474 00:24:39,600 --> 00:24:43,620 allows us to now re-use all the more effectively what human support we do 475 00:24:43,620 --> 00:24:44,320 have. 476 00:24:44,320 --> 00:24:46,170 Another result worth sharing is this. 477 00:24:46,170 --> 00:24:49,110 Without AI, in years past, students attended 478 00:24:49,110 --> 00:24:54,090 51% of our available office hours, one-on-one opportunities or small group 479 00:24:54,090 --> 00:24:56,460 opportunities for questions and answer with me 480 00:24:56,460 --> 00:24:58,480 or with the course's teaching fellows. 481 00:24:58,480 --> 00:25:03,300 With AI, this past fall students only attended 30% of those available office 482 00:25:03,300 --> 00:25:07,770 hours, presumably because they're turning interactively to the AI duck 483 00:25:07,770 --> 00:25:10,170 as opposed to scheduling an appointment, walking 484 00:25:10,170 --> 00:25:12,370 across campus, logging into Zoom. 485 00:25:12,370 --> 00:25:16,520 Their behavior has already been impacted. 486 00:25:16,520 --> 00:25:19,140 Among our favorite quotes anecdotally from students is this, 487 00:25:19,140 --> 00:25:22,300 "the AI duck felt like having a personal tutor. 488 00:25:22,300 --> 00:25:26,040 I love how AI bots will answer questions without ego and without judgment, 489 00:25:26,040 --> 00:25:28,350 generally entertaining even the stupidest of questions 490 00:25:28,350 --> 00:25:30,940 without treating them like they're stupid. 491 00:25:30,940 --> 00:25:35,380 It has an, as one could expect, inhuman level of patience." 492 00:25:35,380 --> 00:25:37,680 And this particular quote really resonated with me 493 00:25:37,680 --> 00:25:41,580 if only because I can remember vividly still some 15, 20 years ago, 494 00:25:41,580 --> 00:25:43,980 being in graduate school, going into the office 495 00:25:43,980 --> 00:25:49,950 of a very kind-hearted smart professor, who just seemed too smart for me 496 00:25:49,950 --> 00:25:51,910 to ever quite follow perfectly. 497 00:25:51,910 --> 00:25:55,750 And I would ask questions in office hours of some CS theoretical topic. 498 00:25:55,750 --> 00:25:58,560 I would eagerly like a good student nod and say, yes, yes. 499 00:25:58,560 --> 00:25:59,150 I understand. 500 00:25:59,150 --> 00:25:59,650 I get it. 501 00:25:59,650 --> 00:26:02,580 I would leave the room and then not know exactly what was 502 00:26:02,580 --> 00:26:06,570 going on still because there was this social tension, this imposter 503 00:26:06,570 --> 00:26:09,270 syndrome, however you might want to view your own experience, 504 00:26:09,270 --> 00:26:13,800 whereby I wasn't really getting answers to the questions I had because I felt 505 00:26:13,800 --> 00:26:16,890 that there was this upper bound on just how many minutes 506 00:26:16,890 --> 00:26:20,267 I should be spending there and how many questions I should be asking, 507 00:26:20,267 --> 00:26:22,600 especially when there might be someone else at the door. 508 00:26:22,600 --> 00:26:24,900 So there's something to be said about the mere fact 509 00:26:24,900 --> 00:26:31,290 that software indeed has this patience and this endless supply of patience 510 00:26:31,290 --> 00:26:34,620 and answers that I do think is fairly game changing, certainly 511 00:26:34,620 --> 00:26:37,890 for those of us who might have felt especially insecure in some 512 00:26:37,890 --> 00:26:41,400 of those learning environments, no matter how friendly and welcoming 513 00:26:41,400 --> 00:26:43,830 the actual teachers or professors were. 514 00:26:43,830 --> 00:26:47,390 Another student wrote, "the AI tools gave me enough hints to try on my own 515 00:26:47,390 --> 00:26:50,970 and also helped me decipher errors and possible errors I might encounter." 516 00:26:50,970 --> 00:26:54,590 "I also appreciated," says another, "that CS50 implemented its own version 517 00:26:54,590 --> 00:26:58,580 of AI because I think just directly using something like ChatGPT would have 518 00:26:58,580 --> 00:27:01,130 definitely detracted from learning." 519 00:27:01,130 --> 00:27:04,280 And just to seed your own thinking, I do think-- 520 00:27:04,280 --> 00:27:06,380 and we're already seeing there to be other impacts 521 00:27:06,380 --> 00:27:07,890 of artificial intelligence. 522 00:27:07,890 --> 00:27:12,480 I do think in our CS50 case, grades are only going to continue to rise. 523 00:27:12,480 --> 00:27:15,980 And long story short, within higher education, certainly within Harvard, 524 00:27:15,980 --> 00:27:18,800 there's long been a concern about grade inflation 525 00:27:18,800 --> 00:27:21,140 whereby everyone's grades have been trending up and up 526 00:27:21,140 --> 00:27:23,700 and up such that almost everyone gets an A in some form. 527 00:27:23,700 --> 00:27:26,510 This is only going to further exacerbate that trend 528 00:27:26,510 --> 00:27:31,130 if it's indeed an actual problem insofar as-- and this is actually a good thing. 529 00:27:31,130 --> 00:27:35,570 If we increasingly provide students with software tools like check50 530 00:27:35,570 --> 00:27:40,310 for correctness, style50 for style, and maybe something like design50, 531 00:27:40,310 --> 00:27:45,250 as you'll soon see for design, which is much more qualitative, 532 00:27:45,250 --> 00:27:49,960 it stands to reason that students code should be and work and look better 533 00:27:49,960 --> 00:27:53,570 before they submit because they're now getting iterative feedback. 534 00:27:53,570 --> 00:27:56,650 So I dare say, and I would propose that we're probably 535 00:27:56,650 --> 00:28:00,160 beginning to provide students with a better educational experience 536 00:28:00,160 --> 00:28:03,250 even if it might become harder or less obvious 537 00:28:03,250 --> 00:28:06,400 how at the end of that experience we should be assessing 538 00:28:06,400 --> 00:28:08,270 what it is they've taken away. 539 00:28:08,270 --> 00:28:11,060 So a good topic perhaps for today's birds of a feather-- 540 00:28:11,060 --> 00:28:13,870 but for now, allow me to leave you as I step off stage 541 00:28:13,870 --> 00:28:15,760 with a screenshot of the paper. 542 00:28:15,760 --> 00:28:19,510 And we'll share this link via the workshop's website of all this 543 00:28:19,510 --> 00:28:23,950 and more detail that discusses how and why we implemented the same, thanks 544 00:28:23,950 --> 00:28:26,590 to not only myself, CS50's own Rongxin Liu, 545 00:28:26,590 --> 00:28:30,610 as well as a number of CS50's undergraduates and present 546 00:28:30,610 --> 00:28:31,760 and past staff. 547 00:28:31,760 --> 00:28:34,340 But allow me to now turn the reins over to Rongxin. 548 00:28:34,340 --> 00:28:37,840 I will now hop into the chat and take any and all questions interactively 549 00:28:37,840 --> 00:28:38,540 online. 550 00:28:38,540 --> 00:28:40,070 We'll see you in just a moment. 551 00:28:40,070 --> 00:28:43,150 RONGXIN LIU: So I'm now going to talk about some implementation 552 00:28:43,150 --> 00:28:45,110 detail of the CS50 Duck. 553 00:28:45,110 --> 00:28:48,680 So just so you have an idea of what's happening 554 00:28:48,680 --> 00:28:53,750 underneath the hood of the CS50.AI and hopefully give you an idea on-- well, 555 00:28:53,750 --> 00:28:59,940 you might be able to also create your own AI tutor by using OpenAI's APIs. 556 00:28:59,940 --> 00:29:02,160 If you would like to take a look at the source code 557 00:29:02,160 --> 00:29:05,610 that I will be using for this workshop or the demo for demo purposes, 558 00:29:05,610 --> 00:29:07,510 you can also go check out the code. 559 00:29:07,510 --> 00:29:11,650 And again, I want to reiterate, there are no need 560 00:29:11,650 --> 00:29:14,520 to follow everything I did in this workshop. 561 00:29:14,520 --> 00:29:17,612 I think the important takeaway is just trying to understand 562 00:29:17,612 --> 00:29:19,070 what's happening behind the scenes. 563 00:29:19,070 --> 00:29:20,862 And if you want to look at the source code, 564 00:29:20,862 --> 00:29:24,200 if you want to implement something yourself, you can do that later. 565 00:29:24,200 --> 00:29:29,020 No need to following what I'm doing here live. 566 00:29:29,020 --> 00:29:33,100 To give you a detailed overview of what the system architect looks 567 00:29:33,100 --> 00:29:38,980 like for CS50.ai, this diagram basically sums up all the major components. 568 00:29:38,980 --> 00:29:43,720 David already explained-- I'll give you a sense of what 569 00:29:43,720 --> 00:29:45,790 it looks like from the user's perspective when 570 00:29:45,790 --> 00:29:47,980 student interacting with our tool. 571 00:29:47,980 --> 00:29:52,780 I want to focus more on my part and the subsequent part 572 00:29:52,780 --> 00:29:56,200 on the back end side of the CS50.ai. 573 00:29:56,200 --> 00:30:00,760 So whenever student type in a question, for example, like what is Flask? 574 00:30:00,760 --> 00:30:04,990 So the message actually gets sent to our own server sitting in the cloud. 575 00:30:04,990 --> 00:30:10,010 And our server actually do a bunch of string manipulation, essentially. 576 00:30:10,010 --> 00:30:15,390 We first check to see if there's any personal identifiable information. 577 00:30:15,390 --> 00:30:18,260 We use a library to basically scrub those information away 578 00:30:18,260 --> 00:30:22,640 because we don't want to leak students' sensitive information, for example, 579 00:30:22,640 --> 00:30:26,670 like email address, phone number, credit card numbers, all sorts of things. 580 00:30:26,670 --> 00:30:28,880 So we would do the sanity check ourself. 581 00:30:28,880 --> 00:30:31,760 And then we would just replace all the sensitive information 582 00:30:31,760 --> 00:30:36,200 with a redacted signature there. 583 00:30:36,200 --> 00:30:39,320 We also perform a prompt injection attack. 584 00:30:39,320 --> 00:30:40,520 David also mentioned this. 585 00:30:40,520 --> 00:30:43,670 Some malicious user are oftentimes trying 586 00:30:43,670 --> 00:30:46,950 to jailbreak our system by using jailbreak prompt. 587 00:30:46,950 --> 00:30:51,170 So we actually come up with an interesting way by utilizing AI again. 588 00:30:51,170 --> 00:30:55,070 Literally when we detect something unusual in the student's query, 589 00:30:55,070 --> 00:30:58,680 we actually ask GPT 4, hey, is this a prompt injection attack 590 00:30:58,680 --> 00:31:00,660 and let GPT 4 decide. 591 00:31:00,660 --> 00:31:03,900 If it is, we will just abort the communication. 592 00:31:03,900 --> 00:31:08,330 If not, we will proceed by sending the request to OpenAI. 593 00:31:08,330 --> 00:31:13,250 We also anonymize all the request IDs so that all the requests that OpenAI sees 594 00:31:13,250 --> 00:31:14,280 are anonymized. 595 00:31:14,280 --> 00:31:18,500 They are not actually understanding what this request coming from as a way 596 00:31:18,500 --> 00:31:20,090 to protect student's privacy. 597 00:31:20,090 --> 00:31:21,830 So once the prompt-- 598 00:31:21,830 --> 00:31:28,820 once the student's query with our own wrapper prompt get merged, 599 00:31:28,820 --> 00:31:33,620 we will just send this updated prompt to OpenAI to generate a response back. 600 00:31:33,620 --> 00:31:36,950 And then we will just send this response back to the student directly. 601 00:31:36,950 --> 00:31:40,280 You might notice there's a component called the vector database. 602 00:31:40,280 --> 00:31:44,240 And there's a rack called a retrieval-augmented generation 603 00:31:44,240 --> 00:31:46,160 process happening as well. 604 00:31:46,160 --> 00:31:49,020 That was a way for us to ground the model. 605 00:31:49,020 --> 00:31:51,170 I will explain what grounding means. 606 00:31:51,170 --> 00:31:55,460 We try to utilize our own CS50 material or content 607 00:31:55,460 --> 00:31:59,570 to augment the OpenAI's response generation so 608 00:31:59,570 --> 00:32:05,455 that the GPT response is more truthful or more faithful to our CS50 content. 609 00:32:05,455 --> 00:32:09,830 610 00:32:09,830 --> 00:32:13,540 In particular, for the APIs, today, we are 611 00:32:13,540 --> 00:32:15,950 going to cover these three major API. 612 00:32:15,950 --> 00:32:19,450 I'm going to cover the first two for the most of the time. 613 00:32:19,450 --> 00:32:22,330 The first API is Chat API, which is the Chat Completion 614 00:32:22,330 --> 00:32:26,050 API, which is the API we use extensively for the CS50 Duck. 615 00:32:26,050 --> 00:32:28,780 Whenever you are interacting with the Duck, 616 00:32:28,780 --> 00:32:32,180 we are actually using Chat Completion API to facilitate that. 617 00:32:32,180 --> 00:32:33,980 We also utilize Embeddings API. 618 00:32:33,980 --> 00:32:39,250 That's the API we use for the RAG pipeline for the retrieval 619 00:32:39,250 --> 00:32:43,120 augmented generation process. 620 00:32:43,120 --> 00:32:48,130 Assistant API is actually a newer API that we did not use in production. 621 00:32:48,130 --> 00:32:50,640 For our current version of the CS50 duck, 622 00:32:50,640 --> 00:32:54,790 we will be experimenting it in our next version of the CS50 Duck. 623 00:32:54,790 --> 00:32:57,410 624 00:32:57,410 --> 00:33:04,310 So first, by now you all probably heard of the term large language model or LLM 625 00:33:04,310 --> 00:33:05,300 for short. 626 00:33:05,300 --> 00:33:08,370 Large language model is essentially a type of neural network. 627 00:33:08,370 --> 00:33:15,290 It was trained to understand text, was able to also generate media, 628 00:33:15,290 --> 00:33:19,820 generate video, generate audio, generate text as well. 629 00:33:19,820 --> 00:33:23,820 In our CSV Duck, we only focus on text generation. 630 00:33:23,820 --> 00:33:26,040 We are not going to generate any image, any audio. 631 00:33:26,040 --> 00:33:33,290 We only want to use this LLMs model's text generation capability to provide 632 00:33:33,290 --> 00:33:37,220 our students the AI experience. 633 00:33:37,220 --> 00:33:43,520 So you can view it as a chatbot essentially but a chatbot with context. 634 00:33:43,520 --> 00:33:49,070 What it means is when you are talking to this particular AI chatbot, 635 00:33:49,070 --> 00:33:54,920 it understands that your question likely fall into the domain of CS50 or computer 636 00:33:54,920 --> 00:33:56,130 science in general. 637 00:33:56,130 --> 00:33:57,340 That's why it has context. 638 00:33:57,340 --> 00:33:58,840 It knows what you are talking about. 639 00:33:58,840 --> 00:34:04,050 For example, if I ask the duck, hey, I'm having trouble with problem set 1, 640 00:34:04,050 --> 00:34:05,510 can you help me? 641 00:34:05,510 --> 00:34:09,409 If you ask the general GPT, it might not know 642 00:34:09,409 --> 00:34:12,892 what you are talking about because problem set 1 could mean many things. 643 00:34:12,892 --> 00:34:14,600 But if you talk to the Duck that actually 644 00:34:14,600 --> 00:34:16,010 understand what you're talking about. 645 00:34:16,010 --> 00:34:17,843 What you are referring to-- you're referring 646 00:34:17,843 --> 00:34:20,530 to, oh, the CS50 problem set 1 likely. 647 00:34:20,530 --> 00:34:23,480 So there's a context there. 648 00:34:23,480 --> 00:34:27,170 And when dealing with AI system, nowadays AI system, 649 00:34:27,170 --> 00:34:32,909 you likely need to deal with three rows during implementation, the system row. 650 00:34:32,909 --> 00:34:33,620 There's a user. 651 00:34:33,620 --> 00:34:34,880 There's an assistant. 652 00:34:34,880 --> 00:34:38,850 System David already also mentioned a little bit, 653 00:34:38,850 --> 00:34:44,060 which is often referring to the system prompt or the general guideline 654 00:34:44,060 --> 00:34:45,380 you give to the LLM model. 655 00:34:45,380 --> 00:34:47,030 That's called system row. 656 00:34:47,030 --> 00:34:52,690 User, which means us, the user interacting with the large language 657 00:34:52,690 --> 00:34:53,190 model. 658 00:34:53,190 --> 00:34:53,857 It could be GPT. 659 00:34:53,857 --> 00:34:56,810 It could be some other open source large language model. 660 00:34:56,810 --> 00:34:59,700 Assistant refers to the large language model. 661 00:34:59,700 --> 00:35:04,020 So if you get a response back from the large language model, 662 00:35:04,020 --> 00:35:08,210 we will say that you get an assistant message back. 663 00:35:08,210 --> 00:35:10,190 So these three row-- 664 00:35:10,190 --> 00:35:18,670 they work cohesively together to provide the AI experience. 665 00:35:18,670 --> 00:35:21,910 And this slide just summarize what these three rows are 666 00:35:21,910 --> 00:35:27,040 doing during a typical AI interaction. 667 00:35:27,040 --> 00:35:31,750 So I'm going to reiterate what system message is again. 668 00:35:31,750 --> 00:35:34,750 For example, in the CS50 Duck scenario, this 669 00:35:34,750 --> 00:35:37,750 is a simplified version of the system prompt we have. 670 00:35:37,750 --> 00:35:42,710 First of all, we define what the large language model will be. 671 00:35:42,710 --> 00:35:44,310 So we give them a role. 672 00:35:44,310 --> 00:35:49,000 Basically, this is like assigning a role to this large language model, 673 00:35:49,000 --> 00:35:50,300 in this case, the GPT 4. 674 00:35:50,300 --> 00:35:55,930 So we tell GPT 4, hey, you are a teaching assistant for CS50. 675 00:35:55,930 --> 00:35:57,300 You are also a rubber duck. 676 00:35:57,300 --> 00:36:01,540 It's like setting the personality for this large language model 677 00:36:01,540 --> 00:36:05,540 to basically affect its response generation. 678 00:36:05,540 --> 00:36:11,500 It might be generating in a more funny style way or a playful way 679 00:36:11,500 --> 00:36:13,660 if we are doing this first line. 680 00:36:13,660 --> 00:36:18,490 But importantly, we also want to give instruction to the large language model, 681 00:36:18,490 --> 00:36:21,700 like a general guideline, like you shouldn't 682 00:36:21,700 --> 00:36:26,320 be answering questions that is not related to CS50 or computer science. 683 00:36:26,320 --> 00:36:31,490 We want that only focusing on answering CS-related or CS50-related question. 684 00:36:31,490 --> 00:36:36,590 And importantly, we don't want you to want you, the CS50 Duck, 685 00:36:36,590 --> 00:36:38,600 to provide any answers to the problem set. 686 00:36:38,600 --> 00:36:39,550 That's important. 687 00:36:39,550 --> 00:36:43,760 This is basically the guideline that the large language model has 688 00:36:43,760 --> 00:36:46,850 to follow when generating its response. 689 00:36:46,850 --> 00:36:49,370 So these are the instructions we are now giving 690 00:36:49,370 --> 00:36:53,300 or the rules we are giving to this GPT 4. 691 00:36:53,300 --> 00:37:00,320 And for each question, we always have this system message enforced. 692 00:37:00,320 --> 00:37:03,290 You will find out why we need to do that for every iteration, 693 00:37:03,290 --> 00:37:04,710 for every interaction. 694 00:37:04,710 --> 00:37:09,560 But anyway, whenever students send us a message, we actually take their message 695 00:37:09,560 --> 00:37:13,430 and then concatenate with the system rule with our own wrapper prompt. 696 00:37:13,430 --> 00:37:14,520 And then we prompt GPT. 697 00:37:14,520 --> 00:37:18,470 So GPT is always following our system rule, our system instruction 698 00:37:18,470 --> 00:37:22,370 first before actually trying to understand what the student is actually 699 00:37:22,370 --> 00:37:22,870 asking. 700 00:37:22,870 --> 00:37:25,850 701 00:37:25,850 --> 00:37:29,800 And for this slide, it summarized a typical interaction 702 00:37:29,800 --> 00:37:32,680 between user and the large language model. 703 00:37:32,680 --> 00:37:36,550 So the technical term is you prompt the large language model 704 00:37:36,550 --> 00:37:38,450 to get a response back. 705 00:37:38,450 --> 00:37:43,130 That's how you interact with the large language model. 706 00:37:43,130 --> 00:37:46,110 And sometimes people come up with fancy term. 707 00:37:46,110 --> 00:37:48,860 Nowadays they call it prompt engineering. 708 00:37:48,860 --> 00:37:51,590 Personally, I just think this is string manipulation here, 709 00:37:51,590 --> 00:37:56,120 just a different way of asking question, a different way of conveying things 710 00:37:56,120 --> 00:38:00,510 to instruct the GPT model to generate a response. 711 00:38:00,510 --> 00:38:02,840 So for example, in your question, for example, 712 00:38:02,840 --> 00:38:06,510 if you want the GPT to answer in a certain way, you can provide example. 713 00:38:06,510 --> 00:38:09,890 This is often called one shot prompting or few shot prompting, depending on how 714 00:38:09,890 --> 00:38:13,070 many example you provide in the prompt. 715 00:38:13,070 --> 00:38:16,280 You can also ask the model to adopt a persona. 716 00:38:16,280 --> 00:38:19,460 So in the CS50 Duck scenario, we say you are a rubber duck. 717 00:38:19,460 --> 00:38:22,880 That actually asks the model to adopt a persona. 718 00:38:22,880 --> 00:38:29,360 And you can also do other restriction or prompt it differently, 719 00:38:29,360 --> 00:38:31,200 hence the thing called engineering. 720 00:38:31,200 --> 00:38:33,318 There's nothing really engineering here. 721 00:38:33,318 --> 00:38:36,110 It's just a different way of asking question at the end of the day. 722 00:38:36,110 --> 00:38:41,880 You are manipulating a string, and then you want to get the ideal response back. 723 00:38:41,880 --> 00:38:45,610 OK, so now let's talk about Chat Completion 724 00:38:45,610 --> 00:38:50,020 API with the foundational knowledge in mind. 725 00:38:50,020 --> 00:38:54,670 So when you use this API, you make an API call 726 00:38:54,670 --> 00:38:57,460 to Open AI's server sitting in the cloud. 727 00:38:57,460 --> 00:39:00,940 Usually the payload will be the message because you are prompting the model 728 00:39:00,940 --> 00:39:01,730 to get a response. 729 00:39:01,730 --> 00:39:07,485 So for example, I can ask, hey, can you help me with my Tideman problem set? 730 00:39:07,485 --> 00:39:10,940 OK, the model will just generate a response back. 731 00:39:10,940 --> 00:39:13,930 Now, here you might wonder, OK, why the model actually 732 00:39:13,930 --> 00:39:16,610 know it's a CS50 Tideman problem set? 733 00:39:16,610 --> 00:39:18,730 First of all, Tideman problem set. 734 00:39:18,730 --> 00:39:25,330 It's a very iconic term in the internet due to the popularity of these problems, 735 00:39:25,330 --> 00:39:25,840 I believe. 736 00:39:25,840 --> 00:39:28,090 And actually, when Opening AI train the model-- 737 00:39:28,090 --> 00:39:29,890 when it scraped the internet, it actually 738 00:39:29,890 --> 00:39:33,700 scraped our entire course content into the training data. 739 00:39:33,700 --> 00:39:38,870 So the model already has the knowledge of CS50. 740 00:39:38,870 --> 00:39:42,130 So that's considered the internal knowledge that the model has. 741 00:39:42,130 --> 00:39:45,375 So it naturally actually understand what this question mean. 742 00:39:45,375 --> 00:39:48,460 743 00:39:48,460 --> 00:39:50,800 If you look at the code-- 744 00:39:50,800 --> 00:39:52,970 again, this is a code written in Python. 745 00:39:52,970 --> 00:39:56,290 If you don't understand what the code is doing right now, don't worry about it. 746 00:39:56,290 --> 00:40:00,410 Just pay attention to some of the keywords in this code snippet. 747 00:40:00,410 --> 00:40:04,930 You notice there's a row system, and there's a row user. 748 00:40:04,930 --> 00:40:09,040 This is actually the row I just mentioned a few slides earlier. 749 00:40:09,040 --> 00:40:12,370 First, there's a system row, which is the system prompt. 750 00:40:12,370 --> 00:40:15,260 You are a friendly supported teaching assistant. 751 00:40:15,260 --> 00:40:16,430 You are also a rubber duck. 752 00:40:16,430 --> 00:40:21,740 You can add more instruction to this system row and in the content field. 753 00:40:21,740 --> 00:40:26,600 And because we need to utilize this API to generate response-- 754 00:40:26,600 --> 00:40:33,920 so you need to also supply, OK, what's the user messages to GPT 4. 755 00:40:33,920 --> 00:40:39,370 So we will have another dictionary, another entry, mark it as user row, 756 00:40:39,370 --> 00:40:41,470 and then the content will be the question. 757 00:40:41,470 --> 00:40:48,560 With these two messages in place, you will send this payload, essentially, 758 00:40:48,560 --> 00:40:51,020 these two messages to ChatGPT. 759 00:40:51,020 --> 00:40:54,050 And the GPT will be able to take these two message, the system 760 00:40:54,050 --> 00:40:56,990 prompt and the user prompt and generate a system 761 00:40:56,990 --> 00:41:05,560 response, which is the response you saw on the earlier slide. 762 00:41:05,560 --> 00:41:09,705 So it might be good for me to actually do a live demo here this time. 763 00:41:09,705 --> 00:41:15,710 764 00:41:15,710 --> 00:41:17,820 Although I won't be typing the code line by line, 765 00:41:17,820 --> 00:41:22,400 it's just trying to show you what it looks like when it's actually 766 00:41:22,400 --> 00:41:23,275 running in the code. 767 00:41:23,275 --> 00:41:28,930 768 00:41:28,930 --> 00:41:33,080 And also, this is a good demo for a Codespace demo 769 00:41:33,080 --> 00:41:36,490 if you attended the workshop last day, I also talked about Codespace. 770 00:41:36,490 --> 00:41:39,850 This is entirely running in the cloud on Codespace as well. 771 00:41:39,850 --> 00:41:46,760 So I'm going to go to the chat example, the OpenAI example in particular. 772 00:41:46,760 --> 00:41:49,750 In this workshop, I also provide you the example 773 00:41:49,750 --> 00:41:54,440 for the other vendor, for example, from cloud and Google Gemini API. 774 00:41:54,440 --> 00:41:57,280 You can also look at those at your own time 775 00:41:57,280 --> 00:42:03,850 to see how similar these APIs actually look in between each other. 776 00:42:03,850 --> 00:42:11,650 OK, so let's take a look at the simplest way you can invoke the Chat Completion 777 00:42:11,650 --> 00:42:12,700 API endpoint. 778 00:42:12,700 --> 00:42:13,750 Ignore the first part. 779 00:42:13,750 --> 00:42:16,190 This is just how I handle environment variable. 780 00:42:16,190 --> 00:42:22,370 So when you work with API or any kind of API, you often need an API key. 781 00:42:22,370 --> 00:42:28,450 It's a way for OpenAI to know, OK, you are the API keyholder 782 00:42:28,450 --> 00:42:29,960 and you are making this API call. 783 00:42:29,960 --> 00:42:33,970 It's a way basically for OpenAI to build you because this thing costs money. 784 00:42:33,970 --> 00:42:36,730 So they need a way to build you. 785 00:42:36,730 --> 00:42:38,680 That's just the prerequisites. 786 00:42:38,680 --> 00:42:41,270 But this is the important part. 787 00:42:41,270 --> 00:42:45,300 This is the actual API call we are doing. 788 00:42:45,300 --> 00:42:51,060 Here I'm just going to quickly show you if you prompt the model without anything 789 00:42:51,060 --> 00:42:52,300 what it will look like. 790 00:42:52,300 --> 00:42:55,620 So that's why there's no system message here. 791 00:42:55,620 --> 00:43:00,660 And because we call the API and we will expect to get a response, 792 00:43:00,660 --> 00:43:03,110 that's why I'm storing the response here. 793 00:43:03,110 --> 00:43:06,570 794 00:43:06,570 --> 00:43:09,090 I'm just going to print it out back to the terminal 795 00:43:09,090 --> 00:43:10,600 so that you can see the response. 796 00:43:10,600 --> 00:43:13,120 797 00:43:13,120 --> 00:43:18,210 Now I'm just going to run this example. 798 00:43:18,210 --> 00:43:21,390 So as you can see, I get a response from the model 799 00:43:21,390 --> 00:43:24,490 live, Hello back, because I say Hello. 800 00:43:24,490 --> 00:43:26,130 So it just Hello back me. 801 00:43:26,130 --> 00:43:29,460 If I prompt again, maybe I will get a different response. 802 00:43:29,460 --> 00:43:32,230 You see, this time the response is actually shorter. 803 00:43:32,230 --> 00:43:35,460 That actually means the model generation is not deterministic. 804 00:43:35,460 --> 00:43:36,640 It's subject to randomness. 805 00:43:36,640 --> 00:43:39,265 So every time you prompt the model even with the same question, 806 00:43:39,265 --> 00:43:41,400 you might not get the same answer back. 807 00:43:41,400 --> 00:43:43,530 That's just something you need to be aware of when 808 00:43:43,530 --> 00:43:45,840 working with large language model. 809 00:43:45,840 --> 00:43:49,830 Now, you might wonder, OK, how can I do a conversation? 810 00:43:49,830 --> 00:43:54,120 A conversation is actually text generation. 811 00:43:54,120 --> 00:43:57,780 At the end of the day, you need to be aware 812 00:43:57,780 --> 00:44:03,820 that when you're interacting with LM model, the model doesn't understand you. 813 00:44:03,820 --> 00:44:05,920 The model doesn't have memory at all. 814 00:44:05,920 --> 00:44:09,400 So if I ask the Duck, what is Flask? 815 00:44:09,400 --> 00:44:13,930 The Duck, our CS50 AI, will give me a response back. 816 00:44:13,930 --> 00:44:17,030 And if I ask, what did I just ask? 817 00:44:17,030 --> 00:44:19,580 The Duck, will say, oh, you just asked what is Flask. 818 00:44:19,580 --> 00:44:21,900 But that's actually not the case. 819 00:44:21,900 --> 00:44:23,520 The model is not remembering you. 820 00:44:23,520 --> 00:44:28,010 It's because we are actually giving the model the entire history for the model 821 00:44:28,010 --> 00:44:30,660 to predict the next response. 822 00:44:30,660 --> 00:44:34,860 So in the code, it actually looks more clearer. 823 00:44:34,860 --> 00:44:39,930 So previously we have this simple API call-- 824 00:44:39,930 --> 00:44:45,220 this simple API call to ask, "Can you help me with my filter pset?" 825 00:44:45,220 --> 00:44:46,770 I get a response back. 826 00:44:46,770 --> 00:44:52,080 In order for me to follow up with a question, I can't just go and ask-- 827 00:44:52,080 --> 00:44:56,557 I can't just go in and call this API again with a new question. 828 00:44:56,557 --> 00:44:58,890 The model don't have any context at all because it won't 829 00:44:58,890 --> 00:45:02,610 know what itself just answered to me. 830 00:45:02,610 --> 00:45:05,490 In order to continue the conversation, we actually 831 00:45:05,490 --> 00:45:10,040 need to send back its own response in the payload 832 00:45:10,040 --> 00:45:14,160 and then ask a new question there. 833 00:45:14,160 --> 00:45:17,010 So this is now four message. 834 00:45:17,010 --> 00:45:21,320 And then send it to the OpenAI Chat Completion API and get back a response. 835 00:45:21,320 --> 00:45:24,630 As you can see, as the conversation progress, 836 00:45:24,630 --> 00:45:33,100 we are sending more and more message because, again, large language model 837 00:45:33,100 --> 00:45:34,490 is doing text generation. 838 00:45:34,490 --> 00:45:37,015 It's generating text based on what you are sending to it. 839 00:45:37,015 --> 00:45:38,890 The reason why it can carry on a conversation 840 00:45:38,890 --> 00:45:44,180 is because you already supplied them with the previous chat history. 841 00:45:44,180 --> 00:45:47,720 So the large language model was looking at, OK, this is the chat history. 842 00:45:47,720 --> 00:45:49,570 This is what been talking about so far. 843 00:45:49,570 --> 00:45:51,403 OK, I'm going to generate the next response. 844 00:45:51,403 --> 00:45:53,940 845 00:45:53,940 --> 00:45:55,985 And in the code, I can also show you quickly. 846 00:45:55,985 --> 00:46:08,710 847 00:46:08,710 --> 00:46:13,120 So maybe let's look at just chat3. 848 00:46:13,120 --> 00:46:16,240 849 00:46:16,240 --> 00:46:18,880 So I'm going to enlarge it. 850 00:46:18,880 --> 00:46:22,520 So the majority of the code is essentially the same. 851 00:46:22,520 --> 00:46:25,190 The API code we are using is still Chat Completion API. 852 00:46:25,190 --> 00:46:30,710 The only thing we need to keep track of is the message history essentially. 853 00:46:30,710 --> 00:46:35,740 When user types something in the prompt, I'm 854 00:46:35,740 --> 00:46:38,480 going to store the user into a message array. 855 00:46:38,480 --> 00:46:40,630 And when GPT generate a response-- 856 00:46:40,630 --> 00:46:43,600 I'm also going to store that in the message array. 857 00:46:43,600 --> 00:46:45,650 And then I have a while true loop. 858 00:46:45,650 --> 00:46:48,130 So this loop is just going to keep on forever 859 00:46:48,130 --> 00:46:51,340 that way I can carry on a conversation. 860 00:46:51,340 --> 00:46:52,480 So I can just-- 861 00:46:52,480 --> 00:46:54,855 I can now run the code and show you what it looks like. 862 00:46:54,855 --> 00:47:00,270 863 00:47:00,270 --> 00:47:03,660 So I'm going to-- what is Flask? 864 00:47:03,660 --> 00:47:05,015 It's going to think a while. 865 00:47:05,015 --> 00:47:08,170 866 00:47:08,170 --> 00:47:10,130 It is actually now generating a response, 867 00:47:10,130 --> 00:47:13,370 but because my question is about Flask-- 868 00:47:13,370 --> 00:47:16,500 so it tries to give me a full definition of the Flask framework. 869 00:47:16,500 --> 00:47:18,380 Now it sent back the response. 870 00:47:18,380 --> 00:47:21,900 And I can ask, what did I just ask? 871 00:47:21,900 --> 00:47:25,200 872 00:47:25,200 --> 00:47:28,930 And then the model was saying, you just asked, what is Flask? 873 00:47:28,930 --> 00:47:31,810 So the model somehow remember what I asked. 874 00:47:31,810 --> 00:47:34,770 But underneath the hood, I'm just telling the model again, 875 00:47:34,770 --> 00:47:36,640 hey, this is my question. 876 00:47:36,640 --> 00:47:38,955 This is your response. 877 00:47:38,955 --> 00:47:40,620 I just asked you what did I ask. 878 00:47:40,620 --> 00:47:45,720 The model based on the chat history generate a response back to me 879 00:47:45,720 --> 00:47:48,290 correctly. 880 00:47:48,290 --> 00:47:49,920 OK. 881 00:47:49,920 --> 00:47:55,640 Now, I want to talk about some downsides or some problem 882 00:47:55,640 --> 00:47:57,840 when dealing with large language model. 883 00:47:57,840 --> 00:48:00,450 There's a technical term called hallucinations. 884 00:48:00,450 --> 00:48:05,090 So the model, sometimes we just make stuff up. 885 00:48:05,090 --> 00:48:07,478 Again, it's generating text. 886 00:48:07,478 --> 00:48:10,020 It's not really understanding the thing you're talking about. 887 00:48:10,020 --> 00:48:13,070 It's being trained on large corpus. 888 00:48:13,070 --> 00:48:14,700 It has the knowledge of the internet. 889 00:48:14,700 --> 00:48:20,600 But sometime when you ask it a question, it might not exist in the training data, 890 00:48:20,600 --> 00:48:25,100 but the model was trying to generate something that somehow seemingly best 891 00:48:25,100 --> 00:48:29,540 matched to your expectation. 892 00:48:29,540 --> 00:48:32,330 One way to tackle this problem is called grounding. 893 00:48:32,330 --> 00:48:38,690 And specifically the technique to tackle this hallucination problem 894 00:48:38,690 --> 00:48:44,820 is called retrieval-augmented generation or often referred to RAG. 895 00:48:44,820 --> 00:48:46,090 I have my own explanation. 896 00:48:46,090 --> 00:48:49,420 Again, this is like a fancy term people invent maybe for marketing purposes. 897 00:48:49,420 --> 00:48:53,250 I think it's just handing the large language model a cheat sheet. 898 00:48:53,250 --> 00:48:54,250 That's basically it. 899 00:48:54,250 --> 00:48:56,580 That's what's happening here. 900 00:48:56,580 --> 00:49:01,620 So CS50 Duck actually incorporated the RAG pipeline also 901 00:49:01,620 --> 00:49:04,390 as well as a way for experimenting as well. 902 00:49:04,390 --> 00:49:07,980 So there's a scenario when student asks, what is Flask? 903 00:49:07,980 --> 00:49:12,780 First of all, Flask is a web programming framework. 904 00:49:12,780 --> 00:49:15,400 The model already understand what Flask is. 905 00:49:15,400 --> 00:49:18,630 It's not purely exclusive to CS50. 906 00:49:18,630 --> 00:49:21,730 So the model has internal knowledge of what Flask is. 907 00:49:21,730 --> 00:49:26,160 So if a student asks GPT directly what is Flask-- 908 00:49:26,160 --> 00:49:31,920 as a new beginner, as a beginner, you will want some response 909 00:49:31,920 --> 00:49:33,700 that are easily understandable. 910 00:49:33,700 --> 00:49:39,580 But somehow the GPT 4 model was trained on the entire internet knowledge. 911 00:49:39,580 --> 00:49:41,940 It has advanced knowledge of what Flask it. 912 00:49:41,940 --> 00:49:44,100 And then it will just try to generate response 913 00:49:44,100 --> 00:49:47,100 that's so advanced that a beginner might not able to comprehend. 914 00:49:47,100 --> 00:49:52,360 So in order to tackle that, we will want to supply our own CS50 definition. 915 00:49:52,360 --> 00:49:56,205 It could be a definition that David mentioned in the lecture. 916 00:49:56,205 --> 00:49:59,960 And that's exactly what we are doing here for the CS50 Duck. 917 00:49:59,960 --> 00:50:02,700 When a student asks, what is Flask, we first 918 00:50:02,700 --> 00:50:07,050 try to search our entire lecture database. 919 00:50:07,050 --> 00:50:10,320 I will get to it and how we do that. 920 00:50:10,320 --> 00:50:13,320 We basically try to search our entire lecture caption, 921 00:50:13,320 --> 00:50:16,800 the whole 11 weeks lecture also. 922 00:50:16,800 --> 00:50:22,020 We try to find what David says in the lecture that actually best matches 923 00:50:22,020 --> 00:50:23,110 this question. 924 00:50:23,110 --> 00:50:26,550 That might be the ground truth because that's the definition. 925 00:50:26,550 --> 00:50:29,910 We can just hand that answer to a student, right? 926 00:50:29,910 --> 00:50:33,130 So we retrieve the best match caption. 927 00:50:33,130 --> 00:50:37,200 The moment that David actually talk about what Flask is in the course, 928 00:50:37,200 --> 00:50:42,420 we literally put that back into our updated prompt. 929 00:50:42,420 --> 00:50:47,180 It's kind of like handling the GPT like, hey, GPT 4, here is a cheat sheet. 930 00:50:47,180 --> 00:50:50,680 You can just look at this and answer the question. 931 00:50:50,680 --> 00:50:54,310 We have an updated prompt along with the original question 932 00:50:54,310 --> 00:50:55,880 that they asked by the student. 933 00:50:55,880 --> 00:51:00,650 We send the whole thing to OpenAI GPT 4 and get back a response. 934 00:51:00,650 --> 00:51:05,380 And this response is more, let's just say, learner friendly or CS50 student 935 00:51:05,380 --> 00:51:07,690 friendly response. 936 00:51:07,690 --> 00:51:08,480 And that's it. 937 00:51:08,480 --> 00:51:10,990 That's what the RAG is doing in the CS50 Duck. 938 00:51:10,990 --> 00:51:17,720 Or that's what RAG is doing in most of the AI system in the world right now. 939 00:51:17,720 --> 00:51:23,160 But how exactly can we perform this search quickly? 940 00:51:23,160 --> 00:51:28,140 So here the embedding thing comes into play. 941 00:51:28,140 --> 00:51:30,820 Large language model or other neural network, 942 00:51:30,820 --> 00:51:33,510 they don't actually understand English. 943 00:51:33,510 --> 00:51:39,030 They understand this a vector of floats. 944 00:51:39,030 --> 00:51:46,410 In particular, this vector is in a dimension of 1,536. 945 00:51:46,410 --> 00:51:51,310 And we are actually using a text embedding model to do this conversion. 946 00:51:51,310 --> 00:51:58,090 So, for example, we use a text embedding model to embed what is Flask into this-- 947 00:51:58,090 --> 00:51:59,890 like a vector of floats. 948 00:51:59,890 --> 00:52:03,990 If you supply only one character to this embedding model, 949 00:52:03,990 --> 00:52:07,240 you will still get back the same dimension of different vectors. 950 00:52:07,240 --> 00:52:10,730 That's just how this text embedding model works. 951 00:52:10,730 --> 00:52:15,420 So what we do is we actually go through our lecture caption. 952 00:52:15,420 --> 00:52:20,510 We segment them into a 30-second chunk. 953 00:52:20,510 --> 00:52:26,420 And then for each 30-second chunk, we create a vector representation or text 954 00:52:26,420 --> 00:52:28,190 embedding for it. 955 00:52:28,190 --> 00:52:31,580 And then we store them into a database often referred 956 00:52:31,580 --> 00:52:33,700 to as the vector database. 957 00:52:33,700 --> 00:52:34,960 Here's what it looks like. 958 00:52:34,960 --> 00:52:37,270 For example, for this particular moment, David 959 00:52:37,270 --> 00:52:43,155 is talking about Flask is a third-party library, goes on and on. 960 00:52:43,155 --> 00:52:46,460 And this is actually the lecture caption segment. 961 00:52:46,460 --> 00:52:47,770 Here is just an example. 962 00:52:47,770 --> 00:52:52,660 What we do is we literally just concatenate this 30-second chunk 963 00:52:52,660 --> 00:52:53,780 into this. 964 00:52:53,780 --> 00:52:56,920 This is just the chunking strategy we take. 965 00:52:56,920 --> 00:53:00,940 You have different chunking strategy you can pick or you can design. 966 00:53:00,940 --> 00:53:06,250 We utilize OpenAI's text embedding model to generate a vector representation 967 00:53:06,250 --> 00:53:08,740 of this particular 30-second chunk. 968 00:53:08,740 --> 00:53:13,520 We believe this 30-second chunk-- 969 00:53:13,520 --> 00:53:17,060 once converted into vector representation, 970 00:53:17,060 --> 00:53:20,780 we will be able to store it in a database so that when student asks 971 00:53:20,780 --> 00:53:24,710 a question, we can start going through and see how similar 972 00:53:24,710 --> 00:53:29,290 the student's question is compared to this 30-second chunk. 973 00:53:29,290 --> 00:53:31,830 974 00:53:31,830 --> 00:53:36,860 The calculation between these vector is just simple cosine similarity. 975 00:53:36,860 --> 00:53:38,490 It's like a dot product. 976 00:53:38,490 --> 00:53:40,800 We basically go through the entire database, 977 00:53:40,800 --> 00:53:46,410 calculate each cosine similarity or the dot product result, 978 00:53:46,410 --> 00:53:49,150 and rank them from the best match to the worst match. 979 00:53:49,150 --> 00:53:52,950 And then we just pick the first top end matches and put it into the prompt. 980 00:53:52,950 --> 00:53:58,030 So here's the entire float of how this RAG pipeline works in CS50 Duck. 981 00:53:58,030 --> 00:54:02,310 When student asks, what is Flask, we convert it to vector representation. 982 00:54:02,310 --> 00:54:06,030 We perform a search against our vector database that is full of tens 983 00:54:06,030 --> 00:54:11,190 of thousands, 30-second chunk of caption represented by text embedding. 984 00:54:11,190 --> 00:54:13,470 We find out the best match. 985 00:54:13,470 --> 00:54:14,860 We put it into the prompt. 986 00:54:14,860 --> 00:54:18,040 This is plain text now because we already retrieved the document. 987 00:54:18,040 --> 00:54:22,320 We should just put the plain text into the prompt. 988 00:54:22,320 --> 00:54:26,690 And then we ask GPT 4, OK, here is your prompt. 989 00:54:26,690 --> 00:54:27,710 What is Flask? 990 00:54:27,710 --> 00:54:29,750 Here is some useful information. 991 00:54:29,750 --> 00:54:33,810 And then just put in the information, the cheat sheet. 992 00:54:33,810 --> 00:54:39,390 And then the OpenAI GPT will be able to generate a response. 993 00:54:39,390 --> 00:54:43,650 The vetor database we use for our CS50 Duck is called Chroma DB. 994 00:54:43,650 --> 00:54:45,610 It's an open-source vector database. 995 00:54:45,610 --> 00:54:49,740 There's many more vector database nowadays available in the market 996 00:54:49,740 --> 00:54:52,530 as well. 997 00:54:52,530 --> 00:54:56,080 And this is just another diagram summarizing what it's doing here, 998 00:54:56,080 --> 00:55:00,480 essentially the same flowchart I just showed a few slides ago. 999 00:55:00,480 --> 00:55:02,040 Again, the query comes in. 1000 00:55:02,040 --> 00:55:08,330 We call OpenAI text embedding model to generate an embedding for the question. 1001 00:55:08,330 --> 00:55:11,140 And then we search against the database, the vector database. 1002 00:55:11,140 --> 00:55:12,280 We retrieve the document. 1003 00:55:12,280 --> 00:55:13,510 We put it in the-- 1004 00:55:13,510 --> 00:55:17,870 we put it in the prompt often also called the LM context window. 1005 00:55:17,870 --> 00:55:21,640 And then we get back an answer. 1006 00:55:21,640 --> 00:55:25,350 And that's embedding based search, and I can quickly demo it as well. 1007 00:55:25,350 --> 00:55:31,382 1008 00:55:31,382 --> 00:55:32,715 So I'll queue this. 1009 00:55:32,715 --> 00:55:43,970 1010 00:55:43,970 --> 00:55:46,810 It might be fun to quickly show what embedding looks like. 1011 00:55:46,810 --> 00:55:50,990 1012 00:55:50,990 --> 00:55:55,070 Let's say now I'm actually calling the OpenAI 1013 00:55:55,070 --> 00:55:58,790 API to create an embedding for this particular question. 1014 00:55:58,790 --> 00:56:01,540 I'm just going to run it. 1015 00:56:01,540 --> 00:56:07,330 You see that it generate a bunch of float numbers. 1016 00:56:07,330 --> 00:56:12,860 This is actually a vector of floats that represent what is Flask. 1017 00:56:12,860 --> 00:56:16,420 That's just how this text embedding model understand. 1018 00:56:16,420 --> 00:56:21,090 If we change it to simply Cat and save it-- 1019 00:56:21,090 --> 00:56:23,040 and I'm going to run it again-- 1020 00:56:23,040 --> 00:56:30,630 I will still get back a 1,536 dimension of a vector or float. 1021 00:56:30,630 --> 00:56:34,290 So no matter how long the plane test you supply, 1022 00:56:34,290 --> 00:56:36,590 you always get back the same dimension of the vector. 1023 00:56:36,590 --> 00:56:40,210 1024 00:56:40,210 --> 00:56:43,475 And, again, you don't need to actually read what those vector representing. 1025 00:56:43,475 --> 00:56:45,600 It just gives you an impression of what it's doing. 1026 00:56:45,600 --> 00:56:48,430 1027 00:56:48,430 --> 00:56:50,830 Now, for this demo, I want to quickly demo 1028 00:56:50,830 --> 00:56:54,020 how you can perform an embedding, like a question 1029 00:56:54,020 --> 00:56:58,120 answer with embedding-based search. 1030 00:56:58,120 --> 00:57:00,540 So I'm going to full screen myself here. 1031 00:57:00,540 --> 00:57:09,910 1032 00:57:09,910 --> 00:57:13,180 So for this demo, what I'm going to do is I'm going 1033 00:57:13,180 --> 00:57:17,020 to create a vector database or a JSON-- 1034 00:57:17,020 --> 00:57:23,080 like a document that contain all the caption segment 1035 00:57:23,080 --> 00:57:25,690 representing by text embedding. 1036 00:57:25,690 --> 00:57:29,590 For example, this is what the original text transcript looks like. 1037 00:57:29,590 --> 00:57:35,950 This is just the AI lecture from last year or from last year's for lecture 1 1038 00:57:35,950 --> 00:57:39,422 to lecture 3, so on and so forth. 1039 00:57:39,422 --> 00:57:41,505 I'm going to just run the script directly for you. 1040 00:57:41,505 --> 00:57:50,490 1041 00:57:50,490 --> 00:57:53,830 So let me quickly explain what the script is doing. 1042 00:57:53,830 --> 00:57:57,600 So this particular script create_embedding-- what it's doing 1043 00:57:57,600 --> 00:58:01,710 is we first go through each lecture transcript. 1044 00:58:01,710 --> 00:58:04,030 So we go through each lecture transcript. 1045 00:58:04,030 --> 00:58:07,210 We chunk them into, I don't know, 500 characters. 1046 00:58:07,210 --> 00:58:09,850 It's a very naive chunking strategy here. 1047 00:58:09,850 --> 00:58:14,710 We just chunk it into 500-character-long chunk no matter what. 1048 00:58:14,710 --> 00:58:17,400 We don't even consider the punctuation or not. 1049 00:58:17,400 --> 00:58:20,250 But this is often referred to the chunking strategy. 1050 00:58:20,250 --> 00:58:24,468 So you have different way of chunk the content 1051 00:58:24,468 --> 00:58:26,260 and then store them into a vector database. 1052 00:58:26,260 --> 00:58:31,925 So in order to demo, I just do a very naive chunking strategy here. 1053 00:58:31,925 --> 00:58:39,510 1054 00:58:39,510 --> 00:58:43,240 So right now it's creating embedding, moderately executed the script. 1055 00:58:43,240 --> 00:58:46,740 What it's doing is for each 500-character-long chunk, 1056 00:58:46,740 --> 00:58:51,690 I'm going to call the OpenAI's text embedding API to create a vector 1057 00:58:51,690 --> 00:58:52,690 representation. 1058 00:58:52,690 --> 00:58:53,490 And I'm going to-- 1059 00:58:53,490 --> 00:58:58,020 I'm going to associate this particular vector with the 500-character-long 1060 00:58:58,020 --> 00:58:58,720 chunk. 1061 00:58:58,720 --> 00:59:01,500 That way when I do the retrieval, I know, OK, 1062 00:59:01,500 --> 00:59:05,520 this vector whenever it's the best-- whenever it is the best match, 1063 00:59:05,520 --> 00:59:10,320 I know how to get back its original plain text so that I can put that 1064 00:59:10,320 --> 00:59:12,270 into the prompt and then prompt GPT. 1065 00:59:12,270 --> 00:59:18,480 It takes a while because we are sending the entire fall 2023 lecture for OpenAI 1066 00:59:18,480 --> 00:59:21,570 to generate a text embedding. 1067 00:59:21,570 --> 00:59:25,240 And soon you will see what it looks like once it's finished generated. 1068 00:59:25,240 --> 00:59:28,140 1069 00:59:28,140 --> 00:59:32,720 You can also run the exact same code I provide for this workshop. 1070 00:59:32,720 --> 00:59:35,670 All you need is to obtain your own OpenAI key. 1071 00:59:35,670 --> 00:59:36,400 And that's all. 1072 00:59:36,400 --> 00:59:52,130 1073 00:59:52,130 --> 00:59:54,600 I can also show you what it looks like while generating. 1074 00:59:54,600 --> 00:59:57,840 So it generated a JSON Line document. 1075 00:59:57,840 --> 01:00:00,320 So to give you an example, for example, this one, 1076 01:00:00,320 --> 01:00:06,950 this is one of the 500-character-long lecture caption segment. 1077 01:00:06,950 --> 01:00:12,810 And it is associated with its corresponding vector representation 1078 01:00:12,810 --> 01:00:18,180 and the following vector flow representing this entire 1079 01:00:18,180 --> 01:00:20,045 500-character-long segment. 1080 01:00:20,045 --> 01:00:25,460 1081 01:00:25,460 --> 01:00:28,100 You can just end the process here because I already 1082 01:00:28,100 --> 01:00:30,410 have an embedding created beforehand. 1083 01:00:30,410 --> 01:00:35,780 Now, once we have the vector database somehow ready, what we need to do 1084 01:00:35,780 --> 01:00:37,190 is perform a search. 1085 01:00:37,190 --> 01:00:38,755 So I can do this. 1086 01:00:38,755 --> 01:00:42,410 1087 01:00:42,410 --> 01:00:45,200 Second demo is enter a query. 1088 01:00:45,200 --> 01:00:49,060 So let's just say, what is minimax? 1089 01:00:49,060 --> 01:00:50,580 This is a question I'm sending. 1090 01:00:50,580 --> 01:00:54,860 I'm searching against the entire lecture to see which particular segment actually 1091 01:00:54,860 --> 01:00:57,830 match this question. 1092 01:00:57,830 --> 01:00:59,930 So I'm going to ask-- 1093 01:00:59,930 --> 01:01:05,150 you get back this particular result. This is the top one 1094 01:01:05,150 --> 01:01:09,770 matches that we found from the entire vector database, if you will. 1095 01:01:09,770 --> 01:01:12,600 And it does mention what minimax is. 1096 01:01:12,600 --> 01:01:16,790 So it is sort already answering the question, 1097 01:01:16,790 --> 01:01:19,410 but it is not answering the question. 1098 01:01:19,410 --> 01:01:24,020 OK, this particular caption actually somehow match 1099 01:01:24,020 --> 01:01:26,030 what this student is asking. 1100 01:01:26,030 --> 01:01:28,310 So the next step will be, naturally, we want 1101 01:01:28,310 --> 01:01:32,570 to put this entire message into the prompt and prompt GPT 1102 01:01:32,570 --> 01:01:33,565 to get a response back. 1103 01:01:33,565 --> 01:01:39,830 1104 01:01:39,830 --> 01:01:41,930 I will ask the same question what is minimax. 1105 01:01:41,930 --> 01:01:48,470 1106 01:01:48,470 --> 01:01:53,830 And you see, this is the actual prompt we are sending to OpenAI. 1107 01:01:53,830 --> 01:01:57,050 That's the original question from me. 1108 01:01:57,050 --> 01:02:02,620 That's the original question from me and then the retrieve document here. 1109 01:02:02,620 --> 01:02:06,160 And then OpenAI will basically utilize the information 1110 01:02:06,160 --> 01:02:09,160 I provide to generate a response back. 1111 01:02:09,160 --> 01:02:14,550 And the entire RAG pipeline looks like in a simplified way-- 1112 01:02:14,550 --> 01:02:18,480 Let's do another fun thing, I guess. 1113 01:02:18,480 --> 01:02:26,110 Is there a Mario Brothers Oreo mentioned in the lecture? 1114 01:02:26,110 --> 01:02:31,020 Because oftentimes in CS50, we do like to hand out Mario cookies somehow, 1115 01:02:31,020 --> 01:02:34,350 or Mario Brothers Oreos. 1116 01:02:34,350 --> 01:02:36,180 Oreos. 1117 01:02:36,180 --> 01:02:39,870 I don't know how to spell Oreos, but that's fine. 1118 01:02:39,870 --> 01:02:49,280 Is there a Mario cookie mentioned in the lecture? 1119 01:02:49,280 --> 01:02:53,250 The AI said, yes, there's a Mario Brother Oreo mentioned in the lecture 1120 01:02:53,250 --> 01:02:57,850 because the lecture segment we create actually contains the Mario Brother. 1121 01:02:57,850 --> 01:03:01,650 And then it was able to generate a text embedding. 1122 01:03:01,650 --> 01:03:05,340 And then during the calculation, we found the best match. 1123 01:03:05,340 --> 01:03:08,250 And this is actually the information we provide to OpenAI. 1124 01:03:08,250 --> 01:03:11,380 And then OpenAI GPT 4 can answer the question correctly. 1125 01:03:11,380 --> 01:03:16,380 Without this RAG, the model will start hallucinate because, first of all, 1126 01:03:16,380 --> 01:03:18,100 the model didn't attend the lecture. 1127 01:03:18,100 --> 01:03:19,660 They didn't know what happened in the lecture. 1128 01:03:19,660 --> 01:03:21,243 So it will try to make up some answer. 1129 01:03:21,243 --> 01:03:25,790 1130 01:03:25,790 --> 01:03:28,120 OK, that's the demo. 1131 01:03:28,120 --> 01:03:32,020 So that's actually what's happening underneath the hood of the CS50 AI 1132 01:03:32,020 --> 01:03:34,100 Duck in its current version. 1133 01:03:34,100 --> 01:03:38,860 And right now we're actually working on a newer version of the CS50 Duck. 1134 01:03:38,860 --> 01:03:42,430 We hope to improve the experience for student. 1135 01:03:42,430 --> 01:03:46,900 This is what it looks like now. 1136 01:03:46,900 --> 01:03:49,600 It also works perfectly on mobile. 1137 01:03:49,600 --> 01:03:54,880 Student can have their own customization for the Duck as well. 1138 01:03:54,880 --> 01:03:57,990 To quickly show what it looks like, I'm going to open the browser. 1139 01:03:57,990 --> 01:04:02,770 1140 01:04:02,770 --> 01:04:06,310 And also, we actually randomly show postcards from the world 1141 01:04:06,310 --> 01:04:08,920 that people send to us with the Duck. 1142 01:04:08,920 --> 01:04:13,150 This is just how we design the UI. 1143 01:04:13,150 --> 01:04:18,210 So I'm going to give you a sneak peek of what the new CS50 Duck looks like. 1144 01:04:18,210 --> 01:04:20,210 It looks like this. 1145 01:04:20,210 --> 01:04:23,040 I can ask a question again, like what is Flask? 1146 01:04:23,040 --> 01:04:26,010 It will start answering my question. 1147 01:04:26,010 --> 01:04:32,160 If I don't like the traditional way the Duck presenting me answer, 1148 01:04:32,160 --> 01:04:34,530 I can also switch to a chat mode. 1149 01:04:34,530 --> 01:04:37,227 So it's more user friendly, I suppose. 1150 01:04:37,227 --> 01:04:39,060 If you don't like the light theme-- you want 1151 01:04:39,060 --> 01:04:41,700 to personalize the Duck-- you can also put-- 1152 01:04:41,700 --> 01:04:43,950 you can also pick a cupcake theme perhaps 1153 01:04:43,950 --> 01:04:47,530 because, in cupcake, we also handle a lot of cupcake in our CS50 lecture. 1154 01:04:47,530 --> 01:04:51,690 So we want to incorporate this Easter egg into the Duck as well. 1155 01:04:51,690 --> 01:04:55,470 We also starting to experimenting how we can better 1156 01:04:55,470 --> 01:04:57,660 assess student with the help of AI. 1157 01:04:57,660 --> 01:05:00,820 So we start incorporating code review and pair programming. 1158 01:05:00,820 --> 01:05:03,960 This is still in development. 1159 01:05:03,960 --> 01:05:05,310 Things are subject to change. 1160 01:05:05,310 --> 01:05:07,210 But this is the direction we are heading on. 1161 01:05:07,210 --> 01:05:10,420 So we would like to give you a sneak peek of what it looks 1162 01:05:10,420 --> 01:05:13,900 like for the newer version of the Duck. 1163 01:05:13,900 --> 01:05:20,347 So with that, I'm going to hand over to our next amazing presenters. 1164 01:05:20,347 --> 01:05:22,180 JULIANNA ZHAO: All right, so today I'm going 1165 01:05:22,180 --> 01:05:24,190 to be presenting some of the new ideas we've 1166 01:05:24,190 --> 01:05:27,730 been having on the team with how we can improve the AI tools that we present 1167 01:05:27,730 --> 01:05:28,850 to our students. 1168 01:05:28,850 --> 01:05:31,990 So when we started thinking about future work, our first question 1169 01:05:31,990 --> 01:05:35,720 that we wanted to address was, what else can we offer with AI tools? 1170 01:05:35,720 --> 01:05:37,780 Because we were quite sure that we had not 1171 01:05:37,780 --> 01:05:40,810 exhausted all the ways in which AI can make our learning 1172 01:05:40,810 --> 01:05:42,820 experience better for our students. 1173 01:05:42,820 --> 01:05:45,310 So the first thing that we recognized is that, 1174 01:05:45,310 --> 01:05:49,600 in previous CS50 pedagogical reasoning, we 1175 01:05:49,600 --> 01:05:53,260 know that ideas like think, pair, share, where we have our students in section 1176 01:05:53,260 --> 01:05:56,170 turn to each other and talk and discuss what they're learning, 1177 01:05:56,170 --> 01:06:01,100 or with what David mentioned earlier with rubber duck debugging, 1178 01:06:01,100 --> 01:06:04,660 we know that when we have students express what they believe 1179 01:06:04,660 --> 01:06:07,150 and what their learning process is, then they're 1180 01:06:07,150 --> 01:06:09,203 able to have a better learning experience. 1181 01:06:09,203 --> 01:06:11,620 And this is something that I'm sure all of our instructors 1182 01:06:11,620 --> 01:06:15,660 are also aware of in the way that we teach our students computer science. 1183 01:06:15,660 --> 01:06:19,160 So knowing that we want to have more student engagement, 1184 01:06:19,160 --> 01:06:23,330 how can we make that more accessible with AI? 1185 01:06:23,330 --> 01:06:25,970 So the first idea that we had is maybe we 1186 01:06:25,970 --> 01:06:28,790 can create a duck pair programmer where we 1187 01:06:28,790 --> 01:06:30,740 have-- the student were given a prompt. 1188 01:06:30,740 --> 01:06:34,310 A student sends code and their question to a duck, 1189 01:06:34,310 --> 01:06:36,450 and the duck gives them advice. 1190 01:06:36,450 --> 01:06:40,790 So this would be a tool for practice, not really for homework or assessment 1191 01:06:40,790 --> 01:06:43,970 but rather just a way a student can get their current work 1192 01:06:43,970 --> 01:06:47,360 and their current train of reasoning assessed by an AI, 1193 01:06:47,360 --> 01:06:49,970 where they don't have to go to a human TF, 1194 01:06:49,970 --> 01:06:51,928 whether that's because that's not as accessible 1195 01:06:51,928 --> 01:06:54,387 or because they're a little stressed, and they don't really 1196 01:06:54,387 --> 01:06:55,490 want to go as a beginner. 1197 01:06:55,490 --> 01:06:58,190 This would be a much easier way for a student 1198 01:06:58,190 --> 01:07:02,560 to practice with their own time and their own laptop. 1199 01:07:02,560 --> 01:07:07,210 So I'll give a little demo with what we have so far. 1200 01:07:07,210 --> 01:07:09,850 So the first thing that we have is instead 1201 01:07:09,850 --> 01:07:19,360 of going to chat, if we go to pair program and student, 1202 01:07:19,360 --> 01:07:22,870 then with a prompt like this, where a student is 1203 01:07:22,870 --> 01:07:25,360 to write a Python function called check_password 1204 01:07:25,360 --> 01:07:28,953 that checks that a string password has four conditions, 1205 01:07:28,953 --> 01:07:30,870 then maybe they start off something like this. 1206 01:07:30,870 --> 01:07:39,050 1207 01:07:39,050 --> 01:07:42,530 And before they even continue, they want to make sure 1208 01:07:42,530 --> 01:07:45,810 that this is the correct setup of this function. 1209 01:07:45,810 --> 01:07:47,960 So return true or false. 1210 01:07:47,960 --> 01:07:51,580 And so they might ask the duck, is this setup correct? 1211 01:07:51,580 --> 01:07:59,470 1212 01:07:59,470 --> 01:08:02,650 I'll zoom in a little so that it's easier to see. 1213 01:08:02,650 --> 01:08:06,310 So the thing that we want to ensure with the Duck pair programmer 1214 01:08:06,310 --> 01:08:09,280 is that the Duck only answers the question 1215 01:08:09,280 --> 01:08:12,940 that the student wants at this moment because maybe the student just 1216 01:08:12,940 --> 01:08:15,250 wants to confirm that they're on the right track. 1217 01:08:15,250 --> 01:08:18,283 So then as a teaching fellow or as an instructor, 1218 01:08:18,283 --> 01:08:20,950 you would want to make sure that you don't blurt out the answer. 1219 01:08:20,950 --> 01:08:25,000 So without doing any prompt engineering, if this is all that we give the AI-- 1220 01:08:25,000 --> 01:08:28,670 we tell that this is the prompt that this is what the student has, 1221 01:08:28,670 --> 01:08:30,520 and this is the student's question-- what 1222 01:08:30,520 --> 01:08:33,850 we found is that the AI essentially answers the question 1223 01:08:33,850 --> 01:08:35,990 but then also blurts out the exact answer. 1224 01:08:35,990 --> 01:08:38,649 It writes the snippet of code, the entire function that's 1225 01:08:38,649 --> 01:08:40,520 necessary to complete this assignment. 1226 01:08:40,520 --> 01:08:44,870 And for students coming to this tool or to a teaching instructor in good faith, 1227 01:08:44,870 --> 01:08:47,415 trying to practice, that wouldn't be helpful at all 1228 01:08:47,415 --> 01:08:49,250 because this is the answer that they want. 1229 01:08:49,250 --> 01:08:51,819 They just want to confirm that they're on the right track. 1230 01:08:51,819 --> 01:08:56,420 So now maybe the student can continue setting up a few conditions. 1231 01:08:56,420 --> 01:09:06,910 So they recognize that they need upper equals false, lower equals false, 1232 01:09:06,910 --> 01:09:09,950 and then symbol equals false. 1233 01:09:09,950 --> 01:09:12,670 And then maybe the student sees that I have 1234 01:09:12,670 --> 01:09:15,279 to iterate through all the characters in that password. 1235 01:09:15,279 --> 01:09:18,090 So they start writing a for loop, for char in password. 1236 01:09:18,090 --> 01:09:21,050 1237 01:09:21,050 --> 01:09:23,450 And then they run into a problem because they don't know 1238 01:09:23,450 --> 01:09:25,500 how to check if something is uppercase. 1239 01:09:25,500 --> 01:09:37,550 So if upper, then set upper to true, same for all other conditions, 1240 01:09:37,550 --> 01:09:46,582 and then return true if everything is true. 1241 01:09:46,582 --> 01:09:48,790 And right now they have a little bit more pseudocode. 1242 01:09:48,790 --> 01:09:50,649 They're working towards their final answer. 1243 01:09:50,649 --> 01:09:53,109 And they want to confirm with the instructor 1244 01:09:53,109 --> 01:09:54,760 that they're still on the right track. 1245 01:09:54,760 --> 01:09:57,525 So am I on the right track? 1246 01:09:57,525 --> 01:10:00,160 1247 01:10:00,160 --> 01:10:11,290 Also, how do I check if a character is uppercase? 1248 01:10:11,290 --> 01:10:15,550 You can easily imagine a student coming to you with this current status for code 1249 01:10:15,550 --> 01:10:17,090 and then this simple question. 1250 01:10:17,090 --> 01:10:24,720 And so sending this, the pair programmer would say this, 1251 01:10:24,720 --> 01:10:26,800 so, yes, you are on the right track-- 1252 01:10:26,800 --> 01:10:31,360 here's a little encouragement-- and then answers specifically that question. 1253 01:10:31,360 --> 01:10:34,980 If we were to give this code and then this question without any prompting, 1254 01:10:34,980 --> 01:10:38,910 without telling the AI not to give more than exactly what the student needs 1255 01:10:38,910 --> 01:10:43,065 at this moment, then once again, the AI would just say-- 1256 01:10:43,065 --> 01:10:46,080 it would just complete the code is upper, is lower, and then 1257 01:10:46,080 --> 01:10:49,506 four character in and the list of symbols. 1258 01:10:49,506 --> 01:10:54,660 So we are making this as a practicing tool, assuming that students are coming 1259 01:10:54,660 --> 01:10:58,270 and then seeking a replica of student, teacher interactions. 1260 01:10:58,270 --> 01:11:02,520 And this is more accessible because, once again, every student 1261 01:11:02,520 --> 01:11:03,940 can do this at their own time. 1262 01:11:03,940 --> 01:11:07,800 And so we'll always have some simulation of a one-to-one relationship 1263 01:11:07,800 --> 01:11:09,810 between students and teachers. 1264 01:11:09,810 --> 01:11:14,100 So where exactly can we incorporate this into the CS50 curriculum 1265 01:11:14,100 --> 01:11:17,380 or into any other CS curriculum that you're thinking about. 1266 01:11:17,380 --> 01:11:20,670 So we try to have practice as something that we encourage 1267 01:11:20,670 --> 01:11:24,250 our students to do or even mandate our students to do in our current CS50 1268 01:11:24,250 --> 01:11:25,040 curriculum. 1269 01:11:25,040 --> 01:11:27,490 You're probably familiar with something like this. 1270 01:11:27,490 --> 01:11:30,400 This is currently the first part of every pset, where 1271 01:11:30,400 --> 01:11:34,000 there's a problem that students can walk through with the solution given to them 1272 01:11:34,000 --> 01:11:36,200 and then just submit the answer at the very end. 1273 01:11:36,200 --> 01:11:38,890 These were referred to as labs pre 2022. 1274 01:11:38,890 --> 01:11:40,700 But now they're just a part of the pset. 1275 01:11:40,700 --> 01:11:44,800 So if we click into one of them, which is scratched from pset 2, 1276 01:11:44,800 --> 01:11:47,110 this is what the students see now. 1277 01:11:47,110 --> 01:11:50,260 We have the prompt and then a demo. 1278 01:11:50,260 --> 01:11:54,090 And they can go through and click for the advice and hints. 1279 01:11:54,090 --> 01:11:56,020 And this is great because it gives students 1280 01:11:56,020 --> 01:11:59,620 easy access to code that they can look to in completing 1281 01:11:59,620 --> 01:12:01,160 their psets in the future. 1282 01:12:01,160 --> 01:12:05,720 But this is still a bit too easy because there's no push back. 1283 01:12:05,720 --> 01:12:08,110 There's no questioning that a teacher would 1284 01:12:08,110 --> 01:12:12,060 give a student if they were to walk that student through a similar problem. 1285 01:12:12,060 --> 01:12:14,560 If students want to seek that out instead of the experience 1286 01:12:14,560 --> 01:12:16,870 that they have right now, we can offer them 1287 01:12:16,870 --> 01:12:19,330 something with a Duck pair programmer. 1288 01:12:19,330 --> 01:12:21,255 So we have this right now at lab. 1289 01:12:21,255 --> 01:12:25,170 1290 01:12:25,170 --> 01:12:30,820 So for the same question that we just saw, which is Scrabble. 1291 01:12:30,820 --> 01:12:34,550 Let's say a student currently has this much code. 1292 01:12:34,550 --> 01:12:41,610 1293 01:12:41,610 --> 01:12:45,250 And then we want to ask the Duck for advice. 1294 01:12:45,250 --> 01:12:48,570 So, first, the student might say, "I'm stuck. 1295 01:12:48,570 --> 01:13:00,215 I'm not sure how I can figure out how many points each letter gets." 1296 01:13:00,215 --> 01:13:03,820 1297 01:13:03,820 --> 01:13:12,160 And if we ask the Duck, the Duck gives this as an answer. 1298 01:13:12,160 --> 01:13:15,810 So it answers the question of-- you can use ASCII values. 1299 01:13:15,810 --> 01:13:18,990 And then you can subtract uppercase A or lowercase a. 1300 01:13:18,990 --> 01:13:22,160 So what the Duck doesn't say is, for example, you 1301 01:13:22,160 --> 01:13:24,350 can iterate through the string as an array, which 1302 01:13:24,350 --> 01:13:27,120 is another thing that student is trying to learn in this week. 1303 01:13:27,120 --> 01:13:29,030 The Duck also doesn't say that you can put 1304 01:13:29,030 --> 01:13:32,180 that in a function to abstract the entire thing and make sure 1305 01:13:32,180 --> 01:13:35,450 you have better design because we're hoping that the student can figure that 1306 01:13:35,450 --> 01:13:38,190 out in the future as they work through the problem. 1307 01:13:38,190 --> 01:13:48,510 Now, of course, the student might just ask the Duck, "Give me the solution," 1308 01:13:48,510 --> 01:13:52,110 all right, in which case the Duck says that it won't provide the solution. 1309 01:13:52,110 --> 01:13:54,790 But it'll give a more in depth walkthrough. 1310 01:13:54,790 --> 01:13:59,670 So for this, we don't really even worry that much about the student 1311 01:13:59,670 --> 01:14:03,695 asking directly for the solution because they have the solution on the website. 1312 01:14:03,695 --> 01:14:05,470 We already gave that to them. 1313 01:14:05,470 --> 01:14:09,390 Instead, we're trying to add blocks so that when the student is 1314 01:14:09,390 --> 01:14:12,270 coming without asking for the solution, the Duck won't just 1315 01:14:12,270 --> 01:14:14,220 give it away directly. 1316 01:14:14,220 --> 01:14:16,560 And this ensures that-- 1317 01:14:16,560 --> 01:14:18,700 this is exactly what an instructor would do. 1318 01:14:18,700 --> 01:14:21,600 And so by using AI to try to simulate such a thing, 1319 01:14:21,600 --> 01:14:23,910 we can make sure that in the future students 1320 01:14:23,910 --> 01:14:27,310 have this option of having a better experience. 1321 01:14:27,310 --> 01:14:32,760 And so finally, we are turning something like this where the student just clicks 1322 01:14:32,760 --> 01:14:35,730 through and reveals the instruction at each step 1323 01:14:35,730 --> 01:14:39,240 into something like this, a more conversation-based and accurate 1324 01:14:39,240 --> 01:14:43,030 representation of interactions between students and teachers. 1325 01:14:43,030 --> 01:14:47,540 And so finally, we had the idea of what if we did it in reverse? 1326 01:14:47,540 --> 01:14:51,145 What if we had given a prompt the Duck giving the student 1327 01:14:51,145 --> 01:14:55,330 student-like code and questions and the student in response 1328 01:14:55,330 --> 01:14:59,360 giving advice, which would also be a good way of practice. 1329 01:14:59,360 --> 01:15:01,930 But it could also be a good way of seeing exactly how 1330 01:15:01,930 --> 01:15:03,730 much the student understands of what we're 1331 01:15:03,730 --> 01:15:07,720 trying to teach because when students themselves are able to teach 1332 01:15:07,720 --> 01:15:11,020 a subject to someone else, we know they've come pretty close to mastery 1333 01:15:11,020 --> 01:15:12,160 of the subject. 1334 01:15:12,160 --> 01:15:19,770 And so by maneuvering a little bit more, we 1335 01:15:19,770 --> 01:15:24,730 have this, where this is the same prompt as before, 1336 01:15:24,730 --> 01:15:27,230 the Python function that checks for the four conditions. 1337 01:15:27,230 --> 01:15:29,785 And we can ask the duck for its own code. 1338 01:15:29,785 --> 01:15:34,660 1339 01:15:34,660 --> 01:15:38,230 OK, so the AI says, here's my code. 1340 01:15:38,230 --> 01:15:40,330 And then here's my first attempt. 1341 01:15:40,330 --> 01:15:43,840 So this is a little bit of my prompt engineering 1342 01:15:43,840 --> 01:15:46,532 not being as good as it should be because I told the Duck 1343 01:15:46,532 --> 01:15:47,740 to make some common mistakes. 1344 01:15:47,740 --> 01:15:50,323 And it's admitting right now that it's making common mistakes. 1345 01:15:50,323 --> 01:15:52,170 Hopefully we can avoid that in the future. 1346 01:15:52,170 --> 01:15:55,480 But it asks the student, what do you think? 1347 01:15:55,480 --> 01:15:57,130 And the student might take a look. 1348 01:15:57,130 --> 01:16:00,880 You might think a little bit about how the design of this code can be improved 1349 01:16:00,880 --> 01:16:06,560 and then in response say, "It looks great. 1350 01:16:06,560 --> 01:16:17,600 However, I don't think you have to iterate through the entire string 1351 01:16:17,600 --> 01:16:23,000 once for each condition. 1352 01:16:23,000 --> 01:16:30,010 You can just do it in one iteration." 1353 01:16:30,010 --> 01:16:38,060 And given this advice, the Duck makes its improvement and says, 1354 01:16:38,060 --> 01:16:40,470 good point, how about now? 1355 01:16:40,470 --> 01:16:44,340 And with this, we have an entire log of the Duck's code 1356 01:16:44,340 --> 01:16:47,260 as well as the student-Duck conversation. 1357 01:16:47,260 --> 01:16:50,910 And so as a teacher, seeing this would give us a good idea 1358 01:16:50,910 --> 01:16:53,110 of exactly where our students are. 1359 01:16:53,110 --> 01:16:56,010 So we're not looking at this as assessment for grades 1360 01:16:56,010 --> 01:17:01,620 but rather as evaluating students beyond just 1361 01:17:01,620 --> 01:17:04,710 looking at psets because the thing about CS assessment 1362 01:17:04,710 --> 01:17:08,520 is it often just sticks to correctness and looking at the final result, which 1363 01:17:08,520 --> 01:17:10,470 is very important for computer science. 1364 01:17:10,470 --> 01:17:13,720 But in reality, there should be so much more that we're teaching our students. 1365 01:17:13,720 --> 01:17:16,560 So this is not exactly an evaluation of correctness. 1366 01:17:16,560 --> 01:17:18,750 But instead it's an assessment of everything else 1367 01:17:18,750 --> 01:17:21,000 that we value in the teaching process. 1368 01:17:21,000 --> 01:17:23,860 But with that, I'm basically ending my section of the slideshow. 1369 01:17:23,860 --> 01:17:28,680 So our hope is that through AI tools we can offer better tools for practice 1370 01:17:28,680 --> 01:17:31,060 and assessment in CS education. 1371 01:17:31,060 --> 01:17:33,210 And this is only one chunk of the improvements 1372 01:17:33,210 --> 01:17:36,240 that we're adding to the AI Duck this summer. 1373 01:17:36,240 --> 01:17:39,750 And with that, I'll pass it off to Chris to talk about some more features 1374 01:17:39,750 --> 01:17:41,280 that we're going to be adding. 1375 01:17:41,280 --> 01:17:42,613 CHRISTOPHER PEREZ: Hi, everyone. 1376 01:17:42,613 --> 01:17:46,060 My name is Chris Perez, and I'm also a summer intern here at CS50. 1377 01:17:46,060 --> 01:17:48,220 And now we are going to be transitioning away 1378 01:17:48,220 --> 01:17:51,290 from talking about the OpenAI Completions API. 1379 01:17:51,290 --> 01:17:54,880 And now we are going to be talking about the Assistants API. 1380 01:17:54,880 --> 01:17:57,670 So in addition to the Completions API, CS50 1381 01:17:57,670 --> 01:18:01,360 is also developing prototypes of educational tools using the Assistants 1382 01:18:01,360 --> 01:18:02,620 API. 1383 01:18:02,620 --> 01:18:06,280 In a brief nutshell, it is designed for more interactive use 1384 01:18:06,280 --> 01:18:10,930 cases where the AI can act as a virtual assistant that can both understand 1385 01:18:10,930 --> 01:18:13,250 and manage multi-turn conversations. 1386 01:18:13,250 --> 01:18:17,320 For example, it can handle files, maintain individual conversations, 1387 01:18:17,320 --> 01:18:21,820 or threads as in OpenAI lingo, and can call various tools, which 1388 01:18:21,820 --> 01:18:24,070 we'll talk about in a brief moment. 1389 01:18:24,070 --> 01:18:27,760 And with all these new features that the Assistants API offers us 1390 01:18:27,760 --> 01:18:30,220 in comparison to the Completions API, this 1391 01:18:30,220 --> 01:18:35,200 motivates us using the Assistants API to create custom AI assistants that 1392 01:18:35,200 --> 01:18:36,920 can handle diverse tasks. 1393 01:18:36,920 --> 01:18:41,140 For example, it can act as a code reviewing rubber duck for CS50, which 1394 01:18:41,140 --> 01:18:43,360 we'll see about in a brief moment. 1395 01:18:43,360 --> 01:18:45,730 Before we proceed, though, I would like to note 1396 01:18:45,730 --> 01:18:48,790 that this feature is still in beta. 1397 01:18:48,790 --> 01:18:51,010 So it is still in continuous development. 1398 01:18:51,010 --> 01:18:55,510 And there are also relatively very few tutorials and documentation. 1399 01:18:55,510 --> 01:18:57,370 And it can be quite vague. 1400 01:18:57,370 --> 01:19:00,100 But nonetheless, its features can still be utilized. 1401 01:19:00,100 --> 01:19:03,940 And they are very useful, especially in the educational realm. 1402 01:19:03,940 --> 01:19:09,220 Yes, so now we are going to be talking about the brief framework 1403 01:19:09,220 --> 01:19:14,000 and flow of the Assistants API so we understand how it works behind the hood. 1404 01:19:14,000 --> 01:19:17,020 So we start off with creating an assistant. 1405 01:19:17,020 --> 01:19:20,200 And in order to create an assistant, we can help define the model 1406 01:19:20,200 --> 01:19:25,720 that we want to use, its instructions, as well as various tools, for example, 1407 01:19:25,720 --> 01:19:28,660 like code interpreter, file search, and function calling, which 1408 01:19:28,660 --> 01:19:30,380 we'll talk about in a brief moment. 1409 01:19:30,380 --> 01:19:33,770 You can also give the model a name, a description, 1410 01:19:33,770 --> 01:19:36,220 also other features such as a temperature, which, 1411 01:19:36,220 --> 01:19:41,150 for those unfamiliar, temperature helps control how deterministic an output is. 1412 01:19:41,150 --> 01:19:45,710 So in the Assistants API, the values for temperature ranges from 0 to 2, 1413 01:19:45,710 --> 01:19:50,830 where 0 is closer, is more utilized if you want more deterministic outputs. 1414 01:19:50,830 --> 01:19:53,950 And values closer to 2 make the outputs a lot more random, 1415 01:19:53,950 --> 01:19:58,540 which could be beneficial in cases where the model would like to produce 1416 01:19:58,540 --> 01:20:01,720 certain writing assignments, et cetera. 1417 01:20:01,720 --> 01:20:04,030 On the right hand side of the framework, we also 1418 01:20:04,030 --> 01:20:07,910 have what's called a thread when a user starts a conversation. 1419 01:20:07,910 --> 01:20:11,680 So in this thread, the thread consists of the user message as well as 1420 01:20:11,680 --> 01:20:13,150 the assistant message. 1421 01:20:13,150 --> 01:20:17,300 And here you can also specify tools to be used within the thread. 1422 01:20:17,300 --> 01:20:21,910 And then you can also store metadata within the thread object. 1423 01:20:21,910 --> 01:20:26,950 And now we can also create a message and then append it to this specific thread 1424 01:20:26,950 --> 01:20:29,000 as the user asks questions. 1425 01:20:29,000 --> 01:20:31,480 And so as we proceed by calling the Assistant API, 1426 01:20:31,480 --> 01:20:34,180 this thread is continually updated with a chat history 1427 01:20:34,180 --> 01:20:36,140 that we can refer to later on. 1428 01:20:36,140 --> 01:20:39,610 And then finally, once we append a message to our thread, 1429 01:20:39,610 --> 01:20:43,120 we can then run the assistant to generate a message and a response 1430 01:20:43,120 --> 01:20:45,670 by calling the model and using its tools. 1431 01:20:45,670 --> 01:20:48,100 I would like to note that the assistant also supports 1432 01:20:48,100 --> 01:20:50,620 streaming, which for those unfamiliar, it 1433 01:20:50,620 --> 01:20:53,260 helps create a more ChatGPT-like experience, 1434 01:20:53,260 --> 01:20:56,470 where the words and responses outputted by the assistant 1435 01:20:56,470 --> 01:21:00,430 are streamed onto the screen in a continuous display. 1436 01:21:00,430 --> 01:21:03,650 So you can toggle that on or off. 1437 01:21:03,650 --> 01:21:07,000 The assistant API can also support a JSON return format. 1438 01:21:07,000 --> 01:21:10,270 So you can specify certain parameters in a format 1439 01:21:10,270 --> 01:21:12,820 that is outputted by the assistant. 1440 01:21:12,820 --> 01:21:16,390 But I'd like to note that this is only possible when you're not 1441 01:21:16,390 --> 01:21:18,110 using the code interpreter tool. 1442 01:21:18,110 --> 01:21:22,990 And then finally, in this run, we are able to store a chat history, which 1443 01:21:22,990 --> 01:21:28,120 is updated dynamically in the thread to use right now and also for future use 1444 01:21:28,120 --> 01:21:28,875 later on. 1445 01:21:28,875 --> 01:21:32,490 1446 01:21:32,490 --> 01:21:33,310 All right, great. 1447 01:21:33,310 --> 01:21:36,690 So now we are going to be talking a little bit more about the three tools 1448 01:21:36,690 --> 01:21:38,830 that the Assistants API offers. 1449 01:21:38,830 --> 01:21:43,050 So, first, they offer File Search. 1450 01:21:43,050 --> 01:21:47,190 So File Search enhances the capabilities of the assistants 1451 01:21:47,190 --> 01:21:49,960 by incorporating knowledge from external sources, 1452 01:21:49,960 --> 01:21:52,030 such as user-provided documents. 1453 01:21:52,030 --> 01:21:54,330 It's a bit similar to the embeddings that Rongxin 1454 01:21:54,330 --> 01:21:56,260 talked about briefly a bit earlier. 1455 01:21:56,260 --> 01:22:01,260 So OpenAI processes these documents by first breaking them into chunks, 1456 01:22:01,260 --> 01:22:05,250 creating and storing embeddings, and then using both vector 1457 01:22:05,250 --> 01:22:09,940 and keyword search to find relevant content to answer user questions. 1458 01:22:09,940 --> 01:22:12,720 And then finally, the file search tool can 1459 01:22:12,720 --> 01:22:16,620 rank these results to find the most relevant responses before providing 1460 01:22:16,620 --> 01:22:19,800 the user a response to their question. 1461 01:22:19,800 --> 01:22:22,890 Note that for the File Search tool, however, we must store 1462 01:22:22,890 --> 01:22:26,820 and later access files from a vector store, which 1463 01:22:26,820 --> 01:22:30,000 is another API endpoint in OpenAI and then update 1464 01:22:30,000 --> 01:22:33,787 the assistant to the IDs of the files added to the store. 1465 01:22:33,787 --> 01:22:35,620 This will make a little bit more sense later 1466 01:22:35,620 --> 01:22:40,900 on when I give a brief coding demo of how to use the Assistants API. 1467 01:22:40,900 --> 01:22:41,590 Great. 1468 01:22:41,590 --> 01:22:43,340 And so next up, we are going to be talking 1469 01:22:43,340 --> 01:22:47,350 about the Code Interpreter, which is the most relevant tool that we are currently 1470 01:22:47,350 --> 01:22:49,940 experimenting with the Assistants API this summer. 1471 01:22:49,940 --> 01:22:52,390 So the Code Interpreter tool allows the assistant 1472 01:22:52,390 --> 01:22:55,360 to process files of a variety of types and formats. 1473 01:22:55,360 --> 01:23:01,220 And it can also generate files that take in data and images of graphs as inputs. 1474 01:23:01,220 --> 01:23:05,500 It can also be used to write and run Python code in a sandbox environment 1475 01:23:05,500 --> 01:23:10,090 and can also run code iteratively to solve code and math problems. 1476 01:23:10,090 --> 01:23:15,590 It is charged at $0.03 per session, and each session lasts about one hour. 1477 01:23:15,590 --> 01:23:17,710 So after the one hour is done, another session 1478 01:23:17,710 --> 01:23:21,550 is created if the user still wants to continue using the Code Interpreter 1479 01:23:21,550 --> 01:23:22,780 tool. 1480 01:23:22,780 --> 01:23:25,930 And to use the Code Interpreter tool, we attach files. 1481 01:23:25,930 --> 01:23:28,750 Instead of using the vector store endpoint, 1482 01:23:28,750 --> 01:23:31,390 we now would use the file upload endpoint. 1483 01:23:31,390 --> 01:23:34,660 And then throughout the process of creating runs and whatnot, 1484 01:23:34,660 --> 01:23:40,090 we reference the file ID to attach it to other components of our assistant. 1485 01:23:40,090 --> 01:23:45,050 I would also like to note that we can pass files both to the assistant level 1486 01:23:45,050 --> 01:23:46,790 and also to the thread level. 1487 01:23:46,790 --> 01:23:48,180 And the difference is-- 1488 01:23:48,180 --> 01:23:51,050 so for the attaching files to the assistant level, 1489 01:23:51,050 --> 01:23:55,010 the files are available to all threads created under that assistant. 1490 01:23:55,010 --> 01:23:59,630 This may be good for files that want to be referenced throughout all threads, 1491 01:23:59,630 --> 01:24:02,300 such as course materials more broadly. 1492 01:24:02,300 --> 01:24:05,198 Or, for example, at the thread level, we can attach files 1493 01:24:05,198 --> 01:24:06,740 that are just run during that thread. 1494 01:24:06,740 --> 01:24:12,380 And this may be better used or is very purposeful for individual users 1495 01:24:12,380 --> 01:24:15,140 and students who attach their own code files to the assistant 1496 01:24:15,140 --> 01:24:19,337 that they want the assistant themselves to run and interpret. 1497 01:24:19,337 --> 01:24:21,920 A great feature about the Code Interpreter is that it supports 1498 01:24:21,920 --> 01:24:28,850 a variety of file types such as .c, .doc, HTML, Python, Javascript, CSV, 1499 01:24:28,850 --> 01:24:32,980 and many more other types of files. 1500 01:24:32,980 --> 01:24:35,650 And then last but not least, Assistant API also 1501 01:24:35,650 --> 01:24:39,100 offers a function calling tool which allows us to describe functions 1502 01:24:39,100 --> 01:24:42,970 to our assistant and have it return the functions that need to be called along 1503 01:24:42,970 --> 01:24:44,360 with their arguments. 1504 01:24:44,360 --> 01:24:46,990 Now, this is also useful because it allows the model 1505 01:24:46,990 --> 01:24:51,160 to interact with and run external functions, which expands the Assistant 1506 01:24:51,160 --> 01:24:55,280 APIs functions beyond text generation and language understanding. 1507 01:24:55,280 --> 01:24:59,000 1508 01:24:59,000 --> 01:25:04,100 All right, so now we will be walking through a little coding demo with Python 1509 01:25:04,100 --> 01:25:07,800 to see just how we could potentially create an Assistants API. 1510 01:25:07,800 --> 01:25:10,640 I will note that if you want some more detailed documentation, 1511 01:25:10,640 --> 01:25:14,695 you can visit the OpenAI walkthrough themselves on their documentation page. 1512 01:25:14,695 --> 01:25:17,540 1513 01:25:17,540 --> 01:25:18,570 All right, great. 1514 01:25:18,570 --> 01:25:22,250 So to begin, we first want to import the OpenAI library 1515 01:25:22,250 --> 01:25:25,740 and establish a client, which also requires an API key. 1516 01:25:25,740 --> 01:25:30,290 So before being able to run the assistants, just as in the Completions 1517 01:25:30,290 --> 01:25:35,090 API, we'll need an API key to be able to run this and then create a personalized 1518 01:25:35,090 --> 01:25:36,380 assistant. 1519 01:25:36,380 --> 01:25:38,660 So in this creation for the assistant, we 1520 01:25:38,660 --> 01:25:40,500 can specify the name of the assistant. 1521 01:25:40,500 --> 01:25:43,350 So in this case, it is a CS50 assistant. 1522 01:25:43,350 --> 01:25:46,260 We can also specify instructions. 1523 01:25:46,260 --> 01:25:50,150 So in this case, it is a CS50 debugging Duck. 1524 01:25:50,150 --> 01:25:53,900 And then we can also tell it other tools it 1525 01:25:53,900 --> 01:25:57,140 can use, such as Code Interpreter or the other two tools 1526 01:25:57,140 --> 01:25:58,770 that the assistants can use. 1527 01:25:58,770 --> 01:26:01,790 And then we can also attach different file ids, 1528 01:26:01,790 --> 01:26:05,390 giving it specific course content or coding programs 1529 01:26:05,390 --> 01:26:09,390 that the assistant can run. 1530 01:26:09,390 --> 01:26:13,200 Another note I would like to mention is that when using the code interpreter, 1531 01:26:13,200 --> 01:26:15,480 I find it very helpful in our experimentations 1532 01:26:15,480 --> 01:26:19,230 thus far to specify in the instructions themselves 1533 01:26:19,230 --> 01:26:23,250 to use the code interpreter, the assistant, at the beginning 1534 01:26:23,250 --> 01:26:27,480 so that it can help avoid hallucinations when referencing 1535 01:26:27,480 --> 01:26:30,480 that file to generate responses. 1536 01:26:30,480 --> 01:26:33,450 And then we also want to attach files here via the file API, 1537 01:26:33,450 --> 01:26:38,430 as I mentioned briefly before, which can accept both image URLs and file uploads. 1538 01:26:38,430 --> 01:26:41,580 And then models that we use for the Assistants API 1539 01:26:41,580 --> 01:26:45,150 that support vision can also accept image inputs. 1540 01:26:45,150 --> 01:26:48,960 And currently there is a 100-gigabyte limit for organizations and users 1541 01:26:48,960 --> 01:26:53,500 in terms of file storage. 1542 01:26:53,500 --> 01:26:54,040 Great. 1543 01:26:54,040 --> 01:26:57,130 Next, we can create an empty thread with our assistant, which 1544 01:26:57,130 --> 01:26:59,920 is helpful in maintaining context across a session 1545 01:26:59,920 --> 01:27:02,060 between a user and an assistant. 1546 01:27:02,060 --> 01:27:05,320 And just as a great note, there is no limit to the number of messages 1547 01:27:05,320 --> 01:27:07,180 that we could store in a thread. 1548 01:27:07,180 --> 01:27:11,110 And then once the size of the messages exceeds the context window of the model, 1549 01:27:11,110 --> 01:27:13,630 the thread will attempt to smartly truncate messages 1550 01:27:13,630 --> 01:27:18,370 before dropping the other ones it considers the least important. 1551 01:27:18,370 --> 01:27:20,380 And then we can also create a thread by passing 1552 01:27:20,380 --> 01:27:26,050 in a message we want added to it, which can include the file IDs and tools. 1553 01:27:26,050 --> 01:27:29,800 And finally, we store the OpenAI thread object in the variable thread 1554 01:27:29,800 --> 01:27:33,880 as shown to my left, which we will help reference throughout the student 1555 01:27:33,880 --> 01:27:35,450 assistant conversation. 1556 01:27:35,450 --> 01:27:38,340 1557 01:27:38,340 --> 01:27:41,220 Next, up we can use the messages.create function 1558 01:27:41,220 --> 01:27:44,120 to append a new message to an existing thread. 1559 01:27:44,120 --> 01:27:46,920 For example, here we are adding an initial user message 1560 01:27:46,920 --> 01:27:49,450 where they can ask for help on the Mario pset. 1561 01:27:49,450 --> 01:27:53,220 And in this demo, we also show that they are attaching their code 1562 01:27:53,220 --> 01:27:56,550 and as well as specifying that they want to use the code interpreter 1563 01:27:56,550 --> 01:27:58,820 tool for this thread. 1564 01:27:58,820 --> 01:28:02,180 And now that we have a message that we want the assistant to respond to, 1565 01:28:02,180 --> 01:28:06,560 we can now create a run, which takes in both the thread ID and assistant ID 1566 01:28:06,560 --> 01:28:08,220 that we want to refer to. 1567 01:28:08,220 --> 01:28:11,270 And then this process returns a run object 1568 01:28:11,270 --> 01:28:14,240 that can help give us more information on the status of the run 1569 01:28:14,240 --> 01:28:18,240 and the parameters specified to the thread and assistant. 1570 01:28:18,240 --> 01:28:20,540 Alternatively, you can use another function 1571 01:28:20,540 --> 01:28:23,270 that can create a thread and a run in one step. 1572 01:28:23,270 --> 01:28:25,220 But I'm showing it here in two separate steps 1573 01:28:25,220 --> 01:28:27,845 so that we can both understand the different types of functions 1574 01:28:27,845 --> 01:28:29,170 that OpenAI offers us. 1575 01:28:29,170 --> 01:28:31,780 1576 01:28:31,780 --> 01:28:35,020 And then finally, after we create our run, 1577 01:28:35,020 --> 01:28:38,850 we can use a runs retrieve function to check up on the status of our run. 1578 01:28:38,850 --> 01:28:41,770 We can use the returned object to determine the status of our run, 1579 01:28:41,770 --> 01:28:46,370 for example, if it's completed, failed, canceled or expired, for example. 1580 01:28:46,370 --> 01:28:49,720 Once completed, we can retrieve the messages in the run thread 1581 01:28:49,720 --> 01:28:52,670 and print them out as shown on the screen to my left. 1582 01:28:52,670 --> 01:28:55,900 Note that this example is not for streaming. 1583 01:28:55,900 --> 01:28:58,660 But the Assistant AI does allow for streaming, 1584 01:28:58,660 --> 01:29:01,930 which would change a bit about how we would manipulate 1585 01:29:01,930 --> 01:29:06,610 the output for the final message, which would require us to use a RunStepDelta 1586 01:29:06,610 --> 01:29:08,520 object. 1587 01:29:08,520 --> 01:29:10,530 In addition to the code, we can also experiment 1588 01:29:10,530 --> 01:29:13,320 with the Assistants API in the Assistants Playgrounds 1589 01:29:13,320 --> 01:29:18,570 on OpenAI's website, which is a bit more user friendly and less daunting 1590 01:29:18,570 --> 01:29:23,430 because certainly the documentation can be a little vague at times. 1591 01:29:23,430 --> 01:29:26,400 In terms of pricing, tokens used for the assistant 1592 01:29:26,400 --> 01:29:31,080 are charged by the specific model that we specify when we create the assistant. 1593 01:29:31,080 --> 01:29:33,480 And also, for example, the Code Interpreter 1594 01:29:33,480 --> 01:29:37,170 is charged at $0.03 per session, which again lasts for one hour. 1595 01:29:37,170 --> 01:29:41,580 File Search is charged at $0.10 per gigabyte of vector storage per day, 1596 01:29:41,580 --> 01:29:44,950 and you get one gigabyte of storage for free. 1597 01:29:44,950 --> 01:29:45,880 So great. 1598 01:29:45,880 --> 01:29:49,200 And now with that, I will pass it off to our other intern, Ben, 1599 01:29:49,200 --> 01:29:52,050 who will be giving us a demo of exactly how we've been 1600 01:29:52,050 --> 01:29:54,340 able to implement the Assistants API. 1601 01:29:54,340 --> 01:29:56,010 Thank you so much. 1602 01:29:56,010 --> 01:29:57,960 BENJAMIN XU: Hello, my name is Ben. 1603 01:29:57,960 --> 01:30:02,970 I'm the last intern here at the CS50 R&D development team. 1604 01:30:02,970 --> 01:30:05,910 And I'm very excited to be here and present on how we've actually 1605 01:30:05,910 --> 01:30:09,880 been utilizing the assistant API that Chris has just discussed 1606 01:30:09,880 --> 01:30:12,160 in some of our new development. 1607 01:30:12,160 --> 01:30:14,350 Specifically, I'll be looking at student assessment 1608 01:30:14,350 --> 01:30:18,160 in the age of generative AI, which Julianna and David have 1609 01:30:18,160 --> 01:30:19,010 touched on a bit. 1610 01:30:19,010 --> 01:30:22,460 1611 01:30:22,460 --> 01:30:26,150 All right, so I'm going to go back to the very first slide that you saw today. 1612 01:30:26,150 --> 01:30:28,430 And that is because it's probably something 1613 01:30:28,430 --> 01:30:32,240 that's been on many of our minds these past year or two. 1614 01:30:32,240 --> 01:30:36,030 Powerful tools like ChatGPT are too helpful. 1615 01:30:36,030 --> 01:30:38,750 And given this, many of us educators especially 1616 01:30:38,750 --> 01:30:44,880 wonder, do current metrics of assessment accurately measure student mastery? 1617 01:30:44,880 --> 01:30:48,030 For example, is passing every test case in a pset 1618 01:30:48,030 --> 01:30:51,480 enough to accurately predict that a student fully understands 1619 01:30:51,480 --> 01:30:53,380 the material or skills of the unit? 1620 01:30:53,380 --> 01:30:57,390 And to be honest, if you're like me and you witnessed ChatGPT bang 1621 01:30:57,390 --> 01:31:00,510 out entire repositories of code within minutes, 1622 01:31:00,510 --> 01:31:04,680 you may be a bit worried about what AI could mean for assessment in CS. 1623 01:31:04,680 --> 01:31:06,780 And don't get me wrong, AI is an amazing tool 1624 01:31:06,780 --> 01:31:09,660 that provides millions of students with information 1625 01:31:09,660 --> 01:31:11,580 that they wouldn't have otherwise. 1626 01:31:11,580 --> 01:31:14,190 But in terms of skills and understanding, 1627 01:31:14,190 --> 01:31:17,830 it can cause teachers and students alike to be a bit uncertain. 1628 01:31:17,830 --> 01:31:22,750 1629 01:31:22,750 --> 01:31:28,520 So, for instance, many questions have come up recently, the first of being, 1630 01:31:28,520 --> 01:31:31,700 has AI become a crutch to students? 1631 01:31:31,700 --> 01:31:36,790 or whether AI has made students lazy or the classic, 1632 01:31:36,790 --> 01:31:42,260 how well do I really understand the material if I'm using AI to help me? 1633 01:31:42,260 --> 01:31:46,340 And lastly, "will AI hurt me if I use it now in the future?" 1634 01:31:46,340 --> 01:31:49,370 These questions are by no means easy to answer. 1635 01:31:49,370 --> 01:31:52,890 AI has exploded onto the education scene faster than we could have imagined. 1636 01:31:52,890 --> 01:31:55,520 And as great of a resource as AI has been, 1637 01:31:55,520 --> 01:31:58,650 it has definitely posed a challenge for CS assessments. 1638 01:31:58,650 --> 01:32:02,720 The classic binary pass or fail test case checks, for example, 1639 01:32:02,720 --> 01:32:04,700 might not be enough for a teacher or a student 1640 01:32:04,700 --> 01:32:07,210 to know where they're really at in terms of actual skills. 1641 01:32:07,210 --> 01:32:11,960 1642 01:32:11,960 --> 01:32:14,510 So this past year, our in-person teaching staff 1643 01:32:14,510 --> 01:32:16,920 came together and discussed ways around this. 1644 01:32:16,920 --> 01:32:20,180 Is there a way to assess students so that both students and teachers can 1645 01:32:20,180 --> 01:32:24,470 be confident that the students understand the course material? 1646 01:32:24,470 --> 01:32:27,210 And this is by no means the only solution to this. 1647 01:32:27,210 --> 01:32:32,367 But one idea that we implemented at Yale this past fall was the code review. 1648 01:32:32,367 --> 01:32:35,200 Now, the code review, it's a common practice in software engineering 1649 01:32:35,200 --> 01:32:38,170 where, in very simple terms, a programmer 1650 01:32:38,170 --> 01:32:41,020 will review another programmer's code. 1651 01:32:41,020 --> 01:32:42,730 It's a very open-ended process. 1652 01:32:42,730 --> 01:32:46,240 And there can be discussions of anything from design choices to variable naming 1653 01:32:46,240 --> 01:32:49,170 conventions to even edge case testing. 1654 01:32:49,170 --> 01:32:54,190 So at Yale, we thought, what if each and every one of these 200 to 250 students 1655 01:32:54,190 --> 01:32:57,870 sat down one on one with a TA and did a code review? 1656 01:32:57,870 --> 01:33:00,310 If students could thoughtfully articulate their code 1657 01:33:00,310 --> 01:33:03,130 into English language, then that in itself 1658 01:33:03,130 --> 01:33:05,248 is a pretty strong demonstration of mastery. 1659 01:33:05,248 --> 01:33:06,540 So that's actually what we did. 1660 01:33:06,540 --> 01:33:09,550 1661 01:33:09,550 --> 01:33:16,090 During our sections, which is just a one-a-week, three-hour-long recitation, 1662 01:33:16,090 --> 01:33:19,120 halfway through the semester, we chose one problem set per student, 1663 01:33:19,120 --> 01:33:22,180 usually one that was maybe a bit tougher for them, 1664 01:33:22,180 --> 01:33:26,110 and took 10 to 15 minutes with each student going through their code. 1665 01:33:26,110 --> 01:33:29,600 And this was actually quite effective. 1666 01:33:29,600 --> 01:33:32,335 It was able to tell us that-- 1667 01:33:32,335 --> 01:33:36,700 it was able to help teaching assistants gauge where students were at, i.e, 1668 01:33:36,700 --> 01:33:39,610 which concepts of students were very solid on and maybe 1669 01:33:39,610 --> 01:33:43,160 which that they implemented with a bit of guesswork. 1670 01:33:43,160 --> 01:33:45,340 So from there, both us teachers and students 1671 01:33:45,340 --> 01:33:48,377 could see that, oh, even though they got all the cases right, 1672 01:33:48,377 --> 01:33:50,710 they could have used some more practice in this one area 1673 01:33:50,710 --> 01:33:53,160 or this other area, so a big success. 1674 01:33:53,160 --> 01:33:56,530 However, there's an immediate problem with this type of assessment, 1675 01:33:56,530 --> 01:34:00,710 and it's that code reviews are very unscalable. 1676 01:34:00,710 --> 01:34:05,980 Even if a student can meet with a TA for 15 minutes one time, 1677 01:34:05,980 --> 01:34:09,160 what if they wanted to meet on a different problem set or every problem 1678 01:34:09,160 --> 01:34:16,810 set or some problem sets multiple times and review them iteratively? 1679 01:34:16,810 --> 01:34:20,040 This is just really unfeasible for TAs. 1680 01:34:20,040 --> 01:34:24,790 And what can really help here is the Assistants API. 1681 01:34:24,790 --> 01:34:29,260 So with the Assistants API Code Interpreter, as Chris mentioned, 1682 01:34:29,260 --> 01:34:32,950 a model can easily become the TA in a code review. 1683 01:34:32,950 --> 01:34:34,790 So this is the interface. 1684 01:34:34,790 --> 01:34:38,290 A student can simply upload any file that they choose. 1685 01:34:38,290 --> 01:34:41,840 And let's say the student wants to review Mario. 1686 01:34:41,840 --> 01:34:43,640 So they upload Mario. 1687 01:34:43,640 --> 01:34:47,710 And what happens is their code instantly shows up 1688 01:34:47,710 --> 01:34:50,200 on the left in a VS Code-like monitor. 1689 01:34:50,200 --> 01:34:52,840 And on the right is the code review. 1690 01:34:52,840 --> 01:34:58,930 So let's start by the student getting asked a question. 1691 01:34:58,930 --> 01:35:02,140 And it says, hello, let's dive into your code review. 1692 01:35:02,140 --> 01:35:04,510 I see that you've implemented a loop to ensure the user 1693 01:35:04,510 --> 01:35:06,500 inputs a height between 1 and 8. 1694 01:35:06,500 --> 01:35:09,422 Can you explain why you chose to use a do while loop specifically? 1695 01:35:09,422 --> 01:35:12,380 What advantages does it offer in this scenario compared to other loops? 1696 01:35:12,380 --> 01:35:15,190 And the good thing about this is that it actually highlights the code for you 1697 01:35:15,190 --> 01:35:15,980 in the editor. 1698 01:35:15,980 --> 01:35:18,610 So you can see in the context of your file 1699 01:35:18,610 --> 01:35:22,400 where this ddb is really reviewing you on. 1700 01:35:22,400 --> 01:35:26,390 So let's say you didn't really know why the 1701 01:35:26,390 --> 01:35:28,585 do while loop was the best choice here, but you 1702 01:35:28,585 --> 01:35:30,210 knew what your do while loop was doing. 1703 01:35:30,210 --> 01:35:32,877 So you said I wanted to ask the user for a height once initially 1704 01:35:32,877 --> 01:35:34,880 but then ask again if the height was invalid. 1705 01:35:34,880 --> 01:35:39,320 So the debugger now reviewing your code will say something 1706 01:35:39,320 --> 01:35:41,550 along the lines of that's a great explanation. 1707 01:35:41,550 --> 01:35:44,600 The do while loop works especially well here for some reason. 1708 01:35:44,600 --> 01:35:49,533 And it also includes a little snippet on why other loops wouldn't be as good. 1709 01:35:49,533 --> 01:35:52,700 So a while loop would also work, but it would require some unnecessary code. 1710 01:35:52,700 --> 01:35:54,590 And think about why that would be. 1711 01:35:54,590 --> 01:35:56,000 So then it keeps going. 1712 01:35:56,000 --> 01:35:59,470 So it highlights the next line and asks you-- 1713 01:35:59,470 --> 01:36:02,455 I'm sorry, moving on. 1714 01:36:02,455 --> 01:36:05,080 Can you walk me walk me through the logic of the outer for loop 1715 01:36:05,080 --> 01:36:06,590 in the nested for loop structure? 1716 01:36:06,590 --> 01:36:08,150 What role do the variables play here? 1717 01:36:08,150 --> 01:36:12,250 And let's say you said, i is a counter variable, and n is the height. 1718 01:36:12,250 --> 01:36:15,070 So you're just counting up to the height. 1719 01:36:15,070 --> 01:36:18,490 And the response is actually pretty insightful. 1720 01:36:18,490 --> 01:36:19,220 That's right. 1721 01:36:19,220 --> 01:36:21,770 In your code, i is a counter variable, and n is the height. 1722 01:36:21,770 --> 01:36:23,860 However, this can be a bit confusing for someone 1723 01:36:23,860 --> 01:36:25,940 who hasn't read your code before or even yourself 1724 01:36:25,940 --> 01:36:27,565 if you came back to this after a while. 1725 01:36:27,565 --> 01:36:30,363 This is why variable naming is so important in programming. 1726 01:36:30,363 --> 01:36:32,780 Can you think of any better names for these two variables? 1727 01:36:32,780 --> 01:36:36,640 So now the student can respond with maybe something like current row 1728 01:36:36,640 --> 01:36:37,550 and height. 1729 01:36:37,550 --> 01:36:42,020 And here the ddb continues the conversation and keeps going. 1730 01:36:42,020 --> 01:36:45,310 So as you can see, even from two snippets alone, 1731 01:36:45,310 --> 01:36:48,820 we already have some pretty important things 1732 01:36:48,820 --> 01:36:53,710 that wouldn't be tested in a tool like check50 or style50. 1733 01:36:53,710 --> 01:36:57,880 There's one conceptual kink that maybe the student hadn't even thought of 1734 01:36:57,880 --> 01:37:03,010 and another just very important convention standard 1735 01:37:03,010 --> 01:37:04,670 that the student was not following. 1736 01:37:04,670 --> 01:37:09,020 So these assessments, like check50 and style50 are great tools. 1737 01:37:09,020 --> 01:37:12,355 But now with the Assistants API, they're able to get much more nuanced feedback. 1738 01:37:12,355 --> 01:37:19,420 1739 01:37:19,420 --> 01:37:23,990 You may be wondering like, how can these tools actually be used in an assessment. 1740 01:37:23,990 --> 01:37:27,157 And the way you use such a tool largely depends on your pedagogical beliefs. 1741 01:37:27,157 --> 01:37:30,240 You've probably noticed that I've been saying that these concerns about AI 1742 01:37:30,240 --> 01:37:32,120 are not just from teachers but also students. 1743 01:37:32,120 --> 01:37:35,590 And I myself sometimes question whether I really gained all the skills 1744 01:37:35,590 --> 01:37:39,770 a class had to offer if an AI had to help me to get to the end of the goal. 1745 01:37:39,770 --> 01:37:43,540 And, of course, the Assistants API could be used, like repurposed, 1746 01:37:43,540 --> 01:37:47,230 as a TA that grades the students based on their responses. 1747 01:37:47,230 --> 01:37:50,913 Or the code review can serve as a truly valuable self-assessment tool 1748 01:37:50,913 --> 01:37:53,830 where the students can gauge their own understanding of their own code 1749 01:37:53,830 --> 01:37:56,830 and find things they missed all in their own free time 1750 01:37:56,830 --> 01:37:59,140 without anxiety of judgment. 1751 01:37:59,140 --> 01:38:01,300 Regardless, this is just one potential application 1752 01:38:01,300 --> 01:38:03,250 of the Asssitants API to education. 1753 01:38:03,250 --> 01:38:06,610 And the API is still very new, so I highly encourage you all 1754 01:38:06,610 --> 01:38:09,160 to make something out of it as well. 1755 01:38:09,160 --> 01:38:12,320 RONGXIN LIU: All right, thank you, Ben, Chris, and Julianna. 1756 01:38:12,320 --> 01:38:16,580 So that's a lot to take away from this talk. 1757 01:38:16,580 --> 01:38:19,950 But I want you to end this talk with just a quick takeaway. 1758 01:38:19,950 --> 01:38:23,480 So what can you all do with all of the technology we just mentioned? 1759 01:38:23,480 --> 01:38:26,390 You definitely don't need to re-implement what we did. 1760 01:38:26,390 --> 01:38:29,880 Or you certainly don't have the bandwidth to implement what we did. 1761 01:38:29,880 --> 01:38:30,890 That's OK. 1762 01:38:30,890 --> 01:38:35,000 Luckily, there's a more user-friendly tool that you can actually utilize, 1763 01:38:35,000 --> 01:38:36,560 which is the OpenAI GPT. 1764 01:38:36,560 --> 01:38:39,150 It's a GUI-based GPT builder. 1765 01:38:39,150 --> 01:38:41,540 You can literally go talk to this GPT builder 1766 01:38:41,540 --> 01:38:47,960 and build or rebuild or recreate a CS50 Duck right from OpenAI's platform. 1767 01:38:47,960 --> 01:38:51,530 And you can essentially create this GPT and distribute it 1768 01:38:51,530 --> 01:38:54,640 to your student to use. 1769 01:38:54,640 --> 01:38:57,430 And for more information on how you can actually build a GPT, 1770 01:38:57,430 --> 01:39:01,670 I would please look at the link in the slide for more information. 1771 01:39:01,670 --> 01:39:04,220 But this screenshot basically captured the process. 1772 01:39:04,220 --> 01:39:08,985 I just quickly was able to create a CS50 Duck on OpenAI's platform. 1773 01:39:08,985 --> 01:39:12,210 1774 01:39:12,210 --> 01:39:16,350 And with that, I hope you have a sense of what CS50 AI was actually 1775 01:39:16,350 --> 01:39:17,460 doing underneath the hood. 1776 01:39:17,460 --> 01:39:19,905 And I hope you learned something from this talk. 1777 01:39:19,905 --> 01:39:21,780 If you want to review the talk, certainly you 1778 01:39:21,780 --> 01:39:26,070 can just rewatch the video recording and then look at the provided code to you 1779 01:39:26,070 --> 01:39:28,030 as well. 1780 01:39:28,030 --> 01:39:29,780 With that, I'm going to end the talk here. 1781 01:39:29,780 --> 01:39:32,150 Thank you so much for your time. 1782 01:39:32,150 --> 01:39:33,000