1 00:00:00,000 --> 00:00:03,486 [MUSIC PLAYING] 2 00:00:03,486 --> 00:01:07,345 3 00:01:07,345 --> 00:01:10,960 TOM CRUISE: I'm going to show you some magic. 4 00:01:10,960 --> 00:01:12,250 It's the real thing. 5 00:01:12,250 --> 00:01:14,420 [LAUGHTER] 6 00:01:14,420 --> 00:01:24,340 I mean, it's all the real thing. 7 00:01:24,340 --> 00:01:26,270 [LAUGHTER] 8 00:01:26,270 --> 00:01:27,410 DAVID J. MALAN: All right. 9 00:01:27,410 --> 00:01:30,950 This is CS50, Harvard University's Introduction 10 00:01:30,950 --> 00:01:33,140 to the Intellectual Enterprises of Computer Science 11 00:01:33,140 --> 00:01:34,430 and the Art of Programming. 12 00:01:34,430 --> 00:01:37,760 My name is David Malan, and this is our family-friendly introduction 13 00:01:37,760 --> 00:01:41,780 to artificial intelligence or AI, which seems to be everywhere these days. 14 00:01:41,780 --> 00:01:45,140 But first, a word on these rubber ducks, which your students 15 00:01:45,140 --> 00:01:46,487 might have had for some time. 16 00:01:46,487 --> 00:01:49,320 Within the world of computer science, and programming in particular, 17 00:01:49,320 --> 00:01:52,145 there's this notion of rubber duck debugging or rubber ducking-- 18 00:01:52,145 --> 00:01:57,080 --whereby in the absence of a colleague, a friend, a family member, a teaching 19 00:01:57,080 --> 00:02:00,120 fellow who might be able to answer your questions about your code, 20 00:02:00,120 --> 00:02:02,210 especially when it's not working, ideally you 21 00:02:02,210 --> 00:02:04,940 might have at least a rubber duck or really any inanimate 22 00:02:04,940 --> 00:02:07,550 object on your desk with whom to talk. 23 00:02:07,550 --> 00:02:11,243 And the idea is, that in expressing your logic, talking through your problems, 24 00:02:11,243 --> 00:02:13,160 even though the duck doesn't actually respond, 25 00:02:13,160 --> 00:02:16,250 invariably, you hear eventually the illogic in your thoughts 26 00:02:16,250 --> 00:02:18,110 and the proverbial light bulb goes off. 27 00:02:18,110 --> 00:02:20,900 Now, for students online for some time, CS50 28 00:02:20,900 --> 00:02:23,370 has had a digital version thereof, whereby 29 00:02:23,370 --> 00:02:25,945 in the programming environment that CS50 students use, 30 00:02:25,945 --> 00:02:29,070 for the past several years, if they don't have a rubber duck on their desk, 31 00:02:29,070 --> 00:02:30,790 they can pull up this interface here. 32 00:02:30,790 --> 00:02:32,850 And if they begin a conversation like, I'm 33 00:02:32,850 --> 00:02:35,850 hoping you can help me solve some problem, up until recently, 34 00:02:35,850 --> 00:02:39,640 CS50's virtual rubber duck would simply quack once, twice, 35 00:02:39,640 --> 00:02:41,010 or three times in total. 36 00:02:41,010 --> 00:02:43,380 But we have anecdotal evidence that alone 37 00:02:43,380 --> 00:02:47,010 was enough to get students to realize what it is they were doing wrong. 38 00:02:47,010 --> 00:02:51,090 But of course, more recently has this duck and so many other ducks, 39 00:02:51,090 --> 00:02:53,340 so to speak, around the world, come to life really. 40 00:02:53,340 --> 00:02:56,310 And your students have been using artificial intelligence 41 00:02:56,310 --> 00:03:00,090 in some form within CS50 as a virtual teaching assistant. 
42 00:03:00,090 --> 00:03:02,130 And what we'll do today, is reveal not only 43 00:03:02,130 --> 00:03:05,370 how we've been using and leveraging AI within CS50, 44 00:03:05,370 --> 00:03:10,530 but also how AI itself works, and to prepare you better for the years ahead. 45 00:03:10,530 --> 00:03:14,910 So last year around this time, like DALL-E 2 and image generation 46 00:03:14,910 --> 00:03:15,870 were all of the rage. 47 00:03:15,870 --> 00:03:18,600 You might have played with this, whereby you can type in some keywords and boom, 48 00:03:18,600 --> 00:03:20,640 you have a dynamically generated image. 49 00:03:20,640 --> 00:03:24,240 Similar tools are like Midjourney, which gives you even more realistic 3D 50 00:03:24,240 --> 00:03:24,960 imagery. 51 00:03:24,960 --> 00:03:27,840 And within that world of image generation, 52 00:03:27,840 --> 00:03:32,370 there were nonetheless some tells, like an observant viewer could tell 53 00:03:32,370 --> 00:03:34,768 that this was probably generated by AI. 54 00:03:34,768 --> 00:03:36,810 And in fact, a few months ago, The New York Times 55 00:03:36,810 --> 00:03:38,470 took a look at some of these tools. 56 00:03:38,470 --> 00:03:41,550 And so, for instance, here is a sequence of images 57 00:03:41,550 --> 00:03:44,350 that at least at left, isn't all that implausible that this 58 00:03:44,350 --> 00:03:45,600 might be an actual photograph. 59 00:03:45,600 --> 00:03:48,000 But in fact, all three of these are AI-generated. 60 00:03:48,000 --> 00:03:50,910 And for some time, there was a certain tell. 61 00:03:50,910 --> 00:03:54,600 Like AI up until recently, really wasn't really good at the finer details, 62 00:03:54,600 --> 00:03:57,120 like the fingers are not quite right. 63 00:03:57,120 --> 00:03:58,950 And so you could have that sort of hint. 64 00:03:58,950 --> 00:04:01,470 But I dare say, AI is getting even better and better, 65 00:04:01,470 --> 00:04:04,420 such that it's getting harder to discern these kinds of things. 66 00:04:04,420 --> 00:04:06,930 So if you haven't already, go ahead and take out your phone 67 00:04:06,930 --> 00:04:08,190 if you have one with you. 68 00:04:08,190 --> 00:04:11,680 And if you'd like to partake, scan this barcode here, 69 00:04:11,680 --> 00:04:13,830 which will lead you to a URL. 70 00:04:13,830 --> 00:04:17,339 And on your screen, you'll have an opportunity in a moment to buzz in. 71 00:04:17,339 --> 00:04:20,310 If my colleague, Rongxin, wouldn't mind joining me up here on stage. 72 00:04:20,310 --> 00:04:22,560 We'll ask you a sequence of questions and see just how 73 00:04:22,560 --> 00:04:25,480 prepared you are for this coming world of AI. 74 00:04:25,480 --> 00:04:27,823 So for instance, once you've got this here, 75 00:04:27,823 --> 00:04:29,490 code scanned, if you don't, that's fine. 76 00:04:29,490 --> 00:04:32,880 You can play along at home or alongside the person next to you. 77 00:04:32,880 --> 00:04:34,920 Here are two images. 78 00:04:34,920 --> 00:04:38,400 And my question for you is, which of these two images, left 79 00:04:38,400 --> 00:04:42,610 or right, was generated by AI? 80 00:04:42,610 --> 00:04:49,740 Which of these two was generated by AI, left or right? 81 00:04:49,740 --> 00:04:51,780 And I think Rongxin, we can flip over and see 82 00:04:51,780 --> 00:04:53,970 as the responses start to come in. 83 00:04:53,970 --> 00:04:58,740 So far, we're about 20% saying left, 70 plus percent saying right. 
84 00:04:58,740 --> 00:05:02,272 3%, 4%, comfortably admitting unsure, and that's fine. 85 00:05:02,272 --> 00:05:04,230 Let's wait for a few more responses to come in, 86 00:05:04,230 --> 00:05:06,837 though I think the right-hand folks have it. 87 00:05:06,837 --> 00:05:09,420 And let's go ahead and flip back and see what the solution is. 88 00:05:09,420 --> 00:05:14,020 In this case, it was, in fact, the right-hand side that was AI-generated. 89 00:05:14,020 --> 00:05:15,127 So, that's great. 90 00:05:15,127 --> 00:05:17,460 I'm not sure what it means that we figured this one out, 91 00:05:17,460 --> 00:05:19,350 but let's try one more here. 92 00:05:19,350 --> 00:05:22,558 So let me propose that we consider now these two images. 93 00:05:22,558 --> 00:05:23,350 It's the same code. 94 00:05:23,350 --> 00:05:25,680 So if you still have your phone up, you don't need to scan again. 95 00:05:25,680 --> 00:05:27,250 It's going to be the same URL here. 96 00:05:27,250 --> 00:05:28,650 But just in case you closed it. 97 00:05:28,650 --> 00:05:30,990 Let's take a look now at these two images. 98 00:05:30,990 --> 00:05:35,040 Which of these, left or right, was AI-generated? 99 00:05:35,040 --> 00:05:38,802 Left or right this time? 100 00:05:38,802 --> 00:05:41,010 Rongxin, should we take a look at how it's coming in? 101 00:05:41,010 --> 00:05:42,570 Oh, it's a little closer this time. 102 00:05:42,570 --> 00:05:44,540 Left or right? 103 00:05:44,540 --> 00:05:46,830 Right's losing a little ground, maybe as people 104 00:05:46,830 --> 00:05:48,930 are changing their answers to left. 105 00:05:48,930 --> 00:05:52,510 More people are unsure this time, which is somewhat revealing. 106 00:05:52,510 --> 00:05:54,790 Let's give folks another second or two. 107 00:05:54,790 --> 00:05:57,200 And Rongxin, should we flip back? 108 00:05:57,200 --> 00:06:00,760 The answer is actually a trick question, since they were both AI. 109 00:06:00,760 --> 00:06:04,120 So most of you, most of you were, in fact, right. 110 00:06:04,120 --> 00:06:08,150 But if you take a glance at this, AI is getting really, really good. 111 00:06:08,150 --> 00:06:13,220 And so this is just a taste of the images that we might see down the line. 112 00:06:13,220 --> 00:06:16,930 And in fact, that video with which we began, 113 00:06:16,930 --> 00:06:20,440 Tom Cruise, as you might have gleaned, was not, in fact, Tom Cruise. 114 00:06:20,440 --> 00:06:22,810 That was an example of a deepfake, a video that 115 00:06:22,810 --> 00:06:26,500 was synthesized, whereby a different human was acting out those motions, 116 00:06:26,500 --> 00:06:31,660 saying those words, but software, artificial intelligence-inspired 117 00:06:31,660 --> 00:06:35,380 software was mutating the actual image and faking this video. 118 00:06:35,380 --> 00:06:38,950 So it's all fun and games for now as we tinker with these kinds of examples, 119 00:06:38,950 --> 00:06:43,000 but suffice it to say, as we've begun to discuss in classes like this already, 120 00:06:43,000 --> 00:06:46,240 disinformation is only going to become more challenging in a world where 121 00:06:46,240 --> 00:06:47,920 it's not just text, but it's imagery. 122 00:06:47,920 --> 00:06:49,452 And all the more, soon video. 123 00:06:49,452 --> 00:06:51,910 But for today, we'll focus really on the fundamentals, what 124 00:06:51,910 --> 00:06:56,230 it is that's enabling technologies like these, and even more familiarly, text 125 00:06:56,230 --> 00:06:57,970 generation, which is all the rage. 
126 00:06:57,970 --> 00:07:01,240 And in fact, it seems just a few months ago, probably everyone in this room 127 00:07:01,240 --> 00:07:04,030 started to hear about tools like ChatGPT. 128 00:07:04,030 --> 00:07:06,800 So we thought we'd do one final exercise here as a group. 129 00:07:06,800 --> 00:07:08,800 And this was another piece in The New York Times 130 00:07:08,800 --> 00:07:11,590 where they asked the audience, "Did a fourth grader write this? 131 00:07:11,590 --> 00:07:12,850 Or the new chatbot?" 132 00:07:12,850 --> 00:07:15,640 So another opportunity to assess your discerning skills. 133 00:07:15,640 --> 00:07:16,450 So same URL. 134 00:07:16,450 --> 00:07:19,840 So if you still have your phone open and that same interface open, 135 00:07:19,840 --> 00:07:21,470 you're in the right place. 136 00:07:21,470 --> 00:07:25,480 And here, we'll take a final stab at two essays of sorts. 137 00:07:25,480 --> 00:07:30,020 Which of these essays was written by AI? 138 00:07:30,020 --> 00:07:32,260 Essay 1 or Essay 2? 139 00:07:32,260 --> 00:07:34,450 And as folks buzz in, I'll read the first. 140 00:07:34,450 --> 00:07:35,020 Essay 1. 141 00:07:35,020 --> 00:07:37,870 I like to bring a yummy sandwich and a cold juice box for lunch. 142 00:07:37,870 --> 00:07:41,860 Sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips. 143 00:07:41,860 --> 00:07:46,090 As we eat, we chat and laugh and catch up on each other's day, dot, dot, dot. 144 00:07:46,090 --> 00:07:46,690 Essay 2. 145 00:07:46,690 --> 00:07:49,243 My mother packs me a sandwich, a drink, fruit, and a treat. 146 00:07:49,243 --> 00:07:51,910 When I get in the lunchroom, I find an empty table and sit there 147 00:07:51,910 --> 00:07:52,930 and I eat my lunch. 148 00:07:52,930 --> 00:07:54,820 My friends come and sit down with me. 149 00:07:54,820 --> 00:07:55,790 Dot, dot, dot. 150 00:07:55,790 --> 00:07:57,550 Rongxin, should we see what folks think? 151 00:07:57,550 --> 00:08:03,040 It looks like most of you think that Essay 1 was generated by AI. 152 00:08:03,040 --> 00:08:09,010 And in fact, if we flip back to the answer here, it was, in fact, Essay 1. 153 00:08:09,010 --> 00:08:13,060 So it's great that we now already have seemingly this discerning eye, 154 00:08:13,060 --> 00:08:15,880 but let me perhaps deflate that enthusiasm 155 00:08:15,880 --> 00:08:20,120 by saying it's only going to get harder to discern one from the other. 156 00:08:20,120 --> 00:08:23,680 And we're really now on the bleeding edge of what's soon to be possible. 157 00:08:23,680 --> 00:08:25,990 But most everyone in this room has probably by now 158 00:08:25,990 --> 00:08:31,450 seen, tried, certainly heard of ChatGPT, which is all about textual generation. 159 00:08:31,450 --> 00:08:34,210 Within CS50 and within academia more generally, 160 00:08:34,210 --> 00:08:37,690 have we been thinking about, talking about, whether and how 161 00:08:37,690 --> 00:08:39,023 to use these kinds of technologies. 162 00:08:39,023 --> 00:08:42,148 And if the students in the room haven't told the family members in the room 163 00:08:42,148 --> 00:08:45,010 already, this here is an excerpt from CS50's own syllabus this year, 164 00:08:45,010 --> 00:08:48,730 whereby we have deemed tools like ChatGPT in their current form, 165 00:08:48,730 --> 00:08:49,808 just too helpful. 
166 00:08:49,808 --> 00:08:51,850 Sort of like an overzealous friend in school, 167 00:08:51,850 --> 00:08:55,520 who just wants to give you all of the answers instead of leading you to them. 168 00:08:55,520 --> 00:09:00,760 And so we simply prohibit by policy using AI-based software, 169 00:09:00,760 --> 00:09:05,200 such as ChatGPT, third-party tools like GitHub Copilot, Bing Chat, and others 170 00:09:05,200 --> 00:09:08,920 that suggest or complete answers to questions or lines of code. 171 00:09:08,920 --> 00:09:13,510 But it would seem reactionary to take away what technology surely has 172 00:09:13,510 --> 00:09:15,400 some potential upsides for education. 173 00:09:15,400 --> 00:09:18,460 And so within CS50 this semester, as well as this past summer, 174 00:09:18,460 --> 00:09:22,300 have we allowed students to use CS50's own AI-based software, which 175 00:09:22,300 --> 00:09:24,490 is, in effect, as we'll discuss, built on top 176 00:09:24,490 --> 00:09:27,700 of these third-party tools, ChatGPT from OpenAI, 177 00:09:27,700 --> 00:09:29,440 companies like Microsoft and beyond. 178 00:09:29,440 --> 00:09:33,820 And in fact, what students can now use, is this brought-to-life CS50 duck, 179 00:09:33,820 --> 00:09:37,270 or DDB, Duck Debugger, within a website of our own, 180 00:09:37,270 --> 00:09:41,230 CS50 AI, and another that your students now know as cs50.dev. 181 00:09:41,230 --> 00:09:43,210 So students are using it, but in a way where 182 00:09:43,210 --> 00:09:46,120 we have tempered the enthusiasm of what might otherwise 183 00:09:46,120 --> 00:09:48,370 be an overly helpful duck to model it more 184 00:09:48,370 --> 00:09:50,480 akin to a good teacher, a good teaching fellow, 185 00:09:50,480 --> 00:09:54,140 who might guide you to the answers, but not simply hand them outright. 186 00:09:54,140 --> 00:09:57,170 So what does that actually mean, and in what form does this duck come? 187 00:09:57,170 --> 00:09:59,960 Well, architecturally, for those of you with engineering backgrounds that 188 00:09:59,960 --> 00:10:02,293 might be curious as to how this is actually implemented, 189 00:10:02,293 --> 00:10:06,260 if a student here in the class has a question, virtually in this case, 190 00:10:06,260 --> 00:10:10,820 they somehow ask these questions of this central web application, cs50.ai. 191 00:10:10,820 --> 00:10:13,760 But we, in turn, have built much of our own logic 192 00:10:13,760 --> 00:10:18,050 on top of third-party services, known as APIs, application programming 193 00:10:18,050 --> 00:10:20,780 interfaces, features that other companies provide 194 00:10:20,780 --> 00:10:22,530 that people like us can use. 195 00:10:22,530 --> 00:10:25,250 So they are doing really a lot of the heavy lifting, 196 00:10:25,250 --> 00:10:27,380 these so-called large language models. 197 00:10:27,380 --> 00:10:30,350 But we, too, have information that is not in these models yet. 198 00:10:30,350 --> 00:10:32,720 For instance, the words that came out of my mouth 199 00:10:32,720 --> 00:10:36,500 just last week when we had a lecture on some other topic, not to mention all 200 00:10:36,500 --> 00:10:39,270 of the past lectures and homework assignments from this year. 
201 00:10:39,270 --> 00:10:41,510 So we have our own vector database locally 202 00:10:41,510 --> 00:10:44,570 via which we can search for more recent information, 203 00:10:44,570 --> 00:10:47,900 and then hand some of that information into these models, which you might 204 00:10:47,900 --> 00:10:51,870 recall, at least for OpenAI, are cut off as of 2021 as 205 00:10:51,870 --> 00:10:54,240 of now, to make the information even more current. 206 00:10:54,240 --> 00:10:56,590 So architecturally, that's sort of the flow. 207 00:10:56,590 --> 00:10:58,980 But for now, I thought I'd share at a higher level what 208 00:10:58,980 --> 00:11:01,440 it is your students are already familiar with, 209 00:11:01,440 --> 00:11:04,230 and what will soon be more broadly available to our own students 210 00:11:04,230 --> 00:11:05,650 online as well. 211 00:11:05,650 --> 00:11:08,190 So what we focused on is, what's generally 212 00:11:08,190 --> 00:11:11,820 now known as prompt engineering, which isn't really a technical phrase, 213 00:11:11,820 --> 00:11:14,500 because it's not so much engineering in the traditional sense. 214 00:11:14,500 --> 00:11:16,650 It really is just English, what we are largely 215 00:11:16,650 --> 00:11:20,520 doing when it comes to giving the AI the personality 216 00:11:20,520 --> 00:11:22,800 of a good teacher or a good duck. 217 00:11:22,800 --> 00:11:26,460 So what we're doing, is giving it what's known as a system prompt nowadays, 218 00:11:26,460 --> 00:11:31,020 whereby we write some English sentences, send those English sentences to OpenAI 219 00:11:31,020 --> 00:11:34,560 or Microsoft, that sort of teaches it how to behave. 220 00:11:34,560 --> 00:11:36,930 Not just using its own knowledge out of the box, 221 00:11:36,930 --> 00:11:40,290 but coercing it to behave a little more educationally constructively. 222 00:11:40,290 --> 00:11:42,720 And so for instance, a representative snippet 223 00:11:42,720 --> 00:11:44,622 of English that we provide to these services 224 00:11:44,622 --> 00:11:46,080 looks a little something like this. 225 00:11:46,080 --> 00:11:50,600 Quote, unquote, "You are a friendly and supportive teaching assistant for CS50. 226 00:11:50,600 --> 00:11:52,520 You are also a rubber duck. 227 00:11:52,520 --> 00:11:57,080 You answer student questions only about CS50 and the field of computer science, 228 00:11:57,080 --> 00:11:59,900 do not answer questions about unrelated topics. 229 00:11:59,900 --> 00:12:02,060 Do not provide full answers to problem sets, 230 00:12:02,060 --> 00:12:04,130 as this would violate academic honesty." 231 00:12:04,130 --> 00:12:07,610 And so in essence, and you can do this manually with ChatGPT, 232 00:12:07,610 --> 00:12:09,990 you can tell it or ask it how to behave. 233 00:12:09,990 --> 00:12:11,910 We, essentially, are doing this automatically, 234 00:12:11,910 --> 00:12:14,240 so that it doesn't just hand answers out of the box 235 00:12:14,240 --> 00:12:16,310 and knows a little something more about us. 236 00:12:16,310 --> 00:12:19,310 There's also in this world of AI right now the notion of a user 237 00:12:19,310 --> 00:12:21,380 prompt versus that system prompt. 238 00:12:21,380 --> 00:12:25,060 And the user prompt, in our case, is essentially the student's own question. 
239 00:12:25,060 --> 00:12:29,630 I have a question about x, or I have a problem with my code here in y, 240 00:12:29,630 --> 00:12:32,720 so we pass to those same APIs, students' own questions 241 00:12:32,720 --> 00:12:34,670 as part of this so-called user prompt. 242 00:12:34,670 --> 00:12:37,490 Just so you're familiar now with some of the vernacular of late. 243 00:12:37,490 --> 00:12:39,200 Now, the programming environment that students 244 00:12:39,200 --> 00:12:41,575 have been using this whole year is known as Visual Studio 245 00:12:41,575 --> 00:12:45,260 Code, a popular open source, free product, that most-- 246 00:12:45,260 --> 00:12:47,450 so many engineers around the world now use. 247 00:12:47,450 --> 00:12:50,580 But we've instrumented it to be a little more course-specific 248 00:12:50,580 --> 00:12:55,830 with some course-specific features that make learning within this environment 249 00:12:55,830 --> 00:12:57,900 all the easier. 250 00:12:57,900 --> 00:12:59,220 It lives at cs50.dev. 251 00:12:59,220 --> 00:13:02,370 And as students in this room know, that as of now, 252 00:13:02,370 --> 00:13:04,650 the virtual duck lives within this environment 253 00:13:04,650 --> 00:13:07,540 and can do things like explain highlighted lines of code. 254 00:13:07,540 --> 00:13:10,560 So here, for instance, is a screenshot of this programming environment. 255 00:13:10,560 --> 00:13:14,550 Here is some arcane looking code in a language called C, that we've just 256 00:13:14,550 --> 00:13:16,082 left behind us in the class. 257 00:13:16,082 --> 00:13:19,290 And suppose that you don't understand what one or more of these lines of code 258 00:13:19,290 --> 00:13:19,790 do. 259 00:13:19,790 --> 00:13:23,580 Students can now highlight those lines, right-click or Control click on it, 260 00:13:23,580 --> 00:13:26,440 select explain highlighted code, and voila, 261 00:13:26,440 --> 00:13:32,040 they see a ChatGPT-like explanation of that very code within a second or so, 262 00:13:32,040 --> 00:13:35,100 that no human has typed out, but that's been dynamically generated 263 00:13:35,100 --> 00:13:36,660 based on this code. 264 00:13:36,660 --> 00:13:39,450 Other things that the duck can now do for students 265 00:13:39,450 --> 00:13:42,960 is advise students on how to improve their code style, the aesthetics, 266 00:13:42,960 --> 00:13:44,260 the formatting thereof. 267 00:13:44,260 --> 00:13:47,280 And so for instance, here is similar code in a language called C. 268 00:13:47,280 --> 00:13:48,990 And I'll stipulate that it's very messy. 269 00:13:48,990 --> 00:13:51,840 Everything is left-aligned instead of nicely indented, 270 00:13:51,840 --> 00:13:53,490 so it looks a little more structured. 271 00:13:53,490 --> 00:13:54,870 Students can now click a button. 272 00:13:54,870 --> 00:13:56,820 They'll see at the right-hand side in green 273 00:13:56,820 --> 00:13:58,650 how their code should ideally look. 274 00:13:58,650 --> 00:14:01,470 And if they're not quite sure what those changes are or why, 275 00:14:01,470 --> 00:14:03,150 they can click on, explain changes. 276 00:14:03,150 --> 00:14:06,180 And similarly, the duck advises them on how and why 277 00:14:06,180 --> 00:14:08,970 to turn their not great code into greater code, 278 00:14:08,970 --> 00:14:11,250 from left to right respectively. 
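Putting those pieces together, here is a minimal sketch, in Python, of how a system prompt, some retrieved course context, and a student's user prompt might be combined into one API call. It assumes the openai Python package; the model name is illustrative, and search_course_notes is a hypothetical stand-in for the vector-database lookup described earlier, not CS50's actual implementation.

from openai import OpenAI  # assumes the `openai` package is installed

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a friendly and supportive teaching assistant for CS50. "
    "You are also a rubber duck. Answer student questions only about "
    "CS50 and computer science. Do not provide full answers to problem "
    "sets, as this would violate academic honesty."
)

def search_course_notes(question: str) -> str:
    # Hypothetical stand-in for the local vector-database search that would
    # fetch recent lecture or homework material relevant to the question.
    return "Relevant excerpt from a recent lecture would go here."

def ask_duck(question: str) -> str:
    # Combine the system prompt (how to behave), retrieved context (recent
    # course information), and the user prompt (the student's own question).
    context = search_course_notes(question)
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT + "\n\nContext:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_duck("What is Flask, exactly?"))

The division of labor mirrors the description above: the system prompt teaches the model how to behave, while each student's question arrives as the user prompt.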
279 00:14:11,250 --> 00:14:15,450 More compellingly and more generalizable beyond CS50 and beyond computer 280 00:14:15,450 --> 00:14:19,080 science, is AI's ability to answer most of the questions 281 00:14:19,080 --> 00:14:20,820 that students might now ask online. 282 00:14:20,820 --> 00:14:24,540 And we've been doing asynchronous Q&A for years via various mobile or web 283 00:14:24,540 --> 00:14:25,710 applications and the like. 284 00:14:25,710 --> 00:14:28,680 But to date, it has been humans, myself included, 285 00:14:28,680 --> 00:14:30,780 responding to all of those questions. 286 00:14:30,780 --> 00:14:34,650 Now the duck has an opportunity to chime in, generally within three seconds, 287 00:14:34,650 --> 00:14:37,260 because we've integrated it into an online Q&A tool 288 00:14:37,260 --> 00:14:40,960 that students in CS50 and elsewhere across Harvard have long used. 289 00:14:40,960 --> 00:14:44,370 So here's an anonymized screenshot of a question from an actual student, 290 00:14:44,370 --> 00:14:47,370 but written here as John Harvard, who asked this summer, 291 00:14:47,370 --> 00:14:50,150 in the summer version of CS50, what is flask exactly? 292 00:14:50,150 --> 00:14:51,920 So fairly definitional question. 293 00:14:51,920 --> 00:14:55,250 And here is what the duck spit out, thanks to that architecture 294 00:14:55,250 --> 00:14:56,510 I described before. 295 00:14:56,510 --> 00:14:59,210 I'll stipulate that this is correct, but it is mostly 296 00:14:59,210 --> 00:15:02,820 a definition, akin to what Google or Bing could already give you last year. 297 00:15:02,820 --> 00:15:04,940 But here's a more nuanced question, for instance, 298 00:15:04,940 --> 00:15:06,800 from another anonymized student. 299 00:15:06,800 --> 00:15:10,160 In this question here, the student's including an error message 300 00:15:10,160 --> 00:15:11,000 that they're seeing. 301 00:15:11,000 --> 00:15:12,650 They're asking about that. 302 00:15:12,650 --> 00:15:15,890 And they're asking a little more broadly and qualitatively, is there 303 00:15:15,890 --> 00:15:19,640 a more efficient way to write this code, a question that really is best 304 00:15:19,640 --> 00:15:21,620 answered based on experience. 305 00:15:21,620 --> 00:15:25,130 Here, I'll stipulate that the duck responded with this answer, which 306 00:15:25,130 --> 00:15:26,480 is actually pretty darn good. 307 00:15:26,480 --> 00:15:29,630 Not only responding in English, but with some sample starter code 308 00:15:29,630 --> 00:15:31,430 that would make sense in this context. 309 00:15:31,430 --> 00:15:34,580 And at the bottom it's worth noting, because none of this technology 310 00:15:34,580 --> 00:15:37,850 is perfect just yet, it's still indeed very bleeding edge, 311 00:15:37,850 --> 00:15:41,960 and so what we have chosen to do within CS50 is include disclaimers, like this. 312 00:15:41,960 --> 00:15:44,090 I am an experimental bot, quack. 313 00:15:44,090 --> 00:15:46,820 Do not assume that my reply is accurate unless you see that it's 314 00:15:46,820 --> 00:15:50,040 been endorsed by humans, quack. 315 00:15:50,040 --> 00:15:53,160 And in fact, at top right, the mechanism we've been using in this tool 316 00:15:53,160 --> 00:15:54,510 is usually within minutes. 
317 00:15:54,510 --> 00:15:57,690 A human, whether it's a teaching fellow, a course assistant, or myself, 318 00:15:57,690 --> 00:16:00,990 will click on a button like this to signal to our human students 319 00:16:00,990 --> 00:16:05,130 that yes, like the duck is spot on here, or we have an opportunity, as always, 320 00:16:05,130 --> 00:16:07,020 to chime in with our own responses. 321 00:16:07,020 --> 00:16:09,770 Frankly, that disclaimer, that button, will soon I do think 322 00:16:09,770 --> 00:16:11,770 go away, as the software gets better and better. 323 00:16:11,770 --> 00:16:14,367 But for now, that's how we're modulating exactly 324 00:16:14,367 --> 00:16:16,200 what students' expectations might be when it 325 00:16:16,200 --> 00:16:19,395 comes to correctness or incorrectness. 326 00:16:19,395 --> 00:16:22,020 It's common too in programming, to see a lot of error messages, 327 00:16:22,020 --> 00:16:24,210 certainly when you're learning first-hand. 328 00:16:24,210 --> 00:16:26,820 A lot of these error messages are arcane, confusing, 329 00:16:26,820 --> 00:16:29,310 certainly to students, versus the people who wrote them. 330 00:16:29,310 --> 00:16:31,170 Soon students will see a box like this. 331 00:16:31,170 --> 00:16:34,050 Whenever one of their terminal window programs errs, 332 00:16:34,050 --> 00:16:39,120 they'll be assisted too with English-like, TF-like support when 333 00:16:39,120 --> 00:16:42,212 it comes to explaining what it is that went wrong with that command. 334 00:16:42,212 --> 00:16:43,920 And ultimately, what this is really doing 335 00:16:43,920 --> 00:16:45,900 for students in our own experience already, 336 00:16:45,900 --> 00:16:49,830 is providing them really with virtual office hours, 24/7, 337 00:16:49,830 --> 00:16:52,560 which is actually quite compelling in a university environment, 338 00:16:52,560 --> 00:16:55,110 where students' schedules are already tightly packed, 339 00:16:55,110 --> 00:16:58,270 be it with academics, their extracurriculars, athletics, and the like-- 340 00:16:58,270 --> 00:17:00,180 --and they might have enough time to dive 341 00:17:00,180 --> 00:17:03,510 into a homework assignment, maybe eight hours even, for something sizable. 342 00:17:03,510 --> 00:17:06,390 But if they hit that wall a couple of hours in, yeah, 343 00:17:06,390 --> 00:17:10,020 they can go to office hours or they can ask a question asynchronously online, 344 00:17:10,020 --> 00:17:13,020 but it's really not optimal, in-the-moment support 345 00:17:13,020 --> 00:17:15,150 that we can now provide all the more effectively, 346 00:17:15,150 --> 00:17:17,170 we hope, through software, as well. 347 00:17:17,170 --> 00:17:18,089 So if you're curious, 348 00:17:18,089 --> 00:17:20,797 even if you're not a technophile yourself, anyone on the internet 349 00:17:20,797 --> 00:17:24,000 can go to cs50.ai and experiment with this user interface. 350 00:17:24,000 --> 00:17:29,940 This one here actually resembles ChatGPT itself, but it's specific to CS50. 351 00:17:29,940 --> 00:17:31,980 And here again is just a sequence of screenshots 352 00:17:31,980 --> 00:17:33,930 that I'll stipulate for today's purposes, 353 00:17:33,930 --> 00:17:37,920 are pretty darn good, akin to what I myself or a teaching fellow would reply 354 00:17:37,920 --> 00:17:41,100 in answer to a student's question, in this case, 355 00:17:41,100 --> 00:17:42,930 about their particular code. 356 00:17:42,930 --> 00:17:45,240 And ultimately, it's really aspirational. 
357 00:17:45,240 --> 00:17:49,320 The goal here ultimately is to really approximate a one-to-one teacher 358 00:17:49,320 --> 00:17:52,950 to student ratio, which despite all of the resources we within CS50, 359 00:17:52,950 --> 00:17:56,070 we within Harvard and places like Yale have, 360 00:17:56,070 --> 00:17:58,650 we certainly have never had enough resources 361 00:17:58,650 --> 00:18:00,690 to approximate what might really be ideal, 362 00:18:00,690 --> 00:18:04,050 which is more of an apprenticeship model, a mentorship, whereby it's just 363 00:18:04,050 --> 00:18:06,145 you and that teacher working one-to-one. 364 00:18:06,145 --> 00:18:09,270 Now we still have humans, and the goal is not to reduce that human support, 365 00:18:09,270 --> 00:18:14,220 but to focus it all the more consciously on the students who would benefit most 366 00:18:14,220 --> 00:18:17,100 from some in-person one-to-one support versus students 367 00:18:17,100 --> 00:18:21,433 who would happily take it at any hour of the day more digitally, online. 368 00:18:21,433 --> 00:18:23,850 And in fact, we're still in the process of evaluating just 369 00:18:23,850 --> 00:18:25,560 how well or not well all of this works. 370 00:18:25,560 --> 00:18:28,800 But based on our summer experiment alone with about 70 students 371 00:18:28,800 --> 00:18:31,770 a few months back, one student wrote us at term's end it-- 372 00:18:31,770 --> 00:18:33,660 --"felt like having a personal tutor. 373 00:18:33,660 --> 00:18:37,830 I love how AI bots will answer questions without ego and without judgment. 374 00:18:37,830 --> 00:18:40,260 Generally entertaining even the stupidest of questions 375 00:18:40,260 --> 00:18:42,690 without treating them like they're stupid. 376 00:18:42,690 --> 00:18:47,550 It has, as one could expect," ironically, "an inhuman level 377 00:18:47,550 --> 00:18:48,450 of patience." 378 00:18:48,450 --> 00:18:51,870 And so I thought that's telling as to how even one student is 379 00:18:51,870 --> 00:18:54,490 perceiving these new possibilities. 380 00:18:54,490 --> 00:18:56,610 So let's consider now more academically what 381 00:18:56,610 --> 00:18:58,920 it is that's enabling those kinds of tools, not just 382 00:18:58,920 --> 00:19:02,370 within CS50, within computer science, but really, the world more generally. 383 00:19:02,370 --> 00:19:04,078 What the whole world's been talking about 384 00:19:04,078 --> 00:19:06,270 is generative artificial intelligence. 385 00:19:06,270 --> 00:19:09,630 AI that can generate images, generate text, and sort of 386 00:19:09,630 --> 00:19:12,820 mimic the behavior of what we think of as human. 387 00:19:12,820 --> 00:19:14,240 So what does that really mean? 388 00:19:14,240 --> 00:19:15,990 Well, let's start really at the beginning. 389 00:19:15,990 --> 00:19:19,170 Artificial intelligence is actually a technique, a technology, 390 00:19:19,170 --> 00:19:21,510 a subject that's actually been with us for some time, 391 00:19:21,510 --> 00:19:26,460 but it really was the introduction of this very user-friendly interface known 392 00:19:26,460 --> 00:19:28,230 as ChatGPT, 393 00:19:28,230 --> 00:19:31,440 and some of the more recent academic work over really just the past five 394 00:19:31,440 --> 00:19:35,010 or six years, that really allowed us to take a massive leap forward, 395 00:19:35,010 --> 00:19:38,520 it would seem, technologically, as to what these things can now do. 396 00:19:38,520 --> 00:19:40,330 So what is artificial intelligence? 
397 00:19:40,330 --> 00:19:43,410 It's been with us for some time, and it's honestly, so omnipresent, 398 00:19:43,410 --> 00:19:45,690 that we take it for granted nowadays. 399 00:19:45,690 --> 00:19:48,330 Gmail, Outlook, have gotten really good at spam detection. 400 00:19:48,330 --> 00:19:50,020 If you haven't checked your spam folder in a while, 401 00:19:50,020 --> 00:19:52,000 that's testament to just how good they seem 402 00:19:52,000 --> 00:19:54,758 to be at getting it out of your inbox. 403 00:19:54,758 --> 00:19:57,050 Handwriting recognition has been with us for some time. 404 00:19:57,050 --> 00:19:59,380 I dare say, it, too, is only getting better and better 405 00:19:59,380 --> 00:20:02,920 the more the software is able to adapt to different handwriting 406 00:20:02,920 --> 00:20:04,270 styles, such as this. 407 00:20:04,270 --> 00:20:06,940 Recommendation histories and the like, whether you're 408 00:20:06,940 --> 00:20:09,190 using Netflix or any other service, have gotten 409 00:20:09,190 --> 00:20:12,580 better and better at recommending things you might like based on things 410 00:20:12,580 --> 00:20:14,920 you have liked, and maybe based on things 411 00:20:14,920 --> 00:20:18,190 other people who like the same thing as you might have liked. 412 00:20:18,190 --> 00:20:20,560 And suffice it to say, there's no one at Netflix 413 00:20:20,560 --> 00:20:22,780 akin to the old VHS stores of yesteryear, 414 00:20:22,780 --> 00:20:26,590 who are recommending to you specifically what movie you might like. 415 00:20:26,590 --> 00:20:31,330 And there's no code, no algorithm that says, if they like x, then recommend y, 416 00:20:31,330 --> 00:20:34,762 else recommend z, because there's just too many movies, too many people, too 417 00:20:34,762 --> 00:20:36,220 many different tastes in the world. 418 00:20:36,220 --> 00:20:40,000 So AI is increasingly sort of looking for patterns that might not even 419 00:20:40,000 --> 00:20:42,700 be obvious to us humans, and dynamically figuring out 420 00:20:42,700 --> 00:20:46,750 what might be good for me, for you or you, or anyone else. 421 00:20:46,750 --> 00:20:50,402 Siri, Google Assistant, Alexa, any of these voice recognition tools 422 00:20:50,402 --> 00:20:51,610 that are answering questions. 423 00:20:51,610 --> 00:20:54,918 That, too, suffice it to say, is all powered by AI. 424 00:20:54,918 --> 00:20:58,210 But let's start with something a little simpler than any of those applications. 425 00:20:58,210 --> 00:21:01,522 And this is one of the first arcade games from yesteryear known as Pong. 426 00:21:01,522 --> 00:21:02,980 And it's sort of like table tennis. 427 00:21:02,980 --> 00:21:05,440 And the person on the left can move their paddle up and down. 428 00:21:05,440 --> 00:21:07,000 Person on the right can do the same. 429 00:21:07,000 --> 00:21:09,970 And the goal is to get the ball past the other person, 430 00:21:09,970 --> 00:21:13,960 or conversely, make sure it hits your paddle and bounces back. 431 00:21:13,960 --> 00:21:17,440 Well, somewhat simpler than this insofar as it can be one player, 432 00:21:17,440 --> 00:21:19,275 is another Atari game from yesteryear known 433 00:21:19,275 --> 00:21:21,400 as Breakout, whereby you're essentially just trying 434 00:21:21,400 --> 00:21:24,460 to bang the ball against the bricks to get more and more points 435 00:21:24,460 --> 00:21:26,320 and get rid of all of those bricks. 
436 00:21:26,320 --> 00:21:28,960 But all of us in this room probably have a human instinct 437 00:21:28,960 --> 00:21:32,800 for how to win this game, or at least how to play this game. 438 00:21:32,800 --> 00:21:36,430 For instance, if the ball pictured here back in the '80s 439 00:21:36,430 --> 00:21:41,530 as a single red dot just left the paddle, pictured here as a red line, 440 00:21:41,530 --> 00:21:43,990 where is the ball presumably going to go next? 441 00:21:43,990 --> 00:21:47,410 And in turn, which direction should I slide my paddle? 442 00:21:47,410 --> 00:21:49,900 To the left or to the right? 443 00:21:49,900 --> 00:21:51,630 So presumably, to the left. 444 00:21:51,630 --> 00:21:54,690 And we all have an eye for what seemed to be the digital physics of that. 445 00:21:54,690 --> 00:21:57,540 And indeed, that would then be an algorithm, sort of step 446 00:21:57,540 --> 00:21:59,890 by step instructions for solving some problem. 447 00:21:59,890 --> 00:22:03,120 So how can we now translate that human intuition to what we describe more 448 00:22:03,120 --> 00:22:04,780 as artificial intelligence? 449 00:22:04,780 --> 00:22:07,290 Not nearly as sophisticated as those other applications, 450 00:22:07,290 --> 00:22:09,000 but we'll indeed, start with some basics. 451 00:22:09,000 --> 00:22:12,960 You might know from economics or strategic thinking or computer science, 452 00:22:12,960 --> 00:22:15,640 this idea of a decision tree that allows you to decide, 453 00:22:15,640 --> 00:22:19,060 should I go this way or this way when it comes to making a decision. 454 00:22:19,060 --> 00:22:22,440 So let's consider how we could draw a picture to represent even something 455 00:22:22,440 --> 00:22:24,180 simplistic like Breakout. 456 00:22:24,180 --> 00:22:28,290 Well, if the ball is left of the paddle, is a question or a Boolean expression 457 00:22:28,290 --> 00:22:29,940 I might ask myself in code. 458 00:22:29,940 --> 00:22:34,500 If yes, then I should move my paddle left, as most everyone just said. 459 00:22:34,500 --> 00:22:37,960 Else, if the ball is not left of paddle, what do I want to do? 460 00:22:37,960 --> 00:22:39,537 Well, I want to ask a question. 461 00:22:39,537 --> 00:22:41,370 I don't want to just instinctively go right. 462 00:22:41,370 --> 00:22:44,010 I want to check, is the ball to the right of the paddle, 463 00:22:44,010 --> 00:22:47,730 and if yes, well, then yes, go ahead and move the paddle right. 464 00:22:47,730 --> 00:22:50,180 But there is a third situation, which is-- 465 00:22:50,180 --> 00:22:51,163 AUDIENCE: [INAUDIBLE] 466 00:22:51,163 --> 00:22:52,080 DAVID J. MALAN: Right. 467 00:22:52,080 --> 00:22:53,920 Like, don't move, it's coming right at you. 468 00:22:53,920 --> 00:22:55,260 So that would be the third scenario here. 469 00:22:55,260 --> 00:22:58,140 No, it's not to the right or to the left, so just don't move the paddle. 470 00:22:58,140 --> 00:23:00,660 You got lucky, and it's coming, for instance, straight down. 471 00:23:00,660 --> 00:23:04,170 So Breakout is fairly straightforward when it comes to an algorithm. 472 00:23:04,170 --> 00:23:07,200 And we can actually translate this as any CS50 student now could, 473 00:23:07,200 --> 00:23:11,400 to code or pseudocode, sort of English-like code that's independent 474 00:23:11,400 --> 00:23:15,280 of Java, C, C++ and all of the programming languages of today. 
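Concretely, and purely as an illustration, that decision tree might translate into a short, self-contained Python sketch like this one, in which the one-dimensional positions and "frames" are hypothetical stand-ins for a real game engine:

# Hypothetical, self-contained sketch of the paddle-moving decision tree;
# real Breakout would also track a bouncing ball, bricks, and collisions.
ball_x = 2     # the ball's horizontal position
paddle_x = 7   # the paddle's horizontal position

for frame in range(10):       # simulate a few frames of play
    if ball_x < paddle_x:     # is the ball left of the paddle?
        paddle_x -= 1         # then move the paddle left
    elif ball_x > paddle_x:   # is the ball right of the paddle?
        paddle_x += 1         # then move the paddle right
    # else: the ball is coming straight down, so don't move at all

print(paddle_x)  # the paddle has tracked the ball to position 2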
475 00:23:15,280 --> 00:23:17,940 So in English pseudocode, while a game is 476 00:23:17,940 --> 00:23:22,230 ongoing, if the ball is left of paddle, I should move paddle left. 477 00:23:22,230 --> 00:23:26,460 Else if ball is right of the paddle, it should say paddle, that's a bug, 478 00:23:26,460 --> 00:23:29,520 not intended today, move paddle right. 479 00:23:29,520 --> 00:23:31,710 Else, don't move the paddle. 480 00:23:31,710 --> 00:23:35,910 So that, too, represents a translation of this intuition to code 481 00:23:35,910 --> 00:23:37,200 that's very deterministic. 482 00:23:37,200 --> 00:23:40,830 You can anticipate all possible scenarios captured in code. 483 00:23:40,830 --> 00:23:43,890 And frankly, this should be the most boring game of Breakout, 484 00:23:43,890 --> 00:23:47,250 because the paddle should just perfectly play this game, assuming 485 00:23:47,250 --> 00:23:49,770 there's no variables or randomness when it comes to speed 486 00:23:49,770 --> 00:23:53,590 or angles or the like, which real world games certainly try to introduce. 487 00:23:53,590 --> 00:23:55,570 But let's consider another game from yesteryear 488 00:23:55,570 --> 00:23:58,570 that you might play with your kids today or you did yourself growing up. 489 00:23:58,570 --> 00:23:59,590 Here's tic-tac-toe. 490 00:23:59,590 --> 00:24:02,860 And for those unfamiliar, the goal is to get three O's in a row 491 00:24:02,860 --> 00:24:07,180 or three X's in a row, vertically, horizontally, or diagonally. 492 00:24:07,180 --> 00:24:09,970 So suppose it's now X's turn. 493 00:24:09,970 --> 00:24:12,250 If you've played tic-tac-toe, most of you 494 00:24:12,250 --> 00:24:16,060 probably just have an immediate instinct as to where X should probably go, 495 00:24:16,060 --> 00:24:18,970 so that it doesn't lose instantaneously. 496 00:24:18,970 --> 00:24:22,690 But let's consider in the more general case, how do you solve tic-tac-toe. 497 00:24:22,690 --> 00:24:25,360 Frankly, if you're in the habit of losing tic-tac-toe, 498 00:24:25,360 --> 00:24:27,255 but you're not trying to lose tic-tac-toe, 499 00:24:27,255 --> 00:24:28,630 you're actually playing it wrong. 500 00:24:28,630 --> 00:24:31,920 Like, you should minimally be able to always force a tie in tic-tac-toe. 501 00:24:31,920 --> 00:24:34,420 And better yet, you should be able to beat the other person. 502 00:24:34,420 --> 00:24:37,550 So hopefully, everyone now will soon walk away with this strategy. 503 00:24:37,550 --> 00:24:41,020 So how can we borrow inspiration from those same decision trees 504 00:24:41,020 --> 00:24:43,100 and do something similar here? 505 00:24:43,100 --> 00:24:47,620 So if you, the player, ask yourself, can I get three in a row on this turn? 506 00:24:47,620 --> 00:24:51,970 Well, if yes, then you should do that and play the X in that position. 507 00:24:51,970 --> 00:24:53,980 Play in the square to get three in a row. 508 00:24:53,980 --> 00:24:54,820 Straight forward. 509 00:24:54,820 --> 00:24:58,330 If you can't get three in a row in this turn, you should ask another question. 510 00:24:58,330 --> 00:25:01,660 Can my opponent get three in a row in their next turn? 511 00:25:01,660 --> 00:25:06,220 Because then you better preempt that by moving into that position. 512 00:25:06,220 --> 00:25:10,810 Play in the square to block opponent's three in a row. 513 00:25:10,810 --> 00:25:13,428 What if though, that's not the case, right? 514 00:25:13,428 --> 00:25:15,970 What if there aren't even that many X's and O's on the board? 
515 00:25:15,970 --> 00:25:17,887 If you're in the habit of just kind of playing 516 00:25:17,887 --> 00:25:21,940 randomly, like you might not be playing optimally as a good AI could. 517 00:25:21,940 --> 00:25:24,430 So if no, it's kind of a question mark. 518 00:25:24,430 --> 00:25:26,685 In fact, there's probably more to this tree, 519 00:25:26,685 --> 00:25:28,810 because we could think through, what if I go there. 520 00:25:28,810 --> 00:25:30,977 Wait a minute, what if I go there or there or there? 521 00:25:30,977 --> 00:25:34,510 You can start to think a few steps ahead as a computer could do much better even 522 00:25:34,510 --> 00:25:35,540 than us humans. 523 00:25:35,540 --> 00:25:37,388 So suppose, for instance, it's O's turn. 524 00:25:37,388 --> 00:25:39,430 Now those of you who are very good at tic-tac-toe 525 00:25:39,430 --> 00:25:40,870 might have an instinct for where to go. 526 00:25:40,870 --> 00:25:42,953 But this is an even harder problem, it would seem. 527 00:25:42,953 --> 00:25:45,370 I could go in eight possible places if I'm O. 528 00:25:45,370 --> 00:25:49,570 But let's try to break that down more algorithmically, as an AI would. 529 00:25:49,570 --> 00:25:53,830 And let's recognize, too, that with games in particular, one of the reasons 530 00:25:53,830 --> 00:25:58,330 that AI was so early adopted in these games, playing the CPU, 531 00:25:58,330 --> 00:26:02,020 is that games really lend themselves to defining them, 532 00:26:02,020 --> 00:26:04,120 even if it takes the fun out of it, mathematically. 533 00:26:04,120 --> 00:26:07,600 Defining them in terms of inputs and outputs, maybe paddle moving 534 00:26:07,600 --> 00:26:10,040 left or right, ball moving up or down. 535 00:26:10,040 --> 00:26:13,090 You can really quantize it at a very boring low level. 536 00:26:13,090 --> 00:26:16,060 But that lends itself then to solving it optimally. 537 00:26:16,060 --> 00:26:19,630 And in fact, with most games, the goal is to maximize or maybe 538 00:26:19,630 --> 00:26:21,790 minimize some math function, right? 539 00:26:21,790 --> 00:26:24,910 Most games, if you have scores, the goal is to maximize your score, 540 00:26:24,910 --> 00:26:26,750 and indeed, get a high score. 541 00:26:26,750 --> 00:26:31,510 So games lend themselves to a nice translation to mathematics, 542 00:26:31,510 --> 00:26:33,410 and in turn here, AI solutions. 543 00:26:33,410 --> 00:26:37,690 So one of the first algorithms one might learn in a class on algorithms 544 00:26:37,690 --> 00:26:39,490 and on artificial intelligence is something 545 00:26:39,490 --> 00:26:41,860 called minimax, which alludes to this idea of trying 546 00:26:41,860 --> 00:26:46,060 to minimize and/or maximize something as your function, your goal. 547 00:26:46,060 --> 00:26:49,890 And it actually derives its inspiration from these same decision trees 548 00:26:49,890 --> 00:26:51,140 that we've been talking about. 549 00:26:51,140 --> 00:26:52,390 But first, a definition. 550 00:26:52,390 --> 00:26:55,210 Here are three representative tic-tac-toe boards. 551 00:26:55,210 --> 00:26:58,570 Here is one in which O has clearly won, per the green. 552 00:26:58,570 --> 00:27:01,537 Here is one in which X has clearly won, per the green. 553 00:27:01,537 --> 00:27:03,620 And this one in the middle just represents a draw. 554 00:27:03,620 --> 00:27:06,662 Now, there's a bunch of other ways that tic-tac-toe could end, but here's 555 00:27:06,662 --> 00:27:08,050 just three representative ones. 
556 00:27:08,050 --> 00:27:10,223 But let's make tic-tac-toe even more boring 557 00:27:10,223 --> 00:27:11,890 than it might have always struck you as. 558 00:27:11,890 --> 00:27:15,130 Let's propose that this kind of configuration 559 00:27:15,130 --> 00:27:17,230 should have a score of negative 1. 560 00:27:17,230 --> 00:27:19,030 If O wins, it's a negative 1. 561 00:27:19,030 --> 00:27:21,340 If X wins, it's a positive 1. 562 00:27:21,340 --> 00:27:23,350 And if no one wins, we'll call it a 0. 563 00:27:23,350 --> 00:27:27,280 We need some way of talking about and reasoning about which of these outcomes 564 00:27:27,280 --> 00:27:28,520 is better than the other. 565 00:27:28,520 --> 00:27:31,450 And what's simpler than 0, 1 and negative 1? 566 00:27:31,450 --> 00:27:33,760 So the goal though, of X, it would seem, is 567 00:27:33,760 --> 00:27:38,530 to maximize its score, but the goal of O is to minimize its score. 568 00:27:38,530 --> 00:27:42,400 So X is really trying to get positive 1, O is really trying to get negative 1. 569 00:27:42,400 --> 00:27:46,610 And no one really wants 0, but that's better than losing to the other person. 570 00:27:46,610 --> 00:27:49,900 So we have now a way to define what it means to win or lose. 571 00:27:49,900 --> 00:27:52,790 Well, now we can employ a strategy here. 572 00:27:52,790 --> 00:27:56,210 Here, just as a quick check, what would the score be of this board? 573 00:27:56,210 --> 00:27:58,020 Just so everyone's on the same page. 574 00:27:58,020 --> 00:27:58,520 AUDIENCE: 1. 575 00:27:58,520 --> 00:28:02,000 DAVID J. MALAN: Or, so 1, because X has won and we just stipulated arbitrarily, 576 00:28:02,000 --> 00:28:04,190 this means that this board has a value of 1. 577 00:28:04,190 --> 00:28:06,740 Now let's put it into a more interesting context. 578 00:28:06,740 --> 00:28:09,320 Here, a game has been played for a few moves already. 579 00:28:09,320 --> 00:28:10,890 There's two spots left. 580 00:28:10,890 --> 00:28:12,590 No one has won just yet. 581 00:28:12,590 --> 00:28:14,982 And suppose that it's O's turn now. 582 00:28:14,982 --> 00:28:17,690 Now, everyone probably has an instinct already as to where to go, 583 00:28:17,690 --> 00:28:20,510 but let's try to break this down more algorithmically. 584 00:28:20,510 --> 00:28:22,430 So what is the value of this board? 585 00:28:22,430 --> 00:28:25,430 Well, we don't know yet, because no one has won, 586 00:28:25,430 --> 00:28:28,440 so let's consider what could happen next. 587 00:28:28,440 --> 00:28:31,310 So we can draw this actually as a tree, as before. 588 00:28:31,310 --> 00:28:33,470 Here, for instance, is what might happen if O 589 00:28:33,470 --> 00:28:35,270 goes into the top left-hand corner. 590 00:28:35,270 --> 00:28:39,830 And here's what might happen if O goes into the bottom middle spot instead. 591 00:28:39,830 --> 00:28:42,530 We should ask ourselves, what's the value of this board, what's 592 00:28:42,530 --> 00:28:43,530 the value of this board? 593 00:28:43,530 --> 00:28:46,340 Because if O's purpose in life is to minimize its score, 594 00:28:46,340 --> 00:28:49,850 it's going to go left or right based on whichever yields the smallest number. 595 00:28:49,850 --> 00:28:51,390 Negative 1, ideally. 596 00:28:51,390 --> 00:28:55,230 But we're still not sure yet, because we don't have definitions for boards 597 00:28:55,230 --> 00:28:56,770 with holes in them like this. 598 00:28:56,770 --> 00:28:58,380 So what could happen next here? 
599 00:28:58,380 --> 00:29:00,480 Well, it's obviously going to be X's turn next. 600 00:29:00,480 --> 00:29:05,080 So if X moves, unfortunately, X has won in this configuration. 601 00:29:05,080 --> 00:29:08,980 We can now conclude that the value of this board is what number? 602 00:29:08,980 --> 00:29:09,480 AUDIENCE: 1. 603 00:29:09,480 --> 00:29:10,620 DAVID J. MALAN: So 1. 604 00:29:10,620 --> 00:29:14,970 And because there's only one way to reach this board, by transitivity, 605 00:29:14,970 --> 00:29:19,080 you might as well think of the value of this previous board as also 1, 606 00:29:19,080 --> 00:29:21,760 because no matter what, it's going to lead to that same outcome. 607 00:29:21,760 --> 00:29:25,890 And so the value of this board is actually still to be determined, 608 00:29:25,890 --> 00:29:28,440 because we don't know if O is going to want to go with the 1, 609 00:29:28,440 --> 00:29:30,600 and probably not, because that means X wins. 610 00:29:30,600 --> 00:29:32,520 But let's see what the value of this board is. 611 00:29:32,520 --> 00:29:36,370 Well, suppose that indeed, X goes in that top left corner here. 612 00:29:36,370 --> 00:29:39,540 What's the value of this board here? 613 00:29:39,540 --> 00:29:41,140 0, because no one has won. 614 00:29:41,140 --> 00:29:43,390 There's no X's or O's three in a row. 615 00:29:43,390 --> 00:29:45,000 So the value of this board is 0. 616 00:29:45,000 --> 00:29:47,140 There's only one way logically to get there, 617 00:29:47,140 --> 00:29:50,190 so we might as well think of the value of this board as also 0. 618 00:29:50,190 --> 00:29:53,100 And so now, what's the value of this board? 619 00:29:53,100 --> 00:29:56,370 Well, if we started the story by thinking about O's turn, 620 00:29:56,370 --> 00:30:01,860 O's purpose is the min in minimax, then which move is O going to make? 621 00:30:01,860 --> 00:30:05,030 Go to the left or go to the right? 622 00:30:05,030 --> 00:30:06,800 O is probably going to go to the right 623 00:30:06,800 --> 00:30:10,880 and make the move that leads to, whoops, that leads to this board, 624 00:30:10,880 --> 00:30:15,200 because even though O can't win in this configuration, at least X didn't win. 625 00:30:15,200 --> 00:30:19,190 So it's minimized its score relatively, even though it's not a clean win. 626 00:30:19,190 --> 00:30:21,500 Now, this is all fine and good for a configuration 627 00:30:21,500 --> 00:30:23,243 of the board that's like almost done. 628 00:30:23,243 --> 00:30:24,410 There's only two moves left. 629 00:30:24,410 --> 00:30:25,770 The game's about to end. 630 00:30:25,770 --> 00:30:27,830 But if you kind of expand in your mind's eye, 631 00:30:27,830 --> 00:30:30,810 how did we get to this branch of the decision tree, 632 00:30:30,810 --> 00:30:34,010 if we rewind one step where there's three possible moves, 633 00:30:34,010 --> 00:30:36,260 frankly, the decision tree is a lot bigger. 634 00:30:36,260 --> 00:30:39,350 If we rewind further in your mind's eye and have four moves 635 00:30:39,350 --> 00:30:41,760 left or five moves or all nine moves left, 636 00:30:41,760 --> 00:30:43,550 imagine just zooming out, out, and out. 637 00:30:43,550 --> 00:30:46,940 This is becoming a massive, massive tree of decisions. 638 00:30:46,940 --> 00:30:51,110 Now, even so, here is that same subtree, the same decision tree 639 00:30:51,110 --> 00:30:51,860 we just looked at. 
640 00:30:51,860 --> 00:30:54,050 This is the exact same thing, but I shrunk the font so 641 00:30:54,050 --> 00:30:55,760 that it appears here on the screen here. 642 00:30:55,760 --> 00:30:59,660 But over here, we have what could happen if instead, 643 00:30:59,660 --> 00:31:03,680 it's actually X's turn, because we're one move prior. 644 00:31:03,680 --> 00:31:06,420 There's a bunch of different moves X could now make, too. 645 00:31:06,420 --> 00:31:08,350 So what is the implication of this? 646 00:31:08,350 --> 00:31:12,930 Well, most humans are not thinking through tic-tac-toe to this extreme. 647 00:31:12,930 --> 00:31:15,780 And frankly, most of us probably just don't have the mental capacity 648 00:31:15,780 --> 00:31:18,360 to think about going left and then right and then left and then right. 649 00:31:18,360 --> 00:31:18,860 Right? 650 00:31:18,860 --> 00:31:20,610 This is not how people play tic-tac-toe. 651 00:31:20,610 --> 00:31:23,190 Like, we're not using that much memory, so to speak. 652 00:31:23,190 --> 00:31:26,010 But a computer can handle that, and computers 653 00:31:26,010 --> 00:31:27,850 can play tic-tac-toe optimally. 654 00:31:27,850 --> 00:31:30,360 So if you're beating a computer at tic-tac-toe, like, 655 00:31:30,360 --> 00:31:31,770 it's not implemented very well. 656 00:31:31,770 --> 00:31:36,420 It's not following this very logical, deterministic minimax algorithm. 657 00:31:36,420 --> 00:31:40,470 But this is where now AI is no longer as simple as just 658 00:31:40,470 --> 00:31:42,570 doing what these decision trees say. 659 00:31:42,570 --> 00:31:45,780 In the context of tic-tac-toe, here's how we might translate this 660 00:31:45,780 --> 00:31:46,870 to code, for instance. 661 00:31:46,870 --> 00:31:49,830 If player is X, for each possible move, calculate 662 00:31:49,830 --> 00:31:52,200 a score for the board, as we were doing verbally, 663 00:31:52,200 --> 00:31:54,600 and then choose the move with the highest score. 664 00:31:54,600 --> 00:31:57,420 Because X's goal is to maximize its score. 665 00:31:57,420 --> 00:32:00,090 If the player is O, though, for each possible move, 666 00:32:00,090 --> 00:32:02,010 calculate a score for the board, and then 667 00:32:02,010 --> 00:32:04,210 choose the move with the lowest score. 668 00:32:04,210 --> 00:32:06,600 So that's a distillation of that verbal walkthrough 669 00:32:06,600 --> 00:32:10,290 into what CS50 students know now as code, or at least pseudocode. 670 00:32:10,290 --> 00:32:15,120 But the problem with games, not so much tic-tac-toe, but other more 671 00:32:15,120 --> 00:32:16,650 sophisticated games is this. 672 00:32:16,650 --> 00:32:19,890 Does anyone want to ballpark how many possible ways there 673 00:32:19,890 --> 00:32:22,940 are to play tic-tac-toe? 674 00:32:22,940 --> 00:32:26,180 Paper, pencil, two human children, how many different ways? 675 00:32:26,180 --> 00:32:30,893 How long could you keep them occupied playing tic-tac-toe in different ways? 676 00:32:30,893 --> 00:32:33,310 If you actually think through, how big does this tree get, 677 00:32:33,310 --> 00:32:36,160 how many leaves are there on this decision tree, like how many 678 00:32:36,160 --> 00:32:42,520 different directions, well, if you're thinking 255,168, you are correct. 679 00:32:42,520 --> 00:32:44,980 And now most of us in our lifetime have probably not 680 00:32:44,980 --> 00:32:47,180 played tic-tac-toe that many times. 
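That distillation translates fairly directly into a short recursive function. Here is a minimal sketch, assuming a board is represented as a Python list of nine cells, each "X", "O", or None:

def winner(board):
    # Return "X" or "O" if either has three in a row, else None.
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals
    for a, b, c in lines:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    # Return (score, move): +1 if X can force a win, -1 if O can, 0 for a draw.
    w = winner(board)
    if w == "X":
        return 1, None
    if w == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None  # board full with no winner: a draw
    best = None
    for move in moves:
        board[move] = player                  # try the move...
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[move] = None                    # ...then undo it
        if (best is None
                or (player == "X" and score > best[0])
                or (player == "O" and score < best[0])):
            best = (score, move)
    return best

print(minimax([None] * 9, "X"))  # (0, 0): with best play, tic-tac-toe is a draw

A nearly identical recursion that merely counts, rather than scores, each way a game can finish does indeed arrive at that 255,168 figure.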
681 00:32:47,180 --> 00:32:49,660 So think about how many games you've been missing out on. 682 00:32:49,660 --> 00:32:53,230 There are different decisions you could have been making all these years. 683 00:32:53,230 --> 00:32:57,380 Now, that's a big number, but honestly, that's not a big number for a computer. 684 00:32:57,380 --> 00:33:01,420 That's a few megabytes of memory maybe, to keep all of that in mind 685 00:33:01,420 --> 00:33:06,160 and implement that kind of code in C or Java or C++ or something else. 686 00:33:06,160 --> 00:33:08,990 But other games are much more complicated. 687 00:33:08,990 --> 00:33:11,860 And the games that you and I might play as we get older, 688 00:33:11,860 --> 00:33:13,330 they include maybe chess. 689 00:33:13,330 --> 00:33:17,560 And if you think about chess with only the first four moves, back and forth 690 00:33:17,560 --> 00:33:19,750 four times, so only four moves. 691 00:33:19,750 --> 00:33:21,430 That's not even a very long game. 692 00:33:21,430 --> 00:33:23,830 Anyone want a ballpark how many different ways 693 00:33:23,830 --> 00:33:28,390 there are to begin a game of chess with four moves back and forth? 694 00:33:28,390 --> 00:33:31,490 695 00:33:31,490 --> 00:33:34,300 This is evidence as to why chess is apparently so hard. 696 00:33:34,300 --> 00:33:40,030 288 million ways, which is why when you are really good at chess, 697 00:33:40,030 --> 00:33:41,680 you are really good at chess. 698 00:33:41,680 --> 00:33:44,350 Because apparently, you either have an intuition for 699 00:33:44,350 --> 00:33:47,950 or a mind for thinking, it would seem, so many more steps ahead 700 00:33:47,950 --> 00:33:48,860 than your opponent. 701 00:33:48,860 --> 00:33:50,777 And don't get us started on something like Go. 702 00:33:50,777 --> 00:33:55,570 266 quintillion ways to play Go's first four moves. 703 00:33:55,570 --> 00:33:59,110 So at this point, we just can't pull out our Mac, our PC, 704 00:33:59,110 --> 00:34:03,190 certainly not our phone, to solve optimally games like chess and Go, 705 00:34:03,190 --> 00:34:05,323 because we don't have big enough CPUs. 706 00:34:05,323 --> 00:34:06,490 We don't have enough memory. 707 00:34:06,490 --> 00:34:09,610 We don't have enough years in our lifetimes for the computers 708 00:34:09,610 --> 00:34:11,110 to crunch all of those numbers. 709 00:34:11,110 --> 00:34:14,230 And thus was born a different form of AI that's 710 00:34:14,230 --> 00:34:18,520 more inspired by finding patterns more dynamically, 711 00:34:18,520 --> 00:34:22,239 learning from data, as opposed to being told by humans, here 712 00:34:22,239 --> 00:34:25,070 is the code via which to solve this problem. 713 00:34:25,070 --> 00:34:28,330 So machine learning is a subset of artificial intelligence 714 00:34:28,330 --> 00:34:30,980 that tries instead to get machines to learn 715 00:34:30,980 --> 00:34:35,900 what they should do without being so coached step by step by step by humans 716 00:34:35,900 --> 00:34:36,409 here. 717 00:34:36,409 --> 00:34:39,500 Reinforcement learning, for instance, is one such example thereof, 718 00:34:39,500 --> 00:34:41,690 where, in reinforcement learning, you sort of wait 719 00:34:41,690 --> 00:34:44,480 for the computer or maybe a robot to maybe just get 720 00:34:44,480 --> 00:34:46,380 better and better and better at things. 721 00:34:46,380 --> 00:34:48,710 And as it does, you reward it with a reward function. 722 00:34:48,710 --> 00:34:50,960 Give it plus 1 every time it does something well.
723 00:34:50,960 --> 00:34:51,830 And maybe minus 1. 724 00:34:51,830 --> 00:34:54,080 You punish it any time it does something poorly. 725 00:34:54,080 --> 00:35:00,110 And if you simply program this AI or this robot to maximize its score, 726 00:35:00,110 --> 00:35:02,390 never mind minimizing, maximize its score, 727 00:35:02,390 --> 00:35:05,570 ideally, it should repeat behaviors that got it plus 1. 728 00:35:05,570 --> 00:35:07,820 It should decrease the frequency with which it does 729 00:35:07,820 --> 00:35:09,710 bad behaviors that got it negative 1. 730 00:35:09,710 --> 00:35:12,080 And you can reinforce this kind of learning. 731 00:35:12,080 --> 00:35:15,230 In fact, I have here one demonstration. 732 00:35:15,230 --> 00:35:18,380 Could a student come on up who does not think 733 00:35:18,380 --> 00:35:20,960 they are particularly coordinated? 734 00:35:20,960 --> 00:35:24,020 If-- OK, wow, you're being nominated by your friends. 735 00:35:24,020 --> 00:35:24,950 Come on up. 736 00:35:24,950 --> 00:35:26,283 Come on up. 737 00:35:26,283 --> 00:35:28,598 [LAUGHTER] 738 00:35:28,598 --> 00:35:29,530 739 00:35:29,530 --> 00:35:31,720 Their hands went up instantly for you. 740 00:35:31,720 --> 00:35:34,260 741 00:35:34,260 --> 00:35:36,290 OK, what is your name? 742 00:35:36,290 --> 00:35:37,420 AMAKA: My name's Amaka. 743 00:35:37,420 --> 00:35:39,130 DAVID J. MALAN: Amaka, do you want to introduce yourself to the world? 744 00:35:39,130 --> 00:35:40,330 AMAKA: Hi, my name is Amaka. 745 00:35:40,330 --> 00:35:42,250 I am a first year in Holworthy. 746 00:35:42,250 --> 00:35:43,667 I'm planning to concentrate in CS. 747 00:35:43,667 --> 00:35:44,750 DAVID J. MALAN: Wonderful. 748 00:35:44,750 --> 00:35:45,550 Nice to see you. 749 00:35:45,550 --> 00:35:46,690 Come on over here. 750 00:35:46,690 --> 00:35:49,540 [APPLAUSE] 751 00:35:49,540 --> 00:35:52,900 So, yes, oh, no, it's sort of like a game show here. 752 00:35:52,900 --> 00:35:57,520 We have a pan here with what appears to be something pancake-like. 753 00:35:57,520 --> 00:36:00,970 And we'd like to teach you how to flip a pancake, 754 00:36:00,970 --> 00:36:04,250 so that when you gesture upward, the pancake should flip around 755 00:36:04,250 --> 00:36:05,900 as though you cooked the other side. 756 00:36:05,900 --> 00:36:09,400 So we're going to reward you verbally with plus 1 or minus 1. 757 00:36:09,400 --> 00:36:11,980 758 00:36:11,980 --> 00:36:13,450 Minus 1. 759 00:36:13,450 --> 00:36:15,470 Minus 1. 760 00:36:15,470 --> 00:36:17,050 OK, plus 1! 761 00:36:17,050 --> 00:36:19,690 Plus 1, so do more of that. 762 00:36:19,690 --> 00:36:20,920 Minus 1. 763 00:36:20,920 --> 00:36:22,840 Minus 1. 764 00:36:22,840 --> 00:36:23,890 Minus 1. 765 00:36:23,890 --> 00:36:25,150 Do less of that. 766 00:36:25,150 --> 00:36:27,370 [LAUGHTER] 767 00:36:27,370 --> 00:36:28,517 AUDIENCE: Great, great. 768 00:36:28,517 --> 00:36:29,600 DAVID J. MALAN: All right! 769 00:36:29,600 --> 00:36:30,655 A big round of applause. 770 00:36:30,655 --> 00:36:32,890 [APPLAUSE] 771 00:36:32,890 --> 00:36:33,670 Thank you. 772 00:36:33,670 --> 00:36:37,340 We've been in the habit of handing out Super Mario Brothers Oreos this year, 773 00:36:37,340 --> 00:36:39,220 so thank you for participating. 774 00:36:39,220 --> 00:36:41,600 [APPLAUSE] 775 00:36:41,600 --> 00:36:43,030 776 00:36:43,030 --> 00:36:46,590 So, this is actually a good example of an opportunity 777 00:36:46,590 --> 00:36:47,940 for reinforcement learning. 
778 00:36:47,940 --> 00:36:51,310 And wonderfully, a researcher has posted a video that we thought we'd share. 779 00:36:51,310 --> 00:36:53,060 It's about a minute and a half long, where 780 00:36:53,060 --> 00:36:57,570 you can watch a robot now do exactly what our wonderful human volunteer here 781 00:36:57,570 --> 00:36:59,050 just attempted as well. 782 00:36:59,050 --> 00:37:01,560 So let me go ahead and play this on the screen 783 00:37:01,560 --> 00:37:05,380 and give you a sense of what the human and the robot are doing together. 784 00:37:05,380 --> 00:37:08,790 So their pancake looks a little similar there. 785 00:37:08,790 --> 00:37:12,360 The human here is going to first sort of train the robot what 786 00:37:12,360 --> 00:37:14,190 to do by showing it some gestures. 787 00:37:14,190 --> 00:37:16,360 But there's no one right way to do this. 788 00:37:16,360 --> 00:37:19,660 But the human seems to know how to do it pretty well in this case, 789 00:37:19,660 --> 00:37:23,040 and so it's trying to give the machine examples 790 00:37:23,040 --> 00:37:24,990 of how to flip a pancake successfully. 791 00:37:24,990 --> 00:37:27,810 But now, this is the very first trial. 792 00:37:27,810 --> 00:37:28,560 OK, look familiar? 793 00:37:28,560 --> 00:37:30,300 You're in good company. 794 00:37:30,300 --> 00:37:32,652 After three trials. 795 00:37:32,652 --> 00:37:33,456 [CLANG] 796 00:37:33,456 --> 00:37:34,260 [PLOP] 797 00:37:34,260 --> 00:37:36,020 OK. 798 00:37:36,020 --> 00:37:36,520 [CLANG] 799 00:37:36,520 --> 00:37:37,410 [PLOP] 800 00:37:37,410 --> 00:37:39,060 OK. 801 00:37:39,060 --> 00:37:42,690 Now 10 tries. 802 00:37:42,690 --> 00:37:46,020 There's the human picking up the pancake. 803 00:37:46,020 --> 00:37:48,780 After 11 trials-- 804 00:37:48,780 --> 00:37:49,680 [CLANG] 805 00:37:49,680 --> 00:37:51,930 [PLOP] 806 00:37:51,930 --> 00:37:54,270 And meanwhile, there's presumably a human coding this, 807 00:37:54,270 --> 00:38:00,090 in the sense that someone is saying good job or bad job, plus 1 or minus 1. 808 00:38:00,090 --> 00:38:03,870 20 trials. 809 00:38:03,870 --> 00:38:07,440 Here now we'll see how the computer knows what it's even doing. 810 00:38:07,440 --> 00:38:10,720 There's just a mapping to some kind of XYZ coordinate system. 811 00:38:10,720 --> 00:38:13,260 So the robot can quantize what it is it's doing. 812 00:38:13,260 --> 00:38:14,100 Nice! 813 00:38:14,100 --> 00:38:16,447 To do more of one thing, less of another. 814 00:38:16,447 --> 00:38:18,780 And you're just seeing a visualization in the background 815 00:38:18,780 --> 00:38:21,720 of those digitized movements. 816 00:38:21,720 --> 00:38:28,020 And so now, after 50 some odd trials, the robot, too, has got it spot on. 817 00:38:28,020 --> 00:38:30,420 And it should be able to repeat this again and again 818 00:38:30,420 --> 00:38:33,000 and again, in order to keep flipping this pancake. 819 00:38:33,000 --> 00:38:36,360 So, our human volunteer-- wonderfully, it took you even fewer trials. 820 00:38:36,360 --> 00:38:38,340 But this is an example then, to be clear, 821 00:38:38,340 --> 00:38:40,800 of what we'd call reinforcement learning, 822 00:38:40,800 --> 00:38:44,725 whereby you're reinforcing a behavior you want or negatively reinforcing. 823 00:38:44,725 --> 00:38:46,600 That is, punishing a behavior that you don't. 824 00:38:46,600 --> 00:38:48,350 Here's another example that brings us back 825 00:38:48,350 --> 00:38:51,850 into the realm of games a little bit, but in a very abstract way.
826 00:38:51,850 --> 00:38:53,918 If we were playing a game like The Floor Is Lava, 827 00:38:53,918 --> 00:38:56,710 where you're only supposed to step certain places so that you don't 828 00:38:56,710 --> 00:38:59,585 fall straight in the lava pit or something like that and lose a point 829 00:38:59,585 --> 00:39:02,920 or lose a life, each of these squares might represent a position. 830 00:39:02,920 --> 00:39:06,470 This yellow dot might represent the human player that can go up, down, 831 00:39:06,470 --> 00:39:08,240 left or right within this world. 832 00:39:08,240 --> 00:39:11,170 I'm revealing to the whole audience where the lava pits are. 833 00:39:11,170 --> 00:39:13,930 But the goal for this yellow dot is to get to green. 834 00:39:13,930 --> 00:39:17,530 But the yellow dot, as in any good game, does not have this bird's eye view 835 00:39:17,530 --> 00:39:19,930 and does not know from the get-go exactly where to go. 836 00:39:19,930 --> 00:39:22,040 It's going to have to try some trial and error. 837 00:39:22,040 --> 00:39:25,300 But if we, the programmers, maybe reinforce good behavior 838 00:39:25,300 --> 00:39:28,810 or punish bad behavior, we can teach this yellow dot, 839 00:39:28,810 --> 00:39:31,550 without giving it step by step, up, down, 840 00:39:31,550 --> 00:39:34,600 left, right instructions, what behaviors to repeat 841 00:39:34,600 --> 00:39:36,460 and what behaviors not to repeat. 842 00:39:36,460 --> 00:39:38,665 So, for instance, suppose the robot moves right. 843 00:39:38,665 --> 00:39:39,520 Ah, that was bad. 844 00:39:39,520 --> 00:39:42,610 You fell in the lava already, so we'll use a bit of computer memory 845 00:39:42,610 --> 00:39:45,100 to draw a thicker red line there. 846 00:39:45,100 --> 00:39:46,220 Don't do that again. 847 00:39:46,220 --> 00:39:47,830 So, negative 1, so to speak. 848 00:39:47,830 --> 00:39:49,780 Maybe the yellow dot moves up next time. 849 00:39:49,780 --> 00:39:53,290 We can reward that behavior by not drawing any walls 850 00:39:53,290 --> 00:39:54,580 and allowing it to go again. 851 00:39:54,580 --> 00:39:57,970 It's making pretty good progress, but, oh, darn it, it took a right turn 852 00:39:57,970 --> 00:39:59,230 and now fell into the lava. 853 00:39:59,230 --> 00:40:01,490 But let's use a bit more of the computer's memory 854 00:40:01,490 --> 00:40:04,750 and keep track of the, OK, do not do that thing anymore. 855 00:40:04,750 --> 00:40:07,270 Maybe the next time the human dot goes this way. 856 00:40:07,270 --> 00:40:09,370 Oh, we want to punish that behavior, so we'll 857 00:40:09,370 --> 00:40:11,140 remember as much with that red line. 858 00:40:11,140 --> 00:40:15,040 But now we're starting to make progress until, oh, now we hit this one. 859 00:40:15,040 --> 00:40:18,340 And eventually, even though the yellow dot, much like our human, 860 00:40:18,340 --> 00:40:22,780 much like our pancake flipping robot had to try again and again and again, 861 00:40:22,780 --> 00:40:26,710 after enough trials, it's going to start to realize what behaviors it should 862 00:40:26,710 --> 00:40:28,880 repeat and which ones it shouldn't. 863 00:40:28,880 --> 00:40:32,740 And so in this case, maybe it finally makes its way up to the green dot. 864 00:40:32,740 --> 00:40:35,050 And just to recap, once it finds that path, 865 00:40:35,050 --> 00:40:38,620 now it can remember it forever as with these green thicker lines.
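A rough sketch of that trial-and-error memory in Python (the grid layout, rewards, and helper names here are invented for illustration, not from any CS50 problem set):

```python
import random

# A toy "floor is lava" grid, invented for illustration:
# "." is safe floor, "L" is lava, "G" is the goal; start at top-left.
GRID = ["...L",
        ".L..",
        "L..L",
        "...G"]
ROWS, COLS = len(GRID), len(GRID[0])
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

bad = set()  # (position, move) pairs we've been punished for: the red lines

def try_episode():
    """One walk through the grid; returns the path if we reach the goal."""
    pos, path = (0, 0), [(0, 0)]
    while True:
        options = [m for m in MOVES if (pos, m) not in bad]
        if not options:
            return None  # boxed in on this run; start over
        move = random.choice(options)
        dr, dc = MOVES[move]
        r, c = pos[0] + dr, pos[1] + dc
        if not (0 <= r < ROWS and 0 <= c < COLS) or GRID[r][c] == "L":
            bad.add((pos, move))  # minus 1: remember never to do this again
            return None
        pos = (r, c)
        path.append(pos)
        if GRID[r][c] == "G":
            return path  # plus 1: a path worth remembering

# Keep trying episodes until one reaches the goal.
path = None
while path is None:
    path = try_episode()
print(path)
```

Each episode either adds a "red line" to the bad set or ends at the goal with a path worth remembering, which is the green-lines idea in miniature.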
866 00:40:38,620 --> 00:40:41,470 Any time you want to leave this map, any time you get really good 867 00:40:41,470 --> 00:40:44,650 at the Nintendo game, you follow that same path again and again, 868 00:40:44,650 --> 00:40:46,420 so you don't fall into the lava. 869 00:40:46,420 --> 00:40:51,160 But an astute human observer might realize that, yes, this is correct. 870 00:40:51,160 --> 00:40:53,590 It's getting out of this so-called maze. 871 00:40:53,590 --> 00:40:56,315 But what is suboptimal or bad about this solution? 872 00:40:56,315 --> 00:40:56,815 Sure. 873 00:40:56,815 --> 00:40:58,513 AUDIENCE: It's taking a really long time. 874 00:40:58,513 --> 00:40:59,900 It's not the most efficient way to get there. 875 00:40:59,900 --> 00:41:00,500 DAVID J. MALAN: Exactly. 876 00:41:00,500 --> 00:41:01,792 It's taking a really long time. 877 00:41:01,792 --> 00:41:04,190 An inefficient way to get there, because I dare say, 878 00:41:04,190 --> 00:41:07,280 if we just tried a different path occasionally, 879 00:41:07,280 --> 00:41:11,480 maybe we could get lucky and get to the exit quicker. 880 00:41:11,480 --> 00:41:14,930 And maybe that means we get a higher score or we get rewarded even more. 881 00:41:14,930 --> 00:41:18,140 So within a lot of artificial intelligence algorithms, 882 00:41:18,140 --> 00:41:21,230 there's this idea of exploring versus exploiting, 883 00:41:21,230 --> 00:41:26,000 whereby you should occasionally, yes, exploit the knowledge you already have. 884 00:41:26,000 --> 00:41:28,010 And in fact, frequently exploit that knowledge. 885 00:41:28,010 --> 00:41:30,260 But occasionally you know what you should probably do, 886 00:41:30,260 --> 00:41:31,550 is explore just a little bit. 887 00:41:31,550 --> 00:41:34,550 Take a left instead of a right and see if it leads you to the solution 888 00:41:34,550 --> 00:41:35,390 even more quickly. 889 00:41:35,390 --> 00:41:37,620 And you might find a better and better solution. 890 00:41:37,620 --> 00:41:40,100 So here mathematically is how we might think of this. 891 00:41:40,100 --> 00:41:44,690 10% of the time we might say that epsilon, just some variable, sort 892 00:41:44,690 --> 00:41:47,780 of a sprinkling of salt into the algorithm here, epsilon 893 00:41:47,780 --> 00:41:49,320 will be like 10% of the time. 894 00:41:49,320 --> 00:41:54,512 So if my robot or my player picks a random number that's less than 10%, 895 00:41:54,512 --> 00:41:55,970 it's going to make a random move. 896 00:41:55,970 --> 00:41:59,270 Go left instead of right, even if you really typically go right. 897 00:41:59,270 --> 00:42:01,650 Otherwise, go ahead and make the move with the highest value, 898 00:42:01,650 --> 00:42:03,090 as we've learned over time. 899 00:42:03,090 --> 00:42:06,420 And what the robot might learn then, is that we could actually 900 00:42:06,420 --> 00:42:10,290 go via this path, which gets us to the exit faster. 901 00:42:10,290 --> 00:42:13,313 We get a higher score, we do it in less time, it's a win-win. 902 00:42:13,313 --> 00:42:15,480 Frankly, this really resonates with me, because I've 903 00:42:15,480 --> 00:42:19,068 been in the habit, as maybe some of you are, when you go to a restaurant maybe 904 00:42:19,068 --> 00:42:21,360 that you really like, you find a dish you really like-- 905 00:42:21,360 --> 00:42:24,120 --I will never again know what other dishes that restaurant 906 00:42:24,120 --> 00:42:28,440 offers, because I'm locally optimally happy with the dish I've chosen.
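That coin flip between exploring and exploiting is only a few lines of code. A minimal sketch, assuming some table of per-move values has already been learned (the particular values below are made up):

```python
import random

EPSILON = 0.10  # explore 10% of the time

def choose_move(moves, value):
    """Epsilon-greedy: usually exploit the best-known move,
    occasionally explore a random one."""
    if random.random() < EPSILON:
        return random.choice(moves)            # explore: try anything
    return max(moves, key=lambda m: value[m])  # exploit: best known move

# Hypothetical learned values, for illustration only:
value = {"left": 0.2, "right": 0.7, "up": 0.4, "down": 0.1}
print(choose_move(list(value), value))  # usually "right", sometimes random
```

In restaurant terms, exploiting is ordering the usual; exploring is gambling on something new.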
907 00:42:28,440 --> 00:42:31,800 And I will never know if there's an even better dish at that restaurant 908 00:42:31,800 --> 00:42:34,320 unless again, I sort of sprinkle a little bit of epsilon, 909 00:42:34,320 --> 00:42:38,730 a little bit of randomness into my game playing, my dining out. 910 00:42:38,730 --> 00:42:41,640 The catch, of course, though, is that I might be punished. 911 00:42:41,640 --> 00:42:45,360 I might, therefore, be less happy if I pick something and I don't like it. 912 00:42:45,360 --> 00:42:48,120 So there's this tension between exploring and exploiting. 913 00:42:48,120 --> 00:42:50,700 But in general in computer science, and especially in AI, 914 00:42:50,700 --> 00:42:53,220 adding a little bit of randomness, especially over time, 915 00:42:53,220 --> 00:42:56,320 can, in fact, yield better and better outcomes. 916 00:42:56,320 --> 00:42:59,400 But now there's this notion all the more of deep learning, 917 00:42:59,400 --> 00:43:02,910 whereby you're trying to infer, to detect patterns, 918 00:43:02,910 --> 00:43:06,120 figure out how to solve problems, even if the AI has never 919 00:43:06,120 --> 00:43:10,170 seen those problems before, and even if there's no human there to reinforce 920 00:43:10,170 --> 00:43:12,720 behavior positively or negatively. 921 00:43:12,720 --> 00:43:15,390 Maybe it's just too complex of a problem for a human 922 00:43:15,390 --> 00:43:18,415 to stand alongside the robot and say, good or bad job. 923 00:43:18,415 --> 00:43:20,790 So deep learning is actually very much related 924 00:43:20,790 --> 00:43:24,210 to what you might know as neural networks, inspired by human physiology, 925 00:43:24,210 --> 00:43:26,580 whereby inside of our brains and elsewhere in our body, 926 00:43:26,580 --> 00:43:28,372 there's lots of these neurons here that can 927 00:43:28,372 --> 00:43:30,480 send electrical signals to make movements 928 00:43:30,480 --> 00:43:32,220 happen from brain to extremities. 929 00:43:32,220 --> 00:43:35,520 You might have two of these via which signals can 930 00:43:35,520 --> 00:43:37,810 be transmitted over a larger distance. 931 00:43:37,810 --> 00:43:41,760 And so computer scientists for some time have drawn inspiration 932 00:43:41,760 --> 00:43:46,560 from these neurons to create in software, what we call neural networks. 933 00:43:46,560 --> 00:43:49,240 Whereby, there's inputs to these networks 934 00:43:49,240 --> 00:43:52,230 and there's outputs from these networks that represent inputs 935 00:43:52,230 --> 00:43:54,450 to problems and solutions thereto. 936 00:43:54,450 --> 00:43:56,910 So let me abstract away the more biological diagrams 937 00:43:56,910 --> 00:44:00,970 with just circles that represent nodes, or neurons, in this case. 938 00:44:00,970 --> 00:44:03,450 This we would call in CS50, the input. 939 00:44:03,450 --> 00:44:05,520 This is what we would call the output. 940 00:44:05,520 --> 00:44:08,680 But this is a very simplistic, a very simple neural network. 941 00:44:08,680 --> 00:44:11,760 This might be more common, whereby the network, the AI 942 00:44:11,760 --> 00:44:15,900 takes two inputs to a problem and tries to give you one solution. 943 00:44:15,900 --> 00:44:17,760 Well, let's make this more real.
944 00:44:17,760 --> 00:44:20,760 For instance, suppose that at the-- 945 00:44:20,760 --> 00:44:23,970 suppose that just for the sake of discussion, here is like a grid 946 00:44:23,970 --> 00:44:27,180 that you might see in math class, with a y-axis and an x-axis, vertically 947 00:44:27,180 --> 00:44:28,620 and horizontally respectively. 948 00:44:28,620 --> 00:44:31,980 Suppose there's a couple of blue and red dots in that world. 949 00:44:31,980 --> 00:44:34,890 And suppose that our goal, computationally, 950 00:44:34,890 --> 00:44:40,020 is to predict whether a dot is going to be blue or red, based 951 00:44:40,020 --> 00:44:42,960 on its position within that coordinate system. 952 00:44:42,960 --> 00:44:45,002 And maybe this represents some real world notion. 953 00:44:45,002 --> 00:44:47,502 Maybe it's something like rain that we're trying to predict. 954 00:44:47,502 --> 00:44:49,920 But we're doing it more simply with colors right now. 955 00:44:49,920 --> 00:44:53,010 So here's my y-axis, here's my x-axis, and effectively, 956 00:44:53,010 --> 00:44:55,740 my neural network you can think of conceptually as this. 957 00:44:55,740 --> 00:44:58,393 It's some kind of implementation of software 958 00:44:58,393 --> 00:45:00,060 where there's two inputs to the problem. 959 00:45:00,060 --> 00:45:01,990 Give me an x, give me a y value. 960 00:45:01,990 --> 00:45:06,540 And this neural network will output red or blue as its prediction. 961 00:45:06,540 --> 00:45:08,790 Well, how does it know whether to predict red or blue, 962 00:45:08,790 --> 00:45:12,030 especially if no human has painstakingly written code 963 00:45:12,030 --> 00:45:15,360 to say when you see a dot here, conclude that it's red. 964 00:45:15,360 --> 00:45:17,490 When you see a dot here, conclude that it's blue. 965 00:45:17,490 --> 00:45:21,160 How can an AI just learn dynamically to solve problems? 966 00:45:21,160 --> 00:45:23,460 Well, what might be a reasonable heuristic here? 967 00:45:23,460 --> 00:45:26,757 Honestly, this is probably a first approximation that's pretty good. 968 00:45:26,757 --> 00:45:29,340 If anything's to the left of that line, let the neural network 969 00:45:29,340 --> 00:45:30,630 conclude that it's going to be blue. 970 00:45:30,630 --> 00:45:32,010 And if it's to the right of the line, let 971 00:45:32,010 --> 00:45:33,593 it conclude that it's going to be red. 972 00:45:33,593 --> 00:45:36,690 Until such time as there's more training data, 973 00:45:36,690 --> 00:45:40,203 more real world data that gets us to rethink our assumptions. 974 00:45:40,203 --> 00:45:42,120 So for instance, if there's a third dot there, 975 00:45:42,120 --> 00:45:44,830 uh-oh, clearly a straight line is not sufficient. 976 00:45:44,830 --> 00:45:48,960 So maybe it's more of a diagonal line that splits the blue from the red world 977 00:45:48,960 --> 00:45:49,600 here. 978 00:45:49,600 --> 00:45:51,660 Meanwhile, here's even more dots. 979 00:45:51,660 --> 00:45:53,580 And it's actually getting harder now. 980 00:45:53,580 --> 00:45:55,230 Like, this line is still pretty good. 981 00:45:55,230 --> 00:45:56,610 Most of the blue is up here. 982 00:45:56,610 --> 00:45:58,240 Most of the red is down here. 983 00:45:58,240 --> 00:46:02,100 And this is why, if we fast forward to today, you know, AI is often very good, 984 00:46:02,100 --> 00:46:04,630 but not perfect at solving problems. 
985 00:46:04,630 --> 00:46:07,890 But what is it we're looking at here, and what is this neural network really 986 00:46:07,890 --> 00:46:09,250 trying to figure out? 987 00:46:09,250 --> 00:46:12,870 Well, again, at the risk of taking some fun out of red and blue dots, 988 00:46:12,870 --> 00:46:16,890 you can think of this neural network as indeed having these neurons, which 989 00:46:16,890 --> 00:46:19,590 represent inputs here and outputs here. 990 00:46:19,590 --> 00:46:22,200 And then what's happening inside of the computer's memory, 991 00:46:22,200 --> 00:46:26,320 is that it's trying to figure out what the weight of this arrow or edge 992 00:46:26,320 --> 00:46:26,820 should be. 993 00:46:26,820 --> 00:46:29,132 What the weight of this arrow or edge should be. 994 00:46:29,132 --> 00:46:30,840 And maybe there's another variable there, 995 00:46:30,840 --> 00:46:33,910 like plus or minus c that just tweaks the prediction. 996 00:46:33,910 --> 00:46:37,540 So x and y are literally going to be numbers in this scenario. 997 00:46:37,540 --> 00:46:40,890 And the output of this neural network ideally is just true or false. 998 00:46:40,890 --> 00:46:42,310 Is it red or blue? 999 00:46:42,310 --> 00:46:45,330 So it's sort of a binary state, as we discuss a lot in CS50. 1000 00:46:45,330 --> 00:46:47,987 So here too, to take the fun out of the pretty picture, 1001 00:46:47,987 --> 00:46:50,070 it's really just like a high school math function. 1002 00:46:50,070 --> 00:46:53,160 What the neural network in this example is trying to figure out, 1003 00:46:53,160 --> 00:46:57,540 is what formula of the form ax plus by plus c 1004 00:46:57,540 --> 00:46:59,680 is going to be arbitrarily greater than 0? 1005 00:46:59,680 --> 00:47:02,150 And if so, let's conclude that the dot is red 1006 00:47:02,150 --> 00:47:05,140 if you get back a positive result. If you don't, let's 1007 00:47:05,140 --> 00:47:08,558 conclude that the dot is going to be blue instead. 1008 00:47:08,558 --> 00:47:10,600 So really what you're trying to do, is figure out 1009 00:47:10,600 --> 00:47:13,000 dynamically what numbers do we have to tweak, 1010 00:47:13,000 --> 00:47:15,100 these parameters inside of the neural network 1011 00:47:15,100 --> 00:47:18,220 that just give us the answer we want based on all of this data? 1012 00:47:18,220 --> 00:47:22,180 More generally though, this would be really representative of deep learning. 1013 00:47:22,180 --> 00:47:24,490 It's not as simple as input, input, output. 1014 00:47:24,490 --> 00:47:27,140 There's actually a lot of these nodes, these neurons. 1015 00:47:27,140 --> 00:47:28,360 There's a lot of these edges. 1016 00:47:28,360 --> 00:47:30,812 There's a lot of numbers and math going on that, 1017 00:47:30,812 --> 00:47:33,520 frankly, even the computer scientists using these neural networks 1018 00:47:33,520 --> 00:47:36,760 don't necessarily know what they even mean or represent. 1019 00:47:36,760 --> 00:47:39,910 It just happens to be that when you crunch the numbers with all 1020 00:47:39,910 --> 00:47:44,140 of these parameters in place, you get the answer that you want, 1021 00:47:44,140 --> 00:47:46,190 at least most of the time. 1022 00:47:46,190 --> 00:47:48,280 So that's essentially the intuition behind that. 1023 00:47:48,280 --> 00:47:51,340 And you can apply it to very real world, if mundane applications.
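Here is that high school math function as a sketch in Python, with a crude perceptron-style training loop added to show what tweaking the parameters can look like (the data points and learning rate are made up for illustration, and real networks use fancier training than this):

```python
# One artificial "neuron" classifying a dot as red or blue from its (x, y)
# position: compute ax + by + c and check which side of 0 it lands on.
a, b, c = 0.0, 0.0, 0.0  # the weights (and bias) to be learned

def predict(x, y):
    return "red" if a * x + b * y + c > 0 else "blue"

# Toy labeled examples: +1 means red, -1 means blue.
data = [((2.0, 0.5), +1), ((3.0, 1.0), +1),
        ((0.5, 2.0), -1), ((1.0, 3.0), -1)]

# The classic perceptron update: whenever an example is misclassified,
# nudge the weights a little toward the right answer.
for _ in range(100):
    for (x, y), label in data:
        guess = 1 if a * x + b * y + c > 0 else -1
        if guess != label:
            a += 0.1 * label * x
            b += 0.1 * label * y
            c += 0.1 * label

print(predict(2.5, 0.5))  # "red": below and right of the learned line
print(predict(0.5, 2.5))  # "blue": above and left of the learned line
```

And the inputs needn't be x's and y's on a grid: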
1024 00:47:51,340 --> 00:47:55,000 Given today's humidity, given today's pressure, yes or no, 1025 00:47:55,000 --> 00:47:56,275 should there be rainfall? 1026 00:47:56,275 --> 00:47:58,150 And maybe there is some mathematical function 1027 00:47:58,150 --> 00:48:01,120 that based on years of training data, we can 1028 00:48:01,120 --> 00:48:03,490 infer what that prediction should be. 1029 00:48:03,490 --> 00:48:04,090 Another one. 1030 00:48:04,090 --> 00:48:07,120 Given this amount of advertising in this month, 1031 00:48:07,120 --> 00:48:09,480 what should our sales be for that year? 1032 00:48:09,480 --> 00:48:11,230 Should they be up, or should they be down? 1033 00:48:11,230 --> 00:48:13,130 Sorry, for that particular month. 1034 00:48:13,130 --> 00:48:16,090 So real world problems map readily when you can break them down 1035 00:48:16,090 --> 00:48:20,320 into inputs and a binary output often, or some kind of output 1036 00:48:20,320 --> 00:48:24,250 where you want the thing to figure out based on past data what 1037 00:48:24,250 --> 00:48:26,650 its prediction should be. 1038 00:48:26,650 --> 00:48:30,250 So that brings us back to generative artificial intelligence, which 1039 00:48:30,250 --> 00:48:34,760 isn't just about solving problems, but really generating literally images, 1040 00:48:34,760 --> 00:48:38,680 texts, even videos, that again, increasingly resemble 1041 00:48:38,680 --> 00:48:41,920 what we humans might otherwise output ourselves. 1042 00:48:41,920 --> 00:48:45,370 And within the world of generative artificial intelligence, 1043 00:48:45,370 --> 00:48:48,310 do we have, of course, these same images that we saw before, 1044 00:48:48,310 --> 00:48:51,340 the same text that we saw before, and more generally, things 1045 00:48:51,340 --> 00:48:55,870 like ChatGPT, which are really examples of what we now call large language 1046 00:48:55,870 --> 00:48:56,560 models. 1047 00:48:56,560 --> 00:48:59,020 These sort of massive neural networks that 1048 00:48:59,020 --> 00:49:02,590 have so many inputs and so many neurons implemented 1049 00:49:02,590 --> 00:49:06,280 in software, that essentially represent all of the patterns 1050 00:49:06,280 --> 00:49:09,850 that the software has discovered by being fed massive amounts of input. 1051 00:49:09,850 --> 00:49:13,180 Think of it as like the entire textual content of the internet. 1052 00:49:13,180 --> 00:49:16,180 Think of it as the entire content of courses like CS50 1053 00:49:16,180 --> 00:49:18,280 that may very well be out there on the internet. 1054 00:49:18,280 --> 00:49:21,610 And even though these AIs, these large language models 1055 00:49:21,610 --> 00:49:25,240 haven't been told how to behave, they're really 1056 00:49:25,240 --> 00:49:28,210 inferring from all of these examples, for better 1057 00:49:28,210 --> 00:49:31,310 or for worse, how to make predictions. 1058 00:49:31,310 --> 00:49:34,840 So here, for instance, from 2017, just a few years back, 1059 00:49:34,840 --> 00:49:38,110 is a seminal paper from Google that introduced what we now 1060 00:49:38,110 --> 00:49:40,210 know as a transformer architecture. 
1061 00:49:40,210 --> 00:49:43,690 And this introduced this idea of attention values, whereby 1062 00:49:43,690 --> 00:49:46,900 they propose that given an English sentence, for instance, or really 1063 00:49:46,900 --> 00:49:51,460 any human sentence, you try to assign numbers, not unlike our past exercises, 1064 00:49:51,460 --> 00:49:55,780 to each of the words, each of the inputs that speaks to its relationship 1065 00:49:55,780 --> 00:49:56,930 with other words. 1066 00:49:56,930 --> 00:49:59,720 So if there's a high relationship between two words in a sentence, 1067 00:49:59,720 --> 00:50:01,310 they would have high attention values. 1068 00:50:01,310 --> 00:50:04,720 And if maybe it's a preposition or an article, like the or the like, 1069 00:50:04,720 --> 00:50:06,890 maybe those attention values are lower. 1070 00:50:06,890 --> 00:50:09,070 And by encoding the world in that way, do 1071 00:50:09,070 --> 00:50:14,230 we begin to detect patterns that allow us to predict things like words, 1072 00:50:14,230 --> 00:50:15,440 that is, generate text. 1073 00:50:15,440 --> 00:50:19,150 So for instance, up until a few years ago, completing this sentence 1074 00:50:19,150 --> 00:50:21,310 was actually pretty hard for a lot of AI. 1075 00:50:21,310 --> 00:50:25,180 So for instance here, Massachusetts is a state in the New England region 1076 00:50:25,180 --> 00:50:26,860 of the Northeastern United States. 1077 00:50:26,860 --> 00:50:29,500 It borders on the Atlantic Ocean to the east. 1078 00:50:29,500 --> 00:50:32,180 The state's capital is dot, dot, dot. 1079 00:50:32,180 --> 00:50:34,910 Now, you might think that this is relatively straightforward. 1080 00:50:34,910 --> 00:50:37,480 It's like just handing you a softball type question. 1081 00:50:37,480 --> 00:50:41,290 But historically within the world of AI, this word, state, 1082 00:50:41,290 --> 00:50:44,907 was so relatively far away from the proper noun 1083 00:50:44,907 --> 00:50:46,990 that it's actually referring back to, that we just 1084 00:50:46,990 --> 00:50:50,170 didn't have computational models that took in that holistic picture, 1085 00:50:50,170 --> 00:50:52,702 that frankly, we humans are much better at. 1086 00:50:52,702 --> 00:50:54,910 If you would ask this question a little more quickly, 1087 00:50:54,910 --> 00:50:57,260 a little more immediately, you might have gotten a better response. 1088 00:50:57,260 --> 00:50:59,610 But this is, I daresay, why chatbots in the past have been 1089 00:50:59,610 --> 00:51:01,945 so bad in the form of customer service and the like, 1090 00:51:01,945 --> 00:51:04,320 because they're not really taking all of the context into 1091 00:51:04,320 --> 00:51:07,470 account that we humans might be inclined to provide. 1092 00:51:07,470 --> 00:51:09,750 What's going on underneath the hood? 1093 00:51:09,750 --> 00:51:14,220 Without escalating things too quickly, what an artificial intelligence 1094 00:51:14,220 --> 00:51:16,650 nowadays, these large language models might do, 1095 00:51:16,650 --> 00:51:21,360 is break down the user's input, your input into ChatGPT 1096 00:51:21,360 --> 00:51:22,950 into the individual words. 1097 00:51:22,950 --> 00:51:26,790 We might then take into account the order of those words. 1098 00:51:26,790 --> 00:51:29,400 Massachusetts is first, is is last. 1099 00:51:29,400 --> 00:51:33,050 We might further encode each of those words using a standard way.
1100 00:51:33,050 --> 00:51:34,800 And there's different algorithms for this, 1101 00:51:34,800 --> 00:51:37,050 but you come up with what are called embeddings. 1102 00:51:37,050 --> 00:51:40,170 That is to say, you can use one of those APIs 1103 00:51:40,170 --> 00:51:43,500 I talked about earlier, or even software running on your own computers, 1104 00:51:43,500 --> 00:51:46,140 to come up with a mathematical representation 1105 00:51:46,140 --> 00:51:47,940 of the word, Massachusetts. 1106 00:51:47,940 --> 00:51:50,190 And Rongxin kindly did this for us last night. 1107 00:51:50,190 --> 00:51:57,000 These are the 1,536 floating point values that OpenAI uses 1108 00:51:57,000 --> 00:51:59,880 to represent the word, Massachusetts. 1109 00:51:59,880 --> 00:52:02,010 And this is to say, and you should not understand 1110 00:52:02,010 --> 00:52:04,380 anything you are looking at on the screen, nor do I, 1111 00:52:04,380 --> 00:52:07,170 but this is now a mathematical representation 1112 00:52:07,170 --> 00:52:10,320 of the input that can be compared against 1113 00:52:10,320 --> 00:52:12,660 the mathematical representations of other inputs 1114 00:52:12,660 --> 00:52:15,420 in order to find proximity semantically. 1115 00:52:15,420 --> 00:52:20,130 Words that somehow have relationships or correlations with each other 1116 00:52:20,130 --> 00:52:22,890 that helps the AI ultimately predict what 1117 00:52:22,890 --> 00:52:25,990 should the next word out of its mouth be, so to speak. 1118 00:52:25,990 --> 00:52:28,380 So in a case like this, these lines 1119 00:52:28,380 --> 00:52:30,630 represent all of those attention values. 1120 00:52:30,630 --> 00:52:32,880 And thicker lines mean there's more attention given 1121 00:52:32,880 --> 00:52:34,140 from one word to another. 1122 00:52:34,140 --> 00:52:35,730 Thinner lines mean the opposite. 1123 00:52:35,730 --> 00:52:40,770 And those inputs are ultimately fed into a large neural network, 1124 00:52:40,770 --> 00:52:43,870 where you have inputs on the left, outputs on the right. 1125 00:52:43,870 --> 00:52:46,380 And in this particular case, the hope is to get out 1126 00:52:46,380 --> 00:52:52,200 a single word, which is the capital of Massachusetts, Boston itself, whereby somehow, 1127 00:52:52,200 --> 00:52:55,950 the neural network and the humans behind it at OpenAI, Microsoft, Google, 1128 00:52:55,950 --> 00:52:59,490 or elsewhere, have sort of crunched so many numbers by training 1129 00:52:59,490 --> 00:53:03,040 these models on so much data, that it figured out what all of those weights 1130 00:53:03,040 --> 00:53:06,670 are, what the biases are, so as to influence mathematically 1131 00:53:06,670 --> 00:53:08,710 the output therefrom. 1132 00:53:08,710 --> 00:53:13,270 So that is all underneath the hood of what students now 1133 00:53:13,270 --> 00:53:15,460 perceive as this adorable rubber duck. 1134 00:53:15,460 --> 00:53:20,150 But underneath it all is certainly a lot of domain knowledge. 1135 00:53:20,150 --> 00:53:23,570 And CS50, by nature of being OpenCourseWare for the past many years, 1136 00:53:23,570 --> 00:53:26,050 is fortunate to actually be part of the model, 1137 00:53:26,050 --> 00:53:28,880 as might be any other content that's freely available online. 1138 00:53:28,880 --> 00:53:31,570 And so that certainly helps benefit the answers 1139 00:53:31,570 --> 00:53:34,150 when it comes to asking CS50 specific questions. 1140 00:53:34,150 --> 00:53:36,403 That said, it's not perfect.
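To give a flavor of what finding proximity semantically means computationally: embeddings are commonly compared with cosine similarity. A sketch with tiny made-up vectors (real embeddings, like the 1,536 values shown on screen, would come back from an embeddings API rather than be typed by hand):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1.0 means the
    inputs are close in the embedding space, i.e., related in meaning."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Made-up 4-dimensional stand-ins; real embeddings have hundreds or
# thousands of dimensions (e.g., the 1,536 values for "Massachusetts").
massachusetts = [0.8, 0.1, 0.3, 0.4]
boston        = [0.7, 0.2, 0.3, 0.5]
pancake       = [0.1, 0.9, 0.0, 0.2]

print(cosine_similarity(massachusetts, boston))   # relatively high
print(cosine_similarity(massachusetts, pancake))  # relatively low
```

Even with all of that machinery, though, the model is still only predicting plausible next words from numbers like these.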
1141 00:53:36,403 --> 00:53:38,320 And you might have heard of what are currently 1142 00:53:38,320 --> 00:53:43,540 called hallucinations, where ChatGPT and similar tools just make stuff up. 1143 00:53:43,540 --> 00:53:45,340 And it sounds very confident. 1144 00:53:45,340 --> 00:53:47,673 And you can sometimes call it out, whereby 1145 00:53:47,673 --> 00:53:49,090 you can say, no, that's not right. 1146 00:53:49,090 --> 00:53:51,610 And it will playfully apologize and say, oh, I'm sorry. 1147 00:53:51,610 --> 00:53:56,560 But it made up some statement, because it was probabilistically 1148 00:53:56,560 --> 00:53:59,840 something that could be said, even if it's just not correct. 1149 00:53:59,840 --> 00:54:02,650 Now, allow me to propose that this kind of problem 1150 00:54:02,650 --> 00:54:05,230 is going to get less and less frequent. 1151 00:54:05,230 --> 00:54:07,480 And so as the models evolve and our techniques evolve, 1152 00:54:07,480 --> 00:54:08,983 this will be less of an issue. 1153 00:54:08,983 --> 00:54:10,900 But I thought it would be fun to end on a note 1154 00:54:10,900 --> 00:54:13,510 that a former colleague shared just the other day, which 1155 00:54:13,510 --> 00:54:16,780 was this old poem by Shel Silverstein, another something 1156 00:54:16,780 --> 00:54:18,580 from our past childhood perhaps. 1157 00:54:18,580 --> 00:54:23,800 And this was from 1981, a poem called "Homework Machine," which perhaps 1158 00:54:23,800 --> 00:54:26,980 foretold where we are now in 2023. 1159 00:54:26,980 --> 00:54:30,940 "The homework machine, oh, the homework machine, most perfect contraption 1160 00:54:30,940 --> 00:54:32,320 that's ever been seen. 1161 00:54:32,320 --> 00:54:35,770 Just put in your homework, then drop in a dime, snap on the switch, 1162 00:54:35,770 --> 00:54:41,380 and in ten seconds time, your homework comes out quick and clean as can be. 1163 00:54:41,380 --> 00:54:46,240 Here it is, 9 plus 4, and the answer is 3. 1164 00:54:46,240 --> 00:54:47,590 3? 1165 00:54:47,590 --> 00:54:48,820 Oh, me. 1166 00:54:48,820 --> 00:54:52,210 I guess it's not as perfect as I thought it would be." 1167 00:54:52,210 --> 00:54:55,330 So, quite foretelling, sure. 1168 00:54:55,330 --> 00:54:58,220 [APPLAUSE] 1169 00:54:58,220 --> 00:55:01,130 Quite foretelling, indeed. 1170 00:55:01,130 --> 00:55:04,910 Though, for all this and more, the family members in the audience 1171 00:55:04,910 --> 00:55:08,810 are welcome to take CS50 yourselves online at cs50edx.org. 1172 00:55:08,810 --> 00:55:10,700 For all of today and so much more, allow me 1173 00:55:10,700 --> 00:55:15,140 to thank Brian, Rongxin, Sophie, Andrew, Patrick, Charlie, CS50's whole team. 1174 00:55:15,140 --> 00:55:18,920 If you are a family member here headed to lunch with CS50's team, 1175 00:55:18,920 --> 00:55:22,190 please look for Cameron holding a rubber duck above her head. 1176 00:55:22,190 --> 00:55:24,300 Thank you so much for joining us today. 1177 00:55:24,300 --> 00:55:25,670 This was CS50. 1178 00:55:25,670 --> 00:55:27,170 [APPLAUSE] 1179 00:55:27,170 --> 00:55:30,520 [MUSIC PLAYING] 1180 00:55:30,520 --> 00:55:57,000