[MUSIC PLAYING] TOM CRUISE: I'm going to show you some magic. It's the real thing. [LAUGHTER] I mean, it's all the real thing. [LAUGHTER] DAVID J. MALAN: All right. This is CS50, Harvard University's Introduction to the Intellectual Enterprises of Computer Science and the Art of Programming. My name is David Malan, and this is our family-friendly introduction to artificial intelligence, or AI, which seems to be everywhere these days. But first, a word on these rubber ducks, which your students might have had for some time. Within the world of computer science, and programming in particular, there's this notion of rubber duck debugging, or rubber ducking, whereby in the absence of a colleague, a friend, a family member, a teaching fellow who might be able to answer your questions about your code, especially when it's not working, ideally you might have at least a rubber duck, or really any inanimate object, on your desk with whom to talk. And the idea is that in expressing your logic, talking through your problems, even though the duck doesn't actually respond, invariably you eventually hear the illogic in your thoughts, and the proverbial light bulb goes off. Now, for students online, CS50 has for some time had a digital version thereof, whereby in the programming environment that CS50 students have used for the past several years, if they don't have a rubber duck on their desk, they can pull up this interface here. And if they begin a conversation like, "I'm hoping you can help me solve some problem," up until recently, CS50's virtual rubber duck would simply quack once, twice, or three times in total. But we have anecdotal evidence that that alone was enough to get students to realize what it was they were doing wrong. But of course, more recently has this duck, and so many other ducks around the world, so to speak, really come to life. And your students have been using artificial intelligence in some form within CS50 as a virtual teaching assistant. And what we'll do today is reveal not only how we've been using and leveraging AI within CS50, but also how AI itself works, to prepare you better for the years ahead. So last year around this time, tools like DALL-E 2 and image generation were all the rage. You might have played with this, whereby you can type in some keywords and, boom, you have a dynamically generated image. Similar tools, like Midjourney, give you even more realistic 3D imagery. And within that world of image generation, there were nonetheless some tells, whereby an observant viewer could tell that an image was probably generated by AI. And in fact, a few months ago, The New York Times took a look at some of these tools. And so, for instance, here is a sequence of images, and at least the one at left isn't all that implausible as an actual photograph. But in fact, all three of these are AI-generated. And for some time, there was a certain tell: AI, up until recently, really wasn't all that good at the finer details--like the fingers here, which are not quite right. And so you could have that sort of hint. But I dare say AI is getting better and better, such that it's getting harder to discern these kinds of things. So if you haven't already, go ahead and take out your phone if you have one with you. And if you'd like to partake, scan this barcode here, which will lead you to a URL. And on your screen, you'll have an opportunity in a moment to buzz in. If my colleague, Rongxin, wouldn't mind joining me up here on stage.
We'll ask you a sequence of questions and see just how prepared you are for this coming world of AI. So for instance, once you've got this here code scanned--and if you don't, that's fine; you can play along at home or alongside the person next to you--here are two images. And my question for you is, which of these two images, left or right, was generated by AI? Which of these two was generated by AI, left or right? And I think, Rongxin, we can flip over and see as the responses start to come in. So far, we're at about 20% saying left, 70-plus percent saying right, and 3% or 4% comfortably admitting unsure, and that's fine. Let's wait for a few more responses to come in, though I think the right-hand folks have it. And let's go ahead and flip back and see what the solution is. In this case, it was, in fact, the right-hand side that was AI-generated. So, that's great. I'm not sure what it means that we figured this one out, but let's try one more here. So let me propose that we consider now these two images. It's the same code, so if you still have your phone up, you don't need to scan again; it's going to be the same URL here, but just in case you closed it. Let's take a look now at these two images. Which of these, left or right, was AI-generated? Left or right this time? Rongxin, should we take a look at how it's coming in? Oh, it's a little closer this time. Left or right? Right's losing a little ground, maybe as people are changing their answers to left. More people are unsure this time, which is somewhat revealing. Let's give folks another second or two. And Rongxin, should we flip back? The answer is actually a trick question, since they were both AI-generated. So most of you--most of you--were, in fact, right. But if you take a glance at these, AI is getting really, really good. And so this is just a taste of the images that we might see down the line. And in fact, that video with which we began, Tom Cruise, as you might have gleaned, was not, in fact, Tom Cruise. That was an example of a deepfake: a video that was synthesized, whereby a different human was acting out those motions, saying those words, but software--artificial intelligence-inspired software--was mutating the actual image and faking this video. So it's all fun and games for now as we tinker with these kinds of examples, but suffice it to say, as we've begun to discuss in classes like this already, disinformation is only going to become more challenging in a world where it's not just text, but imagery, and soon, all the more, video. But for today, we'll focus really on the fundamentals, what it is that's enabling technologies like these, and even more familiarly, text generation, which is all the rage. And in fact, it seems just a few months ago, probably everyone in this room started to hear about tools like ChatGPT. So we thought we'd do one final exercise here as a group. And this was another piece in The New York Times where they asked the audience, "Did a fourth grader write this? Or the new chatbot?" So another opportunity to assess your discerning skills. Same URL, so if you still have your phone and that same interface open, you're in the right place. And here, we'll take a final stab at two essays of sorts. Which of these essays was written by AI, Essay 1 or Essay 2? And as folks buzz in, I'll read the first. Essay 1: I like to bring a yummy sandwich and a cold juice box for lunch. Sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips.
As we eat, we chat and laugh and catch up on each other's day, dot, dot, dot. Essay 2: My mother packs me a sandwich, a drink, fruit, and a treat. When I get in the lunchroom, I find an empty table and sit there and I eat my lunch. My friends come and sit down with me. Dot, dot, dot. Rongxin, should we see what folks think? It looks like most of you think that Essay 1 was generated by AI. And in fact, if we flip back to the answer here, it was, in fact, Essay 1. So it's great that we now already seemingly have this discerning eye, but let me perhaps deflate that enthusiasm by saying it's only going to get harder to discern one from the other. And we're really now on the bleeding edge of what's soon to be possible. But most everyone in this room has probably by now seen, tried, or certainly heard of ChatGPT, which is all about textual generation. Within CS50, and within academia more generally, we have been thinking about, and talking about, how and whether to use these kinds of technologies. And if the students in the room haven't told the family members in the room already, this here is an excerpt from CS50's own syllabus this year, whereby we have deemed tools like ChatGPT, in their current form, just too helpful--sort of like an overzealous friend in school who just wants to give you all of the answers instead of leading you to them. And so we simply prohibit by policy using AI-based software such as ChatGPT, third-party tools like GitHub Copilot, Bing Chat, and others that suggest or complete answers to questions or lines of code. But it would seem reactionary to take away technology that surely has some potential upsides for education. And so within CS50 this semester, as well as this past summer, we have allowed students to use CS50's own AI-based software, which is, in effect, as we'll discuss, built on top of these third-party tools--ChatGPT from OpenAI, and offerings from companies like Microsoft and beyond. And in fact, what students can now use is this brought-to-life CS50 duck, or DDB, duck debugger, within a website of our own, cs50.ai, and another that your students know as cs50.dev. So students are using it, but in a way where we have tempered the enthusiasm of what might otherwise be an overly helpful duck, to model it more akin to a good teacher, a good teaching fellow, who might guide you to the answers but not simply hand them over outright. So what does that actually mean, and in what form does this duck come? Well, architecturally, for those of you with engineering backgrounds who might be curious as to how this is actually implemented: if a student here in the class has a question, virtually in this case, they somehow ask these questions of this central web application, cs50.ai. But we, in turn, have built much of our own logic on top of third-party services known as APIs, application programming interfaces--features that other companies provide that people like us can use. So those so-called large language models are really doing a lot of the heavy lifting. But we, too, have information that is not in these models yet--for instance, the words that came out of my mouth just last week when we had a lecture on some other topic, not to mention all of the past lectures and homework assignments from this year.
So we have our own vector database locally via which we can search for more recent information and then hand some of that information to these models--whose training data, you might recall, at least for OpenAI, is as of now cut off as of 2021--to make the information even more current. So architecturally, that's sort of the flow. But for now, I thought I'd share at a higher level what it is your students are already familiar with, and what will soon be more broadly available to our own students online as well. So what we've focused on is what's generally now known as prompt engineering, which isn't really a technical phrase, because it's not so much engineering in the traditional sense; it really is just English that we are largely writing when it comes to giving the AI the personality of a good teacher or a good duck. So what we're doing is giving it what's known nowadays as a system prompt, whereby we write some English sentences and send those English sentences to OpenAI or Microsoft, and that sort of teaches it how to behave--not just using its own knowledge out of the box, but coercing it to behave in a little more educationally constructive way. And so, for instance, a representative snippet of the English that we provide to these services looks a little something like this. Quote, unquote: "You are a friendly and supportive teaching assistant for CS50. You are also a rubber duck. You answer student questions only about CS50 and the field of computer science; do not answer questions about unrelated topics. Do not provide full answers to problem sets, as this would violate academic honesty." And so in essence--and you can do this manually with ChatGPT--you can tell it or ask it how to behave. We, essentially, are doing this automatically, so that it doesn't just hand out answers out of the box, and so that it knows a little something more about us. There's also, in this world of AI right now, the notion of a user prompt, versus that system prompt. And the user prompt, in our case, is essentially the student's own question--I have a question about x, or I have a problem with my code here in y--so we pass students' own questions to those same APIs as part of this so-called user prompt. Just so you're familiar now with some of the vernacular of late. Now, the programming environment that students have been using this whole year is known as Visual Studio Code, a popular free, open-source product that so many engineers around the world now use. But we've instrumented it with some course-specific features that make learning within this environment all the easier. It lives at cs50.dev. And as students in this room know, as of now the virtual duck lives within this environment too and can do things like explain highlighted lines of code. So here, for instance, is a screenshot of this programming environment. Here is some arcane-looking code in a language called C, which we've just left behind us in the class. And suppose that you don't understand what one or more of these lines of code do. Students can now highlight those lines, right-click or Control-click on them, select "Explain Highlighted Code," and voila, they see a ChatGPT-like explanation of that very code within a second or so--an explanation that no human has typed out, but that's been dynamically generated based on this code. Another thing that the duck can now do for students is advise them on how to improve their code's style, the aesthetics, the formatting thereof. And so, for instance, here is similar code in a language called C.
And I'll stipulate that it's very messy: everything is left-aligned instead of nicely indented so that it looks a little more structured. Students can now click a button, and they'll see at the right-hand side, in green, how their code should ideally look. And if they're not quite sure what those changes are or why, they can click on "Explain Changes." And similarly, the duck advises them on how and why to turn their not-great code into greater code, from left to right respectively. More compelling, and more generalizable beyond CS50 and beyond computer science, is AI's ability to answer most of the questions that students might now ask online. And we've been doing asynchronous Q&A for years via various mobile or web applications and the like. But to date, it has been humans, myself included, responding to all of those questions. Now the duck has an opportunity to chime in, generally within three seconds, because we've integrated it into an online Q&A tool that students in CS50 and elsewhere across Harvard have long used. So here's an anonymized screenshot of a question from an actual student, written here as John Harvard, who asked this summer, in the summer version of CS50: what is Flask, exactly? So, a fairly definitional question. And here is what the duck spit out, thanks to that architecture I described before. I'll stipulate that this is correct, but it is mostly a definition, akin to what Google or Bing could already give you last year. But here's a more nuanced question, for instance, from another anonymized student. In this question here, the student's including an error message that they're seeing; they're asking about that; and they're asking, a little more broadly and qualitatively: is there a more efficient way to write this code? A question that really is best answered based on experience. Here, I'll stipulate that the duck responded with this answer, which is actually pretty darn good--not only responding in English, but with some sample starter code that would make sense in this context. And it's worth noting what's at the bottom, because none of this technology is perfect just yet--it's still indeed very bleeding edge--and so what we have chosen to do within CS50 is include disclaimers like this: I am an experimental bot, quack. Do not assume that my reply is accurate unless you see that it's been endorsed by humans, quack. And in fact, at top right is the mechanism we've been using in this tool: usually within minutes, a human--whether it's a teaching fellow, a course assistant, or myself--will click on a button like this to signal to our human students that, yes, the duck is spot-on here; or we have an opportunity, as always, to chime in with our own responses. Frankly, that disclaimer, that button, will soon, I do think, go away as the software gets better and better. But for now, that's how we're modulating exactly what students' expectations might be when it comes to correctness or incorrectness. It's common, too, in programming to see a lot of error messages, certainly when you're learning first-hand. A lot of these error messages are arcane and confusing, certainly to students, as opposed to the people who wrote them. Soon, students will see a box like this: whenever one of their terminal window programs errs, they'll be assisted, too, with English-like, TF-like support when it comes to explaining what it is that went wrong with that command.
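For the technically inclined, all of those duck features boil down to the same pattern described a moment ago: a system prompt that sets the duck's teacherly personality, plus the student's own question as the user prompt, sent off to one of those APIs. Here is a minimal sketch in Python of how one might wire that up--assuming OpenAI's official Python library and an API key in the environment; the model name and prompt wording here are illustrative, not CS50's actual code:

# A duck-like Q&A bot in miniature: system prompt + user prompt.
# Assumes OpenAI's Python library (pip install openai) and an
# OPENAI_API_KEY environment variable; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a friendly and supportive teaching assistant for CS50. "
    "You are also a rubber duck. Answer questions only about CS50 and "
    "computer science. Do not provide full answers to problem sets."
)

def ask_duck(question: str) -> str:
    """Send the system prompt plus the student's question; return the reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative choice of model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_duck("What is Flask, exactly?"))

In a real deployment, one would presumably also include material retrieved from that vector database among the messages, but the division of labor is the same: our English instructions, the student's English question, and the model's generated reply.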
And ultimately, what this is really doing for students, in our own experience already, is providing them with virtual office hours, 24/7, which is actually quite compelling in a university environment, where students' schedules are already tightly packed, be it with academics, extracurriculars, athletics, and the like--and they might have enough time to dive into a homework assignment, maybe eight hours even, for something sizable. But if they hit that wall a couple of hours in, yeah, they can go to office hours or they can ask a question asynchronously online, but it's really not optimal, in-the-moment support--which we can now provide all the more effectively, we hope, through software as well. So if you're curious, even if you're not a technophile yourself: anyone on the internet can go to cs50.ai and experiment with this user interface. This one here actually resembles ChatGPT itself, but it's specific to CS50. And here again is just a sequence of screenshots that, I'll stipulate for today's purposes, are pretty darn good, and akin to what I myself or a teaching fellow would reply in answer to a student's question, in this case about their particular code. And ultimately, it's really aspirational. The goal here is to really approximate a one-to-one teacher-to-student ratio, which, despite all of the resources we within CS50, we within Harvard and places like Yale, have, we certainly have never had enough resources to achieve--what might really be ideal, which is more of an apprenticeship model, a mentorship, whereby it's just you and that teacher working one-to-one. Now, we still have humans, and the goal is not to reduce that human support, but to focus it all the more consciously on the students who would benefit most from some in-person, one-to-one support, versus students who would happily take it at any hour of the day, more digitally, online. And in fact, we're still in the process of evaluating just how well or not well all of this works. But based on our summer experiment alone, with about 70 students a few months back, one student wrote us at term's end that it "felt like having a personal tutor. I love how AI bots will answer questions without ego and without judgment, generally entertaining even the stupidest of questions without treating them like they're stupid. It has, as one could expect," ironically, "an inhuman level of patience." And so I thought that's telling as to how even one student is perceiving these new possibilities. So let's consider now, more academically, what it is that's enabling those kinds of tools, not just within CS50, within computer science, but really the world more generally. What the whole world's been talking about is generative artificial intelligence: AI that can generate images, generate text, and sort of mimic the behavior of what we think of as human. So what does that really mean? Well, let's start really at the beginning. Artificial intelligence is actually a technique, a technology, a subject that's been with us for some time, but it really was the introduction of this very user-friendly interface known as ChatGPT, along with some of the more recent academic work over really just the past five or six years, that allowed us to take a massive leap forward, it would seem, technologically, as to what these things can now do. So what is artificial intelligence? It's been with us for some time, and it's honestly so omnipresent that we take it for granted nowadays.
Gmail and Outlook have gotten really good at spam detection; if you haven't checked your spam folder in a while, that's testament to just how good they seem to be at getting it out of your inbox. Handwriting recognition has been with us for some time, and I dare say it, too, is only getting better and better, the more the software is able to adapt to different handwriting styles, such as this. Recommendations and the like, whether you're using Netflix or any other service, have gotten better and better at suggesting things you might like based on things you have liked, and maybe based on things other people who like the same things as you have liked. And suffice it to say, there's no one at Netflix, akin to the clerks at the old VHS stores of yesteryear, recommending to you specifically what movie you might like. And there's no code, no algorithm, that says if they like x, then recommend y, else recommend z, because there are just too many movies, too many people, too many different tastes in the world. So AI is increasingly sort of looking for patterns that might not even be obvious to us humans, and dynamically figuring out what might be good for me, for you, or anyone else. Siri, Google Assistant, Alexa--any of these voice recognition tools that are answering questions--that, too, suffice it to say, is all powered by AI. But let's start with something a little simpler than any of those applications. And this is one of the first arcade games from yesteryear, known as Pong. It's sort of like table tennis: the person on the left can move their paddle up and down, the person on the right can do the same, and the goal is to get the ball past the other person, or conversely, make sure it hits your paddle and bounces back. Somewhat simpler than this, insofar as it can be one-player, is another Atari game from yesteryear known as Breakout, whereby you're essentially just trying to bang the ball against the bricks, to get more and more points and get rid of all of those bricks. But all of us in this room probably have a human instinct for how to win this game, or at least how to play this game. For instance, if the ball--pictured here, back in the '80s, as a single red dot--just left the paddle, pictured here as a red line, where is the ball presumably going to go next? And in turn, which direction should I slide my paddle, to the left or to the right? So presumably, to the left. And we all have an eye for what seems to be the digital physics of that. And indeed, that would then be an algorithm: sort of step-by-step instructions for solving some problem. So how can we now translate that human intuition to what we describe more as artificial intelligence? Not nearly as sophisticated as those other applications, but we'll indeed start with some basics. You might know from economics or strategic thinking or computer science this idea of a decision tree, which allows you to decide, should I go this way or that way, when it comes to making a decision. So let's consider how we could draw a picture to represent even something simplistic like Breakout. Well, "Is the ball left of the paddle?" is a question, or a Boolean expression, I might ask myself in code. If yes, then I should move my paddle left, as most everyone just said. Else, if the ball is not left of the paddle, what do I want to do? Well, I want to ask a question. I don't want to just instinctively go right. I want to check: is the ball to the right of the paddle? And if yes, well, then yes, go ahead and move the paddle right.
But there is a third situation, which is-- AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Right. Like, don't move--it's coming right at you. So that would be the third scenario here: no, it's not to the right or to the left, so just don't move the paddle. You got lucky, and it's coming, for instance, straight down. So Breakout is fairly straightforward when it comes to an algorithm. And we can actually translate this, as any CS50 student now could, to code, or pseudocode--sort of English-like code that's independent of Java, C, C++, and all of the programming languages of today. So in English pseudocode: while a game is ongoing, if the ball is left of the paddle, move paddle left; else if the ball is right of the paddle--it should say "paddle" there; that's a bug, not intended today--move paddle right; else, don't move the paddle. So that, too, represents a translation of this intuition to code that's very deterministic: you can anticipate all possible scenarios, captured in code. And frankly, this should be the most boring game of Breakout, because the paddle should just perfectly play this game, assuming there are no variables or randomness when it comes to speed or angles or the like, which real-world games certainly try to introduce. But let's consider another game from yesteryear, one that you might play with your kids today or you did yourself growing up. Here's tic-tac-toe. And for those unfamiliar, the goal is to get three O's in a row or three X's in a row, vertically, horizontally, or diagonally. So suppose it's now X's turn. If you've played tic-tac-toe, most of you probably just have an immediate instinct as to where X should probably go, so that it doesn't lose instantaneously. But let's consider, in the more general case, how you solve tic-tac-toe. Frankly, if you're in the habit of losing tic-tac-toe, but you're not trying to lose tic-tac-toe, you're actually playing it wrong: you should minimally be able to always force a tie in tic-tac-toe, and better yet, you should be able to beat the other person. So hopefully, everyone will soon walk away with this strategy. So how can we borrow inspiration from those same decision trees and do something similar here? Well, if you, the player, ask yourself: can I get three in a row on this turn? If yes, then you should do that, and play the X in that position--play in the square to get three in a row. Straightforward. If you can't get three in a row this turn, you should ask another question: can my opponent get three in a row on their next turn? Because then you had better preempt that by moving into that position--play in the square to block your opponent's three in a row. What if, though, that's not the case? What if there aren't even that many X's and O's on the board? If you're in the habit of just kind of playing randomly, you might not be playing optimally, as a good AI could. So if no, it's kind of a question mark. In fact, there's probably more to this tree, because we could think through: what if I go there? Wait a minute, what if I go there, or there, or there? You can start to think a few steps ahead, as a computer could do much better even than us humans. So suppose, for instance, it's O's turn. Now, those of you who are very good at tic-tac-toe might have an instinct for where to go. But this is an even harder problem, it would seem--I could go in eight possible places if I'm O. But let's try to break that down more algorithmically, as an AI would.
And let's recognize, too, that with games in particular, one of the reasons that AI was adopted so early in these games--playing against the CPU--is that games really lend themselves to being defined, even if it takes the fun out of them, mathematically: defining them in terms of inputs and outputs, maybe a paddle moving left or right, a ball moving up or down. You can really quantize a game at a very boring, low level. But that lends itself then to solving it optimally. And in fact, with most games, the goal is to maximize, or maybe minimize, some math function, right? In most games, if you have scores, the goal is to maximize your score, and indeed get a high score. So games lend themselves to a nice translation to mathematics, and in turn here, to AI solutions. So one of the first algorithms one might learn in a class on algorithms and on artificial intelligence is something called minimax, which alludes to this idea of trying to minimize and/or maximize something, as your function, your goal. And it actually derives its inspiration from these same decision trees that we've been talking about. But first, a definition. Here are three representative tic-tac-toe boards: here is one in which O has clearly won, per the green; here is one in which X has clearly won, per the green; and this one in the middle just represents a draw. Now, there are a bunch of other ways that tic-tac-toe could end, but here are just three representative ones. But let's make tic-tac-toe even more boring than it might have always struck you as. Let's propose that this kind of configuration should have a score of negative 1: if O wins, it's a negative 1. If X wins, it's a positive 1. And if no one wins, we'll call it a 0. We need some way of talking about, and reasoning about, which of these outcomes is better than the others, and what's simpler than 0, 1, and negative 1? So the goal of X, it would seem, is to maximize its score, while the goal of O is to minimize its score: X is really trying to get positive 1, O is really trying to get negative 1, and no one really wants 0, but that's better than losing to the other person. So we now have a way to define what it means to win or lose. Well, now we can employ a strategy here. Here, just as a quick check, what would the score be of this board? Just so everyone's on the same page. AUDIENCE: 1. DAVID J. MALAN: So, 1, because X has won, and we just stipulated, arbitrarily, that this means this board has a value of 1. Now let's put it into a more interesting context. Here, a game has been played for a few moves already. There are two spots left, and no one has won just yet. And suppose that it's O's turn now. Now, everyone probably has an instinct already as to where to go, but let's try to break this down more algorithmically. So what is the value of this board? Well, we don't know yet, because no one has won, so let's consider what could happen next. So we can draw this, actually, as a tree, as before. Here, for instance, is what might happen if O goes into the top-left corner, and here's what might happen if O goes into the bottom-middle spot instead. We should ask ourselves: what's the value of this board, and what's the value of this board? Because if O's purpose in life is to minimize its score, it's going to go left or right based on whichever yields the smallest number--negative 1, ideally. But we're still not sure yet, because we don't have definitions for boards with holes in them like this. So what could happen next here? Well, it's obviously going to be X's turn next.
So if X moves, unfortunately, X has won in this configuration. We can now conclude that the value of this board is what number? AUDIENCE: 1. DAVID J. MALAN: So, 1. And because there's only one way to reach this board, by transitivity you might as well think of the value of the previous board as also 1, because no matter what, it's going to lead to that same outcome. And so the value of this board is actually still to be determined, because we don't know if O is going to want to go with the 1--and probably not, because that means X wins. But let's see what the value of this board is. Well, suppose that indeed X goes in that top-left corner here. What's the value of this board here? 0, because no one has won; there are no three X's or O's in a row. So the value of this board is 0. There's only one way logically to get there, so we might as well think of the value of this board as also 0. And so now, what's the value of this board? Well, if we started the story by thinking about O's turn, and O's purpose is the "min" in minimax, then which move is O going to make, go to the left or go to the right? O is probably going to go to the right and make the move that leads to--whoops--that leads to this board, because even though O can't win in this configuration, at least X didn't win. So it's minimized its score, relatively speaking, even though it's not a clean win. Now, this is all fine and good for a configuration of the board where the game is almost done--there are only two moves left; the game's about to end. But if you kind of expand in your mind's eye how we got to this branch of the decision tree--if we rewind one step, to where there are three possible moves--frankly, the decision tree is a lot bigger. If we rewind further in your mind's eye, and have four moves left, or five moves, or all nine moves left, imagine just zooming out, and out, and out: this is becoming a massive, massive tree of decisions. Now, even so, here is that same subtree, the same decision tree we just looked at. This is the exact same thing, but I shrank the font so that it fits here on the screen. But over here, we have what could happen if, instead, it's actually X's turn, because we're one move prior. There are a bunch of different moves X could now make, too. So what is the implication of this? Well, most humans are not thinking through tic-tac-toe to this extreme. And frankly, most of us probably just don't have the mental capacity to think about going left, and then right, and then left, and then right. Right? This is not how people play tic-tac-toe; we're not using that much memory, so to speak. But a computer can handle that, and computers can play tic-tac-toe optimally. So if you're beating a computer at tic-tac-toe, it's not implemented very well; it's not following this very logical, deterministic minimax algorithm. But this is where AI is now no longer as simple as just doing what these decision trees say. In the context of tic-tac-toe, here's how we might translate this to code, for instance: if the player is X, for each possible move, calculate a score for the board, as we were doing verbally, and then choose the move with the highest score, because X's goal is to maximize its score. If the player is O, though, for each possible move, calculate a score for the board, and then choose the move with the lowest score. So that's a distillation of that verbal walkthrough into what CS50 students now know as code, or at least pseudocode.
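And for the curious, that pseudocode translates almost line for line into real code. Here is a minimal, self-contained sketch in Python--not CS50's own implementation, and with an example board invented for illustration--of exactly that recursive scoring:

# A minimal minimax for tic-tac-toe, per the pseudocode above.
# A board is a list of 9 cells, each "X", "O", or None (empty).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def score(board, player):
    """Value of a board, with player to move: +1 if X wins, -1 if O wins, 0 if drawn."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if None not in board:
        return 0
    best = max if player == "X" else min    # X maximizes, O minimizes
    other = "O" if player == "X" else "X"
    return best(score(board[:i] + [player] + board[i + 1:], other)
                for i in range(9) if board[i] is None)

def best_move(board, player):
    """Choose the move leading to the highest (X) or lowest (O) score."""
    best = max if player == "X" else min
    other = "O" if player == "X" else "X"
    moves = {i: score(board[:i] + [player] + board[i + 1:], other)
             for i in range(9) if board[i] is None}
    return best(moves, key=moves.get)

# An invented near-endgame: squares 0 and 7 are open, and it's O's turn.
# If O plays square 0, X then wins via the middle column (1, 4, 7).
board = [None, "X", "O",
         "O",  "X", "X",
         "X",  None, "O"]
print(best_move(board, "O"))  # prints 7: O blocks, forcing a draw, not a loss

Notice that O picks the move whose score is 0 (a draw) over the one whose score is +1 (a win for X)--the "min" in minimax, exactly as in the walkthrough above.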
But the problem with games--not so much tic-tac-toe, but other, more sophisticated games--is this. Does anyone want to ballpark how many possible ways there are to play tic-tac-toe? Paper, pencil, two human children--how many different ways? How long could you keep them occupied playing tic-tac-toe in different ways? If you actually think through how big this tree gets--how many leaves there are on this decision tree, how many different directions--well, if you're thinking 255,168, you are correct. And now, most of us in our lifetimes have probably not played tic-tac-toe that many times, so think about how many games you've been missing out on--all the different decisions you could have been making all these years. Now, that's a big number, but honestly, that's not a big number for a computer. That's a few megabytes of memory, maybe, to keep all of that in mind and implement that kind of code in C or Java or C++ or something else. But other games are much more complicated. And the games that you and I might play as we get older include maybe chess. And think about chess with only the first four moves, back and forth four times--so only four moves; that's not even a very long game. Anyone want to ballpark how many different ways there are to begin a game of chess with four moves back and forth? This is evidence as to why chess is apparently so hard: 288 million ways. Which is why, when you are really good at chess, you are really good at chess, because apparently you either have an intuition for, or a mind for, thinking, it would seem, so many more steps ahead than your opponent. And don't get us started on something like Go: 266 quintillion ways to play Go's first four moves. So at this point, we just can't pull out our Mac, our PC, certainly not our phone, to solve optimally games like chess and Go, because we don't have big enough CPUs, we don't have enough memory, and we don't have enough years in our lifetimes for the computers to crunch all of those numbers. And thus was born a different form of AI, one that's more inspired by finding patterns more dynamically, learning from data, as opposed to being told by humans: here is the code via which to solve this problem. So machine learning is a subset of artificial intelligence that tries instead to get machines to learn what they should do without being coached step by step by humans. Reinforcement learning, for instance, is one such example thereof, wherein you sort of wait for the computer, or maybe a robot, to just get better and better and better at things. And as it does, you reward it with a reward function: give it plus 1 every time it does something well, and maybe minus 1--punish it--any time it does something poorly. And if you simply program this AI, or this robot, to maximize its score--never mind minimizing--ideally it should repeat the behaviors that got it plus 1, and it should decrease the frequency of the bad behaviors that got it negative 1. And you can reinforce this kind of learning. In fact, I have here one demonstration. Could a student come on up who does not think they are particularly coordinated? If-- OK, wow, you're being nominated by your friends. Come on up. Come on up. [LAUGHTER] Their hands went up instantly for you. OK, what is your name? AMAKA: My name's Amaka. DAVID J. MALAN: Amaka, do you want to introduce yourself to the world? AMAKA: Hi, my name is Amaka. I am a first-year in Holworthy. I'm planning to concentrate in CS.
DAVID J. MALAN: Wonderful. Nice to see you. Come on over here. [APPLAUSE] So, yes, oh no--it's sort of like a game show here. We have a pan here with what appears to be something pancake-like, and we'd like to teach you how to flip a pancake, so that when you gesture upward, the pancake flips around as though you'd cooked the other side. So we're going to reward you verbally with plus 1 or minus 1. Minus 1. Minus 1. OK, plus 1! Plus 1, so do more of that. Minus 1. Minus 1. Minus 1. Do less of that. [LAUGHTER] AUDIENCE: Great, great. DAVID J. MALAN: All right! A big round of applause. [APPLAUSE] Thank you. We've been in the habit of handing out Super Mario Brothers Oreos this year, so thank you for participating. [APPLAUSE] So, this is actually a good example of an opportunity for reinforcement learning. And wonderfully, a researcher has posted a video that we thought we'd share--it's about a minute and a half long--where you can watch a robot now do exactly what our wonderful human volunteer here just attempted as well. So let me go ahead and play this on the screen and give you a sense of what the human and the robot are doing together. So their pancake looks a little similar there. The human here is going to first sort of train the robot what to do by showing it some gestures. There's no one right way to do this, but the human seems to know how to do it pretty well in this case, and so he's trying to give the machine examples of how to flip a pancake successfully. But now, this is the very first trial. OK--look familiar? You're in good company. After three trials. [CLANG] [PLOP] OK. [CLANG] [PLOP] OK. Now 10 tries. There's the human picking up the pancake. After 11 trials-- [CLANG] [PLOP] And meanwhile, there's presumably a human coaching this, in the sense that someone is saying good job or bad job, plus 1 or minus 1. 20 trials. Here now we'll see how the computer knows what it's even doing: there's just a mapping to some kind of x, y, z coordinate system, so the robot can quantize what it is it's doing--nice!--to do more of one thing, less of another. And you're just seeing a visualization in the background of those digitized movements. And so now, after 50-some-odd trials, the robot, too, has got it spot-on. And it should be able to repeat this again and again and again, in order to keep flipping this pancake. So it wonderfully took our human volunteer even fewer trials. But this is an example, then, to be clear, of what we'd call reinforcement learning, whereby you're reinforcing a behavior you want, or negatively reinforcing--that is, punishing--a behavior that you don't. Here's another example that brings us back into the realm of games a little bit, but in a very abstract way. If we were playing a game like The Floor Is Lava, where you're only supposed to step in certain places so that you don't fall straight into a lava pit or something like that and lose a point or lose a life, each of these squares might represent a position. This yellow dot might represent the human player, which can go up, down, left, or right within this world. I'm revealing to the whole audience where the lava pits are, but the goal for this yellow dot is to get to green. And the yellow dot, as in any good game, does not have this bird's-eye view, and does not know from the get-go exactly where to go. It's going to have to try some trial and error.
But if we, the programmers, maybe reinforce good behavior or punish bad behavior, we can teach this yellow dot--without giving it step-by-step up, down, left, right instructions--what behaviors to repeat and what behaviors not to repeat. So, for instance, suppose the robot moves right. Ah, that was bad: you fell in the lava already, so we'll use a bit of computer memory to draw a thicker red line there--don't do that again. So, negative 1, so to speak. Maybe the yellow dot moves up next time. We can reward that behavior by not drawing any walls and allowing it to go again. It's making pretty good progress, but--oh, darn it--it took a right turn and now fell into the lava. But let's use a bit more of the computer's memory and keep track of the, OK, do not do that thing anymore. Maybe the next time the yellow dot goes this way. Oh, we want to punish that behavior, so we'll remember as much with that red line. But now we're starting to make progress--until, oh, now we hit this one. And eventually, even though the yellow dot, much like our human, much like our pancake-flipping robot, had to try again and again and again, after enough trials it's going to start to realize which behaviors it should repeat and which ones it shouldn't. And so in this case, maybe it finally makes its way up to the green dot. And just to recap: once it finds that path, it can now remember it forever, as with these thicker green lines. Any time you want to leave this map, any time you get really good at the Nintendo game, you follow that same path again and again, so you don't fall into the lava. But an astute human observer might realize that, yes, this is correct--it's getting out of this so-called maze--but what is suboptimal, or bad, about this solution? Sure. AUDIENCE: It's taking a really long time. It's not the most efficient way to get there. DAVID J. MALAN: Exactly. It's taking a really long time; it's not the most efficient way to get there. Because, I dare say, if we just tried a different path occasionally, maybe we could get lucky and get to the exit quicker. And maybe that means we get a higher score, or we get rewarded even more. So within a lot of artificial intelligence algorithms, there's this idea of exploring versus exploiting, whereby you should, yes, exploit the knowledge you already have--and in fact, frequently exploit that knowledge--but occasionally, you know what you should probably do? Explore just a little bit. Take a left instead of a right and see if it leads you to the solution even more quickly. And you might find a better and better solution. So here, mathematically, is how we might think of this. We might say that epsilon--just some variable, sort of a sprinkling of salt into the algorithm here--will be, like, 10%. So if my robot, or my player, picks a random number that's less than that 10%, it's going to make a random move: go left instead of right, even if it would typically go right. Otherwise, it makes the move with the highest value, as learned over time. And what the robot might learn, then, is that we could actually go via this path, which gets us to the exit faster. We get a higher score, we do it in less time--it's a win-win.
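In code, that explore-versus-exploit idea really is just a few lines. Here is a tiny sketch in Python, with the learned values invented purely for illustration:

# Epsilon-greedy action selection: explore 10% of the time, exploit otherwise.
import random

EPSILON = 0.10  # the "sprinkling of salt": how often to explore

def choose_action(values):
    """values maps each possible move to the reward learned so far."""
    if random.random() < EPSILON:
        return random.choice(list(values))  # explore: try a random move
    return max(values, key=values.get)      # exploit: the best-known move

learned = {"up": 0.8, "down": -0.5, "left": 0.1, "right": -0.2}
for _ in range(10):
    print(choose_action(learned))           # mostly "up," occasionally random

Frankly, this really resonates with me, because I've been in the habit, as maybe some of you are, when you go to a restaurant that you really like and you find a dish you really like--I will never again know what other dishes that restaurant offers, because I'm locally, optimally happy with the dish I've chosen.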
And I will never know if there's an even better dish at that restaurant unless, again, I sort of sprinkle a little bit of epsilon, a little bit of randomness, into my game playing, my dining out. The catch, of course, is that I might be punished--I might, therefore, be less happy--if I pick something and I don't like it. So there's this tension between exploring and exploiting. But in general, in computer science, and especially in AI, adding a little bit of randomness, especially over time, can in fact yield better and better outcomes. But now there's, all the more, this notion of deep learning, whereby you're trying to infer, to detect patterns--to figure out how to solve problems even if the AI has never seen those problems before, and even if there's no human there to reinforce behavior positively or negatively. Maybe it's just too complex a problem for a human to stand alongside the robot and say good job or bad job. So deep learning is actually very much related to what you might know as neural networks, inspired by human physiology, whereby inside of our brains and elsewhere in our bodies there are lots of these neurons that can send electrical signals to make movements happen, from brain to extremities. You might have two of these via which signals can be transmitted over a larger distance. And so computer scientists for some time have drawn inspiration from these neurons to create, in software, what we call neural networks, whereby there are inputs to these networks and outputs from these networks, which represent inputs to problems and solutions thereto. So let me abstract away the more biological diagrams with just circles that represent nodes, or neurons, in this case. This, we would call in CS50 the input; this is what we would call the output. But this is a very simplistic, very simple neural network. This might be more common, whereby the network, the AI, takes two inputs to a problem and tries to give you one solution. Well, let's make this more real. Suppose, just for the sake of discussion, that here is a grid like you might see in math class, with a y-axis and an x-axis, vertical and horizontal respectively. Suppose there are a couple of blue and red dots in that world. And suppose that our goal, computationally, is to predict whether a dot is going to be blue or red based on its position within that coordinate system. And maybe this represents some real-world notion--maybe it's something like rain that we're trying to predict--but we're doing it more simply with colors right now. So here's my y-axis, here's my x-axis, and effectively, you can think of my neural network conceptually as this: it's some kind of implementation of software in which there are two inputs to the problem--give me an x value, give me a y value--and this neural network will output red or blue as its prediction. Well, how does it know whether to predict red or blue, especially if no human has painstakingly written code to say, when you see a dot here, conclude that it's red; when you see a dot here, conclude that it's blue? How can an AI just learn, dynamically, to solve problems? Well, what might be a reasonable heuristic here? Honestly, this is probably a first approximation that's pretty good: if anything's to the left of that line, let the neural network conclude that it's going to be blue, and if it's to the right of the line, let it conclude that it's going to be red.
Until such time as there's more training data--more real-world data--that gets us to rethink our assumptions. So, for instance, if there's a third dot there, uh-oh: clearly a straight vertical line is not sufficient. So maybe it's more of a diagonal line that splits the blue from the red world here. Meanwhile, here are even more dots, and it's actually getting harder now. Like, this line is still pretty good: most of the blue is up here, most of the red is down here. And this is why, if we fast-forward to today, AI is often very good, but not perfect, at solving problems. But what is it we're looking at here, and what is this neural network really trying to figure out? Well, again, at the risk of taking some of the fun out of red and blue dots, you can think of this neural network as indeed having these neurons, which represent inputs here and outputs here. And what's happening inside of the computer's memory is that it's trying to figure out what the weight of this arrow, or edge, should be, and what the weight of this arrow, or edge, should be. And maybe there's another variable in there, like a plus-or-minus c, that just tweaks the prediction. So x and y are literally going to be numbers in this scenario, and the output of this neural network, ideally, is just true or false: is it red or is it blue? So it's sort of a binary state, as we discuss a lot in CS50. So here, too, to take the fun out of the pretty picture: it's really just like a high school math function. What the neural network in this example is trying to figure out is, for what formula of the form ax + by + c is the result going to be greater than 0? And if it is, let's conclude that the dot is red; if it's not, let's conclude that the dot is going to be blue instead. So really what you're trying to do is figure out, dynamically, what numbers we have to tweak--these parameters inside of the neural network--to just give us the answer we want based on all of this data. More generally, though, this would be really representative of deep learning, where it's not as simple as input, input, output: there are actually a lot of these nodes, these neurons; there are a lot of these edges; and there's a lot of numbers and math going on such that, frankly, even the computer scientists using these neural networks don't necessarily know what they even mean or represent. It just happens to be that when you crunch the numbers with all of these parameters in place, you get the answer that you want, at least most of the time. So that's essentially the intuition behind it. And you can apply it to very real-world, if mundane, applications. Given today's humidity and today's pressure, yes or no: should there be rainfall? And maybe there is some mathematical function via which, based on years of training data, we can infer what that prediction should be. Another one: given this amount of advertising in this month, what should our sales be for that year? Should they be up, or should they be down? Sorry--for that particular month. So real-world problems map readily onto this when you can break them down into inputs and, often, a binary output, or some kind of output that you want the thing to figure out, based on past data, as its prediction.
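For the technically curious, here is a bare-bones sketch in Python of that very idea--a perceptron-style training loop that learns a, b, and c, with training points invented for illustration; real neural networks do the same kind of tweaking, just at vastly larger scale:

# Learn a, b, c such that a*x + b*y + c > 0 means "red," else "blue."
points = [((1.0, 2.0), -1), ((2.0, 0.5), -1),   # blue dots, labeled -1
          ((4.0, 3.0), +1), ((5.0, 1.5), +1)]   # red dots, labeled +1

a = b = c = 0.0
for _ in range(20):                         # a few passes over the data
    for (x, y), label in points:
        prediction = 1 if a * x + b * y + c > 0 else -1
        if prediction != label:             # wrong? nudge the line a bit
            a += label * x
            b += label * y
            c += label

# Predict the color of a brand-new dot at (4.5, 2.0).
print("red" if a * 4.5 + b * 2.0 + c > 0 else "blue")

So that brings us back to generative artificial intelligence, which isn't just about solving problems, but about really generating, literally, images, text, even videos that, again, increasingly resemble what we humans might otherwise output ourselves.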
And within the world of generative artificial intelligence we have, of course, these same images that we saw before, the same text that we saw before, and, more generally, things like ChatGPT, which are really examples of what we now call large language models: these sort of massive neural networks, with so many inputs and so many neurons implemented in software, that essentially represent all of the patterns that the software has discovered by being fed massive amounts of input. Think of it as, like, the entire textual content of the internet, or the entire content of courses like CS50 that may very well be out there on the internet. And even though these AIs, these large language models, haven't been told how to behave, they're really inferring from all of these examples, for better or for worse, how to make predictions. So here, for instance, from 2017, just a few years back, is a seminal paper from Google that introduced what we now know as the transformer architecture. And it introduced this idea of attention values, whereby, given an English sentence, for instance, or really any human sentence, you try to assign numbers--not unlike our past exercises--to each of the words, each of the inputs, that speak to its relationship with the other words. So if there's a high relationship between two words in a sentence, they would have high attention values. And if one is maybe a preposition or an article, like "the" or the like, maybe those attention values are lower. And by encoding the world in that way, we begin to detect patterns that allow us to predict things like words--that is, to generate text. So, for instance, up until a few years ago, completing this sentence was actually pretty hard for a lot of AI: "Massachusetts is a state in the New England region of the Northeastern United States. It borders on the Atlantic Ocean to the east. The state's capital is" dot, dot, dot. Now, you might think that this is relatively straightforward--it's like handing you a softball-type question. But historically, within the world of AI, this word, "state," was so relatively far away from the proper noun that it's actually referring back to that we just didn't have computational models that took in that holistic picture, which, frankly, we humans are much better at. If you had asked this question a little more compactly, a little more immediately, you might have gotten a better response. But this is, I dare say, why chatbots have in the past been so bad in the form of customer service and the like: because they're not really taking all of the context into account that we humans might be inclined to provide. What's going on underneath the hood? Without escalating things too quickly: what an artificial intelligence nowadays, these large language models, might do is break down the user's input--your input into ChatGPT--into the individual words. We might then take into account the order of those words: "Massachusetts" is first; "is" is last. We might further encode each of those words in a standard way. And there are different algorithms for this, but you come up with what are called embeddings. That is to say, you can use one of those APIs I talked about earlier, or even software running on your own computer, to come up with a mathematical representation of the word "Massachusetts." And Rongxin kindly did this for us last night: these are the 1,536 floating-point values that OpenAI uses to represent the word "Massachusetts."
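And if you'd like to see such an embedding yourself, here is roughly how one might fetch one--a sketch assuming OpenAI's Python library and an API key in the environment; the model name here is illustrative, though this particular model does return exactly 1,536 values:

# Fetch an embedding: a mathematical representation of a word or phrase.
from openai import OpenAI

client = OpenAI()                       # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-ada-002",     # illustrative; this model yields 1,536 values
    input="Massachusetts",
)
vector = response.data[0].embedding
print(len(vector))                      # 1536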
And this is to say--you should not understand anything in those values, nor do I--but this is now a mathematical representation of the input that can be compared against the mathematical representations of other inputs, in order to find proximity, semantically: words that somehow have relationships or correlations with each other, which helps the AI ultimately predict what the next word out of its mouth should be, so to speak. So in a case like this, these lines represent all of those attention values: thicker lines mean there's more attention given from one word to another, and thinner lines mean the opposite. And those inputs are ultimately fed into a large neural network, where you have inputs on the left and outputs on the right. And in this particular case, the hope is to get out a single word, which is the capital of Massachusetts, Boston itself, whereby somehow the neural network, and the humans behind it at OpenAI, Microsoft, Google, or elsewhere, have sort of crunched so many numbers, by training these models on so much data, that it has figured out what all of those weights are, what the biases are, so as to influence, mathematically, the output therefrom. So that is all underneath the hood of what students now perceive as this adorable rubber duck. But underneath it all is certainly a lot of domain knowledge. And CS50, by nature of being OpenCourseWare for the past many years, is fortunate to actually be part of the model, as might be any other content that's freely available online. And so that certainly helps benefit the answers when it comes to asking CS50-specific questions. That said, it's not perfect. And you might have heard of what are currently called hallucinations, where ChatGPT and similar tools just make stuff up--and it sounds very confident. And you can sometimes call it out, whereby you can say, no, that's not right. And it will playfully apologize and say, oh, I'm sorry. But it made up some statement because it was, probabilistically, something that could be said, even if it's just not correct. Now, allow me to propose that this kind of problem is going to get less and less frequent, and that as the models evolve and our techniques evolve, this will be less of an issue. But I thought it would be fun to end on a note that a former colleague shared just the other day, which was this old poem by Shel Silverstein, another something from our past childhood perhaps. And this was from 1981, a poem called "Homework Machine," which perhaps foretold where we are now in 2023. "The homework machine, oh, the homework machine, most perfect contraption that's ever been seen. Just put in your homework, then drop in a dime, snap on the switch, and in ten seconds' time, your homework comes out, quick and clean as can be. Here it is, 9 plus 4, and the answer is 3. 3? Oh, me. I guess it's not as perfect as I thought it would be." So, quite foretelling, sure. [APPLAUSE] Quite foretelling, indeed. Though, for all of this and more, the family members in the audience are welcome to take CS50 yourselves online at cs50.edx.org. For all of today and so much more, allow me to thank Brian, Rongxin, Sophie, Andrew, Patrick, Charlie, and CS50's whole team. If you are a family member here headed to lunch with CS50's team, please look for Cameron holding a rubber duck above her head. Thank you so much for joining us today. This was CS50. [APPLAUSE] [MUSIC PLAYING]