[MUSIC PLAYING]

DAVID MALAN: This is CS50, Harvard University's introduction to the intellectual enterprises of computer science. So today we are joined by CS50's own Brian Yu. This is an unusual week, a look at artificial intelligence, or AI. You might recall that some weeks ago, when we first introduced Python, we talked about writing programs that answered you by saying hello if you said hello, or by saying goodbye if you said goodbye. But those programs were all implemented with if conditions and else ifs and else ifs, and so that really wasn't artificial intelligence. If you wanted to have a whole conversation with a computer program like that, that would be a huge number of ifs and else ifs just to anticipate all of the things the human might say. So we can do better, and indeed humans have been doing better in this field of artificial intelligence. And I'm so pleased to say that Brian is here to lead us along that way. Now, CS50's own Brian Yu.

BRIAN YU: Thanks very much. Welcome, everyone, to CS50. And today, as David alluded to, the topic of discussion is artificial intelligence, which is all about taking our computers and trying to make them intelligent somehow, trying to get them to be able to act in a way that's somewhat rational, somewhat human. And that could take a number of different forms.

One example of artificial intelligence might be game play. You might be familiar with the game of tic-tac-toe, where you've got this three-by-three grid and X and O take turns trying to get three in a row. And if, for example, X started the game and played in the middle square of this grid, and then it was O's turn and O played in the top, it turns out that at this point in the game X has a very strategic move. And a human that's very good at the game, or a computer that can maybe try and figure out how to play this game well, might make an intelligent move like playing in the top-right corner, for example. And if X plays in the top-right corner, well, then O is going to need to play in the bottom-left corner in order to block X from getting three in a row. And here, maybe you can see one of a couple of possible good moves that X has now, but X could play a move like moving in the right square over there. And now O is in a tricky position: X has one way it could potentially win the game horizontally, and another way it could potentially win the game vertically. And so O is going to have to block one of those ways, and maybe they go to block horizontally, but then X is going to win no matter what, just by playing in that bottom-right corner. And so a human playing this game could reason through the game, thinking about how the opponent might respond and how X would respond in turn, and a computer might try to do the same thing, for a game as simple as tic-tac-toe or a game even more complex.

But AI goes beyond just game play. You might see examples like handwriting recognition, where nowadays computers are pretty good at taking human handwritten text, which can be messy, which is different from person to person, and somehow figuring out with pretty high accuracy exactly what characters the human was actually writing. AI comes up in areas like spam detection.
Maybe in your email inbox, your spam usually gets sorted into a separate folder: you might get a whole bunch of emails coming into your email inbox, and somehow your computer is able to figure out, with reasonable accuracy, which emails are good and which emails are spam. Now of course, the computer is not perfect at this. There are false positives, where the computer thinks that an email might be spam when it isn't actually. And there are false negatives too, where a spam email might accidentally end up in your inbox because the computer doesn't catch it. But those sorts of false positives and false negatives are the types of things that AI researchers are working on trying to reduce, to make these systems more and more accurate.

You see these kinds of systems show up as well if you've ever been on a video-watching website, like YouTube or Netflix, for example, where you have watched a whole bunch of videos or TV shows or movies, and then software behind Netflix or behind YouTube is able to give you recommendations, suggest other videos that you might be interested in watching based on the things that you've watched already.

And in more recent years, AI has gotten pretty good at doing even more sophisticated things, things like generating data. Take a look at these two images, for example, of people, and see if you notice anything strange. See if either of these people looks strange to you. In fact, can you figure out which of these two images is not a real person? That is to say, a computer-generated person that looks like it's human but is not actually a photo of a real person. So you can look at these two images carefully, maybe look at the eyes and the hair and the mouth and the nose, and see if you can figure out which one it is. It turns out neither of these two images is an image of a real person. They're both computer generated, not photos of real people. A computer has been trained to generate images of people that look like real people, that could fool someone into thinking that it's a real person, but it's all just AI-generated information.

So today we'll take a look at all of those ideas, how it is that artificial intelligence works. And ultimately, one of the big takeaways is that AI is not just one algorithm or one principle; it's really a collection, a category of approaches to problem solving, that can all be used to try and solve some of these problems of trying to make computers intelligent.

So let's begin with one of the first areas where we might want to make our AI work, and that's in the area of decision making. Very frequently we want to train a computer to be able to make a decision well. That decision might be deciding if an email is spam or not spam, or deciding whether or not to recommend a video to you, or it could be deciding what action to take in a game, for example. So let's take a simple game. Maybe you've played a game like this before, where you control this paddle along the bottom and you're trying to bounce this ball in order to hit all of the bricks along the top. Imagine if you were trying to program a computer to be able to play this game strategically, to play this game well, and the computer observed the ball move that way. So the ball is moving that way; what should you program the computer to do? Well, it makes logical sense that if the ball's moving to the left, then you should program the computer to move the paddle to the left as well, to try and catch that ball before it falls through the ground.
And so you could take that kind of logic and encode it into a computer program using what we might call a decision tree. Decision trees are just a way of representing a decision that a computer might make by asking questions, and depending on the answers to those questions, we might ask another question or make some sort of decision. So for this game of the paddle, we could create a decision tree by asking a question like this: is the ball to the left of the paddle? If the answer to that question is yes, well, then the action we should take is to move the paddle to the left; because the ball is moving left, the paddle should move left as well. If the answer to the question is no, well, then we need to maybe ask another question. We might ask a question like, is the ball to the right of the paddle? And if the answer to that question is yes, then we'll go ahead and move the paddle to the right. And if the answer to the question is no, well, that means the ball is not to the left of the paddle and it's not to the right of the paddle, so we may as well just not move the paddle at all in that case.

So this, again, is that decision tree that can allow us to make these choices about what it is that our AI should do in this situation. And we could take that decision tree and turn it into a kind of pseudocode, something like what you might write in C or in Python. I might say: while the game is ongoing, if the ball is to the left of the paddle, go ahead and move the paddle to the left. Else, if the ball is to the right of the paddle, move the paddle to the right. And else, if neither of those is true, then don't move the paddle at all. So one of the advantages of this decision tree is that it translates quite nicely to the conditions that you're familiar with from the world of programming.

So let's give this a try with something a little more complex. Let's take the game of tic-tac-toe, for example. And imagine you were trying to program an AI to strategically play the game of tic-tac-toe. What questions might you ask in your decision tree? What yes-or-no questions might you want to ask to create an AI that can play this game well, that can play this game strategically? And maybe I'll take a volunteer if anyone wants to suggest one possible question I might want to ask of my AI as I'm trying to teach this AI how to play this game. Any thoughts for questions that I might ask in this situation? And let's go to-- let's see, let's try [? Vasily, ?] if you have an idea?

AUDIENCE: Maybe let's ask what is the probability of winning given a certain move?

BRIAN YU: Great. And what would winning mean? How could you detect what it would mean to win? Like, what are you going to look for on the board?

AUDIENCE: Three consecutive, yeah, Xs or circles.

BRIAN YU: Great, you're looking for some opportunity for, like, three consecutive Xs or three consecutive Os. So one of the questions you might ask is checking to see whether or not you could get three in a row, maybe on the next turn. If you already have two in a row and there's an empty space, that could be an opportunity for something that we might want to try. Any other questions we might want to ask as part of our strategic decision making if we're trying to play this game of tic-tac-toe? Any thoughts about other things we could try, other types of things we should be looking for or asking as we're training our artificial intelligence, as we're trying to program it to play this game strategically? Let's go to Santiago, you have an idea?
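In Python, the paddle decision tree described above might look something like this minimal sketch; the helper functions here (game_is_ongoing, ball_x, paddle_x, move_paddle) are hypothetical stand-ins for whatever a real game framework would provide:

```python
# A sketch of the paddle decision tree; game_is_ongoing(), ball_x(),
# paddle_x(), and move_paddle() are hypothetical helpers that a real
# game framework would provide.
while game_is_ongoing():
    if ball_x() < paddle_x():
        move_paddle("left")     # ball is to the left of the paddle
    elif ball_x() > paddle_x():
        move_paddle("right")    # ball is to the right of the paddle
    else:
        pass                    # ball is neither left nor right: don't move
```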
AUDIENCE: Hello, Brian. If there is any possibility to win, the other player?

BRIAN YU: For the other player? Yeah, so you might want to check the other player to see if they have some possibility to win. And if they have some possibility to win, you could try to block them. So those are great questions you could ask. And we could start to formulate a decision tree for tic-tac-toe based on those questions. I might start by asking, can I get three in a row on this turn? And if the answer is yes, well, then my action is pretty straightforward: I should play in the square that will get me to three in a row. If the answer to the question is no, well, then I might ask Santiago's question. I might ask, can my opponent get three in a row on their next turn? And if the answer to that is yes, well, then we had better play in the square to block them from getting three in a row; otherwise, they're going to win the game if we don't block them now. If the answer to this question is no, though, if I can't get three in a row on this turn and my opponent can't get three in a row on their next turn, well, now it's a little bit less clear. You could maybe try and come up with additional questions you might ask, but these two questions are very clear and the rest are maybe a little bit less obvious. But ultimately, we don't want to be doing any of this as we're starting to make our AI more and more intelligent. As David was alluding to earlier, rather than us having to program every condition into the computer, telling it what to do in every situation, we'd like for the computer to be smart enough to just figure it out on its own, to analyze on its own all of the possibilities and figure out what move it should make. And ultimately, we want our computer to make the optimal decision, the best possible decision, when playing this game.

And so to do so, we can introduce the first algorithm in artificial intelligence that we'll look at today, and that algorithm is called minimax. Minimax is a game-playing algorithm that's useful any time you have two competing players that are trying to play some sort of competitive game like tic-tac-toe, but it'll work for other games as well. And ultimately, as we've seen all throughout CS50, we need some way of representing this game inside of the computer. You'll recall that way back from the beginning of CS50, we've been trying to take everything and represent it just with numbers, and we can do the same thing with games. As far as our AI is concerned, there are only three possible outcomes of the game that it's going to care about: either O wins and gets three in a row, or X wins and gets three in a row, or it's possible that neither side wins, that it ends up being a tie. And so what we're going to do is take each of these three possible outcomes and assign a number to each of them. O winning, let's call that negative one. X winning, we'll call that one. And a tie, well, that's somewhere in between the two, so we'll go ahead and call that zero: nobody wins that game. And now each player has a goal, some objective that we can quantify using these numbers. The X player, who we might also call the max player, aims to maximize the score: what's best for X is a score of one, meaning X is going to win. But if X can't win, well, then tying the game, a zero, is better than losing the game, which is negative one. Meanwhile, the O player, who we're going to call the min player, the minimizing player, has the goal of minimizing the score.
Negative one is the best outcome for the O player, but if negative one can't happen, if O can't win, well, then tying the game is still better than X winning, because X winning would mean our minimizing player, player O, is ultimately going to lose the game. And so we can take every board in tic-tac-toe and assign a score to it: it's either one or zero or negative one. So this board, for example: this game is over, X has already won the game, and because X has won the game, we're going to give it a value of one. One corresponds to X winning, negative one to O winning, zero to a tie.

So let's now consider this game board. This game board isn't over yet, but we can still assign a score to it. Let's assume that it's X's turn; what score would we give this board? Well, when we're giving each board a score, we're considering what would happen if both players were playing optimally, in other words, if both players are playing the best possible moves. And in this case, if X is playing the best possible move, well, then X is going to play in this square to get three in a row, and so this board also has a value of one, because if X plays their best move, ultimately X is going to win the game, and that has a value of one.

So let's take a look at another board, maybe one that's a little bit trickier. In this board, let's assume it's O's turn, so O gets to make a move, and then, because there's one empty square, X will get to make a move after that. What value would you give to this board? Maybe we'll ask for a volunteer. Remember, the possibilities are one if X is going to win, negative one if O is going to win, and zero if it's going to be a tie. If both players play their best moves and it's O's turn right now, what value would you give to this board, and why? Maybe let's go to Santiago, go ahead.

AUDIENCE: Yeah, I would say that the value would be zero, because if both players play their best move, no one in the end will win, because O will block X, and then there would be nothing left to do.

BRIAN YU: Yeah. Absolutely correct. It's certainly possible that X could win, because X has two in a row; they could get three in a row. But if both players play their best moves, well, then O is going to block here, and then X is going to play in the upper left, and it's going to be a tie, which means this board is going to have a value of zero.

And the way a computer might figure that out is by reasoning it through exactly the same way we did, by considering both of the possible options. If I want to know the value for this board where it's O's turn, well, then I'm going to consider both of the possible options. O has two choices: O could play in the top left, or O could block X by playing in the bottom there. And the computer would consider both of those possible choices and try and figure out what value each of those boards has. So what happens if O plays in the top left? Well, then X is going to play in the bottom, and X is going to win the game. X winning the game, remember, has a value of one, which means this board also is going to have a value of one. So X wins; that's not great. So let's consider the other possible choice. O could also have blocked X by playing in the bottom here, and in that case, that's going to lead to X playing in the upper left, the only other possible option. That board has a value of zero; it's a tie, since nobody won. Which means this board also has a value of zero. And now the minimizing player is going to look at these two options.
If I play in the top left, that's going to be a value of one; that's not good for me. If I play in the bottom, that's going to be a value of zero, so that's better. And so ultimately we can conclude that this board up top, as Santiago correctly stated, is also going to have a value of zero. If both players play their best moves, it's ultimately going to be a tie. And this is what we may call a game tree, where you're exploring all of the possible branches, all of the ways this game could go. And from here, we're two moves away from the end of the game. But you could back up and consider what it would look like three moves away from the end of the game. And you'd end up with a bigger tree, because you now need to explore even more possibilities if you're trying to figure out what the value of any particular game board is.

And so that's the way the minimax algorithm works: consider all of the possible moves you could make, and then, recursively in a sense, consider, if I make my best move, what's my opponent going to do in response, and then what could I do in response to that? And we could formulate that as a little bit of pseudocode by saying: if the player is X, well, then for all of the possible moves, let's go ahead and calculate a score for that board, the same way we were just doing, by recursively following all the possible moves, and let's choose the move with the highest score. And otherwise, if the player is O, well, then we'll do the same thing: for all the possible moves, we'll calculate a score for that board, but then we'll choose the move with the lowest score. So the X player is picking the move that maximizes the score; the O player is picking the move that minimizes the score. And using this approach, you can create an AI that can play a game like tic-tac-toe perfectly. It'll never lose the game if you've programmed it correctly. And this works not only for tic-tac-toe but for other games as well.

But do you see any problems with this approach, this approach of considering all of my possible moves, then all of my opponent's possible responses to those moves, and then all of my possible responses to those moves, recursively, until we get down to the end of the game? Any possible problems that might arise? Let's go to Sophia, maybe.

AUDIENCE: It can take a really long time to, like, explore all the branches and actually come to a conclusion.

BRIAN YU: Yeah, the amount of time it's going to take to explore all of those possible moves could be quite large, depending on how complex the game is. For a game like tic-tac-toe, maybe think about how many possible games of tic-tac-toe there are. It turns out there are about 255,000 possible games of tic-tac-toe, which seems like a lot, but computers are pretty fast, and computers can make it through all 255,000 of these possible tic-tac-toe games usually in a matter of seconds on most modern computers. But imagine a game like chess, for example, known for being a fairly complex game. How many possible games of chess are there? If there are 255,000 possible games of tic-tac-toe, how many possible games of chess are there? Well, it turns out that if you only look at the first four moves of chess, there are 288 billion possible chess games, which is a lot for any computer to try to deal with, and that's only the first four moves.
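To make the minimax pseudocode just described concrete, here is a minimal sketch in Python. The helpers game_over, utility, moves, and result are hypothetical stand-ins for code that would encode the rules of tic-tac-toe; utility is assumed to return 1 if X has won, negative 1 if O has won, and 0 for a tie.

```python
def minimax_score(board, player):
    # The value of a board, assuming both players play optimally from here.
    # game_over(), utility(), moves(), and result() are hypothetical helpers
    # that encode the rules of the game.
    if game_over(board):
        return utility(board)  # 1 if X won, -1 if O won, 0 for a tie
    other = "O" if player == "X" else "X"
    scores = [minimax_score(result(board, move), other) for move in moves(board)]
    # The max player (X) picks the highest score; the min player (O) the lowest.
    return max(scores) if player == "X" else min(scores)

def best_move(board, player):
    # X chooses the move whose resulting board scores highest; O, lowest.
    other = "O" if player == "X" else "X"
    choose = max if player == "X" else min
    return choose(moves(board),
                  key=lambda move: minimax_score(result(board, move), other))
```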
If you consider the entire game of chess, nobody knows the answer exactly, but people estimate that 10 to the 29,000 is probably a lower bound on the number of possible chess games. The actual number is probably higher than that. That is far too many for any computer to be able to reasonably get through in any amount of time. Computers as they exist now are not going to be able to consider all of the possible moves going all the way to the end of the game in order to figure out what the score for any board is. Which is why a game like chess is much harder for an AI to play than a game like tic-tac-toe. But it's not impossible. And in fact, the best computer chess players are far better than the best human chess players at this point.

So what could we do? What improvement could we make, or what change could we make to the algorithm, so that our AI can still play a game like chess even though there are so many additional possible moves? There are a lot of possible answers, but any ideas? Anyone want to offer a thought or a suggestion as to how we might address this problem? There are so many moves to consider, too many for a computer to ever reasonably try to attempt. Any thoughts on something we could try? Let's go to Kurt. Yeah, you have an idea?

AUDIENCE: Yeah, just if there was some way along the way to maybe assign some point values where you could, below some threshold, maybe you could, like, discard those paths?

BRIAN YU: Yeah, we want some way to be able to more intelligently not explore all of the paths, but somehow discard some of the paths, or not consider some of them, or stop ourselves somewhere, so that rather than consider all of the 10 to the 29,000 possible games of chess, we just consider fewer of them, exploring fewer possible games, exploring fewer possible levels. And so there are many variants on the minimax algorithm, one of the most common being what's called depth-limited minimax, where we consider what's called an evaluation function. Rather than follow the game until the very end, we follow the game some number of moves; in chess, maybe you're going 15, 16 moves, but not all the way to the end of the game. And then you're just asking a question like, all right, at this point in the game, who seems likely to win? Even if we're not going to calculate it all the way, you can maybe make some judgment by counting up how many pieces of what value each side happens to have. And you might come up with other strategies for different games. But this now is where we need to start being even more intelligent, thinking about how we can come up with a good evaluation function that's able to take a complex game and estimate who is potentially likely to win.

So that, then, is minimax, an example of an algorithm that can be used to play games. And I'll pause here for any questions about minimax or game-playing artificial intelligence before we move on to an entirely different type of artificial intelligence. Yeah, Sophia, go ahead.

AUDIENCE: Yeah, I just have a question. Like, this algorithm is kind of following how you would think about the game, but are there any other algorithms that don't necessarily look like step-by-step evaluations?

BRIAN YU: Yeah, there are definitely other algorithms and other approaches. In fact, right now, what we've given is really just an example of an exhaustive search, searching through all of the possible different moves and then making a judgment based on that.
We'll see some examples of more intelligent algorithms, or algorithms that can learn, later today, in fact. And we'll take a look at some of those other possibilities and how we might apply them.

And so this, then, is really an example of what we might call a search algorithm. A search algorithm is just a type of algorithm where we're looking for some solution or some answer. It could be looking for the best move to make in a game, or you might imagine a search problem more abstractly, something like finding your way through a maze. You're trying to get from one point in a maze to another point in a maze, and you have to make decisions about which way to turn and how to go. And the real-world application of this might be something like driving directions. If you go into Google Maps, for example, and you type where you are and where you're trying to get to, Google Maps pretty quickly can give you a fairly optimal route, some path to take, which way to turn and when, that will get you from one point to another. This is an example of a search problem: Google Maps needs to somehow search through all of the possible routes you could take to figure out how to get you to the place that you're going.

And so we could come up with an algorithm for trying to do that. Imagine we have a maze like this. We're trying to get from point A to point B, but of course, we have some walls here; these grayed-out squares are walls that we can't cross through. But we'd still like to try and find some path that will get us from point A to point B. There are a number of algorithms that you might try. And you could think about, if you were solving this maze as a person, how would you do this? How would you decide what decision to make, whether to go left or whether to go right? But as far as the computer is concerned, we need to give it a very precise algorithm.

And one of the simplest algorithms we might come up with is one called depth-first search. The way that depth-first search would navigate its way through a maze is to say: the AI is going to just follow a path, and if ever the AI needs to make a choice, if it reaches a fork in the road, so to speak, where it could go left or it could go right, it's just going to choose one randomly. It doesn't know which way is better, so it's going to pick one path and it's going to try it. And if ever it hits a dead end, if it reaches some wall where it can't make any more progress, then our AI is just going to back up and it's going to try another path instead.

So I can show you what that looks like. We're starting at point A, trying to get to point B. And the way our AI might work is that we might start by just following one square after another. Initially, we don't have much choice in the matter; we haven't reached any branching points or any forks in the road. But at this point right here, now we have a decision: we could go left or we could go right. And as far as depth-first search, otherwise known as DFS, is concerned, DFS doesn't know which way to go. It doesn't know whether to go left, it doesn't know whether to go right, so we pick one randomly. We might choose to go left, for example. So we're going to go left, and we're going to keep following the path until we get to another fork in the road. We could go left or right; DFS doesn't know which to pick, so we're going to choose randomly. Maybe we'll go left, and now we hit a dead end. At that point, our algorithm knows this was not a good path to take.
So it's going to back up to the latest decision point and make a different choice. It's going to back up to right here and say, all right, I tried going left, it didn't work; let's try going right instead. So it's going to go right, it'll hit a dead end, and realize that, OK, this was not a good path to take; it just led to dead ends, it didn't go anywhere. So instead of going left, let's try going right. So we'll try this path; maybe we'll try going up but hit a dead end, so we'll try going to the right. Here again, we hit a decision, and we're going to make some choice. And again, we're just going to repeatedly hit dead ends and try new paths until eventually we make our way to the destination.

And so that is depth-first search. And as long as there are a finite number of squares, this algorithm is eventually going to find a path to the destination, if such a path exists. If there is a way to get from point A to point B, then this algorithm will eventually find it, because it tries something, and if it reaches a dead end, it tries another path, and it keeps doing that until eventually it finds its way to the destination.

But this algorithm isn't great. There are certainly some problems, maybe some room for improvement. So maybe I'll ask all of you: what problems do you see with this approach? Maybe reasons why this algorithm might not be ideal, areas where we might be able to improve upon this algorithm as it stands right now? This, again, is depth-first search. Let's go to Joy.

AUDIENCE: I think it is very time consuming.

BRIAN YU: Yeah, it's very time consuming. And it's time consuming in a number of ways. In one sense, we're exploring a lot of paths that really don't lead anywhere, in the sense that we explored all of this area, but ultimately that didn't help us toward finding the goal. And it's also time consuming in terms of the route that it ultimately finds. If you imagine using Google Maps, for example, to navigate your way from one place to another, it's likely going to be the case that you want to find an efficient route; you want, like, the fastest way to get from where you are to where you want to go. And if Google Maps were to give you a long and winding route with lots of detours that got you to the destination but took much longer than it needed to, that's probably not the best user experience for you. And depth-first search, unfortunately, could run into just that situation.

Imagine a maze like this, where from point A we could go up or we could go to the right. We don't know which to take, so depth-first search will just randomly choose. We might choose to go up, and maybe we'll go right, and we'll find our way to the destination. Sure, depth-first search was able to find us a path from A to B, but remember that often what we want to do is make the optimal decision. This is a correct path for getting from A to B, but it's not the best path. If we're looking for the shortest path from A to B, well, that's going to be this path here that goes from A to B.

So we'd like some way to fix that. We'd like an algorithm that is able to find us the shortest path, not just any path. And so for that, we can use an algorithm called breadth-first search, otherwise known as BFS. And the way this algorithm works is that, instead of picking one path at a fork and following it until we hit a dead end before trying the other, we'll do something different.
What breadth-first search is going to do is, whenever we hit a fork in the road, take both paths: take one step on the left and then one step on the right, and then another step on the left and another step on the right, and effectively just search outward from the starting point. We're going to start from the beginning and look for all of the places we can get to by taking one step away from A, and then everywhere we can get to taking two steps away from A, and then three steps away from A, so on and so forth. So we're always looking at the things that are nearby before we look at the things that are further away. So the way this might work is that from A, we're going to take one step up and then search one square to the right. And then we'll look a second square in this direction, and then a second square along the path to the right, and then a third square and a third square, and we're going to repeat that process, effectively alternating between all of the different options that we have, until eventually we find this optimal path from A to B. And because we're looking for the shorter paths before we look for those longer paths, eventually we're going to find that shortest possible path.

Now, there are still problems with this too. As Joy pointed out, these algorithms have a tendency to explore a lot of paths unnecessarily. Let's go back to that original maze, for example, and consider what breadth-first search would do if I presented it with this maze. We might go until we reach the first decision point right here, where we could go left or right. And remember, depth-first search picked one and just followed it until a dead end. What breadth-first search is going to do is pick both. It's going to go to the left and to the right, and then to the left and to the right, and basically alternate between all of those until we reach another decision point, and then it's going to consider all of those possibilities. We could go left or right here, and it will consider those possibilities too, and it's going to keep exploring. Any time we reach any decision point, it's going to explore this way and that way, this way and that way, over and over again. And ultimately, if we repeat this process and consider what breadth-first search is looking for: sure, it's going to find me the shortest, and in this case the only, possible path to get from A to B, but it took such a long time to do so. It looked at so many different squares; in fact, it looked at all of the squares, in order to do something as simple as find the shortest path to get from one point to another.

And so here we can start to be a little bit more intelligent. What we'd ideally like is that when we reach this first decision point, go left or go right, we make a smarter choice. Most humans, if you gave them this maze and told them to try to solve it, wouldn't just pick randomly at this decision point; you would choose to go to the right, because you know the goal is somewhere to the right, and so it's probably the case that we should follow that direction if we're trying to find our way to the end of the maze. And so these algorithms we've looked at so far, depth-first search and breadth-first search, are examples of what we might call uninformed search. They're not using any specialized knowledge about the problem. They don't really know anything about the maze. They're just basically blindly guessing and hoping that eventually we make our way to the solution.
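As a sketch of how both uninformed algorithms might be coded: the only difference between depth-first and breadth-first search is the order in which states come off the frontier, a stack for DFS versus a queue for BFS. The neighbors function here is a hypothetical helper that yields the squares reachable in one step.

```python
from collections import deque

def uninformed_search(start, goal, neighbors, depth_first=True):
    # neighbors(state) is a hypothetical helper yielding the squares
    # reachable in one step (adjacent squares that aren't walls).
    frontier = deque([(start, [start])])   # each entry: (state, path so far)
    explored = set()
    while frontier:
        # DFS pops the newest state (a stack); BFS pops the oldest (a queue).
        state, path = frontier.pop() if depth_first else frontier.popleft()
        if state == goal:
            return path                    # with BFS, this is a shortest path
        if state in explored:
            continue                       # already expanded via another path
        explored.add(state)
        for next_state in neighbors(state):
            if next_state not in explored:
                frontier.append((next_state, path + [next_state]))
    return None                            # no path from start to goal exists
```

For the maze, a state might be a (row, col) tuple, and neighbors would yield the up, down, left, and right squares that aren't walls.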
But in AI, we can improve upon this by using an informed search. An informed search uses something that we know about the problem to try and improve the way we search, to allow us to search a little bit more effectively. And to do so, we're often going to make use of what's known as a heuristic, some way of estimating how good a particular state is going to be. So in this maze, for example, if I'm trying to get from A to B, a heuristic would allow me to answer a question like: if I see point C over here and point D over there, which one of those points would I rather be at? Which one is better for trying to find our way to the destination? Between C and D, breadth-first search and depth-first search have no knowledge of which one is going to be better; as far as they're concerned, every square is just a square. But if you're looking at this as a maze, most people would intuitively tell you, well, D is better. Even if I don't know exactly how to get to the destination, if I could be at either C or D, I'd probably pick D, because it just looks like it's closer to that destination.

And so the heuristic we could use is one that's often known as the Manhattan distance. The Manhattan distance between any two squares effectively says: ignore the walls, ignore all the boundaries, and just consider how many squares, in this case up and to the right, I would have to go to get from where I am to the destination. And so from C, we would go up this many squares and then all the way to the right, whereas from D, it doesn't matter whether you go up or to the right first, but I would go right four squares and then up two. D, as far as Manhattan distance is concerned, is much closer to the goal than C, and so I would rather be at D than at C. And as soon as we have that notion of, between any two choices, which one would I rather be at, that gives us a way to improve upon the random guessing that the other algorithms do. Remember, when depth-first search reached a fork in the road and could go left or right, it didn't know which to pick, so it just randomly picked one. Now that we have a heuristic like this, we can make an informed choice. We can say, I don't know whether left or right is the correct solution, but right is probably going to be better than left because of this heuristic. And so I can make those sorts of judgments.

And so for that, we'll take a look at another algorithm, known as greedy best-first search. In greedy best-first search, what we're going to do is consider, for each of the squares, what its heuristic value is, according to the Manhattan distance in this case. So this square, for example, is one away from the goal, so we've labeled it with the number one. This square, meanwhile, is two away from our goal, so we're labeling it with the number two; this is three away, this is four away. You'll notice, though, that we're ignoring any of the walls, any of the boundaries, because those are going to be difficult to compute, and we want something efficient. This square here, even though in order to get to the goal we have to follow this winding path around all the boundaries, we're still giving a score of two; right now, it just looks like it's two away. So it's not a perfect heuristic. The heuristic is just an estimate, but we're using it as an approximation that's going to help us as we go about this search process.
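The Manhattan distance heuristic is cheap to compute precisely because it ignores the walls. A small sketch, assuming squares are represented as (row, col) tuples:

```python
def manhattan_distance(square, goal):
    # Ignore walls entirely: just count the rows plus columns separating
    # the two squares, as if we could walk straight there.
    row, col = square
    goal_row, goal_col = goal
    return abs(row - goal_row) + abs(col - goal_col)
```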
And so what we'll do is start the same way, starting from point A, looking at all of the squares we can get to until we reach our first decision point. Here we could choose to go left or we could choose to go right, and in this case, according to the heuristic, going left, this square is 13 away from the goal, and this one over here is 11 away from the goal. So between those two, being 11 away from the goal sounds a whole lot better. So greedy best-first search is going to make the choice to go to the right. So we'll go 11, and we'll keep following until we reach the next decision point. Here, from the eight, we have two choices: up or to the right. As far as the heuristic is concerned, both of these are equivalent: going up, we're seven squares away; going to the right, we're six plus one, a total of seven squares away. So here, even greedy best-first search sometimes needs to pick randomly. If both squares have the same heuristic value, we just have to make a choice. And maybe we make a bad choice and hit a dead end, but then we can just try the other one. So seven, six, we're going to keep following. Here we have another decision point: we can go down or we can go to the right, but the one to the right has a smaller heuristic value, a six instead of an eight. So we're always going to try and pick the one that looks like it's going to be closer. So we'll pick the six, and we'll go to the five. And here we reach one more decision point: we can go up, which has a heuristic of four, or we can go down, which has a heuristic of six. So even here, because the four is the smaller value, we're going to go up, even though you, the human, can see we're ultimately going to run into a dead end. The computer doesn't know that yet. The computer just has to judge, based on the heuristic, what it thinks is the best thing to do, but eventually it's going to run into that wall, realize that it can't make any progress, and then go back down here and follow that path until ultimately we arrive at the solution.

And so here we arrived at the same solution that breadth-first search did, but we didn't need to consider all of the squares. We could ignore all of these off to the left, we could ignore all of these down below, just by being a little bit smarter about what choice we make. Instead of just choosing randomly, we make an informed choice based on that heuristic. I'll pause here for any questions, then, about the algorithms we've looked at so far: depth-first search, breadth-first search, and now an informed algorithm, greedy best-first search. Any questions about these algorithms? Yeah, let's go to, let's see, Sofia?

AUDIENCE: In, like, the decision to go randomly, is it possible to, like, go further and see if there's, like, lesser values, like go one more step?

BRIAN YU: You certainly could maybe try and look ahead a couple of steps and look at what might be following, and that might offer numbers that would improve upon the situation. But even this algorithm is not perfect. In fact, when you're just looking at these heuristic values and following the heuristic values, sometimes the heuristic values will lead you in a good direction, like in this case, where we made a couple of wrong turns but ultimately found our way. But that's not always going to be the case. There are some cases where, because the heuristic is really just an estimate, the heuristic can sometimes be wrong.
Take a maze like this, for example. If you were to follow greedy best-first search, you would go 16, 15, 14, 13, 12, and at this point, you could decide between the 11 and the 13. And greedy best-first search would look ahead and say, this path looks pretty good, and none of these values are any bigger than this 12, and so it would find this path that takes you from A to B. But even that path isn't actually the optimal path to take. The optimal path, the shortest path between A and B, is actually this one here, which looks a little counterintuitive, and the reason we didn't catch it is because it involves going to the left first and then winding around. And normally, according to this heuristic, we usually don't want to be going to the left, because we know that our goal, point B, where we're trying to get to, is off to the right. So even if you're looking at these heuristic values and looking ahead, this algorithm might not be perfect. And so we might try to improve upon this algorithm even more.

And so the one final search algorithm that I'll show you, the most complex one that we've seen yet, is an algorithm known as A* search. A* search tries to build upon the ideas of greedy best-first search, which is to say we're going to use a heuristic to try and intelligently make choices, but we're going to try and solve the problem we just saw, which is that greedy best-first search isn't always optimal. When it takes into account only the heuristic, it's not always going to make the best choice, because we might end up choosing a path that's ultimately longer. What A* search is going to try to realize is that if we've made it all the way down here, and now we're at a spot where we're 11 squares away from the goal, that's OK, but honestly, being 13 squares away from the goal much sooner is probably better. So we can't just take into account the heuristic value; we should also take into account how far we have gone already. If I've already traveled many squares and I find myself pretty close to the goal, I would rather have traveled fewer squares and find myself close to the goal. And so A* search is going to combine these two pieces of information: the heuristic information that we have seen here, and also how many steps I have taken so far, because that factors into the ultimate optimal solution too.

And so here's how A* search is going to work. It's going to be the exact same idea as before, but instead of only considering the heuristic value, we're going to take the heuristic value and add to it how many steps we've already taken. So we take our first step, and now we've taken one step and we're 16 squares away from the goal according to the heuristic, for a total of 17. And then we take our second step and we're 15 squares away from the goal, and we take our third step and we're 14 squares away from the goal, and we keep going until we reach a decision point: after five squares, we're now 12 away from the goal, and now we have a choice. We could go six squares and be 11 away from the goal, or we could go six squares and be 13 away from the goal, which is worse. So we're going to make the decision to go up. So for now, it doesn't seem like we've done any better; we haven't found the optimal solution just yet.
But notice what will happen. Eventually, if we follow this path for long enough, we'll end up at a point where we've made 14 steps and we're estimated to be five away from the goal: one, two, three, four, five, ignoring the walls. And now we have a choice. We could be six squares away from the goal after having walked 15 steps, so 15 plus six, a total of 21. Or this option is still available to us: we could be six squares away from the start but 13 away from the goal, six plus 13, a total of 19. And that 19 is a smaller number than this 21, so this 19 is ultimately a better choice as far as A* is concerned. So by combining information about how far we've traveled and how far is left to go, we can make a more intelligent choice. We can say, you know what, let's go ahead and try this path, and A*, then, will ultimately be able to find its way from the starting point to the goal.

This algorithm relies upon having a good heuristic function, and it just so happens that you can prove that if you pick a good heuristic function, one that never overestimates the true distance remaining, this algorithm will be optimal. It will always find the shortest path from point A to point B, and it's not going to get stuck, like greedy best-first search might, on a path that isn't actually optimal. And so there are many of these sorts of search algorithms that are designed to find intelligent ways to find a solution to some problem, to navigate through some landscape.

And so I'll pause here for any questions, then, about search algorithms in general. We've taken a look at what we call classical search, navigating our way through a maze, for example, or finding driving directions, as well as what we might call adversarial search, where there's an opponent and you're trying to search for the best move in a game of tic-tac-toe or a game of chess.

DAVID MALAN: Brian, one question from the chat. Does the assignment of these heuristic values also take a lot of time for the computer, or is that automatic?

BRIAN YU: Ideally, you want to find a heuristic that's going to be efficient to calculate. If the time it takes to calculate the heuristic is long, then yeah, it could be the case that this would be even worse. But ideally, you want to look for a heuristic that's going to be very quick to calculate, and this is why, sometimes when we're looking at heuristics, we're going to ignore some of the details that make this more complex. When we're calculating these heuristics, we're ignoring the walls, because having to deal with the walls is going to make it take longer and longer amounts of time to figure out these values. And so by ignoring the walls, we can pretty quickly calculate, just coordinate-wise, how many squares we are away from that destination.

Also, I've been labeling all of the squares in this grid with their heuristic values just so you can see what those heuristic values are. In practice, if our search algorithm never touches a square, like it never touches any of these squares, it's not going to bother computing heuristic values for them, because they're not relevant to the search process. So it'll never actually calculate the heuristic values for all of these squares, which will save time too. I've just shown them to you visually so that you can see all of the squares and what numbers would be assigned to them. But yeah, really great question. All right, and, Kurt, was there another question?
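Both greedy best-first search and A* can be written as a small variation on the breadth-first sketch from earlier, swapping the plain queue for a priority queue. Greedy best-first orders the frontier by the heuristic alone; A* orders it by steps taken plus the heuristic. As before, neighbors and heuristic are hypothetical helpers (the heuristic could be the Manhattan distance sketched above):

```python
import heapq

def astar(start, goal, neighbors, heuristic, greedy=False):
    # Priority-queue entries: (priority, steps taken, state, path so far).
    # Greedy best-first uses heuristic(state) alone as the priority;
    # A* uses steps + heuristic(state), which, with a heuristic that
    # never overestimates, always finds a shortest path.
    frontier = [(heuristic(start, goal), 0, start, [start])]
    explored = set()
    while frontier:
        _, steps, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in explored:
            continue                      # already expanded via a better path
        explored.add(state)
        for next_state in neighbors(state):
            if next_state not in explored:
                h = heuristic(next_state, goal)
                priority = h if greedy else steps + 1 + h
                heapq.heappush(frontier, (priority, steps + 1, next_state,
                                          path + [next_state]))
    return None
```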
AUDIENCE: Yeah, I was just wondering, I mean, the example that you showed is using, like, a map to actually search through a map, so it kind of maps directly onto the concept. But could this also be used for, like, textual search or other kinds of searches through different kinds of problem spaces, or no?

BRIAN YU: Yeah, this can be used for a lot of different problem spaces. For text, there are usually different approaches in the world of natural language processing, but you can use these kinds of search algorithms any time you have some computer, which we'll usually call an agent, something that's making decisions, that has to make certain decisions at certain points. Even if those decisions aren't geographic decisions about which way to walk, for example, as long as it's making some decision that moves it into a different position, you can often apply these types of algorithms in order to solve problems. And so these algorithms are not just good for maze solving; they can be used in other domains too.

So ultimately, what I want to move on to now is less about coming up with these algorithms that let the AI figure out exactly what to do right away, and more about looking at a type of artificial intelligence that you've probably heard of, called machine learning. And machine learning is all about trying to take our AI and get it to learn, learn somehow from data or learn from experience, much the same way that humans might learn: we learn from looking at our environment, we learn from our surroundings, we learn from our experiences, and we get something out of those experiences. And so one type of machine learning is known as reinforcement learning, where you learn from positive or negative rewards. The computer does something well and it gets a reward; the computer does something poorly and it gets some sort of punishment, and the computer tries to learn from that.

Let's imagine, for example, a very simple game that we might want our computer to play. The computer is going to try to navigate its way through these tiles and get to the goal, similar to what we've seen before, but this time the computer doesn't know where the obstacles are. Let's imagine that there are some obstacles in its way, highlighted in red here, and if our computer agent, this yellow dot here, ever hits one of those obstacles, the computer loses this game. So if it hits an obstacle, the computer loses, but the computer doesn't know where the obstacles are. I'm showing them to you visually, but the computer has no idea. How would the computer try to solve this problem? How would you try to solve the problem? Well, all it can do at first is randomly make a choice, randomly guess. Like, let's choose to go to the right. OK, we hit an obstacle; that was no good. But the computer now can learn from experience. As long as it knows, once it hits the obstacle, that that was a bad outcome, well, now in the future the computer can learn: I had better not do that anymore; rather than go to the right, let me try something else. I'll try moving up. All right, that seemed to have worked better, so let me try making another action. Let's try another one; maybe I'll go down this time. OK, that led to an obstacle, so the computer learns from that too. It learns that was a bad thing to try, so let's try something else. Maybe we try going up this time; that too leads to an obstacle, that was no good, so maybe we'll try going to the right. That seemed OK, so maybe we'll go to the right again.
All right, that led to another obstacle. And so over and over, it's just making mistakes, and we let the computer make those mistakes, but every time it makes a mistake, the computer is learning something from it. It's learning what to do, or what not to do, the next time it goes through the same process. And so in the future, it can navigate its way around, and eventually, with enough trial and error, it can find its way to the goal. And once it's found its way to the goal, it'll remember that too. It'll know, I now know exactly what sequence of actions will take me from the starting point to the goal, and so in the future, I can just keep doing that again and again and again. So that, then, is an example of reinforcement learning. But even with this example here, do you see any potential problems with this approach? Limitations or downsides to what we've just done here, reasons why this AI might not be perfect? Any thoughts? Let's go to [? Lexleen. ?]

AUDIENCE: It might not be taking the most efficient path.

BRIAN YU: Yeah, absolutely. Because it's randomly trying, eventually it will find its way to the goal, but it might not necessarily find the best path, because here there was a faster way to get to the goal, and that was, once you get here, to go up instead, which would lead to a more efficient path. But because of the way our AI is learning, whenever it reaches the goal, it learns to do that, and when it doesn't reach the goal, it learns not to do that. Our AI doesn't have the ability to find that better path in the future.

And so we could improve upon this. And this is what we call a trade-off between exploring and exploiting in artificial intelligence. We want our AI to do both of these things. We want the AI to exploit the knowledge it already has, it knows what to do, it knows what not to do, and we want it to use that information. But we also want it to explore a little bit. We want it to sometimes try some new action that it hasn't tried before, because maybe that'll be even better than the things it's already tried in the past. So far, our AI in the example just now was only ever exploiting the knowledge it already had, but it was never exploring something new. And so often, when we're training an AI, we want it to find some sort of nice balance between the two. We want it to make good choices, but occasionally take a risk, try something else, see if maybe we can find a better solution.

And so one strategy for doing this is known as the epsilon-greedy approach, where we assign a value, epsilon, which is equal to some proportion, some probability, that our computer is just going to make a random choice. And so we might say, if we generate a random number and it's less than epsilon, which in this case happens 10% of the time, then let's go ahead and make a random move; rather than make an intelligent, smart move, just pick a move randomly. And the rest of the time, 90% of the time, make the move that we know to be the best, the one with the highest value we know of so far. And this will often result in a nice balance: 90% of the time, our AI will make choices it knows to be good, but 10% of the time, it'll make a new choice, something random. Maybe it'll stumble across something bad and know to avoid it in the future, but maybe it'll come across something better and know to do that in the future as well. And so I'll show you a real live demo of this, actually.
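In code, the epsilon-greedy choice just described might look like this minimal sketch, where values is assumed to be a dictionary mapping each available action to the reward the agent currently estimates for it:

```python
import random

def epsilon_greedy(values, epsilon=0.1):
    # With probability epsilon, explore: pick a random action.
    # Otherwise, exploit: pick the action with the highest estimated value.
    if random.random() < epsilon:
        return random.choice(list(values))
    return max(values, key=values.get)

# For example: an agent that currently estimates moving up is most
# promising will usually move up, but occasionally tries something else.
action = epsilon_greedy({"up": 0.8, "down": -1.0, "left": 0.1, "right": -1.0})
```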
Years back, the Italian Institute of Technology was working through an example of trying to do something just like this. But before we move on, I think there's a question from the chat?

DAVID MALAN: Brian, someone asks, is this related to genetic algorithms, this approach?

BRIAN YU: Genetic algorithms are another example of this type of learning. We're actually going to talk about genetic algorithms in just a moment, so we'll get there very shortly. But yes, definitely something related.

So the Italian Institute of Technology, a couple of years back, tried to teach a robot how to flip pancakes, something that you might have done before or seen someone else do before, but that in practice is difficult to encode in a robot: exactly what moves to make, exactly how to move its robotic muscles to be able to flip a pancake effectively. So rather than try to program every last movement into the computer, they just let the robot learn via reinforcement learning. Every time it flipped a pancake incorrectly, it learned what not to do in the future, and every time it flipped a pancake correctly, it learned what to do in the future.

And so I'll go ahead and pull up an example of this now. What we're going to take a look at here is an artificial pancake. And initially, the human researcher shows the robot what success looks like. The robot needs to know what is success and what is failure, so the human demonstrates what it looks like to actually do the pancake flipping, demonstrating exactly what motion to make and what it feels like to successfully flip a pancake. And then we let the robot try it. All right, that was its first try, not too successful. We'll try it again. Here's after three tries. OK, not great, but it's learning. Every time it does something wrong, it learns what not to do in the future. Here's after five tries, here's after 10 tries, not great. And now after 15 tries or so, or 11 tries, I guess, we'll see what happens. OK, so not great, right? It takes a while to learn these kinds of techniques. Here's 15 tries, all right. But finally, after enough tries, once you do enough practice with this, here's after 50 tries, and we now actually have a robot that has learned to successfully perform this task, not because human programmers told it exactly how to do so, but because it learned from that failure and learned from experience. And once it knows how, it knows exactly what to do in the future. It can continually replicate that behavior over and over again once it's trained to perform that task.

So that, then, is how you might take advantage of this idea of reinforcement learning. But someone in the chat brought up another approach to this type of thing, known as genetic algorithms, or genetic learning. And this is where a lot of machine learning has taken inspiration from the human brain, from how humans learn, and from how nature works. Because nature has managed to evolve intelligent beings, why couldn't we try to do the same thing in a computer as well? In nature, you have generations of populations that evolve over time, where the most fit organisms survive and are able to evolve and mutate and change over time to become better and better at adapting to their environment. And so one strategy, one approach to trying to build intelligent machines, is, rather than program one algorithm that is really good at performing a task, let's just create a lot of algorithms that are really bad at performing the task, because that's much easier to do.
But we'll let them evolve. We'll let them try some task, and after they do it, they won't do very well, but we'll take the ones that did the best, replicate them and mutate them, and let them try again. And then we repeat this generation after generation, eliminating all of the computer programs that don't perform well but duplicating and mutating the ones that do. The pseudocode for this might look a little something like the following, and we'll see a rough sketch of it in actual code in a moment. We're going to start by making an initial generation of candidates randomly, where each candidate is some program designed to try and solve some task, but rather than program it intelligently the way you've been doing all semester, we're just going to program them to make random choices, and we're going to repeat this process until eventually they're successful. For each one of these candidates, we calculate its fitness: how good it is at performing the task it's designed to do. Then we remove the least fit candidates-- maybe take the half of them that didn't do very well and eliminate those-- and make a new generation from the remaining candidates: duplicate them and make a few random mutations, just to change things up, in order to try to create a better generation. And over time, as we repeat this process, in much the same way that evolution is able to continually produce organisms that are more and more fit, we can do the same thing with our algorithms too, producing better and better algorithms over time. I have another demo to show you of this. Here is a simulation of some self-driving cars that have not been programmed how to drive but are starting out driving entirely randomly. Each of these rectangles you see is one example of a virtual self-driving car. Each of the little crosshairs you see, the little Xs, is a sensor that the car has access to, so the car has access to data about how far away particular obstacles are in any given direction, and what these cars are trying to learn is how to drive through some sort of environment without crashing into anything. And initially they didn't do very well-- you notice they were crashing almost immediately-- but now we're about six generations in, seven generations in, and they're starting to do a little bit better. They're starting to get a sense for turning when you run into a wall. Even when they get to new environments they haven't necessarily seen before, they're starting to do a little bit better. But of course, they're still not great; they're still crashing quite frequently. Sometimes a generation does even worse than the generation before it, because when you make random mutations, those mutations are not necessarily going to be better. Sometimes the mutations actually end up being a little bit worse, and so they might move backwards a little bit. But over time, generation after generation, you're hopefully seeing that now, 10 generations in, these cars in general are doing a whole lot better than they were before, and every generation we're taking whichever cars made it the furthest, duplicating those, eliminating the rest, and moving forward.
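As a rough sketch only-- not the code behind this particular demo-- that generate, measure, cull, and mutate loop might look something like this in Python, where make_random_candidate, fitness, mutate, and successful are hypothetical stand-ins a programmer would supply for a particular task:

    import random

    POPULATION_SIZE = 10

    def evolve(make_random_candidate, fitness, mutate, successful):
        # Make an initial generation of candidates randomly.
        generation = [make_random_candidate() for _ in range(POPULATION_SIZE)]
        # Repeat until some candidate performs the task successfully.
        while not any(successful(candidate) for candidate in generation):
            # Calculate each candidate's fitness and keep the fittest half.
            generation.sort(key=fitness, reverse=True)
            survivors = generation[:POPULATION_SIZE // 2]
            # Make a new generation: duplicate survivors, with random mutations.
            children = [mutate(random.choice(survivors))
                        for _ in range(POPULATION_SIZE - len(survivors))]
            generation = survivors + children
        return max(generation, key=fitness)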
Is there a question in the chat? DAVID MALAN: Yeah, with regard to the pancakes, how did the robot know what was wrong with the previous pancake flips? In the case of the cars, how does the car know that it erred? BRIAN YU: Yeah, so whenever you're doing anything with reinforcement learning, whether it's the pancakes or these cars here, the programmer still needs to tell the AI what success looks like and what failure looks like. So in the pancake example, the robot was first shown what a successful flip looks like, so it gets a feel for that, and it was probably also told that if the pancake falls, that's not a good thing, for example, and not something the AI should try to do. And in this case, I assume these cars have been programmed to be told that crashing into something is not good-- the car presumably can detect after it's crashed into something-- and it probably also has some sense of how far it was able to drive, such that we could duplicate the ones that did drive the furthest and not the ones that didn't. And let's see how this car does-- it was so close to the end, didn't quite make it. Maybe give it one more generation, and we'll see how this generation does. All right, so it's navigating these turns. Looks like a whole bunch didn't make it, but as long as we get one successful one, we can learn from that successful one in the future. Here's the ending-- looks like most of them crashed, but-- all right, it looks like one car was finally able to make it to the end and was able to successfully learn that task of how to navigate through this environment of mazes. So I'll pause here for questions then about reinforcement learning and how these ideas might have worked. And, Samuel, did you have a question? AUDIENCE: Yes. So with the genetic algorithm, the car one specifically, all the cars are learning from each other as each one crashes? BRIAN YU: It's not so much that the cars are learning from each other as that we're generating new cars based on the cars that were previously successful. The idea being that if we run 10 cars and see how far they go, we take the five that went the furthest, we eliminate the other five that didn't make it very far, but then we duplicate and mutate the ones that did do well, such that in the future, hopefully, this new generation of cars will make it a little bit further. And generation after generation, the hope is that we're able to make it a little further along the road, until eventually, as you saw like 15 generations in, we were able to get some cars that could perform the task successfully. Let's go now to Josiah. AUDIENCE: Is the car specifically learning just from the track? I mean, when you change the track, do we need to start from zero again, or? BRIAN YU: Yeah, so if the track were different, would it have to learn again? Hopefully not. Hopefully what the cars were learning in this case was based on sensor data, like how far away a wall is in any direction and which way to turn, and the goal is for this type of approach to generalize. And I mean, hopefully actual self-driving cars in the real world are not trained this way, but you would hope that as they're trained on sample environments, when you put them in a different environment, they would be able to generalize their knowledge and apply similar techniques. That if, in a real-world setting, you take a car and put it on a road it's never seen before, hopefully it's seen enough examples of things that are like it, sensor data that it recognizes, that it's able to make an intelligent decision based on that.
So yeah, the hope, and the goal in many cases with AI, is to be able to generalize this knowledge beyond just a specific example. Any other questions? Yeah, let's go to [? Yagonch. ?] AUDIENCE: Yeah, so can these algorithms ever reach like a bottleneck? Just like in real-life evolution, there are branches and some branches just reach a bottleneck. So is that possible here [INAUDIBLE]? BRIAN YU: Yes, it's definitely possible that the algorithms might end up converging to something that seems pretty good, where it doesn't seem like any mutations are doing any better, even though it turns out a totally different algorithm might actually do better than that. That's what we would often call a local maximum in the context of artificial intelligence, where there is some better approach, some better algorithm, but we can't necessarily find it. There are other strategies for trying to deal with that problem, but it is definitely a challenge we're thinking about. And one more question. Let's go to Sophia. AUDIENCE: I have a question about how the fitness is calculated. Like in both of these cases, is it like [INAUDIBLE] motors that are running, or here like the distance? BRIAN YU: Yeah, in the case of the pancake, it was probably just binary-- did the pancake end up landing in the pan or not?-- as the way of calculating the fitness of that particular example. In the case of the cars, we would probably calculate fitness based on distance traveled: whichever cars ended up traveling the furthest, that would be our definition of fitness. And that's really the part the programmer needs to be involved in. The programmer needs to decide what it means to be most fit and what a bad example looks like, and once the computer knows what success looks like and what failure looks like, then it's able to learn from that to be able to do better in the future.
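To make that concrete, the fitness functions for these two demos might be as simple as something like this-- a guess at what's being measured, not the researchers' actual code, and landed_in_pan and distance_traveled are hypothetical names:

    def pancake_fitness(flip):
        # Binary fitness: did the pancake end up back in the pan or not?
        return 1 if flip.landed_in_pan else 0

    def car_fitness(car):
        # The farther a car drove before crashing, the more fit it is.
        return car.distance_traveled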
All right, so that was genetic algorithms, which is one example of computers learning the way that nature works, for example, but computer scientists didn't really stop there. There are other examples we can add to this as well. One other example of using this idea of reinforcement learning might be in video recommendations, where you have some watch history and the way that an algorithm at YouTube or Netflix might suggest videos for you to watch is via this reinforcement process: it will just try suggesting videos to you and learn from experience. If it suggests a video that you like, that you click on and watch all the way through, well, then the algorithm learns to recommend more of those sorts of things in the future. That's like the car traveling very far, and so we learn to do more of that in the future. If it recommends a video to you and you don't click on it, you don't ever watch it, well, then it's probably not going to recommend that to you in the future, and it's learning from that experience as well. So this is one example of computer science learning from nature, learning from the way that humans are. Another place that's happened is by looking at the human brain. The human brain consists of neurons, and those neurons are connected to one another and pass information from one to another; electrical signals flow from one neuron to another, and that's how brains are able to make these very sophisticated kinds of computations. A collection of these neurons is what we might call a neural network, and one area of AI that has especially been gaining in popularity recently has been trying to develop artificial neural networks. Instead of using a biological neuron, we just use what we might call an artificial neuron, or a unit. And you can think of this unit as just storing some value, like some electrical signal, for example, like you might find in the human brain, now just in a digital format. These artificial neurons, these units, can be connected to each other in an input-translates-into-output sort of format, where this arrow represents some sort of calculation, some calculation that is taking this value on the left and transforming it into that value on the right. And neural networks are usually not just one input unit and one output unit; they can be more complex. You might have a neural network with two different inputs, each of which connects to an output, or even more sophisticated neural networks like this one, where you have multiple layers of these units that are all connected to each other, each of these arrows performing some sort of calculation. And if you've ever heard the term deep learning, this is often what it's referring to: this idea of deep neural networks with many layers, each of which is performing calculations. Ultimately, using some linear algebra, these neural networks are able to figure out exactly how to tune these calculations to translate input into some output. If you give a neural network enough data, it can learn from that data. It can learn exactly how to set these various different arrows to be able to perform some task, to be able to translate some input into some output. So for example, that might take the form of handwriting recognition. How exactly do we train computers to recognize handwriting, given all the different types of handwriting? Well, one way is by using a neural network: setting up a network of all of these neurons and all of these connections, and then giving that neural network a whole bunch of data. I give the neural network a whole bunch of already existing handwritten digits, each of which is labeled, so the computer knows which image corresponds to which digit. The data set you're looking at here is a very famous data set of handwritten digits called the MNIST data set, and given all of this data, the computer can start to train the neural network, start to figure out exactly what calculations to run at each layer to translate the input into the output-- to take what looks like a handwritten number eight, which we all could tell is the number eight even though it might be difficult for us to describe exactly how a computer could figure that out, and, by training the neural network on all of the sample data, learn some sort of function that can translate this input, this handwritten digit, into the output, what the actual digit happens to be. And these neural networks have proved to be incredibly versatile, and we won't get into all the math here because it does get a little bit more complex, but they're used all over the place.
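As just a minimal sketch of the arithmetic inside one of those units-- not how real libraries implement it, and with made-up numbers-- a single artificial neuron might look a little something like this:

    from math import exp

    def unit(inputs, weights, bias):
        # Multiply each input by a weight, add a bias, and squash the
        # result between 0 and 1 with a sigmoid activation. Training a
        # network means tuning these weights and biases using lots of
        # labeled data.
        total = bias + sum(x * w for x, w in zip(inputs, weights))
        return 1 / (1 + exp(-total))

    # Two inputs flowing into one output unit, with made-up parameters:
    output = unit([0.5, 0.8], weights=[1.2, -0.4], bias=0.1)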
They can be used for email spam detection, for example, where if you give the computer a whole bunch of data, a whole bunch of emails, some of which are labeled spam and some of which are not, it can learn a function. It can learn exactly how to tune that neural network to be able to predict, for any given email, whether it's spam or not. And really the key factor in making these networks work is having access to large amounts of data. This is why a lot of companies now are doing a lot of work to get a lot of data and use that data in training their machine learning models: the more data you have, the better you can make these algorithms, because the better you can tune them to all the different types of inputs there might be, to make them even more accurate in the future, and the better you can test these networks to see how well they work. And every time you're interacting with websites online, you're often helping to provide some of that data. Every time you go to your email app, for example, and you mark an email as spam, that email app might be learning from that-- learning, OK, this type of email, that's an example of a spam email-- and so it learns to do a better job of that classification in the future. And likewise, every time an email is wrongly marked as spam and you have to tell the computer, you know what, that wasn't spam, the computer is learning from that too. It's more data the computer can use to inform its algorithm and the way that it's working. And so for one final example, we can take a look at how images like this were actually generated. How could a computer possibly generate an image like this, make something that looks very much like a real person, even though it's not actually a real person? Well, it's done using the exact same technique, using a neural network that learns how to translate inputs into outputs by having access to a large amount of data-- in this case, access to many, many photos of real people, so the computer can start to get a sense for what a real person looks like and train the network that way. And then, rather than build the entire person all at once, it builds it up step by step. For a computer, generating a photo like this is a pretty complicated task; it's pretty difficult to do. But you know what's easier? Generating that. That is 16 pixels. It doesn't look like a person at all, but this a computer could do pretty readily, just generating what seems to be a whole bunch of random-looking pixels. But then you train the computer to add a little bit of detail to this image, to learn, if this were a real image, how would I add a little more detail to make it look a little bit more like what a person would look like? And it does so, again, by having access to large amounts of data, many, many photos of people, so it can begin to learn from that experience. So the computer learns how to take this image and turn it into this, for example. It still doesn't really look like a person, but it looks a little more like a person. It's got a higher resolution, and you can maybe make out that there is some hair here and a face here. And then the algorithm learns how to take an image like this, an 8 by 8 grid, and turn it into a 16 by 16 grid. Now this, I mean, it still doesn't look like a person, but it looks a little more accurate. And over time we can follow these steps one after another, adding more and more detail, until eventually the computer is able to generate an entire image that really looks like a person, one that's very hard to distinguish from a real photo, just by this input-to-output process, learning some mapping from inputs to outputs by having access to a whole lot of data.
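At a very high level-- and this is only a hedged sketch of the idea, with hypothetical names, not the actual system-- that step-by-step pipeline might be structured like this, where each model in the list is a trained neural network that doubles the resolution while adding plausible detail:

    import random

    def random_noise(size):
        # Start from a tiny grid of random pixel values, e.g. 4x4 = 16 pixels.
        return [[random.random() for _ in range(size)] for _ in range(size)]

    def generate_face(upscale_models):
        # 4x4 -> 8x8 -> 16x16 -> ...; each stage adds a little more detail.
        image = random_noise(4)
        for model in upscale_models:
            image = model(image)
        return image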
So before we wrap up here, I'll pause for any final questions about artificial intelligence-- any of the algorithms we looked at for searching, for learning, or anything like that? Any final questions? Yeah, Sophia, go ahead. AUDIENCE: I had a question about the fitness. You mentioned the successful ones get regenerated, but fitness is just a score, so it's hard to know exactly what decisions the model made that made it successful. BRIAN YU: Yeah, and that's actually one of the trickier things about machine learning: we can train these machine learning models to be very, very good, but it's not always immediately apparent to us what the model was doing in order to be successful. And this is an active area of research within machine learning, including for some faculty at Harvard who work on this too, which is the interpretability of machine learning results. The algorithm becomes very, very good at recommending a video to you or generating an image of a person, but it's hard for a person to look at that machine learning model and understand how it arrived at that conclusion. Ultimately, in many cases, people just throw their hands up and say, we don't really care how the algorithm is doing it; as long as the algorithm is doing it successfully and produces good results, we'll just take the ones that are doing the best and use those, even if we don't necessarily understand exactly how those algorithms are working. And that's definitely an area of concern for some people and an area of research for others who are looking into these types of tools and technologies. Is there a question in the chat, maybe? DAVID MALAN: A final question from the chat, Brian. Every time I choose the parts of the picture that contain traffic lights or crosswalks to prove I am not a robot, am I training Google's driverless car? BRIAN YU: Maybe not Google's necessarily, but certainly, a lot of the time when you're doing that type of thing-- I mean, in one part it's to verify that you are in fact human. That is one of the purposes of those things, to make sure that robots aren't signing up for websites, for example. But certainly it could also be giving more examples of labeled data. Oftentimes what machine learning models rely on is labeled data, where you have, like, a handwritten digit, and you have a label of "this is the number 2" or "this is the number 8." And so the computer can then use those digits along with those labels to figure out how to take handwritten digits and convert them to the numbers they represent. And so having access to that kind of labeled data ultimately ends up being really, really valuable. All right, so that's it for artificial intelligence for today. Thank you all so much and we'll see you next time. [MUSIC PLAYING]