[MUSIC PLAYING] SPEAKER: Welcome back, everyone. This is CS50. And today, we have a lot of interesting things to talk about. First, though, I have to remind you of a few administrative things. This week is quiz one: on Wednesday, or, for the Yale section that meets on Tuesdays and Thursdays, on Thursday. There are quiz reviews tonight at Yale, 5:30 to 7:00. At Harvard, they recorded one yesterday. And everyone can watch that online. Also, this week or early next week, we have our last CS50 lecture. [GROANS] I know. It came so soon. Yale students will have a live lecture here in the law school auditorium on Friday. There will be cake. Harvard students will have the last lecture in Sanders on Monday. There will also be cake. Also, this week on Friday, for those of you who are coming to New Haven, we have the CS50 Expo. We have more than 30 different groups registered to show you everything from autonomous sailboats, to systems that recognize digital portraits, to computer music and computer-produced music. So please join us. I think it's going to be a great time.

Today, though, we get to continue talking about AI, about artificial intelligence. And one of the things that we're going to get to today is the idea of how to use AI to solve problems. Now, as always, let's start with something simple. And we're going to start with a simple idea. And that's using search. So imagine for a minute that I have a task that I need to perform. And I'd like to have that task automated by some software agent. Imagine that I'm trying to book a set of flights from, let's say, Boston to San Francisco. I could go through and I could use one of the wonderful online search tools, which is going to do basically the same process that we're going to walk through today. But if you didn't have that tool, what would you do? Well, you could look and see and say, I'm in Boston. What flights are available to me? Now, maybe I have three possible flights out of Boston that will fit the time when I need to leave. I could fly to Chicago. Or I could fly to Miami. Or I could fly to New York. I could then look from each one of those destination cities and think about what locations I could possibly reach from each of those individual cities. So maybe from Chicago, I can get a direct flight to San Francisco. That's excellent. Or I could get a flight to Denver. Now, maybe that flight to San Francisco is the perfect solution for me, but maybe not. Maybe I'm looking for something that's a little bit cheaper or a little bit better for my schedule. And so I could look for what other possibilities might be out there. So I could look at Denver. And from Denver, well, maybe I can get a flight to Austin. And from Austin, maybe I can get a flight to Phoenix, and from Phoenix to San Francisco. Now, I'm not done yet. Because maybe there's a direct flight from New York to San Francisco that's perfect for me. Or maybe there's a flight from Miami through Denver that's a lot cheaper. So I still have to go. And I still have to look at all of those cities that I haven't investigated yet. I have to exhaustively check all of the possibilities that I might have. So from New York, maybe I can get a flight to Nashville, and from Nashville to Austin. And then I know where I am. And then I know from Austin, I can fly to Phoenix, and from Phoenix to San Francisco. If I fly first to Miami, though, maybe I can get a flight from Miami to Nashville, or from Miami to Austin. And now I've tried all of the possibilities.
I've built up this graph that shows me all of the possible routes that I might be able to take. When we represent these kinds of problems, we're not going to represent them explicitly as this graph, because that graph doesn't represent the history of where we've gone. Knowing that I flew from Phoenix to San Francisco doesn't tell me whether I came via Nashville, or via Denver, or via Miami. So what I'll do instead is I'll take this same problem, and I'll represent it as a tree. And at the root of the tree, at the top, I'll put the place that I started, Boston. And from Boston, I'll look at all of the possible locations that I can travel to. Well, in this case, I had three: Chicago, New York, and Miami. And then I'll explore each of these children in the tree. From Chicago, I saw that I had two flights. I could fly directly to San Francisco or to Denver. Now San Francisco, that's my goal. That's my destination. That's going to be a leaf of this tree. That is, I'm never going to go somewhere after San Francisco. From Denver, though, I can fly from Denver to Austin, from Austin to Phoenix, and from Phoenix to San Francisco. And now again, I've reached a leaf. I could then go back to the next city that I haven't fully explored. That would be New York, go back up to the top of my tree, come down to New York. From New York, I can fly to Nashville, from Nashville to Austin, from Austin to Phoenix, and from Phoenix to San Francisco. And finally, one city I haven't looked at yet, Miami. Well, from Miami I said I had two possibilities, Nashville or Austin. If I fly to Nashville, well then I fly from Nashville, to Austin, to Phoenix, to San Francisco. If I fly to Austin, I fly Austin, to Phoenix, to San Francisco. And now I have a tree. It's a complete tree. It's all of the possibilities and all of the paths that I could take. That is, if I start at the root of the tree at the top and I go down to one of the leaves, it tells me not only where I'm going to end up, San Francisco, but it tells me the route that I need to take to get there. Now, which one of these is the best? Well, nothing about this problem yet tells me which of those is the best solution. Maybe I care the most about how much time I'm in the air, or the distance that I'm flying. In that case, Chicago to San Francisco might be the shortest number of miles in the air. Maybe I care about cost. And we all know direct flights are usually more expensive. So maybe if I take this kind of backwards route through Miami, Nashville, Austin, Phoenix, maybe then I get a lower price. But I could optimize on any criteria that I care about-- who's got the best in-flight Wi-Fi, or which airports have the best food available. And each of those might give me a different solution that I see as being the best. These kinds of problems, where we're going to build out this tree of possibilities, and then look at each of those individual paths, and examine which of those fulfills our criteria, we're going to call search problems. And we have lots of algorithms, some of which we've seen already, to go and explore those trees. We could do it in the way that I just did, a depth-first search, going down as far as we can until we hit a leaf, and then coming back up, and going right back down. Or we could do what's called breadth-first search. We could expand everything at the top, and then everything one level underneath that, and then everything one level underneath that. Those search trees are fundamental to AI.
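To make that concrete, here is a minimal sketch in C of the depth-first search just described. The cities and flights simply mirror the example above; the names, the adjacency matrix, and the whole setup are invented for illustration, since a real flight-search tool would load actual schedule data.

```c
// Depth-first search over the example flight graph, printing every
// route from Boston to San Francisco.
#include <stdio.h>

#define N 9
enum { BOS, CHI, NYC, MIA, DEN, NSH, AUS, PHX, SFO };

const char *name[N] = {"Boston", "Chicago", "New York", "Miami", "Denver",
                       "Nashville", "Austin", "Phoenix", "San Francisco"};

// adj[i][j] == 1 means there is a flight from city i to city j.
// These edges are exactly the ones in the example.
const int adj[N][N] = {
    [BOS] = {[CHI] = 1, [NYC] = 1, [MIA] = 1},
    [CHI] = {[SFO] = 1, [DEN] = 1},
    [DEN] = {[AUS] = 1},
    [AUS] = {[PHX] = 1},
    [PHX] = {[SFO] = 1},
    [NYC] = {[NSH] = 1},
    [NSH] = {[AUS] = 1},
    [MIA] = {[NSH] = 1, [AUS] = 1},
};

// Extend the current path by one city; print the path when we reach
// the goal, otherwise recurse into every city reachable from here.
void dfs(int city, int path[], int len)
{
    path[len++] = city;
    if (city == SFO)
    {
        for (int i = 0; i < len; i++)
            printf("%s%s", name[path[i]], i < len - 1 ? " -> " : "\n");
        return;
    }
    for (int next = 0; next < N; next++)
        if (adj[city][next])
            dfs(next, path, len);
}

int main(void)
{
    int path[N];
    dfs(BOS, path, 0);
}
```

Running it prints the five complete routes, in exactly the order the tree above is explored. Replacing the recursion with a queue of partial paths would give breadth-first search instead, expanding all one-stop routes before any two-stop routes.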
But they don't quite get it right all the time. In fact, in a lot of the cases that we really care about, we want to build a tree, but we don't actually get to make all of the decisions. These are situations called adversarial search, also known as how to write game-playing systems and get paid for it. But these are the kinds of systems where I might get to choose, when I go from Boston, which city I go to next. But after that, someone else might get to make the decision about where I fly. So to build these kinds of structures, we're going to have to take a slightly different approach. We're not going to be able to just search through the tree anymore, because we're not the one that's in control of each of those decision points. So let's imagine a simple game like tic-tac-toe. I could start with a completely blank board. And in tic-tac-toe, X gets to play first. And so I could think about all the possible moves that X could make. And if I'm the one playing the X, that's great. I have nine possible moves that I can make. I could put an X in any one of those nine positions. And then from each of those, I could imagine what happens next. Well, in this case, the other player would get to take a turn. O would get to take a turn. And from each of those, there would be eight different places that O could place their marker. Let's say I decided that I was going to put an X in the center. That always seems like a good opening move. I could look, underneath that, at the eight possible moves that O could make. Now, if I'm playing X, that's wonderful. I get to choose which one I go to, the one in the middle. But now O gets to choose. And I don't have control over that decision. But from each of those possible board positions, there's then another set of possibilities. When it comes to be my turn again, I would get to pick and say, well, if O moves into the, well, the middle spot on the left, then I have a set of possibilities where I can take my next move. From those, I could consider all of the possibilities underneath them. And then O would get to choose among those. And I could keep building this tree out until I got to the point where either someone wins the game-- that's got to be considered a leaf node-- or the board is completely full and no one has won. And that's also going to be a leaf node. That's going to be a tie. But the tricky thing with this is if this were just a regular search problem, I'd be able to say, well, X should go here. And O should go way over there. And then X should go over here. And then O should go way over there. And then X can get three in a row, and I win. And the game would be over in five moves, three for me, two for my opponent. But I don't always get to choose that. So instead, what we're going to have to do is we're going to have to have a new strategy. And the strategy that game-playing algorithms often use is what's called minimax. The central idea of minimax is that we're going to pick the move that gives our opponent the worst possible set of moves that they can make. It doesn't do me any good to choose a move where I might be able to win after that, because my opponent isn't going to give me that chance. They're going to choose some terrible outcome for me. So I'm going to make the move whose worst-case outcome, after my opponent does their best against me, is still the best one available to me. All right. Let's see how that plays out. So here's our algorithm in pseudocode. We're going to generate the entire game tree. We're going to build the entire structure. And then we'll go through.
And at the very bottom, at each of the terminal nodes, at each of the leaves, we'll evaluate: how valuable is that to me? And we're going to value things that are good for me as being positive. Things that are not good for me will be less positive, or zero, or even negative. So in tic-tac-toe, maybe a win for me is good. That's a one. And a tie is zero. And something that's a loss for me, maybe that's a negative one. All that matters is that the better it is for me, the higher the score it receives. From those possibilities at the bottom, then, we'll filter upward. And when it's my chance to choose among a set of alternatives, I'll choose the one that's got the highest score. And whenever it's my opponent's turn to choose, I'll assume that they're going to choose the one with the lowest score. And if I do this all the way up to the top of the tree, I'll have chosen a path that gives me the best outcome that I can get, assuming that my opponent makes all the right moves. All right, so let's see this in action first. And then we'll actually look at the code for it. So imagine I have this big tree. And now I'm not playing tic-tac-toe. I wanted to give you something a little bit richer. So I've got some game where there are many different scores that I could have at the end. And so I build this complete tree. And I get to move first. I'm at the root of the tree. And I get to choose there-- so I get to maximize across that first node. And then my opponent gets to go. And then I get to go once more. So down at the bottom, I have a set of possibilities that I can choose from, different terminal states of the game. If I'm down in that far left-hand corner, and I see that I've got a choice between an eight, a seven, and a two, well, I'm the one that gets to choose. So I'm going to choose the best one of those. I'm going to choose the eight. So I know that if I ever get down to that point, I'll be able to get those eight points. If I end up at the next point over, the next node over, a nine, a one, or a six, well, I'm going to choose the best of those. I'll choose the nine. If I have a choice between two, and four, and one, I'll choose the four, the highest. Now, if I look at the level above that, my opponent is the one who gets to make that choice. So my opponent gets to choose: does he want to leave me the branch that gets me eight points, or the one that gets me nine points, or the one that gets me four points? And my opponent, being rational, is going to choose the minimum of those, is going to choose the four. And I can do this through the entire tree. I can go down to that middle set of three. And I can choose between one, three, and five. And I get to choose. So I choose the five. I can choose three, nine, or two. I get to choose, so I choose the nine. Six, five, or two? I get to choose, so I choose the six. Level above that, who gets to choose? Who gets to choose? The other guy, my opponent. So they choose five, nine, or six, which one? AUDIENCE: The five. SPEAKER: They choose the five. They get to choose the minimum. And then the last one: choose one, two, or three. I get to choose, so I choose three. Nine, seven, or two, I choose nine. And 11, six, or four, I choose 11. My opponent then chooses three, nine, or 11, chooses the minimum. He gives me a three. And then finally at the top of the tree, I get to choose again. And I get to choose between a four, a five, or a three. So I take the five.
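Here is that filtering-upward in code: a minimal C sketch with the 27 leaf scores from this example hard-coded in. It propagates a maximum, then a minimum, then a maximum up the three levels, and prints 5, the same answer we just reached by hand.

```c
// Minimax over the 27-leaf example tree: maximize at the bottom,
// minimize one level up, maximize at the root.
#include <stdio.h>

int leaves[3][3][3] = {
    {{8, 7, 2}, {9, 1, 6}, {2, 4, 1}},
    {{1, 3, 5}, {3, 9, 2}, {6, 5, 2}},
    {{1, 2, 3}, {9, 7, 2}, {11, 6, 4}},
};

int main(void)
{
    int best = -1000; // root: my turn, so maximize
    for (int i = 0; i < 3; i++)
    {
        int worst = 1000; // middle level: opponent's turn, so minimize
        for (int j = 0; j < 3; j++)
        {
            int sub = -1000; // bottom level: my turn again, maximize
            for (int k = 0; k < 3; k++)
                if (leaves[i][j][k] > sub)
                    sub = leaves[i][j][k];
            if (sub < worst)
                worst = sub;
        }
        if (worst > best)
            best = worst;
    }
    printf("Best guaranteed score: %d\n", best); // prints 5
}
```

In a real game there is no leaves array to index into; the scores come from recursive calls that generate board positions on the fly, which is exactly what the tic-tac-toe program coming up does.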
If I got to control everything, I'd take the path that led to the 11. But I don't get to make that choice. If I go down that path, my opponent will force me into the choice that leads to a three. So the best that I can do is to take that middle branch, make that choice that's eventually going to lead me to five points. That's what minimax does. All right. Let's take a look at that. So here in the CS50 IDE is a program that implements minimax to play tic-tac-toe. We're going to build up a representation. We're going to have two opponent-- or two players, our computer player and a human player. Player number one will be playing the O. That'll be the machine player. They get to move second. And the other player, our human player, will be X. And to make my life a little simpler, I'm going to label that player negative one. So I can just multiply by negative one to swap between one player and the other. All right, so let's take a look at what we're actually going to do. We're going to define our board. It's going to be, well, we're going to allow it to be three by three, or we can even play five by five or seven by seven tic-tac-toe if you'd like, based on some dimension D. And we'll have a couple of helper functions that'll do things like initialize the screen-- or sorry, initialize our variables, clear the screen, draw the board on the screen, one that checks a board to see whether or not there's a winner, one that parses through the command line, just to help out, one that reads in input, and one function called minimax. And that's the one we'll care most about. But let's look first at the main. What do we do? Well, we're going to parse our command line, just read in and see what dimension board we'd like to have. We'll initialize our board. And then we'll enter one big while loop, repeatedly accepting moves until the game is won, or there are no moves left. Every time we go through that loop, we'll clear the screen. We'll draw the board on the screen. And we're deliberately sort of abstracting these away as subroutines, so that we don't have to worry too much about the details of how they happen. You'll have the code later today. And if you want to look through and find out, you can see them all. But we'll draw a board on the screen. And then we'll check and see, do we have a winner? Has someone won this game? If they have, we'll print out a victory message. And we'll end the game. We'll also check and see if there's a tie. It'll be easy to see if there's a tie. It means that all the spaces are full, but there hasn't been a winner yet. We can declare a tie and be done. Then the real meat-- if it's a machine player, we'll allow that machine player to search through using this minimax algorithm, to find the best move that it can. And then we'll put that move up. Otherwise, if it's a human player, we'll read some input from the human. And then whether it's the human player or the machine player, we'll do a couple little bits of error checking, make sure the move stays within the boundaries of the actual dimensions of the board that we have, make sure that that space is empty, that no one's put a piece in there already. And then we'll just put a piece on the board, switch to the next player, and increment how many moves have happened. That's the main loop for our tic-tac-toe game. Minimax, then, is exactly the algorithm that we saw before. The only adjustment that we've made so that we can play higher-dimensional boards is we've kept this extra parameter called depth.
And depth just says, if I'm searching downward through that tree and I get so far down beyond some level depth that I just don't want to go any further, I'm going to stop and just evaluate the board at that point. I'll check and see if there's a winner. If there's a winner, I return that result. Otherwise, I'll go through a loop. And I'll say, for all of the possible locations that I could possibly take as my move, I'll build a hypothetical board that includes my move on that board, and then recursively call minimax. If it's my move, I get to find the one that's got the largest score. If it's my opponent's move, we find the one that's got the minimum score. And everything else is just record keeping.

All right, so let's see this run. Actually, maybe we can get a couple of volunteers to come up and play tic-tac-toe. [INAUDIBLE] one, and one more, two, right there. Come on up. So let's go ahead and restart this completely. So, hi. AUDIENCE: Hi. SPEAKER: What's your name? AUDIENCE: Gorav. SPEAKER: Gorav. AUDIENCE: I'm Layla. SPEAKER: And Layla, and Layla, sorry. Come on up. Gorav, we're going to have you go first. And I'm going to ask you to be a not terribly good tic-tac-toe player. OK, so all the pressure is off on you. Let's see, though, that our machine player can actually do something smart. So go ahead. You're going to type in which coordinate you would like to put your X in. A0, OK, and the machine has gone right away and put its mark in A1. Put the O on the board. All right, now go ahead. Where would you like to go? C2. Our machine player has taken the middle square, blocked you. So that was a good, smart thing for it to do. You've blocked it. That's excellent. It takes the corner there. And it's going to force you to take the one last space, B0. And the game ends in a tie. But it played a reasonable game against you, right? All right, thanks very much, Gorav. [APPLAUSE] All right, Layla, we're going to up the game on you here. AUDIENCE: Oh, great. SPEAKER: We're going to give you four by four tic-tac-toe. Now, in four by four, you have to win with four in a row, not three in a row. And it's all yours. So Layla took D1. We're now going to follow our computer player here. Three by three tic-tac-toe is the kind of thing that is easy for all of us. But it's still nice to see the computer player making smart moves. Four by four gets to be a little trickier. Nicely done. All right, so Layla's finished off. Oh, and we should have ended there. But let's do one more up here. So Layla, thank you. Nicely done. [APPLAUSE] So our tic-tac-toe player goes through and finds its moves using this minimax. And I had a depth setting on that so that it wouldn't run for too long, which is probably why Layla was able to go nicely ahead as she did, and did very well. But these systems that just brute-force their way deeper, and deeper, and deeper, and keep finding the solution that they need, those kinds of systems are quite successful at these, well, standard board games. And in fact, if we look at a three by three tic-tac-toe game, this is basically a solved problem. And this is a wonderful diagram from Randall Munroe at XKCD, showing which move you should take, given your opponent's moves. This is something that we could easily specify ahead of time. But what happens as we get to more complex games, more intricate games, where there are bigger boards, more possibilities, deeper strategy?
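For reference, here is a rough sketch of that depth-limited minimax in C, specialized to a three-by-three board. Cells hold 1 for O (the machine player), -1 for X (the human), and 0 for empty; the helper names, the depth cutoff, and the layout are all my own for illustration, and the actual CS50 source differs in its details.

```c
// A rough sketch of depth-limited minimax for 3x3 tic-tac-toe.
#include <stdio.h>

#define D 3
#define MAX_DEPTH 6

// Returns 1 if O has three in a row, -1 if X does, and 0 otherwise.
int winner(int b[D][D])
{
    for (int i = 0; i < D; i++)
    {
        if (b[i][0] != 0 && b[i][0] == b[i][1] && b[i][1] == b[i][2])
            return b[i][0];
        if (b[0][i] != 0 && b[0][i] == b[1][i] && b[1][i] == b[2][i])
            return b[0][i];
    }
    if (b[1][1] != 0 && ((b[0][0] == b[1][1] && b[1][1] == b[2][2]) ||
                         (b[0][2] == b[1][1] && b[1][1] == b[2][0])))
        return b[1][1];
    return 0;
}

// Score the board from O's point of view, looking at most `depth`
// plies ahead. `player` is whoever moves next (1 or -1). At the
// cutoff, an undecided board just scores 0, because this sketch has
// no smarter evaluation function.
int minimax(int b[D][D], int player, int depth)
{
    int w = winner(b);
    if (w != 0 || depth == 0)
        return w;

    int best = -2 * player; // worse than any reachable score
    int moved = 0;
    for (int r = 0; r < D; r++)
        for (int c = 0; c < D; c++)
            if (b[r][c] == 0)
            {
                moved = 1;
                b[r][c] = player;                     // hypothetical move
                int s = minimax(b, -player, depth - 1);
                b[r][c] = 0;                          // undo it
                if ((player == 1 && s > best) || (player == -1 && s < best))
                    best = s;
            }
    return moved ? best : 0; // full board, no winner: a tie
}

int main(void)
{
    int b[D][D] = {{-1, 0, 0}, {0, 0, 0}, {0, 0, 0}}; // X took a corner
    int br = 0, bc = 0, best = -2;
    for (int r = 0; r < D; r++)
        for (int c = 0; c < D; c++)
            if (b[r][c] == 0)
            {
                b[r][c] = 1; // try each of O's replies in turn
                int s = minimax(b, -1, MAX_DEPTH);
                b[r][c] = 0;
                if (s > best)
                {
                    best = s;
                    br = r;
                    bc = c;
                }
            }
    printf("O plays row %d, column %d\n", br, bc);
}
```

Note what happens at the cutoff: with no winner yet, the board simply scores zero, because this sketch has no smarter way to judge a half-finished game. That is precisely the problem the four-by-four player ran into, and it is where we're headed next.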
It turns out that this brute force searching still does reasonably well, except when you get to the point where that tree is so large that you can't represent it all. When you can't compute the entire tree, whether because you can't fit the entire tree in memory, or because you can fit it in memory but it would just take you way too long to search through it, you have to do something smarter. In order to do that, you have to do two things. First, you have to find some way of limiting your depth. Well, that's OK. We can pick some bound and say, you can only go so deep. But when you do that, that means you have these partially complete boards. And you have to choose, do I like this partially complete board, or this partially complete board? And on our four by four tic-tac-toe game, our computer player got down to the bottom and it said, I've got two different boards. Neither one is a win. Neither one is a loss. Neither one is a tie. How do I choose between them? And it didn't have a smart way of doing that. We see this kind of evaluation happen all the time as we get into more complex games. Chess is a great example. In chess, we have, first of all, a larger board. We have far more pieces. And the positioning of these pieces and the way that these pieces move is critically important. So if I want to use minimax, I need to be able to specify and say, this board, where no one has won or lost yet, is somehow better than this other board, where no one has won or lost. To do that, I might do things like count how many pieces I have and how many pieces you have. Or I might give different pieces different points. My queen is worth 20 points. Your pawn is worth one point. Who has more points total? Or I might consider things like, who's got the better board position? Whose turn is it next? Anything that I can do to evaluate more accurately which of these possibilities is better without exhaustively considering every move that could come after that. Now to make that work, one of the things that's going to become really important for us is not just moving straight down to a particular depth limit, but being able to say, one of these ideas that I have is so bad that it's not worth considering all of the possible ways that things can go from bad to worse. To do that, we'll add into minimax a principle called alpha-beta. And alpha-beta says, if you have a bad idea, don't waste your time trying to find out exactly how bad it is. So here's what we're going to do. We're going to take the same principles that we had before, the same minimax type of search, only we're going to keep track, not only of the actual values that we have, but of the best possible value that I could get, and the worst possible outcome I could have. And any time the worst possible thing is looking likely, I'll abandon that part of the tree. And I won't even bother looking at it anymore. All right, so imagine that we start with this same exact game tree. And now we're going to go down again, all the way down to that bottom left corner. And in that bottom left corner, we look and we evaluate this board. Maybe it's a four by four tic-tac-toe board, or maybe it's a chess board. But we look at it, and we evaluate it, and we get a value of eight. At that point, we know that we are going to get at least eight points from this bottom decision. It doesn't matter what the other two are, that seven and that two.
They could be any values they wanted to be. We're going to get at least eight points. All right, but we could go ahead and check. Maybe one of them is better than eight. We look at the seven. Is that better than eight? No, that doesn't change our opinion at all. We look at the two. Is that better than eight? No, that doesn't change our opinion at all. So now we know we've exhausted all of the possibilities there. We're not going to get anything better than eight. We're going to get exactly eight. And so we change that node and say, that is now a certainty. We go up one level above that. And now we know something about that minimization level. We know that we're never going to get more than eight points if we go down that direction. Because even if those other two branches turn out to be fantastic and worth thousands of points each, our opponent will give us the minimum, and give us the eight. All right, well, let's see. We'll keep going down that path. We go down to that middle branch on the left. We look down and we see there's a nine. We know that we're going to get at least nine points by going down that middle road. And at this point, we can just pause. And we can say, look, I know in the level above, I'm going to get no more than eight points by going down this direction. But if I went down the middle path instead of the left path, I would get at least nine points. My opponent is never going to let me go down that middle path. They get to choose. And they're going to choose the path to the left towards the eight, rather than down the middle towards what's at least nine points. So at that point, I'll stop. And I'll say, you know what? I don't have to look any more down in that direction. Because I'm never going to get there. I can skip over that one, and I can skip over that six, because that's never going to happen. So I'll go down and I'll consider the next possibility. I go down there and I say, I see a two. I know if I get to here, I'm going to get at least two. OK. I keep going. I see a four. I know I'm going to get at least four. There's still a lot between four and eight, though. So I keep going. I look down and I see there's a one. All right, I know if I go down this path, I'm going to be able to choose the four. What's my opponent going to do? Between something that gives me eight, something that gives me four, and something that gives me at least nine, well, he's going to give me the four. And I know now, at the very top, I'm going to be able to get at least four points out of this game. The whole idea of alpha-beta is to cut off parts of the tree so that I don't look at them anymore. But it still looks like I've been looking at a lot of the tree. Let's keep going down. We'll go down the next one now. Down at the bottom, I find a one. I know I'm going to get at least one. I keep looking. I find a three. I know I'm going to get at least three. I keep going. I find a five. I know I'm going to get five if I go down that path. And I also know then that my opponent, if I choose the middle of the three big choices, he's going to give me something that's five or less. OK. I can keep going there. I can look down and I can say, what am I going to get if I go down the middle path? I'm going to get, well, three there. I'm going to get something that's at least three. There's still things between three and five, so I keep looking. Oh, a nine, I'll definitely take that over a three. I'm going to get at least nine if I go down that middle path.
Now I can stop and say, look, there's no point anymore. I know that my minimizing opponent is going to give me the thing that's less than or equal to five, rather than the thing that's greater than or equal to nine. I stop. I don't look any more at that. I keep going. I look down on this one. Down to the bottom, I find a six. I know I'm going to get at least six. And what can I do? I can stop. Because given a choice between something that's at least six and something that's less than or equal to five, he's going to give me the thing that's less than or equal to five. And now I know I'm going to get exactly that choice. I'm going to get that five. I go back up to the top. Which am I going to choose, between something that's greater than or equal to four, or something that's equal to five? I'm going to take something that's at least five. I go down the last path, all the way down to the bottom. There's a one. OK, at least I'm going to get one point. I keep going. Two, oh, that's better than one. I'm going to get at least two. I find a three. I know I'm going to get three. And the point above that, my opponent is going to give me something that's less than or equal to three. And now I can stop. Because in the choice between me being able to get a five and my opponent giving me something less than or equal to three, I'm always going to take that five. So I don't evaluate that bottom part of the tree at all. Now, this may seem minor. But when little bits of arithmetic, greater than and less than, can cut away entire parts of this exponentially growing tree, that leads to a huge amount of savings, savings that are large enough that I can start playing competitively at more complex games.

All right, if we look at the size and complexity of different games, tic-tac-toe was our easy example. We've got a small board, three by three. We get, at most, an average of about four different choices as we go through the game. We have somewhere around 10 to the fifth possible different leaves. And building a tic-tac-toe player, well, we just did it. It's easy. If we go up to something more complex, like Connect Four-- do you remember this game where you drop the little tokens in? It's a six by seven board, not that much bigger, and it still has about the same branching factor as tic-tac-toe. I have about four choices where I can put things in. But now, I've got a lot more leaves, 10 to the 21st power. That's something that's still easy enough that it's been solved outright. Checkers, more complex-- you've got an eight by eight board, but you're only playing on half of the squares at any time. You've got a branching factor that's about 2.8. Well, you've got a couple of moves you can take. You've got about 10 to the 31st leaves, larger, and larger, and larger spaces. As I have to search through those bigger and bigger spaces, that's when things like alpha-beta and being able to cut away entire branches become essential. Now, checkers proved tractable. In 1994, a computer program called Chinook beat the world checkers champion, Marion Tinsley. And since then, no human master player has been able to beat the best computational systems. If we look at something like chess, now again, we have an eight by eight board. But we have much more complex pieces, much more complex movements. We have a branching factor of about 35, 35 possible moves on average that I can take, and a state space, a number of leaves, that's grown to 10 to the 123rd power, enormous numbers of possibilities. Even still, modern processors are able to do this successfully.
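To see the pruning walkthrough above in code, here is a minimal alpha-beta sketch over the same 27-leaf tree. A general implementation would thread the alpha and beta bounds through one recursive function; this version hard-codes the three levels so the two pruning tests are easy to see. The counter confirms it reaches the same answer, 5, after visiting only 16 of the 27 leaves, exactly the ones the walkthrough touched.

```c
// Alpha-beta pruning over the 27-leaf example tree.
#include <stdio.h>

int leaves[3][3][3] = {
    {{8, 7, 2}, {9, 1, 6}, {2, 4, 1}},
    {{1, 3, 5}, {3, 9, 2}, {6, 5, 2}},
    {{1, 2, 3}, {9, 7, 2}, {11, 6, 4}},
};

int visited = 0;

// Max node over the three leaves of subgroup (i, j). Stop early once
// its value reaches beta: the minimizer above already has something
// <= beta, so it will never let us reach this node.
int max_node(int i, int j, int beta)
{
    int v = -1000;
    for (int k = 0; k < 3; k++)
    {
        visited++;
        if (leaves[i][j][k] > v)
            v = leaves[i][j][k];
        if (v >= beta)
            return v;
    }
    return v;
}

// Min node over group i. Stop early once its value drops to alpha:
// the maximizer at the root can already get alpha elsewhere.
int min_node(int i, int alpha)
{
    int v = 1000;
    for (int j = 0; j < 3; j++)
    {
        int s = max_node(i, j, v);
        if (s < v)
            v = s;
        if (v <= alpha)
            return v;
    }
    return v;
}

int main(void)
{
    int best = -1000; // root is a max node
    for (int i = 0; i < 3; i++)
    {
        int s = min_node(i, best);
        if (s > best)
            best = s;
    }
    printf("Value %d after visiting %d of 27 leaves\n", best, visited);
}
```

The deeper the tree, the bigger that fraction of skipped work becomes, which is why alpha-beta matters so much for games like chess.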
In 1996 and then in 1997, a computer program called Deep Blue, built by IBM and run on a giant supercomputer, took on the reigning world champion, Garry Kasparov-- and in 1997, it beat him. This was a turning point. Today, though, that same processing power sits on my MacBook. Processing speed keeps getting faster and faster. We can evaluate more and more boards quicker and quicker. But more importantly, we have better evaluation functions and better pruning methods. So we can search the space more cleverly. With the biggest of the board games that we can think of, something like Go, with its 19 by 19 board, now suddenly we're past the point where computational systems can win. There's no computational system out there that can beat a professional Go player. The best systems today rank at about the level of a good amateur. So there's still quite a bit out there that you can't get to yet.

All right, these traditional board games, these kinds of systems where we build this minimax, whether it's got alpha-beta or not, these algorithms work because there are certain constraints. We have perfect information about the world. We know where all the pieces are. The world is static. Nobody gets to move the pieces around while I'm sitting there thinking, taking my turn. There's an action space that's discrete. I can put my pawn here, or I can put my pawn here. I'm not allowed to put my pawn on the line in between the two squares. And finally, the actions are deterministic. I know that if I say, rook to knight three, my rook is going to end up at knight three, as long as it's a valid move. There's no uncertainty about that. Now, as I go to more different kinds of games, we have to break those assumptions. What if I go to something like classic video games? Here's a selection of video games from the Atari 2600. What do I have up there? I've got Frogger, Space Invaders, Pitfall, and Pac-Man. What kinds of environments do I have here now? Which of these assumptions do I have to break? Well, it depends on the game. I could play chess on the 2600, and it would be just like it was before. For most of these systems, there's complete knowledge about the world. There are completely deterministic actions. But usually, the world's no longer static. That is, while I'm sitting there waiting, something is moving. The ghosts are coming to get me. The scorpion is following me underneath. The space invaders are coming closer and closer. How well can we do against these? A few years ago, DeepMind, a company since acquired by Google, trained a computer program to play Atari 2600 games. And if you think this isn't serious business, the results of their study were published in Nature, which is just about as good a publication as you can possibly get. And here's how well they performed. They have an algorithm that sat and watched just the screen inputs. It got no instructions whatsoever about the rules of the game. And it was supposed to figure out, based on its score, how well it was doing. This was a system that used something called reinforcement learning. That is, it looked at its score. And if it got a good score, it said, I should remember those things. And I should do those again. And if it got a bad score, it said, I shouldn't do those things again. This is the performance of those trained systems, allowed to play for a few hours on each game, compared against professional gamers. So for all of the games that are to the left side of this line, this self-trained computer program outperformed the professional gamers.
And for everything to the right, the professional gamers were still the best. For something that knew nothing about the rules, that knew nothing about the structure of the games, this is impressive performance. And this is what we're able to do today. OK, you say, but if we think about AI in games, normally we think about the things that we can actually sit down and play against. If I sit down and I play StarCraft, or I play FreeCiv, the computer opponent is the person controlling the Zerg, or controlling the other civilization. How do those players actually find their moves? Well, these games are structured much the same way as our board games, these games that we'll collectively call 4X games-- explore, expand, exploit, and exterminate. But they're basically exploration-and-conquest games. Typically, the computer opponent there has limited information. They don't know exactly what's going on behind that fog of war. They don't get to see what you have in your inventory. There's an environment that is dynamic. Everything is changing all the time. You don't get to sit and wait to take your move. But most things are still discrete. I have to put my city here. Or I have to put my city here. And everything is deterministic. When I say, move my unit here, my unit moves here, unless an obstacle suddenly comes into play. Now, that's not all computer games that are out there today. If I go and I play a first-person type game, something like Thief, or Fallout, or Skyrim, or Halo, now I have computer opponents that are out there that have a very different situation. They have, again, limited information. They can only see a certain field of view. The environment is still dynamic. Things are changing all the time. But now I have a much more continuous action space. I can be just peeking a little bit out of the doorway. And in some games, my actions are stochastic. I get to try to jump over that wall, but I've got a chance of failing. These types of games are getting closer and closer to the kinds of controllers that we build in robotics. In robotics, we have to assume that we have limited information. We have sensors that tell us about the world. We have an always-changing, dynamic environment. We have a world in which space is continuous, rather than discrete. And our actions, when we try them, have a chance of failing. And in fact, modern game controllers for your Halo opponent, or for those NPCs in Skyrim, basically run small robotics architectures. They sense the world. They build a model of the world. They compute based upon a set of goals that they'd like to accomplish. They plan actions based on what they know. And those are exactly the same kinds of systems that we build in robotics. So these architectures, to bring this back together, are often quite the same.

So let's see if we can see that. Let's go back to our tic-tac-toe example. And I'm going to ask a couple of my post-docs to come up and help me. So Chen Ming, and Alessandro, and Olivier, if you guys would come up. And I'm going to need a couple of volunteers. OK, I saw a hand up right there in the middle. Let me take one more, somebody further in the back maybe. All right, over there. Come on up. All right. So let's take that cover down. And if you guys would come right back around here for me, fantastic. So this is a robot called Baxter. And Baxter is a commercial platform, designed by a company called Rethink.
And this robot is designed for small-scale manufacturing. But today we're going to use it to play tic-tac-toe. Now, this robot is also something that's relatively unusual. Because if I were standing anywhere close to a standard factory automation system, I'd be in very grave danger of being injured. Baxter, however, is designed to be relatively safe to interact with. And so I can push on this robot. And you can see it's a little bit flexible as it moves around. And I can reposition it where I'd like it to go. Now in a normal robotic system, we would have a set of joints here that would be directly responding to position commands. And they wouldn't necessarily care if they were moving through open air, or if they were moving through my ribcage. OK. And typically, if you were here with an industrial system, you would go nowhere near it. There would be yellow safety tape all around it. This system has a slightly different design to be friendlier and easier for people to interact with, in that in each joint, there's a spring. And rather than controlling an exact position, we control a certain amount of torque, a certain amount of force, that we would like to be on that spring. All right, so let me take our volunteers here. Hi, what's your name? AUDIENCE: Louis. SPEAKER: Louis. Nice to see you. And? AUDIENCE: David. SPEAKER: David. Nice to meet you. If you guys would wait right here for a second, I'm going to give you a chance to do this. So this robot, if you come up and if you push gently on it, you're going to see that it moves a little bit. And if you grab it right here on the wrist just above where those buttons are, it looks like you should grab the buttons, but grab right above it instead, you'll be able to very gently manipulate it through space. Louis, you want to give it a try? So give it just a little push to start with. And then if you put your fingers right there and hold onto it, because it will move for you then. All right, you want to give it a try? Come on up. So give it just a gentle push there to start. You can feel what it's like. And then if you grab it right there, you'll be able to maneuver it around. OK. So typically, this kind of a robot would be used for small scale manufacturing. And I'm going to move this arm just down out of the way a little bit here. But today, we're going to use the same tic-tac-toe playing system based on minimax that we built earlier. OK? So, you guys are each going to play a game. Louis, you're going to be first. Let me just hold up here for a second. I'm going to have you stand right here, just so everybody can see you. Are you guys set up here? ROBOT: Welcome. Let's play tic-tac-toe. Do not grasp your token before I say that it is your turn. I start the game. It is my turn. SPEAKER: Now, if you could take one of your pieces and go ahead and place it. ROBOT: It is your turn. [LAUGHTER] It is my turn. [LAUGHTER] [LAUGHTER] It is your turn. SPEAKER: The human race is counting on you here, Louis. ROBOT: It is my turn. SPEAKER: So Baxter successfully blocked here. ROBOT: It is your turn. It is my turn. It is your turn. It is my turn. SPEAKER: And we'll let Baxter finish out its last move here. [LAUGHTER] ROBOT: That's a tie. I will win next time. [LAUGHTER] SPEAKER: All right, thanks very much, Louis. Thank you. You can go this way. ROBOT: I start the game. SPEAKER: So let me explain to you one more little bit before we get our rematch here. What exactly is happening? So the robot has a camera up top here. And it's looking down at the board.
And it's seeing whether it's got a red O or a blue and white X. As those get placed on the board, that's basically the same input that we would be reading in from our data structure from our screen. It's running the same minimax algorithm to be able to find where to place a good token. And then we're giving a command about where we'd like a token to be placed. The arm is moving out. It's using a vacuum gripper to apply some suction to that wooden piece, pick it up, move it to the right spot, and then release the suction and drop it. All right, we're going to give it one more shot with a slightly smarter player here. You ready? All right, if you'd stand right up here and give a-- turn out this way so you can see everybody. And then [INAUDIBLE]. ROBOT: It is my turn. SPEAKER: Baxter will start. ROBOT: It is your turn. It is my turn. It is your turn. It is my turn. [LAUGHTER] SPEAKER: [WHISPERING] Just let him go ahead and win. ROBOT: It is your turn. SPEAKER: That's OK. ROBOT: It is my turn. [LAUGHTER] I win. [LAUGHTER] I start the game. SPEAKER: All right, thank you very much. All right, I think we've got time for one more excellent tic-tac-toe player, someone who can give this thing a real match, who knows what they're doing. [LAUGHTER] Who's going to be our champion here? All right, your friends volunteered you. That's good enough for me. Tell me your name again. AUDIENCE: Tamir. SPEAKER: Tamir, nice to see you. All right, again, we're going to put you right up here so everyone can see you. You are our representative in this match now. Baxter is one and oh and oh-- or sorry, one, oh, and one. And it's up to you here. Baxter will get to move first, though. So. ROBOT: It is my turn. [LAUGHTER] It is your turn. It is my turn. It is your turn. It is my turn. It is your turn. [LAUGHTER] ROBOT: It is my turn. SPEAKER: It's a lot harder when you're standing up here, folks. [LAUGHTER] ROBOT: You humans are so easy to beat. [LAUGHTER AND APPLAUSE] SPEAKER: Thanks very much. ROBOT: I win. I start the game. SPEAKER: All right, so thanks very much to Olivier, and to Alessandro, and to Chen Ming. [APPLAUSE]

I want to make one last point. So Baxter, at the very end there, cheated. And that was unexpected. One of the fantastic things about AI is that we do work in AI so that we can build really interesting and intelligent devices. But we also do work in AI because it tells us something about how humans are intelligent. One of my lab's favorite studies looks at what happens when machines unexpectedly cheat. We did this originally not with Baxter playing tic-tac-toe, but with a smaller robot named Nao, who played rock-paper-scissors. And sometimes, after playing lots and lots of boring rock-paper-scissors games, the robot would throw a gesture, lose, and then suddenly change its gesture and say, I win. [LAUGHTER] Now, sometimes we'd also have the robot, just as a control, throw a gesture, win, and change its gesture to lose, throw the match, cheat in order to lose. And that is not nearly as compelling. People respond to the robot that cheats in order to win as if it is out to get them, like it is actively seeking their destruction. [LAUGHTER] It becomes an agent. It is like a person. It has belief and intention. And it's not good intention. And the robot that throws the game is just malfunctioning. It's just a broken device. Let me show you a couple of examples of that from a few of our participants. So here's cheating in order to lose. [VIDEO PLAYBACK] -[INAUDIBLE] win. Let's play. -Wait, what?
-[INAUDIBLE] win. Let's play. [INAUDIBLE] win. Let's play. SPEAKER: And here's cheating to win. -Yes, I win. Let's play. -You can't do that. [LAUGHTER] -Yes, I win. -You cheated. You cheated now. -Yes, I win. -Hey, you cheater. You cheat, super cheat. [END PLAYBACK] SPEAKER: These different reactions rapidly change our perception of the device. Does that mean that we deliberately build machines that cheat because that's the best engineering that we can do? No, but it tells us something really interesting about people. That thing that cheats you and steals your victory, that's something that's alive, that's animate, that's out to get you. It has mental state. It has belief. It has intention. That thing that hands the game to you, that's not. That's just malfunctioning. This is in many ways why it's easy to throw the game with kids. But if you try to cheat them and sort of claim victory, you know, just to shorten the game, they'll catch you right away. These kinds of effects that we see coming out of AI, they teach us a lot about ourselves. All right, that's it for today. Thanks very much to David and the Harvard production team for coming down. [APPLAUSE] We'll see you for quiz one, and then for one last lecture. Have a great day. [APPLAUSE] [MUSIC PLAYING] DAVID J MALAN: Well, we probably need to introduce some kind of encryption, right? Because then the headers of these HTTP requests will be scrambled so that anyone trying to sniff your traffic won't actually be able to see them. So what's the solution to this problem? Well, we need to actually introduce encryption into the formula, so that when that person is transmitting data from A to B, we can securely send-- [LAUGHTER] --the information in a way that the adversary can't, in fact, see it.