15-211 CMU                                          April 8, 2008
Game Programming
Danny Sleator
----------------------------------------------------------
  Example of NIM
  Mathematical framework
    definitions
    values
    preclusion
    minimax formulation
    alpha-beta
----------------------------------------------------------

EXAMPLE: Nim
------------

Before we can talk about real programming in practical games like
chess, we need to develop a little theory. So we do that by starting
with a simple example: the game of Nim.

There are several piles of stones. Two players alternate removing
stones. You must pick a pile and take one or more stones from that
pile. The one who is stuck with no stones left loses.

Let's analyze a sample game. Call the two players A (Alice) and B
(Bob). Alice goes first. Say there is a pile of 1 stone and a pile
of 2 stones. We'll denote this (1,2). What are the options for A?
Take 1 from pile 1, or take 1 or 2 from pile 2.

                  (1,2)              A's move
                 /  |  \
                /   |   \
               /    |    \
              /     |     \
             /      |      \
          (2)     (1,1)    (1)       B's move
          / \       |       |
       (1)   ()    (1)     ()        A's move
        |           |
       ()          ()                B's move

Is it possible for A to guarantee a win? To find out, we have to
analyze all the options. Moving to (2) would allow B to take all the
stones from the pile and win. So that's a bad option for A. Moving
to (1,1) forces B to move to (1), then A wins. Moving to (1) allows
B to win. So A can win by moving to (1,1).

Let's try to formalize this reasoning. Let's assign a numerical
value of +1 or -1 to each position in the tree. The value is +1 if
it's possible for the current player to force a win (there's a
winning strategy for the current player). And the value is -1 if the
other player can win. Here are the values assigned to the game
above.

                 (1,2)1
                 /  |  \
                /   |   \
               /    |    \
              /     |     \
             /      |      \
         (2)1    (1,1)-1   (1)1
         /  \       |       |
     (1)1   ()-1  (1)1    ()-1
       |            |
     ()-1         ()-1

What's the general rule for computing the values of the positions in
the tree? Well, if a node has no children, its value is obviously
-1, because the player whose turn it is to move has lost.
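[Aside: the values in the tree above are easy to check by machine.
Here is a short Python sketch that computes them by brute force; the
names "successors" and "value" are mine, not part of the notes.]

```python
# A Nim position is a tuple of pile sizes; empty piles are dropped,
# so () is the terminal position in which the player to move has lost.

def successors(piles):
    """All positions reachable in one move: take k >= 1 stones from one pile."""
    result = set()
    for i, p in enumerate(piles):
        for take in range(1, p + 1):
            rest = piles[:i] + (p - take,) + piles[i + 1:]
            result.add(tuple(sorted(s for s in rest if s > 0)))
    return result

def value(piles):
    """+1 if the player to move can force a win, -1 otherwise."""
    if not piles:                       # no stones left: the mover has lost
        return -1
    return max(-value(q) for q in successors(piles))

print(value((1, 2)), value((1, 1)))     # prints: 1 -1
```

The results match the tree: (1,2) is a win for the player to move,
and (1,1) is a loss.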
Suppose that we've inductively evaluated all the children of a node.
What's the value of the node? Well, if there is a child of value -1,
then the value of this node is 1. If all the children have value 1,
then the value of this node is -1. A more concise way to describe
this is that the value of a node is just the maximum of the
negations of the values of the children.

This is an example of a:

    2-person, perfect information, deterministic, finite, zero-sum game.

What do these terms mean? 2-person is clear. Perfect information
means that both players know the entire state of the world. (Unlike,
say, poker where what you have in your hand is hidden from the other
players.) Deterministic means that there is no randomness, such as
dice. (Backgammon is perfect information but not deterministic.)
Finite means that the game must end in a finite number of moves.
Zero-sum means that the total winnings of all players add to zero.
Poker and backgammon are zero-sum, but there are games such as the
prisoner's dilemma or the game of chicken, in which this is not the
case. Of course there are rich and beautiful theories and algorithms
for these kinds of games, but they're beyond the scope of this
course.

Many famous and popular games have these properties, such as
tic-tac-toe, chess, checkers, go, othello, hex.

[By the way, there's a beautiful algorithm to play optimal nim.
Write the number of stones in each pile in binary. Bitwise XOR all
these numbers together. If the result is 0, the value of the
position is -1, otherwise it's 1. Now if the position has non-zero
xor, you can always win at nim by moving to a position where the xor
is zero. Can you prove that this works?]

So, let's say we have one of these 2-person, perfect information,
deterministic, finite, zero-sum games, in which players alternate
moves. How do we figure out what to do? It's easy. First we draw the
entire game tree.
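[Aside: the XOR rule described above is easy to test in code. This
Python sketch computes the value of a Nim position from the XOR and
finds a move to a zero-XOR position when one exists; the names
"nim_value" and "winning_move" are mine.]

```python
from functools import reduce
from operator import xor

def nim_value(piles):
    """1 if the bitwise XOR of the pile sizes is nonzero, else -1."""
    return 1 if reduce(xor, piles, 0) != 0 else -1

def winning_move(piles):
    """A move (pile_index, new_pile_size) to a zero-XOR position, or None."""
    x = reduce(xor, piles, 0)
    if x == 0:
        return None                 # every move leads to a nonzero XOR
    for i, p in enumerate(piles):
        if p ^ x < p:               # shrinking pile i to p ^ x zeroes the XOR
            return (i, p ^ x)

print(winning_move([1, 2]))         # prints: (1, 1)
```

On the sample game (1,2) the XOR is 3, and the move found reduces
the pile of 2 stones to 1, giving (1,1) --- exactly Alice's winning
move from the analysis above.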
The nodes with no children are terminal positions (those where the
game is over, such as checkmate in chess, or 3-in-a-row in
tic-tac-toe). Start by labeling the terminal positions. Now
propagate the values up the tree using the maximum of the negatives
of the children.

Let's formalize this. Let P be a position in the game. Let S(P) be
the successor positions, that is, the positions that can result when
the player whose turn it is in position P makes a move. If P is a
terminal position, let v(P) be the value of that position. (In all
the examples so far, this value must be -1, 0 or 1, but the theory
works for real valued positions. This will become important later.)
The value v(P) of a non-terminal position P is defined by the
following formula:

    v(P) =     MAX     -v(P')
           P' in S(P)

Theorem: If m = v(P), then there exists a strategy for the moving
player in which this player wins at least m, and there exists a
strategy for the other player in which the moving player wins at
most m. A "strategy" simply means a table that tells you what to do
in each position. The theorem is an obvious consequence of the
discussion above.

This definition can be turned into a trivial recursive algorithm for
evaluating a game:

    negamax(P)
        if P is terminal, return v(P)
        m = -infinity
        for each P' in S(P)
            m = max(m, -negamax(P'))
        return m

Note: here and below I use the following notation: "=" is
assignment, "<=" is less than or equal to, ">=" is greater than or
equal to.

The Minimax formulation:

Let's rewrite this so that it is always working from the point of
view of one of the players. Let's call the two players Max and Min,
and assume that Max is trying to maximize the value and Min is
trying to minimize the value. For a terminal position P, we let
v'(P) be the value of the position with respect to Max.
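[Aside: the negamax pseudocode above transcribes almost line for
line into Python. In this sketch the successor function S and the
terminal value function v are passed in as parameters, so the same
code works for any game of this kind; the hookup to Nim at the
bottom is my own illustration.]

```python
import math

def negamax(P, S, v):
    """Value of position P, given successors S(P) and terminal value v(P)."""
    children = list(S(P))
    if not children:                    # terminal position
        return v(P)
    m = -math.inf
    for Pp in children:
        m = max(m, -negamax(Pp, S, v))
    return m

# Hooking it up to Nim: positions are tuples of pile sizes.
def nim_successors(piles):
    moves = set()
    for i, p in enumerate(piles):
        for k in range(1, p + 1):
            rest = piles[:i] + (p - k,) + piles[i + 1:]
            moves.add(tuple(sorted(s for s in rest if s > 0)))
    return moves

# In Nim every terminal position has value -1 (the mover has lost).
print(negamax((1, 2), nim_successors, lambda P: -1))    # prints: 1
```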
We can rewrite the above search algorithm as follows:

    Max(P)
        if P is terminal, return v'(P)
        m = -infinity
        for each P' in S(P)
            m = max(m, Min(P'))
        return m

    Min(P)
        if P is terminal, return v'(P)
        m = infinity
        for each P' in S(P)
            m = min(m, Max(P'))
        return m

This does exactly the same thing. It is easy to see this by making a
syntactic change in the program. (Just propagate a "-" sign through
the code.) The reason we write this here is because it's easier to
understand what comes below (Alpha-Beta) in this formulation.

More efficient Evaluation
=========================

Preclusion
----------

If the values of all the positions are 1, 0, or -1, then a simple
form of preclusion can be applied to make the search more efficient.
The idea is simply to stop searching after you've found a winning
move, because the other successors cannot affect the value. Here is
the resulting algorithm.

    negamax(P)
        if P is terminal, return v(P)
        m = -1
        for each P' in S(P)
            m = max(m, -negamax(P'))
            if m == 1 then return 1
        return m

Alpha-Beta
----------

The alpha-beta algorithm is simply a generalization of this idea
which applies to games with arbitrary values. Consider the following
game tree, where the values are written with respect to Max. Say
that I'm evaluating the nodes from left to right. Q, S and T (either
terminal nodes or trees) have the values shown.

              P                 Max's move
             / \
            /   \
       10  Q     R              Min's move
                / \
               /   \
           5  S     T           Max's move

I evaluate Q and find that the value is 10 (all values here are from
Max's perspective). It follows that the value of P is at least 10.
Now I move on and evaluate S and find that its value is 5. Do I need
to look at T? The answer is no. The value of T cannot affect the
value of P. All T can do is pull down the value of R, which we
already know is at most 5.

For those familiar with chess, consider this. Suppose I'm examining
position P in a chess game. I evaluate the first move m, and I find
that m is a good move, because it leaves me ahead by a rook.
Now suppose that I continue and look at another move m'. I check the
first move of my opponent and find that it leaves me a queen down.
Do I need to look at the other alternatives for the opponent? No.
Because I already know that m' is not as good as m. If the opponent
has a mating move after I make move m', it does not matter. When
properly implemented, this kind of pruning can occur many levels
deep in the tree.

There's a very elegant way to express this in a recursive search
that adds two parameters to the search function. A range
(alpha, beta) is added. This range tells the search that if the
value is <= alpha, I really don't care what the value actually is, I
just want to know that it's <= alpha. And it tells the search that
if the value is >= beta, I really don't care to know the exact
value, just that it's >= beta. If the value is in the range
(alpha,beta) then I do want to know the exact value.

We'll state the algorithm in two ways. First as a generalization of
Minimax and then as a generalization of negamax.

    MaxAB(alpha,beta,P)
        if P is terminal, return v'(P)
        for each P' in S(P)
            alpha = max(alpha, MinAB(alpha, beta, P'))
            if alpha >= beta then return alpha
        return alpha

    MinAB(alpha,beta,P)
        if P is terminal, return v'(P)
        for each P' in S(P)
            beta = min(beta, MaxAB(alpha, beta, P'))
            if beta <= alpha then return beta
        return beta

The parameters alpha and beta are simply numbers with alpha < beta,
and the interval (alpha,beta) is known as the _alpha-beta window_.
These are local variables --- each recursive call creates its own
instances of alpha and beta that are independent of all others.

Note: It's easy to fold these two mutually recursive functions into
one. To see this, start with MinAB() and replace alpha by -beta and
beta by -alpha, and negate the returned result. You'll see that what
you end up with is just MaxAB.
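[Aside: here is the MaxAB/MinAB pair transcribed into Python and run
on the P/Q/R/S/T example above. The tree encoding and the list of
evaluated leaves are my own scaffolding; the point is that leaf T is
never touched.]

```python
import math

evaluated = []                          # leaves actually examined

def maxAB(alpha, beta, P):
    if P[0] == "leaf":                  # terminal: record it, return its value
        evaluated.append(P[1])
        return P[2]
    for Pp in P[1]:
        alpha = max(alpha, minAB(alpha, beta, Pp))
        if alpha >= beta:
            return alpha
    return alpha

def minAB(alpha, beta, P):
    if P[0] == "leaf":
        evaluated.append(P[1])
        return P[2]
    for Pp in P[1]:
        beta = min(beta, maxAB(alpha, beta, Pp))
        if beta <= alpha:
            return beta
    return beta

# The example tree: P's children are Q (value 10) and R; R's children
# are S (value 5) and T.  T's value is arbitrary: it must never be read.
# A node is ("leaf", name, value) or ("node", [children]).
T = ("leaf", "T", 999)
tree = ("node", [("leaf", "Q", 10), ("node", [("leaf", "S", 5), T])])

print(maxAB(-math.inf, math.inf, tree))     # prints: 10
print(evaluated)                            # prints: ['Q', 'S']
```

After evaluating Q (10) and S (5), the search cuts off at R without
ever looking at T, just as argued above.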
So the alternate formulation is:

    AB(alpha,beta,P)
        if P is terminal, return v(P)
        for each P' in S(P)
            alpha = max(alpha, -AB(-beta, -alpha, P'))
            if alpha >= beta then return alpha
        return alpha

This is usually the best one to implement in practice, because it
avoids duplication of code.

Theorem: Let v(P) be the value of a position P. Let X be the value
returned by a call to AB(alpha,beta,P). Then we know the following:

    if X <= alpha        then  v(P) <= alpha
    if alpha < X < beta  then  v(P) == X
    if beta <= X         then  beta <= v(P)

Proof (Optional material): The proof is by induction on the game
tree. Take a particular node P in the tree, and assume that the
theorem holds for all the successor positions of P. Also assume WLOG
that P is a MAX node. There are three cases.

Case 1: X <= alpha. Since X <= alpha, it must be the case that each
return value from the recursive calls is also <= alpha (because
we're taking the maximum of them to compute X). Therefore by
induction, their true values must all also be <= alpha. Thus, their
maximum (which is v(P)) is also <= alpha.

Case 2: alpha < X < beta. Let P' be one of the successors of P, and
let X' be the value returned by the alpha-beta search at P'. If P'
has the highest value of all the successors of P, then when it's
called its return value is strictly inside the range (alpha, beta).
Therefore by induction the return value is the true value, and it's
X. If X' is one of those that is <= the current alpha, then the true
value (by induction) is also <= alpha, so it does not affect the
outcome.

Case 3: beta <= X. We know that for one of the calls to alpha-beta
the returned result X' is >= beta. By induction, for this position
P' the true value is >= beta. Therefore we know that the true value
of P is also >= beta. QED.

To compute the value of a game using the alpha-beta algorithm, you
simply call AB(-infinity, infinity, P). The value returned must be
in the interval (-infinity, infinity).
(Every returned value is actually the value of some leaf.) Therefore
by the theorem the value returned must be the actual value of the
position.

Let's look at what this algorithm does on the example shown above.
[Let i denote infinity in this discussion.] We start by calling
MaxAB(-i, i, P). alpha is initially -i. We then call MinAB(-i,i,Q).
It returns 10, so alpha (at node P) is now 10. Next we call
MinAB(10, i, R). The first thing it does is call MaxAB(10, i, S).
This returns some value X which is at most 10. (Since the true value
5 is below the range [10,i].) This causes Min to set beta to X, and
immediately terminate without having to evaluate T.

       [-i,i] P 10                      Max
              / \
             /   \
            /     \
           /       \
    [-i,i] Q 10   [10,i] R X<=10        Min
                   / \
                  /   \
        [10,i] S X<=10  T               Max

The numbers in brackets left of each node are the values of
[alpha,beta] on the call to evaluate that node. The number to the
right of the node is the value returned by the call to Min() (or
Max()).

[Present a bigger example on the blackboard.]
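[Aside: to close the loop, here is the single-function AB
formulation in Python, run on a Nim position and compared against
plain negamax by counting visits to terminal positions. The counting
scaffolding and the choice of starting position (3,4,5) are mine;
the algorithms are exactly the pseudocode above.]

```python
import math

def successors(piles):
    """Nim moves: take k >= 1 stones from one pile; drop empty piles."""
    moves = set()
    for i, p in enumerate(piles):
        for k in range(1, p + 1):
            rest = piles[:i] + (p - k,) + piles[i + 1:]
            moves.add(tuple(sorted(s for s in rest if s > 0)))
    return moves

def negamax(P, count):
    children = successors(P)
    if not children:
        count[0] += 1                   # one more terminal position visited
        return -1
    m = -math.inf
    for Pp in children:
        m = max(m, -negamax(Pp, count))
    return m

def AB(alpha, beta, P, count):
    children = successors(P)
    if not children:
        count[0] += 1
        return -1
    for Pp in children:
        alpha = max(alpha, -AB(-beta, -alpha, Pp, count))
        if alpha >= beta:
            return alpha                # cutoff: remaining successors pruned
    return alpha

plain, pruned = [0], [0]
print(negamax((3, 4, 5), plain),
      AB(-math.inf, math.inf, (3, 4, 5), pruned))   # prints: 1 1
print(plain[0], "vs", pruned[0])    # alpha-beta visits far fewer terminals
```

Both searches agree on the value (1, as the XOR rule predicts, since
3 XOR 4 XOR 5 is nonzero), but alpha-beta reaches it after visiting
only a fraction of the terminal positions.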