THE MINIMAX SEARCH METHOD
- Description
The minimax method allows us to tackle the problem of opponents in a relatively simple manner.
The game we will be considering is a two-player game, so each full move consists of two plies (one half-move per player).
Let us assume that our opponent is playing to win, a fair assumption, and that they will always take the move that will give them the best advantage. Given this, we need to develop a technique that will take into account the opponent's best moves yet still give us the winning strategy. This is what Minimax attempts to do.
Minimax is a depth-first technique which follows each possible path down to a given depth - too deep and the search will take far too long, too shallow and we may miss paths that make early sacrifices for later gain.
So the search proceeds depth-first down to the specified level, setting the score value of each node it passes to NULL. For each node at the terminal depth, the evaluation function calculates that node's score value.
Having reached the bottom, the search backtracks up to the next unexpanded branch, but as it does so, it propagates the score and the path back up the tree according to the following reasoning:
- If the move being backtracked would be made by the opponent:
  - If the value being propagated up is less than the value currently stored at the parent node, replace the parent's value with it and store the traversed path in the parent node.
  - If the value is more, continue up the tree to the next unexpanded branch, but do not propagate the score or the path.
- If the move being backtracked would be made by me:
  - If the value being propagated up is more than the value currently stored at the parent node, replace the parent's value with it and store the traversed path in the parent node.
  - If the value is less, continue up the tree to the next unexpanded branch, but do not propagate the score or the path.
This follows the simple idea, as stated, that the opponent will attempt to minimize the score while I attempt to maximize it - which is where the algorithm gets its name: Minimax.
- Specifications
Having described the algorithm, now let's look at a more formal definition of it.
First we need to define a few functions:
- generateMoves(), which generates all the possible moves from the current position. These moves are placed directly in the global array moves.
- Evaluation_Function(int crtrow, int crtcol, int crtplayer), which is our evaluation function. It returns a number representing the goodness of the current position for the given player.
- DEEP_ENOUGH(Position, Depth) is the function that determines whether the search down this path has gone deep enough. This is determined by a simple depth limit.
MINIMAX returns an instance of a class State, which holds all the information concerning the current state. With these defined, we may now state our algorithm for Minimax.
STATE MINIMAX(CurPosition, Depth, Player, Ply)
  IF DEEP_ENOUGH(CurPosition, Depth)
    CUR_BEST : PATH = CurPosition
    CUR_BEST : VALUE = EVALFUNCTION(CurPosition, Player)
    RETURN CUR_BEST
  SUCCESSORS = GenerateMoves(CurPosition, Player)
  IF SUCCESSORS is empty
    CUR_BEST : PATH = CurPosition
    CUR_BEST : VALUE = EVALFUNCTION(CurPosition, Player)
    RETURN CUR_BEST
  IF Ply = MAX
    Set up a current best state:
      CUR_BEST : PATH = NULL
      CUR_BEST : VALUE = -LIMIT
    FOR ALL SUCC IN SUCCESSORS
      RESULT_STATE = MINIMAX(SUCC, Depth + 1, OTHERPLAYER(Player), MIN)
      IF RESULT_STATE : VALUE > CUR_BEST : VALUE
        THEN CUR_BEST = RESULT_STATE
             CUR_BEST : PATH = CurPosition + CUR_BEST : PATH
    RETURN CUR_BEST
  ELSE - (Ply = MIN)
    Set up a current best state:
      CUR_BEST : PATH = NULL
      CUR_BEST : VALUE = +LIMIT
    FOR ALL SUCC IN SUCCESSORS
      RESULT_STATE = MINIMAX(SUCC, Depth + 1, OTHERPLAYER(Player), MAX)
      IF RESULT_STATE : VALUE < CUR_BEST : VALUE
        THEN CUR_BEST = RESULT_STATE
             CUR_BEST : PATH = CurPosition + CUR_BEST : PATH
    RETURN CUR_BEST
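The pseudocode above can be sketched in Python. This is a minimal illustration, not the project's actual code: `generate_moves` and `evaluate` are hypothetical stand-ins for the game-specific functions, and a "position" here is just a node in a hand-built game tree.

```python
# Minimax sketch following the pseudocode above. Positions, moves and the
# evaluation are hypothetical stand-ins for the real game's functions.
MAX_PLY, MIN_PLY = "MAX", "MIN"

def minimax(position, depth, max_depth, ply, generate_moves, evaluate):
    """Return (value, path) for the best line found from `position`."""
    successors = generate_moves(position)
    if depth >= max_depth or not successors:
        return evaluate(position), [position]   # terminal: score this node
    if ply == MAX_PLY:
        best_value, best_path = float("-inf"), []
        for succ in successors:
            value, path = minimax(succ, depth + 1, max_depth, MIN_PLY,
                                  generate_moves, evaluate)
            if value > best_value:              # I maximize
                best_value, best_path = value, path
    else:
        best_value, best_path = float("inf"), []
        for succ in successors:
            value, path = minimax(succ, depth + 1, max_depth, MAX_PLY,
                                  generate_moves, evaluate)
            if value < best_value:              # the opponent minimizes
                best_value, best_path = value, path
    return best_value, [position] + best_path   # prepend current position

# Tiny hand-built tree: root -> a, b; a -> 3, 5; b -> 2, 9.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_values = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

value, path = minimax("root", 0, 2, MAX_PLY,
                      lambda p: tree.get(p, []),
                      lambda p: leaf_values.get(p, 0))
```

MAX cannot do better than 3 here: MIN answers "a" with 3 and "b" with 2, so the best line is root, a, a1.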
- Coding
Since we are not only trying to find an efficient algorithm but also an efficient implementation, there are many coding issues worth discussing.
In our first attempt, we used our State class with all the information concerning a state in it. This class contained a board (15 * 9 * 8 = 1080 variables), the value of the current position, a description of the current position, and all the utility functions. The advantage of this representation is that the algorithm was straightforward, but we ran into a memory problem: this representation did not allow us to go deeper than depth 3.
Then we tried to keep our State class but remove the board from it; the class then contained only a path and a state value (as in the algorithm above). With this new representation we had a single board shared by all states: each time we applied a move to it, we had to undo that move when we were done. This made the coding tougher, but we were able to reach depth 5 without any memory limitation problem.
That depth was reasonable, but we could do much better. In fact, we could remove the path we kept track of and keep only the coordinates of the position. And why keep a class if it only holds three variables? So we removed the class and used local variables inside minimax (an array of three elements, so it could be returned), created at each call. A small problem we encountered during this change is that we had to propagate the value of each leaf state to its parent while keeping the parent's own position (since we do not care about the last position reached); only at the root did we replace its position with that of its best child, in order to return it as the solution. After this optimization we were able to reach depth 8.
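The shared-board idea described above can be sketched as follows. The board dimensions match the 15 * 9 figure mentioned earlier, but the move encoding and evaluation are hypothetical, just enough to show the apply/undo discipline:

```python
# Sketch of the shared-board optimization: one board is reused by the
# whole search; every move is applied, scored, and then undone, so the
# board is left exactly as it was found. Move format and evaluation
# are hypothetical stand-ins for the real game's.
board = [[0] * 9 for _ in range(15)]    # single board shared by all calls

def apply_move(board, move, player):
    row, col = move
    board[row][col] = player            # place the player's piece

def undo_move(board, move):
    row, col = move
    board[row][col] = 0                 # restore the empty square

def evaluate(board):
    # hypothetical evaluation: number of player 1's pieces on the board
    return sum(cell == 1 for row in board for cell in row)

def best_value(board, moves, player):
    """One-ply search over the shared board: try, score, undo each move."""
    best = float("-inf")
    for move in moves:
        apply_move(board, move, player)
        best = max(best, evaluate(board))
        undo_move(board, move)          # leave the board as we found it
    return best

before = [row[:] for row in board]
score = best_value(board, [(0, 0), (7, 4)], player=1)
```

Because every `apply_move` is paired with an `undo_move`, the board after the search is identical to the board before it, which is what makes sharing one board across all recursive calls safe.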
So this is the basic Minimax method; but just as with A* search, we shall see that the Alpha-Beta pruning technique improves Minimax by pruning unproductive branches.
ALPHA-BETA PRUNING
- Description
Alpha-beta pruning is a technique that enables the minimax algorithm to ignore branches that will not contribute further to the outcome of the search. It is a type of Branch-and-Bound search technique, like the A* with pruning was. Again, this technique is best demonstrated by an example and so we will look at the algorithm more closely.
We need to introduce the two variables ALPHA and BETA. These can be described from the player's point of view as follows:
- ALPHA - the minimum score that player MAX is guaranteed to attain.
- BETA - the maximum score that player MAX can hope to attain.
The algorithm for Alpha-Beta Search is as follows:
For each node visited we assign storage for the path taken to backtrack to that node, a value called Alpha and a value called Beta, as well as the current score for that node.
Set the value of Alpha at the initial node to -Limit and Beta to +Limit.
- Search down the tree to the given depth, propagating the values of Alpha and Beta down as the search descends and setting the scores to null.
- On reaching the bottom, calculate the evaluation for this node.
- Backtrack, propagating values and paths according to the following:
- If the move being backtracked would be made by the opponent:
  - If the current score is less than the score stored at the parent, replace the parent's score with it and store the path from the bottom and the value of beta in the parent.
  - If the score at the parent is now less than the alpha stored at that parent, ignore any further children of this parent and backtrack the parent's values of alpha and beta up the tree.
  - If the score at the parent is greater than alpha, set the parent's alpha to this score and proceed with the next child, sending alpha and beta down. If no children remain, propagate alpha and beta up the tree and propagate the value of alpha up as the min score.
- If the move being backtracked would be made by the computer:
  - If the current score is more than the score stored at the parent, replace the parent's score with it and store the path from the bottom and the value of alpha in the parent.
  - If the score at the parent is now more than the beta stored at that parent, ignore any further children of this parent and backtrack the parent's values of alpha and beta up the tree.
  - If the score at the parent is less than beta, set the parent's beta to this score and proceed with the next child, sending alpha and beta down. If no children remain, propagate alpha and beta up the tree and propagate the value of beta up as the max score.
- When the search is complete, the alpha value at the top node gives the minimum score that the player is guaranteed to attain if using the path stored at the top node.
- Specifications
We shall first define the return argument that A-B-MINIMAX returns:
STATE
  PATH  - the path from this node to the current best terminating node.
  VALUE - the value at the current best terminating node.
  ALPHA - the new alpha.
  BETA  - the new beta.
With these defined we may now state our algorithm for Alpha-Beta pruned Minimax:
STATE A-B-MINIMAX(CurPosition, Depth, Player, Ply, Alpha, Beta)
  IF DEEP_ENOUGH(CurPosition, Depth)
    CUR_BEST : PATH = CurPosition
    CUR_BEST : VALUE = EVALFUNCTION(CurPosition, Player)
    CUR_BEST : ALPHA = Alpha
    CUR_BEST : BETA = Beta
    RETURN CUR_BEST
  SUCCESSORS = GenerateMoves(CurPosition, Player)
  IF SUCCESSORS is empty
    CUR_BEST : PATH = CurPosition
    CUR_BEST : VALUE = EVALFUNCTION(CurPosition, Player)
    CUR_BEST : ALPHA = Alpha
    CUR_BEST : BETA = Beta
    RETURN CUR_BEST
  IF Ply = MAX
    Set up a current best state:
      CUR_BEST : PATH = NULL
      CUR_BEST : VALUE = -LIMIT
      CUR_BEST : ALPHA = Alpha
      CUR_BEST : BETA = Beta
    FOR ALL SUCC IN SUCCESSORS
      RESULT_STATE = A-B-MINIMAX(SUCC, Depth + 1, OTHERPLAYER(Player), MIN, CUR_BEST : ALPHA, CUR_BEST : BETA)
      IF RESULT_STATE : VALUE > CUR_BEST : BETA
        THEN RETURN RESULT_STATE - (beta cutoff)
      IF RESULT_STATE : VALUE > CUR_BEST : VALUE
        THEN CUR_BEST : VALUE = RESULT_STATE : VALUE
             CUR_BEST : PATH = CurPosition + RESULT_STATE : PATH
             CUR_BEST : ALPHA = MAX(CUR_BEST : ALPHA, CUR_BEST : VALUE)
    RETURN CUR_BEST
  ELSE - (Ply = MIN)
    Set up a current best state:
      CUR_BEST : PATH = NULL
      CUR_BEST : VALUE = +LIMIT
      CUR_BEST : ALPHA = Alpha
      CUR_BEST : BETA = Beta
    FOR ALL SUCC IN SUCCESSORS
      RESULT_STATE = A-B-MINIMAX(SUCC, Depth + 1, OTHERPLAYER(Player), MAX, CUR_BEST : ALPHA, CUR_BEST : BETA)
      IF RESULT_STATE : VALUE < CUR_BEST : ALPHA
        THEN RETURN RESULT_STATE - (alpha cutoff)
      IF RESULT_STATE : VALUE < CUR_BEST : VALUE
        THEN CUR_BEST : VALUE = RESULT_STATE : VALUE
             CUR_BEST : PATH = CurPosition + RESULT_STATE : PATH
             CUR_BEST : BETA = MIN(CUR_BEST : BETA, CUR_BEST : VALUE)
    RETURN CUR_BEST
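The pruned search above can be sketched in Python. Again this is an illustrative sketch, not the project's code: `generate_moves` and `evaluate` are hypothetical stand-ins, and the position is a node in a small hand-built tree.

```python
# Alpha-beta sketch following the pseudocode above. At a MAX node the
# best value so far tightens alpha; at a MIN node it tightens beta; a
# branch is abandoned as soon as the value crosses the opposite bound.
def alphabeta(position, depth, max_depth, maximizing, alpha, beta,
              generate_moves, evaluate):
    successors = generate_moves(position)
    if depth >= max_depth or not successors:
        return evaluate(position)
    if maximizing:
        value = float("-inf")
        for succ in successors:
            value = max(value, alphabeta(succ, depth + 1, max_depth, False,
                                         alpha, beta, generate_moves, evaluate))
            if value >= beta:            # beta cutoff: MIN will avoid this line
                return value
            alpha = max(alpha, value)    # tighten the guaranteed lower bound
        return value
    else:
        value = float("inf")
        for succ in successors:
            value = min(value, alphabeta(succ, depth + 1, max_depth, True,
                                         alpha, beta, generate_moves, evaluate))
            if value <= alpha:           # alpha cutoff: MAX already has better
                return value
            beta = min(beta, value)      # tighten the upper bound
        return value

# Same tiny tree as before: root -> a, b; a -> 3, 5; b -> 2, 9.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_values = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
best = alphabeta("root", 0, 2, True, float("-inf"), float("inf"),
                 lambda p: tree.get(p, []),
                 lambda p: leaf_values.get(p, 0))
```

On this tree the result matches plain minimax, but the leaf with value 9 is never evaluated: once MIN's node "b" finds 2, which is below MAX's guarantee of 3, the rest of "b" is cut off.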
- Coding
At this point we are still using a State class, but it can easily be removed as in the minimax part. The new array that holds the values will have five variables (current column, current row, value, alpha, and beta).
Adding alpha-beta pruning to minimax is very straightforward, since the algorithms are almost the same. What we will try to improve in this part is the complexity of our algorithm; at this point we were able to reach depth 15.
Alpha-beta pruning attains its greatest efficiency if we order the nodes to be examined so that the best ones are tested first and the worst left to last. This requires some way of looking at a set of moves and saying how good each is, which may be very difficult to do. If the tree is ordered so that the worst is checked first, we gain nothing at all: we end up with normal minimax. At the other extreme, if you could ALWAYS order them correctly so the best is first, there would be no need to apply the minimax search at all - it is already sorted for best play!
If we can order the tree almost perfectly, the method need only examine O(b^(d/2)) nodes on average, whereas minimax needs to examine O(b^d) nodes. We can therefore describe optimal alpha-beta as having an effective branching factor of sqrt(b), where b is the actual branching factor of the tree. The obvious result is that, for the same number of nodes examined, we can search to twice the depth of normal minimax.
The optimal method is iterative deepening: previous searches guide the next one by searching the previously successful moves first and applying alpha-beta cutoffs. This method is very useful for games with a highly variable branching factor. But we will just try to find a relatively good and quick solution, so we added a sorting step to our move generator. Once the array of moves is filled, we sort the moves in decreasing order of the value of the position they lead to. If the player to move is MAX we use them in that order; otherwise (MIN) we take the moves from the end of the array. This optimization allows us to reach depth 20 without any timeout (it sometimes reaches depth 25, but in 20% of those games we had a timeout end of game).
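The sorting step described above can be sketched like this. The quick per-move evaluation used for scoring is hypothetical; in the real program the move generator scores the position each move leads to:

```python
# Sketch of the move-ordering optimization: score each move with a
# (hypothetical) quick static evaluation, sort in decreasing order, and
# let MAX read the list front-to-back while MIN reads it back-to-front,
# mirroring "taking the moves from the end of the array".
def order_moves(moves, quick_eval, maximizing):
    ordered = sorted(moves, key=quick_eval, reverse=True)  # best first
    return ordered if maximizing else list(reversed(ordered))

moves = ["m1", "m2", "m3"]
quick_eval = {"m1": 1, "m2": 7, "m3": 4}.get  # hypothetical move scores
for_max = order_moves(moves, quick_eval, maximizing=True)
for_min = order_moves(moves, quick_eval, maximizing=False)
```

Feeding alpha-beta the likely-best move first is exactly what produces early cutoffs, which is where the extra search depth comes from.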
In this game we have some highly constrained sequences of moves: a player can make a sequence of moves if he passes through certain visited positions. In this situation it is often desirable to extend the search to the outcome of such a set of constrained moves, whether or not the outcome occurs within the fixed depth. This is called search to a fixed depth with extensions. To implement it, we could have added detection for this case where we previously just checked the depth during move generation, but we solved it more easily by incrementing the depth only when the other player is to play. This technique produced many smart sequenced moves, but it could sometimes cause a timeout end of game if the search goes too far into a sequence.
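A minimal sketch of this conditional depth counting, where `same_player_moves_again` is a hypothetical stand-in for the game's rule that a player may chain moves through visited positions:

```python
# Sketch of "search to a fixed depth with extensions" via conditional
# depth counting: the depth counter advances only when the turn passes
# to the other player, so a constrained same-player sequence does not
# consume search depth.
def next_depth(depth, same_player_moves_again):
    return depth if same_player_moves_again else depth + 1

d = 5
for forced_continuation in (True, True, True):
    d = next_depth(d, forced_continuation)   # chained moves: depth unchanged
after_chain = d
after_turn_change = next_depth(d, False)     # turn passes: depth advances
```

This is also why the technique can overrun the time limit: a long forced sequence is searched to its end regardless of the nominal depth bound.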
EVALUATION