Adersarial search setting
two player
turn taking
deterministic
fully observable
time constraints
zero-sum
turn taking, zero-sum, two player games, both players behave rationally
fully vs partly obserable
fully-observable:
each player has all information available -> both players can compute the game theoretic best move -> strategy of opponent is known
partly-observable:
players might miss information -> can not predict opponents move -> strategy of opponent is not known
Considered algorithms
MINIMAX with alpha-beta-pruning
Monte-Carlo Search, Monte-Carlo Tree Search
Zero-sum
rewards/rating of players sum up to zero
if on player wins, the other loses
simplification
given score of player1, we can compute score of player2
we only consider the score of player1 which is MAX
player1 maximizes MAX
player2 minimizes MAX
Time constraited
assumption:
in the given time, only a small fraction of the state space can be explored
Search formulation
inital state s
actions in state: actions(s)
player: player(s)
trasition model: result(s, a)
terminal test (true if game is over)
evaulate how good player is in state: evaluate(s, p)
Example grame tree
Minimax idea
from current state (root), perform DFS into depth h (horizon)
each time a leave is reached:
evaluate leaf
backup vaules to the root:
MAX node: select maximum value of successors
MIN node: select minimum value of successors (intution: opponents plays as good as possible
once time is over: select successor from root with largest value
Practical Minimax
for big game trees, the search takes very long
algorithm would be stuck in the subtree of first action
improvement: use iterative deepening to search the tree
Evaluation function e(s)
maps state s to value
e(s) > 0 -> good for MAX
e(s) < 0 -> bad for MAX, good for MIN
Typical e(s)
weighted sum of features
f_i might be:
number of possible moves
number of pieces
Backedup values
at each node, the backed up value represents the value of the best state the player can reach from this subtree
Minimax evaluation
Minimax is guranteed to compute the game theoretic optiamal move, if
zero sum
two players
search fully explores the game tree
Minimax for a game which is non zero-sum
zero-sum games are competitive: maximizing my own values implies minimizing the value of the opponent
a non zero-sum game is for example a cooperative game
then, the opponent should not minimize its values because this result in subopitmal results
Minimax for non turn-taking games
in a non turn-taking game (e.g. rock-paper-scissor), the search yields suboptimal results
this is, because based on the choice of the first-player, the second player can always win
Minimax for more than two players in a zer-sum game
e.g. consider a three-player game
in the setting of minimax player3 must either maximize or minimize the value
if p3 maximizes, the result it too optimistic
if p3 minimizes, the result is too optimistic
in fact, p3 wants to maximize its own score, which can be realized either by minimizing p1 oder p2
Minimax for partly observable environments
parts of the game are not known by the algorithm (e.g. cards of opponent)
minimax can not determine best strategy of opponent
thus, the computed values for MIN nodes might be
too optimistic, e.g. move of opponent can not be as good as assumed
too pessimistic, e.g. move of opponent can be better than assumed
Observation
Search tree is huge, we can not go deep into the tree in limited time
Idea: try to prune parts of the tree -> alpha-beta pruning
Alpha-Beta-Pruning graphical
MAX already found move leading to 0
MAX will not go into the branch -3
The branch with value -3 will not increase, because MIN tries to minimize this number
Next subtree under -3 can be ignored
beta(N) = -3 <= 0 alpha(A)
MIN already found move leading to 0
MIN will not go into branch 3
Branch 3 will not decrease, because MAX tries to maximize it
Next branch under 3 can be ignored
alpha(N) = 3 >= 0 alpha(A)
Alpha-Beta-Pruning mathematical
Discontinue search below node N:
MIN: if beta(N) is <= alpha(A) of any MAX ancestor
MAX: if alpha(N) is >= beta(A) of any MIN ancestor
When does alpha-beta-pruning reaches its optimum?
best nodes first for each player:
MIN children of MAX node are ordered in decreasing backup values
MAX children of MIN node are ordered in increasing backup values
BUT:
if we would already know how to order the nodes, there would be no need to search them
Alpha-Beta-Pruning evaluation
worst-case: O(b^h) -> no beneifts
best-case: O(b^(h/2))
random ordering: O(b^(h*3/4))
-> node ordering is essential
Improvements
practical ordering: use backup values of previous iteration to order nodes
transposition tables to handle revisited states
Obervation
MINIMAX requires some sort of evaluation function, which might be hard to come up with
Idea: random simulation of moves -> Monte-Carlo
result of random moves converge over time
this then yields the best action to take
Monte-Carlo Search
recursive implementation
random DFS from current node until leaf is reached
evaluate leaf leave
compute average values
choose action with highes average value (average values are computed for first-level actions)
Monte-Carlo Search evaluation
Good:
easy to implement
low memory complexity
no game specific knowledge needed -> no heuristics needed
handles uncertainty
Bad:
problems of beeing too optimisitc/pessimistic
does not terminate
while MINIMAX computes the game theoretic best move, MCS does not produce correct results
Monte-Carlo Tree Search
compute average values for all nodes in tree
advantage: reuse of values in next iteration
Further improvment: UCT
idea: use stored average values to guide exploration
relies again on heuristics
speed up convergence of algorithm
MCTS UCT
Phases:
Select leaf of current tree
based on UCB1 heuristic
goal: balanced exploration/exploitation
Expand the node
Playout: random simulation until game is terminated
Backup: update the values in the current tree
exploraiton vs exploitation
UCB1 tries to balance exploitation and exploration of MCTS
exploitation: use information already discovered
select moves with highest average values
too much: suboptimal results because missing moves
exploration: find new information
select moves which have not been tried so far
too much: slow convergance
Zuletzt geändertvor 2 Jahren