Adversarial search setting
two player
turn taking
deterministic
fully observable
time constraints
zero-sum
turn-taking, zero-sum, two-player games; both players behave rationally
Fully vs. partly observable
fully-observable:
each player has all information available -> both players can compute the game-theoretically best move -> the opponent's strategy is known
partly-observable:
players might miss information -> cannot predict the opponent's move -> the opponent's strategy is not known
Considered algorithms
MINIMAX with alpha-beta-pruning
Monte-Carlo Search, Monte-Carlo Tree Search
Zero-sum
rewards/rating of players sum up to zero
if one player wins, the other loses
simplification
given score of player1, we can compute score of player2
we only consider the score of player1, who is called MAX
player1 (MAX) maximizes this score
player2 (MIN) minimizes this score
Time constrained
assumption:
in the given time, only a small fraction of the state space can be explored
Search formulation
initial state s
actions in state: actions(s)
player: player(s)
transition model: result(s, a)
terminal test (true if game is over)
evaluate how good state s is for player p: evaluate(s, p)
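A minimal Python sketch of this formulation as an abstract interface (the class name Game and the exact signatures are assumptions; a concrete game would subclass it):

```python
# Minimal sketch of the search formulation above; the method names mirror
# the list (actions, player, result, terminal test, evaluate).
from abc import ABC, abstractmethod

class Game(ABC):
    @abstractmethod
    def actions(self, s):        # legal actions in state s
        ...

    @abstractmethod
    def player(self, s):         # whose turn it is in state s (MAX or MIN)
        ...

    @abstractmethod
    def result(self, s, a):      # transition model: state after action a in s
        ...

    @abstractmethod
    def is_terminal(self, s):    # terminal test: True if the game is over
        ...

    @abstractmethod
    def evaluate(self, s, p):    # how good state s is for player p
        ...
```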
Example game tree
Minimax idea
from current state (root), perform DFS into depth h (horizon)
each time a leaf is reached:
evaluate leaf
back up values to the root:
MAX node: select maximum value of successors
MIN node: select minimum value of successors (intuition: the opponent plays as well as possible)
once time is over: select successor from root with largest value
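A compact sketch of depth-limited minimax on top of the Game interface above (the horizon h and the convention that values are always taken from MAX's point of view are assumptions):

```python
# Depth-limited minimax: DFS down to horizon h, evaluate leaves statically,
# back up values (max at MAX nodes, min at MIN nodes).
def minimax(game, s, depth, maximizing):
    if game.is_terminal(s) or depth == 0:
        return game.evaluate(s, "MAX")        # leaf: static evaluation for MAX
    values = [minimax(game, game.result(s, a), depth - 1, not maximizing)
              for a in game.actions(s)]
    return max(values) if maximizing else min(values)

def best_move(game, s, h):
    # the root is a MAX node: pick the successor with the largest backed-up value
    return max(game.actions(s),
               key=lambda a: minimax(game, game.result(s, a), h - 1, False))
```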
Practical Minimax
for big game trees, the search takes a very long time
the algorithm would get stuck in the subtree of the first action
improvement: use iterative deepening to search the tree
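A minimal iterative-deepening wrapper, sketched under the assumption that search(game, s, h) is some depth-limited search (e.g. the minimax sketch above) and that a simple wall-clock check suffices:

```python
import time

# Iterative deepening: run complete searches of depth 1, 2, 3, ... until the
# time budget runs out and keep the move from the last completed depth.
def iterative_deepening(game, s, search, time_limit=1.0):
    best, depth = None, 1
    deadline = time.monotonic() + time_limit
    while time.monotonic() < deadline:
        best = search(game, s, depth)   # complete search to the current horizon
        depth += 1
    return best
```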
Evaluation function e(s)
maps state s to value
e(s) > 0 -> good for MAX
e(s) < 0 -> bad for MAX, good for MIN
Typical e(s)
weighted sum of features: e(s) = w_1*f_1(s) + ... + w_n*f_n(s)
f_i might be:
number of possible moves
number of pieces
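A small illustration of such a weighted sum (the state methods moves_for and piece_count as well as the weights are made-up examples, not part of the formulation above):

```python
# Weighted sum of features; the features (mobility, material) and the weights
# are purely illustrative and would be tuned for the concrete game.
def evaluate(s):
    w_mobility, w_material = 1.0, 3.0
    mobility = len(s.moves_for("MAX")) - len(s.moves_for("MIN"))   # possible moves
    material = s.piece_count("MAX") - s.piece_count("MIN")         # pieces on board
    return w_mobility * mobility + w_material * material           # > 0: good for MAX
```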
Backed-up values
at each node, the backed-up value represents the value of the best state the player can reach from this subtree
Minimax evaluation
Minimax is guaranteed to compute the game-theoretically optimal move if
zero sum
two players
search fully explores the game tree
Minimax for a game which is non zero-sum
zero-sum games are competitive: maximizing my own values implies minimizing the value of the opponent
a non zero-sum game is for example a cooperative game
then the opponent should not minimize MAX's value, because this results in suboptimal outcomes
Minimax for non turn-taking games
in a non turn-taking game (e.g. rock-paper-scissors), the search yields suboptimal results
this is because, given the first player's choice, the second player can always win
Minimax for more than two players in a zero-sum game
e.g. consider a three-player game
in the setting of minimax, player3 must either maximize or minimize MAX's value
if p3 maximizes, the result is too optimistic
if p3 minimizes, the result is too pessimistic
in fact, p3 wants to maximize its own score, which it can achieve by minimizing either p1 or p2
Minimax for partly observable environments
parts of the game are not known by the algorithm (e.g. cards of opponent)
minimax cannot determine the opponent's best strategy
thus, the computed values for MIN nodes might be
too optimistic, e.g. the opponent's move can be better than assumed
too pessimistic, e.g. the opponent's move cannot be as good as assumed
Observation
the search tree is huge; we cannot go deep into the tree in limited time
Idea: try to prune parts of the tree -> alpha-beta pruning
Alpha-Beta-Pruning graphical
MAX already found move leading to 0
MAX will not go into the branch -3
The branch with value -3 will not increase, because MIN tries to minimize this number
Next subtree under -3 can be ignored
beta(N) = -3 <= 0 = alpha(A)
MIN already found move leading to 0
MIN will not go into branch 3
Branch 3 will not decrease, because MAX tries to maximize it
Next branch under 3 can be ignored
alpha(N) = 3 >= 0 = beta(A)
Alpha-Beta-Pruning mathematical
Discontinue search below node N:
MIN: if beta(N) is <= alpha(A) of any MAX ancestor
MAX: if alpha(N) is >= beta(A) of any MIN ancestor
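The discontinuation rule in code, as a sketch on top of the minimax formulation above (the Game interface is the same assumption as before):

```python
import math

# Minimax with alpha-beta pruning: alpha = best value a MAX ancestor can already
# guarantee, beta = best value a MIN ancestor can already guarantee.
# Search below a node is discontinued as soon as alpha >= beta.
def alphabeta(game, s, depth, alpha, beta, maximizing):
    if game.is_terminal(s) or depth == 0:
        return game.evaluate(s, "MAX")
    if maximizing:
        value = -math.inf
        for a in game.actions(s):
            value = max(value, alphabeta(game, game.result(s, a), depth - 1,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # a MIN ancestor already has something better
                break                # prune the remaining successors
        return value
    else:
        value = math.inf
        for a in game.actions(s):
            value = min(value, alphabeta(game, game.result(s, a), depth - 1,
                                         alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:        # a MAX ancestor already has something better
                break
        return value
```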
When does alpha-beta pruning reach its optimum?
best nodes first for each player:
MIN children of MAX node are ordered in decreasing backup values
MAX children of MIN node are ordered in increasing backup values
BUT:
if we already knew how to order the nodes, there would be no need to search them
Alpha-Beta-Pruning evaluation
worst-case: O(b^h) -> no benefits
best-case: O(b^(h/2))
random ordering: O(b^(h*3/4))
-> node ordering is essential
Improvements
practical ordering: use backup values of previous iteration to order nodes
transposition tables to handle revisited states
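A small sketch of both improvements, assuming states are hashable and that the backed-up values of the previous iterative-deepening iteration are stored in a dictionary (prev_values and transposition are illustrative names, not a fixed API):

```python
# Order successors by the backed-up values of the previous iteration so that
# the most promising moves are searched first (here: for a MAX node).
def ordered_actions(game, s, prev_values):
    return sorted(game.actions(s),
                  key=lambda a: prev_values.get(game.result(s, a), 0),
                  reverse=True)

# Transposition table: cache values of already searched states so revisited
# states (reached via different move orders) are not searched again.
transposition = {}   # state -> (searched depth, backed-up value)
```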
Observation
MINIMAX requires some sort of evaluation function, which might be hard to come up with
Idea: random simulation of moves -> Monte-Carlo
the results of the random simulations converge over time
this then yields the best action to take
Monte-Carlo Search
recursive implementation
random DFS from current node until leaf is reached
evaluate the leaf
compute average values
choose the action with the highest average value (average values are computed for the first-level actions)
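A sketch of this flat Monte-Carlo search, assuming the Game interface from above and that evaluate(s, "MAX") returns the outcome of a finished game for MAX (e.g. +1 win, 0 draw, -1 loss):

```python
import random

# Flat Monte-Carlo search: for each first-level action, run random playouts to
# a terminal state, average the outcomes, and pick the best average.
def random_playout(game, s):
    while not game.is_terminal(s):
        s = game.result(s, random.choice(game.actions(s)))
    return game.evaluate(s, "MAX")

def monte_carlo_search(game, s, playouts_per_action=100):
    def average(a):
        s1 = game.result(s, a)
        return sum(random_playout(game, s1)
                   for _ in range(playouts_per_action)) / playouts_per_action
    return max(game.actions(s), key=average)
```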
Monte-Carlo Search evaluation
Good:
easy to implement
low memory complexity
no game specific knowledge needed -> no heuristics needed
handles uncertainty
Bad:
problems of being too optimistic/pessimistic
does not terminate on its own (must be cut off by the time limit)
while MINIMAX computes the game-theoretically best move, MCS is not guaranteed to produce the correct result
Monte-Carlo Tree Search
compute average values for all nodes in tree
advantage: reuse of values in next iteration
Further improvement: UCT
idea: use stored average values to guide exploration
relies again on a heuristic (UCB1), though not a game-specific one
speeds up convergence of the algorithm
MCTS UCT
Phases:
Select leaf of current tree
based on UCB1 heuristic
goal: balanced exploration/exploitation
Expand the node
Playout: random simulation until game is terminated
Backup: update the values in the current tree
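A compact MCTS/UCT sketch following these four phases (the Game interface from above, a reward for MAX of e.g. +1/0/-1, and the exploration constant c = sqrt(2) in UCB1 are assumptions):

```python
import math
import random

# MCTS with UCT: select with UCB1, expand one child, random playout, back up.
# Rewards are stored from MAX's point of view; MIN nodes flip the sign during
# selection so each player picks what is best for itself.
class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                 # action -> child Node
        self.visits, self.total = 0, 0.0

def ucb1(child, parent_visits, sign, c=math.sqrt(2)):
    if child.visits == 0:
        return math.inf                    # unvisited children are tried first
    exploit = sign * child.total / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def mcts(game, root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node, sign = root, 1               # +1 at MAX nodes, -1 at MIN nodes
        # 1. Select: descend while the current node is fully expanded
        while node.children and len(node.children) == len(game.actions(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits, sign))
            sign = -sign
        # 2. Expand: add one untried child (unless the node is terminal)
        if not game.is_terminal(node.state):
            a = random.choice([a for a in game.actions(node.state)
                               if a not in node.children])
            node.children[a] = Node(game.result(node.state, a), parent=node)
            node = node.children[a]
        # 3. Playout: random simulation until the game is over
        s = node.state
        while not game.is_terminal(s):
            s = game.result(s, random.choice(game.actions(s)))
        reward = game.evaluate(s, "MAX")
        # 4. Backup: update visit counts and value sums up to the root
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # play the most robust move: the root child with the most visits
    return max(root.children, key=lambda a: root.children[a].visits)
```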
Exploration vs. exploitation
UCB1 tries to balance exploitation and exploration of MCTS
exploitation: use information already discovered
select moves with highest average values
too much: suboptimal results because good moves are missed
exploration: find new information
select moves which have not been tried so far
too much: slow convergence