Module ai.search.minimax
Minimax with alpha-beta pruning. See function minimax().
Expand source code
"""Minimax with alpha and beta pruning. See function `minimax`."""
from ._heuristic import Heuristic
from ._minimax import minimax
__all__ = ["Heuristic", "minimax"]
Functions
def minimax(state: numpy.ndarray, simulator: Base, search_depth: int, heuristic: Heuristic, maximizing_player: bool) ‑> int
-
Runs the minimax algorithm from the given state in the given simulator. The simulator is expected to simulate a two-player zero-sum game, where a terminal state with positive reward indicates a win and one with negative reward a loss.
Args
state : np.ndarray
    Current state.
simulator : simulators.Base
    Simulator instance.
search_depth : int
    Maximum search depth.
heuristic : minimax.Heuristic
    Heuristic function.
maximizing_player : bool
    True if the player to move aims to maximize the heuristic.
Returns
int
    Best available action.
Expand source code
def minimax(
    state: np.ndarray,
    simulator: simulators.Base,
    search_depth: int,
    heuristic: minimax.Heuristic,
    maximizing_player: bool,
) -> int:
    """Runs the minimax algorithm from the given state in the given simulator.

    The simulator is expected to simulate a two-player zero-sum game, where a
    terminal state with positive reward indicates a win and one with negative
    reward a loss.

    Args:
        state (np.ndarray): Current state.
        simulator (simulators.Base): Simulator instance.
        search_depth (int): Maximum search depth.
        heuristic (minimax.Heuristic): Heuristic function.
        maximizing_player (bool): True if the player to move aims to maximize
            the heuristic.

    Returns:
        int: Best available action.
    """
    if not simulator.deterministic:
        raise ValueError("Cannot run minimax on a stochastic environment.")

    def value_of_action(action: int) -> float:
        next_state, reward, terminal, _ = simulator.step(state, action)
        if terminal:
            if reward > 0:
                return np.inf if maximizing_player else -np.inf
            if reward == 0:
                return 0
            return -np.inf if maximizing_player else np.inf
        return _alphabeta(
            next_state,
            simulator,
            search_depth - 1,
            -np.inf,
            np.inf,
            not maximizing_player,
            heuristic,
        )

    action_space = simulator.action_space.as_discrete
    actions = np.arange(action_space.size)[action_space.action_mask(state)]
    return Query(actions).argmax(lambda a: value_of_action(a))
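The recursive helper `_alphabeta` is not shown in this module's public source. As a rough illustration of the alpha-beta recursion it presumably performs, here is a minimal self-contained sketch over an explicit game tree rather than simulator states; the function name, tree encoding, and signature are all illustrative assumptions, not the library's actual internals:

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, heuristic):
    # Leaves are plain numbers; interior nodes are lists of children.
    if isinstance(node, (int, float)):
        return float(node)
    if depth == 0:
        # Depth cutoff on a non-terminal node: fall back to the heuristic.
        return heuristic(node)
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, heuristic))
            alpha = max(alpha, value)
            if alpha >= beta:  # beta cutoff: the minimizer will avoid this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, heuristic))
        beta = min(beta, value)
        if alpha >= beta:  # alpha cutoff: the maximizer will avoid this branch
            break
    return value

# Toy tree: a maximizing root over two minimizing nodes with leaf values.
# Optimal value: max(min(3, 5), min(2, 9)) = 3; the 9 leaf is pruned.
tree = [[3, 5], [2, 9]]
best = alphabeta(tree, 2, -math.inf, math.inf, True, lambda n: 0.0)
```

In the library, the same pruning logic would expand states via `simulator.step` instead of walking a prebuilt list, which is why `minimax` passes `search_depth - 1` and the flipped `maximizing_player` flag into `_alphabeta`.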
Classes
class Heuristic
-
Base class for heuristic functions. A heuristic evaluates a given state and indicates which player is (believed to be) favored. The heuristic must be symmetric: if a state is assigned value X, the exact opposite state must be assigned -X. This follows from the zero-sum assumption.
Ancestors
- abc.ABC
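To make the symmetry requirement concrete, here is a hypothetical subclass sketch. The abstract method name `evaluate` and the local stand-in base class are assumptions (the real `Heuristic` interface may differ); the point is that a piece-count heuristic over a board encoded as +1/-1/0 is symmetric by construction, since negating the board negates the sum:

```python
import numpy as np
from abc import ABC, abstractmethod

class Heuristic(ABC):
    """Local stand-in for ai.search.minimax.Heuristic; the actual
    abstract interface (and method name) is an assumption here."""

    @abstractmethod
    def evaluate(self, state: np.ndarray) -> float:
        ...

class PieceCountHeuristic(Heuristic):
    """Counts material: +1 cells belong to the maximizing player,
    -1 cells to the opponent. evaluate(-state) == -evaluate(state),
    so the zero-sum symmetry requirement holds by construction."""

    def evaluate(self, state: np.ndarray) -> float:
        return float(state.sum())

h = PieceCountHeuristic()
board = np.array([[1, 0, -1],
                  [0, 1,  0],
                  [0, 0,  1]])
value = h.evaluate(board)  # 3 own pieces minus 1 opponent piece = 2.0
```

A quick check that the symmetry property holds: `h.evaluate(-board)` returns the exact negation of `h.evaluate(board)`.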