Module ai.search.minimax

Minimax with alpha-beta pruning. See function minimax().

"""Minimax with alpha and beta pruning. See function `minimax`."""

from ._heuristic import Heuristic
from ._minimax import minimax


__all__ = ["Heuristic", "minimax"]

Functions

def minimax(state: numpy.ndarray, simulator: Base, search_depth: int, heuristic: Heuristic, maximizing_player: bool) ‑> int

Runs the minimax algorithm from the given state in the given simulator. The simulator is expected to simulate a two-player zero-sum game, where a terminal state with positive reward indicates a win and one with negative reward a loss.

Args

state : np.ndarray
Current state.
simulator : simulators.Base
Simulator instance.
search_depth : int
Maximum search depth.
heuristic : minimax.Heuristic
Heuristic function.
maximizing_player : bool
True if the player to move aims to maximize the heuristic.

Returns

int
Best available action.
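The alternation between maximizing and minimizing players described above can be illustrated on a toy game tree. The sketch below is independent of this module: it uses a plain nested list instead of the `simulators.Base` API, with leaves holding heuristic values.

```python
# Toy illustration of plain minimax on a hand-built game tree
# (not this module's API). Leaves are numbers; internal nodes are lists.

def toy_minimax(node, maximizing):
    """Return the minimax value of `node` for the player to move."""
    if isinstance(node, (int, float)):  # leaf: already evaluated
        return node
    values = (toy_minimax(child, not maximizing) for child in node)
    return max(values) if maximizing else min(values)

# Depth-2 tree: the maximizer picks the branch with the best minimizer reply.
tree = [[3, 5], [2, 9]]
print(toy_minimax(tree, maximizing=True))  # -> 3, i.e. max(min(3, 5), min(2, 9))
```

The actual `minimax` function additionally maps terminal rewards to ±infinity and restricts the search to the actions permitted by the simulator's action mask.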
def minimax(
    state: np.ndarray,
    simulator: simulators.Base,
    search_depth: int,
    heuristic: minimax.Heuristic,
    maximizing_player: bool
) -> int:
    """Runs the minimax algorithm from the given state in the given simulator. The
    simulator is expected to be one a two player zero-sum game, where a terminal state
    with positive reward indicates a win and a negative reward a loss.

    Args:
        state (np.ndarray): Current state.
        simulator (simulators.Base): Simulator instance.
        search_depth (int): Maximum search depth.
        heuristic (minimax.Heuristic): Heuristic function.
        maximizing_player (bool): True if the player to move aims to maximize
            the heuristic.

    Returns:
        int: Best available action.
    """

    if not simulator.deterministic:
        raise ValueError("Cannot run minimax on a stochastic environment.")

    def value_of_action(action: int) -> float:
        next_state, reward, terminal, _ = simulator.step(state, action)
        if terminal:
            if reward > 0:
                return np.inf if maximizing_player else -np.inf
            if reward == 0:
                return 0
            return -np.inf if maximizing_player else np.inf
        return _alphabeta(
            next_state, simulator, search_depth - 1,
            -np.inf, np.inf, not maximizing_player, heuristic,
        )

    action_space = simulator.action_space.as_discrete
    actions = np.arange(action_space.size)[action_space.action_mask(state)]
    return Query(actions).argmax(value_of_action)
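The helper `_alphabeta` called above is not shown on this page. A minimal, self-contained sketch of the alpha-beta recursion it presumably implements is given below; it again operates on a plain nested-list tree rather than the simulator API, so the depth and heuristic arguments are omitted.

```python
# Self-contained sketch of alpha-beta pruning on a nested-list game tree.
# Leaves are numbers; internal nodes are lists of children.
import math

def alphabeta(node, alpha, beta, maximizing):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # remaining siblings cannot improve the outcome
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

tree = [[3, 5], [2, 9]]
print(alphabeta(tree, -math.inf, math.inf, True))  # -> 3, same as plain minimax
```

Pruning never changes the value returned at the root; it only avoids evaluating subtrees that the `alpha >= beta` bound proves irrelevant.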

Classes

class Heuristic

Base class for heuristic functions. A heuristic evaluates a given state and indicates which player is (believed to be) favored. The heuristic must be symmetric: if a state is assigned value X, the exact opposite state (players swapped) must be assigned -X. This follows from the zero-sum assumption.

Ancestors

  • abc.ABC
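The symmetry requirement can be checked concretely. The sketch below defines a hypothetical heuristic, a simple piece-count difference on a board of +1 (own), -1 (opponent), and 0 (empty) cells, and verifies that negating the board (swapping the players) negates the value. The `__call__` interface is an assumption for illustration, not necessarily this module's actual abstract method.

```python
import numpy as np

class PieceCountHeuristic:
    """Hypothetical symmetric heuristic: own pieces (+1) minus opponent pieces (-1)."""

    def __call__(self, state: np.ndarray) -> float:
        return float(state.sum())

h = PieceCountHeuristic()
state = np.array([[1, -1, 0],
                  [1, 1, -1]])
# Symmetry: the exact opposite state gets the exact opposite value.
assert h(-state) == -h(state)
print(h(state))  # -> 1.0
```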