Backgammon - The Hybridation

other

A month ago, I boldly set out to master reinforcement learning. Fast-forward: my M1 Mac is still emotionally recovering. After a series of brutal overfits, buggy loops, and countless frozen sessions, I pivoted. Today, I present something more manageable, but still ambitious — a hybrid AI built in Python, blending good old heuristics with Expectiminimax and a neural network sidekick.

The Goal: Crafting a Worthy Opponent

The mission: build an AI that can actually play — not just legally, but smartly. That means:

Understanding the game state.
Making strong decisions under uncertainty.
Reacting to both position and probability.

No pressure.

Core Components

The project is built around several key modules:

1. The Game Engine (`BackgammonGame` Class)

This is the heart of the project, responsible for:

State Representation: Using a NumPy array (BoardArr = NDArray[np.int8]) for the 24 points, alongside variables for the bar counts (white_bar, black_bar) and off counts (white_off, black_off) for each player. The board uses positive numbers for White (O) and negative for Black (X).
Rules Implementation: Methods to validate and execute moves (make_move, make_move_base_logic), handle dice rolls (roll_dice), check for legal moves (get_legal_actions, _get_single_moves_for_die), manage bearing off (_can_bear_off), and determine game state (is_game_over, determine_game_phase).
Utilities: Calculating pip counts (calculate_pip), providing a text-based board representation (draw_board), and generating hashable state representations (board_tuple, compute_zobrist).

2. The AI Decision Engine: Expectiminimax

Unlike deterministic games such as chess or go, backgammon introduces significant randomness through dice rolls. This poses a unique challenge for AI: it’s no longer sufficient to search for the best sequence of moves — the agent must also account for the probabilities of future outcomes at every step.

This is where the Expectiminimax algorithm comes into play.

What Is Expectiminimax?

Expectiminimax is an extension of the classical Minimax algorithm, designed specifically for games involving stochastic elements. It introduces a three-tiered decision structure:

Max nodes: The AI’s turn — it selects the action that maximizes its expected utility.
Min nodes: The opponent’s turn — the AI assumes the opponent plays optimally to minimize the AI’s outcome.
Chance nodes: A dice roll occurs. The AI has no control over this and must compute the expected value over all possible outcomes.

This allows the decision-making process to integrate both strategic adversarial reasoning and probabilistic reasoning.

Implementation Details

The algorithm is implemented recursively, with several optimizations to make it viable in practice.

1. Base Case

1 2	if depth == 0 or game.is_game_over(): return evaluate_position_hybrid(game, maximizing_player)

When the maximum search depth is reached, or the game ends, the current position is evaluated using a hybrid heuristic and neural model.

2. Transposition Table

1
2
3

key = (zobrist(game), depth, player_to_move)
if key in TT:
    return TT[key]

To avoid redundant computation, previously evaluated positions are cached using a Zobrist hash, which efficiently generates a unique identifier for each game state.

3. Sampling Dice Rolls

1	sampled_dice_rolls = _get_random_dice_samples()

Instead of evaluating all 21 possible dice combinations, the algorithm samples a subset. This greatly reduces computational cost while preserving statistical relevance.

4. Expectation over Outcomes

Only 12 of the 21 different possible dice outcomes are thrown (to save some computing time).

Then :

All resulting states are generated.
The algorithm recursively evaluates these states.
The best (or worst) evaluation for that roll is selected depending on the player’s turn.
These scores are averaged to compute the expected value of the dice roll.

This average becomes the value returned by the chance node.

5. Selective Pruning

Although traditional alpha-beta pruning doesn’t directly apply to chance nodes, it is possible to prune within the set of outcomes of a single dice roll, avoiding evaluation of branches that can no longer influence the result.

1 2	if beta <= alpha: break

This pruning helps maintain tractability even in probabilistic branches.

Strategic Value

Expectiminimax allows the AI to make decisions that account for real-world probability distributions, rather than assuming best- or worst-case scenarios. It enables the AI to:

Prefer robust positions that perform well in most dice outcomes.
Assess tactical risks involving opponent re-entries or hits.
Balance risk and reward based on actual statistical likelihood.

This results in more realistic and nuanced play.

3. Heuristic Evaluation (evaluate_position_heuristic)

@dataclass
class HeuristicWeights:
    PIP_SCORE_FACTOR: float = 1.0
    OFF_SCORE_FACTOR: float = 10.0
    HIT_BONUS: float = 30.0
    BAR_PENALTY: float = -20.0
    POINT_BONUS: float = 2.0
    HOME_BOARD_POINT_BONUS: float = 3.0
    INNER_HOME_POINT_BONUS: float = 2.0
    ANCHOR_BONUS: float = 5.0
    FIVE_POINT_BONUS: float = 5.0
    PRIME_BASE_BONUS: float = 4.0
    SIX_PRIME_BONUS: float = 10.0
    TRAPPED_CHECKER_BONUS: float = 8.0
    DIRECT_SHOT_PENALTY_FACTOR: float = -1.5
    BLOT_PENALTY_REDUCTION_IF_OPP_ON_BAR: float = 0.5
    HIT_FAR_BLOT_PENALTY_MULTIPLIER: float = 1.5
    STRATEGIC_BLOT_PENALTY_REDUCTION: float = 0.4
    STACKING_PENALTY_FACTOR: float = -0.5
    BUILDER_BONUS: float = 1.0
    MIDGAME_HOME_PRISON_BONUS: float = 20.0
    CLOSEOUT_BONUS: float = 15.0
    FAR_BEHIND_BACK_CHECKER_PENALTY_FACTOR: float = -0.7
    ENDGAME_BACK_CHECKER_PENALTY_FACTOR: float = -1.5
    RACE_MODE_FACTOR: float = 2.0
    RACE_MODE_OTHER_FACTOR_REDUCTION: float = 0.1

4. Pruning the Search Tree : Neural Network Guidance

In the context of a complex game like backgammon, exhaustive search using algorithms such as Expectiminimax can quickly become computationally infeasible, especially as the search depth increases and chance nodes multiply. To mitigate this, a lightweight neural network is used to assist the evaluation process — not to play directly, but to guide the search more effectively.

Purpose of the Neural Network

Unlike conventional approaches where the network tries to predict the best move directly, this model serves a more specific and supportive role: it evaluates the quality of a given board position.

Its primary function is to score terminal nodes in the Expectiminimax tree, helping the algorithm to:

Prioritize promising branches early,
Prune unpromising lines of play more aggressively,
Reduce the overall computational load without compromising decision quality.

Training Data

The network was trained via supervised learning on 200,000 games automatically generated using GNU Backgammon (gnubg).

The training script and dataset generation process are available on the GitHub repository.
For each position, the target value corresponds to a win probability or position score extracted from gnubg analysis.
This provides the network with a dense and high-quality representation of typical and atypical game states.

Model Architecture

The model is a relatively simple Multi-Layer Perceptron (MLP) implemented with PyTorch:

class MiniMaxHelperNet(nn.Module):
    def __init__(self, input_dim=54):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1)
        )
    def forward(self, x):
        return self.net(x)

The input is a 54-dimensional vector encoding the state of the board and game metadata.
The output is a single scalar value: the estimated utility of the position from the perspective of the current player.

Integration into Minimax

During search, the AI evaluates a position using a hybrid approach:

A heuristic evaluation based on hand-crafted rules.
A neural score predicted by the trained model.
A weighted average of the two, controlled by a parameter NN_WEIGHT.

def evaluate_position_hybrid(game_state, player_to_evaluate):
    phase = game_state.determine_game_phase()
    weights = PHASE_WEIGHTS.get(phase, MIDGAME_WEIGHTS)
    h_val = game_state.evaluate_position_heuristic(player_to_evaluate, weights)

    nn_val = 0.0
    if NET is not None and NN_WEIGHT > 0.0:
        try:
            with torch.no_grad():
                t = _tensor_for_player(game_state, player_to_evaluate)
                nn_pred = NET(t.unsqueeze(0))
                nn_val = nn_pred.item()
        except Exception as e:
            print(f"NN evaluation error: {e}")
            return h_val

    return (1.0 - NN_WEIGHT) * h_val + NN_WEIGHT * nn_val

This allows the search to benefit from both structured knowledge (heuristics) and pattern recognition (NN) without relying entirely on either.

Benefits

Faster pruning: The network helps eliminate weak lines of play earlier in the search.
Better positional awareness: Especially in mid-game, the NN captures patterns the heuristic may miss.
Flexible weighting: By adjusting NN_WEIGHT, the system can shift between a rule-based or data-driven evaluation depending on the phase of the game.

Limitations and Future Work

The network is small by design; future iterations could experiment with larger models or different architectures (e.g., convolutional or attention-based).
The training data is limited to the static analysis of gnubg. Self-play or reinforcement fine-tuning could improve generalization.
Better integration of temporal factors (e.g., cube decisions, match score) is an open area of exploration.

Summary

This neural model is not the brain of the AI — it’s the intuition. It doesn’t choose the move, but it helps the AI know which parts of the tree are worth exploring. In that sense, it acts like a positional compass: simple, fast, and trained on a massive base of strong play.

Try It Yourself!

The complete code, including the Cython extensions and the pre-trained neural network model, is available on GitHub.

👉 GitHub Repository: https://github.com/Nikos-Prinios/bkg-NN

Happy backgammoning!