Reevaluating Policy Gradient Methods for Imperfect-Information Games

Max Rudolph, Nathan Lichtlé, Sobhan Mohammadpour, Alexandre Bayen, J. Zico Kolter, Amy Zhang, Gabriele Farina, Eugene Vinitsky, and Samuel Sokota

Phantom Tic-Tac-Toe

Tic-Tac-Toe: Players take turns placing their symbol (X or O) on a 3x3 grid. The goal is to form a horizontal, vertical, or diagonal line of your symbol. X plays first.

Hex: Players take turns placing their stones (red or blue) on a 3x3 hexagonal board. The goal is to form a chain of adjacent stones connecting the two sides of the board marked with your color. Red plays first.

Phantom: The opponent's moves are hidden. Selecting an occupied cell reveals it.

Abrupt: If a player's move reveals an opponent's symbol, their turn ends.
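The Phantom and Abrupt rules above can be sketched in a few lines of Python. This is a minimal illustration for Tic-Tac-Toe only, not the implementation behind this page: the class name `PhantomTicTacToe`, its fields, and the row-major cell numbering (0 to 8) are assumptions made for the example. It also assumes that in the non-abrupt variant a player who reveals an opponent's symbol keeps the turn and selects again.

```python
from dataclasses import dataclass, field

# Winning lines, assuming cells are numbered 0..8 in row-major order.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


@dataclass
class PhantomTicTacToe:
    """Hypothetical sketch of the rules described above."""

    abrupt: bool = False
    # True board state: None for an empty cell, else the owning player (0 or 1).
    board: list = field(default_factory=lambda: [None] * 9)
    # views[p][c] is what player p knows about cell c (None = not known occupied).
    views: list = field(default_factory=lambda: [[None] * 9, [None] * 9])
    current: int = 0  # player 0 places X and moves first

    def legal_moves(self, player):
        # A player may select any cell they do not yet know to be occupied.
        return [c for c in range(9) if self.views[player][c] is None]

    def winner(self):
        for a, b, c in WIN_LINES:
            if self.board[a] is not None and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, cell):
        """Apply the current player's selection; return True if a symbol was placed."""
        p = self.current
        if cell not in self.legal_moves(p):
            raise ValueError(f"cell {cell} is not legal for player {p}")
        if self.board[cell] is None:
            # Empty cell: place the symbol and pass the turn.
            self.board[cell] = p
            self.views[p][cell] = p
            self.current = 1 - p
            return True
        # The cell holds a hidden opponent symbol: reveal it to the mover.
        self.views[p][cell] = self.board[cell]
        if self.abrupt:
            # Abrupt variant: revealing an opponent's symbol ends the turn.
            self.current = 1 - p
        return False
```

For example, if player 0 places at cell 4 and player 1 then selects cell 4, player 1 learns the cell is taken. With `abrupt=False` player 1 selects again; with `abrupt=True` the turn passes back to player 0.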

Note: You are playing against an AI trained on this version of the game.

Show debug information

Full board
0 1 2
3 4 5
6 7 8

AI's view
0 1 2
3 4 5
6 7 8
