Multi-Agent Game on Tic-Tac-Toe vs. MCTS Bot, 1000 sims
[Chart: First-move Normalized Score and Second-move Normalized Score over time. Highlighted point: Gemini-2.5-flash, First-move Normalized Score 88.32, May 6, 2026.]
Updated 27d ago
Evaluation Results
| Method | Source category | Date | First-move Normalized Score | Second-move Normalized Score |
|---|---|---|---|---|
| Gemini-2.5-flash | Closed... | 2026.05 | 88.32 | 82.53 |
| GPT-5-mini | Closed... | 2026.05 | 82.12 | 87.1 |
| Strat-Reasoner-4B | Open-s... | 2026.05 | 77.6 | 73.12 |
| MARSHAL-4B | Open-s... | 2026.05 | 76.1 | 81.22 |
| SPIRAL-4B | Open-s... | 2026.05 | 74.2 | 67.84 |
| Qwen3-32B | Open-s... | 2026.05 | 70.34 | 76.88 |
| Qwen3-8B | Open-s... | 2026.05 | 68.62 | 72.03 |
| Qwen3-4B | Open-s... | 2026.05 | 66.78 | 68.97 |
| Gemma3-12B | Open-s... | 2026.05 | 65.81 | 80.02 |