Multi-Agent Game on Tic-Tac-Toe vs. MCTS Bot (100 sims)
[Chart: First-move Normalized Score and Second-move Normalized Score by date. Highlighted point: Strat-Reasoner-4B, First-move Normalized Score 90.77 (May 6, 2026).]
Updated 27d ago
Evaluation Results
Method             Source category  Date     First-move Normalized Score  Second-move Normalized Score
Strat-Reasoner-4B  Open-s...        2026.05  90.77                        81.84
GPT-5-mini         Closed...        2026.05  88.84                        89.73
Gemini-2.5-flash   Closed...        2026.05  86.63                        85.52
Qwen3-32B          Open-s...        2026.05  76.53                        76.42
MARSHAL-4B         Open-s...        2026.05  73.47                        81.55
Gemma3-12B         Open-s...        2026.05  71.61                        73.58
SPIRAL-4B          Open-s...        2026.05  70.45                        69.77
Qwen3-8B           Open-s...        2026.05  69.72                        69.39
Qwen3-4B           Open-s...        2026.05  65.24                        69.56