Multi-Agent Strategic Reasoning on LeducHoldem OOD
[Chart: First-mover Normalized Score and Second-mover Normalized Score by model. Highlighted point: Gemini-2.5-flash, First-mover Normalized Score 83.51. May 6, 2026.]
Evaluation Results
Method             Model Source  Date     First-mover Normalized Score  Second-mover Normalized Score
Gemini-2.5-flash   Closed-so...  2026.05  83.51                         96.96
GPT-5-mini         Closed-so...  2026.05  80.6                          95.08
Strat-Reasoner-4B  Open-sour...  2026.05  70.12                         66.64
Qwen3-32B          Open-sour...  2026.05  67.6                          67.51
MARSHAL-4B         Open-sour...  2026.05  66.42                         69.22
Qwen3-4B           Open-sour...  2026.05  61.35                         56.18
Qwen3-8B           Open-sour...  2026.05  60.93                         64.41
Gemma3-12B         Open-sour...  2026.05  60.55                         60.13
SPIRAL-4B          Open-sour...  2026.05  56.21                         55.85