Multi-Agent Strategic Reasoning on SimpleHanabi OOD
Gemini-2.5-flash: 73.1 Collective Avg Normalized Score (May 6, 2026)
Updated 27d ago
Evaluation Results
Method              Model Source   Date      Collective Avg Normalized Score
Gemini-2.5-flash    Closed-so...   2026.05   73.1
GPT-5-mini          Closed-so...   2026.05   72.77
Strat-Reasoner-4B   Open-sour...   2026.05   68.63
MARSHAL-4B          Open-sour...   2026.05   65.65
Qwen3-32B           Open-sour...   2026.05   65.35
Qwen3-8B            Open-sour...   2026.05   64.32
Gemma3-12B          Open-sour...   2026.05   55.71
SPIRAL-4B           Open-sour...   2026.05   55.55
Qwen3-4B            Open-sour...   2026.05   55.53