
Multi-Agent Strategic Reasoning on SimpleHanabi OOD

73.1 Collective Avg Normalized Score

Gemini-2.5-flash

Updated 2mo ago

Evaluation Results

Method	Links	Collective Avg Normalized Score
Gemini-2.5-flash 2026.05		73.1
GPT-5-mini 2026.05		72.77
Strat-Reasoner-4B 2026.05		68.63
MARSHAL-4B 2026.05		65.65
Qwen3-32B 2026.05		65.35
Qwen3-8B 2026.05		64.32
Gemma3-12B 2026.05		55.71
SPIRAL-4B 2026.05		55.55
Qwen3-4B 2026.05		55.53