Multi-Agent Strategic Reasoning on LeducHoldem OOD

First-mover Normalized Score: 83.51

Gemini-2.5-flash

Updated 2mo ago

Evaluation Results

Method	First-mover Normalized Score	Second reported score
Gemini-2.5-flash 2026.05	83.51	96.96
GPT-5-mini 2026.05	80.6	95.08
Strat-Reasoner-4B 2026.05	70.12	66.64
Qwen3-32B 2026.05	67.6	67.51
MARSHAL-4B 2026.05	66.42	69.22
Qwen3-4B 2026.05	61.35	56.18
Qwen3-8B 2026.05	60.93	64.41
Gemma3-12B 2026.05	60.55	60.13
SPIRAL-4B 2026.05	56.21	55.85