Multi-Agent Strategic Reasoning on SimpleHanabi OOD

Collective Avg Normalized Score: 73.1

Gemini-2.5-flash

May 6, 2026
Updated 27d ago

Evaluation Results

Method    Links
2026.05   73.1
2026.05   72.77
2026.05   68.63
2026.05   65.65
2026.05   65.35
2026.05   64.32
2026.05   55.71
2026.05   55.55
2026.05   55.53