Offline Multi-Agent Sequential Decision Making on SMAC unseen tasks

Win Rate (3s vs 3z): 78

DLM-GRPO

Evaluation Results

Method | Links
2026.04
788293100991006469
2026.04
71799010098995764
2026.04
52678854796872
2026.04
4573926957851311