Offline Multi-Agent Sequential Decision Making on SMAC unseen tasks

Win Rate (3s vs 3z): 78

DLM-GRPO

Updated 2mo ago

Evaluation Results

Method	Links
DLM-GRPO 2026.04		78	82	93	100	99	100	64	69
DLM-SFT 2026.04		71	79	90	100	98	99	57	64
GATO 2026.04		52	67	88	54	79	68	7	2
MADT 2026.04		45	73	92	69	57	85	13	11