Offline Multi-Agent Reinforcement Learning on SMAC

3s5z Win Rate: 97

DLM-GRPO

Updated 2mo ago

Evaluation Results

Method	Links
DLM-GRPO 2026.04		97	100	80	94	75	92
DLM-SFT 2026.04		94	98	72	84	67	81
MADT 2026.04		81	88	67	72	53	64
GATO 2026.04		72	92	63	65	37	56
SAYCAN 2026.04		5	10	7	8	2	0