Offline Multi-Agent Sequential Decision Making on SMAC unseen tasks
[Chart: Win Rate (3s vs 3z); top entry DLM-GRPO, 78, Apr 26, 2026]
Updated 1mo ago
Evaluation Results
                                              Win Rate
Method    Setting                    Date     3s vs 3z  3s vs 4z  3m   8m   25m  MMM  1o 10b vs 1r  1o 2r vs 4r
DLM-GRPO  Evaluation Protocol=Ze...  2026.04  78        82        93   100  99   100  64            69
DLM-SFT   Evaluation Protocol=Ze...  2026.04  71        79        90   100  98   99   57            64
GATO      Evaluation Protocol=Ze...  2026.04  52        67        88   54   79   68   7             2
MADT      Evaluation Protocol=Ze...  2026.04  45        73        92   69   57   85   13            11