Offline Multi-Agent Sequential Decision Making on LBF 11x11-6p-4f
[Figure: Win Rate chart. Highlighted point: DLM-GRPO, Win Rate 96, Apr 26, 2026]
Updated 1mo ago
Evaluation Results
Method      Date      Win Rate
DLM-GRPO    2026.04   96
DLM-SFT     2026.04   91
MADT        2026.04   85
OMIGA       2026.04   85
CFCQL       2026.04   77
MACQL       2026.04   69
TD3+BC      2026.04   30
BC          2026.04   28