Offline Multi-Agent Sequential Decision Making on LBF 11x11-6p-4f

Win Rate: 96

DLM-GRPO

25.2843.646280.36
Apr 26, 2026
Updated 1mo ago

Evaluation Results

Date      Win Rate
2026.04   96
2026.04   91
2026.04   85
2026.04   85
2026.04   77
2026.04   69
2026.04   30
2026.04   28