DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making

About

Building scalable and reusable multi-agent decision policies from offline datasets remains a challenge in offline multi-agent reinforcement learning (MARL), as existing methods often rely on fixed observation formats and action spaces that limit generalization. In contrast, large language models (LLMs) offer a flexible modeling interface that can naturally accommodate heterogeneous observations and actions. Motivated by this, we propose the Decision Language Model (DLM), which formulates multi-agent decision making as a dialogue-style sequence prediction problem under the centralized training with decentralized execution paradigm. DLM is trained in two stages: a supervised fine-tuning phase, which leverages dialogue-style datasets for centralized training with inter-agent context and generates executable actions from offline trajectories, followed by a group relative policy optimization phase to enhance robustness to out-of-distribution actions through lightweight reward functions. Experiments on multiple benchmarks show that a unified DLM outperforms strong offline MARL baselines and LLM-based conversational decision-making methods, while demonstrating strong zero-shot generalization to unseen scenarios across tasks.
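The second training stage can be illustrated with a minimal sketch of the group-relative advantage computation that gives group relative policy optimization its name: several candidate actions are sampled for the same prompt, each is scored, and each score is normalized against its own group. Everything specific here is an assumption made for illustration, not the authors' implementation: the `legality_reward` function, the action names, and the use of the population standard deviation are all hypothetical, and the abstract does not say what the lightweight reward functions actually are.

```python
from statistics import mean, pstdev


def legality_reward(action: str, legal_actions: set[str]) -> float:
    """Hypothetical lightweight reward: 1.0 if the action is executable, else 0.0."""
    return 1.0 if action in legal_actions else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its own sample group.

    Assumption: population standard deviation; eps avoids division by zero
    when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of actions sampled for one agent at one decision step.
legal = {"move_north", "move_south", "attack_enemy_1"}
sampled = ["move_north", "attack_enemy_7", "attack_enemy_1", "fly"]

rewards = [legality_reward(a, legal) for a in sampled]
advantages = group_relative_advantages(rewards)

for action, reward, adv in zip(sampled, rewards, advantages):
    print(f"{action:16s} reward={reward:.1f} advantage={adv:+.3f}")
```

In this sketch, out-of-distribution actions (ones the environment could not execute) end up with negative advantages relative to the executable samples in the same group, which is one plausible way a cheap rule-based reward could push the policy away from such outputs.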

Zhuohui Zhang, Bin Cheng, Bin He • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Offline Multi-Agent Sequential Decision Making | LBF 11×11-6p-4f | Win Rate: 96 | 8 |
| Offline Multi-Agent Reinforcement Learning | SMAC | 3s5z Win Rate: 97 | 5 |
| Offline Multi-Agent Sequential Decision Making | SMAC unseen tasks | Win Rate (3s vs 3z): 78 | 4 |
| Offline Multi-Agent Sequential Decision Making | SMAC unseen tasks v2 | Win Rate (Protoss 5v5): 67 | 4 |
