
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning

About

Recent advances in multi-agent reinforcement learning (MARL) have demonstrated success in numerous challenging domains and environments, but typically require a specialized model for each task. In this work, we propose a coherent methodology that enables a single GPT-based model to learn and perform well across diverse MARL environments and tasks, including the StarCraft Multi-Agent Challenge, Google Research Football, and POGEMA. Our method, MARL-GPT, applies offline reinforcement learning at scale to expert trajectories (400M for SMACv2, 100M for GRF, and 1B for POGEMA), combined with a single transformer-based observation encoder that requires no task-specific tuning. Experiments show that MARL-GPT achieves performance competitive with specialized baselines in all tested environments. Our findings thus suggest that it is possible to build a multi-task transformer-based model for a wide variety of significantly different multi-agent problems, paving the way toward a foundation MARL model (akin to ChatGPT, Llama, Mistral, etc. in natural language modeling).
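The abstract describes two key ingredients: a shared transformer-based observation encoder used across all tasks, and a GPT-style model trained by offline RL on expert trajectories. The following is a minimal sketch of that architecture in PyTorch; all names (`ObsEncoder`, `MARLGPTPolicy`, the dimensions, and the behavior-cloning loss) are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of a MARL-GPT-style multi-task policy.
# Assumption: observations from any environment are tokenized into a
# sequence of fixed-size feature vectors; the paper's actual tokenization
# and training objective may differ.
import torch
import torch.nn as nn

class ObsEncoder(nn.Module):
    """Single transformer observation encoder shared across tasks:
    pools a variable-length sequence of observation tokens into one
    embedding, with no task-specific tuning."""
    def __init__(self, feat_dim: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, obs_tokens: torch.Tensor) -> torch.Tensor:
        # obs_tokens: (batch, n_tokens, feat_dim) -> (batch, d_model)
        h = self.encoder(self.proj(obs_tokens))
        return h.mean(dim=1)  # mean-pool over observation tokens

class MARLGPTPolicy(nn.Module):
    """GPT-style causally masked transformer over per-step observation
    embeddings, producing per-step action logits."""
    def __init__(self, feat_dim: int, n_actions: int, d_model: int = 64):
        super().__init__()
        self.obs_encoder = ObsEncoder(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, traj_obs: torch.Tensor) -> torch.Tensor:
        # traj_obs: (batch, T, n_tokens, feat_dim)
        b, t = traj_obs.shape[:2]
        emb = self.obs_encoder(traj_obs.flatten(0, 1)).view(b, t, -1)
        causal = nn.Transformer.generate_square_subsequent_mask(t)
        return self.head(self.backbone(emb, mask=causal))  # action logits

# Offline training step on expert trajectories (behavior cloning as a
# stand-in for the paper's offline RL objective):
policy = MARLGPTPolicy(feat_dim=8, n_actions=5)
obs = torch.randn(2, 10, 6, 8)          # 2 trajectories, 10 steps, 6 tokens
logits = policy(obs)                    # (2, 10, 5) per-step action logits
actions = torch.randint(0, 5, (2, 10))  # expert actions from the dataset
loss = nn.functional.cross_entropy(logits.flatten(0, 1), actions.flatten())
```

In this sketch the causal mask gives the GPT-style left-to-right structure over the trajectory, while the shared `ObsEncoder` is what lets heterogeneous environments feed one backbone.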

Maria Nesterova, Mikhail Kolosov, Anton Andreychuk, Egor Cherepanov, Oleg Bulichev, Alexey Kovalev, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik• 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-Agent Reinforcement Learning | SMAC v2 | Protoss 5v5 Win Rate: 89 | 7 |
| Multi-Agent Pathfinding | POGEMA | Average Throughput (Random): 1.16 | 7 |
| Multi-Agent Reinforcement Learning | Google Research Football (GRF) | Win Rate (Pass and Shoot): 96 | 7 |
