
Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings

About

Learning to coordinate many agents in partially observable and highly dynamic environments requires both informative representations and data-efficient training. To address this challenge, we present a novel model-based multi-agent reinforcement learning framework that unifies joint state-action representation learning with imaginative roll-outs. We design a world model trained with variational auto-encoders and augment the model using the state-action learned embedding (SALE). SALE is injected into both the imagination module that forecasts plausible future roll-outs and the joint agent network whose individual action values are combined through a mixing network to estimate the joint action-value function. By coupling imagined trajectories with SALE-based action values, the agents acquire a richer understanding of how their choices influence collective outcomes, leading to improved long-term planning and optimization under limited real-environment interactions. Empirical studies on well-established multi-agent benchmarks, including StarCraft II Micro-Management, Multi-Agent MuJoCo, and Level-Based Foraging challenges, demonstrate consistent gains of our method over baseline algorithms and highlight the effectiveness of joint state-action learned embeddings within a multi-agent model-based paradigm.
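The data flow described above (per-agent state-action embeddings feeding individual action values, which a mixing network then combines into a joint value) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a TD7-style pair of encoders for the embedding and a QMIX-style monotonic mixer with non-negative, state-conditioned weights, neither of which is specified in the abstract. All class and function names (`ToyAgent`, `init_layer`, `mix`) are hypothetical, the weights are random and untrained, and the world model and imagined roll-outs are omitted.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Randomly initialised dense layer, returned as (weights, bias)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply a dense layer; each row of `weights` produces one output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyAgent:
    """Hypothetical per-agent network: embeds (obs, action), then scores it."""

    def __init__(self, rng, obs_dim, n_actions, embed_dim):
        self.n_actions = n_actions
        self.state_enc = init_layer(rng, obs_dim, embed_dim)              # z_s = f(obs)
        self.sa_enc = init_layer(rng, embed_dim + n_actions, embed_dim)   # z_sa = g(z_s, a)
        self.q_head = init_layer(rng, obs_dim + embed_dim, 1)             # Q(obs, z_sa)

    def embed(self, obs, action):
        """Assumed state-action embedding: encode the state, then the pair."""
        z_s = [math.tanh(v) for v in linear(*self.state_enc, obs)]
        sa_input = z_s + one_hot(action, self.n_actions)
        return [math.tanh(v) for v in linear(*self.sa_enc, sa_input)]

    def q_values(self, obs):
        """One action value per discrete action, each conditioned on its embedding."""
        return [linear(*self.q_head, obs + self.embed(obs, a))[0]
                for a in range(self.n_actions)]


def mix(agent_qs, global_state, hyper):
    """Assumed monotonic mixer: state-conditioned weights made non-negative."""
    weights = [abs(w) for w in linear(*hyper, global_state)]
    return sum(w * q for w, q in zip(weights, agent_qs))


rng = random.Random(0)
agents = [ToyAgent(rng, obs_dim=4, n_actions=3, embed_dim=5) for _ in range(2)]
observations = [[rng.uniform(-1, 1) for _ in range(4)] for _ in agents]
global_state = observations[0] + observations[1]   # toy stand-in for the true state
hyper = init_layer(rng, len(global_state), len(agents))

per_agent_q = [agent.q_values(obs) for agent, obs in zip(agents, observations)]
greedy_actions = [max(range(len(q)), key=q.__getitem__) for q in per_agent_q]
chosen_qs = [q[a] for q, a in zip(per_agent_q, greedy_actions)]
q_tot = mix(chosen_qs, global_state, hyper)
print("greedy joint action:", greedy_actions, "Q_tot:", q_tot)
```

Because the mixing weights in this sketch are non-negative, each agent picking its own best action also maximises the mixed value, which is the property that makes decentralised action selection consistent with the joint estimate under this assumed mixer.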

Zhizun Wang, David Meger • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-Agent Reinforcement Learning | SMAC v2 (test) | Win Rate (Protoss 5 Units) | 81 | 20
Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-3p-5f v2 (test) | Final Episode Return | 53 | 10
Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-4p-3f v2 (test) | Final Episode Return | 88 | 10
Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-8x8-2p-2f-coop v2 (test) | Final Episode Return | 93 | 10
Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-10x10-3p-3f v2 (test) | Final Episode Return | 86 | 10
Multi-Agent Reinforcement Learning | MAMuJoCo HalfCheetah 6x1 (test) | Average Episodic Return | 43.1 | 8
Multi-Agent Reinforcement Learning | MAMuJoCo Hopper 3x1 (test) | Average Episodic Return | 31.02 | 8
Multi-Agent Reinforcement Learning | MAMuJoCo Ant 8x1 (test) | Average Episodic Return | 45.06 | 8
Multi-Agent Reinforcement Learning | MAMuJoCo Walker2d 6x1 (test) | Average Episodic Return | 28.56 | 8
