
Latent Action World Models for Control with Unlabeled Trajectories

About

Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely on action-conditioned trajectories, which limits their effectiveness when action labels are scarce. We introduce a family of latent-action world models that jointly use action-conditioned and action-free data by learning a shared latent action representation. This latent space aligns observed control signals with actions inferred from passive observations, enabling a single dynamics model to train on large-scale unlabeled trajectories while requiring only a small set of action-labeled ones. We use the latent-action world model to learn a latent-action policy through offline reinforcement learning (RL), thereby bridging two traditionally separate domains: offline RL, which typically relies on action-conditioned data, and action-free training, which is rarely followed by RL. On the DeepMind Control Suite, our approach achieves strong performance while using about an order of magnitude fewer action-labeled samples than purely action-conditioned baselines. These results show that latent actions allow world models to learn more efficiently by training on both passive and interactive data.
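The training recipe described above can be illustrated with a minimal scalar sketch. Everything here is an assumption for illustration, not the paper's architecture: a linear toy environment, and three scalar "models" (`w_idm`, `w_dyn`, `w_act`). An inverse dynamics model infers latent actions from state pairs on action-free data, a shared latent dynamics model trains on those inferred latents, and a small action-labeled set aligns true actions into the same latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear environment standing in for a control task: s' = s + 0.5 * a.
def env_step(s, a):
    return s + 0.5 * a

# Three scalar "networks" (hypothetical names, for illustration only):
w_act = 0.1  # action encoder:         z = w_act * a         (needs action labels)
w_idm = 0.1  # inverse dynamics model: z = w_idm * (s' - s)  (action-free)
w_dyn = 0.1  # latent dynamics:        s'_pred = s + w_dyn * z

lr = 0.05
for _ in range(2000):
    # Action-free transition: infer the latent action with the inverse
    # model and train the shared latent dynamics on it.
    s, a = rng.normal(), rng.normal()
    s2 = env_step(s, a)
    z = w_idm * (s2 - s)
    err = (s + w_dyn * z) - s2
    w_dyn -= lr * err * z                 # grad of 0.5*err**2 w.r.t. w_dyn
    w_idm -= lr * err * w_dyn * (s2 - s)  # grad w.r.t. w_idm (through z)

    # Action-labeled transition: align the action encoder's latent with
    # the latent the inverse model infers from the same transition.
    s, a = rng.normal(), rng.normal()
    s2 = env_step(s, a)
    err = w_act * a - w_idm * (s2 - s)
    w_act -= lr * err * a

# Encoding a real action and rolling the latent dynamics should now
# approximate the environment step.
s, a = 0.3, 1.0
pred_err = abs((s + w_dyn * (w_act * a)) - env_step(s, a))
print(f"one-step prediction error: {pred_err:.4f}")
```

The key property the sketch shows is that the dynamics model never sees a ground-truth action: it trains entirely on inverse-model latents from unlabeled transitions, while the labeled alignment loss is what lets real control signals be mapped into the same latent space afterward.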

Marvin Alles, Xingyuan Zhang, Patrick van der Smagt, Philip Becker-Ehmck • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | walker2d medium | Normalized Score | 91.4 | 51 |
| Offline Reinforcement Learning | walker2d medium-replay | Normalized Score | 75.9 | 50 |
| Offline Reinforcement Learning | halfcheetah medium-replay | Normalized Score | 68.4 | 43 |
| Offline Reinforcement Learning | DMControl walker-walk (expert) | Normalized Score | 94.3 | 12 |
| Offline Reinforcement Learning | DMControl cheetah-run (expert) | Normalized Score | 52.4 | 12 |
| Offline Reinforcement Learning | DeepMind Control Suite hopper-stand medium | Mean Normalized Return | 65 | 6 |
| Offline Reinforcement Learning | DeepMind Control Suite hopper-stand plan2explore | Mean Normalized Return | 54.1 | 6 |
| Offline Reinforcement Learning | DeepMind Control Suite walker-walk plan2explore | Mean Normalized Return | 81.9 | 6 |
| Offline Reinforcement Learning | DeepMind Control Suite cheetah-run plan2explore | Mean Normalized Return | 26.5 | 6 |
| Offline Reinforcement Learning | DeepMind Control Suite hopper-stand medium-replay | Mean Normalized Return | 46.9 | 6 |

(Showing 10 of 12 rows.)
