AdaWorld: Learning Adaptable World Models with Latent Actions

About

World models aim to learn action-controlled future prediction and have proven essential for the development of intelligent agents. However, most existing world models rely heavily on substantial action-labeled data and costly training, making it challenging to adapt to novel environments with heterogeneous actions through limited interactions. This limitation can hinder their applicability across broader domains. To overcome this limitation, we propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. This is achieved by extracting latent actions from videos in a self-supervised manner, capturing the most critical transitions between frames. We then develop an autoregressive world model that conditions on these latent actions. This learning paradigm enables highly adaptable world models, facilitating efficient transfer and learning of new actions even with limited interactions and finetuning. Our comprehensive experiments across multiple environments demonstrate that AdaWorld achieves superior performance in both simulation quality and visual planning.
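The two components described above, a latent action extractor and an autoregressive model conditioned on its output, can be illustrated with a toy sketch. This is not AdaWorld's architecture: the random linear maps below stand in for trained networks, and every name (`FRAME_DIM`, `LATENT_DIM`, `encode_latent_action`, `predict_next`, `rollout`) is hypothetical, chosen only to show how the pieces connect.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM = 16   # size of a flattened toy "frame" (assumption)
LATENT_DIM = 4   # size of a latent action (assumption)

# Hypothetical, untrained linear maps standing in for learned networks.
W_enc = rng.normal(scale=0.1, size=(2 * FRAME_DIM, LATENT_DIM))
W_dyn = rng.normal(scale=0.1, size=(FRAME_DIM + LATENT_DIM, FRAME_DIM))


def encode_latent_action(frame_t, frame_next):
    """Map a pair of consecutive frames to a compact latent action."""
    pair = np.concatenate([frame_t, frame_next])
    return np.tanh(pair @ W_enc)


def predict_next(frame_t, latent_action):
    """Predict the next frame from the current frame and a latent action."""
    x = np.concatenate([frame_t, latent_action])
    return frame_t + x @ W_dyn  # residual update on the current frame


def rollout(first_frame, latent_actions):
    """Autoregressive rollout: each prediction becomes the next input."""
    frames = [first_frame]
    for z in latent_actions:
        frames.append(predict_next(frames[-1], z))
    return np.stack(frames)


# Extract latent actions from a toy "video" (no action labels needed),
# then replay them from the first frame.
video = rng.normal(size=(5, FRAME_DIM))
actions = [encode_latent_action(video[t], video[t + 1]) for t in range(4)]
predicted = rollout(video[0], actions)
print(predicted.shape)  # (5, 16)
```

In a trained system, the low-dimensional bottleneck would push the encoder to keep only what changes between frames. The same latent actions could then be replayed from a different starting frame, or mapped to a new environment's real actions with a small amount of labeled data.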

Shenyuan Gao, Siyuan Zhou, Yilun Du, Jun Zhang, Chuang Gan • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 8-step sequence generation | SS v2 (test) | SSIM | 76.3 | 10 |
| 8-step sequence generation | COIL-100 (OOD) | SSIM | 77.8 | 8 |
| Video Generation | SS v2 | SSIM | 67.4 | 7 |
| World Model Prediction | Bigfish | PSNR | 30.3 | 7 |
| World Model Prediction | Starpilot | PSNR | 27.8 | 7 |
| Video Generation | RT-1 | SSIM | 63.4 | 7 |
| World Model Prediction | Leaper | PSNR | 23.3 | 7 |
| World Model Prediction | MultiGrid | PSNR | 24 | 7 |
| World Model Prediction | nuPlan | PSNR | 18.1 | 6 |
| Video-level world modeling | robosuite Push | PSNR (10 steps) | 26.6647 | 6 |
Showing 10 of 16 rows
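For readers unfamiliar with the PSNR metric reported in several rows, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The page does not say how frames are normalized or how scores are averaged across frames and rollout steps, so the `max_val=1.0` default and the single-array interface below are assumptions, not the benchmark's evaluation protocol.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped arrays.

    max_val is the largest possible pixel value (1.0 for frames scaled
    to [0, 1], 255 for 8-bit images); which one applies here is an assumption.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)


# A uniform error of 0.1 on [0, 1] frames gives MSE = 0.01, i.e. about 20 dB.
target = np.zeros((8, 8))
pred = np.full((8, 8), 0.1)
score = psnr(pred, target)
```

Higher is better. Because the scale is logarithmic, a 10 dB gain corresponds to a tenfold reduction in mean squared error.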
