
DiWA: Diffusion Policy Adaptation with World Models

About

Fine-tuning diffusion policies with reinforcement learning (RL) presents significant challenges. The long denoising sequence for each action prediction impedes effective reward propagation. Moreover, standard RL methods require millions of real-world interactions, posing a major bottleneck for practical fine-tuning. Although prior work frames the denoising process in diffusion policies as a Markov Decision Process to enable RL-based updates, its strong dependence on environment interaction remains highly inefficient. To bridge this gap, we introduce DiWA, a novel framework that leverages a world model for fine-tuning diffusion-based robotic skills entirely offline with reinforcement learning. Unlike model-free approaches that require millions of environment interactions to fine-tune a repertoire of robot skills, DiWA achieves effective adaptation using a world model trained once on a few hundred thousand offline play interactions. This results in dramatically improved sample efficiency, making the approach significantly more practical and safer for real-world robot learning. On the challenging CALVIN benchmark, DiWA improves performance across eight tasks using only offline adaptation, while requiring orders of magnitude fewer physical interactions than model-free baselines. To our knowledge, this is the first demonstration of fine-tuning diffusion policies for real-world robotic skills using an offline world model. We make the code publicly available at https://diwa.cs.uni-freiburg.de.
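The reward-propagation issue mentioned above can be illustrated with a small, hedged sketch. It assumes that each environment action is produced by K denoising steps, that the environment reward is credited only to the final denoising step of each action, and that a single discount factor applies to every unrolled step. These are simplifications chosen for illustration, and the function names `unrolled_rewards` and `discounted_returns` are hypothetical; none of this is taken from the DiWA code.

```python
from typing import List


def unrolled_rewards(env_rewards: List[float], num_denoise_steps: int) -> List[float]:
    """Unroll environment rewards over the denoising chain.

    Assumption: every action takes `num_denoise_steps` denoising steps, and
    only the last denoising step of each action receives the reward.
    """
    rewards: List[float] = []
    for r in env_rewards:
        rewards.extend([0.0] * (num_denoise_steps - 1))  # intermediate denoising steps
        rewards.append(r)                                # final denoising step
    return rewards


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute the discounted return at every step of the unrolled chain."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


if __name__ == "__main__":
    env_rewards = [0.0, 0.0, 1.0]   # sparse success reward after 3 environment steps
    K = 10                          # denoising steps per action (illustrative value)
    rewards = unrolled_rewards(env_rewards, K)
    returns = discounted_returns(rewards, gamma=0.99)
    print(len(rewards))             # effective horizon: 3 * 10 = 30 decision steps
    print(returns[0], returns[-1])  # return at the first vs. last denoising step
```

Under these assumptions, three environment steps become a 30-step decision chain, so a sparse success signal must travel through ten times as many steps before it reaches the first denoising decision.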

Akshay L Chandra, Iman Nematollahi, Chenguang Huang, Tim Welschehold, Wolfram Burgard, Abhinav Valada • 2025

Related benchmarks

Task                  Dataset                         Metric        Result  Rank
Robotic Manipulation  Meta-World v2                   Success Rate  59.8    14
turn off lightbulb    CALVIN                          Success Rate  82.33   6
Video Prediction      Meta-World (Policy rollouts)    FVD           644.8   5
Video Prediction      Meta-World Random rollouts      FVD           880.2   5
Close Drawer          CALVIN                          Success Rate  91.95   3
move slider left      CALVIN                          Success Rate  83.33   3
open drawer           CALVIN                          Success Rate  74.44   3
turn on LED           CALVIN                          Success Rate  86.21   3
turn on lightbulb     CALVIN                          Success Rate  91.92   3
move slider right     CALVIN                          Success Rate  82.76   3
