Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
About
Recent work has framed decision-making as a sequence modeling problem using generative models such as diffusion models. Although promising, these approaches often overlook latent factors that exhibit evolving dynamics, elements that are fundamental to environment transitions, reward structures, and high-level agent behavior. Explicitly modeling these hidden processes is essential for both precise dynamics modeling and effective decision-making. In this paper, we propose a unified framework that explicitly incorporates latent dynamic inference into generative decision-making from minimal yet sufficient observations. We theoretically show that under mild conditions, the latent process can be identified from small temporal blocks of observations. Building on this insight, we introduce Ada-Diffuser, a causal diffusion model that learns the temporal structure of observed interactions and the underlying latent dynamics simultaneously, and furthermore, leverages them for planning and control. With a modular design, Ada-Diffuser supports both planning and policy learning tasks, enabling adaptation to latent variations in dynamics, rewards, and latent actions. Experiments on simulated control and robotic benchmarks demonstrate its effectiveness in accurate latent inference and adaptive policy learning.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robotic Manipulation | LIBERO | Spatial Success Rate | 79.2 | 527 |
| Robotic Manipulation | Robomimic Can | Success Rate | 98 | 30 |
| Robotic Manipulation | Robomimic Lift | Success Rate | 98 | 28 |
| Robotic Manipulation | Robomimic Square | Success Rate | 89 | 26 |
| Locomotion | Cheetah-Wind-E (c^s) | Average Return | 62.4 | 14 |
| Navigation | D4RL Maze2d-umaze | Normalized Return | 148.6 | 14 |
| Locomotion | Cheetah-Wind-S c^s | Average Return | -65.3 | 14 |
| Locomotion | Cheetah-Vel-E (c^r) | Average Return | -39.2 | 14 |
| Locomotion | Ant-Dir-E c^r | Average Return | 296.4 | 14 |
| Reinforcement Learning | Cheetah-Wind-E dynamics changes episodic | Average Return | -41.6 | 8 |