Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation
About
Imitation learning with diffusion models has advanced robotic control by capturing multi-modal action distributions. However, existing methods typically treat observations only as high-level conditions for the denoising network, rather than integrating them into the stochastic dynamics of the diffusion process itself. As a result, sampling is forced to begin from random noise, which weakens the coupling between perception and control and often yields suboptimal performance. We propose BridgePolicy, a generative visuomotor policy that integrates observations directly into the stochastic dynamics via a diffusion-bridge formulation. By constructing an observation-informed trajectory, BridgePolicy lets sampling start from a rich, informative prior rather than random noise, substantially improving precision and reliability in control. A key difficulty is that a diffusion bridge normally connects distributions of matched dimensionality, while robotic observations are heterogeneous and not naturally aligned with actions. To overcome this, we introduce a multi-modal fusion module and a semantic aligner that unify the visual and state inputs and align the observations with action representations, making the diffusion bridge applicable to heterogeneous robot data. Extensive experiments across 52 simulation tasks on three benchmarks and 5 real-world tasks demonstrate that BridgePolicy consistently outperforms state-of-the-art generative policies.
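To make the bridge idea concrete, the following is a minimal sketch, not the BridgePolicy implementation. It assumes a plain Brownian bridge pinned at an action (t = 0) and at an observation-derived prior (t = 1), and it uses a random linear projection as a hypothetical stand-in for the semantic aligner that maps observation features to the action dimension. The names `align_observation`, `bridge_sample`, `proj`, and `sigma` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def align_observation(obs_feat, proj):
    """Hypothetical aligner: linearly map observation features to action dims."""
    return obs_feat @ proj

def bridge_sample(action, obs_prior, t, sigma=1.0, rng=None):
    """Draw x_t from a Brownian bridge pinned at action (t=0) and obs_prior (t=1)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(action.shape)
    mean = (1.0 - t) * action + t * obs_prior   # linear interpolation of endpoints
    std = sigma * np.sqrt(t * (1.0 - t))        # zero at both endpoints
    return mean + std * noise

rng = np.random.default_rng(0)
obs_feat = rng.standard_normal(16)              # toy fused observation feature
proj = rng.standard_normal((16, 4)) / 4.0       # toy projection (assumption)
action = rng.standard_normal(4)                 # toy expert action

obs_prior = align_observation(obs_feat, proj)   # same dimensionality as action
x_start = bridge_sample(action, obs_prior, t=1.0, rng=rng)  # where sampling begins
x_mid = bridge_sample(action, obs_prior, t=0.5, rng=rng)    # noisy intermediate state
x_end = bridge_sample(action, obs_prior, t=0.0, rng=rng)    # where sampling ends
```

In this toy setup the state at t = 1 equals the observation-derived prior rather than Gaussian noise, and the state at t = 0 equals the action, which is the property the abstract describes. A learned policy would replace the closed-form bridge with a trained network that reverses the process from the prior toward the action.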
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Robotic Arm Manipulation | MetaWorld Easy | Success Rate: 91 | 15 |
| Robotic Arm Manipulation | MetaWorld Very Hard | Success Rate: 79 | 15 |
| Dexterous Hand Control | Adroit | Overall Avg Success Rate: 81 | 13 |
| Dexterous Hand Manipulation | DexArt | Success Rate: 60 | 6 |
| Robotic Arm Manipulation | MetaWorld Medium | Success Rate: 75 | 6 |
| Robot Manipulation (Average) | Real-world tasks Franka Emika Panda | Success Rate: 90 | 6 |
| Robotic Arm Manipulation | MetaWorld Hard split | Success Rate: 58 | 6 |
| Oven-Opening | Real-world tasks Franka Emika Panda | Success Rate: 100 | 4 |
| Pick Place | Real-world tasks Franka Emika Panda | Success Rate: 80 | 4 |
| Pour | Real-world tasks Franka Emika Panda | Success Rate: 80 | 4 |