
Implicit Drifting Policy: One-Step Action Generation via Conditional Expert Geometry

About

Generative action policies based on diffusion or flow matching excel in behavior cloning, yet their iterative sampling is prohibitive for high-frequency robot control. While recent one-step formulations alleviate this latency, they inevitably discard the intermediate trajectory evolution that provides crucial action correction. Directly recovering this mechanism by explicitly estimating a training-time drifting field is mathematically ill-posed due to extreme conditional demonstration sparsity. We introduce Implicit Drifting Policy (IDP), a one-step imitation learning framework that brings the training-time correction of Drifting into policy learning without explicit vector field estimation. IDP extracts a conditional expert geometry from the local variation of observation-similar expert actions, and compares it against a global reference geometry to isolate condition-specific constraints. This local geometric structure adaptively weights a scalar potential objective. Combined with an expert-proximal terminal evaluation, IDP directly enforces manifold constraints on the one-step generator during training. Extensive evaluations across 2D, 3D, and real-world manipulation tasks show IDP effectively maintains adherence to valid action manifolds, improving upon explicit drifting methods and achieving competitive performance with strong one-step baselines.
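As a rough illustration of the "conditional expert geometry" idea described above, the following toy sketch estimates the local variation of expert actions among observation-similar demonstrations and compares it with the variation over the whole dataset. It is not the authors' implementation: the k-nearest-neighbor lookup, the use of covariance matrices, the per-dimension variance ratio, and the names `local_action_geometry` and `geometry_ratio` are all assumptions chosen for illustration.

```python
import numpy as np

def local_action_geometry(obs, actions, query_obs, k=8):
    """Covariance of the expert actions whose observations are the k nearest
    to query_obs (hypothetical helper, not taken from the paper)."""
    dists = np.linalg.norm(obs - query_obs, axis=1)
    idx = np.argsort(dists)[:k]
    return np.cov(actions[idx], rowvar=False)

def geometry_ratio(local_cov, global_cov, eps=1e-8):
    """Per-dimension ratio of local to global action variance.
    Small values mark directions that the condition constrains tightly."""
    return np.diag(local_cov) / (np.diag(global_cov) + eps)

# Toy data: action dimension 0 is determined by the observation,
# action dimension 1 is noise that does not depend on the observation.
rng = np.random.default_rng(0)
obs = rng.uniform(-1.0, 1.0, size=(500, 1))
actions = np.hstack([np.sin(obs), rng.normal(size=(500, 1))])

global_cov = np.cov(actions, rowvar=False)  # global reference geometry
local_cov = local_action_geometry(obs, actions, np.array([0.3]), k=16)
ratio = geometry_ratio(local_cov, global_cov)
print(ratio)
```

In this toy setting the ratio for the first action dimension is far smaller than for the second, which is the kind of condition-specific constraint that a weighting scheme could emphasize during training.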

Zemin Yang, Yaoyu He, Yiming Zhong, Yuhao Zhang, Xinge Zhu, Yao Mu, Qingqiu Huang, Yuexin Ma • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D pointcloud manipulation | MetaWorld | Success Rate (Easy) | 93.2 | 30 |
| Robotic Manipulation | DexArt | Success Rate (Bucket) | 30 | 29 |
| Robot Manipulation | MetaWorld, Adroit, and Dexart Combined | Average Success Rate | 81.3 | 25 |
| Dexterous Manipulation | Adroit | Hammer Success | 100 | 17 |
| Robotic Manipulation | Robomimic Square (mh) 2D state-based | Success Rate (Max Performance) | 82 | 7 |
| Robotic Manipulation | PushT 2D state-based | Success Rate (Max Performance) | 97 | 7 |
| Robotics manipulation | Robomimic Can 2D image-based (multi-human) | Success Rate (Max Performance) | 95 | 7 |
| Robotic Manipulation | Robomimic Can (mh) 2D state-based | Success Rate (Max Perf) | 98 | 7 |
| Robotic Manipulation | Tool-Hang 2D state-based | Success Rate (Max Performance) | 81 | 7 |
| Robotics manipulation | Robomimic Can 2D image-based (proficient-human) | Success Rate (Max Perf) | 99 | 7 |
Showing 10 of 25 rows
