Implicit Drifting Policy: One-Step Action Generation via Conditional Expert Geometry
About
Generative action policies based on diffusion or flow matching excel in behavior cloning, yet their iterative sampling is prohibitively slow for high-frequency robot control. While recent one-step formulations alleviate this latency, they inevitably discard the intermediate trajectory evolution that provides crucial action correction. Directly recovering this mechanism by explicitly estimating a training-time drifting field is mathematically ill-posed because demonstrations are extremely sparse under any given condition. We introduce Implicit Drifting Policy (IDP), a one-step imitation learning framework that brings the training-time correction of Drifting into policy learning without explicit vector field estimation. IDP extracts a conditional expert geometry from the local variation of observation-similar expert actions, and compares it against a global reference geometry to isolate condition-specific constraints. This local geometric structure adaptively weights a scalar potential objective. Combined with an expert-proximal terminal evaluation, IDP directly enforces manifold constraints on the one-step generator during training. Extensive evaluations across 2D, 3D, and real-world manipulation tasks show that IDP maintains adherence to valid action manifolds, improving upon explicit drifting methods and achieving competitive performance with strong one-step baselines.
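To make the idea of a conditional expert geometry more concrete, the following is a minimal NumPy sketch and not the paper's implementation. It assumes that "observation-similar" means the k nearest demonstrations under Euclidean distance, that the geometry is summarized by an action covariance, and that the comparison against the global reference is a per-axis variance ratio. The function name `conditional_expert_geometry` and the parameters `k` and `eps` are hypothetical choices made for illustration.

```python
import numpy as np

def conditional_expert_geometry(obs, actions, query_obs, k=16, eps=1e-6):
    """Illustrative sketch (assumed formulation, not the authors' code).

    obs:       (N, d_o) expert observations
    actions:   (N, d_a) expert actions paired with obs
    query_obs: (d_o,)   observation to condition on
    Returns the local action covariance, the global action covariance,
    and the per-axis ratio of local to global variance.
    """
    obs = np.asarray(obs, dtype=float)
    actions = np.asarray(actions, dtype=float)
    d_a = actions.shape[1]

    # Assumption: similarity is Euclidean distance in observation space.
    dists = np.linalg.norm(obs - np.asarray(query_obs, dtype=float), axis=1)
    idx = np.argsort(dists)[:k]

    # Local variation of the actions attached to the k most similar observations.
    local_cov = np.atleast_2d(np.cov(actions[idx], rowvar=False)) + eps * np.eye(d_a)

    # Global reference: variation of all expert actions, ignoring the condition.
    global_cov = np.atleast_2d(np.cov(actions, rowvar=False)) + eps * np.eye(d_a)

    # Ratios well below 1 mark action axes that this observation constrains
    # more tightly than the dataset as a whole does.
    ratio = np.diag(local_cov) / np.diag(global_cov)
    return local_cov, global_cov, ratio
```

Under these assumptions, an action axis that is nearly determined by the observation yields a ratio close to zero, while an axis that varies freely regardless of the observation yields a ratio near one. Such a ratio could then serve as one possible weighting signal for a training objective.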
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D pointcloud manipulation | MetaWorld | Success Rate (Easy) | 93.2 | 30 |
| Robotic Manipulation | DexArt | Success Rate (Bucket) | 30 | 29 |
| Robot Manipulation | MetaWorld, Adroit, and Dexart Combined | Average Success Rate | 81.3 | 25 |
| Dexterous Manipulation | Adroit | Hammer Success | 100 | 17 |
| Robotic Manipulation | Robomimic Square (mh) 2D state-based | Success Rate (Max Performance) | 82 | 7 |
| Robotic Manipulation | PushT 2D state-based | Success Rate (Max Performance) | 97 | 7 |
| Robotics manipulation | Robomimic Can 2D image-based (multi-human) | Success Rate (Max Performance) | 95 | 7 |
| Robotic Manipulation | Robomimic Can (mh) 2D state-based | Success Rate (Max Perf) | 98 | 7 |
| Robotic Manipulation | Tool-Hang 2D state-based | Success Rate (Max Performance) | 81 | 7 |
| Robotics manipulation | Robomimic Can 2D image-based (proficient-human) | Success Rate (Max Perf) | 99 | 7 |