One Step Is Enough: Dispersive MeanFlow Policy Optimization
About
Real-time robotic control demands fast action generation. However, existing generative policies based on diffusion and flow matching require multi-step sampling, which fundamentally limits deployment in time-critical scenarios. We propose Dispersive MeanFlow Policy Optimization (DMPO), a unified framework that enables true one-step generation through three key components: MeanFlow for mathematically derived single-step inference without knowledge distillation, dispersive regularization to prevent representation collapse, and reinforcement learning (RL) fine-tuning to surpass expert demonstrations. Experiments across RoboMimic manipulation and OpenAI Gym locomotion benchmarks demonstrate competitive or superior performance compared to multi-step baselines. With a lightweight model architecture and the three algorithmic components working in synergy, DMPO exceeds real-time control requirements (>120 Hz) with a 5-20x inference speedup, reaching hundreds of Hertz on high-performance GPUs. Physical deployment on a Franka Emika Panda robot validates real-world applicability.
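To make the one-step claim concrete, the following is a minimal sketch of single-evaluation sampling. It assumes the standard MeanFlow convention, in which a network predicts the average velocity u(z_t, r, t) and the identity z_r = z_t - (t - r) u(z_t, r, t) is applied once with r = 0 and t = 1. The function names, the toy velocity field, and the observation-to-action mapping are hypothetical stand-ins, not DMPO's actual architecture or API.

```python
import numpy as np


def one_step_action(mean_velocity, obs, action_dim, rng):
    """Draw an action with a single evaluation of the average-velocity model.

    Assumed MeanFlow identity: z_r = z_t - (t - r) * u(z_t, r, t).
    Setting r = 0 and t = 1 maps pure noise z_1 to a sample z_0 in one call.
    """
    noise = rng.standard_normal(action_dim)  # z_1 ~ N(0, I)
    return noise - mean_velocity(noise, obs, r=0.0, t=1.0)


def toy_mean_velocity(z, obs, r, t):
    """Hypothetical stand-in for a trained network (illustration only).

    For a linear path z_t = (1 - t) * a + t * eps toward a fixed target
    action a, the velocity eps - a is constant, so the average velocity
    over any interval [r, t] equals (z_t - a) / t.
    """
    target = np.tanh(obs)  # made-up observation-to-action mapping
    return (z - target) / t


rng = np.random.default_rng(0)
obs = np.array([0.1, -0.2, 0.3])
action = one_step_action(toy_mean_velocity, obs, action_dim=3, rng=rng)
print(action)
```

With this toy field the single step recovers the target action exactly, whatever noise was drawn. A multi-step diffusion or flow-matching sampler would instead call the network once per integration step, which is where the latency difference comes from.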
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Robot Policy Inference Efficiency | NVIDIA RTX 4090 simulation (inference) | Inference Time: 0.6 ms | 12 |
| Robot Policy Inference Efficiency | NVIDIA RTX 2080 physical robot deployment (inference) | Inference Time: 2.6 ms | 12 |
| Square | RoboMimic MH 300 trajectories Full (multi-human) | -- | 9 |
| Transport | RoboMimic multi-human 300 trajectories Full | -- | 9 |
| Can | RoboMimic MH 100 trajectories Simplified (multi-human) | Success Rate: 100 | 5 |
| Lift | RoboMimic MH 100 trajectories Simplified (multi-human) | Success Rate: 100 | 5 |
| Can | RoboMimic multi-human 300 trajectories Full | -- | 5 |
| Lift | RoboMimic MH 300 trajectories Full (multi-human) | -- | 5 |
| Square | RoboMimic MH 100 trajectories Simplified (multi-human) | Success Rate: 83 | 4 |
| Transport | RoboMimic MH 100 trajectories Simplified | Success Rate: 88 | 4 |
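The reported inference times can be read against the 120 Hz real-time threshold by converting latency to an upper bound on control frequency. The sketch below does only that arithmetic. It assumes the listed times cover policy inference alone, so the achievable closed-loop rate on a real system would be lower once sensing, communication, and actuation are included. The function name is illustrative.

```python
def max_control_rate_hz(inference_ms: float) -> float:
    """Upper bound on control frequency if inference were the only cost."""
    return 1000.0 / inference_ms


REAL_TIME_HZ = 120.0  # real-time threshold quoted in the abstract

for gpu, latency_ms in [("RTX 4090", 0.6), ("RTX 2080", 2.6)]:
    rate = max_control_rate_hz(latency_ms)
    print(f"{gpu}: {latency_ms} ms -> {rate:.0f} Hz, "
          f"exceeds 120 Hz: {rate > REAL_TIME_HZ}")
```

Under that assumption, 0.6 ms corresponds to roughly 1667 Hz and 2.6 ms to roughly 385 Hz, both above the 120 Hz requirement.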