ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training

About

This paper introduces ManiFlow, a visuomotor imitation learning policy for general robot manipulation that generates precise, high-dimensional actions conditioned on diverse visual, language and proprioceptive inputs. We leverage flow matching with consistency training to enable high-quality dexterous action generation in just 1-2 inference steps. To handle diverse input modalities efficiently, we propose DiT-X, a diffusion transformer architecture with adaptive cross-attention and AdaLN-Zero conditioning that enables fine-grained feature interactions between action tokens and multi-modal observations. ManiFlow demonstrates consistent improvements across diverse simulation benchmarks and nearly doubles success rates on real-world tasks across single-arm, bimanual, and humanoid robot setups with increasing dexterity. The extensive evaluation further demonstrates the strong robustness and generalizability of ManiFlow to novel objects and background changes, and highlights its strong scaling capability with larger-scale datasets. Our website: maniflow-policy.github.io.
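The 1-2 step inference mentioned above can be illustrated with a generic Euler sampler for a flow-matching model. This is a minimal sketch under stated assumptions, not ManiFlow's implementation: `sample_actions`, `toy_velocity`, and `target` are hypothetical names, and the toy velocity field stands in for what would be a learned network conditioned on visual, language, and proprioceptive inputs.

```python
import numpy as np

def sample_actions(velocity_fn, noise, num_steps=2):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 with Euler steps."""
    x = np.asarray(noise, dtype=float).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # start time of the current step; always < 1
        x = x + dt * velocity_fn(x, t)
    return x

# Hypothetical stand-in for a learned network: the exact velocity of a
# straight-line path from the current point to one fixed target action.
target = np.array([0.5, -0.2, 0.1])

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
noise = rng.standard_normal(3)  # initial sample drawn from a Gaussian prior

one_step = sample_actions(toy_velocity, noise, num_steps=1)
two_step = sample_actions(toy_velocity, noise, num_steps=2)
```

Because the toy field follows straight lines, one or two Euler steps already land on the target. A learned velocity field is generally curved, so coarse steps lose accuracy; that is the kind of gap consistency training is meant to narrow.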

Ge Yan, Jiyue Zhu, Yuquan Deng, Shiqi Yang, Ri-Zhao Qiu, Xuxin Cheng, Marius Memmel, Ranjay Krishna, Ankit Goyal, Xiaolong Wang, Dieter Fox • 2025

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Robot Manipulation	RoboTwin Clean 2.0	--	--	74
Robotic Manipulation	DexArt	Success Rate (Bucket)	35.3	29
Dexterous Hand Control	Adroit	Overall Avg Success Rate	70	19
Robotic Manipulation	Adroit	SR5 Hammer	100	14
Dexterous Hand Manipulation	DexArt	Success Rate	70	12
Dexterous Manipulation	Adroit (10 demos)	Hammer Success Rate	100	8
Dexterous Manipulation	DexArt (100 demos)	Success Rate (Laptop)	93	8
Bimanual Manipulation	RoboTwin 50 demos	Pick Apple Messy Success Rate	42	8
Dexterous Manipulation	Bi-DexHands	Success Rate	59	6
Dexterous Manipulation	Adroit, DexArt, and Bi-DexHands	Average Success	66	6

Showing 10 of 11 rows
