Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching
About
Learning from expert demonstrations is a promising approach for training robotic manipulation policies from limited data. However, imitation learning algorithms require a number of design choices, including the input modality, the training objective, and the 6-DoF end-effector pose representation. Diffusion-based methods have gained popularity because they can predict long-horizon trajectories and handle multimodal action distributions. Recently, Conditional Flow Matching (CFM), or Rectified Flow, has been proposed as a more flexible generalization of diffusion models. In this paper, we investigate the application of CFM to robotic policy learning and specifically study its interplay with the other design choices required to build an imitation learning algorithm. We show that CFM gives the best performance when combined with point cloud input observations. Additionally, we study the feasibility of a CFM formulation on the SO(3) manifold and evaluate its suitability with a simplified example. We perform extensive experiments on RLBench, which demonstrate that our proposed PointFlowMatch approach achieves a state-of-the-art average success rate of 67.8% over eight tasks, double the performance of the next best method.
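To make the CFM objective mentioned in the abstract concrete, here is a minimal NumPy sketch. It is not the PointFlowMatch implementation: it assumes the common straight-line (rectified flow) probability path in Euclidean space, treats actions as flat vectors, and omits the point cloud conditioning and any SO(3) handling. The names `cfm_training_pair`, `cfm_loss`, `euler_sample`, and `velocity_fn` are hypothetical, chosen only for illustration.

```python
import numpy as np


def cfm_training_pair(x1, rng):
    """Build one CFM regression sample per expert action in the batch x1.

    Assumption: straight-line path x_t = (1 - t) * x0 + t * x1 with
    Gaussian noise x0, whose target velocity is the constant x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)        # noise sample, same shape as x1
    t = rng.uniform(size=(x1.shape[0], 1))    # one time in [0, 1) per sample
    x_t = (1.0 - t) * x0 + t * x1             # point on the interpolation path
    u_t = x1 - x0                             # target velocity along the path
    return x_t, t, u_t


def cfm_loss(velocity_fn, x1, rng):
    """Mean squared error between predicted and target velocities."""
    x_t, t, u_t = cfm_training_pair(x1, rng)
    return float(np.mean((velocity_fn(x_t, t) - u_t) ** 2))


def euler_sample(velocity_fn, x0, num_steps=10):
    """Generate samples by integrating dx/dt = v(x, t) from t=0 to t=1
    with fixed-step explicit Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((x.shape[0], 1), k * dt)  # current time for every sample
        x = x + dt * velocity_fn(x, t)
    return x
```

In a learned policy, `velocity_fn` would be a neural network that also receives the observation (here, point cloud features) as input; in this sketch it is any callable mapping `(x, t)` to an array shaped like `x`.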
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Manipulation | RLBench | Success Rate (Unplug Charger) | 83.6 | 12 |
| Robotic Manipulation | Real-robot manipulation tasks Aggregate | Average Success Rate (Avg SR) | 48.7 | 9 |
| Cabinet Opening | Real-robot manipulation Dynamic Tasks | Success Rate | 0.0 | 4 |
| Cube Stowing | Real-robot manipulation Dynamic Tasks | Success Rate | 66.7 | 4 |
| Grasping | Real-robot manipulation Dynamic Tasks | Success Rate | 0.0 | 4 |
| Kitchen Cleanup | Real-robot manipulation Static Tasks | Success Rate | 90 | 4 |
| Microwave Loading | Real-robot manipulation Static Tasks | Success Rate | 86.7 | 4 |