
Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints

About

Hierarchical coarse-to-fine policies, in which a coarse branch predicts a region of interest to guide a fine-grained action predictor, have demonstrated significant potential in robotic 3D manipulation tasks, notably by improving sample efficiency and enabling more precise manipulation. However, even when augmented with pre-trained models, these hierarchical policies still generalize poorly. To enhance generalization to novel instructions and environment variations, we propose the Coarse-to-fine Language-Aligned manipulation Policy (CLAP), a framework that integrates three key components: 1) task decomposition, 2) VLM fine-tuning for 3D keypoint prediction, and 3) a 3D-aware representation. Through comprehensive experiments in simulation and on a real robot, we demonstrate its superior generalization capability. Specifically, on GemBench, a benchmark designed for evaluating generalization, our approach achieves a 12% higher average success rate than the state-of-the-art (SOTA) method while using only 1/5 of the training trajectories. In real-world experiments, our policy, trained on only 10 demonstrations, successfully generalizes to novel instructions and environments.
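To make the coarse-to-fine idea in the abstract concrete, here is a minimal sketch of the general pattern: a coarse step proposes a 3D keypoint, the point cloud is cropped to a region of interest around it, and a fine step acts only on the cropped points. This is an illustration under stated assumptions, not the CLAP implementation. The function names, the cubic crop, and the `half_extent` parameter are all hypothetical, and the two predictors are placeholders for whatever models a real system would use.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


def crop_region_of_interest(
    points: List[Point], keypoint: Point, half_extent: float
) -> List[Point]:
    """Keep the points inside an axis-aligned cube centred on the keypoint.

    The cubic crop is an assumption made for this sketch.
    """
    return [
        p
        for p in points
        if all(abs(p[i] - keypoint[i]) <= half_extent for i in range(3))
    ]


def coarse_to_fine(
    points: List[Point],
    coarse_predictor: Callable[[List[Point]], Point],
    fine_predictor: Callable[[List[Point], Point], object],
    half_extent: float = 0.1,
):
    """Run a (hypothetical) coarse predictor, crop, then run a fine predictor."""
    keypoint = coarse_predictor(points)  # coarse branch: where to look
    roi = crop_region_of_interest(points, keypoint, half_extent)
    return fine_predictor(roi, keypoint)  # fine branch: act on the crop only


# Toy usage with stand-in predictors (not learned models).
cloud = [(0.0, 0.0, 0.0), (0.05, 0.05, 0.05), (1.0, 1.0, 1.0)]
n_points_in_roi = coarse_to_fine(
    cloud,
    coarse_predictor=lambda pts: (0.0, 0.0, 0.0),
    fine_predictor=lambda roi, kp: len(roi),
)
```

In this toy run the far-away point at (1, 1, 1) is discarded before the fine step sees the data, which is the mechanism by which a coarse branch can focus a fine-grained predictor.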

Jianshu Hu, Lidi Wang, Shujia Li, Yunpeng Jiang, Xiao Li, Paul Weng, Yutong Ban • 2025

Related benchmarks

Task | Dataset | Result | Rank
Multi-task Robotic Manipulation | GemBench | Avg Success: 62 | 8
Robot Manipulation | Real-world Robot Manipulation Table Color Variation | Place Shape Sorter Success Rate: 50 | 2
Robot Manipulation | Real-world Robot Manipulation Distracted Objects | Success Rate: Place Shape in Sorter: 0.4 | 2
Robot Manipulation | Real-world Robot Manipulation Light Strength Variation | Place Shape in Shape Sorter: 50 | 2
Robot Manipulation | Real-world Robot Manipulation Average across all variations | Success Rate: Place Shape (Sorter): 50 | 2
Robot Manipulation | Real-world Robot Manipulation No Variation | Place Shape in Sorter Success: 60 | 2
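The real-world results above mix two scales: most rows read as percentages, while the Distracted Objects row is listed as 0.4. The sketch below checks whether the rows are mutually consistent under two assumptions that the page does not state: that 0.4 is a fraction meaning 40%, and that the "Average across all variations" row is an unweighted mean over the four conditions including No Variation. The helper `to_percent` is hypothetical.

```python
def to_percent(value: float) -> float:
    """Assumption: values at or below 1 are fractions, larger values are percentages."""
    return value * 100 if value <= 1 else value


# Per-condition results as listed on the page.
per_condition = {
    "No Variation": 60,
    "Table Color Variation": 50,
    "Distracted Objects": 0.4,  # assumed to mean 40%
    "Light Strength Variation": 50,
}

percentages = [to_percent(v) for v in per_condition.values()]
average = sum(percentages) / len(percentages)

# Value reported in the "Average across all variations" row.
reported_average = 50
is_consistent = abs(average - reported_average) < 1e-9
```

Under these assumptions the unweighted mean matches the reported average of 50, which suggests the 0.4 entry is a scale inconsistency in the listing, not a different result.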
