Obstruction reasoning for robotic grasping
About
Successful robotic grasping in cluttered environments requires a model not only to visually ground a target object but also to reason about obstructions that must be cleared beforehand. While current vision-language embodied reasoning models show emergent spatial understanding, they remain limited in obstruction reasoning and accessibility planning. To bridge this gap, we present UNOGrasp, a learning-based vision-language model that performs visually grounded obstruction reasoning to infer the sequence of actions needed to clear the path and grasp the target object. We devise a novel multi-step reasoning process based on obstruction paths originating from the target object, and we anchor each reasoning step with obstruction-aware visual cues to incentivize reasoning capability. UNOGrasp combines supervised and reinforcement finetuning through verifiable reasoning rewards. Moreover, we construct UNOBench, a large-scale dataset for both training and benchmarking, based on MetaGraspNetV2, with over 100k obstruction paths annotated by humans with obstruction ratios, contact points, and natural-language instructions. Extensive experiments and real-robot evaluations show that UNOGrasp significantly improves obstruction reasoning and grasp success across both synthetic and real-world environments, outperforming generalist and proprietary alternatives. Project website: https://tev-fbk.github.io/UnoGrasp/.
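To make the idea of an obstruction path concrete, here is a minimal Python sketch of how a removal sequence could be derived once the obstruction relations are known. It is not the UNOGrasp model or its reasoning procedure: the `obstructed_by` mapping, the `removal_order` function, and the example scene are all hypothetical, and the sketch assumes the relations are already given as a graph rather than inferred from an image.

```python
from typing import Dict, List, Set


def removal_order(obstructed_by: Dict[str, List[str]], target: str) -> List[str]:
    """Return the objects to pick, in order, ending with the target.

    obstructed_by maps an object to the objects directly obstructing it
    (a hypothetical representation, not the UNOBench annotation format).
    """
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(obj: str) -> None:
        if obj in done:
            return
        if obj in in_progress:
            # Two objects that block each other cannot be ordered.
            raise ValueError(f"cyclic obstruction involving {obj!r}")
        in_progress.add(obj)
        # Every blocker must be cleared before the object itself.
        for blocker in obstructed_by.get(obj, []):
            visit(blocker)
        in_progress.discard(obj)
        done.add(obj)
        order.append(obj)

    visit(target)
    return order


# Hypothetical scene: the mug is covered by a box and a sponge,
# and the box is itself covered by a bottle.
scene = {"mug": ["box", "sponge"], "box": ["bottle"]}
print(removal_order(scene, "mug"))  # ['bottle', 'box', 'sponge', 'mug']
```

The depth-first traversal follows each obstruction path outward from the target and emits objects only after everything on top of them has been emitted, so the last element is always the target itself.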
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Path-level reasoning | UNOBench real No obstructions | SR (%) | 72.5 | 10 |
| Path-level reasoning | UNOBench real (Easy) | SR (Precision) | 76.2 | 10 |
| Path-level reasoning | UNOBench real (Medium) | SR-P (%) | 76.6 | 10 |
| Path-level reasoning | UNOBench real (Hard) | SR-P | 79.5 | 10 |
| Path-level reasoning | UNOBench synthetic No obstructions (test) | SR (%) | 94.8 | 10 |
| Path-level reasoning | UNOBench synthetic Easy (test) | SR (Precision) | 82.8 | 10 |
| Path-level reasoning | UNOBench synthetic Medium (test) | SR-P (%) | 74.8 | 10 |
| Path-level reasoning | UNOBench synthetic Hard (test) | SR-P | 56.8 | 10 |
| Object-level reasoning | UNOBench real set (Easy) | OP | 0.757 | 8 |
| Object-level reasoning | UNOBench Easy synthetic (test) | OP | 81.3 | 8 |