Obstruction reasoning for robotic grasping

About

Successful robotic grasping in cluttered environments requires a model not only to visually ground a target object but also to reason about the obstructions that must be cleared first. While current vision-language embodied reasoning models show emergent spatial understanding, they remain limited in obstruction reasoning and accessibility planning. To bridge this gap, we present UNOGrasp, a learning-based vision-language model that performs visually grounded obstruction reasoning to infer the sequence of actions needed to unobstruct the path and grasp the target object. We devise a novel multi-step reasoning process based on obstruction paths originating from the target object. We anchor each reasoning step with obstruction-aware visual cues to incentivize reasoning capability. UNOGrasp combines supervised and reinforcement finetuning through verifiable reasoning rewards. Moreover, we construct UNOBench, a large-scale dataset for both training and benchmarking, based on MetaGraspNetV2, with over 100k obstruction paths annotated by humans with obstruction ratios, contact points, and natural-language instructions. Extensive experiments and real-robot evaluations show that UNOGrasp significantly improves obstruction reasoning and grasp success across both synthetic and real-world environments, outperforming generalist and proprietary alternatives. Project website: https://tev-fbk.github.io/UnoGrasp/.
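
To make the idea of an obstruction path concrete, the following is a minimal illustrative sketch, not UNOGrasp itself (which is a learned vision-language model). It assumes the obstruction relations are already known and stored in a hypothetical dictionary `obstructed_by` that maps each object to the objects blocking it. The function name `removal_sequence` and the example scene are invented for illustration. A depth-first traversal starting from the target yields an order in which every blocker is removed before the object it obstructs, with the target grasped last.

```python
from typing import Dict, List, Set


def removal_sequence(obstructed_by: Dict[str, List[str]], target: str) -> List[str]:
    """Return a pick order in which each blocker precedes what it obstructs.

    `obstructed_by` is a hypothetical, hand-written relation: it maps an
    object name to the names of the objects that directly block it.
    The target object is always the last element of the result.
    """
    order: List[str] = []
    in_progress: Set[str] = set()  # objects on the current traversal path
    finished: Set[str] = set()     # objects already placed in `order`

    def visit(obj: str) -> None:
        if obj in finished:
            return
        if obj in in_progress:
            # Two objects blocking each other cannot be ordered.
            raise ValueError(f"cyclic obstruction involving {obj!r}")
        in_progress.add(obj)
        for blocker in obstructed_by.get(obj, []):
            visit(blocker)  # clear everything on top of `obj` first
        in_progress.discard(obj)
        finished.add(obj)
        order.append(obj)

    visit(target)
    return order


# Hypothetical scene: the mug is blocked by a box and a sponge,
# and the box is itself blocked by a bottle.
scene = {"mug": ["box", "sponge"], "box": ["bottle"], "sponge": [], "bottle": []}
print(removal_sequence(scene, "mug"))  # ['bottle', 'box', 'sponge', 'mug']
```

In this sketch, each chain from the target to an unobstructed object (for example mug, box, bottle) plays the role of one obstruction path. The model described above has to infer such relations from images rather than read them from a table.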

Runyu Jiao, Matteo Bortolon, Francesco Giuliari, Alice Fasoli, Sergio Povoli, Guofeng Mei, Yiming Wang, Fabio Poiesi • 2025

Related benchmarks

Task	Dataset	Result
Path-level reasoning	UNOBench real No obstructions	SR (%) 72.5	10
Path-level reasoning	UNOBench real (Easy)	SR (Precision) 76.2	10
Path-level reasoning	UNOBench real (Medium)	SR-P (%) 76.6	10
Path-level reasoning	UNOBench real (Hard)	SR-P 79.5	10
Path-level reasoning	UNOBench synthetic No obstructions (test)	SR (%) 94.8	10
Path-level reasoning	UNOBench synthetic Easy (test)	SR (Precision) 82.8	10
Path-level reasoning	UNOBench synthetic Medium (test)	SR-P (%) 74.8	10
Path-level reasoning	UNOBench synthetic Hard (test)	SR-P 56.8	10
Object-level reasoning	UNOBench real set (Easy)	OP 0.757	8
Object-level reasoning	UNOBench Easy synthetic (test)	OP 81.3	8

Showing 10 of 20 rows
