RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction

About

Vision-Language-Action (VLA) models have recently advanced robotic manipulation by translating natural-language instructions and visual observations into control actions. However, existing VLAs are primarily trained on successful expert demonstrations and lack structured supervision for failure diagnosis and recovery, limiting robustness in open-world scenarios. To address this limitation, we propose the Robotic Failure Analysis and Correction (RoboFAC) framework. We construct a large-scale failure-centric dataset comprising 9,440 erroneous manipulation trajectories and 78,623 QA pairs across 53 scenes in both simulation and real-world environments, with systematically categorized failure types. Leveraging this dataset, we develop a lightweight multimodal model specialized for task understanding, failure analysis, and failure correction, enabling efficient local deployment while remaining competitive with large proprietary models. Experimental results demonstrate that RoboFAC achieves 34.1% higher failure analysis accuracy than GPT-4o. Furthermore, we integrate RoboFAC as an external supervisor in a real-world VLA control pipeline, yielding a 29.1% relative improvement across four tasks while significantly reducing latency relative to GPT-4o. These results demonstrate that RoboFAC enables systematic failure diagnosis and recovery, significantly enhancing the recovery capabilities of VLAs. Our model and dataset are publicly available at https://github.com/MINT-SJTU/RoboFAC.

Zewei Ye, Weifeng Lu, Minghao Ye, Tao Lin, Shuo Yang, Junchi Yan, Bo Zhao • 2025
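
The abstract reports a 29.1% relative improvement, which is easy to confuse with an absolute gain in percentage points. The sketch below shows how a relative improvement is usually computed. The function name and the two success rates are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the paper's exact definition may differ.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the fractional gain of `improved` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical success rates for illustration only, NOT values from the paper.
baseline_rate = 0.50    # assumed success rate without a supervisor
supervised_rate = 0.60  # assumed success rate with a supervisor

gain = relative_improvement(baseline_rate, supervised_rate)
print(f"relative improvement: {gain:.1%}")
```

With these placeholder numbers, a gain of 10 percentage points corresponds to a 20% relative improvement, since the gain is measured against the baseline rather than against 100%.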

Related benchmarks

Task	Dataset	Result
Robotic Manipulation	ManiSkill3	Average Success Rate: 82.7	28
Robotic Manipulation	Real-world Manipulation SO-100	Place Success Rate: 60	10
Robot Failure Analysis (MCQ)	RoboFAC Simulation	FD Score: 91	7
Robot Failure Analysis (MCQ)	RoboFAC (Real-world)	FD: 80	7
Robotic Failure Analysis	RoboFAC 1.0 (mixed simulated and real-world)	Task Success Rate (Short Horizon): 82.74	6
Free-language reasoning	RoboFAC Simulation	ROUGE-L (TI): 32.3	4
Free-language reasoning	RoboFAC (Real-world)	ROUGE-L (TI): 33.7	4
Robot Failure Explanation	RoboFail	Coherence Score (CS): 0.452	3
