RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction
About
Vision-Language-Action (VLA) models have recently advanced robotic manipulation by translating natural-language instructions and visual observations into control actions. However, existing VLAs are trained primarily on successful expert demonstrations and lack structured supervision for failure diagnosis and recovery, which limits their robustness in open-world scenarios. To address this limitation, we propose the Robotic Failure Analysis and Correction (RoboFAC) framework. We construct a large-scale failure-centric dataset comprising 9,440 erroneous manipulation trajectories and 78,623 QA pairs across 53 scenes in both simulation and real-world environments, with systematically categorized failure types. Leveraging this dataset, we develop a lightweight multimodal model specialized for task understanding, failure analysis, and failure correction, enabling efficient local deployment while remaining competitive with large proprietary models. Experimental results show that RoboFAC achieves 34.1% higher failure analysis accuracy than GPT-4o. Furthermore, we integrate RoboFAC as an external supervisor in a real-world VLA control pipeline, yielding a 29.1% relative improvement across four tasks while significantly reducing latency relative to GPT-4o. These results demonstrate that RoboFAC enables systematic failure diagnosis and recovery, significantly enhancing the recovery capabilities of VLAs. Our model and dataset are publicly available at https://github.com/MINT-SJTU/RoboFAC.
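The abstract reports a relative improvement, which is easy to confuse with an absolute gain in percentage points. The following is a minimal sketch of the difference; the function name `relative_improvement` and the two success rates are hypothetical illustrations and are not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the change of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical success rates in percent, NOT results from the paper.
baseline_rate = 40.0    # assumed success rate without a supervisor
supervised_rate = 50.0  # assumed success rate with a supervisor

absolute_gain = supervised_rate - baseline_rate  # gain in percentage points
relative_gain = relative_improvement(baseline_rate, supervised_rate)

print(f"absolute gain: {absolute_gain} points, relative gain: {relative_gain}%")
```

With these assumed numbers, a gain of 10 percentage points corresponds to a 25% relative improvement, so the same result looks larger or smaller depending on which convention is used.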
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | ManiSkill3 | Average Success Rate | 82.7 | 21 |
| Robotic Manipulation | Real-world Manipulation SO-100 | Place Success Rate | 60 | 10 |
| Robot Failure Analysis (MCQ) | RoboFAC Simulation | FD Score | 91 | 7 |
| Robot Failure Analysis (MCQ) | RoboFAC (Real-world) | FD | 80 | 7 |
| Robotic Failure Analysis | RoboFAC 1.0 (mixed simulated and real-world) | Task Success Rate (Short Horizon) | 82.74 | 6 |
| Free-language reasoning | RoboFAC Simulation | ROUGE-L (TI) | 32.3 | 4 |
| Free-language reasoning | RoboFAC (Real-world) | ROUGE-L (TI) | 33.7 | 4 |
| Robot Failure Explanation | RoboFail | Coherence Score (CS) | 0.452 | 3 |