Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction

About

Robotic manipulation is a fundamental component of automation. However, traditional perception-planning pipelines often fall short in open-ended tasks due to limited flexibility, while the architecture of a single end-to-end Vision-Language-Action (VLA) offers promising capabilities but lacks crucial mechanisms for anticipating and recovering from failure. To address these challenges, we propose FPC-VLA, a dual-model framework that integrates VLA with a supervisor for failure prediction and correction. The supervisor evaluates action viability through vision-language queries and generates corrective strategies when risks arise, trained efficiently without manual labeling. A dual-stream fusion module further refines actions by leveraging past predictions. Evaluation results on multiple simulation platforms (SIMPLER and LIBERO) and robot embodiments (WidowX, Google Robot, Franka) show that FPC-VLA outperforms state-of-the-art models in both zero-shot and fine-tuned settings. Successful real-world deployments on diverse, long-horizon tasks confirm FPC-VLA's strong generalization and practical utility for building more reliable autonomous systems.

Yifan Yang, Zhixiang Duan, Tianshi Xie, Fuyu Cao, Pinxi Shen, Peili Song, Piaopiao Jin, Guokang Sun, Shaoqing Xu, Yangwei You, Jingtai Liu• 2025

Related benchmarks

TaskDatasetResultRank
Robot ManipulationLIBERO
Goal Achievement92
494
Robotic ManipulationWidowX
Spoon Success Rate58.3
17
Robotic ManipulationLIBERO 50 rollouts per task
Spatial Success0.87
10
Showing 3 of 3 rows

Other info

Follow for update