Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

About

Vision-Language-Action (VLA) models enable robots to follow natural language instructions and generalize across diverse tasks, but they remain vulnerable to execution failures that compromise reliability in real-world deployment. Detecting such failures during execution is therefore critical for the robust deployment of embodied systems. Existing failure detection methods either rely on expensive action resampling or external models, or propagate trajectory-level labels uniformly across every timestep, which obscures localized failure signals. In this paper, we propose Hide-and-Seek, a framework that formulates VLA failure detection as a coarsely supervised learning problem. By combining inter-trajectory and intra-trajectory contrastive objectives, Hide-and-Seek localizes failure-indicative actions and induces temporally structured failure signals from trajectory-level supervision alone, without any step-level annotation. We evaluate Hide-and-Seek on LIBERO, VLABench, and a real-world robotic platform across three representative VLA policies: OpenVLA, π0, and π0.5. Our method achieves state-of-the-art multi-task failure detection performance with a practical accuracy-timeliness trade-off under conformal prediction, and generalizes well to both seen and unseen tasks.
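
The abstract mentions an accuracy-timeliness trade-off under conformal prediction without giving the procedure. As a rough illustration only, the sketch below shows generic split conformal thresholding applied to per-step failure scores. It is not the authors' method: the function names `conformal_threshold` and `first_alarm_step`, the choice to calibrate on one score per successful rollout (for example its maximum per-step score), and all numbers are assumptions made for this example.

```python
import math


def conformal_threshold(calib_scores, alpha):
    """Split-conformal threshold (hypothetical helper).

    calib_scores: one score per successful calibration rollout
                  (assumed here to be that rollout's maximum per-step score).
    alpha: target false-alarm rate on successful rollouts.
    """
    n = len(calib_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration rollouts for this alpha: never raise an alarm.
        return math.inf
    return sorted(calib_scores)[k - 1]


def first_alarm_step(step_scores, threshold):
    """Return the first timestep whose score exceeds the threshold, else None."""
    for t, score in enumerate(step_scores):
        if score > threshold:
            return t
    return None


# Illustrative numbers only, not taken from the paper.
calibration = [0.10, 0.30, 0.20, 0.50, 0.40, 0.25, 0.35]
tau = conformal_threshold(calibration, alpha=0.25)
alarm = first_alarm_step([0.10, 0.20, 0.45, 0.90], tau)
```

In this kind of setup, a smaller `alpha` raises the threshold, which reduces false alarms on successful rollouts but tends to delay detection; that is one way an accuracy-timeliness trade-off can arise.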

Seongheon Park, Wendi Li, Changdae Oh, Samuel Yeh, Zsolt Kira, Michael Hagenow, Sharon Li • 2026

Related benchmarks

Task | Dataset | Metric (bACC = balanced accuracy) | Result | Rank
Failure Detection | LIBERO-10 Seen Tasks | bACC | 88.5 | 28
Failure Detection | LIBERO-10 Unseen Tasks | bACC | 89.2 | 28
Failure Detection | VLABench (Seen Tasks) | bACC | 85.6 | 12
Failure Detection | VLABench (Unseen Tasks) | bACC | 71.3 | 12
Failure Detection | CUBE (Unseen) | bACC | 91.4 | 8
Failure Detection | KITCHEN (Seen) | bACC | 96.8 | 8
Failure Detection | KITCHEN (Unseen) | bACC | 97.2 | 8
Failure Detection | CUBE (Seen) | bACC | 96.6 | 8
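
For readers unfamiliar with the metric, balanced accuracy (bACC) is the mean of the recall on each class, so it is not inflated when successful rollouts greatly outnumber failures. The sketch below is a minimal reference implementation of the standard definition; the label convention (1 = failed trajectory, 0 = successful trajectory) and the sample labels are assumptions for this example, not data from the benchmarks above.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive class and recall on the negative class.

    Labels are assumed to be 1 for a failed trajectory and 0 for a success.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    # True positives: failures that were flagged as failures.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # True negatives: successes that were not flagged.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Illustrative labels only: 2 failures, 4 successes.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # 0.5 * (1/2 + 3/4) = 0.625
```

Plain accuracy on the same labels would be 4/6, which hides the fact that only half of the failures were caught.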
