Unsupervised Discovery of Failure Taxonomies from Deployment Logs

About

As robotic systems become increasingly integrated into real-world environments, ranging from autonomous vehicles to household assistants, they inevitably encounter diverse and unstructured scenarios that lead to failures. While such failures pose safety and reliability challenges, they also provide rich perceptual data for improving system robustness. However, manually analyzing large-scale failure datasets is impractical and does not scale. In this work, we introduce the problem of unsupervised discovery of failure taxonomies from large volumes of raw failure logs, aiming to obtain semantically coherent and actionable failure modes directly from perceptual trajectories. Our approach first infers structured failure explanations from multimodal inputs using vision-language reasoning, and then performs clustering in the resulting semantic reasoning space, enabling the discovery of recurring failure modes rather than isolated episode-level descriptions. We evaluate our method across robotic manipulation, indoor navigation, and autonomous driving domains, and demonstrate that the discovered taxonomies are consistent, interpretable, and practically useful. In particular, we show that structured failure taxonomies guide targeted data collection for offline policy refinement and enhance runtime failure monitoring systems. Website: https://mllm-failure-clustering.github.io/
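The two-stage idea described above (infer a textual explanation per failure, then cluster the explanations) can be illustrated with a minimal sketch. This is not the authors' implementation. The explanation strings are invented examples standing in for vision-language model output, the bag-of-words `embed` function is a toy substitute for a learned semantic embedding, and the greedy threshold clustering and its `threshold` value are assumptions chosen only to keep the example dependency-free.

```python
import math
import re
from collections import Counter

# Hypothetical explanations, standing in for the text a vision-language model
# might produce for each failure episode (invented for illustration only).
EXPLANATIONS = [
    "gripper slipped while grasping the mug",
    "gripper slipped while grasping the bottle",
    "robot collided with a chair leg in the hallway",
    "robot collided with a table leg in the hallway",
]

def embed(text):
    """Toy bag-of-words vector; a real system would use a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster(texts, threshold=0.5):
    """Greedy single-pass clustering: each text joins the first cluster whose
    seed vector is at least `threshold` similar, otherwise it starts a new one."""
    clusters = []  # list of (seed_vector, [member indices])
    for i, text in enumerate(texts):
        vec = embed(text)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

groups = cluster(EXPLANATIONS)
for members in groups:
    print([EXPLANATIONS[i] for i in members])
```

On these four invented explanations, the sketch groups the two grasping failures together and the two collision failures together. Each resulting group plays the role of one candidate failure mode, which is the sense in which clustering recovers recurring modes rather than isolated episode-level descriptions.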

Aryaman Gupta, Yusuf Umut Ciftci, Somil Bansal • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Failure Detection | Real-World Car Crash Videos (In-Distribution (In-D)) | F1 Score: 71.4 | 4 |
| Failure Detection | Real-World Car Crash Videos (Out-of-Distribution (OOD)) | F1 Score: 0.779 | 4 |
| Failure Detection | Vision-Based Indoor Robot Navigation Out-of-Distribution (OOD) | F1 Score: 50 | 4 |
| Failure Taxonomy Recovery | RoboFail expert taxonomy | CP: 92 | 4 |
| Failure Detection | Vision-Based Indoor Robot Navigation In-Distribution (In-D) | F1 Score: 77.2 | 4 |
| Robot Failure Explanation | RoboFail | -- | 3 |