Deployment-Time Reliability of Learned Robot Policies

About

Recent advances in learning-based robot manipulation have produced policies with remarkable capabilities. Yet reliability at deployment remains a fundamental barrier to real-world use, where distribution shift, compounding errors, and complex task dependencies collectively undermine system performance. This dissertation investigates how the reliability of learned robot policies can be improved at deployment time through mechanisms that operate around them. We develop three complementary classes of deployment-time mechanisms. First, we introduce runtime monitoring methods that detect impending failures by identifying inconsistencies in closed-loop policy behavior and deviations in task progress, without requiring failure data or task-specific supervision. Second, we propose a data-centric framework for policy interpretability that uses influence functions to trace deployment-time successes and failures back to influential training demonstrations, enabling principled diagnosis and dataset curation. Third, we address reliable long-horizon task execution by formulating policy coordination as the problem of estimating and maximizing the success probability of behavior sequences, and we extend this formulation to open-ended, language-specified tasks through feasibility-aware task planning. By centering on the core challenges of deployment, these contributions advance the practical foundations for reliable, real-world use of learned robot policies. Continued progress on these foundations will be essential for trustworthy and scalable robot autonomy.
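The idea of estimating and maximizing the success probability of a behavior sequence can be made concrete with a minimal sketch. It assumes, for illustration only, that each behavior (skill) comes with an estimate of its probability of succeeding given that the earlier steps succeeded. By the chain rule, the success probability of the whole sequence is the product of these conditional estimates, and coordination then reduces to choosing the candidate sequence with the highest product. The skill names, the `EXAMPLE_TABLE` values, and the helper functions are hypothetical and are not the dissertation's implementation. The sketch also adds a first-order simplification in which each estimate depends only on the preceding skill.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical conditional success estimates, keyed by (previous skill, skill):
# each value approximates P(skill succeeds | all earlier steps succeeded).
# None marks the first step of a plan. The numbers are illustrative only.
SuccessTable = Dict[Tuple[Optional[str], str], float]

EXAMPLE_TABLE: SuccessTable = {
    (None, "pick"): 0.6,     # picking directly from the initial state is hard
    (None, "push"): 0.98,
    ("push", "pick"): 0.98,  # pushing first makes the object easier to grasp
    ("pick", "place"): 0.8,
}


def sequence_success_probability(plan: Sequence[str], table: SuccessTable) -> float:
    """Multiply per-step conditional success estimates (chain rule).

    Simplifying assumption for this sketch: each step's success depends only
    on the immediately preceding skill, not on the full history.
    """
    prob = 1.0
    prev: Optional[str] = None
    for skill in plan:
        # Transitions with no estimate are treated as infeasible.
        prob *= table.get((prev, skill), 0.0)
        prev = skill
    return prob


def best_plan(
    candidates: List[Sequence[str]], table: SuccessTable
) -> Tuple[Sequence[str], float]:
    """Pick the candidate behavior sequence with the highest estimated success."""
    scored = [(plan, sequence_success_probability(plan, table)) for plan in candidates]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    candidates = [["pick", "place"], ["push", "pick", "place"]]
    plan, prob = best_plan(candidates, EXAMPLE_TABLE)
    print(plan, round(prob, 3))  # the longer plan wins: 0.98 * 0.98 * 0.8
```

With these made-up numbers, the three-step plan scores about 0.77 against 0.48 for the direct plan. This shows why a sequence-level objective can prefer a longer plan whose individual steps are more likely to succeed.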

Christopher Agia • 2026

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Failure Detection	Close Box In-Distribution	TPR	100	19
Failure Detection	Close Box Combined	TPR	100	19
Failure Detection	Close Box Out-of-Distribution	TPR	100	19
Task Progression Failure Detection	Cover Object In-Distribution (policy success rate: 98%)	TPR	100	16
Task Progression Failure Detection	Cover Object Out-of-Distribution (policy success rate: 3%)	TPR	88	16
Task Progression Failure Detection	Cover Object Combined	TPR	88	16
Failure Detection	Close Box In-Distribution (train test)	TPR	1	15
Failure Detection	Close Box Combined (train test)	TPR	100	15
Failure Detection	Close Box Out-of-Distribution (train test)	TPR	100	15
Task Progression Failure Detection	Close Box In-Distribution	TPR	100	12
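The table above reports true positive rate (TPR). A minimal sketch of how TPR is typically computed for failure detection follows. It treats failure as the positive class, as is common in this setting, and reduces each episode to a single binary detector decision. Protocol details such as when an alarm must fire relative to the failure, or how the detection threshold is calibrated, are not specified here and are not modeled. The function names and example episodes are hypothetical.

```python
from typing import Sequence


def true_positive_rate(failed: Sequence[bool], flagged: Sequence[bool]) -> float:
    """TPR in percent, with failure as the positive class.

    failed[i]  -- ground truth: episode i ended in task failure
    flagged[i] -- the detector raised an alarm during episode i
    """
    if len(failed) != len(flagged):
        raise ValueError("failed and flagged must have the same length")
    positives = sum(failed)
    if positives == 0:
        raise ValueError("TPR is undefined when no episode failed")
    hits = sum(1 for f, d in zip(failed, flagged) if f and d)
    return 100.0 * hits / positives


def false_positive_rate(failed: Sequence[bool], flagged: Sequence[bool]) -> float:
    """FPR in percent: share of successful episodes that were wrongly flagged."""
    if len(failed) != len(flagged):
        raise ValueError("failed and flagged must have the same length")
    negatives = sum(1 for f in failed if not f)
    if negatives == 0:
        raise ValueError("FPR is undefined when every episode failed")
    false_alarms = sum(1 for f, d in zip(failed, flagged) if not f and d)
    return 100.0 * false_alarms / negatives


if __name__ == "__main__":
    failed = [True, True, True, False, False]
    flagged = [True, True, False, True, False]
    print(true_positive_rate(failed, flagged), false_positive_rate(failed, flagged))
```

TPR alone does not penalize false alarms: a detector that flags every episode reaches a TPR of 100. For that reason it is usually read alongside a false-alarm measure such as FPR.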
