Deployment-Time Reliability of Learned Robot Policies

About

Recent advances in learning-based robot manipulation have produced policies with remarkable capabilities. Yet, reliability at deployment remains a fundamental barrier to real-world use, where distribution shift, compounding errors, and complex task dependencies collectively undermine system performance. This dissertation investigates how the reliability of learned robot policies can be improved at deployment time through mechanisms that operate around them. We develop three complementary classes of deployment-time mechanisms. First, we introduce runtime monitoring methods that detect impending failures by identifying inconsistencies in closed-loop policy behavior and deviations in task progress, without requiring failure data or task-specific supervision. Second, we propose a data-centric framework for policy interpretability that traces deployment-time successes and failures to influential training demonstrations using influence functions, enabling principled diagnosis and dataset curation. Third, we address reliable long-horizon task execution by formulating policy coordination as the problem of estimating and maximizing the success probability of behavior sequences, and we extend this formulation to open-ended, language-specified tasks through feasibility-aware task planning. By centering on core challenges of deployment, these contributions advance practical foundations for the reliable, real-world use of learned robot policies. Continued progress on these foundations will be essential for enabling trustworthy and scalable robot autonomy in the future.

Christopher Agia • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Failure Detection | Close Box In-Distribution | TPR 100 | 19 |
| Failure Detection | Close Box Combined | TPR 100 | 19 |
| Failure Detection | Close Box Out-of-Distribution | TPR 100 | 19 |
| Task Progression Failure Detection | Cover Object Policy Success Rate: 98% (In-Distribution) | TPR 100 | 16 |
| Task Progression Failure Detection | Cover Object Policy Success Rate: 3% (Out-of-Distribution) | TPR 88 | 16 |
| Task Progression Failure Detection | Cover Object (Combined) | TPR 88 | 16 |
| Failure Detection | Close Box In-Distribution (train test) | TPR 1 | 15 |
| Failure Detection | Close Box Combined (train test) | TPR 100 | 15 |
| Failure Detection | Close Box Out-of-Distribution (train test) | TPR 100 | 15 |
| Task Progression Failure Detection | Close Box In-Distribution | TPR 100 | 12 |
Showing 10 of 13 rows
