
DART: Semantic Recoverability for Structured Tool Agents

About

When a structured tool agent fails mid-execution, the runtime faces a dilemma: replaying the entire task is safe but wasteful, while restoring from a local checkpoint is efficient but can leave committed downstream work tied to an upstream history that no longer exists. This tension is acute in commitment-sensitive settings, where rollback targets a single failed instance yet downstream consumers have already acted on its output. Existing recovery approaches provide mechanical rollback but no criterion for whether a local restore remains semantically valid after downstream commitment. We formalize this gap as semantic recoverability and address it in DART, a modular runtime that localizes the failed instance, certifies semantically recoverable boundaries of that instance, aligns checkpoints to those boundaries, and selects an admissible restore point that preserves committed downstream work under dependency and effect constraints, or blocks the restore if no such point exists. Across three LLM-driven domains and external validation on a LangGraph-based substrate, DART correctly recovers all evaluated commitment-sensitive cases where baseline local recovery fails, and a five-domain safety audit finds no unsafe admitted rollbacks. These results show that controller legality does not imply semantic validity, and that sound local recovery requires an explicit admissibility check.
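The admissibility check described above can be illustrated with a toy sketch. This is not DART's implementation: the `Checkpoint` type, the `select_restore_point` function, and the step-index model of commitments and effects are all hypothetical, chosen only to show why a mechanically valid checkpoint may still be an unsafe restore target. The assumption here is that a checkpoint at step s captures state after step s, and that restoring to it re-executes every later step.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Checkpoint:
    step: int                     # state captured after this step completed
    at_certified_boundary: bool   # hypothetical flag: aligned to a certified boundary


def select_restore_point(
    checkpoints: Sequence[Checkpoint],
    failure_step: int,
    committed_output_steps: Sequence[int],
    irreversible_effect_steps: Sequence[int],
) -> Optional[Checkpoint]:
    """Return the latest admissible checkpoint, or None to block local restore.

    Illustrative rule (an assumption, not the paper's definition): restoring
    to step s re-executes everything after s, so every output that downstream
    consumers have committed to, and every irreversible effect, must have
    been produced at or before s.
    """
    floor = max([*committed_output_steps, *irreversible_effect_steps], default=0)
    admissible = [
        c for c in checkpoints
        if c.at_certified_boundary and floor <= c.step < failure_step
    ]
    if not admissible:
        return None  # no safe local restore point: block instead of rolling back
    # Prefer the latest admissible checkpoint to minimize replayed work.
    return max(admissible, key=lambda c: c.step)
```

With checkpoints at steps 0, 2, and 4 (only the first two at certified boundaries) and a failure at step 5, a downstream commitment on the step-2 output selects the step-2 checkpoint, whereas a commitment on a step-3 output leaves no admissible checkpoint and the restore is blocked. A baseline that simply picks the most recent checkpoint would not make this distinction.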

Ke Yang, Panpan Li, Zonghan Wu, Kejin Xu, Huaxi Huang, Xiaoshui Huang • 2026

Related benchmarks

Task                             | Dataset                         | Result            | Rank
---------------------------------|---------------------------------|-------------------|-----
LLM Recovery                     | Navigation (Official)           | Success Rate: 100 | 6
LLM Recovery                     | Schedule Form (Official)        | Success Rate: 100 | 6
LLM Recovery                     | Navigation Commitment-sensitive | --                | 4
LLM Recovery                     | Diagnosis Commitment-sensitive  | --                | 4
LLM Recovery                     | Diagnosis (Official)            | --                | 4
Schedule Form                    | Schedule Form Commit-sensitive  | Success Rate: 100 | 3
System Safety and Recovery Audit | Schedule Form Commit-sensitive  | Success Rate: 100 | 3
Navigation                       | Navigation Entry-aligned        | Success Rate: 100 | 3
Navigation                       | Navigation Commit-sensitive     | Success Rate: 100 | 3
Schedule Form                    | Schedule Form Entry-aligned     | Success Rate: 100 | 3
Showing 10 of 16 rows
