ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos

About

Procedural planning aims to predict a sequence of actions that transforms an initial visual state into a desired goal, a fundamental ability for intelligent agents operating in complex environments. Existing approaches typically rely on large-scale models that learn procedural structures implicitly, resulting in limited sample efficiency and high computational cost. In this work, we introduce ViterbiPlanNet, a principled framework that explicitly integrates procedural knowledge into the learning process through a Differentiable Viterbi Layer (DVL). The DVL embeds a Procedural Knowledge Graph (PKG) directly with the Viterbi decoding algorithm, replacing non-differentiable operations with smooth relaxations that enable end-to-end optimization. This design allows the model to learn through graph-based decoding. Experiments on CrossTask, COIN, and NIV demonstrate that ViterbiPlanNet achieves state-of-the-art performance with an order of magnitude fewer parameters than diffusion- and LLM-based planners. Extensive ablations show that performance gains arise from our differentiable structure-aware training rather than post-hoc refinement, resulting in improved sample efficiency and robustness to shorter unseen horizons. We also address testing inconsistencies by establishing a unified testing protocol with consistent splits and evaluation metrics. With this new protocol, we run experiments multiple times and report results using bootstrapping to assess statistical significance.
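
To make the idea of replacing non-differentiable operations with smooth relaxations concrete, here is a minimal sketch in plain Python. It is not the authors' DVL: it assumes the hard `max` in the Viterbi recursion is relaxed with a temperature-scaled log-sum-exp, which is one common choice, and the abstract does not say which relaxation the paper uses. The function names, the temperature `tau`, and the toy scores are all hypothetical.

```python
import math

def logsumexp(values, tau):
    """Smooth maximum: tau * log(sum(exp(v / tau))); approaches max(values) as tau -> 0."""
    m = max(values)  # shift by the max for numerical stability
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))

def hard_viterbi_score(emissions, transitions):
    """Classic Viterbi best-path score; max is not smoothly differentiable."""
    n = len(emissions[0])
    delta = list(emissions[0])
    for t in range(1, len(emissions)):
        delta = [
            emissions[t][j] + max(delta[i] + transitions[i][j] for i in range(n))
            for j in range(n)
        ]
    return max(delta)

def soft_viterbi_score(emissions, transitions, tau=1.0):
    """Same recursion with every max replaced by a smooth log-sum-exp (assumed relaxation)."""
    n = len(emissions[0])
    delta = list(emissions[0])
    for t in range(1, len(emissions)):
        delta = [
            emissions[t][j]
            + logsumexp([delta[i] + transitions[i][j] for i in range(n)], tau)
            for j in range(n)
        ]
    return logsumexp(delta, tau)

# Toy example (hypothetical numbers): 3 steps, 2 actions.
# emissions[t][j]: log-score of action j at step t (e.g. from a visual model).
# transitions[i][j]: log-score of action i followed by action j (e.g. from a graph).
emissions = [[0.0, -1.0], [-2.0, 0.0], [0.0, -0.5]]
transitions = [[-0.1, -1.0], [-1.5, -0.2]]

hard = hard_viterbi_score(emissions, transitions)
soft = soft_viterbi_score(emissions, transitions, tau=0.01)
print(hard, soft)  # the two scores nearly coincide at low temperature
```

The smoothed score is always at least the hard score and converges to it as `tau` shrinks, so the temperature trades off fidelity to exact decoding against how smooth the gradients are.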

Luigi Seminara, Davide Moltisanti, Antonino Furnari • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Procedure Planning | CrossTask | Success Rate (SR) 39.75 | 43 |
| Procedure Planning | COIN T=3 (test) | SR 0.3399 | 40 |
| Procedure Planning | NIV T=3 (test) | SR 32.37 | 30 |
| Procedure Planning | CrossTask T=3 (test) | SR 38.45 | 27 |
| Procedure Planning | NIV | Success Rate (SR) 34.44 | 26 |
| Visual Planning | CrossTask | Success Rate (SR) 38.45 | 22 |
| Visual Planning | COIN | Success Rate (SR) 33.99 | 22 |
| Procedure Planning | EgoPER | Success Rate (SR) 51.84 | 8 |
| Procedure Planning | CrossTask T=3 | Success Rate (SR) 39.75 | 7 |
| Procedure Planning | CrossTask T=4 | SR 0.2419 | 7 |
Showing 10 of 11 rows
