P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision

About

In this paper, we study the problem of procedure planning in instructional videos. Here, an agent must produce a plausible sequence of actions that can transform the environment from a given start to a desired goal state. When learning procedure planning from instructional videos, most recent work leverages intermediate visual observations as supervision, which requires expensive annotation efforts to precisely localize all the instructional steps in training videos. In contrast, we remove the need for expensive temporal video annotations and propose a weakly supervised approach by learning from natural language instructions. Our model is based on a transformer equipped with a memory module, which maps the start and goal observations to a sequence of plausible actions. Furthermore, we augment our model with a probabilistic generative module to capture the uncertainty inherent to procedure planning, an aspect largely overlooked by previous work. We evaluate our model on three datasets and show that our weakly supervised approach outperforms previous fully supervised state-of-the-art models on multiple metrics.

He Zhao, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Richard P. Wildes, Allan D. Jepson • 2022

Related benchmarks

Task                Dataset                      Metric             Result  Rank
Procedure Planning  CrossTask                    Success Rate (SR)  23.34   35
Procedure Planning  COIN T=3 (test)              SR                 15.4    21
Procedure Planning  CrossTask T=5                Success Rate       11.8    15
Procedure Planning  COIN T=4 (test)              SR                 11.32   13
Procedure Planning  NIV T=3 (test)               SR                 24.68   12
Procedure Planning  NIV T=4 (test)               SR                 20.14   12
Procedure Planning  CrossTask short horizon T=3  SR                 23.34   11
Procedure Planning  CrossTask short horizon T=4  SR                 13.4    10
Procedure Planning  CrossTask long horizons T=6  Success Rate (SR)  4.4     10
Procedure Planning  CrossTask T=3 (test)         SR                 23.34   9
Showing 10 of 20 rows
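
The Success Rate (SR) figures above are reported as percentages. In the procedure planning literature, SR is commonly defined as the share of predicted plans that match the ground-truth action sequence exactly, step for step. The sketch below illustrates the metric under that assumption; the function name `success_rate` and the action labels are hypothetical and are not taken from the P3IV code.

```python
from typing import Sequence


def success_rate(
    predicted: Sequence[Sequence[str]],
    reference: Sequence[Sequence[str]],
) -> float:
    """Return the percentage of plans that match the reference exactly.

    A plan counts as a success only if every action matches the
    ground-truth action at the same position (assumed SR definition).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    # Compare whole plans; one wrong or misplaced step makes the plan fail.
    hits = sum(1 for p, r in zip(predicted, reference) if list(p) == list(r))
    return 100.0 * hits / len(predicted)


# Hypothetical plans with horizon T=3; only the first matches exactly.
reference_plans = [
    ["pour water", "add coffee", "stir"],
    ["cut lemon", "squeeze lemon", "pour juice"],
]
predicted_plans = [
    ["pour water", "add coffee", "stir"],
    ["cut lemon", "pour juice", "squeeze lemon"],
]
score = success_rate(predicted_plans, reference_plans)
```

Because a single wrong step fails the whole plan, SR tends to drop as the planning horizon T grows, which is consistent with the lower CrossTask scores at T=5 and T=6 in the table.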
