Set-Constrained Viterbi for Set-Supervised Action Segmentation

About

This paper is about weakly supervised action segmentation, where the ground truth specifies only the set of actions present in a training video, but not their true temporal ordering. Prior work typically uses a classifier that independently labels video frames to generate the pseudo ground truth, and multiple instance learning to train the classifier. We extend this framework by specifying an HMM, which accounts for co-occurrences of action classes and their temporal lengths, and by explicitly training the HMM on a Viterbi-based loss. Our first contribution is the formulation of a new set-constrained Viterbi algorithm (SCV). Given a video, the SCV generates the MAP action segmentation that satisfies the ground truth. This prediction is used as a framewise pseudo ground truth in our HMM training. Our second contribution in training is a new regularization of feature affinities between training videos that share the same action classes. Evaluation of action segmentation and alignment on the Breakfast, MPII Cooking 2, and Hollywood Extended datasets demonstrates significant performance improvements over prior work on both tasks.
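The abstract describes the SCV only at a high level, so the following is a simplified stand-in rather than the authors' algorithm. It assumes framewise log-probabilities from some classifier and replaces the paper's HMM transition and length model with a constant penalty per label change (`switch_penalty`, a hypothetical parameter). It enforces the set constraint exactly, using only actions from the ground-truth set and using each of them at least once, by carrying a bitmask of the actions used so far through the Viterbi recursion. This subset DP is exponential in the set size, so it is only practical for small sets. The name `set_constrained_viterbi` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, int]  # (index into labels, bitmask of labels used so far)


def set_constrained_viterbi(
    log_probs: Sequence[Sequence[float]],
    action_set: Sequence[int],
    switch_penalty: float = 1.0,
) -> List[int]:
    # Hypothetical simplified SCV: framewise scores plus a constant penalty per
    # label change. This is NOT the paper's HMM with action-length models.
    labels = list(dict.fromkeys(action_set))  # unique ground-truth actions
    k, num_frames = len(labels), len(log_probs)
    if k == 0 or num_frames < k:
        raise ValueError("need a non-empty action set and at least one frame per action")
    full_mask = (1 << k) - 1

    # score[state]: best log-score of a labeling of frames 0..t ending in state
    score: Dict[State, float] = {(i, 1 << i): log_probs[0][c] for i, c in enumerate(labels)}
    backptr: List[Dict[State, State]] = [{}]  # backptr[t][state] -> state at t-1

    for t in range(1, num_frames):
        new_score: Dict[State, float] = {}
        new_back: Dict[State, State] = {}
        for (j, mask), s in score.items():
            for i, c in enumerate(labels):
                # Staying in the same action is free; switching costs switch_penalty.
                cand = s + log_probs[t][c] - (0.0 if i == j else switch_penalty)
                key = (i, mask | (1 << i))
                if cand > new_score.get(key, -math.inf):
                    new_score[key] = cand
                    new_back[key] = (j, mask)
        score = new_score
        backptr.append(new_back)

    # Only end states whose mask covers every ground-truth action are admissible.
    best_state = max(
        (state for state in score if state[1] == full_mask), key=score.__getitem__
    )
    # Trace back from the last frame to the first.
    path = [best_state]
    for t in range(num_frames - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return [labels[i] for i, _ in path]
```

For example, suppose a classifier favours action 0 in all four frames, and the ground-truth set is {0, 2}. An unconstrained argmax would label every frame 0, which violates the set. The constrained decoder instead assigns action 2 to the last frame, where it costs the least relative to action 0, giving `[0, 0, 0, 2]`.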

Jun Li, Sinisa Todorovic · 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Temporal action segmentation | Breakfast | Accuracy | 30.2 | 96 |
| Action Segmentation | Breakfast (test) | MoF | 30.2 | 31 |
| Action Alignment | Hollywood Extended (test) | IoD | 35.5 | 12 |
| Action Alignment | Breakfast (test) | MoF | 40.8 | 9 |
| Action Segmentation | MPII Cooking 2 (test) | Midpoint Hit | 14.5 | 5 |
| Action Alignment | Cooking 2 (test) | Midpoint Score | 15.1 | 4 |
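Several rows above report MoF (mean over frames), which is the fraction of frames whose predicted action label matches the ground truth. The sketch below is minimal and makes two assumptions. First, it pools frames over all test videos before dividing, which is a common convention. Second, it ignores any background-class handling a particular benchmark may apply. The name `mean_over_frames` is hypothetical.

```python
from typing import Sequence, Tuple


def mean_over_frames(videos: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Pooled MoF: correctly labeled frames / total frames over all (pred, gt) videos."""
    correct = total = 0
    for pred, gt in videos:
        if len(pred) != len(gt):
            raise ValueError("prediction and ground truth lengths differ")
        correct += sum(p == g for p, g in zip(pred, gt))  # count matching frames
        total += len(gt)
    if total == 0:
        raise ValueError("no frames to evaluate")
    return correct / total
```

Pooling weights long videos more heavily than averaging per-video accuracies would. For two videos with 3 of 4 and 1 of 2 frames correct, the pooled MoF is 4/6, not the per-video mean of 0.625.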
