Weakly Supervised Energy-Based Learning for Action Segmentation

About

This paper is about labeling video frames with action classes under weak supervision, where training provides the temporal ordering of actions but not their start and end frames in the training videos. Following prior work, we use a hidden Markov model (HMM) grounded on a Gated Recurrent Unit (GRU) for frame labeling. Our key contribution is a new constrained discriminative forward loss (CDFL) for training the HMM and GRU under weak supervision. While prior work typically estimates the loss on a single inferred video segmentation, our CDFL discriminates between the energy of all valid and all invalid frame labelings of a training video. A valid frame labeling satisfies the ground-truth temporal ordering of actions, whereas an invalid one violates it. We specify an efficient recursive algorithm for computing the CDFL in terms of the logadd function of the segmentation energy. On action segmentation and alignment, our evaluation shows results superior to the state of the art on the Breakfast Action, Hollywood Extended, and 50Salads benchmark datasets.

Jun Li, Peng Lei, Sinisa Todorovic (2019)
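
The abstract does not spell out the recursion, so the following is only an illustrative sketch of how a logadd-based forward pass over valid labelings can be computed, not the paper's CDFL. Assumptions: `scores[t][c]` is a per-frame score (a negative energy) for class c at frame t, standing in for the GRU/HMM output with no length or transition model; a valid labeling gives each transcript action one contiguous, non-empty run of frames in the given order; and the loss is a plain "all labelings minus valid labelings" forward loss. The paper's constraints on which labelings enter the loss are not reproduced. All function names are hypothetical.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logadd(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def logadd_valid(scores: Sequence[Sequence[float]], transcript: Sequence[int]) -> float:
    """Hypothetical helper: logadd of labeling scores over all segmentations
    that follow `transcript` (ordered action classes, one segment each).

    Runs in O(T * N) time for T frames and N transcript actions.
    Repeated adjacent actions in the transcript are treated as separate
    segments, so the sum is over segment paths.
    """
    T, N = len(scores), len(transcript)
    if N == 0 or N > T:
        return NEG_INF  # no valid labeling exists
    # alpha[n]: logadd over labelings of frames 0..t whose frame t lies in segment n
    alpha = [NEG_INF] * N
    alpha[0] = scores[0][transcript[0]]
    for t in range(1, T):
        new = [NEG_INF] * N
        for n in range(N):
            stay = alpha[n]                              # frame t continues segment n
            enter = alpha[n - 1] if n > 0 else NEG_INF   # frame t starts segment n
            prev = logadd(stay, enter)
            if prev != NEG_INF:
                new[n] = prev + scores[t][transcript[n]]
        alpha = new
    return alpha[N - 1]  # all frames covered and the last segment reached


def logadd_all(scores: Sequence[Sequence[float]]) -> float:
    """logadd over every possible frame labeling (valid or not).

    Because a labeling's score is a sum of per-frame scores, this factorizes
    into a sum over frames of logadd over classes.
    """
    total = 0.0
    for row in scores:
        acc = NEG_INF
        for s in row:
            acc = logadd(acc, s)
        total += acc
    return total


def discriminative_forward_loss(scores: Sequence[Sequence[float]],
                                transcript: Sequence[int]) -> float:
    """Simplified (unconstrained) discriminative forward loss.

    Since all = logadd(valid, invalid), this equals
    log(1 + exp(invalid - valid)): it is small when valid labelings
    carry most of the score mass.
    """
    return logadd_all(scores) - logadd_valid(scores, transcript)
```

For example, with all-zero scores on 3 frames, 2 classes, and transcript `[0, 1]`, only 2 of the 8 labelings (`001` and `011`) are valid, so the loss is log 8 - log 2 = log 4. Working in the log domain with `logadd` avoids overflow when summing the exponentiated scores of exponentially many labelings.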

Related benchmarks

Task | Dataset | Metric | Result | Rank
Temporal action segmentation | 50Salads | Accuracy | 54.7 | 106
Temporal action segmentation | Breakfast | Accuracy | 50.2 | 96
Action Segmentation | Breakfast | MoF | 50.2 | 66
Action Segmentation | Breakfast (test) | MoF | 50.2 | 31
Action Segmentation | Breakfast 14 | MoF | 50.2 | 26
Action Segmentation | 50Salads mid granularity | MoF | 54.7 | 19
Action Alignment | Breakfast | IoD | 63.9 | 18
Action Alignment | Hollywood Extended | IoD | 52.9 | 15
Temporal Video Segmentation | Breakfast | MoF | 0.502 | 14
Action Alignment | Hollywood Extended (test) | IoD | 52.9 | 12
