Fast Weakly Supervised Action Segmentation Using Mutual Consistency

About

Action segmentation is the task of predicting the actions for each frame of a video. As obtaining the full annotation of videos for action segmentation is expensive, weakly supervised approaches that can learn only from transcripts are appealing. In this paper, we propose a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network. The two branches of our network predict two redundant but different representations for action segmentation and we propose a novel mutual consistency (MuCon) loss that enforces the consistency of the two redundant representations. Using the MuCon loss together with a loss for transcript prediction, our proposed approach achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference. The MuCon loss proves beneficial even in the fully supervised setting.
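To make the idea of "two redundant but different representations" concrete, here is a minimal sketch, assuming one representation is a per-frame label list and the other is an ordered list of (action, length) segments. This is not the MuCon loss itself, which the abstract does not specify; the function names, action labels, and lengths below are hypothetical and chosen only for illustration. The sketch converts between the two views and measures how often they agree frame by frame.

```python
from itertools import groupby


def segments_to_frames(segments):
    """Expand [(label, length), ...] into a per-frame label list."""
    frames = []
    for label, length in segments:
        frames.extend([label] * length)
    return frames


def frames_to_segments(frames):
    """Collapse per-frame labels into [(label, length), ...] via run-length encoding."""
    return [(label, len(list(run))) for label, run in groupby(frames)]


def frame_agreement(frames_a, frames_b):
    """Fraction of frames on which two equally long label sequences agree."""
    if len(frames_a) != len(frames_b):
        raise ValueError("sequences must have the same length")
    if not frames_a:
        return 1.0
    matches = sum(a == b for a, b in zip(frames_a, frames_b))
    return matches / len(frames_a)


# Hypothetical data: labels and lengths are made up for illustration.
segment_view = [("pour_milk", 3), ("stir", 2), ("drink", 1)]
frame_view = ["pour_milk", "pour_milk", "stir", "stir", "stir", "drink"]

expanded = segments_to_frames(segment_view)
print(frames_to_segments(frame_view))
print(frame_agreement(expanded, frame_view))
```

In this toy case the two views share the same transcript but disagree on one segment boundary, so the agreement is below 1. A consistency objective in the spirit of the abstract would penalize such disagreement, although a trainable version would need a differentiable formulation rather than this discrete comparison.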

Yaser Souri, Mohsen Fayyaz, Luca Minciullo, Gianpiero Francesca, Juergen Gall • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Segmentation | Breakfast | F1@10 73.2 | 107 |
| Temporal action segmentation | Breakfast | Accuracy 47.1 | 96 |
| Action Segmentation | Breakfast | MoF 50.7 | 66 |
| Action Segmentation | Breakfast 14 | MoF 49.7 | 26 |
| Action Alignment | Breakfast | IoD 66.2 | 18 |
| Action Alignment | Hollywood Extended | IoD 52.3 | 15 |
| Action Segmentation | Hollywood Extended | -- | 10 |
| Weakly-supervised Action Segmentation | Hollywood Extended | IoU 13.9 | 9 |
| Action Segmentation | Breakfast dataset (All splits) | MoF 48.5 | 7 |
| Action Segmentation | Hollywood Extended (avg) | Mof-bg 41.6 | 6 |
