Fast Weakly Supervised Action Segmentation Using Mutual Consistency

About

Action segmentation is the task of predicting the actions for each frame of a video. As obtaining the full annotation of videos for action segmentation is expensive, weakly supervised approaches that can learn only from transcripts are appealing. In this paper, we propose a novel end-to-end approach for weakly supervised action segmentation based on a two-branch neural network. The two branches of our network predict two redundant but different representations for action segmentation and we propose a novel mutual consistency (MuCon) loss that enforces the consistency of the two redundant representations. Using the MuCon loss together with a loss for transcript prediction, our proposed approach achieves the accuracy of state-of-the-art approaches while being 14 times faster to train and 20 times faster during inference. The MuCon loss proves beneficial even in the fully supervised setting.
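To make "two redundant but different representations" concrete, the sketch below writes the same segmentation in two ways: as one label per frame, and as an ordered list of actions (a transcript) with a length for each. The abstract does not say which two representations the branches predict, so this pairing is an assumption made for illustration. The function names (`frames_to_segments`, `segments_to_frames`, `frame_agreement`) are hypothetical and the agreement score is a plain frame-matching ratio, not the MuCon loss itself.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (transcript, lengths)."""
    transcript, lengths = [], []
    for label, run in groupby(frame_labels):
        transcript.append(label)
        lengths.append(sum(1 for _ in run))
    return transcript, lengths


def segments_to_frames(transcript, lengths):
    """Expand (transcript, lengths) back into one label per frame."""
    frames = []
    for label, length in zip(transcript, lengths):
        frames.extend([label] * length)
    return frames


def frame_agreement(frames_a, frames_b):
    """Fraction of frames on which two frame-wise labelings agree.

    Illustrative stand-in for a consistency check; NOT the MuCon loss.
    """
    if len(frames_a) != len(frames_b):
        raise ValueError("labelings must cover the same number of frames")
    if not frames_a:
        return 0.0
    matches = sum(a == b for a, b in zip(frames_a, frames_b))
    return matches / len(frames_a)


if __name__ == "__main__":
    # Hypothetical 6-frame video with three actions.
    frames = ["pour", "pour", "stir", "stir", "stir", "drink"]
    transcript, lengths = frames_to_segments(frames)
    print(transcript, lengths)
    # A second, slightly different segment-level prediction of the same video.
    other = segments_to_frames(["pour", "stir", "drink"], [3, 2, 1])
    print(frame_agreement(frames, other))
```

Because each form can be converted into the other, a disagreement between two predictions of the same video can be measured frame by frame. That is the intuition behind enforcing consistency between the two branches, although the paper's actual loss is defined differently from this ratio.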

Yaser Souri, Mohsen Fayyaz, Luca Minciullo, Gianpiero Francesca, Juergen Gall • 2019

Related benchmarks

Task	Dataset	Result
Action Segmentation	Breakfast	Acc 62.8	127
Temporal action segmentation	Breakfast	Accuracy 47.1	119
Action Segmentation	Breakfast	MoF 50.7	78
Action Segmentation	Breakfast 14	MoF 49.7	26
Action Alignment	Breakfast	IoD 66.2	18
Action Alignment	Hollywood Extended	IoD 52.3	15
Action Segmentation	Hollywood Extended	--	10
Weakly-supervised Action Segmentation	Hollywood Extended	IoU 13.9	9
Action Segmentation	Breakfast dataset (All splits)	MoF 48.5	7
Action Segmentation	Hollywood Extended (avg)	MoF-bg 41.6	6

Showing 10 of 12 rows
