Unsupervised Learning and Segmentation of Complex Activities from Video
About
This paper presents a new method for unsupervised segmentation of complex activities from video into multiple steps, or sub-activities, without any textual input. We propose an iterative discriminative-generative approach that alternates between discriminatively learning a mapping from the videos' visual features to sub-activity labels and generatively modelling the temporal structure of the sub-activities with a Generalized Mallows Model. In addition, we introduce a background model to account for frames unrelated to the actual activities. Our approach is validated on the challenging Breakfast Actions and Inria Instructional Videos datasets and outperforms both unsupervised and weakly-supervised state-of-the-art methods.
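The alternation described above can be illustrated with a toy sketch. This is not the paper's actual algorithm (which uses a Generalized Mallows Model over sub-activity orderings and learned appearance classifiers); it only shows the shape of the loop: a discriminative step labels frames by appearance plus a temporal-ordering prior, and a generative step re-estimates each sub-activity's appearance and expected temporal position. All names and the scalar features are invented for illustration.

```python
# Toy sketch of a discriminative-generative alternation for temporal
# segmentation. Hypothetical simplification: 1-D frame features, a
# nearest-centroid "classifier", and a quadratic penalty on deviation
# from each sub-activity's expected temporal centre standing in for
# the Generalized Mallows ordering model.

def segment(frames, k, iters=10, temporal_weight=0.01):
    """frames: list of scalar frame features; k: number of sub-activities."""
    n = len(frames)
    # Initialise appearance centroids from evenly spaced temporal chunks.
    centroids = []
    for i in range(k):
        chunk = frames[i * n // k:(i + 1) * n // k]
        centroids.append(sum(chunk) / max(1, len(chunk)))
    # Expected temporal centre of each sub-activity (canonical ordering prior).
    positions = [(i + 0.5) * n / k for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Discriminative step: label each frame by appearance + temporal prior.
        for t, x in enumerate(frames):
            labels[t] = min(
                range(k),
                key=lambda j: (x - centroids[j]) ** 2
                              + temporal_weight * (t - positions[j]) ** 2)
        # Generative step: re-estimate appearance and temporal position
        # of each sub-activity from its currently assigned frames.
        for j in range(k):
            idx = [t for t in range(n) if labels[t] == j]
            if idx:
                centroids[j] = sum(frames[t] for t in idx) / len(idx)
                positions[j] = sum(idx) / len(idx)
    return labels
```

On a sequence whose appearance changes halfway through, e.g. `segment([0.0] * 5 + [10.0] * 5, k=2)`, the loop recovers the two temporally contiguous segments.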
Fadime Sener, Angela Yao • 2018
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Segmentation | Breakfast | MoF | 34.6 | 66 |
| Action Segmentation | Breakfast (test) | MoF | 34.6 | 31 |
| Action Segmentation | Breakfast 14 | MoF | 34.6 | 26 |
| Action Segmentation | Breakfast Action dataset | MoF | 34.6 | 22 |
| Action Segmentation | YouTube Instructions (test) | F1 Score (%) | 27 | 17 |
| Action Segmentation | YouTube Instructions | F1 | 27 | 16 |
| Temporal Video Segmentation | Breakfast | MoF | 0.346 | 14 |
| Temporal action segmentation | YouTube Instructional YTI (test) | F1 Score | 27 | 11 |
| Video segmentation | INRIA Instructional Videos | F1 Score | 69.2 | 10 |
| Unsupervised Temporal Action Segmentation | Breakfast | MoF | 34.6 | 10 |

(10 of 13 benchmark rows shown.)