Unsupervised Learning and Segmentation of Complex Activities from Video
About
This paper presents a new method for unsupervised segmentation of complex activities from video into multiple steps, or sub-activities, without any textual input. We propose an iterative discriminative-generative approach that alternates between discriminatively learning a mapping from the videos' visual features to sub-activity labels and generatively modelling the temporal structure of the sub-activities with a Generalized Mallows Model. In addition, we introduce a background model to account for frames unrelated to the actual activities. Our approach is validated on the challenging Breakfast Actions and Inria Instructional Videos datasets and outperforms both unsupervised and weakly-supervised state-of-the-art methods.
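The alternation described above can be illustrated with a toy sketch. This is not the paper's actual algorithm (which uses a Generalized Mallows Model over sub-activity orderings and learned appearance classifiers); it only shows the shape of the loop: a discriminative step labels frames by appearance plus a temporal-ordering prior, and a generative step re-estimates each sub-activity's appearance and expected temporal position. All names and the scalar features are invented for illustration.

```python
# Toy sketch of a discriminative-generative alternation for temporal
# segmentation. Hypothetical simplification: 1-D frame features, a
# nearest-centroid "classifier", and a quadratic penalty on deviation
# from each sub-activity's expected temporal centre standing in for
# the Generalized Mallows ordering model.

def segment(frames, k, iters=10, temporal_weight=0.01):
    """frames: list of scalar frame features; k: number of sub-activities."""
    n = len(frames)
    # Initialise appearance centroids from evenly spaced temporal chunks.
    centroids = []
    for i in range(k):
        chunk = frames[i * n // k:(i + 1) * n // k]
        centroids.append(sum(chunk) / max(1, len(chunk)))
    # Expected temporal centre of each sub-activity (canonical ordering prior).
    positions = [(i + 0.5) * n / k for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Discriminative step: label each frame by appearance + temporal prior.
        for t, x in enumerate(frames):
            labels[t] = min(
                range(k),
                key=lambda j: (x - centroids[j]) ** 2
                              + temporal_weight * (t - positions[j]) ** 2)
        # Generative step: re-estimate appearance and temporal position
        # of each sub-activity from its currently assigned frames.
        for j in range(k):
            idx = [t for t in range(n) if labels[t] == j]
            if idx:
                centroids[j] = sum(frames[t] for t in idx) / len(idx)
                positions[j] = sum(idx) / len(idx)
    return labels
```

On a sequence whose appearance changes halfway through, e.g. `segment([0.0] * 5 + [10.0] * 5, k=2)`, the loop recovers the two temporally contiguous segments.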
Fadime Sener, Angela Yao • 2018
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Segmentation | Breakfast | MoF | 34.6 | 66 |
| Action Segmentation | Breakfast (test) | MoF | 34.6 | 31 |
| Action Segmentation | Breakfast 14 | MoF | 34.6 | 26 |
| Action Segmentation | Breakfast Action dataset | MoF | 34.6 | 22 |
| Action Segmentation | YouTube Instructions (test) | F1 Score (%) | 27 | 17 |
| Action Segmentation | YouTube Instructions | F1 | 27 | 16 |
| Temporal Video Segmentation | Breakfast | MoF | 0.346 | 14 |
| Temporal action segmentation | YouTube Instructional YTI (test) | F1 Score | 27 | 11 |
| Video segmentation | INRIA Instructional Videos | F1 Score | 69.2 | 10 |
| Unsupervised Temporal Action Segmentation | Breakfast | MoF | 34.6 | 10 |

(10 of 13 benchmark rows shown.)