NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
About
Video learning is an important task in computer vision and has attracted increasing interest in recent years. Since even a small number of videos easily comprises several million frames, methods that do not rely on frame-level annotation are of special importance. In this work, we propose a novel learning algorithm with a Viterbi-based loss that allows for online and incremental learning of weakly annotated video data. We moreover show that explicit context and length modeling leads to huge improvements in video segmentation and labeling tasks and include these models in our framework. On several action segmentation benchmarks, we obtain an improvement of up to 10% compared to current state-of-the-art methods.
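To make the idea of a Viterbi-based loss on weakly annotated video concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the weak annotation is an ordered list of action labels per video (a transcript) and that a network already provides frame-wise log-probabilities. It omits the context and length models mentioned above. The function names `viterbi_align` and `viterbi_loss` are hypothetical.

```python
import numpy as np


def viterbi_align(log_probs, transcript):
    """Find the best monotonic assignment of T frames to an ordered transcript.

    log_probs:  array of shape (T, C) with frame-wise log-probabilities.
    transcript: ordered list of class indices; each entry gets >= 1 frame.
    Returns an array of T frame labels.
    """
    T = log_probs.shape[0]
    N = len(transcript)
    if N > T:
        raise ValueError("transcript has more segments than the video has frames")

    score = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)  # 0 = stay in segment, 1 = advance
    score[0, 0] = log_probs[0, transcript[0]]

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            advance = score[t - 1, n - 1] if n > 0 else -np.inf
            if advance > stay:
                score[t, n] = advance
                back[t, n] = 1
            else:
                score[t, n] = stay
            score[t, n] += log_probs[t, transcript[n]]

    # Backtrack from the last frame, which must lie in the last segment.
    labels = np.empty(T, dtype=int)
    n = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[n]
        n -= back[t, n]
    return labels


def viterbi_loss(log_probs, labels):
    """Negative log-likelihood of the aligned frame labels."""
    return -log_probs[np.arange(len(labels)), labels].sum()


# Toy example: 5 frames, 3 classes, transcript "class 0 then class 2".
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
])
log_probs = np.log(probs)
labels = viterbi_align(log_probs, [0, 2])
loss = viterbi_loss(log_probs, labels)
```

In a training loop under these assumptions, the aligned labels would serve as frame-level pseudo ground truth for the current video, so the network can be updated one video at a time.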
Alexander Richard, Hilde Kuehne, Ahsan Iqbal, Juergen Gall • 2018
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Temporal action segmentation | 50Salads | Accuracy | 78.7 | 106 |
| Temporal action segmentation | Breakfast | Accuracy | 74.1 | 96 |
| Action Segmentation | Breakfast | MoF | 43 | 66 |
| Action Segmentation | Breakfast (test) | MoF | 43 | 31 |
| Action Segmentation | COIN | Frame Accuracy | 21.2 | 29 |
| Action Segmentation | Breakfast 14 | MoF | 43 | 26 |
| Action Segmentation | COIN (test) | Frame Accuracy | 21.2 | 23 |
| Action Segmentation | Breakfast Action dataset | MoF | 43 | 22 |
| Action Segmentation | 50Salads mid granularity | MoF | 49.4 | 19 |
| Action Alignment | Hollywood Extended | IoD | 48.7 | 15 |
Showing 10 of 22 rows
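For readers unfamiliar with the frame-level metrics in the table, the following sketch shows how such a score can be computed. It assumes MoF denotes mean over frames, i.e. the fraction of frames whose predicted label matches the ground truth, and that no frames (such as background) are excluded; individual benchmarks may apply different conventions. The function name `mean_over_frames` is hypothetical.

```python
import numpy as np


def mean_over_frames(predicted, ground_truth):
    """Fraction of frames whose predicted label equals the ground-truth label."""
    predicted = np.asarray(predicted)
    ground_truth = np.asarray(ground_truth)
    if predicted.shape != ground_truth.shape:
        raise ValueError("predicted and ground_truth must have the same length")
    return float((predicted == ground_truth).mean())


# Toy example: 4 of 5 frames are labeled correctly.
mof = mean_over_frames([0, 0, 2, 2, 2], [0, 0, 0, 2, 2])
mof_percent = 100.0 * mof  # benchmark tables usually report percentages
```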