Learning Latent Super-Events to Detect Multiple Activities in Videos
About
In this paper, we introduce the concept of learning latent super-events from activity videos and show how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in videos with a particular temporal organization; it is the opposite of the concept of sub-events. Real-world videos contain multiple activities and are rarely segmented (e.g., surveillance videos), and learning latent super-events allows the model to capture how the events are temporally related in videos. We design temporal structure filters that enable the model to focus on particular sub-intervals of the videos, and use them together with a soft attention mechanism to learn representations of latent super-events. Super-event representations are combined with per-frame or per-segment CNNs to provide frame-level annotations. Our approach is designed to be fully differentiable, enabling end-to-end learning of latent super-event representations jointly with the activity detector that uses them. Our experiments with multiple public video datasets confirm that the proposed concept of latent super-event learning significantly benefits activity detection, advancing the state of the art.
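The abstract describes the model only at a high level. The following is a minimal sketch, in plain Python, of one way the pieces could fit together for a single activity class. It is not the authors' implementation, and several details are assumptions not stated in this abstract: each temporal structure filter is modeled as a normalized Cauchy-shaped kernel parameterized by a center and a width (both given as fractions of the video length), the soft attention is a softmax over the filters, and the per-frame score is a logistic classifier over the concatenation of the frame feature and the super-event vector. All function names (`cauchy_filter`, `apply_filter`, `super_event`, `frame_scores`) are hypothetical, and the parameters are fixed by hand here rather than learned.

```python
import math


def cauchy_filter(length, center, width):
    """Hypothetical temporal structure filter (assumption: Cauchy-shaped).

    `center` and `width` are fractions of the video length; the kernel is
    normalized so its weights over the `length` frames sum to 1.
    """
    c = center * (length - 1)
    g = max(width * length, 1e-6)  # avoid a zero width
    weights = [1.0 / (math.pi * g * (1.0 + ((t - c) / g) ** 2)) for t in range(length)]
    total = sum(weights)
    return [w / total for w in weights]


def apply_filter(features, kernel):
    """Temporally pool T per-frame feature vectors with one kernel."""
    dim = len(features[0])
    return [sum(kernel[t] * features[t][d] for t in range(len(features))) for d in range(dim)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def super_event(features, filters, attention_logits):
    """Pool the video once per filter, then mix the pooled vectors with soft attention."""
    pooled = [apply_filter(features, cauchy_filter(len(features), c, w)) for c, w in filters]
    attn = softmax(attention_logits)
    dim = len(features[0])
    return [sum(a * p[d] for a, p in zip(attn, pooled)) for d in range(dim)]


def frame_scores(features, sup, weights, bias):
    """Per-frame probability for one class from [frame feature; super-event vector]."""
    scores = []
    for v in features:
        x = v + sup  # list concatenation, length 2 * D
        z = sum(wi * xi for wi, xi in zip(weights, x)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


# Usage: 4 frames with 2-D features, two filters (start and end of the video).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [(0.0, 0.1), (1.0, 0.1)]
sup = super_event(features, filters, attention_logits=[0.0, 0.0])
scores = frame_scores(features, sup, weights=[0.5, -0.5, 1.0, 1.0], bias=0.0)
print(scores)
```

Every step is a smooth function of the filter centers and widths, the attention logits, and the classifier weights, so gradients could flow back through all of them. That property is what the abstract's claim of end-to-end differentiable training relies on; a real model would use a deep-learning framework, one attention vector per class, and CNN features instead of toy vectors.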
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Activity Detection | Charades localize v1 | mAP | 25.2 | 52 |
| Activity Detection | MLB-YouTube (test) | mAP | 39.6 | 51 |
| Temporal Action Localization | MultiTHUMOS | f-mAP | 36.4 | 20 |
| Activity Detection | MultiTHUMOS | mAP | 36.4 | 16 |
| Action Detection | MultiTHUMOS | -- | -- | 16 |
| Action Recognition (Dense Labeling) | MultiTHUMOS (test) | mAP | 36.4 | 15 |
| Temporal Activity Detection | Charades v1_localize (val) | mAP | 19.41 | 15 |
| Multi-label Temporal Action Localization | Charades per-frame 51 | mAP | 19.41 | 14 |
| Multi-label Temporal Action Segmentation | Charades 1.0 (test) | Seg-mAP | 18.6 | 14 |
| Temporal Activity Detection | MultiTHUMOS 2018 (test) | mAP | 46.4 | 12 |