Long-Term Feature Banks for Detailed Video Understanding

About

To understand the world, we humans constantly need to relate the present to the past, and put events in context. In this paper, we enable existing video models to do the same. We propose a long-term feature bank (supportive information extracted over the entire span of a video) to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds. Our experiments demonstrate that augmenting 3D convolutional networks with a long-term feature bank yields state-of-the-art results on three challenging video datasets: AVA, EPIC-Kitchens, and Charades.

Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick • 2018
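
The abstract says a long-term feature bank holds information extracted over the whole video and augments a model that only sees a short clip, but it does not say how the bank is built or read. The sketch below is one possible reading, not the authors' method. It assumes one feature vector per short clip, a dot-product attention read over a window of neighboring clips, and concatenation as the way to combine short-term and long-term features. All function names and parameters (`build_feature_bank`, `read_long_term_context`, `augment`, `window`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def build_feature_bank(clip_features: Sequence[Vector]) -> List[Vector]:
    """Keep one feature vector per short clip, in temporal order (assumed layout)."""
    return [list(f) for f in clip_features]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_long_term_context(bank: List[Vector], t: int, window: int) -> Vector:
    """Attention-weighted average of bank entries within `window` clips of clip t.

    Assumption: scaled dot-product attention with the current clip as the query.
    The source abstract does not specify the read operator.
    """
    lo = max(0, t - window)
    hi = min(len(bank), t + window + 1)
    query = bank[t]
    neighbors = bank[lo:hi]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, n)) / scale for n in neighbors]
    weights = softmax(scores)
    return [
        sum(w * n[d] for w, n in zip(weights, neighbors))
        for d in range(len(query))
    ]


def augment(short_term: Vector, context: Vector) -> Vector:
    """Combine short-term and long-term features by concatenation (assumed)."""
    return list(short_term) + list(context)


# Usage: four toy 2-D clip features standing in for real network outputs.
bank = build_feature_bank([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
context = read_long_term_context(bank, t=1, window=1)
augmented = augment(bank[1], context)
print(len(augmented))  # 4: two short-term dims plus two long-term dims
```

In this sketch, `window` controls how much of the video the clip at index `t` can draw on; a value that covers the whole bank corresponds to using the entire span of the video.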

Related benchmarks

Task                     Dataset              Result               Rank
Online Action Detection  THUMOS14 (test)      mAP 64.8             86
Action Recognition       Charades (val)       mAP 42.5             69
Action Recognition       Charades             mAP 0.425            64
Online Action Detection  TVSeries             mcAP 84.8            57
Action Recognition       Charades (test)      mAP 0.434            53
Action Recognition       Charades v1 (test)   --                   52
Action Detection         AVA v2.1 (val)       mAP 27.7             48
Online Action Detection  TVSeries (test)      mcAP 85.8            41
Video Classification     Charades             mAP 42.5             38
Action Recognition       EPIC-KITCHENS (val)  Verb Top-1 Acc 52.6  36
Showing 10 of 29 rows
