Long-Term Feature Banks for Detailed Video Understanding

About

To understand the world, we humans constantly need to relate the present to the past, and put events in context. In this paper, we enable existing video models to do the same. We propose a long-term feature bank---supportive information extracted over the entire span of a video---to augment state-of-the-art video models that otherwise would only view short clips of 2-5 seconds. Our experiments demonstrate that augmenting 3D convolutional networks with a long-term feature bank yields state-of-the-art results on three challenging video datasets: AVA, EPIC-Kitchens, and Charades.

Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Kr\"ahenb\"uhl, Ross Girshick• 2018

Related benchmarks

Task	Dataset	Result
Online Action Detection	THUMOS14 (test)	mAP64.8	93
Online Action Detection	TVSeries	mcAP84.8	71
Action Recognition	Charades (val)	mAP42.5	69
Action Recognition	Charades	mAP0.425	64
Action Recognition	Charades (test)	mAP0.434	53
Action Recognition	Charades v1 (test)	--	52
Action Detection	AVA v2.1 (val)	mAP27.7	48
Online Action Detection	TVSeries (test)	mcAP85.8	41
Video Classification	Charades	mAP42.5	38
Action Recognition	EPIC-KITCHENS (val)	Verb Top-1 Acc52.6	36

Showing 10 of 29 rows

Other info

Code

Follow for update

@wizwand_team Discord