
Memory-augmented Dense Predictive Coding for Video Representation Learning

About

The objective of this paper is self-supervised learning from video, in particular representations for action recognition. We make the following contributions: (i) We propose a new architecture and learning framework, Memory-augmented Dense Predictive Coding (MemDPC), for the task. It is trained with a predictive attention mechanism over a set of compressed memories, such that any future state can always be constructed as a convex combination of the condensed representations, allowing the model to make multiple hypotheses efficiently. (ii) We investigate visual-only self-supervised video representation learning from RGB frames, from unsupervised optical flow, or from both. (iii) We thoroughly evaluate the quality of the learnt representations on four different downstream tasks: action recognition, video retrieval, learning with scarce annotations, and unintentional action classification. In all cases, we demonstrate state-of-the-art or comparable performance over other approaches with orders of magnitude less training data.

Tengda Han, Weidi Xie, Andrew Zisserman • 2020
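
The abstract's central idea, that a predicted future state is a convex combination of compressed memory entries, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the linear scoring matrix `W`, the use of a softmax to produce the attention weights, and all sizes (`k`, `d_c`, `d`) are assumptions chosen only for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def predict_future_state(context, W, memory):
    """Hypothetical predictive attention over a memory bank.

    context: (d_c,) summary of the observed past (assumed shape)
    W:       (k, d_c) assumed linear map from context to one score per slot
    memory:  (k, d) bank of k compressed memory entries
    Returns the predicted future state (d,) and the attention weights (k,).
    """
    logits = W @ context        # one score per memory slot
    weights = softmax(logits)   # non-negative and sums to 1
    future = weights @ memory   # convex combination of memory entries
    return future, weights

# Toy example with made-up sizes and random values.
rng = np.random.default_rng(0)
k, d_c, d = 8, 16, 4
memory = rng.normal(size=(k, d))
W = rng.normal(size=(k, d_c))
context = rng.normal(size=d_c)

future, weights = predict_future_state(context, W, memory)
```

Because the weights are non-negative and sum to one, the predicted state always lies inside the convex hull of the memory entries. Different contexts select different mixtures of the same small set of entries, which is one way to read the abstract's claim about forming multiple hypotheses efficiently.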

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Recognition | UCF101 | Accuracy | 86.1 | 365 |
| Action Recognition | UCF101 (mean of 3 splits) | Accuracy | 69.2 | 357 |
| Action Recognition | UCF101 (test) | Accuracy | 54.1 | 307 |
| Action Recognition | HMDB51 (test) | Accuracy | 0.412 | 249 |
| Action Recognition | HMDB51 | Top-1 Acc | 54.5 | 225 |
| Action Recognition | HMDB51 | 3-Fold Accuracy | 41.2 | 191 |
| Video Action Recognition | UCF101 | Top-1 Acc | 84.3 | 153 |
| Action Recognition | UCF-101 | Top-1 Acc | 86.1 | 147 |
| Video Retrieval | UCF101 (1) | Top-1 Acc | 40.2 | 92 |
| Video Recognition | HMDB51 | Accuracy | 54.5 | 89 |
Showing 10 of 25 rows
