
Learn to cycle: Time-consistent feature discovery for action recognition

About

Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs with similar activations with potential temporal variations. We implement this idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics, in conjunction with a temporal gate that is responsible for evaluating the consistency of the discovered dynamics and the modeled features. We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs. On Kinetics-700, we perform on par with current state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101 and HMDB-51.
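The gating idea described above can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It rests on several assumptions made only for illustration: the LSTM is replaced by a plain exponential moving average, "consistency" is read as a hard nearest-neighbor cycle check between the pooled per-frame features and the recurrent features, and an open gate rescales each channel by a sigmoid of its recurrent feature. All function names (`squeeze`, `recur`, `is_cycle_consistent`, `srtg_like_block`) and the `decay` parameter are hypothetical.

```python
import math


def squeeze(clip):
    """Average every channel over space: clip[t][c] is a list of spatial values."""
    return [[sum(ch) / len(ch) for ch in frame] for frame in clip]


def recur(pooled, decay):
    """Stand-in for the LSTM (assumption): exponential moving average over time."""
    state = list(pooled[0])
    out = [list(state)]
    for frame in pooled[1:]:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, frame)]
        out.append(list(state))
    return out


def nearest(query, candidates):
    """Index of the candidate vector closest to query (Euclidean distance)."""
    dists = [math.dist(query, c) for c in candidates]
    return dists.index(min(dists))


def is_cycle_consistent(pooled, dynamics):
    """True if each pooled frame maps to a dynamics frame and back to itself."""
    for t, vec in enumerate(pooled):
        j = nearest(vec, dynamics)               # pooled -> dynamics
        if nearest(dynamics[j], pooled) != t:    # dynamics -> pooled
            return False
    return True


def srtg_like_block(clip, decay=0.1):
    """Return (gate_open, output) for a clip laid out as clip[t][c][position]."""
    pooled = squeeze(clip)
    dynamics = recur(pooled, decay)
    if not is_cycle_consistent(pooled, dynamics):
        return False, clip  # gate closed: features pass through unchanged
    # Gate open (assumed fusion rule): scale each channel by sigmoid(dynamics).
    scaled = [
        [[v / (1.0 + math.exp(-d)) for v in ch] for ch, d in zip(frame, dyn)]
        for frame, dyn in zip(clip, dynamics)
    ]
    return True, scaled
```

With a four-frame, one-channel clip whose pooled features ramp from 0 to 3, a small `decay` keeps the recurrent features close to the pooled ones, so the cycle check passes and the gate opens. A large `decay` makes the recurrent features lag behind, the cycle check fails, and the input is returned untouched.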

Alexandros Stergiou, Ronald Poppe • 2020

Related benchmarks

Task                  Dataset                    Result             Rank
Action Recognition    UCF101 (test)              Accuracy 97.325    307
Action Recognition    Kinetics-700 (val)         Top-1 Acc 56.826   28
Video Classification  Moments in Time v1 (val)   Top-1 Acc 33.6     19
Action Recognition    HACS (val)                 Top-1 Acc 84.326   13
Action Recognition    Moments in Time (val)      Top-1 Acc 33.564   12
