
Directional Temporal Modeling for Action Recognition

About

Many current activity recognition models use 3D convolutional neural networks (e.g., I3D, I3D-NL) to generate local spatial-temporal features. However, such features do not encode clip-level ordered temporal information. In this paper, we introduce a channel-independent directional convolution (CIDC) operation, which learns to model the temporal evolution among local features. By applying multiple CIDC units, we construct a lightweight network that models the clip-level temporal evolution across multiple spatial scales. Our CIDC network can be attached to any activity recognition backbone network. We evaluate our method on four popular activity recognition datasets and consistently improve upon state-of-the-art techniques. We further visualize the activation map of our CIDC network and show that it is able to focus on more meaningful, action-related parts of the frame.
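
The abstract does not spell out how the CIDC operation is computed, so the following is only an illustrative sketch of the two properties its name suggests, not the authors' implementation. It assumes that "channel-independent" means each feature channel gets its own temporal weights, and that "directional" means an output time step only draws on the current and earlier input steps. The function name `directional_temporal_mix`, the weight layout, and the pooling away of spatial dimensions are all hypothetical choices made for brevity.

```python
def directional_temporal_mix(x, weights):
    """Mix features along time, separately per channel, in one direction.

    x:       x[c][s] is the (spatially pooled) feature of channel c at step s.
    weights: weights[c][t][s] scales input step s when producing output step t.
             Only entries with s <= t are read (assumed directional mask).
    """
    out = []
    for c, channel in enumerate(x):
        steps = len(channel)
        mixed = []
        for t in range(steps):
            # Output step t sees input steps 0..t only, never later ones.
            mixed.append(sum(weights[c][t][s] * channel[s] for s in range(t + 1)))
        out.append(mixed)
    return out


# Two channels, three time steps, all weights set to 1.0 for readability.
x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
w = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
y = directional_temporal_mix(x, w)
print(y)  # [[1.0, 3.0, 6.0], [10.0, 30.0, 60.0]]
```

With unit weights the result is a running sum per channel, which makes the ordering visible: changing the last time step alters only the last output, and changing one channel leaves the other untouched. In a learned setting the weights would be trained parameters rather than constants.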

Xinyu Li, Bing Shuai, Joseph Tighe • 2020

Related benchmarks

Task | Dataset | Metric | Result | Rank
Action Recognition | Kinetics 400 (test) | Top-1 Accuracy | 74.5 | 245
Action Recognition | Something-Something v2 (test val) | Top-1 Accuracy | 56.3 | 187
Action Classification | Kinetics 400 (val) | Top-1 Accuracy | 75.5 | 63
