Directional Temporal Modeling for Action Recognition

About

Many current activity recognition models use 3D convolutional neural networks (e.g., I3D, I3D-NL) to generate local spatial-temporal features. However, such features do not encode clip-level ordered temporal information. In this paper, we introduce a channel-independent directional convolution (CIDC) operation, which learns to model the temporal evolution among local features. By applying multiple CIDC units, we construct a lightweight network that models the clip-level temporal evolution across multiple spatial scales. Our CIDC network can be attached to any activity recognition backbone network. We evaluate our method on four popular activity recognition datasets and consistently improve upon state-of-the-art techniques. We further visualize the activation map of our CIDC network and show that it is able to focus on more meaningful, action-related parts of the frame.
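
As a rough illustration of what "channel independent" and "directional" could mean for temporal modeling, the sketch below mixes per-frame features along time separately for each channel and masks the mixing weights so that each output step only sees the current and earlier frames. This is one plausible reading of the abstract, not the authors' implementation: the function name `directional_temporal_mix`, the tensor shapes, and the lower-triangular mask are all assumptions made for the example.

```python
import numpy as np

def directional_temporal_mix(features, weights):
    """Hypothetical sketch of a channel-independent, directional temporal mix.

    features: (C, T, N) array, N local features per frame (e.g. N = H * W).
    weights:  (C, T, T) array, one temporal mixing matrix per channel.
    Returns a (C, T, N) array where output step t depends only on steps 0..t.
    """
    C, T, N = features.shape
    # Assumption: "directional" is modeled as a lower-triangular (causal) mask.
    mask = np.tril(np.ones((T, T)))
    masked = weights * mask  # mask broadcasts over the channel axis
    # Each channel c is mixed by its own matrix; channels never interact.
    return np.einsum("cts,csn->ctn", masked, features)

rng = np.random.default_rng(0)
C, T, N = 4, 8, 49                    # illustrative sizes only
x = rng.standard_normal((C, T, N))    # stand-in for backbone features
w = rng.standard_normal((C, T, T))    # stand-in for learned weights
y = directional_temporal_mix(x, w)
print(y.shape)
```

In this sketch, changing the last frame of `x` leaves every earlier output step unchanged, which is the ordering property that an unmasked temporal mix would not have.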

Xinyu Li, Bing Shuai, Joseph Tighe • 2020

Related benchmarks

Task	Dataset	Result
Action Recognition	Kinetics 400 (test)	Top-1 Accuracy: 74.5	245
Action Recognition	Something-Something v2 (test val)	Top-1 Accuracy: 56.3	187
Action Classification	Kinetics 400 (val)	Top-1 Accuracy: 75.5	63
