
Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization

About

Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other towards more precise action recognition, while attention capitalizes on the important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this as a mutual information loss. Empirically, our approach sets new state-of-the-art performance on three benchmark datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA. Our code is anonymously released at https://github.com/ActionR-Group/Stream-GCN, hoping to inspire the community.

Yuheng Yang, Haipeng Chen, Zhenguang Liu, Yingda Lyu, Beibei Zhang, Shuang Wu, Zhibo Wang, Kui Ren • 2023
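
The acceleration representation mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact formulation, so this assumes the simplest reading: velocity and acceleration are first- and second-order finite differences of joint coordinates over time, zero-padded to keep the sequence length. It works on plain coordinate differences, not the joint-wise angular acceleration the abstract refers to, and the names `finite_difference` and `motion_streams` are hypothetical, not taken from the Stream-GCN code.

```python
from typing import Dict, List

Joint = List[float]          # [x, y, z] coordinates of one joint
Frame = List[Joint]          # all joints in one frame
Sequence = List[Frame]       # T frames


def _diff_once(seq: Sequence) -> Sequence:
    """Forward difference between consecutive frames (length shrinks by 1)."""
    return [
        [[b - a for a, b in zip(j0, j1)] for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]


def finite_difference(seq: Sequence, order: int) -> Sequence:
    """order-th temporal difference, zero-padded at the end to length T."""
    if not seq:
        return []
    out = seq
    for _ in range(order):
        out = _diff_once(out)
    # Pad with all-zero frames so every stream has the same number of frames.
    missing = len(seq) - len(out)
    pad = [[[0.0] * len(joint) for joint in seq[0]] for _ in range(missing)]
    return out + pad


def motion_streams(joints: Sequence) -> Dict[str, Sequence]:
    """Build three input streams (assumed layout): position, velocity, acceleration."""
    return {
        "joint": joints,
        "velocity": finite_difference(joints, 1),
        "acceleration": finite_difference(joints, 2),
    }


# One joint whose x-coordinate follows x = t^2, so its acceleration is constant.
sequence = [[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[4.0, 0.0, 0.0]], [[9.0, 0.0, 0.0]]]
streams = motion_streams(sequence)
accel_x = [frame[0][0] for frame in streams["acceleration"]]
print(accel_x)  # [2.0, 2.0, 0.0, 0.0] (last two frames are zero padding)
```

In a multi-stream setup of the kind the abstract describes, each such stream would be fed to its own network branch and the branch outputs combined; how Stream-GCN fuses them is not stated here.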

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 91 | 661
Action Recognition | NTU RGB+D (Cross-View) | -- | 609
Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 96.9 | 575
Action Recognition | NTU RGB+D (Cross-subject) | -- | 474
Action Recognition | NTU RGB+D 60 (X-sub) | -- | 467
Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 92.9 | 305
Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy 89.7 | 183
Action Recognition | NTU-120 (cross-subject (xsub)) | Accuracy 89.7 | 82
Action Recognition | NW-UCLA | Top-1 Acc 96.8 | 67
Skeleton-based Action Recognition | NW-UCLA | -- | 44
Showing 10 of 11 rows
