
Learning Discriminative Representations for Skeleton Based Action Recognition

About

Human action recognition aims to classify the category of a human action from a video segment. Recently, researchers have focused on designing GCN-based models that extract features from skeletons for this task, because skeleton representations are much more efficient and robust than other modalities such as RGB frames. However, when only skeleton data is used, some important cues, such as related objects, are discarded. As a result, some ambiguous actions are hard to distinguish and tend to be misclassified. To alleviate this problem, we propose an auxiliary feature refinement head (FR Head), which consists of spatial-temporal decoupling and contrastive feature refinement, to obtain discriminative representations of skeletons. Ambiguous samples are dynamically discovered and calibrated in the feature space. Furthermore, FR Head can be applied at different stages of a GCN to build multi-level refinement for stronger supervision. Extensive experiments are conducted on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets. Our proposed models obtain results competitive with state-of-the-art methods and help to discriminate those ambiguous samples. Code is available at https://github.com/zhysora/FR-Head.
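To make the idea of contrastive feature refinement more concrete, the sketch below shows a generic prototype-based, InfoNCE-style loss: each feature is pulled toward the prototype of its own class and pushed away from the others. This is an illustration under stated assumptions, not the FR Head implementation from the linked repository. The function names, the use of per-class mean features as prototypes, cosine similarity, and the temperature `tau` are all hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def class_prototypes(features, labels):
    """Mean feature per class (assumed stand-in for whatever prototypes a model keeps)."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def prototype_contrastive_loss(features, labels, prototypes, tau=0.1):
    """Average InfoNCE-style loss of features against class prototypes."""
    total = 0.0
    for f, y in zip(features, labels):
        # Similarity to every prototype, scaled by the temperature.
        logits = {k: cosine(f, p) / tau for k, p in prototypes.items()}
        # Numerically stable log-sum-exp over all classes.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Negative log-probability of the true class.
        total += log_denom - logits[y]
    return total / len(features)
```

With this loss, a feature that lies between two prototypes (an "ambiguous" sample in the abstract's sense) incurs a larger penalty than one that sits close to its own prototype, which is the behaviour a refinement objective of this kind relies on.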

Huanyu Zhou, Qingjie Liu, Yunhong Wang • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 90.9 | 661 |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy 95.3 | 609 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 96.8 | 575 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 92.8 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy 89.5 | 377 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 92.8 | 305 |
| Skeleton-based Action Recognition | NTU RGB+D (Cross-View) | Accuracy 96.8 | 213 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (X-set) | Top-1 Accuracy 90.9 | 184 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy 89.5 | 183 |
| Action Recognition | NTU RGB+D X-View 60 | Accuracy 96.8 | 172 |
Showing 10 of 27 rows
