Learning Discriminative Representations for Skeleton Based Action Recognition

About

Human action recognition aims to classify the category of a human action from a video segment. Recently, many works have designed models based on graph convolutional networks (GCNs) to extract features from skeletons for this task, because skeleton representations are much more efficient and robust than other modalities such as RGB frames. However, skeleton data also discards some important cues, such as related items. As a result, some actions become ambiguous: they are hard to distinguish and tend to be misclassified. To alleviate this problem, we propose an auxiliary feature refinement head (FR Head), which consists of spatial-temporal decoupling and contrastive feature refinement, to obtain discriminative representations of skeletons. Ambiguous samples are dynamically discovered and calibrated in the feature space. Furthermore, FR Head can be imposed on different stages of GCNs to build a multi-level refinement for stronger supervision. Extensive experiments are conducted on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets. Our proposed models obtain results competitive with state-of-the-art methods and help to discriminate those ambiguous samples. Code is available at https://github.com/zhysora/FR-Head.
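The two ingredients named in the abstract, spatial-temporal decoupling and contrastive feature refinement, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). The `(N, C, T, V)` feature layout, the mean pooling, the function names `decouple` and `prototype_contrastive_loss`, and the temperature value are all assumptions made for illustration. A generic prototype-based, InfoNCE-style loss stands in for the paper's refinement step, and the dynamic discovery of ambiguous samples is not modeled here.

```python
import numpy as np

def decouple(features):
    """Split a (N, C, T, V) feature map into spatial and temporal views.

    N = batch size, C = channels, T = frames, V = joints.
    Mean pooling is an assumption chosen for simplicity.
    """
    n = features.shape[0]
    spatial = features.mean(axis=2)    # pool over time   -> (N, C, V)
    temporal = features.mean(axis=3)   # pool over joints -> (N, C, T)
    # Flatten each view into one vector per sample.
    return spatial.reshape(n, -1), temporal.reshape(n, -1)

def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def prototype_contrastive_loss(embeddings, labels, num_classes, temperature=0.1):
    """Generic InfoNCE-style loss pulling samples toward their class prototype.

    Prototypes are the mean embedding of each class in the batch; classes
    absent from the batch keep an all-zero prototype (a simplification).
    """
    z = l2_normalize(embeddings)
    prototypes = np.zeros((num_classes, z.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            prototypes[c] = z[mask].mean(axis=0)
    prototypes = l2_normalize(prototypes)

    logits = z @ prototypes.T / temperature            # (N, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of each sample's own class, averaged.
    return -log_prob[np.arange(len(labels)), labels].mean()
```

Under these assumptions, each decoupled view would get its own contrastive term, and the sum would be added to the usual classification loss as auxiliary supervision.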

Huanyu Zhou, Qingjie Liu, Yunhong Wang • 2023

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 90.9 | 717
Action Recognition | NTU RGB+D (Cross-View) | Accuracy 95.3 | 652
Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 96.8 | 588
Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 92.8 | 467
Action Recognition | NTU RGB+D X-sub 120 | Accuracy 89.5 | 430
Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 92.8 | 336
Action Recognition | NTU-60 (xsub) | Accuracy 93.1 | 223
Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy 89.5 | 222
Skeleton-based Action Recognition | NTU RGB+D (Cross-View) | Accuracy 96.8 | 213
Action Recognition | NTU-120 (cross-subject (xsub)) | Accuracy 89.5 | 211
Showing 10 of 31 rows
