Expressive Keypoints for Skeleton-based Action Recognition via Skeleton Transformation
About
In skeleton-based action recognition, traditional methods that rely on coarse body keypoints fall short of capturing subtle human actions. In this work, we propose Expressive Keypoints, which incorporate hand and foot details to form a fine-grained skeletal representation, improving the ability of existing models to discern intricate actions. To model Expressive Keypoints efficiently, we present the Skeleton Transformation strategy, which gradually downsamples the keypoints and prioritizes prominent joints by allocating importance weights. Additionally, a plug-and-play Instance Pooling module extends our approach to multi-person scenarios without a surge in computation cost. Extensive experimental results on seven datasets demonstrate the superiority of our method over the state of the art in skeleton-based human action recognition. Code is available at https://github.com/YijieYang23/SkeleT-GCN.
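As a rough illustration of the two ideas described above, the following Python sketch downsamples a set of joint features by weighted aggregation and then pools features across persons. It is not taken from the SkeleT-GCN code. The joint grouping, the softmax-normalized importance scores, the use of mean pooling over persons, and the function names `downsample_keypoints` and `instance_pool` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def downsample_keypoints(joints, groups, scores):
    """Merge V joint feature vectors into K coarser ones (hypothetical scheme).

    joints: list of V feature vectors (lists of floats).
    groups: list of K lists of joint indices; each group becomes one output joint.
    scores: list of V importance scores (assumed learnable in a real model).
    """
    pooled = []
    for idx in groups:
        # Joints with higher scores contribute more to the merged joint.
        weights = softmax([scores[i] for i in idx])
        dim = len(joints[idx[0]])
        pooled.append(
            [sum(w * joints[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
        )
    return pooled


def instance_pool(person_features):
    """Average per-person feature vectors into one vector (assumed mean pooling)."""
    n = len(person_features)
    return [sum(col) / n for col in zip(*person_features)]


# Four joints with 2-D features, merged pairwise into two joints.
joints = [[0.0, 0.0], [2.0, 4.0], [1.0, 1.0], [3.0, 5.0]]
groups = [[0, 1], [2, 3]]
scores = [0.0, 0.0, 0.0, 0.0]  # equal importance, so each group is a plain average
coarse = downsample_keypoints(joints, groups, scores)

# Two persons, each summarized by a 2-D feature vector.
merged = instance_pool([[1.0, 2.0], [3.0, 4.0]])
```

With equal scores each merged joint is the plain average of its group; raising one joint's score shifts the merged feature toward that joint. Pooling across persons yields a single feature vector, so later layers in this sketch do the same amount of work regardless of how many people appear in the scene.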
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 96.4 | 661 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 97 | 467 |
| Action Recognition | NTU RGB+D X-sub 120 | Accuracy 94.6 | 377 |
| Action Recognition | NTU RGB+D X-View 60 | Accuracy 99.6 | 172 |
| Skeleton-based Action Recognition | NTU-RGB+D 120 (Cross-setup) | Accuracy 96.4 | 136 |
| Skeleton-based Action Recognition | NTU RGB+D 60 (Cross-Subject) | Accuracy 97 | 59 |
| Action Recognition | N-UCLA Cross-View | Accuracy 97.6 | 32 |
| Skeleton Action Recognition | NTU RGB+D Cross-Subject (Xsub) 120 | Accuracy 94.6 | 29 |
| Skeleton-based Action Recognition | NTU RGB+D Cross-View 60 | Accuracy 99.6 | 14 |
| Skeleton-based Action Recognition | NTU-Hand 11 (X-View) | Accuracy 98.6 | 5 |