ViA: View-invariant Skeleton Action Representation Learning via Motion Retargeting

About

Current self-supervised approaches for skeleton action representation learning often focus on constrained scenarios, where videos and skeleton data are recorded in laboratory settings. When dealing with estimated skeleton data in real-world videos, such methods perform poorly due to large variations across subjects and camera viewpoints. To address this issue, we introduce ViA, a novel View-Invariant Autoencoder for self-supervised skeleton action representation learning. ViA leverages motion retargeting between different human performers as a pretext task in order to disentangle latent, action-specific 'Motion' features on top of the visual representation of a 2D or 3D skeleton sequence. Such 'Motion' features are invariant to skeleton geometry and camera view, which allows ViA to facilitate both cross-subject and cross-view action classification. We conduct a study focusing on transfer learning for skeleton-based action recognition with self-supervised pre-training on real-world data (e.g., Posetics). Our results show that skeleton representations learned by ViA are generic enough to improve upon state-of-the-art action classification accuracy, not only on 3D laboratory datasets such as NTU-RGB+D 60 and NTU-RGB+D 120, but also on real-world datasets where only 2D data are accurately estimated, e.g., Toyota Smarthome, UAV-Human and Penn Action.
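The retargeting pretext task can be illustrated with a toy 2D example. This is a minimal sketch under strong assumptions, not ViA's implementation: a skeleton sequence is modeled as a canonical motion scaled by a body-size factor and rotated by a camera angle, and learned encoders are replaced by hypothetical analytic stand-ins (`encode_motion`, `encode_skeleton`, `encode_view`) that read a reference bone in frame 0. Swapping codes between two performers and comparing the result with a synthetic ground truth gives the kind of cross-reconstruction signal a retargeting pretext task can optimize; a ground truth like this exists here only because the data are synthetic.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Seq = List[List[Point]]  # T frames x J joints, 2D coordinates


def rotate(seq: Seq, angle: float) -> Seq:
    c, s = math.cos(angle), math.sin(angle)
    return [[(c * x - s * y, s * x + c * y) for x, y in frame] for frame in seq]


def render(motion: Seq, body_scale: float, view: float) -> Seq:
    """Toy decoder (hypothetical): scale the canonical motion by a body-size
    factor, then rotate it into the camera view (angle in radians)."""
    scaled = [[(body_scale * x, body_scale * y) for x, y in frame] for frame in motion]
    return rotate(scaled, view)


# Hypothetical analytic stand-ins for learned encoders. Assumption: in every
# canonical motion, joint 0 -> joint 1 is a unit-length bone on the +x axis in frame 0.
def encode_skeleton(seq: Seq) -> float:
    (x0, y0), (x1, y1) = seq[0][0], seq[0][1]
    return math.hypot(x1 - x0, y1 - y0)


def encode_view(seq: Seq) -> float:
    (x0, y0), (x1, y1) = seq[0][0], seq[0][1]
    return math.atan2(y1 - y0, x1 - x0)


def encode_motion(seq: Seq) -> Seq:
    # Undo the camera rotation and the body scale to recover the canonical motion.
    s = encode_skeleton(seq)
    unrotated = rotate(seq, -encode_view(seq))
    return [[(x / s, y / s) for x, y in frame] for frame in unrotated]


def encode_motion_entangled(seq: Seq) -> Seq:
    # Deliberately flawed encoder: removes body scale but keeps view information.
    s = encode_skeleton(seq)
    return [[(x / s, y / s) for x, y in frame] for frame in seq]


def retarget(source: Seq, target: Seq) -> Seq:
    """Motion of `source`, performed by the body of `target`, seen from target's view."""
    return render(encode_motion(source), encode_skeleton(target), encode_view(target))


def mse(a: Seq, b: Seq) -> float:
    # Mean squared joint-position error over all frames and joints.
    diffs = [(pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2
             for fa, fb in zip(a, b) for pa, pb in zip(fa, fb)]
    return sum(diffs) / len(diffs)


# Two made-up canonical motions (3 frames x 3 joints).
walk = [[(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)],
        [(0.1, 0.0), (1.1, 0.2), (0.6, 1.1)],
        [(0.2, 0.0), (1.2, 0.4), (0.7, 1.2)]]
wave = [[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
        [(0.0, 0.0), (0.9, 0.3), (0.2, 1.3)],
        [(0.0, 0.0), (0.8, 0.6), (0.4, 1.6)]]

a = render(walk, body_scale=1.8, view=0.3)   # performer A walks
b = render(wave, body_scale=1.2, view=-1.0)  # performer B waves

# Cross-reconstruction: A's motion on B's body and view vs. the synthetic ground truth.
target_gt = render(walk, 1.2, -1.0)
loss = mse(retarget(a, b), target_gt)
bad = render(encode_motion_entangled(a), encode_skeleton(b), encode_view(b))
bad_loss = mse(bad, target_gt)
print(f"disentangled loss: {loss:.2e}, entangled loss: {bad_loss:.3f}")
```

In this toy, the disentangled encoders give a near-zero loss, while the entangled motion code that still carries A's camera angle gives a clearly nonzero loss; that gap is the signal that would push a learned motion code toward view and body invariance.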

Di Yang, Yaohui Wang, Antitza Dantcheva, Lorenzo Garattoni, Gianpiero Francesca, Francois Bremond• 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Action Recognition | NTU RGB+D 120 (X-set) | Accuracy | 66.9 | 661
Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy | 89.6 | 467
Action Recognition | NTU RGB+D X-View 60 | Accuracy | 96.4 | 172
Skeleton-based Action Recognition | NTU 120 (X-sub) | Accuracy | 85 | 139
Action Recognition | NTU-120 (cross-subject, xsub) | Accuracy | 69.2 | 82
Action Recognition | NTU-60 (xsub) | Accuracy | 78.1 | 40
3D Action Recognition | NTU-120 (X-set) | Top-1 Acc | 86.5 | 16
