Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition

About

Rapid progress and superior performance have been achieved for skeleton-based action recognition recently. In this article, we investigate this problem under a cross-dataset setting, which is a new, pragmatic, and challenging task in real-world scenarios. Following the unsupervised domain adaptation (UDA) paradigm, the action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. Different from the conventional adversarial learning-based approaches for UDA, we utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. Our inspiration is drawn from Cubism, an art genre from the early 20th century, which breaks and reassembles the objects to convey a greater context. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks to explore the temporal and spatial dependency of a skeleton-based action and improve the generalization ability of the model. We conduct experiments on six datasets for skeleton-based action recognition, including three large-scale datasets (NTU RGB+D, PKU-MMD, and Kinetics) where new cross-dataset settings and benchmarks are established. Extensive results demonstrate that our method outperforms state-of-the-art approaches. The source codes of our model and all the compared methods are available at https://github.com/shanice-l/st-cubism.

Yansong Tang, Xingyu Liu, Xumin Yu, Danyang Zhang, Jiwen Lu, Jie Zhou• 2022

Related benchmarks

Task	Dataset	Result	Rank
Skeleton Action Recognition	NTU, PKU, and ETRI Cross-dataset	N -> E Accuracy59.1		14
Action Recognition	NTU-RGB+D 51 (N51) to PKU-MMD 51 (P51) (cross-dataset)	Accuracy70.5		8

Showing 2 of 2 rows

Other info

Follow for update

@wizwand_team Discord