PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos

About

Self-supervised learning can extract high-quality representations from unlabeled data alone, which is appealing for point cloud videos because of their high labelling cost. In this paper, we propose a contrastive mask prediction (PointCMP) framework for self-supervised learning on point cloud videos. Specifically, PointCMP employs a two-branch structure to learn local and global spatio-temporal information simultaneously. On top of this two-branch structure, a mutual similarity based augmentation module is developed to synthesize hard samples at the feature level. By masking dominant tokens and erasing principal channels, we generate hard samples that facilitate learning representations with better discrimination and generalization performance. Extensive experiments show that PointCMP achieves state-of-the-art performance on benchmark datasets and outperforms existing fully supervised counterparts. Transfer learning results demonstrate the superiority of the learned representations across different datasets and tasks.

Zhiqiang Shen, Xiaoxiao Sheng, Longguang Wang, Yulan Guo, Qiong Liu, Xi Zhou • 2023
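
The feature-level augmentation described in the abstract (masking dominant tokens and erasing principal channels) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that "dominant" tokens are those with the highest cosine similarity to a global feature, and that "principal" channels are those contributing most to the token-global dot product. The function names, the scoring rules, and the parameter `k` are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mask_dominant_tokens(tokens, global_feat, k):
    """Zero out the k tokens most similar to the global feature.

    Assumption: similarity to the global feature stands in for "dominance".
    Returns the masked tokens and the sorted indices that were masked.
    """
    sims = [cosine(t, global_feat) for t in tokens]
    dominant = sorted(range(len(tokens)), key=lambda i: sims[i], reverse=True)[:k]
    masked = [list(t) for t in tokens]
    for i in dominant:
        masked[i] = [0.0] * len(tokens[i])
    return masked, sorted(dominant)


def erase_principal_channels(tokens, global_feat, k):
    """Zero out the k channels that contribute most to the token-global dot product.

    Assumption: a channel's score is the sum over tokens of token[c] * global_feat[c].
    Returns the erased tokens and the sorted indices of the erased channels.
    """
    n_channels = len(global_feat)
    scores = [sum(t[c] * global_feat[c] for t in tokens) for c in range(n_channels)]
    principal = set(
        sorted(range(n_channels), key=lambda c: scores[c], reverse=True)[:k]
    )
    erased = [[0.0 if c in principal else v for c, v in enumerate(t)] for t in tokens]
    return erased, sorted(principal)


# Toy input: four tokens with three channels each; the global feature is their mean.
tokens = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
global_feat = [sum(t[c] for t in tokens) / len(tokens) for c in range(3)]

masked, dominant_idx = mask_dominant_tokens(tokens, global_feat, k=2)
erased, principal_idx = erase_principal_channels(tokens, global_feat, k=1)
```

In this toy input the first two tokens are closest to the mean feature, so they are the ones masked, and the first channel carries the most shared energy, so it is the one erased. Both operations remove the easiest cues, which is the sense in which the resulting samples are "hard".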

Related benchmarks

Task	Dataset	Result
Action Recognition	NTU RGB+D (Cross-subject)	Accuracy: 88.5	511
Action Recognition	MSRAction3D	Accuracy: 93.27	232
Gesture Recognition	nvGesture (test)	Accuracy (%): 89.2	145
Action Recognition	MSR Action3D (test)	Accuracy: 93.27	94
Gesture Recognition	SHREC'17 1.0 (test)	Accuracy: 93.3	35
Gesture Recognition	SHREC 17	Accuracy (%): 93.3	22
