PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

About

Point cloud sequences are irregular and unordered in the spatial dimension while exhibiting regularities and order in the temporal dimension. Therefore, existing grid-based convolutions designed for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to achieve informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. Then, a spatial convolution is employed to capture the local structure of points in 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet in modeling point cloud sequences.
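
The two-step idea in the abstract (a spatial convolution per frame, followed by a temporal convolution across neighbouring frames) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The function names, the weight shapes, the displacement-based weighting, the replicate padding at sequence edges, and the use of every point as an anchor (no subsampling) are all assumptions made for illustration.

```python
import numpy as np


def spatial_conv(anchors, points, feats, w_disp, w_feat, radius):
    """Aggregate features of the points within `radius` of each anchor.

    anchors: (M, 3), points: (N, 3), feats: (N, C_in)
    w_disp: (3, C_mid), w_feat: (C_in, C_mid)  (hypothetical parameterisation)
    """
    disp = points[None, :, :] - anchors[:, None, :]   # (M, N, 3) displacements
    mask = np.linalg.norm(disp, axis=-1) <= radius    # (M, N) neighbourhood test
    pos_weight = disp @ w_disp                        # (M, N, C_mid)
    projected = feats @ w_feat                        # (N, C_mid)
    weighted = pos_weight * projected[None, :, :] * mask[..., None]
    return weighted.sum(axis=1)                       # (M, C_mid)


def pst_conv(xyz, feats, w_disp, w_feat, w_time, radius):
    """Spatial convolution per frame, then a weighted sum over a temporal window.

    xyz: (T, N, 3), feats: (T, N, C_in), w_time: (K, C_mid, C_out) with odd K.
    """
    num_frames = xyz.shape[0]
    half = w_time.shape[0] // 2
    out = []
    for t in range(num_frames):
        acc = np.zeros((xyz.shape[1], w_time.shape[2]))
        for dt in range(-half, half + 1):
            # Assumed edge handling: reuse the first/last frame at the boundaries.
            s = min(max(t + dt, 0), num_frames - 1)
            # Anchors come from frame t; neighbours are searched in frame s.
            local = spatial_conv(xyz[t], xyz[s], feats[s], w_disp, w_feat, radius)
            acc += local @ w_time[dt + half]
        out.append(acc)
    return np.stack(out)                              # (T, N, C_out)


rng = np.random.default_rng(0)
T, N, C_in, C_mid, C_out = 4, 16, 5, 6, 8
xyz = rng.normal(size=(T, N, 3))
feats = rng.normal(size=(T, N, C_in))
w_disp = rng.normal(size=(3, C_mid))
w_feat = rng.normal(size=(C_in, C_mid))
w_time = rng.normal(size=(3, C_mid, C_out))

out = pst_conv(xyz, feats, w_disp, w_feat, w_time, radius=1.0)
print(out.shape)  # (4, 16, 8)
```

Keeping the spatial weights (`w_disp`, `w_feat`) separate from the temporal weights (`w_time`) mirrors the disentangling of space and time described above: the neighbourhood search handles the unordered spatial dimension, while the ordered temporal dimension is handled by a fixed-size window over frames.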

Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli • 2022

Related benchmarks

Task                  | Dataset                        | Metric       | Result | Rank
Action Recognition    | NTU RGB+D 120 (X-set)          | Accuracy     | 93.8   | 661
Action Recognition    | NTU RGB+D 60 (Cross-View)      | Accuracy     | 96.5   | 575
Action Recognition    | NTU RGB+D (Cross-subject)      | Accuracy     | 90.5   | 474
Action Recognition    | NTU RGB-D Cross-Subject 60     | Accuracy     | 90.5   | 305
Action Recognition    | NTU RGB+D 120 Cross-Subject    | Accuracy     | 87     | 183
Action Recognition    | MSRAction3D                    | Accuracy     | 91.2   | 123
Gesture Recognition   | nvGesture (test)               | Accuracy (%) | 88.4   | 115
Scene Flow Estimation | KITTI                          | EPE (m)      | 0.278  | 34
Semantic segmentation | Synthia 4D (test)              | mIoU         | 82.24  | 26
Gesture Recognition   | SHREC'17 1.0 (test)            | Accuracy     | 92.1   | 23
Showing 10 of 12 rows
