
Diffusion Masked Pretraining for Dynamic Point Cloud

About

Dynamic point cloud pretraining is still dominated by masked reconstruction objectives. However, these objectives inherit two key limitations. Existing methods inject ground-truth tube centers as decoder positional embeddings, causing spatio-temporal positional leakage. Moreover, they supervise inter-frame motion with deterministic proxy targets that systematically discard distributional structure by collapsing multimodal trajectory uncertainty into conditional means. To address these limitations, we propose Diffusion Masked Pretraining (DiMP), a unified self-supervised framework for dynamic point clouds. DiMP introduces diffusion modeling into both positional inference and motion learning. It first applies forward diffusion noise only to masked tube centers, then predicts clean centers from visible spatio-temporal context. This removes positional leakage while preserving visible coordinates as clean temporal anchors. DiMP also reformulates point-wise inter-frame displacement supervision as a DDPM noise-prediction objective conditioned on decoded representations. This design drives the encoder to target the full conditional distribution of plausible motions under a variational surrogate, rather than collapsing to a single deterministic estimate. Extensive experiments demonstrate that DiMP consistently improves downstream accuracy over the backbone alone, with absolute gains of 11.21% on offline action segmentation and 13.65% under causally constrained online inference. Code is available at https://github.com/InitalZ/DiMP.git.
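The two diffusion steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear beta schedule, the 3-D center layout, and every function name below (`make_alpha_bar`, `forward_diffuse`, `noise_masked_centers`, `noise_prediction_loss`) are assumptions made for illustration only. It shows the standard DDPM forward process, x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, applied only to masked tube centers, and a mean-squared noise-prediction loss of the kind that could supervise inter-frame displacements.

```python
import math
import random

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed choice)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def forward_diffuse(x0, a_bar, rng):
    """Sample x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps; return (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e for v, e in zip(x0, eps)]
    return x_t, eps

def noise_masked_centers(centers, mask, t, alpha_bar, rng):
    """Noise only the masked tube centers; visible centers stay clean anchors."""
    noisy, targets = [], []
    for center, is_masked in zip(centers, mask):
        if is_masked:
            x_t, _ = forward_diffuse(center, alpha_bar[t], rng)
            noisy.append(x_t)
            targets.append(list(center))  # the clean center is the prediction target
        else:
            noisy.append(list(center))    # visible coordinates are left untouched
            targets.append(None)          # no positional target for visible tubes
    return noisy, targets

def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and sampled noise (DDPM-style objective)."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)

# Toy data: four hypothetical tube centers, two of them masked.
rng = random.Random(0)
alpha_bar = make_alpha_bar()
centers = [[0.1, 0.2, 0.3], [1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2]]
mask = [False, True, False, True]
noisy, targets = noise_masked_centers(centers, mask, t=500, alpha_bar=alpha_bar, rng=rng)

# Toy inter-frame displacement for one point; a network would predict eps from d_t.
displacement = [0.05, -0.02, 0.01]
d_t, eps = forward_diffuse(displacement, alpha_bar[500], rng)
```

In a real model, a decoder would take `noisy` plus the visible context and regress `targets`, and a conditional network would predict `eps` from `d_t`, the timestep, and the decoded features. Both networks are omitted here because the abstract does not specify them.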

Zhuoyue Zhang, Jihua Zhu, Chaowei Fang, Jian Liu, Ajmal Saeed Mian • 2026

Related benchmarks

Task                        Dataset      Metric        Result  Rank
Action Recognition          MSRAction3D  Accuracy      95.51   176
Hand Gesture Recognition    NVGesture    Accuracy      87.9    31
Gesture Recognition         SHREC 17     Accuracy (%)  91.4    22
4D Action Segmentation      HOI4D        Accuracy      83.97   10
4D Semantic Segmentation    HOI4D        mIoU          48.2    10
Online Action Segmentation  HOI4D        Accuracy      80.35   9