
Positional Prompt Tuning for Efficient 3D Representation Learning

About

We rethink the role of positional encoding in 3D representation learning and fine-tuning, arguing that positional encoding in point Transformer-based methods serves to aggregate multi-scale features of point clouds. We also explore parameter-efficient fine-tuning (PEFT) through the lens of prompts and adapters, introducing a simple yet effective method for point cloud analysis called PPT. PPT adds patch tokens and trainable positional encodings while keeping most pre-trained model parameters frozen. Extensive experiments validate that PPT is both effective and efficient: with only 1.05M trainable parameters, it achieves state-of-the-art results on several mainstream datasets, such as 95.01% accuracy on the ScanObjectNN OBJ_BG dataset. Code and weights will be released at https://github.com/zsc000722/PPT.
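The core recipe described above (freeze the pre-trained point Transformer, train only the positional encodings and extra prompt/patch tokens) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the module and parameter names, dimensions, and the use of a generic Transformer encoder as a stand-in backbone are all assumptions; the actual architecture is in the linked repository.

```python
# Hypothetical PEFT sketch in the spirit of PPT: freeze the backbone,
# leave only positional encodings and prompt tokens trainable.
import torch
import torch.nn as nn

class PromptTunedEncoder(nn.Module):
    def __init__(self, dim=384, depth=4, heads=6, num_patches=64, num_prompts=8):
        super().__init__()
        # Stand-in for a pre-trained point Transformer backbone (frozen).
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Trainable parts: positional encoding and extra prompt tokens.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, dim))

    def forward(self, patch_tokens):  # patch_tokens: (B, num_patches, dim)
        x = patch_tokens + self.pos_embed
        # Prepend the shared prompt tokens to every sample in the batch.
        x = torch.cat([self.prompts.expand(x.size(0), -1, -1), x], dim=1)
        return self.backbone(x)

model = PromptTunedEncoder()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} of {total}")
```

Only the two `nn.Parameter` tensors receive gradients here, which is what keeps the trainable-parameter count small relative to the frozen backbone.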

Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei · 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Semantic segmentation | S3DIS (Area 5) | mIoU | 54.8 | 799
Object classification | ScanObjectNN OBJ_BG | Accuracy | 89.84 | 215
Part segmentation | ShapeNetPart | mIoU (Instance) | 85.7 | 198
Object classification | ScanObjectNN PB_T50_RS | Accuracy | 84.45 | 195
Object classification | ScanObjectNN OBJ_ONLY | Overall Accuracy | 88.98 | 166
Few-shot classification | ModelNet40 5-way 10-shot | Accuracy | 97.0 | 79
Few-shot classification | ModelNet40 5-way 20-shot | Accuracy | 98.7 | 79
Few-shot classification | ModelNet40 10-way 20-shot | Accuracy | 95.6 | 79
Few-shot classification | ModelNet40 10-way 10-shot | Accuracy | 92.2 | 79
Shape classification | ScanObjectNN PB_T50_RS | OA | 89.52 | 72
(Showing 10 of 19 rows)
