Positional Prompt Tuning for Efficient 3D Representation Learning
About
We rethink the role of positional encoding in 3D representation learning and fine-tuning. We argue that positional encoding in point Transformer-based methods serves to aggregate multi-scale features of point clouds. We further explore parameter-efficient fine-tuning (PEFT) through the lens of prompts and adapters, introducing a straightforward yet effective method called PPT for point cloud analysis. PPT adds extra patch tokens and a trainable positional encoding while keeping most pre-trained model parameters frozen. Extensive experiments validate that PPT is both effective and efficient. With only 1.05M trainable parameters, PPT achieves state-of-the-art results on several mainstream datasets, e.g., 95.01% accuracy on the ScanObjectNN OBJ_BG dataset. Code and weights will be released at https://github.com/zsc000722/PPT.
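The core recipe above (freeze the pre-trained point Transformer, then train only the extra prompt/patch tokens and the positional encoding) can be sketched in PyTorch. This is a minimal illustration, not the released implementation: the module names, token counts, and embedding sizes below are assumptions chosen for clarity, and a generic `nn.TransformerEncoder` stands in for the pre-trained point-cloud backbone.

```python
# Hedged sketch of the PPT fine-tuning setup described above.
# Assumptions (not from the paper's code): dim=384, 64 point patches,
# 16 extra prompt tokens, and a stock TransformerEncoder as the
# stand-in for a pre-trained point Transformer backbone.
import torch
import torch.nn as nn


class PPTSketch(nn.Module):
    def __init__(self, dim=384, num_patches=64, num_prompts=16, depth=4):
        super().__init__()
        # "Pre-trained" backbone: frozen, so it contributes no
        # trainable parameters during fine-tuning.
        layer = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Trainable parts only: a positional encoding over all tokens
        # and the extra (prompt) patch tokens.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + num_prompts, dim))
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)

    def forward(self, patch_tokens):
        # patch_tokens: (B, num_patches, dim) point-patch embeddings.
        b = patch_tokens.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        # Trainable positional encoding is added before the frozen backbone.
        return self.backbone(tokens + self.pos_embed)


model = PPTSketch()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / total: {total}")
```

An optimizer for fine-tuning would then be built only over `filter(lambda p: p.requires_grad, model.parameters())`, which is what keeps the trainable budget in the ~1M-parameter range rather than the full backbone size.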
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | S3DIS (Area 5) | mIoU | 54.8 | 799 |
| Object classification | ScanObjectNN OBJ_BG | Accuracy | 89.84 | 215 |
| Part segmentation | ShapeNetPart | Instance mIoU | 85.7 | 198 |
| Object classification | ScanObjectNN PB_T50_RS | Accuracy | 84.45 | 195 |
| Object classification | ScanObjectNN OBJ_ONLY | Overall Accuracy | 88.98 | 166 |
| Few-shot classification | ModelNet40 5-way 10-shot | Accuracy | 97.0 | 79 |
| Few-shot classification | ModelNet40 5-way 20-shot | Accuracy | 98.7 | 79 |
| Few-shot classification | ModelNet40 10-way 20-shot | Accuracy | 95.6 | 79 |
| Few-shot classification | ModelNet40 10-way 10-shot | Accuracy | 92.2 | 79 |
| Shape classification | ScanObjectNN PB_T50_RS | Overall Accuracy | 89.52 | 72 |