Point-In-Context: Understanding Point Cloud via In-Context Learning

About

The rise of large-scale models has catalyzed in-context learning as a powerful approach for multitasking, particularly in natural language and image processing. However, its application to 3D point cloud tasks has been largely unexplored. In this paper, we introduce Point-In-Context (PIC), a pioneering framework for 3D point cloud understanding that leverages in-context learning with a standard transformer architecture. PIC uniquely enables the execution of multiple tasks after a single, unified training phase, eliminating the need for fine-tuning. To extend masked point modeling to 3D in-context learning, we introduce a Joint Sampling module, a simple yet effective technique that emphasizes the mapping relationship between input and target. PIC treats both inputs and targets as coordinate-based, addressing the segmentation challenge by associating label points with pre-defined XYZ coordinates for each category. However, relying on such fixed label-coordinate assignments limits the model's ability to generalize to unseen domains. To address this limitation, we further propose two innovative training strategies: In-Context Labeling and In-Context Enhancing. These strategies are integrated into PIC++, which enhances dynamic in-context labeling and model training. Besides its multitask capability, PIC++ demonstrates generalization across part segmentation datasets by employing dynamic in-context labels and regular in-context pairs. Remarkably, PIC++, trained once without fine-tuning, can generalize effectively to unseen datasets and perform novel part segmentation through customized prompts. Overall, PIC is a general framework that seamlessly integrates additional tasks or datasets through a unified data format via in-context learning. Extensive experiments substantiate PIC's versatility and adaptability in handling diverse tasks and segmenting multiple datasets simultaneously.
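The coordinate-based treatment of segmentation described above can be illustrated with a minimal sketch. It assumes a fixed, hypothetical table `LABEL_COORDS` that assigns one XYZ "label point" to each part category, and it decodes predictions by nearest label point. The coordinate values, the function names, and the nearest-neighbor decoding rule are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical label-to-coordinate table: each part category id is assigned
# one fixed XYZ "label point". These values are illustrative assumptions,
# not the coordinates used in the paper.
LABEL_COORDS = {
    0: (0.0, 0.0, 0.0),
    1: (1.0, 0.0, 0.0),
    2: (0.0, 1.0, 0.0),
    3: (0.0, 0.0, 1.0),
}


def encode_labels(labels):
    """Turn per-point category ids into a coordinate-based target."""
    return [LABEL_COORDS[label] for label in labels]


def decode_labels(predicted_points):
    """Map each predicted XYZ point to the category with the nearest label point."""
    decoded = []
    for point in predicted_points:
        nearest = min(
            LABEL_COORDS,
            key=lambda category: math.dist(point, LABEL_COORDS[category]),
        )
        decoded.append(nearest)
    return decoded


# Usage: encode ground-truth labels as a target point cloud, then decode a
# noisy prediction back to category ids.
target = encode_labels([0, 2, 3])
noisy_prediction = [(0.1, -0.05, 0.0), (0.2, 0.9, 0.1), (0.0, 0.3, 0.8)]
recovered = decode_labels(noisy_prediction)
```

Because the table in this sketch is fixed, a category without an entry cannot be represented. That is the limitation the abstract attributes to fixed label-coordinate assignments and the motivation it gives for the dynamic in-context labeling in PIC++.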

Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Deheng Ye, Xiangtai Li, Chen Change Loy • 2024

Related benchmarks

Task	Dataset	Result
Part Segmentation	ShapeNetPart (test)	--	347
Denoising	ShapeNet In-Context	L1 CD Error 3.8	59
Reconstruction	ShapeNet In-Context	CD L1 3.2	59
Registration	ShapeNet In-Context	L1 CD Error (x1000) 6	47
Part Segmentation	ShapeNet In-Context	mIoU 85.53	34
3D Part Segmentation	ShapeNetPart (val)	mIoU 87.82	33
Multi-Entity Segmentation	Human3D (test)	mIoU 82.82	25
Multi-Entity Segmentation	Human3D (val)	mIoU 85.59	25
Multi-Entity Segmentation	BEHAVE (test)	mIoU 88.63	25
Multi-Entity Segmentation	AKB-48	mIoU 73.52	22

