Point-In-Context: Understanding Point Cloud via In-Context Learning
About
The rise of large-scale models has catalyzed in-context learning as a powerful approach for multitasking, particularly in natural language and image processing. However, its application to 3D point cloud tasks has been largely unexplored. In this paper, we introduce Point-In-Context (PIC), a pioneering framework for 3D point cloud understanding that leverages in-context learning with a standard transformer architecture. PIC uniquely enables the execution of multiple tasks after a single, unified training phase, eliminating the need for fine-tuning. To extend masked point modeling to 3D in-context learning, we introduce a Joint Sampling module, a simple yet effective technique that emphasizes the mapping relationship between input and target. PIC treats both inputs and targets as coordinate-based, addressing the segmentation challenge by associating label points with pre-defined XYZ coordinates for each category. However, relying on such fixed label-coordinate assignments limits the model's ability to generalize to unseen domains. To address this limitation, we further propose two innovative training strategies: In-Context Labeling and In-Context Enhancing. These strategies are integrated into PIC++, which enhances dynamic in-context labeling and model training. Besides its multitask capability, PIC++ demonstrates generalization across part segmentation datasets by employing dynamic in-context labels and regular in-context pairs. Remarkably, PIC++, trained once without fine-tuning, can generalize effectively to unseen datasets and perform novel part segmentation through customized prompts. Overall, PIC is a general framework that seamlessly integrates additional tasks or datasets through a unified data format via in-context learning. Extensive experiments substantiate PIC's versatility and adaptability in handling diverse tasks and segmenting multiple datasets simultaneously.
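The idea of treating segmentation targets as coordinates can be illustrated with a minimal sketch. Everything named here is hypothetical: the `LABEL_POINTS` anchors, the helper functions, and the nearest-anchor decoding rule are assumptions made for illustration, since the abstract does not specify the coordinates PIC assigns to each category or how predictions are mapped back to labels.

```python
import math

# Hypothetical label "codebook": each part category gets a fixed XYZ anchor.
# The actual coordinates used by PIC are not given in the abstract.
LABEL_POINTS = {
    0: (0.0, 0.0, 0.0),
    1: (1.0, 0.0, 0.0),
    2: (0.0, 1.0, 0.0),
    3: (0.0, 0.0, 1.0),
}

def encode_labels(labels):
    """Turn per-point class labels into a coordinate-valued target."""
    return [LABEL_POINTS[c] for c in labels]

def decode_labels(predicted_points):
    """Map each predicted XYZ to the class whose anchor is nearest (assumed rule)."""
    return [
        min(LABEL_POINTS, key=lambda c: math.dist(p, LABEL_POINTS[c]))
        for p in predicted_points
    ]

# Simulate a slightly imperfect coordinate prediction and recover the labels.
labels = [0, 2, 1, 3]
targets = encode_labels(labels)
noisy = [(x + 0.1, y - 0.05, z + 0.02) for x, y, z in targets]
recovered = decode_labels(noisy)
```

Because the output is itself a set of XYZ points, a segmentation target has the same format as the targets of reconstruction, denoising, or registration, which is what lets a single model handle all of them. The fixed codebook also shows the limitation the abstract mentions: a dataset with different part categories has no entries in it, which is the motivation for dynamic in-context labels in PIC++.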
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Part Segmentation | ShapeNetPart (test) | -- | -- | 312 |
| Denoising | ShapeNet In-Context | L1 CD Error | 3.8 | 59 |
| Reconstruction | ShapeNet In-Context | CD L1 | 3.2 | 59 |
| Registration | ShapeNet In-Context | L1 CD Error (x1000) | 6 | 47 |
| Part Segmentation | ShapeNet In-Context | mIoU | 85.53 | 34 |
| 3D Part Segmentation | ShapeNetPart (val) | mIoU | 87.82 | 33 |
| Multi-Entity Segmentation | Human3D (test) | mIoU | 82.82 | 25 |
| Multi-Entity Segmentation | Human3D (val) | mIoU | 85.59 | 25 |
| Multi-Entity Segmentation | BEHAVE (test) | mIoU | 88.63 | 25 |
| Multi-Entity Segmentation | AKB-48 | mIoU | 73.52 | 22 |