PointCLIP: Point Cloud Understanding by CLIP

About

Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP), which learns to match images with their corresponding texts in open-vocabulary settings, has shown inspirational performance on 2D visual recognition. However, it remains underexplored whether CLIP, pre-trained on large-scale 2D image-text pairs, can be generalized to 3D recognition. In this paper, we show that such a setting is feasible by proposing PointCLIP, which aligns CLIP-encoded point clouds with 3D category texts. Specifically, we encode a point cloud by projecting it into multi-view depth maps without rendering, and aggregate the view-wise zero-shot predictions to achieve knowledge transfer from 2D to 3D. On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D. By fine-tuning only the lightweight adapter in few-shot settings, the performance of PointCLIP can be largely improved. In addition, we observe a complementary property between PointCLIP and classical 3D-supervised networks. Through simple ensembling, PointCLIP boosts the baseline's performance and even surpasses state-of-the-art models. Therefore, PointCLIP is a promising alternative for effective 3D point cloud understanding via CLIP under low resource cost and in low-data regimes. We conduct thorough experiments on the widely adopted ModelNet10 and ModelNet40 and the challenging ScanObjectNN to demonstrate the effectiveness of PointCLIP. The code is released at https://github.com/ZrrSkywalker/PointCLIP.

Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li • 2021
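
The abstract's two core steps, projecting a point cloud into multi-view depth maps and aggregating view-wise predictions, can be illustrated with a minimal NumPy sketch. This is not the released PointCLIP code: the orthographic projection, the function names (`rotate_y`, `project_to_depth_map`, `aggregate_view_logits`), the map size, and the fixed view weights are all assumptions made for illustration, and the CLIP encoders are replaced by feature arrays supplied by the caller.

```python
import numpy as np

def rotate_y(points, angle):
    """Rotate an (N, 3) point cloud about the y-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return points @ rot.T

def project_to_depth_map(points, size=32):
    """Project an (N, 3) cloud along z into a (size, size) depth map.

    Assumption: a simple orthographic projection, with no rendering step.
    Pixel values are 1 - z after normalization, so nearer points are
    brighter; empty pixels stay 0.
    """
    mins = points.min(axis=0)
    span = max(float((points.max(axis=0) - mins).max()), 1e-8)
    pts = (points - mins) / span                      # coordinates in [0, 1]
    cols = np.minimum((pts[:, 0] * size).astype(int), size - 1)
    rows = np.minimum(((1.0 - pts[:, 1]) * size).astype(int), size - 1)
    depth = np.zeros((size, size))
    # When several points land on one pixel, keep the nearest (largest value).
    np.maximum.at(depth, (rows, cols), 1.0 - pts[:, 2])
    return depth

def aggregate_view_logits(view_feats, text_feats, view_weights):
    """Weighted sum of per-view cosine similarities to the category texts.

    view_feats: (V, D) features of the V depth maps (hypothetical encoder output)
    text_feats: (K, D) features of the K category prompts
    view_weights: (V,) importance of each view (assumed fixed here)
    Returns a (K,) vector of aggregated logits.
    """
    v = view_feats / np.linalg.norm(view_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    per_view_logits = v @ t.T                         # shape (V, K)
    return view_weights @ per_view_logits             # shape (K,)
```

In use, one would rotate the cloud to several viewpoints with `rotate_y`, call `project_to_depth_map` on each, encode the resulting maps with an image encoder, and pass the stacked features to `aggregate_view_logits`; the predicted category is the index of the largest aggregated logit.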

Related benchmarks

Task	Dataset	Result
3D Object Classification	ModelNet40 (test)	Accuracy 23.78	321
3D Point Cloud Classification	ModelNet40 (test)	OA 92.1	307
Part Segmentation	ShapeNetPart	--	254
Object Classification	ScanObjectNN OBJ_BG	--	248
Object Classification	ScanObjectNN OBJ_ONLY	Overall Accuracy 21.3	186
3D Object Classification	Objaverse-LVIS (test)	Top-1 Accuracy 1.9	95
3D Object Classification	ScanObjectNN PB_T50_RS	OA 15.4	94
3D Point Cloud Classification	ScanObjectNN (test)	--	92
3D Object Classification	ModelNet40	Top-1 Accuracy 20.2	89
Classification	ScanObjectNN	--	77

Showing 10 of 91 rows
