Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

About

Existing approaches to unsupervised 3D point cloud segmentation predominantly rely on a purely visual, similarity-based learning-by-clustering paradigm, which suffers from a fundamental limitation: long-tail ambiguity. In this paradigm, features of minor classes are consistently absorbed by dominant clusters, leading to severely imbalanced predictions. To address this issue, we propose LangTail, a language-guided hierarchical learning framework that leverages the balanced world knowledge encoded in language models to mitigate long-tail ambiguity in unsupervised 3D segmentation. The key idea is to establish multi-level associations between language-derived semantic priors and visually underrepresented minor classes, thereby compensating for the bias of purely visual clustering toward dominant classes. Specifically, LangTail first constructs an entity-level semantic prior from language models, capturing balanced and fine-grained world knowledge across categories. These priors are then injected into a hierarchical clustering framework via contrastive alignment. This guides the formation of multi-granularity semantic structure and prevents minor classes from being absorbed by dominant clusters, yielding more discriminative representations for underrepresented categories. Extensive experiments on ScanNet-v2, S3DIS, and nuScenes show that LangTail consistently outperforms existing methods, by +13.5, +12.9, and +8.9 mIoU, respectively. These results demonstrate the effectiveness of language priors in improving the representation of minor classes in 3D point clouds. The code will be released at: https://github.com/Whisky0129/langtail_official.
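As a rough illustration of what "contrastive alignment" between visual features and language priors can look like, the sketch below computes an InfoNCE-style loss in plain Python. It is not the LangTail implementation: the function names (`alignment_loss`, `cosine`), the `temperature` value, and the assumption that each visual feature already has an assigned prior index are all hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def alignment_loss(visual_feats, text_priors, assignments, temperature=0.1):
    """InfoNCE-style loss (illustrative assumption, not the paper's exact loss).

    visual_feats: list of feature vectors (e.g. cluster centroids).
    text_priors:  list of language-derived embeddings, one per entity.
    assignments:  assignments[i] is the index of the prior paired with
                  visual_feats[i] (assumed to be given here).
    """
    total = 0.0
    for feat, target in zip(visual_feats, assignments):
        logits = [cosine(feat, prior) / temperature for prior in text_priors]
        # Numerically stable log-sum-exp over all priors.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Cross-entropy of the softmax over priors against the assigned prior.
        total += log_denominator - logits[target]
    return total / len(visual_feats)


# Toy data: two orthogonal priors and two features that match them exactly.
priors = [[1.0, 0.0], [0.0, 1.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
aligned = alignment_loss(feats, priors, [0, 1])
misaligned = alignment_loss(feats, priors, [1, 0])
```

In this toy setup the loss is close to zero when each feature is paired with its matching prior and large when the pairing is swapped. Because every prior appears in the denominator regardless of how many points support it, a loss of this shape gives rare categories their own anchor in the embedding space, which is the intuition the abstract describes.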

Siqi Wei, Hongbin Xu, Feng Xiao, Tian Lan, Chun Li, Ming Li, Qiuxia Wu • 2026

Related benchmarks

Task	Dataset	Result
Semantic segmentation	S3DIS (Area 5)	mIoU 59.5	1029
Semantic segmentation	ScanNet V2 (val)	mIoU 46.7	380
3D Semantic Segmentation	ScanNet (test)	mIoU 45.3	117
3D Semantic Segmentation	nuScenes 1.0 (val)	mIoU 29	19
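The results above are reported in mIoU, which averages the intersection-over-union of each class with equal weight. The sketch below is a minimal plain-Python version of that metric, included to show why it is sensitive to long-tail failures. The function name `mean_iou` and the toy labels are assumptions for illustration, and the sketch assumes predicted cluster IDs have already been matched to ground-truth class IDs (a step unsupervised evaluation usually needs, not shown here).

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes.

    pred, gt: equal-length sequences of integer class labels per point.
    Classes that appear in neither pred nor gt are skipped.
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy scene: 8 points of a dominant class (0) and 2 points of a minor class (1).
gt = [0] * 8 + [1] * 2
# A prediction in which the minor class is fully absorbed by the dominant one.
pred = [0] * 10

accuracy = sum(1 for p, g in zip(pred, gt) if p == g) / len(gt)
miou = mean_iou(pred, gt, num_classes=2)
```

Here the point-wise accuracy is 0.8, while the mIoU is only 0.4, because the absorbed minor class contributes an IoU of 0 to the average. This is why recovering minor classes can move mIoU substantially even when they cover few points.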
