
TreeGaussian: Tree-Guided Cascaded Contrastive Learning for Hierarchical Consistent 3D Gaussian Scene Segmentation and Understanding

About

3D Gaussian Splatting (3DGS) has emerged as a real-time, differentiable representation for neural scene understanding. However, existing 3DGS-based methods struggle to represent hierarchical 3D semantic structures and capture whole-part relationships in complex scenes. Moreover, dense pairwise comparisons and inconsistent hierarchical labels from 2D priors hinder feature learning, resulting in suboptimal segmentation. To address these limitations, we introduce TreeGaussian, a tree-guided cascaded contrastive learning framework that explicitly models hierarchical semantic relationships and reduces redundancy in contrastive supervision. By constructing a multi-level object tree, TreeGaussian enables structured learning across object-part hierarchies. In addition, we propose a two-stage cascaded contrastive learning strategy that progressively refines feature representations from global to local, mitigating saturation and stabilizing training. A Consistent Segmentation Detection (CSD) mechanism and a graph-based denoising module are further introduced to align segmentation modes across views while suppressing unstable Gaussian points, enhancing segmentation consistency and quality. Extensive experiments, including open-vocabulary 3D object selection, 3D point cloud understanding, and ablation studies, demonstrate the effectiveness and robustness of our approach.
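To make the global-to-local idea concrete, here is a minimal sketch of a two-level contrastive objective. It is not the authors' implementation. The abstract does not specify the loss, the tree construction, or how the two stages are scheduled, so everything here is an assumption for illustration: the function names `sup_contrastive_loss` and `cascaded_losses`, a two-level tree given as per-sample `object_ids` and `part_ids`, a standard supervised contrastive loss, and the choice to compare parts only within their parent object (one plausible way to cut down dense pairwise comparisons).

```python
import numpy as np


def sup_contrastive_loss(feats, labels, temperature=0.1):
    """Standard supervised contrastive loss (hypothetical helper, not from the paper)."""
    feats = np.asarray(feats, dtype=float)
    labels = np.asarray(labels)
    n = len(feats)
    if n < 2:
        return 0.0
    # Cosine similarity between L2-normalised features, scaled by temperature.
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # an anchor never matches itself
    # Numerically stable log-softmax over all other samples.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    # Positives share the anchor's label.
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())


def cascaded_losses(feats, object_ids, part_ids, temperature=0.1):
    """Return (global, local) losses for an assumed two-level object/part tree."""
    feats = np.asarray(feats, dtype=float)
    object_ids = np.asarray(object_ids)
    part_ids = np.asarray(part_ids)
    # Global stage: separate whole objects from each other.
    global_loss = sup_contrastive_loss(feats, object_ids, temperature)
    # Local stage: separate parts, but only among samples of the same object,
    # so pairs across different objects are never compared at this level.
    local = [
        sup_contrastive_loss(feats[object_ids == obj], part_ids[object_ids == obj], temperature)
        for obj in np.unique(object_ids)
    ]
    return global_loss, float(np.mean(local))
```

Under these assumptions, a training loop could optimise the global term first and switch to (or add) the local term later. Restricting the local term to one object at a time replaces one N-by-N comparison with several smaller ones.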

Jingbin You, Zehao Li, Hao Jiang, Xinzhu Ma, Shuqin Gao, Honglong Zhao, Congcong Zheng, Tianlu Mao, Feng Dai, Yucheng Zhang, Zhaoqi Wang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D object selection | LERF-OVS | mIoU (Mean) | 51.78 | 17 |
| 3D Point Cloud Understanding | ScanNet 19 classes v2 (10 scenes) | mAcc (Whole) | 54.38 | 4 |
| 3D Point Cloud Understanding | ScanNet 15 classes v2 (10 scenes) | Whole mAcc | 59.94 | 4 |
| 3D Point Cloud Understanding | ScanNet 10 classes 10 scenes v2 | Whole mAcc | 65.41 | 4 |
| 3D Point Cloud Understanding | ScanNet 10 scenes v2 (test) | mIoU (19 Classes, Whole) | 41.61 | 4 |
| Open-vocabulary 3D object selection | Lerf ovs (part) | mIoU | 44.1 | 4 |
| Open-Vocabulary 3D Scene Segmentation | Lerf_ovs (whole scale) | mIoU | 51.78 | 2 |
| Open-Vocabulary 3D Scene Segmentation | Lerf_ovs (part scale) | mIoU | 44.1 | 2 |
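For readers unfamiliar with the two metrics reported above, the following sketch shows how mIoU (mean intersection over union) and mAcc (mean per-class accuracy) are commonly computed from per-point labels. The function names and the choice to average only over classes that occur in the ground truth are assumptions for illustration. The exact evaluation protocol behind the listed numbers is not given here and may differ.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    flat = gt * num_classes + pred
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def miou_and_macc(pred, gt, num_classes):
    """Return (mIoU, mAcc), averaged over classes present in the ground truth."""
    cm = confusion_matrix(pred, gt, num_classes).astype(float)
    tp = np.diag(cm)               # correctly labelled points per class
    gt_count = cm.sum(axis=1)      # ground-truth points per class
    pred_count = cm.sum(axis=0)    # predicted points per class
    union = gt_count + pred_count - tp
    present = gt_count > 0         # skip classes absent from the ground truth
    iou = tp[present] / union[present]
    acc = tp[present] / gt_count[present]
    return float(iou.mean()), float(acc.mean())


gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, macc = miou_and_macc(pred, gt, num_classes=3)
print(f"mIoU={miou:.4f}, mAcc={macc:.4f}")  # prints mIoU=0.5000, mAcc=0.6667
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, and the per-class accuracies are 1/2, 1, and 1/2. mAcc ignores false positives, which is why it is the higher of the two.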
