
Camera-Aware Cross-View Alignment for Referring 3D Gaussian Splatting Segmentation

About

Referring 3D Gaussian Splatting Segmentation (R3DGS) aims to ground free-form language queries in 3D Gaussian fields. However, existing methods rely on single-view pseudo supervision, leading to viewpoint drift and inconsistent predictions across views. We propose CaRF (Camera-aware Referring Field), a camera-aware cross-view alignment framework for view-consistent referring in 3D Gaussian splatting. CaRF introduces Camera-conditioned Alignment Modulation (CAM) to inject camera geometry into Gaussian-text interactions, and Gaussian-level Cross-view Logit Alignment (GCLA) to explicitly align referring responses of the same Gaussians across calibrated views during training. By turning cross-view discrepancy into an optimizable objective, CaRF enables geometry-aware and view-consistent reasoning directly in the Gaussian space. Extensive experiments on three benchmarks demonstrate that CaRF achieves state-of-the-art performance, improving mIoU by 16.8%, 4.3%, and 2.0% on Ref-LERF, LERF-OVS, and 3D-OVS, respectively. Our code is available at https://github.com/eR3R3/CaRF.
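The two ideas named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation (see their repository for that). It assumes a FiLM-style scale-and-shift as a stand-in for camera conditioning and a mean squared difference as the cross-view discrepancy, and the function names, shapes, and weight matrices are all hypothetical.

```python
import numpy as np


def camera_modulate(gauss_feat, cam_embed, w_scale, w_shift):
    """Condition per-Gaussian features on a camera embedding.

    Assumption: FiLM-style scale and shift; the paper's CAM module may differ.
    gauss_feat: (N, D), cam_embed: (C,), w_scale and w_shift: (C, D).
    """
    scale = 1.0 + cam_embed @ w_scale  # (D,)
    shift = cam_embed @ w_shift        # (D,)
    return gauss_feat * scale + shift  # (N, D), broadcast over Gaussians


def referring_logits(gauss_feat, text_feat):
    """One referring logit per Gaussian: dot product with the text feature."""
    return gauss_feat @ text_feat  # (N,)


def cross_view_alignment_loss(logits_a, logits_b, visible_a, visible_b):
    """Penalize disagreement between two views on the same Gaussians.

    Assumption: mean squared difference over Gaussians visible in both views.
    Returns 0.0 when the views share no visible Gaussians.
    """
    both = visible_a & visible_b
    if not both.any():
        return 0.0
    diff = logits_a[both] - logits_b[both]
    return float(np.mean(diff ** 2))


# Toy usage with random data (shapes only; no real scene or text encoder).
rng = np.random.default_rng(0)
n_gauss, feat_dim, cam_dim = 5, 4, 3
feats = rng.normal(size=(n_gauss, feat_dim))
text = rng.normal(size=feat_dim)
w_scale = rng.normal(size=(cam_dim, feat_dim)) * 0.1
w_shift = rng.normal(size=(cam_dim, feat_dim)) * 0.1
cam_a = rng.normal(size=cam_dim)
cam_b = rng.normal(size=cam_dim)

logits_a = referring_logits(camera_modulate(feats, cam_a, w_scale, w_shift), text)
logits_b = referring_logits(camera_modulate(feats, cam_b, w_scale, w_shift), text)
all_visible = np.ones(n_gauss, dtype=bool)
loss = cross_view_alignment_loss(logits_a, logits_b, all_visible, all_visible)
```

In a training loop, a term like `loss` would be added to the per-view segmentation objective, so that a Gaussian seen from two calibrated cameras is pushed toward the same referring response.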

Yuwen Tao, Kanglei Zhou, Xin Tan, Yuan Xie • 2025

Related benchmarks

Task | Dataset | Result | Rank
3D Language Grounding | Ref-LeRF | Ramen Score: 33.5 | 7
3D Open-vocabulary Segmentation | LERF-OVS | mIoU (Ramen): 55.2 | 7
3D Open-vocabulary Segmentation | 3D-OVS | mIoU (Bed): 92.1 | 7
