FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views

About

We present FLEG, a feed-forward network that reconstructs language-embedded 3D Gaussians from any views. Straightforward prior solutions combine feed-forward reconstruction with Gaussian heads, but they support only a fixed number of input views and suffer from insufficient 3D training data. In contrast, we propose a 3D-annotation-free training framework for 2D-to-3D lifting from arbitrary uncalibrated and unposed multi-view images. Because the framework requires no 3D annotations, it can leverage large-scale video data with readily obtained 2D instance information to enrich the semantic embeddings. We also propose instance-guided contrastive learning to align 2D semantics with the 3D representation. In addition, to mitigate the high memory and computational cost of dense views, we propose a geometry-semantic hierarchical sparsification strategy. FLEG efficiently reconstructs a language-embedded 3D Gaussian representation in a feed-forward manner from arbitrary sparse or dense views, jointly producing accurate geometry, high-fidelity appearance, and language-aligned semantics. Extensive experiments show that it outperforms existing methods on a range of related tasks. Project page: https://fangzhou2000.github.io/projects/fleg.
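The abstract does not spell out the instance-guided contrastive objective. As a rough illustration only, the sketch below assumes a standard supervised contrastive (SupCon-style) loss: semantic features (for example, features rendered from the Gaussians at pixels) that fall inside the same 2D instance mask are treated as positives, and all other features as negatives. The function names, the temperature of 0.1, and the flat list-of-features layout are hypothetical choices for this sketch and are not taken from FLEG.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> List[float]:
    # Unit-normalize so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def instance_contrastive_loss(
    features: Sequence[Sequence[float]],
    instance_ids: Sequence[int],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Hypothetical SupCon-style loss over semantic features.

    Features sharing a 2D instance ID are positives; all others are
    negatives. Returns the mean loss over anchors that have at least
    one positive, or 0.0 if no anchor has one.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and instance_ids[j] == instance_ids[i]]
        if not positives:
            continue  # anchors without a positive are skipped
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over the softmax denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to each positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(instance_contrastive_loss(feats, [0, 0, 1, 1]))  # low: IDs agree with feature clusters
    print(instance_contrastive_loss(feats, [0, 1, 0, 1]))  # high: IDs cut across clusters
```

Normalizing the features makes the loss depend only on their direction, which matches how language-aligned embeddings are usually compared by cosine similarity. A practical implementation would batch this computation on the GPU rather than loop in Python.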

Qijian Tian, Xin Tan, Jiayu Ying, Xuhong Wang, Yuan Xie, Lizhuang Ma • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Open Vocabulary Semantic Segmentation | ScanNet v2 (test) | mIoU (%) | 47.59 | 16
Novel View Synthesis | ScanNet v2 (test) | PSNR (dB) | 24.2 | 12
Open Vocabulary Semantic Segmentation | ScanNet | mIoU (%) | 46.56 | 6