LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes

About

We address the problem of extending the capabilities of vision foundation models such as DINO, SAM, and CLIP, to 3D tasks. Specifically, we introduce a novel method to uplift 2D image features into Gaussian Splatting representations of 3D scenes. Unlike traditional approaches that rely on minimizing a reconstruction loss, our method employs a simpler and more efficient feature aggregation technique, augmented by a graph diffusion mechanism. Graph diffusion refines 3D features, such as coarse segmentation masks, by leveraging 3D geometry and pairwise similarities induced by DINOv2. Our approach achieves performance comparable to the state of the art on multiple downstream tasks while delivering significant speed-ups. Notably, we obtain competitive segmentation results using only generic DINOv2 features, despite DINOv2 not being trained on millions of annotated segmentation masks like SAM. When applied to CLIP features, our method demonstrates strong performance in open-vocabulary object segmentation tasks, highlighting the versatility of our approach.
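The two ingredients named in the abstract, feature aggregation and graph diffusion, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that aggregation is a weighted average of pixel features (with weights standing in for each Gaussian's contribution to each pixel) and that diffusion repeatedly averages values over a k-nearest-neighbor graph whose edge weights come from the similarity of guiding features such as DINOv2. The function names `uplift_features` and `diffuse`, and the parameters `k`, `steps`, and `temperature`, are hypothetical.

```python
import numpy as np


def uplift_features(weights, pixel_features):
    """Aggregate 2D features into per-Gaussian features by weighted averaging.

    weights: (num_gaussians, num_pixels) nonnegative contributions of each
        Gaussian to each pixel, with the pixels of all views flattened
        (assumed to be available from the renderer).
    pixel_features: (num_pixels, dim) 2D feature vectors.
    """
    weights = np.asarray(weights, dtype=float)
    pixel_features = np.asarray(pixel_features, dtype=float)
    summed = weights @ pixel_features
    totals = weights.sum(axis=1, keepdims=True)
    # Gaussians that are never seen keep a zero feature vector.
    return np.divide(summed, totals, out=np.zeros_like(summed), where=totals > 0)


def knn_indices(positions, k):
    """Brute-force k nearest neighbors in 3D (fine for a toy example)."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dist2 = (diffs ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # exclude self-matches
    return np.argsort(dist2, axis=1)[:, :k]


def diffuse(values, positions, guide_features, k=2, steps=3, temperature=0.1):
    """Smooth per-Gaussian values over a similarity-weighted k-NN graph.

    Edges connect spatial neighbors; each edge is weighted by the cosine
    similarity of the guiding features (an assumed choice of kernel).
    """
    positions = np.asarray(positions, dtype=float)
    guide = np.asarray(guide_features, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(positions)
    neighbors = knn_indices(positions, k)
    unit = guide / np.linalg.norm(guide, axis=1, keepdims=True)
    transition = np.zeros((n, n))
    for i in range(n):
        similarity = unit[neighbors[i]] @ unit[i]
        transition[i, neighbors[i]] = np.exp(similarity / temperature)
    transition /= transition.sum(axis=1, keepdims=True)  # row-stochastic
    for _ in range(steps):
        values = transition @ values
    return values


# Toy usage: two Gaussians, four pixels with 2-dimensional features.
toy_weights = np.array([[1.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 3.0]])
toy_pixels = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 6.0]])
toy_features = uplift_features(toy_weights, toy_pixels)
```

Because aggregation here is a single weighted sum, it needs one pass over the views and no gradient-based optimization, which is the kind of saving the abstract contrasts with minimizing a reconstruction loss. The diffusion step in this sketch keeps each value inside the range of its neighbors, so it can only smooth a coarse mask along the graph, not create new extremes.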

Juliette Marrie, Romain Menegaux, Michael Arbel, Diane Larlus, Julien Mairal • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Semantic Segmentation | ScanNet | mIoU (10 classes) | 40.47 | 17 |
| 3D object selection | LERF-OVS | mIoU (Mean) | 39.28 | 17 |
| 3D Semantic Segmentation | ScanNet 10 classes | mIoU | 41.11 | 17 |
| 3D Semantic Segmentation | ScanNet 15 classes | mIoU | 33.73 | 17 |
| Open-vocabulary 3D object selection | LERF | Ramen Score | 42.3 | 16 |
| 3D object selection | LERF figurines scene | Peak VRAM | 22 | 14 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 19 classes | mIoU | 33.9 | 12 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 10 classes | mIoU | 46.4 | 12 |
| Open-Vocabulary 3D Semantic Segmentation | ScanNet 15 classes | mIoU | 37.4 | 12 |
| 3D Semantic Segmentation | ScanNet 200 70 classes | mIoU | 21.23 | 10 |
Showing 10 of 13 rows
