
SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views

About

We introduce SPFSplatV2, an efficient feed-forward framework for 3D Gaussian splatting from sparse multi-view images, requiring no ground-truth poses during training and inference. It employs a shared feature extraction backbone, enabling simultaneous prediction of 3D Gaussian primitives and camera poses in a canonical space from unposed inputs. A masked attention mechanism is introduced to efficiently estimate target poses during training, while a reprojection loss enforces pixel-aligned Gaussian primitives, providing stronger geometric constraints. We further demonstrate the compatibility of our training framework with different reconstruction architectures, resulting in two model variants. Remarkably, despite the absence of pose supervision, our method achieves state-of-the-art performance in both in-domain and out-of-domain novel view synthesis, even under extreme viewpoint changes and limited image overlap, and surpasses recent methods that rely on geometric supervision for relative pose estimation. By eliminating dependence on ground-truth poses, our method offers the scalability to leverage larger and more diverse datasets. Code and pretrained models will be available on our project page: https://ranrhuang.github.io/spfsplatv2/.
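To make the idea of a reprojection loss concrete, here is a minimal sketch of a generic pinhole reprojection error. It is not the authors' implementation: the abstract does not give the exact formulation, so the function names (`project`, `reprojection_loss`), the world-to-camera convention X_c = R X + t, and the mean Euclidean pixel distance are all assumptions chosen for illustration.

```python
import math

def project(point, K, R, t):
    """Project a 3D point to pixel coordinates with a pinhole camera (assumed model)."""
    # Camera-frame coordinates: X_c = R @ X + t (assumed world-to-camera convention).
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division followed by the intrinsics (fx, fy, cx, cy).
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v

def reprojection_loss(centers, pixels, K, R, t):
    """Mean Euclidean distance between projected Gaussian centers and their source pixels."""
    total = 0.0
    for point, (u_ref, v_ref) in zip(centers, pixels):
        u, v = project(point, K, R, t)
        total += math.hypot(u - u_ref, v - v_ref)
    return total / len(centers)

# Toy setup (hypothetical values): focal length 100, principal point (50, 50),
# identity pose.
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
pixels = [(50.0, 50.0), (60.0, 40.0)]

# Centers lying exactly on the rays through their pixels (depth 2.0).
aligned = [(0.0, 0.0, 2.0), (0.2, -0.2, 2.0)]
# The same centers shifted sideways, so they no longer project onto their pixels.
shifted = [(0.1, 0.0, 2.0), (0.3, -0.2, 2.0)]

loss_aligned = reprojection_loss(aligned, pixels, K, R, t)
loss_shifted = reprojection_loss(shifted, pixels, K, R, t)
print(loss_aligned, loss_shifted)
```

In this toy case the loss is zero only when each center projects back onto the pixel it was predicted from, which is the sense in which such a term encourages pixel-aligned primitives. In a pose-free setting the pose (R, t) would itself be a network prediction rather than a given.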

Ranran Huang, Krystian Mikolajczyk • 2025

Related benchmarks

Task                  Dataset           Result            Rank
Novel View Synthesis  RE10K             SSIM 89.2         142
Novel View Synthesis  ACID              PSNR 26.361       71
Novel View Synthesis  RE10K Small       PSNR 23.138       38
Pose Estimation       RE10K             AUC @ 5° 0.645    35
Novel View Synthesis  RE10K (Medium)    PSNR 25.542       33
Novel View Synthesis  RE10K (Average)   PSNR 25.693       33
Novel View Synthesis  RE10K Large       PSNR 28.143       25
Pose Estimation       ACID              AUC @ 5° 38.7     23
Novel View Synthesis  ACID zero-shot    PSNR 26.361       20
Novel View Synthesis  ACID Small        PSNR 23.64        13
Showing 10 of 14 rows
