RUST: Latent Neural Scene Representations from Unposed Imagery

About

Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model that provides latent representations which generalize effectively beyond a single scene. Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and requires accurately posed ground-truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. Our main insight is that one can train a Pose Encoder that peeks at the target image and learns a latent pose embedding, which the decoder then uses for view synthesis. We perform an empirical investigation into the learned latent pose structure and show that it allows meaningful test-time camera transformations and accurate explicit pose readouts. Perhaps surprisingly, RUST achieves quality similar to that of methods which have access to perfect camera pose, thereby unlocking the potential for large-scale training of amortized neural scene representations.
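
The data flow described in the abstract can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: random `tanh` layers stand in for the transformer encoder, Pose Encoder, and decoder, and all names and sizes (`PIXELS`, `SCENE_DIM`, `POSE_DIM`, `encode_scene`, `encode_pose`, `decode`) are hypothetical. It shows the one idea the abstract states: the Pose Encoder looks at the target image and emits a small latent pose, and the decoder renders from the scene latent plus that latent pose, so the only supervision is the target RGB image.

```python
import math
import random

# Illustrative sizes only; none of these values come from the paper.
PIXELS, SCENE_DIM, POSE_DIM = 12, 6, 3


def init_layer(rng, n_in, n_out):
    """Random weights standing in for a trained network (assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def layer(weights, x):
    """One dense layer with tanh, used as a stand-in for each module."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def encode_scene(w_enc, input_views):
    """Encode each unposed input view and average into one scene latent."""
    tokens = [layer(w_enc, view) for view in input_views]
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def encode_pose(w_pose, target_view, scene_latent):
    """Pose Encoder: peeks at the target image, returns a small latent pose."""
    return layer(w_pose, target_view + scene_latent)


def decode(w_dec, scene_latent, latent_pose):
    """Decoder: renders a view from the scene latent and the latent pose."""
    return layer(w_dec, scene_latent + latent_pose)


rng = random.Random(0)
w_enc = init_layer(rng, PIXELS, SCENE_DIM)
w_pose = init_layer(rng, PIXELS + SCENE_DIM, POSE_DIM)
w_dec = init_layer(rng, SCENE_DIM + POSE_DIM, PIXELS)

# Flattened toy "images" with values in [0, 1); no camera poses anywhere.
input_views = [[rng.random() for _ in range(PIXELS)] for _ in range(3)]
target_view = [rng.random() for _ in range(PIXELS)]

scene_latent = encode_scene(w_enc, input_views)
latent_pose = encode_pose(w_pose, target_view, scene_latent)
rendered = decode(w_dec, scene_latent, latent_pose)

# Training signal: reconstruction error against the target RGB values.
loss = sum((r - t) ** 2 for r, t in zip(rendered, target_view)) / PIXELS

# Test time: editing the latent pose changes the rendered view.
shifted_pose = [p + 0.5 for p in latent_pose]
rendered_shifted = decode(w_dec, scene_latent, shifted_pose)
```

In this sketch the latent pose is much smaller than the image, so it cannot simply carry the target pixels through to the decoder; that bottleneck is a design assumption of the toy, chosen to make the role of the latent pose visible.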

Mehdi S. M. Sajjadi, Aravindh Mahendran, Thomas Kipf, Etienne Pot, Daniel Duckworth, Mario Lucic, Klaus Greff • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Pose Estimation | RE10K | -- | 35 |
| Pose Estimation | CO3D v2 | -- | 19 |
| Novel View Synthesis | MSN Multi-ShapeNet (test) | PSNR 23.88 | 14 |
| View Synthesis | CO3D-Hydrants (test) | LPIPS 0.6071 | 12 |
| View Synthesis | KITTI (test) | PSNR 14.18 | 11 |
| Pose Estimation | DL3DV | Rotation Accuracy (R. Acc) 97.1 | 9 |
| Pose Estimation | MVImgNet | Rotation Accuracy 96.8 | 9 |
| View Synthesis | RealEstate10K (test) | LPIPS 0.5898 | 9 |
| Relative Camera Pose Estimation | MSN (test) | MSE 0.08 | 7 |
| Novel View Synthesis | Street View (SV) | PSNR 22.5 | 4 |
Showing 10 of 18 rows
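
Several rows above report PSNR, which this page does not define. As a reading aid, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), where higher is better (LPIPS, by contrast, is a distance, so lower is better). The function name `psnr` and the assumption that pixel values lie in [0, `max_value`] are illustrative; the exact evaluation protocol behind the listed numbers is not stated here.

```python
import math


def psnr(rendered, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    Assumes both inputs are equal-length sequences of pixel values in
    [0, max_value]; this range is an assumption, not taken from the page.
    """
    if len(rendered) != len(target) or not rendered:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by 0.1, so MSE is about 0.01 and PSNR is about 20 dB.
example = psnr([0.5, 0.5], [0.4, 0.6])
```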
