RUST: Latent Neural Scene Representations from Unposed Imagery

About

Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model that provides latent representations which generalize effectively beyond a single scene. The Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and necessitates accurately posed ground truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. Our main insight is that one can train a Pose Encoder that peeks at the target image and learns a latent pose embedding which is used by the decoder for view synthesis. We perform an empirical investigation into the learned latent pose structure and show that it allows meaningful test-time camera transformations and accurate explicit pose readouts. Perhaps surprisingly, RUST achieves quality similar to that of methods which have access to perfect camera pose, thereby unlocking the potential for large-scale training of amortized neural scene representations.
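The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyRust`, its dimensions, the random linear maps standing in for trained transformers, and the choice to feed the scene latent into the pose encoder are all assumptions made for illustration. It only shows the idea that a pose encoder looks at the target image, emits a small latent pose, and the decoder renders from the scene latent plus that latent pose, so that training needs RGB images alone.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a learned network (hypothetical)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class ToyRust:
    """Toy stand-in for the encoder / pose encoder / decoder data flow."""

    def __init__(self, image_dim=12, scene_dim=8, pose_dim=3, seed=0):
        rng = random.Random(seed)
        self.encoder = linear(image_dim, scene_dim, rng)
        # Assumption: the pose encoder sees the target image and the scene latent.
        self.pose_encoder = linear(image_dim + scene_dim, pose_dim, rng)
        self.decoder = linear(scene_dim + pose_dim, image_dim, rng)

    def encode_scene(self, input_views):
        """Average per-view features into one scene latent (no camera poses used)."""
        feats = [apply(self.encoder, view) for view in input_views]
        return [sum(col) / len(feats) for col in zip(*feats)]

    def encode_pose(self, target_image, scene):
        """Peek at the target image and produce a low-dimensional latent pose."""
        return apply(self.pose_encoder, list(target_image) + list(scene))

    def render(self, scene, latent_pose):
        """Decode a view from the scene latent and the latent pose."""
        return apply(self.decoder, list(scene) + list(latent_pose))


def reconstruction_loss(model, input_views, target_image):
    """Mean squared error between the rendered view and the target RGB values."""
    scene = model.encode_scene(input_views)
    latent_pose = model.encode_pose(target_image, scene)
    prediction = model.render(scene, latent_pose)
    return sum((p - t) ** 2 for p, t in zip(prediction, target_image)) / len(target_image)


# Usage: two flattened "input views" and one "target view" of made-up pixel values.
model = ToyRust()
views = [[0.1 * i for i in range(12)], [0.05 * i for i in range(12)]]
target = [0.08 * i for i in range(12)]
loss = reconstruction_loss(model, views, target)
```

In this sketch the latent pose is kept much smaller than the image (3 values versus 12). That bottleneck is an assumption of the toy, included to hint at why the decoder cannot simply copy the target image through the pose path.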

Mehdi S. M. Sajjadi, Aravindh Mahendran, Thomas Kipf, Etienne Pot, Daniel Duckworth, Mario Lucic, Klaus Greff • 2022

Related benchmarks

Task	Dataset	Result
Pose Estimation	RE10K	--	35
Pose Estimation	CO3D v2	--	19
Novel View Synthesis	MSN Multi-ShapeNet (test)	PSNR 23.88	14
View Synthesis	CO3D-Hydrants (test)	LPIPS 0.6071	12
View Synthesis	KITTI (test)	PSNR 14.18	11
Pose Estimation	DL3DV	Rotation Accuracy (R. Acc) 97.1	9
Pose Estimation	MVImgNet	Rotation Accuracy 96.8	9
View Synthesis	RealEstate10K (test)	LPIPS 0.5898	9
Relative Camera Pose Estimation	MSN (test)	MSE 0.08	7
Novel View Synthesis	Street View (SV)	PSNR 22.5	4

Showing 10 of 18 rows
