Sharp Monocular View Synthesis in Less Than a Second

About

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43% versus the best prior model, while lowering the synthesis time by three orders of magnitude. Code and weights are provided at https://github.com/apple/ml-sharp
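To make the reported percentages concrete, here is a minimal sketch of how a relative reduction in a lower-is-better metric such as LPIPS or DISTS is commonly computed. The function name `relative_reduction` and the sample scores are hypothetical and chosen only to illustrate the arithmetic. They are not values from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` lowers a lower-is-better metric
    relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical scores, used only to show the calculation.
baseline_lpips = 0.20
improved_lpips = 0.15
print(f"{relative_reduction(baseline_lpips, improved_lpips):.1f}% lower")
```

Under this reading, a "25% reduction" means the new score is three quarters of the best prior score on the same dataset.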

Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter, Vladlen Koltun • 2025

Related benchmarks

Task                  Dataset               Metric  Result  Rank
Novel View Synthesis  Tanks&Temples (test)  --      --      239
Novel View Synthesis  ScanNet++             PSNR    22.63   24
Novel View Synthesis  ScanNet++ (test)      LPIPS   0.154   15
View Synthesis        Tanks&Temples         PSNR    16.33   15
Novel View Synthesis  WildRGB-D             PSNR    19.57   13
Novel View Synthesis  Middlebury (test)     DISTS   0.097   7
Novel View Synthesis  Booster (test)        DISTS   0.119   7
Novel View Synthesis  WildRGBD (test)       DISTS   0.069   7
Novel View Synthesis  ETH3D (test)          DISTS   0.258   7
View Synthesis        Middlebury            PSNR    17.12   7
Showing 10 of 17 rows
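
PSNR values such as those listed above are expressed in decibels, and higher is better. As a reminder of what the metric measures, here is a minimal sketch of the standard PSNR formula applied to two flattened images with intensities in [0, 1]. The function name `psnr` and the tiny sample inputs are hypothetical. The benchmark's exact evaluation protocol (resolution, color space, cropping) is not stated on this page and is not reproduced here.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel intensities."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical two-pixel "images", each pixel off by 0.1.
print(f"{psnr([0.0, 1.0], [0.1, 0.9]):.2f} dB")
```

Because the scale is logarithmic, a 10 dB difference in PSNR corresponds to a tenfold difference in mean squared error.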

