FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

About

We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images (i.e., only 2 to 8 input images), a setting that is challenging yet practical in real-world applications. Our solution features a cascaded learning paradigm in which camera pose serves as the critical bridge, reflecting its essential role in mapping 3D structures onto 2D image planes. Concretely, FLARE first estimates camera poses, and these estimates then condition the learning of geometric structure and appearance, which are optimized through geometry-reconstruction and novel-view-synthesis objectives. Trained on large-scale public datasets, our method achieves state-of-the-art performance on pose estimation, geometry reconstruction, and novel view synthesis while remaining efficient at inference (under 0.5 seconds). The project page and code can be found at: https://zhanghe3z.github.io/FLARE/
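Camera pose is described above as the bridge that maps 3D structure onto 2D image planes. As an illustration of that mapping only, the sketch below uses the standard pinhole model, x ~ K(RX + t), assuming a world-to-camera pose (R, t) and an intrinsics matrix K that is simply given for the example. It is a generic textbook sketch, not FLARE's code, and the helper names `mat_vec` and `project` are hypothetical.

```python
# Generic pinhole-projection sketch (assumed standard model, not FLARE's code):
# a world point X maps to pixel coordinates via x ~ K (R X + t).


def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector (hypothetical helper)."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(point_world, R, t, K):
    """Project a 3D world point to 2D pixel coordinates (hypothetical helper).

    R (3x3 rotation) and t (3-vector translation) form the world-to-camera
    pose; K is the 3x3 intrinsics matrix, assumed known here for illustration.
    Returns (u, v), or None if the point lies behind the camera.
    """
    # World -> camera coordinates using the extrinsic pose.
    p_cam = [c + ti for c, ti in zip(mat_vec(R, point_world), t)]
    if p_cam[2] <= 0:
        return None  # behind the camera, so it cannot appear in the image
    # Camera -> homogeneous image coordinates using the intrinsics.
    x = mat_vec(K, p_cam)
    # Perspective division yields pixel coordinates.
    return (x[0] / x[2], x[1] / x[2])


if __name__ == "__main__":
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity rotation
    t = [0.0, 0.0, 2.0]                     # world origin lies 2 units in front of the camera
    K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
    print(project([0.2, -0.1, 0.0], R, t, K))  # prints (370.0, 215.0)
```

Because (R, t) enters every projection, an error in the pose shifts where the same 3D geometry lands in each image, which is one way to read the motivation for estimating pose before geometry and appearance.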

Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, Gordon Wetzstein• 2025

Related benchmarks

Task	Dataset	Result
Novel View Synthesis	RE10K	SSIM 83.4	345
Video Depth Estimation	Sintel	Delta Threshold Accuracy (δ < 1.25) 40.2	235
Monocular Depth Estimation	KITTI	Abs Rel 0.312	220
Novel View Synthesis	RealEstate10K	PSNR 24.25	212
Camera Pose Estimation	TUM-dynamic	ATE 0.026	205
Camera Pose Estimation	Sintel	ATE 0.207	203
Monocular Depth Estimation	NYU V2	--	192
3D Reconstruction	7 Scenes	Completion 10.7	161
Video Depth Estimation	KITTI	Abs Rel 0.356	153
Monocular Depth Estimation	Sintel	Abs Rel 0.606	142

...
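The table mixes several standard metrics. As a rough reference, the sketch below computes common textbook definitions of Abs Rel, δ < 1.25 accuracy, PSNR, and a simplified ATE. These definitions are assumptions about how such metrics are usually computed; the evaluation protocols behind the listed numbers (masking, scale alignment for depth, trajectory alignment for ATE) are not stated here and may differ. The δ and SSIM entries above appear to be reported as percentages. All function names are hypothetical.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative depth error, mean(|d - d*| / d*), over pixels with d* > 0."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(d / d*, d* / d) < threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB for flattened images with values in [0, max_val]."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def ate_rmse(est_centers, gt_centers):
    """RMSE between estimated and ground-truth camera centers.

    Assumes the trajectories are already aligned; common ATE protocols first
    apply a similarity alignment, which this sketch omits.
    """
    sq = [sum((e - g) ** 2 for e, g in zip(ec, gc))
          for ec, gc in zip(est_centers, gt_centers)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    pred, gt = [1.0, 2.0, 3.0], [1.0, 2.4, 2.0]
    print(abs_rel(pred, gt))             # about 0.2222
    print(delta_accuracy(pred, gt))      # about 0.6667 (2 of 3 pixels pass)
    print(psnr([0.0, 0.0], [0.1, 0.1]))  # about 20.0 dB
    print(ate_rmse([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```

Lower is better for Abs Rel and ATE; higher is better for δ accuracy, PSNR, and SSIM.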
