IVGT: Implicit Visual Geometry Transformer for Neural Scene Representation

About

Reconstructing coherent 3D geometry and appearance from unposed multi-view images is a fundamental yet challenging problem in computer vision. Most existing visual geometry foundation models predict explicit geometry by regressing pixel-aligned pointmaps, often suffering from redundancy and limited geometric continuity. We propose IVGT, an Implicit Visual Geometry Transformer that implicitly models continuous and coherent geometry from pose-free multi-view images. This formulation learns a continuous neural scene representation in a canonical coordinate system and supports continuous spatial queries at arbitrary 3D positions, retrieving local features to predict signed distance function (SDF) values and colors using lightweight decoders. It allows direct extraction of continuous and coherent surface geometry, enabling rendering of RGB images, depth maps, and surface normal maps from arbitrary viewpoints. We train IVGT via multi-dataset joint optimization with 2D supervision and 3D geometric regularization. IVGT demonstrates generalization across scenes and achieves strong performance on various tasks, including mesh and point cloud reconstruction, novel view synthesis, depth and surface normal estimation, and camera pose estimation.
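
To make the idea of querying a signed distance field at arbitrary 3D positions concrete, here is a minimal sketch of how depth and a surface normal can be read out of an SDF along one camera ray. It is not IVGT's implementation: the analytic `sphere_sdf` is a hypothetical stand-in for the learned decoder, and sphere tracing with finite-difference normals is one common way to render an SDF, which the abstract does not confirm as the method used. All function names and parameters are assumptions for illustration.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Hypothetical stand-in for a learned SDF decoder: distance to a sphere."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(p, center))) - radius

def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-4, far=10.0):
    """Step along the ray by the SDF value until the surface is hit.

    Returns the depth along the (normalized) ray, or None if nothing is hit.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / norm for d in direction)
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * u for o, u in zip(origin, unit))
        dist = sdf(p)
        if dist < eps:       # close enough to the zero level set
            return t
        t += dist            # safe step: no surface is closer than dist
        if t > far:          # ray left the region of interest
            return None
    return None

def sdf_normal(sdf, p, h=1e-4):
    """Surface normal as the normalized central-difference gradient of the SDF."""
    grad = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += h
        lo[i] -= h
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * h))
    length = math.sqrt(sum(g * g for g in grad))
    return tuple(g / length for g in grad)

# Usage: one ray from the origin looking down +z.
depth = sphere_trace(sphere_sdf, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
hit_point = (0.0, 0.0, depth)
normal = sdf_normal(sphere_sdf, hit_point)
```

Repeating this per pixel would yield a depth map and a normal map; a color decoder queried at the hit point would give the RGB value in the same way.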

Yuqi Wu, Tianyu Hu, Wenzhao Zheng, Yuanhui Huang, Haowen Sun, Jie Zhou, Jiwen Lu • 2026

Related benchmarks

Task	Dataset	Result
Video Depth Estimation	Sintel	Delta Threshold Accuracy (1.25) 64.6	235
Camera pose estimation	TUM-dynamic	ATE 0.012	205
Monocular Depth Estimation	NYU V2	--	192
Monocular Depth Estimation	Sintel	Abs Rel 0.309	142
Surface Normal Estimation	NYU V2	Mean Angular Error 16.6	96
Surface Normal Estimation	iBIMS-1	MAE 20.1	67
Camera pose estimation	ScanNet static indoor scenes	ATE 0.032	40
Camera pose estimation	Sintel dataset	ATE 0.14	35
Novel View Synthesis	RealEstate-10K 2 views (test)	LPIPS 0.449	19
Pointmap reconstruction	DTU object kf=5	Mean Accuracy 1.686	7

Showing 10 of 16 rows
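
For readers unfamiliar with the metrics in the table, the sketch below computes three of them using their standard definitions: absolute relative error (Abs Rel), delta threshold accuracy at 1.25, and mean angular error between normals. The function names and the validity rule (only positive depths count) are assumptions for illustration; the exact masking, alignment, and scaling used to produce the reported numbers are not stated here. The delta accuracy below is a fraction in [0, 1], whereas the table value (64.6) appears to be a percentage.

```python
import math

def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)

def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)

def mean_angular_error(pred_normals, gt_normals):
    """Mean angle in degrees between predicted and ground-truth normals."""
    total = 0.0
    for a, b in zip(pred_normals, gt_normals):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        cos = max(-1.0, min(1.0, dot / (na * nb)))  # clamp against rounding
        total += math.degrees(math.acos(cos))
    return total / len(pred_normals)

# Usage on three toy depth values and two toy normals.
pred_depth = [2.0, 3.0, 8.0]
gt_depth = [2.0, 2.5, 4.0]
rel = abs_rel(pred_depth, gt_depth)
acc = delta_accuracy(pred_depth, gt_depth)
ang = mean_angular_error([(0, 0, 1), (1, 0, 0)], [(0, 0, 1), (0, 1, 0)])
```

Lower is better for Abs Rel and angular error; higher is better for delta accuracy.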
