Reliev3R: Relieving Feed-forward Reconstruction from Multi-View Geometric Annotations
About
With recent advances, Feed-forward Reconstruction Models (FFRMs) have demonstrated great potential in reconstruction quality and adaptability to multiple downstream tasks. However, their heavy reliance on multi-view geometric annotations, e.g., 3D point maps and camera poses, makes the fully-supervised training scheme of FFRMs difficult to scale up. In this paper, we propose Reliev3R, a weakly-supervised paradigm for training FFRMs from scratch without cost-prohibitive multi-view geometric annotations. Removing the dependence on geometric sensory data and compute-intensive structure-from-motion preprocessing, our method draws 3D knowledge directly from monocular relative depths and sparse image correspondences produced by zero-shot predictions of pretrained models. At the core of Reliev3R, we design an ambiguity-aware relative depth loss and a trigonometry-based reprojection loss to provide supervision for multi-view geometric consistency. Trained from scratch with less data, Reliev3R catches up with its fully-supervised sibling models, taking a step toward low-cost 3D reconstruction supervision and scalable FFRMs.
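Monocular relative-depth predictions are ambiguous up to an unknown scale and shift, so any loss against them must first resolve that ambiguity. The paper's exact "ambiguity-aware relative depth loss" is not specified here; as a hedged illustration, the sketch below shows a common scale-and-shift-invariant formulation (as used in MiDaS-style training), where the prediction is aligned to the reference depth by a closed-form least-squares fit before computing an L1 penalty. The function name and details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def scale_shift_invariant_depth_loss(pred, ref, mask=None):
    """L1 loss after least-squares scale/shift alignment of pred to ref.

    Illustrative sketch (not Reliev3R's actual loss): aligning with a
    per-image scale s and shift t removes the global ambiguity of
    monocular relative-depth predictions before comparison.

    pred, ref: (H, W) depth maps; mask: optional boolean validity map.
    """
    if mask is None:
        mask = np.ones(pred.shape, dtype=bool)
    p = pred[mask].ravel()
    r = ref[mask].ravel()
    # Solve min_{s,t} || s*p + t - r ||^2 in closed form via lstsq.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, r, rcond=None)
    # Penalize the residual after alignment.
    return float(np.mean(np.abs(s * p + t - r)))
```

If the reference is an exact affine transform of the prediction, the loss vanishes, which is exactly the invariance a relative-depth supervision signal needs.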
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Depth Estimation | ScanNet++ | AbsRel 0.124 | 40 |
| Pose Estimation | ScanNet++ | -- | 32 |
| Point Map Reconstruction | DL3DV-benchmark 8-view | Relative Error (rel) 0.115 | 18 |
| Point Map Estimation | ScanNet++ | CD 0.172 | 16 |
| Camera Pose Estimation | DL3DV-benchmark 8-view | ATE 0.018 | 9 |