Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation

About

We present Fin3R, a simple, effective, and general fine-tuning method for feed-forward 3D reconstruction models. This family of models regresses pointmaps for all input images into the coordinate system of a reference frame, along with other auxiliary outputs, in a single forward pass. However, we find that current models struggle with fine geometry and robustness due to (i) the scarcity of high-fidelity depth and pose supervision and (ii) the inherent geometric misalignment from multi-view pointmap regression. Fin3R tackles both issues jointly with an extra lightweight fine-tuning step. We freeze the decoder, which handles view matching, and fine-tune only the image encoder, the component dedicated to feature extraction. The encoder is enriched with fine geometric details distilled from a strong monocular teacher model on large, unlabeled datasets, using a custom, lightweight LoRA adapter. We validate our method on a wide range of models, including DUSt3R, MASt3R, CUT3R, and VGGT. The fine-tuned models consistently deliver sharper boundaries, recover complex structures, and achieve higher geometric accuracy in both single- and multi-view settings, while adding only the tiny LoRA weights, which leave test-time memory and latency virtually unchanged. Project page: https://visual-ai.github.io/fin3r
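
To make the "frozen weights plus tiny LoRA weights" idea concrete, here is a minimal pure-Python sketch of a standard low-rank adapter on a single linear layer. It is an illustration under stated assumptions, not the authors' implementation: the abstract describes a custom LoRA adapter whose details are not given here, and the names `LoRALinear` and `matvec`, the zero initialisation of `B`, and the `alpha / rank` scaling are conventions borrowed from generic LoRA.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a low-rank update scale * B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never updated during fine-tuning
        # A gets a small random init; B starts at zero so the adapter is a no-op at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                   # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


# Toy 8x8 identity weight standing in for one frozen encoder projection (assumption).
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=2, alpha=2.0)
x = [float(i) for i in range(8)]

print(layer.forward(x) == matvec(W, x))  # adapter leaves the output unchanged at init
print(layer.trainable_params(), layer.frozen_params())
```

Because only `A` and `B` would receive gradient updates, the number of trainable values grows with the rank rather than with the full weight size. In generic LoRA the product `B @ A` can also be merged into `W` after training, which is one common way such adapters avoid adding inference cost.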

Weining Ren, Hongjun Wang, Xiao Tan, Kai Han • 2025

Related benchmarks

Task                             Dataset             Metric         Result  Rank
Monocular Depth Estimation       KITTI               Abs Rel        0.1069  161
Monocular Depth Estimation       ETH3D               AbsRel         3.07    117
Monocular Depth Estimation       NYU V2              Delta 1 Acc    98.3    113
Monocular Depth Estimation       DIODE               AbsRel         3.59    93
Monocular Depth Estimation       iBIMS-1             ARel           2.73    32
Depth Estimation                 HAMMER              Delta 1        24.5    29
Monocular Depth Estimation       DDAD                Abs Rel Error  0.265   17
Relative Camera Pose Evaluation  ScanNet1500         AUC@5          37.93   10
Monocular Depth Estimation       Average 7 datasets  Rel Error      0.0429  10
Pointmap Regression              DTU                 Mean Accuracy  0.948   9
Showing 10 of 11 rows
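
For readers unfamiliar with the depth metrics named in the table, the sketch below computes the commonly used definitions of absolute relative error (AbsRel) and the delta-1 accuracy. It assumes the standard formulas (mean of |pred - gt| / gt, and the fraction of pixels with max(pred/gt, gt/pred) < 1.25). Individual benchmarks listed above may differ in alignment, masking, or whether values are reported as fractions or percentages, so the function names and sample depths here are purely illustrative.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels whose depth ratio max(p/g, g/p) is below the threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    # A non-positive prediction cannot satisfy the ratio test, so it counts as a miss.
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Made-up depths in metres, for illustration only.
pred = [1.0, 2.2, 3.0, 5.0]
gt = [1.0, 2.0, 4.0, 4.0]

print(round(abs_rel(pred, gt), 4))
print(delta1(pred, gt))
```

Lower is better for AbsRel and higher is better for delta-1, which is why the two kinds of rows in the table are not directly comparable to each other.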
