Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation
About
We present Fin3R, a simple, effective, and general fine-tuning method for feed-forward 3D reconstruction models. This family of models regresses the pointmaps of all input images into the coordinate system of a reference frame, along with other auxiliary outputs, in a single forward pass. However, we find that current models struggle with fine geometry and robustness due to (i) the scarcity of high-fidelity depth and pose supervision and (ii) the inherent geometric misalignment from multi-view pointmap regression. Fin3R tackles both issues with an extra lightweight fine-tuning step. We freeze the decoder, which handles view matching, and fine-tune only the image encoder, the component dedicated to feature extraction. The encoder is enriched with fine geometric details distilled from a strong monocular teacher model on large, unlabeled datasets, using a custom, lightweight LoRA adapter. We validate our method on a wide range of models, including DUSt3R, MASt3R, CUT3R, and VGGT. The fine-tuned models consistently deliver sharper boundaries, recover complex structures, and achieve higher geometric accuracy in both single- and multi-view settings, while adding only the tiny LoRA weights, which leave test-time memory and latency virtually unchanged. Project page: https://visual-ai.github.io/fin3r
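
To make the low-rank adaptation idea concrete, here is a minimal sketch of a standard LoRA-style linear layer in plain Python. It is not the custom adapter described in the abstract, whose details are not given here. The class name `LoRALinear`, the helper functions, and the `rank` and `alpha` values are all hypothetical and chosen for illustration. The sketch shows two properties the abstract relies on: the frozen base weight is left untouched while only two small matrices are trainable, and the low-rank update can be folded into the base weight, which is one common way an adapter can leave inference cost unchanged.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


class LoRALinear:
    """Hypothetical layer computing y = x (W + (alpha / rank) * A B).

    W is treated as frozen; only A and B would receive gradient updates.
    """

    def __init__(self, d_in, d_out, rank=2, alpha=2.0, seed=0):
        rng = random.Random(seed)
        # Frozen base weight (stands in for a pretrained encoder weight).
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_out)] for _ in range(d_in)]
        # Trainable low-rank factors; B starts at zero so the adapter is a no-op.
        self.A = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        self.B = [[0.0] * d_out for _ in range(rank)]
        self.scaling = alpha / rank

    def forward(self, x):
        """Apply the frozen weight plus the scaled low-rank update."""
        base = matmul(x, self.W)
        update = scale(matmul(matmul(x, self.A), self.B), self.scaling)
        return add(base, update)

    def merged_weight(self):
        """Fold the low-rank update into a single dense weight for inference."""
        return add(self.W, scale(matmul(self.A, self.B), self.scaling))

    def num_trainable(self):
        """Count the entries of A and B, the only trainable parameters."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

In a distillation setup like the one described, the trainable factors would be optimized so that the student's outputs match the teacher's predictions, while the decoder and the base encoder weights stay fixed. After training, `merged_weight()` yields one dense matrix of the same shape as `W`.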
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Monocular Depth Estimation | KITTI | Abs Rel 0.1069 | 203 |
| Monocular Depth Estimation | ETH3D | AbsRel 3.07 | 132 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc 98.3 | 131 |
| Monocular Depth Estimation | DIODE | AbsRel 3.59 | 113 |
| Monocular Depth Estimation | iBIMS-1 | ARel 2.73 | 36 |
| Point Map Estimation | ETH3D | NC Mean 0.861 | 31 |
| Depth Estimation | HAMMER | Delta 124.5 | 29 |
| Pointmap Regression | DTU | Mean Accuracy 0.948 | 26 |
| Relative Camera Pose Evaluation | ScanNet1500 | AUC@5 37.93 | 23 |
| Monocular Depth Estimation | DDAD | Abs Rel Error 0.265 | 21 |
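
For readers unfamiliar with the depth metrics named in the table, the sketch below computes the two most common ones using their usual textbook definitions: absolute relative error (the mean of |prediction - ground truth| / ground truth) and the delta-1 accuracy (the fraction of pixels whose ratio to ground truth is within a factor of 1.25). The function names are hypothetical. Whether each benchmark above uses exactly these definitions, applies scale or shift alignment first, or reports fractions versus percentages is not stated here, so treat this as an assumption-laden illustration.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in valid) / len(valid)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred / gt, gt / pred) < threshold."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    # Non-positive predictions are counted as failures rather than dropped.
    hits = sum(1 for p, g in valid if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(valid)
```

Lower is better for `abs_rel` and higher is better for `delta1`. Multiplying either by 100 gives the percentage form that some leaderboards report.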