Visual Autoregressive Modelling for Monocular Depth Estimation

About

We propose a monocular depth estimation method based on visual autoregressive (VAR) priors, offering an alternative to diffusion-based approaches. Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism with classifier-free guidance. Inference runs in ten fixed autoregressive stages, fine-tuning requires only 74K synthetic samples, and the method achieves competitive results. We report state-of-the-art performance on indoor benchmarks under constrained training conditions, and strong performance when applied to outdoor datasets. This work establishes autoregressive priors as a complementary family of geometry-aware generative models for depth estimation, highlighting advantages in data scalability and adaptability to 3D vision tasks. Code is available at https://github.com/AmirMaEl/VAR-Depth.

Amir El-Ghoussani, André Kaup, Nassir Navab, Gustavo Carneiro, Vasileios Belagiannis • 2025
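
The abstract mentions classifier-free guidance without giving a formula. As a rough illustration, the generic form of classifier-free guidance blends a conditional and an unconditional prediction with a guidance scale. The sketch below applies that generic rule to token logits; the function name `cfg_combine`, the list-of-floats representation, and the choice to apply it to logits are assumptions made for illustration, not the authors' implementation.

```python
def cfg_combine(cond_logits, uncond_logits, guidance_scale):
    """Generic classifier-free guidance: uncond + scale * (cond - uncond).

    Hypothetical helper for illustration only; the paper's exact
    formulation and where it is applied may differ.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("conditional and unconditional logits must match in length")
    return [
        u + guidance_scale * (c - u)
        for c, u in zip(cond_logits, uncond_logits)
    ]


if __name__ == "__main__":
    cond = [2.0, 0.0]    # prediction given the conditioning input
    uncond = [1.0, 1.0]  # prediction with the conditioning dropped
    print(cfg_combine(cond, uncond, guidance_scale=2.0))
```

With a scale of 0 the result equals the unconditional prediction, with a scale of 1 it equals the conditional one, and larger scales push further in the conditional direction.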

Related benchmarks

Task                        Dataset  Result       Rank
Monocular Depth Estimation  ETH3D    AbsRel 8.1   117
Monocular Depth Estimation  DIODE    AbsRel 22.3  93
Monocular Depth Estimation  ScanNet  AbsRel 7.9   64
Monocular Depth Estimation  NYU      AbsRel 6.4   21
Monocular Depth Estimation  KITTI    AbsRel 10.4  12