DepthFocus: Controllable Depth Estimation for See-Through Scenes

About

Depth in the real world is rarely singular. Transmissive materials create layered ambiguities that confound conventional perception systems. Existing models remain passive: conventional approaches typically estimate static depth maps anchored to the nearest surface, and even recent multi-head extensions suffer from a representational bottleneck caused by fixed feature representations. This stands in contrast to human vision, which actively shifts focus to perceive a desired depth. We introduce DepthFocus, a steerable Vision Transformer that redefines stereo depth estimation as condition-aware control. Instead of extracting fixed features, our model dynamically modulates its computation based on a physical reference depth, integrating dual conditional mechanisms to selectively perceive geometry aligned with the desired focus. Leveraging a newly curated large-scale synthetic dataset, DepthFocus achieves state-of-the-art results across all evaluated benchmarks, including both standard single-layer and complex multi-layered scenarios. While maintaining high precision in opaque regions, our approach effectively resolves depth ambiguities in transparent and reflective scenes by selectively reconstructing geometry at a target distance. This capability enables robust, intent-driven perception that significantly outperforms existing multi-layer methods, marking a substantial step toward active 3D perception.

Project page: https://junhong-3dv.github.io/depthfocus-project/
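To make the idea of a focus query concrete, the following is a minimal sketch of what a focus-conditioned output means. It is not the DepthFocus network. It assumes each pixel comes with a list of ground-truth surface depths along its ray (for example a glass pane and the object behind it), and that the desired output for a given reference depth is the surface nearest to it. The function name select_focused_depth and this nearest-surface rule are hypothetical illustrations; the paper may define its focus targets differently.

```python
from typing import List, Optional


def select_focused_depth(
    layer_depths: List[List[float]], reference_depth: float
) -> List[Optional[float]]:
    """Hypothetical focus rule: per pixel, return the surface depth closest
    to reference_depth (an assumption, not the paper's definition).

    layer_depths[p] lists the depths (metres) of every surface crossed by
    pixel p's ray, e.g. [glass_pane, object_behind]. Pixels with no surface
    yield None.
    """
    focused: List[Optional[float]] = []
    for candidates in layer_depths:
        if not candidates:
            focused.append(None)
            continue
        # min() keeps the first candidate on ties, i.e. the order given.
        focused.append(min(candidates, key=lambda d: abs(d - reference_depth)))
    return focused


# Pixel 0 sees glass at 1.0 m and a wall at 3.0 m; pixel 1 sees one opaque surface.
pixels = [[1.0, 3.0], [2.5], []]
near = select_focused_depth(pixels, reference_depth=0.5)  # [1.0, 2.5, None]
far = select_focused_depth(pixels, reference_depth=4.0)   # [3.0, 2.5, None]
```

The same scene yields different depth maps depending on the query: shifting the reference depth moves pixel 0 from the glass to the wall, while the single-surface pixel is unaffected.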

Junhong Min, Jimin Kim, Minwook Kim, Cheol-Hui Min, Youngpil Jeon, Minyong Choi• 2025

Related benchmarks

Task	Dataset	Metric	Result
Stereo Depth Estimation	Booster (all types)	EPE	1.56	14
Multi-layer depth estimation	Multi-layered synthetic benchmark (Opaque, Layer 1)	Bad-2 Error	2.74	11
Multi-layer depth estimation	Multi-layered synthetic benchmark (Transmissive, Layer 1)	Bad-2 Error	5.47	10
Stereo Matching	Laboratory bilayer benchmark (no plate)	Bad-4 Error (Opaque)	1.35	9
Stereo Matching	Laboratory bilayer benchmark (with plate, 60% transmittance)	Bad-4 Error (Opaque)	1.27	9
Stereo Matching	Laboratory bilayer benchmark (with plate, 80% transmittance)	Bad-4 Error (Opaque)	1.15	9
Multi-layer depth estimation	LayeredFlow (val)	Layer 1 EPE	3.13	8
Stereo Depth Estimation	Booster (Opaque)	EPE	1.07	7
Stereo Depth Estimation	Middlebury (non-occluded)	EPE	0.67	7
Multi-layer depth estimation	Multi-layered synthetic benchmark (Transmissive, Layer 4)	Bad-2 Error	33.01	5
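The metrics above follow common stereo conventions: EPE (endpoint error) is the mean absolute disparity error in pixels, and Bad-τ is the percentage of pixels whose error exceeds τ pixels. Below is a minimal sketch of both, assuming flattened disparity lists, an optional validity mask, and a strict greater-than threshold. Each benchmark's official protocol (masks, resolution, threshold comparison) may differ, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def _valid_errors(
    pred: Sequence[float], gt: Sequence[float], valid: Optional[Sequence[bool]]
) -> List[float]:
    """Absolute per-pixel disparity errors, restricted to valid pixels."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if valid is None:
        valid = [True] * len(gt)
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    return errors


def epe(pred: Sequence[float], gt: Sequence[float],
        valid: Optional[Sequence[bool]] = None) -> float:
    """Endpoint error: mean absolute disparity error in pixels."""
    errors = _valid_errors(pred, gt, valid)
    return sum(errors) / len(errors)


def bad_tau(pred: Sequence[float], gt: Sequence[float], tau: float,
            valid: Optional[Sequence[bool]] = None) -> float:
    """Bad-tau: percentage of valid pixels with error strictly above tau."""
    errors = _valid_errors(pred, gt, valid)
    return 100.0 * sum(e > tau for e in errors) / len(errors)


pred = [1.0, 2.0, 5.0, 4.0]
gt = [1.5, 2.0, 2.0, 4.0]
print(epe(pred, gt))           # 0.875
print(bad_tau(pred, gt, 2.0))  # 25.0
```

Lower is better for both metrics. A single large outlier (here the 3-pixel error) moves EPE only modestly but counts fully toward Bad-τ, which is why the two are usually reported together.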
