DepthFocus: Controllable Depth Estimation for See-Through Scenes

About

Depth in the real world is rarely singular. Transmissive materials create layered ambiguities that confound conventional perception systems. Existing models remain passive: conventional approaches typically estimate a static depth map anchored to the nearest surface, and even recent multi-head extensions suffer from a representational bottleneck because their feature representations are fixed. This stands in contrast to human vision, which actively shifts focus to perceive a desired depth. We introduce DepthFocus, a steerable Vision Transformer that redefines stereo depth estimation as condition-aware control. Instead of extracting fixed features, our model dynamically modulates its computation based on a physical reference depth, integrating dual conditional mechanisms to selectively perceive geometry aligned with the desired focus. Leveraging a newly curated large-scale synthetic dataset, DepthFocus achieves state-of-the-art results across all evaluated benchmarks, including both standard single-layer and complex multi-layered scenarios. While maintaining high precision in opaque regions, our approach resolves depth ambiguities in transparent and reflective scenes by selectively reconstructing geometry at a target distance. This capability enables robust, intent-driven perception that significantly outperforms existing multi-layer methods, marking a substantial step toward active 3D perception. Project page: https://junhong-3dv.github.io/depthfocus-project/

Junhong Min, Jimin Kim, Minwook Kim, Cheol-Hui Min, Youngpil Jeon, Minyong Choi • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Stereo Depth Estimation | Booster All type | EPE 1.56 | 14 |
| Multi-layer depth estimation | Multi-layered synthetic benchmark Opaque Layer 1 | Bad-2 Error 2.74 | 11 |
| Multi-layer depth estimation | Multi-layered synthetic benchmark Transmissive Layer 1 | Bad-2 5.47 | 10 |
| Stereo Matching | Laboratory bilayer benchmark No plate | Bad-4 (Opaque) 1.35 | 9 |
| Stereo Matching | Laboratory bilayer benchmark With plate 60% transmittance | Bad-4 Error (Opaque) 1.27 | 9 |
| Stereo Matching | Laboratory bilayer benchmark With plate 80% transmittance | Bad-4 Error (Opaque) 1.15 | 9 |
| Multi-layer depth estimation | LayeredFlow (val) | Layer 1 EPE 3.13 | 8 |
| Stereo Depth Estimation | Booster (Opaque) | EPE 1.07 | 7 |
| Stereo Depth Estimation | Middlebury (Non Occlusion) | EPE (Endpoint Error) 0.67 | 7 |
| Multi-layer depth estimation | Multi-layered synthetic benchmark Transmissive Layer 4 | Bad-2 33.01 | 5 |
Showing 10 of 12 rows
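
The results above are reported as EPE and Bad-k values, which this page does not define. As a rough guide, the sketch below uses the conventions common in stereo benchmarks: EPE as the mean absolute error over evaluated pixels, and Bad-k as the percentage of evaluated pixels whose absolute error exceeds k. The function names, the flat-list inputs, and the `valid` mask (standing in for subsets such as "Opaque" or "Non Occlusion") are illustrative assumptions, not the evaluation code used for these benchmarks.

```python
from typing import List, Optional, Sequence


def _valid_errors(
    pred: Sequence[float],
    gt: Sequence[float],
    valid: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Absolute errors for pixels marked valid (all pixels if no mask is given)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if valid is None:
        valid = [True] * len(gt)
    elif len(valid) != len(gt):
        raise ValueError("valid mask must match the length of gt")
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    return errors


def epe(pred, gt, valid=None) -> float:
    """End-point error: mean absolute error over the evaluated pixels."""
    errors = _valid_errors(pred, gt, valid)
    return sum(errors) / len(errors)


def bad_k(pred, gt, k: float, valid=None) -> float:
    """Bad-k: percentage of evaluated pixels with absolute error above k."""
    errors = _valid_errors(pred, gt, valid)
    return 100.0 * sum(e > k for e in errors) / len(errors)


# Toy example with four pixels (made-up values, not benchmark data).
pred = [10.0, 12.5, 30.0, 7.0]
gt = [10.5, 12.0, 25.0, 7.0]

print(epe(pred, gt))           # 1.5
print(bad_k(pred, gt, k=2.0))  # 25.0
```

Under these definitions lower is better for both metrics. Passing a mask restricts the evaluation to a subset of pixels, which is how region-specific scores such as the "(Opaque)" rows would typically be computed.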
