Zero-Shot Depth from Defocus

About

Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous work that overfits to a particular dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. We first propose ZEDD, a new real-world DfD benchmark that contains 8.3× more scenes and significantly higher-quality images and ground-truth depth maps than previous benchmarks. We also design FOSSA, a Transformer-based network architecture with novel designs tailored to the DfD task. Its key contribution is a stack attention layer with a focus distance embedding, which allows efficient information exchange across the focus stack. Finally, we develop a new training data pipeline that lets us use existing large-scale RGBD datasets to generate synthetic focus stacks. Experimental results on ZEDD and other benchmarks show a significant improvement over the baselines, reducing errors by up to 55.7%. The ZEDD benchmark is released at https://zedd.cs.princeton.edu. The code and checkpoints are released at https://github.com/princeton-vl/FOSSA.
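
The abstract mentions generating synthetic focus stacks from RGBD data but does not say how the blur is modeled. As a rough illustration only, the sketch below uses the standard thin-lens circle-of-confusion formula to compute, for each focus distance, how much blur each pixel of a depth map would receive. The function names (`coc_diameter`, `coc_stack`) and the default lens parameters are hypothetical and are not taken from the paper or the FOSSA code.

```python
def coc_diameter(depth_m, focus_m, focal_len_m=0.05, f_number=2.0):
    """Thin-lens circle-of-confusion diameter on the sensor, in metres.

    depth_m: distance of the scene point from the lens.
    focus_m: distance at which the lens is focused.
    The default lens parameters are illustrative assumptions.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if focus_m <= focal_len_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture = focal_len_m / f_number  # aperture diameter
    return (aperture * abs(depth_m - focus_m) / depth_m
            * focal_len_m / (focus_m - focal_len_m))


def coc_stack(depth_map, focus_distances, **lens):
    """One per-pixel blur map (nested lists) for each focus distance."""
    return [
        [[coc_diameter(d, s, **lens) for d in row] for row in depth_map]
        for s in focus_distances
    ]


if __name__ == "__main__":
    depth = [[1.0, 2.0], [4.0, 8.0]]  # toy 2x2 depth map in metres
    for s, blur in zip([1.0, 4.0], coc_stack(depth, [1.0, 4.0])):
        print(f"focus at {s} m:", blur)
```

A point exactly at the focus distance gets zero blur, and blur grows as the point moves away from the focal plane. A full renderer would additionally convert these diameters to pixels and apply a spatially varying blur kernel to the RGB image, which this sketch leaves out.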

Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng • 2026

Related benchmarks

Task	Dataset	Metric	Value	
Monocular Depth Estimation	DIODE	--	--	161
Monocular Depth Estimation	HAMMER	--	--	26
Depth Estimation	iBims	Abs Rel Error	7	21
Depth Estimation	ZEDD (test)	Delta Accuracy (Thresh=1.05)	50.5	10
Depth Estimation	Infinigen Defocus	Accuracy (delta 1.05)	52	10
Depth-from-Defocus	DDFF	MSE	2.80e-4	9
Depth-from-Defocus	DIODE	Delta 1.25 Accuracy	77.9	7
Depth-from-Defocus	HAMMER	Delta 1.25	99.9	7
Monocular Depth Estimation	iBims	--	--	4
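
For readers unfamiliar with the delta accuracy and Abs Rel Error metrics listed above, the sketch below shows their conventional definitions in depth estimation: delta accuracy is the share of valid pixels whose ratio max(pred/gt, gt/pred) falls below a threshold, and Abs Rel is the mean of |pred - gt| / gt. This page does not spell out its exact evaluation protocol, so the valid-pixel rule (ground truth > 0) and the percentage scaling are assumptions, and the function names are hypothetical.

```python
def delta_accuracy(pred, gt, thresh=1.05):
    """Percentage of valid pixels with max(pred/gt, gt/pred) < thresh.

    pred, gt: flat sequences of depths; pixels with gt <= 0 are skipped
    (an assumed convention for missing ground truth).
    """
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue
        valid += 1
        # A non-positive prediction can never satisfy the ratio test.
        ratio = max(p / g, g / p) if p > 0 else float("inf")
        if ratio < thresh:
            hits += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * hits / valid


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with gt > 0."""
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.3, 4.0]
    gt = [1.0, 2.05, 3.0, 0.0]  # last pixel has no ground truth
    print(delta_accuracy(pred, gt, thresh=1.05))
    print(delta_accuracy(pred, gt, thresh=1.25))
    print(abs_rel(pred, gt))
```

The 1.05 threshold is much stricter than the more common 1.25: a prediction must be within about 5% of the ground truth to count as correct.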
