Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers

About

This paper presents Pixel-Perfect Depth, a monocular depth estimation model based on pixel-space diffusion generation that produces high-quality, flying-pixel-free point clouds from estimated depth maps. Current generative depth estimation models fine-tune Stable Diffusion and achieve impressive performance. However, they require a VAE to compress depth maps into latent space, which inevitably introduces flying pixels at edges and details. Our model addresses this challenge by performing diffusion generation directly in pixel space, avoiding VAE-induced artifacts. To overcome the high complexity of pixel-space generation, we introduce two novel designs: 1) Semantics-Prompted Diffusion Transformers (SP-DiT), which incorporate semantic representations from vision foundation models into DiT to prompt the diffusion process, preserving global semantic consistency while enhancing fine-grained visual details; and 2) a Cascade DiT Design that progressively increases the number of tokens to further improve efficiency and accuracy. Our model achieves the best performance among all published generative models across five benchmarks and significantly outperforms all other models in edge-aware point cloud evaluation.
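To make the flying-pixel problem concrete, here is a toy illustration (our own sketch, not code from the paper). A single image row is back-projected with a pinhole model, x = (u - cx) * z / fx, where `fx` and `cx` are hypothetical intrinsics chosen for the example. Blending the depth of one edge pixel, as a lossy latent encode/decode might, produces a 3-D point that lies on neither the foreground nor the background surface.

```python
# Toy illustration only: not the authors' implementation.
# fx and cx are hypothetical pinhole intrinsics for a single image row.

def back_project_row(depths, fx=500.0, cx=0.0):
    """Return (x, z) points for one image row; u is the pixel column."""
    return [((u - cx) * z / fx, z) for u, z in enumerate(depths)]

def flying_points(depths, surfaces=(1.0, 5.0), tol=0.1):
    """Columns whose depth is far from every true surface depth."""
    return [u for u, z in enumerate(depths)
            if all(abs(z - s) > tol for s in surfaces)]

# Sharp depth map: foreground at 1 m (columns 0-1), background at 5 m (columns 2-4).
sharp = [1.0, 1.0, 5.0, 5.0, 5.0]

# Smoothed depth map: the edge pixel is blended (assumed to mimic a VAE
# round-trip); 3.0 m lies on neither surface.
smoothed = [1.0, 1.0, 3.0, 5.0, 5.0]

print(flying_points(sharp))        # []
print(flying_points(smoothed))     # [2]
print(back_project_row(smoothed)[2])  # the stray point between the two surfaces
```

In a full depth map, every such blended edge pixel becomes a point floating in empty space, which is what edge-aware point cloud evaluation penalizes.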

Gangwei Xu, Haotong Lin, Hongcheng Luo, Xianqi Wang, Jingfeng Yao, Lianghui Zhu, Yuechuan Pu, Cheng Chi, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Sida Peng, Xin Yang • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Video Depth Estimation	Sintel	δ1 (δ < 1.25)	39.8	235
Monocular Depth Estimation	NYU V2	δ1 (δ < 1.25)	97.7	192
Monocular Depth Estimation	ETH3D	AbsRel	4.02	173
Monocular Depth Estimation	DIODE	AbsRel	4.97	161
3D Reconstruction	7 Scenes	--	--	161
Video Depth Estimation	KITTI	AbsRel	0.221	153
Video Depth Estimation	BONN	AbsRel	31.5	139
Depth Estimation	ScanNet	--	--	133
Monocular Depth Estimation	ScanNet	AbsRel	4.04	111
Depth Estimation	DIODE	δ1 (δ < 1.25)	96.2	92

Only 10 of the 48 benchmark rows are listed here.
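The two metrics in the table are standard in depth estimation: AbsRel, the mean absolute relative error, and δ1 (listed on the source page as "Delta 1" or delta threshold accuracy at 1.25), the fraction of pixels whose prediction is within a factor of 1.25 of ground truth. The sketch below computes both in pure Python and prints them in percent, a common reporting convention. It is a minimal illustration, not the benchmarks' official evaluation code. Protocols differ in details such as scale/shift alignment and valid-pixel masks; here we assume only that non-positive depths are invalid.

```python
# Minimal sketch of AbsRel and delta-1; real benchmark protocols add
# alignment and dataset-specific masks that are omitted here.

def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|d - d*| / d*) over valid pixels."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]  # skip invalid ground truth
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)

def delta1(pred, gt, thresh=1.25):
    """Fraction of valid pixels with max(d / d*, d* / d) < thresh."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    ok = sum(1 for p, g in pairs if max(p / g, g / p) < thresh)
    return ok / len(pairs)

gt = [1.0, 2.0, 4.0, 8.0]
pred = [1.1, 2.0, 3.0, 8.0]
print(round(100 * abs_rel(pred, gt), 2))  # 8.75  (AbsRel, %)
print(100 * delta1(pred, gt))             # 75.0  (delta-1, %)
```

Lower AbsRel is better, while higher δ1 is better, which is why the table reports one or the other depending on the benchmark's leaderboard convention.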
