
StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

About

This work addresses high-quality surface normal estimation from monocular color inputs (i.e., images and videos), a field recently revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, which conflicts with the deterministic nature of the Image2Normal task, and with a costly ensembling step that slows down estimation. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, producing "Stable-and-Sharp" normal estimates without any additional ensembling. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blur, and low image quality. It is also robust to transparent and reflective surfaces, as well as to cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy: a one-step normal estimator (YOSO) first derives an initial normal guess that is relatively coarse but reliable, and a semantic-guided refinement process (SG-DRN) then refines these normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScanNetV2, and NYUv2, as well as in downstream tasks such as surface reconstruction and normal enhancement. These results show that StableNormal retains both "stability" and "sharpness" for accurate normal estimation. StableNormal represents an early attempt to repurpose diffusion priors for deterministic estimation. To democratize this, code and models are publicly available at hf.co/Stable-X.
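To make the cost of ensembling concrete, the NumPy sketch below simulates it under a deliberately simple toy model: each stochastic inference pass is assumed to return the true normal map plus independent Gaussian noise, and the ensemble averages the per-pixel vectors and renormalizes them. The function names (`stochastic_pass`, `ensemble`), the noise level, and the sample count are illustrative assumptions, not StableNormal's procedure or that of any specific prior method. Averaging lowers the error, but only by paying for N forward passes, which is the overhead a low-variance estimator aims to avoid.

```python
import numpy as np

def normalize(v, eps=1e-8):
    """Scale each per-pixel vector (last axis) to unit length."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)

def mean_angular_error_deg(pred, gt):
    """Mean angle in degrees between predicted and ground-truth unit normals."""
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

def stochastic_pass(gt, rng, sigma=0.3):
    """Hypothetical stand-in for one stochastic diffusion inference pass:
    the true normals perturbed by i.i.d. Gaussian noise, then renormalized."""
    return normalize(gt + rng.normal(scale=sigma, size=gt.shape))

def ensemble(samples):
    """Naive ensembling (assumed here): average sampled maps, then renormalize."""
    return normalize(np.mean(samples, axis=0))

rng = np.random.default_rng(0)

# Toy ground truth: a 64x64 map of normals all facing the camera (+z).
gt = np.zeros((64, 64, 3))
gt[..., 2] = 1.0

single = stochastic_pass(gt, rng)                                   # 1 pass
samples = np.stack([stochastic_pass(gt, rng) for _ in range(10)])   # 10 passes
ensembled = ensemble(samples)

err_single = mean_angular_error_deg(single, gt)
err_ensemble = mean_angular_error_deg(ensembled, gt)
print(f"1 pass:    {err_single:.2f} deg")
print(f"10 passes: {err_ensemble:.2f} deg")
```

With the fixed seed, the 10-pass ensemble reports a lower mean angular error than the single pass, at ten times the inference cost.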

Chongjie Ye, Lingteng Qiu, Xiaodong Gu, Qi Zuo, Yushuang Wu, Zilong Dong, Liefeng Bo, Yuliang Xiu, Xiaoguang Han (2024)

Related benchmarks

Task | Dataset | Metric | Result | Rank
Surface Normal Prediction | NYU V2 | Mean Error (°) | 18.6 | 100
Surface Normal Estimation | NYU V2 | -- | -- | 23
Surface Normal Estimation | ScanNet Normal Benchmark (test) | Angle Error Threshold (11.25°) | 57.4 | 18
Specular Highlight Removal | PSD | PSNR (dB) | 23.384 | 15
Specular Highlight Removal | SHIQ | PSNR (dB) | 22.844 | 15
Transparent Object Normal Estimation | TransNormal Synthetic (test) | Mean Angular Error (°) | 7.6 | 13
Transparent Object Normal Estimation | ClearPose Real-World (test) | Mean Angular Error (°) | 37.1 | 13
Transparent Object Normal Estimation | ClearGrasp Synthetic (test) | Mean Angular Error (°) | 32 | 13
Video Surface Normal Estimation | Sintel | Mean Angular Error (°) | 36.7 | 12
Highlight Removal | HouseCat6D | LSR | 0.318 | 8

(10 of 27 rows shown.)
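The benchmarks above report angular metrics for normal estimation and PSNR for highlight removal. The NumPy sketch below shows how such metrics are conventionally computed: mean angular error in degrees, the percentage of pixels whose angular error falls below 11.25°, and PSNR in dB. It assumes normal maps of shape (H, W, 3), an optional boolean validity mask, and images scaled to [0, 1]; the function names are hypothetical, and each benchmark's exact protocol (masking, alignment, per-image versus per-pixel averaging) may differ, so numbers from this sketch are not directly comparable to the leaderboard values.

```python
import numpy as np

def angular_errors_deg(pred, gt, mask=None):
    """Per-pixel angle (degrees) between predicted and ground-truth normals.

    pred, gt: arrays of shape (H, W, 3); both are renormalized to unit length.
    mask: optional boolean array of shape (H, W) selecting valid pixels.
    """
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))
    return err[mask] if mask is not None else err.ravel()

def normal_metrics(pred, gt, mask=None, threshold_deg=11.25):
    """Mean angular error and percentage of pixels below the threshold."""
    err = angular_errors_deg(pred, gt, mask)
    return {
        "mean_deg": float(err.mean()),
        "pct_below_threshold": float(100.0 * np.mean(err < threshold_deg)),
    }

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

# Toy example: the top row of pixels is exact, the bottom row is tilted by 30 degrees.
gt = np.zeros((2, 2, 3))
gt[..., 2] = 1.0
pred = gt.copy()
t = np.radians(30.0)
pred[1] = [np.sin(t), 0.0, np.cos(t)]
print(normal_metrics(pred, gt))  # mean_deg ~ 15.0, pct_below_threshold = 50.0
```

Renormalizing inside `angular_errors_deg` keeps the metric robust to predictions that are not exactly unit length, and clipping the cosine avoids `arccos` returning NaN from tiny floating-point overshoots.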
