Rectifying Latent Space for Generative Single-Image Reflection Removal
About
Single-image reflection removal is a highly ill-posed problem: existing methods struggle to reason about the composition of corrupted regions, causing them to fail at recovery and generalization in the wild. This work reframes an editing-purpose latent diffusion model to effectively perceive and process highly ambiguous, layered image inputs, yielding high-quality outputs. We argue that the difficulty of this conversion stems from a critical yet overlooked issue: the latent space of semantic encoders lacks the inherent structure to interpret a composite image as a linear superposition of its constituent layers. Our approach is built on three synergistic components: a reflection-equivariant VAE that aligns the latent space with the linear physics of reflection formation; a learnable task-specific text embedding that provides precise guidance while bypassing ambiguous natural language; and a depth-guided early-branching sampling strategy that harnesses generative stochasticity to select promising outputs. Extensive experiments show that our model achieves new state-of-the-art performance on multiple benchmarks and generalizes well to challenging real-world cases.
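The "reflection-equivariant" property described above can be made concrete as a penalty that encourages the encoder to map a linear blend of two layers to the same blend of their individual latents. The sketch below is illustrative only (not the paper's actual loss): `equivariance_loss` and the toy linear encoder are hypothetical names, and a real VAE encoder would be a nonlinear network trained with this term alongside its usual objectives.

```python
import numpy as np

def equivariance_loss(encode, t, r, alpha=0.7):
    """Hypothetical penalty: encode(alpha*t + (1-alpha)*r) should match
    alpha*encode(t) + (1-alpha)*encode(r), mirroring the linear physics
    of reflection formation (composite = blend of transmission t and
    reflection r)."""
    mixed_latent = encode(alpha * t + (1 - alpha) * r)
    blended_latent = alpha * encode(t) + (1 - alpha) * encode(r)
    return float(np.mean((mixed_latent - blended_latent) ** 2))

# Toy stand-in encoder: a linear map is trivially equivariant, so the
# loss is ~0 by construction; a semantic VAE encoder generally is not,
# which is the gap the training term would close.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
encode = lambda x: W @ x

t = rng.standard_normal(8)  # transmission layer (flattened toy "image")
r = rng.standard_normal(8)  # reflection layer
loss = equivariance_loss(encode, t, r)
print(loss)
```

For the linear toy encoder the printed loss is numerically zero (up to floating-point rounding); applied to a nonlinear encoder, the same term would push its latent space toward the superposition structure the paper argues is missing.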
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Single Image Reflection Removal | Real20 (test) | PSNR | 27.58 | 70 |
| Single Image Reflection Removal | SIR2 454 (test) | PSNR | 28.08 | 11 |
| Single Image Reflection Removal | Nature 20 (test) | PSNR | 27.34 | 11 |
| Single Image Reflection Removal | OpenRR (val) | Avg. Success Rate | 96.6 | 3 |
| Single Image Reflection Removal | Nature | Avg. Success Rate | 96 | 3 |
| Single Image Reflection Removal | Real20 | Avg. Success Rate | 91 | 3 |
| Single Image Reflection Removal | SIR2 | Avg. Success Rate | 78.5 | 3 |
| Single Image Reflection Removal | Public Benchmarks (test) | Success Rate | 90.5 | 3 |