SIRR-LMM: Single-image Reflection Removal via Large Multimodal Model
About
Glass surfaces create complex interactions of reflected and transmitted light, making single-image reflection removal (SIRR) challenging. Existing datasets suffer from limited physical realism in synthetic data or insufficient scale in real captures. We introduce a synthetic dataset generation framework that path-traces 3D glass models over real background imagery to create physically accurate reflection scenarios with varied glass properties, camera settings, and post-processing effects. To leverage the capabilities of a Large Multimodal Model (LMM), we concatenate the image layers into a single composite input, apply joint captioning, and fine-tune the model using task-specific LoRA rather than full-parameter training. This enables our approach to achieve improved reflection removal and separation performance compared to state-of-the-art methods.
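
To make the contrast between LoRA and full-parameter training concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. It is not the authors' implementation: the layer sizes, rank, `alpha`, and the helper name `lora_forward` are all hypothetical choices for illustration. The frozen weight `W` is left untouched, and only the low-rank factors `A` and `B` would be trained.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha):
    """Apply a linear layer with a low-rank update: y = x W^T + (alpha / r) x A^T B^T.

    W (d_out x d_in) is the frozen pretrained weight.
    A (r x d_in) and B (d_out x r) are the only trainable parameters.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4                  # hypothetical sizes, chosen for illustration
W = rng.standard_normal((d_out, d_in))      # frozen weight
A = rng.standard_normal((r, d_in)) * 0.01   # small random init
B = np.zeros((d_out, r))                    # zero init: the adapter starts as a no-op
x = rng.standard_normal((2, d_in))          # a batch of two input vectors

full_params = W.size                        # parameters updated by full fine-tuning
lora_params = A.size + B.size               # parameters updated by LoRA
print(f"full: {full_params}, LoRA: {lora_params}")
```

With `B` initialised to zero the adapted layer reproduces the pretrained layer exactly, so training starts from the original model's behaviour and only the small factors move.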
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Single Image Reflection Removal | Real 20 55 (test) | PSNR | 24.89 | 7 |
| Single Image Reflection Removal | SIR^2 141 36 (test) | PSNR (Postcard) | 26.2 | 7 |
| Single Image Reflection Removal | Nature 27 (test) | PSNR | 25.14 | 7 |
| Single Image Reflection Removal | ReaL | Win Rate | 57.32 | 4 |
| Single Image Reflection Removal | Nature | Win Rate | 35 | 4 |
| Single Image Reflection Removal | Postcard | Win Rate | 67.03 | 4 |
| Single Image Reflection Removal | SolidObject | Win Rate | 41.43 | 4 |
| Single Image Reflection Removal | Wildscene | Win Rate | 48.57 | 4 |
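
For readers unfamiliar with the PSNR metric reported in the table, the sketch below computes the standard peak signal-to-noise ratio between a predicted transmission layer and its ground truth. It assumes 8-bit images (peak value 255) and a plain per-pixel mean squared error; the benchmarks' exact evaluation protocol (colour space, cropping, rounding) is not stated here and may differ. The function name `psnr` is an illustrative choice.

```python
import numpy as np


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


# Toy example: every pixel of the prediction is off by exactly one intensity level.
target = np.full((4, 4, 3), 100, dtype=np.uint8)
pred = np.full((4, 4, 3), 101, dtype=np.uint8)
score = psnr(pred, target)
```

Higher values indicate a prediction closer to the ground truth, and identical images give an infinite score.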