Stronger Semantic Encoders Can Harm Relighting Performance: Probing Visual Priors via Augmented Latent Intrinsics
About
Image-to-image relighting requires representations that disentangle scene properties from illumination. Recent methods rely on latent intrinsic representations but remain under-constrained and often fail on challenging materials such as metal and glass. A natural hypothesis is that stronger pretrained visual priors should resolve these failures. We find the opposite: features from top-performing semantic encoders often degrade relighting quality, revealing a fundamental trade-off between semantic abstraction and photometric fidelity. We study this trade-off and introduce Augmented Latent Intrinsics (ALI), which balances semantic context and dense photometric structure by fusing features from a pixel-aligned visual encoder into a latent-intrinsic framework, together with a self-supervised refinement strategy that mitigates the scarcity of paired real-world data. Trained only on unlabeled real-world image pairs and equipped with a dense, pixel-aligned visual prior, ALI achieves strong improvements in relighting, with the largest gains on complex, specular materials. Project page: https://augmented-latent-intrinsics.github.io
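The fusion step described above can be pictured as injecting dense, pixel-aligned prior features into the latent-intrinsic bottleneck. Below is a minimal, hypothetical PyTorch sketch of one way such a fusion could look; the module name, gating design, and feature dimensions are illustrative assumptions, not the paper's released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentIntrinsicFusion(nn.Module):
    """Hypothetical sketch: fuse dense, pixel-aligned encoder features
    into a latent-intrinsic representation. All names and shapes here
    are illustrative assumptions, not the paper's code."""

    def __init__(self, intrinsic_dim=256, prior_dim=384):
        super().__init__()
        # Project the visual-prior features into the intrinsic channel space.
        self.proj = nn.Conv2d(prior_dim, intrinsic_dim, kernel_size=1)
        # Learned gate so the prior augments, rather than overwrites,
        # the photometric structure in the intrinsic features.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * intrinsic_dim, intrinsic_dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, intrinsic_feats, prior_feats):
        # intrinsic_feats: (B, C, H, W) latent intrinsics from the relighting encoder
        # prior_feats:     (B, D, h, w) dense features from a pixel-aligned encoder
        prior = self.proj(prior_feats)
        # Upsample the prior to the intrinsic spatial resolution.
        prior = F.interpolate(prior, size=intrinsic_feats.shape[-2:],
                              mode="bilinear", align_corners=False)
        g = self.gate(torch.cat([intrinsic_feats, prior], dim=1))
        return intrinsic_feats + g * prior

if __name__ == "__main__":
    fuse = LatentIntrinsicFusion(intrinsic_dim=256, prior_dim=384)
    intrinsics = torch.randn(1, 256, 32, 32)  # latent intrinsics
    prior = torch.randn(1, 384, 16, 16)       # dense encoder features
    print(fuse(intrinsics, prior).shape)      # torch.Size([1, 256, 32, 32])
```

The additive, gated design reflects the paper's stated goal of balancing semantic context against photometric fidelity: the gate can suppress the prior where it would wash out dense photometric detail.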
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-to-image relighting | MIIW cross-scene (test) | RMSE (raw) | 0.294 | 9 |
| Relighting | MIIW | PSNR | 18.872 | 6 |
| Image-to-image relighting | In-the-wild Stage-wise Study | Lighting Alignment | 75 | 4 |
| Image-to-image relighting | In-the-wild Comparison Study | Lighting Alignment | 0.931 | 3 |