
Multi-scale Attention-Guided Intrinsic Decomposition and Rendering Pass Prediction for Facial Images

About

Accurate intrinsic decomposition of face images under unconstrained lighting is a prerequisite for photorealistic relighting, high-fidelity digital doubles, and augmented-reality effects. This paper introduces MAGINet, a Multi-scale Attention-Guided Intrinsics Network that predicts a $512\times512$ light-normalized diffuse albedo map from a single RGB portrait. MAGINet employs hierarchical residual encoding, spatial-and-channel attention in a bottleneck, and adaptive multi-scale feature fusion in the decoder, yielding sharper albedo boundaries and stronger lighting invariance than prior U-Net variants. The initial albedo prediction is upsampled to $1024\times1024$ and refined by a lightweight three-layer CNN (RefinementNet). Conditioned on this refined albedo, a Pix2PixHD-based translator then predicts a comprehensive set of five additional physically based rendering passes: ambient occlusion, surface normal, specular reflectance, translucency, and raw diffuse colour (with residual lighting). Together with the refined albedo, these six passes form the complete intrinsic decomposition. Trained with a combination of masked-MSE, VGG, edge, and patch-LPIPS losses on the FFHQ-UV-Intrinsics dataset, the full pipeline achieves state-of-the-art performance for diffuse albedo estimation and demonstrates significantly improved fidelity for the complete rendering stack compared to prior methods. The resulting passes enable high-quality relighting and material editing of real faces.
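To make the training objective concrete, the masked-MSE term restricts the reconstruction error to foreground (face) pixels so that background content does not dilute the loss. A minimal NumPy sketch is below; the exact mask definition, per-term weighting, and normalization used in the paper are assumptions here.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Masked MSE over foreground pixels only.

    pred, target: (H, W, 3) float arrays in [0, 1]
    mask: (H, W) binary array, 1 inside the face region.
    Illustrative sketch of a masked-MSE term; the paper's exact
    mask source and weighting are not specified here.
    """
    mask = mask[..., None]                 # broadcast mask over colour channels
    err = ((pred - target) ** 2) * mask    # zero out background error
    # Normalize by the number of masked scalar values (avoid divide-by-zero)
    return err.sum() / np.maximum(mask.sum() * pred.shape[-1], 1)

# Toy example: only pixels inside the mask contribute to the loss.
pred = np.zeros((4, 4, 3))
target = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[:2, :] = 1                            # top half is "face"
print(masked_mse(pred, target, mask))      # 1.0: per-pixel squared error inside the mask
```

The full objective then sums this with the VGG, edge, and patch-LPIPS terms; their relative weights are hyperparameters of the training setup.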

Hossein Javidnia • 2025

Related benchmarks

Task                           Dataset                      Result             Rank
Diffuse Albedo Estimation      FFHQ-UV-Intrinsics           MSE 2.93           6
Intrinsic Rendering            FFHQ-UV-Intrinsics (test)    Average MSE 3.6    3
Ambient Occlusion Estimation   FFHQ-UV-Intrinsics (test)    MSE 2.61           2
