
Component-Aware Sketch-to-Image Generation Using Self-Attention Encoding and Coordinate-Preserving Fusion

About

Translating freehand sketches into photorealistic images remains a fundamental challenge in image synthesis, particularly due to the abstract, sparse, and stylistically diverse nature of sketches. Existing approaches, including GAN-based and diffusion-based models, often struggle to reconstruct fine-grained details, maintain spatial alignment, or adapt across different sketch domains. In this paper, we propose a component-aware, self-refining framework for sketch-to-image generation that addresses these challenges through a novel two-stage architecture. A Self-Attention-based Autoencoder Network (SA2N) first captures localised semantic and structural features from component-wise sketch regions, while a Coordinate-Preserving Gated Fusion (CGF) module integrates these into a coherent spatial layout. Finally, a Spatially Adaptive Refinement Revisor (SARR), built on a modified StyleGAN2 backbone, enhances realism and consistency through iterative refinement guided by spatial context. Extensive experiments across both facial (CelebAMask-HQ, CUFSF) and non-facial (Sketchy, ChairsV2, ShoesV2) datasets demonstrate the robustness and generalizability of our method. The proposed framework consistently outperforms state-of-the-art GAN and diffusion models, achieving significant gains in image fidelity, semantic accuracy, and perceptual quality. On CelebAMask-HQ, our model improves over prior methods by 21% (FID), 58% (IS), 41% (KID), and 20% (SSIM). These results, along with higher efficiency and visual coherence across diverse domains, position our approach as a strong candidate for applications in forensics, digital art restoration, and general sketch-based image synthesis.
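The abstract does not say how the CGF module is implemented, so the following is only a minimal sketch of the general idea of gated fusion with coordinate-preserving placement, not the paper's method. It assumes a component feature patch is blended back into a full-size canvas at its original (top, left) position, with a per-pixel sigmoid gate deciding how much of the patch replaces the canvas. The function name `gated_fuse`, its parameters, the 2D list representation, and the gate form are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(canvas, patch, top, left, gate_logits):
    """Blend `patch` into a copy of `canvas` at row `top`, column `left`.

    Hypothetical helper (an assumption, not the paper's CGF definition):
    each output pixel is gate * patch + (1 - gate) * canvas, where
    gate = sigmoid(gate_logits[i][j]). The patch keeps its original
    coordinates, so spatial layout is preserved.
    """
    fused = [row[:] for row in canvas]  # copy so the input canvas is untouched
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            gate = sigmoid(gate_logits[i][j])
            r, c = top + i, left + j
            fused[r][c] = gate * value + (1.0 - gate) * canvas[r][c]
    return fused


# Toy data: an 8x8 empty canvas and a 2x3 component patch of ones.
canvas = [[0.0] * 8 for _ in range(8)]
patch = [[1.0] * 3 for _ in range(2)]
logits = [[0.0] * 3 for _ in range(2)]  # logit 0 gives a gate of exactly 0.5

fused = gated_fuse(canvas, patch, 4, 2, logits)
```

With zero logits the gate is 0.5, so the six pixels covered by the patch become an equal mix of patch and canvas while every other pixel is left unchanged. In a learned model the logits would presumably be predicted from the features themselves; that part is outside this sketch.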

Ali Zia, Muhammad Umer Ramzan, Usman Ali, Muhammad Faheem, Abdelwahed Khamis, Shahnawaz Qureshi • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Sketch-to-Image Synthesis | CelebAMask-HQ | SSIM 77 | 8 |
| Sketch-to-Photo Generation | Chair V2 | -- | 8 |
| Sketch-to-Photo Generation | Shoe V2 | -- | 8 |
| Sketch-to-Image Synthesis | CUHK | FID 84.68 | 6 |
| Sketch-to-Image Synthesis | CUFSF | FID 78.48 | 6 |
| Sketch-to-Image Generation | CelebAMask-HQ | MOS 0.74 | 5 |
| Sketch-to-Image Generation | Sketchy | MOS 0.69 | 5 |
| Sketch-to-image translation | Sketchy Database 43 (test) | FID 131.7 | 5 |
| Sketch-to-image translation | QMUL Chairs 44, 45 V2 (test) | FID 77.9 | 5 |
| Sketch-to-image translation | QMUL Shoes V2 44, 45 (test) | FID 53.38 | 5 |
