
Controllable Layered Image Generation for Real-World Editing

About

Recent image generation models have made impressive progress, yet they often fail to produce controllable and consistent results when users try to edit specific elements of an existing image. Layered representations enable flexible, user-driven content creation, but existing approaches often fail to produce layers with coherent compositing relationships, and their object layers typically lack realistic visual effects such as shadows and reflections. To overcome these limitations, we propose LASAGNA, a unified framework that generates an image jointly with its constituent layers: a photorealistic background and a high-quality transparent foreground with convincing visual effects. Unlike prior work, LASAGNA efficiently learns correct image composition from a wide range of conditioning inputs (text prompts, foreground, background, and location masks), offering greater controllability for real-world applications. To enable this, we introduce LASAGNA-48K, a new dataset of clean backgrounds and RGBA foregrounds with physically grounded visual effects. We also propose LASAGNABENCH, the first benchmark for layer editing. We show that LASAGNA generates highly consistent and coherent results across multiple image layers simultaneously, enabling diverse post-editing applications that accurately preserve identity and visual effects. LASAGNA-48K and LASAGNABENCH will be publicly released to foster open research in the community. The project page is https://rayjryang.github.io/LASAGNA-Page/.
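To make the idea of "compositing relationships" between a background layer and a transparent foreground layer concrete, here is a minimal sketch of the standard Porter-Duff "over" operator with straight (non-premultiplied) alpha. The abstract does not say how LASAGNA composites its layers. The helpers `over` and `composite` are hypothetical illustrations of the general technique, not the authors' code. Channels are assumed to be floats in [0, 1], and both layers are assumed to have the same size.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


def over(fg: RGBA, bg: RGB) -> RGB:
    """Hypothetical helper: straight-alpha 'over' for one pixel.

    out = alpha * fg + (1 - alpha) * bg, applied per color channel.
    """
    r, g, b, a = fg
    return (
        a * r + (1 - a) * bg[0],
        a * g + (1 - a) * bg[1],
        a * b + (1 - a) * bg[2],
    )


def composite(fg_layer: List[List[RGBA]], bg_layer: List[List[RGB]]) -> List[List[RGB]]:
    """Hypothetical helper: composite an RGBA foreground over an RGB background.

    Assumes both layers are given as rows of pixels with identical height and width.
    """
    if len(fg_layer) != len(bg_layer) or any(
        len(f_row) != len(b_row) for f_row, b_row in zip(fg_layer, bg_layer)
    ):
        raise ValueError("layers must have the same height and width")
    return [
        [over(f, b) for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(fg_layer, bg_layer)
    ]


# A 1x3 blue background with three foreground pixels: an opaque red object,
# a half-transparent black "shadow", and a fully transparent pixel.
background = [[(0.0, 0.0, 1.0)] * 3]
foreground = [[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.5), (0.0, 0.0, 0.0, 0.0)]]
print(composite(foreground, background))
# [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.0)]]
```

The middle pixel shows why foreground layers need effects such as shadows. A semi-transparent dark pixel that is stored in the RGBA foreground darkens the background only after compositing. As a result, the effect stays attached to the object when the object is moved or recolored.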

Jinrui Yang, Qing Liu, Yijun Li, Mengwei Ren, Letian Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 83 | 467 |
| Image Editing (Foreground Conditional) | ImgEdit-Bench Addition | Score | 3.86 | 11 |
| Image Editing (Background Conditional) | ImgEdit-Bench Background | Score | 3.32 | 11 |
| Background Generation | LASAGNABENCH | CFID | 14.1 | 4 |
| Foreground Generation | LASAGNABENCH | CFID | 9.7 | 4 |
| Text-to-All Generation | LASAGNABENCH | CFID | 16.9 | 4 |
| Complex Compositional Editing (Joint Recolor + Movement) | LASAGNABENCH | CLIP-FID | 6.4 | 3 |
| Recoloring | LASAGNABENCH | CLIP-FID | 8.3 | 3 |
| Spatial Editing (Movement) | LASAGNABENCH | CLIP-FID | 6.5 | 3 |
| Background Generation | ∞Bench | CLIP-FID (Compositional) | 21 | 2 |

(10 of 12 rows shown.)
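Several rows report CLIP-FID. In common usage, CLIP-FID is the Fréchet Inception Distance (FID) computed on CLIP image embeddings instead of Inception-v3 features. A Gaussian with mean $\mu_r$ and covariance $\Sigma_r$ is fitted to the embeddings of reference images, and a second Gaussian with $\mu_g$, $\Sigma_g$ is fitted to the embeddings of generated images. Lower values indicate closer distributions. The page does not define CFID or the exact feature extractor used by LASAGNABENCH, so treat the following as the standard definition, not a confirmed description of the benchmark's protocol:

```latex
\mathrm{FD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

As a sanity check, with one-dimensional features ($\Sigma_r = \sigma_r^2$, $\Sigma_g = \sigma_g^2$) the trace term reduces to $\sigma_r^2 + \sigma_g^2 - 2\sigma_r\sigma_g = (\sigma_r - \sigma_g)^2$. The distance is then $(\mu_r - \mu_g)^2 + (\sigma_r - \sigma_g)^2$, which is zero only when both the means and the spreads match.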
