A Unified and Controllable Framework for Layered Image Generation with Visual Effects

About

Recent image generation models produce impressive composites but often fail to preserve the identity of user-provided content when editing specific elements: the surrounding scene may shift, and even the edited object's appearance can drift from the original. Layered representations offer a natural remedy, since they allow users to manipulate individual elements independently. However, existing layered methods typically produce transparent foregrounds without realistic visual effects such as shadows and reflections. This forces users to run a second harmonization model after every edit, which in turn reintroduces drift. To overcome these limitations, we present LASAGNA, which generates a photorealistic background (BG) and an RGBA foreground (FG) with compelling visual effects in a single forward pass. By treating object-associated visual effects as part of the FG layer, LASAGNA supports the dominant class of consumer edits (e.g., translation, scaling, recoloring, duplication) through alpha compositing alone, without invoking any model after the edit, thereby eliminating the identity drift introduced by cascaded editing pipelines. This single-pass design contrasts with prior layered methods, which rely on separate expert models for each task. LASAGNA handles diverse conditional inputs (text prompts, FG, BG, and location masks) within a unified architecture. We further release two community resources: LASAGNA-48K, the first public dataset of 48K layered image triplets with photorealistic visual effects, and LASAGNA-BENCH, the first standardized benchmark for layer-centric generation and editing, comprising 242 expert-annotated samples from six diverse sources. Experiments show that LASAGNA outperforms both general-purpose editors and prior layered methods across three generation modes and supports a wide range of post-edits without any model re-inference.
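The claim that common edits reduce to alpha compositing can be illustrated with the standard "over" operator. The following is a minimal sketch, not LASAGNA's code: it assumes straight (non-premultiplied) alpha, float images in [0, 1], an FG layer with the same height and width as the BG, and integer-pixel translation; the function name `composite_over` and the toy scene are hypothetical. Because the shadow is stored in the FG layer as partially transparent dark pixels, it moves and duplicates together with the object.

```python
import numpy as np


def composite_over(bg: np.ndarray, fg_rgba: np.ndarray, dx: int = 0, dy: int = 0) -> np.ndarray:
    """Composite a straight-alpha RGBA layer over an RGB background, shifted by (dx, dy).

    Hypothetical helper for illustration only.
    bg:      float array of shape (H, W, 3), values in [0, 1].
    fg_rgba: float array of shape (H, W, 4), same H and W as bg; channel 3 is alpha.
    dx, dy:  integer translation in pixels (positive = right / down); pixels
             pushed past the canvas border are discarded.
    """
    h, w, _ = bg.shape
    shifted = np.zeros_like(fg_rgba)  # start from a fully transparent layer
    # Source/destination offsets of the window that stays on-canvas after translation.
    ys, yd = max(0, -dy), max(0, dy)
    xs, xd = max(0, -dx), max(0, dx)
    hh, ww = h - abs(dy), w - abs(dx)
    if hh > 0 and ww > 0:
        shifted[yd:yd + hh, xd:xd + ww] = fg_rgba[ys:ys + hh, xs:xs + ww]
    alpha = shifted[..., 3:4]  # keep a trailing axis so alpha broadcasts over RGB
    # Standard "over" operator: C = a * C_fg + (1 - a) * C_bg.
    return alpha * shifted[..., :3] + (1.0 - alpha) * bg


# Toy 4x4 scene: white background, an opaque red "object" pixel,
# and a 50%-alpha black "shadow" pixel stored in the same FG layer.
bg = np.ones((4, 4, 3))
fg = np.zeros((4, 4, 4))
fg[0, 0] = [1.0, 0.0, 0.0, 1.0]  # object
fg[0, 1] = [0.0, 0.0, 0.0, 0.5]  # shadow, part of the FG layer

# Translation: object and shadow move together; no model is re-run.
moved = composite_over(bg, fg, dx=1, dy=2)
print(moved[2, 1], moved[2, 2])  # red object pixel, then 0.5-gray shadow pixel

# Duplication: composite the same layer twice at different offsets.
dup = composite_over(composite_over(bg, fg), fg, dx=2, dy=2)
```

Scaling would resample the RGBA layer (for example with an image library's resize function) before compositing; that step is omitted here to keep the sketch short.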

Jinrui Yang, Qing Liu, Yijun Li, Mengwei Ren, Letian Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | GenEval | Overall Score | 83 | 517 |
| Image Editing (Foreground Conditional) | ImgEdit-Bench Addition | Score | 3.86 | 11 |
| Image Editing (Background Conditional) | ImgEdit-Bench Background | Score | 3.32 | 11 |
| Background Generation | LASAGNA-BENCH | CFID | 14.1 | 4 |
| Foreground Generation | LASAGNA-BENCH | CFID | 9.7 | 4 |
| Text-to-All Generation | LASAGNA-BENCH | CFID | 16.9 | 4 |
| Complex Compositional Editing (Joint Recolor + Movement) | LASAGNA-BENCH | CLIP-FID | 6.4 | 3 |
| Recoloring | LASAGNA-BENCH | CLIP-FID | 8.3 | 3 |
| Spatial Editing (Movement) | LASAGNA-BENCH | CLIP-FID | 6.5 | 3 |
| Background Generation | ∞Bench | CLIP-FID (Compositional) | 21 | 2 |

(Showing 10 of 12 rows.)
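To help read the CLIP-FID rows, the following recalls the Fréchet distance behind FID-style metrics. It assumes CLIP-FID follows its common definition, i.e. FID computed on CLIP image embeddings instead of Inception features; this page does not define the metric, nor whether CFID denotes the same quantity. With μ_r, Σ_r and μ_g, Σ_g the mean and covariance of the embeddings of reference and generated images:

```latex
d_F^2 = \lVert \mu_r - \mu_g \rVert_2^2
      + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values mean the generated images' embedding statistics are closer to the reference set; identical Gaussian fits give 0.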
