A Unified and Controllable Framework for Layered Image Generation with Visual Effects

About

Recent image generation models produce impressive composites, but often fail to preserve the identity of user-provided content when editing specific elements: the surrounding scene may shift, and even the edited object's appearance can drift from the original. Layered representations offer a natural remedy--they allow users to independently manipulate individual elements--but existing layered methods typically produce transparent foregrounds without realistic visual effects such as shadows and reflections, forcing the use of a second harmonization model after every edit, which in turn introduces drift. To overcome these limitations, we present LASAGNA, which generates a photorealistic background (BG) and an RGBA foreground (FG) with compelling visual effects in a single forward pass. By treating object-associated visual effects as part of the FG layer, LASAGNA supports the dominant class of consumer edits (e.g., translation, scaling, recoloring, duplication) via alpha compositing alone, without invoking any model post-edit, thereby eliminating the identity drift introduced by cascaded editing pipelines. This single-pass design contrasts with prior layered methods that rely on separate expert models for each task. LASAGNA handles diverse conditional inputs--text prompts, FG, BG, and location masks--within a unified architecture. We further release two community resources: LASAGNA-48K, the first public dataset of 48K layered image triplets with photorealistic visual effects, and LASAGNA-BENCH, the first standardized benchmark for layer-centric generation and editing, comprising 242 expert-annotated samples across six diverse sources. Experiments show that LASAGNA outperforms both general-purpose editors and prior layered methods across three generation modes, and supports a wide range of post-edits without any model re-inference.

Jinrui Yang, Qing Liu, Yijun Li, Mengwei Ren, Letian Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou • 2026
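
The abstract's claim that edits such as translation and duplication need only alpha compositing can be illustrated with a minimal sketch. This is not the authors' code: the function name `composite`, the float arrays in [0, 1], the straight (non-premultiplied) alpha, and the equal FG/BG sizes are all assumptions made for illustration. The sketch shifts an RGBA foreground by a pixel offset and blends it onto an RGB background with the standard "over" operator.

```python
import numpy as np

def composite(bg, fg_rgba, dx=0, dy=0):
    """Shift an RGBA foreground by (dx, dy) pixels and blend it over an RGB background.

    Assumptions (illustrative, not from the paper): both arrays are floats in
    [0, 1], share the same height and width, and use straight alpha.
    """
    h, w, _ = bg.shape
    shifted = np.zeros_like(fg_rgba)  # fully transparent canvas

    # Source region of the foreground that stays inside the frame after the shift.
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    if y1 > y0 and x1 > x0:
        shifted[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = fg_rgba[y0:y1, x0:x1]

    # "Over" operator: out = alpha * FG + (1 - alpha) * BG.
    alpha = shifted[..., 3:4]
    return alpha * shifted[..., :3] + (1.0 - alpha) * bg
```

Under these assumptions, duplication amounts to compositing the same foreground twice at different offsets, for example `composite(composite(bg, fg, 0, 0), fg, 40, 0)`. Because any shadow or reflection is stored in the foreground's RGBA pixels, it moves with the object and no model call is involved.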

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	GenEval	Overall Score: 83	581
Image Editing (Foreground Conditional)	ImgEdit-Bench Addition	Score: 3.86	11
Image Editing (Background Conditional)	ImgEdit-Bench Background	Score: 3.32	11
Background Generation	LASAGNA-BENCH	CFID: 14.1	4
Foreground Generation	LASAGNA-BENCH	CFID: 9.7	4
Text-to-All Generation	LASAGNA-BENCH	CFID: 16.9	4
Complex Compositional Editing (Joint Recolor + Movement)	LASAGNA-BENCH	CLIP-FID: 6.4	3
Recoloring	LASAGNA-BENCH	CLIP-FID: 8.3	3
Spatial Editing (Movement)	LASAGNA-BENCH	CLIP-FID: 6.5	3
Background Generation	∞Bench	CLIP-FID (Compositional): 21	2

Showing 10 of 12 rows
