
Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

About

The layout-to-image (L2I) task enables fine-grained control over image generation via object categories and spatial layouts. However, existing L2I methods yield fragmented and distorted generations in few-shot atypical settings. We term this failure representation fragmentation; it arises from a granularity mismatch that entangles semantic identity with visual details. To address this issue, we propose a representation-driven framework that disentangles semantics from primitives for robust few-shot adaptation. Specifically, Semantic Anchoring aggregates categorical semantics into anchors for stable identity, while Primitive Imbuing models recomposable primitives for robust local detail modeling. Conceptual Steering further regulates optimization with a saliency-aware objective to preserve foreground semantic consistency. Extensive experiments demonstrate consistent improvements in the 5-shot regime over state-of-the-art L2I methods in both visual fidelity and alignment across diverse atypical domains. The source code is publicly available at https://github.com/iCVTEAM/DSP.
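To make the idea of a layout input and a saliency-aware objective concrete, here is a minimal sketch. It is not the authors' implementation: the `LayoutObject` type, the box-derived foreground mask, the weighted mean-squared error, and the `fg_weight` parameter are all assumptions chosen for illustration. It only shows how a loss can be weighted so that errors inside the layout boxes count more than background errors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayoutObject:
    """One layout entry (hypothetical type): a category plus a box."""
    category: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


def foreground_mask(layout: List[LayoutObject], height: int, width: int) -> List[List[float]]:
    """Rasterize the layout boxes into a binary mask (1.0 inside any box)."""
    mask = [[0.0] * width for _ in range(height)]
    for obj in layout:
        x0, y0, x1, y1 = obj.box
        for r in range(int(y0 * height), int(y1 * height)):
            for c in range(int(x0 * width), int(x1 * width)):
                mask[r][c] = 1.0
    return mask


def saliency_weighted_mse(pred, target, mask, fg_weight: float = 2.0) -> float:
    """Weighted MSE: foreground pixels get weight fg_weight, background gets 1.0.

    This is an assumed stand-in for a saliency-aware objective, not the
    paper's Conceptual Steering loss.
    """
    total = 0.0
    weight_sum = 0.0
    for r in range(len(pred)):
        for c in range(len(pred[0])):
            w = fg_weight if mask[r][c] > 0 else 1.0
            total += w * (pred[r][c] - target[r][c]) ** 2
            weight_sum += w
    return total / weight_sum


# Usage: one object covering the top-left quarter of a 4x4 grid.
layout = [LayoutObject("ship", (0.0, 0.0, 0.5, 0.5))]
mask = foreground_mask(layout, 4, 4)
pred = [[0.0] * 4 for _ in range(4)]
target = [row[:] for row in mask]  # error only inside the box
loss = saliency_weighted_mse(pred, target, mask)
```

With `fg_weight` above 1.0, a mistake inside an object box contributes more to the loss than the same mistake in the background, which is the general intuition behind preserving foreground consistency.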

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li • 2026

Related benchmarks

Task                         Dataset        Metric               Result   Rank
Layout-to-Image Generation   DIOR (test)    FID                  74.34    4
Layout-to-Image Generation   RUOD (test)    FID (Bootstrapped)   45.44    4
Layout-to-Image Generation   ExDark (test)  FID (Bootstrap)      91.36    4
Layout-to-Image Generation   DIOR           mAP                  26.06    4
Layout-to-Image Generation   DIOR 5-shot    FID (Bootstrapped)   74.34    4
