AnyPhoto: Multi-Person Identity Preserving Image Generation with ID Adaptive Modulation on Location Canvas

About

Multi-person identity-preserving generation requires binding multiple reference faces to specified locations under a text prompt. Strong identity/layout conditions often trigger copy-paste shortcuts and weaken prompt-driven controllability. We present AnyPhoto, a diffusion-transformer finetuning framework with (i) a RoPE-aligned location canvas plus location-aligned token pruning for spatial grounding, (ii) AdaLN-style identity-adaptive modulation from face-recognition embeddings for persistent identity injection, and (iii) identity-isolated attention to prevent cross-identity interference. Training combines conditional flow matching with an embedding-space face similarity loss, together with reference-face replacement and location-canvas degradations to discourage shortcuts. On MultiID-Bench, AnyPhoto improves identity similarity while reducing copy-paste tendency, with gains increasing as the number of identities grows. AnyPhoto also supports prompt-driven stylization with accurate placement, which suggests practical application value.
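To make the idea of identity-isolated attention concrete, here is a minimal sketch of a boolean attention mask that keeps tokens of different identities from attending to each other. This is not the paper's implementation: the function name `identity_isolated_mask`, the token labeling scheme, and the choice to let shared tokens (text and image latents) attend to and be attended by every token are all assumptions made for illustration.

```python
from typing import List, Optional


def identity_isolated_mask(token_ids: List[Optional[int]]) -> List[List[bool]]:
    """Build a boolean attention mask (hypothetical helper, not from the paper).

    token_ids[i] is None for a shared token (e.g. text or image latent),
    or an integer identity index for an identity-specific token.
    mask[q][k] is True when query token q may attend to key token k.
    """
    n = len(token_ids)
    mask: List[List[bool]] = []
    for q in range(n):
        row: List[bool] = []
        for k in range(n):
            q_id, k_id = token_ids[q], token_ids[k]
            # Assumption: shared tokens are visible in both directions;
            # identity tokens only see shared tokens and their own identity.
            allowed = q_id is None or k_id is None or q_id == k_id
            row.append(allowed)
        mask.append(row)
    return mask


# Two shared tokens, two tokens for identity 0, one token for identity 1.
labels = [None, None, 0, 0, 1]
mask = identity_isolated_mask(labels)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

In this toy layout, the identity 0 tokens and the identity 1 token are blocked from each other while every token still sees the shared tokens. A mask of this shape could be passed to an attention layer that accepts a boolean mask, although how AnyPhoto actually wires the isolation into its diffusion transformer is not specified in the abstract.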

Longhui Yuan • 2026

Related benchmarks

Task                                  Dataset                          Result          Rank
Identity-preserving Image Generation  MultiID-Bench 1-people           Sim(GT) 0.448   18
Identity-preserving Image Generation  MultiID-Bench 2-people           Sim(GT) 0.401   10
Identity-preserving Image Generation  MultiID-Bench 3-and-4-people     Sim(GT) 0.424   10
