TP-Blend: Textual-Prompt Attention Pairing for Precise Object-Style Blending in Diffusion Models

About

Current text-conditioned diffusion editors handle single object replacement well but struggle when a new object and a new style must be introduced simultaneously. We present Twin-Prompt Attention Blend (TP-Blend), a lightweight training-free framework that receives two separate textual prompts, one specifying a blend object and the other defining a target style, and injects both into a single denoising trajectory. TP-Blend is driven by two complementary attention processors. Cross-Attention Object Fusion (CAOF) first averages head-wise attention to locate spatial tokens that respond strongly to either prompt, then solves an entropy-regularised optimal transport problem that reassigns complete multi-head feature vectors to those positions. CAOF updates feature vectors at the full combined dimensionality of all heads (e.g., 640 dimensions in SD-XL), preserving rich cross-head correlations while keeping memory low. Self-Attention Style Fusion (SASF) injects style at every self-attention layer through Detail-Sensitive Instance Normalization. A lightweight one-dimensional Gaussian filter separates low- and high-frequency components; only the high-frequency residual is blended back, imprinting brush-stroke-level texture without disrupting global geometry. SASF further swaps the Key and Value matrices with those derived from the style prompt, enforcing context-aware texture modulation that remains independent of object fusion. Extensive experiments show that TP-Blend produces high-resolution, photo-realistic edits with precise control over both content and appearance, surpassing recent baselines in quantitative fidelity, perceptual quality, and inference speed.
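The two numerical ingredients named in the abstract, entropy-regularised optimal transport (used by CAOF) and a one-dimensional Gaussian low/high-frequency split (used by SASF), can be illustrated with a small dependency-free sketch. This is not the authors' implementation. The function names, the cost matrix, the marginals, and the `eps`, `sigma`, `radius`, and `strength` parameters are all hypothetical choices made for illustration. The Sinkhorn iteration is a standard solver for entropic optimal transport, and the abstract does not say which solver TP-Blend uses. The instance-normalisation step of Detail-Sensitive Instance Normalization is omitted.

```python
import math


def sinkhorn_plan(cost, a, b, eps=0.5, iters=200):
    """Entropy-regularised OT via Sinkhorn scaling (illustrative only).

    cost: matrix as a list of rows; a, b: source and target marginals.
    Returns a transport plan whose rows sum to roughly a and columns to b.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: a smaller eps gives a sharper, less smoothed plan.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def gaussian_kernel_1d(sigma, radius):
    """Normalised 1-D Gaussian weights over offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def low_pass(signal, kernel):
    """Blur a 1-D signal, replicating edge values at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    blurred = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        blurred.append(acc)
    return blurred


def blend_high_frequency(content, style, sigma=1.0, radius=2, strength=1.0):
    """Add only the high-frequency residual of `style` onto `content`."""
    style_low = low_pass(style, gaussian_kernel_1d(sigma, radius))
    style_high = [s - lo for s, lo in zip(style, style_low)]
    return [c + strength * h for c, h in zip(content, style_high)]
```

With a flat style signal the residual vanishes and the content passes through unchanged, which mirrors the abstract's point that only fine texture, not global structure, is transferred. In the transport sketch, mass concentrates on low-cost pairs while the plan still respects both marginals.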

Xin Jin, Yichuan Zhong, Yapeng Tian • 2026

Related benchmarks

Task | Dataset | Result | Rank
Object Replacement and Style Blending | Object Replacement and Style Blending (800 pairs) (test) | BOSM 0.8656 | 11
Object Replacement and Object Blending | Unsplash 4,000 samples (test) | BOM 0.8031 | 10