UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement

About

Style transfer must match a target style while preserving content semantics. DiT-based diffusion models often suffer from content-style entanglement, leading to reference-content leakage and unstable generation. We present UniCSG, a unified framework for content-constrained, style-driven generation in both text-guided and reference-guided settings. UniCSG employs staged training: (i) a latent-space semantic disentanglement stage that combines low-frequency preprocessing with conditioning corruption to encourage content-style separation, and (ii) a latent-space frequency-aware detail reconstruction stage that refines details via multi-scale frequency supervision. We further incorporate pixel-space reward learning to align latent objectives with perceptual quality after decoding. Experiments demonstrate improved content faithfulness, style alignment, and robustness in both settings.

Jingwei Yang, Ruoxi Wu, Wei Shen, Meng Li, Yulong Liu, Huimin She, Lunxi Yuan • 2026
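
The abstract names two frequency-based ingredients: low-frequency preprocessing in the first stage and multi-scale frequency supervision in the second. The NumPy sketch below is only an illustration of what such operations could look like, not the authors' implementation. The function names, the box-shaped low-pass mask, the `keep_frac` cutoff, the 2x average pooling, and the L1 spectral distance are all assumptions made for this example; the abstract does not specify any of them.

```python
import numpy as np


def low_pass(x: np.ndarray, keep_frac: float = 0.25) -> np.ndarray:
    """Keep only the low spatial frequencies of a (C, H, W) array.

    Hypothetical stand-in for "low-frequency preprocessing": a centered
    box mask in the shifted 2-D spectrum. keep_frac is an assumed cutoff.
    """
    _, h, w = x.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    kh = max(1, int(h * keep_frac / 2))
    kw = max(1, int(w * keep_frac / 2))
    cy, cx = h // 2, w // 2
    mask = np.zeros((h, w), dtype=bool)
    mask[max(0, cy - kh): cy + kh + 1, max(0, cx - kw): cx + kw + 1] = True
    # Zero everything outside the mask, then invert the transform.
    spec = spec * mask
    out = np.fft.ifft2(np.fft.ifftshift(spec, axes=(-2, -1)), axes=(-2, -1))
    return out.real


def downsample2(x: np.ndarray) -> np.ndarray:
    """2x average pooling on a (C, H, W) array; assumes even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def multiscale_freq_loss(pred: np.ndarray, target: np.ndarray,
                         scales: int = 3) -> float:
    """Mean L1 distance between 2-D spectra, averaged over several scales.

    Hypothetical stand-in for "multi-scale frequency supervision".
    H and W must be divisible by 2 ** (scales - 1).
    """
    loss = 0.0
    for s in range(scales):
        fp = np.fft.fft2(pred, axes=(-2, -1), norm="ortho")
        ft = np.fft.fft2(target, axes=(-2, -1), norm="ortho")
        loss += float(np.mean(np.abs(fp - ft)))
        if s < scales - 1:
            pred, target = downsample2(pred), downsample2(target)
    return loss / scales


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((4, 16, 16))   # toy "latent" tensor
    smooth = low_pass(latent)                   # low-frequency version
    print(multiscale_freq_loss(latent, latent)) # identical inputs give 0.0
    print(multiscale_freq_loss(latent, smooth) > 0.0)
```

In this sketch the low-pass output discards fine detail, which is one plausible way to weaken style cues before a disentanglement stage, while the spectral loss penalizes mismatches at several resolutions. How UniCSG actually defines either operation is not stated in the abstract.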

Related benchmarks

Task | Dataset | Result | Rank
Style Transfer | CSG-Bench | FID 87.32 | 20
reference-guided style transfer | OmniConsistency-Bench | FID 88.428 | 20
Controllable Style Generation | CSG-Bench Text-guided | Content Preference Rate 29.6 | 9
Controllable Style Generation | CSG-Bench Reference-guided | Content Preference Rate 38.3 | 9