
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance

About

Diffusion models have achieved remarkable success in image generation, yet their training is predominantly driven by full-reference objectives that enforce pixel-wise similarity to ground-truth images. Such supervision, while effective for fidelity, may be insufficient for subjective perceptual quality and text-image semantic consistency. In this work, we investigate the problem of incorporating no-reference perceptual quality into diffusion training. A key challenge is that directly optimizing perceptual signals, such as those provided by no-reference image quality assessment (NR-IQA) models, introduces a mismatch with the original diffusion objective, leading to training instability and distributional drift during fine-tuning. To address this issue, we propose an anchor-constrained optimization framework that enables stable perceptual adaptation. Specifically, we use a learned NR-IQA model as a perceptual guidance signal and introduce an anchor-based regularization that enforces consistency with the base diffusion model in terms of noise prediction. This design balances perceptual quality improvement against generative fidelity, allowing controlled adaptation toward perceptually favorable outputs without compromising the original generative behavior. Extensive experiments demonstrate that our method consistently enhances perceptual quality while preserving generation diversity and training stability, highlighting the effectiveness of anchor-constrained perceptual optimization for diffusion models.
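The abstract describes two ingredients, a perceptual guidance term from an NR-IQA model and an anchor term that keeps the fine-tuned model's noise prediction close to the base model's. The sketch below shows one way such a combined objective could look. The abstract does not state the actual loss, so the additive form, the squared-error anchor penalty, the function name `anchor_constrained_loss`, and the `anchor_weight` coefficient are all illustrative assumptions, and the arrays stand in for real model outputs.

```python
import numpy as np


def anchor_constrained_loss(eps_finetuned, eps_anchor, quality_scores, anchor_weight=1.0):
    """Toy combined objective (hypothetical form, not taken from the paper).

    eps_finetuned : noise predicted by the model being fine-tuned, shape (B, ...)
    eps_anchor    : noise predicted by the frozen base model on the same inputs
    quality_scores: NR-IQA scores of the fine-tuned model's outputs, shape (B,)
                    (assumed convention: higher means better quality)
    anchor_weight : assumed trade-off coefficient between the two terms
    """
    eps_finetuned = np.asarray(eps_finetuned, dtype=np.float64)
    eps_anchor = np.asarray(eps_anchor, dtype=np.float64)
    quality_scores = np.asarray(quality_scores, dtype=np.float64)

    # Perceptual term: minimizing the negative score pushes predicted quality up.
    perceptual_term = -quality_scores.mean()
    # Anchor term: mean squared gap between fine-tuned and base noise predictions.
    anchor_term = np.mean((eps_finetuned - eps_anchor) ** 2)
    total = perceptual_term + anchor_weight * anchor_term
    return total, perceptual_term, anchor_term


# Placeholder data: random arrays instead of real diffusion-model outputs.
rng = np.random.default_rng(0)
eps_anchor = rng.standard_normal((4, 3, 8, 8))
eps_close = eps_anchor + 0.01 * rng.standard_normal(eps_anchor.shape)
eps_drift = eps_anchor + 1.0 * rng.standard_normal(eps_anchor.shape)
scores = np.array([0.60, 0.70, 0.65, 0.70])

loss_close, _, anchor_close = anchor_constrained_loss(eps_close, eps_anchor, scores)
loss_drift, _, anchor_drift = anchor_constrained_loss(eps_drift, eps_anchor, scores)
print(f"close to anchor: loss={loss_close:.4f}, anchor term={anchor_close:.4f}")
print(f"drifted:         loss={loss_drift:.4f}, anchor term={anchor_drift:.4f}")
```

With identical quality scores, the prediction that drifts further from the anchor incurs the larger loss, which is the stabilizing behavior the abstract attributes to the anchor constraint. In an actual training loop the scores would come from a differentiable NR-IQA model applied to the generated images, which this toy example does not model.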

Yang Yang, Feifan Meng, Han Fang, Weiming Zhang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Generation | LSUN Church 256x256 | -- | 10 |
| Text-to-Image Generation | DiffusionDB text prompts (unseen) | Improvement 0.1736 | 3 |
| Text-to-Image Generation | DrawBench text prompts (unseen) | Improvement 47.63 | 3 |
| Text-to-Image Generation | PartiPrompts (unseen text prompts) | Improvement 91.31 | 3 |
| Image Generation | CIFAR-10 32 x 32 | MSE 0.0267 | 2 |
| Text-to-Image Generation | Visual Genome | IPCE 2.9166 | 2 |
| Image Generation | Anime-Faces 64 x 64 | MSE 0.0187 | 2 |
