HyperAlign: Hypernetwork for Efficient Test-Time Alignment of Diffusion Models

About

Diffusion model alignment aims to bridge the gap between generated outputs and human preferences by enhancing both semantic consistency with textual prompts and overall visual quality. Existing alignment methods face a challenging trade-off: test-time approaches enable input-specific adaptability but introduce significant computational overhead and tend to under-optimize, while fine-tuning approaches risk reward over-optimization and loss of generation diversity. To bridge this gap, we propose HyperAlign, a framework that trains a hypernetwork for efficient and effective test-time alignment. Instead of modifying latent states directly, HyperAlign dynamically generates input-and-state-conditioned low-rank adaptation weights to modulate the denoising trajectory toward target rewards. We introduce multiple HyperAlign variants of varying granularity to balance alignment quality and computational efficiency. The hypernetwork is optimized with a reward objective regularized by preference data to mitigate reward hacking. We evaluate HyperAlign across multiple generative paradigms, including Stable Diffusion and FLUX, where it significantly outperforms existing alignment methods in semantic consistency and visual quality.

Xin Xie, Jiaxian Guo, Dong Gong• 2026

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	GenEval	Overall Score70	581
Text-to-Image Generation	GenEval 1.0 (test)	Overall Score52	130
Text-to-Image Generation	Pick-a-Pic 1K prompts v1	ImageReward1.28	28

Showing 3 of 3 rows

Other info

Follow for update

@wizwand_team Discord