HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

About

Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently due to their remarkable ability to deliver flexible visual creations. Still, high-resolution image synthesis presents formidable challenges due to the scarcity and complexity of high-resolution content. Recent approaches have investigated training-free strategies to enable high-resolution image synthesis with pre-trained models. However, these techniques often struggle with generating high-quality visuals and tend to exhibit artifacts or low-fidelity details, as they typically rely solely on the endpoint of the low-resolution sampling trajectory while neglecting intermediate states that are critical for preserving structure and synthesizing finer detail. To this end, we present HiFlow, a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that effectively captures the characteristics of low-resolution flow information, offering guidance for high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging such flow-aligned guidance, HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models and demonstrates versatility across their personalized variants. Extensive experiments validate HiFlow's capability in achieving superior high-resolution image quality over state-of-the-art methods.

Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang • 2025
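
The abstract names "initialization alignment for low-frequency consistency" but does not say how it is computed. As a rough illustration of the general idea only, the toy sketch below swaps the low-frequency part of a high-resolution array for that of an upsampled low-resolution result while keeping the high-resolution residual. Everything here is an assumption made for illustration: the block-mean filter stands in for whatever low-pass operation HiFlow actually uses, the helper names (`block_low_pass`, `upsample_nearest`, `align_low_frequency`) are hypothetical, and plain 2D lists stand in for model latents.

```python
def block_low_pass(img, k):
    """Replace every k x k block with its mean (a crude low-pass filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, k):
        for bx in range(0, w, k):
            ys = range(by, min(by + k, h))
            xs = range(bx, min(bx + k, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def upsample_nearest(img, s):
    """Nearest-neighbour upsampling by an integer factor s."""
    return [[v for v in row for _ in range(s)] for row in img for _ in range(s)]


def align_low_frequency(high, low_res, s):
    """Keep the high-frequency residual of `high`, but take the
    low-frequency content from the upsampled `low_res` (assumed scheme)."""
    ref_low = block_low_pass(upsample_nearest(low_res, s), s)
    high_low = block_low_pass(high, s)
    return [
        [hv - hl + rl for hv, hl, rl in zip(h_row, hl_row, rl_row)]
        for h_row, hl_row, rl_row in zip(high, high_low, ref_low)
    ]


# Toy data: a 2 x 2 "low-resolution" result and a 4 x 4 "high-resolution" array.
low_res = [[1.0, 2.0], [3.0, 4.0]]
high = [[float(4 * y + x) for x in range(4)] for y in range(4)]
aligned = align_low_frequency(high, low_res, 2)
```

After alignment, the block means of `aligned` match the low-resolution values, while the differences between neighbouring pixels inside each block are the same as in `high`. Because block averaging followed by replication is idempotent, the swap is exact in this toy setting; a real low-pass filter would generally not give an exact match.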

Related benchmarks

Task	Dataset	Metric	Value
High-Resolution Image Generation	Aesthetic-4K	IR	1.26	64
Text-to-Image Generation	4K Resolution 4K x 4K (test)	CLIP IQA Score	0.4407	16
Video Generation	VLM Evaluation Suite	Aesthetic Appeal	8	8
Video Generation	VBench 1080P 1920 × 1088	Subject Consistency	95.5	8
Video Generation	VBench 4K 3840 × 2176	Subject Consistency	95	8
4K ultra-high-resolution image generation	UltraHR-eval4k	FID	38.54	6
Resolution extrapolation	Flux Guided resolution extrapolation (test)	FID	73.13	3
