HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

About

Text-to-image (T2I) diffusion and flow models have drawn considerable attention recently for their remarkable ability to produce flexible visual creations. High-resolution image synthesis, however, remains challenging because high-resolution content is scarce and complex. Recent approaches have explored training-free strategies that enable high-resolution synthesis with pre-trained models. These techniques often struggle to generate high-quality visuals and tend to exhibit artifacts or low-fidelity details, because they rely solely on the endpoint of the low-resolution sampling trajectory and neglect the intermediate states that are critical for preserving structure and synthesizing fine detail. To address this, we present HiFlow, a training-free and model-agnostic framework that unlocks the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow in the high-resolution space that captures the characteristics of the low-resolution flow and guides high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. With this flow-aligned guidance, HiFlow substantially improves the quality of high-resolution image synthesis with T2I models and remains effective across their personalized variants. Extensive experiments show that HiFlow achieves higher high-resolution image quality than state-of-the-art methods.
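The abstract names the alignments but gives no formulas, so the following is only a minimal NumPy sketch of how initialization alignment and direction alignment could be expressed, not the authors' implementation. Everything here is an assumption: the rectified-flow convention x_t = (1 − t)·x0 + t·ε with velocity v = ε − x0 (t = 1 is noise, t = 0 is data), a "virtual reference flow" built from a nearest-neighbor upsampled low-resolution result, "low frequency" approximated by average-pooling then upsampling, and the hypothetical helper names `upsample`, `low_pass`, `aligned_velocity`, `sample_high_res` and parameters `t_start`, `weight`, `factor`. Acceleration alignment is not sketched.

```python
from typing import Callable

import numpy as np

# Hypothetical model signature: (latent x_t, time t) -> predicted velocity.
VelocityModel = Callable[[np.ndarray, float], np.ndarray]


def upsample(x: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbor upsampling over the last two (spatial) axes."""
    return x.repeat(scale, axis=-2).repeat(scale, axis=-1)


def low_pass(x: np.ndarray, factor: int = 4) -> np.ndarray:
    """Crude low-pass filter: average-pool by `factor`, then upsample back.

    Assumes both spatial sizes are divisible by `factor`.
    """
    h, w = x.shape[-2:]
    blocks = x.reshape(*x.shape[:-2], h // factor, factor, w // factor, factor)
    return upsample(blocks.mean(axis=(-3, -1)), factor)


def aligned_velocity(v_model: np.ndarray, v_ref: np.ndarray,
                     weight: float, factor: int = 4) -> np.ndarray:
    """Direction-alignment sketch: pull the low-frequency part of the model's
    velocity toward the reference velocity; high frequencies are left as-is."""
    return v_model + weight * (low_pass(v_ref, factor) - low_pass(v_model, factor))


def sample_high_res(model: VelocityModel, x0_low: np.ndarray, scale: int,
                    t_start: float = 0.6, steps: int = 10, weight: float = 0.5,
                    factor: int = 4, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # Hypothetical reference flow: straight path between the upsampled
    # low-resolution result and Gaussian noise.
    x0_ref = upsample(x0_low, scale)
    noise = rng.standard_normal(x0_ref.shape)
    v_ref = noise - x0_ref  # constant velocity along the straight path
    # Initialization-alignment sketch: start from the reference state at t_start,
    # so the initial latent inherits the low-resolution layout.
    x = (1.0 - t_start) * x0_ref + t_start * noise
    ts = np.linspace(t_start, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = aligned_velocity(model(x, float(t_cur)), v_ref, weight, factor)
        x = x + (t_next - t_cur) * v  # Euler step; t decreases toward the data end
    return x
```

As a sanity check, an oracle model that returns the exact straight-path velocity `(x - x0_ref) / t` makes the loop land on the upsampled reference; a real flow model would instead be expected to add high-frequency detail on top of the aligned low frequencies.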

Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Text-to-Image Generation | 4K Resolution, 4K × 4K (test) | CLIP IQA score | 0.4407 | 16
Video Generation | VLM Evaluation Suite | Aesthetic Appeal | 8 | 8
Video Generation | VBench 1080P, 1920 × 1088 | Subject Consistency | 95.5 | 8
Video Generation | VBench 4K, 3840 × 2176 | Subject Consistency | 95 | 8
Resolution Extrapolation | Flux Guided resolution extrapolation (test) | FID | 73.13 | 3
