SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping

About

Recent studies on Visual Autoregressive (VAR) models have highlighted that high-frequency components, or later steps, in the generation process contribute disproportionately to inference latency. However, the underlying computational redundancy involved in these steps has yet to be thoroughly investigated. In this paper, we conduct an in-depth analysis of the VAR inference process and identify two primary sources of inefficiency: step redundancy and unconditional branch redundancy. To address step redundancy, we propose an automatic step-skipping strategy that selectively omits unnecessary generation steps to improve efficiency. For unconditional branch redundancy, we observe that the information gap between the conditional and unconditional branches is minimal. Leveraging this insight, we introduce unconditional branch replacement, a technique that bypasses the unconditional branch to reduce computational cost. Notably, we observe that the effectiveness of acceleration strategies varies significantly across different samples. Motivated by this, we propose SkipVAR, a sample-adaptive framework that leverages frequency information to dynamically select the most suitable acceleration strategy for each instance. To evaluate the role of high-frequency information, we introduce high-variation benchmark datasets that test model sensitivity to fine details. Extensive experiments show SkipVAR achieves over 0.88 average SSIM with up to 1.81x overall acceleration and 2.62x speedup on the GenEval benchmark, maintaining model quality. These results confirm the effectiveness of frequency-aware, training-free adaptive acceleration for scalable autoregressive image generation. Our code is available at https://github.com/fakerone-li/SkipVAR and has been publicly released.

Jiajun Li, Yue Ma, Xinyu Zhang, Qingyan Wei, Songhua Liu, Linfeng Zhang• 2025

Related benchmarks

Task	Dataset	Result
Text-to-Image Generation	GenEval	Overall Score78	704
Text-to-Image Generation	DPG-Bench	Overall Score86.4	451
Text-to-Image Generation	GenEval	GenEval Score79	442
Text-to-Image Generation	DPG	Overall Score83.16	256
Text-to-Image Generation	GenEval	Overall Score72	218
Text-to-Image Generation	HPS v2.1	Overall Score30.64	96
Text-to-Image Generation	ImageReward	ImageReward Score1.032	69
Text-to-Image Generation	DPG-Bench (test)	Overall Fidelity86.293	68
Human Preference Evaluation	ImageReward	Average Score1.03	40
Text-to-Image Generation	GenEval	Two Objects Score94.94	27

Showing 10 of 17 rows

Other info

Follow for update

@wizwand_team Discord