Autoregressive Image Generation with Randomized Parallel Decoding

About

We introduce ARPG, a novel visual Autoregressive model that enables Randomized Parallel Generation. It addresses the inherent limitations of conventional raster-order approaches, whose sequential, predefined token generation order hinders inference efficiency and zero-shot generalization. Our key insight is that effective random-order modeling requires explicit guidance on the position of the next predicted token. To this end, we propose a decoupled decoding framework that separates positional guidance from content representation, encoding them as queries and key-value pairs, respectively. By incorporating this guidance directly into the causal attention mechanism, our approach enables fully random-order training and generation without bidirectional attention. Consequently, ARPG readily generalizes to zero-shot tasks such as image in-painting, out-painting, and resolution expansion. It also supports parallel inference by processing multiple queries concurrently against a shared KV cache. On the ImageNet-1K 256x256 benchmark, our approach attains an FID of 1.83 with only 32 sampling steps, achieving a speedup of more than 30x in inference and a 75% reduction in memory consumption compared to representative recent autoregressive models at a similar scale.
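The decoupled decoding idea described above can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the single attention layer, the random untrained weights, and every name (`decode_step`, `random_parallel_generate`, `pos_emb`, `tok_emb`, `w_k`, `w_v`, `w_out`) are assumptions made for illustration. It only shows the data flow: position embeddings of the tokens to be predicted act as queries, already generated tokens supply the keys and values, and several positions drawn in random order are decoded in one step against the same KV cache.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(pos_queries, k_cache, v_cache):
    """Attend from target-position queries (m, d) to cached content (n, d)."""
    d = pos_queries.shape[-1]
    scores = pos_queries @ k_cache.T / np.sqrt(d)
    return softmax(scores) @ v_cache

def random_parallel_generate(num_tokens, num_steps, d=16, vocab=32, seed=0):
    """Toy random-order, parallel decoding loop (hypothetical, untrained)."""
    rng = np.random.default_rng(seed)
    pos_emb = rng.normal(size=(num_tokens, d))   # positional guidance -> queries
    tok_emb = rng.normal(size=(vocab, d))        # content embeddings -> keys/values
    w_k = rng.normal(size=(d, d)) / np.sqrt(d)
    w_v = rng.normal(size=(d, d)) / np.sqrt(d)
    w_out = rng.normal(size=(d, vocab))

    # Random generation order, split into groups that are decoded in parallel.
    order = rng.permutation(num_tokens)
    groups = np.array_split(order, num_steps)

    # A start vector (stand-in for a class condition) seeds the KV cache.
    start = rng.normal(size=(1, d))
    k_cache, v_cache = start @ w_k, start @ w_v

    tokens = np.full(num_tokens, -1)
    for group in groups:
        # All queries in this group share the same cache; they do not see each other.
        hidden = decode_step(pos_emb[group], k_cache, v_cache)
        new_tokens = (hidden @ w_out).argmax(axis=-1)
        tokens[group] = new_tokens
        # Newly generated content is appended to the cache for later steps.
        content = tok_emb[new_tokens] + pos_emb[group]
        k_cache = np.concatenate([k_cache, content @ w_k])
        v_cache = np.concatenate([v_cache, content @ w_v])
    return tokens, order

tokens, order = random_parallel_generate(num_tokens=16, num_steps=4)
print(order)   # the random order in which positions were filled
print(tokens)  # one token id per position
```

Because queries carry only positions and the cache carries only content, later steps never need to revisit earlier ones, which is what allows causal attention with a growing cache instead of bidirectional attention in this sketch.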

Haopeng Li, Jinyue Yang, Guoqi Li, Huan Wang • 2025

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Class-conditional Image Generation	ImageNet 256x256	Inception Score (IS)	339.7	967
Image Generation	ImageNet 256x256	IS	297.7	517
Class-conditional Image Generation	ImageNet 512x512	FID	3.38	126
Text-to-Image Generation	T2I-CompBench (test)	Color Accuracy	68	86
Class-conditional Image Generation	ImageNet-1K 256x256 (test)	FID	1.83	50
Class-conditional Image Generation	ImageNet 1K 512x512 (test)	FID	2.82	32
Controllable Generation	ImageNet	FID (Canny)	7.39	6
