RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

About

We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders. Unlike previous decoder-only AR models that rely on a predefined generation order, RandAR removes this inductive bias, unlocking new capabilities in decoder-only generation. Our essential design enables random order by inserting a "position instruction token" before each image token to be predicted, representing the spatial location of the next image token. Trained on randomly permuted token sequences, a more challenging task than fixed-order generation, RandAR achieves comparable performance to its conventional raster-order counterpart. More importantly, decoder-only transformers trained on random orders acquire new capabilities. To address the efficiency bottleneck of AR models, RandAR adopts parallel decoding with KV-Cache at inference time, achieving a 2.5x acceleration without sacrificing generation quality. Additionally, RandAR supports inpainting, outpainting and resolution extrapolation in a zero-shot manner. We hope RandAR inspires new directions for decoder-only visual generation models and broadens their applications across diverse scenarios. Our project page is at https://rand-ar.github.io/.
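
The interleaving described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `("pos", (row, col))` tuple encoding, and the use of plain Python lists are all assumptions made for illustration. A real model would presumably represent position instructions as learned embeddings rather than tuples. The sketch only shows how a raster-order token grid might be shuffled and how each image token can be preceded by a marker naming its spatial location.

```python
import random


def build_random_order_sequence(image_tokens, grid_size, seed=None):
    """Interleave hypothetical position markers with image tokens in a random order.

    image_tokens: token ids in raster order, length grid_size * grid_size.
    Returns (sequence, order), where sequence alternates
    ("pos", (row, col)) and ("img", token_id) entries.
    """
    if len(image_tokens) != grid_size * grid_size:
        raise ValueError("image_tokens must cover the full grid")

    rng = random.Random(seed)
    order = list(range(len(image_tokens)))
    rng.shuffle(order)  # a fresh random permutation of the raster indices

    sequence = []
    for raster_index in order:
        row, col = divmod(raster_index, grid_size)
        # The position marker comes first and names the location of the
        # image token that follows it.
        sequence.append(("pos", (row, col)))
        sequence.append(("img", image_tokens[raster_index]))
    return sequence, order


def restore_raster_order(sequence, grid_size):
    """Place each image token back at the location named by its marker."""
    grid = [None] * (grid_size * grid_size)
    for (_, (row, col)), (_, token) in zip(sequence[0::2], sequence[1::2]):
        grid[row * grid_size + col] = token
    return grid


if __name__ == "__main__":
    tokens = list(range(100, 116))  # a toy 4x4 grid of token ids
    seq, order = build_random_order_sequence(tokens, grid_size=4, seed=0)
    print(seq[:4])
    print(restore_raster_order(seq, 4) == tokens)
```

Because every image token carries its own location marker, the original grid can be recovered from any permutation, which is what makes generation order a free choice in this toy setting.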

Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 | Inception Score (IS) 322 | 815 |
| Class-conditional Image Generation | ImageNet 256x256 (val) | FID 2.15 | 427 |
| Image Generation | ImageNet 256x256 | IS 322 | 359 |
| Image Generation | ImageNet 256x256 (val) | FID 2.15 | 340 |
| Class-conditional Image Generation | ImageNet 256x256 (test) | FID 2.15 | 208 |
| Class-conditional Image Generation | ImageNet 256x256 (train val) | FID 2.15 | 178 |
| Class-conditional Image Generation | ImageNet-1k (val) | FID 2.15 | 68 |
| Class-conditional Image Generation | ImageNet-1K 256x256 (test) | FID 2.15 | 50 |
