Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

About

Autoregressive image generation aims to predict the next token based on previous ones. However, this process is challenged by the bidirectional dependencies inherent in conventional image tokenizations, which create a fundamental misalignment with the unidirectional nature of autoregressive models. To resolve this, we introduce AliTok, a novel Aligned Tokenizer that alters the dependency structure of the token sequence. AliTok employs a bidirectional encoder constrained by a causal decoder, a design that compels the encoder to produce a token sequence with both semantic richness and forward dependency. Furthermore, by incorporating prefix tokens and employing a two-stage tokenizer training process to enhance reconstruction performance, AliTok achieves high fidelity and predictability simultaneously. Building upon AliTok, a standard decoder-only autoregressive model with just 177M parameters achieves a gFID of 1.44 and an IS of 319.5 on ImageNet-256. Scaling to 662M parameters, our model reaches a gFID of 1.28, surpassing the SOTA diffusion method while sampling 10x faster. On ImageNet-512, our 318M model also achieves a SOTA gFID of 1.39. Code and weights are available at https://github.com/ali-vilab/alitok.
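The difference between bidirectional and unidirectional (causal) dependency that the abstract refers to can be illustrated with a toy self-attention layer. The sketch below is not AliTok's code: the function `attention`, the helper `changed_rows`, the identity projections, and the tensor sizes are all assumptions chosen for illustration. It only shows that under a causal mask, changing a later token leaves the outputs at earlier positions untouched, whereas without the mask earlier positions can change as well.

```python
import numpy as np

def attention(x, causal):
    """Toy single-head self-attention on a (seq_len, dim) array.

    Queries, keys, and values are all x itself (identity projections),
    which is an illustrative simplification, not the paper's architecture.
    """
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)
    if causal:
        # Mask future positions (j > i) so token i attends only to tokens 0..i.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))   # hypothetical sequence: 6 tokens, 4 dims
perturbed = tokens.copy()
perturbed[-1] += 1.0               # change only the last token

def changed_rows(causal):
    """Boolean vector: which output positions differ after the perturbation."""
    before = attention(tokens, causal)
    after = attention(perturbed, causal)
    return ~np.isclose(before, after).all(axis=1)

causal_changed = changed_rows(causal=True)
bidir_changed = changed_rows(causal=False)
print("causal:", causal_changed)
print("bidirectional:", bidir_changed)
```

With the causal mask, only the final position is affected by the change, which is the forward-only dependency an autoregressive model assumes of its input sequence.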

Pingyu Wu, Kai Zhu, Yu Liu, Longxiang Tang, Jian Yang, Yansong Peng, Wei Zhai, Yang Cao, Zheng-Jun Zha • 2025

Related benchmarks

Task	Dataset	Result
Image Generation	ImageNet 256x256	IS 326.2	517
Image Reconstruction	ImageNet 256x256	rFID 0.86	202
Image Generation	ImageNet-1K 256x256 (val)	--	144
Conditional Image Generation	ImageNet-1K 256x256 (val)	gFID 1.28	86
Class-conditional Image Generation	ImageNet 256x256 2012 (val)	FID 1.28	63
Class-conditional Image Generation	ImageNet class-conditional 256x256	Inception Score (IS) 326.2	61
Image Reconstruction	ImageNet 256x256 2012 (val)	rFID 0.86	43
Image Reconstruction	ImageNet 256x256 class-conditional	rFID 0.86	29
Image Reconstruction	ImageNet 50K 256x256 (val)	rFID 0.84	16
