
Continuous Speculative Decoding for Autoregressive Image Generation

About

Continuous visual autoregressive (AR) models have demonstrated promising performance in image generation, but their heavy autoregressive inference imposes significant overhead. In Large Language Models (LLMs), speculative decoding effectively accelerates discrete autoregressive inference; however, the absence of an analogous theory for continuous distributions has precluded its use for accelerating continuous AR models. To fill this gap, this work presents continuous speculative decoding and addresses two challenges: 1) a low acceptance rate, caused by inconsistent output distributions between the target and draft models, and 2) a modified (residual) distribution with no analytic expression, caused by a complex integral. To address challenge 1), we propose denoising trajectory alignment and token pre-filling strategies. To address challenge 2), we introduce an acceptance-rejection sampling algorithm with an appropriate upper bound, thereby avoiding explicit calculation of the integral. Furthermore, the denoising trajectory alignment is reused within acceptance-rejection sampling, avoiding repetitive diffusion model inference. Extensive experiments demonstrate that our continuous speculative decoding achieves over $2\times$ speedup on off-the-shelf models while maintaining the original generation quality. Code is available at: https://github.com/MarkXCloud/CSpD
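The abstract's core mechanism can be sketched concretely. A minimal, self-contained illustration follows, with 1-D Gaussians standing in for the diffusion-based draft density q(x) and target density p(x) (both are hypothetical placeholders, not the paper's models): a draft sample is accepted with probability min(1, p(x)/q(x)); on rejection, a replacement is drawn from the residual distribution max(0, p - q)/Z via acceptance-rejection sampling with p(x) itself as the envelope, so the normalizing integral Z is never computed explicitly.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical stand-ins for the draft and target token distributions.
def draft_pdf(x):
    return gaussian_pdf(x, 0.0, 1.0)   # q(x)

def target_pdf(x):
    return gaussian_pdf(x, 0.3, 1.0)   # p(x)

def sample_draft():
    return random.gauss(0.0, 1.0)

def sample_target():
    return random.gauss(0.3, 1.0)

def continuous_speculative_step():
    """One speculative step; returns (sample, draft_was_accepted)."""
    x = sample_draft()
    # Accept the draft sample with probability min(1, p(x)/q(x)).
    if random.random() < min(1.0, target_pdf(x) / draft_pdf(x)):
        return x, True
    # Rejected: resample from the residual max(0, p - q)/Z.
    # Since max(0, p(y) - q(y)) <= p(y), p is a valid envelope, so
    # acceptance-rejection sampling needs no explicit integral for Z.
    while True:
        y = sample_target()
        residual = max(0.0, target_pdf(y) - draft_pdf(y))
        if random.random() * target_pdf(y) < residual:
            return y, False
```

Regardless of whether the draft is accepted or the residual branch fires, the returned sample is distributed exactly according to p, which is what makes the speedup lossless in distribution.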

Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang • 2024

Related benchmarks

Task                               | Dataset                 | Metric        | Result | Rank
Text-to-Image Generation           | GenEval                 | GenEval Score | 63     | 277
Class-conditional Image Generation | ImageNet 256x256 (test) | FID           | 1.68   | 167
Text-to-Image Generation           | MJHQ-30K                | Overall FID   | 6.9    | 59
