Vector Quantization using Gaussian Variational Autoencoder
About
Vector-quantized variational autoencoders (VQ-VAEs) are autoencoders that compress images into discrete tokens. However, they are difficult to train because the discretization step is non-differentiable. In this paper, we propose a simple yet effective technique dubbed Gaussian Quant (GQ), which first trains a Gaussian VAE under certain constraints and then converts it into a VQ-VAE without additional training. For conversion, GQ generates a codebook of random Gaussian noise vectors and quantizes each posterior mean to its nearest codebook vector. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic for training Gaussian VAEs that convert effectively, named the target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves previous Gaussian VAE discretization methods, such as TokenBridge. The source code is provided in the supplementary materials.
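To make the conversion step concrete, below is a minimal PyTorch sketch of the procedure as the abstract describes it: draw a fixed codebook of random Gaussian vectors, then map each posterior mean to its nearest codebook entry. The function names, shapes, and the `bits_back_rate` helper (computed here as the KL divergence to a standard-normal prior, expressed in bits) are illustrative assumptions, not the paper's reference implementation.

```python
import math
import torch

def gaussian_codebook(K: int, d: int, seed: int = 0) -> torch.Tensor:
    """Codebook of K i.i.d. standard Gaussian vectors of dimension d."""
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(K, d, generator=gen)

def quantize(mu: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Assign each posterior mean in mu (B, d) to its nearest codebook row."""
    dists = torch.cdist(mu, codebook)   # (B, K) Euclidean distances
    return dists.argmin(dim=1)          # one discrete token id per input

def bits_back_rate(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Per-sample KL(N(mu, diag(exp(logvar))) || N(0, I)) in bits; assumed
    here to be the bits-back coding rate referenced in the abstract."""
    kl_nats = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1)
    return kl_nats / math.log(2.0)

# Usage with dummy posterior statistics standing in for a trained encoder:
K, d = 4096, 16
codebook = gaussian_codebook(K, d)
mu, logvar = torch.randn(8, d), torch.zeros(8, d)
tokens = quantize(mu, codebook)      # (8,) integer token ids
quantized = codebook[tokens]         # quantized latents fed to the decoder
# The paper's guarantee applies when log2(K) exceeds the bits-back rate:
print(bits_back_rate(mu, logvar).mean().item(), "bits vs", math.log2(K))
```

In this sketch the codebook is regenerated from a fixed seed rather than stored, since the tokenizer and decoder only need to agree on the same random vectors.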
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Reconstruction | COCO 2017 (val) | PSNR 32.36 | 54 |
| Image Reconstruction | ImageNet (val) | rFID 0.424 | 54 |
| Image Generation | FFHQ | gFID 5.09 | 3 |