Vector Quantization using Gaussian Variational Autoencoder
About
Vector-quantized variational autoencoders (VQ-VAEs) are autoencoders that compress images into discrete tokens. However, they are difficult to train because the discretization step is non-differentiable. In this paper, we propose a simple yet effective technique dubbed Gaussian Quant (GQ), which first trains a Gaussian VAE under certain constraints and then converts it into a VQ-VAE without additional training. For conversion, GQ generates a codebook of random Gaussian noise vectors and quantizes each posterior mean to its nearest codebook vector. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic for training Gaussian VAEs that convert effectively, named the target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves previous Gaussian VAE discretization methods, such as TokenBridge. The source code is provided in the supplementary materials.
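To make the conversion step concrete, below is a minimal PyTorch sketch of the procedure as the abstract describes it: draw a fixed codebook of random Gaussian vectors, then map each posterior mean to its nearest codebook entry. The function names, shapes, and the `bits_back_rate` helper (computed here as the KL divergence to a standard-normal prior, expressed in bits) are illustrative assumptions, not the paper's reference implementation.

```python
import math
import torch

def gaussian_codebook(K: int, d: int, seed: int = 0) -> torch.Tensor:
    """Codebook of K i.i.d. standard Gaussian vectors of dimension d."""
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(K, d, generator=gen)

def quantize(mu: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Assign each posterior mean in mu (B, d) to its nearest codebook row."""
    dists = torch.cdist(mu, codebook)   # (B, K) Euclidean distances
    return dists.argmin(dim=1)          # one discrete token id per input

def bits_back_rate(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Per-sample KL(N(mu, diag(exp(logvar))) || N(0, I)) in bits; assumed
    here to be the bits-back coding rate referenced in the abstract."""
    kl_nats = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1)
    return kl_nats / math.log(2.0)

# Usage with dummy posterior statistics standing in for a trained encoder:
K, d = 4096, 16
codebook = gaussian_codebook(K, d)
mu, logvar = torch.randn(8, d), torch.zeros(8, d)
tokens = quantize(mu, codebook)      # (8,) integer token ids
quantized = codebook[tokens]         # quantized latents fed to the decoder
# The paper's guarantee applies when log2(K) exceeds the bits-back rate:
print(bits_back_rate(mu, logvar).mean().item(), "bits vs", math.log2(K))
```

In this sketch the codebook is regenerated from a fixed seed rather than stored, since the tokenizer and decoder only need to agree on the same random vectors.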
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Reconstruction | COCO 2017 (val) | PSNR 32.36 | 54 |
| Image Reconstruction | ImageNet (val) | rFID 0.424 | 54 |
| Image Generation | FFHQ | gFID 5.09 | 3 |