SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

About

One noted issue with the vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, a problem known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called the stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend in which the quantization is stochastic at the initial stage of training but gradually converges toward a deterministic quantization, which we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to VAE and VQ-VAE in vision- and speech-related tasks.

Yuhta Takida, Takashi Shibuya, WeiHsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji • 2022
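
The abstract describes quantization that starts out stochastic and drifts toward deterministic assignment. As a rough illustration only, the sketch below samples a codebook index from a categorical distribution whose logits are negative squared distances scaled by a variance term. The Gaussian-style form of the probabilities, the function name `stochastic_quantize`, and the hand-set `variance` argument are all assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np


def stochastic_quantize(z, codebook, variance, rng):
    """Sample one codebook index per row of z (illustrative sketch).

    Assumption: p(k | z) is proportional to
    exp(-||z - b_k||^2 / (2 * variance)), where b_k is the k-th codebook row.
    """
    # Squared Euclidean distance from every latent to every code: shape (N, K).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

    # Numerically stable softmax over the codebook axis.
    logits = -d2 / (2.0 * variance)
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    # Inverse-CDF sampling: count how many CDF bins the uniform draw exceeds.
    cdf = probs.cumsum(axis=1)
    u = rng.random((z.shape[0], 1))
    idx = np.minimum((u > cdf).sum(axis=1), codebook.shape[0] - 1)
    return idx, probs


rng = np.random.default_rng(0)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])  # hypothetical codes
z = np.array([[0.1, -0.1], [0.9, 1.2]])                      # hypothetical latents

# Large variance: probabilities are close to uniform, so assignment is noisy.
_, probs_hot = stochastic_quantize(z, codebook, variance=1e6, rng=rng)

# Small variance: nearly all mass sits on the nearest code, so sampling
# behaves like deterministic nearest-neighbour quantization.
idx_cold, probs_cold = stochastic_quantize(z, codebook, variance=1e-4, rng=rng)
```

In this sketch the variance is passed in by hand purely to show the two regimes. The abstract reports the shift toward deterministic quantization as a trend observed during training, not as a manually scheduled parameter.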

Related benchmarks

Task	Dataset	Result
Image Reconstruction	MNIST	--	34
Image Reconstruction	CIFAR-10	LPIPS 0.2333	25
Image Reconstruction	CIFAR10 (val)	L1 Loss 0.0482	11
Image Reconstruction	MNIST (val)	L1 Loss 0.0197	6
Image Reconstruction	MNIST 10,000 images (val)	L1 Loss 0.0197	5

