Neural Discrete Representation Learning

About

Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" -- where the latents are ignored when they are paired with a powerful autoregressive decoder -- typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
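The abstract names vector quantisation but does not spell out the quantisation step. As a minimal sketch, assuming the usual nearest-neighbour formulation of VQ, each continuous encoder output can be mapped to the index of its closest codebook vector. The function name `quantize`, the array shapes, and the toy values below are illustrative assumptions, not taken from the paper, and training details (how gradients reach the encoder, how the codebook is updated) are left out.

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z_e:      (N, D) array of continuous encoder outputs (hypothetical shapes).
    codebook: (K, D) array of embedding vectors.
    Returns (indices, z_q): the discrete codes, shape (N,), and the
    quantised vectors, shape (N, D).
    """
    # Squared Euclidean distance between every encoder vector and every code.
    diff = z_e[:, None, :] - codebook[None, :, :]
    dists = np.sum(diff ** 2, axis=-1)  # shape (N, K)
    # The discrete latent is the index of the closest codebook vector.
    indices = np.argmin(dists, axis=1)
    # The decoder would receive the selected codebook vectors.
    z_q = codebook[indices]
    return indices, z_q

# Toy example: three 2-D codes and three encoder outputs near them.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z_e = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
indices, z_q = quantize(z_e, codebook)
print(indices)  # prints [0 1 2]
```

The integer `indices` are the discrete representation; a prior learnt over such indices is what the abstract refers to as an autoregressive prior.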

Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu • 2017

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Class-conditional Image Generation | ImageNet 256x256 | Inception Score (IS) 74.3 | 815 |
| Text-to-motion generation | HumanML3D (test) | FID 0.085 | 481 |
| Text-to-Image Generation | GenEval | Overall Score 77.24 | 391 |
| text-to-motion mapping | HumanML3D (test) | FID 0.064 | 283 |
| text-to-motion mapping | KIT-ML (test) | R Precision (Top 3) 0.746 | 275 |
| Image Generation | ImageNet (val) | -- | 247 |
| Text-to-Image Generation | MJHQ-30K | Overall FID 14.2888 | 153 |
| Abnormal Event Detection | UCSD Ped2 (test) | AUC 90.2 | 146 |
| Clustering | MNIST (test) | NMI 0.409 | 132 |
| Image Reconstruction | ImageNet1K (val) | FID 2.8511 | 98 |
Showing 10 of 120 rows
