Topic Modeling with Wasserstein Autoencoders

About

We propose a novel neural topic model in the Wasserstein autoencoder (WAE) framework. Unlike existing models based on variational autoencoders, we directly enforce a Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel when minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We find that MMD performs much better than a Generative Adversarial Network (GAN) at matching high-dimensional Dirichlet distributions. We further find that incorporating randomness in the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, it offers a more holistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
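The distribution-matching step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel (an information-diffusion-style kernel on the probability simplex), the function names `diffusion_kernel` and `mmd_squared`, the sample sizes, and the Dirichlet concentration values are all assumptions chosen for illustration. The sketch only estimates the squared MMD between two sets of document-topic vectors, and shows that the estimate is near zero when both sets come from the same Dirichlet prior and clearly larger when they do not.

```python
import numpy as np


def diffusion_kernel(x, y):
    """Assumed kernel on the simplex: exp(-arccos^2(sum_i sqrt(x_i * y_i))).

    x has shape (n, k) and y has shape (m, k); rows are probability vectors.
    Returns an (n, m) matrix of kernel values.
    """
    inner = np.sqrt(x) @ np.sqrt(y).T
    # Guard against rounding slightly outside arccos's domain.
    inner = np.clip(inner, 0.0, 1.0)
    return np.exp(-np.arccos(inner) ** 2)


def mmd_squared(x, y, kernel=diffusion_kernel):
    """Unbiased estimate of squared MMD between samples x and y."""
    n, m = len(x), len(y)
    kxx, kyy, kxy = kernel(x, x), kernel(y, y), kernel(x, y)
    # Exclude the diagonal (self-similarity) terms for the unbiased estimate.
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * kxy.mean()


rng = np.random.default_rng(0)
num_topics = 20  # hypothetical number of topics

# Samples from a sparse Dirichlet prior (hypothetical concentration 0.1).
prior = rng.dirichlet(np.full(num_topics, 0.1), size=256)
# Stand-ins for encoder outputs: one batch that matches the prior, one that does not.
matching = rng.dirichlet(np.full(num_topics, 0.1), size=256)
mismatched = rng.dirichlet(np.full(num_topics, 10.0), size=256)

mmd_same = mmd_squared(matching, prior)
mmd_diff = mmd_squared(mismatched, prior)
print(f"MMD^2, matching batch:   {mmd_same:.4f}")
print(f"MMD^2, mismatched batch: {mmd_diff:.4f}")
```

In a training loop, a term of this form would be added to the reconstruction loss so that the encoder's document-topic vectors are pushed toward the prior; how the two terms are weighted is not stated here and would need to be taken from the paper.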

Feng Nan, Ran Ding, Ramesh Nallapati, Bing Xiang • 2019

Related benchmarks

Task	Dataset	Result
Topic Modeling	20NG	NPMI 0.046	33
Topic Modeling	DBLP	NPMI -0.044	23
Topic Modeling	M10	NPMI -0.052	23
Topic Modeling	BBC	NPMI -0.006	17
Document Clustering	BBC (test)	NMI 0.718	13
Document Clustering	SS (test)	NMI 0.431	13
Document Clustering	20NG (test)	NMI 0.37	13
Document Clustering	DBLP (test)	NMI 0.188	13
Document Clustering	M10 (test)	NMI 0.34	13
Document Clustering	Pascal (test)	NMI 0.401	13

Showing 10 of 24 rows
