Neural Topic Model via Optimal Transport

About

Recently, Neural Topic Models (NTMs) inspired by variational autoencoders have attracted increasing research interest due to their promising results in text analysis. However, it is usually hard for existing NTMs to achieve good document representations and coherent, diverse topics at the same time. Moreover, their performance often degrades severely on short documents. The need for reparameterisation can also compromise their training quality and model flexibility. To address these shortcomings, we present a new neural topic model based on the theory of optimal transport (OT). Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distribution. Importantly, the cost matrix of the OT distance models the weights between topics and words and is constructed from the distances between topics and words in an embedding space. Our proposed model can be trained efficiently with a differentiable loss. Extensive experiments show that our framework significantly outperforms state-of-the-art NTMs in discovering more coherent and diverse topics and in deriving better document representations, for both regular and short texts.
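The abstract describes the OT objective only at a high level. The sketch below illustrates its core idea under assumptions introduced here, not taken from the paper: a document is a normalised bag-of-words histogram, the cost between word v and topic k is the cosine distance 1 - cos(word embedding, topic embedding), and the OT distance is approximated with entropy-regularised Sinkhorn iterations, one common way to obtain a differentiable OT loss. The helper names `cosine_cost` and `sinkhorn`, the toy embeddings, and the grid search (a gradient-free stand-in for training an encoder) are all hypothetical and are not the authors' code.

```python
import math


def cosine_cost(word_emb, topic_emb):
    """Cost C[v][k] = 1 - cos(word v, topic k). The cosine form is an assumption."""
    def norm(x):
        return math.sqrt(sum(t * t for t in x))

    return [
        [1.0 - sum(p * q for p, q in zip(w, t)) / (norm(w) * norm(t)) for t in topic_emb]
        for w in word_emb
    ]


def sinkhorn(a, b, cost, eps=0.1, n_iters=500):
    """Entropy-regularised OT between a word histogram a and a topic histogram b.

    Returns the transport cost <P, C> of the Sinkhorn plan P, and P itself.
    """
    n_words, n_topics = len(a), len(b)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n_words
    v = [1.0] * n_topics
    for _ in range(n_iters):
        # Rescale rows so the plan's word marginal matches a.
        u = [a[i] / sum(kernel[i][k] * v[k] for k in range(n_topics)) for i in range(n_words)]
        # Rescale columns so the plan's topic marginal matches b.
        v = [b[k] / sum(kernel[i][k] * u[i] for i in range(n_words)) for k in range(n_topics)]
    plan = [[u[i] * kernel[i][k] * v[k] for k in range(n_topics)] for i in range(n_words)]
    dist = sum(plan[i][k] * cost[i][k] for i in range(n_words) for k in range(n_topics))
    return dist, plan


# Hypothetical toy setup: 4 vocabulary words, 2 topics, 2-D embeddings.
topic_emb = [[1.0, 0.0], [0.0, 1.0]]
word_emb = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
cost = cosine_cost(word_emb, topic_emb)

counts = [3, 2, 0, 1]                    # bag-of-words counts for one document
a = [c / sum(counts) for c in counts]    # the document's word distribution

# Gradient-free stand-in for training: choose the topic proportions [p, 1 - p]
# whose OT distance to the document's word distribution is smallest.
grid = [i / 100 for i in range(1, 100)]
best_p = min(grid, key=lambda p: sinkhorn(a, [p, 1.0 - p], cost)[0])
print(best_p)
```

With these counts, five sixths of the document's mass sits on the two words embedded near topic 0, so the selected `p` should land close to 5/6. In a real NTM the topic proportions would come from an encoder (for example a softmax over its outputs) and the loss would be minimised by gradient descent through the Sinkhorn iterations; plain Python lists are used here only to keep the sketch dependency-free.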

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine · 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Topic Detection | Text Datasets active topics | Accuracy | 7.96 | 54 |
| Topic Modeling | Five text datasets (News-20K, IMDB, Yelp, DailyMail, Twitter) | Intruder Detection Accuracy (CI) | 23.43 | 45 |
| Topic Modeling | CIFAR100, Food101, SUN397 Averaged (test) | Intruder Detection Accuracy (CI) | 19.57 | 45 |
| Topic Coherence | 20News | NPMI | 0.06 | 26 |
| Topic Modeling | AGNews | Diversity | 97.4 | 14 |
| Topic Modeling | Amazon-10Cate | TC Score | 0.462 | 14 |
| Topic Modeling | Yelp 5Cate | Topic Coherence (TC) | 38.4 | 14 |
| Topic Modeling | NYT corpus | NPMI | 0.1509 | 14 |
| Topic Relevance Prediction | Text Datasets Mean | Score (Threshold 50) | 50.01 | 9 |
| Document Clustering | 20NG | Top-Purity | 47.7 | 6 |

Showing 10 of 18 rows.
