Effective Neural Topic Modeling with Embedding Clustering Regularization
About
Topic models have been widely used for decades across various applications. However, existing topic models commonly suffer from topic collapsing: the discovered topics semantically collapse towards each other, leading to highly repetitive topics, insufficient topic discovery, and damaged model interpretability. In this paper, we propose a new neural topic model, the Embedding Clustering Regularization Topic Model (ECRTM). In addition to the existing reconstruction error, we propose a novel Embedding Clustering Regularization (ECR), which forces each topic embedding to be the center of a separately aggregated word embedding cluster in the semantic space. This enables each produced topic to contain distinct word semantics, which alleviates topic collapsing. Regularized by ECR, ECRTM generates diverse and coherent topics together with high-quality topic distributions of documents. Extensive experiments on benchmark datasets demonstrate that ECRTM effectively addresses the topic collapsing issue and consistently surpasses state-of-the-art baselines in terms of topic quality, topic distributions of documents, and downstream classification tasks.
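To make the idea of "each topic embedding as the center of a separately aggregated word embedding cluster" more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that the clustering term can be approximated by a balanced soft assignment of words to topics (computed with Sinkhorn-style iterations) weighted by squared Euclidean distance. The function names, the `epsilon` temperature, and the toy 2-D embeddings are all hypothetical and chosen only for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_soft_assignment(cost, epsilon=0.1, n_iters=200):
    """Sinkhorn-style iterations (an assumption of this sketch).

    Returns a word-to-topic assignment matrix in which every word carries
    mass 1/n_words and every topic receives mass 1/n_topics, so no topic
    can end up with an empty cluster.
    """
    n_words, n_topics = len(cost), len(cost[0])
    row_mass = 1.0 / n_words
    col_mass = 1.0 / n_topics
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n_words
    v = [1.0] * n_topics
    for _ in range(n_iters):
        # Rescale rows, then columns, to match the target marginals.
        u = [row_mass / sum(kernel[i][k] * v[k] for k in range(n_topics))
             for i in range(n_words)]
        v = [col_mass / sum(kernel[i][k] * u[i] for i in range(n_words))
             for k in range(n_topics)]
    return [[u[i] * kernel[i][k] * v[k] for k in range(n_topics)]
            for i in range(n_words)]


def clustering_regularizer(word_emb, topic_emb, epsilon=0.1):
    """Assignment-weighted sum of word-to-topic squared distances."""
    cost = [[sq_dist(w, t) for t in topic_emb] for w in word_emb]
    plan = balanced_soft_assignment(cost, epsilon)
    loss = sum(cost[i][k] * plan[i][k]
               for i in range(len(cost)) for k in range(len(cost[0])))
    return loss, plan


# Toy data: four word embeddings forming two tight groups in 2-D.
words = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]]
separated = [[0.05, 0.0], [1.0, 0.95]]  # one topic per word group
collapsed = [[0.5, 0.5], [0.5, 0.5]]    # both topics at the same point

loss_separated, plan_separated = clustering_regularizer(words, separated)
loss_collapsed, plan_collapsed = clustering_regularizer(words, collapsed)
print("separated topics:", loss_separated)
print("collapsed topics:", loss_collapsed)
```

In this toy setting the regularizer is smaller when each topic embedding sits at the center of its own word group than when both topics occupy the same point, which is the behavior the abstract describes. In a trained model such a term would be added to the reconstruction error and minimized by gradient descent over both word and topic embeddings. That step is omitted here.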
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text Classification | Drug Review Norethindrone (5-fold cross-validation) | Accuracy | 59.6 | 36 |
| Text Classification | Newsgroup Religion (5-fold cross-validation) | Accuracy | 54.1 | 36 |
| Text Classification | Drug Review Norgestimate (5-fold cross-validation) | Accuracy | 63 | 36 |
| Text Classification | SMS Spam Collection (5-fold cross-validation) | Accuracy | 89.1 | 36 |
| Text Classification | Yelp (5-fold cross-validation) | Accuracy | 68.6 | 36 |
| Text Classification | Newsgroup Science (5-fold cross-validation) | Accuracy | 0.625 | 36 |
| Topic Modeling | 20NG | NPMI | -0.089 | 23 |
| Document Clustering | Drug Review Norethindrone | Purity | 55.7 | 18 |
| Topic Modeling | Yelp | Cv | 0.473 | 18 |
| Document Clustering | Drug Review Norgestimate | Purity | 58.4 | 18 |