Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

About

Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret. Recently, neural topic models have shown improvements in overall coherence. Concurrently, contextual embeddings have advanced the state of the art of neural models in general. In this paper, we combine contextualized representations with neural topic models. We find that our approach produces more meaningful and coherent topics than traditional bag-of-words topic models and recent neural models. Our results indicate that future improvements in language models will translate into better topic models.

Federico Bianchi, Silvia Terragni, Dirk Hovy• 2020

Related benchmarks

TaskDatasetResultRank
Text ClassificationNewsgroup Religion (5-fold cross-validation)
Accuracy55.7
36
Text ClassificationSMS Spam Collection (5-fold cross-validation)
Accuracy98.7
36
Text ClassificationNewsgroup Science (5-fold cross-validation)
Accuracy0.702
36
Text ClassificationDrug Review Norethindrone (5-fold cross-validation)
Accuracy58.6
36
Text ClassificationDrug Review Norgestimate (5-fold cross-validation)
Accuracy62.8
36
Text ClassificationYelp (5-fold cross-validation)
Accuracy68.6
36
Topic Modeling20NG
NPMI0.107
23
Document ClusteringNewsgroup Science
Purity73
18
Document ClusteringDrug Review Norethindrone
Purity60
18
Topic ModelingNewsgroup Science
Cv0.476
18
Showing 10 of 44 rows

Other info

Follow for update