BERTopic: Neural topic modeling with a class-based TF-IDF procedure

About

Topic models can be useful tools for discovering latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving both classical models and models that follow the more recent clustering approach to topic modeling.
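The last step of the pipeline, the class-based TF-IDF, can be illustrated with a minimal sketch. It assumes the embedding and clustering steps have already run, so each document arrives as a list of lowercased tokens grouped under a cluster label. The weighting used here, tf(t, c) * log(1 + A / tf(t)), with A the average number of words per cluster, is an assumption about the procedure and is not spelled out in the abstract above. The names `c_tf_idf` and `top_terms` are hypothetical helpers, not part of any library.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_cluster):
    """Score terms per cluster with a class-based TF-IDF weighting (assumed form)."""
    # Treat every cluster as one concatenated "class document" and count its terms.
    term_counts = {
        label: Counter(token for doc in docs for token in doc)
        for label, docs in docs_by_cluster.items()
    }
    # Frequency of each term across all clusters.
    total_counts = Counter()
    for counts in term_counts.values():
        total_counts.update(counts)
    # A: average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(term_counts)
    # Assumed weighting: tf(t, c) * log(1 + A / tf(t)).
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in term_counts.items()
    }


def top_terms(scores, n=2):
    """Return the n highest-scoring terms per cluster, ties broken alphabetically."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, s in scores.items()
    }


# Toy input: tokenized documents already assigned to clusters (illustrative data).
clusters = {
    "pets": [["cat", "chased", "mouse"], ["cat", "sat", "mat"], ["dog", "chased", "cat"]],
    "finance": [["stock", "market", "fell"], ["market", "rallied", "earnings"],
                ["investors", "sold", "stock"]],
}
scores = c_tf_idf(clusters)
topics = top_terms(scores, n=2)
print(topics)  # {'pets': ['cat', 'chased'], 'finance': ['market', 'stock']}
```

Terms that are frequent inside one cluster but rare across the others receive the highest weights, which is why each toy cluster is summarized by its most distinctive words.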

Maarten Grootendorst • 2022

Related benchmarks

Task                 Dataset       Result          Rank
Text Classification  20News        Accuracy 59.1   127
Text Classification  AGNews        Accuracy 66.6   119
Topic Modeling       20NG          NPMI 0.0887     33
Topic Modeling       NCTBText      CV 0.82         29
Topic Modeling       Jamuna News   CV 0.71         29
Topic Modeling       BanFakeNews   CV 0.67         25
Topic Modeling       M10           NPMI 0.131      23
Topic Modeling       DBLP          NPMI 0.0039     23
Topic Modeling       Yelp          --              18
Topic Modeling       BBC           NPMI 0.085      17
Showing 10 of 69 rows
