CWTM: Leveraging Contextualized Word Embeddings from BERT for Neural Topic Modeling

About

Most existing topic models rely on a bag-of-words (BOW) representation, which limits their ability to capture word order information and leads to challenges with out-of-vocabulary (OOV) words in new documents. Contextualized word embeddings, by contrast, excel at word sense disambiguation and effectively address the OOV issue. In this work, we introduce a novel neural topic model called the Contextualized Word Topic Model (CWTM), which integrates contextualized word embeddings from BERT. The model is capable of learning the topic vector of a document without BOW information. In addition, it can derive topic vectors for individual words within a document based on their contextualized word embeddings. Experiments across various datasets show that CWTM generates more coherent and meaningful topics than existing topic models, while also accommodating unseen words in newly encountered documents.
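To make the abstract's idea concrete, here is a minimal sketch of how contextualized embeddings can yield word-level and document-level topic distributions without a bag-of-words. The random embeddings and the projection matrix `W` are purely illustrative stand-ins (in practice the embeddings would come from BERT's last hidden layer and the projection would be trained end-to-end); this is not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for contextualized word embeddings from BERT:
# one vector per token, so the same word type can get different topics
# in different contexts.
n_tokens, emb_dim, n_topics = 6, 768, 10
word_embeddings = rng.normal(size=(n_tokens, emb_dim))

# Illustrative projection from embedding space to topic space
# (a real model would learn such parameters; this one is random).
W = rng.normal(size=(emb_dim, n_topics)) * 0.01

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Word-level topic distributions, one per token.
word_topics = softmax(word_embeddings @ W)   # shape (n_tokens, n_topics)

# Document-level topic vector: aggregate word-level distributions.
# No BOW counts are involved, so OOV words pose no problem --
# any token the encoder can embed contributes a topic distribution.
doc_topic = word_topics.mean(axis=0)         # shape (n_topics,)

print(word_topics.shape, doc_topic.shape)
```

Because the document vector is built by pooling per-token distributions rather than indexing a fixed vocabulary, a previously unseen word simply contributes through its embedding.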

Zheng Fang, Yulan He, Rob Procter • 2023

Related benchmarks

Task                        Dataset              Metric        Result    Rank
Topic Modeling              AskAcademia          UT Score      0.7       44
Topic Modeling              TeslaModel3          UT Score      64        44
Topic Modeling              Bothering            UT Score      48.5      44
Topic Modeling              AskAcademia (test)   Cp            0.1039    11
Topic Modeling              Bothering (test)     Cp            0.0175    11
Topic Modeling              TeslaModel3 (test)   Cp            0.0417    11
Goal-relevance Evaluation   TeslaModel3 (test)   Goal Score    41.88     11
Goal-relevance Evaluation   Bothering (test)     Goal Score    35.57     11
Goal-relevance Evaluation   AskAcademia (test)   Goal Score    36.35     11
