Improving Contextualized Topic Models with Negative Sampling

About

Topic modeling has emerged as a dominant method for exploring large document collections. Recent approaches to topic modeling use large contextualized language models and variational autoencoders. In this paper, we propose a negative sampling mechanism for a contextualized topic model to improve the quality of the generated topics. In particular, during model training, we perturb the generated document-topic vector and use a triplet loss to encourage the document reconstructed from the correct document-topic vector to be similar to the input document and dissimilar to the document reconstructed from the perturbed vector. Experiments for different topic counts on three publicly available benchmark datasets show that in most cases, our approach leads to an increase in topic coherence over that of the baselines. Our model also achieves very high topic diversity.

Suman Adhya, Avishek Lahiri, Debarshi Kumar Sanyal, Partha Pratim Das • 2023
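
The negative-sampling mechanism described in the abstract can be summarized as a triplet objective over reconstructions: the input document acts as the anchor, the document reconstructed from the correct document-topic vector as the positive, and the document reconstructed from a perturbed topic vector as the negative. The PyTorch sketch below illustrates this idea under stated assumptions; the names perturb_topic_vector, decoder, noise_scale, and margin are illustrative and not taken from the authors' implementation.

```python
# Minimal sketch of negative sampling with a triplet reconstruction loss.
# All names and the specific perturbation are assumptions for illustration.
import torch
import torch.nn.functional as F

def perturb_topic_vector(theta: torch.Tensor, noise_scale: float = 0.5) -> torch.Tensor:
    """Create a 'negative' document-topic vector by perturbing theta.
    (Hypothetical perturbation: add Gaussian noise in log space, re-normalize.)"""
    noise = noise_scale * torch.randn_like(theta)
    return F.softmax(torch.log(theta + 1e-10) + noise, dim=-1)

def triplet_reconstruction_loss(bow: torch.Tensor,
                                theta: torch.Tensor,
                                decoder: torch.nn.Module,
                                margin: float = 1.0) -> torch.Tensor:
    """Pull the reconstruction from the correct topic vector toward the input
    bag-of-words and push the reconstruction from the perturbed vector away
    from it (triplet loss with the input document as the anchor)."""
    pos_recon = decoder(theta)                        # reconstruction from correct vector
    neg_recon = decoder(perturb_topic_vector(theta))  # reconstruction from perturbed vector
    d_pos = F.pairwise_distance(bow, pos_recon)       # anchor-positive distance
    d_neg = F.pairwise_distance(bow, neg_recon)       # anchor-negative distance
    return F.relu(d_pos - d_neg + margin).mean()
```

In training, this term would be added to the usual variational-autoencoder objective of the contextualized topic model, so that topic vectors which no longer reconstruct their own document are explicitly penalized.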

Related benchmarks

Task | Dataset | Metric | Score | Rank
Topic Modeling | Bothering | UT Score | 72.5 | 44
Topic Modeling | TeslaModel3 | UT Score | 71.33 | 44
Topic Modeling | AskAcademia | UT | 0.765 | 44
Goal-relevance Evaluation | Bothering (test) | Goal Score | 36.54 | 11
Goal-relevance Evaluation | AskAcademia (test) | GS | 39.33 | 11
Goal-relevance Evaluation | TeslaModel3 (test) | GS | 41.31 | 11
Topic Modeling | AskAcademia (test) | Cp | -0.0156 | 11
Topic Modeling | Bothering (test) | Cp | -0.1319 | 11
Topic Modeling | TeslaModel3 (test) | Cp | -0.151 | 11
