Refining Dimensions for Improving Clustering-based Cross-lingual Topic Models

About

Recent work on clustering-based topic models performs well in monolingual topic identification by introducing a pipeline that clusters contextualized representations. However, the pipeline is suboptimal at identifying topics across languages because multilingual language models generate language-dependent dimensions (LDDs). To address this issue, we introduce a novel, SVD-based dimension refinement component into the pipeline of the clustering-based topic model. This component effectively neutralizes the negative impact of LDDs, enabling the model to accurately identify topics across languages. Our experiments on three datasets demonstrate that the updated pipeline with the dimension refinement component generally outperforms other state-of-the-art cross-lingual topic models.
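As a rough illustration of what an SVD-based refinement step could look like, the sketch below projects document embeddings onto their SVD basis and drops the components that most strongly separate two languages. This is not the authors' exact procedure: the abstract does not say how LDDs are identified, so the function name `refine_dimensions`, the parameters `lang_labels` and `n_drop`, the two-language restriction, and the mean-gap scoring heuristic are all assumptions made for illustration.

```python
import numpy as np

def refine_dimensions(embeddings, lang_labels, n_components=None, n_drop=1):
    """Hypothetical sketch: SVD-project embeddings, then drop the
    components that best separate two languages (assumed heuristic)."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)      # center each dimension

    # Thin SVD: X = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = n_components or U.shape[1]
    Z = U[:, :k] * S[:k]                       # document coordinates in the SVD basis

    labels = np.asarray(lang_labels)
    langs = np.unique(labels)
    if len(langs) != 2:
        raise ValueError("this sketch assumes exactly two languages")
    mask = labels == langs[0]

    # Assumed heuristic: a component is language-dependent when the gap
    # between per-language means is large relative to its spread.
    gap = np.abs(Z[mask].mean(axis=0) - Z[~mask].mean(axis=0))
    score = gap / (Z.std(axis=0) + 1e-12)

    drop = np.argsort(score)[::-1][:n_drop]    # highest-scoring components
    keep = np.setdiff1d(np.arange(k), drop)
    return Z[:, keep], drop
```

The refined matrix would then be passed to the clustering step of the pipeline in place of the raw embeddings. In practice `n_drop` would need tuning, since removing too many components could also discard topical signal.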

Chia-Hsuan Chang, Tien-Yuan Huang, Yi-Hang Tsai, Chia-Ming Chang, San-Yih Hwang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Topic Modeling | EC News | CNPMI (Coherence) 0.083 | 18 |
| Cross-lingual Topic Modeling | Amazon Review | CNPMI 0.055 | 10 |
| Cross-lingual Topic Modeling | Rakuten Amazon | CNPMI 0.027 | 10 |
| Topic Modeling | Amazon Review | CNPMI 0.055 | 8 |
| Topic Modeling | Rakuten Amazon | CNPMI 0.027 | 8 |
| Topic Modeling | Airiti Thesis | CNPMI 0.0312 | 8 |