LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models
About
Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, operate at the document level, and are prone to hallucination; prior white-box approaches additionally require token probabilities that black-box models do not expose. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
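The abstract does not spell out how self-consistency is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical black-box callable `sample_fn` that returns one refined word list per call (for example, one stochastic LLM completion), and it treats the fraction of samples that propose a word as that word's confidence. The names `self_consistency_refine`, `n_samples`, and `min_agreement` are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_consistency_refine(
    topic_words: List[str],
    sample_fn: Callable[[List[str]], List[str]],
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> List[Tuple[str, float]]:
    """Call a black-box refiner several times and keep consistently proposed words.

    Agreement is the fraction of samples containing a word. No token
    probabilities are needed, only the sampled outputs themselves.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    counts: Counter = Counter()
    for _ in range(n_samples):
        # set() so a word counts at most once per sample
        counts.update(set(sample_fn(topic_words)))
    scored = [(word, count / n_samples) for word, count in counts.items()]
    kept = [(word, score) for word, score in scored if score >= min_agreement]
    # Highest agreement first; ties broken alphabetically for determinism
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


def make_stub_sampler(responses: List[List[str]]) -> Callable[[List[str]], List[str]]:
    """Hypothetical stand-in for an LLM: replays canned responses in order."""
    iterator = iter(responses)

    def sample(_words: List[str]) -> List[str]:
        return next(iterator)

    return sample


# Five canned "LLM samples" for one noisy topic (illustrative data only)
responses = [
    ["price", "shipping", "refund"],
    ["price", "shipping", "banana"],
    ["price", "refund", "shipping"],
    ["price", "delivery", "shipping"],
    ["price", "shipping", "refund"],
]
result = self_consistency_refine(
    ["price", "ship", "asdf"], make_stub_sampler(responses)
)
print(result)
```

Words that appear in only one or two samples are treated as likely hallucinations and dropped, while the surviving agreement scores can serve as weights when feeding the refined words back into a topic model. How LLM-XTM actually uses its uncertainty estimates is not described in this summary.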
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Topic Modeling | EC News | CNPMI (Coherence) | 0.088 | 18 |
| Document Classification | Amazon Review EN | Accuracy | 80.03 | 16 |
| Cross-lingual Topic Modeling | Amazon Review | CNPMI | 0.072 | 10 |
| Cross-lingual Topic Modeling | Rakuten Amazon | CNPMI | 0.04 | 10 |
| Document Classification | EC News EN | Accuracy | 79.75 | 8 |
| Document Classification | EC News ZH | Accuracy | 77.85 | 8 |
| Document Classification | Amazon Review ZH | Accuracy | 73.21 | 8 |
| Document Classification | Rakuten Amazon JA | Accuracy | 83.34 | 8 |
| Topic Modeling | Airiti Thesis | CNPMI | 0.0531 | 8 |
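
The CNPMI values in the table are coherence scores based on normalized pointwise mutual information (NPMI). As a rough illustration, the sketch below computes NPMI for one cross-lingual word pair, assuming co-occurrence is counted over paired documents (a source-language token set aligned with a target-language token set). The function name `cross_lingual_npmi`, the toy corpus, and the handling of the edge cases are assumptions for illustration; the exact reference corpus and averaging scheme behind the reported numbers are not given here.

```python
import math
from typing import List, Set, Tuple


def cross_lingual_npmi(
    word_src: str,
    word_tgt: str,
    paired_docs: List[Tuple[Set[str], Set[str]]],
) -> float:
    """NPMI of a source-language word and a target-language word.

    NPMI(a, b) = log(p(a, b) / (p(a) * p(b))) / -log(p(a, b)),
    with probabilities estimated as document frequencies over paired_docs.
    """
    n = len(paired_docs)
    if n == 0:
        raise ValueError("paired_docs must not be empty")
    p_src = sum(word_src in src for src, _ in paired_docs) / n
    p_tgt = sum(word_tgt in tgt for _, tgt in paired_docs) / n
    p_joint = sum(
        word_src in src and word_tgt in tgt for src, tgt in paired_docs
    ) / n
    if p_joint == 0:
        return -1.0  # never co-occur: conventional lower bound
    if p_joint == 1.0:
        return 1.0  # always co-occur: avoids dividing by -log(1) = 0
    return math.log(p_joint / (p_src * p_tgt)) / -math.log(p_joint)


# Toy English-Chinese paired corpus (illustrative data only)
paired_docs = [
    ({"price", "cheap"}, {"价格", "便宜"}),
    ({"price"}, {"价格"}),
    ({"battery"}, {"电池"}),
    ({"battery", "price"}, {"电池"}),
]
score_related = cross_lingual_npmi("price", "价格", paired_docs)
score_unrelated = cross_lingual_npmi("price", "电池", paired_docs)
print(round(score_related, 3), round(score_unrelated, 3))
```

A topic-level score would then average this quantity over the top word pairs of each aligned topic, with values closer to 1 indicating that the two languages' topic words tend to appear in corresponding documents.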