LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

About

Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
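The abstract does not say how the self-consistency score is computed, so the following is only a generic illustration of the idea, not the authors' method. It assumes the LLM is queried several times for the same refinement decision and that agreement among the sampled answers serves as a black-box confidence estimate, with no token probabilities required. The function names, the normalization step, and the 0.6 threshold are hypothetical.

```python
from collections import Counter


def self_consistency(samples):
    """Return the majority answer and the fraction of samples that agree with it.

    `samples` is a list of strings, e.g. replacement words an LLM proposed
    across repeated calls with the same prompt (hypothetical setup).
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Normalize lightly so trivial formatting differences do not split votes.
    counts = Counter(s.strip().lower() for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def refine_word(original, samples, threshold=0.6):
    """Keep the LLM suggestion only when agreement reaches the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    answer, confidence = self_consistency(samples)
    return answer if confidence >= threshold else original
```

Under these assumptions, a suggestion that appears in two of three samples is accepted at the default threshold, while three mutually different samples leave the original topic word unchanged.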

Minh Chu Xuan, Tien-Phat Nguyen, Linh Ngo Van, Dinh Viet Sang, Nguyen Thi Ngoc Diep, Trung Le • 2026

Related benchmarks

Task	Dataset	Result
Topic Modeling	EC News	CNPMI (Coherence) 0.088	18
Document Classification	Amazon Review EN	Accuracy 80.03	16
Cross-lingual Topic Modeling	Amazon Review	CNPMI 0.072	10
Cross-lingual Topic Modeling	Rakuten Amazon	CNPMI 0.04	10
Document Classification	EC News EN	Accuracy 79.75	8
Document Classification	EC News ZH	Accuracy 77.85	8
Document Classification	Amazon Review ZH	Accuracy 73.21	8
Document Classification	Rakuten Amazon JA	Accuracy 83.34	8
Topic Modeling	Airiti Thesis	CNPMI 0.0531	8
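The page does not describe how the CNPMI values above were computed. As a rough guide to reading them, the sketch below follows the usual description of cross-lingual NPMI: normalized pointwise mutual information between a word from one language and a word from the other, with probabilities estimated from co-occurrence in aligned document pairs, then averaged over the top words of a topic pair. The function names, the set-of-words document representation, and the edge-case conventions are assumptions made for illustration.

```python
import math


def npmi(docs_a, docs_b, word_a, word_b):
    """NPMI of word_a (language A) and word_b (language B).

    docs_a[i] and docs_b[i] are sets of words from the i-th aligned
    document pair (an assumed representation).
    """
    n = len(docs_a)
    if n == 0 or n != len(docs_b):
        raise ValueError("need equally sized, non-empty aligned corpora")
    has_a = [word_a in d for d in docs_a]
    has_b = [word_b in d for d in docs_b]
    p_a = sum(has_a) / n
    p_b = sum(has_b) / n
    p_ab = sum(a and b for a, b in zip(has_a, has_b)) / n
    if p_ab == 0:
        return -1.0  # convention: the words never co-occur
    if p_ab == 1:
        return 1.0  # convention: avoids dividing by -log(1) = 0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def cnpmi(docs_a, docs_b, top_words_a, top_words_b):
    """Average NPMI over all cross-lingual pairs of top topic words."""
    scores = [npmi(docs_a, docs_b, wa, wb)
              for wa in top_words_a for wb in top_words_b]
    return sum(scores) / len(scores)
```

Scores range from -1 (the words never co-occur) through 0 (independence) to 1 (they always co-occur), so small positive averages such as those in the table indicate topic words that co-occur somewhat more often than chance.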
