Understanding Cross-Domain Adaptation in Low-Resource Topic Modeling

About

Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We address this challenge by formally introducing domain adaptation for low-resource topic modeling, where a high-resource source domain informs a low-resource target domain without overwhelming it with irrelevant content. We establish a finite-sample generalization bound showing that effective knowledge transfer depends on robust performance in both domains, minimizing latent-space discrepancy, and preventing overfitting to the data. Guided by these insights, we propose DALTA (Domain-Aligned Latent Topic Adaptation), a new framework that employs a shared encoder for domain-invariant features, specialized decoders for domain-specific nuances, and adversarial alignment to selectively transfer relevant information. Experiments on diverse low-resource datasets demonstrate that DALTA consistently outperforms state-of-the-art methods in terms of topic coherence, stability, and transferability.
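
The three components named in the abstract (a shared encoder, one decoder per domain, and adversarial alignment) can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ToyDomainAdaptiveTopicModel`, the linear layers, the softmax decoders, the logistic domain discriminator, and `alignment_weight` are all hypothetical choices made for illustration. It only computes the losses once and performs no training.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyDomainAdaptiveTopicModel:
    """Hypothetical forward pass only; the real DALTA layers are not given here."""

    def __init__(self, vocab_size, num_topics, seed=0):
        rng = random.Random(seed)
        # Shared encoder: bag-of-words counts -> topic logits (both domains).
        self.encoder = random_matrix(num_topics, vocab_size, rng)
        # One decoder per domain: topic proportions -> word logits.
        self.decoders = {
            "source": random_matrix(vocab_size, num_topics, rng),
            "target": random_matrix(vocab_size, num_topics, rng),
        }
        # Domain discriminator: topic proportions -> logit of "is source".
        self.discriminator = [rng.uniform(-0.1, 0.1) for _ in range(num_topics)]

    def encode(self, bow):
        return softmax(matvec(self.encoder, bow))

    def reconstruction_loss(self, bow, domain):
        """Negative log-likelihood of the document under its domain's decoder."""
        theta = self.encode(bow)
        word_probs = softmax(matvec(self.decoders[domain], theta))
        return -sum(c * math.log(p) for c, p in zip(bow, word_probs))

    def discriminator_loss(self, bow, domain):
        """Binary cross-entropy for predicting the domain from the latent code."""
        theta = self.encode(bow)
        logit = sum(w * t for w, t in zip(self.discriminator, theta))
        p_source = 1.0 / (1.0 + math.exp(-logit))
        label = 1.0 if domain == "source" else 0.0
        return -(label * math.log(p_source) + (1 - label) * math.log(1 - p_source))


model = ToyDomainAdaptiveTopicModel(vocab_size=6, num_topics=3)
source_doc = [2, 1, 0, 0, 1, 0]  # toy bag-of-words counts
target_doc = [0, 0, 1, 2, 0, 1]

rec = (model.reconstruction_loss(source_doc, "source")
       + model.reconstruction_loss(target_doc, "target"))
disc = (model.discriminator_loss(source_doc, "source")
        + model.discriminator_loss(target_doc, "target"))

alignment_weight = 0.5  # hypothetical trade-off between fit and alignment
# The discriminator would minimize `disc`; the encoder would minimize the
# quantity below, which rewards latent codes the discriminator cannot separate.
encoder_objective = rec - alignment_weight * disc
```

In this sketch the opposing signs on `disc` are what make the alignment adversarial: the discriminator tries to tell the domains apart from the topic proportions, while the encoder is pushed toward proportions that look alike across domains.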

Pritom Saha Akash, Kevin Chen-Chuan Chang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Text Classification | Drug Review Norethindrone (5-fold cross-validation) | Accuracy 60 | 36 |
| Text Classification | Drug Review Norgestimate (5-fold cross-validation) | Accuracy 64.6 | 36 |
| Text Classification | Newsgroup Science (5-fold cross-validation) | Accuracy 0.758 | 36 |
| Text Classification | SMS Spam Collection (5-fold cross-validation) | Accuracy 97.8 | 36 |
| Text Classification | Newsgroup Religion (5-fold cross-validation) | Accuracy 54.9 | 36 |
| Text Classification | Yelp (5-fold cross-validation) | Accuracy 68.6 | 36 |
| Document Clustering | Newsgroup Religion | Purity 50 | 18 |
| Document Clustering | Drug Review Norgestimate | Purity 60.4 | 18 |
| Document Clustering | SMS Spam Collection | Purity 97.8 | 18 |
| Topic Modeling | Newsgroup Science | Cv 0.493 | 18 |
Showing 10 of 18 rows
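
Purity, the metric listed for the document clustering rows, is conventionally the fraction of documents that share the majority label of their assigned cluster. The following is a minimal sketch of that standard definition, assuming the benchmark uses it (the page does not say how its scores were computed). The function name `purity` and the toy labels are illustrative only.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Fraction of items whose label is the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(true_labels)


clusters = [0, 0, 0, 1, 1, 1]
labels = ["spam", "spam", "ham", "ham", "ham", "ham"]
score = purity(clusters, labels)  # (2 + 3) / 6
```

Purity lies between 0 and 1; a value such as 60.4 in a results table would correspond to 0.604 on this scale if the table reports percentages, which is an assumption here.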
