Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport
About
Existing topic modeling methods, from LDA to recent neural and LLM-based approaches, focus mainly on statistical coherence and often produce redundant or off-target topics that miss the user's underlying intent. We introduce Human-centric Topic Modeling (Human-TM), a novel task formulation that integrates a human-provided goal directly into the topic modeling process to produce interpretable, diverse, and goal-oriented topics. To tackle this task, we propose the Goal-prompted Contrastive Topic Model with Optimal Transport (GCTM-OT), which first uses LLM-based prompting to extract goal candidates from documents and then incorporates them into semantic-aware contrastive learning via optimal transport for topic discovery. Experimental results on three public subreddit datasets show that GCTM-OT outperforms state-of-the-art baselines in topic coherence and diversity while significantly improving alignment with human-provided goals, paving the way for more human-centric topic discovery systems.
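The abstract does not say how optimal transport enters the contrastive objective. As a generic illustration only (not the authors' GCTM-OT implementation), the sketch below computes an entropy-regularized transport plan with Sinkhorn iterations between hypothetical topic embeddings and hypothetical goal-candidate embeddings. The function name `sinkhorn`, the regularization strength `eps`, the iteration count, the uniform marginals, and the cosine-distance cost are all assumptions made for this example.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (n, m) cost matrix; a: (n,) source weights; b: (m,) target weights.
    Returns an (n, m) transport plan whose column sums equal b exactly
    (after the final update) and whose row sums approach a as iterations grow.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def normalize(x):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Hypothetical usage: 4 topic embeddings vs. 3 goal-candidate embeddings.
rng = np.random.default_rng(0)
topic_emb = rng.normal(size=(4, 8))
goal_emb = rng.normal(size=(3, 8))
cost = 1.0 - normalize(topic_emb) @ normalize(goal_emb).T  # cosine distance
plan = sinkhorn(cost, np.full(4, 1 / 4), np.full(3, 1 / 3))
```

Entries of `plan` indicate how much mass each topic sends to each goal candidate; a model could, under these assumptions, use such a plan as soft alignment targets, but the paper's actual use of the plan is not described here.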
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Topic Modeling | Bothering | UT Score | 98 | 44 |
| Topic Modeling | TeslaModel3 | UT Score | 97.5 | 44 |
| Topic Modeling | AskAcademia | UT Score | 1 | 44 |
| Goal-relevance Evaluation | Bothering (test) | Goal Score (GS) | 42.53 | 11 |
| Goal-relevance Evaluation | TeslaModel3 (test) | Goal Score (GS) | 51 | 11 |
| Goal-relevance Evaluation | AskAcademia (test) | Goal Score (GS) | 45.3 | 11 |
| Topic Modeling | Bothering (test) | Cp | 0.3326 | 11 |
| Topic Modeling | TeslaModel3 (test) | Cp | 0.3031 | 11 |
| Topic Modeling | AskAcademia (test) | Cp | 0.299 | 11 |