
TriTopic: Tri-Modal Graph-Based Topic Modeling with Iterative Refinement and Archetypes

About

Topic modeling extracts latent themes from large text collections, but leading approaches like BERTopic face critical limitations: stochastic instability, loss of lexical precision ("Embedding Blur"), and reliance on a single data perspective. We present TriTopic, a framework that addresses these weaknesses through a tri-modal graph fusing semantic embeddings, TF-IDF, and metadata. Three core innovations drive its performance: hybrid graph construction via Mutual kNN and Shared Nearest Neighbors to eliminate noise and combat the curse of dimensionality; Consensus Leiden Clustering for reproducible, stable partitions; and Iterative Refinement that sharpens embeddings through dynamic centroid-pulling. TriTopic also replaces the "average document" concept with archetype-based topic representations defined by boundary cases rather than centers alone. In benchmarks across 20 Newsgroups, BBC News, AG News, and Arxiv, TriTopic achieves the highest NMI on every dataset (mean NMI 0.575 vs. 0.513 for BERTopic, 0.416 for NMF, 0.299 for LDA), guarantees 100% corpus coverage with 0% outliers, and is available as an open-source PyPI library.
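The hybrid graph construction mentioned above can be illustrated with a small sketch. It is not TriTopic's implementation: the function names `knn_sets` and `mutual_knn_snn_graph`, the use of Euclidean distance, and the edge weight (shared neighbors divided by k) are all assumptions chosen for illustration. It keeps an edge only when two points list each other among their k nearest neighbors (Mutual kNN), then weights that edge by how many neighbors the two points share (Shared Nearest Neighbors).

```python
import math


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Euclidean distance is an assumption made for this sketch.
    """
    n = len(points)
    neighbor_sets = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        neighbor_sets.append(set(others[:k]))
    return neighbor_sets


def mutual_knn_snn_graph(points, k):
    """Build a mutual-kNN graph with SNN-style edge weights.

    An edge (i, j) is kept only if i is in j's neighbor set and j is in i's.
    Its weight is the fraction of the k neighbors the two points share
    (a hypothetical weighting, not taken from the paper).
    """
    neighbor_sets = knn_sets(points, k)
    edges = {}
    for i, neighbors in enumerate(neighbor_sets):
        for j in neighbors:
            if i < j and i in neighbor_sets[j]:
                shared = len(neighbors & neighbor_sets[j])
                edges[(i, j)] = shared / k
    return edges


# Two tight groups plus one far-away point (index 6).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 50)]
edges = mutual_knn_snn_graph(points, k=2)
for (i, j), weight in sorted(edges.items()):
    print(i, j, weight)
```

In this toy input the far-away point picks two neighbors, but neither picks it back, so it receives no edges. That is the sense in which the mutual criterion filters one-sided, noisy links before any clustering step runs.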

Roman Egger • 2026

Related benchmarks

Task           | Dataset                             | Result             | Rank
Topic Modeling | BBC                                 | NPMI 0.38          | 17
Topic Modeling | AG-News                             | NPMI 0.527         | 8
Topic Modeling | Aggregate 4 datasets n=240          | NMI 0.575          | 4
Topic Modeling | 20 Newsgroups                       | NPMI 0.413         | 4
Topic Modeling | 20 Newsgroups                       | Mean NMI 0.532     | 4
Topic Modeling | BBC News                            | Mean NMI 0.702     | 4
Topic Modeling | 20 Newsgroups                       | Mean Coverage 1    | 4
Topic Modeling | BBC News                            | Mean Coverage 1    | 4
Topic Modeling | AG-News                             | Mean Coverage 100  | 4
Topic Modeling | Cross-seed stability configurations | Mean σ(NMI) 0.007  | 4
Showing 10 of 11 rows
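For readers unfamiliar with the NMI and σ(NMI) columns, the sketch below shows one common way such numbers are computed. The source does not say which NMI normalization or which standard-deviation convention was used, so the arithmetic-mean normalization and the population standard deviation here are assumptions, and the helper names `nmi` and `nmi_spread` are hypothetical.

```python
import math
import statistics
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same items.

    Normalizes by the arithmetic mean of the two entropies (an assumption;
    other normalizations exist).
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    mean_entropy = (entropy(count_a) + entropy(count_b)) / 2
    # Two single-cluster labelings are treated as identical.
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0


def nmi_spread(truth, runs):
    """Mean and population standard deviation of NMI over several runs,
    for example the same model fitted with different random seeds."""
    scores = [nmi(truth, run) for run in runs]
    return statistics.mean(scores), statistics.pstdev(scores)
```

NMI ignores how clusters are named, so a labeling that swaps cluster ids but groups the same documents together still scores 1. A small σ(NMI) across seeds means the partitions agree with the reference labels to nearly the same degree on every run.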
