DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM

About

In the burgeoning field of natural language processing (NLP), Neural Topic Models (NTMs), Large Language Models (LLMs), and diffusion models have emerged as areas of significant research interest. Despite this, NTMs primarily utilize contextual embeddings from LLMs, which are neither optimal for clustering nor capable of topic-based text generation. NTMs have also never been combined with diffusion models for text generation. Our study addresses these gaps by introducing a novel framework named Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME). DeTiME leverages encoder-decoder-based LLMs to produce highly clusterable embeddings, which yield topics that exhibit both superior clusterability and enhanced semantic coherence compared to existing methods. Additionally, by exploiting the power of diffusion models, our framework also provides the capability to perform topic-based text generation. This dual functionality allows users to efficiently produce highly clustered topics and topic-based text simultaneously. DeTiME's potential extends to generating clustered embeddings as well. Notably, our proposed framework (both the encoder-decoder-based LLM and the diffusion model) proves to be efficient to train and exhibits high adaptability to other LLMs and diffusion models, demonstrating its potential for a wide array of applications.

Weijie Xu, Wenxiang Hu, Fanyou Wu, Srinivasan Sengamedu • 2023

Related benchmarks

Task | Dataset | Result | Rank
Text Classification | Yelp (5-fold cross-validation) | Accuracy 68.6 | 36
Text Classification | Newsgroup Religion (5-fold cross-validation) | Accuracy 41.1 | 36
Text Classification | Drug Review Norethindrone (5-fold cross-validation) | Accuracy 46.4 | 36
Text Classification | Drug Review Norgestimate (5-fold cross-validation) | Accuracy 50.3 | 36
Text Classification | SMS Spam Collection (5-fold cross-validation) | Accuracy 86.4 | 36
Text Classification | Newsgroup Science (5-fold cross-validation) | Accuracy 0.254 | 36
Topic Modeling | Newsgroup Science | Cv 0.417 | 18
Document Clustering | Drug Review Norethindrone | Purity 46.4 | 18
Document Clustering | Yelp | Purity 68.6 | 18
Document Clustering | Newsgroup Religion | Purity 41.1 | 18
Showing 10 of 18 rows
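
The Document Clustering rows above report Purity, which this page does not define. As a rough illustration only, the sketch below assumes the standard definition: each cluster is credited with its most frequent true label, and purity is the fraction of documents that match their cluster's majority label. The function name `purity` and the toy data are hypothetical and are not taken from the paper or its evaluation code.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Clustering purity under the standard definition (an assumption here).

    Returns a fraction in (0, 1]; multiply by 100 for a percentage.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one document is required")

    # Count how often each true label appears inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(true_labels)


# Toy data (hypothetical): two clusters, each with two of three documents
# sharing the majority label, so four of six documents are counted.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "a"]
toy_score = purity(toy_clusters, toy_labels)
```

Because every cluster is scored by its own majority label, purity tends to rise as the number of clusters grows, so scores are only comparable at the same cluster count.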
