Discriminative Topic Mining via Category-Name Guided Text Embedding

About

Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way and often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps users understand clearly and distinctively the topics they are most interested in, but also directly benefits keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines a high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.
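
To make "discover category representative terms in an iterative manner" concrete, here is a toy sketch of the general idea of expanding each category name into a term list. It is not the CatE algorithm: CatE learns the embedding space jointly with the terms, whereas this sketch uses fixed vectors and a simple cosine-margin rule. The 2-D vectors, the function names (cosine, centroid, mine_terms), and the margin rule are all assumptions made for illustration.

```python
import math

# Toy 2-D word vectors, invented for illustration only (not from the paper).
TOY_VECTORS = {
    "sports": (1.0, 0.1),
    "soccer": (0.9, 0.2),
    "tennis": (0.95, 0.05),
    "politics": (0.1, 1.0),
    "election": (0.2, 0.9),
    "senate": (0.05, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def centroid(words, vectors):
    """Component-wise mean of the vectors of the given words."""
    dims = len(vectors[words[0]])
    return tuple(sum(vectors[w][d] for w in words) / len(words) for d in range(dims))


def mine_terms(category_names, vectors, rounds=2):
    """Hypothetical helper: grow one term list per category name.

    Each round, every category adds the unassigned word whose similarity to
    its own centroid most exceeds its best similarity to any other category
    (a simple stand-in for "discriminative"). Centroids are recomputed at
    the start of each round from the terms found so far.
    """
    topics = {name: [name] for name in category_names}
    assigned = set(category_names)
    for _ in range(rounds):
        centers = {name: centroid(terms, vectors) for name, terms in topics.items()}
        for name in category_names:
            best_word, best_margin = None, 0.0
            for word, vec in vectors.items():
                if word in assigned:
                    continue
                own = cosine(vec, centers[name])
                other = max(
                    (cosine(vec, centers[o]) for o in category_names if o != name),
                    default=0.0,
                )
                if own - other > best_margin:
                    best_word, best_margin = word, own - other
            if best_word is not None:
                topics[name].append(best_word)
                assigned.add(best_word)
    return topics


if __name__ == "__main__":
    print(mine_terms(["sports", "politics"], TOY_VECTORS))
```

The margin rule keeps a word out of a category when it is equally close to a competing one, which is the intuition behind mining distinctive rather than merely related terms. The only user input is the list of category names, which is also the only guidance the abstract describes.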

Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han • 2019

Related benchmarks

Task                              Dataset   Metric           Result   Rank
Text Classification               AGNews    Accuracy         82       119
Text Classification               20News    Accuracy         59.6     101
Text Classification               DBLP      Accuracy         51.8     9
Hierarchical Topic Mining         NYT       TC               0.0149   7
Hierarchical Topic Mining         arXiv     TC               0.0066   7
Hierarchical Text Classification  NYT       Macro F1         50.3     7
Hierarchical Text Classification  arXiv     Macro F1 Score   40.1     4
