Discriminative Topic Mining via Category-Name Guided Text Embedding
About
Mining a set of meaningful and distinctive topics automatically from massive text corpora has broad applications. Existing topic models, however, typically work in a purely unsupervised way and often generate topics that do not fit users' particular needs and yield suboptimal performance on downstream tasks. We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora. This new task not only helps users understand clearly and distinctively the topics they are most interested in, but also directly benefits keyword-driven classification tasks. We develop CatE, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category-representative terms in an iterative manner. We conduct a comprehensive set of experiments to show that CatE mines a high-quality set of topics guided by category names only, and benefits a variety of downstream applications including weakly-supervised classification and lexical entailment direction identification.
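To make the idea of iteratively discovering category-representative terms more concrete, the following is a minimal toy sketch. It is not the CatE algorithm: CatE learns the embedding space jointly with the term discovery, whereas this sketch assumes fixed, hand-made 2-D word vectors and swaps in a simple heuristic (cosine similarity to a category's own centroid minus the highest similarity to any other category's centroid). The function name `expand_categories`, the `rounds` parameter, and the toy vocabulary are all hypothetical and chosen only for illustration.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (assumes it is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _centroid(vectors):
    """Unit-length mean of a list of vectors."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return _normalize(mean)


def _cos(u, v):
    """Cosine similarity; both inputs are assumed to be unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def expand_categories(embeddings, category_names, rounds=2):
    """Grow one term list per category, adding one term per category per round.

    Illustrative heuristic only (an assumption, not the paper's objective):
    a candidate is scored by similarity to its own category centroid minus
    its best similarity to any other category centroid.
    Requires at least two category names, each present in `embeddings`.
    """
    unit = {w: _normalize(v) for w, v in embeddings.items()}
    topics = {c: [c] for c in category_names}  # seed each topic with its name
    assigned = set(category_names)
    for _ in range(rounds):
        # Centroids are recomputed once per round from the current term lists.
        centroids = {c: _centroid([unit[w] for w in terms])
                     for c, terms in topics.items()}
        for c in category_names:
            best_word, best_margin = None, -math.inf
            for w, vec in unit.items():
                if w in assigned:
                    continue
                own = _cos(vec, centroids[c])
                other = max(_cos(vec, centroids[o])
                            for o in category_names if o != c)
                if own - other > best_margin:
                    best_word, best_margin = w, own - other
            if best_word is not None:
                topics[c].append(best_word)
                assigned.add(best_word)
    return topics


# Hypothetical toy vectors: first axis ~ "sports", second axis ~ "politics".
toy_embeddings = {
    "sports": [1.0, 0.1], "politics": [0.1, 1.0],
    "soccer": [0.9, 0.2], "tennis": [0.95, 0.05],
    "election": [0.2, 0.9], "senate": [0.05, 0.95],
    "news": [0.7, 0.7],  # generic word, close to both categories
}
topics = expand_categories(toy_embeddings, ["sports", "politics"], rounds=2)
print(topics)
```

The margin score is what makes the toy example discriminative rather than merely relevant: a generic word such as `news` is similar to both categories, so its margin stays near zero and it is passed over while more distinctive terms remain.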
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text Classification | AGNews | Accuracy | 82 | 119 |
| Text Classification | 20News | Accuracy | 59.6 | 101 |
| Text Classification | DBLP | Accuracy | 51.8 | 9 |
| Hierarchical Topic Mining | NYT | TC | 0.0149 | 7 |
| Hierarchical Topic Mining | arXiv | TC | 0.0066 | 7 |
| Hierarchical Text Classification | NYT | Macro F1 | 50.3 | 7 |
| Hierarchical Text Classification | arXiv | Macro F1 Score | 40.1 | 4 |