Generative and Discriminative Text Classification with Recurrent Neural Networks
About
We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However, we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts, the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings, which show that generative models substantially outperform discriminative models.
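The generative classifier discussed above fits one class-conditional language model per label and classifies a document x via Bayes' rule, argmax_y log p(x | y) + log p(y), while a discriminative model trains log p(y | x) directly. A minimal sketch of the generative decision rule with a toy vanilla RNN follows; the weights are random and untrained, and all names, sizes, and the tiny architecture here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, NUM_CLASSES = 20, 16, 2  # toy sizes (assumed, not from the paper)

def make_rnn_lm():
    """Random parameters for a tiny vanilla-RNN language model (untrained toy)."""
    return {
        "E": rng.normal(0, 0.1, (VOCAB, HIDDEN)),   # token embeddings
        "W": rng.normal(0, 0.1, (HIDDEN, HIDDEN)),  # recurrent weights
        "O": rng.normal(0, 0.1, (HIDDEN, VOCAB)),   # output projection
    }

def log_likelihood(params, tokens):
    """log p(x | y) under one class-conditional RNN LM: sum of next-token log-probs."""
    h = np.zeros(HIDDEN)
    total = 0.0
    for prev, nxt in zip(tokens[:-1], tokens[1:]):
        h = np.tanh(params["E"][prev] + params["W"] @ h)   # recurrent update
        logits = h @ params["O"]
        m = np.max(logits)                                  # stable log-softmax
        log_probs = logits - (m + np.log(np.sum(np.exp(logits - m))))
        total += log_probs[nxt]
    return total

# One language model per class, plus a uniform class prior p(y).
class_lms = [make_rnn_lm() for _ in range(NUM_CLASSES)]
log_prior = np.log(np.full(NUM_CLASSES, 1.0 / NUM_CLASSES))

def generative_classify(tokens):
    """Bayes-rule generative classification: argmax_y log p(x | y) + log p(y)."""
    scores = [log_likelihood(lm, tokens) + lp
              for lm, lp in zip(class_lms, log_prior)]
    return int(np.argmax(scores)), scores

doc = [3, 7, 1, 12, 5]  # a toy token-id sequence standing in for a document
label, scores = generative_classify(doc)
```

Because each class keeps its own language model, adding a new class only requires fitting one more p(x | y), which is why this factorization lends itself to the zero-shot and continual learning settings studied in the paper.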
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text Classification | AG News (test) | -- | 210 |
| Text Classification | Yahoo! Answers (test) | -- | 133 |
| Text Classification | DBPedia (test) | Test Error Rate: 0.013 | 40 |
| Document Classification | Yelp Polarity | Accuracy: 92.6 | 25 |
| Document Classification | Yahoo Answers | Accuracy: 73.7 | 23 |
| Text Classification | Yelp Full (test) | Test Error Rate: 40.4 | 20 |
| Text Classification | Yelp 2 | Accuracy: 92.6 | 12 |
| Text Classification | Yelp Polarity (test) | -- | 11 |
| Text Classification | Sogou News (test) | Error Rate: 5.1 | 8 |
| Document Classification | Yelp Full | Accuracy: 59.6 | 5 |