Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms
About
Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation of the added value of such sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study of Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, against word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging. The source code and datasets can be obtained from https://github.com/dinghanshen/SWEM.
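The pooling operations described above are parameter-free and straightforward to implement. The following is a minimal sketch in NumPy, assuming a sentence is represented as a `(seq_len, dim)` matrix of word embeddings; the function names and the window size are illustrative, not taken from the released code:

```python
import numpy as np

def swem_aver(emb):
    # Average pooling: mean of the word embeddings along the sequence axis.
    return emb.mean(axis=0)

def swem_max(emb):
    # Max pooling: per-dimension maximum over all words, which lets each
    # dimension be traced back to the word that activated it most.
    return emb.max(axis=0)

def swem_hier(emb, window=5):
    # Hierarchical pooling: average within each local window of `window`
    # consecutive words (preserving n-gram information), then max-pool
    # over the resulting window representations.
    n = emb.shape[0]
    if n <= window:
        return emb.mean(axis=0)
    windows = np.stack(
        [emb[i:i + window].mean(axis=0) for i in range(n - window + 1)]
    )
    return windows.max(axis=0)
```

All three functions map a variable-length sequence to a fixed-size `dim`-dimensional vector, which can then be fed directly to a classifier.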
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | SNLI (test) | Accuracy | 83.8 | 681 |
| Subjectivity Classification | Subj | Accuracy | 93 | 266 |
| Text Classification | AG News (test) | Accuracy | 92.66 | 210 |
| Text Classification | TREC | Accuracy | 92.2 | 179 |
| Natural Language Inference | SNLI | Accuracy | 83.8 | 174 |
| Text Classification | Yahoo! Answers (test) | Clean Accuracy | 73.53 | 133 |
| Subjectivity Classification | Subj (test) | Accuracy | 93 | 125 |
| Question Classification | TREC (test) | Accuracy | 92.2 | 124 |
| Text Classification | MR (test) | Accuracy | 78.2 | 99 |
| Sentiment Classification | Stanford Sentiment Treebank SST-2 (test) | Accuracy | 84.3 | 99 |