All-but-the-Top: Simple and Effective Postprocessing for Word Representations
About
Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a *very simple*, and yet counter-intuitive, postprocessing technique -- eliminate the common mean vector and a few top dominating directions from the word vectors -- that renders off-the-shelf representations *even stronger*. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets, with a variety of representation methods and hyperparameter choices, and in multiple languages; in each case, the processed representations are consistently better than the original ones.
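A minimal sketch of the described postprocessing, assuming NumPy and an embedding matrix with one word vector per row; the number of removed directions is a tunable parameter (the paper suggests on the order of `dim / 100`), and the loader name below is hypothetical:

```python
import numpy as np

def all_but_the_top(X: np.ndarray, n_components: int) -> np.ndarray:
    """Remove the common mean vector and the top dominating
    directions (principal components) from word vectors.

    X: (vocab_size, dim) matrix of word embeddings.
    n_components: number of top directions to remove
        (roughly dim // 100 per the paper's rule of thumb).
    """
    # 1. Subtract the common mean vector from every word vector.
    X_centered = X - X.mean(axis=0)

    # 2. Find the top principal components of the centered vectors;
    #    SVD of the centered matrix gives them as the rows of Vt.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    U = Vt[:n_components]  # (n_components, dim)

    # 3. Project out the dominating directions.
    return X_centered - X_centered @ U.T @ U

# Illustrative usage with 300-dimensional vectors:
# X = load_embeddings("glove.6B.300d.txt")  # hypothetical loader
# X_post = all_but_the_top(X, 300 // 100)
```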
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic Textual Similarity | STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R), various (test) | STS12 Score: 58.35 | 393 |
| Multilingual Information Retrieval | XQuAD | -- | 21 |
| Semantic Similarity | STS-B (test) | Semantic Consistency: 56.98 | 18 |
| Semantic Textual Similarity | JSTS (test) | JSTS Score: 57.14 | 7 |
| Multilingual Information Retrieval | Belebele | Arabic nDCG@20: 0.4194 | 4 |