
All-but-the-Top: Simple and Effective Postprocessing for Word Representations

About

Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, yet counter-intuitive, postprocessing technique -- eliminate the common mean vector and a few top dominating directions from the word vectors -- that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification), on multiple datasets, with a variety of representation methods and hyperparameter choices, and in multiple languages; in each case, the processed representations are consistently better than the original ones.
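The recipe the abstract describes can be sketched in a few lines of NumPy: subtract the common mean vector, find the top-D principal directions of the centered vectors, and remove the projections onto them. The function name `all_but_the_top` and the choice D=2 here are illustrative (the paper treats D as a hyperparameter, on the order of the dimension divided by 100), not the authors' reference implementation.

```python
import numpy as np

def all_but_the_top(X, D=2):
    """Postprocess word vectors X (num_words x dim), a sketch of the
    paper's recipe: (1) remove the common mean vector, (2) remove the
    projections onto the top-D dominating (principal) directions.
    D is a hyperparameter; D=2 is only an illustrative default."""
    mu = X.mean(axis=0)
    Xc = X - mu                       # step 1: center the vectors
    # Top-D principal directions of the centered matrix via SVD;
    # rows of Vt are the right singular vectors (shape D x dim).
    Vt = np.linalg.svd(Xc, full_matrices=False)[2][:D]
    # Step 2: subtract the component of each vector lying in the
    # span of the top-D directions.
    return Xc - Xc @ Vt.T @ Vt
```

After processing, the vectors have zero mean and zero energy along the removed directions; everything else (and hence the remaining linguistic structure) is untouched.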

Jiaqi Mu, Suma Bhat, Pramod Viswanath • 2017

Related benchmarks

Task | Dataset | Result | Rank
Semantic Textual Similarity | STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R), various (test) | STS12 Score: 58.35 | 393
Multilingual Information Retrieval | XQuAD | -- | 21
Semantic Similarity | STS-B (test) | Semantic Consistency: 56.98 | 18
Semantic Textual Similarity | JSTS (test) | JSTS Score: 57.14 | 7
Multilingual Information Retrieval | Belebele | Arabic nDCG@20: 0.4194 | 4
