Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
About
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the question of whether similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective for training distributed representations of sentences. Our method outperforms state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi • 2017
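The method (Sent2Vec) composes a sentence embedding from the embeddings of the sentence's words and word n-grams, learned with a CBOW-style unsupervised objective. Below is a minimal sketch of the compositional step at inference time, assuming pre-trained unigram and bigram vectors are already available; all names here are illustrative and do not reflect the authors' released API.

```python
import numpy as np

def ngrams(tokens, n):
    """Return the list of word n-grams in a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def embed_sentence(sentence, vectors, max_n=2):
    """Compose a sentence embedding by averaging the vectors of its
    unigrams and higher-order n-grams (up to max_n), skipping any
    n-gram without a learned vector. `vectors` is a hypothetical dict
    mapping strings to np.ndarray, standing in for trained parameters."""
    tokens = sentence.lower().split()
    feats = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    vecs = [vectors[f] for f in feats if f in vectors]
    if not vecs:
        return np.zeros(next(iter(vectors.values())).shape)
    return np.mean(vecs, axis=0)

# Toy usage with random stand-in vectors (illustrative only).
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "the cat", "cat sat"]
vectors = {w: rng.standard_normal(5) for w in vocab}
print(embed_sentence("The cat sat", vectors).shape)  # (5,)
```

Averaging over n-gram features (rather than unigrams alone) is what lets the representation capture local word order while remaining cheap to compute.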
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Subjectivity Classification | Subj | Accuracy | 91.2 | 266 |
| Text Classification | TREC | Accuracy | 85.8 | 179 |
| Sentiment Classification | CR | Accuracy | 79.1 | 142 |
| Sentiment Analysis | CR | Accuracy | 81.2 | 123 |
| Text Classification | IMDB | Accuracy | 85.5 | 107 |
| Text Classification | MR | Accuracy | 76.3 | 93 |
| Word Similarity | WS-353 | Spearman Correlation | 0.7407 | 54 |
| Word Similarity | RG-65 | Spearman Correlation | 0.7811 | 35 |
| Word Similarity | RG-65 (test) | Spearman Correlation | 0.7811 | 33 |
| Text Classification | SST binary | Accuracy | 80.2 | 29 |
*Showing 10 of 26 rows.*
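The word-similarity rows (WS-353, RG-65) report Spearman correlation between human similarity judgments and the model's embedding similarities. A minimal sketch of that evaluation follows, assuming a dict of pre-trained word vectors; the data and helper names are illustrative, not part of the benchmark tooling.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity_score(pairs, vectors):
    """Spearman correlation between human ratings and model cosine
    similarities, as in WS-353 / RG-65 style evaluations. `pairs` is a
    list of (word1, word2, human_score); `vectors` maps words to
    np.ndarray (hypothetical pre-trained embeddings)."""
    human, model = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            human.append(score)
            model.append(cosine(vectors[w1], vectors[w2]))
    rho, _ = spearmanr(human, model)
    return rho

# Toy usage with made-up ratings and random vectors (illustrative only).
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(8) for w in ["king", "queen", "apple", "car"]}
pairs = [("king", "queen", 8.5), ("apple", "car", 1.5), ("king", "car", 2.0)]
print(round(word_similarity_score(pairs, vectors), 4))
```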