Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
About
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the question of whether similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective for training distributed representations of sentences. Our method outperforms state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Matteo Pagliardini, Prakhar Gupta, Martin Jaggi • 2017
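The method (Sent2Vec) composes a sentence embedding from the embeddings of the sentence's words and word n-grams, learned with a CBOW-style unsupervised objective. Below is a minimal sketch of the compositional step at inference time, assuming pre-trained unigram and bigram vectors are already available; all names here are illustrative and do not reflect the authors' released API.

```python
import numpy as np

def ngrams(tokens, n):
    """Return the list of word n-grams in a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def embed_sentence(sentence, vectors, max_n=2):
    """Compose a sentence embedding by averaging the vectors of its
    unigrams and higher-order n-grams (up to max_n), skipping any
    n-gram without a learned vector. `vectors` is a hypothetical dict
    mapping strings to np.ndarray, standing in for trained parameters."""
    tokens = sentence.lower().split()
    feats = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    vecs = [vectors[f] for f in feats if f in vectors]
    if not vecs:
        return np.zeros(next(iter(vectors.values())).shape)
    return np.mean(vecs, axis=0)

# Toy usage with random stand-in vectors (illustrative only).
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "the cat", "cat sat"]
vectors = {w: rng.standard_normal(5) for w in vocab}
print(embed_sentence("The cat sat", vectors).shape)  # (5,)
```

Averaging over n-gram features (rather than unigrams alone) is what lets the representation capture local word order while remaining cheap to compute.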
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Subjectivity Classification | Subj | Accuracy | 91.2 | 266 |
| Text Classification | TREC | Accuracy | 85.8 | 179 |
| Sentiment Classification | CR | Accuracy | 79.1 | 142 |
| Sentiment Analysis | CR | Accuracy | 81.2 | 123 |
| Text Classification | IMDB | Accuracy | 85.5 | 107 |
| Text Classification | MR | Accuracy | 76.3 | 93 |
| Word Similarity | WS-353 | Spearman Correlation | 0.7407 | 54 |
| Word Similarity | RG-65 | Spearman Correlation | 0.7811 | 35 |
| Word Similarity | RG-65 (test) | Spearman Correlation | 0.7811 | 33 |
| Text Classification | SST binary | Accuracy | 80.2 | 29 |
*Showing 10 of 26 rows.*
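The word-similarity rows (WS-353, RG-65) report Spearman correlation between human similarity judgments and the model's embedding similarities. A minimal sketch of that evaluation follows, assuming a dict of pre-trained word vectors; the data and helper names are illustrative, not part of the benchmark tooling.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity_score(pairs, vectors):
    """Spearman correlation between human ratings and model cosine
    similarities, as in WS-353 / RG-65 style evaluations. `pairs` is a
    list of (word1, word2, human_score); `vectors` maps words to
    np.ndarray (hypothetical pre-trained embeddings)."""
    human, model = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            human.append(score)
            model.append(cosine(vectors[w1], vectors[w2]))
    rho, _ = spearmanr(human, model)
    return rho

# Toy usage with made-up ratings and random vectors (illustrative only).
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(8) for w in ["king", "queen", "apple", "car"]}
pairs = [("king", "queen", 8.5), ("apple", "car", 1.5), ("king", "car", 2.0)]
print(round(word_similarity_score(pairs, vectors), 4))
```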