On the Sentence Embeddings from Pre-trained Language Models

About

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, sentence embeddings taken from pre-trained language models without fine-tuning have been found to capture the semantic meaning of sentences poorly. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks. To address this issue, we propose to transform the anisotropic sentence embedding distribution into a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.

Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li • 2020
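
To make the idea of mapping an anisotropic embedding distribution onto an isotropic Gaussian concrete, here is a minimal sketch in plain Python. It is not the BERT-flow method: it swaps the learned normalizing flow for a single affine map (whitening). Whitening is the simplest invertible transform of this kind, and fitting an affine map by maximum likelihood against a standard Gaussian reduces to it. The function names and the toy 2-D "embeddings" are assumptions made for illustration only.

```python
import math
import random


def mean_vector(rows):
    """Per-dimension mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix (1/n normalisation) around mean mu."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def whiten(rows):
    """Affine map z = L^{-1} (u - mu); the outputs have zero mean and identity covariance."""
    mu = mean_vector(rows)
    low = cholesky(covariance(rows, mu))
    out = []
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        z = []
        for i in range(len(c)):  # forward substitution: solve L z = c
            s = sum(low[i][k] * z[k] for k in range(i))
            z.append((c[i] - s) / low[i][i])
        out.append(z)
    return out


# Hypothetical toy "embeddings": two strongly correlated dimensions with an offset.
random.seed(0)
demo_embeddings = []
for _ in range(500):
    x = random.gauss(0.0, 1.0)
    demo_embeddings.append([x + 3.0, 0.9 * x + 0.1 * random.gauss(0.0, 1.0) - 1.0])

demo_whitened = whiten(demo_embeddings)
```

After the transform, no direction in the space dominates the variance, which is the isotropy property the abstract targets. A learned flow is more expressive than this affine stand-in because it can also reshape non-Gaussian structure.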

Related benchmarks

Task	Dataset	Result
Semantic Textual Similarity	STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R) various (test)	STS12 Score 70.19	412
Semantic Textual Similarity	STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R)	STS12 Score 69.78	253
Semantic Textual Similarity	STS-B	Spearman's Rho (x100) 72.26	156
Semantic Textual Similarity	STS-12	Spearman Correlation (rho) 0.652	91
Semantic Textual Similarity	STS16 (test)	Spearman Corr 75.37	42
Semantic Textual Similarity	STS13 (test)	Spearman Correlation 72.14	42
Semantic Textual Similarity	STS14 (test)	Spearman Correlation 0.6842	42
Semantic Textual Similarity	STS15 (test)	Spearman Correlation 0.7377	42
Semantic Textual Similarity	STS 2014	Spearman Correlation 0.6942	39
Paraphrase Identification	TwitterPara (test)	TURL 76.5	22
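
Most rows above report a Spearman correlation, either as rho or as rho x 100. As a rough sketch of how such a score is typically computed for sentence embeddings, the snippet below scores each sentence pair by cosine similarity and rank-correlates those scores with human ratings. The helper names and the toy data are illustrative assumptions, not taken from the paper or the benchmark code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: three sentence pairs (as embedding pairs) with gold ratings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
]
toy_gold = [4.8, 2.5, 0.3]

toy_predicted = [cosine(u, v) for u, v in toy_pairs]
toy_rho = spearman(toy_predicted, toy_gold)
```

Because Spearman's rho depends only on rank order, any monotonic rescaling of the similarity scores leaves the result unchanged.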

