Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

About

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state of the art on sentence-pair regression tasks like semantic textual similarity (STS). However, they require that both sentences be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.

Nils Reimers, Iryna Gurevych • 2019
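
The "about 50 million" figure in the abstract follows from counting unordered pairs: for n = 10,000 sentences there are n(n-1)/2 = 49,995,000 pairs, and a cross-encoder such as BERT needs one forward pass per pair. With fixed sentence embeddings, each sentence is encoded once and pairs are compared with cosine similarity, which is a cheap vector operation. The following is a minimal sketch of that idea, not the authors' implementation: the function names are hypothetical, and the toy vectors are hand-written stand-ins for embeddings rather than output of any model.

```python
import math
from itertools import combinations


def num_pairs(n: int) -> int:
    """Number of unordered pairs among n sentences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pair(embeddings):
    """Return (i, j, score) for the pair with the highest cosine similarity.

    Brute-force scan over all pairs; still quadratic in the number of
    sentences, but each step is a vector comparison, not a model inference.
    """
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


# Hand-written toy vectors standing in for sentence embeddings (assumption).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]

print(num_pairs(10_000))  # 49995000, i.e. about 50 million
i, j, score = most_similar_pair(toy_embeddings)
print(i, j)  # indices of the most similar pair
```

In practice the embeddings would come from a sentence encoder, and the pairwise scan would typically be replaced by a vectorized matrix product or an approximate nearest-neighbor index.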

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	HellaSwag	Accuracy 75.2	1896
Node Classification	Citeseer	Accuracy 60.52	1037
Mathematical Reasoning	MATH	Accuracy 12.43	882
Node Classification	Pubmed	Accuracy 36.04	865
Reasoning	BBH	Accuracy 63.05	726
Natural Language Inference	SNLI (test)	Accuracy 77	694
Natural Language Inference	RTE	Accuracy 60.2	590
Node Classification	Cora	Accuracy 59.67	583
Question Answering	ARC-E	Accuracy 62.9	523
Node Classification	Citeseer	Accuracy 72.93	503

