Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
About
BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks such as semantic textual similarity (STS). However, they require both sentences to be fed into the network together, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. This construction makes BERT unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy of BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
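The cost difference above can be illustrated with a minimal sketch in plain Python (standard library only). A cross-encoder must score every unordered pair, n(n-1)/2 of them, which for 10,000 sentences is 49,995,000 (about 50 million) forward passes. A bi-encoder like SBERT instead encodes each sentence once and compares the resulting vectors with cosine similarity. The embeddings here are assumed to be given as plain lists of floats, and the function names `num_pairs`, `cosine_similarity`, and `most_similar_pair` are hypothetical illustrations, not part of SBERT or any library.

```python
import math
from itertools import combinations


def num_pairs(n: int) -> int:
    """Unordered sentence pairs a cross-encoder would have to score: n(n-1)/2."""
    return n * (n - 1) // 2


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pair(embeddings: list[list[float]]) -> tuple[int, int, float]:
    """Return (i, j, score) for the pair of embeddings with the highest cosine similarity.

    Assumption: `embeddings` were already produced by some sentence encoder,
    one vector per sentence (n encoder passes in total). Only the cheap
    vector comparisons below remain quadratic in n.
    """
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best
```

The pairwise loop is still quadratic, but each step is a dot product over precomputed vectors rather than a full network forward pass, which is where the speed-up described in the abstract comes from.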
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 75.2 | 1460 |
| Natural Language Inference | SNLI (test) | Accuracy | 77 | 681 |
| Reasoning | BBH | Accuracy | 63.05 | 507 |
| Semantic Textual Similarity | STS tasks (STS12, STS13, STS14, STS15, STS16, STS-B, SICK-R) various (test) | STS12 Score | 74.53 | 393 |
| Natural Language Inference | RTE | Accuracy | 60.2 | 367 |
| Question Answering | OBQA | Accuracy | 47.2 | 276 |
| Subjectivity Classification | Subj | Accuracy | 94.5 | 266 |
| Question Answering | ARC-E | Accuracy | 62.9 | 242 |
| Reading Comprehension | BoolQ | Accuracy | 73.6 | 219 |
| Sentiment Classification | SST2 (test) | Accuracy | 87.65 | 214 |