
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

About

Learning sentence embeddings often requires a large amount of labeled data. However, for most tasks and domains, labeled data is seldom available and creating it is expensive. In this work, we present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE) which outperforms previous approaches by up to 6.4 points. It can achieve up to 93.1% of the performance of in-domain supervised approaches. Further, we show that TSDAE is a strong domain adaptation and pre-training method for sentence embeddings, significantly outperforming other approaches like Masked Language Model. A crucial shortcoming of previous studies is the narrow evaluation: Most work mainly evaluates on the single task of Semantic Textual Similarity (STS), which does not require any domain knowledge. It is unclear if these proposed methods generalize to other domains and tasks. We fill this gap and evaluate TSDAE and other recent approaches on four different datasets from heterogeneous domains.

Kexin Wang, Nils Reimers, Iryna Gurevych • 2021
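
A minimal sketch of how training data for a sequential denoising auto-encoder could be prepared: each sentence is corrupted, and the model is trained to reconstruct the original from the embedding of the corrupted input. The abstract above does not state the noise type, so token deletion with a ratio of 0.6 is an assumption here, and the helper names `delete_tokens` and `make_training_pairs` are hypothetical, not part of any released code.

```python
import random
from typing import List, Optional, Tuple


def delete_tokens(sentence: str, ratio: float = 0.6,
                  seed: Optional[int] = None) -> str:
    """Corrupt a sentence by dropping each whitespace token with probability `ratio`.

    Assumption: deletion noise on whitespace tokens; the real method may
    tokenize and corrupt differently.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    if not tokens:
        return sentence
    kept = [tok for tok in tokens if rng.random() >= ratio]
    if not kept:
        # Keep one token so the encoder never receives an empty input.
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def make_training_pairs(sentences: List[str], ratio: float = 0.6,
                        seed: int = 0) -> List[Tuple[str, str]]:
    """Return (corrupted, original) pairs: encoder input and decoder target."""
    rng = random.Random(seed)
    return [(delete_tokens(s, ratio, rng.randrange(2**32)), s) for s in sentences]


if __name__ == "__main__":
    for noisy, clean in make_training_pairs(
        ["labeled data is seldom available and creating it is expensive"]
    ):
        print(repr(noisy), "->", repr(clean))
```

Only unlabeled sentences are needed to build these pairs, which is what makes the approach unsupervised; the encoder and decoder themselves are omitted from this sketch.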

Related benchmarks

Task | Dataset | Result | Rank
Information Retrieval | BEIR (test) | TREC-COVID Score: 0.688 | 76
Semantic Textual Similarity | STS-B | Spearman's Rho (x100): 80.8 | 70
Information Retrieval | BEIR | TREC-COVID: 0.688 | 59
Scientific Document Retrieval | SciDocs (dev) | Cite: 75.6 | 22
Paraphrase Identification | TwitterPara (test) | TURL: 76.8 | 22
Question Retrieval | AskUbuntu (dev) | AP: 59.4 | 22
Question Retrieval | CQADupStack (dev) | Average Precision: 0.145 | 22
Duplicate Question Detection | Quora | nDCG@10: 52.7 | 12
Semantic Similarity | USEB (Universal Sentence Encoder Benchmark) | AskU AP: 53.8 | 12
Domain-specific Information Retrieval and Paraphrase Identification | Domain-specific tasks (Average of AskUbuntu, CQADupStack, TwitterPara, SciDocs) (test dev) | Average Precision: 55.2 | 8
