
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

About

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbor methods has proven to work well. Meanwhile, there has been growing interest in learning sparse representations for documents and queries that could inherit desirable properties of bag-of-words models, such as exact term matching and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple and trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency by controlling the contribution of the sparsity regularization.
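The two ingredients named above (a log-saturation effect on term weights and an explicit sparsity regularizer) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function names, array shapes, and the FLOPS-style regularizer form are assumptions for the sake of the example.

```python
import numpy as np

def splade_repr(token_logits):
    """Collapse per-token vocabulary logits into one sparse lexical vector.

    token_logits: array of shape (seq_len, vocab_size), e.g. MLM-head logits.
    The log(1 + relu(w)) transform keeps only positive evidence and
    dampens very large weights (the log-saturation effect).
    """
    saturated = np.log1p(np.maximum(token_logits, 0.0))
    return saturated.sum(axis=0)  # aggregate over the sequence

def sparsity_reg(batch_reprs):
    """FLOPS-style regularizer: penalize the squared mean activation of
    each vocabulary dimension across the batch, pushing dimensions that
    fire rarely toward exactly zero."""
    return float((batch_reprs.mean(axis=0) ** 2).sum())

def score(query_repr, doc_repr):
    """Ranking score is a simple dot product over the sparse vectors,
    which an inverted index can evaluate via exact term matching."""
    return float(np.dot(query_repr, doc_repr))
```

Note how any token whose logit is non-positive contributes nothing to the representation, so increasing the weight on the regularizer directly trades effectiveness for sparser (cheaper-to-index) vectors, which is the efficiency/effectiveness trade-off the abstract mentions.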

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant • 2021

Related benchmarks

Task                          | Dataset                      | Metric    | Result | Rank
------------------------------|------------------------------|-----------|--------|-----
Passage retrieval             | MsMARCO (dev)                | MRR@10    | 32.2   | 116
Retrieval                     | TREC DL 2019                 | NDCG@10   | 66.5   | 71
Question Answering            | Scientific QA Base setting   | F1 Score  | 42.12  | 38
Information Retrieval         | Scientific QA Base setting   | HitRate@1 | 36.83  | 38
Information Retrieval         | BEIR v1 (test)               | ArguAna   | 44.7   | 22
Information Retrieval         | Gov 500K                     | nDCG@5    | 0.437  | 21
Information Retrieval         | ClueWeb 500K                 | nDCG@5    | 22.72  | 21
Scientific Question Answering | SciRAG-SSLI easy 1.0 (test)  | F1 Score  | 45.48  | 19
Scientific Question Answering | SciRAG-SSLI hard 1.0 (test)  | F1 Score  | 45.07  | 19
Reranking                     | SciRAG-SSLI easy 1.0 (test)  | HitRate@1 | 48.6   | 19

(Showing 10 of 23 rows)
