From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective
About
Neural retrievers based on dense representations combined with Approximate Nearest Neighbors search have recently received a lot of attention, owing their success to distillation and/or better sampling of training examples -- while still relying on the same backbone architecture. Meanwhile, sparse representation learning, fueled by traditional inverted indexing techniques, has seen growing interest, inheriting desirable IR priors such as explicit lexical matching. While some architectural variants have been proposed, less effort has been put into the training of such models. In this work, we build on SPLADE -- a sparse expansion-based retriever -- and show to what extent it can benefit from the same training improvements as dense models, by studying the effect of distillation, hard-negative mining, and Pre-trained Language Model initialization. We further study the link between effectiveness and efficiency, in both in-domain and zero-shot settings, leading to state-of-the-art results in both scenarios for sufficiently expressive models.
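As a rough illustration of the sparse expansion the abstract refers to, the sketch below builds a SPLADE-style vocabulary-sized representation by applying a log-saturation to ReLU'd token-level MLM logits and max-pooling over the sequence, then scores a query/document pair by dot product. This is a minimal NumPy sketch with random stand-in logits (a real model would produce them from a Pre-trained Language Model's MLM head); the exact pooling and weighting here follow the common SPLADE-max formulation and are an assumption, not the paper's verbatim implementation.

```python
import numpy as np

def splade_rep(mlm_logits):
    """Sparse SPLADE-style representation (assumed SPLADE-max variant).

    mlm_logits: (seq_len, vocab_size) array of token-level MLM logits.
    Returns a non-negative (vocab_size,) vector: ReLU + log-saturation,
    then max-pooling over the sequence dimension.
    """
    return np.max(np.log1p(np.maximum(mlm_logits, 0.0)), axis=0)

# Stand-in logits; a real system would get these from an MLM head.
rng = np.random.default_rng(0)
query_rep = splade_rep(rng.normal(size=(5, 100)))   # short query, toy vocab of 100
doc_rep = splade_rep(rng.normal(size=(30, 100)))    # longer document

# Relevance score is the dot product of the two sparse vectors.
score = float(query_rep @ doc_rep)
```

Because the representations are non-negative and vocabulary-aligned, they can be served from a standard inverted index, which is the efficiency angle the abstract's effectiveness/efficiency trade-off refers to.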
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Document Ranking | TREC DL Track 2019 (test) | nDCG@10 | 73.2 | 133 |
| Information Retrieval | BEIR (test) | -- | -- | 90 |
| Retrieval | MS MARCO (dev) | MRR@10 | 0.389 | 84 |
| Retrieval | TREC DL 2019 | nDCG@10 | 73 | 83 |
| Reranking | MS MARCO (dev) | MRR@10 | 0.38 | 71 |
| Information Retrieval | BEIR | Average nDCG@10 | 0.507 | 62 |
| Information Retrieval | MS MARCO | -- | -- | 56 |
| Information Retrieval | TREC DL 2020 | nDCG@10 | 71.8 | 33 |
| Information Retrieval | FIQA (BEIR, test) | nDCG@10 | 34.7 | 32 |
| Information Retrieval | SciFact (BEIR, test) | nDCG@10 | 70.4 | 31 |