BM25S: Orders of magnitude faster lexical search via eager sparse scoring

About

We introduce BM25S, an efficient Python-based implementation of BM25 that depends only on NumPy and SciPy. BM25S achieves up to a 500x speedup compared to the most popular Python-based framework by eagerly computing BM25 scores during indexing and storing them in sparse matrices. It also achieves considerable speedups compared to highly optimized Java-based implementations, which are used by popular commercial products. Finally, BM25S reproduces the exact implementation of five BM25 variants based on Kamphuis et al. (2020) by extending eager scoring to non-sparse variants using a novel score shifting method. The code can be found at https://github.com/xhluca/bm25s
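The idea of eager scoring described above can be illustrated with a minimal sketch. This is not the BM25S code: it uses plain Python dictionaries in place of SciPy sparse matrices so that it has no dependencies, it assumes a Lucene-style scoring formula with default parameters k1=1.5 and b=0.75, and the names `build_index` and `search` are hypothetical. The point it shows is that every (token, document) score is computed once at indexing time, so a query only sums stored values.

```python
import math
from collections import Counter, defaultdict


def build_index(corpus_tokens, k1=1.5, b=0.75):
    """Eagerly compute a BM25 score for every (token, document) pair that occurs.

    Assumes a non-empty corpus of pre-tokenized documents (lists of strings).
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in corpus_tokens for tok in set(doc))

    # Sparse storage: token -> {doc_id: precomputed score}. Pairs that never
    # occur are simply absent, which stands in for a sparse matrix here.
    index = defaultdict(dict)
    for doc_id, doc in enumerate(corpus_tokens):
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        for tok, tf in Counter(doc).items():
            df = doc_freq[tok]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            index[tok][doc_id] = idf * tf / (tf + length_norm)
    return index, n_docs


def search(index, n_docs, query_tokens, k=3):
    """Score a query by summing stored values; no BM25 math happens here."""
    scores = [0.0] * n_docs
    for tok in query_tokens:
        for doc_id, score in index.get(tok, {}).items():
            scores[doc_id] += score
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]
```

Calling `build_index` on a list of tokenized documents and then `search(index, n_docs, ["cat"])` returns the top document ids with their summed scores. In a sparse-matrix version, the per-token dictionaries would become rows or columns of a single matrix and the query step would reduce to slicing and summing.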

Xing Han Lù · 2024

Related benchmarks

Task                    Dataset                 Result                   Rank
Information Retrieval   BEIR (test)             FiQA-2018 Score 0.171    90
Tool Retrieval          ToolRet                 Avg nDCG@10 27.25        21
Tool Retrieval          ToolRet (Customized)    N@10 31.9                11
Tool Retrieval          ToolRet (Code)          N@10 29.5                11
Information Retrieval   ArguAna                 QPS 573.9                9
Information Retrieval   Quora                   QPS 183.5                9
Information Retrieval   MTEB (test)             MRR (Avg Med) 0.532      9
Information Retrieval   NQ                      QPS 41.85                8
Information Retrieval   MSMARCO                 QPS 12.2                 7
Information Retrieval   ClimateFEVER            --                       6

Showing 10 of 22 rows
