PLAID: An Efficient Engine for Late Interaction Retrieval

About

Pre-trained language models are increasingly important components across multiple information retrieval (IR) paradigms. Late interaction, introduced with the ColBERT model and recently refined in ColBERTv2, is a popular paradigm that holds state-of-the-art status across many benchmarks. To dramatically reduce the search latency of late interaction, we introduce the Performance-optimized Late Interaction Driver (PLAID). Without impacting quality, PLAID swiftly eliminates low-scoring passages using a novel centroid interaction mechanism that treats every passage as a lightweight bag of centroids. PLAID uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly optimized engine to reduce late interaction search latency by up to 7× on a GPU and 45× on a CPU against vanilla ColBERTv2, while continuing to deliver state-of-the-art retrieval quality. This allows the PLAID engine with ColBERTv2 to achieve latency of tens of milliseconds on a GPU and tens to a few hundred milliseconds on a CPU, even at the largest scale we evaluate, 140M passages.
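
The abstract describes centroid interaction and centroid pruning only at a high level. The following is a minimal Python sketch of one way these two ideas could fit together, not the PLAID implementation. All function names, the toy vectors, and the threshold value are hypothetical, and scoring an empty bag as 0.0 is an assumption made for illustration. Each passage is represented only by the ids of the centroids its tokens map to, and the score is a sum, over query tokens, of the best-matching centroid in that bag.

```python
# Illustrative sketch only: names, data, and the threshold are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centroid_scores(query_vecs, centroids):
    """S[i][c] = similarity of query token i to centroid c."""
    return [[dot(q, c) for c in centroids] for q in query_vecs]

def prune_centroids(scores, threshold):
    """Keep centroid ids whose best score over all query tokens reaches the threshold."""
    num_centroids = len(scores[0])
    return {c for c in range(num_centroids)
            if max(row[c] for row in scores) >= threshold}

def centroid_interaction(scores, passage_centroid_ids, kept):
    """Approximate score of a passage viewed as a bag of centroid ids:
    for each query token take the max over surviving centroids, then sum."""
    ids = [c for c in passage_centroid_ids if c in kept]
    if not ids:
        return 0.0  # assumption: a fully pruned bag scores zero
    return sum(max(row[c] for c in ids) for row in scores)

def rank_passages(query_vecs, centroids, passages, threshold=0.5, top_k=2):
    """Rank passages by the approximate score and keep the top_k candidates."""
    scores = centroid_scores(query_vecs, centroids)
    kept = prune_centroids(scores, threshold)
    ranked = sorted(
        passages,
        key=lambda pid: centroid_interaction(scores, passages[pid], kept),
        reverse=True,
    )
    return ranked[:top_k]

# Toy data: three 2-D centroids, two query token vectors, three passages.
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query_vecs = [[1.0, 0.0], [0.6, 0.8]]
passages = {"p0": [0, 1], "p1": [2], "p2": [1, 2]}

candidates = rank_passages(query_vecs, centroids, passages)
print(candidates)
```

In this sketch only the surviving candidates would go on to full late-interaction scoring over their token embeddings. The cheap centroid pass exists to shrink that candidate set.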

Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Information Retrieval | BEIR (test) | -- | -- | 126 |
| Information Retrieval | MS-MARCO (test) | -- | -- | 56 |
| Zero-shot Information Retrieval | BEIR | NFCorpus NDCG@10 (Zero-shot) | 33.8 | 38 |
| End-to-end Retrieval | LoTTE | Latency (ms) | 288 | 26 |
| End-to-end Retrieval | MSMARCO | Latency (ms) | 222 | 18 |
| Semantic Relatedness | BEIR Semantic Relatedness Tasks (test) | ArguAna Score | 42.06 | 16 |
| Information Retrieval | LoTTE Search (test) | Lifestyle Score | 84.3 | 9 |
| Information Retrieval | LoTTE Forum (test) | IR Score (Lifestyle) | 76.7 | 9 |
| Information Retrieval | Quora | QPS | 89 | 9 |
| Information Retrieval | ArguAna | QPS | 76 | 9 |
Showing 10 of 18 rows
