Beyond Correlations: A Downstream Evaluation Framework for Query Performance Prediction

About

The standard practice of query performance prediction (QPP) evaluation is to measure a set-level correlation between the estimated retrieval qualities and the true ones. However, this correlation-based measure neither quantifies QPP effectiveness at the level of individual queries nor connects to a downstream application, so QPP methods that yield high correlation values may not be useful for query-specific decisions in an IR pipeline. In this paper, we propose a downstream-focussed evaluation framework in which the distribution of QPP estimates across the lists of top documents retrieved with several rankers is used as priors for IR fusion. On the one hand, a distribution of these estimates that closely matches that of the true retrieval qualities indicates the quality of the predictor; on the other hand, their use as priors indicates a predictor's ability to make informed choices in an IR pipeline. Our experiments first establish the importance of QPP estimates in weighted IR fusion, yielding substantial improvements of over 4.5% relative to unweighted CombSUM and RRF fusion strategies, and second reveal that the downstream effectiveness of QPP does not correlate well with the standard correlation-based QPP evaluation.

Payel Santra, Partha Basuchowdhuri, Debasis Ganguly • 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
Information Retrieval	TREC DL 19	nDCG@10	77	61
Information Retrieval	TREC DL 20	AP@100	53.4	28

