RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking

About

In various natural language processing tasks, passage retrieval and passage re-ranking are two key procedures for finding and ranking relevant information. Since both procedures contribute to the final performance, it is important to optimize them jointly so that each improves the other. In this paper, we propose a novel joint training approach for dense passage retrieval and passage re-ranking. A major contribution is dynamic listwise distillation, in which we design a unified listwise training approach for both the retriever and the re-ranker. During dynamic distillation, the retriever and the re-ranker are adaptively improved according to each other's relevance information. We also propose a hybrid data augmentation strategy to construct diverse training instances for the listwise training approach. Extensive experiments show the effectiveness of our approach on both the MSMARCO and Natural Questions datasets. Our code is available at https://github.com/PaddlePaddle/RocketQA.
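To make the idea of listwise distillation concrete, here is a minimal sketch for a single query. It rests on assumptions the abstract does not spell out: retriever and re-ranker scores over the same candidate list are each softmax-normalized, the distillation term is the KL divergence from the retriever distribution to the re-ranker distribution, and the re-ranker also receives a listwise cross-entropy term on the labeled positive. The function names and the example scores are hypothetical and are not taken from the RocketQA codebase.

```python
import math


def softmax(scores):
    """Normalize raw relevance scores over one candidate list."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same candidate list."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def listwise_distillation_loss(retriever_scores, reranker_scores, positive_index):
    """Assumed joint loss: KL(retriever || re-ranker) + re-ranker cross-entropy."""
    p_retriever = softmax(retriever_scores)
    p_reranker = softmax(reranker_scores)
    kl = kl_divergence(p_retriever, p_reranker)
    # Supervised listwise term: negative log-probability of the positive passage.
    supervised = -math.log(p_reranker[positive_index])
    return kl + supervised


# Hypothetical scores for one query: candidate 0 is the positive, the rest negatives.
retriever_scores = [2.0, 1.0, 0.5, -1.0]
reranker_scores = [4.0, 0.5, 0.0, -2.0]
loss = listwise_distillation_loss(retriever_scores, reranker_scores, positive_index=0)
```

In an actual training loop the scores would come from differentiable models. The "dynamic" aspect described above would correspond to letting gradients from this loss update both the retriever and the re-ranker, rather than freezing one as a fixed teacher.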

Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qiaoqiao She, Hua Wu, Haifeng Wang, Ji-Rong Wen • 2021

Related benchmarks

Task	Dataset	Metric	Result
Semantic Textual Similarity	STS-B	Spearman's Rho (x100)	58.94	156
Information Retrieval	BEIR (test)	TREC-COVID Score	67.5	126
Passage retrieval	MsMARCO (dev)	MRR@10	38.8	116
Retrieval	MS MARCO (dev)	MRR@10	0.388	84
Passage Ranking	MS MARCO (dev)	MRR@10	41.9	73
Passage retrieval	Natural Questions (NQ) (test)	Top-20 Accuracy	83.7	45
Information Retrieval	MS MARCO DL2019	nDCG@10	72.5	26
Passage retrieval	MS MARCO (dev)	MRR@10	38.8	17
Semantic Relatedness	BEIR Semantic Relatedness Tasks (test)	ArguAna Score	45.1	16
Ranking	TREC Deep Learning 2019	NDCG@10	71.4	12

Showing 10 of 36 rows
