Relevance-guided Supervision for OpenQA with ColBERT

About

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
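The contrast between coarse-grained and fine-grained scoring in the abstract can be made concrete with a small sketch. ColBERT scores a passage by late interaction: each question token embedding is matched to its most similar passage token embedding, and those maxima are summed (MaxSim). The code below uses hand-written two-dimensional toy vectors in place of real BERT token embeddings. The function names are hypothetical, and the mean-pooled `single_vector_score` is only an assumed stand-in for a single-vector retriever, not the method of any particular system.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, passage_vecs):
    """Late-interaction score: for every query token, take the best cosine
    match among the passage tokens, then sum those maxima."""
    q = [normalize(v) for v in query_vecs]
    p = [normalize(v) for v in passage_vecs]
    return sum(max(dot(qv, pv) for pv in p) for qv in q)


def single_vector_score(query_vecs, passage_vecs):
    """Assumed baseline for illustration: mean-pool each side into one
    vector and compare the two pooled vectors by cosine similarity."""
    def mean_pool(vecs):
        dim = len(vecs[0])
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    return dot(normalize(mean_pool(query_vecs)), normalize(mean_pool(passage_vecs)))


# Toy data (hypothetical): the question has two distinct "token" directions.
question = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 1.0]]  # matches each question token exactly
passage_b = [[1.0, 1.0]]              # one token that is a blend of both

for name, passage in [("A", passage_a), ("B", passage_b)]:
    print(name,
          round(maxsim_score(question, passage), 3),
          round(single_vector_score(question, passage), 3))
```

In this toy setup the pooled single-vector scores of the two passages are the same, while MaxSim ranks passage A higher because every question token finds an exact match there. Real systems operate on learned, higher-dimensional embeddings and an index over the whole corpus, so this is only an illustration of the scoring rule.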

Omar Khattab, Christopher Potts, Matei Zaharia • 2020

Related benchmarks

Task	Dataset	Metric	Result
Open Question Answering	Natural Questions (NQ) (test)	Exact Match (EM)	47.8	134
Information Retrieval	BEIR (test)	TREC-COVID Score	67.7	126
Open-domain Question Answering	TriviaQA (test)	Exact Match	70.1	80
Open-domain Question Answering	TriviaQA open (test)	EM	63.2	59
Question Answering	TriviaQA	EM	70.1	10
Global document retrieval	SQuAD	Recall@5	88.2	9
Open-domain Question Answering	NaturalQuestions (test)	Top-1 EM	48.2	9
Global document retrieval	TriviaQA	Recall@5	65.4	9
Global document retrieval	PAQ	Recall@5	83.4	9
Global document retrieval	NQ	Recall@5	0.713	9
