
Simple and Effective Multi-Paragraph Reading Comprehension

About

We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data. Experiments demonstrate strong performance on several document QA datasets. Overall, we are able to achieve a score of 71.3 F1 on the web portion of TriviaQA, a large improvement from the 56.7 F1 of the previous best system.
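The core idea of the shared-normalization objective can be sketched as follows. Rather than applying a softmax to candidate answer spans within each paragraph separately, the span scores from all paragraphs sampled from the same document are normalized by a single softmax, so span scores become comparable across paragraphs. This is a minimal illustrative sketch, not the paper's implementation; the function name, flat score lists, and the simplification of treating each span as a single score are assumptions.

```python
import math

def shared_norm_loss(paragraph_scores, answer_flags):
    """Sketch of a shared-normalization loss over multiple paragraphs.

    paragraph_scores: list (one entry per sampled paragraph) of lists of
        candidate answer-span scores.
    answer_flags: parallel structure of booleans marking spans that match
        the gold answer.
    Returns the negative log of the total probability assigned to correct
    spans under one softmax shared across all paragraphs.
    """
    # Flatten scores and gold-answer flags across all sampled paragraphs.
    scores = [s for para in paragraph_scores for s in para]
    flags = [f for para in answer_flags for f in para]
    # One partition function shared by every paragraph of the document.
    z = sum(math.exp(s) for s in scores)
    # Probability mass on all spans matching the gold answer.
    p_correct = sum(math.exp(s) for s, f in zip(scores, flags) if f) / z
    return -math.log(p_correct)
```

Because normalization is shared, a paragraph that does not contain the answer can lower the score it assigns to all of its spans, which is what yields the well-calibrated per-paragraph confidences the abstract describes.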

Christopher Clark, Matt Gardner · 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Question Answering | SQuAD (test) | F1 | 81.1 | 111 |
| Question Answering | Natural Questions (NQ) (dev) | F1 | 46.1 | 72 |
| Machine Reading Comprehension | SQuAD 2.0 (dev) | EM | 65.1 | 57 |
| Machine Reading Comprehension | SQuAD 2.0 (test) | EM | 59.3 | 51 |
| Machine Reading Comprehension | Molweni (test) | EM | 42.5 | 49 |
| Reading Comprehension | SQuAD (dev) | F1 | 0.808 | 15 |
| Question Answering | TriviaQA Web domain Verified (test) | EM | 79.97 | 11 |
| Machine Comprehension | TriviaQA Wikipedia Verified (test) | EM | 68 | 7 |
| Machine Reading Comprehension | TriviaQA Web (Verified) | EM | 79.97 | 7 |
| Machine Reading Comprehension | TriviaQA Web (Full) | EM | 66.37 | 7 |

Showing 10 of 16 rows

Other info

Code
