Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia

About

We study the task of generating from Wikipedia articles question-answer pairs that cover content beyond a single sentence. We propose a neural network approach that incorporates coreference knowledge via a novel gating mechanism. Compared to models that only take into account sentence-level information (Heilman and Smith, 2010; Du et al., 2017; Zhou et al., 2017), we find that the linguistic knowledge introduced by the coreference representation aids question generation significantly, producing models that outperform the current state-of-the-art. We apply our system (composed of an answer span extraction system and the passage-level QG system) to the 10,000 top-ranking Wikipedia articles and create a corpus of over one million question-answer pairs. We also provide a qualitative analysis for this large-scale generated corpus from Wikipedia.

Xinya Du, Claire Cardie • 2018
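
The abstract mentions a gating mechanism that controls how much coreference information reaches the question generator, but it does not give the formulation. As a rough illustration only, the sketch below shows a generic element-wise gate over a concatenated word and coreference vector. The function name `gated_fusion`, the sigmoid gate, and all parameter shapes are assumptions made for this example, not the authors' published model.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(word_vec, coref_vec, weights, bias):
    """Hypothetical gate over a word vector and a coreference vector.

    Assumed form (not taken from the paper):
        concat = [word_vec ; coref_vec]
        gate   = sigmoid(weights @ concat + bias)
        output = gate * concat   (element-wise)

    `weights` is a square list-of-lists with one row per element of
    `concat`; `bias` has one entry per element of `concat`.
    """
    concat = list(word_vec) + list(coref_vec)
    if len(weights) != len(concat) or len(bias) != len(concat):
        raise ValueError("weights and bias must match the concatenated size")

    gate = []
    for row, b in zip(weights, bias):
        if len(row) != len(concat):
            raise ValueError("each weight row must match the concatenated size")
        # Linear score for this dimension, squashed into (0, 1).
        score = sum(w * x for w, x in zip(row, concat)) + b
        gate.append(sigmoid(score))

    # Each dimension is scaled by its own gate value.
    return [g * x for g, x in zip(gate, concat)]
```

With all-zero weights and biases every gate value is 0.5, so the output is simply half of the concatenated input. In a trained model the weights would be learned so that the gate passes or suppresses the coreference signal on a per-token basis.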

Related benchmarks

Task | Dataset | Metric | Result | Rank
Question Generation | SQuAD 1.1 (test) | BLEU-4 | 15.16 | 29
Question Generation | SQuAD 1.1 | METEOR | 0.1912 | 21
Question Generation | SQuAD | BLEU-4 | 0.152 | 21
Question Generation | SQuAD Du | BLEU-4 | 15.16 | 10
Question Generation | Natural Questions (test) | QAE EM | 27.91 | 5
Question Generation | TriviaQA (test) | QAE EM | 21.32 | 5