Multi-hop Reading Comprehension through Question Decomposition and Rescoring
About
Multi-hop Reading Comprehension (RC) requires reasoning and aggregation across several paragraphs. We propose a system for multi-hop RC that decomposes a compositional question into simpler sub-questions that can be answered by off-the-shelf single-hop RC models. Since annotations for such decomposition are expensive, we recast sub-question generation as a span prediction problem and show that our method, trained using only 400 labeled examples, generates sub-questions that are as effective as human-authored sub-questions. We also introduce a new global rescoring approach that considers each decomposition (i.e., the sub-questions and their answers) to select the best final answer, greatly improving overall performance. Our experiments on HotpotQA show that this approach achieves state-of-the-art results while providing explainable evidence for its decisions in the form of sub-questions.
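The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[ANSWER]` placeholder, the hard-coded candidate spans, the lookup-table "single-hop model", and the toy rescoring function are all assumptions made for illustration. It only shows the general shape of the idea: a predicted span splits the question into two sub-questions, a single-hop model answers each in turn, and a rescorer chooses among the candidate decompositions.

```python
from typing import Callable, List, Tuple

# (sub-question 1, answer 1, sub-question 2, answer 2)
Decomposition = Tuple[str, str, str, str]


def decompose_by_span(tokens: List[str], start: int, end: int) -> Tuple[str, str]:
    """Split a question using an inclusive token span [start, end].

    The span itself becomes the first sub-question; the rest of the question,
    with the span replaced by a placeholder, becomes a template for the second.
    """
    sub_q1 = " ".join(tokens[start:end + 1])
    template = " ".join(tokens[:start] + ["[ANSWER]"] + tokens[end + 1:])
    return sub_q1, template


def answer_multi_hop(
    question: str,
    candidate_spans: List[Tuple[int, int]],
    single_hop: Callable[[str], str],
    rescore: Callable[[Decomposition], float],
) -> Decomposition:
    """Answer each candidate decomposition, then keep the highest-scoring one."""
    tokens = question.split()
    candidates: List[Decomposition] = []
    for start, end in candidate_spans:
        sub_q1, template = decompose_by_span(tokens, start, end)
        ans1 = single_hop(sub_q1)
        sub_q2 = template.replace("[ANSWER]", ans1)
        ans2 = single_hop(sub_q2)
        candidates.append((sub_q1, ans1, sub_q2, ans2))
    return max(candidates, key=rescore)


# Toy stand-ins (hypothetical): a lookup table instead of a trained RC model,
# and a rescorer that simply prefers decompositions where both hops answered.
toy_answers = {
    "the author of Hamlet": "William Shakespeare",
    "Where was William Shakespeare born ?": "Stratford-upon-Avon",
}


def toy_single_hop(q: str) -> str:
    return toy_answers.get(q, "")


def toy_rescore(d: Decomposition) -> float:
    return float(bool(d[1]) and bool(d[3]))


best = answer_multi_hop(
    "Where was the author of Hamlet born ?",
    candidate_spans=[(4, 5), (2, 5)],  # in the paper's setting these would be predicted
    single_hop=toy_single_hop,
    rescore=toy_rescore,
)
print(best)
```

In this toy run the span `(2, 5)` yields the sub-questions "the author of Hamlet" and "Where was William Shakespeare born ?", which also serve as a readable trace of how the final answer was reached.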
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-hop Question Answering | HotpotQA fullwiki setting (test) | Answer F1 40.65 | 64 |
| Answer extraction and supporting sentence prediction | HotpotQA fullwiki (test) | Answer EM 30 | 48 |
| Question Answering | HotpotQA distractor (dev) | Answer F1 70.6 | 45 |
| Multi-hop Question Answering | HotpotQA fullwiki setting (dev) | Answer F1 43.3 | 38 |
| Question Answering | HotpotQA distractor setting (test) | Answer F1 69.63 | 34 |
| Question Answering | HotpotQA full wiki (dev) | F1 43.3 | 20 |
| Question Answering | HotpotQA Full Wiki hidden (test) | F1 40.7 | 12 |
| Multi-hop Reading Comprehension | HotpotQA distractor (test) | F1 Score 69.63 | 6 |
| Sequence-based Question Decomposition | QDTrees (test) | EM 86.2 | 6 |
| Multi-hop Reading Comprehension | HotpotQA distractor setting (dev) | All Score 70.57 | 5 |