
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

About

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
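To make the dataset's structure concrete, here is a minimal sketch of one example in the released JSON format. The field names (`question`, `answer`, `type`, `supporting_facts`, `context`) follow the published HotpotQA files, but the values below are invented for illustration:

```python
# Illustrative HotpotQA example; field names follow the released JSON
# format, values are made up for demonstration.
example = {
    "question": "Which band formed first, Radiohead or Coldplay?",
    "answer": "Radiohead",
    "type": "comparison",        # "bridge" or "comparison" question
    "supporting_facts": [        # sentence-level supervision:
        ["Radiohead", 0],        # (article title, sentence index)
        ["Coldplay", 0],
    ],
    "context": [                 # list of (title, list of sentences)
        ["Radiohead", ["Radiohead are an English rock band formed in 1985."]],
        ["Coldplay", ["Coldplay are a British rock band formed in 1996."]],
    ],
}

# Join supporting_facts against context to recover the gold sentences
# a model must cite to explain its answer.
titles = dict(example["context"])
gold_sentences = [titles[title][idx]
                  for title, idx in example["supporting_facts"]]
print(gold_sentences)
```

The `supporting_facts` field is what enables the strong supervision and explainability described above: a system is scored not only on its answer, but on whether it recovers exactly these sentences.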

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning · 2018

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-hop Question Answering | HotpotQA fullwiki setting (test) | Answer F1: 34.4 | 64 |
| Question Answering | HotpotQA distractor (dev) | Answer F1: 58.3 | 45 |
| Multi-hop Question Answering | HotpotQA fullwiki setting (dev) | Answer F1: 34.36 | 38 |
| Question Answering | HotpotQA (test) | Answer F1: 32.9 | 37 |
| Question Answering | HotpotQA distractor setting (test) | Answer F1: 59.02 | 34 |
| Question Answering | HotpotQA full wiki (dev) | F1: 34.4 | 20 |
| Supporting Fact Prediction | HotpotQA full wiki (dev) | F1: 41 | 19 |
| Supporting Fact Prediction | HotpotQA distractor (dev) | F1: 66.7 | 13 |
| Question Answering | HotpotQA Full Wiki hidden (test) | F1: 32.9 | 12 |
| Supporting Facts Prediction | HotpotQA Full Wiki hidden (test) | F1: 37.7 | 11 |
Showing 10 of 14 rows
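The "Answer F1" reported above is a token-level F1 between the predicted and gold answer strings. As a rough sketch, it can be computed with SQuAD-style normalization (lowercasing, stripping punctuation and articles); the helper names here are illustrative, not part of any official script:

```python
from collections import Counter
import re
import string

def normalize(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def answer_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer string."""
    pred_toks = normalize(prediction).split()
    gold_toks = normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(answer_f1("the Radiohead band", "Radiohead"))  # ≈ 0.667
```

An exact match scores 1.0; a prediction with extra tokens is penalized through precision, which is why partial-credit F1 rather than exact match is the headline metric on these leaderboards.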
