Multi-Hop QA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-Hop Question Answering | Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle) | HotpotQA Score: 59.8 | 48 |
| Multi-Hop Question Answering | Multi-Hop QA (HotpotQA, 2Wiki, Musique, Bamboogle) (test) | HotpotQA Score: 57.02 | 44 |
| Multi-Hop Question Answering | Multi-Hop QA Aggregate | Average Score: 45.3 | 30 |
| Multi-Hop Question Answering | Multi-Hop QA | Accuracy: 48.5 | 29 |
| Multi-hop Question Answering | Multi-Hop QA | 2Wiki Accuracy: 89.34 | 22 |
| Multi-Hop Question Answering | Multi-Hop QA Average | EM: 0.3775 | 20 |
| Multi-Hop Question Answering | Multi-Hop QA Full Context | Accuracy: 86.06 | 5 |
| Multi-Hop Question Answering | Multi-Hop QA Partial Context | Accuracy: 72.1 | 5 |
| Multi-Hop Question Answering | Multi-Hop QA Zero-Shot | Accuracy (Zero-Shot): 66.6 | 5 |