2WikiMHQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multi-hop Question Answering | 2WikiMHQA | F1 Score: 85.56 | 73
Multi-hop Reasoning | 2WikiMHQA | AUROC: 0.7002 | 26
Multi-hop Question Answering | 2WikiMHQA (test) | EM: 71.85 | 17
Multi-hop Question Answering | 2WikiMHQA in-distribution | Exact Match (EM): 79.39 | 17
Question Answering | 2WikiMHQA Covered Questions | Accuracy: 66.9 | 8
Question Answering | 2WikiMHQA All Questions | Accuracy: 61 | 8
Multi-hop QA | 2WikiMHQA | Average Violation: 0.56 | 4
Question Answering | 2WikiMHQA KET-RAG | Accuracy: 64.8 | 2
Multi-hop Question Answering | 2WikiMHQA in-distribution v4 (test) | EM: - | 0