2WikiMultiHopQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multi-hop Question Answering | 2WikiMultiHopQA | EM 82.1 | 559 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM 73.9 | 226 |
| Question Answering | 2WikiMultihopQA (test) | F1 78.9 | 113 |
| Question Answering | 2WikiMultihopQA | EM 47.7 | 107 |
| Multi-hop Question Answering | 2WikiMultiHopQA Out-Of-Distribution (OOD) | Accuracy 74.2 | 72 |
| Open-domain Question Answering | 2WikiMultiHopQA in-domain | F1 Score 62.6 | 57 |
| Long-context Question Answering | 2WikiMultiHopQA (Out-Of-Distribution) | Accuracy 63.9 | 54 |
| Question Answering | 2WikiMultiHopQA | Exact Match 43 | 50 |
| Knowledge Retrieval | 2WikiMultihopQA | F1 Score 56.46 | 45 |
| Multi-hop Question Answering | 2WikiMultiHopQA | String Accuracy 70.3 | 44 |
| Multi-hop Question Answering | 2WikiMultiHopQA (val) | Exact Match (EM) 69.3 | 44 |
| Multi-hop QA Retrieval | 2WikiMultihopQA (test) | R@5 97.2 | 33 |
| Question Answering | 2WikiMultihopQA LongBench | F1 Score 59.73 | 32 |
| Multi-hop Question Answering | 2WikiMultiHopQA | Token F1 Score 65.9 | 30 |
| Reasoning | 2WikiMultiHopQA (OOD) | Degeneration Count 0 | 27 |
| Question Answering | 2WikiMultiHopQA (OOD) | Exact Match (EM) 2.21 | 27 |
| Question Answering | 2WikiMultihopQA | EM 36.8 | 27 |
| Question Answering | 2WikiMultihopQA | Accuracy 62.5 | 25 |
| Multi-hop Question Answering | 2WikiMultiHopQA online Google Search API (test val) | Exact Match 63.5 | 24 |
| Multi-hop Question Answering | 2WikiMultiHopQA offline Wiki-18 (test val) | Exact Match 43.6 | 24 |
| Multi-hop Question Answering | 2WikiMultiHopQA N=200 | Judge EM 77 | 24 |
| Knowledge composition selection | 2WikiMultihopQA | Precision @ K=2 100 | 23 |
| Latent multi-hop reasoning | 2WikiMultiHopQA | Precision 96.86 | 22 |
| Multi-hop Question Answering | 2WikiMultiHopQA Full | Accuracy (C) 87.5 | 22 |
| Retrieval | 2WikiMultiHopQA v1 (test) | R@2E85 | 21 |

Showing 25 of 88 rows