2WikiMultiHopQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMultiHopQA | EM | 71.7 | 278 |
| Multi-hop Question Answering | 2WikiMultiHopQA (test) | EM | 64.6 | 143 |
| Question Answering | 2WikiMultihopQA | EM | 47.7 | 73 |
| Multi-hop Question Answering | 2WikiMultiHopQA Out-Of-Distribution (OOD) | Accuracy | 74.2 | 72 |
| Question Answering | 2WikiMultihopQA (test) | F1 | 78.9 | 69 |
| Long-context Question Answering | 2WikiMultiHopQA (Out-Of-Distribution) | Accuracy | 63.9 | 54 |
| Multi-hop QA Retrieval | 2WikiMultihopQA (test) | R@2 | 80.8 | 28 |
| Question Answering | 2WikiMultihopQA | Accuracy | 62.5 | 25 |
| Multi-hop Question Answering | 2WikiMultiHopQA N=200 | Judge EM | 77 | 24 |
| Latent multi-hop reasoning | 2WikiMultiHopQA | Precision | 96.86 | 22 |
| Multi-hop Question Answering | 2WikiMultiHopQA Full | Accuracy (C) | 87.5 | 22 |
| Question Answering | 2WikiMultihopQA | LLM-Acc | 89.7 | 20 |
| End-to-end Question Answering | 2WikiMultiHopQA (test val) | EM | 35.44 | 20 |
| Knowledge-Intensive Reasoning | 2wikiMultiHopQA | F1 Score | 76.1 | 18 |
| Knowledge-Intensive Reasoning | 2WikiMultiHopQA | Accuracy | 48.8 | 18 |
| Multi-hop Question Answering | 2WikiMultiHopQA (2WikiMQA) (official evaluation) | Exact Match (EM) | 31.8 | 17 |
| Multi-Hop Question Answering | 2WikiMultiHopQA out-of-domain (val test) | Exact Match (EM) | 51.7 | 15 |
| Multi-hop Question Answering | 2WikiMultiHopQA in-domain (test) | Accuracy (Response) | 69.8 | 14 |
| Question Answering | 2WikiMultiHopQA December 2018 Wikipedia dump (test) | EM | 28.6 | 14 |
| Question Answering | 2WikiMultiHopQA 1,000 queries (test) | EM | 71.1 | 13 |
| Question Answering | 2WikiMultihopQA | Prefilling Speedup Ratio | 3.57 | 12 |
| Multi-step Retrieval | 2WikiMultihopQA (val) | F1 Score | 68.02 | 11 |
| Question Answering | 2WikiMultiHopQA out-domain (test) | LasJ | 56.32 | 11 |
| Multi-hop Question Answering | 2WikiMultiHopQA (dev) | Exact Match Accuracy | 68.6 | 11 |
| Multi-hop Question Answering | 2WikiMultiHopQA v1.0 (test) | Task Latency (s) | 2.29 | 9 |
Showing 25 of 40 rows