2Wiki

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | 2WIKI | F1 91.79 | 152 |
| Multi-hop Question Answering | 2Wiki | Exact Match 74.9 | 152 |
| Adversarial Attack | 2Wiki | ASR 78.8 | 30 |
| Multi-hop QA | 2Wiki | EM 62 | 26 |
| Multi-hop QA Retrieval | 2Wiki | Recall@2 90.1 | 23 |
| Multi-hop Question Answering | 2Wiki (test) | F1 Score 69.7 | 20 |
| Retrieval | 2Wiki | Recall@5 87 | 19 |
| Question Answering | 2Wiki 30K context | Accuracy 73.7 | 19 |
| Question Answering | 2Wiki 10K context | Accuracy 72.2 | 19 |
| Question Answering | 2Wiki 100K context | Accuracy 65.5 | 18 |
| Multi-Hop QA | 2Wiki | Accuracy 70.5 | 17 |
| Multi-hop Question Answering | 2Wiki | MBE 59 | 17 |
| Multi-Hop QA Verification | 2wiki | P@1 81.21 | 16 |
| Multi-Hop Question Answering | 2wiki 2018 Wikipedia dump (dev) | Accuracy (%) 43.6 | 14 |
| Question Answering | 2WIKI (val) | EM 27.2 | 14 |
| Question Answering | 2WIKI (out-of-domain) | EM 40 | 14 |
| Question Answering | 2Wiki 500 samples (val) | EM 39.6 | 14 |
| Multi-Hop Question Answering | 2Wiki | CEM (%) 71.7 | 12 |
| Multi-hop Question Answering | 2Wiki | Token Count 5,583 | 12 |
| Question Answering | 2Wiki (test) | EM Accuracy 61.8 | 12 |
| Query Rewriting & QA | 2Wiki BM25 | F1 33.6 | 12 |
| Open-domain Question Answering | 2WIKI | Accuracy 48.9 | 11 |
| Multi-Hop Question Answering | 2Wiki (out-of-domain) | Accuracy 42 | 10 |
| Expected Calibration Error | 2Wiki | ECE 17.43 | 10 |
| Multi-Hop QA | 2Wiki (test) | EM 57.5 | 10 |
Showing 25 of 46 rows