2Wiki

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | 2WIKI | EM 86 | 241 |
| Multi-hop Question Answering | 2Wiki | Exact Match 74.9 | 215 |
| Class-level Continual Learning | 2Wiki | Average Accuracy (AA) 71.74 | 56 |
| Question Answering | 2Wiki (test) | EM Accuracy 61.8 | 49 |
| Multi-Hop Question Answering | 2Wiki | Accuracy (2Wiki) 43.6 | 44 |
| Multi-hop QA | 2Wiki | EM 62 | 42 |
| Retrieval | 2Wiki | Recall@5 96.13 | 42 |
| Multi-hop Question Answering | 2Wiki (test) | F1 Score 69.7 | 34 |
| Adversarial Attack | 2Wiki | ASR 78.8 | 30 |
| Question Answering | 2Wiki 100K context | Accuracy 78.91 | 25 |
| Multi-hop QA Retrieval | 2Wiki | Recall@2 90.1 | 23 |
| Multi-hop Question Answering | 2Wiki 18 | Exact Match (EM) 43.6 | 20 |
| Multi-Hop Question Answering | 2Wiki | Token-Level F1 58.7 | 20 |
| Question Answering | 2Wiki 30K context | Accuracy 73.7 | 19 |
| Question Answering | 2Wiki 10K context | Accuracy 72.2 | 19 |
| Multi-hop Question Answering | 2Wiki | pass@1 58 | 18 |
| Multi-Hop Question Answering | 2Wiki | Exact Match (EM) 46.9 | 18 |
| Multi-Hop QA | 2Wiki | Accuracy 70.5 | 17 |
| Multi-hop Question Answering | 2Wiki | MBE 59 | 17 |
| Multi-Hop Search-augmented Question Answering | 2Wiki | Success Rate 35.6 | 16 |
| Multi-hop Question Answering | 2Wiki | EM 48.89 | 16 |
| Multi-Hop QA Verification | 2wiki | P@1 81.21 | 16 |
| Monolingual Question Answering | 2WIKI | fEM 61.13 | 14 |
| Question Answering | 2Wiki (In-Distribution) | Accuracy 77 | 14 |
| Multi-Hop Question Answering | 2wiki 2018 Wikipedia dump (dev) | Accuracy (%) 43.6 | 14 |
Showing 25 of 73 rows