
2WikiMQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Multi-hop Question Answering | 2WikiMQA | F1 Score | 76.4 | 161 |
| Question Answering | 2WikiMQA | F1 | 74.9 | 44 |
| Long-context Question Answering | 2WikiMQA | SubEM | 79.5 | 36 |
| Multi-hop Reasoning | 2WikiMQA IRCoT 500 samples (test) | ACC | 52.8 | 27 |
| Multimodal Question Answering | 2WikiMQA | F1-Recall | 55.47 | 22 |
| Long-context Question Answering | 2WikiMQA (Passage Split) | Score | 52.53 | 18 |
| Long-context Question Answering | 2WikiMQA Fixed Chunk 2048 | QA Score | 52.53 | 18 |
| Question Answering | 2WikiMQA (test) | EM | 35.9 | 18 |
| Retrieval | 2WikiMQA (test) | Recall@K | 69.7 | 8 |
| Multi-hop Question Answering | 2WikiMQA (test) | Exact Match | 48.6 | 7 |
| Question Answering | 2WikiMQA (sampled) | Accuracy | 0.63 | 4 |
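Several rows above report answer-matching metrics (F1, EM / Exact Match, SubEM) on 2WikiMQA (2WikiMultihopQA). This page does not say how each source paper computed them, so the sketch below is only an illustration under stated assumptions: it uses the common SQuAD-style answer normalization (lowercase, strip punctuation and English articles, collapse whitespace), token-level F1, and one common definition of SubEM (the normalized gold answer appears as a substring of the normalized prediction). The function names are hypothetical, and the official dataset evaluation script and individual papers may differ in details.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style normalization; individual papers may normalize differently.
_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation, drop the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def substring_exact_match(prediction: str, gold: str) -> float:
    """SubEM (assumed definition): 1.0 if the normalized gold answer
    occurs inside the normalized prediction, else 0.0."""
    return float(normalize_answer(gold) in normalize_answer(prediction))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts shared tokens, respecting repeats.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
    print(f1_score("Paris, France", "Paris"))  # 0.6666666666666666
    print(substring_exact_match("It was born in Paris, France.", "Paris"))  # 1.0
```

Corpus-level scores such as those in the table are typically the mean of these per-question values, often reported as a percentage; that aggregation step is likewise an assumption here.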