MKQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | MKQA | fEM 47.6 | 27 |
| Cross-lingual retrieval | MKQA | Avg. Recall@100 76.6 | 27 |
| Multilingual Retrieval-Augmented Generation | MKQA 1.0 (test) | Accuracy (AR) 0.6255 | 18 |
| Confidence Estimation | MKQA (test) | AUROC 0.82 | 14 |
| Cross-lingual Question Answering | MKQA Average across languages | fEM 45.92 | 14 |
| Cross-lingual Question Answering | MKQA Arabic | fEM 24.12 | 14 |
| Cross-lingual Question Answering | MKQA Thai | fEM 27.83 | 14 |
| Cross-lingual Question Answering | MKQA French | fEM 59.67 | 14 |
| Cross-lingual Question Answering | MKQA English | fEM 72.07 | 14 |
| Open-domain Question Answering | MKQA | Accuracy 15.94 | 12 |
| Question Retrieval | MKQA (full) | Retrieval Accuracy 29.9 | 12 |
| Multilingual Knowledge Question Answering | MKQA (test) | F1-score (All) 52.3 | 10 |
| Downstream Generation and Ranking Alignment | MKQA (test) | 3-gram Recall 49.9 | 8 |
| Retrieval | MKQA eng | nDCG@1 15.1 | 6 |
| Confidence Estimation | MKQA Japanese ja (test) | AUROC 83 | 5 |
| Confidence Estimation | MKQA Russian / ru (test) | AUROC 80 | 5 |
| Confidence Estimation | MKQA Spanish es (test) | AUROC 82 | 5 |
| Information Retrieval | MKQA eng | nDCG@10 35.2 | 5 |
| Open-domain question answering | MKQA English Language (test) | NOB Accuracy 45.42 | 5 |
| Open-domain question answering | MKQA Target Language (test) | NOB Accuracy 40.64 | 5 |
| Confidence Estimation | MKQA Japanese | AUROC 77 | 2 |
| Confidence Estimation | MKQA Russian | AUROC 0.78 | 2 |
| Confidence Estimation | MKQA Polish | AUROC 0.74 | 2 |
| Confidence Estimation | MKQA Spanish | AUROC 77 | 2 |
| Confidence Estimation | MKQA English | AUROC 76 | 2 |
Showing 25 of 27 rows