
TyDiQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multilingual Question Answering | TydiQA | Accuracy 81.9 | 65 |
| Multilingual Question Answering | TyDiQA 1-shot macro-averaged | F1 Score (1-shot macro) 48.86 | 28 |
| Question Answering | TyDiQA | Exact Match 52.14 | 28 |
| Multilingual Question Answering | TyDiQA GoldP (val) | Ar Score 80 | 20 |
| Question Answering | TyDiQA GoldP | F1 Score 89.4 | 20 |
| Multilinguality | TydiQA | F1 Score 70.8 | 16 |
| Hallucination Detection | TyDiQA (test) | AUROC 88.4 | 14 |
| Multilingual Understanding | TydiQA (test) | Accuracy 47.78 | 12 |
| Question Answering | TyDiQA GoldP (test) | F1 Score 87.7 | 12 |
| Performance Prediction | TyDiQA | MAE 4.29 | 9 |
| Zero-shot performance prediction | TyDiQA | MAE 3.42 | 9 |
| Hallucination Detection | TyDiQA-GP | AUC ROC 0.9404 | 8 |
| Question Answering | TyDiQA | Score 53.56 | 6 |
| Multilingual Question Answering | TydiQA | F1 Score 48.71 | 4 |
| Question Answering | TyDiQA (test) | Average Score 72.4 | 4 |