TyDiQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multilingual Question Answering | TydiQA | Accuracy | 81.9 | 65 |
| Hallucination Detection | TyDiQA-GP | AUC ROC | 0.9404 | 46 |
| Multilingual Question Answering | TyDiQA 1-shot macro-averaged | F1 Score (1-shot macro) | 48.86 | 28 |
| Question Answering | TyDiQA | Exact Match | 52.14 | 28 |
| Instruction Tuning | TyDiQA | Accuracy | 67.41 | 20 |
| Multilingual Question Answering | TyDiQA GoldP (val) | Ar Score | 80 | 20 |
| Question Answering | TyDiQA GoldP | F1 Score | 89.4 | 20 |
| Multilinguality | TydiQA | F1 Score | 70.8 | 16 |
| Hallucination Detection | TyDiQA (test) | AUROC | 88.4 | 14 |
| Multilingual Understanding | TydiQA (test) | Accuracy | 47.78 | 12 |
| Question Answering | TyDiQA GoldP (test) | F1 Score | 87.7 | 12 |
| Question Answering | TYDIQA | Accuracy | 56.62 | 11 |
| Performance Prediction | TyDiQA | MAE | 4.29 | 9 |
| Zero-shot performance prediction | TyDiQA | MAE | 3.42 | 9 |
| Question Answering | TyDiQA | Score | 53.56 | 6 |
| Multilingual Question Answering | TydiQA | F1 Score | 48.71 | 4 |
| Question Answering | TyDiQA (test) | Average Score | 72.4 | 4 |