
SQuAD

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | SQuAD v1.1 (dev) | F1 Score 95.8 | 375 |
| Question Answering | SQuAD v1.1 (test) | F1 Score 95.4 | 260 |
| Question Answering | SQuAD 2.0 | F1 89.4 | 190 |
| Question Answering | SQuAD v2.0 (dev) | F1 91.2 | 158 |
| Question Answering | SQuAD | F1 89.8 | 127 |
| Question Answering | SQuAD (test) | F1 91.2 | 111 |
| Question Answering | SQuAD v1.1 | F1 94.7 | 79 |
| Question Answering | SQuAD (dev) | F1 91 | 74 |
| Question Answering | SQuAD v1.1 (val) | F1 Score 96.22 | 70 |
| Machine Reading Comprehension | SQuAD | EM 89.9 | 58 |
| Machine Reading Comprehension | SQuAD 2.0 (dev) | EM 88.8 | 57 |
| Machine Reading Comprehension | SQuAD 2.0 (test) | EM 89.6 | 51 |
| Question Answering | SQuAD | Exact Match 93.33 | 50 |
| Hallucination Detection | SQuAD (test) | AUROCr 83.8 | 48 |
| Machine Reading Comprehension | SQuAD 1.1 (dev) | EM 89.71 | 48 |
| Machine Reading Comprehension | SQuAD 1.1 (test) | EM 89.898 | 46 |
| Question Answering | SQuAD (test) | GPT Judge Accuracy 89 | 45 |
| Generation | SQuAD | F1 Score 88.3 | 44 |
| Open-domain question answering | SQUAD Open (test) | Exact Match 56.6 | 39 |
| Question Answering | SQuAD | F1 Score 71.4 | 36 |
| Extractive Question Answering | SQuAD 2.0 | F1 Score 92.9 | 34 |
| Question Answering | SQuAD 2.0 (test) | EM 89.7 | 34 |
| Calibration | SQuAD | ECE 5.87 | 31 |
| Open-domain Question Answering | SQuAD Open-domain 1.1 (test) | Exact Match (EM) 61.8 | 30 |
| Question Generation | SQuAD 1.1 (test) | BLEU-4 25.8 | 29 |
Showing 25 of 156 rows