
SQuAD

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | SQuAD v1.1 (dev) | F1 Score 95.8 | 380 |
| Question Answering | SQuAD v1.1 (test) | F1 Score 95.4 | 260 |
| Question Answering | SQuAD 2.0 | F1 89.4 | 190 |
| Question Answering | SQuAD v2.0 (dev) | F1 91.2 | 163 |
| Question Answering | SQuAD | F1 89.8 | 134 |
| Prompt Injection Defense | Inj-SQuAD | Combined ASR 0.11 | 123 |
| Question Answering | SQuAD (test) | F1 91.2 | 111 |
| Question Answering | SQuAD | Exact Match 93.33 | 83 |
| Question Answering | SQuAD v1.1 | F1 94.7 | 79 |
| Question Answering | SQuAD (dev) | F1 91 | 74 |
| Question Answering | SQuAD | ACE (General) 0.112 | 70 |
| Question Answering | SQuAD v1.1 (val) | F1 Score 96.22 | 70 |
| Machine Reading Comprehension | SQuAD | EM 89.9 | 58 |
| Machine Reading Comprehension | SQuAD 2.0 (dev) | EM 88.8 | 57 |
| Machine Reading Comprehension | SQuAD 2.0 (test) | EM 89.6 | 51 |
| Hallucination Detection | SQuAD (test) | AUROCr 83.8 | 48 |
| Machine Reading Comprehension | SQuAD 1.1 (dev) | EM 89.71 | 48 |
| Machine Reading Comprehension | SQuAD 1.1 (test) | EM 89.898 | 46 |
| Question Answering | SQuAD (test) | GPT Judge Accuracy 89 | 45 |
| Generation | SQuAD | F1 Score 88.3 | 44 |
| Open-domain question answering | SQUAD Open (test) | Exact Match 56.6 | 39 |
| Question Answering | SQuAD KRE-curated version | F1 Score 72.6 | 36 |
| Question Answering | SQuAD v2 | ASR Score 1 | 36 |
| Question Answering | SQuAD | F1 Score 71.4 | 36 |
| Extractive Question Answering | SQuAD 2.0 | F1 Score 92.9 | 34 |
Showing 25 of 187 rows
...