NaturalQA

Benchmarks

Task Name                   Dataset Name        SOTA Result            Trend
Uncertainty Estimation      NaturalQA           AUROC 75.5             30
Question Answering          NaturalQA           EM 38.92               26
Abstention Classification   NaturalQA (test)    Accuracy (Abs=0) 100   9
Question Answering          NaturalQA (test)    Accuracy 23.4          9