
AbstainQA

Benchmarks

Task Name                    | Dataset Name     | SOTA Result   | Trend
Safety Evaluation            | AbstainQA (test) | Accuracy 14.7 | 11
Safety Evaluation            | AbstainQA (val)  | Accuracy 26   | 11
Selective Question Answering | AbstainQA (test) | Accuracy 13   | 11
Selective Question Answering | AbstainQA (val)  | Accuracy 21   | 11