AbstentionBench

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Abstention Metacognition | AbstentionBench Normal Prompt | F1 Score | 67 | 28 |
| Abstention Metacognition | AbstentionBench Abstention Prompt | F1 Score | 74.9 | 28 |
| Abstention | AbstentionBench Clean-merged (test) | Precision | 92.7 | 10 |
| Abstention | AbstentionBench KUQ Cont | Abstention F1 | 100 | 4 |
| Abstention | AbstentionBench QAQA | Abstention F1 | 75.8 | 4 |
| Abstention | AbstentionBench UMWP | Abstention F1 | 80.9 | 4 |
| Abstention | AbstentionBench FreshQA | Abstention F1 | 75.2 | 4 |
| Abstention | AbstentionBench BBQ | Abstention F1 | 97 | 4 |
| Abstention | AbstentionBench BB Known Unk. | Abstention F1 | 95.8 | 4 |
| Abstention Detection | AbstentionBench | Recall | 53 | 2 |