
SelfAware

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Hallucination Detection | SelfAware Gemini outputs (test) | AUROC 52.8 | 15 |
| Hallucination Detection | SelfAware GPT outputs (test) | AUROC 0.528 | 15 |
| Hallucination Detection | SelfAware Llama outputs (test) | AUROC 58.7 | 15 |
| Factuality | SelfAware | Score 0.372 | 10 |
| Self-awareness | SelfAware | Accuracy 51.2 | 10 |
| Hallucination Detection | SelfAware | AUROC 0.587 | 9 |
| Question Answering with Abstention | SELFAWARE | U-Ref 91.4 | 7 |
| Question Answering | SelfAware (out-of-domain) | nAUPC 9.9 | 4 |
| Question Answering | SelfAware | Accuracy 27 | 1 |