
HaluBench

Benchmarks

Task Name                 Dataset Name       SOTA Result             Trend
Hallucination Detection   HaluBench          AUROC 97                75
Hallucination Detection   HaluBench (test)   HE 86.96                14
Hallucination Detection   HaluBench          Unsafe F1 Score 73.89   3