HELM

Benchmarks

Task Name                   Dataset Name                     SOTA Result             Trend
Hallucination Detection     HELM Passage Level v1.0 (test)   AUC 0.9599              84
Hallucination Detection     HELM Sentence Level v1.0 (test)  AUC 0.8835              84
Language Modeling           HELM macro-averaged (test)       Accuracy 73.2           30
Natural Language Reasoning  HELM                             Synth. Reason. (AS) 54  16