
HELM

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Hallucination Detection | HELM Passage Level v1.0 (test) | AUC 0.9599 | 84 |
| Hallucination Detection | HELM Sentence Level v1.0 (test) | AUC 0.8835 | 84 |
| Natural Language Reasoning | HELM | Synth. Reason. (AS) 54 | 16 |