| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Hallucination Detection | HELM Passage Level v1.0 (test) | AUC | 0.9599 | 84 |
| Hallucination Detection | HELM Sentence Level v1.0 (test) | AUC | 0.8835 | 84 |
| Language Modeling | HELM macro-averaged (test) | Accuracy | 73.2 | 30 |
| Natural Language Reasoning | HELM | Synthetic Reasoning (Abstract Symbols) | 54 | 16 |