
HLE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Humanities Question Answering | HLE | HLE Score 13.37 | 24 |
| Knowledge-Intensive Reasoning | HLE | Avg Score 85 | 23 |
| Logical Reasoning | HLE | Accuracy 0.305 | 21 |
| General Reasoning | HLE | Accuracy 38.4 | 21 |
| Scientific Reasoning | HLE | pass@1612 | 17 |
| High-Level Reasoning | HLE | Average Score 26.6 | 17 |
| Deep research | HLE | Accuracy 51 | 16 |
| Deep Search | HLE text-only | Score 40.8 | 14 |
| Reasoning | HLE | Pass@1 18.03 | 14 |
| Deep Research | HLE text-only original (test) | Pass@1 32.9 | 13 |
| Hard Reasoning and Language Evaluation | HLE | Accuracy 36.1 | 12 |
| Mathematical Reasoning | HLE Math-text | Pass@1 62.8 | 12 |
| Reasoning & General | HLE | Score 51.8 | 11 |
| Compositional Reasoning | HLE | Accuracy 23.1 | 11 |
| Reasoning & General | HLE Full | Score (%) 0.502 | 10 |
| Hard LLM Reasoning | HLE | Accuracy 15.5 | 10 |
| Question Answering | HLE | Performance Score 17.6 | 8 |
| Search | HLE text | Score 45.8 | 7 |
| Scientific Reasoning & QA | HLE | Accuracy 3.61 | 7 |
| Reasoning | HLE | Score 17.9 | 7 |
| Hard Reasoning | HLE | Pass@1 37.7 | 7 |
| Multi-domain Knowledge and Reasoning | HLE (Humanity’s Last Exam) (official) | Exact Match 42 | 7 |
| Confidence Calibration | HLE (test) | ECE 0.031 | 7 |
| Agentic Reasoning | HLE | Overall Score 41.6 | 7 |
| Scientific Reasoning | HLE Text-only | Accuracy 13.7 | 6 |
Showing 25 of 35 rows