
Humanity's Last Exam

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning | Humanity's Last Exam | Accuracy 84.61 | 60
Autonomous Agent Problem Solving | Humanity's Last Exam | Avg@3 41.6 | 19
Question Answering | Humanity's Last Exam | Pass@1 51.7 | 16
Expert-level Reasoning | Humanity's Last Exam 2,158 text-only | Avg@3 Score 54.2 | 15
Expert-Level Question Answering | Humanity's Last Exam | Accuracy 40.9 | 14
Complex Reasoning | Humanity's Last Exam (HLE) | Pass@1 Score 18.4 | 13
Expert-Level Human Knowledge Reasoning | Humanity's Last Exam | Pass@1 38.3 | 11
Question Answering | Humanity's Last Exam (HLE) curated 649-question subset (test) | Accuracy 54.3 | 7
Multidisciplinary Reasoning | Humanity's Last Exam (HLE) | Bio Accuracy (average@8) 8.81 | 6
Question Answering | Humanity's Last Exam (HLE) MCQ | Accuracy 19.9 | 6
Long Context Evaluation | Humanity's Last Exam AA-LCR | Accuracy 54.3 | 6
World Knowledge | HUMANITY’S LAST EXAM text-only | Score 11.1 | 4