Humanity's Last Exam

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	#
Reasoning	Humanity's Last Exam	Accuracy	84.61	60
Mathematical Reasoning	Humanity's Last Exam Math (test)	Accuracy	29.76	28
DeepSearch Question Answering	Humanity's Last Exam	NS Score	14.7	22
Autonomous Agent Problem Solving	Humanity's Last Exam	Avg@3	41.6	19
General Knowledge Reasoning	Humanity's Last Exam	Score	50	17
Question Answering	Humanity's Last Exam	Pass@1	51.7	16
Expert-Level Reasoning	Humanity's Last Exam (2,158 text-only)	Avg@3 Score	54.2	15
Expert-Level Question Answering	Humanity's Last Exam	Accuracy	40.9	14
Complex Reasoning	Humanity's Last Exam (HLE)	Pass@1 Score	18.4	13
Expert-Level Human Knowledge Reasoning	Humanity's Last Exam	Pass@1	38.3	11
Question Answering	Humanity's Last Exam (HLE), curated 649-question subset (test)	Accuracy	54.3	7
Open-Web Reasoning	Humanity's Last Exam (HLE), open-web via SERPER (test)	Pass@4	14.8	6
Multidisciplinary Reasoning	Humanity's Last Exam (HLE)	Bio Accuracy (average@8)	8.81	6
Question Answering	Humanity's Last Exam (HLE), MCQ	Accuracy	19.9	6
Long Context Evaluation	Humanity's Last Exam AA-LCR	Accuracy	54.3	6
World Knowledge	Humanity's Last Exam (text-only)	Score	11.1	4
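Several rows report sampling-based metrics (Pass@1, Pass@4, Avg@3, average@8). To help with reading them, here is a minimal sketch that assumes the common definitions. Pass@k uses the unbiased estimator popularized by the HumanEval/Codex evaluation. Avg@k is taken as the mean accuracy over k independent attempts at the same question. Individual leaderboard entries may define or compute these metrics differently, and the function names below are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Assumed pass@k: probability that at least one of k samples, drawn
    without replacement from n generations of which c are correct, is correct."""
    if n - c < k:
        # Too few incorrect generations to fill k draws, so one must be correct.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def avg_at_k(correct_flags: list[bool]) -> float:
    """Assumed avg@k: mean accuracy over k attempts at one question."""
    return sum(correct_flags) / len(correct_flags)


def benchmark_score(per_question: list[float]) -> float:
    """Average per-question scores and express the result as a percentage."""
    return 100.0 * sum(per_question) / len(per_question)


# Hypothetical example: three questions, three attempts each.
attempts = [[True, False, False], [False, False, False], [True, True, True]]
avg3 = benchmark_score([avg_at_k(a) for a in attempts])
pass1 = benchmark_score([pass_at_k(len(a), sum(a), 1) for a in attempts])
pass2 = benchmark_score([pass_at_k(len(a), sum(a), 2) for a in attempts])
```

With n samples per question, the estimate for pass@1 reduces to c/n, so it matches avg@n (both come to about 44.4 in the toy example). Pass@k for k > 1 can only be equal or higher, which is why Pass@k and Avg@k figures in the table are not directly comparable.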
