HLE (Humanity's Last Exam)

Benchmarks

Task Name	Dataset Name	SOTA Result
Expert-Level Reasoning	HLE (Humanity's Last Exam) text-only subset (val)	Inference Accuracy 52.2	13
Performance Estimation	HLE (Humanity's Last Exam) 2% subset	MAE 2.9	3
Performance Estimation	HLE (Humanity's Last Exam) 1% subset	MAE 3.5	3
Performance Estimation	HLE (Humanity's Last Exam) 0.5% subset	MAE 5.6	3
