Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

HLE (Humanity's Last Exam)

Benchmarks

Task NameDataset NameSOTA ResultTrend
Expert-Level ReasoningHLE (Humanity's Last Exam) text-only subset (val)
Inference Accuracy52.2
13
Performance EstimationHLE (Humanity's Last Exam) 2% subset
MAE2.9
3
Performance EstimationHLE (Humanity's Last Exam) 1% subset
MAE3.5
3
Performance EstimationHLE (Humanity's Last Exam) 0.5% subset
MAE5.6
3
Showing 4 of 4 rows