
Humanity's Last Exam (HLE)

Benchmarks

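| Task Name                       | Dataset Name                         | SOTA Result     | Trend |
|---------------------------------|--------------------------------------|-----------------|-------|
| Search                          | Humanity's Last Exam (HLE) (test)    | Accuracy 45.8   | 14    |
| General Knowledge and Reasoning | Humanity's Last Exam (HLE) text-only | sHLE Score 36.6 | 11    |
| Reasoning                       | Humanity's Last Exam (HLE) (test)    | Accuracy 72.19  | 10    |
| Medical Reasoning               | Humanity's Last Exam (HLE) Medical   | Accuracy 20.8   | 7     |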