
Humanity's Last Exam (HLE)

Benchmarks

Task Name          Dataset Name                        SOTA Result      Trend
Reasoning          Humanity's Last Exam (HLE) (test)   Accuracy: 72.19  10
Medical Reasoning  Humanity's Last Exam (HLE) Medical  Accuracy: 20.8   7