Multidisciplinary Reasoning on Humanity's Last Exam (HLE)
[Chart: Bio Accuracy (average@8). Highlighted point: Seq-IS, 8.81 (May 13, 2026). Selectable metrics: Bio, CS, Physics, Math, Humanities, and Overall Accuracy (average@8).]
Updated 20d ago
Evaluation Results
All accuracy columns are average@8.

Method            Method details             Date     Bio    CS     Physics  Math   Humanities  Overall
Seq-IS            Training Paradigm=Cont...  2026.05  8.81   6.74   3.22     6.54   6.96        5.38
BeamSearch-IS     Training Paradigm=Cont...  2026.05  8.81   8.3    7.67     11.15  7.23        8.63
BeamSearch        Training Paradigm=Cont...  2026.05  7.39   3.9    5.94     8.97   6.33        6.51
Seq               Training Paradigm=Cont...  2026.05  7.24   2.81   6.88     8.08   6.01        6.2
Gemini-2.5-Flash                             2026.05  7.1    6.46   5        8.08   6.01        6.53
BoN               Training Paradigm=Cont...  2026.05  6.25   4.21   4.06     10.13  6.01        6.13