Complex Reasoning on Humanity's Last Exam (HLE)
[Chart: Pass@1 Score over time. Best result: gemini-2.5-pro, 18.4 (Mar 1, 2026). Updated 1mo ago.]
Evaluation Results
Method                                  # Params  Date     Pass@1 Score
gemini-2.5-pro                          –         2026.03  18.4
Qwen3-235B-A22B-Thinking-2507           235B      2026.03  18.2
o4-mini (high)                          –         2026.03  18.1
DeepSeek-R1-0528                        671B      2026.03  17.7
Qwen3-235B-A22B                         235B      2026.03  11.8
o3-mini (medium)                        –         2026.03  10.3
Qwen3-4B-Thinking-2507 + CHIMERA        4B        2026.03  9.0
Qwen3-32B                               32B       2026.03  8.9
DeepSeek-R1                             671B      2026.03  8.5
Qwen3-4B-Thinking-2507                  4B        2026.03  7.3
DeepSeek-R1-0528-Qwen3-8B               8B        2026.03  6.9
DeepSeek-R1-Distill-Llama-70B           70B       2026.03  5.2
Qwen3-4B-Thinking-2507 + OpenScience    4B        2026.03  4.6
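The page reports Pass@1 without stating how it was computed. As a hedged illustration only: pass@k is commonly estimated per question with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n answers are sampled and c of them are correct. For k = 1 this reduces to c/n, and the benchmark score is the mean over all questions. The sketch below assumes that protocol and assumes the table's scores are percentages; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not this leaderboard's evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (assumed protocol).

    n: number of sampled answers, c: number of correct answers, k: budget.
    """
    if n - c < k:
        # Every size-k subset of the samples contains a correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-question estimates; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical results for three questions: (samples, correct).
    results = [(1, 1), (1, 0), (4, 1)]
    # Scaled by 100 to match the percentage-style scores in the table.
    print(round(100 * mean_pass_at_k(results), 1))  # 41.7
```

With k = 1 the estimator is simply the per-question fraction of correct samples, so a single-sample run reduces Pass@1 to plain accuracy over the question set.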