
Complex Reasoning on Humanity's Last Exam (HLE)

Top Pass@1 Score: 18.4 (gemini-2.5-pro)

Updated 4mo ago

Evaluation Results

Method	Date	Pass@1 Score
gemini-2.5-pro	2026.03	18.4
Qwen3-235B-A22B-Thinking-2507	2026.03	18.2
o4-mini (high)	2026.03	18.1
DeepSeek-R1-0528	2026.03	17.7
Qwen3-235B-A22B	2026.03	11.8
o3-mini (medium)	2026.03	10.3
Qwen3-4B-Thinking-2507 + CHIMERA	2026.03	9.0
Qwen3-32B	2026.03	8.9
DeepSeek-R1	2026.03	8.5
Qwen3-4B-Thinking-2507	2026.03	7.3
DeepSeek-R1-0528-Qwen3-8B	2026.03	6.9
DeepSeek-R1-Distill-Llama-70B	2026.03	5.2
Qwen3-4B-Thinking-2507 + OpenScience	2026.03	4.6
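This page does not say how the Pass@1 Score is computed. A minimal sketch follows, assuming the common convention: each question is graded simply as correct or incorrect, a model may be sampled n times per question, and the reported score is the mean per-question pass@1 expressed as a percentage. It uses the standard unbiased pass@k estimator, which for k = 1 reduces to c/n (the fraction of correct samples). The function names `pass_at_k` and `pass_at_1_score` are hypothetical and are not part of any leaderboard API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (assumed convention).

    n: number of samples drawn for the question
    c: number of those samples graded correct
    k: attempt budget (k = 1 gives pass@1, which equals c / n)
    """
    if n - c < k:
        # Every possible size-k subset contains at least one correct sample.
        return 1.0
    # 1 minus the probability that a random size-k subset holds no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_1_score(results: list[tuple[int, int]]) -> float:
    """Benchmark-level Pass@1 as a percentage (hypothetical helper).

    results: one (n, c) pair per question.
    """
    if not results:
        raise ValueError("results must contain at least one question")
    per_question = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(per_question) / len(results)


# Usage: one sample per question and 1 correct answer out of 5 questions.
print(pass_at_1_score([(1, 1), (1, 0), (1, 0), (1, 0), (1, 0)]))  # 20.0
```

With one sample per question, this is plain accuracy. With several samples per question, averaging c/n lowers the variance of the estimate without changing what it measures.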