Expert-level Reasoning on Humanity's Last Exam (2,158 text-only questions)
[Chart: Avg@3 Score over time. Top result: Seed-2.0-Pro, 54.2, Mar 16, 2026. Updated 1mo ago.]
Evaluation Results
| Model | Method | Date | Avg@3 Score |
|---|---|---|---|
| Seed-2.0-Pro | Agent Style=ReAct, Max... | 2026.03 | 54.2 |
| Claude-4.6-Opus | Agent Style=ReAct, Max... | 2026.03 | 53.1 |
| OpenAI-GPT-5.4 | Agent Style=ReAct, Max... | 2026.03 | 52.1 |
| Gemini-3.1-Pro | Agent Style=ReAct, Max... | 2026.03 | 51.4 |
| GLM-5.0 | Agent Style=ReAct, Max... | 2026.03 | 50.4 |
| Kimi-K2.5 | Agent Style=ReAct, Max... | 2026.03 | 50.2 |
| Qwen3.5-397B | Agent Style=ReAct, Max... | 2026.03 | 48.3 |
| MiroThinker-H1 | Agent Style=ReAct, Max... | 2026.03 | 47.7 |
| Gemini-3.0-Pro | Agent Style=ReAct, Max... | 2026.03 | 46.9 |
| Claude-4.5-Opus | Agent Style=ReAct, Max... | 2026.03 | 43.2 |
| MiroThinker-1.7 | Agent Style=ReAct, Max... | 2026.03 | 42.9 |
| DeepSeek-V3.2 | Agent Style=ReAct, Max... | 2026.03 | 40.8 |
| MiroThinker-1.7-mini | Agent Style=ReAct, Max... | 2026.03 | 36.4 |
| OpenAI-GPT-5 | Agent Style=ReAct, Max... | 2026.03 | 35.2 |
| Tongyi-DeepResearch-30B | Agent Style=ReAct, Max... | 2026.03 | 32.9 |