Autonomous Agent Problem Solving on Humanity's Last Exam
Best result: ChatGPT-Agent, 41.6 Avg@3 (Nov 14, 2025)
Evaluation Results
Method                    Tag                         Date      Avg@3
ChatGPT-Agent             Type=Research Agents        2025.11   41.6
MiroThinker-v1.0-72B      Parameters=72B              2025.11   37.7
OpenAI-GPT-5-high         Type=Foundation Models...   2025.11   35.2
MiroThinker-v1.0-30B      Parameters=30B              2025.11   33.4
Tongyi-DeepResearch-30B   Type=Research Agents        2025.11   32.9
Minimax-M2                Type=Foundation Models...   2025.11   31.8
GLM-4.6                   Type=Foundation Models...   2025.11   30.4
DeepSeek-V3.1             Type=Foundation Models...   2025.11   29.8
SFR-DeepResearch-20B      Type=Research Agents        2025.11   28.7
DeepSeek-V3.2             Type=Foundation Models...   2025.11   27.2
Kimi-Researcher           Type=Research Agents        2025.11   26.9
OpenAI DeepResearch       Type=Research Agents        2025.11   26.6
OpenAI-o3                 Type=Foundation Models...   2025.11   24.9
Claude-4.5-Sonnet         Type=Foundation Models...   2025.11   24.5
Kimi-K2-0905              Type=Foundation Models...   2025.11   21.7
MiroThinker-v1.0-8B       Parameters=8B               2025.11   21.5
Claude-4-Sonnet           Type=Foundation Models...   2025.11   20.3
AFM-32B-RL                Type=Research Agents        2025.11   18
WebExplorer-8B-RL         Type=Research Agents        2025.11   17.3