Autonomous Agent Problem Solving on Humanity's Last Exam

41.6 Avg@3

ChatGPT-Agent

Updated 3mo ago

Evaluation Results

Method	Links	Avg@3
ChatGPT-Agent 2025.11		41.6
MiroThinker-v1.0-72B 2025.11		37.7
OpenAI-GPT-5-high 2025.11		35.2
MiroThinker-v1.0-30B 2025.11		33.4
Tongyi-DeepResearch-30B 2025.11		32.9
Minimax-M2 2025.11		31.8
GLM-4.6 2025.11		30.4
DeepSeek-V3.1 2025.11		29.8
SFR-DeepResearch-20B 2025.11		28.7
DeepSeek-V3.2 2025.11		27.2
Kimi-Researcher 2025.11		26.9
OpenAI DeepResearch 2025.11		26.6
OpenAI-o3 2025.11		24.9
Claude-4.5-Sonnet 2025.11		24.5
Kimi-K2-0905 2025.11		21.7
MiroThinker-v1.0-8B 2025.11		21.5
Claude-4-Sonnet 2025.11		20.3
AFM-32B-RL 2025.11		18
WebExplorer-8B-RL 2025.11		17.3