Long-horizon agentic tasks on HLE Full

45.8 Pass@1

Gemini-3.0-Pro

Updated 3mo ago

Evaluation Results

Method	Date	Pass@1
Gemini-3.0-Pro	2026.03	45.8
Claude-4.5-Opus	2026.03	43.4
GPT-5.1 High	2026.03	42.7
DeepSeek-v3.2	2026.03	40.8
Tongyi-DR-30B-A3B	2026.03	32.9
AgentFounder-30B-A3B	2026.03	31.5
MiroThinker-v1.5-30B-A3B	2026.03	31
IterResearch-30B-A3B	2026.03	28.8
OpenAI DeepResearch	2026.03	26.6
ASearcher-Web-32B	2026.03	12.5