
General Deep Research Tool Use on HLE

42.9 Success Rate

Gemini-3-Pro

Updated 2mo ago

Evaluation Results

Method	Links	Success Rate
Gemini-3-Pro 2026.03		42.9
Gemini-2.5-Pro 2026.03		28.4
Kimi-K2-0905 2026.03		26.9
Claude-4-Sonnet 2026.03		20.8
GPT-OSS-120B 2026.03		19
DeepSeek-V3.2-Exp 2026.03		17.9
DIVE-8B (RL) 2026.03		17.8
WebExplorer-8B 2026.03		17.3
DIVE-8B (SFT) 2026.03		13.8
SWE-Dev-8B 2026.03		6.9
Qwen3-8B (base) 2026.03		6.4
EnvScaler-8B 2026.03		2.8