Deep Research on HLE text-only original (test)

32.9 Pass@1

Tongyi-DeepResearch

Updated 4mo ago

Evaluation Results

Method	Date	Pass@1
Tongyi-DeepResearch	2026.01	32.9
WebSailor-v2-30B-A3B (RL)	2026.01	30.6
DeepSeek-V3.1	2026.01	29.8
DeepSeek-V3.2	2026.01	27.2
WebSailor-v2-30B-A3B (SFT)	2026.01	23.9
Claude-4-Sonnet	2026.01	20.3
OpenAI-o3	2026.01	20.2
Kimi-K2	2026.01	18.1
OffSeeker-8B (DPO)	2026.01	13.8
ASearcher-Web-QwQ	2026.01	12.5
WebExplorer-8B (RL)	2026.01	12.4
MiroThinker-32B-DPO-v0.1	2026.01	11.8
OffSeeker-8B (SFT)	2026.01	11.7