Hard Reasoning on HLE

37.7 Pass@1

Gemini-3.0 Pro

Updated 1mo ago

Evaluation Results

Method	Date	Pass@1
Gemini-3.0 Pro	2025.12	37.7
GPT-5 High	2025.12	26.3
DeepSeek-V3.2	2025.12	25.1
Kimi-K2	2025.12	23.9
DeepSeek-V3.2	2025.12	23.9
Claude-4.5-Sonnet	2025.12	13.7
MiniMax M2	2025.12	12.5
Qwen3 + SFT + B-DAPO w/ recycling	2026.06	10.4
SmartSearch-3B	2026.06	9.8
Qwen3 + SFT + B-DAPO w/ recycling	2026.06	8
BehaviorPrime-1.7B	2026.06	7.8
ASearcher-Web-7B	2026.06	7