SOTA Reasoning on HLE (Score) and PapersWithCode

Score: 64.7

Claude Mythos Preview

Updated 18d ago

Evaluation Results

Method	Links	Score
Claude Mythos Preview 2025.06		64.7
AGENTORCHESTRA Evolved 2025.06		59.6
GPT-5.4 Pro 2025.06		58.7
Muse Spark (Contemplating) 2025.06		58.4
GPT-5.5 Pro 2025.06		57.2
AGENTORCHESTRA Vanilla 2025.06		55.2
Poetiq Meta-System 2025.06		55
Claude Opus 4.7 2025.06		54.7
Kimi K2.6 Thinking 2025.06		54
Gemini 3 Deep Think 2025.06		53.4
Claude Opus 4.6 2025.06		53.1
Zoom Federated AI 2025.06		53
GPT-5.5 2025.06		52.2
GPT-5.4 2025.06		52.1
Gemini 3.1 Pro 2025.06		51.4
Muse Spark 2025.06		50.4
Kimi K2.5 Thinking 2025.06		50.2
GPT-5.2 Pro 2025.06		50
Claude Sonnet 4.6 2025.06		49
DeepSeek-V4-Pro-Max 2025.06		48.2
Yunjue Agent 2025.06		48
MiMo-V2.5-Pro 2025.06		48
Gemini Deep Research 2025.06		46.4
Gemini 3 Pro 2026.05		45.8
Gemini 3 Pro 2025.06		45.8
Claude Opus 4.5 2026.05		43.2
Quest-35B 2026.05		37.2
GPT-5 2026.05		35.2
Tongyi-DR 2026.05		32.9
OpenAI-DR 2026.05		26.6
Quest-30B 2026.05		24.6
OpenResearcher 2026.05		19.6
DeepSeek v3.2 2025.12		17.9
Confident Decoding 2026.06		17
Confident Decoding 2026.06		16.5
Last Layer Decoding 2026.06		16
Nemotron-3-Puzzle-75B-A9B 2026.07		16
DeepSeek R1 0528 2025.12		15.9
GLM-4.5 2025.12		14.8
Last Layer Decoding 2026.06		14.7
INTELLECT-3 2025.12		14.6
GLM-4.5-Air 2025.12		13.3
GLM-4.6 2025.12		13.3
Confident Decoding 2026.06		12.6
Confident Decoding 2026.06		11
Last Layer Decoding 2026.06		10.8
GPT-OSS 2025.12		10.6
Confident Decoding 2026.06		9.5
Last Layer Decoding 2026.06		9.2
Last Layer Decoding 2026.06		7.1
Confident Decoding 2026.06		6.4
Confident Decoding 2026.06		6.3
Confident Decoding 2026.06		6.1
Ouro-2.6B-Thinking-R4 2025.10		5.58
Last Layer Decoding 2026.06		5.5
Last Layer Decoding 2026.06		5.4
Ouro-1.4B-Thinking-R4 2025.10		5.21
Qwen3-4B 2025.10		5.21
Last Layer Decoding 2026.06		5.2
Deepseek-Distill-Qwen-7B 2025.10		5.14
Deepseek-Distill-Qwen-1.5B 2025.10		4.2
Qwen3-1.7B 2025.10		4.13
Qwen3-8B 2025.10		2.22
Last Layer Decoding 2026.06		0
Confident Decoding 2026.06		0