Agent Capability Evaluation on SEAL 0

61.3 Average Score (@8)

MiroThinker-H1

Updated 3 months ago

Evaluation Results

Method (date)	Links	Average Score (@8)	Score (column unlabeled in source)
MiroThinker-H1 2026.03		61.3	-
Kimi-K2.5 2026.03		57.4	-
Claude-4.5-Sonnet 2025.11		53.4	-
MiroThinker-1.7 2026.03		53	-
OpenAI-GPT-5 2026.03		51.4	-
OpenAI-GPT-5-high 2025.11		51.4	-
MiroThinker-v1.0-72B 2025.11		51	-
DeepSeek-V3.2 2026.03		49.5	-
Seed-2.0-Pro 2026.03		49.5	-
MiroThinker-1.7-mini 2026.03		48.2	-
Claude-4.5-Opus 2026.03		47.7	-
Qwen3.5-397B 2026.03		46.9	-
MiroThinker-v1.0-30B 2025.11		46.8	-
Gemini-3.0-Pro 2026.03		45.5	-
MiroThinker-v1.0-8B 2025.11		40.4	-
DeepSeek-V3.2 2025.11		38.5	-
Kimi-Researcher 2025.11		36	-
Kimi-K2-0905 2025.11		25.2	-
OpenAI-o3 2025.11		17.1	-
Claude-4.5-Sonnet 2026.02		-	53.4
DeepSeek-V3.2 2026.02		-	38.5
OpenAI-GPT-5-high 2026.02		-	51.4
Kimi-Researcher 2026.02		-	36
MiroThinker 8B 2026.02		-	40.4
IterResearch-30B-A3B 2026.02		-	39.6
WebLeaper-30B-A3B 2026.02		-	48.6
Merged-Model-4B 2026.02		-	35.9
AgentCPM-Explore-4B 2026.02		-	40.5