Agent Capability Evaluation on SEAL 0
Top score: 53.4 (Claude-4.5-Sonnet), Feb 6, 2026
Updated 4d ago
Evaluation Results
| Method | Model Access Type | Date | Score |
|---|---|---|---|
| Claude-4.5-Sonnet | Clos... | 2026.02 | 53.4 |
| OpenAI-GPT-5-high | Clos... | 2026.02 | 51.4 |
| WebLeaper-30B-A3B | Open... | 2026.02 | 48.6 |
| AgentCPM-Explore-4B | Open... | 2026.02 | 40.5 |
| MiroThinker 8B | Open... | 2026.02 | 40.4 |
| IterResearch-30B-A3B | Open... | 2026.02 | 39.6 |
| DeepSeek-V3.2 | Clos... | 2026.02 | 38.5 |
| Kimi-Researcher | Clos... | 2026.02 | 36 |
| Merged-Model-4B | Open... | 2026.02 | 35.9 |