Agent task execution on SkillsBench 1.0 (test)
[Chart: Pass Rate (With Skills), Pass Rate (No Skill), and Performance Delta over time. Top result: EvoSkills (Claude Opus 4.6), 71.1 Pass Rate (With Skills), Apr 2, 2026.]
Updated 2mo ago
Evaluation Results
Method                         Links                      Date     Pass Rate (With Skills)  Pass Rate (No Skill)  Performance Delta
-----------------------------  -------------------------  -------  -----------------------  --------------------  -----------------
EvoSkills (Claude Opus 4.6)    Skill evolution source...  2026.04  71.1                     30.6                  40.5
EvoSkills (GPT-5.2)            Skill evolution source...  2026.04  69.8                     29.6                  40.2
EvoSkills (GPT-5.2)            Skill evolution source...  2026.04  65                       29.6                  35.4
EvoSkills (Claude Sonnet 4.5)  Skill evolution source...  2026.04  63.1                     20                    43.1
EvoSkills (Claude Haiku 4.5)   Skill evolution source...  2026.04  54.5                     10.4                  44.1
EvoSkills (Qwen3 Coder)        Skill evolution source...  2026.04  50.8                     8.4                   42.4
EvoSkills (DeepSeek V3)        Skill evolution source...  2026.04  48.8                     13                    35.8
EvoSkills (Mistral Large 3)    Skill evolution source...  2026.04  43.1                     4.9                   38.2
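
The Performance Delta values listed above appear to be the with-skills pass rate minus the no-skill pass rate, in percentage points. The page does not state this; it is an inference from the numbers. A minimal sketch that checks the relationship against the reported rows, where `ROWS`, `performance_delta`, and `mismatches` are hypothetical names chosen for illustration:

```python
# Rows copied from the results above:
# (method, pass rate with skills, pass rate with no skill, reported delta)
ROWS = [
    ("EvoSkills (Claude Opus 4.6)", 71.1, 30.6, 40.5),
    ("EvoSkills (GPT-5.2)", 69.8, 29.6, 40.2),
    ("EvoSkills (GPT-5.2)", 65.0, 29.6, 35.4),
    ("EvoSkills (Claude Sonnet 4.5)", 63.1, 20.0, 43.1),
    ("EvoSkills (Claude Haiku 4.5)", 54.5, 10.4, 44.1),
    ("EvoSkills (Qwen3 Coder)", 50.8, 8.4, 42.4),
    ("EvoSkills (DeepSeek V3)", 48.8, 13.0, 35.8),
    ("EvoSkills (Mistral Large 3)", 43.1, 4.9, 38.2),
]


def performance_delta(with_skills, no_skill):
    """Assumed definition: difference in percentage points, to one decimal."""
    return round(with_skills - no_skill, 1)


def mismatches(rows):
    """Return the methods whose reported delta differs from the computed one."""
    return [
        name
        for name, with_skills, no_skill, reported in rows
        if performance_delta(with_skills, no_skill) != reported
    ]


if __name__ == "__main__":
    for name, with_skills, no_skill, _ in ROWS:
        print(f"{name}: {performance_delta(with_skills, no_skill)}")
    print("Mismatched rows:", mismatches(ROWS))
```

Rounding to one decimal place matches the precision of the reported figures and avoids spurious floating-point differences when comparing against them.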