Tool-use task completion on SkillCraft Hard
[Chart: Success Rate. Labeled point: Minimax-M2.1, 20 (Feb 28, 2026).]
Evaluation Results
Method              Mode    Date      Success Rate
Minimax-M2.1        Skill   2026.02   20
Claude 4.5 Sonnet   Base    2026.02   20
Claude 4.5 Sonnet   Skill   2026.02   20
Minimax-M2.1        Base    2026.02   18
GPT-5.2             Skill   2026.02   17
Gemini 3 Pro        Skill   2026.02   17
GPT-5.2             Base    2026.02   16
Gemini 3 Pro        Base    2026.02   16
DeepSeek-V3.2-EXP   Skill   2026.02   15
DeepSeek-R1         Skill   2026.02   15
GLM-4.7             Skill   2026.02   15
GLM-4.7             Base    2026.02   12
DeepSeek-R1         Base    2026.02   11
DeepSeek-V3.2-EXP   Base    2026.02   9
Kimi-K2-Thinking    Base    2026.02   8
Kimi-K2-Thinking    Skill   2026.02   7