Agentic Tool Calling on Mixed: Overall Summary
[Chart: Overall Success Rate by method, dated May 29, 2026. Top result: MAVEN, 67.28.]
Evaluation Results
| Method | Configuration | Date | Overall Success Rate (%) |
|---|---|---|---|
| MAVEN | Base Model=GPT-OSS-120b | 2026.05 | 67.28 |
| Kimi-K2 | Model Identifier=Kimi-K2 | 2026.05 | 63.07 |
| GPT-5 | Model Identifier=gpt-5... | 2026.05 | 61.15 |
| o3 | Model Identifier=o3-20... | 2026.05 | 58.61 |
| o4-mini | Model Identifier=o4-mi... | 2026.05 | 56.04 |
| Qwen3-Th-235B | Model Identifier=Qwen3... | 2026.05 | 52.26 |
| Gemini-2.5 | Model Identifier=gemin... | 2026.05 | 50.14 |
| DeepSeek-V3.1 | Model Identifier=DeepS... | 2026.05 | 47.93 |