Tool-use evaluation on MCPToolBench++
[Chart: Precision by model. Highlighted point: Llama-3.3-70B, 81.8, Apr 11, 2025. Metrics: Precision, Recall, F1 Score, Accuracy.]
Updated 1mo ago
Evaluation Results
Method            Tags                       Date     Precision  Recall  F1 Score  Accuracy
Llama-3.3-70B     Size=70B                   2025.04  81.8       82.7    82.1      81
Llama-4-Maverick  -                          2025.04  77.5       80      78.3      75.3
Kimi-K2-Instruct  Alignment=Instruct         2025.04  77.3       80.7    78.4      74
Llama-4-Scout     -                          2025.04  75         76.3    75.3      74.3
MCP Bridge        Model Variant=8B + Dr....  2025.04  72.5       75.3    73.4      69.7
MCP Bridge        Model Variant=4B + GRPO    2025.04  67         68.3    67.4      65.7
GPT-OSS-120B      Size=120B                  2025.04  62         65.3    63.1      58.7