MCP Server Prediction on Pen-Strategist (test)
[Chart: Accuracy and Micro F1 Score over time. Best result: Step Model, 48.88 Accuracy, May 6, 2026.]
Updated 27d ago
Evaluation Results
| Method | Date | Accuracy | Micro F1 Score |
|---|---|---|---|
| Step Model | 2026.05 | 48.88 | 64 |
| GPT-5-mini | 2026.05 | 33.81 | 50 |
| Gemini 2.5 Flash | 2026.05 | 30.11 | 62 |
| GPT-4o-mini | 2026.05 | 28.69 | 56 |
| GPT 4.1 | 2026.05 | 28.41 | 58 |
| Claude 4.5 sonnet | 2026.05 | 24.73 | 60 |
| Gemini 2.0 Flash | 2026.05 | 24.17 | 59 |
| GPT-3.5-turbo | 2026.05 | 18.52 | 40 |
| Claude 3 Haiku | 2026.05 | 16.13 | 39 |
| LLaMA-3.1-8B | 2026.05 | 13.98 | 38 |