Tool Selection on Chess Skill: beginner, intermediate, advanced
[Chart: Accuracy. Top entry: Gold (Ground Truth Documentation), 100, Feb 16, 2026. Available metrics: Accuracy, ELO Rating.]
Updated 4d ago
Evaluation Results
Method                             Details                    Date     Accuracy  ELO Rating
Gold (Ground Truth Documentation)  Model=GPT-5, Framework...  2026.02  100       2,341
Gold (Ground Truth Documentation)  Model=GPT-5-mini, Fram...  2026.02  100       2,346
TOOLOBSERVER                       Model=GPT-5, Framework...  2026.02  29.1      1,756
TOOLOBSERVER                       Model=GPT-5-mini, Fram...  2026.02  28.4      1,778
Base (Opacified set)               Model=GPT-5-mini, Fram...  2026.02  25.7      1,674
EasyTool                           Model=GPT-5-mini, Fram...  2026.02  25.5      1,645
Base (Opacified set)               Model=GPT-5, Framework...  2026.02  24.9      1,572
EasyTool                           Model=GPT-5, Framework...  2026.02  24.9      1,584
Play2Prompt                        Model=GPT-5-mini, Fram...  2026.02  22.7      1,481
Play2Prompt                        Model=GPT-5, Framework...  2026.02  0.25      1,622