Tool Selection on Chess Specialists: opening, midgame, endgame, late-endgame
[Chart: Accuracy and ELO over time. Top result: 64.4 Accuracy, Gold (Ground Truth Documentation), Feb 16, 2026]
Evaluation Results

| Method | Configuration | Date | Accuracy | ELO |
|---|---|---|---|---|
| Gold (Ground Truth Documentation) | Model=GPT-5, Framework... | 2026.02 | 64.4 | 1,411 |
| Gold (Ground Truth Documentation) | Model=GPT-5-mini, Fram... | 2026.02 | 52.8 | 1,243 |
| TOOLOBSERVER | Model=GPT-5, Framework... | 2026.02 | 40.1 | 1,020 |
| Play2Prompt | Model=GPT-5, Framework... | 2026.02 | 35.8 | 966 |
| TOOLOBSERVER | Model=GPT-5-mini, Fram... | 2026.02 | 32.1 | 949 |
| EasyTool | Model=GPT-5-mini, Fram... | 2026.02 | 25.8 | 739 |
| Base (Opacified set) | Model=GPT-5-mini, Fram... | 2026.02 | 24.9 | 772 |
| Base (Opacified set) | Model=GPT-5, Framework... | 2026.02 | 23.5 | 728 |
| EasyTool | Model=GPT-5, Framework... | 2026.02 | 23.2 | 761 |
| Play2Prompt | Model=GPT-5-mini, Fram... | 2026.02 | 19.5 | 754 |