| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Single-hop Tool Calling | WHEN2TOOL single-hop 1.0 (test) | Accuracy | 94.3 | 90 |
| Tool Calling | WHEN2TOOL Overall | Delta Accuracy (ΔAcc) | -1 | 7 |
| Tool Calling | WHEN2TOOL Hard | Delta Accuracy (ΔAcc) | -0.8 | 7 |
| Tool Calling | WHEN2TOOL Medium | Delta Accuracy (ΔAcc) | -0.7 | 7 |
| Tool Calling | WHEN2TOOL Easy | Delta Accuracy (ΔAcc) | -0.3 | 7 |