| API-Bank L-1 | HiTEC-KTO | F1 Name Match 94.99 | | 46 | 4d ago |
| Supply Chain Tool Calling 1.0 (test) | | Accuracy 86.73 | | 38 | 4d ago |
| Tool-Alpaca | Llama3-70B | Tool Name Accuracy 91.17 | | 31 | 4d ago |
| Seal-Tools Single-Tool | Llama3-70B | Name Match Score 98.14 | | 30 | 4d ago |
| SupChain-Bench | SupChain-ReAct | Accuracy 75.51 | | 27 | 4d ago |
| API-Bank L-2 | HiTEC-KTO | Name Match F1 90.42 | | 25 | 4d ago |
| F1 Average | Llama3-70B | Tool Call Name F1 91.37 | | 16 | 4d ago |
| Nexus Raven | Llama3-70B | Score (Name) 94.84 | | 16 | 4d ago |
| Nexus Raven v1 (test) | HiTEC-KTO | F1 Name 94.84 | | 12 | 4d ago |
| Seal-Tools Single-Tool v1 (test) | HiTEC-KTO | F1 Name 98.14 | | 12 | 4d ago |
| Tool-Alpaca v1 (test) | HiTEC-KTO | F1 Name 87.63 | | 12 | 4d ago |
| API-Bank L-2 v1 (test) | HiTEC-KTO | F1 Name Match 88 | | 12 | 4d ago |
| API-Bank L-1 v1 (test) | HiTEC-KTO | F1 Score 90.78 | | 12 | 4d ago |
| Average across 5 benchmarks | HiTEC-ICL | F1 (Name) 88.47 | | 9 | 4d ago |
| BFCL V3 | Qwen3 14B | pass@1 70.4 | | 7 | 4d ago |
| ToolBench generalization dataset (I2-Cat) | ToolGen* | SoPR 51.96 | | 7 | 4d ago |
| ToolBench generalization dataset (I1-Tool) | ToolLlama* | SoPR 57.7 | | 7 | 4d ago |
| BFCL Multiple v1 (test) | | Accuracy 92 | | 6 | 4d ago |
| BFCL Simple Python v1 (test) | PrefillShare | Accuracy 93.5 | | 6 | 4d ago |
| StableToolBench (STB) I3-Inst | TOOLQP | Solvable Pass Rate 48.3 | | 6 | 4d ago |
| Retail-3I 1.0 (Infeasible) | Qwen3-4B | Pass@1 0.578 | | 2 | 4d ago |
| Retail-3I Changing 1.0 | Qwen3-8B | Pass@1 61.8 | | 2 | 4d ago |
| Retail-3I Ambiguous 1.0 | Qwen3-8B | Pass@1 0.696 | | 2 | 4d ago |
| Retail-3I General 1.0 | Qwen3-8B | Pass@1 73.6 | | 2 | 4d ago |
| BFCL v3 (test) | - | Live Overall Accuracy - | | 0 | 4d ago |