| Benchmark | Top model | Metric | Score | Entries | Updated |
|---|---|---|---|---|---|
| BFCL v3 | D-CORE-14B | Overall Accuracy | 79.3 | 104 | 3d ago |
| BFCL Multi-Turn v3 | | Overall Accuracy | 68.38 | 41 | 1mo ago |
| BFCL Individual Tools per Problem | | Execution Accuracy | 95 | 30 | 1mo ago |
| BFCL (Berkeley Function Calling Leaderboard) | GenEnv | Base Score | 41.8 | 28 | 1mo ago |
| Berkeley Function Calling Leaderboard (BFCL) Live (Out-of-Domain) | Qwen3-4B | AST Simple | 0.876 | 26 | 1mo ago |
| Berkeley Function Calling Leaderboard (BFCL) Non-Live (Out-of-Domain) | | AST Simple | 81.4 | 26 | 1mo ago |
| BFCL Multi-Turn | EVOTOOL | Accuracy | 42.3 | 22 | 1mo ago |
| BFCL Single-Turn | EvoPrompt | Accuracy | 84.2 | 22 | 1mo ago |
| BFCL Simple Python | | Accuracy | 0.938 | 20 | 1mo ago |
| Berkeley Function Calling Leaderboard (BFCL), online inference setting | Qwen3-8B | Input Tokens | 621.13 | 19 | 1mo ago |
| TB-MM | DTDR-L | FSA | 64.1 | 18 | 1mo ago |
| TB-HF | DTDR-L | FSA | 60.5 | 18 | 1mo ago |
| TB-DL | DTDR-L | FSA | 89 | 18 | 1mo ago |
| TinyAgent | DTDR-L | FSA | 0.807 | 18 | 1mo ago |
| BFCL | | Energy (Wh) | 4.2 | 18 | 1mo ago |
| BFCL Multi-Turn v4 (test) | FISSION-GRPO | Overall Accuracy | 46.75 | 17 | 1mo ago |
| BFCL | Llama 3.1 Instruct | Accuracy | 77.9 | 14 | 1mo ago |
| ToolBench Average | Claude-3.5-Sonnet + TAFC | Pass Rate | 60.3 | 14 | 1mo ago |
| ToolBench I3-Inst | Claude-3.5-Sonnet + TAFC | Pass Rate | 52.4 | 14 | 1mo ago |
| ToolBench I2-Inst | Claude-3.5-Sonnet + TAFC | Pass Rate | 71.4 | 14 | 1mo ago |
| ToolBench I1-Inst | Claude-3.5-Sonnet + TAFC | Pass Rate | 57.1 | 14 | 1mo ago |
| BFCL (Held-In) | SHAD+RFT | Accuracy | 89.4 | 14 | 1mo ago |
| ACEBench Normal | | Accuracy | 75.6 | 13 | 1mo ago |
| BFCL v4 | | Score | 68.8 | 13 | 1mo ago |
| Others (SealTools, OpenFunc, ToolAlpaca) | ST-Qwen2.5-14B | Overall Accuracy | 87.4 | 12 | 1mo ago |