| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Function Calling | BFCL (Berkeley Function Calling Leaderboard) | Base Score: 41.8 | 28 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) Overall November 19, 2025 | Non-live Accuracy: 69.44 | 20 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) Live and Non-live | Non-live AST Score: 90.8 | 11 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) v4 | Simple Accuracy: 77.25 | 9 |
| Function Calling | Berkeley Function-Calling Leaderboard (BFCL) | Non-Live Multiple AST Success Rate: 96 | 7 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) Live v3 | Score: 53.8 | 6 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) Non Live | Score: 70.1 | 6 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) Extended Setting (Non-Live) | Simple Success Rate: 74.92 | 6 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) | Overall Success: 68.92 | 5 |
| Tool / Agent | Berkeley Function Calling Leaderboard EN | Score: 36.17 | 2 |