| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Function Calling | BFCL (Berkeley Function Calling Leaderboard) | Base Score | 41.8 | 28 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) Live and Non-live | Non-live AST Score | 90.8 | 11 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) v4 | Simple Accuracy | 77.25 | 9 |
| Function Calling | Berkeley Function-Calling Leaderboard (BFCL) | Non-Live Multiple AST Success Rate | 96 | 7 |
| Function Calling | Berkeley Function Calling Leaderboard (BFCL) Extended Setting (Non-Live) | Simple Success Rate | 74.92 | 6 |
| Tool / Agent | Berkeley Function Calling Leaderboard EN | Score | 36.17 | 2 |