| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| ACEBench Agent | Qwen3-14B + ARTIS (Sequential) | End-to-End Accuracy: 60 | 15 | 4d ago |
| BFCL Multi-turn V4 | Qwen3-235B | Base Score: 58.5 | 7 | 4d ago |
| TAU-2 Bench | Qwen3-235B | Airline Score: 47.5 | 7 | 4d ago |
| ACEBench-en | | End-to-End Accuracy: 56 | 7 | 4d ago |
| ACEBench-zh | ERNIE 5.0 | Accuracy: 89.6 | 5 | 4d ago |
| TAU2-Bench | | Success Rate: 85.4 | 5 | 4d ago |