| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| BFCL Multiturn (OOD) v3 (test) | TopoCurate-RL | Base Rate | 48 | 18 | 3mo ago |
| τ-bench airline | ReAct | Pass@1 | 30.4 | 6 | 1mo ago |
| τ-bench retail | FAMA | Pass@1 Success Rate | 44.173 | 6 | 1mo ago |
| τ-bench retail domain (All 115 tasks) | POLCA | Pass@1 | 43.9 | 4 | 2mo ago |
| τ-bench retail domain (Last 105 tasks) | POLCA | Pass@1 | 42.5 | 4 | 2mo ago |
| τ-bench retail domain (First 10 tasks) | POLCA | Pass@1 Success Rate | 57.5 | 4 | 2mo ago |
| τ-bench airline (test) | FAMA | Pass@1 Success Rate | 37.6 | 3 | 1mo ago |
| τ-bench retail (test) | FAMA | Pass@1 Success Rate | 34.6 | 3 | 1mo ago |