| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| ToolSandbox | WISE-Flow | Similarity 0.923 | 19 | 1mo ago | |
| GTM | GTM-1.5B | Average Score 89.4 | 11 | 3mo ago | |
| OpenEarthAgent Step-by-step evaluation tool-agnostic rollouts | OpenEarthAgent | Instruction Adherence 0.9951 | 9 | 3mo ago | |
| ShortcutsBench clear instructions 200 queries | GPT-4o | API Selection Accuracy 92.5 | 8 | 1mo ago | |
| MCPToolBench++ | | Precision 81.8 | 7 | 2mo ago | |