| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| StableToolBench (STB) | Llama3.1-8B | EM Accuracy | 49.25 | 16 | 2d ago |
| WebShop (WS) | Qwen2.5-7B | EM Accuracy | 79.05 | 16 | 2d ago |
| TextWorld (TW) | Qwen2.5-7B | EM Accuracy | 70.6 | 16 | 2d ago |
| SciWorld (SW) | Llama3.1-8B | EM Accuracy | 98.64 | 16 | 2d ago |
| ALFWorld (AW) | Qwen2.5-7B | EM Accuracy | 99.87 | 16 | 2d ago |