| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Agent Task | TIR-Bench | Average@4 47.75 | 24 |
| Multimodal Tool-Use | TIR-Bench | Avg@4 50 | 16 |
| Visual Reasoning | TIR-Bench | Average Score 51.8 | 15 |
| Visual Navigation | TIR-Bench Maze | Accuracy 65 | 9 |
| Tool-Integrated Reasoning | TIR-Bench | Score 20.8 | 4 |
| Agentic Reasoning | TIR-Bench | Accuracy 19.8 | 3 |