| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| τ-Bench | IOA | Score: 62.58 | 100 | 2d ago |
| GAIA (val) | InternAgent-1.5 | Average Score: 86.06 | 17 | 4d ago |
| FRAMES n=50 (full) | | Accuracy: 77.31 | 8 | 4d ago |
| HLE | | Overall Score: 41.6 | 7 | 4d ago |
| TIR-Bench | PyVision-Image | Accuracy: 19.8 | 3 | 4d ago |
| τ-Bench (test) | - | Score: - | 0 | 4d ago |