| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Spatio-Temporal Reasoning | V-STaR | Chain1 (When) m tIoU: 27.5 | 44 |
| Vision-Centric Question Answering | V-Star | Accuracy: 83.6 | 20 |
| Fine-grained Visual Perception | V-Star | Accuracy: 79 | 20 |
| Spatio-Temporal Reasoning | V-STAR (test) | What Accuracy: 64.1 | 15 |
| Visual Search | V-star | Accuracy: 83.8 | 5 |