| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Reasoning | Vstar Bench Spatial | Accuracy 90.8 | 19 |
| Multimodal Reasoning | Vstar Bench Attr | ACC 94.8 | 19 |
| High-resolution Visual Understanding | Vstar Bench | Attribute Score 94.8 | 12 |
| Visual Reasoning | Vstar Bench Spatial | Accuracy 81.6 | 10 |
| Visual Grounding and Reasoning | VStar-Bench | Overall Score 84.29 | 9 |