| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Image Captioning Evaluation | ICBench Long Caption | Human Score | 76.57 | 40 |
| Image Captioning | ICBench short captions (test) | Fluency | 80.13 | 23 |
| Vision-Language-Action instruction following | ICBench Goal Suite (test) | Success Rate (SR) | 96.2 | 12 |
| Vision-Language-Action instruction following | ICBench Object Suite (test) | Success Rate (SR) | 94.2 | 12 |
| Vision-Language-Action instruction following | ICBench Spatial Suite (test) | Success Rate (SR) | 99.6 | 12 |
| Intrinsic Concept Extraction | ICBench D1 | SIMT-T Score (Object) | 28 | 2 |