| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-modal Long-context Benchmarking | MileBench | Task T Score: 57.23 | 39 |
| Multi-image understanding | MileBench (test) | Temporal Multi-Image Score (Task T): 57.3 | 21 |
| Multi-image Multi-modal Question Answering | MileBench | CL-CH Score: 44.76 | 18 |
| Long-context multimodal evaluation | MileBench (test) | TN Score: 25.34 | 18 |