| Task Name | Dataset Name | SOTA Result (Exact Match, EM) | Trend |
|---|---|---|---|
| Multimodal Question Answering | CRIT Scientific Paper 1.0 (test) | 15.9 | 11 |
| Multimodal Question Answering | CRIT (Video Frame) 1.0 (test) | 38.8 | 11 |
| Multimodal Question Answering | CRIT Natural Image 1.0 (test) | 58.6 | 11 |
| Visual Reasoning | CRIT Scientific Paper | 15.9 | 6 |
| Visual Reasoning | CRIT Video Frame | 38.8 | 6 |
| Visual Reasoning | CRIT Natural Image | 58.6 | 6 |