| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Diagram Question Answering | AI2D | AI2D Accuracy 96.02 | 196 |
| Visual Question Answering | AI2D | Accuracy 87.3 | 174 |
| Diagram Understanding | AI2D | Accuracy 94.2 | 167 |
| Diagram Understanding | AI2D (test) | Accuracy 94.7 | 107 |
| Diagram Question Answering | AI2D (test) | Accuracy 94.7 | 103 |
| Diagram Understanding | AI2D 1.0 (test) | Accuracy 96.3 | 58 |
| Visual Question Answering | AI2D (test) | Accuracy 96.3 | 54 |
| Visual Question Answering | AI2D 65 (test) | Score 98.7 | 23 |
| Diagram Understanding | AI2D F | Accuracy 59.7 | 23 |
| Visual Question Answering | AI2D | EM 82.48 | 23 |
| OCR-based Visual Question Answering | AI2D 2016 (test) | Accuracy 84.6 | 21 |
| Diagram Understanding | AI2D | Exact Match 79.11 | 19 |
| Chart Understanding | AI2D | AI2D Score 0.947 | 18 |
| Multimodal Understanding | AI2D | Score 85.56 | 17 |
| OCR, Chat/Doc QA | AI2D (val) | AI2D Accuracy 84 | 13 |
| Multimodal Reasoning | AI2D | Score 0.857 | 13 |
| Document Understanding | AI2D (test) | Accuracy 88.9 | 11 |
| Multimodal Understanding | AI2D w/o M | Accuracy 81.6 | 9 |
| Multimodal Understanding | AI2D no mask | Score 94.46 | 9 |
| Multimodal Understanding | AI2D | Accuracy 80.8 | 7 |
| Document and chart understanding | AI2D | Pass@1 88.7 | 7 |
| OCR-related Understanding Tasks | AI2D w. M. | Accuracy 89.1 | 7 |
| Diagram Question Answering | AI2D no mask | Accuracy 82.5 | 7 |
| General Visual Understanding | AI2D | Accuracy 83.89 | 6 |
| OCR-related understanding | AI2D (test) | Accuracy 85 | 6 |