| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Document Visual Question Answering | SlideVQA | Accuracy 0.849 | 53 |
| Visual Question Answering | SlideVQA | Overall Accuracy 78.74 | 46 |
| Slide Question Answering | SlideVQA | Overall Score 79.5 | 29 |
| Document Question Answering | SlideVQA (test) | EM 63.2 | 19 |
| Visual Document Retrieval | SlideVQA | Recall@10 97.87 | 13 |
| Visual Document Retrieval | SlideVQA | NDCG@5 82.5 | 13 |
| Document Understanding | SlideVQA | F1 Score 77.1 | 8 |
| Evidence Selection | SlideVQA | ES EM 97.7 | 6 |
| Multimodal Document Retrieval | SlideVQA | MRR 93.91 | 6 |
| Retrieval | SlideVQA | R@3 92.81 | 6 |
| Local Question Answering | SlideVQA 2k | Accuracy 64.85 | 5 |
| Visual Question Answering | SlideVQA (test) | Overall Accuracy 90.5 | 4 |
| Document Question Answering | SlideVQA | TFLOPS (Encoder) 9.9 | 2 |