| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---:|---:|
| Long-context document understanding | MMLongBench-Doc | Accuracy | 55.8 | 58 |
| Document Visual Question Answering | MMLongBench Doc | Accuracy | 45.6 | 48 |
| Visual Document Retrieval | MMLongBench | Doc Retrieval Rate | 53.82 | 46 |
| Multimodal Document Question Answering | MMLongBench-Doc | Overall Accuracy | 65.8 | 44 |
| Document Question Answering | MMLongBench-Doc | Accuracy (all) | 69.6 | 23 |
| Long-document Visual Question Answering | MMLongBench Overall | Average Score | 90.77 | 22 |
| Long-document Visual Question Answering | MMLongBench 128K context | MMLB-D | 83.33 | 22 |
| Long-document Visual Question Answering | MMLongBench 64K context | MMLB-D | 93.1 | 22 |
| Multimodal Document Question Answering | MMLongBench | Accuracy | 43.2 | 19 |
| Retrieval | MMLongBench | Recall | 75.86 | 18 |
| Long-context Multi-modal Understanding | MMLongBench | Text Accuracy | 27.49 | 17 |
| Document Understanding, OCR & Charts | MMLongBench Doc | Score | 57.5 | 14 |
| Reasoning over rich modalities | MMLongBench Doc | Accuracy | 42.3 | 12 |
| Multimodal Document Question Answering | MMLongBench (test) | Chart Acc. | 34.7 | 12 |
| Long-context Visual Question Answering | MMLongBench 32K | Accuracy | 82.4 | 11 |
| Long-context Visual Question Answering | MMLongBench 128K | Accuracy | 78.6 | 11 |
| Document Question Answering | MMLongBench | Exact Match | 43.8 | 11 |
| Retrieval | MMLongBench Finreport | MRR@10 | 49.62 | 6 |
| Retrieval | MMLongBench Doc | MRR@10 | 47.64 | 6 |
| Dataset Description Extraction | MMLongBench-Doc | Accuracy | 94.9 | 5 |
| Long-document Visual Question Answering | MMLongBench 512K context | MMLongBench-D Score | 31.91 | 4 |
| Long-document Visual Question Answering | MMLongBench 256K context | MMLB-D Score | 31.63 | 4 |
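Two of the Retrieval rows report MRR@10. The sketch below shows how that metric is usually computed. It assumes the standard definition: for each query, take the reciprocal rank of the first relevant item within the top 10 results (0 if none appears), then average over queries. The table appears to report this value scaled by 100.

This is only an illustration of the general metric. It is not the benchmark's evaluation script, and the exact relevance labels used for MMLongBench are not given here. The function name `mrr_at_k` and the page IDs in the usage example are hypothetical.

```python
from typing import Sequence, Set


def mrr_at_k(
    ranked_lists: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant item within the top k.

    Hypothetical helper illustrating the standard MRR@k definition;
    returns a value in [0, 1] (multiply by 100 for a percentage).
    """
    if len(ranked_lists) != len(relevant):
        raise ValueError("need exactly one relevance set per query")
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked_lists, relevant):
        # Only the first relevant hit inside the top-k cutoff counts.
        for rank, item in enumerate(ranking[:k], start=1):
            if item in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical example: query 1 finds its relevant page at rank 2 (score 1/2),
# query 2 finds nothing relevant in its top 10 (score 0).
rankings = [["p3", "p7", "p1"], ["p2", "p9"]]
gold_sets = [{"p7"}, {"p5"}]
print(100 * mrr_at_k(rankings, gold_sets))
```

In the example, the mean of 1/2 and 0 is 0.25, which prints as 25.0 on the percentage scale used in the table.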