| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-context document understanding | MMLongBench-Doc | Accuracy | 55.8 | 58 |
| Visual Document Retrieval | MMLongBench | Doc Retrieval Rate | 53.82 | 46 |
| Multimodal Document Question Answering | MMLongBench-Doc | Overall Accuracy | 65.8 | 44 |
| Document Visual Question Answering | MMLongbench doc | Accuracy | 45.6 | 34 |
| Document Question Answering | MMLongBench-Doc | Accuracy (all) | 69.6 | 23 |
| Multimodal Document Question Answering | MMLongBench | Accuracy | 43.2 | 19 |
| Retrieval | MMLongBench | Recall | 75.86 | 18 |
| Long-context Multi-modal Understanding | MMLongBench | Text Accuracy | 27.49 | 17 |
| Multimodal Document Question Answering | MMLongBench (test) | Chart Acc. | 34.7 | 12 |
| Long-context Visual Question Answering | MMLongBench 32K | Accuracy | 82.4 | 11 |
| Long-context Visual Question Answering | MMLongBench 128K | Accuracy | 78.6 | 11 |
| Document Question Answering | MMLongBench | Exact Match | 43.8 | 11 |
| Retrieval | MMLongBench Finreport | MRR@10 | 49.62 | 6 |
| Retrieval | MMLongBench Doc | MRR@10 | 47.64 | 6 |
| Dataset Description Extraction | MMLongBench-Doc | Accuracy | 94.9 | 5 |