| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| DocVQA (test) | Qwen2-VL-72B | Accuracy 96.5 | 39 | 3mo ago | |
| DUDE | Qwen3 VL | Accuracy 61.8 | 32 | 19d ago | |
| AI2D | | Accuracy 0.977 | 28 | 1mo ago | |
| DUE Benchmark | LayoutLMv2LARGE + QG | DocVQA 86.7 | 24 | 3mo ago | |
| DocVQA | OpenVLThinkerV2 | ANLS 96.7 | 21 | 1mo ago | |
| OmniDocBench standard (test) | DocHumming | Overall Score 93.75 | 19 | 2mo ago | |
| InfoVQA (test) | Qwen2-VL-72B | Accuracy 84.5 | 18 | 3mo ago | |
| AI2D (test) | Qwen3-VL 32B | Accuracy 88.9 | 17 | 2mo ago | |
| ChartXiv-DQ | | Accuracy 95.95 | 16 | 3mo ago | |
| MPDocVQA | DocSeeker | ANLS 86.2 | 15 | 1mo ago | |
| GRAPH2EVAL-BENCH | GPT-4o | F1 Score 59.16 | 14 | 2mo ago | |
| LongDocURL | GPT-4o | Accuracy 64.5 | 12 | 1mo ago | |
| LongBench | XATTN | CC 37.23 | 12 | 2mo ago | |
| CharXiv reas. | | Accuracy 0.686 | 11 | 3mo ago | |
| DocVQA (val) | ERNIE 5.0 | Accuracy 95.45 | 11 | 2mo ago | |
| InfoVQA | OpenVLThinkerV2 | Score 86.4 | 10 | 1mo ago | |
| FireRedBench (test) | | Overall Score 0.8185 | 10 | 3mo ago | |
| DUE-Benchmark (test) | UDOP | DocVQA 84.7 | 10 | 3mo ago | |
| SlideVQA | DocSeeker | F1 Score 77.1 | 8 | 1mo ago | |
| ChartQA v1.0 (test) | VRE | Overall Accuracy 88.8 | 8 | 2mo ago | |
| Mendeley Clinical Laboratory Test Reports | Gemini 3.0 Pro | Macro F1 90 | 7 | 1mo ago | |
| EHR Dataset 4 | Gemini 3.0 Flash | Macro F1 82 | 7 | 1mo ago | |
| EHR Dataset 3 | Gemini 3.0 Pro | Macro F1 90 | 7 | 1mo ago | |
| EHR Dataset 2 | Gemma 3 27B | Macro F1 93 | 7 | 1mo ago | |
| ChartQA | | Relaxed Human Split Accuracy 76.4 | 6 | 28d ago | |