| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | TextVQA | Accuracy 85.4 | 1,285 |
| Text-based Visual Question Answering | TextVQA | Accuracy 88.5 | 807 |
| Visual Question Answering | TextVQA (val) | VQA Score 7,040 | 343 |
| Text-based Visual Question Answering | TextVQA (val) | Accuracy 86.5 | 262 |
| Visual Question Answering | TextVQA (test) | Accuracy 81.1 | 124 |
| Text-based Visual Question Answering | TextVQA | Score 67.32 | 112 |
| Text-based Visual Question Answering | TextVQA (VQA^T) | Accuracy 78 | 96 |
| Visual Question Answering | TextVQA | Accuracy 88.7 | 94 |
| Visual Question Answering | TextVQA v1.0 (val) | Accuracy 85.5 | 84 |
| Visual Question Answering | TextVQA | Accuracy 97.15 | 79 |
| Visual Question Answering | TextVQA | TextVQA Accuracy 80.12 | 67 |
| OCR-related Understanding Tasks | TextVQA (val) | Accuracy 86.62 | 57 |
| OCR Visual Question Answering | TextVQA | Accuracy 83.69 | 45 |
| Image Understanding | TextVQA | Accuracy 725 | 40 |
| Visual Question Answering | TextVQA v1.0 (test) | Accuracy 86.79 | 40 |
| Visual Question Answering | TextVQA | Clean Accuracy 70.3 | 37 |
| Visual Question Answering | TextVQA | VQA Accuracy 39 | 33 |
| Refusal Rate Evaluation | TextVQA | Refusal Rate 70 | 30 |
| Text-based Visual Question Answering | TextVQA VQAT | Accuracy 69.74 | 30 |
| Visual Question Answering | TextVQA (test val) | Accuracy 58.2 | 30 |
| Visual Question Answering | TextVQA | Accuracy 81.57 | 26 |
| Visual Question Answering | TextVQA | Exact Match (EM) 82.74 | 23 |
| Text-based Visual Question Answering | TextVQA (test) | Accuracy 83.8 | 23 |
| Visual Question Answering | TextVQA 130 (val) | Score 86.5 | 23 |
| Text-based Visual Question Answering | TextVQA 52 | Accuracy 63.8 | 23 |