| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | TextVQA | Accuracy: 85.4 | 1,453 |
| Text-based Visual Question Answering | TextVQA | Accuracy: 88.5 | 962 |
| Visual Question Answering | TextVQA (val) | VQA Score: 7,040 | 365 |
| Text-based Visual Question Answering | TextVQA (val) | Accuracy: 86.5 | 276 |
| Visual Question Answering | TextVQA | TextVQA Accuracy: 85.9 | 210 |
| Visual Question Answering | TextVQA (test) | Accuracy: 81.1 | 124 |
| Text-based Visual Question Answering | TextVQA | Score: 67.32 | 112 |
| Text-based Visual Question Answering | TextVQA (VQA^T) | Accuracy: 78 | 96 |
| Visual Question Answering | TextVQA | Accuracy: 88.7 | 94 |
| Visual Question Answering | TextVQA v1.0 (val) | Accuracy: 85.5 | 84 |
| Visual Question Answering | TextVQA | Accuracy: 97.15 | 79 |
| OCR-related Understanding Tasks | TextVQA (val) | Accuracy: 86.62 | 64 |
| Text-based Visual Question Answering | TextVQA VQAT | Accuracy: 69.74 | 61 |
| Text-based Visual Question Answering | TextVQA | Score: 85.2 | 60 |
| Text-based Visual Question Answering | TextVQA | Accuracy: 61.3 | 58 |
| OCR Visual Question Answering | TextVQA | Accuracy: 83.69 | 57 |
| Image Understanding | TextVQA | Accuracy: 725 | 43 |
| Visual Question Answering on Text | TextVQA | Accuracy: 58.21 | 41 |
| Visual Question Answering | TextVQA v1.0 (test) | Accuracy: 86.79 | 40 |
| Visual Question Answering | TextVQA | Accuracy: 97 | 38 |
| Visual Question Answering | TextVQA | Clean Accuracy: 70.3 | 37 |
| Text-based Visual Question Answering | TextVQA | TextVQA Accuracy: 73.78 | 33 |
| Text-based Visual Question Answering | TextVQA | ANLS: 60.7 | 33 |
| Visual Question Answering | TextVQA | VQA Accuracy: 39 | 33 |
| Multimodal Prompt Injection Attack | TextVQA | Attack Success Rate (ASR): 88.24 | 30 |