| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Question Answering | TextVQA | Accuracy | 85.4 | 1,117 |
| Text-based Visual Question Answering | TextVQA | Accuracy | 86.2 | 496 |
| Visual Question Answering | TextVQA (val) | VQA Score | 7,040 | 309 |
| Text-based Visual Question Answering | TextVQA (val) | Accuracy | 85.5 | 146 |
| Visual Question Answering | TextVQA (test) | Accuracy | 81.1 | 124 |
| Visual Question Answering | TextVQA | Accuracy | 97.15 | 79 |
| Visual Question Answering | TextVQA | Accuracy | 88.7 | 69 |
| Visual Question Answering | TextVQA v1.0 (val) | Accuracy | 85.5 | 69 |
| Text-based Visual Question Answering | TextVQA (VQA^T) | Accuracy | 70.4 | 65 |
| Text-based Visual Question Answering | TextVQA | Score | 63.2 | 38 |
| Visual Question Answering | TextVQA | Clean Accuracy | 70.3 | 37 |
| Visual Question Answering | TextVQA | VQA Accuracy | 39 | 33 |
| Visual Question Answering | TextVQA v1.0 (test) | Accuracy | 86.79 | 27 |
| Visual Question Answering | TextVQA | Exact Match (EM) | 82.74 | 23 |
| Visual Question Answering | TextVQA 130 (val) | Score | 86.5 | 23 |
| Text-based Visual Question Answering | TextVQA 52 | Accuracy | 63.8 | 23 |
| OCR-related Understanding Tasks | TextVQA (val) | Accuracy | 86.62 | 22 |
| Text-based Visual Question Answering | TextVQA | Average Score | 100 | 21 |
| Image Understanding | TextVQA | Accuracy | 85.76 | 16 |
| Visual Question Answering | TextVQA 1k (test) | ASR (%) | 96.46 | 15 |
| Text-based Visual Question Answering | TextVQA (TQA) | Score | 66.6 | 14 |
| Copyright tracking | TextVQA | ASR | 47 | 13 |
| OCR-based Visual Question Answering | TextVQA 2019 (val) | Accuracy | 83.8 | 13 |
| Visual Question Answering | TextVQA | Score | 69.89 | 12 |
| OCR VQA | TextVQA (test) | Pre Accuracy | 61.9 | 10 |