| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | OK-VQA (test) | Accuracy 76.5 | 327 |
| Visual Question Answering | OK-VQA | Accuracy 84.7 | 260 |
| Visual Question Answering | OK-VQA v1.0 (test) | Accuracy 58 | 77 |
| External Knowledge-dependent Image Question Answering | OK-VQA | Accuracy 91.9 | 49 |
| Visual Question Answering | OK-VQA (val) | Accuracy 66.8 | 47 |
| Knowledge-based Visual Question Answering | OK-VQA | VQA Score 68.6 | 32 |
| Visual Question Answering | OK-VQA v1.1 (test) | VQA Score 66.1 | 28 |
| Visual Question Answering | OK-VQA | EM 44.29 | 24 |
| Visual Question Answering | OK-VQA | VQA Score 80.7 | 18 |
| Knowledge-Based Visual Question Answering | OK-VQA v1.0 (test) | Accuracy 61.2 | 15 |
| Visual Question Answering | OK-VQA | Score 58.2 | 14 |
| Visual Question Answering | OK-VQA full v1.0 (val) | VQA Accuracy 59.56 | 12 |
| Visual Question Answering | OK-VQA 2019 | V-Score 57.3 | 12 |
| Speech-Visual Question Answering | OK-VQA Speech-converted | Accuracy 29.04 | 12 |
| Privacy Protection | OK-VQA (test) | cMAP 59.3 | 10 |
| Privacy Recognition | OK-VQA | cMAP 59.3 | 10 |
| Knowledge Retrieval | OK-VQA v1.1 (test) | Recall@5 89.32 | 10 |
| Retrieval | OK-VQA (test) | PRR@5 88 | 7 |
| Visual Question Answering | OK-VQA Far OOD | Accuracy 50.11 | 6 |
| Visual Question Answering | OK-VQA (test) | CIDEr 0.4 | 6 |
| Visual Question Answering | OK-VQA Non-IID | Accuracy 46.06 | 5 |
| Visual Question Answering | OK-VQA IID | Accuracy 49.74 | 5 |
| Visual Question Answering | OK-VQA | Accuracy (Clean) 59.6 | 5 |
| Visual Question Answering | OK-VQA | Inference Time 0.661 | 4 |
| Knowledge-Based Question Answering | OK-VQA + A-OKVQA | Accuracy 87.31 | 3 |