| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | OK-VQA (test) | Accuracy 76.5 | 296 |
| Visual Question Answering | OK-VQA | Accuracy 84.7 | 224 |
| Visual Question Answering | OK-VQA v1.0 (test) | Accuracy 58 | 77 |
| Visual Question Answering | OK-VQA (val) | Accuracy 66.8 | 47 |
| Visual Question Answering | OK-VQA v1.1 (test) | VQA Score 66.1 | 28 |
| Visual Question Answering | OK-VQA | VQA Score 80.7 | 18 |
| Knowledge-Based Visual Question Answering | OK-VQA v1.0 (test) | Accuracy 61.2 | 15 |
| Visual Question Answering | OK-VQA | Score 58.2 | 14 |
| External Knowledge-dependent Image Question Answering | OK-VQA | Accuracy 70.6 | 14 |
| Visual Question Answering | OK-VQA 2019 | V-Score 57.3 | 12 |
| Speech-Visual Question Answering | OK-VQA Speech-converted | Accuracy 29.04 | 12 |
| Knowledge Retrieval | OK-VQA v1.1 (test) | Recall@5 89.32 | 10 |
| Visual Question Answering | OK-VQA Far OOD | Accuracy 50.11 | 6 |
| Visual Question Answering | OK-VQA (test) | CIDEr 0.4 | 6 |
| Visual Question Answering | OK-VQA | Accuracy (Clean) 59.6 | 5 |
| Visual Question Answering | OK-VQA | Inference Time 0.661 | 4 |
| Retrieval | OK-VQA | Pseudo Recall@5 73.4 | 2 |