| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | VizWiz | Accuracy 78.8 | 1,043 |
| Visual Question Answering | VizWiz (test) | Accuracy 69.3 | 66 |
| Visual Question Answering | VizWiz (test-dev) | Accuracy 76.4 | 65 |
| Visual Question Answering | VizWiz (val) | VQA Score 74.6 | 45 |
| Comprehensive Evaluation | VizWiz (val) | Score 69.16 | 16 |
| Visual Question Answering | VizWiz (val test) | Accuracy 39.76 | 15 |
| Image Captioning | VizWiz (test) | CIDEr 125.7 | 10 |
| Image Captioning | VizWiz-Captions (test-dev) | CIDEr-D 123 | 10 |
| Visual Question Answering | VizWiz I (test) | VQA Accuracy 57.2 | 10 |
| Image Captioning | VizWiz | CIDEr 36 | 9 |
| Vision Understanding | VizWiz (test) | VizWiz Score 54.7 | 8 |
| Visual Question Answering | VizWiz (val-lite) | Accuracy 54.36 | 6 |
| Open-Ended Visual Question Answering | VizWiz (val) | Accuracy 56.39 | 6 |
| Visual Question Answering | VizWiz Far OOD | Accuracy 23.05 | 6 |
| Visual Question Answering | VizWiz (test-std) | Accuracy 65.4 | 5 |
| Referring Expression Grounding | VizWizG (test-dev) | IoU 65.7 | 5 |
| Image Captioning | VizWiz-Cap (val) | CIDEr 125.7 | 4 |
| Image question answering | VizWiz public server | Accuracy 70.1 | 3 |
| Image captioning | VizWiz public server | CIDEr 120.8 | 3 |
| Answerability | VizWiz 2022 (test) | AP 83.78 | 3 |
| Answerability | VizWiz 2022 (test-dev) | AP 84.13 | 3 |
| Visual Question Answering | VizWiz 2022 (test-std) | Accuracy 60.15 | 3 |
| Visual Question Answering | VizWiz 2022 (test-dev) | Accuracy 61.64 | 3 |
| Visual Question Answering | VizWiz zero-shot | Zero-shot Accuracy 69.46 | 2 |