| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Visual Question Answering | VizWiz | Accuracy 100 | 1,820 | |
| Visual Question Answering | VizWiz (test) | Accuracy 69.3 | 105 | |
| Visual Question Answering | VizWiz (val) | VQA Score 81.7 | 66 | |
| Visual Question Answering | VizWiz (test-dev) | Accuracy 76.4 | 65 | |
| Visual Reasoning | VizWiz | ECE 0.11 | 32 | |
| Visual Question Answering | VizWiz | Acc 68.2 | 31 | |
| Visual Question Answering | VizWiz | VW Score 70.4 | 25 | |
| Visual Reasoning | VizWiz | Discriminability 0.24 | 24 | |
| Vision Understanding | VizWiz (test) | VizWiz Score 54.7 | 24 | |
| Visual Question Answering | VizWiz | Accuracy (VizWiz) 55.58 | 19 | |
| Visual Question Answering | VizWiz | Score 68.3 | 16 | |
| Comprehensive Evaluation | VizWiz (val) | Score 69.16 | 16 | |
| Visual Question Answering | VizWiz (val test) | Accuracy 39.76 | 15 | |
| Visual Question Answering | VizWiz | VQA Accuracy (Clean) 41.5 | 14 | |
| Answerability | VizWiz | Accuracy 61.5 | 12 | |
| Uncertainty Quantification | VizWiz | AUROC 0.681 | 10 | |
| Image Captioning | VizWiz (test) | CIDEr 125.7 | 10 | |
| Image Captioning | VizWiz-Captions (test-dev) | CIDEr-D 123 | 10 | |
| Visual Question Answering | VizWiz I (test) | VQA Accuracy 57.2 | 10 | |
| Image Captioning | VizWiz | CIDEr 36 | 9 | |
| Robustness Evaluation | VizWiz | Accuracy 70.9 | 6 | |
| Visual Question Answering and Grounding | VizWiz (test) | CLIPScore 0.757 | 6 | |
| Visual Question Answering | VizWiz (val-lite) | Accuracy 54.36 | 6 | |
| Open-Ended Visual Question Answering | VizWiz (val) | Accuracy 56.39 | 6 | |
| Visual Question Answering | VizWiz Far OOD | Accuracy 23.05 | 6 | |