| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Visual Question Answering | Visual7W (test) | Average Accuracy | 85.33 | 15 |
| Multiple Choice Visual Question Answering | Visual7W (test) | Accuracy (MC) | 72.3 | 13 |
| Object Counting | Visual7W Count | Accuracy | 55 | 6 |
| Visual Question Answering | Visual7w | Telling QA | 79.5 | 6 |
| Visual Question Answering | Visual7W (val) | Acc-MC | 69.8 | 4 |
| Grounded Visual Question Answering | Visual7W (test) | Accuracy | 91.05 | 2 |