| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Scene Text Visual Question Answering | ST-VQA (val) | ANLS | 0.845 | 30 |
| Scene Text Visual Question Answering | ST-VQA (test) | ANLS | 0.799 | 21 |
| Visual Question Answering | ST-VQA (test) | ANLS | 75.8 | 15 |
| Visual Question Answering | ST-VQA | Accuracy | 80.5 | 15 |
| Scene-Text Visual Question Answering | ST-VQA 1.0 (val) | ANLS | 72.9 | 15 |
| Scene-Text Visual Question Answering | ST-VQA 1.0 (test) | ANLS | 71.8 | 14 |
| Copyright tracking | ST-VQA | ASR | 56 | 13 |
| Scene Text Visual Question Answering | ST-VQA 8 (test) | ANLS | 69.6 | 10 |
| Copyright Tracking | ST-VQA full (train) | ASR | 77 | 8 |
| Scene Text Visual Question Answering | ST-VQA 8 (val) | Accuracy | 0.6164 | 8 |
| Image question answering | ST-VQA public server (test) | Accuracy | 75.8 | 3 |
| Image question answering | ST-VQA public server | Accuracy | - | 0 |