| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | SQA | Accuracy: 93.42 | 41 |
| Science Question Answering | SQA-I | Score: 79 | 38 |
| Science Question Answering | SQA IMG | Accuracy: 70.7 | 37 |
| Science Question Answering | SQA | Accuracy (SQA): 98 | 33 |
| Sequential Question Answering | SQA (test) | Accuracy (All): 74.5 | 33 |
| Science Question Answering | SQA | SQA Score: 96.6 | 26 |
| Visual Question Answering | SQA-Image | Accuracy: 70.2 | 25 |
| Question Answering | SQA | Accuracy: 79.62 | 24 |
| Reasoning | SQA | Accuracy: 85 | 23 |
| Science Question Answering | SQA IMG | Score: 97.67 | 23 |
| Science Question Answering | SQA | SQA Score: 97 | 22 |
| Image-Language Understanding | SQA | EM: 71.6 | 21 |
| Science Question Answering | SQA | SQA Score: 69.91 | 19 |
| Deep Research | SQA v2 | Score: 88.3 | 18 |
| Science Question Answering | SQA | Exact Match: 98.76 | 14 |
| Science Question Answering | SQA | Score: 69.3 | 13 |
| Science Question Answering | SQA-I | Accuracy: 67.9 | 13 |
| Search-based Question Answering | SQA CS V2 (test) | IR: 57.62 | 12 |
| Table Question Answering | SQA (test) | Accuracy (All): 72.4 | 11 |
| Table Question Answering | SQA Perturbed (test) | Overall Accuracy: 0.723 | 8 |
| Scholarly Question Answering | SQA CS V2 | Overall Score: 89.7 | 6 |
| 3D Visual Question Answering | SQA (test) | EM@1: 53.32 | 5 |
| Sequential Question Answering | SQA | Overall Accuracy: 74.5 | 5 |
| Sequential Question Answering | SQA first fold (dev) | Accuracy (ALL): 68 | 5 |
| Scholarly QA | SQA v2 | Score: 41.8 | 4 |