| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Reasoning | NaturalBench | Accuracy: 82.5 | 24 |
| Robustness to Natural Adversarial Examples | NaturalBench | Accuracy: 9.89 | 20 |
| Vision-Language Reasoning | NaturalBench (test) | Simple Accuracy: 66.02 | 7 |
| Paired-prompt evaluation | NaturalBench | Simple Accuracy: 67.81 | 2 |
| Visual Question Answering | NaturalBench | Simple Accuracy: 0.6946 | 2 |