| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Chat | LLaVA-W Bench | Score 70.1 | 14 |
| Multimodal Reasoning | LLaVA-W (test) | Accuracy 90.6 | 12 |
| Out-of-Distribution General Visual Question Answering | LLaVA-W | Score 0.755 | 6 |
| Vision-to-Text | LLAVA-W English | Score 111.9 | 3 |