| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Vision-Language Model Editing | FVQA 1.0 (test) | Accuracy | 100 | 48 |
| Fact-based Visual Question Answering | FVQA | Accuracy | 74.2 | 21 |
| Visual Question Answering | FVQA | Accuracy | 68.44 | 16 |
| Visual Question Answering | FVQA (test) | Top-1 Acc | 77.99 | 14 |
| Fact-based Visual Question Answering | FVQA (test) | Top-1 WUPS@0.9 | 82.47 | 13 |
| Fact-based Visual Question Answering | FVQA 1.0 (test) | WUPS@0.0 (Top-1) | 87.3 | 13 |
| Visual Question Answering | FVQA 2.0+ | LLM-J Score (Qwen2.5-7B) | 59.5 | 8 |