| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Vision-Language Model Editing | FVQA 1.0 (test) | Accuracy | 100 | 48 |
| Fact-based Visual Question Answering | FVQA | Accuracy | 74.2 | 46 |
| Visual Question Answering | FVQA (test) | Accuracy | 73.95 | 36 |
| Visual Question Answering | FVQA | Accuracy | 82.82 | 34 |
| Multimodal Deep Search | FVQA | Accuracy | 76.67 | 16 |
| Fact-based Question Answering | FVQA (test) | Accuracy | 70.1 | 16 |
| Fact-based Visual Question Answering | FVQA (test) | Top-1 WUPS@0.9 | 82.47 | 13 |
| Fact-based Visual Question Answering | FVQA 1.0 (test) | WUPS@0.0 (Top-1) | 87.3 | 13 |
| Visual Question Answering | FVQA 2.0+ | LLM-J Score (Qwen2.5-7B) | 59.5 | 8 |