| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MMBench | Qwen2.5vl-Instruct | Accuracy | 86.4 | 30 | 3d ago |
| MedXpertQA-MM | o1 | Accuracy | 49.7 | 27 | 3d ago |
| DermaVQA | o1 | Accuracy | 43 | 12 | 3d ago |
| PathVQA | Lingshu-32B | Accuracy | 65.9 | 12 | 3d ago |
| PMC-VQA | PulseMind-72B | Accuracy | 70.3 | 12 | 3d ago |
| VQA-RAD | PulseMind-72B | Accuracy | 87.1 | 12 | 3d ago |
| MMMU Health & Medicine | PulseMind-72B | Accuracy | 0.694 | 12 | 3d ago |
| MultiModalQA (dev) | T5-3B | F1 Score | 85.28 | 5 | 3d ago |
| MMMU Pros | MergeMix | Accuracy | 37.46 | 4 | 3d ago |
| MMMU | VisionThink-7B | Accuracy | 51 | 4 | 3d ago |
| MMBench-CC | VisionThink-7B | Accuracy | 0.645 | 4 | 3d ago |
| MMStar | MergeMix | Accuracy | 62.92 | 4 | 3d ago |
| MMStar (test) | - | Accuracy | 72.7 | 4 | 3d ago |
| MMMU (val) | Proprietary API SOTA (Hurst et al., 2024) | Accuracy | 70.7 | 4 | 3d ago |
| MMBench v1.1 (test) | - | Accuracy | 0.857 | 4 | 3d ago |
| MMBench en (dev) | POINTS-9B | Overall Score | 83.2 | 4 | 3d ago |
| Urban Park Monitoring Question Dataset Quantitative | - | - | - | 0 | 3d ago |
| Urban Park Monitoring Question Dataset Qualitative | - | - | - | 0 | 3d ago |
| Urban Park Monitoring Question Dataset Basic | - | - | - | 0 | 3d ago |