| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Visual Question Answering | HRBench-4K | Accuracy 0.7925 | 54 |
| Visual Question Answering | HRBench 8K | Accuracy 76.25 | 51 |
| Fine-grained Visual Question Answering | HRBench-8K | Overall Accuracy 69.63 | 28 |
| Fine-grained Visual Question Answering | HRBench 4K | Overall Accuracy 71.13 | 28 |
| Fine-grained Visual Perception | HRBench-8K | Accuracy 68.8 | 21 |
| High-Resolution Visual Reasoning | HRBench | Accuracy 0.7512 | 16 |
| Visual Question Answering | HRBench 8K | FSP 88.5 | 15 |
| Visual Question Answering | HRBench-4K | FSP Score 92.8 | 15 |
| General VQA | HRBench | Accuracy 78.5 | 14 |
| High-Resolution Visual Perception | HRBench 4K | Score 83.5 | 13 |
| Visual Tool-Use | HRBench 8K | Accuracy 73.7 | 13 |
| High-Resolution Multimodal Understanding | HRBench 8K | Accuracy 71.5 | 13 |
| Real-World Understanding | HRBench 4K | Score 86.9 | 10 |
| Perceptual Robustness | HRBench 8K | Overall Score 71.5 | 9 |
| Perceptual Robustness | HRBench-4K | Overall Score 72.38 | 9 |
| High-resolution Image Comprehension | HRBench | HRBench 4K Score 0.734 | 9 |
| Visual Tool-Use | HRBench 4K | Accuracy 80.1 | 9 |
| Real-World Understanding | HRBench 8K | Accuracy 73.8 | 8 |
| Real-World Understanding | HRBench (4K) | Accuracy 77.9 | 8 |
| Reasoning Efficiency (Token Usage) | HrBench N=800 | Avg Tokens 5.7 | 5 |
| Visual Reasoning with Tool Use | HRBench 4K | Accuracy 76.8 | 4 |