| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Fine-grained Visual Question Answering | HRBench-8K | Overall Accuracy 69.63 | 28 |
| Fine-grained Visual Question Answering | HRBench 4K | Overall Accuracy 71.13 | 28 |
| High-Resolution Visual Reasoning | HRBench | Accuracy 0.7512 | 16 |
| General VQA | HRBench | Accuracy 78.5 | 14 |
| Fine-grained Visual Perception | HRBench-8K | Accuracy 34.53 | 12 |
| Visual Question Answering | HRBench 8K | Accuracy 76.25 | 12 |
| Visual Question Answering | HRBench-4K | Accuracy 0.7925 | 12 |
| High-resolution Image Comprehension | HRBench | HRBench 4K Score 0.734 | 9 |
| Visual Tool-Use | HRBench 8K | Accuracy 73.7 | 9 |
| Visual Tool-Use | HRBench 4K | Accuracy 80.1 | 9 |
| High-Resolution Multimodal Understanding | HRBench 8K | Accuracy 66.5 | 8 |
| Reasoning Efficiency (Token Usage) | HrBench N=800 | Avg Tokens 5.7 | 5 |