Visual Tool-Use on HRBench 8K
[Chart: Accuracy over time, Nov 24, 2025 to Dec 4, 2025. Current best: ARM-Thinker-7B, 73.7 Accuracy. Metrics shown: Accuracy, Faithfulness.]
Updated 1mo ago
Evaluation Results
Method             Details                    Date     Accuracy  Faithfulness
ARM-Thinker-7B     Size=7B, Backbone=Qwen...  2025.12  73.7      -
Mini-o3            Source=Lai et al. (2025)   2025.12  73.3      -
CodeV-7B-RL        Backbone=7B, Training=RL   2025.11  71.3      13.3
Thyme-RL-7B        Backbone=7B, Training=RL   2025.11  71.2      1.2
Qwen3-VL-8B        Size=8B                    2025.12  70.4      -
InternVL3.5-8B     Size=8B                    2025.12  69.9      -
DeepEyes           Source=Lai et al. (2025)   2025.12  69.5      -
DeepEyes-7B        Backbone=7B                2025.11  69.1      6.7
Pixel-Reasoner-7B  Backbone=7B                2025.11  68.5      7.6
InternVL3-8B       Size=8B                    2025.12  68.4      -
Pixel Reasoner     Source=Lai et al. (2025)   2025.12  66.9      -
Qwen2.5-VL-7B      Size=7B                    2025.12  64.6      -
GPT-4o             Source=Lai et al. (2025)   2025.12  58.3      -