Reward Modeling on ARMBench-VL ours (test)
[Chart: best FG Score 67.6, ARM-Thinker-7B, Dec 4, 2025. Selectable metrics: FG Score, IF Score, Doc Score, Average Score.]
Evaluation Results
| Method | Parameters | Date | FG Score | IF Score | Doc Score | Average Score |
|---|---|---|---|---|---|---|
| ARM-Thinker-7B | 7B | 2025.12 | 67.6 | 73.8 | 52.4 | 64.6 |
| GPT-4o | - | 2025.12 | 61.8 | 69.5 | 58.7 | 63.3 |
| InternVL3-8B | 8B | 2025.12 | 58.9 | 59.1 | 47 | 55 |
| InternVL3.5-8B | 8B | 2025.12 | 56.7 | 57.7 | 52 | 55.5 |
| UnifiedReward-7B | 7B | 2025.12 | 52 | 47.2 | 42.8 | 47.4 |
| Qwen2.5-VL-7B | 7B | 2025.12 | 51.8 | 45.4 | 41.1 | 46.1 |
| Qwen3-VL-8B | 8B | 2025.12 | 47.6 | 56.6 | 47.6 | 50.6 |