Reward Modeling on ARMBench-VL ours (test)

Top result: ARM-Thinker-7B, 67.6 FG Score

Evaluation Results

The first score column is the FG Score; the other three score columns are unlabeled.

Method            Date     FG Score  Score 2  Score 3  Score 4
ARM-Thinker-7B    2025.12  67.6      73.8     52.4     64.6
GPT-4o            2025.12  61.8      69.5     58.7     63.3
InternVL3-8B      2025.12  58.9      59.1     47.0     55.0
InternVL3.5-8B    2025.12  56.7      57.7     52.0     55.5
UnifiedReward-7B  2025.12  52.0      47.2     42.8     47.4
Qwen2.5-VL-7B     2025.12  51.8      45.4     41.1     46.1
Qwen3-VL-8B       2025.12  47.6      56.6     47.6     50.6
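As a sanity check on the table, the sketch below rests on an assumption that the page does not state: the fourth score column is the unweighted mean of the first three. It verifies that assumption row by row, allowing a 0.1 tolerance for rounding, and it ranks methods by any score column. `ROWS`, `mean_matches_last_column`, and `rank_by` are hypothetical names chosen for illustration.

```python
# Rows copied from the evaluation table: (method, FG Score, score 2, score 3, score 4).
# Assumption (not stated on the page): score 4 is the unweighted mean of the first three.
ROWS = [
    ("ARM-Thinker-7B", 67.6, 73.8, 52.4, 64.6),
    ("GPT-4o", 61.8, 69.5, 58.7, 63.3),
    ("InternVL3-8B", 58.9, 59.1, 47.0, 55.0),
    ("InternVL3.5-8B", 56.7, 57.7, 52.0, 55.5),
    ("UnifiedReward-7B", 52.0, 47.2, 42.8, 47.4),
    ("Qwen2.5-VL-7B", 51.8, 45.4, 41.1, 46.1),
    ("Qwen3-VL-8B", 47.6, 56.6, 47.6, 50.6),
]


def mean_matches_last_column(row, tol=0.1):
    """Return True if the mean of the first three scores is within tol of score 4."""
    _, a, b, c, reported = row
    return abs((a + b + c) / 3 - reported) <= tol


def rank_by(column_index):
    """Return method names sorted by the given tuple index (1..4), highest score first."""
    return [r[0] for r in sorted(ROWS, key=lambda r: r[column_index], reverse=True)]


if __name__ == "__main__":
    for row in ROWS:
        mean = sum(row[1:4]) / 3
        print(f"{row[0]:<18} mean={mean:.2f} reported={row[4]} ok={mean_matches_last_column(row)}")
    print("By FG Score:", rank_by(1))
    print("By score 4: ", rank_by(4))
```

If the assumption holds, ranking by the fourth column changes the order in two places. InternVL3.5-8B moves above InternVL3-8B, and Qwen3-VL-8B moves above UnifiedReward-7B and Qwen2.5-VL-7B, although it has the lowest FG Score.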