Multimodal Reward Modeling on VideoRewardBench
[Chart: Macro Pairwise Accuracy. Best result: 68.2 (GPT-5). Date shown: Apr 13, 2026]
Evaluation Results
Method                          Size   Date      Macro Pairwise Accuracy
GPT-5                           -      2026.04   68.2
Claude-Sonnet-4.5               -      2026.04   67.5
Molmo2-4B Multi-response RM     4B     2026.04   66.3
Qwen3-VL-32B                    32B    2026.04   65.8
Qwen3-VL-4B                     4B     2026.04   64.9
Qwen3-VL-4B Multi-response RM   4B     2026.04   64.9
Gemini-2.5-Pro                  -      2026.04   63.2
Skywork-VL-Reward               7B     2026.04   62.9
Qwen3-VL-8B                     8B     2026.04   62.0
R1-Reward                       7B     2026.04   61.2
InternVL3-78B                   78B    2026.04   58.5
Molmo2-4B                       4B     2026.04   58.2
InternVL3-8B                    8B     2026.04   57.9
Molmo2-8B                       8B     2026.04   57.1
IXC-2.5-Reward                  7B     2026.04   57.1
Qwen2.5-VL-7B                   7B     2026.04   55.3
MM-RLHF-Reward                  7B     2026.04   52.2
LLaVA-Critic                    7B     2026.04   14.7