Multimodal Reward Modeling on MM-RLHF-RewardBench

92.4 Pairwise Accuracy

Molmo2-4B Multi-response RM

Apr 13, 2026
Updated 4d ago

Evaluation Results

Pairwise Accuracy    Second score    Date
92.4                 -               2026.04
85                   -
84.7                 -               2026.04
81.8                 -               2026.04
80.6                 -               2026.04
80                   -               2026.04
78.8                 -               2026.04
77.6                 -               2026.04
77.6                 -               2026.04
73.5                 -               2026.04
73.5                 -               2026.04
72.4                 -               2026.04
71.8                 -               2026.04
71.2                 -
70.6                 -
70                   -               2026.04
69.4                 -               2026.04
68.2                 -
-                    58.23
-                    82.35           2026.02
-                    17.1            2026.02
-                    20.58           2026.02
-                    48.23           2026.02
-                    82              2026.02
-                    71.18           2026.02
-                    80.59           2026.02
-                    85.88
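
For readers unfamiliar with the headline metric, here is a minimal sketch of how pairwise accuracy is commonly computed for a reward model: the percentage of (chosen, rejected) response pairs in which the model scores the chosen response higher. This page does not state the benchmark's exact protocol, so the tie handling (ties count as incorrect) and the function name `pairwise_accuracy` are assumptions made for illustration, and the scores below are made-up toy values, not benchmark data.

```python
from typing import Sequence


def pairwise_accuracy(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[float],
) -> float:
    """Percentage of pairs where the chosen response outscores the rejected one.

    Assumption: a tie counts as incorrect. The benchmark may treat ties differently.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen and rejected score lists must have equal length")
    if not chosen_scores:
        raise ValueError("at least one pair is required")
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return 100.0 * correct / len(chosen_scores)


# Toy reward-model scores for four preference pairs (illustrative only).
chosen = [0.9, 0.2, 0.7, 0.6]
rejected = [0.1, 0.5, 0.3, 0.4]
print(pairwise_accuracy(chosen, rejected))  # 75.0 (3 of 4 pairs ranked correctly)
```

Under this definition, a model that scores responses at random would land near 50, which gives a rough baseline for reading the numbers above.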