
Vision-Language Reward Model Evaluation on MMRewardBench

83.6 Accuracy

Selective LWE

Updated 5mo ago

Evaluation Results

Method	Links	Accuracy
Selective LWE 2025.12		83.6	94.7	80.8
Majority Voting 2025.12		82.8	89.1	76.9
TextGrad* 2025.12		82.1	83.6	74.1
Sample-Specific Prompt 2025.12		81.5	86.5	74.2
Dynamic Cheatsheet 2025.12		81.1	90.1	76.4
Vanilla 2025.12		80.8	86.3	74.7
CoT 2025.12		80.8	87.4	74.9
LWE 2025.12		79.9	84.6	72.7