
Vision-Language Reward Model Evaluation on VLRewardBench

Top result: LWE, 74.5 accuracy

Updated 5mo ago

Evaluation Results

Method	Date	Accuracy	(unlabeled)	(unlabeled)
LWE	2025.12	74.5	80.5	64.6
TextGrad*	2025.12	73	74.9	61.5
Dynamic Cheatsheet	2025.12	69.8	86.8	62.9
Selective LWE	2025.12	67.6	94	64.8
Sample-Specific Prompt	2025.12	66.1	72.7	52.9
CoT	2025.12	65.1	80.8	55.3
Vanilla	2025.12	62.9	80.1	52.9
Majority Voting	2025.12	62.7	81	53.7