Vision-Language Reward Model Evaluation on VLRewardBench
[Chart: Accuracy by method. Top result: LWE, 74.5 (Dec 7, 2025). Metrics shown: Accuracy, Consistency, Pairwise Accuracy.]
Updated 1mo ago
Evaluation Results
| Method | Links | Accuracy | Consistency | Pairwise Accuracy |
|---|---|---|---|---|
| LWE | Relative Inference Cos... (2025.12) | 74.5 | 80.5 | 64.6 |
| TextGrad* | Relative Inference Cos... (2025.12) | 73 | 74.9 | 61.5 |
| Dynamic Cheatsheet | Relative Inference Cos... (2025.12) | 69.8 | 86.8 | 62.9 |
| Selective LWE | Relative Inference Cos... (2025.12) | 67.6 | 94 | 64.8 |
| Sample-Specific Prompt | Relative Inference Cos... (2025.12) | 66.1 | 72.7 | 52.9 |
| CoT | Relative Inference Cos... (2025.12) | 65.1 | 80.8 | 55.3 |
| Vanilla | Relative Inference Cos... (2025.12) | 62.9 | 80.1 | 52.9 |
| Majority Voting | Relative Inference Cos... (2025.12) | 62.7 | 81 | 53.7 |