Reward Modeling on UltraFeedback Cleaned
Best result: VRM, Total Score 92.36
[Chart: Total Score by submission date (Mar 5, 2026)]
Evaluation Results
Method          Date (YYYY.MM)  Total Score
VRM             2026.03         92.36
RM              2026.03         88.98
SELECTIVE DPO   2026.03         82.28
DPO             2026.03         75.71
KTO             2026.03         70.67
IPO             2026.03         60.69
SIMPO           2026.03         59.16
WPO             2026.03         54.74
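The sketch below uses only the Total Score values listed in the evaluation results. It ranks the methods and computes each one's gap to the leader. The dictionary `scores` and the helper `gaps_to_leader` are hypothetical names introduced for illustration. The page does not describe how Total Score is computed, so the sketch treats it as an opaque number.

```python
# Total Score values copied from the evaluation results (all entries dated 2026.03).
scores = {
    "VRM": 92.36,
    "RM": 88.98,
    "SELECTIVE DPO": 82.28,
    "DPO": 75.71,
    "KTO": 70.67,
    "IPO": 60.69,
    "SIMPO": 59.16,
    "WPO": 54.74,
}


def gaps_to_leader(table):
    """Hypothetical helper: return (method, score, gap to best) from best to worst."""
    ranked = sorted(table.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0][1]
    # Round to 2 decimals to match the precision reported on the leaderboard.
    return [(name, score, round(best - score, 2)) for name, score in ranked]


if __name__ == "__main__":
    for rank, (name, score, gap) in enumerate(gaps_to_leader(scores), start=1):
        print(f"{rank}. {name:<14} {score:6.2f}  (-{gap:.2f})")
```

For example, RM trails VRM by 3.38 points, and WPO trails it by 37.62 points.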