Reward Model Evaluation on RewardBench
[Chart: metric over time per method. Selectable metrics: P-value, Adjusted P-value, A12, Effect Size. Highlighted point: DPO, P-value 0.001, May 10, 2026.]
Updated 21d ago
Evaluation Results
| Method | Comparison | Date | P-value | Adjusted P-value | A12 | Effect Size |
|---|---|---|---|---|---|---|
| DPO | vs EvoPref | 2026.05 | 0.001 | 0.001 | 0.78 | - |
| IPO | vs EvoPref | 2026.05 | 0.001 | 0.001 | 0.82 | - |
| KTO | vs EvoPref | 2026.05 | 0.001 | 0.001 | 0.85 | - |
| CMA-ES | vs EvoPref | 2026.05 | 0.003 | 0.009 | 0.72 | - |
| ORPO | vs EvoPref | 2026.05 | 0.018 | 0.036 | 0.67 | - |
| MOEA/D | vs EvoPref | 2026.05 | 0.024 | 0.041 | 0.64 | - |
| SMS-EMOA | vs EvoPref | 2026.05 | 0.031 | 0.048 | 0.62 | - |
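The page does not define the A12 column. Assuming it is the Vargha-Delaney A12 statistic (the probability that a score drawn from one method exceeds a score drawn from the other, with ties counted as half), a minimal sketch of the computation might look like the following. The function name, the sample scores, and the direction of the comparison are all hypothetical and not taken from the leaderboard.

```python
from typing import Sequence


def vargha_delaney_a12(x: Sequence[float], y: Sequence[float]) -> float:
    """Estimate P(X > Y) + 0.5 * P(X == Y) over all pairs (a, b) with a in x, b in y.

    0.5 means no difference; values near 1.0 mean x tends to score higher than y.
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (greater + 0.5 * ties) / (len(x) * len(y))


# Hypothetical per-run scores, made up for illustration only.
evopref_scores = [0.81, 0.79, 0.84, 0.80, 0.83]
baseline_scores = [0.78, 0.80, 0.77, 0.79, 0.76]

a12 = vargha_delaney_a12(evopref_scores, baseline_scores)
print(f"A12 = {a12:.2f}")
```

Under this reading, the table's values of 0.62 to 0.85 would all indicate that one side of each comparison tends to score higher, but which side is treated as the first sample is not stated on the page.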