Reward Modeling Evaluation on HelpSteer3 (test)
[Chart: Score by method, with a Score/Winrate toggle. Highlighted point: DPO+Filter, Score -5.89, Oct 10, 2025.]
Updated 22d ago
Evaluation Results
Method       Configuration   Date      Score   Winrate
DPO+Filter   Pη Type=1       2025.10   -5.89   73
DPO          Pη Type=1       2025.10   -6.56   67
DPO+Filter   Pη Type=2       2025.10   -6.83   68
DPO          Pη Type=2       2025.10   -6.91   67
Base         Pη Type=-       2025.10   -8.39   50