Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Reward Modeling Evaluation on HelpSteer3 (test)

-5.89Score

DPO+Filter

-8.49-7.815-7.14-6.465Oct 10, 2025
Updated 22d ago

Evaluation Results

MethodLinks
2025.10
-5.8973
2025.10
-6.5667
2025.10
-6.8368
2025.10
-6.9167
2025.10
-8.3950