
Reward Modeling Evaluation on UltraFeedback (test)

Score: -3.12

DPO+Filter

[Chart: score history; y-axis ticks -5.564, -4.9295, -4.295, -3.6605; x-axis tick Oct 10, 2025]
Updated 21d ago

Evaluation Results

Date     Score
2025.10  -3.1267
2025.10  -3.5963
2025.10  -3.7564
2025.10  -4.4258
2025.10  -5.4750