Reward Modeling Evaluation on UltraFeedback (test)
[Chart: Score over time. Best result: DPO+Filter, Score -3.12, Oct 10, 2025.]
Evaluation Results
| Method     | Pη Type | Date    | Score | Winrate |
|------------|---------|---------|-------|---------|
| DPO+Filter | 1       | 2025.10 | -3.12 | 67      |
| DPO        | 1       | 2025.10 | -3.59 | 63      |
| DPO+Filter | 2       | 2025.10 | -3.75 | 64      |
| DPO        | 2       | 2025.10 | -4.42 | 58      |
| Base       | -       | 2025.10 | -5.47 | 50      |