Pairwise Evaluation on HH-RLHF (test)
[Chart: Test Accuracy. Best result: pairwise evaluator, 95.2 (Apr 10, 2026). Available metrics: Test Accuracy, Average Pairwise Score, Score Standard Deviation, Score Range, Positive Rate.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Test Accuracy | Average Pairwise Score | Score Standard Deviation | Score Range | Positive Rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pairwise evaluator | Epoch=2, Number of tra... | 2026.04 | 95.2 | 1.6445 | 0.9972 | 1.51 | 95.2 |
| L-BFGS probe | - | 2026.04 | 84.5 | - | - | - | - |
| standard end-to-end reward models | Training examples=161k | 2026.04 | 72 | - | - | - | - |
| pointwise evaluator | - | 2026.04 | 65 | - | - | - | - |