Alignment Evaluation on HH-RLHF (test)
[Chart: Reward Model Score. Highlighted result: SFT + TTL, 65.4, May 8, 2026. Selectable metrics: Reward Model Score, Helpfulness, Toxicity.]
Updated 23d ago
Evaluation Results
Method      Method                      Links     Reward Model Score   Helpfulness   Toxicity
SFT + TTL   zero-shot alignment ge...   2026.05   65.4                 49.8          0.41
Base SFT    zero-shot alignment ge...   2026.05   62.1                 45.2          0.48