
Anthropic HH-RLHF

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Helpfulness alignment | Anthropic hh-rlhf | Gold Reward: 3.36 | 14
Preference Alignment | Anthropic-hh-rlhf (test) | LLM-as-a-Judge Helpful Score: 5.83 | 12
LLM Alignment | Anthropic HH-RLHF 2022 (test) | Win Rate: 62 | 4
Preference Learning | Anthropic HH-RLHF+VI Preference (test) | Overall Accuracy: 64 | 3