Anthropic HH-RLHF

Benchmarks

Task Name	Dataset Name	SOTA Result
Helpfulness alignment	Anthropic hh-rlhf	Gold Reward 3.36	14
Preference Alignment	Anthropic-hh-rlhf (test)	LLM-as-a-Judge Helpful Score 5.83	12
Reward Modeling	Anthropic/hh-rlhf HH-helpful core250	Delta RM 0.292	6
Response Diversity	Anthropic HH-RLHF	Preference Coverage 82.5	6
LLM Alignment	Anthropic HH-RLHF 2022 (test)	Win Rate 62	4
Preference Learning	Anthropic HH-RLHF+VI Preference (test)	Overall Accuracy 64	3
