HH

Benchmarks

Task Name	Dataset Name	Metric	SOTA Result	
Value Alignment	HH Balance-8	Conformity Score	4.317	17
Dialogue Preference Learning	HH (test)	Win Rate (0% Flip)	90.8	14
Human Preference Alignment	HH (test)	Reward	3.8764	14
Response Generation	HH dataset	Reward	-0.96	13
Dialogue	HH (Anthropic Helpful and Harmless)	Win Rate (0% Flip)	82.5	10
Harmfulness Evaluation	HH Harmless	Beaver-7B Cost Score	3.25	10
Alignment	HH IDN 40%	Win Rate	68	8
Alignment	HH (IDN 20%)	Win Rate	78.2	8
Preference Evaluation	HH-Helpful	Win Count	52	8
Model Discovery	HH	Avg NLL (Model)	25.18	6
LLM-as-Judge evaluation	HH dataset	WCWR	59.1	5
Closed Loop RLHF	HH (40% noise)	Win Rate	57	3
Closed Loop RLHF	HH 20% noise	Win Rate	58.7	3
Closed Loop RLHF	HH 0% noise	Win Rate	61.6	3
Human Evaluation	HH dataset	Win Rate	59	3
LLM Preference Alignment Evaluation	HH Helpful	Preference (spec vs ctl)	55	1
Pairwise Judge Comparison	HH helpful	Win/Loss Count	149	1
