HH-Harmless

Benchmarks

Task Name	Dataset Name	SOTA Result
Reward Hacking Mitigation	Excessive HH Harmless 1.0 (Evaluation)	Reference Error Rate: 8.2	10
Preference Evaluation	HH-Harmless	Win Rate: 60	8
LLM Alignment	HH-Harmless (test)	Win Rate: 59	2
