Anthropic-SafeRLHF

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Preference evaluation	Anthropic-SafeRLHF (target)	Win Rate 41.7		2
Preference evaluation	Anthropic-SafeRLHF benchmark	Win Rate 33.7		2
