
HHA

Benchmarks

Task Name                    Dataset Name   SOTA Result                            Trend
Reward Scoring               HHA benchmark  Harmlessness Score (Base): 66.97       30
Alignment Reward Evaluation  HHA (test)     Harmless Score: 64                     20
RLHF Alignment Evaluation    HHA            Harmlessness Win Rate (Base, A): 76.1  6