
Harmless-helpful

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reward Model Controllability | Harmless-helpful | Kendall's Tau: 1 | 4 |
| Generalization to Unseen Preferences | Harmless-helpful | Group 1 Score: 15.038 | 2 |
| Controllability | Harmless-helpful Group 4 (unseen) | Kendall's Tau: 1 | 2 |
| Controllability | Harmless-helpful Group 3 (unseen) | Kendall's Tau: 1 | 2 |
| Controllability | Harmless-helpful Group 2 (unseen) | Kendall's Tau: 1 | 2 |
| Controllability | Harmless-helpful Group 1 (unseen) | Kendall's Tau: 1 | 2 |