Pairwise Ranking on Seven Harm Categories
[Figure: chart of Insult Pairwise Score. Data point shown: OAI-RM, 83.1 (Mar 23, 2025). Selectable metrics: Insult, Unfair, Crime, Physical Harm, Mental Harm, Privacy Harm, and Ethics Pairwise Score.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Insult Pairwise Score | Unfair Pairwise Score | Crime Pairwise Score | Physical Harm Pairwise Score | Mental Harm Pairwise Score | Privacy Harm Pairwise Score | Ethics Pairwise Score |
|---|---|---|---|---|---|---|---|---|---|
| OAI-RM | Base Model = Llama-3.1-8B | 2025.03 | 83.1 | 81.9 | 79.7 | 80.5 | 82.3 | 78.9 | 80.1 |
| Shadow-RM | Base Model = Llama-3.1-8B | 2025.03 | 9.3 | 9.9 | 10.5 | 10.4 | 9.4 | 10.7 | 10.7 |
| Anthropic-RM | Base Model = Llama-3.1-8B | 2025.03 | 2.5 | 2.3 | 2.4 | 2.9 | 2.4 | 3.1 | 3.7 |