
Harmlessness evaluation on HH-RLHF (test)

Best Win Rate: 83.33

[Chart: Win Rate over time; highlighted entry: APL (May 30, 2025)]

Evaluation Results

Date     Win Rate  Metric 2  Metric 3  Metric 4
2025.05  83.33     -         0.43      2.6
2025.05  80        -         1.3       1.9
2025.05  76.67     -         0.69      2.28
2025.05  71.67     -         2.12      1.38
2025.05  70        -         1.9       1.46
2025.05  56.67     -         1.34      2.09
2025.05  54.17     -         1.69      1.97
2025.05  52.5      -         1.34      1.99
2025.05  51        -         2.08      2.05
2025.05  50.83     -         1.56      1.9
2025.05  50        -         5.88      0.38
2025.05  50        -         1.95      2.02
2024.02  -         40        -         -
2024.02  -         43        -         -
2024.02  -         47        -         -
2024.02  -         40        -         -
2024.02  -         57        -         -
2024.02  -         37        -         -
2024.02  -         67        -         -