Harmlessness preference labeling accuracy on SafeRLHF-RMB (test)

70.6 Bench Accuracy

Biased Rubric Search

[Chart: Bench Accuracy; y-axis ticks 56.768, 60.359, 63.95, 67.541; x-axis label Feb 14, 2026]
Updated 4d ago

Evaluation Results

Method  Links
2026.02  70.6  54.7   15.9
2026.02  70.3  79.8   -9.5
2026.02  69.8  76.8    7
2026.02  69.7  80.2  -10.5
2026.02  69.5  65.3    4.2
2026.02  68.9  54.3   14.6
2026.02  68.6  82.6  -14
2026.02  68.2  60.5    7.7
2026.02  68    58.2    9.8
2026.02  67.8  73.1   -5.3
2026.02  66.7  58.7    8
2026.02  65.2  66     -0.8
2026.02  63.1  77.2  -14.1
2026.02  59.7  82.2  -22.5
2026.02  57.3  55.4    1.9