
Preference Labeling on Anthropic Harmlessness

Preference Labeling Accuracy: 77

Curriculum-RLAIF

[Chart: Preference Labeling Accuracy, y-axis ticks 54.12, 60.06, 66, 71.94; May 26, 2025]
Updated 1mo ago

Evaluation Results

Date      Preference Labeling Accuracy
2025.05   77
2025.05   71
2025.05   68
2025.05   65
2025.05   61
2025.05   59
2025.05   57
2025.05   55