Preference Labeling on Anthropic Harmlessness
[Chart: Preference Labeling Accuracy by date. Highlighted point: Curriculum-RLAIF, 77, May 26, 2025.]
Evaluation Results

Method              Base Model   Date      Preference Labeling Accuracy
Curriculum-RLAIF    LLaMA-3-8B   2025.05   77
Conventional RLAIF  LLaMA-3-8B   2025.05   71
Curriculum-RLAIF    Gemma-1-2B   2025.05   68
RLCD                LLaMA-3-8B   2025.05   65
RLCD                Gemma-1-2B   2025.05   61
Conventional RLAIF  Gemma-1-2B   2025.05   59
CAI                 LLaMA-3-8B   2025.05   57
CAI                 Gemma-1-2B   2025.05   55
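
The page does not define the metric, so the following is only a minimal sketch of how a preference labeling accuracy could be computed, under two assumptions that are not stated in the source: the score is the share of response pairs on which a method's preferred response matches the reference (human) preference, and it is reported as a percentage. The function name and the toy labels are hypothetical.

```python
from typing import Sequence


def preference_labeling_accuracy(
    predicted: Sequence[int], reference: Sequence[int]
) -> float:
    """Return the percentage of pairs whose predicted label matches the reference.

    Each label is the index (0 or 1) of the preferred response in a pair.
    This definition is an assumption, not taken from the leaderboard.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one labeled pair is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


# Hypothetical toy labels: the labeler agrees with the reference on 3 of 4 pairs.
reference_labels = [0, 1, 1, 0]
predicted_labels = [0, 1, 0, 0]
accuracy = preference_labeling_accuracy(predicted_labels, reference_labels)
print(f"{accuracy:.2f}")  # prints 75.00
```

Under this reading, a score of 77 would mean the method's labels agree with the reference preference on 77 of every 100 pairs.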