Safety Alignment on SafeRLHF

Win Rate: 83

Chain-of-Thought

Updated 4mo ago

Evaluation Results

Method	Date	Win Rate		
Chain-of-Thought	2025.06	83	8	9
Best-of-N	2025.06	77	10	13
Multi-Agent Debate	2025.06	76	10	14
RLAIF	2025.06	72	22	6
Chain-of-Thought	2025.06	71	6	23
Best-of-N	2025.06	62	20	18
Multi-Agent Debate	2025.06	61	21	18
Self-Refine (Debate)	2025.06	57	32	11