Instruction Following and Safety Alignment on AlpacaEval Borderline

WinRate: 98

Best-of-N

Updated 1mo ago

Evaluation Results

Method	Date	WinRate	(column label missing in source)
Best-of-N	2025.10	98	4
SG	2025.10	97	3.1
Threshold filter	2025.10	96	2.1