Instruction Following Evaluation on AlpacaEval 2.0 (test)

54.17 LC% over π0

DAR

Updated 4mo ago

Evaluation Results

Method	Links	LC% over π0
DAR 2026.02		54.17	0.23	-
RLOO 2026.02		52.25	0.14	-
GRPO 2026.02		50.5	0.16	-
Iter-SFT 2026.02		49.8	0.17	-
Ours 2025.10		38.89	-	45.78
Full Dataset 2025.10		34.37	-	36.46
Random 2025.10		33.48	-	35.45
Ours 2025.10		31.43	-	39.01
Full Dataset 2025.10		29.24	-	29.94
Random 2025.10		27.88	-	38.01