Instruction Following Evaluation on AlpacaEval 2.0 (test)
[Chart: LC% over π0 (with SE) over time. Best result: DAR, 54.17 LC% over π0, Feb 12, 2026.]
Updated 4d ago
Evaluation Results
Method    Response Length  Date     LC% over π0  SE
DAR       1963             2026.02  54.17        0.23
RLOO      2076             2026.02  52.25        0.14
GRPO      2038             2026.02  50.5         0.16
Iter-SFT  2004             2026.02  49.8         0.17