Instruction Following Evaluation on AlpacaEval 2.0 (test)
[Chart: LC% over π0 over time, Oct 7, 2025 to Feb 12, 2026. Top entry: DAR, 54.17. Series: LC% over π0, SE, WR (%).]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | LC% over π0 | SE | WR (%) |
|---|---|---|---|---|---|
| DAR | Response Length=1963 | 2026.02 | 54.17 | 0.23 | - |
| RLOO | Response Length=2076 | 2026.02 | 52.25 | 0.14 | - |
| GRPO | Response Length=2038 | 2026.02 | 50.5 | 0.16 | - |
| Iter-SFT | Response Length=2004 | 2026.02 | 49.8 | 0.17 | - |
| Ours | Base Model=Llama-3.1-T... | 2025.10 | 38.89 | - | 45.78 |
| Full Dataset | Base Model=Llama-3.1-T... | 2025.10 | 34.37 | - | 36.46 |
| Random | Base Model=Llama-3.1-T... | 2025.10 | 33.48 | - | 35.45 |
| Ours | Base Model=Mistral-7B-... | 2025.10 | 31.43 | - | 39.01 |
| Full Dataset | Base Model=Mistral-7B-... | 2025.10 | 29.24 | - | 29.94 |
| Random | Base Model=Mistral-7B-... | 2025.10 | 27.88 | - | 38.01 |