
Instruction Following on AlpacaEval (test)

Helpfulness Score: 3,213

Fine-tuned

[Chart: Helpfulness Score over time, May 28, 2024 to Feb 5, 2026]
Updated 4d ago

Evaluation Results

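| Method (date only) | Helpfulness Score | Second value (unlabeled) |
|---|---|---|
| 2025.04 | 3,213 | - |
| 2025.04 | 2,827 | - |
| 2025.04 | 2,824 | - |
| 2025.04 | 2,664 | - |
| 2025.04 | 2,635 | - |
| 2025.04 | 1,839 | - |
| 2025.04 | 1,830 | - |
| 2025.04 | 1,768 | - |
| 2025.04 | 1,765 | - |
| 2025.04 | 1,563 | - |
| 2025.04 | 1,552 | - |
| 2025.04 | 1,255 | - |
| 2025.04 | 1,187 | - |
| 2025.04 | 1,176 | - |
| 2025.04 | 1,085 | - |
| 2026.02 | 87.2 | - |
| 2026.02 | 85.3 | - |
| 2026.02 | 82.8 | - |
| 2026.02 | 76.1 | - |
| 2026.02 | 72.4 | - |
| 2025.04 | 71 | - |
| 2026.02 | 70.7 | - |
| 2026.02 | 69.7 | - |
| 2026.02 | 69.3 | - |
| 2024.05 | 43.2 | 54.33 |
| 2024.05 | 38.5 | 45.63 |
| 2024.05 | 37.5 | 43.27 |
| 2024.05 | 37.4 | 41.75 |
| 2024.05 | 33.1 | 41.35 |
| 2024.05 | 33 | 10.68 |
| 2025.04 | 10 | - |
| 2025.04 | 8 | - |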