Instruction Following on AlpacaEval2 short-context
Top AlpacaEval2 Score: 22.9 (officially post-trained), Oct 28, 2024
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | AlpacaEval2 Score |
|---|---|---|---|
| officially post-trained | Llama-3.1-8B | 2024.10 | 22.9 |
| officially post-trained | GLM-4-9B | 2024.10 | 22.4 |
| DPO w/ LongReward | GLM-4-9B | 2024.10 | 15.4 |
| DPO w/ Contrast | GLM-4-9B | 2024.10 | 14.5 |
| DPO w/ LongReward | Llama-3.1-8B | 2024.10 | 14.2 |
| DPO w/ SRM | GLM-4-9B | 2024.10 | 14.2 |
| DPO w/ Contrast | Llama-3.1-8B | 2024.10 | 13.8 |
| DPO w/ SRM | Llama-3.1-8B | 2024.10 | 13.7 |
| SFT | GLM-4-9B | 2024.10 | 12.5 |
| SFT | Llama-3.1-8B | 2024.10 | 12.4 |