
Instruction Following and Helpfulness Evaluation on AlpacaEval 2.0

49.4 Win Rate

QWEN2-INSTRUCT w/ SSO_DPO

[Chart: Win Rate over time, Oct 22, 2024 to Jun 3, 2025]
Updated 4d ago

Evaluation Results

Date     Win Rate  (unlabeled)
2024.10  49.4      36.5
2024.10  45.4      43
2024.10  43.7      34.7
2024.10  42.5      37.7
2024.10  28.9      25.6
2025.03  28.51     24.21
2025.06  28.34     -
2024.10  28.3      35
2025.03  27.64     26.29
2024.10  25.7      23.8
2025.03  25.53     31.32
2025.03  24.29     21.16
2024.10  24        29.5
2025.03  23.82     29.16
2024.10  22.6      23.8
2025.03  22.11     19.45
2025.03  19.44     26.42
2024.10  17.6      20.4
2024.10  17.5      22
2025.03  17.33     22.63
2024.10  17.2      19.5
2025.03  17.1      22.25
2025.03  16.91     17
2025.06  16.41     -
2025.03  15.65     12.93
2025.03  15.58     15.67
2025.03  15.53     16.54
2024.10  15        20.6
2024.10  15        20.2
2025.06  14.66     -
2025.03  14.29     11.79
2025.03  14.29     14.09
2025.03  13.79     14.48
2025.03  13.73     14.53
2024.10  13.2      20.1
2024.10  13.2      20.1
2025.03  12.92     11.24
2025.03  12.67     18.06
2025.03  12        17.75
2025.03  11.61     10.72
2025.03  10.74     14.35
2025.03  10.37     15.65
2025.03  10        14.07
2025.03  9.94      13.19
2025.03  9.81      11.61
2025.03  9.63      8.19
2025.03  9.32      8.92
2025.03  9.07      13.64
2025.03  8.94      12.57
2025.06  6.44      -
2025.03  6.34      7.16
2025.03  5.03      6.42
2025.03  4.6       6.58
2025.03  4.41      4.86
2025.03  3.66      7.51
2025.03  2.42      3.59
2025.03  2.3       3
2025.03  1.99      3.1