LLM Evaluation on AlpacaEval 2.0

51.32 LC Win Rate

SpecEM

Updated 18d ago

Evaluation Results

Method	Links	LC Win Rate
SpecEM 2024.12		51.32	54.52	-	-	-
UniTE 2024.12		49.2	41.04	-	-	-
GenFuse 2024.12		49.06	50.84	-	-	-
Mistral-24b-instruct-2501 2024.12		48.46	44.27	-	-	-
MOA 2024.12		46.98	51.24	-	-	-
Qwen2.5-32b-instruct 2024.12		43.82	43.54	-	-	-
Qwen2-72b-instruct 2024.12		38.1	-	-	-	-
Llama3-70b-instruct 2024.12		34.4	29.39	-	-	-
DPO-PoP-random 2025.09		14.62	-	14.78	-	1,909
DPO-PoP-iter 2025.09		12.89	-	13.42	-	2,004
DPO-margin-gt 2025.09		11.23	-	11.3	-	1,825
DPO-margin-1 2025.09		11.07	-	11.06	-	1,864
DPO-margin-gt-scaled 2025.09		10.95	-	11.43	-	1,881
Vanilla-DPO 2025.09		10.38	-	10.56	-	1,869
Ours 2026.03		7.7	-	2	-	-
Full Dataset 2026.03		6.7	-	1.9	-	-
Qwen2-7B 2026.06		-	-	-	85.7	-
Adapted RLCR 2026.06		-	-	-	86.2	-
SEE 2026.06		-	-	-	90.8	-