Large Language Model Evaluation on Open LLM Leaderboard v1 (test)

Average Score: 69.6

FP16

Updated 2mo ago

Evaluation Results

Method (date)	Average Score	Per-task scores
FP16 2026.01	69.6	55.6	58.1	59.7	74.9	70.1	88.8	100	80.2
Bielik-11B-v3 2025.12	68.45	61.43	81.38	47.65	67.55	78.53	74.15	-	-
Qwen1.5-14B 2025.12	66.7	56.57	81.08	52.06	69.36	73.48	67.63	-	-
TPAW Iter-3 2026.05	66.14	65.54	80.77	54.46	64.56	76.32	55.19	-	-
LoPRo.v 2026.01	66.1	55.3	54.9	54.9	71.7	69.6	80.9	94.91	75.1
Bielik-11B-v2 2025.12	65.87	60.58	79.84	46.13	63.06	77.82	67.78	-	-
Qwen-14B 2025.12	65.86	58.28	83.99	49.43	67.7	76.8	58.98	-	-
TPAW Iter-4 2026.05	65.77	63.23	80.91	54.44	64.46	76.24	55.34	-	-
LoPRo 2026.01	65.7	54.3	53.9	54.2	71.1	69.9	81.2	94.4	75.3
TPAW Iter-2 2026.05	65.64	62.97	80.63	54.11	64.45	76.16	55.5	-	-
TPAW Iter-1 2026.05	65.44	62.88	80.66	53.71	64.29	75.93	55.19	-	-
SPIN Iter-4 2026.05	64.88	62.03	80.78	53.92	63.97	76.01	52.54	-	-
SPIN Iter-1 2026.05	64.82	61.69	80.44	53.62	63.94	75.3	53.9	-	-
SPIN Iter-3 2026.05	64.8	62.2	80.74	53.98	64.15	76.01	51.71	-	-
DPO 2026.05	64.79	63.14	80.46	57.42	63.08	74.35	50.27	-	-
SPIN Iter-2 2026.05	64.73	62.03	80.63	53.81	63.97	75.61	52.31	-	-
SFT 2026.05	63.93	60.75	79.48	51.65	63.72	74.82	53.15	-	-
Meta-Llama-3-8B 2025.12	62.62	60.24	82.23	42.93	66.7	78.45	45.19	-	-
Bielik-4.5B-v3 2025.12	61.02	51.19	73.01	45.63	61.32	71.35	63.61	-	-
Mistral-7B-v0.1 2025.12	60.97	59.98	83.31	42.15	64.16	78.37	37.83	-	-
Mistral-7B-v0.2 2025.12	60.37	60.84	83.08	41.76	63.62	78.22	34.72	-	-
DPO 2026.05	58.55	54.52	66.29	49.39	59.41	64.96	56.71	-	-
TPAW Iter-4 2026.05	57.76	52.56	66.3	47.9	59.64	65.11	55.04	-	-
TPAW Iter-3 2026.05	57.65	53.67	66.18	47.91	59.52	65.04	53.6	-	-
SPIN Iter-4 2026.05	57.61	53.58	66.05	48.41	59.58	64.72	53.3	-	-
SPIN Iter-3 2026.05	57.47	52.65	66.25	48.38	59.56	64.48	53.53	-	-
TPAW Iter-2 2026.05	57.38	53.41	66.08	47.91	59.58	64.17	53.15	-	-
TPAW Iter-1 2026.05	57.27	52.99	65.77	47.64	59.5	64.96	52.77	-	-
SPIN Iter-2 2026.05	57.26	52.05	66	48.25	59.43	64.4	53.45	-	-
SPIN Iter-1 2026.05	57.13	51.79	65.76	47.82	59.26	64.33	53.83	-	-
SFT 2026.05	56.28	50.94	64.91	46.83	58.93	64.8	51.25	-	-
LoPRo.v 2026.01	55.6	47.9	47.2	42.7	61.8	66.2	55.5	79.89	67.9
LoPRo 2026.01	51.4	40.6	44.3	40.2	55.7	64.4	52.3	73.89	62.5
Bielik-7B-v0.1 2025.12	49.98	45.22	67.92	47.16	43.2	66.85	29.49	-	-
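
The rows above are tab-separated: a method name with a date, an average, then individual scores, with `-` marking values that were not reported. As a minimal sketch, the following Python parses a few rows copied from the table and checks whether the listed average matches the arithmetic mean of the six reported scores. The helper names `parse_row` and `check_average` are hypothetical, and the assumption that the average is a plain mean is an observation about these sampled rows, not something the page states. Rows that report eight values (FP16, LoPRo, LoPRo.v) are not covered by this check.

```python
from statistics import mean

# Three rows copied from the table above (tab-separated).
ROWS = """\
Bielik-11B-v3 2025.12\t68.45\t61.43\t81.38\t47.65\t67.55\t78.53\t74.15\t-\t-
Qwen1.5-14B 2025.12\t66.7\t56.57\t81.08\t52.06\t69.36\t73.48\t67.63\t-\t-
Mistral-7B-v0.1 2025.12\t60.97\t59.98\t83.31\t42.15\t64.16\t78.37\t37.83\t-\t-
"""


def parse_row(line):
    """Split one row into (method, date, average, scores)."""
    label, *cells = line.split("\t")
    # The method name and date share the first cell, separated by a space.
    method, date = label.rsplit(" ", 1)
    # "-" marks a value that was not reported; skip it.
    values = [float(cell) for cell in cells if cell != "-"]
    return method, date, values[0], values[1:]


def check_average(line, tol=0.01):
    """Return True if the listed average equals the mean of the scores.

    Assumption: the average is an unweighted mean, rounded to two decimals.
    """
    _, _, average, scores = parse_row(line)
    return abs(mean(scores) - average) <= tol


for row in ROWS.splitlines():
    method, date, average, scores = parse_row(row)
    print(method, date, average, check_average(row))
```

For these three rows the check passes, which suggests that the average column for six-score entries is an unweighted mean of the listed scores.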