General Language Understanding and Reasoning on Open LLM Leaderboard Lighteval (test)

91.07 Mean Accuracy

GPT-5

Updated 4mo ago

Evaluation Results

Method	Links	Mean Accuracy
GPT-5 2026.01		91.07	91.4	95.31	91.36	94.85	87.1	87.85	89.6
RedSage-8B-DPO 2026.01		74.33	77.07	71.76	82.71	79.87	52.47	73.01	83.44
RedSage-8B-Ins 2026.01		73.34	77.38	69.62	86.05	79	47.75	73.64	79.97
Qwen3-32B 2026.01		73.17	82.11	69.28	87.49	70.93	48.17	65.98	88.26
Qwen3-8B-Base 2026.01		70.86	78.73	68.09	81.73	79.62	43.84	73.16	-
RedSage-8B-Seed 2026.01		69.58	78.18	65.19	82.34	77.76	42.44	71.59	-
RedSage-8B-CFW 2026.01		69.31	78.63	66.72	81.12	79.26	38.09	72.06	-
Foundation-Sec-8B-Instruct 2026.01		69.28	64.11	63.91	77.79	81.35	53.15	68.51	76.17
RedSage-8B-Base 2026.01		69.23	77.8	65.53	82.03	77.96	42.19	69.85	-
Llama-3.1-8B-Instruct 2026.01		68.2	67.29	57.51	77.41	78.91	45.93	72.61	77.75
Llama-Primus-Merged 2026.01		66.71	66.17	53.07	75.28	79.07	46.52	73.24	73.58
Qwen3-8B 2026.01		65.92	73.59	62.54	75.66	56.7	45.23	62.51	85.21
DeepHat-V1-7B 2026.01		64.89	69.53	57.17	77.94	74.8	33.17	69.06	72.58
Llama-Primus-Base 2026.01		64.82	65.09	51.19	71.8	79.49	44.62	72.69	68.85
Llama-3.1-8B 2026.01		61.15	66.31	58.19	49.05	82.08	35.98	75.3	-
Foundation-Sec-8B 2026.01		60.24	63.62	58.45	46.17	81.32	38.71	73.16	-
Lily-Cybersecurity-7B-v0.2 2026.01		56.98	56.49	58.96	30.86	80.94	48.53	72.06	50.99
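
The first numeric column in each row appears to be the unweighted mean of the remaining per-benchmark scores, with missing entries ("-") left out of the average. This is inferred from the numbers, not stated on the page. A minimal sketch that checks this on two rows copied from the table (the function name `mean_accuracy` is hypothetical):

```python
from statistics import mean


def mean_accuracy(scores):
    """Average the available scores, skipping entries marked '-'.

    Assumption: the leaderboard mean is unweighted and ignores
    missing results; the page itself does not state this.
    """
    available = [float(s) for s in scores if s != "-"]
    return round(mean(available), 2)


# Per-benchmark scores copied from the table rows above.
gpt5 = ["91.4", "95.31", "91.36", "94.85", "87.1", "87.85", "89.6"]
qwen3_8b_base = ["78.73", "68.09", "81.73", "79.62", "43.84", "73.16", "-"]

print(mean_accuracy(gpt5))           # listed mean: 91.07
print(mean_accuracy(qwen3_8b_base))  # listed mean: 70.86
```

If the assumption holds, rows with a missing final score are averaged over six benchmarks instead of seven, so their means are not directly comparable to those of fully evaluated models.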