General Language Understanding and Reasoning on Open LLM Leaderboard (BBH, GPQA, IFEVAL, MMLU, MUSR) (test)

72.7 BBH

Qwen 2.5 72B

Updated 4mo ago

Evaluation Results

Method	Links	BBH	GPQA	IFEVAL	MMLU	MUSR
Qwen 2.5 72B 2026.02		72.7	37.6	86.4	56.3	42
Llama 3.3 70B 2026.02		69.2	32.3	90	53.3	44.6
Trojan Qwen 2.5 72B (TrojanStego) 2026.02		61	39.4	21.4	50.9	47.2
Trojan Llama 3.3 70B (TrojanStego) 2026.02		50.6	38.9	51.5	45.2	50.3