Multi-task Language Understanding on MMLU-Pro (Accuracy, Average Score, Improvement Overhead)
[Chart: Accuracy over time. Best result: 72.47 accuracy, Qwen3-235B-A22B, as of Aug 19, 2025. Metrics tracked: Accuracy, Average Score, Improvement Overhead. Updated 1 month ago.]
Evaluation Results

| Method | Framework | Date | Accuracy | Average Score | Improvement Overhead |
| --- | --- | --- | --- | --- | --- |
| Qwen3-235B-A22B | Reference Mo... | 2025.08 | 72.47 | 78.22 | - |
| COCO Qwen3-8B with coco(Llama-3.1-8B) | COCO Framewo... | 2025.08 | 68.69 | 74.37 | 6.5 |
| COCO Qwen3-8B with coco(Qwen3-8B) | COCO Framewo... | 2025.08 | 66.60 | 74.18 | 6.2 |
| Aflow-Qwen3-8B | Multi-Agent... | 2025.08 | 66.56 | 69.86 | - |
| Qwen3-8B | Reference Mo... | 2025.08 | 58.85 | 68.52 | - |
| COCO Llama-3.1-8B with coco(Qwen3-8B) | COCO Framewo... | 2025.08 | 53.44 | 63.59 | 9.5 |
| Llama-3.1-8B | Reference Mo... | 2025.08 | 48.03 | 55.48 | - |
| COCO Llama-3.1-8B with coco(Llama-3.1-8B) | COCO Framewo... | 2025.08 | 45.62 | 58.46 | 0.63 |
| Aflow-Llama3.1-8B | Multi-Agent... | 2025.08 | 45.14 | 58.09 | - |
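The table's accuracy figures can be compared programmatically, for example to measure how much a COCO-augmented model improves over its base model. The snippet below is a minimal sketch, not part of the benchmark itself: the `results` values are copied from the table above, and the `gain` helper is a hypothetical convenience function.

```python
# Accuracy scores copied from the Evaluation Results table above.
results = {
    "Qwen3-235B-A22B": 72.47,
    "COCO Qwen3-8B with coco(Llama-3.1-8B)": 68.69,
    "COCO Qwen3-8B with coco(Qwen3-8B)": 66.60,
    "Aflow-Qwen3-8B": 66.56,
    "Qwen3-8B": 58.85,
    "COCO Llama-3.1-8B with coco(Qwen3-8B)": 53.44,
    "Llama-3.1-8B": 48.03,
    "COCO Llama-3.1-8B with coco(Llama-3.1-8B)": 45.62,
    "Aflow-Llama3.1-8B": 45.14,
}

def gain(method: str, baseline: str) -> float:
    """Accuracy improvement of `method` over `baseline`, in points (hypothetical helper)."""
    return round(results[method] - results[baseline], 2)

# COCO with a Llama-3.1-8B coprocessor lifts Qwen3-8B by 9.84 points.
print(gain("COCO Qwen3-8B with coco(Llama-3.1-8B)", "Qwen3-8B"))  # → 9.84
```

Sorting `results` by value reproduces the table's ranking, with Qwen3-235B-A22B on top.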