
Multi-task Language Understanding on MMLU-Pro (Accuracy, AVG., Improvement Overhead)

72.47 Accuracy

Qwen3-235B-A22B

[Chart: Accuracy by date; Aug 19, 2025]
Updated 1mo ago

Evaluation Results

Date      Accuracy   AVG.    Improvement Overhead
2025.08   72.47      78.22   -
2025.08   68.69      74.37   6.5
2025.08   66.6       74.18   6.2
2025.08   66.56      69.86   -
2025.08   58.85      68.52   -
2025.08   53.44      63.59   9.5
2025.08   48.03      55.48   -
2025.08   45.62      58.46   0.63
2025.08   45.14      58.09   -