
General Language Evaluation on MMLU, BoolQ, OpenBookQA, RTE

Best Average Accuracy: 70.4 (Mixtral-8x22B)

Chart: Average Accuracy by date (Jul 2024 to Apr 2026)
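The page does not say how the four benchmark scores are combined into a single Average Accuracy figure. Below is a minimal sketch that assumes an unweighted arithmetic mean of the per-benchmark accuracies, all on the same 0-100 scale. The function name `average_accuracy` and the example inputs are hypothetical and are not results from this leaderboard.

```python
from statistics import fmean

# Benchmarks named on this leaderboard.
BENCHMARKS = ("MMLU", "BoolQ", "OpenBookQA", "RTE")


def average_accuracy(scores: dict[str, float]) -> float:
    """Return the unweighted mean accuracy over the four benchmarks.

    Assumption: each benchmark counts equally, regardless of its size.
    """
    missing = [b for b in BENCHMARKS if b not in scores]
    if missing:
        # Refuse to average a partial set, which would inflate or deflate the score.
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return fmean(scores[b] for b in BENCHMARKS)


# Hypothetical per-benchmark accuracies, for illustration only.
example = {"MMLU": 60.0, "BoolQ": 80.0, "OpenBookQA": 70.0, "RTE": 70.0}
print(average_accuracy(example))  # 70.0
```

If the leaderboard instead weights each benchmark by its number of test items, the result would differ from this equal-weight mean. In that case, the benchmarks with the largest test sets would dominate the average.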

Evaluation Results

Date (YYYY.MM)  Average Accuracy
2024.07         70.4
2026.04         70.4
2026.04         68.2
2024.07         67.6
2024.07         67
2026.04         67
2024.07         66.2
2026.04         66.2
2024.07         66
2024.07         63.3
2026.04         63.3
2024.07         62.6
2026.04         62.6
2024.07         62.4
2024.07         61.3
2024.07         61.1
2026.04         61.1
2024.07         59.9
2024.07         59.6
2026.04         59.6
2024.07         57.8
2024.07         54.5
2024.07         54.2
2024.07         54.1
2024.07         50.8
2024.07         50
2026.04         50
2026.04         50
2026.04         49.8
2026.04         49.5
2026.04         48.9
2026.04         48.8
2026.04         48.8
2026.04         47.5
2026.04         47.3
2024.07         45.7
2026.04         45.2
2024.07         33.5
2026.04         33.5
2024.07         31.6
2026.04         31.6
2024.07         29.1