
Language Understanding on MMLU (Accuracy and Prompt/Response Scores)

82.1 Accuracy

GPT-4o-mini

[Chart: accuracy by date, Mar 11, 2026 to May 24, 2026]
Updated 7d ago

Evaluation Results

Date     Accuracy  Prompt/Response Scores
2026.03  82.1      4.388  4.709  4.5
2026.03  80.3      4.412  4.605  4.35
2026.03  71.1      4.113  4.463  4.252
2026.03  68.7      4.188  4.577  4.302
2026.03  67.3      4.18   4.431  4.153
2026.03  66.8      -      -      -
2026.03  66.3      -      -      -
2026.03  65.1      -      -      -
2026.03  64.8      -      -      -
2026.03  64.5      -      -      -
2026.03  64.2      -      -      -
2026.03  63.9      -      -      -
2026.05  52.07     -      -      -
2026.05  52        -      -      -
2026.05  51.27     -      -      -
2026.03  51.15     -      -      -
2026.03  51.1      -      -      -
2026.03  51.07     -      -      -
2026.03  50.46     -      -      -
2026.03  48.3      -      -      -
2026.05  44.95     -      -      -
2026.05  43.54     -      -      -
2026.05  40.14     -      -      -
2026.05  37.96     -      -      -
2026.05  31.01     -      -      -
2026.05  30.59     -      -      -
2026.03  27.1      -      -      -
2026.03  25.7      -      -      -
2026.03  25.3      -      -      -
2026.03  25.3      -      -      -
2026.03  25.3      -      -      -
2026.03  25.2      -      -      -
2026.03  24.9      -      -      -
2026.03  24.8      -      -      -
2026.03  24.8      -      -      -
2026.03  24.8      -      -      -
2026.03  24.8      -      -      -
2026.03  24.4      -      -      -
2026.03  24.3      -      -      -
2026.03  24.1      -      -      -
2026.03  23.7      -      -      -
2026.03  23.5      -      -      -
2026.03  23.1      -      -      -