Multitask Language Understanding on MMLU (Accuracy, AVG, and Delta)
Chart: MMLU Accuracy over time (selectable metrics: MMLU Accuracy, MMLU Average Score (AVG), MMLU Delta Change). Top result: 77.5, Phi-4 14B (w/ LoopUS), May 10, 2026.
Evaluation Results
| Method | Details | Date | MMLU Accuracy | MMLU Average Score (AVG) | MMLU Delta Change |
|---|---|---|---|---|---|
| Phi-4 14B (w/ LoopUS) | Model=Phi-4 14B, Setti... | 2026.05 | 77.5 | 68.6 | 1.7 |
| Phi-4 14B (w/o LoopUS) | Model=Phi-4 14B, Setti... | 2026.05 | 76.9 | 67.0 | - |
| Qwen 8B (w/o LoopUS) | Model=Qwen 8B, Setting... | 2026.05 | 72.8 | 63.2 | - |
| Qwen 8B (w/ LoopUS) | Model=Qwen 8B, Setting... | 2026.05 | 71.5 | 65.4 | 2.2 |
| Qwen 4B (w/o LoopUS) | Model=Qwen 4B, Setting... | 2026.05 | 68.3 | 60.3 | - |
| Qwen 4B (w/ LoopUS) | Model=Qwen 4B, Setting... | 2026.05 | 67.7 | 62.1 | 1.8 |
| Qwen 1.7B (w/ LoopUS) | Model=Qwen 1.7B, Setti... | 2026.05 | 56.6 | 55.3 | 1.6 |
| Qwen 1.7B (w/o LoopUS) | Model=Qwen 1.7B, Setti... | 2026.05 | 55.4 | 53.7 | - |
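The page does not define "MMLU Delta Change". One reading that matches the Qwen 8B, Qwen 4B, and Qwen 1.7B rows exactly is the AVG score with LoopUS minus the AVG score without it. For Phi-4 14B that reading gives 1.6 against a reported 1.7, possibly because the underlying scores carry more precision than the single decimal shown. The sketch below checks this assumption using only the numbers in the table; the names `RESULTS`, `REPORTED_DELTA`, and `loopus_deltas` are hypothetical. It also shows that MMLU Accuracy falls with LoopUS for Qwen 8B and Qwen 4B even though AVG rises.

```python
# Scores transcribed from the Evaluation Results table above:
# each tuple is (MMLU Accuracy, MMLU Average Score (AVG)).
RESULTS = {
    "Phi-4 14B": {"without": (76.9, 67.0), "with": (77.5, 68.6)},
    "Qwen 8B": {"without": (72.8, 63.2), "with": (71.5, 65.4)},
    "Qwen 4B": {"without": (68.3, 60.3), "with": (67.7, 62.1)},
    "Qwen 1.7B": {"without": (55.4, 53.7), "with": (56.6, 55.3)},
}

# "MMLU Delta Change" values as reported on the page (w/ LoopUS rows only).
REPORTED_DELTA = {"Phi-4 14B": 1.7, "Qwen 8B": 2.2, "Qwen 4B": 1.8, "Qwen 1.7B": 1.6}


def loopus_deltas(results):
    """Return {model: (accuracy_delta, avg_delta)}.

    Assumption (not stated by the source): each delta is the score with
    LoopUS minus the score without it, rounded to one decimal place.
    """
    deltas = {}
    for model, scores in results.items():
        acc_with, avg_with = scores["with"]
        acc_without, avg_without = scores["without"]
        deltas[model] = (
            round(acc_with - acc_without, 1),
            round(avg_with - avg_without, 1),
        )
    return deltas


if __name__ == "__main__":
    for model, (d_acc, d_avg) in loopus_deltas(RESULTS).items():
        print(
            f"{model}: accuracy {d_acc:+.1f}, AVG {d_avg:+.1f}, "
            f"reported Delta {REPORTED_DELTA[model]}"
        )
```

Rounding each difference to one decimal place keeps floating-point noise (for example, 68.6 - 67.0 evaluating to slightly less than 1.6) out of the comparison.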