Multi-task Language Understanding on MMLU (Overall, Stem)
[Chart: Overall Accuracy and Stem Accuracy over time. Highlighted point: Frozen Router, 61.08 Overall Accuracy, Jun 1, 2026.]
Updated 19h ago
Evaluation Results
Method         Details                    Date     Overall Accuracy  Stem Accuracy
Frozen Router  Backbone=Qwen1.5-MoE-A...  2026.06  61.08             53.19
ProbMoE        Backbone=Qwen1.5-MoE-A...  2026.06  61.05             53.82
Conventional   Backbone=Qwen1.5-MoE-A...  2026.06  61.03             53.16
DenseMixer     Backbone=Qwen1.5-MoE-A...  2026.06  61.03             52.87
Base Model     Backbone=Qwen1.5-MoE-A...  2026.06  60.85             52.27
Conventional   Backbone=OLMoE-1B-7B,...   2026.06  54.04             46.46
Frozen Router  Backbone=OLMoE-1B-7B,...   2026.06  53.97             46.18
DenseMixer     Backbone=OLMoE-1B-7B,...   2026.06  53.95             46.18
ProbMoE        Backbone=OLMoE-1B-7B,...   2026.06  53.69             45.54
Base Model     Backbone=OLMoE-1B-7B,...   2026.06  53.4              44.91