Multilingual Evaluation on GSM8K, IFEval, MMLU Aggregate
[Chart: Average Score over time. Best result: RA-MoE, Average Score 40.9 (May 27, 2026).]
Evaluation Results

| Method | Model   | Date    | Average Score |
|--------|---------|---------|---------------|
| RA-MoE | Qwen1.5 | 2026.05 | 40.9          |
| RISE   | Qwen1.5 | 2026.05 | 39.9          |
| SFT    | Qwen1.5 | 2026.05 | 39.3          |
| RA-MoE | DSV2    | 2026.05 | 39.1          |
| RISE   | DSV2    | 2026.05 | 38.3          |
| SFT    | DSV2    | 2026.05 | 38.1          |
| RS     | DSV2    | 2026.05 | 34.9          |
| 0-shot | DSV2    | 2026.05 | 34.6          |
| RS     | Qwen1.5 | 2026.05 | 29.8          |
| 0-shot | Qwen1.5 | 2026.05 | 29.5          |
| RA-MoE | OLMoE   | 2026.05 | 24.2          |
| RISE   | OLMoE   | 2026.05 | 23.5          |
| SFT    | OLMoE   | 2026.05 | 23.3          |
| RS     | OLMoE   | 2026.05 | 20.7          |
| 0-shot | OLMoE   | 2026.05 | 20.4          |