Multilingual Language Understanding on MMLU-ProX
[Chart: Accuracy over time, Aug 20, 2025 – Feb 5, 2026. Leading entry: Qwen3-8B (Thinking) at 78.1 accuracy.]
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Qwen3-8B (Thinking) | Base Model=Qwen3-8B, T... | 2025.08 | 78.1 |
| Qwen-3-30B-A3B | Openness=Open-weights,... | 2026.02 | 72 |
| OpenThoughts3-20k | SFT Dataset=OpenThough... | 2025.08 | 71.1 |
| Qwen-3-32B | Openness=Open-weights,... | 2026.02 | 70.1 |
| Translated-s1k | SFT Dataset=Translated... | 2025.08 | 69 |
| Distilled-s1k | SFT Dataset=Distilled-... | 2025.08 | 69 |
| Qwen-3-14B | Openness=Open-weights,... | 2026.02 | 66.1 |
| Llama-3.3-70B | Openness=Open-weights,... | 2026.02 | 65.7 |
| Mistral-3.2-24B | Openness=Open-weights,... | 2026.02 | 64.1 |
| Gemma-3-27B | Openness=Open-weights,... | 2026.02 | 60.2 |
| OLMo-3.1-32B | Openness=Fully-open, R... | 2026.02 | 57 |
| Gemma-3-12B | Openness=Open-weights,... | 2026.02 | 53.3 |
| EuroLLM-22B (new) | Openness=Fully-open, R... | 2026.02 | 45.3 |
| OLMo-3-7B | Openness=Fully-open, R... | 2026.02 | 41.8 |
| EuroLLM-22B (old) | Openness=Fully-open, R... | 2026.02 | 37.9 |
| EuroLLM-9B (new) | Openness=Fully-open, R... | 2026.02 | 37.7 |
| Apertus-70B | Openness=Fully-open, R... | 2026.02 | 36.5 |
| Llama-3.1-8B | Openness=Open-weights,... | 2026.02 | 33.4 |
| Apertus-8B | Openness=Fully-open, R... | 2026.02 | 29.5 |
| EuroLLM-9B (old) | Openness=Fully-open, R... | 2026.02 | 29 |
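The "Accuracy" scores above are the percentage of benchmark questions each model answers correctly. A minimal sketch of that computation, using hypothetical predictions and gold answers (not actual MMLU-ProX data):

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions matching the gold answers."""
    assert len(predictions) == len(answers)
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)

# Hypothetical multiple-choice outputs for 10 questions.
preds = ["A", "C", "B", "D", "C", "A", "B", "B", "D", "A"]
golds = ["A", "C", "B", "D", "C", "A", "B", "C", "D", "B"]
print(f"{accuracy(preds, golds):.1f}")  # 8 of 10 correct -> 80.0
```

On a multilingual benchmark like MMLU-ProX, this score is typically computed per language and then averaged to produce the single leaderboard number.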