Knowledge & Instruction Following on MMLU-Pro
[Chart: Score. Highlighted point: Qwen3.5-9B (released), 81.39, Jun 1, 2026. Updated 22h ago.]
Evaluation Results
Method                  Details (truncated)        Date     Score
Qwen3.5-9B (released)   Model Size=9B, Trainin...  2026.06  81.39
Qwen3.5-9B              Params=9B, Sampling mo...  2026.06  81.39
Qwen3.5-4B (released)   Model Size=4B, Trainin...  2026.06  77.88
Qwen3.5-9B + AR-SFT     Model Size=9B, Trainin...  2026.06  77.66
FLARE-9B                Model Size=9B, Trainin...  2026.06  77.39
FLARE-9B                Params=9B, Sampling mo...  2026.06  77.39
LLaDA-2.1-flash         Params=100B-A5B, Sampl...  2026.06  75.31
FLARE-9B                Params=9B, Sampling mo...  2026.06  74.73
LLaDA-2.0-flash         Params=100B-A5B, Sampl...  2026.06  73.36
Qwen3.5-4B + AR-SFT     Model Size=4B, Trainin...  2026.06  71.7
FLARE-4B                Model Size=4B, Trainin...  2026.06  71.14
LLaDA-2.1-mini          Params=16B-A1B, Sampli...  2026.06  63.42
LLaDA-2.0-mini          Params=16B-A1B, Sampli...  2026.06  63.22
SDAR 30B-A3B            Params=30B-A3B, Sampli...  2026.06  61.5
Qwen3.5-2B (released)   Model Size=2B, Trainin...  2026.06  59.53
Qwen3.5-2B + AR-SFT     Model Size=2B, Trainin...  2026.06  56.67
FLARE-2B                Model Size=2B, Trainin...  2026.06  53.57