General Reasoning on MMLU & GPQA
Top Average Score: 62.62 (INSIGHT)
[Chart: Average Score, dated Mar 2, 2026]
Updated 1mo ago
Evaluation Results
Method               Model        Date      Average Score
INSIGHT              Qwen3-4B     2026.03   62.62
MOPPS                Qwen3-4B     2026.03   62.17
EXPECTED-DIFFICULTY  Qwen3-4B     2026.03   62.17
INVERSE-EVIDENCE     Qwen3-4B     2026.03   62.16
RANDOM               Qwen3-4B     2026.03   62.14
INSIGHT              Qwen3-1.7B   2026.03   51.96
INVERSE-EVIDENCE     Qwen3-1.7B   2026.03   51.48
RANDOM               Qwen3-1.7B   2026.03   51.36
MOPPS                Qwen3-1.7B   2026.03   51.35
INSIGHT              Qwen3-0.6B   2026.03   42.14
RANDOM               Qwen3-0.6B   2026.03   41.13
MOPPS                Qwen3-0.6B   2026.03   40.8
INVERSE-EVIDENCE     Qwen3-0.6B   2026.03   39.98