Zero-shot Reasoning on Advanced Reasoning Suite (MMLU-Pro, GPQA, AIME)
[Chart: MMLU-Pro Accuracy. Highlighted point: Base, 74.8, Oct 13, 2025. Metrics available: MMLU-Pro Accuracy, GPQA Accuracy, AIME-24 Accuracy, AIME-25 Accuracy, Average Accuracy (Zero-shot Reasoning Suite).]
Updated 6d ago
Evaluation Results
| Method | Configuration | Date | MMLU-Pro Accuracy | GPQA Accuracy | AIME-24 Accuracy | AIME-25 Accuracy | Average Accuracy (Zero-shot Reasoning Suite) |
|---|---|---|---|---|---|---|---|
| Base | Model=Qwen3-8B, Bit=16 | 2025.10 | 74.8 | 58.6 | 73.3 | 73.3 | 70 |
| QTIP | Model=Qwen3-8B, Bit=4 | 2025.10 | 74 | 57.7 | 70 | 68.9 | 67.7 |
| NWC | Model=Qwen3-8B, Bit=3.94 | 2025.10 | 73.8 | 58.8 | 71.1 | 71.2 | 69 |
| Base | Model=Qwen3-4B, Bit=16 | 2025.10 | 70.7 | 54 | 73.3 | 60 | 64.5 |
| QTIP | Model=Qwen3-4B, Bit=4 | 2025.10 | 69.8 | 55.2 | 71.1 | 57.8 | 63.5 |
| NWC | Model=Qwen3-4B, Bit=3.94 | 2025.10 | 69.4 | 53.2 | 73.3 | 61.1 | 63.7 |
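The page does not state how the Average Accuracy column is derived. As a rough sanity check, the sketch below assumes it is the unweighted mean of the four benchmark accuracies; that assumption, and the helper names `unweighted_mean` and `compare`, are illustrative and not taken from the page. Under this assumption the Base and QTIP rows reproduce the reported averages to one decimal place, while the two NWC rows differ slightly, so the page may aggregate differently or average unrounded scores.

```python
# Scores copied from the evaluation results above.
# Key: (method, model, bits). Value: ((MMLU-Pro, GPQA, AIME-24, AIME-25), reported average).
ROWS = {
    ("Base", "Qwen3-8B", "16"):   ((74.8, 58.6, 73.3, 73.3), 70.0),
    ("QTIP", "Qwen3-8B", "4"):    ((74.0, 57.7, 70.0, 68.9), 67.7),
    ("NWC",  "Qwen3-8B", "3.94"): ((73.8, 58.8, 71.1, 71.2), 69.0),
    ("Base", "Qwen3-4B", "16"):   ((70.7, 54.0, 73.3, 60.0), 64.5),
    ("QTIP", "Qwen3-4B", "4"):    ((69.8, 55.2, 71.1, 57.8), 63.5),
    ("NWC",  "Qwen3-4B", "3.94"): ((69.4, 53.2, 73.3, 61.1), 63.7),
}


def unweighted_mean(scores):
    """Arithmetic mean of the per-benchmark accuracies (assumed aggregation)."""
    return sum(scores) / len(scores)


def compare(rows):
    """Return {key: (computed mean, reported average, reported minus computed)}."""
    result = {}
    for key, (scores, reported) in rows.items():
        mean = unweighted_mean(scores)
        result[key] = (mean, reported, reported - mean)
    return result


if __name__ == "__main__":
    for (method, model, bits), (mean, reported, gap) in compare(ROWS).items():
        print(f"{method:5s} {model} @ {bits:>4s} bit: "
              f"mean={mean:.2f} reported={reported:.1f} gap={gap:+.2f}")
```

The same table can be used to express each quantized method's score as a difference from the 16-bit Base row of the same model, which is usually the comparison of interest for a quantization benchmark.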