Knowledge-based reasoning on MMLU High School History 1.0 (test)
Best result: 92.83 Accuracy, QwQ-32B (Full COT), Aug 5, 2025
[Chart: Accuracy and Average Tokens Per Answer]
Updated 1mo ago
Evaluation Results

Method                   Details                       Date      Accuracy   Average Tokens Per Answer
-----------------------  ----------------------------  --------  ---------  -------------------------
QwQ-32B (Full COT)       Model=QwQ-32B, Pruning...     2025.08   92.83      1,703.9
QwQ-32B (80%)            Model=QwQ-32B, Pruning...     2025.08   92.83      1,683.9
QwQ-32B (90%)            Model=QwQ-32B, Pruning...     2025.08   92.83      1,683.9
QwQ-32B (No Thinking)    Model=QwQ-32B, Pruning...     2025.08   91.14      -
DeepSeek-R1-7B (80%)     Pruning level=80%             2025.08   64.32      1,907.5
DeepSeek-R1-7B           Pruning level=Full COT        2025.08   61.74      2,054.3
DeepSeek-R1-7B (90%)     Pruning level=90%             2025.08   61.74      1,936.2
DeepSeek-R1-7B           Pruning level=No Thinking     2025.08   47.82      -
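
The results above can be read as a trade-off between accuracy and answer length. The sketch below is illustrative only and not part of the benchmark: it copies the listed accuracy and token figures into a dictionary and computes, for each pruned setting, the accuracy change and the percentage of tokens saved relative to the Full COT row of the same model. The names `results` and `compare_to_full_cot` are hypothetical, and the "No Thinking" rows are left out because the page lists no token count for them.

```python
# (accuracy %, average tokens per answer), copied from the results listed above.
results = {
    "QwQ-32B": {
        "Full COT": (92.83, 1703.9),
        "80%": (92.83, 1683.9),
        "90%": (92.83, 1683.9),
    },
    "DeepSeek-R1-7B": {
        "Full COT": (61.74, 2054.3),
        "80%": (64.32, 1907.5),
        "90%": (61.74, 1936.2),
    },
}


def compare_to_full_cot(model):
    """Return {pruning level: (accuracy change in points, tokens saved in %)}."""
    base_acc, base_tokens = results[model]["Full COT"]
    comparison = {}
    for level, (acc, tokens) in results[model].items():
        if level == "Full COT":
            continue  # the baseline is not compared against itself
        acc_change = round(acc - base_acc, 2)
        tokens_saved_pct = round(100 * (base_tokens - tokens) / base_tokens, 2)
        comparison[level] = (acc_change, tokens_saved_pct)
    return comparison


if __name__ == "__main__":
    for model in results:
        for level, (acc_change, saved) in compare_to_full_cot(model).items():
            print(f"{model} ({level}): accuracy {acc_change:+.2f} pts, tokens saved {saved:.2f}%")
```

Under these figures, the token savings are small in both cases (roughly 1% for QwQ-32B and 6 to 7% for DeepSeek-R1-7B), so any conclusion drawn from them should be treated as tentative.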