Knowledge-based reasoning on MMLU College Medicine 1.0 (test)
[Chart: Accuracy by date, switchable to Average Tokens Per Answer. Top result: QwQ-32B (Full COT), 86.13 accuracy, Aug 5, 2025.]
Evaluation Results
Method                  | Settings                   | Date    | Accuracy | Avg. tokens per answer
----------------------- | -------------------------- | ------- | -------- | ----------------------
QwQ-32B (Full COT)      | Model=QwQ-32B, Pruning...  | 2025.08 | 86.13    | 2,912.3
QwQ-32B (80%)           | Model=QwQ-32B, Pruning...  | 2025.08 | 84.97    | 2,475.9
QwQ-32B (90%)           | Model=QwQ-32B, Pruning...  | 2025.08 | 84.97    | 2,326.4
QwQ-32B (No Thinking)   | Model=QwQ-32B, Pruning...  | 2025.08 | 84.3     | -
DeepSeek-R1-7B (80%)    | Pruning level=80%          | 2025.08 | 62.34    | 2,127.9
DeepSeek-R1-7B (90%)    | Pruning level=90%          | 2025.08 | 62.34    | 2,069.4
DeepSeek-R1-7B          | Pruning level=Full COT     | 2025.08 | 61.73    | 2,612.7
DeepSeek-R1-7B          | Pruning level=No Thinking  | 2025.08 | 52.46    | -
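The following short Python sketch may help when reading the table. For each model, it computes how far each variant's accuracy and average token count are from that model's Full COT row.

- All numbers are copied from the table.
- The `Result` type and the `compare_to_full_cot` helper are hypothetical names introduced only for illustration.
- Rows that show "-" for tokens are treated as having no token count.
- The script does not assume anything about what the pruning percentage controls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    method: str                   # variant label as listed in the table
    accuracy: float               # accuracy on MMLU College Medicine (test)
    avg_tokens: Optional[float]   # None where the table shows "-"


# Values transcribed from the Evaluation Results table (hypothetical structure).
RESULTS = {
    "QwQ-32B": [
        Result("Full COT", 86.13, 2912.3),
        Result("80%", 84.97, 2475.9),
        Result("90%", 84.97, 2326.4),
        Result("No Thinking", 84.3, None),
    ],
    "DeepSeek-R1-7B": [
        Result("Full COT", 61.73, 2612.7),
        Result("80%", 62.34, 2127.9),
        Result("90%", 62.34, 2069.4),
        Result("No Thinking", 52.46, None),
    ],
}


def compare_to_full_cot(rows):
    """Return (method, accuracy change in points, % fewer tokens) for each
    non-baseline row, relative to the row labelled "Full COT"."""
    base = next(r for r in rows if r.method == "Full COT")
    out = []
    for r in rows:
        if r is base:
            continue
        delta = round(r.accuracy - base.accuracy, 2)
        if r.avg_tokens is None:
            saving = None  # no token count listed for this row
        else:
            saving = round(100 * (base.avg_tokens - r.avg_tokens) / base.avg_tokens, 2)
        out.append((r.method, delta, saving))
    return out


for model, rows in RESULTS.items():
    for method, delta, saving in compare_to_full_cot(rows):
        saving_str = "no token count" if saving is None else f"{saving:.2f}% fewer tokens"
        print(f"{model} ({method}): {delta:+.2f} pts, {saving_str}")
```

Here is how the output reads for two of the rows:

- For QwQ-32B, the 90% row loses 1.16 accuracy points and uses about 20% fewer tokens per answer than Full COT.
- For DeepSeek-R1-7B, the 80% and 90% rows both score 0.61 points higher than Full COT.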