General Performance Evaluation on Performance Bench Aggregate
[Chart: Average Score. Top entry: DeepSeek-R1-Distill-Qwen-32B (Reasoning), 82.49, Jan 9, 2026]
Evaluation Results
Method                                     Model Family   Date      Average Score
DeepSeek-R1-Distill-Qwen-32B (Reasoning)   Qwen2.5-32B    2026.01   82.49
ReasonAny                                  Qwen2.5-32B    2026.01   78.54
TIES                                       Qwen2.5-32B    2026.01   69.17
DARE                                       Qwen2.5-32B    2026.01   68.45
Qwen2.5-32B-Instruct (Safety)              Qwen2.5-32B    2026.01   68.25
Task Arithmetic                            Qwen2.5-32B    2026.01   67.82
FuseLLM                                    Qwen2.5-32B    2026.01   66.65
Linear                                     Qwen2.5-32B    2026.01   65.92
LED                                        Qwen2.5-32B    2026.01   65.43