General Performance Evaluation on Performance Bench Aggregate

Average Score: 82.49

DeepSeek-R1-Distill-Qwen-32B (Reasoning)

[Score chart; data point dated Jan 9, 2026]
Updated 3d ago

Evaluation Results

Date      Score
2026.01   82.49
2026.01   78.54
2026.01   69.17
2026.01   68.45
2026.01   68.25
2026.01   67.82
2026.01   66.65
2026.01   65.92
2026.01   65.43