General Reasoning Evaluation on Reasoning Benchmarks Aggregate

Average Score: 70.63

BF16

Jan 21, 2026
Updated 4d ago

Evaluation Results

Date      Score    Delta
2026.01   70.63    -
2026.01   62.38    -8.25
2026.01   54.96    -15.67
2026.01   51.8     -18.83
2026.01   48.72    -
2026.01   43.26    -5.46
2026.01   41.89    -28.74
2026.01   41.1     -
2026.01   38.07    -10.65
2026.01   32.13    -16.58
2026.01   31.67    -9.43
2026.01   28.18    -20.54
2026.01   15.27    -25.83
2026.01   12.61    -28.49
2026.01   10.08    -38.64
2026.01   8.4      -32.7
2026.01   6.16     -64.47
2026.01   5.68     -43.04
2026.01   5.12     -65.61
2026.01   5.07     -36.03
2026.01   5.02     -43.7
2026.01   4.78     -36.32
2026.01   2.6      -68.03
2026.01   0.31     -40.79
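
The page does not say what the second number in each result is. One plausible reading, offered here only as an assumption, is that it is the score's difference from one of the entries that has no second number (70.63, 48.72, 41.1). The sketch below checks that reading against the results above. The names `rows`, `baselines`, and `matching_baseline`, the tolerance of 0.015, and the split of each entry into a score and a signed difference are all illustrative choices, not something the page defines.

```python
# Results copied from the list above as (score, delta).
# None marks an entry shown with "-" (no difference reported).
rows = [
    (70.63, None), (62.38, -8.25), (54.96, -15.67), (51.8, -18.83),
    (48.72, None), (43.26, -5.46), (41.89, -28.74), (41.1, None),
    (38.07, -10.65), (32.13, -16.58), (31.67, -9.43), (28.18, -20.54),
    (15.27, -25.83), (12.61, -28.49), (10.08, -38.64), (8.4, -32.7),
    (6.16, -64.47), (5.68, -43.04), (5.12, -65.61), (5.07, -36.03),
    (5.02, -43.7), (4.78, -36.32), (2.6, -68.03), (0.31, -40.79),
]

# Assumption: entries without a delta are the reference points.
baselines = [score for score, delta in rows if delta is None]


def matching_baseline(score, delta, tol=0.015):
    """Return the first baseline b with score - b within tol of delta, else None.

    The tolerance absorbs two-decimal rounding in the published numbers.
    """
    for b in baselines:
        if abs((score - b) - delta) <= tol:
            return b
    return None


# Entries whose delta is not explained by any baseline under this reading.
unmatched = [
    (score, delta)
    for score, delta in rows
    if delta is not None and matching_baseline(score, delta) is None
]

print("baselines:", baselines)
print("unmatched:", unmatched)
```

Under this reading, every entry except one lines up with a reference score. The entry 5.12 / -65.61 differs from 70.63 by 65.51 rather than 65.61, so the interpretation should be treated as unconfirmed.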