Reasoning on MMLU, MATH, GSM8K, BBH Micro-averaged (test)

1.38 Accuracy Improvement

P(True)

[Chart: Accuracy Improvement over time; date shown: Feb 10, 2025]
Updated 3d ago

Evaluation Results

Date      Accuracy Improvement
2025.02   1.38
2025.02   1.03
2025.02   0.88
2025.02   0.69
2025.02   0.68
2025.02   0.46
2025.02   0.35
2025.02   0.2