Reasoning on MMLU, MATH, GSM8K, BBH Micro-averaged (test)
[Chart: Accuracy Improvement by date. Highlighted result: P(True), Accuracy Improvement 1.38, Feb 10, 2025.]
Evaluation Results

Method                Configuration               Date     Accuracy Improvement
P(True)               Confidence Method=P(Tr...   2025.02  1.38
P(True)               Confidence Method=P(Tr...   2025.02  1.03
Response Probability  Confidence Method=Resp...   2025.02  0.88
Response Probability  Confidence Method=Resp...   2025.02  0.69
Verbal 1-100          Confidence Method=Verb...   2025.02  0.68
Verbal 1-100          Confidence Method=Verb...   2025.02  0.46
Verbal Binary         Confidence Method=Verb...   2025.02  0.35
Verbal Binary         Confidence Method=Verb...   2025.02  0.2