Mathematical Reasoning on MATH (Improvement Metrics)
[Chart: Comparable SC (%) over time. Highlighted entry: P(True), 34, Feb 10, 2025. Selectable metrics: Comparable SC (%), Accuracy Improvement (%).]
Updated 3d ago
Evaluation Results
Method                Method details             Date     Comparable SC (%)  Accuracy Improvement (%)
P(True)               Confidence Method=P(Tr...  2025.02  34                 2
P(True)               Confidence Method=P(Tr...  2025.02  32                 3
Response Probability  Confidence Method=Resp...  2025.02  19                 2.2
Verbal Binary         Confidence Method=Verb...  2025.02  18                 0.8
Verbal 1-100          Confidence Method=Verb...  2025.02  17                 1.3
Response Probability  Confidence Method=Resp...  2025.02  17                 1.2
Verbal 1-100          Confidence Method=Verb...  2025.02  12                 0.6
Verbal Binary         Confidence Method=Verb...  2025.02  11                 0.5