Multi-task Language Understanding on MMLU (Comparable SC Samples, Acc Improvement)
[Figure: Comparable SC Samples over time; top entry: P(True), 47 (Feb 10, 2025)]
Evaluation Results
Method                Setting                    Date     Comparable SC Samples  Accuracy Improvement
P(True)               Confidence Method=P(Tr...  2025.02  47                     1
P(True)               Confidence Method=P(Tr...  2025.02  37                     1.4
Verbal 1-100          Confidence Method=Verb...  2025.02  32                     0.7
Verbal 1-100          Confidence Method=Verb...  2025.02  25                     0.9
Response Probability  Confidence Method=Resp...  2025.02  23                     0.6
Verbal Binary         Confidence Method=Verb...  2025.02  18                     0.4
Response Probability  Confidence Method=Resp...  2025.02  17                     0.7
Verbal Binary         Confidence Method=Verb...  2025.02  12                     0.2