SOTA Calibration benchmarks and papers with code | Wizwand
Calibration
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| CIFAR-100 (test) | DPS | ECE | 0.85 | 104 | 2mo ago |
| HMDA (test) | Platt | Oracle ECE | 1.29 | 100 | 21d ago |
| BEAR (test) | gemma-7b | Brier Score | 0.083 | 96 | 3mo ago |
| MMLU | Verbalized confidence | Brier Score | 0.0559 | 58 | 14d ago |
| USPS | Oracle | ECE | 1.54 | 57 | 12d ago |
| NQ | Temp. Scaling | ECE | 0.046 | 55 | 3mo ago |
| CIFAR-10H | P+L (Recalibrated) | ECE | 0.84 | 52 | 3mo ago |
| Brock-Hommes (test) | ANTR | MSE | 0 | 40 | 3mo ago |
| TriviaQA | Probe (train on TriviaQA) | Brier Score | 0.0845 | 39 | 2d ago |
| MNIST | ETS | ECE | 0.21 | 33 | 3mo ago |
| TruthfulQA | S | Gain | 2.132 | 32 | 2d ago |
| SQuAD | Temp. Scaling | ECE | 5.87 | 31 | 3mo ago |
| WebQ | Temp. Scaling | ECE | 0.0674 | 31 | 3mo ago |
| Digital-S | Knowledge-Transferring-based Temperature Scaling | ECE | 7.37 | 27 | 3mo ago |
| Average StrategyQA, HotpotQA, NQ, Bamboogle | NAACL | ECE | 0.264 | 24 | 3mo ago |
| Bamboogle | NAACL | ECE | 0.113 | 24 | 3mo ago |
| HotpotQA | NAACL | ECE | 0.28 | 24 | 3mo ago |
| StrategyQA | NAACL | ECE | 0.285 | 24 | 3mo ago |
| ImageNet-C OOD-domains | MD-TS | ECE (%) | 1.43 | 24 | 3mo ago |
| ImageNet-C (InD-domains) | TS | ECE (%) | 0.89 | 24 | 3mo ago |
| CIFAR-10 5000-sample half (test) | BBQ | ECE | 0.0095 | 23 | 22d ago |
| Tabular datasets | Platt | NLL | 0.2983 | 21 | 8d ago |
| stream 2,000-question | Verbalized + Temp Scaling | ECE | 0.08 | 21 | 1mo ago |
| CelebA ImageResNet (test) | Platt | ECE (Oracle Estimate) | 0.39 | 20 | 21d ago |
| CIFAR100 Noise levels 1-5 (val) | Deep Ensemble | NLL | 1.499 | 20 | 3mo ago |
Showing 25 of 100 rows