SOTA Confidence calibration benchmarks and papers with code | Wizwand
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
|---|---|---|---|---|---|
| MACE (test) | SCA | AUROC | 81.2 | 84 | 3mo ago |
| dermatology | IR | Confidence Calibration Error | 0.013 | 66 | 2mo ago |
| CIFAR-100-LT (test) | Knowledge-Transferring-based Temperature Scaling | ECE | 0.015 | 53 | 3mo ago |
| vehicle | IRP | Calibration Error | 0.001 | 44 | 2mo ago |
| glass | IRP | Calibration Error | 0.008 | 44 | 2mo ago |
| car | Dirichlet Calibration | Calibration Error | 1.1 | 44 | 2mo ago |
| Pubmed | CaGCN | ECE | 0.0308 | 36 | 3mo ago |
| Citeseer | GATS | ECE | 3.86 | 36 | 3mo ago |
| Cora | CaGCN | ECE | 0.0313 | 36 | 3mo ago |
| CoraFull | CaGCN | ECE | 0.0701 | 28 | 3mo ago |
| SimpleQA | Probe (train on TriviaQA) | Brier Score | 0.0386 | 27 | 3mo ago |
| cleveland | IRP | Calibration Error | 0.03 | 22 | 2mo ago |
| balance-scale | IRP | Calibration Error | 0.006 | 22 | 2mo ago |
| ReClor (test) | ORCE | ECE | 4.4 | 21 | 21d ago |
| LogiQA (out-of-distribution) | ADVICE w/ ConfTuner | ECE | 8 | 18 | 1mo ago |
| MMLU out-of-distribution | ADVICE | ECE | 5.6 | 18 | 1mo ago |
| TriviaQA (in-domain) | ADVICE w/ ConfTuner | Expected Calibration Error (ECE) | 3.4 | 18 | 1mo ago |
| Average of four domains Relational Inference Planning | first-second-distance-based (FSD) | Brier Score | 0.114 | 18 | 3mo ago |
| MultiNLI Mismatch (test) | MIR | ECE | 0.0071 | 16 | 3mo ago |
| SciQ (test) | Self-Consistency | ECE | 1.5 | 15 | 1mo ago |
| BeyondAIME (test) | Qwen3-4B-Instruct-ppo-value | SNR Gain | 1.202 | 15 | 3mo ago |
| iNaturalist 2021 | PTSK + PROCAL | ECE | 0.65 | 12 | 3mo ago |
| Qwen3-4B Calibration | Base | Brier Delta | 0 | 10 | 14d ago |
| FMNIST ID (test) | OTIS | ECE | 3.26 | 9 | 3mo ago |
| MNIST ID (test) | CEDA | ECE | 0.14 | 9 | 3mo ago |

Showing 25 of 41 rows.
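Most rows above report ECE or the Brier score. For orientation, here is a minimal pure-Python sketch of both, assuming top-label ECE over equal-width confidence bins and the multiclass Brier score summed over classes. The function names and the default `n_bins` are illustrative choices, not taken from any listed paper. Individual papers may bin differently, normalize the Brier score by the number of classes, or report ECE as a percentage (the ECE column appears to mix fractions such as 0.0308 with values such as 3.86).

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 15
) -> float:
    """Top-label ECE: bin-size-weighted mean of |accuracy - confidence| over equal-width bins.

    n_bins=15 is an illustrative default (assumption), not a value from the leaderboard.
    """
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and the same length")
    counts = [0] * n_bins
    correct = [0] * n_bins
    conf_sum = [0.0] * n_bins
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda k: p[k])  # predicted class (first argmax)
        conf = p[pred]  # top-label confidence
        b = min(int(conf * n_bins), n_bins - 1)  # equal-width bin; conf == 1.0 goes to the last bin
        counts[b] += 1
        correct[b] += int(pred == y)
        conf_sum[b] += conf
    n = len(labels)
    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            acc = correct[b] / counts[b]
            avg_conf = conf_sum[b] / counts[b]
            ece += (counts[b] / n) * abs(acc - avg_conf)
    return ece


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multiclass Brier score: mean over samples of sum_k (p_k - 1[y == k])^2 (unnormalized form)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


if __name__ == "__main__":
    # Hypothetical toy predictions for a 2-class problem.
    probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    labels = [0, 1, 1, 1]
    print(expected_calibration_error(probs, labels, n_bins=10))
    print(brier_score(probs, labels))
```

On the four toy predictions, the sketch gives an ECE of about 0.3 and a Brier score of about 0.25. Equal-width binning is the simplest choice; adaptive (equal-mass) bins are a common alternative that this sketch does not implement.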
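The CIFAR-100-LT row lists a temperature-scaling variant. Plain temperature scaling divides a model's logits by one scalar T > 0 fitted on held-out data to minimize negative log-likelihood; T > 1 softens overconfident predictions. The sketch below shows only that plain form and fits T with a grid search, an assumption made to keep it dependency-free. It does not reproduce the knowledge-transferring variant, and the grid range and toy logits are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """log softmax(logits / temperature), computed with the max-shift trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def nll(logits_batch: Sequence[Sequence[float]], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of the true labels after dividing logits by temperature."""
    total = sum(-log_softmax(z, temperature)[y] for z, y in zip(logits_batch, labels))
    return total / len(labels)


def fit_temperature(
    logits_batch: Sequence[Sequence[float]],
    labels: Sequence[int],
    grid: Optional[Sequence[float]] = None,
) -> float:
    """Pick the temperature T > 0 that minimizes validation NLL.

    Grid search is an assumption for this sketch; gradient-based optimizers
    such as L-BFGS are also commonly used.
    """
    if grid is None:
        grid = [0.05 * i for i in range(1, 201)]  # hypothetical grid: 0.05, 0.10, ..., 10.0
    return min(grid, key=lambda t: nll(logits_batch, labels, t))


if __name__ == "__main__":
    # Hypothetical validation logits: three confident correct predictions, one confident mistake.
    val_logits = [[10.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 0.0]]
    val_labels = [0, 0, 1, 1]
    t = fit_temperature(val_logits, val_labels)
    print(f"fitted temperature: {t:.2f}")
    # At test time, calibrated probabilities are softmax(logits / t).
    print([math.exp(v) for v in log_softmax([10.0, 0.0], t)])
```

On these toy logits, the fitted temperature should land near 10 / ln 3 ≈ 9.10, which maps a logit gap of 10 to a confidence of about 0.75, matching the 3/4 validation accuracy. Because T rescales all logits uniformly, the argmax (and therefore accuracy) is unchanged; only confidence moves.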