| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| MACE (test) | SCA | AUROC | 81.2 | 84 | 1mo ago |
| dermatology | | Confidence Calibration Error | 0.013 | 66 | 1mo ago |
| CIFAR-100-LT (test) | Knowledge-Transferring-based Temperature Scaling | ECE | 0.015 | 53 | 1mo ago |
| vehicle | IRP | Calibration Error | 0.001 | 44 | 1mo ago |
| glass | IRP | Calibration Error | 0.008 | 44 | 1mo ago |
| car | Dirichlet Calibration | Calibration Error | 1.1 | 44 | 1mo ago |
| Pubmed | CaGCN | ECE | 0.0308 | 36 | 1mo ago |
| Citeseer | GATS | ECE | 3.86 | 36 | 1mo ago |
| Cora | CaGCN | ECE | 0.0313 | 36 | 1mo ago |
| CoraFull | CaGCN | ECE | 0.0701 | 28 | 1mo ago |
| SimpleQA | Probe (train on TriviaQA) | Brier Score | 0.0386 | 27 | 1mo ago |
| cleveland | IRP | Calibration Error | 0.03 | 22 | 1mo ago |
| balance-scale | IRP | Calibration Error | 0.006 | 22 | 1mo ago |
| Average of four domains Relational Inference Planning | first-second-distance-based (FSD) | Brier Score | 0.114 | 18 | 1mo ago |
| MultiNLI Mismatch (test) | MIR | ECE | 0.0071 | 16 | 1mo ago |
| BeyondAIME (test) | Qwen3-4B-Instruct-ppo-value | SNR Gain | 1.202 | 15 | 1mo ago |
| iNaturalist 2021 | PTSK + PROCAL | ECE | 0.65 | 12 | 1mo ago |
| FMNIST ID (test) | OTIS | ECE | 3.26 | 9 | 1mo ago |
| MNIST ID (test) | CEDA | ECE | 0.14 | 9 | 1mo ago |
| SVHN ID (test) | OE | ECE | 1.28 | 9 | 1mo ago |
| CIFAR-100 ID (test) | | ECE | 6.08 | 9 | 1mo ago |
| CIFAR-10 ID (test) | OTIS | ECE | 1.88 | 9 | 1mo ago |
| TriviaQA (test) | DINCO | Expected Calibration Error | 0.065 | 7 | 1mo ago |
| HLE (test) | HTC | ECE | 0.031 | 7 | 1mo ago |
| GPQA (test) | HTC | ECE | 0.102 | 7 | 1mo ago |