| CIFAR-100 (test) | DPS | ECE 0.85 | | 99 | 4d ago |
| BEAR (test) | gemma-7b | Brier Score 0.083 | | 96 | 4d ago |
| NQ | Temp. Scaling | ECE 0.046 | | 55 | 4d ago |
| CIFAR-10H | P+L (Recalibrated) | ECE 0.84 | | 52 | 4d ago |
| MMLU | Verbalized confidence | Brier Score 0.0559 | | 42 | 4d ago |
| Brock-Hommes (test) | ANTR | MSE 0 | | 40 | 4d ago |
| TriviaQA | Probe (train on TriviaQA) | Brier Score 0.0845 | | 39 | 4d ago |
| MNIST | ETS | ECE 0.21 | | 33 | 4d ago |
| SQuAD | Temp. Scaling | ECE 5.87 | | 31 | 4d ago |
| WebQ | Temp. Scaling | ECE 0.0674 | | 31 | 4d ago |
| Digital-S | Knowledge-Transferring-based Temperature Scaling | ECE 7.37 | | 27 | 4d ago |
| USPS | Knowledge-Transferring-based Temperature Scaling | ECE 4.55 | | 27 | 4d ago |
| Average StrategyQA, HotpotQA, NQ, Bamboogle | NAACL | ECE 0.264 | | 24 | 4d ago |
| Bamboogle | NAACL | ECE 0.113 | | 24 | 4d ago |
| HotpotQA | NAACL | ECE 0.28 | | 24 | 4d ago |
| StrategyQA | NAACL | ECE 0.285 | | 24 | 4d ago |
| ImageNet-C OOD-domains | MD-TS | ECE (%) 1.43 | | 24 | 4d ago |
| ImageNet-C (InD-domains) | TS | ECE (%) 0.89 | | 24 | 4d ago |
| CIFAR100 Noise levels 1-5 (val) | Deep Ensemble | NLL 1.499 | | 20 | 4d ago |
| CIFAR10 Noise levels 1-5 (val) | GSD | NLL 0.531 | | 20 | 4d ago |
| DermaMNIST (test) | F-EDL | Brier Score 2.32 | | 19 | 4d ago |
| GLD OOD-domains v2 | MD-TS | ECE 3.41 | | 18 | 4d ago |
| GLD InD-domains v2 | MD-TS | ECE (%) 2.61 | | 18 | 4d ago |
| WILDS-RxRx1 (OOD-domains) | MD-TS | ECE 0.0277 | | 18 | 4d ago |
| WILDS-RxRx1 (InD-domains) | MD-TS | ECE 1.87 | | 18 | 4d ago |