Being Bayesian about Categorical Probability
About
Neural networks use the softmax as a building block in classification tasks, but the softmax suffers from overconfidence and lacks the ability to represent uncertainty. As a Bayesian alternative to the softmax, we treat the categorical probability over class labels as a random variable. In this framework, the prior distribution explicitly models the presumed noise inherent in the observed labels, which yields consistent gains in generalization performance across multiple challenging tasks. The proposed method inherits the advantages of Bayesian approaches, achieving better uncertainty estimation and model calibration. It can be implemented as a plug-and-play loss function with negligible computational overhead compared to the softmax with the cross-entropy loss.
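The "plug-and-play loss" can be sketched as follows. This is a hedged illustration, not the paper's reference implementation: it assumes the network's logits parameterize a Dirichlet posterior via `alphas = exp(logits)`, uses a symmetric Dirichlet prior with concentration `beta`, and combines the expected log-likelihood under the posterior with a KL regularizer weighted by a hypothetical `coeff`; the exact parameterization and weighting in the paper may differ.

```python
import numpy as np
from scipy.special import digamma, gammaln

def belief_matching_loss(logits, label, beta=1.0, coeff=0.01):
    """Sketch of a Dirichlet-based alternative to softmax cross-entropy.

    logits : 1-D array of class scores for a single example
    label  : index of the observed class
    beta   : symmetric Dirichlet prior concentration (assumption)
    coeff  : KL regularization weight (hypothetical hyperparameter)
    """
    # Posterior Dirichlet concentrations from the logits (assumed mapping).
    alphas = np.exp(logits)
    a0 = alphas.sum()
    k = len(logits)

    # Expected log-likelihood of the label under q = Dirichlet(alphas):
    # E_q[log phi_y] = psi(alpha_y) - psi(alpha_0).
    expected_ll = digamma(alphas[label]) - digamma(a0)

    # KL( Dirichlet(alphas) || Dirichlet(beta * ones(k)) ), closed form.
    kl = (gammaln(a0) - gammaln(alphas).sum()
          - gammaln(k * beta) + k * gammaln(beta)
          + ((alphas - beta) * (digamma(alphas) - digamma(a0))).sum())

    # Negative ELBO-style objective: fit the label, stay near the prior.
    return -expected_ll + coeff * kl
```

Because the loss is a scalar function of the logits, it can replace cross-entropy in an existing training loop without architectural changes; raising the logit of the true class lowers the loss, just as with softmax cross-entropy.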
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| OOD Detection | CIFAR-10 (ID) / SVHN (OOD) | AUROC | 0.975 | 91 |
| OOD Detection | CIFAR-100 (ID) / SVHN (OOD) | AUROC (%) | 71.5 | 74 |
| OOD Detection | CIFAR-10 (ID) / Fashion-MNIST (OOD) | AUROC | 0.936 | 54 |
| OOD Detection | CIFAR-10 (OOD, test) | AUROC | 98.8 | 36 |
| Selective Classification | CIFAR-100 (test) | AUC | 0.826 | 32 |
| OOD Detection | CIFAR-100 (ID) / TinyImageNet (OOD) | AUROC | 0.714 | 31 |
| OOD Detection | TinyImageNet (ID) / CIFAR-10 (OOD) | AUPR | 85.3 | 24 |
| Selective Classification | CIFAR-10 (test) | AUC | 0.901 | 21 |
| OOD Detection | CIFAR-10 (ID) / ImageNet-R (OOD) | AUROC | 87.7 | 20 |
| OOD Detection | CIFAR-10 vs SVHN (test) | -- | -- | 19 |