Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning
About
Reliability of deep learning models is critical for deployment in high-stakes applications, where out-of-distribution (OOD) or adversarial inputs may lead to detrimental outcomes. Evidential Deep Learning (EDL), an efficient paradigm for uncertainty quantification, models predictions as Dirichlet distributions obtained from a single forward pass. However, EDL is particularly vulnerable to adversarially perturbed inputs, on which it makes overconfident errors. Conflict-aware Evidential Deep Learning (C-EDL) is a lightweight post-hoc uncertainty quantification approach that mitigates these issues, enhancing adversarial and OOD robustness without retraining. C-EDL generates diverse, task-preserving transformations of each input and quantifies the representational disagreement between them to calibrate uncertainty estimates when needed. This conflict-aware prediction adjustment improves detection of OOD and adversarial inputs while maintaining high in-distribution accuracy and low computational overhead. Our experimental evaluation shows that C-EDL significantly outperforms state-of-the-art EDL variants and competitive baselines, achieving substantial reductions in coverage for OOD data (up to $\approx$55%) and adversarial data (up to $\approx$90%) across a range of datasets, attack types, and uncertainty metrics.
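The core idea above (collect Dirichlet evidence from several task-preserving views of an input, measure how much the views disagree, and discount the evidence accordingly) can be sketched as follows. This is a minimal illustration, not the paper's exact method: the disagreement measure (`conflict_score`, mean pairwise L1 distance between belief vectors) and the multiplicative discount are simplified stand-ins for C-EDL's conflict-aware adjustment, and `evidence_fn` is a hypothetical stand-in for an EDL model's evidence head.

```python
import numpy as np

def dirichlet_uncertainty(evidence, num_classes):
    """Vacuity-style uncertainty u = K / S, with alpha = evidence + 1 and S = sum(alpha)."""
    alpha = evidence + 1.0
    return num_classes / alpha.sum()

def conflict_score(evidence_set):
    """Illustrative disagreement measure: mean pairwise L1 distance between
    the belief vectors b_k = e_k / S of each transformed view (in [0, 1))."""
    beliefs = [e / (e.sum() + len(e)) for e in evidence_set]
    n = len(beliefs)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 0.5 * np.abs(beliefs[i] - beliefs[j]).sum()
            pairs += 1
    return total / pairs

def cedl_predict(evidence_fn, x, transforms):
    """Sketch of a conflict-aware adjustment: gather evidence from each
    transformed view, discount the aggregated evidence by the measured
    disagreement, and return (predicted class, uncertainty)."""
    evidence_set = [evidence_fn(t(x)) for t in transforms]
    conflict = conflict_score(evidence_set)
    agg = np.mean(evidence_set, axis=0)
    adjusted = agg * (1.0 - conflict)  # high conflict -> less evidence -> higher uncertainty
    return int(adjusted.argmax()), dirichlet_uncertainty(adjusted, len(adjusted))
```

When the views agree, the conflict term is near zero and the prediction is unchanged; when an adversarial or OOD input makes the views disagree, the evidence is discounted and the uncertainty rises, so the input is more likely to be flagged rather than covered.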
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification, OOD Detection, and Adversarial Attack Detection | MNIST (ID) -> FashionMNIST (OOD) (test) | ID Accuracy (%) | 99.98 | 11 |
| Image Classification, OOD Detection, and Adversarial Attack Detection | CIFAR10 (ID) -> SVHN (OOD) (test) | ID Accuracy (%) | 98.4 | 11 |
| Image Classification, OOD Detection, and Adversarial Attack Detection | CIFAR10 (ID) -> CIFAR100 (Near-OOD) (test) | ID Accuracy (%) | 98.64 | 11 |
| Image Classification, OOD Detection, and Adversarial Attack Detection | Oxford Flowers low-shot (ID) -> Deep Weeds (OOD) (test) | ID Accuracy (%) | 100 | 11 |
| Image Classification, OOD Detection, and Adversarial Attack Detection | MNIST (ID) -> KMNIST (OOD) (test) | ID Accuracy (%) | 99.98 | 11 |
| Image Classification, OOD Detection, and Adversarial Attack Detection | MNIST (ID) -> EMNIST (Near-OOD) (test) | ID Accuracy (%) | 99.99 | 11 |
| Image Classification | MNIST | ID Accuracy (%) | 99.98 | 9 |
| Adversarial Attack Detection | MNIST L2PGD attack | Adversarial Coverage (%) | 23.39 | 9 |
| Out-of-Distribution Detection | MNIST (ID) -> FashionMNIST (OOD) | OOD Coverage (%) | 5.8 | 9 |