Certified Training: Small Boxes are All You Need

About

To obtain deterministic guarantees of adversarial robustness, specialized training methods are used. We propose SABR, a novel certified training method of this kind, based on the key insight that propagating interval bounds for a small but carefully selected subset of the adversarial input region is sufficient to approximate the worst-case loss over the whole region while significantly reducing approximation errors. We show in an extensive empirical evaluation that SABR outperforms existing certified defenses in terms of both standard and certifiable accuracies across perturbation magnitudes and datasets, pointing to a new class of certified training methods that promise to alleviate the robustness-accuracy trade-off.
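To make the idea of propagating interval bounds over a small sub-box concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the authors' implementation: the abstract does not say how the subset is selected, so the `center` point and the size ratio `lam` are hypothetical inputs, and the one-layer network with weights `W` and bias `b` is made up for the example.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def small_box(x: Vector, center: Vector, eps: float, lam: float) -> Tuple[Vector, Vector]:
    """Bounds of a box of radius lam * eps that lies inside the eps-box around x.

    `center` is a hypothetical, externally chosen point (for example, one
    found by an attack); how it is chosen is an assumption of this sketch.
    """
    tau = lam * eps
    lo, hi = [], []
    for xi, ci in zip(x, center):
        # Clamp the centre so the small box stays within [xi - eps, xi + eps].
        c = min(max(ci, xi - eps + tau), xi + eps - tau)
        lo.append(c - tau)
        hi.append(c + tau)
    return lo, hi


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Interval bounds of W @ v + b for every v with lo <= v <= hi."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = u = bias
        for w, a, c in zip(row, lo, hi):
            # A positive weight maps the lower input bound to the lower output bound;
            # a negative weight swaps the roles of the two bounds.
            if w >= 0:
                l += w * a
                u += w * c
            else:
                l += w * c
                u += w * a
        out_lo.append(l)
        out_hi.append(u)
    return out_lo, out_hi


def relu_bounds(lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def total_width(lo: Vector, hi: Vector) -> float:
    """Sum of interval widths, a rough measure of how loose the bounds are."""
    return sum(h - l for l, h in zip(lo, hi))


# Hypothetical toy layer and input.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, -0.1]
x, eps = [0.5, 0.5], 0.1

full_lo, full_hi = [v - eps for v in x], [v + eps for v in x]
sub_lo, sub_hi = small_box(x, center=[0.6, 0.4], eps=eps, lam=0.25)

full_out = relu_bounds(*affine_bounds(W, b, full_lo, full_hi))
sub_out = relu_bounds(*affine_bounds(W, b, sub_lo, sub_hi))
print("full box width :", total_width(*full_out))
print("small box width:", total_width(*sub_out))
```

The output bounds for the small box are contained in those for the full region and are tighter. Note that bounds computed on a sub-box only cover that sub-box, so on their own they do not certify the whole region.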

Mark Niklas Müller, Franziska Eckert, Marc Fischer, Martin Vechev • 2022

Related benchmarks

Task	Dataset	Result
Image Classification	MNIST (test)	Test Accuracy 99.2	32
Certified Robustness	CIFAR-100 (test)	Clean Accuracy 39.7	11
Image Classification	MNIST	Clean Accuracy 98.7	7
Image Classification	CIFAR-10 (test)	Clean Accuracy 79.2	7
Image Classification	CIFAR-10	Clean Accuracy 51.8	7
Image Classification	TinyImageNet	Clean Accuracy 28.3	7
Text Classification	SST-2 PWWS	Robust Accuracy 16.8	4
Text Classification	SST-2 TextFooler	Robust Accuracy 9.4	4
