Vertex-Softmax: Tight Transformer Verification via Exact Softmax Optimization
About
Certified verification of transformer attention requires bounding the softmax function over interval constraints on the pre-softmax scores. Existing verifiers relax softmax independently of the downstream objective, leaving avoidable slack. We prove that the exact optimum of this score-box problem is attained at a vertex of the constraint box, and establish a threshold structure theorem showing that, after sorting the objective coefficients, the optimum lies among only linearly many candidates, yielding the Vertex-Softmax primitive with log-linear complexity in the sequence length. We further prove a formal optimality result showing that Vertex-Softmax is the tightest sound bound obtainable from score intervals alone, characterizing precisely what additional structure (score correlations, score-value coupling) is needed for further improvement. Integrated into a CROWN (Convex Relaxation based Optimization for Worst-case Neurons)-style verifier with a formal soundness guarantee, Vertex-Softmax significantly improves certified rates and substantially tightens lower bounds across MNIST, Fashion-MNIST, and CIFAR-10 attention models, while consistently matching or outperforming alpha-CROWN and branch-and-bound baselines at a fraction of their cost.
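To make the score-box problem concrete, the sketch below computes the exact minimum of a linear objective `c · softmax(z)` over a box `lo <= z <= hi` by scanning the threshold candidates described above. It is a minimal illustration, not the paper's implementation: the function names `vertex_softmax_lower` and `vertex_softmax_upper` are hypothetical, and the code assumes one score vector, finite bounds, and intervals narrow enough that `exp(lo - max(hi))` does not underflow to zero for every entry. The threshold form used here follows from the partial derivative of the objective with respect to `z_i`, which is `p_i (c_i - f)`: scores whose coefficient lies below the optimal value go to their upper bound, the rest to their lower bound.

```python
import numpy as np


def vertex_softmax_lower(c, lo, hi):
    """Exact min of c . softmax(z) over the box lo <= z <= hi (illustrative sketch)."""
    c, lo, hi = (np.asarray(a, dtype=float) for a in (c, lo, hi))

    # Sort by objective coefficient so that every candidate is a prefix/suffix split.
    order = np.argsort(c)
    c, lo, hi = c[order], lo[order], hi[order]

    # Shift by the largest score for numerical stability (softmax is shift-invariant).
    shift = hi.max()
    w_lo, w_hi = np.exp(lo - shift), np.exp(hi - shift)

    # Candidate k: the k smallest coefficients sit at their upper bound,
    # the remaining n - k sit at their lower bound. k ranges over 0..n.
    pre_num = np.concatenate(([0.0], np.cumsum(c * w_hi)))
    pre_den = np.concatenate(([0.0], np.cumsum(w_hi)))
    suf_num = np.concatenate((np.cumsum((c * w_lo)[::-1])[::-1], [0.0]))
    suf_den = np.concatenate((np.cumsum(w_lo[::-1])[::-1], [0.0]))

    values = (pre_num + suf_num) / (pre_den + suf_den)
    return float(values.min())


def vertex_softmax_upper(c, lo, hi):
    """Exact max of c . softmax(z) over the same box, by negating the objective."""
    return -vertex_softmax_lower(-np.asarray(c, dtype=float), lo, hi)
```

The sort dominates the cost, so the sketch runs in O(n log n) for n scores, and the prefix and suffix sums evaluate all n + 1 candidates in a single pass. For example, `vertex_softmax_lower([0.0, 1.0], [-1.0, -1.0], [1.0, 1.0])` pushes the first score up and the second down, giving `1 / (1 + e**2)`.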
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robustness Certification | Attention blocks | Certification Rate | 74.2 | 12 |
| Robustness Certification | MNIST Binary | Certified Rate | 97.2 | 10 |
| Image Classification | Fashion MNIST | Clean Accuracy | 70.8 | 8 |
| Neural Network Verification | Small-attention experiments | Time per Trial (s) | 0.03 | 6 |
| Robustness Certification | Residual-MHA blocks | Certification Score | 33.7 | 6 |
| Robustness Certification | MNIST 10-class | Certified Rate | 12.9 | 4 |
| Image Classification | CIFAR-10 gray | Clean Accuracy | 34.6 | 4 |