Explicit Dropout: Deterministic Regularization for Transformer Architectures

About

Dropout is a widely used regularization technique in deep learning, but its effects are typically realized through stochastic masking rather than explicit optimization objectives. We propose a deterministic formulation that expresses dropout as an additive regularizer directly incorporated into the training loss. The framework derives explicit regularization terms for Transformer architectures, covering the attention query, key, and value projections and the feed-forward components, each with an independently controllable strength. This formulation removes reliance on stochastic perturbations while providing clearer, finer-grained control over regularization strength. Experiments across image classification, temporal action detection, and audio classification show that explicit dropout matches or outperforms conventional implicit methods, with consistent gains when applied to attention and feed-forward network layers. Ablation studies show stable performance and show that regularization can be controlled through the regularization coefficients and dropout rates. Overall, explicit dropout offers a practical and interpretable alternative to stochastic regularization while maintaining architectural flexibility across diverse tasks.
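The idea of writing dropout as an additive loss term can be illustrated with the well-known linear-regression case, where the expected squared error under inverted dropout on the inputs equals the plain squared error plus a data-dependent L2 penalty scaled by p / (1 - p). The sketch below checks that identity exactly by enumerating every mask. It is not the paper's Transformer derivation; the function names, the toy data, and the choice of a linear model are assumptions made only for illustration.

```python
import itertools


def explicit_dropout_loss(w, x, y, p):
    """Squared error plus the closed-form dropout penalty (linear model only)."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    # Variance of the dropout-perturbed prediction: p / (1 - p) * sum((w_i * x_i)^2)
    penalty = (p / (1.0 - p)) * sum((wi * xi) ** 2 for wi, xi in zip(w, x))
    return (y - pred) ** 2 + penalty


def expected_stochastic_dropout_loss(w, x, y, p):
    """Exact expectation of the squared error over all inverted-dropout masks."""
    keep = 1.0 - p
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(x)):
        # Probability of this particular mask under independent Bernoulli(keep) draws
        prob = 1.0
        for m in mask:
            prob *= keep if m else p
        # Inverted dropout: kept inputs are rescaled by 1 / keep
        pred = sum(wi * xi * m / keep for wi, xi, m in zip(w, x, mask))
        total += prob * (y - pred) ** 2
    return total


# Toy values, chosen arbitrarily for the illustration
w = [0.5, -1.0, 2.0, 0.25]
x = [1.0, 2.0, -0.5, 3.0]
y = 1.5
p = 0.1  # dropout rate

explicit = explicit_dropout_loss(w, x, y, p)
stochastic = expected_stochastic_dropout_loss(w, x, y, p)
print(abs(explicit - stochastic) < 1e-9)
```

In this linear setting the deterministic penalty reproduces the average effect of the stochastic masks without any sampling. For nonlinear components such as attention, an explicit term generally has to be derived for each component, which is what the abstract describes for the query, key, value, and feed-forward parts.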

Vidhi Agrawal, Illia Oleksiienko, Alexandros Iosifidis • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Image Classification	CIFAR-10 (test)	Accuracy	86.38	1063
Image Classification	CIFAR-100 (test)	Accuracy	56.81	295
Temporal Action Detection	THUMOS14 (test)	mAP	56.51	37
Music Genre Classification	GTZAN (test)	Accuracy	85.78	27
Temporal Action Detection	THUMOS14 Kinetics-400 features (test)	mAP	64.68	12

