Explicit Dropout: Deterministic Regularization for Transformer Architectures

About

Dropout is a widely used regularization technique in deep learning, but its effects are typically realized through stochastic masking rather than through an explicit optimization objective. We propose a deterministic formulation that expresses dropout as an additive regularizer incorporated directly into the training loss. The framework derives explicit regularization terms for Transformer architectures, covering the attention query, key, value, and feed-forward components, each with an independently controllable strength. This formulation removes the reliance on stochastic perturbations while providing clearer, finer-grained control over regularization strength. Experiments across image classification, temporal action detection, and audio classification show that explicit dropout matches or outperforms conventional implicit methods, with consistent gains when applied to attention and feed-forward network layers. Ablation studies demonstrate stable performance and show that the amount of regularization can be controlled through the regularization coefficients and dropout rates. Overall, explicit dropout offers a practical and interpretable alternative to stochastic regularization while maintaining architectural flexibility across diverse tasks.
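
To make the idea of "dropout as an additive regularizer" concrete, here is a minimal sketch for the classical linear-regression case. It does not reproduce the Transformer-specific terms derived in the paper. It assumes inverted dropout on the inputs with drop rate p, for which the expected squared error equals the plain squared error plus the penalty p/(1-p) * sum_i w_i^2 x_i^2. The function names `explicit_dropout_loss` and `stochastic_dropout_loss` are hypothetical and chosen only for illustration.

```python
import numpy as np


def explicit_dropout_loss(w, x, y, p):
    """Deterministic loss: squared error plus a closed-form dropout penalty.

    Assumption: linear model y_hat = x @ w with inverted dropout on the
    inputs (keep probability 1 - p, kept inputs scaled by 1 / (1 - p)).
    """
    residual = y - x @ w
    # Variance of the dropped-out prediction, per example.
    penalty = (p / (1.0 - p)) * np.sum((x ** 2) * (w ** 2), axis=1)
    return np.mean(residual ** 2 + penalty)


def stochastic_dropout_loss(w, x, y, p, n_samples, rng):
    """Monte Carlo estimate of the same expected loss using random masks."""
    keep = 1.0 - p
    total = 0.0
    for _ in range(n_samples):
        mask = rng.random(x.shape) < keep   # True where the input is kept
        x_drop = x * mask / keep            # inverted-dropout rescaling
        total += np.mean((y - x_drop @ w) ** 2)
    return total / n_samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(64, 8))
    w = rng.normal(size=8)
    y = rng.normal(size=64)
    print("explicit  :", explicit_dropout_loss(w, x, y, p=0.2))
    print("stochastic:", stochastic_dropout_loss(w, x, y, 0.2, 5000, rng))
```

The two printed values should agree up to Monte Carlo noise. In this toy setting the penalty coefficient is tied to p, whereas the abstract describes regularization coefficients that can be set independently per component.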

Vidhi Agrawal, Illia Oleksiienko, Alexandros Iosifidis • 2026

Related benchmarks

Task                         Dataset                                Result            Rank
Image Classification         CIFAR-10 (test)                        Accuracy 86.38    882
Image Classification         CIFAR-100 (test)                       Accuracy 56.81    295
Temporal Action Detection    THUMOS14 (test)                        mAP 56.51         37
Music Genre Classification   GTZAN (test)                           Accuracy 85.78    27
Temporal Action Detection    THUMOS14 Kinetics-400 features (test)  mAP 64.68         12