
Spectral Conditioning of Attention Improves Transformer Performance

About

We present a theoretical analysis of the Jacobian of an attention block within a transformer, showing that it is governed by the query, key, and value projections that define the attention mechanism. Leveraging this insight, we introduce a method that systematically alters the spectral properties of each attention layer to reduce the Jacobian's condition number, thereby improving the overall conditioning of the attention layers within a transformer network. We empirically show that this improved Jacobian conditioning translates to enhanced performance in practice. Our approach is simple, broadly applicable, and can be easily integrated as a drop-in replacement for a wide range of existing attention mechanisms. We validate its effectiveness across diverse transformer architectures and tasks, demonstrating consistent improvements in performance.
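The paper's exact conditioning procedure is not reproduced on this page. As a rough illustration of the general idea (a drop-in attention layer whose query, key, and value projections have controlled spectra), the sketch below normalizes each projection weight by its largest singular value before use, which bounds the spectrum of each projection. This is an assumption-laden PyTorch sketch; the class and function names are illustrative and not taken from the paper.

```python
# Hedged sketch: one plausible way to spectrally condition the Q/K/V
# projections of a self-attention layer. Names and the specific
# normalization (division by the largest singular value) are assumptions,
# not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F


def spectrally_normalize(weight: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Divide a weight matrix by its largest singular value (spectral norm)."""
    sigma_max = torch.linalg.matrix_norm(weight, ord=2)
    return weight / (sigma_max + eps)


class SpectrallyConditionedAttention(nn.Module):
    """Single-head self-attention with spectrally normalized Q/K/V projections."""

    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q = F.linear(x, spectrally_normalize(self.q_proj.weight))
        k = F.linear(x, spectrally_normalize(self.k_proj.weight))
        v = F.linear(x, spectrally_normalize(self.v_proj.weight))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dim ** 0.5, dim=-1)
        return self.out_proj(attn @ v)


if __name__ == "__main__":
    layer = SpectrallyConditionedAttention(dim=64)
    out = layer(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```

Because the layer keeps the standard attention interface (same input and output shapes), a module like this could replace an existing attention block without other architectural changes, which is the sense in which the abstract describes the method as a drop-in replacement.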

Hemanth Saratchandran, Simon Lucey • 2026

Related benchmarks

Task                             Dataset                    Metric          Result   Rank
Object Detection                 COCO 2017 (val)            --              --       2643
Instance Segmentation            COCO 2017 (val)            APm             0.405    1201
Natural Language Understanding   GLUE                       SST-2           92.7     531
Object Detection                 COCO                       AP50 (Box)      68.1     237
Long-range sequence modeling     Long Range Arena (LRA)     Text Accuracy   64.8     177
