Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

On the Existence of Universal Simulators of Attention

About

Previous work on the learnability of transformers \textemdash\ focused primarily on examining their ability to approximate specific algorithmic patterns through training \textemdash\ has largely been data-driven, offering only probabilistic guarantees rather than deterministic solutions. Expressivity, on the contrary, has been devised to address the problems \emph{computable} by such architecture theoretically. These results proved the Turing-completeness of transformers, investigated bounds focused on circuit complexity, and formal logic. Being at the crossroad between learnability and expressivity, the question remains: \emph{can a transformer, as a computational model, simulate an arbitrary attention mechanism, or in particular, the underlying operations?} In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a universal simulator $\mathcal{U}$ composed of transformer encoders, we present algorithmic solutions to replicate attention outputs and the underlying elementary matrix and activation operations via RASP, a formal framework for transformer computation. We show the existence of an algorithmically achievable, data-agnostic solution, previously known to be approximated only by learning.

Debanjan Dutta, Anish Chakrabarty, Faizanuddin Ansari, Swagatam Das• 2025

Related benchmarks

TaskDatasetResultRank
Matrix MultiplicationGeneral Matrices
Layers (L)3
2
Matrix InversionGeneral Matrices
Layers5
2
Matrix TranspositionGeneral Matrices
Layers (L)2
2
Showing 3 of 3 rows

Other info

Follow for update