On the Existence of Universal Simulators of Attention

About

Previous work on the learnability of transformers \textemdash\ focused primarily on examining their ability to approximate specific algorithmic patterns through training \textemdash\ has largely been data-driven, offering only probabilistic guarantees rather than deterministic solutions. Expressivity, on the contrary, has been devised to address the problems \emph{computable} by such architecture theoretically. These results proved the Turing-completeness of transformers, investigated bounds focused on circuit complexity, and formal logic. Being at the crossroad between learnability and expressivity, the question remains: \emph{can a transformer, as a computational model, simulate an arbitrary attention mechanism, or in particular, the underlying operations?} In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a universal simulator $\mathcal{U}$ composed of transformer encoders, we present algorithmic solutions to replicate attention outputs and the underlying elementary matrix and activation operations via RASP, a formal framework for transformer computation. We show the existence of an algorithmically achievable, data-agnostic solution, previously known to be approximated only by learning.

Debanjan Dutta, Anish Chakrabarty, Faizanuddin Ansari, Swagatam Das• 2025

Related benchmarks

Task	Dataset	Result
Matrix Multiplication	General Matrices	Layers (L)3	2
Matrix Inversion	General Matrices	Layers5	2
Matrix Transposition	General Matrices	Layers (L)2	2

Showing 3 of 3 rows

Other info

Follow for update

@wizwand_team Discord