Looped Transformers as Programmable Computers

About

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, function calls, program counters, and conditional branches. Using these building blocks, we emulate a small instruction-set computer. This allows us to map iterative algorithms to programs that can be executed by a looped, 13-layer transformer. We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation. Our work highlights the versatility of the attention mechanism, and demonstrates that even shallow transformers can execute full-fledged, general-purpose programs.

Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos • 2023
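
The "fixed computation placed in a loop, programmed by its input" idea from the abstract can be illustrated with a minimal sketch in plain Python. This is not a transformer and not the authors' construction. It assumes a SUBLEQ-style single-instruction machine as an illustrative choice (the abstract does not name the instruction set), and the names `step`, `run`, and the memory layout are hypothetical. A single flat list plays the role of the punchcard, holding both the instructions and the data they read and write, while `step` stands in for the fixed block that is applied repeatedly.

```python
from typing import List, Tuple


def step(memory: List[int], pc: int) -> Tuple[List[int], int]:
    """One pass of the fixed block (assumed SUBLEQ semantics).

    Reads the instruction (a, b, c) at the program counter, computes
    memory[b] -= memory[a], and branches to c if the result is <= 0.
    """
    a, b, c = memory[pc], memory[pc + 1], memory[pc + 2]
    new_memory = list(memory)              # leave the input untouched
    new_memory[b] = memory[b] - memory[a]  # memory write
    next_pc = c if new_memory[b] <= 0 else pc + 3  # conditional branch
    return new_memory, next_pc


def run(memory: List[int], pc: int = 0, max_steps: int = 1000) -> List[int]:
    """Apply the same step in a loop until the program counter goes negative."""
    for _ in range(max_steps):
        if pc < 0:  # hypothetical halt convention: a negative program counter
            break
        memory, pc = step(memory, pc)
    return memory


# Hypothetical "punchcard": three instructions followed by three data cells.
# Cells 9, 10, 11 hold A = 7, B = 5, and a scratch cell Z = 0.
program = [
    9, 11, 3,    # Z = Z - A        (Z becomes -A)
    11, 10, 6,   # B = B - Z        (B becomes B + A)
    11, 11, -1,  # Z = Z - Z = 0, then jump to -1 to halt
    7, 5, 0,     # data: A, B, Z
]

result = run(program)
print(result[10])  # 12
```

The point of the sketch is that `step` never changes: different behaviour comes only from changing the contents of the list, in the same way the abstract describes a looped transformer being instructed by its input.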

Related benchmarks

Task | Dataset | Result | Rank
Language Modeling | WikiText-2 | -- | 2320
Mathematical Reasoning | COUNTDOWN (test) | Accuracy 68.2 | 84
Language Modeling | (val) | Validation BPB 0.905 | 76
Evaluation | Core Metrics | Core Metrics 16.13 | 22
Zero-shot Reasoning | Downstream Tasks (LMB, PIQA, HellaSwag, OPQA, ARC) | LAMBADA (LMB) Accuracy 30.07 | 22
Reasoning | Sudoku (test) | Accuracy 100 | 19
Downstream Task Evaluation | Multiple Downstream Datasets (LAMBADA, ARC, WinoGrande, PIQA, HellaSwag, SciQ, RACE) | LAMBADA (OpenAI) 40.4 | 12
Mathematical Reasoning | 3-SAT (test) | Accuracy 91.3 | 5
Matrix Inversion | General Matrices | Layers 13 | 2
Matrix Transposition | General Matrices | Layers (L) 4 | 2
Showing 10 of 11 rows
