Looped Transformers as Programmable Computers

About

We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including embedding edit operations, non-linear functions, function calls, program counters, and conditional branches. Using these building blocks, we emulate a small instruction-set computer. This allows us to map iterative algorithms to programs that can be executed by a looped, 13-layer transformer. We show how this transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and in-context learning algorithms that employ backpropagation. Our work highlights the versatility of the attention mechanism, and demonstrates that even shallow transformers can execute full-fledged, general-purpose programs.

Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, Dimitris Papailiopoulos • 2023
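As a rough illustration of the "fixed machine in a loop, programmed by its input" idea described in the abstract, here is a minimal Python sketch. It is not a transformer and not the authors' construction: it assumes a SUBLEQ-style single instruction (subtract, then branch if the result is non-positive) as a stand-in for the "small instruction-set computer", and the names `step`, `run`, and `ADD_PROGRAM` are hypothetical. The point is only that one unchanging step function, applied repeatedly to a state holding instructions, memory, and a program counter, can execute a program.

```python
from typing import List, Tuple

# Hypothetical instruction format (an assumption for this sketch):
# (a, b, target) means mem[b] -= mem[a]; jump to target if mem[b] <= 0.
Instruction = Tuple[int, int, int]


def step(memory: List[int], program: List[Instruction], pc: int) -> int:
    """Apply the single fixed update once and return the next program counter."""
    a, b, target = program[pc]
    memory[b] -= memory[a]                    # memory read and write
    return target if memory[b] <= 0 else pc + 1  # conditional branch


def run(memory: List[int], program: List[Instruction], max_steps: int = 1000) -> List[int]:
    """Loop the same step function until the counter leaves the program."""
    memory = list(memory)  # work on a copy so the caller's list is untouched
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break  # counter outside the program: halt
        pc = step(memory, program, pc)
    return memory


# Program that adds mem[0] into mem[1], using mem[2] as a scratch cell (starts at 0).
ADD_PROGRAM: List[Instruction] = [
    (0, 2, 1),  # scratch = -mem[0]
    (2, 1, 2),  # mem[1] = mem[1] - scratch = mem[1] + mem[0]
    (2, 2, 3),  # scratch = 0, then jump past the end to halt
]

print(run([3, 4, 0], ADD_PROGRAM))  # [3, 7, 0]
```

Here the "punchcard" is the pair of `ADD_PROGRAM` and the initial memory; changing either changes the computation while `step` stays fixed, which loosely mirrors how the looped transformer keeps its weights fixed and is instructed by its input.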

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Language Modeling	WikiText-2	--	--	2320
Mathematical Reasoning	COUNTDOWN (test)	Accuracy	68.2	84
Language Modeling	(val)	Validation BPB	0.905	76
Evaluation	Core Metrics	Core Metrics	16.13	22
Zero-shot Reasoning	Downstream Tasks (LMB, PIQA, HellaSwag, OPQA, ARC)	LAMBADA (LMB) Accuracy	30.07	22
Reasoning	Sudoku (test)	Accuracy	100	19
Downstream Task Evaluation	Multiple Downstream Datasets (LAMBADA, ARC, WinoGrande, PIQA, HellaSwag, SciQ, RACE)	LAMBADA (OpenAI)	40.4	12
Mathematical Reasoning	3-SAT (test)	Accuracy	91.3	5
Matrix Inversion	General Matrices	Layers	13	2
Matrix Transposition	General Matrices	Layers (L)	4	2

