Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation

About

Unlike recurrent models, conventional wisdom has it that Transformers cannot perfectly model regular languages. Inspired by the notion of working memory, we propose a new Transformer variant named RegularGPT. With its novel combination of Weight-Sharing, Adaptive-Depth, and Sliding-Dilated-Attention, RegularGPT constructs working memory along the depth dimension, thereby enabling efficient and successful modeling of regular languages such as PARITY. We further test RegularGPT on the task of natural language length extrapolation and surprisingly find that it rediscovers the local windowed attention effect deemed necessary in prior work for length extrapolation.

Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge • 2023
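
To make the PARITY example in the abstract concrete, here is a minimal sketch, assuming PARITY means deciding whether a bit string contains an odd number of 1s. It compares a sequential two-state scan (the recurrent view) with a layer-by-layer reduction in which each layer merges a fixed number of neighbouring results, so the number of layers grows with the logarithm of the input length. This is only a plain-Python illustration of building up state along the depth dimension; it is not RegularGPT's attention mechanism, and the function names and the `chunk` parameter are hypothetical.

```python
from functools import reduce
from typing import List, Tuple


def parity_sequential(bits: List[int]) -> int:
    """Two-state automaton scan: the state flips on every 1."""
    state = 0
    for b in bits:
        state ^= b
    return state


def parity_log_depth(bits: List[int], chunk: int = 2) -> Tuple[int, int]:
    """Merge `chunk` neighbouring values per layer until one value remains.

    Returns (parity, number_of_layers). `chunk` is a hypothetical knob
    for how many partial results a single layer combines.
    """
    if chunk < 2:
        raise ValueError("chunk must be at least 2")
    level = list(bits) or [0]  # an empty string has even parity
    layers = 0
    while len(level) > 1:
        # XOR each group of `chunk` adjacent partial results.
        level = [
            reduce(lambda a, b: a ^ b, level[i:i + chunk])
            for i in range(0, len(level), chunk)
        ]
        layers += 1
    return level[0], layers


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 1, 0, 1]
    print(parity_sequential(bits))
    print(parity_log_depth(bits))
```

Both functions return the same parity for any input. For the 8-bit example, the reduction with `chunk=2` needs three layers (8 → 4 → 2 → 1), whereas the sequential scan takes eight steps.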

Related benchmarks

Task | Dataset | Result | Rank
Regular Language Recognition | Even Pairs | Accuracy 91.9 | 11
Regular Language Recognition | Cycle Navigation | Accuracy 99.9 | 11
Regular Language Recognition | Modular Arithmetic | Accuracy 99.1 | 11
Regular Language Recognition | Parity Check | Accuracy 99.8 | 11
Regular Language Recognition | Tomita Grammars 3, 4, 5, 6, 7 | Accuracy 92.2 | 3
Regular Language Recognition | Prefix Languages P1,2, P2,2, P4,2, P1,4, P2,4, P4,4 | Accuracy 95.3 | 3