Length Generalization with Log-Depth Recurrent Units
About
Length generalization remains a persistent challenge for neural networks: recurrent models tend to suffer from positional biases, while transformers are constrained by fixed computational depth. Regular languages provide a frequently used testbed for evaluating length generalization, as label prediction can be checked for any sequence length. We propose MLP-LDRU, a type of Log-Depth Recurrent Unit, which captures a class of associativity-biased operators designed to approximate recurrence through parallel reduction. We evaluate MLP-LDRU on 21 regular-language tasks, consisting of standard benchmarks and new prefix languages. It achieves 100% out-of-distribution accuracy on 18 tasks and, when the maximum training length is increased, at least 99.9% on the remaining 3, outperforming comparable recurrent and attention-based models. We further evaluate MLP-LDRU beyond regular languages on ListOps and NLP classification benchmarks, where it performs competitively.
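To make "recurrence through parallel reduction" concrete, the following is a minimal sketch of a log-depth reduction over a sequence. It is not the MLP-LDRU architecture: the learned operator is replaced here by a fixed associative function (XOR, which computes parity), and the names `tree_reduce` and `xor` are hypothetical helpers introduced only for illustration. The point is that when the combining operator is associative, a pairwise tree of depth about log2(n) gives the same answer as a left-to-right scan of n steps.

```python
from functools import reduce
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(
    items: List[T], combine: Callable[[T, T], T], identity: T
) -> Tuple[T, int]:
    """Combine adjacent pairs level by level; return (result, tree depth).

    Assumes `combine` is associative, so the pairing order does not
    change the result. Hypothetical helper, not the paper's method.
    """
    if not items:
        return identity, 0
    level = list(items)
    depth = 0
    while len(level) > 1:
        # All pairs at one level are independent and could run in parallel.
        nxt = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired last element is carried up unchanged
        level = nxt
        depth += 1
    return level[0], depth


def xor(a: int, b: int) -> int:
    """Associative stand-in for a learned operator: parity of two bits."""
    return a ^ b


bits = [1, 0, 1, 1, 0, 1, 0, 0, 1]
parity, depth = tree_reduce(bits, xor, 0)
sequential = reduce(xor, bits, 0)  # ordinary left-to-right recurrence
print(parity, depth, sequential)
```

For this 9-element input the tree has 4 levels, while the sequential scan takes 9 steps. A learned operator is only approximately associative, which is presumably why the abstract speaks of an associativity bias rather than a guarantee.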
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Classification | AG News (test) | Accuracy | 89 | 293 |
| Natural Language Understanding | GLUE | MRPC Score | 63.2 | 30 |
| Regular Language Recognition | Even Pairs | Accuracy | 100 | 11 |
| Regular Language Recognition | Modular Arithmetic | Accuracy | 100 | 11 |
| Regular Language Recognition | Cycle Navigation | Accuracy | 100 | 11 |
| Regular Language Recognition | Parity Check | Accuracy | 100 | 11 |
| List operations evaluation | ListOps (3, 9) (test) | Mean Accuracy | 74.7 | 7 |
| List operations evaluation | ListOps (3, 14) (test) | Mean Accuracy | 69.7 | 7 |
| List operations evaluation | ListOps (5, 9) (test) | Mean Accuracy | 45.9 | 7 |
| List operations evaluation | ListOps (5, 14) (test) | Mean Accuracy | 49 | 7 |