
mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

About

Multi-timescale sequence modeling relies on capturing both fast local dynamics and slow global context, yet maintaining these capabilities under the strict memory constraints common to edge devices remains an open challenge. Current state-of-the-art models with constant memory footprints trade off long-range selectivity against high-precision modeling of fast dynamics. To overcome this trade-off within a fixed memory budget, we propose mGRADE (minimally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that introduces inductive biases across timescales by combining a convolution with learnable temporal spacings and a lightweight gated recurrent component. We show theoretically that the learnable spacings are equivalent to a delay embedding, which enables parameter-efficient reconstruction of partially observed fast dynamics, while the gated recurrent component selectively maintains long-range context with minimal memory overhead. On the challenging Long-Range Arena benchmark and the 35-way Google Speech Commands raw-audio classification task, mGRADE reduces the memory footprint by up to a factor of 8 compared to other state-of-the-art models while maintaining competitive performance.
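The two components named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`delay_embed`, `delay_conv`, `minimal_gated_scan`), the fixed integer delays, the scalar weights, and the specific single-gate update rule are all assumptions chosen for illustration. In mGRADE the temporal spacings are learned, which this toy version does not attempt.

```python
import math
from typing import List, Sequence


def delay_embed(x: Sequence[float], delays: Sequence[int]) -> List[List[float]]:
    """Stack delayed copies of a scalar signal: row t is [x[t - d] for d in delays].

    Samples before the start of the signal are treated as 0.0 (causal zero padding).
    """
    return [[x[t - d] if t - d >= 0 else 0.0 for d in delays] for t in range(len(x))]


def delay_conv(x: Sequence[float], delays: Sequence[int], weights: Sequence[float]) -> List[float]:
    """Causal convolution with taps at the given delays: y[t] = sum_k w[k] * x[t - delays[k]]."""
    return [sum(w * v for w, v in zip(weights, row)) for row in delay_embed(x, delays)]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def minimal_gated_scan(u: Sequence[float], w_gate: float, b_gate: float, w_cand: float) -> List[float]:
    """Single-gate recurrence (an assumed form, not taken from the paper):

        z[t] = sigmoid(w_gate * u[t] + b_gate)
        h[t] = (1 - z[t]) * h[t-1] + z[t] * (w_cand * u[t])

    The whole recurrent state is one float, h.
    """
    h = 0.0
    out = []
    for u_t in u:
        z = sigmoid(w_gate * u_t + b_gate)
        h = (1.0 - z) * h + z * (w_cand * u_t)
        out.append(h)
    return out


# Toy usage: taps at delays 0 and 2 compute x[t] - x[t-2], then the gate smooths the result.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
fast = delay_conv(signal, delays=[0, 2], weights=[1.0, -1.0])
slow = minimal_gated_scan(fast, w_gate=0.0, b_gate=0.0, w_cand=1.0)
```

In a streaming setting, this sketch only needs to keep the last `max(delays)` input samples plus the single state value `h`, so its memory use does not grow with sequence length.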

Tristan Torchet, Christian Metzner, Karthik Charan Raghunathan, Jimmy Weber, Sebastian Billaudelle, Laura Kriener, Melika Payvand • 2025

Related benchmarks

Task | Dataset | Result | Rank
Classification | SHD (test) | Accuracy: 93.77 | 81
Mathematical logic sequence modeling | Long Range Arena (LRA) ListOps (test) | Accuracy: 61.9 | 12
Path detection | Long Range Arena (LRA) Pathfinder (test) | Accuracy: 94.9 | 12
Byte-level text classification | Long Range Arena (LRA) Text (test) | Accuracy: 87.3 | 12
Document Retrieval | Long Range Arena (LRA) Retrieval (test) | Accuracy: 88.1 | 12
Sequence-to-label image classification | Long Range Arena (LRA) Image (test) | Accuracy: 87.1 | 11
Audio Classification | GSC 35-way (test) | Causal Accuracy: 94.7 | 7
Sequence Modeling | LRA Pathfinder | Parameters (M): 3.04 | 7
Sequence Modeling | LRA ListOps | Parameters: 164.1 | 7
Sequence Modeling | LRA Text | Model Parameters (M): 0.1758 | 7
