Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling
About
Global convolutions have shown increasing promise as powerful general-purpose sequence models. However, training long convolutions is challenging, and kernel parameterizations must be able to learn long-range dependencies without overfitting. This work introduces reparameterized multi-resolution convolutions ($\texttt{MRConv}$), a novel approach to parameterizing global convolutional kernels for long-sequence modelling. By leveraging multi-resolution convolutions, incorporating structural reparameterization, and introducing learnable kernel decay, $\texttt{MRConv}$ learns expressive long-range kernels that perform well across various data modalities. Our experiments demonstrate state-of-the-art performance among convolutional models and linear-time transformers on the Long Range Arena, Sequential CIFAR, and Speech Commands tasks. Moreover, we report improved performance on ImageNet classification by replacing 2D convolutions with 1D $\texttt{MRConv}$ layers.
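The abstract's core idea can be illustrated with a minimal sketch: build a global kernel by combining sub-kernels defined at several resolutions, each modulated by a learnable exponential decay, then apply the kernel with an FFT convolution in $O(L \log L)$. All function names, the nearest-neighbour upsampling, and the decay form below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mrconv_kernel(sub_kernels, seq_len, decay_rates):
    """Sum sub-kernels at multiple resolutions into one global kernel.

    Each short sub-kernel is upsampled to the full sequence length and
    damped by a learnable exponential decay, which discourages the
    kernel from overfitting to distant positions. (Illustrative only;
    the paper's parameterization and reparameterization may differ.)
    """
    t = np.arange(seq_len)
    kernel = np.zeros(seq_len)
    for k, alpha in zip(sub_kernels, decay_rates):
        # Nearest-neighbour upsampling of the short kernel (assumption:
        # len(k) divides seq_len; a smoother scheme could be used).
        up = np.repeat(k, seq_len // len(k))[:seq_len]
        kernel += up * np.exp(-alpha * t)
    return kernel

def global_conv(x, kernel):
    """Causal global convolution via FFT in O(L log L)."""
    L = len(x)
    n = 2 * L  # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(kernel, n), n)
    return y[:L]

# Example: two resolutions (coarse length-4 and finer length-16 kernels)
x = np.random.default_rng(0).standard_normal(64)
kernel = mrconv_kernel([np.ones(4), np.ones(16)], 64, [0.1, 0.01])
y = global_conv(x, kernel)
```

In practice the decay rates (and the sub-kernels themselves) would be trained end-to-end; at inference, the multi-branch sum can be collapsed into a single kernel, which is the usual motivation for structural reparameterization.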
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet (test) | Top-1 Accuracy | 83.9 | 291 |
| Long-range sequence modeling | Long Range Arena (LRA) (test) | Accuracy (Avg) | 88.2 | 158 |
| 35-way Speech Classification | Speech Commands 16kHz 35-way (test) | Accuracy | 96.82 | 32 |
| 35-way Speech Classification | Speech Commands 8kHz 35-way (test) | Accuracy | 95.05 | 28 |
| 1D Image Classification | sCIFAR 1.0 (test) | Accuracy | 94.26 | 18 |
| Image Classification | sCIFAR (test) | Accuracy | 94.26 | 15 |