Building Blocks for a Complex-Valued Transformer Architecture
About
Most deep learning pipelines are built on real-valued operations to deal with real-valued inputs such as images, speech, or music signals. However, many applications naturally make use of complex-valued signals or images, such as MRI or remote sensing. Additionally, the Fourier transform of a signal is complex-valued and has numerous applications. We aim to make deep learning directly applicable to these complex-valued signals without using projections into $\mathbb{R}^2$. We thus add to the recent developments in complex-valued neural networks by presenting building blocks that transfer the transformer architecture to the complex domain. We present multiple versions of a complex-valued Scaled Dot-Product Attention mechanism as well as a complex-valued layer normalization. We test on a classification task and a sequence generation task on the MusicNet dataset and show improved robustness to overfitting while maintaining on-par performance compared to the real-valued transformer architecture.
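To make the idea of complex-valued Scaled Dot-Product Attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract mentions several versions of the mechanism without specifying them, so the scoring rule used here is an assumption: the real part of the Hermitian inner product $\mathrm{Re}(\langle q, k \rangle)/\sqrt{d}$, passed through an ordinary softmax. The function names `softmax` and `complex_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of real numbers."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def complex_attention(Q, K, V):
    """Scaled dot-product attention on complex-valued inputs.

    Q: list of query vectors (each a list of d complex numbers)
    K: list of key vectors (same dimension d as the queries)
    V: list of value vectors, one per key

    Assumed scoring rule (one possible variant, not taken from the paper):
    score(q, k) = Re(sum_i q_i * conj(k_i)) / sqrt(d)
    """
    d = len(Q[0])
    dv = len(V[0])
    outputs = []
    for q in Q:
        # Real-valued similarity derived from the Hermitian inner product.
        scores = [
            sum(qi * ki.conjugate() for qi, ki in zip(q, k)).real / math.sqrt(d)
            for k in K
        ]
        # Real attention weights that sum to 1.
        weights = softmax(scores)
        # Weighted sum of the complex values; the output stays complex.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(dv)]
        )
    return outputs


# Usage: one query, two keys, one-dimensional complex values.
Q = [[1 + 1j, 0j]]
K = [[1 + 1j, 0j], [-1 - 1j, 0j]]
V = [[1j], [2 + 0j]]
result = complex_attention(Q, K, V)
```

Because the weights are real and sum to one, each output is a convex combination of the complex value vectors. In this example the first key is aligned with the query and the second points in the opposite direction, so the output lies closer to the first value. Variants that keep the scores complex would need a different normalization than the standard softmax.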
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Modulation Classification | RadioML RML2016 mirror (test) | L1 Error | 0.244 | 6 |
| Music Modeling | Real MusicNet | L1 Error | 0.201 | 6 |
| Image Classification | FFT-MNIST | Accuracy | 39 | 6 |
| Long Range Arena ListOps | LRA-ListOps small | Accuracy (LRA-ListOps small) | 63.7 | 6 |
| Pitch Estimation | multi-pitch | Accuracy | 82 | 6 |
| Radio Modulation Classification | RadioML L2 | Accuracy | 27 | 6 |
| Copying Task | Copy d=500 | Accuracy | 10 | 6 |
| Copying Task | Copy d=2000 | Accuracy | 8 | 6 |
| Logical operations parsing | ListOps mid L1024 | Accuracy | 10.4 | 6 |
| Memory retention task | phase-memory | Accuracy | 93 | 6 |