Selective Rotary Position Embedding
About
Position information is essential for language modeling. In softmax transformers, Rotary Position Embeddings (RoPE) encode positions through fixed-angle rotations, while in linear transformers, order is handled via input-dependent (selective) gating that decays past key-value associations. Selectivity has generally been shown to improve performance on language-related tasks. Inspired by this, we introduce Selective RoPE, an input-dependent rotary embedding mechanism that generalizes RoPE and enables rotation by arbitrary angles for both linear and softmax transformers. We show that softmax attention already performs a hidden form of these rotations on query-key pairs, uncovering an implicit positional structure. We further show that in state-space models and gated linear transformers, the real part of the state transition manages forgetting while the imaginary part encodes positions through rotations. We validate our method by equipping gated transformers with Selective RoPE, demonstrating that its input-dependent rotations improve performance in language modeling and on difficult sequence tasks like copying, state tracking, and retrieval.
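The contrast between fixed-angle and input-dependent rotations can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the parameterization, so the helper names (`rope_angles`, `selective_angles`, `rotate`, `score`) and the choice to accumulate per-token rotation rates with a running sum are assumptions made purely for illustration. Each feature pair is treated as one complex number, so a rotation is multiplication by a unit-modulus complex exponential.

```python
import cmath

def rope_angles(seq_len, n_pairs, base=10000.0):
    """Fixed RoPE angles: position t times a fixed frequency per feature pair."""
    freqs = [base ** (-i / n_pairs) for i in range(n_pairs)]
    return [[t * f for f in freqs] for t in range(seq_len)]

def selective_angles(omegas):
    """Input-dependent angles as a running sum of per-token rotation rates.

    omegas[t][i] stands in for a rate computed from token t; how that rate
    is produced is NOT specified here (hypothetical placeholder).
    """
    angles = []
    running = [0.0] * len(omegas[0])
    for row in omegas:
        running = [a + w for a, w in zip(running, row)]
        angles.append(running)
    return angles

def rotate(vecs, angles):
    """Rotate each complex feature (one 2-D pair) by its own angle."""
    return [
        [z * cmath.exp(1j * a) for z, a in zip(vec, ang)]
        for vec, ang in zip(vecs, angles)
    ]

def score(q, k):
    """Real part of the Hermitian inner product, i.e. the dot product of the 2-D pairs."""
    return sum((a * b.conjugate()).real for a, b in zip(q, k))
```

Under these assumptions, feeding `selective_angles` the same constant rate at every token gives query-key scores identical to those from `rope_angles`, because a score depends only on the difference between the two accumulated angles. Letting the rates vary per token is what would make the rotation selective.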
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | LAMBADA | Accuracy | 53.8 | 412 |
| Multiple-choice Question Answering | ARC Easy | Accuracy | 59.3 | 257 |
| Multiple-choice Question Answering | HellaSwag | Accuracy | 56.9 | 196 |
| Multiple-choice Question Answering | PIQA | Accuracy | 73.1 | 63 |
| Multiple-choice Question Answering | ARC Challenge | Non-generative Accuracy | 28.8 | 48 |
| Multiple-choice Question Answering | WinoG | Accuracy | 56 | 48 |
| Language Modeling | WikiText v1 (test) | Perplexity | 17.87 | 30 |