# MrRoPE: Mixed-radix Rotary Position Embedding

## About
RoPE extension refers to modifying or generalizing the Rotary Position Embedding (RoPE) scheme to handle sequences longer than those encountered during pre-training. Current extension strategies, however, are highly diverse and lack a unified theoretical foundation. In this paper, we propose MrRoPE (Mixed-radix RoPE), a generalized encoding formulation based on a radix-system-conversion perspective, which elegantly unifies various RoPE-extension approaches as distinct radix conversion strategies. Building on this theory, we introduce two training-free extensions, MrRoPE-Uni and MrRoPE-Pro, which leverage uniform and progressive radix conversion strategies, respectively, to achieve "train short, test long" generalization. Without fine-tuning, MrRoPE-Pro sustains over 85% recall in the 128K-context Needle-in-a-Haystack test and achieves more than double YaRN's accuracy on the Infinite-Bench retrieval and dialogue subsets. Theoretical analysis confirms that MrRoPE-Pro effectively raises the upper bound of RoPE's attainable encoding length, further validating the reliability and utility of our theory and methodology.
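To make the radix-conversion perspective concrete, the sketch below shows a plain mixed-radix decomposition of a position index. This is an illustrative analogy, not the paper's exact formulation: under this view, each RoPE frequency band can be thought of as reading one "digit" of the position, and a RoPE-extension strategy corresponds to choosing different per-band radices. The radix values here are hypothetical.

```python
def to_mixed_radix(n, radices):
    """Decompose n into digits under the given radices (least significant first)."""
    digits = []
    for r in radices:
        digits.append(n % r)
        n //= r
    return digits

def from_mixed_radix(digits, radices):
    """Inverse conversion: reassemble the integer from its mixed-radix digits."""
    n, place = 0, 1
    for d, r in zip(digits, radices):
        n += d * place
        place *= r
    return n

# Hypothetical per-band radices; their product (16 * 16 * 8 * 8 = 16384)
# bounds the number of distinct positions this digit system can represent.
radices = [16, 16, 8, 8]
pos = 5000
digits = to_mixed_radix(pos, radices)
print(digits)                                    # [8, 8, 3, 2]
assert from_mixed_radix(digits, radices) == pos  # round-trip check
```

In this analogy, "uniform" and "progressive" conversion strategies would correspond to different rules for enlarging the radices so that positions beyond the training range still fall within the representable product.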
## Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-context understanding | LongBench v2 | -- | 37 |
| Language modeling | Arxiv Proof-pile | -- | 32 |
| Long-context retrieval | RULER | Retrieval Accuracy (8K): 96.2 | 17 |
| Language modeling | Proofpile (test) | Performance (8K context): 3.66 | 12 |
| Retrieval | RULER (128K context) | -- | 12 |
| Retrieval | RULER (64K context) | -- | 4 |
| Retrieval | RULER (8K context) | Retrieval Score: 82.3 | 2 |
| Retrieval | RULER (16K context) | Retrieval Score: 82.9 | 2 |
| Retrieval | RULER (32K context) | Retrieval Score: 78.5 | 2 |