
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

About

So far, expensive finetuning beyond the pretraining sequence length has been a requirement for effectively extending the context of language models (LMs). In this work, we break this key bottleneck by Dropping the Positional Embeddings of LMs after training (DroPE). Our simple method is motivated by three key theoretical and empirical observations. First, positional embeddings (PEs) serve a crucial role during pretraining, providing an important inductive bias that significantly facilitates convergence. Second, over-reliance on this explicit positional information is also precisely what prevents test-time generalization to sequences of unseen length, even when using popular PE-scaling methods. Third, positional embeddings are not an inherent requirement of effective language modeling and can be safely removed after pretraining, following a short recalibration phase. Empirically, DroPE yields seamless zero-shot context extension without any long-context finetuning, quickly adapting pretrained LMs without compromising their capabilities in the original training context. Our findings hold across different models and dataset sizes, far outperforming previous specialized architectures and established rotary positional embedding scaling methods.
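The mechanical change DroPE describes can be sketched in a few lines: with rotary positional embeddings (RoPE), queries and keys are rotated by position-dependent angles before attention; dropping the embeddings simply skips that rotation so attention scores depend only on content. The sketch below is illustrative, not the authors' implementation; function names are hypothetical, and it omits the short recalibration phase the paper says is needed after removal.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to queries/keys of shape (seq, dim)."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)          # per-pair rotation frequencies
    angles = np.arange(seq)[:, None] * freqs[None, :]  # (seq, half): angle grows with position
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def causal_attention(q, k, v, use_rope=True):
    """Single-head causal attention. With use_rope=False (DroPE-style),
    the position-dependent rotation is skipped entirely."""
    if use_rope:
        q, k = rope(q), rope(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # mask future positions
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because `rope` is applied at a fixed set of positions during pretraining, removing it (rather than rescaling its frequencies) is what, per the abstract, lets the model run on sequences of unseen length without finetuning; positional information then comes only from the causal mask and learned content features.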

Yoav Gelberg, Koshi Eguchi, Takuya Akiba, Edoardo Cetin • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-context Language Understanding | LongBench (test) | Average Score: 13.81 | 133 |
| Long-context language modeling | LongBench MultiFieldQA, MuSiQue, GovReport 2023 (test) | MultiFieldQA Score: 32.18 | 8 |
| Needle-In-A-Haystack Retrieval | RULER (test) | Multi-Query Success Rate: 2.80e+3 | 8 |
| Long-context Reasoning | LongBench and Needle-In-A-Haystack (NIAH) (test) | MultiFieldQA Score: 29.33 | 5 |
| Needle-in-a-Haystack | Needle-in-a-haystack, 2x original context | Accuracy (2x Context): 74.92 | 4 |
| Needle-in-a-Haystack | Needle-in-a-haystack, 4x original context | Accuracy: 55 | 4 |
| Needle-in-a-Haystack | Needle-in-a-haystack, 8x original context | Accuracy: 52.2 | 4 |
