
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

About

So far, expensive finetuning beyond the pretraining sequence length has been a requirement for effectively extending the context of language models (LMs). In this work, we break this key bottleneck by Dropping the Positional Embeddings of LMs after training (DroPE). Our simple method is motivated by three key theoretical and empirical observations. First, positional embeddings (PEs) serve a crucial role during pretraining, providing an important inductive bias that significantly facilitates convergence. Second, over-reliance on this explicit positional information is also precisely what prevents test-time generalization to sequences of unseen length, even when using popular PE-scaling methods. Third, positional embeddings are not an inherent requirement of effective language modeling and can be safely removed after pretraining, following a short recalibration phase. Empirically, DroPE yields seamless zero-shot context extension without any long-context finetuning, quickly adapting pretrained LMs without compromising their capabilities in the original training context. Our findings hold across different models and dataset sizes, far outperforming previous specialized architectures and established rotary positional embedding scaling methods.
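The mechanical change DroPE describes can be sketched in a few lines: with rotary positional embeddings (RoPE), queries and keys are rotated by position-dependent angles before attention; dropping the embeddings simply skips that rotation so attention scores depend only on content. The sketch below is illustrative, not the authors' implementation; function names are hypothetical, and it omits the short recalibration phase the paper says is needed after removal.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to queries/keys of shape (seq, dim)."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)          # per-pair rotation frequencies
    angles = np.arange(seq)[:, None] * freqs[None, :]  # (seq, half): angle grows with position
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def causal_attention(q, k, v, use_rope=True):
    """Single-head causal attention. With use_rope=False (DroPE-style),
    the position-dependent rotation is skipped entirely."""
    if use_rope:
        q, k = rope(q), rope(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # mask future positions
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because `rope` is applied at a fixed set of positions during pretraining, removing it (rather than rescaling its frequencies) is what, per the abstract, lets the model run on sequences of unseen length without finetuning; positional information then comes only from the causal mask and learned content features.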

Yoav Gelberg, Koshi Eguchi, Takuya Akiba, Edoardo Cetin • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-context Language Understanding | LongBench (test) | Average Score: 13.81 | 133 |
| Long-context language modeling | LongBench MultiFieldQA, MuSiQue, GovReport 2023 (test) | MultiFieldQA Score: 32.18 | 8 |
| Needle-In-A-Haystack Retrieval | RULER (test) | Multi-Query Success Rate: 2.80e+3 | 8 |
| Long-context Reasoning | LongBench and Needle-In-A-Haystack (NIAH) (test) | MultiFieldQA Score: 29.33 | 5 |
| Needle-in-a-Haystack | Needle-in-a-haystack, 2x original context | Accuracy (2x Context): 74.92 | 4 |
| Needle-in-a-Haystack | Needle-in-a-haystack, 4x original context | Accuracy: 55 | 4 |
| Needle-in-a-Haystack | Needle-in-a-haystack, 8x original context | Accuracy: 52.2 | 4 |
