
CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs

About

Rotary Positional Embedding (RoPE) is a key component of context scaling in Large Language Models (LLMs). While various methods have been proposed to adapt RoPE to longer contexts, their guiding principles generally fall into two categories: (1) out-of-distribution (OOD) mitigation, which scales RoPE frequencies to accommodate unseen positions, and (2) semantic modeling, which posits that the attention scores computed with RoPE should always prioritize semantically similar tokens. In this work, we unify these seemingly distinct objectives through a minimalist intervention, namely CoPE: soft clipping low-frequency components of RoPE. CoPE not only eliminates OOD outliers and refines semantic signals, but also prevents spectral leakage caused by hard clipping. Extensive experiments demonstrate that simply applying our soft clipping strategy to RoPE yields significant performance gains that scale up to 256k context length, validating our theoretical analysis and establishing CoPE as a new state-of-the-art for length generalization. Our code, data, and models are available at https://github.com/hrlics/CoPE.
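The soft-clipping idea described in the abstract can be illustrated with a minimal sketch. The paper's exact clipping function is not reproduced here; this sketch assumes a softplus-based lower bound applied to the standard RoPE inverse frequencies (`soft_clip`, `floor`, and `sharpness` are hypothetical names and values chosen for illustration, not the authors' implementation).

```python
import numpy as np

def rope_inv_freq(dim, base=10000.0):
    # Standard RoPE inverse frequencies: base^(-2i/dim) for i = 0, 2, ..., dim-2.
    # The smallest values are the low-frequency (slowly rotating) components.
    return base ** (-np.arange(0, dim, 2) / dim)

def soft_clip(freqs, floor, sharpness=20.0):
    # Hypothetical soft lower clip: a softplus smoothly lifts frequencies
    # below `floor` up toward it while leaving higher frequencies essentially
    # unchanged. Unlike a hard clip (np.maximum(freqs, floor)), the transition
    # has no kink, which is the kind of discontinuity hard clipping introduces.
    beta = sharpness / floor
    return floor + np.logaddexp(0.0, beta * (freqs - floor)) / beta

inv_freq = rope_inv_freq(128)              # head dim 128 -> 64 frequencies
clipped = soft_clip(inv_freq, floor=1e-3)  # floor value is illustrative only
```

After clipping, every component rotates at least as fast as `floor`, removing the very-low-frequency components, while the high-frequency components pass through almost untouched.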

Haoran Li, Sucheng Ren, Alan Yuille, Feng Wang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Language Understanding | MMLU | Accuracy: 62.37 | 756 |
| Reasoning | BBH | Accuracy: 64.51 | 507 |
| Question Answering | GPQA | Accuracy: 29.31 | 258 |
| Language Understanding | MMLU-Pro | Accuracy: 34.05 | 70 |
| Long-context language modeling | HELMET | Summarization Score: 32.78 | 27 |
| Math Find | InfiniteBench | Performance (8k Context): 35.43 | 3 |
| KV | InfiniteBench | KV Retrieval Score (8k): 6.2 | 3 |
| MK | RULER | Performance @ 8k Context: 100 | 3 |
| NIAH | RULER | NIAH Score (8k Context): 100 | 3 |
| Long-context language modeling | RULER | Accuracy (8k Context): 81.5 | 2 |

Other info

GitHub
