Learning Perturbations to Extrapolate Your LLM

About

Recent advances in large language models (LLMs) show that injecting perturbations can substantially improve extrapolation performance. However, current approaches often rely on discrete perturbations with fixed designs, which limits their flexibility. In this work, we propose a framework in which token prefixes are perturbed by a learnable transformation of a continuous latent vector in the embedding space. Because the resulting marginal likelihood is intractable, we derive unbiased estimating equations for the model parameters and solve them via stochastic gradient descent. We establish the statistical properties of the resulting estimator in over-parameterized regimes. Empirical evaluations on both synthetic and real-world datasets show that our approach yields significant gains in out-of-domain settings over a range of state-of-the-art baselines.
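
The abstract describes perturbing prefix embeddings with a learnable function of a latent vector and notes that the marginal likelihood over that latent is intractable. Below is a minimal NumPy sketch of those two ideas under loudly stated assumptions: the affine map g(z) = W z + b, the single shift shared by every prefix position, the standard normal prior on z, and the mean-pool-and-project "model" (`toy_next_token_probs`, with readout matrix `U`) are all hypothetical stand-ins, not the authors' architecture. It does not implement the paper's unbiased estimating equations or its SGD training; it only illustrates why a Monte Carlo average is needed to evaluate p(y | prefix).

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def perturb_prefix(prefix_emb, z, W, b):
    """Shift every prefix embedding by g(z) = W @ z + b.

    ASSUMPTION (hypothetical): one affine map whose output is shared across
    all positions; the paper's learnable transformation may differ.
    Shapes: prefix_emb (L, d), z (k,), W (d, k), b (d,).
    """
    return prefix_emb + (W @ z + b)  # the (d,) shift broadcasts over the L rows


def toy_next_token_probs(prefix_emb, U):
    """Hypothetical stand-in for an LLM: mean-pool the prefix, project to a vocab."""
    return softmax(prefix_emb.mean(axis=0) @ U)  # (V,) next-token distribution


def mc_marginal_likelihood(prefix_emb, y, W, b, U, num_samples, rng):
    """Monte Carlo estimate of p(y | prefix) = E_z[p(y | perturbed prefix)].

    Averaging p(y | ., z) over z ~ N(0, I_k) is unbiased for the likelihood
    itself; taking the log of this average is a biased estimate of log p.
    """
    k = W.shape[1]
    total = 0.0
    for _ in range(num_samples):
        z = rng.standard_normal(k)  # ASSUMPTION: standard normal latent prior
        probs = toy_next_token_probs(perturb_prefix(prefix_emb, z, W, b), U)
        total += probs[y]
    return total / num_samples


# Usage with random toy parameters (all sizes are arbitrary).
rng = np.random.default_rng(0)
L, d, k, V = 4, 8, 3, 10
prefix_emb = rng.standard_normal((L, d))
W = 0.1 * rng.standard_normal((d, k))
b = np.zeros(d)
U = rng.standard_normal((d, V))
p_hat = mc_marginal_likelihood(prefix_emb, 2, W, b, U, num_samples=1000, rng=rng)
print(p_hat)
```

Averaging probabilities rather than log-probabilities keeps the estimator unbiased for p(y | prefix); by Jensen's inequality, the log of this average underestimates log p(y | prefix) in expectation, so a naive plug-in log-likelihood is biased.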

Zetai Cen, Chenfei Gu, Jin Zhu, Ting Li, Yunxiao Chen, Chengchun Shi • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Language Modeling | WritingPrompts | MAUVE | 32 | 33
Language Modeling | WebText | MAUVE | 0.68 | 33
Language Modeling | WikiText-2 | MAUVE | 0.68 | 33
Language Modeling | CODEPARROT | Perplexity (PPL) | 22 | 19
Language Modeling | WikiText-103 | Perplexity (PPL) | 72 | 15
Language Modeling | GermanQuAD | Perplexity (PPL) | 117 | 15
Text Generation | WritingPrompts | ROUGE-1 | 34.6 | 15
Text Generation | CODEPARROT | ROUGE-1 | 49.5 | 15
Text Generation | WebText | ROUGE-1 | 37.5 | 15
Text Generation | GermanQuAD | ROUGE-1 | 36 | 15