A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

About

Transformer-based Large Language Models (LLMs) struggle with inputs exceeding their training context window due to positional out-of-distribution (O.O.D.) issues that disrupt attention. Existing solutions, including fine-tuning and training-free methods, face challenges such as inefficiency, redundant interpolation, logit outliers, or loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logits to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on short-context tasks. GALI represents a step toward more robust and generalizable long-text processing in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/adlnlp/Gali.

Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han • 2025
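
The abstract above mentions interpolating attention logits but gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' implementation (which lives in the linked repository). It assumes rotary position embeddings (RoPE) and plain linear interpolation between the logits at the two neighbouring integer relative positions; the function names `rope_rotate`, `rope_logit`, and `interpolated_logit` are hypothetical.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of an even-length vector by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)  # angle for this pair of dimensions
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_logit(q, k, rel_pos):
    """Scaled dot-product logit when the query sits rel_pos tokens after the key."""
    q_rot = rope_rotate(q, rel_pos)
    k_rot = rope_rotate(k, 0)
    return sum(a * b for a, b in zip(q_rot, k_rot)) / math.sqrt(len(q))


def interpolated_logit(q, k, rel_pos):
    """Assumed scheme: linearly interpolate the logit at a fractional relative
    position from the logits at the two neighbouring integer positions."""
    lo, hi = math.floor(rel_pos), math.ceil(rel_pos)
    if lo == hi:  # integer position: nothing to interpolate
        return rope_logit(q, k, lo)
    w = rel_pos - lo
    return (1 - w) * rope_logit(q, k, lo) + w * rope_logit(q, k, hi)


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5]
    k = [1.0, 0.4, -0.6, 0.9]
    print(interpolated_logit(q, k, 2.5))
```

Because the interpolated value is a convex combination of two logits computed at integer positions, it cannot fall outside the range those two logits span, which is one simple way to read the abstract's claim about avoiding logit outliers.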

Related benchmarks

Task                                 Dataset               Metric      Result  Rank
Long-context Language Understanding  LongBench             M-Avg       46.22   219
Language Modeling                    PG-19 (test)          Perplexity  11.05   106
Language Modeling                    PG-19                 Perplexity  8.81    96
Long-context Language Understanding  L-Eval                Coursera    56.54   26
Long-context Language Understanding  L-Eval (test)         Coursera    54.65   26
Long-context Language Understanding  LongBench 1.0 (test)  MultiNews   23.37   21
