YaRN: Efficient Context Window Extension of Large Language Models

About

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing the previous state-of-the-art at context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset. Code is available at https://github.com/jquesnelle/yarn
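
The abstract does not spell out the mechanism, so the following is only a rough illustration of how a RoPE context extension of this kind can work, not the authors' implementation (the linked repository is the reference for that). It assumes a per-dimension blend: RoPE dimensions that complete many rotations within the original context keep their frequency, while slowly rotating dimensions are interpolated by the extension factor. The function names, the thresholds `alpha=1` and `beta=32`, and the `0.1 * ln(s) + 1` attention factor are assumptions recalled from the full paper rather than taken from this page, and should be checked against it.

```python
import math


def rope_inv_freq(head_dim, base=10000.0):
    """Standard RoPE angular frequencies, one per pair of dimensions."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def yarn_style_inv_freq(head_dim, orig_ctx, scale,
                        base=10000.0, alpha=1.0, beta=32.0):
    """Blend interpolated and original frequencies per dimension.

    Hypothetical helper: alpha and beta are assumed thresholds on the
    number of full rotations a dimension makes over the original context.
    """
    blended = []
    for theta in rope_inv_freq(head_dim, base):
        wavelength = 2.0 * math.pi / theta
        rotations = orig_ctx / wavelength
        # ramp = 0: fully interpolate (divide by scale); ramp = 1: keep as-is.
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        blended.append((1.0 - ramp) * theta / scale + ramp * theta)
    return blended


def attention_scale(scale):
    """Assumed multiplier applied to query and key rotary embeddings."""
    return 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    # Example: a 4096-token model extended by a factor of 16.
    original = rope_inv_freq(128)
    extended = yarn_style_inv_freq(128, orig_ctx=4096, scale=16.0)
    print(extended[0] / original[0], extended[-1] / original[-1])
    print(attention_scale(16.0))
```

With these assumed settings, the fastest-rotating dimension is left unchanged and the slowest is divided by the full extension factor, with a linear ramp in between.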

Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy 52.46 | 1896 |
| Text-to-Image Generation | DPG-Bench | Overall Score 78.7 | 451 |
| Long-context Language Understanding | LongBench | M-Avg 43.61 | 294 |
| Text-to-Image Generation | GenEval | Overall Score 0.65 | 277 |
| Commonsense Reasoning | ARC-C | Accuracy 29.1 | 215 |
| Language Modeling | PG-19 | Perplexity 6.62 | 206 |
| Long-context Language Understanding | LongBench (test) | Average Score 13.07 | 147 |
| Long-context Understanding | LongBench v2 | -- | 133 |
| Language Modeling | PG-19 (test) | Perplexity 11.06 | 112 |
| Common Sense Reasoning | PIQA | Accuracy 72.2 | 100 |
Showing 10 of 72 rows