
LongAlign: A Recipe for Long Context Alignment of Large Language Models

About

Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign -- a recipe covering the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure data diversity, it covers a broad range of tasks from various long context sources. Second, we adopt packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions. Additionally, we develop a loss weighting method to balance the contribution to the loss across different sequences during training with packing. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries of 10k-100k in length. Experiments show that LongAlign outperforms existing recipes for LLMs in long context tasks by up to 30%, while also maintaining their proficiency in handling short, generic tasks. The code, data, and long-aligned models are open-sourced at https://github.com/THUDM/LongAlign.

Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li• 2024
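The loss weighting idea from the abstract can be illustrated with a minimal sketch: when several sequences are packed into one training sample, a plain token-average loss lets long sequences dominate, so each sequence's tokens are instead down-weighted by the sequence's length and by the number of sequences in the batch. The function below is a hedged illustration of that balancing principle, not the paper's exact implementation; the names `packed_loss`, `token_losses`, and `seq_ids` are illustrative.

```python
from collections import Counter

def packed_loss(token_losses, seq_ids, num_seqs_in_batch):
    """Combine per-token losses from a packed sample so that every
    sequence contributes equally to the batch loss, regardless of
    its length.

    token_losses      -- per-token loss values for one packed sample
    seq_ids           -- for each token, the id of its source sequence
    num_seqs_in_batch -- total number of sequences packed across the batch
    """
    # number of loss-bearing tokens in each packed sequence
    tokens_per_seq = Counter(seq_ids)
    total = 0.0
    for loss, sid in zip(token_losses, seq_ids):
        # each sequence gets total weight 1 / num_seqs_in_batch,
        # split evenly across its own tokens
        total += loss / (tokens_per_seq[sid] * num_seqs_in_batch)
    return total

# Two packed sequences of lengths 2 and 3: the result equals the mean
# of the per-sequence mean losses (1.0 and 2.0), i.e. 1.5.
print(packed_loss([1.0, 1.0, 2.0, 2.0, 2.0], [0, 0, 1, 1, 1], 2))
```

With a naive token average the same sample would give (1+1+2+2+2)/5 = 1.6, biased toward the longer sequence; the weighting restores the per-sequence average of 1.5.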

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Long-context Understanding | LongBench (test) | Avg Score | 27.5 | 136
Long-context Understanding | LongBench | Overall Average Score | 56.6 | 115
Long-context Question Answering | MFQA en | SubEM | 27.33 | 36
Long-context Question Answering | En.QA | SubEM | 32.19 | 36
Long-context Question Answering | NarrativeQA | SubEM | 18.5 | 36
Long-context Question Answering | 2WikiMQA | SubEM | 69 | 36
Long-context Understanding | MuSiQue | SubEM | 37.5 | 27
Long-context Question Answering | MuSiQue | F1 Score | 28.76 | 19
Long-context Understanding | Average Overall | SubEM | 33.99 | 18
Multi-Task | LongBench-Chat | Point-wise Rate | 69.8 | 10

(10 of 19 rows shown)
