E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
About
Processing long contexts is increasingly important for Large Language Models (LLMs) in tasks like multi-turn dialogues, code generation, and document summarization. This paper addresses the challenges of achieving high long-context performance, low computational complexity, and compatibility with pretrained models -- collectively termed the "impossible triangle". We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. E2LLM divides long contexts into chunks, compresses each into soft prompts using a pretrained text encoder, and aligns these representations with a decoder-only LLM via an adapter. To enhance the LLM's reasoning with these soft prompts, we employ two training objectives: encoder output reconstruction and long-context instruction fine-tuning. Extensive experiments reveal that E2LLM not only outperforms 8 state-of-the-art (SOTA) methods in effectiveness and efficiency for document summarization and question answering, but also achieves the best performance on LongBench v2 among models of comparable size.
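The pipeline described above (chunk the long context, compress each chunk with a pretrained encoder, and project the result into the LLM's embedding space via an adapter) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `MockEncoder`, `Adapter`, and all dimensions (`d_enc=768`, `d_llm=4096`, `chunk_size=512`) are hypothetical placeholders standing in for the actual pretrained encoder and trained adapter.

```python
import numpy as np

def chunk_tokens(tokens, chunk_size):
    """Split a long token sequence into fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

class MockEncoder:
    """Stand-in for a pretrained text encoder (e.g. a BERT-style model)."""
    def __init__(self, d_enc, seed=0):
        self.d_enc = d_enc
        self.rng = np.random.default_rng(seed)

    def __call__(self, chunk):
        # Returns one pooled vector per chunk -- the source of a "soft prompt".
        return self.rng.standard_normal(self.d_enc)

class Adapter:
    """Linear projection aligning encoder outputs with the LLM embedding space."""
    def __init__(self, d_enc, d_llm, seed=1):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_enc, d_llm)) / np.sqrt(d_enc)

    def __call__(self, h):
        return h @ self.W

def build_soft_prompts(tokens, chunk_size, encoder, adapter):
    """Compress a long context into a short sequence of soft-prompt vectors."""
    chunks = chunk_tokens(tokens, chunk_size)
    return np.stack([adapter(encoder(c)) for c in chunks])

# A "long context" of 10k token ids becomes 20 soft prompts of LLM width,
# which would be prepended to the decoder's input embeddings.
tokens = list(range(10_000))
soft = build_soft_prompts(tokens, chunk_size=512,
                          encoder=MockEncoder(d_enc=768),
                          adapter=Adapter(d_enc=768, d_llm=4096))
print(soft.shape)  # (20, 4096)
```

The key efficiency point is visible in the shapes: the decoder attends over 20 soft-prompt vectors instead of 10,000 token embeddings, which is what keeps the computational cost low.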
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Long-context Language Understanding | RULER (32k context length) | VT Score | 8.6 | 33 |
| Long-context Language Understanding | RULER (16k context length) | -- | -- | 16 |
| Multiple-choice Question Answering | LongBench v2 (val) | Overall Accuracy | 31.8 | 15 |
| Long-context Language Understanding | RULER (4k context length) | VT Score | 7.2 | 10 |
| Document Summarization | QMSum | G-mean | 15.47 | 9 |
| Document Summarization | GovReport | G-mean | 18.43 | 9 |
| Long-context Language Understanding | RULER (64k context length) | QA Score | 34.5 | 9 |
| Question Answering | TriviaQA | F1 Score | 38.57 | 8 |
| Long-context Language Understanding | RULER (8k context length) | NIAH | 60.66 | 7 |
| Long-context Language Understanding | RULER (128k context length) | NIAH | 48.38 | 5 |