Glyph: Scaling Context Windows via Visual-Text Compression
About
Large language models (LLMs) increasingly rely on long-context modeling for tasks such as document understanding, code analysis, and multi-step reasoning. However, scaling context windows to the million-token level brings prohibitive computational and memory costs, limiting the practicality of long-context LLMs. In this work, we take a different perspective, visual context scaling, to tackle this challenge. Instead of extending token-based sequences, we propose Glyph, a framework that renders long texts into images and processes them with vision-language models (VLMs). This approach substantially compresses textual input while preserving semantic information, and we further design an LLM-driven genetic search to identify optimal visual rendering configurations for balancing accuracy and compression. Through extensive experiments, we demonstrate that our method achieves 3-4x token compression while maintaining accuracy comparable to leading LLMs such as Qwen3-8B on various long-context benchmarks. This compression also leads to around 4x faster prefilling and decoding, and approximately 2x faster SFT training. Furthermore, under extreme compression, a 128K-context VLM could scale to handle 1M-token-level text tasks. In addition, the rendered text data benefits real-world multimodal tasks, such as document understanding. Our code and model are released at https://github.com/thu-coai/Glyph.
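The compression figures in the abstract can be related with some simple arithmetic. The sketch below is illustrative only and is not taken from the Glyph codebase: the function names are hypothetical, the compression ratio is assumed to mean text tokens divided by visual tokens, and "128K" is assumed to be 128,000 tokens.

```python
# Illustrative arithmetic only; names and definitions are assumptions,
# not part of the released Glyph code.

def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Assumed definition: text tokens represented per visual token."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens


def effective_text_capacity(visual_context: int, ratio: float) -> int:
    """Text tokens that fit in a visual context window at a given ratio."""
    return int(visual_context * ratio)


def required_ratio(target_text_tokens: int, visual_context: int) -> float:
    """Compression ratio needed to fit a target text length in the window."""
    if visual_context <= 0:
        raise ValueError("visual_context must be positive")
    return target_text_tokens / visual_context


if __name__ == "__main__":
    window = 128_000  # assumed size of a "128K" context window
    for ratio in (3.0, 4.0):
        print(ratio, effective_text_capacity(window, ratio))
    print(required_ratio(1_000_000, window))
```

Under these assumptions, 3-4x compression lets a 128K window cover roughly 384K to 512K text tokens, while reaching 1M text tokens would need a ratio of about 7.8x, consistent with the abstract describing that case as "extreme compression".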
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Long-context Question Answering | LongBench (test) | HotpotQA 6.64e+3 | 59 |
| Long-context Question Answering | HotpotQA | -- | 21 |
| Few-shot Learning | LongBench | TREC Score 82.62 | 12 |
| Summarization | LongBench | GovRep Score 25.53 | 12 |
| Long-context Question Answering | LongBench Pro | F1 Score 28.94 | 10 |
| Long-context Question Answering | DocMath | F1 Score 13.61 | 10 |
| Long-context Question Answering | MuSiQue | F1 Score 24.87 | 10 |
| Long-context Question Answering | Qasper | Extract F1 39.26 | 10 |