
Recursive Language Models

About

We study letting large language models (LLMs) process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference paradigm that treats the long prompt as part of an external environment, allowing the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs can successfully process inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform vanilla frontier LLMs and common long-context scaffolds across four diverse long-context tasks at comparable cost. At a small scale, we post-train the first natively recursive language model. Our model, RLM-Qwen3-8B, outperforms the underlying Qwen3-8B by 28.3% on average and even approaches the quality of vanilla GPT-5 on three long-context tasks. Code is available at https://github.com/alexzhang13/rlm.
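The core recursive idea in the abstract can be illustrated with a minimal sketch: if the prompt fits in the context window, answer directly; otherwise split it, recurse on each snippet, and combine the partial answers with one more call. This is a hypothetical illustration only, with a stubbed `call_llm` and character-based limits, not the authors' implementation (that is at https://github.com/alexzhang13/rlm).

```python
# Hedged sketch of recursive inference over a long prompt.
# CONTEXT_WINDOW and call_llm are toy stand-ins, NOT the paper's API:
# real systems measure context in tokens and call an actual model.
CONTEXT_WINDOW = 100  # toy limit, in characters

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; here it just truncates."""
    return prompt[:50]

def rlm_answer(long_prompt: str, query: str) -> str:
    """Recursively decompose the prompt until each piece fits the window."""
    if len(long_prompt) <= CONTEXT_WINDOW:
        # Base case: the snippet fits, so answer it directly.
        return call_llm(f"{query}\n---\n{long_prompt}")
    # Recursive case: split the prompt and answer each half.
    mid = len(long_prompt) // 2
    left = rlm_answer(long_prompt[:mid], query)
    right = rlm_answer(long_prompt[mid:], query)
    # Combine the two short partial answers with one more LLM call.
    return call_llm(f"{query}\n---\n{left}\n{right}")
```

Because each recursive level only ever passes snippets (or short partial answers) to the model, the input can exceed the context window by orders of magnitude, matching the scaling behavior the abstract describes.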

Alex L. Zhang, Tim Kraska, Omar Khattab • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Long-context Question Answering | LongBench (test) | -- | -- | 69 |
| Long-context Reasoning | OOLONG | Accuracy | 63.8 | 37 |
| Long-context Reasoning | OOLONG trec_coarse | Score | 53 | 28 |
| Coding Question Answering | CodeQA | Accuracy | 62.1 | 27 |
| Semantic Needle-In-A-Haystack | S-NIAH | Accuracy | 52.4 | 27 |
| Long-context Reasoning (Pairs) | OOL-Pairs | Accuracy | 42.7 | 27 |
| Code Question Answering | CodeQA | Latency (s) | 98.7 | 27 |
| Long-context Reasoning | OOLONG | Latency (s) | 108.2 | 27 |
| Long-context Reasoning | OOL-Pairs | Latency (s) | 156.4 | 27 |
| Long-context Retrieval | S-NIAH | Latency (s) | 86.3 | 27 |

Showing 10 of 16 rows.

Other info

GitHub: https://github.com/alexzhang13/rlm
