The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $\lambda$-Calculus
About
LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and recursively solving subproblems. Yet existing RLMs depend on an open-ended read-eval-print loop (REPL) in which the model generates arbitrary control code, making execution difficult to verify, predict, and analyse. We introduce $\lambda$-RLM, a framework for long-context reasoning that replaces free-form recursive code generation with a typed functional runtime grounded in $\lambda$-calculus. It executes a compact library of pre-verified combinators and uses neural inference only on bounded leaf subproblems, turning recursive reasoning into a structured functional program with explicit control flow. We show that $\lambda$-RLM admits formal guarantees absent from standard RLMs, including termination, closed-form cost bounds, controlled accuracy scaling with recursion depth, and an optimal partition rule under a simple cost model. Empirically, across four long-context reasoning tasks and nine base models, $\lambda$-RLM outperforms standard RLM in 29 of 36 model-task comparisons, improves average accuracy by up to +21.9 points across model tiers, and reduces latency by up to $4.1\times$. These results show that typed symbolic control yields a more reliable and efficient foundation for long-context reasoning than open-ended recursive code generation. The complete implementation of $\lambda$-RLM is open-sourced at https://github.com/lambda-calculus-LLM/lambda-RLM.
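The control pattern described in the abstract (fixed combinators for splitting and merging, with model calls confined to bounded leaves) can be illustrated with a minimal sketch. This is not the $\lambda$-RLM implementation or its API: the names `split`, `solve`, `leaf`, `combine`, `leaf_size`, and `k` are hypothetical, and the leaf below is a plain character counter standing in for a neural call.

```python
from typing import Callable, List


def split(text: str, k: int) -> List[str]:
    """Partition text into at most k contiguous chunks of near-equal length."""
    size = -(-len(text) // k)  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]


def solve(
    text: str,
    leaf: Callable[[str], int],
    combine: Callable[[List[int]], int],
    leaf_size: int,
    k: int = 2,
) -> int:
    """Recursive split/map/reduce; `leaf` only ever sees inputs of length <= leaf_size."""
    if leaf_size < 1 or k < 2:
        raise ValueError("need leaf_size >= 1 and k >= 2 for guaranteed termination")
    if len(text) <= leaf_size:
        # Bounded leaf: in a real system this is where the model would be called.
        return leaf(text)
    # Each chunk is strictly shorter than `text`, so the recursion terminates.
    parts = split(text, k)
    return combine([solve(p, leaf, combine, leaf_size, k) for p in parts])


# Hypothetical stand-in for a model call: count occurrences of "x" in a chunk.
def count_x(chunk: str) -> int:
    return chunk.count("x")


if __name__ == "__main__":
    document = "ax" * 50
    print(solve(document, count_x, sum, leaf_size=8, k=4))
```

Because the control flow is fixed in advance rather than generated by the model, the recursion depth and the number of leaf calls in this sketch follow directly from the input length, `k`, and `leaf_size`; this is the kind of property the abstract refers to when it mentions termination and closed-form cost bounds.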
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Long-context Reasoning | OOLONG | Accuracy | 68.4 | 37 |
| Long-context reasoning (Pairs) | OOL-Pairs | Accuracy | 64.3 | 27 |
| Semantic Needle-In-A-Haystack | S-NIAH | Accuracy | 51.3 | 27 |
| Coding Question Answering | CodeQA | Accuracy | 55.7 | 27 |
| Code Question Answering | CodeQA | Latency (s) | 42.1 | 27 |
| Long-context retrieval | S-NIAH | Latency (s) | 28.1 | 27 |
| Long-context Reasoning | OOLONG | Latency (s) | 38.5 | 27 |
| Long-context Reasoning | OOL-Pairs | Latency (s) | 30.8 | 27 |