Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
About
We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on distillation, Youtu-LLM (1.96B) is pre-trained from scratch to systematically cultivate reasoning and planning capabilities. The key technical advancements are as follows:

1. **Compact Architecture with Long-Context Support**: Built on a dense Multi-Latent Attention (MLA) architecture with a novel STEM-oriented vocabulary, Youtu-LLM supports a 128k context window. This design enables robust long-context reasoning and state tracking within a minimal memory footprint, making it ideal for long-horizon agent and reasoning tasks.
2. **Principled "Commonsense-STEM-Agent" Curriculum**: We curated a massive corpus of approximately 11T tokens and implemented a multi-stage training strategy. By progressively shifting the pre-training data distribution from general commonsense to complex STEM and agentic tasks, we ensure the model acquires deep cognitive abilities rather than superficial alignment.
3. **Scalable Agentic Mid-training**: For agentic mid-training specifically, we employ diverse data construction schemes to synthesize rich and varied trajectories across math, coding, and tool-use domains. This high-quality data enables the model to internalize planning and reflection behaviors effectively.

Extensive evaluations show that Youtu-LLM sets a new state-of-the-art for sub-2B LLMs. On general benchmarks, it achieves competitive performance against larger models, while on agent-specific tasks it significantly surpasses existing SOTA baselines, demonstrating that lightweight models can possess strong intrinsic agentic capabilities.
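The memory advantage of MLA comes from caching a single low-rank latent vector per token instead of full per-head keys and values. The sketch below illustrates that idea with numpy; all dimensions and projection matrices are made up for illustration and do not reflect Youtu-LLM's actual configuration.

```python
import numpy as np

# Hypothetical sketch of the latent KV-cache idea behind Multi-Latent
# Attention (MLA): compress each token's hidden state into a small latent
# vector, cache only that, and expand it back into keys/values at
# attention time. Dimensions below are illustrative, not the real model's.
rng = np.random.default_rng(0)

d_model, n_heads, d_head, d_latent = 256, 8, 32, 64
seq_len = 16

W_down = rng.standard_normal((d_model, d_latent)) * 0.02            # hidden -> latent
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # latent -> keys
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # latent -> values

x = rng.standard_normal((seq_len, d_model))  # token hidden states

# A standard KV cache stores 2 * n_heads * d_head floats per token;
# an MLA-style cache stores only d_latent floats per token.
latent_cache = x @ W_down        # (seq_len, d_latent) -- the only thing cached

k = latent_cache @ W_up_k        # keys reconstructed on the fly
v = latent_cache @ W_up_v        # values reconstructed on the fly

full_cache_size = seq_len * 2 * n_heads * d_head
mla_cache_size = latent_cache.size
print(f"full KV cache floats: {full_cache_size}")  # 8192
print(f"latent cache floats:  {mla_cache_size}")   # 1024
```

With these toy dimensions the cached state shrinks 8x per token, which is the kind of saving that makes a 128k context window feasible within a small memory footprint.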
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Coding | HumanEval | Pass@1 | 64.6 | 52 |
| Coding | MBPP | Pass@1 | 66.6 | 43 |
| Automated Software Engineering | SWE-bench Verified | Resolved Rate | 1.77e+3 | 39 |
| Coding | MBPP+ | Pass@1 | 81.8 | 37 |
| Long-context Understanding | LongBench v2 | Overall Score | 27.2 | 37 |
| Coding | HumanEval+ | Pass@1 | 57.3 | 31 |
| Deep Research | xbench | Accuracy | 19.5 | 30 |
| Deep Research | GAIA | Pass@1 | 33.9 | 16 |
| Coding | LiveCodeBench v6 | Pass@1 | 9.7 | 11 |
| Coding | CRUXEval | Pass@1 | 55.9 | 6 |