SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

About

While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in resource-constrained settings. In this paper, we document the development of SmolLM2, a state-of-the-art "small" (1.7 billion parameter) language model (LM). To attain strong performance, we overtrain SmolLM2 on ~11 trillion tokens of data using a multi-stage training process that mixes web text with specialized math, code, and instruction-following data. We additionally introduce new specialized datasets (FineMath, Stack-Edu, and SmolTalk) at stages where we found existing datasets to be problematically small or low-quality. To inform our design decisions, we perform both small-scale ablations and a manual refinement process that updates the dataset mixing rates at each stage based on the performance at the previous stage. Ultimately, we demonstrate that SmolLM2 outperforms other recent small LMs including Qwen2.5-1.5B and Llama3.2-1B. To facilitate future research on LM development as well as applications of small LMs, we release both SmolLM2 and all of the datasets we prepared in the course of this project.

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Werra, Thomas Wolf • 2025

Related benchmarks

Task                                Dataset         Result                 Rank
Mathematical Reasoning              GSM8K           Accuracy 32.6          983
Code Generation                     HumanEval       --                     850
Multi-task Language Understanding   MMLU            Accuracy 49.2          842
Commonsense Reasoning               WinoGrande      Accuracy 65.8          776
Question Answering                  ARC Challenge   Accuracy 54.1          749
Commonsense Reasoning               PIQA            Accuracy 77.4          647
Mathematical Reasoning              MATH            Accuracy 11.6          535
Question Answering                  OpenBookQA      Accuracy 42.4          465
Question Answering                  ARC Easy        Normalized Acc 77.8    385
Commonsense Reasoning               CSQA            Accuracy 67.16         366
Showing 10 of 73 rows