Rho-1: Not All Tokens Are What You Need
About
Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines the token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on the tokens with higher scores. When continually pretrained on the 15B-token OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% on 9 math tasks. After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when continually pretrained on 80B general tokens, Rho-1 achieves a 6.8% average improvement across 15 diverse tasks, increasing both the efficiency and the performance of language model pre-training.
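The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that per-token losses from the training model and from the reference model are already available as plain lists, that a token's score is the difference between the two (the excess loss), and that a fixed fraction of the highest-scoring tokens is kept. The function names `select_token_mask` and `selective_loss` and the parameter `keep_ratio` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List, Sequence


def select_token_mask(
    train_losses: Sequence[float],
    ref_losses: Sequence[float],
    keep_ratio: float,
) -> List[bool]:
    """Return a mask that is True for the highest-scoring tokens."""
    if len(train_losses) != len(ref_losses):
        raise ValueError("loss sequences must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = len(train_losses)
    if n == 0:
        return []
    # Assumed score: how much higher the training model's loss is than
    # the reference model's loss on the same token.
    scores = [t - r for t, r in zip(train_losses, ref_losses)]
    # Keep at least one token so the loss is always defined.
    num_keep = max(1, int(n * keep_ratio))
    # Rank token positions by score, highest first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:num_keep])
    return [i in keep for i in range(n)]


def selective_loss(train_losses: Sequence[float], mask: Sequence[bool]) -> float:
    """Average the training loss over the selected tokens only."""
    selected = [loss for loss, keep in zip(train_losses, mask) if keep]
    return sum(selected) / len(selected) if selected else 0.0


# Toy example with four tokens and made-up loss values.
train = [2.0, 0.5, 3.0, 1.0]
ref = [1.9, 0.4, 1.0, 0.2]
mask = select_token_mask(train, ref, keep_ratio=0.5)
print(mask)                         # [False, False, True, True]
print(selective_loss(train, mask))  # 2.0
```

In this toy example the first two tokens have nearly the same loss under both models, so they are dropped, and the loss is averaged over the two tokens where the training model lags the reference model most. In a real training loop the same mask would be applied to the per-token cross-entropy before the backward pass.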
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | Mathematical Reasoning Evaluation Harness GSM8K, MATH, SVAMP, ASDiv, MAWPS, TAB, MQA, SAT (test) | GSM8K Accuracy | 36.3 | 28 |
| Agent Tool Use | T-eval (Held-Out) | Accuracy | 68.4 | 14 |
| Agent Tool Use | Nexus (Held-Out) | Accuracy | 26 | 14 |
| Function Calling | BFCL (Held-In) | Accuracy | 84.6 | 14 |
| Agent Tool Use | StableToolBench Held-In | Pass Rate | 30.6 | 14 |
| Temporal Knowledge Probing | TemporalWiki TWiki-Probes-0910 | Score (Unchanged) | 4.389 | 11 |
| Temporal Knowledge Probing | TemporalWiki TWiki-Probes-1011 | Accuracy (Unchanged) | 4.36 | 11 |
| Temporal Knowledge Probing | TemporalWiki TWiki-Probes-1112 | Accuracy (Unchanged) | 4.471 | 11 |
| Continual Knowledge Learning | LAMA-CKL Llama2-7B based (test) | Top Accuracy | 14.1 | 6 |
| Continual Knowledge Learning | LAMA-CKL (test) | Top Acc | 9 | 6 |