
Rho-1: Not All Tokens Are What You Need

About

Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "not all tokens in a corpus are equally important for language model training". Our initial analysis examines the token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on the tokens with higher scores. When continually pretrained on the 15B OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% on 9 math tasks. After fine-tuning, Rho-1-1B and Rho-1-7B achieve state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when continually pretrained on 80B general tokens, Rho-1 achieves a 6.8% average improvement across 15 diverse tasks, increasing both the efficiency and the performance of language model pre-training.
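The selection step described above can be illustrated with a small, framework-free sketch. It is not the authors' implementation. It assumes that a token's score is the training model's per-token loss minus the reference model's per-token loss, and that a fixed fraction of the highest-scoring tokens is kept; the abstract states neither detail. The function name `selective_lm_loss` and the parameter `keep_ratio` are hypothetical.

```python
import math


def selective_lm_loss(train_losses, ref_losses, keep_ratio=0.6):
    """Average the training loss over the highest-scoring tokens only.

    train_losses: per-token losses from the model being trained.
    ref_losses:   per-token losses from the reference model.
    keep_ratio:   fraction of tokens kept (hypothetical parameter).

    Assumption: score = train loss - reference loss, so a token that the
    reference model finds easy but the training model still finds hard
    gets a high score.
    """
    if len(train_losses) != len(ref_losses):
        raise ValueError("loss sequences must have the same length")
    if not train_losses:
        raise ValueError("at least one token is required")
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Score every token using the reference model as a yardstick.
    scores = [t - r for t, r in zip(train_losses, ref_losses)]

    # Keep the top fraction of tokens by score (at least one token).
    k = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])

    # 1 marks a token that contributes to the loss, 0 a token that is skipped.
    mask = [1 if i in selected else 0 for i in range(len(scores))]

    # Focused loss: mean of the training loss over the selected tokens.
    loss = sum(train_losses[i] for i in selected) / k
    return loss, mask


if __name__ == "__main__":
    train = [2.0, 0.5, 3.0, 1.0]
    ref = [0.5, 0.4, 2.9, 0.2]
    loss, mask = selective_lm_loss(train, ref, keep_ratio=0.5)
    print(loss, mask)
```

In a real training loop the per-token losses would come from the two models' cross-entropy on the same batch, and the mask would be applied before backpropagation so that unselected tokens contribute no gradient.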

Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | Mathematical Reasoning Evaluation Harness GSM8K, MATH, SVAMP, ASDiv, MAWPS, TAB, MQA, SAT (test) | GSM8K Accuracy | 36.3 | 28 |
| Agent Tool Use | T-eval (Held-Out) | Accuracy | 68.4 | 14 |
| Agent Tool Use | Nexus (Held-Out) | Accuracy | 26 | 14 |
| Function Calling | BFCL (Held-In) | Accuracy | 84.6 | 14 |
| Agent Tool Use | StableToolBench Held-In | Pass Rate | 30.6 | 14 |
| Temporal Knowledge Probing | TemporalWiki TWiki-Probes-0910 | Score (Unchanged) | 4.389 | 11 |
| Temporal Knowledge Probing | TemporalWiki TWiki-Probes-1011 | Accuracy (Unchanged) | 4.36 | 11 |
| Temporal Knowledge Probing | TemporalWiki TWiki-Probes-1112 | Accuracy (Unchanged) | 4.471 | 11 |
| Continual Knowledge Learning | LAMA-CKL Llama2-7B based (test) | Top Accuracy | 14.1 | 6 |
| Continual Knowledge Learning | LAMA-CKL (test) | Top Acc | 9 | 6 |
