
Phi-4 Technical Report

About

We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size -- especially on reasoning-focused benchmarks -- due to improved data, training curriculum, and innovations in the post-training scheme.
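The report emphasizes data quality over raw data volume. The actual pipeline is not reproduced here, but one common ingredient of such recipes, filtering synthetic candidates by a quality score before they enter training, can be sketched as follows. The scoring heuristic and threshold below are purely illustrative assumptions, not the paper's method:

```python
# Hypothetical sketch of quality-filtered synthetic data curation.
# The scoring heuristic and threshold are illustrative, not from the report.

def quality_score(sample: str) -> float:
    """Toy quality proxy: reward multi-sentence, multi-step completions."""
    sentences = [s for s in sample.split(".") if s.strip()]
    return min(1.0, len(sentences) / 3)

def curate(candidates: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only candidates whose quality score meets the threshold."""
    return [c for c in candidates if quality_score(c) >= threshold]

candidates = [
    "Short answer.",
    "Step one: factor the expression. Step two: cancel terms. Step three: simplify.",
]
kept = curate(candidates)
print(len(kept))  # 1 (only the multi-step sample survives)
```

In a real pipeline the scorer would typically be a learned model or a teacher-model judgment rather than a length heuristic, but the control flow, generate then filter then train, is the same.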

Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang, Rachel Ward, Yue Wu, Dingli Yu, Cyril Zhang, Yi Zhang • 2024

Related benchmarks

Task | Dataset | Result | Rank
Reasoning | BBH | Accuracy: 87.6 | 672
Instruction Following | IFEval | -- | 625
Instruction Following | AlpacaEval 2.0 | -- | 507
Multi-hop Question Answering | 2WikiMultihopQA | -- | 387
Mathematical Reasoning | GSM8K | Accuracy: 89.4 | 358
Mathematical Reasoning | AIME 2024 (test) | Accuracy: 10 | 159
Knowledge | MMLU | Accuracy: 84.9 | 136
Multiple-choice Question Answering | MMLU-Pro | Overall Accuracy: 58.22 | 119
Code Generation | HumanEval+ (test) | -- | 93
Natural Language Inference | MedNLI (test) | Accuracy: 64.26 | 89

Showing 10 of 91 rows.
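The accuracy figures above are percentages of correctly answered benchmark items. As a generic illustration (not the specific harness behind these leaderboards, which varies per benchmark), an exact-match accuracy scorer looks like:

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions matching the reference after light normalization."""
    assert len(predictions) == len(references) and references
    norm = lambda s: s.strip().lower()
    correct = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Toy data for illustration only.
preds = ["72", " Paris ", "blue"]
refs = ["72", "paris", "red"]
print(round(exact_match_accuracy(preds, refs), 1))  # 66.7
```

Real evaluation harnesses add benchmark-specific normalization (e.g., extracting the final number for GSM8K, or running unit tests for HumanEval+), but the reported metric is the same fraction-correct idea.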
