TinyLlama: An Open-Source Small Language Model
About
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu • 2024
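Since TinyLlama reuses the Llama 2 architecture and tokenizer, it can be used as a drop-in causal LM. Below is a minimal sketch of loading and sampling from it with Hugging Face transformers; the hub id `TinyLlama/TinyLlama-1.1B-Chat-v1.0` is an assumption on our part, and the exact checkpoint names are listed in the GitHub repository linked above.

```python
# Minimal sketch: load TinyLlama and generate a short completion.
# The hub id below is an assumption; see the GitHub repo for the
# published checkpoint names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 1.1B parameters fit comfortably in bf16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```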
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 1.7 | 1362 |
| Commonsense Reasoning | WinoGrande | Accuracy | 59.4 | 1085 |
| Question Answering | ARC Challenge | Accuracy | 30.1 | 906 |
| Multi-task Language Understanding | MMLU | Accuracy | 32.6 | 876 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 14.19 | 770 |
| Commonsense Reasoning | PIQA | Accuracy | 73.3 | 751 |
| Instruction Following | IFEval | -- | -- | 625 |
| Question Answering | ARC Easy | Accuracy | 66.4 | 597 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 74.8 | 572 |
| Code Generation | HumanEval (test) | -- | -- | 506 |
Showing 10 of 98 rows
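Accuracy figures on tasks like WinoGrande, ARC, and PIQA are commonly reproduced with EleutherAI's lm-evaluation-harness. The sketch below shows one way such a run might look; the hub id and task names are assumptions, and the exact few-shot counts and prompt formats behind the rankings above may differ.

```python
# Hedged sketch: evaluating TinyLlama on a few benchmarks with
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Model id and settings are assumptions, not the leaderboard's exact config.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed id
    tasks=["winogrande", "arc_challenge", "piqa"],
    batch_size=8,
)

# Print the per-task metric dictionaries (accuracy, stderr, etc.).
for task, metrics in results["results"].items():
    print(task, metrics)
```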