Stable LM 2 1.6B Technical Report
About
We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT-Bench benchmark, which focuses on multi-turn dialogues. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters by a significant margin. Given its appealingly small size, we also provide throughput measurements on a number of edge devices. In addition, we open-source several quantized checkpoints and provide their performance metrics compared to the original model.
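Since the weights are distributed through Hugging Face, the base checkpoint can be loaded with the standard `transformers` API. The sketch below is illustrative rather than taken from the report; the repo id `stabilityai/stablelm-2-1_6b` and the generation settings are assumptions.

```python
# Minimal sketch of loading the base StableLM 2 1.6B checkpoint from Hugging Face.
# The repo id below is an assumption; older transformers releases may additionally
# require trust_remote_code=True if the architecture is not yet built in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 1.6B parameters fits comfortably on most consumer GPUs
)

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```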
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 19.3 | 983 |
| Multi-task Language Understanding | MMLU | Accuracy | 36 | 842 |
| Question Answering | OpenBookQA | Accuracy | 37 | 465 |
| Reasoning | HellaSwag (HS) | Accuracy | 66.7 | 142 |
| Reasoning | PIQA | Accuracy | 76.8 | 133 |
| Question Answering | CommonsenseQA (CSQA) | Accuracy | 34.6 | 124 |
| Reasoning | WinoGrande (WG) | Accuracy | 59.2 | 87 |
| Reasoning | ARC | Accuracy | 53.5 | 83 |
| Reasoning | SIQA | Accuracy | 43.5 | 44 |
| Question Answering | TriviaQA | Accuracy | 35.6 | 32 |
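Zero- and few-shot scores of this kind are typically reproduced with EleutherAI's lm-evaluation-harness. The snippet below is a hedged sketch of such a run, not the report's evaluation pipeline: the task names, few-shot settings, and repo id are assumptions, and exact numbers can vary with the harness version and prompting details.

```python
# Sketch of evaluating a few of the benchmarks above with the
# lm-evaluation-harness (v0.4+ Python API). Task names, dtype, and the
# Hugging Face repo id are assumptions; results may differ slightly from
# the figures in the table depending on harness version and prompt format.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=stabilityai/stablelm-2-1_6b,dtype=bfloat16",
    tasks=["hellaswag", "piqa", "winogrande", "arc_challenge"],
    num_fewshot=0,
    batch_size=8,
)

# Print the aggregated metrics for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```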