SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
About
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance on various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes for efficient training and inference. We show experimentally that DUS is simple yet effective for scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
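The depthwise-scaling step can be sketched as follows: the base model's layer stack is duplicated, the final m layers are dropped from the first copy and the initial m layers from the second, and the two are concatenated, yielding 2(n - m) layers. The paper's configuration uses a 32-layer base with m = 8, giving 48 layers; the helper name below is ours, and real layers would be transformer blocks rather than strings.

```python
def depth_up_scale(layers, m):
    """Depthwise scaling (sketch): duplicate the layer stack,
    drop the last m layers from the first copy and the first m
    layers from the second copy, then concatenate.

    Result has 2 * (n - m) layers for an n-layer base model.
    """
    n = len(layers)
    return layers[: n - m] + layers[m:]

# Toy example mirroring the paper's configuration: n = 32, m = 8.
base = [f"layer_{i}" for i in range(32)]
scaled = depth_up_scale(base, m=8)
print(len(scaled))  # 48 = 2 * (32 - 8)
```

The overlap of the two copies (layers 8 to 23 appear twice) is what continued pretraining then smooths out, recovering the base model's performance at the larger depth.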
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 25.16 | 1460 |
| Code Generation | HumanEval | Pass@1 | 2.44 | 850 |
| Multi-task Language Understanding | MMLU | Accuracy | 31.05 | 842 |
| Language Modeling | WikiText-103 (test) | Perplexity | 9.68 | 524 |
| Boolean Question Answering | BoolQ | Accuracy | 61.16 | 307 |
| Question Answering | ARC-E | Accuracy | 37.1 | 242 |
| Question Answering | BoolQ | Accuracy | 61.53 | 240 |
| Commonsense Reasoning | WinoGrande | Accuracy | 60.22 | 231 |
| Question Answering | TriviaQA | Accuracy | 47.72 | 210 |
| Question Answering | ARC-C | Accuracy | 24.25 | 166 |