Sparsity-Accelerated Training for Large Language Models
About
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. The cost of such training remains high, however, primarily due to the models' large parameter counts. This paper proposes leveraging *sparsity* in pre-trained LLMs to expedite the training process. Observing the sparsity of activated neurons during forward passes, we identify the potential for computational speed-ups by excluding inactive neurons. We address the associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% of the training time in supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. Our code is available at https://github.com/OpenDFM/SAT.
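To make the core ideas concrete, below is a minimal PyTorch sketch of (a) omitting low-importance intermediate neurons from an MLP forward pass and (b) a ladder-style omission rate schedule. This is an illustration under stated assumptions, not the paper's implementation: the names (`neuron_importance`, `sparse_mlp_forward`, `ladder_omission_rate`), the mean-absolute-activation importance metric, the plain ReLU MLP (Llama-2 actually uses a gated SiLU MLP), and the ladder values are all placeholders; see the repository above for the actual method.

```python
# Minimal sketch of sparsity-accelerated MLP computation. All names and
# constants here are illustrative, not taken from the SAT codebase.
import torch
import torch.nn as nn


def neuron_importance(hidden: torch.Tensor) -> torch.Tensor:
    """Score each intermediate neuron by its mean absolute activation
    over the batch and sequence dimensions (one simple importance metric)."""
    return hidden.abs().mean(dim=(0, 1))  # shape: (intermediate_size,)


def sparse_mlp_forward(x: torch.Tensor, w_in: nn.Linear, w_out: nn.Linear,
                       omission_rate: float) -> torch.Tensor:
    """Forward through a two-layer MLP, skipping the least important neurons.

    x: (batch, seq, hidden); w_in: hidden -> intermediate; w_out: intermediate -> hidden.
    This sketch scores neurons on the fly, so only the output projection is
    sliced; a real implementation would reuse importance estimates from
    earlier steps to skip the omitted neurons in both projections.
    """
    hidden = torch.relu(w_in(x))                      # (batch, seq, intermediate)
    n = hidden.shape[-1]
    keep = n - int(n * omission_rate)                 # neurons to retain
    idx = neuron_importance(hidden).topk(keep).indices
    # Only the retained neurons participate in the second projection.
    return hidden[..., idx] @ w_out.weight[:, idx].T + w_out.bias


def ladder_omission_rate(step: int, total_steps: int,
                         ladder=(0.6, 0.4, 0.2, 0.0)) -> float:
    """Ladder scheduler: hold each omission rate for an equal span of
    training, stepping down toward dense training at the end."""
    stage = min(step * len(ladder) // total_steps, len(ladder) - 1)
    return ladder[stage]


if __name__ == "__main__":
    w_in, w_out = nn.Linear(64, 256), nn.Linear(256, 64)
    x = torch.randn(2, 8, 64)
    for step in (0, 500, 999):
        rate = ladder_omission_rate(step, total_steps=1000)
        y = sparse_mlp_forward(x, w_in, w_out, rate)
        print(step, rate, y.shape)
```

The speed-up comes from shrinking the intermediate dimension of the matrix multiplications, and the ladder schedule steps the omission rate down over training so the model finishes in a dense regime.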
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 93.1 | 1460 |
| Commonsense Reasoning | WinoGrande | Accuracy | 83.6 | 776 |
| Physical Interaction Question Answering | PIQA | Accuracy | 87 | 323 |
| Medical Question Answering | MedMCQA | Accuracy | 59.6 | 253 |
| Question Answering | ARC | Accuracy | 88.2 | 154 |
| Question Answering | PubMedQA | Accuracy | 56.7 | 145 |
| Financial NLP | FinGPT | Accuracy | 83.2 | 28 |
| Summarization | BillSum | Accuracy | 65.7 | 28 |
| Factuality and Reasoning | GPT4All | HellaSwag Accuracy | 0.6229 | 12 |
| Factuality and Reasoning | MMLU | MMLU Accuracy | 55.4 | 12 |