
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions

About

Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities. However, these models are resource-intensive. To alleviate this issue, we explore distilling knowledge from instruction-tuned LLMs into much smaller ones. To this end, we carefully develop a large set of 2.58M instructions based on both existing and newly-generated instructions. In addition to being sizable, we design our instructions to cover a broad set of topics to ensure diversity. Extensive analysis of our instruction dataset confirms its diversity, and we generate responses for these instructions using gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models from both the encoder-decoder and decoder-only families, with varying sizes. We evaluate the performance of our models using automatic metrics on 15 different natural language processing (NLP) benchmarks, as well as through human assessment. The results demonstrate that our proposed LaMini-LM models are comparable to competitive baselines, while being much smaller in size.
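The data-construction step described above (collect instructions, then have a teacher model write a response for each) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `Example`, `build_distillation_set`, and `toy_teacher` are hypothetical names, the teacher is a local stub standing in for a call to gpt-3.5-turbo, and the whitespace-based de-duplication is an assumption added for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    """One instruction paired with the teacher's response."""
    instruction: str
    response: str


def build_distillation_set(
    instructions: Iterable[str],
    teacher: Callable[[str], str],
) -> List[Example]:
    """Pair each unique, non-empty instruction with a teacher response.

    De-duplication on stripped text is an illustrative choice, not a
    step taken from the paper.
    """
    seen = set()
    examples = []
    for text in instructions:
        key = text.strip()
        if not key or key in seen:
            continue
        seen.add(key)
        examples.append(Example(instruction=key, response=teacher(key)))
    return examples


def toy_teacher(instruction: str) -> str:
    # Hypothetical stand-in for the teacher model; a real pipeline
    # would query an instruction-tuned LLM here.
    return f"[teacher response to: {instruction}]"


if __name__ == "__main__":
    pairs = build_distillation_set(
        ["Summarize X.", " Summarize X. ", "", "Translate Y."],
        toy_teacher,
    )
    for pair in pairs:
        print(pair.instruction, "->", pair.response)
```

The resulting instruction-response pairs are the kind of data a smaller student model would then be fine-tuned on; the fine-tuning itself is outside the scope of this sketch.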

Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham Fikri Aji • 2023

Related benchmarks

Task                         Dataset          Result                     Rank
Commonsense Reasoning        HellaSwag        Accuracy 63.9              1896
Commonsense Reasoning        WinoGrande       Accuracy 63.5              1442
Mathematical Reasoning       MATH             Accuracy 7.96              882
Commonsense Reasoning        PIQA             Accuracy 75.1              757
Natural Language Inference   RTE              Accuracy 71.8              590
Question Answering           OpenBookQA       Accuracy 44.8              465
Question Answering           SciQ             Accuracy 86.6              283
Word Sense Disambiguation    WiC              Avg Accuracy 61.8          261
Question Answering           ARC              Accuracy 43.2              230
Reasoning                    HellaSwag (HS)   HellaSwag Accuracy 30.2    209
Showing 10 of 34 rows
