
Kakugo: Distillation of Low-Resource Languages into Small Language Models

About

We present Kakugo, a novel and cost-effective pipeline designed to train general-purpose Small Language Models (SLMs) for low-resource languages using only the language name as input. By using a large teacher model to generate synthetic prompts and translate instruction datasets, we produced training data and SLMs for 54 low-resource languages. Evaluations across a diverse set of general natural language processing tasks, including translation, classification, and question answering, demonstrate that our pipeline consistently improves performance over base models. With a total generation and training cost of under $50 per language, Kakugo offers an accessible method for communities to develop language-specific AI.
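The pipeline described above can be sketched at a high level: a teacher model first generates synthetic prompts for the target language, then translates an existing instruction dataset, and the combined data is used to train the SLM. The sketch below is purely illustrative; all function names are hypothetical stand-ins (the real pipeline calls a large teacher LLM at each step), and it is not the authors' actual code.

```python
# Illustrative sketch of a Kakugo-style data-generation pipeline.
# The only required input is the language name; the teacher-model calls
# are stubbed out here with simple string operations.

def teacher_generate_prompts(language, n=3):
    """Stand-in for a teacher LLM producing synthetic instructions."""
    templates = [
        "Write a short story in {lang}.",
        "Summarise the following {lang} paragraph.",
        "Answer this question in {lang}:",
    ]
    return [t.format(lang=language) for t in templates[:n]]

def teacher_translate(dataset, language):
    """Stand-in for teacher-driven translation of an instruction dataset."""
    return [
        {"instruction": f"[{language}] {ex['instruction']}",
         "response": f"[{language}] {ex['response']}"}
        for ex in dataset
    ]

def build_training_data(language, seed_dataset):
    """Combine synthetic prompts with translated seed data."""
    synthetic = [{"instruction": p, "response": ""}
                 for p in teacher_generate_prompts(language)]
    translated = teacher_translate(seed_dataset, language)
    return synthetic + translated

# Example usage with a one-example seed dataset:
data = build_training_data(
    "Yoruba",
    [{"instruction": "Define gravity.", "response": "A force of attraction."}],
)
```

The resulting `data` list would then be fed to a standard SLM fine-tuning loop; keeping the teacher calls behind two narrow functions is what makes the approach cheap to re-run per language.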

Peter Devine, Mardhiyah Sanni, Farid Adilazuarda, Julieta Gil Loizaga, Barry Haddow • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Instruction following and reasoning | Low-resource languages evaluation suite (am, arz, ars, as, ast, az, ba, bn, bo, ceb, cv, cy, fo, ga, gd, gl, gn, ha, ht, ig, jv, kmr, sdh, ky, lb, lo, lus, mg, mi, mn, mt, ny, oc, pap, ps, rn, rw, sd, si, sm, sn, st, su, sw, te, tg, ti, tk, tt, ug, xh, yi, yo, zu) | Wins: 5 | 54 |
| Machine Translation | FLORES xx→en (test) | -- | 38 |
| Reading Comprehension | Belebele | -- | 20 |
| Machine Translation | FLORES en→xx | -- | 16 |
| Topic Classification | SIB200 | -- | 8 |
| Multitask Language Understanding | GlobalMMLU | -- | 6 |
