
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching

About

Distillation has shown remarkable success in transferring knowledge from a Large Language Model (LLM) teacher to a student LLM. However, current distillation methods require similar tokenizers between the teacher and the student, restricting their applicability to only a small subset of teacher-student pairs. In this work, we develop a principled cross-tokenizer distillation method to solve this crucial deficiency. Our method is the first to enable effective distillation across fundamentally different tokenizers, while also substantially outperforming prior methods in all other cases. We verify the efficacy of our method on three distinct use cases. First, we show that viewing tokenizer transfer as self-distillation enables unprecedentedly effective transfer across tokenizers, including rapid transfer of subword models to the byte-level. Transferring different models to the same tokenizer also enables ensembling to boost performance. Secondly, we distil a large maths-specialised LLM into a small general-purpose model with a different tokenizer, achieving competitive maths problem-solving performance. Thirdly, we use our method to train state-of-the-art embedding prediction hypernetworks for training-free tokenizer transfer. Our results unlock an expanded range of teacher-student pairs for distillation, enabling new ways to adapt and enhance interaction between LLMs.

Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Math Reasoning | GSM8K | Pass@1 Accuracy: 87.81 | 57 |
| Code Generation | HumanEval | Score: 43.9 | 55 |
| Sentiment Analysis | FOMC | Accuracy: 31.8 | 44 |
| Language Understanding | MMLU | MMLU Score: 39.42 | 40 |
| Code Generation | LCB v6 | Pass@1: 10.86 | 39 |
| Reasoning | BBH | BBH Score: 24.21 | 39 |
| Mathematical Reasoning | MATH | Overall Score: 13.14 | 29 |
| Medical Question Answering | MedConceptsQA | Accuracy: 26.5 | 26 |
| Question Answering | MMLU Professional Law | Accuracy: 27.6 | 26 |
| Math Reasoning | MATH 500 | Pass@1: 59.84 | 21 |
Showing 10 of 11 rows
