Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization
About
This paper introduces a method for adapting LoRA adapters in smaller language models to arbitrary downstream tasks. Unlike standard mixture-of-experts architectures, our method employs a gradient-free routing function to choose a weighted combination of experts without increasing the compute requirements for training or inference. The results show that token-level adaptation of LoRA adapters outperforms the base Llama-2-7b model across mathematical (GSM8K), scientific (ARC-Challenge), reading comprehension (SQuAD), and coding (CodeAlpaca-20k) tasks. Further evaluations also show that the average performance of token-level adaptation exceeds that of individual models fine-tuned for each task, with the best performance observed when adapting every other token during inference. The code for this study is made available through a public repository.
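To make the routing idea concrete, the sketch below shows one plausible PyTorch realization of gradient-free, token-level routing over a set of LoRA experts: each token's hidden state is compared against a per-expert prototype vector by cosine similarity, and the softmax of those similarities weights the experts' low-rank updates. The class name, the prototype-based similarity, and the tensor shapes are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of token-level, gradient-free routing over LoRA experts.
# Everything here (TokenLoRARouter, prototype vectors, cosine-similarity routing)
# is an illustrative assumption, not the paper's exact design.
import torch
import torch.nn.functional as F


class TokenLoRARouter(torch.nn.Module):
    def __init__(self, base_linear, lora_As, lora_Bs, prototypes, scaling=1.0):
        super().__init__()
        self.base = base_linear                              # frozen base projection W
        self.lora_As = torch.nn.ParameterList(lora_As)       # each A_i: (r, d_in)
        self.lora_Bs = torch.nn.ParameterList(lora_Bs)       # each B_i: (d_out, r)
        self.register_buffer("prototypes", prototypes)       # (n_experts, d_in)
        self.scaling = scaling

    @torch.no_grad()
    def route(self, x):
        # Gradient-free routing: cosine similarity between each token's hidden
        # state and the per-expert prototypes, normalized with a softmax.
        sims = F.cosine_similarity(
            x.unsqueeze(-2), self.prototypes, dim=-1
        )                                                    # (batch, seq, n_experts)
        return F.softmax(sims, dim=-1)

    def forward(self, x):
        weights = self.route(x)                              # (batch, seq, n_experts)
        out = self.base(x)
        for i, (A, B) in enumerate(zip(self.lora_As, self.lora_Bs)):
            delta = (x @ A.t()) @ B.t()                      # low-rank update of expert i
            out = out + self.scaling * weights[..., i : i + 1] * delta
        return out
```

Because the routing weights are computed under `torch.no_grad()`, the router itself contributes no trainable parameters or backward pass, which is consistent with the paper's claim that routing does not increase training or inference compute.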
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval Multilingual (test) | Average Score | 33.3 | 52 |
| Mathematical Reasoning | MGSM (test) | Accuracy (MGSM) | 40.5 | 29 |
| Medical Question Answering | MedMCQA translated (test) | Accuracy (ZH) | 35.3 | 9 |
| Multi-domain Language Modeling | MGSM, HumanEval, and MedMCQA composite (test) | Average Score (Composite) | 34.4 | 9 |