
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization

About

This paper introduces a method for adapting LoRA adapters in smaller language models to arbitrary downstream tasks. Unlike standard mixture-of-experts architectures, the method employs a gradient-free routing function to choose a weighted combination of experts without increasing the compute requirements for training or inference. The results show that token-level adaptation of LoRA adapters outperforms the base Llama-2-7b model across mathematical (GSM8K), scientific (ARC-Challenge), reading comprehension (SQuAD), and coding (CodeAlpaca-20k) tasks. Further evaluations show that the average performance of token-level adaptation also exceeds that of individual models fine-tuned for each task, with the best performance observed when adapting every other token during inference. The code for this study is made available through a public repository.
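The core idea described above — picking a weighted combination of LoRA experts per token without a trained router — can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the cosine-similarity routing against per-expert prototype vectors, the softmax temperature, and all variable names are assumptions made for the example.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors (with a small epsilon for safety)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def route_token(token_emb, prototypes, temperature=0.1):
    """Gradient-free routing: softmax over cosine similarities to expert prototypes.

    No parameters are learned here, so routing adds no training cost.
    """
    scores = np.array([cosine(token_emb, p) for p in prototypes])
    logits = scores / temperature
    logits -= logits.max()                 # numerical stability before exp
    w = np.exp(logits)
    return w / w.sum()

def lora_mixture_output(x, W0, loras, weights):
    """Frozen base projection plus a weighted sum of rank-r LoRA updates."""
    out = W0 @ x
    for (A, B), w in zip(loras, weights):
        out += w * (B @ (A @ x))           # each expert contributes w * B A x
    return out

# Toy setup: 3 hypothetical task experts, hidden size 16, LoRA rank 4.
rng = np.random.default_rng(0)
d, r, n_experts = 16, 4, 3
x = rng.normal(size=d)                     # one token's hidden state
W0 = rng.normal(size=(d, d))               # frozen base weight
loras = [(rng.normal(size=(r, d)), rng.normal(size=(d, r)))
         for _ in range(n_experts)]
prototypes = [rng.normal(size=d) for _ in range(n_experts)]

weights = route_token(x, prototypes)
y = lora_mixture_output(x, W0, loras, weights)
print(weights, y.shape)
```

Because the weighted sum runs over a handful of rank-r updates, the per-token overhead stays small relative to the base matrix multiply, which is consistent with the paper's claim of unchanged compute requirements.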

Joshua Belofsky • 2023

Related benchmarks

Task                             Dataset                                         Result                            Rank
Code Generation                  HumanEval Multilingual (test)                   Average Score: 33.3               52
Mathematical Reasoning           MGSM (test)                                     Accuracy (MGSM): 40.5             29
Medical Question Answering       MedMCQA translated (test)                       Accuracy (ZH): 35.3               9
Multi-domain Language Modeling   MGSM, HumanEval, and MedMCQA composite (test)   Average Score (Composite): 34.4   9
