
LoQT: Low-Rank Adapters for Quantized Pretraining

About

Despite advances using low-rank adapters and quantization, pretraining of large models on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose Low-Rank Adapters for Quantized Training (LoQT), a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models. We demonstrate this for language modeling and downstream task adaptation, finding that LoQT enables efficient training of models up to 7B parameters on a 24GB GPU. We also demonstrate the feasibility of training a 13B model using per-layer gradient updates on the same hardware.

Sebastian Loeschcke, Mads Toftrup, Michael J. Kastoryano, Serge Belongie, Vésteinn Snæbjarnarson • 2024
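
The abstract describes training low-rank adapters on top of quantized full-rank weights, initializing the adapters from a factorization of the gradient and periodically merging them back into the quantized weights. The sketch below is only an illustration of that idea under simplifying assumptions, not the authors' implementation: the class name LoQTLinear, the round-to-nearest fake quantization (standing in for real low-bit kernels such as NF4), the rank and merge schedule, and the exact initialization are all placeholders.

```python
# Illustrative sketch of the idea only; hypothetical names and simplified quantization.
import torch
import torch.nn as nn


def quantize_dequantize(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric round-to-nearest fake quantization (stand-in for real low-bit formats)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax + 1e-12
    return (w / scale).round().clamp(-qmax, qmax) * scale


class LoQTLinear(nn.Module):
    """A frozen quantized full-rank weight plus trainable low-rank factors B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        # Quantized full-rank weight: a buffer, so it receives no gradients.
        self.register_buffer("w_q", quantize_dequantize(w))
        self.rank = rank
        self.A = nn.Parameter(torch.zeros(rank, in_features))
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    @torch.no_grad()
    def init_factors_from_gradient(self, grad: torch.Tensor) -> None:
        # Gradient-based initialization: take the top-r right singular vectors of the
        # full-rank weight gradient as the projection factor A, and start B at zero
        # so the effective weight is unchanged at initialization.
        _, _, Vh = torch.linalg.svd(grad, full_matrices=False)
        self.A.copy_(Vh[: self.rank])
        self.B.zero_()

    @torch.no_grad()
    def merge_and_requantize(self) -> None:
        # Periodic merge: fold the low-rank update into the quantized weight,
        # re-quantize, and reset the update.
        self.w_q.copy_(quantize_dequantize(self.w_q + self.B @ self.A))
        self.B.zero_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.w_q + self.B @ self.A).T
```

In a training loop, only A and B would be optimized while w_q stays frozen between merges, with merge_and_requantize called every fixed number of steps; the paper's actual projection handling, quantization format, and merge schedule differ and are not reproduced here.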

Related benchmarks

Task | Dataset | Result | Rank
Natural Language Understanding | GLUE (dev) | SST-2 (Acc): 95.9 | 504
Language Modeling | C4 (val) | PPL: 15.2 | 392
Arithmetic Reasoning | GSM8K (test) | Accuracy: 52.9 | 129
Language Adaptation | Icelandic text dataset, curated subset (test) | Perplexity: 3.61 | 5

Other info

Code
