BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models
About
This work presents BAdam, an optimization method that leverages the block coordinate descent (BCD) framework with Adam's update rule. BAdam offers a memory-efficient approach to the full parameter finetuning of large language models. We conduct a theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to finetune the Llama 3-8B and Llama 3-70B models using a single RTX3090-24GB GPU and 4 A100-80GB GPUs, respectively. The results confirm BAdam's efficiency in terms of memory usage, running time, and optimization capability. Furthermore, the downstream performance evaluation based on MT-bench and math benchmarks shows that BAdam outperforms existing memory-efficient baselines such as LoRA. The same evaluation shows that BAdam can achieve performance comparable or even superior to that of Adam. Finally, the ablation study using SGD's update rule illustrates the suitability of BCD for finetuning LLMs. Our code can be easily integrated into any PyTorch-based codebase and is available at https://github.com/Ledzy/BAdam.
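To make the idea of combining block coordinate descent with Adam's update rule concrete, here is a minimal pure-Python sketch on a toy quadratic. It is not the authors' implementation: the function names (`adam_step`, `block_adam`), the toy objective, the hyperparameters, and the choice to reinitialize the Adam moments each time a new block becomes active are all assumptions made for illustration. The point it shows is that moment estimates are held only for the active block, which is where a memory saving of this kind would come from.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place to a list of floats (hypothetical helper)."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g          # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** t)                # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def block_adam(x, grad_fn, blocks, epochs=50, inner_steps=10, lr=0.1):
    """Block coordinate descent with Adam inner steps (illustrative sketch).

    blocks: list of index lists that partition the coordinates of x.
    Only the active block is updated; all other coordinates stay fixed.
    """
    for _ in range(epochs):
        for idx in blocks:
            # Assumption: optimizer state exists only for the active block
            # and is reset whenever the active block changes.
            m = [0.0] * len(idx)
            v = [0.0] * len(idx)
            for t in range(1, inner_steps + 1):
                g = grad_fn(x)
                sub = [x[i] for i in idx]
                adam_step(sub, [g[i] for i in idx], m, v, t, lr)
                for i, val in zip(idx, sub):
                    x[i] = val
    return x


# Toy objective (an assumption): f(x) = sum_i (x_i - target_i)^2
target = [1.0, -2.0, 3.0, 0.5]


def loss(x):
    return sum((a - b) ** 2 for a, b in zip(x, target))


def grad(x):
    return [2 * (a - b) for a, b in zip(x, target)]


x0 = [0.0, 0.0, 0.0, 0.0]
initial_loss = loss(x0)
x_final = block_adam(list(x0), grad, blocks=[[0, 1], [2, 3]])
final_loss = loss(x_final)
print(initial_loss, final_loss)
```

In this sketch the state lists `m` and `v` have the size of one block rather than of the full parameter vector. For an actual LLM finetuning setup, the repository linked above is the reference; how it partitions parameters into blocks and orders them is not reproduced here.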
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Instruction Following | MT-Bench | MT-Bench Score | 6.7 | 287 |
| Mathematical Reasoning | AQUA | Accuracy | 42.5 | 167 |
| Natural Language Understanding | SuperGLUE (test) | BoolQ Accuracy | 85.6 | 74 |
| Instruction Following | MT-bench v1.0 (test) | MT-Bench Score | 6.67 | 52 |
| Mathematical Reasoning | Math Benchmarks Aggregate | -- | -- | 44 |
| Mathematical Reasoning | NUMGLUE | Accuracy | 53 | 39 |
| Mathematical Reasoning | SAT Math | SAT Math Score | 56.8 | 9 |
| Mathematical Reasoning | MMLU Math | Score | 50.5 | 9 |
| Natural Language Understanding | SuperGLUE | BoolQ Accuracy | 85.4 | 6 |
| Mathematical Reasoning | Math Benchmarks evaluated on Llama 3-70B | GSM8K Accuracy | 78.2 | 5 |