BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

About

This work presents BAdam, an optimization method that leverages the block coordinate descent (BCD) framework with Adam's update rule. BAdam offers a memory efficient approach to the full parameter finetuning of large language models. We conduct a theoretical convergence analysis for BAdam in the deterministic case. Experimentally, we apply BAdam to finetune the Llama 3-8B and Llama 3-70B models using a single RTX3090-24GB GPU and 4 A100-80GB GPUs, respectively. The results confirm BAdam's efficiency in terms of memory usage, running time, and optimization capability. Furthermore, the downstream performance evaluation based on MT-bench and math benchmarks shows that BAdam outperforms existing memory efficient baselines such as LoRA. It also demonstrates that BAdam can achieve comparable or even superior performance compared to Adam. Finally, the ablation study using SGD's update rule illustrates the suitability of BCD for finetuning LLMs. Our code can be easily integrated into any PyTorch-based codebase and is available at https://github.com/Ledzy/BAdam.

Qijun Luo, Hengxu Yu, Xiao Li • 2024
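The idea in the abstract, block coordinate descent with Adam's update rule applied to one block at a time, can be illustrated with a toy sketch. This is not the authors' implementation (that is in the linked repository). The function name `block_adam`, the cyclic block order, the number of inner steps, the choice to reset Adam's moment estimates on each block visit, and the quadratic test objective are all assumptions made for illustration.

```python
import math

def block_adam(grad_fn, params, blocks, epochs=50, inner_steps=20,
               lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Toy block coordinate descent that takes Adam steps on the active block.

    params: list of floats. blocks: list of index lists partitioning params.
    All names and defaults here are hypothetical, chosen for this sketch.
    """
    x = list(params)  # copy so the caller's list is left unchanged
    for _ in range(epochs):
        for block in blocks:  # assumed: visit blocks in a fixed cyclic order
            # Adam moment estimates exist only for the active block, so the
            # optimizer state is sized by one block, not by all parameters.
            m = [0.0] * len(block)
            v = [0.0] * len(block)
            for t in range(1, inner_steps + 1):
                # A real implementation would compute only the active block's
                # gradient; the toy objective returns the full gradient.
                g = grad_fn(x)
                for j, i in enumerate(block):
                    m[j] = beta1 * m[j] + (1 - beta1) * g[i]
                    v[j] = beta2 * v[j] + (1 - beta2) * g[i] ** 2
                    m_hat = m[j] / (1 - beta1 ** t)  # bias correction
                    v_hat = v[j] / (1 - beta2 ** t)
                    x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective (an assumption for the demo): f(x) = 0.5 * sum((x_i - t_i)^2)
target = [1.0, -2.0, 3.0, 0.5]

def loss(x):
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]
```

Calling `block_adam(grad, [0.0] * 4, [[0, 1], [2, 3]])` updates coordinates 0 and 1 while 2 and 3 stay frozen, then swaps. In this sketch the first- and second-moment buffers are allocated per block, so the optimizer state at any time is sized by the active block rather than by the full parameter vector.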

Related benchmarks

Task	Dataset	Result
Instruction Following	MT-Bench	MT-Bench Score 6.7	287
Mathematical Reasoning	AQUA	Accuracy 42.5	167
Natural Language Understanding	SuperGLUE (test)	BoolQ Accuracy 85.6	74
Mathematical Reasoning	NUMGLUE	Accuracy 53	54
Instruction Following	MT-bench v1.0 (test)	MT-Bench Score 6.67	52
Mathematical Reasoning	Math Benchmarks Aggregate	--	44
Mathematical Reasoning	SAT Math	SAT Math Score 56.8	9
Mathematical Reasoning	MMLU Math	Score 50.5	9
Natural Language Understanding	SuperGLUE	BoolQ Accuracy 85.4	6
Mathematical Reasoning	Math Benchmarks evaluated on Llama 3-70B	GSM8K Accuracy 78.2	5

