Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
About
Fine-tuning large-scale pre-trained models with limited data presents significant challenges for generalization. Sharpness-Aware Minimization (SAM) has proven effective at improving generalization by seeking flat minima, but its substantial memory and computational overhead makes it impractical for large models. Integrating SAM with parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) is a promising direction. However, we find that directly applying SAM to LoRA parameters restricts sharpness optimization to a limited subspace, which hinders its effectiveness. To address this limitation, we propose Bi-directional Low-Rank Adaptation (Bi-LoRA), which introduces an auxiliary LoRA module to model SAM's adversarial weight perturbations. This decouples SAM's weight perturbations from LoRA optimization: the primary LoRA module adapts to the specific task via standard gradient descent, while the auxiliary module captures the sharpness of the loss landscape via gradient ascent. This dual-module design lets Bi-LoRA capture broader sharpness and reach flatter minima while remaining memory-efficient. A further benefit is that perturbation and optimization happen simultaneously rather than in two sequential passes, which eliminates SAM's doubled training cost. Extensive experiments across diverse tasks and architectures demonstrate Bi-LoRA's efficiency and its effectiveness in enhancing generalization.
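The abstract describes the mechanism only at a high level, so the following is a minimal, illustrative sketch in plain Python, not the authors' implementation. It assumes a single linear layer with a squared-error loss, a frozen base weight `W0`, a primary LoRA pair `(B, A)`, and an auxiliary LoRA pair `(B2, A2)` whose product stands in for SAM's weight perturbation. One gradient, computed at the perturbed weights, drives both gradient descent on the primary module and gradient ascent on the auxiliary one. The norm constraint `||B2 @ A2||_F <= rho` (enforced by rescaling both factors), the learning rates, and all function names (`bi_lora_step`, `demo`, etc.) are hypothetical choices made for illustration.

```python
import math
import random

# --- tiny dense-matrix helpers (matrices are lists of rows) ---------------

def matmul(X, Y):
    """Matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, scale=1.0):
    """Elementwise X + scale * Y (returns a new matrix)."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def fro(X):
    """Frobenius norm."""
    return math.sqrt(sum(v * v for row in X for v in row))

def rand_matrix(rows, cols, std, rng):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def loss_and_grad(W, X, Y):
    """MSE of W @ X against Y (samples stored column-wise); returns (loss, dL/dW)."""
    n = len(X[0])
    R = add(matmul(W, X), Y, scale=-1.0)                    # residuals, d_out x n
    loss = sum(v * v for row in R for v in row) / n
    G = [[2.0 * v / n for v in row] for row in matmul(R, transpose(X))]
    return loss, G

# --- one hypothetical Bi-LoRA-style step -----------------------------------

def bi_lora_step(W0, B, A, B2, A2, X, Y, lr=0.05, lr_adv=0.05, rho=0.05):
    # Effective weight: frozen W0 + primary LoRA (B @ A) + auxiliary LoRA (B2 @ A2),
    # where the auxiliary product plays the role of SAM's weight perturbation.
    W = add(add(W0, matmul(B, A)), matmul(B2, A2))
    loss, G = loss_and_grad(W, X, Y)                        # single forward/backward pass
    # Chain rule through a product P = U @ V: dL/dU = G @ V^T, dL/dV = U^T @ G.
    gB, gA = matmul(G, transpose(A)), matmul(transpose(B), G)
    gB2, gA2 = matmul(G, transpose(A2)), matmul(transpose(B2), G)
    B, A = add(B, gB, -lr), add(A, gA, -lr)                 # primary: gradient descent
    B2, A2 = add(B2, gB2, lr_adv), add(A2, gA2, lr_adv)     # auxiliary: gradient ascent
    # Assumed constraint: rescale both factors so that ||B2 @ A2||_F <= rho.
    norm = fro(matmul(B2, A2))
    if norm > rho:
        s = math.sqrt(rho / norm)
        B2 = [[s * v for v in row] for row in B2]
        A2 = [[s * v for v in row] for row in A2]
    return B, A, B2, A2, loss

def demo(steps=300, seed=0):
    """Fit a synthetic task whose solution is W0 plus a rank-2 update."""
    rng = random.Random(seed)
    d_out, d_in, r, n = 4, 6, 2, 32
    W0 = rand_matrix(d_out, d_in, 0.5, rng)                 # frozen "pre-trained" weight
    delta = matmul(rand_matrix(d_out, r, 0.5, rng), rand_matrix(r, d_in, 0.5, rng))
    X = rand_matrix(d_in, n, 1.0, rng)
    Y = matmul(add(W0, delta), X)
    # LoRA-style init: B factors start at zero, A factors are random.
    B, A = [[0.0] * r for _ in range(d_out)], rand_matrix(r, d_in, 0.5, rng)
    B2, A2 = [[0.0] * r for _ in range(d_out)], rand_matrix(r, d_in, 0.1, rng)
    losses = []
    for _ in range(steps):
        B, A, B2, A2, loss = bi_lora_step(W0, B, A, B2, A2, X, Y)
        losses.append(loss)
    return losses, fro(matmul(B2, A2))
```

Calling `losses, pert = demo()` trains the primary module on the synthetic task; the recorded loss (measured at the perturbed weights) should drop from its starting value while `pert` stays at or below `rho`. Standard SAM spends a second forward-backward pass computing its perturbation; this sketch instead reuses one gradient for both updates, which illustrates the kind of cost saving the abstract describes. Note also that here the auxiliary module accumulates ascent steps across iterations rather than recomputing a normalized gradient at every step as SAM does; that difference is a simplification of this sketch.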
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 60.32 | 1398 |
| Dialogue | MT-Bench | MT-Bench Score | 6.26 | 41 |
| Instruction Following | BBH | -- | -- | 40 |
| Code Generation | HumanEval | Pass@1 | 27.2 | 36 |
| Instruction Following | DROP | DROP Score | 51.53 | 20 |
| Instruction Following | MMLU | MMLU Accuracy | 63.67 | 20 |
| Instruction Following | HEval | Pass@1 | 46.12 | 12 |
| Instruction Following | Instruction-following Evaluation Suite (MMLU, DROP, HEval, BBH) (test) | MMLU | 79.67 | 11 |
| Natural Language Understanding | GLUE | MNLI Accuracy | 86.33 | 4 |
| Natural Language Understanding | SuperGLUE (val) | BoolQ Accuracy | 72.25 | 4 |