
AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing

About

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method. However, its adaptation process is purely linear, which limits its expressive power and leaves a gap relative to non-linear training. To bridge this gap, we propose AFA-LoRA, a novel training strategy that brings non-linear expressivity to LoRA while preserving its seamless mergeability. Our key innovation is an annealed activation function that transitions from a non-linear to a linear transformation during training, allowing the adapter to exploit stronger representational capacity early on before converging to a mergeable linear form. We apply our method to supervised fine-tuning, reinforcement learning, and speculative decoding. The results show that AFA-LoRA narrows the performance gap between LoRA and full-parameter training. This work enables a more powerful and practical paradigm of parameter-efficient adaptation.
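For intuition, here is a minimal sketch of what such an annealed activation could look like in PyTorch. Everything here is our assumption rather than the paper's exact formulation: the class name, the choice of SiLU as the non-linearity, the placement of the activation between the two low-rank matrices, and the linear annealing schedule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnnealedLoRALinear(nn.Module):
    """Sketch of a LoRA adapter with an activation annealed from
    non-linear to linear (hypothetical implementation, not the paper's).

    The adapter output is base(x) + B(act(A(x))), where act blends
    SiLU and identity via a factor `alpha` driven from 1 (fully
    non-linear) to 0 (fully linear) over training.
    """

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # standard LoRA init: adapter starts at zero
        self.alpha = 1.0  # annealing factor: 1 = non-linear, 0 = linear

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.lora_a(x)
        # Annealed activation: convex blend of SiLU (non-linear) and identity.
        h = self.alpha * F.silu(h) + (1.0 - self.alpha) * h
        return self.base(x) + self.lora_b(h)

    def set_anneal(self, step: int, total_steps: int) -> None:
        # Hypothetical linear schedule; the actual schedule may differ.
        self.alpha = max(0.0, 1.0 - step / total_steps)
```

Once `alpha` reaches zero, the adapter path is purely linear, so the low-rank update `lora_b.weight @ lora_a.weight` can be folded into the base weight exactly as in standard LoRA.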

Jiacheng Li, Jianchao Tan, Zhidong Yang, Feiye Huo, Yerui Sun, Yuchen Xie, Xunliang Cai • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Common Sense Reasoning | 5 common-sense reasoning tasks (Llama-3-8B) | Average Accuracy: 86.34 | 27 |
| Speculative Decoding | ShareGPT Llama-3.1-8B 1.0 (test) | MT-Bench Score: 3.2124 | 10 |
