Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
About
Despite growing interest in the Mamba architecture as a potential replacement for the Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In this study, we introduce two key insight-driven contributions for PEFT in the Mamba architecture: (1) Although state-space models (SSMs) are regarded as the cornerstone of the Mamba architecture, and were therefore expected to play the primary role in transfer learning, our findings reveal that Projectors -- not SSMs -- are the predominant contributors to transfer learning. (2) Building on this observation, we propose a novel PEFT method specialized for the Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL adapts only the pretrained Projectors to new tasks through diagonal-centric linear transformation matrices, without directly fine-tuning the Projector weights. This targeted approach enables efficient task adaptation using less than 1% of the total parameters, and exhibits strong performance across both vision and language Mamba models, highlighting its versatility and effectiveness.
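The exact parameterization of ProDiaL is not spelled out here; the following is a minimal sketch of the core idea, assuming the simplest case where the transformation is a learnable diagonal matrix applied to a frozen projector weight (the class and method names are hypothetical, not from the paper's code):

```python
import numpy as np

class DiagonalProjectorAdapter:
    """Hypothetical sketch: keep the pretrained projector weight W frozen
    and learn only a diagonal transformation T, so the effective weight
    is W @ T. Only d_out parameters are trainable per projector."""

    def __init__(self, pretrained_weight):
        self.W = pretrained_weight              # frozen, shape (d_in, d_out)
        d_out = pretrained_weight.shape[1]
        self.diag = np.ones(d_out)              # trainable diagonal of T, init = identity

    def effective_weight(self):
        # W @ diag(d) scales column j of W by d[j]; broadcasting does this cheaply
        return self.W * self.diag

    def forward(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.effective_weight()
```

At initialization the adapter is the identity transformation, so the adapted model reproduces the pretrained projector exactly; training then moves only the diagonal entries, which is why the parameter count stays well under 1% of the model.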
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 38.57 | 1460 |
| Image Classification | StanfordCars | Accuracy | 85.38 | 266 |
| Science Question Answering | ARC Challenge | Accuracy | 30.46 | 234 |
| Commonsense Reasoning | WinoGrande | Accuracy | 53.83 | 231 |
| Science Question Answering | ARC Easy | Accuracy | 53.45 | 101 |
| Image Classification | Caltech | Accuracy | 97.16 | 98 |
| Question Answering | WinoGrande (WG) | Accuracy | 61.96 | 98 |
| Image Classification | Flowers | Accuracy | 88 | 83 |
| Natural Language Understanding | ARC-C | Accuracy | 30.8 | 20 |
| Natural Language Understanding | ARC Easy | Accuracy | 55.18 | 20 |