Parameter-Efficient Fine-Tuning of State Space Models

About

Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely underexplored. We start by investigating two fundamental questions on existing PEFT methods: (i) How do they perform on SSM-based models? (ii) Which parameters should they target for optimal results? Our analysis shows that LoRA and its variants consistently outperform all other PEFT methods. While LoRA is effective for linear projection matrices, it fails on SSM modules, yet it still outperforms other methods applicable to SSMs, indicating their limitations. This underscores the need for a specialized SSM tuning approach. To address this, we propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules. Combining SDT for SSMs with LoRA for linear projection matrices, we achieve state-of-the-art performance across extensive experiments.
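
For readers unfamiliar with LoRA on linear projection matrices, the following is a minimal NumPy sketch of the standard LoRA formulation: a frozen weight W plus a trainable low-rank update scaled by alpha / r. It is not the authors' implementation. The class name `LoRALinear`, the rank, the scaling factor, and the layer sizes are assumptions chosen for illustration. SDT is not sketched here because the abstract does not describe its mechanics.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration class, not taken from the paper's code.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return self.A.size + self.B.size


# Assumed sizes for illustration: a 32 -> 64 projection with rank 4.
W = np.random.default_rng(1).normal(size=(64, 32))
layer = LoRALinear(W, rank=4)
x = np.ones((2, 32))

print(layer(x).shape)             # (2, 64)
print(layer.trainable_params())   # 384 trainable values
print(W.size)                     # 2048 values in the frozen weight
```

Because B starts at zero, the adapted layer initially reproduces the frozen projection exactly, and only the 384 values in A and B would change during fine-tuning.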

Kevin Galim, Wonjun Kang, Yuchen Zeng, Hyung Il Koo, Kangwook Lee • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Text-to-SQL	Spider	Exec Acc (All)	51.8	139
Text-to-SQL	Spider 1.0 (test)	EM Acc (Overall)	59.7	110
Abstractive Summarization	SamSum	ROUGE-2	26	73
Data-to-text generation	DART	BLEU	46.2	16
Dialogue Summarization	SAMSum 1.0 (test)	R1	50.4	11
Text-to-SQL	Spider 1.0 (val)	Accuracy (All)	54.3	11
Natural Language Understanding	GLUE	GLUE Average Score	77.4	11
Data-to-Text	DART 1.0 (test)	METEOR	65.3	11
