Study of Training Dynamics for Memory-Constrained Fine-Tuning
About
Memory-efficient training of deep neural networks has become increasingly important as models grow larger while deployment environments impose strict resource constraints. We propose TraDy, a novel transfer learning scheme leveraging two key insights: layer importance for updates is architecture-dependent and can be determined a priori, while dynamic stochastic channel selection provides a superior gradient approximation compared to static approaches. We introduce a dynamic channel selection approach that stochastically resamples channels between epochs within preselected layers. Extensive experiments demonstrate that TraDy achieves state-of-the-art performance across various downstream tasks and architectures under strict memory constraints, reaching up to 99% activation sparsity, 95% weight-derivative sparsity, and a 97% reduction in FLOPs for weight-derivative computation.
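The core mechanism described above, stochastically resampling a small subset of channels to update in each epoch within preselected layers, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the keep ratio, and the uniform sampling distribution are all assumptions introduced here for clarity.

```python
import numpy as np

def sample_channel_mask(num_channels, keep_ratio, rng):
    """Stochastically select a subset of output channels to update this epoch.

    keep_ratio is illustrative; a low ratio yields high weight-derivative sparsity.
    """
    k = max(1, int(round(keep_ratio * num_channels)))
    idx = rng.choice(num_channels, size=k, replace=False)
    mask = np.zeros(num_channels, dtype=bool)
    mask[idx] = True
    return mask

def masked_weight_grad(full_grad, mask):
    """Zero the weight gradient for unselected output channels (axis 0).

    In a real training loop the masked gradients would never be computed,
    which is where the FLOP and memory savings come from; here we mask
    a dense gradient only to illustrate the resulting sparsity pattern.
    """
    sparse = np.zeros_like(full_grad)
    sparse[mask] = full_grad[mask]
    return sparse

rng = np.random.default_rng(0)
# Toy conv-layer weight gradient: (out_channels, in_channels, kH, kW).
grad = rng.standard_normal((64, 32, 3, 3))

# Dynamic selection: a fresh mask is drawn at every epoch, so over training
# all channels of the preselected layer have a chance of being updated.
for epoch in range(3):
    mask = sample_channel_mask(64, 0.05, rng)
    sparse_grad = masked_weight_grad(grad, mask)
```

With a keep ratio of 0.05, only 3 of the 64 output channels carry a nonzero weight derivative in any given epoch, i.e. roughly 95% weight-derivative sparsity, matching the order of sparsity reported in the abstract.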
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | -- | 3518 |
| Image Classification | CIFAR-10 (test) | -- | -- | 3381 |
| Image Classification | Flowers (test) | Accuracy | 90 | 87 |
| Image Classification | Food (test) | Accuracy | 84.76 | 50 |
| Image Classification | Pets (test) | -- | -- | 36 |
| Image Classification | CUB (test) | Top-1 Accuracy | 75.89 | 31 |
| Natural Language Understanding | GLUE (test) | QNLI Score | 91.36 | 26 |
| Visual Wake Words Classification | Visual Wake Words (test) | Accuracy | 93.83 | 21 |
| Image Classification | Average (CIFAR, CUB, Flowers, Food, Pets, VWW) | Top-1 Accuracy | 88.48 | 13 |
| Image Classification | VWW (test) | Top-1 Accuracy | 88.76 | 2 |