Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning

About

Federated Learning (FL) is an emerging machine learning framework that enables multiple clients, coordinated by a server, to collaboratively train a global model by aggregating locally trained models without sharing any client's training data. Recent works have observed that federated training may lead the aggregated global model to converge to a 'sharp minimum', which hurts the generalization of the FL-trained model. In this work, we therefore aim to improve the generalization performance of models trained in a federated setup by introducing a 'flatness'-constrained FL optimization problem. The flatness constraint is imposed on the top eigenvalue of the Hessian computed from the training loss. Because each client trains a model on its own local data, we re-formulate this complex problem in terms of the client loss functions and propose a new computationally efficient regularization technique, dubbed 'MAN', which Minimizes the Activation Norm of each layer in the client-side models. We also show theoretically that minimizing the activation norm reduces the top eigenvalue of the layer-wise Hessian of the client's loss, which in turn decreases the top eigenvalue of the overall Hessian, ensuring convergence to a flat minimum. We apply the proposed flatness-constrained optimization to existing FL techniques and obtain significant improvements, thereby establishing a new state of the art.

M Yashwanth, Gaurav Kumar Nayak, Harsh Rangwani, Arya Singh, R. Venkatesh Babu, Anirban Chakraborty • 2025
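
To make the idea of a layerwise activation-norm regularizer concrete, here is a minimal NumPy sketch of a client-side loss with such a penalty added. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the regularizer, so the penalty below (the batch-mean L2 norm of each hidden layer's activations, summed over layers) and the names `client_loss`, `activation_norm_penalty`, and `zeta` are hypothetical. Only the client objective is sketched; server-side aggregation is left to whichever existing FL method is being used.

```python
import numpy as np


def forward_with_activations(x, weights):
    """Run a bias-free ReLU MLP; return (logits, list of hidden activations)."""
    activations = []
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)  # hidden layer followed by ReLU
        activations.append(h)
    logits = h @ weights[-1]  # final linear layer, no nonlinearity
    return logits, activations


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over the batch (numerically stabilized)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def activation_norm_penalty(activations):
    """Assumed penalty: per-sample L2 norm, averaged over the batch,
    then summed over layers."""
    return sum(np.linalg.norm(a, axis=1).mean() for a in activations)


def client_loss(x, labels, weights, zeta=0.1):
    """Task loss plus zeta times the layerwise activation-norm penalty.
    `zeta` is a hypothetical regularization strength."""
    logits, activations = forward_with_activations(x, weights)
    return cross_entropy(logits, labels) + zeta * activation_norm_penalty(activations)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 5))            # 8 samples, 5 features
    labels = rng.integers(0, 3, size=8)    # 3 classes
    weights = [rng.normal(size=(5, 16)) * 0.1,
               rng.normal(size=(16, 16)) * 0.1,
               rng.normal(size=(16, 3)) * 0.1]
    print("task loss only:  ", client_loss(x, labels, weights, zeta=0.0))
    print("with norm penalty:", client_loss(x, labels, weights, zeta=0.1))
```

In a real client update, the penalized loss would be minimized with the client's usual optimizer. In a deep-learning framework, the hidden activations would typically be collected during the forward pass rather than by a hand-written loop as here.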

Related benchmarks

Task	Dataset	Result
Image Classification	Cifar10 Dirichlet(0.3) (test)	Test Accuracy 84.82	21
Image Classification	CIFAR10 0.6-Dirichlet (test)	--	18
Federated Learning Classification	CIFAR-100 non-iid Dirichlet 0.6 (test)	Accuracy 55.84	12
Federated Learning Classification	CIFAR-100 IID (test)	Accuracy 56.77	12
Federated Learning Classification	Tiny-ImageNet non-iid Dirichlet 0.3 (test)	Accuracy 35.7	12
Federated Learning Classification	Tiny-ImageNet non-iid Dirichlet 0.6 (test)	Accuracy 36.07	12
Federated Learning Classification	Tiny-ImageNet IID (test)	Accuracy 36.53	12
Federated Learning Classification	CIFAR-100 non-iid delta=0.3	Accuracy 55.27	12
