
Little is Enough: Boosting Privacy by Sharing Only Hard Labels in Federated Semi-Supervised Learning

About

In many critical applications, sensitive data is inherently distributed and cannot be centralized due to privacy concerns. A wide range of federated learning approaches have been proposed to train models locally at each client without sharing their sensitive data, typically by exchanging model parameters, probabilistic predictions (soft labels) on a public dataset, or a combination of both. However, these methods still disclose private information and restrict local models to those that can be trained using gradient-based methods. We propose a federated co-training (FedCT) approach that improves privacy by sharing only definitive (hard) labels on a public unlabeled dataset. Clients use a consensus of these shared labels as pseudo-labels for local training. This federated co-training approach empirically enhances privacy without compromising model quality. In addition, it allows the use of local models that are not suitable for parameter aggregation in traditional federated learning, such as gradient-boosted decision trees, rule ensembles, and random forests. Furthermore, we observe that FedCT performs effectively in federated fine-tuning of large language models, where its pseudo-labeling mechanism is particularly beneficial. Empirical evaluations and theoretical analyses suggest its applicability across a range of federated learning scenarios.
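The consensus step described above can be illustrated with a minimal sketch. It assumes the consensus is a per-example majority vote over the clients' hard labels; the abstract says only "a consensus", so the voting rule, the tie-break toward the smallest label, and the names `consensus_labels` and `client_labels` are all illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def consensus_labels(client_labels: Sequence[Sequence[int]]) -> List[int]:
    """Combine hard labels from several clients into one pseudo-label per example.

    client_labels[i][j] is the hard label that client i assigns to public
    example j. Majority vote is an assumed consensus rule; ties are broken
    toward the smallest label (also an assumption).
    """
    if not client_labels:
        raise ValueError("need labels from at least one client")
    n_examples = len(client_labels[0])
    if any(len(labels) != n_examples for labels in client_labels):
        raise ValueError("every client must label the same public examples")

    consensus = []
    # zip(*...) regroups the labels so each tuple holds all votes for one example.
    for votes in zip(*client_labels):
        counts = Counter(votes)
        top = max(counts.values())
        consensus.append(min(label for label, c in counts.items() if c == top))
    return consensus


# Three hypothetical clients label four public examples.
client_labels = [
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [2, 1, 2, 0],
]
pseudo_labels = consensus_labels(client_labels)
print(pseudo_labels)  # [0, 1, 2, 1]
```

Each client would then add the public examples with these pseudo-labels to its local training set. Because only integer labels are exchanged, the local model can be any classifier, including one with no parameters to average.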

Amr Abourayya, Jens Kleesiek, Kanishka Rao, Erman Ayday, Bharat Rao, Geoff Webb, Michael Kamp • 2023

Related benchmarks

Task                  Dataset              Result               Rank
Image Classification  TinyImageNet (test)  Accuracy 40.7        499
Image Classification  CIFAR-100            Accuracy 42          357
Image Classification  CIFAR100             Accuracy 34.5        301
Image Classification  CIFAR10              Accuracy (%) 76.9    282
Image Classification  EMNIST (test)        Accuracy 91.5        238
Image Classification  CIFAR10              Accuracy 77.2        143
Image Classification  TinyImageNet         Accuracy 35.9        135
Image Classification  CIFAR100 (test)      Accuracy 39.3        98
Image Classification  EMNIST               Accuracy 91.2        90
