
Decoupled Training with Local Reinforcement Fine-Tuning in Federated Learning

About

Federated Learning (FL) with pre-trained Vision-Language Models (VLMs) has emerged as a promising paradigm for various downstream tasks. By leveraging the strong representations of these models, recent studies improve task adaptation under insufficient local data while preserving generalization. However, these methods emphasize fully local optimization with simple parameter aggregation, which can amplify inter-client optimization inconsistency and intra-client over-specialization under heterogeneous and full-data FL settings, making it difficult to balance global task adaptation and generalization. To address these challenges, we propose FedDTL, a novel federated VLM framework that decouples the image encoder and text encoder across clients and the server. Through decoupled encoder training with server-client modality alignment, FedDTL promotes coherent global semantic updates and reduces inter-client optimization inconsistency, improving global task adaptation. To further mitigate intra-client over-specialization, we introduce a two-stage local fine-tuning scheme, in which a supervised fine-tuning stage provides a rapid and reliable warm start, followed by a reinforcement learning stage that enhances generalization. Extensive experiments on multiple benchmarks, covering label skew and feature shift, demonstrate that FedDTL achieves an effective balance between global task adaptation and generalization under various FL data distributions in both few-shot and full-data regimes.

Yuting Ma, Lechao Cheng, Xiaohua Xu • 2026
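
For context, the "simple parameter aggregation" that the abstract contrasts FedDTL against can be illustrated with a sample-weighted average of client parameters, in the style of FedAvg. This is a minimal sketch under stated assumptions, not the FedDTL method or any baseline's actual implementation: the abstract does not say which aggregation rule prior methods use, and the function name `weighted_average`, the parameter layout, and the toy values are all hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def weighted_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count.

    Assumption: FedAvg-style weighting is used here purely as an example of
    "simple parameter aggregation"; the source does not specify the rule.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example: two clients holding 1 and 3 samples respectively.
clients = [
    {"w": [1.0, 2.0]},
    {"w": [3.0, 6.0]},
]
sizes = [1, 3]
global_params = weighted_average(clients, sizes)
print(global_params)
```

Because each client's parameters come from its own local optimization, an average of this kind simply blends the locally optimized parameters without reconciling them, which is the setting in which the abstract argues that inter-client inconsistency can be amplified under heterogeneous data.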

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | DomainNet (test) | Average Accuracy | 93.47 | 266
Federated Few-shot Image Classification | CIFAR10, CIFAR100, EuroSAT, Tiny-ImageNet, OxfordPet, Flower102, Caltech101, Caltech256, Food101 (Local classes) | Accuracy | 92.58 | 69
Image Classification | Office-Caltech-10 (test) | Average Accuracy | 98.65 | 58
Image Classification | Aggregate of 9 benchmarks (CIFAR10, CIFAR100, EuroSAT, OxfordPet, Flowers102, Food101, SUN397, DTD, Caltech101), Few-shot | Local Top-1 Accuracy | 92.58 | 35
Image Classification | Aggregate of 9 benchmarks (CIFAR10, CIFAR100, EuroSAT, OxfordPet, Flowers102, Food101, SUN397, DTD, Caltech101), Full-data | Average Local Top-1 Accuracy | 94.07 | 35
Federated Few-shot Image Classification | CIFAR10, CIFAR100, EuroSAT, Tiny-ImageNet, OxfordPet, Flower102, Caltech101, Caltech256, Food101 (Base classes) | Accuracy | 92.58 | 18