
Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples

About

With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution in deep learning. Model providers release pre-trained encoders that serve as versatile feature extractors, letting downstream users benefit from large models with minimal effort through fine-tuning. However, recent work has shown that pre-trained encoders are vulnerable to downstream-agnostic adversarial examples (DAEs) crafted by attackers. The open question is whether downstream models can be made robust against DAEs, particularly when the pre-trained encoders are publicly accessible to attackers. In this paper, we first examine existing defenses against adversarial examples within the pre-training paradigm. Our findings reveal that current defenses fail because of the domain shift between pre-training data and downstream tasks, as well as the sensitivity of encoder parameters. In response to these challenges, we propose Genetic Evolution-Nurtured Adversarial Fine-tuning (Gen-AF), a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models. Our extensive experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.

Ziqi Zhou, Minghui Li, Wei Liu, Shengshan Hu, Yechao Zhang, Wei Wan, Lulu Xue, Leo Yu Zhang, Dezhong Yao, Hai Jin • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Classification | GTSRB | Natural Accuracy: 75.92 | 87 |
| Image Classification | GTSRB (test) | Accuracy (Clean): 81.14 | 59 |
| Image Classification | STL-10 (test) | Accuracy (Benign): 47.2 | 11 |
| Image Classification | CIFAR-10 | Benign Accuracy: 51.73 | 11 |
| Image Classification | STL-10 | Accuracy (Benign): 43.1 | 11 |
| Image Classification | CIFAR-100 | Benign Accuracy (BA): 23.18 | 11 |
| Adversarial Robustness | STL10 (test) | Baseline Acc: 69.09 | 10 |
