Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising

About

Non-transferable learning (NTL) has been proposed to protect model intellectual property (IP) by creating a "non-transferable barrier" that restricts generalization from authorized to unauthorized domains. Recently, a well-designed attack, which restores unauthorized-domain performance by fine-tuning NTL models on a few authorized samples, has highlighted the security risks of NTL-based applications. However, such an attack requires modifying model weights and is therefore invalid in the black-box scenario. This raises a critical question: can we trust the security of NTL models deployed as black-box systems? In this work, we reveal the first loophole of black-box NTL models by proposing a novel attack method (dubbed JailNTL) that jailbreaks the non-transferable barrier through test-time data disguising. The main idea of JailNTL is to disguise unauthorized data so that the NTL model identifies it as authorized, thereby bypassing the non-transferable barrier without modifying the NTL model weights. Specifically, JailNTL encourages unauthorized-domain disguising at two levels: (i) data-intrinsic disguising (DID), which eliminates domain discrepancy and preserves class-related content at the input level, and (ii) model-guided disguising (MGD), which mitigates output-level statistical differences of the NTL model. Empirically, when attacking state-of-the-art (SOTA) NTL models in the black-box scenario, JailNTL achieves an accuracy increase of up to 55.7% in the unauthorized domain using only 1% of the authorized samples, largely exceeding existing SOTA white-box attacks.

Yongli Xiang, Ziming Hong, Lina Yao, Dadong Wang, Tongliang Liu • 2025
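
The abstract's central idea, changing the test-time input instead of the model weights, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how DID or MGD are built, so the per-feature mean/std matching below is only a stand-in for an input-level disguise, and the top-class confidence gap is only one possible output-level statistic. All names (`toy_black_box`, `fit_disguise`, `confidence_gap`) and the data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Vector = List[float]


def toy_black_box(x: Vector) -> Vector:
    """Hypothetical NTL-like model, query access only.

    It is confident only when the input looks like the authorized domain
    (feature mean near 0); otherwise it returns an uninformative output.
    """
    if abs(sum(x) / len(x)) < 1.0:
        return [0.9, 0.1] if x[0] >= x[1] else [0.1, 0.9]
    return [0.5, 0.5]


def fit_disguise(authorized: Sequence[Vector],
                 unauthorized: Sequence[Vector]) -> Callable[[Vector], Vector]:
    """Stand-in input-level disguise (an assumption, not the paper's DID):
    shift and scale each unauthorized feature to the authorized mean/std."""
    dims = range(len(authorized[0]))
    a_mu = [mean([x[d] for x in authorized]) for d in dims]
    a_sd = [pstdev([x[d] for x in authorized]) or 1.0 for d in dims]
    u_mu = [mean([x[d] for x in unauthorized]) for d in dims]
    u_sd = [pstdev([x[d] for x in unauthorized]) or 1.0 for d in dims]

    def disguise(x: Vector) -> Vector:
        return [(x[d] - u_mu[d]) / u_sd[d] * a_sd[d] + a_mu[d] for d in dims]

    return disguise


def confidence_gap(model: Callable[[Vector], Vector],
                   authorized: Sequence[Vector],
                   candidates: Sequence[Vector]) -> float:
    """Stand-in output-level statistic: absolute gap in mean top-class
    confidence between authorized inputs and candidate inputs."""
    def avg_conf(xs: Sequence[Vector]) -> float:
        return mean([max(model(x)) for x in xs])
    return abs(avg_conf(authorized) - avg_conf(candidates))


# Toy data: the unauthorized domain is the authorized one shifted by +5.
authorized = [[1.0, -1.0], [-1.0, 1.0], [0.5, -0.5], [-0.5, 0.5]]
unauthorized = [[6.0, 4.0], [4.0, 6.0], [5.5, 4.5], [4.5, 5.5]]

disguise = fit_disguise(authorized, unauthorized)
disguised = [disguise(x) for x in unauthorized]

# The model is only queried; its weights are never read or modified.
gap_before = confidence_gap(toy_black_box, authorized, unauthorized)
gap_after = confidence_gap(toy_black_box, authorized, disguised)
print("confidence gap before disguise:", gap_before)
print("confidence gap after disguise:", gap_after)
```

In this toy setting the disguise removes the domain shift, so the black-box model treats the disguised inputs like authorized ones and the output-level gap shrinks. A real attack would need a learned disguising model rather than simple moment matching.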

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Attacking Non-Transferable Learning | STL10 to CIFAR10 (test) | Accuracy (Authorized Domain) | 85.6 | 10 |
| Attacking Non-Transferable Learning | VisDA-T to VisDA-V (test) | Authorized Domain Accuracy | 93.6 | 10 |
| Attacking Non-Transferable Learning | CIFAR10 to STL10 (test) | Accuracy (Authorized) | 82.5 | 10 |
| Image Classification | CIFAR10 Authorized domain | Accuracy | 80.9 | 10 |
| Image Classification | STL10 Unauthorized domain | Accuracy | 63 | 5 |
| Image Classification | CIFAR10 Unauthorized domain | Accuracy | 38.8 | 4 |
| Image Classification | STL10 Authorized domain | Accuracy | 0.864 | 3 |
| Image Classification | VisDA-V Unauthorized domain | Accuracy | 20.9 | 3 |
| Image Classification | VisDA-T Authorized domain | Accuracy | 90.9 | 2 |
