Sampling-based Pseudo-Likelihood for Membership Inference Attacks

About

Large Language Models (LLMs) are trained on large-scale web data, which makes it difficult to grasp the contribution of each text. This poses the risk of leaking inappropriate data such as benchmarks, personal information, and copyrighted texts in the training data. Membership Inference Attacks (MIAs), which determine whether a given text is included in a model's training data, have been attracting attention. Previous studies of MIAs revealed that likelihood-based classification is effective for detecting leaks in LLMs. However, the existing methods cannot be applied to some proprietary models like ChatGPT or Claude 3 because the likelihood is unavailable to the user. In this study, we propose a Sampling-based Pseudo-Likelihood (SPL) method for MIA (SaMIA) that calculates SPL using only the text generated by an LLM to detect leaks. SaMIA treats the target text as the reference text and multiple outputs from the LLM as text samples, calculates the degree of n-gram match as SPL, and determines the membership of the text in the training data. Even without likelihoods, SaMIA performed on par with existing likelihood-based methods.
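The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes whitespace tokenization, a recall-style n-gram overlap with clipped counts, a plain average over samples, and an arbitrary decision threshold of 0.5. The function names (`ngram_recall`, `pseudo_likelihood`, `is_member`) are hypothetical, and the sampled outputs are passed in as plain strings rather than obtained from a real model.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(reference, sample, n=1):
    """Fraction of the reference's n-grams that also occur in the sample.

    Assumption: whitespace tokenization and clipped counts (each reference
    n-gram is matched at most as often as it appears in the sample).
    """
    ref = ngrams(reference.split(), n)
    hyp = ngrams(sample.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, hyp[gram]) for gram, count in ref.items())
    return matched / total


def pseudo_likelihood(target_text, samples, n=1):
    """Average n-gram match between the target text and each sampled output."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(ngram_recall(target_text, s, n) for s in samples) / len(samples)


def is_member(target_text, samples, n=1, threshold=0.5):
    """Flag the target as a likely training member if the score reaches the
    threshold. The threshold value is an illustrative assumption."""
    return pseudo_likelihood(target_text, samples, n) >= threshold
```

In use, `samples` would hold several outputs generated by the model under test, and a higher score would suggest that the model reproduces the target text more closely. How the samples are prompted and how the threshold is chosen are left open here, since the abstract does not specify them.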

Masahiro Kaneko, Youmi Ma, Yuki Wata, Naoaki Okazaki • 2024

Related benchmarks

Task	Dataset	Result
Membership Inference Attack	WikiMIA length 64	AUC 0.68	84
Membership Inference Attack	Wikipedia	AUC 0.521	75
Membership Inference Attack	WikiMIA length 32	--	54
Membership Inference Attack	arXivReasoning Sequence-level	ACC 54.9	43
Membership Inference Attack	WikiMIA-25	AUC 0.657	33
Membership Inference Attack	WikiMIA length 128	AUC 0.7	28
Membership Inference Attack	WikiMIA	AUC 71	24
Membership Inference Attack	WikiMIA length 256	AUC (MIA) 0.8	24
Membership Inference Attack	BookReasoning Sequence-level	ACC 54.9	4
Membership Inference Attack	BookReasoning Document-level	ACC 63.8	3

