DE-COP: Detecting Copyrighted Content in Language Models Training Data

About

How can we detect if copyrighted content was used in the training process of a language model, considering that the training data is typically undisclosed? We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP's core approach is to probe an LLM with multiple-choice questions whose options include both a verbatim excerpt and its paraphrases. We construct BookTection, a benchmark with excerpts from 165 books published prior and subsequent to a model's training cutoff, along with their paraphrases. Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP also achieves an average accuracy of 72% for detecting suspect books on fully black-box models where prior methods give approximately 4% accuracy. The code and datasets are available at https://github.com/LeiLiLab/DE-COP.
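The multiple-choice probing idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the linked repository for that). The prompt wording, the use of four options per question, and the names `build_question`, `verbatim_hit_rate`, and `ask_model` are all assumptions made for illustration; `ask_model` stands in for whatever function sends a prompt to the model under test and returns its answer as text.

```python
import random
from typing import Callable, List, Sequence, Tuple

LETTERS = "ABCD"  # assumption: at most four options per question


def build_question(verbatim: str, paraphrases: Sequence[str],
                   rng: random.Random) -> Tuple[str, str]:
    """Return (prompt, correct_letter) for one multiple-choice question."""
    options: List[str] = [verbatim, *paraphrases]
    if len(options) > len(LETTERS):
        raise ValueError("too many options for the available letters")
    # Shuffle so the verbatim excerpt does not always sit in the same slot.
    rng.shuffle(options)
    correct = LETTERS[options.index(verbatim)]
    # Hypothetical prompt wording, not taken from the paper.
    lines = ["Which of the following passages is the verbatim excerpt?"]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines), correct


def verbatim_hit_rate(items: Sequence[Tuple[str, Sequence[str]]],
                      ask_model: Callable[[str], str],
                      seed: int = 0) -> float:
    """Fraction of questions where the model picks the verbatim option."""
    if not items:
        raise ValueError("items must not be empty")
    rng = random.Random(seed)
    hits = 0
    for verbatim, paraphrases in items:
        prompt, correct = build_question(verbatim, paraphrases, rng)
        # Keep only the first character of the reply as the chosen letter.
        answer = ask_model(prompt).strip().upper()[:1]
        hits += int(answer == correct)
    return hits / len(items)
```

With four options, a model that cannot tell the excerpt from its paraphrases would be expected to score near 25% under this sketch, so a hit rate well above that on a given book is the kind of signal the abstract describes. How such a rate is turned into a detection decision is not specified here and is left out of the sketch.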

André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li • 2024

Related benchmarks

Task	Dataset	Result
Membership Inference Attack	Wikipedia	AUC 0.516	75
Membership Inference Attack	arXiv	AUC 54.8	55
Membership Inference Attack	arXivReasoning Sequence-level	ACC 62	43
Membership Inference	WikiMIA 24	--	10
Membership Inference	WikiMIA Hard 2024	--	10
Membership Inference	WikiMIA	--	7
Membership Inference Attack	TÜLU	--	7
Membership Inference Attack	arXivReasoning Document-level	ACC 84	4
Membership Inference Attack	The Pile (test)	--	4
Membership Inference Attack	DOLMa corpus	--	4

