Rethinking Machine Unlearning for Large Language Models

About

We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities, while maintaining the integrity of essential knowledge generation and not affecting causally unrelated information. We envision LLM unlearning becoming a pivotal element in the life-cycle management of LLMs, potentially standing as an essential foundation for developing generative AI that is not only safe, secure, and trustworthy, but also resource-efficient without the need for full retraining. We navigate the unlearning landscape in LLMs across conceptual formulation, methodologies, metrics, and applications. In particular, we highlight the often-overlooked aspects of existing LLM unlearning research, e.g., unlearning scope, data-model interaction, and multifaceted efficacy assessment. We also draw connections between LLM unlearning and related areas such as model editing, influence functions, model explanation, adversarial training, and reinforcement learning. Furthermore, we outline an effective assessment framework for LLM unlearning and explore its applications in copyright and privacy safeguards and sociotechnical harm reduction.

Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu • 2024

Related benchmarks

Task	Dataset	Result
Multi-task Language Understanding	MMLU	MMLU Accuracy: 42.6	456
Instruction Following	Retain Set	Instruction Following Accuracy: 0.00e+0	12
Question Answering	Retain Set	NC Score: 53	12
Machine Unlearning	TOFU Forget10 Llama-3.2-1B	MEM Score: 18.49	12
Machine Unlearning	LLaMA-3-8B Unlearning Evaluation Suite (test)	Accuracy: 27.81	6
Machine Unlearning	Yelp (20 folds)	Running Time (s): 25.6	6

