
BLUR: A Bi-Level Optimization Approach for LLM Unlearning

About

Enabling large language models (LLMs) to unlearn knowledge and capabilities acquired during training has proven vital for ensuring compliance with data regulations and promoting ethical practices in generative AI. Although there is growing interest in developing various unlearning algorithms, it remains unclear how best to formulate the unlearning problem. The most popular formulation uses a weighted sum of the forget and retain losses, but it often leads to performance degradation due to the inherent trade-off between the two. In this work, we argue that it is important to model the hierarchical structure of the unlearning problem, where the forget problem (which unlearns certain knowledge and/or capabilities) takes priority over the retain problem (which preserves model utility). This hierarchical structure naturally leads to a bi-level optimization formulation in which the lower-level objective focuses on minimizing the forget loss, while the upper-level objective aims to maintain the model's utility. Based on this new formulation, we propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which not only possesses strong theoretical guarantees but, more importantly, delivers superior performance. In particular, our extensive experiments demonstrate that BLUR consistently outperforms all state-of-the-art algorithms across various unlearning tasks, models, and metrics. Code is available at https://github.com/OptimAI-Lab/BLURLLMUnlearning.
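The hierarchical structure described in the abstract can be written as a bi-level program. A sketch using illustrative symbols (the loss names below are placeholders for exposition, not notation taken from the paper): the upper level optimizes retained utility only over the set of forget-loss minimizers, rather than trading the two losses off in a weighted sum.

```latex
\min_{\theta \in \mathcal{S}} \; \ell_{\mathrm{retain}}(\theta)
\qquad \text{s.t.} \qquad
\mathcal{S} \;=\; \operatorname*{arg\,min}_{\theta} \; \ell_{\mathrm{forget}}(\theta)
```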

Hadi Reisizadeh, Jinghan Jia, Zhiqi Bu, Bhanukiran Vinzamuri, Anil Ramakrishna, Kai-Wei Chang, Volkan Cevher, Sijia Liu, Mingyi Hong• 2025
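To see why the hierarchical (bi-level) view differs from a weighted sum, consider a toy 1-D example with a penalty-style surrogate. This is a minimal illustration with made-up quadratic losses and a plain gradient-descent loop, not BLUR's actual algorithm: as the penalty weight on the forget loss grows, the solution approaches the lower-level (forget-optimal) minimizer, whereas a small weight yields a compromise between the two objectives.

```python
# Toy illustration of prioritizing the forget loss via a penalty
# surrogate. Both losses are hypothetical, chosen for clarity.

def forget_loss_grad(theta):
    # gradient of (theta - 2)^2, minimized at theta = 2
    return 2.0 * (theta - 2.0)

def retain_loss_grad(theta):
    # gradient of theta^2, minimized at theta = 0
    return 2.0 * theta

def penalty_descent(lam, lr=0.005, steps=5000):
    """Gradient descent on retain + lam * forget.

    Large lam approximates the bi-level solution, where the
    forget loss takes priority over the retain loss.
    """
    theta = 0.0
    for _ in range(steps):
        grad = retain_loss_grad(theta) + lam * forget_loss_grad(theta)
        theta -= lr * grad
    return theta

# Weighted sum with equal weights: a compromise, theta near 1.0.
print(round(penalty_descent(lam=1.0), 3))
# Heavy penalty on the forget loss: theta approaches the
# forget-optimal point 2 (closed form: 2*lam / (1 + lam)).
print(round(penalty_descent(lam=100.0), 3))
```

The closed-form minimizer of the surrogate is 2λ/(1+λ), so the compromise point moves continuously toward the forget-optimal solution as λ grows; the bi-level formulation makes that priority explicit instead of leaving it to a hand-tuned weight.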

Related benchmarks

Task | Dataset | Metric | Result | Rank
Instruction Following | IFEval | IFEval Accuracy | 31.3 | 625
Multi-task Language Understanding | MMLU | Accuracy | 57.1 | 321
Question Answering | TruthfulQA | Accuracy | 39.2 | 152
Natural Language Inference | MNLI | -- | -- | 80
Natural Language Inference | QNLI | Accuracy | 68 | 61
Safety Alignment | WildJailbreak | Safe@1 | 54.4 | 24
Language Modeling | MMLU | MMLU Final Performance | 46 | 23
Question Answering | TruthfulQA | TruthfulQA | 27.1 | 22
Safety Alignment | StrongREJECT | -- | -- | 18
Text-to-Audio | COSE (test) | Accuracy | 53.6 | 6

(Showing 10 of 28 rows.)
