
UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models

About

Mitigating the retention of sensitive or private information in large language models is essential for enhancing privacy and safety. Existing unlearning methods, such as Gradient Ascent and Negative Preference Optimization, directly tune models to remove unwanted information. However, these methods often become unstable because they fine-tune by maximizing cross-entropy loss, the opposite of traditional loss minimization in learning. This reversal creates instability, especially on larger datasets, as the model struggles to balance unlearning with maintaining language capacity, leading to over-unlearning. In this paper, we introduce UNDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method. Our approach leverages self-distillation to adjust logits and selectively reduce the influence of targeted tokens. This technique ensures smooth convergence and avoids catastrophic forgetting, even in challenging unlearning tasks with large datasets and sequential unlearning requests. Extensive experiments show that UNDIAL achieves both robust unlearning and scalability while maintaining stable training dynamics and resilience to hyperparameter tuning.
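
The abstract describes the mechanism only at a high level. Below is a minimal, illustrative PyTorch sketch of self-distillation on adjusted logits, not the authors' implementation: the function name `undial_loss`, the `gamma` default, and the tensor shapes are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def undial_loss(student_logits: torch.Tensor,
                teacher_logits: torch.Tensor,
                target_ids: torch.Tensor,
                gamma: float = 5.0) -> torch.Tensor:
    """Self-distillation loss on adjusted logits (illustrative sketch).

    student_logits: (batch, seq_len, vocab) logits of the model being unlearned
    teacher_logits: (batch, seq_len, vocab) logits of a frozen copy of the
                    original model (the "teacher" in self-distillation)
    target_ids:     (batch, seq_len) ids of the tokens to be unlearned
    gamma:          strength of the logit adjustment (hypothetical default)
    """
    vocab_size = teacher_logits.size(-1)

    # Reduce the teacher's logit on each targeted token by gamma, which
    # selectively lowers that token's post-softmax probability while
    # leaving the rest of the distribution essentially intact.
    one_hot = F.one_hot(target_ids, num_classes=vocab_size).to(teacher_logits.dtype)
    adjusted_logits = teacher_logits.detach() - gamma * one_hot

    # Distill the student toward the adjusted distribution. Note this
    # remains a standard loss *minimization*, unlike gradient-ascent-style
    # unlearning, which maximizes the training loss.
    teacher_probs = F.softmax(adjusted_logits, dim=-1)
    log_student = F.log_softmax(student_logits, dim=-1)
    return -(teacher_probs * log_student).sum(dim=-1).mean()
```

Because the objective stays an ordinary cross-entropy minimization against a soft target, training inherits the smooth convergence of standard distillation rather than the instability of loss maximization, which is the property the abstract attributes to UNDIAL.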

Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan Vulić • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|------|---------|--------|--------|------|
| Multi-task Language Understanding | MMLU | – | – | 842 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy | 59.6 | 76 |
| Language Understanding | MMLU | MMLU Score | 60.2 | 45 |
| Machine Unlearning | RWKU Llama 3.1 8B (Forget Set) | FB Score | 57.7 | 39 |
| Machine Unlearning | MUSE-News Llama 2 7B | Privacy Leakage | -99.75 | 27 |
| Machine Unlearning | MUSE-Books Harry Potter v1.0 (Overall) | R-Forget | 28.2 | 17 |
| Machine Unlearning | RWKU Llama 3.1 8B (Neighbor Set) | FB | 74.4 | 15 |
| e-Commerce Task | Internal e-commerce benchmark, Task, medium-scale seller, 387 items | Performance Score | 54.2 | 14 |
| Knowledge Retention | Internal e-commerce benchmark, Neighbours, medium-scale seller, 387 items | Rouge Score | 79 | 14 |
| Knowledge Unlearning | Internal e-commerce benchmark, medium-scale seller, 387 items (Forget Set) | ROUGE | 44.3 | 14 |

Showing 10 of 21 rows.
