Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

About

This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or starting model outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model's ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness in achieving efficacious machine unlearning.
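The uniform-target idea in the abstract can be illustrated with a small numerical sketch. It assumes the adjustment amounts to replacing the target token's logit so that its softmax probability becomes exactly 1/|V|, while every other logit from the current model is left untouched. Solving exp(z_t') / (exp(z_t') + S) = 1/|V| with S = sum over j != t of exp(z_j) gives z_t' = log S - log(|V| - 1). The function names, the toy logits, and the forward-KL direction of the loss are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def uniform_target_logits(logits, target_id):
    """Return a copy of `logits` whose target entry is reset so that
    softmax(result)[target_id] == 1 / vocab_size.

    Hypothetical helper: derived from the abstract's description,
    not taken from the paper's code.
    """
    vocab_size = logits.shape[0]
    others = np.delete(logits, target_id)
    # log-sum-exp over the non-target logits, computed stably
    m = np.max(others)
    log_sum_others = m + np.log(np.sum(np.exp(others - m)))
    adjusted = logits.astype(float).copy()
    adjusted[target_id] = log_sum_others - np.log(vocab_size - 1)
    return adjusted


def distillation_loss(student_logits, target_logits):
    """KL(target || student). The KL direction is an assumption made
    for illustration; in training, the targets would be held constant."""
    log_p = log_softmax(target_logits)
    log_q = log_softmax(student_logits)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))


# Toy example: a 5-token vocabulary where token 0 is the one to forget.
logits = np.array([4.0, 1.0, 0.5, -1.0, 0.0])
target_id = 0
soft_targets = uniform_target_logits(logits, target_id)
target_probs = np.exp(log_softmax(soft_targets))
loss = distillation_loss(logits, soft_targets)
```

Because only the target logit changes, the relative probabilities of all other tokens are preserved, and the targets are recomputed from the current model's outputs at each step rather than from a frozen starting model.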

Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multi-task Language Understanding | MMLU | -- | 842 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy: 62.6 | 76 |
| Language Understanding | MMLU | MMLU Score: 61.4 | 45 |
| Machine Unlearning | RWKU Llama 3.1 8B (Forget Set) | FB Score: 20.5 | 39 |
| Machine Unlearning | MUSE-News Llama 2 7B | Privacy Leakage: -99.79 | 27 |
| Machine Unlearning | RWKU Llama 3.1 8B (Neighbor Set) | FB: 63.8 | 15 |
| Knowledge Retention | Internal e-commerce benchmark Neighbours medium-scale seller 387 items | Rouge Score: 78.2 | 14 |
| e-Commerce Task | Internal e-commerce benchmark Task medium-scale seller 387 items | Performance Score: 53.8 | 14 |
| Knowledge Unlearning | Internal e-commerce benchmark medium-scale seller 387 items (Forget Set) | ROUGE: 0.2 | 14 |
| General Language Modeling | General Benchmarks Llama 3.1 8B | Generation Quality Score: 65.7 | 11 |
Showing 10 of 16 rows
