Reinforcement Unlearning via Group Relative Policy Optimization

About

During pretraining, LLMs inadvertently memorize sensitive or copyrighted data, posing significant compliance challenges under legal frameworks like the GDPR and the EU AI Act. Fulfilling these mandates demands techniques that can remove information from a deployed model without retraining from scratch. Existing unlearning approaches attempt to address this need, but often leak the very data they aim to erase, sacrifice fluency and robustness, or depend on costly external reward models. We introduce PURGE (Policy Unlearning through Relative Group Erasure), a novel method grounded in the Group Relative Policy Optimization framework that formulates unlearning as a verifiable problem. PURGE uses an intrinsic reward signal that penalizes any mention of forbidden concepts, allowing safe and consistent unlearning. Our approach achieves up to 46× lower token usage per target than state-of-the-art methods, while improving fluency by +5.48% and adversarial robustness by +12.02% over the base model. Extensive evaluation on the Real World Knowledge Unlearning (RWKU) benchmark shows that PURGE reaches 11% unlearning effectiveness while preserving 98% of original utility. PURGE shows that framing LLM unlearning as a verifiable task enables more reliable, efficient, and scalable forgetting, suggesting a promising new direction for unlearning research that combines theoretical guarantees, improved safety, and practical deployment efficiency.
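
The abstract describes an intrinsic reward that penalizes any mention of forbidden concepts, optimized within the Group Relative Policy Optimization framework. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the binary reward (1.0 for a clean completion, 0.0 otherwise), the whole-word matching rule, the function names `forbidden_reward` and `group_relative_advantages`, and the example strings are all hypothetical. The advantage computation follows the usual GRPO convention of normalizing each reward by the mean and standard deviation of its sampled group.

```python
import re
from statistics import mean, pstdev


def forbidden_reward(completion, forbidden_terms):
    """Assumed reward: 1.0 if no forbidden term appears, else 0.0.

    Matching is case-insensitive and on whole words. This rule is an
    illustrative assumption, not the reward defined in the paper.
    """
    text = completion.lower()
    for term in forbidden_terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, text):
            return 0.0
    return 1.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical forget target and a group of sampled completions.
forbidden = ["Jane Doe", "Doe Foundation"]
group = [
    "Jane Doe founded the company in 1990.",
    "I do not have information about that person.",
    "The company was founded by an entrepreneur.",
    "It is funded by the Doe Foundation.",
]

rewards = [forbidden_reward(c, forbidden) for c in group]
advantages = group_relative_advantages(rewards)

for completion, r, a in zip(group, rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.2f} | {completion}")
```

Because the reward is computed by a string check rather than a learned model, each sampled completion can be scored without an external reward model, which is one plausible reading of what the abstract calls a verifiable formulation. Completions that avoid the forbidden terms receive positive advantages and the others negative ones, so the policy update would push probability mass away from mentioning the target.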

Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Machine Unlearning | TOFU | Forget Quality | 1.12e-19 | 10 |
| Knowledge Retention | RWKU Famous People Neighbor Set | FB Score | 51.3 | 7 |
| Membership Inference Attack | RWKU Famous People MIA Set | FM | 40.26 | 7 |
| Machine Unlearning | RWKU Famous People Forget Set | FB Score | 42.8 | 7 |
| Utility Preservation | RWKU Famous People Utility Set | GA | 64.4 | 7 |
