# MUNBa: Machine Unlearning via Nash Bargaining

## About
Machine Unlearning (MU) aims to selectively erase harmful behaviors from a model while retaining its overall utility. As a multi-task learning problem, MU involves balancing two objectives: forgetting specific concepts or data, and preserving general performance. A naive combination of these forgetting and preserving objectives can lead to gradient conflict and gradient dominance, preventing MU algorithms from reaching optimal solutions. To address this, we reformulate MU as a two-player cooperative game in which the two players, a forgetting player and a preservation player, contribute gradient proposals that maximize their overall gain while balancing their contributions. Inspired by Nash bargaining theory, we derive a closed-form solution that guides the model toward a Pareto stationary point. Our formulation guarantees an equilibrium solution, where any deviation from the final state would reduce the overall objective for both players, ensuring optimality in each objective. We evaluate our algorithm on a diverse set of tasks across image classification and image generation. Extensive experiments with ResNet, the vision-language model CLIP, and text-to-image diffusion models demonstrate that our method outperforms state-of-the-art MU algorithms, achieving a better trade-off between forgetting and preserving. Our results also show improvements in forgetting precision, preservation of generalization, and robustness against adversarial attacks.
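The two-player gradient bargain described above can be sketched as follows. This is an illustrative toy implementation, not the authors' exact algorithm: it assumes the common Nash-bargaining stationarity condition $(G^\top G)\,\alpha = 1/\alpha$ for the per-player weights $\alpha$, with the combined update taken as the $\alpha$-weighted sum of the two gradient proposals. The function name and solver choice are assumptions.

```python
# Hedged sketch of a two-player Nash-bargaining gradient combination.
# Illustrative only; weights alpha solve (G G^T) alpha = 1/alpha, a
# standard Nash-bargaining stationarity condition for two cooperating tasks.
import numpy as np
from scipy.optimize import fsolve


def nash_bargain_update(g_forget, g_preserve):
    """Combine forgetting/preservation gradient proposals into one update.

    Returns the positive bargaining weights alpha and the combined
    update direction alpha[0] * g_forget + alpha[1] * g_preserve.
    """
    G = np.stack([g_forget, g_preserve])   # shape (2, d): one row per player
    M = G @ G.T                            # 2x2 Gram matrix of the proposals
    # Solve M @ alpha = 1 / alpha numerically from a neutral start.
    alpha = fsolve(lambda a: M @ a - 1.0 / a, np.ones(2))
    return alpha, alpha @ G                # weights, combined direction


# Toy example: mildly conflicting forgetting/preservation gradients.
g_f = np.array([1.0, 0.0])
g_p = np.array([0.5, 1.0])
alpha, d = nash_bargain_update(g_f, g_p)
```

Because the Gram matrix is positive definite here, a positive solution for the weights exists, and neither player's gradient can dominate the combined direction.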
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Machine Unlearning | Tiny-ImageNet (train) | -- | -- | 43 |
| Full-class unlearning | Tiny-ImageNet | Retention Accuracy (RA) | 64.22 | 21 |
| Machine Unlearning | CIFAR-100 (train) | Accuracy ($D_f$) | 33.8 | 19 |
| Random subset unlearning | SVHN | Retention Accuracy (RA) | 99.73 | 15 |
| Random subset unlearning | CIFAR-10 | Retention Accuracy (RA) | 100 | 15 |
| Sub-class Machine Unlearning | CIFAR-20 Rocket sub-class | Retention Accuracy (RA) | 81.43 | 15 |
| Full-class unlearning | CIFAR-100 | Retention Accuracy (RA) | 74.09 | 15 |
| Sub-class Machine Unlearning | CIFAR-20 Sea sub-class | Retention Accuracy (RA) | 80.64 | 15 |
| Class-wise Unlearning | CIFAR-10 Unlearn 1 Class v1 (10% unlearned) | PGH | 16.92 | 13 |
| Machine Unlearning | ImageNet 1-class unlearning 1K | PGH | 34.05 | 13 |