PURGE is a machine unlearning algorithm that exploits the duality between continual learning and unlearning to preserve accuracy on retained data while making the unlearned model hard to distinguish from a model retrained from scratch.
We propose PURGE, a machine unlearning algorithm built on a simple but under-exploited observation: continual learning (CL) and machine unlearning (MU) are fundamentally dual problems. CL tries to learn new tasks without forgetting old ones. MU tries to erase specific data without hurting performance on the data that is retained. Both face the same underlying tension from opposite directions. PURGE exploits this duality by adapting the gradient projection of A-GEM (Chaudhry et al., 2019): whenever the unlearning gradient conflicts with the retain-set gradient, its conflicting component is projected out, so that no unlearning step increases the retain-set loss (to first order). On top of this, PURGE performs multi-layer representation erasure, pushing forget-set activations in intermediate layers toward the retain distribution so that information is removed from hidden representations rather than merely suppressed at the output. A key design choice is the retain-confusion target. Pushing forget-set outputs toward the uniform distribution turned out to be surprisingly easy for membership inference attacks (MIAs) to detect, so we instead target the model's natural confusion pattern on retain data. This makes the unlearned model hard to distinguish from one retrained from scratch. Two self-regulating stopping criteria, a retain-loss budget and a forget-accuracy target, let the algorithm decide on its own when to stop, removing the need for manual epoch tuning. In experiments on five datasets (CIFAR-10, MNIST, SVHN, STL10, PathMNIST) covering 22 class-level forgetting tasks, PURGE consistently keeps retain accuracy above 96% while achieving an MIA AUROC close to 0.5 (chance level, the ideal), and it outperforms gradient ascent, KL-uniform, and several published baselines on the privacy-utility frontier.
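The components described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not PURGE's actual implementation. Gradients are assumed to be flattened into vectors, and the unlearning loss is assumed to be *minimized* with the update `theta -= lr * g`. `agem_project` follows the published A-GEM projection rule. The abstract does not specify the exact form of the retain-confusion target, the erasure loss, or the stopping thresholds, so `retain_confusion_target`, `representation_erasure_loss`, `should_stop`, `budget`, and `forget_target` are hypothetical stand-ins.

```python
import numpy as np


def agem_project(g_forget: np.ndarray, g_retain: np.ndarray) -> np.ndarray:
    """A-GEM projection (Chaudhry et al., 2019) of a flattened unlearning gradient.

    If stepping along -g_forget would increase the retain loss to first order
    (negative dot product with g_retain), remove the conflicting component.
    """
    dot = float(g_forget @ g_retain)
    if dot >= 0.0:  # no conflict (also covers g_retain == 0)
        return g_forget
    return g_forget - (dot / float(g_retain @ g_retain)) * g_retain


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def retain_confusion_target(retain_logits: np.ndarray, forget_class: int) -> np.ndarray:
    """HYPOTHETICAL target: mean softmax over retain inputs, with the forgotten
    class's probability mass removed and the remainder renormalized."""
    p = softmax(retain_logits).mean(axis=0)
    p[forget_class] = 0.0
    return p / p.sum()


def kl_to_target(forget_logits: np.ndarray, target: np.ndarray) -> float:
    """Mean KL(target || softmax(forget_logits)) over a batch of forget inputs."""
    log_p = np.log(softmax(forget_logits) + 1e-12)
    log_t = np.log(np.clip(target, 1e-12, None))  # zero-mass entries contribute 0
    return float(np.mean(np.sum(target * (log_t - log_p), axis=-1)))


def representation_erasure_loss(forget_acts, retain_acts) -> float:
    """HYPOTHETICAL erasure loss: squared distance between mean forget and mean
    retain activations, summed over the chosen intermediate layers."""
    return float(sum(np.sum((f.mean(axis=0) - r.mean(axis=0)) ** 2)
                     for f, r in zip(forget_acts, retain_acts)))


def should_stop(retain_loss: float, retain_loss_0: float, forget_acc: float,
                budget: float, forget_target: float) -> bool:
    """HYPOTHETICAL stopping rule: stop when the retain loss exceeds its initial
    value by more than `budget` (relative), or when forget accuracy has fallen
    to `forget_target`."""
    over_budget = retain_loss > retain_loss_0 * (1.0 + budget)
    forgotten = forget_acc <= forget_target
    return over_budget or forgotten


if __name__ == "__main__":
    g = agem_project(np.array([1.0, -2.0]), np.array([0.0, 1.0]))
    print(g)  # conflicting component removed; g is now orthogonal to g_retain
```

In a full training loop, one would take the gradient of the unlearning loss (for example `kl_to_target` plus a weighted `representation_erasure_loss`), project it with `agem_project` against the retain-set gradient, apply the step, and check `should_stop` after each pass. Projection enforces the retain constraint on every step rather than trading it off through a penalty weight, which is the design A-GEM uses for continual learning.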
Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Age…
The paper proposes a comprehensive framework for LLM-based agent unlearning, ena…
Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
This paper introduces the first complete pipeline for federated unlearning, prop…
Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks
This paper introduces 'unlearning corruption attacks,' demonstrating that the pe…
Jellyfish: Zero-Shot Federated Unlearning Scheme with Knowledge Disentanglement
The paper proposes Jellyfish, a zero-shot federated unlearning scheme that effec…
Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
The paper proposes a novel bi-level exact unlearning attack targeting Large Reas…
Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach
This paper analyzes and proposes four novel attack methods—based on model parame…
Client-Verifiable and Efficient Federated Unlearning in Low-Altitude Wireless Networks
The paper proposes VerFU, a client-verifiable federated unlearning framework for…
Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models
The paper proposes Neighbor-Aware Localized Concept Erasure (NLCE), a training-f…