This paper provides the first integrated analysis of model dememorization, unifying unlearnability and unlearning methods and offering a theoretical guarantee on dememorization depth for models processed through certified unlearning.
Advanced model dememorization methods, including availability poisoning (unlearnability) and machine unlearning, are emerging as key safeguards against data misuse in machine learning (ML). At the training stage, unlearnability embeds imperceptible perturbations into data before release so that models trained on it learn little from it. At the post-training stage, unlearning removes previously acquired information from a model to prevent unauthorized disclosure or use. Both defenses aim to preserve the right to withhold knowledge, yet their vulnerabilities and shared foundations remain unclear. First, both unlearnability and unlearning suffer from shallow dememorization: a claimed reduction in data learnability, or a claimed forgetting, can turn out to be false once model weights are perturbed. Second, the two defenses interact: input perturbations may affect the effectiveness of downstream unlearning, while unlearning may inadvertently recover domain knowledge hidden by unlearnability. This interplay calls for deeper investigation. Third, there are no formal guarantees that give theoretical insight into how well current defenses resist shallow dememorization. In this Systematization of Knowledge, we present the first integrated analysis of model dememorization approaches that leverage unlearnability and unlearning. Our contributions are threefold: (i) a unified taxonomy of unlearnability and scalable unlearning methods; (ii) an empirical evaluation revealing the robustness, interplay, and shallow dememorization of leading methods; and (iii) the first theoretical guarantee on dememorization depth for models processed through certified unlearning. These results lay the foundation for unifying dememorization mechanisms across the ML lifecycle to achieve a deeper immemor state for sensitive knowledge.
Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Age…
The paper proposes a comprehensive framework for LLM-based agent unlearning, ena…
Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks
This paper introduces 'unlearning corruption attacks,' demonstrating that the pe…
Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
This paper introduces the first complete pipeline for federated unlearning, prop…
Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
The paper proposes a novel bi-level exact unlearning attack targeting Large Reas…
Client-Verifiable and Efficient Federated Unlearning in Low-Altitude Wireless Networks
The paper proposes VerFU, a client-verifiable federated unlearning framework for…
Jellyfish: Zero-Shot Federated Unlearning Scheme with Knowledge Disentanglement
The paper proposes Jellyfish, a zero-shot federated unlearning scheme that effec…
Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach
This paper analyzes and proposes four novel attack methods—based on model parame…
REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models
The paper introduces REFORGE, a black-box red-teaming framework that uses advers…