cs.CR

Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach

Weidong Zheng, Kongyang Chen, Yao Huang, Yuanwei Guo, Yatie Xiao

Apr 8, 2026

AI Summary (gemma4:e4b)

This paper proposes four attack methods, based on model parameters and model inversion, to demonstrate that existing machine unlearning techniques can inadvertently leak the categories of the forgotten data.

Abstract

With the widespread application of artificial intelligence in face recognition and other fields, data privacy and security have received extensive attention, especially the "right to be forgotten" emphasized by numerous privacy protection laws. Prior work has proposed various unlearning methods, but these methods may inadvertently leak the categories of the unlearned data. This paper focuses on the category unlearning scenario, analyzes how the category of unlearned data can leak in multiple settings, and proposes four attack methods, from the perspectives of model parameters and model inversion, for attackers with different levels of background knowledge. At the model parameter level, we construct discriminative features by computing either dot products or vector differences between the parameters of the target model and those of auxiliary models trained on subsets of retained data and on unrelated data, respectively. These features are then processed with k-means clustering, Youden's Index, and decision tree algorithms to identify the forgotten class. For model inversion, we design a gradient optimization-based white-box attack and a genetic algorithm-based black-box attack to reconstruct class-prototypical samples. The prediction profiles of these synthesized samples are then analyzed using a threshold criterion and an information entropy criterion to infer the forgotten class. We evaluate the proposed attacks on four standard datasets against five state-of-the-art unlearning algorithms and provide a detailed analysis of the strengths and limitations of each method. Experimental results demonstrate that our approach can effectively infer the classes forgotten by the target model.
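As a rough illustration of the Youden's Index step mentioned in the abstract, the sketch below picks the cut-off on a scalar feature that maximizes J = TPR - FPR. It is not the authors' implementation. The function name `youden_threshold`, the one-score-per-class input, and the convention that label 1 marks a forgotten class are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    Hypothetical setup: each score is a scalar feature for one class
    (for example, a parameter dot product), and label 1 means that the
    class was forgotten in a reference experiment. A class is predicted
    "forgotten" when its score is >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    best_threshold, best_j = None, -1.0
    # Every distinct observed score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives  # sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    toy_scores = [0.1, 0.2, 0.35, 0.8, 0.9, 0.4]
    toy_labels = [0, 0, 0, 1, 1, 1]
    print(youden_threshold(toy_scores, toy_labels))
```

If forgotten classes produce lower rather than higher feature values, the scores can be negated before calling the function; which direction applies depends on how the feature is defined.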
