Papers similar to 2604.26505v1

~ similar to 2604.26505v1· 20 results

cs.CRcs.AIcs.CLRecentMar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.

View →

cs.CRcs.AIcs.CLRecentMay 7, 2026

LeakDojo: Decoding the Leakage Threats of RAG Systems

Maosen Zhang, Jianshuo Dong, Boting Lu, Wenyue Li +3 more

The paper introduces LeakDojo, a framework that systematically evaluates RAG leakage risks, finding that stronger LLM instruction-following and query generation are major independent contributors to d…

View →

cs.CRcs.CVRecentApr 14, 2026

CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training

Qi Li, Cheng-Long Wang, Yinzhi Cao, Di Wang

This paper introduces CoLA, a framework demonstrating that subset training, while efficient, introduces new and potentially greater privacy risks by leaking information about both data membership and…

View →

cs.CRcs.CLcs.LGRecentMay 22, 2026

What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

Mingyuan Fan, Yu Liu, Fuyi Wang, Cen Chen

The paper introduces ActInv and PAF to systematically analyze and quantify privacy leakage from intermediate activations during split inference of LLMs, proposing PriPert for enhanced defense.

View →

cs.CRcs.AIRecentApr 20, 2026

Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

Meifang Chen, Zhe Yang, Huang Nianchen, Yizhan Huang +3 more

This paper investigates how Byte-Pair Encoding (BPE) tokenization causes Code LLMs to disproportionately memorize certain types of secrets, a phenomenon termed 'gibberish bias'.

View →

cs.CRcs.AIRecentApr 30, 2026

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

Zi Li, Tian Zhou, Wenze Li, Jingyu Hua +2 more

This paper introduces a novel supply-chain attack that uses model code backdoors to actively steal sensitive secrets from local LLM fine-tuning datasets, bypassing current privacy defenses.

View →

cs.CRcs.LGEmpiricalRecentJun 27, 2026

FlipGuard: Defending Large Language Models Against Quantization-Conditioned Backdoor Attacks

Aoying Zheng, Anqi Du, Zizhuang Deng, Yuxuan Chen

The paper introduces FlipGuard, a proactive defense framework against Quantization-Conditioned Backdoor (QCB) attacks in Large Language Models (LLMs), achieving high security with negligible performan…

View →

cs.CRRecentMay 13, 2026

From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more

The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.

View →

cs.LGcs.CRRecentMay 11, 2026

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data

Ahmed Mehdi Inane, Vincent Quirion, Gintare Karolina Dziugaite, Ioannis Mitliagkas

The paper introduces Asymmetric Langevin Unlearning (ALU), a novel framework that uses public data to significantly reduce the utility loss typically associated with certified machine unlearning, enab…

View →

cs.CRcs.AIcs.LGEmpiricalRecentJul 22, 2026

HijackKV: New Threat in Position-Independent KV Cache Reuse

Yichi Zhang, Zhiqi Wang, Huan Zhang, Yuchen Yang

This paper introduces HIJACKKV, the first attack framework to exploit the new threat of KV Cache Hijacking in large language models, achieving an average success rate of 94%.

View →

cs.CRcs.AIRecentMay 21, 2026

Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

Yilan Gao, Sida Huang, Hongyuan Zhang, Xuelong Li

The paper introduces WaveGuard, a frequency-aware, single-pass defense framework that safeguards text-to-image models by injecting structured, imperceptible perturbations into generated images, thereb…

View →

cs.CRcs.AIcs.LGRecentMar 26, 2026

Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Eyal Hadad, Mordechai Guri

This paper introduces a dual-layer side-channel attack framework that exploits the variable workload introduced by dynamic image preprocessing in local Vision-Language Models (VLMs) to infer sensitive…

View →

cs.CRRecentMay 1, 2026

Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set

Jie Fu, Nima Naderloui, Da Zhong, Yuan Hong +1 more

This paper introduces TC-UMIA, a novel tri-class membership inference attack, demonstrating that machine unlearning can leak privacy risks to the retained data set, and evaluates defense mechanisms to…

View →

cs.CRRecentMay 7, 2026

Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption

Jaime Morales, Sergio Pastrana, Juan Tapiador

The paper introduces a systematic benchmark to test LLMs' ability to recover Indicators of Compromise (IoCs) from JavaScript code, finding that while LLMs handle simple obfuscation well, encryption-ba…

View →

cs.CRcs.AIRecentMay 19, 2026

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

Tobias Braun, Jonas Henry Grebe, Hossein Shakibania, Anna Rohrbach +1 more

This paper introduces the Token by Token Backdoor Attack (ToBAC), demonstrating that unified autoregressive models (UAMs) are vulnerable to backdoor attacks where a single trigger can compromise multi…

View →

cs.CRcs.IRcs.LGRecentMay 13, 2026

VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense

Jascha Wanger

The paper demonstrates a class of steganographic exfiltration attacks against vector databases by hiding data within embeddings, and proposes VectorPin, a cryptographic provenance protocol to detect s…

View →

cs.CRcs.DBRecentApr 27, 2026

Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX

Allen Jue

The paper systematically evaluates static and dynamic adversarial attacks on the ALEX learned index, finding that while static poisoning has minimal impact, dynamic attacks can cause significant slowd…

View →

cs.CRRecentMar 24, 2026

Observable Channels, Not Just Storage: Evaluating Privacy Leakage in LLM Agent Pipelines

Tao Huang, Chen Hou, Guosen Wu, Jiayang Meng

The paper introduces CIPL, a unified channel-oriented framework, demonstrating that privacy leakage in LLM agents is governed by observable data channels and pipeline interactions, rather than being l…

View →

cs.CRcs.AIcs.LGRecentMay 14, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…

View →