Papers similar to 2604.24975v1

Similar to 2604.24975v1 · 20 results

cs.CR cs.AI cs.LG · Recent · May 22, 2026

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru +1 more

The paper introduces PoisonForge, a comprehensive benchmark demonstrating that even a small number of targeted poisoned examples can significantly compromise the safety and reliability of instruction-…

cs.CR cs.LG · Recent · May 6, 2026

Gray-Box Poisoning of Continuous Malware Ingestion Pipelines

Jan Dolejš, Martin Jureček, Róbert Lórencz

The paper demonstrates a gray-box poisoning attack against continuous malware detection pipelines using subtle binary manipulations, showing that IAT-based perturbations can significantly degrade dete…

cs.CR cs.AI cs.LG · Recent · May 14, 2026

One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries

Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky

The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…

cs.CR cs.AI · Recent · Apr 10, 2026

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun

The paper introduces BadSkill, a novel backdoor attack formulation that targets third-party agent skills by poisoning the embedded model artifacts, achieving high attack success rates across various m…

cs.CR cs.AI cs.DC · Recent · Apr 10, 2026

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Israt Jahan Mouri, Muhammad Ridowan, Muhammad Abdullah Adnan

The paper introduces XFED, a novel non-collusive model poisoning attack that demonstrates the feasibility of compromising Federated Learning systems without requiring coordination among attackers, byp…

cs.CR cs.AI · Recent · Apr 23, 2026

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…

cs.LG cs.CR · Recent · Jun 3, 2026

Sequential Data Poisoning in LLM Post-Training

Jack Sanderson, Yihan Wang, Xiaoqian Lu, Gautam Kamath +1 more

The paper introduces the threat model of sequential data poisoning, demonstrating that multiple, collaborating attackers can exploit compound vulnerabilities in LLM post-training pipelines that are in…

cs.CR cs.LG · Recent · May 25, 2026

Building an Adversarial Malware Dataset by Family and Type: Generation, Evasion, and Poisoning Evaluation

David Košťál, Martin Jureček

The paper constructs a large, adversarial malware dataset from real-world binaries, demonstrating high evasion rates and showing that even small amounts of poisoned data can severely compromise malwar…

cs.CR cs.AI · Recent · Mar 26, 2026

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang +2 more

The paper introduces PIDP-Attack, a novel compound adversarial attack that combines prompt injection with database poisoning to manipulate Retrieval-Augmented Generation (RAG) systems against arbitrar…

cs.CR cs.AI · Recent · May 10, 2026

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston +2 more

The paper introduces Oracle Poisoning, an attack that corrupts knowledge graphs used by AI agents, demonstrating that all tested models blindly trust poisoned data at high sophistication levels.

cs.CR cs.AI cs.SE · Recent · Apr 5, 2026

Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6

Luis Guzmán Lorenzo

The study demonstrates that poisoned identifier names can survive LLM deobfuscation, even when the model correctly understands the code's semantics, unless the task is reframed from deobfuscation to f…

cs.CR cs.DB · Recent · May 1, 2026

Defense against Poisoning Attacks under Shuffle-DP

Siyi Wang, Qiyao Luo, Yihua Hu, Lixu Wang +5 more

The paper proposes the first general defense framework to make all union-preserving Differential Privacy (DP) protocols, specifically those based on shuffle-DP, resilient against poisoning attacks.

cs.CR cs.AI · Recent · Apr 30, 2026

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

Zi Li, Tian Zhou, Wenze Li, Jingyu Hua +2 more

This paper introduces a novel supply-chain attack that uses model code backdoors to actively steal sensitive secrets from local LLM fine-tuning datasets, bypassing current privacy defenses.

cs.CR cs.AI · Recent · May 24, 2026

MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning

Xuanye Zhang, Yongsen Zheng, Zhuqin Xu, Kaiyu Zhou +4 more

MemMorph introduces a novel memory poisoning attack that biases LLM agent tool selection by injecting crafted records into the agent's long-term memory, achieving high success rates even against moder…

cs.CR cs.AI cs.LG · Recent · May 24, 2026

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses, Evaluation, and Future Directions

Wenjuan Li, Yitao Liu, Runze Chen, Rajkumar Buyya

This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…

cs.CR · Recent · Mar 21, 2026

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu, Chunyi Zhou +3 more

This paper argues that much of the existing research on Federated Learning (FL) security is based on idealized assumptions, and provides a practical evaluation framework showing that real-world attack…

cs.CR cs.AI cs.LG · Recent · May 8, 2026

Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

Jun Wen Leong

The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…

cs.CR · Recent · Apr 9, 2026

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen +2 more

This paper systematically analyzes the threat posed by malicious third-party API routers in the LLM supply chain, finding that a significant number of routers actively perform payload injection, crede…

cs.CR cs.AI cs.CL · Recent · Mar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan

The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.

cs.LG cs.CR · Recent · Mar 20, 2026

Graph-Aware Stealthy Poison-Text Backdoors for Text-Attributed Graphs

Qi Luo, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng

The paper proposes TAGBD, a graph-aware backdoor attack that demonstrates that inconspicuous poison text alone can reliably compromise text-attributed graph learning systems.
