Papers similar to 2603.20339v2

~ similar to 2603.20339v2· 20 results

cs.CRcs.LGRecentApr 7, 2026

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao +3 more

The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.

View →

cs.CRcs.AIRecentApr 23, 2026

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…

View →

cs.CRcs.AIcs.LGRecentMay 26, 2026

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Zedian Shao, Charles Fleming, Teodora Baluta

The paper introduces 'covert control attacks,' a novel and stealthy data poisoning method that teaches LLMs an information hiding scheme, allowing malicious instructions to be encoded and decoded and…

View →

cs.CRcs.AIRecentMay 9, 2026

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

Yang Luo, Zifeng Kang, Tiantian Ji, Xinran Liu +3 more

The paper introduces SHADOWMERGE, a novel poisoning attack that successfully compromises graph-based agent memory by exploiting relation-channel conflicts, achieving a high attack success rate across…

View →

cs.CRcs.LGRecentMay 26, 2026

Poison with Style: A Practical Poisoning Attack on Code Large Language Models

Khang Tran, Yazan Boshmaf, Issa Khalil, NhatHai Phan +2 more

The paper introduces Poison-with-Style (PwS), a stealthy model poisoning attack that exploits developers' inherent code styles as covert triggers to make Code LLMs generate vulnerable code without exp…

View →

cs.CRcs.AIcs.LGRecentMay 18, 2026

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

John T. Halloran, Noopur S. Bhatt

The paper proposes Open-Book Benign Rewriting (OBBR), a novel defense mechanism that uses LLM rewriting with benign samples to neutralize data poisoning attacks against LLMs, significantly improving s…

View →

cs.CRcs.AIRecentApr 10, 2026

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun

The paper introduces BadSkill, a novel backdoor attack formulation that targets third-party agent skills by poisoning the embedded model artifacts, achieving high attack success rates across various m…

View →

cs.CRcs.AIRecentMay 10, 2026

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston +2 more

The paper introduces Oracle Poisoning, an attack that corrupts knowledge graphs used by AI agents, demonstrating that all tested models blindly trust poisoned data at high sophistication levels.

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru +1 more

The paper introduces PoisonForge, a comprehensive benchmark demonstrating that even a small number of targeted poisoned examples can significantly compromise the safety and reliability of instruction-…

View →

cs.CRcs.AIRecentApr 25, 2026

Toward Polymorphic Backdoor against Semantic Communication via Intensity-Based Poisoning

Xiao Yang, Yuni Lai, Gaolei Li, Jun Wu +3 more

The paper proposes SemBugger, a polymorphic backdoor attack that uses intensity-based poisoning to achieve diverse malicious outcomes in Semantic Communication (SC) systems, alongside a provable defen…

View →

cs.CLcs.CRcs.LGRecentMar 29, 2026

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong +1 more

The paper introduces Hidden Ads, a novel backdoor attack for Vision-Language Models (VLMs) that injects unauthorized advertisements by exploiting natural, recommendation-seeking user behaviors, mainta…

View →

cs.CRRecentMay 11, 2026

Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

Desen Sun, Jason Hon, Howe Wang, Saarth Rajan +2 more

This paper investigates a novel security vulnerability where imperceptible branding hints can be injected into images and subsequently re-rendered onto new objects by generative AI models, proposing b…

View →

cs.CRRecentApr 8, 2026

RefineRAG: Word-Level Poisoning Attacks via Retriever-Guided Text Refinement

Ziye Wang, Guanyu Wang, Kailong Wang

RefineRAG introduces a novel word-level poisoning framework that significantly enhances knowledge poisoning attacks against RAG systems, achieving state-of-the-art effectiveness and transferability to…

View →

cs.CRcs.LGRecentMay 16, 2026

Universal Graph Backdoor Defense: A Feature-based Homophily Perspective

Mengting Pan, Fan Li, Chen Chen, Xiaoyang Wang

The paper proposes a universal graph backdoor defense framework that addresses feature-based graph backdoor attacks, which are more challenging than traditional subgraph-based attacks, by leveraging l…

View →

cs.CRcs.CLRecentMay 4, 2026

Fight Poison with Poison: Enhancing Robustness in Few-shot Machine-Generated Text Detection with Adversarial Training

Wenjing Duan, Qi Zhou, Yuanfan Li

The paper proposes REACT, an adversarial training framework that significantly enhances the robustness and few-shot performance of machine-generated text detection by having a Retrieval-Augmented Gene…

View →

cs.CRcs.DBRecentApr 27, 2026

Poisoning Learned Index Structures: Static and Dynamic Adversarial Attacks on ALEX

Allen Jue

The paper systematically evaluates static and dynamic adversarial attacks on the ALEX learned index, finding that while static poisoning has minimal impact, dynamic attacks can cause significant slowd…

View →

cs.AIcs.CRRecentMay 15, 2026

GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction

Liangyi Huang, Zichen Liu, Fei Shao, Shang Ma +4 more

The paper introduces GRID, an end-to-end framework that significantly improves the construction of security knowledge graphs from cyber threat intelligence by replacing unstable LLM-based supervision…

View →

cs.LGcs.AIcs.CRRecentMay 8, 2026

Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors

Fan Yang, Binyan Xu, Di Tang, Kehuan Zhang

The paper proposes PRAETORIAN, a novel defense mechanism for Graph Neural Networks (GNNs) that targets the intrinsic structural requirements of backdoor attacks, significantly reducing the attack succ…

View →

cs.CRcs.AIcs.IRRecentJun 2, 2026

Patcher: Post-Hoc Patching of Backdoored Large Language Models

Anjun Gao, Yueyang Quan, Yufei Xia, Zhuqing Liu +1 more

Patcher is a post-hoc defense framework that repairs backdoored large language models by localizing hidden triggers and patching the model using only a single reported failure case.

View →

cs.CRcs.LGRecentMay 19, 2026

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more

The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…

View →