Papers similar to 2605.06846v3

~ similar to 2605.06846v3· 20 results

cs.CRcs.AIRecentApr 10, 2026

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun

The paper introduces BadSkill, a novel backdoor attack formulation that targets third-party agent skills by poisoning the embedded model artifacts, achieving high attack success rates across various m…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs

Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru +1 more

The paper introduces PoisonForge, a comprehensive benchmark demonstrating that even a small number of targeted poisoned examples can significantly compromise the safety and reliability of instruction-…

View →

cs.LGcs.CRRecentJun 3, 2026

Sequential Data Poisoning in LLM Post-Training

Jack Sanderson, Yihan Wang, Xiaoqian Lu, Gautam Kamath +1 more

The paper introduces the threat model of sequential data poisoning, demonstrating that multiple, collaborating attackers can exploit compound vulnerabilities in LLM post-training pipelines that are in…

View →

cs.CRcs.AIRecentMay 10, 2026

Oracle Poisoning: Corrupting Knowledge Graphs to Weaponise AI Agent Reasoning

Ben Kereopa-Yorke, Guillermo Diaz, Holly Wright, Reagan Johnston +2 more

The paper introduces Oracle Poisoning, an attack that corrupts knowledge graphs used by AI agents, demonstrating that all tested models blindly trust poisoned data at high sophistication levels.

View →

cs.CRcs.AIRecentApr 23, 2026

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…

View →

cs.CRcs.AIcs.LGRecentMay 27, 2026

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation

Jianwei Li, Jung-Eun Kim

The paper argues that the 'positive backdoor' label should be retired and replaced with 'Secret Alignment,' asserting that such protective claims must be rigorously evaluated for security, especially…

View →

cs.CRcs.AIcs.LGRecentMay 27, 2026

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation

Jianwei Li, Jung-Eun Kim

The paper argues that the 'positive backdoor' label should be retired and replaced with 'Secret Alignment,' asserting that all such protective claims require rigorous, standardized evaluation due to i…

View →

cs.LGcs.CREmpiricalRecentJul 28, 2026

Anti-Backdoor Coreset Selection via Cumulative Entropy

Qi Zhao, Christian Wressnegger

This paper formulates neural backdoor defense as a coreset selection problem and uses Cumulative Entropy as selection criterion to focus on benign samples, constructing a benign coreset for training a…

View →

cs.CRcs.LGEmpiricalRecentJul 29, 2026

ToxScreen: Detecting Whether an LLM Has Been Poisoned

Anthony Hughes, Nicole Xing, Collin Francel, Andy Kim +1 more

This paper introduces ToxScreen, a benchmark of 800 backdoored models to evaluate defender's ability to recover triggers under realistic settings, and identifies a method for recovery.

View →

cs.CRcs.AIcs.DCRecentApr 10, 2026

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Israt Jahan Mouri, Muhammad Ridowan, Muhammad Abdullah Adnan

The paper introduces XFED, a novel non-collusive model poisoning attack that demonstrates the feasibility of compromising Federated Learning systems without requiring coordination among attackers, byp…

View →

cs.CRcs.SERecentMar 22, 2026

SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration

Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin +2 more

The paper proposes SkillProbe, a multi-agent security auditing framework, demonstrating that high-popularity skills in LLM agent marketplaces are often insecure due to systemic combinatorial risks.

View →

cs.CRcs.AIeess.SYRecentMay 12, 2026

Behavioral Integrity Verification for AI Agent Skills

Yuhao Wu, Tung-Ling Li, Hongliang Liu

The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revea…

View →

stat.MLcs.LGstat.MEEmpiricalRecentJul 22, 2026

Data-Poisoning Audits for Causal Effect Estimation

Kwangho Kim

This paper develops a data-poisoning audit for augmented inverse-probability-weighted estimation to prevent strategic record selection in observational causal analyses.

View →

cs.CRRecentApr 8, 2026

MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning

Yizhe Zeng, Wei Zhang, Yunpeng Li, Juxin Xiao +2 more

MirageBackdoor introduces a novel, highly stealthy backdoor attack that forces Large Language Models to generate correct reasoning steps (Think Well) but output an incorrect final answer (Answer Wrong…

View →

cs.CRcs.AIRecentMay 30, 2026

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder +1 more

The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior in open agentic skill ecosystems, significantly outperforming existing static a…

View →

cs.CRcs.AIRecentMay 30, 2026

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder +1 more

The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior hidden within open agentic skills, significantly outperforming static and seman…

View →

cs.LGcs.AIcs.CRRecentMay 8, 2026

Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors

Fan Yang, Binyan Xu, Di Tang, Kehuan Zhang

The paper proposes PRAETORIAN, a novel defense mechanism for Graph Neural Networks (GNNs) that targets the intrinsic structural requirements of backdoor attacks, significantly reducing the attack succ…

View →

cs.CRcs.AIcs.CLRecentMar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

Zhenyi Wang, Siyu Luan

The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.

View →

cs.LGcs.CRRecentMay 25, 2026

On Reliability of Efficient Membership Inference Vulnerability Evaluation

Joonas Jälkö, Gauri Pradhan, Ossi Räisä, Antti Honkela

This paper analyzes the reliability of efficient membership inference attack (MIA) evaluation methods, demonstrating that standard aggregation techniques introduce biases that compromise accurate vuln…

View →

cs.CRcs.AIRecentApr 3, 2026

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng +6 more

This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and result…

View →