~ similar to 2603.19423v1· 20 results
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more
This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
This paper introduces and evaluates guardian-based defenses, showing that an intermediary LLM agent can significantly reduce the success rate of skill injection attacks on terminal-based agents, even…
This paper proposes and evaluates guardian-based defenses, both dynamic and static, to mitigate skill injection attacks targeting LLM agents that rely on reusable procedural skills.
The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…
Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more
The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…
Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon +3 more
The paper evaluates prompt injection defenses and finds that only external output filtering, implemented in application code, reliably prevents secret leaks from LLMs, demonstrating that model-based d…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou +7 more
The paper introduces SkillHarm, a comprehensive benchmark and automated framework for evaluating skill-based attacks across the entire agent skill-use lifecycle, demonstrating that current agents rema…
Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more
The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…
Zenghao Duan, Yuxin Tian, Zhiyi Yin, Liang Pang +5 more
SkillAttack is a red-teaming framework that dynamically tests the exploitability of latent vulnerabilities in LLM agent skills using adversarial prompting, demonstrating that even benign skills pose s…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more
The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…
The paper introduces BadSkill, a novel backdoor attack formulation that targets third-party agent skills by poisoning the embedded model artifacts, achieving high attack success rates across various m…
The paper systematically maps LLM agent vulnerabilities by testing 10,000 prompt variations, finding that 'goal reframing' language is the primary trigger for exploitation, rather than broad adversari…
PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…
ClawGuard is a novel runtime security framework that deterministically enforces user-confirmed rules at tool-call boundaries to protect LLM agents from indirect prompt injection.
Xiangtao Meng, Wenyu Chen, Chuanchao Zang, Xinyu Gao +4 more
This paper systematically measures and explains how sequential model defenses can conflict, finding that 38.9% of ordered defense sequences cause measurable risk exacerbation due to anti-aligned param…
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng +5 more
SkillTrojan introduces a novel backdoor attack targeting the composition of reusable skills in agent systems, demonstrating high attack success rates with minimal impact on normal system functionality…