~ similar to 2604.04060v1 · 20 results
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
Yaoyang Luo, Zhi Zheng, Ziwei Zhao, Tong Xu +4 more
This paper addresses the threat of coordinated misinformation in LLM-based Multi-Agent Systems by proposing a defense framework, STAR, that effectively identifies and rectifies misleading information…
Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan +3 more
The EvoSafety framework enhances LLM safety by externalizing attack and defense mechanisms, enabling persistent, transferable, and model-agnostic robustness against adversarial prompts.
Yu Li, Yuenan Hou, Yingmei Wei, Yanming Guo +1 more
EvoDefense introduces an experience-guided, co-evolving black-box defense mechanism that significantly improves the robustness of LLMs against unseen and diverse attacks without requiring model retrai…
Yu Li, Yuenan Hou, Yingmei Wei, Yanming Guo +1 more
EvoDefense introduces an experience-guided, co-evolving black-box defense mechanism that significantly improves LLM robustness against unseen and diverse attacks without requiring model retraining.
Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago +2 more
The paper re-evaluates LLM agents on CTFs, finding that while general-purpose agents like claude-code are strong baselines, specialized, modular architectures significantly improve performance and con…
Zhe Liu, Zonghao Ying, Wenxin Zhang, Quanchen Zou +4 more
SafeHarbor is a novel, hierarchical memory-augmented framework that establishes context-aware decision boundaries for LLM agents, achieving state-of-the-art safety while minimizing over-refusal.
Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more
The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…
Yunhao Feng, Xiaohu Du, Xinhao Deng, Yifan Ding +12 more
BraveGuard is a self-evolving defense framework that significantly improves the safety monitoring of computer-use agents by generating guard model supervision from open-world threat discovery and real…
Yunhao Feng, Yifan Ding, Xiaohu Du, Ming Wen +12 more
BraveGuard is a self-evolving defense framework that improves the safety of computer-use agents by training guard models on open-world, multi-step threat trajectories rather than static benchmarks.
This paper introduces and evaluates guardian-based defenses, showing that an intermediary LLM agent can significantly reduce the success rate of skill injection attacks on terminal-based agents, even…
This paper proposes and evaluates guardian-based defenses, both dynamic and static, to mitigate skill injection attacks targeting LLM agents that rely on reusable procedural skills.
This paper provides a systematic, layered review of security risks and defense strategies for autonomous agent frameworks, using OpenClaw as a case study to address the current lack of integrated rese…
ClawGuard is a novel runtime security framework that deterministically enforces user-confirmed rules at tool-call boundaries to protect LLM agents from indirect prompt injection.
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
The paper proposes Dynamic Cyber Ranges, an advanced cyber range environment using LLM-driven Defender agents to counter the saturation of traditional security benchmarks, demonstrating that these dyn…
Yongxiang Li, Moxin Li, Zhixin Ma, Fengbin Zhu +3 more
This paper introduces the concept of 'Sleeper Attack,' demonstrating that adversarial content can persist across multiple interactions with an LLM agent, posing a more subtle and difficult-to-detect s…
Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu +1 more
The paper proposes a novel attack paradigm demonstrating how compromising a single robot in an LLM-controlled multi-robot system can rapidly propagate malicious intent to cause coordinated unsafe acti…
Bingyu Yan, Xiaoming Zhang, Jinyu Hou, Chaozhuo Li +3 more
PropGuard introduces a propagation-aware framework to safeguard LLM-MAS against malicious attacks by constructing a dual-view graph, identifying suspicious propagation paths, and applying source-guide…