~ similar to 2603.18235v1· 20 results
The paper demonstrates a semantic denial-of-service attack against LLM-controlled robots by injecting short, safety-plausible phrases into the audio channel, causing the robot to halt or disrupt execu…
Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu +1 more
The paper proposes a novel attack paradigm demonstrating how compromising a single robot in an LLM-controlled multi-robot system can rapidly propagate malicious intent to cause coordinated unsafe acti…
The paper introduces BOA, a novel framework that measures agent safety by exhaustively searching the entire in-budget trajectory space, thereby identifying unsafe behaviors missed by traditional sampl…
Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more
This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
The paper introduces POIROT, a novel protocol that uses the agents within a multi-agent system itself to diagnose and detect failures, demonstrating superior performance over traditional evaluation me…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan +3 more
The EvoSafety framework enhances LLM safety by externalizing attack and defense mechanisms, enabling persistent, transferable, and model-agnostic robustness against adversarial prompts.
This paper introduces AgentREVEAL, a diagnostic framework showing that the utility of web retrieval in LLM agents creates a safety-utility trade-off, as relevance itself can degrade safety alignment a…
This paper introduces AgentREVEAL, a diagnostic framework that demonstrates that the utility of web retrieval in LLM agents creates a safety-utility trade-off, as relevance itself can degrade safety a…
The paper provides a holistic threat model for LLM-enabled robotic systems by analyzing how conventional, adversarial, and conversational threats propagate across the entire perception-planning-actuat…
This paper systematically maps the expanded attack surface of agentic AI systems, identifying new threat vectors like RAG poisoning and cross-agent manipulation, and proposes a comprehensive security…
Zhe Liu, Zonghao Ying, Wenxin Zhang, Quanchen Zou +4 more
SafeHarbor is a novel, hierarchical memory-augmented framework that establishes context-aware decision boundaries for LLM agents, achieving state-of-the-art safety while minimizing over-refusal.
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…
Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu +7 more
The paper introduces SafeHarness, a novel, lifecycle-integrated security architecture that significantly reduces unsafe behavior and attack success rates in LLM agents by weaving multiple defense laye…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
Chiyu Zhang, Huiqin Yang, Bendong Jiang, Xiaolei Zhang +7 more
The paper introduces LITMUS, a novel benchmark that rigorously tests LLM agents for dangerous, physical-layer behavioral jailbreaks in real OS environments, revealing that current agents frequently ex…
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…