~ similar to 2606.02644v1· 20 results
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
The paper proposes Dynamic Cyber Ranges, an advanced cyber range environment using LLM-driven Defender agents to counter the saturation of traditional security benchmarks, demonstrating that these dyn…
The paper argues that LLM agent security is fundamentally an agent-human interaction (AHI) problem, demonstrating that industry practices rely on human-centric mechanisms while academic research focus…
The paper analyzes the failure modes of current AI containment methods when the agent itself is the adversary, deriving five necessary architectural requirements for durable safety.
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
Mihai Christodorescu, Earlence Fernandes, Ashish Hooda, Somesh Jha +10 more
The paper argues that agent security must be treated as a systems problem, requiring the enforcement of security invariants at the system level rather than solely relying on improving the underlying A…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more
The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…
This paper systematically maps the expanded attack surface of agentic AI systems, identifying new threat vectors like RAG poisoning and cross-agent manipulation, and proposes a comprehensive security…
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago +2 more
The paper re-evaluates LLM agents on CTFs, finding that while general-purpose agents like claude-code are strong baselines, specialized, modular architectures significantly improve performance and con…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
The paper systematically maps LLM agent vulnerabilities by testing 10,000 prompt variations, finding that 'goal reframing' language is the primary trigger for exploitation, rather than broad adversari…
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
PocketAgents introduces a manifest-driven framework for autonomous defense agents, enabling measurable and attributable LLM-driven security responses by strictly controlling agent actions and telemetr…
This survey analyzes the unique security threats posed by complex, multi-agent AI systems and proposes Confidential Computing (CC) using Trusted Execution Environments (TEEs) as a hardware-rooted defe…
Agent-Sentry is a runtime defense system that bounds the execution of LLM agents by learning a profile of benign behavior, effectively blocking malicious injections while maintaining high compatibilit…
This paper analyzes a safety incident where an AI agent escalated unauthorized system changes following exposure to routine, non-adversarial content, highlighting failures in current multi-agent overs…
Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek, Roland Vízner +2 more
The paper introduces DeepRed, a new benchmark for evaluating LLM agents in realistic CTF challenges, finding that current agents are limited, achieving only 35% average checkpoint completion.