~ similar to 2605.21095v1· 20 results
This paper introduces a foundational framework and taxonomy for managing catastrophic AI loss of control (LOC) incidents, providing a proportional guide for response based on the severity and recovera…
Zelin Zhang, Qi Li, Jie Cao, Lingshuang Liu +1 more
The paper analyzes the escalating security and safety threats posed by generative AI systems as they transition from merely generating content to executing real-world actions via tools and agents, fin…
The paper introduces Owner-Harm, a formal threat model addressing the critical blind spot of AI agents harming their own deployers, demonstrating that specialized defenses are needed beyond generic sa…
The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…
This paper proposes a set of design principles and a conceptual benchmark (SOC-bench) to systematically evaluate the blue team operational capabilities of multi-agent AI systems in autonomous Security…
The paper analyzes the failure modes of current AI containment methods when the agent itself is the adversary, deriving five necessary architectural requirements for durable safety.
Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu +7 more
The paper introduces SafeHarness, a novel, lifecycle-integrated security architecture that significantly reduces unsafe behavior and attack success rates in LLM agents by weaving multiple defense laye…
Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more
The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…
The paper introduces MATRA, a systematic threat modeling framework, to assess how known LLM threats translate into concrete, deployment-specific risks within autonomous agentic AI systems.
This paper systematically maps the expanded attack surface of agentic AI systems, identifying new threat vectors like RAG poisoning and cross-agent manipulation, and proposes a comprehensive security…
Baoyuan Wu, Qingshan Liu, Adel Bibi, Irwin King +1 more
The paper argues that the Authorization-Execution Gap (AEG)—the divergence between intended authorization and actual execution—is a critical safety and security flaw in open-world agents, requiring so…
The paper introduces Parallax, an architectural framework that structurally separates AI reasoning from action execution to ensure robust safety for autonomous agents, achieving high attack mitigation…
The paper proposes the Energetic Paradigm, a model-agnostic architectural framework that allows states to maintain decision sovereignty and control over military AI systems, even when using proprietar…
This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
Kevin Eykholt, Dhilung Kirat, Xiaokui Shu, Jiyong Jang +2 more
The paper reports on penetration tests conducted on proprietary, large-scale AI agent systems, finding that security vulnerabilities persist despite stricter development standards.
Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago +2 more
The paper re-evaluates LLM agents on CTFs, finding that while general-purpose agents like claude-code are strong baselines, specialized, modular architectures significantly improve performance and con…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
PocketAgents introduces a manifest-driven framework for autonomous defense agents, enabling measurable and attributable LLM-driven security responses by strictly controlling agent actions and telemetr…
ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…