~ similar to 2604.12177v1· 20 results
The paper introduces Distributed Sentinel, a zero-trust architecture that prevents Context-Fragmented Violations (CFVs) in multi-agent systems by propagating security state across departmental boundar…
The paper introduces WIRE, a pipeline for diagnosing live intra-policy rule conflicts in LLM agents by identifying and testing specific rule pairs within a single prompt policy that can co-govern a re…
SentinelAgent introduces a formal framework, the Intent-Preserving Delegation Protocol (IPDP), to secure federal multi-agent AI systems by verifying complex delegation chains against seven properties,…
The paper proves that standard runtime enforcement mechanisms cannot detect systematic behavioral drift in autonomous agents, proposing a new Invariant Measurement Layer (IML) that restores observabil…
The paper proposes Proof-Carrying Agent Actions (PCAA), a runtime-neutral governance model that uses action certificates to consistently track and authorize high-risk actions across diverse and hetero…
Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang +3 more
The paper introduces a contextual security framework for LLM agents, defining security properties and reformulating various attacks and defenses based on the context of execution.
The paper identifies and measures a critical failure mode where LLM agents violate policies by losing or corrupting directive-bearing state during the process of assembling the decision context, and p…
The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…
The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…
The paper proposes a novel, organization-scoped LLM agent runtime architecture designed specifically for regulated cybersecurity operations, ensuring auditable context and integration with existing se…
This paper identifies Security-Recall Divergence (SRD), demonstrating that omission constraints (prohibitions) decay significantly in long-context LLM conversations, while commission constraints (requ…
The paper introduces Agent-ToM, a Theory-of-Mind (ToM) based framework that learns to monitor autonomous LLM agents by explicitly reasoning about their hidden beliefs and intentions to detect covert m…
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…
Agent-Sentry is a runtime defense system that bounds the execution of LLM agents by learning a profile of benign behavior, effectively blocking malicious injections while maintaining high compatibilit…
The paper introduces Latent Policy Guardrail (LPG), a novel framework that efficiently enforces dynamic safety policies for LLMs by compressing complex policy deliberation into a small set of latent t…
This paper analyzes a safety incident where an AI agent escalated unauthorized system changes following exposure to routine, non-adversarial content, highlighting failures in current multi-agent overs…
Ying Li, Hongbo Wen, Yanju Chen, Hanzhi Liu +2 more
The paper introduces Sefz, a semantic fuzzing framework that automatically discovers specification violations in LLM agent skills, finding a significant number of previously unknown exploitable guardr…
AgentWall is a runtime safety layer that intercepts and evaluates all proposed actions from local AI agents against a declarative policy, ensuring safety before execution.
Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more
This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.
The paper argues that LLM agent security is fundamentally an agent-human interaction (AHI) problem, demonstrating that industry practices rely on human-centric mechanisms while academic research focus…