Papers similar to 2605.29251

~ similar to 2605.29251· 20 results

cs.AIcs.CRRecentMay 28, 2026

Provably Secure Agent Guardrail

Benlong Wu, Weiming Zhang, Kejiang Chen, Han Fang +1 more

The paper introduces an executable Proof-Constrained Action (ePCA) framework that secures AI agents by forcing them to formalize their intentions into first-order logical constraints, achieving provab…

View →

cs.SEcs.AIcs.CRRecentApr 16, 2026

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Yining Hong, Yining She, Eunsuk Kang, Christopher S. Timperley +1 more

The paper proposes and evaluates symbolic guardrails as a practical method to provide strong, verifiable safety and security guarantees for domain-specific AI agents without compromising their utility…

View →

cs.CRcs.AIcs.CLRecentMay 4, 2026

MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory

Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more

The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…

View →

cs.CRcs.AIRecentApr 27, 2026

AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents

Yixiang Zhang, Xinhao Deng, Jiaqing Wu, Yue Xiao +2 more

The paper introduces AgentWard, a lifecycle-oriented, defense-in-depth architecture designed to systematically secure autonomous AI agents by protecting them across all stages of their operation.

View →

cs.CRcs.AIRecentApr 14, 2026

Parallax: Why AI Agents That Think Must Never Act

Joel Fokou

The paper introduces Parallax, an architectural framework that structurally separates AI reasoning from action execution to ensure robust safety for autonomous agents, achieving high attack mitigation…

View →

cs.CRcs.AIRecentMay 7, 2026

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

Zhe Liu, Zonghao Ying, Wenxin Zhang, Quanchen Zou +4 more

SafeHarbor is a novel, hierarchical memory-augmented framework that establishes context-aware decision boundaries for LLM agents, achieving state-of-the-art safety while minimizing over-refusal.

View →

cs.CRcs.AIcs.SERecentMay 21, 2026

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more

The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…

View →

cs.CLRecentMay 29, 2026

EMBGuard: Constructing Hazard-Aware Guardrails for Safe Planning in Embodied Agents

Dongwook Choi, Taeyoon Kwon, Bogyung Jeong, Minju Kim +5 more

EMBGuard introduces a novel, MLLM-based safety guardrail that explicitly identifies and explains physical hazards from (visual observation, action) pairs, enabling safer planning for embodied agents.

View →

cs.AIcs.CRRecentApr 26, 2026

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Rong Xiang

The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…

View →

cs.LOcs.AIcs.CRRecentApr 1, 2026

Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving

Devakh Rashie, Veda Rashi

The paper introduces the Lean-Agent Protocol, a formal verification platform that uses Lean 4 theorem proving to ensure agentic AI actions in finance are mathematically compliant with complex regulati…

View →

cs.CRcs.AIRecentApr 30, 2026

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

Luyao Xu, Xiang Chen

This paper provides a systematic, layered review of security risks and defense strategies for autonomous agent frameworks, using OpenClaw as a case study to address the current lack of integrated rese…

View →

cs.CRcs.CLRecentMay 13, 2026

Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan +3 more

The EvoSafety framework enhances LLM safety by externalizing attack and defense mechanisms, enabling persistent, transferable, and model-agnostic robustness against adversarial prompts.

View →

cs.CRcs.AIRecentApr 28, 2026

From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems

Ignacio Peyrano

The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…

View →

cs.CRcs.AIRecentApr 25, 2026

Semantic Denial of Service in LLM-controlled robots

Jonathan Steinberg, Oren Gal

The paper demonstrates a semantic denial-of-service attack against LLM-controlled robots by injecting short, safety-plausible phrases into the audio channel, causing the robot to halt or disrupt execu…

View →

cs.CRcs.AIRecentApr 7, 2026

ClawLess: A Security Model of AI Agents

Hongyi Lu, Nian Liu, Shuai Wang, Fengwei Zhang

ClawLess introduces a formally verified security framework that enforces fine-grained policies on autonomous AI agents, mitigating risks associated with their ability to run code and retrieve informat…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.AIRecentApr 3, 2026

A Systematic Security Evaluation of OpenClaw and Its Variants

Yuhang Wang, Haichang Gao, Zhenxing Niu, Zhaoxiang Liu +3 more

The paper systematically evaluates six OpenClaw-series AI agent frameworks, demonstrating that these agentized systems possess significant security vulnerabilities that are distinct from and more seve…

View →

cs.CRcs.AIRecentMar 24, 2026

The Cognitive Firewall:Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

Qianlong Lan, Anuj Kaul

The Cognitive Firewall is a hybrid edge-cloud defense architecture that significantly reduces the attack success rate of Indirect Prompt Injection against browser-based AI agents by combining local vi…

View →

cs.CRRecentApr 27, 2026

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more

AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…

View →

cs.CRRecentApr 25, 2026

When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape

Richard Joseph Mitchell

The paper analyzes the failure modes of current AI containment methods when the agent itself is the adversary, deriving five necessary architectural requirements for durable safety.

View →