Papers similar to 2604.22136v1

~ similar to 2604.22136v1· 20 results

cs.CRcs.AIRecentMay 7, 2026

LoopTrap: Termination Poisoning Attacks on LLM Agents

Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more

The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.

View →

cs.AIcs.CLcs.CRRecentMay 17, 2026

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

View →

cs.AIcs.CRRecentApr 26, 2026

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture

Rong Xiang

The paper proposes the Policy-Execution-Authorization (PEA) architecture, a separation-of-powers system designed to structurally enforce goal integrity in AI agents, moving safety from a probabilistic…

View →

cs.CRRecentMay 12, 2026

Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis

Zhenhao Xu, Wenhan Chang, Yichuan Chen, Yuxin Fang +2 more

The paper proposes Safety Context Injection (SCI), an inference-time framework that prepends a structured external risk report to protect Large Reasoning Models (LRMs) against sophisticated jailbreaks…

View →

cs.CRcs.LGRecentApr 25, 2026

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

Kexin Chu

The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…

View →

cs.CRRecentMar 24, 2026

SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy

Ali Dehghantanha, Sajad Homayoun

This paper systematically maps the expanded attack surface of agentic AI systems, identifying new threat vectors like RAG poisoning and cross-agent manipulation, and proposes a comprehensive security…

View →

cs.CRcs.LORecentApr 14, 2026

COBALT-TLA: A Neuro-Symbolic Verification Loop for Cross-Chain Bridge Vulnerability Discovery

Dominik Blain

COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.

View →

cs.CRcs.AIcs.SERecentMay 21, 2026

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more

The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…

View →

cs.CRcs.AIcs.LGRecentMay 18, 2026

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou, Jie Li

The paper introduces Obsessive Experience Poisoning (OEP), a low-privilege black-box attack that poisons self-evolving LLM agents by generating locally correct but harmful experiences, causing dangero…

View →

cs.CRcs.AIRecentMar 31, 2026

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more

The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.

View →

cs.CRcs.AIcs.CLRecentMay 4, 2026

MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory

Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more

The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…

View →

cs.CRcs.AIRecentApr 20, 2026

From Craft to Kernel: A Governance-First Execution Architecture and Semantic ISA for Agentic Computers

Xiangyu Wen, Yuang Zhao, Xiaoyu Xu, Lingjun Chen +8 more

The paper proposes Arbiter-K, a Governance-First execution architecture that treats LLMs as probabilistic units encapsulated by a deterministic kernel, significantly improving the security and reliabi…

View →

cs.AIcs.CRRecentMay 6, 2026

AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

Chenglin Yang

AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…

View →

cs.CRcs.AIRecentMay 11, 2026

Engineering Robustness into Personal Agents with the AI Workflow Store

Roxana Geambasu, Mariana Raykova, Pierre Tholoniat, Trishita Tiwari +2 more

The paper argues that current 'on-the-fly' AI agent design lacks necessary software engineering rigor and proposes an 'AI Workflow Store' to provide hardened, reusable, and reliable agent workflows.

View →

cs.CRcs.LORecentApr 30, 2026

Alignment Contracts for Agentic Security Systems

Isaac David, Marco Guarnieri, Arthur Gervais

The paper introduces alignment contracts, a formal framework for specifying and enforcing behavioral constraints over observable effect traces, ensuring that powerful agentic security systems operate…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.AIRecentMay 4, 2026

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

Javad Forough, Marios Kogias, Hamed Haddadi

This survey analyzes the unique security threats posed by complex, multi-agent AI systems and proposes Confidential Computing (CC) using Trusted Execution Environments (TEEs) as a hardware-rooted defe…

View →

cs.AIcs.CRcs.SERecentMay 9, 2026

Containment Verification: AI Safety Guarantees Independent of Alignment

Royce Moon, Lav R. Varshney

The paper introduces containment verification, a novel method that provides safety guarantees by formally verifying the agentic framework itself, ensuring safety regardless of the underlying AI model'…

View →

cs.CRcs.AIcs.MARecentMar 23, 2026

STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving

James Hugglestone, Samuel Jacob Chacko, Dawson Stoller, Ryan Schmidt +1 more

The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…

View →

cs.CRRecentMar 18, 2026

The Verifier Tax: Horizon Dependent Safety Success Tradeoffs in Tool Using LLM Agents

Tanmay Sah, Vishal Srivastava, Dolly Sah, Kayden Jordan

The paper analyzes how runtime safety enforcement impacts the performance of multi-step LLM agents, finding that while safety mechanisms can block unsafe actions, they impose a significant performance…

View →