Papers similar to 2604.13301v1

~ similar to 2604.13301v1· 20 results

cs.CRcs.AIcs.LGRecentMay 28, 2026

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more

The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these honeypots provide substantially longer and harder-to-detect…

View →

cs.CRcs.AIcs.LGRecentMay 28, 2026

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more

The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these systems provide substantially longer and harder-to-detect i…

View →

cs.CRcs.AIRecentMay 29, 2026

Stateful Online Monitoring Catches Distributed Agent Attacks

Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong +6 more

The paper introduces a novel stateful online monitoring system that detects distributed multi-agent cyberattacks by aggregating weak suspiciousness signals across many user accounts, overcoming the bl…

View →

cs.CRcs.AIRecentMay 29, 2026

Stateful Online Monitoring Catches Distributed Agent Attacks

Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong +6 more

View →

cs.CRcs.AIRecentMay 10, 2026

MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring

Monika Jotautaitė, Maria Angelica Martinez, Ollie Matthews, Tyler Tracy

The paper introduces MonitoringBench, a semi-automated red-teaming methodology that generates diverse and stronger attacks, revealing that current coding-agent monitors often fail against sophisticate…

View →

cs.CRcs.AIRecentApr 5, 2026

TraceGuard: Structured Multi-Dimensional Monitoring as a Collusion-Resistant Control Protocol

Khanh Linh Nguyen, Hoa Nghiem, Tu Tran

TraceGuard introduces a structured, multi-dimensional monitoring protocol that significantly improves the detection of subtle attacks in AI agents while maintaining collusion resistance.

View →

cs.CRcs.AIRecentApr 7, 2026

A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms

Nirajan Acharya, Gaurav Kumar Gupta

The paper introduces MCPSHIELD, a comprehensive formal security framework that systematically characterizes and provides a defense-in-depth architecture for the rapidly adopted but insecure Model Cont…

View →

cs.CRRecentMay 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

View →

cs.AIcs.CLcs.CRRecentMay 17, 2026

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

View →

cs.CRcs.AIRecentMar 31, 2026

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more

The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.

View →

cs.CRcs.AIRecentMay 18, 2026

Agent Security is a Systems Problem

Mihai Christodorescu, Earlence Fernandes, Ashish Hooda, Somesh Jha +10 more

The paper argues that agent security must be treated as a systems problem, requiring the enforcement of security invariants at the system level rather than solely relying on improving the underlying A…

View →

cs.CRcs.AIcs.CLRecentApr 22, 2026

Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

Ari Azarafrooz

The paper introduces CSTM-Bench, a comprehensive benchmark and evaluation framework demonstrating that standard session-bound AI guardrails fail against sophisticated, cross-session attacks that accum…

View →

cs.CRcs.SERecentMar 23, 2026

Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning

Charoes Huang, Xin Huang, Ngoc Phu Tran, Amin Milani Fard

This paper analyzes the security vulnerabilities of the Model Context Protocol (MCP), identifying tool poisoning as the most critical client-side threat, and proposes a multi-layered defense strategy.

View →

cs.CRcs.CLcs.CYRecentMay 17, 2026

AI Agents May Always Fall for Prompt Injections

Sahar Abdelnabi, Eugene Bagdasarian

The paper argues that prompt injection is a fundamental vulnerability in AI agents, proposing that Contextual Integrity (CI) offers a principled framework to understand and mitigate context-sensitive…

View →

cs.CRcs.AIcs.SERecentApr 16, 2026

LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

Tyler Tracy, Ram Potham, Nick Kuhn, Myles Heller +30 more

The paper introduces LinuxArena, a large-scale, diverse control setting for testing AI agents in live production environments, demonstrating its utility for evaluating both attack and defense mechanis…

View →

cs.CRcs.AIRecentApr 8, 2026

MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan, Mohammad Ghasemigol +1 more

The paper introduces a defense-placement taxonomy for the Model Context Protocol (MCP) to systematically analyze security gaps, revealing that many vulnerabilities stem from architectural misalignment…

View →

cs.CRRecentMar 24, 2026

SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy

Ali Dehghantanha, Sajad Homayoun

This paper systematically maps the expanded attack surface of agentic AI systems, identifying new threat vectors like RAG poisoning and cross-agent manipulation, and proposes a comprehensive security…

View →

cs.AIcs.CRRecentMay 6, 2026

AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

Chenglin Yang

AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…

View →

cs.CRcs.AIRecentMay 4, 2026

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

Javad Forough, Marios Kogias, Hamed Haddadi

This survey analyzes the unique security threats posed by complex, multi-agent AI systems and proposes Confidential Computing (CC) using Trusted Execution Environments (TEEs) as a hardware-rooted defe…

View →

cs.CRcs.MARecentJun 4, 2026

ZERO-APT: A Closed-Loop Adversarial Framework for LLM-Driven Automated Penetration Testing under Intelligent Defense

Anlan Zheng, Tiantian Zhu

ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…

View →