Papers similar to 2604.26274v1

cs.CRRecentMay 20, 2026

VIPER-MCP: Detecting and Exploiting Taint-Style Vulnerabilities in Model Context Protocol Servers

Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more

VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…

View →

cs.CRcs.MARecentJun 4, 2026

ZERO-APT: A Closed-Loop Adversarial Framework for LLM-Driven Automated Penetration Testing under Intelligent Defense

Anlan Zheng, Tiantian Zhu

ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…

View →

cs.CRcs.AIcs.LGRecentMay 28, 2026

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more

The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these honeypots provide substantially longer and harder-to-detect…

View →

cs.CRcs.AIcs.LGRecentMay 28, 2026

Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more

The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these systems provide substantially longer and harder-to-detect i…

View →

cs.CRcs.AIRecentJun 3, 2026

From Attack Simulation to SIEM Rule: Deterministic Detection-as-Code Synthesis with Probe-Level Traceability

Alexandre Cristovão Maiorano

The paper introduces a deterministic method to automatically synthesize initial SIEM detection rules (Sigma rules) from attack simulation findings, ensuring full traceability back to the specific orig…

View →

cs.CRcs.AIRecentMay 7, 2026

LoopTrap: Termination Poisoning Attacks on LLM Agents

Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more

The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.

View →

cs.CRcs.AIeess.SYRecentMay 12, 2026

Behavioral Integrity Verification for AI Agent Skills

Yuhao Wu, Tung-Ling Li, Hongliang Liu

The paper introduces Behavioral Integrity Verification (BIV), a framework that systematically audits AI agent skills by comparing their declared capabilities against their actual implementation, revea…

View →

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…

View →

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…

View →

cs.CRcs.AIcs.SERecentMay 5, 2026

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Jonathan Steinberg, Oren Gal

The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…

View →

cs.CRcs.AIRecentMay 29, 2026

A Protocol-Language Model for Network Intrusion (Without Deep Packet Inspection)

Vivek Kumar Sharma

The paper introduces PLM-NIDS, a novel intrusion detection system that models network flows as a language based solely on L3/L4 metadata, successfully detecting attacks by identifying deviations from…

View →

cs.CRcs.AIRecentMay 29, 2026

A Protocol-Language Model for Network Intrusion (Without Deep Packet Inspection)

Vivek Kumar Sharma

The paper introduces PLM-NIDS, a novel intrusion detection system that models network flows as a language based solely on L3/L4 metadata, successfully detecting attacks by identifying deviations from…

View →

cs.CRcs.AIRecentApr 5, 2026

TraceGuard: Structured Multi-Dimensional Monitoring as a Collusion-Resistant Control Protocol

Khanh Linh Nguyen, Hoa Nghiem, Tu Tran

TraceGuard introduces a structured, multi-dimensional monitoring protocol that significantly improves the detection of subtle attacks in AI agents while maintaining collusion resistance.

View →

cs.CRcs.LGRecentMar 24, 2026

CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection

Abdul Rahman

The paper introduces the Canonical Security Telemetry Substrate (CSTS), a standardized, AI-ready foundation designed to harmonize fragmented and heterogeneous cybersecurity data into a unified model f…

View →

cs.CRRecentApr 18, 2026

False Security Confidence in Benign LLM Code Generation

Xiaolei Ren

The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…

View →

cs.CRcs.AIRecentMay 29, 2026

Stateful Online Monitoring Catches Distributed Agent Attacks

Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong +6 more

The paper introduces a novel stateful online monitoring system that detects distributed multi-agent cyberattacks by aggregating weak suspiciousness signals across many user accounts, overcoming the bl…

View →

cs.CRcs.AIRecentMay 29, 2026

Stateful Online Monitoring Catches Distributed Agent Attacks

Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong +6 more

The paper introduces a novel stateful online monitoring system that detects distributed multi-agent cyberattacks by aggregating weak suspiciousness signals across many user accounts, overcoming the bl…

View →

cs.CRcs.AIRecentMay 28, 2026

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Galip Tolga Erdem

This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…

View →

cs.CRcs.AIRecentMay 28, 2026

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Galip Tolga Erdem

This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…

View →

cs.CRcs.AIRecentMay 26, 2026

ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

Xiaochong Jiang, Shiqi Yang, Ziwei Li, Lifei Liu +2 more

ChainCaps introduces a novel runtime capability budgeting system that prevents 'permission laundering' in complex tool-using agents, significantly reducing attack success rates while maintaining benig…

View →